
Welcome to MMDeploy’s documentation!

You can switch between Chinese and English documents in the lower-left corner of the layout.

Get Started

MMDeploy provides useful tools for deploying OpenMMLab models to various platforms and devices.

With the help of them, you can not only do model deployment using our pre-defined pipelines but also customize your own deployment pipeline.

Introduction

In MMDeploy, the deployment pipeline can be illustrated as a sequence of modules, i.e., Model Converter, MMDeploy Model and Inference SDK.

(Figure: the MMDeploy deployment pipeline)

Model Converter

Model Converter aims at converting training models from OpenMMLab into backend models that can run on target devices. It transforms a PyTorch model into an IR model, i.e., ONNX or TorchScript, and then converts the IR model into a backend model. Chaining the two steps together enables one-click, end-to-end model deployment.

MMDeploy Model

MMDeploy Model is the result package exported by Model Converter. Besides the backend models, it also includes the model meta info, which will be used by Inference SDK.
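As a quick illustration, the exported package is just a directory on disk. Below is a minimal sketch for peeking at it, assuming a work dir such as mmdeploy_model/faster-rcnn produced with --dump-info (the deploy.json and pipeline.json meta files are described in the SDK section later):

import json
from pathlib import Path

# hypothetical work dir produced by `tools/deploy.py ... --dump-info`
model_dir = Path('mmdeploy_model/faster-rcnn')

# backend files and meta files packed into the MMDeploy Model
for f in sorted(model_dir.iterdir()):
    print(f.name)

# the meta info consumed by Inference SDK is plain JSON
with open(model_dir / 'deploy.json') as f:
    print(json.load(f))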

Inference SDK

Inference SDK is developed in C/C++, wrapping the preprocessing, model forward and postprocessing modules of model inference. It provides FFI bindings for languages such as C, C++, Python, C#, Java and so on.

Prerequisites

In order to do an end-to-end model deployment, MMDeploy requires Python 3.6+ and PyTorch 1.5+.

Step 0. Download and install Miniconda from the official website.

Step 1. Create a conda environment and activate it.

conda create --name mmdeploy python=3.8 -y
conda activate mmdeploy

Step 2. Install PyTorch following official instructions, e.g.

On GPU platforms:

conda install pytorch=={pytorch_version} torchvision=={torchvision_version} cudatoolkit={cudatoolkit_version} -c pytorch -c conda-forge

On CPU platforms:

conda install pytorch=={pytorch_version} torchvision=={torchvision_version} cpuonly -c pytorch

Note

On GPU platforms, please ensure that {cudatoolkit_version} matches your host CUDA toolkit version. Otherwise, it may cause conflicts when deploying models with TensorRT.
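A quick way to check which CUDA toolkit version your PyTorch build uses (a minimal sketch, assuming PyTorch is already installed), so you can compare it against the host CUDA toolkit reported by nvcc or nvidia-smi:

import torch

# CUDA toolkit version that this PyTorch build (the cudatoolkit package) was compiled with
print('torch:', torch.__version__)
print('cudatoolkit:', torch.version.cuda)
print('cuda available:', torch.cuda.is_available())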

Installation

We recommend that users follow our best practices to install MMDeploy.

Step 0. Install MMCV.

pip install -U openmim
mim install mmcv-full

Step 1. Install MMDeploy and inference engine

We recommend using the MMDeploy precompiled package as our best practice. Currently, the model converter and the SDK inference runtime are released as PyPI packages, and the SDK C/C++ library is provided here. You can download them according to your target platform and device.

The supported platforms and devices are listed in the matrix below:

| OS-Arch | Device | ONNX Runtime | TensorRT |
| :--- | :--- | :--- | :--- |
| Linux-x86_64 | CPU | Y | N/A |
| Linux-x86_64 | CUDA | Y | Y |
| Windows-x86_64 | CPU | Y | N/A |
| Windows-x86_64 | CUDA | Y | Y |

Note: if the MMDeploy prebuilt package doesn't meet your target platform or device, please build MMDeploy from source.

Taking the latest precompiled package as an example, you can install it as follows:

Linux-x86_64
# 1. install MMDeploy model converter
pip install mmdeploy==0.14.0

# 2. install MMDeploy sdk inference
# install either one according to whether you need gpu inference
# 2.1 support onnxruntime
pip install mmdeploy-runtime==0.14.0
# 2.2 support onnxruntime-gpu, tensorrt
pip install mmdeploy-runtime-gpu==0.14.0

# 3. install inference engine
# 3.1 install TensorRT
# !!! If you want to convert a tensorrt model or inference with tensorrt,
# download TensorRT-8.2.3.0 CUDA 11.x tar package from NVIDIA, and extract it to the current directory
pip install TensorRT-8.2.3.0/python/tensorrt-8.2.3.0-cp38-none-linux_x86_64.whl
pip install pycuda
export TENSORRT_DIR=$(pwd)/TensorRT-8.2.3.0
export LD_LIBRARY_PATH=${TENSORRT_DIR}/lib:$LD_LIBRARY_PATH
# !!! Moreover, download cuDNN 8.2.1 CUDA 11.x tar package from NVIDIA, and extract it to the current directory
export CUDNN_DIR=$(pwd)/cuda
export LD_LIBRARY_PATH=$CUDNN_DIR/lib64:$LD_LIBRARY_PATH

# 3.2 install ONNX Runtime
# install either one according to whether you need gpu inference
# 3.2.1 onnxruntime
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
export ONNXRUNTIME_DIR=$(pwd)/onnxruntime-linux-x64-1.8.1
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
# 3.2.2 onnxruntime-gpu
pip install onnxruntime-gpu==1.8.1
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-gpu-1.8.1.tgz
tar -zxvf onnxruntime-linux-x64-gpu-1.8.1.tgz
export ONNXRUNTIME_DIR=$(pwd)/onnxruntime-linux-x64-gpu-1.8.1
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
Windows-x86_64

Please refer to this guide for its prebuilt package.

Convert Model

After the installation, you can start the model deployment journey by converting a PyTorch model into a backend model with tools/deploy.py.

Based on the above settings, we provide an example below of converting Faster R-CNN from MMDetection to TensorRT:

# clone mmdeploy to get the deployment config. `--recursive` is not necessary
git clone https://github.com/open-mmlab/mmdeploy.git

# clone mmdetection repo. We have to use the config file to build PyTorch nn module
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -v -e .
cd ..

# download Faster R-CNN checkpoint
wget -P checkpoints https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth

# run the command to start model conversion
python mmdeploy/tools/deploy.py \
    mmdeploy/configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
    mmdetection/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
    checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
    mmdetection/demo/demo.jpg \
    --work-dir mmdeploy_model/faster-rcnn \
    --device cuda \
    --dump-info

The converted model and its meta info can be found in the path specified by --work-dir. Together they make up the MMDeploy Model, which can be fed to the MMDeploy SDK for model inference.

For more details about model conversion, you can read how_to_convert_model. If you want to customize the conversion pipeline, you can edit the config file by following this tutorial.

Tip

You can convert the above model to an ONNX model and perform ONNX Runtime inference simply by changing 'detection_tensorrt_dynamic-320x320-1344x1344.py' to 'detection_onnxruntime_dynamic.py' and setting '--device' to 'cpu'.
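For example, once the model has been re-converted with the ONNX Runtime config, the unified inference_model API introduced in the next section can run it on CPU. A minimal sketch, assuming the new work dir is mmdeploy_model/faster-rcnn-ort and the exported file keeps the default name end2end.onnx:

from mmdeploy.apis import inference_model

# hypothetical work dir produced by the ONNX Runtime conversion described in the tip above
result = inference_model(
  model_cfg='mmdetection/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py',
  deploy_cfg='mmdeploy/configs/mmdet/detection/detection_onnxruntime_dynamic.py',
  backend_files=['mmdeploy_model/faster-rcnn-ort/end2end.onnx'],
  img='mmdetection/demo/demo.jpg',
  device='cpu')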

Inference Model

After model conversion, we can perform inference not only by Model Converter but also by Inference SDK.

Inference by Model Converter

Model Converter provides a unified API named inference_model to do the job, making the APIs of all inference backends transparent to users. Take the previously converted Faster R-CNN TensorRT model for example:

from mmdeploy.apis import inference_model
result = inference_model(
  model_cfg='mmdetection/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py',
  deploy_cfg='mmdeploy/configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py',
  backend_files=['mmdeploy_model/faster-rcnn/end2end.engine'],
  img='mmdetection/demo/demo.jpg',
  device='cuda:0')

Note

'backend_files' in this API refers to the backend engine file path(s), which MUST be put in a list, since some inference engines like OpenVINO and ncnn separate the network structure and its weights into two files.
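For a backend that splits the network into two files, such as ncnn, both paths go into the list. A sketch under the assumption that an ncnn conversion of the same model produced end2end.param and end2end.bin in its work dir (the deploy config name and file names are illustrative):

from mmdeploy.apis import inference_model

# ncnn keeps the graph (.param) and the weights (.bin) in separate files,
# so both are passed through `backend_files`
result = inference_model(
  model_cfg='mmdetection/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py',
  deploy_cfg='mmdeploy/configs/mmdet/detection/detection_ncnn_static.py',  # illustrative ncnn deploy config
  backend_files=['mmdeploy_model/faster-rcnn-ncnn/end2end.param',
                 'mmdeploy_model/faster-rcnn-ncnn/end2end.bin'],
  img='mmdetection/demo/demo.jpg',
  device='cpu')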

Inference by SDK

You can directly run MMDeploy demo programs in the precompiled package to get inference results.

wget https://github.com/open-mmlab/mmdeploy/releases/download/v0.14.0/mmdeploy-0.14.0-linux-x86_64-cuda11.3.tar.gz
tar xf mmdeploy-0.14.0-linux-x86_64-cuda11.3.tar.gz
cd mmdeploy-0.14.0-linux-x86_64-cuda11.3
# run python demo
python example/python/object_detection.py cuda ../mmdeploy_model/faster-rcnn ../mmdetection/demo/demo.jpg
# run C/C++ demo
# build the demo according to the README.md in the folder.
./bin/object_detection cuda ../mmdeploy_model/faster-rcnn ../mmdetection/demo/demo.jpg

Note

In the above commands, the input model is the SDK Model path. It is NOT the engine file path but the path that was passed to --work-dir. It includes not only the engine files but also meta information such as 'deploy.json' and 'pipeline.json'.

In the next section, we will provide examples of deploying the above converted Faster R-CNN model with the SDK's different FFIs (Foreign Function Interfaces).

Python API
from mmdeploy_runtime import Detector
import cv2

img = cv2.imread('mmdetection/demo/demo.jpg')

# create a detector
detector = Detector(model_path='mmdeploy_model/faster-rcnn', device_name='cuda', device_id=0)
# run the inference
bboxes, labels, _ = detector(img)
# Filter the result according to threshold
indices = [i for i in range(len(bboxes))]
for index, bbox, label_id in zip(indices, bboxes, labels):
  [left, top, right, bottom], score = bbox[0:4].astype(int),  bbox[4]
  if score < 0.3:
      continue
  cv2.rectangle(img, (left, top), (right, bottom), (0, 255, 0))

cv2.imwrite('output_detection.png', img)

You can find more examples from here.

C++ API

Using the SDK C++ API follows the pattern below:

(Figure: SDK C++ API usage pattern)

Now let’s apply this procedure on the above Faster R-CNN model.

#include <cstdlib>
#include <opencv2/opencv.hpp>
#include "mmdeploy/detector.hpp"

int main() {
  const char* device_name = "cuda";
  int device_id = 0;
  std::string model_path = "mmdeploy_model/faster-rcnn";
  std::string image_path = "mmdetection/demo/demo.jpg";

  // 1. load model
  mmdeploy::Model model(model_path);
  // 2. create predictor
  mmdeploy::Detector detector(model, mmdeploy::Device{device_name, device_id});
  // 3. read image
  cv::Mat img = cv::imread(image_path);
  // 4. inference
  auto dets = detector.Apply(img);
  // 5. deal with the result. Here we choose to visualize it
  for (int i = 0; i < dets.size(); ++i) {
    const auto& box = dets[i].bbox;
    fprintf(stdout, "box %d, left=%.2f, top=%.2f, right=%.2f, bottom=%.2f, label=%d, score=%.4f\n",
            i, box.left, box.top, box.right, box.bottom, dets[i].label_id, dets[i].score);
    if (dets[i].score < 0.3) {
      continue;
    }
    cv::rectangle(img, cv::Point{(int)box.left, (int)box.top},
                  cv::Point{(int)box.right, (int)box.bottom}, cv::Scalar{0, 255, 0});
  }
  cv::imwrite("output_detection.png", img);
  return 0;
}

When you build this example, add the MMDeploy package to your CMake project as shown below. Then pass -DMMDeploy_DIR to cmake, indicating the path where MMDeployConfig.cmake is located. You can find it in the prebuilt package.

find_package(MMDeploy REQUIRED)
target_link_libraries(${name} PRIVATE mmdeploy ${OpenCV_LIBS})

For more SDK C++ API usages, please read these samples.

For the remaining C, C# and Java API usages, please read the C demos, C# demos and Java demos respectively. We'll talk about them more in our next release.

Accelerate preprocessing (Experimental)

If you want to fuse preprocessing for acceleration, please refer to this doc.

Evaluate Model

You can test the performance of a deployed model using tools/test.py. For example,

python ${MMDEPLOY_DIR}/tools/test.py \
    ${MMDEPLOY_DIR}/configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
    ${MMDET_DIR}/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
    --model ${BACKEND_MODEL_FILES} \
    --metrics ${METRICS} \
    --device cuda:0

Note

Regarding the --model option, it refers to the converted engine file(s) when using Model Converter for the performance test. But when you test the metrics with the Inference SDK, this option refers to the directory path of the MMDeploy Model.

You can read how to evaluate a model for more details.

Build from Source

Download

git clone -b master git@github.com:open-mmlab/mmdeploy.git --recursive

Note:

  • If fetching the submodules fails, you can get them manually by following these instructions:

    cd mmdeploy
    git clone git@github.com:NVIDIA/cub.git third_party/cub
    cd third_party/cub
    git checkout c3cceac115
    
    # go back to third_party directory and git clone pybind11
    cd ..
    git clone git@github.com:pybind/pybind11.git pybind11
    cd pybind11
    git checkout 70a58c5
    
  • If git clone fails via SSH, you can try the HTTPS protocol instead:

    git clone -b master https://github.com/open-mmlab/mmdeploy.git --recursive
    

Build

Please visit the following links to find out how to build MMDeploy according to the target platform.

Use Docker Image

We provide two Dockerfiles, for CPU and GPU respectively. For CPU users, we install MMDeploy with the ONNX Runtime, ncnn and OpenVINO backends. For GPU users, we install MMDeploy with the TensorRT backend. Besides, users can install a specific MMDeploy version when building the docker image.

Build docker image

For CPU users, we can build the docker image with the latest MMDeploy through:

cd mmdeploy
docker build docker/CPU/ -t mmdeploy:master-cpu

For GPU users, we can build the docker image with the latest MMDeploy through:

cd mmdeploy
docker build docker/GPU/ -t mmdeploy:master-gpu

To install MMDeploy with a specific version, append --build-arg VERSION=${VERSION} to the build command. Taking GPU as an example:

cd mmdeploy
docker build docker/GPU/ -t mmdeploy:0.1.0 --build-arg  VERSION=0.1.0

To install libs with the aliyun source, append --build-arg USE_SRC_INSIDE=${USE_SRC_INSIDE} to the build command.

# GPU for example
cd mmdeploy
docker build docker/GPU/ -t mmdeploy:inside --build-arg  USE_SRC_INSIDE=true

# CPU for example
cd mmdeploy
docker build docker/CPU/ -t mmdeploy:inside --build-arg  USE_SRC_INSIDE=true

Run docker container

After the docker image is built successfully, we can use docker run to launch the docker service. Taking the GPU docker image as an example:

docker run --gpus all -it mmdeploy:master-gpu

FAQs

  1. CUDA error: the provided PTX was compiled with an unsupported toolchain:

    As described here, update the GPU driver to the latest one for your GPU.

  2. docker: Error response from daemon: could not select device driver “” with capabilities: [gpu].

    # Add the package repositories
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    
    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
    sudo systemctl restart docker
    

Build from Script

Through user investigation, we know that most users are already familiar with python and torch before using mmdeploy. Therefore we provide scripts to simplify mmdeploy installation.

Assuming you already have

  • python3 -m pip (conda or pyenv)

  • nvcc (depends on inference backend)

  • torch (not compulsory)

Run the following script to install mmdeploy + the ncnn backend (the nproc argument is not compulsory).

$ cd /path/to/mmdeploy
$ python3 tools/scripts/build_ubuntu_x64_ncnn.py
..

A sudo password may be required during the process, and the script will try its best to build and install the mmdeploy SDK and demo:

  • Detect the host OS version, the number of make jobs, whether to use root, and try to fix python3 -m pip

  • Find the necessary basic tools, such as g++-7, cmake, wget, etc.

  • Compile necessary dependencies, such as pyncnn, protobuf

The script will also try to avoid affecting host environment:

  • The dependencies of source code compilation are placed in the mmdeploy-dep directory at the same level as mmdeploy

  • The script does not modify variables such as PATH, LD_LIBRARY_PATH, PYTHONPATH, etc.

  • The environment variables that need to be modified will be printed; please pay attention to the final output

The script will eventually execute python3 tools/check_env.py. A successful installation should display the version number of the corresponding backend and ops_is_avaliable : True, for example:

$ python3 tools/check_env.py
..
2022-09-13 14:49:13,767 - mmdeploy - INFO - **********Backend information**********
2022-09-13 14:49:14,116 - mmdeploy - INFO - onnxruntime: 1.8.0	ops_is_avaliable : True
2022-09-13 14:49:14,131 - mmdeploy - INFO - tensorrt: 8.4.1.5	ops_is_avaliable : True
2022-09-13 14:49:14,139 - mmdeploy - INFO - ncnn: 1.0.20220901	ops_is_avaliable : True
2022-09-13 14:49:14,150 - mmdeploy - INFO - pplnn_is_avaliable: True
..

Here are the verified installation scripts. If you want mmdeploy to support multiple backends at the same time, execute each script once:

| script | OS version |
| :--- | :--- |
| build_ubuntu_x64_ncnn.py | 18.04/20.04 |
| build_ubuntu_x64_ort.py | 18.04/20.04 |
| build_ubuntu_x64_pplnn.py | 18.04/20.04 |
| build_ubuntu_x64_torchscript.py | 18.04/20.04 |
| build_ubuntu_x64_tvm.py | 18.04/20.04 |
| build_jetson_orin_python38.sh | JetPack5.0 L4T 34.1 |

CMake Build Option Spec

| NAME | VALUE | DEFAULT | REMARK |
| :--- | :--- | :--- | :--- |
| MMDEPLOY_SHARED_LIBS | {ON, OFF} | ON | Switch to build shared libs |
| MMDEPLOY_BUILD_SDK | {ON, OFF} | OFF | Switch to build MMDeploy SDK |
| MMDEPLOY_BUILD_SDK_MONOLITHIC | {ON, OFF} | OFF | Build a single monolithic lib |
| MMDEPLOY_BUILD_TEST | {ON, OFF} | OFF | Switch to build MMDeploy SDK unittest cases |
| MMDEPLOY_BUILD_SDK_PYTHON_API | {ON, OFF} | OFF | Switch to build MMDeploy SDK Python package |
| MMDEPLOY_BUILD_SDK_CSHARP_API | {ON, OFF} | OFF | Build C# SDK API |
| MMDEPLOY_BUILD_SDK_JAVA_API | {ON, OFF} | OFF | Build Java SDK API |
| MMDEPLOY_SPDLOG_EXTERNAL | {ON, OFF} | OFF | Build with the spdlog installation package that comes with the system |
| MMDEPLOY_ZIP_MODEL | {ON, OFF} | OFF | Enable SDK with zip-format models |
| MMDEPLOY_COVERAGE | {ON, OFF} | OFF | Build for C++ code coverage report |
| MMDEPLOY_TARGET_DEVICES | {"cpu", "cuda"} | cpu | Enable target devices. You can enable more than one by passing a semicolon-separated list of device names, e.g. -DMMDEPLOY_TARGET_DEVICES="cpu;cuda" |
| MMDEPLOY_TARGET_BACKENDS | {"trt", "ort", "pplnn", "ncnn", "openvino", "torchscript", "snpe", "tvm", "acl"} | N/A | Enable inference engines. By default, no target inference engine is set, since it highly depends on the use case. When more than one engine is specified, pass a semicolon-separated list of backend names, e.g. -DMMDEPLOY_TARGET_BACKENDS="trt;ort;pplnn;ncnn;openvino". After specifying the inference engines, their package paths have to be passed to cmake as listed below the table. |
| MMDEPLOY_CODEBASES | {"mmcls", "mmdet", "mmseg", "mmedit", "mmocr", "all"} | all | Enable codebases' postprocess modules. You can provide a semicolon-separated list of codebase names, e.g. -DMMDEPLOY_CODEBASES="mmcls;mmdet", or pass all to enable them all, i.e. -DMMDEPLOY_CODEBASES=all |

Package paths required by each inference engine:

1. trt: TensorRT. TENSORRT_DIR and CUDNN_DIR are needed.
   -DTENSORRT_DIR=${TENSORRT_DIR}
   -DCUDNN_DIR=${CUDNN_DIR}
2. ort: ONNX Runtime. ONNXRUNTIME_DIR is needed.
   -DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR}
3. pplnn: PPL.NN. pplnn_DIR is needed.
   -Dpplnn_DIR=${PPLNN_DIR}
4. ncnn: ncnn. ncnn_DIR is needed.
   -Dncnn_DIR=${NCNN_DIR}/build/install/lib/cmake/ncnn
5. openvino: OpenVINO. InferenceEngine_DIR is needed.
   -DInferenceEngine_DIR=${INTEL_OPENVINO_DIR}/deployment_tools/inference_engine/share
6. torchscript: TorchScript. Torch_DIR is needed.
   -DTorch_DIR=${Torch_DIR}
7. snpe: Qualcomm SNPE. SNPE_ROOT must exist in the environment variables because of the C/S mode.
8. coreml: CoreML. Torch_DIR is required.
   -DTorch_DIR=${Torch_DIR}
9. tvm: TVM. TVM_DIR is required.
   -DTVM_DIR=${TVM_DIR}

How to convert model

This tutorial briefly introduces how to export an OpenMMLab model to a specific backend using MMDeploy tools.

How to convert models from PyTorch to other backends

Prerequisite

  1. Install and build your target backend. You could refer to ONNXRuntime-install, TensorRT-install, ncnn-install, PPLNN-install, OpenVINO-install for more information.

  2. Install and build your target codebase. You could refer to MMClassification-install, MMDetection-install, MMSegmentation-install, MMOCR-install, MMEditing-install.

Usage

python ./tools/deploy.py \
    ${DEPLOY_CFG_PATH} \
    ${MODEL_CFG_PATH} \
    ${MODEL_CHECKPOINT_PATH} \
    ${INPUT_IMG} \
    --test-img ${TEST_IMG} \
    --work-dir ${WORK_DIR} \
    --calib-dataset-cfg ${CALIB_DATA_CFG} \
    --device ${DEVICE} \
    --log-level INFO \
    --show \
    --dump-info

Description of all arguments

  • deploy_cfg : The deployment configuration of mmdeploy for the model, including the type of inference framework, whether to quantize, whether the input shape is dynamic, etc. There may be a reference relationship between configuration files; configs/mmcls/classification_ncnn_static.py is an example (see the sketch after this list).

  • model_cfg : The model configuration of the algorithm library, e.g. mmclassification/configs/vision_transformer/vit-base-p32_ft-64xb64_in1k-384.py, regardless of the path to mmdeploy.

  • checkpoint : The torch model path. It can start with http/https; see the implementation of mmcv.FileClient for details.

  • img : The path to the image or point cloud file used for testing during model conversion.

  • --test-img : The path of the image file used to test the model. If not specified, it will be set to None.

  • --work-dir : The path of the working directory used to save logs and models.

  • --calib-dataset-cfg : Only valid in int8 mode. Config used for calibration. If not specified, it will be set to None and use “val” dataset in model config for calibration.

  • --device : The device used for model conversion. If not specified, it will be set to cpu. For TensorRT, use the cuda:0 format.

  • --log-level : Set the log level; options are 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.

  • --show : Whether to show detection outputs.

  • --dump-info : Whether to output information for SDK.
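As mentioned for deploy_cfg above, deployment configs are usually assembled from smaller files through the _base_ inheritance mechanism of mmcv's Config. A minimal sketch of what such a file may contain (the exact _base_ paths are illustrative and differ between configs):

# contents of a deploy config such as configs/mmcls/classification_ncnn_static.py (illustrative)
_base_ = ['./classification_static.py', '../_base_/backends/ncnn.py']

# fields defined in the base files can be overridden locally, e.g.
onnx_config = dict(input_shape=None)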

How to find the corresponding deployment config of a PyTorch model

  1. Find the model's codebase folder in configs/. For example, to convert a YOLOv3 model, you need to find the configs/mmdet folder.

  2. Find the model's task folder in configs/codebase_folder/. For the YOLOv3 model, you need to find the configs/mmdet/detection folder.

  3. Find the deployment config file in configs/codebase_folder/task_folder/. For deploying the YOLOv3 model, you can use configs/mmdet/detection/detection_onnxruntime_dynamic.py.

Example

python ./tools/deploy.py \
    configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
    $PATH_TO_MMDET/configs/yolo/yolov3_d53_mstrain-608_273e_coco.py \
    $PATH_TO_MMDET/checkpoints/yolo/yolov3_d53_mstrain-608_273e_coco.pth \
    $PATH_TO_MMDET/demo/demo.jpg \
    --work-dir work_dir \
    --show \
    --device cuda:0

How to evaluate the exported models

You can try to evaluate the model, referring to how_to_evaluate_a_model.

List of supported models exportable to other backends

Refer to the Supported model list.

How to write config

This tutorial describes how to write a config for model conversion and deployment. A deployment config includes the onnx config, codebase config and backend config.

1. How to write onnx config

The onnx config describes how to export a model from PyTorch to ONNX.

Description of onnx config arguments

  • type: Type of config dict. Default is onnx.

  • export_params: If specified, all parameters will be exported. Set this to False if you want to export an untrained model.

  • keep_initializers_as_inputs: If True, all the initializers (typically corresponding to parameters) in the exported graph will also be added as inputs to the graph. If False, then initializers are not added as inputs to the graph, and only the non-parameter inputs are added as inputs.

  • opset_version: The ONNX opset version; 11 by default.

  • save_file: Output onnx file.

  • input_names: Names to assign to the input nodes of the graph.

  • output_names: Names to assign to the output nodes of the graph.

  • input_shape: The height and width of input tensor to the model.

Example

onnx_config = dict(
    type='onnx',
    export_params=True,
    keep_initializers_as_inputs=False,
    opset_version=11,
    save_file='end2end.onnx',
    input_names=['input'],
    output_names=['output'],
    input_shape=None)

If you need to use dynamic axes

If the dynamic shape of inputs and outputs is required, you need to add dynamic_axes dict in onnx config.

  • dynamic_axes: Describe the dimensional information about input and output.

Example
    dynamic_axes={
        'input': {
            0: 'batch',
            2: 'height',
            3: 'width'
        },
        'dets': {
            0: 'batch',
            1: 'num_dets',
        },
        'labels': {
            0: 'batch',
            1: 'num_dets',
        },
    }

2. How to write codebase config

The codebase config contains information such as the codebase type and task type.

Description of codebase config arguments

  • type: Model’s codebase, including mmcls, mmdet, mmseg, mmocr, mmedit.

  • task: Model’s task type, referring to List of tasks in all codebases.

Example
codebase_config = dict(type='mmcls', task='Classification')

3. How to write backend config

The backend config is mainly used to specify the backend on which the model runs and to provide the information needed when the model runs on the backend, referring to ONNX Runtime, TensorRT, ncnn, PPLNN.

  • type: Model’s backend, including onnxruntime, ncnn, pplnn, tensorrt, openvino.

Example

backend_config = dict(
    type='tensorrt',
    common_config=dict(
        fp16_mode=False, max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 512, 1024],
                    opt_shape=[1, 3, 1024, 2048],
                    max_shape=[1, 3, 2048, 2048])))
    ])

4. A complete example of mmcls on TensorRT

Here we provide a complete deployment config from mmcls on TensorRT.


codebase_config = dict(type='mmcls', task='Classification')

backend_config = dict(
    type='tensorrt',
    common_config=dict(
        fp16_mode=False,
        max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 224, 224],
                    opt_shape=[4, 3, 224, 224],
                    max_shape=[64, 3, 224, 224])))])

onnx_config = dict(
    type='onnx',
    dynamic_axes={
        'input': {
            0: 'batch',
            2: 'height',
            3: 'width'
        },
        'output': {
            0: 'batch'
        }
    },
    export_params=True,
    keep_initializers_as_inputs=False,
    opset_version=11,
    save_file='end2end.onnx',
    input_names=['input'],
    output_names=['output'],
    input_shape=[224, 224])

5. The name rules of our deployment config

There is a specific naming convention for the filename of deployment config files.

(task name)_(backend name)_(dynamic or static).py
  • task name: Model’s task type.

  • backend name: Backend’s name. Note if you use the quantization function, you need to indicate the quantization type. Just like tensorrt-int8.

  • dynamic or static: Dynamic or static export. Note if the backend needs explicit shape information, you need to add a description of input size with height x width format. Just like dynamic-512x1024-2048x2048, it means that the min input shape is 512x1024 and the max input shape is 2048x2048.

Example

detection_tensorrt-int8_dynamic-320x320-1344x1344.py

6. How to write model config

Write the model config file according to the model's codebase. The model config file is used to initialize the model, referring to MMClassification, MMDetection, MMSegmentation, MMOCR, MMEditing.

How to evaluate model

After converting a PyTorch model to a backend model, you may evaluate backend models with tools/test.py

Prerequisite

Install MMDeploy according to get-started instructions. And convert the PyTorch model or ONNX model to the backend model by following the guide.

Usage

python tools/test.py \
${DEPLOY_CFG} \
${MODEL_CFG} \
--model ${BACKEND_MODEL_FILES} \
[--out ${OUTPUT_PKL_FILE}] \
[--format-only] \
[--metrics ${METRICS}] \
[--show] \
[--show-dir ${OUTPUT_IMAGE_DIR}] \
[--show-score-thr ${SHOW_SCORE_THR}] \
--device ${DEVICE} \
[--cfg-options ${CFG_OPTIONS}] \
[--metric-options ${METRIC_OPTIONS}] \
[--log2file work_dirs/output.txt] \
[--batch-size ${BATCH_SIZE}] \
[--speed-test] \
[--warmup ${WARM_UP}] \
[--log-interval ${LOG_INTERVAL}]

Description of all arguments

  • deploy_cfg: The config for deployment.

  • model_cfg: The config of the model in OpenMMLab codebases.

  • --model: The backend model file. For example, if we convert a model to TensorRT, we need to pass the model file with “.engine” suffix.

  • --out: The path to save output results in pickle format. (The results will be saved only if this argument is given)

  • --format-only: Whether to format the output results without evaluation. It is useful when you want to format the result into a specific format and submit it to the test server.

  • --metrics: The metrics to evaluate the model defined in OpenMMLab codebases. e.g. “segm”, “proposal” for COCO in mmdet, “precision”, “recall”, “f1_score”, “support” for single label dataset in mmcls.

  • --show: Whether to show the evaluation result on the screen.

  • --show-dir: The directory to save the evaluation result. (The results will be saved only if this argument is given)

  • --show-score-thr: The threshold determining whether to show detection bounding boxes.

  • --device: The device that the model runs on. Note that some backends restrict the device. For example, TensorRT must run on cuda.

  • --cfg-options: Extra or overridden settings that will be merged into the current deploy config.

  • --metric-options: Custom options for evaluation. The key-value pair in xxx=yyy format will be kwargs for dataset.evaluate() function.

  • --log2file: log evaluation results (and speed) to file.

  • --batch-size: the batch size for inference, which would override samples_per_gpu in data config. Default is 1. Note that not all models support batch_size>1.

  • --speed-test: Whether to activate speed test.

  • --warmup: Warm-up iterations before counting inference time; requires --speed-test to be set first.

  • --log-interval: The interval between each log; requires --speed-test to be set first.

  • --json-file: The path of json file to save evaluation results. Default is ./results.json.

* Other arguments in tools/test.py are used for the speed test. They are not related to evaluation.

Example

python tools/test.py \
    configs/mmcls/classification_onnxruntime_static.py \
    {MMCLS_DIR}/configs/resnet/resnet50_b32x8_imagenet.py \
    --model model.onnx \
    --out out.pkl \
    --device cpu \
    --speed-test

Note

  • The performance of each model in OpenMMLab codebases can be found in the document of each codebase.

Quantize model

Why quantization?

The fixed-point model has many advantages over the fp32 model:

  • Smaller size: an 8-bit model reduces the file size by 75%

  • Benefiting from the smaller model, the cache hit rate is improved and inference is faster

  • Chips tend to have corresponding fixed-point acceleration instructions, which are faster and consume less energy (int8 on a common CPU requires only about 10% of the energy)

APK file size and heat generation are key indicators when evaluating a mobile APP; on the server side, quantization means that you can increase the model size in exchange for precision while keeping the same QPS.

Post training quantization scheme

Taking ncnn backend as an example, the complete workflow is as follows:

mmdeploy generates a quantization table based on the static graph (onnx) and uses backend tools to convert the fp32 model to fixed point.

mmdeploy currently supports ncnn with PTQ.

How to convert model

After installing mmdeploy, install ppq:

git clone https://github.com/openppl-public/ppq.git
cd ppq
pip install -r requirements.txt
python3 setup.py install

Back in mmdeploy, enable quantization with the tools/deploy.py option --quant.

cd /path/to/mmdeploy

export MODEL_CONFIG=/home/rg/konghuanjun/mmclassification/configs/resnet/resnet18_8xb32_in1k.py
export MODEL_PATH=https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth

# get some imagenet sample images
git clone https://github.com/nihui/imagenet-sample-images --depth=1

# quantize
python3 tools/deploy.py  configs/mmcls/classification_ncnn-int8_static.py  ${MODEL_CONFIG}  ${MODEL_PATH}   /path/to/self-test.png   --work-dir work_dir --device cpu --quant --quant-image-dir /path/to/imagenet-sample-images
...

Description

| Parameter | Meaning |
| :--- | :--- |
| --quant | Enable quantization; the default value is False |
| --quant-image-dir | Calibration dataset; the Validation Set in MODEL_CONFIG is used by default |

Custom calibration dataset

The calibration set is used to calculate the quantization layer parameters. Some DFQ (Data Free Quantization) methods do not even require a dataset.

  • Create a folder and just put in some images (no directory structure, no negative examples, no special filename format needed)

  • The images should come from a real scenario; otherwise, the accuracy will drop

  • Do not quantize the model with the test dataset (see the sketch below for assembling a calibration folder)
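A minimal sketch for assembling such a calibration folder by sampling images from a directory of real-scenario data (paths and the sample count are illustrative; any flat folder of representative images works):

import random
import shutil
from pathlib import Path

src = Path('/path/to/real-scenario-images')   # images taken from the real scenario
dst = Path('calib-images')                    # flat folder passed to --quant-image-dir
dst.mkdir(exist_ok=True)

# copy a random subset of images; no directory structure or naming scheme is required
files = list(src.glob('*.jpg'))
for img in random.sample(files, k=min(200, len(files))):
    shutil.copy(img, dst / img.name)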

| Type | Train dataset | Validation dataset | Test dataset | Calibration dataset |
| :--- | :--- | :--- | :--- | :--- |
| Usage | QAT | PTQ | Test accuracy | PTQ |
It is highly recommended to verify model precision after quantization. Here are some quantization test results.

Useful Tools

Apart from deploy.py, there are other useful tools under the tools/ directory.

torch2onnx

This tool can be used to convert a PyTorch model from OpenMMLab to ONNX. A Python API sketch follows the argument list below.

Usage

python tools/torch2onnx.py \
    ${DEPLOY_CFG} \
    ${MODEL_CFG} \
    ${CHECKPOINT} \
    ${INPUT_IMG} \
    --work-dir ${WORK_DIR} \
    --device cpu \
    --log-level INFO

Description of all arguments

  • deploy_cfg : The path of the deploy config file in MMDeploy codebase.

  • model_cfg : The path of model config file in OpenMMLab codebase.

  • checkpoint : The path of the model checkpoint file.

  • img : The path of the image file used to convert the model.

  • --work-dir : The directory to save output ONNX models. Default is ./work-dir.

  • --device : The device used for conversion. If not specified, it will be set to cpu.

  • --log-level : Set the log level; options are 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.
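The same conversion is also exposed as a Python function. A minimal sketch, assuming mmdeploy.apis.torch2onnx keeps the argument layout used by this tool (check tools/torch2onnx.py in your version if it differs):

from mmdeploy.apis import torch2onnx

# arguments mirror the CLI above; all paths are illustrative
torch2onnx(
    img='mmdetection/demo/demo.jpg',
    work_dir='work-dir',
    save_file='end2end.onnx',
    deploy_cfg='mmdeploy/configs/mmdet/detection/detection_onnxruntime_dynamic.py',
    model_cfg='mmdetection/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py',
    model_checkpoint='checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth',
    device='cpu')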

extract

ONNX model with Mark nodes in it can be partitioned into multiple subgraphs. This tool can be used to extract the subgraph from the ONNX model.

Usage

python tools/extract.py \
    ${INPUT_MODEL} \
    ${OUTPUT_MODEL} \
    --start ${PARTITION_START} \
    --end ${PARTITION_END} \
    --log-level INFO

Description of all arguments

  • input_model : The path of input ONNX model. The output ONNX model will be extracted from this model.

  • output_model : The path of output ONNX model.

  • --start : The start point of extracted model with format <function_name>:<input/output>. The function_name comes from the decorator @mark.

  • --end : The end point of extracted model with format <function_name>:<input/output>. The function_name comes from the decorator @mark.

  • --log-level : Set the log level; options are 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.

Note

To support the model partition, you need to add Mark nodes in the ONNX model. The Mark node comes from the @mark decorator. For example, if we have marked the multiclass_nms as below, we can set end=multiclass_nms:input to extract the subgraph before NMS.

@mark('multiclass_nms', inputs=['boxes', 'scores'], outputs=['dets', 'labels'])
def multiclass_nms(*args, **kwargs):
    """Wrapper function for `_multiclass_nms`."""

onnx2pplnn

This tool helps to convert an ONNX model to a PPLNN model.

Usage

python tools/onnx2pplnn.py \
    ${ONNX_PATH} \
    ${OUTPUT_PATH} \
    --device cuda:0 \
    --opt-shapes [224,224] \
    --log-level INFO

Description of all arguments

  • onnx_path: The path of the ONNX model to convert.

  • output_path: The converted PPLNN algorithm path in json format.

  • device: The device of the model during conversion.

  • opt-shapes: Optimal shapes for PPLNN optimization. The shape of each tensor should be wrapped with "[]" or "()", and the shapes of tensors should be separated by ",".

  • --log-level: Set the log level; options are 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.

onnx2tensorrt

This tool can be used to convert an ONNX model to a TensorRT engine.

Usage

python tools/onnx2tensorrt.py \
    ${DEPLOY_CFG} \
    ${ONNX_PATH} \
    ${OUTPUT} \
    --device-id 0 \
    --log-level INFO \
    --calib-file /path/to/file

Description of all arguments

  • deploy_cfg : The path of the deploy config file in MMDeploy codebase.

  • onnx_path : The ONNX model path to convert.

  • output : The path of output TensorRT engine.

  • --device-id : The device index; defaults to 0.

  • --calib-file : The calibration data used to calibrate engine to int8.

  • --log-level : Set the log level; options are 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.

onnx2ncnn

This tool helps to convert an ONNX model to an ncnn model.

Usage

python tools/onnx2ncnn.py \
    ${ONNX_PATH} \
    ${NCNN_PARAM} \
    ${NCNN_BIN} \
    --log-level INFO

Description of all arguments

  • onnx_path : The path of the ONNX model to convert from.

  • output_param : The converted ncnn param path.

  • output_bin : The converted ncnn bin path.

  • --log-level : Set the log level; options are 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.

profiler

This tool helps to test latency of models with PyTorch, TensorRT and other backends. Note that the pre- and post-processing is excluded when computing inference latency.

Usage

python tools/profiler.py \
    ${DEPLOY_CFG} \
    ${MODEL_CFG} \
    ${IMAGE_DIR} \
    --model ${MODEL} \
    --device ${DEVICE} \
    --shape ${SHAPE} \
    --num-iter ${NUM_ITER} \
    --warmup ${WARMUP} \
    --cfg-options ${CFG_OPTIONS} \
    --batch-size ${BATCH_SIZE} \
    --img-ext ${IMG_EXT}

Description of all arguments

  • deploy_cfg : The path of the deploy config file in MMDeploy codebase.

  • model_cfg : The path of model config file in OpenMMLab codebase.

  • image_dir : The directory of image files used to test the model.

  • --model : The path of the model to be tested.

  • --shape : Input shape of the model by HxW, e.g., 800x1344. If not specified, it would use input_shape from deploy config.

  • --num-iter : Number of iterations to run inference. Default is 100.

  • --warmup : Number of iterations to warm up the machine. Default is 10.

  • --device : The device type. If not specified, it will be set to cuda:0.

  • --cfg-options : Optional key-value pairs to override in the model config.

  • --batch-size: the batch size for test inference. Default is 1. Note that not all models support batch_size>1.

  • --img-ext: the file extensions for input images from image_dir. Defaults to ['.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif'].

Example:

python tools/profiler.py \
    configs/mmcls/classification_tensorrt_dynamic-224x224-224x224.py \
    ../mmclassification/configs/resnet/resnet18_8xb32_in1k.py \
    ../mmclassification/demo/ \
    --model work-dirs/mmcls/resnet/trt/end2end.engine \
    --device cuda \
    --shape 224x224 \
    --num-iter 100 \
    --warmup 10 \
    --batch-size 1

And the output looks like this:

----- Settings:
+------------+---------+
| batch size |    1    |
|   shape    | 224x224 |
| iterations |   100   |
|   warmup   |    10   |
+------------+---------+
----- Results:
+--------+------------+---------+
| Stats  | Latency/ms |   FPS   |
+--------+------------+---------+
|  Mean  |   1.535    | 651.656 |
| Median |   1.665    | 600.569 |
|  Min   |   1.308    | 764.341 |
|  Max   |   1.689    | 591.983 |
+--------+------------+---------+

generate_md_table

This tool can be used to generate a markdown table of supported backends.

Usage

python tools/generate_md_table.py \
    ${yml_file} \
    ${output} \
    ${backends}

Description of all arguments

  • yml_file: input yml config path

  • output: output markdown file path

  • backends: The list of output backends. If not specified, it will be set to 'onnxruntime' 'tensorrt' 'torchscript' 'pplnn' 'openvino' 'ncnn'.

Example:

Generate backends markdown table from mmocr.yml

python tools/generate_md_table.py tests/regression/mmocr.yml tests/regression/mmocr.md onnxruntime tensorrt torchscript pplnn openvino ncnn

And the output looks like this:

| model | task | onnxruntime | tensorrt | torchscript | pplnn | openvino | ncnn |
| :--- | :--- | :-: | :-: | :-: | :-: | :-: | :-: |
| DBNet | TextDetection | Y | Y | Y | Y | Y | Y |
| CRNN | TextRecognition | Y | Y | Y | Y | N | Y |
| SAR | TextRecognition | Y | N | N | N | N | N |

Supported models

The table below lists the models that are guaranteed to be exportable to other backends.

| Model | Codebase | TorchScript | OnnxRuntime | TensorRT | ncnn | PPLNN | OpenVINO | Ascend | RKNN | Model config |
| :--- | :--- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| RetinaNet | MMDetection | Y | Y | Y | Y | Y | Y | Y | Y | config |
| Faster R-CNN | MMDetection | Y | Y | Y | Y | Y | Y | Y | N | config |
| YOLOv3 | MMDetection | Y | Y | Y | Y | N | Y | Y | Y | config |
| YOLOX | MMDetection | Y | Y | Y | Y | N | Y | N | Y | config |
| FCOS | MMDetection | Y | Y | Y | Y | N | Y | N | N | config |
| FSAF | MMDetection | Y | Y | Y | Y | Y | Y | N | Y | config |
| Mask R-CNN | MMDetection | Y | Y | Y | N | N | Y | N | N | config |
| SSD* | MMDetection | Y | Y | Y | Y | N | Y | N | Y | config |
| FoveaBox | MMDetection | Y | Y | N | N | N | Y | N | N | config |
| ATSS | MMDetection | N | Y | Y | N | N | Y | N | N | config |
| GFL | MMDetection | N | Y | Y | N | ? | Y | N | N | config |
| Cascade R-CNN | MMDetection | N | Y | Y | N | Y | Y | N | N | config |
| Cascade Mask R-CNN | MMDetection | N | Y | Y | N | N | Y | N | N | config |
| ConvNeXt | MMDetection | N | Y | Y | N | N | Y | N | N | config |
| Swin Transformer* | MMDetection | N | Y | Y | N | N | N | N | N | config |
| VFNet | MMDetection | N | N | N | N | N | Y | N | N | config |
| RepPoints | MMDetection | N | N | Y | N | ? | Y | N | N | config |
| DETR | MMDetection | N | Y | Y | N | ? | N | N | N | config |
| ResNet | MMClassification | Y | Y | Y | Y | Y | Y | Y | Y | config |
| ResNeXt | MMClassification | Y | Y | Y | Y | Y | Y | Y | Y | config |
| SE-ResNet | MMClassification | Y | Y | Y | Y | Y | Y | Y | Y | config |
| MobileNetV2 | MMClassification | Y | Y | Y | Y | Y | Y | Y | Y | config |
| ShuffleNetV1 | MMClassification | Y | Y | Y | Y | Y | Y | Y | Y | config |
| ShuffleNetV2 | MMClassification | Y | Y | Y | Y | Y | Y | Y | Y | config |
| VisionTransformer | MMClassification | Y | Y | Y | Y | ? | Y | Y | N | config |
| SwinTransformer | MMClassification | Y | Y | Y | N | ? | N | ? | N | config |
| FCN | MMSegmentation | Y | Y | Y | Y | Y | Y | Y | Y | config |
| PSPNet*static | MMSegmentation | Y | Y | Y | Y | Y | Y | Y | Y | config |
| DeepLabV3 | MMSegmentation | Y | Y | Y | Y | Y | Y | Y | N | config |
| DeepLabV3+ | MMSegmentation | Y | Y | Y | Y | Y | Y | Y | N | config |
| Fast-SCNN*static | MMSegmentation | Y | Y | Y | N | Y | Y | N | Y | config |
| UNet | MMSegmentation | Y | Y | Y | Y | Y | Y | Y | Y | config |
| ANN* | MMSegmentation | Y | Y | Y | N | N | N | N | N | config |
| APCNet | MMSegmentation | Y | Y | Y | Y | N | N | N | Y | config |
| BiSeNetV1 | MMSegmentation | Y | Y | Y | Y | N | Y | N | Y | config |
| BiSeNetV2 | MMSegmentation | Y | Y | Y | Y | N | Y | N | N | config |
| CGNet | MMSegmentation | Y | Y | Y | Y | N | Y | N | Y | config |
| DMNet | MMSegmentation | ? | Y | N | N | N | N | N | N | config |
| DNLNet | MMSegmentation | ? | Y | Y | Y | N | Y | N | N | config |
| EMANet | MMSegmentation | Y | Y | Y | N | N | Y | N | N | config |
| EncNet | MMSegmentation | Y | Y | Y | N | N | Y | N | N | config |
| ERFNet | MMSegmentation | Y | Y | Y | Y | N | Y | N | Y | config |
| FastFCN | MMSegmentation | Y | Y | Y | Y | N | Y | N | N | config |
| GCNet | MMSegmentation | Y | Y | Y | N | N | N | N | N | config |
| ICNet* | MMSegmentation | Y | Y | Y | N | N | Y | N | N | config |
| ISANet*static | MMSegmentation | N | Y | Y | N | N | Y | N | Y | config |
| NonLocal Net | MMSegmentation | ? | Y | Y | Y | N | Y | N | N | config |
| OCRNet | MMSegmentation | ? | Y | Y | Y | N | Y | N | Y | config |
| PointRend | MMSegmentation | Y | Y | Y | N | N | Y | N | N | config |
| Semantic FPN | MMSegmentation | Y | Y | Y | Y | N | Y | N | Y | config |
| STDC | MMSegmentation | Y | Y | Y | Y | N | Y | N | Y | config |
| UPerNet* | MMSegmentation | ? | Y | Y | N | N | N | N | Y | config |
| DANet | MMSegmentation | ? | Y | Y | N | N | N | N | N | config |
| Segmenter*static | MMSegmentation | Y | Y | Y | Y | N | Y | N | N | config |
| SRCNN | MMEditing | Y | Y | Y | Y | Y | Y | N | N | config |
| ESRGAN | MMEditing | Y | Y | Y | Y | Y | Y | N | N | config |
| SRGAN | MMEditing | Y | Y | Y | Y | Y | Y | N | N | config |
| SRResNet | MMEditing | Y | Y | Y | Y | Y | Y | N | N | config |
| Real-ESRGAN | MMEditing | Y | Y | Y | Y | Y | Y | N | N | config |
| EDSR | MMEditing | Y | Y | Y | Y | N | Y | N | N | config |
| RDN | MMEditing | Y | Y | Y | Y | Y | Y | N | N | config |
| DBNet | MMOCR | Y | Y | Y | Y | Y | Y | Y | N | config |
| PANet | MMOCR | Y | Y | Y | Y | ? | Y | Y | N | config |
| PSENet | MMOCR | Y | Y | Y | Y | ? | Y | Y | N | config |
| CRNN | MMOCR | Y | Y | Y | Y | Y | N | N | N | config |
| SAR | MMOCR | N | Y | N | N | N | N | N | N | config |
| SATRN | MMOCR | Y | Y | Y | N | N | N | N | N | config |
| HRNet | MMPose | N | Y | Y | Y | N | Y | N | N | config |
| MSPN | MMPose | N | Y | Y | Y | N | Y | N | N | config |
| LiteHRNet | MMPose | N | Y | Y | N | N | Y | N | N | config |
| Hourglass | MMPose | N | Y | Y | Y | N | Y | N | N | config |
| ViPNAS | MMPose | ? | ? | ? | Y | ? | ? | ? | ? | config |
| PointPillars | MMDetection3d | ? | Y | Y | N | N | Y | N | N | config |
| CenterPoint (pillar) | MMDetection3d | ? | Y | Y | N | N | Y | N | N | config |
| Fcos3d | MMDetection3d | ? | Y | Y | N | N | N | N | N | config |
| RotatedRetinaNet | RotatedDetection | N | Y | Y | N | N | N | N | N | config |
| Oriented RCNN | RotatedDetection | N | Y | Y | N | N | N | N | N | config |
| Gliding Vertex | RotatedDetection | N | N | Y | N | N | N | N | N | config |
| TSN | MMAction2 | N | Y | Y | N | N | N | N | N | config |
| SlowFast | MMAction2 | N | Y | Y | N | N | N | N | N | config |

Note

  • Tag:

    • static: This model only supports static export. Please use a static deploy config, just like $MMDEPLOY_DIR/configs/mmseg/segmentation_tensorrt_static-1024x2048.py.

  • SSD: When you convert the SSD model, you need to use a deploy config with the smaller shape range 300x300-512x512 rather than 320x320-1344x1344, for example $MMDEPLOY_DIR/configs/mmdet/detection/detection_tensorrt_dynamic-300x300-512x512.py.

  • YOLOX: YOLOX with ncnn only supports static shape.

  • Swin Transformer: For TensorRT, only version 8.4+ is supported.

  • SAR: The Chinese text recognition model is not supported because the protobuf size of ONNX is limited.

Benchmark

Backends

CPU: ncnn, ONNXRuntime, OpenVINO

GPU: ncnn, TensorRT, PPLNN

Latency benchmark

Platform

  • Ubuntu 18.04

  • ncnn 20211208

  • Cuda 11.3

  • TensorRT 7.2.3.4

  • Docker 20.10.8

  • NVIDIA Tesla T4 Tensor Core GPU for TensorRT

Other settings

  • Static graph

  • Batch size 1

  • Synchronize devices after each inference.

  • We count the average inference performance of 100 images of the dataset.

  • Warm up. For ncnn, we warm up 30 iters for all codebases. As for other backends: for classification, we warm up 1010 iters; for other codebases, we warm up 10 iters.

  • Input resolution varies for different datasets of different codebases. All inputs are real images except for mmediting because the dataset is not large enough.

Users can directly test the speed through model profiling. And here is the benchmark in our environment.

mmcls TensorRT(ms) PPLNN(ms) ncnn(ms) Ascend(ms)
model spatial T4 JetsonNano2GB Jetson TX2 T4 SnapDragon888 Adreno660 Ascend310
fp32 fp16 int8 fp32 fp16 fp32 fp16 fp32 fp32 fp32
ResNet 224x224 2.97 1.26 1.21 59.32 30.54 24.13 1.30 33.91 25.93 2.49
ResNeXt 224x224 4.31 1.42 1.37 88.10 49.18 37.45 1.36 133.44 69.38 -
SE-ResNet 224x224 3.41 1.66 1.51 74.59 48.78 29.62 1.91 107.84 80.85 -
ShuffleNetV2 224x224 1.37 1.19 1.13 15.26 10.23 7.37 4.69 9.55 10.66 -
mmdet part1 TensorRT(ms) PPLNN(ms)
model spatial T4 Jetson TX2 T4
fp32 fp16 int8 fp32 fp16
YOLOv3 320x320 14.76 24.92 24.92 - 18.07
SSD-Lite 320x320 8.84 9.21 8.04 1.28 19.72
RetinaNet 800x1344 97.09 25.79 16.88 780.48 38.34
FCOS 800x1344 84.06 23.15 17.68 - -
FSAF 800x1344 82.96 21.02 13.50 - 30.41
Faster R-CNN 800x1344 88.08 26.52 19.14 733.81 65.40
Mask R-CNN 800x1344 104.83 58.27 - - 86.80
mmdet part2 ncnn
model spatial SnapDragon888 Adreno660
fp32 fp32
MobileNetv2-YOLOv3 320x320 48.57 66.55
SSD-Lite 320x320 44.91 66.19
YOLOX 416x416 111.60 134.50
mmedit TensorRT(ms) PPLNN(ms)
model spatial T4 Jetson TX2 T4
fp32 fp16 int8 fp32 fp16
ESRGAN 32x32 12.64 12.42 12.45 - 7.67
SRCNN 32x32 0.70 0.35 0.26 58.86 0.56
mmocr TensorRT(ms) PPLNN(ms) ncnn(ms)
model spatial T4 T4 SnapDragon888 Adreno660
fp32 fp16 int8 fp16 fp32 fp32
DBNet 640x640 10.70 5.62 5.00 34.84 - -
CRNN 32x32 1.93 1.40 1.36 - 10.57 20.00
mmseg TensorRT(ms) PPLNN(ms)
model spatial T4 Jetson TX2 T4
fp32 fp16 int8 fp32 fp16
FCN 512x1024 128.42 23.97 18.13 1682.54 27.00
PSPNet 1x3x512x1024 119.77 24.10 16.33 1586.19 27.26
DeepLabV3 512x1024 226.75 31.80 19.85 - 36.01
DeepLabV3+ 512x1024 151.25 47.03 50.38 2534.96 34.80

Performance benchmark

Users can directly test the performance through how_to_evaluate_a_model.md. And here is the benchmark in our environment.

mmcls PyTorch TorchScript ONNX Runtime TensorRT PPLNN Ascend
model metric fp32 fp32 fp32 fp32 fp16 int8 fp16 fp32
ResNet-18 top-1 69.90 69.90 69.88 69.88 69.86 69.86 69.86 69.91
top-5 89.43 89.43 89.34 89.34 89.33 89.38 89.34 89.43
ResNeXt-50 top-1 77.90 77.90 77.90 77.90 - 77.78 77.89 -
top-5 93.66 93.66 93.66 93.66 - 93.64 93.65 -
SE-ResNet-50 top-1 77.74 77.74 77.74 77.74 77.75 77.63 77.73 -
top-5 93.84 93.84 93.84 93.84 93.83 93.72 93.84 -
ShuffleNetV1 1.0x top-1 68.13 68.13 68.13 68.13 68.13 67.71 68.11 -
top-5 87.81 87.81 87.81 87.81 87.81 87.58 87.80 -
ShuffleNetV2 1.0x top-1 69.55 69.55 69.55 69.55 69.54 69.10 69.54 -
top-5 88.92 88.92 88.92 88.92 88.91 88.58 88.92 -
MobileNet V2 top-1 71.86 71.86 71.86 71.86 71.87 70.91 71.84 71.87
top-5 90.42 90.42 90.42 90.42 90.40 89.85 90.41 90.42
Vision Transformer top-1 85.43 85.43 - 85.43 85.42 - - 85.43
top-5 97.77 97.77 - 97.77 97.76 - - 97.77
Swin Transformer top-1 81.18 81.18 81.18 81.18 81.18 - -
top-5 95.61 95.61 95.61 95.61 95.61 - -
mmdet Pytorch TorchScript ONNXRuntime TensorRT PPLNN Ascend
model task dataset metric fp32 fp32 fp32 fp32 fp16 int8 fp16 fp32
YOLOV3 Object Detection COCO2017 box AP 33.7 33.7 - 33.5 33.5 33.5 - -
SSD Object Detection COCO2017 box AP 25.5 25.5 - 25.5 25.5 - - -
RetinaNet Object Detection COCO2017 box AP 36.5 36.4 - 36.4 36.4 36.3 36.5 36.4
FCOS Object Detection COCO2017 box AP 36.6 - - 36.6 36.5 - - -
FSAF Object Detection COCO2017 box AP 37.4 37.4 - 37.4 37.4 37.2 37.4 -
YOLOX Object Detection COCO2017 box AP 40.5 40.3 - 40.3 40.3 29.3 - -
Faster R-CNN Object Detection COCO2017 box AP 37.4 37.3 - 37.3 37.3 37.1 37.3 37.2
ATSS Object Detection COCO2017 box AP 39.4 - - 39.4 39.4 - - -
Cascade R-CNN Object Detection COCO2017 box AP 40.4 - - 40.4 40.4 - 40.4 -
GFL Object Detection COCO2017 box AP 40.2 - 40.2 40.2 40.0 - - -
RepPoints Object Detection COCO2017 box AP 37.0 - - 36.9 - - - -
DETR Object Detection COCO2017 box AP 40.1 40.1 - 40.1 40.1 - -
Mask R-CNN Instance Segmentation COCO2017 box AP 38.2 38.1 - 38.1 38.1 - 38.0 -
mask AP 34.7 34.7 - 33.7 33.7 - - -
Swin-Transformer Instance Segmentation COCO2017 box AP 42.7 - 42.7 42.5 37.7 - - -
mask AP 39.3 - 39.3 39.3 35.4 - - -
mmedit Pytorch TorchScript ONNX Runtime TensorRT PPLNN NCNN
model task dataset metric fp32 fp32 fp32 fp32 fp16 int8 fp16 fp32 int8
SRCNN Super Resolution Set5 PSNR 28.4316 28.4120 28.4323 28.4323 28.4286 28.1995 28.4311 - -
SSIM 0.8099 0.8106 0.8097 0.8097 0.8096 0.7934 0.8096 - -
ESRGAN Super Resolution Set5 PSNR 28.2700 28.2619 28.2592 28.2592 - - 28.2624 - -
SSIM 0.7778 0.7784 0.7764 0.7774 - - 0.7765 - -
ESRGAN-PSNR Super Resolution Set5 PSNR 30.6428 30.6306 30.6444 30.6430 - - 27.0426 - -
SSIM 0.8559 0.8565 0.8558 0.8558 - - 0.8557 - -
SRGAN Super Resolution Set5 PSNR 27.9499 27.9252 27.9408 27.9408 - - 27.9388 - -
SSIM 0.7846 0.7851 0.7839 0.7839 - - 0.7839 - -
SRResNet Super Resolution Set5 PSNR 30.2252 30.2069 30.2300 30.2300 - - 30.2294 - -
SSIM 0.8491 0.8497 0.8488 0.8488 - - 0.8488 - -
Real-ESRNet Super Resolution Set5 PSNR 28.0297 - 27.7016 27.7016 - - 27.7049 - -
SSIM 0.8236 - 0.8122 0.8122 - - 0.8123 - -
EDSRx4 Super Resolution Set5 PSNR 30.2223 30.2192 30.2214 30.2214 30.2211 30.1383 - 30.2194 29.9340
SSIM 0.8500 0.8507 0.8497 0.8497 0.8497 0.8469 - 0.8498 0.8409
EDSRx2 Super Resolution Set5 PSNR 35.7592 - - - - - - 35.7733 35.4266
SSIM 0.9372 - - - - - - 0.9365 0.9334
mmocr Pytorch TorchScript ONNXRuntime TensorRT PPLNN OpenVINO
model task dataset metric fp32 fp32 fp32 fp32 fp16 int8 fp16 fp32
DBNet* TextDetection ICDAR2015 recall 0.7310 0.7308 0.7304 0.7198 0.7179 0.7111 0.7304 0.7309
precision 0.8714 0.8718 0.8714 0.8677 0.8674 0.8688 0.8718 0.8714
hmean 0.7950 0.7949 0.7950 0.7868 0.7856 0.7821 0.7949 0.7950
PSENet TextDetection ICDAR2015 recall 0.7526 0.7526 0.7526 0.7526 0.7520 0.7496 - 0.7526
precision 0.8669 0.8669 0.8669 0.8669 0.8668 0.8550 - 0.8669
hmean 0.8057 0.8057 0.8057 0.8057 0.8054 0.7989 - 0.8057
PANet TextDetection ICDAR2015 recall 0.7401 0.7401 0.7401 0.7357 0.7366 - - 0.7401
precision 0.8601 0.8601 0.8601 0.8570 0.8586 - - 0.8601
hmean 0.7955 0.7955 0.7955 0.7917 0.7930 - - 0.7955
CRNN TextRecognition IIIT5K acc 0.8067 0.8067 0.8067 0.8067 0.8063 0.8067 0.8067 -
SAR TextRecognition IIIT5K acc 0.9517 - 0.9287 - - - - -
SATRN TextRecognition IIIT5K acc 0.9470 0.9487 0.9487 0.9487 0.9483 0.9483 - -
mmseg Pytorch TorchScript ONNXRuntime TensorRT PPLNN Ascend
model dataset metric fp32 fp32 fp32 fp32 fp16 int8 fp16 fp32
FCN Cityscapes mIoU 72.25 72.36 - 72.36 72.35 74.19 72.35 72.35
PSPNet Cityscapes mIoU 78.55 78.66 - 78.26 78.24 77.97 78.09 78.67
deeplabv3 Cityscapes mIoU 79.09 79.12 - 79.12 79.12 78.96 79.12 79.06
deeplabv3+ Cityscapes mIoU 79.61 79.60 - 79.60 79.60 79.43 79.60 79.51
Fast-SCNN Cityscapes mIoU 70.96 70.96 - 70.93 70.92 66.00 70.92 -
UNet Cityscapes mIoU 69.10 - - 69.10 69.10 68.95 - -
ANN Cityscapes mIoU 77.40 - - 77.32 77.32 - - -
APCNet Cityscapes mIoU 77.40 - - 77.32 77.32 - - -
BiSeNetV1 Cityscapes mIoU 74.44 - - 74.44 74.43 - - -
BiSeNetV2 Cityscapes mIoU 73.21 - - 73.21 73.21 - - -
CGNet Cityscapes mIoU 68.25 - - 68.27 68.27 - - -
EMANet Cityscapes mIoU 77.59 - - 77.59 77.6 - - -
EncNet Cityscapes mIoU 75.67 - - 75.66 75.66 - - -
ERFNet Cityscapes mIoU 71.08 - - 71.08 71.07 - - -
FastFCN Cityscapes mIoU 79.12 - - 79.12 79.12 - - -
GCNet Cityscapes mIoU 77.69 - - 77.69 77.69 - - -
ICNet Cityscapes mIoU 76.29 - - 76.36 76.36 - - -
ISANet Cityscapes mIoU 78.49 - - 78.49 78.49 - - -
OCRNet Cityscapes mIoU 74.30 - - 73.66 73.67 - - -
PointRend Cityscapes mIoU 76.47 76.47 - 76.41 76.42 - - -
Semantic FPN Cityscapes mIoU 74.52 - - 74.52 74.52 - - -
STDC Cityscapes mIoU 75.10 - - 75.10 75.10 - - -
STDC Cityscapes mIoU 77.17 - - 77.17 77.17 - - -
UPerNet Cityscapes mIoU 77.10 - - 77.19 77.18 - - -
Segmenter ADE20K mIoU 44.32 44.29 44.29 44.29 43.34 43.35 - -
mmpose Pytorch ONNXRuntime TensorRT PPLNN OpenVINO
model task dataset metric fp32 fp32 fp32 fp16 fp16 fp32
HRNet Pose Detection COCO AP 0.748 0.748 0.748 0.748 - 0.748
AR 0.802 0.802 0.802 0.802 - 0.802
LiteHRNet Pose Detection COCO AP 0.663 0.663 0.663 - - 0.663
AR 0.728 0.728 0.728 - - 0.728
MSPN Pose Detection COCO AP 0.762 0.762 0.762 0.762 - 0.762
AR 0.825 0.825 0.825 0.825 - 0.825
Hourglass Pose Detection COCO AP 0.717 0.717 0.717 0.717 - 0.717
AR 0.774 0.774 0.774 0.774 - 0.774
mmrotate Pytorch ONNXRuntime TensorRT PPLNN OpenVINO
model task dataset metrics fp32 fp32 fp32 fp16 fp16 fp32
RotatedRetinaNet Rotated Detection DOTA-v1.0 mAP 0.698 0.698 0.698 0.697 - -
Oriented RCNN Rotated Detection DOTA-v1.0 mAP 0.756 0.756 0.758 0.730 - -
GlidingVertex Rotated Detection DOTA-v1.0 mAP 0.732 - 0.733 0.731 - -
RoI Transformer Rotated Detection DOTA-v1.0 mAP 0.761 - 0.758 - - -
mmaction2 Pytorch ONNXRuntime TensorRT PPLNN OpenVINO
model task dataset metrics fp32 fp32 fp32 fp16 fp16 fp32
TSN Recognition Kinetics-400 top-1 69.71 - 69.71 - - -
top-5 88.75 - 88.75 - - -
SlowFast Recognition Kinetics-400 top-1 74.45 - 75.62 - - -
top-5 91.55 - 92.10 - - -

Notes

  • Some datasets contain images with various resolutions in codebases like MMDet, so the speed benchmark is obtained through static configs in MMDeploy, while the performance benchmark is obtained through dynamic ones.

  • Some int8 performance benchmarks of TensorRT require NVIDIA cards with Tensor Cores; otherwise the performance would drop heavily.

  • DBNet uses the interpolate mode nearest in the neck of the model, for which TensorRT-7 applies a quite different strategy from PyTorch. To make the repository compatible with TensorRT-7, we rewrite the neck to use the interpolate mode bilinear, which improves the final detection performance. To get performance matched with PyTorch, TensorRT-8+ is recommended, in which the interpolate methods are all the same as PyTorch's.

  • The mask AP of Mask R-CNN drops by 1% for the backends. The main reason is that the predicted masks are directly interpolated to the original image in PyTorch, while in other backends they are first interpolated to the preprocessed input image of the model and then to the original image.

  • MMPose models are tested with flip_test explicitly set to False in model configs.

  • Some models might get low accuracy in fp16 mode. Please adjust the model to avoid value overflow.

Test on embedded device

Here are the test conclusions of our edge devices. You can directly obtain the results of your own environment with model profiling.

Software and hardware environment

  • host OS ubuntu 18.04

  • backend SNPE-1.59

  • device Mi11 (qcom 888)

mmcls

| model | dataset | spatial | fp32 top-1 (%) | snpe gpu hybrid fp32 top-1 (%) | latency (ms) |
| :--- | :--- | :--- | :--- | :--- | :--- |
| ShuffleNetV2 | ImageNet-1k | 224x224 | 69.55 | 69.83* | 20±7 |
| MobilenetV2 | ImageNet-1k | 224x224 | 71.86 | 72.14* | 15±6 |

tips:

  1. The ImageNet-1k dataset is too large to test, so only part of the dataset is used (8000/50000)

  2. Device heating downgrades the clock frequency, so the measured time actually fluctuates. The values above are the stable ones after running for a period of time, which are closer to actual usage.

mmocr detection

model dataset spatial fp32 hmean snpe gpu hybrid hmean latency(ms)
PANet ICDAR2015 1312x736 0.795 0.785 @thr=0.9 3100±100

mmpose

model dataset spatial snpe hybrid AR@IoU=0.50 snpe hybrid AP@IoU=0.50 latency(ms)
pose_hrnet_w32 Animalpose 256x256 0.997 0.989 630±50

tips:

  • pose_hrnet is tested on AnimalPose’s test dataset instead of the val dataset.

mmseg

model dataset spatial mIoU latency(ms)
fcn Cityscapes 512x1024 71.11 4915±500

tips:

  • fcn works fine at 512x1024. The original Cityscapes resolution of 1024x2048 causes the device to reboot.

Notes

  • We need to manually split the mmdet model into two parts, because

    • In the snpe source code, onnx_to_ir.py can only parse onnx input, while ir_to_dlc.py does not support the topk operator

    • UDO (User Defined Operator) does not work with snpe-onnx-to-dlc

  • mmedit model

    • srcnn requires bicubic resize, which snpe does not support

    • esrgan converts fine, but loading the model causes the device to reboot

  • mmrotate depends on e2cnn; its Python 3.6 compatible branch needs to be installed manually

Test on TVM

Supported Models

Model Codebase Model config
RetinaNet MMDetection config
Faster R-CNN MMDetection config
YOLOv3 MMDetection config
YOLOX MMDetection config
Mask R-CNN MMDetection config
SSD MMDetection config
ResNet MMClassification config
ResNeXt MMClassification config
SE-ResNet MMClassification config
MobileNetV2 MMClassification config
ShuffleNetV1 MMClassification config
ShuffleNetV2 MMClassification config
VisionTransformer MMClassification config
FCN MMSegmentation config
PSPNet MMSegmentation config
DeepLabV3 MMSegmentation config
DeepLabV3+ MMSegmentation config
UNet MMSegmentation config

The table above lists the models that we have tested. Models not listed in the table might still be convertible. Please have a try.

Test

  • Ubuntu 20.04

  • tvm 0.9.0

mmcls metric PyTorch TVM
ResNet-18 top-1 69.90 69.90
ResNeXt-50 top-1 77.90 77.90
ShuffleNet V2 top-1 69.55 69.55
MobileNet V2 top-1 71.86 71.86
mmdet(*) metric PyTorch TVM
SSD box AP 25.5 25.5

*: We only test SSD since dynamic shape is not supported for now.

mmseg metric PyTorch TVM
FCN mIoU 72.25 72.36
PSPNet mIoU 78.55 77.90

Quantization test result

Currently mmdeploy supports ncnn quantization.

Quantize with ncnn

mmcls

model dataset fp32 top-1 (%) int8 top-1 (%)
ResNet-18 Cifar10 94.82 94.83
ResNeXt-32x4d-50 ImageNet-1k 77.90 78.20*
MobileNet V2 ImageNet-1k 71.86 71.43*
HRNet-W18* ImageNet-1k 76.75 76.25*

Note:

  • Because the ImageNet-1k dataset is large and ncnn has not released a Vulkan int8 version, only part of the test set (4000/50000) is used.

  • Accuracy varies after quantization; an increase of less than 1% for classification models is normal.

OCR detection

model dataset fp32 hmean int8 hmean
PANet ICDAR2015 0.795 0.792 @thr=0.9
TextSnake CTW1500 0.817 0.818

Note: mmocr uses shapely to compute IoU, which results in a slight difference in accuracy.

Pose detection

model dataset fp32 AP int8 AP
Hourglass COCO2017 0.717 0.713
S-ViPNAS-MobileNetV3 COCO2017 0.687 0.683
S-ViPNAS-Res50 COCO2017 0.701 0.696
S-ViPNAS-MobileNetV3 COCO Wholebody 0.459 0.445
S-ViPNAS-Res50 COCO Wholebody 0.484 0.476
S-ViPNAS-MobileNetV3_dark COCO Wholebody 0.499 0.481
S-ViPNAS-Res50_dark COCO Wholebody 0.520 0.511

Note: MMPose models are tested with flip_test explicitly set to False in model configs.

Super Resolution

model dataset fp32 PSNR/SSIM int8 PSNR/SSIM
EDSRx2 Set5 35.7733/0.9365 35.4266/0.9334
EDSRx4 Set5 30.2194/0.8498 29.9340/0.8409

mmseg

model dataset fp32 mIoU int8 mIoU
Fast-SCNN cityscapes 70.96 70.24

Note:

  • Int8 models of Fast-SCNN require ncnnoptimize.

  • ncnn extracts 512 images from the training set as the calibration dataset.

MMClassification Support

MMClassification is an open-source image classification toolbox based on PyTorch. It is a part of the OpenMMLab project.

MMClassification installation tutorial

Please refer to install.md for installation.

List of MMClassification models supported by MMDeploy

Model TorchScript ONNX Runtime TensorRT ncnn PPLNN OpenVINO Model config
ResNet Y Y Y Y Y Y config
ResNeXt Y Y Y Y Y Y config
SE-ResNet Y Y Y Y Y Y config
MobileNetV2 Y Y Y Y Y Y config
ShuffleNetV1 Y Y Y Y Y Y config
ShuffleNetV2 Y Y Y Y Y Y config
VisionTransformer Y Y Y Y ? Y config
SwinTransformer Y Y Y N ? N config

MMDetection Support

MMDetection is an open source object detection toolbox based on PyTorch. It is a part of the OpenMMLab project.

MMDetection installation tutorial

Please refer to get_started.md for installation.

List of MMDetection models supported by MMDeploy

Model Task OnnxRuntime TensorRT ncnn PPLNN OpenVINO Model config
ATSS ObjectDetection Y Y N N Y config
FCOS ObjectDetection Y Y Y N Y config
FoveaBox ObjectDetection Y N N N Y config
FSAF ObjectDetection Y Y Y Y Y config
RetinaNet ObjectDetection Y Y Y Y Y config
SSD ObjectDetection Y Y Y N Y config
VFNet ObjectDetection N N N N Y config
YOLOv3 ObjectDetection Y Y Y N Y config
YOLOX ObjectDetection Y Y Y N Y config
Cascade R-CNN ObjectDetection Y Y N Y Y config
Faster R-CNN ObjectDetection Y Y Y Y Y config
Faster R-CNN + DCN ObjectDetection Y Y Y Y Y config
GFL ObjectDetection Y Y N ? Y config
RepPoints ObjectDetection N Y N ? Y config
DETR ObjectDetection Y Y N ? Y config
Cascade Mask R-CNN InstanceSegmentation Y N N N Y config
Mask R-CNN InstanceSegmentation Y Y N N Y config
Swin Transformer InstanceSegmentation Y Y N N N config

MMSegmentation Support

MMSegmentation is an open source semantic segmentation toolbox based on PyTorch. It is a part of the OpenMMLab project.

MMSegmentation installation tutorial

Please refer to get_started.md for installation.

List of MMSegmentation models supported by MMDeploy

Model OnnxRuntime TensorRT ncnn PPLNN OpenVino Model config
FCN Y Y Y Y Y config
PSPNet* Y Y Y Y Y config
DeepLabV3 Y Y Y Y Y config
DeepLabV3+ Y Y Y Y Y config
Fast-SCNN* Y Y N Y Y config
UNet Y Y Y Y Y config
ANN* Y Y N N N config
APCNet Y Y Y N N config
BiSeNetV1 Y Y Y N Y config
BiSeNetV2 Y Y Y N Y config
CGNet Y Y Y N Y config
DMNet Y N N N N config
DNLNet Y Y Y N Y config
EMANet Y Y N N Y config
EncNet Y Y N N Y config
ERFNet Y Y Y N Y config
FastFCN Y Y Y N Y config
GCNet Y Y N N N config
ICNet* Y Y N N Y config
ISANet* Y Y N N Y config
NonLocal Net Y Y Y N Y config
OCRNet Y Y Y N Y config
PointRend* Y Y N N N config
Semantic FPN Y Y Y N Y config
STDC Y Y Y N Y config
UPerNet* Y Y N N N config
DANet Y Y N N Y config
Segmenter* Y Y Y N Y config
SegFormer* Y Y N N Y config
SETR Y N N N Y config
CCNet N N N N N config
PSANet N N N N N config
DPT N N N N N config

Reminder

  • Only whole inference mode is supported for all mmseg models.

  • PSPNet and Fast-SCNN only support static shape, because nn.AdaptiveAvgPool2d is not supported dynamically by most backends.

  • For models only supporting static shape, you should use the deployment config file of static shape such as configs/mmseg/segmentation_tensorrt_static-1024x2048.py.

  • If you prefer the deployed model to generate a probability feature map, set codebase_config = dict(with_argmax=False) in the deploy config, as in the sketch below.
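
A minimal sketch of such a deploy config is shown below. Only the with_argmax field comes from the note above; the _base_ files and the input shape are illustrative assumptions and may differ in your MMDeploy version.

# minimal sketch of a static-shape mmseg deploy config keeping the probability map
_base_ = ['./segmentation_static.py', '../_base_/backends/onnxruntime.py']  # assumed base configs

onnx_config = dict(input_shape=[2048, 1024])   # static shape, illustrative value
codebase_config = dict(with_argmax=False)      # output the probability feature map instead of argmax labels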

MMEditing Support

MMEditing is an open-source image and video editing toolbox based on PyTorch. It is a part of the OpenMMLab project.

MMEditing installation tutorial

Please refer to official installation guide to install the codebase.

MMEditing models support

Model Task ONNX Runtime TensorRT ncnn PPLNN OpenVINO Model config
SRCNN super-resolution Y Y Y Y Y config
ESRGAN super-resolution Y Y Y Y Y config
ESRGAN-PSNR super-resolution Y Y Y Y Y config
SRGAN super-resolution Y Y Y Y Y config
SRResNet super-resolution Y Y Y Y Y config
Real-ESRGAN super-resolution Y Y Y Y Y config
EDSR super-resolution Y Y Y N Y config
RDN super-resolution Y Y Y Y Y config
Global&Local* inpainting Y Y N N N config
DeepFillv1* inpainting Y Y N N N config
PConv* inpainting Y Y N N N config
DeepFillv2* inpainting Y Y N N N config
AOT-GAN* inpainting Y Y N N N config
  1. We skipped quantitative evaluation for image inpainting due to the high computational cost required for testing.

MMOCR Support

MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the corresponding downstream tasks including key information extraction. It is a part of the OpenMMLab project.

MMOCR installation tutorial

Please refer to install.md for installation.

List of MMOCR models supported by MMDeploy

Model Task TorchScript OnnxRuntime TensorRT ncnn PPLNN OpenVINO Model config
DBNet text-detection Y Y Y Y Y Y config
PSENet text-detection Y Y Y Y N Y config
PANet text-detection Y Y Y Y N Y config
CRNN text-recognition Y Y Y Y Y N config
SAR text-recognition N Y N N N N config
SATRN text-recognition Y Y Y N N N config

Reminder

Note that ncnn, pplnn, and OpenVINO only support the configs of DBNet18 for DBNet.

For CRNN models with TensorRT-int8 backend, we recommend TensorRT 7.2.3.4 and CUDA 10.2.

For PANet with the checkpoint pretrained on the ICDAR dataset, if you want to convert the model to TensorRT with 16-bit floating point, please try the following script.

# Copyright (c) OpenMMLab. All rights reserved.
from typing import Sequence

import torch
import torch.nn.functional as F

from mmdeploy.core import FUNCTION_REWRITER
from mmdeploy.utils.constants import Backend

FACTOR = 32
ENABLE = False
CHANNEL_THRESH = 400


@FUNCTION_REWRITER.register_rewriter(
    func_name='mmocr.models.textdet.necks.FPEM_FFM.forward',
    backend=Backend.TENSORRT.value)
def fpem_ffm__forward__trt(ctx, self, x: Sequence[torch.Tensor], *args,
                           **kwargs) -> Sequence[torch.Tensor]:
    """Rewrite `forward` of FPEM_FFM for tensorrt backend.

    Rewrite this function avoid overflow for tensorrt-fp16 with the checkpoint
    `https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm
    _sbn_600e_icdar2015_20210219-42dbe46a.pth`

    Args:
        ctx (ContextCaller): The context with additional information.
        self: The instance of the class FPEM_FFM.
        x (List[Tensor]): A list of feature maps of shape (N, C, H, W).

    Returns:
        outs (List[Tensor]): A list of feature maps of shape (N, C, H, W).
    """
    c2, c3, c4, c5 = x
    # reduce channel
    c2 = self.reduce_conv_c2(c2)
    c3 = self.reduce_conv_c3(c3)
    c4 = self.reduce_conv_c4(c4)

    if ENABLE:
        bn_w = self.reduce_conv_c5[1].weight / torch.sqrt(
            self.reduce_conv_c5[1].running_var + self.reduce_conv_c5[1].eps)
        bn_b = self.reduce_conv_c5[
            1].bias - self.reduce_conv_c5[1].running_mean * bn_w
        bn_w = bn_w.reshape(1, -1, 1, 1).repeat(1, 1, c5.size(2), c5.size(3))
        bn_b = bn_b.reshape(1, -1, 1, 1).repeat(1, 1, c5.size(2), c5.size(3))
        conv_b = self.reduce_conv_c5[0].bias.reshape(1, -1, 1, 1).repeat(
            1, 1, c5.size(2), c5.size(3))
        c5 = FACTOR * (self.reduce_conv_c5[:-1](c5)) - (FACTOR - 1) * (
            bn_w * conv_b + bn_b)
        c5 = self.reduce_conv_c5[-1](c5)
    else:
        c5 = self.reduce_conv_c5(c5)

    # FPEM
    for i, fpem in enumerate(self.fpems):
        c2, c3, c4, c5 = fpem(c2, c3, c4, c5)
        if i == 0:
            c2_ffm = c2
            c3_ffm = c3
            c4_ffm = c4
            c5_ffm = c5
        else:
            c2_ffm += c2
            c3_ffm += c3
            c4_ffm += c4
            c5_ffm += c5

    # FFM
    c5 = F.interpolate(
        c5_ffm,
        c2_ffm.size()[-2:],
        mode='bilinear',
        align_corners=self.align_corners)
    c4 = F.interpolate(
        c4_ffm,
        c2_ffm.size()[-2:],
        mode='bilinear',
        align_corners=self.align_corners)
    c3 = F.interpolate(
        c3_ffm,
        c2_ffm.size()[-2:],
        mode='bilinear',
        align_corners=self.align_corners)
    outs = [c2_ffm, c3, c4, c5]
    return tuple(outs)


@FUNCTION_REWRITER.register_rewriter(
    func_name='mmdet.models.backbones.resnet.BasicBlock.forward',
    backend=Backend.TENSORRT.value)
def basic_block__forward__trt(ctx, self, x: torch.Tensor) -> torch.Tensor:
    """Rewrite `forward` of BasicBlock for tensorrt backend.

    Rewrite this function avoid overflow for tensorrt-fp16 with the checkpoint
    `https://download.openmmlab.com/mmocr/textdet/panet/panet_r18_fpem_ffm
    _sbn_600e_icdar2015_20210219-42dbe46a.pth`

    Args:
        ctx (ContextCaller): The context with additional information.
        self: The instance of the class BasicBlock.
        x (Tensor): The input tensor of shape (N, C, H, W).

    Returns:
        outs (Tensor): The output tensor of shape (N, C, H, W).
    """
    if self.conv1.in_channels < CHANNEL_THRESH:
        return ctx.origin_func(self, x)

    identity = x

    out = self.conv1(x)
    out = self.norm1(out)
    out = self.relu(out)

    out = self.conv2(out)

    if torch.abs(self.norm2(out)).max() < 65504:
        out = self.norm2(out)
        out += identity
        out = self.relu(out)
        return out
    else:
        global ENABLE
        ENABLE = True
        # the output of the last bn layer exceeds the range of fp16
        w1 = self.norm2.weight / torch.sqrt(self.norm2.running_var +
                                            self.norm2.eps)
        bias = self.norm2.bias - self.norm2.running_mean * w1
        w1 = w1.reshape(1, -1, 1, 1).repeat(1, 1, out.size(2), out.size(3))
        bias = bias.reshape(1, -1, 1, 1).repeat(1, 1, out.size(2),
                                                out.size(3)) + identity
        out = self.relu(w1 * (out / FACTOR) + bias / FACTOR)

        return out

MMPose Support

MMPose is an open-source toolbox for pose estimation based on PyTorch. It is a part of the OpenMMLab project.

MMPose installation tutorial

Please refer to official installation guide to install the codebase.

MMPose models support

Model Task ONNX Runtime TensorRT ncnn PPLNN OpenVINO Model config
HRNet PoseDetection Y Y Y N Y config
MSPN PoseDetection Y Y Y N Y config
LiteHRNet PoseDetection Y Y Y N Y config

Example

python tools/deploy.py \
configs/mmpose/posedetection_tensorrt_static-256x192.py \
$MMPOSE_DIR/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w48_coco_256x192.py \
$MMPOSE_DIR/checkpoints/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth \
$MMDEPLOY_DIR/demo/resources/human-pose.jpg \
--work-dir work-dirs/mmpose/topdown/hrnet/trt \
--device cuda

Note

  • Usually, mmpose models need some extra information for the input image, but we can’t get it directly. So, when exporting the model, you can use $MMDEPLOY_DIR/demo/resources/human-pose.jpg as input.

MMDetection3d Support

MMDetection3d is a next-generation platform for general 3D object detection. It is a part of the OpenMMLab project.

MMDetection3d installation tutorial

Please refer to getting_started.md for installation.

Example

export MODEL_PATH=https://download.openmmlab.com/mmdetection3d/v1.0.0_models/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car_20220331_134606-d42d15ed.pth

python tools/deploy.py \
       configs/mmdet3d/voxel-detection/voxel-detection_tensorrt_dynamic.py \
       ${MMDET3D_DIR}/configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py \
       ${MODEL_PATH} \
       ${MMDET3D_DIR}/demo/data/kitti/kitti_000008.bin \
        --work-dir \
        work_dir \
        --show \
        --device \
        cuda:0

List of MMDetection3d models supported by MMDeploy

Model Task OnnxRuntime TensorRT ncnn PPLNN OpenVINO Model config
PointPillars VoxelDetection Y Y* N N Y config
  1. mmdet3d models on cu102+TRT8.4 can be visualized normally. For CUDA 11 or TRT 8.2 users, these issues should be checked.

  2. The voxel detection onnx model excludes the model.voxelize layer and the model post-processing; you can call these functions through the python api.

Example:

from mmdeploy.codebase.mmdet3d.deploy import VoxelDetectionModel
VoxelDetectionModel.voxelize(...)
VoxelDetectionModel.post_process(...)

MMRotate Support

MMRotate is an open-source toolbox for rotated object detection based on PyTorch. It is a part of the OpenMMLab project.

MMRotate installation tutorial

Please refer to official installation guide to install the codebase.

MMRotate models support

Model Task ONNX Runtime TensorRT NCNN PPLNN OpenVINO Model config
RotatedRetinaNet RotatedDetection Y Y N N N config
Oriented RCNN RotatedDetection Y Y N N N config
Gliding Vertex RotatedDetection N Y N N N config
RoI Transformer RotatedDetection Y Y N N N config

Example

# convert ort
python tools/deploy.py \
configs/mmrotate/rotated-detection_onnxruntime_dynamic.py \
$MMROTATE_DIR/configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le135.py \
$MMROTATE_DIR/checkpoints/rotated_retinanet_obb_r50_fpn_1x_dota_le135-e4131166.pth \
$MMROTATE_DIR/demo/demo.jpg \
--work-dir work-dirs/mmrotate/rotated_retinanet/ort \
--device cpu

# compute metric
python tools/test.py \
    configs/mmrotate/rotated-detection_onnxruntime_dynamic.py \
    $MMROTATE_DIR/configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le135.py \
    --model work-dirs/mmrotate/rotated_retinanet/ort/end2end.onnx \
    --metrics mAP

# generate submit file
python tools/test.py \
    configs/mmrotate/rotated-detection_onnxruntime_dynamic.py \
    $MMROTATE_DIR/configs/rotated_retinanet/rotated_retinanet_obb_r50_fpn_1x_dota_le135.py \
    --model work-dirs/mmrotate/rotated_retinanet/ort/end2end.onnx \
    --format-only \
    --metric-options submission_dir=work-dirs/mmrotate/rotated_retinanet/ort/Task1_results

Note

  • Usually, mmrotate models need some extra information for the input image, but we can’t get it directly. So, when exporting the model, you can use $MMROTATE_DIR/demo/demo.jpg as input.

MMAction2 Support

MMAction2 is an open-source toolbox for video understanding based on PyTorch. It is a part of the OpenMMLab project.

Install mmaction2

Please follow the installation guide to install mmaction2.

Supported models

Model TorchScript ONNX Runtime TensorRT ncnn PPLNN OpenVINO
TSN N Y Y N N N
SlowFast N Y Y N N N

Supported ncnn feature

The ncnn features currently usable with mmdeploy are as follows:

feature windows linux mac android
fp32 inference ✔️ ✔️ ✔️ ✔️
int8 model convert - ✔️ ✔️ -
nchw layout ✔️ ✔️ ✔️ ✔️
Vulkan support - ✔️ ✔️ ✔️

The following features cannot be enabled automatically by mmdeploy; you need to manually modify the ncnn build options or adjust the runtime parameters in the SDK:

  • bf16 inference

  • nc4hw4 layout

  • Profiling per layer

  • Turn off NCNN_STRING to reduce .so file size

  • Set thread number and CPU affinity

ONNX Runtime Support

Introduction of ONNX Runtime

ONNX Runtime is a cross-platform inference and training accelerator compatible with many popular ML/DNN frameworks. Check its github for more information.

Installation

Please note that only the CPU version of onnxruntime>=1.8.1 on the Linux platform is supported for now.

  • Install ONNX Runtime python package

pip install onnxruntime==1.8.1

Build custom ops

Prerequisite

  • Download onnxruntime-linux from ONNX Runtime releases, extract it, expose ONNXRUNTIME_DIR and finally add the lib path to LD_LIBRARY_PATH as below:

wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz

tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
cd onnxruntime-linux-x64-1.8.1
export ONNXRUNTIME_DIR=$(pwd)
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH

Build on Linux

cd ${MMDEPLOY_DIR} # To MMDeploy root directory
mkdir -p build && cd build
cmake -DMMDEPLOY_TARGET_BACKENDS=ort -DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR} ..
make -j$(nproc) && make install

How to convert a model

How to add a new custom op

Reminder

  • The custom operator is not included in the supported operator list of ONNX Runtime.

  • The custom operator should be able to be exported to ONNX.

Main procedures

Take the custom operator roi_align as an example.

  1. Create a roi_align directory in ONNX Runtime source directory ${MMDEPLOY_DIR}/csrc/backend_ops/onnxruntime/

  2. Add header and source file into roi_align directory ${MMDEPLOY_DIR}/csrc/backend_ops/onnxruntime/roi_align/

  3. Add a unit test into tests/test_ops/test_ops.py. Check here for examples; a minimal sketch of loading the built custom-op library is shown below.
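
For reference, a unit test for such an op typically registers the built custom-op library with ONNX Runtime before running the model. The snippet below is a minimal sketch, not the actual test code; the library path, model file and input names are assumptions.

# minimal sketch; library path, model file and input names are assumptions
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
# load MMDeploy's ONNX Runtime custom ops so that ops like roi_align resolve
opts.register_custom_ops_library('build/lib/libmmdeploy_onnxruntime_ops.so')

sess = ort.InferenceSession('roi_align_test.onnx', opts,
                            providers=['CPUExecutionProvider'])
feat = np.random.rand(1, 16, 32, 32).astype(np.float32)
rois = np.array([[0, 0, 0, 16, 16]], dtype=np.float32)
outputs = sess.run(None, {'input': feat, 'rois': rois})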

Finally, you are welcome to send us a PR adding custom operators for ONNX Runtime in MMDeploy. :nerd_face:

OpenVINO Support

This tutorial is based on Linux systems like Ubuntu-18.04.

Installation

It is recommended to create a virtual environment for the project.

  1. Install OpenVINO. It is recommended to use the installer or install using pip. Installation example using pip:

pip install "openvino-dev>=2022.3.0"

  2. (Optional) If you want to use OpenVINO in the SDK, you need to install OpenVINO following the install_guides.

  3. Install MMDeploy following the instructions.

To work with models from MMDetection, you may need to install it additionally.

Usage

Example:

python tools/deploy.py \
    configs/mmdet/detection/detection_openvino_static-300x300.py \
    /mmdetection_dir/mmdetection/configs/ssd/ssd300_coco.py \
    /tmp/snapshots/ssd300_coco_20210803_015428-d231a06e.pth \
    tests/data/tiger.jpeg \
    --work-dir ../deploy_result \
    --device cpu \
    --log-level INFO

List of supported models exportable to OpenVINO from MMDetection

The table below lists the models that are guaranteed to be exportable to OpenVINO from MMDetection.

Model name Config Dynamic Shape
ATSS configs/atss/atss_r50_fpn_1x_coco.py Y
Cascade Mask R-CNN configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.py Y
Cascade R-CNN configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.py Y
Faster R-CNN configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py Y
FCOS configs/fcos/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_4x2_2x_coco.py Y
FoveaBox configs/foveabox/fovea_r50_fpn_4x4_1x_coco.py Y
FSAF configs/fsaf/fsaf_r50_fpn_1x_coco.py Y
Mask R-CNN configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py Y
RetinaNet configs/retinanet/retinanet_r50_fpn_1x_coco.py Y
SSD configs/ssd/ssd300_coco.py Y
YOLOv3 configs/yolo/yolov3_d53_mstrain-608_273e_coco.py Y
YOLOX configs/yolox/yolox_tiny_8x8_300e_coco.py Y
Faster R-CNN + DCN configs/dcn/faster_rcnn_r50_fpn_dconv_c3-c5_1x_coco.py Y
VFNet configs/vfnet/vfnet_r50_fpn_1x_coco.py Y

Notes:

  • Custom operations from OpenVINO use the domain org.openvinotoolkit.

  • To speed up OpenVINO inference of the Faster R-CNN, Mask R-CNN, Cascade R-CNN and Cascade Mask R-CNN models, the RoIAlign operation is replaced with the ExperimentalDetectronROIFeatureExtractor operation in the ONNX graph.

  • Models “VFNet” and “Faster R-CNN + DCN” use the custom “DeformableConv2D” operation.

Deployment config

With the deployment config, you can specify additional options for the Model Optimizer. To do this, add the necessary parameters to the backend_config.mo_options in the fields args (for parameters with values) and flags (for flags).

Example:

backend_config = dict(
    mo_options=dict(
        args=dict({
            '--mean_values': [0, 0, 0],
            '--scale_values': [255, 255, 255],
            '--data_type': 'FP32',
        }),
        flags=['--disable_fusing'],
    )
)

Information about the possible parameters for the Model Optimizer can be found in the documentation.

Troubleshooting

  • ImportError: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory

    To resolve the missing external dependency on Ubuntu, execute the following command:

    sudo apt-get install libpython3.7
    

PPLNN Support

MMDeploy supports ppl.nn v0.9.1 and later. This tutorial is based on Linux systems like Ubuntu-18.04.

Installation

  1. Please install pyppl following install-guide.

  2. Install MMDeploy following the instructions.

Usage

Example:

python tools/deploy.py \
    configs/mmdet/detection/detection_pplnn_dynamic-800x1344.py \
    /mmdetection_dir/mmdetection/configs/retinanet/retinanet_r50_fpn_1x_coco.py \
    /tmp/snapshots/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth \
    tests/data/tiger.jpeg \
    --work-dir ../deploy_result \
    --device cuda \
    --log-level INFO

SNPE feature support

Currently mmdeploy integrates the onnx2dlc model conversion and SDK inference, but the following features are not yet supported:

  • GPU_FP16 mode

  • DSP/AIP quantization

  • Operator internal profiling

  • UDO operator

TensorRT Support

Installation

Install TensorRT

Please install TensorRT 8 following the install-guide.

Note:

  • pip Wheel File Installation is not supported yet in this repo.

  • We strongly suggest you install TensorRT through the tar file.

  • After installation, you’d better add TensorRT environment variables to bashrc by:

    cd ${TENSORRT_DIR} # To TensorRT root directory
    echo '# set env for TensorRT' >> ~/.bashrc
    echo "export TENSORRT_DIR=${TENSORRT_DIR}" >> ~/.bashrc
    echo 'export LD_LIBRARY_PATH=$TENSORRT_DIR/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
    source ~/.bashrc
    

Build custom ops

Some custom ops are created to support models in OpenMMLab, and the custom ops can be built as follows:

cd ${MMDEPLOY_DIR} # To MMDeploy root directory
mkdir -p build && cd build
cmake -DMMDEPLOY_TARGET_BACKENDS=trt ..
make -j$(nproc)

If you haven’t installed TensorRT in the default path, please add the -DTENSORRT_DIR flag in CMake.

 cmake -DMMDEPLOY_TARGET_BACKENDS=trt -DTENSORRT_DIR=${TENSORRT_DIR} ..
 make -j$(nproc) && make install

Convert model

Please follow the tutorial in How to convert model. Note that the device must be cuda device.

Int8 Support

Since TensorRT supports INT8 mode, a custom dataset config can be given to calibrate the model. The following is an example for MMDetection:

# calibration_dataset.py

# dataset settings, same format as the codebase in OpenMMLab
dataset_type = 'CalibrationDataset'
data_root = 'calibration/dataset/root'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'val_annotations.json',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'test_annotations.json',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='bbox')

Convert your model with this calibration dataset:

python tools/deploy.py \
    ...
    --calib-dataset-cfg calibration_dataset.py

If no calibration dataset is given, the model will be calibrated with the dataset in the model config.

FAQs

  • Error Cannot found TensorRT headers or Cannot found TensorRT libs

    Try cmake with flag -DTENSORRT_DIR:

    cmake -DBUILD_TENSORRT_OPS=ON -DTENSORRT_DIR=${TENSORRT_DIR} ..
    make -j$(nproc)
    

    Please make sure there are libs and headers in ${TENSORRT_DIR}.

  • Error error: parameter check failed at: engine.cpp::setBindingDimensions::1046, condition: profileMinDims.d[i] <= dimensions.d[i]

    There is an input shape limit in deployment config:

    backend_config = dict(
        # other configs
        model_inputs=[
            dict(
                input_shapes=dict(
                    input=dict(
                        min_shape=[1, 3, 320, 320],
                        opt_shape=[1, 3, 800, 1344],
                        max_shape=[1, 3, 1344, 1344])))
        ])
        # other configs
    

    The shape of the tensor input must be limited between input_shapes["input"]["min_shape"] and input_shapes["input"]["max_shape"].

  • Error error: [TensorRT] INTERNAL ERROR: Assertion failed: cublasStatus == CUBLAS_STATUS_SUCCESS

    TRT 7.2.1 switches to use cuBLASLt (previously it was cuBLAS). cuBLASLt is the default choice for SM version >= 7.0. However, you may need CUDA-10.2 Patch 1 (Released Aug 26, 2020) to resolve some cuBLASLt issues. Another option is to use the new TacticSource API and disable cuBLASLt tactics if you don’t want to upgrade.

    Read this for detail.

  • Install mmdeploy on Jetson

    We provide a tutorial to get start on Jetsons here.

TorchScript support

Introduction of TorchScript

TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process where there is no Python dependency. Check the Introduction to TorchScript for more details.
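
As a quick, MMDeploy-independent illustration of that claim:

# a scripted module can be saved and later loaded without the original Python
# code, e.g. from another Python process or from C++ via libtorch
import torch

class TinyNet(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

ts = torch.jit.script(TinyNet())
ts.save('tiny.torchscript.pt')
loaded = torch.jit.load('tiny.torchscript.pt')
print(loaded(torch.zeros(2)))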

Build custom ops

Prerequisite

  • Download libtorch from the official website here.

Please note that only the pre-cxx11 ABI version 1.8.1+ of libtorch on the Linux platform is supported for now.

For previous versions of libtorch, users can find download links in the issue comment. Take libtorch 1.8.1+cu111 as an example: extract it, expose Torch_DIR and add the lib path to LD_LIBRARY_PATH as below:

wget https://download.pytorch.org/libtorch/cu111/libtorch-shared-with-deps-1.8.1%2Bcu111.zip

unzip libtorch-shared-with-deps-1.8.1+cu111.zip
cd libtorch
export Torch_DIR=$(pwd)
export LD_LIBRARY_PATH=$Torch_DIR/lib:$LD_LIBRARY_PATH

Note:

  • If you want to save libtorch env variables to bashrc, you could run

    echo '# set env for libtorch' >> ~/.bashrc
    echo "export Torch_DIR=${Torch_DIR}" >> ~/.bashrc
    echo 'export LD_LIBRARY_PATH=$Torch_DIR/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
    source ~/.bashrc
    

Build on Linux

cd ${MMDEPLOY_DIR} # To MMDeploy root directory
mkdir -p build && cd build
cmake -DMMDEPLOY_TARGET_BACKENDS=torchscript -DTorch_DIR=${Torch_DIR} ..
make -j$(nproc) && make install

How to convert a model

SDK backend

TorchScript SDK backend may be built by passing -DMMDEPLOY_TORCHSCRIPT_SDK_BACKEND=ON to cmake.

Notice that libtorch is sensitive to C++ ABI versions. On platforms defaulted to C++11 ABI (e.g. Ubuntu 16+) one may pass -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" to cmake to use pre-C++11 ABI for building. In this case all dependencies with ABI sensitive interfaces (e.g. OpenCV) must be built with pre-C++11 ABI.

FAQs

  • Error: projects/thirdparty/libtorch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN libraries.  Please set the proper cuDNN prefixes and / or install cuDNN.

    You may export CUDNN_ROOT=/root/path/to/cudnn to resolve the build error.

Supported RKNN feature

Currently, MMDeploy only tests rk3588 and rv1126 on the Linux platform.

The following features cannot be enabled automatically by mmdeploy; you need to manually modify the MMDeploy configuration as shown here:

  • target_platform other than default

  • quantization settings

  • optimization level other than 1

Core ML feature support

MMDeploy supports converting PyTorch models to Core ML and running inference.

Installation

To convert the model in mmdet, you need to compile libtorch to support custom operators such as nms.

cd ${PYTORCH_DIR}
mkdir build && cd build
cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DPYTHON_EXECUTABLE=`which python` \
    -DCMAKE_INSTALL_PREFIX=install \
    -DDISABLE_SVE=ON # low pytorch versions such as 1.8.0 need this option
make install

Usage

python tools/deploy.py \
    configs/mmdet/detection/detection_coreml_static-800x1344.py \
    /mmdetection_dir/configs/retinanet/retinanet_r18_fpn_1x_coco.py \
    /checkpoint/retinanet_r18_fpn_1x_coco_20220407_171055-614fd399.pth \
    /mmdetection_dir/demo/demo.jpg \
    --work-dir work_dir/retinanet \
    --device cpu \
    --dump-info

ONNX Runtime Ops

grid_sampler

Description

Perform sample from input with pixel locations from grid.

Parameters

Type Parameter Description
int interpolation_mode Interpolation mode to calculate output values. (0: bilinear , 1: nearest)
int padding_mode Padding mode for outside grid values. (0: zeros, 1: border, 2: reflection)
int align_corners If align_corners=1, the extrema (-1 and 1) are considered as referring to the center points of the input's corner pixels. If align_corners=0, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.

Inputs

input: T
Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the numbers of channels, inH and inW are the height and width of the data.
grid: T
Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of offset and output.

Outputs

output: T
Output feature; 4-D tensor of shape (N, C, outH, outW).

Type Constraints

  • T:tensor(float32, Linear)
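
The custom op mirrors torch.nn.functional.grid_sample; a small sketch of the attribute mapping described above:

# interpolation_mode=0 -> mode='bilinear', padding_mode=0 -> padding_mode='zeros'
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 32, 32)               # (N, C, inH, inW)
grid = torch.rand(1, 16, 16, 2) * 2 - 1    # (N, outH, outW, 2), values in [-1, 1]

out = F.grid_sample(x, grid, mode='bilinear', padding_mode='zeros',
                    align_corners=False)   # (N, C, outH, outW)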

MMCVModulatedDeformConv2d

Description

Perform Modulated Deformable Convolution on input feature, read Deformable ConvNets v2: More Deformable, Better Results for detail.

Parameters

Type Parameter Description
list of ints stride The stride of the convolving kernel. (sH, sW)
list of ints padding Paddings on both sides of the input. (padH, padW)
list of ints dilation The spacing between kernel elements. (dH, dW)
int deformable_groups Groups of deformable offset.
int groups Split input into groups. input_channel should be divisible by the number of groups.

Inputs

inputs[0]: T
Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.
inputs[1]: T
Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.
inputs[2]: T
Input mask; 4-D tensor of shape (N, deformable_group* kH* kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.
inputs[3]: T
Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).
inputs[4]: T, optional
Input bias; 1-D tensor of shape (output_channel).

Outputs

outputs[0]: T
Output feature; 4-D tensor of shape (N, output_channel, outH, outW).

Type Constraints

  • T:tensor(float32, Linear)

NMSRotated

Description

Non Max Suppression for rotated bboxes.

Parameters

Type Parameter Description
float iou_threshold The IoU threshold for NMS.

Inputs

inputs[0]: T
Input boxes; 2-D tensor of shape (N, 5), where N is the number of rotated bboxes.
inputs[1]: T
Input scores; 1-D tensor of shape (N, ), where N is the number of rotated bboxes.

Outputs

outputs[0]: T
Output feature; 1-D tensor of shape (K, ), where K is the number of keep bboxes.

Type Constraints

  • T:tensor(float32, Linear)

RoIAlignRotated

Description

Perform RoIAlignRotated on output feature, used in bbox_head of most two-stage rotated object detectors.

Parameters

Type Parameter Description
int output_height height of output roi
int output_width width of output roi
float spatial_scale used to scale the input boxes
int sampling_ratio number of input samples to take for each output sample. 0 means to take samples densely for current models.
int aligned If aligned=0, use the legacy implementation in MMDetection. Else, align the results more perfectly.
int clockwise If True, the angle in each proposal follows a clockwise fashion in image space, otherwise, the angle is counterclockwise. Default: False.

Inputs

input: T
Input feature map; 4D tensor of shape (N, C, H, W), where N is the batch size, C is the numbers of channels, H and W are the height and width of the data.
rois: T
RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 6) given as [[batch_index, cx, cy, w, h, theta], ...]. The RoIs' coordinates are the coordinate system of input.

Outputs

feat: T
RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element feat[r-1] is a pooled feature map corresponding to the r-th RoI RoIs[r-1].

Type Constraints

  • T:tensor(float32)

TensorRT Ops

TRTBatchedNMS

Description

Batched NMS with a fixed number of output bounding boxes.

Parameters

Type Parameter Description
int background_label_id The label ID for the background class. If there is no background class, set it to -1.
int num_classes The number of classes.
int topK The number of bounding boxes to be fed into the NMS step.
int keepTopK The number of total bounding boxes to be kept per-image after the NMS step. Should be less than or equal to the topK value.
float scoreThreshold The scalar threshold for score (low scoring boxes are removed).
float iouThreshold The scalar threshold for IoU (new boxes that have high IoU overlap with previously selected boxes are removed).
int isNormalized Set to false if the box coordinates are not normalized, meaning they are not in the range [0,1]. Defaults to true.
int clipBoxes Forcibly restrict bounding boxes to the normalized range [0,1]. Only applicable if isNormalized is also true. Defaults to true.

Inputs

inputs[0]: T
boxes; 4-D tensor of shape (N, num_boxes, num_classes, 4), where N is the batch size; `num_boxes` is the number of boxes; `num_classes` is the number of classes, which could be 1 if the boxes are shared between all classes.
inputs[1]: T
scores; 4-D tensor of shape (N, num_boxes, 1, num_classes).

Outputs

outputs[0]: T
dets; 3-D tensor of shape (N, valid_num_boxes, 5), `valid_num_boxes` is the number of boxes after NMS. For each row `dets[i,j,:] = [x0, y0, x1, y1, score]`
outputs[1]: tensor(int32, Linear)
labels; 2-D tensor of shape (N, valid_num_boxes).

Type Constraints

  • T:tensor(float32, Linear)

grid_sampler

Description

Perform sample from input with pixel locations from grid.

Parameters

Type Parameter Description
int interpolation_mode Interpolation mode to calculate output values. (0: bilinear , 1: nearest)
int padding_mode Padding mode for outside grid values. (0: zeros, 1: border, 2: reflection)
int align_corners If align_corners=1, the extrema (-1 and 1) are considered as referring to the center points of the input's corner pixels. If align_corners=0, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic.

Inputs

inputs[0]: T
Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the numbers of channels, inH and inW are the height and width of the data.
inputs[1]: T
Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of offset and output.

Outputs

outputs[0]: T
Output feature; 4-D tensor of shape (N, C, outH, outW).

Type Constraints

  • T:tensor(float32, Linear)

MMCVInstanceNormalization

Description

Carry out instance normalization as described in the paper https://arxiv.org/abs/1607.08022.

y = scale * (x - mean) / sqrt(variance + epsilon) + B, where mean and variance are computed per instance per channel.
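
The formula can be written out directly; a small numpy sketch for the 4-D image case:

# numpy sketch of the per-instance, per-channel normalization defined above
import numpy as np

def instance_norm(x, scale, bias, epsilon=1e-5):
    # x: (N, C, H, W); scale, bias: (C,)
    mean = x.mean(axis=(2, 3), keepdims=True)
    variance = x.var(axis=(2, 3), keepdims=True)
    y = scale[None, :, None, None] * (x - mean) / np.sqrt(variance + epsilon)
    return y + bias[None, :, None, None]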

Parameters

Type Parameter Description
float epsilon The epsilon value to use to avoid division by zero. Default is 1e-05

Inputs

input: T
Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 ... Dn), where N is the batch size.
scale: T
The input 1-dimensional scale tensor of size C.
B: T
The input 1-dimensional bias tensor of size C.

Outputs

output: T
The output tensor of the same shape as input.

Type Constraints

  • T:tensor(float32, Linear)

MMCVModulatedDeformConv2d

Description

Perform Modulated Deformable Convolution on input feature. Read Deformable ConvNets v2: More Deformable, Better Results for detail.

Parameters

Type Parameter Description
list of ints stride The stride of the convolving kernel. (sH, sW)
list of ints padding Paddings on both sides of the input. (padH, padW)
list of ints dilation The spacing between kernel elements. (dH, dW)
int deformable_group Groups of deformable offset.
int group Split input into groups. input_channel should be divisible by the number of groups.

Inputs

inputs[0]: T
Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.
inputs[1]: T
Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.
inputs[2]: T
Input mask; 4-D tensor of shape (N, deformable_group* kH* kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.
inputs[3]: T
Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).
inputs[4]: T, optional
Input bias; 1-D tensor of shape (output_channel).

Outputs

outputs[0]: T
Output feature; 4-D tensor of shape (N, output_channel, outH, outW).

Type Constraints

  • T:tensor(float32, Linear)

MMCVMultiLevelRoiAlign

Description

Perform RoIAlign on features from multiple levels. Used in bbox_head of most two-stage detectors.

Parameters

Type Parameter Description
int output_height height of output roi.
int output_width width of output roi.
list of floats featmap_strides feature map stride of each level.
int sampling_ratio number of input samples to take for each output sample. 0 means to take samples densely for current models.
float roi_scale_factor RoIs will be scaled by this factor before RoI Align.
int finest_scale Scale threshold of mapping to level 0. Default: 56.
int aligned If aligned=0, use the legacy implementation in MMDetection. Else, align the results more perfectly.

Inputs

inputs[0]: T
RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...].
inputs[1~]: T
Input feature map; 4D tensor of shape (N, C, H, W), where N is the batch size, C is the numbers of channels, H and W are the height and width of the data.

Outputs

outputs[0]: T
RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element output[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].

Type Constraints

  • T:tensor(float32, Linear)

MMCVRoIAlign

Description

Perform RoIAlign on output feature, used in bbox_head of most two-stage detectors.

Parameters

Type Parameter Description
int output_height height of output roi
int output_width width of output roi
float spatial_scale used to scale the input boxes
int sampling_ratio number of input samples to take for each output sample. 0 means to take samples densely for current models.
str mode pooling mode in each bin. avg or max
int aligned If aligned=0, use the legacy implementation in MMDetection. Else, align the results more perfectly.

Inputs

inputs[0]: T
Input feature map; 4D tensor of shape (N, C, H, W), where N is the batch size, C is the numbers of channels, H and W are the height and width of the data.
inputs[1]: T
RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of inputs[0].

Outputs

outputs[0]: T
RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element output[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].

Type Constraints

  • T:tensor(float32, Linear)

ScatterND

Description

ScatterND takes three inputs data tensor of rank r >= 1, indices tensor of rank q >= 1, and updates tensor of rank q + r - indices.shape[-1] - 1. The output of the operation is produced by creating a copy of the input data, and then updating its value to values specified by updates at specific index positions specified by indices. Its output shape is the same as the shape of data. Note that indices should not have duplicate entries. That is, two or more updates for the same index-location is not supported.

The output is calculated via the following equation:

  output = np.copy(data)
  update_indices = indices.shape[:-1]
  for idx in np.ndindex(update_indices):
      output[indices[idx]] = updates[idx]

Parameters

None

Inputs

inputs[0]: T
Tensor of rank r>=1.
inputs[1]: tensor(int32, Linear)
Tensor of rank q>=1.
inputs[2]: T
Tensor of rank q + r - indices_shape[-1] - 1.

Outputs

outputs[0]: T
Tensor of rank r >= 1.

Type Constraints

  • T:tensor(float32, Linear), tensor(int32, Linear)

TRTBatchedRotatedNMS

Description

Batched rotated NMS with a fixed number of output bounding boxes.

Parameters

Type Parameter Description
int background_label_id The label ID for the background class. If there is no background class, set it to -1.
int num_classes The number of classes.
int topK The number of bounding boxes to be fed into the NMS step.
int keepTopK The number of total bounding boxes to be kept per-image after the NMS step. Should be less than or equal to the topK value.
float scoreThreshold The scalar threshold for score (low scoring boxes are removed).
float iouThreshold The scalar threshold for IoU (new boxes that have high IoU overlap with previously selected boxes are removed).
int isNormalized Set to false if the box coordinates are not normalized, meaning they are not in the range [0,1]. Defaults to true.
int clipBoxes Forcibly restrict bounding boxes to the normalized range [0,1]. Only applicable if isNormalized is also true. Defaults to true.

Inputs

inputs[0]: T
boxes; 4-D tensor of shape (N, num_boxes, num_classes, 5), where N is the batch size; `num_boxes` is the number of boxes; `num_classes` is the number of classes, which could be 1 if the boxes are shared between all classes.
inputs[1]: T
scores; 4-D tensor of shape (N, num_boxes, 1, num_classes).

Outputs

outputs[0]: T
dets; 3-D tensor of shape (N, valid_num_boxes, 6), `valid_num_boxes` is the number of boxes after NMS. For each row `dets[i,j,:] = [x0, y0, width, height, theta, score]`
outputs[1]: tensor(int32, Linear)
labels; 2-D tensor of shape (N, valid_num_boxes).

Type Constraints

  • T:tensor(float32, Linear)

GridPriorsTRT

Description

Generate the anchors for object detection task.

Parameters

Type Parameter Description
int stride_w The stride of the feature width.
int stride_h The stride of the feature height.

Inputs

inputs[0]: T
The base anchors; 2-D tensor with shape [num_base_anchor, 4].
inputs[1]: TAny
height provider; 1-D tensor with shape [featmap_height]. The data will never be used.
inputs[2]: TAny
width provider; 1-D tensor with shape [featmap_width]. The data will never be used.

Outputs

outputs[0]: T
output anchors; 2-D tensor of shape (num_base_anchor*featmap_height*featmap_width, 4).

Type Constraints

  • T:tensor(float32, Linear)

  • TAny: Any

ScaledDotProductAttentionTRT

Description

Dot product attention used to support multihead attention, read Attention Is All You Need for more detail.

Parameters

None

Inputs

inputs[0]: T
query; 3-D tensor with shape [batch_size, sequence_length, embedding_size].
inputs[1]: T
key; 3-D tensor with shape [batch_size, sequence_length, embedding_size].
inputs[2]: T
value; 3-D tensor with shape [batch_size, sequence_length, embedding_size].
inputs[3]: T
mask; 2-D/3-D tensor with shape [sequence_length, sequence_length] or [batch_size, sequence_length, sequence_length]. optional.

Outputs

outputs[0]: T
3-D tensor of shape [batch_size, sequence_length, embedding_size]. `softmax(q@k.T)@v`
outputs[1]: T
3-D tensor of shape [batch_size, sequence_length, sequence_length]. `softmax(q@k.T)`

Type Constraints

  • T:tensor(float32, Linear)

GatherTopk

Description

TensorRT 8.2~8.4 gives unexpected results for multi-index gather such as:

data[batch_index, bbox_index, ...]

Read this for more details.
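
In numpy terms, the op performs the batched gather sketched below; shapes follow the Inputs/Outputs tables (shown here with a single leading batch dimension):

# numpy sketch of the batched gather this plugin implements
import numpy as np

data = np.random.rand(2, 100, 4).astype(np.float32)           # (A0, G0, C0)
index = np.random.randint(0, 100, (2, 10)).astype(np.int32)   # (A0, G1)

batch_index = np.arange(data.shape[0])[:, None]               # (A0, 1), broadcast over G1
out = data[batch_index, index]                                # (A0, G1, C0)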

Parameters

None

Inputs

inputs[0]: T
Tensor to be gathered, with shape (A0, ..., An, G0, C0, ...).
inputs[1]: tensor(int32, Linear)
Tensor of indices, with shape (A0, ..., An, G1).

Outputs

outputs[0]: T
Output tensor, with shape (A0, ..., An, G1, C0, ...).

Type Constraints

  • T:tensor(float32, Linear), tensor(int32, Linear)

TRTDCNv3

Description

TensorRT deformable convolution v3 is used to support InternImage. The op contains only the im2col logic even though it is named convolution. For more detail, you may refer to InternImage.

Parameters

Type Parameter Description
int kernel_h The kernel size of h dim.
int kernel_w The kernel size of w dim.
int stride_h The stride size of h dim.
int stride_w The stride size of w dim.
int pad_h The padding size of h dim.
int pad_w The padding size of w dim.
int dilation_h The dilation size of h dim.
int dilation_w The dilation size of w dim.
int group The group nums.
int group_channels The number of channels per group.
float offset_scale The offset scale.
int im2col_step The step for img2col.

Inputs

inputs[0]: T
A 4-D Tensor, with shape of [batch, height, width, channels].
inputs[1]: T
A 4-D Tensor, with shape of [batch, height, width, channels].
inputs[2]: T
A 4-D Tensor, with shape of [batch, height, width, channels].

Outputs

outputs[0]: T
A 4-D Tensor, with shape of [batch, height, width, channels].

Type Constraints

  • T:tensor(float32, Linear), tensor(int32, Linear)

ncnn Ops

Expand

Description

Broadcast the input blob following the given shape and the broadcast rule of ncnn.

Parameters

Expand has no parameters.

Inputs

inputs[0]: ncnn.Mat
bottom_blobs[0]; An ncnn.Mat of input data.
inputs[1]: ncnn.Mat
bottom_blobs[1]; a 1-dim ncnn.Mat containing a valid ncnn.Mat shape.

Outputs

outputs[0]: T
top_blob; the ncnn.Mat blob expanded by the given shape and the broadcast rule of ncnn.

Type Constraints

  • ncnn.Mat: Mat(float32)

Gather

Description

Given the data and indices blobs, gather entries along the axis dimension of data indexed by indices.

Parameters

Type Parameter Description
int axis Which axis to gather on. Default is 0.

Inputs

inputs[0]: ncnn.Mat
bottom_blobs[0]; An ncnn.Mat of input data.
inputs[1]: ncnn.Mat
bottom_blobs[1]; a 1-dim ncnn.Mat of indices on the given axis.

Outputs

outputs[0]: T
top_blob; the ncnn.Mat blob gathered from the given data and indices blobs.

Type Constraints

  • ncnn.Mat: Mat(float32)

Shape

Description

Get the shape of the ncnn blobs.

Parameters

Shape has no parameters.

Inputs

inputs[0]: ncnn.Mat
bottom_blob; An ncnn.Mat of input data.

Outputs

outputs[0]: T
top_blob; 1-D ncnn.Mat of shape (bottom_blob.dims,), `bottom_blob.dims` is the input blob dimensions.

Type Constraints

  • ncnn.Mat: Mat(float32)

TopK

Description

Get the indices and, optionally, the values of the largest or smallest k entries along the given axis. This op maps to the onnx ops TopK, ArgMax, and ArgMin.

Parameters

Type Parameter Description
int axis The axis of data which topk calculate on. Default is -1, indicates the last dimension.
int largest The binary value which indicates the TopK operator selects the largest or smallest K values. Default is 1, the TopK selects the largest K values.
int sorted The binary value of whether returning sorted topk value or not. If not, the topk returns topk values in any order. Default is 1, this operator returns sorted topk values.
int keep_dims The binary value of whether keep the reduced dimension or not. Default is 1, each output blob has the same dimension as input blob.

Inputs

inputs[0]: ncnn.Mat
bottom_blob[0]; An ncnn.Mat of input data.
inputs[1] (optional): ncnn.Mat
bottom_blob[1]; an optional ncnn.Mat holding K of TopK. If this blob does not exist, K is 1.

Outputs

outputs[0]: T
top_blob[0]; if outputs has only 1 blob, outputs[0] is the indices blob of topk; if outputs has 2 blobs, outputs[0] is the value blob of topk. This blob is in ncnn.Mat format with the shape of bottom_blob[0] or the reduced shape of bottom_blob[0].
outputs[1]: T
top_blob[1] (optional); if outputs has 2 blobs, outputs[1] is the indices blob of topk. This blob is in ncnn.Mat format with the shape of bottom_blob[0] or the reduced shape of bottom_blob[0].

Type Constraints

  • ncnn.Mat: Mat(float32)

mmdeploy Architecture

This article mainly introduces the functions of each directory of mmdeploy and how it works from model conversion to real inference.

Take a general look at the directory structure

The entire mmdeploy can be seen as two independent parts: model conversion and SDK.

We introduce the whole repo directory structure and its functions; you do not have to study the source code, just get an impression.

Peripheral directory features:

$ cd /path/to/mmdeploy
$ tree -L 1
.
├── CMakeLists.txt    # Compile custom operator and cmake configuration of SDK
├── configs                   # Algorithm library configuration for model conversion
├── csrc                          # SDK and custom operator
├── demo                      # FFI interface examples in various languages, such as csharp, java, python, etc.
├── docker                   # docker build
├── mmdeploy           # python package for model conversion
├── requirements      # python requirements
├── service                    # Some small boards do not support python, so we use C/S mode for model conversion; here is the server code
├── tests                         # unittest
├── third_party           # 3rd party dependencies required by SDK and FFI
└── tools                        # Tools are also the entrance to all functions, such as onnx2xx.py, profile.py, test.py, etc.

It should be clear that:

  • Model conversion mainly depends on tools, mmdeploy and a small part of the csrc directory;

  • The SDK consists of three directories: csrc, third_party and demo.

Model Conversion

Here we take ViT from mmcls as the example model and ncnn as the example inference backend. Other models and backends are similar.

Let’s take a look at the mmdeploy/mmdeploy directory structure and get an impression:

.
├── apis                     # The api used by tools is implemented here, such as onnx2ncnn.py
│   ├── calibration.py       # trt dedicated collection of quantitative data
│   ├── core                 # Software infrastructure
│   ├── extract_model.py     # Use it to export part of onnx
│   ├── inference.py         # Abstract function, which will actually call torch/ncnn specific inference
│   ├── ncnn                 # ncnn Wrapper
│   └── visualize.py         # Still an abstract function, which will actually call torch/ncnn specific inference and visualize
..
├── backend                  # Backend wrapper
│   ├── base                 # Because there are multiple backends, there must be an OO design for the base class
│   ├── ncnn                 # This calls the ncnn python interface for model conversion
│   │   ├── init_plugins.py  # Find the path of ncnn custom operators and ncnn tools
│   │   ├── onnx2ncnn.py     # Wrap `mmdeploy_onnx2ncnn` into a python interface
│   │   ├── quant.py         # Wrap `ncnn2int8` as a python interface
│   │   └── wrapper.py       # Wrap pyncnn forward API
..
├── codebase                 # Algorithm rewriter
│   ├── base                 # There are multiple algorithms here that we need a bit of OO design
│   ├── mmcls                # mmcls related model rewrite
│   │   ├── deploy           # mmcls implementation of base abstract task/model/codebase
│   │   └── models           # Real model rewrite
│   │       ├── backbones    # Rewrites of backbone network parts, such as multiheadattention
│   │       ├── heads        # Such as MultiLabelClsHead
│   │       └── necks        # Such as GlobalAveragePooling
│..
├── core                     # Software infrastructure of rewrite mechanism
├── mmcv                     # Rewrite mmcv
├── pytorch                  # Rewrite pytorch operator for ncnn, such as Gemm
..

Each line above needs to be read, don’t skip it.

When you run tools/deploy.py to convert ViT, three things happen:

  1. Rewrite of mmcls ViT forward

  2. ncnn does not support gather opr, customize and load it with libncnn.so

  3. Run exported ncnn model with real inference, render output, and make sure the result is correct

1. Rewrite forward

When exporting ViT to onnx, some operators are generated that ncnn does not support well. mmdeploy’s solution is to hijack the forward code and change it, so that the exported onnx is suitable for ncnn.

For example, rewrite the process of conv -> shape -> concat_const -> reshape to conv -> reshape to trim off the redundant shape and concat operator.

All mmcls algorithm rewriters are in the mmdeploy/codebase/mmcls/models directory.
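
For intuition, here is a hedged, illustrative sketch of such a rewrite (not actual mmcls code): the class path SomeViTModule, the proj attribute and embed_dims are hypothetical placeholders, but the real rewrites in mmdeploy/codebase/mmcls/models follow the same pattern of replacing runtime shape arithmetic with static values.

from mmdeploy.core import FUNCTION_REWRITER


@FUNCTION_REWRITER.register_rewriter(
    'mmcls.models.utils.SomeViTModule.forward',  # hypothetical path
    backend='ncnn')
def some_vit_module__forward__ncnn(ctx, self, x):
    # The original forward derives the reshape target from x.shape at runtime,
    # which exports as conv -> shape -> concat_const -> reshape. Baking the
    # known sizes in exports a plain conv -> reshape instead.
    x = self.proj(x)  # hypothetical conv projection
    return x.reshape(x.size(0), -1, self.embed_dims)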

2. Custom Operator

Operators customized for ncnn are in the csrc/mmdeploy/backend_ops/ncnn/ directory, and are loaded together with libncnn.so after compilation. They are essentially hotfixes for ncnn; the following operators are currently implemented:

  • topk

  • tensorslice

  • shape

  • gather

  • expand

  • constantofshape

3. Model Conversion and testing

We first use the modified mmdeploy_onnx2ncnn to convert the model, then run inference with pyncnn and the custom ops.

When encountering a framework such as snpe that does not support python well, we use C/S mode: wrap a server with a protocol such as gRPC and forward the real inference output.

For rendering, mmdeploy directly uses the rendering API of the upstream algorithm codebase.

SDK

After model conversion is completed, the SDK, written in C++, can be used to run inference on different platforms.

Let’s take a look at the csrc/mmdeploy directory structure:

.
├── apis           # csharp, java, go, Rust and other FFI interfaces
├── backend_ops    # Custom operators for each inference framework
├── CMakeLists.txt
├── codebase       # The type of results preferred by each algorithm framework, such as multi-use bbox for detection task
├── core           # Abstraction of graph, operator, device and so on
├── device         # Implementation of CPU/GPU device abstraction
├── execution      # Implementation of the execution abstraction
├── graph          # Implementation of graph abstraction
├── model          # Implement both zip-compressed and uncompressed work directory
├── net            # Implementation of net, such as wrap ncnn forward C API
├── preprocess     # Implement preprocess
└── utils          # OCV tools

The essence of the SDK is to design a set of abstractions for the computational graph, chaining together each model's

  • preprocess

  • inference

  • postprocess

stages into a single pipeline, while providing FFI in multiple languages.
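
As an example of the Python FFI, the snippet below sketches running a converted classification model through the SDK pipeline. This is a hedged sketch: the module name (mmdeploy_runtime, called mmdeploy_python in some releases), the Classifier constructor arguments and the model path are assumptions that may differ across versions.

import cv2
from mmdeploy_runtime import Classifier  # may be `mmdeploy_python` in older releases

img = cv2.imread('demo.jpg')  # placeholder image path
# preprocess -> inference -> postprocess all happen inside the SDK pipeline
classifier = Classifier(model_path='mmdeploy_models/vit', device_name='cpu', device_id=0)
for label_id, score in classifier(img):
    print(label_id, score)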

How to support new models

We provide several tools to support model conversion.

Function Rewriter

PyTorch neural networks are written in Python, which eases algorithm development. But the use of Python control flow and third-party libraries makes it difficult to export the network to an intermediate representation. We provide a ‘monkey patch’ tool to rewrite an unsupported function into one that can be exported. Here is an example:

from mmdeploy.core import FUNCTION_REWRITER

@FUNCTION_REWRITER.register_rewriter(
    func_name='torch.Tensor.repeat', backend='tensorrt')
def repeat_static(ctx, input, *size):
    origin_func = ctx.origin_func
    if input.dim() == 1 and len(size) == 1:
        return origin_func(input.unsqueeze(0), *([1] + list(size))).squeeze(0)
    else:
        return origin_func(input, *size)

It is easy to use the function rewriter. Just add a decorator with arguments:

  • func_name is the function to override. It can be either a PyTorch function or a custom function. Methods in modules can also be overridden by this tool.

  • backend is the inference engine. The function will be overridden when the model is exported to this engine. If it is not given, this rewrite will be the default rewrite. The default rewrite will be used if the rewrite of the given backend does not exist.

The arguments are the same as the original function, except a context ctx as the first argument. The context provides some useful information such as the deployment config ctx.cfg and the original function (which has been overridden) ctx.origin_func.
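
As a minimal usage sketch (an addition for illustration, not part of the original API description), the rewrite takes effect once the code runs inside RewriterContext with the matching backend; outside the context, the original torch.Tensor.repeat is used:

import torch

from mmdeploy.core import RewriterContext

x = torch.rand(8)
with RewriterContext(cfg=dict(), backend='tensorrt'):
    y = x.repeat(4)  # dispatched to repeat_static defined above
assert y.shape == (32,)  # same result as the original repeat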

Module Rewriter

If you want to replace a whole module with another one, we have another rewriter as follows:

import torch.nn as nn

from mmdeploy.core import MODULE_REWRITER


@MODULE_REWRITER.register_rewrite_module(
    'mmedit.models.backbones.sr_backbones.SRCNN', backend='tensorrt')
class SRCNNWrapper(nn.Module):

    def __init__(self,
                 module,
                 cfg,
                 channels=(3, 64, 32, 3),
                 kernel_sizes=(9, 1, 5),
                 upscale_factor=4):
        super(SRCNNWrapper, self).__init__()

        self._module = module

        module.img_upsampler = nn.Upsample(
            scale_factor=module.upscale_factor,
            mode='bilinear',
            align_corners=False)

    def forward(self, *args, **kwargs):
        """Run forward."""
        return self._module(*args, **kwargs)

    def init_weights(self, *args, **kwargs):
        """Initialize weights."""
        return self._module.init_weights(*args, **kwargs)

Just like function rewriter, add a decorator with arguments:

  • module_type is the module class to rewrite.

  • backend is the inference engine. The function will be overridden when the model is exported to this engine. If it is not given, this rewrite will be the default rewrite. The default rewrite will be used if the rewrite of the given backend does not exist.

All instances of the module in the network will be replaced with instances of this new class. The original module and the deployment config will be passed as the first two arguments.

Custom Symbolic

The mappings between PyTorch and ONNX are defined in PyTorch with symbolic functions. The custom symbolic function can help us to bypass some ONNX nodes which are unsupported by inference engine.

from torch.onnx import symbolic_helper as sym_help

from mmdeploy.core import SYMBOLIC_REWRITER


@SYMBOLIC_REWRITER.register_symbolic('squeeze', is_pytorch=True)
def squeeze_default(ctx, g, self, dim=None):
    if dim is None:
        dims = []
        for i, size in enumerate(self.type().sizes()):
            if size == 1:
                dims.append(i)
    else:
        dims = [sym_help._get_const(dim, 'i', 'dim')]
    return g.op('Squeeze', self, axes_i=dims)

The decorator arguments:

  • func_name The function name to add symbolic. Use full path if it is a custom torch.autograd.Function. Or just a name if it is a PyTorch built-in function.

  • backend is the inference engine. The function will be overridden when the model is exported to this engine. If it is not given, this rewrite will be the default rewrite. The default rewrite will be used if the rewrite of the given backend does not exist.

  • is_pytorch True if the function is a PyTorch built-in function.

  • arg_descriptors the descriptors of the symbolic function arguments, which will be fed to torch.onnx.symbolic_helper._parse_arg.

Just like function rewriter, there is a context ctx as the first argument. The context provides some useful information such as the deployment config ctx.cfg and the original function (which has been overridden) ctx.origin_func. Note that the ctx.origin_func can be used only when is_pytorch==False.
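
Below is a hedged sketch of exercising the symbolic above: exporting a tiny module that calls squeeze inside RewriterContext, so the registered symbolic is picked up when the ONNX graph is built. The module and file name are illustrative only.

import torch

from mmdeploy.core import RewriterContext


class TinyNet(torch.nn.Module):

    def forward(self, x):
        return x.squeeze(1)


with RewriterContext(cfg=dict(), backend='default', opset=11):
    torch.onnx.export(TinyNet(), torch.rand(2, 1, 4), 'squeeze.onnx',
                      opset_version=11)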

How to support new backends

MMDeploy supports a number of backend engines. We welcome the contribution of new backends. In this tutorial, we will introduce the general procedures to support a new backend in MMDeploy.

Prerequisites

Before contributing the codes, there are some requirements for the new backend that need to be checked:

  • The backend must support ONNX as IR.

  • If the backend requires model files or weight files other than a “.onnx” file, a conversion tool that converts the “.onnx” file to model files and weight files is required. The tool can be a Python API, a script, or an executable program.

  • It is highly recommended that the backend provides a Python interface to load the backend files and inference for validation.

Support backend conversion

The backends in MMDeploy must support ONNX. The backend either loads the “.onnx” file directly, or converts it to its own format using a conversion tool. In this section, we will introduce the steps to support backend conversion.

  1. Add backend constant in mmdeploy/utils/constants.py that denotes the name of the backend.

    Example:

    # mmdeploy/utils/constants.py
    
    class Backend(AdvancedEnum):
        # Take TensorRT as an example
        TENSORRT = 'tensorrt'
    
  2. Add a corresponding package (a folder with __init__.py) in mmdeploy/backend/. For example, mmdeploy/backend/tensorrt. If the backend requires model files or weight files other than a “.onnx” file, create an onnx2backend.py file in the corresponding folder (e.g., create mmdeploy/backend/tensorrt/onnx2tensorrt.py). Then add a conversion function onnx2backend in the file. The function should convert a given “.onnx” file to the required backend files in a given work directory. There are no requirements on other parameters of the function or on the implementation details. You can use any tools for conversion. Here are some examples:

    Use Python script:

    from subprocess import PIPE, run
    from typing import Dict, List, Union

    import torch

    def onnx2openvino(input_info: Dict[str, Union[List[int], torch.Size]],
                      output_names: List[str], onnx_path: str, work_dir: str):
    
        input_names = ','.join(input_info.keys())
        input_shapes = ','.join(str(list(elem)) for elem in input_info.values())
        output = ','.join(output_names)
    
        mo_args = f'--input_model="{onnx_path}" '\
                  f'--output_dir="{work_dir}" ' \
                  f'--output="{output}" ' \
                  f'--input="{input_names}" ' \
                  f'--input_shape="{input_shapes}" ' \
                  f'--disable_fusing '
        command = f'mo.py {mo_args}'
        mo_output = run(command, stdout=PIPE, stderr=PIPE, shell=True, check=True)
    

    Use executable program:

    from subprocess import call

    def onnx2ncnn(onnx_path: str, work_dir: str):
        onnx2ncnn_path = get_onnx2ncnn_path()
        save_param, save_bin = get_output_model_file(onnx_path, work_dir)
        call([onnx2ncnn_path, onnx_path, save_param, save_bin])
    
  3. Create a backend manager class and implement the interface to support model conversion, version check and other features.

    Example:

     # register the backend manager
     # the backend manager derives from BaseBackendManager
     @BACKEND_MANAGERS.register('tensorrt')
     class TensorRTManager(BaseBackendManager):
    
         @classmethod
         def is_available(cls, with_custom_ops: bool = False) -> bool:
             ....
    
    
         @classmethod
         def get_version(cls) -> str:
             ....
    
         @classmethod
         def to_backend(cls,
                     ir_files: Sequence[str],
                     work_dir: str,
                     deploy_cfg: Any,
                     log_level: int = logging.INFO,
                     device: str = 'cpu',
                     **kwargs) -> Sequence[str]:
             ...
    
  4. Create a config file in configs/_base_/backends (e.g., configs/_base_/backends/tensorrt.py). If the backend just takes the ‘.onnx’ file as input, the new config can be simple. The config of the backend only consists of one field denoting the name of the backend (which should be the same as the name in mmdeploy/utils/constants.py).

    Example:

    backend_config = dict(type='onnxruntime')
    

    If the backend requires other files, then the arguments for the conversion from “.onnx” file to backend files should be included in the config file.

    Example:

    backend_config = dict(
        type='tensorrt',
        common_config=dict(
            fp16_mode=False, max_workspace_size=0))
    

    After possessing a base backend config file, you can easily construct a complete deploy config through inheritance. Please refer to our config tutorial for more details. Here is an example:

    _base_ = ['../_base_/backends/onnxruntime.py']
    
    codebase_config = dict(type='mmcls', task='Classification')
    onnx_config = dict(input_shape=None)
    
  5. Define APIs in a new package in mmdeploy/apis.

    Example:

    # mmdeploy/apis/ncnn/__init__.py
    
    from mmdeploy.backend.ncnn import is_available
    
    __all__ = ['is_available']
    
    if is_available():
        from mmdeploy.backend.ncnn.onnx2ncnn import (onnx2ncnn,
                                                     get_output_model_file)
        __all__ += ['onnx2ncnn', 'get_output_model_file']
    
  6. Convert the models of OpenMMLab to backends (if necessary) and inference on backend engine. If you find some incompatible operators when testing, you can try to rewrite the original model for the backend following the rewriter tutorial or add custom operators.

  7. Add docstring and unit tests for new code :).

Support backend inference

Although the backend engines are usually implemented in C/C++, it is convenient for testing and debugging if the backend provides a Python inference interface. We encourage contributors to support backend inference in the Python interface of MMDeploy. In this section we will introduce the steps to support backend inference.

  1. Add a file named wrapper.py to corresponding folder in mmdeploy/backend/{backend}. For example, mmdeploy/backend/tensorrt/wrapper.py. This module should implement and register a wrapper class that inherits the base class BaseWrapper in mmdeploy/backend/base/base_wrapper.py.

    Example:

    from mmdeploy.utils import Backend
    from ..base import BACKEND_WRAPPER, BaseWrapper
    
    @BACKEND_WRAPPER.register_module(Backend.TENSORRT.value)
    class TRTWrapper(BaseWrapper):
    
  2. The wrapper class can initialize the engine in the __init__ function and run inference in the forward function. Note that the __init__ function must take a parameter output_names and pass it to the base class to determine the order of the output tensors. The input and output variables of forward should be dictionaries denoting the names and values of the tensors (a usage sketch of this contract is given after this list).

  3. For the convenience of performance testing, the class should define an “execute” function that only calls the inference interface of the backend engine. The forward function should call the “execute” function after preprocessing the data.

    Example:

    from typing import Dict, Optional, Sequence

    import onnxruntime as ort
    import torch

    from mmdeploy.utils import Backend
    from mmdeploy.utils.timer import TimeCounter
    from ..base import BACKEND_WRAPPER, BaseWrapper
    
    @BACKEND_WRAPPER.register_module(Backend.ONNXRUNTIME.value)
    class ORTWrapper(BaseWrapper):
    
        def __init__(self,
                     onnx_file: str,
                     device: str,
                     output_names: Optional[Sequence[str]] = None):
            # Initialization
            # ...
            super().__init__(output_names)
    
        def forward(self, inputs: Dict[str,
                                       torch.Tensor]) -> Dict[str, torch.Tensor]:
            # Fetch data
            # ...
    
            self.__ort_execute(self.io_binding)
    
            # Postprocess data
            # ...
    
        @TimeCounter.count_time('onnxruntime')
        def __ort_execute(self, io_binding: ort.IOBinding):
            # Only do the inference
            self.sess.run_with_iobinding(io_binding)
    
  4. Implement build_wrapper method in the backend manager.

    Example:

         @BACKEND_MANAGERS.register('onnxruntime')
         class ONNXRuntimeManager(BaseBackendManager):
    
             @classmethod
             def build_wrapper(cls,
                               backend_files: Sequence[str],
                               device: str = 'cpu',
                               input_names: Optional[Sequence[str]] = None,
                               output_names: Optional[Sequence[str]] = None,
                               deploy_cfg: Optional[Any] = None,
                               **kwargs):
                 from .wrapper import ORTWrapper
                 return ORTWrapper(
                     onnx_file=backend_files[0],
                     device=device,
                     output_names=output_names)
    
  5. Add docstring and unit tests for new code :).
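
The following hedged sketch shows the wrapper contract from step 2 in action, assuming the ORTWrapper shipped in mmdeploy.backend.onnxruntime and a placeholder end2end.onnx whose input tensor is named input and whose output is named output:

import torch

from mmdeploy.backend.onnxruntime import ORTWrapper

wrapper = ORTWrapper('end2end.onnx', device='cpu', output_names=['output'])
outputs = wrapper({'input': torch.rand(1, 3, 224, 224)})  # dict in, dict out
print(outputs['output'].shape)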

Support new backends using MMDeploy as a third party

Previous parts show how to add a new backend in MMDeploy, which requires changing its source code. However, if we treat MMDeploy as a third party, the methods above are no longer efficient. To this end, adding a new backend requires us to pre-install another package named aenum. We can install it directly through pip install aenum.

After installing aenum successfully, we can use it to add a new backend through:

from mmdeploy.utils.constants import Backend
from aenum import extend_enum

try:
    Backend.get('backend_name')
except Exception:
    extend_enum(Backend, 'BACKEND', 'backend_name')

We can run the code above before we use the rewrite logic of MMDeploy.
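
As a hedged follow-up sketch, once the enum has been extended, the new name can be used like a built-in backend, for example when registering a rewriter; whether every registry accepts the extended name depends on the MMDeploy version.

from mmdeploy.core import FUNCTION_REWRITER

@FUNCTION_REWRITER.register_rewriter(
    'torch.Tensor.repeat', backend='backend_name')  # the name extended above
def repeat__backend_name(ctx, input, *size):
    # fall back to the original behavior; customize for the new backend as needed
    return ctx.origin_func(input, *size)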

How to add test units for backend ops

This tutorial introduces how to add unit tests for backend ops. When you add a custom op under backend_ops, you need to add the corresponding test unit. Test units of ops are included in tests/test_ops/test_ops.py.

Prerequisite

  • Compile new ops: After adding a new custom op, you need to recompile the relevant backend, referring to build.md.

1. Add the test program test_XXXX()

You can put unit tests for ops in tests/test_ops/. Usually, the following program template can be used for your custom op.

Example of an op unit test:

@pytest.mark.parametrize('backend', [TEST_TENSORRT, TEST_ONNXRT])        # 1.1 backend test class
@pytest.mark.parametrize('pool_h,pool_w,spatial_scale,sampling_ratio',   # 1.2 set parameters of op
                         [(2, 2, 1.0, 2), (4, 4, 2.0, 4)])               # [(# Examples of op test parameters),...]
def test_roi_align(backend,
                   pool_h,                                               # set parameters of op
                   pool_w,
                   spatial_scale,
                   sampling_ratio,
                   input_list=None,
                   save_dir=None):
    backend.check_env()

    if input_list is None:
        input = torch.rand(1, 1, 16, 16, dtype=torch.float32)            # 1.3 op input data initialization
        single_roi = torch.tensor([[0, 0, 0, 4, 4]], dtype=torch.float32)
    else:
        input = torch.tensor(input_list[0], dtype=torch.float32)
        single_roi = torch.tensor(input_list[1], dtype=torch.float32)

    from mmcv.ops import roi_align

    def wrapped_function(torch_input, torch_rois):                       # 1.4 initialize op model to be tested
        return roi_align(torch_input, torch_rois, (pool_w, pool_h),
                         spatial_scale, sampling_ratio, 'avg', True)

    wrapped_model = WrapFunction(wrapped_function).eval()

    with RewriterContext(cfg={}, backend=backend.backend_name, opset=11): # 1.5 call the backend test class interface
        backend.run_and_validate(
            wrapped_model, [input, single_roi],
            'roi_align',
            input_names=['input', 'rois'],
            output_names=['roi_feat'],
            save_dir=save_dir)

1.1 backend test class

We provide some functions and classes for different backends, such as TestOnnxRTExporter, TestTensorRTExporter, TestNCNNExporter.

1.2 set parameters of op

Set some parameters of the op, such as 'pool_h', 'pool_w', 'spatial_scale' and 'sampling_ratio' in roi_align. You can set multiple groups of parameters to test the op.

1.3 op input data initialization

Initialize the required input data.

1.4 initialize op model to be tested

The model containing the custom op usually has one of two forms.

  • torch model: A torch model with custom operators. Python code related to the op is required; refer to the roi_align unit test.

  • onnx model: An onnx model with custom operators, built by calling the onnx api; refer to the multi_level_roi_align unit test. A minimal sketch of building such a model is shown after this list.
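
For the onnx-model form, here is a minimal hedged sketch of building such a model with the onnx helper API; the op type, domain and shapes are illustrative only, not the actual mmdeploy custom ops.

import onnx
from onnx import TensorProto, helper

# a single-node graph whose node type is a (hypothetical) custom op
node = helper.make_node(
    'CustomRoIAlign', ['input', 'rois'], ['feat'], domain='mmdeploy')
graph = helper.make_graph(
    [node], 'custom_op_graph',
    inputs=[
        helper.make_tensor_value_info('input', TensorProto.FLOAT, [1, 1, 16, 16]),
        helper.make_tensor_value_info('rois', TensorProto.FLOAT, [1, 5])
    ],
    outputs=[
        helper.make_tensor_value_info('feat', TensorProto.FLOAT, [1, 1, 2, 2])
    ])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid('', 11)])
onnx.save(model, 'custom_op.onnx')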

1.5 call the backend test class interface

Call run_and_validate of the backend test class to run the op on the backend and verify its output.

    def run_and_validate(self,
                         model,
                         input_list,
                         model_name='tmp',
                         tolerate_small_mismatch=False,
                         do_constant_folding=True,
                         dynamic_axes=None,
                         output_names=None,
                         input_names=None,
                         expected_result=None,
                         save_dir=None):
Parameter Description
  • model: Input model to be tested and it can be torch model or any other backend model.

  • input_list: List of test data, which is mapped to the order of input_names.

  • model_name: The name of the model.

  • tolerate_small_mismatch: Whether to allow small errors in the verification of results.

  • do_constant_folding: Whether to apply constant folding to optimize the model.

  • dynamic_axes: If you need to use dynamic dimensions, enter the dimension information.

  • output_names: The node name of the output node.

  • input_names: The node name of the input node.

  • expected_result: Expected ground truth values for verification.

  • save_dir: The folder used to save the output files.

2. Test Methods

Use pytest to call the test function to test ops.

pytest tests/test_ops/test_ops.py::test_XXXX

How to test rewritten models

After you create a rewritten model using our rewriter, it’s better to write a unit test for the model to validate if the model rewrite would come into effect. Generally, we need to get outputs of the original model and rewritten model, then compare them. The outputs of the original model can be acquired directly by calling the forward function of the model, whereas the way to generate the outputs of the rewritten model depends on the complexity of the rewritten model.

Test rewritten model with small changes

If the changes to the model are small (e.g., only changing the behavior of one or two variables without introducing side effects), you can construct the input arguments for the rewritten functions/modules, run the model's inference in RewriterContext and check the results.

# mmcls.models.classifiers.base.py
class BaseClassifier(BaseModule, metaclass=ABCMeta):
    def forward(self, img, return_loss=True, **kwargs):
        if return_loss:
            return self.forward_train(img, **kwargs)
        else:
            return self.forward_test(img, **kwargs)

# Custom rewritten function
@FUNCTION_REWRITER.register_rewriter(
    'mmcls.models.classifiers.BaseClassifier.forward', backend='default')
def forward_of_base_classifier(ctx, self, img, *args, **kwargs):
    """Rewrite `forward` for default backend."""
    return self.simple_test(img, {})

In the example, we only change the function that forward calls. We can test this rewritten function by writing the following test function:

def test_baseclassfier_forward():
    input = torch.rand(1)
    from mmcls.models.classifiers import BaseClassifier
    class DummyClassifier(BaseClassifier):

        def __init__(self, init_cfg=None):
            super().__init__(init_cfg=init_cfg)

        def extract_feat(self, imgs):
            pass

        def forward_train(self, imgs):
            return 'train'

        def simple_test(self, img, tmp, **kwargs):
            return 'simple_test'

    model = DummyClassifier().eval()

    model_output = model(input)
    with RewriterContext(cfg=dict()), torch.no_grad():
        backend_output = model(input)

    assert model_output == 'train'
    assert backend_output == 'simple_test'

In this test function, we construct a derived class of BaseClassifier to test if the rewritten model would work in the rewrite context. We get the outputs of the original model by directly calling model(input) and get the outputs of the rewritten model by calling model(input) in RewriterContext. Finally, we can check the outputs by asserting their values.

Test rewritten model with big changes

In the first example, the output is generated in Python. Sometimes we may make big changes to original model functions (e.g., eliminate branch statements to generate correct computing graph). Even if the outputs of a rewritten model running in Python are correct, we cannot assure that the rewritten model can work as expected in the backend. Therefore, we need to test the rewritten model in the backend.

# Custom rewritten function
@FUNCTION_REWRITER.register_rewriter(
    func_name='mmseg.models.segmentors.BaseSegmentor.forward')
def base_segmentor__forward(ctx, self, img, img_metas=None, **kwargs):
    if img_metas is None:
        img_metas = {}
    assert isinstance(img_metas, dict)
    assert isinstance(img, torch.Tensor)

    deploy_cfg = ctx.cfg
    is_dynamic_flag = is_dynamic_shape(deploy_cfg)
    img_shape = img.shape[2:]
    if not is_dynamic_flag:
        img_shape = [int(val) for val in img_shape]
    img_metas['img_shape'] = img_shape
    return self.simple_test(img, img_metas, **kwargs)

The behavior of this rewritten function is complex. We should test it as follows:

def test_basesegmentor_forward():
    from mmdeploy.utils.test import (WrapModel, get_model_outputs,
                                    get_rewrite_outputs)

    segmentor = get_model()
    segmentor.cpu().eval()

    # Prepare data
    # ...

    # Get the outputs of original model
    model_inputs = {
        'img': [imgs],
        'img_metas': [img_metas],
        'return_loss': False
    }
    model_outputs = get_model_outputs(segmentor, 'forward', model_inputs)

    # Get the outputs of rewritten model
    wrapped_model = WrapModel(segmentor, 'forward', img_metas = None, return_loss = False)
    rewrite_inputs = {'img': imgs}
    rewrite_outputs, is_backend_output = get_rewrite_outputs(
        wrapped_model=wrapped_model,
        model_inputs=rewrite_inputs,
        deploy_cfg=deploy_cfg)
    if is_backend_output:
        # If the backend plugins have been installed, the rewrite outputs are
        # generated by backend.
        rewrite_outputs = torch.tensor(rewrite_outputs)
        model_outputs = torch.tensor(model_outputs)
        model_outputs = model_outputs.unsqueeze(0).unsqueeze(0)
        assert torch.allclose(rewrite_outputs, model_outputs)
    else:
        # Otherwise, the outputs are generated by python.
        assert rewrite_outputs is not None

We provide some utilities to test rewritten functions. At first, you can construct a model and call get_model_outputs to get outputs of the original model. Then you can wrap the rewritten function with WrapModel, which serves as a partial function, and get the results with get_rewrite_outputs. get_rewrite_outputs returns two values that indicate the content of outputs and whether the outputs come from the backend. Because we cannot assume that everyone has installed the backend, we should check if the results are generated by a Python or backend engine. The unit test must cover both conditions. Finally, we should compare the original and rewritten outputs, which may be done simply by calling torch.allclose.

Note

To learn the complete usage of the test utilities, please refer to our apis document.

How to get partitioned ONNX models

MMDeploy supports exporting PyTorch models to partitioned onnx models. With this feature, users can define their partition policy and get partitioned onnx models with ease. In this tutorial, we will briefly introduce how to partition a model step by step. In the example, we break the YOLOV3 model into two parts and extract the first part, without the post-processing (such as anchor generation and NMS), into the onnx model.

Step 1: Mark inputs/outputs

To support the model partition, we need to add Mark nodes in the ONNX model. This could be done with mmdeploy’s @mark decorator. Note that to make the mark work, the marking operation should be included in a rewriting function.

At first, we mark the model input, which can be done by marking the input tensor img in the forward method of the BaseDetector class, the parent class of all detector classes. Thus we name this marking point detector_forward and mark the inputs as input. Since there can be three outputs for detectors such as Mask R-CNN, the outputs are marked as dets, labels, and masks. The following code shows the idea of adding mark functions and calling them in the rewrite. For the source code, you can refer to mmdeploy/codebase/mmdet/models/detectors/base.py

from mmdeploy.core import FUNCTION_REWRITER, mark

@mark(
    'detector_forward', inputs=['input'], outputs=['dets', 'labels', 'masks'])
def __forward_impl(ctx, self, img, img_metas=None, **kwargs):
    ...


@FUNCTION_REWRITER.register_rewriter(
    'mmdet.models.detectors.base.BaseDetector.forward')
def base_detector__forward(ctx, self, img, img_metas=None, **kwargs):
    ...
    # call the mark function
    return __forward_impl(...)

Then, we have to mark the output feature of YOLOV3Head, which is the input argument pred_maps in the get_bboxes method of the YOLOV3Head class. We can add an internal function to only mark pred_maps inside the yolov3_head__get_bboxes function as follows.

from mmdeploy.core import FUNCTION_REWRITER, mark

@FUNCTION_REWRITER.register_rewriter(
    func_name='mmdet.models.dense_heads.YOLOV3Head.get_bboxes')
def yolov3_head__get_bboxes(ctx,
                            self,
                            pred_maps,
                            img_metas,
                            cfg=None,
                            rescale=False,
                            with_nms=True):
    # mark pred_maps
    @mark('yolo_head', inputs=['pred_maps'])
    def __mark_pred_maps(pred_maps):
        return pred_maps
    pred_maps = __mark_pred_maps(pred_maps)
    ...

Note that pred_maps is a list of Tensor with three elements. Thus, three Mark nodes with op names pred_maps.0, pred_maps.1 and pred_maps.2 would be added to the onnx model.

Step 2: Add partition config

After marking necessary nodes that would be used to split the model, we could add a deployment config file configs/mmdet/detection/yolov3_partition_onnxruntime_static.py. If you are not familiar with how to write config, you could check write_config.md.

In the config file, we need to add partition_config. The key part is partition_cfg, which contains a list of dicts designating the start nodes and end nodes of each model segment. Since we only want to keep YOLOV3 without the post-processing, we can set start to ['detector_forward:input'] and end to ['yolo_head:input']. Note that start and end can have multiple marks.

_base_ = ['./detection_onnxruntime_static.py']

onnx_config = dict(input_shape=[608, 608])
partition_config = dict(
    type='yolov3_partition', # the partition policy name
    apply_marks=True, # should always be set to True
    partition_cfg=[
        dict(
            save_file='yolov3.onnx', # filename to save the partitioned onnx model
            start=['detector_forward:input'], # [mark_name:input/output, ...]
            end=['yolo_head:input'],  # [mark_name:input/output, ...]
            output_names=[f'pred_maps.{i}' for i in range(3)]) # output names
    ])

Step 3: Get partitioned onnx models

Once we have the node marks and the deployment config with partition_config set properly, we can use the tool torch2onnx to export the model to onnx and get the partitioned onnx files.

python tools/torch2onnx.py \
configs/mmdet/detection/yolov3_partition_onnxruntime_static.py \
../mmdetection/configs/yolo/yolov3_d53_mstrain-608_273e_coco.py \
https://download.openmmlab.com/mmdetection/v2.0/yolo/yolov3_d53_mstrain-608_273e_coco/yolov3_d53_mstrain-608_273e_coco_20210518_115020-a2c3acb8.pth \
../mmdetection/demo/demo.jpg \
--work-dir ./work-dirs/mmdet/yolov3/ort/partition

After running the script above, we would have the partitioned onnx file yolov3.onnx in the work-dir. You can use the visualization tool netron to check the model structure.

With the partitioned onnx file, you can refer to useful_tools.md for subsequent procedures such as mmdeploy_onnx2ncnn and onnx2tensorrt.

How to do regression test

This tutorial describes how to run the regression test. The deployment configuration file contains the codebase config and the inference config.

1. Python Environment

pip install -r requirements/tests.txt

If pip throws an exception, try to upgrade numpy.

pip install -U numpy

2. Usage

python ./tools/regression_test.py \
    --codebase "${CODEBASE_NAME}" \
    --backends "${BACKEND}" \
    [--models "${MODELS}"] \
    --work-dir "${WORK_DIR}" \
    --device "${DEVICE}" \
    --log-level INFO \
    [--performance  -p] \
    [--checkpoint-dir "$CHECKPOINT_DIR"]

Description

  • --codebase : The codebase to test, e.g. mmdet. If you want to test multiple codebases, use mmcls mmdet ...

  • --backends : The backend to test. By default, all backends are tested. You can use onnxruntime tensorrt to choose several backends. If you also need to test the SDK, you need to configure the sdk_config in tests/regression/${codebase}.yml.

  • --models : Specify the models to be tested. All models in the yml are tested by default. You can also give some model names. For the model names, please refer to the relevant yml configuration file, for example ResNet SE-ResNet "Mask R-CNN". Model names can only contain numbers and letters.

  • --work-dir : The directory of model convert and report, use ../mmdeploy_regression_working_dir by default.

  • --checkpoint-dir: The path of downloaded torch model, use ../mmdeploy_checkpoints by default.

  • --device : The device type, cuda by default.

  • --log-level : These options are available:'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG',  'NOTSET'. The default value is INFO.

  • -p or --performance : Test precision or not. If not enabled, only model convert would be tested.

Notes

For Windows user:

  1. To use the && connector in shell commands, you need to download PowerShell 7 Preview 5+.

  2. If you are using conda env, you may need to change python3 to python in regression_test.py because there is python3.exe in %USERPROFILE%\AppData\Local\Microsoft\WindowsApps directory.

Example

  1. Test all backends of mmdet and mmpose for model convert and precision

python ./tools/regression_test.py \
    --codebase mmdet mmpose \
    --work-dir "../mmdeploy_regression_working_dir" \
    --device "cuda" \
    --log-level INFO \
    --performance
  2. Test model convert and precision of some backends of mmdet and mmpose

python ./tools/regression_test.py \
    --codebase mmdet mmpose \
    --backends onnxruntime tensorrt \
    --work-dir "../mmdeploy_regression_working_dir" \
    --device "cuda" \
    --log-level INFO \
    -p
  3. Test some backends of mmdet and mmpose, only test model convert

python ./tools/regression_test.py \
    --codebase mmdet mmpose \
    --backends onnxruntime tensorrt \
    --work-dir "../mmdeploy_regression_working_dir" \
    --device "cuda" \
    --log-level INFO
  4. Test some models of mmdet and mmcls, only test model convert

python ./tools/regression_test.py \
    --codebase mmdet mmcls \
    --models ResNet SE-ResNet "Mask R-CNN" \
    --work-dir "../mmdeploy_regression_working_dir" \
    --device "cuda" \
    --log-level INFO

3. Regression Test Configuration

Example and parameter description

globals:
  codebase_dir: ../mmocr # codebase path to test
  checkpoint_force_download: False # whether to redownload the model even if it already exists
  images:
    img_densetext_det: &img_densetext_det ../mmocr/demo/demo_densetext_det.jpg
    img_demo_text_det: &img_demo_text_det ../mmocr/demo/demo_text_det.jpg
    img_demo_text_ocr: &img_demo_text_ocr ../mmocr/demo/demo_text_ocr.jpg
    img_demo_text_recog: &img_demo_text_recog ../mmocr/demo/demo_text_recog.jpg
  metric_info: &metric_info
    hmean-iou: # metafile.Results.Metrics
      eval_name: hmean-iou #  test.py --metrics args
      metric_key: 0_hmean-iou:hmean # the key name of eval log
      tolerance: 0.1 # tolerated threshold interval
      task_name: Text Detection # the name of metafile.Results.Task
      dataset: ICDAR2015 # the name of metafile.Results.Dataset
    word_acc: # same as hmean-iou, also a kind of metric
      eval_name: acc
      metric_key: 0_word_acc_ignore_case
      tolerance: 0.2
      task_name: Text Recognition
      dataset: IIIT5K
  convert_image_det: &convert_image_det # the image that will be used by detection model convert
    input_img: *img_densetext_det
    test_img: *img_demo_text_det
  convert_image_rec: &convert_image_rec
    input_img: *img_demo_text_recog
    test_img: *img_demo_text_recog
  backend_test: &default_backend_test True # whether test model precision for backend
  sdk: # SDK config
    sdk_detection_dynamic: &sdk_detection_dynamic configs/mmocr/text-detection/text-detection_sdk_dynamic.py
    sdk_recognition_dynamic: &sdk_recognition_dynamic configs/mmocr/text-recognition/text-recognition_sdk_dynamic.py

onnxruntime:
  pipeline_ort_recognition_static_fp32: &pipeline_ort_recognition_static_fp32
    convert_image: *convert_image_rec # the image used by model conversion
    backend_test: *default_backend_test # whether inference on the backend
    sdk_config: *sdk_recognition_dynamic # test SDK or not. If it exists, use a specific SDK config for testing
    deploy_config: configs/mmocr/text-recognition/text-recognition_onnxruntime_static.py # the deploy cfg path to use, based on mmdeploy path

  pipeline_ort_recognition_dynamic_fp32: &pipeline_ort_recognition_dynamic_fp32
    convert_image: *convert_image_rec
    backend_test: *default_backend_test
    sdk_config: *sdk_recognition_dynamic
    deploy_config: configs/mmocr/text-recognition/text-recognition_onnxruntime_dynamic.py

  pipeline_ort_detection_dynamic_fp32: &pipeline_ort_detection_dynamic_fp32
    convert_image: *convert_image_det
    deploy_config: configs/mmocr/text-detection/text-detection_onnxruntime_dynamic.py

tensorrt:
  pipeline_trt_recognition_dynamic_fp16: &pipeline_trt_recognition_dynamic_fp16
    convert_image: *convert_image_rec
    backend_test: *default_backend_test
    sdk_config: *sdk_recognition_dynamic
    deploy_config: configs/mmocr/text-recognition/text-recognition_tensorrt-fp16_dynamic-1x32x32-1x32x640.py

  pipeline_trt_detection_dynamic_fp16: &pipeline_trt_detection_dynamic_fp16
    convert_image: *convert_image_det
    backend_test: *default_backend_test
    sdk_config: *sdk_detection_dynamic
    deploy_config: configs/mmocr/text-detection/text-detection_tensorrt-fp16_dynamic-320x320-2240x2240.py

openvino:
  # same as onnxruntime backend configuration
ncnn:
  # same as onnxruntime backend configuration
pplnn:
  # same as onnxruntime backend configuration
torchscript:
  # same as onnxruntime backend configuration


models:
  - name: crnn # model name
    metafile: configs/textrecog/crnn/metafile.yml # the path of model metafile, based on codebase path
    codebase_model_config_dir: configs/textrecog/crnn # the basepath of `model_configs`, based on codebase path
    model_configs: # the config name to test
      - crnn_academic_dataset.py
    pipelines: # pipeline name
      - *pipeline_ort_recognition_dynamic_fp32

  - name: dbnet
    metafile: configs/textdet/dbnet/metafile.yml
    codebase_model_config_dir: configs/textdet/dbnet
    model_configs:
      - dbnet_r18_fpnc_1200e_icdar2015.py
    pipelines:
      - *pipeline_ort_detection_dynamic_fp32
      - *pipeline_trt_detection_dynamic_fp16

      # special pipeline can be added like this
      - convert_image: xxx
        backend_test: xxx
        sdk_config: xxx
        deploy_config: configs/mmocr/text-detection/xxx

4. Generated Report

This is an example of mmocr regression test report.

Model Model Config Task Checkpoint Dataset Backend Deploy Config Static or Dynamic Precision Type Conversion Result hmean-iou word_acc Test Pass
0 crnn ../mmocr/configs/textrecog/crnn/crnn_academic_dataset.py Text Recognition ../mmdeploy_checkpoints/mmocr/crnn/crnn_academic-a723a1c5.pth IIIT5K Pytorch - - - - - 80.5 -
1 crnn ../mmocr/configs/textrecog/crnn/crnn_academic_dataset.py Text Recognition ${WORK_DIR}/mmocr/crnn/onnxruntime/static/crnn_academic-a723a1c5/end2end.onnx x onnxruntime configs/mmocr/text-recognition/text-recognition_onnxruntime_dynamic.py static fp32 True - 80.67 True
2 crnn ../mmocr/configs/textrecog/crnn/crnn_academic_dataset.py Text Recognition ${WORK_DIR}/mmocr/crnn/onnxruntime/static/crnn_academic-a723a1c5 x SDK-onnxruntime configs/mmocr/text-recognition/text-recognition_sdk_dynamic.py static fp32 True - x False
3 dbnet ../mmocr/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py Text Detection ../mmdeploy_checkpoints/mmocr/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth ICDAR2015 Pytorch - - - - 0.795 - -
4 dbnet ../mmocr/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py Text Detection ../mmdeploy_checkpoints/mmocr/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth ICDAR onnxruntime configs/mmocr/text-detection/text-detection_onnxruntime_dynamic.py dynamic fp32 True - - True
5 dbnet ../mmocr/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py Text Detection ${WORK_DIR}/mmocr/dbnet/tensorrt/dynamic/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597/end2end.engine ICDAR tensorrt configs/mmocr/text-detection/text-detection_tensorrt-fp16_dynamic-320x320-2240x2240.py dynamic fp16 True 0.793302 - True
6 dbnet ../mmocr/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py Text Detection ${WORK_DIR}/mmocr/dbnet/tensorrt/dynamic/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597 ICDAR SDK-tensorrt configs/mmocr/text-detection/text-detection_sdk_dynamic.py dynamic fp16 True 0.795073 - True

5. Supported Backends

  • [x] ONNX Runtime

  • [x] TensorRT

  • [x] PPLNN

  • [x] ncnn

  • [x] OpenVINO

  • [x] TorchScript

  • [x] SNPE

  • [x] MMDeploy SDK

6. Supported Codebase and Metrics

Codebase   Metric     Support
mmdet      bbox       Y
mmdet      segm       Y
mmdet      PQ         N
mmcls      accuracy   Y
mmseg      mIoU       Y
mmpose     AR         Y
mmpose     AP         Y
mmocr      hmean      Y
mmocr      acc        Y
mmedit     PSNR       Y
mmedit     SSIM       Y

ONNX export Optimizer

This is a tool to optimize ONNX models when exporting from PyTorch.

Installation

Build MMDeploy with torchscript support:

export Torch_DIR=$(python -c "import torch;print(torch.utils.cmake_prefix_path + '/Torch')")

cmake \
    -DTorch_DIR=${Torch_DIR} \
    -DMMDEPLOY_TARGET_BACKENDS="${your_backend};torchscript" \
    .. # You can also add other build flags if you need

cmake --build . -- -j$(nproc) && cmake --install .

Usage

# import model_to_graph__custom_optimizer so we can hijack onnx.export
from mmdeploy.apis.onnx.optimizer import model_to_graph__custom_optimizer # noqa
from mmdeploy.core import RewriterContext
from mmdeploy.apis.onnx.passes import optimize_onnx

# load your model here
model = create_model()

# export with ONNX Optimizer
x = create_dummy_input()
with RewriterContext({}, onnx_custom_passes=optimize_onnx):
    torch.onnx.export(model, x, output_path)

The model would be optimized after export.

You can also define your own optimizer:

# create the optimize callback
def _optimize_onnx(graph, params_dict, torch_out):
    from mmdeploy.backend.torchscript import ts_optimizer
    ts_optimizer.onnx._jit_pass_onnx_peephole(graph)
    return graph, params_dict, torch_out

with RewriterContext({}, onnx_custom_passes=_optimize_onnx):
    # export your model here, e.g.
    torch.onnx.export(model, x, output_path)

Cross compile snpe inference server on Ubuntu 18

mmdeploy provides a prebuilt package. If you want to compile it yourself, or need to modify the .proto file, you can refer to this document.

Note that the official gRPC documentation does not have complete support for the NDK.

1. Environment

Item                 Version          Remarks
snpe                 1.59             1.60 uses clang-8.0, which may cause compatibility issues
host OS              ubuntu18.04      version required by snpe 1.59
NDK                  r17c             version required by snpe 1.59
gRPC                 commit 6f698b5   -
Hardware equipment   qcom888          qcom chip required

2. Cross compile gRPC with NDK

  1. Pull gRPC repo, compile protoc and grpc_cpp_plugin on host

# Install dependencies
$ apt-get update && apt-get install -y libssl-dev
# Compile
$ git clone https://github.com/grpc/grpc --recursive=1 --depth=1
$ mkdir -p cmake/build
$ pushd cmake/build

$ cmake \
  -DCMAKE_BUILD_TYPE=Release \
  -DgRPC_INSTALL=ON \
  -DgRPC_BUILD_TESTS=OFF \
  -DgRPC_SSL_PROVIDER=package \
  ../..
# Install to host
$ make -j
$ sudo make install
  2. Download the NDK and cross-compile the static libraries for the android aarch64 architecture

$ wget https://dl.google.com/android/repository/android-ndk-r17c-linux-x86_64.zip
$ unzip android-ndk-r17c-linux-x86_64.zip

$ export ANDROID_NDK=/path/to/android-ndk-r17c

$ cd /path/to/grpc
$ mkdir -p cmake/build_aarch64  && pushd cmake/build_aarch64

$ cmake ../.. \
 -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
 -DANDROID_ABI=arm64-v8a \
 -DANDROID_PLATFORM=android-26 \
 -DANDROID_TOOLCHAIN=clang \
 -DANDROID_STL=c++_shared \
 -DCMAKE_BUILD_TYPE=Release \
 -DCMAKE_INSTALL_PREFIX=/tmp/android_grpc_install_shared

$ make -j
$ make install
  3. At this point, /tmp/android_grpc_install_shared should contain the complete installation files

$ cd /tmp/android_grpc_install_shared
$ tree -L 1
.
├── bin
├── include
├── lib
└── share

3. (Skippable) Self-test whether NDK gRPC is available

  1. Compile the helloworld that comes with gRPC

$ cd /path/to/grpc/examples/cpp/helloworld/
$ mkdir cmake/build_aarch64 -p && pushd cmake/build_aarch64

$ cmake ../.. \
 -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
 -DANDROID_ABI=arm64-v8a \
 -DANDROID_PLATFORM=android-26 \
 -DANDROID_STL=c++_shared \
 -DANDROID_TOOLCHAIN=clang \
 -DCMAKE_BUILD_TYPE=Release \
 -Dabsl_DIR=/tmp/android_grpc_install_shared/lib/cmake/absl \
 -DProtobuf_DIR=/tmp/android_grpc_install_shared/lib/cmake/protobuf \
 -DgRPC_DIR=/tmp/android_grpc_install_shared/lib/cmake/grpc

$ make -j
$ ls greeter*
greeter_async_client   greeter_async_server     greeter_callback_server  greeter_server
greeter_async_client2  greeter_callback_client  greeter_client
  2. Turn on debug mode on your phone, push the binary to /data/local/tmp

$ adb push greeter* /data/local/tmp
  3. adb shell into the phone, execute client/server

/data/local/tmp $ ./greeter_client
Greeter received: Hello world

4. Cross compile snpe inference server

  1. Open the snpe tools website and download version 1.59. Unzip and set environment variables

Note that snpe >= 1.60 starts using clang-8.0, which may cause incompatibility with libc++_shared.so on older devices.

$ export SNPE_ROOT=/path/to/snpe-1.59.0.3230
  2. Open the snpe server directory within mmdeploy and use the options from the gRPC cross-compilation above

$ cd /path/to/mmdeploy
$ cd service/snpe/server

$ mkdir -p build && cd build
$ export ANDROID_NDK=/path/to/android-ndk-r17c
$ cmake .. \
 -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
 -DANDROID_ABI=arm64-v8a \
 -DANDROID_PLATFORM=android-26 \
 -DANDROID_STL=c++_shared \
 -DANDROID_TOOLCHAIN=clang \
 -DCMAKE_BUILD_TYPE=Release \
 -Dabsl_DIR=/tmp/android_grpc_install_shared/lib/cmake/absl \
 -DProtobuf_DIR=/tmp/android_grpc_install_shared/lib/cmake/protobuf \
 -DgRPC_DIR=/tmp/android_grpc_install_shared/lib/cmake/grpc

 $ make -j
 $ file inference_server
inference_server: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /system/bin/linker64, BuildID[sha1]=252aa04e2b982681603dacb74b571be2851176d2, with debug_info, not stripped

Finally, you can see inference_server; adb push it to the device and execute it.

5. Regenerate the proto interface

If you have changed inference.proto, you need to regenerate the .cpp and .py interfaces

$ python3 -m pip install grpc_tools --user
$ python3 -m  grpc_tools.protoc -I./ --python_out=./client/ --grpc_python_out=./client/ inference.proto

$ ln -s `which protoc-gen-grpc`
$ protoc --cpp_out=./ --grpc_out=./  --plugin=protoc-gen-grpc=grpc_cpp_plugin  inference.proto

Reference

  • snpe tutorial https://developer.qualcomm.com/sites/default/files/docs/snpe/cplus_plus_tutorial.html

  • gRPC cross build script https://raw.githubusercontent.com/grpc/grpc/master/test/distrib/cpp/run_distrib_test_cmake_aarch64_cross.sh

  • stackoverflow https://stackoverflow.com/questions/54052229/build-grpc-c-for-android-using-ndk-arm-linux-androideabi-clang-compiler

Frequently Asked Questions

TensorRT

  • “WARNING: Half2 support requested on hardware without native FP16 support, performance will be negatively affected.”

    Fp16 mode requires a device with full-rate fp16 support.

  • “error: parameter check failed at: engine.cpp::setBindingDimensions::1046, condition: profileMinDims.d[i] <= dimensions.d[i]”

    When building an ICudaEngine from an INetworkDefinition that has dynamically resizable inputs, users need to specify at least one optimization profile, which can be set in the deploy config:

    backend_config = dict(
      common_config=dict(max_workspace_size=1 << 30),
      model_inputs=[
          dict(
              input_shapes=dict(
                  input=dict(
                      min_shape=[1, 3, 320, 320],
                      opt_shape=[1, 3, 800, 1344],
                      max_shape=[1, 3, 1344, 1344])))
      ])
    

    The input tensor shape should be limited between min_shape and max_shape.

  • “error: [TensorRT] INTERNAL ERROR: Assertion failed: cublasStatus == CUBLAS_STATUS_SUCCESS”

    TRT 7.2.1 switches to use cuBLASLt (previously it was cuBLAS). cuBLASLt is the default choice for SM version >= 7.0. You may need CUDA-10.2 Patch 1 (released Aug 26, 2020) to resolve some cuBLASLt issues. Another option is to use the new TacticSource API and disable cuBLASLt tactics if you don't want to upgrade.

Libtorch

  • Error: libtorch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN libraries.  Please set the proper cuDNN prefixes and / or install cuDNN.

    You may export CUDNN_ROOT=/root/path/to/cudnn to resolve the build error.

Windows

  • Error: similar like this OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\Users\cx\miniconda3\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies

    Solution: according to this post, the issue may be caused by NVIDIA and will be fixed in CUDA release 11.7. For now, one can use the fixNvPe.py script to modify the nvidia dlls in the pytorch lib dir.

    python fixNvPe.py --input=C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\lib\*.dll

    You can find your pytorch installation path with:

    import torch
    print(torch.__file__)
    
  • enable_language(CUDA) error

    -- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19044.
    -- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1 (found version "11.1")
    CMake Error at C:/Software/cmake/cmake-3.23.1-windows-x86_64/share/cmake-3.23/Modules/CMakeDetermineCompilerId.cmake:491 (message):
      No CUDA toolset found.
    Call Stack (most recent call first):
      C:/Software/cmake/cmake-3.23.1-windows-x86_64/share/cmake-3.23/Modules/CMakeDetermineCompilerId.cmake:6 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
      C:/Software/cmake/cmake-3.23.1-windows-x86_64/share/cmake-3.23/Modules/CMakeDetermineCompilerId.cmake:59 (__determine_compiler_id_test)
      C:/Software/cmake/cmake-3.23.1-windows-x86_64/share/cmake-3.23/Modules/CMakeDetermineCUDACompiler.cmake:339 (CMAKE_DETERMINE_COMPILER_ID)
      C:/workspace/mmdeploy-0.6.0-windows-amd64-cuda11.1-tensorrt8.2.3.0/sdk/lib/cmake/MMDeploy/MMDeployConfig.cmake:27 (enable_language)
      CMakeLists.txt:5 (find_package)
    

    Cause: CUDA Toolkit 11.1 was installed before Visual Studio, so the VS plugin was not installed. Or the version of VS is too new, so the installation of the VS plugin was skipped during the installation of the CUDA Toolkit.

    Solution: This problem can be solved by manually copying the four files in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\extras\visual_studio_integration\MSBuildExtensions to C:\Software\Microsoft Visual Studio\2022\Community\Msbuild\Microsoft\VC\v170\BuildCustomizations. The specific path should be changed according to the actual situation.

ONNX Runtime

  • Under Windows system, when visualizing model inference result failed with the following error:

    onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Failed to load library, error code: 193
    

    Cause: On the latest Windows systems, there are two onnxruntime.dll files under the system path, and they will be loaded first, causing conflicts.

    C:\Windows\SysWOW64\onnxruntime.dll
    C:\Windows\System32\onnxruntime.dll
    

    Solution: Choose one of the following two options

    1. Copy the dll in the lib directory of the downloaded onnxruntime to the directory where mmdeploy_onnxruntime_ops.dll is located (it is recommended to use Everything to search for the ops dll). For example, copy onnxruntime's lib/onnxruntime.dll to mmdeploy/lib; the mmdeploy/lib directory should then look like this

      `-- mmdeploy_onnxruntime_ops.dll
      `-- mmdeploy_onnxruntime_ops.lib
      `-- onnxruntime.dll
      
    2. Rename the two dlls in the system path so that they cannot be loaded.

Pip

  • pip installed package but could not import them.

    Make sure you are using the pip from your conda environment.

    $ which pip
    # /path/to/.local/bin/pip
    /path/to/miniconda3/lib/python3.9/site-packages/pip
    

