Welcome to MMDeploy’s documentation!¶
You can switch between Chinese and English documents in the lower-left corner of the layout.
Get Started¶
MMDeploy provides useful tools for deploying OpenMMLab models to various platforms and devices.
With their help, you can not only deploy models using our pre-defined pipelines but also customize your own deployment pipeline.
Introduction¶
In MMDeploy, the deployment pipeline can be illustrated as a sequence of modules, i.e., Model Converter, MMDeploy Model and Inference SDK.
Model Converter¶
Model Converter aims at converting training models from OpenMMLab into backend models that can be run on target devices. It is able to transform a PyTorch model into an IR model, i.e., ONNX or TorchScript, as well as convert the IR model into a backend model. By combining these two steps, we can achieve one-click end-to-end model deployment.
MMDeploy Model¶
MMDeploy Model is the result package exported by Model Converter. Besides the backend models, it also includes the model meta info, which will be used by the Inference SDK.
Inference SDK¶
Inference SDK is developed in C/C++, wrapping the preprocessing, model forward and postprocessing modules of model inference. It provides FFIs such as C, C++, Python, C#, Java and so on.
Prerequisites¶
In order to do an end-to-end model deployment, MMDeploy requires Python 3.6+ and PyTorch 1.8+.
Step 0. Download and install Miniconda from the official website.
Step 1. Create a conda environment and activate it.
conda create --name mmdeploy python=3.8 -y
conda activate mmdeploy
Step 2. Install PyTorch following official instructions, e.g.
On GPU platforms:
conda install pytorch=={pytorch_version} torchvision=={torchvision_version} cudatoolkit={cudatoolkit_version} -c pytorch -c conda-forge
On CPU platforms:
conda install pytorch=={pytorch_version} torchvision=={torchvision_version} cpuonly -c pytorch
Note
On GPU platforms, please ensure that {cudatoolkit_version} matches your host CUDA toolkit version. Otherwise, it may cause conflicts when deploying models with TensorRT.
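If you are unsure which CUDA version your PyTorch build targets, a quick check from Python (assuming PyTorch is already installed) is:
import torch
print(torch.__version__)          # e.g. 1.8.1
print(torch.version.cuda)         # CUDA version PyTorch was compiled against, e.g. 11.1
print(torch.cuda.is_available())  # True if the GPU and driver are usable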
Installation¶
We recommend that users follow our best practices to install MMDeploy.
Step 0. Install MMCV.
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0rc2"
Step 1. Install MMDeploy and the inference engine(s).
We recommend using the MMDeploy precompiled packages as our best practice. Currently, the model converter and the SDK inference are available as PyPI packages, and the SDK C/C++ library is provided here. You can download them according to your target platform and device.
The supported platform and device matrix is presented as follows:
OS-Arch | Device | ONNX Runtime | TensorRT |
---|---|---|---|
Linux-x86_64 | CPU | Y | N/A |
Linux-x86_64 | CUDA | Y | Y |
Windows-x86_64 | CPU | Y | N/A |
Windows-x86_64 | CUDA | Y | Y |
Note: if the MMDeploy prebuilt packages don't meet your target platform or device, please build MMDeploy from source.
Taking the latest precompiled package as an example, you can install it as follows:
Linux-x86_64
# 1. install MMDeploy model converter
pip install mmdeploy==1.3.0
# 2. install MMDeploy sdk inference
# install one of the following two packages according to whether you need GPU inference
# 2.1 support onnxruntime
pip install mmdeploy-runtime==1.3.0
# 2.2 support onnxruntime-gpu, tensorrt
pip install mmdeploy-runtime-gpu==1.3.0
# 3. install inference engine
# 3.1 install TensorRT
# !!! If you want to convert a tensorrt model or inference with tensorrt,
# download TensorRT-8.2.3.0 CUDA 11.x tar package from NVIDIA, and extract it to the current directory
pip install TensorRT-8.2.3.0/python/tensorrt-8.2.3.0-cp38-none-linux_x86_64.whl
pip install pycuda
export TENSORRT_DIR=$(pwd)/TensorRT-8.2.3.0
export LD_LIBRARY_PATH=${TENSORRT_DIR}/lib:$LD_LIBRARY_PATH
# !!! Moreover, download cuDNN 8.2.1 CUDA 11.x tar package from NVIDIA, and extract it to the current directory
export CUDNN_DIR=$(pwd)/cuda
export LD_LIBRARY_PATH=$CUDNN_DIR/lib64:$LD_LIBRARY_PATH
# 3.2 install ONNX Runtime
# install one of the following two packages according to whether you need GPU inference
# 3.2.1 onnxruntime
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
export ONNXRUNTIME_DIR=$(pwd)/onnxruntime-linux-x64-1.8.1
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
# 3.2.2 onnxruntime-gpu
pip install onnxruntime-gpu==1.8.1
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-gpu-1.8.1.tgz
tar -zxvf onnxruntime-linux-x64-gpu-1.8.1.tgz
export ONNXRUNTIME_DIR=$(pwd)/onnxruntime-linux-x64-gpu-1.8.1
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
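As a quick sanity check after the steps above, you can verify that the converter and the runtime import correctly (the version string is an example):
import mmdeploy
import mmdeploy_runtime  # provided by mmdeploy-runtime or mmdeploy-runtime-gpu, whichever you installed
print(mmdeploy.__version__)  # e.g. 1.3.0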
Windows-x86_64
Please refer to this guide for the prebuilt package.
Convert Model¶
After the installation, you can start your model deployment journey by converting a PyTorch model to a backend model with tools/deploy.py.
Based on the above settings, we provide an example to convert the Faster R-CNN in MMDetection to TensorRT as below:
# clone mmdeploy to get the deployment config. `--recursive` is not necessary
git clone -b main https://github.com/open-mmlab/mmdeploy.git
# clone mmdetection repo. We have to use the config file to build PyTorch nn module
git clone -b 3.x https://github.com/open-mmlab/mmdetection.git
cd mmdetection
mim install -v -e .
cd ..
# download Faster R-CNN checkpoint
wget -P checkpoints https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
# run the command to start model conversion
python mmdeploy/tools/deploy.py \
mmdeploy/configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py \
checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
mmdetection/demo/demo.jpg \
--work-dir mmdeploy_model/faster-rcnn \
--device cuda \
--dump-info
The converted model and its meta info can be found in the path specified by --work-dir. Together they make up the MMDeploy Model that can be fed to the MMDeploy SDK for model inference.
For more details about model conversion, you can read how_to_convert_model. If you want to customize the conversion pipeline, you can edit the config file by following this tutorial.
Tip
You can convert the above model to an ONNX model and perform ONNX Runtime inference just by changing 'detection_tensorrt_dynamic-320x320-1344x1344.py' to 'detection_onnxruntime_dynamic.py' and setting '--device' to 'cpu'.
Inference Model¶
After model conversion, we can perform inference not only by Model Converter but also by Inference SDK.
Inference by Model Converter¶
Model Converter provides a unified API named inference_model to do the job, making the APIs of all inference backends transparent to users.
Take the previously converted Faster R-CNN TensorRT model as an example:
from mmdeploy.apis import inference_model
result = inference_model(
model_cfg='mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py',
deploy_cfg='mmdeploy/configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py',
backend_files=['mmdeploy_model/faster-rcnn/end2end.engine'],
img='mmdetection/demo/demo.jpg',
device='cuda:0')
Note
'backend_files' in this API refers to the backend engine file path(s), which MUST be put in a list, since some inference engines like OpenVINO and ncnn separate the network structure and its weights into two files, as illustrated in the sketch below.
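For instance, a hypothetical ncnn deployment of the same model would pass both the param and bin files; the deploy config and work-dir names below are illustrative only:
from mmdeploy.apis import inference_model
result = inference_model(
    model_cfg='mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py',
    deploy_cfg='mmdeploy/configs/mmdet/detection/detection_ncnn_static-800x1344.py',  # illustrative config name
    backend_files=['mmdeploy_model/faster-rcnn-ncnn/end2end.param',   # network structure
                   'mmdeploy_model/faster-rcnn-ncnn/end2end.bin'],    # network weights
    img='mmdetection/demo/demo.jpg',
    device='cpu')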
Inference by SDK¶
You can directly run MMDeploy demo programs in the precompiled package to get inference results.
wget https://github.com/open-mmlab/mmdeploy/releases/download/v1.3.0/mmdeploy-1.3.0-linux-x86_64-cuda11.8.tar.gz
tar xf mmdeploy-1.3.0-linux-x86_64-cuda11.8.tar.gz
cd mmdeploy-1.3.0-linux-x86_64-cuda11.8
# run python demo
python example/python/object_detection.py cuda ../mmdeploy_model/faster-rcnn ../mmdetection/demo/demo.jpg
# run C/C++ demo
# build the demo according to the README.md in the folder.
./bin/object_detection cuda ../mmdeploy_model/faster-rcnn ../mmdetection/demo/demo.jpg
Note
In the above commands, the input model is the SDK Model path. It is NOT the engine file path but the directory passed to --work-dir, which includes not only the engine files but also meta information such as 'deploy.json' and 'pipeline.json'.
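You can inspect that directory from Python before feeding it to the SDK; a small sketch, noting that the exact set of files depends on the backend:
import json, os
work_dir = 'mmdeploy_model/faster-rcnn'
print(sorted(os.listdir(work_dir)))   # expect deploy.json, pipeline.json, end2end.engine, ...
with open(os.path.join(work_dir, 'deploy.json')) as f:
    print(json.load(f))               # model meta information read by the SDK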
In the next sections, we provide examples of deploying the converted Faster R-CNN model with the SDK's different FFIs (Foreign Function Interfaces).
Python API¶
from mmdeploy_runtime import Detector
import cv2
img = cv2.imread('mmdetection/demo/demo.jpg')
# create a detector
detector = Detector(model_path='mmdeploy_model/faster-rcnn', device_name='cuda', device_id=0)
# run the inference
bboxes, labels, _ = detector(img)
# Filter the result according to threshold
indices = [i for i in range(len(bboxes))]
for index, bbox, label_id in zip(indices, bboxes, labels):
[left, top, right, bottom], score = bbox[0:4].astype(int), bbox[4]
if score < 0.3:
continue
cv2.rectangle(img, (left, top), (right, bottom), (0, 255, 0))
cv2.imwrite('output_detection.png', img)
You can find more examples from here.
C++ API¶
Using the SDK C++ API follows the same pattern: load the model, create a predictor, read an image, run inference, and handle the results. Now let's apply this procedure to the above Faster R-CNN model.
#include <cstdlib>
#include <opencv2/opencv.hpp>
#include "mmdeploy/detector.hpp"
int main() {
const char* device_name = "cuda";
int device_id = 0;
std::string model_path = "mmdeploy_model/faster-rcnn";
std::string image_path = "mmdetection/demo/demo.jpg";
// 1. load model
mmdeploy::Model model(model_path);
// 2. create predictor
mmdeploy::Detector detector(model, mmdeploy::Device{device_name, device_id});
// 3. read image
cv::Mat img = cv::imread(image_path);
// 4. inference
auto dets = detector.Apply(img);
// 5. deal with the result. Here we choose to visualize it
for (int i = 0; i < dets.size(); ++i) {
const auto& box = dets[i].bbox;
fprintf(stdout, "box %d, left=%.2f, top=%.2f, right=%.2f, bottom=%.2f, label=%d, score=%.4f\n",
i, box.left, box.top, box.right, box.bottom, dets[i].label_id, dets[i].score);
if (dets[i].score < 0.3) {
continue;
}
cv::rectangle(img, cv::Point{(int)box.left, (int)box.top},
cv::Point{(int)box.right, (int)box.bottom}, cv::Scalar{0, 255, 0});
}
cv::imwrite("output_detection.png", img);
return 0;
}
When you build this example, add the MMDeploy package to your CMake project as follows. Then pass -DMMDeploy_DIR to cmake, indicating the path where MMDeployConfig.cmake is located. You can find it in the prebuilt package.
find_package(MMDeploy REQUIRED)
target_link_libraries(${name} PRIVATE mmdeploy ${OpenCV_LIBS})
For more SDK C++ API usages, please read these samples.
For the remaining C, C# and Java API usages, please read the C demos, C# demos and Java demos respectively. We'll talk about them more in our next release.
Evaluate Model¶
You can test the performance of the deployed model using tools/test.py. For example,
python ${MMDEPLOY_DIR}/tools/test.py \
${MMDEPLOY_DIR}/configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
${MMDET_DIR}/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py \
--model ${BACKEND_MODEL_FILES} \
--metrics ${METRICS} \
--device cuda:0
Note
Regarding the --model option: it represents the converted engine file path when using Model Converter to run the performance test. But when you test the metrics with the Inference SDK, this option refers to the directory of the MMDeploy Model.
You can read how to evaluate a model for more details.
Build from Source¶
Download¶
git clone -b main git@github.com:open-mmlab/mmdeploy.git --recursive
Note:
If fetching the submodules fails, you can fetch them manually with the following commands:
cd mmdeploy
git clone git@github.com:NVIDIA/cub.git third_party/cub
cd third_party/cub
git checkout c3cceac115
# go back to third_party directory and git clone pybind11
cd ..
git clone git@github.com:pybind/pybind11.git pybind11
cd pybind11
git checkout 70a58c5
cd ..
git clone git@github.com:gabime/spdlog.git spdlog
cd spdlog
git checkout 9e8e52c048
If git clone via SSH fails, you can try the HTTPS protocol like this:
git clone -b main https://github.com/open-mmlab/mmdeploy.git --recursive
Build¶
Please visit the following links to find out how to build MMDeploy according to the target platform.
Use Docker Image¶
This document describes how to install mmdeploy with Docker.
Get prebuilt docker images¶
MMDeploy provides prebuilt docker images for the convenience of its users on Docker Hub. The docker images are built on
the latest and released versions. For instance, the image with tag openmmlab/mmdeploy:ubuntu20.04-cuda11.8-mmdeploy
is built on the latest mmdeploy and the image with tag openmmlab/mmdeploy:ubuntu20.04-cuda11.8-mmdeploy1.2.0
is for mmdeploy==1.2.0
.
The specifications of the Docker Image are shown below.
Item | Version |
---|---|
OS | Ubuntu20.04 |
CUDA | 11.8 |
CUDNN | 8.9 |
Python | 3.8.10 |
Torch | 2.0.0 |
TorchVision | 0.15.0 |
TorchScript | 2.0.0 |
TensorRT | 8.6.1.6 |
ONNXRuntime | 1.15.1 |
OpenVINO | 2022.3.0 |
ncnn | 20230816 |
openppl | 0.8.1 |
You can select a tag and run docker pull
to get the docker image:
export TAG=openmmlab/mmdeploy:ubuntu20.04-cuda11.8-mmdeploy
docker pull $TAG
Build docker images (optional)¶
If the prebuilt docker images do not meet your requirements,
then you can build your own image by running the following script.
The docker file is docker/Release/Dockerfile
and its building argument is MMDEPLOY_VERSION
,
which can be a tag or a branch from mmdeploy.
export MMDEPLOY_VERSION=main
export TAG=mmdeploy-${MMDEPLOY_VERSION}
docker build docker/Release/ -t ${TAG} --build-arg MMDEPLOY_VERSION=${MMDEPLOY_VERSION}
Run docker container¶
After pulling or building the docker image, you can use docker run
to launch the docker service:
export TAG=openmmlab/mmdeploy:ubuntu20.04-cuda11.8-mmdeploy
docker run --gpus=all -it --rm $TAG
FAQs¶
CUDA error: the provided PTX was compiled with an unsupported toolchain:
As described here, update the GPU driver to the latest one for your GPU.
docker: Error response from daemon: could not select device driver “” with capabilities: [gpu].
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
Build from Script¶
Through user investigation, we know that most users are already familiar with python and torch before using mmdeploy. Therefore we provide scripts to simplify mmdeploy installation.
Assuming you already have
python3 -m pip (conda or pyenv)
nvcc (depends on inference backend)
torch (not compulsory)
run this script to install mmdeploy + ncnn backend. nproc is not compulsory.
$ cd /path/to/mmdeploy
$ python3 tools/scripts/build_ubuntu_x64_ncnn.py
..
A sudo password may be required during this time, and the script will try its best to build and install the mmdeploy SDK and demo:
Detect the host OS version, the make job number, whether root is used, and try to fix python3 -m pip
Find the necessary basic tools, such as g++-7, cmake, wget, etc.
Compile necessary dependencies, such as pyncnn, protobuf
The script will also try to avoid affecting the host environment:
The dependencies compiled from source are placed in the mmdeploy-dep directory at the same level as mmdeploy
The script does not modify variables such as PATH, LD_LIBRARY_PATH, PYTHONPATH, etc.
The environment variables that need to be modified will be printed; please pay attention to the final output
The script will eventually execute python3 tools/check_env.py. A successful installation should display the version number of the corresponding backend and ops_is_avaliable : True, for example:
$ python3 tools/check_env.py
..
2022-09-13 14:49:13,767 - mmdeploy - INFO - **********Backend information**********
2022-09-13 14:49:14,116 - mmdeploy - INFO - onnxruntime: 1.8.0 ops_is_avaliable : True
2022-09-13 14:49:14,131 - mmdeploy - INFO - tensorrt: 8.4.1.5 ops_is_avaliable : True
2022-09-13 14:49:14,139 - mmdeploy - INFO - ncnn: 1.0.20220901 ops_is_avaliable : True
2022-09-13 14:49:14,150 - mmdeploy - INFO - pplnn_is_avaliable: True
..
Here are the verified installation scripts. If you want mmdeploy to support multiple backends at the same time, execute each corresponding script once:
script | OS version |
---|---|
build_ubuntu_x64_ncnn.py | 18.04/20.04 |
build_ubuntu_x64_ort.py | 18.04/20.04 |
build_ubuntu_x64_pplnn.py | 18.04/20.04 |
build_ubuntu_x64_torchscript.py | 18.04/20.04 |
build_ubuntu_x64_tvm.py | 18.04/20.04 |
build_jetson_orin_python38.sh | JetPack5.0 L4T 34.1 |
CMake Build Option Spec¶
NAME | VALUE | DEFAULT | REMARK |
---|---|---|---|
MMDEPLOY_SHARED_LIBS | {ON, OFF} | ON | Switch to build shared libs |
MMDEPLOY_BUILD_SDK | {ON, OFF} | OFF | Switch to build MMDeploy SDK |
MMDEPLOY_BUILD_SDK_MONOLITHIC | {ON, OFF} | OFF | Build a single monolithic lib |
MMDEPLOY_BUILD_TEST | {ON, OFF} | OFF | Switch to build MMDeploy SDK unittest cases |
MMDEPLOY_BUILD_SDK_PYTHON_API | {ON, OFF} | OFF | Switch to build MMDeploy SDK Python package |
MMDEPLOY_BUILD_SDK_CSHARP_API | {ON, OFF} | OFF | Build C# SDK API |
MMDEPLOY_BUILD_SDK_JAVA_API | {ON, OFF} | OFF | Build Java SDK API |
MMDEPLOY_SPDLOG_EXTERNAL | {ON, OFF} | OFF | Build with the spdlog installation package that comes with the system |
MMDEPLOY_ZIP_MODEL | {ON, OFF} | OFF | Enable reading SDK models in zip format |
MMDEPLOY_COVERAGE | {ON, OFF} | OFF | Build for C++ code coverage report |
MMDEPLOY_TARGET_DEVICES | {"cpu", "cuda"} | cpu | Enable target devices. You can enable more by passing a semicolon-separated list of device names to MMDEPLOY_TARGET_DEVICES, e.g. -DMMDEPLOY_TARGET_DEVICES="cpu;cuda" |
MMDEPLOY_TARGET_BACKENDS | {"trt", "ort", "pplnn", "ncnn", "openvino", "torchscript", "snpe", "coreml", "tvm"} | N/A | Enable inference engines. By default, no target inference engine is set, since it highly depends on the use case. When more than one engine is specified, pass a semicolon-separated list of inference backend names, e.g. -DMMDEPLOY_TARGET_BACKENDS="trt;ort". After specifying the inference engines, their package paths have to be passed to cmake as follows: 1. trt: TensorRT. TENSORRT_DIR and CUDNN_DIR are needed. 2. ort: ONNX Runtime. ONNXRUNTIME_DIR is needed. 3. pplnn: PPL.NN. pplnn_DIR is needed. 4. ncnn: ncnn. ncnn_DIR is needed. 5. openvino: OpenVINO. InferenceEngine_DIR is needed. 6. torchscript: TorchScript. Torch_DIR is needed. 7. snpe: Qualcomm SNPE. SNPE_ROOT must exist in the environment variables because of C/S mode. 8. coreml: Core ML. Torch_DIR is needed. 9. tvm: TVM. TVM_DIR is needed. |
MMDEPLOY_CODEBASES | {"mmpretrain", "mmdet", "mmseg", "mmagic", "mmocr", "all"} | all | Enable codebases' postprocess modules. You can provide a semicolon-separated list of codebase names to enable them, e.g., -DMMDEPLOY_CODEBASES="mmpretrain;mmdet". Or you can pass all to enable them all, i.e., -DMMDEPLOY_CODEBASES=all |
How to convert model¶
This tutorial briefly introduces how to export an OpenMMLab model to a specific backend using MMDeploy tools. Notes:
Supported backends are ONNXRuntime, TensorRT, ncnn, PPLNN, OpenVINO.
Supported codebases are MMPretrain, MMDetection, MMSegmentation, MMOCR, MMagic.
How to convert models from Pytorch to other backends¶
Prerequisite¶
Install and build your target backend. You could refer to ONNXRuntime-install, TensorRT-install, ncnn-install, PPLNN-install, OpenVINO-install for more information.
Install and build your target codebase. You could refer to MMPretrain-install, MMDetection-install, MMSegmentation-install, MMOCR-install, MMagic-install.
Usage¶
python ./tools/deploy.py \
${DEPLOY_CFG_PATH} \
${MODEL_CFG_PATH} \
${MODEL_CHECKPOINT_PATH} \
${INPUT_IMG} \
--test-img ${TEST_IMG} \
--work-dir ${WORK_DIR} \
--calib-dataset-cfg ${CALIB_DATA_CFG} \
--device ${DEVICE} \
--log-level INFO \
--show \
--dump-info
Description of all arguments¶
deploy_cfg : The deployment configuration of mmdeploy for the model, including the type of inference framework, whether to quantize, whether the input shape is dynamic, etc. There may be a reference relationship between configuration files; mmdeploy/mmpretrain/classification_ncnn_static.py is an example.
model_cfg : Model configuration of the algorithm library, e.g. mmpretrain/configs/vision_transformer/vit-base-p32_ft-64xb64_in1k-384.py, regardless of the path to mmdeploy.
checkpoint : Torch model path. It can start with http/https; see the implementation of mmcv.FileClient for details.
img : The path to the image or point cloud file used for testing during the model conversion.
--test-img : The path of the image file that is used to test the model. If not specified, it will be set to None.
--work-dir : The path of the work directory that is used to save logs and models.
--calib-dataset-cfg : Only valid in int8 mode. The config used for calibration. If not specified, it will be set to None and use the "val" dataset in the model config for calibration.
--device : The device used for model conversion. If not specified, it will be set to cpu. For trt, use the cuda:0 format.
--log-level : To set the log level, one of 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.
--show : Whether to show detection outputs.
--dump-info : Whether to output information for the SDK.
How to find the corresponding deployment config of a PyTorch model¶
Find the model's codebase folder in configs/. For converting a yolov3 model, you need to check the configs/mmdet folder.
Find the model's task folder in configs/codebase_folder/. For a yolov3 model, you need to check the configs/mmdet/detection folder.
Find the deployment config file in configs/codebase_folder/task_folder/. For deploying a yolov3 model to the onnx backend, you could use configs/mmdet/detection/detection_onnxruntime_dynamic.py.
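A quick way to list the candidate deployment configs from Python (assuming your working directory is the mmdeploy repository root):
from pathlib import Path
for cfg in sorted(Path('configs/mmdet/detection').glob('*.py')):
    print(cfg.name)  # e.g. detection_onnxruntime_dynamic.py, detection_tensorrt_dynamic-320x320-1344x1344.py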
Example¶
python ./tools/deploy.py \
configs/mmdet/detection/detection_tensorrt_dynamic-320x320-1344x1344.py \
$PATH_TO_MMDET/configs/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py \
$PATH_TO_MMDET/checkpoints/yolo/yolov3_d53_mstrain-608_273e_coco_20210518_115020-a2c3acb8.pth \
$PATH_TO_MMDET/demo/demo.jpg \
--work-dir work_dir \
--show \
--device cuda:0
How to evaluate the exported models¶
You can evaluate the exported model by referring to how_to_evaluate_a_model.
List of supported models exportable to other backends¶
Refer to Support model list
How to write config¶
This tutorial describes how to write a config for model conversion and deployment. A deployment config includes onnx config, codebase config, and backend config.
1. How to write onnx config¶
The onnx config describes how to export a model from PyTorch to ONNX.
Description of onnx config arguments¶
type : Type of config dict. Default is onnx.
export_params : If specified, all parameters will be exported. Set this to False if you want to export an untrained model.
keep_initializers_as_inputs : If True, all the initializers (typically corresponding to parameters) in the exported graph will also be added as inputs to the graph. If False, then initializers are not added as inputs to the graph, and only the non-parameter inputs are added as inputs.
opset_version : Opset version, 11 by default.
save_file : Output onnx file.
input_names : Names to assign to the input nodes of the graph.
output_names : Names to assign to the output nodes of the graph.
input_shape : The height and width of the input tensor to the model.
Example¶
onnx_config = dict(
type='onnx',
export_params=True,
keep_initializers_as_inputs=False,
opset_version=11,
save_file='end2end.onnx',
input_names=['input'],
output_names=['output'],
input_shape=None)
If you need to use dynamic axes¶
If the dynamic shape of inputs and outputs is required, you need to add a dynamic_axes dict to the onnx config.
dynamic_axes : Describes which dimensions of the inputs and outputs are dynamic.
Example¶
dynamic_axes={
'input': {
0: 'batch',
2: 'height',
3: 'width'
},
'dets': {
0: 'batch',
1: 'num_dets',
},
'labels': {
0: 'batch',
1: 'num_dets',
},
}
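For reference, this is how such a dict maps onto torch.onnx.export when exporting a plain PyTorch model. This is a standalone sketch, not MMDeploy code, and uses torchvision's ResNet-18 purely as an example:
import torch
import torchvision

model = torchvision.models.resnet18().eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy, 'resnet18_dynamic.onnx',
    input_names=['input'], output_names=['output'],
    opset_version=11,
    dynamic_axes={'input': {0: 'batch', 2: 'height', 3: 'width'},
                  'output': {0: 'batch'}})  # listed dims become symbolic in the ONNX graph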
2. How to write codebase config¶
Codebase config part contains information like codebase type and task type.
3. How to write backend config¶
The backend config is mainly used to specify the backend on which the model runs and to provide the information needed when running the model on that backend, referring to ONNX Runtime, TensorRT, ncnn, PPLNN.
type : Model's backend, including onnxruntime, ncnn, pplnn, tensorrt, openvino.
Example¶
backend_config = dict(
type='tensorrt',
common_config=dict(
fp16_mode=False, max_workspace_size=1 << 30),
model_inputs=[
dict(
input_shapes=dict(
input=dict(
min_shape=[1, 3, 512, 1024],
opt_shape=[1, 3, 1024, 2048],
max_shape=[1, 3, 2048, 2048])))
])
4. A complete example of mmpretrain on TensorRT¶
Here we provide a complete deployment config from mmpretrain on TensorRT.
codebase_config = dict(type='mmpretrain', task='Classification')
backend_config = dict(
type='tensorrt',
common_config=dict(
fp16_mode=False,
max_workspace_size=1 << 30),
model_inputs=[
dict(
input_shapes=dict(
input=dict(
min_shape=[1, 3, 224, 224],
opt_shape=[4, 3, 224, 224],
max_shape=[64, 3, 224, 224])))])
onnx_config = dict(
type='onnx',
dynamic_axes={
'input': {
0: 'batch',
2: 'height',
3: 'width'
},
'output': {
0: 'batch'
}
},
export_params=True,
keep_initializers_as_inputs=False,
opset_version=11,
save_file='end2end.onnx',
input_names=['input'],
output_names=['output'],
input_shape=[224, 224])
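If you save the three dicts above into a single file, you can load and inspect the merged config with mmengine; a minimal sketch, assuming the file is saved as classification_tensorrt_dynamic-224x224-224x224.py in the current directory:
from mmengine import Config

deploy_cfg = Config.fromfile('classification_tensorrt_dynamic-224x224-224x224.py')
print(deploy_cfg.backend_config.type)             # 'tensorrt'
print(deploy_cfg.onnx_config.save_file)           # 'end2end.onnx'
print(deploy_cfg.backend_config.model_inputs[0])  # TensorRT min/opt/max input shapes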
5. The name rules of our deployment config¶
There is a specific naming convention for the filename of deployment config files.
(task name)_(backend name)_(dynamic or static).py
task name : Model's task type.
backend name : Backend's name. Note if you use the quantization function, you need to indicate the quantization type, e.g. tensorrt-int8.
dynamic or static : Dynamic or static export. Note if the backend needs explicit shape information, you need to add a description of the input size in height x width format, e.g. dynamic-512x1024-2048x2048, which means that the min input shape is 512x1024 and the max input shape is 2048x2048.
Example¶
detection_tensorrt-int8_dynamic-320x320-1344x1344.py
6. How to write model config¶
Write the model config file according to the model's codebase. The model config file is used to initialize the model; refer to MMPretrain, MMDetection, MMSegmentation, MMOCR, MMagic.
How to evaluate model¶
After converting a PyTorch model to a backend model, you may evaluate backend models with tools/test.py.
Prerequisite¶
Install MMDeploy according to get-started instructions. And convert the PyTorch model or ONNX model to the backend model by following the guide.
Usage¶
python tools/test.py \
${DEPLOY_CFG} \
${MODEL_CFG} \
--model ${BACKEND_MODEL_FILES} \
[--out ${OUTPUT_PKL_FILE}] \
[--format-only] \
[--metrics ${METRICS}] \
[--show] \
[--show-dir ${OUTPUT_IMAGE_DIR}] \
[--show-score-thr ${SHOW_SCORE_THR}] \
--device ${DEVICE} \
[--cfg-options ${CFG_OPTIONS}] \
[--metric-options ${METRIC_OPTIONS}]
[--log2file work_dirs/output.txt]
[--batch-size ${BATCH_SIZE}]
[--speed-test] \
[--warmup ${WARM_UP}] \
[--log-interval ${LOG_INTERVERL}] \
Description of all arguments¶
deploy_cfg : The config for deployment.
model_cfg : The config of the model in OpenMMLab codebases.
--model : The backend model file. For example, if we convert a model to TensorRT, we need to pass the model file with the ".engine" suffix.
--out : The path to save output results in pickle format. (The results will be saved only if this argument is given.)
--format-only : Whether to format the output results without evaluation. It is useful when you want to format the result to a specific format and submit it to the test server.
--metrics : The metrics to evaluate the model, as defined in OpenMMLab codebases, e.g. "segm", "proposal" for COCO in mmdet, or "precision", "recall", "f1_score", "support" for single-label datasets in mmpretrain.
--show : Whether to show the evaluation result on the screen.
--show-dir : The directory to save the evaluation result. (The results will be saved only if this argument is given.)
--show-score-thr : The threshold determining whether to show detection bounding boxes.
--device : The device that the model runs on. Note that some backends restrict the device; for example, TensorRT must run on cuda.
--cfg-options : Extra or overridden settings that will be merged into the current deploy config.
--metric-options : Custom options for evaluation. The key-value pairs in xxx=yyy format will be kwargs for the dataset.evaluate() function.
--log2file : Log evaluation results (and speed) to file.
--batch-size : The batch size for inference, which would override samples_per_gpu in the data config. Default is 1. Note that not all models support batch_size > 1.
--speed-test : Whether to activate the speed test.
--warmup : Warm up before counting inference time; requires --speed-test.
--log-interval : The interval between each log; requires --speed-test.
* Other arguments in tools/test.py are used for the speed test. They have no concern with evaluation.
Example¶
python tools/test.py \
configs/mmpretrain/classification_onnxruntime_static.py \
${MMPRETRAIN_DIR}/configs/resnet/resnet50_8xb32_in1k.py \
--model model.onnx \
--out out.pkl \
--device cpu \
--speed-test
Quantize model¶
Why quantization ?¶
The fixed-point model has many advantages over the fp32 model:
Smaller size: an 8-bit model reduces the file size by 75%
Benefiting from the smaller model, the cache hit rate is improved and inference is faster
Chips tend to have corresponding fixed-point acceleration instructions, which are faster and consume less energy (int8 on a common CPU requires only about 10% of the energy)
APK file size and heat generation are key indicators when evaluating a mobile APP; on the server side, quantization means that you can increase the model size in exchange for precision while keeping the same QPS.
Post training quantization scheme¶
Taking ncnn backend as an example, the complete workflow is as follows:

mmdeploy generates quantization table based on static graph (onnx) and uses backend tools to convert fp32 model to fixed point.
mmdeploy currently supports ncnn with PTQ.
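To make the idea of a quantization table concrete, here is a toy, backend-agnostic sketch of computing a symmetric per-tensor int8 scale from calibration activations (illustrative only, not the ppq/ncnn implementation):
import numpy as np

acts = np.random.randn(10000).astype(np.float32) * 3.0      # pretend calibration activations
scale = np.abs(acts).max() / 127.0                           # symmetric per-tensor scale
q = np.clip(np.round(acts / scale), -127, 127).astype(np.int8)
dequant = q.astype(np.float32) * scale
print(scale, float(np.abs(acts - dequant).max()))            # max error is about scale / 2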
How to convert model¶
After mmdeploy installation, install ppq
git clone https://github.com/openppl-public/ppq.git
cd ppq
pip install -r requirements.txt
python3 setup.py install
Back in mmdeploy, enable quantization with the --quant option of tools/deploy.py.
cd /path/to/mmdeploy
export MODEL_CONFIG=/path/to/mmpretrain/configs/resnet/resnet18_8xb32_in1k.py
export MODEL_PATH=https://download.openmmlab.com/mmclassification/v0/resnet/resnet18_8xb32_in1k_20210831-fbbb1da6.pth
# get some imagenet sample images
git clone https://github.com/nihui/imagenet-sample-images --depth=1
# quantize
python3 tools/deploy.py configs/mmpretrain/classification_ncnn-int8_static.py ${MODEL_CONFIG} ${MODEL_PATH} /path/to/self-test.png --work-dir work_dir --device cpu --quant --quant-image-dir /path/to/imagenet-sample-images
...
Description
Parameter | Meaning |
---|---|
--quant | Enable quantization; the default value is False |
--quant-image-dir | Calibration dataset; uses the validation set in MODEL_CONFIG by default |
Custom calibration dataset¶
The calibration set is used to calculate quantization layer parameters. Some DFQ (Data Free Quantization) methods do not even require a dataset.
Create a folder and just put in some images (no directory structure, no negative examples, no special filename format are required)
The images need to come from the real target scenario, otherwise accuracy would drop
You can not quantize a model with the test dataset. The usage of each dataset type is summarized below.
Type | Train dataset | Validation dataset | Test dataset | Calibration dataset |
---|---|---|---|---|
Usage | QAT | PTQ | Test accuracy | PTQ |
It is highly recommended to verify model precision after quantization. Here are some quantized model test results.
Useful Tools¶
Apart from deploy.py
, there are other useful tools under the tools/
directory.
torch2onnx¶
This tool can be used to convert PyTorch model from OpenMMLab to ONNX.
Usage¶
python tools/torch2onnx.py \
${DEPLOY_CFG} \
${MODEL_CFG} \
${CHECKPOINT} \
${INPUT_IMG} \
--work-dir ${WORK_DIR} \
--device cpu \
--log-level INFO
Description of all arguments¶
deploy_cfg : The path of the deploy config file in the MMDeploy codebase.
model_cfg : The path of the model config file in the OpenMMLab codebase.
checkpoint : The path of the model checkpoint file.
img : The path of the image file used to convert the model.
--work-dir : Directory to save output ONNX models. Default is ./work-dir.
--device : The device used for conversion. If not specified, it will be set to cpu.
--log-level : To set the log level, one of 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.
extract¶
An ONNX model with Mark nodes in it can be partitioned into multiple subgraphs. This tool can be used to extract a subgraph from such an ONNX model.
Usage¶
python tools/extract.py \
${INPUT_MODEL} \
${OUTPUT_MODEL} \
--start ${PARTITION_START} \
--end ${PARTITION_END} \
--log-level INFO
Description of all arguments¶
input_model : The path of the input ONNX model. The output ONNX model will be extracted from this model.
output_model : The path of the output ONNX model.
--start : The start point of the extracted model with format <function_name>:<input/output>. The function_name comes from the decorator @mark.
--end : The end point of the extracted model with format <function_name>:<input/output>. The function_name comes from the decorator @mark.
--log-level : To set the log level, one of 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.
Note¶
To support the model partition, you need to add Mark nodes in the ONNX model. The Mark node comes from the @mark
decorator.
For example, if we have marked the multiclass_nms
as below, we can set end=multiclass_nms:input
to extract the subgraph before NMS.
@mark('multiclass_nms', inputs=['boxes', 'scores'], outputs=['dets', 'labels'])
def multiclass_nms(*args, **kwargs):
"""Wrapper function for `_multiclass_nms`."""
onnx2pplnn¶
This tool helps to convert an ONNX model to a PPLNN model.
Usage¶
python tools/onnx2pplnn.py \
${ONNX_PATH} \
${OUTPUT_PATH} \
--device cuda:0 \
--opt-shapes [224,224] \
--log-level INFO
Description of all arguments¶
onnx_path : The path of the ONNX model to convert.
output_path : The converted PPLNN algorithm path in json format.
device : The device of the model during conversion.
opt-shapes : Optimal shapes for PPLNN optimization. The shape of each tensor should be wrapped with "[]" or "()" and the shapes of tensors should be separated by ",".
--log-level : To set the log level, one of 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.
onnx2tensorrt¶
This tool can be used to convert ONNX to TensorRT engine.
Usage¶
python tools/onnx2tensorrt.py \
${DEPLOY_CFG} \
${ONNX_PATH} \
${OUTPUT} \
--device-id 0 \
--log-level INFO \
--calib-file /path/to/file
Description of all arguments¶
deploy_cfg : The path of the deploy config file in the MMDeploy codebase.
onnx_path : The ONNX model path to convert.
output : The path of the output TensorRT engine.
--device-id : The device index, defaults to 0.
--calib-file : The calibration data used to calibrate the engine to int8.
--log-level : To set the log level, one of 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.
onnx2ncnn¶
This tool helps to convert an ONNX
model to an ncnn
model.
Usage¶
python tools/onnx2ncnn.py \
${ONNX_PATH} \
${NCNN_PARAM} \
${NCNN_BIN} \
--log-level INFO
Description of all arguments¶
onnx_path : The path of the ONNX model to convert from.
output_param : The converted ncnn param path.
output_bin : The converted ncnn bin path.
--log-level : To set the log level, one of 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'. If not specified, it will be set to INFO.
profiler¶
This tool helps to test latency of models with PyTorch, TensorRT and other backends. Note that the pre- and post-processing is excluded when computing inference latency.
Usage¶
python tools/profiler.py \
${DEPLOY_CFG} \
${MODEL_CFG} \
${IMAGE_DIR} \
--model ${MODEL} \
--device ${DEVICE} \
--shape ${SHAPE} \
--num-iter ${NUM_ITER} \
--warmup ${WARMUP} \
--cfg-options ${CFG_OPTIONS} \
--batch-size ${BATCH_SIZE} \
--img-ext ${IMG_EXT}
Description of all arguments¶
deploy_cfg : The path of the deploy config file in the MMDeploy codebase.
model_cfg : The path of the model config file in the OpenMMLab codebase.
image_dir : The directory of image files used to test the model.
--model : The path of the model to be tested.
--shape : Input shape of the model in HxW format, e.g., 800x1344. If not specified, it would use input_shape from the deploy config.
--num-iter : Number of iterations to run inference. Default is 100.
--warmup : Number of iterations to warm up the machine. Default is 10.
--device : The device type. If not specified, it will be set to cuda:0.
--cfg-options : Optional key-value pairs to override the model config.
--batch-size : The batch size for test inference. Default is 1. Note that not all models support batch_size > 1.
--img-ext : The file extensions for input images from image_dir. Defaults to ['.jpg', '.jpeg', '.png', '.ppm', '.bmp', '.pgm', '.tif'].
Example:¶
python tools/profiler.py \
configs/mmpretrain/classification_tensorrt_dynamic-224x224-224x224.py \
../mmpretrain/configs/resnet/resnet18_8xb32_in1k.py \
../mmpretrain/demo/ \
--model work-dirs/mmpretrain/resnet/trt/end2end.engine \
--device cuda \
--shape 224x224 \
--num-iter 100 \
--warmup 10 \
--batch-size 1
And the output looks like this:
----- Settings:
+------------+---------+
| batch size | 1 |
| shape | 224x224 |
| iterations | 100 |
| warmup | 10 |
+------------+---------+
----- Results:
+--------+------------+---------+
| Stats | Latency/ms | FPS |
+--------+------------+---------+
| Mean | 1.535 | 651.656 |
| Median | 1.665 | 600.569 |
| Min | 1.308 | 764.341 |
| Max | 1.689 | 591.983 |
+--------+------------+---------+
generate_md_table¶
This tool can be used to generate supported-backends markdown table.
Usage¶
python tools/generate_md_table.py \
${YML_FILE} \
${OUTPUT} \
--backends ${BACKENDS}
Description of all arguments¶
yml_file : Input yml config path.
output : Output markdown file path.
--backends : Output backends list. If not specified, it will be set to 'onnxruntime' 'tensorrt' 'torchscript' 'pplnn' 'openvino' 'ncnn'.
Example:¶
Generate backends markdown table from mmocr.yml
python tools/generate_md_table.py tests/regression/mmocr.yml tests/regression/mmocr.md --backends onnxruntime tensorrt torchscript pplnn openvino ncnn
And the output looks like this:
model | task | onnxruntime | tensorrt | torchscript | pplnn | openvino | ncnn |
---|---|---|---|---|---|---|---|
DBNet | TextDetection | Y | Y | Y | Y | Y | Y |
DBNetpp | TextDetection | Y | Y | N | N | Y | Y |
PANet | TextDetection | Y | Y | Y | Y | Y | Y |
PSENet | TextDetection | Y | Y | Y | Y | Y | Y |
TextSnake | TextDetection | Y | Y | Y | N | N | N |
MaskRCNN | TextDetection | Y | Y | Y | N | N | N |
CRNN | TextRecognition | Y | Y | Y | Y | N | Y |
SAR | TextRecognition | Y | N | Y | N | N | N |
SATRN | TextRecognition | Y | Y | Y | N | N | N |
ABINet | TextRecognition | Y | Y | Y | N | N | N |
SDK Documentation¶
Setup & Usage¶
Quick Start¶
In terms of model deployment, most ML models require some preprocessing steps on the input data and postprocessing steps on the output to get structured results. The MMDeploy SDK provides a number of built-in preprocessing and postprocessing operations, so when you convert and deploy a model you can benefit from them directly.
Model Conversion¶
You can refer to convert model for more details.
After model conversion with --dump-info
, the structure of the model directory (for a TensorRT model) is as follows. If you convert to another backend, the structure will be slightly different. The two images are for quick conversion validation.
├── deploy.json
├── detail.json
├── pipeline.json
├── end2end.onnx
├── end2end.engine
├── output_pytorch.jpg
└── output_tensorrt.jpg
The files related to sdk are:
deploy.json // model information.
pipeline.json // inference information.
end2end.engine // model file for TensorRT; it will be different for other backends.
The SDK can read the model directory directly, or you can pack the related files into a zip archive for easier distribution or encryption. To read the zip file, the SDK should be built with -DMMDEPLOY_ZIP_MODEL=ON.
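A sketch of packing an SDK model directory into a plain (unencrypted) zip archive with Python's standard library, assuming the SDK expects the files at the archive root:
import zipfile
from pathlib import Path

model_dir = Path('mmdeploy_model/faster-rcnn')
with zipfile.ZipFile('faster-rcnn.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for f in model_dir.iterdir():
        zf.write(f, arcname=f.name)   # deploy.json, pipeline.json, end2end.engine, ...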
SDK Inference¶
Generally speaking, there are three steps to run inference with a model.
Create a pipeline
Load the data
Model inference
We use classifier
as an example to show these three steps.
Create a pipeline¶
// Option 1: load the model from a directory (or a zip path if built with `-DMMDEPLOY_ZIP_MODEL=ON`)
std::string model_path = "/data/resnet";  // or "/data/resnet.zip"
mmdeploy_model_t model;
mmdeploy_model_create_by_path(model_path.c_str(), &model);
mmdeploy_classifier_t classifier{};
mmdeploy_classifier_create(model, "cpu", 0, &classifier);
// Option 2: load the model from a memory buffer, e.g. a zip archive read from disk
std::string model_path = "/data/resnet.zip";
std::ifstream ifs(model_path, std::ios::binary); // /path/to/zipmodel
ifs.seekg(0, std::ios::end);
auto size = ifs.tellg();
ifs.seekg(0, std::ios::beg);
std::string str(size, '\0'); // binary data, should be decrypted here if it's encrypted
ifs.read(str.data(), size);
mmdeploy_model_t model;
mmdeploy_model_create(str.data(), size, &model);
mmdeploy_classifier_t classifier{};
mmdeploy_classifier_create(model, "cpu", 0, &classifier);
Load the data¶
cv::Mat img = cv::imread(image_path);
Model inference¶
// wrap the cv::Mat loaded above into an mmdeploy_mat_t before calling the C API
mmdeploy_mat_t mat{
    img.data, img.rows, img.cols, 3, MMDEPLOY_PIXEL_FORMAT_BGR, MMDEPLOY_DATA_TYPE_UINT8};
mmdeploy_classification_t* res{};
int* res_count{};
mmdeploy_classifier_apply(classifier, &mat, 1, &res, &res_count);
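The same three steps through the Python FFI look like this (a sketch based on the mmdeploy_runtime demos; the image path is illustrative):
from mmdeploy_runtime import Classifier
import cv2

img = cv2.imread('demo.jpg')  # illustrative image path
classifier = Classifier(model_path='/data/resnet', device_name='cpu', device_id=0)
result = classifier(img)      # list of (label_id, score) pairs for the image
for label_id, score in result:
    print(label_id, score)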
profiler¶
The SDK has the ability to record the time consumption of each module in the pipeline. It is disabled by default. To use this ability, two steps are required:
Generate profiler data
Analyze profiler Data
Generate profiler data¶
Using the C interface and the classification pipeline as an example: when creating the pipeline, the create API with context information needs to be used, and a profiler handle needs to be added to the context. The detailed code is shown below. Running the demo normally will generate the profiler data file "profiler_data.txt" in the current directory.
#include <fstream>
#include <opencv2/imgcodecs/imgcodecs.hpp>
#include <string>
#include "mmdeploy/classifier.h"
int main(int argc, char* argv[]) {
if (argc != 4) {
fprintf(stderr, "usage:\n image_classification device_name dump_model_directory image_path\n");
return 1;
}
auto device_name = argv[1];
auto model_path = argv[2];
auto image_path = argv[3];
cv::Mat img = cv::imread(image_path);
if (!img.data) {
fprintf(stderr, "failed to load image: %s\n", image_path);
return 1;
}
mmdeploy_model_t model{};
mmdeploy_model_create_by_path(model_path, &model);
// create profiler and add it to context
// profiler data will save to profiler_data.txt
mmdeploy_profiler_t profiler{};
mmdeploy_profiler_create("profiler_data.txt", &profiler);
mmdeploy_context_t context{};
mmdeploy_context_create_by_device(device_name, 0, &context);
mmdeploy_context_add(context, MMDEPLOY_TYPE_PROFILER, nullptr, profiler);
mmdeploy_classifier_t classifier{};
int status{};
status = mmdeploy_classifier_create_v2(model, context, &classifier);
if (status != MMDEPLOY_SUCCESS) {
fprintf(stderr, "failed to create classifier, code: %d\n", (int)status);
return 1;
}
mmdeploy_mat_t mat{
img.data, img.rows, img.cols, 3, MMDEPLOY_PIXEL_FORMAT_BGR, MMDEPLOY_DATA_TYPE_UINT8};
// inference loop
for (int i = 0; i < 100; i++) {
mmdeploy_classification_t* res{};
int* res_count{};
status = mmdeploy_classifier_apply(classifier, &mat, 1, &res, &res_count);
mmdeploy_classifier_release_result(res, res_count, 1);
}
mmdeploy_classifier_destroy(classifier);
mmdeploy_model_destroy(model);
mmdeploy_profiler_destroy(profiler);
mmdeploy_context_destroy(context);
return 0;
}
Analyze profiler Data¶
The performance data can be visualized using a script.
python tools/sdk_analyze.py profiler_data.txt
The parsing results are as follows: “name” represents the name of the node, “n_call” represents the number of calls, “t_mean” represents the average time consumption, “t_50%” and “t_90%” represent the percentiles of the time consumption.
+---------------------------+--------+-------+--------+--------+-------+-------+
| name | occupy | usage | n_call | t_mean | t_50% | t_90% |
+===========================+========+=======+========+========+=======+=======+
| ./Pipeline | - | - | 100 | 4.831 | 1.913 | 1.946 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| Preprocess/Compose | - | - | 100 | 0.125 | 0.118 | 0.144 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| LoadImageFromFile | 0.017 | 0.017 | 100 | 0.081 | 0.077 | 0.098 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| Resize | 0.003 | 0.003 | 100 | 0.012 | 0.012 | 0.013 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| CenterCrop | 0.002 | 0.002 | 100 | 0.008 | 0.008 | 0.008 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| Normalize | 0.002 | 0.002 | 100 | 0.009 | 0.009 | 0.009 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| ImageToTensor | 0.002 | 0.002 | 100 | 0.008 | 0.007 | 0.007 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| Collect | 0.001 | 0.001 | 100 | 0.005 | 0.005 | 0.005 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| resnet | 0.968 | 0.968 | 100 | 4.678 | 1.767 | 1.774 |
+---------------------------+--------+-------+--------+--------+-------+-------+
| postprocess | 0.003 | 0.003 | 100 | 0.015 | 0.015 | 0.017 |
+---------------------------+--------+-------+--------+--------+-------+-------+
API Reference¶
C API Reference¶
common.h¶
-
enum mmdeploy_pixel_format_t¶
Values:
-
enumerator MMDEPLOY_PIXEL_FORMAT_BGR¶
-
enumerator MMDEPLOY_PIXEL_FORMAT_RGB¶
-
enumerator MMDEPLOY_PIXEL_FORMAT_GRAYSCALE¶
-
enumerator MMDEPLOY_PIXEL_FORMAT_NV12¶
-
enumerator MMDEPLOY_PIXEL_FORMAT_NV21¶
-
enumerator MMDEPLOY_PIXEL_FORMAT_BGRA¶
-
enumerator MMDEPLOY_PIXEL_FORMAT_COUNT¶
-
enum mmdeploy_data_type_t¶
Values:
-
enumerator MMDEPLOY_DATA_TYPE_FLOAT¶
-
enumerator MMDEPLOY_DATA_TYPE_HALF¶
-
enumerator MMDEPLOY_DATA_TYPE_UINT8¶
-
enumerator MMDEPLOY_DATA_TYPE_INT32¶
-
enumerator MMDEPLOY_DATA_TYPE_COUNT¶
-
enum mmdeploy_status_t¶
Values:
-
enumerator MMDEPLOY_SUCCESS¶
-
enumerator MMDEPLOY_E_INVALID_ARG¶
-
enumerator MMDEPLOY_E_NOT_SUPPORTED¶
-
enumerator MMDEPLOY_E_OUT_OF_RANGE¶
-
enumerator MMDEPLOY_E_OUT_OF_MEMORY¶
-
enumerator MMDEPLOY_E_FILE_NOT_EXIST¶
-
enumerator MMDEPLOY_E_FAIL¶
-
enumerator MMDEPLOY_STATUS_COUNT¶
-
typedef struct mmdeploy_device *mmdeploy_device_t¶
-
typedef struct mmdeploy_profiler *mmdeploy_profiler_t¶
-
struct mmdeploy_mat_t¶
Public Members
-
uint8_t *data¶
-
int height¶
-
int width¶
-
int channel¶
-
mmdeploy_pixel_format_t format¶
-
mmdeploy_device_t device¶
-
struct mmdeploy_rect_t¶
-
struct mmdeploy_point_t¶
-
typedef struct mmdeploy_value *mmdeploy_value_t¶
-
typedef struct mmdeploy_context *mmdeploy_context_t¶
-
mmdeploy_value_t mmdeploy_value_copy(mmdeploy_value_t value)¶
Copy value
- Parameters
value –
- Returns
-
void mmdeploy_value_destroy(mmdeploy_value_t value)¶
Destroy value
- Parameters
value –
-
int mmdeploy_device_create(const char *device_name, int device_id, mmdeploy_device_t *device)¶
Create device handle
- Parameters
device_name –
device_id –
device –
- Returns
-
void mmdeploy_device_destroy(mmdeploy_device_t device)¶
Destroy device handle
- Parameters
device –
-
int mmdeploy_profiler_create(const char *path, mmdeploy_profiler_t *profiler)¶
Create profiler
- Parameters
path – path to save the profile data
profiler – handle for profiler, should be added to context and deleted by mmdeploy_profiler_destroy
- Returns
status of create
-
void mmdeploy_profiler_destroy(mmdeploy_profiler_t profiler)¶
Destroy profiler handle
- Parameters
profiler – handle for profiler, profile data will be written to disk after this call
-
int mmdeploy_context_create(mmdeploy_context_t *context)¶
Create context
- Parameters
context –
- Returns
-
int mmdeploy_context_create_by_device(const char *device_name, int device_id, mmdeploy_context_t *context)¶
Create context
- Parameters
device_name –
device_id –
context –
- Returns
-
void mmdeploy_context_destroy(mmdeploy_context_t context)¶
Destroy context
- Parameters
context –
-
int mmdeploy_context_add(mmdeploy_context_t context, mmdeploy_context_type_t type, const char *name, const void *object)¶
Add context object
- Parameters
context –
type –
name –
object –
- Returns
-
int mmdeploy_common_create_input(const mmdeploy_mat_t *mats, int mat_count, mmdeploy_value_t *value)¶
Create input value from array of mats
- Parameters
mats –
mat_count –
value –
- Returns
executor.h¶
-
typedef mmdeploy_value_t (*mmdeploy_then_fn_t)(mmdeploy_value_t, void*)¶
-
typedef mmdeploy_value_t (*mmdeploy_then_fn_v2_t)(mmdeploy_value_t*, void*)¶
-
typedef int (*mmdeploy_then_fn_v3_t)(mmdeploy_value_t *input, mmdeploy_value_t *output, void*)¶
-
typedef struct mmdeploy_sender *mmdeploy_sender_t¶
-
typedef struct mmdeploy_scheduler *mmdeploy_scheduler_t¶
-
typedef mmdeploy_sender_t (*mmdeploy_let_value_fn_t)(mmdeploy_value_t, void*)¶
-
mmdeploy_scheduler_t mmdeploy_executor_inline()¶
-
mmdeploy_scheduler_t mmdeploy_executor_system_pool()¶
-
mmdeploy_scheduler_t mmdeploy_executor_create_thread_pool(int num_threads)¶
Create a thread pool with the given number of worker threads
- Parameters
num_threads – [in]
- Returns
the handle to the created thread pool
-
mmdeploy_scheduler_t mmdeploy_executor_create_thread()¶
-
mmdeploy_scheduler_t mmdeploy_executor_dynamic_batch(mmdeploy_scheduler_t scheduler, int max_batch_size, int timeout)¶
-
int mmdeploy_scheduler_destroy(mmdeploy_scheduler_t scheduler)¶
-
mmdeploy_sender_t mmdeploy_sender_copy(mmdeploy_sender_t input)¶
Create a copy of a copyable sender. Only senders created by mmdeploy_executor_split are copyable for now.
- Parameters
input – [in] copyable sender,
- Returns
the sender created, or nullptr if the sender is not copyable
-
int mmdeploy_sender_destroy(mmdeploy_sender_t sender)¶
Destroy a sender. Notice that all sender adapters consume their input senders; only unused senders should be destroyed using this function.
- Parameters
input – [in]
-
mmdeploy_sender_t mmdeploy_executor_just(mmdeploy_value_t value)¶
Create a sender that sends the provided value.
- Parameters
value – [in]
- Returns
created sender
-
mmdeploy_sender_t mmdeploy_executor_schedule(mmdeploy_scheduler_t scheduler)¶
- Parameters
scheduler – [in]
- Returns
the sender created
-
mmdeploy_sender_t mmdeploy_executor_transfer_just(mmdeploy_scheduler_t scheduler, mmdeploy_value_t value)¶
-
mmdeploy_sender_t mmdeploy_executor_transfer(mmdeploy_sender_t input, mmdeploy_scheduler_t scheduler)¶
Transfer the execution to the execution agent of the provided scheduler
- Parameters
input – [in]
scheduler – [in]
- Returns
the sender created
-
mmdeploy_sender_t mmdeploy_executor_on(mmdeploy_scheduler_t scheduler, mmdeploy_sender_t input)¶
-
mmdeploy_sender_t mmdeploy_executor_then(mmdeploy_sender_t input, mmdeploy_then_fn_t fn, void *context)¶
-
mmdeploy_sender_t mmdeploy_executor_let_value(mmdeploy_sender_t input, mmdeploy_let_value_fn_t fn, void *context)¶
-
mmdeploy_sender_t mmdeploy_executor_split(mmdeploy_sender_t input)¶
Convert the input sender into a sender that is copyable via mmdeploy_sender_copy. Notice that this function doesn't make the sender multi-shot; it just returns a sender that is copyable.
- Parameters
input – [in]
- Returns
the sender that is copyable
-
mmdeploy_sender_t mmdeploy_executor_when_all(mmdeploy_sender_t inputs[], int32_t n)¶
-
mmdeploy_sender_t mmdeploy_executor_ensure_started(mmdeploy_sender_t input)¶
-
int mmdeploy_executor_start_detached(mmdeploy_sender_t input)¶
-
mmdeploy_value_t mmdeploy_executor_sync_wait(mmdeploy_sender_t input)¶
-
int mmdeploy_executor_sync_wait_v2(mmdeploy_sender_t input, mmdeploy_value_t *output)¶
-
void mmdeploy_executor_execute(mmdeploy_scheduler_t scheduler, void (*fn)(void*), void *context)¶
model.h¶
-
typedef struct mmdeploy_model *mmdeploy_model_t¶
-
int mmdeploy_model_create_by_path(const char *path, mmdeploy_model_t *model)¶
Create SDK Model instance from given model path.
- Parameters
path – [in] model path
model – [out] sdk model instance that must be destroyed by mmdeploy_model_destroy
- Returns
status code of the operation
-
int mmdeploy_model_create(const void *buffer, int size, mmdeploy_model_t *model)¶
Create SDK Model instance from memory.
- Parameters
buffer – [in] a linear buffer contains the model information
size – [in] size of
buffer
in bytesmodel – [out] sdk model instance that must be destroyed by mmdeploy_model_destroy
- Returns
status code of the operation
-
void mmdeploy_model_destroy(mmdeploy_model_t model)¶
Destroy model instance.
- Parameters
model – [in] sdk model instance created by mmdeploy_model_create_by_path or mmdeploy_model_create
pipeline.h¶
-
typedef struct mmdeploy_pipeline *mmdeploy_pipeline_t¶
-
int mmdeploy_pipeline_create_v3(mmdeploy_value_t config, mmdeploy_context_t context, mmdeploy_pipeline_t *pipeline)¶
Create pipeline
- Parameters
config –
context –
pipeline –
- Returns
-
int mmdeploy_pipeline_create_from_model(mmdeploy_model_t model, mmdeploy_context_t context, mmdeploy_pipeline_t *pipeline)¶
Create pipeline from internal pipeline config of the model
- Parameters
model –
context –
pipeline –
- Returns
-
int mmdeploy_pipeline_apply(mmdeploy_pipeline_t pipeline, mmdeploy_value_t input, mmdeploy_value_t *output)¶
Apply pipeline.
- Parameters
pipeline – [in] handle of the pipeline
input – [in] input value
output – [out] output value
- Returns
status of the operation
-
int mmdeploy_pipeline_apply_async(mmdeploy_pipeline_t pipeline, mmdeploy_sender_t input, mmdeploy_sender_t *output)¶
Apply pipeline asynchronously
- Parameters
pipeline – handle of the pipeline
input – input sender that will be consumed by the operation
output – output sender
- Returns
status of the operation
-
void mmdeploy_pipeline_destroy(mmdeploy_pipeline_t pipeline)¶
destroy pipeline
- Parameters
pipeline – [in]
classifier.h¶
-
struct mmdeploy_classification_t¶
-
typedef struct mmdeploy_classifier *mmdeploy_classifier_t¶
-
int mmdeploy_classifier_create(mmdeploy_model_t model, const char *device_name, int device_id, mmdeploy_classifier_t *classifier)¶
Create classifier’s handle.
- Parameters
model – [in] an instance of mmclassification sdk model created by mmdeploy_model_create_by_path or mmdeploy_model_create in model.h
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
classifier – [out] instance of a classifier, which must be destroyed by mmdeploy_classifier_destroy
- Returns
status of creating classifier’s handle
-
int mmdeploy_classifier_create_by_path(const char *model_path, const char *device_name, int device_id, mmdeploy_classifier_t *classifier)¶
Create classifier’s handle.
- Parameters
model_path – [in] path of mmclassification sdk model exported by mmdeploy model converter
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
classifier – [out] instance of a classifier, which must be destroyed by mmdeploy_classifier_destroy
- Returns
status of creating classifier’s handle
-
int mmdeploy_classifier_apply(mmdeploy_classifier_t classifier, const mmdeploy_mat_t *mats, int mat_count, mmdeploy_classification_t **results, int **result_count)¶
Use classifier created by mmdeploy_classifier_create_by_path to get label information of each image in a batch.
- Parameters
classifier – [in] classifier’s handle created by mmdeploy_classifier_create_by_path
mats – [in] a batch of images
mat_count – [in] number of images in the batch
results – [out] a linear buffer to save classification results of each image, which must be freed by mmdeploy_classifier_release_result
result_count – [out] a linear buffer with length being
mat_count
to save the number of classification results of each image. It must be released by mmdeploy_classifier_release_result
- Returns
status of inference
-
void mmdeploy_classifier_release_result(mmdeploy_classification_t *results, const int *result_count, int count)¶
Release the inference result buffer created by mmdeploy_classifier_apply.
- Parameters
results – [in] classification results buffer
result_count – [in]
results
size buffercount – [in] length of
result_count
-
void mmdeploy_classifier_destroy(mmdeploy_classifier_t classifier)¶
Destroy classifier’s handle.
- Parameters
classifier – [in] classifier’s handle created by mmdeploy_classifier_create_by_path
-
int mmdeploy_classifier_create_v2(mmdeploy_model_t model, mmdeploy_context_t context, mmdeploy_classifier_t *classifier)¶
Same as mmdeploy_classifier_create, but allows controlling the execution context of tasks via context.
-
int mmdeploy_classifier_create_input(const mmdeploy_mat_t *mats, int mat_count, mmdeploy_value_t *value)¶
Pack classifier inputs into mmdeploy_value_t.
- Parameters
mats – [in] a batch of images
mat_count – [in] number of images in the batch
value – [out] the packed value
- Returns
status of the operation
-
int mmdeploy_classifier_apply_v2(mmdeploy_classifier_t classifier, mmdeploy_value_t input, mmdeploy_value_t *output)¶
Same as mmdeploy_classifier_apply, but input and output are packed in mmdeploy_value_t.
-
int mmdeploy_classifier_apply_async(mmdeploy_classifier_t classifier, mmdeploy_sender_t input, mmdeploy_sender_t *output)¶
Apply classifier asynchronously.
- Parameters
classifier – [in] handle of the classifier
input – [in] input sender that will be consumed by the operation
output – [out] output sender
- Returns
status of the operation
-
int mmdeploy_classifier_get_result(mmdeploy_value_t output, mmdeploy_classification_t **results, int **result_count)¶
- Parameters
output – [in] output obtained by applying a classifier
results – [out] a linear buffer containing classification results of each image, released by mmdeploy_classifier_release_result
result_count – [out] a linear buffer containing the number of results for each input image, released by mmdeploy_classifier_release_result
- Returns
status of the operation
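For reference, the snippet below sketches the synchronous classification workflow end to end: create a handle from an SDK model path, feed one BGR image, print the predictions, and free every buffer. It is only a sketch: the mmdeploy_mat_t field names, the MMDEPLOY_PIXEL_FORMAT_BGR / MMDEPLOY_DATA_TYPE_UINT8 enums and the label_id / score members are assumptions based on common.h and the upstream header, and the model path and image buffer are placeholders.
#include <stdio.h>
#include <stdint.h>
#include "mmdeploy/classifier.h"

/* Minimal sketch: classify one BGR uint8 image held in `data` (h x w x 3). */
int classify_one(const char* model_path, uint8_t* data, int h, int w) {
  mmdeploy_classifier_t classifier = NULL;
  int ec = mmdeploy_classifier_create_by_path(model_path, "cpu", 0, &classifier);
  if (ec) return ec;

  /* field and enum names assumed from common.h; adjust if your SDK version differs */
  mmdeploy_mat_t mat = {.data = data,
                        .height = h,
                        .width = w,
                        .channel = 3,
                        .format = MMDEPLOY_PIXEL_FORMAT_BGR,
                        .type = MMDEPLOY_DATA_TYPE_UINT8};

  mmdeploy_classification_t* results = NULL;
  int* result_count = NULL;
  ec = mmdeploy_classifier_apply(classifier, &mat, 1, &results, &result_count);
  if (!ec) {
    for (int i = 0; i < result_count[0]; ++i) {
      /* label_id / score members assumed from the upstream header */
      printf("label: %d, score: %.4f\n", results[i].label_id, results[i].score);
    }
    mmdeploy_classifier_release_result(results, result_count, 1);
  }
  mmdeploy_classifier_destroy(classifier);
  return ec;
}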
detector.h¶
-
struct mmdeploy_instance_mask_t¶
-
struct mmdeploy_detection_t¶
-
typedef struct mmdeploy_detector *mmdeploy_detector_t¶
-
int mmdeploy_detector_create(mmdeploy_model_t model, const char *device_name, int device_id, mmdeploy_detector_t *detector)¶
Create detector’s handle.
- Parameters
model – [in] an instance of mmdetection sdk model created by mmdeploy_model_create_by_path or mmdeploy_model_create in model.h
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
detector – [out] instance of a detector
- Returns
status of creating detector’s handle
-
int mmdeploy_detector_create_by_path(const char *model_path, const char *device_name, int device_id, mmdeploy_detector_t *detector)¶
Create detector’s handle.
- Parameters
model_path – [in] path of mmdetection sdk model exported by mmdeploy model converter
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
detector – [out] instance of a detector
- Returns
status of creating detector’s handle
-
int mmdeploy_detector_apply(mmdeploy_detector_t detector, const mmdeploy_mat_t *mats, int mat_count, mmdeploy_detection_t **results, int **result_count)¶
Apply detector to a batch of images and get their inference results.
- Parameters
detector – [in] detector’s handle created by mmdeploy_detector_create_by_path
mats – [in] a batch of images
mat_count – [in] number of images in the batch
results – [out] a linear buffer to save detection results of each image. It must be released by mmdeploy_detector_release_result
result_count – [out] a linear buffer with length being mat_count to save the number of detection results of each image. It must be released by mmdeploy_detector_release_result
- Returns
status of inference
-
void mmdeploy_detector_release_result(mmdeploy_detection_t *results, const int *result_count, int count)¶
Release the inference result buffer created by mmdeploy_detector_apply.
- Parameters
results – [in] detection results buffer
result_count – [in] results size buffer
count – [in] length of result_count
-
void mmdeploy_detector_destroy(mmdeploy_detector_t detector)¶
Destroy detector’s handle.
- Parameters
detector – [in] detector’s handle created by mmdeploy_detector_create_by_path
-
int mmdeploy_detector_create_v2(mmdeploy_model_t model, mmdeploy_context_t context, mmdeploy_detector_t *detector)¶
Same as mmdeploy_detector_create, but allows controlling the execution context of tasks via context.
-
int mmdeploy_detector_create_input(const mmdeploy_mat_t *mats, int mat_count, mmdeploy_value_t *input)¶
Pack detector inputs into mmdeploy_value_t.
- Parameters
mats – [in] a batch of images
mat_count – [in] number of images in the batch
input – [out] the packed value
- Returns
status of the operation
-
int mmdeploy_detector_apply_v2(mmdeploy_detector_t detector, mmdeploy_value_t input, mmdeploy_value_t *output)¶
Same as mmdeploy_detector_apply, but input and output are packed in mmdeploy_value_t.
-
int mmdeploy_detector_apply_async(mmdeploy_detector_t detector, mmdeploy_sender_t input, mmdeploy_sender_t *output)¶
Apply detector asynchronously.
- Parameters
detector – [in] handle to the detector
input – [in] input sender that will be consumed by the operation
output – [out] output sender
- Returns
status of the operation
-
int mmdeploy_detector_get_result(mmdeploy_value_t output, mmdeploy_detection_t **results, int **result_count)¶
Unpack detector output from a mmdeploy_value_t.
- Parameters
output – [in] output obtained by applying a detector
results – [out] a linear buffer to save detection results of each image. It must be released by mmdeploy_detector_release_result
result_count – [out] a linear buffer whose length equals the number of input images, saving the number of detection results of each image. Must be released by mmdeploy_detector_release_result
- Returns
status of the operation
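The detector follows the same create / apply / release / destroy pattern. The sketch below assumes the same mmdeploy_mat_t layout as above and the upstream mmdeploy_detection_t and mmdeploy_rect_t members (label_id, score, bbox with left/top/right/bottom); treat it as an illustration rather than a verbatim demo.
#include <stdio.h>
#include <stdint.h>
#include "mmdeploy/detector.h"

/* Minimal sketch: run object detection on one BGR uint8 image and print the boxes. */
int detect_one(const char* model_path, uint8_t* data, int h, int w) {
  mmdeploy_detector_t detector = NULL;
  int ec = mmdeploy_detector_create_by_path(model_path, "cuda", 0, &detector);
  if (ec) return ec;

  mmdeploy_mat_t mat = {.data = data, .height = h, .width = w, .channel = 3,
                        .format = MMDEPLOY_PIXEL_FORMAT_BGR,
                        .type = MMDEPLOY_DATA_TYPE_UINT8};

  mmdeploy_detection_t* dets = NULL;
  int* det_count = NULL;
  ec = mmdeploy_detector_apply(detector, &mat, 1, &dets, &det_count);
  if (!ec) {
    for (int i = 0; i < det_count[0]; ++i) {
      const mmdeploy_rect_t* b = &dets[i].bbox;  /* member names assumed from the upstream header */
      printf("label %d score %.3f box (%.1f, %.1f, %.1f, %.1f)\n",
             dets[i].label_id, dets[i].score, b->left, b->top, b->right, b->bottom);
    }
    mmdeploy_detector_release_result(dets, det_count, 1);
  }
  mmdeploy_detector_destroy(detector);
  return ec;
}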
pose_detector.h¶
-
struct mmdeploy_pose_detection_t¶
Public Members
-
mmdeploy_point_t *point¶
keypoint
-
float *score¶
keypoint score
-
int length¶
number of keypoints
-
typedef struct mmdeploy_pose_detector *mmdeploy_pose_detector_t¶
-
int mmdeploy_pose_detector_create(mmdeploy_model_t model, const char *device_name, int device_id, mmdeploy_pose_detector_t *detector)¶
Create a pose detector instance.
- Parameters
model – [in] an instance of mmpose model created by mmdeploy_model_create_by_path or mmdeploy_model_create in model.h
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
detector – [out] handle of the created pose detector, which must be destroyed by mmdeploy_pose_detector_destroy
- Returns
status code of the operation
-
int mmdeploy_pose_detector_create_by_path(const char *model_path, const char *device_name, int device_id, mmdeploy_pose_detector_t *detector)¶
Create a pose detector instance.
- Parameters
model_path – [in] path to pose detection model
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
detector – [out] handle of the created pose detector, which must be destroyed by mmdeploy_pose_detector_destroy
- Returns
status code of the operation
-
int mmdeploy_pose_detector_apply(mmdeploy_pose_detector_t detector, const mmdeploy_mat_t *mats, int mat_count, mmdeploy_pose_detection_t **results)¶
Apply pose detector to a batch of images with full image roi.
- Parameters
detector – [in] pose detector’s handle created by mmdeploy_pose_detector_create_by_path
mats – [in] a batch of images
mat_count – [in] number of images in the batch
results – [out] a linear buffer containing the pose results, which must be released by mmdeploy_pose_detector_release_result
- Returns
status code of the operation
-
int mmdeploy_pose_detector_apply_bbox(mmdeploy_pose_detector_t detector, const mmdeploy_mat_t *mats, int mat_count, const mmdeploy_rect_t *bboxes, const int *bbox_count, mmdeploy_pose_detection_t **results)¶
Apply pose detector to a batch of images supplied with bboxes (ROIs).
- Parameters
detector – [in] pose detector’s handle created by mmdeploy_pose_detector_create_by_path
mats – [in] a batch of images
mat_count – [in] number of images in the batch
bboxes – [in] bounding boxes (ROIs) detected by mmdet
bbox_count – [in] number of bboxes of each image; must have the same length as mats
results – [out] a linear buffer containing the pose results, with the same length as bboxes; it must be released by mmdeploy_pose_detector_release_result
- Returns
status code of the operation
-
void mmdeploy_pose_detector_release_result(mmdeploy_pose_detection_t *results, int count)¶
Release result buffer returned by mmdeploy_pose_detector_apply or mmdeploy_pose_detector_apply_bbox.
- Parameters
results – [in] result buffer by pose detector
count – [in] length of results
-
void mmdeploy_pose_detector_destroy(mmdeploy_pose_detector_t detector)¶
destroy pose_detector
- Parameters
detector – [in] handle of pose_detector created by mmdeploy_pose_detector_create_by_path or mmdeploy_pose_detector_create
-
int mmdeploy_pose_detector_create_v2(mmdeploy_model_t model, mmdeploy_context_t context, mmdeploy_pose_detector_t *detector)¶
-
int mmdeploy_pose_detector_create_input(const mmdeploy_mat_t *mats, int mat_count, const mmdeploy_rect_t *bboxes, const int *bbox_count, mmdeploy_value_t *value)¶
-
int mmdeploy_pose_detector_apply_v2(mmdeploy_pose_detector_t detector, mmdeploy_value_t input, mmdeploy_value_t *output)¶
-
int mmdeploy_pose_detector_apply_async(mmdeploy_pose_detector_t detector, mmdeploy_sender_t input, mmdeploy_sender_t *output)¶
-
int mmdeploy_pose_detector_get_result(mmdeploy_value_t output, mmdeploy_pose_detection_t **results)¶
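In a typical top-down pipeline, the ROIs come from an object detector and are passed to mmdeploy_pose_detector_apply_bbox. The sketch below runs one image with a single placeholder ROI and reads the keypoints through the point / score / length members documented above; the mmdeploy_mat_t, mmdeploy_rect_t and mmdeploy_point_t field names are assumptions based on common.h.
#include <stdio.h>
#include <stdint.h>
#include "mmdeploy/pose_detector.h"

/* Sketch: estimate keypoints inside one ROI of a BGR uint8 image. */
int pose_one_roi(const char* model_path, uint8_t* data, int h, int w) {
  mmdeploy_pose_detector_t detector = NULL;
  int ec = mmdeploy_pose_detector_create_by_path(model_path, "cuda", 0, &detector);
  if (ec) return ec;

  mmdeploy_mat_t mat = {.data = data, .height = h, .width = w, .channel = 3,
                        .format = MMDEPLOY_PIXEL_FORMAT_BGR,
                        .type = MMDEPLOY_DATA_TYPE_UINT8};
  mmdeploy_rect_t roi = {.left = 100, .top = 50, .right = 300, .bottom = 400};  /* placeholder box */
  int bbox_count = 1;  /* one ROI for the single image */

  mmdeploy_pose_detection_t* results = NULL;
  ec = mmdeploy_pose_detector_apply_bbox(detector, &mat, 1, &roi, &bbox_count, &results);
  if (!ec) {
    for (int i = 0; i < results[0].length; ++i) {
      printf("kpt %d: (%.1f, %.1f) score %.3f\n", i,
             results[0].point[i].x, results[0].point[i].y, results[0].score[i]);
    }
    /* one result per ROI; the release count equals the total number of ROIs */
    mmdeploy_pose_detector_release_result(results, 1);
  }
  mmdeploy_pose_detector_destroy(detector);
  return ec;
}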
pose_tracker.h¶
-
typedef struct mmdeploy_pose_tracker *mmdeploy_pose_tracker_t¶
-
typedef struct mmdeploy_pose_tracker_state *mmdeploy_pose_tracker_state_t¶
-
struct mmdeploy_pose_tracker_param_t¶
Public Members
-
int32_t det_interval¶
-
int32_t det_label¶
-
float det_thr¶
-
float det_min_bbox_size¶
-
float det_nms_thr¶
-
int32_t pose_max_num_bboxes¶
-
float pose_kpt_thr¶
-
int32_t pose_min_keypoints¶
-
float pose_bbox_scale¶
-
float pose_min_bbox_size¶
-
float pose_nms_thr¶
-
float *keypoint_sigmas¶
-
int32_t keypoint_sigmas_size¶
-
float track_iou_thr¶
-
int32_t track_max_missing¶
-
int32_t track_history_size¶
-
float std_weight_position¶
-
float std_weight_velocity¶
-
float smooth_params[3]¶
-
struct mmdeploy_pose_tracker_target_t¶
Public Members
-
mmdeploy_point_t *keypoints¶
-
int32_t keypoint_count¶
-
float *scores¶
-
mmdeploy_rect_t bbox¶
-
uint32_t target_id¶
-
int mmdeploy_pose_tracker_default_params(mmdeploy_pose_tracker_param_t *params)¶
Fill params with default parameters.
- Parameters
params – [inout]
- Returns
status of the operation
-
int mmdeploy_pose_tracker_create(mmdeploy_model_t det_model, mmdeploy_model_t pose_model, mmdeploy_context_t context, mmdeploy_pose_tracker_t *pipeline)¶
Create pose tracker pipeline.
- Parameters
det_model – [in] detection model object, created by mmdeploy_model_create
pose_model – [in] pose model object
context – [in] context object describing execution environment (device, profiler, etc…), created by mmdeploy_context_create
pipeline – [out] handle of the created pipeline
- Returns
status of the operation
-
void mmdeploy_pose_tracker_destroy(mmdeploy_pose_tracker_t pipeline)¶
Destroy pose tracker pipeline.
- Parameters
pipeline – [in]
-
int mmdeploy_pose_tracker_create_state(mmdeploy_pose_tracker_t pipeline, const mmdeploy_pose_tracker_param_t *params, mmdeploy_pose_tracker_state_t *state)¶
Create a tracker state handle that corresponds to a video stream.
- Parameters
pipeline – [in] handle of a pose tracker pipeline
params – [in] params for creating the tracker state
state – [out] handle of the created tracker state
- Returns
status of the operation
-
void mmdeploy_pose_tracker_destroy_state(mmdeploy_pose_tracker_state_t state)¶
Destroy tracker state.
- Parameters
state – [in] handle of the tracker state
-
int mmdeploy_pose_tracker_apply(mmdeploy_pose_tracker_t pipeline, mmdeploy_pose_tracker_state_t *states, const mmdeploy_mat_t *frames, const int32_t *use_detect, int32_t count, mmdeploy_pose_tracker_target_t **results, int32_t **result_count)¶
Apply pose tracker pipeline; notice that this function supports batch operation by feeding arrays of size count to states, frames and use_detect.
- Parameters
pipeline – [in] handle of a pose tracker pipeline
states – [in] tracker state handles, array of size count
frames – [in] input frames, array of size count
use_detect – [in] controls the use of the detector, array of size count; -1: use params.det_interval, 0: don't use the detector, 1: force the use of the detector
count – [in] batch size
results – [out] a linear buffer containing the tracked targets of the input frames. Should be released by mmdeploy_pose_tracker_release_result
result_count – [out] a linear buffer of size count containing the number of tracked targets of each frame. Should be released by mmdeploy_pose_tracker_release_result
- Returns
status of the operation
-
void mmdeploy_pose_tracker_release_result(mmdeploy_pose_tracker_target_t *results, const int32_t *result_count, int count)¶
Release result objects.
- Parameters
results – [in]
result_count – [in]
count – [in]
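Putting the pieces together, the hedged sketch below tracks one video stream frame by frame (count = 1 per call), with use_detect = -1 so that params.det_interval decides when the detector runs. It assumes mmdeploy_model_create_by_path / mmdeploy_model_destroy from model.h and mmdeploy_context_create_by_device / mmdeploy_context_destroy from common.h; check your SDK version if those helpers differ.
#include <stdio.h>
#include "mmdeploy/pose_tracker.h"

/* Sketch: track poses over a stream of decoded frames, one frame per call. */
int track_stream(const char* det_path, const char* pose_path,
                 const mmdeploy_mat_t* frames, int num_frames) {
  mmdeploy_context_t context = NULL;
  mmdeploy_model_t det_model = NULL, pose_model = NULL;
  mmdeploy_pose_tracker_t tracker = NULL;
  mmdeploy_pose_tracker_state_t state = NULL;

  /* assumed helpers from common.h / model.h; error handling trimmed for brevity */
  mmdeploy_context_create_by_device("cuda", 0, &context);
  mmdeploy_model_create_by_path(det_path, &det_model);
  mmdeploy_model_create_by_path(pose_path, &pose_model);

  int ec = mmdeploy_pose_tracker_create(det_model, pose_model, context, &tracker);
  if (ec) return ec;

  mmdeploy_pose_tracker_param_t params;
  mmdeploy_pose_tracker_default_params(&params);  /* then tweak e.g. params.det_interval */
  mmdeploy_pose_tracker_create_state(tracker, &params, &state);

  for (int f = 0; f < num_frames; ++f) {
    mmdeploy_pose_tracker_target_t* targets = NULL;
    int32_t* target_count = NULL;
    int32_t use_detect = -1;  /* let det_interval decide */
    ec = mmdeploy_pose_tracker_apply(tracker, &state, &frames[f], &use_detect, 1,
                                     &targets, &target_count);
    if (ec) break;
    printf("frame %d: %d tracked targets\n", f, (int)target_count[0]);
    mmdeploy_pose_tracker_release_result(targets, target_count, 1);
  }

  mmdeploy_pose_tracker_destroy_state(state);
  mmdeploy_pose_tracker_destroy(tracker);
  mmdeploy_model_destroy(det_model);
  mmdeploy_model_destroy(pose_model);
  mmdeploy_context_destroy(context);
  return ec;
}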
rotated_detector.h¶
-
struct mmdeploy_rotated_detection_t¶
-
typedef struct mmdeploy_rotated_detector *mmdeploy_rotated_detector_t¶
-
int mmdeploy_rotated_detector_create(mmdeploy_model_t model, const char *device_name, int device_id, mmdeploy_rotated_detector_t *detector)¶
Create rotated detector’s handle.
- Parameters
model – [in] an instance of mmrotate sdk model created by mmdeploy_model_create_by_path or mmdeploy_model_create in model.h
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
detector – [out] instance of a rotated detector
- Returns
status of creating rotated detector’s handle
-
int mmdeploy_rotated_detector_create_by_path(const char *model_path, const char *device_name, int device_id, mmdeploy_rotated_detector_t *detector)¶
Create rotated detector’s handle.
- Parameters
model_path – [in] path of mmrotate sdk model exported by mmdeploy model converter
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
detector – [out] instance of a rotated detector
- Returns
status of creating rotated detector’s handle
-
int mmdeploy_rotated_detector_apply(mmdeploy_rotated_detector_t detector, const mmdeploy_mat_t *mats, int mat_count, mmdeploy_rotated_detection_t **results, int **result_count)¶
Apply rotated detector to a batch of images and get their inference results.
- Parameters
detector – [in] rotated detector’s handle created by mmdeploy_rotated_detector_create_by_path
mats – [in] a batch of images
mat_count – [in] number of images in the batch
results – [out] a linear buffer to save detection results of each image. It must be released by mmdeploy_rotated_detector_release_result
result_count – [out] a linear buffer with length being mat_count to save the number of detection results of each image. It must be released by mmdeploy_rotated_detector_release_result
- Returns
status of inference
-
void mmdeploy_rotated_detector_release_result(mmdeploy_rotated_detection_t *results, const int *result_count)¶
Release the inference result buffer created by mmdeploy_rotated_detector_apply.
- Parameters
results – [in] rotated detection results buffer
result_count – [in] results size buffer
-
void mmdeploy_rotated_detector_destroy(mmdeploy_rotated_detector_t detector)¶
Destroy rotated detector’s handle.
- Parameters
detector – [in] rotated detector’s handle created by mmdeploy_rotated_detector_create_by_path or by mmdeploy_rotated_detector_create
-
int mmdeploy_rotated_detector_create_v2(mmdeploy_model_t model, mmdeploy_context_t context, mmdeploy_rotated_detector_t *detector)¶
Same as mmdeploy_rotated_detector_create, but allows controlling the execution context of tasks via context.
-
int mmdeploy_rotated_detector_create_input(const mmdeploy_mat_t *mats, int mat_count, mmdeploy_value_t *input)¶
Pack rotated detector inputs into mmdeploy_value_t.
- Parameters
mats – [in] a batch of images
mat_count – [in] number of images in the batch
input – [out] the packed value
- Returns
status of the operation
-
int mmdeploy_rotated_detector_apply_v2(mmdeploy_rotated_detector_t detector, mmdeploy_value_t input, mmdeploy_value_t *output)¶
Same as mmdeploy_rotated_detector_apply, but input and output are packed in mmdeploy_value_t.
-
int mmdeploy_rotated_detector_apply_async(mmdeploy_rotated_detector_t detector, mmdeploy_sender_t input, mmdeploy_sender_t *output)¶
Apply rotated detector asynchronously.
- Parameters
detector – [in] handle to the detector
input – [in] input sender that will be consumed by the operation
output – [out] output sender
- Returns
status of the operation
-
int mmdeploy_rotated_detector_get_result(mmdeploy_value_t output, mmdeploy_rotated_detection_t **results, int **result_count)¶
Unpack rotated detector output from a mmdeploy_value_t.
- Parameters
output – [in] output obtained by applying a detector
results – [out] a linear buffer to save detection results of each image. It must be released by mmdeploy_rotated_detector_release_result
result_count – [out] a linear buffer whose length equals the number of input images, saving the number of detection results of each image. Must be released by mmdeploy_rotated_detector_release_result
- Returns
status of the operation
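The call pattern mirrors the regular detector; note that mmdeploy_rotated_detector_release_result takes no trailing count argument. A minimal sketch, assuming the mmdeploy_mat_t layout from common.h:
#include <stdio.h>
#include <stdint.h>
#include "mmdeploy/rotated_detector.h"

/* Sketch: run a rotated detector on one image and report how many boxes it found. */
int rotated_detect_one(const char* model_path, uint8_t* data, int h, int w) {
  mmdeploy_rotated_detector_t detector = NULL;
  int ec = mmdeploy_rotated_detector_create_by_path(model_path, "cuda", 0, &detector);
  if (ec) return ec;

  mmdeploy_mat_t mat = {.data = data, .height = h, .width = w, .channel = 3,
                        .format = MMDEPLOY_PIXEL_FORMAT_BGR,
                        .type = MMDEPLOY_DATA_TYPE_UINT8};

  mmdeploy_rotated_detection_t* dets = NULL;
  int* det_count = NULL;
  ec = mmdeploy_rotated_detector_apply(detector, &mat, 1, &dets, &det_count);
  if (!ec) {
    printf("detected %d rotated boxes\n", det_count[0]);
    /* per-box fields (label, score, rotated box) live in mmdeploy_rotated_detection_t */
    mmdeploy_rotated_detector_release_result(dets, det_count);
  }
  mmdeploy_rotated_detector_destroy(detector);
  return ec;
}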
segmentor.h¶
-
struct mmdeploy_segmentation_t¶
Public Members
-
int height¶
height of mask, which equals the input image's height
-
int width¶
width of mask, which equals the input image's width
-
int classes¶
the number of labels in mask
-
int *mask¶
segmentation mask of the input image, in which mask[i * width + j] indicates the label id of pixel at (i, j), this field might be null
-
float *score¶
segmentation score map of the input image in CHW format, in which score[height * width * k + i * width + j] indicates the score of class k at pixel (i, j), this field might be null
-
typedef struct mmdeploy_segmentor *mmdeploy_segmentor_t¶
-
int mmdeploy_segmentor_create(mmdeploy_model_t model, const char *device_name, int device_id, mmdeploy_segmentor_t *segmentor)¶
Create segmentor’s handle.
- Parameters
model – [in] an instance of mmsegmentation sdk model created by mmdeploy_model_create_by_path or mmdeploy_model_create in model.h
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
segmentor – [out] instance of a segmentor, which must be destroyed by mmdeploy_segmentor_destroy
- Returns
status of creating segmentor’s handle
-
int mmdeploy_segmentor_create_by_path(const char *model_path, const char *device_name, int device_id, mmdeploy_segmentor_t *segmentor)¶
Create segmentor’s handle.
- Parameters
model_path – [in] path of mmsegmentation sdk model exported by mmdeploy model converter
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
segmentor – [out] instance of a segmentor, which must be destroyed by mmdeploy_segmentor_destroy
- Returns
status of creating segmentor’s handle
-
int mmdeploy_segmentor_apply(mmdeploy_segmentor_t segmentor, const mmdeploy_mat_t *mats, int mat_count, mmdeploy_segmentation_t **results)¶
Apply segmentor to a batch of images and get their inference results.
- Parameters
segmentor – [in] segmentor’s handle created by mmdeploy_segmentor_create_by_path or mmdeploy_segmentor_create
mats – [in] a batch of images
mat_count – [in] number of images in the batch
results – [out] a linear buffer of length mat_count to save the segmentation result of each image. It must be released by mmdeploy_segmentor_release_result
- Returns
status of inference
-
void mmdeploy_segmentor_release_result(mmdeploy_segmentation_t *results, int count)¶
Release result buffer returned by mmdeploy_segmentor_apply.
- Parameters
results – [in] result buffer
count – [in] length of results
-
void mmdeploy_segmentor_destroy(mmdeploy_segmentor_t segmentor)¶
Destroy segmentor’s handle.
- Parameters
segmentor – [in] segmentor’s handle created by mmdeploy_segmentor_create_by_path
-
int mmdeploy_segmentor_create_v2(mmdeploy_model_t model, mmdeploy_context_t context, mmdeploy_segmentor_t *segmentor)¶
-
int mmdeploy_segmentor_create_input(const mmdeploy_mat_t *mats, int mat_count, mmdeploy_value_t *value)¶
-
int mmdeploy_segmentor_apply_v2(mmdeploy_segmentor_t segmentor, mmdeploy_value_t input, mmdeploy_value_t *output)¶
-
int mmdeploy_segmentor_apply_async(mmdeploy_segmentor_t segmentor, mmdeploy_sender_t input, mmdeploy_sender_t *output)¶
-
int mmdeploy_segmentor_get_result(mmdeploy_value_t output, mmdeploy_segmentation_t **results)¶
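Since mask[i * width + j] stores the label id of pixel (i, j), post-processing is a plain loop over the buffer. The sketch below counts the pixels of one label for a single image; the mmdeploy_mat_t layout is assumed from common.h, and mask may be null when the model only exports the score map.
#include <stdio.h>
#include <stdint.h>
#include "mmdeploy/segmentor.h"

/* Sketch: segment one image and count the pixels assigned to a given label. */
int count_label_pixels(const char* model_path, uint8_t* data, int h, int w, int label) {
  mmdeploy_segmentor_t segmentor = NULL;
  int ec = mmdeploy_segmentor_create_by_path(model_path, "cuda", 0, &segmentor);
  if (ec) return ec;

  mmdeploy_mat_t mat = {.data = data, .height = h, .width = w, .channel = 3,
                        .format = MMDEPLOY_PIXEL_FORMAT_BGR,
                        .type = MMDEPLOY_DATA_TYPE_UINT8};

  mmdeploy_segmentation_t* result = NULL;
  ec = mmdeploy_segmentor_apply(segmentor, &mat, 1, &result);
  if (!ec) {
    long pixels = 0;
    /* mask[i * width + j] is the label id of pixel (i, j); mask might be null */
    if (result[0].mask) {
      for (int i = 0; i < result[0].height; ++i)
        for (int j = 0; j < result[0].width; ++j)
          if (result[0].mask[i * result[0].width + j] == label) ++pixels;
    }
    printf("label %d covers %ld pixels\n", label, pixels);
    mmdeploy_segmentor_release_result(result, 1);
  }
  mmdeploy_segmentor_destroy(segmentor);
  return ec;
}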
text_detector.h¶
-
struct mmdeploy_text_detection_t¶
Public Members
-
mmdeploy_point_t bbox[4]¶
a text bounding box whose vertices are in clockwise order
-
float score¶
-
typedef struct mmdeploy_text_detector *mmdeploy_text_detector_t¶
-
int mmdeploy_text_detector_create(mmdeploy_model_t model, const char *device_name, int device_id, mmdeploy_text_detector_t *detector)¶
Create text-detector’s handle.
- Parameters
model – [in] an instance of mmocr text detection model created by mmdeploy_model_create_by_path or mmdeploy_model_create in model.h
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
detector – [out] instance of a text-detector, which must be destroyed by mmdeploy_text_detector_destroy
- Returns
status of creating text-detector’s handle
-
int mmdeploy_text_detector_create_by_path(const char *model_path, const char *device_name, int device_id, mmdeploy_text_detector_t *detector)¶
Create text-detector’s handle.
- Parameters
model_path – [in] path to text detection model
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device
detector – [out] instance of a text-detector, which must be destroyed by mmdeploy_text_detector_destroy
- Returns
status of creating text-detector’s handle
-
int mmdeploy_text_detector_apply(mmdeploy_text_detector_t detector, const mmdeploy_mat_t *mats, int mat_count, mmdeploy_text_detection_t **results, int **result_count)¶
Apply text-detector to a batch of images and get their inference results.
- Parameters
detector – [in] text-detector’s handle created by mmdeploy_text_detector_create_by_path
mats – [in] a batch of images
mat_count – [in] number of images in the batch
results – [out] a linear buffer to save text detection results of each image. It must be released by calling mmdeploy_text_detector_release_result
result_count – [out] a linear buffer of length mat_count to save the number of detection results of each image. It must be released by mmdeploy_text_detector_release_result
- Returns
status of inference
-
void mmdeploy_text_detector_release_result(mmdeploy_text_detection_t *results, const int *result_count, int count)¶
Release the inference result buffer returned by mmdeploy_text_detector_apply.
- Parameters
results – [in] text detection result buffer
result_count – [in] results size buffer
count – [in] length of the result_count buffer
-
void mmdeploy_text_detector_destroy(mmdeploy_text_detector_t detector)¶
Destroy text-detector’s handle.
- Parameters
detector – [in] text-detector’s handle created by mmdeploy_text_detector_create_by_path or mmdeploy_text_detector_create
-
int mmdeploy_text_detector_create_v2(mmdeploy_model_t model, mmdeploy_context_t context, mmdeploy_text_detector_t *detector)¶
Same as mmdeploy_text_detector_create, but allows controlling the execution context of tasks via context.
-
int mmdeploy_text_detector_create_input(const mmdeploy_mat_t *mats, int mat_count, mmdeploy_value_t *input)¶
Pack text-detector inputs into mmdeploy_value_t.
- Parameters
mats – [in] a batch of images
mat_count – [in] number of images in the batch
input – [out] the packed value
- Returns
status of the operation
-
int mmdeploy_text_detector_apply_v2(mmdeploy_text_detector_t detector, mmdeploy_value_t input, mmdeploy_value_t *output)¶
Same as mmdeploy_text_detector_apply, but input and output are packed in mmdeploy_value_t.
-
int mmdeploy_text_detector_apply_async(mmdeploy_text_detector_t detector, mmdeploy_sender_t input, mmdeploy_sender_t *output)¶
Apply text-detector asynchronously.
- Parameters
detector – [in] handle to the detector
input – [in] input sender that will be consumed by the operation
output – [out] output sender
- Returns
status of the operation
-
int mmdeploy_text_detector_get_result(mmdeploy_value_t output, mmdeploy_text_detection_t **results, int **result_count)¶
Unpack detector output from a mmdeploy_value_t.
- Parameters
output – [in] output obtained by applying a text-detector
results – [out] a linear buffer to save detection results of each image. It must be released by mmdeploy_text_detector_release_result
result_count – [out] a linear buffer whose length equals the number of input images, saving the number of detection results of each image. Must be released by mmdeploy_text_detector_release_result
- Returns
status of the operation
-
typedef int (*mmdeploy_text_detector_continue_t)(mmdeploy_text_detection_t *results, int *result_count, void *context, mmdeploy_sender_t *output)¶
-
int mmdeploy_text_detector_apply_async_v3(mmdeploy_text_detector_t detector, const mmdeploy_mat_t *imgs, int img_count, mmdeploy_sender_t *output)¶
-
int mmdeploy_text_detector_continue_async(mmdeploy_sender_t input, mmdeploy_text_detector_continue_t cont, void *context, mmdeploy_sender_t *output)¶
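A minimal synchronous sketch, assuming the mmdeploy_mat_t layout from common.h and that mmdeploy_point_t exposes x / y members; it prints the four clockwise vertices and the score of each detected text box.
#include <stdio.h>
#include <stdint.h>
#include "mmdeploy/text_detector.h"

/* Sketch: detect text regions in one image and print their quadrilaterals. */
int detect_text(const char* model_path, uint8_t* data, int h, int w) {
  mmdeploy_text_detector_t detector = NULL;
  int ec = mmdeploy_text_detector_create_by_path(model_path, "cuda", 0, &detector);
  if (ec) return ec;

  mmdeploy_mat_t mat = {.data = data, .height = h, .width = w, .channel = 3,
                        .format = MMDEPLOY_PIXEL_FORMAT_BGR,
                        .type = MMDEPLOY_DATA_TYPE_UINT8};

  mmdeploy_text_detection_t* dets = NULL;
  int* det_count = NULL;
  ec = mmdeploy_text_detector_apply(detector, &mat, 1, &dets, &det_count);
  if (!ec) {
    for (int i = 0; i < det_count[0]; ++i) {
      printf("box %d (score %.3f):", i, dets[i].score);
      for (int k = 0; k < 4; ++k)  /* four clockwise vertices */
        printf(" (%.1f, %.1f)", dets[i].bbox[k].x, dets[i].bbox[k].y);
      printf("\n");
    }
    mmdeploy_text_detector_release_result(dets, det_count, 1);
  }
  mmdeploy_text_detector_destroy(detector);
  return ec;
}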
text_recognizer.h¶
-
struct mmdeploy_text_recognition_t¶
-
typedef struct mmdeploy_text_recognizer *mmdeploy_text_recognizer_t¶
-
int mmdeploy_text_recognizer_create(mmdeploy_model_t model, const char *device_name, int device_id, mmdeploy_text_recognizer_t *recognizer)¶
Create a text recognizer instance.
- Parameters
model – [in] an instance of mmocr text recognition model created by mmdeploy_model_create_by_path or mmdeploy_model_create in model.h
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
recognizer – [out] handle of the created text recognizer, which must be destroyed by mmdeploy_text_recognizer_destroy
- Returns
status code of the operation
-
int mmdeploy_text_recognizer_create_by_path(const char *model_path, const char *device_name, int device_id, mmdeploy_text_recognizer_t *recognizer)¶
Create a text recognizer instance.
- Parameters
model_path – [in] path to text recognition model
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
recognizer – [out] handle of the created text recognizer, which must be destroyed by mmdeploy_text_recognizer_destroy
- Returns
status code of the operation
-
int mmdeploy_text_recognizer_apply(mmdeploy_text_recognizer_t recognizer, const mmdeploy_mat_t *images, int count, mmdeploy_text_recognition_t **results)¶
Apply text recognizer to a batch of text images.
- Parameters
recognizer – [in] text recognizer’s handle created by mmdeploy_text_recognizer_create_by_path
images – [in] a batch of text images
count – [in] number of images in the batch
results – [out] a linear buffer containing the recognized text, which must be released by mmdeploy_text_recognizer_release_result
- Returns
status code of the operation
-
int mmdeploy_text_recognizer_apply_bbox(mmdeploy_text_recognizer_t recognizer, const mmdeploy_mat_t *images, int image_count, const mmdeploy_text_detection_t *bboxes, const int *bbox_count, mmdeploy_text_recognition_t **results)¶
Apply text recognizer to a batch of images supplied with text bboxes.
- Parameters
recognizer – [in] text recognizer’s handle created by mmdeploy_text_recognizer_create_by_path
images – [in] a batch of text images
image_count – [in] number of images in the batch
bboxes – [in] bounding boxes detected by text detector
bbox_count – [in] number of bboxes of each image; must have the same length as images
results – [out] a linear buffer containing the recognized text, with the same length as bboxes; it must be released by mmdeploy_text_recognizer_release_result
- Returns
status code of the operation
-
void mmdeploy_text_recognizer_release_result(mmdeploy_text_recognition_t *results, int count)¶
Release result buffer returned by mmdeploy_text_recognizer_apply or mmdeploy_text_recognizer_apply_bbox.
- Parameters
results – [in] result buffer by text recognizer
count – [in] length of results
-
void mmdeploy_text_recognizer_destroy(mmdeploy_text_recognizer_t recognizer)¶
destroy text recognizer
- Parameters
recognizer – [in] handle of text recognizer created by mmdeploy_text_recognizer_create_by_path or mmdeploy_text_recognizer_create
-
int mmdeploy_text_recognizer_create_v2(mmdeploy_model_t model, mmdeploy_context_t context, mmdeploy_text_recognizer_t *recognizer)¶
Same as mmdeploy_text_recognizer_create, but allows controlling the execution context of tasks via context.
-
int mmdeploy_text_recognizer_create_input(const mmdeploy_mat_t *images, int image_count, const mmdeploy_text_detection_t *bboxes, const int *bbox_count, mmdeploy_value_t *output)¶
Pack text-recognizer inputs into mmdeploy_value_t.
- Parameters
images – [in] a batch of images
image_count – [in] number of images in the batch
bboxes – [in] bounding boxes detected by text detector
bbox_count – [in] number of bboxes of each image; must have the same length as images
output – [out] the packed value
- Returns
status of the operation
-
int mmdeploy_text_recognizer_apply_v2(mmdeploy_text_recognizer_t recognizer, mmdeploy_value_t input, mmdeploy_value_t *output)¶
-
int mmdeploy_text_recognizer_apply_async(mmdeploy_text_recognizer_t recognizer, mmdeploy_sender_t input, mmdeploy_sender_t *output)¶
Same as mmdeploy_text_recognizer_apply_bbox, but input and output are packed in mmdeploy_value_t.
-
int mmdeploy_text_recognizer_apply_async_v3(mmdeploy_text_recognizer_t recognizer, const mmdeploy_mat_t *imgs, int img_count, const mmdeploy_text_detection_t *bboxes, const int *bbox_count, mmdeploy_sender_t *output)¶
-
int mmdeploy_text_recognizer_continue_async(mmdeploy_sender_t input, mmdeploy_text_recognizer_continue_t cont, void *context, mmdeploy_sender_t *output)¶
-
int mmdeploy_text_recognizer_get_result(mmdeploy_value_t output, mmdeploy_text_recognition_t **results)¶
Unpack text-recognizer output from a mmdeploy_value_t.
- Parameters
output – [in] output obtained by applying a text recognizer
results – [out] a linear buffer containing the recognition results, which must be released by mmdeploy_text_recognizer_release_result
- Returns
status of the operation
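Text detection and recognition are usually chained: the detector's boxes and counts can be passed straight into mmdeploy_text_recognizer_apply_bbox. The sketch below does exactly that for one image; the text member of mmdeploy_text_recognition_t is an assumption based on the upstream header, and error handling is trimmed for brevity.
#include <stdio.h>
#include <stdint.h>
#include "mmdeploy/text_detector.h"
#include "mmdeploy/text_recognizer.h"

/* Sketch: detect text boxes first, then recognize the text inside each box. */
int ocr_one(const char* det_path, const char* rec_path, uint8_t* data, int h, int w) {
  mmdeploy_text_detector_t detector = NULL;
  mmdeploy_text_recognizer_t recognizer = NULL;
  mmdeploy_text_detector_create_by_path(det_path, "cuda", 0, &detector);
  mmdeploy_text_recognizer_create_by_path(rec_path, "cuda", 0, &recognizer);

  mmdeploy_mat_t mat = {.data = data, .height = h, .width = w, .channel = 3,
                        .format = MMDEPLOY_PIXEL_FORMAT_BGR,
                        .type = MMDEPLOY_DATA_TYPE_UINT8};

  mmdeploy_text_detection_t* boxes = NULL;
  int* box_count = NULL;
  int ec = mmdeploy_text_detector_apply(detector, &mat, 1, &boxes, &box_count);
  if (!ec) {
    mmdeploy_text_recognition_t* texts = NULL;
    ec = mmdeploy_text_recognizer_apply_bbox(recognizer, &mat, 1, boxes, box_count, &texts);
    if (!ec) {
      for (int i = 0; i < box_count[0]; ++i) {
        printf("box %d: %s\n", i, texts[i].text);  /* `text` member assumed from the upstream header */
      }
      mmdeploy_text_recognizer_release_result(texts, box_count[0]);
    }
    mmdeploy_text_detector_release_result(boxes, box_count, 1);
  }
  mmdeploy_text_recognizer_destroy(recognizer);
  mmdeploy_text_detector_destroy(detector);
  return ec;
}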
video_recognizer.h¶
-
struct mmdeploy_video_recognition_t¶
-
struct mmdeploy_video_sample_info_t¶
-
typedef struct mmdeploy_video_recognizer *mmdeploy_video_recognizer_t¶
-
int mmdeploy_video_recognizer_create(mmdeploy_model_t model, const char *device_name, int device_id, mmdeploy_video_recognizer_t *recognizer)¶
Create video recognizer’s handle.
- Parameters
model – [in] an instance of mmaction sdk model created by mmdeploy_model_create_by_path or mmdeploy_model_create in model.h
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
recognizer – [out] handle of the created video recognizer, which must be destroyed by mmdeploy_video_recognizer_destroy
- Returns
status of creating video recognizer’s handle
-
int mmdeploy_video_recognizer_create_by_path(const char *model_path, const char *device_name, int device_id, mmdeploy_video_recognizer_t *recognizer)¶
Create a video recognizer instance.
- Parameters
model_path – [in] path to video recognition model
device_name – [in] name of device, such as “cpu”, “cuda”, etc.
device_id – [in] id of device.
recognizer – [out] handle of the created video recognizer, which must be destroyed by mmdeploy_video_recognizer_destroy
- Returns
status code of the operation
-
int mmdeploy_video_recognizer_apply(mmdeploy_video_recognizer_t recognizer, const mmdeploy_mat_t *images, const mmdeploy_video_sample_info_t *video_info, int video_count, mmdeploy_video_recognition_t **results, int **result_count)¶
Apply video recognizer to a batch of videos.
- Parameters
recognizer – [in] video recognizer’s handle created by mmdeploy_video_recognizer_create_by_path
images – [in] a batch of videos
video_info – [in] video information of each video
video_count – [in] number of videos
results – [out] a linear buffer containing the recognition results, which must be released by mmdeploy_video_recognizer_release_result
result_count – [out] a linear buffer with length being video_count to save the number of recognition results of each video. It must be released by mmdeploy_video_recognizer_release_result
- Returns
status code of the operation
-
void mmdeploy_video_recognizer_release_result(mmdeploy_video_recognition_t *results, int *result_count, int video_count)¶
Release result buffer returned by mmdeploy_video_recognizer_apply.
- Parameters
results – [in] result buffer by video recognizer
result_count – [in] results size buffer
video_count – [in] length of result_count
-
void mmdeploy_video_recognizer_destroy(mmdeploy_video_recognizer_t recognizer)¶
destroy video recognizer
- Parameters
recognizer – [in] handle of video recognizer created by mmdeploy_video_recognizer_create_by_path or mmdeploy_video_recognizer_create
-
int mmdeploy_video_recognizer_create_v2(mmdeploy_model_t model, mmdeploy_context_t context, mmdeploy_video_recognizer_t *recognizer)¶
Same as mmdeploy_video_recognizer_create, but allows controlling the execution context of tasks via context.
-
int mmdeploy_video_recognizer_create_input(const mmdeploy_mat_t *images, const mmdeploy_video_sample_info_t *video_info, int video_count, mmdeploy_value_t *value)¶
Pack video recognizer inputs into mmdeploy_value_t.
- Parameters
images – [in] a batch of videos
video_info – [in] video information of each video
video_count – [in] number of videos in the batch
value – [out] created value
- Returns
status code of the operation
-
int mmdeploy_video_recognizer_apply_v2(mmdeploy_video_recognizer_t recognizer, mmdeploy_value_t input, mmdeploy_value_t *output)¶
Apply video recognizer to a batch of videos.
- Parameters
input – [in] packed input
output – [out] inference output
- Returns
status code of the operation
-
int mmdeploy_video_recognizer_get_result(mmdeploy_value_t output, mmdeploy_video_recognition_t **results, int **result_count)¶
Unpack video recognizer output from a mmdeploy_value_t.
- Parameters
output – [in] inference output
results – [out] structured output
result_count – [out] number of recognition results of each video
- Returns
status code of the operation
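A hedged sketch of recognizing a single video whose sampled frames are already decoded into mmdeploy_mat_t images. The clip_len / num_clips members of mmdeploy_video_sample_info_t and the label_id / score members of mmdeploy_video_recognition_t are assumptions based on the upstream header.
#include <stdio.h>
#include "mmdeploy/video_recognizer.h"

/* Sketch: classify one video given its sampled frames.
   `frames` holds clip_len * num_clips images for this single video. */
int recognize_video(const char* model_path, const mmdeploy_mat_t* frames,
                    int clip_len, int num_clips) {
  mmdeploy_video_recognizer_t recognizer = NULL;
  int ec = mmdeploy_video_recognizer_create_by_path(model_path, "cuda", 0, &recognizer);
  if (ec) return ec;

  /* member names assumed from the upstream header */
  mmdeploy_video_sample_info_t info = {.clip_len = clip_len, .num_clips = num_clips};

  mmdeploy_video_recognition_t* results = NULL;
  int* result_count = NULL;
  ec = mmdeploy_video_recognizer_apply(recognizer, frames, &info, 1, &results, &result_count);
  if (!ec) {
    for (int i = 0; i < result_count[0]; ++i) {
      printf("label: %d, score: %.4f\n", results[i].label_id, results[i].score);
    }
    mmdeploy_video_recognizer_release_result(results, result_count, 1);
  }
  mmdeploy_video_recognizer_destroy(recognizer);
  return ec;
}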
Supported models¶
The table below lists the models that are guaranteed to be exportable to other backends.
Model config | Codebase | TorchScript | OnnxRuntime | TensorRT | ncnn | PPLNN | OpenVINO | Ascend | RKNN |
---|---|---|---|---|---|---|---|---|---|
RetinaNet | MMDetection | Y | Y | Y | Y | Y | Y | Y | Y |
Faster R-CNN | MMDetection | Y | Y | Y | Y | Y | Y | Y | N |
YOLOv3 | MMDetection | Y | Y | Y | Y | N | Y | Y | Y |
YOLOX | MMDetection | Y | Y | Y | Y | N | Y | N | Y |
FCOS | MMDetection | Y | Y | Y | Y | N | Y | N | N |
FSAF | MMDetection | Y | Y | Y | Y | Y | Y | N | Y |
Mask R-CNN | MMDetection | Y | Y | Y | N | N | Y | N | N |
SSD* | MMDetection | Y | Y | Y | Y | N | Y | N | Y |
FoveaBox | MMDetection | Y | Y | N | N | N | Y | N | N |
ATSS | MMDetection | N | Y | Y | N | N | Y | N | N |
GFL | MMDetection | N | Y | Y | N | ? | Y | N | N |
Cascade R-CNN | MMDetection | N | Y | Y | N | Y | Y | N | N |
Cascade Mask R-CNN | MMDetection | N | Y | Y | N | N | Y | N | N |
Swin Transformer* | MMDetection | N | Y | Y | N | N | Y | N | N |
VFNet | MMDetection | N | N | N | N | N | Y | N | N |
RepPoints | MMDetection | N | N | Y | N | ? | Y | N | N |
DETR | MMDetection | N | Y | Y | N | ? | N | N | N |
CenterNet | MMDetection | N | Y | Y | N | ? | Y | N | N |
SOLO | MMDetection | N | Y | N | N | N | Y | N | N |
SOLOv2 | MMDetection | N | Y | N | N | N | Y | N | N |
ResNet | MMPretrain | Y | Y | Y | Y | Y | Y | Y | Y |
ResNeXt | MMPretrain | Y | Y | Y | Y | Y | Y | Y | Y |
SE-ResNet | MMPretrain | Y | Y | Y | Y | Y | Y | Y | Y |
MobileNetV2 | MMPretrain | Y | Y | Y | Y | Y | Y | Y | Y |
MobileNetV3 | MMPretrain | Y | Y | Y | Y | N | Y | N | N |
ShuffleNetV1 | MMPretrain | Y | Y | Y | Y | Y | Y | Y | Y |
ShuffleNetV2 | MMPretrain | Y | Y | Y | Y | Y | Y | Y | Y |
VisionTransformer | MMPretrain | Y | Y | Y | Y | ? | Y | Y | N |
SwinTransformer | MMPretrain | Y | Y | Y | N | ? | N | ? | N |
MobileOne | MMPretrain | N | Y | Y | N | N | N | N | N |
FCN | MMSegmentation | Y | Y | Y | Y | Y | Y | Y | Y |
PSPNet*static | MMSegmentation | Y | Y | Y | Y | Y | Y | Y | Y |
DeepLabV3 | MMSegmentation | Y | Y | Y | Y | Y | Y | Y | N |
DeepLabV3+ | MMSegmentation | Y | Y | Y | Y | Y | Y | Y | N |
Fast-SCNN*static | MMSegmentation | Y | Y | Y | N | Y | Y | N | Y |
UNet | MMSegmentation | Y | Y | Y | Y | Y | Y | Y | Y |
ANN* | MMSegmentation | Y | Y | Y | N | N | N | N | N |
APCNet | MMSegmentation | Y | Y | Y | Y | N | N | N | Y |
BiSeNetV1 | MMSegmentation | Y | Y | Y | Y | N | Y | N | Y |
BiSeNetV2 | MMSegmentation | Y | Y | Y | Y | N | Y | N | N |
CGNet | MMSegmentation | Y | Y | Y | Y | N | Y | N | Y |
DMNet | MMSegmentation | ? | Y | N | N | N | N | N | N |
DNLNet | MMSegmentation | ? | Y | Y | Y | N | Y | N | N |
EMANet | MMSegmentation | Y | Y | Y | N | N | Y | N | N |
EncNet | MMSegmentation | Y | Y | Y | N | N | Y | N | N |
ERFNet | MMSegmentation | Y | Y | Y | Y | N | Y | N | Y |
FastFCN | MMSegmentation | Y | Y | Y | Y | N | Y | N | N |
GCNet | MMSegmentation | Y | Y | Y | N | N | N | N | N |
ICNet* | MMSegmentation | Y | Y | Y | N | N | Y | N | N |
ISANet*static | MMSegmentation | N | Y | Y | N | N | Y | N | Y |
NonLocal Net | MMSegmentation | ? | Y | Y | Y | N | Y | N | N |
OCRNet | MMSegmentation | ? | Y | Y | Y | N | Y | N | Y |
PointRend | MMSegmentation | Y | Y | Y | N | N | Y | N | N |
Semantic FPN | MMSegmentation | Y | Y | Y | Y | N | Y | N | Y |
STDC | MMSegmentation | Y | Y | Y | Y | N | Y | N | Y |
UPerNet* | MMSegmentation | ? | Y | Y | N | N | N | N | Y |
DANet | MMSegmentation | ? | Y | Y | N | N | N | N | N |
Segmenter *static | MMSegmentation | Y | Y | Y | Y | N | Y | N | N |
SRCNN | MMagic | Y | Y | Y | Y | Y | Y | N | N |
ESRGAN | MMagic | Y | Y | Y | Y | Y | Y | N | N |
SRGAN | MMagic | Y | Y | Y | Y | Y | Y | N | N |
SRResNet | MMagic | Y | Y | Y | Y | Y | Y | N | N |
Real-ESRGAN | MMagic | Y | Y | Y | Y | Y | Y | N | N |
EDSR | MMagic | Y | Y | Y | Y | N | Y | N | N |
RDN | MMagic | Y | Y | Y | Y | Y | Y | N | N |
DBNet | MMOCR | Y | Y | Y | Y | Y | Y | Y | N |
DBNetpp | MMOCR | Y | Y | Y | ? | ? | Y | ? | N |
PANet | MMOCR | Y | Y | Y | Y | ? | Y | Y | N |
PSENet | MMOCR | Y | Y | Y | Y | ? | Y | Y | N |
TextSnake | MMOCR | Y | Y | Y | Y | ? | ? | ? | N |
MaskRCNN | MMOCR | Y | Y | Y | ? | ? | ? | ? | N |
CRNN | MMOCR | Y | Y | Y | Y | Y | N | N | N |
SAR | MMOCR | N | Y | N | N | N | N | N | N |
SATRN | MMOCR | Y | Y | Y | N | N | N | N | N |
ABINet | MMOCR | Y | Y | Y | N | N | N | N | N |
HRNet | MMPose | N | Y | Y | Y | N | Y | N | N |
MSPN | MMPose | N | Y | Y | Y | N | Y | N | N |
LiteHRNet | MMPose | N | Y | Y | N | N | Y | N | N |
Hourglass | MMPose | N | Y | Y | Y | N | Y | N | N |
SimCC | MMPose | N | Y | Y | Y | N | N | N | N |
PointPillars | MMDetection3d | ? | Y | Y | N | N | Y | N | N |
CenterPoint (pillar) | MMDetection3d | ? | Y | Y | N | N | Y | N | N |
RotatedRetinaNet | RotatedDetection | N | Y | Y | N | N | N | N | N |
Oriented RCNN | RotatedDetection | N | Y | Y | N | N | N | N | N |
Gliding Vertex | RotatedDetection | N | N | Y | N | N | N | N | N |
Note¶
Tag:
static: This model only supports static export. Please use a static deploy config, just like $MMDEPLOY_DIR/configs/mmseg/segmentation_tensorrt_static-1024x2048.py.
SSD: When you convert an SSD model, you need to use a deploy config with a smaller min shape, such as 300x300-512x512 rather than 320x320-1344x1344, for example $MMDEPLOY_DIR/configs/mmdet/detection/detection_tensorrt_dynamic-300x300-512x512.py.
YOLOX: YOLOX with ncnn only supports static shape.
Swin Transformer: For TensorRT, only version 8.4+ is supported.
SAR: The Chinese text recognition model is not supported because the ONNX protobuf size is limited.
Benchmark¶
Latency benchmark¶
Platform¶
Ubuntu 18.04
ncnn 20211208
Cuda 11.3
TensorRT 7.2.3.4
Docker 20.10.8
NVIDIA tesla T4 tensor core GPU for TensorRT
Other settings¶
Static graph
Batch size 1
Synchronize devices after each inference.
We count the average inference performance of 100 images of the dataset.
Warm up. For ncnn, we warm up 30 iters for all codebases. As for other backends: for classification, we warm up 1010 iters; for other codebases, we warm up 10 iters.
Input resolution varies for different datasets of different codebases. All inputs are real images except for mmagic, because its dataset is not large enough.
Users can directly test the speed through model profiling. The benchmark in our environment is listed below.
mmpretrain | TensorRT(ms) | PPLNN(ms) | ncnn(ms) | Ascend(ms) | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|
model | spatial | T4 | JetsonNano2GB | Jetson TX2 | T4 | SnapDragon888 | Adreno660 | Ascend310 | |||
fp32 | fp16 | int8 | fp32 | fp16 | fp32 | fp16 | fp32 | fp32 | fp32 | ||
ResNet | 224x224 | 2.97 | 1.26 | 1.21 | 59.32 | 30.54 | 24.13 | 1.30 | 33.91 | 25.93 | 2.49 |
ResNeXt | 224x224 | 4.31 | 1.42 | 1.37 | 88.10 | 49.18 | 37.45 | 1.36 | 133.44 | 69.38 | - |
SE-ResNet | 224x224 | 3.41 | 1.66 | 1.51 | 74.59 | 48.78 | 29.62 | 1.91 | 107.84 | 80.85 | - |
ShuffleNetV2 | 224x224 | 1.37 | 1.19 | 1.13 | 15.26 | 10.23 | 7.37 | 4.69 | 9.55 | 10.66 | - |
mmdet part1 | TensorRT(ms) | PPLNN(ms) | ||||
---|---|---|---|---|---|---|
model | spatial | T4 | Jetson TX2 | T4 | ||
fp32 | fp16 | int8 | fp32 | fp16 | ||
YOLOv3 | 320x320 | 14.76 | 24.92 | 24.92 | - | 18.07 |
SSD-Lite | 320x320 | 8.84 | 9.21 | 8.04 | 1.28 | 19.72 |
RetinaNet | 800x1344 | 97.09 | 25.79 | 16.88 | 780.48 | 38.34 |
FCOS | 800x1344 | 84.06 | 23.15 | 17.68 | - | - |
FSAF | 800x1344 | 82.96 | 21.02 | 13.50 | - | 30.41 |
Faster R-CNN | 800x1344 | 88.08 | 26.52 | 19.14 | 733.81 | 65.40 |
Mask R-CNN | 800x1344 | 104.83 | 58.27 | - | - | 86.80 |
mmdet part2 | ncnn | ||
---|---|---|---|
model | spatial | SnapDragon888 | Adreno660 |
fp32 | fp32 | ||
MobileNetv2-YOLOv3 | 320x320 | 48.57 | 66.55 |
SSD-Lite | 320x320 | 44.91 | 66.19 |
YOLOX | 416x416 | 111.60 | 134.50 |
mmagic | TensorRT(ms) | PPLNN(ms) | ||||
---|---|---|---|---|---|---|
model | spatial | T4 | Jetson TX2 | T4 | ||
fp32 | fp16 | int8 | fp32 | fp16 | ||
ESRGAN | 32x32 | 12.64 | 12.42 | 12.45 | - | 7.67 |
SRCNN | 32x32 | 0.70 | 0.35 | 0.26 | 58.86 | 0.56 |
mmocr | TensorRT(ms) | PPLNN(ms) | ncnn(ms) | ||||
---|---|---|---|---|---|---|---|
model | spatial | T4 | T4 | SnapDragon888 | Adreno660 | ||
fp32 | fp16 | int8 | fp16 | fp32 | fp32 | ||
DBNet | 640x640 | 10.70 | 5.62 | 5.00 | 34.84 | - | - |
CRNN | 32x32 | 1.93 | 1.40 | 1.36 | - | 10.57 | 20.00 |
mmseg | TensorRT(ms) | PPLNN(ms) | ||||
---|---|---|---|---|---|---|
model | spatial | T4 | Jetson TX2 | T4 | ||
fp32 | fp16 | int8 | fp32 | fp16 | ||
FCN | 512x1024 | 128.42 | 23.97 | 18.13 | 1682.54 | 27.00 |
PSPNet | 1x3x512x1024 | 119.77 | 24.10 | 16.33 | 1586.19 | 27.26 |
DeepLabV3 | 512x1024 | 226.75 | 31.80 | 19.85 | - | 36.01 |
DeepLabV3+ | 512x1024 | 151.25 | 47.03 | 50.38 | 2534.96 | 34.80 |
Performance benchmark¶
Users can directly test the performance through how_to_evaluate_a_model.md. The benchmark in our environment is listed below.
mmpretrain | PyTorch | TorchScript | ONNX Runtime | TensorRT | PPLNN | Ascend | |||
---|---|---|---|---|---|---|---|---|---|
model | metric | fp32 | fp32 | fp32 | fp32 | fp16 | int8 | fp16 | fp32 |
ResNet-18 | top-1 | 69.90 | 69.90 | 69.88 | 69.88 | 69.86 | 69.86 | 69.86 | 69.91 |
top-5 | 89.43 | 89.43 | 89.34 | 89.34 | 89.33 | 89.38 | 89.34 | 89.43 | |
ResNeXt-50 | top-1 | 77.90 | 77.90 | 77.90 | 77.90 | - | 77.78 | 77.89 | - |
top-5 | 93.66 | 93.66 | 93.66 | 93.66 | - | 93.64 | 93.65 | - | |
SE-ResNet-50 | top-1 | 77.74 | 77.74 | 77.74 | 77.74 | 77.75 | 77.63 | 77.73 | - |
top-5 | 93.84 | 93.84 | 93.84 | 93.84 | 93.83 | 93.72 | 93.84 | - | |
ShuffleNetV1 1.0x | top-1 | 68.13 | 68.13 | 68.13 | 68.13 | 68.13 | 67.71 | 68.11 | - |
top-5 | 87.81 | 87.81 | 87.81 | 87.81 | 87.81 | 87.58 | 87.80 | - | |
ShuffleNetV2 1.0x | top-1 | 69.55 | 69.55 | 69.55 | 69.55 | 69.54 | 69.10 | 69.54 | - |
top-5 | 88.92 | 88.92 | 88.92 | 88.92 | 88.91 | 88.58 | 88.92 | - | |
MobileNet V2 | top-1 | 71.86 | 71.86 | 71.86 | 71.86 | 71.87 | 70.91 | 71.84 | 71.87 |
top-5 | 90.42 | 90.42 | 90.42 | 90.42 | 90.40 | 89.85 | 90.41 | 90.42 | |
Vision Transformer | top-1 | 85.43 | 85.43 | - | 85.43 | 85.42 | - | - | 85.43 |
top-5 | 97.77 | 97.77 | - | 97.77 | 97.76 | - | - | 97.77 | |
Swin Transformer | top-1 | 81.18 | 81.18 | 81.18 | 81.18 | 81.18 | - | - | - |
top-5 | 95.61 | 95.61 | 95.61 | 95.61 | 95.61 | - | - | - | |
EfficientFormer | top-1 | 80.46 | 80.45 | 80.46 | 80.46 | - | - | - | - |
top-5 | 94.99 | 94.98 | 94.99 | 94.99 | - | - | - | - |
mmdet | Pytorch | TorchScript | ONNXRuntime | TensorRT | PPLNN | Ascend | OpenVINO | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
model | task | dataset | metric | fp32 | fp32 | fp32 | fp32 | fp16 | int8 | fp16 | fp32 | fp32 |
YOLOV3 | Object Detection | COCO2017 | box AP | 33.7 | 33.7 | - | 33.5 | 33.5 | 33.5 | - | - | - |
SSD | Object Detection | COCO2017 | box AP | 25.5 | 25.5 | - | 25.5 | 25.5 | - | - | - | - |
RetinaNet | Object Detection | COCO2017 | box AP | 36.5 | 36.4 | - | 36.4 | 36.4 | 36.3 | 36.5 | 36.4 | - |
FCOS | Object Detection | COCO2017 | box AP | 36.6 | - | - | 36.6 | 36.5 | - | - | - | - |
FSAF | Object Detection | COCO2017 | box AP | 37.4 | 37.4 | - | 37.4 | 37.4 | 37.2 | 37.4 | - | - |
CenterNet | Object Detection | COCO2017 | box AP | 25.9 | 26.0 | 26.0 | 26.0 | 25.8 | - | - | - | - |
YOLOX | Object Detection | COCO2017 | box AP | 40.5 | 40.3 | - | 40.3 | 40.3 | 29.3 | - | - | - |
Faster R-CNN | Object Detection | COCO2017 | box AP | 37.4 | 37.3 | - | 37.3 | 37.3 | 37.1 | 37.3 | 37.2 | - |
ATSS | Object Detection | COCO2017 | box AP | 39.4 | - | - | 39.4 | 39.4 | - | - | - | - |
Cascade R-CNN | Object Detection | COCO2017 | box AP | 40.4 | - | - | 40.4 | 40.4 | - | 40.4 | - | - |
GFL | Object Detection | COCO2017 | box AP | 40.2 | - | 40.2 | 40.2 | 40.0 | - | - | - | - |
RepPoints | Object Detection | COCO2017 | box AP | 37.0 | - | - | 36.9 | - | - | - | - | - |
DETR | Object Detection | COCO2017 | box AP | 40.1 | 40.1 | - | 40.1 | 40.1 | - | - | - | - |
Mask R-CNN | Instance Segmentation | COCO2017 | box AP | 38.2 | 38.1 | - | 38.1 | 38.1 | - | 38.0 | - | - |
mask AP | 34.7 | 34.7 | - | 33.7 | 33.7 | - | - | - | - | |||
Swin-Transformer | Instance Segmentation | COCO2017 | box AP | 42.7 | - | 42.7 | 42.5 | 37.7 | - | - | - | - |
mask AP | 39.3 | - | 39.3 | 39.3 | 35.4 | - | - | - | - | |||
SOLO | Instance Segmentation | COCO2017 | mask AP | 33.1 | - | 32.7 | - | - | - | - | - | 32.7 |
SOLOv2 | Instance Segmentation | COCO2017 | mask AP | 34.8 | - | 34.5 | - | - | - | - | - | 34.5 |
mmagic | Pytorch | TorchScript | ONNX Runtime | TensorRT | PPLNN | |||||
---|---|---|---|---|---|---|---|---|---|---|
model | task | dataset | metric | fp32 | fp32 | fp32 | fp32 | fp16 | int8 | fp16 |
SRCNN | Super Resolution | Set5 | PSNR | 28.4316 | 28.4120 | 28.4323 | 28.4323 | 28.4286 | 28.1995 | 28.4311 |
SSIM | 0.8099 | 0.8106 | 0.8097 | 0.8097 | 0.8096 | 0.7934 | 0.8096 | |||
ESRGAN | Super Resolution | Set5 | PSNR | 28.2700 | 28.2619 | 28.2592 | 28.2592 | - | - | 28.2624 |
SSIM | 0.7778 | 0.7784 | 0.7764 | 0.7774 | - | - | 0.7765 | |||
ESRGAN-PSNR | Super Resolution | Set5 | PSNR | 30.6428 | 30.6306 | 30.6444 | 30.6430 | - | - | 27.0426 |
SSIM | 0.8559 | 0.8565 | 0.8558 | 0.8558 | - | - | 0.8557 | |||
SRGAN | Super Resolution | Set5 | PSNR | 27.9499 | 27.9252 | 27.9408 | 27.9408 | - | - | 27.9388 |
SSIM | 0.7846 | 0.7851 | 0.7839 | 0.7839 | - | - | 0.7839 | |||
SRResNet | Super Resolution | Set5 | PSNR | 30.2252 | 30.2069 | 30.2300 | 30.2300 | - | - | 30.2294 |
SSIM | 0.8491 | 0.8497 | 0.8488 | 0.8488 | - | - | 0.8488 | |||
Real-ESRNet | Super Resolution | Set5 | PSNR | 28.0297 | - | 27.7016 | 27.7016 | - | - | 27.7049 |
SSIM | 0.8236 | - | 0.8122 | 0.8122 | - | - | 0.8123 | |||
EDSR | Super Resolution | Set5 | PSNR | 30.2223 | 30.2192 | 30.2214 | 30.2214 | 30.2211 | 30.1383 | - |
SSIM | 0.8500 | 0.8507 | 0.8497 | 0.8497 | 0.8497 | 0.8469 | - |
mmocr | Pytorch | TorchScript | ONNXRuntime | TensorRT | PPLNN | OpenVINO | |||||
---|---|---|---|---|---|---|---|---|---|---|---|
model | task | dataset | metric | fp32 | fp32 | fp32 | fp32 | fp16 | int8 | fp16 | fp32 |
DBNet* | TextDetection | ICDAR2015 | recall | 0.7310 | 0.7308 | 0.7304 | 0.7198 | 0.7179 | 0.7111 | 0.7304 | 0.7309 |
precision | 0.8714 | 0.8718 | 0.8714 | 0.8677 | 0.8674 | 0.8688 | 0.8718 | 0.8714 | |||
hmean | 0.7950 | 0.7949 | 0.7950 | 0.7868 | 0.7856 | 0.7821 | 0.7949 | 0.7950 | |||
DBNetpp | TextDetection | ICDAR2015 | recall | 0.8209 | 0.8209 | 0.8209 | 0.8199 | 0.8204 | 0.8204 | - | 0.8209 |
precision | 0.9079 | 0.9079 | 0.9079 | 0.9117 | 0.9117 | 0.9142 | - | 0.9079 | |||
hmean | 0.8622 | 0.8622 | 0.8622 | 0.8634 | 0.8637 | 0.8648 | - | 0.8622 | |||
PSENet | TextDetection | ICDAR2015 | recall | 0.7526 | 0.7526 | 0.7526 | 0.7526 | 0.7520 | 0.7496 | - | 0.7526 |
precision | 0.8669 | 0.8669 | 0.8669 | 0.8669 | 0.8668 | 0.8550 | - | 0.8669 | |||
hmean | 0.8057 | 0.8057 | 0.8057 | 0.8057 | 0.8054 | 0.7989 | - | 0.8057 | |||
PANet | TextDetection | ICDAR2015 | recall | 0.7401 | 0.7401 | 0.7401 | 0.7357 | 0.7366 | - | - | 0.7401 |
precision | 0.8601 | 0.8601 | 0.8601 | 0.8570 | 0.8586 | - | - | 0.8601 | |||
hmean | 0.7955 | 0.7955 | 0.7955 | 0.7917 | 0.7930 | - | - | 0.7955 | |||
TextSnake | TextDetection | CTW1500 | recall | 0.8052 | 0.8052 | 0.8052 | 0.8055 | - | - | - | - |
precision | 0.8535 | 0.8535 | 0.8535 | 0.8538 | - | - | - | - | |||
hmean | 0.8286 | 0.8286 | 0.8286 | 0.8290 | - | - | - | - | |||
MaskRCNN | TextDetection | ICDAR2015 | recall | 0.7766 | 0.7766 | 0.7766 | 0.7766 | 0.7761 | 0.7670 | - | - |
precision | 0.8644 | 0.8644 | 0.8644 | 0.8644 | 0.8630 | 0.8705 | - | - | |||
hmean | 0.8182 | 0.8182 | 0.8182 | 0.8182 | 0.8172 | 0.8155 | - | - | |||
CRNN | TextRecognition | IIIT5K | acc | 0.8067 | 0.8067 | 0.8067 | 0.8067 | 0.8063 | 0.8067 | 0.8067 | - |
SAR | TextRecognition | IIIT5K | acc | 0.9517 | - | 0.9287 | - | - | - | - | - |
SATRN | TextRecognition | IIIT5K | acc | 0.9470 | 0.9487 | 0.9487 | 0.9487 | 0.9483 | 0.9483 | - | - |
ABINet | TextRecognition | IIIT5K | acc | 0.9603 | 0.9563 | 0.9563 | 0.9573 | 0.9507 | 0.9510 | - | - |
mmseg | Pytorch | TorchScript | ONNXRuntime | TensorRT | PPLNN | Ascend | ||||
---|---|---|---|---|---|---|---|---|---|---|
model | dataset | metric | fp32 | fp32 | fp32 | fp32 | fp16 | int8 | fp16 | fp32 |
FCN | Cityscapes | mIoU | 72.25 | 72.36 | - | 72.36 | 72.35 | 74.19 | 72.35 | 72.35 |
PSPNet | Cityscapes | mIoU | 78.55 | 78.66 | - | 78.26 | 78.24 | 77.97 | 78.09 | 78.67 |
deeplabv3 | Cityscapes | mIoU | 79.09 | 79.12 | - | 79.12 | 79.12 | 78.96 | 79.12 | 79.06 |
deeplabv3+ | Cityscapes | mIoU | 79.61 | 79.60 | - | 79.60 | 79.60 | 79.43 | 79.60 | 79.51 |
Fast-SCNN | Cityscapes | mIoU | 70.96 | 70.96 | - | 70.93 | 70.92 | 66.00 | 70.92 | - |
UNet | Cityscapes | mIoU | 69.10 | - | - | 69.10 | 69.10 | 68.95 | - | - |
ANN | Cityscapes | mIoU | 77.40 | - | - | 77.32 | 77.32 | - | - | - |
APCNet | Cityscapes | mIoU | 77.40 | - | - | 77.32 | 77.32 | - | - | - |
BiSeNetV1 | Cityscapes | mIoU | 74.44 | - | - | 74.44 | 74.43 | - | - | - |
BiSeNetV2 | Cityscapes | mIoU | 73.21 | - | - | 73.21 | 73.21 | - | - | - |
CGNet | Cityscapes | mIoU | 68.25 | - | - | 68.27 | 68.27 | - | - | - |
EMANet | Cityscapes | mIoU | 77.59 | - | - | 77.59 | 77.6 | - | - | - |
EncNet | Cityscapes | mIoU | 75.67 | - | - | 75.66 | 75.66 | - | - | - |
ERFNet | Cityscapes | mIoU | 71.08 | - | - | 71.08 | 71.07 | - | - | - |
FastFCN | Cityscapes | mIoU | 79.12 | - | - | 79.12 | 79.12 | - | - | - |
GCNet | Cityscapes | mIoU | 77.69 | - | - | 77.69 | 77.69 | - | - | - |
ICNet | Cityscapes | mIoU | 76.29 | - | - | 76.36 | 76.36 | - | - | - |
ISANet | Cityscapes | mIoU | 78.49 | - | - | 78.49 | 78.49 | - | - | - |
OCRNet | Cityscapes | mIoU | 74.30 | - | - | 73.66 | 73.67 | - | - | - |
PointRend | Cityscapes | mIoU | 76.47 | 76.47 | - | 76.41 | 76.42 | - | - | - |
Semantic FPN | Cityscapes | mIoU | 74.52 | - | - | 74.52 | 74.52 | - | - | - |
STDC | Cityscapes | mIoU | 75.10 | - | - | 75.10 | 75.10 | - | - | - |
STDC | Cityscapes | mIoU | 77.17 | - | - | 77.17 | 77.17 | - | - | - |
UPerNet | Cityscapes | mIoU | 77.10 | - | - | 77.19 | 77.18 | - | - | - |
Segmenter | ADE20K | mIoU | 44.32 | 44.29 | 44.29 | 44.29 | 43.34 | 43.35 | - | - |
mmpose | Pytorch | ONNXRuntime | TensorRT | PPLNN | OpenVINO | ||||
---|---|---|---|---|---|---|---|---|---|
model | task | dataset | metric | fp32 | fp32 | fp32 | fp16 | fp16 | fp32 |
HRNet | Pose Detection | COCO | AP | 0.748 | 0.748 | 0.748 | 0.748 | - | 0.748 |
AR | 0.802 | 0.802 | 0.802 | 0.802 | - | 0.802 | |||
LiteHRNet | Pose Detection | COCO | AP | 0.663 | 0.663 | 0.663 | - | - | 0.663 |
AR | 0.728 | 0.728 | 0.728 | - | - | 0.728 | |||
MSPN | Pose Detection | COCO | AP | 0.762 | 0.762 | 0.762 | 0.762 | - | 0.762 |
AR | 0.825 | 0.825 | 0.825 | 0.825 | - | 0.825 | |||
Hourglass | Pose Detection | COCO | AP | 0.717 | 0.717 | 0.717 | 0.717 | - | 0.717 |
AR | 0.774 | 0.774 | 0.774 | 0.774 | - | 0.774 | |||
SimCC | Pose Detection | COCO | AP | 0.607 | - | 0.608 | - | - | - |
AR | 0.668 | - | 0.672 | - | - | - |
mmrotate | Pytorch | ONNXRuntime | TensorRT | PPLNN | OpenVINO | ||||
---|---|---|---|---|---|---|---|---|---|
model | task | dataset | metrics | fp32 | fp32 | fp32 | fp16 | fp16 | fp32 |
RotatedRetinaNet | Rotated Detection | DOTA-v1.0 | mAP | 0.698 | 0.698 | 0.698 | 0.697 | - | - |
Oriented RCNN | Rotated Detection | DOTA-v1.0 | mAP | 0.756 | 0.756 | 0.758 | 0.730 | - | - |
GlidingVertex | Rotated Detection | DOTA-v1.0 | mAP | 0.732 | - | 0.733 | 0.731 | - | - |
RoI Transformer | Rotated Detection | DOTA-v1.0 | mAP | 0.761 | - | 0.758 | - | - | - |
mmaction2 | Pytorch | ONNXRuntime | TensorRT | PPLNN | OpenVINO | ||||
---|---|---|---|---|---|---|---|---|---|
model | task | dataset | metrics | fp32 | fp32 | fp32 | fp16 | fp16 | fp32 |
TSN | Recognition | Kinetics-400 | top-1 | 69.71 | - | 69.71 | - | - | - |
top-5 | 88.75 | - | 88.75 | - | - | - | |||
SlowFast | Recognition | Kinetics-400 | top-1 | 74.45 | - | 75.62 | - | - | - |
top-5 | 91.55 | - | 92.10 | - | - | - |
Because some datasets in codebases like MMDet contain images with various resolutions, the speed benchmark is obtained through static configs in MMDeploy, while the performance benchmark is obtained through dynamic ones.
Some TensorRT int8 performance benchmarks require NVIDIA cards with Tensor Cores; otherwise, the performance drops heavily.
DBNet uses the interpolate mode nearest in the neck of the model, for which TensorRT-7 applies a strategy quite different from PyTorch's. To make the repository compatible with TensorRT-7, we rewrite the neck to use the interpolate mode bilinear, which improves the final detection performance. To match the performance of PyTorch, TensorRT-8+ is recommended, whose interpolate methods are all the same as PyTorch's.
Mask AP of Mask R-CNN drops by 1% for the backends. The main reason is that the predicted masks are directly interpolated to the original image in PyTorch, while in other backends they are first interpolated to the preprocessed input image of the model and then to the original image.
MMPose models are tested with flip_test explicitly set to False in model configs.
Some models might get low accuracy in fp16 mode. Please adjust the model to avoid value overflow.
Test on embedded device¶
Here are the test conclusions of our edge devices. You can directly obtain the results of your own environment with model profiling.
Software and hardware environment¶
host OS ubuntu 18.04
backend SNPE-1.59
device Mi11 (qcom 888)
mmpretrain¶
model | dataset | spatial | fp32 top-1 (%) | snpe gpu hybrid fp32 top-1 (%) | latency (ms) |
---|---|---|---|---|---|
ShuffleNetV2 | ImageNet-1k | 224x224 | 69.55 | 69.83* | 20±7 |
MobilenetV2 | ImageNet-1k | 224x224 | 71.86 | 72.14* | 15±6 |
tips:
The ImageNet-1k dataset is too large to test in full, so only part of the dataset is used (8000/50000).
Device heating downgrades the clock frequency, so the measured latency fluctuates; the values above are the stable ones after running for a period of time, which is closer to the actual demand.
mmocr detection¶
model | dataset | spatial | fp32 hmean | snpe gpu hybrid hmean | latency(ms) |
---|---|---|---|---|---|
PANet | ICDAR2015 | 1312x736 | 0.795 | 0.785 @thr=0.9 | 3100±100 |
mmpose¶
model | dataset | spatial | snpe hybrid AR@IoU=0.50 | snpe hybrid AP@IoU=0.50 | latency(ms) |
---|---|---|---|---|---|
pose_hrnet_w32 | Animalpose | 256x256 | 0.997 | 0.989 | 630±50 |
tips:
Test pose_hrnet using AnimalPose's test dataset instead of the val dataset.
mmseg¶
model | dataset | spatial | mIoU | latency(ms) |
---|---|---|---|---|
fcn | Cityscapes | 512x1024 | 71.11 | 4915±500 |
tips:
fcn
works fine with 512x1024 size. Cityscapes dataset uses 1024x2048 resolution which causes device to reboot.
Notes¶
We need to manually split the mmdet model into two parts, because in the SNPE source code, onnx_to_ir.py can only parse ONNX input while ir_to_dlc.py does not support the topk operator, and UDO (User Defined Operator) does not work with snpe-onnx-to-dlc.
The mmagic model srcnn requires cubic resize, which SNPE does not support; esrgan converts fine, but loading the model causes the device to reboot.
mmrotate depends on e2cnn and needs to be installed manually its Python3.6 compatible branch
Test on TVM¶
Supported Models¶
Model | Codebase | Model config |
---|---|---|
RetinaNet | MMDetection | config |
Faster R-CNN | MMDetection | config |
YOLOv3 | MMDetection | config |
YOLOX | MMDetection | config |
Mask R-CNN | MMDetection | config |
SSD | MMDetection | config |
ResNet | MMPretrain | config |
ResNeXt | MMPretrain | config |
SE-ResNet | MMPretrain | config |
MobileNetV2 | MMPretrain | config |
ShuffleNetV1 | MMPretrain | config |
ShuffleNetV2 | MMPretrain | config |
VisionTransformer | MMPretrain | config |
FCN | MMSegmentation | config |
PSPNet | MMSegmentation | config |
DeepLabV3 | MMSegmentation | config |
DeepLabV3+ | MMSegmentation | config |
UNet | MMSegmentation | config |
The table above lists the models that we have tested. Models not listed in the table might still be convertible. Please give them a try.
Test¶
Ubuntu 20.04
tvm 0.9.0
mmpretrain | metric | PyTorch | TVM |
---|---|---|---|
ResNet-18 | top-1 | 69.90 | 69.90 |
ResNeXt-50 | top-1 | 77.90 | 77.90 |
ShuffleNet V2 | top-1 | 69.55 | 69.55 |
MobileNet V2 | top-1 | 71.86 | 71.86 |
mmdet(*) | metric | PyTorch | TVM |
---|---|---|---|
SSD | box AP | 25.5 | 25.5 |
*: We only test SSD since dynamic shape is not supported for now.
mmseg | metric | PyTorch | TVM |
---|---|---|---|
FCN | mIoU | 72.25 | 72.36 |
PSPNet | mIoU | 78.55 | 77.90 |
Quantization test result¶
Currently, mmdeploy supports ncnn quantization.
Quantize with ncnn¶
mmpretrain¶
model | dataset | fp32 top-1 (%) | int8 top-1 (%) |
---|---|---|---|
ResNet-18 | Cifar10 | 94.82 | 94.83 |
ResNeXt-32x4d-50 | ImageNet-1k | 77.90 | 78.20* |
MobileNet V2 | ImageNet-1k | 71.86 | 71.43* |
HRNet-W18* | ImageNet-1k | 76.75 | 76.25* |
Note:
Because the ImageNet-1k dataset is large and ncnn has not released a Vulkan int8 version, only part of the test set (4000/50000) is used.
Accuracy varies after quantization, and it is normal for classification models to increase by less than 1%.
MMPretrain Deployment¶
MMPretrain aka mmpretrain
is an open-source image classification toolbox based on PyTorch. It is a part of the OpenMMLab project.
Installation¶
Install mmpretrain¶
Please follow this quick guide to install mmpretrain.
Install mmdeploy¶
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run
scripts. For example, the following commands install mmdeploy as well as inference engine - ONNX Runtime
.
git clone --recursive -b main https://github.com/open-mmlab/mmdeploy.git
cd mmdeploy
python3 tools/scripts/build_ubuntu_x64_ort.py $(nproc)
export PYTHONPATH=$(pwd)/build/lib:$PYTHONPATH
export LD_LIBRARY_PATH=$(pwd)/../mmdeploy-dep/onnxruntime-linux-x64-1.8.1/lib/:$LD_LIBRARY_PATH
Method III: Build from source
If neither I nor II meets your requirements, building mmdeploy from source is the last option.
Convert model¶
You can use tools/deploy.py to convert mmpretrain models to the specified backend models. Its detailed usage can be learned from here.
The command below shows an example about converting resnet18
model to onnx model that can be inferred by ONNX Runtime.
cd mmdeploy
# download resnet18 model from mmpretrain model zoo
mim download mmpretrain --config resnet18_8xb32_in1k --dest .
# convert mmpretrain model to onnxruntime model with dynamic shape
python tools/deploy.py \
configs/mmpretrain/classification_onnxruntime_dynamic.py \
resnet18_8xb32_in1k.py \
resnet18_8xb32_in1k_20210831-fbbb1da6.pth \
tests/data/tiger.jpeg \
--work-dir mmdeploy_models/mmpretrain/ort \
--device cpu \
--show \
--dump-info
It is crucial to specify the correct deployment config during model conversion. We’ve already provided builtin deployment config files of all supported backends for mmpretrain. The config filename pattern is:
classification_{backend}-{precision}_{static | dynamic}_{shape}.py
{backend}: inference backend, such as onnxruntime, tensorrt, pplnn, ncnn, openvino, coreml etc.
{precision}: fp16, int8. When it’s empty, it means fp32
{static | dynamic}: static shape or dynamic shape
{shape}: input shape or shape range of a model
Therefore, in the above example, you can also convert resnet18
to other backend models by changing the deployment config file classification_onnxruntime_dynamic.py
to others, e.g., converting to tensorrt-fp16 model by classification_tensorrt-fp16_dynamic-224x224-224x224.py
.
Tip
When converting mmpretrain models to tensorrt models, --device should be set to “cuda”
Model Specification¶
Before moving on to model inference chapter, let’s know more about the converted model structure which is very important for model inference.
The converted model locates in the working directory like mmdeploy_models/mmpretrain/ort
in the previous example. It includes:
mmdeploy_models/mmpretrain/ort
├── deploy.json
├── detail.json
├── end2end.onnx
└── pipeline.json
in which,
end2end.onnx: backend model which can be inferred by ONNX Runtime
*.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmpretrain/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK model includes both backend model and inference meta information.
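If you want to see what these meta files contain, the short sketch below simply loads each JSON file and prints its top-level structure. It assumes the mmdeploy_models/mmpretrain/ort directory produced by the conversion command above; the JSON schema varies across mmdeploy versions, so no particular keys are assumed.
import json
from pathlib import Path

# inspect the SDK meta files listed above without assuming their schema
model_dir = Path('mmdeploy_models/mmpretrain/ort')
for name in ('deploy.json', 'pipeline.json', 'detail.json'):
    meta = json.loads((model_dir / name).read_text())
    if isinstance(meta, dict):
        print(f'{name}: top-level keys -> {sorted(meta)}')
    else:
        print(f'{name}: {type(meta).__name__}')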
Model inference¶
Backend model inference¶
Take the previous converted end2end.onnx
model as an example, you can use the following code to inference the model.
from mmdeploy.apis.utils import build_task_processor
from mmdeploy.utils import get_input_shape, load_config
import torch
deploy_cfg = 'configs/mmpretrain/classification_onnxruntime_dynamic.py'
model_cfg = './resnet18_8xb32_in1k.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmpretrain/ort/end2end.onnx']
image = 'tests/data/tiger.jpeg'
# read deploy_cfg and model_cfg
deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)
# build task and backend model
task_processor = build_task_processor(model_cfg, deploy_cfg, device)
model = task_processor.build_backend_model(backend_model)
# process input image
input_shape = get_input_shape(deploy_cfg)
model_inputs, _ = task_processor.create_input(image, input_shape)
# do model inference
with torch.no_grad():
result = model.test_step(model_inputs)
# visualize results
task_processor.visualize(
image=image,
model=model,
result=result[0],
window_name='visualize',
output_file='output_classification.png')
SDK model inference¶
You can also perform SDK model inference like following,
from mmdeploy_runtime import Classifier
import cv2
img = cv2.imread('tests/data/tiger.jpeg')
# create a classifier
classifier = Classifier(model_path='./mmdeploy_models/mmpretrain/ort', device_name='cpu', device_id=0)
# perform inference
result = classifier(img)
# show inference result
for label_id, score in result:
print(label_id, score)
Besides python API, mmdeploy SDK also provides other FFI (Foreign Function Interface), such as C, C++, C#, Java and so on. You can learn their usage from demos.
Supported models¶
Model | TorchScript | ONNX Runtime | TensorRT | ncnn | PPLNN | OpenVINO |
---|---|---|---|---|---|---|
ResNet | Y | Y | Y | Y | Y | Y |
ResNeXt | Y | Y | Y | Y | Y | Y |
SE-ResNet | Y | Y | Y | Y | Y | Y |
MobileNetV2 | Y | Y | Y | Y | Y | Y |
MobileNetV3 | Y | Y | Y | Y | ? | Y |
ShuffleNetV1 | Y | Y | Y | Y | Y | Y |
ShuffleNetV2 | Y | Y | Y | Y | Y | Y |
VisionTransformer | Y | Y | Y | Y | ? | Y |
SwinTransformer | Y | Y | Y | N | ? | Y |
MobileOne | Y | Y | Y | Y | ? | Y |
EfficientNet | Y | Y | Y | N | ? | Y |
Conformer | Y | Y | Y | N | ? | Y |
EfficientFormer | Y | Y | Y | N | ? | Y |
MMDetection Deployment¶
MMDetection aka mmdet
is an open source object detection toolbox based on PyTorch. It is a part of the OpenMMLab project.
Installation¶
Install mmdet¶
Please follow the installation guide to install mmdet.
Install mmdeploy¶
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run
scripts. For example, the following commands install mmdeploy as well as inference engine - ONNX Runtime
.
git clone --recursive -b main https://github.com/open-mmlab/mmdeploy.git
cd mmdeploy
python3 tools/scripts/build_ubuntu_x64_ort.py $(nproc)
export PYTHONPATH=$(pwd)/build/lib:$PYTHONPATH
export LD_LIBRARY_PATH=$(pwd)/../mmdeploy-dep/onnxruntime-linux-x64-1.8.1/lib/:$LD_LIBRARY_PATH
Method III: Build from source
If neither I nor II meets your requirements, building mmdeploy from source is the last option.
Convert model¶
You can use tools/deploy.py to convert mmdet models to the specified backend models. Its detailed usage can be learned from here.
The command below shows an example about converting Faster R-CNN
model to onnx model that can be inferred by ONNX Runtime.
cd mmdeploy
# download faster r-cnn model from mmdet model zoo
mim download mmdet --config faster-rcnn_r50_fpn_1x_coco --dest .
# convert mmdet model to onnxruntime model with dynamic shape
python tools/deploy.py \
configs/mmdet/detection/detection_onnxruntime_dynamic.py \
faster-rcnn_r50_fpn_1x_coco.py \
faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth \
demo/resources/det.jpg \
--work-dir mmdeploy_models/mmdet/ort \
--device cpu \
--show \
--dump-info
It is crucial to specify the correct deployment config during model conversion. We’ve already provided builtin deployment config files of all supported backends for mmdetection, under which the config file path follows the pattern:
{task}/{task}_{backend}-{precision}_{static | dynamic}_{shape}.py
{task}: task in mmdetection.
There are two of them. One is `detection` and the other is `instance-seg`, indicating instance segmentation.
mmdet models like `RetinaNet`, `Faster R-CNN`, `DETR` and so on belong to the `detection` task, while `Mask R-CNN` is one of the `instance-seg` models. You can find more of them in chapter Supported models.
DO REMEMBER TO USE `detection/detection_*.py` deployment config files when trying to convert detection models and `instance-seg/instance-seg_*.py` to deploy instance segmentation models.
{backend}: inference backend, such as onnxruntime, tensorrt, pplnn, ncnn, openvino, coreml etc.
{precision}: fp16, int8. When it’s empty, it means fp32
{static | dynamic}: static shape or dynamic shape
{shape}: input shape or shape range of a model
Therefore, in the above example, you can also convert faster r-cnn
to other backend models by changing the deployment config file detection_onnxruntime_dynamic.py
to others, e.g., converting to tensorrt-fp16 model by detection_tensorrt-fp16_dynamic-320x320-1344x1344.py
.
Tip
When converting mmdet models to tensorrt models, --device should be set to “cuda”
Model specification¶
Before moving on to model inference chapter, let’s know more about the converted model structure which is very important for model inference.
The converted model locates in the working directory like mmdeploy_models/mmdet/ort
in the previous example. It includes:
mmdeploy_models/mmdet/ort
├── deploy.json
├── detail.json
├── end2end.onnx
└── pipeline.json
in which,
end2end.onnx: backend model which can be inferred by ONNX Runtime
*.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmdet/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK model includes both backend model and inference meta information.
Model inference¶
Backend model inference¶
Take the previous converted end2end.onnx
model as an example, you can use the following code to inference the model and visualize the results.
from mmdeploy.apis.utils import build_task_processor
from mmdeploy.utils import get_input_shape, load_config
import torch
deploy_cfg = 'configs/mmdet/detection/detection_onnxruntime_dynamic.py'
model_cfg = './faster-rcnn_r50_fpn_1x_coco.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmdet/ort/end2end.onnx']
image = './demo/resources/det.jpg'
# read deploy_cfg and model_cfg
deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)
# build task and backend model
task_processor = build_task_processor(model_cfg, deploy_cfg, device)
model = task_processor.build_backend_model(backend_model)
# process input image
input_shape = get_input_shape(deploy_cfg)
model_inputs, _ = task_processor.create_input(image, input_shape)
# do model inference
with torch.no_grad():
result = model.test_step(model_inputs)
# visualize results
task_processor.visualize(
image=image,
model=model,
result=result[0],
window_name='visualize',
output_file='output_detection.png')
SDK model inference¶
You can also perform SDK model inference like following,
from mmdeploy_runtime import Detector
import cv2
img = cv2.imread('./demo/resources/det.jpg')
# create a detector
detector = Detector(model_path='./mmdeploy_models/mmdet/ort', device_name='cpu', device_id=0)
# perform inference
bboxes, labels, masks = detector(img)
# visualize inference result
indices = [i for i in range(len(bboxes))]
for index, bbox, label_id in zip(indices, bboxes, labels):
[left, top, right, bottom], score = bbox[0:4].astype(int), bbox[4]
if score < 0.3:
continue
cv2.rectangle(img, (left, top), (right, bottom), (0, 255, 0))
cv2.imwrite('output_detection.png', img)
Besides python API, mmdeploy SDK also provides other FFI (Foreign Function Interface), such as C, C++, C#, Java and so on. You can learn their usage from demos.
Supported models¶
Model | Task | OnnxRuntime | TensorRT | ncnn | PPLNN | OpenVINO |
---|---|---|---|---|---|---|
ATSS | Object Detection | Y | Y | N | N | Y |
FCOS | Object Detection | Y | Y | Y | N | Y |
FoveaBox | Object Detection | Y | N | N | N | Y |
FSAF | Object Detection | Y | Y | Y | Y | Y |
RetinaNet | Object Detection | Y | Y | Y | Y | Y |
SSD | Object Detection | Y | Y | Y | N | Y |
VFNet | Object Detection | N | N | N | N | Y |
YOLOv3 | Object Detection | Y | Y | Y | N | Y |
YOLOX | Object Detection | Y | Y | Y | N | Y |
Cascade R-CNN | Object Detection | Y | Y | N | Y | Y |
Faster R-CNN | Object Detection | Y | Y | Y | Y | Y |
Faster R-CNN + DCN | Object Detection | Y | Y | Y | Y | Y |
GFL | Object Detection | Y | Y | N | ? | Y |
RepPoints | Object Detection | N | Y | N | ? | Y |
DETR* | Object Detection | Y | Y | N | ? | Y |
Deformable DETR* | Object Detection | Y | Y | N | ? | Y |
Conditional DETR* | Object Detection | Y | Y | N | ? | Y |
DAB-DETR* | Object Detection | Y | Y | N | ? | Y |
DINO* | Object Detection | Y | Y | N | ? | Y |
CenterNet | Object Detection | Y | Y | N | ? | Y |
RTMDet | Object Detection | Y | Y | N | ? | Y |
Cascade Mask R-CNN | Instance Segmentation | Y | Y | N | N | Y |
Mask R-CNN | Instance Segmentation | Y | Y | N | N | Y |
Swin Transformer | Instance Segmentation | Y | Y | N | N | Y |
SOLO | Instance Segmentation | Y | N | N | N | Y |
SOLOv2 | Instance Segmentation | Y | N | N | N | Y |
Panoptic FPN | Panoptic Segmentation | Y | Y | N | N | N |
MaskFormer | Panoptic Segmentation | Y | Y | N | N | N |
Mask2Former* | Panoptic Segmentation | Y | Y | N | N | N |
Reminder¶
For transformer-based models, we strongly suggest using TensorRT>=8.4.
Mask2Former should use TensorRT>=8.6.1 for dynamic shape inference.
DETR-like models do not support multi-batch inference.
MMSegmentation Deployment¶
MMSegmentation aka mmseg
is an open source semantic segmentation toolbox based on PyTorch. It is a part of the OpenMMLab project.
Installation¶
Install mmseg¶
Please follow the installation guide to install mmseg.
Install mmdeploy¶
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run
scripts. For example, the following commands install mmdeploy as well as inference engine - ONNX Runtime
.
git clone --recursive -b main https://github.com/open-mmlab/mmdeploy.git
cd mmdeploy
python3 tools/scripts/build_ubuntu_x64_ort.py $(nproc)
export PYTHONPATH=$(pwd)/build/lib:$PYTHONPATH
export LD_LIBRARY_PATH=$(pwd)/../mmdeploy-dep/onnxruntime-linux-x64-1.8.1/lib/:$LD_LIBRARY_PATH
NOTE:
Adding `$(pwd)/build/lib` to `PYTHONPATH` is for importing the mmdeploy SDK Python module, `mmdeploy_runtime`, which will be presented in chapter SDK model inference.
When inferring an ONNX model with ONNX Runtime, the ONNX Runtime library needs to be found. Thus, we add its lib path to `LD_LIBRARY_PATH`.
Method III: Build from source
If neither I nor II meets your requirements, building mmdeploy from source is the last option.
Convert model¶
You can use tools/deploy.py to convert mmseg models to the specified backend models. Its detailed usage can be learned from here.
The command below shows an example about converting unet
model to onnx model that can be inferred by ONNX Runtime.
cd mmdeploy
# download unet model from mmseg model zoo
mim download mmsegmentation --config unet-s5-d16_fcn_4xb4-160k_cityscapes-512x1024 --dest .
# convert mmseg model to onnxruntime model with dynamic shape
python tools/deploy.py \
configs/mmseg/segmentation_onnxruntime_dynamic.py \
unet-s5-d16_fcn_4xb4-160k_cityscapes-512x1024.py \
fcn_unet_s5-d16_4x4_512x1024_160k_cityscapes_20211210_145204-6860854e.pth \
demo/resources/cityscapes.png \
--work-dir mmdeploy_models/mmseg/ort \
--device cpu \
--show \
--dump-info
It is crucial to specify the correct deployment config during model conversion. We’ve already provided builtin deployment config files of all supported backends for mmsegmentation. The config filename pattern is:
segmentation_{backend}-{precision}_{static | dynamic}_{shape}.py
{backend}: inference backend, such as onnxruntime, tensorrt, pplnn, ncnn, openvino, coreml etc.
{precision}: fp16, int8. When it’s empty, it means fp32
{static | dynamic}: static shape or dynamic shape
{shape}: input shape or shape range of a model
Therefore, in the above example, you can also convert unet
to other backend models by changing the deployment config file segmentation_onnxruntime_dynamic.py
to others, e.g., converting to tensorrt-fp16 model by segmentation_tensorrt-fp16_dynamic-512x1024-2048x2048.py
.
Tip
When converting mmseg models to tensorrt models, --device should be set to “cuda”
Model specification¶
Before moving on to model inference chapter, let’s know more about the converted model structure which is very important for model inference.
The converted model locates in the working directory like mmdeploy_models/mmseg/ort
in the previous example. It includes:
mmdeploy_models/mmseg/ort
├── deploy.json
├── detail.json
├── end2end.onnx
└── pipeline.json
in which,
end2end.onnx: backend model which can be inferred by ONNX Runtime
*.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmseg/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK model includes both backend model and inference meta information.
Model inference¶
Backend model inference¶
Take the previous converted end2end.onnx
model as an example, you can use the following code to inference the model and visualize the results.
from mmdeploy.apis.utils import build_task_processor
from mmdeploy.utils import get_input_shape, load_config
import torch
deploy_cfg = 'configs/mmseg/segmentation_onnxruntime_dynamic.py'
model_cfg = './unet-s5-d16_fcn_4xb4-160k_cityscapes-512x1024.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmseg/ort/end2end.onnx']
image = './demo/resources/cityscapes.png'
# read deploy_cfg and model_cfg
deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)
# build task and backend model
task_processor = build_task_processor(model_cfg, deploy_cfg, device)
model = task_processor.build_backend_model(backend_model)
# process input image
input_shape = get_input_shape(deploy_cfg)
model_inputs, _ = task_processor.create_input(image, input_shape)
# do model inference
with torch.no_grad():
result = model.test_step(model_inputs)
# visualize results
task_processor.visualize(
image=image,
model=model,
result=result[0],
window_name='visualize',
output_file='./output_segmentation.png')
SDK model inference¶
You can also perform SDK model inference like following,
from mmdeploy_runtime import Segmentor
import cv2
import numpy as np
img = cv2.imread('./demo/resources/cityscapes.png')
# create a segmentor
segmentor = Segmentor(model_path='./mmdeploy_models/mmseg/ort', device_name='cpu', device_id=0)
# perform inference
seg = segmentor(img)
# visualize inference result
## randomly generate a palette with size 256x3
palette = np.random.randint(0, 256, size=(256, 3))
color_seg = np.zeros((seg.shape[0], seg.shape[1], 3), dtype=np.uint8)
for label, color in enumerate(palette):
color_seg[seg == label, :] = color
# convert to BGR
color_seg = color_seg[..., ::-1]
img = img * 0.5 + color_seg * 0.5
img = img.astype(np.uint8)
cv2.imwrite('output_segmentation.png', img)
Besides python API, mmdeploy SDK also provides other FFI (Foreign Function Interface), such as C, C++, C#, Java and so on. You can learn their usage from demos.
Supported models¶
Model | TorchScript | OnnxRuntime | TensorRT | ncnn | PPLNN | OpenVino |
---|---|---|---|---|---|---|
FCN | Y | Y | Y | Y | Y | Y |
PSPNet* | Y | Y | Y | Y | Y | Y |
DeepLabV3 | Y | Y | Y | Y | Y | Y |
DeepLabV3+ | Y | Y | Y | Y | Y | Y |
Fast-SCNN* | Y | Y | Y | N | Y | Y |
UNet | Y | Y | Y | Y | Y | Y |
ANN* | Y | Y | Y | N | N | N |
APCNet | Y | Y | Y | Y | N | N |
BiSeNetV1 | Y | Y | Y | Y | N | Y |
BiSeNetV2 | Y | Y | Y | Y | N | Y |
CGNet | Y | Y | Y | Y | N | Y |
DMNet | ? | Y | N | N | N | N |
DNLNet | ? | Y | Y | Y | N | Y |
EMANet | Y | Y | Y | N | N | Y |
EncNet | Y | Y | Y | N | N | Y |
ERFNet | Y | Y | Y | Y | N | Y |
FastFCN | Y | Y | Y | Y | N | Y |
GCNet | Y | Y | Y | N | N | N |
ICNet* | Y | Y | Y | N | N | Y |
ISANet* | N | Y | Y | N | N | Y |
NonLocal Net | ? | Y | Y | Y | N | Y |
OCRNet | Y | Y | Y | Y | N | Y |
PointRend* | Y | Y | Y | N | N | N |
Semantic FPN | Y | Y | Y | Y | N | Y |
STDC | Y | Y | Y | Y | N | Y |
UPerNet* | N | Y | Y | N | N | N |
DANet | ? | Y | Y | N | N | Y |
Segmenter* | N | Y | Y | Y | N | Y |
SegFormer* | Y | Y | Y | N | N | Y |
SETR | ? | Y | N | N | N | Y |
CCNet | ? | N | N | N | N | N |
PSANet | ? | N | N | N | N | N |
DPT | ? | N | N | N | N | N |
Reminder¶
Only the `whole` inference mode is supported for all mmseg models.
PSPNet and Fast-SCNN only support static shape, because nn.AdaptiveAvgPool2d is not supported by most inference backends.
For models that only support static shape, you should use a static-shape deployment config file such as `configs/mmseg/segmentation_tensorrt_static-1024x2048.py`.
For users who prefer that deployed models output a probability feature map, put `codebase_config = dict(with_argmax=False)` in the deploy config, as shown in the sketch below.
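As an illustration of the last point, below is a minimal sketch of a custom deploy config that inherits a builtin mmseg config and disables the final argmax. The file name is hypothetical, and it relies on the usual mmengine-style config inheritance in which dictionary keys are merged.
# my_segmentation_onnxruntime_dynamic_prob.py  (hypothetical file name)
# inherit a builtin mmseg deployment config
_base_ = ['./segmentation_onnxruntime_dynamic.py']

# keep the probability feature map by disabling the final argmax;
# this key is merged into the inherited codebase_config
codebase_config = dict(with_argmax=False)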
MMagic Deployment¶
MMagic aka mmagic
is an open-source image and video editing toolbox based on PyTorch. It is a part of the OpenMMLab project.
Installation¶
Install mmagic¶
Please follow the installation guide to install mmagic.
Install mmdeploy¶
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run
scripts. For example, the following commands install mmdeploy as well as inference engine - ONNX Runtime
.
git clone --recursive -b main https://github.com/open-mmlab/mmdeploy.git
cd mmdeploy
python3 tools/scripts/build_ubuntu_x64_ort.py $(nproc)
export PYTHONPATH=$(pwd)/build/lib:$PYTHONPATH
export LD_LIBRARY_PATH=$(pwd)/../mmdeploy-dep/onnxruntime-linux-x64-1.8.1/lib/:$LD_LIBRARY_PATH
Method III: Build from source
If neither I nor II meets your requirements, building mmdeploy from source is the last option.
Convert model¶
You can use tools/deploy.py to convert mmagic models to the specified backend models. Its detailed usage can be learned from here.
When using tools/deploy.py
, it is crucial to specify the correct deployment config. We’ve already provided builtin deployment config files of all supported backends for mmagic, under which the config file path follows the pattern:
{task}/{task}_{backend}-{precision}_{static | dynamic}_{shape}.py
{task}: task in mmagic.
MMDeploy supports models of one task in mmagic, i.e., super resolution. Please refer to chapter supported models for the task-model organization.
DO REMEMBER TO USE the corresponding deployment config file when trying to convert models of different tasks.
{backend}: inference backend, such as onnxruntime, tensorrt, pplnn, ncnn, openvino, coreml etc.
{precision}: fp16, int8. When it’s empty, it means fp32
{static | dynamic}: static shape or dynamic shape
{shape}: input shape or shape range of a model
Convert super resolution model¶
The command below shows an example about converting ESRGAN
model to onnx model that can be inferred by ONNX Runtime.
cd mmdeploy
# download esrgan model from mmagic model zoo
mim download mmagic --config esrgan_psnr-x4c64b23g32_1xb16-1000k_div2k --dest .
# convert esrgan model to onnxruntime model with dynamic shape
python tools/deploy.py \
configs/mmagic/super-resolution/super-resolution_onnxruntime_dynamic.py \
esrgan_psnr-x4c64b23g32_1xb16-1000k_div2k.py \
esrgan_psnr_x4c64b23g32_1x16_1000k_div2k_20200420-bf5c993c.pth \
demo/resources/face.png \
--work-dir mmdeploy_models/mmagic/ort \
--device cpu \
--show \
--dump-info
You can also convert the above model to other backend models by changing the deployment config file *_onnxruntime_dynamic.py
to others, e.g., converting to tensorrt model by super-resolution/super-resolution_tensorrt-_dynamic-32x32-512x512.py
.
Tip
When converting mmagic models to tensorrt models, --device should be set to “cuda”
Model specification¶
Before moving on to model inference chapter, let’s know more about the converted model structure which is very important for model inference.
The converted model locates in the working directory like mmdeploy_models/mmagic/ort
in the previous example. It includes:
mmdeploy_models/mmagic/ort
├── deploy.json
├── detail.json
├── end2end.onnx
└── pipeline.json
in which,
end2end.onnx: backend model which can be inferred by ONNX Runtime
*.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmagic/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK model includes both backend model and inference meta information.
Model inference¶
Backend model inference¶
Take the previous converted end2end.onnx
model as an example, you can use the following code to inference the model and visualize the results.
from mmdeploy.apis.utils import build_task_processor
from mmdeploy.utils import get_input_shape, load_config
import torch
deploy_cfg = 'configs/mmagic/super-resolution/super-resolution_onnxruntime_dynamic.py'
model_cfg = 'esrgan_psnr-x4c64b23g32_1xb16-1000k_div2k.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmagic/ort/end2end.onnx']
image = './demo/resources/face.png'
# read deploy_cfg and model_cfg
deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)
# build task and backend model
task_processor = build_task_processor(model_cfg, deploy_cfg, device)
model = task_processor.build_backend_model(backend_model)
# process input image
input_shape = get_input_shape(deploy_cfg)
model_inputs, _ = task_processor.create_input(image, input_shape)
# do model inference
with torch.no_grad():
result = model.test_step(model_inputs)
# visualize results
task_processor.visualize(
image=image,
model=model,
result=result[0],
window_name='visualize',
output_file='output_restorer.bmp')
SDK model inference¶
You can also perform SDK model inference like following,
from mmdeploy_runtime import Restorer
import cv2
img = cv2.imread('./demo/resources/face.png')
# create a restorer
restorer = Restorer(model_path='./mmdeploy_models/mmagic/ort', device_name='cpu', device_id=0)
# perform inference
result = restorer(img)
# visualize inference result
# convert to BGR
result = result[..., ::-1]
cv2.imwrite('output_restorer.bmp', result)
Besides python API, mmdeploy SDK also provides other FFI (Foreign Function Interface), such as C, C++, C#, Java and so on. You can learn their usage from demos.
Supported models¶
Model | Task | ONNX Runtime | TensorRT | ncnn | PPLNN | OpenVINO |
---|---|---|---|---|---|---|
SRCNN | super-resolution | Y | Y | Y | Y | Y |
ESRGAN | super-resolution | Y | Y | Y | Y | Y |
ESRGAN-PSNR | super-resolution | Y | Y | Y | Y | Y |
SRGAN | super-resolution | Y | Y | Y | Y | Y |
SRResNet | super-resolution | Y | Y | Y | Y | Y |
Real-ESRGAN | super-resolution | Y | Y | Y | Y | Y |
EDSR | super-resolution | Y | Y | Y | N | Y |
RDN | super-resolution | Y | Y | Y | Y | Y |
MMOCR Deployment¶
MMOCR aka mmocr
is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the corresponding downstream tasks including key information extraction. It is a part of the OpenMMLab project.
Installation¶
Install mmocr¶
Please follow the installation guide to install mmocr.
Install mmdeploy¶
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run
scripts. For example, the following commands install mmdeploy as well as inference engine - ONNX Runtime
.
git clone --recursive -b main https://github.com/open-mmlab/mmdeploy.git
cd mmdeploy
python3 tools/scripts/build_ubuntu_x64_ort.py $(nproc)
export PYTHONPATH=$(pwd)/build/lib:$PYTHONPATH
export LD_LIBRARY_PATH=$(pwd)/../mmdeploy-dep/onnxruntime-linux-x64-1.8.1/lib/:$LD_LIBRARY_PATH
Method III: Build from source
If neither I nor II meets your requirements, building mmdeploy from source is the last option.
Convert model¶
You can use tools/deploy.py to convert mmocr models to the specified backend models. Its detailed usage can be learned from here.
When using tools/deploy.py
, it is crucial to specify the correct deployment config. We’ve already provided builtin deployment config files of all supported backends for mmocr, under which the config file path follows the pattern:
{task}/{task}_{backend}-{precision}_{static | dynamic}_{shape}.py
{task}: task in mmocr.
MMDeploy supports models of two tasks in mmocr: one is text detection and the other is text recognition.
DO REMEMBER TO USE the corresponding deployment config file when trying to convert models of different tasks.
{backend}: inference backend, such as onnxruntime, tensorrt, pplnn, ncnn, openvino, coreml etc.
{precision}: fp16, int8. When it’s empty, it means fp32
{static | dynamic}: static shape or dynamic shape
{shape}: input shape or shape range of a model
In the next two chapters, we will take the dbnet model from the text detection task and the crnn model from the text recognition task as examples, showing how to convert them to onnx models that can be inferred by ONNX Runtime.
Convert text detection model¶
cd mmdeploy
# download dbnet model from mmocr model zoo
mim download mmocr --config dbnet_resnet18_fpnc_1200e_icdar2015 --dest .
# convert mmocr model to onnxruntime model with dynamic shape
python tools/deploy.py \
configs/mmocr/text-detection/text-detection_onnxruntime_dynamic.py \
dbnet_resnet18_fpnc_1200e_icdar2015.py \
dbnet_resnet18_fpnc_1200e_icdar2015_20220825_221614-7c0e94f2.pth \
demo/resources/text_det.jpg \
--work-dir mmdeploy_models/mmocr/dbnet/ort \
--device cpu \
--show \
--dump-info
Convert text recognition model¶
cd mmdeploy
# download crnn model from mmocr model zoo
mim download mmocr --config crnn_mini-vgg_5e_mj --dest .
# convert mmocr model to onnxruntime model with dynamic shape
python tools/deploy.py \
configs/mmocr/text-recognition/text-recognition_onnxruntime_dynamic.py \
crnn_mini-vgg_5e_mj.py \
crnn_mini-vgg_5e_mj_20220826_224120-8afbedbb.pth \
demo/resources/text_recog.jpg \
--work-dir mmdeploy_models/mmocr/crnn/ort \
--device cpu \
--show \
--dump-info
You can also convert the above models to other backend models by changing the deployment config file *_onnxruntime_dynamic.py
to others, e.g., converting dbnet
to tensorrt-fp32 model by text-detection/text-detection_tensorrt-_dynamic-320x320-2240x2240.py
.
Tip
When converting mmocr models to tensorrt models, --device should be set to “cuda”
Model specification¶
Before moving on to model inference chapter, let’s know more about the converted model structure which is very important for model inference.
The converted model locates in the working directory like mmdeploy_models/mmocr/dbnet/ort
in the previous example. It includes:
mmdeploy_models/mmocr/dbnet/ort
├── deploy.json
├── detail.json
├── end2end.onnx
└── pipeline.json
in which,
end2end.onnx: backend model which can be inferred by ONNX Runtime
*.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmocr/dbnet/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK model includes both backend model and inference meta information.
Model Inference¶
Backend model inference¶
Take the previous converted end2end.onnx
mode of dbnet
as an example, you can use the following code to inference the model and visualize the results.
from mmdeploy.apis.utils import build_task_processor
from mmdeploy.utils import get_input_shape, load_config
import torch
deploy_cfg = 'configs/mmocr/text-detection/text-detection_onnxruntime_dynamic.py'
model_cfg = 'dbnet_resnet18_fpnc_1200e_icdar2015.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmocr/dbnet/ort/end2end.onnx']
image = './demo/resources/text_det.jpg'
# read deploy_cfg and model_cfg
deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)
# build task and backend model
task_processor = build_task_processor(model_cfg, deploy_cfg, device)
model = task_processor.build_backend_model(backend_model)
# process input image
input_shape = get_input_shape(deploy_cfg)
model_inputs, _ = task_processor.create_input(image, input_shape)
# do model inference
with torch.no_grad():
result = model.test_step(model_inputs)
# visualize results
task_processor.visualize(
image=image,
model=model,
result=result[0],
window_name='visualize',
output_file='output_ocr.png')
Tip:
Map ‘deploy_cfg’, ‘model_cfg’, ‘backend_model’ and ‘image’ to the corresponding arguments from chapter Convert text recognition model, and you will get the ONNX Runtime inference results of the crnn onnx model.
SDK model inference¶
Given the above SDK models of dbnet
and crnn
, you can also perform SDK model inference like following,
Text detection SDK model inference¶
import cv2
from mmdeploy_runtime import TextDetector
img = cv2.imread('demo/resources/text_det.jpg')
# create text detector
detector = TextDetector(
model_path='mmdeploy_models/mmocr/dbnet/ort',
device_name='cpu',
device_id=0)
# do model inference
bboxes = detector(img)
# draw detected bbox into the input image
if len(bboxes) > 0:
pts = ((bboxes[:, 0:8] + 0.5).reshape(len(bboxes), -1,
2).astype(int))
cv2.polylines(img, pts, True, (0, 255, 0), 2)
cv2.imwrite('output_ocr.png', img)
Text Recognition SDK model inference¶
import cv2
from mmdeploy_runtime import TextRecognizer
img = cv2.imread('demo/resources/text_recog.jpg')
# create text recognizer
recognizer = TextRecognizer(
model_path='mmdeploy_models/mmocr/crnn/ort',
device_name='cpu',
device_id=0
)
# do model inference
texts = recognizer(img)
# print the result
print(texts)
Besides python API, mmdeploy SDK also provides other FFI (Foreign Function Interface), such as C, C++, C#, Java and so on. You can learn their usage from demos.
Supported models¶
Model | Task | TorchScript | OnnxRuntime | TensorRT | ncnn | PPLNN | OpenVINO |
---|---|---|---|---|---|---|---|
DBNet | text-detection | Y | Y | Y | Y | Y | Y |
DBNetpp | text-detection | N | Y | Y | ? | ? | Y |
PSENet | text-detection | Y | Y | Y | Y | N | Y |
PANet | text-detection | Y | Y | Y | Y | N | Y |
TextSnake | text-detection | Y | Y | Y | ? | ? | ? |
MaskRCNN | text-detection | Y | Y | Y | ? | ? | ? |
CRNN | text-recognition | Y | Y | Y | Y | Y | N |
SAR | text-recognition | N | Y | Y | N | N | N |
SATRN | text-recognition | Y | Y | Y | N | N | N |
ABINet | text-recognition | Y | Y | Y | ? | ? | ? |
Reminder¶
ABINet for TensorRT requires PyTorch 1.10+ and TensorRT 8.4+.
SAR uses `valid_ratio` inside network inference, which causes performance drops. When the `valid_ratio` of the test image and that of the image used for conversion differ considerably, the gap is enlarged.
For the TensorRT backend, users have to choose the right config. For example, CRNN only accepts 1-channel input. Here is a recommendation table:
Model | Config |
---|---|
MaskRCNN | text-detection_mrcnn_tensorrt_dynamic-320x320-2240x2240.py |
CRNN | text-recognition_tensorrt_dynamic-1x32x32-1x32x640.py |
SATRN | text-recognition_tensorrt_dynamic-32x32-32x640.py |
SAR | text-recognition_tensorrt_dynamic-48x64-48x640.py |
ABINet | text-recognition_tensorrt_static-32x128.py |
MMPose Deployment¶
MMPose aka mmpose
is an open-source toolbox for pose estimation based on PyTorch. It is a part of the OpenMMLab project.
Installation¶
Install mmpose¶
Please follow the best practice to install mmpose.
Install mmdeploy¶
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run
scripts. For example, the following commands install mmdeploy as well as inference engine - ONNX Runtime
.
git clone --recursive -b main https://github.com/open-mmlab/mmdeploy.git
cd mmdeploy
python3 tools/scripts/build_ubuntu_x64_ort.py $(nproc)
export PYTHONPATH=$(pwd)/build/lib:$PYTHONPATH
export LD_LIBRARY_PATH=$(pwd)/../mmdeploy-dep/onnxruntime-linux-x64-1.8.1/lib/:$LD_LIBRARY_PATH
Method III: Build from source
If neither I nor II meets your requirements, building mmdeploy from source is the last option.
Convert model¶
You can use tools/deploy.py to convert mmpose models to the specified backend models. Its detailed usage can be learned from here.
The command below shows an example about converting hrnet
model to onnx model that can be inferred by ONNX Runtime.
cd mmdeploy
# download hrnet model from mmpose model zoo
mim download mmpose --config td-hm_hrnet-w32_8xb64-210e_coco-256x192 --dest .
# convert mmdet model to onnxruntime model with static shape
python tools/deploy.py \
configs/mmpose/pose-detection_onnxruntime_static.py \
td-hm_hrnet-w32_8xb64-210e_coco-256x192.py \
hrnet_w32_coco_256x192-c78dce93_20200708.pth \
demo/resources/human-pose.jpg \
--work-dir mmdeploy_models/mmpose/ort \
--device cpu \
--show
It is crucial to specify the correct deployment config during model conversion. We’ve already provided builtin deployment config files of all supported backends for mmpose. The config filename pattern is:
pose-detection_{backend}-{precision}_{static | dynamic}_{shape}.py
{backend}: inference backend, such as onnxruntime, tensorrt, pplnn, ncnn, openvino, coreml etc.
{precision}: fp16, int8. When it’s empty, it means fp32
{static | dynamic}: static shape or dynamic shape
{shape}: input shape or shape range of a model
Therefore, in the above example, you can also convert hrnet
to other backend models by changing the deployment config file pose-detection_onnxruntime_static.py
to others, e.g., converting to tensorrt model by pose-detection_tensorrt_static-256x192.py
.
Tip
When converting mmpose models to tensorrt models, --device should be set to “cuda”
Model specification¶
Before moving on to model inference chapter, let’s know more about the converted model structure which is very important for model inference.
The converted model locates in the working directory like mmdeploy_models/mmpose/ort
in the previous example. It includes:
mmdeploy_models/mmpose/ort
├── deploy.json
├── detail.json
├── end2end.onnx
└── pipeline.json
in which,
end2end.onnx: backend model which can be inferred by ONNX Runtime
*.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmpose/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK model includes both backend model and inference meta information.
Model inference¶
Backend model inference¶
Take the previous converted end2end.onnx
model as an example, you can use the following code to inference the model and visualize the results.
from mmdeploy.apis.utils import build_task_processor
from mmdeploy.utils import get_input_shape, load_config
import torch
deploy_cfg = 'configs/mmpose/pose-detection_onnxruntime_static.py'
model_cfg = 'td-hm_hrnet-w32_8xb64-210e_coco-256x192.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmpose/ort/end2end.onnx']
image = './demo/resources/human-pose.jpg'
# read deploy_cfg and model_cfg
deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)
# build task and backend model
task_processor = build_task_processor(model_cfg, deploy_cfg, device)
model = task_processor.build_backend_model(backend_model)
# process input image
input_shape = get_input_shape(deploy_cfg)
model_inputs, _ = task_processor.create_input(image, input_shape)
# do model inference
with torch.no_grad():
result = model.test_step(model_inputs)
# visualize results
task_processor.visualize(
image=image,
model=model,
result=result[0],
window_name='visualize',
output_file='output_pose.png')
SDK model inference¶
TODO
MMDetection3d Deployment¶
MMDetection3d aka mmdet3d
is an open source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D detection. It is a part of the OpenMMLab project.
Install mmdet3d¶
We could install mmdet3d through mim. For other ways of installation, please refer to here
python3 -m pip install -U openmim
python3 -m mim install "mmdet3d>=1.1.0"
Convert model¶
For example, use tools/deploy.py
to convert centerpoint to onnxruntime format
# cd to mmdeploy root directory
# download config and model
mim download mmdet3d --config centerpoint_pillar02_second_secfpn_head-circlenms_8xb4-cyclic-20e_nus-3d --dest .
export MODEL_CONFIG=centerpoint_pillar02_second_secfpn_head-circlenms_8xb4-cyclic-20e_nus-3d.py
export MODEL_PATH=centerpoint_02pillar_second_secfpn_circlenms_4x8_cyclic_20e_nus_20220811_031844-191a3822.pth
export TEST_DATA=tests/data/n008-2018-08-01-15-16-36-0400__LIDAR_TOP__1533151612397179.pcd.bin
python3 tools/deploy.py configs/mmdet3d/voxel-detection/voxel-detection_onnxruntime_dynamic.py $MODEL_CONFIG $MODEL_PATH $TEST_DATA --work-dir centerpoint
This step would generate end2end.onnx
in work-dir
ls -lah centerpoint
..
-rw-rw-r-- 1 rg rg 87M Nov 4 19:48 end2end.onnx
Model inference¶
At present, the voxelization preprocessing and postprocessing of mmdet3d are not converted into ONNX operations, and the C++ SDK has not yet implemented the voxelization computation. Callers need to complete these steps themselves by referring to the corresponding Python implementation; a minimal Python sketch is shown below.
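For reference, here is a minimal Python sketch that runs the converted centerpoint model, mirroring the backend-inference flow used for the other codebases in this document. It assumes the files produced by the conversion command above; the exact behavior of the voxel-detection task processor (for example, how create_input handles the point cloud) may differ between mmdeploy versions.
from mmdeploy.apis.utils import build_task_processor
from mmdeploy.utils import load_config
import torch

deploy_cfg = 'configs/mmdet3d/voxel-detection/voxel-detection_onnxruntime_dynamic.py'
model_cfg = 'centerpoint_pillar02_second_secfpn_head-circlenms_8xb4-cyclic-20e_nus-3d.py'
device = 'cpu'
backend_model = ['./centerpoint/end2end.onnx']
pcd = 'tests/data/n008-2018-08-01-15-16-36-0400__LIDAR_TOP__1533151612397179.pcd.bin'
# read deploy_cfg and model_cfg
deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)
# build task and backend model
task_processor = build_task_processor(model_cfg, deploy_cfg, device)
model = task_processor.build_backend_model(backend_model)
# create_input performs the voxelization preprocessing in Python
model_inputs, _ = task_processor.create_input(pcd)
# do model inference
with torch.no_grad():
    result = model.test_step(model_inputs)
print(result[0])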
Supported models¶
model | task | dataset | onnxruntime | openvino | tensorrt* |
---|---|---|---|---|---|
centerpoint | voxel detection | nuScenes | ✔️ | ✔️ | ✔️ |
pointpillars | voxel detection | nuScenes | ✔️ | ✔️ | ✔️ |
pointpillars | voxel detection | KITTI | ✔️ | ✔️ | ✔️ |
smoke | monocular detection | KITTI | ✔️ | x | ✔️ |
*: Make sure TensorRT >= 8.6 for certain bug fixes, such as ScatterND and dynamic shape crashes.
MMRotate Deployment¶
MMRotate is an open-source toolbox for rotated object detection based on PyTorch. It is a part of the OpenMMLab project.
Installation¶
Install mmrotate¶
Please follow the installation guide to install mmrotate.
Install mmdeploy¶
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run
scripts. For example, the following commands install mmdeploy as well as inference engine - ONNX Runtime
.
git clone --recursive -b main https://github.com/open-mmlab/mmdeploy.git
cd mmdeploy
python3 tools/scripts/build_ubuntu_x64_ort.py $(nproc)
export PYTHONPATH=$(pwd)/build/lib:$PYTHONPATH
export LD_LIBRARY_PATH=$(pwd)/../mmdeploy-dep/onnxruntime-linux-x64-1.8.1/lib/:$LD_LIBRARY_PATH
NOTE:
Adding `$(pwd)/build/lib` to `PYTHONPATH` is for importing the mmdeploy SDK Python module, `mmdeploy_runtime`, which will be presented in chapter SDK model inference.
When inferring an ONNX model with ONNX Runtime, the ONNX Runtime library needs to be found. Thus, we add its lib path to `LD_LIBRARY_PATH`.
Method III: Build from source
If neither I nor II meets your requirements, building mmdeploy from source is the last option.
Convert model¶
You can use tools/deploy.py to convert mmrotate models to the specified backend models. Its detailed usage can be learned from here.
The command below shows an example about converting rotated-faster-rcnn
model to onnx model that can be inferred by ONNX Runtime.
cd mmdeploy
# download rotated-faster-rcnn model from mmrotate model zoo
mim download mmrotate --config rotated-faster-rcnn-le90_r50_fpn_1x_dota --dest .
wget https://github.com/open-mmlab/mmrotate/raw/main/demo/dota_demo.jpg
# convert mmrotate model to onnxruntime model with dynamic shape
python tools/deploy.py \
configs/mmrotate/rotated-detection_onnxruntime_dynamic.py \
rotated-faster-rcnn-le90_r50_fpn_1x_dota.py \
rotated_faster_rcnn_r50_fpn_1x_dota_le90-0393aa5c.pth \
dota_demo.jpg \
--work-dir mmdeploy_models/mmrotate/ort \
--device cpu \
--show \
--dump-info
It is crucial to specify the correct deployment config during model conversion. We’ve already provided builtin deployment config files of all supported backends for mmrotate. The config filename pattern is:
rotated_detection-{backend}-{precision}_{static | dynamic}_{shape}.py
{backend}: inference backend, such as onnxruntime, tensorrt, pplnn, ncnn, openvino, coreml etc.
{precision}: fp16, int8. When it’s empty, it means fp32
{static | dynamic}: static shape or dynamic shape
{shape}: input shape or shape range of a model
Therefore, in the above example, you can also convert rotated-faster-rcnn
to other backend models by changing the deployment config file rotated-detection_onnxruntime_dynamic
to others, e.g., converting to tensorrt-fp16 model by rotated-detection_tensorrt-fp16_dynamic-320x320-1024x1024.py
.
Tip
When converting mmrotate models to tensorrt models, --device should be set to “cuda”
Model specification¶
Before moving on to model inference chapter, let’s know more about the converted model structure which is very important for model inference.
The converted model locates in the working directory like mmdeploy_models/mmrotate/ort
in the previous example. It includes:
mmdeploy_models/mmrotate/ort
├── deploy.json
├── detail.json
├── end2end.onnx
└── pipeline.json
in which,
end2end.onnx: backend model which can be inferred by ONNX Runtime
*.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmrotate/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK model includes both backend model and inference meta information.
Model inference¶
Backend model inference¶
Take the previous converted end2end.onnx
model as an example, you can use the following code to inference the model and visualize the results.
from mmdeploy.apis.utils import build_task_processor
from mmdeploy.utils import get_input_shape, load_config
import torch
deploy_cfg = 'configs/mmrotate/rotated-detection_onnxruntime_dynamic.py'
model_cfg = './rotated-faster-rcnn-le90_r50_fpn_1x_dota.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmrotate/ort/end2end.onnx']
image = './dota_demo.jpg'
# read deploy_cfg and model_cfg
deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)
# build task and backend model
task_processor = build_task_processor(model_cfg, deploy_cfg, device)
model = task_processor.build_backend_model(backend_model)
# process input image
input_shape = get_input_shape(deploy_cfg)
model_inputs, _ = task_processor.create_input(image, input_shape)
# do model inference
with torch.no_grad():
result = model.test_step(model_inputs)
# visualize results
task_processor.visualize(
image=image,
model=model,
result=result[0],
window_name='visualize',
output_file='./output.png')
SDK model inference¶
You can also perform SDK model inference like following,
from mmdeploy_runtime import RotatedDetector
import cv2
import numpy as np
img = cv2.imread('./dota_demo.jpg')
# create a detector
detector = RotatedDetector(model_path='./mmdeploy_models/mmrotate/ort', device_name='cpu', device_id=0)
# perform inference
det = detector(img)
Besides python API, mmdeploy SDK also provides other FFI (Foreign Function Interface), such as C, C++, C#, Java and so on. You can learn their usage from demos.
Supported models¶
Model | OnnxRuntime | TensorRT |
---|---|---|
Rotated RetinaNet | Y | Y |
Rotated FasterRCNN | Y | Y |
Oriented R-CNN | Y | Y |
Gliding Vertex | Y | Y |
RTMDET-R | Y | Y |
MMAction2 Deployment¶
MMAction2 is an open-source toolbox for video understanding based on PyTorch. It is a part of the OpenMMLab project.
Installation¶
Install mmaction2¶
Please follow the installation guide to install mmaction2.
Install mmdeploy¶
There are several methods to install mmdeploy, among which you can choose an appropriate one according to your target platform and device.
Method I: Install precompiled package
You can refer to get_started
Method II: Build using scripts
If your target platform is Ubuntu 18.04 or later version, we encourage you to run
scripts. For example, the following commands install mmdeploy as well as inference engine - ONNX Runtime
.
git clone --recursive -b main https://github.com/open-mmlab/mmdeploy.git
cd mmdeploy
python3 tools/scripts/build_ubuntu_x64_ort.py $(nproc)
export PYTHONPATH=$(pwd)/build/lib:$PYTHONPATH
export LD_LIBRARY_PATH=$(pwd)/../mmdeploy-dep/onnxruntime-linux-x64-1.8.1/lib/:$LD_LIBRARY_PATH
Method III: Build from source
If neither I nor II meets your requirements, building mmdeploy from source is the last option.
Convert model¶
You can use tools/deploy.py to convert mmaction2 models to the specified backend models. Its detailed usage can be learned from here.
When using tools/deploy.py
, it is crucial to specify the correct deployment config. We’ve already provided builtin deployment config files of all supported backends for mmaction2, under which the config file path follows the pattern:
{task}/{task}_{backend}-{precision}_{static | dynamic}_{shape}.py
where:
{task}: task in mmaction2.
{backend}: inference backend, such as onnxruntime, tensorrt, pplnn, ncnn, openvino, coreml etc.
{precision}: fp16, int8. When it’s empty, it means fp32
{static | dynamic}: static shape or dynamic shape
{shape}: input shape or shape range of a model
{2d/3d}: model type
In the next part, we will take the tsn model from the video recognition task as an example, showing how to convert it to an onnx model that can be inferred by ONNX Runtime.
Convert video recognition model¶
cd mmdeploy
# download tsn model from mmaction2 model zoo
mim download mmaction2 --config tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb --dest .
# convert mmaction2 model to onnxruntime model with dynamic shape
python tools/deploy.py \
configs/mmaction/video-recognition/video-recognition_2d_onnxruntime_static.py \
tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py \
tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb_20220906-cd10898e.pth \
tests/data/arm_wrestling.mp4 \
--work-dir mmdeploy_models/mmaction/tsn/ort \
--device cpu \
--show \
--dump-info
Model specification¶
Before moving on to model inference chapter, let’s know more about the converted model structure which is very important for model inference.
The converted model locates in the working directory like mmdeploy_models/mmaction/tsn/ort
in the previous example. It includes:
mmdeploy_models/mmaction/tsn/ort
├── deploy.json
├── detail.json
├── end2end.onnx
└── pipeline.json
in which,
end2end.onnx: backend model which can be inferred by ONNX Runtime
*.json: the necessary information for mmdeploy SDK
The whole package mmdeploy_models/mmaction/tsn/ort is defined as mmdeploy SDK model, i.e., mmdeploy SDK model includes both backend model and inference meta information.
Model Inference¶
Backend model inference¶
Take the previous converted end2end.onnx
mode of tsn
as an example, you can use the following code to inference the model and visualize the results.
from mmdeploy.apis.utils import build_task_processor
from mmdeploy.utils import get_input_shape, load_config
import numpy as np
import torch
deploy_cfg = 'configs/mmaction/video-recognition/video-recognition_2d_onnxruntime_static.py'
model_cfg = 'tsn_imagenet-pretrained-r50_8xb32-1x1x3-100e_kinetics400-rgb.py'
device = 'cpu'
backend_model = ['./mmdeploy_models/mmaction/tsn/ort/end2end.onnx']
image = 'tests/data/arm_wrestling.mp4'
# read deploy_cfg and model_cfg
deploy_cfg, model_cfg = load_config(deploy_cfg, model_cfg)
# build task and backend model
task_processor = build_task_processor(model_cfg, deploy_cfg, device)
model = task_processor.build_backend_model(backend_model)
# process input image
input_shape = get_input_shape(deploy_cfg)
model_inputs, _ = task_processor.create_input(image, input_shape)
# do model inference
with torch.no_grad():
result = model.test_step(model_inputs)
# show top5-results
pred_scores = result[0].pred_scores.item.tolist()
top_index = np.argsort(pred_scores)[::-1]
for i in range(5):
index = top_index[i]
print(index, pred_scores[index])
SDK model inference¶
Given the above SDK model of tsn
you can also perform SDK model inference like following,
Video recognition SDK model inference¶
from mmdeploy_runtime import VideoRecognizer
import cv2
# refer to demo/python/video_recognition.py
# def SampleFrames(cap, clip_len, frame_interval, num_clips):
# ...
cap = cv2.VideoCapture('tests/data/arm_wrestling.mp4')
clips, info = SampleFrames(cap, 1, 1, 25)
# create a recognizer
recognizer = VideoRecognizer(model_path='./mmdeploy_models/mmaction/tsn/ort', device_name='cpu', device_id=0)
# perform inference
result = recognizer(clips, info)
# show inference result
for label_id, score in result:
print(label_id, score)
Besides python API, mmdeploy SDK also provides other FFI (Foreign Function Interface), such as C, C++, C#, Java and so on. You can learn their usage from demos.
MMAction2 only has C, C++ and Python APIs for now.
Supported ncnn feature¶
The current use of the ncnn feature is as follows:
feature | windows | linux | mac | android |
---|---|---|---|---|
fp32 inference | ✔️ | ✔️ | ✔️ | ✔️ |
int8 model convert | - | ✔️ | ✔️ | - |
nchw layout | ✔️ | ✔️ | ✔️ | ✔️ |
Vulkan support | - | ✔️ | ✔️ | ✔️ |
The following features cannot be automatically enabled by mmdeploy; you need to manually modify the ncnn build options or adjust the running parameters in the SDK:
bf16 inference
nc4hw4 layout
Profiling per layer
Turn off NCNN_STRING to reduce .so file size
Set thread number and CPU affinity
onnxruntime Support¶
Introduction of ONNX Runtime¶
ONNX Runtime is a cross-platform inference and training accelerator compatible with many popular ML/DNN frameworks. Check its github for more information.
Installation¶
Please note that only onnxruntime>=1.8.1 on the Linux platform is supported for now.
Install ONNX Runtime python package¶
CPU Version
pip install onnxruntime==1.8.1 # if you want to use cpu version
GPU Version
pip install onnxruntime-gpu==1.8.1 # if you want to use gpu version
Install float16 conversion tool (optional)¶
If you want to use float16 precision, install the tool by running the following script:
pip install onnx onnxconverter-common
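As a rough sketch of what the tool is used for, the snippet below converts an exported ONNX model to float16 with the helper installed above. The input path is an example; keep_io_types leaves the graph inputs and outputs in float32 so callers do not need to change their code.
import onnx
from onnxconverter_common import float16

# load an exported fp32 ONNX model (path is an example)
model = onnx.load('mmdeploy_models/mmpretrain/ort/end2end.onnx')
# convert weights and ops to float16 while keeping fp32 graph inputs/outputs
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
onnx.save(model_fp16, 'end2end_fp16.onnx')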
Build custom ops¶
Download ONNXRuntime Library¶
Download onnxruntime-linux-*.tgz
library from ONNX Runtime releases, extract it, expose ONNXRUNTIME_DIR
and finally add the lib path to LD_LIBRARY_PATH
as below:
CPU Version
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-1.8.1.tgz
tar -zxvf onnxruntime-linux-x64-1.8.1.tgz
cd onnxruntime-linux-x64-1.8.1
export ONNXRUNTIME_DIR=$(pwd)
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
GPU Version
In X64 GPU:
wget https://github.com/microsoft/onnxruntime/releases/download/v1.8.1/onnxruntime-linux-x64-gpu-1.8.1.tgz
tar -zxvf onnxruntime-linux-x64-gpu-1.8.1.tgz
cd onnxruntime-linux-x64-gpu-1.8.1
export ONNXRUNTIME_DIR=$(pwd)
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
In Arm GPU:
# ONNX Runtime does not provide a 1.8.1 aarch64 package, so use 1.10.0 instead
wget https://github.com/microsoft/onnxruntime/releases/download/v1.10.0/onnxruntime-linux-aarch64-1.10.0.tgz
tar -zxvf onnxruntime-linux-aarch64-1.10.0.tgz
cd onnxruntime-linux-aarch64-1.10.0
export ONNXRUNTIME_DIR=$(pwd)
export LD_LIBRARY_PATH=$ONNXRUNTIME_DIR/lib:$LD_LIBRARY_PATH
You can also go to ONNX Runtime Release to find corresponding release version package.
Build on Linux¶
CPU Version
cd ${MMDEPLOY_DIR} # To MMDeploy root directory
mkdir -p build && cd build
cmake -DMMDEPLOY_TARGET_DEVICES='cpu' -DMMDEPLOY_TARGET_BACKENDS=ort -DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR} ..
make -j$(nproc) && make install
GPU Version
cd ${MMDEPLOY_DIR} # To MMDeploy root directory
mkdir -p build && cd build
cmake -DMMDEPLOY_TARGET_DEVICES='cuda' -DMMDEPLOY_TARGET_BACKENDS=ort -DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR} ..
make -j$(nproc) && make install
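After the custom ops are built, they can be registered in an ONNX Runtime session if you run the exported model with the plain onnxruntime Python API instead of the MMDeploy wrappers. Below is a minimal sketch; the custom-op library path and the model file name are assumptions that depend on your build and export.
# a minimal sketch; the custom-op library path and model file name are assumptions
import onnxruntime as ort

session_options = ort.SessionOptions()
# make MMDeploy's custom ops (e.g. grid_sampler) resolvable by the session
session_options.register_custom_ops_library('/path/to/libmmdeploy_onnxruntime_ops.so')

session = ort.InferenceSession('end2end.onnx', session_options,
                               providers=['CPUExecutionProvider'])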
How to convert a model¶
You can follow the instructions in the tutorial How to convert model.
How to add a new custom op¶
Reminder¶
The custom operator is not included in the supported operator list of ONNX Runtime.
The custom operator should be able to be exported to ONNX.
Main procedures¶
Take custom operator roi_align
for example.
Create a roi_align directory in the ONNX Runtime source directory ${MMDEPLOY_DIR}/csrc/backend_ops/onnxruntime/
Add header and source files into the roi_align directory ${MMDEPLOY_DIR}/csrc/backend_ops/onnxruntime/roi_align/
Add unit tests into tests/test_ops/test_ops.py. Check here for examples.
Finally, welcome to send us PR of adding custom operators for ONNX Runtime in MMDeploy. :nerd_face:
OpenVINO Support¶
This tutorial is based on Linux systems like Ubuntu-18.04.
Installation¶
It is recommended to create a virtual environment for the project.
Install python package¶
Install OpenVINO. It is recommended to use the installer or install using pip. Installation example using pip:
pip install openvino-dev[onnx]==2022.3.0
Download OpenVINO runtime for SDK (Optional)¶
If you want to use OpenVINO in the SDK, you need to install OpenVINO following the install_guides. Take openvino==2022.3.0 as an example:
wget https://storage.openvinotoolkit.org/repositories/openvino/packages/2022.3/linux/l_openvino_toolkit_ubuntu20_2022.3.0.9052.9752fafe8eb_x86_64.tgz
tar xzf ./l_openvino_toolkit*.tgz
cd l_openvino*
export InferenceEngine_DIR=$(pwd)/runtime/cmake
bash ./install_dependencies/install_openvino_dependencies.sh
Build mmdeploy SDK with OpenVINO (Optional)¶
Install MMDeploy following the instructions.
cd ${MMDEPLOY_DIR} # To MMDeploy root directory
mkdir -p build && cd build
cmake -DMMDEPLOY_TARGET_DEVICES='cpu' -DMMDEPLOY_TARGET_BACKENDS=openvino -DInferenceEngine_DIR=${InferenceEngine_DIR} ..
make -j$(nproc) && make install
To work with models from MMDetection, you may need to install it additionally.
Usage¶
You can follow the instructions in the tutorial How to convert model.
Example:
python tools/deploy.py \
configs/mmdet/detection/detection_openvino_static-300x300.py \
/mmdetection_dir/mmdetection/configs/ssd/ssd300_coco.py \
/tmp/snapshots/ssd300_coco_20210803_015428-d231a06e.pth \
tests/data/tiger.jpeg \
--work-dir ../deploy_result \
--device cpu \
--log-level INFO
List of supported models exportable to OpenVINO from MMDetection¶
The table below lists the models that are guaranteed to be exportable to OpenVINO from MMDetection.
Model name | Config | Dynamic Shape |
---|---|---|
ATSS | configs/atss/atss_r50_fpn_1x_coco.py | Y |
Cascade Mask R-CNN | configs/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco.py | Y |
Cascade R-CNN | configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.py | Y |
Faster R-CNN | configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py | Y |
FCOS | configs/fcos/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_4x2_2x_coco.py | Y |
FoveaBox | configs/foveabox/fovea_r50_fpn_4x4_1x_coco.py | Y |
FSAF | configs/fsaf/fsaf_r50_fpn_1x_coco.py | Y |
Mask R-CNN | configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py | Y |
RetinaNet | configs/retinanet/retinanet_r50_fpn_1x_coco.py | Y |
SSD | configs/ssd/ssd300_coco.py | Y |
YOLOv3 | configs/yolo/yolov3_d53_mstrain-608_273e_coco.py | Y |
YOLOX | configs/yolox/yolox_tiny_8x8_300e_coco.py | Y |
Faster R-CNN + DCN | configs/dcn/faster_rcnn_r50_fpn_dconv_c3-c5_1x_coco.py | Y |
VFNet | configs/vfnet/vfnet_r50_fpn_1x_coco.py | Y |
Notes:
Custom operations from OpenVINO use the domain org.openvinotoolkit.
For faster work in OpenVINO in the Faster-RCNN, Mask-RCNN, Cascade-RCNN and Cascade-Mask-RCNN models, the RoiAlign operation is replaced with the ExperimentalDetectronROIFeatureExtractor operation in the ONNX graph.
Models “VFNet” and “Faster R-CNN + DCN” use the custom “DeformableConv2D” operation.
Deployment config¶
With the deployment config, you can specify additional options for the Model Optimizer.
To do this, add the necessary parameters to the backend_config.mo_options in the fields args (for parameters with values) and flags (for flags).
Example:
backend_config = dict(
mo_options=dict(
args=dict({
'--mean_values': [0, 0, 0],
'--scale_values': [255, 255, 255],
'--data_type': 'FP32',
}),
flags=['--disable_fusing'],
)
)
Information about the possible parameters for the Model Optimizer can be found in the documentation.
Troubleshooting¶
ImportError: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory
To resolve missing external dependency on Ubuntu*, execute the following command:
sudo apt-get install libpython3.7
PPLNN Support¶
MMDeploy supports ppl.nn v0.8.1 and later. This tutorial is based on Linux systems like Ubuntu-18.04.
Installation¶
Please install pyppl following install-guide.
Install MMDeploy following the instructions.
Usage¶
Example:
python tools/deploy.py \
configs/mmdet/detection/detection_pplnn_dynamic-800x1344.py \
/mmdetection_dir/mmdetection/configs/retinanet/retinanet_r50_fpn_1x_coco.py \
/tmp/snapshots/retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth \
tests/data/tiger.jpeg \
--work-dir ../deploy_result \
--device cuda \
--log-level INFO
SNPE feature support¶
Currently mmdeploy integrates the onnx2dlc model conversion and SDK inference, but the following features are not yet supported:
GPU_FP16 mode
DSP/AIP quantization
Operator internal profiling
UDO operator
TensorRT Support¶
Installation¶
Install TensorRT¶
Please install TensorRT 8 following the install-guide.
Note:
pip Wheel File Installation is not supported yet in this repo. We strongly suggest you install TensorRT through the tar file.
After installation, you'd better add the TensorRT environment variables to bashrc by:
cd ${TENSORRT_DIR} # To TensorRT root directory
echo '# set env for TensorRT' >> ~/.bashrc
echo "export TENSORRT_DIR=${TENSORRT_DIR}" >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$TENSORRT_DIR/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
Build custom ops¶
Some custom ops are created to support models in OpenMMLab, and the custom ops can be built as follow:
cd ${MMDEPLOY_DIR} # To MMDeploy root directory
mkdir -p build && cd build
cmake -DMMDEPLOY_TARGET_BACKENDS=trt ..
make -j$(nproc)
If you haven't installed TensorRT in the default path, please add the -DTENSORRT_DIR flag in CMake:
cmake -DMMDEPLOY_TARGET_BACKENDS=trt -DTENSORRT_DIR=${TENSORRT_DIR} ..
make -j$(nproc) && make install
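If you later build or run TensorRT engines containing these custom ops outside of the MMDeploy wrappers, the plugin library has to be loaded into the process first so TensorRT can find the plugin creators. A minimal sketch follows; the plugin library name and path are assumptions that depend on your build output.
# a minimal sketch; the plugin library path is an assumption for your build
import ctypes
import tensorrt as trt

# loading the shared library registers MMDeploy's TensorRT plugins
ctypes.CDLL('/path/to/libmmdeploy_tensorrt_ops.so')
# also register TensorRT's built-in plugins
trt.init_libnvinfer_plugins(trt.Logger(trt.Logger.INFO), '')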
Convert model¶
Please follow the tutorial in How to convert model. Note that the device must be a cuda device.
Int8 Support¶
Since TensorRT supports INT8 mode, a custom dataset config can be given to calibrate the model. Following is an example for MMDetection:
# calibration_dataset.py
# dataset settings, same format as the codebase in OpenMMLab
dataset_type = 'CalibrationDataset'
data_root = 'calibration/dataset/root'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
val=dict(
type=dataset_type,
ann_file=data_root + 'val_annotations.json',
pipeline=test_pipeline),
test=dict(
type=dataset_type,
ann_file=data_root + 'test_annotations.json',
pipeline=test_pipeline))
evaluation = dict(interval=1, metric='bbox')
Convert your model with this calibration dataset:
python tools/deploy.py \
...
--calib-dataset-cfg calibration_dataset.py
If the calibration dataset is not given, the data will be calibrated with the dataset in model config.
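For reference, the INT8 mode itself is switched on in the TensorRT backend config of the deployment config. The sketch below shows the relevant fields; treat the exact field names (in particular int8_mode) as assumptions to be checked against the TensorRT backend configs shipped with your MMDeploy version.
# a sketch of TensorRT INT8 settings in a deployment config; field names are assumptions
backend_config = dict(
    type='tensorrt',
    common_config=dict(
        fp16_mode=True,       # keep FP16 fallback enabled during calibration
        int8_mode=True,       # assumed switch that enables INT8 calibration/building
        max_workspace_size=1 << 30))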
FAQs¶
Error: Cannot found TensorRT headers or Cannot found TensorRT libs
Try cmake with the -DTENSORRT_DIR flag:
cmake -DBUILD_TENSORRT_OPS=ON -DTENSORRT_DIR=${TENSORRT_DIR} ..
make -j$(nproc)
Please make sure there are libs and headers in ${TENSORRT_DIR}.
Error: error: parameter check failed at: engine.cpp::setBindingDimensions::1046, condition: profileMinDims.d[i] <= dimensions.d[i]
There is an input shape limit in deployment config:
backend_config = dict(
    # other configs
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 320, 320],
                    opt_shape=[1, 3, 800, 1344],
                    max_shape=[1, 3, 1344, 1344])))
    ])
# other configs
The shape of the tensor input must be limited between input_shapes["input"]["min_shape"] and input_shapes["input"]["max_shape"].
Error: error: [TensorRT] INTERNAL ERROR: Assertion failed: cublasStatus == CUBLAS_STATUS_SUCCESS
TRT 7.2.1 switches to use cuBLASLt (previously it was cuBLAS). cuBLASLt is the default choice for SM version >= 7.0. However, you may need CUDA-10.2 Patch 1 (Released Aug 26, 2020) to resolve some cuBLASLt issues. Another option is to use the new TacticSource API and disable cuBLASLt tactics if you don’t want to upgrade.
Read this for detail.
Install mmdeploy on Jetson¶
We provide a tutorial to get started on Jetsons here.
TorchScript support¶
Introduction of TorchScript¶
TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process where there is no Python dependency. Check the Introduction to TorchScript for more details.
Build custom ops¶
Prerequisite¶
Download libtorch from the official website here.
Please note that only the Pre-cxx11 ABI and version 1.8.1+ on the Linux platform are supported for now.
For previous versions of libtorch, users can find them through the issue comment. Take libtorch 1.8.1+cu111 as an example: extract it, expose Torch_DIR and add the lib path to LD_LIBRARY_PATH as below:
wget https://download.pytorch.org/libtorch/cu111/libtorch-shared-with-deps-1.8.1%2Bcu111.zip
unzip libtorch-shared-with-deps-1.8.1+cu111.zip
cd libtorch
export Torch_DIR=$(pwd)
export LD_LIBRARY_PATH=$Torch_DIR/lib:$LD_LIBRARY_PATH
Note:
If you want to save libtorch env variables to bashrc, you could run
echo '# set env for libtorch' >> ~/.bashrc
echo "export Torch_DIR=${Torch_DIR}" >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$Torch_DIR/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
Build on Linux¶
cd ${MMDEPLOY_DIR} # To MMDeploy root directory
mkdir -p build && cd build
cmake -DMMDEPLOY_TARGET_BACKENDS=torchscript -DTorch_DIR=${Torch_DIR} ..
make -j$(nproc) && make install
How to convert a model¶
You can follow the instructions in the tutorial How to convert model.
SDK backend¶
The TorchScript SDK backend may be built by passing -DMMDEPLOY_TORCHSCRIPT_SDK_BACKEND=ON to cmake.
Notice that libtorch is sensitive to C++ ABI versions. On platforms defaulted to C++11 ABI (e.g. Ubuntu 16+), one may pass -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" to cmake to use the pre-C++11 ABI for building. In this case, all dependencies with ABI-sensitive interfaces (e.g. OpenCV) must be built with the pre-C++11 ABI.
FAQs¶
Error:
projects/thirdparty/libtorch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message): Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN libraries. Please set the proper cuDNN prefixes and / or install cuDNN.
You may export CUDNN_ROOT=/root/path/to/cudnn to resolve the build error.
Supported RKNN feature¶
Currently, MMDeploy only tests rk3588 and rv1126 on the Linux platform.
The following features cannot be automatically enabled by mmdeploy and you need to manually modify the configuration in MMDeploy like here.
target_platform other than default
quantization settings
optimization level other than 1
TVM feature support¶
MMDeploy has integrated TVM for model conversion and SDK. Features include:
AutoTVM tuner
Ansor tuner
Graph Executor runtime
Virtual machine runtime
Core ML feature support¶
MMDeploy supports converting PyTorch models to Core ML and running inference with the Core ML backend.
Installation¶
To convert models in mmdet, you need to compile libtorch to support custom operators such as nms (only needed at the conversion stage). For macOS 12 users, please install PyTorch 1.8.0; for macOS 13 users, please install PyTorch 2.0.0+.
cd ${PYTORCH_DIR}
mkdir build && cd build
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DPYTHON_EXECUTABLE=`which python` \
-DCMAKE_INSTALL_PREFIX=install \
-DDISABLE_SVE=ON
make install
Usage¶
python tools/deploy.py \
configs/mmdet/detection/detection_coreml_static-800x1344.py \
/mmdetection_dir/configs/retinanet/retinanet_r18_fpn_1x_coco.py \
/checkpoint/retinanet_r18_fpn_1x_coco_20220407_171055-614fd399.pth \
/mmdetection_dir/demo/demo.jpg \
--work-dir work_dir/retinanet \
--device cpu \
--dump-info
ONNX Runtime Ops¶
grid_sampler¶
Description¶
Perform sample from input
with pixel locations from grid
.
Parameters¶
Type | Parameter | Description |
---|---|---|
int | interpolation_mode | Interpolation mode to calculate output values. (0: bilinear, 1: nearest) |
int | padding_mode | Padding mode for outside grid values. (0: zeros, 1: border, 2: reflection) |
int | align_corners | If align_corners=1, the extrema (-1 and 1) are considered as referring to the center points of the input's corner pixels. If align_corners=0, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
Inputs¶
- input: T
- Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the numbers of channels, inH and inW are the height and width of the data.
- grid: T
- Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of offset and output.
Outputs¶
- output: T
- Output feature; 4-D tensor of shape (N, C, outH, outW).
Type Constraints¶
T:tensor(float32, Linear)
MMCVModulatedDeformConv2d¶
Description¶
Perform Modulated Deformable Convolution on input feature, read Deformable ConvNets v2: More Deformable, Better Results for detail.
Parameters¶
Type | Parameter | Description |
---|---|---|
list of ints | stride | The stride of the convolving kernel. (sH, sW) |
list of ints | padding | Paddings on both sides of the input. (padH, padW) |
list of ints | dilation | The spacing between kernel elements. (dH, dW) |
int | deformable_groups | Groups of deformable offset. |
int | groups | Split input into groups. input_channel should be divisible by the number of groups. |
Inputs¶
- inputs[0]: T
- Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.
- inputs[1]: T
- Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.
- inputs[2]: T
- Input mask; 4-D tensor of shape (N, deformable_group* kH* kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.
- inputs[3]: T
- Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).
- inputs[4]: T, optional
- Input bias; 1-D tensor of shape (output_channel).
Outputs¶
- outputs[0]: T
- Output feature; 4-D tensor of shape (N, output_channel, outH, outW).
Type Constraints¶
T:tensor(float32, Linear)
NMSRotated¶
Description¶
Non Max Suppression for rotated bboxes.
Parameters¶
Type | Parameter | Description |
---|---|---|
float | iou_threshold | The IoU threshold for NMS. |
Inputs¶
- inputs[0]: T
- Input feature; 2-D tensor of shape (N, 5), where N is the number of rotated bboxes.
- inputs[1]: T
- Input offset; 1-D tensor of shape (N, ), where N is the number of rotated bboxes.
Outputs¶
- outputs[0]: T
- Output feature; 1-D tensor of shape (K, ), where K is the number of keep bboxes.
Type Constraints¶
T:tensor(float32, Linear)
RoIAlignRotated¶
Description¶
Perform RoIAlignRotated on output feature, used in bbox_head of most two-stage rotated object detectors.
Parameters¶
Type | Parameter | Description |
---|---|---|
int | output_height | height of output roi |
int | output_width | width of output roi |
float | spatial_scale | used to scale the input boxes |
int | sampling_ratio | number of input samples to take for each output sample. 0 means to take samples densely for current models. |
int | aligned | If aligned=0, use the legacy implementation in MMDetection. Else, align the results more perfectly. |
int | clockwise | If True, the angle in each proposal follows a clockwise fashion in image space, otherwise, the angle is counterclockwise. Default: False. |
Inputs¶
- input: T
- Input feature map; 4D tensor of shape (N, C, H, W), where N is the batch size, C is the numbers of channels, H and W are the height and width of the data.
- rois: T
- RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 6) given as [[batch_index, cx, cy, w, h, theta], ...]. The RoIs' coordinates are the coordinate system of input.
Outputs¶
- feat: T
- RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element feat[r-1] is a pooled feature map corresponding to the r-th RoI RoIs[r-1].
Type Constraints¶
T:tensor(float32)
TensorRT Ops¶
TRTBatchedNMS¶
Description¶
Batched NMS with a fixed number of output bounding boxes.
Parameters¶
Type | Parameter | Description |
---|---|---|
int | background_label_id | The label ID for the background class. If there is no background class, set it to -1. |
int | num_classes | The number of classes. |
int | topK | The number of bounding boxes to be fed into the NMS step. |
int | keepTopK | The number of total bounding boxes to be kept per-image after the NMS step. Should be less than or equal to the topK value. |
float | scoreThreshold | The scalar threshold for score (low scoring boxes are removed). |
float | iouThreshold | The scalar threshold for IoU (new boxes that have high IoU overlap with previously selected boxes are removed). |
int | isNormalized | Set to false if the box coordinates are not normalized, meaning they are not in the range [0,1]. Defaults to true. |
int | clipBoxes | Forcibly restrict bounding boxes to the normalized range [0,1]. Only applicable if isNormalized is also true. Defaults to true. |
Inputs¶
- inputs[0]: T
- boxes; 4-D tensor of shape (N, num_boxes, num_classes, 4), where N is the batch size; `num_boxes` is the number of boxes; `num_classes` is the number of classes, which could be 1 if the boxes are shared between all classes.
- inputs[1]: T
- scores; 4-D tensor of shape (N, num_boxes, 1, num_classes).
Outputs¶
- outputs[0]: T
- dets; 3-D tensor of shape (N, valid_num_boxes, 5), `valid_num_boxes` is the number of boxes after NMS. For each row `dets[i,j,:] = [x0, y0, x1, y1, score]`
- outputs[1]: tensor(int32, Linear)
- labels; 2-D tensor of shape (N, valid_num_boxes).
Type Constraints¶
T:tensor(float32, Linear)
grid_sampler¶
Description¶
Perform sample from input
with pixel locations from grid
.
Parameters¶
Type | Parameter | Description |
---|---|---|
int | interpolation_mode | Interpolation mode to calculate output values. (0: bilinear, 1: nearest) |
int | padding_mode | Padding mode for outside grid values. (0: zeros, 1: border, 2: reflection) |
int | align_corners | If align_corners=1, the extrema (-1 and 1) are considered as referring to the center points of the input's corner pixels. If align_corners=0, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
Inputs¶
- inputs[0]: T
- Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the numbers of channels, inH and inW are the height and width of the data.
- inputs[1]: T
- Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of offset and output.
Outputs¶
- outputs[0]: T
- Output feature; 4-D tensor of shape (N, C, outH, outW).
Type Constraints¶
T:tensor(float32, Linear)
MMCVInstanceNormalization¶
Description¶
Carry out instance normalization as described in the paper https://arxiv.org/abs/1607.08022.
y = scale * (x - mean) / sqrt(variance + epsilon) + B, where mean and variance are computed per instance per channel.
Parameters¶
Type | Parameter | Description |
---|---|---|
float | epsilon | The epsilon value to use to avoid division by zero. Default is 1e-05. |
Inputs¶
- input: T
- Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 ... Dn), where N is the batch size.
- scale: T
- The input 1-dimensional scale tensor of size C.
- B: T
- The input 1-dimensional bias tensor of size C.
Outputs¶
- output: T
- The output tensor of the same shape as input.
Type Constraints¶
T:tensor(float32, Linear)
MMCVModulatedDeformConv2d¶
Description¶
Perform Modulated Deformable Convolution on input feature. Read Deformable ConvNets v2: More Deformable, Better Results for detail.
Parameters¶
Type | Parameter | Description |
---|---|---|
list of ints | stride | The stride of the convolving kernel. (sH, sW) |
list of ints | padding | Paddings on both sides of the input. (padH, padW) |
list of ints | dilation | The spacing between kernel elements. (dH, dW) |
int | deformable_group | Groups of deformable offset. |
int | group | Split input into groups. input_channel should be divisible by the number of groups. |
Inputs¶
- inputs[0]: T
- Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.
- inputs[1]: T
- Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.
- inputs[2]: T
- Input mask; 4-D tensor of shape (N, deformable_group* kH* kW, outH, outW), where kH and kW are the height and width of weight, outH and outW are the height and width of offset and output.
- inputs[3]: T
- Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).
- inputs[4]: T, optional
- Input bias; 1-D tensor of shape (output_channel).
Outputs¶
- outputs[0]: T
- Output feature; 4-D tensor of shape (N, output_channel, outH, outW).
Type Constraints¶
T:tensor(float32, Linear)
MMCVMultiLevelRoiAlign¶
Description¶
Perform RoIAlign on features from multiple levels. Used in bbox_head of most two-stage detectors.
Parameters¶
Type | Parameter | Description |
---|---|---|
int | output_height | height of output roi. |
int | output_width | width of output roi. |
list of floats | featmap_strides | feature map stride of each level. |
int | sampling_ratio | number of input samples to take for each output sample. 0 means to take samples densely for current models. |
float | roi_scale_factor | RoIs will be scaled by this factor before RoI Align. |
int | finest_scale | Scale threshold of mapping to level 0. Default: 56. |
int | aligned | If aligned=0, use the legacy implementation in MMDetection. Else, align the results more perfectly. |
Inputs¶
Outputs¶
- outputs[0]: T
- RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element output[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].
Type Constraints¶
T:tensor(float32, Linear)
MMCVRoIAlign¶
Description¶
Perform RoIAlign on output feature, used in bbox_head of most two-stage detectors.
Parameters¶
Type | Parameter | Description |
---|---|---|
int | output_height | height of output roi |
int | output_width | width of output roi |
float | spatial_scale | used to scale the input boxes |
int | sampling_ratio | number of input samples to take for each output sample. 0 means to take samples densely for current models. |
str | mode | pooling mode in each bin. avg or max |
int | aligned | If aligned=0, use the legacy implementation in MMDetection. Else, align the results more perfectly. |
Inputs¶
- inputs[0]: T
- Input feature map; 4D tensor of shape (N, C, H, W), where N is the batch size, C is the numbers of channels, H and W are the height and width of the data.
- inputs[1]: T
- RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of inputs[0].
Outputs¶
- outputs[0]: T
- RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element output[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].
Type Constraints¶
T:tensor(float32, Linear)
ScatterND¶
Description¶
ScatterND takes three inputs data
tensor of rank r >= 1, indices
tensor of rank q >= 1, and updates
tensor of rank q + r - indices.shape[-1] - 1. The output of the operation is produced by creating a copy of the input data
, and then updating its value to values specified by updates at specific index positions specified by indices
. Its output shape is the same as the shape of data
. Note that indices
should not have duplicate entries. That is, two or more updates for the same index-location is not supported.
The output
is calculated via the following equation:
output = np.copy(data)
update_indices = indices.shape[:-1]
for idx in np.ndindex(update_indices):
output[indices[idx]] = updates[idx]
Parameters¶
None
Inputs¶
- inputs[0]: T
- Tensor of rank r>=1.
- inputs[1]: tensor(int32, Linear)
- Tensor of rank q>=1.
- inputs[2]: T
- Tensor of rank q + r - indices_shape[-1] - 1.
Outputs¶
- outputs[0]: T
- Tensor of rank r >= 1.
Type Constraints¶
T:tensor(float32, Linear), tensor(int32, Linear)
TRTBatchedRotatedNMS¶
Description¶
Batched rotated NMS with a fixed number of output bounding boxes.
Parameters¶
Type | Parameter | Description |
---|---|---|
int | background_label_id | The label ID for the background class. If there is no background class, set it to -1. |
int | num_classes | The number of classes. |
int | topK | The number of bounding boxes to be fed into the NMS step. |
int | keepTopK | The number of total bounding boxes to be kept per-image after the NMS step. Should be less than or equal to the topK value. |
float | scoreThreshold | The scalar threshold for score (low scoring boxes are removed). |
float | iouThreshold | The scalar threshold for IoU (new boxes that have high IoU overlap with previously selected boxes are removed). |
int | isNormalized | Set to false if the box coordinates are not normalized, meaning they are not in the range [0,1]. Defaults to true. |
int | clipBoxes | Forcibly restrict bounding boxes to the normalized range [0,1]. Only applicable if isNormalized is also true. Defaults to true. |
Inputs¶
- inputs[0]: T
- boxes; 4-D tensor of shape (N, num_boxes, num_classes, 5), where N is the batch size; `num_boxes` is the number of boxes; `num_classes` is the number of classes, which could be 1 if the boxes are shared between all classes.
- inputs[1]: T
- scores; 4-D tensor of shape (N, num_boxes, 1, num_classes).
Outputs¶
- outputs[0]: T
- dets; 3-D tensor of shape (N, valid_num_boxes, 6), `valid_num_boxes` is the number of boxes after NMS. For each row `dets[i,j,:] = [x0, y0, width, height, theta, score]`
- outputs[1]: tensor(int32, Linear)
- labels; 2-D tensor of shape (N, valid_num_boxes).
Type Constraints¶
T:tensor(float32, Linear)
GridPriorsTRT¶
Description¶
Generate the anchors for object detection task.
Parameters¶
Type | Parameter | Description |
---|---|---|
int | stride_w | The stride of the feature width. |
int | stride_h | The stride of the feature height. |
Inputs¶
- inputs[0]: T
- The base anchors; 2-D tensor with shape [num_base_anchor, 4].
- inputs[1]: TAny
- height provider; 1-D tensor with shape [featmap_height]. The data will never be used.
- inputs[2]: TAny
- width provider; 1-D tensor with shape [featmap_width]. The data will never be used.
Outputs¶
- outputs[0]: T
- output anchors; 2-D tensor of shape (num_base_anchor*featmap_height*featmap_width, 4).
Type Constraints¶
T:tensor(float32, Linear)
TAny: Any
ScaledDotProductAttentionTRT¶
Description¶
Dot product attention used to support multihead attention, read Attention Is All You Need for more detail.
Parameters¶
None
Inputs¶
- inputs[0]: T
- query; 3-D tensor with shape [batch_size, sequence_length, embedding_size].
- inputs[1]: T
- key; 3-D tensor with shape [batch_size, sequence_length, embedding_size].
- inputs[2]: T
- value; 3-D tensor with shape [batch_size, sequence_length, embedding_size].
- inputs[3]: T
- mask; 2-D/3-D tensor with shape [sequence_length, sequence_length] or [batch_size, sequence_length, sequence_length]. optional.
Outputs¶
- outputs[0]: T
- 3-D tensor of shape [batch_size, sequence_length, embedding_size]. `softmax(q@k.T)@v`
- outputs[1]: T
- 3-D tensor of shape [batch_size, sequence_length, sequence_length]. `softmax(q@k.T)`
Type Constraints¶
T:tensor(float32, Linear)
GatherTopk¶
Description¶
TensorRT 8.2~8.4 would give unexpected results for multi-index gather such as data[batch_index, bbox_index, ...]. Read this for more details.
Parameters¶
None
Inputs¶
- inputs[0]: T
- Tensor to be gathered, with shape (A0, ..., An, G0, C0, ...).
- inputs[1]: tensor(int32, Linear)
- Tensor of index. with shape (A0, ..., An, G1)
Outputs¶
- outputs[0]: T
- Tensor of output. With shape (A0, ..., An, G1, C0, ...)
Type Constraints¶
T:tensor(float32, Linear), tensor(int32, Linear)
MMCVMultiScaleDeformableAttention¶
Description¶
Perform attention computation over a small set of key sampling points around a reference point rather than looking over all possible spatial locations. Read Deformable DETR: Deformable Transformers for End-to-End Object Detection for detail.
Parameters¶
None
Inputs¶
- inputs[0]: T
- Input feature; 4-D tensor of shape (N, S, M, D), where N is the batch size, S is the length of feature maps, M is the number of attention heads, and D is hidden_dim.
- inputs[1]: T
- Input offset; 2-D tensor of shape (L, 2), L is the number of feature maps, `2` is shape of feature maps.
- inputs[2]: T
- Input mask; 1-D tensor of shape (L, ), this tensor is used to find the sampling locations for different feature levels as the input feature tensors are flattened.
- inputs[3]: T
- Input weight; 6-D tensor of shape (N, Lq, M, L, P, 2). Lq is the length of feature maps(encoder)/length of queries(decoder), P is the number of points
- inputs[4]: T, optional
- Input weight; 5-D tensor of shape (N, Lq, M, L, P).
Outputs¶
- outputs[0]: T
- Output feature; 3-D tensor of shape (N, Lq, M*D).
Type Constraints¶
T:tensor(float32, Linear)
ncnn Ops¶
Expand¶
Description¶
Broadcast the input blob following the given shape and the broadcast rule of ncnn.
Parameters¶
Expand has no parameters.
Inputs¶
- inputs[0]: ncnn.Mat
- bottom_blobs[0]; An ncnn.Mat of input data.
- inputs[1]: ncnn.Mat
- bottom_blobs[1]; An 1-dim ncnn.Mat. A valid shape of ncnn.Mat.
Outputs¶
- outputs[0]: T
- top_blob; The ncnn.Mat blob expanded by the given shape and the broadcast rule of ncnn.
Type Constraints¶
ncnn.Mat: Mat(float32)
Gather¶
Description¶
Given the data and indices blobs, gather entries of the axis dimension of data indexed by indices.
Parameters¶
Type | Parameter | Description |
---|---|---|
int | axis | Which axis to gather on. Default is 0. |
Inputs¶
- inputs[0]: ncnn.Mat
- bottom_blobs[0]; An ncnn.Mat of input data.
- inputs[1]: ncnn.Mat
- bottom_blobs[1]; An 1-dim ncnn.Mat of indices on given axis.
Outputs¶
- outputs[0]: T
- top_blob; The ncnn.Mat blob gathered from the given data and indices blobs.
Type Constraints¶
ncnn.Mat: Mat(float32)
Shape¶
Description¶
Get the shape of the ncnn blobs.
Parameters¶
Shape has no parameters.
Inputs¶
- inputs[0]: ncnn.Mat
- bottom_blob; An ncnn.Mat of input data.
Outputs¶
- outputs[0]: T
- top_blob; 1-D ncnn.Mat of shape (bottom_blob.dims,), `bottom_blob.dims` is the input blob dimensions.
Type Constraints¶
ncnn.Mat: Mat(float32)
TopK¶
Description¶
Get the indices and value(optional) of largest or smallest k data among the axis. This op will map to onnx op TopK
, ArgMax
, and ArgMin
.
Parameters¶
Type | Parameter | Description |
---|---|---|
int | axis | The axis of data which topk calculate on. Default is -1, indicates the last dimension. |
int | largest | The binary value which indicates the TopK operator selects the largest or smallest K values. Default is 1, the TopK selects the largest K values. |
int | sorted | The binary value of whether returning sorted topk value or not. If not, the topk returns topk values in any order. Default is 1, this operator returns sorted topk values. |
int | keep_dims | The binary value of whether keep the reduced dimension or not. Default is 1, each output blob has the same dimension as input blob. |
Inputs¶
- inputs[0]: ncnn.Mat
- bottom_blob[0]; An ncnn.Mat of input data.
- inputs[1] (optional): ncnn.Mat
- bottom_blob[1]; An optional ncnn.Mat. A blob of K in TopK. If this blob not exist, K is 1.
Outputs¶
- outputs[0]: T
- top_blob[0]; If outputs has only 1 blob, outputs[0] is the indice blob of topk, if outputs has 2 blobs, outputs[0] is the value blob of topk. This blob is ncnn.Mat format with the shape of bottom_blob[0] or reduced shape of bottom_blob[0].
- outputs[1]: T
- top_blob[1] (optional); If outputs has 2 blobs, outputs[1] is the value blob of topk. This blob is ncnn.Mat format with the shape of bottom_blob[0] or reduced shape of bottom_blob[0].
Type Constraints¶
ncnn.Mat: Mat(float32)
mmdeploy Architecture¶
This article mainly introduces the functions of each directory of mmdeploy and how it works from model conversion to real inference.
Take a general look at the directory structure¶
The entire mmdeploy can be seen as two independent parts: model conversion and SDK.
We introduce the directory structure and the function of each part of the entire repo; you don't need to study the source code, just get an impression.
Peripheral directory features:
$ cd /path/to/mmdeploy
$ tree -L 1
.
├── CMakeLists.txt # Compile custom operator and cmake configuration of SDK
├── configs # Algorithm library configuration for model conversion
├── csrc # SDK and custom operator
├── demo # FFI interface examples in various languages, such as csharp, java, python, etc.
├── docker # docker build
├── mmdeploy # python package for model conversion
├── requirements # python requirements
├── service # Some small boards do not support Python, so we use a C/S mode for model conversion; here is the server code
├── tests # unittest
├── third_party # 3rd party dependencies required by SDK and FFI
└── tools # Tools are also the entrance to all functions, such as onnx2xx.py, profiler.py, test.py, etc.
It should be clear that:
Model conversion mainly depends on tools, mmdeploy and a small part of the csrc directory.
The SDK consists of three directories: csrc, third_party and demo.
Model Conversion¶
Here we take ViT of mmpretrain as the model example and ncnn as the inference backend example. Other models and backends are similar.
Let’s take a look at the mmdeploy/mmdeploy directory structure and get an impression:
.
├── apis # The api used by tools is implemented here, such as onnx2ncnn.py
│ ├── calibration.py # trt dedicated collection of quantitative data
│ ├── core # Software infrastructure
│ ├── extract_model.py # Use it to export part of onnx
│ ├── inference.py # Abstract function, which will actually call torch/ncnn specific inference
│ ├── ncnn # ncnn Wrapper
│ └── visualize.py # Still an abstract function, which will actually call torch/ncnn specific inference and visualize
..
├── backend # Backend wrapper
│ ├── base # Because there are multiple backends, there must be an OO design for the base class
│ ├── ncnn # This calls the ncnn python interface for model conversion
│ │ ├── init_plugins.py # Find the path of ncnn custom operators and ncnn tools
│ │ ├── onnx2ncnn.py # Wrap `mmdeploy_onnx2ncnn` into a python interface
│ │ ├── quant.py # Wrap `ncnn2int8` as a python interface
│ │ └── wrapper.py # Wrap pyncnn forward API
..
├── codebase # Algorithm rewriter
│ ├── base # There are multiple algorithms here that we need a bit of OO design
│ ├── mmpretrain # mmpretrain related model rewrite
│ │ ├── deploy # mmpretrain implementation of base abstract task/model/codebase
│ │ └── models # Real model rewrite
│ │ ├── backbones # Rewrites of backbone network parts, such as multiheadattention
│ │ ├── heads # Such as MultiLabelClsHead
│ │ ├── necks # Such as GlobalAveragePooling
│..
├── core # Software infrastructure of rewrite mechanism
├── mmcv # Rewrite mmcv
├── pytorch # Rewrite pytorch operator for ncnn, such as Gemm
..
Each line above needs to be read; don't skip it.
When typing tools/deploy.py to convert ViT, these are the 3 things that happen:
Rewrite the forward of mmpretrain ViT
ncnn does not support the gather operator, so customize it and load it with libncnn.so
Run the exported ncnn model with real inference, render the output, and make sure the result is correct
1. Rewrite forward¶
Because when exporting ViT to onnx, it generates some operators that ncnn doesn’t support perfectly, mmdeploy’s solution is to hijack the forward code and change it. The output onnx is suitable for ncnn.
For example, rewrite the process of conv -> shape -> concat_const -> reshape to conv -> reshape to trim off the redundant shape and concat operator.
All mmpretrain algorithm rewriters are in the mmdeploy/codebase/mmpretrain/models directory.
2. Custom Operator¶
Operators customized for ncnn are in the csrc/mmdeploy/backend_ops/ncnn/ directory, and are loaded together with libncnn.so after compilation. The essence is to hotfix ncnn; it currently implements these operators:
topk
tensorslice
shape
gather
expand
constantofshape
3. Model Conversion and testing¶
We first use the modified mmdeploy_onnx2ncnn to convert the model, then run inference with pyncnn and the custom ops.
When encountering a framework such as snpe that does not support python well, we use C/S mode: wrap a server with protocols such as gRPC, and forward the real inference output.
For Rendering, mmdeploy directly uses the rendering API of upstream algorithm codebase.
SDK¶
After the model conversion completed, the SDK compiled with C++ can be used to execute on different platforms.
Let’s take a look at the csrc/mmdeploy directory structure:
.
├── apis # csharp, java, go, Rust and other FFI interfaces
├── backend_ops # Custom operators for each inference framework
├── CMakeLists.txt
├── codebase # The type of results preferred by each algorithm framework, such as multi-use bbox for detection task
├── core # Abstraction of graph, operator, device and so on
├── device # Implementation of CPU/GPU device abstraction
├── execution # Implementation of the execution abstraction
├── graph # Implementation of graph abstraction
├── model # Implement both zip-compressed and uncompressed work directory
├── net # Implementation of net, such as wrap ncnn forward C API
├── preprocess # Implement preprocess
└── utils # OCV tools
The essence of the SDK is to design a set of abstractions of the computational graph that combine each model's preprocess, inference and postprocess stages, and to provide FFI in multiple languages at the same time.
How to support new models¶
We provide several tools to support model conversion.
Function Rewriter¶
PyTorch neural networks are written in Python, which eases algorithm development. But the use of Python control flow and third-party libraries makes it difficult to export the network to an intermediate representation. We provide a 'monkey patch' tool to rewrite an unsupported function into another one that can be exported. Here is an example:
from mmdeploy.core import FUNCTION_REWRITER
@FUNCTION_REWRITER.register_rewriter(
func_name='torch.Tensor.repeat', backend='tensorrt')
def repeat_static(input, *size):
ctx = FUNCTION_REWRITER.get_context()
origin_func = ctx.origin_func
if input.dim() == 1 and len(size) == 1:
return origin_func(input.unsqueeze(0), *([1] + list(size))).squeeze(0)
else:
return origin_func(input, *size)
It is easy to use the function rewriter. Just add a decorator with arguments:
func_name is the function to override. It can be either a PyTorch function or a custom function. Methods in modules can also be overridden by this tool.
backend is the inference engine. The function will be overridden when the model is exported to this engine. If it is not given, this rewrite will be the default rewrite. The default rewrite will be used if the rewrite of the given backend does not exist.
The arguments are the same as those of the original function. A context ctx can be obtained with FUNCTION_REWRITER.get_context(), as in the example above. It provides some useful information such as the deployment config ctx.cfg and the original function (which has been overridden) ctx.origin_func.
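For reference, a registered rewrite only takes effect inside a rewriter context. The following is a minimal sketch, assuming the repeat_static rewrite above has been registered; the backend name mirrors the decorator argument.
# a minimal sketch; assumes the repeat_static rewrite above is registered for 'tensorrt'
import torch
from mmdeploy.core import RewriterContext

x = torch.rand(4)
with RewriterContext(cfg=dict(), backend='tensorrt'):
    y = x.repeat(2)  # dispatched to the rewritten function inside the context
print(y.shape)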
Module Rewriter¶
If you want to replace a whole module with another one, we have another rewriter as follows:
from mmdeploy.core import MODULE_REWRITER
import torch.nn as nn

@MODULE_REWRITER.register_rewrite_module(
    'mmagic.models.backbones.sr_backbones.SRCNN', backend='tensorrt')
class SRCNNWrapper(nn.Module):
def __init__(self,
module,
cfg,
channels=(3, 64, 32, 3),
kernel_sizes=(9, 1, 5),
upscale_factor=4):
super(SRCNNWrapper, self).__init__()
self._module = module
module.img_upsampler = nn.Upsample(
scale_factor=module.upscale_factor,
mode='bilinear',
align_corners=False)
def forward(self, *args, **kwargs):
"""Run forward."""
return self._module(*args, **kwargs)
def init_weights(self, *args, **kwargs):
"""Initialize weights."""
return self._module.init_weights(*args, **kwargs)
Just like the function rewriter, add a decorator with arguments:
module_type: the module class to rewrite.
backend: the inference engine. The module will be overridden when the model is exported to this engine. If it is not given, this rewrite will be the default rewrite. The default rewrite will be used if the rewrite of the given backend does not exist.
All instances of the module in the network will be replaced with instances of this new class. The original module and the deployment config will be passed as the first two arguments.
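To illustrate how the replacement behaves, here is a minimal sketch that constructs the wrapper by hand the same way the rewriter does (original module and deployment config as the first two arguments). FakeSRCNN is a hypothetical stand-in for the real mmagic SRCNN, used only to keep the example self-contained; it assumes the SRCNNWrapper definition above is available.
# a minimal sketch; FakeSRCNN is a hypothetical stand-in for the real SRCNN module
import torch
import torch.nn as nn

class FakeSRCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.upscale_factor = 4
        self.img_upsampler = nn.Upsample(scale_factor=4, mode='bicubic')

    def forward(self, x):
        return self.img_upsampler(x)

    def init_weights(self):
        pass

original = FakeSRCNN()
deploy_cfg = dict()                            # the deployment config would be passed here
wrapped = SRCNNWrapper(original, deploy_cfg)   # mirrors the (module, cfg) call made by the rewriter
out = wrapped(torch.rand(1, 3, 32, 32))        # forward is delegated to the original module
print(out.shape)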
Custom Symbolic¶
The mappings between PyTorch and ONNX are defined in PyTorch with symbolic functions. The custom symbolic function can help us to bypass some ONNX nodes which are unsupported by inference engine.
from mmdeploy.core import SYMBOLIC_REWRITER
from torch.onnx import symbolic_helper as sym_help

@SYMBOLIC_REWRITER.register_symbolic('squeeze', is_pytorch=True)
def squeeze_default(g, self, dim=None):
if dim is None:
dims = []
for i, size in enumerate(self.type().sizes()):
if size == 1:
dims.append(i)
else:
dims = [sym_help._get_const(dim, 'i', 'dim')]
return g.op('Squeeze', self, axes_i=dims)
The decorator arguments:
func_name: the function name to add the symbolic to. Use the full path if it is a custom torch.autograd.Function, or just the name if it is a PyTorch built-in function.
backend: the inference engine. The function will be overridden when the model is exported to this engine. If it is not given, this rewrite will be the default rewrite. The default rewrite will be used if the rewrite of the given backend does not exist.
is_pytorch: True if the function is a PyTorch built-in function.
arg_descriptors: the descriptors of the symbolic function arguments. They will be fed to torch.onnx.symbolic_helper._parse_arg.
Just like the function rewriter, there is a context ctx as the first argument. The context provides some useful information such as the deployment config ctx.cfg and the original function (which has been overridden) ctx.origin_func. Note that ctx.origin_func can be used only when is_pytorch==False.
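As with function rewrites, a custom symbolic only takes effect inside a rewriter context during ONNX export. A minimal sketch follows, assuming the squeeze symbolic above is registered; the module and output file name are placeholders.
# a minimal sketch; assumes the squeeze symbolic above is registered
import torch
from mmdeploy.core import RewriterContext

class SqueezeNet(torch.nn.Module):
    def forward(self, x):
        return x.squeeze(1)

with RewriterContext(cfg=dict(), backend='onnxruntime', opset=11):
    torch.onnx.export(SqueezeNet(), torch.rand(2, 1, 4), 'squeeze.onnx', opset_version=11)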
How to support new backends¶
MMDeploy supports a number of backend engines. We welcome the contribution of new backends. In this tutorial, we will introduce the general procedures to support a new backend in MMDeploy.
Prerequisites¶
Before contributing the codes, there are some requirements for the new backend that need to be checked:
The backend must support ONNX as IR.
If the backend requires model files or weight files other than a “.onnx” file, a conversion tool that converts the “.onnx” file to model files and weight files is required. The tool can be a Python API, a script, or an executable program.
It is highly recommended that the backend provides a Python interface to load the backend files and inference for validation.
Support backend conversion¶
The backends in MMDeploy must support the ONNX. The backend loads the “.onnx” file directly, or converts the “.onnx” to its own format using the conversion tool. In this section, we will introduce the steps to support backend conversion.
Add a backend constant in mmdeploy/utils/constants.py that denotes the name of the backend.
Example:
# mmdeploy/utils/constants.py
class Backend(AdvancedEnum):
    # Take TensorRT as an example
    TENSORRT = 'tensorrt'
Add a corresponding package (a folder with __init__.py) in mmdeploy/backend/, e.g. mmdeploy/backend/tensorrt. In the __init__.py, there must be a function named is_available which checks if users have installed the backend library. If the check is passed, then the remaining files of the package will be loaded.
Example:
# mmdeploy/backend/tensorrt/__init__.py
def is_available():
    return importlib.util.find_spec('tensorrt') is not None


if is_available():
    from .utils import from_onnx, load, save
    from .wrapper import TRTWrapper

    __all__ = ['from_onnx', 'save', 'load', 'TRTWrapper']
Create a config file in configs/_base_/backends (e.g., configs/_base_/backends/tensorrt.py). If the backend just takes the '.onnx' file as input, the new config can be simple: the config of the backend only consists of one field denoting the name of the backend (which should be the same as the name in mmdeploy/utils/constants.py).
Example:
backend_config = dict(type='onnxruntime')
If the backend requires other files, then the arguments for the conversion from “.onnx” file to backend files should be included in the config file.
Example:
backend_config = dict(
    type='tensorrt',
    common_config=dict(
        fp16_mode=False, max_workspace_size=0))
After possessing a base backend config file, you can easily construct a complete deploy config through inheritance. Please refer to our config tutorial for more details. Here is an example:
_base_ = ['../_base_/backends/onnxruntime.py']
codebase_config = dict(type='mmpretrain', task='Classification')
onnx_config = dict(input_shape=None)
If the backend requires model files or weight files other than a ".onnx" file, create an onnx2backend.py file in the corresponding folder (e.g., create mmdeploy/backend/tensorrt/onnx2tensorrt.py). Then add a conversion function onnx2backend in the file. The function should convert a given ".onnx" file to the required backend files in a given work directory. There are no requirements on other parameters of the function or the implementation details. You can use any tools for conversion. Here are some examples:
Use a Python script:
def onnx2openvino(input_info: Dict[str, Union[List[int], torch.Size]],
                  output_names: List[str], onnx_path: str, work_dir: str):

    input_names = ','.join(input_info.keys())
    input_shapes = ','.join(str(list(elem)) for elem in input_info.values())
    output = ','.join(output_names)

    mo_args = f'--input_model="{onnx_path}" '\
              f'--output_dir="{work_dir}" ' \
              f'--output="{output}" ' \
              f'--input="{input_names}" ' \
              f'--input_shape="{input_shapes}" ' \
              f'--disable_fusing '
    command = f'mo.py {mo_args}'
    mo_output = run(command, stdout=PIPE, stderr=PIPE, shell=True, check=True)
Use an executable program:
def onnx2ncnn(onnx_path: str, work_dir: str):
    onnx2ncnn_path = get_onnx2ncnn_path()
    save_param, save_bin = get_output_model_file(onnx_path, work_dir)
    call([onnx2ncnn_path, onnx_path, save_param, save_bin])
Define APIs in a new package in mmdeploy/apis.
Example:
# mmdeploy/apis/ncnn/__init__.py
from mmdeploy.backend.ncnn import is_available

__all__ = ['is_available']

if is_available():
    from mmdeploy.backend.ncnn.onnx2ncnn import (onnx2ncnn,
                                                 get_output_model_file)
    __all__ += ['onnx2ncnn', 'get_output_model_file']
Create a backend manager class which derives from BaseBackendManager, and implement its to_backend class method.
Example:
@classmethod
def to_backend(cls,
               ir_files: Sequence[str],
               deploy_cfg: Any,
               work_dir: str,
               log_level: int = logging.INFO,
               device: str = 'cpu',
               **kwargs) -> Sequence[str]:
    return ir_files
Convert the models of OpenMMLab to backends (if necessary) and inference on backend engine. If you find some incompatible operators when testing, you can try to rewrite the original model for the backend following the rewriter tutorial or add custom operators.
Add docstring and unit tests for new code :).
Support backend inference¶
Although the backend engines are usually implemented in C/C++, it is convenient for testing and debugging if the backend provides Python inference interface. We encourage the contributors to support backend inference in the Python interface of MMDeploy. In this section we will introduce the steps to support backend inference.
Add a file named wrapper.py to the corresponding folder in mmdeploy/backend/{backend}, e.g. mmdeploy/backend/tensorrt/wrapper.py. This module should implement and register a wrapper class that inherits the base class BaseWrapper in mmdeploy/backend/base/base_wrapper.py.
Example:
from mmdeploy.utils import Backend
from ..base import BACKEND_WRAPPER, BaseWrapper


@BACKEND_WRAPPER.register_module(Backend.TENSORRT.value)
class TRTWrapper(BaseWrapper):
The wrapper class can initialize the engine in the __init__ function and do inference in the forward function. Note that the __init__ function must take a parameter output_names and pass it to the base class to determine the orders of output tensors. The input and output variables of forward should be dictionaries denoting the name and value of the tensors.
For the convenience of performance testing, the class should define an "execute" function that only calls the inference interface of the backend engine. The forward function should call the "execute" function after preprocessing the data.
Example:
from mmdeploy.utils import Backend
from mmdeploy.utils.timer import TimeCounter
from ..base import BACKEND_WRAPPER, BaseWrapper


@BACKEND_WRAPPER.register_module(Backend.ONNXRUNTIME.value)
class ORTWrapper(BaseWrapper):

    def __init__(self,
                 onnx_file: str,
                 device: str,
                 output_names: Optional[Sequence[str]] = None):
        # Initialization
        # ...
        super().__init__(output_names)

    def forward(self, inputs: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        # Fetch data
        # ...

        self.__ort_execute(self.io_binding)

        # Postprocess data
        # ...

    @TimeCounter.count_time('onnxruntime')
    def __ort_execute(self, io_binding: ort.IOBinding):
        # Only do the inference
        self.sess.run_with_iobinding(io_binding)
Create a backend manager class which derives from BaseBackendManager, and implement its build_wrapper class method.
Example:
@BACKEND_MANAGERS.register('onnxruntime')
class ONNXRuntimeManager(BaseBackendManager):

    @classmethod
    def build_wrapper(cls,
                      backend_files: Sequence[str],
                      device: str = 'cpu',
                      input_names: Optional[Sequence[str]] = None,
                      output_names: Optional[Sequence[str]] = None,
                      deploy_cfg: Optional[Any] = None,
                      **kwargs):
        from .wrapper import ORTWrapper
        return ORTWrapper(
            onnx_file=backend_files[0],
            device=device,
            output_names=output_names)
Add docstring and unit tests for new code :).
Support new backends using MMDeploy as a third party¶
Previous parts show how to add a new backend in MMDeploy, which requires changing its source code. However, if we treat MMDeploy as a third party, the methods above are no longer efficient. To this end, adding a new backend requires us to pre-install another package named aenum. We can install it directly through pip install aenum.
After installing aenum successfully, we can use it to add a new backend through:
from mmdeploy.utils.constants import Backend
from aenum import extend_enum
try:
Backend.get('backend_name')
except Exception:
extend_enum(Backend, 'BACKEND', 'backend_name')
We can run the codes above before we use the rewrite logic of MMDeploy.
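Once the enum has been extended, the new name can be used in the normal rewriter registration. The following is a minimal sketch with a hypothetical 'backend_name'; the rewritten function simply falls back to the original implementation.
# a minimal sketch; 'backend_name' is the hypothetical backend added above
from mmdeploy.core import FUNCTION_REWRITER

@FUNCTION_REWRITER.register_rewriter(
    func_name='torch.Tensor.size', backend='backend_name')
def size__backend_name(self, *args):
    ctx = FUNCTION_REWRITER.get_context()
    # fall back to the original implementation; customize as needed
    return ctx.origin_func(self, *args)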
How to add test units for backend ops¶
This tutorial introduces how to add unit tests for backend ops. When you add a custom op under backend_ops, you need to add the corresponding test unit. Test units of ops are included in tests/test_ops/test_ops.py.
Prerequisite¶
Compile new ops: after adding a new custom op, you need to recompile the relevant backend, referring to build.md.
1. Add the test program test_XXXX()¶
You can put unit tests for ops in tests/test_ops/. Usually, the following program template can be used for your custom op.
example of ops unit test¶
@pytest.mark.parametrize('backend', [TEST_TENSORRT, TEST_ONNXRT]) # 1.1 backend test class
@pytest.mark.parametrize('pool_h,pool_w,spatial_scale,sampling_ratio', # 1.2 set parameters of op
[(2, 2, 1.0, 2), (4, 4, 2.0, 4)]) # [(# Examples of op test parameters),...]
def test_roi_align(backend,
pool_h, # set parameters of op
pool_w,
spatial_scale,
sampling_ratio,
input_list=None,
save_dir=None):
backend.check_env()
if input_list is None:
input = torch.rand(1, 1, 16, 16, dtype=torch.float32) # 1.3 op input data initialization
single_roi = torch.tensor([[0, 0, 0, 4, 4]], dtype=torch.float32)
else:
input = torch.tensor(input_list[0], dtype=torch.float32)
single_roi = torch.tensor(input_list[1], dtype=torch.float32)
from mmcv.ops import roi_align
def wrapped_function(torch_input, torch_rois): # 1.4 initialize op model to be tested
return roi_align(torch_input, torch_rois, (pool_w, pool_h),
spatial_scale, sampling_ratio, 'avg', True)
wrapped_model = WrapFunction(wrapped_function).eval()
with RewriterContext(cfg={}, backend=backend.backend_name, opset=11): # 1.5 call the backend test class interface
backend.run_and_validate(
wrapped_model, [input, single_roi],
'roi_align',
input_names=['input', 'rois'],
output_names=['roi_feat'],
save_dir=save_dir)
1.1 backend test class¶
We provide some functions and classes for difference backends, such as TestOnnxRTExporter
, TestTensorRTExporter
, TestNCNNExporter
.
1.2 set parameters of op¶
Set some parameters of the op, such as 'pool_h', 'pool_w', 'spatial_scale' and 'sampling_ratio' in roi_align. You can set multiple parameters to test the op.
1.3 op input data initialization¶
Initialize the required input data.
1.4 initialize op model to be tested¶
The model containing the custom op usually has two forms:
torch model: a torch model with custom operators. Python code related to the op is required; refer to the roi_align unit test.
onnx model: an onnx model with custom operators. You need to call the onnx api to build it; refer to the multi_level_roi_align unit test.
1.5 call the backend test class interface¶
Call the backend test class run_and_validate to run and verify the result output by the op on the backend.
def run_and_validate(self,
model,
input_list,
model_name='tmp',
tolerate_small_mismatch=False,
do_constant_folding=True,
dynamic_axes=None,
output_names=None,
input_names=None,
expected_result=None,
save_dir=None):
Parameter Description¶
model: the input model to be tested; it can be a torch model or any other backend model.
input_list: a list of test data, which is mapped to the order of input_names.
model_name: the name of the model.
tolerate_small_mismatch: whether to allow small errors in the verification of results.
do_constant_folding: whether to use constant folding to optimize the model.
dynamic_axes: if you need to use dynamic dimensions, enter the dimension information.
output_names: the node name of the output node.
input_names: the node name of the input node.
expected_result: expected ground truth values for verification.
save_dir: the folder used to save the output files.
2. Test Methods¶
Use pytest to call the test function to test ops.
pytest tests/test_ops/test_ops.py::test_XXXX
How to test rewritten models¶
After you create a rewritten model using our rewriter, it’s better to write a unit test for the model to validate if the model rewrite would come into effect. Generally, we need to get outputs of the original model and rewritten model, then compare them. The outputs of the original model can be acquired directly by calling the forward function of the model, whereas the way to generate the outputs of the rewritten model depends on the complexity of the rewritten model.
Test rewritten model with small changes¶
If the changes to the model are small (e.g., you only change the behavior of one or two variables and don't introduce side effects), you can construct the input arguments for the rewritten functions/modules, run the model's inference in RewriterContext and check the results.
# mmpretrain.models.classifiers.base.py
class BaseClassifier(BaseModule, metaclass=ABCMeta):

    def forward(self, img, return_loss=True, **kwargs):
        if return_loss:
            return self.forward_train(img, **kwargs)
        else:
            return self.forward_test(img, **kwargs)

# Custom rewritten function
@FUNCTION_REWRITER.register_rewriter(
    'mmpretrain.models.classifiers.BaseClassifier.forward', backend='default')
def forward_of_base_classifier(self, img, *args, **kwargs):
    """Rewrite `forward` for default backend."""
    return self.simple_test(img, {})
In the example, we only change the function that forward calls. We can test this rewritten function by writing the following test function:
def test_baseclassfier_forward():
    input = torch.rand(1)
    from mmpretrain.models.classifiers import BaseClassifier

    class DummyClassifier(BaseClassifier):

        def __init__(self, init_cfg=None):
            super().__init__(init_cfg=init_cfg)

        def extract_feat(self, imgs):
            pass

        def forward_train(self, imgs):
            return 'train'

        def simple_test(self, img, tmp, **kwargs):
            return 'simple_test'

    model = DummyClassifier().eval()

    model_output = model(input)
    with RewriterContext(cfg=dict()), torch.no_grad():
        backend_output = model(input)

    assert model_output == 'train'
    assert backend_output == 'simple_test'
In this test function, we construct a derived class of BaseClassifier to test whether the rewritten model works in the rewrite context. We get the outputs of the original model by directly calling model(input), and get the outputs of the rewritten model by calling model(input) inside RewriterContext. Finally, we can check the outputs by asserting their values.
Test rewritten model with big changes¶
In the first example, the output is generated in Python. Sometimes we may make big changes to the original model functions (e.g., eliminate branch statements to generate a correct computing graph). Even if the outputs of a rewritten model running in Python are correct, we cannot be sure that the rewritten model works as expected in the backend. Therefore, we need to test the rewritten model in the backend.
# Custom rewritten function
@FUNCTION_REWRITER.register_rewriter(
    func_name='mmseg.models.segmentors.BaseSegmentor.forward')
def base_segmentor__forward(self, img, img_metas=None, **kwargs):
    ctx = FUNCTION_REWRITER.get_context()
    if img_metas is None:
        img_metas = {}
    assert isinstance(img_metas, dict)
    assert isinstance(img, torch.Tensor)

    deploy_cfg = ctx.cfg
    is_dynamic_flag = is_dynamic_shape(deploy_cfg)
    img_shape = img.shape[2:]
    if not is_dynamic_flag:
        img_shape = [int(val) for val in img_shape]
    img_metas['img_shape'] = img_shape
    return self.simple_test(img, img_metas, **kwargs)
The behavior of this rewritten function is complex. We should test it as follows:
def test_basesegmentor_forward():
    from mmdeploy.utils.test import (WrapModel, get_model_outputs,
                                     get_rewrite_outputs)

    segmentor = get_model()
    segmentor.cpu().eval()

    # Prepare data
    # ...

    # Get the outputs of original model
    model_inputs = {
        'img': [imgs],
        'img_metas': [img_metas],
        'return_loss': False
    }
    model_outputs = get_model_outputs(segmentor, 'forward', model_inputs)

    # Get the outputs of rewritten model
    wrapped_model = WrapModel(segmentor, 'forward', img_metas=None, return_loss=False)
    rewrite_inputs = {'img': imgs}
    rewrite_outputs, is_backend_output = get_rewrite_outputs(
        wrapped_model=wrapped_model,
        model_inputs=rewrite_inputs,
        deploy_cfg=deploy_cfg)

    if is_backend_output:
        # If the backend plugins have been installed, the rewrite outputs are
        # generated by the backend.
        rewrite_outputs = torch.tensor(rewrite_outputs)
        model_outputs = torch.tensor(model_outputs)
        model_outputs = model_outputs.unsqueeze(0).unsqueeze(0)
        assert torch.allclose(rewrite_outputs, model_outputs)
    else:
        # Otherwise, the outputs are generated by python.
        assert rewrite_outputs is not None
We provide some utilities to test rewritten functions. First, you can construct a model and call get_model_outputs to get the outputs of the original model. Then you can wrap the rewritten function with WrapModel, which serves as a partial function, and get the results with get_rewrite_outputs. get_rewrite_outputs returns two values indicating the content of the outputs and whether the outputs come from the backend. Because we cannot assume that everyone has installed the backend, we should check whether the results are generated by Python or by the backend engine; the unit test must cover both conditions. Finally, we compare the original and rewritten outputs, which can be done simply by calling torch.allclose.
Note¶
To learn the complete usage of the test utilities, please refer to our apis document.
How to get partitioned ONNX models¶
MMDeploy supports exporting PyTorch models to partitioned onnx models. With this feature, users can define their own partition policy and get partitioned onnx models with ease. In this tutorial, we will briefly introduce how to partition a model step by step. In the example, we break the YOLOV3 model into two parts and extract the first part, without the post-processing (such as anchor generation and NMS), into an onnx model.
Step 1: Mark inputs/outputs¶
To support model partition, we need to add Mark nodes in the ONNX model. This can be done with mmdeploy's @mark decorator. Note that for the mark to work, the marking operation should be included in a rewriting function.
At first, we mark the model input, which can be done by marking the input tensor img in the forward method of the BaseDetector class, the parent class of all detector classes. We name this marking point detector_forward and mark its input as input. Since a detector such as Mask R-CNN can have three outputs, the outputs are marked as dets, labels, and masks. The following code shows the idea of adding mark functions and calling them in the rewrite. For the source code, you can refer to mmdeploy/codebase/mmdet/models/detectors/single_stage.py.
from mmdeploy.core import FUNCTION_REWRITER, mark


@mark(
    'detector_forward', inputs=['input'], outputs=['dets', 'labels', 'masks'])
def __forward_impl(self, img, img_metas=None, **kwargs):
    ...


@FUNCTION_REWRITER.register_rewriter(
    'mmdet.models.detectors.base.BaseDetector.forward')
def base_detector__forward(self, img, img_metas=None, **kwargs):
    ...
    # call the mark function
    return __forward_impl(...)
Then, we have to mark the output feature of YOLOV3Head, which is the input argument pred_maps of the get_bboxes method of the YOLOV3Head class. We can add an internal function that only marks pred_maps inside the yolov3_head__get_bboxes function, as follows.
from mmdeploy.core import FUNCTION_REWRITER, mark


@FUNCTION_REWRITER.register_rewriter(
    func_name='mmdet.models.dense_heads.YOLOV3Head.get_bboxes')
def yolov3_head__get_bboxes(self,
                            pred_maps,
                            img_metas,
                            cfg=None,
                            rescale=False,
                            with_nms=True):
    # mark pred_maps
    @mark('yolo_head', inputs=['pred_maps'])
    def __mark_pred_maps(pred_maps):
        return pred_maps

    pred_maps = __mark_pred_maps(pred_maps)
    ...
Note that pred_maps is a list of Tensor with three elements. Thus, three Mark nodes with op names pred_maps.0, pred_maps.1 and pred_maps.2 will be added to the onnx model.
Step 2: Add partition config¶
After marking the nodes that will be used to split the model, we can add a deployment config file configs/mmdet/detection/yolov3_partition_onnxruntime_static.py. If you are not familiar with how to write a config, you can check write_config.md.
In the config file, we need to add partition_config. The key part is partition_cfg, which contains a list of dicts designating the start nodes and end nodes of each model segment. Since we only want to keep YOLOV3 without post-processing, we can set start to ['detector_forward:input'] and end to ['yolo_head:input']. Note that start and end can each contain multiple marks.
_base_ = ['./detection_onnxruntime_static.py']

onnx_config = dict(input_shape=[608, 608])
partition_config = dict(
    type='yolov3_partition',  # the partition policy name
    apply_marks=True,  # should always be set to True
    partition_cfg=[
        dict(
            save_file='yolov3.onnx',  # filename to save the partitioned onnx model
            start=['detector_forward:input'],  # [mark_name:input/output, ...]
            end=['yolo_head:input'],  # [mark_name:input/output, ...]
            output_names=[f'pred_maps.{i}' for i in range(3)])  # output names
    ])
Step 3: Get partitioned onnx models¶
Once we have the marked nodes and the deployment config with partition_config set properly, we can use the tool torch2onnx to export the model to onnx and get the partitioned onnx files.
python tools/torch2onnx.py \
configs/mmdet/detection/yolov3_partition_onnxruntime_static.py \
../mmdetection/configs/yolo/yolov3_d53_8xb8-ms-608-273e_coco.py \
https://download.openmmlab.com/mmdetection/v2.0/yolo/yolov3_d53_mstrain-608_273e_coco/yolov3_d53_mstrain-608_273e_coco_20210518_115020-a2c3acb8.pth \
../mmdetection/demo/demo.jpg \
--work-dir ./work-dirs/mmdet/yolov3/ort/partition
After running the script above, we get the partitioned onnx file yolov3.onnx in the work-dir. You can use the visualization tool netron to check the model structure.
With the partitioned onnx file, you can refer to useful_tools.md for follow-up procedures such as mmdeploy_onnx2ncnn and onnx2tensorrt.
How to do regression test¶
This tutorial describes how to run the regression test. The deployment configuration file contains the codebase config and the inference config.
1. Python Environment¶
pip install -r requirements/tests.txt
If pip throws an exception, try upgrading numpy.
pip install -U numpy
2. Usage¶
python ./tools/regression_test.py \
--codebase "${CODEBASE_NAME}" \
--backends "${BACKEND}" \
[--models "${MODELS}"] \
--work-dir "${WORK_DIR}" \
--device "${DEVICE}" \
--log-level INFO \
[--performance | -p] \
[--checkpoint-dir "$CHECKPOINT_DIR"]
Description¶
--codebase: the codebase to test, e.g. mmdet. If you want to test multiple codebases, use mmpretrain mmdet ...
--backends: the backends to test. By default, all backends are tested. You can use onnxruntime tensorrt to choose several backends. If you also need to test the SDK, you need to configure sdk_config in tests/regression/${codebase}.yml.
--models: the models to be tested. All models in the yml are tested by default. You can also give some model names; for the model names, please refer to the relevant yml configuration file, for example ResNet SE-ResNet "Mask R-CNN". Model names can only contain numbers and letters.
--work-dir: the directory for model conversion and reports, ../mmdeploy_regression_working_dir by default.
--checkpoint-dir: the path of downloaded torch models, ../mmdeploy_checkpoints by default.
--device: device type, cuda by default.
--log-level: available options are 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'. The default value is INFO.
-p or --performance: whether to test precision. If not enabled, only model conversion is tested.
Notes¶
For Windows users:
To use the && connector in shell commands, you need to download PowerShell 7 Preview 5+.
If you are using a conda env, you may need to change python3 to python in regression_test.py, because there is a python3.exe in the %USERPROFILE%\AppData\Local\Microsoft\WindowsApps directory.
Example¶
Test all backends of mmdet and mmpose for model convert and precision
python ./tools/regression_test.py \
--codebase mmdet mmpose \
--work-dir "../mmdeploy_regression_working_dir" \
--device "cuda" \
--log-level INFO \
--performance
Test model convert and precision of some backends of mmdet and mmpose
python ./tools/regression_test.py \
--codebase mmdet mmpose \
--backends onnxruntime tensorrt \
--work-dir "../mmdeploy_regression_working_dir" \
--device "cuda" \
--log-level INFO \
-p
Test some backends of mmdet and mmpose, only test model convert
python ./tools/regression_test.py \
--codebase mmdet mmpose \
--backends onnxruntime tensorrt \
--work-dir "../mmdeploy_regression_working_dir" \
--device "cuda" \
--log-level INFO
Test some models of mmdet and mmpose, only test model convert
python ./tools/regression_test.py \
--codebase mmdet mmpose \
--models ResNet SE-ResNet "Mask R-CNN" \
--work-dir "../mmdeploy_regression_working_dir" \
--device "cuda" \
--log-level INFO
3. Regression Test Configuration¶
Example and parameter description¶
globals:
  codebase_dir: ../mmocr # codebase path to test
  checkpoint_force_download: False # whether to re-download the model even if it already exists
  images:
    img_densetext_det: &img_densetext_det ../mmocr/demo/demo_densetext_det.jpg
    img_demo_text_det: &img_demo_text_det ../mmocr/demo/demo_text_det.jpg
    img_demo_text_ocr: &img_demo_text_ocr ../mmocr/demo/demo_text_ocr.jpg
    img_demo_text_recog: &img_demo_text_recog ../mmocr/demo/demo_text_recog.jpg
  metric_info: &metric_info
    hmean-iou: # metafile.Results.Metrics
      eval_name: hmean-iou # test.py --metrics args
      metric_key: 0_hmean-iou:hmean # the key name of the eval log
      tolerance: 0.1 # tolerated threshold interval
      task_name: Text Detection # the name of metafile.Results.Task
      dataset: ICDAR2015 # the name of metafile.Results.Dataset
    word_acc: # same as hmean-iou, also a kind of metric
      eval_name: acc
      metric_key: 0_word_acc_ignore_case
      tolerance: 0.2
      task_name: Text Recognition
      dataset: IIIT5K
  convert_image_det: &convert_image_det # the image that will be used by detection model conversion
    input_img: *img_densetext_det
    test_img: *img_demo_text_det
  convert_image_rec: &convert_image_rec
    input_img: *img_demo_text_recog
    test_img: *img_demo_text_recog
  backend_test: &default_backend_test True # whether to test model precision for the backend
  sdk: # SDK config
    sdk_detection_dynamic: &sdk_detection_dynamic configs/mmocr/text-detection/text-detection_sdk_dynamic.py
    sdk_recognition_dynamic: &sdk_recognition_dynamic configs/mmocr/text-recognition/text-recognition_sdk_dynamic.py

onnxruntime:
  pipeline_ort_recognition_static_fp32: &pipeline_ort_recognition_static_fp32
    convert_image: *convert_image_rec # the image used by model conversion
    backend_test: *default_backend_test # whether to run inference on the backend
    sdk_config: *sdk_recognition_dynamic # whether to test the SDK; if present, the given SDK config is used for testing
    deploy_config: configs/mmocr/text-recognition/text-recognition_onnxruntime_static.py # the deploy cfg path to use, relative to the mmdeploy path

  pipeline_ort_recognition_dynamic_fp32: &pipeline_ort_recognition_dynamic_fp32
    convert_image: *convert_image_rec
    backend_test: *default_backend_test
    sdk_config: *sdk_recognition_dynamic
    deploy_config: configs/mmocr/text-recognition/text-recognition_onnxruntime_dynamic.py

  pipeline_ort_detection_dynamic_fp32: &pipeline_ort_detection_dynamic_fp32
    convert_image: *convert_image_det
    deploy_config: configs/mmocr/text-detection/text-detection_onnxruntime_dynamic.py

tensorrt:
  pipeline_trt_recognition_dynamic_fp16: &pipeline_trt_recognition_dynamic_fp16
    convert_image: *convert_image_rec
    backend_test: *default_backend_test
    sdk_config: *sdk_recognition_dynamic
    deploy_config: configs/mmocr/text-recognition/text-recognition_tensorrt-fp16_dynamic-1x32x32-1x32x640.py

  pipeline_trt_detection_dynamic_fp16: &pipeline_trt_detection_dynamic_fp16
    convert_image: *convert_image_det
    backend_test: *default_backend_test
    sdk_config: *sdk_detection_dynamic
    deploy_config: configs/mmocr/text-detection/text-detection_tensorrt-fp16_dynamic-320x320-2240x2240.py

openvino:
  # same as the onnxruntime backend configuration
ncnn:
  # same as the onnxruntime backend configuration
pplnn:
  # same as the onnxruntime backend configuration
torchscript:
  # same as the onnxruntime backend configuration

models:
  - name: crnn # model name
    metafile: configs/textrecog/crnn/metafile.yml # the path of the model metafile, relative to the codebase path
    codebase_model_config_dir: configs/textrecog/crnn # the base path of `model_configs`, relative to the codebase path
    model_configs: # the config names to test
      - crnn_academic_dataset.py
    pipelines: # pipeline names
      - *pipeline_ort_recognition_dynamic_fp32

  - name: dbnet
    metafile: configs/textdet/dbnet/metafile.yml
    codebase_model_config_dir: configs/textdet/dbnet
    model_configs:
      - dbnet_r18_fpnc_1200e_icdar2015.py
    pipelines:
      - *pipeline_ort_detection_dynamic_fp32
      - *pipeline_trt_detection_dynamic_fp16

      # a special pipeline can be added like this
      - convert_image: xxx
        backend_test: xxx
        sdk_config: xxx
        deploy_config: configs/mmocr/text-detection/xxx
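Conceptually, the tolerance field bounds how far a backend metric may drift from the PyTorch baseline before the test is marked as failed. The snippet below is a rough illustration of that idea, not the actual code in tools/regression_test.py.
# Rough sketch of how `tolerance` is applied; this illustrates the idea only,
# it is not the actual implementation in tools/regression_test.py.
def metric_within_tolerance(pytorch_value: float,
                            backend_value: float,
                            tolerance: float) -> bool:
    """Return True if the backend metric stays within the tolerated interval."""
    return abs(pytorch_value - backend_value) <= tolerance

# Example: hmean-iou with tolerance 0.1 from the config above.
assert metric_within_tolerance(0.795, 0.793302, 0.1)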
4. Generated Report¶
This is an example of mmocr regression test report.
Index | Model | Model Config | Task | Checkpoint | Dataset | Backend | Deploy Config | Static or Dynamic | Precision Type | Conversion Result | hmean-iou | word_acc | Test Pass |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | crnn | ../mmocr/configs/textrecog/crnn/crnn_academic_dataset.py | Text Recognition | ../mmdeploy_checkpoints/mmocr/crnn/crnn_academic-a723a1c5.pth | IIIT5K | Pytorch | - | - | - | - | - | 80.5 | - |
1 | crnn | ../mmocr/configs/textrecog/crnn/crnn_academic_dataset.py | Text Recognition | ${WORK_DIR}/mmocr/crnn/onnxruntime/static/crnn_academic-a723a1c5/end2end.onnx | x | onnxruntime | configs/mmocr/text-recognition/text-recognition_onnxruntime_dynamic.py | static | fp32 | True | - | 80.67 | True |
2 | crnn | ../mmocr/configs/textrecog/crnn/crnn_academic_dataset.py | Text Recognition | ${WORK_DIR}/mmocr/crnn/onnxruntime/static/crnn_academic-a723a1c5 | x | SDK-onnxruntime | configs/mmocr/text-recognition/text-recognition_sdk_dynamic.py | static | fp32 | True | - | x | False |
3 | dbnet | ../mmocr/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py | Text Detection | ../mmdeploy_checkpoints/mmocr/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth | ICDAR2015 | Pytorch | - | - | - | - | 0.795 | - | - |
4 | dbnet | ../mmocr/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py | Text Detection | ../mmdeploy_checkpoints/mmocr/dbnet/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597.pth | ICDAR | onnxruntime | configs/mmocr/text-detection/text-detection_onnxruntime_dynamic.py | dynamic | fp32 | True | - | - | True |
5 | dbnet | ../mmocr/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py | Text Detection | ${WORK_DIR}/mmocr/dbnet/tensorrt/dynamic/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597/end2end.engine | ICDAR | tensorrt | configs/mmocr/text-detection/text-detection_tensorrt-fp16_dynamic-320x320-2240x2240.py | dynamic | fp16 | True | 0.793302 | - | True |
6 | dbnet | ../mmocr/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py | Text Detection | ${WORK_DIR}/mmocr/dbnet/tensorrt/dynamic/dbnet_r18_fpnc_sbn_1200e_icdar2015_20210329-ba3ab597 | ICDAR | SDK-tensorrt | configs/mmocr/text-detection/text-detection_sdk_dynamic.py | dynamic | fp16 | True | 0.795073 | - | True |
5. Supported Backends¶
[x] ONNX Runtime
[x] TensorRT
[x] PPLNN
[x] ncnn
[x] OpenVINO
[x] TorchScript
[x] SNPE
[x] MMDeploy SDK
6. Supported Codebase and Metrics¶
Codebase | Metric | Support |
---|---|---|
mmdet | bbox | :heavy_check_mark: |
mmdet | segm | :heavy_check_mark: |
mmdet | PQ | :x: |
mmpretrain | accuracy | :heavy_check_mark: |
mmseg | mIoU | :heavy_check_mark: |
mmpose | AR | :heavy_check_mark: |
mmpose | AP | :heavy_check_mark: |
mmocr | hmean | :heavy_check_mark: |
mmocr | acc | :heavy_check_mark: |
mmagic | PSNR | :heavy_check_mark: |
mmagic | SSIM | :heavy_check_mark: |
ONNX export Optimizer¶
This is a tool to optimize the ONNX model when exporting from PyTorch.
Installation¶
Build MMDeploy with torchscript support:
export Torch_DIR=$(python -c "import torch;print(torch.utils.cmake_prefix_path + '/Torch')")
cmake \
-DTorch_DIR=${Torch_DIR} \
-DMMDEPLOY_TARGET_BACKENDS="${your_backend};torchscript" \
.. # You can also add other build flags if you need
cmake --build . -- -j$(nproc) && cmake --install .
Usage¶
# import model_to_graph_custom_optimizer so we can hijack onnx.export
from mmdeploy.apis.onnx.optimizer import model_to_graph__custom_optimizer # noqa
from mmdeploy.core import RewriterContext
from mmdeploy.apis.onnx.passes import optimize_onnx
# load your model here
model = create_model()

# export with ONNX Optimizer
x = create_dummy_input()
with RewriterContext({}, onnx_custom_passes=optimize_onnx):
    torch.onnx.export(model, x, output_path)
The model would be optimized after export.
You can also define your own optimizer:
# create the optimize callback
def _optimize_onnx(graph, params_dict, torch_out):
    from mmdeploy.backend.torchscript import ts_optimizer
    ts_optimizer.onnx._jit_pass_onnx_peephole(graph)
    return graph, params_dict, torch_out

with RewriterContext({}, onnx_custom_passes=_optimize_onnx):
    # export your model
Cross compile snpe inference server on Ubuntu 18¶
mmdeploy provides a prebuilt package; if you want to compile it yourself, or need to modify the .proto file, you can refer to this document.
Note that the official gRPC documentation does not have complete support for the NDK.
1. Environment¶
Item | Version | Remarks |
---|---|---|
snpe | 1.59 | 1.60 uses clang-8.0, which may cause compatibility issues |
host OS | ubuntu18.04 | snpe1.59 specified version |
NDK | r17c | snpe1.59 specified version |
gRPC | commit 6f698b5 | - |
Hardware equipment | qcom888 | qcom chip required |
2. Cross compile gRPC with NDK¶
Pull the gRPC repo, then compile protoc and grpc_cpp_plugin on the host
# Install dependencies
$ apt-get update && apt-get install -y libssl-dev
# Compile
$ git clone https://github.com/grpc/grpc --recursive=1 --depth=1
$ mkdir -p cmake/build
$ pushd cmake/build
$ cmake \
-DCMAKE_BUILD_TYPE=Release \
-DgRPC_INSTALL=ON \
-DgRPC_BUILD_TESTS=OFF \
-DgRPC_SSL_PROVIDER=package \
../..
# Install to host
$ make -j
$ sudo make install
Download the NDK and cross-compile the static libraries with android aarch64 format
$ wget https://dl.google.com/android/repository/android-ndk-r17c-linux-x86_64.zip
$ unzip android-ndk-r17c-linux-x86_64.zip
$ export ANDROID_NDK=/path/to/android-ndk-r17c
$ cd /path/to/grpc
$ mkdir -p cmake/build_aarch64 && pushd cmake/build_aarch64
$ cmake ../.. \
-DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-DANDROID_PLATFORM=android-26 \
-DANDROID_TOOLCHAIN=clang \
-DANDROID_STL=c++_shared \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/tmp/android_grpc_install_shared
$ make -j
$ make install
At this point, /tmp/android_grpc_install_shared should contain the complete installation files
$ cd /tmp/android_grpc_install_shared
$ tree -L 1
.
├── bin
├── include
├── lib
└── share
3. (Skippable) Self-test whether NDK gRPC is available¶
Compile the helloworld example that comes with gRPC
$ cd /path/to/grpc/examples/cpp/helloworld/
$ mkdir cmake/build_aarch64 -p && pushd cmake/build_aarch64
$ cmake ../.. \
-DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-DANDROID_PLATFORM=android-26 \
-DANDROID_STL=c++_shared \
-DANDROID_TOOLCHAIN=clang \
-DCMAKE_BUILD_TYPE=Release \
-Dabsl_DIR=/tmp/android_grpc_install_shared/lib/cmake/absl \
-DProtobuf_DIR=/tmp/android_grpc_install_shared/lib/cmake/protobuf \
-DgRPC_DIR=/tmp/android_grpc_install_shared/lib/cmake/grpc
$ make -j
$ ls greeter*
greeter_async_client greeter_async_server greeter_callback_server greeter_server
greeter_async_client2 greeter_callback_client greeter_client
Turn on debug mode on your phone and push the binaries to /data/local/tmp
$ adb push greeter* /data/local/tmp
adb shell into the phone and execute the client/server
/data/local/tmp $ ./greeter_client
Greeter received: Hello world
4. Cross compile snpe inference server¶
Open the snpe tools website and download version 1.59. Unzip and set environment variables
Note that snpe >= 1.60 starts using clang-8.0, which may cause incompatibility with libc++_shared.so on older devices.
$ export SNPE_ROOT=/path/to/snpe-1.59.0.3230
Open the snpe server directory within mmdeploy and use the same options as when cross-compiling gRPC
$ cd /path/to/mmdeploy
$ cd service/snpe/server
$ mkdir -p build && cd build
$ export ANDROID_NDK=/path/to/android-ndk-r17c
$ cmake .. \
-DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-DANDROID_PLATFORM=android-26 \
-DANDROID_STL=c++_shared \
-DANDROID_TOOLCHAIN=clang \
-DCMAKE_BUILD_TYPE=Release \
-Dabsl_DIR=/tmp/android_grpc_install_shared/lib/cmake/absl \
-DProtobuf_DIR=/tmp/android_grpc_install_shared/lib/cmake/protobuf \
-DgRPC_DIR=/tmp/android_grpc_install_shared/lib/cmake/grpc
$ make -j
$ file inference_server
inference_server: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /system/bin/linker64, BuildID[sha1]=252aa04e2b982681603dacb74b571be2851176d2, with debug_info, not stripped
Finally, you will see inference_server; adb push it to the device and execute it.
5. Regenerate the proto interface¶
If you have changed inference.proto, you need to regenerate the .cpp and .py interfaces:
$ python3 -m pip install grpcio-tools --user
$ python3 -m grpc_tools.protoc -I./ --python_out=./client/ --grpc_python_out=./client/ inference.proto
$ ln -s `which protoc-gen-grpc`
$ protoc --cpp_out=./ --grpc_out=./ --plugin=protoc-gen-grpc=grpc_cpp_plugin inference.proto
Reference¶
snpe tutorial https://developer.qualcomm.com/sites/default/files/docs/snpe/cplus_plus_tutorial.html
gRPC cross build script https://raw.githubusercontent.com/grpc/grpc/master/test/distrib/cpp/run_distrib_test_cmake_aarch64_cross.sh
stackoverflow https://stackoverflow.com/questions/54052229/build-grpc-c-for-android-using-ndk-arm-linux-androideabi-clang-compiler
Frequently Asked Questions¶
TensorRT¶
“WARNING: Half2 support requested on hardware without native FP16 support, performance will be negatively affected.”
Fp16 mode requires a device with full-rate fp16 support.
“error: parameter check failed at: engine.cpp::setBindingDimensions::1046, condition: profileMinDims.d[i] <= dimensions.d[i]”
When building an ICudaEngine from an INetworkDefinition that has dynamically resizable inputs, users need to specify at least one optimization profile, which can be set in the deploy config:
backend_config = dict(
    common_config=dict(max_workspace_size=1 << 30),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 320, 320],
                    opt_shape=[1, 3, 800, 1344],
                    max_shape=[1, 3, 1344, 1344])))
    ])
The input tensor shape should be limited between min_shape and max_shape.
“error: [TensorRT] INTERNAL ERROR: Assertion failed: cublasStatus == CUBLAS_STATUS_SUCCESS”
TRT 7.2.1 switches to use cuBLASLt (previously it was cuBLAS). cuBLASLt is the default choice for SM version >= 7.0. You may need CUDA-10.2 Patch 1 (released Aug 26, 2020) to resolve some cuBLASLt issues. Another option is to use the new TacticSource API and disable cuBLASLt tactics if you don't want to upgrade.
Libtorch¶
Error:
libtorch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message): Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN libraries. Please set the proper cuDNN prefixes and / or install cuDNN.
You may export CUDNN_ROOT=/root/path/to/cudnn to resolve the build error.
Windows¶
Error: similar like this
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\Users\cx\miniconda3\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies
Solution: according to this post, the issue may be caused by NVIDIA and will be fixed in CUDA release 11.7. For now, one could use the fixNvPe.py script to modify the NVIDIA dlls in the pytorch lib dir.
python fixNvPe.py --input=C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\lib\*.dll
You can find your pytorch installation path with:
import torch
print(torch.__file__)
enable_language(CUDA) error
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19044.
-- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1 (found version "11.1")
CMake Error at C:/Software/cmake/cmake-3.23.1-windows-x86_64/share/cmake-3.23/Modules/CMakeDetermineCompilerId.cmake:491 (message):
  No CUDA toolset found.
Call Stack (most recent call first):
  C:/Software/cmake/cmake-3.23.1-windows-x86_64/share/cmake-3.23/Modules/CMakeDetermineCompilerId.cmake:6 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
  C:/Software/cmake/cmake-3.23.1-windows-x86_64/share/cmake-3.23/Modules/CMakeDetermineCompilerId.cmake:59 (__determine_compiler_id_test)
  C:/Software/cmake/cmake-3.23.1-windows-x86_64/share/cmake-3.23/Modules/CMakeDetermineCUDACompiler.cmake:339 (CMAKE_DETERMINE_COMPILER_ID)
  C:/workspace/mmdeploy-0.6.0-windows-amd64-cuda11.1-tensorrt8.2.3.0/sdk/lib/cmake/MMDeploy/MMDeployConfig.cmake:27 (enable_language)
  CMakeLists.txt:5 (find_package)
Cause: CUDA Toolkit 11.1 was installed before Visual Studio, so the VS plugin was not installed. Alternatively, the version of VS is too new, so the installation of the VS plugin was skipped during the installation of the CUDA Toolkit.
Solution: This problem can be solved by manually copying the four files in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\extras\visual_studio_integration\MSBuildExtensions to C:\Software\Microsoft Visual Studio\2022\Community\Msbuild\Microsoft\VC\v170\BuildCustomizations. The specific paths should be adjusted according to your actual setup.
ONNX Runtime¶
On Windows, visualizing model inference results fails with the following error:
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Failed to load library, error code: 193
Cause: On recent Windows systems, there are two onnxruntime.dll files under the system path, and they are loaded first, causing conflicts.
C:\Windows\SysWOW64\onnxruntime.dll
C:\Windows\System32\onnxruntime.dll
Solution: Choose one of the following two options
Copy the dll in the lib directory of the downloaded onnxruntime to the directory where mmdeploy_onnxruntime_ops.dll is located (it is recommended to use Everything to search for the ops dll).
Rename the two dlls in the system path so that they cannot be loaded.
Pip¶
pip installed the package, but you could not import it. Make sure you are using the pip from your conda environment.
$ which pip
# /path/to/.local/bin/pip
# /path/to/miniconda3/lib/python3.9/site-packages/pip
apis¶
- mmdeploy.apis.build_task_processor(model_cfg: mmengine.config.config.Config, deploy_cfg: mmengine.config.config.Config, device: str) → mmdeploy.codebase.base.task.BaseTask[source]¶
Build a task processor to manage the deployment pipeline.
- Parameters
model_cfg (str | mmengine.Config) – Model config file.
deploy_cfg (str | mmengine.Config) – Deployment config file.
device (str) – A string specifying device type.
- Returns
A task processor.
- Return type
BaseTask
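A hedged usage sketch (the config paths below are placeholders):
# Illustrative usage of build_task_processor; config paths are placeholders.
from mmengine import Config
from mmdeploy.apis import build_task_processor

model_cfg = Config.fromfile('path/to/model_config.py')
deploy_cfg = Config.fromfile('path/to/deploy_config.py')
task_processor = build_task_processor(model_cfg, deploy_cfg, device='cpu')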
- mmdeploy.apis.create_calib_input_data(calib_file: str, deploy_cfg: Union[str, mmengine.config.config.Config], model_cfg: Union[str, mmengine.config.config.Config], model_checkpoint: Optional[str] = None, dataset_cfg: Optional[Union[str, mmengine.config.config.Config]] = None, dataset_type: str = 'val', device: str = 'cpu') → None[source]¶
Create dataset for post-training quantization.
- Parameters
calib_file (str) – The output calibration data file.
deploy_cfg (str | Config) – Deployment config file or Config object.
model_cfg (str | Config) – Model config file or Config object.
model_checkpoint (str) – A checkpoint path of PyTorch model, defaults to None.
dataset_cfg (Optional[Union[str, Config]], optional) – Model config to provide calibration dataset. If none, use model_cfg as the dataset config. Defaults to None.
dataset_type (str, optional) – The dataset type. Defaults to ‘val’.
device (str, optional) – Device to create dataset. Defaults to ‘cpu’.
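A hedged usage sketch that writes calibration data for post-training quantization (the file paths below are placeholders, not shipped configs):
# Illustrative call; all paths are placeholders.
from mmdeploy.apis import create_calib_input_data

create_calib_input_data(
    calib_file='work_dir/calib_data.h5',
    deploy_cfg='path/to/int8_deploy_config.py',
    model_cfg='path/to/model_config.py',
    model_checkpoint='path/to/checkpoint.pth',
    dataset_type='val',
    device='cuda:0')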
- mmdeploy.apis.extract_model(model: Union[str, onnx.onnx_ml_pb2.ModelProto], start_marker: Union[str, Iterable[str]], end_marker: Union[str, Iterable[str]], start_name_map: Optional[Dict[str, str]] = None, end_name_map: Optional[Dict[str, str]] = None, dynamic_axes: Optional[Dict[str, Dict[int, str]]] = None, save_file: Optional[str] = None) → onnx.onnx_ml_pb2.ModelProto[source]¶
Extract partition-model from an ONNX model.
The partition-model is defined by the names of the input and output tensors exactly.
Examples
>>> from mmdeploy.apis import extract_model
>>> model = 'work_dir/fastrcnn.onnx'
>>> start_marker = 'detector:input'
>>> end_marker = ['extract_feat:output', 'multiclass_nms[0]:input']
>>> dynamic_axes = {
...     'input': {
...         0: 'batch',
...         2: 'height',
...         3: 'width'
...     },
...     'scores': {
...         0: 'batch',
...         1: 'num_boxes',
...     },
...     'boxes': {
...         0: 'batch',
...         1: 'num_boxes',
...     }
... }
>>> save_file = 'partition_model.onnx'
>>> extract_model(model, start_marker, end_marker,
...               dynamic_axes=dynamic_axes,
...               save_file=save_file)
- Parameters
model (str | onnx.ModelProto) – Input ONNX model to be extracted.
start_marker (str | Sequence[str]) – Start marker(s) to extract.
end_marker (str | Sequence[str]) – End marker(s) to extract.
start_name_map (Dict[str, str]) – A mapping of start names, defaults to None.
end_name_map (Dict[str, str]) – A mapping of end names, defaults to None.
dynamic_axes (Dict[str, Dict[int, str]]) – A dictionary to specify dynamic axes of input/output, defaults to None.
save_file (str) – A file to save the extracted model, defaults to None.
- Returns
The extracted model.
- Return type
onnx.ModelProto
- mmdeploy.apis.get_predefined_partition_cfg(deploy_cfg: mmengine.config.config.Config, partition_type: str)[source]¶
Get the predefined partition config.
Notes
Currently only support mmdet codebase.
- Parameters
deploy_cfg (mmengine.Config) – use deploy config to get the codebase and task type.
partition_type (str) – A string specifying partition type.
- Returns
A dictionary of partition config.
- Return type
dict
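A hedged usage sketch; the config path is a placeholder and the partition_type value is an assumed example, so use the type that matches your deploy config and codebase:
# Illustrative call; the config path and partition_type are assumptions.
from mmengine import Config
from mmdeploy.apis import get_predefined_partition_cfg

deploy_cfg = Config.fromfile('path/to/partition_deploy_config.py')  # placeholder
partition_cfgs = get_predefined_partition_cfg(
    deploy_cfg, partition_type='single_stage_base')  # assumed example value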
- mmdeploy.apis.inference_model(model_cfg: Union[str, mmengine.config.config.Config], deploy_cfg: Union[str, mmengine.config.config.Config], backend_files: Sequence[str], img: Union[str, numpy.ndarray], device: str) → Any[source]¶
Run inference with PyTorch or backend model and show results.
Examples
>>> from mmdeploy.apis import inference_model
>>> model_cfg = ('mmdetection/configs/fcos/'
...              'fcos_r50_caffe_fpn_gn-head_1x_coco.py')
>>> deploy_cfg = ('configs/mmdet/detection/'
...               'detection_onnxruntime_dynamic.py')
>>> backend_files = ['work_dir/fcos.onnx']
>>> img = 'demo.jpg'
>>> device = 'cpu'
>>> model_output = inference_model(model_cfg, deploy_cfg,
...                                backend_files, img, device)
- Parameters
model_cfg (str | mmengine.Config) – Model config file or Config object.
deploy_cfg (str | mmengine.Config) – Deployment config file or Config object.
backend_files (Sequence[str]) – Input backend model file(s).
img (str | np.ndarray) – Input image file or numpy array for inference.
device (str) – A string specifying device type.
- Returns
The inference results
- Return type
Any
- mmdeploy.apis.torch2onnx(img: Any, work_dir: str, save_file: str, deploy_cfg: Union[str, mmengine.config.config.Config], model_cfg: Union[str, mmengine.config.config.Config], model_checkpoint: Optional[str] = None, device: str = 'cuda:0')[source]¶
Convert PyTorch model to ONNX model.
Examples
>>> from mmdeploy.apis import torch2onnx
>>> img = 'demo.jpg'
>>> work_dir = 'work_dir'
>>> save_file = 'fcos.onnx'
>>> deploy_cfg = ('configs/mmdet/detection/'
...               'detection_onnxruntime_dynamic.py')
>>> model_cfg = ('mmdetection/configs/fcos/'
...              'fcos_r50_caffe_fpn_gn-head_1x_coco.py')
>>> model_checkpoint = ('checkpoints/'
...                     'fcos_r50_caffe_fpn_gn-head_1x_coco-821213aa.pth')
>>> device = 'cpu'
>>> torch2onnx(img, work_dir, save_file, deploy_cfg,
...            model_cfg, model_checkpoint, device)
- Parameters
img (str | np.ndarray | torch.Tensor) – Input image used to assist converting model.
work_dir (str) – A working directory to save files.
save_file (str) – Filename to save onnx model.
deploy_cfg (str | mmengine.Config) – Deployment config file or Config object.
model_cfg (str | mmengine.Config) – Model config file or Config object.
model_checkpoint (str) – A checkpoint path of PyTorch model, defaults to None.
device (str) – A string specifying device type, defaults to ‘cuda:0’.
- mmdeploy.apis.torch2torchscript(img: Any, work_dir: str, save_file: str, deploy_cfg: Union[str, mmengine.config.config.Config], model_cfg: Union[str, mmengine.config.config.Config], model_checkpoint: Optional[str] = None, device: str = 'cuda:0')[source]¶
Convert PyTorch model to torchscript model.
- Parameters
img (str | np.ndarray | torch.Tensor) – Input image used to assist converting model.
work_dir (str) – A working directory to save files.
save_file (str) – Filename to save torchscript model.
deploy_cfg (str | mmengine.Config) – Deployment config file or Config object.
model_cfg (str | mmengine.Config) – Model config file or Config object.
model_checkpoint (str) – A checkpoint path of PyTorch model, defaults to None.
device (str) – A string specifying device type, defaults to ‘cuda:0’.
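A hedged usage sketch mirroring the torch2onnx example above (config and checkpoint paths are placeholders):
# Illustrative call; all paths are placeholders.
from mmdeploy.apis import torch2torchscript

torch2torchscript(
    img='demo.jpg',
    work_dir='work_dir',
    save_file='end2end.pt',
    deploy_cfg='path/to/torchscript_deploy_config.py',
    model_cfg='path/to/model_config.py',
    model_checkpoint='path/to/checkpoint.pth',
    device='cpu')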
- mmdeploy.apis.visualize_model(model_cfg: Union[str, mmengine.config.config.Config], deploy_cfg: Union[str, mmengine.config.config.Config], model: Union[str, Sequence[str]], img: Union[str, numpy.ndarray, Sequence[str]], device: str, backend: Optional[mmdeploy.utils.constants.Backend] = None, output_file: Optional[str] = None, show_result: bool = False, **kwargs)[source]¶
Run inference with PyTorch or backend model and show results.
Examples
>>> from mmdeploy.apis import visualize_model
>>> model_cfg = ('mmdetection/configs/fcos/'
...              'fcos_r50_caffe_fpn_gn-head_1x_coco.py')
>>> deploy_cfg = ('configs/mmdet/detection/'
...               'detection_onnxruntime_dynamic.py')
>>> model = 'work_dir/fcos.onnx'
>>> img = 'demo.jpg'
>>> device = 'cpu'
>>> visualize_model(model_cfg, deploy_cfg, model,
...                 img, device, show_result=True)
- Parameters
model_cfg (str | mmengine.Config) – Model config file or Config object.
deploy_cfg (str | mmengine.Config) – Deployment config file or Config object.
model (str | Sequence[str]) – Input model or file(s).
img (str | np.ndarray | Sequence[str]) – Input image file or numpy array for inference.
device (str) – A string specifying device type.
backend (Backend) – Specifying backend type, defaults to None.
output_file (str) – Output file to save visualized image, defaults to None. Only valid if show_result is set to False.
show_result (bool) – Whether to show plotted image in windows, defaults to False.
apis/tensorrt¶
- mmdeploy.apis.tensorrt.from_onnx(onnx_model: Union[str, onnx.onnx_ml_pb2.ModelProto], output_file_prefix: str, input_shapes: Dict[str, Sequence[int]], max_workspace_size: int = 0, fp16_mode: bool = False, int8_mode: bool = False, int8_param: Optional[dict] = None, device_id: int = 0, log_level: tensorrt.Logger.Severity = tensorrt.Logger.ERROR, **kwargs) → tensorrt.ICudaEngine[source]¶
Create a tensorrt engine from ONNX.
- Parameters
onnx_model (str or onnx.ModelProto) – Input onnx model to convert from.
output_file_prefix (str) – The prefix of the path to save the output TensorRT engine file.
input_shapes (Dict[str, Sequence[int]]) – The min/opt/max shape of each input.
max_workspace_size (int) – To set max workspace size of TensorRT engine. some tactics and layers need large workspace. Defaults to 0.
fp16_mode (bool) – Specifying whether to enable fp16 mode. Defaults to False.
int8_mode (bool) – Specifying whether to enable int8 mode. Defaults to False.
int8_param (dict) – A dict of parameter int8 mode. Defaults to None.
device_id (int) – Choice the device to create engine. Defaults to 0.
log_level (trt.Logger.Severity) – The log level of TensorRT. Defaults to trt.Logger.ERROR.
- Returns
The TensorRT engine created from onnx_model.
- Return type
tensorrt.ICudaEngine
Example
>>> from mmdeploy.apis.tensorrt import from_onnx
>>> engine = from_onnx(
>>>     'onnx_model.onnx',
>>>     '/path/to/save/trt/engine',
>>>     {'input': {'min_shape': [1, 3, 160, 160],
>>>                'opt_shape': [1, 3, 320, 320],
>>>                'max_shape': [1, 3, 640, 640]}},
>>>     log_level=trt.Logger.WARNING,
>>>     fp16_mode=True,
>>>     max_workspace_size=1 << 30,
>>>     device_id=0)
- mmdeploy.apis.tensorrt.is_available(with_custom_ops: bool = False) → bool¶
Check whether backend is installed.
- Parameters
with_custom_ops (bool) – check custom ops exists.
- Returns
True if backend package is installed.
- Return type
bool
- mmdeploy.apis.tensorrt.load(path: str, allocator: Optional[Any] = None) → tensorrt.ICudaEngine[source]¶
Deserialize TensorRT engine from disk.
- Parameters
path (str) – The disk path to read the engine.
allocator (Any) – gpu allocator
- Returns
The TensorRT engine loaded from disk.
- Return type
tensorrt.ICudaEngine
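A hedged usage sketch for deserializing an engine previously built with from_onnx (the engine path is a placeholder):
# Illustrative call; the engine path is a placeholder.
from mmdeploy.apis.tensorrt import load

engine = load('work_dir/end2end.engine')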
- mmdeploy.apis.tensorrt.onnx2tensorrt(work_dir: str, save_file: str, model_id: int, deploy_cfg: Union[str, mmengine.config.config.Config], onnx_model: Union[str, onnx.onnx_ml_pb2.ModelProto], device: str = 'cuda:0', partition_type: str = 'end2end', **kwargs)[source]¶
Convert ONNX to TensorRT.
Examples
>>> from mmdeploy.backend.tensorrt.onnx2tensorrt import onnx2tensorrt
>>> work_dir = 'work_dir'
>>> save_file = 'end2end.engine'
>>> model_id = 0
>>> deploy_cfg = ('configs/mmdet/detection/'
                  'detection_tensorrt_dynamic-320x320-1344x1344.py')
>>> onnx_model = 'work_dir/end2end.onnx'
>>> onnx2tensorrt(work_dir, save_file, model_id, deploy_cfg, onnx_model, 'cuda:0')
- Parameters
work_dir (str) – A working directory.
save_file (str) – The base name of the file to save TensorRT engine. E.g. end2end.engine.
model_id (int) – Index of input model.
deploy_cfg (str | mmengine.Config) – Deployment config.
onnx_model (str | onnx.ModelProto) – input onnx model.
device (str) – A string specifying cuda device, defaults to ‘cuda:0’.
partition_type (str) – Specifying partition type of a model, defaults to ‘end2end’.
apis/onnxruntime¶
- mmdeploy.apis.onnxruntime.is_available(with_custom_ops: bool = False) → bool¶
Check whether backend is installed.
- Parameters
with_custom_ops (bool) – check custom ops exists.
- Returns
True if backend package is installed.
- Return type
bool
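A hedged sketch using the availability check to guard backend-specific code:
# Illustrative guard around onnxruntime-specific code.
from mmdeploy.apis import onnxruntime as ort_apis

if ort_apis.is_available(with_custom_ops=True):
    # onnxruntime and the mmdeploy custom-op library are both available
    pass
else:
    print('onnxruntime backend (with custom ops) is not installed')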
apis/ncnn¶
- mmdeploy.apis.ncnn.from_onnx(onnx_model: Union[onnx.onnx_ml_pb2.ModelProto, str], output_file_prefix: str)[source]¶
Convert ONNX to ncnn.
The inputs of ncnn include a model file and a weight file. We need to use an executable program to convert the .onnx file to a .param file and a .bin file. The output files will save to work_dir.
Example
>>> from mmdeploy.apis.ncnn import from_onnx
>>> onnx_path = 'work_dir/end2end.onnx'
>>> output_file_prefix = 'work_dir/end2end'
>>> from_onnx(onnx_path, output_file_prefix)
- Parameters
onnx_model (ModelProto|str) – The onnx model or the path of the onnx model file.
output_file_prefix (str) – The path to save the output ncnn file.
- mmdeploy.apis.ncnn.is_available(with_custom_ops: bool = False) → bool¶
Check whether backend is installed.
- Parameters
with_custom_ops (bool) – check custom ops exists.
- Returns
True if backend package is installed.
- Return type
bool