Accuracy and Speed Benchmark Results
Software and Hardware Environment
Ubuntu 18.04
ncnn 20211208
CUDA 11.3
TensorRT 7.2.3.4
Docker 20.10.8
NVIDIA Tesla T4 Tensor Core GPU for TensorRT
Speed Benchmark
mmpretrain model | input size | TensorRT T4 fp32 (ms) | TensorRT T4 fp16 (ms) | TensorRT T4 int8 (ms) | TensorRT JetsonNano2GB fp32 (ms) | TensorRT JetsonNano2GB fp16 (ms) | TensorRT Jetson TX2 fp32 (ms) | PPLNN T4 fp16 (ms) | ncnn SnapDragon888 fp32 (ms) | ncnn Adreno660 fp32 (ms) | Ascend310 fp32 (ms) |
---|---|---|---|---|---|---|---|---|---|---|---|
ResNet | 224x224 | 2.97 | 1.26 | 1.21 | 59.32 | 30.54 | 24.13 | 1.30 | 33.91 | 25.93 | 2.49 |
ResNeXt | 224x224 | 4.31 | 1.42 | 1.37 | 88.10 | 49.18 | 37.45 | 1.36 | 133.44 | 69.38 | - |
SE-ResNet | 224x224 | 3.41 | 1.66 | 1.51 | 74.59 | 48.78 | 29.62 | 1.91 | 107.84 | 80.85 | - |
ShuffleNetV2 | 224x224 | 1.37 | 1.19 | 1.13 | 15.26 | 10.23 | 7.37 | 4.69 | 9.55 | 10.66 | - |
mmdet (part 1) model | input size | TensorRT T4 fp32 (ms) | TensorRT T4 fp16 (ms) | TensorRT T4 int8 (ms) | TensorRT Jetson TX2 fp32 (ms) | PPLNN T4 fp16 (ms) |
---|---|---|---|---|---|---|
YOLOv3 | 320x320 | 14.76 | 24.92 | 24.92 | - | 18.07 |
SSD-Lite | 320x320 | 8.84 | 9.21 | 8.04 | 1.28 | 19.72 |
RetinaNet | 800x1344 | 97.09 | 25.79 | 16.88 | 780.48 | 38.34 |
FCOS | 800x1344 | 84.06 | 23.15 | 17.68 | - | - |
FSAF | 800x1344 | 82.96 | 21.02 | 13.50 | - | 30.41 |
Faster R-CNN | 800x1344 | 88.08 | 26.52 | 19.14 | 733.81 | 65.40 |
Mask R-CNN | 800x1344 | 104.83 | 58.27 | - | - | 86.80 |
mmdet (part 2) model | input size | ncnn SnapDragon888 fp32 (ms) | ncnn Adreno660 fp32 (ms) |
---|---|---|---|
MobileNetv2-YOLOv3 | 320x320 | 48.57 | 66.55 |
SSD-Lite | 320x320 | 44.91 | 66.19 |
YOLOX | 416x416 | 111.60 | 134.50 |
mmagic model | input size | TensorRT T4 fp32 (ms) | TensorRT T4 fp16 (ms) | TensorRT T4 int8 (ms) | TensorRT Jetson TX2 fp32 (ms) | PPLNN T4 fp16 (ms) |
---|---|---|---|---|---|---|
ESRGAN | 32x32 | 12.64 | 12.42 | 12.45 | - | 7.67 |
SRCNN | 32x32 | 0.70 | 0.35 | 0.26 | 58.86 | 0.56 |
mmocr model | input size | TensorRT T4 fp32 (ms) | TensorRT T4 fp16 (ms) | TensorRT T4 int8 (ms) | PPLNN T4 fp16 (ms) | ncnn SnapDragon888 fp32 (ms) | ncnn Adreno660 fp32 (ms) |
---|---|---|---|---|---|---|---|
DBNet | 640x640 | 10.70 | 5.62 | 5.00 | 34.84 | - | - |
CRNN | 32x32 | 1.93 | 1.40 | 1.36 | - | 10.57 | 20.00 |
mmseg model | input size | TensorRT T4 fp32 (ms) | TensorRT T4 fp16 (ms) | TensorRT T4 int8 (ms) | TensorRT Jetson TX2 fp32 (ms) | PPLNN T4 fp16 (ms) |
---|---|---|---|---|---|---|
FCN | 512x1024 | 128.42 | 23.97 | 18.13 | 1682.54 | 27.00 |
PSPNet | 512x1024 | 119.77 | 24.10 | 16.33 | 1586.19 | 27.26 |
DeepLabV3 | 512x1024 | 226.75 | 31.80 | 19.85 | - | 36.01 |
DeepLabV3+ | 512x1024 | 151.25 | 47.03 | 50.38 | 2534.96 | 34.80 |
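The latency tables can also be read as speedups by dividing the fp32 latency by the fp16 or int8 latency. The following is a minimal sketch using a few TensorRT T4 rows copied from the tables above. The `speedup` helper and the dictionary layout are illustrative assumptions and are not part of MMDeploy.

```python
# TensorRT latency on T4 in milliseconds, copied from the speed tables above,
# stored as (fp32, fp16, int8). The names here are illustrative only and are
# not an MMDeploy API.
T4_LATENCY_MS = {
    "ResNet": (2.97, 1.26, 1.21),
    "RetinaNet": (97.09, 25.79, 16.88),
    "FCN": (128.42, 23.97, 18.13),
}


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


for model, (fp32, fp16, int8) in T4_LATENCY_MS.items():
    print(
        f"{model}: fp16 x{speedup(fp32, fp16):.2f}, "
        f"int8 x{speedup(fp32, int8):.2f}"
    )
```

A ratio above 1 means the lower-precision engine is faster. The ratios describe only this hardware and software environment and should not be assumed to carry over to other devices.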
Accuracy Benchmark
mmpretrain model | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | Ascend fp32 |
---|---|---|---|---|---|---|---|---|---|
ResNet-18 | top-1 | 69.90 | 69.90 | 69.88 | 69.88 | 69.86 | 69.86 | 69.86 | 69.91 |
ResNet-18 | top-5 | 89.43 | 89.43 | 89.34 | 89.34 | 89.33 | 89.38 | 89.34 | 89.43 |
ResNeXt-50 | top-1 | 77.90 | 77.90 | 77.90 | 77.90 | - | 77.78 | 77.89 | - |
ResNeXt-50 | top-5 | 93.66 | 93.66 | 93.66 | 93.66 | - | 93.64 | 93.65 | - |
SE-ResNet-50 | top-1 | 77.74 | 77.74 | 77.74 | 77.74 | 77.75 | 77.63 | 77.73 | - |
SE-ResNet-50 | top-5 | 93.84 | 93.84 | 93.84 | 93.84 | 93.83 | 93.72 | 93.84 | - |
ShuffleNetV1 1.0x | top-1 | 68.13 | 68.13 | 68.13 | 68.13 | 68.13 | 67.71 | 68.11 | - |
ShuffleNetV1 1.0x | top-5 | 87.81 | 87.81 | 87.81 | 87.81 | 87.81 | 87.58 | 87.80 | - |
ShuffleNetV2 1.0x | top-1 | 69.55 | 69.55 | 69.55 | 69.55 | 69.54 | 69.10 | 69.54 | - |
ShuffleNetV2 1.0x | top-5 | 88.92 | 88.92 | 88.92 | 88.92 | 88.91 | 88.58 | 88.92 | - |
MobileNet V2 | top-1 | 71.86 | 71.86 | 71.86 | 71.86 | 71.87 | 70.91 | 71.84 | 71.87 |
MobileNet V2 | top-5 | 90.42 | 90.42 | 90.42 | 90.42 | 90.40 | 89.85 | 90.41 | 90.42 |
Vision Transformer | top-1 | 85.43 | 85.43 | - | 85.43 | 85.42 | - | - | 85.43 |
Vision Transformer | top-5 | 97.77 | 97.77 | - | 97.77 | 97.76 | - | - | 97.77 |
Swin Transformer | top-1 | 81.18 | 81.18 | 81.18 | 81.18 | 81.18 | - | - | - |
Swin Transformer | top-5 | 95.61 | 95.61 | 95.61 | 95.61 | 95.61 | - | - | - |
EfficientFormer | top-1 | 80.46 | 80.45 | 80.46 | 80.46 | - | - | - | - |
EfficientFormer | top-5 | 94.99 | 94.98 | 94.99 | 94.99 | - | - | - | - |
mmdet model | task | dataset | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | Ascend fp32 | OpenVINO fp32 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
YOLOV3 | Object Detection | COCO2017 | box AP | 33.7 | 33.7 | - | 33.5 | 33.5 | 33.5 | - | - | - |
SSD | Object Detection | COCO2017 | box AP | 25.5 | 25.5 | - | 25.5 | 25.5 | - | - | - | - |
RetinaNet | Object Detection | COCO2017 | box AP | 36.5 | 36.4 | - | 36.4 | 36.4 | 36.3 | 36.5 | 36.4 | - |
FCOS | Object Detection | COCO2017 | box AP | 36.6 | - | - | 36.6 | 36.5 | - | - | - | - |
FSAF | Object Detection | COCO2017 | box AP | 37.4 | 37.4 | - | 37.4 | 37.4 | 37.2 | 37.4 | - | - |
CenterNet | Object Detection | COCO2017 | box AP | 25.9 | 26.0 | 26.0 | 26.0 | 25.8 | - | - | - | - |
YOLOX | Object Detection | COCO2017 | box AP | 40.5 | 40.3 | - | 40.3 | 40.3 | 29.3 | - | - | - |
Faster R-CNN | Object Detection | COCO2017 | box AP | 37.4 | 37.3 | - | 37.3 | 37.3 | 37.1 | 37.3 | 37.2 | - |
ATSS | Object Detection | COCO2017 | box AP | 39.4 | - | - | 39.4 | 39.4 | - | - | - | - |
Cascade R-CNN | Object Detection | COCO2017 | box AP | 40.4 | - | - | 40.4 | 40.4 | - | 40.4 | - | - |
GFL | Object Detection | COCO2017 | box AP | 40.2 | - | 40.2 | 40.2 | 40.0 | - | - | - | - |
RepPoints | Object Detection | COCO2017 | box AP | 37.0 | - | - | 36.9 | - | - | - | - | - |
DETR | Object Detection | COCO2017 | box AP | 40.1 | 40.1 | - | 40.1 | 40.1 | - | - | - | - |
Mask R-CNN | Instance Segmentation | COCO2017 | box AP | 38.2 | 38.1 | - | 38.1 | 38.1 | - | 38.0 | - | - |
Mask R-CNN | Instance Segmentation | COCO2017 | mask AP | 34.7 | 34.7 | - | 33.7 | 33.7 | - | - | - | - |
Swin-Transformer | Instance Segmentation | COCO2017 | box AP | 42.7 | - | 42.7 | 42.5 | 37.7 | - | - | - | - |
Swin-Transformer | Instance Segmentation | COCO2017 | mask AP | 39.3 | - | 39.3 | 39.3 | 35.4 | - | - | - | - |
SOLO | Instance Segmentation | COCO2017 | mask AP | 33.1 | - | 32.7 | - | - | - | - | - | 32.7 |
SOLOv2 | Instance Segmentation | COCO2017 | mask AP | 34.8 | - | 34.5 | - | - | - | - | - | 34.5 |
mmagic model | task | dataset | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 |
---|---|---|---|---|---|---|---|---|---|---|
SRCNN | Super Resolution | Set5 | PSNR | 28.4316 | 28.4120 | 28.4323 | 28.4323 | 28.4286 | 28.1995 | 28.4311 |
SRCNN | Super Resolution | Set5 | SSIM | 0.8099 | 0.8106 | 0.8097 | 0.8097 | 0.8096 | 0.7934 | 0.8096 |
ESRGAN | Super Resolution | Set5 | PSNR | 28.2700 | 28.2619 | 28.2592 | 28.2592 | - | - | 28.2624 |
ESRGAN | Super Resolution | Set5 | SSIM | 0.7778 | 0.7784 | 0.7764 | 0.7774 | - | - | 0.7765 |
ESRGAN-PSNR | Super Resolution | Set5 | PSNR | 30.6428 | 30.6306 | 30.6444 | 30.6430 | - | - | 27.0426 |
ESRGAN-PSNR | Super Resolution | Set5 | SSIM | 0.8559 | 0.8565 | 0.8558 | 0.8558 | - | - | 0.8557 |
SRGAN | Super Resolution | Set5 | PSNR | 27.9499 | 27.9252 | 27.9408 | 27.9408 | - | - | 27.9388 |
SRGAN | Super Resolution | Set5 | SSIM | 0.7846 | 0.7851 | 0.7839 | 0.7839 | - | - | 0.7839 |
SRResNet | Super Resolution | Set5 | PSNR | 30.2252 | 30.2069 | 30.2300 | 30.2300 | - | - | 30.2294 |
SRResNet | Super Resolution | Set5 | SSIM | 0.8491 | 0.8497 | 0.8488 | 0.8488 | - | - | 0.8488 |
Real-ESRNet | Super Resolution | Set5 | PSNR | 28.0297 | - | 27.7016 | 27.7016 | - | - | 27.7049 |
Real-ESRNet | Super Resolution | Set5 | SSIM | 0.8236 | - | 0.8122 | 0.8122 | - | - | 0.8123 |
EDSR | Super Resolution | Set5 | PSNR | 30.2223 | 30.2192 | 30.2214 | 30.2214 | 30.2211 | 30.1383 | - |
EDSR | Super Resolution | Set5 | SSIM | 0.8500 | 0.8507 | 0.8497 | 0.8497 | 0.8497 | 0.8469 | - |
mmocr model | task | dataset | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | OpenVINO fp32 |
---|---|---|---|---|---|---|---|---|---|---|---|
DBNet* | TextDetection | ICDAR2015 | recall | 0.7310 | 0.7308 | 0.7304 | 0.7198 | 0.7179 | 0.7111 | 0.7304 | 0.7309 |
DBNet* | TextDetection | ICDAR2015 | precision | 0.8714 | 0.8718 | 0.8714 | 0.8677 | 0.8674 | 0.8688 | 0.8718 | 0.8714 |
DBNet* | TextDetection | ICDAR2015 | hmean | 0.7950 | 0.7949 | 0.7950 | 0.7868 | 0.7856 | 0.7821 | 0.7949 | 0.7950 |
DBNetpp | TextDetection | ICDAR2015 | recall | 0.8209 | 0.8209 | 0.8209 | 0.8199 | 0.8204 | 0.8204 | - | 0.8209 |
DBNetpp | TextDetection | ICDAR2015 | precision | 0.9079 | 0.9079 | 0.9079 | 0.9117 | 0.9117 | 0.9142 | - | 0.9079 |
DBNetpp | TextDetection | ICDAR2015 | hmean | 0.8622 | 0.8622 | 0.8622 | 0.8634 | 0.8637 | 0.8648 | - | 0.8622 |
PSENet | TextDetection | ICDAR2015 | recall | 0.7526 | 0.7526 | 0.7526 | 0.7526 | 0.7520 | 0.7496 | - | 0.7526 |
PSENet | TextDetection | ICDAR2015 | precision | 0.8669 | 0.8669 | 0.8669 | 0.8669 | 0.8668 | 0.8550 | - | 0.8669 |
PSENet | TextDetection | ICDAR2015 | hmean | 0.8057 | 0.8057 | 0.8057 | 0.8057 | 0.8054 | 0.7989 | - | 0.8057 |
PANet | TextDetection | ICDAR2015 | recall | 0.7401 | 0.7401 | 0.7401 | 0.7357 | 0.7366 | - | - | 0.7401 |
PANet | TextDetection | ICDAR2015 | precision | 0.8601 | 0.8601 | 0.8601 | 0.8570 | 0.8586 | - | - | 0.8601 |
PANet | TextDetection | ICDAR2015 | hmean | 0.7955 | 0.7955 | 0.7955 | 0.7917 | 0.7930 | - | - | 0.7955 |
TextSnake | TextDetection | CTW1500 | recall | 0.8052 | 0.8052 | 0.8052 | 0.8055 | - | - | - | - |
TextSnake | TextDetection | CTW1500 | precision | 0.8535 | 0.8535 | 0.8535 | 0.8538 | - | - | - | - |
TextSnake | TextDetection | CTW1500 | hmean | 0.8286 | 0.8286 | 0.8286 | 0.8290 | - | - | - | - |
MaskRCNN | TextDetection | ICDAR2015 | recall | 0.7766 | 0.7766 | 0.7766 | 0.7766 | 0.7761 | 0.7670 | - | - |
MaskRCNN | TextDetection | ICDAR2015 | precision | 0.8644 | 0.8644 | 0.8644 | 0.8644 | 0.8630 | 0.8705 | - | - |
MaskRCNN | TextDetection | ICDAR2015 | hmean | 0.8182 | 0.8182 | 0.8182 | 0.8182 | 0.8172 | 0.8155 | - | - |
CRNN | TextRecognition | IIIT5K | acc | 0.8067 | 0.8067 | 0.8067 | 0.8067 | 0.8063 | 0.8067 | 0.8067 | - |
SAR | TextRecognition | IIIT5K | acc | 0.9517 | - | 0.9287 | - | - | - | - | - |
SATRN | TextRecognition | IIIT5K | acc | 0.9470 | 0.9487 | 0.9487 | 0.9487 | 0.9483 | 0.9483 | - | - |
ABINet | TextRecognition | IIIT5K | acc | 0.9603 | 0.9563 | 0.9563 | 0.9573 | 0.9507 | 0.9510 | - | - |
mmseg model | dataset | metric | PyTorch fp32 | TorchScript fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | Ascend fp32 |
---|---|---|---|---|---|---|---|---|---|---|
FCN | Cityscapes | mIoU | 72.25 | 72.36 | - | 72.36 | 72.35 | 74.19 | 72.35 | 72.35 |
PSPNet | Cityscapes | mIoU | 78.55 | 78.66 | - | 78.26 | 78.24 | 77.97 | 78.09 | 78.67 |
deeplabv3 | Cityscapes | mIoU | 79.09 | 79.12 | - | 79.12 | 79.12 | 78.96 | 79.12 | 79.06 |
deeplabv3+ | Cityscapes | mIoU | 79.61 | 79.60 | - | 79.60 | 79.60 | 79.43 | 79.60 | 79.51 |
Fast-SCNN | Cityscapes | mIoU | 70.96 | 70.96 | - | 70.93 | 70.92 | 66.00 | 70.92 | - |
UNet | Cityscapes | mIoU | 69.10 | - | - | 69.10 | 69.10 | 68.95 | - | - |
ANN | Cityscapes | mIoU | 77.40 | - | - | 77.32 | 77.32 | - | - | - |
APCNet | Cityscapes | mIoU | 77.40 | - | - | 77.32 | 77.32 | - | - | - |
BiSeNetV1 | Cityscapes | mIoU | 74.44 | - | - | 74.44 | 74.43 | - | - | - |
BiSeNetV2 | Cityscapes | mIoU | 73.21 | - | - | 73.21 | 73.21 | - | - | - |
CGNet | Cityscapes | mIoU | 68.25 | - | - | 68.27 | 68.27 | - | - | - |
EMANet | Cityscapes | mIoU | 77.59 | - | - | 77.59 | 77.60 | - | - | - |
EncNet | Cityscapes | mIoU | 75.67 | - | - | 75.66 | 75.66 | - | - | - |
ERFNet | Cityscapes | mIoU | 71.08 | - | - | 71.08 | 71.07 | - | - | - |
FastFCN | Cityscapes | mIoU | 79.12 | - | - | 79.12 | 79.12 | - | - | - |
GCNet | Cityscapes | mIoU | 77.69 | - | - | 77.69 | 77.69 | - | - | - |
ICNet | Cityscapes | mIoU | 76.29 | - | - | 76.36 | 76.36 | - | - | - |
ISANet | Cityscapes | mIoU | 78.49 | - | - | 78.49 | 78.49 | - | - | - |
OCRNet | Cityscapes | mIoU | 74.30 | - | - | 73.66 | 73.67 | - | - | - |
PointRend | Cityscapes | mIoU | 76.47 | 76.47 | - | 76.41 | 76.42 | - | - | - |
Semantic FPN | Cityscapes | mIoU | 74.52 | - | - | 74.52 | 74.52 | - | - | - |
STDC | Cityscapes | mIoU | 75.10 | - | - | 75.10 | 75.10 | - | - | - |
STDC | Cityscapes | mIoU | 77.17 | - | - | 77.17 | 77.17 | - | - | - |
UPerNet | Cityscapes | mIoU | 77.10 | - | - | 77.19 | 77.18 | - | - | - |
Segmenter | ADE20K | mIoU | 44.32 | 44.29 | 44.29 | 44.29 | 43.34 | 43.35 | - | - |
mmpose model | task | dataset | metric | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | PPLNN fp16 | OpenVINO fp32 |
---|---|---|---|---|---|---|---|---|---|
HRNet | Pose Detection | COCO | AP | 0.748 | 0.748 | 0.748 | 0.748 | - | 0.748 |
HRNet | Pose Detection | COCO | AR | 0.802 | 0.802 | 0.802 | 0.802 | - | 0.802 |
LiteHRNet | Pose Detection | COCO | AP | 0.663 | 0.663 | 0.663 | - | - | 0.663 |
LiteHRNet | Pose Detection | COCO | AR | 0.728 | 0.728 | 0.728 | - | - | 0.728 |
MSPN | Pose Detection | COCO | AP | 0.762 | 0.762 | 0.762 | 0.762 | - | 0.762 |
MSPN | Pose Detection | COCO | AR | 0.825 | 0.825 | 0.825 | 0.825 | - | 0.825 |
Hourglass | Pose Detection | COCO | AP | 0.717 | 0.717 | 0.717 | 0.717 | - | 0.717 |
Hourglass | Pose Detection | COCO | AR | 0.774 | 0.774 | 0.774 | 0.774 | - | 0.774 |
SimCC | Pose Detection | COCO | AP | 0.607 | - | 0.608 | - | - | - |
SimCC | Pose Detection | COCO | AR | 0.668 | - | 0.672 | - | - | - |
mmrotate model | task | dataset | metric | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | PPLNN fp16 | OpenVINO fp32 |
---|---|---|---|---|---|---|---|---|---|
RotatedRetinaNet | Rotated Detection | DOTA-v1.0 | mAP | 0.698 | 0.698 | 0.698 | 0.697 | - | - |
Oriented RCNN | Rotated Detection | DOTA-v1.0 | mAP | 0.756 | 0.756 | 0.758 | 0.730 | - | - |
GlidingVertex | Rotated Detection | DOTA-v1.0 | mAP | 0.732 | - | 0.733 | 0.731 | - | - |
RoI Transformer | Rotated Detection | DOTA-v1.0 | mAP | 0.761 | - | 0.758 | - | - | - |
mmaction2 model | task | dataset | metric | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | PPLNN fp16 | OpenVINO fp32 |
---|---|---|---|---|---|---|---|---|---|
TSN | Recognition | Kinetics-400 | top-1 | 69.71 | - | 69.71 | - | - | - |
TSN | Recognition | Kinetics-400 | top-5 | 88.75 | - | 88.75 | - | - | - |
SlowFast | Recognition | Kinetics-400 | top-1 | 74.45 | - | 75.62 | - | - | - |
SlowFast | Recognition | Kinetics-400 | top-5 | 91.55 | - | 92.10 | - | - | - |
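Because some datasets contain images of various resolutions in a codebase such as MMDet, the speed benchmarks were obtained with static configs in MMDeploy, while the accuracy benchmarks were obtained with dynamic configs.
Some TensorRT int8 benchmarks require an NVIDIA GPU with Tensor Cores; without them, performance drops sharply.
DBNet uses `nearest` interpolation in the model `neck`, for which TensorRT-7 applies a strategy entirely different from PyTorch's. To stay compatible with TensorRT-7, we rewrote the `neck` to use `bilinear` interpolation, which improves detection performance. To obtain results that match PyTorch, TensorRT-8+ is recommended, since its interpolation method is the same as PyTorch's.
For mmpose models, `flip_test` must be set to `False` in the model config file.
Some models may suffer a large accuracy loss in fp16 mode; adjust the model as your situation requires.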
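
One way to spot models with a large fp16 accuracy loss is to compare each fp16 result with the fp32 result on the same backend and flag large gaps. The sketch below does this for a few TensorRT rows copied from the accuracy tables above. The 1.0-point threshold and the `flag_fp16_drops` helper are assumptions chosen for illustration and are not part of MMDeploy.

```python
# (model, metric) -> (TensorRT fp32, TensorRT fp16), copied from the accuracy
# tables above. All four metrics are on a 0-100 scale.
TENSORRT_RESULTS = {
    ("ResNet-18", "top-1"): (69.88, 69.86),
    ("Faster R-CNN", "box AP"): (37.3, 37.3),
    ("Swin-Transformer", "box AP"): (42.5, 37.7),
    ("Segmenter", "mIoU"): (44.29, 43.34),
}


def flag_fp16_drops(results, threshold=1.0):
    """Return entries whose fp16 score is more than `threshold` points below fp32.

    `threshold` is an arbitrary illustrative cut-off, not a recommended value.
    """
    flagged = {}
    for key, (fp32, fp16) in results.items():
        drop = fp32 - fp16
        if drop > threshold:
            flagged[key] = round(drop, 2)
    return flagged


print(flag_fp16_drops(TENSORRT_RESULTS))
```

The threshold should be chosen per task and metric scale; metrics reported on a 0-1 scale (for example the mmocr and mmpose tables) would need a proportionally smaller value.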