# TensorRT Ops

## TRTBatchedNMS

### Description

Batched NMS with a fixed number of output bounding boxes.

### Parameters

| Type | Parameter | Description |
| --- | --- | --- |
| `int` | `background_label_id` | The label ID for the background class. If there is no background class, set it to `-1`. |
| `int` | `num_classes` | The number of classes. |
| `int` | `topK` | The number of bounding boxes to be fed into the NMS step. |
| `int` | `keepTopK` | The total number of bounding boxes to be kept per image after the NMS step. Should be less than or equal to the `topK` value. |
| `float` | `scoreThreshold` | The scalar threshold for score (low-scoring boxes are removed). |
| `float` | `iouThreshold` | The scalar threshold for IoU (new boxes that have high IoU overlap with previously selected boxes are removed). |
| `int` | `isNormalized` | Set to `false` if the box coordinates are not normalized, meaning they are not in the range `[0,1]`. Defaults to `true`. |
| `int` | `clipBoxes` | Forcibly restrict bounding boxes to the normalized range `[0,1]`. Only applicable if `isNormalized` is also `true`. Defaults to `true`. |

### Inputs

- `inputs[0]` (T): boxes; 4-D tensor of shape (N, num_boxes, num_classes, 4), where N is the batch size, `num_boxes` is the number of boxes, and `num_classes` is the number of classes, which could be 1 if the boxes are shared between all classes.
- `inputs[1]` (T): scores; 4-D tensor of shape (N, num_boxes, 1, num_classes).

### Outputs

- `outputs[0]` (T): dets; 3-D tensor of shape (N, valid_num_boxes, 5), where `valid_num_boxes` is the number of boxes after NMS. Each row is `dets[i,j,:] = [x0, y0, x1, y1, score]`.
- `outputs[1]` (tensor(int32, Linear)): labels; 2-D tensor of shape (N, valid_num_boxes).

### Type Constraints

T: tensor(float32, Linear)
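To illustrate how `scoreThreshold`, `iouThreshold`, `topK`, and `keepTopK` interact, here is a minimal NumPy sketch of greedy NMS for a single image and a single class. It is not the plugin's implementation: the function names are hypothetical, the batched and multi-class handling is left out, and the strict `>` / `<=` comparisons are assumptions.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one [x0, y0, x1, y1] box and an (M, 4) array of boxes."""
    x0 = np.maximum(box[0], boxes[:, 0])
    y0 = np.maximum(box[1], boxes[:, 1])
    x1 = np.minimum(box[2], boxes[:, 2])
    y1 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms_single_class(boxes, scores, score_threshold, iou_threshold, top_k, keep_top_k):
    """Greedy NMS for one image and one class; returns indices of kept boxes."""
    # Drop low-scoring boxes, then keep the top_k highest-scoring candidates.
    candidates = np.flatnonzero(scores > score_threshold)
    candidates = candidates[np.argsort(-scores[candidates], kind="stable")][:top_k]
    kept = []
    for idx in candidates:
        # A candidate survives only if its IoU with every kept box is low enough.
        if not kept or np.all(iou_one_to_many(boxes[idx], boxes[kept]) <= iou_threshold):
            kept.append(int(idx))
    # Cap the number of surviving boxes.
    return kept[:keep_top_k]

boxes = np.array(
    [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 5, 5]], dtype=np.float32
)
scores = np.array([0.9, 0.8, 0.7, 0.01], dtype=np.float32)
kept = nms_single_class(
    boxes, scores, score_threshold=0.05, iou_threshold=0.5, top_k=100, keep_top_k=10
)
```

In this toy input, box 1 overlaps box 0 with an IoU of about 0.68 and is suppressed, box 3 falls below the score threshold, and boxes 0 and 2 are kept.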

## grid_sampler

### Description

Sample from `input` at the pixel locations given by `grid`.

### Parameters

| Type | Parameter | Description |
| --- | --- | --- |
| `int` | `interpolation_mode` | Interpolation mode to calculate output values. (0: `bilinear`, 1: `nearest`) |
| `int` | `padding_mode` | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`) |
| `int` | `align_corners` | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |

### Inputs

- `inputs[0]` (T): Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, and inH and inW are the height and width of the data.
- `inputs[1]` (T): Input offset (the sampling `grid`); 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the offset and the output.

### Outputs

- `outputs[0]` (T): Output feature; 4-D tensor of shape (N, C, outH, outW).

### Type Constraints

T: tensor(float32, Linear)
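The effect of `align_corners` is easiest to see on the mapping from a normalized coordinate in `[-1, 1]` to a pixel position. The sketch below assumes the same convention as PyTorch's `grid_sample`, which the parameter description above matches; the helper name `unnormalize` is hypothetical.

```python
import numpy as np

def unnormalize(coord, size, align_corners):
    """Map normalized coordinates in [-1, 1] to pixel positions along one axis."""
    coord = np.asarray(coord, dtype=np.float32)
    if align_corners:
        # -1 and 1 refer to the centers of the corner pixels.
        return (coord + 1) / 2 * (size - 1)
    # -1 and 1 refer to the outer edges of the corner pixels.
    return ((coord + 1) * size - 1) / 2

# For an axis of 4 pixels (centers at 0, 1, 2, 3):
aligned = unnormalize([-1.0, 0.0, 1.0], size=4, align_corners=1)    # 0.0, 1.5, 3.0
unaligned = unnormalize([-1.0, 0.0, 1.0], size=4, align_corners=0)  # -0.5, 1.5, 3.5
```

With `align_corners=0` the extrema land half a pixel outside the corner pixel centers, which is where `padding_mode` starts to matter.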

## MMCVInstanceNormalization

### Description

Carry out instance normalization as described in the paper https://arxiv.org/abs/1607.08022.

y = scale * (x - mean) / sqrt(variance + epsilon) + B, where mean and variance are computed per instance per channel.
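The formula above can be written as a short NumPy reference. This is a sketch for checking shapes and values, not the plugin's kernel; it assumes the biased (population) variance, and the function name is hypothetical.

```python
import numpy as np

def instance_norm(x, scale, B, epsilon=1e-5):
    """Instance normalization over all axes after (N, C)."""
    spatial_axes = tuple(range(2, x.ndim))
    # Mean and variance are computed per instance and per channel.
    mean = x.mean(axis=spatial_axes, keepdims=True)
    variance = x.var(axis=spatial_axes, keepdims=True)
    # Reshape the per-channel scale and bias so they broadcast over (N, C, ...).
    shape = (1, -1) + (1,) * (x.ndim - 2)
    return scale.reshape(shape) * (x - mean) / np.sqrt(variance + epsilon) + B.reshape(shape)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4, 5)).astype(np.float32)
y = instance_norm(x, scale=np.ones(3, dtype=np.float32), B=np.zeros(3, dtype=np.float32))
```

With unit scale and zero bias, every `(n, c)` slice of `y` has approximately zero mean and unit variance.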

### Parameters

| Type | Parameter | Description |
| --- | --- | --- |
| `float` | `epsilon` | The epsilon value to use to avoid division by zero. Default is 1e-05. |

### Inputs

- `input` (T): Input data tensor from the previous operator. For the image case, the dimensions are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and width of the data. For the non-image case, the dimensions are in the form (N x C x D1 x D2 ... Dn), where N is the batch size.
- `scale` (T): The input 1-dimensional scale tensor of size C.
- `B` (T): The input 1-dimensional bias tensor of size C.

### Outputs

- `output` (T): The output tensor of the same shape as the input.

### Type Constraints

T: tensor(float32, Linear)

## MMCVModulatedDeformConv2d

### Description

Perform Modulated Deformable Convolution on the input feature. See Deformable ConvNets v2: More Deformable, Better Results for details.

### Parameters

| Type | Parameter | Description |
| --- | --- | --- |
| `list of ints` | `stride` | The stride of the convolving kernel. (sH, sW) |
| `list of ints` | `padding` | Paddings on both sides of the input. (padH, padW) |
| `list of ints` | `dilation` | The spacing between kernel elements. (dH, dW) |
| `int` | `deformable_group` | Groups of deformable offset. |
| `int` | `group` | Split input into groups. `input_channel` should be divisible by the number of groups. |

### Inputs

- `inputs[0]` (T): Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, and inH and inW are the height and width of the data.
- `inputs[1]` (T): Input offset; 4-D tensor of shape (N, deformable_group * 2 * kH * kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and the output.
- `inputs[2]` (T): Input mask; 4-D tensor of shape (N, deformable_group * kH * kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and the output.
- `inputs[3]` (T): Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).
- `inputs[4]` (T, optional): Input bias; 1-D tensor of shape (output_channel).

### Outputs

- `outputs[0]` (T): Output feature; 4-D tensor of shape (N, output_channel, outH, outW).

### Type Constraints

T: tensor(float32, Linear)
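The offset and mask shapes depend on the kernel size and on `deformable_group`, which is easy to get wrong when wiring the op by hand. The helper below is a hypothetical sketch that derives the shapes listed above; the formula for outH and outW is the standard convolution output-size arithmetic, which this page does not state and is assumed here.

```python
def modulated_deform_conv2d_shapes(input_shape, weight_shape, stride, padding,
                                   dilation, deformable_group):
    """Return the expected offset, mask, and output shapes for given inputs."""
    n, _, in_h, in_w = input_shape
    out_c, _, k_h, k_w = weight_shape
    # Assumption: standard convolution output-size arithmetic.
    out_h = (in_h + 2 * padding[0] - dilation[0] * (k_h - 1) - 1) // stride[0] + 1
    out_w = (in_w + 2 * padding[1] - dilation[1] * (k_w - 1) - 1) // stride[1] + 1
    return {
        # Two offset values (one per spatial axis) for each kernel element and group.
        "offset": (n, deformable_group * 2 * k_h * k_w, out_h, out_w),
        # One modulation scalar for each kernel element and group.
        "mask": (n, deformable_group * k_h * k_w, out_h, out_w),
        "output": (n, out_c, out_h, out_w),
    }

shapes = modulated_deform_conv2d_shapes(
    input_shape=(1, 64, 32, 32), weight_shape=(128, 64, 3, 3),
    stride=(1, 1), padding=(1, 1), dilation=(1, 1), deformable_group=1,
)
```

For a 3x3 kernel with one deformable group, this gives 18 offset channels and 9 mask channels.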

## MMCVMultiLevelRoiAlign

### Description

Perform RoIAlign on features from multiple levels. Used in bbox_head of most two-stage detectors.

### Parameters

| Type | Parameter | Description |
| --- | --- | --- |
| `int` | `output_height` | Height of the output RoI. |
| `int` | `output_width` | Width of the output RoI. |
| `list of floats` | `featmap_strides` | Feature map stride of each level. |
| `int` | `sampling_ratio` | Number of input samples to take for each output sample. `0` means to take samples densely for current models. |
| `float` | `roi_scale_factor` | RoIs will be scaled by this factor before RoI Align. |
| `int` | `finest_scale` | Scale threshold of mapping to level 0. Default: 56. |
| `int` | `aligned` | If `aligned=0`, use the legacy implementation in MMDetection. Otherwise, align the results more precisely. |

### Inputs

- `inputs[0]` (T)
- `inputs[1~]` (T)

### Outputs

- `outputs[0]` (T): RoI pooled output; 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element output[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].

### Type Constraints

T: tensor(float32, Linear)

## MMCVRoIAlign

### Description

Perform RoIAlign on output feature, used in bbox_head of most two-stage detectors.

### Parameters

| Type | Parameter | Description |
| --- | --- | --- |
| `int` | `output_height` | Height of the output RoI. |
| `int` | `output_width` | Width of the output RoI. |
| `float` | `spatial_scale` | Used to scale the input boxes. |
| `int` | `sampling_ratio` | Number of input samples to take for each output sample. `0` means to take samples densely for current models. |
| `str` | `mode` | Pooling mode in each bin: `avg` or `max`. |
| `int` | `aligned` | If `aligned=0`, use the legacy implementation in MMDetection. Otherwise, align the results more precisely. |

### Inputs

- `inputs[0]` (T): Input feature map; 4-D tensor of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and width of the data.
- `inputs[1]` (T): RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoI coordinates are in the coordinate system of inputs[0].

### Outputs

- `outputs[0]` (T): RoI pooled output; 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element output[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].

### Type Constraints

T: tensor(float32, Linear)
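As an illustration of the `(num_rois, 5)` RoI layout, the following sketch packs per-image box lists into a single tensor with the batch index in the first column. The helper name `boxes_to_rois` is hypothetical and not part of this op.

```python
import numpy as np

def boxes_to_rois(boxes_per_image):
    """Pack per-image [x1, y1, x2, y2] boxes into [batch_index, x1, y1, x2, y2] rows."""
    rois = []
    for batch_index, boxes in enumerate(boxes_per_image):
        boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
        # Prepend the index of the image each box belongs to.
        index_column = np.full((boxes.shape[0], 1), batch_index, dtype=np.float32)
        rois.append(np.hstack([index_column, boxes]))
    return np.concatenate(rois, axis=0)

# Two boxes in image 0 and one box in image 1.
rois = boxes_to_rois([[[0, 0, 10, 10], [5, 5, 20, 20]], [[2, 2, 8, 8]]])
```

The result has shape `(3, 5)`, and the output of the op would then have `num_rois = 3` rows in the same order.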

## ScatterND

### Description

ScatterND takes three inputs: a `data` tensor of rank r >= 1, an `indices` tensor of rank q >= 1, and an `updates` tensor of rank q + r - indices.shape[-1] - 1. The output is produced by creating a copy of the input `data` and then updating its values to those specified by `updates` at the index positions specified by `indices`. The output shape is the same as the shape of `data`. Note that `indices` should not have duplicate entries; that is, two or more updates for the same index location are not supported.

The `output` is calculated via the following equation:

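```
import numpy as np  # data, indices, and updates are NumPy arrays

output = np.copy(data)
update_indices = indices.shape[:-1]
for idx in np.ndindex(update_indices):
    # indices[idx] holds one index tuple into output.
    output[tuple(indices[idx])] = updates[idx]
```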
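As a worked example of the rule above, the sketch below wraps the loop in a hypothetical `scatter_nd` function and applies it to a 1-D tensor. It is a NumPy reference for the semantics only, not the TensorRT implementation.

```python
import numpy as np

def scatter_nd(data, indices, updates):
    """Reference ScatterND: copy data, then write updates at the given indices."""
    output = np.copy(data)
    # Iterate over every index tuple stored along the last axis of indices.
    for idx in np.ndindex(indices.shape[:-1]):
        output[tuple(indices[idx])] = updates[idx]
    return output

data = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.float32)
indices = np.array([[4], [3], [1], [7]], dtype=np.int32)   # rank q = 2
updates = np.array([9, 10, 11, 12], dtype=np.float32)      # rank q + r - 1 - 1 = 1
result = scatter_nd(data, indices, updates)
# result is [1, 11, 3, 10, 9, 6, 7, 12]
```

Here r = 1 and `indices.shape[-1]` = 1, so each row of `indices` addresses a single element of `data`.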

### Parameters

None

### Inputs

- `inputs[0]` (T): Tensor of rank r >= 1.
- `inputs[1]` (tensor(int32, Linear)): Tensor of rank q >= 1.
- `inputs[2]` (T): Tensor of rank q + r - indices.shape[-1] - 1.

### Outputs

- `outputs[0]` (T): Tensor of rank r >= 1.

### Type Constraints

T: tensor(float32, Linear), tensor(int32, Linear)

## TRTBatchedRotatedNMS

### Description

Batched rotated NMS with a fixed number of output bounding boxes.

### Parameters

| Type | Parameter | Description |
| --- | --- | --- |
| `int` | `background_label_id` | The label ID for the background class. If there is no background class, set it to `-1`. |
| `int` | `num_classes` | The number of classes. |
| `int` | `topK` | The number of bounding boxes to be fed into the NMS step. |
| `int` | `keepTopK` | The total number of bounding boxes to be kept per image after the NMS step. Should be less than or equal to the `topK` value. |
| `float` | `scoreThreshold` | The scalar threshold for score (low-scoring boxes are removed). |
| `float` | `iouThreshold` | The scalar threshold for IoU (new boxes that have high IoU overlap with previously selected boxes are removed). |
| `int` | `isNormalized` | Set to `false` if the box coordinates are not normalized, meaning they are not in the range `[0,1]`. Defaults to `true`. |
| `int` | `clipBoxes` | Forcibly restrict bounding boxes to the normalized range `[0,1]`. Only applicable if `isNormalized` is also `true`. Defaults to `true`. |

### Inputs

- `inputs[0]` (T): boxes; 4-D tensor of shape (N, num_boxes, num_classes, 5), where N is the batch size, `num_boxes` is the number of boxes, and `num_classes` is the number of classes, which could be 1 if the boxes are shared between all classes.
- `inputs[1]` (T): scores; 4-D tensor of shape (N, num_boxes, 1, num_classes).

### Outputs

- `outputs[0]` (T): dets; 3-D tensor of shape (N, valid_num_boxes, 6), where `valid_num_boxes` is the number of boxes after NMS. Each row is `dets[i,j,:] = [x0, y0, width, height, theta, score]`.
- `outputs[1]` (tensor(int32, Linear)): labels; 2-D tensor of shape (N, valid_num_boxes).

### Type Constraints

T: tensor(float32, Linear)

## GridPriorsTRT

### Description

Generate the anchors for object detection tasks.

### Parameters

| Type | Parameter | Description |
| --- | --- | --- |
| `int` | `stride_w` | The stride of the feature width. |
| `int` | `stride_h` | The stride of the feature height. |

### Inputs

- `inputs[0]` (T): The base anchors; 2-D tensor with shape [num_base_anchor, 4].
- `inputs[1]` (TAny): height provider; 1-D tensor with shape [featmap_height]. The data is never used.
- `inputs[2]` (TAny): width provider; 1-D tensor with shape [featmap_width]. The data is never used.

### Outputs

- `outputs[0]` (T): output anchors; 2-D tensor of shape (num_base_anchor * featmap_height * featmap_width, 4).

### Type Constraints

T: tensor(float32, Linear)

TAny: Any
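The output shape above follows from placing every base anchor at every feature map location. The NumPy sketch below shows one way to do that; the function name is hypothetical, and the ordering of the output rows (locations vary slowest, base anchors fastest) is an assumption rather than something this page specifies.

```python
import numpy as np

def grid_priors(base_anchors, featmap_height, featmap_width, stride_h, stride_w):
    """Shift each base anchor to every feature map location."""
    shift_x = np.arange(featmap_width, dtype=np.float32) * stride_w
    shift_y = np.arange(featmap_height, dtype=np.float32) * stride_h
    grid_y, grid_x = np.meshgrid(shift_y, shift_x, indexing="ij")
    # One (x, y, x, y) shift per location, applied to both box corners.
    shifts = np.stack([grid_x, grid_y, grid_x, grid_y], axis=-1).reshape(-1, 1, 4)
    # Assumption: locations vary slowest and base anchors fastest in the output.
    return (shifts + base_anchors[None, :, :]).reshape(-1, 4)

base_anchors = np.array([[-4, -4, 4, 4], [-8, -4, 8, 4]], dtype=np.float32)
anchors = grid_priors(base_anchors, featmap_height=2, featmap_width=3, stride_h=8, stride_w=8)
```

Only the feature map height and width are needed here, which is consistent with the height and width provider inputs contributing their shapes rather than their data.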

## ScaledDotProductAttentionTRT

### Description

Dot product attention used to support multi-head attention. See Attention Is All You Need for more details.

### Parameters

None

### Inputs

- `inputs[0]` (T): query; 3-D tensor with shape [batch_size, sequence_length, embedding_size].
- `inputs[1]` (T): key; 3-D tensor with shape [batch_size, sequence_length, embedding_size].
- `inputs[2]` (T): value; 3-D tensor with shape [batch_size, sequence_length, embedding_size].
- `inputs[3]` (T, optional): mask; 2-D or 3-D tensor with shape [sequence_length, sequence_length] or [batch_size, sequence_length, sequence_length].

### Outputs

- `outputs[0]` (T): 3-D tensor of shape [batch_size, sequence_length, embedding_size]. `softmax(q@k.T)@v`
- `outputs[1]` (T): 3-D tensor of shape [batch_size, sequence_length, sequence_length]. `softmax(q@k.T)`

### Type Constraints

T: tensor(float32, Linear)
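The two outputs can be reproduced with a few lines of NumPy. This is a sketch of the expressions `softmax(q@k.T)@v` and `softmax(q@k.T)` given above, not the plugin's code. Two details are assumptions: the mask is treated as additive, and the `scale` argument is hypothetical (the formulas on this page show no scaling, while the op's name suggests a factor such as `1/sqrt(embedding_size)`).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(q, k, v, mask=None, scale=1.0):
    """Return (softmax(q @ k.T) @ v, softmax(q @ k.T)) for batched 3-D inputs."""
    logits = scale * (q @ k.transpose(0, 2, 1))
    if mask is not None:
        # Assumption: additive mask; a 2-D mask broadcasts over the batch axis.
        logits = logits + mask
    attn = softmax(logits, axis=-1)
    return attn @ v, attn

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((2, 4, 8)).astype(np.float32) for _ in range(3))
out, attn = dot_product_attention(q, k, v, scale=1.0 / np.sqrt(8))
```

`out` has shape `[batch_size, sequence_length, embedding_size]`, and each row of `attn` sums to 1.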

## GatherTopk

### Description

TensorRT 8.2 through 8.4 gives unexpected results for multi-index gather operations such as:

```
data[batch_index, bbox_index, ...]
```

Read this for more details.

### Parameters

None

### Inputs

- `inputs[0]` (T): Tensor to be gathered, with shape (A0, ..., An, G0, C0, ...).
- `inputs[1]` (tensor(int32, Linear)): Index tensor, with shape (A0, ..., An, G1).

### Outputs

- `outputs[0]` (T): Output tensor, with shape (A0, ..., An, G1, C0, ...).

### Type Constraints

T: tensor(float32, Linear), tensor(int32, Linear)
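The shapes above describe a gather along the axis that follows the shared leading axes A0, ..., An. A NumPy sketch of the same semantics, using `np.take_along_axis`, might look as follows; the function name `gather_topk` is hypothetical and this is not the plugin's implementation.

```python
import numpy as np

def gather_topk(data, index):
    """Gather data[(A0, ..., An), index, ...] along the axis after the shared axes."""
    gather_axis = index.ndim - 1
    # Append singleton axes so the index broadcasts over the trailing C0, ... axes.
    expanded = index.reshape(index.shape + (1,) * (data.ndim - index.ndim))
    return np.take_along_axis(data, expanded, axis=gather_axis)

# One shared axis (batch of 2), G0 = 5 boxes, C0 = 4 values per box, G1 = 3 picks.
data = np.arange(2 * 5 * 4, dtype=np.float32).reshape(2, 5, 4)
index = np.array([[4, 0, 2], [1, 1, 3]], dtype=np.int32)
gathered = gather_topk(data, index)
```

For this two-axis case the result equals `data[batch_index, bbox_index, ...]` with `batch_index = np.arange(2)[:, None]` and `bbox_index = index`.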

## MMCVMultiScaleDeformableAttention

### Description

Perform attention computation over a small set of key sampling points around a reference point rather than looking over all possible spatial locations. See Deformable DETR: Deformable Transformers for End-to-End Object Detection for details.

### Parameters

None

### Inputs

- `inputs[0]` (T): Input feature; 4-D tensor of shape (N, S, M, D), where N is the batch size, S is the length of feature maps, M is the number of attention heads, and D is hidden_dim.
- `inputs[1]` (T): Input offset; 2-D tensor of shape (L, 2), where L is the number of feature maps and the last dimension (`2`) holds the shape of each feature map.
- `inputs[2]` (T): Input mask; 1-D tensor of shape (L,). This tensor is used to find the sampling locations for the different feature levels, because the input feature tensors are flattened.
- `inputs[3]` (T): Input weight; 6-D tensor of shape (N, Lq, M, L, P, 2), where Lq is the length of feature maps (encoder) or the length of queries (decoder), and P is the number of points.
- `inputs[4]` (T, optional): Input weight; 5-D tensor of shape (N, Lq, M, L, P).

### Outputs

- `outputs[0]` (T): Output feature; 3-D tensor of shape (N, Lq, M*D).

### Type Constraints

T: tensor(float32, Linear)
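Because the feature maps of all levels are flattened into one axis of length S, each level needs a start offset into that axis. The sketch below derives such offsets from per-level shapes. Reading `inputs[1]` as per-level (height, width) pairs and `inputs[2]` as these start offsets is an interpretation of the descriptions above, and the function name is hypothetical.

```python
import numpy as np

def level_start_index(spatial_shapes):
    """Start offset of each level inside the flattened feature axis."""
    sizes = spatial_shapes[:, 0] * spatial_shapes[:, 1]
    # Level 0 starts at 0; each later level starts where the previous one ends.
    return np.concatenate([[0], np.cumsum(sizes)[:-1]])

# Three feature levels with shapes 32x32, 16x16, and 8x8.
spatial_shapes = np.array([[32, 32], [16, 16], [8, 8]], dtype=np.int64)
starts = level_start_index(spatial_shapes)
total_length = int(spatial_shapes.prod(axis=1).sum())  # S, the flattened length
```

In this example the levels start at offsets 0, 1024, and 1280, and S is 1344.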