Key Concepts
RT-DETR, presented by its authors as the first real-time end-to-end object detector, outperforms earlier advanced YOLO detectors in both speed and accuracy, and it removes the need for NMS post-processing along with its negative impact.
Summary
The paper proposes RT-DETR, a real-time end-to-end object detector that the authors describe as the first of its kind and that outperforms earlier advanced YOLO detectors in both speed and accuracy.
Key highlights:
- RT-DETR addresses the computational bottleneck in the Transformer encoder by designing an efficient hybrid encoder that decouples intra-scale feature interaction and cross-scale feature fusion.
- RT-DETR introduces an uncertainty-minimal query selection scheme that provides high-quality initial queries for the decoder, improving the accuracy of the detector.
- RT-DETR supports flexible speed tuning by adjusting the number of decoder layers, allowing it to adapt to various real-time scenarios without retraining.
- Experimental results show that RT-DETR-R50 achieves 53.1% AP on COCO and 108 FPS on a T4 GPU, outperforming the L and X models of earlier advanced YOLO detectors in both speed and accuracy.
- RT-DETR-R50 also outperforms DINO-Deformable-DETR-R50 by 2.2% AP in accuracy and by a factor of about 21 in FPS.
- After pre-training on Objects365, RT-DETR-R50 and RT-DETR-R101 achieve 55.3% and 56.2% AP, respectively, a clear gain over the 53.1% and 54.3% AP reported without this pre-training.
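The speed-tuning highlight above (running fewer decoder layers at inference without retraining) can be illustrated with a minimal sketch. It assumes a decoder built as a stack of layers that each refine the queries a little further, so that stopping early yields a coarser but cheaper result. `ToyDecoder`, `make_refine_layer`, the layer count of six, and the update rule are hypothetical stand-ins chosen for illustration, not RT-DETR's actual code.

```python
from typing import Callable, List, Optional

Layer = Callable[[List[float]], List[float]]


def make_refine_layer(step: float) -> Layer:
    """Return a toy 'decoder layer' that moves each query value toward 1.0."""
    def refine(queries: List[float]) -> List[float]:
        return [q + step * (1.0 - q) for q in queries]
    return refine


class ToyDecoder:
    """Hypothetical stacked decoder (illustration only, not the RT-DETR implementation)."""

    def __init__(self, num_layers: int = 6, step: float = 0.5) -> None:
        # The layers are created once; "training" is skipped in this toy example.
        self.layers: List[Layer] = [make_refine_layer(step) for _ in range(num_layers)]

    def forward(self, queries: List[float],
                num_active_layers: Optional[int] = None) -> List[float]:
        # Default: run the full stack. Otherwise run only the first k layers.
        k = len(self.layers) if num_active_layers is None else num_active_layers
        if not 1 <= k <= len(self.layers):
            raise ValueError("num_active_layers must be between 1 and the stack depth")
        for layer in self.layers[:k]:
            queries = layer(queries)
        return queries


decoder = ToyDecoder(num_layers=6, step=0.5)
full = decoder.forward([0.0])                       # all six layers
fast = decoder.forward([0.0], num_active_layers=3)  # same layers, first three only
print(f"full stack: {full[0]:.6f}, truncated: {fast[0]:.6f}")
```

The same layer objects serve both calls, so nothing is retrained; only the number of layers executed changes, which is the trade-off between speed and refinement quality that the highlight describes.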
Statistics
RT-DETR-R50 achieves 53.1% AP on COCO and 108 FPS on a T4 GPU.
RT-DETR-R101 achieves 54.3% AP on COCO and 74 FPS on a T4 GPU.
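To relate these throughput figures to a per-image time budget, the following small sketch converts FPS to latency. It assumes images are processed one at a time, so that latency is simply the reciprocal of FPS; the helper name `fps_to_latency_ms` is introduced here for illustration.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Convert throughput (frames per second) to per-image latency in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# FPS values as reported in the statistics above (T4 GPU).
reported_fps = {"RT-DETR-R50": 108.0, "RT-DETR-R101": 74.0}

# Assumption: one image per forward pass, so latency = 1 / FPS.
latency_ms = {name: fps_to_latency_ms(fps) for name, fps in reported_fps.items()}

for name, ms in latency_ms.items():
    print(f"{name}: {ms:.2f} ms per image")
```

Under that assumption, 108 FPS corresponds to roughly 9.26 ms per image and 74 FPS to roughly 13.51 ms per image.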
Quotes
"RT-DETR, the first real-time end-to-end object detector to our best knowledge that addresses the above dilemma."
"RT-DETR achieves an ideal trade-off between the speed and accuracy."