Abstract:
Object detection algorithms based on deep convolutional neural networks usually have many parameters and complex structures, and they struggle to extract global feature information. To address this, we propose a lightweight object detection model based on MobileViT that draws on results from the YOLO family of algorithms. MobileViT and SimPPF modules form the backbone network, a PAN topology performs multi-scale feature fusion, and a Decoupled Head improves detection accuracy. On VOC07+12, the proposed model achieves a detection accuracy of 85.5% AP with a small parameter count, and it runs at 80.3 FPS on an NVIDIA RTX 3060 GPU. The experimental results show that the proposed algorithm has fewer parameters and higher detection accuracy than other object detection algorithms of similar scale.
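The abstract does not define SimPPF. Assuming it follows the SPPF design from YOLOv5 and its simplified SimSPPF variant in YOLOv6 (an assumption, not confirmed here), its key idea is that chaining one 5×5 stride-1 max pool three times reproduces the 5×5, 9×9 and 13×13 receptive fields of the original SPP block at lower cost. The pure-Python sketch below illustrates only this pooling step on a single-channel grid; the 1×1 convolutions, activations and channel concatenation of a real module are omitted, and the function names `max_pool_same` and `sppf_pools` are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def max_pool_same(x: Grid, k: int) -> Grid:
    """Stride-1 k x k max pooling with 'same' output size.

    Cells outside the grid are ignored, which matches padding with -inf.
    """
    r = k // 2
    h, w = len(x), len(x[0])
    out: Grid = []
    for i in range(h):
        row = []
        for j in range(w):
            # Max over the k x k window centred on (i, j), clipped to the grid.
            row.append(max(
                x[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            ))
        out.append(row)
    return out


def sppf_pools(x: Grid, k: int = 5) -> List[Grid]:
    """Apply the same k x k pool three times in sequence (SPPF-style).

    Returns the input plus each pooled map; a real module would
    concatenate these along the channel axis and fuse them with a 1x1 conv.
    """
    p1 = max_pool_same(x, k)
    p2 = max_pool_same(p1, k)
    p3 = max_pool_same(p2, k)
    return [x, p1, p2, p3]
```

With `k=5`, the second and third outputs equal single 9×9 and 13×13 pools of the input, which is why the sequential form can replace the parallel multi-kernel pooling of SPP while reusing intermediate results.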