Abstract: The single shot multibox detector (SSD) combines good detection accuracy with high speed and has made great progress in many respects. However, it detects small objects poorly because it does not make full use of high-level semantic information. To address this problem, the VGG backbone is replaced with a residual network (ResNet), feature fusion is used to enhance the semantic information of the shallow feature maps, and an attention module is introduced to retain more object feature information and suppress irrelevant information. Together, these changes improve the detection of small objects. Experiments on the PASCAL VOC2007 dataset confirm the validity of the proposed algorithm: its mean average precision (mAP) is 80.2%, which is higher than that of other improved SSD algorithms. Although adding the feature fusion and attention modules reduces the detection speed, the algorithm is still faster than the deconvolutional single shot detector (DSSD).
Keywords: object detection; single shot multibox detector (SSD); residual network (ResNet); feature fusion; attention mechanism