Abstract
The SimpleBaseline model achieves strong human pose estimation performance with a simple network structure, but it lacks fusion of information across layers and across spatial scales. In this paper, we propose DLSAnet, which fuses layer and spatial information effectively. DLSAnet uses DLA as its backbone, which has shown excellent feature extraction capability in object detection. In addition, a modified spatial pyramid pooling (SPP) module is introduced to pool and concatenate multi-scale local-region features, allowing the network to learn object features more comprehensively. Specifically, a four-branch SPP module is used in place of a single-branch SPP module connected by a single skip layer. This design effectively alleviates the slow decrease of the loss late in training. Experiments show that DLSAnet achieves better accuracy.
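To make the idea of pooling and concatenating multi-scale local features concrete, the following is a minimal pure-Python sketch of a generic SPP-style block. It is not the DLSAnet module itself. The function names `max_pool_same` and `spp_block`, the default kernel sizes (5, 9, 13), and the choice of one identity branch plus three pooled branches are all assumptions made for illustration, borrowed from common SPP variants rather than taken from this paper.

```python
def max_pool_same(feature, k):
    """Stride-1 max pooling over a k x k window on a 2D map.

    Windows are clipped at the borders, so the output keeps the
    input's height and width.
    """
    h, w = len(feature), len(feature[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                feature[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp_block(channels, kernel_sizes=(5, 9, 13)):
    """Concatenate the input with max-pooled copies at several scales.

    channels: list of 2D maps (C maps of size H x W).
    Returns (1 + len(kernel_sizes)) * C maps of the same H x W.
    The kernel sizes are an assumed, illustrative choice.
    """
    out = list(channels)  # identity branch keeps the original features
    for k in kernel_sizes:
        out.extend(max_pool_same(c, k) for c in channels)
    return out


# Usage: one 4 x 4 channel holding the values 0..15 in row-major order.
fmap = [[4 * i + j for j in range(4)] for i in range(4)]
fused = spp_block([fmap], kernel_sizes=(3, 5))
```

Because every branch preserves the spatial size, the branches can be stacked along the channel axis; a larger kernel summarizes a wider neighborhood around each position, which is what lets the fused output carry several local scales at once.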
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.