Abstract:
Most existing scene recognition methods focus on global features of the scene itself and ignore details such as the contextual relationships between objects and the appearance of the scene, so a single holistic feature rarely yields satisfactory classification results. This paper proposes a scene recognition method that combines object semantic description with texture feature learning. An LSTM network is used to recognize the objects in the scene. A local aggregation descriptor vector built from the semantic information of the scene is used to learn context information. The texture features of the scene describe the image distribution in detail and are fused with the features extracted from multiple models. The proposed method achieves recognition accuracies of 96.06%, 89.35%, and 78.88% on the widely used scene datasets Scene15, MIT67, and SUN397, respectively, which shows that the fused features are complementary and demonstrates the effectiveness of the method.
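To make the aggregation and fusion steps concrete, the following is a minimal sketch and not the paper's implementation. It assumes the local aggregation descriptor behaves like a VLAD-style encoding (residuals of local descriptors summed per nearest codebook center) and that fusion is a plain concatenation of L2-normalized vectors; the abstract specifies neither. The names `vlad_aggregate`, `fuse`, `descriptors`, and `centers` are hypothetical.

```python
import numpy as np


def vlad_aggregate(descriptors, centers):
    """Aggregate local descriptors (n, d) against a codebook (k, d).

    Assumption: VLAD-style encoding. Returns an L2-normalized vector
    of length k * d.
    """
    # Squared distance from every descriptor to every center, shape (n, k).
    diffs = descriptors[:, None, :] - centers[None, :, :]
    nearest = (diffs ** 2).sum(axis=2).argmin(axis=1)

    k, d = centers.shape
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            # Sum of residuals between the assigned descriptors and center i.
            vlad[i] = (members - centers[i]).sum(axis=0)

    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad


def fuse(*features):
    """Assumed fusion rule: L2-normalize each vector, then concatenate."""
    normed = []
    for f in features:
        f = np.asarray(f, dtype=float)
        n = np.linalg.norm(f)
        normed.append(f / n if n > 0 else f)
    return np.concatenate(normed)
```

Under these assumptions, a call such as `fuse(vlad_aggregate(object_descriptors, codebook), texture_vector)` would produce one fused vector per image for a downstream classifier, where `object_descriptors`, `codebook`, and `texture_vector` are placeholder inputs.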