Abstract:
To address the problem that skeleton-based action recognition algorithms neither fully extract the motion information of the human skeleton nor exploit semantic information, a dual-stream action recognition model with embedded semantic information is proposed. In this model, a semantic embedding module embeds joint-name semantics and velocity semantics into the skeleton data, and a distance stream and a joint stream extract different features. The distance stream takes as input a distance graph built from the relative positions of the joints during motion and extracts distance information with a graph convolutional network. The joint stream takes as input a joint graph built from the structure of the human skeleton and extracts structural features with graph convolution. The distance information and the structural features complement each other in the final prediction. The model reaches a recognition accuracy of 96.45% on the NTU-RGBD dataset and 38.01% on the Kinetics dataset.
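
The difference between the joint graph and the distance graph can be illustrated with a minimal sketch. The abstract does not specify how the distance graph is built, so the rule used here (link each joint to its k nearest joints by mean Euclidean distance over the sequence) is an assumption, as are the function names and the toy four-joint skeleton. The sketch only builds the two adjacency matrices that a graph convolutional network would consume; it is not the model itself.

```python
import math

def joint_adjacency(num_joints, bones):
    """Adjacency (with self-loops) that follows the physical skeleton."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)]
           for i in range(num_joints)]
    for a, b in bones:
        adj[a][b] = adj[b][a] = 1.0
    return adj

def mean_pairwise_distance(frames):
    """frames: list of frames, each a list of (x, y, z) joint coordinates."""
    n = len(frames[0])
    total = [[0.0] * n for _ in range(n)]
    for frame in frames:
        for i in range(n):
            for j in range(n):
                total[i][j] += math.dist(frame[i], frame[j])
    return [[d / len(frames) for d in row] for row in total]

def distance_adjacency(frames, k):
    """Assumed rule (not from the paper): link each joint to the k joints
    that are closest to it on average over the sequence."""
    dist = mean_pairwise_distance(frames)
    n = len(dist)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j in sorted(others, key=lambda j: dist[i][j])[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj

# Hypothetical 4-joint chain 0-1-2-3 whose two end joints move close together.
bones = [(0, 1), (1, 2), (2, 3)]
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.2, 0.0), (0.0, 0.5, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.2, 0.0), (0.0, 0.3, 0.0)],
]

joint_adj = joint_adjacency(4, bones)
dist_adj = distance_adjacency(frames, k=1)
```

In this toy sequence joints 0 and 3 are not connected by a bone, so the joint graph has no edge between them, whereas the distance graph links them because they stay close during the motion. This is the kind of complementary information that the two streams are described as providing.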