Abstract:
Because the duration and timing of actions vary from video to video, the effectiveness of action recognition depends directly on temporal alignment. This paper presents a few-shot action recognition method based on temporal alignment with a multi-phase attention mechanism. The phased attention mechanism aligns video clips in time more accurately, avoids temporal mismatch at the video stage level, and makes more reasonable use of the temporal information of the actions in a video. Eliminating segment-wise feature pairs with low similarity scores reduces interference from non-action segments and improves the accuracy of few-shot action recognition. Training follows the c-way k-shot meta-learning protocol. Experiments on the UCF-101 and Kinetics datasets verify the effectiveness of the proposed method in comparison with related advanced methods.
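
As an illustration of the c-way k-shot protocol mentioned above, the following is a minimal sketch of how one training episode might be sampled. It is not the authors' implementation: the function name `sample_episode`, the `n_query` parameter (query videos per class), and the representation of the dataset as a flat list of class labels are all assumptions made for this example.

```python
import random
from collections import defaultdict


def sample_episode(labels, c_way, k_shot, n_query, rng=None):
    """Sample one c-way k-shot episode (hypothetical helper, not from the paper).

    labels: list where labels[i] is the class label of video i.
    Returns (support, query), each a list of (video_index, class_label) pairs.
    """
    rng = rng or random.Random()

    # Group video indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # A class is usable only if it has enough videos for support and query.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    if len(eligible) < c_way:
        raise ValueError("not enough classes with k_shot + n_query videos")

    support, query = [], []
    for c in rng.sample(eligible, c_way):
        # Draw disjoint support and query videos for this class.
        chosen = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, c) for i in chosen[:k_shot]]
        query += [(i, c) for i in chosen[k_shot:]]
    return support, query
```

For example, `sample_episode(labels, c_way=5, k_shot=1, n_query=3)` would yield 5 support videos (one per class) and 15 query videos; the model is then trained to classify each query video against the support set of that episode.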