Fast T3D Action Recognition Model Based on Video Key Frame Extraction

Abstract: Video-level action recognition suffers from a large volume of input data and slow recognition speed, mainly because such methods must both extract human posture in the spatial dimension and model the association between actions in the temporal dimension. This paper proposes a fast T3D action recognition model based on video key frame extraction. Video key frames are extracted with an improved Superpoint network, reducing the amount of video data. Building on the T3D network, its key module, the variable temporal convolution layer, is decomposed spatiotemporally, which significantly improves computational efficiency. Experiments on the public UCF-101 and HMDB-51 datasets show that the proposed model achieves accuracy close to the original T3D network while recognizing actions twice as fast, making it better suited to practical application scenarios.
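The abstract describes selecting key frames so that only informative frames enter the recognition network. The paper's actual selector is an improved Superpoint network, whose trained weights are not part of this text; the sketch below is a hypothetical stand-in that uses a grayscale intensity histogram as the per-frame descriptor and keeps a frame whenever its descriptor differs sufficiently from the last kept key frame. The descriptor, threshold, and function names are illustrative assumptions, not the paper's method.

```python
import numpy as np

def frame_descriptor(frame):
    """Stand-in descriptor: normalized intensity histogram.
    (The paper would use improved-Superpoint features here.)"""
    hist, _ = np.histogram(frame, bins=32, range=(0, 256))
    return hist / max(hist.sum(), 1)

def select_key_frames(frames, threshold=0.2):
    """Keep frames whose descriptor L1-distance to the last kept
    key frame exceeds `threshold`; the first frame is always kept."""
    keys = [0]
    last = frame_descriptor(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        desc = frame_descriptor(frame)
        if np.abs(desc - last).sum() > threshold:
            keys.append(i)
            last = desc
    return keys

# Toy video: 10 dark frames followed by 10 bright frames.
video = [np.full((8, 8), 10) for _ in range(10)] + \
        [np.full((8, 8), 200) for _ in range(10)]
print(select_key_frames(video))  # -> [0, 10]
```

Only the selected frames (here, one per static segment) would be passed on to the T3D network, which is how the data volume is reduced.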

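The speedup from spatiotemporally decomposing the variable temporal convolution layer can be illustrated by counting weights. The exact decomposition used in the paper is not given in this abstract; a common form of the idea (as in (2+1)D factorizations) replaces a full k_t x k x k 3D convolution with a 1 x k x k spatial convolution followed by a k_t x 1 x 1 temporal convolution. The helper functions and channel choices below are illustrative assumptions.

```python
def conv3d_params(c_in, c_out, kt, k):
    """Weight count of a full 3D convolution (bias ignored)."""
    return c_in * c_out * kt * k * k

def decomposed_params(c_in, c_out, kt, k, c_mid=None):
    """Weight count after splitting into a 1 x k x k spatial
    convolution and a kt x 1 x 1 temporal convolution."""
    if c_mid is None:
        c_mid = c_out  # simple choice for the intermediate channels
    spatial = c_in * c_mid * k * k      # 1 x k x k
    temporal = c_mid * c_out * kt       # kt x 1 x 1
    return spatial + temporal

full = conv3d_params(64, 64, 3, 3)       # 3x3x3 kernel: 110592 weights
split = decomposed_params(64, 64, 3, 3)  # 1x3x3 + 3x1x1: 49152 weights
print(full, split, split / full)         # ratio ~0.44
```

Since FLOPs scale with the weight count at each output position, roughly halving the weights of every decomposed layer is consistent with the reported doubling of recognition speed at near-identical accuracy.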
     
