VIDEO BEHAVIOR RECOGNITION BASED ON DYNAMIC SPATIOTEMPORAL INFORMATION FUSION

Abstract: Video data contain complex and redundant information along the spatial and temporal dimensions. To address this problem, we propose a motion module that uses spatiotemporal features to compute the spatiotemporal differences between pixel-level features. The dynamic spatiotemporal differences are processed in two branches: one corrects the spatiotemporal displacement in the feature differences between adjacent frames, and the other captures contextual information over the same time difference. Within the current time difference, the probability distribution of the pixels that exhibit spatiotemporal differences is modeled. The results show that the motion module improves video recognition performance with minimal impact on computational cost (FLOPs) and parameter count, and its effectiveness and efficiency are verified on public datasets.
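The core idea of differencing adjacent frames and modeling a probability distribution over pixels can be illustrated with a small toy sketch. This is not the motion module itself: the two branches are not implemented, the function names are hypothetical, and the use of a softmax over absolute differences is an assumption made only for illustration, since the abstract does not specify how the distribution is modeled.

```python
import math


def temporal_differences(frames):
    """Element-wise difference between each pair of adjacent frames."""
    return [
        [curr_v - prev_v for prev_v, curr_v in zip(prev, curr)]
        for prev, curr in zip(frames, frames[1:])
    ]


def softmax(values):
    """Numerically stable softmax over a flat list of numbers."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def motion_distribution(frames):
    """For each adjacent-frame pair, map absolute differences to a
    probability distribution over pixel positions (assumed: softmax)."""
    return [
        softmax([abs(d) for d in diff])
        for diff in temporal_differences(frames)
    ]


# Three toy "frames", each a flattened list of four per-pixel feature values.
frames = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 1.0],
]
diffs = temporal_differences(frames)
probs = motion_distribution(frames)
```

In this sketch, each distribution sums to one and places the most mass on the pixel that changed most between the two frames, which is the kind of signal a motion-oriented module could use to weight features.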