DETECTION OF PORNOGRAPHIC AUDIO EVENTS BASED ON AN IMPROVED MEAN TEACHER MODEL

  • Abstract: To protect the physical and mental health of young people, China is paying increasing attention to the supervision of pornographic content. Traditional pornographic audio detection cannot accurately locate the start and end times of events. To address this problem, we propose an improved mean teacher (teacher-student) model based on semi-supervised learning. Unlabeled, weakly labeled, and strongly labeled data are used together as the training set. A multilayer neural network extracts frame-level and segment-level audio features, and training then iteratively optimizes the frame-level and segment-level classification losses together with the consistency loss between the teacher-student model and the segment classification model. Experiments on a real-world dataset show that, with a time tolerance of 5 s, recall for the pornographic class reaches 94.3% and the F1 score reaches 83.4%.
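The recall and F1 figures depend on how a detected event is matched to an annotated one under the time tolerance. The abstract does not state the matching rule, so the following is only a minimal sketch under assumptions: an event counts as correct when both its onset and its offset fall within the tolerance of a reference event, and matching is greedy and one-to-one. The names `match_events` and `precision_recall_f1` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# An event is an (onset_seconds, offset_seconds) pair for one class.
Event = Tuple[float, float]


def match_events(reference: List[Event], predicted: List[Event],
                 tolerance_s: float = 5.0) -> int:
    """Count true positives with greedy one-to-one matching.

    Assumption: a prediction matches a reference event only if BOTH
    its onset and its offset lie within tolerance_s of the reference.
    """
    used = set()  # indices of predictions already matched
    true_positives = 0
    for ref_on, ref_off in reference:
        for j, (pred_on, pred_off) in enumerate(predicted):
            if j in used:
                continue
            onset_ok = abs(pred_on - ref_on) <= tolerance_s
            offset_ok = abs(pred_off - ref_off) <= tolerance_s
            if onset_ok and offset_ok:
                used.add(j)
                true_positives += 1
                break
    return true_positives


def precision_recall_f1(reference: List[Event], predicted: List[Event],
                        tolerance_s: float = 5.0) -> Tuple[float, float, float]:
    """Event-level precision, recall, and F1 for a single class."""
    tp = match_events(reference, predicted, tolerance_s)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Illustrative values only, not data from the paper.
reference = [(10.0, 30.0), (60.0, 90.0)]
predicted = [(12.0, 33.0), (61.0, 100.0), (150.0, 160.0)]
precision, recall, f1 = precision_recall_f1(reference, predicted, tolerance_s=5.0)
```

In this toy case only the first prediction matches, because the second one ends 10 s after the reference offset. A looser rule, such as checking the onset alone, would yield higher scores on the same predictions, so the tolerance and the matching rule should be reported together.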

     
