Res2Net Speaker Identification Based on Channel Gating

Abstract: Current speaker recognition models extract voiceprint features with weak discriminative power and therefore cannot identify speakers accurately. To address this problem, a speaker recognition algorithm based on channel-gated Res2Net (CG-Res2Net) is proposed. Res2Net establishes hierarchical residual connections within a single residual block, which improves the system's ability to extract voiceprint features. A channel gating mechanism is applied between the residually connected feature groups, assigning higher weights to the important channels of the voiceprint features and lower weights to the relatively uninformative ones. Experimental results on VoxCeleb1-test show that CG-Res2Net outperforms Res2Net on both evaluation metrics, EER (Equal Error Rate) and minDCF (Minimum Detection Cost Function). Compared with ResNet, EER and minDCF improve by 38.05% and 17.95%, respectively; compared with SE-Res2Net, they improve by 17.5% and 4.47%, respectively.
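The hierarchical residual connections and the channel gate described in the abstract can be sketched as a NumPy forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the gate's form, so `channel_gate` (a sigmoid over a linear map of each channel's mean over time, applied to the previous group's output before it is added to the next group's input) is a hypothetical design. The group convolutions are 1-D over time with kernel size 3 rather than Res2Net's 2-D 3×3 convolutions, and the 1×1 convolutions and batch normalization of a full Res2Net bottleneck are omitted. All function names, the weight initialization, and `scale=4` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1d_same(x, w):
    """Convolution over time with zero 'same' padding (cross-correlation,
    as in deep-learning frameworks). x: (C_in, T), w: (C_out, C_in, k), k odd."""
    k = w.shape[2]
    pad = k // 2
    t = x.shape[1]
    xp = np.pad(x, ((0, 0), (pad, pad)))
    # Stack the k shifted views of the input: shape (k, C_in, T)
    windows = np.stack([xp[:, j:j + t] for j in range(k)])
    return np.einsum("oik,kit->ot", w, windows)


def channel_gate(y, gate_w, gate_b):
    """HYPOTHETICAL gate: one weight in (0, 1) per channel, computed from
    the channel's mean over time, then used to rescale that channel."""
    descriptor = y.mean(axis=1)                 # (C,) per-channel summary
    g = sigmoid(gate_w @ descriptor + gate_b)   # (C,) values in (0, 1)
    return g[:, None] * y


def init_params(channels, scale=4, k=3, seed=0):
    """Random weights for one block (hypothetical initialization)."""
    rng = np.random.default_rng(seed)
    width = channels // scale
    params = {"conv": {}, "gate": {}}
    for i in range(1, scale):
        params["conv"][i] = rng.normal(0.0, 0.1, size=(width, width, k))
    for i in range(2, scale):
        params["gate"][i] = (rng.normal(0.0, 0.1, size=(width, width)), np.zeros(width))
    return params


def cg_res2net_block(x, params, scale=4):
    """x: (channels, frames). Split channels into `scale` groups and apply
    (groups numbered from 1) y_1 = x_1, y_2 = K_2(x_2),
    y_i = K_i(x_i + gate(y_{i-1})) for i > 2."""
    assert x.shape[0] % scale == 0, "channels must be divisible by scale"
    xs = np.split(x, scale, axis=0)
    ys = [xs[0]]                                # first group passes through unchanged
    for i in range(1, scale):
        inp = xs[i]
        if i > 1:                               # gated hierarchical link from previous group
            gate_w, gate_b = params["gate"][i]
            inp = inp + channel_gate(ys[-1], gate_w, gate_b)
        ys.append(np.maximum(conv1d_same(inp, params["conv"][i]), 0.0))  # K_i + ReLU
    return np.concatenate(ys, axis=0) + x       # outer residual connection


x = np.random.default_rng(1).normal(size=(32, 200))   # 32 channels, 200 frames
out = cg_res2net_block(x, init_params(32))
print(out.shape)  # (32, 200)
```

Because each gate value lies strictly between 0 and 1, the gate can only rescale the previous group's channels relative to one another before they are passed on; this is one simple way to give "higher and lower weights" to individual channels, and other gate forms are equally plausible readings of the abstract.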

     
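EER and minDCF, the two metrics reported in the abstract, are both computed from the scores of target (same-speaker) and non-target (different-speaker) trials. The sketch below sweeps the decision threshold over the sorted scores, takes the EER at the point where the miss and false-alarm rates are closest, and normalizes the minimum detection cost by the cost of the best trivial system. The abstract does not state the cost parameters, so `p_target=0.01` and `c_miss = c_fa = 1` are assumptions (a common VoxCeleb setting). The EER is read at the nearest threshold without interpolation, tied scores get no special treatment, and the function names and toy scores are hypothetical.

```python
import numpy as np


def det_curve(target_scores, nontarget_scores):
    """Miss and false-alarm rates for every threshold between sorted scores.
    Entry j corresponds to rejecting the j lowest-scoring trials."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)), np.zeros(len(nontarget_scores))])
    labels = labels[np.argsort(scores, kind="mergesort")]
    n_tar = labels.sum()
    n_non = len(labels) - n_tar
    p_miss = np.concatenate([[0.0], np.cumsum(labels) / n_tar])            # rejected targets
    p_fa = np.concatenate([[1.0], 1.0 - np.cumsum(1.0 - labels) / n_non])  # accepted non-targets
    return p_miss, p_fa


def compute_eer(p_miss, p_fa):
    """Equal error rate, approximated where the two error rates are closest."""
    idx = np.argmin(np.abs(p_miss - p_fa))
    return float((p_miss[idx] + p_fa[idx]) / 2.0)


def compute_min_dcf(p_miss, p_fa, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost (ASSUMED parameters, see lead-in)."""
    dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    c_default = min(c_miss * p_target, c_fa * (1.0 - p_target))  # best trivial system
    return float(np.min(dcf) / c_default)


targets = np.array([0.9, 0.8, 0.7, 0.4])      # toy same-speaker scores
nontargets = np.array([0.6, 0.3, 0.2, 0.1])   # toy different-speaker scores
pm, pf = det_curve(targets, nontargets)
print(compute_eer(pm, pf), compute_min_dcf(pm, pf))  # 0.25 0.25
```

Lower is better for both metrics, so the improvements quoted in the abstract correspond to reductions in EER and minDCF.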
