SPEECH ANTI-SPOOFING MODEL BASED ON RESNEXT WITH ATTENTION

    Abstract: Residual neural networks used for speech spoofing detection have too many hyperparameters and do not make high-frequency features sufficiently salient. To address these problems, a ResNeXt-Attention network (RA-Net) that incorporates an attention mechanism is proposed. RA-Net combines residual connections with grouped convolution, replaces large convolution kernels with a group of small convolution kernels, and uses MFM (Max Feature Map) as a new splicing method. The added attention mechanism learns from the original feature information and thereby reduces the attention paid to edge information. Experiments on the ASVspoof2019 dataset show that, compared with the baseline Gaussian mixture model (GMM), RA-Net reduces the equal error rate (EER) by 4.72 percentage points and 6.23 percentage points; compared with the residual network (Residual Neural Network, ResNet), it reduces the EER by 0.69 percentage points and 0.89 percentage points. These results demonstrate the effectiveness of the model.
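
    The abstract names MFM (Max Feature Map) without defining it. As an illustration only, the sketch below follows the commonly used definition of MFM: the channels are split into two halves and the element-wise maximum of the two halves is kept, which halves the channel count. How RA-Net actually places MFM inside its blocks is not stated in the abstract; the function name `max_feature_map` and the toy input `features` are hypothetical, and plain Python lists stand in for the tensors a real model would use.

```python
def max_feature_map(channels):
    """Max Feature Map over the channel axis (common definition, assumed here).

    `channels` is a list of 2C feature maps, each a list of rows of floats.
    Returns C maps: the element-wise max of channel k and channel k + C.
    """
    if len(channels) % 2 != 0:
        raise ValueError("MFM needs an even number of channels")
    half = len(channels) // 2
    out = []
    for k in range(half):
        first, second = channels[k], channels[k + half]
        out.append([
            [max(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)
        ])
    return out


# Toy input (hypothetical values): four 2x2 channels in, two 2x2 channels out.
features = [
    [[1.0, -2.0], [0.5, 3.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[-1.0, 4.0], [0.7, 2.0]],
    [[5.0, -1.0], [-3.0, 0.1]],
]
reduced = max_feature_map(features)
```

    Because each output element is the larger of two competing activations, MFM acts as a feature selector rather than a thresholding nonlinearity, which is one reason it is often paired with compact anti-spoofing networks.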

     
