SPEAKER IDENTIFICATION RESEARCH BASED ON CONDITIONAL WASSERSTEIN GENERATIVE ADVERSARIAL NETWORKS

Abstract: In low-resource scenarios, traditional speaker identification methods cannot extract enough effective information to train the network, so the model overfits. Inspired by the success of generative adversarial networks (GANs) in the image domain, we propose a speaker identification method based on a conditional Wasserstein GAN (C-WGAN). The method feeds the FBANK features of real samples to the generator as a condition, which controls the generation of the specified simulated samples, and uses the Wasserstein distance to measure the distance between the two speech feature distributions, which stabilizes training and avoids mode collapse. Experimental results show that the method reduces the classification error rate (CER) to 1.96%, a reduction of 67.2% and 53.9% relative to the x-vector and CNN baselines, respectively. At low sampling rates, its identification accuracy also remains highly competitive.
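As a rough illustration of the Wasserstein distance mentioned in the abstract, the sketch below computes it for two one-dimensional empirical samples. It is not the paper's implementation: a C-WGAN estimates the distance in a high-dimensional feature space with a trained critic network, whereas this closed form holds only for equal-size 1-D samples. The function name `wasserstein_1d` and the sample values are hypothetical and chosen for illustration only.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-size 1-D samples, the optimal transport plan pairs the
    # sorted values, so the distance is the mean absolute difference.
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical values standing in for one feature dimension (not from the paper).
real = [0.0, 1.0, 2.0, 3.0]
fake_far = [x + 2.0 for x in real]    # simulated samples far from the real ones
fake_near = [x + 0.5 for x in real]   # simulated samples closer to the real ones

d_far = wasserstein_1d(real, fake_far)
d_near = wasserstein_1d(real, fake_near)
print(d_far, d_near)
```

The distance shrinks smoothly as the simulated samples move toward the real ones, even when the two sets do not overlap. That smooth signal is the usual argument for why a Wasserstein objective can give a generator more stable training than a saturating divergence.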

     
