SPEAKER IDENTIFICATION RESEARCH BASED ON CONDITIONAL WASSERSTEIN GENERATIVE ADVERSARIAL NETWORKS
Abstract
In low-resource scenarios, traditional speaker identification methods cannot extract enough effective information for network training, which leads to model overfitting. Inspired by the success of generative adversarial networks (GANs) in image processing, we propose a speaker identification method based on a conditional Wasserstein GAN (C-WGAN). Filter-bank (FBANK) features of real samples are fed to the generator as conditional input, controlling the synthesis of specified simulated samples. The Wasserstein distance is used to measure the divergence between speech feature distributions, which stabilizes training and prevents mode collapse. Experiments show that the method achieves a classification error rate (CER) of 1.96%, a relative reduction of 67.2% and 53.9% compared with the x-vector and CNN baselines, respectively. It also remains competitive at low sampling rates.
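The conditioning scheme and Wasserstein objective described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The single-layer generator, the linear critic, the 40-dimensional FBANK size (`N_MELS`), the noise size, and all function names are hypothetical. The Lipschitz constraint a WGAN critic requires (weight clipping or a gradient penalty) is omitted because the abstract does not say which one the method uses.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MELS = 40      # assumed FBANK dimension (not stated in the abstract)
NOISE_DIM = 16   # assumed noise dimension (hypothetical)
BATCH = 8

# Hypothetical single-layer generator weights: [noise, condition] -> FBANK frame.
W_g = rng.normal(scale=0.1, size=(NOISE_DIM + N_MELS, N_MELS))

# Hypothetical linear critic weights: scores [feature, condition] with an
# unbounded real value (no sigmoid: a WGAN critic estimates a distance,
# not a probability of being real).
w_c = rng.normal(scale=0.1, size=(2 * N_MELS,))


def generator(z, cond):
    """Synthesize FBANK-like features conditioned on real-sample features."""
    return np.tanh(np.concatenate([z, cond], axis=1) @ W_g)


def critic(x, cond):
    """Return one real-valued score per (feature, condition) pair."""
    return np.concatenate([x, cond], axis=1) @ w_c


def wgan_losses(real, cond, z):
    fake = generator(z, cond)
    d_real = critic(real, cond)
    d_fake = critic(fake, cond)
    # The critic maximizes E[D(real)] - E[D(fake)], so it minimizes the negative.
    # Under a Lipschitz constraint, that gap serves as the Wasserstein estimate.
    critic_loss = d_fake.mean() - d_real.mean()
    # The generator tries to raise the critic's score on its synthetic samples.
    gen_loss = -d_fake.mean()
    return critic_loss, gen_loss, fake


real = rng.normal(size=(BATCH, N_MELS))
# Assumption: condition features come from real utterances of the target
# speaker; this sketch simply reuses the same batch.
cond = real.copy()
z = rng.normal(size=(BATCH, NOISE_DIM))

critic_loss, gen_loss, fake = wgan_losses(real, cond, z)
print(fake.shape)  # (8, 40)
```

In a full training loop, the critic and generator would be updated alternately on these two losses. The condition vector is concatenated to the inputs of both networks, which is one common way to build a conditional GAN; the paper's exact architecture may differ.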