Audio Classification Model Based on Similarity Fusion and GCN

Abstract: To take full advantage of the complementary topological structure captured by inter-sample similarities of different audio features, an audio classification model based on similarity fusion and a graph convolutional network (GCN) is proposed. Networks based on CNN14 and DenseNet are used to extract features from the input audio and to predict its class. A similarity network fusion (SNF) model then performs nonlinear fusion of the similarity networks built from the predicted label vectors of the two backbones. The node features of the GCN are initialized with the features extracted by DenseNet, and its adjacency matrix is initialized with the fused similarity network; the GCN then refines the node features to improve classification accuracy. Experimental results show that, on different audio classification tasks, the proposed model achieves higher classification accuracy than the baseline models, and that both the SNF-based similarity fusion module and the GCN-based classification module contribute to the performance improvement.
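The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it builds cosine-similarity matrices from the predicted label vectors of two backbones, fuses them with a simplified SNF step, and feeds DenseNet features together with the normalized fused adjacency into a two-layer GCN. All function names, array shapes, and hyperparameters (the neighbourhood size k, the number of SNF iterations, the hidden width) are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch: simplified SNF fusion of two prediction-similarity matrices,
# followed by a two-layer GCN whose adjacency is the fused similarity network
# and whose node features come from one backbone. Names and sizes are assumptions.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity of row vectors (e.g. predicted label vectors)."""
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    return x @ x.T


def snf(similarities, k=20, iterations=20):
    """Simplified similarity network fusion over a list of n x n similarity matrices."""
    def full_kernel(w):
        # Row-normalized global kernel with 0.5 on the diagonal.
        p = w / (2.0 * (w.sum(axis=1, keepdims=True) - np.diag(w)[:, None]) + 1e-12)
        np.fill_diagonal(p, 0.5)
        return p

    def local_kernel(w, k):
        # Keep only each row's k strongest similarities, then row-normalize.
        s = np.zeros_like(w)
        idx = np.argsort(-w, axis=1)[:, :k]
        rows = np.arange(w.shape[0])[:, None]
        s[rows, idx] = w[rows, idx]
        return s / (s.sum(axis=1, keepdims=True) + 1e-12)

    P = [full_kernel(w) for w in similarities]
    S = [local_kernel(w, k) for w in similarities]
    for _ in range(iterations):
        P_new = []
        for v in range(len(P)):
            # Each view is updated through the average of the other views.
            others = np.mean([P[u] for u in range(len(P)) if u != v], axis=0)
            P_new.append(S[v] @ others @ S[v].T)
        P = [full_kernel((p + p.T) / 2) for p in P_new]  # keep symmetric and normalized
    return np.mean(P, axis=0)


class GCN(nn.Module):
    """Two-layer GCN operating on the fused similarity graph."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, num_classes)

    def forward(self, x, a_hat):
        x = F.relu(a_hat @ self.fc1(x))
        return a_hat @ self.fc2(x)


def normalize_adjacency(a):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCNs."""
    a = a + torch.eye(a.shape[0])
    d_inv_sqrt = torch.diag(a.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt


if __name__ == "__main__":
    n, num_classes, feat_dim = 100, 10, 1024            # illustrative sizes
    preds_cnn14 = np.random.rand(n, num_classes)         # CNN14 predicted label vectors
    preds_densenet = np.random.rand(n, num_classes)      # DenseNet predicted label vectors
    feats_densenet = np.random.rand(n, feat_dim)         # DenseNet features -> node features

    fused = snf([cosine_similarity_matrix(preds_cnn14),
                 cosine_similarity_matrix(preds_densenet)])

    a_hat = normalize_adjacency(torch.tensor(fused, dtype=torch.float32))
    model = GCN(feat_dim, 256, num_classes)
    logits = model(torch.tensor(feats_densenet, dtype=torch.float32), a_hat)
    print(logits.shape)  # (n, num_classes)
```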

     
