Abstract:
To take full advantage of the complementary topological information captured by the similarities between samples under different audio features, we propose an audio classification model based on similarity fusion and a Graph Convolutional Network (GCN). Two networks, one based on CNN14 and one based on DenseNet, extract features from the input audio and predict its audio class. A similarity network between samples is built from the predicted tag vectors of each network, and the Similarity Network Fusion (SNF) model fuses these similarity networks nonlinearly. The features extracted by DenseNet initialize the node features of the GCN, and the fused similarity network initializes its adjacency matrix. Optimizing the node features with the GCN significantly improves audio classification accuracy. Experimental results show that the proposed model outperforms the baseline model on different audio classification tasks, and that both the SNF-based similarity fusion module and the GCN-based classification module contribute to the improvement in performance.
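The pipeline described above (per-network similarity graphs, SNF fusion, then a GCN over the fused graph) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the scaled exponential kernel, Euclidean distance between tag vectors, the parameters `k`, `mu` and `t`, the two-layer GCN with untrained random weights, and all function names (`affinity`, `full_kernel`, `local_kernel`, `snf`, `gcn_layer`) are assumptions for illustration. The random arrays stand in for the CNN14/DenseNet outputs and are hypothetical.

```python
import numpy as np


def affinity(x, k=5, mu=0.5):
    """Scaled exponential similarity between the rows of x (assumed SNF-style kernel)."""
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    # Mean distance of each sample to its k nearest neighbours (column 0 is itself).
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
    eps = (knn_mean[:, None] + knn_mean[None, :] + d) / 3.0
    return np.exp(-(d ** 2) / (mu * eps + 1e-12))


def full_kernel(w):
    """Off-diagonal entries of each row sum to 1/2; the diagonal is set to 1/2."""
    p = w.copy()
    np.fill_diagonal(p, 0.0)
    p = p / (2.0 * p.sum(axis=1, keepdims=True))
    np.fill_diagonal(p, 0.5)
    return p


def local_kernel(w, k=5):
    """Keep only each sample's k most similar neighbours, then row-normalise."""
    s = np.zeros_like(w)
    for i in range(w.shape[0]):
        nbrs = [j for j in np.argsort(-w[i]) if j != i][:k]
        s[i, nbrs] = w[i, nbrs]
    return s / s.sum(axis=1, keepdims=True)


def snf(ws, k=5, t=10):
    """Fuse several similarity matrices by cross-diffusion (Similarity Network Fusion)."""
    ps = [full_kernel(w) for w in ws]
    ss = [local_kernel(w, k) for w in ws]
    m = len(ws)
    for _ in range(t):
        new = []
        for v in range(m):
            # Diffuse the average of the other views through this view's local kernel.
            others = sum(ps[u] for u in range(m) if u != v) / (m - 1)
            new.append(ss[v] @ others @ ss[v].T)
        normed = [full_kernel(p) for p in new]
        ps = [(q + q.T) / 2.0 for q in normed]  # keep each view symmetric
    return sum(ps) / m


def gcn_layer(a, h, w, relu=True):
    """One GCN step: D^-1/2 (A + I) D^-1/2 H W, with A the fused similarity matrix."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    out = (d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]) @ h @ w
    return np.maximum(out, 0.0) if relu else out


# Toy usage with random stand-ins (hypothetical data, not the paper's).
rng = np.random.default_rng(0)
n, n_classes, feat_dim = 20, 4, 16
tags_cnn14 = rng.random((n, n_classes))              # stand-in: CNN14 predicted tag vectors
tags_densenet = rng.random((n, n_classes))           # stand-in: DenseNet predicted tag vectors
feats_densenet = rng.standard_normal((n, feat_dim))  # stand-in: DenseNet node features

fused = snf([affinity(tags_cnn14), affinity(tags_densenet)])  # initial adjacency matrix
w1 = rng.standard_normal((feat_dim, 8)) * 0.1  # untrained weights, illustration only
w2 = rng.standard_normal((8, n_classes)) * 0.1
hidden = gcn_layer(fused, feats_densenet, w1)
logits = gcn_layer(fused, hidden, w2, relu=False)
preds = logits.argmax(axis=1)
```

In a real setting the GCN weights would be trained on the labelled nodes. The fused matrix is the only place the two networks' views interact: each view's sparse local kernel propagates the other view's global similarities, which is how SNF reinforces neighbourhoods that both views agree on.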