FARMER LOAN DEFAULT RISK PREDICTION BASED ON IMBALANCED DATA
PROCESSING AND WEIGHTED SOFT VOTING HETEROGENEOUS ENSEMBLE MODEL
-
Abstract
Building a default prediction model for farmers is important for advancing agricultural credit risk management. To address the class imbalance in default data, this paper proposes a weighted soft-voting heterogeneous ensemble model based on indicator optimization and data balancing. Key indicators are selected with support vector machine recursive feature elimination (SVM-RFE), and balanced training samples are constructed with fuzzy C-means clustering and SMOTE. Six base learners are integrated, their soft-voting weights are determined on validation sets, and their predictions are fused to produce the final result. Experiments show that the model achieves higher accuracy than single models, homogeneous ensembles, and other heterogeneous ensembles. Analysis of the linear support vector machine coefficients indicates that agricultural productive income and unpaid loans are positively correlated with default risk, while attention to financial products is negatively correlated with it.
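The fusion step described above can be illustrated with a minimal sketch. The abstract states only that the soft-voting weights are determined on validation sets, so the rule used here (weights proportional to each learner's validation accuracy) is an assumption, not necessarily the paper's scheme. The function names, the three-learner setup (the paper integrates six), and all numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def validation_weights(val_scores: Sequence[float]) -> List[float]:
    """Turn validation scores into weights that sum to 1.

    Assumption: weights are proportional to validation accuracy.
    The abstract does not specify the exact weighting rule.
    """
    total = sum(val_scores)
    if total <= 0:
        raise ValueError("validation scores must sum to a positive value")
    return [s / total for s in val_scores]


def weighted_soft_vote(
    probas: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Fuse per-learner default probabilities into one value per sample.

    probas[k][i] is learner k's predicted default probability for sample i.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per base learner")
    n_samples = len(probas[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probas))
        for i in range(n_samples)
    ]


def predict_labels(fused: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a sample as default (1) when the fused probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in fused]


# Hypothetical validation accuracies for three base learners.
val_scores = [0.80, 0.90, 0.70]

# Hypothetical predicted default probabilities: one row per learner,
# one column per farmer in the test set.
probas = [
    [0.2, 0.7, 0.6],
    [0.1, 0.8, 0.4],
    [0.4, 0.6, 0.5],
]

weights = validation_weights(val_scores)
fused = weighted_soft_vote(probas, weights)
labels = predict_labels(fused)
```

In this sketch the third farmer receives two votes at or above 0.5, yet the fused probability stays just below the threshold because the most accurate learner, which carries the largest weight, predicts 0.4. This is the practical difference between weighted soft voting and a simple majority vote.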
-