Farmer Loan Default Risk Prediction Based on Imbalanced Data Processing and a Weighted Soft-Voting Heterogeneous Ensemble Model

Abstract: Building default prediction models for farmers is important for deepening agricultural credit risk management. To address the class imbalance in default data, a weighted soft-voting heterogeneous ensemble model based on indicator optimization and imbalanced data processing is proposed. Key indicators are selected with support vector machine recursive feature elimination (SVM-RFE), and balanced training samples are constructed by combining fuzzy C-means (FCM) clustering with the SMOTE oversampling technique. Six base learners are integrated: their soft-voting weights are determined on a validation set, and their predictions are fused to obtain the final prediction. Experiments show that the proposed model achieves higher accuracy than single models, homogeneous ensembles, and other heterogeneous ensemble models. Analysis of the coefficient weights of a linear support vector machine shows that indicators such as agricultural productive income and outstanding loans are positively correlated with default risk, whereas indicators such as attention to financial products are negatively correlated.
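The abstract names SVM recursive feature elimination (SVM-RFE) for indicator selection but does not state the solver, the number of indicators kept, or the elimination step. The following is a minimal sketch of the standard SVM-RFE loop under those gaps. A soft-margin linear SVM is fitted by full-batch subgradient descent on the regularized hinge loss (a hand-rolled stand-in for a library solver), and the indicator with the smallest squared weight is dropped each round. The helper names `train_linear_svm` and `svm_rfe`, the hyperparameters and the toy data are all hypothetical. With defaults coded as +1, the sign of each surviving weight is what a coefficient analysis of a linear SVM reads off: positive weights push toward default, negative weights away from it.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Hypothetical helper: soft-margin linear SVM fitted by full-batch
    subgradient descent on (lam/2)*||w||^2 + mean hinge loss.
    Labels are +1 (default) or -1 (no default)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active: add its subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X: Sequence[Sequence[float]], y: Sequence[int],
            n_keep: int) -> List[int]:
    """Hypothetical helper: refit the SVM on the surviving indicators and
    drop the one with the smallest squared weight until n_keep remain."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[worst]
    return remaining


# Hypothetical toy data: indicator 0 separates the classes, while
# indicators 1 and 2 are balanced noise uncorrelated with the label.
X = [[2 * t, a, c] for t in (1, -1) for a in (-1, 1) for c in (-1, 1)]
y = [t for t in (1, -1) for a in (-1, 1) for c in (-1, 1)]
print(svm_rfe(X, y, n_keep=1))  # → [0]
```

In practice, scikit-learn's `RFE` wrapped around `SVC(kernel="linear")` runs the same elimination loop. Indicators should be standardized beforehand, since raw weight magnitudes are otherwise not comparable across indicators measured in different units.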

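The abstract states that FCM clustering and SMOTE are combined to build balanced training samples, but not how they are combined. One plausible reading, assumed here rather than taken from the paper, is cluster-wise SMOTE: the minority (default) samples are clustered with FCM, each sample is hard-assigned to its highest-membership cluster, and each synthetic sample is interpolated between a seed sample and one of its k nearest neighbours from the same cluster, so that new points do not bridge separate minority regions. The helper names, the cluster count `c`, the fuzzifier `m = 2`, `k = 3` and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fuzzy_c_means(X: Sequence[Sequence[float]], c: int, m: float = 2.0,
                  iters: int = 100, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Standard FCM: alternate membership-weighted centers and the
    membership update u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    U = []
    for _ in range(n):  # random initial memberships, each row summing to 1
        row = [rng.random() + 1e-9 for _ in range(c)]
        U.append([u / sum(row) for u in row])
    centers: Matrix = []
    for _ in range(iters):
        centers = []
        for j in range(c):
            wts = [U[i][j] ** m for i in range(n)]
            tot = sum(wts)
            centers.append([sum(wts[i] * X[i][k] for i in range(n)) / tot
                            for k in range(dim)])
        for i in range(n):
            # floor distances so a point sitting on a center cannot divide by 0
            dist = [max(math.dist(X[i], ctr), 1e-12) for ctr in centers]
            for j in range(c):
                U[i][j] = 1.0 / sum((dist[j] / dist[k]) ** (2 / (m - 1))
                                    for k in range(c))
    return U, centers


def cluster_smote(minority: Sequence[Sequence[float]], n_new: int,
                  c: int = 2, k: int = 3, seed: int = 0) -> Matrix:
    """Hypothetical FCM + SMOTE combination (needs >= 2 minority samples):
    interpolate only between samples sharing the same hard FCM cluster."""
    U, _ = fuzzy_c_means(minority, c, seed=seed)
    label = [max(range(c), key=lambda j: row[j]) for row in U]
    rng = random.Random(seed)
    synthetic: Matrix = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        mates = [t for t in range(len(minority))
                 if t != i and label[t] == label[i]]
        if not mates:  # singleton cluster: fall back to all other samples
            mates = [t for t in range(len(minority)) if t != i]
        mates.sort(key=lambda t: math.dist(minority[i], minority[t]))
        nn = minority[rng.choice(mates[:k])]  # one of the k nearest mates
        gap = rng.random()  # SMOTE: x_new = x + gap * (x_nn - x)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], nn)])
    return synthetic


# Hypothetical toy minority class made of two well-separated groups.
minority = [[0, 0], [0, 1], [1, 0], [1, 1],
            [10, 10], [10, 11], [11, 10], [11, 11]]
new_points = cluster_smote(minority, n_new=6)
```

To balance a real training set, `n_new` would be the gap between the majority and minority counts. Oversampling should touch only the training split; the validation and test sets keep their original class ratio so that reported performance reflects the real default rate.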
     

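For the fusion step, the abstract says only that the soft-voting weights of the six base learners are determined on a validation set. A minimal sketch follows. It assumes that each learner outputs a default probability and that its weight is its validation accuracy normalized to sum to 1. The weighting rule, the 0.5 threshold and the three placeholder model names are hypothetical; the paper's six learners are not named here.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(probs: Sequence[float], labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """Share of samples whose thresholded probability matches the label."""
    hits = sum((p >= threshold) == bool(t) for p, t in zip(probs, labels))
    return hits / len(labels)


def validation_weights(val_probs: Dict[str, Sequence[float]],
                       val_labels: Sequence[int]) -> Dict[str, float]:
    """Assumed rule: weight = validation accuracy, normalized to sum to 1."""
    scores = {name: accuracy(p, val_labels) for name, p in val_probs.items()}
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


def weighted_soft_vote(test_probs: Dict[str, Sequence[float]],
                       weights: Dict[str, float],
                       threshold: float = 0.5) -> Tuple[List[float], List[int]]:
    """Fuse per-model default probabilities as a weighted average."""
    n = len(next(iter(test_probs.values())))
    fused = [sum(weights[name] * test_probs[name][i] for name in weights)
             for i in range(n)]
    return fused, [int(p >= threshold) for p in fused]


# Hypothetical validation outputs of three placeholder base learners.
val_labels = [1, 0, 1, 0]
val_probs = {"model_a": [0.9, 0.2, 0.8, 0.3],
             "model_b": [0.6, 0.6, 0.4, 0.4],
             "model_c": [0.7, 0.1, 0.3, 0.2]}
weights = validation_weights(val_probs, val_labels)  # accuracies 1.0, 0.5, 0.75
test_probs = {"model_a": [0.8, 0.3],
              "model_b": [0.4, 0.9],
              "model_c": [0.6, 0.2]}
fused, preds = weighted_soft_vote(test_probs, weights)
print(preds)  # → [1, 0]
```

Because default data are imbalanced, recall, F1 or AUC on the validation set may be a better basis for the weights than plain accuracy; only the `accuracy` function would need to change.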