JOURNAL OF HEFEI UNIVERSITY OF TECHNOLOGY (NATURAL SCIENCE)
Vol. 46 No. 11
Nov. 2023

DOI:10.3969/j.issn.1003-5060.2023.11.021

Research on class imbalance problem based on cost-sensitive neural network ensemble model

ZHANG Junjie, CAO Li

(School of Mathematics, Hefei University of Technology, Hefei 230601, China)

Abstract

When handling class-imbalanced data, traditional classification models tend to favor the majority class, which degrades classification performance. To address this, the paper approaches the problem from two directions, data sampling and model selection, and proposes a cost-sensitive neural network ensemble (CSNN_Ensemble) model. First, several training data sets are obtained by random undersampling. Second, a back propagation (BP) neural network is trained on each training data set and combined with the cost matrix to construct several cost-sensitive neural networks. Finally, the cost-sensitive neural networks serve as the base learners of a parallel ensemble model, whose final decision is made by voting. Experimental results show that the model achieves superior performance on three metrics, $ F_{1} $ value, AUC value, and expected total cost, and exhibits a degree of robustness.

Keywords

class imbalance; random undersampling; cost-sensitive neural network (CSNN); ensemble model; Friedman test

CLC number: TP181

Document code: A

Article ID: 1003-5060(2023)11-1573-07

Received: 2022-03-09

Revised: 2022-03-18

Funding: National Natural Science Foundation of China (41972304)
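
The three steps summarized in the abstract (random undersampling, cost-sensitive base learners, and voting) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the cost matrix is applied through a minimum-expected-cost decision rule, the vote is a simple majority, and a nearest-centroid scorer (the hypothetical `CentroidModel`) stands in for the BP neural network so that the example needs only the standard library. All function names are illustrative.

```python
import random
from collections import Counter


def undersample(X, y, rng):
    """Randomly keep as many samples per class as the smallest class has."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    Xs, ys = [], []
    for label, rows in by_class.items():
        for xi in rng.sample(rows, n_min):
            Xs.append(xi)
            ys.append(label)
    return Xs, ys


class CentroidModel:
    """Stand-in for a BP neural network (assumption): scores classes by centroid distance."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict_proba(self, x):
        # Inverse squared distance to each centroid, normalized to sum to 1.
        w = {c: 1.0 / (1e-9 + sum((a - b) ** 2 for a, b in zip(x, mu)))
             for c, mu in self.centroids.items()}
        total = sum(w.values())
        return {c: v / total for c, v in w.items()}


def min_cost_label(proba, cost):
    """Return the label with the lowest expected cost; cost[true][predicted]."""
    labels = list(proba)
    return min(labels, key=lambda pred: sum(proba[t] * cost[t][pred] for t in labels))


def fit_ensemble(X, y, n_models=5, seed=0):
    """Train one base learner per independently undersampled training set."""
    rng = random.Random(seed)
    return [CentroidModel().fit(*undersample(X, y, rng)) for _ in range(n_models)]


def predict(models, x, cost):
    """Majority vote over the cost-sensitive decisions of the base learners."""
    votes = Counter(min_cost_label(m.predict_proba(x), cost) for m in models)
    return votes.most_common(1)[0][0]
```

With a cost matrix such as `{0: {0: 0, 1: 1}, 1: {0: 5, 1: 0}}`, which penalizes a missed minority sample five times as heavily as a false alarm, `min_cost_label` can choose the minority class even when its estimated probability is below 0.5. That shift is the sense in which each base learner in this sketch is cost-sensitive.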