Journal of Systems Engineering and Electronics, 2019, Vol. 30, Issue (6): 1182-1191. doi: 10.21629/JSEE.2019.06.12
Section: Systems Engineering
Xiaolong XU¹,*, Wen CHEN², Yanfei SUN³
Received: 2018-06-25
Online: 2019-12-20
Published: 2019-12-25
Contact: Xiaolong XU
E-mail: xuxl@njupt.edu.cn; 1216043012@njupt.edu.cn; sunyanfei@njupt.edu.cn
About the author:
XU Xiaolong was born in 1977. He received his B.S. degree in computer and its applications, his M.S. degree in computer software and theory, and his Ph.D. degree in communications and information systems from Nanjing University of Posts & Telecommunications, Nanjing, China, in 1999, 2002 and 2008, respectively. From 2011 to 2013 he worked as a postdoctoral researcher at the Station of Electronic Science and Technology, Nanjing University of Posts & Telecommunications. He is currently a professor in the College of Computer, Nanjing University of Posts & Telecommunications, and a senior member of the China Computer Federation. His current research interests include cloud computing and big data, mobile computing, intelligent agents and information security.
Xiaolong XU, Wen CHEN, Yanfei SUN. Over-sampling algorithm for imbalanced data classification[J]. Journal of Systems Engineering and Electronics, 2019, 30(6): 1182-1191.
Table 3  Precision, Recall and F-value of the minority class on the Pima dataset

| Method | Over-sampling rate/% | Precision | Recall | F-value |
|---|---|---|---|---|
| Original | - | 0.606 | 0.563 | 0.584 |
| SMOTE | 100 | 0.565 | 0.737 | 0.64 |
| SMOTE | 200 | 0.547 | 0.768 | 0.639 |
| SMOTE | 300 | 0.531 | 0.787 | 0.634 |
| SMOTE | 400 | 0.533 | 0.813 | 0.643 |
| SMOTE | 500 | 0.522 | 0.809 | 0.634 |
| DSMOTE | 100 | 0.574 | 0.737 | 0.646 |
| DSMOTE | 200 | 0.550 | 0.795 | 0.65 |
| DSMOTE | 300 | 0.537 | 0.815 | 0.647 |
| DSMOTE | 400 | 0.529 | 0.843 | 0.65 |
| DSMOTE | 500 | 0.515 | 0.856 | 0.643 |
| Borderline-SMOTE | 100 | 0.545 | 0.763 | 0.636 |
| Borderline-SMOTE | 200 | 0.524 | 0.789 | 0.629 |
| Borderline-SMOTE | 300 | 0.509 | 0.791 | 0.619 |
| Borderline-SMOTE | 400 | 0.513 | 0.814 | 0.629 |
| Borderline-SMOTE | 500 | 0.504 | 0.803 | 0.643 |
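The tables report an F-value without stating its formula. Assuming it is the usual balanced F-measure (F1), the harmonic mean F = 2PR/(P + R) of precision P and recall R, the short sketch below recomputes it for a few Pima rows from Table 3. The function name `f_value` is hypothetical and only illustrates the metric; it is not taken from the authors' code.

```python
def f_value(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: no correct minority predictions at all
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from Table 3 (Pima dataset).
rows = {
    "Original": (0.606, 0.563),
    "SMOTE, 100%": (0.565, 0.737),
    "DSMOTE, 500%": (0.515, 0.856),
}

for name, (p, r) in rows.items():
    print(f"{name}: F = {f_value(p, r):.3f}")
# Original: F = 0.584
# SMOTE, 100%: F = 0.640
# DSMOTE, 500%: F = 0.643
```

Because the harmonic mean is dominated by the smaller of P and R, the large recall gains from over-sampling only raise F while precision does not fall too far, which is the trade-off visible across the rows of the table.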
Table 4  Precision, Recall and F-value of the minority class on the Breast-w dataset

| Method | Over-sampling rate/% | Precision | Recall | F-value |
|---|---|---|---|---|
| Original | - | 0.910 | 0.892 | 0.901 |
| SMOTE | 100 | 0.906 | 0.939 | 0.922 |
| SMOTE | 200 | 0.905 | 0.946 | 0.925 |
| SMOTE | 300 | 0.906 | 0.943 | 0.924 |
| SMOTE | 400 | 0.906 | 0.954 | 0.929 |
| SMOTE | 500 | 0.909 | 0.959 | 0.933 |
| DSMOTE | 100 | 0.913 | 0.953 | 0.932 |
| DSMOTE | 200 | 0.906 | 0.954 | 0.929 |
| DSMOTE | 300 | 0.909 | 0.953 | 0.930 |
| DSMOTE | 400 | 0.909 | 0.954 | 0.931 |
| DSMOTE | 500 | 0.910 | 0.963 | 0.935 |
| Borderline-SMOTE | 100 | 0.907 | 0.952 | 0.929 |
| Borderline-SMOTE | 200 | 0.906 | 0.950 | 0.927 |
| Borderline-SMOTE | 300 | 0.906 | 0.948 | 0.927 |
| Borderline-SMOTE | 400 | 0.905 | 0.946 | 0.925 |
| Borderline-SMOTE | 500 | 0.913 | 0.954 | 0.933 |
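The values 100 to 500 listed next to each method are read here as the SMOTE over-sampling rate N%, following the convention of the original SMOTE paper: each minority sample yields N/100 synthetic samples, each placed at a random point on the segment between that sample and one of its k nearest minority neighbours. This is an assumption about the tables, and the code is a minimal pure-Python sketch of standard SMOTE only, not of the authors' DSMOTE. The names `smote` and `k_nearest`, the default k = 5 and the restriction to multiples of 100 are hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def k_nearest(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote(minority: Sequence[Point], rate: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Generate synthetic minority samples with standard SMOTE (sketch).

    rate is the over-sampling rate in percent; this sketch assumes it is a
    multiple of 100, so each minority sample yields rate // 100 new samples.
    """
    if rate % 100 != 0:
        raise ValueError("this sketch only handles rates that are multiples of 100")
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for i, x in enumerate(minority):
        neighbours = k_nearest(minority, i, k)
        for _ in range(rate // 100):
            nn = minority[rng.choice(neighbours)]
            gap = rng.random()  # uniform in [0, 1)
            # New point on the line segment between x and the chosen neighbour.
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_samples = smote(minority, rate=200, k=3)
print(len(new_samples))  # 8: two synthetic samples per minority sample
```

Since every synthetic point is a convex combination of two real minority samples, SMOTE never leaves the region spanned by the minority class; raising the rate from 100% to 500% only adds more points inside that region.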
Table 5  Precision, Recall and F-value of the minority class on the Vehicle dataset

| Method | Over-sampling rate/% | Precision | Recall | F-value |
|---|---|---|---|---|
| Original | - | 0.874 | 0.874 | 0.874 |
| SMOTE | 100 | 0.890 | 0.894 | 0.892 |
| SMOTE | 200 | 0.902 | 0.879 | 0.891 |
| SMOTE | 300 | 0.881 | 0.854 | 0.867 |
| SMOTE | 400 | 0.875 | 0.844 | 0.859 |
| SMOTE | 500 | 0.887 | 0.829 | 0.857 |
| DSMOTE | 100 | 0.901 | 0.915 | 0.908 |
| DSMOTE | 200 | 0.905 | 0.910 | 0.907 |
| DSMOTE | 300 | 0.894 | 0.889 | 0.892 |
| DSMOTE | 400 | 0.916 | 0.874 | 0.895 |
| DSMOTE | 500 | 0.901 | 0.864 | 0.882 |
| Borderline-SMOTE | 100 | 0.894 | 0.894 | 0.894 |
| Borderline-SMOTE | 200 | 0.890 | 0.854 | 0.872 |
| Borderline-SMOTE | 300 | 0.874 | 0.834 | 0.853 |
| Borderline-SMOTE | 400 | 0.887 | 0.864 | 0.875 |
| Borderline-SMOTE | 500 | 0.861 | 0.839 | 0.850 |
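Reading a table like Table 5 usually comes down to asking, for each method, which over-sampling rate gives the best minority-class F-value and how much that improves on the un-resampled "Original" row. The small script below does exactly that arithmetic using only the F-values copied from Table 5; the names `TABLE5_F`, `ORIGINAL_F` and `best_rates` are hypothetical.

```python
from typing import Dict, Tuple

# Minority-class F-values on the Vehicle dataset, copied from Table 5,
# keyed by method and then by over-sampling rate (%).
TABLE5_F: Dict[str, Dict[int, float]] = {
    "SMOTE": {100: 0.892, 200: 0.891, 300: 0.867, 400: 0.859, 500: 0.857},
    "DSMOTE": {100: 0.908, 200: 0.907, 300: 0.892, 400: 0.895, 500: 0.882},
    "Borderline-SMOTE": {100: 0.894, 200: 0.872, 300: 0.853, 400: 0.875, 500: 0.850},
}
ORIGINAL_F = 0.874  # "Original" row: classifier trained without over-sampling


def best_rates(table: Dict[str, Dict[int, float]]) -> Dict[str, Tuple[int, float]]:
    """For each method, return (rate with the highest F-value, that F-value)."""
    return {
        method: max(by_rate.items(), key=lambda item: item[1])
        for method, by_rate in table.items()
    }


for method, (rate, f) in best_rates(TABLE5_F).items():
    print(f"{method}: best at {rate}%, F = {f:.3f} ({f - ORIGINAL_F:+.3f} vs. original)")
```

On this dataset all three methods peak at the 100% rate, and DSMOTE's best F-value (0.908) is the highest of the three; the same function can be pointed at the other tables by copying their F-value columns.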
Table 6  Precision, Recall and F-value of the minority class on the Ecoli dataset

| Method | Over-sampling rate/% | Precision | Recall | F-value |
|---|---|---|---|---|
| Original | - | 0.756 | 0.766 | 0.761 |
| SMOTE | 100 | 0.739 | 0.883 | 0.805 |
| SMOTE | 200 | 0.734 | 0.896 | 0.807 |
| SMOTE | 300 | 0.701 | 0.883 | 0.782 |
| SMOTE | 400 | 0.693 | 0.909 | 0.787 |
| SMOTE | 500 | 0.697 | 0.896 | 0.784 |
| DSMOTE | 100 | 0.737 | 0.909 | 0.814 |
| DSMOTE | 200 | 0.723 | 0.948 | 0.820 |
| DSMOTE | 300 | 0.711 | 0.896 | 0.793 |
| DSMOTE | 400 | 0.699 | 0.935 | 0.800 |
| DSMOTE | 500 | 0.706 | 0.936 | 0.805 |
| Borderline-SMOTE | 100 | 0.697 | 0.896 | 0.784 |
| Borderline-SMOTE | 200 | 0.683 | 0.922 | 0.785 |
| Borderline-SMOTE | 300 | 0.66 | 0.882 | 0.756 |
| Borderline-SMOTE | 400 | 0.645 | 0.922 | 0.759 |
| Borderline-SMOTE | 500 | 0.642 | 0.909 | 0.753 |
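Borderline-SMOTE, the third baseline in the tables, differs from SMOTE mainly in which minority samples it over-samples. Following Han et al. (2005), a minority sample is placed in the DANGER set when, among its m nearest neighbours in the whole training set, the number of majority samples m' satisfies m/2 ≤ m' < m; samples with m' = m are treated as noise and those with m' < m/2 as safe, and only DANGER samples are then interpolated as in SMOTE. The sketch below implements just this selection step in pure Python; the function name `danger_set`, its arguments and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def danger_set(X: Sequence[Sequence[float]], y: Sequence[int],
               minority_label: int, m: int = 5) -> List[int]:
    """Indices of minority samples in Borderline-SMOTE's DANGER set (sketch).

    For each minority sample, count the majority samples m' among its m nearest
    neighbours in the whole data set. m/2 <= m' < m -> DANGER (kept);
    m' == m -> noise; m' < m/2 -> safe.
    """
    danger: List[int] = []
    for i, x in enumerate(X):
        if y[i] != minority_label:
            continue
        # All other samples, nearest first (Euclidean distance).
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: math.dist(x, X[j]))
        n_majority = sum(1 for j in others[:m] if y[j] != minority_label)
        if m / 2 <= n_majority < m:
            danger.append(i)
    return danger


# 1-D toy data: label 0 = majority, label 1 = minority.
X = [[0.0], [0.4], [0.9], [1.5], [0.15], [1.1], [1.25], [10.0], [10.3], [10.7]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(danger_set(X, y, minority_label=1, m=3))  # [5, 6]
```

In the toy data, the minority point at 0.15 is surrounded only by majority points and is discarded as noise, the cluster around 10 is safe, and only the two minority points at 1.1 and 1.25, which sit next to the class boundary, are selected for over-sampling.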