Journal of Systems Engineering and Electronics ›› 2021, Vol. 32 ›› Issue (2): 389-398.doi: 10.23919/JSEE.2021.000032
• ELECTRONICS TECHNOLOGY •
Xiaolong XU1, Wen CHEN2, Xinheng WANG3,*
Received: 2020-01-20
Online: 2021-04-29
Published: 2021-04-29
Contact: Xinheng WANG
E-mail: xuxl@njupt.edu.cn; 1216043012@njupt.edu.cn; xinheng.wang@uwl.ac.uk
Xiaolong XU, Wen CHEN, Xinheng WANG. RFC: a feature selection algorithm for software defect prediction[J]. Journal of Systems Engineering and Electronics, 2021, 32(2): 389-398.
Table 1
LOC counts and Halstead complexity

| Number | Metric | Number | Metric |
|---|---|---|---|
| 0 | LOC_BLANK | 20 | HALSTEAD_DIFFICULTY |
| 1 | BRANCH_COUNT | 21 | HALSTEAD_EFFORT |
| 2 | CALL_PAIRS | 22 | HALSTEAD_ERROR_EST |
| 3 | LOC_CODE_AND_COMMENT | 23 | HALSTEAD_LENGTH |
| 4 | LOC_COMMENTS | 24 | HALSTEAD_LEVEL |
| 5 | CONDITION_COUNT | 25 | HALSTEAD_PROG_TIME |
| 6 | CYCLOMATIC_COMPLEXITY | 26 | HALSTEAD_VOLUME |
| 7 | CYCLOMATIC_DENSITY | 27 | MAINTENANCE_SEVERITY |
| 8 | DECISION_COUNT | 28 | MODIFIED_CONDITION_COUNT |
| 9 | DECISION_DENSITY | 29 | MULTIPLE_CONDITION_COUNT |
| 10 | DESIGN_COMPLEXITY | 30 | NODE_COUNT |
| 11 | DESIGN_DENSITY | 31 | NORMALIZED_CYLOMATIC_COMPLEXITY |
| 12 | EDGE_COUNT | 32 | NUM_OPERANDS |
| 13 | ESSENTIAL_COMPLEXITY | 33 | NUM_OPERATORS |
| 14 | ESSENTIAL_DENSITY | 34 | NUM_UNIQUE_OPERANDS |
| 15 | LOC_EXECUTABLE | 35 | NUM_UNIQUE_OPERATORS |
| 16 | PARAMETER_COUNT | 36 | NUMBER_OF_LINES |
| 17 | GLOBAL_DATA_COMPLEXITY | 37 | PERCENT_COMMENTS |
| 18 | GLOBAL_DATA_DENSITY | 38 | LOC_TOTAL |
| 19 | HALSTEAD_CONTENT | | |
Table 4
AUC of the J48 classifier after using different feature selection methods

| Dataset | NONE | IG | CS | ReliefF | RFC | CFS |
|---|---|---|---|---|---|---|
| pc1 | 0.669 | 0.753 | 0.699 | 0.519 | 0.715 | 0.727 |
| pc3 | 0.647 | 0.602 | 0.601 | 0.534 | 0.668 | 0.646 |
| pc4 | 0.755 | 0.873 | 0.881 | 0.511 | 0.680 | 0.859 |
| kc1 | 0.700 | 0.747 | 0.746 | 0.715 | 0.737 | 0.702 |
| kc3 | 0.567 | 0.574 | 0.624 | 0.547 | 0.656 | 0.679 |
| kc4 | 0.738 | 0.750 | 0.751 | 0.757 | 0.767 | 0.761 |
| mc2 | 0.647 | 0.562 | 0.577 | 0.611 | 0.639 | 0.550 |
| cm1 | 0.531 | 0.561 | 0.553 | 0.500 | 0.631 | 0.575 |
| mw1 | 0.429 | 0.551 | 0.553 | 0.527 | 0.564 | 0.583 |
| Average | 0.631 | 0.664 | 0.665 | 0.580 | 0.673 | 0.676 |
Table 5
F-value of the J48 classifier after using different feature selection methods

| Dataset | NONE | IG | CS | ReliefF | RFC | CFS |
|---|---|---|---|---|---|---|
| pc1 | 0.309 | 0.230 | 0.191 | 0.025 | 0.289 | 0.222 |
| pc3 | 0.282 | 0.016 | 0.012 | 0.012 | 0.133 | 0.027 |
| pc4 | 0.518 | 0.471 | 0.465 | 0.004 | 0.118 | 0.393 |
| kc1 | 0.381 | 0.320 | 0.330 | 0.285 | 0.273 | 0.277 |
| kc3 | 0.292 | 0.040 | 0.195 | 0.088 | 0.094 | 0.279 |
| kc4 | 0.789 | 0.768 | 0.776 | 0.778 | 0.806 | 0.789 |
| mc2 | 0.487 | 0.367 | 0.377 | 0.414 | 0.464 | 0.330 |
| cm1 | 0.132 | 0.026 | 0.026 | 0.014 | 0.103 | 0.061 |
| mw1 | 0.213 | 0.207 | 0.231 | 0.139 | 0.200 | 0.303 |
| Average | 0.378 | 0.272 | 0.289 | 0.195 | 0.276 | 0.298 |
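The protocol behind Tables 4 and 5 (filter feature selection followed by a J48 decision tree, scored by AUC and F-value) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the paper's setup: the data are synthetic, `mutual_info_classif` stands in for information gain (IG), `chi2` for chi-square (CS), and `DecisionTreeClassifier` for J48; ReliefF, the paper's RFC, and CFS have no stock scikit-learn implementation.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a NASA MDP dataset: 40 static code metrics,
# imbalanced defect labels (roughly 15% defective modules).
X, y = make_classification(n_samples=500, n_features=40, n_informative=8,
                           weights=[0.85], random_state=0)

selectors = {
    "IG": SelectKBest(mutual_info_classif, k=6),  # information-gain analogue
    "CS": SelectKBest(chi2, k=6),                 # chi-square scoring
}

results = {}
for name, selector in selectors.items():
    # Scale into [0, 1] first so chi2 only sees non-negative values.
    pipe = make_pipeline(MinMaxScaler(), selector,
                         DecisionTreeClassifier(random_state=0))
    auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
    f_value = cross_val_score(pipe, X, y, cv=5, scoring="f1").mean()
    results[name] = (auc, f_value)
    print(f"{name}: AUC={auc:.3f}, F-value={f_value:.3f}")
```

Putting the selector inside the pipeline matters: it is refit on each training fold, so the held-out fold never influences which features are kept.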
Table 6
AUC of Naïve Bayes after using different feature selection methods

| Dataset | NONE | IG | CS | ReliefF | RFC | CFS |
|---|---|---|---|---|---|---|
| pc1 | 0.733 | 0.703 | 0.687 | 0.712 | 0.749 | 0.750 |
| pc3 | 0.767 | 0.790 | 0.800 | 0.744 | 0.771 | 0.780 |
| pc4 | 0.836 | 0.825 | 0.823 | 0.805 | 0.813 | 0.818 |
| kc1 | 0.792 | 0.769 | 0.771 | 0.789 | 0.788 | 0.750 |
| kc3 | 0.814 | 0.751 | 0.765 | 0.772 | 0.791 | 0.787 |
| kc4 | 0.752 | 0.742 | 0.743 | 0.739 | 0.753 | 0.701 |
| mc2 | 0.703 | 0.627 | 0.636 | 0.628 | 0.712 | 0.652 |
| cm1 | 0.736 | 0.748 | 0.741 | 0.745 | 0.768 | 0.746 |
| mw1 | 0.751 | 0.753 | 0.727 | 0.686 | 0.755 | 0.751 |
| Average | 0.765 | 0.745 | 0.744 | 0.736 | 0.767 | 0.748 |
Table 7
F-value of Naïve Bayes after using different feature selection methods

| Dataset | NONE | IG | CS | ReliefF | RFC | CFS |
|---|---|---|---|---|---|---|
| pc1 | 0.278 | 0.269 | 0.288 | 0.081 | 0.267 | 0.277 |
| pc3 | 0.256 | 0.348 | 0.348 | 0.171 | 0.338 | 0.373 |
| pc4 | 0.434 | 0.440 | 0.437 | 0.388 | 0.447 | 0.430 |
| kc1 | 0.400 | 0.381 | 0.383 | 0.429 | 0.421 | 0.380 |
| kc3 | 0.346 | 0.345 | 0.341 | 0.324 | 0.340 | 0.378 |
| kc4 | 0.508 | 0.441 | 0.437 | 0.644 | 0.532 | 0.433 |
| mc2 | 0.448 | 0.425 | 0.425 | 0.366 | 0.490 | 0.450 |
| cm1 | 0.307 | 0.294 | 0.296 | 0.304 | 0.333 | 0.303 |
| mw1 | 0.326 | 0.361 | 0.365 | 0.203 | 0.403 | 0.391 |
| Average | 0.367 | 0.367 | 0.369 | 0.323 | 0.397 | 0.379 |
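Tables 6 and 7 repeat the same protocol with Naïve Bayes as the classifier. A minimal sketch of the NONE baseline (all 40 metrics, no selection) against one filter method, again on synthetic data with scikit-learn's `GaussianNB` as the stand-in:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Same synthetic imbalanced stand-in dataset as before.
X, y = make_classification(n_samples=500, n_features=40, n_informative=8,
                           weights=[0.85], random_state=0)

# NONE baseline: Naive Bayes trained on all 40 metrics.
auc_none = cross_val_score(GaussianNB(), X, y, cv=5,
                           scoring="roc_auc").mean()

# Filter selection first (information-gain analogue), then Naive Bayes.
pipe = make_pipeline(SelectKBest(mutual_info_classif, k=6), GaussianNB())
auc_ig = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()

print(f"NONE: AUC={auc_none:.3f}  IG: AUC={auc_ig:.3f}")
```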
Table 8
Proportion of features selected with different feature selection methods (%)

| Dataset | IG | CS | ReliefF | RFC | CFS |
|---|---|---|---|---|---|
| pc1 | 15.00 | 15.00 | 15.00 | 17.25 | 20.67 |
| pc3 | 15.00 | 15.00 | 15.00 | 17.25 | 23.42 |
| pc4 | 15.00 | 15.00 | 15.00 | 18.41 | 12.50 |
| kc1 | 23.81 | 23.81 | 23.81 | 27.78 | 41.75 |
| kc3 | 15.00 | 15.00 | 15.00 | 17.25 | 19.25 |
| kc4 | 15.00 | 15.00 | 15.00 | 17.25 | 14.75 |
| mc2 | 15.00 | 15.00 | 15.00 | 17.25 | 27.67 |
| cm1 | 15.00 | 15.00 | 15.00 | 16.17 | 23.25 |
| mw1 | 15.00 | 15.00 | 15.00 | 17.25 | 20.92 |
| Average | 15.98 | 15.98 | 15.98 | 18.43 | 22.69 |
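The percentages in Table 8 are simply the selected-feature count divided by the dataset's total metric count. For instance, the filter methods keep 6 of 40 metrics on most datasets (15%), while kc1 has only 21 metrics, so the 5 features in its IG row of Table 9 give 23.81%. A minimal helper:

```python
def selection_proportion(n_selected: int, n_total: int) -> float:
    """Percentage of the original feature set retained after selection."""
    return round(100.0 * n_selected / n_total, 2)

print(selection_proportion(6, 40))  # most datasets, filter methods -> 15.0
print(selection_proportion(5, 21))  # kc1 (21 metrics total) -> 23.81
```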
Table 9
Indexes of the features selected by different feature selection methods on all datasets

| Dataset | IG | CS | ReliefF | RFC | CFS |
|---|---|---|---|---|---|
| pc1 | 0 40 4 34 19 36 | 3 39 40 4 34 36 | 38 40 16 27 7 11 | 0 38 2 27 34 11 36 | 0 31 3 40 4 30 7 19 36 |
| pc3 | 0 3 40 19 34 36 | 0 3 40 34 19 36 | 38 40 16 27 7 11 | 0 31 38 16 19 11 36 | 0 38 3 40 4 27 19 26 34 |
| pc4 | 38 0 3 40 29 5 | 28 38 3 40 29 5 | 9 38 40 16 24 14 | 9 38 39 35 16 24 27 | 38 3 40 16 8 |
| kc1 | 21 16 11 15 12 | 17 21 16 15 12 | 21 19 13 9 8 | 20 19 7 13 9 8 | 0 6 21 3 7 9 1 8 18 |
| kc3 | 40 2 32 22 23 26 | 40 2 32 22 1 6 | 40 16 18 24 27 7 | 31 35 16 24 27 7 | 3 40 32 |
| kc4 | 12 31 40 2 30 10 | 12 31 40 2 30 10 | 31 39 40 2 27 11 | 12 31 27 1 30 11 36 6 | 12 31 40 2 1 |
| mc2 | 12 21 40 2 18 30 | 12 40 2 18 20 30 | 40 16 18 27 7 11 | 2 16 35 18 27 20 7 11 | 28 21 0 3 40 18 33 2 4 22 20 30 14 |
| cm1 | 39 40 35 15 4 34 | 39 40 35 15 4 34 | 38 31 40 24 27 11 | 31 38 35 24 27 11 | 38 40 4 15 20 7 19 10 36 |
| mw1 | 12 40 22 30 34 36 | 12 0 40 30 36 10 | 38 40 24 27 19 11 | 28 38 39 5 27 8 34 11 | 21 0 40 4 5 30 19 34 10 |