Journal of Systems Engineering and Electronics ›› 2018, Vol. 29 ›› Issue (6): 1209-1227. doi: 10.21629/JSEE.2018.06.09
• Systems Engineering •
Chuchao HE, Xiaoguang GAO*, Zhigao GUO
Received: 2017-09-11
Online: 2018-12-25
Published: 2018-12-26
Contact: Xiaoguang GAO
E-mail: xomrssh@163.com; cxg2012@nwpu.edu.cn; guozhigao2004@163.com
About author: HE Chuchao was born in 1992. He is a Ph.D. candidate in the Department of System Engineering, Northwestern Polytechnical University. His research interests focus on Bayesian network learning, especially structure learning.
Chuchao HE, Xiaoguang GAO, Zhigao GUO. Structure learning on Bayesian networks by finding the optimal ordering with and without priors[J]. Journal of Systems Engineering and Electronics, 2018, 29(6): 1209-1227.
Table 1
"
Input: Data, node sizes, a random order |
Output: the final order |
current ord——the current order who holds the best score |
best score——the score of the current order |
best node sc——the family score of each node in the current order |
K2——K2 algorithm |
sc——the score of the order obtained by transformation |
node sc——the family score of each node in the obtained order |
t——the subscript variable |
m——the intermediate variable |
1. the final order $\leftarrow $ a random order; |
2. current_ord $\leftarrow $ the final order; |
3. [best_score, best_node_$sc$] $\leftarrow $ K2(current_ord); |
4.$t\leftarrow 0$; |
5.Repeat |
6. For $i = t + 2 : n$ |
7. Update the current_ord by swapping two nodes whose subscript are $i$ and $t+1$ respectively; |
8. [$sc$, node_$sc$] $\leftarrow $ K2(current_ord); |
9. if $sc$ $ > $ best_score |
10. best_score $\leftarrow $ $sc$; |
11. best_node_$sc$ $\leftarrow $ node_$sc$; |
12. the final order $\leftarrow $ current_ord; |
13. $t\leftarrow 0$; |
14. else |
15. current_ord $\leftarrow $ the final order; |
16. end if |
17. End for |
18. $t \leftarrow t + 1$; |
19.Until $t = = n-2$ |
20.Return the final order |
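The ordering search in Table 1 can be rendered as a short Python sketch. This is a minimal illustration under stated assumptions rather than the authors' implementation: `k2` stands for any callable that scores an order and returns the per-node family scores (for example, a wrapper around the K2 routine sketched after Table 2), and `nodes` is simply the list of variable names.

```python
import random

def order_search(data, nodes, k2, seed=0):
    """Hill-climb over node orderings (Table 1): repeatedly swap the (t+1)-th
    node with each later node and keep a swap only if the K2 score improves."""
    rng = random.Random(seed)
    n = len(nodes)
    final_order = nodes[:]                                # step 1: start from a random order
    rng.shuffle(final_order)
    current_ord = final_order[:]                          # step 2
    best_score, best_node_sc = k2(current_ord, data)      # step 3
    t = 0                                                 # step 4
    while True:                                           # steps 5-19: Repeat ... Until
        for i in range(t + 1, n):                         # step 6 (1-based t+2..n)
            # step 7: swap the nodes at positions i and t (1-based i and t+1)
            current_ord[i], current_ord[t] = current_ord[t], current_ord[i]
            sc, node_sc = k2(current_ord, data)           # step 8
            if sc > best_score:                           # step 9
                best_score, best_node_sc = sc, node_sc    # steps 10-11
                final_order = current_ord[:]              # step 12
                t = 0                                     # step 13
            else:
                current_ord = final_order[:]              # step 15: undo the swap
        t += 1                                            # step 18
        if t >= n - 2:                                    # step 19: Until t == n - 2
            return final_order, best_score, best_node_sc  # step 20
```

As in the pseudocode, step 7 reads $t$ afresh at every iteration, so once an improving swap resets $t$ to 0, later swaps in the same pass pair position $i$ with the first position.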
Table 2
"
Input: $X = \{X_1, X_2, \ldots, X_n \}$, Data, order |
Output: $PA$ |
1.For $j = 1$ to $n$ |
2. $PA_j \leftarrow \varphi; $ |
3. $V_{{{{\rm{old}}}}} \leftarrow {{{\rm{CH}}}}(\langle X_j, PA_j \rangle\vert D)$; \quad // CH is the family score of node $X_j $ |
4. While (true) |
5. $i\leftarrow \mathop {\rm argmax}\limits_{1{\leqslant} i{\leqslant} j, X_i \notin PA_j } {{{\rm{CH}}}}(\langle X_j, PA_j \cup \{X_i \}\rangle\vert D)$ // $X_i $ precedes $X_j $ in the given order |
6. $V_{{{{\rm{new}}}}} \leftarrow {{{\rm{CH}}}}(\langle X_j, PA_j \cup \{X_i \}\rangle \vert D)$ |
7. if $(V_{{{{\rm{new}}}}} > V_{{{{\rm{old}}}}} $ and $PA_j < j-1)$ |
8. $V_{{{{\rm{old}}}}} \leftarrow V_{{{{\rm{new}}}}}; $ |
9. $PA_j \leftarrow PA_j \cup \{X_i \}; $ |
10. else |
11. break; |
12. end if |
13. End while |
14.End for |
15.Return $PA$ |
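A small, self-contained Python sketch of the greedy parent selection in Table 2 is given below, using the log CH (K2) family score computed from frequency counts. It is illustrative, not the paper's code: `data` is assumed to be a list of dicts mapping each variable to a discrete state, `states` maps each variable to its list of possible values, and the optional `max_parents` bound plays the role of the $\vert PA_j \vert < j-1$ check.

```python
from collections import Counter
from math import lgamma

def ch_family_score(child, parents, data, states):
    """Log CH (K2) family score of <child, parents>: for each parent
    configuration j, add lgamma(r) - lgamma(N_j + r) + sum_k lgamma(N_jk + 1),
    where r is the number of states of the child."""
    r = len(states[child])
    joint = Counter()    # (parent configuration, child state) -> N_jk
    config = Counter()   # parent configuration -> N_j
    for row in data:
        cfg = tuple(row[p] for p in parents)
        joint[(cfg, row[child])] += 1
        config[cfg] += 1
    score = 0.0
    for cfg, n_j in config.items():
        score += lgamma(r) - lgamma(n_j + r)
        for k in states[child]:
            score += lgamma(joint[(cfg, k)] + 1)
    return score

def k2_parents(order, data, states, max_parents=None):
    """Greedy K2 (Table 2): for each node, keep adding the predecessor that
    most improves the family score until no candidate improves it."""
    parents, total = {}, 0.0
    for j, x_j in enumerate(order):
        pa = []
        v_old = ch_family_score(x_j, pa, data, states)
        candidates = list(order[:j])                 # only nodes preceding X_j
        limit = j if max_parents is None else min(j, max_parents)
        while candidates and len(pa) < limit:
            v_new, best = max((ch_family_score(x_j, pa + [c], data, states), c)
                              for c in candidates)
            if v_new > v_old:
                v_old = v_new
                pa.append(best)
                candidates.remove(best)
            else:
                break
        parents[x_j], total = pa, total + v_old
    return total, parents
```

For example, `k2_parents(['A', 'B', 'C'], data, states)` returns the total log score and parent sets consistent with the ordering A, B, C.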
"
Input: Data, node sizes, a random order, priors, confidence factor $p$, prior function SU$'$ |
Output: the final order |
$pc_accept$——the probability to accept the transformation when consistent with the prior |
$pc_reject$——the probability to reject the transformation when inconsistent with the prior |
current_ord——the current order who holds the best score |
best_score——the score of the current order |
$t$——the subscript variable $sc$——the score of the order obtained by transformation |
1. $pc_accept$ $\leftarrow $ interpolate SU$'$ at value $p$; $pc_reject$ $\leftarrow $ interpolate SU$'$ at value $(1-p)$; the final order $\leftarrow $ a random order; current_ord $\leftarrow $ the final order; best_score $\leftarrow $ K2(current_ord); $t \leftarrow 0$; |
2.Repeat |
3. For $i = t + 2: n$ |
4. $trans_prob$ $\leftarrow $ a random value between 0 and 1; |
5. node1 $\leftarrow $ the $i$th node in the current_ord; |
6. node2 $\leftarrow $ the $(t+1)$th node in the current_ord; |
7. if there's a directed path from node1 to node2 |
8. if $trans_prob > pc_accept$ //accept the transformation in a large probability |
9. Update the {current_ord} by swapping node1 and node2; |
10. $sc$ $\leftarrow $ K2(current_ord); |
11. if $sc$ $ > $ {best_score} |
12. best_score $\leftarrow $ $sc$; |
13. the final order $\leftarrow $ {current_ord}; |
14. $t\leftarrow 0$; |
15. else |
16. end if |
17. $t\leftarrow 0$; |
18. end if |
19. else//if there's no direct path from node1 to node2 |
20. if $trans_prob > pc_reject$ //accept the transformation in a small probability |
21. Update the {current_ord} by swapping node1 and node2; |
22. $sc$ $\leftarrow $ K2({current_ord}); |
23. if $sc$ $ > $ best_score |
24. best_score $\leftarrow $ $sc$; |
25. the final order $\leftarrow $ {current_ord}; |
26. $t\leftarrow 0$; |
27. else |
28. current_ord $\leftarrow $ the final order; |
29. end if |
30. end if |
31. end if |
32. End for |
33. $t \leftarrow t + 1$; |
34. Until $t = = n - 2$ |
35.Return the final order |
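The prior-guided search above can be sketched in Python in the same spirit, mirroring the pseudocode as printed (in particular, a prior-consistent swap is kept and $t$ is reset even when the score does not improve). All helper names are assumptions, not the paper's code: `k2` returns a single score for an order, `has_path(prior, a, b)` tests for a directed path in the prior graph, and `interp_su` interpolates the prior function SU$'$.

```python
import random

def order_search_with_prior(data, nodes, k2, prior, has_path, interp_su, p, seed=0):
    """Ordering search guided by priors: swaps consistent with the prior graph
    are tested against a different acceptance threshold than inconsistent ones;
    the final order only keeps swaps that improve the K2 score."""
    rng = random.Random(seed)
    pc_accept = interp_su(p)        # threshold used when the swap agrees with the prior
    pc_reject = interp_su(1.0 - p)  # threshold used when the swap contradicts the prior
    n = len(nodes)
    final_order = nodes[:]
    rng.shuffle(final_order)
    current_ord = final_order[:]
    best_score = k2(current_ord, data)
    t = 0
    while True:
        for i in range(t + 1, n):                        # 1-based subscripts t+2 .. n
            trans_prob = rng.random()
            node1, node2 = current_ord[i], current_ord[t]
            if has_path(prior, node1, node2):            # swap is consistent with the prior
                if trans_prob > pc_accept:
                    current_ord[i], current_ord[t] = node2, node1
                    sc = k2(current_ord, data)
                    if sc > best_score:
                        best_score = sc
                        final_order = current_ord[:]
                    t = 0                                # reset regardless, as in the pseudocode
            else:                                        # swap is inconsistent with the prior
                if trans_prob > pc_reject:
                    current_ord[i], current_ord[t] = node2, node1
                    sc = k2(current_ord, data)
                    if sc > best_score:
                        best_score = sc
                        final_order = current_ord[:]
                        t = 0
                    else:
                        current_ord = final_order[:]     # undo the non-improving swap
        t += 1
        if t >= n - 2:                                   # Until t == n - 2
            return final_order, best_score
```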
Table 5
"
BN | Node | Edge | Sample size | Original network | HC_bo+C | HC | HC_bo |
Cancer | 5 | 4 | 500 | –1 106.0 | –1 113.0 | –1 113.0 | –1 113.0 |
2 000 | –4 212.0 | –4 218.0 | –4 219.0 | –4 218.0 | |||
5 000 | –10 485.0 | –10 488.0 | –10 499.0 | –10 488.0 | |||
10 000 | –21 138.0 | –21 143.0 | –21 149.0 | –21 143.0 | |||
Asia | 5 | 4 | 500 | –1 157.0 | –1 157.0 | –1 161.0 | –1 160.0 |
2 000 | –4 527.0 | –4 528.0 | –4 549.0 | –4 531.0 | |||
5 000 | –11 148.0 | –11 155.0 | –11 212.0 | –11 160.0 | |||
10 000 | –22 394.0 | –22 394.0 | –22 436.0 | –22 394.0 | |||
Sachs | 11 | 17 | 500 | –4 043.0 | –4 050.0 | –4 182.0 | –4 162.0 |
2 000 | –14 853.0 | –14 853.0 | –15 088.0 | –14 985.0 | |||
5 000 | –36 353.0 | –36 353.0 | –36 583.0 | –36 353.0 | |||
10 000 | –72 153.0 | –72 153.0 | –72 837.0 | –72 153.0 | |||
Child | 20 | 25 | 500 | –6 830.0 | –6 942.0 | –7 350.0 | –7 030.0 |
2 000 | –25 270.0 | –25 263.0 | –27 260.0 | –25 480.0 | |||
5 000 | –62 420.0 | –62 655.0 | –67 440.0 | –63 400.0 | |||
10 000 | –123 860.0 | –124 281.0 | –129 760.0 | –124 640.0 | |||
Insurance | 27 | 52 | 500 | –8 490.0 | –8 605.0 | –9 820.0 | –8 820.0 |
2 000 | –30 380.0 | –30 931.0 | –32 330.0 | –31 290.0 | |||
5 000 | –73 310.0 | –73 522.0 | –79 510.0 | –74 100.0 | |||
10 000 | –143 680.0 | –144 077.0 | –158 250.0 | –144 270.0 | |||
Alarm | 37 | 46 | 500 | –6 600.0 | –6 650.0 | –7 100.0 | –7 000.0 |
2 000 | –23 350.0 | –23 465.0 | –24 400.0 | –23 610.0 | |||
5 000 | –56 850.0 | –57 032.0 | –60 530.0 | –57 470.0 | |||
10 000 | –111 450.0 | –111 573.0 | –114 470.0 | –112 050.0 | |||
Hailfinder | 56 | 66 | 500 | –28 150.0 | –28 571.0 | –31 880.0 | –28 730.0 |
2 000 | –103 490.0 | –103 698.0 | –108 450.0 | –106 940.0 | |||
5 000 | –225 310.0 | –252 862.0 | –256 970.0 | –252 950.0 | |||
10 000 | –499 850.0 | –500 515.0 | –507 920.0 | –502 150.0 |
Table 6
"
Data set | Node | Sample size | HC_bo+C | HC_bo | HC
monks_T | 7 | 432 | –2 876.5 | –2 876.5 | –2 876.5 |
car | 7 | 1 728 | –13 646 | –13 684 | –13 696 |
diabetes | 9 | 768 | –12 248 | –12 271 | –12 276 |
abalone | 9 | 4 177 | –65 845 | –66 088 | –66 563 |
nursery | 9 | 12 960 | –125 990 | –126 130 | –126 190 |
contrasep | 10 | 1 473 | –15 347 | –15 387 | –15 388 |
wine | 14 | 178 | –2 880.4 | –2 932.3 | –2 897.7 |
heart | 14 | 270 | –4 294.1 | –4 311.5 | –4 299.0 |
australian | 15 | 690 | –11 098 | –11 114 | –11 116 |
zoo | 17 | 101 | –645.05 | –681.54 | –674.16 |
pen_A | 17 | 7 494 | –192 350 | –200 580 | –210 730 |
letter_A | 17 | 15 000 | –452 530 | –452 610 | –471 520 |
segment | 20 | 2 310 | –58 599 | –60 537 | –69 570 |
german | 25 | 1 000 | –21 154 | –21 269 | –21 189 |
Table 7
"
Model | Node | Sample size | HC_bo+C | HC_bo | OMRMRG |
kdd_test | 64 | 34 955 | –75 053.0 | –75 271.0 | –75 439.0 |
plants_ts | 69 | 17 412 | –231 600.0 | –232 270.0 | –232 790.0 |
Hepar2 | 70 | 10 000 | –329 860.0 | –330 820.0 | –331 560.0 |
Win95pts | 76 | 10 000 | –104 200.0 | –104 500.0 | –104 730.0 |
baudio_ts | 100 | 15 000 | –618 110.0 | –619 910.0 | –621 300.0 |
1 | PEARL J. Probabilistic reasoning in intelligent systems. Journal of Philosophy, 1988, 88 (8): 1022-1027. doi: 10.5840/jphil199188844
2 | FRIEDMAN N, KOLLER D. Being Bayesian about network structure: a Bayesian approach to structure discovery in Bayesian networks. Machine Learning, 2003, 50 (1/2): 95-125. doi: 10.1023/A:1020249912095
3 | SPIRTES P, GLYMOUR C, SCHEINES R. Causation, prediction, and search. Technometrics, 1996, 45 (3): 272-273.
4 | FRIEDMAN N, LINIAL M, NACHMAN I, et al. Using Bayesian networks to analyze expression data. Journal of Computational Biology, 2000, 7 (3/4): 601-620.
5 | ALIFERIS C F, TSAMARDINOS I, STATNIKOV A. HITON: a novel Markov blanket algorithm for optimal variable selection. Proc. of the AMIA Annual Symposium, 2003: 21-25.
6 | TSAMARDINOS I, ALIFERIS C F, STATNIKOV A. Time and sample efficient discovery of Markov blankets and direct causal relations. Proc. of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003: 673-678.
7 | RICARDO B Y, BERTHIER R N. Modern information retrieval. New York: ACM Press, 2011.
8 | CHAPMAN W W, FIZMAN M, CHAPMAN B E, et al. A comparison of classification algorithms to automatically identify chest X-ray reports that support pneumonia. Journal of Biomedical Informatics, 2001, 34 (1): 4-14.
9 | ACID S, CAMPOS L M D, FERNANDEZ-LUNA J M, et al. A comparison of learning algorithms for Bayesian networks: a case study based on data from an emergency medical service. Artificial Intelligence in Medicine, 2004, 30 (3): 215-232.
10 | CHICKERING D M, MEEK C, HECKERMAN D. Large-sample learning of Bayesian networks is NP-hard. Proc. of the 19th Conference on Uncertainty in Artificial Intelligence, 2002: 124-133.
11 | CHICKERING D M. Learning Bayesian networks is NP-complete. Networks, 1996, 112 (2): 121-130.
12 | COOPER G F, HERSKOVITS E. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 1992, 9 (4): 309-347.
13 | TEYSSIER M, KOLLER D. Ordering-based search: a simple and effective algorithm for learning Bayesian networks. Proc. of the Conference on Uncertainty in Artificial Intelligence, 2012: 584-590.
14 | SPIRTES P. Causality from probability. Evolving Knowledge in the Natural and Behavioral Science, 1989.
15 | SPIRTES P. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 1991, 9 (1): 62-72.
16 | CHEN J, BELL D A, LIU W. Learning belief networks from data: an information theory based approach. Proc. of the 6th International Conference on Information and Knowledge Management, 1997: 325-331.
17 | MADIGAN D, YORK J, ALLARD D. Bayesian graphical models for discrete data. International Statistical Review, 1995, 63 (2): 215-232. doi: 10.2307/1403615
18 | GRZEGORCZYK M, HUSMEIER D. Improving the structure MCMC sampler for Bayesian networks by introducing a new edge reversal move. Machine Learning, 2008, 71 (2/3): 265-305.
19 | SCHWARZ G. Estimating the dimension of a model. Annals of Statistics, 1978, 6 (2): 15-18.
20 | SUZUKI J. Learning Bayesian belief networks based on the minimum description length principle: basic properties. IEICE Trans. on Fundamentals of Electronics, Communications and Computer Sciences, 1996, 82 (10): 2237-2245.
21 | BOUCHAALA L, MASMOUDI A, GARGOURIC F, et al. Improving algorithms for structure learning in Bayesian networks using a new implicit score. Expert Systems with Applications, 2010, 37 (8): 5470-5475.
22 | DE CAMPOS L M. A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. Journal of Machine Learning Research, 2006, 7 (8): 2149-2187.
23 | CAMPOS L M D, FERNANDEZ-LUNA J M, GAMEZ J A, et al. Ant colony optimization for learning Bayesian networks. International Journal of Approximate Reasoning, 2002, 31 (3): 291-311. doi: 10.1016/S0888-613X(02)00091-9
24 | GUO W, XIAO Q, HOU Y, et al. Bayesian network learning based on relationship prediction PSO and its application in agricultural expert system. Proc. of the 25th Chinese Control and Decision Conference, 2013: 1818-1822.
25 | ETXEBERRIA R, LARRAÑAGA P, PICAZA J M. Analysis of the behavior of genetic algorithms when learning Bayesian network structure from data. Pattern Recognition Letters, 1997, 18 (11-13): 1269-1273. doi: 10.1016/S0167-8655(97)00106-2
26 | JI J, WEI H, LIU C. An artificial bee colony algorithm for learning Bayesian networks. Soft Computing, 2013, 17 (6): 983-994. doi: 10.1007/s00500-012-0966-6
27 | JIN Y, HU Y, ZHANG J, et al. Bayesian network structure learning combining K2 with simulated annealing. Journal of Southeast University (Natural Science Edition), 2012, 42 (S1): 1320-1324. doi: 10.3969/j.issn.1001-0505.2012.S1.018
28 | CAO W, FANG X. An improved method for Bayesian network structure learning. Proc. of the IEEE International Conference on Natural Computation, 2010: 3133-3137.
29 | TSAMARDINOS I, BROWN L E, ALIFERIS C F. The max-min hill climbing Bayesian network structure learning algorithm. Machine Learning, 2006, 65 (1): 31-78.
30 | OTT S, MIYANO S. Finding optimal gene networks using biological constraints. Genome Informatics, 2003, 14: 124-133.
31 | KOIVISTO M, SOOD K. Exact Bayesian structure discovery in Bayesian networks. Journal of Machine Learning Research, 2004, 5: 549-573.
32 | SINGH A P, MOORE A W. Finding optimal Bayesian networks by dynamic programming. Pittsburgh: Carnegie Mellon University, 2005.
33 | SILANDER T, MYLLYMAKI P. A simple approach for finding the globally optimal Bayesian network structure. Proc. of the Conference on Uncertainty in Artificial Intelligence, 2006: 445-452.
34 | YUAN C, MALONE B, WU X. Learning optimal Bayesian networks using A* search. Proc. of the International Joint Conference on Artificial Intelligence, 2011: 2186-2191.
35 | MALONE B, YUAN C, HANSEN E A. Memory-efficient dynamic programming for learning optimal Bayesian networks. Proc. of the AAAI Conference on Artificial Intelligence, 2011: 1057-1062.
36 | PERRIER E. Finding optimal Bayesian network given a superstructure. Journal of Machine Learning Research, 2008, 9 (4): 2251-2286.
37 | LARRANAGA P, KUIJPERS C M H, MURGA R H, et al. Learning Bayesian network structures by searching for the best ordering with genetic algorithms. IEEE Trans. on Systems, Man, and Cybernetics, Part A: Systems and Humans, 1996, 26 (4): 487-493. doi: 10.1109/3468.508827
38 | LIU F, TIAN F, ZHU Q. A novel ordering-based greedy Bayesian network learning algorithm on limited data. AI 2007: Advances in Artificial Intelligence. Berlin: Springer Berlin Heidelberg, 2007: 80-89.
39 | LAURITZEN S L, SPIEGELHALTER D J. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, 1988, 50 (2): 157-224. doi: 10.1111/j.2517-6161.1988.tb01721.x