Markovian and Bayesian Network Toolkit
The IBM System G Markovian and Bayesian Network toolkit currently offers efficient Large-Scale Bayesian Inference tools with the option of enabling Markovian transition on all the nodes of the Bayesian network. The Markovian transition enables the tool to capture the temporal aspect of the observations and improve inference performance.
Large Scale Bayesian Inference
Exact inference is a key problem in exploring probabilistic graphical models, where the computation complexity increases dramatically with clique width and the number of states of random variables. System G middleware supports exact inference in large scale graphical models. We studied potential table representation and scalable algorithms for node level primitives for probabilistic inference. Based on such node level primitives, we create computation kernels for evidence collection and evidence distribution. In addition to the data parallel algorithms, we also studied the structural parallelism in exact inference in junction trees, a hypergraph representation of Bayesian networks, by utilizing architecture-aware schedulers.
Designing an efficient inference engine for large scale graphical models requires a systematic solution. For computational reasons, a causality graph in a graphical model may be converted into a different form. We identified the computational primitives and organized the potential table (POT) data accordingly. The following figures shows the systematic solutions and an compact and highly efficient way for representing a POT in memory.
The following table shows the computational performance of a MPI version of the exact inference for anomaly detection, where we aggregately processed about 5000 person for 30 days. The graph model for each person involves dozens random variables. Therefore, if a Bayesian inference engine takes 1 second to process such a graph, it will take more than 17 days to process all the data sequentially. However, in our implementation, we completed the computation in 5 minutes using 64 MPI processes.
The original form of Bayesian inference obtains the observations at the same time slice and the inference of the previous time slices are independent of the current inference. Hence temporal changes of the observations are lost in the inference. For example, in the application of human behavioral anomaly detection, the anomaly activities can span much longer than the interval of detection such as an user searches for the next job two weeks before he/she logs on to the company network in the midnight and steals information from the company.
To combat this problem, we created Markovian features on each node of the Bayesian Network:
The following example shows a scenario that the Markovian feature update can help capture:
We tested the Markovian featured Bayesian Network on a human behavioral anomaly detection dataset which has 5500 users and three true anomalies in each month. The results are presented as monthly rank of the true abnormal person, which means the lower the ranks of the true anomalies, the better the performance. In the below table, we can see the performance improvement with the Markovian features on Bayesian Networks.