Machine Reasoning Overview
Machine Reasoning Toolkit
The IBM System G Machine Reasoning toolkit provides a platform for managing analytic runs and the data they produce, together with machine learning algorithms for detecting anomalies in both structured and unstructured data. The platform's infrastructure maintains provenance, supports mix-and-match analytic runs, and keeps data storage transparent to the analytics.
Machine reasoning is conducted in layers:
- Sensor layer represents the raw data from each data source. For example, a node could contain a user ID, a timestamp, a machine ID, and the file being manipulated.
- Feature layer extracts features from the raw data. For instance, it extracts the domain of each web access record and the search words of each search engine access record from HTTP activity, egonet features from the user's communication records, and the size of each file moved to a removable device from file manipulation activity.
- Concept layer assigns anomaly scores (ranging from 0 to 1) to the extracted features against different comparison bases. The algorithms used to compute the anomaly scores are described under Anomaly Detection Tools. Take the size of files copied to a removable drive as an example: one anomaly score can be computed by comparing the value of this feature with the same user's history, and another by comparing it with the feature values of the user's peer group.
- Semantics layer uses a purely statistical method, Pareto Depth Analysis, to reduce the number of anomaly scores produced by the concept layer. Details about the algorithm can be found under Anomaly Detection Tools.
- Cognition layer fuses the anomaly scores using the Markovian and Bayesian Network Tools. The structure of the network is derived from past use cases and can be customized for other domains.
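The feature-layer examples above can be sketched in Python as follows. This is a minimal illustration, not the toolkit's implementation: the function names are hypothetical, the search query is assumed to sit in a `q` URL parameter, and the egonet features are assumed to be the user's degree and the number of edges among the user's neighbors.

```python
from urllib.parse import parse_qs, urlparse


def web_domain(url: str) -> str:
    """Host name of the URL in a web access record (urlparse lower-cases it)."""
    return urlparse(url).hostname or ""


def search_words(url: str, param: str = "q") -> list[str]:
    """Lower-cased search words from a search-engine URL; assumes the query is in `param`."""
    values = parse_qs(urlparse(url).query).get(param, [])
    return [word.lower() for value in values for word in value.split()]


def egonet_features(edges: list[tuple[str, str]], user: str) -> dict[str, int]:
    """Degree of `user` and number of edges among its neighbors (edges treated as undirected)."""
    neighbors = {b for a, b in edges if a == user} | {a for a, b in edges if b == user}
    neighbors.discard(user)
    undirected = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    among = sum(1 for edge in undirected if edge <= neighbors)
    return {"degree": len(neighbors), "edges_among_neighbors": among}


# Hypothetical records
print(web_domain("https://www.Example.com/news?id=7"))                 # www.example.com
print(search_words("https://search.example.com/search?q=Job+Offers"))  # ['job', 'offers']
emails = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("dave", "erin")]
print(egonet_features(emails, "alice"))  # {'degree': 2, 'edges_among_neighbors': 1}
```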
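A minimal sketch of the concept layer's two comparison bases, using a simple empirical percentile (the fraction of baseline values strictly below the observed value) as the anomaly score in [0, 1]. The actual scoring algorithms are those listed under Anomaly Detection Tools; the `empirical_score` helper and the file sizes below are hypothetical.

```python
from bisect import bisect_left


def empirical_score(value: float, baseline: list[float]) -> float:
    """Fraction of baseline values strictly below `value`, a score in [0, 1]."""
    if not baseline:
        return 0.0  # assumption: with no comparison base there is no evidence of anomaly
    ordered = sorted(baseline)
    return bisect_left(ordered, value) / len(ordered)


# Hypothetical sizes (MB) of files copied to removable drives.
user_history = [2.0, 3.5, 1.2, 4.0, 2.8]
peer_group_values = [5.0, 12.0, 8.5, 30.0, 6.1, 9.9]
todays_copy = 10.0

history_score = empirical_score(todays_copy, user_history)    # vs. the user's own history
peer_score = empirical_score(todays_copy, peer_group_values)  # vs. the user's peer group
print(history_score, peer_score)  # 1.0 0.6666666666666666
```

The same copy looks extreme relative to the user's own history but only moderately unusual relative to peers, which is why the concept layer keeps both scores rather than a single one.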
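Pareto Depth Analysis relies on sorting points into successive Pareto fronts. The sketch below shows only that building block, applied directly to per-user vectors of anomaly scores; it is a simplified illustration, not the full algorithm (which, as published, builds fronts over pairwise dissimilarities between samples). The dominance convention (a higher score means more anomalous) and the example scores are assumptions.

```python
def dominates(a, b):
    """True if vector a is at least as anomalous as b on every score and strictly more on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_depths(points):
    """Return the 1-based index of the Pareto front each score vector lies on.

    Front 1 holds the vectors no other vector dominates (the most anomalous);
    each later front is computed after removing the earlier ones.
    """
    remaining = set(range(len(points)))
    depths = [0] * len(points)
    depth = 0
    while remaining:
        depth += 1
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            depths[i] = depth
        remaining -= front
    return depths


# Hypothetical (history_score, peer_score) pairs for four users.
scores = [(0.9, 0.8), (0.2, 0.3), (0.95, 0.1), (0.5, 0.5)]
print(pareto_depths(scores))  # [1, 3, 1, 2]
```

Each user's several scores collapse into a single depth, and no weighting between the criteria has to be chosen in advance.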
Based on Natural Language Processing and Machine Learning techniques, we created Text Emotion Detectors, which allow machines to detect disgruntlement, depression, anxiety, and other emotions, helping to determine whether a person is under stress. We identify not only positive, neutral, and negative sentiment but also fine-grained emotions in user-generated content. Combined with Behavioral Analytics, this may be used to reduce the risk of mental breakdown.
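As an illustration of supervised emotion detection, the sketch below trains a multinomial naive Bayes classifier with add-one smoothing on a handful of labeled sentences. The classifier choice, the tokenizer, the `NaiveBayesEmotion` class, and the tiny training set are all hypothetical; the text does not specify which model the Text Emotion Detectors use.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesEmotion:
    """Multinomial naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        """Return the label with the highest log posterior for `text`."""
        total_docs = sum(self.label_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                logp += math.log((counts[word] + 1) / denom)  # smoothed log likelihood
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical annotated examples
texts = [
    "i am so worried about the deadline and cannot sleep",
    "nervous and worried about tomorrow",
    "this company treats us unfairly and i am fed up",
    "management ignored my work again, so unfair",
    "the meeting is at three in room four",
    "please find the report attached",
]
labels = ["anxiety", "anxiety", "disgruntlement", "disgruntlement", "neutral", "neutral"]

model = NaiveBayesEmotion().fit(texts, labels)
print(model.predict("worried and cannot sleep"))  # anxiety
```

A production detector would need far more annotated data and richer features; the sketch only shows the supervised train-then-predict shape.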
Here are short descriptions of the components of the toolkit:
(1) Machine Reasoning Platform
A platform that maintains provenance and keeps data storage transparent to analytics.
(2) Markovian and Bayesian Network Tools
A generic and efficient Bayesian inference tool with Markovian feature update to capture temporal properties of the features.
(3) Anomaly Detection Tools
Three generic anomaly detection algorithms for identifying anomalies in multi-dimensional data.
(4) Text Emotion Analysis
Fine-grained emotion detection from text. The tool is based on supervised learning over annotated user-generated content from public social media and the Enron email dataset.
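One way to read "Bayesian inference with Markovian feature update" is as forward filtering in a two-state hidden Markov model: a Markov transition predicts today's state from yesterday's belief, and Bayes' rule then updates that prediction with the day's binarized anomaly flags, treated as conditionally independent given the state. The states, transition probabilities, and flag probabilities below are hypothetical; the toolkit's real network structure is derived from past use cases and is not reproduced here.

```python
# Hypothetical model: two hidden states and three binarized feature flags.
TRANSITION = {  # P(state today | state yesterday)
    "normal": {"normal": 0.95, "anomalous": 0.05},
    "anomalous": {"normal": 0.30, "anomalous": 0.70},
}
FLAG_PROB = {  # P(flag k is raised | state), one entry per flag
    "normal": [0.05, 0.10, 0.02],
    "anomalous": [0.60, 0.50, 0.40],
}


def likelihood(flags, state):
    """P(observed flags | state), assuming flags are independent given the state."""
    p = 1.0
    for flag, prob in zip(flags, FLAG_PROB[state]):
        p *= prob if flag else 1.0 - prob
    return p


def filter_step(belief, flags):
    """Markov prediction from yesterday's belief, then a Bayesian update on today's flags."""
    predicted = {
        s: sum(belief[r] * TRANSITION[r][s] for r in belief) for s in TRANSITION
    }
    unnormalized = {s: predicted[s] * likelihood(flags, s) for s in predicted}
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}


belief = {"normal": 0.99, "anomalous": 0.01}
history = []
for day_flags in [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]:
    belief = filter_step(belief, day_flags)
    history.append(belief["anomalous"])
print([round(p, 3) for p in history])
```

Because the transition model carries yesterday's belief forward, a single raised flag moves the posterior only moderately, while flags that persist over consecutive days push it toward certainty; this is the temporal behavior the Markovian update is meant to capture.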