## Anomaly Detection Tools

The System G anomaly detection toolkit identifies outliers using several statistical measures, including local outlier factor, Pareto-depth analysis, and Granger causality.

Local Outlier Factor:

Local outlier factor (LOF) compares the local density of a target point with the local densities of its neighbors. Given a neighborhood size K, the local outlier factor of point A is defined as LOF_K(A) = (sum of the local reachability densities of A's K nearest neighbors) / (local reachability density of A × size of A's K-nearest neighborhood). The local reachability density of a point X is the size of X's K-nearest neighborhood divided by the sum of the reachability distances from X to each of its K nearest neighbors. The reachability distance from X to Y is the larger of two values: the distance between X and Y, and the K-distance of Y, where the K-distance of Y is the distance between Y and its Kth nearest neighbor. A score close to 1 means A is about as dense as its neighbors; a score well above 1 means A lies in a sparser region and is a likely outlier.
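The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the toolkit's implementation: the function name `local_outlier_factor` and the sample points are hypothetical, distances are assumed to be Euclidean, each neighborhood is assumed to contain exactly K points (ties are broken arbitrarily), and duplicate points are not handled.

```python
import numpy as np

def local_outlier_factor(points, k):
    """Return the LOF score of every point (hypothetical helper, Euclidean distance)."""
    X = np.asarray(points, dtype=float)
    n = len(X)

    # Pairwise distances; a point is never its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)

    # Indices of the k nearest neighbors of each point, nearest first.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # K-distance: distance from each point to its k-th nearest neighbor.
    k_distance = dist[np.arange(n), neighbors[:, -1]]

    # Reachability distance from X to neighbor Y: max(d(X, Y), K-distance(Y)).
    neighbor_dist = np.take_along_axis(dist, neighbors, axis=1)
    reach = np.maximum(neighbor_dist, k_distance[neighbors])

    # Local reachability density: neighborhood size / sum of reachability distances.
    lrd = k / reach.sum(axis=1)

    # LOF: sum of the neighbors' densities / (own density * neighborhood size).
    return lrd[neighbors].sum(axis=1) / (lrd * k)

# Four tightly clustered points and one point far away (synthetic data).
points = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
scores = local_outlier_factor(points, k=2)
print(scores)  # the last point scores far above 1, the others score 1
```

Points inside the cluster have the same density as their neighbors and score 1, while the isolated point has a much lower density than the neighbors it reaches and receives a much larger score.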

We use LOF to measure anomalies in user behavior against three different comparison bases:

Pareto-depth Analysis:

Granger Causality:

We assume that each user has several regular sequences of actions, in which the actions may be invoked by the user or by other users. For example, a secretary usually sends emails to multiple people shortly after receiving an email from their boss. Based on this assumption and a training dataset of all users' activities, we train activity sequence models for all users using Granger causality, and we use these models to measure how anomalous a user's action sequence is.
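To make the idea concrete, the following is a minimal sketch of a pairwise Granger causality test, assuming each user's activity has been converted to a count per time bucket. It is not the toolkit's training procedure, which is not described here. The function name `granger_f_statistic`, the lag, and the synthetic "boss" and "secretary" series are all hypothetical. The test fits two least-squares autoregressions for the effect series, one using only its own past and one that also uses the past of the candidate cause, and compares their residuals with an F-statistic.

```python
import numpy as np

def granger_f_statistic(cause, effect, lag):
    """F-statistic for 'cause Granger-causes effect' (hypothetical helper)."""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    n = len(effect) - lag
    target = effect[lag:]

    # Columns hold the values at t-1, ..., t-lag for every target time t.
    own_lags = np.column_stack(
        [effect[lag - j : len(effect) - j] for j in range(1, lag + 1)]
    )
    other_lags = np.column_stack(
        [cause[lag - j : len(cause) - j] for j in range(1, lag + 1)]
    )
    intercept = np.ones((n, 1))
    restricted = np.hstack([intercept, own_lags])                # own past only
    unrestricted = np.hstack([intercept, own_lags, other_lags])  # plus cause's past

    def rss(design):
        # Residual sum of squares of the least-squares fit.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = n - unrestricted.shape[1]
    return ((rss_r - rss_u) / lag) / (rss_u / dof)

# Synthetic data: the secretary's activity follows the boss's by one time step.
rng = np.random.default_rng(0)
boss = rng.poisson(2.0, size=200).astype(float)
secretary = np.concatenate([[0.0], boss[:-1]]) + rng.normal(0.0, 0.5, size=200)

f_forward = granger_f_statistic(boss, secretary, lag=1)  # boss -> secretary
f_reverse = granger_f_statistic(secretary, boss, lag=1)  # secretary -> boss
print(f_forward, f_reverse)
```

A large F-statistic in the boss-to-secretary direction and a small one in the reverse direction indicates that the boss's past activity helps predict the secretary's activity. One plausible way to score anomalies under this sketch would be to measure how poorly the fitted model predicts a new action sequence, although how the toolkit does this is not specified here.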