IBM System G Social Media Solution (SMISC - Social Media in Strategic Communications):
V2.0 applications (Released: 09/14/2015)
IBM System G Social Media Solution includes three sets of applications. Content Monitoring and Analysis applications, including Live Monitoring, Multimedia Monitoring, Story Detection, Geo Monitoring, Trend Monitoring, Concept Analytics and Impact Prediction, focus on content and aim to answer questions such as what text, images, and hashtags are tweeted, how popular they are, and what the sentiment and impact of individual tweets or collections of tweets are. People Analysis applications, including Person Analytics and Target Discovery, focus on analyzing the emotions, personalities, trust and behaviors of people based on their tweeting activities. These applications help analysts better understand the individuals in social movements and gain insights into the driving factors of various phenomena. Network Exploration and Analysis applications, including Link Exploration and Forensic Analytics, support exploration and analysis of links and dynamic information flows in the networks, to address questions such as how tweets are propagated through people in the networks over time, and whether there are any anomalies during information dissemination (e.g. rumor spreading). Below we describe the individual applications in more detail.
Overview & Awards
We have conducted substantial research for all the proposed tasks, producing about 80 publications, including one paper in Science and four papers in Nature Physics and Nature Communications. Our papers also won the ACM International Conference on Information and Knowledge Management (CIKM) 2012 Best Paper Award and the IEEE BigData 2013 Best Paper Award, and one paper was selected as the cover article of the Proceedings of the National Academy of Sciences in January 2013. Most algorithms, methods and interfaces developed as a result of our research work have been integrated into a comprehensive IBM System G Social Media Solution. The solution aims to help analysts monitor and explore dynamic social media data and conduct in-context analysis with a suite of applications that analyze and present data in various contexts. It offers a variety of analysis-rich information, including sentiment mined from both text and multimedia, emotions, personality traits, influence of individuals, impact of conversations, connections between tweets, users, hashtags, etc., as well as flows and spread of information. This information helps analysts discover, understand, and track dynamic social media movements in rich contexts.
Social media data are inherently linked and form large heterogeneous graphs. Analysis of social media data often takes into consideration both different types of entities and the relationships between them. Therefore, it is not only natural but also beneficial to apply graph-based representations and technologies to social media data analysis. IBM System G Social Media Solution is built on top of the IBM System G Graph Computing Platform, which provides a comprehensive software stack for Big Data Analytics. Fig.1 illustrates the system architecture.
The server side handles data collection, storage, retrieval, and analysis. In particular, the Database layer organizes data of various types using the property graph model, which consists of a graph structure with vertices and edges, plus the attributes associated with each vertex and edge, a.k.a. graph properties. The Middleware layer includes several runtime libraries specifically designed for property graph computations to provide graph computing primitives. These primitives serve as building blocks for constructing various high-level graph analytics. The Analytics layer provides tools to traverse graphs, retrieve relevant data, and conduct graph and data analytics. The Data ETL (extract, transform, load) module connects to the data sources to collect raw data, extract entities and the relationships between them, transform them into graph representations (vertices and edges with properties), and load them into the graph-based data stores and indices, all in a dynamic and continuous fashion.
The client side provides interactive Web-based user interfaces to gather information requests from users, interact with the backend through a Data Services module, and visualize the results returned by the module. Users can switch between applications by using the uniform navigation bar at the top of all interfaces, or by following links in one application's interface that invoke other related applications. For example, the user profile images displayed in all applications are linked to the application that provides person analytics of the corresponding users.
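As a rough illustration of the property graph model described above, the following Python sketch stores vertices and edges that each carry a set of properties. The class and method names (`PropertyGraph`, `add_vertex`, `add_edge`, `neighbors`) and the sample data are hypothetical; this is not the IBM System G API.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph: vertices and edges each carry a dict of properties."""

    def __init__(self):
        self.vertices = {}                   # vertex id -> properties
        self.edges = []                      # (source, target, label, properties)
        self.out_index = defaultdict(list)   # vertex id -> indices of outgoing edges

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, dst, label, **props):
        self.out_index[src].append(len(self.edges))
        self.edges.append((src, dst, label, props))

    def neighbors(self, vid, label=None):
        """Return targets of outgoing edges, optionally restricted to one label."""
        result = []
        for i in self.out_index[vid]:
            _, dst, edge_label, _ = self.edges[i]
            if label is None or edge_label == label:
                result.append(dst)
        return result


# Hypothetical sample data: a user creates a tweet that contains a hashtag.
graph = PropertyGraph()
graph.add_vertex("user:alice", type="user", followers=120)
graph.add_vertex("tweet:1", type="tweet", text="hello #world")
graph.add_vertex("hashtag:world", type="hashtag")
graph.add_edge("user:alice", "tweet:1", "create", time="2015-09-14T10:00:00")
graph.add_edge("tweet:1", "hashtag:world", "contain")
```

Here `graph.neighbors("tweet:1", label="contain")` follows only the "contain" edges, which is the kind of primitive that higher-level analytics could be composed from.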
1. Live Tweet Monitoring
The Live Monitoring application provides real-time monitoring of tweets (including retweets) that are relevant to user interests. The application's user interface contains multiple UI components. The input components allow a user to enter keywords or select the data channel that s/he wants to monitor. The statistics components display aggregate statistics calculated from the tweets since the start time of the monitoring, including sentiment information based on tweet text and a word cloud. The list of current tweets is dynamically updated as tweets come in, with the most recent tweets displayed at the top. The retweet graph visualizes the retweeting relationships (edges) between users (nodes). The size of each node corresponds to the user's number of followers. Hovering over a node shows the ID and profile image of the corresponding user. Hovering over an edge shows the content of the associated retweet. Details of the retweets are also provided on the right-hand side. The map indicates the location information of the tweets whenever available.
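The retweet graph described above can be sketched as follows. This is a minimal illustration, assuming retweets arrive as (retweeter, original author) pairs; the function name `build_retweet_graph`, the edge weighting by repeat count, and the sample data are assumptions, not part of the application.

```python
from collections import Counter


def build_retweet_graph(retweets, followers):
    """Build nodes sized by follower count and counted retweet edges.

    retweets: iterable of (retweeter, original_author) pairs.
    followers: dict mapping user -> follower count (0 is used if unknown).
    """
    edge_weights = Counter(retweets)
    nodes = {}
    for retweeter, author in edge_weights:
        for user in (retweeter, author):
            nodes[user] = {"size": followers.get(user, 0)}
    return nodes, dict(edge_weights)


# Hypothetical sample data.
sample_retweets = [("bob", "alice"), ("carol", "alice"), ("bob", "alice")]
sample_followers = {"alice": 5000, "bob": 40}
nodes, edges = build_retweet_graph(sample_retweets, sample_followers)
```

A front end could then map each node's `size` to a drawing radius and attach the retweet text to each edge for the hover display.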
2. Trend Analytics
The Trend Monitoring application provides timeline views of the popularity of hashtags (a) or topics (b) relevant to a given data channel. Users can interact with chart legends to hide/show particular hashtags or topics, or mouse over to get detailed count information at each time slice. The unit of the time dimension can be switched between 10 seconds, 1 minute and 1 hour for hashtag monitoring.
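The switchable time unit can be pictured as bucketing tweet timestamps into fixed-width slices. The sketch below assumes tweets are given as (Unix timestamp, hashtag list) pairs; `hashtag_timeline`, `SLICE_SECONDS`, and the sample data are hypothetical names chosen for illustration.

```python
from collections import Counter

# The three units mentioned above, expressed in seconds.
SLICE_SECONDS = {"10s": 10, "1min": 60, "1h": 3600}


def hashtag_timeline(tweets, unit):
    """Count hashtag occurrences per time slice.

    tweets: iterable of (unix_timestamp, [hashtags]) pairs.
    Returns a Counter keyed by (slice_start_timestamp, hashtag).
    """
    width = SLICE_SECONDS[unit]
    counts = Counter()
    for timestamp, hashtags in tweets:
        slice_start = timestamp - (timestamp % width)
        for tag in hashtags:
            counts[(slice_start, tag.lower())] += 1
    return counts


# Hypothetical sample data.
sample = [(100, ["#News"]), (105, ["#news", "#ibm"]), (112, ["#news"])]
per_10s = hashtag_timeline(sample, "10s")
per_min = hashtag_timeline(sample, "1min")
```

Switching the unit only changes the bucket width, so the same tweets can be re-aggregated without any other change.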
3. Multimedia Monitoring
The Multimedia Monitoring application displays real-time tweets containing images along with automatically calculated visual sentiment to assist analysts in understanding social movements from a visual perspective. The visual sentiment of an image is determined by the sentiment of the Adjective and Noun Pairs (ANPs), such as "happy dog" and "horizontal text", that are automatically generated to describe the image. A prediction model, established using a deep learning framework that correlates thousands of ANPs with image features, is used to generate a list of ANPs given the features extracted from the image. In the example image, the automatically generated ANPs are "incredible sunset", "amazing sunset", "awesome sunset", "colorful sky", and "fantastic sunrise", which are close to a human's description of the image. Since the sentiment associated with these ANPs is positive, the visual sentiment of the image is set to positive.
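The last step, going from ANPs to an image-level sentiment, might look like the sketch below. The scores in `ANP_SENTIMENT`, the averaging rule, and the threshold are all assumptions made for illustration; the source does not specify how ANP sentiments are combined.

```python
# Hypothetical ANP sentiment scores in [-1, 1]; real values would come from an ANP lexicon.
ANP_SENTIMENT = {
    "incredible sunset": 0.9,
    "amazing sunset": 0.9,
    "awesome sunset": 0.8,
    "colorful sky": 0.6,
    "fantastic sunrise": 0.8,
    "broken window": -0.6,
}


def visual_sentiment(anps, threshold=0.1):
    """Label an image from the mean sentiment of its known ANPs (assumed rule)."""
    scores = [ANP_SENTIMENT[a] for a in anps if a in ANP_SENTIMENT]
    if not scores:
        return "neutral"
    mean = sum(scores) / len(scores)
    if mean > threshold:
        return "positive"
    if mean < -threshold:
        return "negative"
    return "neutral"


sunset_anps = ["incredible sunset", "amazing sunset", "awesome sunset",
               "colorful sky", "fantastic sunrise"]
label = visual_sentiment(sunset_anps)
```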
4. Geographic Monitoring
The Geo Monitoring application visualizes tweets on a world map, where they happen and as they happen. Users can pan and zoom in to get a high-resolution view of any location in the world and monitor real-time tweets from that location. These tweets are also displayed at the bottom of the interface, together with the profile images of the authors. Clicking on a profile image opens the Person Analytics application for an in-depth analysis of the corresponding user.
5. Scope Identification
Currently, IBM System G Social Media Solution focuses on Twitter data. The system can consume tweets through multiple means, such as the Twitter public API and GNIP Decahose and PowerTrack subscriptions. The administrator sets up channels for data collection via an admin console. Each data channel corresponds to a topic of interest, which can be broad or narrow depending on the Twitter queries/filters/rules specified for the channel. Users can further define scopes of interest within a data channel by creating filters through the Scope Identification application. Each filter consists of one or more terms ANDed or ORed together. An interactive visualization showing terms related via co-occurrences in tweets is provided to help users choose terms for defining filters. During monitoring, exploration and analysis, users can switch between data channels freely and apply a filter defined for the selected data channel to further constrain the data in this channel.
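The term co-occurrence data behind such a visualization could be computed roughly as follows. This is a simplified sketch that splits tweets on whitespace; the function names `cooccurrence_counts` and `related_terms` and the sample tweets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tweets):
    """Count how often each unordered pair of terms appears in the same tweet."""
    pairs = Counter()
    for text in tweets:
        terms = sorted(set(text.lower().split()))
        pairs.update(combinations(terms, 2))
    return pairs


def related_terms(pairs, term, top_n=3):
    """Return the terms that co-occur most often with `term`."""
    related = Counter()
    for (a, b), count in pairs.items():
        if a == term:
            related[b] += count
        elif b == term:
            related[a] += count
    return [t for t, _ in related.most_common(top_n)]


# Hypothetical sample data.
sample_tweets = [
    "flood warning downtown",
    "flood rescue downtown",
    "concert downtown",
]
pairs = cooccurrence_counts(sample_tweets)
suggestions = related_terms(pairs, "flood", top_n=1)
```

A user starting from "flood" would be offered the most strongly related term as a candidate for an ANDed or ORed filter.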
6. Concept Analytics
The Concept Analytics application visualizes the distribution of tweet counts over a period of time for a concept defined by the user. Each concept is expressed by keywords ANDed or ORed together. In addition, NOT and phrases are supported. The user can also request that words expressing positive or negative sentiment be automatically added to the concept definition. Results for multiple concepts are displayed side by side to allow easy comparison. The collection of tweets corresponding to a particular concept during a specific time span can be retrieved upon request.
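A concept built from AND, OR, NOT, and phrases can be evaluated with a small recursive matcher, as in the sketch below. The nested-tuple representation, the function names `matches` and `daily_counts`, and the sample tweets are assumptions for illustration; keywords are assumed to be plain words or multi-word phrases.

```python
import re
from collections import Counter


def matches(concept, text):
    """Evaluate a concept expression against tweet text.

    A concept is either a string (a keyword or a multi-word phrase) or a
    tuple ("AND" | "OR" | "NOT", operand, ...).
    """
    if isinstance(concept, str):
        # Whole-word, case-insensitive match for keywords and phrases.
        pattern = r"\b" + re.escape(concept.lower()) + r"\b"
        return re.search(pattern, text.lower()) is not None
    op, *operands = concept
    if op == "AND":
        return all(matches(c, text) for c in operands)
    if op == "OR":
        return any(matches(c, text) for c in operands)
    if op == "NOT":
        return not matches(operands[0], text)
    raise ValueError(f"unknown operator: {op}")


def daily_counts(concept, tweets):
    """Count matching tweets per day; tweets are (date_string, text) pairs."""
    return Counter(day for day, text in tweets if matches(concept, text))


# Hypothetical concept and sample data.
concept = ("AND", ("OR", "strike", "walk out"), ("NOT", "baseball"))
tweets = [
    ("2015-09-01", "Teachers strike begins today"),
    ("2015-09-01", "Workers walk out at noon"),
    ("2015-09-02", "Baseball: third strike!"),
]
counts = daily_counts(concept, tweets)
```

Running `daily_counts` once per concept yields the per-day series that could be plotted side by side for comparison.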
7. Link Exploration
The Link Exploration application allows analysts to easily explore and discover both direct and indirect connections between various entities (e.g. tweets, users, hashtags) in the heterogeneous graph representation of Twitter data. A standard node-link visualization is used to display a sub-graph retrieved based on a user query specified via the interactive query panel. Examples include the 2-hop ego network of a specific hashtag and the 2-hop ego network of a specific user, both of which link together tweet, user, hashtag, image, and time nodes via create, retweet, mention, reply, or contain relationships. Hovering over a node displays more detailed information about the node, left-clicking on a node retrieves a new sub-graph using the selected node as the query, and right-clicking on a user node invokes person analytics.
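A 2-hop ego network query amounts to a depth-limited breadth-first search from the selected node. The sketch below assumes an undirected graph stored as an adjacency dictionary; the function name `ego_network` and the toy node identifiers are hypothetical.

```python
from collections import deque


def ego_network(adjacency, center, hops=2):
    """Return the set of nodes within `hops` edges of `center` (breadth-first)."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


# Undirected toy graph: each relationship is listed in both directions.
adjacency = {
    "hashtag:ibm": ["tweet:1", "tweet:2"],
    "tweet:1": ["hashtag:ibm", "user:alice"],
    "tweet:2": ["hashtag:ibm", "user:bob"],
    "user:alice": ["tweet:1", "tweet:3"],
    "user:bob": ["tweet:2"],
    "tweet:3": ["user:alice"],
}
subgraph_nodes = ego_network(adjacency, "hashtag:ibm", hops=2)
```

In this toy graph `tweet:3` is three hops from the hashtag, so it is left out of the result. Left-clicking a node in the interface would correspond to calling the same query with that node as the new center.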
8. Impact Prediction
The Impact Prediction application aims to dynamically capture and analyze “virtual social conversations” formed around various topics, and to predict their potential impact on the business that may be affected. Since Twitter users often use hashtags to participate in particular social conversations, the application extracts virtual social conversations by grouping tweets around common hashtags. All tweets within a single conversation are analyzed to extract a set of features based on tweet content (e.g. percentage of keyword coverage), author information (e.g. number of identified influencers), and other metadata (e.g. location, language). A regression-based prediction model is then applied to these features to calculate an impact score for the conversation. The prediction model is created with the help of domain knowledge provided by subject matter experts, and can be dynamically updated given user feedback.
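The feature-extraction and scoring steps can be sketched as below. The source says only that the model is regression-based; the logistic form, the feature set, and the values in `WEIGHTS` and `BIAS` are assumptions for illustration, and the conversation is assumed to be non-empty.

```python
import math

# Hypothetical coefficients; in practice they would be fitted from labeled conversations.
WEIGHTS = {"keyword_coverage": 2.0, "influencer_count": 0.5, "tweet_count_log": 0.8}
BIAS = -3.0


def conversation_features(tweets, keywords, influencers):
    """Extract simple features from a non-empty list of (author, text) pairs."""
    covered = sum(any(k in text.lower() for k in keywords) for _, text in tweets)
    authors = {author for author, _ in tweets}
    return {
        "keyword_coverage": covered / len(tweets),
        "influencer_count": len(authors & influencers),
        "tweet_count_log": math.log1p(len(tweets)),
    }


def impact_score(features):
    """Logistic-regression style score in (0, 1) (assumed model form)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical conversation grouped around a shared hashtag.
conversation = [("alice", "Outage again #acme"), ("bob", "Love the weather #acme")]
features = conversation_features(conversation, {"outage"}, {"alice"})
score = impact_score(features)
```

Updating the model from user feedback would then correspond to refitting the coefficients on conversations that analysts have labeled.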
9. Story Detection
The Story Detection application clusters tweets containing images into “stories” to help analysts quickly get a sense of rapidly developing storylines in social media. Each group contains tweets with similar images. The similarities between the images are determined based on the similarities between the ANPs associated with these images. This approach enables images that are close to one another at the semantic level to be grouped together, even though they may not be similar in low-level image features.
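Grouping images by the similarity of their ANPs could be approximated as follows. The source does not name the similarity measure or clustering method, so the Jaccard similarity, the greedy single-pass grouping, and the 0.5 threshold used here are stand-in assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of ANPs."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_into_stories(images, threshold=0.5):
    """Greedy single-pass grouping: join the first story whose seed is similar enough.

    images: list of (image_id, set_of_anps). Returns a list of lists of image ids.
    """
    stories = []  # each entry: (seed_anps, [image_ids])
    for image_id, anps in images:
        for seed, members in stories:
            if jaccard(seed, anps) >= threshold:
                members.append(image_id)
                break
        else:
            # No existing story is close enough, so this image starts a new one.
            stories.append((anps, [image_id]))
    return [members for _, members in stories]


# Hypothetical sample data.
images = [
    ("img1", {"amazing sunset", "colorful sky", "calm sea"}),
    ("img2", {"amazing sunset", "colorful sky"}),
    ("img3", {"angry crowd", "broken window"}),
]
stories = group_into_stories(images)
```

Because the comparison uses ANPs rather than pixels, two sunset photos taken from different angles can land in the same story.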
10. Personality Analytics
The Person Analytics application conducts multi-dimensional emotion analysis, personality analysis, and trust analysis of a given Twitter user based on his/her tweets. Emotion analysis detects the user's expressed emotions at different time points and summarizes those emotions to reveal the user's emotional style. Personality analysis focuses on the Big 5 personality traits of the user. Trust analysis calculates the user's trustingness and trustworthiness by looking at his/her interactions with others via tweets. The interface provides an interactive visualization of the emotion analysis result (a) to create a visual emotional profile of the target user, and displays personality and trust scores.
11. Target Detection
We demonstrate our capability of showing both discovered properties and raw temporal data so that analysts can validate the properties. Our use case is identifying bot accounts on Twitter using a visual analytics system, TargetVue. TargetVue consists of four modules: the data collection module, the preprocessing module, the analysis module, and the visualization module. Users are initially visualized in the global view and listed in the user list, sorted by their anomaly scores. The global view shows the similarity among users in feature space. From the global view, we found that most users are placed in the center area and only a few are placed at the margins with a high degree of outlierness. We select those outliers with high anomaly scores for further investigation.
The investigation view lays out the selected users on a triangle grid. We propose three glyph designs: the behavior glyph, the z-glyph and the interaction glyph. For more information on how to interpret these glyphs, please see the TargetVue documentation.
12. Forensic Analytics
The Forensic Analytics application focuses on analyzing retweet sequences to detect anomalous ones that may indicate rumors or other malicious actions. Given a retweet sequence, the application evaluates how anomalous the sequence is using One-Class Conditional Random Fields. The user interface provides interactive visualizations that present the top anomalous retweet sequences in rich context, allowing analysts to easily explore, understand, and validate analysis results.