The US presidential election dominates the global media every four years, and news coverage, carefully analysed by commentators and campaign strategists, plays a major role in shaping voter opinion. Academics have developed an online tool, Election Watch, which analyses how the international media report on the US election.
A paper about the project by academics at the University of Bristol’s Intelligent Systems Laboratory will be presented at the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), held in Avignon, France.
Election Watch automatically monitors political discourse about the 2012 US presidential election from over 700 American and international news outlets. The information displayed is based, so far, on 91,456 articles.
The web tool allows users to explore news stories through an interactive interface and demonstrates the application of modern machine learning and language technologies. By analysing news articles about the 2012 US election, the researchers have found patterns in the political narrative.
The site is updated daily and presents narrative patterns as they are extracted from the news. These patterns include actors, actions, triplets representing political support between actors, and the automatically inferred political allegiance of actors.
The site also presents key named entities, timelines and heat maps. Network analysis allows the researchers to infer the role of each actor in the wider political discourse, recognising adversaries and allies. Users can browse articles by political statement rather than by keyword; for example, they can browse articles in which Romney is described as criticising Obama. All graphical briefings are generated automatically and are interactive, and each relation presented to the user can be used to retrieve the supporting articles from hundreds of online news sources.
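Browsing by political statement rather than by keyword amounts to indexing articles by the triplets they contain. The sketch below illustrates the idea under simple assumptions: each article has already been reduced to a list of hypothetical (subject, verb, object) triplets with lemmatised, lowercase words, and the helper names `build_statement_index` and `find_articles` are illustrative, not part of Election Watch.

```python
from collections import defaultdict

# Hypothetical triplet type: (subject, verb, object), all lowercase.
Triplet = tuple[str, str, str]


def build_statement_index(articles: dict[str, list[Triplet]]) -> dict[Triplet, set[str]]:
    """Map each (subject, verb, object) triplet to the IDs of the articles that contain it."""
    index: dict[Triplet, set[str]] = defaultdict(set)
    for article_id, triplets in articles.items():
        for triplet in triplets:
            index[triplet].add(article_id)
    return dict(index)


def find_articles(index: dict[Triplet, set[str]], subject: str, verb: str, obj: str) -> list[str]:
    """Return, sorted, the IDs of articles in which `subject` is described as `verb`-ing `obj`."""
    return sorted(index.get((subject, verb, obj), set()))


# Hypothetical articles, already reduced to triplets.
articles = {
    "a1": [("romney", "criticise", "obama")],
    "a2": [("obama", "criticise", "romney"), ("romney", "criticise", "obama")],
    "a3": [("ryan", "support", "romney")],
}
index = build_statement_index(articles)
print(find_articles(index, "romney", "criticise", "obama"))  # ['a1', 'a2']
```

Because the key is the whole triplet, a query for Obama criticising Romney returns only article a2, which a keyword search for "Obama", "Romney" and "criticise" could not distinguish.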
Nello Cristianini, Professor of Artificial Intelligence, who is leading the project, said: “The number of news articles devoted to the US election is so large that no exhaustive analysis can be attempted by conventional means. Even if just focusing on the leading English-language outlets, there are hundreds of thousands of articles to analyse just for the primary phase. So any large-scale analysis of global coverage will necessarily need to make use of computational methods.
“However, most computational approaches to news content analysis are limited to sophisticated forms of keyword counting, be it for sentiment analysis or topic detection, and related statistical analysis. This will necessarily miss many aspects of the narrative to which voters are exposed, and which may therefore be of interest to analysts.”
The researchers’ aim was to access information closer to what a human analyst could extract, yet simple enough to be extracted reliably by computational means in a Big Data setting.
In this project, they automated techniques from Quantitative Narrative Analysis (QNA) so that they can be applied at a vast scale. The approach identifies the actors and the actions that dominate a story, as well as the basic units of narration: subject-verb-object triplets. While still very simple, this information captures a variety of relations that are relevant to political discourse and that classical methods would miss.
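A minimal illustration of the point about keyword counting: the two hypothetical sentences below contain exactly the same words, so any bag-of-words count treats them as identical, while subject-verb-object triplets (written here by hand rather than extracted by a parser) keep track of who is criticising whom.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Keyword counts, ignoring case and word order."""
    return Counter(sentence.lower().split())


s1 = "Romney criticises Obama"
s2 = "Obama criticises Romney"
print(bag_of_words(s1) == bag_of_words(s2))  # True: the keyword counts are identical

# Hand-written triplets keep the direction of the statement.
t1 = ("romney", "criticises", "obama")
t2 = ("obama", "criticises", "romney")
print(t1 == t2)  # False: subject and object are swapped
```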
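Once triplets have been extracted, finding the actors and actions that dominate a story can be as simple as counting. The sketch below assumes triplets are already available as lowercase (subject, verb, object) tuples; the example data and the function name `dominant_actors_and_actions` are hypothetical, and the real system may weight or normalise its counts differently.

```python
from collections import Counter


def dominant_actors_and_actions(triplets, k=3):
    """Count how often each actor (as subject or object) and each action (verb)
    occurs, and return the k most frequent of each."""
    actors: Counter = Counter()
    actions: Counter = Counter()
    for subj, verb, obj in triplets:
        actors[subj] += 1
        actors[obj] += 1
        actions[verb] += 1
    return actors.most_common(k), actions.most_common(k)


# Hypothetical triplets from a handful of articles.
triplets = [
    ("romney", "criticise", "obama"),
    ("obama", "attack", "romney"),
    ("ryan", "support", "romney"),
    ("clinton", "endorse", "obama"),
    ("romney", "criticise", "obama"),
]
top_actors, top_actions = dominant_actors_and_actions(triplets, k=2)
print(top_actors)   # [('romney', 4), ('obama', 4)]
print(top_actions)  # [('criticise', 2), ('attack', 1)]
```

`Counter.most_common` breaks ties in the order elements were first encountered, which is why Romney is listed before Obama.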
One of the results is a network whose nodes are actors, represented by noun phrases such as “the democratic party”, and whose edges are actions, represented by transitive verbs such as “endorsed”.
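Such a network can be represented as a set of actor nodes plus directed edges from subject to object, each edge labelled with the verbs that connect the pair. This is a sketch with hypothetical data and a hypothetical function name, using plain dictionaries rather than any particular graph library.

```python
from collections import Counter, defaultdict


def build_actor_network(triplets):
    """Build a directed actor network: nodes are actors, and each edge
    (subject, object) carries a Counter of the verbs linking the pair."""
    nodes = set()
    edges = defaultdict(Counter)
    for subj, verb, obj in triplets:
        nodes.update((subj, obj))
        edges[(subj, obj)][verb] += 1
    return nodes, dict(edges)


# Hypothetical triplets.
triplets = [
    ("the democratic party", "endorsed", "obama"),
    ("the democratic party", "endorsed", "obama"),
    ("romney", "criticised", "obama"),
    ("the republican party", "endorsed", "romney"),
]
nodes, edges = build_actor_network(triplets)
print(sorted(nodes))
print(edges[("the democratic party", "obama")])  # Counter({'endorsed': 2})
```

Keeping verb counts on each edge, rather than a single label, preserves how often a relation is reported, which later filtering steps can use as evidence of reliability.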
The domain of US politics is particularly amenable to this type of network analysis because of the binary nature of the choice (at least after the primary phase): all the various issues and players ultimately have to fit into a bipolar playing field. The communication is also easy to analyse, since various actors often state explicit support for, or opposition to, the candidates.
It is therefore possible to detect the relations between these actors automatically, generating a relational network whose topology reflects the political relations between the players. Analysing the properties of this network can reveal a great deal about the political landscape as represented in the news narrative. Another key result of this type of analysis is that the researchers can identify which actors are more often portrayed as subjects or objects of the political narrative, and which are more likely to be the subject or object of positive and negative statements.
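One simple way to turn support and opposition into inferred allegiances, exploiting the bipolar structure described above, is to treat the network as a signed graph and split it into two camps: support places two actors in the same camp, opposition places them in opposite camps. The sketch below is a stand-in for illustration, not the method used by the researchers; it assumes relations have already been reduced to hypothetical (actor, actor, sign) tuples, ignores the direction of each relation, and reports how many relations contradict the resulting two-camp split.

```python
from collections import defaultdict, deque


def infer_camps(relations, seed):
    """Split actors into two camps from signed relations.

    relations: (actor_a, actor_b, sign) tuples, sign = +1 for support and
    -1 for opposition; direction is ignored. Camps are propagated by
    breadth-first search from `seed` (camp 0): support keeps the camp,
    opposition flips it. Returns (camps, conflicts), where conflicts counts
    relations that disagree with the final two-camp split.
    """
    graph = defaultdict(list)
    for a, b, sign in relations:
        graph[a].append((b, sign))
        graph[b].append((a, sign))

    camps = {seed: 0}
    queue = deque([seed])
    while queue:
        actor = queue.popleft()
        for neighbour, sign in graph[actor]:
            if neighbour not in camps:
                camps[neighbour] = camps[actor] if sign > 0 else 1 - camps[actor]
                queue.append(neighbour)

    conflicts = sum(
        1
        for a, b, sign in relations
        if a in camps and b in camps and (camps[a] == camps[b]) != (sign > 0)
    )
    return camps, conflicts


# Hypothetical signed relations.
relations = [
    ("biden", "obama", +1),
    ("romney", "obama", -1),
    ("ryan", "romney", +1),
    ("ryan", "biden", -1),
]
camps, conflicts = infer_camps(relations, seed="obama")
print(camps)      # {'obama': 0, 'biden': 0, 'romney': 1, 'ryan': 1}
print(conflicts)  # 0
```

Actors not connected to the seed are left unassigned, and a high conflict count signals that the two-camp picture fits the reported relations poorly.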
At the conference the researchers will present both experiments on the past five election cycles and an up-to-date analysis of the 2012 election. The first set is based only on New York Times coverage, while the analysis of the current election draws on more than 719 international outlets, which have produced more than 70,000 articles to date. So far the researchers’ system has extracted 261,510 triplets, containing 27,317 distinct actors. In the meantime, the online tool has passed the mark of 91,000 articles.
The researchers will concentrate on two classes of results: the properties of the network of political support among actors, which reveals complex party allegiances, and the embedding of actors in a space that reveals their position in the media narrative, as subjects or objects of positive or negative statements.
The computational infrastructure is capable of detecting election-related articles, analysing their content, resolving co-reference and anaphora, identifying verbs that denote support or opposition, identifying key actors, filtering out statistically unreliable information, and finally analysing the properties of the resulting relational network.
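The embedding of actors by narrative role can be approximated with a small profile per actor: the share of its appearances as subject of a positive statement, subject of a negative one, object of a positive one and object of a negative one. This four-dimensional profile is an assumption chosen for illustration; the paper’s actual embedding may be constructed differently. The input is hypothetical (subject, sign, object) tuples, with sign +1 for positive and -1 for negative statements.

```python
from collections import defaultdict


def role_profiles(signed_triplets):
    """Return actor -> (subject_pos, subject_neg, object_pos, object_neg),
    the share of the actor's appearances in each narrative role.

    signed_triplets: (subject, sign, object) tuples with sign = +1 for a
    positive statement and -1 for a negative one.
    """
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for subj, sign, obj in signed_triplets:
        offset = 0 if sign > 0 else 1
        counts[subj][0 + offset] += 1  # subject of a positive / negative statement
        counts[obj][2 + offset] += 1   # object of a positive / negative statement
    return {
        actor: tuple(c / sum(role_counts) for c in role_counts)
        for actor, role_counts in counts.items()
    }


# Hypothetical signed statements.
statements = [
    ("romney", -1, "obama"),
    ("romney", -1, "obama"),
    ("clinton", +1, "obama"),
    ("ryan", +1, "romney"),
]
for actor, profile in role_profiles(statements).items():
    print(actor, [round(x, 2) for x in profile])
```

Each profile sums to one, so actors with very different amounts of coverage can be compared on the same scale; a two-dimensional map could then be obtained by projecting these profiles, for example with principal component analysis.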
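Two of these steps, identifying verbs that denote support or opposition and filtering out statistically unreliable information, can be sketched together. The verb sets below are a small hypothetical lexicon, and the `min_count` threshold is an illustrative stand-in for whatever reliability test the real pipeline applies: a relation is kept only if it is observed at least `min_count` times.

```python
from collections import Counter

# Hypothetical lexicon; the verbs used by the real system are not given here.
SUPPORT_VERBS = {"endorse", "support", "praise", "back"}
OPPOSE_VERBS = {"criticise", "attack", "oppose", "accuse"}


def signed_relations(triplets, min_count=2):
    """Map raw (subject, verb, object) triplets to (subject, sign, object),
    dropping verbs with no known polarity and any relation seen fewer than
    min_count times. Returns relation -> number of observations."""
    counts: Counter = Counter()
    for subj, verb, obj in triplets:
        if verb in SUPPORT_VERBS:
            counts[(subj, +1, obj)] += 1
        elif verb in OPPOSE_VERBS:
            counts[(subj, -1, obj)] += 1
    return {rel: n for rel, n in counts.items() if n >= min_count}


# Hypothetical lemmatised triplets.
triplets = [
    ("ryan", "endorse", "romney"),
    ("ryan", "back", "romney"),
    ("romney", "attack", "obama"),
    ("romney", "meet", "obama"),
    ("obama", "praise", "romney"),
]
print(signed_relations(triplets))  # {('ryan', 1, 'romney'): 2}
```

Different support verbs ("endorse", "back") collapse into the same signed relation, so evidence accumulates across phrasings before the threshold is applied.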
While each step of the extraction phase may be imperfect, the statistical corrections coming from the use of very big datasets deliver a sufficiently clean signal for political observers to monitor the state of play of a complex process such as a US presidential campaign.
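The intuition behind this last point can be made concrete with an idealised calculation. Suppose, purely as an assumption, that each extraction of a given relation is independently correct with probability p, and the system reports the majority answer over n extractions. The probability that the majority is correct is a binomial tail sum, and it grows quickly with n. Real extraction errors are rarely fully independent, so this overstates the benefit, but it shows why volume helps.

```python
from math import comb


def majority_correct_probability(p, n):
    """Probability that a strict majority of n independent extractions,
    each correct with probability p, is correct (odd n avoids ties)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


print(round(majority_correct_probability(0.7, 1), 3))  # 0.7
print(round(majority_correct_probability(0.7, 3), 3))  # 0.784
print(majority_correct_probability(0.7, 101))          # very close to 1
```

The argument depends on p being above 0.5: if individual extractions were wrong more often than right, aggregating more of them would make the majority answer worse, not better.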