In celebration / abject dread of the 2020 election cycle that is already upon us, Epsilon Theory is beginning a new monthly feature – the ET Election Index. Our aim with the feature is to lay as bare as possible the popular narratives governing the US elections in 2020. That includes narratives concerning policy proposals and candidates found in the news, opinion and feature content produced by national, local and smaller outlets.
Our goal isn’t to uncover ‘media bias’.
Our goal isn’t to discuss the ‘fairness’ of coverage to different candidates.
Our goal isn’t to ‘fact check.’
Our goal is to make you a better, more informed consumer of political news by showing you indicators that the news you are reading may be affected by (1) adherence to narratives and other abstractions, (2) the association/conflation of topics and (3) the presence of opinions. Our goal is to help you – as much as it is possible to do so – cut through the intentional or unintentional ways in which media outlets guide you in how to think about various issues, an activity we call Fiat News.
Our goal is to help you make up your own damn mind.
What do we do?
We leverage the natural language processing (NLP) tools from Quid to construct network graphs of all available sources from the LexisNexis Newsdesk database. These network graphs are built by comparing the language used in each queried story against the language used in every other story, resulting in a matrix of pairwise similarity among all of the queried articles.
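For readers who like to see the mechanics, a minimal sketch of the pairwise-similarity idea is below. To be clear, Quid's actual NLP pipeline is proprietary and considerably more sophisticated; the TF-IDF and cosine-similarity approach, the sample article snippets and the variable names here are our own illustrative assumptions, not the production process.

```python
# Minimal sketch of pairwise article similarity (illustrative only; the
# production pipeline is proprietary and more sophisticated than TF-IDF).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder article texts.
articles = [
    "Buttigieg, the millennial mayor, announced his candidacy ...",
    "Warren released another detailed policy proposal on student debt ...",
    "Sanders returned to the themes of his 2016 campaign ...",
]

# Represent each article as a weighted bag of words, then compare every
# article to every other article, yielding an N x N similarity matrix.
tfidf = TfidfVectorizer(stop_words="english")
doc_vectors = tfidf.fit_transform(articles)
similarity_matrix = cosine_similarity(doc_vectors)  # values in [0, 1]
```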
Here at Epsilon Theory, we then take this output and attached metadata to build and score a range of signals by topic – which may be candidates, policies or any other categorization scheme that fits sensibly within election coverage.
Any prose you see under the heading “commentary” you should read as our judgments and opinion. Everything else – unless otherwise noted – you should take as the systematic output of the process and signals described above.
Cohesion: Our Measure of Narrative-Adherence
We think that news and opinion journalism often replaces facts and statements with abstractions of those facts. Those abstractions may take the form of euphemisms, logical frameworks, conflations or descriptive turns of phrase. Over time, especially when they are emotionally or culturally powerful (memes), these abstractions can become increasingly divorced from the facts they were originally built to represent. They may also become loaded with other, more subjective, meaning. This makes interpretation both difficult and variable from one individual to the next.
We call these abstractions narrative. By measuring the aggregate similarity of language used by writers in the discussion of a topic over time, we believe we can track the extent to which a narrative exists for that topic. We call this Cohesion.
We also believe that a cohesive narrative tends to cause future content to adhere to the conventions and language that have become common knowledge for that topic. As a result, when we consume news content about topics or individuals with high Cohesion, we must be aware that the information may be presented to us in a way that aligns it with existing narrative abstractions rather than on an independent basis.
In effect, this is one of the ways in which we are subtly guided in how to think about an issue.
What does Cohesion look like?
Our measure is based on the underlying aggregate similarity between all of the articles relating to a single topic. When more articles refer to Pete Buttigieg as “a gay, millennial” candidate, this measure increases. When more articles refer to Elizabeth Warren as “policy-oriented”, this measure increases. When articles vary among isolated references to Andrew Yang’s UBI proposal, his penchant for cryptocurrency, and descriptions of him as an “entrepreneur” or “technologically savvy”, the Cohesion measure will typically fall.
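For intuition only: one simple way to collapse such a similarity matrix into a single Cohesion-style number would be to average the pairwise similarities across all distinct article pairs on a topic. Our actual signal construction and scoring are not reproduced here; the function below is an illustrative assumption, not our production formula.

```python
import numpy as np

def aggregate_similarity(similarity_matrix: np.ndarray) -> float:
    """Mean pairwise similarity across all distinct article pairs on a topic.

    Higher values indicate that articles on the topic use more uniform
    language -- the intuition behind the Cohesion measure described above.
    (Illustrative sketch only, not our production signal.)
    """
    n = similarity_matrix.shape[0]
    off_diagonal = similarity_matrix[~np.eye(n, dtype=bool)]  # drop self-similarity
    return float(off_diagonal.mean())
```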
While we calculate our measure based on the underlying data describing this aggregate similarity, we also present visualizations from our partners at Quid. We think it can be helpful to see what connections – and the lack of connections – look like. An extremely low cohesion network might look something like the below.
In contrast, a high cohesion network might look like the below. In both of these graphs, similarity is represented by proximity on the chart, emphasized in very high cases (‘adjacencies’) by a connecting line, and organized by color into clusters of topical or linguistic meaning/similarity.
Attention: Our Measure of Association and Conflation of Topics
We also believe it is useful to note when authors use increasingly similar language to describe two topics, especially when one topic is a sub-set of the other. We think this behavior often expresses a desire (or belief) on the part of the author to establish or presume a relationship between those topics. Because the relationship between topics and/or people forms a significant part of how we think about issues, their conflation is an effective way to guide us in how to think about a topic.
In other words, these are the topics which warrant the question: “Why am I reading this NOW?”
We call our measure of the similarity of language used between two topics Attention.
What does Attention look like?
While visualization of Attention doesn’t fully capture how connected a topic is to (or better, within) another, it can still be useful to develop an intuition for it. A high Attention relationship might look like the below, where highlighted nodes and connecting lines identify the articles of one topic within another (in this case, references to the Dallas Cowboys within broader NFL coverage):
In contrast, a low Attention topic would be one for which the centrality and connectivity of articles was limited. The below visualization presents what that might look like (in this case, an illustration of references to concussions within the same broader NFL coverage):
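As with Cohesion, our actual Attention signal construction is not reproduced here. As a rough, purely illustrative analogue, though, one could measure the average similarity between every article in a sub-topic (say, the Dallas Cowboys) and every article in the broader coverage set (the NFL). The sketch below makes that assumption explicit; the function name and inputs are ours, not a production implementation.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cross_topic_similarity(subtopic_vectors, broad_topic_vectors) -> float:
    """Mean similarity between each sub-topic article and each article in the
    broader coverage set -- an illustrative stand-in for an Attention-style
    measure of how connected one topic's language is to another's.
    """
    cross = cosine_similarity(subtopic_vectors, broad_topic_vectors)
    return float(np.mean(cross))
```

Both vector arguments here would be document-term matrices along the lines of the `doc_vectors` object in the first sketch, one built from the sub-topic's articles and one from the broader topic's.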
Sentiment: Our Measure of the Potential Influence of Author Opinion
We also believe that the graying of lines between news and non-news coverage has created an environment in which nearly every piece can be expected to carry at least some of the author’s or publisher’s affect, even when fastidious, ethical journalists take great pains to avoid it.
We think sentiment measures – even ours – should be viewed with caution. The main reason is that many topics are intrinsically related to terms which have ‘negative’ sentiment under any scoring system. You won’t find many articles about climate change with a cheery tone, but that doesn’t mean that the author is necessarily including opinion. What can be useful, however, are comparisons of sentiment over time, or comparisons of like coverage – for example, across candidates. That is where we tend to focus our efforts.
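To make the “comparisons, not levels” point concrete, here is a toy example using an off-the-shelf lexicon scorer (VADER from NLTK). This is not the sentiment model behind our signals, and the candidate labels and article texts are placeholders; the point is simply that we compare average scores across like coverage rather than reading any single absolute score as authorial opinion.

```python
# Illustrative only: an off-the-shelf lexicon scorer, not the sentiment
# model used for the signals in this piece.
# Requires a one-time download: nltk.download("vader_lexicon")
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# Placeholder candidates and article texts.
articles_by_candidate = {
    "Candidate A": ["Text of article 1 ...", "Text of article 2 ..."],
    "Candidate B": ["Text of article 3 ...", "Text of article 4 ..."],
}

# Compare average compound scores across candidates; a topic like climate
# change will score negatively under any lexicon, so levels alone are not
# evidence of authorial opinion.
for candidate, texts in articles_by_candidate.items():
    scores = [analyzer.polarity_scores(text)["compound"] for text in texts]
    print(candidate, round(sum(scores) / len(scores), 3))
```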
Who is it for?
People who want to make up their own damn minds.
People who want to know when conscious or subconscious use of language by politicians, media members and other influential figures and entities is influencing the lens through which we think about and – with other civically and politically minded fellow citizens – collectively experience the debate about our country’s political direction.
Some outlets try to approach these questions through analysis with sentiment, or with sentiment and time-on-air measures, or with fact checks. Those are all adequate for some purposes, but they don’t capture the things we intuitively sense about the nature of political coverage and the stories we begin telling ourselves about “what this candidate means for America” or “what it means that this policy is even being discussed.” Those kinds of abstractions are where we tend to be led toward politically polarized outcomes. They’re also the kind of thing that NLP is very well-suited to explore.
With that, here’s our first look at the summary of Election Narratives as of April 30, 2019, which we will supplement with Candidate and Policy-specific detail in future pieces.
Election Narrative Structure as of April 30, 2019
Commentary on Election Narrative Structure
- The Democratic rallying cry that has resonated throughout election coverage is, roughly, “This is not America.” Right now that looks like a progressive populist message, caught up in a variety of discussions about and coverage of socialism, freer immigration, rural America, student debt, and Latino turnout.
- Among the large clusters of articles that are not “round-up” style pieces, the five most influential clusters on the overall network – meaning those with the most universal language similarity – are (1) Latino Turnout, (2) Mayor Pete Buttigieg, (3) Mueller Report, (4) The Joe Biden Allegations and (5) Rural America.
- Those with the most similarity to multiple different topics – indicating strong external relevance to other subjects – are (1) New Hampshire Democrats, (2) Latino Turnout, (3) Student Loans and Marijuana, (4) Not America, Ilhan Omar and Socialism, and (5) Mueller Report.
- While Biden’s coverage was much higher in volume than that of other candidates in April, the content that was most similar to the overall story being told about elections was not about Biden. His most connected topics in April were, alas, not related to his candidacy but to behavioral allegations.
- PTSD from 2016 seems to have kept Rural America squarely in the crosshairs of a lot of the media discussion about the 2020 election.
- While an electoral strategy predicated on Latino turnout has not produced much content (yet), what content there is remains very consistent with the overall story being told by the media about the 2020 election.
Candidate Cohesion Summary
Commentary on Candidate Cohesion
- Candidates in bold are those whose coverage we think warrants special caution in our news consumption habits.
- Even pre-candidacy, coverage of Biden used very consistent language. With his formal announcement, the April measure placed him at the top. While some of this is related to the topical consistency of the announcement itself (i.e. we don’t think everyone using and printing the same quotes from the announcement is necessarily concerning), we still think that readers should be mindful of adherence to media narratives in coverage of Biden’s candidacy at this time.
- We share a similar view of Sanders, whose prior candidacy and fairly well-established ‘story’ has created the strongest primary-season cohesion of all candidates. The media have largely decided how to report on Sanders and how to structure their coverage of events relating to him. We continue to recommend caution to readers.
- Yang, Gabbard, Buttigieg and Klobuchar have tended to yield the least internally consistent language among all candidates; however, both Buttigieg and Gabbard saw meaningful gains in April. We attribute those gains to Buttigieg’s formal candidacy announcement and to the crystallization of “anti-war” coverage of Gabbard, but readers should be on guard.
- We are observing some deteriorating narratives for Warren and O’Rourke. That isn’t necessarily a negative – it simply means that media articles are not as uniformly on-message in the language they use to describe each of these candidates. We think this is more ‘real’ in the case of O’Rourke, where the previously generous coverage has begun to break down. In our judgment, the weakened narrative structure for Warren is more the result of regular involvement in non-election news creeping in and diversifying the focus and language used.
Candidate Sentiment Summary
Commentary on Candidate Sentiment
- The candidates in bold are those whose levels or changes in coverage sentiment would give us pause in our consumption of election coverage.
- Yang’s sporadic coverage is almost universally glowing, as he has thus far received disproportionately positive “Yang Gang”-style puff piece features. As always, we aren’t alleging an intent to make him a more prominent candidate, but we would expect his coverage, cohesion and popularity to grow on this basis.
- Biden gets the other side of the coin. Coverage of him is universally – in both April and on a primary-season-to-date basis – decidedly more negative than that of other candidates. It is somewhat harder to ignore the media’s displeasure with his polled popularity as a candidate.
- We would use caution in our takeaways from the big movers as well. Before his announcement, Buttigieg was a Yang-like darling in the media. Afterward, he was subjected to substantially more critical language.
- Sanders, on the other hand, has been a net beneficiary of negativity toward Biden in particular, at least in media sentiment, if not in polls. As noted in the cohesion section, we would exercise caution in consuming news and feature content about both candidates at this time.
Candidate Attention Summary
Commentary on Candidate Attention
- Among high-polling candidates, the broader election narrative (i.e. topics, language, people and issues) aligns most closely with Sanders and has since the beginning of the primary process. In other words, the things that journalists, opinion writers, bloggers and pundits are saying about the election and its major issues are the things they also associate with Bernie Sanders.
- Both Biden’s and Buttigieg’s alignment with key election narratives fell sharply in April, although this is likely because most coverage in April focused on their candidacy announcements (which tend to be unrelated to broader election issues). As noted above, one Buttigieg-related cluster (not the full range of articles relating to him) sat nearest the ‘Zeitgeist’ of the election coverage in April, so ‘non-announcement’ coverage still seems to be high attention for Mayor Pete.
- Analogously, the narrative alignment of Beto’s candidacy improved after a similar drop he experienced in March, although in the aggregate his issues-and-language connection to the broader election coverage is still lower than it was at the beginning of 2019.
- Despite his popularity in polls, the language used in Biden articles is broadly at odds with general election narratives (which, as you might imagine from Sanders’ much higher Attention score, is because comfort discussing socialist policy is very much on-narrative). It will be worth noting in May how much the announcement drove Biden’s figure lower.
- In general, our interpretation is that media treatment at this stage has been more aligned with more progressive candidates and their platforms and less favorable to more centrist candidates and theirs.
The Continuing Series
We will publish candidate-specific reports from April for the rest of May as part of our free service on the Epsilon Theory website. Thereafter, this series will continue as an Epsilon Theory Premium feature and will require a subscription.
If you are looking for a more detailed package of our election narrative signals and analytics, including raw data and candidate- and issue-level narrative structure analysis, please email us directly at [email protected]
While I deeply appreciate this much-needed lens into the co-evolution of political dynamics and the memetic structures they spawn, I worry about your use of the broader concept of “cohesion” in reference to memetically-precipitated tails wagging behavioral dogs.
Despite the fact that this inverted feedback mechanism exists, it is but one reason why such narratives would show up as “cohesive”, as measured by this metric. Namely, assuming narratives possess a tie to reality–however tenuous–we must separate the degree to which narrative “cohesion” represents accurate distillations of underlying behavioral patterns as opposed to self-fulfilling fabrications.
Else, we risk cynically blinding ourselves to the fact that narratives do in fact emerge from an underlying reality, even when that process of emergence has been co-opted by people and institutions who understand how to consciously transform digitally-mediated rhetoric into a form of supernormal stimuli (Missionaries in ET-lexicon, I suppose). Cohesion may represent primary signal in connection with the underlying dynamical reality, or it may represent a cynically manipulated simulacrum of this signal. And while you rightfully encourage skepticism of the latter, this metric establishes a frame by which the latter is assumed to always overshadow the former. Interestingly enough, this tension appeared (to me, at least) as the root of most of the caveats / ambiguity that emerged during yesterday’s ET Live.
The rationale behind my concern is best summarized by your own identification of the need for an “attention” metric as proxy for the degree to which collective focus possesses the tendency to fuse conceptual structures beyond the threshold of pragmatic utility (at least from the consumer’s POV).
Essentially, I’m making the claim that by making invisible the contribution of meaningful underlying pattern to narrative cohesion, the “cohesion” metric appears to violate the spirit of the attention metric as presently formulated. In my view this is a critical flaw along the “cohesion” dimension, and will likely bias reader / viewer perception of your analyses too strongly toward the notion that the tail not only wags the dog, but has in fact devoured it whole. I suspect at times it’s tempting to believe this, but am not convinced of its pragmatism.
Aside from that–and as alluded to earlier–this is basically the only part of the election cycle to whose unfolding I look forward. Keep up the amazing work.
In Service to the Pack,
Matthew