Let’s start off by making one thing clear. Twitter is not real life. The people on Twitter are not a cross-section of the American electorate. Looking just at potential voters in the Democratic primary, we know that Twitter users tend to be more progressive, more often white, and more educated. But Twitter can still be a useful starting point. One problem in understanding how people think about politics is that we mainly have to rely on surveys of voters. This is a problem because surveys force voters to reflect, usually through closed-ended questions that limit what they can say, on issues they may or may not have thought about recently (or at all). As a result, responses depend on the particulars of question phrasing, and some of the information we most want to know (like what people view as the most important issue) is difficult to extract reliably.
Twitter provides a different context from surveys. Here we observe individuals actively engaging in politics, and we can see which issues they find most pressing by looking at how they respond when those issues come up. That lets us start to untangle which issues voters want to hear candidates talk about. Of course, we are still limited by who is present on Twitter, which is why this is only a start.
To start, I looked at which issues candidates are talking about and, more importantly for my purposes, which issues draw the largest response on Twitter. To do this, I used the Twitter API to collect tweets from all the potential 2020 Democratic candidates since the beginning of 2019. This included tweets from both campaign and official accounts.
The hard part here is identifying the issues. Recently, the Washington Post attempted this by having people read and code all 5,600 social media posts (Twitter and Facebook) from the candidates. I had four times as many tweets and significantly fewer resources, so instead I created a bigram dictionary to identify issues in tweets. Basically, I split the tweets into bigrams (pairs of adjacent tokenized words), took the 1,000 most frequent bigrams, and coded them into issues (the dictionary is available here). If a tweet contained an issue-related bigram, I coded it as mentioning that issue. In the end, 5,352 tweets were identified as mentioning at least one issue, and 937 of those mentioned multiple issues.
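The dictionary approach can be sketched in a few lines of Python. This is a minimal illustration, not the code used for this post: the tokenizer (lowercase, keep runs of letters, digits, and apostrophes) is a simplifying assumption, and the three dictionary entries are hypothetical stand-ins for the hand-coded 1,000-bigram dictionary.

```python
import re
from collections import Counter

def tokenize(text):
    # Simplifying assumption: lowercase and keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def bigrams(tokens):
    # Adjacent token pairs: ["climate", "change", "now"] -> [("climate", "change"), ("change", "now")]
    return list(zip(tokens, tokens[1:]))

def top_bigrams(tweets, n=1000):
    # The n most frequent bigrams across all tweets; these are the ones to hand-code.
    counts = Counter(bg for tweet in tweets for bg in bigrams(tokenize(tweet)))
    return [bg for bg, _ in counts.most_common(n)]

def code_issues(text, dictionary):
    # A tweet mentions an issue if it contains any bigram coded to that issue.
    return {dictionary[bg] for bg in bigrams(tokenize(text)) if bg in dictionary}

# Hypothetical dictionary entries (bigram -> issue), not the author's actual coding.
issue_dictionary = {
    ("climate", "change"): "Climate Change",
    ("gun", "violence"): "Guns",
    ("health", "care"): "Health Insurance",
}

issues = code_issues("We must act on climate change and gun violence now.", issue_dictionary)
mentions_multiple = len(issues) > 1  # True for this example tweet
```

Because a tweet is coded as a set of issues, counting tweets with at least one issue, or with more than one, is just a matter of checking the size of that set.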
We can start by simply looking at which issues were mentioned the most. For clarity, the plot below includes only issues that were mentioned, on average, in at least 1% of a candidate’s tweets. First, there are some patterns we would expect: Tulsi Gabbard is the only candidate talking a lot about foreign policy, Jay Inslee talks about climate change more than the other candidates, and Bernie talks about health insurance. Second, there are some patterns we might have expected but that I didn’t have strong predictions about: Pete Buttigieg seldom talks about specific issues, health insurance tends to be very common across candidates, and Andrew Yang only really talks about economic inequality.
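The 1% filter can be read as: compute, for each candidate, the share of their tweets mentioning each issue, average those shares across candidates, and keep issues whose average is at least 0.01. The sketch below assumes that reading and a hypothetical input format of `(candidate, set_of_issues)` pairs, one per tweet.

```python
from collections import defaultdict

def issue_shares(coded_tweets):
    # coded_tweets: list of (candidate, set_of_issues), one entry per tweet.
    totals = defaultdict(int)
    mentions = defaultdict(lambda: defaultdict(int))
    for candidate, issues in coded_tweets:
        totals[candidate] += 1
        for issue in issues:
            mentions[candidate][issue] += 1
    # Share of each candidate's tweets that mention each issue.
    return {c: {i: n / totals[c] for i, n in mentions[c].items()} for c in totals}

def issues_to_plot(shares, threshold=0.01):
    # Keep issues whose share, averaged across candidates, meets the threshold.
    candidates = list(shares)
    all_issues = {i for s in shares.values() for i in s}
    keep = []
    for issue in sorted(all_issues):
        avg = sum(shares[c].get(issue, 0.0) for c in candidates) / len(candidates)
        if avg >= threshold:
            keep.append(issue)
    return keep

# Hypothetical data: candidate A mentions guns in 10 of 100 tweets,
# candidate B mentions trade in 1 of 100 tweets.
coded = ([("A", {"Guns"})] * 10 + [("A", set())] * 90
         + [("B", {"Trade"})] * 1 + [("B", set())] * 99)
shares = issue_shares(coded)
plotted = issues_to_plot(shares)
```

Here guns average (0.10 + 0) / 2 = 0.05 and are kept, while trade averages 0.005 and is dropped.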
To get at how people respond to different issues, we can look at the number of retweets and likes a tweet gets depending on the issue it mentions. To do this, though, we need to net out the fact that different candidates talk about different issues. I did this by estimating a negative binomial model with the brms package for R. A negative binomial model works well for the type of data we have here, where each observation is a count of something (number of retweets, number of likes). In addition, I included a variety of other variables, including which candidate tweeted it, what time of day it was tweeted (morning, afternoon, evening, or night), and whether it was a quote tweet, a retweet, or a reply. To account for the fact that candidates gained or lost followers during this time period, I included a spline for each candidate. Splines are a way to include time trends flexibly: rather than assuming a linear trend (fitting a straight line), they allow for a smooth curve instead.
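To make the model concrete, here is a sketch of the likelihood a negative binomial regression with a log link is built on, using the mean/shape parameterization (mean `mu`, shape `phi`, variance `mu + mu**2 / phi`). It is only an illustration: brms fits this kind of model in a Bayesian way with Stan, which this code does not attempt, and the toy rows, the `mentions_issue` variable, and the coefficient values are all hypothetical.

```python
import math

def nb_logpmf(y, mu, phi):
    # Negative binomial log-probability of count y with E[y] = mu and
    # Var[y] = mu + mu**2 / phi (phi > 0; large phi approaches a Poisson).
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu))
            + y * math.log(mu / (phi + mu)))

def log_likelihood(rows, coefs, phi):
    # rows: list of (count, features), where features maps variable name -> value.
    # Log link: mu = exp(intercept + sum of coefficient * feature value).
    total = 0.0
    for count, features in rows:
        eta = coefs["intercept"] + sum(coefs[name] * value for name, value in features.items())
        total += nb_logpmf(count, math.exp(eta), phi)
    return total

# Hypothetical toy data: like counts and an indicator for mentioning one issue.
rows = [(12, {"mentions_issue": 1}), (3, {"mentions_issue": 0}),
        (40, {"mentions_issue": 1}), (7, {"mentions_issue": 0})]
ll = log_likelihood(rows, {"intercept": math.log(5), "mentions_issue": math.log(4)}, phi=2.0)
```

The extra `mu**2 / phi` term in the variance is what makes the negative binomial a better fit than a Poisson for engagement counts, where a few tweets get far more retweets or likes than the rest.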
What we end up with is a set of coefficients that tell us how each variable changes the number of retweets or likes. Negative binomial coefficients aren’t easily interpretable on their own, so we can translate them into incidence rate ratios (IRRs), which, for the issue variables, show the multiplicative effect of a tweet mentioning a specific issue. If a tweet would have gotten 10 likes and the IRR for an issue is 2, then had it mentioned that issue we would expect it to get 20 likes instead.
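With a log link, an IRR is just the exponentiated coefficient, and because the linear predictor is a sum, the effects multiply. The sketch below reproduces the 10-likes example from the text; the coefficient value `math.log(2)` is chosen only so that the IRR comes out to 2, and the second IRR of 1.5 is hypothetical. Multiplying IRRs assumes the model has no interaction between the two issues.

```python
import math

def irr(coef):
    # Incidence rate ratio: the exponentiated coefficient of a log-link count model.
    return math.exp(coef)

def expected_count(baseline, irrs):
    # Under a log link, exp(a + b) = exp(a) * exp(b), so IRRs multiply the baseline
    # (assuming no interaction terms between the variables involved).
    result = baseline
    for r in irrs:
        result *= r
    return result

issue_irr = irr(math.log(2))                            # IRR of 2
likes_with_issue = expected_count(10, [issue_irr])      # 10 likes -> 20 likes
likes_two_issues = expected_count(10, [issue_irr, 1.5]) # hypothetical second issue
```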
The IRR plot across all issues is below. At the top, we see several issues that are highly focused on Trump or on particular events (like the shutdown or the ongoing disaster in Puerto Rico). Below these, some more specific issues stand out: guns, immigration, and climate change are consistently significant for both retweets and likes. Tweets about guns do extremely well, garnering approximately 50% more retweets than tweets that do not mention guns. Immigration tweets get about 38% more retweets and 25% more likes, while climate change tweets get about 19% more retweets and 15% more likes.
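The percentages above are IRRs restated: "50% more" means an IRR of about 1.50, and an IRR below 1 means fewer retweets or likes. The sketch converts the retweet figures reported in this paragraph back into IRRs and applies them to a hypothetical baseline of 100 retweets.

```python
def irr_from_percent(pct_change):
    # "50% more" -> IRR of 1.50; "20% fewer" would be -20 -> IRR of 0.80.
    return 1 + pct_change / 100

def percent_from_irr(irr):
    # IRR of 1.38 -> 38% more; IRR of 0.80 -> 20% fewer (negative).
    return (irr - 1) * 100

# Approximate retweet effects as reported in the text.
retweet_pct = {"Guns": 50, "Immigration": 38, "Climate Change": 19}
implied_irr = {issue: irr_from_percent(p) for issue, p in retweet_pct.items()}

baseline = 100  # hypothetical: retweets a comparable tweet would get without the issue
expected_retweets = {issue: baseline * r for issue, r in implied_irr.items()}
```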
In contrast, the only issue that seems to consistently lower engagement is sexual assault. This might be because people are naturally reluctant to discuss what is often seen as a more personal issue, or it might be because patriarchal views still dominate.
Returning to our initial question: it looks like Twitter is interested in hearing candidates talk about guns, immigration, climate change, and Puerto Rico, as well as what Trump is doing. This might not be what the electorate as a whole wants to hear from candidates, but it gives us more direction than the gut feelings that often dominate such discussions.
As an added bonus, here are the splines. You can watch Beto drop.