As straightforward as it may seem, simply asking voters which of the more than 20 Democratic presidential candidates they plan to support in the upcoming primaries is not a great way to identify which candidate is in the best position to win.
With over 20 candidates and a handful of potential candidates in the field, it is increasingly difficult to measure support for each candidate. A traditional vote choice question identifies only each respondent's first choice. That is obviously important, but as candidates drop out it isn’t clear how their votes will move. Another option is to ask which candidates a voter is considering supporting. This gives more detail about each candidate's potential support, but it does not indicate how voters rank the candidates they are considering. Ideally, we would like to know the degree of support each candidate has, capturing both its intensity and its breadth.
One way to combine these different measures is to treat this as a latent variable problem and estimate general support for each candidate. To do this, we asked 475 Democrats which candidates they were considering voting for and which they would not consider voting for. Then we asked those who were considering multiple candidates to rank the candidates they selected.
Joe Biden received potential support from the largest percentage of voters and was in the middle of the pack in the share of voters saying they won’t vote for him. Bernie Sanders, on the other hand, was the third most supported candidate and the most ‘opposed’ candidate, with 27% of Democratic voters saying that they wouldn’t consider supporting him. Combining these ratings requires some attention to what we are trying to measure: latent support for a candidate.
Start by letting $\theta_j$ be the candidate-level support, and let $Y^{\text{supp}}_{j,k}$ be the response to our question asking voters whether they are considering supporting a candidate. It is 1 if voter $k$ indicates they could support candidate $j$ and 0 if not. $Y^{\text{opp}}_{j,k}$ is the same but for the couldn’t-support question, so it is 1 if voter $k$ indicates they couldn’t support candidate $j$. We can start with the model:

$$Y^{\text{supp}}_{j,k} \sim \alpha^{\text{supp}}_k + \theta_j$$

$$Y^{\text{opp}}_{j,k} \sim \alpha^{\text{opp}}_k - \theta_j$$
This says that voter response to our survey question is a function of the latent support of a candidate (θ) and the voter's general tendency to say they will support/oppose a candidate (α).
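The "~" notation leaves the link function unspecified. As a minimal sketch, assuming a logistic (logit) link, a separate intercept per voter for each question, and a small ridge penalty to keep estimates finite, the support/oppose part of the model can be fit by plain gradient ascent. The function name `fit_support_model`, its parameters, and the toy responses are hypothetical and are not the survey data.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def fit_support_model(supp, opp, n_iter=2000, lr=0.05, ridge=0.1):
    """Ridge-penalized maximum likelihood for the support/oppose model.

    supp[k][j] and opp[k][j] are 0/1 answers from voter k about candidate j.
    Assumed model (logistic link, hypothetical choice):
        P(supp = 1) = sigmoid(alpha_supp[k] + theta[j])
        P(opp = 1)  = sigmoid(alpha_opp[k]  - theta[j])
    """
    n_voters, n_cands = len(supp), len(supp[0])
    theta = [0.0] * n_cands
    a_supp = [0.0] * n_voters
    a_opp = [0.0] * n_voters
    for _ in range(n_iter):
        # Each gradient starts with the ridge penalty term.
        g_theta = [-ridge * t for t in theta]
        g_supp = [-ridge * a for a in a_supp]
        g_opp = [-ridge * a for a in a_opp]
        for k in range(n_voters):
            for j in range(n_cands):
                # y - p is the log-likelihood gradient w.r.t. each linear predictor.
                r_s = supp[k][j] - sigmoid(a_supp[k] + theta[j])
                r_o = opp[k][j] - sigmoid(a_opp[k] - theta[j])
                g_supp[k] += r_s
                g_opp[k] += r_o
                # theta enters the opposition equation with a minus sign.
                g_theta[j] += r_s - r_o
        theta = [t + lr * g for t, g in zip(theta, g_theta)]
        a_supp = [a + lr * g for a, g in zip(a_supp, g_supp)]
        a_opp = [a + lr * g for a, g in zip(a_opp, g_opp)]
    # Report theta centered at zero; the voter intercepts absorb any common shift.
    mean = sum(theta) / n_cands
    return [t - mean for t in theta]

# Hypothetical toy data: 4 voters x 3 candidates.
supp = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0]]
opp = [[0, 0, 1], [0, 0, 1], [0, 0, 0], [0, 1, 1]]
theta = fit_support_model(supp, opp)
print([round(t, 2) for t in theta])
```

In the toy data, candidate 0 is considered by every voter and opposed by none, while candidate 2 is never considered and usually opposed, so the fitted scores should order them 0 > 1 > 2. A real analysis would more likely use a Bayesian or IRT package than hand-rolled gradient ascent.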
For voters who say they would support multiple candidates, we drop their responses about whom they would support and instead include them through their rankings of the candidates they support. Let $Y^{j,i}_k$ be 1 if voter $k$ ranks candidate $j$ above candidate $i$, and 0 if it is the other way around. We can then estimate:

$$Y^{j,i}_k \sim \theta_j - \theta_i$$
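Read with a logistic link, the ranking equation is a Bradley-Terry-style pairwise comparison model. The sketch below, assuming that link, expands one voter's best-first ranking into pairwise outcomes (one pair per "candidate j ranked above candidate i") and evaluates their log-likelihood for a given set of latent scores. The helper names and the candidate labels are hypothetical.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranking):
    """Expand a best-first ranking into (winner, loser) pairs.

    Each pair (j, i) is one observed outcome of 1 for 'j ranked above i'.
    """
    # combinations() preserves input order, so the first element is always ranked higher.
    return list(combinations(ranking, 2))

def pairwise_log_likelihood(pairs, theta):
    """Log-likelihood under the assumed link P(j above i) = sigmoid(theta[j] - theta[i])."""
    total = 0.0
    for winner, loser in pairs:
        diff = theta[winner] - theta[loser]
        # log(sigmoid(d)) = -log(1 + exp(-d)); log1p stays accurate when exp(-d) is small.
        total -= math.log1p(math.exp(-diff))
    return total

# Hypothetical voter who ranks "A" first, then "B", then "C".
pairs = ranking_to_pairs(["A", "B", "C"])
print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(pairwise_log_likelihood(pairs, {"A": 0.5, "B": 0.0, "C": -0.5}))
```

A voter who ranks k candidates contributes k(k-1)/2 pairs, so respondents considering many candidates carry more ranking information into the estimates.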
The latent θs for each candidate are plotted below. The score ranges from approximately -1 to 1, with higher values indicating more overall support for a candidate. There are some similarities to the support and opposition measures above, but there are also some major differences. The gap between Biden and Warren shrank, while Buttigieg and Sanders swapped places. In addition, Abrams jumped up several ranks. Finally, de Blasio is now firmly at the very bottom of the pack thanks to the high proportion of voters who said they would not vote for him.
The latent scores are a bit hard to interpret, so we can use them to calculate pairwise comparisons as if everyone had ranked all candidates, effectively estimating how each candidate would perform head-to-head against each of the other candidates. We find that Biden beats every other candidate. We also see a very clear top tier composed of Biden, Warren, Buttigieg, Sanders, and Harris. This tier differs somewhat from what is often identified as the top tier of Democratic candidates. For example, Warren is solidly in the top tier according to our latent pairwise comparisons, while Nate Silver recently placed her in a second tier. Furthermore, although Biden wins every matchup and continues to dominate polls that ask respondents to report a single candidate preference, he does not stand out much from the rest of the top tier here.
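Assuming a logistic link (a Bradley-Terry-style reading of the ranking model), each pair of latent scores implies the expected share of voters who would rank one candidate above the other. The sketch below turns a set of θ values into a head-to-head table and counts expected matchup wins. The θ values and candidate labels are made up for illustration and are not the estimates from this survey.

```python
import math

def head_to_head(theta):
    """Return {(j, i): P(j ranked above i)} for every ordered pair of candidates.

    Assumes P(j above i) = 1 / (1 + exp(-(theta[j] - theta[i]))).
    """
    probs = {}
    for j in theta:
        for i in theta:
            if j != i:
                probs[(j, i)] = 1.0 / (1.0 + math.exp(-(theta[j] - theta[i])))
    return probs

def matchup_wins(theta):
    """Count how many opponents each candidate is expected to beat (probability > 0.5)."""
    probs = head_to_head(theta)
    return {j: sum(1 for i in theta if i != j and probs[(j, i)] > 0.5) for j in theta}

# Hypothetical latent scores for four unnamed candidates.
theta = {"A": 0.8, "B": 0.7, "C": 0.1, "D": -0.9}
probs = head_to_head(theta)
print(round(probs[("A", "B")], 3))  # small theta gap, so a close matchup
print(matchup_wins(theta))  # {'A': 3, 'B': 2, 'C': 1, 'D': 0}
```

Here "A" wins every matchup but beats "B" only narrowly, which illustrates how a candidate can top every pairwise comparison without standing far apart from the next candidate.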
A field of nearly 30 candidates ought to push pollsters to find new ways to compare support across candidates. The decisions they make will help determine which candidates receive media attention, which are considered viable, and so on, which can have knock-on effects in the electorate itself. Here I’ve demonstrated one way to parse where candidates stand relative to each other by combining support, opposition, and more nuanced opinions about candidates into a single metric.
A Few Technical Details
There is a unique problem here: responses to one survey question partially constrained responses to other survey questions. Latent variable models assume that the only dependence across items comes from the latent variable, and that assumption is broken in this case. For example, respondents only ranked candidates they had identified as potentially voting for, so including both the rankings and the support question for the same respondent would introduce an artificial dependence; this is why respondents considering multiple candidates enter the model through their rankings instead of their support answers. A similar concern might be raised about separately including whether a voter would vote for a candidate and whether they wouldn’t. Because these questions were asked separately and without any constraints, a small number of people said both that they would consider voting for a candidate and that they would not consider voting for that same candidate. These respondents were either confused, changed their minds, or were not paying attention.
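The data handling described above can be sketched as follows, assuming each respondent's answers are stored as a set of candidates they would consider, a set they would not consider, and a best-first ranking. Respondents who consider two or more candidates contribute ranked pairs instead of support answers, and anyone who both supports and opposes the same candidate is flagged. The function name and record layout are hypothetical, and the sketch only flags conflicting respondents; it makes no assumption about how they were handled.

```python
from itertools import combinations

def prepare_respondent(support, oppose, ranking):
    """Route one respondent's answers into model inputs (hypothetical layout).

    support: set of candidates the respondent would consider voting for
    oppose:  set of candidates the respondent would not consider
    ranking: best-first list of the supported candidates (may be empty)
    """
    multi = len(support) >= 2
    return {
        # Tells the model whether to use the support items at all, so that
        # skipped items are not mistaken for "would not consider" answers.
        "use_support": not multi,
        "support": set() if multi else set(support),
        # Multi-candidate respondents enter through ranked (winner, loser) pairs.
        "pairs": list(combinations(ranking, 2)) if multi else [],
        # Opposition answers are kept for everyone.
        "oppose": set(oppose),
        # Candidates both supported and opposed: likely confusion or inattention.
        "conflicts": support & oppose,
    }

# Hypothetical respondents.
r1 = prepare_respondent({"A", "B"}, {"C"}, ["B", "A"])
r2 = prepare_respondent({"A"}, {"A", "D"}, [])
print(r1["pairs"])  # [('B', 'A')]
print(r2["conflicts"])  # {'A'}
```

Keeping an explicit `use_support` flag, rather than emptying the support set alone, lets a downstream model distinguish "answers not used" from "supports no one".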