Last week, I discussed one approach to answer the question: how to compare vote shares of an N-candidate election (N > 2) to the vote shares of a 2-candidate election? This post covers a second methodology. This new method is more rigorous but also harder to use - which is a common tradeoff in solving math problems, and why I didn't immediately leap here.
This is an important problem for the current Democratic primary elections in the U.S. Because Bernie Sanders is a repeat candidate, the media has carelessly compared his vote share in 2020 to his vote share in 2016. These comparisons are absurd because in 2016, the only other viable candidate was Hillary Clinton. So when he's facing 8-10 more candidates, his vote share is never going to match what he did in 2016. I've kept a whole list of instances in which the media reprinted this fallacy here.
While it's clear that no candidate in 2020 would get close to 60% of the votes, what is not clear is what vote share would be considered comparable. This is the elephant in the room.
***
You can read about the previous method here (link). The advantage of that method is its simplicity. I wanted something that is roughly accurate for any number of contestants (N). The question it solves can be simply stated as "In an N-person contest, if the winning candidate got X% of the votes, what vote split should this be equivalent to in a 2-person contest?"
The trick was to aggregate the (N-1) vote shares of the non-winners by taking the average (non-viable candidates are better combined into one invented candidate).
To use this method, you only need to know the winner's vote share. You don't even need to know the individual vote shares of the other candidates. Such simplicity comes at the price of precision.
***
The second method that I'm discussing in this post uses vote shares of every contestant. There is no need to combine non-viable candidates. The method makes use of the even race reference, which I mentioned in this other post.
To use this method, you first aggregate the vote shares into a single metric, which I call "Distance from Even Race" (DFER). This distance is then standardized by setting the maximum distance to 1. The maximum distance is attained in an extreme one-sided race in which the winner takes 100% of the votes, and no other candidates receive any votes. Then, this standardized distance from even race (SDFER) is mapped to a 2-person race, using the following table:
Since New Hampshire has finalized primary results, I use this data to illustrate how to use the table.
I took the vote shares from CNN (link). They listed 11 entries, one of which is "Other" which presumably aggregated some minor candidates. I treat this as an 11-person contest.
The even race is one in which each candidate gets one-eleventh (9%) of the votes. The distance of the actual results from the even race (DFER) is 0.10. The maximum distance is 0.30 (this is a hypothetical race in which the winner gets 100% of the votes). Thus, the standardized distance is 0.32. This means the actual result is roughly one third of the way between an even race and an extreme race.
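For readers who want to follow along, here is a minimal Python sketch of the DFER and SDFER calculation. It assumes plain Euclidean distance (the distance named in the details at the end of the post). The function names and the 4-candidate vote shares are made up for illustration and are not the CNN data. Any constant rescaling of the distance cancels out once you standardize.

```python
import math

def dfer(shares):
    """Euclidean distance between a vote-share vector and the even race."""
    n = len(shares)
    even = 1.0 / n  # every candidate gets the same share in an even race
    return math.sqrt(sum((s - even) ** 2 for s in shares))

def max_dfer(n):
    """DFER of the extreme race in which one candidate takes every vote."""
    return dfer([1.0] + [0.0] * (n - 1))

def sdfer(shares):
    """DFER rescaled so the even race maps to 0 and the extreme race to 1."""
    return dfer(shares) / max_dfer(len(shares))

# Hypothetical 4-candidate result (made-up numbers for illustration only).
example = [0.40, 0.30, 0.20, 0.10]
print(round(sdfer(example), 3))
```

An even race returns 0 and a race in which one candidate takes everything returns 1, whatever the number of candidates.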
The remaining question is what vote split in a 2-person race represents one-third of the way between an even race and an extreme race. From the table, you learn that it is roughly 66% - 34%.
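The lookup can also be written as a one-line formula. This is my reading of the geometry rather than a copy of the table: in a 2-person race the even race is 50/50 and the extreme race is 100/0, so moving a fraction of the way from one to the other adds half of that fraction to the winner's share. The function name is hypothetical.

```python
def two_person_split(sdfer_value):
    """Map a standardized distance (0 to 1) to a 2-person vote split.

    Assumes the mapping is linear between 50/50 (distance 0)
    and 100/0 (distance 1).
    """
    winner = 0.5 + 0.5 * sdfer_value
    return winner, 1.0 - winner

winner, loser = two_person_split(0.32)
print(f"{winner:.0%} - {loser:.0%}")  # prints "66% - 34%"
```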
***
Contrast what I just said with the media's hyperventilating over Sanders losing half of his support, and other such nonsense. The vote-share distribution in New Hampshire in the 2020 Democratic primary is equivalent to a 66%-34% split in a 2-person contest.
So, in the same ballpark as 2016, maybe even a bit better.
***
As mentioned before, this second method is more complicated to use: you have to make a separate calculation for each number of contestants (N). The number of candidates determines the vote share in the case of an even race. That in turn determines the distance from even race of the extreme case. And that decides the factor used in standardizing the DFER.
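To make the dependence on N concrete, here is a small sketch of the chain just described: N gives the even-race share, which gives the distance of the extreme race, which is the standardizing factor. With plain Euclidean distance the extreme-race distance simplifies to sqrt(1 - 1/N). I also show a root-mean-square version (dividing the squared distance by N), which is an assumption on my part: for N = 11 it comes out near 0.29, in the neighborhood of the 0.30 quoted above, but the post does not say which scaling was used. The choice does not affect the standardized distance.

```python
import math

def even_share(n):
    """Vote share of each candidate in an even n-person race."""
    return 1.0 / n

def max_distance(n):
    """Euclidean distance from the even race to a 100%-winner race."""
    return math.sqrt(1.0 - 1.0 / n)

def max_distance_rms(n):
    """Same distance on a root-mean-square scale (assumed alternative)."""
    return math.sqrt((1.0 - 1.0 / n) / n)

for n in (2, 3, 5, 11):
    print(n,
          round(even_share(n), 3),
          round(max_distance(n), 3),
          round(max_distance_rms(n), 3))
```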
A reverse lookup shows that a 60/40 vote split (what was achieved in 2016 between Sanders and Clinton) is equivalent to a standardized distance of 20%. That is only moderately far from an even race, for which SDFER = 0%. As I explained before, a 60/40 split in a 2-person race is not a landslide! The even race is 50/50, so the winner got 10 percentage points more than an even share.
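The reverse lookup is a short calculation if you assume, as the geometry suggests, that the 2-person scale runs linearly from 50/50 (distance 0) to 100/0 (distance 1). The function name below is hypothetical.

```python
def sdfer_from_two_person(winner_share):
    """Standardized distance implied by the winner's share in a 2-person race.

    Assumes a linear scale: 0.5 maps to 0 and 1.0 maps to 1.
    """
    return (winner_share - 0.5) / 0.5

print(round(sdfer_from_two_person(0.60), 2))  # prints 0.2
```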
***
With this framework, one can analyze the dynamics of a multi-person race. Let me give you some high-level insights here, and I'll be back to explain them with some examples.
1) Having a group of non-competitive candidates, each taking a small bite out of the apple, makes it very hard for the top dogs to separate themselves.
2) Shuffling vote shares among the top candidates doesn't change the competitive picture by much.
3) Competitiveness is affected by the gap between the winner and the runner-up but also by the aggregate vote shares of top candidates versus bottom candidates.
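As a small preview of the second insight, here is a sketch using made-up vote shares for a 7-person race (not real election data). It assumes Euclidean distance. Swapping the shares of the top two candidates leaves the standardized distance unchanged, and moving three points from the leader to the second-place candidate changes it only slightly.

```python
import math

def sdfer(shares):
    """Standardized Euclidean distance from the even race (0 to 1)."""
    n = len(shares)
    distance = math.sqrt(sum((s - 1.0 / n) ** 2 for s in shares))
    return distance / math.sqrt(1.0 - 1.0 / n)  # extreme-race distance

# Hypothetical 7-candidate races (illustrative numbers only).
base    = [0.30, 0.25, 0.20, 0.10, 0.05, 0.05, 0.05]
swapped = [0.25, 0.30, 0.20, 0.10, 0.05, 0.05, 0.05]  # top two trade places
shifted = [0.27, 0.28, 0.20, 0.10, 0.05, 0.05, 0.05]  # 3 points move from 1st to 2nd

for label, shares in [("base", base), ("swapped", swapped), ("shifted", shifted)]:
    print(label, round(sdfer(shares), 3))
```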
***
The following details are for the nerds.
A given set of vote shares in an N-person race is a point in N-dimensional space. The even-race vote share is another point in the same space, and it's situated at the "center" of the set of feasible vote shares. Every point in that set has a Euclidean distance to the center. Further, the feasible set has N vertices, each corresponding to an elementary vector (0, ..., 0, 1, 0, ..., 0) with exactly one 1 and zeroes everywhere else. These correspond to extreme races in which one candidate receives all the votes.
The distance from even race metric (distance from center) immediately places subsets of vote-share distributions on equal footing. Any distribution located the same distance from the center is regarded as equally competitive.
We are still in N-dimensional space. The feasible set is symmetric because the dimensions are interchangeable. There are edges in this set that represent shifts of votes from one candidate to a second candidate while all other vote shares are fixed. Each such edge passes through the center as well as an extreme point.
Given a particular vote share distribution, we have a point in the feasible set, and associated with it, a subset of feasible points that have the same distance from the center. Within this subset will be point(s) that fall on one of the edges named above. On such an edge, the center is mapped to 0 and an extreme point to 1. The distance of the given vote share distribution from the center is standardized to a value between 0 and 1.
This standardized distance is then applied to the case of a 2-person race. Vote shares in a 2-person race are points in the x-y plane. The feasible set is the line from (1,0) to (0,1) passing through the center of (0.5, 0.5). The equivalent point is determined by moving a standardized distance from the center towards one of the extreme points.
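The last step can be sketched in a few lines of Python: start at the center and move a given fraction of the way toward an extreme point. The function names are hypothetical, and the sketch assumes the straight-line path from the center to a vertex, which is my reading of the description above.

```python
import math

def equivalent_point(s, n=2, winner=0):
    """Point a fraction s of the way from the even race to an extreme race."""
    center = [1.0 / n] * n
    vertex = [1.0 if i == winner else 0.0 for i in range(n)]
    return [c + s * (v - c) for c, v in zip(center, vertex)]

def distance(p, q):
    """Euclidean distance between two vote-share vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

point = equivalent_point(0.32)
print([round(x, 2) for x in point])  # prints [0.66, 0.34]

# Check: the point sits at 32% of the center-to-vertex distance.
center, vertex = [0.5, 0.5], [1.0, 0.0]
print(round(distance(point, center) / distance(vertex, center), 2))  # prints 0.32
```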
If anyone wants to help write this up in proper mathematics, let me know.
Nice proposal, would recommend using Bhattacharyya distance (https://en.wikipedia.org/wiki/Bhattacharyya_distance) for distances within this probability simplex (https://en.wikipedia.org/wiki/Simplex) and finding same-distance divergences from 50-50.
Posted by: Merrick Usta | 02/18/2020 at 10:08 AM
Sorry, but no amount of math could tell you this — you don’t know how people would have voted if there had been only two candidates. This method is as much a fallacy as what the news sources do. The only way of knowing how people would have voted in a 2-way election is to hold a 2-way election. In the same way that the only way to know if someone is electable is to see if they’re elected.
Posted by: Cris | 02/18/2020 at 01:18 PM
Cris: Here's how I think about it. I am not trying to predict who would have voted for whom - which as you pointed out, is a counterfactual. What I'm doing is more descriptive statistics. Given the observed vote share distribution, what can we learn about the competitiveness of the contest? What is really happening is that I'm finding a principled way to order multidimensional vectors. It's easy to order a 2-dimensional race where the winner's share is all you need. Once you have multiple contestants, it's not clear how to order the results. The fact that there is no one perfect method does not mean that all methods are equally bad!
Posted by: Kaiser | 02/18/2020 at 03:06 PM
MU: Thank you very much for those suggestions. Will reach out once I have more to say.
Posted by: Kaiser | 02/18/2020 at 09:05 PM