Early voting correlations — state by state early votes, covid19 count and partisan bias

Larry Tarof
10 min read · Oct 24, 2020

Introduction

There has never been more uncertainty surrounding a US vote than in 2020. Americans are voting early in 2020 as never before, largely due to the combination of a hyperpartisan political divide and prudence in response to covid19.

It is worth understanding the partisan divide surrounding the early vote, recognizing that each state, with its own electors, forms its own dynamic. There are a number of principal factors at play:

1. The absolute number of votes cast early

2. That number as a fraction of both the registered voters and the total eligible voters

3. Covid dynamics

4. Hyperpartisan dynamics

This is a PSA (public service analysis). While I’m concurrently trying to learn better data visualization tools, I am for now amalgamating different sets of data. Here are some graphs and correlations/non-correlations that are time sensitive relative to the US election on Nov 3, only 10 days away at the time of this writing.

Sources used and data extraction procedures

There are some excellent sources which are updated dynamically. It is worth discussing what these sources measure.

https://electproject.github.io/Early-Vote-2020G/index.html provides the number of votes cast per state and is updated daily by Professor Michael McDonald, a political scientist at the University of Florida. The Oct 24 stats were manually extracted, state by state, and are used in this work as the number of votes cast per state.

https://worldpopulationreview.com/state-rankings/number-of-registered-voters-by-state is used for the number of registered voters and the percentage of eligible voters who are registered. Dividing the first quantity by the second gives the number of eligible voters.
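As a minimal sketch of that division: the function name and the example figures below are hypothetical, and the sketch assumes the percentage in the source is the share of eligible voters who are registered.

```python
def eligible_voters(registered: int, pct_registered: float) -> float:
    """Back out the number of eligible voters in a state.

    registered      -- number of registered voters
    pct_registered  -- percent of eligible voters who are registered (0-100]
    """
    if not 0 < pct_registered <= 100:
        raise ValueError("pct_registered must be in (0, 100]")
    return registered / (pct_registered / 100.0)


# Hypothetical state, NOT taken from the data used in this article:
# 3 million registered voters, representing 75% of those eligible.
print(round(eligible_voters(3_000_000, 75.0)))  # 4000000
```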

The Cook partisan voting index tracks the voter percentage difference, state by state, based on the voting over the last two presidential election cycles. Numbers extracted manually from https://en.wikipedia.org/wiki/Cook_Partisan_Voting_Index are used in this work.

Nate Silver’s team at https://projects.fivethirtyeight.com/2020-election-forecast/ provides a state by state poll/popular vote forecast as part of their “snake” visualization of the electoral college map. These numbers were extracted manually on Oct 24 for the Presidential race, subtracting Biden from Trump, and are used in this work as a measure of partisan bias on Oct 24, 2020. Note that Nate Silver’s percentages were multiplied by 100 to maintain consistency with the Cook numbers above.

For both Cook and Nate Silver partisan bias, I’ve maintained the usual visualization — namely that negative numbers, to the left, represent the left, and positive numbers, to the right, represent the right.
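The sign convention can be sketched as follows. This is only an illustration: the function names are hypothetical, the vote shares are made up, and it assumes the forecast shares are stored as fractions between 0 and 1 before being scaled by 100.

```python
def partisan_margin(trump_share: float, biden_share: float) -> float:
    """Trump minus Biden, in percentage points.

    Shares are assumed to be fractions (0.45 means 45%).
    Negative values lean left, positive values lean right.
    """
    return (trump_share - biden_share) * 100.0


def lean(margin: float) -> str:
    """Label a margin using the left-negative / right-positive convention."""
    if margin < 0:
        return "left"
    if margin > 0:
        return "right"
    return "neutral"


# Hypothetical forecast shares, not real data.
m = partisan_margin(0.45, 0.54)
print(round(m, 1), lean(m))  # -9.0 left
```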

There are many sources of covid19 information based on primary info. In this case, the visualizations by Dan Goodspeed https://dangoodspeed.com/covid/total-cases-since-june are used, which measure the number of covid cases since Jun 1 and are color-coded according to the Cook partisan bias index. The specific numbers from Oct 22 were manually extracted and are used in this work.

I’ve used csv data downloads when possible, but most of the data needed manual extraction (either 50 or 100 points per extraction, many extractions over many days). It is possible that a few, but hopefully not many, manual transcription errors were made.

Data correlations and discussion

First — let’s examine early voting statistics. At the time of this writing, Oct 24, 2020, more than 56 million votes have been recorded, representing 36.7% of the 152 million who have registered. Note that the number of registered voters is significantly larger than the 2016 electorate of approximately 130 million.

Fig 1 — Early votes vs state partisan bias (Cook), per state

The number of early voters vs Cook partisan bias is shown in Fig 1. As with all the succeeding figures, each point represents one state for a total of 50 points in each graph. This has the visual impression of an exploding volcano. California, Texas and Florida are way off the top, and the remaining states give the impression of a mountain topography. The closer to neutral, the greater the absolute number of voters. At first this makes sense. The closer to political neutral, the more any individual vote matters. The farther from neutral, the more likely any individual vote will not make as much difference to that state. Note that New York and Hawaii early voting statistics are not available at this time from these sources.

Fig 2 — Early votes vs state partisan bias (Nate Silver), per state

We can get an even better picture of this from Fig 2, which is the number of early voters vs Nate Silver partisan bias. Here we see the same general idea, but the “mountain” is more smoothed out, and it becomes even clearer that the main distribution to the left (left partisan bias) goes higher than the main distribution to the right (right partisan bias). Here we see some evidence of the first of the major themes of this election: the right does not favor early voting and the left does favor early voting. The absolute numbers seem clear.

However, the picture muddies somewhat when we consider the percentage of registered voters in each state who have voted early.

Fig 3 — % of registered voters voting early vs Cook partisan bias, per state

Fig 3 shows the percentage of registered voters in each state who have voted early vs Cook partisan bias. This does not look like the exploding volcano. Indeed one is hard pressed to find any semblance of distribution. Florida and Texas are at the forefront, with 60% already having voted according to this data, but they are leading a peer group of 8 states, all of which have more than half of their registered votes cast as of this writing.

Fig 4 — % of registered voters voting early vs Nate Silver partisan bias, per state

Fig 4 shows the percentage of registered voters in each state who have voted early vs Nate Silver partisan bias. This, too, does not look like the exploding volcano, but there is a difference: there is a clearer dropoff beyond Texas to the right. This is consistent with the following interpretation. Nate Silver’s polling is the most up to date possible. The Cook partisan index is most relevant in an environment where change happens slowly. As states skew left, there is a loose correlation between skewing left and voting early. This is, though, a loose correlation. It is worth noting explicitly the skew left for Texas, an historically red state. It is particularly noteworthy that in this state, with the second highest number of electoral votes, 60% of registered voters have already voted. According to https://www.vote.org/voter-registration-deadlines/ there is no chance for election day registration and the registration deadline has long passed.

So whatever developments occur between the time of this writing and election day can affect at most 40% of the registered vote, assuming the 60% already voted are tabulated fairly and accurately.

What about the effect of covid19? This is a topic which merits a paper of its own, but in summary there is a clear relationship between policy, right vs left, and the tendency to have more covid19 cases.

Fig 5a — states with most covid19 cases (Dan Goodspeed)
Fig 5b — states with least covid19 cases (Dan Goodspeed)

First, it’s important to see the effect of right vs left on covid19 cases. Dan Goodspeed (referenced above) has done a great job of visualizing the data in Flourish (highly recommended), and I’ve screen-captured the most recent (Oct 22 at the time of this writing) data of covid19 cases per million since Jun 1 in two charts, Figs 5a and 5b, showing the most and least cases, 25 states per chart. Visually it’s clear that the first chart, with most cases, is predominantly red and the second chart, with least cases, is predominantly blue. His color coding follows the Cook partisan bias. Why since Jun 1? Dan Goodspeed finds that Jun 1 is a good date by which early prevention measures had the chance to be in place, and so he counts, from primary data, from Jun 1.

Fig 6 — % of population with covid19 vs Cook partisan bias, per state. The dashed line is the 50th percentile.
Fig 7 — % of population with covid19 vs Nate Silver partisan bias, per state. The dashed line is the 50th percentile.

Consistent with the analysis in this paper, I’ve plotted the same data for all 50 states vs both Cook (analogous to the visualization in Fig 5) and Nate Silver bias, in Figs 6 and 7. In both cases I’ve included a dashed line indicating the 50th percentile: half the states are above this line and half are below. Visually it is clear that the upper right quadrant is highly populated, i.e. all points above the median of covid19 cases since Jun 1 have a Cook partisan bias >-1. With the exception of California (at the median level), there are no left leaning states above the median. The Nate Silver plot shows a similar story, but in this case all points above the median line have a bias >-5, not -1. In other words, there is a cluster of states with high covid19 incidence whose historical bias is nominally near or above zero (i.e. they lean right) but which are now leaning left. Although many interpretations are possible, one plausible causal reading is that the high covid19 count has caused these states to skew more left than before.
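The median-line check described here can be sketched in a few lines. The four rows below are invented purely for illustration and are not the state data behind Figs 6 and 7; the function name is also hypothetical.

```python
from statistics import median


def above_median_min_bias(rows):
    """Find the most left-leaning bias among states above the median.

    rows -- list of (name, covid_pct, bias) tuples, where bias is negative
            for left-leaning states and positive for right-leaning ones.
    Returns (median covid_pct, minimum bias among rows strictly above it).
    """
    med = median(r[1] for r in rows)
    above = [r for r in rows if r[1] > med]
    return med, min(r[2] for r in above)


# Hypothetical states: (name, % of population infected, partisan bias).
rows = [("A", 0.5, -12), ("B", 1.0, -4), ("C", 2.0, 3), ("D", 3.5, 15)]
med, min_bias = above_median_min_bias(rows)
print(med, min_bias)  # 1.5 3
```

A statement such as "all points above the median have a bias >-1" corresponds to the second returned value being greater than -1.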

In terms of percentages of people reported as infected with covid19 since Jun 1, there is a large disparity, with Vermont as low as 0.16% of the population and North Dakota as high as 4.2% of the population having contracted covid19. This is a large spread, which merits its own paper and discussion, but it is well worth mentioning here.

How does covid affect early voting?

Fig 8 — early votes vs percentage of covid19 infections, per state

Fig 8 shows the early votes, by state, vs population percentage of covid19 infection, again since Jun 1. Leaving out the major states of California, Texas and Florida, the distribution looks peaked once again, similar to Figs 1 and 2, but with the peak of the mountain to the left of the volcano explosion.

Fig 9, though, shows a different picture. This shows the percentage of early voters (early voters as a fraction of registered voters) vs percentage of covid19 contraction since Jun 1 for each state. There is no distribution apparent whatsoever. The percentage of voters having already voted early is well spread out, from near zero to as high as 60%, with an average over the full US electorate of 36.7% and a median value of 25.4%. The high absolute numbers of early voters in California, Texas and Florida skew the average much higher than the median. The covid19 percentage is also well spread out, from near zero to as high as 4.2%. One would naively think that voters in those states with more covid19 cases would want to vote early and not risk spreading or contracting the infection on Election Day. Why is this not the case? Let’s return to this point after considering the hyperpartisan divide in the US today.
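To see why a few large states can pull the electorate-wide average above the per-state median, here is a small sketch. The numbers are made up for illustration (one large state with high early turnout and three small states with low turnout), and the function name is hypothetical.

```python
from statistics import median


def pooled_and_median(states):
    """Compare the electorate-wide early-vote share with the per-state median.

    states -- list of (early_votes, registered_voters) tuples.
    Returns (pooled percent over all voters, median of per-state percents).
    """
    pooled = 100.0 * sum(v for v, _ in states) / sum(r for _, r in states)
    med = median(100.0 * v / r for v, r in states)
    return pooled, med


# Hypothetical states, not the article's data.
states = [
    (6_000_000, 10_000_000),  # large state, 60% voted early
    (200_000, 1_000_000),     # small state, 20%
    (100_000, 1_000_000),     # small state, 10%
    (300_000, 1_000_000),     # small state, 30%
]
pooled, med = pooled_and_median(states)
print(round(pooled, 1), med)  # 50.8 25.0
```

The pooled figure weights each state by its number of registered voters, while the median treats every state equally, so the large state dominates the first but not the second.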

Fig 10 — State partisan bias, Nate Silver vs Cook, per state. The line for the equation of best fit is shown together with correlation coefficient.

The partisan bias calculated by Nate Silver polls is plotted against the Cook partisan bias, representing the last two election cycles, in Fig 10. This is what hyperpartisan looks like. The equation of best fit is given, with a high correlation coefficient of 0.94. Recent hyperpartisan bias magnifies the previous bias, measured over the previous two election cycles, which were already polarized, by a further factor of 1.76, but also skews left by 7.66 percentage points (consistent with 4.3 points on the previous Cook scale). This percentage difference is a similar number to the poll results for Biden vs Trump as of this writing. Let’s be clear about what these numbers mean. A 33.3 point difference on a full 100% vote means there are two voters in one direction for every voter in the other direction, and this occurs both left vs. right and right vs. left. This is not a small difference, and is a much larger difference than from even the previous two election cycles, which were already quite polarized. The observed magnification factor means the left is getting more left and the right is getting more right.
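The arithmetic in this paragraph can be checked with a short sketch. It assumes the best-fit line has the form Nate Silver bias = 1.76 × Cook bias − 7.66, which is my reading of the slope and leftward offset quoted above (the figure itself is not reproduced here); the function names are hypothetical.

```python
SLOPE = 1.76       # magnification factor quoted in the text
INTERCEPT = -7.66  # leftward offset; negative = left (assumed sign)


def predicted_silver(cook_bias: float) -> float:
    """Nate Silver bias predicted from Cook bias under the assumed line."""
    return SLOPE * cook_bias + INTERCEPT


def cook_at_zero() -> float:
    """Cook bias at which the predicted Nate Silver bias is exactly zero."""
    return -INTERCEPT / SLOPE


def margin_for_ratio(k: float) -> float:
    """Point margin when one side has k voters for every 1 on the other.

    Shares are k/(k+1) and 1/(k+1), so the margin is 100*(k-1)/(k+1).
    """
    return 100.0 * (k - 1) / (k + 1)


print(round(cook_at_zero(), 2))       # 4.35
print(round(margin_for_ratio(2), 1))  # 33.3
```

Under this reading, a state needs a Cook bias of roughly 4.3 points to the right just to sit at zero on the Nate Silver scale, and a 2-to-1 split of voters corresponds to the 33.3 point difference mentioned above.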

This brings us back to the question of why, in Fig 9, there is no correlation whatsoever between the percentage of early voters as a fraction of registered voters and the percentage of population which has contracted covid19 since Jun 1. I would suggest there are two competing effects: red states and Trump are on record as not supporting early voting (which would suppress percentage voting early) but also as implementing policies which are in opposition to preventing covid19 spread (as evidenced from Figs 5–7). These two effects may well offset one another. High covid19 cases and the absolute and utter lack of control of covid19 in the US are driving people to vote early. However, in many cases it may be that this early voting is related more to total US covid spread than to local covid spread, or to fear of Election Day voter intimidation at the polls (many references, but I don’t want to delve here; I merely mention it as a potential cause), so there are multiple dynamics at work.

Closing remarks

If you’ve read this far, I greatly appreciate it. I’ve tried to present data with as little political spin as possible and draw the correlations and non-correlations suggested by the data. We see that in the 2020 election there are multiple dynamics at work which have influenced early voting. Indeed, in 8 states more than half the registered voters have already voted, including Texas at the 60% mark and with the largest number of early votes of any state. Hyperpartisanship, covid19 and other dynamics are influencing the 2020 election dynamics as never before in our memories.

Also, it is worth pointing out explicitly that there is no guarantee that the polling percentages reflect the final electorate percentages for any state. It is clear that the Election Day voters will be skewed right, as it appears the early voting is skewed left. But the balance between them remains to be written, no matter what the polls say as percentages.

Larry Tarof

Larry is a semiconductor physicist by day and a musician (piano/voice/guitar, “Dr L’s Music”) evenings/weekends. He should someday update his LinkedIn profile.