Skewed dice and the “Jim’s Birthday Problem”

Introduction

I come from a family with a fair number of gamers. Board games, role playing games, video games: someone in my family plays just about every kind of game. A few years ago, I took a course on machine learning, where we spent a while reviewing all the probability rules that get covered in a Statistics 101 class, and I found myself pondering the types of questions that DON’T get put into the 101 curriculum. This is about one of them.

Other than a coin flip, the most common probability example in introductory classes is based on a fair 6-sided die, where we assume that all sides are equally probable. For example:

The probability of rolling a “n” is 1/6, or 16.7%, for any “n” in [1, 2, 3, 4, 5, 6]. (1)

In these introductory classes, this uniform assumption is extended to example after example.

The probability of rolling “6” every time on k dice rolls is (1/6)^k. (2)

In order to roll a “4” in two die rolls, the first and second dice must be 1 and 3, or 2 and 2, or 3 and 1. These are the only 3 combinations that produce a “4.” Therefore, the probability of rolling a 4 on two dice rolls is the sum of the probabilities of rolling 1+3, 2+2, and 3+1.

There are 36 combinations of rolling “a”+“b” on two rolls, and each is equally probable.

Therefore, the probability of rolling “4” is 3 out of 36 = 8.3%. (3)
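For readers who like to check this kind of counting by brute force, here is a minimal Python sketch that enumerates all 36 outcomes of two fair rolls and counts the ones that total 4. The variable names are my own and are only for illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes (first roll, second roll) of two fair 6-sided dice.
faces = range(1, 7)
outcomes = list(product(faces, repeat=2))

# Keep only the outcomes whose total is 4.
hits = [pair for pair in outcomes if sum(pair) == 4]

# With a fair die every outcome is equally likely, so the probability
# is simply (number of hits) / (number of outcomes).
prob_total_4 = Fraction(len(hits), len(outcomes))

print(hits)
print(prob_total_4, float(prob_total_4))
```

This shortcut of counting outcomes only works because every outcome is equally likely, which is exactly the assumption the skewed die breaks.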

Probabilities for a skewed die

The fundamentals of probability still hold if the assumption of uniformity is violated. Skewed dice were designed (by the author) to illustrate how pervasive the uniformity assumption is. These dice have been rolled a few hundred times to determine experimentally the probabilities of each number coming up.

Figure 1: skewed dice, as saved on Thingiverse. Thanks to Josh Silverstein for saving, printing, sanding, and painting a set of model dice for me.

For Table 1 below, these probabilities have been rounded to a convenient number that is within the confidence interval of estimates of the true probability. For example: the best estimate of the probability of rolling “2” is 19.67%, but I’m reporting it as 1 in 5, or 20%. These rounded-off probabilities total 101%. Deal with it. More precise estimates are included in Appendix 1.

NOTE: for this paper, it is assumed that the numbers on an n-sided die are 1, 2, 3, …, n. For mathematicians and computer scientists who like to start counting at zero, you have my apologies.

Table 1:

Digit	Odds	Probability
1	1 in 3	33%
2	1 in 5	20%
3	1 in 10	10%
4	1 in 5	20%
5	1 in 9	11%
6	1 in 15	7%

For Equation (2), calculating the probability works the same way: the probability of rolling “1” three times on three rolls is

(1/3)³, or 3.7% (2a)

while rolling “6” three times is much less likely:

(1/15)³, or 0.03% (2b)

Probabilities for multiple rolls

The probability of rolling a 4 on two dice rolls is STILL the sum of the probabilities of rolling 1+3, 2+2, and 3+1, but if the uniformity assumption is not valid, these probabilities have to be added up separately. For example:

Let Pr(j, k) be the probability of rolling “j” as the sum of “k” die rolls. The probability of rolling a 3 on one roll is therefore denoted Pr(3, 1), which from Table 1 is 10%. The probability of rolling a 1 and a 3 is the joint probability of the two rolls:

Probability of 1 and 3 is the same as Probability of 3 and 1:

= Pr(1, 1) * Pr(3, 1) = (1/3)*(1/10) = .033 (3a)

Probability of rolling a pair of 2s on 2 rolls is the square of the probability of rolling a 2 on one roll:

= Pr(2, 1) * Pr(2, 1) = (1/5) * (1/5) = .04 (3b)

So the probability of rolling “4”:

= P( 1 and 3) + P( 2 and 2) + P( 3 and 1) = .033 + .04 + .033 = 0.106 —> 10.6% (3c)

This is further illustrated in Figure 2. For any given number of rolls, the probability of rolling a certain total is an extension of the probability from each side.

Figure 2: Probability of rolling a total number j with k rolls of the skewed die.

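As a further extension of this, there is a simple way to calculate the probabilities for an arbitrary number of rolls k: iteratively. If we have calculated all the probabilities for k-1 rolls, then the last roll must show some face i, and the first k-1 rolls must total j-i:

Pr(j, k) = Pr(1, 1) * Pr(j-1, k-1) + Pr(2, 1) * Pr(j-2, k-1) + … + Pr(6, 1) * Pr(j-6, k-1) (4)

where Pr(m, k-1) is taken as zero for any total m that cannot be reached in k-1 rolls.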

A table of these probabilities is shown in Appendix 2.
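The iterative calculation can be sketched in a few lines of Python. This is an illustration rather than the code used to build Appendix 2: the function name is hypothetical, and the single-roll probabilities are assumed to be the “1 roll” column of the Appendix 2 table, written as decimals.

```python
# Single-roll probabilities, taken from the "1 roll" column of Appendix 2
# (an assumption: these are the rounded values printed there).
SINGLE_ROLL = {1: 0.33, 2: 0.19667, 3: 0.09667,
               4: 0.19667, 5: 0.11333, 6: 0.06667}


def sum_distribution(k, single=SINGLE_ROLL):
    """Return {total: probability} for the sum of k rolls of the skewed die."""
    dist = dict(single)              # distribution for 1 roll
    for _ in range(k - 1):           # add one more roll each pass
        nxt = {}
        for total, p_total in dist.items():
            for face, p_face in single.items():
                # A running total plus the new face contributes
                # p_total * p_face to the new total.
                new_total = total + face
                nxt[new_total] = nxt.get(new_total, 0.0) + p_total * p_face
        dist = nxt
    return dist


two_rolls = sum_distribution(2)
print(round(two_rolls[4], 5))        # probability of a total of 4 on 2 rolls
```

Each pass through the loop builds the distribution for one more roll from the previous one, so five rolls take four passes.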

Application of Central Limit Theorem

For a “fair” six-sided die, the average score over a large number of rolls is 3.5, which is the average of all sides: (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5. For a skewed die, that formula is more properly expanded into a weighted average, by multiplying the value of each side by its probability. Thus, for our 6-sided die, the average of many rolls is:

Pr(1, 1) * 1 + Pr(2, 1) * 2 + Pr(3, 1) * 3 + Pr(4, 1) * 4 + Pr(5, 1) * 5 + Pr(6, 1) * 6

= (.33 * 1) + (.2 * 2) + (.1 * 3) + (.2 * 4) + (.11 * 5) + (.07 * 6) = 2.8 (5)

So, for example, if you rolled the biased die 5 times, you can expect a total of around 2.8 * 5 = 14.
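As a quick check of the weighted average, here is a small Python sketch using the rounded probabilities from Table 1. The variable names are mine, and because the rounded probabilities total 101%, the result is only approximate.

```python
# Rounded probabilities from Table 1 (they total 101%, so this is approximate).
table_1 = {1: 0.33, 2: 0.20, 3: 0.10, 4: 0.20, 5: 0.11, 6: 0.07}

# Weighted average: each face value times its probability, summed.
mean_per_roll = sum(face * p for face, p in table_1.items())

# Expected total over 5 rolls is 5 times the per-roll average.
expected_total_5 = 5 * mean_per_roll

print(round(mean_per_roll, 2), round(expected_total_5, 1))
```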

Of course, there are many ways to total 14 on 5 rolls: rolling 4 three times and 1 twice, or rolling 1 three times, 5 once, and 6 once, or any number of other combinations. However, as we roll the dice more times, it becomes more and more likely that the average of the rolls will approach the “true” average. For a more rigorous explanation of this, go somewhere else.

Another of the consequences of the Central Limit Theorem is that for a high enough number of rolls, the distribution of the sum of rolls looks more and more like a normal Gaussian distribution. In Figure 2, this is clear as the k=4 rolls and k=5 rolls look more and more like a normal bell-shaped curve.

This brings up an amusing application. In this case, the standard deviation is around 3.22, which is around 83% of the standard deviation for the fair die. Also, the mean value from each roll of the die is around 79% of the mean for a fair die. Within a margin of error, the sum of 6 rolls of this biased die will give around the same distribution of outcomes as 5 rolls of a fair die.

Calculations are left as an exercise for the reader (or see how it’s done here.)

The Jim’s Birthday problem

In Equation (4), we presented an iterative formula for Pr(j, k). Theoretically, a closed form solution for calculating this should exist. In honor of my brother’s 50th birthday, I’m calling this the Jim’s Birthday Problem.

APPENDIX 1: actual dice rolls for probability estimates

Based on 3 sets of 100 rolls.

APPENDIX 2: Pr(n, k) — probability of rolling “n” on “k” rolls

Given the single roll probabilities estimated in Appendix 1, the probabilities of rolling a given sum j on k rolls are calculated from Equation (4).

Probabilities:

Sum	1 roll	2 rolls	3 rolls	4 rolls	5 rolls
1	33%
2	19.667%	10.890%
3	9.667%	12.980%	3.594%
4	19.667%	10.248%	6.425%	1.186%
5	11.333%	16.782%	6.987%	2.827%	0.391%
6	6.667%	16.150%	10.950%	3.917%	1.166%
7		12.660%	13.408%	6.315%	1.963%
8		8.681%	13.189%	8.924%	3.361%
9		5.747%	12.243%	10.389%	5.256%
10		3.907%	10.589%	11.304%	6.964%
11		1.511%	8.697%	11.521%	8.511%
12		0.444%	6.041%	10.979%	9.761%
13			3.780%	9.524%	10.457%
14			2.232%	7.625%	10.411%
15			1.166%	5.791%	9.690%
16			0.519%	4.069%	8.550%
17			0.151%	2.624%	7.104%
18			0.030%	1.535%	5.537%
19				0.824%	4.054%
20				0.403%	2.792%
21				0.169%	1.807%
22				0.058%	1.085%
23				0.013%	0.602%
24				0.002%	0.307%
25					0.143%
26					0.059%
27					0.021%
28					0.006%
29					0.001%
30					0.000%

Back in my day, candy was a nickel

Everyone knows that inflation is “when prices go up” and everyone knows that inflation is the “natural” state of our economy. (More specifically, the Federal government maintains intentional feedback mechanisms to maintain a low positive inflation rate, but that’s an article for another day.) What fewer people have is a gut feeling for how much effect a small, cumulative inflation has on prices over the years.

In a recent Newsweek article about the proposed rise in the US Federal minimum wage, Senator John Thune is quoted as opposing the proposed $15 per hour level because as a kid, he made $6 an hour and did fine. As Newsweek points out, decades of inflation makes $6 in 1978 worth about the same as $24 in 2021.

This “candy was a nickel when I was a kid” misdirection is common, and it offends me. When people intentionally misuse math to make a false comparison between data from different decades, it bothers me even more that they do it to hide their own lack of empathy. Minimum wage was $2.65 in 1978. How many people have a feel for how much more that would be today?

I decided to put together a handy conversion table.

The US Bureau of Labor Statistics has more raw data than most of you are going to want for a “market basket” calculation they use to compare prices over time. They calculate several flavors of Consumer Price Index (CPI), and while there are some issues, it’s been a standard for comparison since shortly after WWII. I’ve picked series CUSR0000SA0 for this comparison: “All items in U.S. city average, all urban consumers, seasonally adjusted”.

The CPI was 262.231 in January 2021.

At some point in the past, it was half of that, at 131.1, so a dollar at that time was worth $2 now.

From the tables, that date was August 1990, so $1 in 1990 would be worth around $2 now. The rest of my table follows the same format: CPI in October 1951 was 26.2, one tenth of today’s, so $1 then would be worth $10 now.

Date	CPI	Multiplier
2021 January	262.231	1
2000 December	174.6	1.5x
1990 August	131.1	2x
1981 January	87.2	3x
1978 July	65.5	4x
1975 February	52.6	5x
1973 April	43.7	6x
1966 September	32.8	8x
1951 October	26.2	10x
1947 January	21.5	12x

Table of consumer price index for selected dates and multipliers to today’s prices
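If you want to convert a specific dollar amount rather than eyeball the multiplier, the arithmetic is just a ratio of CPI values. Here is a minimal Python sketch; the function name is hypothetical, and the CPI figures are the ones from the table above.

```python
CPI_JAN_2021 = 262.231  # from the table above

# A few rows of the table: date -> CPI.
cpi_by_date = {
    "1990 August": 131.1,
    "1978 July": 65.5,
    "1951 October": 26.2,
}


def to_2021_dollars(amount, cpi_then, cpi_now=CPI_JAN_2021):
    """Scale a past dollar amount by the ratio of CPI now to CPI then."""
    return amount * cpi_now / cpi_then


# Example: the $2.65 minimum wage and the $6 wage from 1978.
for wage in (2.65, 6.00):
    print(wage, "->", round(to_2021_dollars(wage, cpi_by_date["1978 July"]), 2))
```

The same ratio gives every multiplier in the table: divide 262.231 by the CPI for the date in question.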

Note: the 1970s were a time of consistently high inflation, and everyone was aware of it. If your wages weren’t going up 10% or more some years, you were making less than the year before. That shows up in the table: $1 in February 1975 is worth $5 today, while $1 in April 1973 is worth $6 today.

To make a long story short, if anyone tells you about wages and prices in the good old days when they were a kid, be prepared to multiply it 2x for folks who got their first job in 1990, triple it for first jobs around 1981, and multiply it by up to 10x for some of the oldest boomers.

What’s going on with GameStop, Part 1

For anyone trying to get a clearer look at what’s going on with GameStop (GME) and the stock market, here’s our attempt at making it clearer. (I’m writing this jointly with Debby Williams of Feminist Utopia. Actually, other than some of the number parts, she wrote most of it.) We’re going to have to break this into several posts, because there’s a lot of layers, but let’s start with the basics.

Here’s some of what happened.

A few weeks ago, powerful investment managers at hedge funds made a promise to sell GME stock at a fixed price that was less than the current price, at an agreed-upon future date. That’s called a short, and the big players in question are Melvin Capital, a team that focuses almost entirely on shorts, and their partners at Citadel. Following this, they bad-mouthed the company to lower the value of the stock. They did this at ‘idea dinners’, talking on CNBC, and by making sure the social media people of companies like Citron Research were pushing their agenda. If they had been successful, as they usually are, the resulting loss in stock value would become an opportunity to buy below their fixed price and complete the sale at a profit. That’s pretty standard.

What makes this less standard is that retail investors noticed that (1) the promised volume was more than 140% of the total tradable volume, (2) many of them felt that the company was being unfairly damaged, and (3) they like GameStop… as a friend. Therefore, they and their friends bought and held GME stock in an attempt to keep its price from going down, which put Melvin and Citadel in a bind. If the stock went up, they’d be on the hook for a huge loss. Melvin/Citadel fought back, and other hedge funds joined in and are also fighting to push down the value of GameStop. As of right now, they’re not winning that fight, but there’s a lot more going on.

Our main points:

1) We expect an unusually large amount of reader push-back from our opinions here, and we request that people hold their thoughts before posting. If you don’t know what “to the moon” and “diamond hands” are then you don’t know anything about WallStreetBets subreddit and what their motivations are or aren’t. You seriously have to go read through posts there. Similarly, if this week was the first time you heard the expression “squeezing a short” then you don’t know enough to peel back layers of this onion, and should probably listen with your mouth shut.

2) Debby got her first deep course in securities working at Dow Jones in the 90s. We learned thoroughly during that time that (a) the market is a gambling casino, (b) the house always wins, and (c) it’s better to gamble in ways where you lose less. Not to belabor the point, but we’ve learned a lot since then, and there are a lot of layers to what’s going on.

3) If you’re going to argue that companies need to have functional value to succeed in the stock market then you need to take your time machine back to the 1980’s and invest there because that view is just disconnected from reality. It’s a lie we tell in high school econ, like we tell kids the Founding Fathers were such upstanding guys. It’s a useful simplification sometimes, but the exceptions are too common to ignore. Please hold replies to this — we’re going to go into it a lot deeper in another post.

4) People are primates. We are emotional creatures, as are the hedge fund managers. If that bothers you… tough. Individual stocks are traded on perceived value along with emotions.

Anyway we’ve really struggled on how to format this. We literally have over a dozen pages of writing and links to cover, on topics from the history of short selling to the lack of enforcement by the SEC. So at this time we’re just going to answer some of the really bad positions we’ve been seeing first. There will be at least 3 posts of “bad takes” followed by “Hedge funds are bad for America” possibly intertwined with “Shorts are stupid” and a finale of “What can be done?”

Bad Take #1 – WSB is manipulating the market.

The readers and posters of the subreddit “WallStreetBets” (WSB) are not “manipulating the market” any more than the hedge funds manipulate the market when they go on CNBC or have their “ideas dinners” where they decide which stocks to short and try to sell the idea to other major players. The so-called manipulation that WSB is doing is the mirror image of hedge funds saying a company’s stock is garbage; people are allowed to point out that a gamble is garbage and that they want a part of the action. While it was one of the self-described “smooth-brained apes” on WSB that first publicly drew attention to the over-shorting of GME, it’s clear from trading volume that WSB wasn’t the only group of people who noticed it. They’ve been discussing it since December 2019, and the discussion has only accelerated. Let’s discuss some key points on the timeline….

On January 12, GameStop announced that Ryan Cohen (founder of Chewy) would be joining the board along with 2 other former Chewy execs, as part of a venture capital firm’s decision to help GameStop pivot and leverage their online presence. That added a lot of perceived value to the company and made the short position even weaker. Over the next 2 days, roughly 238 million shares changed hands, which was huge when you realize that a typical day sees 4 million to 10 million shares traded. Also during those two days, the share price doubled, from $20 per share to $40 per share.

Notice the volume change? That didn’t happen because WSB just wanted to set fire to the system. It happened because larger investors decided that the over-shorting plus the possibility of greater online sales meant GME just became a much better buy. WSB at this time has around 150,000 active members. Many are living paycheck to paycheck and/or living with their parents, and are very accustomed to being broke or near broke. Even if all 150,000 of them each found the money to buy $1,000 worth of “stonks” at $20 a share, that would have purchased 7.5 million shares, and even that much is an over-estimation of what they were able to do. Many WSB posters bought single shares or even fractional shares. In fact, many WSB members had bought their shares back in Aug 2020 when shares were $4 and simply held them. Buying and holding is not market manipulation. Also clearly, that accounts for only a tiny fraction of the 145 million shares that changed hands on the 13th, and the additional 94 million that traded the next day.
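The back-of-the-envelope math in the paragraph above is easy to reproduce. Here is a short Python sketch using only the figures quoted in this post; the variable names are ours.

```python
# Figures quoted in the post.
wsb_members = 150_000
dollars_per_member = 1_000      # deliberately generous assumption from the text
share_price = 20                # dollars per share before the run-up

# Upper bound on shares WSB could have bought under that assumption.
wsb_shares = wsb_members * dollars_per_member / share_price

# Shares that changed hands on the 13th and the following day.
shares_traded = 145_000_000 + 94_000_000

share_of_volume = wsb_shares / shares_traded
print(f"{wsb_shares:,.0f} shares, {share_of_volume:.1%} of two-day volume")
```

Even with the generous $1,000-per-member assumption, the result is a low single-digit percentage of the two-day volume.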

The obvious conclusion is that other big investment funds were “squeezing the short.” They were buying these stocks to keep the price up and force Melvin Capital to take a huge loss when their due date came on the original promise. It’s just not possible that trades of 145 million shares, valued at over a billion dollars, just changed hands accidentally or due to the actions of WSB.

Our position is that without Reddit, this squeeze would still have happened. You probably wouldn’t have known about it, and there wouldn’t be a group of Reddit apes to blame. You probably didn’t notice the at least 2 short squeezes on Tesla stock in 2020 either. This became big funds against big funds by January 13th.

Since the 13th, we’ve seen a lot of people accusing WSB not only of market manipulation, but of doing so irresponsibly, vindictively, and even ignorantly. They predict a widespread crash in the market with long-term effects for years to come. It’s important to ask why they would do this.

Blaming WSB for ‘manipulating the market’ is hedge fund managers’ way to denigrate the little people for trying to get a few crumbs from the table. It’s like blaming immigrants for low wages or blaming communities of color for crime. It’s all a distraction from the real story which is this: greedy hedge funds over-shorted and got caught doing something illegal. (Naked shorts have been illegal since the 2008 financial crisis.) It’s causing a market disruption because many funds are having to sell off their good stocks. That’s why the portfolios of the middle class are starting to take a hit.

So the shorting hedge funds pushed back. Then what happened? It became personal for retail investors when CNBC started talking smack about WSB and “retail investors.” It was all out war after Robinhood bowed to Citadel and restricted trades including selling GME that had been purchased with margin money without permission.

For those having trouble keeping up, Robinhood is a platform where many of the self-proclaimed Wendys-paycheck-earning, smooth-brained apes from Reddit were able to buy fractional stocks with their chicken tendie money. Robinhood is supported by a big investment fund, and when it became clear that the buy-and-hold strategy by the Redditors was making it hard on the hedgies that were shorting, Robinhood blocked them from buying more, and in some cases sold shares on their behalf against their will. Sure, it was in the fine print that they were allowed to do it. That doesn’t make them the good guys.

That covers the first layer of what’s going on. Stay tuned for another layer, later this week.

Benford’s Law in Georgia politics

In recent court filings about the Georgia election, Christos A. Makridis made some wild accusations about voter fraud which are rebutted in great detail by Professor Jonathan Rodden of Stanford. Amid his piece-by-piece analysis, Rodden addresses a claim about Benford’s Law.

Makridis writes:

One diagnostic for detecting fraud involves Benford’s law. In the case of election fraud, that means looking at the distribution of digits across votes within a specified geography. Using precinct level data for Georgia, my research identified 1,017 suspicious precincts out of 2,656 when we look at advance ballots.

In his response, Rodden can almost be seen throwing up his hands:

Studies using the so-called First Digit Newcomb-Benford Law have come under heavy criticism for electoral applications… The type of analysis undertaken by scholars in this literature cannot be written up in a breezy paragraph that lacks crucial details, such as Dr. Makridis’ brief exposition on page 4 of his report.

In this post, I will discuss some rules of thumb for when Benford’s Law will be a reasonable thing to expect, and why this isn’t one of them.

For those unfamiliar with this “law”, let’s start by noting that it’s not a causal law like “gravity points down” or a legislative law like “don’t drive drunk” but rather an observation that many data sets follow a particular interesting pattern. It’s not an enforced rule, but an observed one. Here’s what it looks like.

Suppose you gave everyone in the local high school 24 cookies, and told them to eat what they want and take the rest home. When they left the school, you tallied how many cookies each person had left. Chances are, many of the kids will eat no more than 4. Their remainder will be “twenty-something,” or some number starting with 2. Of the others, almost all will eat fewer than 14 cookies, leaving a number starting with a 1. There may be a few with zero left, but not many. There will likewise be few who eat 20 of the 24 and take home the rest: there will be few remainders of 4.

That’s basically all it is. For a lot of data sets, you can expect the first digit to be “usually” some particular digit or follow a certain pattern. If you are looking at a list of the tallest buildings in the world (in feet), you can expect the first digit to be almost always 1. There is a very short list of buildings over 2000 feet, and anything under 1000 just won’t make the list.

Anyway, Benford’s Law is the observation that for certain types of data, the first digit is 1 more often than any other, and 2 is second most common, etc. Makridis is suspicious because the number of people voting at each precinct doesn’t follow that pattern. Rodden rolls his eyes. If it’s not yet clear why, let me finish.
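For reference, the pattern is usually stated as a formula: the expected share of values whose first digit is d is log10(1 + 1/d). That formula is the standard statement of Benford’s Law rather than something taken from the court filings. Here is a minimal Python sketch that tabulates it and, for comparison, counts the first digits of an arbitrary list of numbers. The helper name and the sample tallies are made up for illustration.

```python
from collections import Counter
from math import log10

# Expected first-digit frequencies under Benford's Law: log10(1 + 1/d).
benford = {d: log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_shares(values):
    """Observed share of each leading digit among positive integers."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Hypothetical tallies clustered around 900-1000, NOT real precinct data.
sample = [870, 912, 945, 988, 1010, 1034, 903, 861, 957, 1102]

observed = first_digit_shares(sample)
for d in range(1, 10):
    print(d, f"expected {benford[d]:.1%}", f"observed {observed[d]:.1%}")
```

Data that clusters in a narrow range, like the made-up tallies here, will not follow the expected column, and that is not evidence of anything except the narrow range.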

Georgia voters are assigned to one of 2652 precincts. In the 2020 presidential election, just under 2.5 million of them voted for each presidential candidate. If evenly distributed, that would be around 940 votes per candidate per precinct. If you tally the number of votes per candidate at each precinct, you can expect a lot of those tallies to start with 9. A lot will also start with 8, and a pretty good number will start with 1. You would expect the number of precincts with totals starting with 2 to be minimal: it either means 2000+ voters went one way, or under 300. It happens, but in Benford’s Law data sets, 2 is the second most common first digit. That just isn’t going to be the case here.

That’s all — common sense. If someone wants to dig up the precinct data and confirm how it actually fell out, feel free. Please let me know. I think the GA Secretary of State has that. I just know I wouldn’t apply Benford’s Law here, and anyone using it as “proof” of fraud because election results didn’t turn out the way they wanted is ignorant of how it is applied, or is trying to bamboozle people, or both.

Quadrillions

Folks, here’s another “I read it so you don’t have to”. For the TL;DR crowd, the claims that statistics show a one in a quadrillion chance of the election turning out the way it did come from a painfully abused misapplication of math. It’s nonsense.

The Supreme Court filing by Texas against Pennsylvania, Georgia, Michigan, and Wisconsin has, among other things, the following statement by Charles J. Cicchetti, an economist:
“I was asked to analyze some of the validity and credibility of the 2020 presidential election in key battleground states…
I determine the Z-scores comparing the number of votes Clinton received in 2016 to the number of votes Biden received in 2020. The Z-score in 396.3. This value corresponds to a confidence that I can reject the hypothesis many times more than one in a quadrillion times that the two outcomes were similar.”

Over the last 3 days since the filing, I’ve heard “quadrillion” more times than I usually do in a decade. It’s now being thrown around as more “evidence” that there was something fishy about the ballots, while taking the subsequent leap that the whole states’ elections should be thrown out. That’s not what the numbers say.

Here’s what Dr. Cicchetti’s analysis actually did. It demonstrated that the outcome of the 2020 election cannot be assumed to have come from identical conditions as the 2016 election. That is all. In case any of you would not have already concluded that, here is a little more about what he did.

Background, first of all. The Z-score test that was applied is appropriate for analyzing random events, which is not necessarily the right tool for the job. Here’s how the tool is supposed to be used.
Suppose you have a slightly damaged coin and you want to know if it flips 50:50 fair. You flip it 100 times, counting the number of heads, and then do another 100 flips, counting again. Common sense says that with an unbiased coin, you “should” get around 50 heads from each series, but we also know that it’s in the nature of random coin flips not to get exactly 50 every time. What if you got 54 heads followed by 47 heads? Weird? Probably not. Most people would say that’s just what happens sometimes. There’s no reason to think that there’s anything funny going on with results like that. But what if you got 92 heads followed by 94 heads? Clearly there’s something fishy about the coin. 90 heads followed by 90 tails? Most people would agree that not only is that a heavily tricked-out coin, but you can’t be using the same trick coin for both runs. That’s where statistical tests come in. The Z-test is one of the tools used to determine whether it’s safe to conclude that a (possibly biased) coin behaved differently in two sequences of tosses.

(I think my readers will forgive me if I leave out the calculation of the probabilities above. Suffice it to say that anyone with a college class in statistics has the background to look up how to do it. For what it’s worth, the chance of getting EXACTLY 50 heads on 100 throws of a fair coin is only about 8%, and getting 46-53 out of 100 happens only around 57% of the time, leaving around 43% of runs outside that range, so the 54 & 47 example is totally not fishy. Getting further and further from 50:50, we find that 88 or more heads is where we break the one in a quadrillion threshold, while 92 or more out of 100 is around one in 6.2 quintillion. 92/100 followed by 94/100 is what I’d call “once in never.”)
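For anyone who does want to run the coin-flip numbers, here is a minimal Python sketch that computes exact binomial probabilities for 100 throws of a fair coin. The function name is my own; the only library call is `math.comb` from the standard library.

```python
from math import comb


def prob_heads_between(lo, hi, n=100):
    """Exact probability of getting between lo and hi heads (inclusive)
    in n throws of a fair coin."""
    favorable = sum(comb(n, k) for k in range(lo, hi + 1))
    return favorable / 2**n


print("exactly 50:", prob_heads_between(50, 50))
print("46 to 53:  ", prob_heads_between(46, 53))
print("88 or more:", prob_heads_between(88, 100))
print("92 or more: one in", round(1 / prob_heads_between(92, 100)))
```

Because Python integers have unlimited precision, the counts are exact and only the final division is rounded to floating point.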

Suppose we accept the filing’s statement that in 2016, Clinton received 1,877,963 votes, which is 45.9% of the total cast. Suppose we imagine that these votes came about from a long series of coin flips from a slightly bent coin thrown over 3 million times and which comes up “not Trump” 45.9% of the time. Here is what Cicchetti tested: If we know in advance that this exact same biased coin was used again, what is the probability that when we throw it 4 million times in 2020, it comes up with “not Trump” 49.5% of the time instead of close to 45.9%?

The answer will not surprise you. The chances are one in … you guessed it … one in LOTS. Lots and lots and lots. Any statistician would say there is no reasonable possibility that this was the same coin. One in a quadrillion is actually optimistic by more than a billion times a billion times a billion. It’s “yuge.” With real coin throws, the more throws you have, the more closely the average matches the true average of the coin. So with 4 million throws, while the true probability of throwing “not Trump” is 45.9%, the probability of being outside of 45.9% +/- 0.1% is about 1 in 16,000, and the probability of being outside of +/- 0.2% is already in the neighborhood of 1 in a quadrillion. For the average to move by 3.6%, the only reasonable conclusion is that the assumptions of this mathematical model don’t match reality.
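To see why the answer is “one in LOTS,” here is a rough Python sketch of the coin-flip model described above. It is a simplified one-sample illustration under the stated assumptions (a coin with a 45.9% chance, thrown 4 million times), not a reproduction of the calculation in the filing, and the variable names are mine.

```python
from math import sqrt

p_2016 = 0.459          # assumed "true" chance of "not Trump" per throw
p_2020 = 0.495          # share observed in the second series of throws
n_throws = 4_000_000    # number of throws in the second series

# Standard error of the observed share after n_throws throws of this coin.
standard_error = sqrt(p_2016 * (1 - p_2016) / n_throws)

# How many standard errors the observed share sits from the assumed chance.
z_score = (p_2020 - p_2016) / standard_error

print(f"standard error: {standard_error:.5%}")
print(f"z-score of a 3.6 point shift: {z_score:.0f}")
```

A shift of more than a hundred standard errors is far beyond anything a single unchanged coin could produce, which is why the sensible reading is that the coin changed.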

So that’s what we conclude. We conclude that conditions have changed. We conclude that the voters felt that the Clinton-vs-Trump election was not identical to the Biden-vs-Trump one. Some people changed their minds. Different people thought it was important enough to vote. In any case, the conclusion is clear. The 2020 election was not the 2016 election. His assumptions don’t match reality.

That’s really all that Cicchetti can say. But there’s more.

Elections, you all understand, are not coin flips. They are not random. They are the fully deterministic outcome of a single set of choices made by a single set of voters, each getting one vote, and at the end of the day, those choices are counted. There are no statistics to tabulate. They are what they are. If you want to say that a re-vote by the same electorate on a different day under different conditions would have different results, that’s just wind. Those hypothetical elections don’t exist, and don’t matter. Dewey could have beaten Truman.

Polls leading up to elections get analyzed statistically. If you want to predict the winner, you have to assume that the limited number of people you polled are a true reflection of the millions who will actually vote, and modeling those polls like a series of coin flips is one of the tools in the tool box. If I call 100 people on their land lines, and ask them who they voted for for president 4 years ago, and all 100 say “Mickey Mouse”, I may call the election as a sure thing. If that’s not who wins the election, then the assumptions I made were wrong.

That is all.

Please stay home for Thanksgiving

Stay home. Seriously. For all of you who have heard and ignored “please stay home” for the last 9 months, could you please consider this for one trip?

Thanksgiving 2020 USA is going to be the Covid-19 super-spreader event of all time. As I’ve said in a previous post, the scariest time is when the number of new cases is not only rising, but the rate of rise is increasing. That’s where we are right now, and that’s why — if you don’t stay home — you have the chance of being one of the 2,000,000 cases a week I expect to be reported during the first 2 weeks of December.

If that doesn’t scare you, either you can’t be convinced that the 1200+ people that are dying of it daily are important, or you don’t care that people are surviving with life-changing symptoms that may last the rest of their lives, or you don’t believe that you or anyone you care about can get it, or you don’t realize that more than 1/3 of all the people who have gotten it are still sick, or there’s just something wrong with you and you don’t care. I don’t know if I can help with any of those. I just deal with data. Primatology baffles me.

Can’t believe we’ll get 2,000,000 new cases in a week? This I can help you with. Here’s what I started to write 4 days ago:

Nationwide, the number of people catching Covid is going up fast. The new record of 140,000 new cases yesterday (Nov 10) was up 40% from a week before. Count on breaking 160,000 new cases by the end of the week.

Well, that was optimistic. Every day this week was 35-40% higher than the week before, and Friday we had more than 180,000 new cases. The scariest part for me is that there’s no reason to expect this to change soon. With 3 days topping 155,000 this week, a conservative estimate for next week is at least one day over 210,000. See here and here. If we hit 285k per day in the next month, that’s 2,000,000 a week.
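To get a feel for how quickly that could happen, here is a toy Python projection. It assumes, purely for illustration, that daily cases keep growing at a constant 35% per week from Friday’s 180,000; real growth will not be that tidy, and the variable names are mine.

```python
cases_per_day = 180_000     # Friday's count, from the text above
weekly_growth = 0.35        # low end of this week's 35-40% rise (assumed constant)
target_per_day = 285_000    # the daily level that adds up to ~2,000,000 a week

weeks = 0
while cases_per_day < target_per_day:
    cases_per_day *= 1 + weekly_growth   # compound one more week of growth
    weeks += 1

print(f"target reached after {weeks} weeks at about {cases_per_day:,.0f} per day")
print(f"cases per week at the target level: {target_per_day * 7:,}")
```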

Still not worried because only 4% of closed cases resulted in death? There are now around 4,100,000 active cases in the US out of the 11.2 million who have gotten it since the start of the year. In other words, more than 1/3 of all the people who have ever gotten it are still sick. That’s going to go up, too: every day, about 80,000 more people are getting sick than are getting well. Expect over 5 million sick by Thanksgiving. More hospitals are running out of beds.

I live in Seattle, where we’ve taken Covid seriously, but infection rates are going up. It has been announced that starting tomorrow we get even more new restrictions, and it’s an inconvenience, but I totally support it. Since we’re still 6 months from nationwide availability of a vaccine, it’s what we have to do. Statewide, the average infection rate this week reached 23 per 100,000, a 20% rise from last week.

In comparison, every state in the Montana / Wisconsin / Kansas triangle is multiple times worse off. Every one of these states has recent infection rates of 90 or higher (again this is average cases per 100,000 population over the last week). Iowa is at 149, over 6x higher than Washington. South Dakota and North Dakota are at 164 and 181, respectively 7x and 8x higher than Washington.
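Those multiples are just each state's rate divided by Washington's 23 per 100,000. A quick sketch in Python, using only the figures quoted above:

```python
washington = 23  # average weekly cases per 100,000, statewide
rates = {"Iowa": 149, "South Dakota": 164, "North Dakota": 181}

# How many times worse each state is than Washington
ratios = {state: round(rate / washington, 1) for state, rate in rates.items()}
print(ratios)  # {'Iowa': 6.5, 'South Dakota': 7.1, 'North Dakota': 7.9}
```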

I have an admittedly personal interest in the states I’ve lived in before Washington. Infection rates in Indiana this week are 3.6x as bad as here. In Ohio and Texas, it’s “only” 2.3x and 1.6x. So… yay?

We now have 11 counties nationwide where over 10% of their population has gotten Covid. Some of the worst are Norton, Kansas (18.6%), Bon Homme, South Dakota (17.4%), Buffalo, South Dakota (16.9%), and Eddy, North Dakota (13.4%). There are 18 counties where more than 2% of the whole county got sick JUST IN THE LAST WEEK. Of these, the worst are Crowley, Colorado (6.8%), Lee, Kentucky (4.2%), and Jones, Texas (3.6%).

For those hoping that this is only a fluke of statistics in small, rural counties, I give you El Paso, Texas, population 839,000, where there were over 11,000 new cases this week, comprising over 1.3% of the population. El Paso made news this week by re-opening restaurants while getting help from the state government and suffering from overflowing hospitals and increasing numbers of dying patients.

Too many big numbers? Don’t care that 200,000 new cases a day means 1.4 million new cases a week, and if “only” 4% of those die, that’s 56,000 more dead Americans for every week we don’t get this under control? Let’s make it more personal. Six people died because one couple wanted all their friends at their wedding in Maine. Not 6 dead from the wedding party: that’s not how the super-spreaders work. These 6 were at a nursing home where the parent of a wedding guest worked. According to the Maine CDC, this wedding has been linked to at least 178 cases.

So here’s my main point: please stay home this Thanksgiving. Wear masks. Try not to spread Covid. Don’t be a plague rat. Do it for your family, and for your friends, and for the rest of us. Be safe.

Covid’s third wave getting worse in November

Well, Trump announced repeatedly that by November 4 we would all stop talking about Covid. Sorry? That was yesterday, and it’s not getting better.

I recall that the first half of March was a scary time for me. With Covid cases on the fast rise, people were talking about doubling rates of 5 days. We went from 10,000 new cases a day March 26 to 20,000 March 31 before it slowed down. It made me crazy that people seemed oblivious to the fact that if there IS a doubling rate, then there’s no way to tell when cases were going to stop going up and up and up. We were on the “concave up” part of the curve. Upward curvature. It was bad. Would we reach 50,000 a day? 100,000 a day? It seemed inconceivably bad.

Well, we’re there again. Last week we set the world record for number of cases, twice, with 92,000 new cases on Thursday, and 100,000 new cases on Friday (using the count from Worldometers). This week, new records were set Wednesday at 108,000 and Thursday at 118,000. We have upward curvature, and 25% per week rise is a 3-week doubling rate. I’m terrified again.
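If you want to see where "3-week doubling rate" comes from, here is a small Python sketch. It assumes the 25% weekly rise stays constant, which is the whole premise of a doubling time; the function name is made up for this example.

```python
import math

def doubling_time(growth_per_period):
    """Periods needed to double at a constant fractional growth per period."""
    return math.log(2) / math.log(1 + growth_per_period)

# 25% per week
print(round(doubling_time(0.25), 1))  # 3.1

# Sanity check: three weeks of 25% growth, compounded
print(1.25 ** 3)  # 1.953125
```

Three weeks of 25% growth multiplies the daily count by about 1.95, which is close enough to doubling for me.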

We’re going to have our 10,000,000th case this week. We’re going to have days with 180,000 new cases before Thanksgiving. Happy Holidays, everyone. Wear a mask, wash your hands, and stop being a plague rat.

47 years!

47 years! Forty-seven years! 47 yeeeeeeears!!

The latest keep-on-stabbing point against Biden is that he’s taken a government salary for 47 years, and since he’s done nothing for the country in that time, he’s “taken” more than Trump has through any tax finagling he has managed to perform. This is an argument for people bad at math.

It’s also an argument for people who FEEEL that Biden is a bad guy for spending a lifetime of government service, learning patience and statecraft. I know that a reasoned rebuttal isn’t going to change their feelings. Still, bad math bothers me, so I have to take a moment for quick riposte.

We can leave aside for a moment the fact that Biden’s constituents re-elected him again and again, and that THEY thought he was doing something for them. For that rebuttal, feel free to read any of the first 5 articles you get if you Google “Joe Biden’s Accomplishments.” Let’s just stick to arithmetic.

Trump claims to be a billionaire, making hundreds of millions of dollars a year. By the most kind, conservative interpretation, if he paid 5% on $100 million a year, he’d pay $5 million a year. Almost all of us pay a higher rate than that, and he claims to make more than that, but let’s just leave that number there for comparison. By this rule of thumb, he’s found loopholes to avoid virtually all of that $5 million a year for most of the last 20 years’ taxes.

In comparison, the cumulative total of Biden’s earnings, earned by a full-time job through multiple re-elections over those 47 years is less than the last 2 years’ tax loopholes.

1. Biden’s salary for 8 years as Vice President was 8 x $230,700 = 1.85 million.

2. Suppose that, at maximum, the other 39 of those 47 years were paid at today’s senator rate (they weren’t). Then 39 x $174k = 6.79 million.

3. For those who need me to do the rest of the arithmetic for them, that’s 1.85 + 6.79 = 8.64 million cumulative total salary.
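The three steps above fit in a few lines of Python for anyone who would rather run it than trust me. The salary figures are the ones quoted in the list, and the $5 million a year is the rule-of-thumb tax number from earlier, not an audited amount.

```python
vp_total = 8 * 230_700                  # 8 years at the Vice President's salary
senate_total = 39 * 174_000             # 39 years at today's senator rate (an overestimate)
career_total = vp_total + senate_total  # cumulative salary over 47 years

assumed_tax_per_year = 5_000_000        # rule-of-thumb figure, an assumption
two_years_of_loopholes = 2 * assumed_tax_per_year

print(vp_total)      # 1845600
print(senate_total)  # 6786000
print(career_total)  # 8631600
print(career_total < two_years_of_loopholes)  # True
```

The exact total is $8.63 million; the 8.64 above comes from rounding each piece up before adding. Either way it is less than two years at $5 million.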

So what we’re hearing from the 47-year crowd is that if Trump cheated the government out of at least $15 million since becoming president, that’s ok because the other guy got over half of that for 47 years of salary. That’s a fine argument if you’re bad at math. That’s a fine, fine argument if you FEEEEL that only Trump can save you. Just remember that we recognize that you’re comfortable saying irrational, emotional, illogical things and that the rest of the world looks down on you.

No good news – COVID update, October 2020.

It’s been 3 months since I started my last Covid post on Facebook with this:
“You know what sucks the most about being in tune with data? Seeing the future.”

For those of you not looking, or not seeing trends, here’s more of the same. For the last month or so, there’s been a band of outbreaks from Montana through Arkansas which we can safely call the opening month of the third wave.

There are literally 29 counties in the USA where over 1% of their population was diagnosed with covid THIS WEEK. 9 counties in North Dakota, 7 in Montana, 5 in South Dakota, 4 in Nebraska, 2 in Kansas, and 2 in Wisconsin. In Toole County, Montana, 2.5% of the county got it this week.

In comparison, my home town of Seattle was the first place in the USA to have over a dozen confirmed cases in February, and our county has been holding steady at around 0.05% a week, just having passed 1% cumulative cases. Over 80% are from months ago, and have recovered and gone home.

Anyone who knows anything about the spread of disease knows that these mostly-rural counties have a safety advantage because of cheap land and open spaces. Montana has no 50-story apartment buildings or single acres where a quarter million people are breathing each other’s air every day. If they chose to be as careful as New York, where 0.05% of the state got sick this week, they could do that. Montana is at least as rural as New Hampshire, where the state-wide infection rate is at 0.7% cumulative, and their worst county this week was at 0.05%.

Europe is not immune, and it may be about to get worse there, too. Belgium and the Czech Republic had infection rates this week worse than any of our states besides North and South Dakota. Ignoring a couple of tiny countries, Netherlands and France are the 3rd and 4th most noteworthy this week, with average infection rates as bad as our 6th-to-10th worst states.
Taken as a whole, they’re in for a rough winter. As are we all.

There are those who say they aren’t worried about the spread because the death counts haven’t been rising with the illness rate. There are actually 3 reasons why this is misinformed optimism.

The first reason that death rates haven’t risen in the last few weeks is that a tremendous number of new cases are among young people. Median age of those diagnosed dropped from 46 at the beginning of May to 36 at the end of August, and continues to go down.

Younger victims don’t die as often. They’re the Typhoid Marys and the plague rats, and many of them believe themselves to be invulnerable, but even the asymptomatic ones can bring it home and spread it to family and friends. It’s no coincidence that the infection rate started going up shortly after kids started going back to school.

The second reason that death rates haven’t risen much is timing. In each of the first 2 waves, death rates trailed behind infection rates by 2-3 weeks. So we’re seeing the beginning now. More is on the way.

The 3rd reason is the concerted effort by our current government not to call anything a covid death if there is any excuse not to. As reported here in August, and extended here more recently, the data is clear: we’re under-reporting on purpose.

As always, more data sources are out there: here for example, and here. As always, keep your distance, wear a mask, wash your hands. Don’t be a plague rat.

Covid: how not to lie about it

As most people know, surgeons wear masks. They’ve been doing it since around the American Civil War. It is well-established that gowns, gloves, masks, and rigidly followed sterile procedures save lives of patients, because not spreading the bacteria and viruses from surgeons and staff is critical to prevent infections. As a matter of logic, someone making a claim that overturns this kind of evidence-backed well-tested standard had better bring data. When an idea is this well-tested, the burden of proof is on them.

This week, the anti-mask crowd has been circulating a new article from Denis G. Rancourt. Just like during the 1918 influenza pandemic, people are saying that wearing a mask, one of our most widely needed steps to stop the virus, is a waste of time, and they’re totally wrong. Much as it pains me, and you’ll probably see why within a few paragraphs, I’m going to go through this entire article to talk about how wrong it is.

I’m not going to discuss Rancourt’s qualifications as a former physics professor, or his biases implied by his affiliation with political groups. You can find that elsewhere. I’m going to evaluate the math and science of his paper on its own.

My most charitable interpretation of Denis G. Rancourt’s article is that he doesn’t understand the difference between “this study does not conclusively prove something” and “this study proves that this thing is false.” I encourage you to read as far as his first citation. In this “study” there were 32 people split into 2 groups, and over the course of less than 3 months, one from each group got sick. Proper conclusion: this study tells us nothing, other than the obvious conclusion that 32 people over 77 days is not enough to tell you anything about people getting sick. Rancourt claims that the main point is, “Face mask use in HCW was not demonstrated to provide benefit in terms of cold symptoms or getting colds.” I agree that it was “not demonstrated” but that’s hardly a surprise.

Before going any further, I’d like to point out another way that Rancourt is missing the point in reference to his first citation. Even if masks do little to protect the wearer, their main function is to protect everybody else. You can find citations for this anywhere: “You can’t look in a crowd and say, oh, that person should wear mask. There’s a lot of asymptomatic infection, so everybody has to wear a mask.” Either (1) the author doesn’t understand that, in which case it will pain me to read the rest of his citations because they aren’t going to be helpful, or (2) he’s being stubbornly and willfully ignorant, which is worse, or (3) he is just trying to make a case for something he feels inconvenienced by, and he truly doesn’t care that wearing masks will help someone besides himself, which is the worst interpretation of all.

I looked at Rancourt’s second citation, and saw the opposite conclusion to the one he is trying to spread. In their words: “There is some evidence to support the wearing of masks or respirators during illness to protect others, and public health emphasis on mask wearing during illness may help to reduce influenza virus transmission.” In his words, “None of the studies reviewed showed a benefit from wearing a mask.” Is this interpretation a lie or incompetence? At this point, I am inclined to give him the benefit of the doubt and say he’s just not as smart as he thinks he is.

Third citation: Rancourt writes “None of the studies established a conclusive relationship between mask/respirator use and protection against influenza infection.” However, from the paper’s summary: “There is some evidence to support the wearing of masks or respirators during illness to protect others, and public health emphasis on mask wearing during illness may help to reduce influenza virus transmission. There are fewer data to support the use of masks or respirators to prevent becoming infected.”

Again, there are a few ways to interpret Rancourt’s “none of the studies established a conclusive relationship.” Scientific papers rarely “establish a conclusive relationship” which, as a former physics professor, he should know. Anything short of a Nobel prize-winning breakthrough gets the “there is some evidence to support” treatment, as this paper did. As above, he makes a claim opposite to that of the authors. As above, he tries to claim that his own lack of “conclusive relationship” is proof of its opposite.

At this point, I’m losing faith in my most charitable interpretation of the author’s condition, that he’s just ignorant or uninformed. Leaning towards “he’s a plague rat”.

I’m going to read them all, however. I don’t want to, but someone has to.

Fourth citation: Rancourt writes, “we found no significant difference between N95 respirators and surgical masks in associated risk… ” This is in fact one of the authors’ conclusions, but this has nothing to do with WEARING vs. NOT WEARING a mask.

The authors also concluded:

“Transmission of acute respiratory infections occurs primarily by contact and droplet routes, and accordingly, the use of a surgical mask, eye protection, gown and gloves should be considered appropriate personal protective equipment when providing routine care for a patient with a transmissible acute respiratory infection.”

In other words, WEAR A MASK!

So far, we’re zero out of four for his own citations to support his own recommendations. Onward!

Fifth citation: From Rancourt, “Evidence of a protective effect of masks or respirators against verified respiratory infection (VRI) was not statistically significant”

From the authors:

Meta-analysis of randomized controlled trials (RCTs) indicated a protective effect of masks and respirators against clinical respiratory illness (CRI) (risk ratio [RR] = 0.59; 95% confidence interval [CI]:0.46–0.77) and influenza-like illness (ILI) (RR = 0.34; 95% CI:0.14–0.82). Compared to masks, N95 respirators conferred superior protection against CRI (RR = 0.47; 95% CI: 0.36–0.62) and laboratory-confirmed bacterial (RR = 0.46; 95% CI: 0.34–0.62), but not viral infections or ILI. …

This systematic review and meta-analysis supports the use of respiratory protection. <This emphasis is mine>

That makes us 5 for 5 on Rancourt saying the opposite of what the authors conclude. Two more to go. Ugh. No… there’s also 23 end notes. I may have to review those another day, or just let them drop. I think 5 out of 5 is enough to establish a pattern of behavior… which in this case resembles that of a plague rat.

Sixth citation, and seventh (and final) citation: Again, these articles have nothing to do with whether a mask is useful. They compare the effectiveness of two kinds of masks. I’m done.

As I stated at the beginning, extraordinary claims require extraordinary evidence. This paper has none. Wear a mask, you plague rats!