


"the peak on the outer line of 2006, for instance, is number 41 (green color) & has a value of 18%, which is the frequency of appearing as winner in that year."

This is an opportunity for me to complain again about the mania for turning everything into a percentage. Why not simply say that in 52 weeks of the two lotteries, the number 41 appeared on 19 occasions? Presenting the results as integers would also show how heavily quantised the results are, which isn't apparent from the percentages alone (eventually the viewer should notice that the results are 17.31% or 18.27%, never anything in between).
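To make the quantisation concrete: with a fixed number of draws n, a number's frequency can only be k/n, so the percentages move in steps of 100/n. Here is a minimal Python sketch. It assumes 104 draws a year (two a week), as this paragraph does, and `as_percent` is a hypothetical helper name:

```python
# With a fixed number of draws, a frequency can only be k / n, so the
# percentages are quantised in steps of 100 / n.
DRAWS_PER_YEAR = 104  # assumption: two draws a week for 52 weeks


def as_percent(count, draws=DRAWS_PER_YEAR):
    """Frequency of `count` appearances as a percentage, rounded to 2 d.p."""
    return round(100 * count / draws, 2)


print(f"step between adjacent values: {round(100 / DRAWS_PER_YEAR, 2)}%")
for k in (18, 19):
    print(f"{k} of {DRAWS_PER_YEAR} draws -> {as_percent(k)}%")
```

The two counts land on 17.31% and 18.27%. No count of draws falls between them, which the rounded "18%" hides.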

Recently I saw a graph of the percentage of Democratic seats in Congress over a century or so. Okay, plotting percentages made the majority line a simple horizontal line at 50%, but it wouldn't have been beyond the wit of the graph maker to draw the majority line as a step curve rising over the decades. That would have shown the actual number of seats, the actual seat majorities, and the changing size of the House, all in one convenient graph.
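The step-curve majority line only needs the House size for each period, because a majority is one seat more than half. Below is a small Python sketch. `majority_threshold` is a hypothetical helper, and the sizes in the loop are example values only. Check them against the real House size for each Congress before plotting.

```python
def majority_threshold(house_size: int) -> int:
    """Smallest seat count that is strictly more than half the House."""
    return house_size // 2 + 1


# Example House sizes for illustration only; verify against the actual
# apportionment for each Congress before drawing the step curve.
for size in (391, 435, 437):
    seats = majority_threshold(size)
    print(f"{size} seats: majority = {seats} ({100 * seats / size:.2f}%)")
```

Plotted in seats, the majority line steps whenever the House size changes. As a percentage, it sits slightly above 50% rather than exactly on it.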


I just happened upon this post in my RSS reader. Coincidentally, I had been examining this same data when a user posted it on swivel.com.

Here's a scatter plot of the lottery numbers in question. It seems a simple visual way to show that complex-looking data is random.


Ah, truly a graph that fits this site's motto: "Recycling chartjunk as junk art."

I'm particularly fond of the lines connecting 2006 and 1988, as if there were some sort of great Mandala (lottery wheel?) at work here.


Thanks to huned for the data. I have to take back my complaint about normalising all the years. I had assumed there were two lotteries a week, i.e. 104 draws a year, but it seems there were 123-124 a year, and substantially fewer than that in some years. I don't know what schedule would produce that number of draws a year, but if it varies, then of course it's not wrong to use a percentage.
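Dividing by each year's own draw count, rather than a fixed 104, is what makes the yearly percentages comparable. Here is a minimal sketch. It assumes the data can be read as one `(year, winning_numbers)` pair per draw, and the function name and sample records are made up for illustration:

```python
from collections import Counter


def yearly_frequency(draws, number):
    """Percentage of each year's draws in which `number` was a winner.

    `draws` is an iterable of (year, winning_numbers) pairs, one per draw
    (a hypothetical layout; adapt it to however the data is stored).
    """
    draws_per_year = Counter()
    hits_per_year = Counter()
    for year, numbers in draws:
        draws_per_year[year] += 1
        if number in numbers:
            hits_per_year[year] += 1
    # Normalise by that year's own number of draws, not a fixed 104.
    return {
        year: 100 * hits_per_year[year] / total
        for year, total in sorted(draws_per_year.items())
    }


# Made-up sample: 2 draws in 2005, 4 in 2006.
sample = [
    (2005, {41, 7, 12}), (2005, {3, 5, 8}),
    (2006, {41, 2, 9}), (2006, {1, 4, 6}),
    (2006, {10, 11, 13}), (2006, {14, 15, 16}),
]
print(yearly_frequency(sample, 41))  # {2005: 50.0, 2006: 25.0}
```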

Xan Gregg

It's suspicious that the day value of each date in the Swivel data (is there another source?) never exceeds 12. I wonder if m/d/y got mixed up with d/m/y somewhere.
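A quick way to check for this is to look at the largest value each date field ever takes. Here is a sketch. It assumes the dates arrive as "a/b/yyyy" strings, and the helper names and sample dates are hypothetical:

```python
def field_maxima(date_strings):
    """Largest value seen in the first and second fields of 'a/b/yyyy' dates."""
    first_max = second_max = 0
    for s in date_strings:
        first, second, _year = (int(part) for part in s.split("/"))
        first_max = max(first_max, first)
        second_max = max(second_max, second)
    return first_max, second_max


def order_is_ambiguous(date_strings):
    """True if neither field ever exceeds 12, so m/d/y and d/m/y both parse.

    Over several years of draws a genuine day field should go well past 12,
    so getting True on a large dataset is a warning sign.
    """
    first_max, second_max = field_maxima(date_strings)
    return first_max <= 12 and second_max <= 12


print(order_is_ambiguous(["1/5/2006", "12/3/2006", "7/12/2006"]))  # True
print(order_is_ambiguous(["1/5/2006", "12/30/2006"]))              # False
```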

Tabulating the day of the week under each interpretation yields two plausible results. With the dates as provided, each day of the week is equally likely. Reversing the month and day fields shows Sunday (5%) to be much less common than the other days, which are almost, but not quite, equal.
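Tallying the day of the week under both readings needs only the standard library. Here is a sketch, again assuming "a/b/yyyy" strings. `weekday_shares` and its `day_first` flag are hypothetical names. The weekday names come from `strftime`, so they follow the current locale, which is English unless it has been changed.

```python
from collections import Counter
from datetime import date


def weekday_shares(date_strings, day_first=False):
    """Percentage of draws falling on each weekday.

    With day_first=False the strings are read as m/d/y; with True, as d/m/y.
    """
    counts = Counter()
    for s in date_strings:
        a, b, year = (int(part) for part in s.split("/"))
        month, day = (b, a) if day_first else (a, b)
        counts[date(year, month, day).strftime("%A")] += 1
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.most_common()}


dates = ["1/5/2006", "3/4/2006", "12/3/2006"]
print(weekday_shares(dates))                  # read as m/d/y
print(weekday_shares(dates, day_first=True))  # read as d/m/y
```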

Looking at the day of the week *by year* under the reversed month/day interpretation (http://www.forthgo.com/blog/wp-content/uploads/2007/01/lottobyday.PNG) shows a believable pattern: Sunday lottos stopped in 1993, while Friday lottos started and Saturday lottos picked up. The distributions before and after those transitions are steady.
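A by-year breakdown like the one in that chart is a two-way tally of year against weekday. Here is a minimal sketch. It works on already-parsed `datetime.date` values, so the month/day decision is assumed to have been made upstream. The function names are hypothetical.

```python
import calendar
from collections import Counter, defaultdict
from datetime import date


def weekday_by_year(dates):
    """Map each year to a Counter of abbreviated weekday names ('Mon', ...)."""
    table = defaultdict(Counter)
    for d in dates:
        table[d.year][d.strftime("%a")] += 1
    return dict(table)


def print_table(table):
    """Print one row per year with one column per weekday, Monday first."""
    days = list(calendar.day_abbr)
    print("year " + " ".join(f"{d:>4}" for d in days))
    for year in sorted(table):
        print(f"{year} " + " ".join(f"{table[year][d]:>4}" for d in days))


print_table(weekday_by_year([date(1992, 1, 5), date(1992, 1, 8), date(1994, 1, 7)]))
```

A column that drops to zero partway down the table, such as Sunday in the pattern described above, is the kind of transition this view makes easy to spot.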

Either way, it seems there is quite a big chunk of data missing, which is not readily noticed because of the large volume of data present.
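One way to locate the missing stretches is to sort the draw dates and flag any gap longer than the usual spacing. Here is a sketch. The seven-day threshold assumes at least one draw a week. It is an illustrative choice, not something established from the data.

```python
from datetime import date


def find_gaps(dates, max_gap_days=7):
    """Return (before, after) pairs of consecutive draw dates more than
    max_gap_days apart; each pair brackets a stretch with no recorded draws."""
    ordered = sorted(set(dates))
    return [
        (before, after)
        for before, after in zip(ordered, ordered[1:])
        if (after - before).days > max_gap_days
    ]


draws = [date(2006, 1, 4), date(2006, 1, 7), date(2006, 2, 1), date(2006, 2, 4)]
for before, after in find_gaps(draws):
    print(f"no draws between {before} and {after}")
# prints: no draws between 2006-01-07 and 2006-02-01
```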



Dolores Sosa

Thanks for sharing this data. I’ve been searching for it lately for my review.



Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.
Graphics design by Amanda Lee

The Read

Keep in Touch

follow me on Twitter