


Variability in the numbers will be low because of the high counts, so it wouldn't have needed much smoothing, if any. Even in the tails the counts are probably about 400, so the 95% CIs are about ±10%, hardly noticeable, and the graphic artist probably just smoothed it out.
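As a rough check on that 10% figure, here is a minimal sketch that treats each bin count as Poisson, so its standard deviation is the square root of the count. The Poisson assumption and the function name are mine, not something stated in the article.

```python
import math

def poisson_relative_ci(count, z=1.96):
    """Approximate 95% half-width for a Poisson count, as a fraction of the count.

    Assumes the count is Poisson, so its standard deviation is sqrt(count).
    """
    return z * math.sqrt(count) / count

# A tail bin with roughly 400 cards, the figure guessed at above.
print(f"{poisson_relative_ci(400):.1%}")
```

For a count of 400 the standard deviation is 20, so the half-width comes out just under 10% of the count.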

Unless there is another option, many people may feel it is worthwhile to buy a ticket for fewer than 40 trips simply to be able to buy only one ticket. I've done a similar thing with day tickets in non-English-speaking countries to avoid the problems of not understanding the currency. Others may have a ticket supplied by an employer, and there are always sudden holidays and illnesses that prevent fully using a ticket.


It looks to me like an exponential drop off to the left side of the break-even 40-trip mark. Then the right hand side looks like it could be fit by a piece of a Gaussian.

To me, this says that people are pretty good at deciding whether or not to buy the 30-day card. That said, I think some credit should go to NYC for selling the right product (30 days = about 20 work days = 40 one-way trips to and from work) at a price that is easily computed in your head ($81/$2 = about 40).
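The head arithmetic can be spelled out in a few lines. This is only a sketch: it assumes a flat $2 fare with no pay-per-ride bonus, and the variable names are made up for illustration.

```python
import math

card_price = 81.0  # 30-day card price quoted above, in dollars
fare = 2.0         # assumed flat single-ride fare, in dollars

# Number of rides at which pay-per-ride spending equals the card price.
break_even = card_price / fare

# First whole number of rides at which the card is strictly cheaper.
first_trip_card_wins = math.floor(break_even) + 1

print(break_even, first_trip_card_wins)
```

Under those assumptions, 40 rides cost $80 at the single fare, still under the card price, and the card only comes out ahead from the 41st ride.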


In addition to the 40-trip mark noted on the graph, a better x-axis, and a right-hand vertical axis which has been normalized, I'd like to know the total number of riders below the 40-trip mark and the total above.

As for the technical note, I think the curve is real data. There are about a million data points in the graph, so I can believe that it ends up being fairly smooth.


looking at the long smooth stretches interspersed with tiny jagged sections, the distribution of these jagged sections, combined with the smoothness of the tails, it looks like something someone drew in photoshop...but i guess i could be wrong.

it makes its point, though.


It should have been a cumulative curve of percentage against trips. The median, quartiles, and percentiles could be read off by anyone who cared to, or alternatively the percentage of users who took up to 40, up to 100 trips etc. By reading the percentage scale backwards from 100% the reader could say who took more than 40, more than 100 etc. Who's really interested in who took exactly 40 trips, no more and no less, or exactly 100?

The mean of 56 trips could be marked, as it would not be trivial to read it by inspection alone, and the total number of users represented by the "100%" would be given in nearby explanatory text. Multiplying the read-off percentages by the total number would give anyone inspecting the graph the absolute number of users taking more than 40 trips and so on.
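The read-offs described above are easy to sketch. The trip counts below are made up purely for illustration (the real data behind the chart is not available here), and the function names are hypothetical.

```python
from statistics import median

# Hypothetical trips-per-card values, invented for illustration only.
trips = [10, 20, 40, 40, 60, 100]

def share_at_most(data, k):
    """Fraction of cards with at most k trips (the cumulative curve at k)."""
    return sum(1 for t in data if t <= k) / len(data)

def share_more_than(data, k):
    """Fraction of cards with more than k trips (the scale read back from 100%)."""
    return 1.0 - share_at_most(data, k)

total_cards = len(trips)

print(share_at_most(trips, 40))                          # share taking up to 40 trips
print(round(share_more_than(trips, 40) * total_cards))   # absolute number above 40
print(median(trips))                                     # median, the 50% point
```

Multiplying a share by the total number of cards gives the absolute count, which is the step the explanatory text next to the chart would let a reader do by hand.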


Oops, I just wrote exactly what Kaiser already did in the article! So much for my reading comprehension.


It looks a bit like an extreme-value distribution (EVD), which is a distribution that looks like a Normal, but with an asymmetrically "fat" tail on one side or the other. Given what the numbers represent (the total/max number of trips taken on a given card), an EVD might make perfect sense.


Lognormal curve...logarithms of data are normally distributed...

It's the usual counterpart of the normal distribution when the range runs from zero to infinity, instead of from negative to positive infinity.

Tony K

We can all speculate on what the distribution of the data is relative to standard algorithmic distributions. But the only thing we will agree on without data is that it is not a normal distribution.

In any case, I agree with the folks who say that a cumulative distribution would be much more useful in this case.



Folks, cut the title some slack. It says "bell curve", not "normal distribution", and I think that term is perfectly appropriate for informal usage. (I mean, it does look like a bell!) The difference between a true normal distribution and this curve is mildly interesting, but does not affect the point of the story.

For the people advocating a cumulative distribution chart: I think you'd find that only about 1% of the readers of that article would understand that diagram. A cumulative distribution graph is highly unfamiliar to most people, and if you disagree, I challenge you to find even one example of such a chart in a mainstream press article.


That would be a very odd-sounding bell indeed. I think that it would have been more informative to at least include a line showing the median. On a more technical note, I would wager that this could actually be approximated well as a mixture of two or three Poisson distributions.


I agree with the EVD comment. It looks like a classic Fisher-Tippett distribution.


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.