Via Twitter, Andrew B. (link) asked if I could comment on the following chart, published by PC Magazine as part of their ISP study. (link)
This chart is decent, although it can certainly be improved. Here is a better version:
A couple of little things are worth pointing out. The choice of red and green to indicate down and up speed respectively is baffling. Red and green are loaded colors, which I usually avoid. A red dot unfortunately signifies STOP, but ISP users would definitely not want to stop on their broadband superhighway!
In terms of plot symbols, up and down arrows are natural for this data.
Using the Trifecta checkup (link), I am most concerned about the D(ata) corner.
The first sign of trouble is the arbitrary construction of an "Index". This index isn't really an index because there is no reference level. The so-called index is really a weighted average of the download and upload speeds, with 80% weight given to the former. In effect, the download speeds are weighted even more heavily than that, because in their original units, download speeds are multiples of the upload speeds.
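To make the weighting point concrete, here is a minimal sketch of the arithmetic. The 80/20 weights come from the study as described above; the speeds (50 Mbps down, 10 Mbps up) and the function names are made up purely for illustration.

```python
def speed_index(down_mbps, up_mbps, w_down=0.8):
    """Weighted average of download and upload speed (80% weight on download)."""
    return w_down * down_mbps + (1 - w_down) * up_mbps


def download_share(down_mbps, up_mbps, w_down=0.8):
    """Fraction of the index value that is contributed by the download term."""
    return w_down * down_mbps / speed_index(down_mbps, up_mbps, w_down)


# Hypothetical provider: download speed is five times the upload speed.
down, up = 50.0, 10.0
print(speed_index(down, up))     # the "index" for this made-up provider
print(download_share(down, up))  # effective share of the index due to download
```

With these made-up numbers, the download term accounts for roughly 95% of the index, not 80%. The larger the ratio of download to upload speed, the less the upload speed matters.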
Besides, putting these ISPs side by side gives an impression that they are comparable things. But direct comparison here is an invitation to trouble. For example, Verizon is represented only by its FIOS division (fiber optics). We have Comcast and Cox which are cable providers. The geographical footprints of these providers are also different.
This is not a trivial matter. Midcontinent operates primarily in North and South Dakota. Some other provider may do better than Midcontinent on average but within those two states, the other provider may perform much worse.
Note that the data came from the Speedtest website (over 150,000 speed tests). In my OCCAM framework (link), this dataset is Observational, without Controls, seemingly Complete, and Adapted (from speed testing for technical support).
Here is the author's disclosure, which should cause concern:
We require at least 50 tests from unique IP addresses for any vendor to receive inclusion. That's why, despite a couple of years of operation, we still don't have information on Google Fiber (to name one such vendor). It simply doesn't have enough users who took our test in the past year.
So, the selection of providers is based on the frequency of Speedtest queries. Is that really a good way to select samples? The author presents one possible explanation for why Google Fiber is absent - that it has too few users (without any evidence). In general, there are many reasons for such an absence. One might be that a provider is so good that few customers complain about speeds and therefore they don't do speed tests. Another might be that a provider has a homegrown tool for measuring speeds. Or any number of other reasons. These reasons create biases in various directions, which makes the analysis confusing.
Think about your own behavior. When was the last time you did a speed test? Did you use Speedtest.com? How did you hear about them? For me, I was pointed to the site by the tech support person at my ISP. Of course, the reason why I called them was that I was experiencing speed issues with my connection.
Given the above, do you think the set of speed measurements used in this study gives us accurate estimates of the speeds delivered by ISPs?
While the research question is well worth answering, and the visual form is passable, it is hard to take the chart seriously because of how this data was collected.