


Andrew from the Chitika Insights research team here. In the graph and text of every report, we try to make our indices as clear as possible so that readers understand exactly what is being shown and what is not. That said, there are always ways to improve, and you and Daniel raised some excellent points on how to better visualize these hour-by-hour data sets. In future reports, we will make the following changes:

- We will use hourly raw volume percentages observed over a time period, that is, each hour's volume divided by total volume. For example, if we observed roughly 10,000,000 Android impressions for the entire sample, and about 500,000 came at the 0 hour, then the 0 hour for Android would equal 5%. This method keeps the peak hour visible, but in the context of total daily usage.

- To visualize comparative volume, we will use an area chart to show what percentage of total relative traffic each data point represents. Again using the recent data set as an example, if Android makes up 32% of relative traffic (i.e., combined iOS and Android) at the 0 hour, the area chart will show Android taking up 32% of that x-axis point, while iOS takes up the remaining 68%.
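As a rough illustration of the two calculations described above, here is a minimal Python sketch. The function names and the hourly counts are hypothetical, chosen only so that the 0 hour reproduces the 5% and 32% figures in the examples; they are not Chitika's actual data or code.

```python
def hourly_share(hourly_volume):
    """Return each hour's percentage of the total volume for one OS."""
    total = sum(hourly_volume)
    if total == 0:
        raise ValueError("total volume is zero")
    return [100.0 * v / total for v in hourly_volume]


def relative_share(android, ios):
    """Return Android's percentage of combined Android + iOS traffic, hour by hour."""
    shares = []
    for a, i in zip(android, ios):
        combined = a + i
        shares.append(100.0 * a / combined if combined else 0.0)
    return shares


# Hypothetical hourly impression counts (24 values each), made up for illustration.
# Android sums to 10,000,000 with 500,000 at the 0 hour.
android = [500_000] + [413_000] * 22 + [414_000]
# iOS 0-hour value chosen so that Android is 32% of combined traffic at that hour.
ios = [1_062_500] + [800_000] * 23

android_pct = hourly_share(android)
android_rel = relative_share(android, ios)

print(f"Android 0 hour, share of its own daily volume: {android_pct[0]:.1f}%")  # 5.0%
print(f"Android 0 hour, share of combined traffic: {android_rel[0]:.1f}%")      # 32.0%
print(f"iOS 0 hour, share of combined traffic: {100 - android_rel[0]:.1f}%")    # 68.0%
```

The first function normalizes within one OS, so its 24 values sum to 100%. The second normalizes across the two platforms at each hour, which is what the area chart would plot.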

To address some other points in your posts: the data used in our Insights studies are not collected for debugging purposes; rather, they are part of the basic information included in a user agent, which we catalog whenever a user loads a webpage containing our ad code, whether or not an ad actually appears. Indeed, we can only observe activity within our network of around 350,000 websites, and we are only one data source among many. Regarding controls, while we can't say with absolute certainty that these usage patterns are representative of a typical day, the 14-day sample, which we averaged into a single 24-hour period, generally showed consistent usage patterns across the data set. To clarify, these graphs show the usage patterns of the entire aggregate user base of each OS, rather than those of any given user or user type. Additionally, our back-end statistical processes do account for any "invalid traffic" from bots or similar sources, and we disregard it within our sample.
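One plausible reading of averaging a 14-day sample into a single 24-hour period is a per-hour mean across days. The sketch below assumes that reading; the function name and the counts are hypothetical, and the actual procedure may differ.

```python
def average_day(daily_counts):
    """Average several days of hourly counts into one 24-hour profile.

    daily_counts is a list of days, each a list of 24 hourly counts.
    """
    if not daily_counts:
        raise ValueError("need at least one day of data")
    if any(len(day) != 24 for day in daily_counts):
        raise ValueError("each day must have exactly 24 hourly counts")
    n_days = len(daily_counts)
    return [sum(day[h] for day in daily_counts) / n_days for h in range(24)]


# Hypothetical 14-day sample: the count at hour h on day d is 100 + d + h.
sample = [[100 + d + h for h in range(24)] for d in range(14)]
profile = average_day(sample)

print(len(profile))  # 24
print(profile[0])    # 106.5, the mean of 100..113
```

Comparing each individual day against this averaged profile is one way to check how consistent the pattern is across the sample.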

Again, thank you for the feedback. Our aim is always to make these aggregate usage trends as valuable and clear as possible, and these changes will help us better deliver on that moving forward.


Hi Andrew, thanks for your note! My top advice is (a) be clear about what the interesting research question is and (b) be clear about your model of the relationship between OS, device, etc. and the diurnal usage pattern.

Regarding "invalid" traffic, how do you know your processes removed all of it? This isn't a problem specific to your situation; it is an industry-wide issue. I don't think any of us can even estimate what proportion of such traffic is caught by these processes. In fact, I keep running into suspicious traffic every time I look at web data.

Great to hear you're making improvements.

Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.