
Comments

Andrew

Andrew from the Chitika Insights research team here. Within the graph and text of every report, we try to make our indices as clear to readers as possible so that they understand exactly what is being shown, along with what is not. That said, there are always ways to improve, and admittedly, Daniel and you brought up some excellent points on how to better visualize these hour-by-hour data sets. In the future, we will be doing the following:

- We will use hourly raw volume percentages observed over a time period, dividing each hour's volume by the total volume. For example, if we observed roughly 10,000,000 Android impressions for the entire sample, and about 500,000 came at the 0 hour, then the 0 hour for Android would equal 5%. This method keeps the peak hour visible, but in the context of total daily usage.

- To visualize comparative volume, we will use an area chart to show what percentage of total relative traffic each data point represents. Again using the recent data set as an example, if Android makes up 32% of relative traffic (i.e., combined iOS and Android) at the 0 hour, the area chart will show Android occupying 32% of that x-axis point, with iOS taking up the remaining 68%.
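For illustration, the two calculations described above can be sketched in a few lines of Python. All figures below are the hypothetical example numbers from this comment (the iOS count at hour 0 is an assumed value chosen so that Android comes out to the 32% share mentioned):

```python
# Example figures from the comment above (hypothetical, for illustration only).
android_hour0 = 500_000        # Android impressions observed at the 0 hour
android_total = 10_000_000     # total Android impressions in the sample
ios_hour0 = 1_062_500          # assumed iOS impressions at the 0 hour

# 1) Hourly raw volume percentage: each hour's volume divided by total volume.
#    Keeps the peak hour visible in the context of total daily usage.
hourly_share = android_hour0 / android_total * 100
print(hourly_share)            # 5.0 (percent of the day's Android traffic)

# 2) Relative traffic share at a given hour, as plotted in the area chart:
#    Android's impressions over the combined iOS + Android impressions.
android_relative = android_hour0 / (android_hour0 + ios_hour0) * 100
print(android_relative)        # 32.0 (percent; iOS takes the remaining 68.0)
```

Computing both series per hour, rather than plotting raw counts, is what lets the stacked area chart compare the two platforms on a common scale.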

Just to address some other points in your posts: the data used in our Insights studies are not collected for debugging purposes; rather, they are part of the basic information included in a user agent string, which we catalog whenever a user loads a webpage containing our ad code, whether or not an ad actually appears. We can only observe activity occurring within our network of around 350,000 websites, and we are only one data source out of many.

Regarding controls, while we can't say with absolute certainty that these usage patterns are representative of a typical day, the 14-day sample we averaged into one 24-hour period showed consistent usage patterns across the data set. To clarify, these graphs show the usage patterns of the entire aggregate user base of each OS, rather than of any given user or user type. Additionally, our back-end statistical processes account for "invalid traffic" from bots and similar sources, and we disregard it within our sample.

Again, thank you for the feedback. Our aim is always to make these aggregate usage trends as valuable and clear as possible, and these changes will help us better deliver on that moving forward.

Kaiser

Hi Andrew, thanks for your note! My top advice is (a) be clear about what the interesting research question is and (b) be clear about what the model of the relationship between OS/devices/etc. and diurnal usage pattern is.

Regarding "invalid" traffic, how do you know your processes removed all of it? This isn't a problem specific to your situation but an industry-wide issue. I don't think any of us can even estimate what proportion of such traffic is caught by these processes. In fact, I keep running into suspicious traffic every time I look at web data.

Great to hear you're making improvements.

