
Sherman Dorn

Why is it a test failure when you discover something? I would think that "hey, we don't have to worry about stupid color choices!" is a pretty good finding.

Tom West

@Sherman Dorn: You're right, but management may not see it that way.

The random number thing is generally less important than you might think. If the server looks at the time and serves option A when the seconds value is odd and B when it's even, it's highly unlikely you'll get a systematic bias.
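A minimal sketch of the parity-based assignment described here, assuming the server can read its own clock (the function name is illustrative, not from any actual system):

```python
from datetime import datetime, timezone

def assign_variant(now=None):
    # Bucket a visitor by the parity of the current second:
    # odd second -> option A, even second -> option B.
    now = now or datetime.now(timezone.utc)
    return "A" if now.second % 2 == 1 else "B"
```

Since arrival times are effectively independent of the treatment, the split should hover near 50/50 over any reasonable window.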

Also, I always say that A/B testing lets you pick the perfect shade of blue - but what if the best option was actually red?


Kaiser

Tom: My point about random number generators is that there is no industry standard, and I have come across a lot of fishy ones.
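One cheap sanity check on a suspect generator is to look at the split an assignment rule actually produces over many simulated visitors. A hypothetical sketch, using Python's standard-library generator:

```python
import random
from collections import Counter

def split_counts(rng, n=100_000):
    # Simulate n visitors through a 50/50 assignment rule and
    # count how many land in each arm.
    return Counter("A" if rng.random() < 0.5 else "B" for _ in range(n))

counts = split_counts(random.Random(42))
# With a sound generator, the gap between the two arms should be on
# the order of sqrt(n) -- a few hundred here, not thousands.
```

A gap far beyond sqrt(n) is the kind of fishiness worth investigating before trusting any test result.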

As for your method, I have always wanted to ask: where does the server get the time? Is there any chance that query fails?

As for your tongue-in-cheek complaint, why not test red against blue?

Sjors Peerdeman

I'm wondering why you didn't discuss existing A/B-testing tools, such as Visual Website Optimizer and Optimizely? I'm interested in the pros and cons of these tools from a data scientist's (?) standpoint.

Tom West

@Kaiser: For sure, if your server time fails, then the test doesn't work. But that's the same as saying if your random number generator fails, then the test doesn't work.

Yes, you could test red against blue... but my point was that A/B testing is often used to zoom in on a solution within narrow constraints, while the best answer may lie outside those constraints.


Kaiser

Sjors: I am a happy user of Optimizely. We use their SaaS platform and also have a homegrown solution similar to PlanOut. The Facebook solution is written for developers, while the Optimizely-style solution is built for business people. A SaaS solution has limitations on what tests can be run.

The issues I outlined above are not solved by having tools; in fact, I encountered some of them in tests that used tools. What you need are brains - what I call numbersense. The Facebook solution provides structure and ingredients that an analyst will find helpful in diagnosing problems, but I'm not saying that deploying PlanOut will magically prevent those problems.

Tom: On testing large versus small changes, I think the more interesting debate is whether one should test a complete redesign in which dozens of changes are introduced all at once against the existing design.


Kaiser Fung. Business analytics and data visualization expert. Author and Speaker.