
Comments


Sherman Dorn

Why is it a test failure when you discover something? I would think that "hey, we don't have to worry about stupid color choices!" is a pretty good finding.

Tom West

@Sherman Dorn: You're right, but management may not see it that way.

The random number thing is generally less important than you might think. If the server looks at the time and serves option A when the seconds value is odd and option B when it's even, it's highly unlikely you'll get a systematic bias.
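The clock-parity scheme described above can be sketched in a few lines (a minimal illustration, assuming a Python server; the function name is hypothetical):

```python
from datetime import datetime, timezone

def assign_variant(now=None):
    """Serve A on odd seconds, B on even seconds of the server clock."""
    now = now or datetime.now(timezone.utc)
    return "A" if now.second % 2 == 1 else "B"
```

Because requests arrive at essentially arbitrary moments, the parity of the seconds field behaves like a coin flip, though a given user is not guaranteed the same variant on a repeat visit.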

Also, I always say that A/B testing lets you pick the perfect shade of blue - but what if the best option was actually red?

Kaiser

Tom: My point about random number generators is that there is no industry standard. I have come across a lot of fishy ones.

As for your method, I have always wanted to ask: where does the server get the time? Is there any chance that query fails?

As for your tongue-in-cheek complaint, why not test red against blue?
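One quick way to vet a suspect random splitter is to simulate many assignments and check that the split lands near 50/50 (the splitter shown is a hypothetical hash-seeded example, not any particular production generator):

```python
import random
from collections import Counter

def split(user_id, salt="exp1"):
    # Deterministic per-user coin flip, seeded from a salt + id string
    return "A" if random.Random(f"{salt}:{user_id}").random() < 0.5 else "B"

# Simulate 100,000 users; a fishy generator shows a share drifting from 0.5
counts = Counter(split(u) for u in range(100_000))
share_a = counts["A"] / 100_000
```

In practice you would also want to check balance within segments (browser, time of day, traffic source), since an aggregate 50/50 split can still hide a systematic skew.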

Sjors Peerdeman

I'm wondering why you didn't discuss existing A/B-testing tools, such as Visual Website Optimizer and Optimizely? I'm interested in the pros and cons of these tools from a data scientist's (?) standpoint.

Tom West

@Kaiser: For sure, if your server time fails, then the test doesn't work. But that's the same as saying if your random number generator fails, then the test doesn't work.

Yes, you could test red against blue... but my point was that A/B testing is often used to zoom in on a specific solution within narrow constraints - but the best answer may be outside those constraints.

Kaiser

Sjors: I am a happy user of Optimizely. We use their SaaS platform and also have a homegrown solution similar to PlanOut. The Facebook solution is written for developers, while the Optimizely-style solution is created for business people. A SaaS solution has limitations on what tests can be run.

The issues I outlined above are not solved by having tools. In fact, I encountered some of them in tests which used tools. What you need are brains - what I call numbersense. The Facebook solution provides structure and ingredients that the analyst finds helpful in diagnosing problems. I'm not saying that deploying PlanOut will magically prevent those problems.
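For context, frameworks in the PlanOut style derive each assignment deterministically from a hash of the experiment name and the unit (user) id, which makes assignments reproducible and auditable. A minimal sketch of that idea (not PlanOut's actual code; all names are illustrative):

```python
import hashlib

def bucket(experiment, unit_id, variants=("A", "B")):
    # Hash the experiment name + unit id; the same inputs
    # always map to the same variant, with no state to store.
    digest = hashlib.sha1(f"{experiment}.{unit_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

The deterministic mapping is what lets an analyst re-derive any user's assignment after the fact when diagnosing a problem.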

Tom: On testing large versus small changes, I think the more interesting debate is whether one should test a complete redesign in which dozens of changes are introduced all at once against the existing design.


Marketing and advertising analytics expert. Author and Speaker. Currently at Vimeo and NYU. See my full bio.
