The A-Z of A/B Testing
Not since Albert Einstein's 1904 decision never to drag a brush through his hair while he drew breath has a scientific idea caused such a brouhaha as A/B testing. Microsoft conducts over 10,000 A/B tests a year. Google does likewise. And, as we shall see, A/B testing played a big part in Obama's 2012 re-election campaign. But what is A/B testing, and why should we give a hoot about it?
Today, randomized, double-blind testing – the gold standard for scientific experiments – is widespread, and A/B testing is one example of it. A prototype of the method originated in the early nineteenth century.
The Mangled Legs Experiment: Trelawny et al. (1835)
Two poachers, identical twins Cuthbert and Toulouse Lafarge, set out after nightfall to steal the main ingredient for a rabbit stew from Squire Trelawny's private Connecticut estate. Before the raid, the village doctor had attested that the Lafarge brothers' lower limbs were decidedly leg-shaped.
However, after the brothers stumbled into a couple of cunningly positioned mantraps, each brother's right leg more closely resembled the muffler of a '69 Chevy that had recently passed through a wrecker's-yard crusher.
For a natural history buff like Squire Trelawny, this opportunity to advance science was too good to miss. And before you could say, "They're lower class, and therefore expendable," the estate gamekeeper trundled the brothers Lafarge to the Squire's mansion in a handcart. There, the Squire lashed Cuthbert to the kitchen table and Toulouse to a Welsh dresser in the dining room. So began one of the world's first controlled experiments.
For the Squire, this was an opportunity to test the efficacy of his pet theory — homoeopathy, a school of thought based on the deranged assumption that diluting an active agent increases its healing power, and the more you dilute it, the more exponentially powerful it becomes. We'll return to the Squire's shenanigans later. First, let us explore A/B testing in some detail.
In A/B testing, A is the 'control': the existing version of, say, a web page. B is a modified version of it. Visitors are randomly split between the two versions without knowing they are part of a test, and the version that performs better on your business metric(s) is the 'winner'. You then modify your system accordingly.
Google engineers ran their first A/B test in 2000 to determine the optimum number of results to display on their search engine results page. By 2011, eleven years after that first test, Google was running more than 7,000 A/B tests a year.
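The article doesn't say how a 'winner' is actually declared. A common approach (not necessarily the one Google or the Obama team used) is to split visitors randomly and then run a two-proportion z-test on the conversion rates. Here is a minimal sketch under those assumptions; the function names `assign_variant`, `ab_test` and `pick_winner`, and the conventional 5% significance threshold, are all hypothetical choices for illustration.

```python
import hashlib
import math


def assign_variant(user_id: str) -> str:
    """Hypothetical 50/50 split: hashing the ID means a returning visitor
    always sees the same version, while the split stays effectively random."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


def ab_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> dict:
    """Compare control A with variant B using a two-sided two-proportion z-test."""
    rate_a = conv_a / n_a  # conversion rate of the control
    rate_b = conv_b / n_b  # conversion rate of the variant
    # Pooled rate under the null hypothesis that A and B convert equally well
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value for a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        "relative_lift": (rate_b - rate_a) / rate_a,
        "z": z,
        "p_value": p_value,
    }


def pick_winner(result: dict, alpha: float = 0.05) -> str:
    """Declare a winner only if the difference is statistically significant.
    The 0.05 threshold is an assumed convention, not a universal rule."""
    if result["p_value"] >= alpha:
        return "no clear winner"
    return "B" if result["rate_b"] > result["rate_a"] else "A"


if __name__ == "__main__":
    # Made-up numbers: 200 of 10,000 visitors convert on A, 260 of 10,000 on B
    result = ab_test(200, 10_000, 260, 10_000)
    print(f"Lift: {result['relative_lift']:.0%}, p = {result['p_value']:.4f}")
    print("Winner:", pick_winner(result))
```

With these invented figures, B converts at 2.6% against A's 2.0%, a relative lift of about 30%, and the p-value falls well below 0.05, so B wins. Had B converted 201 visitors instead of 260, the test would report no clear winner. That is the point of the significance check: it keeps you from crowning a version on the strength of random noise.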
A/B testing is as powerful as a Saturn rocket on steroids. In 2012, for example, Barack Obama's campaign used it in what turned out to be the most successful fundraising drive in US history. He became the first candidate to receive a billion dollars in donations. According to Time Magazine, the campaign raised an incredible $690 million of this sum online.
Obama's website optimization team conducted over five hundred tests, delivering a 49% increase in online donation conversions and an even more impressive 161% increase in online sign-up conversions.
As a contributor to the campaign's success, the performance of the donation website counted for far more than any three-word slogan about up-and-coming policies or any list of first-term good deeds. Many of the eminently reasonable explanations for Obama's second term had as much influence on his campaign as Liberace's career had on heavy-metal record sales.
The power of A/B tests lies in where they are targeted. Most set their sights on what boffins call 'system one thinking': the fast, automatic mode of thought that, by some estimates, accounts for 95% of human decision-making and is riddled with cognitive bias.
The psychological concept of cognitive bias dominates system one thinking. It can be defined as 'systematic patterns of deviation from the norm or rationality in judgment, making inferences about other people and situations illogical.'
Just in case you missed it, here's the power and glory of A/B testing in a nutshell: Obama's successful campaign raked in more money than anyone had shaken a stick at before by engaging a part of the electorate's mind designed to make snap decisions, often illogically. In the immortal words of Dorothy: 'Toto, I've a feeling we're not in Kansas anymore!'
Dorothy is right. We are nowhere near Kansas. Instead, we're in a strange land where logical thought is locked up in an attic like a mad Victorian uncle. We may imagine that we make rational decisions all the time. Wrong! Evolution has endowed us with brains awash with cognitive bias over which we have, by definition, no conscious control.
The Anchor Effect
The anchor effect offers an excellent example of cognitive bias, though there are hundreds of cognitive biases to choose from. Anchoring bias happens when we place too much emphasis on the initial data we encounter and allow that to influence subsequent decisions.
Let's say you're on the hunt for a stuffed animal as a centerpiece for your den. Early in your shopping expedition, you stumble across a stuffed capybara in a taxidermist's window. The price tag reads $5,000. You shop around; maybe there's a better deal to be had. You come across a second capybara. It's the same size as the first; only the asking price is $500. Your primeval brain will quickly see the second capybara as 'cheap', a bargain even. But this is true only because the first capybara your mind encountered cost considerably more.
If you had come across the $500 capybara first and had nothing to compare it with, you'd probably not see it as cheap. It's hard to see anyone parting with five hundred bucks for a giant rodent carcass with thirty pounds of upholsterer’s packing shoved up its ass. What happened is that the anchor – the first price of $5,000 encountered – unconsciously skewed your opinion.
The majority of the tests in Obama's campaign that had a significant impact used strategies based on a range of cognitive biases, as well as on the psychology of persuasion and other neuroscientific concepts. All the tests were legal and above board.
Think you could easily apply A/B testing to your website? Think again. Obama's data scientists didn't just load random data into freeware that came with a box of Golden Grahams. His campaign had a large, highly skilled team with in-depth knowledge of consumer psychology and neuromarketing, constructing and deploying state-of-the-art code.
However, A/B testing can benefit ordinary website owners who set their sites lower than winning a second term at the White House. [INSERT CTA HERE]
Meanwhile, back at the ranch
The Squire's cook whacked her dinner gong (a pre-arranged signal), and the village blacksmith began sawing off Cuthbert's muffler-shaped leg, Boston style (with wild abandon). Half a dozen burly estate workers pressed Cuthbert to the table while, to counter the pain, he bit down on the handle of a chimney sweep's brush. The blacksmith huffed and puffed until red-faced, trying to shave four-tenths of a second off the world record for leg removal using a bone saw.
His record attempt might just have succeeded had the ratcheting tension in Cuthbert's jaws, as the saw bit bone, not shattered the handle of the sweep's brush to matchwood. This distraction cost the blacksmith three-tenths of a second and cost Cuthbert his life as he bled out.
When the cook's gong reverberated from the East Wing to the dining room, Squire Trelawny's butler administered to brother Toulouse one-eighth of a teaspoon of essence of bog myrtle dissolved in enough water to fill the St. Lawrence Seaway. Then he stepped back to observe any healing effects.
The homoeopathic cure proved death-inducing rather than life-enhancing: Toulouse, like his brother, bled out. Twenty minutes later, in the life-must-go-on tradition, the butler piled the brothers' bodies into the same handcart they'd arrived in and trundled them through the village to the midden for disposal, with as much reverence as you might show a dead sparrow at the foot of an oak tree.
The first real trial of this kind actually took place in Nuremberg, Germany, in 1835, when a group of mirthless Teutonic scientists set out to test the efficacy of a homoeopathic drug. (See Stolberg M (2006). Inventing the randomized, double-blind trial: The Nürnberg salt test of 1835. JLL Bulletin: Commentaries on the history of treatment evaluation.)
I should fess up here: given the sober German approach to life at the time, the write-up of the actual tests was duller than the 'recent deaths' page of the Idaho Potato Gazette, and it seemed churlish to subject readers to it. I hope you can forgive me.