Tag Archive for 'stabilization'

Rules for a successful multivariate test (Billy’s Optimization Guide Part 3)

If you missed it, see Part 1 (A/B Split Testing) and Part 2 (Multivariate Test Basics).

With the basics of part 2 down, it’s time to start designing a multivariate test.  Every optimization project has different challenges and goals, but luckily there are a few rules that apply to every multivariate test design.  These rules fit into two categories: technical rules and content rules.

Technical rules:

  1. Choose the appropriate multivariate test type (full or fractional factorial)
  2. Determine the number of factors and levels that can be tested based on estimated conversion traffic (choose a test array)
  3. Stop the test when it has stabilized, not based on your earlier estimations

These rules ensure statistical significance by constraining the test to the appropriate size at the beginning and then letting the test gather the proper amount of data at the end.

Running a full factorial test, if your traffic supports it, may be a good choice if you’re testing content that you believe has many interactions or if you only want to test 2 factors with 2 levels each.  (Note: the smallest fractional factorial test size is 3 factors with 2 levels each.)  Typically though, you’ll want to run a fractional factorial test to save time and expand the number of factors and levels you can test.
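To make the size difference concrete, here is a minimal sketch in Python.  It enumerates a full factorial design for 3 factors with 2 levels each and then a half fraction of the same design, built with the textbook construction of setting the third factor to the product of the first two.  The function names are hypothetical and this is only one standard construction; it is not necessarily how the arrays in Widemile's tool, or any other tool, are generated.

```python
from itertools import product

def full_factorial(levels_per_factor):
    """Every combination of levels, e.g. levels_per_factor = [2, 2, 2]."""
    return list(product(*(range(n) for n in levels_per_factor)))

def half_fraction_2x2x2():
    """Half fraction of a 3-factor, 2-level design using coded levels -1/+1.

    The third factor is set to the product of the first two (C = A * B),
    a standard textbook construction (an assumption, not a tool's array).
    """
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

full = full_factorial([2, 2, 2])
half = half_fraction_2x2x2()
print(len(full), "experiments in the full factorial")
print(len(half), "experiments in the half fraction")
```

Each level of each factor still appears equally often in the half fraction, which is what lets you estimate level influence from fewer experiments, at the cost of not being able to separate some interactions.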

In order to find out how many factors and levels you can test, you need some idea of your predicted page views and conversions, as well as an estimate of lift.  Lift matters because a large lift gets you more conversions, so your test will stabilize more quickly.  Because of this, I would be conservative with lift estimates to ensure that the test is not designed too large.  At Widemile, we have a large list of arrays available to our tool and have calculated the approximate conversions each one needs to stabilize, which lets me take the three criteria I listed and find the arrays that are statistically viable for testing.  You should look for something similar with your tool of choice.
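If your tool does not give you this kind of lookup, a rough back-of-the-envelope check is possible.  The sketch below assumes you already have some figure for the conversions each experiment needs before it stabilizes; that figure, the function name and the sample numbers are all assumptions for illustration, not values from Widemile's tool.  It uses the baseline conversion rate with no lift, which is the most conservative estimate.

```python
def max_experiments(daily_visitors, baseline_rate, test_days,
                    conversions_needed_per_experiment):
    """Rough upper bound on how many experiments the traffic can support.

    conversions_needed_per_experiment is an assumed input; the right
    value depends on your tool and your test array.
    """
    expected_conversions = daily_visitors * baseline_rate * test_days
    return int(expected_conversions // conversions_needed_per_experiment)

# Hypothetical numbers: 100 visitors a day, a 50% baseline conversion
# rate, a 10-day test, and an assumed 50 conversions per experiment.
print(max_experiments(100, 0.5, 10, 50))
```

If the result is smaller than the number of experiments in the array you had in mind, pick a smaller array or plan for a longer test.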

To figure out when a test has stabilized, I prefer to look primarily at level influence stabilization, with experiment conversion rate stabilization for support.  Widemile Optimize shows this using graphs, so I simply look for lines that trend horizontally, meaning winning levels and experiments stay winners and their level of influence or conversion rates stay fairly constant over 3-5 days.  If you don’t have graphs available, look at the historical cumulative conversion rate for your experiments and see if there is a lot of variance across the latest few days of your test.
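For the no-graphs case, here is a minimal sketch of that check for a single experiment.  It computes the cumulative conversion rate day by day and asks whether the last few values stay inside a narrow band.  The 5-day window and the 5% relative tolerance are assumptions you would tune yourself, and the function names are hypothetical.  It also only checks flatness; you would still want to confirm separately that the winning experiment stays the winner.

```python
def cumulative_rates(daily_visitors, daily_conversions):
    """Cumulative conversion rate at the end of each day."""
    rates, visitors, conversions = [], 0, 0
    for v, c in zip(daily_visitors, daily_conversions):
        visitors += v
        conversions += c
        rates.append(conversions / visitors if visitors else 0.0)
    return rates

def looks_stable(rates, window=5, tolerance=0.05):
    """True if the last `window` rates vary by at most `tolerance`
    relative to their mean. Both defaults are assumptions."""
    if len(rates) < window:
        return False
    recent = rates[-window:]
    mean = sum(recent) / window
    if mean == 0:
        return False
    return (max(recent) - min(recent)) / mean <= tolerance

# Hypothetical daily data for one experiment.
rates = cumulative_rates([200] * 8, [30, 10, 22, 18, 20, 21, 19, 20])
print(looks_stable(rates))
```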

Content rules:

  1. Every item you test should answer an important question
  2. Test variety not quantity
  3. Test opposites first then refine
  4. Remember you can run more than one test

The content rules are closely tied together.  In effect, they ensure that the items selected for testing have purpose and that they don’t needlessly expand the size of your test, reducing its efficiency.  I begin designing tests by creating hypotheses about issues with the page and then choosing factors and designing levels to address those issues.

An example hypothesis is “Having a hero shot on the right side of the page causes users to ignore the important value proposition on the left side.”  To test this, I would choose hero shot position as a factor and then have “left side hero shot” as the baseline level and “right side hero shot” as the second level.  This example also illustrates that you can test more than headlines and images: layout is testable too, with creative use of CSS and sometimes JavaScript.  As long as you can switch from one level to another and each level works with the other factors and levels, you are at liberty to test anything.

Coming back to the rules, make sure that you are testing as few items as possible to find out what you need.  Before testing a collection of lifestyle hero shots, choose one and test it against an iconic hero shot.  This will save you the time of going down a path of testing something that may not work.

Lastly, you aren’t going to be able to get the best page on the first run or even second, third, etc.  If you knew what your audience liked 100% of the time then you wouldn’t need testing.  Remember to think of your overall test plan beyond just the first run, so that you can answer all the questions you need without having to force everything into one test.

In summary, determine what you’re trying to achieve, select the proper testing method to meet those goals and then make sure to be purposeful and efficient with the content you end up testing in front of your visitors.  Testing and optimization is not difficult, although it can be tough to start.  Follow these rules and you’ll be on your way to conquering conversion rates, bounce rates, funnel drop-offs and many other metrics.

Photo credit: Aranda\Lasch (CC)

3 difficult optimization results and what you can learn from them (3 of 3)

Note: This is the third post of a 3 part series, each focusing on one type of test result that is tough to deal with. Read the other 2 articles on highly mixed data and the original page beating the new variations.

Ready for the toughest of all test results? I brought in Widemile’s Chief Scientist, Vladimir Brayman, for this post to help me with some of the concepts around this topic. The last of the three is a test whose results just won’t stabilize.

How does this happen?
As long as you have homogeneous traffic and enough time, a test should stabilize. Unfortunately, you can’t always get both, and I don’t know anyone with unlimited time. The most obvious cause of an unstable test is a design that is too large, meaning you don’t have enough conversion traffic for the number of variations you are trying to test.

Additionally, getting homogeneous traffic is not always easy. If your sources are too different, you can have problems. Text ads, banner ads, e-mail campaigns and even Yahoo vs Google traffic may behave differently. The worst case is when these sources of traffic are added mid-test. I have had tests where an e-mail campaign was run at the end of a test without my knowledge (until I asked about the huge spike in traffic!).

You can’t control all traffic coming to your page from some sources like PR, blogs, seasonal events and news. This goes back to part 1, about highly mixed data; everything there applies to this case too.

A test also may not stabilize because it is designed with elements that are too similar. The same thing can happen when 2 elements are different but have approximately the same amount of impact. In these situations, your data will go back and forth on which of them is the winner.

Anything outside of your page that has a large influence can destabilize your test, and this includes pieces of your funnel. One symptom of this is when your clickthroughs are fairly consistent but the full conversions are not. If you are testing a landing page and the sign-up process after it is very kludgy and difficult for users, then it can have a large impact on your test’s ability to stabilize. This is especially true if the experience for visitors changes. An example of this is visitors bailing from a purchase funnel because shipping to their area is prohibitively more expensive than to other areas. Although they would have converted if shipping had been within the average price range, they ended up not converting because of something encountered outside of the landing page, skewing your results. This happens in almost every test, but the magnitude of its impact depends on what exactly occurs.

What can you do to prevent this?

If you are using a testing tool different from what you normally track your conversions with, make sure you run a baseline test so that you can compare the numbers your testing tool gives you with the ones your conversion analytics produces. They should be within about 10%-15% of each other over about a week or so. Finding a large discrepancy here will save you from headaches down the line. This essentially double checks the expected traffic numbers by ensuring you are measuring your current conversion correctly, which allows you to design a test of the appropriate size. By appropriate size, I mean one where you have enough testing time and will get enough traffic within that time.
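The baseline comparison is simple arithmetic. Here is a minimal sketch, assuming you have a week's conversion count from each system. The function names are hypothetical, the 15% threshold is the upper end of the range above, and measuring the difference against the analytics figure is my assumption; measure against whichever system you trust more.

```python
def relative_discrepancy(tool_conversions, analytics_conversions):
    """Relative difference between the two counts, measured against
    the analytics figure (an assumed choice of reference)."""
    if analytics_conversions == 0:
        raise ValueError("analytics_conversions must be non-zero")
    return abs(tool_conversions - analytics_conversions) / analytics_conversions

def baseline_agrees(tool_conversions, analytics_conversions, threshold=0.15):
    """True if the two systems are within `threshold` of each other."""
    return relative_discrepancy(tool_conversions, analytics_conversions) <= threshold

# Hypothetical week of data: 90 conversions in the testing tool,
# 100 in the analytics package.
print(baseline_agrees(90, 100))
```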

While easier said than done, it is important to look for new traffic that may be driven to your page and to segment it out. Since this shares some of the same problems as highly mixed data, those solutions apply here too.

What can you do if this happens?

First, don’t cut your tests short unless you think more data won’t solve the problem. If you don’t reach stabilization, you are wasting all the time you tested since you have inconclusive data. Always try to be as conservative as possible and end tests only when you are very confident that the test is stabilized or that there is no other choice.

Think about restarting the test if it isn’t stable. Use a smaller design. Pick the important factors (pieces) and the levels (variations) that you think will perform and are drastically different from each other. This keeps similar elements from flip-flopping for the optimal spot and making the test look unstable.

If your only problem is that 2 variations are vying for the winning position, then they likely perform about the same. It probably is not worth your time to wait for them to stabilize, so stopping the test and going with either of them will likely make little difference to your conversion rates.
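One way to sanity-check that two front-runners really do perform about the same is a pooled two-proportion z-test. This is a common statistical test that I am swapping in for illustration; it is not the stabilization analysis described in this post or anything specific to a testing tool, and the function name is hypothetical. A large p-value means the data cannot tell the two variations apart.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion
    rates, using the pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both rates are 0% or 100%: nothing to distinguish.
        return 1.0
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical counts for two variations that keep trading places.
print(two_proportion_p_value(52, 1000, 49, 1000))
```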

The problem of outside funnel influence is a bit harder, but not impossible to solve. The best solution is to segment out the users that are determined to be unqualified. For example, if you only ship to or work with US customers and businesses, then filter out any users that are outside of the US and do your analysis from there. This can be done at the data level if you can tell where the data came from. Otherwise, it can be done with a splitter or qualification page that leads people into the appropriate funnel first. The splitter may itself impact your overall conversions though, so careful testing around these methods should be done as well.
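Filtering at the data level might look like the sketch below. It assumes each visit record carries a country and a converted flag; those field names and the function names are hypothetical, and your tool's export will have its own format.

```python
def conversion_rate(visits):
    """visits: list of dicts with hypothetical keys 'country' and 'converted'."""
    if not visits:
        return 0.0
    return sum(1 for v in visits if v["converted"]) / len(visits)

def qualified_only(visits, allowed_countries=("US",)):
    """Keep only visits from countries you can actually serve."""
    return [v for v in visits if v["country"] in allowed_countries]

# Hypothetical visit log.
visits = [
    {"country": "US", "converted": True},
    {"country": "US", "converted": False},
    {"country": "CA", "converted": False},
    {"country": "DE", "converted": False},
]
print(conversion_rate(visits), conversion_rate(qualified_only(visits)))
```

Running the analysis on both the full and the filtered data shows how much the unqualified visitors were dragging the numbers around.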

From my experience, the problems I’ve listed in these three posts are either preventable or unlikely to occur. The value of having an optimization expert is that they can avoid these situations or at the very least extract useful lessons when they do happen. Having said that, don’t be scared to test. Once you get the hang of it, it is a lot of fun and one of the keys to effectively growing and maturing your online marketing campaigns.

CC photo credit #1: ryaninc
CC photo credit #2: jurvetson

3 difficult optimization results and what you can learn from them (1 of 3)

Note: This is the first post of a 3 part series, each focusing on one type of test result that is tough to deal with.

There are 3 types of optimization results that people never look forward to getting.

Unfortunately, anyone who runs enough tests will run into these situations. In the following 3 posts I will go over the 3 situations and outline how they happen and how to prevent and handle them.

This first post is about tests with highly mixed data.

How does this happen?
Typically, mixed test data occurs when events that are out of your control, or that you forget to control, impact your test and skew your results.

Your average traffic finds your tested page either through search, browsing around your website or through your planned advertising campaigns. However they get to the page, tests are designed (or should be designed!) based on how visitors will get there.

The problem arises when, outside of the scope of those involved in testing or because of some oversight, a new type of traffic is driven to the page without any preparation being done for that traffic. While traffic is good for your sample size, it is bad because those new visitors are coming in with totally different motivations, assumptions and knowledge. This means they probably will react differently to your tested elements than the traffic you were driving to it originally.

Most often this happens to me because of a new marketing push, such as an e-mail blast, new display ads or promotions at a trade show. This can happen even more unexpectedly if some outside party drives a lot of traffic to your page. A news story or blog review that innocently links to your page can suddenly become a big source of new traffic.

What can you do to prevent this?
The first step is to spread the knowledge around your company that this testing is going on and that anything that may impact the page and its visitors should be run by the optimization team first.

Next, always segment out significant new traffic and track it separately. If you segment, the worst case scenario is that the new visitors perform the same as your current traffic and you did a little extra work. The alternative is having to trudge through your data, trying to separate the 2 types of traffic and possibly having to restart the test if you can’t separate them out.
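If your tool lets you export visit-level data, tracking segments separately can be as simple as grouping by traffic source. The sketch below assumes each visit is a (source, experiment, converted) record; that layout and the function name are hypothetical.

```python
from collections import defaultdict

def rates_by_segment(visits):
    """visits: iterable of (source, experiment, converted) tuples.

    Returns {source: {experiment: conversion_rate}}.
    """
    # counts[source][experiment] = [total_visits, conversions]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for source, experiment, converted in visits:
        cell = counts[source][experiment]
        cell[0] += 1
        cell[1] += 1 if converted else 0
    return {
        source: {exp: conv / total for exp, (total, conv) in exps.items()}
        for source, exps in counts.items()
    }

# Hypothetical visits from two traffic sources.
visits = [
    ("search", "A", True), ("search", "A", False),
    ("email", "A", False), ("email", "B", True),
]
print(rates_by_segment(visits))
```

If the same experiment wins in every segment, you did a little extra work. If the winners differ, you have just avoided a mixed result.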

Lastly, be aware. Watch your data and look for big changes. If you see something strange or a sudden shift, try to find a cause. It usually will be nothing, but if you do find something, quickly build a segment for it. Even if the “new” traffic has already started hitting your page, a segment should be set up as soon as possible.
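Watching for a sudden shift can be partly automated. Here is a minimal sketch that flags any day whose traffic is well above the median of the days before it. The 7-day lookback and the 2x factor are assumptions to tune for your own traffic, and the function name is hypothetical.

```python
from statistics import median

def traffic_spikes(daily_visitors, lookback=7, factor=2.0):
    """Indexes of days whose traffic exceeds `factor` times the median
    of the previous `lookback` days. Both defaults are assumptions."""
    spikes = []
    for i in range(lookback, len(daily_visitors)):
        baseline = median(daily_visitors[i - lookback:i])
        if baseline > 0 and daily_visitors[i] > factor * baseline:
            spikes.append(i)
    return spikes

# Hypothetical daily visitor counts with one e-mail blast.
print(traffic_spikes([100] * 7 + [500, 100]))
```

A flagged day is only a prompt to go and find the cause; it does not tell you whether the new traffic behaves differently.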

What can you do if this happens?

I would still try to segment the data in any way possible. Even taking certain days or times out of your data may be enough to salvage your results. Do your analysis with and without those days and see if the optimal page changes. Take extreme care when doing this, though, and make sure you still have statistically relevant results.
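The with-and-without comparison can be sketched as follows, assuming you can export visitors and conversions per experiment per day. The data layout and the function name are hypothetical. Note that this only picks the highest raw conversion rate; it says nothing about whether the difference is statistically relevant, which you still need to check.

```python
def winner(daily_results, exclude_days=()):
    """daily_results: {day: {experiment: (visitors, conversions)}}.

    Returns the experiment with the highest conversion rate after
    dropping the days listed in exclude_days, or None if no data is left.
    """
    totals = {}
    for day, experiments in daily_results.items():
        if day in exclude_days:
            continue
        for exp, (visitors, conversions) in experiments.items():
            v, c = totals.get(exp, (0, 0))
            totals[exp] = (v + visitors, c + conversions)
    rates = {exp: c / v for exp, (v, c) in totals.items() if v}
    return max(rates, key=rates.get) if rates else None

# Hypothetical data: day 2 is the day an e-mail blast hit the page.
results = {
    1: {"A": (100, 10), "B": (100, 8)},
    2: {"A": (100, 2), "B": (100, 20)},
}
print(winner(results), winner(results, exclude_days={2}))
```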

My next solution is just to restart the test. Testing is about continual growth and you can’t always get what you want out of every test. Be happy that you got some extra traffic and try to take precautions to take it into account, or prevent it, the next time around.

Let me know if you’ve ever run into these problems before and how you handled them. Look out for part 2 of this series in the next week.

Photo Source (under CC)