Archive for the 'Testing Concerns' Category

What skills are necessary for optimization?

Use analytics? Update your website?  Then you have everything you need.

While optimization is a distinct process, it shares the same skill set as these common online marketing practices.

Similar to analytics, optimization requires implementation, data analysis and measurable marketing goals.  And as with updating a website, you need creative and design expertise, web development and copywriting.

The example optimization workflow below shows which of these skills and resources are needed, and when:

  1. Planning: At the beginning, all you need are basic marketing skills: select a page, spell out the questions you have about the page and determine the KPIs for success.
  2. Design: Use copywriting and creative/design skills to create test ideas that answer your questions and drive performance on the selected KPIs.
  3. Build out: Use web development resources to translate your ideas into code.  You may be able to do this step yourself, depending on the content of the test and your own technical abilities.
  4. Reporting: After the test is live, you need to analyze the data.  You’re looking for answers to your questions (Do testimonials increase sales?) and new insights (We don’t have to use Flash to grab attention).

While some education is necessary, optimization utilizes skills familiar to online marketers.  Optimization isn’t more difficult than other online marketing; it’s just different.

I recommend starting testing ASAP, even if it is with a small portion of a web page and/or will have a small impact.  Going through the process will help make it a natural part of your marketing cycles.  Once you have adjusted by doing a test or two, running a large-scale optimization campaign quickly becomes not only feasible, but some of your most important work.

Photo Credit: http://www.flickr.com/photos/confused_andy/ / CC BY-SA 2.0

Rules for a successful multivariate test (Billy’s Optimization Guide Part 3)

Rules of Six Detail

If you missed it, see Part 1 (A/B Split Testing) and Part 2 (Multivariate Test Basics).

With the basics of part 2 down, it’s time to start designing a multivariate test.  Every optimization project has different challenges and goals.  Luckily, though, there are a few rules that apply to every multivariate test design.  These rules fit into two categories: technical rules and content rules.

Technical rules:

  1. Choose the appropriate multivariate test type (full or fractional factorial)
  2. Determine the number of factors and levels that can be tested based on estimated conversion traffic (choose a test array)
  3. Stop the test when it has stabilized, not based on your earlier estimations

These rules ensure statistical significance by constraining the test to the appropriate size at the beginning and then letting the test gather the proper amount of data at the end.

Running a test full factorial, if your traffic supports it, may be a good choice if you’re testing content that you believe to have many interactions or if you only want to test 2 factors with 2 levels each.  (Note: the smallest fractional factorial test size is 3 factors with 2 levels each.)  Typically though, you’ll want to run a fractional factorial test to save time and expand the number of factors and levels you can test.
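To make the size difference concrete, here is a minimal sketch of a full factorial design for 3 two-level factors next to its half fraction.  The factor names and the defining relation (the third factor is set to the product of the first two, with levels coded as -1 and +1) are illustrative assumptions, not the arrays any particular tool uses.

```python
from itertools import product

# Hypothetical factors, each with 2 levels coded as -1 (baseline) and +1 (variation).
FACTORS = ["headline", "hero_shot", "button"]


def full_factorial(n_factors):
    """Every combination of levels: 2**n_factors experiments."""
    return list(product([-1, 1], repeat=n_factors))


def half_fraction():
    """Half fraction of the 2^3 design: choose the first two factors freely
    and set the third to their product (defining relation C = A*B)."""
    return [(a, b, a * b) for a, b in product([-1, 1], repeat=2)]


full = full_factorial(len(FACTORS))
half = half_fraction()
print(len(full), "experiments in the full factorial")
print(len(half), "experiments in the half fraction")
for run in half:
    print(dict(zip(FACTORS, run)))
```

The half fraction still shows every level of every factor equally often, which is what lets the main effects be estimated from half the pages.  The trade-off is that interactions between factors can no longer be separated from the main effects.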

In order to find out how many factors and levels you can test, you need to have some idea of your predicted page views and conversions, as well as an estimate of lift.  Lift matters because a large lift will get you more conversions, so your test will stabilize quicker.  Because of this, I would be conservative with lift estimates to ensure that the test is not designed too large.  At Widemile, we have a large list of arrays available to our tool and have calculated the approximate conversions needed to stabilize, allowing me to look at the three criteria I listed and find the arrays that are statistically viable for testing.  You should look for something similar with your tool of choice.
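The sizing check can be sketched as simple arithmetic.  The function below is a rough illustration only: the conversions needed per experiment is a made-up placeholder (real thresholds come from your tool's array tables), the even traffic split is an assumption, and all traffic figures are hypothetical.

```python
import math


def estimated_test_days(daily_page_views, conversion_rate, n_experiments,
                        conversions_needed_per_experiment, expected_lift=0.0):
    """Rough number of days until every experiment has collected the
    required conversions, assuming traffic is split evenly.

    expected_lift defaults to 0.0, the most conservative estimate.
    """
    daily_conversions = daily_page_views * conversion_rate * (1 + expected_lift)
    per_experiment_per_day = daily_conversions / n_experiments
    return math.ceil(conversions_needed_per_experiment / per_experiment_per_day)


# Hypothetical inputs: 2,000 page views a day converting at 5%, and a
# placeholder requirement of 100 conversions per experiment.
for n in (4, 8, 16):
    days = estimated_test_days(2000, 0.05, n, 100)
    print(n, "experiments ->", days, "days")
```

Doubling the number of experiments roughly doubles the time to finish, which is why the array has to be chosen against your traffic before the test starts.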

To figure out when a test is stabilized, I prefer to primarily look at level influence stabilization with experiment conversion rate stabilization for support.  Widemile Optimize shows this using graphs, so I simply look for horizontal trending of lines, meaning winning levels and experiments stay winners and their level of influence or conversion rates stay fairly constant (look horizontal) over 3-5 days.  If you don’t have graphs available, look at the historical cumulative conversion rate for your experiments and see if there is a lot of variance across the latest few days of your test.
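If you only have daily numbers, the cumulative check can be sketched like this.  The 5-day window and the tolerance are assumptions chosen for illustration, not recommended thresholds, and the daily figures are invented.

```python
def cumulative_rates(daily_visits, daily_conversions):
    """Cumulative conversion rate at the end of each day."""
    rates, visits, conversions = [], 0, 0
    for v, c in zip(daily_visits, daily_conversions):
        visits += v
        conversions += c
        rates.append(conversions / visits if visits else 0.0)
    return rates


def looks_stable(rates, window=5, tolerance=0.005):
    """True if the cumulative rate moved by no more than `tolerance`
    (absolute) across the last `window` days."""
    if len(rates) < window:
        return False
    recent = rates[-window:]
    return max(recent) - min(recent) <= tolerance


# Hypothetical experiment: 500 visits a day for 10 days.
visits = [500] * 10
conversions = [40, 20, 30, 26, 24, 25, 25, 25, 25, 25]
rates = cumulative_rates(visits, conversions)
print("stable after 5 days: ", looks_stable(rates[:5]))
print("stable after 10 days:", looks_stable(rates))
```

A flat cumulative line for one experiment is only part of the picture; you would run the same check for each experiment and also confirm that the ranking of winners has stopped changing.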

Content rules:

  1. Every item you test should answer an important question
  2. Test variety not quantity
  3. Test opposites first then refine
  4. Remember you can run more than one test

The content rules are closely tied together.  In effect, they ensure that the items selected for testing have purpose and that they don’t needlessly expand the size of your test, reducing its efficiency.  I begin designing tests by creating hypotheses regarding issues with the page and then choosing factors and designing levels to address those issues.

An example hypothesis is “Having a hero shot on the right side of the page causes users to ignore the important value proposition on the left side.”  To test this, I would choose hero shot position as a factor and then have “left side hero shot” as the baseline level and “right side hero shot” as the second level.  This example also illustrates that, other than headlines and images, testing layout is possible with creative use of CSS and sometimes JavaScript.  As long as you can revert from one to another and it matches the other factors and levels, you are at liberty to test anything.

Coming back to the rules, make sure that you are testing as few items as possible to find out what you need.  Before testing a collection of lifestyle hero shots, choose one and test it against an iconic hero shot.  This will save you the time of going down a path of testing something that may not work.

Lastly, you aren’t going to be able to get the best page on the first run or even second, third, etc.  If you knew what your audience liked 100% of the time then you wouldn’t need testing.  Remember to think of your overall test plan beyond just the first run, so that you can answer all the questions you need without having to force everything into one test.

In summary, determine what you’re trying to achieve, select the proper testing method to meet those goals and then make sure to be purposeful and efficient with the content you end up testing in front of your visitors.  Testing and optimization is not difficult, although it can be tough to start.  Follow these rules and you’ll be on your way to conquering conversion rates, bounce rates, funnel drop-offs and many other metrics.

Photo credit: Aranda\Lasch (CC)

Gamble with your conversions to raise them

You and your competitors all have the same landing pages.  You have a hero shot of the product, a big call to action button and short, punchy copy.  Or maybe you’re already ahead of your competitors and have run a few tests on your page, picking up more conversions on the way.  In either situation, you’ll eventually hit a wall and struggle to get additional lift.  So how do you continue to improve?

Go for broke.  Try something you’ve never tried before.  It might end up being a total failure, but it also might give you the lift you want.

The gamble you make with optimization can end in 2 ways:

  • You lose X amount of conversions over the week or two that the test is running
  • You gain X amount of conversions for the effective lifetime of the page

The possible upside dwarfs the downside by a large margin and, either way, you learn something new and can optimize the next test more successfully based on what you learned.

Luckily, skill and experience minimize the risks of testing, although beating a strong page is never easy or guaranteed.  But when you do find something new that works, or see that your current page is still a champ, you can rest assured that you’re doing all you can to drive conversions.

3 difficult optimization results and what you can learn from them (3 of 3)

Note: This is the third post of a 3 part series, each focusing on one type of test result that is tough to deal with. Read the other 2 articles on highly mixed data and the original page beating the new variations.

Ready for the toughest of all test results? I brought in Widemile’s Chief Scientist, Vladimir Brayman, for this post to help me with some of the concepts around this topic. The last of the three results is when the results just won’t stabilize.

How does this happen?
As long as you have homogeneous traffic and enough time, a test should stabilize. Unfortunately, this is not always possible and I don’t know anyone with unlimited time. The most obvious way this occurs is when a test is designed too large, meaning you don’t have enough conversion traffic for the number of variations you are trying to test.

Additionally, getting homogeneous traffic is not always easy. If your sources are too different, you can have problems. Text, banner and e-mail ads, and even Yahoo vs Google traffic, may behave differently. The worst case is when these sources of traffic are added mid-test. I have had tests where an e-mail campaign was done at the end of a test without my knowledge (until I asked about the huge spike in traffic!).

You can’t control all traffic coming to your page from some sources like PR, blogs, seasonal events and news. This goes back to part 1, about highly mixed data; everything there applies to this case too.

A test also may not stabilize because the test is designed with elements that are too similar. The same thing can happen when 2 elements are different but have approximately the same amount of impact. In these situations, your data will go back and forth on which of them are the winners.

Anything outside of your page that has a large influence can destabilize your test, and this includes pieces of your funnel. One symptom of this is when your clickthroughs are fairly consistent but the full conversions are not. If you are testing a landing page and the sign-up process after it is very kludgy and difficult for users, then it can have a large impact on your test’s ability to stabilize. This is especially true if the experience differs from visitor to visitor. An example of this is visitors bailing from a purchase funnel because shipping to their area is prohibitively more expensive than to other areas. Although they would have converted if shipping had been within the average price range, they ended up not converting because of something encountered outside of the landing page, skewing your results. This happens in almost every test, but the magnitude of its impact depends on what exactly occurs.

What can you do to prevent this?

If you are using a testing tool different from what you normally track your conversions with, make sure you run a baseline test so that you can compare the numbers your testing tool gives you with the ones your conversion analytics produces. They should be within about 10%-15% of each other over about a week or so. Finding a large discrepancy here will save you from headaches down the line. This essentially double checks the expected traffic numbers by ensuring you are measuring your current conversion correctly, which allows you to design a test of the appropriate size. By size, I mean ensure that you have enough testing time and within that time you will get enough traffic.
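The baseline comparison is easy to automate.  This is a minimal sketch: the 15% ceiling comes from the range above, while measuring the gap relative to the analytics count (rather than the tool's count) is my own assumption.

```python
def relative_discrepancy(tool_conversions, analytics_conversions):
    """Gap between the two counts, relative to the analytics count."""
    return abs(tool_conversions - analytics_conversions) / analytics_conversions


def baseline_ok(tool_conversions, analytics_conversions, max_discrepancy=0.15):
    """True if the testing tool and the analytics package roughly agree."""
    return relative_discrepancy(tool_conversions, analytics_conversions) <= max_discrepancy


# Hypothetical week of data from a baseline test.
print(baseline_ok(tool_conversions=92, analytics_conversions=100))
print(baseline_ok(tool_conversions=70, analytics_conversions=100))
```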

While easier said than done, it is important to look for new traffic that may be driven to your page and to segment it out. Since this shares some of the same problems as highly mixed data, those solutions apply here too.

What can you do if this happens?

First, don’t cut your tests short unless you think more data won’t solve the problem. If you don’t reach stabilization, you are wasting all the time you tested since you have inconclusive data. Always try to be as conservative as possible and end tests only when you are very confident that the test is stabilized or that there is no other choice.

Think about restarting the test if it isn’t stable. Use a smaller design. Pick the important factors (pieces) and the levels (variations) that you think will perform and are drastically different from each other. This keeps elements from looking unstable as they flip-flop for the optimal spot.

If your only problem is that 2 variations are vying for the winning position, then they likely perform about the same. It probably is not worth your time to wait for them to stabilize, so stopping the test and going with either of them will likely make little difference to your conversion rates.
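One way to sanity-check that two variations really are neck and neck is a standard pooled two-proportion z-test.  To be clear, this is a generic statistical check that I am swapping in for illustration, not the stabilization method described in this post, and the visit and conversion counts are invented.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, visits_a, conv_b, visits_b):
    """Two-sided p-value for the difference between two conversion rates
    (pooled two-proportion z-test)."""
    rate_a = conv_a / visits_a
    rate_b = conv_b / visits_b
    pooled = (conv_a + conv_b) / (visits_a + visits_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    z = (rate_a - rate_b) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical pair that keeps trading places: 6.0% vs 6.3%.
close_call = two_proportion_p_value(120, 2000, 126, 2000)
# Hypothetical pair with a real gap: 5.0% vs 8.0%.
clear_gap = two_proportion_p_value(100, 2000, 160, 2000)
print("close call p-value:", round(close_call, 3))
print("clear gap p-value: ", round(clear_gap, 5))
```

A large p-value for the pair that keeps swapping places supports the idea that picking either one will cost you very little.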

The problem of outside funnel influence is a bit harder, but not impossible to solve. The best solution is to segment out the users that are determined to be unqualified. For example, if you only ship to or work with US customers and businesses, then filter out any users that are outside of the US and do your analysis from there. This can be done at the data level if you can tell where the data came from; otherwise, it can be done with a splitter or qualification page that leads people into the appropriate funnel first. This may itself affect your overall conversions, though, so careful testing around these methods should be done as well.
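Filtering at the data level might look like the sketch below.  The record layout (`variation`, `country`, `converted`) is a hypothetical export format, not something any specific tool produces.

```python
def conversion_rate_by_variation(visits, allowed_countries=("US",)):
    """Conversion rate per variation, counting only qualified visitors.

    visits: iterable of dicts with 'variation', 'country' and 'converted'.
    """
    totals = {}
    for visit in visits:
        if visit["country"] not in allowed_countries:
            continue  # unqualified visitor: left out of the analysis
        seen, converted = totals.get(visit["variation"], (0, 0))
        totals[visit["variation"]] = (seen + 1, converted + int(visit["converted"]))
    return {name: converted / seen for name, (seen, converted) in totals.items()}


# Tiny hypothetical data set.
sample = [
    {"variation": "A", "country": "US", "converted": True},
    {"variation": "A", "country": "US", "converted": False},
    {"variation": "A", "country": "CA", "converted": False},
    {"variation": "B", "country": "US", "converted": True},
    {"variation": "B", "country": "DE", "converted": False},
]
print(conversion_rate_by_variation(sample))
```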

From my experience, the problems I’ve listed in these three posts are either preventable or unlikely to occur. The value of having an optimization expert is that they can avoid these situations or, at the very least, extract useful lessons when they do happen. Having said that, don’t be scared to test. Once you get the hang of it, it is a lot of fun and one of the keys to effectively growing and maturing your online marketing campaigns.

CC photo credit #1: ryaninc
CC photo credit #2: jurvetson

3 difficult optimization results and what you can learn from them (2 of 3)

Note: This is the second post of a 3 part series, each focusing on one type of test result that is tough to deal with. Read the first article on highly mixed data.

As an optimization analyst, this is probably the hardest result to bring to a client. Oddly enough, it actually is preferable to part 1’s highly mixed data and to part 3. I am talking about optimization that determines that the original page is better than the tested variations.

How does this happen?
Sometimes a page just gets it right. How would you change Google? I looked for a few variations and came across one by Andy Rutledge and another by Valacar. They both are beautiful designs and a lot of thought was put into them, but at the same time, would they really make Google more profitable? It’s definitely a tough sell and there is a big challenge in improving this type of page.

The goal is for users to search. Yes, they want users to click on ads eventually, but there’s not a whole lot they can do for ad clicks on the homepage. The best they can do is get users to search as fast as possible. So would a redesign make it more usable and readable? Maybe. To a level that it would increase their revenues? That’s tough to say.

The simpler the goals of the page and the less information and messaging the user needs, the more likely it is that the page will be difficult to optimize.

What can you do to prevent this?
Be careful when choosing a page to test. Find a page where the user will take some time to look at what is going on. This is another reason why most landing pages are great places to optimize, because users naturally need to be introduced to the product and sold on why to convert.

The logical thing to do would be to simply refrain from testing pages that seem to be performing well, but this is rarely a good rule. Unless it is performing well because of a lot of testing, you don’t really know if a page is performing well or not (see my post on conversion rates). Testing always brings surprises and personal judgment is no replacement for a test; a good looking page can perform poorly and a page with subpar creative can perform great.

What can you do if this happens?

Because of the above reasons, you may actually plan for this scenario to occur. Many people believe redesigning an old page will provide improvement, but what if it is old and performing well? In that case, you may plan to try to improve but not expect to beat the old version.

In any case, if your original page wins, then you have confirmation of your page’s success. It is unlikely that all possible improvements were tested in one test run, though, so it may take a few more runs to really confirm its solidity, but the page has won against the initial best ideas and that is an achievement.

This lesson tells you that you can move on and that is progress in itself.

Moving forward, I would try drastically different approaches, either in layout or design, and test around offers. Otherwise, I would apply the successful original page to tests for other areas of your site.

I have to be honest when I say that this rarely ever happens. Almost every page has room for improvement at every step of the conversion funnel.

Whew, I will try to get the third and toughest optimization result next week.

CC photo credit: philosophygeek

3 difficult optimization results and what you can learn from them (1 of 3)

Note: This is the first post of a 3 part series, each focusing on one type of test result that is tough to deal with.

Mixed drinks

There are 3 types of optimization results that people never look forward to getting.

Unfortunately, anyone who runs enough tests will run into these situations. In the following 3 posts I will go over the 3 situations and outline how they happen and how to prevent and handle them.

This first post is about tests with highly mixed data.

How does this happen?
Typically, mixed test data occurs when events that are out of your control, or that you forget to control, affect your test and skew the results.

Your average traffic finds your tested page either through search, browsing around your website or through your planned advertising campaigns. However they get to the page, tests are designed (or should be designed!) based on how visitors will get there.

The problem arises when, outside of the scope of those involved in testing or because of some oversight, a new type of traffic is driven to the page without any preparation being done for that traffic. While traffic is good for your sample size, it is bad because those new visitors are coming in with totally different motivations, assumptions and knowledge. This means they probably will react differently to your tested elements than the traffic you were driving to it originally.

Most often this happens to me because of a new marketing push, such as an e-mail blast, new display ads or promotions at a trade show. This can happen even more unexpectedly if some outside party drives a lot of traffic to your page. A news story or blog review that innocently links to your page can suddenly become a big source of new traffic.

What can you do to prevent this?
The first step is to spread the knowledge around your company that this testing is going on and that anything that may impact the page and its visitors should be run by the optimization team first.

Next, always segment out significant traffic and track it separately. If you segment, the worst case scenario is that they perform the same as your current traffic and you did a little extra work. The alternative is having to trudge through your data, trying to separate the 2 types of traffic and possibly having to restart the test if you can’t separate them out.

Lastly, be aware. Watch your data and look for big changes. If you see something strange or a sudden shift, try to find a cause. It usually will be nothing, but if you do find something, quickly build a segment for it. Even if the “new” traffic has already started hitting your page, a segment should be set up as soon as possible.
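Watching for sudden shifts can be partly automated.  Below is a minimal sketch that flags days whose visits jump well above the running median.  The 2x factor and the 3-day minimum history are arbitrary assumptions you would tune to your own traffic, and the daily counts are invented.

```python
from statistics import median


def flag_traffic_spikes(daily_visits, factor=2.0, min_history=3):
    """Return the indexes of days whose visits exceed `factor` times the
    median of all earlier days."""
    flagged = []
    for day, visits in enumerate(daily_visits):
        history = daily_visits[:day]
        if len(history) >= min_history and visits > factor * median(history):
            flagged.append(day)
    return flagged


# Hypothetical week: an unannounced e-mail blast lands on days 4 and 5.
week = [500, 520, 480, 510, 1600, 1500, 530]
print("days to investigate:", flag_traffic_spikes(week))
```

A flagged day is only a prompt to go find the cause; the segment itself still has to be built by hand once you know where the traffic came from.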

What can you do if this happens?

I would still try to segment the data in any way possible. Even taking certain days or times out of your data may be enough to salvage your results. Do your analysis with and without those days and see if the optimal page changes. Take extreme care when doing this, though, and make sure you still have statistically relevant results.
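The with-and-without comparison can be sketched as follows.  The data layout and all of the numbers are hypothetical; day 3 stands in for a day when an unplanned traffic source hit the page.

```python
def best_variation(daily_results, exclude_days=()):
    """Variation with the highest overall conversion rate, ignoring any
    day listed in exclude_days.

    daily_results: {day: {variation: (visits, conversions)}}
    """
    totals = {}
    for day, results in daily_results.items():
        if day in exclude_days:
            continue
        for variation, (visits, conversions) in results.items():
            v, c = totals.get(variation, (0, 0))
            totals[variation] = (v + visits, c + conversions)
    return max(totals, key=lambda name: totals[name][1] / totals[name][0])


results = {
    1: {"A": (500, 30), "B": (500, 35)},
    2: {"A": (500, 31), "B": (500, 36)},
    3: {"A": (2000, 200), "B": (2000, 150)},  # suspected outside traffic
}
print("winner with all days:   ", best_variation(results))
print("winner without day 3:   ", best_variation(results, exclude_days={3}))
```

If the winner flips when the suspect days are removed, as it does here, the mixed traffic is driving the result and the remaining data needs to be checked for sample size before you act on it.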

My next solution is just to restart the test. Testing is about continual growth and you can’t always get what you want out of every test. Be happy that you got some extra traffic and try to take precautions to take it into account, or prevent it, the next time around.

Let me know if you’ve ever run into these problems before and how you handled them. Look out for part 2 of this series in the next week.

Photo Source (under CC)

How to get ideal test conditions (and results)

A big mistake in testing is to overlook variables inside and outside of the test that impact results. In an ideal test, the only variables would be the ones you are testing on your page. That usually isn’t possible, but as long as you account for the other variables in your analysis, you will get correct and actionable information.

If you test a seasonal page, then the optimal page you get for that season probably won’t perform when the season ends. By not paying attention to those kinds of variables, you are fooling yourself into thinking you’ve found the optimal page. The same type of mistake is made by grouping e-mail, print, SEM campaigns and event traffic, unless you know they react the same to your changes.

Even within segments, there might be more segments to uncover. Your only limitation should be traffic; don’t segment so granularly that you can’t run a decent sized test in a decent amount of time.

One of my clients doesn’t get a lot of traffic, but the traffic he does get falls into two very distinct segments. One converts in the single digits and the other converts in the teens. Although combining them would get me more data, it would be very confused data since they convert so differently.
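Here is one way combined data can mislead, sketched with invented numbers.  Assume one segment converts around 5% and the other around 15%, and that the two pages happened to receive a different mix of the two (for example, because a traffic source was added mid-test).

```python
def blended_rate(segments):
    """Overall conversion rate across a list of (visits, conversions) segments."""
    visits = sum(v for v, _ in segments)
    conversions = sum(c for _, c in segments)
    return conversions / visits


# (visits, conversions) for the low-converting and high-converting segments.
page_a = [(800, 40), (200, 30)]   # 5% and 15% within each segment
page_b = [(200, 8), (800, 112)]   # 4% and 14% within each segment

for name, page in (("A", page_a), ("B", page_b)):
    per_segment = [round(c / v, 3) for v, c in page]
    print(name, "per segment:", per_segment, "blended:", round(blended_rate(page), 3))
```

Page A wins inside both segments, yet page B looks far better once the segments are lumped together, purely because it received more of the high-converting traffic.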

A few things to look out for:

  • The ad or offers visitors see beforehand
  • Interactions between your factors (if you aren’t testing interactions)
  • Technical problems
  • Problems that occur before or after the tested page

A note about the last bullet: the problems can range from a technical problem to a problem with the overall funnel. If people get different experiences in the funnel that drastically impact whether they convert or not, it can add noise to your test. Some examples are different checkout processes for registered and non-registered users, or users being ineligible for service.

The purpose of testing is to find out if a certain element performs well under the conditions you provide. If you aren’t paying attention to all the conditions, then the results you derive will be incorrect without you knowing.

3 steps to quickly make a good multivariate test

Having great testing technology puts a lot of power in your hands. You can test anything and everything you want. However, like any other tool, to use it effectively you have to use it right. There are a lot of best practices and a lot of thought that go into test design, but following these three rules can get you a good test in most situations.

Steps
  1. Maximize your traffic: Pack as much as you can into a test for the amount of traffic you have to keep it a short test. Using Widemile’s platform, that’s 2 weeks to be safe; with Google Optimizer, you should do at least a month (explanation).
  2. Test opposites: If you test stuff that’s similar, they’ll perform about the same. So find out the general theme you should be following first by testing opposites (B2B vs B2C, podcast vs ebook, descriptive vs benefits).
  3. Learn from the previous test: Always make sure you line up your tests so that you learn something that can be used in the next one to either refine or to learn something new.

The goal of these three things is to maximize your time spent testing by testing as much as possible while also minimizing testing suboptimal content. For example, if I were selling iPods and I tested 2 images of people running with the iPod, one with a man and the other a woman, I might think that was a good test. However, I could have totally missed out on an image that worked better, such as an iPod next to a PC. I could test that out after the initial test, but then I just wasted one test run. The right way would be to test one sport image versus one PC image and find out which direction to go. From there I could test against other opposing images or refine the PC image.

The only warning I’d throw in is that if you’re trying to test a lot of things at once, you might want to scale back. Pick 2-4 themes depending on your test size and stick to testing them out. Don’t mix and match.

Follow these steps and you’re on your way to getting not quick tests, but efficient ones.

What's an average conversion rate? 40%!

Would you believe that? And if it were true, would it really mean anything to you? It shouldn’t.

Conversion Rate Table

I get asked this question fairly often and at first glance it seems like a logical question to ask, but really the focus should be elsewhere.

From my experience, conversion rates range from less than a percent all the way up to 30% or more. Does knowing that help me optimize my clients’ pages? No. Every page has so many variables internally and externally that it is very difficult and nonsensical to worry about the average conversion rate.

The goal of your page, the differences between your product/service and your competition’s, the audience you’re trying to reach, the channels you advertise in and numerous other factors all affect your conversion rates. A competitor having a higher conversion rate than you does not mean you’re doing something wrong. Set the baseline for yourself and keep improving it. That’s how marketers should approach their conversion rate.
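Measuring yourself against your own baseline comes down to one small calculation.  The sketch below uses made-up rates; relative lift is the usual way to express the improvement, but the function name is my own.

```python
def relative_lift(baseline_rate, new_rate):
    """Improvement of new_rate over baseline_rate, as a fraction of the baseline."""
    return (new_rate - baseline_rate) / baseline_rate


# A hypothetical page moving from 2.0% to 2.5% has improved by a quarter,
# even if some industry "average" happens to be higher than both.
print(round(relative_lift(0.020, 0.025), 2))
```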

If you’re testing, you’ll find out if your campaign is performing suboptimally and find out what the optimal is at the same time.

Pretty amazing huh?

I don’t tell clients I’m going to get their conversion rates above industry averages, I tell them that I’m going to make their campaign as successful as possible. Do that and you’ll be ahead of competition and ahead of where you were when you first started.

Google Optimizer is slow (or Not all Multivariate Testing is the same)

*Update: Hello!  If you’ve found this article after reading the book Always Be Testing, I encourage you to take a look at a more recent and in-depth article I’ve written here: An Essential Primer on Full and Fractional Factorial Test Design.  Thanks for visiting!

Without knowing better, people might assume that there’s only one method of multivariate testing, and that it was figured out long ago by math and statistics wizards. I have learned otherwise from Widemile’s personal math wizard, Chief Scientist Vladimir Brayman.

(Just as a side note, he does not have a typical office. Rather than papers and folders strewn about, he has statistics and math books. Lucky for me though, he has a great skill at distilling all the goodness in those books and teaching me what I need to know, in a way I understand.)

Most recently, we discussed why Widemile’s technology trumps Google Optimizer.

Widemile vs Google

Having a strong creative team and testing experts ensures better results than giving a marketer a tool like Google Optimizer; that’s easy for most people to understand. But explaining how Widemile’s technology can test more, faster, is a little more complicated.

Let’s explore how Google’s testing works versus Widemile’s. Google Optimizer uses full factorial test design, meaning it creates a page for every combination of your tested page elements. So if you wanted to test 4 different hero shots, 4 buttons and 4 headlines, that would require 4*4*4=64 page combinations. The disadvantage of this method is that you need significant traffic for each of the 64 pages. Meaning you either need a lot of traffic or a lot of time; for most companies, they’ll need both.
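The combination count is easy to reproduce.  This sketch enumerates every page for the 4 x 4 x 4 example; the element names and the total visit figure are placeholders.

```python
from itertools import product

# Placeholder names for the 4 hero shots, 4 buttons and 4 headlines.
hero_shots = [f"hero_{i}" for i in range(1, 5)]
buttons = [f"button_{i}" for i in range(1, 5)]
headlines = [f"headline_{i}" for i in range(1, 5)]

# Full factorial: one page for every combination of the three elements.
pages = list(product(hero_shots, buttons, headlines))
print(len(pages), "page combinations")


def visits_per_page(total_visits, n_pages):
    """Visits each page receives if traffic is split evenly."""
    return total_visits / n_pages


# With a hypothetical 32,000 visits, each page sees only a sliver of the traffic.
print(visits_per_page(32000, len(pages)), "visits per page")
```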

To solve this, Widemile’s optimization platform uses fractional factorial test design. This method tests only a small fraction of the total possible page combinations and uses statistical analysis to derive almost all of the same information that would be found in a full factorial test. This works because only marginal information is gained by testing all 64 page combinations, while testing a few important combinations tells us nearly everything we need to know.
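As an illustration of the idea (not of Widemile’s actual arrays, which I am not reproducing here), a textbook Latin square construction picks 16 of the 64 pages so that every pair of elements is still tested together in every combination.  Levels are numbered 0-3 and the mod-4 rule is the assumption that defines this particular fraction.

```python
from itertools import product

LEVELS = 4


def latin_square_fraction(levels=LEVELS):
    """16 of the 64 combinations: choose hero shot and button freely, then
    set the headline level to (hero + button) mod levels."""
    return [(h, b, (h + b) % levels) for h, b in product(range(levels), repeat=2)]


def every_pair_appears_once(runs, levels=LEVELS):
    """Check that each pair of factors shows every level combination exactly once."""
    for i, j in ((0, 1), (0, 2), (1, 2)):
        pairs = [(run[i], run[j]) for run in runs]
        if len(pairs) != levels * levels or len(set(pairs)) != levels * levels:
            return False
    return True


runs = latin_square_fraction()
print(len(runs), "pages instead of", LEVELS ** 3)
print("balanced across every pair of factors:", every_pair_appears_once(runs))
```

This small design is balanced enough to estimate how much each hero shot, button and headline contributes on its own; learning about interactions between them takes a larger or more carefully chosen array.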

Google actually criticizes fractional factorial test design (look here where it says “A note about ‘fractional factorial testing’”), saying that it requires the same number of impressions but cannot derive the depth of conclusions that a full factorial design can. While it is true that full factorial squeezes out the most information, that comes at the cost of extending the test many times longer than a fractional factorial test, all to learn the smallest influences.

Doing successive tests to find high influence items with fractional factorial testing will get much higher gains than getting every ounce of information out of one extremely long full factorial test. In addition, with a carefully designed fractional factorial test you can learn all the major influences and the interactions between elements on the page.

Fractional factorial test design gets you a completed test in weeks rather than months or years even, and because of that, you can test more than you would normally be able to in the same time frame. You can either test more in one larger test, or do many smaller successive tests.

Not to say that Google Optimizer isn’t a great tool, especially since it is free, but any company that spends thousands of dollars on SEM has a lot to gain by using technology that gets rapid results.

If you have any questions about this, let me know and I’ll try to answer them or get you an answer.