Archive for the 'Methodology' Category

Why optimization is like social media

Quick, short-term wins are possible with optimization; however, the real value comes as it builds and evolves with your audience. Because of this, I liken optimization to social media.

Here are 3 quick reasons why optimization and social media are similar:

1. Users tell you what’s wrong and you can respond
Your users are the most valuable resource you have. Not only do they give you their personal information, hard earned cash and/or attention, they give you feedback on how you can serve them better.

In social media this comes in a push/pull fashion, allowing you to alert users about your own actions, as well as respond directly to their immediate concerns. The same happens with optimization, except that it occurs naturally as users interact with your website. Their actions reveal flaws in your offers, messaging and design and you can use that information to build a better experience and new tests.

2. Time sensitive
You wouldn’t blog or tweet the same thing every day, so why would you keep your website static? Optimization allows your website to be responsive to your users’ current needs. Anything from a change in opinion to seasonality effects to new marketing campaigns can cause a need to adjust your website.

While some items don’t need to be changed for long periods, you should constantly question whether what you are using is also what’s best.

3. It is a competitive advantage
Social media has caught on; however, few companies are using it to its full potential. Optimization is in a similar boat, but it’s even earlier in its life. How many social media experts can you name versus optimization experts?

Starting an optimization program now means you’ll be that much further ahead once testing becomes mainstream.

In summary, optimization is rapidly becoming a basic part of the online marketing landscape. With this new technology come more opportunities to win new customers and retain current ones. However, making the most of it will be what separates the winners from the losers. Optimization is a long-term commitment, just like social media, so keep tweeting but don’t forget to keep testing too.

Photo Credit: http://www.flickr.com/photos/carrotcreative/ / CC BY 2.0

What skills are necessary for optimization?

Use analytics? Update your website?  Then you have everything you need.

While optimization is a distinct process, it shares the same skill set as these common online marketing practices.

Similar to analytics, optimization requires implementation, data analysis and measurable marketing goals.  And as with updating a website, you need creative and design expertise, web development and copywriting.

The example optimization workflow below illustrates which of the above skills and resources are needed, and when:

  1. Planning: At the beginning, all you need are basic marketing skills: select a page, spell out the questions you have about the page and determine the KPIs for success.
  2. Design: Use copywriting and creative/design skills to create test ideas to answer your questions and drive performance based on the selected KPIs.
  3. Build out: Use web development resources to translate your ideas into code. You may be able to do this step yourself, depending on the content of the test and your own technical abilities.
  4. Reporting: After the test is live, you need to analyze the data. You’re looking for answers to your questions (Do testimonials increase sales?) and new insights (We don’t have to use Flash to grab attention).

While some education is necessary, optimization utilizes skills familiar to online marketers. Optimization isn’t more difficult than other online marketing; it’s just different.

I recommend starting testing ASAP, even if it is with a small portion of a web page and/or will have a small impact. Going through the process will help make it a natural part of your marketing cycles. After you have adjusted by doing a test or two, running a large-scale optimization campaign quickly becomes not only feasible, but some of your most important work.

Photo Credit: http://www.flickr.com/photos/confused_andy/ / CC BY-SA 2.0

Optimizing registrations: Taking a look at Picnik

A huge part of doing optimization well is knowing what to test (put garbage in, get garbage out), so keeping up with good design philosophy is extremely valuable.  While brushing up on web design, I came across a Smashing Magazine article on UI design trends by Janko Jovanovic.  He uses a lot of great examples of good design, some of which are perfect for illustrating some optimization options.

With that in mind, I’m going to examine one of the sites mentioned and discuss the good, the bad and the testing opportunities I see. The (lucky?) site I picked was Picnik, which has done a commendable job on their registration strategy. (Also, like Widemile, they are a Seattle start-up.) I only wish the site wasn’t Flash-based, which makes it more difficult to optimize. Despite that, my thoughts on test variations and best practices are still applicable to it and to any other registration campaign.

Quick summary: Picnik is an online photo editing application.  You can upload photos and do easy photo editing all within the browser.  You can try out the app, even exporting and saving photos, without registration.

Let’s get started by checking out their form:

[Screenshot: Picnik registration form]

Although it is a bit busy, I like the way the form assists users. It has a green highlight for the selected field and dynamically pops up error messages (see the username alert below). Additionally, it hides and locks the “again” fields until there is valid input in the corresponding field.

[Screenshot: Picnik registration form showing the username alert]

One highlight is that this is a good example of when a lightbox/page overlay type form might be appropriate (note that behind the form is the page I was working on, which has been darkened). Why is it appropriate? Because this is the form that pops up after the user clicks “Register.” It makes sense to be direct and reduce additional marketing if the user indicates they want to sign up by clicking directly on the register button.

Is this right for your site/landing page/microsite? It’s hard to say, but I would recommend testing it. This would fall into the category of a funnel test because it eliminates a page in the registration funnel. As long as your full page and lightbox form don’t have any glaring issues, you should quickly see whether a small, direct lightbox form works or whether a whole page with additional information is necessary.

In terms of testing this overlay form, there are a few big opportunities for improvement.

  • Testing title and intro copy. Use “free” in the headline and as the first word, e.g. “Free registration”, then list a few benefits rather than saying “All we need is a username, password, and email address.”
  • Eliminate typing passwords and emails twice. Test this to see if it has a negative impact on registrations and if it creates a lot of nonstarters (people who register but never return to the app).
  • Change the color of alerts to red instead of green because green is the site’s hyperlink color and also used for highlighting the selected field.
  • The button should stand out. Calls to action typically work better when they are a different color from the rest of the site. The button copy should also be amped up a bit to “Get Started Editing”, “Save your photos now” or something similar.

So how does Picnik capture users that don’t click register directly?  They offer it after a photo is saved:

[Screenshot: Picnik full-page registration offer shown after a photo is saved]

As you can see, this page has a lot more content than the lightbox form since it’s a full page. It has the job of pushing someone into registering after having used the product. This is a good technique (mentioned in Jovanovic’s article), but there’s always the question of whether you’re offering too much or too little. Testing how much to offer would be a very interesting and fruitful optimization campaign.

Overall, I’m not a huge fan of this page, but I do like the approach. It has continuity at the top, showing the actual photo edited, and the form and main registration benefit (“Want Picnik to keep a copy?”) are prominent. Also, they have structured the page to prioritize their conversion goals, keeping the focus on registration but still advertising the opportunity for people to print their photos or sign up for premium service below.

Here are a few recommendations to improve this page:

  • What’s the clock icon for? Make the headline bigger or put in an informative image that will help encourage registration.
  • Make the bullet points more prominent. The bullets disappear once the form begins to be filled out, using the same alert and field revealing technique I described with the previous form.  I would make sure the bullets stay on the page.
  • Test all the copy.  It’s hard to know what feature is most important to users without testing.  Uploading more photos might be more appealing or saving their connections to Flickr and Facebook.
  • Change the buttons. “Close photo” and “Create my account” look the same; they should be differentiated to emphasize their individual actions. A primary call to action needs to stand out. Also, I would make the “Close photo” and “Continue editing” buttons much smaller to discourage immediate attention and clicks on those buttons, the point being to drive people to read the registration benefit copy.

Optimizing for registration involves many steps beyond just improving the registration pages. You can delve into when to ask for registration, test the ROI of emphasizing different products and then execute segmentation-focused pages as well. However, the easiest returns will come from simple fixes like the ones I’ve discussed above.

I hope talking over a real example was helpful. Let me know if you’d like me to do more of these and if there are any great sites out there I should look at.

Rules for a successful multivariate test (Billy’s Optimization Guide Part 3)

If you missed it, see Part 1 (A/B Split Testing) and Part 2 (Multivariate Test Basics).

With the basics of Part 2 down, it’s time to start designing a multivariate test. Every optimization project has different challenges and goals. Luckily, though, there are a few rules that apply to every multivariate test design. These rules fit into two categories: technical rules and content rules.

Technical rules:

  1. Choose the appropriate multivariate test type (full or fractional factorial)
  2. Determine the number of factors and levels that can be tested based on estimated conversion traffic (choose a test array)
  3. Stop the test when it has stabilized, not based on your earlier estimations

These rules ensure statistical significance by constraining the test to the appropriate size at the beginning and then letting the test gather the proper amount of data at the end.

Running a test full factorial, if your traffic supports it, may be a good choice if you’re testing content that you believe to have many interactions or if you only want to test 2 factors with 2 levels each.  (Note: the smallest fractional factorial test size is 3 factors with 2 levels each.)  Typically though, you’ll want to run a fractional factorial test to save time and expand the number of factors and levels you can test.

In order to find out how many factors and levels you can test, you need to have some idea of your predicted page views and conversions, as well as an estimate of lift. The reason lift matters is that a large lift will get you more conversions, so your test will stabilize more quickly. Because of this, I would be conservative with lift estimates to ensure that the test is not designed too large. At Widemile, we have a large list of arrays available to our tool and have calculated the approximate conversions needed to stabilize, allowing me to look at the three criteria I listed and find the arrays that are statistically viable for testing. You should look for something similar with your tool of choice.
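To make the sizing step concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function names, the crude "baseline conversions times one plus lift" model, and the array names with their conversion requirements are hypothetical, not Widemile's actual arrays or figures.

```python
def predicted_conversions(page_views, baseline_rate, estimated_lift):
    """Conversions expected over the test, using a conservative lift estimate.

    Assumption: a simple model where lift scales the baseline conversions.
    """
    return page_views * baseline_rate * (1 + estimated_lift)


def viable_arrays(arrays, available_conversions):
    """Keep the arrays whose conversion requirement fits the prediction."""
    return [name for name, needed in arrays.items() if needed <= available_conversions]


# Hypothetical arrays: name -> approximate conversions needed to stabilize.
arrays = {
    "3 factors x 2 levels": 400,
    "5 factors x 2 levels": 800,
    "4 factors x 4 levels": 1600,
}

available = predicted_conversions(page_views=40_000, baseline_rate=0.02, estimated_lift=0.10)
print(viable_arrays(arrays, available))
```

Lowering the lift estimate shrinks the list of viable arrays, which is the reason to stay conservative with it.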

To figure out when a test has stabilized, I prefer to look primarily at level influence stabilization, with experiment conversion rate stabilization for support. Widemile Optimize shows this using graphs, so I simply look for horizontal trending of lines, meaning winning levels and experiments stay winners and their level of influence or conversion rates stay fairly constant (look horizontal) over 3-5 days. If you don’t have graphs available, look at the historical cumulative conversion rate for your experiments and see whether there is a lot of variance across the latest few days of your test.
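If you have to do that check by hand, something like the following Python sketch could help. It computes the cumulative conversion rate day by day and calls an experiment stable when the last few values barely move. The function names, the 3-day window and the 5% tolerance are my assumptions for illustration, not how Widemile Optimize decides.

```python
from itertools import accumulate


def cumulative_rates(daily_visitors, daily_conversions):
    """Cumulative conversion rate at the end of each day."""
    total_visitors = accumulate(daily_visitors)
    total_conversions = accumulate(daily_conversions)
    return [c / v for c, v in zip(total_conversions, total_visitors) if v > 0]


def is_stable(rates, window=3, tolerance=0.05):
    """True when the last `window` cumulative rates vary by less than
    `tolerance` relative to their mean (an illustrative threshold)."""
    if len(rates) < window:
        return False
    recent = rates[-window:]
    mean = sum(recent) / window
    if mean == 0:
        return False
    return (max(recent) - min(recent)) / mean < tolerance


# Made-up daily numbers for one experiment.
visitors = [500, 520, 480, 510, 495, 505, 500]
conversions = [10, 30, 22, 26, 25, 25, 25]

rates = cumulative_rates(visitors, conversions)
print(is_stable(rates))
```

Run it per experiment (or per level) and only call the test done when the leaders have been stable for several days in a row.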

Content rules:

  1. Every item you test should answer an important question
  2. Test variety not quantity
  3. Test opposites first then refine
  4. Remember you can run more than one test

The content rules are closely tied together. In effect, they ensure that the items selected for testing have purpose and that they don’t needlessly expand the size of your test, reducing its efficiency. I begin designing tests by creating hypotheses regarding issues with the page and then choose factors and design levels to address those issues.

An example hypothesis is “Having a hero shot on the right side of the page causes users to ignore the important value proposition on the left side.”  To test this, I would choose hero shot position as a factor and then have “left side hero shot” as the baseline level and “right side hero shot” as the second level.  This example also illustrates that, other than headlines and images, testing layout is possible with creative use of CSS and sometimes JavaScript.  As long as you can revert from one to another and it matches the other factors and levels, you are at liberty to test anything.

Coming back to the rules, make sure that you are testing as few items as possible to find out what you need.  Before testing a collection of lifestyle hero shots, choose one and test it against an iconic hero shot.  This will save you the time of going down a path of testing something that may not work.

Lastly, you aren’t going to be able to get the best page on the first run or even second, third, etc.  If you knew what your audience liked 100% of the time then you wouldn’t need testing.  Remember to think of your overall test plan beyond just the first run, so that you can answer all the questions you need without having to force everything into one test.

In summary, determine what you’re trying to achieve, select the proper testing method to meet those goals and then make sure to be purposeful and efficient with the content you end up testing in front of your visitors.  Testing and optimization is not difficult, although it can be tough to start.  Follow these rules and you’ll be on your way to conquering conversion rates, bounce rates, funnel drop-offs and many other metrics.

Photo credit: Aranda\Lasch (CC)

My response to Google's Lead Designer leaving because of testing culture

I recently read a blog post by Douglas Bowman, Google’s former Visual Design Lead, about why he left Google. In it, he describes how the engineering culture contributed to his decision to leave:

When a company is filled with engineers, it turns to engineering to solve problems. Reduce each decision to a simple logic problem. Remove all subjectivity and just look at the data. [...] that data eventually becomes a crutch for every decision, paralyzing the company and preventing it from making any daring design decisions.

He then references Google testing 41 shades of blue and a recent debate over “whether a border should be 3, 4 or 5 pixels wide,” in which he was asked to provide data to back up that decision.

Bowman’s post brought up some feelings of disappointment towards Google because, despite having their own optimization tool, they did not create a culture that encouraged their lead designer to expand his work, and actually drove him away. Optimization and testing are still in their early stages, so mistakes will be common; however, I hope news like this doesn’t scare others away from testing.

Rather, I hope companies can learn from Bowman’s experience. Instead of holding designers to every detail, testing should allow them to explore, learn and refine their ideas. Testing should not prevent “any daring design decisions”; I feel it should actually encourage them. As I said before, gamble with your conversions to raise them.

In the end, it’s all about having an understanding of how testing should and should not be used. You can use testing to find the best shade of blue, but that doesn’t necessarily mean that’s what you should be testing right now. Don’t be afraid to take a step back and try something new rather than fiddling with details; testing tools give you that freedom. Big risks reap big rewards in optimization. Not taking risks leads to inefficient testing and, in Google’s case, a designer’s resignation.

Photo credit: i-marco (CC)

How to pick a page to test and optimize

Selecting the right page to test is possibly the most important decision of an optimization campaign.  You can have great ideas, the technology and talent behind you, but if you pick the wrong page you could be doing a lot of work for minimal return.

So how do you get the biggest bang for your buck with testing?  Here’s a quick list of things to look for in a page:

  • A single, specific and easy to measure conversion goal
  • Sizable conversion traffic (at least 200 conversions in a week)
  • A page that suffers from poor design or unclear conversion goal
  • No large technical hurdles to implementing and executing the test
  • A conversion rate that’s lower than comparable pages

Attacking pages with these attributes will get you some easy wins and help establish testing in your company.  Typically landing pages are the best pages to optimize, especially if they have the end conversion goal on the page, e.g. a form submission, download or click-out.
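The measurable parts of that checklist are easy to turn into a quick filter over your analytics data. The Python sketch below is only an illustration: the `Page` fields, the example pages and the comparable conversion rate are hypothetical, and whether a page suffers from poor design still needs a human judgment.

```python
from dataclasses import dataclass


@dataclass
class Page:
    name: str
    weekly_conversions: int
    conversion_rate: float
    has_single_goal: bool
    has_technical_hurdles: bool


def is_good_candidate(page, comparable_rate, min_weekly_conversions=200):
    """Apply the measurable items from the checklist to one page."""
    return (
        page.has_single_goal
        and page.weekly_conversions >= min_weekly_conversions
        and not page.has_technical_hurdles
        and page.conversion_rate < comparable_rate
    )


# Hypothetical pages pulled from an analytics report.
pages = [
    Page("landing-a", 350, 0.021, True, False),
    Page("landing-b", 90, 0.015, True, False),
    Page("checkout", 400, 0.060, True, True),
]

candidates = [p.name for p in pages if is_good_candidate(p, comparable_rate=0.03)]
print(candidates)
```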

From there, I would move on to other pages in the funnel, taking a look at bounce rates to help determine where you need to push visitors further into the funnel. If there are no other pages in the funnel, find other poor or underperforming pages on your site and take a look at them according to the rules above.

The main idea is to see that testing is a process, and that having ideas to improve a page does not mean it is the best page to spend your time improving.

Photo credit: lepiaf.geo (CC)

Breaking down multivariate testing (Billy’s Optimization Guide Part 2)

If you missed it, see Part 1 (A/B Split Testing).  Update: Part 3 on Rules for a Successful Multivariate Test is here.

The technical and statistical aspects of multivariate testing can be complicated, but in order to design successful tests you don’t need to know everything, just the basics of how it works and some guidelines. I’m assuming you already have some understanding of multivariate testing; however, I want to cover the basics and make sure we’re on the same level before going into how to design good multivariate tests.

Check out the wireframe below.  Pretty standard for a landing page, right?  To properly design a multivariate test, we have to look at the page in a certain way.  Using three key terms, factors, levels and experiments, we can break down a test and describe its framework.

Factor: An element of the Web page (headline, image, text) being tested.  The element can also be groups of content, e.g. left column, button and hero shot together, or all banner ads on the page.

Level: Content that is assigned to a specific factor to be tested.  For example, one variation of a hero shot.

Below are 4 factors from our example page (headline, hero shot, offer and button) and then each of those factors with 4 levels represented by the different colors. Note that the levels of one factor do not have to relate in any way to the levels of other factors.

The last term, experiments, makes use of both factors and levels.

Experiment: A unique combination of levels used during a test.

Here you can see 4 different experiments. Each experiment is different and holds a different combination of levels. Note that there are actually many more variations (4×4×4×4 = 256 combinations).
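To see where the 256 comes from, here is a short Python sketch that builds every experiment from the four factors. The level labels (H1, S1 and so on) are placeholders I made up to stand in for real content.

```python
from itertools import product

# Four factors from the example page, each with four placeholder levels.
factors = {
    "headline": ["H1", "H2", "H3", "H4"],
    "hero_shot": ["S1", "S2", "S3", "S4"],
    "offer": ["O1", "O2", "O3", "O4"],
    "button": ["B1", "B2", "B3", "B4"],
}

# Each experiment is one level chosen from every factor.
experiments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

print(len(experiments))  # 4 * 4 * 4 * 4 = 256
print(experiments[0])
```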

Essentially, a multivariate test involves showing these experiments randomly to live traffic while tracking how each experiment performs. The one that performs the best wins. Each experiment is shown to many people, but each person only sees one experiment. (There is some complexity in this. If you are still confused or want to know more, go to my primer on full and fractional factorial testing.)
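One common way to make sure each person only sees one experiment is to derive the experiment from a visitor ID, so a returning visitor always gets the same page. The Python sketch below shows that idea plus a simple tally. It is my own illustration, not how any particular testing tool works; the names are hypothetical, and picking the "best" experiment by raw conversion rate ignores statistical significance.

```python
import hashlib


def assign_experiment(visitor_id, num_experiments):
    """Map a visitor ID to one experiment index, the same one on every visit."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_experiments


class Tally:
    """Track views and conversions per experiment."""

    def __init__(self, num_experiments):
        self.views = [0] * num_experiments
        self.conversions = [0] * num_experiments

    def record(self, experiment, converted):
        self.views[experiment] += 1
        if converted:
            self.conversions[experiment] += 1

    def best(self):
        """Index of the experiment with the highest conversion rate so far."""
        rates = [c / v if v else 0.0 for c, v in zip(self.conversions, self.views)]
        return max(range(len(rates)), key=rates.__getitem__)


tally = Tally(256)
shown = assign_experiment("visitor-123", 256)
tally.record(shown, converted=True)
print(shown == assign_experiment("visitor-123", 256))
```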

In my next post, I will use these terms to outline the rules to creating a great multivariate test.

Are your visitors telling you if you're getting hotter or colder?

In elementary school, I played the game Hot or Cold in class.  The rules of the game are simple:

  • One child is picked as the “searcher” and leaves the room
  • The class collectively chooses an object in the room, like a marker or eraser, for the searcher to find
  • Once the object is selected, the searcher returns to the room and has to find the mystery object as quickly as possible

To help the searcher out, the other kids in the room scream hot, if the searcher gets closer to the object, or cold, if they get farther.

To make the game more challenging, the searcher might be limited to only one clue, just hot or just cold.  Kids that were told both hot and cold found the objects fairly quickly, but if they were only allowed one type of feedback, it took them much longer.

For the same reason that it is hard to find the object in the game without being told both when you are getting closer and when you are getting farther away, in testing, if you don’t design your tests with two distinct variations, you might go wandering for a long time trying to find out exactly what your customer wants.

My metaphor fails in one way, though. In the game, the searcher does find the object eventually, even with just one type of hint. If you don’t design tests correctly, however, you may never find a page that resonates strongly with the audience. You might test dozens of testimonials and find the most successful testimonial, but if you never test it against no testimonial or a review, you may be missing out on even bigger gains.

Let your audience tell you hot and cold by designing your tests intelligently and they’ll help you find the optimal page faster than ever.

Photo credit: Night Owl City CC

Gamble with your conversions to raise them

You and your competitors all have the same landing pages. You have a hero shot of the product, a big call to action button and short, punchy copy. Or maybe you’re already ahead of your competitors and have run a few tests on your page, picking up more conversions on the way. In either situation, you’ll eventually hit a wall and struggle to get additional lift. So how do you continue to improve?

Go for broke.  Try something you’ve never tried before.  It might end up being a total failure, but it also might give you the lift you want.

The gamble you make with optimization can end in 2 ways:

  • You lose X amount of conversions over the week or two that the test is running
  • You gain X amount of conversions for the effective lifetime of the page

The possible upside dwarfs the downside by a large margin and, either way, you learn something new and can optimize the next test more successfully based on what you learned.

Luckily, with skill and experience, the risks of testing are minimized; however, beating a strong page is never easy or guaranteed. But when you do find something new that works, or see that your current page is still a champ, you can rest assured that you’re doing all you can to drive conversions.

An Essential Primer on Full and Fractional Factorial Test Design

What are full and fractional factorial test designs? How do they relate to optimization and what about interactions?

Once you get down and dirty with testing, these questions matter. Whether selecting an optimization platform or trying to thoroughly understand the tests you are building, grasping these concepts will put you in greater control and allow you to design and analyze your tests more effectively.

As simply as possible, I hope to educate you and other marketers about full and fractional factorial test designs and why fractional factorial is the best choice for multivariate testing of online campaigns.

Note: “Partial factorial” and “fractional factorial” are the same. Also, if you don’t have a thorough understanding of experiments and interactions, please read those first.

The tests used in optimization are from the design of experiments field. (From Wikipedia: “Design of experiments is the design of all information-gathering exercises where variation is present, whether under the full control of the experimenter or not.”) The two types of tests I will focus on are fractional factorial and full factorial.

Here is an example I will use to explain these concepts. Below is a test matrix outlining a test for a landing page with 5 factors with 2 levels each. Don’t let the vocabulary scare you away; this means that there are 5 parts of the page being tested and 2 variations of each.

Recipe Matrix: 5 factors = 5 parts (hero shot, headline, etc.) and 2 levels = 2 variations

These factors and their respective levels make up the possible combinations for a landing page. The combinations displayed are called experiments.

Let’s calculate the total number of experiments possible (even if you know how to do this already, this is important to understanding the distinction between fractional and full factorial). There are 2 levels for each factor, so you can have 2×2×2×2×2 (2 to the 5th power) = 32 possible experiments. This means there are exactly 32 combinations of hero shots, headlines, sub headlines, button text and main copy from our matrix outlined above. Note that if we add another factor, it becomes 2 to the 6th power or 64 possible experiments. Additionally, if you add 2 more levels to any one of the existing 5 factors, it will also increase from 32 to 4×2×2×2×2 = 64 experiments.
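The arithmetic is just the number of levels in each factor multiplied together. A tiny Python sketch, with a function name of my own choosing, reproduces the three counts above:

```python
from math import prod


def experiment_count(levels_per_factor):
    """Total combinations = product of the number of levels in each factor."""
    return prod(levels_per_factor)


print(experiment_count([2, 2, 2, 2, 2]))     # 5 factors, 2 levels each: 32
print(experiment_count([2, 2, 2, 2, 2, 2]))  # add a sixth factor: 64
print(experiment_count([4, 2, 2, 2, 2]))     # add 2 levels to one factor: 64
```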

In testing, each experiment must get a minimum amount of measurable conversions, known as the sample size per experiment. This ensures that there is enough data for a solid statistical analysis. Therefore the more experiments you have, the more conversions you need. You can think of conversion data as time also, since the longer you leave your web page up, the more data you get.

Now we’re ready to go back to the difference between the two test designs. Full factorial testing requires that every possible experiment combination is shown, so our 5-factor test would need to display all 32 experiments. This means that if there is a sample size of 100 conversions per experiment, 3,200 conversions will be required. Fractional factorial works differently: it displays a much smaller number of experiments, about 8 in this case, so it would need about 800 conversions.
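Here is the same comparison as a small Python sketch, with a rough duration added. The sample size of 100 and the experiment counts come from the example; the 400 conversions per week is a hypothetical traffic figure I picked to show how the gap translates into time.

```python
from math import ceil


def conversions_needed(num_experiments, sample_size_per_experiment):
    """Total conversions a test needs before every experiment has its sample."""
    return num_experiments * sample_size_per_experiment


def weeks_needed(total_conversions, weekly_conversions):
    """Whole weeks required at a steady conversion volume."""
    return ceil(total_conversions / weekly_conversions)


full = conversions_needed(32, 100)       # full factorial: 3,200 conversions
fractional = conversions_needed(8, 100)  # fractional factorial: 800 conversions

# Hypothetical page converting 400 times per week.
print(weeks_needed(full, 400), weeks_needed(fractional, 400))
```

At that assumed volume, the full factorial test runs four times as long as the fractional one.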

Since full factorial gathers additional data, it reveals all possible interactions, but as seen by the numbers above, there is a trade-off. More data equals more information but more data also equals a longer test duration. The minimum data requirements for full factorial are very high since you are showing every experiment.

Even if you are using full factorial to get the same amount of information as a fractional factorial test, it will take more time since you need more data to see statistically relevant differences between the many experiments.

You might be wondering how fractional factorial can be accurate if interactions are possible.

Random interactions of high relevance are very rare, especially when looking for interactions of more than 2 factors. You really need to design tests where you look for meaningful interactions that are based on true business requirements rather than hoping for a random and low influence interaction between a red button, a hero shot and a headline.

Whatever the interaction is, you need to be able to understand your audience and infer why there was an interaction in the first place. Only then are you ready to start designing for interactions.

Tests should not be filled with random levels; they should be carefully designed for success by focusing on testable hypotheses around the audience. Could a 1-pixel drop shadow on a button interacting with the copyright statement ever be truly significant, and not a victim of random error? Is it worth sacrificing thousands of conversions to learn a lesson that won’t result in any relevant increase of real world conversions?

There are interactions that might make sense to measure, and those that should not be measured because of the amount of testing time they add.

This brings me to fractional factorial. It is possible for fractional factorial tests to detect interactions. How so? Using our example of a 5-factor test, fractional factorial can include everything from only main-effects all the way to 4-factor interaction effects. Full factorial’s only difference is that it is the full extension and includes the 5-factor interaction effects.

Fractional factorial is not a one-trick pony; it is a continuum ranging from testing for no interactions (only main effects) to one factor less than full factorial. It is exactly what the name fractional implies; even one less is a “fraction” of full factorial. It gives you the power to make trade-offs between testing only main effects and testing for interactions, based on intelligent test design.
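For the curious, here is a Python sketch of what the main-effects end of that continuum can look like for our 5-factor example: an 8-experiment design instead of 32. It uses a standard textbook construction (three factors get a full factorial, the other two are derived from them). The specific generators D = A×B and E = A×C are my choice for illustration and not necessarily the array any particular tool uses.

```python
from itertools import combinations, product


def fractional_design():
    """2^(5-2) design: 8 runs for 5 two-level factors, coded -1 / +1.

    Factors A, B, C form a full factorial; D and E are derived from
    the generators D = A*B and E = A*C.
    """
    return [(a, b, c, a * b, a * c) for a, b, c in product((-1, 1), repeat=3)]


design = fractional_design()

# Balance: each factor shows each of its two levels in half of the runs.
balanced = all(sum(run[i] for run in design) == 0 for i in range(5))

# Orthogonality: every pair of factor columns is uncorrelated,
# which is what lets main effects be estimated separately.
orthogonal = all(
    sum(run[i] * run[j] for run in design) == 0
    for i, j in combinations(range(5), 2)
)

print(len(design), balanced, orthogonal)
```

The price of the smaller design is that main effects are mixed with some two-factor interactions (D cannot be told apart from the A×B interaction), which is exactly the trade-off described above.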

Once you decide to test for all possible interactions, you are committing to a full-factorial test and incur the associated traffic requirements. I’d love to see a test design that is designed for full interactions and still makes sense! Not having the ability to reduce the number of interactions is a huge detriment rather than a benefit of solutions limited to full-factorial testing.

Radically shorter test times allow for many more smart marketing ideas to be tested and adapted based on what you learn from each test run. You, the marketer, have the ability to analyze your results and tweak follow-on tests to capitalize on what you learn. This common-sense approach is what hypothesis-based testing is all about, and it is very powerful. Focus on testing smart ideas to increase your conversion rate – that’s what matters most.

The graph below illustrates how much information is gained and the amount of testing needed, based on the number of interactions tested.

In my experience, the red area shows how valuable the data is based on which effects are being tested, while the blue area shows the amount of data (or time) needed to gather the data to confirm those effects. The x-axis goes from left to right, from main effects to full factorial (5-factor effects).

At Widemile, we believe it is more effective to perform quick, successive tests detecting only main effects rather than randomly hoping for interactions. While interactions might give you small or even large gains, they will likely never trump the gains from additional testing, nor make up for the time and money lost looking for random interactions. The additional time required for full factorial tests is large, and not many marketers want to wait more than a month for a test to complete.

Fractional factorial is preferred by a few camps, including Widemile, Omniture’s Test&Target (formerly Offermatica) and Interwoven’s Optimost. Full factorial is used in Google’s free Website Optimizer and some tools offered by smaller providers.

Testing for all interactions sacrifices a lot of time. With the speed that audiences, marketing campaigns and seasons can change, it is important to get the most testing done in the least amount of time without sacrificing the quality of the data. Fractional factorial allows you to do just that, making it the wisest choice for multivariate testing.