First, Last and Equal Attribution – 3 Wrongs Don’t Make It Right

February 18th, 2009

Topics: Digital Marketing, Innovations, Optimization

Hi Everyone,

Last week I had the pleasure of listening to Eric Peterson speak not once, but twice. The first time was during a Coremetrics webinar on campaign attribution and the second later that evening at the local Web Analytics Wednesday where Eric delivered a longer presentation that included the same attribution material. And while I have a great deal of respect (and even friendship) for Eric and an equal amount of respect for Coremetrics, I feel a need to challenge the content.

For a while I’ve been speaking about the emergence of what I call the third generation of web analytics. For those who haven’t heard me present this before, the first generation was characterized by IT departments measuring web site activity via software installations of log file analysis tools. The second generation was dominated by marketing departments utilizing hosted solutions and page tagging. The primary value these two generations of solutions provided was aggregate reporting, along with rudimentary ad-hoc analysis capabilities (rudimentary, that is, compared to modern business intelligence systems).

Whereas the first two generations were characterized by reports, the third is certainly about the data – open access to un-aggregated visitor detail data and the endless forms of true analysis that can be performed on it. Knowing that Coremetrics is one of the few major vendors to store un-aggregated data in an industry-standard database (along with WebTrends), I was expecting a thoughtful discourse on statistical modeling. Alas, what we were told was to utilize not one but three flawed attribution models (last, first and equal) – in hopes, I suppose, that three wrongs would make a right.

Ever since our high school statistics classes we have been taught the difference between correlation and causation. Statistics show that as ice cream sales increase, so do drowning deaths. Therefore, ice cream causes drowning, right? Of course not – it is the onset of warmer temperatures that indirectly leads to both. As trite as this example may seem, it is no different from the fallacy that a campaign’s inclusion in a visitor’s click-path prior to conversion means that it had a causal effect on the conversion, or that it belongs in our campaign portfolio. The same campaign may have been clicked on by many more non-converting visitors … at substantial expense.
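
A toy simulation makes the point concrete. This is made-up data, not a real analysis: temperature drives both series, so the raw correlation is strong, yet it disappears the moment temperature is held fixed.

```python
# Hypothetical illustration: temperature drives both ice cream sales and
# drownings, producing a strong correlation with no causal link between them.
import numpy as np

rng = np.random.default_rng(0)
temperature = rng.uniform(10, 35, 1000)                 # daily high, deg C
ice_cream = 5 * temperature + rng.normal(0, 10, 1000)   # sales
drownings = 0.2 * temperature + rng.normal(0, 1, 1000)  # deaths

# The raw correlation looks "causal" (~0.8)...
print(np.corrcoef(ice_cream, drownings)[0, 1])

# ...but vanishes once temperature is controlled for: correlate the residuals
# left after regressing each variable on temperature (~0.0).
r_ice = ice_cream - np.polyval(np.polyfit(temperature, ice_cream, 1), temperature)
r_drown = drownings - np.polyval(np.polyfit(temperature, drownings, 1), temperature)
print(np.corrcoef(r_ice, r_drown)[0, 1])
```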

True, if a visitor clicked on a campaign prior to conversion, it’s certainly more likely to have had a causal impact. That’s especially true for the last campaign. But if we’re going to finally break away from the flawed last-click attribution model, why not do it correctly? We have the data – let’s use a statistical model.
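
To make “use a statistical model” less abstract, here is a minimal sketch of one candidate: a logistic regression of conversion on per-visitor campaign exposure flags. The data is simulated and the feature set is deliberately tiny – this illustrates the idea, not the model any vendor ships.

```python
# Minimal sketch: regress conversion on campaign exposure, per visitor.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
df = pd.DataFrame({
    "email": rng.integers(0, 2, n),        # 1 = visitor saw/clicked the campaign
    "display": rng.integers(0, 2, n),
    "paid_search": rng.integers(0, 2, n),
})
# Simulated truth: email helps, paid search helps more, display does nothing.
true_logit = -3 + 0.8 * df["email"] + 0.0 * df["display"] + 1.2 * df["paid_search"]
df["converted"] = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X = sm.add_constant(df[["email", "display", "paid_search"]])
fit = sm.Logit(df["converted"], X).fit(disp=0)

# Each coefficient comes with a standard error and p-value - evidence of
# contribution, rather than credit assigned by click order.
print(fit.summary())
```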

Now, for the less-than-mathematically-savvy user of web analytics: no, this doesn’t mean your solution will be more complicated. Quite the contrary. Before credit card companies implemented mathematical models to detect fraud, we consumers would first learn of fraud only after we received our statement. Then, after weeks of arguing with the card issuer, we might have gotten the charges removed. Today we get a phone call within hours of the questionable transaction and a new card sent to us overnight, no questions asked. Math made our lives easier.

So it will be for campaign attribution. Imagine a campaign report that tells you, in a statistically valid way, which campaigns and campaign attributes actually made a positive contribution to conversion and to your campaign budget, versus those that didn’t. Then imagine that same report telling you how to improve results. I propose the following report:

Dream Campaign Report

Don’t sweat the details – I just punched some example data into a spreadsheet. Instead, focus on the bigger picture of having a report that shows you how your campaigns truly performed and recommends an adjusted mix based on the current set of campaigns. Then imagine the data for auction-based networks being passed automatically to an automated campaign optimization system. Now that would be progress towards true optimization of campaign budgets, while also making the marketer’s job much easier.
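
Continuing the sketch above, here is one hypothetical way such a report could be assembled from the fitted model. The “contribution index” (an odds ratio here) and the recommendation rule are my guesses at what the mock-up’s columns might mean, not definitions from the webinar.

```python
# Continues the earlier sketch (reuses df, X, fit and the imports).
report = pd.DataFrame({
    "contribution_index": np.exp(fit.params),  # odds ratio per campaign
    "p_value": fit.pvalues,
}).drop("const")

# Naive recommendation rule from sign and significance - illustrative only.
report["recommendation"] = np.where(
    report["p_value"] > 0.05, "no measurable effect - cut or retest",
    np.where(report["contribution_index"] > 1, "increase spend", "decrease spend"),
)
print(report)
```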

Note that at the moment WebTrends doesn’t provide the above report either (but we do have the requisite data in a readily accessible format). My point is that it’s time to embrace the third generation of this industry and start truly leveraging the data in mathematically and scientifically valid ways.

- Barry

P.S. Please send me your thoughts on the dream campaign report.

  • http://www.latitudegroup.com Luke Regan

    Why not just pause your Display campaign for a month? Track conversions on site and Google searches for your brand that month versus the month with it on, normalise for seasonality and you’re away. Too simplistic?

    What about engagement metrics, how are they factored in? And offline conversions, WoM – conversations in the coffee shop/pub? You can’t track everything.

    If we’re talking about monitoring long term brand building/goodwill then that’s a whole other conundrum.
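
    A back-of-envelope version of the pause test Luke suggests, using the previous year to normalise for seasonality. The numbers are made up, and it ignores everything he rightly flags (engagement, offline conversions, word of mouth):

    ```python
    # Display-off month vs display-on month, corrected by last year's
    # drift between the same two months. All figures are hypothetical.
    conv_on, conv_off = 1200, 1000        # conversions with display on / paused
    conv_on_ly, conv_off_ly = 1100, 1050  # same months last year, display unchanged

    seasonal_ratio = conv_on_ly / conv_off_ly      # change we'd expect anyway
    lift = conv_on / (conv_off * seasonal_ratio) - 1
    print(f"Estimated display lift: {lift:.1%}")   # ~14.6%
    ```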

  • Robyn

    Hi Barry, thanks for initiating this discussion – I have enjoyed reading all the comments! I want to go back to the fundamental issue we have with media attribution: 100% allocation to the last click. Forgive me, I’m not a statistician, but I can’t understand why we don’t attribute all clicks (not just the last) and display reports showing the interaction effect of all direct marketing sources rather than looking at each channel/campaign in isolation.

    I would love to see a report similar to your first draft above in a MANOVA/ANOVA output table format. Can someone with more statistical knowledge than me explain why we can’t display the contribution significance on conversion where – for example – segment A interacted with search only versus segment B, who was exposed to banner AND email, and so on? Compute an F-stat/p-value and show us the interaction effect of multiple direct communications.

    In addition, if our clients have the database infrastructure set up (and yes, we all know this is not always the case!) we can map email and DM comms into this mix, providing us with a much more comprehensive overview of marketing effectiveness within the world of CRM/eCRM. Or can we actually do this manually now? Does Omniture/web analytics allow us to export click data, media agencies provide us with impressions, database managers provide us with DM/EM contact and response files, and we compute these MANOVA tables ourselves? Sorry if I am behind the curve here and this is what everyone does already…
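
    A minimal sketch of the interaction test Robyn asks about, on simulated data. Since conversion is binary, a logistic regression with an interaction term is the usual tool – it reports a z-stat/p-value per term rather than an ANOVA F-stat, but answers the same question:

    ```python
    # Does banner + email together convert differently than either alone?
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n = 10_000
    df = pd.DataFrame({"banner": rng.integers(0, 2, n),
                       "email": rng.integers(0, 2, n)})
    # Simulated truth: each channel helps a little, the combination helps more.
    logit = -3 + 0.3 * df["banner"] + 0.4 * df["email"] + 0.5 * df["banner"] * df["email"]
    df["converted"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

    # "banner * email" expands to banner + email + banner:email (the interaction).
    fit = smf.logit("converted ~ banner * email", data=df).fit(disp=0)
    print(fit.summary())   # the banner:email row is the interaction effect
    ```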

  • Plamen Arnaudov

    Hi Barry,

    Thank you for a thoughtful article. I think the gist of the argument is strong but, like James Dutton, I’m not quite sure what to make of the report mock-up. What would drive the recommendation metric? What is a contribution index? Please add some basic exegesis.

    One thing I feel for sure is missing from the dream report is the comparative performance of other traffic sources within the same time frame, such as organic search or direct traffic. Lately, we’ve been doing some work to consolidate traffic sources in deeply customized WT reports and our clients are happier for it.

    Moving on, I’m of the opinion that good campaign tracking includes not just segmentation by campaign sources or mediums but crosses that dimension with other kinds of segmentation to rule out (or in) other possible factors in under-/over-performance. Your example of a ‘hidden’ causal link between ice cream and summer is not impossible to capture with solid data if the right segmentations are combined. For instance: if your display campaign is doing great in Australia, is it because they find the picture of a penguin irresistible, or is it because you’ve always done well marketing sunscreen lotion to people under thin ozone? Crossing campaign source performance with direct traffic performance and user locale performance will help you answer that. This is true not only for geographic location: pretty much any layer of visitor segmentation you have (or don’t have) may show decisive evidence to help you make sense of relative campaign performance.

    I strongly agree with you that this all needs to be put in an open framework where the user asks questions and the interface crunches data attempting to answer them – after all, it has to be a fluid process for it to qualify for the “next-gen” label. But let’s not assume that fluidity and progress are somehow synonymous. It won’t be so bad if users have a well-designed tabular report as their starting point (let’s face it – few of them will have the time or inclination to travel beyond it). The key to next-gen web analytics presentation is to better understand your audience and provide intuitive tools for everyone to get where they want to go fast – this includes power users, but also users who want the data analyst to do the work and spell out the answer.

    Which brings us back to that really interesting “Recommendation” column: I’d love to know more about what you were thinking there and how it would jibe with openly querying the data. For instance, if I add a dimension of geographical location, would the recommendation engine know to scale down its excitement over folks in Liechtenstein who tend to show very high conversion rates over very few visits? Would the end user be allowed to define the parameters that drive this recommendation metric?
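
    A minimal sketch, on made-up data, of the segmentation crossing Plamen describes above: conversion rate by campaign source crossed with visitor locale, so a “great in Australia” effect is visible at a glance:

    ```python
    import pandas as pd

    visits = pd.DataFrame({
        "source":    ["display", "display", "search", "search", "display", "search"],
        "locale":    ["AU", "US", "AU", "US", "AU", "US"],
        "converted": [1, 0, 0, 1, 1, 0],
    })

    # Rows: campaign source; columns: locale; cells: conversion rate.
    print(pd.pivot_table(visits, values="converted", index="source",
                         columns="locale", aggfunc="mean"))
    ```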

  • http://gotanalytics.blogspot.com Chris Grant

    Sanjay’s point about us not being statisticians is true, but whether that matters is another question.

    We don’t have to be statisticians to use Barry’s spreadsheet. But we do have to trust the intense underlying calculations that produce those numbers. Those algorithms are not the kind of thing we’re used to.

    We in web analytics are accustomed to tabulations (counts), sums, and ratios. I think Barry’s talking about something far more sophisticated, probably something that Excel can’t do at all. WebTrends Analytics, Omniture SiteCatalyst, and Google Analytics are just big tabulation engines; they count things and lay them out in tables, with a little simple division thrown in from time to time. They make simple little graphs too. And that’s all they do.

    We, the humans, look at those numbers and make judgements about things, like whether there’s a time trend, or whether one number is bigger than another, and whether the difference or trend is big enough to pay attention to. Most of us don’t base those judgements on calculations at all; we just make a gut call.

    Barry’s table displays the output of something much smarter, using mathematical techniques that have been around for a long, long time but are hardly ever used in web analytics. They have to do with inferring causality or association (i.e. what he calls contribution) in very complex situations.

    I think very few of us have ever been exposed to those techniques. We don’t have to be statisticians, though. But we do have to do some learning. The concept of “contribution” is one thing we have to learn about, and we have to learn how to interpret it and use it — but not how to calculate it. We need to learn to work with models and modeling (we need to learn what models ARE).

    I’m bummed about how expensive these things will probably be, however. It’s quite likely that I won’t get my hands on this for years, if ever. And I think this jump in sophistication will give those who can afford it a competitive edge. I hate being an underdog.

  • Pingback: attribution is the hottest topic these days … « Analytics Strategist

  • http://www.webanalyticsdemystified.com Eric T. Peterson

    Barry,

    Thanks for responding to my feedback. Regarding your comment about “mathematically supportable analysis that clearly shows it to be a closer approximation of causality”, I think you are confusing your model and mine. Appropriate Attribution, as described in the short presentations you saw online and here in Portland, makes no attempt to determine causality.

    Appropriate Attribution is simply the next logical step between our collective dependence on last-based campaign value assignment and the “third-generation” statistical model you have started to describe. I would not describe it as a solution to the campaign attribution problem we all have, but most companies are so far from resolution that I preferred to take an incremental approach rather than telling people to use some arbitrary model or put their faith in software that does not yet exist in the web analytics sector ;-)

    I’ll save the details for the white paper and have little doubt you will give it a good read once it’s available.

    Thanks again,

    E.

  • http://www.sanjayonline.co.uk Sanjay Morzaria

    Barry,

    Interesting article. We do need a third-generation way to measure our analytics, but I fear that most of us are not statisticians.

    Therefore, perhaps there is another angle instead? Wouldn’t it be wonderful if we were able to give a “monetary” value (based on a rational matrix) to each page, so that we are able to give a meaningful weight to the visited pages? This would mean that less-visited pages with a higher monetary value would be reported to our stakeholders rather than being lost in the data. We could then really start to understand where the real value is on our website.

    Stakeholders relate to £ and $, so anything we can do to measure our website accordingly will make them sit up and take notice. Has anyone done this on their website already?

    Sanjay
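
    One hypothetical reading of Sanjay’s idea, sketched on made-up data: value each page by the revenue of the sessions it appeared in, divided by its views, so a rarely visited page that precedes big orders surfaces instead of drowning in pageview counts:

    ```python
    import pandas as pd

    views = pd.DataFrame({                  # one row per page view
        "session": [1, 1, 2, 2, 3, 3, 3],
        "page": ["/home", "/specs", "/home", "/cart", "/home", "/specs", "/cart"],
    })
    orders = pd.Series({1: 0.0, 2: 50.0, 3: 200.0}, name="revenue")  # per session

    # Credit every page in a session with that session's revenue, then
    # normalize by views. (Note the double-crediting: deciding how to split
    # the credit is the same attribution debate all over again.)
    views = views.join(orders, on="session")
    page_value = views.groupby("page")["revenue"].sum() / views.groupby("page").size()
    print(page_value.sort_values(ascending=False))   # value per view, per page
    ```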

  • Barry Parshall

    Hi All,

    First of all, thanks to everyone for all the great comments.

    I’d like to start with Eric’s thoughtful and fair reply. While my posting is partially aimed at your presentation and model, most of my criticism is directed at the vendors … including the one who employs me. I can hardly criticize non-vendors attempting to design better analytical models within the confines of existing solutions. That said, and in the spirit of self-criticism, I’d like to know whether “Appropriate Attribution” has mathematically supportable analysis that clearly shows it to be a closer approximation of causality (and if so, will you share the data?), or whether at this point we only have a few case studies that may or may not be anomalous.

    Next I’d like to reply to Jen’s words about market readiness. Mathematical models impact our lives on a daily basis – it’s just all hidden. Ultimately I see all this manifesting as a new report that is far more useful and actionable to the advertiser than the ones we have today. I think the more compelling issue will be trust. It’s funny: we all know last click is bad, but we understand it and rely on it today. Introduce a model designed by math PhDs that is thus beyond our own understanding, and we feel compelled to question it. Perhaps the simple version of the end report will be for the trusting types, while a second report will include all the error margins and certainty factors for those who need a window into the machine. In the end, readiness and adoption will be a reflection of how simple and prescriptive the manifested information is – and, of course, of whether it actually produces results. Intuitively I see no reason why these criteria can’t all be met, but then you’d need a SuperCrunching system to compute the number of times I’ve been wrong.

    James, I could spend all night responding to your comments. It’s irrefutable that I’ve only scratched the surface of this topic. And while my educational background is in computational mathematics, the work required is beyond my aged abilities (such as they were in the first place). But Chris is right: the math has been around for a very long time, has been applied to probably thousands of similar problems, and is likely built into many commercial products like SAS, SPSS and Analysis Services. I too wonder whether a single model could perfectly solve the world’s needs. Likely not. But I’m reasonably confident a single model fueled by a handful of input parameters (derived from a simple set of questions for the user) would get us pretty darn close – and at any rate far better than what we have today.

    Your comment about impression data is spot on. One of the limitations imposed upon analytics vendors is that we lack impression data at the visitor level (and the ability to easily stitch together the visitor viewing the ad with the one who visited the site). So while some conclusions can likely be reached by correlating impression counts with clickthroughs and conversions at the campaign ID level, it’s certainly far less statistically interesting. I promise to comment further on this in a future post.

    Anyway, thanks again for the comments. I’ll close by asking again for feedback on the dream report. Do users want to be offered suggestions on what to do with their campaign mix, or would they prefer to interpret the data on their own, or some blend of both?

    - Barry

  • Terri Boyle

    Interesting article, Barry. You’ve stated the evolution of WA well, and I agree with you regarding the 3rd generation of WA.

    So, how do we (as a community) start it? And even better, how do we prove to a business that it is important enough to invest in? Thanks to the economy and the mindset it has put most businesses in, convincing them to part with budget so that a practitioner can “evolve” their information to the next key step is difficult right now.

    It is one thing to have the data and the understanding of how it should be viewed, but how do we show our businesses that it is beneficial, time-sensitive and cost-effective for companies of all sizes … not just the big guys?

    I don’t want to sound negative or ignorant, but I am curious how you’d recommend doing this.

  • http://blog.jimnovo.com/ Jim Novo

    This is not a comment on Eric’s material as I have not seen it…

    Barry, this kind of reporting could single-handedly save the Display business (or kill it, depending on the results) if implemented. Perhaps you could get IAB to fund the project?

    Even if we only get to “we’re not sure why the ad had an effect, but we can prove there was one”, that’s a heck of a lot better than “the ad was present so we infer an effect”.

    I can’t wait to see the first report showing *negative* impact on outcome from an impression or click. WA folks, in your gut, you just *know* that is happening, right? It can’t be that every interaction with advertising has a positive outcome. Boy, will that send a shudder through the ad world, both online and offline.

    I say let’s prove it!

  • http://www.waomarketing.com/blog Jacques Warren

    I second Chris here. There’s no way Web Analytics will hold water with BI if we don’t add that layer. I knew it was a good idea to keep my college stat books!

  • Jen

    I have to agree with Michael. The key here is that this opens more opportunities for understanding user behavior. More inputs for marketers to use to improve the bottom line – whatever it is they care about. The focus is still around “indicators” and testing. So if attribution gives me an indication of how well or poorly I am acquiring users at the earlier stages of consideration, and I can make an adjustment that I can then test (assess-adjust-test-repeat, right?), then it seems like goodness, despite lacking hardcore statistical backing.

    I too hope we get there someday – but just like it took a long time to stop talking about hits (or BYTES, who remembers bytes? ;) ) I think it will be a while before they are ready for statistics. We have to figure out a way to mix it in slowly … like adding spinach to your kids’ spaghetti sauce! And, for better or worse, I think it will come from the analysts and consultants before the vendors.

  • http://twitter.com/slicecast James Dutton

    Barry,

    You bring up some interesting points in your article. I would have to agree that all of the web analytics vendors I speak with are trying to address the media attribution issue, and your suggestions are interesting. However, I would like you to spell out more of your thoughts on modeling the data. The screenshot of your report is interesting, but in a way quite simplistic: without your sharing at least some direction on the model that would be used to address the attribution weighting, all you’re providing here is a report that shows media attribution based on [unknown model #1]. Is your thought that a fixed model would apply to all businesses? That a model would be customisable per external analyses?

    It’s very satisfying to see you approach the subject, but I feel that you haven’t gone far enough to help the average web reporter understand what this means – and yes, for most users a combination of first, last and linear attribution is going to satisfy the reporting needs of a web report (I wonder how many web analysts are really responsible for this kind of output: media allocation and channel planning). I don’t want to throw another spanner into the debate, but I’m wondering if you have considered a dimension of data that most WA tools do not hold within the warehouse: display advertising impressions. Given that display advertising yields very little in terms of clicks, how would a click-based model address the view-through impact of display?

    Will this issue – one of the biggest issues we see day to day – be the first time web analytics tools start to introduce real statistical methodologies? Will attribution be addressed through a simple visual front end, or will it be addressed through the current standard of econometric modeling handled by the more analytically minded media agencies? Does this analysis work for offline attribution, or would econometrics still be the best way to align media analysis for offline?

    How about this for a suggestion – why not post sample media data and have analysts contribute analyses of it to identify ways to address the attribution debate? I’m sure you will get some very interesting responses!

    Anyway, enough of my rambling – I think you made a great start here, and I’m looking forward to seeing this debate continue!

    Cheers, James.

  • http://gotanalytics.blogspot.com Chris Grant

    At any rate … a mathematically derived MODEL (that’s what this is, not a report) would be a couple generations ahead of where we are now. Heck, nobody in web analytics even reports on the statistical significance of a year-over-year delta, including WebTrends. Nobody even mentions statistics or sample size in their documentation or help screens. If you can get people in this industry to understand, appreciate, adopt, and pay for even this modest level of sophistication, it will be as impressive as just getting the technology to happen in the first place. If you can furthermore keep the price to where ordinary-size sites get a payback on it in a reasonable time, then “impressive” won’t even begin to describe it.

    Well, go for it, WebTrends! The encouraging thing is that the math for this has been available for decades. This isn’t rocket science. Most graduate students did it in their first year (and were possibly disappointed years later to realize that the business world hasn’t even made it to high-school statistics levels). The hard part might be getting customers to think that it really is worth more than the simple arithmetic of first-last-equal.
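
    For what it’s worth, the year-over-year test Chris says nobody ships is nearly a one-liner with the right library. A sketch with made-up counts, using a standard two-proportion z-test:

    ```python
    # Is this year's conversion rate really different from last year's?
    from statsmodels.stats.proportion import proportions_ztest

    conversions = [310, 280]      # this year, last year
    visits = [10_000, 9_500]

    z, p = proportions_ztest(conversions, visits)
    # Here p is ~0.5: a ~5% relative "lift" the data can't tell from noise.
    print(f"z = {z:.2f}, p = {p:.3f}")
    ```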

  • http://gotanalytics.blogspot.com Chris Grant

    Very interesting. I’m not quite sure what Eric said that you’re challenging, though.

  • http://www.mymotech.com Michael

    You took Statistics in high school? Talk about a decrease in the quality of education. OK, but seriously, you are absolutely right. This is exactly the challenge that has to be addressed in all of our marketing departments. Our job is to grow the top/bottom line with no increase – maybe even a decrease – in overall ad spend. Understanding the relationships between marketing channels at least opens the door to making that a reality.

  • http://www.webanalyticsdemystified.com Eric T. Peterson

    Barry,

    It was nice to see you last week as well, and as I said at the time I am encouraged to see that WebTrends is starting to talk about how the new open platform will allow companies to finally put the “analytics” into “web analytics.” I guess reading your post I wonder why you felt the need to challenge Appropriate Attribution?

    We’re both talking about making the only logical choice regarding digital marketing — moving off of the entrenched “last” model and starting to do more to understand how online marketing is really impacting the business. You’re talking about the future — so much so that you comment that your customers cannot actually do what you describe today, at least in your applications — and I’m talking about the present.

    Appropriate Attribution is something that your customers can do today — we talked about that. The ratio is simple and doesn’t involve any complex models or black-box algorithms — it’s math we learned in elementary school. And the result is, at least in my research and humble opinion, significantly more telling about our digital marketing spend than most of the current crop of campaign reports shipping today.

    While I don’t disagree with your correlation and causality argument, sometimes the obvious answer is also the right one. The data supporting Appropriate Attribution, while not perfect, was certainly compelling enough for me to present this model as an alternative to the relative blindness most digital marketers suffer from today.

    And while I have little doubt that your SuperCrunched approach to campaign analysis will provide great value, I do wonder about the wisdom of telling people to sit on their hands until you’re able to ship this third generation of campaign analytics. Based on the feedback I have been getting, your customers (and those of your worthy competitors) are actively seeking solutions they can implement ** now ** that will help them optimize a rapidly dwindling marketing budget.

    Anyway, I agree that three wrongs don’t make a right. But I respectfully disagree with your thesis that using a triangulation-based approach to better understand how marketing dollars are being spent is wrong. Fortunately the market will decide, and you and I will continue to enjoy presenting together.

    Sincerely,

    Eric T. Peterson
    Web Analytics Demystified
    http://www.webanalyticsdemystified.com