What is Attribution?

Attribution is all about attributing a cause to a result. In this post I will discuss it at a high level in the context of seeking a conversion or sale. So what does that mean and why does that matter exactly?

In a company you have many different ways you can influence customers or potential customers. A few examples:

  1. Phone Call
  2. Email with picture A
  3. Email with picture B
  4. Send a Coupon

Now imagine you just made a sale and you had sent a coupon, called, and sent an email to that customer in the last 6 months. Which one was responsible for the sale? Was it just one responsible, or was the combination of mediums important? How do I know if the phone call was "worth it"? What about the coupon? Should I replicate this combination or are there better combinations?

Answering these questions requires good data about which causes (actions) led to the effect (the sale). This is what attribution is all about.

If you wonder whether you should be using attribution in your efforts just ask yourself the following questions.

Do you want to know which of your activities led to sales? Do you want to know whether a campaign you launched was worth it? Do you want to know which of your marketing is effective?

If the answer is yes to any of those, you should get comfy with a few different attribution models.

Purpose of this post

Now that we understand the problem and its importance, let me be specific about what I hope to accomplish in this post.

  • Primary Goal: Communicate a high level understanding of several basic approaches to attribution: how they work, what they are good at, and what they are not so good at.
  • Secondary Goal: Expand on the questions in the "What is Attribution?" section and give some ideas for how this data can actually be used and why it is so powerful once you have it.

Last-Touch Attribution

Let's start with the simplest attribution model, Last-Touch Attribution.

What is last-touch attribution?

This is just what it sounds like - you attribute the sale to the last touch. Last-Touch Attribution would say whatever the most recent activity was before the sale was 100% responsible for that sale.

Example Interaction

For Last-Touch let's consider a scenario where we are selling pens and pencils. We send a bunch of ads to see what sticks and attribute it like this (all to the last-touch).

Why would you use last-touch attribution?

Because last-touch gives sole weight to the last interaction, it can be very helpful in products or companies where customers do not think about the purchase for extended periods of time. For example if you are selling cheap pens and pencils, a customer probably didn't start thinking about the purchase over the course of multiple interactions and work up to the decision to buy. They probably simply knew they were going to need some soon or were running low, and they saw an ad at the right time with the right message and clicked to buy. In that scenario, last-touch could be very appropriate.

Why would you avoid last-touch attribution?

If we consider the context of our example, it's clear that it's not appropriate for inter-related advertising channels or actions. In our example, they only opened the email because they were thinking about the product. They only got to the popup because they opened the email and clicked through. In this type of scenario, it isn't accurate to say the final action gets all the credit. In many luxury products, last-touch is not ideal.

Note: There is a similar strategy called "First-Touch" attribution, which applies the same idea but gives all of the weight to the first touch instead of the last touch.
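To make the two single-touch rules concrete, here is a minimal sketch in Python. The function names and the example journey are my own assumptions for illustration, not part of any particular library.

```python
def last_touch(touches):
    """Give all of the credit to the final touch before the sale."""
    credit = {touch: 0.0 for touch in touches}
    credit[touches[-1]] = 1.0
    return credit


def first_touch(touches):
    """Give all of the credit to the very first touch."""
    credit = {touch: 0.0 for touch in touches}
    credit[touches[0]] = 1.0
    return credit


# Hypothetical journey, ordered from oldest touch to most recent.
journey = ["Coupon", "Email", "Popup"]

print(last_touch(journey))   # {'Coupon': 0.0, 'Email': 0.0, 'Popup': 1.0}
print(first_touch(journey))  # {'Coupon': 1.0, 'Email': 0.0, 'Popup': 0.0}
```

Both rules ignore everything except one position in the journey, which is exactly why they are simple to run and easy to get wrong.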

Multi-Touch Attribution

While the last-touch attribution is appropriate in some situations, it has some obvious flaws that can make it very inaccurate in some industries. Here's a personal example of why last-touch may not be appropriate.

Multi-Touch Attribution was created to solve the problem of interactions over time. Last-Touch Attribution assumes the most recent action was solely responsible, but in reality there could be many contributing factors. We will cover four types: no-decay, time-decay, position-decay, and U-shaped.

Note: "no-decay" is often called linear. I try to avoid that term because it can be confusing when business and technical teams come together, since both time-decay and position-decay can decay at a linear rate.

No-Decay

What is no-decay attribution?

No-decay attribution gives every interaction equal weight. This would say if you sent 4 advertisements to the customer, they are each 25% responsible for the sale.
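As a rough sketch of that equal split in Python (the function name and the journey below are hypothetical), each touch gets 1/n of the credit, and a channel that appears more than once simply collects more shares:

```python
from collections import defaultdict


def no_decay(touches):
    """Split the credit equally across every touch in the journey."""
    share = 1.0 / len(touches)
    credit = defaultdict(float)
    for touch in touches:
        credit[touch] += share  # repeated channels accumulate credit
    return dict(credit)


# Hypothetical journey with four touches, two of them emails.
print(no_decay(["Email", "Letter", "Voicemail", "Email"]))
# {'Email': 0.5, 'Letter': 0.25, 'Voicemail': 0.25}
```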

Example Interaction

My dad just planned a trip to Joshua Tree. He visited long ago and has been getting regular emails, letters, and voicemails updating him on new events to encourage him to return. For months he didn't open or respond to a single one, but they did get him thinking about how much he enjoyed it and would like to go back. Eventually he opened an email and did some research to plan his trip, and he is going next month.

Why would you use no-decay attribution?

No-decay attribution can be useful in scenarios where any given interaction could trigger a sale; you just need one to land at the right time while the customer continues to think it over. If the customer doesn't say no, then they may later say yes. In some scenarios, which interaction they say "yes" to isn't consistent, so you can weight them all equally.

Position-Decay and Time-Decay

What is position-decay and time-decay attribution?

Position-decay and time-decay are both ways of putting more attribution on recent actions and less on older actions.

Position-decay does not care about time (i.e. days), but does care about how many actions ago a touch happened. For example, the most recent email, the 2nd most recent email, the 3rd most recent email, etc. Maybe all of those happened in the last 2 weeks or maybe they were spread out over 2 months; position-decay only sees how many positions each one is removed from the sale.

Time-decay is very similar to position-decay, but instead of decaying based on the number of actions, attribution decays based on time (i.e. days) from the sale.
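Here is one way the two could be sketched in Python. The exponential decay, the `decay` factor, and the `half_life_days` parameter are assumptions I picked for illustration; a linear decay or different rates would be just as valid depending on your business.

```python
def normalize(weights):
    """Scale raw weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def position_decay(n_touches, decay=0.5):
    """Weight by how many positions a touch is from the sale.

    Touches are ordered oldest first, so the last touch is 0 positions away.
    """
    return normalize([decay ** (n_touches - 1 - i) for i in range(n_touches)])


def time_decay(days_before_sale, half_life_days=7.0):
    """Weight by how many days before the sale each touch happened."""
    return normalize([0.5 ** (d / half_life_days) for d in days_before_sale])


# Three touches; position-decay only sees their order.
print(position_decay(3))

# The same three touches 60, 45, and 1 day(s) before the sale;
# time-decay gives the two old touches almost no credit.
print(time_decay([60, 45, 1]))
```

The same journey can get very different weights under the two schemes, which is why it matters whether "recent" means "few actions ago" or "few days ago" for your customers.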

Example Interaction

Imagine you want to get a new couch (or your spouse wants you to get a new couch). It's not really a priority or very urgent. You do a quick Amazon search but don't buy in the moment and plan to come back to it later to figure out what to buy and who to buy it from. It's not particularly important because you already have one, and while it would be nice to have a nicer one, it really isn't a big deal. This could go on for years, but with the right messaging this person could be motivated to buy.

Why would you use position-decay or time-decay attribution?

You may get various advertisements about it before you eventually take the time to make the purchase. While each one helps remind you to make the purchase, the one that triggers you to buy is the most important because it hit at the right time with the right message to motivate you to actually take action. Maybe the messaging was better, maybe the time it was sent was better, but for whatever reason it was better. The older messages did have an impact because they kept the purchase at the forefront of your mind, but the older a message is, the clearer it is that it did not trigger a purchase. The final action is what triggered the buy, and you feel it deserves the most credit.

U-Shaped

What is u-shaped attribution?

A u-shaped attribution weighs the first and last touches most heavily, with lighter weights on the middle interactions.
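A small Python sketch of the idea follows. The 40% weight on each end is a commonly used split, but treat it (and the function name) as an assumption to tune rather than a rule.

```python
def u_shaped(n_touches, end_weight=0.4):
    """Heavy weight on the first and last touch, the rest shared in the middle."""
    if n_touches == 1:
        return [1.0]
    if n_touches == 2:
        return [0.5, 0.5]
    # Whatever the two ends don't take is split evenly across the middle touches.
    middle = (1.0 - 2 * end_weight) / (n_touches - 2)
    return [end_weight] + [middle] * (n_touches - 2) + [end_weight]


# Five touches: the first and last get 40% each, the three in the middle share 20%.
print(u_shaped(5))
```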

Example Interaction

To re-use the example from before, my dad just planned a trip to Joshua Tree. He visited long ago and has been getting regular emails, letters, and voicemails updating him on new events to encourage him to return. For months he didn't open or respond to a single one, but they did get him thinking about how much he enjoyed it and would like to go back. Eventually he opened an email and did some research to plan his trip, and he is going next month.

Why would you use u-shaped attribution?

In this case you could argue that the initial outreach that got him thinking about going on the trip was very important. The final communication that got him to purchase was also important. The stuff that happened in the middle mattered but wasn't as critical and wasn't even looked at. Because of this, both the first and last touch are given heavy attribution while the middle stuff is given less.

In some luxury products this can be a great attribution methodology.

Markov Models

Markov models are a bit more involved. The steps are:

  1. Calculate probabilities to create a transition matrix
  2. Calculate Removal Effect to determine attribution
    Note: Rather than attributing on a record by record level, attribution normally happens in groups on an action or channel level. In marketing contexts these are often referred to as campaigns where you can attribute sales to different channels within a campaign.

Transition Table

The transition table stores probabilities for all transitions. If I send an email, what's the probability that a phone call is next? What's the probability that a popup is next? What's the probability that a sale is next? All combinations are stored in a transition table.

Note: This data is also sometimes represented as a graph with nodes being actions and edges being probabilities.

Here I have a simple dummy transition table. For each action or state, we have the probability of what the next action is. This is calculated from your existing marketing data and is effectively a model of your marketing communication actions. From each action I can predict the chance of other actions occurring after it. This also means that I can see how they interact, as I can see the different paths and the probabilities of all those paths.
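As a rough illustration of how such probabilities could be computed from marketing data, here is a sketch in Python that counts how often each action is followed by each other action. The `paths` data and the function name are made up for the example; real journeys would come from your own tracking data.

```python
from collections import Counter, defaultdict


def transition_probabilities(paths):
    """Estimate P(next state | current state) by counting observed transitions."""
    counts = defaultdict(Counter)
    for path in paths:
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical customer journeys, each ending in Sale or NotSale.
paths = [
    ["Email", "Popup", "Sale"],
    ["Email", "Phone", "NotSale"],
    ["Phone", "Email", "Sale"],
    ["Popup", "NotSale"],
]

probs = transition_probabilities(paths)
print(probs["Phone"])  # {'NotSale': 0.5, 'Email': 0.5}
```

With only four journeys the numbers are meaningless, but the same counting works on thousands of journeys.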

You will notice that Sale and NotSale don't lead to anything else. Once a sale is made the goal is met. In some businesses it makes sense to cut it off here, and in others it makes sense to allow for longer chains where "completing" the chain is multiple buys in succession. For simplicity and the sake of high level learning I will assume the goal is a single sale.

NotSale can mean many things to businesses. It could mean you failed to sell, or failed to upgrade the customer, it could mean that the customer asked you to not contact them in any way in the future. Regardless, the goal of a sale failed (usually in the scope of a campaign).

From \ To  Phone  Email  Popup  Sale  NotSale
Phone      0.20   0.30   0.35   0.05  0.10
Email      0.20   0.34   0.25   0.10  0.11
Popup      0.22   0.31   0.30   0.08  0.09
Sale       0.00   0.00   0.00   1.00  0.00
NotSale    0.00   0.00   0.00   0.00  1.00

Removal Effect

Now that we have the probabilities of all the interactions, we can measure what the sale rate is for any given chain. We can also use those probabilities to run simulations. We can then run those simulations without one action and see how that affects the sale rate. The difference is the removal effect. By calculating this removal effect for each action, we can determine the amount to attribute to a particular action or channel.

{'total_revenue': 228135.55,
 'Popup': 0.5060799842334842,
 'Email': 0.17580872663283006,
 'Phone': 0.3181112891336858}
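The removal-effect idea can be sketched in Python using the dummy transition probabilities for Phone, Email, and Popup from the table above. Two things here are my own assumptions: journeys are assumed to start at each of the three channels with equal probability (the table has no start state), and instead of random simulations the code iterates to the exact sale probability. Removing a channel means every transition into it counts as a lost sale. The shares it prints will not match the output above, which came from a different dataset.

```python
CHANNELS = ["Phone", "Email", "Popup"]

# Dummy transition probabilities; Sale and NotSale are absorbing end states.
TRANSITIONS = {
    "Phone": {"Phone": 0.20, "Email": 0.30, "Popup": 0.35, "Sale": 0.05, "NotSale": 0.10},
    "Email": {"Phone": 0.20, "Email": 0.34, "Popup": 0.25, "Sale": 0.10, "NotSale": 0.11},
    "Popup": {"Phone": 0.22, "Email": 0.31, "Popup": 0.30, "Sale": 0.08, "NotSale": 0.09},
}


def sale_probability(transitions, removed=None, tol=1e-12):
    """Overall chance of reaching Sale, optionally with one channel removed.

    Assumption: a journey starts at each channel with equal probability, and
    starting at (or moving to) the removed channel ends the journey with no sale.
    """
    active = [c for c in CHANNELS if c != removed]
    prob = {c: 0.0 for c in active}
    while True:
        # P(sale from c) = direct sale + sum of P(move to n) * P(sale from n)
        updated = {
            c: transitions[c]["Sale"]
            + sum(transitions[c][n] * prob[n] for n in active)
            for c in active
        }
        converged = max(abs(updated[c] - prob[c]) for c in active) < tol
        prob = updated
        if converged:
            break
    return sum(prob.values()) / len(CHANNELS)


def removal_shares(transitions):
    """Removal effect per channel, normalized so the shares sum to 1."""
    base = sale_probability(transitions)
    effects = {
        c: 1 - sale_probability(transitions, removed=c) / base for c in CHANNELS
    }
    total = sum(effects.values())
    return {c: effect / total for c, effect in effects.items()}


shares = removal_shares(TRANSITIONS)
for channel, share in shares.items():
    print(f"{channel}: {share:.1%}")
```

Multiplying each share by total revenue gives the revenue attributed to that channel.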

Gotchas to consider

In any attribution approach there are some things to consider. The goal of this section isn't to scare you off, but just to ensure you realize that you can't just take one of these and implement it without further thought. Each of these areas would require further research and thought as you seek to implement an attribution model in your organization.

  • Just because you sent an email and the customer clicked through and bought doesn't mean they wouldn't have bought without that email. Some of those customers would have bought anyway so really your marketing effort did nothing, but may still get credit for the sale. This can skew your ROI calculations.
  • You may not be able to track all touch points. Any interaction the customer has you want to be able to track - so spend time thinking about what all the possible touch points are and how reliably you can track each of them. There may be some you can't track - but you need to be aware of those and not just ignore them.
  • Digital tracking to understand those touch points can be extremely difficult, especially as people have more and more devices they use (phone, tablet, work computer, personal computer, etc.). People may look up a product on their computer, go visit a store, and then look up reviews on their phone before buying. Tying all those together can be challenging, but helps to get the whole picture.
  • Not all customers are equal. Different customers respond to different channels in different ways. Some customers are expensive to sell to; others are relatively easy. Segmenting revenue reports by channel and by other customer segments is important so that you have a plan for all your customers and are not just planning for the average.
  • For every hypothesis about how customers behave and how you should attribute, you should try to verify it as best you can with data. Can you see any patterns in who buys and who does not that support your theory?

Conclusion

I hope you learned a bit about marketing attribution. These are the basics and are a great starting point. Proper attribution is important because once you can properly attribute the causes of sales, you can calculate ROI. When you can calculate ROI, you can make data-driven decisions about the effectiveness of experiments, campaigns, and initiatives. The better you can measure success, the faster you can experiment with marketing initiatives and the more confident you can be in the results you are seeing. Marketing is not a discipline where anyone has the "right" answer and can jump in and have every campaign land perfectly. It's all about setting things up so that as you try things, you learn as much as possible from what worked and what didn't, so your next marketing effort is more informed and has an increased chance of success.

Two other critical things I hope you got from this:

  • The core concepts are something that anyone can understand, even if you don't understand any of the code.
  • Why you would use one attribution methodology vs another is as much a business question as it is a technical question, so these decisions should be made through collaboration of the 2 teams.