May 31, 2020
An article by Sergey Matrosov
You know, one of the main reasons why this blog is named like the old-fashioned attribution “Last Click” is that I and my co-host started our marketing careers when this attribution – like other classical ones such as “First Click” – was the only one we could use. So the naming is a kind of tribute to those old school days.
But not all old-fashioned methods can be used nowadays. These attributions are like broken glasses: it’s hard to use them, but sometimes you have to. This may be because channels are specific for your business or you are willing to look at your marketing efforts from different points of view. But today, many marketers work in BigQuery so if your goal is to build reports there using classic attributions, this may be the moment you get stuck. So, let’s deal with it!
But before we start, remember, we don’t use the lookback window here. If you need it, for example, at last click (touch) and it is equal to 7 days, you must add the following to CASE-WHEN functions: date of the First Touch - date of conversion < 7 days in seconds (604800).
Ok, I guess we can start now. Full SQL query can be found here
1. Our first step is to make a table with event streaming by user
WITH
WebsiteUserStreaming AS (
SELECT "11111" AS userid, "2020-01-01" AS time, "blog" AS source, "zero" AS event UNION ALL
SELECT "11111", "2020-01-02", "organic", "CONVERSION" UNION ALL
SELECT "22222", "2020-01-01", "ppc", "zero" UNION ALL
SELECT "22222", "2020-01-02", "blog", "zero" UNION ALL
SELECT "22222", "2020-01-03", "organic", "zero" UNION ALL
SELECT "22222", "2020-01-04", "ppc", "zero" UNION ALL
SELECT "22222", "2020-01-05", "organic", "zero" UNION ALL
SELECT "22222", "2020-01-06", "direct", "zero" UNION ALL
SELECT "22222", "2020-01-07", "direct", "CONVERSION" UNION ALL
SELECT "66666", "2020-01-01", "ppc", "CONVERSION" UNION ALL
SELECT "66666", "2020-01-02", "blog", "zero" UNION ALL
SELECT "66666", "2020-01-03", "ppc", "zero"),
2. Then we need to format our query:
FormatedStreaming AS (
SELECT
userid,
UNIX_SECONDS(TIMESTAMP(time)) AS date_in_sec,
source,
CASE
WHEN event = 'zero' THEN 0
WHEN event = 'CONVERSION' THEN 1
END
AS event
FROM
WebsiteUserStreaming)
3. Then, we need to check it:
SELECT
*
FROM
FormatedStreaming
4. Finally, we can write our queries.
First of all, let’s make the Last Touch attribution. Just to avoid any misunderstandings, remember that it is the attribution itself, the rule or set of rules, that determines how conversions are assigned to touchpoints on the path to conversion. Last Touch gives all credit to the last point.
--LAST_TOUCH
SELECT
userid,
date_in_sec,
source,
CASE
WHEN event > 0 AND date_in_sec > LAG(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) THEN '1'
WHEN event > 0 AND date_in_sec = FIRST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) THEN '1'
ELSE
'null'
END
AS last_touch_attribution,
FROM
FormatedStreaming
Last Non-Direct Touch gives all credit to the Last Touch point too, but if the last one is “direct”, it gives all credit to channel before “direct”.
--LAST_TOUCH_NON_DIRECT
SELECT
userid,
date_in_sec,
source,
CASE
WHEN event > 0 AND date_in_sec = FIRST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) THEN '1'
WHEN event = 0 AND date_in_sec = FIRST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) THEN 'null'
WHEN event = 0 AND LAG(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) = FIRST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) THEN 'null'
WHEN SUM(event) OVER (PARTITION BY userid) > 0 AND date_in_sec < LAST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AND LEAD(source) OVER (PARTITION BY userid ORDER BY date_in_sec) = 'direct' AND source != 'direct' THEN '1'
WHEN event > 0 AND date_in_sec = LAST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AND source != 'direct' THEN '1'
ELSE
'null'
END
AS last_non_direct,
FROM
FormatedStreaming
But wait, what about userid ‘66666’? Why does ‘ppc’ have credit? Yes, it is not the last in the path of the user, but it is the last in the conversion path. Just look at our first table above!
First Touch is rather simple too: all credit is given to the first channel.
--FIRST_TOUCH
SELECT
userid,
date_in_sec,
source,
CASE
WHEN MAX(event) OVER (PARTITION BY userid) = 1 AND date_in_sec = FIRST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) THEN '1'
ELSE
'null'
END
AS first_touch_attribution
FROM
FormatedStreaming
Any touch is a rare thing, but sometimes you may need to use it to measure a channel from the mix or to measure the mix itself, without knowing its exact impact, but knowing that it is crucial for your conversion path. For example, if your buying cycle is rather long, say around 90 days, and your customers read your blog posts from day 1 to day 90, you give a 100% credit to each interaction with the blog. Yes, all other channels in the example below will have 100% credit too, but in report via filters, you can choose only a blog to view its impact. I feel that I need to clarify: the attribution model is not about measuring conversions, it is about evaluating marketing efforts. That is why you could see x10 or x100 “conversions'' in any touch report, but it is not conversions, it is credit.
--ANY TOUCH
SELECT
userid,
date_in_sec,
source,
CASE
WHEN MAX(event) OVER (PARTITION BY userid) = 1 THEN '1'
ELSE
'null'
END
AS any_touch_attribution
FROM
FormatedStreaming
If your goal is only to give credit to every touch with the blog, if there is a conversion, just change rules in CASE WHEN:
--ALL CREDIT TO BLOG ONLY
SELECT
userid,
date_in_sec,
source,
CASE
WHEN MAX(event) OVER (PARTITION BY userid) = 1 AND source = 'blog' THEN '1'
ELSE
'null'
END
AS blog_only
FROM
FormatedStreaming
Also, you can combine a set of rules: Last Touch + credit to blog each time if it was in the conversion’s path.
--BLOG + LAST TOUCH
SELECT
userid,
date_in_sec,
source,
CASE
WHEN MAX(event) OVER (PARTITION BY userid) = 1 AND source = 'blog' THEN '1'
WHEN event > 0 AND date_in_sec > LAG(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) THEN '1'
WHEN event > 0 AND date_in_sec = FIRST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) THEN '1'
ELSE
'null'
END
AS blog_and_last
FROM
FormatedStreaming
Now for the time decay attribution. This is one of the attributions that can be called as multitouch. The core of it is that it gives a percentage of the credit to all the channels on the conversion path: the amount of credit for each channel is less (decaying) the further back in time the channel was interacted with. For example, consider a scenario where your customer interacted with the blog, then with “organic” and then with “ppc” before conversion. Channel “ppc” was the last one and will have the most credit (percent), ‘organic’ will have less and blog will have the lowest value.
--TIME_DECAY
SELECT
userid,
date_in_sec,
source,
ROUND(IF(SAFE_CAST(weights AS FLOAT64)=0 OR SUM(SAFE_CAST(weights AS FLOAT64)) OVER (PARTITION BY userid)=0, 0, SAFE_CAST(weights AS FLOAT64)/SUM(SAFE_CAST(weights AS FLOAT64)) OVER (PARTITION BY userid)), 2) AS time_decay,
FROM
(SELECT
userid,
date_in_sec,
source,
CASE
WHEN date_in_sec = FIRST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) AND MAX(event) OVER (PARTITION BY userid) = 1 THEN SAFE_CAST(1.1-ROW_NUMBER() OVER (PARTITION BY userid) AS STRING)
WHEN date_in_sec > LAG(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) AND MAX(event) OVER (PARTITION BY userid) = 1 THEN SAFE_CAST(ROUND(1.1-1/ROW_NUMBER() OVER (PARTITION BY userid), 2) AS STRING)
ELSE
'null'
END
AS weights
FROM
FormatedStreaming)
However, if you need the reversed version, the first channel takes the most credit and others have lower values the later on in time they are. Here is the code:
--TIME_DECAY_REVERSED
SELECT
userid,
date_in_sec,
source,
ROUND(IF(SAFE_CAST(weights AS FLOAT64)=0 OR SUM(SAFE_CAST(weights AS FLOAT64)) OVER (PARTITION BY userid)=0, 0, SAFE_CAST(weights AS FLOAT64)/SUM(SAFE_CAST(weights AS FLOAT64)) OVER (PARTITION BY userid)), 2) AS time_decay_reversed,
FROM
(SELECT
userid,
date_in_sec,
source,
CASE
WHEN date_in_sec = FIRST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) AND MAX(event) OVER (PARTITION BY userid) = 1 THEN SAFE_CAST(ROW_NUMBER() OVER (PARTITION BY userid) AS STRING)
WHEN date_in_sec > LAG(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) AND MAX(event) OVER (PARTITION BY userid) = 1 THEN SAFE_CAST(ROUND(1/ROW_NUMBER() OVER (PARTITION BY userid), 2) AS STRING)
ELSE
'null'
END
AS weights
FROM
FormatedStreaming)
The last one is position-based. Like time decay, it is sort of multitouch attribution. It gives a percentage of credit, based on when the touch occurred (first, last, in-between). In our example, first and last take 40% each (=80% of the credit), and in-between shares take the remaining 20%. To have accurate data, we use a ROUND function with two decimal places, but if your channel's path to conversion is usually long, use more decimal places. Otherwise, summing up all credits within the user journey from channel to channel could result in a sum > 1.
--POSITION BASED
SELECT
userid,
date_in_sec,
source,
CASE
WHEN SUM(SAFE_CAST(weights AS FLOAT64)) OVER (PARTITION BY userid) > 1 THEN SAFE_CAST(ROUND(SAFE_CAST(weights AS FLOAT64)/SUM(SAFE_CAST(weights AS FLOAT64)) OVER (PARTITION BY userid), 2) AS STRING)
ELSE weights
END AS position_based
FROM
(
SELECT
userid,
date_in_sec,
source,
CASE
--one channel and instant conversion
WHEN MAX(event) OVER (PARTITION BY userid) > 0 AND COUNT(userid) OVER (PARTITION BY userid) = 1 THEN '1'
WHEN MAX(event) OVER (PARTITION BY userid) > 0 AND date_in_sec = FIRST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) AND COUNT(userid) OVER (PARTITION BY userid) > 2
THEN SAFE_CAST(ROUND(COUNT(userid) OVER (PARTITION BY userid)*0.4, 3) AS STRING)
WHEN MAX(event) OVER (PARTITION BY userid) > 0 AND date_in_sec = FIRST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) AND COUNT(userid) OVER (PARTITION BY userid) = 2 THEN '0.5'
WHEN MAX(event) OVER (PARTITION BY userid) > 0 AND date_in_sec != FIRST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) AND date_in_sec != LAST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AND COUNT(userid) OVER (PARTITION BY userid) > 2 THEN SAFE_CAST(ROUND(COUNT(userid) OVER (PARTITION BY userid)*0.2, 1) AS STRING)
WHEN MAX(event) OVER (PARTITION BY userid) > 0 AND date_in_sec = LAST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) AND COUNT(userid) OVER (PARTITION BY userid) > 2
THEN SAFE_CAST(ROUND(COUNT(userid) OVER (PARTITION BY userid)*0.4, 1) AS STRING)
WHEN MAX(event) OVER (PARTITION BY userid) > 0 AND date_in_sec = LAST_VALUE(date_in_sec) OVER (PARTITION BY userid ORDER BY date_in_sec) AND COUNT(userid) OVER (PARTITION BY userid) = 2 THEN '0.5'
ELSE '0'
END AS weights
FROM
FormatedStreaming)