May 7, 2020
Step 1. Select all the hits to be grouped later.
First, let’s select all the necessary hits. Please note that we are selecting only pageviews as we are going to perform a page-level analysis. All the hits are grouped by fullvisitorID and are sorted by the hit timestamp.
SELECT
fullvisitorID,
FORMAT_TIMESTAMP("%Y-%m-%d %H:%M:%S", TIMESTAMP_SECONDS(SAFE_CAST(visitStartTime+hits.time/1000 AS INT64))) AS hit_timestamp,
hits.page.pagePath AS pagePath,
FROM
`pro-tracker-id.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
hits.type = "PAGE"
ORDER BY
fullvisitorID,
hit_timestamp
Step 2. Select second and third pageviews in the user’s journey.
In the second query, we use the LEAD … OVER function to create new columns which will show the second and third pages visited. PARTITION BY fullvisitorID helps to insert “null” if there is no next page visit, and in this case, the following row will start with the next user’s journey. The rows are also sorted by hit_timestamp.
SELECT
fullvisitorID,
hit_timestamp,
pagePath,
LEAD (pagePath, 1) OVER (PARTITION BY fullvisitorID ORDER BY hit_timestamp) AS second_pagePath,
LEAD (pagePath, 2) OVER (PARTITION BY fullvisitorID ORDER BY hit_timestamp) AS third_pagePath
FROM (
SELECT
fullvisitorID,
FORMAT_TIMESTAMP("%Y-%m-%d %H:%M:%S", TIMESTAMP_SECONDS(SAFE_CAST(visitStartTime+hits.time/1000 AS INT64))) AS hit_timestamp,
hits.page.pagePath AS pagePath,
FROM
`pro-tracker-id.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
hits.type = "PAGE"
ORDER BY
fullvisitorID,
hit_timestamp)
ORDER BY
fullvisitorID,
hit_timestamp
Step 3. Calculate the top paths to the target page.
In the last step, we select the first, second and third pages visited and calculate the most popular paths to the analyzed page with the help of the COUNT function. The final query would be as follows:
SELECT
pagePath,
second_pagePath,
third_pagePath,
COUNT(*) AS count
FROM (
SELECT
fullvisitorID,
hit_timestamp,
pagePath,
LEAD (pagePath, 1) OVER (PARTITION BY fullvisitorID ORDER BY hit_timestamp) AS second_pagePath,
LEAD (pagePath, 2) OVER (PARTITION BY fullvisitorID ORDER BY hit_timestamp) AS third_pagePath
FROM (
SELECT
fullvisitorID,
FORMAT_TIMESTAMP("%Y-%m-%d %H:%M:%S", TIMESTAMP_SECONDS(SAFE_CAST(visitStartTime+hits.time/1000 AS INT64))) AS hit_timestamp,
hits.page.pagePath AS pagePath,
FROM
`pro-tracker-id.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
hits.type = "PAGE"
ORDER BY
fullvisitorID,
hit_timestamp)
ORDER BY
fullvisitorID,
hit_timestamp)
WHERE
third_pagePath = "/idea/"
GROUP BY
pagePath,
second_pagePath,
third_pagePath
ORDER BY
count DESC
I appreciate feedback, so if you have any comments regarding the suggested solution, please don’t hesitate to share them in the comments section.