June 22, 2019
An article by Sergey Matrosov
Let’s say that you want to know the conversion amount in a specific date range for streaming in BigQuery: the total amount for each day and the total for all days as well. You could do this by making either two queries resulting in two tables, or one query resulting in one table. The first solution is not particularly user-friendly (you have to click from one tab to another etc.), but it is very quick to carry out. The second solution sounds pretty easy too, although it could involve a slight challenge; however, in terms of output, one table is produced.Our conversions include ‘Registrations’, thus our streaming table looks like this:
Date | Event |
2019-01-01 | Registration |
2019-01-01 | Activation |
2019-01-01 | Registration |
... | ... |
2019-01-02 | Registration |
... | ... |
And so on.
Obtaining data only by day is really that easy:
SELECT
date,
COUNT(event) AS Conversions
FROM
`your-project.your_dataset.table`
WHERE
event = 'Registration'
AND date BETWEEN ‘2019-01-01’
AND ‘2019-01-05’
GROUP BY
date
Date | Conversions |
2019-01-01 | 10 |
2019-01-02 | 5 |
2019-01-03 | 6 |
2019-01-04 | 1 |
2019-01-05 | 2 |
Then, for the grand total (10 + 5 + 6 + 1 + 2 = 24 in our case), you might think that it’s all about one line of query in SELECT, but it’s not. Using “Group By” will only count events by each day.
To achieve your goal, you need to use “CROSS JOIN” with this query:
SELECT
COUNT(event) AS Total,
FROM
`your-project.your_dataset.table`
WHERE
event = 'Registration'
AND date BETWEEN ‘2019-01-01’
AND ‘2019-01-05’
So, the full query will be:
SELECT
t1.date AS Date,
t1.Conversions AS Conversions,
t2.Total AS Grand_Total
FROM (
SELECT
date,
COUNT(event) AS Conversions
FROM
`your-project.your_dataset.table`
WHERE
event = 'Registration'
AND date BETWEEN ‘2019-01-01’
AND ‘2019-01-05’) AS t1
CROSS JOIN (
SELECT
COUNT(event) AS Total,
FROM
`your-project.your_dataset.table`
WHERE
event = 'Registration'
AND date BETWEEN ‘2019-01-01’
AND ‘2019-01-05’) AS t2
ORDER BY
Date ASC
And the result will be as expected:
Date | Conversions | Grand_Total |
2019-01-01 | 10 | 24 |
2019-01-02 | 5 | 24 |
2019-01-03 | 6 | 24 |
2019-01-04 | 1 | 24 |
2019-01-05 | 2 | 24 |
Indeed, as a result, you will get a duplicate of the total in each row. However, the result clearer and more convenient, especially if it is used only in BigQuery’s interface.