Session Attribution With GA4 Measurement Protocol
In this article, I’ll try to clarify the understandably murky Measurement Protocol functionality in Google Analytics 4.
Measurement Protocol is a way to send events to Google Analytics 4 directly from a machine capable of sending HTTP requests (such as a web server). It’s an alternative collection method to the client-side libraries of Google Tag and the Firebase SDK.
Measurement Protocol in GA4 is very different from its predecessor in Universal Analytics. In Universal Analytics, Measurement Protocol was the collection mechanism, with the same protocol being used for both client-side library hits and server-to-server hits.
In GA4, MP is really its own thing. It’s completely decoupled from the client-side libraries, and it works with a different schema, a different set of rules, and a different data profile in the reports.
One of the pressing questions with Measurement Protocol has been whether it’s possible to insert events into the past so that they would be associated with the correct session.
In this article, I’ll show you that yes, it’s possible. Read on.
To send a Measurement Protocol hit successfully and have it be associated with the correct session, you need three things in place:
- The Client ID, which you can get from BigQuery (user_pseudo_id), the _ga cookie, or using the GTAG GET API Custom Template.
- The Session ID, which you can get from BigQuery (event parameter named ga_session_id), the _ga_<measurement ID> cookie, or using the GTAG GET API.
- The timestamp (in microseconds) of the event, set to a maximum of 72 hours into the past, measured from the moment you dispatch the Measurement Protocol request.
The timestamp should be chosen so that it occurs while the session (represented by the Session ID) is still active.
If you have all three of these in place, the event that you send will be associated with the session source/medium of the active session.
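As a rough illustration of how the three values fit together, here is a minimal Python sketch that assembles a request body. The field names (client_id, timestamp_micros, events, params.session_id) follow the Measurement Protocol documentation as I understand it. The function name build_mp_payload, the event name offline_purchase, and the choice to reject future timestamps are assumptions made for this example only.

```python
import json
import time

# 72 hours expressed in microseconds (the backdating limit described above).
MAX_BACKDATE_MICROS = 72 * 60 * 60 * 1_000_000


def build_mp_payload(client_id, session_id, timestamp_micros, event_name, now_micros=None):
    """Build a Measurement Protocol body for a backdated event (hypothetical helper)."""
    if now_micros is None:
        now_micros = time.time_ns() // 1_000  # current Unix time in microseconds
    age = now_micros - timestamp_micros
    # Rejecting future timestamps is a choice made in this sketch, not a documented rule.
    if age < 0 or age > MAX_BACKDATE_MICROS:
        raise ValueError("timestamp_micros must fall within the last 72 hours")
    return {
        "client_id": client_id,
        "timestamp_micros": timestamp_micros,
        "events": [
            {"name": event_name, "params": {"session_id": str(session_id)}}
        ],
    }


now = time.time_ns() // 1_000
payload = build_mp_payload(
    "1871825471.1628513380",  # Client ID
    1674465678,               # Session ID
    now - 3_600_000_000,      # one hour ago, in microseconds
    "offline_purchase",       # hypothetical event name
    now_micros=now,
)
print(json.dumps(payload, indent=2))
```

Note that the sketch only checks the 72-hour window. Whether the timestamp falls inside the active session still has to be verified against the session's own events.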
Use Google BigQuery to find the parameters
I’ve hopefully established myself by now as the ultimate Google BigQuery fanboy. I’ve never kept it a secret that I think it’s the only actually useful way to access Google Analytics 4 data. We even built an online course on Simmer around this thought process.
For accessing the three parameters listed in the previous section, BigQuery is invaluable. As long as you can pinpoint the session you want to augment, there’s not much else to it.
For example, here’s the query I used to find the exact session I want to insert my hit into.
SELECT
  user_pseudo_id,
  event_timestamp,
  event_name,
  event_params
FROM
  `project.dataset.table`
WHERE
  user_pseudo_id = '1871825471.1628513380'
  AND (
    SELECT value.int_value
    FROM UNNEST(event_params)
    WHERE key = 'ga_session_id'
  ) = 1674465678
The WHERE clause specifies the Client ID (user_pseudo_id) and Session ID (nested event_params.ga_session_id), and I can then use the list of events to find the correct “slot” for my timestamp.
The event_timestamp field is already in microseconds, so if I want my Measurement Protocol event to be added after a specific event, I just need to make sure the timestamp I set in the Measurement Protocol hit is greater than that event’s timestamp.
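The “slot” idea can be sketched in a few lines of Python. This is only an illustration: the helper name pick_timestamp_after and the sample timestamps are made up, and the check against the next event is an extra safeguard added here, not something the Measurement Protocol requires.

```python
def pick_timestamp_after(event_timestamps, anchor_timestamp, offset_micros=1):
    """Return a timestamp just after anchor_timestamp (hypothetical helper).

    event_timestamps: event_timestamp values (microseconds) of the session's
    events, for example as returned by the BigQuery query above.
    """
    if anchor_timestamp not in event_timestamps:
        raise ValueError("anchor_timestamp is not one of the session's events")
    candidate = anchor_timestamp + offset_micros
    # Keep the new hit strictly before the next known event, if there is one.
    later = [ts for ts in event_timestamps if ts > anchor_timestamp]
    if later and candidate >= min(later):
        raise ValueError("offset would place the hit at or after the next event")
    return candidate


# Made-up timestamps for illustration; real values come from event_timestamp.
session_events = [1674465678123456, 1674465678987654, 1674465690000000]
slot = pick_timestamp_after(session_events, 1674465678123456, offset_micros=10)
print(slot)  # 1674465678123466
```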
Another reason I chose this particular session as an example is that if you look at the page_view event, you can see that its event parameters indicate the session is marked with the linkedin.com / referral source/medium values.
Send the hit
To test this, I used the amazing Event Builder tool. It lets you give Measurement Protocol a spin by generating the hit for you. You will, of course, need to provide all the required parameters by following the documentation.
In Event Builder, I added the API Secret, the Measurement ID, the Client ID, the Event Name, and the timestamp in their required fields.
Importantly, the timestamp was set a few microseconds after the page_view event I saw in BigQuery. This is to make sure that the session “starts” correctly from BigQuery’s point of view. You’ll of course want to position this event so that it’s logically placed within whatever funnel it belongs in. For example, if you’re sending purchase events, you’ll want to place the event after the checkout events have been collected.
Finally, in Event Details I added a page_location (this is not necessary – it was part of another test I was running) as well as the Session ID I got from BigQuery (or the user’s cookies).
When ready, I clicked the Validate event button, followed by Send to GA.
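For reference, roughly the same request can be put together without Event Builder. The Python sketch below uses the collection and validation endpoints from the Measurement Protocol documentation. I’m assuming the validation endpoint is comparable to what the Validate event button does. The helper names, the placeholder Measurement ID and API Secret, the event name, the page URL, and the timestamp are all hypothetical.

```python
import json
import urllib.parse
import urllib.request

COLLECT_URL = "https://www.google-analytics.com/mp/collect"
DEBUG_URL = "https://www.google-analytics.com/debug/mp/collect"  # validation only


def build_request(payload, measurement_id, api_secret, validate=False):
    """Create a POST request for the collect or validation endpoint (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"measurement_id": measurement_id, "api_secret": api_secret}
    )
    base = DEBUG_URL if validate else COLLECT_URL
    return urllib.request.Request(
        f"{base}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Dispatch the request and return (HTTP status, response body)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status, response.read().decode("utf-8")


payload = {
    "client_id": "1871825471.1628513380",
    "timestamp_micros": 1674465678123466,  # made-up value inside the session
    "events": [
        {
            "name": "mp_test_event",  # hypothetical event name
            "params": {
                "session_id": "1674465678",
                "page_location": "https://www.example.com/",  # optional
            },
        }
    ],
}

# Placeholders: replace with your own Measurement ID and API Secret.
request = build_request(payload, "G-XXXXXXXXXX", "YOUR_API_SECRET", validate=True)
print(request.full_url)
# status, body = send(request)  # uncomment to dispatch the request
```

The send call is left commented out so that nothing is dispatched by accident. Switching validate to False targets the real collection endpoint.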
First, the event shows up in the Realtime reports in GA4.
This is interesting for two reasons.
First, the event was dispatched in realtime, but its actual timestamp was a couple of days in the past.
Second, the Measurement Protocol documentation states that if you want to see the hit appear in Realtime reports, it needs the engagement_time_msec parameter – this didn’t appear to be true.
After waiting some 24 hours for good measure, I built a simple Free form Exploration report, where I filtered for the event name together with its Session source / medium value. This is what it looked like:
This confirmed that the event was associated with the session source and medium to which it was sent using the timestamp parameter.
Finally, turning to my ultimate source of truth, I ran the same BigQuery query again, and this is what I saw:
The event is placed exactly where I wanted to place it.
Looking at this, you can see one major caveat when it comes to Measurement Protocol:
Events do not inherit properties from “the session”.
There is no “session scope” when it comes to the BigQuery output. Instead, events are shown with the parameters their actual requests had.
Thus, just looking at the BigQuery report, it would be impossible to determine whether the session source/medium attribution actually worked, as BigQuery doesn’t have the capability to tell you what the source/medium of any given session was.
That’s why I was forced to suffer through the Google Analytics 4 user interface to see if the alignment actually worked (and it did!).
This article was a proof-of-concept to show you that Measurement Protocol for sessions does work – just like the documentation says.
You need three parameters for this: the Client ID, the Session ID, and the timestamp (in microseconds) of the hit.
BigQuery is a great tool for testing things out, but if you want to integrate this method with your sales engine or something more sophisticated, you’ll need to harvest the required parameters from the user’s cookies or wherever you store these values.
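If you go the cookie route, the values need to be parsed out of the cookie strings first. The Python sketch below assumes the commonly seen layouts GA1.1.<client ID> for the _ga cookie and GS1.1.<session ID>.<other fields> for the _ga_<measurement ID> cookie. These layouts are not covered in this article and other cookie versions may be structured differently, so treat the parsing rules, the helper names, and the sample values as assumptions to verify against your own cookies.

```python
def client_id_from_ga_cookie(value):
    """Extract the Client ID from a _ga cookie value (assumed 'GA1.1.<a>.<b>' layout)."""
    parts = value.split(".")
    if len(parts) < 4:
        raise ValueError("unexpected _ga cookie format")
    # The Client ID is assumed to be the last two dot-separated fields.
    return ".".join(parts[-2:])


def session_id_from_session_cookie(value):
    """Extract the Session ID from a _ga_<measurement ID> cookie value.

    Only the assumed 'GS1.1.<session ID>.…' layout is handled here;
    anything else raises an error instead of guessing.
    """
    parts = value.split(".")
    if len(parts) < 3 or parts[0] != "GS1" or not parts[2].isdigit():
        raise ValueError("unexpected session cookie format")
    return parts[2]


# Sample cookie values constructed for illustration.
print(client_id_from_ga_cookie("GA1.1.1871825471.1628513380"))
print(session_id_from_session_cookie("GS1.1.1674465678.1.1.1674465700.0.0.0"))
```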
Finally, I want to emphasize an extremely important point about Google Analytics and Measurement Protocol:
Google Analytics isn’t supposed to be an exact representation of ALL THE DATA. You can use the tool as you like, of course, but it’s a marketing analytics service. It’s supposed to tell you how well your marketing campaigns are doing based on traffic on your website (or your app).
In other words, using Measurement Protocol to send offline (or server-side) hits to Google Analytics when those hits can’t be associated with an actual user or session is just a bad idea. It adds a lot of noise to the dataset, and those hits will be isolated from the rest of the behavioral data that GA collects.
I hope you enjoyed this article. Let me know in the comments if you have any questions. I didn’t test all the permutations (such as what happens if you send the event to a time when the session ID was no longer an active session), so if you do some testing of your own, I’d love to hear about it.