With the rise of ad and content blockers (think Ghostery and uBlock Origin), as well as browser tracking protections (see www.cookiestatus.com), marketing technology vendors have their work cut out for them. And when I refer to “their work”, I mean they have to proactively identify and exploit any loopholes they can find to keep on collecting their precious data. In this article, I'll take a look at one such exploit vector in particular: the Canonical Name (CNAME) DNS record.
On New Year's Eve 2018, I published an article explaining how to scrape the pages of a site and write the results into Google BigQuery. I considered it a cool way to build your own web scraper, as it combined the power and scale of the Google Cloud Platform with the flexibility of a headless crawler built on top of Puppeteer. In today's article, I'm revisiting this solution to share its latest version, which includes a feature that you might find extremely useful when auditing the cookies that are dropped on your site.
Update 17 February 2020: Google Tag Manager's Preview mode cookies have been updated with the necessary flags, so they will not break once SameSite enforcement begins. If you've opened the browser console in Google Chrome (version 76 or later), you might have seen a bunch of warnings on a yellow background about something called the SameSite cookie attribute, which is either missing or incompletely set on cookies set on external domains.
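To make the "missing or incompletely set" part concrete, here is a minimal sketch of what a fully flagged cross-site cookie header could look like. It assumes a server-side Python setup using the standard library's http.cookies module (Python 3.8 or newer for SameSite support). The function name, cookie name, and value are made up for illustration and have nothing to do with what Google Tag Manager actually sets.

```python
from http.cookies import SimpleCookie


def build_cross_site_cookie(name: str, value: str, max_age: int = 3600) -> str:
    """Return a Set-Cookie header value flagged for use in a cross-site context.

    Hypothetical helper: a cookie meant to be sent on cross-site requests
    needs SameSite=None, and SameSite=None is only accepted together with
    the Secure flag.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["samesite"] = "None"  # explicitly allow cross-site use
    morsel["secure"] = True      # required companion of SameSite=None
    morsel["path"] = "/"
    morsel["max-age"] = max_age
    return morsel.OutputString()


if __name__ == "__main__":
    # Hypothetical cookie name and value, for illustration only.
    print("Set-Cookie:", build_cross_site_cookie("preview_state", "abc123"))
```

A cookie that sets SameSite=None but leaves out Secure is the "incompletely set" case: browsers that enforce the new rules are expected to reject it.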
Welcome back my friends (to the show that never ends)! It's been a couple of weeks since my last barrage of articles, and I think the time is ripe to do some testing! First things first, here's a picture of me shovelling snow: And now back to the topic at hand. One thing that seems to be a hot topic in Universal Analytics is cross-domain tracking. I've never really tackled the beast head-on, since there's such a wealth of excellent articles about it out there.
Every now and then we want to create a bridge between the stateful machines we send data to (e.g. Google Analytics) and the stateless environment where we collect the data itself (e.g. Google Tag Manager). This is not easy. There is no synergy between Google Analytics and Google Tag Manager that would let the latter understand anything about sessions, landing pages, or Bounce Rates. One thing we can reliably measure, however, is whether or not the visitor is a New User in Google Analytics.
The web is stateless. It's basically blind to your past, and it does a poor job of predicting what you might do in the future. When you browse to a website, the browser requests the page from the web server, and then proceeds to render it for you. This is a detached, clinical process, and any personalized or stateful data transfer is left to the sophistication of your web server.
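As a rough illustration of that statelessness, the sketch below models a request handler as a plain function: it only knows what the current request tells it, and the only way it can "remember" a visitor is if the browser sends back a cookie the server set earlier. Everything here (the handle_request function, the visitor_id cookie name, the response texts) is a hypothetical example, not how any particular web server works.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Dict, Tuple


def handle_request(headers: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return (response body, response headers) for a single request.

    The function keeps no memory between calls. Any "state" has to arrive
    with the request itself, here in the Cookie header.
    """
    cookie = SimpleCookie(headers.get("Cookie", ""))
    if "visitor_id" in cookie:
        # The browser sent back the identifier we handed out earlier.
        return "Welcome back, " + cookie["visitor_id"].value, {}
    # No cookie: as far as the server can tell, this is a brand new visitor.
    visitor_id = uuid.uuid4().hex
    return "Hello, stranger", {"Set-Cookie": "visitor_id=" + visitor_id + "; Path=/"}


if __name__ == "__main__":
    body, response_headers = handle_request({})
    print(body)
    # Simulate the browser returning the cookie on the next request.
    cookie_pair = response_headers["Set-Cookie"].split(";")[0]
    body, _ = handle_request({"Cookie": cookie_pair})
    print(body)
```

Without the cookie round trip, the second request would look exactly like the first one to the server.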