Last updated 18 Jan 2019: Added details about the free tier limitations, and showed how to keep the Dataflow jobs from auto-scaling out of control. I’m (still) a huge fan of Snowplow Analytics. Their open-source, modular approach to DIY analytics pipelines has inspired me to write articles about them, and to host a meetup in Helsinki. In my previous guide, Snowplow with Amazon Web Services, I walked you through setting up a Snowplow pipeline on AWS.
A recent guide of mine introduced the Google Analytics adapter in Snowplow. The idea was that you can duplicate the Google Analytics requests sent via Google Tag Manager and dispatch them to your Snowplow analytics pipeline, too. The pipeline then takes care of these duplicated requests, using the new adapter to automatically align the hits with their corresponding data tables, ready for data modeling and analysis. While testing the new adapter, I implemented a Snowplow pipeline from scratch for parsing data from my own website.
I’m back with another customTask tip, but this time I’m exploring some new territory. Snowplow just released their latest version, which includes (among other things) an adapter for processing Google Analytics payloads. Never heard of Snowplow? It’s a collection of open-source libraries designed to let you build your own analytics pipeline, all the way from data collection, through ETL (extract, transform, load) with custom enrichments and JSON schemas, and finally into your own data warehouse, where you can analyze the data using whatever tools you prefer.