Have you ever stopped to think about what the chain of events was that led you to a particular decision? Maybe not, but in web analytics it is something we should consider. After all, there is something counter-intuitive about analytics tools such as Google Analytics, which require us to think only in terms of the clicks and events recorded on the website during the visit.

Thankfully, we have evolved as a species, and we no longer place too much emphasis on last-click attribution. In short, last-click attribution is a model where the success of a particular goal is credited entirely to the conditions of the session in which the conversion happened. So if a person visits a page via a bookmark and downloads a PDF file on that page, the direct channel (which, by default, covers bookmarked visits) is the only one that receives the kudos for the file download. Nothing is attributed to other channels, which might have exposed the user to the brand, the product, and the file itself. Ridiculous, right? R.I.P. last-click attribution!

Now, if you read the first paragraph, continued onto the second, and respected the basic tenets of literacy by moving onto this, the third paragraph, you might already know where I’m going with this preamble. We have entered the age of attribution modeling, where we try to right the wrongs inherent in these tools by giving a slice of the attribution pie to every channel that participated in the conversion. This is vital, since it brings us closer to realism: it models how people actually interact with all the various channels that bring traffic to your site. Avinash Kaushik has written a nice introduction to the topic.
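To make the difference concrete, here is a minimal Python sketch, assuming a conversion path is simply a list of channel names in the order they were touched. The channel labels and both functions are hypothetical illustrations, not how Google Analytics or any other tool implements its models, and linear attribution is just one of several possible ways to slice the pie.

```python
from collections import defaultdict
from typing import Dict, List


def last_click(path: List[str]) -> Dict[str, float]:
    """Give 100% of the credit to the channel of the converting session."""
    return {path[-1]: 1.0}


def linear(path: List[str]) -> Dict[str, float]:
    """Split the credit evenly across every touchpoint in the path."""
    credit: Dict[str, float] = defaultdict(float)
    share = 1.0 / len(path)
    for channel in path:
        credit[channel] += share
    return dict(credit)


# Hypothetical path: the user found the PDF via search, came back through
# a newsletter, and finally downloaded it during a bookmarked (direct) visit.
path = ["organic", "email", "direct"]
print(last_click(path))  # {'direct': 1.0}
print(linear(path))
```

With the PDF example above, last-click hands everything to `direct`, while the linear model gives each of the three channels a third of the credit.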

However, this post is not about attribution modeling, at least not in the sense we’ve come to know it through these tools. No, this goes deeper, into the murky depths of web analytics.


Clickstream context

In web analytics, the clickstream is what we measure. It’s the collection of hits, events, variables, pageviews, transactions, etc. that make up the traffic of a website. Clickstream data is traditionally collected by adding client-side code (most often JavaScript) to the website, tagging the elements that add value to the analysis, and annotating each visit with custom variables and such.

I tell you, this is a flawed approach. Are you familiar with the Observer’s Paradox, outlined by William Labov?

The aim of linguistic research in the community must be to find out how people talk when they are not being systematically observed; yet we can only obtain this data by systematic observation.

Now transport this idea into the realm of clickstream analytics. Here’s the gist: when we tag our website, we inject our own a priori hypotheses into the analysis, making it less an objective observation of traffic and more a filtered, premeditated peek through a keyhole. Instead, we should be expanding our horizons to the infinite mass of possibility that each visit entails. How do we know that contact form X is the most valuable thing to tag in light of our business goals? Actually, how do we ever know what to tag?

Image courtesy of minifig

The problem lies in context. We have absolutely no idea what led the visitor to a specific decision. Sure, we can look at our tags and surmise that visitor A visited landing page B via channel C and completed goal D. Once we have enough As, Bs, Cs and Ds, we can hypothesize that this is a recurring trend. Emphasis on “enough” and “hypothesize”.

Too few marketing professionals and analysts really acknowledge that we are working with samples, and that all we can ever do is hypothesize. Ignoring this can lead to terrible decisions. Being too lazy to look beyond a web analytics tool’s standard reports is another thing that’s killing creativity in analytics, and creativity is exactly what’s called for here.
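To see why “enough” matters, here is a small Python sketch that puts an approximate 95% Wilson score interval around a conversion rate. The choice of the Wilson interval, and the numbers in the example, are my assumptions for illustration; any reasonable interval estimate would make the same point.

```python
import math
from typing import Tuple


def wilson_interval(conversions: int, visits: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate Wilson score interval for a conversion rate (z=1.96 ~ 95%)."""
    if visits == 0:
        raise ValueError("need at least one visit")
    p = conversions / visits
    denom = 1 + z**2 / visits
    centre = (p + z**2 / (2 * visits)) / denom
    half = z * math.sqrt(p * (1 - p) / visits + z**2 / (4 * visits**2)) / denom
    return centre - half, centre + half


# Same 15% conversion rate, very different sample sizes.
print(wilson_interval(3, 20))
print(wilson_interval(300, 2000))
```

Both samples convert at 15%, but with 20 visits the interval spans roughly 5% to 36%, whereas with 2,000 visits it narrows to roughly 13.5% to 16.6%. Twenty visits give you a hunch, not a trend.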

Uncovering context

There’s so much buzz around Big Data these days, and the buzz is there for a reason. We have the capacity, the processing power, and the insight to work with copious amounts of data, so all we need to do is extend this to traditional web analytics as well.

Measure everything! That’s the dream. Observe the clickstream as an unfiltered mesh of all the measurable actions performed by an identified visitor. No more a priori tagging, no more injecting your own notions of a successful visit into data collection, no more messing with the data before it reaches you. Server logs already exist, recording every request made to the server during each visit. Spice that data up with information on mouse movement, scroll depth and speed, heatmaps, empty clicks (i.e. clicks that didn’t result in an action, such as clicking an image you thought was a link), browser controls (such as clicking the back button or adding a bookmark), etc. That’s what I call data!
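As an example of what “empty clicks” could look like once they are in the data, here is a minimal Python sketch. It assumes a hypothetical collection script that already logs every click and every page response (a navigation, a network request, a DOM change) with a session ID and a millisecond timestamp. The `Click` and `Response` records and the 500 ms window are illustrative assumptions, not a feature of any existing tool.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Click:
    session_id: str
    timestamp_ms: int
    element: str  # e.g. a CSS selector for the clicked element (hypothetical)


@dataclass
class Response:
    """Anything the page did in reply: navigation, network request, DOM change."""
    session_id: str
    timestamp_ms: int


def empty_clicks(clicks: List[Click], responses: List[Response], window_ms: int = 500) -> List[Click]:
    """Return the clicks that were not followed by any response within window_ms."""
    # Index response timestamps per session, sorted so we can binary-search them.
    by_session: Dict[str, List[int]] = {}
    for r in responses:
        by_session.setdefault(r.session_id, []).append(r.timestamp_ms)
    for times in by_session.values():
        times.sort()

    result = []
    for c in clicks:
        times = by_session.get(c.session_id, [])
        # Index of the first response at or after the click.
        i = bisect.bisect_left(times, c.timestamp_ms)
        answered = i < len(times) and times[i] <= c.timestamp_ms + window_ms
        if not answered:
            result.append(c)
    return result


clicks = [
    Click("s1", 1_000, "img.hero"),    # looks like a link, but isn't
    Click("s1", 5_000, "a.download"),  # triggers a file request
]
responses = [Response("s1", 5_120)]
print([c.element for c in empty_clicks(clicks, responses)])  # ['img.hero']
```

A click on the hero image that nothing answered is exactly the kind of signal that a priori tagging would never have captured.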

When we have access to the raw behavior on a website, we can use our tools to dissect it, analyze it, observe it (in real time as well), play with it, filter it, integrate it, export it, and make actual, informed decisions with it.

Beyond the clickstream

Let’s go back to the very first sentence of this post.

Have you ever stopped to think about what the chain of events was that led you to a particular decision?

A wise reader has undoubtedly already grumbled that all of the above focuses only on on-site clicks and hits. And you are right.

Context is not just something that can be measured on-site. There’s a world out there, in case you haven’t noticed, full of stimuli. Any number of things might affect your decision to make a purchase in an eCommerce store: you might have just broken your phone and need a new one, you might have won the lottery, or the weather outside might be so nasty that you want to fly somewhere warm.

In traditional tagging, creativity was most often channeled into the question “What on-site element should I tag next?” If we measure everything in our clickstream, we can divert our creativity to the wealth of external context that underlies our on-site decisions. Want to understand why so many people are visiting your investor site at odd times? Tap into a financial API and see whether the visits track fluctuations in your stock price. Want to know if weather affects visitors’ decisions in your eCommerce store? Annotate the visits with the weather conditions at each visitor’s location! There are so many things out there you can measure. You are limited only by your creativity and by the technology available to support your needs.
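Here is a minimal Python sketch of the weather idea. It assumes you have already fetched hourly conditions per city from some weather API into a plain dictionary keyed by `(city, hour)`. The `Visit` record, its field names, and the sample data are hypothetical. The output is a correlation to investigate, not proof that the weather caused anything.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Visit:
    visit_id: str
    city: str  # visitor location, e.g. from IP geolocation (assumption)
    start: datetime
    converted: bool
    weather: Optional[str] = None


WeatherTable = Dict[Tuple[str, datetime], str]


def hour_of(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour, matching the weather keys."""
    return ts.replace(minute=0, second=0, microsecond=0)


def annotate_with_weather(visits: List[Visit], weather: WeatherTable) -> List[Visit]:
    """Return copies of the visits with the matching weather condition attached."""
    return [replace(v, weather=weather.get((v.city, hour_of(v.start)))) for v in visits]


def conversion_rate_by_weather(visits: List[Visit]) -> Dict[Optional[str], float]:
    """Group visits by weather condition and compute the share that converted."""
    totals: Dict[Optional[str], Tuple[int, int]] = {}
    for v in visits:
        seen, converted = totals.get(v.weather, (0, 0))
        totals[v.weather] = (seen + 1, converted + int(v.converted))
    return {cond: converted / seen for cond, (seen, converted) in totals.items()}


weather = {
    ("Helsinki", datetime(2013, 1, 15, 9)): "snow",
    ("Helsinki", datetime(2013, 1, 15, 14)): "clear",
}
visits = [
    Visit("v1", "Helsinki", datetime(2013, 1, 15, 9, 12), converted=True),
    Visit("v2", "Helsinki", datetime(2013, 1, 15, 9, 40), converted=False),
    Visit("v3", "Helsinki", datetime(2013, 1, 15, 14, 5), converted=False),
]
annotated = annotate_with_weather(visits, weather)
print(conversion_rate_by_weather(annotated))  # {'snow': 0.5, 'clear': 0.0}
```

Three visits prove nothing, of course; with real volumes, a split like this becomes a hypothesis worth testing.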

Where to next

I’d love to hear whether there are any solutions out there for measuring everything.

Meanwhile, there’s so much of this “external” context that you can start measuring. Take a long look through this API Directory, and see if you can find something that you want to measure.

And remember this mantra: Context is King! Measure Everything!