Simple RegEx Table For Google Tag Manager

When our good friends in the Google Tag Manager developer team first introduced the Lookup Table Macro, we were excited. For many of us, it soon became the weapon of choice, especially as a management and optimization tool for the container itself.

However, the macro wasn’t considered perfect. In fact, the most frequently heard request had to do with the core functionality of the feature itself: the macro should support operations, that is, predicate logic. It’s not enough to just have exact-match lookups; people wanted support for operations such as “is x larger than y” or “does y contain x”.

The thing is, I don’t agree with changing the Lookup Table Macro to support these types of operations. Sure, a table whose values you can query with more complex operations than simple lookups would be awesome, but it wouldn’t be a lookup table anymore. We’d need a different variable type for those use cases.

In this post, I’ll take a look at just what makes a lookup table a lookup table, and I’ll also give you a nifty Custom JavaScript variable that lets you create a regular expression table by yourself. This table lets you query an input variable (e.g. {{Page Path}}) against a number of regular expressions (rows). If a match is made, then some value is returned. So, it’s essentially a variation of the Lookup Table, but with regular expressions instead of exact match lookups.


Look it up!

Even though I’m a product of the unsurpassed Finnish education system, I suck at ornithology. So you’ll excuse me for the following clumsy metaphor.

Consider the homing pigeon. It has an intimate knowledge of a location, and it flies to that location. If there is nothing there, it gets confused and poops. If it does find a recipient or a message, it does its thing and coos happily.

Well, a lookup table works the same way. You use a variable reference to pinpoint a specific cell in the table. If this cell exists, any value stored within is returned. If the cell doesn’t exist, the script gets confused and poops an undefined or an error.

This is what makes lookup tables so incredibly efficient. It’s all based on binary logic.

There are no complex operations, no predicates to be evaluated. It’s just a question of “does table X have a value under label Y”.

In JavaScript, a lookup table can be a plain object (most common), an Array, or even a String. Basically, it can be any structure whose values you can access directly by key or index.

If you use a plain object as a lookup table, it’s common to call it an associative array or a hash table, but we’ll call them lookup tables here for clarity.

So, you can perform direct lookups on all of these structures. The three examples listed above can be used for lookups like this:

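// Plain object
var objectTable = {key: 'Value'};
var newValue = objectTable['key']; // 'Value'
// Array (zero-indexed)
var arrayTable = ['zero', 'one', 'two', 'three'];
var anotherValue = arrayTable[3]; // 'three'
// String (zero-indexed characters)
var newestValue = 'String'[5]; // 'g'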

With the plain object, you can also use dot notation (e.g. objectTable.key) when the key is a valid JavaScript identifier.

As you can see, you’re directly requesting a specifically labelled value in the table, and if it exists, it’s returned to you without any further operations.

Now, if you were to introduce predicate logic into the mix, with something like table['[Kk]ey'] (fictional example), the lookup would no longer be asking for one label. It would be asking for any label that is either ‘Key’ or ‘key’.

In JavaScript data structures, a label is always one exact value. With a lookup table, you request a single label without any variations, e.g. ‘Key’, and only that label is queried, because a table can only ever contain one cell with that label.

As soon as you add predicate logic into the mix, you’re forcing the lookup to check every single cell until a match is made, because you can’t label cells with regular expressions or dynamic values (e.g. ‘Key/key’).
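To make the difference concrete, here is a minimal JavaScript sketch, assuming a plain object as the table. The direct lookups only ever inspect one label, while honouring a pattern means visiting the labels one by one. The findByPattern helper is hypothetical and exists only for illustration.

```javascript
// A plain object used as a lookup table: every label is one exact string.
var table = {
  'Key': 'first value',
  'key': 'second value',
  'other': 'third value'
};

// Direct lookup: one exact label, one check.
var exact = table['Key'];        // 'first value'

// A "regex label" is just another exact string, so nothing is found.
var fictional = table['[Kk]ey']; // undefined

// Hypothetical helper: to honour a pattern, every label has to be
// visited and tested in turn until one matches.
function findByPattern(obj, pattern) {
  var labels = Object.keys(obj);
  for (var i = 0; i < labels.length; i += 1) {
    if (pattern.test(labels[i])) {
      return obj[labels[i]]; // the first matching label wins
    }
  }
  return undefined; // every label was checked, none matched
}

var matched = findByPattern(table, /^[Kk]ey$/); // 'first value'
```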

The difference between the binary check of the lookup vs. the traversal of a more complex operation becomes clearer when thinking in terms of performance.

Queries on a lookup table are said to work in constant time. Since you’re querying for a specific label in a table of arbitrary size, the complexity of the operation stays the same (on average, for hash-based implementations). Either the label exists or it doesn’t. The table can be huge or it can be minuscule; the performance is essentially the same.

Performance is usually indicated with Big O notation. The notation for constant time (i.e. the lookup) would be O(1).

When using predicate logic, a query completes after a single check only if the very first cell checked happens to match. Every additional cell that has to be checked adds roughly the same amount of work, so the cost grows linearly with the number of cells checked. Thus, this kind of sequential comparison is said to work in linear time.

Describing linear time operations with O(1) would be fairly optimistic. For this reason, Big O notation tends to describe the worst-case scenario. The worst-case scenario of linear time would be that the value is in the very last cell that is checked, or not in the table at all, so that every cell has to be checked. Thus, the notation would be O(n), where n is the number of cells in the table.
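A rough way to see the O(n) worst case is simply to count the checks. The sketch below uses a hypothetical linearSearch helper and a made-up table of 1,000 rows; it tests each row against a predicate in order and reports how many checks were needed.

```javascript
// Hypothetical helper for illustration: checks rows one by one against a
// predicate and counts how many checks were needed.
function linearSearch(rows, predicate) {
  var comparisons = 0;
  for (var i = 0; i < rows.length; i += 1) {
    comparisons += 1;
    if (predicate(rows[i])) {
      return { found: true, value: rows[i], comparisons: comparisons };
    }
  }
  // No match: every row was checked.
  return { found: false, value: undefined, comparisons: comparisons };
}

// A made-up table of n = 1000 rows: 'row-0' ... 'row-999'.
var rows = [];
for (var i = 0; i < 1000; i += 1) {
  rows.push('row-' + i);
}

// Builds a predicate that matches one exact row.
function equals(target) {
  return function (row) { return row === target; };
}

var best = linearSearch(rows, equals('row-0'));    // comparisons: 1
var worst = linearSearch(rows, equals('row-999')); // comparisons: 1000
var missing = linearSearch(rows, equals('nope'));  // comparisons: 1000, found: false
```

The best case needs a single check, while a match in the last row, or no match at all, needs all n checks.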

This also means that the larger the table, the more expensive the operation becomes, in terms of performance.

With small tables this difference is pretty trivial, but with large tables and multitudes of chained variables, the performance hit can be significant, especially if it takes time to make the match, and the labels are arbitrary enough that you can’t use facilitating data structures or search algorithms.
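As an example of a facilitating data structure: assuming the only variation you care about is letter case, you could normalize the labels once into a lower-cased index, after which every case-insensitive query is a direct lookup again. The helper names below are hypothetical, and labels that differ only by case would collide in the index (the last one wins).

```javascript
// Original table with mixed-case labels.
var pageNames = {
  '/Home': 'Home Page',
  '/Contact': 'Contact Us Page'
};

// Hypothetical helper: build a lower-cased index once. This pass is O(n).
// Object.create(null) avoids inherited keys such as 'constructor'.
function buildLowerCaseIndex(obj) {
  var index = Object.create(null);
  Object.keys(obj).forEach(function (label) {
    index[label.toLowerCase()] = obj[label];
  });
  return index;
}

var index = buildLowerCaseIndex(pageNames);

// Hypothetical helper: every later query is a single direct lookup.
function lookupIgnoreCase(idx, label) {
  return idx[label.toLowerCase()];
}

var home = lookupIgnoreCase(index, '/HOME');       // 'Home Page'
var contact = lookupIgnoreCase(index, '/contact'); // 'Contact Us Page'
```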

So if you’re concerned about performance (and on a web page you should be), use the Lookup Table variable whenever an exact-match lookup is enough.

RegEx Lookup Table with Custom JavaScript

Well, I know you’re not satisfied with my explanation, and you’re still craving a more flexible way to fetch values from a table.

I hope the GTM developers will, at some point, introduce another variable type that’s essentially a lookup table but where you can specify the predicate logic used row-by-row.

Until then, you can make do with workarounds such as the script below.

Copy the following code into a new Custom JavaScript Variable:

function() {
    // Set inputVariable to the input you want to assess
    var inputVariable = {{Page URL}};

    // Set defaultVal to what you want to return if no match is made
    var defaultVal = undefined;
    
    // Add rows as two- or three-cell Arrays within the Array 'table'.
    // The first cell contains the RegExp you want to match
    // the inputVariable with, the second cell contains the
    // return value if a match is made. The third cell (optional)
    // contains any RegEx flags you want to use, e.g. 'i'.
    //
    // The return value can be another GTM variable or any 
    // supported JavaScript type.
    //
    // Remember, no comma after the last row in the table Array,
    // and remember to double escape reserved characters: \\?
    var table = [
        ['/home/?$', 'Home Page'], // Row 1
        ['\\?location=', 'Contact Us Page'], // Row 2
        ['/products/[123][0-9]', 'Products 10-39', 'i'] // Row n (last)
    ];
    
    // Go through all the rows in the table, do the tests,
    // and return the return value of the FIRST successful
    // match.
    for (var i = 0, len = table.length; i < len; i += 1) {
        var regex = new RegExp(table[i][0], table[i][2]);
        if (regex.test(inputVariable)) {
            return table[i][1];
        }
    }
    return defaultVal;
}
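Because {{Page URL}} is GTM-only syntax, the variable can’t be run as-is outside the container. One way to try your rows first, sketched here under the assumption that you test in a browser console or Node.js, is to wrap the same loop in a plain function that takes the input as a parameter. The function name regexTable and the example URLs are hypothetical.

```javascript
// Hypothetical standalone version of the Custom JavaScript variable, so the
// rows can be tried before pasting them into GTM. The input is a parameter
// here because {{Page URL}} only works inside GTM.
function regexTable(input, table, defaultVal) {
  for (var i = 0, len = table.length; i < len; i += 1) {
    // table[i][2] may be undefined; RegExp treats that as "no flags".
    var regex = new RegExp(table[i][0], table[i][2]);
    if (regex.test(input)) {
      return table[i][1]; // the first matching row wins
    }
  }
  return defaultVal;
}

var rows = [
  ['/home/?$', 'Home Page'],
  ['\\?location=', 'Contact Us Page'],
  ['/products/[123][0-9]', 'Products 10-39', 'i']
];

var a = regexTable('https://www.example.com/home/', rows, 'Other');             // 'Home Page'
var b = regexTable('https://www.example.com/contact?location=fi', rows, 'Other'); // 'Contact Us Page'
var c = regexTable('https://www.example.com/PRODUCTS/25', rows, 'Other');        // 'Products 10-39'
var d = regexTable('https://www.example.com/about', rows, 'Other');              // 'Other'
```

The third call only matches because of the 'i' flag in the last row.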

Here’s how the variable works:

First, you give it the input: some variable or value that you want to assess against the table rows.
Next, you insert the rows. Rows are actually Arrays within the table Array:

1. The first cell contains the regular expression you want to evaluate against the input (note, you will need to _double_ escape reserved characters!).

2. The second cell contains the value that is returned if a match is made.

3. The third cell is optional, and can contain any regular expression flags you might want to use, such as 'i' for a case-insensitive match. The 'g' flag has no practical effect here, since a fresh RegExp is created for each row and tested only once.

Finally, there’s a little for-loop which loops through each row of the table Array, checking the regular expression against the input variable. If and when a match is made, the return value of the first matching row is returned by the function.
If no match is made, the specified default value is returned.
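The double escaping trips up a lot of people, so here is a short sketch of what happens with and without it. The pattern reaches the RegExp constructor as a string, and the string literal consumes one backslash before RegExp ever sees the pattern.

```javascript
// In a string literal, '\\?' becomes the two characters \? , which
// RegExp reads as "a literal question mark".
var doubleEscaped = new RegExp('\\?location=');
var sourceText = doubleEscaped.source;                 // the characters \?location=
var matches = doubleEscaped.test('/contact?location=fi'); // true

// With a single backslash, '\?' is just '?', so the pattern starts with a
// bare quantifier and the RegExp constructor throws a SyntaxError.
var error = null;
try {
  new RegExp('\?location=');
} catch (e) {
  error = e;
}
var threw = error instanceof SyntaxError; // true
```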

Remember to edit the rows to match your table. Since it’s plain-text JavaScript, you could also create the table in Excel (formatting it with the square brackets and all), and then just copy-paste it as plain text into the Custom JavaScript variable body.
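If you’d rather not type the brackets and quotes by hand, a small helper can do the formatting. This is a sketch under the assumption that you copy the spreadsheet cells as tab-separated text (pattern, return value, optional flags); tsvToTableLiteral is a hypothetical name. A side benefit is that JSON.stringify adds the second backslash for you, so a cell containing \?location= comes out double escaped.

```javascript
// Hypothetical helper: turns tab-separated text (e.g. cells copied out of a
// spreadsheet) into the text of a table Array for the Custom JavaScript variable.
function tsvToTableLiteral(tsv) {
  var rows = tsv
    .split(/\r?\n/)
    .filter(function (line) { return line.trim() !== ''; })
    .map(function (line) {
      var cells = line.split('\t');
      // Drop empty trailing cells so the flags cell stays optional.
      while (cells.length > 2 && cells[cells.length - 1] === '') {
        cells.pop();
      }
      // JSON.stringify escapes backslashes: \? is written out as "\\?".
      return '    ' + JSON.stringify(cells);
    });
  // Rows are joined without a comma after the last one.
  return 'var table = [\n' + rows.join(',\n') + '\n];';
}

// What a pasted selection might look like (one backslash in the second row).
var pasted = '/home/?$\tHome Page\n' +
             '\\?location=\tContact Us Page\n' +
             '/products/[123][0-9]\tProducts 10-39\ti';

console.log(tsvToTableLiteral(pasted));
```

Run it in a browser console, then paste the printed table into the variable in place of the example rows.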

Summary

As always, this solution is educational first, a proof-of-concept second, and a usable, out-of-the-box workaround last. So feel free to modify it to your own purposes, or just ditch it completely.

The key takeaway from this article should be an understanding of how Lookup Tables work, and how much more complicated they would get if operational logic were introduced as well. For that reason, my feature request remains that the Lookup Table be kept as it is, and that a new variable type be introduced where you can specify the operation on a row-by-row basis. This way, everyone wins.

By the way, if you’re interested in performance, JavaScript data structures, and search algorithms, take a look at this book:

Michael McMillan: Data Structures and Algorithms with JavaScript

The book has the basics summed up really well. The next step would be to grab a book about design patterns and more complex data structures. It’s all very educational.