Increase App Performance — Tame Tag Managers and Take Back Control From Third-Parties
Oct 6, 2020 · 12 min read
JavaScript application performance can be tricky. Even when the application has been well-tuned for speed, there's a big area that engineering does not own and has little control over.
It's the tag manager, and third-party code in general.
I can get an app to load and finish in 2 seconds, but with all the vendor code, the reality is usually an additional 2–3 seconds. There's not much room for improvement left within the JS app itself.
I have two passions: architecture and performance.
I refused to accept that engineering could not improve this without getting involved in rebuilding the rule sets and code inside a tag manager. So I started working out how to gain control over third-party code at runtime.
I primarily use Adobe Launch, so moving forward, I'll be referencing Launch and its internals. For the record, I've got nothing against Launch — all tag managers are equally terrible.
For more generic perf writings, give this a read:
What causes vendor code to easily end up slower than the core application?
Bad timing, over-aggressive use of tag managers, and too much localStorage, DOM, and cookie manipulation.
Vendor scripts usually have long request chains, calling several domains and loading additional scripts. Every new origin requires a DNS lookup and TCP connection. This handshake can range from 20 ms to 200 ms depending on the vendor CDN and the pressure the vendor's own network is under. Other steps, like the HTTPS handshake, will also increase connection time.
Some vendor scripts will bring jQuery, iframes, or entire Angular apps. One vendor can pull down 2 MB of JS over its request chain. Since vendor tags and tag managers do not coordinate with applications, a lot of code is executed early, just in case.
All this time can add up to quite a substantial amount.
Vendor code isn't as stable as you'd believe; third-party code can easily take down your very real application. Global scope poisoning can deal real damage. One bad polyfill on a prototype 👋
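As a toy illustration of scope poisoning, here is an invented "bad polyfill" that clobbers `Array.prototype.includes` with a buggy shim; nothing below comes from a real vendor script:

```javascript
// Keep a reference to the built-in so we can restore it afterwards
const originalIncludes = Array.prototype.includes;

// Hypothetical bad vendor code: overwrites includes with a broken shim
Array.prototype.includes = function (needle) {
  return this.indexOf(needle) > 0; // bug: misses matches at index 0
};

console.log([1, 2, 3].includes(1)); // false — app logic silently breaks

// Restore the built-in; a Gate-style guard would prevent the damage entirely
Array.prototype.includes = originalIncludes;
console.log([1, 2, 3].includes(1)); // true
```

The failure is silent: no error is thrown, the app just starts making wrong decisions wherever it relied on the poisoned built-in.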
(Gate actually does some real magic in theoretical tests: you can Gate unreliable vendors to guarantee app boot, and your own code can polyfill anything the vendor would have done.)
More on Gate 👇👇👇
Advanced performance tactics
In 2019, I found a project called YETT, built for GDPR compliance. YETT monkey-patches document.createElement, so whenever Adobe Launch attempts to create a new script tag, I can intercept and replace the element and attributes being created.
YETT wasn’t perfect, and I eventually rewrote it for better efficiency and more capabilities that better align with the purpose of performance improvement.
Now that I have the ability to block JS scripts that match a regex, the challenge shifts to when and how one should unblock these scripts.
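The interception works roughly like this. The patterns and the `javascript/blocked` type are illustrative of the YETT-style approach, not Gate's actual internals:

```javascript
// Example blocklist — any script whose src matches one of these is held back
const BLOCK_PATTERNS = [/doubleclick\.net/, /hotjar\.com/];

function isBlocked(src) {
  return BLOCK_PATTERNS.some((re) => re.test(src));
}

if (typeof document !== 'undefined') {
  const originalCreateElement = document.createElement.bind(document);
  document.createElement = function (tagName, options) {
    const el = originalCreateElement(tagName, options);
    if (String(tagName).toLowerCase() === 'script') {
      // Intercept src assignment so we can inspect it before it loads
      const descriptor = Object.getOwnPropertyDescriptor(
        HTMLScriptElement.prototype, 'src'
      );
      Object.defineProperty(el, 'src', {
        get() { return descriptor.get.call(el); },
        set(value) {
          if (isBlocked(value)) {
            // A non-executable type keeps the tag inert; keep the src around
            // so the script can be re-injected during the unblock phase
            el.type = 'javascript/blocked';
            el.setAttribute('data-blocked-src', value);
            return;
          }
          descriptor.set.call(el, value);
        },
      });
    }
    return el;
  };
}
```

Because the tag manager creates its tags through `document.createElement`, this single patch is enough to sit between Launch and the browser.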
Since many of these third parties provide data or telemetry that has business value, it's important that I can maintain the same data capture rates as the baseline.
To do this, I'll usually create an analytics event that runs as soon as this script is loaded (it's the first render-blocking script), and I'll add other analytics events for each implementation tactic. This lets me experiment with how I'd unblock third-party code in production, without compromising the existing data capture rates.
Be the captain. Take control of that code
When and how should code be unblocked?
We need to take many things into consideration.
A timeout would be unreliable and would register as a long-running function in Lighthouse. If the user navigates away too quickly, you lose the telemetry data. You also lose performance gains by firing it too early or too late in the load process.
The onload event can't be used for the reasons mentioned above, and it also drags performance significantly.
I needed an intelligent way to unblock vendor code based on RUM conditions or user action.
Gate v1
Since the off-the-shelf tools didn't work exactly as I wanted, I rewrote everything and created Gate (a nod to YETT, whose name means "gate").
I decided to unblock scripts as soon as the user interacts in any way. The theory was that if a weak device is under too much pressure, the user wouldn't be able to interact anyway, so this was a neat way to unblock code at the right time.
There were many pitfalls with this approach, and it showed in the data capture rates, which sat at around 80%. Interaction alone won't capture a bounce, a tab switch, a back-button click, or a user who visits the page but never interacts and just sits there. And if my listeners fail to attach for some reason, I can't tell whether the user is interacting, and there's no way to unblock the tags at all.
I started adding additional triggers: page navigation, pushState, and tab activity. This helped with bounces, back-button clicks, and tab switches. Data capture improved to around 87%.
87% was good enough to try blocking “nice to have” vendor tags which had a big impact on runtime performance.
Even then, others noticed a difference in data capture, but the perf gains were significant enough to accept some data capture misses on non-critical vendors.
Gate v1 didn’t require modifying YETT, and anyone could build this mechanism.
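A minimal sketch of the Gate v1 trigger set described above; the function name and exact event selection are mine, not Gate's API:

```javascript
// Fire `unblock` exactly once: on first interaction, or on signals that the
// user is leaving (tab hidden, back button, unload), so bounces still count.
function onFirstTrigger(unblock) {
  let fired = false;
  const fire = (reason) => {
    if (fired) return;
    fired = true;
    unblock(reason);
  };
  if (typeof window === 'undefined') return fire; // allow non-DOM testing
  const opts = { once: true, passive: true };
  ['pointerdown', 'keydown', 'scroll', 'touchstart'].forEach((evt) =>
    window.addEventListener(evt, () => fire('interaction'), opts)
  );
  // Leaving signals: tab switch, unload/bfcache, back/forward navigation
  document.addEventListener('visibilitychange', () => {
    if (document.visibilityState === 'hidden') fire('tab-hidden');
  });
  window.addEventListener('pagehide', () => fire('pagehide'), opts);
  window.addEventListener('popstate', () => fire('back-button'), opts);
  return fire;
}
```

The returned `fire` function doubles as a manual escape hatch, which is what a fallback like a TTI signal would call.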
Gate v1 improved RUM by around 50% and substantially improved Lighthouse scores.
Gate v2
YETT had some issues when unblocking specific vendor tags: it wouldn't carry over the original attributes, only the src, so id, async/defer, and data attributes were missing.
It also unblocks everything at once, meaning that upon user interaction — the page could jank as massive amounts of JS were added.
And YETT itself is limited to a block and an unblock, each of which can only be triggered once.
Rebuilding YETT
For now, I kept the single unblock action rather than building a multi-stage lifecycle.
- I addressed the unblocking issue that caused vendor tags to error because they couldn’t find a parent tag by id or data attribute.
- I replaced the unblock process with a promise-based, serialized unblock, ensuring code was added back in the same order it would originally have been injected. The mechanism uses a concept called pre-heating, typically used to separate code loading and parsing from execution in an efficient way. Each tag that is unblocked waits for a requestIdleCallback before the next tag unblocks; this, combined with requestAnimationFrame, removed page jank because vendors were only added during idle time.
- In v4, the plan is to integrate unblock stages that allow engineering to unblock tags based on current CPU stress, allowing critical tags to unblock near-instantly, and allowing others to be unblocked based on priority and resource availability.
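The serialized, idle-scheduled unblock above can be sketched like this; the helper names are mine, not Gate's API, and `injectOne` stands in for re-injecting a tag with its original attributes:

```javascript
// Schedule work during idle time; requestIdleCallback support is assumed
// where available, with a setTimeout fallback elsewhere.
const idle = (fn) =>
  typeof requestIdleCallback === 'function'
    ? requestIdleCallback(fn)
    : setTimeout(fn, 0);

// Re-inject blocked tags one at a time, in their original order, each one
// waiting for an idle slice so the main thread is never flooded at once.
async function unblockSequentially(blockedScripts, injectOne) {
  for (const script of blockedScripts) {
    await new Promise((resolve) => idle(resolve));
    await injectOne(script);
  }
}
```

Serializing the queue is what prevents the "everything at once" jank that YETT's single bulk unblock produced.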
Gate v3
Rewriting YETT resolved the random script errors that would occasionally pop up due to the buggy unblock.
The next area to focus on was the race conditions. If a user does not interact and does not unload the page, or if I’m unable to register interaction — I still want to capture the telemetry. I needed some fallback safety.
Timers are not reliable, so I needed a dynamic fallback based on other client information.
Client-side TTI measurement.
TTI seemed like a good way to determine when a device isn’t under stress and can act as a fallback to trigger the unblock if it wasn’t already triggered by user interaction.
This was tricky, and in v4 I've had to rewrite these tools.
- Google's tti-polyfill: it works, but only in Chromium-based environments or ones with the Long Tasks API.
- A slightly outdated library that uses the deprecated performance timing APIs to estimate processor pressure. It did not support the modern V2 API, so I needed to make a few changes. It's also not very efficient and required more modifications to reduce overhead. The initial concept was modeled on how Google Lighthouse and Akamai Boomerang determine TTI.
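A rough sketch of a long-task-based TTI fallback in the spirit of the tti-polyfill; the 5-second quiet window is a common heuristic, and the function names here are illustrative, not from either library:

```javascript
const QUIET_WINDOW_MS = 5000;

// Pure helper: has a quiet window elapsed since the last long task ended?
function isQuiet(lastLongTaskEnd, now) {
  return now - lastLongTaskEnd >= QUIET_WINDOW_MS;
}

// Declare TTI once no long tasks have occurred for the quiet window.
// The Long Tasks API is Chromium-only, which is the limitation noted above.
function waitForTTI(onTTI) {
  if (typeof window === 'undefined' ||
      typeof PerformanceObserver === 'undefined') return; // unsupported
  let lastEnd = performance.now();
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      lastEnd = entry.startTime + entry.duration;
    }
  });
  observer.observe({ type: 'longtask', buffered: true });
  const check = () => {
    if (isQuiet(lastEnd, performance.now())) {
      observer.disconnect();
      onTTI();
    } else {
      setTimeout(check, 1000);
    }
  };
  setTimeout(check, QUIET_WINDOW_MS);
}
```

`onTTI` would call the same unblock trigger that user interaction does, giving the race-condition-free fallback the section describes.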
With everything combined, I can trigger the unblock when TTI is reached, or background schedule the unblock upon user interaction. I can also measure relative TTI and initialize the unblock when only part of the page is interactive.
With relative TTI calculations, I can monitor the CPU cycles and unblock each vendor based on when the relative TTI has been reached.
The current data capture rates of v3 are at 96%, with no RUM impact caused by Gate itself. At this level of accuracy, I can offload more vendor tags without compromising telemetry data.
Applied to a small number of vendors, Gate blocks 3 MB of uncompressed JS from executing upfront: less code to parse, fewer network constraints, and less CPU strain competing with the main application's ability to boot and hydrate.
That's an impressive improvement considering Gate is a standalone script tag weighing in at only 5 KB.
Setup is simply pasting a script tag into your head and adding some vendors to an array. It's one of the fastest, easiest, and most significant performance improvements that a single standalone script tag can deliver.
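Since Gate is not open source, the exact shape of that array is unknown; conceptually, the configuration amounts to something like this, with every name hypothetical:

```javascript
// Purely illustrative of the "add a script tag, list some vendors" setup —
// not Gate's real config format.
const gateConfig = {
  // Regexes matching vendor script URLs to block at boot (example patterns)
  blocklist: [/assets\.adobedtm\.com/, /googletagmanager\.com/],
  // Safety net: force the unblock even with no interaction or TTI signal
  maxWaitMs: 10000,
};
// A standalone script would read this from a global, e.g.:
// window.__gateConfig = gateConfig;
```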
Gate v4
The next Gate implementation will reorganize all the rewrites and add processing tiers and priority. I may also use an ES Proxy to hijack Adobe Launch's _satellite and its call/apply handlers; doing so would let me adjust how the tag manager boots, reducing the initial hit of hydrating _satellite in one very long function task.
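A speculative sketch of that Proxy idea: wrap a tag manager global so method calls are queued instead of executing inside one long task. `_satellite` is Launch's real global, but the deferral policy below is invented for illustration:

```javascript
// Wrap `target` so its method calls are queued until `flush` is called,
// letting the caller decide when the deferred work actually runs.
function deferUntilIdle(target) {
  const queue = [];
  let flushed = false;
  const flush = () => {
    flushed = true;
    queue.forEach(({ prop, args }) => target[prop](...args));
    queue.length = 0;
  };
  const proxy = new Proxy(target, {
    get(obj, prop) {
      const value = obj[prop];
      if (typeof value !== 'function') return value;
      return (...args) => {
        if (flushed) return value.apply(obj, args); // normal pass-through
        queue.push({ prop, args }); // defer until we choose to flush
      };
    },
  });
  return { proxy, flush };
}
```

`flush` would be wired to an idle callback or TTI signal, so the tag manager's boot work lands in small slices instead of one long task.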
Gate, Module Federation, and the future state.
Gate and its script blocking is the ultimate workaround. Since no other team is required to perform this change, it’s cheap and effective.
However, Gate script blocking is not the cleanest solution.
In a perfect world, I'd replace the poor performance of a tag manager, or its vendors. Or at least control code loading and execution on demand, rather than dumping scripts on the page and adding custom code to use those scripts.
Module Federation to the rescue!
I touched on this future in my last article, leveraging startup code. Since then — I’ve expanded on this concept.
Under the hood, Adobe Launch is a window.container object of rules, interpreted by an OSS project called reactor-turbine. Turbine interprets the window.container rules, then deletes the global container object after _satellite is hydrated.
So, we have options:
- I could trick Launch into thinking it’s already initialized, so Adobe delivers me a clean and accessible config object from the backend. Then compile Turbine with Webpack and change the way it loads code and vendor dependencies. Some complexity would come into play around new vendor tags, but MF can register new modules dynamically at runtime, so that can be handled. But I still have to download two copies of Launch, since the container object comes with Adobe’s Turbine engine.
- I could use the edge network to strip Adobe's Launch script out of the container file, returning a small container object, then apply similar tactics to those mentioned in the previous point.
- I might be able to proxy function calls and efficiently hydrate satellite. This tactic would also allow me to bind module federation to a different event API
- I could leave launch alone, and instead, replace all the custom code scripts with the low-level webpack MF interface, but that’s a large effort compared to reading the rules and handling them in a more sophisticated manner.
- I could bind to _satellite.track(). When Launch triggers events, I could intercept them, check whether they meet my parameters, and if they do, feed them to webpack; otherwise, call the original track function.
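That last option can be sketched in a few lines. `_satellite.track(identifier, detail)` is Launch's real event API; `shouldHandle` and `federatedHandler` are hypothetical stand-ins for the rule matching and the webpack-backed module:

```javascript
// Wrap satellite.track: route matching events to a federated handler,
// fall through to the original Launch behavior for everything else.
function wrapTrack(satellite, shouldHandle, federatedHandler) {
  const originalTrack = satellite.track.bind(satellite);
  satellite.track = function (identifier, detail) {
    if (shouldHandle(identifier)) {
      return federatedHandler(identifier, detail); // feed to webpack module
    }
    return originalTrack(identifier, detail); // untouched Launch behavior
  };
}
```

Because the original function is preserved and called for unmatched events, existing rules keep working while the federated path is rolled out incrementally.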
In all solutions, the core concept is to use the tag manager as a rule engine only; it exclusively sends instructions to webpack for code loading and event triggering. Since we are combining Launch with webpack, we end up with a far more efficient startup. Launch runtime and vendors would not compete with main app resources, since both app and vendor code orchestration is managed by webpack, and vendor scripts can be added on demand when the analytics module is "required", similar to how code splitting works. Webpack 5 gives us the ability to set these async externals, which can just be URLs, so we get on-demand, efficient loading.
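Webpack 5's script externals are the mechanism behind "async externals which can just be URLs". A minimal config sketch, with the vendor name, URL, and global all placeholders:

```javascript
// webpack.config.js (fragment) — "script" externals let a module resolve to
// a plain URL that is loaded on demand the first time it is imported.
const config = {
  externalsType: 'script',
  externals: {
    // import('vendorTag') loads the URL lazily and resolves with whatever
    // the script leaves on window as `vendorGlobal`
    vendorTag: ['https://vendor.example.com/tag.js', 'vendorGlobal'],
  },
};
module.exports = config;
```

With this in place, `import('vendorTag')` triggers the vendor download only at the moment a rule actually needs it, instead of at page boot.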
The ideal solution would be to recompile reactor-turbine, then, at the edge, replace the Adobe Launch container's version of Turbine with my own, delivering a single script that contains the rules and a federated rule engine. Launch would work as a CMS, and custom code snippets would be committed to GitHub, deploying new analytics code that webpack always consumes as evergreen. Launch itself would do less work; it's no longer evaluating and injecting code. Turbine works as a rule engine and glorified messenger that tells webpack what it wants to do.
Since MF lets us import code, engineering can also import analytics triggers when required, instead of relying on mutation observers and lots of event listeners. If some script depends on an API call responding with page data, an engineer could import and execute the third-party script only once the app has the data it needs.
Combined with startup code and Gate, engineering can still execute vendor code that competes with the main webpack runtime when some tags are critical. Taking advantage of how MF works, engineering can also preload third-party code at a different stage from when webpack executes it. Using a technique like preheating, you reduce processor stress but can still trigger the event immediately, since the module is already loaded into the webpack cache.
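Preheating can be approximated with a preload hint that pays the network cost early and an on-demand injection later. This is one possible implementation sketch, not Gate's actual code, and the URL is a placeholder:

```javascript
// Pure helper: the attributes a preload hint needs (testable without a DOM)
function preloadAttributes(url) {
  return { rel: 'preload', as: 'script', href: url };
}

// Phase 1: fetch and cache the script during idle time, without executing it
function preheat(url) {
  if (typeof document === 'undefined') return;
  const link = document.createElement('link');
  Object.assign(link, preloadAttributes(url));
  document.head.appendChild(link);
}

// Phase 2: execute on demand — the tag is served from the warm preload
// cache, so there is little to no network wait at trigger time
function executePreheated(url) {
  if (typeof document === 'undefined') return;
  const script = document.createElement('script');
  script.src = url;
  document.head.appendChild(script);
}
```

Splitting load from execution is the whole point: the expensive part happens when the browser is idle, and the visible part happens instantly when needed.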
What if I don’t use tag managers?
It's much easier: Gate or YETT with some method to unblock will work just fine. You could also use MF in the same manner, taking advantage of async externals to import vendor tags via webpack and a script URL.
The tag manager adds a layer of complexity, since I need to process custom rules and bind them to webpack.
These performance tactics apply to any JS. The tag manager is a strong use case because engineering has little visibility, and the convenience of tag managers makes them an easy way to quickly add a lot of code.
The best way to boost performance is by removing code. Gate serves this purpose during initial boot, and adds the code back in a manner that improves RUM, increases Lighthouse scores, and does not impact telemetry capture rates.
Gate also lives up to its privacy origins: it can serve as a compliance layer for privacy laws.
While I've been using analytics and telemetry as the reference, Gate serves a greater purpose. Third parties like Google Maps or live chat tend to be quite heavy; the ability to defer these scripts until they are needed, or until browsers can execute them without processor strain, improves user experience and overall site perf.
Why don’t you use a service worker?
I'd love to, but I don't want to risk the worker getting stuck and causing problems in production. The goal was to introduce as little risk as possible and handle this with normal JS; I don't want to intercept the network unless I really need to.
With Web Workers, on the other hand, we might be able to make larger changes to tagging and process some third parties off the main thread.
Gate and partial hydration
Turning back to the main app, I can leverage Gate to determine when to hydrate HTML lower in the document. I usually don't need the entire React application hydrated to use the app; in reality, only above-the-fold markup needs to be hydrated and interactive.
Everything else can be hydrated based on IntersectionObserver, requestIdleCallback, or relative TTI calculations. The best combo is hydrating on either page TTI or relative TTI, in conjunction with IntersectionObserver. This ensures the user experience is prioritized on demand when lazily hydrating an application.
Lazy hydration has been a great solution for managing render and hydration times inside of an application, attaching more sophisticated ways to perform partial hydration is nice to have.
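A sketch of that combo: hydrate a below-the-fold section only once it is near the viewport and a TTI signal has fired. `hydrate` stands in for whatever framework call applies (e.g. React's hydration APIs), and `ttiPromise` would resolve from a TTI measurement like the one described earlier:

```javascript
// Hydrate `el` only when it approaches the viewport AND the page is quiet,
// so hydration work never competes with the critical path.
function lazyHydrate(el, hydrate, ttiPromise) {
  if (typeof IntersectionObserver === 'undefined') {
    return ttiPromise.then(() => hydrate(el)); // fallback: hydrate at TTI
  }
  return new Promise((resolve) => {
    const io = new IntersectionObserver(async (entries) => {
      if (entries.some((e) => e.isIntersecting)) {
        io.disconnect();
        await ttiPromise; // wait out the busy boot phase
        resolve(hydrate(el));
      }
    }, { rootMargin: '200px' }); // start slightly before it scrolls into view
    io.observe(el);
  });
}
```

The `rootMargin` buffer gives hydration a head start, so by the time the user reaches the section it is already interactive.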
Where’s the Gate repo?
At this time, Gate is not open source.
If you want to create your own Gate, here's a good place to start. It'll save time, but these libraries all required a rebuild to maximize perf.
You will still see a difference depending on how you implement the Gate-style logic.
The missing parts you'd need to create are the fallback and unblock logic: there is no preheat or efficient unblock out of the box, and YETT won't copy attributes. The time-to-interactive library has some efficiency issues but works across browsers. Google's TTI polyfill was pretty good and required minimal changes, but it will only work in browsers with the Performance V2 API. Some APIs are deprecated but will still function.
What about MF and Turbine?
That will likely be open source, with instructions on how to implement it. It would require deploying a federated remote that contains the custom code usually pasted into Launch's text boxes.
How I bind to the Launch container is still to be determined. _satellite.track() could be a friendly entry point that doesn't involve drastic changes to the tag manager runtime.
For more on performance: