Server-Timing Compression
Introduction
For domains that use our RUM product, through the power of the Edge, we are now injecting up to 3 Server-Timing entries per resource (including basepage). Because we don’t sample and because we beacon back every timer of every resource requested from every non-cross-origin `<IFRAME>`, we’ve already been applying custom compression based on tries to save data going out of the browser. For Server-Timing, we have a lot of work to do, because the data can get pretty verbose and redundant.
Example
Let’s imagine a scenario where we have five resources each with 2-3 Server-Timing entries:
Resource | Server-Timing |
---|---|
Resource #1 | cdn-cache; desc=HIT, edge; dur=26 |
Resource #2 | cdn-cache; desc=MISS, edge; dur=23, origin; dur=129 |
Resource #3 | cdn-cache; desc=HIT, edge; dur=11 |
Resource #4 | cdn-cache; desc=MISS, edge; dur=16, origin; dur=327 |
Resource #5 | cdn-cache; desc=MISS, edge; dur=19, origin; dur=214 |
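To make the later steps concrete, here is a rough sketch of how a header value from the table breaks down into `name`, `description`, and `duration` fields. In the browser these fields arrive already parsed on each resource's `serverTiming` entries; the function `parse_server_timing` below is a hypothetical, simplified stand-in that ignores quoted values containing commas or semicolons.

```python
def parse_server_timing(header):
    """Split a Server-Timing header value into (name, description, duration) tuples.

    Simplified illustration only: a missing desc becomes "" and a
    missing dur becomes 0, matching the tables in this walkthrough.
    """
    entries = []
    for metric in header.split(","):
        name, *params = [part.strip() for part in metric.split(";")]
        description, duration = "", 0
        for param in params:
            key, _, value = param.partition("=")
            key = key.strip().lower()
            value = value.strip().strip('"')
            if key == "desc":
                description = value
            elif key == "dur":
                duration = float(value)
        entries.append((name, description, duration))
    return entries


resource_2 = parse_server_timing("cdn-cache; desc=MISS, edge; dur=23, origin; dur=129")
```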
All five of those resources emit `cdn-cache` and `edge` entries. Three emit an `origin` entry. Built on the assumptions that developers are emitting mostly the same Server-Timing header `name` values, that `desc` is essentially an enum, and that `dur` values will be mostly unique, we augmented the resource timing compression library we use to compress all collected Server-Timing data.
There are two parts to the compression:
- a lookup data structure shared by all resources
- a compressed string (for each Server-Timing entry) that “keys into” the shared lookup
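Here’s how it works. First, we iterate over the Server-Timing entries for all resources, counting up the unique `name` / `description` tuples. For the example above, that gives five `cdn-cache` entries (three `"MISS"` and two `"HIT"`), five `edge` entries, and three `origin` entries, the latter two with no description.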
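The counting step can be sketched roughly as follows. The shape of the intermediate structure (a per-name `count` plus a per-description tally) and the function name `count_entries` are assumptions made for illustration, not necessarily what the library uses.

```python
from collections import Counter

# Entries from the five example resources as (name, description, duration).
resources = [
    [("cdn-cache", "HIT", 0), ("edge", "", 26)],
    [("cdn-cache", "MISS", 0), ("edge", "", 23), ("origin", "", 129)],
    [("cdn-cache", "HIT", 0), ("edge", "", 11)],
    [("cdn-cache", "MISS", 0), ("edge", "", 16), ("origin", "", 327)],
    [("cdn-cache", "MISS", 0), ("edge", "", 19), ("origin", "", 214)],
]


def count_entries(resources):
    """Count how often each name, and each description per name, occurs."""
    counts = {}
    for entries in resources:
        for name, description, _duration in entries:
            per_name = counts.setdefault(
                name, {"count": 0, "descriptions": Counter()}
            )
            per_name["count"] += 1
            per_name["descriptions"][description] += 1
    return counts


counts = count_entries(resources)
```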
Then, we take that data structure and convert it into an array of arrays, ordering by most popular (`cdn-cache` and `edge` before `origin`, `"MISS"` before `"HIT"`).
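One plausible way to do this conversion is sketched below. It assumes each inner array holds a name followed by that name's descriptions, which is consistent with the indexes used later in this walkthrough; the input shape and the name `to_lookup` are hypothetical.

```python
# Counts for the five example resources (assumed intermediate shape).
counts = {
    "cdn-cache": {"count": 5, "descriptions": {"HIT": 2, "MISS": 3}},
    "edge": {"count": 5, "descriptions": {"": 5}},
    "origin": {"count": 3, "descriptions": {"": 3}},
}


def to_lookup(counts):
    """Build [name, desc, desc, ...] rows, most popular first."""
    # sorted() is stable, so names with equal counts keep their original order.
    names = sorted(counts, key=lambda n: counts[n]["count"], reverse=True)
    lookup = []
    for name in names:
        descs = counts[name]["descriptions"]
        ordered = sorted(descs, key=lambda d: descs[d], reverse=True)
        lookup.append([name] + ordered)
    return lookup


lookup = to_lookup(counts)
print(lookup)  # [['cdn-cache', 'MISS', 'HIT'], ['edge', ''], ['origin', '']]
```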
Remember, the goal here is to save as many bytes on the wire as possible. Let’s take advantage of the fact that there are no descriptions specified for any of the `edge` or `origin` entries: their empty descriptions can be left out of the lookup. This leaves us with our final lookup.
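As a sketch of that trimming step, assuming the array-of-arrays shape of a name followed by its descriptions (the helper name `drop_empty_descriptions` is made up for this example):

```python
# Lookup before trimming (assumed shape: [name, desc, desc, ...] per row).
lookup = [["cdn-cache", "MISS", "HIT"], ["edge", ""], ["origin", ""]]


def drop_empty_descriptions(lookup):
    """Remove empty-string descriptions; the name stays as the first item."""
    return [[name] + [d for d in descs if d != ""] for name, *descs in lookup]


final_lookup = drop_empty_descriptions(lookup)
print(final_lookup)  # [['cdn-cache', 'MISS', 'HIT'], ['edge'], ['origin']]
```

This sketch only covers the case in the example, where a given name has either no descriptions at all or only non-empty ones.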
This lookup tells us that for all of the Server-Timing entries for all of the resources we have collected:
- there are exactly three unique `name` values: `cdn-cache`, `edge`, and `origin`
- for `cdn-cache`, there are exactly two non-empty descriptions (`"HIT"` and `"MISS"`), and `"MISS"` appears at least as many times as `"HIT"`
- there are no non-empty descriptions for either `edge` or `origin`
Now we can figure out how our individual resources will “key into” our lookup. Here’s a view of the Server-Timing entries we’ve collected:
Resource | name | description | duration |
---|---|---|---|
Resource #1 | cdn-cache | HIT | 0 |
Resource #1 | edge | "" | 26 |
Resource #2 | cdn-cache | MISS | 0 |
Resource #2 | edge | "" | 23 |
Resource #2 | origin | "" | 129 |
Resource #3 | cdn-cache | HIT | 0 |
Resource #3 | edge | "" | 11 |
Resource #4 | cdn-cache | MISS | 0 |
Resource #4 | edge | "" | 16 |
Resource #4 | origin | "" | 327 |
Resource #5 | cdn-cache | MISS | 0 |
Resource #5 | edge | "" | 19 |
Resource #5 | origin | "" | 214 |
For each row, let’s apply two changes:
- replace the `name` with the index into the `name` value of the lookup, as “name index”
- replace the `description` with the index into the `description` value for that `name` of the lookup, as “desc index”
Resource | name index | desc index | duration |
---|---|---|---|
Resource #1 | 0 | 1 | 0 |
Resource #1 | 1 | | 26 |
Resource #2 | 0 | 0 | 0 |
Resource #2 | 1 | | 23 |
Resource #2 | 2 | | 129 |
Resource #3 | 0 | 1 | 0 |
Resource #3 | 1 | | 11 |
Resource #4 | 0 | 0 | 0 |
Resource #4 | 1 | | 16 |
Resource #4 | 2 | | 327 |
Resource #5 | 0 | 0 | 0 |
Resource #5 | 1 | | 19 |
Resource #5 | 2 | | 214 |
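The index substitution can be sketched like this, assuming the lookup is an array of `[name, desc, ...]` rows. The function `to_indices` is illustrative; it returns `None` for the desc index when the description is empty, mirroring the blank cells in the table.

```python
# Final lookup from the walkthrough (assumed shape).
final_lookup = [["cdn-cache", "MISS", "HIT"], ["edge"], ["origin"]]


def to_indices(name, description, lookup):
    """Return (name index, desc index); desc index is None when empty."""
    for name_index, (candidate, *descs) in enumerate(lookup):
        if candidate == name:
            desc_index = descs.index(description) if description else None
            return name_index, desc_index
    raise KeyError(name)


print(to_indices("cdn-cache", "HIT", final_lookup))  # (0, 1)
print(to_indices("edge", "", final_lookup))          # (1, None)
```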
Next, let’s apply the template `${duration}:${name index}.${desc index}`, as “Compressed #1”:
Resource | Compressed #1 |
---|---|
Resource #1 | 0:0.1 |
Resource #1 | 26:1 |
Resource #2 | 0:0.0 |
Resource #2 | 23:1 |
Resource #2 | 129:2 |
Resource #3 | 0:0.1 |
Resource #3 | 11:1 |
Resource #4 | 0:0.0 |
Resource #4 | 16:1 |
Resource #4 | 327:2 |
Resource #5 | 0:0.0 |
Resource #5 | 19:1 |
Resource #5 | 214:2 |
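A minimal sketch of applying the template, where the `.${desc index}` part is left off when there is no description, as in the table. The name `compress_v1` is hypothetical.

```python
def compress_v1(duration, name_index, desc_index=None):
    """Format one entry as duration:name_index[.desc_index]."""
    text = f"{duration}:{name_index}"
    if desc_index is not None:
        text += f".{desc_index}"
    return text


print(compress_v1(0, 0, 1))  # 0:0.1
print(compress_v1(26, 1))    # 26:1
```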
Now, let’s strip out those inferrable zeroes, as “Compressed #2”:
Resource | Compressed #2 |
---|---|
Resource #1 | 0:.1 |
Resource #1 | 26:1 |
Resource #2 | 0 |
Resource #2 | 23:1 |
Resource #2 | 129:2 |
Resource #3 | 0:.1 |
Resource #3 | 11:1 |
Resource #4 | 0 |
Resource #4 | 16:1 |
Resource #4 | 327:2 |
Resource #5 | 0 |
Resource #5 | 19:1 |
Resource #5 | 214:2 |
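The stripping rules are not spelled out, but the table is consistent with the following reading: omit a name index of zero, omit a desc index of zero together with its dot, and drop the colon when nothing follows it. Under that assumption, a sketch (with the made-up name `compress_v2`) might look like:

```python
def compress_v2(duration, name_index, desc_index=None):
    """Format one entry, omitting index values of zero.

    The duration is always kept, even when it is zero, to match
    the "Compressed #2" table.
    """
    name_part = str(name_index) if name_index else ""
    desc_part = f".{desc_index}" if desc_index else ""
    key = name_part + desc_part
    return f"{duration}:{key}" if key else str(duration)


print(compress_v2(0, 0, 1))  # 0:.1
print(compress_v2(0, 0, 0))  # 0
print(compress_v2(129, 2))   # 129:2
```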
Finally, let’s concatenate multiple entries for each resource with a comma, as “Compressed #3”:
Resource | Compressed #3 |
---|---|
Resource #1 | 0:.1,26:1 |
Resource #2 | 0,23:1,129:2 |
Resource #3 | 0:.1,11:1 |
Resource #4 | 0,16:1,327:2 |
Resource #5 | 0,19:1,214:2 |
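Putting the per-entry steps together, one resource's entries can be compressed and joined as in the sketch below. The lookup shape and the helper names `compress_entry` and `compress_resource` are assumptions for illustration.

```python
# Final lookup from the walkthrough (assumed shape: [name, desc, ...] rows).
LOOKUP = [["cdn-cache", "MISS", "HIT"], ["edge"], ["origin"]]


def compress_entry(name, description, duration, lookup):
    """Compress one entry, omitting index values of zero."""
    name_index = next(i for i, row in enumerate(lookup) if row[0] == name)
    descs = lookup[name_index][1:]
    desc_index = descs.index(description) if description else 0
    key = (str(name_index) if name_index else "") + (
        f".{desc_index}" if desc_index else ""
    )
    return f"{duration}:{key}" if key else str(duration)


def compress_resource(entries, lookup):
    """Join the compressed entries of one resource with commas."""
    return ",".join(compress_entry(n, d, dur, lookup) for n, d, dur in entries)


resource_1 = [("cdn-cache", "HIT", 0), ("edge", "", 26)]
resource_2 = [("cdn-cache", "MISS", 0), ("edge", "", 23), ("origin", "", 129)]
print(compress_resource(resource_1, LOOKUP))  # 0:.1,26:1
print(compress_resource(resource_2, LOOKUP))  # 0,23:1,129:2
```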
It is at this point that we can stick both the lookup and the compressed per-resource Server-Timing entries into the trie on our beacon.
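As a sanity check that the lookup plus the compressed strings carry enough information, here is a hypothetical decoder for the format above. It is not taken from the library: it assumes missing indexes mean zero, that a name with no listed descriptions decodes to an empty description, and that durations are whole numbers as in the example.

```python
# Final lookup from the walkthrough (assumed shape: [name, desc, ...] rows).
LOOKUP = [["cdn-cache", "MISS", "HIT"], ["edge"], ["origin"]]


def decode_resource(compressed, lookup):
    """Expand a compressed string back into (name, description, duration)."""
    entries = []
    for item in compressed.split(","):
        duration_text, _, key = item.partition(":")
        name_text, _, desc_text = key.partition(".")
        name_index = int(name_text) if name_text else 0
        desc_index = int(desc_text) if desc_text else 0
        name, *descs = lookup[name_index]
        description = descs[desc_index] if desc_index < len(descs) else ""
        entries.append((name, description, int(duration_text)))
    return entries


print(decode_resource("0:.1,26:1", LOOKUP))
# [('cdn-cache', 'HIT', 0), ('edge', '', 26)]
```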
(A clever reader might notice a failure to lop off durations of value zero, which might be quite common. 😦 The writer intends to personally apologize to each byte needlessly sent over the wire and promises to rectify this egregious and regrettable omission as soon as possible.)