Server-Timing Compression

Introduction

For domains that use our RUM product, through the power of the Edge, we are now injecting up to 3 Server-Timing entries per resource (including basepage). Because we don’t sample and because we beacon back every timer of every resource requested from every non cross-origin <IFRAME>, we’ve already been applying custom compression based on tries to save data going out of the browser. For Server-Timing, we have a lot of work to do, because the data can get pretty verbose and redundant.

Example

Let’s imagine a scenario where we have five resources each with 2-3 Server-Timing entries:

Resource     Server-Timing
Resource #1  cdn-cache; desc=HIT, edge; dur=26
Resource #2  cdn-cache; desc=MISS, edge; dur=23, origin; dur=129
Resource #3  cdn-cache; desc=HIT, edge; dur=11
Resource #4  cdn-cache; desc=MISS, edge; dur=16, origin; dur=327
Resource #5  cdn-cache; desc=MISS, edge; dur=19, origin; dur=214

All five of those resources emit cdn-cache and edge entries. Three emit an origin entry. Building on the assumptions that developers emit mostly the same Server-Timing name values, that desc is essentially an enum, and that dur values will be mostly unique, we augmented our resource timing compression library to also compress all collected Server-Timing data.

There are two parts to the compression:

  • a lookup data structure shared by all resources
  • a compressed string (for each Server-Timing entry) that “keys into” the shared lookup

Here’s how it works. First, we iterate over the Server-Timing entries for all resources counting up the unique name / description tuples like so:

{
  "cdn-cache": {
    "count": 5,
    "counts": {"HIT": 2, "MISS": 3}
  },
  "edge": {
    "count": 5,
    "counts": {"": 5}
  },
  "origin": {
    "count": 3,
    "counts": {"": 3}
  }
}
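
As a rough sketch of this counting pass (not the library's actual code), the tally could be built as follows. The input shape, an array of `{name, description, duration}` entries per resource, and the function name `countEntries` are assumptions made for illustration.

```javascript
// Sample data mirroring the five resources above. The input shape
// (one array of entries per resource) is an assumption for illustration.
const resources = [
  [{ name: "cdn-cache", description: "HIT", duration: 0 },
   { name: "edge", description: "", duration: 26 }],
  [{ name: "cdn-cache", description: "MISS", duration: 0 },
   { name: "edge", description: "", duration: 23 },
   { name: "origin", description: "", duration: 129 }],
  [{ name: "cdn-cache", description: "HIT", duration: 0 },
   { name: "edge", description: "", duration: 11 }],
  [{ name: "cdn-cache", description: "MISS", duration: 0 },
   { name: "edge", description: "", duration: 16 },
   { name: "origin", description: "", duration: 327 }],
  [{ name: "cdn-cache", description: "MISS", duration: 0 },
   { name: "edge", description: "", duration: 19 },
   { name: "origin", description: "", duration: 214 }],
];

// Hypothetical helper: tally each name, and each description under that name.
function countEntries(resources) {
  const counts = {};
  for (const entries of resources) {
    for (const { name, description } of entries) {
      if (!counts[name]) {
        counts[name] = { count: 0, counts: {} };
      }
      counts[name].count += 1;
      counts[name].counts[description] =
        (counts[name].counts[description] || 0) + 1;
    }
  }
  return counts;
}

console.log(JSON.stringify(countEntries(resources), null, 2));
```

Running it on the sample data prints the same structure as shown above.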

Then, we take that data structure and convert it into an array of arrays, ordering by most popular (cdn-cache and edge before origin, "MISS" before "HIT"):

[
  ["cdn-cache", "MISS", "HIT"],
  ["edge", ""],
  ["origin", ""],
];

Remember, the goal here is to save as many bytes on the wire as possible. Let’s take advantage of the fact that there are no descriptions specified for any of the edge or origin entries. This leaves us with our final lookup:

[["cdn-cache", "MISS", "HIT"], "edge", "origin"];
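
The two steps above (ordering by popularity, then collapsing names that have no descriptions) might look something like this minimal sketch. The function name `buildLookup` is hypothetical, and breaking ties by first-seen order is an assumption; the text only says "most popular" first.

```javascript
// The counts structure from the previous step.
const counts = {
  "cdn-cache": { count: 5, counts: { HIT: 2, MISS: 3 } },
  edge: { count: 5, counts: { "": 5 } },
  origin: { count: 3, counts: { "": 3 } },
};

// Hypothetical helper: turn the counts into the compact lookup.
function buildLookup(counts) {
  return Object.entries(counts)
    // Most popular names first. Array.prototype.sort is stable, so ties
    // keep their original order (an assumption about tie-breaking).
    .sort(([, a], [, b]) => b.count - a.count)
    .map(([name, { counts: descCounts }]) => {
      // Most popular descriptions first.
      const descriptions = Object.entries(descCounts)
        .sort(([, a], [, b]) => b - a)
        .map(([description]) => description);
      // A name whose only description is "" collapses to a bare string.
      const onlyEmpty = descriptions.length === 1 && descriptions[0] === "";
      return onlyEmpty ? name : [name, ...descriptions];
    });
}

console.log(JSON.stringify(buildLookup(counts)));
// [["cdn-cache","MISS","HIT"],"edge","origin"]
```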

This lookup tells us that for all of the Server-Timing entries for all of the resources we have collected:

  • there are exactly three unique name values: cdn-cache, edge, and origin
  • for cdn-cache, there are exactly two non-empty descriptions ("HIT" and "MISS") and "MISS" appears at least as many times as "HIT"
  • there are no non-empty descriptions for either edge or origin

Now we can figure out how our individual resources will “key into” our lookup. Here’s a view of the Server-Timing entries we’ve collected:

Resource     name       description  duration
Resource #1  cdn-cache  HIT          0
Resource #1  edge       ""           26
Resource #2  cdn-cache  MISS         0
Resource #2  edge       ""           23
Resource #2  origin     ""           129
Resource #3  cdn-cache  HIT          0
Resource #3  edge       ""           11
Resource #4  cdn-cache  MISS         0
Resource #4  edge       ""           16
Resource #4  origin     ""           327
Resource #5  cdn-cache  MISS         0
Resource #5  edge       ""           19
Resource #5  origin     ""           214

For each row, let’s apply two changes:

  • replace the name with its index in the lookup, as “name index”
  • replace the description with its index among that name’s descriptions in the lookup, as “desc index” (left blank for names that have no descriptions)

Resource     name index  desc index  duration
Resource #1  0           1           0
Resource #1  1                       26
Resource #2  0           0           0
Resource #2  1                       23
Resource #2  2                       129
Resource #3  0           1           0
Resource #3  1                       11
Resource #4  0           0           0
Resource #4  1                       16
Resource #4  2                       327
Resource #5  0           0           0
Resource #5  1                       19
Resource #5  2                       214
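
To make the index substitution concrete, here is a small sketch of how one entry could be mapped to its two indices. The helper name `toIndices` and the use of `null` for the blank desc index are assumptions for illustration, and the sketch assumes the lookup was built from the same entries it is asked about.

```javascript
const lookup = [["cdn-cache", "MISS", "HIT"], "edge", "origin"];

// Hypothetical helper: find the name index and desc index for one entry.
function toIndices(lookup, { name, description, duration }) {
  // A lookup item is either a bare name or [name, ...descriptions].
  const nameIndex = lookup.findIndex(
    (item) => (Array.isArray(item) ? item[0] : item) === name
  );
  const item = lookup[nameIndex];
  // Descriptions start at position 1 of the item, so subtract 1.
  // Bare-string items carry no descriptions: no desc index (null).
  const descIndex = Array.isArray(item)
    ? item.indexOf(description, 1) - 1
    : null;
  return { nameIndex, descIndex, duration };
}

console.log(toIndices(lookup, { name: "cdn-cache", description: "HIT", duration: 0 }));
// { nameIndex: 0, descIndex: 1, duration: 0 }
console.log(toIndices(lookup, { name: "edge", description: "", duration: 26 }));
// { nameIndex: 1, descIndex: null, duration: 26 }
```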

Next, let’s apply the template ${duration}:${name index}.${desc index}, as “Compressed #1”:

Resource     Compressed #1
Resource #1  0:0.1
Resource #1  26:1
Resource #2  0:0.0
Resource #2  23:1
Resource #2  129:2
Resource #3  0:0.1
Resource #3  11:1
Resource #4  0:0.0
Resource #4  16:1
Resource #4  327:2
Resource #5  0:0.0
Resource #5  19:1
Resource #5  214:2
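
As a sketch, the template step amounts to a single string interpolation. The function name `applyTemplate` and the row shape are hypothetical; a blank desc index is represented as `null` here and simply omitted, which matches the rows above that have no ".desc" part.

```javascript
// Hypothetical helper: format one row as duration:nameIndex[.descIndex].
function applyTemplate({ duration, nameIndex, descIndex }) {
  // Rows without a desc index (null or undefined) get no ".descIndex" suffix.
  const descPart =
    descIndex === null || descIndex === undefined ? "" : `.${descIndex}`;
  return `${duration}:${nameIndex}${descPart}`;
}

// The three rows of Resource #2 plus the first row of Resource #1.
const rows = [
  { duration: 0, nameIndex: 0, descIndex: 0 },
  { duration: 23, nameIndex: 1, descIndex: null },
  { duration: 129, nameIndex: 2, descIndex: null },
  { duration: 0, nameIndex: 0, descIndex: 1 },
];

console.log(rows.map(applyTemplate));
// [ '0:0.0', '23:1', '129:2', '0:0.1' ]
```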

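Now, let’s strip out the zeroes that can be inferred: a name index of 0 and a desc index of 0 are treated as defaults and dropped, along with any separator that is left with nothing after it. This gives us “Compressed #2”: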

Resource     Compressed #2
Resource #1  0:.1
Resource #1  26:1
Resource #2  0
Resource #2  23:1
Resource #2  129:2
Resource #3  0:.1
Resource #3  11:1
Resource #4  0
Resource #4  16:1
Resource #4  327:2
Resource #5  0
Resource #5  19:1
Resource #5  214:2
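
One way to read the zero-stripping rule off the table is sketched below. This is inferred from the rows shown and is not taken from the library's source, so treat the exact rule and the function name `stripZeroes` as assumptions: a zero name index or zero desc index is omitted, and the ":" and "." separators appear only when something follows them.

```javascript
// Hypothetical helper: format one row while omitting inferrable zeroes.
function stripZeroes({ duration, nameIndex, descIndex }) {
  let out = String(duration);
  // A missing desc index (null) is treated the same as 0 here.
  if (nameIndex || descIndex) {
    // Keep the ":" but leave the name index empty when it is 0.
    out += ":" + (nameIndex || "");
    if (descIndex) {
      out += "." + descIndex;
    }
  }
  return out;
}

console.log(stripZeroes({ duration: 0, nameIndex: 0, descIndex: 1 })); // 0:.1
console.log(stripZeroes({ duration: 0, nameIndex: 0, descIndex: 0 })); // 0
console.log(stripZeroes({ duration: 26, nameIndex: 1, descIndex: null })); // 26:1
```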

Finally, let’s concatenate multiple entries for each resource with a comma, as “Compressed #3”:

Resource     Compressed #3
Resource #1  0:.1,26:1
Resource #2  0,23:1,129:2
Resource #3  0:.1,11:1
Resource #4  0,16:1,327:2
Resource #5  0,19:1,214:2
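
Putting the per-resource steps together, an end-to-end sketch might look like the following. The names `compressEntry` and `compressResource` are hypothetical, the zero-stripping rule is inferred from the tables above, and durations are kept as plain decimal integers as in the example.

```javascript
const lookup = [["cdn-cache", "MISS", "HIT"], "edge", "origin"];

// Hypothetical helper: compress one Server-Timing entry against the lookup.
function compressEntry(lookup, { name, description, duration }) {
  const nameIndex = lookup.findIndex(
    (item) => (Array.isArray(item) ? item[0] : item) === name
  );
  const item = lookup[nameIndex];
  // Bare-string items have no descriptions; treat their desc index as 0.
  const descIndex = Array.isArray(item) ? item.indexOf(description, 1) - 1 : 0;

  let out = String(duration);
  if (nameIndex || descIndex) {
    out += ":" + (nameIndex || "");
    if (descIndex) {
      out += "." + descIndex;
    }
  }
  return out;
}

// Hypothetical helper: join all compressed entries of one resource.
function compressResource(lookup, entries) {
  return entries.map((entry) => compressEntry(lookup, entry)).join(",");
}

const resource1 = [
  { name: "cdn-cache", description: "HIT", duration: 0 },
  { name: "edge", description: "", duration: 26 },
];
const resource2 = [
  { name: "cdn-cache", description: "MISS", duration: 0 },
  { name: "edge", description: "", duration: 23 },
  { name: "origin", description: "", duration: 129 },
];

console.log(compressResource(lookup, resource1)); // 0:.1,26:1
console.log(compressResource(lookup, resource2)); // 0,23:1,129:2
```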

It is at this point that we can stick both the lookup and the compressed per-resource Server-Timing entries into the trie on our beacon.

(A clever reader might notice a failure to lop off durations of value zero, which might be quite common. 😦 The writer intends to personally apologize to each byte needlessly sent over the wire and promises to rectify this egregious and regrettable omission as soon as possible.)