HTTP Archive: jQuery

March 18, 2013 4:30 pm | 35 Comments

A recent thread on GitHub for html5-boilerplate discusses whether there’s a benefit from loading jQuery from Google Hosted Libraries, as opposed to serving it from your local server. They referenced the great article Caching and the Google AJAX Libraries by Steve Webster.

SteveW’s article concludes by saying that loading jQuery from Google Hosted Libraries is probably NOT a good idea because of the low percentage of sites that use a single version. Instead, developers should bundle jQuery with their own scripts and host it from their own web server. Steve got his data from the HTTP Archive – a project that I run. His article was written in November 2011, so I wanted to update the numbers in this post to help the folks on that GitHub thread. I also raise some issues that arise from creating combined scripts, especially ones that end up much larger than jQuery itself.

Preamble

SteveW shows the SQL he used. I’m going to do the same. As background, when SteveW did his analysis in November 2011 there were only ~30,000 URLs that were analyzed in each HTTP Archive crawl. We’re currently analyzing ~300,000 per crawl. So this is a bigger and different sample set. I’m going to be looking at the HTTP Archive crawl for Mar 1 2013 which contains 292,297 distinct URLs. The SQL shown in this blog post references these pages based on their unique pageids: pageid >= 6726015 and pageid <= 7043218, so you’ll see that in the queries below.

Sites Loading jQuery from Google Hosted Libraries

The first stat in SteveW’s article is the percentage of sites using the core jQuery module from Google Hosted Libraries. Here’s the updated query and results:

mysql> select count(distinct(pageid)) as count,
(100*count(distinct(pageid))/292297) as percent 
from requests where pageid >= 6726015 and pageid <= 7043218 
and url like "%://ajax.googleapis.com/ajax/libs/jquery/%";
+-------+---------+
| count | percent |
+-------+---------+
| 53414 | 18.2739 |
+-------+---------+

18% of the world’s top 300K URLs load jQuery from Google Hosted Libraries, up from 13% in November 2011.

As I mentioned, the sample size is much different across these two dates: ~30K vs ~300K. To do more of an apples-to-apples comparison I restricted the Mar 1 2013 query to just the top 30K URLs (which is 28,980 unique URLs after errors, etc.):

mysql> select count(distinct(p.pageid)) as count,
(100*count(distinct(p.pageid))/28980) as percent 
from requests as r, pages as p where p.pageid >= 6726015 and p.pageid <= 7043218 
and rank <= 30000 and p.pageid=r.pageid 
and r.url LIKE "%://ajax.googleapis.com/ajax/libs/jquery/%";
+-------+---------+
| count | percent |
+-------+---------+
|  5517 | 19.0373 |
+-------+---------+

This shows an even higher percentage of sites loading jQuery core from Google Hosted Libraries: 19% vs 13% in November 2011.

Most Popular Version of jQuery from Google Hosted Libraries

The main question being asked is: Is there enough critical mass around jQuery on Google Hosted Libraries to get a performance boost? The performance boost would come from cross-site caching: The user goes to site A, which deposits jQuery version X.Y.Z into the browser cache. When the user goes to another site that needs jQuery X.Y.Z, it’s already in the cache and the site loads more quickly. The probability of cross-site caching is greater if sites use the same version of jQuery, and is lower if there’s a large amount of version fragmentation. Here’s a look at the top 10 versions of jQuery loaded from Google Hosted Libraries (GHL) in the Mar 1 2013 crawl.

mysql> select url, count(distinct(pageid)) as count,
(100*count(distinct(pageid))/292297) as percent 
from requests where pageid >= 6726015 and pageid <= 7043218 
and url LIKE "%://ajax.googleapis.com/ajax/libs/jquery/%" 
group by url order by count desc;
Table 1. Top Versions of jQuery from GHL Mar 1 2013
jQuery version percentage of sites
1.4.2 (http) 1.7%
1.7.2 (http) 1.6%
1.7.1 (http) 1.6%
1.3.2 (http) 1.2%
1.7.2 (https) 1.1%
1.8.3 (http) 1.0%
1.7.1 (https) 0.8%
1.8.2 (http) 0.7%
1.6.1 (http) 0.6%
1.5.2 (http) 0.5% (tied)
1.6.2 (http) 0.5% (tied)

That looks highly fragmented. SteveW saw less fragmentation in Nov 2011:

Table 2. Top Versions of jQuery from GHL Nov 15 2011
jQuery version percentage of sites
1.4.2 (http) 2.7%
1.3.2 (http) 1.3%
1.6.2 (http) 0.8%
1.4.4 (http) 0.8%
1.6.1 (http) 0.7%
1.5.2 (http) 0.7%
1.6.4 (http) 0.5%
1.5.1 (http) 0.5%
1.4 (http) 0.4%
1.4.2 (https) 0.4%

Takeaways #1

Here are my takeaways from looking at jQuery served from Google Hosted Libraries compared to November 2011:

  • The most popular version of jQuery is 1.4.2 in both analyses. Even though the percentage dropped from 2.7% to 1.7%, it’s surprising that such an old version maintained the #1 spot. jQuery 1.4.2 was released February 19, 2010 – over three years ago! The latest version, jQuery 1.9.1, doesn’t make it into the top 10 most popular versions, but it was only released on February 4, 2013. The newest version in the top 10 is jQuery 1.8.3, which is used on 1% of sites (6th most popular). It was released November 13, 2012. The upgrade rate on jQuery is slow, with many sites using versions that are multiple years old.
  • There is less critical mass on a single version of jQuery compared to November 2011: the most popular version accounted for 2.7% of sites then versus 1.7% now. If your site uses the most popular version of jQuery, the probability of users benefiting from cross-site caching is lower today than it was in November 2011.
  • There is more critical mass across the top 10 versions of jQuery. The top 10 versions accounted for 8.8% of sites in November 2011; that has increased to 10.8% today.
  • 8% of sites loading jQuery from Google Hosted Libraries add a query string to the URL. The most popular URL with a querystring is /ajax/libs/jquery/1.4.2/jquery.min.js?ver=1.4.2. (While that’s not surprising, the second most popular URL is: /ajax/libs/jquery/1.7.1/jquery.min.js?ver=3.5.1.) As SteveW pointed out, this greatly reduces the probability of benefiting from cross-site caching because the browser uses the entire URL as the key when looking up files in the cache. Sites should drop the querystring when loading jQuery from Google Hosted Libraries (or any server for that matter).

The Main Question

While these stats are interesting, they don’t answer the original question asked in the GitHub thread: which is better for performance, loading jQuery from Google Hosted Libraries or from your own server?

There are really three alternatives to consider:

  1. Load core jQuery from Google Hosted Libraries.
  2. Load core jQuery from your own server.
  3. Load core jQuery bundled with your other scripts from your own server.

I don’t have statistics for #3 in the HTTP Archive because I’m searching for URLs that match some regex containing “jquery” and it’s unlikely that a website’s combined script would preserve that naming convention.

I can find statistics for #2. This tells us the number of sites that could potentially contribute to the critical mass for cross-site caching benefits if they switched from self-hosting to loading from Google Hosted Libraries. Finding sites that host their own version of jQuery is difficult. I want to restrict it to sites loading core jQuery (since that’s what they would switch to on Google Hosted Libraries). After some trial-and-error I came up with this long query. It basically looks for a URL containing "jquery[.min].js", "jquery-1.x[.y][.min].js", or "jquery-latest[.min].js".

select count(distinct(pageid)) as count,
(100*count(distinct(pageid))/292297) as percent 
from requests where pageid >= 6726015 and pageid <= 7043218 
and ( url like "%/jquery.js%" or url like "%/jquery.min.js%" 
or url like "%/jquery-1._._.js%" or url like "%/jquery-1._._.min.js%" 
or url like "%/jquery-1._.js%" or url like "%/jquery-1._.min.js%" 
or url like "%/jquery-latest.js%" or url like "%/jquery-latest.min.js%" ) 
and mimeType like "%script%";
+--------+---------+
| count  | percent |
+--------+---------+
| 164161 | 56.1624 |
+--------+---------+

Here are the most popular hostnames across all sites:

Table 3. Top Hostnames Serving jQuery Mar 1 2013
hostname percentage of sites
ajax.googleapis.com 18.3%
code.jquery.com 1.4%
yandex.st 0.3%
ajax.aspnetcdn.com 0.2%
mat1.gtimg.com 0.2%
ak2.imgaft.com 0.1%
img1.imgsmail.ru 0.1%
www.yfum.com 0.1%
img.sedoparking.com 0.1%
www.biggerclicks.com 0.1%

Takeaways #2

  • 56% of sites are using core jQuery. This is very impressive. This is similar to the findings from BuiltWith (compared to “Top 100,000” trends). The percentage of sites using some portion of jQuery is even higher if you take into consideration jQuery modules other than core, and websites that bundle jQuery with their own scripts and rename the resulting URL.
  • 38% of sites are loading core jQuery from something other than Google Hosted Libraries (56% – 18%). Thus, there would be a much greater potential to benefit from cross-site caching if these websites moved to Google Hosted Libraries. Keep in mind – this query is just for core jQuery – so these websites are already loading that module as a separate resource, meaning it would be easy to switch that request to another server.
  • Although the tail is long, Google Hosted Libraries is by far the most used source for core jQuery. If we want to increase the critical mass around requesting jQuery, Google Hosted Libraries is the clear choice.

Conclusion

This blog post contains many statistics that are useful in deciding whether to load jQuery from Google Hosted Libraries. The pros of requesting jQuery core from Google Hosted Libraries are:

  • potential benefit of cross-site caching
  • ease of switching if you’re already loading jQuery core as a standalone request
  • no hosting, storage, bandwidth, or maintenance costs
  • benefit of Google’s CDN performance
  • 1-year cache time

The cons to loading jQuery from Google Hosted Libraries include:

  • an extra DNS lookup
  • you might use a different CDN that’s faster
  • can’t combine jQuery with your other scripts

There are two other more complex but potentially significant issues to think about if you’re considering bundling jQuery with your other scripts. (Thanks to Ilya Grigorik for mentioning these.)

First, combining multiple scripts together increases the likelihood of the combined resource needing to be updated. This is especially true when bundling jQuery, since jQuery is likely to change less frequently than your site-specific JavaScript, so every change to your own code forces users to download jQuery again as part of the new bundle.

Second, unlike an HTML document, a script is not parsed incrementally. That’s why some folks, like Gmail, load their JavaScript in an iframe segmented into multiple inline script blocks, allowing the JavaScript engine to parse and execute the initial blocks while the rest of the file is still being downloaded. Combining scripts into a single, large script might reach the point where the cost of delayed parsing outweighs the benefit of fewer requests, making it faster to download two or more smaller scripts instead. As far as I know this has not been investigated enough to determine how "large" the script must be to reach the point of negative returns.

If you’re loading core jQuery as a standalone request from your own server (which 38% of sites are doing), you’ll probably get an easy performance boost by switching to Google Hosted Libraries. If you’re considering creating a combined script that includes jQuery, the issues raised here may mean that’s not the optimal solution.
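
If you want the simplest possible switch, just point your existing script tag at ajax.googleapis.com. For sites that load jQuery dynamically, here’s a minimal sketch (mine, not code from the HTTP Archive or the GitHub thread) of requesting core jQuery from Google Hosted Libraries with a same-origin fallback; the version number and fallback path are illustrative.

(function () {
  var head = document.getElementsByTagName('head')[0];
  var s = document.createElement('script');
  s.src = '//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js';
  s.onload = function () {
    // jQuery is now available; run code that depends on it here.
  };
  s.onerror = function () {
    // Optional fallback to a self-hosted copy if the CDN request fails.
    var local = document.createElement('script');
    local.src = '/js/jquery-1.9.1.min.js';
    head.appendChild(local);
  };
  head.appendChild(s);
})();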

SteveW and I both agree: To make the best decision, website owners need to test the alternatives. Using a RUM solution like Google Analytics Site Speed, Soasta mPulse, Torbit Insight, or New Relic RUM will tell you the impact on your real users.


Zopflinator

March 8, 2013 3:03 pm | 10 Comments

Last week Google open sourced Zopfli – a new data compression library based on Deflate (and therefore a drop-in replacement for Deflate without requiring any changes to the browser). The post mentioned that the “output generated by Zopfli is typically 3–8% smaller compared to zlib” but Zopfli takes longer to compress files – “2 to 3 orders of magnitude more than zlib at maximum quality”. 2 to 3 orders of magnitude longer! When I read that I was thinking Zopfli might take seconds or perhaps minutes to compress a 500K script.

Also, even after reading the detailed PDF of results it still wasn’t clear to me how Zopfli would perform on typical web content. The PDF mentioned four corpora. One was called “Alexa-top-10k” which is cool – that’s real web stuff. But was it just the HTML for those sites? Or did it also include scripts and stylesheets? And did it go too far and include images and Flash? I clicked on the link in the footnotes to get more info and was amused when it opened a list of URLs from my HTTP Archive project! That was gratifying but didn’t clarify which responses were analyzed from that list of sites.

The Canterbury Corpus was also listed. Seriously?! Isn’t Chaucer too old to be relevant to new compression algorithms? Turns out it’s not based on Canterbury Tales. (Am I the only one who jumped to that conclusion?) Instead, it’s a standard set of various types of files but most of them aren’t representative of content on the Web.

Finally, it mentioned “zlib at maximum quality”. AFAIK most sites run the default compression built into the server, for example mod_deflate for Apache. While it’s possible to change the compression level, I assume most site owners go with the default. (That would be an interesting and easy question to answer.) I had no idea how “zlib at maximum quality” compared to default web server compression.

My thirst for Zopfli information went unquenched. I wanted a better idea of how Zopfli performed on typical web resources in comparison to default server compression settings, and how long it took.

Luckily, before I got off Twitter Ilya pointed to some Zopfli installation instructions from François Beaufort. That’s all I needed to spend the rest of the morning building a tool that I and anyone else can use to see how Zopfli performs:

Zopflinator

It’s easy to use. Just enter the URL to a web resource and submit. Here’s an example for jQuery 1.9.1 served from Google Hosted Libraries.

Notes:

  • The form doesn’t accept URLs for HTML, images, and Flash. I added HTML to that list because Zopfli makes the most sense for static resources due to the longer compression times.
  • By default the results are recorded in my DB and shown lower in the page. It’s anonymous – no IPs, User-Agent strings, etc. are saved – just the URL, content-type, and compression results. But sometimes people want to test URLs for stealth sites – so that’s a good reason to UNcheck the crowdsource checkbox.
  • Four file sizes are shown. (The percentages are based on comparison to the uncompressed file.)
    • Uncompressed file size is the size of the content without any compression.
    • mod_deflate size is based on serving the content through my Apache server using mod_deflate’s default settings.
    • The gzip -9 size is based on running that from the commandline on the uncompressed content.
    • Zopfli results are also based on running the Zopfli install from the commandline.
  • Folks who don’t want to install Zopfli can download the resultant file.

The results in this example are pretty typical – Zopfli is usually 1% lower than gzip -9, and gzip -9 is usually 1% lower than mod_deflate. 1% doesn’t seem like that much, but remember that 1% is relative to the original uncompressed response. The more relevant comparison is between Zopfli and mod_deflate. About 70% of compressible responses are actually compressed (across the world’s top 300K websites). The primary question is how much less data would be transferred if those responses were instead compressed with Zopfli.

One day after launch, 238 URLs had been run through Zopflinator. The average sizes are:

  • uncompressed: 106,272 bytes
  • mod_deflate: 28,732 bytes (27%)
  • gzip -9: 28,062 bytes (26%)
  • Zopfli: 27,033 bytes (25%)

The average Zopfli savings over mod_deflate is 1,699 bytes (5.9%). That’s a pretty significant savings. Just imagine if all script and stylesheet transfer sizes were reduced by 6% – that would reduce the number of roundtrips as well as reduce mobile data usage.

Previously I mentioned that Zopfli has compression times that are “2 to 3 orders of magnitude more than zlib at maximum quality”. When running Zopfli from the commandline it seemed instantaneous – so my fears of it taking seconds or perhaps minutes were unfounded. Of course, this is based on my perception – the actual CPU time consumed by the web server at scale could be significant. That’s why it’s suggested that Zopfli might be best suited for static content. A great example is Google Hosted Libraries.

Of course, anyone using Zopfli out-of-band has to configure their server to use these pre-generated files for compressed responses. So far I haven’t seen examples of how to do this. Ilya pointed out that AWS and Google Cloud Storage already work this way – you upload compressed as well as uncompressed versions of your web resources. Zopfli fits well if you use those services. For people who currently do compression on the fly (e.g., using mod_deflate), the effort of switching to Zopfli is likely to outweigh the benefits as long as Zopfli can’t be run in realtime.

I hope to see adoption of Zopfli where it makes sense. If you use Zopfli to compress your resources, please add a comment with feedback on your experience.


Reloading post-onload resources

February 26, 2013 5:35 pm | 16 Comments

Two performance best practices are to add a far future expiration date and to delay loading resources (esp. scripts) until after the onload event. But it turns out that the combination of these best practices leads to a situation where it’s hard for users to refresh resources. More specifically, hitting Reload (or even shift+Reload) doesn’t refresh these cacheable, lazy-loaded resources in Firefox, Chrome, Safari, Android, and iPhone.

What we expect from Reload

The browser has a cache (or 10) where it saves copies of responses. If the user feels those cached responses are stale, she can hit the Reload button to ignore the cache and refetch everything, thus ensuring she’s seeing the latest copy of the website’s content. I couldn’t find anything in the HTTP Spec dictating the behavior of the Reload button, but all browsers have this behavior AFAIK:

  • If you click Reload (or control+R or command+R) then all the resources are refetched using a Conditional GET request (with the If-Modified-Since and If-None-Match validators). If the server’s version of the response has not changed, it returns a short “304 Not Modified” status with no response body. If the response has changed then “200 OK” and the entire response body is returned.
  • If you click shift+Reload (or control+Reload or control+shift+R or command+shift+R) then all the resources are refetched withOUT the validation headers. This is less efficient since every response body is returned, but guarantees that any cached responses that are stale are overwritten.

Bottom line: regardless of expiration dates we expect that hitting Reload gets the latest version of the website’s resources, and shift+Reload will do so even more aggressively.

Welcome to Reload 2.0

In the days of Web 1.0, resources were requested using HTML markup – IMG, SCRIPT, LINK, etc. With Web 2.0 resources are often requested dynamically. Two common examples are loading scripts asynchronously (e.g., Google Analytics) and dynamically fetching images (e.g., for photo carousels or images below-the-fold). Sometimes these resources are requested after window onload so that the main page can render quickly for a better user experience, better metrics, etc. If these resources have a far future expiration date, the browser needs extra intelligence to do the right thing.
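
Here’s a rough sketch (mine, not code from the test page described below) of the kind of post-onload loading in question, ignoring older IE event APIs; the URLs are illustrative.

window.addEventListener('load', function () {
  // Just after onload, fetch a script and an image that aren't needed for
  // the initial rendering. Both have far future expiration dates, which is
  // what makes the Reload behavior interesting.
  setTimeout(function () {
    var script = document.createElement('script');
    script.src = '/js/analytics.js';          // lazy-loaded script
    document.body.appendChild(script);

    var img = new Image();
    img.src = '/images/below-the-fold.jpg';   // lazy-loaded image
  }, 1);
}, false);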

  • If the user navigates to the page normally (clicking on a link, typing a URL, using a bookmark, etc.) and the dynamic resource is in the cache, the browser should use the cached copy (assuming the expiration date is still in the future).
  • If the user reloads the page, the browser should refetch all the resources including resources loaded dynamically in the page.
  • If the user reloads the page, I would think resources loaded in the onload handler should also be refetched. These are likely part of the basic construction of the page and they should be refetched if the user wants to refresh the page’s contents.
  • But what should the browser do if the user reloads the page and there are resources loaded after the onload event? Some web apps are long lived with sessions that last hours or even days. If the user does a reload, should every dynamically-loaded resource for the life of the web app be refetched ignoring the cache?

An Example

Let’s look at an example: Postonload Reload.

This page loads an image and a script using five different techniques:

  1. markup – The basic HTML approach: <img src= and <script src=.
  2. dynamic in body – In the body of the page is a script block that creates an image and a script element dynamically and sets the SRC causing the resource to be fetched. This code executes before onload.
  3. onload – An image and a script are dynamically created in the onload handler.
  4. 1 ms post-onload – An image and a script are dynamically created via a 1 millisecond setTimeout callback in the onload handler.
  5. 5 second post-onload – An image and a script are dynamically created via a 5 second setTimeout callback in the onload handler.

All of the images and scripts have an expiration date one month in the future. If the user hits Reload, which of the techniques should result in a refetch? Certainly we’d expect techniques 1 & 2 to cause a refetch. I would hope 3 would be refetched. I think 4 should be refetched but doubt many browsers do that, and 5 probably shouldn’t be refetched. Settle on your expected results and then take a look at the table below.

The Results

Before jumping into the Reload results, let’s first look at what happens if the user just navigates to the page. This is achieved by clicking on the “try again” link in the example. In this case none of the resources are refetched. All of the resources have been saved to the cache with an expiration date one month in the future, so every browser I tested just reads them from cache. This is good and what we would expect.

But the behavior diverges when we look at the Reload results captured in the following table.

Table 1. Resources that are refetched on Reload
technique        resource  Chrome 25  Safari 6  Android Safari/534  iPhone Safari/7534  Firefox 19  IE 8,10  Opera 12
markup           image 1   Y          Y         Y                   Y                   Y           Y        Y
markup           script 1  Y          Y         Y                   Y                   Y           Y        Y
dynamic          image 2   Y          Y         Y                   Y                   Y           Y        Y
dynamic          script 2  Y          Y         Y                   Y                   Y           Y        Y
onload           image 3   -          -         -                   -                   Y           Y        Y
onload           script 3  -          -         -                   -                   -           Y        Y
1ms postonload   image 4   -          -         -                   -                   -           -        Y
1ms postonload   script 4  -          -         -                   -                   -           -        Y
5sec postonload  image 5   -          -         -                   -                   -           -        -
5sec postonload  script 5  -          -         -                   -                   -           -        -

The results for Chrome, Safari, Android mobile Safari, and iPhone mobile Safari are the same. When you click Reload in these browsers the resources in the page get refetched (resources 1&2), but not so for the resources loaded in the onload handler and later (resources 3-5).

Firefox is interesting. It loads the four resources in the page plus the onload handler’s image (image 3), but not the onload handler’s script (script 3). Curious.

IE 8 and 10 are the same: they load the four resources in the page as well as the image & script from the onload handler (resources 1-3). I didn’t test IE 9 but I assume it’s the same.

Opera has the best results in my opinion. It refetches all of the resources in the main page, the onload handler, and 1 millisecond after onload (resources 1-4), but it does not refetch the resources 5 seconds after onload (image 5 & script 5). I poked at this a bit. If I raise the delay from 1 millisecond to 50 milliseconds, then image 4 & script 4 are not refetched. I think this is a race condition where if Opera is still downloading resources from the onload handler when these first delayed resources are created, then they are also refetched. To further verify this I raised the delay to 500 milliseconds and confirmed the resources were not refetched, but then increased the response time of all the resources to 1 second (instead of instantaneous) and this caused image 4 & script 4 to be refetched, even though the delay was 500 milliseconds after onload.

Note that pressing shift+Reload (and other combinations) didn’t alter the results.

Takeaways

A bit esoteric? Perhaps. This is a deep dive on a niche issue, I’ll grant you that. But I have a few buts:

If you’re a web developer using far future expiration dates and lazy loading, you might get unexpected results when you change a resource and hit Reload, and even shift+Reload. If you’re not getting the latest version of your dev resources you might have to clear your cache.

This isn’t just an issue for web devs. It affects users as well. Numerous sites lazy-load resources with far future expiration dates, including 8 of the top 10 sites: Google, Facebook, YouTube, Yahoo, Microsoft Live, Tencent QQ, Amazon, and Twitter. If you Reload any of these sites with a packet sniffer open in the first four browsers listed, you’ll see a curious pattern: cacheable resources loaded before onload have a 304 response status, while those after onload are read from cache and don’t get refetched. The only way to ensure you get a fresh version is to clear your cache, defeating the expected benefit of the Reload button.

Here’s a waterfall showing the requests when Amazon is reloaded in Chrome. The red vertical line marks the onload event. Notice how the resources before onload have 304 status codes. Right after the onload are some image beacons that aren’t cacheable, so they get refetched and return 200 status codes. The cacheable images loaded after onload are all read from cache, so any updates to those resources are missed.

Finally, whenever behavior varies across browsers it’s usually worthwhile to investigate why. Often one behavior is preferred over another, and we should get the specs and vendors aligned in that direction. In this case, we should make Reload more consistent and have it refetch resources, even those loaded dynamically in the onload handler.


HTTP Archive: new stats

February 16, 2013 11:22 am | 2 Comments

Over the last two months I’ve been coding on the HTTP Archive. I blogged previously about DB enhancements and adding document flush. Much of this work was done in order to add several new metrics. I just finished adding charts for those stats and wanted to explain each one.

Note: In this discussion I want to comment on how these metrics have trended over the last two years. During that time the sample size of URLs has grown from 15K to 300K. In order to have a more consistent comparison I look at trends for the Top 1000 websites. In the HTTP Archive GUI you can choose between "All", "Top 1000", and "Top 100". The links to charts below take you straight to the "Top 1000" set of results.

Speed Index

The Speed Index chart measures rendering speed. Speed Index was invented by Pat Meenan as part of WebPagetest. (WebPagetest is the framework that runs all of the HTTP Archive tests.) It is the average time (in milliseconds) at which visible parts of the page are displayed. (See the Speed Index documentation for more information.) As we move to Web 2.0, with pages that are richer and more dynamic, window.onload is a less accurate representation of the user’s perception of website speed. Speed Index better reflects how quickly the user can see the page’s content. (Note that we’re currently investigating if the September 2012 increase in Speed Index is the result of bandwidth contention caused by the increase to 300K URLs that occurred at the same time.)
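
Roughly speaking, Speed Index integrates how visually incomplete the page is over time. Here’s a sketch of that calculation (my own illustration based on the Speed Index documentation, not WebPagetest’s actual code); the frames array of visual-progress samples is an assumed input.

function speedIndex(frames) {
  // frames: [{time: <ms since navigation start>, visuallyComplete: 0-100}, ...]
  // sorted by time and ending with a 100% visually complete frame.
  var si = 0;
  for (var i = 1; i < frames.length; i++) {
    var interval = frames[i].time - frames[i - 1].time;
    var incomplete = 1 - frames[i - 1].visuallyComplete / 100;
    si += interval * incomplete;
  }
  return si; // milliseconds; lower is better
}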

Doc Size

The Doc Size chart shows the size of the main HTML document. To my surprise this has only grown ~10% over the last two years. I would have thought that the use of inlining (i.e., data:) and richer pages would have shown a bigger increase, especially across the Top 1000 sites.

DOM Elements

I’ve hypothesized that the number of DOM elements in a page has a big impact on performance, so I’m excited to be tracking this in the DOM Elements chart. The number of DOM elements has increased ~16% since May 2011 (when this was added to WebPagetest). Note: Number of DOM elements is not currently available on HTTP Archive Mobile.

Max Reqs on 1 Domain

The question of whether domain sharding is still a valid optimization comes up frequently. The arguments against it include that browsers now open more connections per hostname (6 instead of 2) and that adding more domains increases the time spent doing DNS lookups. While I agree with these points, I still see many websites that download a large number of resources from a single domain and would cut their page load time in half if they sharded across two domains. This is a great example of the need for Situational Performance Optimization evangelized by Guy Podjarny. If a site has a small number of resources on one domain, it probably shouldn’t do domain sharding. Whereas if many resources use the same domain, domain sharding is likely a good choice.

To gauge the opportunity for this best practice we need to know how often a single domain is used for a large number of resources. That metric is provided by the Max Reqs on 1 Domain chart. For a given website, the number of requests for each domain is counted. The number of requests on the most-used domain is saved as the value of "max reqs on 1 domain" for that page. The average of these max request counts is shown in the chart. For the Top 1000 websites the value has hovered around 42 for the past two years, even while the total number of requests per page has increased from 82 to 99. This tells me that third party content is a major contributor to the increase in total requests, and there are still many opportunities where domain sharding could be beneficial.
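
The stat itself is computed from the crawl data, but the definition is easy to illustrate. Here’s a sketch (mine, not the HTTP Archive’s actual code) that derives "max reqs on 1 domain" and the number of domains from a page’s list of request URLs:

function domainStats(requestUrls) {
  var counts = {};
  var maxReqsOnOneDomain = 0;
  var numDomains = 0;
  for (var i = 0; i < requestUrls.length; i++) {
    var m = requestUrls[i].match(/^https?:\/\/([^\/]+)/i);
    if (!m) continue;
    var host = m[1].toLowerCase();
    if (!counts[host]) {
      counts[host] = 0;
      numDomains++;             // first request seen for this domain
    }
    counts[host]++;
    if (counts[host] > maxReqsOnOneDomain) maxReqsOnOneDomain = counts[host];
  }
  return { maxReqsOnOneDomain: maxReqsOnOneDomain, numDomains: numDomains };
}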

The average number of domains per page is also shown in this chart. That has risen 50%, further suggesting that third party content is a major contributor to page weight.

Cacheable Resources

This chart was previously called “Requests with Caching Headers”. While the presence of caching headers is interesting, a more important performance metric is the number of resources that have a non-zero cache lifetime (AKA, “freshness lifetime” as defined in the HTTP spec RFC 2616). To that end I now calculate a new stat for requests, “expAge”, that is the cache lifetime (in seconds). The Cacheable Resources chart shows the percentage of resources with a non-zero expAge.

This revamp included a few other improvements over the previous calculations:

  • It takes the Expires header into consideration. I previously assumed that if someone sent Expires they were likely to also send max-age, but it turns out that 9% of requests have an Expires but do not specify max-age. (Max-age takes precedence if both exist.)
  • When the expAge value is based on the Expires date (because max-age is absent), the freshness lifetime is the delta of the Expires date and the Date response header value. For the ~1% of requests that don’t have a Date header, the client’s date value at the time of the request is used.
  • The new calculation takes into consideration Cache-Control no-store, no-cache, and must-revalidate, setting expAge to zero if any of those are present.

The percentage of resources that are cacheable hasn’t increased much in the last two years, hovering around 60%. And remember – the chart shown here is for the Top 1000 websites, which are more highly tuned for performance than the long tail. This metric drops down to ~42% across all 300K top sites. I think this is a big opportunity for better performance, especially since I believe many sites don’t specify caching headers due to lack of awareness. A deeper study for a performance researcher out there would be to determine how many of the uncacheable resources truly shouldn’t be cached (e.g., logging beacons) versus static resources that could have a positive cache time (e.g., resources with a Last-Modified date in the past).
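
Here’s a sketch of the expAge logic just described (my own reading of it, not the HTTP Archive’s actual code); header names are assumed to be lower-cased, and requestTime is the client’s time of the request in milliseconds:

function computeExpAge(headers, requestTime) {
  var cc = (headers['cache-control'] || '').toLowerCase();
  if (/no-store|no-cache|must-revalidate/.test(cc)) {
    return 0;                                   // treated as not cacheable
  }
  var maxAge = cc.match(/max-age\s*=\s*(\d+)/);
  if (maxAge) {
    return parseInt(maxAge[1], 10);             // max-age wins if both exist
  }
  if (headers['expires']) {
    // Freshness lifetime is Expires minus the Date header (or the request
    // time for the ~1% of responses with no Date header).
    var base = headers['date'] ? Date.parse(headers['date']) : requestTime;
    return Math.max(0, (Date.parse(headers['expires']) - base) / 1000);
  }
  return 0;                                     // no freshness information
}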

Cache Lifetime

The Cache Lifetime chart gives a histogram of expAge values for an individual crawl. (See the definition of expAge above.) This chart used to be called "Cache-Control: max-age", but that was only focused on the max-age value. As described previously, the new expAge calculation takes the Expires header into consideration, as well as other Cache-Control options that override cache lifetime. For the Top 1000 sites on Feb 1 2013, 39% of resources had a cache lifetime of 0. Remembering that top sites are typically better tuned for performance, we’re not surprised that this jumps to 59% across all sites.

Sites hosting HTML on CDN

The last new chart is Sites hosting HTML on CDN. This shows the percentage of sites that have their main HTML document hosted on a CDN. WebPagetest started tracking this on Oct 1, 2012. The CDNs recorded in the most recent crawl were Google, Cloudflare, Akamai, lxdns.com, Limelight, Level 3, Edgecast, Cotendo CDN, ChinaCache, CDNetworks, Incapsula, Amazon CloudFront, AT&T, Yottaa, NetDNA, Mirror Image, Fastly, Internap, Highwinds, Windows Azure, cubeCDN, Azion, BitGravity, Cachefly, CDN77, Panther, OnApp, Simple CDN, and BO.LT. This is a new feature and I’m sure there are questions about determining and adding CDNs. We’ll follow up on those as they come in. Keep in mind that this is just for the main HTML document.

It’s great to see the HTTP Archive growing both in terms of coverage (number of URLs) and depth of metrics. Make sure to check out the About page to find links to the code, data downloads, FAQ, and discussion group.

 


HTTP Archive: adding flush

January 31, 2013 12:09 am | 8 Comments

In my previous post, HTTP Archive: new schema & dumps, I described my work to make the database faster, easier to download, consume less disk space, and contain more stats. Once these updates were finished I was excited to start going through the code and make pages faster using the new schema changes. Although time consuming, it’s been fun to change some queries and see the site get much faster.

Along the way I bumped into the page for viewing an individual website’s results, for example Whole Foods. Despite my schema changes, it has a slow (~10 seconds) query in the middle of the page. I’ve created a bug to figure out how to improve this (I think I need a new index), but for the short term I decided to just flush the document before the slow query. This page is long, so the slow part is well below-the-fold. By adding flush I would be able to get the above-the-fold content to render more quickly.

I wrote a blog post in 2009 describing Flushing the Document Early. It describes flushing thusly:

Flushing is when the server sends the initial part of the HTML document to the client before the entire response is ready. All major browsers start parsing the partial response. When done correctly, flushing results in a page that loads and feels faster. The key is choosing the right point at which to flush the partial HTML document response. The flush should occur before the expensive parts of the back end work, such as database queries and web service calls. But the flush should occur after the initial response has enough content to keep the browser busy. The part of the HTML document that is flushed should contain some resources as well as some visible content. If resources (e.g., stylesheets, external scripts, and images) are included, the browser gets an early start on its download work. If some visible content is included, the user receives feedback sooner that the page is loading.

My first step was to add a call to PHP’s flush function right before trends.inc which contains the slow query:

<?php
flush();
require_once('trends.inc'); // contains the slow query
?>

Nothing changed. The page still took ~10 seconds to render. In that 2009 blog post I mentioned it’s hard to get the details straight. Fortunately I dug into those details in the corresponding chapter from Even Faster Web Sites. I reviewed the chapter and read about how PHP uses output buffering, requiring some additional PHP flush functions. Specifically, all existing output buffers have to be cleared with a call to ob_end_flush, a new output buffer is activated by ob_start, and this new output buffer has to be cleared using ob_flush before calling flush:

<?php
// Flush any currently open buffers.
while (ob_get_level() > 0) {
    ob_end_flush();
}
ob_start();
?>
[a bunch of HTML...]
<?php
ob_flush();
flush();
require_once('trends.inc'); // contains the slow query
?>

After following the advice for managing PHP’s output buffers, flushing still didn’t work. Reading further in the chapter I saw that Apache has a buffer that it uses when gzipping. If the size of the output is less than 8K at the time flush is called, Apache won’t flush the output because it wants at least 8K before it gzips. In my case I had only ~6K of output before the slow query so was falling short of the 8K threshold. An easy workaround is to add padding to the HTML document to exceed the threshold:

<?php
// Flush any currently open buffers.
while (ob_get_level() > 0) {
    ob_end_flush();
}
ob_start();
?>
[a bunch of HTML...]
<!-- 0001020304050607080[2K worth of padding]... -->
<?php
ob_flush();
flush();
require_once('trends.inc'); // contains the slow query
?>

After adding the padding, flushing worked! It felt much faster. As expected, the flush occurred at a point well below-the-fold, so the page looks done unless the user quickly scrolls down. The downside of adding padding to the page is a larger HTML document that takes longer to download, is larger to store, etc. Instead, we used Apache’s DeflateBufferSize directive to lower the gzip threshold to 4K. With this change the page renders faster without the added page weight.

The flush change is now in production. You can see the difference using these URLs:

These URLs open a random website each time to avoid any cached MySQL results. Without flushing, the page doesn’t change for ~10 seconds. With flushing, the above-the-fold content changes after ~3 seconds, and the below-the-fold content arrives ~7 seconds later.

I still don’t see flushing used on many websites. It can be confusing and even frustrating to set up. My responses already had chunked encoding, so I didn’t have to jump through that hoop. But as you can see, the faster rendering makes a significant difference. If you’re not flushing your document early, I recommend you give it a try.

 


Web Performance Community & Conversation

December 5, 2012 3:48 pm | 4 Comments

I first started talking about web performance in 2007. My first blog post was The Importance of Front-End Performance over on YDN in March 2007. The next month Tenni Theurer and I spoke at Web 2.0 Expo on High Performance Webpages. I hadn’t spoken at a conference since 1990 – 17 years earlier! This speaking appearance was before YSlow and my book High Performance Web Sites had been released. There was no conversation around web performance at this time – at least none that I was aware of.

Our 3 hour (!) workshop was scheduled for 9:30am on a Sunday morning. Tenni and I thought we were doomed. I told her we should expect 20 or so people, and not to be disappointed if the audience was small. I remember it was a beautiful day in San Francisco and I thought to myself, "If I was here for a conference I would be out touring San Francisco rather than sitting in a 3 hour workshop at 9:30 on a Sunday morning."

Tenni and I were surprised that we’d been assigned to a gigantic ballroom. We were also surprised that there were already 20+ people there when we arrived early to setup. But the real surprise came while we sat there waiting to start – nearly 300 people flowed into the room. We looked at each other with disbelief. Wow!

Constant blogging, open sourcing, and public speaking carried the conversation forward. The first Velocity conference took place in June 2008 with ~500 attendees. Velocity 2012 had ~2500 attendees, and now takes place on three continents. The conversation has certainly grown!

That was a fun look at the past, but what I really wanted to do in this blog post was highlight three critical places where the web performance conversation is being held now.

  1. Web Performance Meetups – Sergey Chernyshev started the New York Web Performance Group in April 2009. Today there are 46 Web Performance meetups with 16,631 members worldwide. Wow! This is a huge community and a great format for web performance enthusiasts to gather and share what they’ve learned to continue to make the Web even faster.
  2. Exceptional Performance mailing list – I started the Exceptional Performance team at Yahoo! In doing so I also created the Exceptional Performance Yahoo! Group with its associated mailing list. This group has atrophied in recent years, but I’m going to start using it again as a way to communicate to dedicated web performance developers. It currently has 1340 members and the spam rate is low. I encourage you to sign up and read & post messages on the list.
  3. PerfPlanet.com – I’ll be honest – I think my blog is really good. And while I encourage you to subscribe to my RSS feed, it’s actually more important that you subscribe to the feed from PerfPlanet.com. Stoyan Stefanov, another former member of the Exceptional Performance team, maintains the site including its awesome Performance Calendar (now in its fourth instantiation). Stoyan has collected ~50 of the best web performance blogs. This is my main source for the latest news and developments in the world of web performance.

It’s exciting to see our community grow. I still believe we’re at the tip of the iceberg. Back in 2007 I would have never predicted that we’d have 16K web performance meetup members, 2500 Velocity attendees, and 1340 mailing list members. I wonder what it’ll be in 2014. It’s fun to imagine.


clear current page UX

December 5, 2012 1:55 am | Comments Off on clear current page UX

Yesterday in Perception of Speed I wrote about how clicking a link doesn’t immediately clear the screen. Instead, browsers wait “until the next page arrives” before clearing the screen. This improves the user experience because instead of being unoccupied (staring at a blank screen), users are occupied looking at the current page.

But when exactly do browsers clear the screen? Pick what you think is the best answer to that question:

  A. when the first byte of the new page arrives
  B. when the new page’s BODY element is rendered
  C. when DOMContentLoaded fires
  D. when window.onload fires

I would have guessed "A". In fact, I’ve been telling people for years that the current page is cleared when the new document’s first byte arrives. That changed yesterday when my officemate, Ilya Grigorik, wondered out loud when exactly the browser cleared the page. It turns out the answer is at or slightly before "B" – when the new page’s BODY element is created.

Test Page

Here’s a test page that helps us explore this behavior:

Clear UX test page

The page contains a script that takes 10 seconds to return. (You can change this by editing the value for "delay" in the querystring.) I used a script because scripts block the browser from parsing the HTML document. Placing this script at different points in the page allows us to isolate when the screen is cleared. The three choices for positioning the script are:

  1. top of HEAD – The SCRIPT tag is placed immediately after the HEAD tag, even before TITLE. This is our proxy for “first byte” of the new document.
  2. bottom of HEAD – The SCRIPT tag is placed right before the /HEAD tag. There are a few STYLE blocks in the HEAD to create some work for the browser while parsing the HEAD. This allows us to see the state right before the BODY element is created.
  3. top of BODY – The SCRIPT tag is placed immediately after the BODY tag. This allows us to isolate the state right after the BODY element is created.

Background colors are assigned randomly to make it clear when the BODY has been rendered.

Test Results

I ran these three tests on the major browsers to measure how long they waited before clearing the page. I assumed it would either be 0 or 10 seconds, but it turns out some browsers are in the middle. The following table shows the number of seconds it took before the browser cleared the page for each of the three test cases.

Table 1: Number of seconds until page is cleared
Browser      top of HEAD  bottom of HEAD  top of BODY
Chrome 23    ~5           ~5              0
Firefox 17   10           10              0
IE 6-9       10           10              0
Opera 12     ~4           ~4              ~4
Safari 6     10           10              0

Let’s look at the results for each position of the script and see what we can learn.

 #1: Top of HEAD

In this test the 10 second script is placed between the HEAD and TITLE:

<head>
<script src=...
<title>...

Even though the HTML document has arrived and parsing has begun, none of the browsers clear the screen at this point.  Therefore, browsers do NOT clear the page when the first byte of the new page arrives. Firefox, IE, and Safari don’t clear the screen until 10 seconds have passed and the BODY has been parsed (as we’ll confirm in test #3). Chrome and Opera have interesting behavior here – they both clear the screen after 4-5 seconds.

The major browsers differ in how they handle this situation. Which is better – to preserve the old page until the new page’s body is ready, or to clear the old page after a few seconds? Clearing the screen sooner gives the user some feedback that things are progressing, but it also leaves the user staring at a blank screen. (And the hypothesis from yesterday is that staring at a blank screen is UNoccupied time which is less satisfying.)

Do Chrome and Opera intentionally clear the page to provide the user feedback, or is something else triggering it? Ilya was doing some deep dive tracing with Chrome (as he often does) and found that the screen clearing appeared to coincide with GC kicking in. It’s not clear which behavior is best, but it is interesting that the browsers differ in how long they wait before clearing the screen when the BODY is blocked from being parsed.

Another interesting observation from this test is the time at which the TITLE is changed. I was surprised to see that all the browsers clear the old title immediately, even though the old page contents are left unchanged. This results in a mismatch where the title no longer matches the page. Every browser except one replaces the old title with the URL being requested. This is a bit unwieldy since the URL doesn’t fit in the amount of space available in a tab. The only exception to this is Firefox which displays “Connecting…” in the tab. It’s further interesting to note that this “interim” title is displayed for the entire 10 seconds until the script finishes downloading. This makes sense because the script is blocking the parser from reaching the TITLE tag. We’ll see in the next test that the TITLE is updated sooner when the script is moved lower.

 #2: Bottom of HEAD

In this test the 10 second script is placed at the end of the HEAD block:

<script src=...
</head>
<body>

The results for clearing the page are the same as test #1: Firefox, IE, and Safari don’t clear the page for 10 seconds. Chrome and Opera clear the screen after 4-5 seconds. The main point of this test is to confirm that the browser still hasn’t cleared the page up to the point immediately before parsing the BODY tag.

The title of the new page is displayed immediately. This makes sense since the TITLE tag isn’t blocked from being parsed as it was in test #1. It’s interesting, however, that the browser parses the TITLE tag and updates the title, but doesn’t clear the old page’s contents. This is worse than the previous mismatch. In test #1 an “interim” title was shown (the URL or “Connecting…”). Now the actual title of the new page is shown above the content of the old page.

 #3: Top of BODY

In this test the 10 second script is placed immediately after the BODY tag:

</head>
<body>
<script src=...

Since nothing is blocking the browser from parsing the BODY tag, all but one of the browsers immediately clear the old page and render the new body (as seen by swapping in a new random background color). However, because the very next tag is the 10 second script, the rest of the page is blocked from rendering so the user is left staring at a blank page for 10 seconds. The browser that behaves differently is Opera – it maintains its ~4 second delay before erasing the screen. This is curious. What is it waiting for? It parses the new BODY tag and knows there’s a new background color at time 0, but it waits to render that for several seconds. Does downloading the script block the renderer? But what causes the renderer to kick in even though the script still has numerous seconds before it returns?

For these contrived test cases Opera has the best behavior in my opinion. The other browsers leave the user staring at the old page for 10 seconds wondering if something’s broken (test #1 & #2 for Firefox, IE, and Safari), or leave the user staring at a blank page for 10 seconds while rendering is blocked (test #3 for Chrome, Firefox, IE, and Safari). Opera always compromises, letting the user watch the old content for 3-4 seconds before clearing the page, and then showing a blank screen or the new body for the remaining 6-7 seconds.

Takeaways

These examples, while contrived, yield some real world takeaways.

  • Browsers generally clear the page when the BODY is parsed, not when the first bytes arrive. I say “generally” because during testing I was able to get browsers to clear the page before BODY by adding async and inline scripts, but it was very finicky. Changing just a few lines in innocuous ways would change the behavior such that I wasn’t able to nail down what exactly was causing the page to be cleared. But all of this was after the page’s first bytes had arrived.
  • It’s unclear what is the best user experience for clearing the page. The major browsers have fairly divergent behavior, and it’s not clear whether these differences are intentional. Transitioning from one page to the next is an action that’s repeated billions of times a day. I’m surprised there’s not more consistency and data on what produces the best user experience.
  • Avoid frontend SPOF. I’ve written (and spoken) extensively about how loading scripts synchronously can create a single point of failure in your web page. This new information about how and when the old page is cleared adds to the concern about slow-loading synchronous scripts, especially in light of the inconsistent way browsers handle them. Whenever possible load scripts asynchronously.


The Perception of Speed

December 3, 2012 5:02 pm | 19 Comments

Have you ever noticed that when you click on a link the page doesn’t change right away?

If I had written the code I would have cleared the page as soon as the link was clicked. But in a masterstroke of creating the perception of faster websites, browsers instead don’t erase the old page until the next page arrives.

Keeping the old page in place improves the user experience for the same reason that making airline passengers walk six times longer to get their bags reduces complaints about long waits at the baggage claim. “Occupied time (walking to baggage claim) feels shorter than unoccupied time (standing at the carousel),” according to M.I.T. operations researcher Richard Larson, an expert on queueing. In my example of clicking a link, occupied  time (looking at the old page) feels shorter than unoccupied time (staring at a blank screen).

Let’s try an example using this page you’re currently viewing. Both of the following links add a five second delay to this page. The first link refetches this page the normal way – the browser leaves the old page until the new page arrives. The second link clears the page before refetching. Which feels slower to you?
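
The demo page’s code isn’t shown here, but the two behaviors are easy to sketch (this is my guess at an implementation; the delay parameter is illustrative):

// Link 1: navigate normally. The browser keeps showing the old page until
// the delayed response arrives, so the wait is "occupied time".
function reloadNormally() {
  location.href = location.pathname + '?delay=5';
}

// Link 2: blank the page first, then navigate. The same 5 second wait is
// now "unoccupied time" spent staring at an empty screen.
function clearThenReload() {
  document.body.innerHTML = '';
  location.href = location.pathname + '?delay=5';
}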

Clicking the first link leaves the user staring at this page’s text for 5+ seconds before it refreshes. It’s slow, but not that noticeable. Clicking the second link makes the same wait time more annoying. I actually start getting antsy, shuffling my feet and shifting in my chair. For real users and web pages this translates into higher abandonment rates.

One takeaway from this is to keep your eye on the ball. In the case of web performance, we want to create a faster, better user experience. There are great techniques for tackling that problem head-on (reduce repaints, optimize JavaScript, etc.), but sometimes you can make significant improvements with changes that address the user’s perception of speed, such as spinners and progress bars.

Another takeaway is to build single page web apps. The old Web 1.0 way of requesting a new page for every user action and repainting the entire page is more likely to produce a jarring, slow experience. Using Ajax allows us to keep the user engaged while requests and responses are handled in the background, often asynchronously.

 


Comparing RUM & Synthetic Page Load Times

November 14, 2012 5:30 pm | 14 Comments

Yesterday I read Etsy’s October 2012 Site Performance Report. Etsy is one of only a handful of companies that publish their performance stats with explanations and future plans. It’s really valuable (and brave!), and gives other developers an opportunity to learn from an industry leader. In this article Etsy mentions that the page load time stats are gathered from a private instance of WebPagetest. They explain their use of synthetically-generated measurements instead of RUM (Real User Monitoring) data:

You might be surprised that we are using synthetic tests for this front-end report instead of Real User Monitoring (RUM) data.  RUM is a big part of performance monitoring at Etsy, but when we are looking at trends in front-end performance over time, synthetic testing allows us to eliminate much of the network variability that is inherent in real user data. This helps us tie performance regressions to specific code changes, and get a more stable view of performance overall.

Etsy’s choice of synthetic data for tracking performance as part of their automated build process totally makes sense. I’ve talked to many companies that do the same thing. Teams dealing with builds and code regressions should definitely do this. BUT… it’s important to include RUM data when sharing performance measurements beyond the internal devops team.

Why should RUM data always be used when talking beyond the core team?

The issue with only showing synthetic data is that it typically makes a website appear much faster than it actually is. This has been true since I first started tracking real user metrics back in 2004. My rule-of-thumb is that your real users are experiencing page load times that are twice as long as their corresponding synthetic measurements.

RUM data, by definition, is from real users. It is the ground truth for what users are experiencing. Synthetic data, even when generated using real browsers over a real network, can never match the diversity of performance variables that exist in the real world: browsers, mobile devices, geo locations, network conditions, user accounts, page view flow, etc. The reason we use synthetic data is that it allows us to create a consistent testing environment by eliminating the variables. The variables we choose for synthetic testing matches a segment of users (hopefully) but it can’t capture the diversity of users that actually visit our websites every day. That’s what RUM is for.

The core team is likely aware of the biases and assumptions that come with synthetic data. They know that it was generated using only laptops and doesn’t include any mobile devices; that it used a simulated LAN connection and not a slower DSL connection; that IE 9 was used and IE 6&7 aren’t included. Heck, they probably specified these test conditions. The problem is that the people outside the team who see the (rosy) synthetic metrics aren’t aware of these caveats. Even if you note these caveats on your slides, they still won’t remember them! What they will remember is that you said the page loaded in 4 seconds, when in reality most users are getting a time closer to 8 seconds.

How different are RUM measurements as compared to synthetic?

As I said a minute ago, my rule-of-thumb is that RUM page load times are typically 2x what you see from synthetic measurements. After my comment on the Etsy blog post about adding RUM data, and a tweet less than 24 hours later from @jkowall asking for data comparing RUM to synthetic, I decided to gather some real data from my website.

Similar to Etsy, I used WebPagetest to generate synthetic measurements. I chose a single URL: https://stevesouders.com/blog/2012/10/11/cache-is-king/. I measured it using a simulated DSL connection in Chrome 23, Firefox 16, and IE 9. I measured both First View (empty cache) and Repeat View (primed cache). I did three page loads and chose the median. My RUM data came from Google Analytics’ Site Speed feature over the last month. As shown in this chart of the page load time results, the RUM page load times are 2-3x slower than the synthetic measurements.

There’s some devil in the details. The synthetic data could have been more representative: I could have done more than three page loads, tried different network conditions, and even chosen different geo locations. The biggest challenge was mixing the First View and Repeat View page load times to compare to RUM. The RUM data contains both empty cache and primed cache page views, but the split is unknown. A study Tenni Theurer and I did in 2007 showed that ~80% of page views are done with a primed cache. To be more conservative I averaged the First View and Repeat View measurements and call that “Synth 50/50” in the chart. The following table contains the raw data:

                              Chrome 23  Firefox 16  IE 9
Synthetic First View (secs)   4.64       4.18        4.56
Synthetic Repeat View (secs)  2.08       2.42        1.86
Synthetic 50/50 (secs)        3.36       3.30        3.21
RUM (secs)                    9.94       8.59        6.67
RUM data points               94         603         89
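
For concreteness, here’s the blend arithmetic using the Chrome 23 numbers from the table above (the 80/20 blend is my own extrapolation from that 2007 study, not something reported in the table):

var firstView = 4.64, repeatView = 2.08, rum = 9.94;  // seconds, Chrome 23
var synth5050 = (firstView + repeatView) / 2;         // 3.36 - the "Synthetic 50/50" row
var synth8020 = 0.2 * firstView + 0.8 * repeatView;   // 2.59 - 80% primed cache
console.log((rum / synth5050).toFixed(1));            // "3.0" - RUM is ~3x slower
console.log((rum / synth8020).toFixed(1));            // "3.8" - even worse with 80/20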

 

In my experience, results like these, with RUM page load times much slower than synthetic measurements, are typical. I’d love to hear from other website owners about how their RUM and synthetic measurements compare. In the meantime, be cautious about only showing your synthetic page load times – the real user experience is likely quite a bit slower.


Q&A: Nav Timing and post-onload requests

October 30, 2012 11:29 am | 1 Comment

Today I got an email asking this question about window.performance.timing:

I’ve noticed that on all browsers (where timing is supported), the timing stops once the readyState of the document = 'complete'.  Seems normal, but in many cases, I’ve seen web pages that load additional "stuff" via async loading (mostly mktg tags) and the timing object doesn’t seem to reflect this additional load time.  Is this by design?

It’s a good question so I wanted to post my response for others who might be wondering the same thing.

An increasing number of websites load resources after the window.onload event. In fact, 8 of the world’s top 10 websites have post-onload activity: Google, Facebook, YouTube, Yahoo, Windows Live, Twitter, Tencent, and Amazon.

It makes sense that you’d want to capture this activity as part of any performance metrics, so why isn’t it already part of window.performance.timing? Remember we’re dealing with a W3C specification – the Navigation Timing specification to be exact. As with many specs, what seems intuitively simple is more difficult to capture in exact language. Looking at this processing model graph we see that the questioner is right – Navigation Timing stops at the window.onload event. We might think of extending the spec to have an “end of network activity” event that would include these post-onload requests.

Defining “end of network activity” is the tricky part. In the case of many sites, such as the 8 sites listed previously, the post-onload activity is a few HTTP requests. This is more straightforward. But what about sites that do a polling XHR every 5 seconds? Or sites that use Comet (hanging GET) or websockets? Their network activity never ends, so the Navigation Timing metrics would never complete.

There’s also a tension between wanting to capture this later activity and wanting sites to send back information as quickly as possible. Many users (5-15% in my experience) quickly click through to the next page. For these users it’s important to fire the metrics beacon (AKA tag) before they leave the page. The two key questions for this issue become:

  • How can post-onload activity be measured?
  • When should the timing metrics be beaconed back?

It’s possible that Navigation Timing could be extended to have an “end of network activity” value. For sites that have infinite network activity the specification could have language that set this value in the absence of network activity for N (2?) seconds or after a maximum of M (10?) seconds after window.onload. I encourage people in favor of this idea to send email to public-web-perf@w3.org (with [NavigationTiming] at the start of the subject line).

Instead of a high level “end of network activity” measurement, web developers can get more fine grained measurements today. Many of the post-onload HTTP requests are scripts, images, and XHRs that can all be timed individually with JavaScript. In the future the Resource Timing spec will provide this per-request level of timing information. Developers of websites that have infinite post-onload activity can make their own decisions about what to include in their post-onload activity measurements.
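
For example, here’s a sketch (mine) of timing one post-onload script by hand, since Resource Timing isn’t available yet; the URL and the global used to stash the result are illustrative:

window.addEventListener('load', function () {
  var start = new Date().getTime();
  var s = document.createElement('script');
  s.src = '/js/marketing-tag.js';               // a post-onload request
  s.onload = function () {
    // Record how long the post-onload request took, in milliseconds.
    window.postOnloadTimings = window.postOnloadTimings || {};
    window.postOnloadTimings[s.src] = new Date().getTime() - start;
  };
  document.body.appendChild(s);
}, false);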

With regard to when the timing metrics should be beaconed back, it’s important to send a beacon as quickly as possible to get information before the user does a quick click through. Therefore, I recommend sending the Nav Timing data in a beacon as part of window.onload. (See sample code in A Practical Guide to the Navigation Timing API.) For sites that want to measure post-onload activity, I recommend sending a second beacon with this additional information. Sending two beacons should have a minimal performance impact for the user, network, and backend server. If the delta in the number of first vs second beacons is within a tolerable range, then the website owner could choose to send only the latter beacon containing all the performance information.
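
Here’s a sketch of that two-beacon approach (my own illustration, not spec text or production code); the beacon URLs and the 10 second delay for the second beacon are illustrative:

window.addEventListener('load', function () {
  var t = window.performance && window.performance.timing;
  if (!t) return;

  // First beacon: send Navigation Timing data as part of onload.
  var loadTime = t.loadEventStart - t.navigationStart;
  new Image().src = '/beacon?onload=' + loadTime;

  // Second beacon: send post-onload measurements (e.g. the timings
  // collected in the previous sketch) a little later.
  setTimeout(function () {
    var extra = encodeURIComponent(JSON.stringify(window.postOnloadTimings || {}));
    new Image().src = '/beacon2?postonload=' + extra;
  }, 10000);
}, false);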

 

 
