Reloading post-onload resources
Two performance best practices are to add a far future expiration date and to delay loading resources (esp. scripts) until after the onload event. But it turns out that the combination of these best practices leads to a situation where it’s hard for users to refresh resources. More specifically, hitting Reload (or even shift+Reload) doesn’t refresh these cacheable, lazy-loaded resources in Firefox, Chrome, Safari, Android, and iPhone.
What we expect from Reload
The browser has a cache (or 10) where it saves copies of responses. If the user feels those cached responses are stale, she can hit the Reload button to ignore the cache and refetch everything, thus ensuring she’s seeing the latest copy of the website’s content. I couldn’t find anything in the HTTP Spec dictating the behavior of the Reload button, but all browsers have this behavior AFAIK:
- If you click Reload (or control+R or command+R) then all the resources are refetched using a Conditional GET request (with the If-Modified-Since and If-None-Match validators). If the server’s version of the response has not changed, it returns a short “304 Not Modified” status with no response body. If the response has changed then “200 OK” and the entire response body is returned.
- If you click shift+Reload (or control+Reload or control+shift+R or command+shift+R) then all the resources are refetched withOUT the validation headers. This is less efficient since every response body is returned, but guarantees that any cached responses that are stale are overwritten.
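To make the two cases concrete, here's roughly what a conditional GET exchange looks like (the URL, date, and ETag values are made up for illustration):

```http
GET /main.js HTTP/1.1
Host: www.example.com
If-Modified-Since: Tue, 05 Feb 2013 20:00:00 GMT
If-None-Match: "abc123"

HTTP/1.1 304 Not Modified
ETag: "abc123"
```

A shift+Reload sends the same request without those validator headers (typically with Cache-Control: no-cache added), so the server has to return the full "200 OK" response body.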
Bottom line: regardless of expiration dates, we expect that hitting Reload gets the latest version of the website's resources, and shift+Reload will do so even more aggressively.
Welcome to Reload 2.0
In the days of Web 1.0, resources were requested using HTML markup – IMG, SCRIPT, LINK, etc. With Web 2.0 resources are often requested dynamically. Two common examples are loading scripts asynchronously (e.g., Google Analytics) and dynamically fetching images (e.g., for photo carousels or images below-the-fold). Sometimes these resources are requested after window onload so that the main page can render quickly for a better user experience, better metrics, etc. If these resources have a far future expiration date, the browser needs extra intelligence to do the right thing.
- If the user navigates to the page normally (clicking on a link, typing a URL, using a bookmark, etc.) and the dynamic resource is in the cache, the browser should use the cached copy (assuming the expiration date is still in the future).
- If the user reloads the page, the browser should refetch all the resources including resources loaded dynamically in the page.
- If the user reloads the page, I would think resources loaded in the onload handler should also be refetched. These are likely part of the basic construction of the page and they should be refetched if the user wants to refresh the page’s contents.
- But what should the browser do if the user reloads the page and there are resources loaded after the onload event? Some web apps are long lived with sessions that last hours or even days. If the user does a reload, should every dynamically-loaded resource for the life of the web app be refetched ignoring the cache?
An Example
Let’s look at an example: Postonload Reload.
This page loads an image and a script using five different techniques:
- markup – The basic HTML approach: <img src=...> and <script src=...>.
- dynamic in body – In the body of the page is a script block that creates an image and a script element dynamically and sets the SRC, causing the resource to be fetched. This code executes before onload.
- onload – An image and a script are dynamically created in the onload handler.
- 1 ms post-onload – An image and a script are dynamically created via a 1 millisecond setTimeout callback in the onload handler.
- 5 second post-onload – An image and a script are dynamically created via a 5 second setTimeout callback in the onload handler.
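For reference, the dynamic techniques (3-5) boil down to the pattern sketched below. This is my own sketch with placeholder filenames, not the test page's exact code:

```js
// Sketch of techniques 3-5: create an image and a script dynamically so
// the browser fetches them. (Filenames are placeholders; the real test
// page's code may differ.)
function loadResources(suffix) {
  var img = document.createElement("img");
  img.src = "image" + suffix + ".gif";
  document.body.appendChild(img);

  var script = document.createElement("script");
  script.src = "script" + suffix + ".js";
  document.body.appendChild(script);
}

window.addEventListener("load", function () {
  loadResources("3");                                     // technique 3: in the onload handler
  setTimeout(function () { loadResources("4"); }, 1);     // technique 4: 1 ms post-onload
  setTimeout(function () { loadResources("5"); }, 5000);  // technique 5: 5 seconds post-onload
});
```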
All of the images and scripts have an expiration date one month in the future. If the user hits Reload, which of the techniques should result in a refetch? Certainly we’d expect techniques 1 & 2 to cause a refetch. I would hope 3 would be refetched. I think 4 should be refetched but doubt many browsers do that, and 5 probably shouldn’t be refetched. Settle on your expected results and then take a look at the table below.
The Results
Before jumping into the Reload results, let’s first look at what happens if the user just navigates to the page. This is achieved by clicking on the “try again” link in the example. In this case none of the resources are refetched. All of the resources have been saved to the cache with an expiration date one month in the future, so every browser I tested just reads them from cache. This is good and what we would expect.
But the behavior diverges when we look at the Reload results captured in the following table.
Table 1. Resources that are refetched on Reload

| technique | resource | Chrome 25 | Safari 6 | Android Safari/534 | iPhone Safari/7534 | Firefox 19 | IE 8,10 | Opera 12 |
|---|---|---|---|---|---|---|---|---|
| markup | image 1 | Y | Y | Y | Y | Y | Y | Y |
| | script 1 | Y | Y | Y | Y | Y | Y | Y |
| dynamic | image 2 | Y | Y | Y | Y | Y | Y | Y |
| | script 2 | Y | Y | Y | Y | Y | Y | Y |
| onload | image 3 | – | – | – | – | Y | Y | Y |
| | script 3 | – | – | – | – | – | Y | Y |
| 1ms postonload | image 4 | – | – | – | – | – | – | Y |
| | script 4 | – | – | – | – | – | – | Y |
| 5sec postonload | image 5 | – | – | – | – | – | – | – |
| | script 5 | – | – | – | – | – | – | – |
The results for Chrome, Safari, Android mobile Safari, and iPhone mobile Safari are the same. When you click Reload in these browsers the resources in the page get refetched (resources 1&2), but not so for the resources loaded in the onload handler and later (resources 3-5).
Firefox is interesting. It loads the four resources in the page plus the onload handler’s image (image 3), but not the onload handler’s script (script 3). Curious.
IE 8 and 10 are the same: they load the four resources in the page as well as the image & script from the onload handler (resources 1-3). I didn’t test IE 9 but I assume it’s the same.
Opera has the best results in my opinion. It refetches all of the resources in the main page, the onload handler, and 1 millisecond after onload (resources 1-4), but it does not refetch the resources 5 seconds after onload (image 5 & script 5). I poked at this a bit. If I raise the delay from 1 millisecond to 50 milliseconds, then image 4 & script 4 are not refetched. I think this is a race condition where if Opera is still downloading resources from the onload handler when these first delayed resources are created, then they are also refetched. To further verify this I raised the delay to 500 milliseconds and confirmed the resources were not refetched, but then increased the response time of all the resources to 1 second (instead of instantaneous) and this caused image 4 & script 4 to be refetched, even though the delay was 500 milliseconds after onload.
Note that pressing shift+Reload (and other combinations) didn’t alter the results.
Takeaways
A bit esoteric? Perhaps. This is a deep dive on a niche issue, I’ll grant you that. But I have a few buts:
If you’re a web developer using far future expiration dates and lazy loading, you might get unexpected results when you change a resource and hit Reload, and even shift+Reload. If you’re not getting the latest version of your dev resources you might have to clear your cache.
This isn't just an issue for web devs. It affects users as well. Numerous sites lazy-load resources with far future expiration dates, including top ten sites such as Google, YouTube, Yahoo, Microsoft Live, Tencent QQ, Amazon, and Twitter. If you Reload any of these sites with a packet sniffer open in the first four browsers listed, you'll see a curious pattern: cacheable resources loaded before onload have a 304 response status, while those after onload are read from cache and don't get refetched. The only way to ensure you get a fresh version is to clear your cache, defeating the expected benefit of the Reload button.
Here’s a waterfall showing the requests when Amazon is reloaded in Chrome. The red vertical line marks the onload event. Notice how the resources before onload have 304 status codes. Right after the onload are some image beacons that aren’t cacheable, so they get refetched and return 200 status codes. The cacheable images loaded after onload are all read from cache, so any updates to those resources are missed.
Finally, whenever behavior varies across browsers it’s usually worthwhile to investigate why. Often one behavior is preferred over another, and we should get the specs and vendors aligned in that direction. In this case, we should make Reload more consistent and have it refetch resources, even those loaded dynamically in the onload handler.
HTTP Archive: new stats
Over the last two months I’ve been coding on the HTTP Archive. I blogged previously about DB enhancements and adding document flush. Much of this work was done in order to add several new metrics. I just finished adding charts for those stats and wanted to explain each one.
Note: In this discussion I want to comment on how these metrics have trended over the last two years. During that time the sample size of URLs has grown from 15K to 300K. In order to have a more consistent comparison I look at trends for the Top 1000 websites. In the HTTP Archive GUI you can choose between "All", "Top 1000", and "Top 100". The links to charts below take you straight to the "Top 1000" set of results.
Speed Index
The Speed Index chart measures rendering speed. Speed Index was invented by Pat Meenan as part of WebPagetest. (WebPagetest is the framework that runs all of the HTTP Archive tests.) It is the average time (in milliseconds) at which visible parts of the page are displayed. (See the Speed Index documentation for more information.) As we move to Web 2.0, with pages that are richer and more dynamic, window.onload is a less accurate representation of the user’s perception of website speed. Speed Index better reflects how quickly the user can see the page’s content. (Note that we’re currently investigating if the September 2012 increase in Speed Index is the result of bandwidth contention caused by the increase to 300K URLs that occurred at the same time.)
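Another way to think of Speed Index is as the area above the page's visual progress curve. Here's a rough sketch of that idea (my own approximation of the published definition, not WebPagetest's code), using visual-completeness samples taken from video frames:

```js
// Approximate Speed Index from (time, progress) samples, where each
// sample marks the time (ms) at which the page reached that fraction
// (0..1) of its final appearance; samples are sorted by time.
function speedIndex(samples) {
  var si = 0;
  var lastTime = 0;
  var lastProgress = 0;
  for (var i = 0; i < samples.length; i++) {
    // Add the area of the interval that was still visually incomplete.
    si += (samples[i].time - lastTime) * (1 - lastProgress);
    lastTime = samples[i].time;
    lastProgress = samples[i].progress;
  }
  return si;
}

// Example: 0% complete until 1000 ms, then 80% until 2000 ms, then done:
// 1000*1.0 + 1000*0.2 = 1200
console.log(speedIndex([
  { time: 1000, progress: 0.8 },
  { time: 2000, progress: 1.0 },
]));
```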
Doc Size
The Doc Size chart shows the size of the main HTML document. To my surprise this has only grown ~10% over the last two years. I would have thought that the use of inlining (i.e., data: URIs) and richer pages would have shown a bigger increase, especially across the Top 1000 sites.
DOM Elements
I’ve hypothesized that the number of DOM elements in a page has a big impact on performance, so I’m excited to be tracking this in the DOM Elements chart. The number of DOM elements has increased ~16% since May 2011 (when this was added to WebPagetest). Note: Number of DOM elements is not currently available on HTTP Archive Mobile.
Max Reqs on 1 Domain
The question of whether domain sharding is still a valid optimization comes up frequently. The arguments against it are that browsers now open more connections per hostname (6 instead of 2) and that adding more domains increases the time spent doing DNS lookups. While I agree with these points, I still see many websites that download a large number of resources from a single domain and would cut their page load time in half if they sharded across two domains. This is a great example of the need for Situational Performance Optimization evangelized by Guy Podjarny. If a site has a small number of resources on one domain, it probably shouldn't do domain sharding. Whereas if many resources use the same domain, domain sharding is likely a good choice.
To gauge the opportunity for this best practice we need to know how often a single domain is used for a large number of resources. That metric is provided by the Max Reqs on 1 Domain chart. For a given website, the number of requests for each domain is counted. The number of requests on the most-used domain is saved as the value of "max reqs on 1 domain" for that page. The average of these max request counts is shown in the chart. For the Top 1000 websites the value has hovered around 42 for the past two years, even while the total number of requests per page has increased from 82 to 99. This tells me that third party content is a major contributor to the increase in total requests, and there are still many opportunities where domain sharding could be beneficial.
The average number of domains per page is also shown in this chart. That has risen 50%, further suggesting that third party content is a major contributor to page weight.
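To make the metric concrete, here's a rough sketch of the per-page calculation described above (my own illustration, not the HTTP Archive's actual code):

```js
// For one page, count the requests on each hostname and return the
// count for the most-used domain ("max reqs on 1 domain").
function maxReqsOnOneDomain(requestUrls) {
  var counts = {};
  var max = 0;
  for (var i = 0; i < requestUrls.length; i++) {
    var host = new URL(requestUrls[i]).hostname;
    counts[host] = (counts[host] || 0) + 1;
    if (counts[host] > max) max = counts[host];
  }
  return max;
}

// e.g. three requests on images.example.com and one on www.example.com => 3
console.log(maxReqsOnOneDomain([
  "http://www.example.com/",
  "http://images.example.com/a.png",
  "http://images.example.com/b.png",
  "http://images.example.com/c.png",
]));
```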
Cacheable Resources
This chart was previously called “Requests with Caching Headers”. While the presence of caching headers is interesting, a more important performance metric is the number of resources that have a non-zero cache lifetime (AKA, “freshness lifetime” as defined in the HTTP spec RFC 2616). To that end I now calculate a new stat for requests, “expAge”, that is the cache lifetime (in seconds). The Cacheable Resources chart shows the percentage of resources with a non-zero expAge.
This revamp included a few other improvements over the previous calculations:
- It takes the Expires header into consideration. I previously assumed that if someone sent Expires they were likely to also send max-age, but it turns out that 9% of requests have an Expires but do not specify max-age. (Max-age takes precedence if both exist.)
- When the expAge value is based on the Expires date (because max-age is absent), the freshness lifetime is the delta of the Expires date and the Date response header value. For the ~1% of requests that don’t have a Date header, the client’s date value at the time of the request is used.
- The new calculation takes into consideration Cache-Control no-store, no-cache, and must-revalidate, setting expAge to zero if any of those are present.
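Putting those rules together, the expAge logic looks roughly like this. This is a simplified sketch of the rules above, not the HTTP Archive's actual implementation:

```js
// Compute a response's freshness lifetime ("expAge") in seconds from its
// Cache-Control, Expires, and Date headers (header names lowercased).
function expAge(headers) {
  var cc = (headers["cache-control"] || "").toLowerCase();

  // no-store, no-cache, and must-revalidate force a zero cache lifetime.
  if (/no-store|no-cache|must-revalidate/.test(cc)) return 0;

  // max-age takes precedence over Expires when both are present.
  var m = cc.match(/max-age\s*=\s*(\d+)/);
  if (m) return parseInt(m[1], 10);

  if (headers["expires"]) {
    // Freshness lifetime is Expires minus the Date header; fall back to
    // the client's clock for responses with no Date header.
    var base = headers["date"] ? Date.parse(headers["date"]) : Date.now();
    var delta = Math.round((Date.parse(headers["expires"]) - base) / 1000);
    return delta > 0 ? delta : 0;
  }

  return 0;
}
```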
Cache Lifetime
The Cache Lifetime chart gives a histogram of expAge values for an individual crawl. (See the definition of expAge above.) This chart used to be called "Cache-Control: max-age", but that was only focused on the max-age value. As described previously, the new expAge calculation takes the Expires header into consideration, as well as other Cache-Control options that override cache lifetime. For the Top 1000 sites on Feb 1 2013, 39% of resources had a cache lifetime of 0. Remembering that top sites are typically better tuned for performance, we're not surprised that this jumps to 59% across all sites.
Sites hosting HTML on CDN
The last new chart is Sites hosting HTML on CDN. This shows the percentage of sites that have their main HTML document hosted on a CDN. WebPagetest started tracking this on Oct 1, 2012. The CDNs recorded in the most recent crawl were Google, Cloudflare, Akamai, lxdns.com, Limelight, Level 3, Edgecast, Cotendo CDN, ChinaCache, CDNetworks, Incapsula, Amazon CloudFront, AT&T, Yottaa, NetDNA, Mirror Image, Fastly, Internap, Highwinds, Windows Azure, cubeCDN, Azion, BitGravity, Cachefly, CDN77, Panther, OnApp, Simple CDN, and BO.LT. This is a new feature and I'm sure there are questions about determining and adding CDNs. We'll follow up on those as they come in. Keep in mind that this is just for the main HTML document.
It's great to see the HTTP Archive growing both in terms of coverage (number of URLs) and depth of metrics. Make sure to check out the About page to find links to the code, data downloads, FAQ, and discussion group.
HTTP Archive: adding flush
In my previous post, HTTP Archive: new schema & dumps, I described my work to make the database faster, easier to download, consume less disk space, and contain more stats. Once these updates were finished I was excited to start going through the code and make pages faster using the new schema changes. Although time consuming, it’s been fun to change some queries and see the site get much faster.
Along the way I bumped into the page for viewing an individual website’s results, for example Whole Foods. Despite my schema changes, it has a slow (~10 seconds) query in the middle of the page. I’ve created a bug to figure out how to improve this (I think I need a new index), but for the short term I decided to just flush the document before the slow query. This page is long, so the slow part is well below-the-fold. By adding flush I would be able to get the above-the-fold content to render more quickly.
I wrote a blog post in 2009 describing Flushing the Document Early. It describes flushing thusly:
Flushing is when the server sends the initial part of the HTML document to the client before the entire response is ready. All major browsers start parsing the partial response. When done correctly, flushing results in a page that loads and feels faster. The key is choosing the right point at which to flush the partial HTML document response. The flush should occur before the expensive parts of the back end work, such as database queries and web service calls. But the flush should occur after the initial response has enough content to keep the browser busy. The part of the HTML document that is flushed should contain some resources as well as some visible content. If resources (e.g., stylesheets, external scripts, and images) are included, the browser gets an early start on its download work. If some visible content is included, the user receives feedback sooner that the page is loading.
My first step was to add a call to PHP’s flush function right before trends.inc which contains the slow query:
```php
<?php
flush();
require_once('trends.inc'); // contains the slow query
?>
```
Nothing changed. The page still took ~10 seconds to render. In that 2009 blog post I mentioned it’s hard to get the details straight. Fortunately I dug into those details in the corresponding chapter from Even Faster Web Sites. I reviewed the chapter and read about how PHP uses output buffering, requiring some additional PHP flush functions. Specifically, all existing output buffers have to be cleared with a call to ob_end_flush, a new output buffer is activated by ob_start, and this new output buffer has to be cleared using ob_flush before calling flush:
```php
<?php
// Flush any currently open buffers.
while (ob_get_level() > 0) {
    ob_end_flush();
}
ob_start();
?>

[a bunch of HTML...]

<?php
ob_flush();
flush();
require_once('trends.inc'); // contains the slow query
?>
```
After following the advice for managing PHP's output buffers, flushing still didn't work. Reading further in the chapter I saw that Apache has a buffer that it uses when gzipping. If the size of the output is less than 8K at the time flush is called, Apache won't flush the output because it wants at least 8K before it gzips. In my case I had only ~6K of output before the slow query, so I was falling short of the 8K threshold. An easy workaround is to add padding to the HTML document to exceed the threshold:
```php
<?php
// Flush any currently open buffers.
while (ob_get_level() > 0) {
    ob_end_flush();
}
ob_start();
?>

[a bunch of HTML...]

<!-- 0001020304050607080[2K worth of padding]... -->

<?php
ob_flush();
flush();
require_once('trends.inc'); // contains the slow query
?>
```
After adding the padding, flushing worked! It felt much faster. As expected, the flush occurred at a point well below-the-fold, so the page looks done unless the user quickly scrolls down. The downside of adding padding to the page is a larger HTML document that takes longer to download, is larger to store, etc. Instead, we used Apache's DeflateBufferSize directive to lower the gzip threshold to 4K. With this change the page renders faster without the added page weight.
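For reference, that Apache change amounts to a single mod_deflate directive; exactly where it goes depends on your configuration layout:

```apache
# Lower mod_deflate's buffer (default ~8K) so smaller early flushes are
# compressed and sent right away instead of waiting for more output.
DeflateBufferSize 4096
```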
The flush change is now in production. You can see the difference using these URLs:
These URLs open a random website each time to avoid any cached MySQL results. Without flushing, the page doesn’t change for ~10 seconds. With flushing, the above-the-fold content changes after ~3 seconds, and the below-the-fold content arrives ~7 seconds later.
I still don't see flushing used on many websites. It can be confusing and even frustrating to set up. My responses already had chunked encoding, so I didn't have to jump through that hoop. But as you can see the faster rendering makes a significant difference. If you're not flushing your document early, I recommend you give it a try.
HTTP Archive: new schema & dumps
I spent most of the last month, including my holiday break (cue violin) implementing major changes to the HTTP Archive. Most of the externally visible UI changes aren’t there yet – I’ll be blogging about them when they’re available. The changes I worked on were primarily on the backend to make the database faster to query, consume less disk space, easier to download, and contain more stats. This required going through every crawl since Nov 15, 2010 to import the data, massage the schema, calculate new stats, and export the updated mysql dump.
You might ask why I had to start with importing the data. Well, that’s a funny story.
Funny Story
The HTTP Archive has 6 MyISAM tables. Most of the data is in the “pages” and “requests” tables. There are ~5.5M rows in pages, and 472M in requests – reflecting the ratio of ~85 requests per page. Keeping in mind that we’re saving all the HTTP request & response headers, it’s not surprising that the requests table is over 400G. After removing some rows and applying some schema changes to reduce the table size, I ran “optimize table requests;” to reclaim the disk space. Unbeknownst to me, MySQL tried to create a temporary copy of the (400G) table on our (660G) disk. After running out of disk space, both the original requests table and the incomplete copy were corrupted. I tried the documented recovery procedures without success. I have no other clues why this happened and assume it’s atypical.
Nevertheless, in one evening 400G of data gathered over the last 2.5 years disappeared.
I'm being a little melodramatic – it's not as bad as it sounds. I knew I had MySQL dumps for every crawl. The data wasn't lost; it just meant it was going to be harder to apply my schema updates and calculate new stats, especially since I never wanted the requests table to climb over 300G again. And it meant that the script for applying these changes had to start by importing the data from the dump file – something that took 1-5 hours depending on the size of the crawl. There are 56 crawls.
After spending about a minute bemoaning my bad luck (and lack of MySQL skills), I accepted the fact that I was entering a task where it was necessary to put my head down and just push on. I learned this lesson in high school when I showed up for my first construction job. I arrived at the job site where the crew was adding a second story to a house. I knew the foreman and asked him what I should do, bubbling over with excitement. He pointed to the roof that had formerly been on the house and now lay in a giant pile next to the house. They were of equal size. “Load the roof in that dump truck and take it to the dump.” I looked at him bewildered, “But it won’t fit in the truck.” “Fill the truck as full as you can, drive it to the dump, come back and do it again. Keep doing that until it’s gone.” And you know what? Sometimes that’s what happens in life. I taught myself to never look at the size of the pile – it was too discouraging. I just kept my head down and loaded the truck – for three days – until the job was done. (Sorry for the lecture but I think that’s an important lesson.)
Pushing on
Here we are 4 weeks later and the job is done. While the update script was running I made a significant change to the web GUI – nothing on the production website accesses the requests table. This means we can remove old crawls without impacting users, enabling us to keep the requests table to a manageable size, usually just containing the last few crawls. Keeping the last few crawls online is extremely useful for answering one-off questions, like “how many sites use script X?” or “how many requests have header Y?”.
Since the web GUI doesn’t require the requests table, and since restoring the dump files used to take so long, I split the dump files into two files: one for pages and another for requests. With this change it’s possible for people running private instances of the HTTP Archive code to import the smaller pages table much faster if that’s all they want. Another change is that whereas the dump files were formerly only available in MySQL format, they’re now also available as CSV. You can see the list of dump files on the HTTP Archive downloads page. All of these changes were also applied to the HTTP Archive Mobile.
I’ve documented the schema changes in dbapi.inc along with the commands needed to apply the schema to existing tables. Unless you have your own private crawl data, I recommend you start from scratch and create new tables from the new dump files.
My next step is to add charts for the new stats that were added:
- Speed Index
- time to first byte
- visual complete time
- fully loaded time
- cdn (if any)
- font requests
- max # of requests on a single domain
- size of the main HTML document
- # of DOM elements
- gzip savings & missed savings
MySQL Goodness
If you’re a MySQL guru you probably cringed through much of this blog post. If you’d like to help with the MySQL code, or even just give some advice, please contact me. For example, I just saw that one of the tables was accidentally recreated as InnoDB (since we updated to MySQL 5.5 and the default storage engine changed). Strangely, the same query is 4x slower on InnoDB than it is on MyISAM. It’s a straightforward query, not full text, etc. This is the kind of thing where I wish there was a MySQL guru who was somewhat familiar with the schema so I could explain the issue or question without taking too much time. There’s still a lot of fun MySQL ahead.
This is just the first of many blog posts around these recent HTTP Archive changes. I’ve got a good one coming up on document flush. I’ll also talk about each of the new stats when they’re available.
a good blog post
Every morning I have a two hour breakfast where I catch up on the latest happenings in the world of web performance via email, Twitter, news, and blogs. My primary source for relevant blog posts is the Planet Performance RSS Feed – I recommend anyone working on web performance subscribe to this feed.
Twitter is another major source of blog post links. It’s a little noisier – which is why I only follow 37 people and love retweets. One of the people I follow is Stephen Shankland (@stshank). He’s the main person in what I consider “major media” (he’s a writer for CNET News) who really follows the web performance space. The other day he retweeted a blog post about responsive images by Paul Robert Lloyd, Responsive Images: What We Thought We Needed.
I read the post, enjoyed it, and learned some stuff so I tweeted about it.
Sharing a good blog post. Nothing big. But then I got this response:
I don't personally know Paul or Anselm. And Anselm's question wasn't at all obnoxious or unfounded. He was just curious why I had tweeted that URL. As I've been driving around the last few days I've been thinking about how I would answer this question. I really thought it was a good blog post – but why? The reasons for my tweet grew and grew to the point where I thought it deserved a blog post.
Here’s why I liked Responsive Images: What We Thought We Needed and by extension what I think makes a good blog post.
- Logical Structure – Logical flow is key to good writing but is overlooked in many blog posts. Paul sets up the problem, provides some history, explains the proposed solutions, highlights the drawbacks, and wraps it up with a motivating conclusion. Perfect.
- Entertaining Writing Style – Tech writing can be really boring. It takes a touch of flair to liven it up. Paul does that starting with the first sentence: "If you were to read a web designer's Christmas wish list, it would likely include a solution for displaying images responsively." Mixing the main topic with the current cultural focus. Hemingway? No. Made me smirk? Yes. Later he has a quote from Scrooge. Not only is this interesting as a reference to something outside of technology, it also took time to find the quote. And I loved this sentence: "That both feature verbose and opaque syntax, I'm not sure either should find its way into the browser – especially as alternative approaches have yet to be fully explored." The structure of this sentence is interesting. It causes me to slow down and really understand what he means. This might indicate that a simpler sentence or multiple sentences would have been easier for the reader to digest. But I felt a little challenge and enjoyed the change in beat to the rhythm of the read.
- Good Grammar – Perhaps this isn't big to most people, but OCD people like me (and Zakas and Crockford) see good grammar as attention to detail. And the details are really important when it comes to technology. I couldn't find a single grammatical mistake when I read Paul's post, which raises the probability that the details behind the technical analysis are solid as well. (On re-reading the post multiple times I discovered two typos. There might be more. Nevertheless this is much better than the norm.)
- Technically Sound – I've worked with a lot of the alternatives for responsive images. I'm reluctant to say that I'm an expert, but I consider myself well informed. I think Paul's description of the problem and arguments for what's lacking are solid. I especially resonate with his third issue, "The size of a display has little relation to the size of an image". This isn't the first time I've heard this argument, but it's more often than not left out of the discussion. The technical details and reasoning in the post were sound.
- Something to Learn – I enjoy reading blog posts that provide me an opportunity to learn something new, or to help reinforce and reprioritize what I already know. Here are some of my takeaways from Paul's post: Apple proposed srcset, verbosity is an issue, WebP helps, scaling images isn't always the solution (see Bill Murray), and removing images is sometimes a good alternative.
- Informative Examples – Examples have to be relevant, understandable, and pithy. Paul's code samples do this, as well as his visual examples of Bill Murray and image grids.
- Thorough References – Paul's article is full of great links: complex, devilish, <picture>, Responsive Images Community Group, srcset, experimentation with image compression, contextual methods of querying, and more. I had visited many of these links previously – which reinforced my sense that this was a quality article as I continued to read. This also told me that the links I hadn't visited were worth visiting – which they are.
It takes a lot of time to write a good blog post. I appreciate the dedication that Paul and so many others in the web dev community show by sharing their thoughts and what they’ve learned. It’s pretty amazing.
Web Performance Community & Conversation
I first started talking about web performance in 2007. My first blog post was The Importance of Front-End Performance over on YDN in March 2007. The next month Tenni Theurer and I spoke at Web 2.0 Expo on High Performance Webpages. I hadn’t spoken at a conference since 1990 – 17 years earlier! This speaking appearance was before YSlow and my book High Performance Web Sites had been released. There was no conversation around web performance at this time – at least none that I was aware of.
Our 3 hour (!) workshop was scheduled for 9:30am on a Sunday morning. Tenni and I thought we were doomed. I told her we should expect 20 or so people, and not to be disappointed if the audience was small. I remember it was a beautiful day in San Francisco and I thought to myself, "If I was here for a conference I would be out touring San Francisco rather than sitting in a 3 hour workshop at 9:30 on a Sunday morning."
Tenni and I were surprised that we'd been assigned to a gigantic ballroom. We were also surprised that there were already 20+ people there when we arrived early to set up. But the real surprise came while we sat there waiting to start – nearly 300 people flowed into the room. We looked at each other with disbelief. Wow!
Constant blogging, open sourcing, and public speaking carried the conversation forward. The first Velocity conference took place in June 2008 with ~500 attendees. Velocity 2012 had ~2500 attendees, and now takes place on three continents. The conversation has certainly grown!
That was a fun look at the past, but what I really wanted to do in this blog post was highlight three critical places where the web performance conversation is being held now.
- Web Performance Meetups – Sergey Chernyshev started the New York Web Performance Group in April 2009. Today there are 46 Web Performance meetups with 16,631 members worldwide. Wow! This is a huge community and a great format for web performance enthusiasts to gather and share what they’ve learned to continue to make the Web even faster.
- Exceptional Performance mailing list – I started the Exceptional Performance team at Yahoo! In doing so I also created the Exceptional Performance Yahoo! Group with its associated mailing list. This group has atrophied in recent years, but I’m going to start using it again as a way to communicate to dedicated web performance developers. It currently has 1340 members and the spam rate is low. I encourage you to sign up and read & post messages on the list.
- PerfPlanet.com – I’ll be honest – I think my blog is really good. And while I encourage you to subscribe to my RSS feed, it’s actually more important that you subscribe to the feed from PerfPlanet.com. Stoyan Stefanov, another former member of the Exceptional Performance team, maintains the site including its awesome Performance Calendar (now in its fourth instantiation). Stoyan has collected ~50 of the best web performance blogs. This is my main source for the latest news and developments in the world of web performance.
It’s exciting to see our community grow. I still believe we’re at the tip of the iceberg. Back in 2007 I would have never predicted that we’d have 16K web performance meetup members, 2500 Velocity attendees, and 1340 mailing list members. I wonder what it’ll be in 2014. It’s fun to imagine.
clear current page UX
Yesterday in Perception of Speed I wrote about how clicking a link doesn’t immediately clear the screen. Instead, browsers wait “until the next page arrives” before clearing the screen. This improves the user experience because instead of being unoccupied (staring at a blank screen), users are occupied looking at the current page.
But when exactly do browsers clear the screen? Pick what you think is the best answer to that question:
- A: when the first byte of the new page arrives
- B: when the new page's BODY element is rendered
- C: when DOMContentLoaded fires
- D: when window.onload fires
I would have guessed "A". In fact, I've been telling people for years that the current page is cleared when the new document's first byte arrives. That changed yesterday when my officemate, Ilya Grigorik, wondered out loud when exactly the browser cleared the page. It turns out the answer is at or slightly before "B" – when the new page's BODY element is created.
Test Page
Here’s a test page that helps us explore this behavior:
The page contains a script that takes 10 seconds to return. (You can change this by editing the value for “delay” in the querystring.) I used a script because they block the browser from parsing the HTML document. Placing this script at different points in the page allows us to isolate when the screen is cleared. The three choices for positioning the script are:
- top of HEAD – The SCRIPT tag is placed immediately after the HEAD tag, even before TITLE. This is our proxy for “first byte” of the new document.
- bottom of HEAD – The SCRIPT tag is placed right before the /HEAD tag. There are a few STYLE blocks in the HEAD to create some work for the browser while parsing the HEAD. This allows us to see the state right before the BODY element is created.
- top of BODY – The SCRIPT tag is placed immediately after the BODY tag. This allows us to isolate the state right after the BODY element is created.
Background colors are assigned randomly to make it clear when the BODY has been rendered.
Test Results
I ran these three tests on the major browsers to measure how long they waited before clearing the page. I assumed it would either be 0 or 10 seconds, but it turns out some browsers are in the middle. The following table shows the number of seconds it took before the browser cleared the page for each of the three test cases.
Table 1: Number of seconds until page is cleared

| Browser | top of HEAD | bottom of HEAD | top of BODY |
|---|---|---|---|
| Chrome 23 | ~5 | ~5 | 0 |
| Firefox 17 | 10 | 10 | 0 |
| IE 6-9 | 10 | 10 | 0 |
| Opera 12 | ~4 | ~4 | ~4 |
| Safari 6 | 10 | 10 | 0 |
Let’s look at the results for each position of the script and see what we can learn.
#1: Top of HEAD
In this test the 10 second script is placed between the HEAD and TITLE:
```html
<head>
<script src=...
<title>...
```
Even though the HTML document has arrived and parsing has begun, none of the browsers clear the screen at this point. Therefore, browsers do NOT clear the page when the first byte of the new page arrives. Firefox, IE, and Safari don't clear the screen until 10 seconds have passed and the BODY has been parsed (as we'll confirm in test #3). Chrome and Opera have interesting behavior here – they both clear the screen after 4-5 seconds.
The major browsers differ in how they handle this situation. Which is better – to preserve the old page until the new page’s body is ready, or to clear the old page after a few seconds? Clearing the screen sooner gives the user some feedback that things are progressing, but it also leaves the user staring at a blank screen. (And the hypothesis from yesterday is that staring at a blank screen is UNoccupied time which is less satisfying.)
Do Chrome and Opera intentionally clear the page to provide the user feedback, or is something else triggering it? Ilya was doing some deep dive tracing with Chrome (as he often does) and found that the screen clearing appeared to coincide with GC kicking in. It’s not clear which behavior is best, but it is interesting that the browsers differ in how long they wait before clearing the screen when the BODY is blocked from being parsed.
Another interesting observation from this test is the time at which the TITLE is changed. I was surprised to see that all the browsers clear the old title immediately, even though the old page contents are left unchanged. This results in a mismatch where the title no longer matches the page. Every browser except one replaces the old title with the URL being requested. This is a bit unwieldy since the URL doesn’t fit in the amount of space available in a tab. The only exception to this is Firefox which displays “Connecting…” in the tab. It’s further interesting to note that this “interim” title is displayed for the entire 10 seconds until the script finishes downloading. This makes sense because the script is blocking the parser from reaching the TITLE tag. We’ll see in the next test that the TITLE is updated sooner when the script is moved lower.
#2: Bottom of HEAD
In this test the 10 second script is placed at the end of the HEAD block:
```html
<script src=...
</head>
<body>
```
The results for clearing the page are the same as test #1: Firefox, IE, and Safari don’t clear the page for 10 seconds. Chrome and Opera clear the screen after 4-5 seconds. The main point of this test is to confirm that the browser still hasn’t cleared the page up to the point immediately before parsing the BODY tag.
The title of the new page is displayed immediately. This makes sense since the TITLE tag isn’t blocked from being parsed as it was in test #1. It’s interesting, however, that the browser parses the TITLE tag and updates the title, but doesn’t clear the old page’s contents. This is worse than the previous mismatch. In test #1 an “interim” title was shown (the URL or “Connecting…”). Now the actual title of the new page is shown above the content of the old page.
#3: Top of BODY
In this test the 10 second script is placed immediately after the BODY tag:
```html
</head>
<body>
<script src=...
```
Since nothing is blocking the browser from parsing the BODY tag, all but one of the browsers immediately clear the old page and render the new body (as seen by swapping in a new random background color). However, because the very next tag is the 10 second script, the rest of the page is blocked from rendering so the user is left staring at a blank page for 10 seconds. The browser that behaves differently is Opera – it maintains its ~4 second delay before erasing the screen. This is curious. What is it waiting for? It parses the new BODY tag and knows there’s a new background color at time 0, but it waits to render that for several seconds. Does downloading the script block the renderer? But what causes the renderer to kick in even though the script still has numerous seconds before it returns?
For these contrived test cases Opera has the best behavior in my opinion. The other browsers leave the user staring at the old page for 10 seconds wondering if something's broken (tests #1 & #2 for Firefox, IE, and Safari), or leave the user staring at a blank page for 10 seconds while rendering is blocked (test #3 for Chrome, Firefox, IE, and Safari). Opera always strikes a compromise, letting the user watch the old content for 3-4 seconds before clearing the page, and then showing a blank screen or the new body for the remaining 6-7 seconds.
Takeaways
These examples, while contrived, yield some real world takeaways.
- Browsers generally clear the page when the BODY is parsed, not when the first bytes arrive. I say “generally” because during testing I was able to get browsers to clear the page before BODY by adding async and inline scripts, but it was very finicky. Changing just a few lines in innocuous ways would change the behavior such that I wasn’t able to nail down what exactly was causing the page to be cleared. But all of this was after the page’s first bytes had arrived.
- It’s unclear what is the best user experience for clearing the page. The major browsers have fairly divergent behavior, and it’s not clear whether these differences are intentional. Transitioning from one page to the next is an action that’s repeated billions of times a day. I’m surprised there’s not more consistency and data on what produces the best user experience.
- Avoid frontend SPOF. I’ve written (and spoken) extensively about how loading scripts synchronously can create a single point of failure in your web page. This new information about how and when the old page is cleared adds to the concern about slow-loading synchronous scripts, especially in light of the inconsistent way browsers handle them. Whenever possible load scripts asynchronously.
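As a reminder, the basic pattern for loading a script asynchronously looks something like this (a generic sketch; the URL is a placeholder):

```js
// Create the script element dynamically so it doesn't block parsing or
// rendering; a slow or failed third-party response is no longer a
// frontend single point of failure.
var script = document.createElement("script");
script.src = "https://third-party.example.com/widget.js"; // placeholder URL
script.async = true;
document.getElementsByTagName("head")[0].appendChild(script);
```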
The Perception of Speed
Have you ever noticed that when you click on a link the page doesn’t change right away?
If I had written the code I would have cleared the page as soon as the link was clicked. But in a masterstroke of creating the perception of faster websites, browsers instead don’t erase the old page until the next page arrives.
Keeping the old page in place improves the user experience for the same reason that making airline passengers walk six times longer to get their bags reduces complaints about long waits at the baggage claim. "Occupied time (walking to baggage claim) feels shorter than unoccupied time (standing at the carousel)," according to M.I.T. operations researcher Richard Larson, an expert on queueing. In my example of clicking a link, occupied time (looking at the old page) feels shorter than unoccupied time (staring at a blank screen).
Let’s try an example using this page you’re currently viewing. Both of the following links add a five second delay to this page. The first link refetches this page the normal way – the browser leaves the old page until the new page arrives. The second link clears the page before refetching. Which feels slower to you?
Clicking the first link leaves the user staring at this page’s text for 5+ seconds before it refreshes. It’s slow, but not that noticeable. Clicking the second link makes the same wait time more annoying. I actually start getting antsy, shuffling my feet and shifting in my chair. For real users and web pages this translates into higher abandonment rates.
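The demo links themselves aren't reproduced here, but the second variant amounts to something like this hypothetical sketch: wipe the current page and then navigate, so the wait is spent staring at a blank screen.

```js
// Hypothetical sketch of the "clear first" demo link. The normal link
// simply navigates; this one erases the page before the slow refetch,
// turning occupied time into unoccupied time. The delay parameter is a
// placeholder for whatever makes the server hold the response ~5 seconds.
function clearThenReload() {
  document.body.innerHTML = "";   // blank the page immediately
  location.href = "?delay=5000";  // refetch the (artificially slow) page
}
```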
One takeaway from this is to keep your eye on the ball. In the case of web performance, we want to create a faster, better user experience. There are great techniques for tackling that problem head-on (reduce repaints, optimize JavaScript, etc.), but sometimes you can make significant improvements with changes that address the user’s perception of speed such as spinners and progress bars.
Another takeaway is to build single page web apps. The old Web 1.0 way of requesting a new page for every user action and repainting the entire page is more likely to produce a jarring, slow experience. Using Ajax allows us to keep the user engaged while requests and responses are handled in the background, often being able to do them asynchronously.
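A generic sketch of that pattern (the URL and element id are placeholders): request a fragment in the background and update part of the page, so the user keeps looking at meaningful content instead of a blank screen.

```js
// Update one region of the page instead of doing a full page navigation.
var xhr = new XMLHttpRequest();
xhr.open("GET", "/fragment.html", true); // true = asynchronous
xhr.onload = function () {
  document.getElementById("content").innerHTML = xhr.responseText;
};
xhr.send();
```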
Comparing RUM & Synthetic Page Load Times
Yesterday I read Etsy’s October 2012 Site Performance Report. Etsy is one of only a handful of companies that publish their performance stats with explanations and future plans. It’s really valuable (and brave!), and gives other developers an opportunity to learn from an industry leader. In this article Etsy mentions that the page load time stats are gathered from a private instance of WebPagetest. They explain their use of synthetically-generated measurements instead of RUM (Real User Monitoring) data:
You might be surprised that we are using synthetic tests for this front-end report instead of Real User Monitoring (RUM) data. RUM is a big part of performance monitoring at Etsy, but when we are looking at trends in front-end performance over time, synthetic testing allows us to eliminate much of the network variability that is inherent in real user data. This helps us tie performance regressions to specific code changes, and get a more stable view of performance overall.
Etsy’s choice of synthetic data for tracking performance as part of their automated build process totally makes sense. I’ve talked to many companies that do the same thing. Teams dealing with builds and code regressions should definitely do this. BUT… it’s important to include RUM data when sharing performance measurements beyond the internal devops team.
Why should RUM data always be used when talking beyond the core team?
The issue with only showing synthetic data is that it typically makes a website appear much faster than it actually is. This has been true since I first started tracking real user metrics back in 2004. My rule-of-thumb is that your real users are experiencing page load times that are twice as long as their corresponding synthetic measurements.
RUM data, by definition, is from real users. It is the ground truth for what users are experiencing. Synthetic data, even when generated using real browsers over a real network, can never match the diversity of performance variables that exist in the real world: browsers, mobile devices, geo locations, network conditions, user accounts, page view flow, etc. The reason we use synthetic data is that it allows us to create a consistent testing environment by eliminating the variables. The variables we choose for synthetic testing matches a segment of users (hopefully) but it can’t capture the diversity of users that actually visit our websites every day. That’s what RUM is for.
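For context, RUM page load times are captured in each real user's browser and beaconed back for aggregation. Here's a minimal sketch using the Navigation Timing API (this is the general approach, not Google Analytics' actual code, and the beacon URL is a placeholder):

```js
window.addEventListener("load", function () {
  // Wait a tick so loadEventEnd has been recorded.
  setTimeout(function () {
    var t = window.performance.timing;
    var pageLoadMs = t.loadEventEnd - t.navigationStart;
    // Report the measurement taken on this real user's device and network.
    new Image().src = "/beacon?plt=" + pageLoadMs;
  }, 0);
});
```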
The core team is likely aware of the biases and assumptions that come with synthetic data. They know that it was generated using only laptops and doesn’t include any mobile devices; that it used a simulated LAN connection and not a slower DSL connection; that IE 9 was used and IE 6&7 aren’t included. Heck, they probably specified these test conditions. The problem is that the people outside the team who see the (rosy) synthetic metrics aren’t aware of these caveats. Even if you note these caveats on your slides, they still won’t remember them! What they will remember is that you said the page loaded in 4 seconds, when in reality most users are getting a time closer to 8 seconds.
How different are RUM measurements as compared to synthetic?
As I said a minute ago, my rule-of-thumb is that RUM page load times are typically 2x what you see from synthetic measurements. After my comment on the Etsy blog post about adding RUM data, and a tweet from @jkowall less than 24 hours later asking for data comparing RUM to synthetic, I decided to gather some real data from my website.
Similar to Etsy, I used WebPagetest to generate synthetic measurements. I chose a single URL: https://stevesouders.com/blog/2012/10/11/cache-is-king/. I measured it using a simulated DSL connection in Chrome 23, Firefox 16, and IE 9. I measured both First View (empty cache) and Repeat View (primed cache). I did three page loads and chose the median. My RUM data came from Google Analytics’ Site Speed feature over the last month. As shown in this chart of the page load time results, the RUM page load times are 2-3x slower than the synthetic measurements.
There’s some devil in the details. The synthetic data could have been more representative: I could have done more than three page loads, tried different network conditions, and even chosen different geo locations. The biggest challenge was mixing the First View and Repeat View page load times to compare to RUM. The RUM data contains both empty cache and primed cache page views, but the split is unknown. A study Tenni Theurer and I did in 2007 showed that ~80% of page views are done with a primed cache. To be more conservative I averaged the First View and Repeat View measurements and call that “Synth 50/50” in the chart. The following table contains the raw data:
| | Chrome 23 | Firefox 16 | IE 9 |
|---|---|---|---|
| Synthetic First View (secs) | 4.64 | 4.18 | 4.56 |
| Synthetic Repeat View (secs) | 2.08 | 2.42 | 1.86 |
| Synthetic 50/50 (secs) | 3.36 | 3.30 | 3.21 |
| RUM (secs) | 9.94 | 8.59 | 6.67 |
| RUM data points | 94 | 603 | 89 |
In my experience these results showing RUM page load times being much slower than synthetic measurements are typical. I’d love to hear from other website owners about how their RUM and synthetic measurements compare. In the meantime, be cautious about only showing your synthetic page load times – the real user experience is likely quite a bit slower.