Web Page Load Times
Interest in web performance is growing rapidly. Web companies are trying to use speed as a distinguishing feature. At the same time, web pages have more content than ever before, which makes for slower pages. Ajax helps reduce the number of roundtrips required for a web application, but today's alternatives for measuring web performance don't work well for Web 2.0 apps.
What's needed is a way to measure web page load times that works for Web 2.0, is easy for web developers to adopt and maintain, can be leveraged by web metrics service providers, generates data usable by web development tools, and provides context to browsers so they can give better feedback to users about their experience.
Episodes provides a framework to do this. It has the following key features:
Interest in web performance is growing rapidly. In 2008 two new conferences on web performance and operations, Velocity and Structure 2008, both sold out. Several books on web performance and scalability have been published recently or will be published in the near future: my book High Performance Web Sites, High Performance MySQL: Optimization, Backups, Replication, and More by Baron Schwartz, Peter Zaitsev, Vadim Tkachenko, Jeremy Zawodny, Arjen Lentz, and Derek Balling, The Art of Capacity Planning by John Allspaw (expected late 2008), and Website Optimization by Andy King (expected late 2008).
In the past, web performance work has focused on the backend: databases, web servers, application frameworks, etc. Recently the focus has shifted to clientside performance. For most web pages only 20% of the end user response time is spent retrieving the page from the web server. The other 80% of the time is spent parsing the page, downloading all the resources included in the page, parsing CSS, and parsing and executing JavaScript. (See High Performance Web Sites, Chapter A by Steve Souders.) In order to optimize clientside performance, it's necessary to be able to measure how long it takes for a web page to load.
The tools and services that exist today for measuring clientside performance worked in the Web 1.0 world, but are incapable of or inadequate for measuring Web 2.0. New techniques exist, but are fragmented. Web developers often need to use multiple solutions to get the full coverage of performance measurements desired. There's a learning curve with each alternative. Switching costs to swap out one alternative for another can be high. Some alternatives are not sound, resulting in inaccurate measurements.
Episodes is a web performance measurement framework that solves these issues. It has the following key features:
The goal is to make Episodes the industrywide solution for measuring web page load times. This is possible because Episodes has benefits for all the stakeholders. Web developers only need to learn and deploy a single framework. Tool developers and web metrics service providers get more accurate timing information by relying on instrumentation inserted by the developer of the web page. Browser developers gain insight into what's happening in the web page by relying on the context relayed by Episodes.
Most importantly, users benefit from the adoption of Episodes. They get a browser that can better inform them of the web page's status for Web 2.0 apps. Since Episodes is a lighter weight design than other instrumentation frameworks, users get faster pages. As Episodes makes it easier for web developers to shine a light on performance issues, the end result is an Internet experience that is faster for everyone.
Measuring up to the onload event is a technique that works for Web 1.0, but not in today's Web 2.0 world.
The two main approaches for measuring Web 2.0 apps are recorded test scripts and programmatic scripting.
Programmatic scripting is the approach used by Episodes.
Several frameworks exist today for measuring Web 1.0 page load times from the client perspective. Keynote, Gomez, WebMetrics, Pingdom, and others provide services to measure response times. Developer tools such as Fasterfox and YSlow report page load times.
These services and tools use the window onload event as the "end time" for response time measurements.
In the Web 1.0 world the onload event works pretty well as an indicator of the page being ready.
As the adoption of DHTML and Ajax grows, the onload event is less indicative of when the page is done loading.
In some cases, web developers have gone to great pains to render content at the top of the page quickly while heavier content is loaded later in the page - thus the page is ready from the user's perspective before the onload event.
In other cases, especially for Ajax apps, it's not until after the onload fires
that large amounts of JavaScript are downloaded and many DOM operations are performed.
In these cases the onload event is too optimistic in measuring page load times.
Using onload doesn't work for Web 2.0.
Because Web 2.0 pages are complex and constructed dynamically, it's necessary to rely on the page developers, or someone else familiar with the page, to manually identify where to perform time measurements in the page. There are two ways to do this: recorded test scripts and programmatic scripting.
Performance test frameworks, such as Selenium, Watir, and Keynote, provide tools for recording a web page test. These test scripts (meaning a script of steps to replay, not a JavaScript script) are replayed again and again to generate time measurements. For a Web 2.0 page, the start and end points can be instrumented to coincide with a specific DOM update or HTTP response.
This approach allows for specific and customized tests that are accurate, but they are time consuming to create. Also, the scripts tend to be rigid and thus are hard to maintain. As changes are made to the web app, the test scripts become out-of-sync. Often the web developer and test script developer are different people, so it's even more likely that application changes aren't reflected in the test script. It's challenging enough to maintain test scripts for your own sites; imagine maintaining scripts that measure your competitors' sites. From a global perspective, this approach results in a lot of redundant work. Imagine the number of Google Search scripts created by search competitors around the world. Finally, test scripts can only be run on simulated web traffic. Often, measurements from real user traffic are more insightful.
Programmatic scripting is the preferred approach for measuring Web 2.0 apps. With this approach web developers embed JavaScript calls that record time measurements within the web page. The web developer (or team) building the app is likely to also be responsible for embedding the JavaScript timing code, so it's more likely to be kept up-to-date as the code changes. Also, this approach allows for time measurements to be gathered from real user traffic.
There are drawbacks to the programmatic scripting approach. It needs to be implemented. (Although the recent release of Jiffy provides an Open Source alternative.) The switching cost is high. Actually embedding the framework may increase the page size to the point that it has a detrimental effect on performance. And programmatic scripting isn't a viable solution for measuring competitors. All of these drawbacks are addressed by Episodes.
A feature of Episodes is that the work to instrument a page with timers can be completely separate from the code to collect and report the timing information.
This section starts off by explaining how to instrument pages using window.postMessage.
That is followed by examples of how the episodic times can be collected and reported.
Finally, the need for backwards compatible code to support older browsers and a prototype of episodes.js that could be shared by all web pages are presented.
Episode timers are inserted in the page using JavaScript's window.postMessage
(see Mozilla Developer Center,
postMessage Method for IE8, and
John Resig's post).
window.postMessage is supported in Internet Explorer 8, Firefox 3, Opera 9.5, and WebKit Nightlies.
(The implementation of Episodes for older browsers is discussed in the section on Backwards Compatibility.)
There are some key benefits of choosing window.postMessage.
It's built into the browser. No additional JavaScript function definitions are required, so the increase in page size is small.
window.postMessage uses events, so multiple event listeners can consume the timing measurements.
window.postMessage takes two parameters.
The first parameter is a string (the message).
The second parameter is a target origin.
In Episodes the target origin is always "*" to allow collectors other than the page itself (e.g., browser extensions).
The message parameter is an opaque string. This is very flexible. For the purposes of Episodes a message syntax is defined to allow instrumenters and collectors to work together. The syntax is:
"EPISODES:action[:arguments]"
The following actions and associated arguments are defined:
"EPISODES:init"
Clears any marks and measures recorded so far. A web application can send "EPISODES:init" between subsequent user requests such as display mail folder, compose message, etc.
"EPISODES:mark:markName[:markTime]"
Records an epoch time under the name markName.
The epoch time recorded is markTime if specified; otherwise it's the current epoch time.
"EPISODES:measure:episodeName[:startMarkName|startEpochTime[:endMarkName|endEpochTime]]"
Records the duration of the episode named episodeName.
The "start time" of the episode is the epoch time associated with the mark named startMarkName or the explicit
epoch time value startEpochTime. If neither of these parameters is provided then startMarkName
is assumed to be the same as episodeName. By default, the "end time" of the episode is the epoch time at the point
the message is sent (i.e., "now").
Optionally, the end time can be specified.
If endMarkName is provided the end time will be the epoch time associated with the mark called endMarkName.
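If endEpochTime is provided that epoch time value is used as the end time.
"EPISODES:done"
Signals that all the episodes for this request are completed, so anyone gathering the episode timing information can report the data.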
Figure 1 shows an example of a page instrumented with Episodes. These are, in fact, the Episodes that exist in the HTML document you're reading now. You can View the document source to see the actual JavaScript.
01: <html>
02:
03: <head>
04: <script>
05: if ("undefined"===typeof(window.postMessage)) window.postMessage=function(){};
06: window.postMessage("EPISODES:mark:firstbyte", "*");
07: window.postMessage("EPISODES:measure:backend:starttime", "*");
08:
09: function doPageReady() {
10: window.postMessage("EPISODES:measure:frontend:firstbyte", "*");
11: window.postMessage("EPISODES:measure:pageready:starttime", "*");
12: window.postMessage("EPISODES:measure:totaltime:starttime", "*");
13: window.postMessage("EPISODES:done", "*");
14: }
15:
16: window.addEventListener("load", doPageReady, false);
17: </script>
18:
19: [...]
20:
21: </head>
22: <body>
23:
24: [...]
25:
26: <img src="episodes.gif" width=544 height=144
27: onload='window.postMessage("EPISODES:measure:abovethefold:starttime", "*")'>
28:
29: [...]
30:
31: </body>
32: </html>
Figure 1. Example of Episodes instrumentation.
Let's walk through how this page is instrumented.
Line 5 checks for window.postMessage. If that function doesn't exist, then a stub function is created.
Lines 9-14 define the doPageReady function that is called when the page is done.
For this page the page is done when the onload event occurs, so doPageReady is attached to the onload event in line 16.
This function records three measurements.
Line 13, the last statement in the doPageReady function, posts the "EPISODES:done" message.
This indicates that all the episodes for this request are completed, and anyone gathering
the episode timing information can report the data.
A visual representation of this page's episodes is shown in Figure 2. You can see how "totaltime" and "pageready" are identical for this page, but if a page did lazy loading or prefetching these times would differ. "abovethefold" is slightly less than "totaltime" indicating that after the image is downloaded the browser still takes approximately 21ms to finish the page. The "backend" time is greater than "frontend" because this page has so few resources.
Figure 2. Visual representation of this page's episodes.
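The messages posted in Figure 1 all follow the "EPISODES:action[:arguments]" syntax. As a rough illustration of how a collector or tool might take such a message apart, or put one together, here is a minimal sketch. The function names parseEpisodesMessage and buildEpisodesMessage are hypothetical and are not part of Episodes; the sketch also assumes that mark and episode names never contain a colon, since the syntax uses it as the separator.

```javascript
// Hypothetical helper (not part of Episodes): split an Episodes message
// into its action and arguments. Returns null for any other message.
function parseEpisodesMessage(message) {
    if (typeof message !== "string") {
        return null;
    }
    var parts = message.split(":");
    if (parts[0] !== "EPISODES" || parts.length < 2) {
        return null;
    }
    return { action: parts[1], args: parts.slice(2) };
}

// Hypothetical helper: build a message from an action and optional arguments.
function buildEpisodesMessage(action, args) {
    return ["EPISODES", action].concat(args || []).join(":");
}

var parsed = parseEpisodesMessage("EPISODES:measure:frontend:firstbyte");
// parsed.action is "measure"; parsed.args is ["frontend", "firstbyte"]

var built = buildEpisodesMessage("mark", ["firstbyte", 1215000000000]);
// built is "EPISODES:mark:firstbyte:1215000000000"
```

Because window.postMessage is also used for other purposes, a collector would likely want to ignore anything that does not start with the "EPISODES" prefix, which is why the parser returns null in that case.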
Episodes is implemented using JavaScript events.
Specifically, it uses window.postMessage.
Applications that want to consume these events do so using attachEvent
for Internet Explorer or addEventListener in other browsers.
(Also refer to ppk's article Advanced event registration models.)
An example of an Episodes event listener is shown in Figure 3.
01: var marks = {};
02: var measures = {};
03:
04: function handleEpisodeMessage(e) {
05: var message = e.data;
06: var aParts = message.split(':');
07: if ( "EPISODES" === aParts[0] ) {
08: var action = aParts[1];
09:
10: if ( "init" === action ) {
11: marks = {};
12: measures = {};
13: }
14:
15: else if ( "mark" === action ) {
16: var markName = aParts[2];
17: marks[markName] = aParts[3] || Number(new Date());
18: }
19:
20: else if ( "measure" === action ) {
21: var episodeName = aParts[2];
22:
23: var startMarkName =
24: ( "undefined" != typeof(aParts[3]) ? aParts[3] : episodeName );
25:
26: var startEpochTime =
27: ( "undefined" != typeof(marks[startMarkName]) ? marks[startMarkName] :
28: ( ("" + startMarkName) === ("" + parseInt(startMarkName, 10)) ? startMarkName :
29: undefined ) );
30:
31: var endEpochTime =
32: ( "undefined" === typeof(aParts[4]) ? Number(new Date()) :
33: ( "undefined" != typeof(marks[aParts[4]]) ? marks[aParts[4]] :
34: aParts[4] ) );
35:
36: if ( startEpochTime ) {
37: measures[episodeName] = parseInt(endEpochTime - startEpochTime);
38: }
39: }
40:
41: else if ( "done" === action ) {
42: var sTimes = "";
43: for ( var key in measures ) {
44: sTimes += "," + key + ":" + measures[key];
45: }
46:
47: if ( sTimes ) {
48: sTimes = sTimes.substring(1); // strip leading ","
49: var img = new Image();
50: img.src = "http://yourserver.com/beacon.gif" + "?ets=" + sTimes;
51: }
52: }
53:
54: }
55: }
56:
57: window.addEventListener("message", handleEpisodeMessage, false);
Figure 3. Example of Episodes event listener.
Here are key pieces of the Episodes event listener implementation shown in Figure 3.
Lines 15-18 handle the "mark" action by storing an epoch time in the marks hash associated with the markName.
If no epoch time is specified the current time is used.
Lines 23-24 set startMarkName to the value specified or to the name of the episode itself if no start argument is specified.
Lines 26-29 determine the "start time" of this episode.
If a mark exists named startMarkName that value is used, otherwise startMarkName is tested to see if
it's an explicit epoch time. Otherwise, startEpochTime is undefined.
Lines 31-34 determine the "end time" of this episode.
If no end time argument is specified then the current time is used.
If there is an end time argument, the mark associated with that is used if it exists, otherwise the end time argument is assumed to be an explicit epoch time.
Line 37 records the length of this episode by taking the difference between the end and start times.
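Lines 41-52 handle the "done" action. The measures are joined into a comma-separated string such as
totaltime:531,pageready:531,abovethefold:510,backend:368,frontend:164
and sent on the beacon URL as the value of the parameter ets.
This syntax is used to match what Jiffy uses,
with the hope of promoting code reuse.
Line 57 registers the event listener. In Internet Explorer, attachEvent would be used instead of addEventListener.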
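To see the start time and end time rules in action without a browser, the same logic can be sketched as a pure function that takes the current time as a parameter. This is only an illustration: resolveMeasure is a hypothetical name, and the sketch is slightly stricter than Figure 3 in that it records nothing when an end argument is neither a known mark nor a string of digits.

```javascript
// Hypothetical pure-function version of the "measure" rules.
// marks: object mapping mark names to epoch times
// args:  the message parts that follow "EPISODES:measure"
// now:   the current epoch time, passed in so results are repeatable
function resolveMeasure(marks, args, now) {
    var episodeName = args[0];
    var startArg = (typeof args[1] !== "undefined") ? args[1] : episodeName;
    var endArg = args[2];

    // A named mark wins; otherwise accept a string of digits as an epoch time.
    function toEpoch(value) {
        if (Object.prototype.hasOwnProperty.call(marks, value)) {
            return Number(marks[value]);
        }
        return /^\d+$/.test(String(value)) ? Number(value) : undefined;
    }

    var start = toEpoch(startArg);
    var end = (typeof endArg === "undefined") ? now : toEpoch(endArg);
    if (typeof start === "undefined" || typeof end === "undefined") {
        return undefined; // unknown mark: nothing to record
    }
    return { name: episodeName, duration: end - start };
}

// Made-up epoch times chosen to reproduce two of the sample values above.
var sampleMarks = { starttime: 1000, firstbyte: 1368 };
var backend = resolveMeasure(sampleMarks, ["backend", "starttime", "firstbyte"], 1531);
// backend.duration is 368
var frontend = resolveMeasure(sampleMarks, ["frontend", "firstbyte"], 1532);
// frontend.duration is 164
```

Passing the clock in as an argument is a design choice made here for testability; the listener in Figure 3 reads the current time directly.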
episodes.js
Collectors of episodic timing data will take different actions when reporting the data, but the code to gather the data will be very similar. To facilitate the adoption of Episodes a common collection implementation is available: episodes.js.
This script implements an event listener that collects all the Episodes message events and stores the timing information. Web developers access that information through the following API functions.
EPISODES.getMeasures()
EPISODES.getStarts()
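EPISODES.sendBeacon(url)
Sends the episode timing information as a beacon to the specified url.
For example,
EPISODES.sendBeacon("http://yourserver.com/beacon.gif");
requests a URL such as
http://yourserver.com/beacon.gif?ets=totaltime:531,pageready:531,abovethefold:510,backend:368,frontend:164
EPISODES.addEventListener(sType, callback, bCapture)
Registers an event listener, handling the differences between Internet Explorer's use of window.attachEvent
and other browsers which use window.addEventListener.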
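The beacon URL format shown above is simple enough to produce and consume by hand. The following is a minimal sketch of both directions, assuming the ets value is a comma-separated list of name:time pairs as in the example URL. The names buildBeaconUrl and parseBeaconUrl are hypothetical; they are not the episodes.js API, and the actual implementation of EPISODES.sendBeacon may differ.

```javascript
// Hypothetical helper: serialize measures into the ets parameter format.
function buildBeaconUrl(url, measures) {
    var pairs = [];
    for (var key in measures) {
        if (Object.prototype.hasOwnProperty.call(measures, key)) {
            pairs.push(key + ":" + measures[key]);
        }
    }
    return url + "?ets=" + pairs.join(",");
}

// Hypothetical helper for the receiving side: recover the measures
// from a beacon URL. Uses the standard URL class.
function parseBeaconUrl(beaconUrl) {
    var ets = new URL(beaconUrl).searchParams.get("ets") || "";
    var measures = {};
    ets.split(",").forEach(function (pair) {
        var idx = pair.lastIndexOf(":");
        if (idx > 0) {
            measures[pair.substring(0, idx)] = Number(pair.substring(idx + 1));
        }
    });
    return measures;
}

var beaconUrl = buildBeaconUrl("http://yourserver.com/beacon.gif", {
    totaltime: 531,
    pageready: 531,
    abovethefold: 510,
    backend: 368,
    frontend: 164
});
var received = parseBeaconUrl(beaconUrl);
// received.totaltime is 531 and received.frontend is 164
```

A report generator on the server could apply something like parseBeaconUrl to each logged beacon request before aggregating the times per episode name.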
There are other benefits of using episodes.js:
For browsers that don't support window.postMessage, a custom event implementation
(episodes-compat.js) is invoked that
supports the same code for posting and listening to Episodes events. In other words, window.postMessage("EPISODES:etc", "*")
and window.addEventListener("message", yourfunction, false) work.
See the Backwards Compatibility section for more information on this custom event implementation.
One way to capture the start time of a page request is to use an onbeforeunload handler to record the start time in a cookie,
and then to look for and parse that cookie when the requested page loads.
This technique is done automatically by episodes.js so that full page load times can be measured
when the user navigates within the same domain.
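The cookie technique can be sketched roughly as follows. This is an illustration under stated assumptions, not the episodes.js implementation: the cookie name EPISODES_START, its format, and the helper names makeStartCookie and readStartTime are all hypothetical, and posting the value as a "starttime" mark is an assumption based on the mark name used in Figure 1.

```javascript
// Hypothetical cookie name and format; episodes.js may use something different.
var START_COOKIE = "EPISODES_START";

// Build the cookie string to write just before the user leaves the page.
function makeStartCookie(epochTime) {
    return START_COOKIE + "=" + epochTime + "; path=/";
}

// Find the start time in a document.cookie-style string, or return undefined.
function readStartTime(cookieString) {
    var cookies = cookieString.split(";");
    for (var i = 0; i < cookies.length; i++) {
        var pair = cookies[i].replace(/^\s+/, "").split("=");
        if (pair[0] === START_COOKIE && /^\d+$/.test(pair[1])) {
            return Number(pair[1]);
        }
    }
    return undefined;
}

// Browser-only wiring; skipped when there is no window (for example in a test run).
if (typeof window !== "undefined" && typeof document !== "undefined") {
    // Record when the user starts navigating away from the current page.
    window.addEventListener("beforeunload", function () {
        document.cookie = makeStartCookie(Number(new Date()));
    }, false);

    // On the newly loaded page, look for a start time left by the previous page.
    var startTime = readStartTime(document.cookie);
    if (typeof startTime !== "undefined") {
        window.postMessage("EPISODES:mark:starttime:" + startTime, "*");
    }
}
```

A cookie left over from an earlier visit would produce a misleadingly long episode, so a real implementation would probably also need to expire or validate the cookie; that part is left out of this sketch.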
Episodes uses window.postMessage, but that's only supported in Internet Explorer 8, Firefox 3, Opera 9.5, and WebKit Nightlies.
Episodes's use of window.postMessage is beautiful because, for newer browsers, there is no additional JavaScript implementation to download.
For current browsers that don't support window.postMessage a custom event is used.
Custom events provide a publish and subscribe implementation in JavaScript for event types that aren't built into the browser.
A drawback of custom events is that the framework to implement the custom event must be downloaded which can degrade page performance.
However, developers who want to gather timing information about their web apps are already embedding additional JavaScript in their pages,
for example, Netflix and
Whitepages.com.
The cost of downloading the Episodes custom event implementation is comparable to these other frameworks.
The advantage is that adopting Episodes means that, as newer browsers gain market share, we move to an implementation that has no download cost.
A goal of Episodes is to make the code to instrument and gather timing information identical on both old and new browsers.
For older browsers this is done by adding a definition for window.postMessage.
Additionally, window.addEventListener and window.attachEvent are overridden to support
listening to the "message" event type.
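The idea of defining window.postMessage and intercepting "message" listeners on older browsers can be sketched as a small publish and subscribe layer. This is a simplified illustration, not the contents of episodes-compat.js: installPostMessageShim is a hypothetical name, the attachEvent override is omitted, and messages are delivered synchronously, whereas the native window.postMessage delivers them asynchronously.

```javascript
// Hypothetical sketch of a postMessage fallback for a window-like object.
// Returns true if the shim was installed, false if postMessage already exists.
function installPostMessageShim(win) {
    if (typeof win.postMessage !== "undefined") {
        return false; // native support; nothing to do
    }

    var listeners = [];
    var nativeAdd = win.addEventListener;

    // Intercept "message" listeners; pass every other event type through.
    win.addEventListener = function (type, callback, capture) {
        if (type === "message") {
            listeners.push(callback);
        } else if (nativeAdd) {
            nativeAdd.call(win, type, callback, capture);
        }
    };

    // Publish the message to every subscribed listener.
    // The targetOrigin argument is accepted for compatibility but not used.
    win.postMessage = function (message, targetOrigin) {
        var current = listeners.slice(); // ignore listeners added mid-dispatch
        for (var i = 0; i < current.length; i++) {
            current[i]({ data: message });
        }
    };
    return true;
}

// Stand-in for the window object of a browser without postMessage.
var oldBrowserWindow = {};
installPostMessageShim(oldBrowserWindow);

var receivedMessages = [];
oldBrowserWindow.addEventListener("message", function (e) {
    receivedMessages.push(e.data);
}, false);
oldBrowserWindow.postMessage("EPISODES:mark:firstbyte", "*");
// receivedMessages now holds "EPISODES:mark:firstbyte"
```

With a layer like this in place, the instrumentation and listener code from Figures 1 and 3 would not need to know whether the browser supports window.postMessage natively.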
Section 3.1.2 Episodes Instrumentation Example shows how this page is instrumented with Episodes. To see a page with both instrumentation and data collection let's look at the Episodes Example. The example page is instrumented with marks and measures, similar to that shown in Figure 1. This section highlights the additional code necessary to also collect episode timing data.
At the top of the page, immediately after the HEAD tag, this JavaScript occurs:
<script>
var t_firstbyte = Number(new Date());
</script>
<script src="http://stevesouders.com/episodes/episodes.js"></script>
<script>
window.postMessage("EPISODES:mark:firstbyte:" + t_firstbyte, "*");
</script>
Downloading episodes.js provides a default implementation for gathering Episodes and supports older browsers,
as described in the episodes.js section.
But this download could affect the page itself.
We're caught in a race condition where we want to start using Episodes immediately, but the implementation to consume the events doesn't yet exist.
The solution is to record a time measurement (t_firstbyte) and mark it after episodes.js has been downloaded.
In the example page suppose we want to beacon the Episodes timing information back to our server. To do that we attach a callback to the "message" event.
<script>
function handleEpisodeResults(event) {
    if ( "EPISODES:done" === event.data ) {
        EPISODES.sendBeacon("http://yourserver.com/beacon.gif");
    }
}
EPISODES.addEventListener("message", handleEpisodeResults, false);
</script>
We've used EPISODES.addEventListener to handle the differences between Internet Explorer's use
of window.attachEvent and other browsers which use window.addEventListener.
When the "EPISODES:done" message event fires, the beacon is sent.
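For readers unfamiliar with the two event registration models, a wrapper of this kind might look roughly like the sketch below. The function addListener is a hypothetical stand-in written for illustration; it is not the source of EPISODES.addEventListener, which may be implemented differently.

```javascript
// Hypothetical cross-browser registration helper.
// Returns the name of the mechanism used, or null if neither is available.
function addListener(target, sType, callback, bCapture) {
    if (typeof target.addEventListener === "function") {
        // Standards-based browsers
        target.addEventListener(sType, callback, bCapture);
        return "addEventListener";
    }
    if (typeof target.attachEvent !== "undefined") {
        // Internet Explorer: event names carry an "on" prefix, no capture flag
        target.attachEvent("on" + sType, callback);
        return "attachEvent";
    }
    return null;
}

// In a page this would be called as:
//   addListener(window, "message", handleEpisodeResults, false);
```

Returning the mechanism name is only a convenience for this sketch; it makes the branch taken visible when trying the function against different objects.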
To demonstrate how using events allows other applications to measure Episodes, I created the Episodes Firebug add-on. As shown in Figure 4, this add-on provides a graphical rendering of the Episodes instrumented in the page.
Figure 4. Episodes Firebug add-on.
The ways of measuring performance in Web 1.0 applications are not sufficient for Web 2.0. Efforts are already underway to address the gap. Episodes is similar to some of those projects, Jiffy for example. Episodes can be an industry standard because of its distinguishing features of using JavaScript events and an implementation that is built into the newest browsers, requiring no additional download. The goal is to evangelize the benefits to all interested parties and show how Episodes leads to a future of more accurate and more efficient performance measurements. There are four main groups who work in measuring web page performance: web developers, web metrics service providers, tool developers, and browser developers. This section describes the benefits that Episodes brings to each of them.
Today, if a web developer or web company wants to measure the load time of their Web 2.0 application, they can either instrument their page programmatically or use a web metrics service provider. The best choice is programmatic scripting to get data from real user traffic. This would entail implementing an instrumentation library in JavaScript, instrumenting their pages, collecting the beaconed data, and generating reports. With Episodes, the library is provided and Open Source code for generating reports would exist.
Web companies that want to compare their load times to a competitor's are forced to use a web metrics service provider. This requires recording scripts with the service provider's proprietary tool for the competitor's pages as well as their own pages. Maintaining these scripts is burdensome, and the scripts are prone to become out-of-sync. With Episodes, competitors' pages would already be instrumented and the service provider's test agent could record the Episodes events.
Some would argue that web companies might not want to use Episodes and allow their competitors to have visibility into their page's load times, or they might game the measurements. But the benefits to any individual company outweigh these concerns. Web companies can already measure each other's pages; it's just more costly, inefficient, and brittle. Allowing competitors to have accurate measurements of their own web pages is not a sacrifice, especially when the gain is more accurate measurements for the web company itself, as well as better integration with web metrics service providers, web tools, and browsers as described in the following sections.
Web metrics service providers, such as Keynote, Gomez, WebMetrics, and Pingdom, have solutions for measuring Web 2.0 applications, but they are proprietary solutions. These are less desirable to customers because of the high cost of implementation and the high switching cost.
Instead, these companies could modify their test agents to collect timing information from Episodes. That way, web companies that had already instrumented their pages with Episodes for their own internal measurements could sign on with a web metrics service without the hurdle of recording test scripts or instrumenting with the service provider's proprietary JavaScript library. There would be a race to be the first service that supports Episodes. Also, because programmatic scripting is more robust than recording test scripts, customers would be more likely to sustain their use of the service provider because the measurements would retain value over a longer period.
Web development tools such as Fasterfox and
YSlow report page load times based on the onload event.
Web 2.0 developers need something more powerful.
Episodes provides a framework that web tools could use to gather more detailed and more accurate timing information from Web 2.0 applications.
The Episodes Firebug add-on is an example of such a tool.
In the Web 1.0 world browser users benefit from the status bar and other visual cues giving them feedback about their current activity. This feedback is weaker for Web 2.0 applications. The status bar never says "Done" after my web-based email inbox is retrieved using Ajax, or at least not unless the web developer has done the work to explicitly implement that feedback.
Episodes provides a way for browsers to have more insight into what the web application is doing, and could use that to give the user better contextual feedback. Also, browser developers could use the episodic timing data from popular web apps to monitor their progress in making pages load faster in newer browser versions.
It's critical that these groups work together to adopt Episodes. A first step will be to gather input from industry leaders in each group. The implementation put forth here is a prototype. It needs to be hardened. An important step is getting agreement on 5-10 episode names that have a common definition: totaltime, pageready, backend, frontend, etc.
Everyone wants a faster web experience. Users want it, and web companies are working harder than ever to deliver it. The key is visibility into where web apps are spending their time so that developers can focus on what makes the biggest difference to the user. It's exciting to think about a future where performance measurements are easy to implement, accurate, and open. Episodes provides a framework to do this that works for Web 2.0 applications, makes it easy for timing information to be consumed beyond the web page itself, and can serve as an industrywide standard for measuring web page performance.