Episodes: a Framework for Measuring Web Page Load Times

Steve Souders
Google
July 2008

Abstract

Interest in web performance is growing rapidly. Web companies are trying to use speed as a distinguishing feature. At the same time, web pages have more content than ever before which makes for a slower page. Ajax helps reduce the number of roundtrips required for a web application, but today's alternatives for measuring web performance don't work well for Web 2.0 apps.

What's needed is a way to measure web page load times that works for Web 2.0, is easy for web developers to adopt and maintain, can be leveraged by web metrics service providers, generates data usable by web development tools, and provides context to browsers so they can give better feedback to users about their experience.

Episodes provides a framework to do this. It has the following key features:

  • Supports measuring Web 2.0 applications by having the timing instrumentation integrated with the application's client code.
  • Separates the instrumentation from the data collection. This reduces the work for the application developer, allows multiple services to consume and report the information, and results in a lighter weight implementation.
  • Is Open Source, gathering the best practices from across the industry without bias to any company or organization.
  • Provides a single framework that can be used by web developers, tool developers, browser developers, and web metrics service providers.

1   Web Page Performance

Interest in web performance is growing rapidly. In 2008 two new conferences on web performance and operations, Velocity and Structure 2008, both sold out. Several books on web performance and scalability have been published recently or will be published in the near future: my book High Performance Web Sites, High Performance MySQL: Optimization, Backups, Replication, and More by Baron Schwartz, Peter Zaitsev, Vadim Tkachenko, Jeremy Zawodny, Arjen Lentz, and Derek Balling, The Art of Capacity Planning by John Allspaw (expected late 2008), and Website Optimization by Andy King (expected late 2008).

In the past, web performance work has focused on the backend: databases, web servers, application frameworks, etc. Recently the focus has shifted to clientside performance. For most web pages only 20% of the end user response time is spent retrieving the page from the web server. The other 80% of the time is spent parsing the page, downloading all the resources included in the page, parsing CSS, and parsing and executing JavaScript. (See High Performance Web Sites, Chapter A by Steve Souders.) In order to optimize clientside performance, it's necessary to be able to measure how long it takes for a web page to load.

The tools and services that exist today for measuring clientside performance worked in the Web 1.0 world, but are incapable of or inadequate for measuring Web 2.0. New techniques exist, but are fragmented. Web developers often need to use multiple solutions to get the full coverage of performance measurements desired. There's a learning curve with each alternative. Switching costs to swap out one alternative for another can be high. Some alternatives are not sound, resulting in inaccurate measurements.

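Episodes is a web performance measurement framework that solves these issues. Its key features are the four listed in the Abstract: timing instrumentation integrated with the application's client code, separation of instrumentation from data collection, an Open Source implementation, and a single framework shared by web developers, tool developers, browser developers, and web metrics service providers.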

The goal is to make Episodes the industrywide solution for measuring web page load times. This is possible because Episodes has benefits for all the stakeholders. Web developers only need to learn and deploy a single framework. Tool developers and web metrics service providers get more accurate timing information by relying on instrumentation inserted by the developer of the web page. Browser developers gain insight into what's happening in the web page by relying on the context relayed by Episodes.

Most importantly, users benefit by the adoption of Episodes. They get a browser that can better inform them of the web page's status for Web 2.0 apps. Since Episodes is a lighter weight design than other instrumentation frameworks, users get faster pages. As Episodes makes it easier for web developers to shine a light on performance issues, the end result is an Internet experience that is faster for everyone.

2   Measuring Web Pages

Measuring up to the onload event is a technique that works for Web 1.0, but not in today's Web 2.0 world. The two main approaches for measuring Web 2.0 apps are recorded test scripts and programmatic scripting. Programmatic scripting is the approach used by Episodes.

2.1   Measuring Web 1.0

Several frameworks exist today for measuring Web 1.0 page load times from the client perspective. Keynote, Gomez, WebMetrics, Pingdom, and others provide services to measure response times. Developer tools such as Fasterfox and YSlow report page load times.

These services and tools use the window onload event as the "end time" for response time measurements. In the Web 1.0 world the onload event works pretty well as an indicator of the page being ready. As the adoption of DHTML and Ajax grows, the onload event is less indicative of when the page is done loading. In some cases, web developers have gone to great pains to render content at the top of the page quickly while heavier content is loaded later in the page - thus the page is ready from the user's perspective before the onload event. In other cases, especially for Ajax apps, it's not until after the onload fires that large amounts of JavaScript are downloaded and many DOM operations are performed. In these cases the onload event is too optimistic in measuring page load times. Using onload doesn't work for Web 2.0.

2.2   Measuring Web 2.0

Because Web 2.0 pages are complex and constructed dynamically, it's necessary to rely on the page developers, or someone else familiar with the page, to manually identify where to perform time measurements in the page. There are two ways to do this: recorded test scripts and programmatic scripting.

2.2.1   Recorded Test Scripts

Performance test frameworks, such as Selenium, Watir, and Keynote, provide tools for recording a web page test. These test scripts (meaning a script of steps to replay, not a JavaScript script) are replayed again and again to generate time measurements. For a Web 2.0 page, the start and end points can be instrumented to coincide with a specific DOM update or HTTP response.

This approach allows for specific and customized tests that are accurate, but they are time consuming to create. Also, the scripts tend to be rigid and thus are hard to maintain. As changes are made to the web app, the test scripts become out-of-sync. Often the web developer and test script developer are different people, so it's even more likely that application changes aren't reflected in the test script. It's challenging enough to maintain test scripts for your own sites; imagine maintaining scripts that measure your competitors' sites. From a global perspective, this approach results in a lot of redundant work. Imagine the number of Google Search scripts created by search competitors around the world. Finally, test scripts can only be run on simulated web traffic. Often, measurements from real user traffic are more insightful.

2.2.2   Programmatic Scripting

Programmatic scripting is the preferred approach for measuring Web 2.0 apps. With this approach web developers embed JavaScript calls that record time measurements within the web page. The web developer (or team) building the app is likely to also be responsible for embedding the JavaScript timing code, so it's more likely to be kept up-to-date as the code changes. Also, this approach allows for time measurements to be gathered from real user traffic.
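
As a rough illustration of the idea (this is not the Episodes API, which is described in Section 3), the sketch below shows the kind of timing calls a developer might embed in application code. The names createTimers, mark, measure, and getMeasures are hypothetical, and the clock is passed in as a function so the example can run outside a browser.

```javascript
// Minimal sketch of programmatic timing. All names here are hypothetical.
function createTimers(now) {
  var marks = {};     // mark name -> epoch time in ms
  var measures = {};  // measure name -> duration in ms
  return {
    // Remember the current time under the given name.
    mark: function (name) {
      marks[name] = now();
    },
    // Record the time elapsed since a previously recorded mark.
    measure: function (name, startMark) {
      if (Object.prototype.hasOwnProperty.call(marks, startMark)) {
        measures[name] = now() - marks[startMark];
      }
    },
    getMeasures: function () {
      return measures;
    }
  };
}

// Usage with a fake clock. In a page you would pass
// function () { return Number(new Date()); } instead.
var fakeTime = 1000;
var timers = createTimers(function () { return fakeTime; });
timers.mark("start");
fakeTime = 1250;                   // pretend 250 ms of work happened
timers.measure("inbox", "start");  // duration of the "inbox" step
```

Because the calls live next to the application code they time, they are more likely to be updated when that code changes.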

There are drawbacks to the programmatic scripting approach. It needs to be implemented. (Although the recent release of Jiffy provides an Open Source alternative.) The switching cost is high. Actually embedding the framework may increase the page size to the point that it has a detrimental effect on performance. And programmatic scripting isn't a viable solution for measuring competitors. All of these drawbacks are addressed by Episodes.

3   Episodes

A feature of Episodes is that the work to instrument a page with timers can be completely separate from the code to collect and report the timing information. This section starts off by explaining how to instrument pages using window.postMessage. That is followed by examples of how the episodic times can be collected and reported. Finally, the need for backwards compatible code to support older browsers is discussed, and a prototype of episodes.js that could be shared by all web pages is presented.

3.1   Instrumenting Episodes

Episode timers are inserted in the page using JavaScript's window.postMessage (see Mozilla Developer Center, postMessage Method for IE8, and John Resig's post). window.postMessage is supported in Internet Explorer 8, Firefox 3, Opera 9.5, and WebKit Nightlies. (The implementation of Episodes for older browsers is discussed in the section on Backwards Compatibility.)

There are some key benefits of choosing window.postMessage. It's built into the browser. No additional JavaScript function definitions are required, so the increase in page size is small. window.postMessage uses events, so multiple event listeners can consume the timing measurements.
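
The last point can be illustrated with a small sketch. It assumes an environment that provides the standard EventTarget and Event constructors (current browsers and recent Node.js), and it uses a stand-in object instead of the real window. The helper dispatchMessage is hypothetical and delivers the event synchronously, whereas a real window.postMessage call delivers the "message" event asynchronously.

```javascript
// Stand-in for the browser window so the sketch can run outside a page.
var fakeWindow = new EventTarget();

// Hypothetical helper: dispatch an event shaped like a "message" event,
// where event.data carries the posted string.
function dispatchMessage(target, data) {
  var event = new Event("message");
  event.data = data;
  target.dispatchEvent(event);
}

// Two independent consumers, e.g. the page's own collector and a developer tool.
var seenByPage = [];
var seenByTool = [];
fakeWindow.addEventListener("message", function (e) { seenByPage.push(e.data); });
fakeWindow.addEventListener("message", function (e) { seenByTool.push(e.data); });

// One posted message reaches both listeners.
dispatchMessage(fakeWindow, "EPISODES:mark:firstbyte");
```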

3.1.1   Episodes Message Syntax

window.postMessage takes two parameters. The first parameter is a string (the message). The second parameter is a target origin. In Episodes the target origin is always "*" to allow collectors other than the page itself (e.g., browser extensions).

The message parameter is an opaque string. This is very flexible. For the purposes of Episodes a message syntax is defined to allow instrumenters and collectors to work together. The syntax is:

    "EPISODES:action[:arguments]"

The following actions and associated arguments are defined:

"EPISODES:init"
Initialize the Episodes timers. "init" is only used by Web 2.0 applications where multiple "page views" are performed within one JavaScript state. For example, a web email application would record the Episodes when the application loads, but the developer would need to call "EPISODES:init" between subsequent user requests such as display mail folder, compose message, etc.
"EPISODES:mark:markName[:markTime]"
"mark" records an epoch time associated with the name markName. The epoch time recorded is markTime if specified; otherwise it's the current epoch time.
"EPISODES:measure:episodeName[:startMarkName|startEpochTime[:endMarkName|endEpochTime]]"
"measure" records the length of the episode called episodeName. The "start time" of the episode is the epoch time associated with the mark named startMarkName or the explicit epoch time value startEpochTime. If neither of these parameters is provided then startMarkName is assumed to be the same as episodeName. By default, the "end time" of the episode is the epoch time at the point the message is sent (i.e., "now"). Optionally, the end time can be specified. If endMarkName is provided the end time will be the epoch time associated with the mark called endMarkName. If endEpochTime is provided that epoch time value is used as the end time.
"EPISODES:done"
"done" indicates that the episodes are finished. This is most useful as an indicator to data collectors that they can gather the time measurements and report them.

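The message syntax above can be produced by a small helper. This is a sketch only: episodeMessage is a hypothetical name and is not part of Episodes, and the epoch time used is an arbitrary example value. Because ":" is the delimiter, the sketch assumes mark and episode names do not themselves contain a colon.

```javascript
// Hypothetical helper that builds a message string in the Episodes syntax:
// "EPISODES:action[:arguments]".
function episodeMessage(action /*, ...optional arguments */) {
  var parts = ["EPISODES", action];
  for (var i = 1; i < arguments.length; i++) {
    if (arguments[i] === undefined) break;  // stop at the first omitted argument
    parts.push(String(arguments[i]));
  }
  return parts.join(":");
}

var initMsg = episodeMessage("init");
var markMsg = episodeMessage("mark", "firstbyte", 1215000000000);  // explicit markTime
var measureMsg = episodeMessage("measure", "frontend", "firstbyte");
var doneMsg = episodeMessage("done");

// In a browser each string would be sent with the "*" target origin, e.g.:
//   window.postMessage(measureMsg, "*");
```
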
3.1.2   Episodes Instrumentation Example

Figure 1 shows an example of a page instrumented with Episodes. These are, in fact, the Episodes that exist in the HTML document you're reading now. You can View the document source to see the actual JavaScript.

  <html>

  <head>
  <script>
  // Define a no-op so the calls below don't throw in browsers without postMessage.
  if ("undefined"===typeof(window.postMessage)) window.postMessage=function(){};
  window.postMessage("EPISODES:mark:firstbyte", "*");
  window.postMessage("EPISODES:measure:backend:starttime", "*");

  // Called on the window's load event: record the remaining episodes and finish.
  function doPageReady() {
    window.postMessage("EPISODES:measure:frontend:firstbyte", "*");
    window.postMessage("EPISODES:measure:pageready:starttime", "*");
    window.postMessage("EPISODES:measure:totaltime:starttime", "*");
    window.postMessage("EPISODES:done", "*");
  }

  window.addEventListener("load", doPageReady, false);
  </script>

  [...]

  </head>
  <body>

  [...]

  <img src="episodes.gif" width=544 height=144
       onload='window.postMessage("EPISODES:measure:abovethefold:starttime", "*")'>

  [...]

  </body>
  </html>

Figure 1. Example of Episodes instrumentation.

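Let's walk through how this page is instrumented. The script in the head first defines window.postMessage as a no-op if the browser lacks it. It then marks "firstbyte" and measures "backend" as the time from the "starttime" mark to now. When the window's load event fires, doPageReady measures "frontend" from the "firstbyte" mark, measures "pageready" and "totaltime" from the "starttime" mark, and sends "EPISODES:done". The image's onload handler measures "abovethefold" from the "starttime" mark.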

A visual representation of this page's episodes is shown in Figure 2. You can see how "totaltime" and "pageready" are identical for this page, but if a page did lazy loading or prefetching these times would differ. "abovethefold" is slightly less than "totaltime" indicating that after the image is downloaded the browser still takes approximately 21ms to finish the page. The "backend" time is greater than "frontend" because this page has so few resources.

Figure 2. Visual representation of this page's episodes.

3.2   Collecting Episodes

Episodes decouples the insertion of timers from the collection of the timing information. The previous section explained how to insert episode timers in a page. This section describes an implementation of an Episodes event listener and provides examples of different types of listener applications.

3.2.1   Episodes Event Listener

Episodes is implemented using JavaScript events. Specifically, it uses window.postMessage. Applications that want to consume these events do so using attachEvent for Internet Explorer or addEventListener in other browsers. (Also refer to ppk's article Advanced event registration models.) An example of an Episodes event listener is shown in Figure 3.

  var marks = {};     // mark name -> epoch time (ms)
  var measures = {};  // episode name -> duration (ms)

  function handleEpisodeMessage(e) {
    var message = e.data;
    var aParts = message.split(':');
    if ( "EPISODES" === aParts[0] ) {
      var action = aParts[1];

      if ( "init" === action ) {
        marks = {};
        measures = {};
      }

      else if ( "mark" === action ) {
        var markName = aParts[2];
        // Use the explicit markTime if given; otherwise the current time.
        marks[markName] = aParts[3] ? parseInt(aParts[3], 10) : Number(new Date());
      }

      else if ( "measure" === action ) {
        var episodeName = aParts[2];

        // The start mark defaults to the mark named after the episode.
        var startMarkName = 
          ( "undefined" != typeof(aParts[3]) ? aParts[3] : episodeName );

        // Resolve the start: a known mark, else an explicit epoch time.
        var startEpochTime = 
          ( "undefined" != typeof(marks[startMarkName]) ? marks[startMarkName] : 
            ( /^\d+$/.test(startMarkName) ? parseInt(startMarkName, 10) : 
              undefined ) );

        // Resolve the end: now, a known mark, or an explicit epoch time.
        var endEpochTime = 
          ( "undefined" === typeof(aParts[4]) ? Number(new Date()) : 
            ( "undefined" != typeof(marks[aParts[4]]) ? marks[aParts[4]] : 
              parseInt(aParts[4], 10) ) );

        if ( startEpochTime ) {
          measures[episodeName] = parseInt(endEpochTime - startEpochTime, 10);
        }
      }

      else if ( "done" === action ) {
        var sTimes = "";
        for ( var key in measures ) {
          sTimes += "," + key + ":" + measures[key];
        }

        if ( sTimes ) {
          sTimes = sTimes.substring(1); // strip leading ","
          var img = new Image();
          img.src = "http://yourserver.com/beacon.gif" + "?ets=" + sTimes;
        }
      }

    }
  }

  window.addEventListener("message", handleEpisodeMessage, false);

Figure 3. Example of Episodes event listener.

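Here are key pieces of the Episodes event listener implementation shown in Figure 3. Each message is split on ":" and ignored unless its first part is "EPISODES". "init" clears the stored marks and measures. "mark" stores the supplied epoch time, or the current time if none is given. "measure" resolves the start and end times from named marks or explicit epoch values and stores the difference. "done" joins the measures into a comma-separated string and sends it as the ets parameter of an image beacon.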
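
The defaulting rules of "measure" are easier to see in isolation. The following is a sketch, not the listener from Figure 3: resolveTime and resolveMeasure are hypothetical names, the current time is passed in as a parameter so the results are deterministic, and the mark values are made up for illustration.

```javascript
// Resolve a token to an epoch time: a known mark name first,
// otherwise an explicit epoch time made up of digits only.
function resolveTime(marks, token) {
  if (Object.prototype.hasOwnProperty.call(marks, token)) return Number(marks[token]);
  return /^\d+$/.test(token) ? Number(token) : undefined;
}

// Turn a "measure" message into { name, duration }, or undefined if
// the start or end time cannot be resolved.
function resolveMeasure(marks, message, now) {
  var parts = message.split(":");  // ["EPISODES", "measure", name, start?, end?]
  var name = parts[2];
  // The start defaults to the mark that has the same name as the episode.
  var start = resolveTime(marks, parts[3] === undefined ? name : parts[3]);
  // The end defaults to "now".
  var end = parts[4] === undefined ? now : resolveTime(marks, parts[4]);
  if (start === undefined || end === undefined) return undefined;
  return { name: name, duration: end - start };
}

var exampleMarks = { starttime: 1000, firstbyte: 1368 };
var backend  = resolveMeasure(exampleMarks, "EPISODES:measure:backend:starttime:firstbyte", 0);
var frontend = resolveMeasure(exampleMarks, "EPISODES:measure:frontend:firstbyte", 1532);
var sameName = resolveMeasure(exampleMarks, "EPISODES:measure:firstbyte", 1400);
var explicit = resolveMeasure(exampleMarks, "EPISODES:measure:custom:1200:1500", 0);
```

With these made-up marks, backend uses two named marks, frontend ends at the supplied "now", sameName falls back to the mark named after the episode, and explicit uses two epoch values directly.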

3.2.2   episodes.js

Collectors of episodic timing data will take different actions when reporting the data, but the code to gather the data will be very similar. To facilitate the adoption of Episodes a common collection implementation is available: episodes.js.

This script implements an event listener that collects all the Episodes message events and stores the timing information. Web developers access that information through the following API functions.

EPISODES.getMeasures()
Returns a hash of the episodes and their corresponding durations.
EPISODES.getStarts()
Returns a hash of the episodes and their corresponding start times as an epoch time. This is useful for collectors that want to know the timing of one episode relative to another, for example, for drawing a graphical visualization.
EPISODES.sendBeacon(url)
Creates a querystring containing all the episodic timing information and sends that query string to the specified url. For example,
  EPISODES.sendBeacon("http://yourserver.com/beacon.gif");
would generate a request such as:
  http://yourserver.com/beacon.gif?ets=totaltime:531,pageready:531,abovethefold:510,backend:368,frontend:164
EPISODES.addEventListener(sType, callback, bCapture)
For ease, this function is provided to handle the differences between Internet Explorer which uses window.attachEvent and other browsers which use window.addEventListener.
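
To make the beacon format concrete, here is a sketch of how such a query string could be built and decoded. It is not the episodes.js source: buildBeaconUrl and parseEts are hypothetical names, and the sketch assumes episode names need no URL encoding.

```javascript
// Build a beacon URL of the form url?ets=name:duration,name:duration,...
function buildBeaconUrl(url, measures) {
  var pairs = [];
  for (var name in measures) {
    if (Object.prototype.hasOwnProperty.call(measures, name)) {
      pairs.push(name + ":" + measures[name]);
    }
  }
  return url + "?ets=" + pairs.join(",");
}

// Server-side counterpart: turn the ets value back into a hash of durations.
function parseEts(ets) {
  var result = {};
  ets.split(",").forEach(function (pair) {
    var idx = pair.lastIndexOf(":");
    result[pair.substring(0, idx)] = parseInt(pair.substring(idx + 1), 10);
  });
  return result;
}

var beaconUrl = buildBeaconUrl("http://yourserver.com/beacon.gif", {
  totaltime: 531, pageready: 531, abovethefold: 510, backend: 368, frontend: 164
});
var decoded = parseEts(beaconUrl.split("?ets=")[1]);

// In a browser the beacon could then be sent with an image request:
//   new Image().src = beaconUrl;
```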

There are other benefits of using episodes.js:

3.2.3   Backwards Compatibility

Episodes uses window.postMessage, but that's only supported in Internet Explorer 8, Firefox 3, Opera 9.5, and WebKit Nightlies. Episodes' use of window.postMessage is beautiful because, for newer browsers, there is no additional JavaScript implementation to download.

For current browsers that don't support window.postMessage a custom event is used. Custom events provide a publish and subscribe implementation in JavaScript for event types that aren't built into the browser. A drawback of custom events is that the framework to implement the custom event must be downloaded, which can degrade page performance. However, developers who want to gather timing information about their web apps are already embedding additional JavaScript in their pages, for example, Netflix and Whitepages.com. The cost of downloading the Episodes custom event implementation is comparable to these other frameworks. The advantage is that adopting Episodes means that, as newer browsers gain market share, we move to an implementation that has no download cost.

A goal of Episodes is to make the code to instrument and gather timing information identical on both old and new browsers. For older browsers this is done by adding a definition for window.postMessage. Additionally, window.addEventListener and window.attachEvent are overridden to support listening to the "message" event type.
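
One way such a fallback could work is sketched below. This is an assumption-laden illustration rather than the episodes.js implementation: installMessageFallback is a hypothetical name, only addEventListener is wrapped (attachEvent would need the same treatment), and messages are delivered synchronously, unlike the native window.postMessage.

```javascript
// Add a publish/subscribe version of postMessage to a window-like object
// that lacks native support. Returns true if the fallback was installed.
function installMessageFallback(win) {
  if (typeof win.postMessage === "function") return false;  // native support

  var listeners = [];                  // subscribers to the "message" type
  var nativeAdd = win.addEventListener;

  // Intercept "message" registrations; pass everything else through.
  win.addEventListener = function (type, callback, capture) {
    if (type === "message") {
      listeners.push(callback);
    } else if (nativeAdd) {
      nativeAdd.call(win, type, callback, capture);
    }
  };

  // Publish the message to every subscriber as an event-like object.
  win.postMessage = function (message, targetOrigin) {
    var event = { data: message };
    for (var i = 0; i < listeners.length; i++) {
      listeners[i](event);
    }
  };
  return true;
}

// Usage with a plain object standing in for an older browser's window.
var oldWindow = {};
var received = [];
var installed = installMessageFallback(oldWindow);
oldWindow.addEventListener("message", function (e) { received.push(e.data); });
oldWindow.postMessage("EPISODES:done", "*");
```

With the fallback in place, instrumentation and collection code can make the same postMessage and addEventListener calls in old and new browsers.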

3.2.4   Episodes Collector Example

Section 3.1.2 Episodes Instrumentation Example shows how this page is instrumented with Episodes. To see a page with both instrumentation and data collection let's look at the Episodes Example. The example page is instrumented with marks and measures, similar to that shown in Figure 1. This section highlights the additional code necessary to also collect episode timing data.

At the top of the page, immediately after the HEAD tag, this JavaScript occurs:

  <script>
  var t_firstbyte = Number(new Date());
  </script>

  <script src="http://stevesouders.com/episodes/episodes.js"></script>

  <script>
  window.postMessage("EPISODES:mark:firstbyte:" + t_firstbyte, "*");
  </script>

Downloading episodes.js provides a default implementation for gathering Episodes and supports older browsers, as described in the episodes.js section. But this download could affect the page itself. We're caught in a race condition where we want to start using Episodes immediately, but the implementation to consume the events doesn't yet exist. The solution is to record a time measurement (t_firstbyte) before the download and mark it after episodes.js has been downloaded.

In the example page suppose we want to beacon the Episodes timing information back to our server. To do that we attach a callback to the "message" event.

  <script>
  function handleEpisodeResults(event) {
    if ( "EPISODES:done" === event.data ) {
      EPISODES.sendBeacon("http://yourserver.com/beacon.gif");    
    }
  }
  EPISODES.addEventListener("message", handleEpisodeResults, false);
  </script>

We've used EPISODES.addEventListener to handle the differences between Internet Explorer's use of window.attachEvent and other browsers which use window.addEventListener. When the "EPISODES:done" message event fires, the beacon is sent.

3.2.5   Episodes Firebug Add-on

To demonstrate how using events allows other applications to measure Episodes, I created the Episodes Firebug add-on. As shown in Figure 4, this add-on provides a graphical rendering of the Episodes instrumented in the page.

Figure 4. Episodes Firebug add-on.

4   Episodes as an Industry Standard

The ways of measuring performance in Web 1.0 applications are not sufficient for Web 2.0. Efforts are already underway to address the gap. Episodes is similar to some of those projects, Jiffy for example. Episodes can be an industry standard because of its distinguishing features of using JavaScript events and an implementation that is built into the newest browsers, requiring no additional download. The goal is to evangelize the benefits to all interested parties and show how Episodes leads to a future of more accurate and more efficient performance measurements. There are four main groups who work in measuring web page performance: web developers, web metrics service providers, tool developers, and browser developers. This section describes the benefits that Episodes brings to each of them.

4.1   Benefits for Web Developers

Today, if a web developer or web company wants to measure the load time of their Web 2.0 application, they can either instrument their page programmatically or use a web metrics service provider. The best choice is programmatic scripting to get data from real user traffic. This would entail implementing an instrumentation library in JavaScript, instrumenting their pages, collecting the beaconed data, and generating reports. With Episodes, the library is provided and Open Source code for generating reports would exist.

Web companies that want to compare their load times to a competitor's are forced to use a web metrics service provider. This requires recording scripts with the service provider's proprietary tool for the competitor's pages as well as their own pages. Maintaining these scripts is burdensome, and the scripts are prone to become out-of-sync. With Episodes, competitors' pages would already be instrumented and the service provider's test agent could record the Episodes events.

Some would argue that web companies might not want to use Episodes and allow their competitors to have visibility into their page's load times, or they might game the measurements. But the benefits to any individual company outweigh these concerns. Already web companies can measure each other's pages; it's just more costly, inefficient, and brittle. Allowing competitors to have accurate measurements of a company's web pages is not a sacrifice, especially when the gain is more accurate measurements for the web company itself, as well as better integration with web metrics service providers, web tools, and browsers as described in the following sections.

4.2   Benefits for Web Metrics Service Providers

Web metrics service providers, such as Keynote, Gomez, WebMetrics, and Pingdom, have solutions for measuring Web 2.0 applications, but they are proprietary solutions. These are less desirable to customers because of the high cost of implementation and the high switching cost.

Instead, these companies could modify their test agents to collect timing information from Episodes. That way, web companies that had already instrumented their pages with Episodes for their own internal measurements could sign on with a web metrics service without the hurdle of recording test scripts or instrumenting with the service provider's proprietary JavaScript library. There would be a race to be the first service that supports Episodes. Also, because programmatic scripting is more robust than recording test scripts, customers would be more likely to sustain their use of the service provider because the measurements would retain value over a longer period.

4.3   Benefits for Web Tools

Web development tools such as Fasterfox and YSlow report page load times based on the onload event. Web 2.0 developers need something more powerful. Episodes provides a framework that web tools could use to gather more detailed and more accurate timing information from Web 2.0 applications. The Episodes Firebug add-on is an example of such a tool.

4.4   Benefits for Browsers

In the Web 1.0 world browser users benefit from the status bar and other visual cues giving them feedback about their current activity. This feedback is weaker for Web 2.0 applications. The status bar never says "Done" after my web-based email inbox is retrieved using Ajax, or at least not unless the web developer has done the work to explicitly implement that feedback.

Episodes provides a way for browsers to have more insight into what the web application is doing, which they could use to give the user better contextual feedback. Also, browser developers could use the episodic timing data from popular web apps to monitor their progress in making pages load faster in newer browser versions.

4.5   Next Steps

It's critical that these groups work together to adopt Episodes. A first step will be to gather input from industry leaders in each group. The implementation put forth here is a prototype. It needs to be hardened. An important step is getting agreement on 5-10 episode names that have a common definition: totaltime, pageready, backend, frontend, etc.

5   Conclusion

Everyone wants a faster web experience. Users want it, and web companies are working harder than ever to deliver it. The key is visibility into where web apps are spending their time so that developers can focus on what makes the biggest difference to the user. It's exciting to think about a future where performance measurements are easy to implement, accurate, and open. Episodes provides a framework to do this that works for Web 2.0 applications, makes it easy for timing information to be consumed beyond the web page itself, and can serve as an industrywide standard for measuring web page performance.