Browser script loading roundup

February 7, 2010 12:12 am | 8 Comments

How are browsers doing when it comes to parallel script loading?

Back in the days of IE7 and Firefox 2.0, no browser loaded scripts in parallel with other resources. Instead, these older browsers would block all subsequent resource requests until the script was received, parsed, and executed. Here’s how the HTTP requests look when this blocking occurs in older browsers:

The test page that generated this waterfall chart has six HTTP requests:

  1. the HTML document
  2. the 1st script – 2 seconds to download, 2 seconds to execute
  3. the 2nd script – 2 seconds to download, 2 seconds to execute
  4. an image – 1 second to download
  5. a stylesheet – 1 second to download
  6. an iframe – 1 second to download

The figure above shows how the scripts block each other and block the image, stylesheet, and iframe, as well. The image, stylesheet, and iframe download in parallel with each other, but not until the scripts are finished downloading sequentially.

The likely reason scripts were downloaded sequentially in older browsers was to preserve execution order. This is critical when code in the 2nd script depends on symbols defined in the 1st script. Preserving execution order avoids undefined symbol errors. But the missed opportunity is obvious – while the browser is downloading the first script and guaranteeing to execute it first, it could be downloading the other four resources in parallel.
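
To make the ordering constraint concrete, here's a minimal sketch (the file and function names are made up for illustration):

    // first.js (hypothetical): defines a function that later scripts rely on
    function renderSidebar(items) {
      // ... build the sidebar DOM from the array of items ...
    }

    // second.js (hypothetical): assumes first.js has already executed
    renderSidebar(['Home', 'About', 'Contact']);
    // If second.js ran before first.js, this call would throw
    // "renderSidebar is not defined" - which is why browsers preserve order.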

Thankfully, newer browsers now load scripts in parallel!

This is a big win for today’s web apps that often contain 100K+ of JavaScript split across multiple files. Loading the same test page in IE8, Firefox 3.6, Chrome 4, and Safari 4 produces an HTTP waterfall chart like this:

Things look a lot better, but not as good as they should be. In this case, IE8 loads the two scripts and stylesheet in parallel, but the image and iframe are blocked. All of the newer browsers have similar limitations in the extent to which they load scripts in parallel with other types of resources. This table from Browserscope shows where we are and the progress made to get to this point. The recently added “Compare” button in Browserscope made it easy to generate this historical view.

While downloading scripts, IE8 still blocks on images and iframes. Chrome 4, Firefox 3.6, and Safari 4 block on iframes. Opera 10.10 blocks on all resource types. I’m confident parallel script loading will continue to improve based on the great progress made in the last batch of browsers. Let’s keep our eyes on the next browsers to see if things improve even more.

Page Speed 1.6 Beta – new rules, native library

February 1, 2010 9:48 pm | 8 Comments

Page Speed 1.6 Beta was released today. There are a few big changes, but the most important fix is compatibility with Firefox 3.6. If you’re running the latest version of Firefox, visit the download page to get Page Speed 1.6. Phew!

I wanted to highlight some of the new features mentioned in the 1.6 release notes: new rules and native library.

Three new rules were added as part of Page Speed 1.6:

  • Specify a character set early – If you don’t specify a character set for your web pages, or declare it too far down in the page, the browser could parse it incorrectly. You can specify a character set using the META tag or in the Content-Type response header. Returning the charset in the Content-Type header ensures the browser sees it early. (See this Zoompf post for more information, and the sketch that follows this list.)
  • Minify HTML – Top performing web sites are already on top of this, right? Analyzing the Alexa U.S. top 10 shows an average savings of 8% if they minified their HTML. You can easily check your site with this new rule, and even save the optimized version.
  • Minimize Request Size – Okay, this is cool and shows how Google tries to squeeze out every last drop of performance. This rule checks whether the total size of the request headers exceeds one packet (~1500 bytes). Requiring an extra roundtrip just to submit the request hurts performance, especially for users with high latency.
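
To illustrate the first rule, here's a minimal sketch of returning the charset in the Content-Type response header so the browser sees it before any markup arrives. It assumes a Node.js server purely for illustration; the same header can be set in Apache, PHP, or any other server.

    // Hypothetical Node.js example: declare the charset in the Content-Type
    // response header so the browser never has to guess or re-parse the page.
    var http = require('http');
    http.createServer(function (req, res) {
      res.writeHead(200, { 'Content-Type': 'text/html; charset=UTF-8' });
      res.end('<html><head><title>charset test</title></head><body>Hello</body></html>');
    }).listen(8080);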

The other big feature I wanted to highlight first came out in Page Speed 1.5 but didn’t get much attention – the Page Speed C++ Native Library. It probably didn’t get much attention because it’s one of those changes that, if done correctly, no one notices. The work behind the native library involves porting the rules from JavaScript to C++. Why bother? Here’s what the release notes say:

This should speed up scoring, as well as allow rules to be run in programs other than just the Page Speed Firefox extension.

Making Page Speed run faster is great, but the idea of implementing the performance logic in a C++ library so the rules can be run in other programs is very cool. And where have we seen this recently? In the Site Performance section recently added to Webmaster Tools. Now we have a server-side tool that produces the same recommendations found from running the Page Speed add-on. Here are the rules that have been ported to the native library:

added in 1.5:

  • Combine external JavaScript
  • Combine external CSS
  • Enable gzip compression
  • Optimize images
  • Minimize redirects
  • Minimize DNS lookups
  • Avoid bad requests
  • Serve resources from a consistent URL

added in 1.6:

  • Specify a character set early
  • Minify HTML
  • Minimize request size
  • Put CSS in the document head
  • Minify CSS
  • Optimize the order of styles and scripts
  • Serve scaled images
  • Specify image dimensions

Webmaster Tools Site Performance today shows recommendations based on the rules in the 1.5 native library. Now that more rules have been added in 1.6, webmasters can expect to see those recommendations in the near future. But this integration shouldn’t stop with Webmaster Tools. I’d love to see other tools and services integrate the native library. If you’re interested in using it, check out the page-speed project on Google Code and contact the page-speed-discuss Google Group.

Speed Tracer – visibility into the browser

December 10, 2009 7:46 am | 9 Comments

Is it just me, or does anyone else think Google’s on fire lately, lighting up the world of web performance? Quick review of news from the past two weeks:

Speed Tracer was my highlight from last night’s Google Campfire One. The event celebrated the release of GWT 2.0. Performance and “faster” were emphasized again and again throughout the evening’s presentations (I love that). GWT’s new code splitting capabilities are great for performance, but Speed Tracer easily wowed the audience – including me. In this post, I’ll describe what I like about Speed Tracer, what I hope to see added next, and then I’ll step back and talk about the state of performance profilers.

Getting started with Speed Tracer

Some quick notes about Speed Tracer:

  • It’s a Chrome extension, so it only runs in Chrome. (Chrome extension support is yet another announcement from this week.)
  • It’s written in GWT 2.0.
  • It works on all web sites, even sites that don’t use GWT.

The Speed Tracer getting started page provides the details for installation. You have to be on the Chrome dev channel. Installing Speed Tracer adds a green stopwatch to the toolbar. Clicking on the icon starts Speed Tracer in a separate Chrome window. As you surf sites in the original window, the performance information is shown in the Speed Tracer window.

Beautiful visibility

When it comes to optimizing performance, developers have long been working in the dark. Without the ability to measure JavaScript execution, page layout, reflows, and HTML parsing, it’s not possible to optimize the pain points of today’s web apps. Speed Tracer gives developers visibility into these parts of page loading via the Sluggishness view, as shown here. (Click on the figure to see a full screen view.) Not only is this kind of visibility great, but the display is just, well, beautiful. Good UI and dev tools don’t often intersect, but when they do it makes development that much easier and more enjoyable.

Speed Tracer also has a Network view, with the requisite waterfall chart of HTTP requests. Performance hints are built into the tool, flagging issues such as bad cache headers, exceedingly long responses, Mozilla cache hash collision, too many reflows, and uncompressed responses. Speed Tracer also supports saving and reloading the profiled information, which is extremely useful when working on bugs or analyzing performance with other team members.

Feature requests

I’m definitely going to be using Speed Tracer. For a first version, it’s extremely feature rich and robust. There are a few enhancements that will make it even stronger:

  • overall pie chart – The “breakdown by time” for phases like script evaluation and layout is available for segments within a page load. As a starting point, I’d like to see that breakdown for the entire page. The per-segment detail is great when drilling down, but overall stats would give developers a clue where to focus most of their attention.
  • network timing – Similar to the issues I discovered in Firebug Net Panel, long-executing JavaScript in the main page blocks the network monitor from accurately measuring the duration of HTTP requests. This will likely require changes to WebKit to record event times in the events themselves, as was done in the fix for Firefox.
  • .HAR support – Being able to save Speed Tracer’s data to file and share it is great. Recently, Firebug, HttpWatch, and DebugBar have all launched support for the HTTP Archive file format I helped create. The format is extensible, so I hope to see Speed Tracer support the .HAR file format soon. Being able to share performance information across tools and browsers is a necessary next step. That’s a good segue…

Developers need more

Three years ago, there was only one tool for profiling web pages: Firebug. Developers love working in Firefox, but sometimes you just have to profile in Internet Explorer. Luckily, over the last year we’ve seen some good profilers come out for IE, including MSFast, AOL Pagetest, WebPagetest.org, and dynaTrace Ajax Edition. DynaTrace’s tool is the most recent addition, and has great visibility similar to Speed Tracer, as well as JavaScript debugging capabilities. There have been great enhancements to Web Inspector, and the Chrome team has built on top of that, adding timeline and memory profiling to Chrome. And now Speed Tracer is out and bubbling to the top of the heap.

The obvious question is:

Which tool should a developer choose?

But the more important question is:

Why should a developer have to choose?

There are eight performance profilers listed here. None of them work in more than a single browser. I realize web developers are exceedingly intelligent and hardworking, but no one enjoys having to use two different tools for the same task. Yet that’s exactly what developers are being asked to do. To be a good developer, you have to profile your web site in multiple browsers. By definition, that means you have to install, learn, and update multiple tools. In addition, there are numerous quirks to keep in mind when going from one tool to another, and the features offered are not consistent across tools. It’s a real challenge to verify that your web app performs well across the major browsers. When pressed, the rock star web developers I ask admit they only use one or two profilers – it’s just too hard to stay on top of a separate tool for each browser.

This week at Add-on-Con, Doug Crockford’s closing keynote is about the Future of the Web Browser. He’s assembled a panel of representatives from Chrome, Opera, Firefox, and IE. (Safari declined to attend.) My hope is they’ll discuss the need for a cross-browser extension model. There’s been progress in building protocols to support remote debugging: WebDebugProtocol and Crossfire in Firefox, Scope in Opera, and ChromeDevTools in Chrome. My hope for 2010 is that we see cross-browser convergence on standards for extensions and remote debugging, so that developers will have a slightly easier path for ensuring their apps are high performance on all browsers.

Firebug Net Panel: more accurate timing

November 3, 2009 3:28 pm | 9 Comments

There’s a lot of work that transpires before I can recommend a performance tool. I have to do a large amount of testing to verify the tool’s accuracy, and frequently (more often than not) that testing reveals inaccuracies.

Like many web developers, I love Firebug and have been using it since it first came out. Firebug’s Net Panel, thanks to Jan (“Honza”) Odvarko, has seen huge improvements over the last year or so: customized columns, avoiding confusion between real requests and cache reads, a new (more colorful!) UI, and the recent support for exporting data.

Until now, Net Panel suffered from an accuracy problem: because Net Panel reads network events in the same JavaScript thread as the main page, it’s possible for network events to be blocked, resulting in inaccurate time measurements. Cuzillion is helpful here for creating a test case. This example has an image that takes 1 second to download, followed by an inline script that takes 5 seconds to execute, and finally another 1 second image. Even though the first image downloads in only 1 second, its “done” network event is blocked for 5 seconds while the inline script executes. In Firebug 1.4’s Net Panel, this image incorrectly appears to take 5 seconds to download, instead of just 1 second:
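
The 5-second inline script in that test case boils down to a busy-wait. Here's a rough sketch of the idea (not Cuzillion's actual code):

    // Busy-wait for 5 seconds on the main JavaScript thread. Network events
    // that arrive in the meantime can't be processed until the loop finishes,
    // which is what skews Net Panel's measurement of the first image.
    var start = new Date().getTime();
    while (new Date().getTime() - start < 5000) {
      // block the single thread shared with Net Panel's event handling
    }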

Honza has come through again, delivering a fix to this problem in Firebug 1.5 (currently in beta as firebug-1.5X.0b1 which requires Firefox 3.6 beta). The fix included help from the Firefox team to add the actual time to each network event. The results are clearly more accurate:

A few other nice features to point out: Firebug Net Panel is the only packet sniffer I’m aware of that displays the DOMContentLoaded and onload events (the blue and red vertical lines). Firebug 1.5 Net Panel has multiple columns available, and the ability to customize which columns you want to display:

With these new features and improved timing accuracy, Firebug Net Panel is a great choice for analyzing HTTP traffic in your web pages. If you’re not subscribed to Honza’s blog, I recommend you sign up. He’s always working on something new that’s helpful to web developers and especially to Firebug users.

Note 1: Remember, you need both Firefox 3.6 beta and firebug-1.5X.0b1 to see the new Net Panel.

Note 2: This is being sent from Malmö, Sweden where I’m attending Øredev.

Firefox 3.5 at the top

June 30, 2009 8:23 am | 21 Comments

The web world is abuzz today with the release of Firefox 3.5. On the launch page, Mozilla touts the results of running SunSpider. Over on UA Profiler, I’ve developed a different set of tests that count the number of critical performance features browsers do, or don’t, have. Currently, there are 11 traits that are measured. Firefox 3.5 scores higher than any other browser with 10 out of 11 of the performance features browsers need to create a fast user experience.

Firefox 3.5 is a significant improvement over Firefox 3.0, climbing from 7/11 to 10/11 of these performance traits. Among the major browsers, Firefox 3.5 is followed by Chrome 2 (9/11), Safari 4 (8/11), IE 8 (7/11), and Opera 10 (6/11). Unfortunately, IE 6 and 7 have only 4 out of these 11 performance features, a sad state of affairs for today’s web developers and users.

The performance traits measured by UA Profiler include the number of connections per hostname, the maximum number of connections, parallel loading of scripts and stylesheets, proper caching of resources including redirects, the LINK PREFETCH attribute, and support for data: URLs. When I started UA Profiler, none of the browsers were scoring very high, but there has been great progress in the last year. It’s time to raise the bar! I plan on adding more tests to UA Profiler this summer, and hope the browser development teams will continue to rise to the challenge in an effort to make the Web a faster place for all of us.

Performance Impact of CSS Selectors

March 10, 2009 11:28 pm | 53 Comments

A few months back there were some posts about the performance impact of inefficient CSS selectors. I was intrigued – this is the kind of browser idiosyncratic behavior that I live for. On further investigation, I’m not so sure that it’s worth the time to make CSS selectors more efficient. I’ll go even farther and say I don’t think anyone would notice if we woke up tomorrow and every web page’s CSS selectors were magically optimized.

The first post that caught my eye was about CSS Qualified Selectors by Shaun Inman. This post wasn’t actually about CSS performance, but in one of the comments David Hyatt (architect for Safari and WebKit, also worked on Mozilla, Camino, and Firefox) dropped this bomb:

The sad truth about CSS3 selectors is that they really shouldn’t be used at all if you care about page performance. Decorating your markup with classes and ids and matching purely on those while avoiding all uses of sibling, descendant and child selectors will actually make a page perform significantly better in all browsers.

Wow. Let me say that again. Wow.

The next posts were amazing. It was a series on Testing CSS Performance from Jon Sykes in three parts: part 1, part 2, and part 3. It’s fun to see how his tests evolve, so part 3 is really the one to read. This had me convinced that optimizing CSS selectors was a key step to fast pages.

But there were two things about the tests that troubled me. First, the large number of DOM elements and rules worried me. The pages contain 60,000 DOM elements and 20,000 CSS rules. This is an order of magnitude more than most pages. Pages this large make browsers behave in unusual ways (we’ll get back to that later). The table below has some stats from the top ten U.S. web sites for comparison.

Web Site      # CSS Rules   # DOM Elements
AOL           2289          1628
eBay          305           588
Facebook      2882          1966
Google        92            552
Live Search   376           449
MSN           1038          886
MySpace       932           444
Wikipedia     795           1333
Yahoo!        800           564
YouTube       821           817
average       1033          923

The second thing that concerned me was how small the baseline test page was, compared to the more complex pages. The main question I want to answer is “do inefficient CSS selectors slow down pages?” All five test pages contain 20,000 anchor elements (nested inside P, DIV, DIV, DIV). What changes is their CSS: baseline (no CSS), tag selector (one rule for the A tag), 20,000 class selectors, 20,000 child selectors, and finally 20,000 descendant selectors. The last three pages top out at over 3 megabytes in size. But the baseline page and tag selector page, with little or no CSS, are only 1.8 megabytes. These pages answer the question “how much faster would my page be if I eliminated all CSS?” But not many of us are going to eliminate all CSS from our pages.

I revised the test as follows:

  • 2000 anchors and 2000 rules (instead of 20,000) – this actually results in ~6000 DOM elements because of all the nesting in P, DIV, DIV, DIV
  • the baseline page and tag selector page have 2000 rules just like all the other pages, but these are simple class rules that don’t match any classes in the page

I ran these tests on 12 browsers. Page render time was measured with a script block at the top and bottom of the page. (I loaded the page from local disk to avoid possible impact from chunked encoding.) The results are shown in the chart below. (I don’t show Opera 9.63 – it was way too slow – but you can download all the data as csv. You can also see the test pages.)
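
The measurement technique itself is simple; here's a sketch of the idea (variable names are mine, not from the test pages):

    // In a script block at the very top of the page:
    var t_start = new Date().getTime();

    // ... 2000 anchors and 2000 CSS rules ...

    // In a script block at the very bottom of the page:
    var t_end = new Date().getTime();
    document.title = 'render time: ' + (t_end - t_start) + ' ms';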

Performance varies across browsers; strangely, the two newest browsers, IE 8 and Firefox 3.1, are the slowest. But comparisons should not be made from one browser to another: although all the tests for a given browser were conducted on a single PC, different browsers might have been tested on different PCs with different performance characteristics. The goal of this experiment is not to compare browser performance – it’s to see how browsers handle progressively more complex CSS selectors.

[Revision: On further inspection comparing Firefox 3.0 and 3.1, I discovered that the test PC I used for testing Firefox 3.1 and IE 8 was slower than the other test PCs used in this experiment. I subsequently re-ran those tests as well as Firefox 3.0 and IE 7 on PCs that were more consistent and updated the chart above. Even with this re-run, because of possible differences in test hardware, do not use this data to compare one browser to another.]

Not surprisingly, the more complex pages (child selectors and descendant selectors) usually perform the worst. The biggest surprise is how small the delta is from the baseline to the most complex, worst performing test page. The average slowdown across all browsers is 50 ms, and if we look at the big ones (IE 6&7, FF3), the average delta is just 20 ms. For 70% or more of today’s users, improving these CSS selectors would only make a 20 ms improvement.

Keep in mind – these test pages are close to worst case. The 2000 anchors wrapped in P, DIV, DIV, DIV result in 6000 DOM elements – roughly three times the maximum found in the top ten sites. And the complex pages have 2000 extremely inefficient rules – in a typical site only about a third of the rules are complex child or descendant selectors. Facebook, for example, has the most rules at 2882, but only about 750 of them are these extremely inefficient rules.

Why do the results from my tests suggest something different from what’s been said lately? One difference comes from looking at things at such a large scale. It’s okay to exaggerate test cases if the results are proportional to common use cases. But in this case, browsers behave differently when confronted with a 3 megabyte page containing 60,000 elements and 20,000 rules. I especially noticed that my results were much different for IE 6&7, and wondered if there was a hockey stick in how IE handles CSS selectors. To investigate, I loaded the child selector and descendant selector pages with an increasing number of anchors and rules, from 1000 to 20,000. The results, shown in the chart below, reveal that IE hits a cliff around 18,000 rules. But when IE 6&7 work on a page that is closer to reality, as in my tests, they’re actually the fastest performers.

Based on these tests I have the following hypothesis: For most web sites, the possible performance gains from optimizing CSS selectors will be small, and are not worth the costs. There are some types of CSS rules and interactions with JavaScript that can make a page noticeably slower. This is where the focus should be. So I’m starting to collect real world examples of small CSS style-related issues (offsetWidth, :hover) that put the hurt on performance. If you have some, send them my way. I’m speaking at SXSW this weekend. If you’re there, and want to discuss CSS selectors, please find me. It’s important that we’re all focusing on the performance improvements that our users will really notice.
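
As an example of the kind of style-related issue that does put the hurt on performance, here's an illustrative sketch (not from the posts above) of the classic offsetWidth problem:

    // Reading offsetWidth forces the browser to compute layout. Interleaving
    // reads and writes in a loop triggers a reflow on every iteration.
    var items = document.getElementsByTagName('li');
    for (var i = 0; i < items.length; i++) {
      items[i].style.width = (items[i].offsetWidth + 10) + 'px'; // read, write, reflow, repeat
    }

    // Faster: batch the reads first, then do the writes, so each read
    // doesn't force a fresh reflow.
    var widths = [];
    for (var j = 0; j < items.length; j++) {
      widths[j] = items[j].offsetWidth;
    }
    for (var k = 0; k < items.length; k++) {
      items[k].style.width = (widths[k] + 10) + 'px';
    }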

John Resig: Drop-in JavaScript Performance

February 9, 2009 11:32 pm | 1 Comment

I wrote a post on the Google Code Blog about John Resig’s tech talk “Drop-in JavaScript Performance.” The video and slides are now available.

In this talk, John starts off highlighting why performance will improve in the next generation of browsers, thanks to advances in JavaScript engines and new features such as process per tab and parallel script loading. He digs deeper into JavaScript performance, touching on shaping, tracing, just-in-time compilation, and the various benchmarks (SunSpider, Dromaeo, and the V8 Benchmark). John plugs my UA Profiler, with its tests for simultaneous connections, parallel script loading, and link prefetching. He wraps up with a collection of many other advanced features in the areas of communication, DOM, styling, data, and measurements.

State of Performance 2008

December 17, 2008 8:08 pm | 3 Comments

My Stanford class, CS193H High Performance Web Sites, ended last week. The last lecture was called “State of Performance”. This was my review of what happened in 2008 with regard to web performance, and my predictions and hopes for what we’ll see in 2009. You can see the slides (ppt, GDoc), but I wanted to capture the content here with more text. Let’s start with a look back at 2008.

2008

Year of the Browser
This was the year of the browser. Browser vendors have put users first, competing to make the web experience faster with each release. JavaScript got the most attention. WebKit released a new interpreter called SquirrelFish. Mozilla came out with TraceMonkey – SpiderMonkey with Trace Trees added for Just-In-Time (JIT) compilation. Google released Chrome with the new V8 JavaScript engine. And new benchmarks have emerged to put these new JavaScript engines through their paces: SunSpider, V8 Benchmark, and Dromaeo.

In addition to JavaScript improvements, browser performance was boosted in terms of how web pages are loaded. IE8 Beta came out with parallel script loading, where the browser continues parsing the HTML document while downloading external scripts, rather than blocking all progress like most other browsers. WebKit, starting with version 526, has a similar feature, as do Chrome 0.5 and the most recent Firefox nightlies (Minefield 3.1). On a similar note, IE8 and Firefox 3 both increased the number of connections opened per server from 2 to 6. (Safari and Opera were already ahead with 4 connections per server.) (See my original blog posts for more information: IE8 speeds things up, Roundup on Parallel Connections, UA Profiler and Google Chrome, and Firefox 3.1: raising the bar.)

Velocity
Velocity 2008, the first conference focused on web performance and operations, launched June 23-24. Jesse Robbins and I served as co-chairs. This conference, organized by O’Reilly, was densely packed – both in terms of content and people (it sold out!). Speakers included John Allspaw (Flickr), Luiz Barroso (Google), Artur Bergman (Wikia), Paul Colton (Aptana), Stoyan Stefanov (Yahoo!), Mandi Walls (AOL), and representatives from the IE and Firefox teams. Velocity 2009 is slated for June 22-24 in San Jose and we’re currently accepting speaker proposals.
Jiffy
Improving web performance starts with measuring performance. Measurements can come from a test lab using tools like Selenium and Watir. To get measurements from geographically dispersed locations, scripted tests are possible through services like Keynote, Gomez, Webmetrics, and Pingdom. But the best data comes from measuring real user traffic. The basic idea is to measure page load times using JavaScript in the page and beacon back the results. Many web companies have rolled their own versions of this instrumentation. It isn’t that complex to build from scratch, but there are a few gotchas and it’s inefficient for everyone to reinvent the wheel. That’s where Jiffy comes in. Scott Ruthfield and folks from Whitepages.com released Jiffy at Velocity 2008. It’s Open Source and easy to use. If you don’t currently have real user load time instrumentation, take a look at Jiffy.
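The roll-your-own version of this instrumentation boils down to a few lines. Here's a hedged sketch of the idea (the beacon URL and variable names are made up; Jiffy itself records much more detail):

    // As early as possible in the HTML document, record a start time:
    var t_pageStart = new Date().getTime();

    // Once the page has finished loading, beacon the measurement back:
    window.onload = function () {
      var loadTime = new Date().getTime() - t_pageStart;
      var beacon = new Image();
      beacon.src = '/beacon?page=home&t=' + loadTime; // hypothetical endpoint
    };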
JavaScript: The Good Parts
Moving forward, the key to fast web pages is going to be the quality and performance of JavaScript. JavaScript luminary Doug Crockford helps light the way with his book JavaScript: The Good Parts, published by O’Reilly. More is needed! We need books and guidelines for performance best practices and design patterns focused on JavaScript. But Doug’s book is a foundation on which to build.
smush.it
My former colleagues from the Yahoo! Exceptional Performance team, Stoyan Stefanov and Nicole Sullivan, launched smush.it. In addition to a great name and beautiful web site, smush.it packs some powerful functionality. It analyzes the images on a web page and calculates potential savings from various optimizations. Not only that, it creates the optimized versions for download. Try it now!
Google Ajax Libraries API
JavaScript frameworks are powerful and widely used. Dion Almaer and the folks at Google saw an opportunity to help the development community by launching the Ajax Libraries API. This service hosts popular frameworks including jQuery, Prototype, script.aculo.us, MooTools, Dojo, and YUI. Web sites using any of these frameworks can reference the copy hosted on Google’s worldwide server network and gain the benefit of faster downloads and cross-site caching. (Original blog post: Google AJAX Libraries API.)
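For example, instead of serving your own copy of jQuery, a page can reference the Google-hosted copy (the version below is just an example):

    // Typical usage is a plain script tag pointing at the hosted file, e.g.
    //   <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script>
    // The same file can also be added dynamically:
    var s = document.createElement('script');
    s.src = 'http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js';
    document.getElementsByTagName('head')[0].appendChild(s);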
UA Profiler
Okay, I’ll get a plug for my own work in here. UA Profiler looks at browser characteristics that make pages load faster, such as downloading scripts without blocking, max number of open connections, and support for “data:” URLs. The tests run automatically – all you have to do is navigate to the test page from any browser with JavaScript support. The results are available to everyone, regardless of whether you’ve run the tests. I’ve been pleased with the interest in UA Profiler. In some situations it has identified browser regressions that developers have caught and fixed.

2009

Here’s what I think and hope we’ll see in 2009 for web performance.

Visibility into the Browser
Packet sniffers (like HttpWatch, Fiddler, and Wireshark) and tools like YSlow allow developers to investigate many of the “old school” performance issues: compression, Expires headers, redirects, etc. In order to optimize Web 2.0 apps, developers need to see the impact of JavaScript and CSS as the page loads, and gather stats on CPU load and memory consumption. Eric Lawrence and Christian Stockwell’s slides from Velocity 2008 give a hint of what’s possible. Now we need developer tools that show this information.
Think “Web 2.0”
Web 2.0 pages are often developed with a Web 1.0 mentality. In Web 1.0, the amount of CSS, JavaScript, and DOM elements on your page was more tolerable because it would be cleared away with the user’s next action. That’s not the case in Web 2.0. Web 2.0 apps can persist for minutes or even hours. If there are a lot of CSS selectors that have to be matched with each repaint, that pain is felt again and again. If we include the JavaScript for all possible user actions, the JavaScript payload bloats, increasing memory consumption and garbage collection. Dynamically adding elements to the DOM slows down our CSS (more selector matching) and our JavaScript (think getElementsByTagName). As developers, we need a new way of thinking about the shape and behavior of our web apps, one that addresses the long page persistence that comes with Web 2.0.
Speed as a Feature
In my second month at Google I was ecstatic to see the announcement that landing page load time was being incorporated into the quality score used by AdWords. I think we’ll increasingly see the speed of web pages become more important to users, more important to aggregators and vendors, and subsequently more important to web publishers.
Performance Standards
As the industry becomes more focused on web performance, a need for industry standards is going to arise. Many companies, tools, and services measure “response time”, but it’s unclear that they’re all measuring the same thing. Benchmarks exist for the browser JavaScript engines, but benchmarks are needed for other aspects of browser performance, like CSS and DOM. And current benchmarks are fairly theoretical and extreme. In addition, test suites are needed that gather measurements under more real world conditions. Standard libraries for measuring performance are needed, a la Jiffy, as well as standard formats for saving and exchanging performance data.
JavaScript Help
With the emergence of Web 2.0, JavaScript is the future. The Catch-22 is that JavaScript is one of the biggest performance problems in a web page. Help is needed so JavaScript-heavy web pages can still be fast. One specific tool that’s needed is something that takes a monolithic JavaScript payload and splits it into smaller modules, with the necessary logic to know what is needed when. Doloto is a project from Microsoft Research that tackles this problem, but it’s not available publicly. Razor Optimizer attacks this problem and is a good first step, but it needs to be less intrusive for developers to incorporate this functionality.

Browsers also need to make it easier for developers to load JavaScript with less of a performance impact. I’d like to see two new attributes for the SCRIPT tag: DEFER and POSTONLOAD. DEFER isn’t really “new” – IE has had the DEFER attribute since IE 5.5. DEFER is part of the HTML 4.0 specification, and it has been added to Firefox 3.1. One problem is you can’t use DEFER with scripts that utilize document.write, and yet this is critical for mitigating the performance impact of ads. Opera has shown that it’s possible to have deferred scripts still support document.write. This is the model that all browsers should follow for implementing DEFER. The POSTONLOAD attribute would tell the browser to load this script after all other resources have finished downloading, allowing the user to see other critical content more quickly. Developers can work around these issues with more code, but we’ll see wider adoption and more fast pages if browsers can help do the heavy lifting.
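
The hand-rolled workaround for the missing POSTONLOAD behavior usually looks something like this sketch (the script URL is hypothetical): wait for the window's onload event, then append the script so it never competes with the page's critical resources.

    // Roughly what a POSTONLOAD attribute would do for free: download the
    // script only after all other resources have finished loading.
    window.onload = function () {
      var s = document.createElement('script');
      s.src = 'widgets.js'; // hypothetical non-critical script
      document.getElementsByTagName('head')[0].appendChild(s);
    };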

Focus on Other Platforms
Most of my work has focused on the desktop browser. Certainly, more best practices and tools are needed for the mobile space. But to stay ahead of where the web is going we need to analyze the user experience in other settings including automobile, airplane, mass transit, and 3rd world. Otherwise, we’ll be playing catchup after those environments take off.
Fast by Default
I enjoy Tom Hanks’ line in A League of Their Own when Geena Davis (“Dottie”) says playing ball got too hard: “It’s supposed to be hard. If it wasn’t hard, everyone would do it. The hard… is what makes it great.” I enjoy a challenge and tackling a hard problem. But doggone it, it’s just too hard to build a high performance web site. The bar shouldn’t be this high. Apache needs to turn off ETags by default. WordPress needs to cache comments better. Browsers need to cache SSL responses when the HTTP headers say to. Squid needs to support HTTP/1.1. The world’s web developers shouldn’t have to write code that anticipates and fixes all of these issues.

Good examples of where we can go are Runtime Page Optimizer and Strangeloop appliances. RPO is an IIS module (and soon Apache) that automatically fixes web pages as they leave the server to minify and combine JavaScript and stylesheets, enable CSS spriting, inline CSS images, and load scripts asynchronously. (Original blog post: Runtime Page Optimizer.) The web appliance from Strangeloop does similar realtime fixes to improve caching and reduce payload size. Imagine combining these tools with smush.it and Doloto to automatically improve the performance of web pages. Companies like Yahoo! and Google might need more customized solutions, but for the other 99% of developers out there, it needs to be easier to make pages fast by default.

This is a long post, but I still had to leave out a lot of performance highlights from 2008 and predictions for what lies ahead. I look forward to hearing your comments.

Hammerhead: moving performance testing upstream

September 30, 2008 10:07 pm | 54 Comments

Today at The Ajax Experience, I released Hammerhead, a Firebug extension for measuring page load times.

Improving performance starts with metrics. How long does it take for the page to load? Seems like a simple question to answer, but gathering accurate measurements can be a challenge. In my experience, performance metrics exist at four stages along the development process.

  • real user data – I love real user metrics. JavaScript frameworks like Jiffy measure page load times from real traffic. When your site is used by a large, diverse audience, data from real page views is ground-truth.
  • bucket testing – When you’re getting ready to push a new release, if you’re lucky you can do bucket testing to gather performance metrics. You release the new code to a subset of users while maintaining another set of users on the old code (the “control”). If you sample your user population correctly and gather enough data, comparing the before and after timing information gives you a preview of the latency impact of your next release.
  • synthetic or simulated testing – In some situations, it’s not possible to gather real user data. You might not have the infrastructure to do bucket testing and real user instrumentation. Your new build isn’t ready for release, but you still want to gauge where you are with regard to performance. Or perhaps you’re measuring your competitors’ performance. In these situations, the typical solution is to do scripted testing on some machine in your computer lab, or perhaps through a service like Keynote or Gomez.
  • dev box – The first place performance testing happens (or should happen) is on the developer’s box. As she finishes her code changes, the developer can see if she made things better or worse. What was the impact of that JavaScript rewrite? What happens if I add another stylesheet, or split my images across two domains?

Performance metrics get less precise as you move from real user data to dev box testing, as shown in Figure 1. That’s because, as you move away from real user data, biases are introduced. For bucket testing, the challenge is selecting users in an unbiased way. For synthetic testing, you need to choose scenarios and test accounts that are representative of real users. Other variables of your real user audience are difficult or impossible to simulate: bandwidth speed, CPU power, OS, browser, geographic location, etc. Attempting to simulate real users in your synthetic testing is a slippery, and costly, slope. Finally, testing on the dev box usually involves one scenario on a CPU that is more powerful than the typical user, and an Internet connection that is 10-100 times faster.

Figure 1 - Precision and ability to iterate along the development process

Given this loss of precision, why would we bother with anything other than real user data? The reason is speed of development. Dev box data can be gathered within hours of a code change, whereas it can take days to gather synthetic data, weeks to do bucket testing, and a month or more to release the code and have real user data. If you wait for real user data to see the impact of your changes, it can take a year to iterate on a short list of performance improvements. To quickly identify the most important performance improvements and their optimal implementation, it’s important to improve our ability to gather performance metrics earlier in the development process: on the dev box.

As a developer, it can be painful to measure the impact of a code change on your dev box. Getting an accurate time measurement is the easy part; you can use YSlow, Fasterfox, or an alert dialog. But then you have to load the page multiple times. The most painful part is transcribing the load times into Excel. Were all the measurements done with an empty cache or a primed cache, or was that even considered?

Hammerhead makes it easier to gather performance metrics on your dev box. Figure 2 shows the results of hammering a few news web sites with Hammerhead. By virtue of being a Firebug extension, Hammerhead is available in a platform that web developers are familiar with. To set up a Hammerhead test, one or more URLs are added to the list and the “# of loads” is specified. Once started, Hammerhead loads each URL the specified number of times.

Figure 2 - Hammerhead results for a few news web sites

The next two things aren’t rocket science, but they make a big difference. First, there are two columns of results, one for empty cache and one for primed cache. Hammerhead automatically clears the disk and memory cache, or just the memory cache, in between each page load to gather metrics for both of these scenarios. Second, Hammerhead displays the median and average time measurement. Additionally, you can export the data in CSV format.

Even if you’re not hammering a site, other features make Hammerhead a useful add-on. The Cache & Time panel, shown in Figure 3, shows the current URL’s load time. It also contains buttons to clear the disk and memory cache, or just the memory cache. It has another feature that I haven’t seen anywhere else: you can choose to have Hammerhead clear these caches after every page view. This is a nice feature when I’m loading the same page again and again to see its performance in an empty or primed cache state. If you forget to switch this back, it gets reset automatically the next time you restart Firefox.

Figure 3 - Cache & Time panel in Hammerhead

If you don’t have Hammerhead open, you can still see the load time in the status bar, and right-clicking the Hammerhead icon gives you access to clearing the caches. The ability to clear just the memory cache is another valuable feature I haven’t seen elsewhere. I feel this is the best way to simulate the primed cache scenario, where the user has been to your site recently, but not during the current browser session.

Hammerhead makes it easier to gather performance metrics early in the development process, resulting in a faster development cycle. The biggest bias is that most developers have a much faster bandwidth connection than the typical user. Easy-to-install bandwidth throttlers are a solution. Steve Lamm blogged on Google Code about how Hammerhead can be used with bandwidth throttlers on the dev box, bringing together both ease and greater precision of performance measurements. Give it a spin and let me know what you think.

Velocity Wrap-up

June 26, 2008 10:51 am | 3 Comments

This week I co-chaired Velocity, the web performance and operations conference from O’Reilly. It was great! Jesse and I told the story about how the conference came about. When we proposed the conference we believed there was a community of performance and operations engineers that needed a forum to share and learn, and the attendance at Velocity confirmed this. Velocity sold out with over 600 attendees!

The lineup of speakers was great, and there was a lot of material packed into the 2-day conference. I stayed in the Performance track, but wanted to attend every session in the Operations track, too. Many speakers shared their slides, and there are videos and photos from some of the talks.
