Velocity: Tim O’Reilly and 20% discount

June 22, 2009 11:19 am | Comments Off on Velocity: Tim O’Reilly and 20% discount

It was great to read Tim O’Reilly’s blog post about Velocity. He tells the story about the first meeting where we talked about starting a conference for the web performance and operations community. It almost didn’t happen! The first meeting got postponed. This was at OSCON, and Tim got tied up preparing his keynote. Luckily, we were able to get together the next day.

Looking back, that seems so long ago. The excitement about web performance has skyrocketed since then, in part because of Velocity. There’s also been a lot of tool development in that space, including Firebug, YSlow, Page Speed, HttpWatch, PageTest, VRTA, and neXpert. All of these tools, and more, will be showcased at Velocity, happening this week in San Jose. The sessions take place on Tuesday, June 23 and Wednesday, June 24.

As a final nod, Tim provides a 20% discount code: VEL09BLOG. I hope to see you here!

Simplifying CSS Selectors

June 18, 2009 12:55 pm | 25 Comments

This post is based on a chapter from Even Faster Web Sites, the follow-up to High Performance Web Sites. Posts in this series include: chapters and contributing authors, Splitting the Initial Payload, Loading Scripts Without Blocking, Coupling Asynchronous Scripts, Positioning Inline Scripts, Sharding Dominant Domains, Flushing the Document Early, Using Iframes Sparingly, and Simplifying CSS Selectors.

“Simplifying CSS Selectors” is the last chapter in my next book. My investigation into CSS selector performance is therefore fairly recent. A few months ago, I wrote a blog post about the Performance Impact of CSS Selectors. It talks about the different types of CSS selectors, which ones are hypothesized to be the most painful, and how the impact of selector matching might be overestimated. It concludes with this hypothesis:

For most web sites, the possible performance gains from optimizing CSS selectors will be small, and are not worth the costs. There are some types of CSS rules and interactions with JavaScript that can make a page noticeably slower. This is where the focus should be.

I received a lot of feedback about situations where CSS selectors do make web pages noticeably slower. Looking for a common theme across these slow CSS test cases led me to this revelation from David Hyatt’s article on Writing Efficient CSS for use in the Mozilla UI:

The style system matches a rule by starting with the rightmost selector and moving to the left through the rule’s selectors. As long as your little subtree continues to check out, the style system will continue moving to the left until it either matches the rule or bails out because of a mismatch.

This illuminates where our optimization efforts should be focused: on CSS selectors whose rightmost selector matches a large number of elements in the page. The experiments from my previous blog post contain some CSS selectors that look expensive but, when examined in this new light, turn out not to be worth worrying about, for example, DIV DIV DIV P A.class0007 {}. This selector has five levels of descendant matching that must be performed. This sounds complex. But when we look at the rightmost selector, A.class0007, we realize that there’s only one element in the entire page that the browser has to match against.

The key to optimizing CSS selectors is to focus on the rightmost selector, also called the key selector (coincidence?). Here’s a much more expensive selector: A.class0007 * {}. Although this selector might look simpler, it’s more expensive for the browser to match. Because the browser moves right to left, it starts by checking all the elements that match the key selector, “*”. This means the browser must try to match this selector against all elements in the page. This chart shows the difference in load times for the test page using this universal selector compared with the previous descendant selector test page.

Load time difference for universal selector

It’s clear that CSS selectors with a key selector that matches many elements can noticeably slow down web pages. Other examples of CSS selectors where the key selector might create a lot of work for the browser include:

A.class0007 DIV {}
#id0007 > A {}
.class0007 [href] {}
DIV:first-child {}

Not all CSS selectors hurt performance, even those that might look expensive. The key is focusing on CSS selectors with a wide-matching key selector. This becomes even more important for Web 2.0 applications where the number of DOM elements, CSS rules, and page reflows are even higher.
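As a sketch of the fix, a rule whose key selector matches broadly can usually be replaced by a class applied directly to the target elements (the selectors, class names, and color below are illustrative):

```css
/* Expensive: the key selector is "*", so the browser tests every
   element in the page against this rule before walking left. */
A.class0007 * { color: #333; }

/* Cheaper: put a class on the target elements instead. The key
   selector now matches only those elements. */
.inside-link { color: #333; }
```

Both rules can style the same elements; the second just gives the browser a narrow key selector to start from.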

Velocity – fully programmed

June 16, 2009 3:57 pm | 2 Comments

With my book and Velocity hitting in the same month, I’ve been slammed. Even though we started the Velocity planning process eleven months ago, we’ve been tweaking the program schedule up to the last minute, making room for new products and technology breakthroughs. I’m happy to say that the slate of speakers is nailed down, and it looks awesome. Here’s a rundown of what’s happening in the Performance track, including the most recent additions.

Workshops (Mon, June 22)

At this year’s conference, we added a day of workshops. I kick things off talking about Website Performance Analysis, where I’ll take a popular, but slow, web site and show the tools used to make it faster. Nicholas Zakas gets into deep performance optimizations with Writing Efficient JavaScript, relevant to any web site that uses JavaScript (which is every one). I’m psyched to sit in on Nicole Sullivan’s workshop, The Fast and the Fabulous: 9 ways engineering and design come together to make your site slow. We worked together at Yahoo!, so I can vouch for her guru-ness when it comes to CSS and web design. The workshops end with Metrics that Matter by Ben Rushlo from Keynote Systems, a topic brainstormed at the Velocity Summit earlier this year: identifying the metrics you have to watch to track and improve your site’s performance.

Sessions Day 1 (Tues, June 23)

We had so many good speaker proposals, we decided to kick things off a bit earlier, starting at 8:30am. We’ll cover the exciting stuff right out of the gate – new product announcements! (My lips are sealed.) One of the most important talks of the conference is The User and Business Impact of Server Delays, Additional Bytes, and HTTP Chunking in Web Search – where Eric Schurman from Live Search (‘scuse me, Bing) and Jake Brutlag from Google Search co-present the results of experiments they ran measuring the impact of latency on users. Are you kidding me?! Microsoft and Google, presenting together, with hard numbers about the impact of latency, talking about experiments run on live traffic! This is unprecedented and can’t be missed. Eric and Jake are two of the smartest and nicest guys around, so grab them afterwards and ask questions.

There’s a reprise of last year’s popular browser matchup, What Makes Browsers Performant, with representatives from Internet Explorer, Firefox, and Chrome. Doug Crockford is talking about Ajax Performance, painting the landscape of how developers should view and optimize their Web 2.0 applications. Michael Carter’s Light Speed Comet session covers this newer technique for high volume, low latency Ajax communication. Other performance-related presentations include a demo of Google’s new Page Speed performance tool, A Preview of MySpace’s Open-sourced Performance Tracker, The Secret Weapons of the AOL Optimization Team, Go with the Reflow by my good buddy Lindsey Simon, and Performance-Based Design – Linking Performance to Business Metrics by Aladdin Nassar.

Sessions Day 2 (Wed, June 24)

We start early again, and jump right into the good stuff. Marissa Mayer starts with a keynote talking about Google’s commitment to fast web sites, followed by lightning demos of Firebug, HttpWatch, AOL PageTest, YSlow 2.0, and Visual Round Trip Analyzer. In Shopzilla’s Site Redo, Phil Dixon delivers more killer stats about the business impact of performance, such as a “5% – 12% lift in top-line revenue”. These are the numbers developers need to be armed with when debating the priority of performance improvements within their company. The morning closes with Ben Galbraith and Dion Almaer talking about the Responsiveness of web applications.

Several afternoon sessions come from Google. Kyle Scholz and Yaron Friedman present High Performance Search at Google. These guys have to build advanced DHTML that works across a huge audience; it’ll be important to find out what worked for them. Tony Gentilcore, creator of Fasterfox, gets the conference’s Sherlock Holmes award for his discoveries about why compression doesn’t happen as often as we think, in Beyond Gzipping. Brad Chen talks about a new tack on high performance applications in the browser using Google Native Client.

Matt Mullenweg is presenting some of the recent performance enhancements baked into WordPress. It’s a real treat to have Matt on the program. Developers that have to monitor performance will want to hear MySpace.com’s talk Fistful of Sand. In addition to hearing from Google Search, we’ll also get a glimpse of Frontend Performance Engineering in Facebook. Eric Mattingly is demoing a new tool called neXpert. And the day closes with me talking about the State of Performance, and a favorite from last year, High Performance Ads – Is It Possible?.

One of the most rewarding things about Velocity 2008 was the amount of sharing that took place. Everyone was talking about the pitfalls to avoid and the successes that can be had. I see that happening again this year. All of these speakers are extremely approachable. They have great experience and are smart, too, but the key for Velocity is that you can walk up to any one of them afterward and ask for more details or share what you’ve discovered. The Web is out there. Velocity is where we work together to make it faster.

See you at Velocity!

[If you haven’t registered yet, make sure to use my "vel09cmb" 15% discount code.]

Using Iframes Sparingly

June 3, 2009 10:42 pm | 18 Comments

This post is based on a chapter from Even Faster Web Sites, the follow-up to High Performance Web Sites. Posts in this series include: chapters and contributing authors, Splitting the Initial Payload, Loading Scripts Without Blocking, Coupling Asynchronous Scripts, Positioning Inline Scripts, Sharding Dominant Domains, Flushing the Document Early, Using Iframes Sparingly, and Simplifying CSS Selectors.

Time to create 100 elements

Iframes provide an easy way to embed content from one web site into another. But they should be used cautiously. They are 1-2 orders of magnitude more expensive to create than any other type of DOM element, including scripts and styles. The time to create 100 elements of various types shows how expensive iframes are.

Pages that use iframes typically don’t have that many of them, so the DOM creation time isn’t a big concern. The bigger issues involve the onload event and the connection pool.

Iframes Block Onload

It’s important that the window’s onload event fire as soon as possible. This causes the browser’s busy indicators to stop, letting the user know that the page is done loading. When the onload event is delayed, it gives the user the perception that the page is slower.

The window’s onload event doesn’t fire until all its iframes, and all the resources in these iframes, have fully loaded. In Safari and Chrome, setting the iframe’s SRC dynamically via JavaScript avoids this blocking behavior.

One Connection Pool

Browsers open a small number of connections to any given web server. Older browsers, including Internet Explorer 6 & 7 and Firefox 2, only open two connections per hostname. This number has increased in newer browsers. Safari 3+ and Opera 9+ open four connections per hostname, while Chrome 1+, IE 8, and Firefox 3 open six connections per hostname. You can see the complete table in my Roundup on Parallel Connections.

One might hope that an iframe would have its own connection pool, but that’s not the case. In all major browsers, the connections are shared between the main page and its iframes. This means it’s possible for the resources in an iframe to use up the available connections and block resources in the main page from loading. If the contents of the iframe are as important as, or more important than, the main page, this is fine. However, if the iframe’s contents are less critical to the page, as is often the case, it’s undesirable for the iframe to commandeer the open connections. One workaround is to set the iframe’s SRC dynamically after the higher priority resources are done downloading.
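As a sketch, both issues can be mitigated by leaving the iframe’s SRC unset in the markup and assigning it from JavaScript once the main page has loaded (the element ID and ad URL below are illustrative):

```html
<!-- The iframe downloads nothing until its src is assigned. -->
<iframe id="ad-frame" width="300" height="250"></iframe>

<script>
// Assign the real URL after the main page's onload, so the iframe
// doesn't compete with higher priority resources for the connection
// pool. In Safari and Chrome, assigning src dynamically also keeps
// the iframe from delaying the window's onload event.
window.onload = function() {
  document.getElementById("ad-frame").src = "http://ads.example.com/ad.html";
};
</script>
```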

Five of the top ten U.S. web sites use iframes. In most cases, they’re used for ads. This is unfortunate, but understandable given the simplicity of using iframes to insert content from an ad service. In many situations, iframes are the logical solution. But keep in mind the performance impact they can have on your page. When possible, avoid iframes. When necessary, use them sparingly.

Stanford videos available

May 20, 2009 11:46 pm | 9 Comments

Last fall I taught CS193H: High Performance Web Sites at Stanford. My class was videotaped so people enrolled through the Stanford Center for Professional Development could watch the lectures on their own time. In an earlier blog post I mentioned that SCPD was working to make the videos available. I’m pleased to announce that you can now watch these lectures on SCPD as part of the XCS193H videos. Yep, 25 hours of me talking about web performance! These videos include lectures on all the rules from my first book, High Performance Web Sites, as well as the new material from my next book, Even Faster Web Sites, due out in June.

The videos aren’t free – tuition is $600. If this is too pricey, you can watch the first three videos for free. These videos are the most thorough explanation of my performance best practices. I hope you’ll check them out.

Flushing the Document Early

May 18, 2009 10:02 pm | 23 Comments

This post is based on a chapter from Even Faster Web Sites, the follow-up to High Performance Web Sites. Posts in this series include: chapters and contributing authors, Splitting the Initial Payload, Loading Scripts Without Blocking, Coupling Asynchronous Scripts, Positioning Inline Scripts, Sharding Dominant Domains, Flushing the Document Early, Using Iframes Sparingly, and Simplifying CSS Selectors.

The Performance Golden Rule reminds us that for most web sites, 80-90% of the load time is on the front end. However, for some pages, the time it takes the back end to generate the HTML document is more than 10-20% of the overall page load time. Even if the HTML document is generated quickly on the back end, it may still take a long time before it’s received by the browser for users in remote locations or with slow connections. While the HTML document is being generated on the back end and sent over the wire, the browser is waiting idly. What a waste! Instead of letting the browser sit there doing nothing, web developers can use flushing to jumpstart page loading even before the HTML document response is fully received.

Flushing is when the server sends the initial part of the HTML document to the client before the entire response is ready. All major browsers start parsing the partial response. When done correctly, flushing results in a page that loads and feels faster. The key is choosing the right point at which to flush the partial HTML document response. The flush should occur before the expensive parts of the back end work, such as database queries and web service calls. But the flush should occur after the initial response has enough content to keep the browser busy. The part of the HTML document that is flushed should contain some resources as well as some visible content. If resources (e.g., stylesheets, external scripts, and images) are included, the browser gets an early start on its download work. If some visible content is included, the user receives feedback sooner that the page is loading.

Most HTML templating languages, including PHP, Perl, Python, and Ruby, contain a “flush” function. Getting flushing to work can be tricky. Problems arise when output is buffered, chunked encoding is disabled, proxies or anti-virus software interfere, or the flushed response is too small. Scanning the comments in PHP’s flush documentation shows that it can be hard to get all the details correct. Perhaps that’s why most of the U.S. top 10 sites don’t flush the document early. One that does is Google Search.
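In PHP, the basic pattern looks something like this (a minimal sketch; the rendering and query functions are illustrative, and real pages may also need ob_flush() or server configuration changes to defeat output buffering):

```php
<?php
// Send the <head> and some visible content before the expensive work,
// so the browser can start downloading resources and rendering.
echo render_head_and_masthead();   // includes stylesheet and script URLs
flush();                           // push the partial response to the client

$results = expensive_database_query();  // browser works in parallel with this
echo render_results($results);
?>
```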

Google Search flushing early

The HTTP waterfall chart for Google Search shows the benefits of flushing the document early. While the HTML document response (the first bar) is still being received, the browser has already started downloading one of the images used in the page, nav_logo4.png (the second bar). By flushing the document early, you make your pages start downloading resources and rendering content more quickly. This is a benefit that all users will appreciate, especially those with slow Internet connections and high latency.

Sharding Dominant Domains

May 12, 2009 7:05 pm | 13 Comments

This post is based on a chapter from Even Faster Web Sites, the follow-up to High Performance Web Sites. Posts in this series include: chapters and contributing authors, Splitting the Initial Payload, Loading Scripts Without Blocking, Coupling Asynchronous Scripts, Positioning Inline Scripts, Sharding Dominant Domains, Flushing the Document Early, Using Iframes Sparingly, and Simplifying CSS Selectors.

Rule 9 from High Performance Web Sites says reducing DNS lookups makes web pages faster. This is true, but in some situations, it’s worthwhile to take a bunch of resources that are being downloaded on a single domain and split them across multiple domains. I call this domain sharding. Doing this allows more resources to be downloaded in parallel, reducing the overall page load time.

To determine if domain sharding makes sense, you have to see if the page has a dominant domain being used for downloading resources. The following HTTP waterfall chart, for Yahoo.com, is indicative of a site being slowed down by a dominant domain. This waterfall chart shows the page being downloaded in Internet Explorer 7, which only allows two parallel downloads per hostname. The vertical lines show that typically there are only two simultaneous downloads at any given time. Looking at the resource URLs, we see that almost all of them are served from “l.yimg.com.” Sharding these resources across two domains, such as “l1.yimg.com” and “l2.yimg.com”, would cause the resources to download in about half the time.

Most of the U.S. top ten web sites do domain sharding. YouTube uses i1.ytimg.com, i2.ytimg.com, i3.ytimg.com, and i4.ytimg.com. Live Search uses ts1.images.live.com, ts2.images.live.com, ts3.images.live.com, and ts4.images.live.com. Both of these sites are sharding across four domains. What’s the optimal number? Yahoo! released a study that recommends sharding across at least two, but no more than four, domains. Above four, performance actually degrades.
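One practical detail: a given resource should always map to the same shard, or the browser will re-download it under a different hostname and defeat its cache. A minimal sketch of a deterministic mapping (the helper name and hostnames are illustrative, not from any real implementation):

```javascript
// Hypothetical helper: deterministically map a resource path to one of
// N shard hostnames, so the same resource always downloads from the
// same domain and stays cacheable across pages.
function shardHostname(path, shards) {
  var sum = 0;
  for (var i = 0; i < path.length; i++) {
    // Simple checksum of the path's character codes.
    sum = (sum + path.charCodeAt(i)) % shards.length;
  }
  return shards[sum];
}

var shards = ["l1.yimg.com", "l2.yimg.com"];
var host = shardHostname("/images/logo.gif", shards);
var url = "http://" + host + "/images/logo.gif";
```

Any stable function of the URL works; the point is that the choice never varies between page views.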

Not all browsers are restricted to just two parallel downloads per hostname. Opera 9+ and Safari 3+ do four downloads per hostname. Internet Explorer 8, Firefox 3, and Chrome 1+ do six downloads per hostname. Sharding across two domains is a good compromise that improves performance in all browsers.

Positioning Inline Scripts

May 6, 2009 10:26 pm | 14 Comments

This post is based on a chapter from Even Faster Web Sites, the follow-up to High Performance Web Sites. Posts in this series include: chapters and contributing authors, Splitting the Initial Payload, Loading Scripts Without Blocking, Coupling Asynchronous Scripts, Positioning Inline Scripts, Sharding Dominant Domains, Flushing the Document Early, Using Iframes Sparingly, and Simplifying CSS Selectors.

My first three chapters in Even Faster Web Sites (Splitting the Initial Payload, Loading Scripts Without Blocking, and Coupling Asynchronous Scripts) focus on external scripts. But inline scripts block downloads and rendering just like external scripts do. The Inline Scripts Block example contains two images with an inline script between them. The inline script takes five seconds to execute. Looking at the HTTP waterfall chart, we see that the second image (which occurs after the inline script) is blocked from downloading until the inline script finishes executing.

Inline scripts block rendering just like external scripts do, but they’re worse. External scripts only block the rendering of elements below them in the page. Inline scripts block the rendering of everything in the page. You can see this by loading the Inline Scripts Block example in a new window and counting off the five seconds before anything is displayed.

Here are three strategies for avoiding, or at least mitigating, the blocking behavior of inline scripts.

  • move inline scripts to the bottom of the page – Although this still blocks rendering, resources in the page aren’t blocked from downloading.
  • use setTimeout to kick off long executing code – If you have a function that takes a long time to execute, launching it via setTimeout allows the page to render and resources to download without being blocked.
  • use DEFER – In Internet Explorer and Firefox 3.1 or greater, adding the DEFER attribute to the inline SCRIPT tag avoids the blocking behavior of inline scripts.
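As a sketch of the setTimeout approach, where the long-running function (longInit here) is illustrative:

```html
<p>This paragraph renders immediately.</p>

<script>
// Scheduling the work via setTimeout lets the browser continue
// rendering and downloading instead of blocking on the inline script.
setTimeout(function() {
  longInit();  // the long-executing code
}, 0);
</script>

<p>This paragraph is no longer blocked while longInit is scheduled.</p>
```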

It’s especially important to avoid placing inline scripts between a stylesheet and any other resources in the page. Doing so makes it appear as if the stylesheet were blocking the resources that follow from downloading. The reason for this behavior is that all major browsers preserve the order of CSS and JavaScript. The stylesheet has to be fully downloaded, parsed, and applied before the inline script is executed. And the inline script must be executed before the remaining resources can be downloaded. Therefore, resources that follow a stylesheet and inline script are blocked from downloading.

This is easier to show with an example. Below is the HTTP waterfall chart for eBay.com. Normally, the stylesheets and the first script would be downloaded in parallel.1 But after the stylesheet, there’s an inline script (with just one line of code). This causes the external script to be blocked from downloading until the stylesheet is received.

Stylesheet followed by inline script blocks downloads on eBay

This unnecessary blocking also happens on MSN, MySpace, and Wikipedia. The workaround is to move inline scripts above stylesheets or below other resources. This will increase parallel downloads resulting in a faster page.

1All these resources are on the same domain, so downloading them all in parallel is possible on browsers that support more than two connections per server, including Internet Explorer 8, Firefox 3, Safari, Chrome, and Opera.

Web Exponents

May 4, 2009 11:14 pm | 3 Comments

I’m fortunate to have gotten to know so many tech gurus over the last few years from speaking at conferences and co-chairing Velocity. As if programming wasn’t complex enough, there are all the choices that have to be made with regard to language, design, performance, scalability, tools, deployment, and more. It’s incredibly valuable to hear what these experts have to say. I always come away with new insights and ideas.

I wanted to find a way that my fellow Googlers and the web development community at large could share in this benefit, as well. A few months ago I started inviting these industry leaders to give tech talks at Google. This has gone very well. The talks are popular, and there’s an even bigger benefit from posting the videos and slides afterward.

I decided to give this speaker series a name: Web Exponents, where “exponent” is someone who “actively supports or favors a cause”. There’s a great tagline, too: “raising web technology to a higher power”. I blogged about this on the Google Code Blog:

All the videos are available in the Web Exponents playlist on YouTube. This week I’ll be blogging about Rob Campbell’s talk on Firebug. You can subscribe to the playlist to make sure you catch his and all future videos from these Web Exponents.

Loading Scripts Without Blocking

April 27, 2009 10:49 pm | 47 Comments

This post is based on a chapter from Even Faster Web Sites, the follow-up to High Performance Web Sites. Posts in this series include: chapters and contributing authors, Splitting the Initial Payload, Loading Scripts Without Blocking, Coupling Asynchronous Scripts, Positioning Inline Scripts, Sharding Dominant Domains, Flushing the Document Early, Using Iframes Sparingly, and Simplifying CSS Selectors.

As more and more sites evolve into “Web 2.0” apps, the amount of JavaScript increases. This is a performance concern because scripts have a negative impact on page load times. Mainstream browsers (i.e., IE 6 and 7) block in two ways:

  • Resources in the page are blocked from downloading if they are below the script.
  • Elements are blocked from rendering if they are below the script.

The Scripts Block Downloads example demonstrates this. It contains two external scripts followed by an image, a stylesheet, and an iframe. The HTTP waterfall chart from loading this example in IE7 shows that the first script blocks all downloads, then the second script blocks all downloads, and finally the image, stylesheet, and iframe all download in parallel. Watching the page render, you’ll notice that the paragraph of text above the script renders immediately. However, the rest of the text in the HTML document is blocked from rendering until all the scripts are done loading.

Scripts block downloads in IE6&7, Firefox 2&3.0, Safari 3, Chrome 1, and Opera

Browsers are single threaded, so it’s understandable that while a script is executing the browser is unable to start other downloads. But there’s no reason that while the script is downloading the browser can’t start downloading other resources. And that’s exactly what newer browsers, including Internet Explorer 8, Safari 4, and Chrome 2, have done. The HTTP waterfall chart for the Scripts Block Downloads example in IE8 shows the scripts do indeed download in parallel, and the stylesheet is included in that parallel download. But the image and iframe are still blocked. Safari 4 and Chrome 2 behave in a similar way. Parallel downloading improves, but still isn’t as extensive as it could be.

Scripts still block, even in IE8, Safari 4, and Chrome 2

Fortunately, there are ways to get scripts to download without blocking any other resources in the page, even in older browsers. Unfortunately, it’s up to the web developer to do the heavy lifting.

There are six main techniques for downloading scripts without blocking:

  • XHR Eval – Download the script via XHR and eval() the responseText.
  • XHR Injection – Download the script via XHR and inject it into the page by creating a script element and setting its text property to the responseText.
  • Script in Iframe – Wrap your script in an HTML page and download it as an iframe.
  • Script DOM Element – Create a script element and set its src property to the script’s URL.
  • Script Defer – Add the script tag’s defer attribute. This used to only work in IE, but is now in Firefox 3.1.
  • document.write Script Tag – Write the <script src=""> HTML into the page using document.write. This only loads script without blocking in IE.
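As a sketch of the XHR Injection technique (same-domain scripts only; the URL is illustrative, and older IE would need the ActiveXObject fallback):

```html
<script>
var xhr = new XMLHttpRequest();
xhr.open("GET", "/js/menu.js", true);
xhr.onreadystatechange = function() {
  if (xhr.readyState === 4 && xhr.status === 200) {
    // Inject the downloaded code into the page as an inline script.
    var script = document.createElement("script");
    script.text = xhr.responseText;
    document.getElementsByTagName("head")[0].appendChild(script);
  }
};
xhr.send(null);
</script>
```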

You can see an example of each technique using Cuzillion. It turns out that these techniques have several important differences, as shown in the following table. Most of them provide parallel downloads, although Script Defer and document.write Script Tag are mixed. Some of the techniques can’t be used on cross-site scripts, and some require slight modifications to your existing scripts to get them to work. An area of differentiation that’s not widely discussed is whether the technique triggers the browser’s busy indicators (status bar, progress bar, tab icon, and cursor). If you’re loading multiple scripts that depend on each other, you’ll need a technique that preserves execution order.

Technique | Parallel Downloads | Domains Can Differ | Existing Scripts | Busy Indicators | Ensures Order | Size (bytes)
XHR Eval | IE, FF, Saf, Chr, Op | no | no | Saf, Chr | - | ~500
XHR Injection | IE, FF, Saf, Chr, Op | no | yes | Saf, Chr | - | ~500
Script in Iframe | IE, FF, Saf, Chr, Op | no | no | IE, FF, Saf, Chr | - | ~50
Script DOM Element | IE, FF, Saf, Chr, Op | yes | yes | FF, Saf, Chr | FF, Op | ~200
Script Defer | IE, Saf4, Chr2, FF3.1 | yes | yes | IE, FF, Saf, Chr, Op | IE, FF, Saf, Chr, Op | ~50
document.write Script Tag | IE, Saf4, Chr2, Op | yes | yes | IE, FF, Saf, Chr, Op | IE, FF, Saf, Chr, Op | ~100

The question is: which technique is best? The optimal technique depends on your situation. This decision tree should be used as a guide. It’s not as complex as it looks. Only three variables determine the outcome: whether the script is on the same domain as the main page, whether execution order must be preserved, and whether the busy indicators should be triggered.

Decision tree for optimal async script loading technique

Ideally, the logic in this decision tree would be encapsulated in popular HTML templating languages (PHP, Python, Perl, etc.) so that the web developer could just call a function and be assured that their script gets loaded using the optimal technique.

In many situations, the Script DOM Element is a good choice. It works in all browsers, doesn’t have any cross-site scripting restrictions, is fairly simple to implement, and is well understood. The one catch is that it doesn’t preserve execution order across all browsers. If you have multiple scripts that depend on each other, you’ll need to concatenate them or use a different technique. If you have an inline script that depends on the external script, you’ll need to synchronize them. I call this “coupling” and present several ways to do this in Coupling Asynchronous Scripts.
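For reference, the Script DOM Element technique is just a few lines (the script URL is illustrative):

```html
<script>
// Creating the script element yourself lets the download proceed
// without blocking other resources or rendering, even in older
// browsers, and the script can live on any domain.
var script = document.createElement("script");
script.src = "http://anydomain.com/js/menu.js";
document.getElementsByTagName("head")[0].appendChild(script);
</script>
```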
