Jonathan Hedley

How-to: Optimize your site for speed

Does your website load as quickly as you — and your users — would like? If not, here’s a detailed set of proven guidelines aimed at improving the speed of your site.

The benefits of speed-optimized pages:

  1. Your visitors will be happier, and will feel much more engaged on a snappy site than a slow one. User interface responsiveness is a very large contributing factor to how users trust your content and company. Users trust and enjoy fast sites, and are quickly frustrated by slow sites.
  2. The faster that you can serve content to your visitors, the faster they’ll be off your servers, leaving them free to serve the next visitor. That means that you can handle greater traffic loads, with less hardware.
  3. Smaller overall downloads mean lower bandwidth bills at the end of the month.

You can often get quite massive improvements with just a few tweaks. A few years ago I worked on a project for the Sydney Morning Herald that reduced the time to display the first story on the homepage on a modem from around 17 seconds down to 2 seconds. Broadband connections had a similar relative speed improvement.

Summary

By order of biggest improvements first:

  1. Reduce the total number of HTTP requests
    1. Build style sheet and JavaScript libraries
    2. Combine images into CSS sprites
    3. Enable intelligent caching: send expiry headers, and support cache validation
  2. Support progressive page rendering
    1. CSS at the top of the page, JavaScript at the bottom
    2. Don’t use document.write, and minimise calls to external JavaScript
    3. Use AJAX to load complex secondary page data out of band
  3. Reduce the overall download size
    1. Compress text files (HTML, CSS, and JavaScript) on-the-fly with gzip
    2. Minify JavaScript
  4. Speed up the backend application

That looks pretty straightforward (and maybe you’re thinking that this is all a blinding flash of the obvious): that’s because it is. But these techniques can bring some great improvements, and you might find some interesting ideas in the details.

Methodology

Before you get started: define a goal for the optimization. Having an explicit goal means that you’ll know when you’ve finished, and it gives a target to aim towards. An example goal might be that a user, with a cold cache, can read a story on the homepage within 2 seconds, and the whole page is downloaded within 8 seconds, on an average internet connection, for your users.

Whatever your goal is, make it precise and clear. It’s good to have a stretch goal, but it still needs to be attainable.

As with any optimization, it’s crucial to measure your initial state, and the results of each modification. Otherwise it’s impossible to quantify any improvement. Not all of these tweaks will give the same level of improvement, and depending on your environment, some might not be worth implementing. Measure, and you will be able to make a sound decision.

Start by recording the current download timings of the site or page you’re optimizing: the time to the first main content, the time until the page is interactive, and the time until the download is complete. Repeat each timing run at least 3 times so that a single noisy run doesn’t skew the result. Time both with cold caches (a full reload) and with warm caches.

To give a detailed picture of what the browser is doing, use tools like Firebug’s network tool, Charles Proxy or Wireshark, and review the server logs. It’s important to be able to watch the browser hitting the server in real-time by tailing the logs: it lets you verify that your test has been implemented correctly.

After each tweak, run a set of timings again, and keep a spreadsheet log of what you did and what the impact was.
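A minimal sketch of that timing routine follows. The fetch callable is a placeholder for whatever loads the page in your test setup, and the helper names are made up for this illustration.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(fetch: Callable[[], None], runs: int = 3) -> List[float]:
    """Call fetch() `runs` times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce a list of timings to the figures worth logging."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

Logging the median alongside the minimum and maximum makes it easier to spot a run that was disturbed by something unrelated to the tweak being tested.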

Reduce the number of HTTP requests

On most sites, the major component of download time is not the base HTML file, but the number of subsequent HTTP requests to load the page’s supporting files: the CSS, the JavaScript, the site furniture graphics, the pictures, etc. Each of those is an extra HTTP request, and each unique request takes a relatively long time. The fewer requests to the server that the browser has to make, the faster the page will download.

There is an inherent overhead in each HTTP request. It takes substantially less time to serve one 30K file than it does three 10K files. While HTTP keepalives are useful (and you should ensure they are enabled), they don’t help as much as I had expected.
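As a rough illustration of that overhead, the sketch below models download time as a fixed cost per request plus transfer time. The latency and bandwidth figures are assumptions chosen for the example, not measurements, and the model deliberately ignores parallel connections and keepalives.

```python
def download_time(file_sizes_kb, latency_s, bandwidth_kb_per_s):
    """Estimate seconds to fetch files one after another.

    Assumes each request pays one round trip of latency before data flows,
    and that requests are not made in parallel.
    """
    overhead = len(file_sizes_kb) * latency_s
    transfer = sum(file_sizes_kb) / bandwidth_kb_per_s
    return overhead + transfer


# Assumed figures: 200 ms per request, 100 KB/s of bandwidth.
one_file = download_time([30], latency_s=0.2, bandwidth_kb_per_s=100)
three_files = download_time([10, 10, 10], latency_s=0.2, bandwidth_kb_per_s=100)
```

With these assumed figures the single file comes out at roughly 0.5 seconds and the three files at roughly 0.9 seconds. The bytes transferred are the same; the difference is entirely request overhead.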

Combine files into libraries

Most sites will have a few CSS files, a few JavaScript files, and certainly many graphics that make up the site furniture. Combine the files of each type into a single library.

CSS and JavaScript libraries can be simply created just by concatenating them into one combined file (each, for CSS and for JavaScript, obviously). You can quickly go from 10 or more files (that are needed before much can be shown) down to 2.

Some sites run a hierarchical setup where there are specific CSS and JavaScript files for each level in the hierarchy (for example: core.css, home.css, technology.css, gadgets.css), and the browser needs to load them all to display a deep page. This might have been set up in an attempt at improving cacheability. It is nearly always better to have the browser download one specific file than it is to hope to have some already in cache (and probably have to load two or three extras anyway).

To keep the modularity that comes with splitting these files out by section (or business unit), keep them split in your development process, and combine them in your build process. A simple Ant task will combine them. Alternatively, use custom code to combine the files on the fly, when presented with a URL like core.css,home.css,technology.css.
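The on-the-fly variant might look something like the sketch below. It is an illustration under assumptions, not a finished handler: the function names are invented here, the files are assumed to live in one flat directory, and a real version would also cache the combined output.

```python
from pathlib import Path
from typing import List


def parse_combined_url(name: str) -> List[str]:
    """Split 'core.css,home.css' into file names, rejecting anything unsafe."""
    parts = [p for p in name.split(",") if p]
    for part in parts:
        # Only plain file names are allowed: no directories, no '..'.
        if Path(part).name != part or part in (".", ".."):
            raise ValueError(f"unsafe file name: {part!r}")
    return parts


def combine(source_dir: Path, names: List[str]) -> str:
    """Concatenate the named files, in order, into one library."""
    chunks = []
    for name in names:
        text = (source_dir / name).read_text(encoding="utf-8")
        # A /* ... */ comment is valid in both CSS and JavaScript.
        chunks.append(f"/* {name} */\n{text.rstrip()}\n")
    return "\n".join(chunks)
```

Order is preserved because later CSS rules override earlier ones, so core.css should still come before home.css in the combined file. The file name check matters whenever the list of names comes from a URL.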

For images: use the CSS sprite technique. Briefly, the images are added to one larger image file, and laid out in a convenient way. A CSS background with a specific top and left offset is then used to show each specific graphic where required. This works best for static page furniture; it is difficult to set this up for more dynamic content like news photos.
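To make the offsets concrete, here is a small sketch that generates the CSS rules for a sprite in which the icons are simply stacked vertically. The layout, the class names and the furniture.png file name are assumptions for the example; building the image itself is left to an image tool.

```python
from typing import List, Tuple


def sprite_css(sprite_url: str, icons: List[Tuple[str, int, int]]) -> str:
    """Stack icons vertically in one sprite and emit a CSS rule for each.

    icons is a list of (class_name, width_px, height_px) in sheet order.
    """
    rules = []
    top = 0
    for name, width, height in icons:
        # A negative vertical offset slides the sheet up so that the
        # wanted icon shows through the element's box.
        rules.append(
            f".{name} {{ width: {width}px; height: {height}px; "
            f"background: url({sprite_url}) 0 {-top}px no-repeat; }}"
        )
        top += height
    return "\n".join(rules)


css = sprite_css("furniture.png", [("logo", 120, 40), ("rss", 16, 16)])
```

Here the second icon sits 40 pixels down the sheet, so its rule uses a vertical offset of -40px.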

Make files cacheable

Once you have reduced the total number of unique files required for the page, make what remains cacheable.

Caches mean that files often don’t need to be downloaded at all, and the browser can do a quick check to see if a file has changed since the last time it was fetched, and not retrieve it if it hasn’t changed. And caches aren’t only in the browser: a caching proxy or CDN that’s close to the user can give strong speed improvements too, and serve files to more than one user, reducing your overall bandwidth bills.

But be careful not to rely on caching as a crutch: people always have to visit your site for the first time. Have a look at your revisit ratio and you’ll likely find that most people won’t have a primed cache. Caches are most useful for subsequent page loads within one user session.

Use the Expires and Cache-Control max-age headers for all pages, both dynamically and statically created. The TTL (time to live) that you set will depend on how often the page updates, and how quickly after it does update you want those changes reflected. 5 to 20 minutes is often appropriate. Allowing pages to be cached won’t affect your analytics or ad impressions, as these are best recorded via JavaScript hits that are set to be uncacheable.
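As a sketch of what those two headers look like for a given TTL (the function name is invented for this example, and the current time is assumed to be passed in as a timezone-aware UTC datetime):

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict


def cache_headers(ttl_seconds: int, now: datetime) -> Dict[str, str]:
    """Build Expires and Cache-Control headers for a page with the given TTL."""
    expires = now + timedelta(seconds=ttl_seconds)
    return {
        "Cache-Control": f"max-age={ttl_seconds}",
        # usegmt=True produces the HTTP date format, ending in "GMT".
        "Expires": format_datetime(expires, usegmt=True),
    }


# A 10 minute TTL, in the middle of the suggested 5 to 20 minute range.
headers = cache_headers(600, datetime(2008, 6, 1, 12, 0, 0, tzinfo=timezone.utc))
```

Sending both headers covers older caches that only understand Expires as well as those that prefer max-age.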

Make dynamic pages support the If-Modified-Since request header, and send the Last-Modified date header. This enables cache validation: when a browser goes to render a page that it has in its cache but that has gone past its TTL, it will send a GET request that is conditional on the document’s modification date. If your application doesn’t support that conditional request, it is obliged to send the full document, even if it hasn’t changed.

Practically, the Last-Modified date can be very efficiently determined for most dynamic pages by running a version of the main content query. For example, on a news site, use the date of the most recent news story in the relevant section as the Last-Modified date. Even if you have to run through all of the normal business logic required to generate the page, it still makes sense to short-circuit and send a 304 Not Modified response instead of the page’s HTML if the content hasn’t changed.
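The validation logic can be sketched roughly as follows. This is an illustration rather than a complete handler: conditional_get and its parameters are names made up here, last_modified is assumed to be a timezone-aware UTC datetime (for example the date of the newest story), and render stands in for whatever generates the page.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Callable, Dict, Optional, Tuple


def conditional_get(
    if_modified_since: Optional[str],
    last_modified: datetime,
    render: Callable[[], str],
) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body), skipping the render when unchanged."""
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # Unparseable date: fall through to a full response.
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers, ""
    return 200, headers, render()
```

Because render is only called on the 200 path, an unchanged page costs one cheap date lookup rather than a full page build.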

Use far future expiry headers on static resources (pictures, furniture graphics, CSS, and JavaScript). Setting an expiry date many years into the future means that the browser and proxies can aggressively cache the content, and won’t need to validate the cache.

Obviously, you will want the flexibility to update the page furniture over time. Do this by creating new versions with new filenames — build the date or version number into the file name. A beneficial side effect of this strategy is that you know that as soon as the referencing page is published, visitors will access the updated support files.
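One way to build the version into the file name during a build step is sketched below. The helper name and the version strings are hypothetical; any scheme works as long as a changed file gets a new name.

```python
from pathlib import PurePosixPath


def versioned_name(path: str, version: str) -> str:
    """Insert a version before the extension: css/site.css -> css/site.v42.css."""
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{version}{p.suffix}"))


stylesheet = versioned_name("css/site.css", "v42")
script = versioned_name("site.js", "20080601")
```

The pages that reference these files need to be generated with the same helper, so that the link and the file name always change together.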

Use the Cacheability Engine to test that you have caching and validation set up correctly.

Allow progressive rendering

As the browser downloads the page, give your readers something to see as soon as possible. People perceive time oddly: if they see incremental progress as the page downloads, they will often perceive this as loading much faster than a page that doesn’t show anything until it is 100% complete, even if the total download time is the same.

As browsers download a page, they will do their best to render the content as it comes in. But there are circumstances that make it difficult for the browser to do this.

Load CSS files at the top of the page — from within the head section. Browsers generally won’t render anything until the style sheet has been loaded, so as not to show a flash of unstyled content. The sooner the CSS is loaded, the better.

Conversely, it’s best to load JavaScript files at the bottom of the HTML — just before the closing body tag. When a browser rendering thread comes across a JavaScript source file that has not yet been downloaded and interpreted, it must pause rendering until the JS load is complete. This is because JavaScript files may execute the document.write command, which inserts HTML at the source file’s position.

It’s far better to construct user interface JavaScript that can run once the page HTML has been delivered and rendered, and then the JS can make the appropriate changes to the page DOM. This also has a useful benefit in helping to keep the semantic HTML separate from the UI logic. Note that browsers tend to load JS in priority to images, so even if the JS is at the bottom of the page, it will be loaded in priority to images higher up in the source.

As a rule, don’t use document.write: it stalls the browser renderer until all JavaScript has been downloaded and evaluated, and generally benchmarks much slower than DOM HTML manipulation. In particular, don’t use an inline <script> tag with an external source to fetch ads: rendering will completely stall for each ad until the ad server can return the JavaScript (and this gets even worse when less reliable third-party ad servers are used). This makes your page very reliant on the ad server: if it is slow or not available, your page will be collateral damage. Rather, load ads via iframes, and insert the iframe code itself at the end of the page with JavaScript: this gets them loaded asynchronously to the page.

Use different host names to increase the number of active download threads. Browsers will generally allocate 2 to 4 download threads per host. Serving static resources on different host names will encourage the browser to download more content at once. This can easily be set up with domain name wildcards and virtual hosts. Another benefit of using multiple hosts is that slow connections downloading large files won’t tie up your application server.
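When generating URLs for static resources, the host can be chosen from the resource path, as in the sketch below. The static1 and static2 host names are placeholders, and hashing the path is just one possible way to spread the load.

```python
import zlib
from typing import Sequence


def static_url(path: str, hosts: Sequence[str]) -> str:
    """Pick a host for a resource, always the same one for the same path."""
    # A checksum of the path gives a stable index into the host list.
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return f"http://{hosts[index]}{path}"


HOSTS = ["static1.example.com", "static2.example.com"]
logo_url = static_url("/img/logo.png", HOSTS)
```

Deriving the host from the path, rather than picking one at random, means a given file always has the same URL, so it is cached once instead of once per host name.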

Check the basics: make sure that all images have height and width attributes, so that the browser can lay out the page before the images arrive.

For complex HTML that makes the browser chug, or for secondary data on the page that takes a long time on the backend to generate, consider using an AJAX method to load and display this content out of band, after the core page has been downloaded and rendered. If you use this technique, put a sized placeholder in the core HTML so that the page doesn’t abruptly re-layout when the new content is loaded in.

Reduce overall download size

Smaller files and a smaller overall page size mean that people see the content sooner, so they’re happier; that they get off your network sooner, so your infrastructure can serve the next reader; and that your bandwidth bill is lower at the end of the month. Of course you need to trade those benefits off against the file size required to reach the required level of utility and aesthetic value for the site.

Serve compressed HTML, JavaScript, and CSS files using on-the-fly gzip compression. This incurs a slightly higher server CPU load per page impression, but it gets people off your server and network much sooner. It allows you to serve more page impressions, and the user gets their content much faster.

Nowadays you can safely gzip all of these textual file types in modern browsers, but older browsers will need some handholding: some prefer to only have the HTML compressed. Use Apache’s BrowserMatch directives to set this up.
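If compression is handled in the application rather than in the web server, the decision can be sketched like this. It is a simplified illustration: the function name and the list of compressible types are assumptions, and it does not honour quality values such as "gzip;q=0" or the per-browser exceptions mentioned above.

```python
import gzip
from typing import Dict, Tuple

COMPRESSIBLE = ("text/html", "text/css", "application/javascript")


def maybe_gzip(body: bytes, content_type: str, accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Gzip a text response when the client says it accepts gzip."""
    # Vary tells caches that the response depends on Accept-Encoding.
    headers = {"Content-Type": content_type, "Vary": "Accept-Encoding"}
    accepted = [token.split(";")[0].strip().lower() for token in accept_encoding.split(",")]
    if content_type.split(";")[0].strip() in COMPRESSIBLE and "gzip" in accepted:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return body, headers
```

Images are left out of the compressible list because formats like JPEG and PNG are already compressed, so gzipping them costs CPU for little or no gain.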

Minify your JavaScript — but keep the original source around for editing and debugging. Minification, which effectively compresses the file by removing formatting (and potentially by shortening function and variable names), can bring files down to 60% of their original size. Add gzip compression to that as well and you’re looking at a serious size reduction.

At the extreme end, minify HTML and CSS (remove HTML formatting, trim class names, omit quotes around attribute values where they are optional, etc).

Check the basics: don’t have massive pictures inline, but use thumbnails that link to the full size images.

For image intensive sites, consider not loading images until they are scrolled into view. This saves bandwidth costs on content that is never seen.

Speed up your backend application

I won’t go into detail here, as the biggest improvements in load time tend to be in the client-side downloads rather than the back-end application. But here are a few ideas for speeding up the backend:

Use a CDN if your business model can afford it: particularly for static files and for shareable (public, not personalised) pages.

Use a distributed application object cache (like memcached) to cache SQL / CPU intensive results. A distributed cache means that you maximise how much content you can keep in cache, and thus maximise your hit ratio: each server can dedicate its otherwise free RAM to cache, which the whole cluster can share.
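The usual pattern around such a cache is to check it first and only run the expensive query on a miss. The sketch below uses a small in-process class as a stand-in for the real cache so that it can run on its own; TtlCache and cached are names invented for this example, not memcached’s API.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlCache:
    """In-process stand-in for a memcached-style cache (get/set with expiry)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._items.get(key)
        if entry is None or entry[0] <= self._clock():
            return None  # Missing or expired.
        return entry[1]

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._items[key] = (self._clock() + ttl, value)


def cached(cache: TtlCache, key: str, ttl: float, compute: Callable[[], Any]) -> Any:
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key)
    if value is None:  # Note: a stored None is treated as a miss.
        value = compute()
        cache.set(key, value, ttl)
    return value
```

In a real deployment the get and set calls would go to the shared cache cluster, so a result computed by one server is available to all of them.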

Use Squid or another caching reverse proxy if you need to quickly mitigate traffic load without baking caching into your application.

Use a load balancer (like Perlbal) which distributes traffic according to which server has the fewest active connections.
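The selection rule itself is simple, as this sketch shows. It only illustrates the idea of least-connections balancing; it is not how Perlbal or any particular load balancer is implemented, and the server names are placeholders.

```python
from typing import Dict, Iterable


class LeastConnectionsBalancer:
    """Track active connections per server and route to the least busy one."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._active: Dict[str, int] = {server: 0 for server in servers}

    def acquire(self) -> str:
        """Pick the server with the fewest active connections and count one more."""
        server = min(self._active, key=self._active.get)
        self._active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Record that a connection to the server has finished."""
        self._active[server] -= 1


balancer = LeastConnectionsBalancer(["app1", "app2"])
first = balancer.acquire()
second = balancer.acquire()
```

Unlike plain round robin, this keeps new requests away from a server that is tied up with slow responses.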

And design your apps not to use sessions, or put the session data into your distributed cache, so that any server can handle any request.

Conclusion

These guidelines should give you a head start in speed optimising your site. The benefits are many: happier users, increased serving capacity on your network, and lower bandwidth costs.

Please let me know if you have any other optimization suggestions, what successes you’ve had optimizing your site, or if you have any suggestions for improving this guide.

Copyright © 2008 Jonathan Hedley