Tuesday, June 5, 2007

HTTP Content Optimization

You have your business engine all set up. Your processing algorithms have been optimized to the hilt. Your data model is as scalable as any. And now you are publishing your data to the WWW. Well, the good news is you can still do more, all with plain old HTTP.

Herein I list 3 simple ways you can leverage HTTP to help you better serve your content:

  1. Cache-Control headers
  2. ETag + If-None-Match
  3. gzip

1. Cache-Control headers

Cache-Control is by far the most widely used of the HTTP caching headers, and for good reason. You generate your response and set a Cache-Control response header with a validity period (for example, max-age in seconds). Clients, intermediaries and the web infrastructure at large then work overtime for you, caching your content for as long as you have tagged it valid.

This of course works best for static resources, or for dynamic resources whose validity you can reasonably predict beforehand.
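To make the idea concrete, here is a minimal Python sketch of building such headers. The function name caching_headers and its parameters are my own invention for illustration, not part of any framework, and the Expires header is included only as a companion for older caches.

```python
import time
from email.utils import formatdate


def caching_headers(max_age, public=True, now=None):
    """Return response headers marking content valid for max_age seconds.

    caching_headers is a hypothetical helper, not a library function.
    """
    now = time.time() if now is None else now
    # "public" lets shared caches (proxies, CDNs) store the response;
    # "private" restricts caching to the end client.
    scope = "public" if public else "private"
    return {
        "Cache-Control": f"{scope}, max-age={max_age}",
        # Expires carries the same validity period as an absolute HTTP date.
        "Expires": formatdate(now + max_age, usegmt=True),
    }


if __name__ == "__main__":
    # Tag a response as valid for one hour.
    for name, value in caching_headers(3600).items():
        print(f"{name}: {value}")
```

You would merge the returned dictionary into whatever response object your server stack gives you.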

2. ETag + If-None-Match

This is one of the most powerful techniques, but unfortunately not a frequently used one. Suppose your content is such that you can't reasonably predict its validity period, which means #1 doesn't work for you. Your next best buddy is the ETag.

This is how it works: You generate your content and you set the HTTP response header ETag (entity tag). The ETag represents the state of your resource. If even one bit of the resource content changes, so does its ETag. You can think of the ETag as a simple hash of your content.

OK, so you have set the ETag and sent the response. The next time the client requests the same URL, and it still holds a cached copy, it sends an If-None-Match request header set to the value of your ETag. You can either regenerate the content or have it cached on your server. Either way, if the current ETag of your content matches the If-None-Match value, the content has not changed, and the client has told you through If-None-Match that it already has it. So what do you do? Send nothing! Simply set the response status to HTTP 304 (Not Modified) and leave the response body empty. At a minimum you gain in saved bandwidth (read: performance), and if you have cached the content on your server, you also gain from saved computation.
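The exchange above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: make_etag and respond are hypothetical names, the ETag is a truncated SHA-256 hash of the body (any scheme that changes when the content changes would do), and the response is modeled as a plain (status, headers, body) tuple rather than a real server object.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from the content (assumed scheme: SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the client copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may list several tags separated by commas, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # Drop the weak-validator prefix so W/"abc" still matches "abc".
        candidates = [t[2:] if t.startswith("W/") else t for t in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # Not Modified: no body is sent
    return 200, headers, body


if __name__ == "__main__":
    status, headers, _ = respond(b"<html>hello</html>", None)
    print(status, headers["ETag"])                # first visit: full response
    status, _, payload = respond(b"<html>hello</html>", headers["ETag"])
    print(status, len(payload))                   # revisit: 304 with empty body
```

Note that this sketch still has the body in hand when it computes the ETag. To also save computation, as described above, you would keep the current ETag cached on the server and compare against that before generating anything.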

3. gzip

With #1 and #2 you benefit by not having to resend your content in certain situations. But even when you can't get away from having to send content, you can still save bandwidth by simply compressing it. Of course, you want to compress content only for clients that you know can decompress it, and even then you need to tell the client that the content you sent is compressed and that it needs to decompress it.

No hassle: HTTP makes it fairly straightforward. If the client understands gzip, it sends an Accept-Encoding request header that lists gzip. If you (the server) read this header and find gzip in it, you gzip your content and set the Content-Encoding response header to gzip. This tells the client that the content is gzipped and that it must decompress it before providing it to the user.
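Here is a rough Python sketch of that negotiation, using the standard gzip module. The names accepts_gzip and maybe_gzip are hypothetical, and the header parsing is deliberately simple: it only looks for an explicit gzip entry and treats q=0 as a refusal. The Vary header is my addition, so that caches keep compressed and uncompressed copies apart.

```python
import gzip
from typing import Dict, Tuple


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value lists gzip with a non-zero q."""
    for item in accept_encoding.split(","):
        coding, _, params = item.partition(";")
        if coding.strip().lower() != "gzip":
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0  # "gzip;q=0" means "do not send gzip"
            except ValueError:
                return False
        return True
    return False


def maybe_gzip(body: bytes, accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
    """Compress body only when the client advertised gzip support."""
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    return body, headers


if __name__ == "__main__":
    page = b"<p>plain old HTTP</p>\n" * 200
    payload, headers = maybe_gzip(page, "gzip, deflate")
    print(headers, len(page), "->", len(payload), "bytes")
```

In practice most web servers can do this for you through configuration, so a hand-rolled version like this is mainly useful for seeing what happens on the wire.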

It's normal for gzip to shrink text content by 70% or more, and given how easy it is to compress content, you should be compressing yours right about now.

Note that combining #2 and #3 has problems in IE 6, and you might have to take that into account.

But all in all, optimizing the delivery of your content with HTTP is simple yet powerful, and your web application can only benefit from it.


Anonymous said...

Huzzah! I'm happy to see someone recommend these things, and not somehow imply they're only available through REST.

sgillies said...

REST is an architectural style, not a protocol, but "REST" almost always means HTTP REST, and REST advocates simultaneously promote best HTTP practices.

Anonymous said...

True. Since Fielding chose to explain HTTP through REST, it's natural that REST and HTTP are often spoken of in the same breath. Which is all right as long as one understands that one is a set of principles and the other is the way messages are exchanged on the web.

Anonymous said...

Do you have to configure anything extra on IIS to enable gzip for ESRI REST services, or is it enabled by default through ESRI REST Services?

Twik said...

Very informative blog. You shared very nice tips on content optimization. Thanks for sharing valuable information.