Wednesday, June 4, 2008

3 reasons why Hillary lost

Caucus states, Small-donor money, Bill Clinton

An over-simplification but probably the gist of it.

Sunday, February 17, 2008

Untangling REST

Link:

Lately, however, it seems that I spend more time answering people’s questions about REST than I should [...]. I need to start organizing my own correspondence.

When Roy himself talks about REST, you listen.

Subscribed.

Sunday, December 9, 2007

When in doubt, restore

I've had Windows Vista on my laptop for a few months now. For the last couple of days it had been dying on me, one application at a time: no audio in media players, Network List Service failing to start, the network icon showing a big fat red X, and so on.

Googling the issues didn't get me far. So finally I decided to do it - System Restore. It's been behaving well since.

Now if only life had that option :)

Wednesday, December 5, 2007

The next bubble

Classic!

make your elevator pitch
code it up and flip the switch





Monday, June 18, 2007

ArcGIS Server REST and JavaScript APIs announced

REST and JavaScript APIs for the ArcGIS Server were announced at the ESRI UC Plenary Session today.

At 9.3 you can REST-enable your ArcGIS Server services. You can then consume them from the ArcGIS JavaScript library as well as from other clients such as Google Maps, Virtual Earth and the like.

While at the UC, you can learn more about these APIs at the Server Road Ahead sessions, the EDN sessions, Advanced ArcGIS Online Sessions and also at the Java SIG.

Tuesday, June 5, 2007

HTTP Content Optimization

You have your business engine all set up. Your processing algorithms have been optimized to the hilt. Your data model is as scalable as any. And now you are publishing your data to the WWW. Well, the good news is you can still do more - all with plain old HTTP.

Herein I list 3 simple ways you can leverage HTTP to help you better serve your content:

  1. Cache-Control headers
  2. ETag + If-None-Match
  3. gzip

1. Cache-Control headers

Cache-Control is by far the most widely used of all HTTP headers, and for good reason. You generate your response and set a Cache-Control response header with a validity period. The clients, the intermediaries and the web infrastructure at large all work overtime for you, caching your content for as long as you have tagged it valid.

This of course works best for static resources, or for dynamic resources whose validity you can reasonably predict beforehand.
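
To make #1 concrete, here is a minimal sketch of building such a header value. The class and method names (CacheControlSketch, publicMaxAge) are made up for this post and belong to no library; only the public and max-age directives are standard HTTP.

```java
// Hypothetical helper for illustration; not part of any servlet or HTTP library.
final class CacheControlSketch {

    // Builds a Cache-Control value that lets browsers and shared caches
    // reuse the response for the given number of seconds.
    static String publicMaxAge(long validitySeconds) {
        if (validitySeconds < 0) {
            throw new IllegalArgumentException("validity must not be negative");
        }
        return "public, max-age=" + validitySeconds;
    }

    public static void main(String[] args) {
        // Assumption: a stylesheet that we expect to stay valid for one hour.
        System.out.println("Cache-Control: " + publicMaxAge(60 * 60));
    }
}
```

You would set the returned string as the value of the Cache-Control response header in whatever server framework you use.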

2. ETag + If-None-Match

This is one of the most powerful but, unfortunately, least frequently used techniques. Say your content is such that you can't reasonably predict its validity period, which means #1 doesn't work for you. Your next best buddy is the ETag.

This is how it works: you generate your content and set the HTTP response header ETag (entity tag). The ETag represents the state of your resource: if even one bit of the resource content changes, so does its ETag. You can think of the ETag as a simple hash of your content.

So you have set the ETag and sent the response. The next time the client requests the same URL, it sends an If-None-Match request header set to the ETag value it received. You either regenerate the content or read it from a cache on your server, and work out its current ETag. If that ETag matches the If-None-Match value, the content has not changed, and the client has told you through If-None-Match that it already has it. So what do you do? Send nothing! Simply set the response status to HTTP 304 (Not Modified) and send an empty body. At a minimum you save bandwidth (read: performance), and if you have cached the content on your server you also save computation.
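
The exchange above can be sketched in a few lines of Java. This is only an illustration under my own assumptions: the names ETagSketch, etagFor and statusFor are hypothetical, SHA-256 is just one possible hash, and the comparison handles a single tag in If-None-Match (the header may also carry a list of tags or *, which this sketch ignores).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of any servlet API.
final class ETagSketch {

    // Derives a quoted entity tag from a hash of the content, so any
    // change in the bytes yields a different tag.
    static String etagFor(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return "\"" + hex + "\"";
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Returns 304 when the tag sent by the client matches the current
    // content, otherwise 200 (meaning: send the full body again).
    static int statusFor(String ifNoneMatch, byte[] currentContent) {
        return etagFor(currentContent).equals(ifNoneMatch) ? 304 : 200;
    }
}
```

On a 200 you send the body along with the fresh ETag header; on a 304 you send the headers only.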

3. gzip

With #1 and #2 you benefit by not having to resend your content in certain situations. But even when you can't get away from having to send content, you can still gain in bandwidth by simply compressing it. But of course you want to compress content only for clients that you know can decompress it and even then you need to tell the client that the content you sent is compressed and it needs to decompress it.

No hassle - HTTP makes it fairly straightforward. If the client understands gzip, it sends an Accept-Encoding request header with the string gzip in it. If you (the server) read this header and find the string gzip, you gzip your content and set the Content-Encoding response header to gzip. This tells the client that the content is gzipped and that it must decompress it before providing it to the user.
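
Here is a rough sketch of that negotiation using the JDK's own java.util.zip classes. GzipSketch and its method names are hypothetical, and the header check is deliberately simplified: it ignores quality values, so a client sending gzip;q=0 (meaning "do not send gzip") would be misread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only.
final class GzipSketch {

    // True when the Accept-Encoding header lists gzip (q-values are ignored).
    static boolean acceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        for (String token : acceptEncoding.split(",")) {
            String name = token.split(";", 2)[0].trim();
            if (name.equalsIgnoreCase("gzip")) {
                return true;
            }
        }
        return false;
    }

    // Server side: compress the body before sending it with Content-Encoding: gzip.
    static byte[] gzip(byte[] content) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Client side: what a browser does when it sees Content-Encoding: gzip.
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

In a real server you would call gzip only when acceptsGzip returns true, and send the plain bytes otherwise.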

It's normal for gzip to compress text content by upwards of 70%, and given how easy it is to compress content you should be compressing yours right about now.

Note that #2 + #3 put together have problems in IE 6 and you might have to take that into account.

But all in all, optimizing the delivery of your content with HTTP is simple yet powerful, and your web application can only benefit from it.

Friday, May 25, 2007

IE 6: gzip + ETag != If-None-Match

ETag + If-None-Match give you the benefit of not having to send unmodified content repeatedly (HTTP 304). IE6 handles this well.

Content-Encoding with gzip gives you the benefit of compressing the content that you send. IE6 handles this well too.

So gzip + ETag / If-None-Match should give you the combined benefit of sending compressed content when you must and not sending content at all if it's not modified. Well, as you might have already guessed, IE6 does not handle this well. If your content is gzipped and you send an ETag header as well, IE6 does not send an If-None-Match on subsequent requests. Which of course means that you can't leverage HTTP 304.

So if you are servicing IE6 clients beware that it supports either compression or ETags but not both.

Thankfully, this has been fixed in IE7. Firefox of course just works.

Friday, May 18, 2007

More Resources == Scalable

Link: Sam Ruby on Google Maps

Of course, the rest of the iceberg was that Google had simply tiled the Earth. In so doing, they converted a single web service (call me with a bunch of information, and I will provide you with a custom result) into a large number of individually addressable, cacheable, and scalable web resources.

That's it. The more resources you have, the more opportunity there is to use Cache-Control headers, to use ETags, and to distribute and load-balance the system.

In the same article, Sam also talks about how the web is not a service but a space. And in today's world, adding more "space" will scale your system far more than implementing a state-of-the-art service with the most optimal algorithm. Processor speeds have flattened. Today it's about dual-cores, quad-cores, (your-budget)-cores. The more cores your program can use to get the job done, the more scalable your system will be.

As Brian Goetz puts it: "Tasks must be amenable to parallelization". Parallelization comes for free with every resource you add. So add more resources and see your system scale.

Sunday, May 13, 2007

JavaOne Days 3-4: Mashups and garbage collection

If you are surprised that I am talking about mashups and garbage collection in the same post, well, so was I. But if one can talk about implementing servers in JavaScript, I most definitely can talk about mashups and garbage collection in the same breath.

Day 1 was primarily scripting, day 2 hardcore Java. Days 3 and 4 saw sessions on the entire gamut of Java technologies - from garbage collection to mashups. I discuss some of them here.

Blueprints for Mashups

This was arguably the most informative session for me at this year's JavaOne. Kudos to Greg, Mark and Sean for putting together a to-the-point, practical and readily-usable session. Personally, this session validated the REST and JSON concepts I had gathered over the past few months. I'd recommend it to anybody interested in building mashups, REST services, AJAX and a whole lot more. (No, they haven't paid me to say this.)

Anybody building REST services should design their system with JavaScript in mind. In today's mashup world the browser (and hence JavaScript) is your first class client. XML and JSON are the typical content types returned by REST services. To get over the browser's cross-domain restrictions, there are 2 possible solutions:

  • Server-side proxy: Clients always send requests to the server that is hosting the page. The server in turn acts as a proxy, sends your request to the (remote) mashup service and returns the data it gets from the mashup service to you.
  • Dynamic script tags: Browsers allow script tags to communicate with cross domain servers. This opens up the opportunity for you to issue requests to any mashup service by generating dynamic script tags.

Consider the Atom format if you are serving XML. This opens up your service to the large number of feed readers and other Atom clients out there.

JSON has of course gained immense popularity of late. Although JSON is technically a serialized JavaScript object, the JSON format is highly portable and language independent. In addition to returning JSON, you can also support wrapping the JSON object in a JavaScript callback method (they called it jsonp - JSON with padding). This enables clients to specify a callback method which can readily work with the JSON that the server returns.
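
On the server, the jsonp wrapping might look something like the sketch below. JsonpSketch and render are names I made up, and the rule for what counts as a safe callback name is my own conservative assumption, not something from the session.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class JsonpSketch {

    // Assumption: only plain (optionally dotted) identifiers are allowed as
    // callback names, so a caller cannot inject arbitrary script.
    private static final Pattern SAFE_CALLBACK =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$.]*");

    // Returns plain JSON when no callback is requested, otherwise the JSON
    // wrapped in a call to the client-specified callback function.
    static String render(String json, String callback) {
        if (callback == null || callback.isEmpty()) {
            return json;
        }
        if (!SAFE_CALLBACK.matcher(callback).matches()) {
            throw new IllegalArgumentException("unsafe callback name: " + callback);
        }
        return callback + "(" + json + ");";
    }
}
```

A page that loads the service URL through a dynamic script tag with a callback parameter then has its own function invoked with the data as soon as the script arrives.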

Various options are available for securing your REST services:
  • User tokens
  • Session based hash
  • URL based API key
  • Authentication headers

JavaScript best practices:
  • Use namespaces
  • Use CSS for applying styles
  • Don't add to the prototype of common JavaScript objects
  • Setting the innerHTML property is easier / better than DOM manipulation

Components of a ("good") Mashup API / library:
  • A server-side service
  • A client-side JavaScript API / library
  • A client-side CSS for applying styles
  • Document the API
  • Create simple examples enabling a simple cut-and-paste approach to learning your API

As you can see, they have, true to the name, provided blueprints for mashups, best practices, possible hurdles and workarounds, problems and solutions. Once again - highly recommended for anybody who has / will have / may have anything to do with mashups.

Phobos - A Server-side JavaScripting framework

While JS clients are ubiquitous and obviously here to stay, for the life of me, I have not yet come to terms with implementing my server in JS as well. The 2 reasons they cited - impedance mismatch and all-you-need-is-an-F5 to redeploy - didn't quite do it for me. Maybe in another life I'll grow up to JavaScript servers, but not for now. And if that day were ever to come, there should be a JavaScriptOne and no JavaOne.

Garbage collection

If there was one objective of this session, it was to clear up the GC myths out there. Some of the finest Java minds were refuting the urban legends, and when they talk you listen:
  • Object allocation is cheap. Reclaiming young objects is also cheap.
  • Small, short-lived immutable objects are good. Large, long-lived mutable objects are bad.
  • Nulling references rarely helps - except when it comes to arrays.
  • Avoid finalizers - in most cases there are better alternatives possible.
  • Avoid System.gc() - except between well-defined application phases and when the load on your system is low.
  • Object pooling is not required in today's VMs. It is a legacy of older VMs. Exceptions are objects that are expensive to allocate / initialize or objects that represent scarce resources.
  • Consider using reference-objects for limited interactivity with the garbage collector.

Certain memory-leak pitfalls:
  • Objects in wrong scope
  • Lapsed listeners
  • Metadata mismanagement

They strongly advocated FindBugs for finding such pitfalls as well as other potential bugs.

JavaOne Day 2: Hardcore Java and Java EE 5

While Day 1 saw an overdose of scripting, sanity made a return on Day 2. Although JavaScript technology (whoever came up with that) sessions continued, there was a good supply of deep-dive Java and Java EE sessions to keep me busy. And there was also some WADLing.

Generics and Collections

Generics have been around for a while and most Java programmers have some understanding of them. There have been many criticisms of the erasure-based implementation of generics: since type parameters are not reified, constructs that require type information at runtime don't work well. However, erasure provides 2 important benefits - migration compatibility and piecewise generification of existing code. Just these benefits make erasure a necessary bane.

Huge additions have been made to the Collections framework in Java 5 and 6. So much so that many recommend that programmers never use arrays - Collections all the way!

Concurrency Practices

The concurrency classes introduced in Java 5 are a great new toolset for the Java programmer. New concurrency related annotations such as @ThreadSafe, @GuardedBy, etc. are being considered. One important aspect to keep in mind - imposing locking requirements on external code is asking for trouble. Ensure that you make your code threadsafe yourself. Performance penalties incurred by synchronized constructs are overblown.

Immutable objects are your friends - final is the new private! They are automatically threadsafe. Object creation is cheap. Aim for less and less mutable state.

Performance is now a function of parallelization - write code such that it can use more cores to get the job done faster. So to improve scalability find serialization in your code and break it up:

  • Hold locks for less time
  • Use AtomicInteger for counters
  • Use more than one lock
  • Consider using ConcurrentMap
  • Consider ThreadLocal for heavyweight mutable objects that don't need to be shared
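
A couple of these tips fit in one small sketch: a hit counter that uses a ConcurrentMap of AtomicInteger values instead of one big lock. The class HitCounterSketch and the /index path are my own inventions for illustration, not something shown in the session.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class for illustration only.
final class HitCounterSketch {

    // One atomic counter per path; no synchronized block needed.
    private final ConcurrentMap<String, AtomicInteger> hits = new ConcurrentHashMap<>();

    void record(String path) {
        hits.computeIfAbsent(path, p -> new AtomicInteger()).incrementAndGet();
    }

    int count(String path) {
        AtomicInteger counter = hits.get(path);
        return counter == null ? 0 : counter.get();
    }

    // Starts the given number of threads, each recording the same path
    // hitsPerThread times, and returns the final count once all have finished.
    static int countAfterParallelHits(int threads, int hitsPerThread) {
        HitCounterSketch counter = new HitCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < hitsPerThread; j++) {
                    counter.record("/index");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.count("/index");
    }
}
```

With a plain int and no locking, updates from different threads could be lost; here every increment is atomic, so no hit goes missing.
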
Effective Java

The Builder pattern allows you to construct objects cleanly, in a valid state, with both required and optional parameters. The basic premise can be expressed in code like so:
MyObject myobj = new MyObject.Builder(requiredParam1, requiredParam2).optionalParam1(opt1).optionalParam2(opt2).build();
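
For the curious, here is one way the class behind that call could look. The parameter types (String and int), the defaults and the getters are all my assumptions; only the names MyObject, Builder and the parameter names come from the line above.

```java
// Sketch of a class supporting the call above; types and defaults are assumed.
final class MyObject {
    private final String requiredParam1;
    private final String requiredParam2;
    private final int optionalParam1;
    private final int optionalParam2;

    // Only the Builder can create instances, so every MyObject is fully set up.
    private MyObject(Builder builder) {
        this.requiredParam1 = builder.requiredParam1;
        this.requiredParam2 = builder.requiredParam2;
        this.optionalParam1 = builder.optionalParam1;
        this.optionalParam2 = builder.optionalParam2;
    }

    static final class Builder {
        private final String requiredParam1;
        private final String requiredParam2;
        private int optionalParam1 = 0; // assumed default
        private int optionalParam2 = 0; // assumed default

        // Required parameters go through the constructor.
        Builder(String requiredParam1, String requiredParam2) {
            this.requiredParam1 = requiredParam1;
            this.requiredParam2 = requiredParam2;
        }

        // Optional parameters are chainable setters.
        Builder optionalParam1(int value) {
            this.optionalParam1 = value;
            return this;
        }

        Builder optionalParam2(int value) {
            this.optionalParam2 = value;
            return this;
        }

        MyObject build() {
            return new MyObject(this);
        }
    }

    String getRequiredParam1() { return requiredParam1; }
    String getRequiredParam2() { return requiredParam2; }
    int getOptionalParam1() { return optionalParam1; }
    int getOptionalParam2() { return optionalParam2; }
}
```

A nice side effect is that MyObject itself can be immutable: all its fields are final and there are no setters.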

Generics - avoid raw types in new code. Don't ignore compiler warnings. Annotate local variables with @SuppressWarnings rather than the entire method / class. Use bounded wildcards to increase the applicability of APIs. 3 take-home points for parameterized methods:
  • Parameterize using <? extends T> for reading from a Collection
  • Parameterize using <? super T> for writing to a Collection
  • Parameterize using <T> for methods that both read and write

If a type variable appears only once in a method signature then consider using a wildcard. Avoid using bounded wildcards in return types. Generics and arrays don't mix well - consider using generics wherever possible.
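
The three take-home points can be shown with three tiny methods. WildcardSketch and its method names are hypothetical examples of mine, not code from the session.

```java
import java.util.List;

// Hypothetical examples for illustration only.
final class WildcardSketch {

    // Reads from the collection only, so <? extends Number> accepts
    // List<Integer>, List<Double>, List<Number> and so on.
    static double sum(List<? extends Number> source) {
        double total = 0;
        for (Number n : source) {
            total += n.doubleValue();
        }
        return total;
    }

    // Writes to the collection only, so <? super Integer> accepts
    // List<Integer>, List<Number> and List<Object>.
    static void fill(List<? super Integer> sink, int count) {
        for (int i = 0; i < count; i++) {
            sink.add(i);
        }
    }

    // Both reads and writes, so it needs an exact type parameter <T>.
    static <T> void swapFirstTwo(List<T> list) {
        T first = list.get(0);
        list.set(0, list.get(1));
        list.set(1, first);
    }
}
```

Without the wildcards, sum would reject a List<Integer> and fill would reject a List<Number>, even though both calls are perfectly safe.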

Java EE 5 Blueprints

Filter.doFilter() may run in a different thread than the Servlet.service() method. The Servlet API does not use NIO, however it is possible to implement it yourself. New annotations may make web.xml optional.

The JSP and JSF EL have been unified. # is now reserved in JSP 2.1. The javax.faces.ViewState hidden field (for client side state) will be standardized. Include this field in your AJAX postback. Application-wide configuration of JSF resource bundles will be possible in faces-config.xml. The <f:verbatim> tag is no longer needed to interleave HTML and JSF content. @PostConstruct and @PreDestroy annotations will be supported for JSF managed beans.

WADL

WADL - Web Application Description Language. As a colleague of mine put it - it's WSDL for REST! And IMO that's almost what it is. WADL is introduced with good intent: a formal definition of your REST resources lets you automatically stub out code in your favorite language and aids testing. However, WADL has many rough edges (overloaded / loosely typed query parameters, security, non-standard content negotiation, among others) and I don't see myself using it until those have been addressed.