23rd
2011
What’s really important about hashbangs — #! # !!!
Recently, a few large sites have been redesigned to use a #! separator in their URL structure, notably Twitter and the entire Gawker suite. In short, it lets them serve their entire site as a single HTML page with a chunk of JavaScript, which then loads the actual content of the page based on the string after the #! separator. This has caused a fairly loud uproar in the internet standards community, because it goes against the standards for URLs, and these URLs break for everyone not using JavaScript. Some background reading, if you’re just coming in:
- Mike Davies: Breaking the Web with hash-bangs
- Tim Bray: Broken Links
- James Aylett: Wisdom comes from deliberate reflection
There’s been some great discussion of this issue so far, but a few things have been missing (or in some cases mentioned in passing but not strongly stressed) from the various pieces of analysis I’ve read. Here’s what I think really matters about this:
- First and foremost, there’s a reason why people are using #! URIs. Arguably, most of what they get out of it can be done by other methods that don’t break as much (using another character, URL rewriting, etc.), but there’s one important piece that can’t. If you want to change the location in the address bar from JavaScript without a reload, you’re only allowed to modify the part after the #. This is a dealbreaker for any other scheme: the browsers simply won’t let you rewrite anything before the # without a page reload. If you want JavaScript navigation between “pages” on your site, you have to use the fragment to store the location info. There’s a mechanism coming that lifts this restriction (the HTML5 history API’s pushState), but for now, the fragment is the only way to do it in all common browsers. Whether building an entire site out of JavaScript content loads is a good idea remains to be seen, but this is how it’s going to be done. Google has encouraged this behavior by endorsing #! as the standard marker in its scheme for indexing dynamic sites. This was a terrible idea, but until Google retracts that standard, people are going to do this.
- It is important to remember that not all URIs (“Uniform Resource Identifiers”) are URLs (“Uniform Resource Locators”). A URL is a special kind of URI that ‘provide[s] a means of locating the resource by describing its primary access mechanism (e.g., its network “location”).’ I would argue that the real problem with a #! is that the URI now contains information that is semantically relevant to the server, but it sits after the fragment separator. As a result, these URIs no longer qualify as URLs, because the routing information they contain can no longer be used to find the resource in question. The fragment identifier is supposed to be ignored until you get to the client: “the identifying information within the fragment itself is dereferenced solely by the user agent, regardless of the URI scheme.” The issue stems from the reality that just about everybody who deals with URIs expects them to also be URLs, and these are not. This is particularly problematic for keeping track of resources on the internet. http://www.example.com/about.html#contact and http://www.example.com/about.html#team are the same resource. http://twitter.com/#!/fields and http://twitter.com/#!/anythingelse are not, but as long as you parse the #! as an ordinary fragment separator, as the spec says you should, they look identical.
- Some have suggested just stripping the #!, but that doesn’t consistently get you the content of the page. It only gets you back to the right place if your client speaks JavaScript. This is a fine way to preserve the links when you’re sharing them, but it is not guaranteed to work for any sort of spidering or indexing. Twitter seems to do the right thing here and gives you an HTML page if you request the URL without the #!, but Gawker doesn’t: they issue a 301 redirect to the #! page, which is useless if you ignore the JavaScript. There is no way to get the content of the page from the #! form. In that case, you have to replace the #! with the ?_escaped_fragment_= form if you want the actual content. The catch is that in order to know to do that, you need to have the #! preserved in the URL and not stripped.
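To make the last two points concrete, here is a minimal Python sketch using only the standard library. The helper names `same_resource` and `escaped_fragment_url` are mine, invented for illustration, and the character escaping is a simplification of what Google’s crawling scheme describes, so treat it as a rough model rather than a reference implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def same_resource(a, b):
    """Compare two URLs the way the URI spec does: ignore the fragment."""
    return urlsplit(a)._replace(fragment="") == urlsplit(b)._replace(fragment="")


def escaped_fragment_url(url):
    """Rewrite a #! URL into its ?_escaped_fragment_= form.

    Assumption: escaping is simplified; the characters listed in `safe`
    are left alone and everything else outside the unreserved set is
    percent-encoded.
    """
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # not a hashbang URL, nothing to rewrite
    value = quote(parts.fragment[1:], safe="/=!:@,;$'()*?")
    extra = "_escaped_fragment_=" + value
    query = parts.query + "&" + extra if parts.query else extra
    return urlunsplit(parts._replace(query=query, fragment=""))


# A spec-following parser cannot tell the two Twitter pages apart.
print(same_resource("http://twitter.com/#!/fields",
                    "http://twitter.com/#!/anythingelse"))
# The rewrite only works while the #! is still in the URL.
print(escaped_fragment_url("http://twitter.com/#!/fields"))
```

Note that `escaped_fragment_url` has nothing to work with once something upstream has stripped the fragment, which is exactly the problem with the “just strip it” advice.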
Unfortunately, there is zero consistency in how these URLs are being treated on the server side. The only sane response I can see to this is to not parse #! (or /#/, which suffers from the same problems) as a fragment separator. This is clearly in violation of the spec, but so is using anything after the fragment separator as location information for the resource. (For the rubyists among you, I forked my own version of the addressable gem to do this, though it won’t be incorporated into the main branch because it doesn’t follow the URI spec.)
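For the non-rubyists, the idea can be sketched in a few lines of Python. This is not the code from my addressable fork; `split_resource` is a hypothetical helper written for this post, and it deliberately breaks the URI spec in the way described above.

```python
def split_resource(url):
    """Split a URL into (resource, fragment), keeping #! and /#/ routes
    as part of the resource instead of treating them as a fragment.

    Assumption: anything after '#!' or '/#/' is routing information.
    """
    base, sep, rest = url.partition("#")
    is_routing = rest.startswith("!") or (
        base.endswith("/") and rest.startswith("/")
    )
    if sep and is_routing:
        return url, ""  # the whole URL identifies the resource
    return base, rest   # an ordinary fragment, split per the spec


print(split_resource("http://twitter.com/#!/fields"))
print(split_resource("http://www.example.com/about.html#contact"))
```

With this split, the two Twitter URLs from earlier come out as different resources, while the two example.com URLs still collapse to the same one.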