2 posts on Urls

On URL readability

2 min read 0 comments Report broken page

Yesterday, I was watching some season 6 episodes of Futurama (btw, this is their best season ever!) and I noticed the URLs in the website I was in (let’s call it foo.com). They were like:


I thought to myself “hey, this looks very clean and readable”. And then I noticed that it only has 1 less character than its non-rewritten counterpart:


However, I’m pretty sure you agree that the second one is much harder to read. I asked for opinions on twitter, and got many interesting replies. Apart from the ones that completely missed the point, these were the core explanations:

  • = and especially & are more complex and look more like letters, so our brain has trouble tuning them out (@feather @robert_tilt @rexxars @mrtazz @manchurian)
  • Slashes have more whitespace around them, so they are less obtrusive (@feather @stevelove @kenny1987 @janl)
  • They’re all visual noise, but we always have slashes in a URL, so using the slash to separate keys and values as well only introduces 1 separator instead of 3 (@bugster @craigpatik @nyaray)
  • Slashes imply hierarchy, which our brains process easier than key-value pairs. Key-value pairs could be in any order, paths have a specified order. (@sggottlieb @edwelker @stevenhay @jwasjsberg @stazybohorn)
  • Ampersands and equal signs are harder to type than slashes. They’re both in the top row and ampersands even need the Shift key as well. (@feather)
  • Ampersands and equal signs have semantic meaning in our minds, whereas slashes not as much (@snadon)

Regarding hierarchy and RESTful design, the first example isn’t exactly correct. If it was hierarchical, it should be foo.com/futurama/seasons/6/episodes/9. As it currently stands, it’s key-value pairs, masquerading as hierarchical. However, it still reads better.

So I’m leaning towards the first three explanations, although I think all of them have a grain of truth. Which makes me wonder: Did we choose the wrong characters for our protocol? Could we have saved ourselves the hassle and performance overhead of URL rewriting if we were a bit more careful in choosing the separators back then?

Also, some food for thought: Where do you think the following URLs stand in the legibility scale?



http : //foo.com/futurama-season-6-episode-9  (suggested by Ben Alman)

Do you think there are there any explanations that I missed?

Get your hash — the bulletproof way

3 min read 0 comments Report broken page

This is probably one of the things that everyone thinks they know how to do but many end up doing it wrong. After coming accross yet one more super fragile snippet of code for this, I decided a blog post was in order.

The problem

You want to remove the pound sign (#) from location.hash. For example, when the hash is "#foo", you want to get a string containing "foo". That’s really simple, right?

Tricky cases

What most developers seem to miss is that in modern, JavaScript-heavy applications, a hash can contain any unicode character. It doesn’t necessarily have to correspond to the value of an actual id attribute in the page. And even when it does, ID attributes can now contain almost any unicode character. Another thing sometimes forgotten is that there might be no hash in the page. Even in a URL that ends in #, location.hash is actually equal to "" (the empty string) and not "#".

Naive approaches

This one is the most recent, found in a book I was tech reviewing:

var hash = location.hash.match(/#(\w+)/)[1];

which has quite a few issues:

  • Returns wrong results when there is any non-latin or non-alphanumeric character in the hash. For example, for the hash #foo@o#bar$%huh hello, just "foo" would be returned.
  • Throws a TypeError when location.hash is empty, since .match() will return null.

Other variations of this pattern I’ve seen include using explicitly defined character classes instead of \w, adding an anchor (^) before the pound sign (which is an excellent idea for performance) and checking if .match() actually returned something before using its result. However, they usually also fall into at least one of the 2 aforementioned issues.

Another approach a friend of mine once used was this:

var hash = location.hash.split(‘#’)[1];

This also has its issues, which are ironically less than the first one, even though it seems a far more naive approach.

  • With the same test hash, it would at least get the "foo@o" part, which means it only fails when the hash contains a pound sign
  • When there’s no hash, it doesn’t throw an error, although it returns undefined instead of the empty string.

Getting it right

The approach I usually use is far simpler than both of the above and probably looks too loose:

var hash = location.hash.substring(1);

However, let’s examine it a bit:

  • With our weird test hash, it actually returns the correct result: “foo@o#bar$%huh hello”
  • When no hash exists, it correctly returns the empty string

“But it assumes there’s a pound sign at the start of the string!” I almost hear some of you cry. Well, that could be a real concern, if we were dealing with an arbitrary string. In that case, we would have to check if there’s actually a pound sign first or if the string even exists. However, with location.hash the only case when that is not true, is when there is no hash. And we got that case covered. ;)

Edit: As pointed out in the comments, you may also use location.hash.slice(1) instead of substring. I kinda prefer it, since it’s 4 bytes shorter.

If however you’re obsessed with RegExps and want to do it with them no matter what, this is just as bulletproof and almost as short:

var hash = location.hash.replace(/^#/, ‘’);

If for some reason (OCD?) you want to do it with .match() no matter what, you could do this:

var match = location.hash.match(/^#?(.*)$/)[1];

In that case, since the pound sign is optional, since .match() never returns null. And no, the pound sign never erroneously becomes part of the returned hash, because of the way regex engines work.

“This is too basic, what a waste of my time!”

Sorry for that. I know that for some of you, this is elementary. But the guy who wrote that book is very knowledgable (the book is really good, apart from that code snippet) so I thought this means there are many good developers out there who get this wrong, so this post was needed to be written. If you’re not one of them, you can take it as a compliment.

“Hey, you missed something too!”

In that case, I’d love to find out what it is, so please leave a comment! :)