Case Sensitivity in URLs

I am an open source guy, so naturally I prefer Apache, the flagship OSS web server, to the proprietary Microsoft IIS. But there is one area where IIS does a better job than Apache: the case sensitivity of URLs.

Are URLs Case Sensitive?

They should not be, but they sometimes are. Domain names are not case sensitive: for example, http://www.apache.org/ and http://WWW.Apache.Org/ go to the same location. But on the LAMP platform, the path is case sensitive…

With the Microsoft IIS server, however, this is not true: try…
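
To make the distinction concrete, here is a minimal Python sketch that normalizes only the case-insensitive parts of a URL (the scheme and the host) and leaves the path alone. The function name `normalize_case` is hypothetical. It uses only the standard `urllib.parse` module and, to keep things short, drops any user info and does not handle IPv6 literal hosts.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(url: str) -> str:
    """Lowercase the scheme and host of a URL; keep path, query and fragment as-is.

    Simplifying assumptions: user info (user:password@) is dropped and
    IPv6 literal hosts are not handled.
    """
    parts = urlsplit(url)
    # urlsplit already lowercases the scheme; .hostname returns the host in lowercase.
    host = parts.hostname or ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment))


# Same host written in different case: the normalized forms match.
print(normalize_case("http://WWW.Apache.Org/"))  # http://www.apache.org/
# Paths that differ only in case are left untouched, so these still differ.
print(normalize_case("http://example.com/MyWebPage/Index"))  # http://example.com/MyWebPage/Index
print(normalize_case("http://example.com/mywebpage/index"))  # http://example.com/mywebpage/index
```

Whether the two path spellings reach the same resource is left entirely to the server.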

Reason: Linux Filesystem is Case Sensitive

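The root cause is that Linux filesystems are case sensitive, while the FAT32 and NTFS filesystems used by Windows are not. Since Apache maps the path part of a URL directly onto a file path under its document root, the filesystem's case rules become the URL's case rules.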
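
On a case-sensitive filesystem, case-insensitive URLs would need an extra lookup step between the URL path and the file. Here is a minimal Python sketch of one such step, using a hypothetical helper `resolve_case_insensitive` that matches each path component against the directory entries regardless of case, and gives up when the match is missing or ambiguous. It is only an illustration, not how Apache itself resolves files.

```python
import os
import tempfile
from typing import Optional


def resolve_case_insensitive(root: str, url_path: str) -> Optional[str]:
    """Map a URL path onto a file under root, ignoring case in each component.

    Returns the real filesystem path, or None if a component is missing or if
    more than one directory entry matches it case-insensitively (ambiguous).
    """
    current = root
    for component in url_path.strip("/").split("/"):
        if not component:
            continue
        try:
            entries = os.listdir(current)
        except (NotADirectoryError, FileNotFoundError):
            return None  # tried to descend into a regular file, or it vanished
        matches = [e for e in entries if e.lower() == component.lower()]
        if len(matches) != 1:
            return None  # not found, or ambiguous (e.g. both "Index" and "index")
        current = os.path.join(current, matches[0])
    return current


with tempfile.TemporaryDirectory() as docroot:
    os.makedirs(os.path.join(docroot, "MyWebPage"))
    open(os.path.join(docroot, "MyWebPage", "Index"), "w").close()
    found = resolve_case_insensitive(docroot, "/mywebpage/index")
    print(found == os.path.join(docroot, "MyWebPage", "Index"))  # True
```

Note the cost: every request now lists directories instead of opening one path, and a directory holding both "Index" and "index" has no single right answer.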

Dynamic URLs

Dynamic (also called friendly or clean) URLs are now appearing in many CMS tools. A good example of this is the ‘permalink structure’ in WordPress. Whether these dynamic URLs are case sensitive depends on the software, not on the filesystem. In WordPress they are case insensitive. Del.icio.us and TinyURL also use case-insensitive URLs. But a tool can just as easily make its URLs case sensitive.
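
With dynamic URLs the path never touches the filesystem directly, so case handling is simply a choice in the application's routing code. A minimal Python sketch, assuming a hypothetical in-memory table of permalinks (this is not WordPress's actual implementation), shows how a single flag switches between the two behaviours:

```python
from typing import Dict, Optional


class PermalinkRouter:
    """Hypothetical router that maps permalink paths to post IDs."""

    def __init__(self, case_sensitive: bool = False) -> None:
        self.case_sensitive = case_sensitive
        self._routes: Dict[str, int] = {}

    def _key(self, path: str) -> str:
        # In case-insensitive mode every path is folded to lowercase
        # before it is stored or looked up.
        return path if self.case_sensitive else path.lower()

    def add(self, path: str, post_id: int) -> None:
        self._routes[self._key(path)] = post_id

    def resolve(self, path: str) -> Optional[int]:
        return self._routes.get(self._key(path))


insensitive = PermalinkRouter(case_sensitive=False)
insensitive.add("/case-sensitivity-in-urls/", 42)
print(insensitive.resolve("/Case-Sensitivity-In-URLs/"))  # 42

sensitive = PermalinkRouter(case_sensitive=True)
sensitive.add("/case-sensitivity-in-urls/", 42)
print(sensitive.resolve("/Case-Sensitivity-In-URLs/"))  # None
```

In the case-insensitive mode, two permalinks that differ only in case would overwrite each other, which is the smaller namespace one of the commenters points out.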

From the SEO perspective

If a search bot visits two URLs, say example.com/MyWebPage/Index and example.com/mywebpage/index, will it index the contents of both pages? If they are the same, will one of them get a duplicate content penalty? Or will Google just index the lowercase URL and ignore the other? Remember that on Linux/Apache, the two pages may have different content.
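
One widely used way to sidestep the question is to pick a single canonical spelling and permanently redirect every other case variant to it, so that only one URL per page ever serves content. The Python sketch below assumes, purely for illustration, that all of the site's canonical paths are lowercase; `canonical_redirect` is a hypothetical helper, and the approach only makes sense for a site that does not rely on case to tell pages apart.

```python
from typing import Optional, Tuple


def canonical_redirect(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, lowercase_path) if path contains uppercase letters, else None.

    Assumption: every canonical URL on the site is lowercase, so no two
    distinct pages differ only in case. Query strings are ignored here.
    """
    canonical = path.lower()
    if canonical != path:
        return 301, canonical  # 301 Moved Permanently to the canonical spelling
    return None  # already canonical; serve the page normally


print(canonical_redirect("/MyWebPage/Index"))  # (301, '/mywebpage/index')
print(canonical_redirect("/mywebpage/index"))  # None
```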

Conclusion

The RFC for URLs (RFC 1738) says they must be case insensitive:

For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow “HTTP” as well as “http”).

Apache must not use the filesystem as an excuse; I really hope they will provide case-insensitive URLs.

6 Comments

  1. Not sure whether Google will penalize the pages or not… but yeah, one thing is for sure: life is easier as most of us are using WordPress. Otherwise people would have just cried, isn’t it?

  2. I think you should go through that RFC again and look at the BNF.

    That quote is about the scheme section, hence “scheme names” only; the scheme part is defined as alpha (both lower and upper case).

    But I agree it is arguable whether one or the other is ‘better’; I am only pointing at the hefty URL spec here.

  3. As suggested by m.kanlic, the RFCs require only the scheme and host parts of the URL/URI to be case insensitive.

    Note that making URLs case-insensitive reduces the effective namespace of your website, promotes laziness on the part of developers, and can result in unintentional and undesirable duplicate content issues.

    Furthermore, regarding the comment about Apache using the filesystem as an excuse, note that the case -> no-case transformation is one-way (i.e. it cannot be reversed), which makes it practically impossible for Apache (or anything else) to use a case-sensitive filesystem with case-insensitive URLs.

  4. I actually like the case sensitivity of Apache. I can have files with the same name and still have different contents, just by mixing and matching upper and lower case characters.

    But there’s a way to avoid that if you want to: just put your htdocs directory on a FAT partition and you have the same Apache setup with case insensitivity working.
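
The third comment's point that the case -> no-case transformation cannot be reversed is easy to check: lowercasing is a many-to-one mapping, so several real filenames can share one case-folded form, and given only that form there is no way to tell which file was meant. A small Python sketch, with a hypothetical helper name and made-up filenames, groups names by their lowercase form and reports the collisions:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def case_collisions(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group names by their lowercase form and keep only groups with 2+ members."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


files = ["Index.html", "index.html", "INDEX.HTML", "about.html"]
print(case_collisions(files))
# {'index.html': ['INDEX.HTML', 'Index.html', 'index.html']}
```

Any group this returns is a set of files that a case-insensitive URL could not tell apart.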

1 Trackback / Pingback

  1. Who’s idea was it to remove file extensions from URLs? :: Jaisen Mathai
