Beautification by Dirification?
Related: Read the resolution to this article here: Beautification Revisited
Although the subject of clean URLs has passed through the blogosphere plenty of times over the last year or two, I don’t feel there has been a definitive answer as to a) how important they are, and b) what the best way to implement them is. I plan on making a naming-convention change at Mike Industries this week and am soliciting user feedback as to the best way to go about it.
Are clean URLs even necessary?
Clearly the most compelling use of intelligent page naming is what is known in the industry as “vanity URLs”. If we are calling out a special Lance Armstrong section on ESPN.com via a television, radio, or print campaign, it is a huge advantage to point readers to something like “espn.com/lance”. A nice, clean, easy-to-remember URL is the only chance we have of planting an address in our audience’s heads that might actually stick.
But what about the recent trend towards Googlefying URLs? In this case, you have a URL like:
http://www.mikeindustries.com/blog/archives/2004/04/20/what_is_wrong_with_the_cottage_cheese_industry/
… which is neither memorable nor particularly vain. The idea, as I understand it, is mainly to pack the URL with keywords relevant to the subject of the page, so as to coax Google into awarding a higher page rank.
There appear to be advantages and disadvantages of naming pages in this fashion.
Although I don’t claim to be an engineer, all the database classes I took during business school told me that database entries are to be identified by their keyfields. A keyfield contains a unique value (usually an incrementing number) which can never be duplicated across rows. Part of the value of a keyfield is that it acts as a persistent ID and never needs to be changed since it contains no information about the entry. Anything about the entry can be changed and it won’t affect the ID.
Traditionally, I have viewed CMS-generated pages the same way I have viewed entries in a database. The URL is generated from the unique keyfield and all of the content is contained within the page itself. This is evident in the default naming conventions of many flavors of blogging software, such as Movable Type. A Movable Type URL (up until version 3.0) contained an incrementing number, a file extension, and nothing more. Something like 00000038.php or 00000045.html. This makes for a nice, always-unique incrementing URL for each entry.
When I first began developing Mike Industries, I jumped right out of the gate with Movable Type 3.0. To my surprise, the CMS automatically began naming pages based on their page titles (dirified). I thought this was great, since naming URLs this way seemed to be in fashion at the time.
No sooner had I made my first entry, though, than I realized the potential downside of dirified URLs. A few minutes after clicking “Publish”, I wanted to change my page title. As I reconsidered the title, I began to wonder what would happen down the road if I wanted to do the same thing after people had begun linking to one of my dirified URLs. URLs are probably the only thing on the web which must remain constant. You can change practically anything you want about a page at any point in time, but once you change its URL, you sever its ties to the world.
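To make the difference concrete, here is a minimal sketch in Python. The simple_dirify function is my own simplified stand-in and not Movable Type's actual dirify code, and the entry ID and titles are made up for illustration. It only shows that a URL built from the keyfield survives a title change, while a URL built from the title does not.

```python
import re

def numeric_url(entry_id: int, ext: str = "php") -> str:
    """Old-style name: the zero-padded entry ID plus a file extension."""
    return f"{entry_id:08d}.{ext}"

def simple_dirify(title: str, ext: str = "php") -> str:
    """Simplified stand-in for dirification (an assumption, not MT's code):
    lowercase the title, drop punctuation, join the words with underscores."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "_".join(words) + "." + ext

# Hypothetical entry, published and then retitled a few minutes later.
entry = {"id": 38, "title": "What Is Wrong With Cottage Cheese"}
before = (numeric_url(entry["id"]), simple_dirify(entry["title"]))

entry["title"] = "The Trouble With Cottage Cheese"
after = (numeric_url(entry["id"]), simple_dirify(entry["title"]))

print(before)  # numeric name and title-based name at publish time
print(after)   # numeric name is unchanged; title-based name has moved
```

Anyone who linked to the title-based name before the edit is now pointing at a page that no longer exists, unless the old name is kept around on purpose.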
Another disadvantage of using Movable Type’s new URL naming convention (or any other automatic dirification mechanism) is that often the URL will get truncated into something less than optimal. For instance, “how_to_lose_the_fatherhood_blues.php” can easily become “how_to_lose_the_fat.php”. Because the URL is automatically generated and truncated, these things tend to happen a lot.
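As a rough illustration of why this happens, the sketch below compares a blind cut at a fixed length with a cut that backs up to the last whole word. Both functions, the 19-character limit, and the sep parameter are my own assumptions for the example; I am not claiming this is how Movable Type truncates internally.

```python
import re

def slugify(title: str, sep: str = "_") -> str:
    """Lowercase the title, drop punctuation, join the words with sep."""
    return sep.join(re.findall(r"[a-z0-9]+", title.lower()))

def truncate_hard(slug: str, max_len: int) -> str:
    """Cut at exactly max_len characters, even in the middle of a word."""
    return slug[:max_len]

def truncate_at_word(slug: str, max_len: int, sep: str = "_") -> str:
    """Cut at the last separator that fits within max_len characters."""
    if len(slug) <= max_len:
        return slug
    if slug[max_len] == sep:
        # The limit happens to fall exactly on a word boundary.
        return slug[:max_len]
    head = slug[:max_len]
    cut = head.rfind(sep)
    # If there is no separator at all, fall back to the hard cut.
    return head[:cut] if cut > 0 else head

slug = slugify("How to Lose the Fatherhood Blues")
print(truncate_hard(slug, 19))     # how_to_lose_the_fat
print(truncate_at_word(slug, 19))  # how_to_lose_the
```

Cutting on a word boundary loses a few characters but at least avoids inventing a new word. Passing sep="-" to both slugify and truncate_at_word gives the same behavior with hyphens.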
A further disadvantage of dirified URLs is that using them to butter up Google is just that: buttering up. Google has smartened up to keyword packing and all sorts of other schemes, so I’m sure they will eventually smarten up to URL packing as well. Additionally, it is unclear to me how much a dirified URL really helps one’s search engine ranking.
Given these annoyances, I decided to check the “Use Old-Style Archive Links” option in Movable Type and keep my URLs as incrementing numbers.
A vanity affair
So here I am, chugging along with the site, and a month into it, things are working great. No archiving problems, a few post-publishing title changes, and a general good feeling about the naming convention I chose. But then yesterday, I got an e-mail from Sean Madden telling me that I could create better URLs by simply adding “dirify=1” to my archive template. I knew this, of course, because I had hastily hashed out the situation in my head during the pre-launch stage, but the e-mail exchange which followed prompted a revisiting of the topic.
Following are some of the reasons I’m looking at dirifying my URLs again:
- People seem to have a nasty habit of linking to my stories and not including any text between the anchors. Some of this is probably because blog authors have URL autolinking turned on. This results in me seeing links on the web to my stuff which aren’t identifiable until I mouseover or click them.
- If I am trying to type one of my URLs into the location bar of my browser, the browser will suggest past pages I’ve visited within my domain but I can’t identify any of them because they are merely numeric.
- I’m starting to just get really vain about what appears in the address bar when I’m on a page. I realize this is silly, but sometimes it seems like part of the page, and if I’m a designer, I should be able to make it look better, right?
While we’re on the subject…
This isn’t directly related to the issue at hand, but it involves URL naming schemes so I’ll mention it anyway. Both on Mike Industries and certain major commercial sites I work on, I was thinking it would be nice to set up something like this: What if you could type in a URL like –
http://www.mikeindustries.com/thoughts on validation
… and the server would look for that exact URL. If no URL was found, you would automatically be redirected to not just a 404 page, not just a search page, but a search results page which was prepopulated with results from the terms in the address bar? If there was only one search result, maybe you’d even be automatically redirected to that page (kind of like an “I’m Feeling Lucky” for lazy people).
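Here is a minimal sketch of that decision logic in Python, just to pin the idea down. The PAGES table, the paths in it, and the naive whole-word title search are all hypothetical; a real site would hand the terms to whatever search engine it already has.

```python
from urllib.parse import unquote

# Hypothetical site map: URL path -> page title.
PAGES = {
    "/archives/2004/04/thoughts_on_validation": "Thoughts on Validation",
    "/archives/2004/04/cottage_cheese": "What Is Wrong With the Cottage Cheese Industry",
    "/archives/2004/05/validation_badges": "Validation Badges Considered",
}

def search(terms, pages):
    """Return the paths whose title contains every term as a whole word."""
    if not terms:
        return []
    return [path for path, title in pages.items()
            if all(t in set(title.lower().split()) for t in terms)]

def resolve(request_path, pages=PAGES):
    """Decide what to do with a requested path.

    Returns ("serve", path) for an exact match, ("redirect", path) when the
    search finds exactly one page, and ("results", paths) otherwise.
    """
    if request_path in pages:
        return ("serve", request_path)
    # Browsers send spaces as %20, so decode before splitting into terms.
    text = unquote(request_path).lower()
    for ch in "/_-":
        text = text.replace(ch, " ")
    hits = search(text.split(), pages)
    if len(hits) == 1:
        return ("redirect", hits[0])
    return ("results", hits)

print(resolve("/thoughts%20on%20validation"))
print(resolve("/validation"))
```

The first request finds a single match and would redirect straight to it; the second matches two pages and would land on a prepopulated results page.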
Anyway, perhaps this has been done before, but if it hasn’t, I wouldn’t be surprised if The Wolf has something cooked up by tomorrow morning. He’s kinda good.
Back to the program
Unless anyone can tell me that I was right about my initial instincts to use incrementing numbers, I think I’m going to give dirified URLs a shot. Especially since I can group entries into directories organized by year and month, the chance of having two identical titles in the same month is nil. I just need to break the annoying habit of changing page titles after I publish.
Here is what I need feedback on:
- I use PHP on all of my pages. I’d like to hide the PHP extension from viewers. The two ways I know of doing this are a) creating a directory structure where each entry has its own directory and the entry itself is stored in “index.php” inside that directory, or b) using .htaccess and mod_rewrite to serve up “whatever.php” when “whatever” (no extension) is called. I don’t like A because it seems extraneous… I don’t want to be creating directories for every file I create. But I don’t like B because I’m not sure exactly how to implement it in a seamless way both on my server and in Movable Type. With B, I can get the extension hiding working, but MT still wants to create links with the extensions on them. Also with B, I need to be able to identify when a URL without an extension is actually a directory, so the index.php file can be served up appropriately instead of serving up directoryname.php.
- I_don’t_like_underscores_and_I_never_will. They remind me of when I had to take Mac files and throw underscores in them just so my less-able Windows comrades could use them. They also aren’t visible when viewed as a hyperlink because the underline occupies the same place the underscore does. I’d like to use hyphens instead. Are there any easy ways to do this? Hacks to MT templates maybe?
- Is there any way to limit the amount of truncation in a URL? It seems like Movable Type tends to make them quite short. Is there a maximum recommended length of a filename on the web in the first place?
- Is there any functional difference between unchecking the “Use Old-Style Links” box in MT and just adding “dirify=1” to the archive template?
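For the extension-hiding question in particular, this is the behavior I am after, written out as a small Python sketch rather than as rewrite rules. The function name and the lookup order are my own assumptions, and it is only a model of the logic, not something to drop onto a server (it does no checking for paths that try to climb out of the document root, for one).

```python
from pathlib import Path

def resolve_clean_url(doc_root: Path, url_path: str):
    """Map an extension-less URL path to a file under doc_root.

    A directory wins first and is served from its index.php; otherwise
    "whatever" is served from "whatever.php". Returns None if neither exists.
    """
    target = doc_root / url_path.strip("/")
    if target.is_dir():
        # The URL names a directory, so never look for directoryname.php.
        index = target / "index.php"
        return index if index.is_file() else None
    candidate = target.with_name(target.name + ".php")
    return candidate if candidate.is_file() else None
```

With a document root containing whatever.php and an archives directory holding index.php, "/whatever" would resolve to whatever.php and "/archives" to archives/index.php. Checking for the directory first is what keeps the second case from being mistaken for archives.php.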
And so there you have it. I appreciate any advice anyone is willing to provide, even if it is to stick with the old school style of URL naming.