Beautification by Dirification?

Related: Read the resolution to this article here: Beautification Revisited

Although the subject of clean URLs has passed through the blogosphere plenty of times over the last year or two, I don’t feel there has been a definitive answer as to a) how important they are, and b) what the best way to implement them is. I plan on making a naming-convention change at Mike Industries this week and am soliciting user feedback as to the best way to go about it.

Are clean URLs even necessary?

Clearly the most compelling use of intelligent page naming is what is known in the industry as “vanity urls”. If we are calling out a special Lance Armstrong section on ESPN.com via the use of a television, radio, or print campaign, it is a huge advantage to point readers to something like “espn.com/lance”. A nice, clean, easy-to-remember URL is the only chance we have of planting information into our audience’s heads which may stick.

But what about the recent trend towards Googlefying URLs? In this case, you have a URL like:

https://mikeindustries.com/blog/archives/2004/04/20/what_is_wrong_with_the_cottage_cheese_industry/

… which is neither memorable nor particularly vain. The idea, as I understand it, is mainly to pack the URL with keywords relevant to the subject of the page, so as to coax Google into awarding a higher page rank.

There appear to be advantages and disadvantages of naming pages in this fashion.

Conventional wisdom

Although I don’t claim to be an engineer, all the database classes I took during business school told me that database entries are to be identified by their keyfields. A keyfield contains a unique value (usually an incrementing number) which can never be duplicated across rows. Part of the value of a keyfield is that it acts as a persistent ID and never needs to be changed since it contains no information about the entry. Anything about the entry can be changed and it won’t affect the ID.

Traditionally, I have viewed CMS-generated pages the same way I have viewed entries in a database. The URL is generated from the unique keyfield and all of the content is contained within the page itself. This is evident in the default naming conventions of many flavors of blogging software, such as Movable Type. A Movable Type URL (up until version 3.0) contained an incrementing number, a file extension, and nothing more. Something like 00000038.php or 00000045.html. This makes for a nice, always-unique incrementing URL for each entry.

When I began first developing Mike Industries, I jumped right out of the gate with Movable Type 3.0. To my surprise, the CMS automatically began naming pages based on their page titles (dirified). I thought this was great, since naming URLs this way seemed to be in fashion at the time.

No sooner had I made my first entry, though, than I realized the potential downside of dirified URLs. A few minutes after clicking “Publish”, I wanted to change my page title. As I reconsidered it, I began wondering what might happen down the road if the same thing were to happen after people began linking to one of my dirified URLs. URLs are probably the only thing on the web which must remain constant. You can change practically anything you want about a page at any point in time, but once you change its URL, you sever its ties to the world.

Another disadvantage of using Movable Type’s new URL naming convention (or any other automatic dirification mechanism) is that often the URL will get truncated into something less than optimal. For instance, “how_to_lose_the_fatherhood_blues.php” can easily become “how_to_lose_the_fat.php”. Because the URL is automatically generated and truncated, these things tend to happen a lot.
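To make the truncation problem concrete, here is a minimal Python sketch of a dirify-style conversion. It is not Movable Type's actual code: the function name `dirify`, its `max_len` and `whole_words` parameters, and the exact character rules are assumptions made for illustration only.

```python
import re
from typing import Optional


def dirify(title: str, max_len: Optional[int] = None, whole_words: bool = False) -> str:
    """Illustrative title-to-filename conversion (hypothetical, not MT's implementation)."""
    slug = title.lower()
    slug = re.sub(r"[^a-z0-9\s_]", "", slug)   # drop punctuation and other symbols
    slug = re.sub(r"\s+", "_", slug.strip())   # runs of whitespace become one underscore
    if max_len is None or len(slug) <= max_len:
        return slug
    cut = slug[:max_len]
    # Optionally back up to the last underscore so no word is chopped in half.
    if whole_words and slug[max_len] != "_" and "_" in cut:
        cut = cut[:cut.rfind("_")]
    return cut.rstrip("_")
```

With a hard cut at 19 characters, "How to Lose the Fatherhood Blues" comes out as `how_to_lose_the_fat`. Passing `whole_words=True` backs up to `how_to_lose_the` instead, which is shorter but at least not misleading.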

A further disadvantage of dirified URLs is that using them as a tool to butter up Google is just that. Google has smartened up to keyword packing and all sorts of other schemes, so I’m sure they will eventually smarten up to URL packing. Additionally, it is unclear to me how much help a dirified URL really provides to one’s search engine ranking.

Given these annoyances, I decided to check the “Use Old-Style Archive Links” option in Movable Type and keep my URLs as incrementing numbers.

A vanity affair

So here I am, chugging along with the site, and a month into it, things are working great. No archiving problems, a few post-publishing title changes, and a general good feeling about the naming convention I chose. But then yesterday, I got an e-mail from Sean Madden telling me that I could create better URLs by simply adding “dirify=1” to my archive template. I knew this, of course, because I had hastily hashed out the situation in my head during the pre-launch stage, but the e-mail exchange which followed prompted a revisiting of the topic.

Following are some of the reasons I’m looking at dirifying my URLs again:

  1. People seem to have a nasty habit of linking to my stories and not including any text between the anchors. Some of this is probably because blog authors have URL autolinking turned on. This results in me seeing links on the web to my stuff which aren’t identifiable until I mouseover or click them.
  2. If I am trying to type one of my URLs into the location bar of my browser, the browser will suggest past pages I’ve visited within my domain but I can’t identify any of them because they are merely numeric.
  3. I’m starting to just get really vain about what appears in the address bar when I’m on a page. I realize this is silly, but sometimes it seems like part of the page, and if I’m a designer, I should be able to make it look better, right?

While we’re on the subject…

This isn’t directly related to the issue at hand, but it involves URL naming schemes so I’ll mention it anyway. Both on Mike Industries and certain major commercial sites I work on, I was thinking it would be nice to set up something like this: What if you could type in a URL like —

https://mikeindustries.com/thoughts on validation

… and the server would look for that exact URL. If no URL was found, you would automatically be redirected to not just a 404 page, not just a search page, but a search results page which was prepopulated with results from the terms in the address bar? If there was only one search result, maybe you’d even be automatically redirected to that page (kind of like an “I’m Feeling Lucky” for lazy people).

Anyway, perhaps this has been done before, but if it hasn’t, I wouldn’t be surprised if The Wolf has something cooked up by tomorrow morning. He’s kinda good.

Back to the program

Unless anyone can tell me that I was right about my initial instincts to use incrementing numbers, I think I’m going to give dirified URLs a shot. Since I can group entries into directories organized by year and month, the chances of having two identical titles in the same month are practically nil. I just need to stop the annoying habit of changing page titles after I publish.

Here is what I need feedback on:

  1. I use PHP on all of my pages. I’d like to hide the PHP extension from viewers. The two ways I know of doing this are a) creating a directory structure where each entry has its own directory and the entry itself is stored in “index.php” inside that directory, or b) using .htaccess and mod_rewrite to serve up “whatever.php” when “whatever” (no extension) is requested. I don’t like A because it seems extraneous… I don’t want to be creating directories for every file I create. But I don’t like B because I’m not sure exactly how to implement it in a seamless way both on my server and in Movable Type. With B, I can get the extension hiding working, but MT still wants to create links with the extensions on them. Also with B, I need to be able to identify when a URL without an extension is actually a directory, so the index.php file can be served up appropriately instead of serving up directoryname.php.
  2. I_don’t_like_underscores_and_I_never_will. They remind me of when I had to take Mac files and throw underscores in them just so my less-able Windows comrades could use them. They also aren’t visible when viewed as a hyperlink because the underline occupies the same place the underscore does. I’d like to use hyphens instead. Are there any easy ways to do this? Hacks to MT templates maybe?
  3. Is there any way to limit the amount of truncation in a URL? It seems like Movable Type tends to make them quite short. Is there a maximum recommended length of a filename on the web in the first place?
  4. Is there any functional difference between unchecking the “Use Old-Style Links” box in MT and just adding “dirify=1” to the archive template?
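The directory-versus-file question in item 1 comes down to a small decision rule. The sketch below expresses it in Python rather than as rewrite rules, purely to show the logic; the function name `resolve` and the choice to let a directory win over a same-named `.php` file are assumptions, and a real setup would express this in the web server's configuration (in mod_rewrite, the `-d` test in a `RewriteCond` covers the directory check).

```python
from pathlib import Path
from typing import Optional


def resolve(doc_root: Path, url_path: str) -> Optional[Path]:
    """Map an extension-less URL path to the file that should be served.

    Hypothetical helper: no protection against '..' in the path is attempted.
    """
    target = doc_root / url_path.strip("/")
    if target.is_dir():
        # "whatever" is really a directory: serve its index.php, not whatever.php.
        index = target / "index.php"
        return index if index.is_file() else None
    # Otherwise try the same name with the hidden extension put back.
    candidate = target.with_name(target.name + ".php")
    return candidate if candidate.is_file() else None
```

A request for `/whatever` resolves to `whatever.php`, while `/archives/` resolves to `archives/index.php`. Anything that matches neither falls through to `None`, which is where a 404 handler would take over.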

And so there you have it. I appreciate any advice anyone is willing to provide, even if it is to stick with the old school style of URL naming.

53 comments on “Beautification by Dirification?”:
  1. I actually like the old numerical style. The new way has a cheap feel for me and doesn’t look very good in my opinion. The point you brought up dealing with bookmarks not working after a title is changed is a much more important issue than whether somebody can go through their history and see titles in the urls.

  2. Keith says:

    No advice from me, but, oh boy Mike. I’m going to be following this one.

    I’ve been thinking about this very thing lately (today) and I’ll be very interested in seeing what you end up with as I have similar views on the subject.

    My current site uses perhaps the worst possible scheme you can have, it’s dirified in a really wonky way that makes for some crazy ass long file names and little directory structure. Pretty much all the drawbacks of dirify and none of the benefits. I didn’t know any better when I started out and I plan to change this soon.

    Thing is — I’m still not sure what to do about my old entries. I’m changing the file names, changing my main domain (7nights.com is old and pretty much meaningless at this point and since I’m messing with stuff…) and I’d hate to suddenly cut loose everyone who’s linked to my old stuff. I’ll probably just need to keep the old structure around for a bit. A true archive…

    Anyway, must be nice to be at, what, 30 or so posts?

    This kind of thing becomes a bit of a mess once you’ve broken 1000.

  3. Izzy says:

    My only beef with dirification is what you briefly mentioned; that if I decide to change the title of the entry I’m in trouble. Typepad does it this way by default and I’ve had to erase a few entries and republish after I changed titles. I’d rather have the incrementing URL’s just for this reason alone.

  4. yafujifide says:

    I like clean URLs that have a hierarchy that makes sense, but locking yourself into a title for an article isn’t cool. I like URLs like this:

    mikeindustries.com/articles/tech/40/

    If I can go back to /articles/tech/ at any time, that’s cool. I could even go back to /articles/. I look at URLs like a navigation tool–if I can manipulate it to go where I want, I’m happy. I could care less if the title of an article is in the actual URL. And for the sake of beauty alone, I prefer URLs without an extension.

    Plus, if you put the title in the URL, doesn’t that mean you can only use that title once in the given directory?

  5. Phil Dokas says:

    My site recently went live too, and I gave this some consideration. My archives follow this format:

    /archives/year/month/complete_title/

    I forgo <MTEntryPermalink> for the reason you listed: I hate seeing index.php on the end of my URL’s. What if I change file types? Suddenly all links are broken? Do I have to alter my .htaccess file just because I’ve moved to a more suitable language? Bah, that’s all crap. Drop the file name and all is well.

    I achieve this in MT by using this for permalinks (pardon the line breaks):

    <a href="/archives/<MTEntryDate format='%Y'>/
        <MTEntryDate format='%m'>/
        <MTEntryDate format='%d'>/
        <MTEntryTitle dirify='1'>/" title="Permanent link">
            <MTEntryDate format='%l:%M %p'>
    </a>

    Yes it adds overhead to the processing time of MT, but the rebuild time is worth it to me for clean URLs.

    In my Weblog Config on the Archives page, the monthly and category archives come with URLs suitable to my purposes. The individual archive one however needed a bit of work. For Archive File Template, I changed it to:

    <MTEntryDate format='%Y'>/
    <MTEntryDate format='%m'>/
    <MTEntryDate format='%d'>/
    <MTEntryTitle dirify='1'>/index.php

    There are a few caveats of course. You still can’t edit post titles after publication and underscores still litter your URL. However, I think this is an improvement on MT 3’s default archiving scheme.

    To answer the areas you requested feedback on, I think my above proposal elegantly takes care of Item 1. Item 2, I’m not familiar with any existing solutions, but it seems like it would be easy for an experienced MT plugin guru to help you out there. Item 3, yes, you can limit MT’s truncation. In my preceding examples, you would use this for the final directory instead:

    <MTEntryTitle trim_to="YourLength" dirify='1'>

    Additionally, there is this plugin which acts similarly to the trim_to attribute except it works with whole words. I’m not sure how that would stack up with an attribute you’re dirifying though. You might want to run some tests before you tear apart your templates.

    Also, regarding your prepopulated search/404 pages, please see the non-existent directory “Monkeys” on NSLog() for a prime example of your solution. Erik there was kind enough to explain his solution with code.

    Good things all around, hope this was helpful.

  6. Scott Evans says:

    I didn’t go for title-based URLs because, I guess, you never know when you’ll use a title twice. I made up my own URL convention: YYYY-mm-dd_mtPostNumber. I tacked on the last bit for those wacky days where you post more than one thing. I like date-based naming; not sure why. I suppose a nice compromise would be YYYY-mm-dd_title, as long as you don’t mind long URLs.

    Anyway, I also wrote some PHP much like the PHP you suggest, to do auto redirects from my old, non-MT blog URLs to the new ones. Definitely a nice thing to do if you’re going to remove the old files. I can send the code along if you like (it’s about 10 lines of PHP).

  7. bofe says:

    Mike,

    I use Movable Type 2.x but I’m sure this works in 3.0 as well. I use ‘solution B’ – the files with no extension that is served as PHP.

    For my ‘Individual Entry Archive’ under ‘weblog config’ I have this:

    /

    Dirify plus allows you to specify dashes, underscores, initial uppercase, lowercase, etc.

    It works very well for me and was an easy install.

    Also, if you’re not for urls that have ‘titles-that-are-eight-miles-long’ at the end of the URL, why not use the ‘MTEntryKeywords’ at the end instead of MTEntryTitle.

    I’ve seen bloggers implement the keywords solution but they worry about legacy content that they didn’t specify keywords for – for this you can use Brad Choate’s MTIfEmpty plugin (http://mt-plugins.org/archives/entry/ifempty.php may be included in MT3) –

    [MTIfEmpty var=”EntryKeywords”][/MTIfEmpty] (had to substitue some characters for the comment to display)

    With MTIfEmpty, that code checks to see if you have specified keywords, and if you haven’t, it uses Entry Title. If you have, it uses the entry’s keyword.

    Hope this helps. Love the site.

  8. gb says:

    I know a transition from MT is probably out of the question, but I’ve been using WordPress, and those concerns are easily addressable. The mod_rewrite setup (assuming your host has it enabled) is extremely easy, as WP will generate the contents of your .htaccess file for you (assuming you decide to use it). There is also a setting on the prefs page that will use hyphens instead of underscores, and you can even write your own title slug for the URLs (making it as long or short, hyphened or underscored, or even a random bunch of characters).
    Possibly not helpful to your situation, but perhaps someone will find it useful.

  9. bofe says:

    Whoops, my individual entry archive says:

    [$MTArchiveDate format=”%Y/%m/%d”$]/[$MTEntryTitle dirifyplus=”pld”$]

  10. Henning Seljenes says:

    With some recent talk on 404 pages, I always thought it would be a great idea for the script to break up the URL and try to figure out what the user was looking for. But I think if you do this, you should make the page look like a 404 page (not just a normal search page) so the user knows the content wasn’t there, but then suggest some possible entries.

    As for the Googlefied URLs, I prefer the longer method (/archive/%year%/%month%/%postname%/) because you can delete parts of the URL and “step backwards” through the site structure. For example, you could chop off the %postname% to see all posts for that year & month.

  11. Wasn’t the Ideal Weblog URL Scheme decided a few years ago? Go with /blog/year/month/day/shorttitle/. For example: /blog/2004/jul/27/urls/.

    You can accomplish the headline-search shortcut with a custom 404 handler. Make a centralized 404 handler through which every 404 passes, and make it do smart things based on URLs: It could do a headline match, check a hard-coded redirect table, e-mail you the broken link if there’s a referrer, etc. This is what we do with our custom CMS at the newspaper site I work for.

    See also Nathan Ashby-Kuhlman’s series on news-site URLs.

  12. A vote for not going back to titles_as_urls. If you switch to anything, do something like: 2004/January/07/1 … 2 … etc.

    As for not knowing the title when typing, I don’t know if that’s a good enough excuse. I know at least Mozilla Firefox shows the title on the righthand side for visited URLs as you type — it should be a problem solved by the browser, not one that you depend on for URL creation.

  13. Ivan Raszl says:

    I actually like the old numerical style, but I guess not everybody has a geek mind.

  14. Ivan Raszl says:

    is it just me on OSX Firefox? the sample url in the article: “https://mikeindustries.com/blog/archives/2004/04/20/what_is_wrong_with_the_cottage_cheese_industry/” does not wrap into its place. it goes through the right panel.

  15. You can use Apache’s content negotiation modules to drop extension names (I’d say that this is the “preferred” way of doing it, even). This is part of HTTP, so many other web servers probably support it.

    Have a look at http://httpd.apache.org/docs/content-negotiation for more.

    Of course, there’s always my favourite piece of web-writing to cite, http://www.w3.org/Provider/Style/URI … which I’m sure you’ve all seen anyways! It talks a bit about content negotiation too.

    IMHO, numbers are out – they’re too hard to remember… But then again some of the URIs I’ve been coming up with for my blog I can’t even remember :-/ OK, I guess that makes me undecided :P

  16. Mike P. says:

    <cheeky smile> Hmm.. The advantages of rolling yer own… ;-] </cheeky smile>

    I wanted to mention something that could be confusing:
    “The idea, as I understand it, is mainly to pack the URL with keywords relevant to the subject of the page, so as to coax Google into awarding a higher page rank.”

    Page rank, in this case, has nothing to do with the little green bar in your Google toolbar, but rather, where your page ranks on a search results page. People often mistake PageRank for page rank.

    That “While we’re on the subject…” idea is totally doable, and a great idea. Doesn’t PHP.net do something like that?

    I used to let MT number my files, but it always used to bug me when I couldn’t easily see what files people had come to in my referrers. “000139.html” doesn’t mean anything to me. When I redesigned last year I decided to have actual words in my file names, but fearing super long names I came up with a better solution. I don’t use the Entry Excerpt field, so I repurposed it to construct my own custom file names and just dirify its contents. It’s just a much easier way to control what I get. I don’t care about search engines seeing my file names and ranking them better, there is plenty of crap on the page for that, I just needed something I could easily recognize. Beyond that, my file structure goes “/archive/year/month/day/file.php”

    Makes sense to me :D

  18. Dave Marks says:

    What would probably help would be a plugin for MT which when you made an edit on a page, would store the old url in a table along with the article id, which your 404 page could then look up against and redirect to the correct article.

    Of course a better way would be if MT stored the URL vs Article ID in a separate table – this would allow for multiple URLs for every article – although this would probably make for higher overhead.
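The lookup table described in this comment could be sketched roughly as follows. This is a hypothetical in-memory Python illustration, not an existing Movable Type plugin: the class `PermalinkTable` and its methods are invented names, and a real version would live in the CMS database and be consulted by the 404 handler.

```python
from typing import Dict, Optional, Tuple


class PermalinkTable:
    """Hypothetical URL-to-article lookup that remembers retired URLs."""

    def __init__(self) -> None:
        self._url_to_id: Dict[str, int] = {}   # every URL ever published -> article ID
        self._current: Dict[int, str] = {}     # article ID -> its current URL

    def publish(self, article_id: int, url: str) -> None:
        """Record a new (or changed) URL for an article, keeping the old ones."""
        owner = self._url_to_id.get(url)
        if owner is not None and owner != article_id:
            raise ValueError(f"{url} already belongs to article {owner}")
        self._url_to_id[url] = article_id
        self._current[article_id] = url

    def lookup(self, url: str) -> Tuple[int, Optional[str]]:
        """Return (status, location): 200 for a live URL, 301 for a retired one, else 404."""
        article_id = self._url_to_id.get(url)
        if article_id is None:
            return 404, None
        current = self._current[article_id]
        return (200, None) if url == current else (301, current)
```

Renaming an entry then just means calling `publish` again with the new URL. Old links keep resolving, and a permanent redirect tells browsers and search engines where the entry now lives.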

  19. As mentioned earlier in the comments, php.net implements a function similar to the one described in the post. If a user types in http://php.net/xxxxxxxx, its server attempts to do essentially the following:

    1. Does the URL “exist”? i.e., http://php.net/manual
    2. If not, does ‘xxxxxxx’ match anything in the function index?
    3. If not, then generate a search page with the most likely matches from php.net’s database.

    (Note: as I look on PHP’s description of this feature, the logic involved is a little more complex…)

    Implementation can be done in a variety of ways, I believe a la mod_rewrite. PHP.net uses the ErrorDocument page as a “front controller” to handle invalid URL’s and pass that information to a search function to create the described functionality.

    So an easy way to do this with Movable Type would be to create an “Error Document” page in PHP, configure Apache accordingly, and have PHP search MT’s database (if MySQL is the data format for example) for keywords, page titles, etc.

    This is an intriguing idea, because it allows “pretty URLs”, “vanity URLs”, etc. to be created on the fly.

  20. Geof says:

    Mike:

    I want to briefly follow up on gb’s point from before with an anecdote: I, like you, am likely to change a post title. However, rather than using the post title to create an URL, WP1.2+ uses what’s called a “post-slug”, which upon first publish is your first title, sanitized [stripping out PHP-unfriendly characters]. While I would love to see the post slug frozen after first publish, it is not.

    What is also cool is that the post slug can be something other than the sanitized title, which is great if you’re going to give the post a goofy or asinine name but want the URL to make a bit more sense to someone looking at it. [In the case of long titles, one can also come up with a one- or two-word slug and not have an outrageously long URL.]

    I use prettified URL’s purely because there have been many situations in my workplace where I have to read someone an URL, and I foresee that, one day, someone might have to read out my URL’s to someone else. Having something more readable is, in my mind, a bit better. If nothing else, putting the date in the URL allows the careful reader to have yet another context clue about when the content was posted.

  21. If anyone is interested in an example implementation of this php.net feature for Movable Type, send an email my way, and I’ll send you my implementation when I have it completed later today (for MT 3.0 with MySQL anyway). Mike’s post has me excited about this idea. It solves a lot of my “wants” for pretty url’s on my site. If I want to emphasize a particular post, I can create a pseudo-perma-link with http://example.com/description_of_post, or I can generate a list of possible blog entries from a search query (http://example.com/category_search). There are a lot of useful possibilities.

  22. Andy says:

    Mike (and Keith):

    In regard to losing folks who have linked to previous entries with an old naming scheme…

    I had the same issue when I decided to ‘dirify’ my entries. A reader pointed me to RedirectMatch, a directive you use in your htaccess file. It involves creating a regular expression, and was beyond me (my reader gave me a tailored solution that worked). But you might want to look into it.

    Excerpt of his comments with my tailored solution:

    Have you tried using RedirectMatch with regular expressions in your top level .htaccess file?

    For example, to go from http://modulo26.net/daily/120903.php to http://modulo26.net/daily/archives/2003/12/09/foo you would add the following line to your .htaccess file.

    RedirectMatch /daily/([0-9]{2})([0-9]{2})([0-9]{2})(.*)$ http://modulo26.net/daily/archives/20$3/$1/$2/

    That directive should work for any URL matching “/daily/######.foo”

    Needless to say, as long as my new entry was published at ‘daily/archives/2003/12/09/index.html’ then any previous linkage to ‘daily/120903.php’ worked like a charm.
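To see how the capture groups in that RedirectMatch line rearrange an old MMDDYY filename into a dated path, the same pattern can be tried out in Python. This only mimics the substitution for illustration. The helper name `redirect_target` is made up, and Apache's `$1`..`$3` are written as `\g<1>`..`\g<3>` here.

```python
import re
from typing import Optional

# Same pattern as the RedirectMatch line above: month, day, two-digit year, then the rest.
PATTERN = re.compile(r"/daily/([0-9]{2})([0-9]{2})([0-9]{2})(.*)$")
# The year group is prefixed with "20"; the trailing extension (group 4) is dropped.
TARGET = r"http://modulo26.net/daily/archives/20\g<3>/\g<1>/\g<2>/"


def redirect_target(url_path: str) -> Optional[str]:
    """Return the new location for an old-style URL, or None if the pattern does not match."""
    match = PATTERN.search(url_path)
    return match.expand(TARGET) if match else None
```

Here `/daily/120903.php` maps to `http://modulo26.net/daily/archives/2003/12/09/`. The new URLs themselves do not match the pattern, because `/daily/` is followed by `archives` rather than six digits, so the redirect cannot loop.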

  23. compuwhiz7 says:

    On my custom Weblog software (which is both written in ASP.NET and not even off the ground yet), I’m planning to do something like this:

    http://www.domain.com/journal/year/month/day/entry.title/

    Because I eschew dates with zeros, year might be 2004, month might be july, and day might be 28, with entry.title being a customized post slug (like welcome), not an automagically generated one.

    The URIs in question wouldn’t actually exist on the server; only /journal/ would. The rest would be intercepted by global.asax (the ASP.NET equivalent of .htaccess and mod_rewrite) and then the appropriate data would be retrieved from the database.

    At any rate, sorry for going off-topic with my ASP.NET thing—that in itself is of no use to you, but the URI structure might, and you could do something remarkably similar with .htaccess and mod_rewrite.

  24. jake says:

    I’d have to agree with your initial evaluation. I use the number of the post for similar reasons. Perhaps you could append it in a /archive/2004/07/28/00027.php or even leave off the ending .php. I do and just use htaccess to let the program know where you’re going. Then you could also provide a setup to drill down by date. It’d be easier if someone came looking for something that happened in this month and could just type in the ../2004/07/ or something like that. I just used the post ID, but maybe you want a little more control.

    In any event, the 3 reasons you list off are not all that convincing.

    1. Yes, people who leave out the title of the link in their post are annoying, even if it is automatic, that’s their bad semantics, not yours. Half the time, since the title is truncated it only gives a vague idea about what the link brings you to, and you hover over it anyway.

      I much prefer just “feeling lucky,” although since it’s related to what I’m reading, unless it’s totally out of place, I have an idea as to what I’ll see when I follow the link.

    2. Yes, it lists off the title, but, at least in Firefox, it also gives me the title of the page I’m going to, I look at that more than the url, unless it’s blatant, like apple.com/itunes/
    3. it is totally silly. ;)

    I can’t really answer all your other questions (I haven’t used MT very much since I “roll my own”) so I’ll just leave it at my opinion on the other stuff.

  25. Laurence Hygate says:

    Could you imagine a way of utilising HTTP 301s to allow you to change titles (and hence dirified URIs) after the fact. Maybe some blogging software does this kind of thing automatically?

  26. I like to use words just because it’s more human. I’m not good with numbers and I’m more likely to transpose or forget them so given a choice I avoid them.

    I’m not sure about the keyword packing aspect of nice urls but it was always my understanding that they are more of a replacement for pages generated by query string. foo.com/articles.php?year=2004&month=04 type stuff. instead of the user seeing that they see foo.com/2004/04 or foo.com/articles/0404/, basically anything other than the query string. Using some combination of the date and the title to “permalink” to articles is a convention that I noticed on news sites before blogs became popular. It seems that the main idea is to avoid using the query string (which can be disconcerting even for experienced inet users) so however you choose to do it works.

    As Andy said there is an apache module that automatically checks for close matches to mistyped file names. If it’s not turned on by default on your server (it was on mine) you would need to turn it on using the .htaccess file. From there I wouldn’t imagine it’s too much more to

    1. point requests for /page/ to page.php if it isn’t a real directory or
    2. point incorrect pages to a styled list of possible results

    The syntax for the rewrite rules took some getting used to, but just by tweaking the code that WordPress generated for me I was able to implement some totally separate changes off the top of my head (with help from the Apache documentation).

  27. Jona says:

    This is an interesting thought, Mike. Currently in my blog, because I use xBlogLite, my entries are numbered like yours. My future plans, however, are to move to URL’s like the following:

    http://www.mysite.com/journal/year/month/day/entry_title

    This way, the address looks clean and is descriptive. Personally, I rarely need to modify my entries, because I almost always write them very thoroughly and make sure that I’ve studied well before posting. Though the URL would be irrelevant if you changed the entire purpose of the entry, I think being certain of what you want to write about in the first place is a small price to pay for a much nicer address to each entry.

  28. Faruk Ateş says:

    You mean to tell me Movable Type doesn’t let you choose the slug (ie. “beautification-by-dirification” in this entry’s URL) manually? If so, then man that blows more carhorns than a good traffic jam.

    Personally, I prefer URL’s that aren’t numerically identified. An ID tells me nothing, so when I encounter a URL with nothing but an ID number to distinguish it from other URL’s I still have no clue as to what it’s about.

    There are exceptions to that, but it’s not something to rely on. When Dan `SimpleBits` Cederholm offered 3 copies of his book, asking only for his readers to post a link to their favorite blog entry of all blogs they ever read, one particular Design By Fire article was pasted numerous times. DxF uses ID numbers for entries, and because it was mentioned by so many people on SimpleBits in those comments, I ended up memorizing the URL to belong to that particular article (http://www.designbyfire.com/000099.html for The Real Reason You Should Care About Web Standards). Now, when I encounter links to DxF with that ID number, I know which article they’re talking about. But for all others, I don’t. When someone makes a link like this, you can hover over the word “this” and still have no clue what the entry is about. Not everyone adds a title attribute to such links. But if I make a link like this, for instance, you can see that the entry is about Hierarchy.

    The point behind my little rant-and-example is this: you CAN’T ensure that everyone who ever links to your site uses proper, descriptive link labels (instead of “this” etc.), but you CAN ensure that the URI’s they link to are as descriptive as possible.

    As for your 4 feedback-things, I can only point out that over at the vBulletin forums someone had problems with one of the vB-files being over 32 characters in length. I would personally always try to stay below 25 anyway, because “titles-that-are-too-long-for-a-single-entry-tend-to-kinda-suck-a-lot” :)

    On a side-note: it’s funny you should mention the idea of going to “https://mikeindustries.com/thoughts on validation” and ending up at either search results or directly the only entry found using those terms. I just tried to implement such functionality in my work’s Content Management System a few minutes ago, but so far without luck. If I do end up getting it to work, however, it’ll still be of no use to you, as it won’t be at all compatible with Movable Type, WordPress, or any other such piece of software. Their entire engine is completely different from my CMS’s, so alas…

  29. Faruk Ateş says:

    Sorry about the double post, but I think this excellent quote from Pirates of the Caribbean is very fitting for what I said about URI’s containing ID’s or not:

    The only rules that really matter are these — what a man can do and what a man can’t do.

    You can ensure descriptive URI’s on your own blog. You can’t ensure that others use descriptive links when linking to your entries.

  30. Faruk Ateş says:

    Sorry about the triple (bad me!) post, but I mentioned I was working on this:

    What if you could type in a URL like —

    https://mikeindustries.com/thoughts on validation

    … and the server would look for that exact URL. If no URL was found, you would automatically be redirected to not just a 404 page, not just a search page, but a search results page which was prepopulated with results from the terms in the address bar? If there was only one search result, maybe you’d even be automatically redirected to that page (kind of like an “I Feel Lucky” for lazy people).

    Sure enough, I just managed to implement that into my CMS.

    The concept is relatively simple, but since I can’t share any of this PHP code in public (it being property of the company I work for), here are the basic steps needed:

    1. Check the URL that’s given;
    2. if it’s valid in your system (MT, WP, whatever), send it there of course;
    3. if it’s NOT valid in the system, do a server-side redirect (in PHP: header("Location: $newloc"); ) to the search location of your system, appending the requested query string (in PHP: $_SERVER['REQUEST_URI']) after optionally formatting it more appropriately for the search engine of your system (if it’s at all advanced).
    4. Send a flag (I use “&lucky=1” after your good example of Google’s implementation) along with this redirect;
    5. Make sure your search engine checks for the existence of this flag (probably requires you to alter your system’s code) and, upon finding only one result, does another server-side redirect to that result’s location.

    The reason for server-side redirects is mostly just so people won’t notice it happening. HTML-side redirects tend to be rather slow, plus, they’ll rewrite the URL in your browser’s address bar which is very undesirable (esp. with a double redirect).
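
    The five steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than the PHP the comment mentions, and it is not the commenter’s actual code: the paths, the title index, the naive `search` function and the `/search?q=` location are all assumptions made up for the example.

```python
from urllib.parse import quote_plus, unquote

# Hypothetical title index keyed by permanent path (not from any real CMS).
TITLES = {
    "/archives/2004/08/thoughts_on_validation/": "Thoughts on Validation",
    "/archives/2004/04/cottage_cheese/": "What Is Wrong With the Cottage Cheese Industry",
}

def search(terms):
    """Return paths whose title contains every search term (naive matching)."""
    words = terms.lower().split()
    return sorted(p for p, t in TITLES.items() if all(w in t.lower() for w in words))

def handle_request(path):
    """Steps 1-4: serve valid URLs, otherwise redirect to a flagged search."""
    if path in TITLES:
        return 200, path
    # Turn the unknown path into search terms and append the "lucky" flag.
    terms = unquote(path).strip("/").replace("/", " ")
    return 302, "/search?q=" + quote_plus(terms) + "&lucky=1"

def handle_search(terms, lucky):
    """Step 5: with the flag set and exactly one hit, redirect straight to it."""
    results = search(terms)
    if lucky and len(results) == 1:
        return 302, results[0]
    return 200, results
```

    With this toy index, `handle_request("/thoughts%20on%20validation")` redirects to the flagged search, and `handle_search("thoughts on validation", lucky=True)` then redirects to the single matching entry.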

  31. Alan Green says:

    Search results on a 404 page are neat, but it is important that the server returns a 404 response, so that it is clear to search engines and other robots that the Uniform Resource Locator doesn’t actually locate a resource.

    Faruk’s suggestion of automagically serving up a page if a likely looking match can be found is interesting, but I would be cautious in applying it. It has the potential to make every page on your website available through myriad URLs, many of which may not work tomorrow or the next day.

  32. Faruk Ateş says:

    Alan,

    You’re correct in pointing out the risk, but with a server-side redirect, the actual URL that people will end up with will always be the correct and proper URL, the one they would’ve gotten to if they had done normal browsing/searching and clicking around on your site. I agree that it’s rather bad to have every conceivable URL instantly return a seemingly valid result page (i.e. search results or an actual article match), but with a redirect even search engines will ignore the “typed-in” URL (like “http://www.site.com/some article keywords”). Of course, if someone “deliberately” created a link somewhere to such a URL, it does indeed pose a risk of being wrongly indexed if you don’t send a 404 header in that case.

    Of course, if that’s of serious concern to you, you can skip the auto-forward-to-the-only-searchresult-page step, and just provide the search results with a 404 header and a big “404 Not Found–but is this perhaps what you’re looking for?” heading on that page. People will know that their URL was incorrect, but still be given useful links (presumably), and search engines will notice from the 404 header that it’s incorrect indefinitely.
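
    The more cautious variant described here, search suggestions served with a real 404 status, might look something like the sketch below. It is written as a bare WSGI callable in Python purely for illustration; the `PAGES` table and the substring-based `suggest` helper are hypothetical stand-ins for a real CMS and its search engine.

```python
# Hypothetical pages keyed by permanent path.
PAGES = {
    "/archives/2004/08/thoughts_on_validation/": "Thoughts on Validation",
    "/archives/2004/04/cottage_cheese_industry/": "Cottage cheese rant",
}

def suggest(path):
    """Return permanent paths that contain any word from the requested path."""
    words = path.strip("/").lower().split()
    return sorted(p for p in PAGES if any(w in p for w in words))

def app(environ, start_response):
    """Minimal WSGI app: real pages get 200, everything else a helpful 404."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if path in PAGES:
        start_response("200 OK", headers)
        return [PAGES[path].encode("utf-8")]
    # Unknown URL: keep the 404 status so robots do not index it,
    # but still offer the visitor some likely matches.
    lines = ["404 Not Found, but is this perhaps what you're looking for?"]
    lines += suggest(path)
    start_response("404 Not Found", headers)
    return ["\n".join(lines).encode("utf-8")]
```

    The visitor still gets useful links, while the status line tells search engines that the typed-in URL does not exist.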

    Me, I’ll be going for my own implementation now, as I rather like the incredible flexibility, but it’s good to inform people of the slight risk involved, yes. :)

  33. Gary Love says:

    Mike,
    I don’t use MT, so I don’t have an answer for you in that regard.

    Personally, I use both of the methods you mention. Every story is reachable through permanent id’s or readable versions that act as miniature database requests.

    For instance, take a recent article about urls.
    Its permanent/unique location is http://www.nemejo.com/storyid/61.

    However, it can also be found through non-permanent or non-unique urls, for instance its topic or section, which makes it both hackable and readable. For instance, http://www.nemejo.com/topics/technology/besturls or http://www.nemejo.com/bestpractices/besturls.

    Overall, I like this tactic, but the downside is that I haven’t found a way to prevent people from linking to the non-permanent urls.

  34. quis says:

    Here’s some simple .htaccess thing that I use, might help get you started/give you some ideas:

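    # Turn rewriting on (skip if already enabled); existing directories pass through untouched
    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteRule ^.*$ - [L]
    # Everything else of the form "name/" goes to the handler script
    RewriteRule ^([^/]+)/$ /handler.php?p=$1 [L]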

    This basically checks whether the request is a directory: if it is, the directory index will be served; if not, a “handler” type page will be served.

  35. Faruk Ateş says:

    Gary:

    Overall, I like this tactic, but the downside is that I haven’t found a way to prevent people from linking to the non-permanent urls.

    Try implementing my method of server-side redirecting people to the actual, real, permanent URL’s. Then only existing links to those non-permanent (but functioning) URL’s will remain, and anyone trying them will end up with the right, permanent URI in their address bar.

    What you’re currently making happen is exactly the thing that Alan warns us about.
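
    As a rough sketch of that suggestion, the readable aliases can be kept in a lookup table and answered with a redirect to the permanent URL. The paths are taken from Gary’s examples; the `ALIASES` table, the `route` function and the choice of a 301 status are assumptions for illustration, not a description of how his site works.

```python
# Readable aliases mapped to the one permanent URL (paths from the example above).
ALIASES = {
    "/topics/technology/besturls": "/storyid/61",
    "/bestpractices/besturls": "/storyid/61",
}

def route(path):
    """Redirect known aliases to their permanent URL; pass other paths through."""
    target = ALIASES.get(path)
    if target is not None:
        # A permanent redirect puts the permanent URL in the address bar,
        # so that is what people copy when they link to the story.
        return 301, target
    return 200, path
```

    The aliases stay hackable and readable, but every visitor ends up looking at `/storyid/61`.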

  36. Mark Pilgrim has done some nice work in this area:
    http://diveintomark.org/archives/2003/08/15/slugs

  37. Nick Potter says:

    Just a thought, but even if you did change your entry titles, would it not be possible to do something with an .htaccess file so that if someone linked to a page that no longer existed (because you’d changed the name of the post), it listed everything in that subdirectory – nicely formatted of course?

    A kind of:

    Sorry – can’t find that exact entry. But these other entries exist for August 3, 2004…

    miscellaneous post one
    entry with new title
    miscellaneous post two

    (I’m presuming here that the title of the post wouldn’t be so different you wouldn’t be able to work out which entry you were after)
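
    The comment proposes doing this from an .htaccess file; the lookup itself can be sketched in a few lines of Python. Everything here is hypothetical: the `/archives/YYYY/MM/DD/slug/` layout, the `ENTRIES` table and the function name are assumptions chosen to mirror the example listing above.

```python
import re

# Hypothetical archive: permanent path -> entry title.
ENTRIES = {
    "/archives/2004/08/03/miscellaneous_post_one/": "miscellaneous post one",
    "/archives/2004/08/03/entry_with_new_title/": "entry with new title",
    "/archives/2004/08/03/miscellaneous_post_two/": "miscellaneous post two",
    "/archives/2004/08/04/another_day/": "another day",
}

# Captures the date "directory" of a dated entry URL.
DATED = re.compile(r"^(/archives/\d{4}/\d{2}/\d{2}/)[^/]+/?$")

def same_day_entries(path):
    """For a missing dated URL, list the titles of entries posted that day."""
    match = DATED.match(path)
    if match is None:
        return []
    prefix = match.group(1)
    return sorted(title for url, title in ENTRIES.items() if url.startswith(prefix))
```

    A request for the stale `/archives/2004/08/03/entry_with_old_title/` would then yield the three entries for August 3, 2004, which a template could present under the “Sorry – can’t find that exact entry” heading.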

  38. A very interesting post.

    As a web developer with a particular interest in good quality information architecture, I regard self-explanatory, permanent URLs as an essential aspect of a high-quality site. Whilst I agree with the points made about non-changeability of URLs, I feel strongly that using 0000014.php etc is far worse.

    One thing I really hate is the use of technology-specific URLs such as .php. I suggest that on the vast majority of sites, parsing .html for PHP is unlikely to reduce performance, yet gives better flexibility for the future, and is arguably more correct in that the user receives HTML (.html) not PHP (.php), even though PHP (for instance) has generated the page.

  39. Robert D. says:

    I agree that a long “dirified” URL isn’t really great, but it’s a lot easier to remember than one with a long number. It’s not perfect, but it’s a lot better than what came before. You can always use your templates creatively so you can manually name your files.

    Personally, I like the way A List Apart does it: they come up with a simple, short title to use as the URL. I’d like for major weblogging programs like Movable Type to allow us to do this.

  40. Andy Evans says:

    I say keep it old school… google has learned to deal with parameters… so pass the title with the “?” or you can spoof a fake variable if you want to play dirty

    /dynamic.php?fakeparamorrealwhatever=google/likes/it

    Let the programming toss out the nasty and google can get the goods.

    I’m not a search engine expert, but from what I have seen there is no reason to hide the page in some make-believe directory structure.

  41. Henrik Lied says:

    I, and many others, like guessable URI’s. For example, on my site I don’t categorize them after day, month or even year.
    I use a clean, guessable URI-scheme like

    /article/category/whitespacestripped-post-title/

    I just don’t see the point in categorizing the posts after date, since no-one can possibly guess a URI like that.
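
    A scheme like this needs some rule for turning a title into the last URI segment. The sketch below shows one plausible rule in Python; the function names and the exact normalisation (lowercase, runs of non-alphanumerics collapsed to a single hyphen) are assumptions, not a description of Henrik’s site.

```python
import re

def slugify(title):
    """Lowercase the title and collapse anything non-alphanumeric into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

def article_uri(category, title):
    """Build a /article/category/post-title/ style URI."""
    return f"/article/{slugify(category)}/{slugify(title)}/"
```

    For example, `article_uri("Web Design", "Thoughts on Validation")` gives `/article/web-design/thoughts-on-validation/`.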

  42. I’m arriving way late into this discussion, but my half-nickel-rounded-down:

    I think it’s important to keep URLs short. A lot of email clients will wrap an inline link after 72 or so characters. If I may be permitted to spam your comments:

    http://www.vardaman.org/archive/2004/01/17

    I often marvel at my non-techie friends who understand the problem well enough that they use something like http://www.tinyurl.com to shorten emailed links.

  43. Sam Walker says:

    It should go without saying, but you should always do what is best for users. In this case, a title in the URL is definitely best for users: a numeral is only good for the server and useless for the user. The URL is intended to tell you where you are and give you a rough idea of what you’re looking at. Some people in this thread have suggested that this is a problem that should be solved by something else, but you’re forgetting that the URL is that thing; that’s specifically what it was created for. As for changing titles, it shouldn’t be a problem: if you move the post to a new page because of a new title, the old URL should return a 301 Moved Permanently status code, which will send the user towards the right page, and even fix their bookmarks if the browser is smart enough.
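
    One way to keep old URLs alive across title changes is to record every rename and answer the old address with a 301. The Python sketch below assumes a hypothetical in-memory `SlugRegistry`; a real weblog tool would store this in its database. It also repoints earlier redirects on each rename, so a twice-renamed post needs only one hop.

```python
class SlugRegistry:
    """Tracks current entry paths and where renamed paths have moved to."""

    def __init__(self):
        self.current = set()   # paths that resolve directly
        self.moved = {}        # old path -> current path

    def publish(self, path):
        self.current.add(path)

    def rename(self, old, new):
        self.current.discard(old)
        self.current.add(new)
        # Repoint earlier redirects so chains collapse to a single hop.
        for src, dst in list(self.moved.items()):
            if dst == old:
                self.moved[src] = new
        self.moved[old] = new
        # If a post is renamed back to an earlier title, that path is live again.
        self.moved.pop(new, None)

    def lookup(self, path):
        """Return (status, location) for a requested path."""
        if path in self.current:
            return 200, path
        if path in self.moved:
            return 301, self.moved[path]
        return 404, None
```

    After renaming `/a/` to `/b/` and then `/b/` to `/c/`, a request for `/a/` is answered with a single 301 to `/c/`.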

  44. Oli says:

    Although this is a moot point in that you’ve implemented a method and found several of the same references, I just thought I’d add an article on URL slugs that I wrote a while back. It may be of interest…

    Thanks for the interesting reading – I’m looking forward to playing with sIFR!

  45. Scot Hacker says:

    Part of the value of a keyfield is that it acts as a persistent ID and never needs to be changed since it contains no information about the entry. Anything about the entry can be changed and it won’t affect the ID.

    Unfortunately, database IDs aren’t future-proof at all, in the case of MT. Consider two scenarios:

    1) Over time, you delete some entries in your blog, leaving gaps in the ID numbering. Later, you decide to move to another web host. You export your MT data, which – surprise – does NOT export the database IDs. Then you import into the new system, and oops – many of the old posts get new IDs. Rebuild, and your permalinks suddenly aren’t so permanent.

    2) MT can and often does have multiple blogs in a single installation. Blog A may have already incremented to #312. Now Blog B is added and its first post is #313. Blog A’s next post is #314. Now either blog decides they want their own MT installation on the same server, or wants to move to another host, and you get the same problem as in 1) above.

    Why doesn’t MT export database IDs on export? Because if it did, you would expect them also to be imported into your new installation. But if you’re a 2nd blog on that new host in a multi-blog environment, the IDs of your old data could already be accounted for by the existing blog. So MT *can’t* sanely keep IDs on export/import. Therefore ID-based URLs are a bad idea – they never should have been in MT to begin with. Which is why we have sane, future-proof, date/slug-based URLs in MT3.
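
    Scenario 1 is easy to reproduce with a toy model. The Python sketch below is not Movable Type’s export code; it simply assumes what the comment describes, namely that the export drops IDs and the import assigns fresh auto-incrementing ones.

```python
def export_entries(db):
    """Export titles in ID order, dropping the IDs themselves."""
    return [db[entry_id] for entry_id in sorted(db)]

def import_entries(titles, start=1):
    """Assign fresh auto-incrementing IDs on the new installation."""
    return {start + n: title for n, title in enumerate(titles)}

# Old installation: four posts, then post 2 is deleted, leaving a gap.
old = {1: "first post", 2: "second post", 3: "third post", 4: "fourth post"}
del old[2]

new = import_entries(export_entries(old))

# IDs whose content differs between the two installations.
broken = sorted(i for i in set(old) | set(new) if old.get(i) != new.get(i))
```

    Here `broken` comes out as `[2, 3, 4]`: an ID-based permalink for entry 3 pointed at “third post” on the old host and points at “fourth post” on the new one.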

  46. I have always used the Dirifyplus plugin
    http://mt-stuff.fanworks.net/plugin/dirifyplus.phtml
    to replace underscores with dashes (Google friendly: a - is seen as a space, whereas an _ is seen as an _). Do a Google search on underscores in filenames and see. It definitely helps ratings.
    A file format that makes sense and is user friendly. A lot of dates and numbers mean nothing.

    http://www.yoursite.com/categoryname/filename
    tells me a lot. If I visit your site, I would likely browse a category called webdesign, but will never look at December/2004. Who has the time to browse through weblogs on months, years or whatever looking for content that the archives give no clue to?

  47. Malaga says:

    Greetings from Malaga (Spain). Antonio :-)

  48. Ryan Turner says:

    Google does read underscores similarly to dashes now. It’s also important to note that, depending upon your Google PageRank, Google will not spider many directory levels deep. As you gain in PR it will eventually spider an entire site regardless of directory structure, but this can take years depending upon your site’s directory structure and content. Your best bet is to stay no more than 2 levels off the root of the site. Anyway… my $0.02

  49. Considering all the research done on this, and considering Matt Cutts who works for google clearly states that dashes are best as google sees underscores as character data, I’ll vote with Matt:
    http://www.mattcutts.com/blog/dashes-vs-underscores/

    That was in 2005, then he said it again in April 2006:
    http://www.mattcutts.com/blog/guest-post-vanessa-fox-on-organic-site-review-session/

    “And speaking of putting a dash in URLs, hyphens are often better than underscores (Ed. Note: bolded by Matt ). african-elephants.html is seen as two words: “African” and “elephants”. african_elephants is seen as one word: african_elephant. It’s doubtful many people will be searching for that.”
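
    The distinction in the quote mirrors a common text-processing convention: in regular expressions the underscore counts as a word character, while the hyphen does not. The Python snippet below only illustrates that convention; it makes no claim about how Google actually tokenizes URLs, and `words` is a name made up for the example.

```python
import re

def words(name):
    """Split a file name into runs of word characters (letters, digits, underscore)."""
    return re.findall(r"\w+", name)

hyphenated = words("african-elephants")    # ['african', 'elephants']
underscored = words("african_elephants")   # ['african_elephants']
```

    The hyphenated name splits into two separate words, whereas the underscored one stays a single token.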

  50. Another Sensational Design

    A couple of days ago I mentioned how much I liked the design at Sonnenvogel.com. I’ve just found another site…

  52. MAVROMATIC says:

    Mavromatic Now Has Future Proof URLs

    It seems like the latest trend in blogging is to use future proof URLs. Well, I spent some time talking to Mike Davidson and a few other co-workers as to why I should actually make the change. It turn…

  53. […] Last week’s post on dirified URLs was supposed to bring about some sort of consensus opinion on smart URL-naming conventions. Thanks to everyone who posted their very helpful and enlightening comments, but in the end, we only discovered more options and came to no mutual conclusions. It appears that people just look for different things in their URLs and what you do with yours is up to you. […]

Comments are closed.
