Matt Gemmell

Miss URLd

Help me make my URLs as pretty as they can be!
Search engines don't spider my blog, because Thistle uses URL parameters and most engines deliberately don't follow parameterised URLs very far. Also, let's face it: parameterised URLs are ugly things indeed. So I'm currently enhancing Thistle to support "slashed URLs", as I call them; i.e., "index.php/category/post.html".
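
To make the difference concrete, here is a minimal sketch in Python (Thistle itself is written in PHP) that reads the same information from a parameterised URL and from a slashed URL. The query parameter names `category` and `post`, the `example.com` host and both helper names are hypothetical, chosen for illustration rather than taken from Thistle.

```python
from urllib.parse import parse_qs, urlsplit


def filters_from_query(url):
    """Read filters from a parameterised URL (hypothetical parameter names)."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs maps each name to a list of values; keep the first of each.
    return {name: values[0] for name, values in params.items()}


def segments_from_path(url, script="index.php"):
    """Return the path segments that follow the script name in a slashed URL.

    In CGI terms, this trailing part of the path is the PATH_INFO.
    """
    path = urlsplit(url).path
    _, found, extra = path.partition("/" + script + "/")
    if not found:
        return []  # no extra path after the script name
    return [segment for segment in extra.split("/") if segment]


print(filters_from_query("http://example.com/index.php?category=cocoa&post=hello"))
# {'category': 'cocoa', 'post': 'hello'}
print(segments_from_path("http://example.com/index.php/cocoa/hello.html"))
# ['cocoa', 'hello.html']
```

Both calls recover the same two facts; the slashed form simply carries them in the path, where search engines are happier to follow it.
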

I want to follow <a href="">Blosxom's model</a> 
as much as possible, but Thistle's greater feature-set leads to some difficult 
questions about the format of such URLs. I'd very much appreciate your help as I try to create 
the prettiest URLs which are feasible.

In addition to any general thoughts and comments, which are always greatly appreciated, there are a few 
essential questions I'm interested in answering. These key questions are numbered, and are 
<span class="softhilight">highlighted like this</span> in the content of this post. Responses to these 
questions, even if very brief, would be a huge help to me.

Required functionality
The new style of URL I'm thinking about must be able to specify a category or a specific post, plus any combination of a date, an author, a "type" (formatting template), and/or a search query-string. I'm inclined to think that a search query should always be parameterised, so we can ignore searches for now. Thus, I want to define an URL-format which can specify any or all* of the above pieces of information purely in slashed-URL format (without any HTTP-GET parameters).

* In fact, some options are mutually exclusive. A category-filter cannot be combined with a specific post, since the post implies its own category. Similarly, a specific post can't be combined with a date-filter, since the post has a specific date associated with it, nor with an author-filter, since the post was written by a specific author. Filters in Thistle are currently always combined with a logical AND.
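
The exclusions in the footnote above are simple enough to check mechanically. Below is a sketch of such a check in Python; the filter names (`post`, `category`, `date`, `author`, `type`) and the function `check_filters` are hypothetical, not Thistle's actual code.

```python
# Filters that a specific post already determines, so asking for them
# alongside a post is contradictory. (Hypothetical filter names.)
IMPLIED_BY_POST = {"category", "date", "author"}


def check_filters(filters):
    """Return `filters` unchanged, or raise ValueError if they conflict.

    `filters` maps hypothetical filter names to values,
    e.g. {"category": "cocoa", "type": "print"}.
    """
    if "post" in filters:
        # Iterating a dict yields its keys, so this intersects filter names.
        clashes = IMPLIED_BY_POST.intersection(filters)
        if clashes:
            raise ValueError("a specific post cannot be combined with: "
                             + ", ".join(sorted(clashes)))
    return filters


check_filters({"post": "hello", "type": "print"})       # accepted
check_filters({"category": "cocoa", "author": "matt"})  # accepted
```

A type is deliberately absent from the exclusion set, since any post or listing can be rendered with any formatting template.
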

Dates & Categories
Thistle's grandfather, the venerable <a href="">Blosxom</a>, helps us with 
determining an appropriate URL-format for some of these 
parameters/filters. Below are some example Blosxom URLs and their meanings.

<strong>All posts in /cooking/italian</strong>


<strong>July's posts in /home/repair</strong>


<strong>January 1st's posts in /personal/resolutions</strong>


The above is directly applicable to Thistle; we can readily adopt that format to deal with category-filters and date-filters, 
although I think I might prefer to have the date-filter come before the category. 
<span class="softhilight"><strong>Question 1</strong>: Should date-filters come <em>before</em> or <em>after</em> category-filters?</span>
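
If dates do come first, a parser can peel leading numeric segments off the path before treating the rest as a category. Here is a minimal Python sketch, assuming four-digit years and two-digit months and days; those formats and the name `split_date_prefix` are assumptions for illustration, not Blosxom's or Thistle's actual rules.

```python
import re

# Assumed shape of each leading date segment: year, then month, then day.
DATE_PATTERNS = (r"\d{4}", r"\d{2}", r"\d{2}")


def split_date_prefix(segments):
    """Split leading date segments off a date-first slashed path.

    ["2003", "07", "home", "repair"] -> (["2003", "07"], ["home", "repair"])
    """
    date = []
    # zip stops at the shorter input, so at most three segments are examined.
    for pattern, segment in zip(DATE_PATTERNS, segments):
        if not re.fullmatch(pattern, segment):
            break
        date.append(segment)
    return date, segments[len(date):]
```

One consequence of putting dates first: a category whose name happens to be a four-digit number would be read as a year, so the date patterns need to stay strict.
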

Problems arise, however, when you consider how Blosxom specifies a formatting-type (which it calls a "flavour"). It works like this:

<strong>The bruschetta.txt entry in /cooking/italian, default HTML flavour</strong>


<strong>The bathroom_remodel.txt post in /home/repair, displayed as RSS</strong>


<strong>The exercise.txt post in /personal/resolutions, using a custom 1993 flavour</strong>


In the above examples, all of the blog's actual posts are stored in files with the ".txt" suffix. The blog also has three formatting-types defined, called "html", "rss" and "1993". Presumably "html" is the default format, "rss" renders posts as an RSS feed, and "1993" is probably a much simpler formatting style, using circa-1993 HTML formatting (HTML 2.0? I can't remember what version we had back then). Uniquely, Blosxom specifies a "flavour" to use by adding the flavour's name as a <em>suffix</em> in its permalink URLs.
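
Splitting a suffixed permalink like the ones above into entry and flavour amounts to cutting the last path segment at its final dot. A Python sketch of that idea; the function name is hypothetical, and this is not Blosxom's own Perl code.

```python
def split_flavour(last_segment):
    """Split a permalink segment such as 'bruschetta.html' into (entry, flavour).

    Returns (segment, None) when there is no suffix, i.e. no flavour was given.
    """
    entry, dot, flavour = last_segment.rpartition(".")
    if not dot:
        return last_segment, None
    return entry, flavour


print(split_flavour("bathroom_remodel.rss"))  # ('bathroom_remodel', 'rss')
print(split_flavour("exercise.1993"))         # ('exercise', '1993')
```
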

This is all well and good until you consider the consequence of this system: you can only specify a "flavour" if the URL is a permalink. You cannot, for example, have an URL with a category-filter which also specifies a flavour to use when rendering that category. Thistle doesn't have this limitation; it accepts a "type" parameter in any context. Thus, our slashed URLs must be capable of specifying a type even if they are not permalink URLs. So where do we put the type in the URL?

There are two possibilities as I see it: either do the same as Blosxom regardless, or look for the type as a separate segment of the URL. 
Here's how it would look in each case:

<strong>Cocoa category using printer-friendly type, Blosxom-style</strong>




<strong>Cocoa category using printer-friendly type, separate segment</strong>




<strong>This post using printer-friendly type, Blosxom-style</strong>


<strong>This post using printer-friendly type, separate segment</strong>




So, <span class="softhilight"><strong>Question 2</strong>: For types, do we use Blosxom-style <em>suffixes</em>, or separate <em>segments</em>? 
If separate segments, should they be at the <em>beginning</em> or <em>end</em> of the URL? Otherwise, if suffixes, should they also have a slash in 
front of them if the URL is not a permalink (see examples above)?</span>
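
To compare the two options side by side, this Python sketch builds the same request in both styles. The `index.php` prefix, the `print` type, the post name `hello` and the function name `url_with_type` are hypothetical examples.

```python
def url_with_type(path_segments, type_name, style):
    """Build a slashed URL that asks for a formatting type.

    style="suffix":  Blosxom-like, the type becomes a file extension.
    style="segment": the type is its own trailing path segment.
    """
    base = "index.php/" + "/".join(path_segments)
    if style == "suffix":
        return f"{base}.{type_name}"
    if style == "segment":
        return f"{base}/{type_name}"
    raise ValueError(f"unknown style: {style!r}")


for style in ("suffix", "segment"):
    print(url_with_type(["cocoa"], "print", style))           # a category
    print(url_with_type(["cocoa", "hello"], "print", style))  # a post
```

With suffixes, `index.php/cocoa.print` looks just like a permalink to a top-level post called "cocoa", which is presumably the awkwardness behind the slash-before-suffix part of the question. With segments, a trailing `print` is unambiguous only as long as no category or post is also called "print".
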

Now we come to the issue of author-filters. As a descendant of <a href="">PHPosxom</a>, Thistle can support multiple authors per blog. An author can "sign" a post with their abbreviated nickname, and the post will be attributed to them appropriately when displayed. The thing is, I have no idea how many people actually <strong>use</strong> this feature. I know that Robert Daeley, the creator of PHPosxom, has actually <em>removed</em> the authors feature from the next revision of PHPosxom as part of his optimisation efforts.

I really like the idea of the authors feature; I just can't help wondering whether it's used often enough to justify including it in our new slashed URLs, given the corresponding increase in the complexity of the code that parses them. If you're a Thistle user, consider this a bonus question: do you actually use the multiple-authors feature?

In any case, working on the assumption that we should indeed support author-filtering in our new URLs, we need to decide where in the URL we'll look 
for the name (actually, the abbreviated nickname) of an author. Author nicknames can be assumed to have no spaces for the purposes of this discussion. 

Clearly, we have to consider question 2 (where type-filters go, and their format) when thinking about this. If you thought we should use suffixes for types, then we can immediately say that author-filters should go in a segment at the start of the URL, as below.

<strong>All posts by me, using printer-friendly type, in the Thistle category</strong>




However, if you preferred putting the type in a separate segment, then we have a further question to answer: 
<span class="softhilight"><strong>Question 3</strong>: Should author-filters be at the <em>same end</em> of the URL as type-filters, or the <em>opposite end</em>? 
If the same end, should the author-filters be <em>before</em> or <em>after</em> the type-filters?</span>

Almost done! One more question to go... and it's a tough one.

Order of precedence
If your previous decisions would allow type-filters and/or author-filters as <strong>separate segments</strong> at the <strong>start</strong> of an URL, 
then we have to decide on order of precedence. For example, consider this URL:


What does it mean? Or rather, what should Thistle <em>interpret</em> it to mean? It could be any of these:
  • All posts in the /jan/dev category
  • All posts in the /dev category written by the author "jan"
  • All posts in the /jan category written by the author "dev"
  • All posts in the /dev category, using the "jan" type
  • All posts in the /jan category, using the "dev" type
  • All posts, written by "jan", using the "dev" type
  • All posts, written by "dev", using the "jan" type
  • (Any others...?)
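
One way to see where that list comes from is to enumerate mechanically every way of assigning the segments to roles. The Python sketch below assumes at most one author and one type per URL, and that any category segments must be adjacent; those constraints and the function name are assumptions for illustration.

```python
from itertools import product

ROLES = ("category", "author", "type")


def interpretations(segments):
    """List every reading of a slashed URL's segments as category/author/type.

    Assumed constraints: at most one author, at most one type, and any
    category segments must be adjacent to one another.
    """
    readings = []
    for roles in product(ROLES, repeat=len(segments)):
        if roles.count("author") > 1 or roles.count("type") > 1:
            continue
        cat_idx = [i for i, role in enumerate(roles) if role == "category"]
        if cat_idx and cat_idx[-1] - cat_idx[0] + 1 != len(cat_idx):
            continue  # category segments are split up, e.g. cat/author/cat
        category = None  # no category segments means "all posts"
        if cat_idx:
            category = "/" + "/".join(segments[i] for i in cat_idx)
        reading = {"category": category}
        for segment, role in zip(segments, roles):
            if role != "category":
                reading[role] = segment
        readings.append(reading)
    return readings


for reading in interpretations(["jan", "dev"]):
    print(reading)
```

For the two segments `jan` and `dev` this produces seven readings, the same seven listed above; a third segment already allows eleven.
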
Whenever more than one entity (category, author or type) has the same name, we must have an order of precedence to fall back on when deciding how to interpret the URL. This leads to my final question, and perhaps the most important one of all:
<span class="softhilight"><strong>Question 4</strong>: What order of precedence should be applied for 
identically named <em>types</em>, <em>categories</em> and <em>authors</em>?</span>

Think carefully about that last question. It's important to consider the most common situations encountered 
in the average blog, the most commonly-used features, how people organise their information, and so on. 
Justifications and/or elaborations would be very helpful here.

My answers
Well, it's over: class dismissed! If you've even just <em>read</em> this far, I'd like to thank you. If you plan to leave feedback, then consider yourself officially blessed. I know that reading this must have been a bit of an ordeal; if it's any consolation, writing and formatting this post was a real swine.

In closing, I'm going to give you my answers to each of the questions, for reference. Thanks again for reading!
  1. Date-filters should come before category filters.
    I've always considered dates to be more "encompassing" than categories, so they should come first. Also, it will look better for "archive" URLs to start with /2003/07/... etc, since archives are usually primarily ordered by date.
  2. Type-filters should be separate segments, at the end of URLs.
    Blosxom's suffixes are certainly enticing, but they weren't designed for use with category-filters, and they look awkward there; separate segments are clean and consistent. The formatting template used for a page is a secondary property of the URL, almost an afterthought, so types should go at the end. This allows URLs to end with "/print" for printable versions, or with "/rss" for feeds (as LiveJournal does).
  3. Author-filters should be at the same end of an URL as type-filters, and should come before type-filters.
    As with question 2, I think that author-filtering is subservient to category-filtering, so it should come after the main part of the URL. However, filtering by author is more important than choosing which formatting template renders the page, so author-filters should come before type-filters. URLs should look like this: "index.php/category/author/type".
  4. The order of precedence should be: category, author, type.
    (As explained in my previous answers.)
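
Putting answers 2 to 4 together, a resolver might try candidate readings of the trailing segments in precedence order (category, then author, then type) and take the first whose parts all exist. Here is a Python sketch under those assumptions; the function name and the known-name sets are hypothetical sample data, and date handling (answer 1) is left out to keep it short.

```python
def resolve(segments, categories, authors, types):
    """Read a category/author/type URL path using a fixed order of precedence.

    Candidate readings are ordered so that category beats author and author
    beats type; the first reading whose parts all exist is returned.
    """
    n = len(segments)
    # Each pair is (author segments, type segments) taken from the end of the path.
    for n_author, n_type in ((0, 0), (1, 0), (0, 1), (1, 1)):
        tail = n_author + n_type
        if tail > n:
            continue
        head = segments[:n - tail]
        category = "/" + "/".join(head) if head else None
        author = segments[n - tail] if n_author else None
        type_ = segments[-1] if n_type else None
        if category is not None and category not in categories:
            continue
        if author is not None and author not in authors:
            continue
        if type_ is not None and type_ not in types:
            continue
        return {"category": category, "author": author, "type": type_}
    return None  # no reading fits the known names


# Hypothetical sample data: "dev" is both an author nickname and a type.
CATEGORIES = {"/jan", "/cocoa"}
AUTHORS = {"dev", "matt"}
TYPES = {"dev", "print"}

print(resolve(["jan", "dev"], CATEGORIES, AUTHORS, TYPES))
# {'category': '/jan', 'author': 'dev', 'type': None}
print(resolve(["cocoa", "matt", "print"], CATEGORIES, AUTHORS, TYPES))
# {'category': '/cocoa', 'author': 'matt', 'type': 'print'}
```

Because "dev" is registered as both an author and a type in the sample data, `/jan/dev` resolves to posts in /jan by the author "dev": the author reading outranks the type reading, as answer 4 prescribes.
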
Whew. All done.