Summarizing My rev="canonical" Argument by Ben Ramsey

I think the central argument against rev="canonical" in my previous post got lost because the post was so long. So, I’ll try to summarize my points concisely.

Let’s get away from the argument about rev not being in HTML 5. That’s not the point. If a case can be made to add it back into HTML 5, then fine. Though, I still think it’s a confusing attribute, and perhaps not needed, but that is not what I am arguing. I am arguing that rev="canonical" is not the best way to indicate to clients a shorter form for the current URL. Rather, I think using rel="alternate shorter" is a better way.
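To make the two options concrete, here is a minimal Python sketch of how a client might look for either form in a page's markup. It is only an illustration under stated assumptions: the class name ShortLinkFinder, the helper find_short_links, and the example.com URLs are hypothetical and not part of any proposal or library.

```python
from html.parser import HTMLParser


class ShortLinkFinder(HTMLParser):
    """Collect <link> hrefs that advertise a shorter URL for the page.

    Hypothetical helper for illustration; not part of any specification.
    """

    def __init__(self):
        super().__init__()
        self.rel_shorter = []    # hrefs marked rel="alternate shorter"
        self.rev_canonical = []  # hrefs marked rev="canonical"

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel and rev hold space-separated keywords, compared case-insensitively.
        rel = (attr.get("rel") or "").lower().split()
        rev = (attr.get("rev") or "").lower().split()
        if "alternate" in rel and "shorter" in rel:
            self.rel_shorter.append(href)
        if "canonical" in rev:
            self.rev_canonical.append(href)


def find_short_links(html):
    """Return (rel_shorter_hrefs, rev_canonical_hrefs) found in the markup."""
    finder = ShortLinkFinder()
    finder.feed(html)
    return finder.rel_shorter, finder.rev_canonical


# Made-up page that happens to carry both forms.
page = (
    '<html><head>'
    '<link rel="alternate shorter" href="http://example.com/s/1">'
    '<link rev="canonical" href="http://example.com/s/2">'
    '</head></html>'
)
rel_links, rev_links = find_short_links(page)
```

With the explicit form, the client knows from the keyword alone that the link is meant to be shorter. With rev="canonical", it only knows the link points back to this page and has to infer the rest.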

What I would like to focus on is why people seem to think rev="canonical" is better than rel="alternate shorter". Is it? If so, why?

Consider this obviously fictional and tongue-in-cheek dialogue between a client and a requested resource:

“Hi, Resource! How’s it shakin’ today? I like what you have to offer, but what’s this rev="canonical" link you have? Am I not already requesting your canonical URL?”

“Oh. Hi, Client. Yes, this is my canonical URL, but that’s another URL that refers to my canonical URL.”

“Why would I want to know what other URLs refer to your canonical one? Is not the point of having a canonical URL that the other URLs don’t matter? Besides, they all point here anyway.”

“True, but that’s a shorter URL that you might want to use instead of my canonical one.”

“Well, how was I supposed to know that? All it tells me is that it’s another URL that points to your canonical one.”

“It’s implied.”

“Implied?”

“Yeah. A community agreed that rev="canonical" would mean that the URL identified by the href is a shorter form for my canonical one.”

“Why didn’t they just make it explicit with something like a rel="alternate shorter" attribute instead of being ambiguous about it?”

And that’s the crux of my argument.

A couple of other notes…

On my previous post, Matt Cutts makes an excellent point about the danger in allowing documents to claim canonical-ness over other URLs. But if we do as Google did with rel="canonical" and restrict rev="canonical" to specify only URLs that are in the same domain as the canonical one, then we lose the value of being able to specify shorter domains, as we often do with shorter URLs. In fact, in Simon Willison’s post about his own rev="canonical" implementation, he mentions that he bought a new domain name for his links.
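As a rough sketch of what a same-domain restriction would mean for a client, the Python below accepts a short link only when its host matches the page's host. The function name same_host and all URLs are made up for illustration, and the strict host comparison is my own simplifying assumption; the rule a search engine actually applies may be looser (for example, about subdomains).

```python
from urllib.parse import urlsplit


def same_host(page_url, short_url):
    """Return True when both URLs share the same host.

    Assumption: "same domain" is treated as an exact host match.
    urlsplit() lowercases the hostname, so the comparison ignores case.
    """
    page_host = urlsplit(page_url).hostname
    short_host = urlsplit(short_url).hostname
    return page_host is not None and page_host == short_host


canonical = "http://example.com/2009/04/11/a-very-long-post-title"

# A short path on the same host passes the restriction.
allowed = same_host(canonical, "http://example.com/s/1")

# A short link on a separate, shorter domain is rejected,
# which is exactly the value we would lose.
rejected = same_host(canonical, "http://ex.example/1")
```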

Bradley Holt makes an interesting point on Twitter about the use of the alternate keyword. Alternate is supposed to refer to another representation of the current document with the same content. Can it also refer to another representation of the URL that points to the current document?

5 Comments

Simon Harris April 11, 2009 at 23:15

I agree with your argument, rel-alternate/shorter is better.

But - and please don't take this personally - I don't agree that any of this matters in the first place. It's a real storm in a teacup.

The only reason there's a debate about how to represent "short" URLs is because Twitter is fairly popular among developers right now. Twitter is one small (and frankly quite annoying) part of the Web ecosystem, and people are proposing changing HTML to accommodate it, in ways that will be completely ignored by every user and user agent out there, and splitting hairs about how best to do so? Really? Does this deserve the amount of noise it has generated?

Eli White April 13, 2009 at 15:01

Hey Ben, just wanted to say that I agree with you 100%. This is the crux of my own opinions as well, and I was going to write my own blog post about them, but you've done it so well already.

Les Orchard April 13, 2009 at 19:41

Or, here's an alternate dialogue that sums up part of my like for rev="canonical":

"Hi there, HTML page. I found you with this URL I've got handy. I like you, but this particular URL is a bit wide. You got any others I can use to find you?"

"Sure, I've got at least one listed as rev=canonical in my head. There might be more."

"Oh, hey, I found the list and at least one of them is shorter than what I had before. Oh yeah, and we don't like the ev.il URL shortener as a policy on our end, so we'll use that good.ie link instead."

The crux of my argument is choice, both for the publisher and for the link carrier. Granted, the ev.il vs good.ie service choice is a tad contrived, but as you can see, URL length preferences and acceptable service policies are supported by rev="canonical".

Les Orchard April 13, 2009 at 19:43

The above scenario is, of course, supported by rel="short" or whatever - but rev="canonical" treats shorter length as on par with any other criteria for choosing an alternate URL for a given page. It also leaves open the possibility to assert that maybe I don't *want* you to shorten my links.

Erik Vold April 21, 2009 at 02:41

I disagree with your assumption that the canonical relation is restricted to a single domain, because it does not reflect how canonicalization is defined. What you describe is how it is used by the search engines, which is a perfectly fine arbitrary decision for them to make.

Since you want to get away from the html 5 arguments, I'll assume we can agree for argument's sake that html 5 was incorrect in removing @rev, and will shortly reintroduce it.

So your central argument is that rel=short* is better than rev=canonical to specify short url alternatives, correct?

I would argue that a rel=short* is by definition canonical, so if we can imagine @rev is alive and well, you would use rev=canonical for every rel=short* because rel=short* is a subset of rev=canonical.

In a world of documents where every rel=short* has a rev=canonical, rel=short* still has a use, I would argue, which is to specify the publisher's preferred short url. But it's a marginal benefit really, because it is not hard to deduce which of the 1+ rev=canonical links is the shortest, or short enough to satisfy the user.

Some argue that if rev=canonical is used at all it should be used for every rev=canonical, a possibly infinite number. I disagree, and say the publisher should simply use rev=canonical for any rev=canonical link that may be of interest to the user. This will enable many more discussions between a user and a document to occur than the one we are all dealing with this month, which is providing a short url.

In response to your dialog above, I think you went astray right from the start. Try this use case:

User: "Hey document, your canonical url is f***ing huge, does your canonical url represent any other urls I can scan for a shorter version I can use?"

Doc: "Yeah sure here are some."

User: "Nice I picked one short enough, dope!"

[If short url is from another domain]
User [thinks]: "Wait this url is from another domain.. is it safe?"

User [thinks]: "I better verify by checking its http headers for a rel=canonical or a 301 redirect to a page the canonical represents, if there is not one there then I can also scan the html if I'm a keener."

User [thinks]: "done, off to twitter."

User [thinks]: "those old days of using tinyurl were so lame."

Well, actually a user is not going to check the http headers, but they could trust the rev=canonical, or use a tool which will do all of the above for the user, like say a bookmarklet, a ubiquity command, a firefox extension, etc.