A rev="canonical" Rebuttal by Ben Ramsey

There’s a lot being said about rev="canonical". Others have already explained what it is and stated the arguments for it, so I won’t go into all of that, but I would like to offer a rebuttal—to play devil’s advocate, so to speak—in hopes that we’ll all slow down and think about what we’re doing before we jump all-in and start implementing something that may not be a good standard for the Web, leading to more problems down the road.

First, let’s look at rev="canonical" from the perspective of a purist. HTML 5 does not include the rev attribute on the link or a tags. It was dropped, and there has been a lot of discussion about this, so to reintroduce it at this point is a fruitless effort. The community has already decided against it. Why bring it back to the table?

Furthermore, I thought we had moved beyond encouraging people to break the standards. Rather, we want to encourage people to follow standards and not make their own. Creating your own standards leads to differentiation and specialization in client applications (browsers), and some browsers will end up supporting the new features, while others will not. The frustrations faced by client-side developers attempting to program for multiple clients have taught us that this is not desirable.

That said, if the microformats and HTML 5 communities are interested in revisiting this and considering rev for inclusion in HTML 5, then this is solved, but I still think there are some grievous pragmatic problems with rev="canonical".

The first is this: rev is too damned confusing to understand. If it takes a two-hour conversation on IRC to explain what rev="canonical" means, then something is wrong. Developers should be able to understand the semantic meaning of rev="canonical" at first sight and without the need to dig through multitudes of documentation and blog posts to grok the concept of rev.

I think this confusion will ultimately lead to problems that render the value of rev="canonical" as meaningless to clients and search engines. What rev="canonical" really means is this: “I (the URL of the current document) am the canonical URL for that URL over there (the one specified in the href of the link tag).” What I think will happen, though, is that people will misunderstand this, as previous usage of rev has shown. This misunderstanding could lead to the following improper implementations.

Let’s say, for example, that the current document’s canonical URL is http://example.org/2009/04/10/a-rebuttal-for-rev-canonical. A shortened form might be http://example.org/revcanonical. Knowing this, the correct implementation for rev="canonical" would be:

<link rev="canonical" href="http://example.org/revcanonical" />

Thus, when a client reads http://example.org/2009/04/10/a-rebuttal-for-rev-canonical, it sees this link and understands that the current URL is the canonical URL for http://example.org/revcanonical.

I foresee that implementers could easily misunderstand this and implement it like this:

<link rev="canonical" href="http://example.org/2009/04/10/a-rebuttal-for-rev-canonical" />

In this case, the link is self-referential, and the value of rev="canonical" is lost, since no short form is specified. However, this won’t lead to any problems, since the default URL for a document not containing rev="canonical" is the original URL, according to RevCanonical.
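To make the two cases above concrete, here is a minimal sketch of how a client might read rev="canonical" from a page and fall back to the original URL. It uses only Python's standard `html.parser`. The names `RevCanonicalParser` and `short_url_for` are hypothetical, and skipping a self-referential link is an assumption about reasonable client behavior, not something taken from the RevCanonical proposal.

```python
from html.parser import HTMLParser


class RevCanonicalParser(HTMLParser):
    """Collects href values of <link> tags whose rev contains 'canonical'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rev, like rel, holds a space-separated list of link types.
        rev_tokens = (attr.get("rev") or "").lower().split()
        if "canonical" in rev_tokens and attr.get("href"):
            self.hrefs.append(attr["href"])


def short_url_for(page_url, html):
    """Return the first rev="canonical" href that differs from page_url.

    Falls back to page_url itself when no usable link is present
    (hypothetical helper; the self-reference check is an assumption).
    """
    parser = RevCanonicalParser()
    parser.feed(html)
    for href in parser.hrefs:
        if href != page_url:
            return href
    return page_url
```

With the correct markup this returns http://example.org/revcanonical; with the self-referential markup, or with no link at all, it returns the page's own URL.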

What will lead to problems is when people misunderstand rev, thinking it to mean rel—after all, rev isn’t in the HTML 5 spec, so maybe rel will work (or so the thinking may go)—so they implement it like this:

<link rel="canonical" href="http://example.org/revcanonical" />

This means something entirely different. This tells Google (and maybe other search engines) that the canonical URL of the current document is the value of the href. It is the inverse of rev="canonical". This might not lead to a problem quite as drastic as the linkrot apocalypse, but it might lead to inaccurate URLs being stored in search engines and could negatively affect your SEO.
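The inversion is easy to state in code. The sketch below is purely illustrative, and `canonical_claim` is a hypothetical helper name: given the current document's URL, the attribute used, and the href, it returns which URL is being declared canonical and which is the secondary one.

```python
def canonical_claim(current_url, attribute, href):
    """Return (canonical_url, secondary_url) for a "canonical" link.

    attribute is either "rel" (forward link) or "rev" (reverse link).
    """
    if attribute == "rel":
        # Forward link: the href is the canonical URL of the current document.
        return (href, current_url)
    if attribute == "rev":
        # Reverse link: the current document is the canonical URL for the href.
        return (current_url, href)
    raise ValueError("attribute must be 'rel' or 'rev'")
```

Swapping a single letter in the attribute name swaps the two roles, which is why the mistaken rel="canonical" markup tells a search engine that the short URL is the canonical one.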

Finally, earlier I said that rev="canonical" means “I am the canonical URL for that URL over there.” In this case, “that URL over there” just happens to be a shorter one, but semantically, that’s not what rev="canonical" means, and here is another problem with this approach.

A canonical URL is just that: canonical. It is the primary URL used to refer to the resource. All other URLs referring to the resource are secondary and unimportant (except insomuch as they direct us to the primary URL). In fact, there could be infinite secondary URLs that direct clients to the canonical one. By specifying rev="canonical", you assign importance to the link identified by the href, but you don’t express why it is important, except to say that it is another URL that points to this canonical one. In fact, you could have hundreds of rev="canonical" links for any particular document. How would an implementer choose the proper one to use as the short URL?

This is why I think better semantics are necessary. I see no need to specify that “this is the canonical URL for that URL over there.” If things are set up properly, then “that URL over there” will properly tell search engines and clients that it isn’t the canonical URL by responding with a 301 or 302 redirect. Instead, the canonical URL should tell clients that there is a preferred shorter form of the URL that may be used if desired, and I think the best way to do that is with a rel attribute, specifying an alternate URL form for the current document. The RevCanonical folks also identify this form:

<link rel="alternate shorter" href="http://example.org/revcanonical" />

All of the aforementioned problems are solved with this usage: it doesn’t break the HTML 5 standard, it isn’t confusing and can be understood by developers without the need for long discussions, and it doesn’t imply that the current document is attempting to identify all URLs for which it is the canonical one.
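The "set up properly" half of this arrangement, where the short URL itself answers with a redirect to the canonical one, might look like the following. This is a minimal sketch under stated assumptions: it is a bare WSGI callable, the `SHORT_TO_CANONICAL` table and the `short_url_app` name are hypothetical, and a real deployment would more likely use web server rewrite rules or a framework.

```python
# Hypothetical mapping from short paths to canonical URLs.
SHORT_TO_CANONICAL = {
    "/revcanonical": "http://example.org/2009/04/10/a-rebuttal-for-rev-canonical",
}


def short_url_app(environ, start_response):
    """WSGI app: answer known short paths with a 301 to the canonical URL."""
    target = SHORT_TO_CANONICAL.get(environ.get("PATH_INFO", ""))
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    # The Location header tells clients and search engines where the
    # canonical document lives; no markup is needed on the short URL.
    start_response("301 Moved Permanently", [("Location", target)])
    return [b""]
```

For local experimentation the callable can be served with `wsgiref.simple_server.make_server` from the standard library.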

So, why does RevCanonical specify two forms that serve the same purpose, and why do they advocate for the one that violates the HTML 5 spec and is confusing as hell to explain when there is already a form (suggested by themselves) that doesn’t violate the standard and is easy to understand?

Next steps: I’ve already added the “shorter” rel attribute to the WHATWG wiki, and I’ve mentioned it on the #microformats IRC channel. It will be a big uphill battle to get them to reconsider adding rev back to the HTML 5 spec, but I think the low-hanging fruit is in getting the “shorter” rel type added, and I think there’s a good case for adding it. The danger now is in how many early adopters implement rev="canonical" in the meantime. It looks like people are starting to add it, and that worries me a bit.

20 Comments

Herman Radtke April 11, 2009 at 05:21

I sense a panic in the air surrounding this whole issue. We shouldn't be scared into adopting rev="canonical". You, and some others, have raised some strong concerns and those concerns need to be addressed before its use becomes widespread.

Ben Ramsey April 11, 2009 at 05:30

I don't think of it as panic, but it is concern. Judging from tweets about the issue, it seems to me that many people are rushing in to adopt it before it's been fully thought out and vetted by the community.

But maybe early adopters are needed to test it so that we can think through it and vet it.

Drew McLellan April 11, 2009 at 10:18

You don't need to worry about early adopters - early adopters are smart enough to worry about themselves. Something that takes 5 minutes to implement only takes 2 minutes to change, if necessary.

What this does highlight, if anything, is the poor decision to remove a perfectly useful attribute from the HTML5 spec. You happen to disagree with this particular use of rev, but a lot of people support it. That shows, if anything, that this attribute has some merit, and an application with which we can all be agreed may arise tomorrow.

The bulk of your argument against appears to simply be that WHATWG has already decided it doesn't like the rev attribute, which is putting the cart way before the horse.

Ben Ramsey April 11, 2009 at 13:17

Drew: I don't think the bulk of my argument is that it isn't in the HTML 5 spec. I did mention that, but I also said that, if the WHATWG is interested in "considering rev for inclusion in HTML 5, then this is solved." I didn't discount that they could choose to include it again. The bulk of my argument really centers around the fact that I think rev is bad in general because it is prone to being misunderstood and, thus, misused. As a concept, it isn't bad, but most people won't grok it, so they won't implement it correctly.

Chris Shiflett April 11, 2009 at 14:14

I pretty much agree with everything Drew said, but I'm glad you wrote this. I'm off to play soccer in a minute, but one quick response first. :-)

If rev was safely part of the HTML 5 spec, I don't think there would be any issue at all. It's not hard to explain: "The rel attribute indicates a forward link; the rev attribute indicates a reverse link." It also makes the most sense, because it's the opposite of rel="canonical".

My own devil's advocate argument would be different. Implementing rev="canonical" for sites like Dopplr and Flickr takes very little effort. Implementing it for sites like Twitter requires an HTTP request and some HTML parsing every time. That's a lot to ask, and I think it's the biggest obstacle rev="canonical" faces.

Chris Snyder April 11, 2009 at 14:24

You are absolutely spot-on. It's counter-intuitive. It also encourages programmers to ignore the HTTP standard in some vain chase for SEO magic. And it's a lot of fuss for something that solves a very small part of a much larger problem (sites that run their own internal short URL service vs. the thousands of third-party short URL services that everyone will continue to use).

When did we deprecate the HTTP redirect in favor of markup? If there is a canonical URL for this resource, then why am I not looking at it?

Chris Shiflett April 11, 2009 at 14:52

Chris, I think you're confused.

1. The debate about rev versus rel has to do with whether the former is going to be included in HTML 5. (HTTP is a closed spec and is unrelated.)

2. No one cares if lots of people continue to use third-party services. This idea isn't for them.

3. Answering your last question, you are indeed looking at the canonical URL. This is the distinction between rev and rel; rev indicates a reverse relationship, so the other alternative is not the canonical one. (It's the short one.)

Hope that helps.

Jason Karns April 11, 2009 at 14:56

In regards to specifying one preferred short URL over many others that could potentially be linked from the canonical document, I posted a thought-comment about this specific issue. http://revcanonical.wordpre...

CSS Stylesheets are linked and can be grouped by @title on the link element and can be marked as preferred by adding @title and omitting 'alternate' on @rel. Thus all links with @rel=alternate would be secondary to the @rev=canonical link that has @title but omits @rel=alternate
See the HTML 4.01 spec for further information on linking persistent, preferred and alternate stylesheets.
http://www.w3.org/TR/REC-ht...

Jeremy Keith April 11, 2009 at 15:50

What Drew said.

There are other things in HTML that are much, much harder to grok than rev. People seem to do okay. And certainly the early adopters implementing rev="canonical" right now know what they're doing.

If the argument "but people might use it incorrectly" were a valid concern, we shouldn't even be talking about new HTML elements like header, footer, section, article, etc., because some people are almost certain to implement them incorrectly. But that doesn't stop them being semantically useful constructs that ought to be in the spec.

Just like rev.

Lars Gunther April 11, 2009 at 16:41

I am not worried about SEO juice. I am worried about usability and security. When I see a shortened link I don't know:

1. What domain it leads to. I want to see information in order to make an informed decision about how safe and informative the resource might be.

2. If I've seen the page before. I basically stopped clicking links on Twitter, since I often find myself on a page where I've been before, following a different URL.

3. What the page is about. Today many sites have very well designed URLs that in themselves are informative. Short URLs make us lose that information.

I'm seeing shortened URLs all the time now, even when the full URL would fit with margin within Twitter's 140-character limit.

So this is my idea. Except for badly designed .net sites, which tend to have URLs that read like a book, stop using these (beep) URL shorteners!

Nicholas Sloan April 11, 2009 at 16:44

I think people are ignoring the argument that rev="canonical" indicates that the current URL is canonical for the listed URL, and that it attributes importance to that alternate URL without any semantics for why.

I am also bothered by the fact that this feature is being included in the markup. It really just feels like a protocol type of issue to me. Can someone explain why this is being considered for implementation in markup rather than as an extension to HTTP? HTTP does allow extensions, doesn't it?

Ben Ramsey April 11, 2009 at 17:26

Nicholas: That is precisely my argument, and I think people are missing that and focusing on the fact that I mentioned that rev is not in HTML 5.

I smell another post coming from me in the near future that concisely states what I was trying to say here.

Matt Cutts April 11, 2009 at 17:55

"Thus, when a client reads http://example.org/2009/04/..., it sees this link and understands that the current URL is the canonical URL for http://example.org/revcanon...."

If a url A1 can claim it is the canonical url for another url A2 on the domain A, that opens up the possibility of hijacking attacks--especially on freehosts. That's why when my team at Google built consensus for rel=canonical, we said that urls could only give away canonical-ness, not take it from other urls. Splatting canonical-ness forward from a url is safe, but claiming canonical-ness from other urls opens up the possibility of attacks.

Andy Mabbett April 11, 2009 at 22:21

Other issues aside, rel="shortcut" seems to be more precise than rel="shorter".

Andy Mabbett April 11, 2009 at 23:10

The solution to the problem Matt Cutts highlights is to only accept rel="shortcut" (or, for that matter, rev="canonical") if a reciprocal rel="canonical" is also in place (as with rel="me"). In which case, passing that relationship across domains should also be acceptable.
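A reciprocal check of this kind could be sketched as follows. This is an illustration only: `LinkCollector`, `links_in`, and `reciprocated` are hypothetical names, and the `pages` dictionary is an in-memory stand-in for actually fetching each URL, which also assumes the short URL serves a document rather than redirecting.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (attribute, token, href) triples from <link> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        for name in ("rel", "rev"):
            # Both attributes hold space-separated lists of link types.
            for token in (attr.get(name) or "").lower().split():
                self.links.append((name, token, href))


def links_in(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def reciprocated(canonical_url, short_url, pages):
    """True only if both sides agree on the relationship.

    pages maps URL -> HTML source (a stand-in for fetching the documents).
    """
    claims = ("rev", "canonical", short_url) in links_in(pages.get(canonical_url, ""))
    grants = ("rel", "canonical", canonical_url) in links_in(pages.get(short_url, ""))
    return claims and grants
```

A page that merely claims to be canonical for some other URL, without that URL pointing back, is rejected.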

samj April 13, 2009 at 00:38

Agreed. Vote NO RevCanonical.

I'd suggest that rel="short" is the appropriate alternative (shorter? shorter than what? is that guaranteed to be the case?)

Sam

samj April 13, 2009 at 02:03

See rev=canonical considered harmful (complete with sensible solution) for more. I've taken the liberty of updating the proposal to rel="short" because it's a> shorter and b> makes more sense :)

Sam

Bernhard Häussner April 13, 2009 at 14:46

The proof that developers get this wrong is php.net. They define every one of their mirror pages as canonical, using rev="canonical" on each to point to the short URL. The Flickr rev implementation doesn't provide short URLs on URLs that are not canonical. This is not optimal either.

But with rel="shortcut" there's another problem: many pages define their bookmark icon via rel="shortcut icon". Since the rel attribute is a space-separated list, the shortcut icon would be your short URL.

Using short or shorter is not quite correct either, because one would expect a shorter version of the content.

And putting alternate in the list is even worse because the short url is not really an alternate to view the content since it just redirects.

I figured out rel="shortcut redirect" would fit best.
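The "shortcut icon" collision follows directly from how rel values are tokenized. A minimal sketch, with `rel_tokens` and `has_link_type` as hypothetical helper names:

```python
def rel_tokens(rel_value):
    """Split a rel attribute value into lowercase, space-separated tokens."""
    return set(rel_value.lower().split())


def has_link_type(rel_value, link_type):
    """True if link_type appears as one of the tokens in rel_value."""
    return link_type.lower() in rel_tokens(rel_value)


favicon_rel = "shortcut icon"

# A client looking for the "shortcut" type matches the favicon link too.
collides = has_link_type(favicon_rel, "shortcut")

# A type that is not a token in existing markup does not collide.
safe = has_link_type(favicon_rel, "shortlink")
```

Here `collides` is True and `safe` is False: any client that treats "shortcut" as a standalone link type would pick up the favicon's href as the short URL.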

samj April 20, 2009 at 12:49

I was on the rel=short[cut] bandwagon until I realised that there will surely be difficult-to-diagnose breakage thanks to Microsoft and others [foolishly] pushing "shortcut icon".

The only truly safe option that doesn't require people to cover multiple bases is rel=shortlink.

Sam

Erik Vold April 21, 2009 at 05:18

Here is my rebuttal to rel=short*: http://erikvold.com/blog/in...