finally a bnode with a uri

A Comparison of Microformats, eRDF, and RDFa

An updated (and customizable) comparison of the different approaches for semantically enhancing HTML.
Update (2007-02-13): In order to avoid further flame wars with RDFa folks, I've adjusted the form so that it no longer shows my personal priorities as default settings (here they are if you are interested; they produce a 48-42-40 ranking for MFs, eRDF, and RDFa respectively). All features are now set to "Nice to have". As you can see, with these settings RDFa gets the highest ranking (I *said* the comparison is not biased against RDFa!). If you disable the features related to domain-independent resource descriptions, MFs shine; if you insist on HTML validity, eRDF moves up; and so on. It's all in the mix.

After a comment of mine on the Microformats IRC channel, SWD's Michael Hausenblas asks for the reason why I said that I personally don't like RDFa. Damn public logs ;) OK, now I have to justify that somehow without falling into rant mode again...

I already wrote a little comparison of Microformats, Structured Blogging, eRDF, and RDFa some time ago, so this sounds like a good opportunity to see how things have evolved during the last 8 months. Back then I concluded that both eRDF and RDFa were preferred candidates for SemSol, but that RDFa lacked the necessary deployment potential because it is not valid HTML (as far as any widespread HTML spec is concerned).

I excluded the Structured Blogging initiative from this comparison; it seems to have died a silent death. (Their approach of redundantly embedding microcontent in script tags apparently didn't convince the developer community.) I also excluded features that are equally available in all approaches, such as visible metadata, general support for plain literals, well-formedness, no negative effect on browser behaviour, etc.

Pretending to be constructive, and in order to make things less biased, I embedded a dynamic page item that allows you to create your own tailored comparison. The default results reflect my personal requirements (and hopefully answer Michael's question). As your mileage most probably varies, you can just tweak the feature priorities (the different results are not stored, but the custom comparisons can be bookmarked). Feel free to leave a comment if you'd like me to add more criteria.

No. | Feature or Requirement | Priority | MFs | eRDF | RDFa
1 | DRY (Don't Repeat Yourself) | Nice to have | yes | yes | mostly
2 | HTML4 / XHTML 1.0 validity | Nice to have | yes | yes | no
3 | Custom extensions / Vocabulary mixing | Nice to have | no | yes | yes
4 | Arbitrary resource descriptions | Nice to have | no | yes | yes
5 | Explicit syntactic means for arbitrary resource descriptions | Nice to have | no | no | yes
6 | Supported by the W3C | Nice to have | partly | partly | yes
7 | Follow DCMI guidelines | Nice to have | no | yes | no
8 | Stable/Uniform syntax specification | Nice to have | partly | yes | yes
9 | Predictable RDF mappings | Nice to have | mostly | yes | yes
10 | Live/Web Clipboard Compatibility | Nice to have | yes | mostly | mostly
11 | Reliable copying, aggregation, and re-publishing of source chunks (Self-containment) | Nice to have | mostly | partly | partly
12 | Support for not just plain literals (e.g. typed dates, floats, or markup) | Nice to have | yes | no | yes
13 | Triple bloat prevention (only actively marked-up information leads to triples) | Nice to have | yes | yes | no
14 | Possible integration in namespaced (non-HTML) XML languages | Nice to have | no | no | yes
15 | Mainstream Web developers are already adopting it | Nice to have | yes | no | no
16 | Tidy-safety (Cleaning up the page will never alter the embedded semantics) | Nice to have | yes | yes | no
17 | Explicit support for blank nodes | Nice to have | no | no | yes
18 | Compact syntax, based on existing HTML semantics like the address tag or rel/rev/class attributes | Nice to have | yes | mostly | partly
19 | Inclusion of newly evolving publishing patterns (e.g. rel="nofollow") | Nice to have | yes | no | partly
20 | Support for head section metadata such as OpenID or Feed hooks | Nice to have | no | partly | partly

Results

Solution | Points | Missing Requirements
RDFa | 35 | -
eRDF | 34 | -
Microformats | 33 | -

Max. points for selected criteria: 60

Summary:

Your requirements are met by RDFa, or eRDF, or Microformats.
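
The form's scoring code is not shown here, but the totals above can be reproduced with a plain weighted sum. The following Python sketch assumes that yes, mostly, partly, and no are worth 3, 2, 1, and 0 points, and that "Nice to have" weighs 1 and "I don't care" weighs 0. These values are inferred (they reproduce 35/34/33 out of 60 for the default settings); weights for any other priority levels, and the "Missing Requirements" check, are not modelled.

```python
# ASSUMED point values, inferred from the published totals (not the form's code).
RATING_POINTS = {"yes": 3, "mostly": 2, "partly": 1, "no": 0}

# ASSUMED weights; only these two priority levels are modelled here.
PRIORITY_WEIGHTS = {"I don't care": 0, "Nice to have": 1}

SOLUTIONS = ("Microformats", "eRDF", "RDFa")

# Ratings copied from the feature table, features 1-20, as (MFs, eRDF, RDFa).
FEATURE_RATINGS = [
    ("yes", "yes", "mostly"),        # 1  DRY
    ("yes", "yes", "no"),            # 2  HTML4 / XHTML 1.0 validity
    ("no", "yes", "yes"),            # 3  Custom extensions / vocabulary mixing
    ("no", "yes", "yes"),            # 4  Arbitrary resource descriptions
    ("no", "no", "yes"),             # 5  Explicit syntax for arbitrary resources
    ("partly", "partly", "yes"),     # 6  Supported by the W3C
    ("no", "yes", "no"),             # 7  Follow DCMI guidelines
    ("partly", "yes", "yes"),        # 8  Stable/uniform syntax specification
    ("mostly", "yes", "yes"),        # 9  Predictable RDF mappings
    ("yes", "mostly", "mostly"),     # 10 Live/Web Clipboard compatibility
    ("mostly", "partly", "partly"),  # 11 Self-containment
    ("yes", "no", "yes"),            # 12 Not just plain literals
    ("yes", "yes", "no"),            # 13 Triple bloat prevention
    ("no", "no", "yes"),             # 14 Namespaced XML integration
    ("yes", "no", "no"),             # 15 Mainstream adoption
    ("yes", "yes", "no"),            # 16 Tidy-safety
    ("no", "no", "yes"),             # 17 Explicit blank nodes
    ("yes", "mostly", "partly"),     # 18 Compact syntax
    ("yes", "no", "partly"),         # 19 Newly evolving patterns
    ("no", "partly", "partly"),      # 20 Head section metadata
]


def score(priorities):
    """Return (points per solution, maximum reachable points)."""
    totals = dict.fromkeys(SOLUTIONS, 0)
    max_points = 0
    for ratings, priority in zip(FEATURE_RATINGS, priorities):
        weight = PRIORITY_WEIGHTS[priority]
        max_points += weight * RATING_POINTS["yes"]
        for solution, rating in zip(SOLUTIONS, ratings):
            totals[solution] += weight * RATING_POINTS[rating]
    return totals, max_points


defaults = ["Nice to have"] * len(FEATURE_RATINGS)
print(score(defaults))
# ({'Microformats': 33, 'eRDF': 34, 'RDFa': 35}, 60)
```

Under the same assumed mapping, setting features 4 and 5 to "I don't care" yields 33, 31, and 29 points (out of 54) for Microformats, eRDF, and RDFa, which matches the update note's observation that MFs shine once the resource-description features are disabled.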

Feature notes/explanations:

DRY (Don't Repeat Yourself)
  • RDFa: Literals have to be redundantly put in "content" attributes in order to make them un-typed.
HTML4 / XHTML 1.0 validity
  • RDFa: Given the buzz around the WHATWG, it's uncertain when (if at all) XHTML 2 or XHTML 1.1 modules will be deployed widely enough.
Explicit syntactic means for arbitrary resource descriptions
  • eRDF: owl:sameAs statements (or other inverse functional properties, IFPs) have to be used to describe external resources.
Supported by the W3C
  • MFs, eRDF: Indirectly supported by W3C's GRDDL effort.
Stable/Uniform syntax specification
  • MFs: Although MFs reuse HTML structures, the format syntax layered on top differs, so that each MF needs separate (though stable) parsing rules.
Predictable RDF mappings
  • MFs: Microformats could be mapped to different RDF structures, but the GRDDL WG will probably recommend fixed mappings.
Live/Web Clipboard Compatibility
  • eRDF, RDFa: Tweaks are needed to make them Live-Clipboard compatible.
Reliable copying, aggregation, and re-publishing of source chunks. (Self-containment)
  • MFs: Some Microformats (e.g. XFN) lose their intended semantics when regarded out of context.
  • eRDF/RDFa: Only chunks with nearby/embedded namespace definitions can be reliably copied.
Support for head section metadata such as OpenID or Feed hooks.
  • eRDF: Can support OpenID hooks.
  • RDFa: Will probably interpret any rel attribute.
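
The self-containment note for eRDF/RDFa boils down to a mechanical check: does a copied chunk declare every namespace prefix it uses? Below is a rough Python sketch for eRDF-style markup. It assumes only a simplified set of conventions (namespaces declared via rel="schema.prefix" on a link or a element, properties written as prefix-name tokens in class/rel/rev); it is not a conforming eRDF parser, and `page_prefixes` is a hypothetical input listing the prefixes declared on the source page.

```python
from html.parser import HTMLParser


class PrefixScanner(HTMLParser):
    """Collect eRDF-style namespace declarations and prefixed tokens.

    Simplified assumptions (not the full eRDF spec):
    - rel="schema.PREFIX" on <link> or <a> declares PREFIX;
    - class, rel, and rev tokens of the form PREFIX-name or -PREFIX-Name
      are candidate uses of PREFIX.
    """

    def __init__(self):
        super().__init__()
        self.declared = set()
        self.used = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name in ("class", "rel", "rev"):
            for token in (attrs.get(name) or "").split():
                if name == "rel" and token.startswith("schema."):
                    self.declared.add(token[len("schema."):])
                elif "-" in token.lstrip("-"):
                    self.used.add(token.lstrip("-").split("-", 1)[0])


def undeclared_prefixes(chunk, page_prefixes):
    """Prefixes the chunk relies on but does not declare itself.

    page_prefixes (hypothetical input): prefixes declared on the source
    page; hyphenated tokens with any other prefix are treated as plain
    CSS classes or link types and ignored.
    """
    scanner = PrefixScanner()
    scanner.feed(chunk)
    scanner.close()
    return (scanner.used & set(page_prefixes)) - scanner.declared


snippet = '<span class="geo-lon">7.67979417</span>'
print(undeclared_prefixes(snippet, {"geo"}))  # {'geo'}
```

A non-empty result means the chunk only works in the context of its original page; adding the schema declaration inside the chunk makes it copyable in this simplified model.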


Bottom line: For many requirement combinations, a single solution alone is not enough. My tailored summary suggests, for example, that I should be fine with a combination of Microformats and eRDF. What does your preferred solution mix look like?

Comments and Trackbacks

This comparison chart is a good start, but it's quite biased too. Putting things like "validation" on the same level as "extensible" is a bit confusing, especially since RDFa can be implemented only with extra attributes (which people are doing all the time already without a working W3C validator.) Points to consider: (1) re-mixing different *existing* vocabularies (can't do it in MFs unless it's specified.) (2) making deeper statements like "this page has an author whose name is Ben" (eRDF can't do that), or (3) making statements about other URLs (eRDF can't do it, MFs can do it only in a domain-specific way.)

Here's a practical example: which of these technologies can express Flickr's latest machine tags regarding an embedded photo? Only RDFa.
Comment by Ben on 2007-02-12 20:30:50 UTC
Hi Ben,

it's quite biased too
I disagree. My preferences obviously differ from yours, but the comparison itself should be un-biased as you can disable any feature you don't consider relevant.

Putting things like "validation" on the same level as "extensible" is a bit confusing
That's exactly why I added these custom priorities. Just select "I don't care", and validation won't affect the calculated result. Different priorities also lead to different ratings, so the criteria are *not* on the same level unless you specify them as such.

especially since RDFa can be implemented only with extra attributes
Which is exactly the point. The other approaches don't need extra attributes that are invalid in any deployed HTML spec. Again, if it's a non-issue for you, just set the priority to zero ("I don't care"). With regard to non-attribute features: if RDFa can be implemented with just attributes, why are the other serialization options still part of the syntax? If we have learned anything from the RDF/XML deployment issues, it's not to put unnecessary "optimizations" into a spec.

(1) re-mixing different *existing* vocabularies
That's what I meant by "Custom extensions" (feature no. 3). Maybe I should rename it, but from an RDF POV, it's irrelevant whether a vocabulary already exists or not.

(2) making deeper statements like "this page has an author whose name is Ben" (eRDF can't do that)
Sorry, that's not true. Please have a look at the very first example in the eRDF doc ;)

(3) making statements about other URLs (eRDF can't do it, MFs can do it only in a domain-specific way.)
True for MFs, but they are domain-specific by design. Not true for eRDF: Creating relations to other docs can be done via rel/rev in eRDF, describing arbitrary resources via a simple owl:sameAs statement.

Here's a practical example: which of these technologies can express Flickr's latest machine tags regarding an embedded photo? Only RDFa.
Sorry, that may be true for MFs, but it's wrong again for eRDF. Here is a flickr machine tag snippet:

<a href="/photos/baddie80/tags/geolon767979417/" class="Plain">geo:lon=7.67979417</a>

Re MFs, you could only add a rel-tag, which might still be usable for certain tags (e.g. camera models). In eRDF, you'd have to add a span around the 7.67979417. Plus an owl:sameAs. Plus a namespace definition. That's it. Could it be that you guys haven't really looked at eRDF yet?

Comment by Benjamin Nowack on 2007-02-12 21:29:08 UTC
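
To make the machine tag example concrete: a Flickr machine tag has the form namespace:predicate=value, and what the extra eRDF markup (span, owl:sameAs, namespace definition) supplies is a subject for the photo and a URI for the namespace. A minimal Python sketch of that mapping follows; the regular expression only approximates Flickr's rules, and the photo and namespace URIs are hypothetical placeholders.

```python
import re

# Approximation of Flickr's machine tag form namespace:predicate=value;
# the exact character rules Flickr enforces are not reproduced here.
MACHINE_TAG = re.compile(r"^([A-Za-z_]\w*):([A-Za-z_]\w*)=(.+)$")


def machine_tag_to_triple(subject_uri, tag, namespaces):
    """Map one machine tag to an (s, p, o) tuple, or None.

    namespaces maps a tag namespace to a vocabulary URI; this mapping is
    what the eRDF (or RDFa) namespace declaration would supply.
    """
    match = MACHINE_TAG.match(tag)
    if match is None:
        return None  # an ordinary tag, not a machine tag
    prefix, predicate, value = match.groups()
    if prefix not in namespaces:
        return None  # no declaration for this namespace
    return (subject_uri, namespaces[prefix] + predicate, value)


# Hypothetical URIs for illustration only.
photo = "http://example.org/photos/42"
ns = {"geo": "http://example.org/ns/geo#"}
print(machine_tag_to_triple(photo, "geo:lon=7.67979417", ns))
# ('http://example.org/photos/42', 'http://example.org/ns/geo#lon', '7.67979417')
```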
Nice try, we have looked at eRDF quite a bit :)

What you forget to mention is that you lose self-containment in eRDF the moment you need a LINK rel/rev: now you have to modify the HEAD of the document as well as the BODY. And that's not doable when you want to write a simple blog entry, when you're using your average content management system, when you want copy-and-paste, etc...

You also can't do what I specifically mentioned (deep structure) without naming the intermediate node, which is really annoying and almost a deal breaker if you want to, for example, plop down your bibtex entries in the page.

Regarding the flickr machine tag: fair enough, that can be made to work in eRDF, although you're now stuck declaring a custom namespace in the HEAD, which is hardly self-contained by any definition.

Regarding the multiple serializations... we're working on simplifying, and you're right that there shouldn't be multiple ways to do the same thing. It takes time to simplify, just like it takes time to build up the right features.

That said, the pattern is very clear: every time you stretch to do something a bit more complex with eRDF or MF, you end up giving up something significant. RDFa is meant to not give up on anything, and for that we added a few attributes. It is interesting that you're willing to give up so much just so you can validate, when RDFa is already conformant (if your browser conforms to the spec, RDFa won't break anything.)

As for being biased: come on, claiming you're not is just asking for trouble. You put "custom extensions" as a single item on the list, when it really needs to be broken up into multiple items so people can express which parts they really want and which they care less about. They may not want their own vocab, but they may want to mix existing ones. You're right that in RDF that's the same thing, but in MF it's definitely not. What about "won't break browser rendering?" Why isn't that an important feature that, for just about anyone, outweighs validation?

I simply don't understand some of your other judgments. RDFa is clearly more self-contained than eRDF, given the declaration of namespaces. You also handwave that microformats can "support emerging standards like nofollow" without explaining how they would do such a thing and who decides, given that no MFs out there use a profile URL to indicate what should be parsed. I'm also still looking for how tidy breaks RDFa.

What it comes down to in the end is this: RDFa doesn't validate with current validators. But, given that it's just extra attributes, is there really a problem? Go back to fundamentals and ask yourself: is HTML with a few extra attributes really broken? Should it really be that, if you add attributes for purposes that the browser can ignore (look at the Dojo toolkit), suddenly you're broken? I don't think so. And if you believe in an extensible web, then neither should you.
Comment by Ben Adida on 2007-02-12 22:36:05 UTC
Subject must be in current document (excepting inferred labels) -- makes eRDF a whole lot less interesting for me.
Comment by Mike Linksvayer on 2007-02-13 04:40:56 UTC
Ben, thanks for the reply. It's good to see this discussion happen.

What you forget to mention is that you lose self-containment in eRDF the moment you need a LINK rel/rev: now you have to modify the HEAD of the document as well as the BODY. And that's not doable when you want to write a simple blog entry, when you're using your average content management system, when you want copy-and-paste, etc...
Please look at the different feature values, they already reflect that eRDF does not support full self-containment. See below for a comment on the head/body non-issue.

You also can't do what I specifically mentioned (deep structure) without naming the intermediate node, which is really annoying and almost a deal breaker if you want to, for example, plop down your bibtex entries in the page.
Right, and covered by feature "Explicit support for blank nodes". If it's a requirement for you, just tweak the priority setting. After your previous comment I've also added two additional features: "Arbitrary resource descriptions", and "Explicit syntactic means for arbitrary resource descriptions" which will increase RDFa's score, depending on your priorities.

Regarding the flickr machine tag: fair enough, that can be made to work in eRDF, although you're now stuck declaring a custom namespace in the HEAD, which is hardly self-contained by any definition.
No, eRDF allows you to define namespaces in the body section. I'm not sure whether Ian has added that to the spec yet, though. You simply use the a tag to do so. Again, this comparison is not meant to be "either-or" (quite the opposite, actually); there is no point in poking artificial holes into one solution. Try to look at the positive side: both eRDF and RDFa allow you to describe flickr's machine tags by using custom namespaces, which should be covered by the feature matrix.

Regarding the multiple serializations... we're working on simplifying, and you're right that there shouldn't be multiple ways to do the same thing. It takes time to simplify, just like it takes time to build up the right features.
Great! I know that you are working hard to improve RDFa. Let me repeat that this comparison isn't meant to bash RDFa. It's just for evaluating *personal* priorities against the different approaches.

That said, the pattern is very clear: every time you stretch to do something a bit more complex with eRDF or MF, you end up giving up something significant. RDFa is meant to not give up on anything, and for that we added a few attributes. It is interesting that you're willing to give up so much just so you can validate, when RDFa is already conformant (if your browser conforms to the spec, RDFa won't break anything.)
Again, as the comparison shows, there are different offers on the table, each with its own advantages and disadvantages. Some people don't need more complex features, others, like you and me, do. Having to define namespaces may be a deal-breaker for one developer, Tidy alerts may be one for someone else. This can all be tailored in the form above.

As for being biased: come on, claiming you're not is just asking for trouble. You put "custom extensions" as a single item on the list, when it really needs to be broken up into multiple items so people can express which parts they really want and which they care less about. They may not want their own vocab, but they may want to mix existing ones. You're right that in RDF that's the same thing, but in MF it's definitely not.
MFs don't support custom extensions at all. eRDF and RDFa support them in the same way. Splitting feature #3 into three separate items wouldn't really have an effect on the overall score. And of course I'm biased. That's why I added the custom priorities, so that you can be biased, too. Just add another comment with a link to your customized comparison result.

What about "won't break browser rendering?" Why isn't that an important feature that, for just about anyone, outweighs validation?
As I said in the intro, this is not a feature that makes any difference with respect to the offered solutions, and it is thus kept out of the feature list. Neither MFs, nor eRDF, nor RDFa break browser rendering. No matter what priority you'd pick for it, it wouldn't change the qualitative ranking.

I simply don't understand some of your other judgments. RDFa is clearly more self-contained than eRDF, given the declaration of namespaces.
The point is that either you *can* reliably copy chunks, or you can't. From the publisher's side, you either *can* produce self-contained snippets, or you can't. For the latter, the result would again be "yes" for all three approaches, so it's excluded from the feature matrix.

You also handwave that microformats can "support emerging standards like nofollow" without explaining how they would do such a thing and who decides, given that no MFs out there use a profile URL to indicate what should be parsed.
MFs are a set of hard-coded conventions. The MFs crowd uses mailing lists and a wiki to develop new formats. They don't restrict themselves to a fixed syntax (only at the lowest HTML level), so they can embrace new patterns, while eRDF and RDFa are bound to their syntactical constructs (class, rel, rev, about, property, etc.). But hey, there *is* another feature lurking in here: "Stable syntax specification" (now added to the feature list).

I'm also still looking for how tidy breaks RDFa.
Try putting a meta or link tag into the body section and then run the HTML through tidy. They will be moved to the head. This is related to the point you made above: if RDFa is going to drop support for link and meta in the body, this is a non-issue and I'll happily remove it from the comparison.

What it comes down to in the end is this: RDFa doesn't validate with current validators. But, given that it's just extra attributes, is there really a problem? Go back to fundamentals and ask yourself: is HTML with a few extra attributes really broken? Should it really be that, if you add attributes for purposes that the browser can ignore (look at the Dojo toolkit), suddenly you're broken? I don't think so. And if you believe in an extensible web, then neither should you.
I never said that improving the HTML spec is wrong. The right question is whether it's necessary for all use cases. It may be helpful for some, at the cost of having to change running systems. I also didn't say that there is no value in RDFa (I still get (almost) the same ranking for MFs and RDFa, after all). Just tweak the priorities above and you'll probably end up with a combination that suggests using RDFa alone. But it's simply not realistic to think there will ever be only one single solution. Even a perfect proposal for RDF-in-HTML probably wouldn't have kept others from creating a namespace-free approach. And that's basically all I tried to show with the comparison. Everyone has different priorities. For you, RDFa is clearly all that's needed; for others it's MFs; for me it's currently a combination of MFs and eRDF; in the future it might be MFs and RDFa, or maybe RDFa alone.
Comment by Benjamin Nowack on 2007-02-13 09:29:36 UTC
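
The Tidy issue can be caught before running Tidy at all: any link or meta element inside body is a candidate for being relocated to head. The Python sketch below flags such elements using the standard library's html.parser; it does not invoke Tidy, does not model any of Tidy's other clean-ups, and the sample page is made up for illustration.

```python
from html.parser import HTMLParser


class HeadElementsInBody(HTMLParser):
    """Record <link> and <meta> elements that appear inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body and tag in ("link", "meta"):
            self.found.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False


def tidy_risks(html):
    """Return (tag, attributes) pairs a cleanup tool might move to <head>."""
    parser = HeadElementsInBody()
    parser.feed(html)
    parser.close()
    return parser.found


page = ('<html><head><title>t</title></head><body>'
        '<p>Photo <meta property="dc:creator" content="Ben"/></p>'
        '</body></html>')
print(tidy_risks(page))
# [('meta', {'property': 'dc:creator', 'content': 'Ben'})]
```

Self-closing tags such as `<meta .../>` still reach `handle_starttag`, because `HTMLParser`'s default `handle_startendtag` forwards to it.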
Mike,
you can use the rev attribute to switch subject and object. For things beyond that, you need owl:sameAs, which carries the same semantics as RDFa's "about" (although RDFa's syntax is clearly more compact for that feature)
Comment by Benjamin Nowack on 2007-02-13 09:34:20 UTC
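
The rel/rev behaviour described in the reply above can be sketched as follows. This is a simplified, hypothetical helper, not eRDF's full processing model: the subject is passed in rather than derived from the surrounding element, prefixes are resolved via a plain dictionary, and tokens without a known prefix (such as nofollow) are skipped.

```python
def link_triples(subject, href, namespaces, rel="", rev=""):
    """Triples for one link element, eRDF-style (simplified sketch).

    rel="prefix-name" gives (subject, property, href); rev="prefix-name"
    swaps the roles and gives (href, property, subject).
    """
    def resolve(token):
        prefix, _, name = token.partition("-")
        if name and prefix in namespaces:
            return namespaces[prefix] + name
        return None  # not a prefixed property in this sketch

    triples = []
    for token in rel.split():
        prop = resolve(token)
        if prop:
            triples.append((subject, prop, href))
    for token in rev.split():
        prop = resolve(token)
        if prop:
            triples.append((href, prop, subject))  # subject/object swapped
    return triples


FOAF = {"foaf": "http://xmlns.com/foaf/0.1/"}
me = "http://example.org/about#me"          # hypothetical
paper = "http://example.org/paper.html"     # hypothetical
print(link_triples(me, paper, FOAF, rev="foaf-maker"))
# [('http://example.org/paper.html', 'http://xmlns.com/foaf/0.1/maker', 'http://example.org/about#me')]
```

Anything beyond swapping the two ends of a link (such as describing a third resource) falls outside rel/rev, which is where owl:sameAs comes in.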
I've been using tagsoup for a while instead of tidy, and it causes no problems with RDFa markup.
Comment by Bob DuCharme on 2007-02-13 13:21:52 UTC
I hadn't thought of using owl:sameAs. Neat workaround, but doesn't it create an intermediate triple (<> owl:sameAs <myDesiredSubject> .) that is probably inaccurate (I know, the subject could be an #id)? Strikes me as non-intuitive at best.
Comment by Mike Linksvayer on 2007-02-14 02:06:48 UTC
The subject is always the current scope, usually an id value; you probably wouldn't use owl:sameAs anywhere outside of an id'd element. The mechanism is similar to RDFa's "about" boundary. (ARC's eRDF parser has native support for owl:sameAs and does URI consolidation at runtime, so you won't end up with additional triples. But that's of course just one tool and a custom extension.)

There could even be a (slight) advantage for using sameAs, as it allows you to refer to external things in terms of local ids (i.e. interlink different page sections that talk about external, but connected resources) which can lead to more compact markup. In general, however, RDFa's explicit "about" construct is surely more intuitive.
Comment by Benjamin Nowack on 2007-02-14 08:17:34 UTC
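
URI consolidation of the kind ARC's parser is said to perform can be illustrated with a small union-find over the owl:sameAs statements. This Python sketch is a generic technique, not ARC's implementation; it treats the object side of a sameAs statement as the preferred group when merging, and all example URIs are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def consolidate(triples):
    """Merge owl:sameAs aliases and drop the sameAs triples themselves."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Pass 1: group all URIs connected by owl:sameAs.
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                parent[root_s] = root_o  # prefer the sameAs target's group

    def canonical(node):
        return find(node) if node in parent else node

    # Pass 2: rewrite remaining triples, skipping sameAs and duplicates.
    result, seen = [], set()
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            continue
        triple = (canonical(s), p, canonical(o))
        if triple not in seen:
            seen.add(triple)
            result.append(triple)
    return result


page_photo = "http://example.org/page#photo"   # local id (hypothetical)
flickr_photo = "http://example.org/photos/42"  # external URI (hypothetical)
triples = [
    (page_photo, OWL_SAME_AS, flickr_photo),
    (page_photo, "http://example.org/ns/geo#lon", "7.67979417"),
]
print(consolidate(triples))
# [('http://example.org/photos/42', 'http://example.org/ns/geo#lon', '7.67979417')]
```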
A question: Say I have quibbles with both RDFa and eRDF, so I invent my own syntax, create an XSLT stylesheet and a GRDDL profile for it; what do I lose by not going with a more standardised approach?
Comment by Keith Alexander on 2007-04-17 09:04:32 UTC
Keith, I'd say each of the three approaches is "standardised enough". GRDDL processors can handle any syntax for which a transformation is provided. If you want others to use your format, too, you'd have to provide documentation and promote it on your own, of course, which might be an argument for e/RDF/a.
Comment by Benjamin Nowack on 2007-04-18 12:24:25 UTC
I've been thinking about this a little more - one thing you do lose by not using a 'standard' format, is, perhaps, the ability of processors/user-agents to harness dual context of your data in the DOM tree and RDF - for example, your Web Clipboard idea copying and pasting eRDF/RDFa snippets between applications.

I put up a small demo page that shows you data from eRDF fragments that are linked to from the page. This is only really practical because of the close tie between the html page and the data.
Comment by Keith Alexander on 2007-05-12 13:52:17 UTC