Posts tagged with: rdfa

RDFa button (unofficial)

Update/Note: This is not an official RDFa button. Official ones (in the known colours) will be provided by W3C's Communications Team once RDFa is a Rec or CRec.

A couple of days ago I created an RDFa technology button, and I was asked to share it, so here it is:

RDFa
(PNG, GIF, SVG source file)

Please see the W3C Semantic Web Logos and Policies page for license details. This button is derived from the original W3C ones.

Adding (partial) RDFa support to the Firefox HTML Validator extension

Update (2008-04-24): I managed to get rid of the xmlns-related errors (.replace() to the rescue ;), so the extension now accepts markup that follows the latest RDFa DTD (including @typeof). And while at it, I created versions for win and mac.

One of the reasons I haven't been using RDFa in production is the problem of quality assurance (a.k.a. plain old HTML validation). Not because RDFa isn't valid markup as such, but because the main tool I use during development is Marc Gueury's excellent HTML Validator Extension for Firefox. RDFa is valid XHTML+RDFa, but XHTML+RDFa is not HTML, so the extension reports dozens of errors, starting with the unrecognized Doctype declaration. The W3C Markup Validator supports RDFa, but I often develop while I'm offline, or on a non-public Web server, and the little "0 errors / 0 warnings" message in the status bar is more convenient than having to send markup to an online service.

Yesterday, however, I started working on an RDFa generator for one of Intellidimension's projects (very interesting to see them use RDF big time, while many of us are still experimenting and thinking about potential markets, BTW). So, now that the RDFa-caused messages made it almost impossible to spot real HTML errors, I wondered if the add-on could perhaps be hacked to accept RDFa as well. Long story short: It can, to a certain extent. I don't know if arbitrary XML namespace prefixes (xmlns:foo="...") can be supported by a pure DTD/SGML-based validator (the FF extension uses OpenSP). FWIW, I couldn't get it to work.

Apart from that, RDFa-enabling the extension was mainly copying the RDFa DTD and a set of modules to the plug-in's SGML library. It now happily accepts RDFa attributes (about, resource, property, datatype, content, etc) and makes my life a little bit easier. If anyone has an idea how I could make it accept (non-predefined) namespace prefixes as well, I'd appreciate hints.
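One way to sidestep the prefix problem, in the spirit of the .replace() fix mentioned in the update, is to strip prefixed namespace declarations before the markup reaches the DTD-based validator. The following is only a Python sketch of that idea, not the extension's actual (JavaScript) code; the regular expression and the function name strip_prefix_declarations are assumptions made for illustration.

```python
import re

# Matches prefixed namespace declarations such as xmlns:foaf="http://..."
# (single- or double-quoted). The default xmlns="..." is left alone.
XMLNS_PREFIX_DECL = re.compile(
    r'''\s+xmlns:[A-Za-z_][\w.-]*\s*=\s*("[^"]*"|'[^']*')'''
)


def strip_prefix_declarations(markup: str) -> str:
    """Remove xmlns:prefix="..." attributes (hypothetical pre-validation step)."""
    return XMLNS_PREFIX_DECL.sub("", markup)


sample = (
    '<div xmlns:foaf="http://xmlns.com/foaf/0.1/" '
    'about="#ben" property="foaf:name">Ben</div>'
)
cleaned = strip_prefix_declarations(sample)
print(cleaned)
```

The stripped copy is only meant for the validator. The original markup, including its prefix declarations, is what an RDFa parser still needs to see.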

The tweaked extension is so far just a hack. I haven't even pinged Marc yet or changed the internal ID, so any extension update will remove the RDFa functionality. You can try/download it if you like (Windows version), but I may have to take it offline should Marc not be happy about the redistribution.

Moving out of the shadow with RDFa

Ian Davis has written an interesting series of posts related to the problems arising from using fragment identifiers in resource URIs. Ian makes a lot of valid points, but I think he misses an essential one. (With this post I'm breaking with a long tradition: I'm saying positive things about RDFa ;)

So, what's the problem, and how can RDFa help? Ian is discussing a lot of architectural things, and I'm sure there are issues and inconsistencies. But the practical problem he describes is based on the following WebArch principle:
The fragment identifies a portion of a representation obtained from a URI,
and its meaning changes depending on the type of representaion. [sic]
That means that you can't use "http://example.com/ben#self" as an HTML section identifier and as a non-document identifier (e.g. the person ben). Ian concludes that
You can have a machine readable RDF version or a human readable HTML
version but not both at the same time
and that this forces the structured web into a disregarded shadow of the human-readable web.

I think that conclusion is not correct. eRDF re-uses HTML's @id to establish resource identifiers, so it mixes document identifiers with non-doc ones, and this is an ambiguity problem indeed. RDFa, however, is a layer on top of HTML that introduces a dedicated mechanism for resource identification, the @about attribute (which is why it unfortunately needs its own DTD, but that's another story). From a WebArch POV, the design is clean: content-type-specific identifiers don't get mixed. I can unambiguously describe what "..ben#self" is meant to identify without the representation format playing a role. RDFa can re-purpose HTML's text nodes for RDF literals, and anchors for resource URIs, but apart from that, the HTML document is not much more than a (human-friendly) container.

So, you can serve HTML and machine-readable information in a single document, you just have to make sure that your resource URI fragments don't appear in HTML @ids. And now that we are back on the practical level: Any other ID generation mechanism can work, too. It's fairly easy to implement a URI generator for RDF extracted from a microformats-enabled HTML page without overloading resource IDs. I personally don't see a huge problem (again, practically), as all my applications work with triples, not with representations or encodings which are dealt with by the parsers and extractors.
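The "don't let resource fragments appear in HTML @ids" rule is easy to check mechanically. Below is a minimal Python sketch using only the standard library. The class and function names are made up for this example, and it only looks at @about values that are relative to the page or that match a page URL you pass in. It is not a full RDFa processor.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class FragmentCollector(HTMLParser):
    """Collect HTML @id values and the fragments used in RDFa @about URIs."""

    def __init__(self, page_url=""):
        super().__init__()
        self.page_url = page_url
        self.html_ids = set()
        self.about_fragments = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "id":
                self.html_ids.add(value)
            elif name == "about":
                url, fragment = urldefrag(value)
                # Only fragments that point into this very page can clash.
                if fragment and url in ("", self.page_url):
                    self.about_fragments.add(fragment)


def colliding_fragments(markup, page_url=""):
    """Return fragments used both as an HTML @id and as a resource identifier."""
    collector = FragmentCollector(page_url)
    collector.feed(markup)
    return collector.html_ids & collector.about_fragments


page = (
    '<div id="self">'
    '<span about="#self" property="foaf:name">Ben</span>'
    '<span about="#me" property="foaf:nick">ben</span>'
    "</div>"
)
print(colliding_fragments(page))
```

Here "#self" is reported because it is both a section @id and a resource identifier, while "#me" is fine.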

One practical issue remains, though: Current browsers don't (natively) support navigating to RDF identifiers encoded in RDFa-, microformats-, or GRDDL-enabled HTML pages. You need an additional JavaScript lib to invoke appropriate scroll actions after a page URI with a (non-HTML) fragment identifier is loaded. That's a little annoying, but doable. I think fragment identifiers are valuable. They allow the description of multiple resources in a single document, and that's a handy feature. Whether that breaks Web architecture theory, dunno. Not for me, at least ;-)

A Comparison of Microformats, eRDF, and RDFa

Update (2006-02-13): In order to avoid further flame wars with RDFa folks, I've adjusted the form to not show my personal priorities as default settings anymore (here they are if you are interested, it's a 48-42-40 ranking for MFs, eRDF, and RDFa respectively). All features are set to "Nice to have" now. As you can see, for these settings, RDFa gets the highest ranking (I *said* the comparison is not biased against RDFa!). If you disable the features related to domain-independent resource descriptions, MFs shine, if you insist on HTML validity, eRDF moves up, etc. It's all in the mix.

After a comment of mine on the Microformats IRC channel, SWD's Michael Hausenblas asks for the reason why I said that I personally don't like RDFa. Damn public logs ;) OK, now I have to justify that somehow without falling into rant mode again...

I already wrote a little comparison of Microformats, Structured Blogging, eRDF, and RDFa some time ago, so this sounds like a good opportunity to see how things evolved during the last 8 months. Back then I concluded that both eRDF and RDFa were preferred candidates for SemSol, but that RDFa lacked the necessary deployment potential due to not being valid HTML (as far as any widespread HTML spec is concerned).

I excluded the Structured Blogging initiative from this comparison, as it seems to have died a silent death. (Their approach to redundantly embed microcontent in script tags apparently didn't convince the developer community.) I also excluded features which are equally available in all approaches, such as visible metadata, general support for plain literals, being well-formed, no negative effect on browser behaviour, etc.

Pretending to be constructive, and in order to make things less biased, I embedded a dynamic page item that allows you to create your own, tailored comparison. The default results reflect my personal requirements (and hopefully answer Michael's question). As your mileage does most probably vary, you can just tweak the feature priorities (The different results are not stored, but the custom comparisons can be bookmarked). Feel free to leave a comment if you'd like me to add more criteria.

No. | Feature or Requirement | MFs | eRDF | RDFa
1 | DRY (Don't Repeat Yourself) | yes | yes | mostly
2 | HTML4 / XHTML 1.0 validity | yes | yes | no
3 | Custom extensions / Vocabulary mixing | no | yes | yes
4 | Arbitrary resource descriptions | no | yes | yes
5 | Explicit syntactic means for arbitrary resource descriptions | no | no | yes
6 | Supported by the W3C | partly | partly | yes
7 | Follow DCMI guidelines | no | yes | no
8 | Stable/Uniform syntax specification | partly | yes | yes
9 | Predictable RDF mappings | mostly | yes | yes
10 | Live/Web Clipboard Compatibility | yes | mostly | mostly
11 | Reliable copying, aggregation, and re-publishing of source chunks. (Self-containment) | mostly | partly | partly
12 | Support for not just plain literals (e.g. typed dates, floats, or markup). | yes | no | yes
13 | Triple bloat prevention (only actively marked-up information leads to triples) | yes | yes | no
14 | Possible integration in namespaced (non-HTML) XML languages. | no | no | yes
15 | Mainstream Web developers are already adopting it. | yes | no | no
16 | Tidy-safety (Cleaning up the page will never alter the embedded semantics) | yes | yes | no
17 | Explicit support for blank nodes. | no | no | yes
18 | Compact syntax, based on existing HTML semantics like the address tag or rel/rev/class attributes. | yes | mostly | partly
19 | Inclusion of newly evolving publishing patterns (e.g. rel="nofollow"). | yes | no | partly
20 | Support for head section metadata such as OpenID or Feed hooks. | no | partly | partly

Results

Solution | Points | Missing Requirements
RDFa | 35 | -
eRDF | 34 | -
Microformats | 33 | -

Max. points for selected criteria: 60

Summary:

Your requirements are met by RDFa, or eRDF, or Microformats.
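The form's scoring logic is not spelled out here, but the totals above can be reproduced with a simple scheme: 3 points for "yes", 2 for "mostly", 1 for "partly", 0 for "no", with every feature weighted equally. The Python sketch below assumes exactly that. The point values are inferred from the listed results, and the weights parameter is a hypothetical stand-in for the priority settings, whose real values are not known.

```python
# Assumed point values, inferred because they reproduce the totals above.
SUPPORT_POINTS = {"yes": 3, "mostly": 2, "partly": 1, "no": 0}

# One rating per feature, in table order (rows 1 to 20).
RATINGS = {
    "Microformats": (
        "yes yes no no no partly no partly mostly yes "
        "mostly yes yes no yes yes no yes yes no"
    ).split(),
    "eRDF": (
        "yes yes yes yes no partly yes yes yes mostly "
        "partly no yes no no yes no mostly no partly"
    ).split(),
    "RDFa": (
        "mostly no yes yes yes yes no yes yes mostly "
        "partly yes no yes no no yes partly partly partly"
    ).split(),
}


def score(ratings, weights=None):
    """Sum the support points, optionally scaled by per-feature weights."""
    if weights is None:
        weights = [1] * len(ratings)  # all features weighted equally
    return sum(SUPPORT_POINTS[r] * w for r, w in zip(ratings, weights))


totals = {name: score(ratings) for name, ratings in RATINGS.items()}
max_points = 3 * len(RATINGS["RDFa"])
print(totals, max_points)
```

Setting a weight to 0 mimics disabling a feature, which is how the ranking can flip for different requirement mixes.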

Feature notes/explanations:

DRY (Don't Repeat Yourself)
  • RDFa: Literals have to be redundantly put in "content" attributes in order to make them un-typed.
HTML4 / XHTML 1.0 validity
  • RDFa: Given the buzz around the WHATWG, it's uncertain when (if at all) XHTML 2 or XHTML 1.1 modules will be widely deployed enough.
Explicit syntactic means for arbitrary resource descriptions
  • eRDF: owl:sameAs statements (or other IFPs) have to be used to describe external resources.
Supported by the W3C
  • MFs, eRDF: Indirectly supported by W3C's GRDDL effort.
Stable/Uniform syntax specification
  • MFs: Although MFs reuse HTML structures, the format syntax layered on top differs, so that each MF needs separate (though stable) parsing rules.
Predictable RDF mappings
  • MFs: Microformats could be mapped to different RDF structures, but the GRDDL WG will probably recommend fixed mappings.
Live/Web Clipboard Compatibility
  • eRDF, RDFa: Tweaks are needed to make them Live-Clipboard compatible.
Reliable copying, aggregation, and re-publishing of source chunks. (Self-containment)
  • MFs: Some Microformats (e.g. XFN) lose their intended semantics when regarded out of context.
  • eRDF/RDFa: Only chunks with nearby/embedded namespace definitions can be reliably copied.
Support for head section metadata such as OpenID or Feed hooks.
  • eRDF: Can support openID hooks.
  • RDFa: Will probably interpret any rel attribute.


Bottom line: For many requirement combinations a single solution alone is not enough. My tailored summary suggests for example that I should be fine with a combination of Microformats and eRDF. What does your preferred solution mix look like?

SeenOn - Timestamp or State of Mind?

<tommorris> Every time I see a movie from now on,
  I'm adding the IMDB URL to my FOAF file.
<briansuda> with what predicate?
<tommorris> rdf.opiumfield.com/movie/0.1/seen
...
<briansuda> seenOn, is that a timestamp or a state-of-mind?
(microformats(!) irc channel)

Now, who said RDF was less real-world-ish than microformats?

Related link (wrt movies, not toxics): Microformats 80%, RDF 20% by Tom Morris about the longtail utility of (e)RDF(a). I've wanted to state something like this for some time. After implementing a Microcontent parser (part of the next ARC release) that creates a merged triple set from eRDF and Microformats, I can't say anymore that MFs don't scale (even though making the meaning of nested formats explicit is sometimes tricky). I was really impressed by the number of practical use cases covered by them (Listings and qualified review ratings even go beyond the demos I've seen in RDFer circles). However, there is still a lot of room for custom RDF extensions that can be used to extend microformatted HTML. Skill levels are just one of many longtail examples: They are currently not covered by hResume, but available in Uldis' CV vocab.

The important thing IMO is that RDFers should not forget to acknowledge the amazing deployment work of the MF community and focus on what they can bring to the table (storage, querying, and mixing, as a start) instead of marketing RDF-in-HTML as an alternative, replacement, or otherwise "superior" (likewise the other way round, btw.). I think we also shouldn't overburden the big content re-publishers. When maintainers of sites like LinkedIn or Eventful get bombarded with requests to add different semantic serializations to their pages, they may hesitate to support any of them at all. For most of these mainstream sites, Microformats do the job just fine, and often better. Why should people for example have to specify namespaces when a simple, agreed-on rel-license does the trick already? (We could still use RDF to specify the license details, and even the license link is only a simple conversion away from RDF.)
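To illustrate how small that conversion is, here is a Python sketch that turns rel="license" links into triples. The predicate URI is one plausible choice (the XHTML vocabulary's license term) and an assumption of this example. The class name and the example page URI are made up as well.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed predicate; other vocabularies would work just as well.
LICENSE_PREDICATE = "http://www.w3.org/1999/xhtml/vocab#license"


class RelLicenseExtractor(HTMLParser):
    """Turn <a rel="license"> / <link rel="license"> into (s, p, o) triples."""

    def __init__(self, page_uri):
        super().__init__()
        self.page_uri = page_uri
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        # rel may carry several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rel_values and href:
            target = urljoin(self.page_uri, href)
            self.triples.append((self.page_uri, LICENSE_PREDICATE, target))


extractor = RelLicenseExtractor("http://example.com/post")
extractor.feed(
    '<p>Published under a '
    '<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">'
    "CC BY</a> license.</p>"
)
print(extractor.triples)
```

The publisher only writes the plain rel-license link. Everything namespace-related stays on the consumer's side.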

ARC Embedded RDF (eRDF) Parser for PHP

Update: The current RDFa primer is *not* broken wrt WebArch, the examples were fixed two weeks ago. I've also removed the "no developer support" rant, as I just received personal support ;-)

While searching for a suitable output format for a new RDF framework, I've been looking at the various semantic hypertext approaches, namely microformats, Structured Blogging, RDFa, and Embedded RDF (eRDF). Each one has its pros and cons:

Microformats:
  • (+) widest deployment so far
  • (+) integrate nicely with current HTML and CSS
  • (-) centralized project, inventing custom microformats is discouraged
  • (-) don't scale, the number of MFs will either be very limited, or sooner or later there will be class name collisions

Structured Blogging:
  • (+) a large number of supporters (at least potentially, the supporters list is huge, although this doesn't represent the available tools)
  • (+) not a competitor, but a superset of microformats
  • (-) the metadata is embedded in a rather odd way
  • (-) the metadata is repeated
  • (-) the use cases are limited (e.g. reviews, events, etc)

RDFa:
  • (+) follows certain microformats principles (e.g. "Don't repeat yourself")
  • (+) freely extensible
  • (+) All resource descriptions (e.g. for events, profiles, products, etc.) can be extracted with a single transformation script
  • (+) RDF-focused
  • (+) W3C-supported
  • (-) Not XHTML 1.0 compliant, it will take some time before it can be used in commercial products or picky geek circles
  • (-) The default datatype of literals is rdf:XMLLiteral which is wrong for most deployed properties

eRDF:
  • (+) follows the microformats principles
  • (+) freely extensible
  • (+) All resource descriptions (e.g. for events, profiles, products, etc.) can be extracted with a single transformation script
  • (+) uses existing markup
  • (+) XHTML 1.0 compliant
  • (+) RDF-focused
  • (-) Covers only a subset of RDF
  • (-) Does not support XML literals

So, both RDFa and eRDF seem like good candidates for embedding resource descriptions in HTML. The two are not really compatible, though: it is not easy to create a superset which is both RDFa and eRDF. However, my publishing framework is using a Wiki-like markup language (M4SH) which is converted to HTML, so I can add support for both approaches and make the output a configuration option. Maybe it's even possible to create a merged serialization without confusing transformers.

I'll surely have another look at RDFa when there is better deployment potential. For now, I've created a M4SH-to-eRDF converter (which is going to be available as part of the forthcoming SemSol framework), and an eRDF parser that can generate RDF/XML from embedded RDF. I've also added some extensions to work around (plain) eRDF's limitations, the main one being on-the-fly rewriting of owl:sameAs assertions to allow full descriptions of remote resources, e.g.
<div id="arc">
  <a rel="owl-sameAs" href="http://example.com/r/001#001"></a>
  <a rel="doap-maintainer" href="#ben">Benjamin</a>
</div>
is automatically converted to
<http://example.com/r/001#001> doap:maintainer <#ben>
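The rewriting step can be pictured as a post-processing pass over the extracted triples. The Python sketch below is not ARC's actual (PHP) code. It assumes triples are plain (subject, predicate, object) tuples, that only subjects are rewritten, and that the owl:sameAs triple itself is dropped, which matches the single resulting triple in the example.

```python
OWL_SAME_AS = "owl:sameAs"


def rewrite_same_as(triples):
    """Replace local subjects by their owl:sameAs target and drop the alias triples."""
    # Map each local resource to the remote resource it is declared equal to.
    aliases = {s: o for s, p, o in triples if p == OWL_SAME_AS}
    return [
        (aliases.get(s, s), p, o)
        for s, p, o in triples
        if p != OWL_SAME_AS
    ]


# What a plain eRDF extraction of the markup above might yield.
extracted = [
    ("<#arc>", "owl:sameAs", "<http://example.com/r/001#001>"),
    ("<#arc>", "doap:maintainer", "<#ben>"),
]
rewritten = rewrite_same_as(extracted)
print(rewritten)
```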

The parser can be downloaded at the ARC site (documentation).
I've also put up a little demo service if you want to test the parser.

YARDFIXHTML - Yet Another RDF-In-XHTML proposal

Ian Davis proposes "Embedded RDF", a microformats-inspired path to metadata-enriched HTML. Unlike microformats, his approach can utilize a single generic transformation script instead of one transformation for each format (or micromodel if you prefer Danny Ayers' terminology), which is closer to RDF's idea of freely mixable vocabularies.

I had some hopes for RDF/A but stopped following its progress several months ago as it didn't seem to provide an easy way to really bridge the gap between HTML and RDF. My use case was (and is) to be able to mark up HTML in a way which allows me to automatically (and without too much effort) generate context menus or tool-tips à la "Show me more info about this person" etc. I'm not sure if Ian's proposal provides a solution for this, but at least it seems to be much easier to grok than RDF/A, and not requiring XHTML2 is IMHO a huge plus.

It would perhaps have been helpful (or at least polite) to coordinate the effort with the RDF in XHTML Taskforce a bit. The W3C has been working on its own approach for quite some time now (which may as well be the reason for Ian to go it alone..). However, the nice thing about GRDDLy approaches is that you don't really need a large user community for each format. For RDFers, it's another long-tail example, as long as the transformation process can be automated. Every triple counts ;)

Update: I had a closer look now at the embeddable RDF examples, and a simple
<span id="p1">
  <a rel="foaf-weblog" href="http://jd.com/blog" class="foaf-name">
    John Doe
  </a>
</span>
seems to be enough to implement the use case I mentioned. That's really cool as I'll now finally be able to hook an RDF store to my HTML publishing tools. It'll need some behaviour-like extensions, but it should be doable now without too many problems!

My publishing system uses a homegrown markup language internally which is sort of a trade-off between simplicity and flexibility, e.g. a Web link is generated by
[[http://www.example.com/ Link label goes here]] 

The HTML for a blog-identified person as illustrated above could then perhaps be created from something as simple as
[blog http://jd.com/blog John Doe] said ...
Hm, generating the internal markup could be supported by some SPARQL-based "suggest persons as you type" feature, but that's of course a completely different story (and to be told another time).
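As a rough idea of how such a conversion could work, here is a Python sketch that expands the two bracket forms shown above into HTML. It is not the publishing system's real code. The regular expressions, the render function, and the running p1, p2, ... id scheme are assumptions for illustration, and text outside the brackets is passed through unescaped.

```python
import re
from html import escape
from itertools import count

# [[url label]] -> plain Web link
WEB_LINK = re.compile(r"\[\[(\S+)\s+([^\]]+?)\]\]")
# [blog url name] -> person identified by their weblog (eRDF-style markup)
BLOG_PERSON = re.compile(r"\[blog\s+(\S+)\s+([^\]]+?)\]")


def render(text):
    """Expand the internal link and blog-person markup into HTML."""
    ids = count(1)  # hypothetical id scheme: p1, p2, ...

    def link(match):
        url, label = match.groups()
        return f'<a href="{escape(url)}">{escape(label)}</a>'

    def person(match):
        url, name = match.groups()
        return (
            f'<span id="p{next(ids)}">'
            f'<a rel="foaf-weblog" href="{escape(url)}" class="foaf-name">'
            f"{escape(name)}</a></span>"
        )

    text = WEB_LINK.sub(link, text)
    return BLOG_PERSON.sub(person, text)


print(render("[blog http://jd.com/blog John Doe] said ..."))
print(render("[[http://www.example.com/ Link label goes here]]"))
```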