I've been thinking about this since Semantic Camp, where I had an inspiring dialogue with Keith Alexander about semantics in HTML. We were wondering about the feasibility of a true microformats superset, where existing microformats could be converted to RDF without the need to write a dedicated extractor for each format. This was also about the time when "scoping" and context issues around certain microformats started to be discussed. (What happens, for example, with other people's XFN markup aggregated in a widget on my homepage? Does it affect my social graph as seen by XFN crawlers? Can I reuse existing class names for new formats, or would that confuse parsers and authors? Stuff like that.)
A couple of days ago I finally wrote up this "poshRDF" idea on the ESW wiki and started with an implementation for paggr widgets, which are meant to expose machine-readable data from RDFa and microformats, but also from user-defined, ad-hoc formats, in an efficient way. poshRDF can enable single-pass RDF extraction for a whole set of formats. Previously, my code had to walk through the DOM multiple times, once for each format.
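To make the single-pass idea a bit more concrete, here is a minimal Python sketch. It is not the ARC implementation: the `TERMS` table, the `SinglePassCollector` class, and the `collect` helper are all hypothetical names made up for illustration, and the table is heavily abridged. The point is only that one combined term table lets a single walk over the markup recognise class names from several formats at once.

```python
from html.parser import HTMLParser

# Hypothetical, heavily abridged term table: class name -> format it belongs to.
# A real configuration would carry far more terms and metadata.
TERMS = {
    "vcard": "hcard",
    "fn": "hcard",
    "vevent": "hcalendar",
    "summary": "hcalendar",
    "hentry": "hatom",
    "entry-title": "hatom",
}


class SinglePassCollector(HTMLParser):
    """Walks the markup once and records every known term it sees."""

    def __init__(self):
        super().__init__()
        self.hits = []  # (format, class name) pairs in document order

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or valueless, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            fmt = TERMS.get(name)
            if fmt is not None:
                self.hits.append((fmt, name))


def collect(html):
    """Return all (format, term) hits found in one pass over `html`."""
    parser = SinglePassCollector()
    parser.feed(html)
    parser.close()
    return parser.hits
```

Feeding a page that mixes an hCard and an hCalendar event through `collect` yields hits for both formats from the same walk, instead of one DOM traversal per format.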
A poshRDF parser is going to be part of one of the next ARC revisions. I've just put up a site at poshrdf.org to host the dynamic posh namespace. For now the site links to a possibly interesting by-product: a unified RDF/OWL schema for the most popular microformats (xfn, rel-tag, rel-bookmark, rel-nofollow, rel-directory, rel-license, hcard, hcalendar, hatom, hreview, xfolk, hresume, address, and geolocation). It's not 100% correct, because poshRDF is after all still a generic mechanism and doesn't cover format-specific interpretations. But it might be interesting for implementors. The schema could be used to generate dedicated parser configurations. It also describes the typical context of class names so that you can work around scoping issues (e.g. the XFN relations are usually scoped to the document or to embedded hAtom entries).
I hope to find some time to build a JSON exporter and microformats validator on top of poshRDF in the not too distant future. Got to move on for now, though. Dear Lazyweb, feel free to jump in ;)
poshRDF - RDF extraction from microformats and ad-hoc markup
poshRDF is a new attempt to extract RDF from microformats and ad-hoc markup
Posted on 2008-11-10 at 12:25 UTC
Comments and Trackbacks
But I'm not sure I understand: how do we manage uF that do not use the class attribute (XFN, license), or those where the class value itself has to be considered (the listing type in hListing, for example)?
And: what about nesting?
Not sure about the class values that mark neither a relation nor a node (like the hListing actions you mention). Maybe specify them as booleans? They could then generate:
_:hlisting123 mf:rent true ; mf:offer true .
The nesting is working quite nicely so far. For each term, you can define the scope (i.e. the possible containers). The poshRDF parser then uses the closest matching parent node as the subject. This way, an hCard nested in an hReview nested in an hAtom entry will still produce the correct mf:reviewer relation (instead of mf:author). There might be cases where you want to re-use relations in multiple nested microformats. Those are not supported by poshRDF (which might not be a bad thing ;).
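Roughly, the "closest matching parent" lookup could be sketched like this in Python. This is an illustration only, not the actual parser code: the `SCOPES` table, the `closest_subject` function, and the blank node labels are hypothetical, and the scopes listed are simplified assumptions rather than the real schema.

```python
# Hypothetical scope table: term -> container classes that may act as its subject.
SCOPES = {
    "reviewer": {"hreview"},
    "author": {"hentry"},
    "fn": {"vcard"},
    "tag": {"hentry", "hreview"},  # assumed to be allowed in both containers
}


def closest_subject(term, ancestors):
    """Return the nearest enclosing container allowed for `term`, or None.

    `ancestors` lists the currently open containers as
    (node_id, class_name) pairs, outermost first.
    """
    allowed = SCOPES.get(term, set())
    # Walk from the innermost container outwards and stop at the first match.
    for node_id, class_name in reversed(ancestors):
        if class_name in allowed:
            return node_id
    return None


# An hReview nested in an hAtom entry (made-up blank node labels).
ancestors = [("_:entry1", "hentry"), ("_:review1", "hreview")]

reviewer_subject = closest_subject("reviewer", ancestors)  # the hReview
author_subject = closest_subject("author", ancestors)      # the hAtom entry
tag_subject = closest_subject("tag", ancestors)            # innermost match wins
```

A term whose scope matches none of the open containers simply gets no subject (`None`), so it would not be attached to an unrelated format further up the tree.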
FWIW, in my stuff I considered listing types as values of the listing (<> listingType: listing:offer), but I'm not sure that's sensible. In your framework it would maybe mean marking the element as rdf-s rdf-p rdf-o, which seems ugly.
Or maybe they should be managed as subproperties of a generic relation (as XFN)?
And it's interesting to see that we also agree on defining entity/property groups to ease nested parsing: it means I'm not being completely stupid :) (I do not have single-pass parsing though, because I put the definitions in separate files so they can be used/extended independently.)
Thanks for the mind food :)