Microdata, semantic markup for both RDFers and non-RDFers

RDF-in-HTML could have been so simple.
There's been a whole lot of discussion around Microdata, a new approach for embedding machine-readable information into forthcoming HTML5. What I find most attractive about Microdata is the fact that it was designed by HTMLers, not RDFers. It's refreshingly pragmatic, free of other RDF spec legacy, but still capable of expressing most of RDF.

Unfortunately, RDFa lobbyists on the HTML WG mailing list forced the spec out of HTML5 core for the time being. This manoeuvre was understandable (a lot of energy went into RDFa, after all), but in my opinion very short-sighted. How many uphill battles did we have, trying to get RDF to the broader developer community? And how many were successful? Atom, microformats, OpenID, Portable Contacts, XRDS, Activity Streams (well, not really): these are examples where RDFers tried, but failed, to promote some of their infrastructure into the respective solutions. Now: HTML5, where the initial RDF lobbying actually had an effect and led to a native mechanism for RDF-in-HTML. Yes, native, not in some separate spec. This would have become part of every HTML5 book, any HTML developer on this planet would have learned about it. Finally a battle won. And what a great one. HTML.

But no, Microdata wasn't developed by an RDF group, so they voted it out again. Now, the really sad thing is, there could have been a solution that would have served everybody sufficiently well, both HTMLers and RDFers. The RDFa group recently realized that RDFa needs to be revised anyway: there is going to be an RDFa 1.1, which will require new parsers. If they'd swallowed their pride, they would most probably have been able to define RDFa 1.1 as a proper superset of Microdata.

Here is a short overview of RDF features supported by Microdata:
  • Explicit resource containers, via @itemscope (in RDFa, the boundaries of a resource are often implicitly defined by @rel or @typeof)
  • Subject declaration, via @itemid (RDFa uses @about)
  • Main subject typing, via @itemtype (RDFa uses @typeof)
  • Predicate declaration, via @itemprop (RDFa uses @property, @rel, and @rev)
  • Literal objects, via node values (RDFa also allows hidden values via @content)
  • Non-literal objects, via @href, @src, etc. (RDFa also allows hidden values via @resource)
  • Object language, via @lang
  • Blank nodes
I won't go into details why hiding semantics in RDFa will be penalized by search engines as soon as spammers discover the possibilities, why reusing RDF/XML's attribute names was probably not a smart move with regard to attracting non-RDFers, why the new @vocab idea is impractical, or why namespace prefixes, as handy as they are in other RDF formats, are not too helpful in an HTML context. Let's simply state that there is a trade-off between extended features (RDFa) and simplicity (Microdata). So, what are the core features that an RDFer would really need beyond Microdata:
  • the possibility to preserve markup, but probably not necessarily as an explicit rdf:XMLLiteral
  • datatypes for literal objects (I personally never used them in practice in the last 6 years that I've been developing RDF apps, but I can see some use cases)
Markup preservation is currently turned on by default in RDFa and can be disabled through @datatype, so an RDFer-satisfying RDFa 1.1 spec could probably just be Microdata + @datatype + a few extended parsing rules to end up with the intended RDF. My experience with watching RDF spec creation tells me that the RDFa group won't pick this route (there simply is no "Kill a Feature" mentality in the RDF community), but hey, hope dies last.

I've been using Microdata in two of my recent RDF apps and the CMS module of (ahem, still not documented) Trice, and it's been a great experience. ARC is going to get a "microRDF" extractor that supports the RDF-in-Microdata markup below (Note: this output still requires a 2nd extraction process, as the current Microdata draft's RDF mechanism only produces intermediate RDF triples, which then still have to be post-processed. I hope my related suggestion will become official, but I seem to be the only pro-Microdata RDFer on the HTML list right now, so it may just stay as a convention):

Microdata:
<div itemscope itemtype="http://xmlns.com/foaf/0.1/Person">

  <!-- plain props are mapped to the itemtype's context -->
  <img itemprop="img" src="mypic.jpg" alt="a pic of me" />
  My name is <span itemprop="name"><span itemprop="nick">Alec</span> Tronnick</span>
  and I blog at <a itemprop="weblog" href="http://alec-tronni.ck/">alec-tronni.ck</a>.

  <!-- other RDF vocabs can be used via full itemprop URIs -->
  <span itemprop="http://purl.org/vocab/bio/0.1/olb">
    I'm a crash test dummy for semantic HTML.
  </span>
</div>
Extracted RDF:
@base <http://host/path/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix bio: <http://purl.org/vocab/bio/0.1/> .
_:bn1 a foaf:Person ;
      foaf:img <mypic.jpg> ;
      foaf:name "Alec Tronnick" ;
      foaf:nick "Alec" ;
      foaf:weblog <http://alec-tronni.ck/> ;
      bio:olb "I'm a crash test dummy for semantic HTML." .
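The mapping convention used above (plain itemprop names resolved against the itemtype's namespace, full URIs used as-is, @href/@src turned into non-literal objects) can be sketched in a few lines of Python. This is only a rough illustration and not ARC's actual "microRDF" extractor: the class name, the triple layout, and the whitespace handling are assumptions, it only copes with a single non-nested item, and it neither creates a blank node for the subject nor resolves relative URIs against @base.

```python
from html.parser import HTMLParser

SNIPPET = """
<div itemscope itemtype="http://xmlns.com/foaf/0.1/Person">
  <img itemprop="img" src="mypic.jpg" alt="a pic of me" />
  My name is <span itemprop="name"><span itemprop="nick">Alec</span> Tronnick</span>
  and I blog at <a itemprop="weblog" href="http://alec-tronni.ck/">alec-tronni.ck</a>.
  <span itemprop="http://purl.org/vocab/bio/0.1/olb">
    I'm a crash test dummy for semantic HTML.
  </span>
</div>
"""

# Elements without an end tag; they never go onto the open-element stack.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class MicroRDFSketch(HTMLParser):
    """Toy extractor (hypothetical name) for one non-nested itemscope block."""

    def __init__(self):
        super().__init__()
        self.namespace = ""      # derived from the itemtype URI
        self.triples = []        # (predicate, object, "uri" | "literal")
        self.open_elements = []  # one slot per open element: None or (predicate, text parts)

    def predicate(self, name):
        # Full URIs are used as-is, plain names are mapped to the itemtype's context.
        return name if "://" in name else self.namespace + name

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        itemtype = attrs.get("itemtype")
        if itemtype:
            cut = max(itemtype.rfind("/"), itemtype.rfind("#")) + 1
            self.namespace = itemtype[:cut]
            self.triples.append(("rdf:type", itemtype, "uri"))
        prop = attrs.get("itemprop")
        ref = attrs.get("href") or attrs.get("src")
        collector = None
        if prop and ref:
            # Non-literal object taken from @href or @src.
            self.triples.append((self.predicate(prop), ref, "uri"))
        elif prop:
            # Literal object: collect the node's text until the element closes.
            collector = (self.predicate(prop), [])
        if tag not in VOID_TAGS:
            self.open_elements.append(collector)

    def handle_data(self, data):
        # Text belongs to every open itemprop element (this handles nested props).
        for collector in self.open_elements:
            if collector:
                collector[1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_elements:
            return
        collector = self.open_elements.pop()
        if collector:
            text = " ".join("".join(collector[1]).split())
            self.triples.append((collector[0], text, "literal"))


def extract(html):
    parser = MicroRDFSketch()
    parser.feed(html)
    parser.close()
    return parser.triples


triples = extract(SNIPPET)
for triple in triples:
    print(triple)
```

Note how the nested nick/name spans need no special casing: every open itemprop element simply collects all text below it.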

Naming Properties and Relations (comment)

A local comment to JeniT's post about predicate names
I was incapable of adding a comment to Jeni's interesting post about RDF predicate Names (markdown-related, my fault), so I'll quickly post it here, as I'm pondering similar things, too.

In her post, Jeni explores the issues around naming RDF terms. The community has gathered some experience and suggestions over the last few years; some entry points are:
I personally find "role-noun" easier to support in RDF apps than the older hasPropertyOf (now often considered anti-)pattern. And inverse properties are just painful, as they usually require some form of inference to streamline the user experience.

Not sure if that's helpful information, but for a project around semantic note-taking/logging, I played with different notations users might be comfortable with, for entering factoids using an unstructured input form (à la Twitter). I could identify the following patterns that still seemed to be acceptable (as shared/supported syntax). All of them can be implemented using role-noun predicates (assuming that predicate labels are similar to the predicate names):
  • SUBJECT'(s)? PREDICATE (:|is) OBJECT
  • OBJECT is SUBJECT'(s)? PREDICATE
  • OBJECT is (the)? PREDICATE of SUBJECT
  • SUBJECT has PREDICATE (:)? OBJECT
  • (the|a)? PREDICATE(s)? of SUBJECT (is|are) OBJECT ((,|and|&) OBJECT)*
(There are more patterns, for things like tagging and typing, but the examples above are the predicate-related grammar rules).

As soon as you add (has|is|of) to one PREDICATE, you get problems with the other notations, so role-noun seems to be a good fit.
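To make the grammar rules above a bit more tangible, here is a minimal Python sketch that turns each of the five notations into regular expressions and maps a matching line to (subject, predicate, object) triples. It is not the code from the note-taking project: the function name, the single-word predicate restriction, the case-sensitive keywords, and the naive plural stripping in the last pattern are all simplifying assumptions.

```python
import re

# One regex per notation, in the order they are listed above.
# Predicates are restricted to a single word to keep the sketch simple.
PATTERNS = [
    # SUBJECT'(s)? PREDICATE (:|is) OBJECT
    re.compile(r"^(?P<s>.+?)'s? (?P<p>\w+)(?::| is) (?P<o>.+)$"),
    # OBJECT is SUBJECT'(s)? PREDICATE
    re.compile(r"^(?P<o>.+?) is (?P<s>.+?)'s? (?P<p>\w+)$"),
    # OBJECT is (the)? PREDICATE of SUBJECT
    re.compile(r"^(?P<o>.+?) is (?:the )?(?P<p>\w+) of (?P<s>.+)$"),
    # SUBJECT has PREDICATE (:)? OBJECT
    re.compile(r"^(?P<s>.+?) has (?P<p>\w+):? (?P<o>.+)$"),
    # (the|a)? PREDICATE(s)? of SUBJECT (is|are) OBJECT ((,|and|&) OBJECT)*
    # Caveat: the optional plural "s" is stripped naively ("address" would break).
    re.compile(r"^(?:(?:the|a) )?(?P<p>\w+?)s? of (?P<s>.+?) (?:is|are) (?P<o>.+)$"),
]
MULTI_OBJECT_PATTERN = 4  # index of the only notation that allows object lists
OBJECT_SEPARATOR = re.compile(r"\s*(?:,|\band\b|&)\s*")


def parse_factoid(text):
    """Return a list of (subject, predicate, object) triples, or [] if nothing matches."""
    text = text.strip()
    for index, pattern in enumerate(PATTERNS):
        match = pattern.match(text)
        if not match:
            continue
        objects = [match["o"]]
        if index == MULTI_OBJECT_PATTERN:
            objects = [o for o in OBJECT_SEPARATOR.split(match["o"]) if o]
        return [(match["s"], match["p"], o) for o in objects]
    return []


for line in [
    "Alec's nick is Al",
    "Al is Alec's nick",
    "Berlin is the capital of Germany",
    "Germany has capital: Berlin",
    "the tags of post42 are semweb, rdf & php",
]:
    print(line, "->", parse_factoid(line))
```

Because every notation resolves to the same bare role-noun ("nick", "capital", "tag"), the extracted predicate can be matched against predicate labels directly, which is exactly what breaks once a (has|is|of) is baked into the predicate name.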

Unfortunately, one (non-trivial) problem remains: People (and Web 2.0 apps) also like 'SUBJECT PREDICATE_VERB OBJECT' (e.g. "likes", "bookmarked", "said", "posted", "is listening to" ...) and I don't have a proper idea how to handle those automatically yet, other than hard-coding support for the typical social media verbs. It could be possible to use wordnet to detect verbs and derive a canonicalized form, and then model those patterns as activities (activity = liking, bookmarking, saying, posting, listening, plus ACTIVITY_PERSON and ACTIVITY_TARGET or somesuch). If anyone has a suggestion, I'd be happy to hear it.

New ARC2 release

Finally in sync with code.semsol.org and the BZR repository
I moved ARC's codebase to a BZR repository 2 months ago but didn't really find the time to synchronize it with the way I created bundles in the past. Today I finally linked the repository and its TGZ creation feature from the main download page. This is the first bundle since March, so there are quite a number of fixes. Some tweaks were not logged, but from now on, the process should be more professional (thanks to the proper versioning system).

Here is the raw list of changes. The most interesting are probably the improved RDFa extractor (cheers to Toby Inkster and Masahide Kanzaki for code) and the new auto-cleanup of unused values/hashes in the RDF store. I received a couple more patches which will be integrated in the coming weeks:
  • new component: Resource
  • new method: completeQuery (PREFIX-injection)
  • Reader: new method: getResponseHeaders
  • RDFa: fixes, +3 test case PASSes (thx to Toby Inkster & Masahide Kanzaki)
  • Class: auto-populate POST (php5 bug)
  • Class: refactored *PName methods
  • new methods: toIndex, toTriples, checkRegex
  • Parsers: unsetting reader object to fix garbage collection
  • SelectQueryHandler: improved LIKE-check for REGEX-rewriting
  • Class: used prefixes were not logged, leading to serialization gaps
  • Class: fixed root calculation bug in calcURI
  • Class: new methods: toDataURI/fromDataURI
  • ARC2_SPARQLScriptProcessor: improved automatic PREFIX injection
  • ARC2_RemoteStore: added automatic PREFIX injection and getResourceLabel method
  • ARC2_StoreSelectQueryHandler: fixed missing brackets in getExpressionSQL.
  • Reader: Improved timeout handling
  • Reader: support for port in http header (thx to Roan O'Sullivan)
  • Slowly starting to switch to inline PHPDoc documentation
  • Atom_Parser: Addition: support for link types
  • DeleteQueryHandler: Addition: cleanValueTables method (auto-called every 500 DELETE queries)
  • Class: new method: resetErrors
  • Class: switch from getScriptURI to getRequestURI in init()

In related news:
  • Tuukka Hastrup created an ARC 2 Starter Pack that simplifies the process of setting up an ARC store.
  • Andrew Ritz created a WordPress extension that lets you embed results from remote SPARQL endpoints directly in your blog pages.

SKOS + DC + Linked Data = Semantic Tagging?

Using Dublin Core terms to link SKOS concepts to Linked Data entities
Still looking for a simple way to tag concrete resources (to-do items, people, locations) with personal concepts (e.g. "non-profit", "research", "semweb"), and also with other non-conceptual resources (clients, projects), I skimmed through the fresh SKOS Recommendation. I'm still a fan of SKOS and frequently wonder about semweb apps where the internal models are grounded in pluggable, personal(!) SKOS schemes, instead of coordination-intensive RDF Schemas or OWL ontologies. I don't know if such an approach could really work, I guess network effects benefit more from rather tightly defined relations and identifiers. Mainly just to have it written down somewhere (this is really not well thought out yet), here are some of the related entry points and considerations:
  • Tagging should be personal.
    While I like the idea of grounding tags in existing dictionaries such as DBPedia, tags seem to work best when they are as user-defined and informal as possible. Last year, I experimented with a tool that allowed me to tag things with other people's delicious tags. It just felt wrong, I wanted my "own" tags. (I think the latest Faviki release is a nice example for combining the best of both worlds).
  • SKOS supports personal tags
    Concepts in SKOS are sort-of scoped (or "namespaced"). If I describe a "Fun" concept, it is defined as seen by the creator of the concept URI, i.e. I can annotate it with ':Fun dct:creator <#me> ; dct:created "2009-08-19"' etc, even though the general idea of Fun was clearly not invented by me, and definitely before today.
  • Tags should be safely portable
    Thanks to URIs, SKOS concepts can be ported to other applications, and they can be grouped and organized in so-called concept schemes, i.e. I could have a "Waving" in a "Dance" concept scheme, and also in a "Netiquette" scheme.
  • There is a need to merge tag sets
    If tags are used to organize all sorts of personal things, it should be possible to merge them into a unified model. Mainly for personal use ("personal world view"), but also for sharing with other people and linking to their views. This is again possible thanks to SKOS being based on RDF, URIs, and very loose semantics.
  • There is a need to tag real-world objects with concepts
    This is partly obvious. Tags are a means to an end. But while they are already widely used to annotate document-like resources (web pages, photos, etc), I'd also like to tag things like my projects, people in my address book, and similar non-documents. From the SKOS Primer: "While the SKOS vocabulary itself does not include a mechanism for associating an arbitrary resource with a skos:Concept, implementors can turn to other vocabularies". So, whatever predicate URI we are going to use, it's not going to be provided by SKOS directly.
  • Maybe Dublin Core terms can link non-documents to concepts
    This is a slightly controversial conclusion/assumption, given that DC terms are mainly associated with document metadata. But after exploring the DCMI website, I can't find any clear evidence that their terms can't be used more generally. Both the Usage Guide (thanks to Masahide for the pointer) and the Abstract Model actually support this thought. The Usage guide mentions that "DC metadata can be applied to other resources as well" (but notes that the suitability may depend on the particular context at hand), and the Abstract Model states that the notion of a Dublin Core "resource" is equivalent to "Resource" defined in RDF Schema, which can be anything, even including Literals. So, we can most probably use dct:subject or dct:relation to tag a project or person with a SKOS concept.
  • There is a need to associate concepts with real-world objects
    If we organize our personal concept space with SKOS, we may also want to more formally specify our personal concepts, so that other applications or people can merge them with their tags. Therefore, we need a predicate that can relate concepts to non-concepts such as DBPedia identifiers. Such a mechanism could maybe also help with RDF's general problem of URI aliases. I could have a personal, canonical concept URI for a resource and use it as a container for the resource's various aliases. Again, SKOS does not provide a predicate for this use case, so we've got to look elsewhere.
  • Maybe Dublin Core terms can link concepts to real-world objects
    Another possibly controversial conclusion, but again there is supporting text in the DCMI specs: "A value associated with the Dublin Core Subject property is a concept (a conceptual entity) or a physical object or person (a physical entity)". So, if the value of dc:subject can be a non-document, we can say things like ":Berlin a skos:Concept ; dct:subject dbpedia:Berlin .". This is very interesting because it could allow us to use dct:subject in both ways: for the tagging of things, and also for grounding tags. FOAF has a handy primaryTopic term, which could work in this context, too, but unfortunately, its scope is (currently) set to foaf:Document. DanBri also suggested the creation of a dedicated skos:it (or similar) predicate which would be even better.
  • Sometimes I'd like to "tag" real-world objects with real-world objects
    Don't know if tagging is still the right word here, but what I mean is a generic relation for arbitrary things in a common application context. Often, we can do better by specifying the relation between two resources, but in other cases, a simple, maybe just temporary link, is better than laziness leading to a completely non-annotated resource. Given the two DCMI-related findings above, we could maybe conclude that a predicate like dct:relation can also be used to relate a project to a person, or the other way round, without having to invent a new predicate.
</brain:dump>
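To see how the two-way use of dct:subject and the generic dct:relation link from the list above would play together, here is a tiny Python sketch. It uses plain string tuples instead of an RDF library, and all identifiers except the vocabulary terms (the ":Berlin" concept, "<#trip-2009>", "<#me>") are made up for illustration.

```python
# Triples as (subject, predicate, object) tuples with prefixed names as plain strings.
TRIPLES = {
    (":Berlin", "rdf:type", "skos:Concept"),
    (":Berlin", "dct:subject", "dbpedia:Berlin"),  # grounding a personal tag
    ("<#trip-2009>", "dct:subject", ":Berlin"),    # tagging a non-document with a concept
    ("<#trip-2009>", "dct:relation", "<#me>"),     # "tagging" a thing with a thing
}


def objects(subject, predicate):
    """All objects of the given subject/predicate pair."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def is_concept(node):
    return "skos:Concept" in objects(node, "rdf:type")


def grounded_topics(thing):
    """Follow dct:subject to the thing's concepts, then on to what those concepts denote."""
    topics = set()
    for tag in objects(thing, "dct:subject"):
        if is_concept(tag):
            topics |= objects(tag, "dct:subject")
    return topics


print(grounded_topics("<#trip-2009>"))
```

The rdf:type check is what keeps the two roles of dct:subject apart: only when the object is a skos:Concept does the second hop count as tag grounding.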

SemWeb T-Shirt Shop closed

I've closed the Spreadshirt shop we set up a year ago, due to lack of interest.
Just a quick FYI: I've closed the SemWeb Spreadshirt Shop from last year. I never had a payout (you have to reach a certain amount of profit before you earn actual money), and as I plan/have to discontinue most of my many pet projects anyway (Simplify Your Life etc.), this one was rather easy to start with.

I guess my red semweb cap just became a rarity ;)

The Semantic Web - Not a piece of cake...

The SemWeb layercake diagram as an isometric infographic
For a client project I've been looking at Isometric Projection, which is not only nice for mapping 3D objects to a 2D environment, but even more so for adding a 3rd dimension to (previously) flat visual objects. The additional axis allows for much more information to be provided, without (necessarily ;) sacrificing compactness and simplicity.

While I was pushing small boxes around on a 30° grid, Jim Hendler tweeted about his Layer Cake talk from the recent Dagstuhl meeting (which is awesome, BTW. Read it, if you haven't yet) and I started to wonder if an isometric version of the tech stack could help reduce the overload resulting from the current two-dimensional ones. Not really, I fear, but it was a fun experiment nonetheless. Might be worth exploring this a little further. At least the concepts can be separated from specific technologies and the application layer has a different angle than before (which I personally think makes more sense). Anyway, just wanted to share the result. Enjoy.

Semantic Web Technology Stack

Feel free to use and share.

Code.semsol.org - A central home for semsol code

Semsol gets code repositories and browsers
The code bundles on the ARC website are generated in an inefficient manual process, and each patch has to wait for the next to-be-generated zip file. The developer community is growing (there are now 600 ARC downloads each month), I'm increasingly receiving patches and requests for a proper repository, and the Trice framework is about to get online as well. So I spent last week on building a dedicated source code site for all semsol projects at code.semsol.org.

So far, it's not much more than a directory browser with source preview and a little method navigator. But it will simplify code sharing and frequent updates for me, and hopefully also for ARC and Trice developers. You can check out various Bazaar code branches and generate a bundle from any directory. The app can't display repository messages yet (the server doesn't have bzr installed, I'm just deploying branches using the handy FTP option), but I'll try to come up with a work-around or an alternative when time permits.

Code Browser

CommonTag too complicated?

Not sure if the commontag effort sends the right message.
Update: I just read the spec again, I can't tag non-content with the CommonTag vocabulary. Too bad, ignore the last paragraph, please.

Sorry for raising my voice here, but some of us are really working hard to show that SemWeb technologies don't have to be complicated, and unfortunately, the new CommonTag effort seems to send exactly the opposite message.

Don't get me wrong, a widely used tagging ontology would be great. We do have 3 (or 4? 5?) tagging vocabularies already, but none really caught on, possibly because tagging is meant to be simple and the proposed solutions apparently weren't easy enough. CommonTag is promoted as being "simple" and "easy", but after looking at the examples in the QuickStart Guide, I'm not so sure:
  • The snippets are really off-putting (not only for Non-RDFers). Do I really need multiple nested HTML nodes to create something as simple as a tag?
  • Couldn't the term names be more intuitive? What could a ctag:Tag be? The actual tag or an intermediate resource that is then, err, tagged? A person ctag:tagged a resource, right? Ah, no.
  • Why aren't the term names at least consistent? "ctag:taggingDate" follows role-noun, "ctag:tagged" is a dunno, "ctag:means" is a present-form verb, "ctag:isAbout" sort-of follows the hasPropertyOf anti-pattern.
  • The vocabulary introduces aliases for well-deployed terms such as rdfs:label and dct:created, which makes its use in practical settings expensive (it'll ease things on the author side, though).

To be a little more constructive: Using the vocabulary doesn't have to lead to the complicated markup seen in the examples. I'm sure they'll soon get better snippets from someone in the RDFa community. And apart from that, there is also a handy term in the RDF Schema which might just be what you are looking for: "ctag:isAbout". It lets you directly point from a resource (default is the page) to a Linked Data identifier (e.g. from DBPedia), without the need for all those intermediate nodes (which lead to triple bloat and slow down SPARQL queries). CommonTag-consuming apps will have to implement some form of inferencing to handle "isAbout", but as the term is in the spec, I assume they plan to.

Granular modeling of tags is apparently tricky, but shouldn't there be some sweet spot? Something a little more expressive than rel-tag but less complex than a fully spec'd Tag ontology? xFolk looks promising, or maybe the CommonTag group members could have agreed on formalizing and supporting "scoped rel-tag" (rel-tags with an optional RDFa "about" container). Most rel-tag-to-RDF converters have some form of scoping already anyway (because tags can apply to reviews, pages, vcards, etc.). That would have been a cool outcome after 1 year of stealth work.

I may as well just over-stress the simplicity aspect here. Maybe CommonTag is "simple enough" for web publishers. There are some initial supporters, and for RDFers, the nested structures and bnodes will most probably be acceptable. So let's see how things evolve.

I personally think I'll have a closer look at ctag:isAbout. I'm still looking for an alternative to dc/dct:subject to tag arbitrary things with arbitrary identifiers, maybe CommonTag can provide it, although
<#me> ctag:isAbout dbpedia:Semantic_Web .
still doesn't sound right for a rich tag, and the domain is "ctag:TaggedContent" which sounds wrong for non-textual resources, too. (dct:relation is the best I could find so far for tagging things with things, but Dublin Core is coming from a publishing context and is therefore often recommended for describing publications only).

ESWC 2009 Linked Data Dashboards

A first Paggr application went live during ESWC2009.
In case you missed the tweets or a local announcement: The first Paggr application went online a few days ago. This year's ESWC Technologies Team pushed things a little further, with RFID tracking during the event and extended conference data that includes detailed session and date/time information (kudos to Michael Hausenblas for RDFizing even PDFs).

Based on this dataset, we provided a conference explorer and stress-tested the "Dog Food" server while at it. The system survived, but I also learned a lot. We used about 50 RDF stores for the different public and user-specific dashboards, which basically worked nicely. However, rendering non-ugly resource summaries requires a bit of endpoint hammering, and some of the more complex path queries resulted in timeouts. Yesterday, I had to create a mirror from the data dump to route a couple of widgets through a replicated (ARC :-) endpoint. But then this is also one of the powerful possibilities that come with semantic web technologies. You can often switch or double the back-end repository in no time, and without any code changes. (And as all the Sparqlets are created in a web-based tool, I didn't even have to upload a changed configuration file. I simply tweaked a SPARQLScript parameter.)

Anyway, there are a couple of public dashboards in case you'd like to give it a try (it's still an early version). I also embedded a short screencast below. The system is going to be moved to a DERI server when the conference is over, but the URIs and data will probably stay stable. (And no, it won't really work with IE yet.) More to come!



HQ version (quicktime, 110MB)

Simple RDFication of SPARQL SELECT results with RDFa

How to use RDFa to make SELECT results locally available as RDF
A couple of weeks ago, I wrote about the self-enforcing value spiral that RDF data enables. Here is an example of how RDFa can be used to support this "Repurpose-Republish" loop.

While data exchange between different semantic web sources is usually RDF-based (i.e. the data always maintain their semantics), there is one major exception: SPARQL SELECT queries. This developer-oriented operation returns tabular data (similar to record sets in SQL). Once the query result is separated from the query, the associated structural data is lost. You can't directly feed SELECT results back into a triple store, even though querying based on linked resources means that you have just created knowledge. It's a pity to show this generated information to human consumers only.

One of the demos at my NYC talk was a dynamic wiki item that pulled in competitor information from Semantic CrunchBase and injected that into a page template as HTML. The existing RDF infrastructure does not let me cache the SELECT results locally as usable RDF. And a semantic web client or crawler that indexes the wiki page will not learn how the described resource (e.g. Twitter) is related to the remote, linked entities.

wiki with linked data

However, by simply adding a single RDFa hook to the wiki item template, the RDF relation (e.g. competitor) can be made available again to apps that process my site content. This is basically how Linked Data works. But here is the really nifty thing: My site can be a consumer of its own pages, too, recursively enriching its own data.
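As a rough idea of what such an RDFa hook in a template could look like, the following Python sketch renders tabular SELECT rows as an HTML list in which every link carries a @rel, so that the relation survives in the markup. It is not the actual wiki template: the function name, the "uri"/"label" row keys, the "ex" prefix, and all example.org URIs are assumptions for illustration.

```python
from html import escape


def rows_to_rdfa(subject, rel, prefix, namespace, rows):
    """Render SELECT-style rows as an RDFa-annotated list.

    @about sets the subject, @rel the predicate, and each @href the object,
    so an RDFa parser can turn every row back into a triple.
    """
    links = "\n".join(
        '  <li><a rel="%s" href="%s">%s</a></li>'
        % (escape(rel), escape(row["uri"]), escape(row["label"]))
        for row in rows
    )
    return '<ul xmlns:%s="%s" about="%s">\n%s\n</ul>' % (
        prefix, escape(namespace), escape(subject), links)


# Hypothetical result rows of a "competitors of company A" SELECT query.
rows = [
    {"uri": "http://example.org/company/b", "label": "B & Co"},
    {"uri": "http://example.org/company/c", "label": "C Inc"},
]
html = rows_to_rdfa(
    subject="http://example.org/company/a",
    rel="ex:competitor",
    prefix="ex",
    namespace="http://example.org/vocab#",
    rows=rows,
)
print(html)
```

Without the @rel and @about attributes the output would be the same list for human readers, but a crawler could no longer tell how the linked entities relate to the described resource.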

markup-to-SELECT-to-RDFa-to-RDF

I tweaked the wiki script which now works like this: When the page is saved, a first operation updates the wiki markup in the page's graph (i.e. the not-yet-populated template). In a second step, the page URL is retrieved via HTTP. This will return HTML with RDFa-encoded remote data, which is then parsed by ARC, and finally added to the same graph. We end up with a graph that does not only contain the wiki markup, but also the RDFized information that was integrated from remote sites. After adding this graph to the RDF store, we can use a local query to generate the page and occasionally reset the graph to enable copy-by-reference. And all this without any custom API code.
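The save-then-self-consume flow described above boils down to a handful of steps, sketched here in Python with stand-ins for the moving parts. The dict-based store, the "ex:wikiMarkup" predicate, the fake fetch function, and the regex-based extractor are all hypothetical placeholders for the real RDF store, the HTTP request, and ARC's RDFa parser.

```python
import re


def toy_rdfa_extract(html):
    """Stand-in for a real RDFa parser: only understands @about plus @rel/@href pairs."""
    about = re.search(r'about="([^"]+)"', html)
    subject = about.group(1) if about else ""
    pairs = re.findall(r'rel="([^"]+)" href="([^"]+)"', html)
    return {(subject, rel, href) for rel, href in pairs}


def save_page(store, page_url, wiki_markup, fetch, extract):
    # Step 1: (re)create the page's graph with the not-yet-populated template.
    graph = {(page_url, "ex:wikiMarkup", wiki_markup)}
    # Step 2: retrieve the rendered page, which now contains RDFa-encoded remote data.
    html = fetch(page_url)
    # Step 3: parse the RDFa and add the triples to the same graph.
    graph |= extract(html)
    # Step 4: put the combined graph into the store.
    store[page_url] = graph
    return graph


def fake_fetch(url):
    """Stand-in for the HTTP request; returns what the rendered wiki page might look like."""
    return (
        '<ul about="http://example.org/company/a">'
        '<li><a rel="ex:competitor" href="http://example.org/company/b">B</a></li>'
        "</ul>"
    )


store = {}
page = "http://example.org/wiki/company-a"
graph = save_page(store, page, "[[competitors]]", fake_fetch, toy_rdfa_extract)
for triple in sorted(graph):
    print(triple)
```

Rebuilding the graph from scratch on every save is one simple way to model the occasional reset: the remote data is re-fetched instead of being copied permanently.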

rdfa-to-sparql

Back from New York "Semantic Web for PHP Developers" trip

Gave a talk and a workshop in NYC about SemWeb technologies for PHP developers
/me at times square

I'm back from New York, where I was given the great opportunity to talk about two of my favorite topics: Semantic Web Development with PHP, and (not necessarily semantic) Software Development using RDF Technology. I was especially looking forward to the second one, as that perspective is not only easier to understand for people from a software engineering context, but also because it is still a much neglected marketing "back-door": If RDF simplifies working with data in general (and it does), then we should not limit its use to semantic web apps. Broader data distribution and integration may naturally follow in a second or third step once people use the technology (so much for my contribution to Michael Hausenblas' list of RDF MalBest Practices ;)

The talk on Thursday at the NY Semantic Web Meetup was great fun. But the most impressive part of the event was the people there. A lot to learn from on this side of the pond. Not only very practical and professional, but also extremely positive and open. Almost felt like being invited to a family party.

The positive attitude was even true for the workshop, which I clearly could have made more effective. I didn't expect (but should have) that many people would come w/o a LAMP stack on their laptops, so we lost a lot of time setting up MAMP/LAMP/WAMP before we started hacking ARC, Trice, and SPARQL.

Marco brought up a number of illustrative use cases. He maintains an (unofficial, sorry, can't provide a pointer) RDF wrapper for any group on meetup.com, so the workshop participants could directly work with real data. We explored overlaps between different Meetup groups, the order in which people joined selected groups, inferred new triples from combined datasets via CONSTRUCT, and played with not-yet-standard SPARQL features like COUNT and LOAD.

And having done the workshop should finally give me the last kick to launch the Trice site now. The code is out, and it's apparently not too tricky to get started even when the documentation is still incomplete. Unfortunately, I have a strict "no more non-profits" directive, but I think Trice, despite being FOSS, will help me get some paid projects, so I'll squeeze an official launch in sometime soon-ish.

Below are the slides from the meetup. I added some screenshots, but they are probably still a bit boring without the actual demos (I think a video will be put up in a couple of days, though).

Could Microdata work better for me than RDFa?

Just had a quick look at the Microdata proposal, wondering about its pros and cons.
I've always had my little issues with RDFa, mainly for personal reasons. I'm repeating them here (for the last time, promised, don't want to trigger another flame war):
  • I personally don't like the amount of new attributes and their names (about, resource, typeof, and property are at least as inconsistent as RDF/XML's tokens).
  • I've written an RDFa parser, but still don't really understand the processing model. RDFa does the job of course, and it's been specified by smart people I respect, but to me it just still feels a little too complicated. I often have to utilize an extraction service to verify the triples resulting from a snippet, and I've seen the creators of RDFa do the same.
    One reason for being less intuitive than hoped is the fact that adding an attribute to some existing snippet can easily change the entire meaning of nested information. This makes it tricky to incrementally add structure to already tested and approved RDFa (an unnoticed @rel or @typeof may add an unwanted blank intermediate node, for example, and you can have any combination of RDFa attributes on a single node).
  • I consider structured blogging a central use case for RDF in HTML, yet it's not fully supported by RDFa: RDFa does not allow sub-structures in XML Literals (for security/triple injection reasons, IIRC), so you can't extract a post body (including HTML markup) and also get the annotations encoded in the body (like reviews or events).
  • (Reliable) copy and paste is not possible when prefix definitions can be kept separate from annotations. This is relevant to some of the apps I'm working on, and it took me quite some time to admit that (intuitively desirable) URI abbreviations in HTML do have negative practical implications. It depends on the use case, but it also needs some experience to realize this, as the pro-prefix argument is practically motivated as well. (I started playing with RDF-ish copy & paste rather early, if that makes this conclusion more credible).
  • The xmlns:prefix mechanism doesn't work nicely with my development environment. This is perhaps a silly argument, but for me personally it is important to see that green little "0 errors" indicator in my browser while I'm creating sites. It was not hard to extend the Firefox validator extension with support for new attributes, but there was no clean way to make it accept xmlns:prefix. Spotting true errors in the dozens of RDFa-related complaints is annoying.

Having said that, if this little list is all I can come up with, then RDFa is probably a pretty solid and usable spec. I could easily write a list of things I find flawed in RDF/XML, or even SPARQL, my favorite RDF technology. And there is another good reason why I should tend towards using RDFa: Lack of proper alternatives. I still think it would be possible to create a cross-doctype solution. eRDF and my own poshRDF experiment show that it's possible, but so far these approaches are incomplete RDF-wise, and I wouldn't have the energy or funds to build a community to develop things further (and again, my arguments are motivated by personal use cases and habits, so there isn't a large overlap with other people's requirements anyway).

Nevertheless, the new "Microdata" proposal is currently being discussed, so it might be worth having a look and comparing it with my RDFa issue list above. I only had a quick scan, so I may have gotten some details wrong:
  • It only introduces two new (mandatory) attributes: "item" and "itemprop". "item" can be used to type resources. RDFa's "about" can be re-used for URI-identified items. That sounds compact and neat so far.
  • "item" is mandatory to indicate the boundary of a resource description. This makes accidental triples much less likely to happen than with RDFa. For any "itemprop", you just have to walk up the DOM tree to find the container item, which makes both human- and code-based parsing easy.
  • Structured blogging? Aww, not really. While you can at least choose between raw markup or structured values in RDFa, Microdata only supports flat key-value pairs where the value is a node's textContent and won't contain tags (if I read the draft correctly). I don't really need datatypes and languages, but I definitely want RDF triples where the object can contain HTML markup (wiki blobs with embedded annotations are another example).
  • Copy & paste of source code or from/to contenteditable sections is more reliable than with RDFa because there is no prefix mechanism.
  • It'd be possible to make the Firefox validator eat the new Microdata attributes without complaining, but I'm not sure how likely it is to have Microdata support in the official distribution anytime soon. Marc Gueury writes that validating HTML5 may require a new sort of validator, so switching to HTML5 may make things worse instead of better for me, development-wise.
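
The "walk up the DOM tree" rule from the list above can be sketched in a few lines. This is a rough illustration based on my quick reading of the draft, not a conforming Microdata parser: the type values in the sample markup are made up, and I use well-formed XML so the standard library parser can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup using the draft's "item" and "itemprop" attributes.
SNIPPET = """
<div>
  <div item="org.example.event">
    <span itemprop="summary">SemWeb Meetup</span>
    <div item="org.example.place">
      <span itemprop="name">New York City</span>
    </div>
  </div>
</div>
"""

def extract_pairs(xml_text):
    """Return (item type, property, text value) tuples in document order."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    pairs = []
    for node in root.iter():
        prop = node.get("itemprop")
        if prop is None:
            continue
        # Walk up until the nearest ancestor carrying an "item" attribute.
        container = parent_of.get(node)
        while container is not None and container.get("item") is None:
            container = parent_of.get(container)
        if container is None:
            continue  # no enclosing item, so no accidental statement
        # Flat value: the node's text content, with any tags dropped.
        value = "".join(node.itertext()).strip()
        pairs.append((container.get("item"), prop, value))
    return pairs

for pair in extract_pairs(SNIPPET):
    print(pair)
```

An `itemprop` outside of any item is simply skipped, which is the "fewer accidental triples" property mentioned above. The flat text value is also exactly why markup-carrying objects get lost.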

I recently watched a short section of a TV fortune-teller show where desperate people could dial in to ask their questions. The lady who called asked "Will I find a new love?", and the fortune-teller looked into her cards (very slowly, of course, given the 3 EUR/minute rate), then slowly lifted her head, looked straight into the camera and articulated her findings: "I see a definite Maybe."

I guess this awesome universal answer also works for my opening question. There simply is no ideal solution. I like the item/itemprop idea, but I'd need to add a hack for markup values, e.g. by adding an item="...XMLLiteral" container and then converting these items to XML nodes. But then I could just as well add a simpler hack to my RDFa extractor to deep-parse XMLLiterals. This doesn't justify a whole new spec. The copy/paste problem is not too urgent any more, as Linked Data enables nifty copy-by-reference instead of copy-by-value.

It's generally a little surprising to see that Microdata proposal. For months, the HTML5 opinion makers argued against user-defined markup structures, and now they have created a completely new spec that not only extends RDFa's possibilities to identify resource types and relations, but also seems to introduce a redundant serialization for selected microformats.

Anyway, for the sake of convergence and less work, I think I still prefer (a subset of) RDFa, if only there were a way to get rid of CURIEs (who wants an abbreviation mechanism whose acronym can't even be properly expanded? ;). And an alternative for the validation pain could be a simple, locally installed validator, accessible through a Ubiquity script. When I think about it, I mainly just need well-formedness and some attribute checks. A Ubiquity script could directly show HTML errors and also extracted triples, and maybe even do some triple sanity checks, too. But then this setup would work just as well for Microdata. Ah well..
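
The "well-formedness and some attribute checks" part is small enough to sketch. The following is only an illustration of the idea, assuming XHTML-style input and a made-up allow-list of attributes; it is neither a real HTML validator nor the Ubiquity script itself.

```python
import xml.etree.ElementTree as ET

# Assumed allow-list: a few common HTML attributes plus annotation attributes.
ALLOWED = {"id", "class", "href", "src", "alt", "title",
           "about", "item", "itemprop"}

def check(xml_text, allowed=ALLOWED):
    """Return a list of problems: a parse error or unexpected attributes."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Not well-formed: report the parser's message and stop here.
        return [f"not well-formed: {err}"]
    problems = []
    for node in root.iter():
        for name in node.attrib:
            if name not in allowed:
                problems.append(f"<{node.tag}>: unexpected attribute '{name}'")
    return problems

print(check('<p class="a"><a href="#" foo="1">x</a></p>'))
print(check('<p><b></p>'))
```

Because the allow-list is just a set, the same check works for RDFa or Microdata attributes by swapping the names.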

Paggr screencast: Conference Explorer (proto)

Prototype screencast of a semantic conference explorer for ESWC 2009.
I just returned from a short, doc-enforced trip to Nice (awesome place, savoir-vivre and all that) and will fly to the NYC SemWeb Meetup in a few days. Before we went to France, I created another Paggr screencast. This one is the first to show the (user-facing) dashboard and widgets we plan to make available as a semantic conference explorer at ESWC 2009. Still some way to go, but I'm optimistic that we'll have a number of handy helpers online by the beginning of the event. I won't be able to attend in person, so I'm highly motivated to have at least a twitter and twitpic tracker up and running then.



HQ version (quicktime, 134MB)

ARC Graph Gear Serializer Plugin

Patrick Murray-John created an ARC2 converter for Graph Gear visualizations
Patrick Murray-John (who is currently Semantifying the University of Mary Washington) just released a first version of an ARC2 converter for Graph Gear visualizations. Looks pretty cool.
Graph Gear visualization from RDF via ARC

RDF/SPARQL-based web development for PHP coders: Meetup presentation and workshop in NYC

I'll give a talk and run a workshop in New York City in May.
The Linked Data meme is spreading and we have strong indications that web developers who understand and know how to apply practical semantic web technologies will soon be in high demand. Not only in enterprise settings but increasingly for mainstream and agency-level projects where scripting languages like PHP are traditionally very popular.

I can't really afford travelling to promote the interesting possibilities around RDF and SPARQL for PHP coders, so I'm more than happy that Meetup master Marco Neumann invited me to come over to New York and give a talk at the Meetup on May 21st. Expect a fun mixture of "Getting started" hints, demos, and lessons learned. In order to make this trip possible, Marco is organizing a half-day workshop on May 22nd, where PHP developers will get a hands-on introduction to essential SemWeb technologies. I'm really looking forward to it (and big thanks to Marco).

So, if you are a PHP developer wondering about the possibilities of RDF, Linked Data & Co, come to the Meetup, and if you also want to get your hands dirty (or just help me pay the flight ticket ;) the workshop could be something for you, too. I'll arrive a few days earlier, by the way, in case you want to add another quaff:drankBeerWith triple to your FOAF file ;)
