finally a bnode with a uri

Posts tagged with: w3c

Is the Semantic Web Layer Cake starting to crumble?

Some thoughts about the ever-growing number of RDF specs.
I recently read an article about how negative assertions automatically get associated with the person who makes them. For example, if you say negative things about your competitor's products, people will subconsciously link these negative sentiments directly with you. A psychology thing. So, my recent rants about the RDF spec mania at the W3C have already led to an all-time low karma level in the RDF community, and I'm trying hard to stay away from discussions about RDFa 1.1, RDF Next Steps, etc. so as not to make things worse. (Believe it or not, not all Germans enjoy being negative ;)

Now, why another post on this topic? ARC2's development is currently on hold, as my long-time investor/girlfriend pulled the plug on it and (rightly so) wants me to focus on my commercial products. With ARC spreading, the maintenance costs are rising, too. There are some options around paid support, sponsoring and donations that I'm pondering, but for now the mails in my inbox are piling up, and one particular question people keep asking is whether ARC is going to support the upcoming SPARQL 1.1, or whether I'm boycotting it because I think the W3C specs are preventing the semantic web from gaining momentum. Short answer (both times): to a certain extent, yes.

Funnily enough, this isn't so much a question about developers wanting to implement SPARQL 1.1, but rather whether they actually can implement it in an efficient way. SPARQL 1.1 standardizes a couple of much-needed features that ARC's proprietary SPARQL+ has offered for years, things like aggregates and full CRUD, which I managed to implement in a fast-enough way for my client projects. But when it comes to all the other features in SPARQL 1.1, the suggestions coming out of the "RDF 2.0" initiative, and the general growth of the stack, I do wonder if the RDF community is about to overbake its technology layer cake.
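
For illustration, here is roughly what those SPARQL+ extensions look like (a sketch from memory; graph names and properties are made up, and the syntax may differ in details from what SPARQL 1.1 ends up standardizing). Each string would be passed to an ARC store's query() method:

  /* full CRUD: write access via SPARQL+ INSERT / DELETE */
  $insert = '
    INSERT INTO <http://example.com/g1> {
      <http://example.com/me> <http://example.com/likes> "cake" .
    }
  ';
  $delete = '
    DELETE FROM <http://example.com/g1> {
      <http://example.com/me> <http://example.com/likes> "cake" .
    }
  ';

  /* a simple aggregate: count all triples in the store */
  $count = 'SELECT COUNT(?s) AS ?cnt WHERE { ?s ?p ?o . }';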

Not that any particular spec is bad or useless, but it is becoming increasingly hard for implementors to keep up. Who can honestly justify the investment in the layer cake if it takes a year to digest it, another year to implement a reasonable portion of it, and then a new spec obsoletes the expensive work? The main traction the Semantic Web effort is seeing happens around Linked Data, which uses only a fraction of the stack, and, interestingly, in a way that is non-compliant with other W3C recommendations such as OWL, because the latter doesn't provide the means needed for actual symbol linking (or didn't explain them well enough).

A central problem could be a lack of targeting, and a failure to spell out the target audience of a particular spec. 37signals once said that good software is opinionated. The RDF community is doing the exact opposite and seems desperate to please everyone. The groups follow a throw-it-out-and-see-what-sticks approach, and every new spec is thrown onto the stack, with none of them having a helpful description for orientation. No one is taking the time to reduce confusion, to properly explain who is meant to implement the spec, who is meant to use it, and how it relates to the other ones. Sure, new specs raise the barrier to market entry and thus help the few early vendors keep competition away. But if market growth gets delayed this way, the market may die, or at least an unnecessary number of startups will. (Siderean is one example; their products were amazing. Another one is Radar Networks, which suffered from management issues, but they might have survived if they had spent less money trying to implement an OWL engine for Twine.)

For the fun of it, here are some micro-summaries for RDF specs, how I as a web developer understand them:
  • RDF: "A schema-less key-value system that integrates with the web." (Wow!)
  • RSS 1.0: "Rich data streams." (This is the stuff the thought leaders then said would never be needed, and which now has to be inefficiently squeezed into Atom extensions. Idiots!)
  • OWL 1: "Dumbing down KR-style modeling and inference to the web coder level." (I really liked that approach, it attracted me to the SemWeb idea in the first place, even though I later discovered that RDF Schema is sufficient in many cases.)
  • SPARQL 1.0: "A webby SQL with support for remote databases and without complex JOIN syntax." (Love it! See the query sketch after this list.)
  • GRDDL: "For HTML developers who are also XSLT nerds." (A failure, possibly because the target audience was too small, or because HTML creators didn't care for XML processing requirements. Or the chained processing of remote documents was simply too complex.)
  • OWL 2: "Made for the people who created it, and maybe AI students." (Never needed any of its features that I couldn't have more easily with simple SPARQL scripts. I think some people need and use it, though.)
  • RIF: "Even more features than OWL 2, and yet another syntax." Alternative summary (for a good ROFL): "Perfect for Facebook's Open Graph". (No use case here. Again, YMMV.)
  • RDFa 1.1: I actually stopped following it, but here is one by Scott Gilbertson: "a bit like asking what time it is and having someone tell you how to build a watch"
  • SPARQL 1.1: "Getting on par with enterprise databases, at any cost." (A slap in the face of web developers. Too many features to implement in any reasonable time, in their entirety, or with user-satisfying performance. Profiles for feature subsets could still save it, though.)
  • Microdata: "RDF-in-HTML made easy for CMS developers and JavaScript coders." (Not sure if it'll succeed, but it works well for me.)
  • SKOS: "An interesting alternative to RDFS and OWL and a possible bridge to the Web 2.0 world." (Wish I had time to explore SKOS-centric app development, the potential could be huge.)
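
To illustrate the "no complex JOIN syntax" point from the SPARQL 1.0 summary above, here is a small, illustrative sketch: a query that would typically need a self-JOIN of a person table in SQL is just a flat graph pattern in SPARQL (FOAF vocabulary, made-up data; the query string could be sent to any SPARQL endpoint or to a local ARC store):

  /* "people and the names of their friends": the shared ?person and ?friend
     variables do the joining implicitly, no JOIN/ON syntax needed */
  $q = '
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?friend_name WHERE {
      ?person foaf:name ?name .
      ?person foaf:knows ?friend .
      ?friend foaf:name ?friend_name .
    }
    LIMIT 25
  ';
  /* e.g. $rows = $store->query($q, "rows"); with an ARC2 store instance */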

I still believe that the lower-end adoption issue could be solved by a set of smaller layer cakes, each baked for and marketed to a defined and well-understood target audience. If the W3C groups continue to add to the same cake, it's going to crumble sooner or later, and the higher layers are going to bury the foundations. Nobody will want a taste of it then.

Ben Lavender already formulated his concerns several months ago.

Picture: "The truth about the Semantic Web..." by Dan Brickley

And to answer the ARC-related question in more detail, too: the next step is collecting enough funds to test and release a PHP 5.3 E_STRICT version (thanks so much to all donors so far, we'll get there!). SPARQL 1.1 compatibility will come, but only for those parts that can be mapped to relational DB functionality. The REST API is on my list, too. Empty graphs: I don't think so (which app would need them?). Sub-queries: most probably not. Federated queries: sure, as soon as someone figures out how to do production-ready remote JOINs ;-)

Update: This article has been called unfair and misleading, and I have to agree. I know that spec work is hard, that it's easy to complain from the sidelines, and that frustration is part of compromise-driven specifications. Wake-up calls have to be a little louder to be heard, though, so I apologize for the toe-stepping. It is not directed against any person in particular.

RPointer - The resource described by this XPointer element

URIs for resources described in microformatted or poshRDF'd content
I often use/parse/support a combination of different in-HTML annotations. I started with eRDF and microformats, and more recently RDFa and poshRDF. Converting HTML to RDF usually leads to a large number of bnodes or local identifiers (RDFa is an exception: it allows the explicit specification of a triple's subject via an "about" attribute). Additionally, parsing a document in multiple passes (e.g. for microformats and then for eRDF) will produce different identifiers for the same objects.

I've been searching for a way to create more stable, URI-based IDs, mainly for two use cases: technically, for improved RDF extraction, and practically, to be able to subscribe to certain resource fragments in HTML pages, like the main hCard on a person's Twitter profile. The latter is something I need for Knowee.

The closest I could find (thanks to Leigh Dodds for pointing me at the relevant specs) is the XPointer Framework and its XPointer element() scheme, which is "intended to be used with the XPointer Framework to allow basic addressing of XML elements".
Here is an example XPointer element and the associated URI for my Twitter hCard (the "side/1/2/1" part is a child sequence: start at the element with the ID "side", then take its 1st child element, that one's 2nd child element, and finally that one's 1st child element):
element(side/1/2/1)
http://twitter.com/bengee#element(side/1/2/1)
We can't, however, use this URI to refer to me as a person (unless I redefine myself as an HTML section ;-). It would work in this particular case if I treated the hCard as a piece of document rather than as a person. But in most situations (for example events, places, or organizations), we may want to separate resources from their respective representations on the web (and RDFers can be very strict in this regard). This effectively means that we can't use element(), but given the established specification, something similar should work.

So, instead of element(), I tweaked ARC to generate resource() URIs from XPointers. In essence:
The RPointer resource() scheme allows basic addressing of resources described in XML elements. The hCard mentioned above as RPointer:
resource(side/1/2/1)
http://twitter.com/bengee#resource(side/1/2/1)
There is still a certain level of ambiguity, as we could argue about the exact resource being described. Also, as HTML templates change, RPointers are only as stable as their context. But in practice, they have worked quite well for me so far.
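
In case it helps, here is a rough sketch of how such a URI could be generated from a parsed DOM node. This is not ARC's actual extraction code, just the idea: walk up from the node to the nearest ancestor carrying an id attribute, collecting 1-based child-element positions along the way:

  /* build a resource() RPointer URI for a DOM element (illustrative sketch) */
  function build_rpointer(DOMElement $el, $doc_uri) {
    $steps = array();
    $node = $el;
    while ($node && $node->nodeType == XML_ELEMENT_NODE && !$node->getAttribute('id')) {
      $pos = 1; /* 1-based position among element siblings */
      for ($sib = $node->previousSibling; $sib; $sib = $sib->previousSibling) {
        if ($sib->nodeType == XML_ELEMENT_NODE) $pos++;
      }
      array_unshift($steps, $pos);
      $node = $node->parentNode;
    }
    if ($node && $node->nodeType == XML_ELEMENT_NODE) {
      array_unshift($steps, $node->getAttribute('id'));
    }
    return $doc_uri . '#resource(' . implode('/', $steps) . ')';
  }

  /* would return something like "http://twitter.com/bengee#resource(side/1/2/1)"
     when called on the hCard's root element of the Twitter profile page */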

Note: The XPointer spec provides an extension mechanism, but it would have led to very long URIs including a namespace definition for each pointer. Introducing the non-namespace-qualified resource() scheme unfortunately means dropping out of the XPointer Framework ("This specification reserves all unqualified scheme names for definition in additional XPointer schemes"), so I had to give it a new name (hence "RPointer") and have to hope that the W3C doesn't create a resource() scheme for the XPointer framework.

RPointers are implemented in ARC's poshRDF and microformats extractors.

Semantic Web Community Shop now Open

Today we opened a Spreadshirt shop for Semantic Web merchandise
Update (2009-07-24): The shop has been closed due to lack of sales during the last 12 months.

Putting my Talis' money where my mouth is, I've set up a SemWeb T-Shirt shop in coordination with the W3C Communications team (which, btw, is working on an official W3C shop :).

The community shop features a couple of cube-based designs, but it's also meant to support the broader SemWeb Interest Group and its members' open-source projects. I'm happy to help with designs and product creation (as time permits). Profits will go straight to the respective project maintainers.

semantic web community shop

RDFa button (unofficial)

An unofficial light-blue RDFa button
Update/Note: This is not an official RDFa button; official ones (in the known colours) will be provided by the W3C Communications Team once RDFa reaches CR or REC status.

A couple of days ago I created an RDFa technology button, and I was asked to share it, so here it is:

RDFa
(PNG, GIF, SVG source file)

Please see the W3C Semantic Web Logos and Policies page for license details. This button is derived from the original W3C ones.

SPARQL is a W3C Recommendation

The W3C moves SPARQL to REC status. ARC is part of the Implementation Survey
I guess I already pushed out enough ARC spam today, so I'll keep things short: SPARQL is now a W3C Recommendation!

What I'm personally very happy about is the Implementation Survey, which features two pure-PHP implementations*. This really opens the door for mainstream web developers to start exploring RDF and SPARQL on off-the-shelf hosted web servers. Everything I create these days (e.g. the ARC site, including the bots and archive generators there, or this blog) is powered by SPARQL. It's an amazing productivity booster, as you never have to worry about complicated JOINs or evolving database schemas again. You can just code away, and it's great fun to work with. Want more testimonials? The Data Access Working Group collected quite a number of them from W3C member organizations.

* Don't let yourself be fooled by RAP's low report scores, their SPARQL engine is quite mature, they just didn't run the whole test suite.
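
To give non-RDF folks an idea of what that workflow looks like, here is a small, illustrative PHP sketch against a local ARC2 store (the connection details, URIs, and literals are placeholders): new properties can be added and queried right away, with no schema migration in between.

  include_once('ARC2.php');
  /* placeholder connection details */
  $store = ARC2::getStore(array(
    'db_name' => 'blog', 'db_user' => 'user', 'db_pwd' => 'secret',
    'store_name' => 'posts'
  ));
  if (!$store->isSetUp()) $store->setUp();

  /* add data with a property nobody planned for */
  $store->query('INSERT INTO <http://example.com/blog> {
    <http://example.com/post-1> <http://example.com/title> "SPARQL is a REC" .
    <http://example.com/post-1> <http://example.com/mood> "happy" .
  }');

  /* ...and query it right away, no JOINs, no ALTER TABLE */
  $rows = $store->query('
    SELECT ?post ?title WHERE {
      ?post <http://example.com/title> ?title .
      ?post <http://example.com/mood> "happy" .
    }
    ORDER BY ?title LIMIT 10
  ', 'rows');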

Short trip to DFKI for SWEO off-site F2F

Polycom is cool, travelling with Deutsche Bahn is not.
Yesterday, SWEO's 3rd F2F took place at MIT, and although I couldn't afford in-person attendance, Leo Sauermann from the AI research center DFKI enabled a near-equivalent in Germany. I missed the first agenda item due to the usual Deutsche "zanks for traffelink wizz us" Bahn delays, but apart from that it was a fun event and worth the 4 + 6 hour train journey. The super-modern DFKI was impressive, and the multi-screen Polycom with remote-controlled cam truly rocked.

SWEO F2F via DFKI Polycom
pic by Leo

The train back had power sockets, so I could hack a bit, and when I arrived in DDorf at 7am this morning, I had a working RDFa parser for ARC's next release. Almost there now...
