SPO(G) in ARC

The Urban Dictionary describes SPOG as "super pimped out gangsta" or as "a weapon that (...) had a fusion reactor as a power source". Sorry to disappoint you, neither has become part of ARC. Nevertheless, the SPOG I mean is quite powerful, too. It is a constrained SPARQL XML result format from SELECT queries that was proposed by Morten Frederiksen a few months ago. SPOG enables streaming store backups/dumps, and being another RDF serialization, it can be used for streamed loading as well. Support for SPOG was added in the latest revision (2008-07-02) and extends the store and the endpoint components:
  • The store got a dump() method that stream-outputs SPOG from all quads, and a createBackup($path, $alternative_query) method that writes a SPOG dump (or a custom SPO(G) query result) to a local file.
  • The SPARQL endpoint feature list accepts "dump" as a new read operation.
  • The SPARQL endpoint now accepts "DUMP" as a query type ("DUMP" also works via the internal query() method).
  • The format detector now accepts SPOG XML as an RDF format, so SPARQL+ queries will work fine with LOAD <some-spog-file.srx>. (There is now a dedicated SPOG parser for streaming LOADs.)

These additions should simplify graph exchange and store replication quite a bit.
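
To illustrate why SPOG is easy to stream on the consuming side, here is a minimal sketch in Python (not ARC's PHP, and not ARC's actual parser). It assumes the variables are named s, p, o, and (optionally) g, and the sample data is invented for the example:

```python
import io
import xml.etree.ElementTree as ET

SRX = "{http://www.w3.org/2005/sparql-results#}"

# Invented sample data; the variable names s, p, o, g are an assumption.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="s"/><variable name="p"/>
    <variable name="o"/><variable name="g"/>
  </head>
  <results>
    <result>
      <binding name="s"><uri>http://example.org/a</uri></binding>
      <binding name="p"><uri>http://purl.org/dc/elements/1.1/title</uri></binding>
      <binding name="o"><literal xml:lang="en">Hello</literal></binding>
      <binding name="g"><uri>http://example.org/graph</uri></binding>
    </result>
    <result>
      <binding name="s"><bnode>b0</bnode></binding>
      <binding name="p"><uri>http://purl.org/dc/elements/1.1/date</uri></binding>
      <binding name="o"><literal>2008-07-02</literal></binding>
    </result>
  </results>
</sparql>"""


def term(binding):
    """Turn one <binding> element into a (kind, value) tuple."""
    node = binding[0]  # the <uri>, <bnode>, or <literal> child
    return (node.tag.replace(SRX, ""), node.text or "")


def quads(source):
    """Yield (s, p, o, g) rows one by one; g is None if a row has no graph."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == SRX + "result":
            row = {b.get("name"): term(b) for b in elem.findall(SRX + "binding")}
            yield (row["s"], row["p"], row["o"], row.get("g"))
            elem.clear()  # drop the processed row's children


for quad in quads(io.BytesIO(SAMPLE.encode("utf-8"))):
    print(quad)
```

Because every row is a complete quad, each one can be handed on as soon as its closing tag is seen, without waiting for the end of the file.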

Morten++ for the idea and an initial implementation.

Documentation - Release Notes

Semantic Web Community Shop now Open

Putting my Talis' money where my mouth is, I've set up a SemWeb T-Shirt shop in coordination with the W3C Communications team (which, btw, is working on an official W3C shop :).

The Community shop features a couple of cube-based designs, but it's also meant to support the broader SemWeb Interest Group and their members' open-source projects. I'm happy to help with designs and product creation (as time permits). Profits will go straight to the respective project maintainers.

semantic web community shop

SPARQLScript Teaser

I just managed to trick my experimental SPARQLScript parser into accepting simple IF-branches and placeholders. Here is an example of what is going to be possible with ARC soon (and yes, I know this snippet most probably won't excite anyone but me ;)
BASE <http://sparqlbot.semsol.org/data/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# set the endpoint
ENDPOINT <endpoint.php>

# feed still fresh?
$current = ASK FROM <graph-updates> WHERE {
  <http://planetrdf.com/index.rdf> dc:date ?date .
  FILTER (?date > ${now-1h})
}

# refresh feed and update graph log
IF (!$current) {
  DELETE FROM <http://planetrdf.com/index.rdf>
  LOAD <http://planetrdf.com/index.rdf>
  INSERT INTO <graph-updates> { <http://planetrdf.com/index.rdf> dc:date "${now}" }
} 
(Parsed Structure)

The fun thing about the whole SPARQLScript experiment is that the parser (so far) is still below 200 LOC. A lot can be re-used from the official SPARQL Grammar, e.g. IF-blocks are really just:
Script ::= ( Query | PrefixDecl | EndpointDecl | Assignment | IFBlock )*
IFBlock ::= 'IF' BrackettedExpression '{' Script '}'
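
To show how little the recursion in those two rules needs, here is a toy recursive-descent sketch in Python. It is not the ARC parser: it assumes the input is already split into tokens, treats every non-IF statement as one opaque token, and all function names are made up for the example.

```python
def parse_script(tokens, pos=0):
    """Script ::= ( Statement | IFBlock )*  -> (nodes, next position)."""
    nodes = []
    while pos < len(tokens) and tokens[pos] != "}":
        if tokens[pos] == "IF":
            node, pos = parse_if(tokens, pos)
        else:
            # Queries, prefix/endpoint declarations, and assignments
            # are kept as opaque statements in this sketch.
            node, pos = {"type": "statement", "text": tokens[pos]}, pos + 1
        nodes.append(node)
    return nodes, pos


def parse_if(tokens, pos):
    """IFBlock ::= 'IF' BrackettedExpression '{' Script '}'"""
    if pos + 2 >= len(tokens):
        raise SyntaxError("incomplete IF block")
    condition = tokens[pos + 1]
    if not (condition.startswith("(") and condition.endswith(")")):
        raise SyntaxError("expected a bracketted expression after IF")
    if tokens[pos + 2] != "{":
        raise SyntaxError("expected '{'")
    body, pos = parse_script(tokens, pos + 3)  # the recursive part
    if pos >= len(tokens) or tokens[pos] != "}":
        raise SyntaxError("expected '}'")
    return {"type": "if", "condition": condition, "body": body}, pos + 1


def parse(tokens):
    """Parse a whole token list and reject a stray closing brace."""
    nodes, pos = parse_script(tokens)
    if pos != len(tokens):
        raise SyntaxError("unexpected '}'")
    return nodes


tree = parse(["ENDPOINT <endpoint.php>", "IF", "(!$current)", "{",
              "LOAD <http://planetrdf.com/index.rdf>", "}"])
print(tree)
```

Nested IF-blocks come for free, because the block body is simply parsed as another Script.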

Implementing the actual SPARQLScript processing engine is of course more work than the parser, but I'm making progress there, too.

SPARQLBot wins "Scripting for the Semantic Web" Challenge

I couldn't afford to attend this year's ESWC in person. This is a pity as the annual "Scripting for the Semantic Web" (SFSW) Workshop is part of the main conference, and I would have loved to meet the SemWeb Scripting community. I did send in a submission to the SFSW scripting challenge, though, and if the tweets that came through the Tenerife net bottleneck are right, it was worth the effort :-)

And the 2nd prize went to SMOB, the SemWebby Twitter. Congratulations to Alexandre, Tuukka, Uldis, and John! (Cool to see ARC used in their project, too.)

I promised T-Shirts to ARC's core contributors some time ago, and I think I'll use the prize (kindly sponsored by the SemWeb masters at Talis) to finally create and send out some Thank-Yous to the community.

SPARQLBot wins SFSW Challenge 2008

Major ARC revision: Talis platform-alignment, Remote Store, SPARQLScript

The latest ARC release comes with a couple of non-trivial (but also not necessarily obvious) changes. The most significant (as it involves ARC's resource indexes) is the alignment with the structures used by the Talis platform. ARC's parser output and PHP or JSON formats are now directly processable by Talis' platform tools. The documentation has already been updated; you may have to adjust your code (basically just "s/val/value/" and "s/dt/datatype/") in a few places.
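
As a rough illustration of the two renames, here is a sketch in Python (your real code will be PHP). It assumes a subject → predicate → list-of-objects index whose shape stays the same and where only the object keys change; the sample data is invented:

```python
# Key renames named in the post: s/val/value/ and s/dt/datatype/
RENAMES = {"val": "value", "dt": "datatype"}


def upgrade_object(obj):
    """Rename old-style keys of one object entry, keep all others."""
    return {RENAMES.get(key, key): value for key, value in obj.items()}


def upgrade_index(index):
    """Apply the renames to every object in a resource index."""
    return {
        subject: {
            predicate: [upgrade_object(obj) for obj in objects]
            for predicate, objects in properties.items()
        }
        for subject, properties in index.items()
    }


# Invented sample in the assumed old style
old_index = {
    "http://example.org/a": {
        "http://purl.org/dc/elements/1.1/date": [
            {"val": "2008-05-31", "type": "literal",
             "dt": "http://www.w3.org/2001/XMLSchema#date"}
        ]
    }
}
print(upgrade_index(old_index))
```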

The second major addition is a Remote Store component (documentation still to come) that is inspired by and based on Morten Frederiksen's great RemoteEndpointPlugin. The Remote Store works like Morten's Plugin, but supports the SPARQL+ LOAD, INSERT, and DELETE (i.e. write/POST) operations.

The third addition is also the reason why the Remote Store (which can be used as a SPARQL Endpoint Proxy) became a core component. I've worked on a draft for a SPARQL-based scripting language during the last months, and the latest ARC revision includes an early SPARQLScript parser and a SPARQLScript processor that can run a set of routines against remote SPARQL endpoints. What's still missing before this stuff becomes more usable (apart from documentation ;) is output templating and some other essential features such as loops. I do have an early prototype running in a local SPARQLBot version, but I probably won't have it online in time for tomorrow's Semantic Scripting Workshop (that I'll try to attend remotely at least). This is really powerful (and fun) stuff that will be available soon-ish. Can't wait to replace my hard-coded inferencer with a set of easily pluggable SPARQLScript procedures.

Other tweaks and changes include a very early hCalendar extractor and a couple of bug fixes that were reported by (among others) the SMOB project maintainers.

As usual, thanks to all who sent in patches, bug reports, feature requests, and stress-tested ARC. I think we're pretty close to a release candidate now :-)

"Online Social Graph Consolidation" webinale Slides

I gave another talk at webinale2008. This one was about how SemWeb technology (XFN, RDF, FOAF, SPARQL, Inference) can help with the aggregation, integration, and consolidation of online social graph fragments spread across Web 2.0 services. Again, I tried to keep things demo-ish (using grawiki for Linked Data editing, and knowee for the integration and consolidation), so the slides themselves (available on slideshare) aren't too spectacular (and they are in German).

"SemWeb Tech 'n' Use" webinale Slides

Update (2008-05-29): I've uploaded the SVG source file and a hi-res PNG of the SemWeb Menu Slide

Not that many attendees really, but talk went fine. Kept things simpler and more practical this year with a live mashup/hack of data from webinale, IPC, and DLW websites via *cough* regexp-injected microformats and RDFa, pulled out and integrated with ARC and SPARQL. Fun stuff, but most of the slides are a bit boring w/o the actual demos.

semweb menu

webinale 2008 starts today

Still a few hours left to finish my presentations, then I'll join Germany's WebDev crowd at the webinale 2008 in Karlsruhe (it's taking place at the same location as this year's ISWC). My talks are about "Semantic Web Tech 'n' Use" (mostly microformats, RDFa, SPARQL), and RDF-based "Online Social Graph Consolidation" (FOAF, XFN, SPARQLy inference, knowee etc.), and there will be more SemWeb-related talks.
A (personally) interesting thing about the webinale is its co-location with the International PHP Conference, and the (new) Dynamic Languages World Europe, and that registering for one conference includes free access to any of the others. It's the perfect audience to talk about practical SemWeb Scripting with ARC and PHP.

Geequipment

Needed new biz cards for the webinale next week, experimented a bit with the service's other offerings while at it, and ended up with a shirt, a new mouse pad, and geeky beer mats. Hmm, should we ask the Comm team for a SWIG CafePress shop...?

geequipment

Credits to Danny Ayers for the awesome "Get Your Data Out" tagline (and song!), and to Brian Manley for the CafePress suggestion.

RDFa button (unofficial)

Update/Note: This is not an official RDFa button; those (in the known colours) will be provided by W3C's Communications Team once RDFa is a Rec or CRec.

A couple of days ago I created an RDFa technology button, and I was asked to share it, so here it is:

RDFa
(PNG, GIF, SVG source file)

Please see the W3C Semantic Web Logos and Policies page for license details. This button is derived from the original W3C ones.

Adding (partial) RDFa support to the Firefox HTML Validator extension

Update (2008-04-24): I managed to get rid of the xmlns-related errors (.replace() to the rescue ;), so the extension now accepts markup that follows the latest RDFa DTD (including @typeof). And while at it, I created versions for win and mac.

One of the reasons I haven't been using RDFa in production is the problem of quality assurance (a.k.a. plain old HTML validation). That is not because RDFa isn't valid markup as such, but because the main tool I'm using during development is Marc Gueury's excellent HTML Validator Extension for Firefox. RDFa is valid XHTML+RDFa, but XHTML+RDFa is not HTML, so the extension reports dozens of errors, starting with the unrecognized Doctype declaration. The W3C Markup Validator supports RDFa, but I often develop while I'm offline, or on a non-public Web server, and the little "0 errors / 0 warnings" message in the status bar is more convenient than having to send markup to an online service.

Yesterday, however, I started working on an RDFa generator for one of Intellidimension's projects (Very interesting to see them use RDF big time, while many of us are still experimenting and thinking about potential markets, BTW). So, now that the RDFa-caused messages made it almost impossible to spot real HTML errors, I wondered if the add-on could perhaps be hacked to accept RDFa as well. Long story short: It can, to a certain extent. I don't know if arbitrary XML namespace prefixes (xmlns:foo="...") can be supported by a pure DTD/SGML-based validator (the FF extension uses openSP). FWIW, I couldn't get it to work.

Apart from that, RDFa-enabling the extension was mainly copying the RDFa DTD and a set of modules to the plug-in's SGML library. It now happily accepts RDFa attributes (about, resource, property, datatype, content, etc) and makes my life a little bit easier. If anyone has an idea how I could make it accept (non-predefined) namespace prefixes as well, I'd appreciate hints.

The tweaked extension is so far just a hack. I didn't even ping Marc yet or change the internal ID, so any extension update will remove the RDFa functionality. You can try/download it if you like (windows version), but I may have to take it offline should Marc not be happy about the re-distribution.

New ARC2 plugins

If there was a "most productive SemWeb coder" category in Danny's "This Week's Semantic Web", this week's title would probably go to Keith Alexander. Last week, he provided no fewer than three ARC2 Plugins.
While at it, he also implemented a SPARQL+ wrapper for Talis Platform stores.

I think I blogged about Morten's RemoteEndpoint plugin a while back (this one should really become part of the core codebase), but did I mention Peter Krantz' File System Synchronizer? It keeps an RDF Store in sync with a file system directory, which opens up a really nice way to implement larger RDF editing systems on top of ARC: By combining editing tools that work with small RDF files (quick response times and everything) with his plugin, it becomes possible to provide rich query functionality over the whole dataset without the store getting in the way of the publishing tools. RDF index rebuilding can be slow, so de-coupling read from write operations and introducing an asynchronous update process is a nice solution.

Awesome stuff.

RDFAuth, with less Story-telling

Update: Dan Brickley suggested (in a private mail to Henry and me) that "RDFAuth" is most probably not a very smart name anyway, as something that contains official/generic technologies (RDF and oAuth in this case) may send wrong signals and cause misunderstanding. And that we shouldn't waste time fighting. He suggests more specific names (BeatnikAuth/knoweeAuth) for the time being, as this is all still premature stuff, and because no one should claim to have created an "RDFAuth", especially not if it isn't backed by the whole community. Well, what can I say, he's of course right. I apologize and will s/RDFAuth/knoweeAuth/ from now on.

You may have read Henry Story's recent post about RDFAuth, an RDF-oriented mechanism to access (partly) protected web resources. He's not describing the RDFAuth protocol, though. I've tried to clarify things a couple of times on the semantic-web list, but somehow he seems to prefer to hijack the name instead, together with parts of the idea and claim it as his invention (it's not mine either, to make things clear). Now, innovation is always based on a combination of prior work and improvements, but his "following my strict architectural guidelines, I came across what I am just calling RDFAuth" preening goes a tiny bit too far to not trigger a comment.

What he describes (a PGP-based authentication protocol) is clearly interesting, but it's simply not what RDFAuth, an idea that was developed in the knowee project, is about. For knowee (which just released the alpha version, btw), we needed something that can be implemented on basic, shared web servers. PGP is simply not an option (if considered mandatory). People won't upload their private keys to 3rd party servers, and PGP libs are not necessarily available in those environments either.

Final clarifications:
  • RDFAuth may support PGP; it's just not a requirement.
  • I'm pretty sure that Henry's PGP-only approach will attract more SemWeb geeks than RDFAuth, but it wouldn't necessarily work for knowee's target audience.
  • The RDFAuth idea is in no way special or new. It more or less predates oAuth, but long-term-ish I'll most probably have to replace it with oAuth, once there is a way to generate tokens without the browser redirect dance (fully server-side token generation is another knowee requirement).
  • I read about a token-based, decentralized identification mechanism on a very early OpenID FAQ page that described a non-browser-dependent way to log into web sites. I can't find the link anymore, but this is basically what RDFAuth is based on. So, this is not my idea either.
  • The possibility of combining 200 OK response headers with WWW-Authenticate was suggested by Etan Wexler on the FOAF mailing list
  • Dan Brickley explored SPARQL-based group membership discovery a while back. I like this idea of de-coupling data exchange decisions from the identification/authorisation process very much (RDFers don't need things like sReg or Attribute Exchange).
  • The only thing that RDFAuth adds is light-weight, personal token services (as a replacement of OpenID's browser-based identification), and the re-use of straight HTTP BasicAuth, so that partly protected resources can more easily be discovered by both server-side and client-side tools (e.g. Tabulator), and also to allow widely deployed modules like mod_php to access the login token and client identifier using built-in functionality. And I doubt that layering a protocol on top of HTTP BasicAuth hasn't been done before, so, again, nothing special to brag about here.
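
To make the BasicAuth re-use from the last point a bit more concrete, here is a sketch in Python. It is not the actual protocol implementation: it assumes the client identifier travels as the Basic user-id and the login token as the password, and that the identifier is percent-encoded (Basic user-ids must not contain ":"). The function names and sample values are made up.

```python
import base64
from urllib.parse import quote, unquote


def build_basic_header(client_id, token):
    """Pack client identifier and login token into a Basic auth value."""
    # Assumption: the identifier is percent-encoded, so a URI with ":"
    # does not clash with the user-id/password separator.
    raw = f"{quote(client_id, safe='')}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_basic_header(header):
    """Return (client_id, token) from an Authorization header value."""
    scheme, _, payload = header.partition(" ")
    if scheme.lower() != "basic" or not payload:
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(payload).decode("utf-8")
    user, separator, token = decoded.partition(":")
    if not separator:
        raise ValueError("missing ':' between identifier and token")
    return unquote(user), token


header = build_basic_header("http://example.org/me#i", "example-token")
print(header)
print(parse_basic_header(header))
```

On the server side, mod_php exposes the two decoded parts as $_SERVER['PHP_AUTH_USER'] and $_SERVER['PHP_AUTH_PW'], so no extra parsing should be needed there.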
OK, enough geek whining ;), don't want to waste more time of my precious weekend.

Semantic Web Aliases

Update: Kingsley provides a number of Web references for most of the buzzwords below.

I just had an interesting Twitter dispute with Ian, triggered by his invention of another alias for the Semantic Web. For last year's webinale I created a slide with which I tried to "de-confuse" people a little bit; I guess I'll need several slides this year. This is mostly just for future reference, as not many of those are going to stick anyway:

A list of terms people use to name (a subset/aspect/whatever of) the "Semantic Web":
  • Semantic Web (by timbl)
  • SemWeb (by the developer community)
  • Web of Data (by timbl)
  • Data Web (by timbl)
  • The Web as a Database (by timbl)
  • Web of Knowledge (by stefandecker)
  • lowercase semantic [wW]eb (by tantek)
  • Knowledge Web (by ?)
  • Semantic Web 2.0 (by stefandecker)
  • Web 3.0 (by nova)
  • Semantic Graph (by nova)
  • Hyperdata (by danja)
  • Linked Data (by timbl)
  • Linked Data Web (by kidehen)
  • Structured Web (by the structured blogging community and mkbergman)
  • Semantic Data Web (by kidehen)
  • GGG - The Giant Global Graph (by timbl)
  • Web 3G (by iand)

See also: Interblag

And I fear there are more (even w/o considering "Pipe Dream", "Ivory Towers Inc." and similar ones). I like "Structured Web" and "Hyperdata" very much. But at the end of the day (yes, I know silly jargon as well), I think we'll just call it the [sS]emantic [wW]eb ;-)

SPARQLBot 101

While SPARQLBot was mostly a fun hack for last week's SemanticCamp, there is still a lot of activity on the #sparqlbot channel (it actually seems to increase). More than 30 SPARQL commands have been created. Michael Hausenblas now kindly created an introduction that gives a nice overview of the stuff that has been added to the command collection so far: SPARQLBot 101. Have fun, and thanks, Michael!