Posts tagged with: sparql
SPARQLBot 101
While SPARQLBot was mostly a fun hack for last week's SemanticCamp, there is still a lot of activity on the #sparqlbot channel (it actually seems to be increasing). More than 30 SPARQL commands have been created. Michael Hausenblas has now kindly created an introduction that gives a nice overview of the stuff that has been added to the command collection so far: SPARQLBot 101. Have fun, and thanks, Michael!
Posted on 2008-02-21 at 20:10 UTC
New ARC features: Triggers and MySQL extensions
The latest ARC revision got two new features: SPARQL Triggers and MySQL function extensions for SPARQL.
SPARQL Triggers
Triggers in ARC were suggested by Dan Brickley, who is experimenting with dynamically populated/updated Group definitions. What you can effectively do now in ARC is associate custom trigger classes with SPARQL query types. The triggers are then called automatically after each query of a registered type, for example to refresh inferred graphs:
$config = array(
  ...
  'store_triggers' => array(
    /* register LOAD triggers */
    'load' => array('updateFriendsList', 'crawlXFNLinks'),
  ),
);
$ep = ARC2::getStoreEndpoint($config);
$ep->go();
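The trigger mechanism described above can be pictured as a lookup from query type to a list of handlers that run once the query has finished. The following is only a rough JavaScript sketch of that idea, not ARC's PHP implementation: `createTriggerRunner`, the `handlers` map, and the handler bodies are all hypothetical names made up for illustration.

```javascript
// Hypothetical sketch (not ARC's code): run registered triggers after a query.
function createTriggerRunner(storeTriggers, handlers) {
  return function runTriggers(queryType, queryInfo) {
    // Trigger lists are keyed by lower-case query type, as in the config above.
    const names = storeTriggers[queryType.toLowerCase()] || [];
    // Call each registered trigger in order and collect the return values.
    return names.map(function (name) {
      const handler = handlers[name];
      if (typeof handler !== "function") {
        throw new Error("No handler registered for trigger: " + name);
      }
      return handler(queryInfo);
    });
  };
}

// Same trigger names as in the PHP config; the handler bodies are placeholders.
const storeTriggers = { load: ["updateFriendsList", "crawlXFNLinks"] };
const handlers = {
  updateFriendsList: function (info) {
    return "friends list refreshed after " + info.type;
  },
  crawlXFNLinks: function (info) {
    return "XFN links crawled after " + info.type;
  }
};

const runTriggers = createTriggerRunner(storeTriggers, handlers);
console.log(runTriggers("LOAD", { type: "load" }));
console.log(runTriggers("SELECT", { type: "select" })); // nothing registered
```

Query types without registered triggers simply produce an empty list, so the store would not need any special-casing for them.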
MySQL Extension Functions
Morten Frederiksen did it again. He sent in about 10 lines of code which he suggested adding to ARC's SQL rewriter. The effect? ARC suddenly has access to dozens of MySQL functions: CONCAT, CURDATE, MD5, UNIX_TIMESTAMP, and many more. A namespace for MySQL function URIs is now online, and queries look like this:
PREFIX mysql: <http://web-semantics.org/ns/mysql/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person WHERE {
  ?person foaf:givenname ?n1 ;
          foaf:family_name ?n2 .
  FILTER (mysql:concat(?n1, " ", ?n2) = "Alec Tronnick") .
}
I talked a little bit more about these things with Danny Ayers in a recent podcast.
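To illustrate how a rewriter could turn a function URI from the mysql namespace into a SQL call, here is a small JavaScript sketch. It is an assumption about the general approach, not Morten's actual patch: the whitelist only contains the four functions named above, and `sqlFunctionFor`, `rewriteCall`, and the `T1.val`/`T2.val` column names are invented for the example.

```javascript
const MYSQL_NS = "http://web-semantics.org/ns/mysql/";

// Assumed whitelist: only the functions explicitly named in the post.
const ALLOWED_FUNCTIONS = ["CONCAT", "CURDATE", "MD5", "UNIX_TIMESTAMP"];

// Map a function URI to a SQL function name, or null if it is not supported.
function sqlFunctionFor(functionUri) {
  if (functionUri.indexOf(MYSQL_NS) !== 0) {
    return null;
  }
  const name = functionUri.slice(MYSQL_NS.length).toUpperCase();
  return ALLOWED_FUNCTIONS.indexOf(name) !== -1 ? name : null;
}

// Build the SQL call from already-rewritten argument expressions.
function rewriteCall(functionUri, sqlArgs) {
  const name = sqlFunctionFor(functionUri);
  if (name === null) {
    throw new Error("Unsupported function: " + functionUri);
  }
  return name + "(" + sqlArgs.join(", ") + ")";
}

// Hypothetical column names standing in for the two name bindings.
console.log(rewriteCall(MYSQL_NS + "concat", ["T1.val", "' '", "T2.val"]));
```

Restricting the mapping to a known namespace and a whitelist keeps arbitrary URIs from ending up as SQL function names.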
Posted on 2008-02-04 at 20:10 UTC
Looking for paid (Semantic Web) Projects
Update 2: Yay, I think I'm safe for the next couple of months, should have blogged much earlier. Now I'm starting to think we could really use a job site for SemWeb people.
Update: Ah, the blogosphere. I already received some replies. One to share: Aduna is looking for a Java Engineer.
About a year ago, I received some funds which allowed me to re-write the ARC toolkit, and also to bring Trice (a semantic web application framework for PHP) to production-readiness. However, Semantic Web Development is generally still very new, especially in the Web Agency market where I'm coming from. It's not that easy yet to keep things self-sustaining.
May well be that I should blog less about bleeding-edge experiments, but rather about how RDF and SPARQL allow me to deploy extensible websites at a fraction of the time it used to take in the past. "Release Early", "Data First", "Evolve on the Fly", and all those patterns that SemWeb technology enables in a web development context.
Anyway, to keep things short: I'm actively (read: urgently ;-) looking for more paid projects. I'm a Web development all-rounder with particular interest in scripting languages and quite some experience in delivering RDF and frontend solutions (more details on my profile page). While it would of course be great to work on stuff where I can use my tools, I'm available for more general web development as well. I'm most productive when I can work from my office, but temporary travelling is basically fine, too. The Düsseldorf Airport is just minutes away.
Cheers in advance for suggestions,
bnowack (at) semsol.com
Benji
Posted on 2008-01-24 at 12:50 UTC
Grawiki - A Wiki (and aggregator) for graph-shaped data
In case you watched the "DriftR" screencast I created in December, there is now a live version online. (I dropped the initial name, my blog posts suddenly showed up in CrunchBase. ;-)
Grawiki is a SPARQL-based Data Wiki, a little bit inspired by freebase, less impressive, feature-rich, scalable and all that, but, well, OpenSource, SemWeb-enabled, and decentralized (each Grawiki installation can import selected graphs from other ones, back-POSTing is in the works). As it seems that I forgot to write-protect the instance mentioned above, you can play with it if you like. You'll most probably encounter bugs, the built-in inferencer is still at alpha stage, and editing of consolidated bnodes is quite tricky to implement. I'll tweak things in a day or two.
With Grawiki, I think I finally have (the start of) a tool that could work nicely for ad-hoc RDF editing and aggregation (it can import RDF and certain microformats). Oh, and a personal URI, and a FOAF file. At last ;-)
I'm now considering the addition of RDFa injections as a possible next step, the current editor uses a home-grown mechanism to activate the editing hooks and stuff, which was easier to implement and debug in my XHTML 1.0 development environment. Stay tuned, a download site probably won't be up before next week, gotta focus on urrrgent SWEO/knowee todos first...
Posted on 2008-01-22 at 20:50 UTC
ARC Remote Endpoint Plugin
OK, you're probably already wondering if Morten and I have a link exchange contract, but anyway: He just announced a plugin for ARC that provides "access to remote SPARQL endpoints as if they were local stores." Cool stuff :-)
Posted on 2008-01-16 at 14:50 UTC
SPARQL is a W3C Recommendation
I guess I already pushed out enough ARC spam today, so I'll keep things short: SPARQL is now a W3C Recommendation!
What I'm personally very happy about is the Implementation Survey which features two pure-PHP implementations*. This really opens the door for mainstream Web Developers to start exploring RDF and SPARQL on off-the-shelf hosted web servers. Everything I create these days (e.g. the ARC site, including the bots and archive generators there, or this blog) is powered by SPARQL. It's an amazing productivity booster as you never have to worry about complicated JOINs or evolving database schemas again. You can just code away and it's great fun to work with. Want more Testimonials? The Data Access Working Group collected quite a number of them from W3C member organizations.
* Don't let yourself be fooled by RAP's low report scores, their SPARQL engine is quite mature, they just didn't run the whole test suite.
Posted on 2008-01-15 at 19:45 UTC
RDF Tools - An RDF Store for WordPress
Together with Morten Frederiksen and Dan Brickley (who is revisiting his SparqlPress idea), I've created a WordPress extension (called "RDF Tools") that adds an (ARC-based) RDF Store and SPARQL Endpoint to the blogging system. The store is kept separate from the WP tables (i.e. it's not a wrapper), but you can use WP's nice admin screens to configure it (screenshot), and given the amount of developer-friendly hooks that WP offers, I'm curious what can be done now, possibly in combination with other extensions such as those Alexandre Passant is working on. It could perhaps also be handy as a deployment accelerator for knowee.
Posted on 2008-01-15 at 15:10 UTC
ARC Data Wiki Plugin
I'm blessed with a small but first-class community around ARC that helps me with bug reports, patches, encouraging feedback, and nifty ideas. One example of the latter was Morten Frederiksen's invention to allow ARC to be extended with third-party plugins. He even demonstrated the utility by enhancing the toolkit with a remote SPARQL endpoint for his named graph exchange work. ARC plugins are not bundled with the core codebase (which is meant to stay compact), but can easily be integrated in any ARC installation (Developer documentation is now online, too).
My own first plugin was triggered by Tim Berners-Lee's suggestion to write a lightweight request handler for an RDF-powered Data Wiki, as described in a recent Tech Report (PDF) and already implemented with Algae. I had to tweak the SPARQL+ spec and ARC's Query Parser to make it compatible with Eric Prud'hommeaux's SPARQL/Update flavor. This had the nice side-effect that all three SPARQL Write proposals (SPARUL, SPARQL/Update, SPARQL+) now (almost) share a common subset for basic INSERTs and DELETEs. After these updates, writing the plugin itself became almost trivial.
The code is still experimental and limited, but it's available for download, together with setup instructions. The Data Wiki plugin doesn't require a database (unlike the other SPARQL components in ARC) and supports update calls sent by RDF editors such as the Tabulator. I've set up a demo RDF wiki and will try to add remote update functionality to my own editor (to be renamed) now as well. Hmmm, would be cool to have a selection of generic tools to collaboratively read from and write to shared RDF spaces one day.
Posted on 2008-01-15 at 13:50 UTC
LOAD, INSERT, and DELETE in ARC2 via SPARQL+
The new ARC site is coming along quite nicely. Last week I implemented two (low-level) agents that log IRC conversations and mails to ARC-DEV. RDF and SPARQL make such things incredibly easy. Today, I started writing documentation for the preview release of ARC2, and one of the core changes from ARC1 is the removal of the API class for inserts and deletes in favour of an extended SPARQL, called SPARQL+, which enables aggregates, LOAD, INSERT, and DELETE without the need for major query engine code additions.
LOAD is compatible with the LOAD operation introduced in the SPARUL proposal:
LOAD <http://example.com/> INTO <http://example.com/archive>
INSERT and DELETE are different, though. They re-use the LOAD and CONSTRUCT handlers which simplified the implementation and will hopefully make it easier for people who just learned SPARQL's standard syntax. INSERT and DELETE in SPARQL+ each support two different forms, one for explicit triples (with simple wildcards in DELETE queries), and one for dynamically CONSTRUCTed ones, e.g.
DELETE {
  <#foo> <bar> "baz" .
  <#foo2> <bar2> ?any .
}
or
INSERT INTO <http://example.com/inferred> CONSTRUCT {
  ?s foaf:knows ?o .
}
WHERE {
  ?s xfn:contact ?o .
}
More examples and detailed information about how exactly SPARQL+ extends the SPARQL grammar are available in ARC2's SPARQL+ documentation section.
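The "simple wildcards" in the explicit DELETE form can be read as: a variable in a triple pattern matches any value in that position. Below is a minimal in-memory JavaScript sketch of that reading. It is an assumption about the semantics, not ARC's SQL-based implementation, and it treats each pattern independently (no joins between patterns); `isVariable`, `matchesPattern`, and `deleteTriples` are hypothetical helper names.

```javascript
// A term starting with "?" is treated as a wildcard (assumed semantics).
function isVariable(term) {
  return typeof term === "string" && term.charAt(0) === "?";
}

// A pattern matches a triple when every non-variable position is identical.
function matchesPattern(pattern, triple) {
  return ["s", "p", "o"].every(function (position) {
    return isVariable(pattern[position]) || pattern[position] === triple[position];
  });
}

// Return the triples that survive the DELETE, i.e. those matching no pattern.
function deleteTriples(triples, patterns) {
  return triples.filter(function (triple) {
    return !patterns.some(function (pattern) {
      return matchesPattern(pattern, triple);
    });
  });
}

// Made-up store contents mirroring the DELETE example above.
const store = [
  { s: "#foo", p: "bar", o: "baz" },
  { s: "#foo2", p: "bar2", o: "anything" },
  { s: "#foo2", p: "other", o: "kept" }
];

const remaining = deleteTriples(store, [
  { s: "#foo", p: "bar", o: "baz" },
  { s: "#foo2", p: "bar2", o: "?any" }
]);
console.log(remaining);
```

Only the triple with the `other` predicate survives here, because neither pattern covers it.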
Posted on 2007-11-26 at 17:50 UTC
ARC2 Progress
OK, I met this week's 2nd deadline and finished ARC2's SPARQL test suite report. Pass/Fail results as of today: 317/67 (Sept. 22nd: 352/84). That's a huge step forward compared to ARC1, so I'm quite happy.
Next actions: Making the knowee prototype public (deadline missed, boo!), and relaunching the ARC site, together with proper community tools and the new release.
Posted on 2007-09-19 at 16:55 UTC
Web Clipboard: Adding liveliness to "Live Clipboard" with eRDF, JSON, and SPARQL.
Some context: In 2004, Tim Berners-Lee mentioned a potential RDF Clipboard as a user model which allowed copying resource descriptions between applications. Depending on the type of the copied resource, the target app would trigger appropriate actions. (See also the ESW wiki and Danny's blog for related links and discussion.)
I had a go at an "RDF data cart" last year which allowed you to "1click"-shop resource descriptions while surfing a site. Before leaving, you could "check out" the collected resource descriptions. However, the functionality was limited to a single session, the resource pointers didn't use globally valid identifiers.
Then, a couple of months ago, Ray Ozzie announced Live Clipboard, which uses a neat trick to access the operating system's clipboard for Copy & Paste operations across web pages.
Last week, I finally found the time to combine the Live Clipboard trick with the stuff I'm currently working on: A Semantic Publishing Framework, Embeddable RDF, and SPARQL. If you haven't heard of the latter two: eRDF is a microformats-like way to embed RDF triples in HTML, SPARQL is the W3C's protocol and query language for RDF repositories.
What I came up with so far is a Web Clipboard that works similar to Live Clipboard (I'm actually thinking about making it fully compatible), with just a few differences:
- Web Clipboard uses a hidden single-line text input instead of a textarea which seemed to be a little bit easier to insert into the document structure, and it makes it work in Opera 8.5. The downside is that input fields don't allow multi-line content to be pasted (which is not needed by Web Clipboard, but will be necessary if I want to add Live Clipboard compatibility)
- Web Clipboard doesn't paste complete resource descriptions, but only pointers to those. This makes it possible to e.g. copy a resource from a simple list of person's names, and display full contact details after a paste operation. (See the demo for an example which does asynchronous calls to a SPARQL endpoint). This "pass by reference" enables things like distributed address books or calendars where changes at one place could be automatically updated in the other apps.
- Instead of XML, Web Clipboard uses a small JSON object which can simply be evaluated by JavaScript applications, or split with a basic regular expression. The pasted object contains 1) a resource identifier, and 2) an endpoint where information about the identified resource is available. The endpoint information consists of a URL and a list of specifications supported by the endpoint.
Complete documentation is going to be up at the clipboard site, but I'll first see if I can make things Live Clipboard-compatible (and I'll be travelling for the rest of the week). Here is a simple explanation how the current SPARQL demo works:
Apart from adding a small javascript library and a CSS file to the page, I specified the clipboard namespace and a default endpoint to be used for any resource pointer embedded in the page (this is eRDF syntax):
<link rel="schema.webclip" href="http://webclip.web-semantics.org/ns/webclip#" />
<link rel="webclip.endpoint" href="http://www.sparqlets.org/clipboard/sparql" />
Then I embedded a sparqlet that generates the list of Planet RDF bloggers (this is done server-side). The important thing is that the HTML contains eRDF hooks like this:
<div id="agent0" class="-webclip-Res">
  <span class="webclip-resID" title="_:bb1ed0e67fdb042619f2f20fdc479c3af_id2245787"></span>
  <span class="foaf-name">Bob DuCharme</span>
  <a rel="foaf-weblog" href="http://www.snee.com/bobdc.blog/">bobdc.blog by Bob DuCharme</a>
</div>
Ideally, the resource ID (webclip:resID, here again in eRDF notation) is a URI or some other stable identifier. The queried endpoint, however, obviously couldn't find a URI for the rendered resource, so it only provided a bnode ID. This is ok for the SPARQL endpoint the clipboard uses, though. The "foaf:weblog" information could be used to further disambiguate the resource identifier, the demo doesn't use it, however.
(The nice thing about eRDF-encoded hooks is that the information can be read by any HTTP- and eRDF-enabled client, the clipboard functionality could be implemented without having to load the page in a browser.)
Now, when the page is displayed, an onload-handler instantiates a JavaScript Web Clipboard which automatically adds an icon for each resource identified by the "webclip:Res/webclip:resID"-hooks.
When the icon is clicked, the resource pointer JSON object is created and can be copied to the system's clipboard. It currently looks like this (on a single line):
{
  resID : "_:bb1ed0e67fdb042619f2f20fdc479c3af_id2245787",
  endpoint: {
    url: "http://www.sparqlets.org/clipboard/sparql",
    specs: [
      "http://www.w3.org/TR/rdf-sparql-protocol/",
      "http://bob.pythonmac.org/archives/2005/12/05/remote-json-jsonp/"
    ]
  }
}
We can see that the clipboard uses the default endpoint mentioned at the document level as the embedded hook didn't specify a resource-specific endpoint. We can also see that the endpoint supports two specs, namely the SPARQL protocol and JSONP.
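A paste target could check the pointer before deciding what to do with it. The sketch below shows one way this might look; it is not the Web Clipboard library's code. It assumes the pasted text is strict JSON with quoted keys (the object shown above uses unquoted keys, which `JSON.parse` would reject), and `parsePointer` and `supportsSpec` are hypothetical helper names.

```javascript
const SPARQL_PROTOCOL = "http://www.w3.org/TR/rdf-sparql-protocol/";
const JSONP_SPEC = "http://bob.pythonmac.org/archives/2005/12/05/remote-json-jsonp/";

// Parse pasted text and make sure it has the shape of a resource pointer.
function parsePointer(text) {
  const pointer = JSON.parse(text);
  const endpoint = pointer.endpoint;
  if (typeof pointer.resID !== "string" || !endpoint || typeof endpoint.url !== "string") {
    throw new Error("Not a resource pointer");
  }
  return pointer;
}

// Check whether the pointer's endpoint lists a given specification URL.
function supportsSpec(pointer, specUrl) {
  const specs = pointer.endpoint.specs || [];
  return specs.indexOf(specUrl) !== -1;
}

// The pointer from the post, rewritten as strict JSON on a single line.
const pasted = '{"resID":"_:bb1ed0e67fdb042619f2f20fdc479c3af_id2245787",' +
  '"endpoint":{"url":"http://www.sparqlets.org/clipboard/sparql",' +
  '"specs":["http://www.w3.org/TR/rdf-sparql-protocol/",' +
  '"http://bob.pythonmac.org/archives/2005/12/05/remote-json-jsonp/"]}}';

const pointer = parsePointer(pasted);
console.log(supportsSpec(pointer, SPARQL_PROTOCOL) && supportsSpec(pointer, JSONP_SPEC));
```

A section that needs cross-domain access would only fire its query when both specs are listed, and could fall back to ignoring the paste otherwise.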
When this JSON object is pasted to another clipboard section, the onpaste-handler can decide what to do. In the demo, any paste section will make an asynchronous On-Demand-JavaScript call to the resource's SPARQL endpoint to retrieve a custom resource representation. The "Latest blog post" section uses a pre-defined callback, but this can be overwritten (as e.g. done by the "Resource Description" section which uses a custom function to display results).
I've added a playground area to the clipboard site where you can create your own clipboard sections. Give it a try, it's not too complicated. You can even bookmark them.
Here is an example JavaScript snippet that adds a clipboard section to a clipboard-enabled page with an 'id="resultCountSection"' HTML element:
window.clipboard.addSection({
  id : "resultCountSection",
  resIDVar : "myRes",
  query : "SELECT ?knowee WHERE "+
    "{"+
    " ?myRes <http://xmlns.com/foaf/0.1/knows> ?knowee . "+
    "}"+
    " LIMIT 50",
  callback : function(qr){
    var rows=(qr.results["bindings"]) ? qr.results.bindings : [];
    var result="The pasted resource seems to know "+
      rows.length+" persons.";
    /* update paste area */
    this.item.innerHTML=result;
    /* refresh clipboard */
    window.clipboard.activate();
  }
});
window.clipboard.activate();
Something like this is all that will be needed for the final clipboard. No microformats parsing or similar burdens (although you could use the Web Clipboard to process microformats). The Clipboard's definition of an endpoint is rather open, too. An RSS file could be considered an endpoint as well as any other Web-accessible document or API.
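For readers wondering what could happen between the paste and the callback, here is a hedged sketch of how the `resIDVar` placeholder might be bound to the pasted resource and turned into a JSONP request URL. This is a guess at the mechanics, not the clipboard library's actual code: `toSparqlTerm`, `bindResource`, `buildRequestUrl`, and the `jsonp` parameter name are assumptions (only the `query` parameter comes from the SPARQL protocol), and passing a bnode ID through only works if the endpoint accepts it, as noted earlier.

```javascript
// Blank node IDs start with "_:"; anything else is treated as an IRI.
function toSparqlTerm(resID) {
  return resID.indexOf("_:") === 0 ? resID : "<" + resID + ">";
}

// Replace every occurrence of ?<resIDVar> in the query with the pasted resource.
// Assumes resIDVar contains only letters, digits, or underscores.
function bindResource(query, resIDVar, resID) {
  const variable = new RegExp("\\?" + resIDVar + "\\b", "g");
  const term = toSparqlTerm(resID);
  return query.replace(variable, function () {
    return term;
  });
}

// Build a GET URL for an on-demand JavaScript call ("jsonp" is a made-up name).
function buildRequestUrl(endpointUrl, query, callbackName) {
  return endpointUrl +
    "?query=" + encodeURIComponent(query) +
    "&jsonp=" + encodeURIComponent(callbackName);
}

const template = "SELECT ?knowee WHERE { ?myRes <http://xmlns.com/foaf/0.1/knows> ?knowee . } LIMIT 50";
const boundQuery = bindResource(template, "myRes", "http://example.com/people#me");
console.log(boundQuery);
console.log(buildRequestUrl("http://www.sparqlets.org/clipboard/sparql", boundQuery, "handleResult"));
```

The resulting URL would then be used as the `src` of a dynamically created script element, which is what makes the cross-domain call possible.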
Posted on 2006-06-12 at 15:55 UTC
ARC RDF Store for PHP - enSPARQL your LAMP
A first version of ARC RDF Store is now available. It's written entirely in PHP and has been optimized for basic LAMP environments where install and system configuration privileges are often not available. As with the other ARC components, I tried to keep things modular for easier integration in other PHP/MySQL-based systems.
A full store installation consists of just 7 files (à 10-50 KB each) but offers a variety of nice features:
- It works with PHP 4 and MySQL 4
  It's also possible to use a single installation/DB to run multiple independent stores.
- SPARQL queries are (almost) completely translated to SQL
  This not only avoids having to use interpreted PHP to process sub-results but also allows the RDBMS to optimize queries. (Note, however, that queries which can't be translated to SQL are not supported.)
- Cartesian Catastrophe protection ;-)
  Triple duplicates from different graphs can be moved to a dedicated table which is then only considered by GRAPH queries. GRAPH-independent queries ignore the duplicates, so the combinatorial explosion is less likely to happen.
- Application-specific table space customization
  PHP comes with certain performance limitations. And using shared, hosted Web servers often means that MySQL is running with standard settings. In order to still allow building advanced applications with ARC RDF Store, it provides options to split the triple tables. By default, triples with literal objects are separated from those with non-literal object values. On top of this, it's possible to specify so-called prop-tables to further split the table space used to store triples. For a calendaring app it might be useful to separate date/dateTime properties from the other triples, a social networking site could define a prop-table for foaf:knows and related properties, foaf:depicts in case of a codepiction demo, rdfs:subClassOf for an ontology editor, etc. Splitting the triple space improves both query and insert speed.
- Reversible resource consolidation
  The "Store keeper" class provides a smushing function for both functional and inverse functional properties. Pre-consolidation IDs are stored for each triple, enabling the un-smushing of merged resources to a certain extent (only one previous ID value can be restored for resources that have been smushed several times using different identifiers).
- JSON results
  JSON results are available for all the main CRUD store methods (add_data, query, update_data, delete_data).
- Multiple options for inserting and deleting data
  The store can add RDF/XML from the Web (including 3xx handling), RDF/XML passed as parameter, or single triples encoded in a turtle subset. Data removal is possible by providing a graph IRI, a concrete RDF/XML document, or a triple pattern including wildcards.
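The table-splitting idea from the list above can be sketched as a small routing function: a triple goes to a prop-table if its predicate is registered for one, and otherwise to a literal or non-literal default table. This JavaScript snippet only illustrates the principle and is not the store's PHP code; the table names, the `propTables` structure, and the `oType` field are hypothetical.

```javascript
const FOAF = "http://xmlns.com/foaf/0.1/";

// Hypothetical prop-table configuration: table suffix -> predicates it holds.
const propTables = {
  knows: [FOAF + "knows"],
  depicts: [FOAF + "depicts", FOAF + "depiction"]
};

// Decide which table a triple is stored in (all table names are made up).
function tableFor(triple, tables) {
  for (const name of Object.keys(tables)) {
    if (tables[name].indexOf(triple.p) !== -1) {
      return "triple_" + name;
    }
  }
  // Default split: literal objects vs. everything else.
  return triple.oType === "literal" ? "triple_literal" : "triple_resource";
}

console.log(tableFor({ p: FOAF + "knows", oType: "uri" }, propTables));
console.log(tableFor({ p: FOAF + "name", oType: "literal" }, propTables));
console.log(tableFor({ p: FOAF + "weblog", oType: "uri" }, propTables));
```

A query rewriter can apply the same function to the predicates in a query, so that a pattern with a known predicate only has to touch one (smaller) table.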
I've put up an ugly little demo service where you can run test queries against w3photo/CONFOTO data (~ 20K triples, prop-table for rdf:type). Might be fun to play with the JSONI generator, or to check out the SQL created by the rewriter.
Next action: A JavaScript CRUD frontend. Well, and bug-fixing...
Posted on 2006-02-20 at 17:45 UTC
JSONC, JSONI, JSONP: Creating tailored JSON from SPARQL query results
Update (2006-02-05): Elias Torres sent me a pointer to a draft of a related tech note he is working on with other DAWG members. I've adjusted my serialiser, so that its output is closer to their proposal now. After testing the different serialisation options, I've also updated the JSONI format. The examples below show the changes already.
My current sparqlet (SPARQL-driven portlet) implementations mostly use queries generated server-side, the results are returned as application-specific JavaScript. While this approach allows certain bandwidth and convenience optimisations, it always needs custom code on the server.
For generic operations, SPARQL endpoints offer an XML format which can be consumed by in-browser applications via XHR techniques. However, the SPARQL result XML format makes things quite bloated, and what I learned from Web 2.0 coders is that JSON results are often preferred.
It's rather straightforward to generate JSON code equivalent to the XML structure. I know that several people are working on this, but I couldn't find any public versions. I hope my ARC stuff isn't too different, but I can easily tweak it later. Here is a sample of a default SPARQL JSON result returned from an ARC server:
SELECT DISTINCT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 20
{
head: {
vars: ["s", "p", "o"]
},
results: {
distinct: true,
ordered: false,
compact: false,
indexed: false,
bindings: [
{
s: {
type: "bnode",
value: "b2490b2520bf2872200093194ff36f465_id2245308"
},
p: {
type: "uri",
value: "http://xmlns.com/foaf/0.1/weblog"
},
o: {
type: "uri",
value: "http://www.nzlinux.org.nz/blogs/"
}
},
{
s: {
type: "uri",
value: "http://www.nzlinux.org.nz/blogs/"
},
p: {
type: "uri",
value: "http://www.w3.org/2000/01/rdf-schema#seeAlso"
},
o: {
type: "uri",
value: "http://www.nzlinux.org.nz/blogs/wp-rdf.php?cat=9"
}
},
...
]
}
}
(I'm using associative arrays for the bindings in order to reduce bandwidth a bit. I didn't put too much work into this default serialisation; it's probably going to change when there is a recommended format available.)
However, like the XML result, this JSON alternative is not the most efficient when a consuming app doesn't need the typing info of the individual bindings (uri/bnode/literal). I played around with some pre-defined "compact" JSON formats, but looking at the queries I'm using in my stuff, there are often cases where I want the typing info for one or two of the bindings, but not for the rest. The solution I implemented for the ARC RDF Store looks like this: the user can specify an optional jsonc argument which defines whether a binding should be serialised entirely or whether it can be flattened:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p1 ?p1_name ?p2_name ?p2_mbox_sha1 WHERE { ?p1 foaf:name ?p1_name ; foaf:knows ?p2 . ?p2 foaf:name ?p2_name ; foaf:mbox_sha1sum ?p2_mbox_sha1 . } ORDER BY ?p1_name LIMIT 30
jsonc="p1(),p1_name,p2_name,p2_mbox_sha1";
{
head: {
variables: ["p1", "p1_name", "p2_name", "p2_mbox_sha1"]
},
results: {
distinct: true,
ordered: true,
compact: true,
indexed: false,
bindings: [
{
p1: {
type: "bnode",
value: "b2e9ddd5ebb264646b852dcd207e13d8a_bn1"
},
p1_name: "Jim Ley",
p2_name: "Jeremiah McElroy",
p2_mbox_sha1: "f0d988b33153f21479cffa647cbe6faac65a98f8"
},
{
p1: {
type: "bnode",
value: "b2e9ddd5ebb264646b852dcd207e13d8a_bn1"
},
p1_name: "Jim Ley",
p2_name: "Mart Sanderson",
p2_mbox_sha1: "ce3165ecf98cdb6d8153503949b320e24a6138a0"
},
...
]
}
}
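As a rough illustration of the flattening that a jsonc spec like the one above asks for, here is a minimal Python sketch. It is not ARC's PHP code; the function names parse_jsonc and compact_bindings are made up for this example, and it only assumes that a trailing "()" means "keep the full binding" and that variables missing from the spec are dropped.

```python
def parse_jsonc(spec):
    """Parse a spec such as 'p1(),p1_name' into {var: keep_full_binding}."""
    keep_full = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if item.endswith("()"):
            keep_full[item[:-2]] = True   # keep type + value
        else:
            keep_full[item] = False       # flatten to the plain value
    return keep_full


def compact_bindings(bindings, spec):
    """Apply a jsonc-style spec to a list of fully typed bindings."""
    keep_full = parse_jsonc(spec)
    rows = []
    for row in bindings:
        compact = {}
        for var, full in keep_full.items():
            if var not in row:
                continue  # unbound variable, e.g. from an OPTIONAL
            compact[var] = row[var] if full else row[var]["value"]
        rows.append(compact)
    return rows


full = [{
    "p1": {"type": "bnode", "value": "b2e9ddd5ebb264646b852dcd207e13d8a_bn1"},
    "p1_name": {"type": "literal", "value": "Jim Ley"},
    "p2_name": {"type": "literal", "value": "Jeremiah McElroy"},
}]
# p1 keeps its typing info, p1_name is flattened, p2_name is dropped.
compact = compact_bindings(full, "p1(),p1_name")
```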
Appending parentheses to a result variable activates the complete serialisation; the other vars will be flattened. The jsonc parameter can also be used to remove selected result variables from the returned JSON. This may be helpful in cases where they were needed to retrieve the SPARQL result set (e.g. in combination with DISTINCT), but aren't actually used in the client app.
JSONC can help reduce bandwidth and browser memory consumption, but it doesn't add much to the front-end developer's convenience. The RDF model is graph-based and resource-oriented, but SPARQL results are tabular, usually with a lot of repeated values. A developer therefore has to process the result before resource-centric views can be displayed. What's missing (if we want to avoid custom server-side code or heavy pre-processing on the client) is a way to tell the SPARQL endpoint to arrange and index the tabular results before they are serialised as JSON: JSONI. The jsoni parameter works similarly to the jsonc one, but it allows nesting of result vars to specify index structures:
jsoni="p1_name(p2_name)";
{
head: {
variables: ["p1_name"]
},
results: {
distinct: true,
ordered: true,
compact: false,
indexed: true,
index: {
p1_name: [
{
value: "Jim Ley",
type: "literal",
index: {
knows: [
"Jeremiah McElroy",
"Mart Sanderson"
]
}
},
{
value: "Leandro Mariano López",
type: "literal",
index: {
knows: [
"Jim Ley",
"Dan Brickley",
"Maria de las Mercedes Reina",
"Charles McCathieNeville",
"Eva Méndez",
"Morten Frederiksen",
"Daniel Krech",
"Maximiliano Cittadini"
]
}
},
...
]
}
}
}
ARC supports several JSONI fine-tuning options:
- Custom index keys: "p1_name AS person(p2_name AS knows)"
- Combination with JSONC: "p1_name(p2_name())" (serialise p2_name as full array as well)
- Multiple indexes in one query: "p1_name(p2_mbox_sha1),p2_mbox_sha1(p2_name)"
- Nesting: "s(p(o))"
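To make the indexing idea more concrete, here is a small Python sketch of what a one-level index like "p1_name(p2_name AS knows)" does to a flat result table. This is only an illustration of the grouping step, not ARC's implementation; index_rows and its parameters are hypothetical names, and parsing of the jsoni string itself is left out.

```python
def index_rows(rows, outer, inner, inner_key=None):
    """Group flat result rows by `outer`, collecting distinct `inner` values.

    Returns a structure shaped like the "index" part of the sample above:
    {outer: [{"value": ..., "index": {inner_key: [...]}}, ...]}
    """
    inner_key = inner_key or inner
    entries = {}  # plain dicts keep insertion order (Python 3.7+)
    for row in rows:
        entry = entries.setdefault(
            row[outer], {"value": row[outer], "index": {inner_key: []}}
        )
        values = entry["index"][inner_key]
        if row[inner] not in values:  # skip repeated values
            values.append(row[inner])
    return {outer: list(entries.values())}


rows = [
    {"p1_name": "Jim Ley", "p2_name": "Jeremiah McElroy"},
    {"p1_name": "Jim Ley", "p2_name": "Mart Sanderson"},
    {"p1_name": "Leandro Mariano López", "p2_name": "Jim Ley"},
]
index = index_rows(rows, "p1_name", "p2_name", inner_key="knows")
```

A client can then loop over index["p1_name"] and render one block per person without regrouping the rows itself.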
JSONI brings SPARQL result processing closer to the way Web 2.0 APIs work.
Finally, in order to facilitate inclusion of the different JSON results in browser-based applications, I added support for JSONP (JSON with Padding), which allows the definition of a string which is prepended to the returned JSON result. Combined with on-demand JavaScript, this enables accessing SPARQL endpoints from different domains without having to force users to adjust browser settings.
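For readers who haven't seen JSONP before, the padding step can be sketched in a few lines of Python. This assumes the common convention of wrapping the JSON in a call to a client-supplied callback; whether ARC adds the parentheses itself or expects them in the prepended string is not specified here, and to_jsonp and the callback check are purely illustrative.

```python
import json
import re

# Only allow dotted JavaScript identifiers, so the padding cannot smuggle
# arbitrary script into the response.
_SAFE_CALLBACK = re.compile(
    r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*", re.ASCII
)


def to_jsonp(result, callback):
    """Serialise `result` as JSON and wrap it in a call to `callback`."""
    if not _SAFE_CALLBACK.fullmatch(callback):
        raise ValueError("unsafe JSONP callback name")
    return "%s(%s);" % (callback, json.dumps(result))


padded = to_jsonp({"head": {"vars": ["s"]}}, "handleResult")
# padded == 'handleResult({"head": {"vars": ["s"]}});'
```

A page on another domain would load the endpoint URL through a dynamically created script element and define handleResult to receive the data.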
I'm still testing things locally, but with a bit of luck the store and API will eventually be online next week.
Posted on 2006-02-03 at 18:00 UTC
by
(trackback)
ARC SPARQL2SQL Rewriter for PHP v0.2.0
OK, I could easily spend another month on this beast, but it should be good enough for my current projects and I really have to continue with those now. The v0.2.0 rewriter converts a structure created by the ARC SPARQL Parser to SQL code. This allows pushing SPARQL query processing to a mySQL database engine, thus working around the PHP performance bottleneck on hosted Web servers (well, not only there ;). It supports only a subset of the specification, but I tried to cover the most common cases. The rewriter doesn't make much sense as a stand-alone component (unless you are an RDF infrastructure developer), but I'll keep its revisions separate from the upcoming ARC RDF Store.
Unfortunately, the W3C test cases are provided in N3 only, but I managed to at least scrape the examples from the working draft. As you may be able to see in that document, the rewriter cannot yet handle multiple/nested UNIONs, combined expressions, some of the built-ins (e.g. lang, langMatches), custom functions, and several other features, but it can convert triple patterns, OPTIONALs (simple, grouped, or nested), simple UNIONs, simple REGEXes (translated to LIKEs where possible), GRAPH queries, and dataset restrictions (although I'm still not 100% sure whether I completely understood the FROM NAMED stuff). I also included the optimisation stuff I wrote about last week: a list of property alternatives can be provided which will then be rewritten to embedded ORs instead of using UNIONs. And the rewriter is able to create SQL for a split-up triple table space (how to split is customisable; I'm going to write more about this when the store is released).
Posted on 2006-01-26 at 12:30 UTC
by
(trackback)
ARC Store design fine-tuning
I still haven't released ARC Store as I'm continually discovering optimisation possibilities while working on the SPARQL2SQL rewriter. The latter is coming along quite well; I just ticked off simple UNIONs, numeric comparisons, grouped OPTIONALs, and nested (yay!) OPTIONALs on my to-do list. I'm currently struggling a little bit with GRAPH queries combined with datasets (restricted via FROM / FROM NAMED), but that's due more to my spec-reading difficulties than to mySQL's interpretation of SQL. I'm going to implement a subset of the SPARQL built-ins (e.g. REGEX), but after that the store should finally be usable enough for a first public release.
However, I'm not that used to relational algebra and there are lots of mySQL-specific options, so I frequently used the manual to find out how to e.g. construct SQL UNIONs and LEFT JOINs, or how to make things easier for the query optimizer. I already wrote about RDF store design considerations last month, but it looks like there's more room for optimisation:
Shorter index keys
I'm still using CHAR columns for the hashes, but instead of using the hex-based md5 of an RDF term, I'm now converting the md5 to a shorter string (without losing information). A CHAR column uses a full byte for each character, but the characters in an md5 string are all from [0-9a-f] (i.e. a rather small 16-character set). Taking the md5 hash as a base-16 number, I can easily shorten it by using a larger character set. As I said before, PHP can't handle large integers, so I split the md5 string into three chunks, converted each part to an integer, and then re-encoded the result with a different, larger set of characters. I first used
'0123456789 abcdefghijklmnopqrstuvwxyz!?()+,-.@;=[]_{}'
(54 characters) which reduced the overall column size to 23 (-28%). Then I found out that BINARY table columns do case-sensitive matching and may even be faster, so I could change the set to
'0123456789 abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!?()+,-.@;=[]_{}'
(79 chars).
The column size is now 21 (66% of the initial md5). Taking only a sub-portion of the md5 hash (as e.g. done by 3store) could improve things further. This may all sound a little bit desperate (that's at least what mySQL folks said), but as the ARC Store is probably going to be the only SPARQL engine optimised for basic shared web hosting environments, I assume it's worth the extra effort. Note that overall storage space is not (yet) my main concern; it's the size of the indexes used for join operations.
OR instead of UNION
SPARQL UNIONs can't always be translated to SQL ORs (at least I couldn't figure out how), so using SQL's UNION construct is the better way to be compliant. However, for most practical use cases for UNIONs (alternative predicates), a simple
WHERE (p='rdfs:label' OR p='foaf:name' OR ...)
is much faster than a union.
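As a sketch of what such an API-level rewrite might produce, the following Python helper builds the OR group from a list of alternative predicates. The function name, the column name t.p, and the use of %s placeholders (as accepted by common MySQL drivers) are assumptions for illustration; the column name must come from trusted code, only the values are parameterised.

```python
def predicate_filter(column, alternatives):
    """Build '(col = %s OR col = %s ...)' plus the matching parameter list."""
    if not alternatives:
        raise ValueError("at least one alternative is required")
    condition = " OR ".join("%s = %%s" % column for _ in alternatives)
    return "(" + condition + ")", list(alternatives)


sql, params = predicate_filter("t.p", ["rdfs:label", "foaf:name"])
print(sql)     # (t.p = %s OR t.p = %s)
print(params)  # ['rdfs:label', 'foaf:name']
```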
I don't know how to efficiently automate the detection of when to rewrite to ORs, so I'll probably have to make that API-only.
Splitting up the table space
I think TAP and Jena offer ways to separate selected statements from the main triple table, thus accelerating certain joins and queries (and saving storage space). I also read about this strategy in a more recent blog post by Chimezie Ogbuji, who describes an approach with a dedicated rdf:type table.
The problem with a generic solution is a) how to decide how to split up the triples, and b) how to efficiently run queries over the whole set of split tables (e.g. for <foo> ?p ?o patterns).
re a): A table for rdf:type is a reasonable idea. In the datasets I worked with so far, 25% of the triples were rdf:type statements, with another 10% used by dc:date and foaf:name, but the numbers of FOAF and DC terms are clearly application-specific. In order to speed up joins, it might also be useful to completely separate object2object relations from those relating resources to literals (e.g. in CONFOTO, the latter consume over 40%).
re b): From the little experience I gained so far, I don't expect UNIONs or JOIN/OR combinations to be sufficiently fast. But mySQL has a MERGE storage engine which is "a collection of identical MyISAM tables that can be used as one". This allows efficient queries on selected tables (e.g. for joins, or rdf:type restrictions) and ok-ish performance when the whole set of tables has to be included in a query.
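To give an idea of what the MERGE setup looks like, here is a Python helper that generates the DDL for a merge table over a set of split tables. The table and column names are invented for the example and are not ARC's schema; the underlying tables have to be MyISAM tables with identical column definitions.

```python
def merge_table_ddl(name, parts, columns):
    """Return a CREATE TABLE statement for a MERGE table over `parts`."""
    if not parts:
        raise ValueError("a MERGE table needs at least one underlying table")
    return "CREATE TABLE %s (%s) ENGINE=MERGE UNION=(%s) INSERT_METHOD=LAST" % (
        name, ", ".join(columns), ", ".join(parts)
    )


# Hypothetical layout: one table for rdf:type, one for literals, one for the rest.
columns = ["s CHAR(21) BINARY NOT NULL",
           "p CHAR(21) BINARY NOT NULL",
           "o CHAR(21) BINARY NOT NULL"]
ddl = merge_table_ddl("triple_all",
                      ["triple_type", "triple_literal", "triple_other"],
                      columns)
print(ddl)
```

Queries with a known predicate can then go straight to the matching split table, while patterns like <foo> ?p ?o query the merge table.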
I'm still experimenting; it may well be that I only go for the first optimisation in ARC Store v0.1.0, but the other ones are surely worth further consideration.
Posted on 2006-01-17 at 14:00 UTC
by
(trackback)

