SPARQLScript is already heavily used by the Pimp-My-API tool and the TwitterBot, but yesterday I added a couple of new features and finally had a go at implementing a (forward-chaining) rule evaluator (for the reasons mentioned some time ago).
A first version ("LOD Linker") is installed on Semantic CB, with an initial set of 9 rules (feel free to leave a comment here if you need additional mappings). With SPARQLScript being a superset of SPARQL+, most inference scripts are not much more than a single INSERT + CONSTRUCT query (you can click the form's "Show inference scripts" button to see the source code):
    $ins_count = INSERT INTO <${target_g}> CONSTRUCT {
      ?res a foaf:Organization .
    } WHERE {
      { ?res a cb:Company }
      UNION { ?res a cb:FinancialOrganization }
      UNION { ?res a cb:ServiceProvider }
      # prevent dupes
      OPTIONAL { GRAPH ?g { ?res a foaf:Organization } }
      FILTER(!bound(?g))
    } LIMIT 2000

But with the latest SPARQLScript processor (ARC release 2008-09-12), you can run more sophisticated scripts, such as the one below, which infers DBPedia links from wikipedia URLs:
    $rows = SELECT ?res ?link WHERE {
      { ?res cb:web_presence ?link . }
      UNION { ?res cb:external_link ?link . }
      FILTER(REGEX(?link, "wikipedia.org/wiki"))
      # prevent dupes
      OPTIONAL { GRAPH ?g { ?res owl:sameAs ?v2 } . }
      FILTER(!bound(?g))
    } LIMIT 500
    $triples = "";
    FOR ($row in $rows) {
      # extract the wikipedia identifier
      $id = ${row.link.replace("/^.*\/([^\/\#]+)(\#.*)?$/", "\1")};
      # construct a dbpedia URI
      $res2 = "http://dbpedia.org/resource/${id}";
      # append to triples buffer
      $triples = "${triples} <${row.res}> owl:sameAs <${res2}> . ";
    }
    # insert
    if ($triples) {
      $ins_count = INSERT INTO <${target_g}> { ${triples} }
    }

(I'm using a similar script to generate foaf:name triples by concatenating cb:first_name and cb:last_name.)
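Here's a sketch of how such a name-concatenation script could look. It simply follows the pattern of the DBPedia linker above; the duplicate check, the LIMIT, and the escaped quotes around the literal are my assumptions, not necessarily the deployed script:

    # sketch only: same structure as the DBPedia linker above
    $rows = SELECT ?res ?first ?last WHERE {
      ?res cb:first_name ?first .
      ?res cb:last_name ?last .
      # prevent dupes
      OPTIONAL { GRAPH ?g { ?res foaf:name ?name } }
      FILTER(!bound(?g))
    } LIMIT 500
    $triples = "";
    FOR ($row in $rows) {
      # append a foaf:name literal to the triples buffer
      # (assumes \" escapes a quote within SPARQLScript strings)
      $triples = "${triples} <${row.res}> foaf:name \"${row.first} ${row.last}\" . ";
    }
    if ($triples) {
      $ins_count = INSERT INTO <${target_g}> { ${triples} }
    }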
Inferred triples are added to a graph directly associated with the script. Apart from a destructive rule that removes all email addresses, the reasoning can easily be undone by running a single DELETE query against the inferred graph.
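With the target-graph placeholder from the scripts above, that undo step would look something like this in SPARQL+ (a sketch, assuming the whole graph can be dropped in a single statement):

    # drop every triple in the rule's inferred graph
    DELETE FROM <${target_g}>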
I'm quite happy with the functionality so far. What's still missing is a way to rewrite bnodes; I don't think that's possible yet. But INSERT + CONSTRUCT leaves bnode IDs unchanged, so the inference scripts don't necessarily require URI-denoted resources.
Another cool aspect of SPARQLScript-based inferencing is the possibility to use a federated set of endpoints, each processing only part of a rule. The initial DBPedia mapper above, for example, uses locally available wikipedia links, but CrunchBase only provides very few of those. So I created a second script which retrieves DBPedia identifiers for local company homepages, using a combination of local queries and remote ones against the DBPedia SPARQL endpoint (in small iterations, and only for companies with at least one employee, but it works).
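To illustrate the idea, here is a rough sketch of how such a two-endpoint script could be structured. The ENDPOINT directive routes subsequent queries; the cb: predicates, the batch size, the foaf:homepage match against DBPedia, and the $local_endpoint placeholder are all my guesses, not the actual script:

    # local query: companies with a homepage, at least one known
    # relationship (assumed predicate), and no owl:sameAs link yet
    $rows = SELECT DISTINCT ?res ?hp WHERE {
      ?res cb:web_presence ?hp .
      ?res cb:relationship ?rel .
      OPTIONAL { GRAPH ?g { ?res owl:sameAs ?v2 } }
      FILTER(!bound(?g))
    } LIMIT 25
    $triples = "";
    FOR ($row in $rows) {
      # remote query: ask DBPedia for a resource with that homepage
      ENDPOINT <http://dbpedia.org/sparql>
      $rows2 = SELECT ?res2 WHERE { ?res2 foaf:homepage <${row.hp}> } LIMIT 1
      FOR ($row2 in $rows2) {
        $triples = "${triples} <${row.res}> owl:sameAs <${row2.res2}> . ";
      }
    }
    # switch back to the local store before writing
    # ($local_endpoint is a placeholder for the local endpoint URL)
    ENDPOINT <${local_endpoint}>
    if ($triples) {
      $ins_count = INSERT INTO <${target_g}> { ${triples} }
    }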