SPARQLScript is already heavily used by the Pimp-My-API tool and the TwitterBot, but yesterday I added a couple of new features and finally had a go at implementing a (forward-chaining) rule evaluator (for the reasons mentioned some time ago).
A first version ("LOD Linker") is installed on Semantic CB, with an initial set of 9 rules (feel free to leave a comment here if you need additional mappings). As SPARQLScript is a superset of SPARQL+, most inference scripts are not much more than a single INSERT + CONSTRUCT query (you can click on the form's "Show inference scripts" button to see the source code):
  $ins_count = INSERT INTO <${target_g}> CONSTRUCT {
      ?res a foaf:Organization .
    }
    WHERE {
      { ?res a cb:Company }
      UNION { ?res a cb:FinancialOrganization }
      UNION { ?res a cb:ServiceProvider }
      # prevent dupes
      OPTIONAL { GRAPH ?g { ?res a foaf:Organization } }
      FILTER(!bound(?g))
    }
    LIMIT 2000

But with the latest SPARQLScript processor (ARC release 2008-09-12) you can run more sophisticated scripts, such as the one below, which infers DBPedia links from wikipedia URLs:
  $rows = SELECT ?res ?link WHERE {
      { ?res cb:web_presence ?link . }
      UNION { ?res cb:external_link ?link . }
      FILTER(REGEX(?link, "wikipedia.org/wiki"))
      # prevent dupes
      OPTIONAL { GRAPH ?g { ?res owl:sameAs ?v2 } . }
      FILTER(!bound(?g))
    }
    LIMIT 500

  $triples = "";
  FOR ($row in $rows) {
    # extract the wikipedia identifier
    $id = ${row.link.replace("/^.*\/([^\/\#]+)(\#.*)?$/", "\1")};
    # construct a dbpedia URI
    $res2 = "http://dbpedia.org/resource/${id}";
    # append to triples buffer
    $triples = "${triples} <${row.res}> owl:sameAs <${res2}> . ";
  }
  # insert
  if ($triples) {
    $ins_count = INSERT INTO <${target_g}> { ${triples} }
  }

(I'm using a similar script to generate foaf:name triples by concatenating cb:first_name and cb:last_name; a rough sketch of that pattern follows below.)

Inferred triples are added to a graph directly associated with the script. Apart from a destructive rule that removes all email addresses, the reasoning can easily be undone by running a single DELETE query against the inferred graph.
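For illustration, here is a minimal sketch of such a name rule, following the buffer pattern from the script above; the dupe check and LIMIT are carried over from the examples here rather than copied from the deployed script, and it assumes names without quote characters:

  $rows = SELECT ?res ?first ?last WHERE {
      ?res cb:first_name ?first .
      ?res cb:last_name ?last .
      # prevent dupes
      OPTIONAL { GRAPH ?g { ?res foaf:name ?v2 } }
      FILTER(!bound(?g))
    }
    LIMIT 500

  $triples = "";
  FOR ($row in $rows) {
    # concatenate first and last name (a single-quoted SPARQL literal
    # avoids escaping the surrounding double quotes)
    $triples = "${triples} <${row.res}> foaf:name '${row.first} ${row.last}' . ";
  }
  if ($triples) {
    $ins_count = INSERT INTO <${target_g}> { ${triples} }
  }

Undoing a non-destructive rule then boils down to a one-liner in SPARQL+ (with ${target_g} again standing for the graph associated with the script):

  # wipe everything the rule inferred
  DELETE FROM <${target_g}>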
I'm quite happy with the functionality so far. What's still missing is a way to rewrite bnodes; I don't think that's possible yet. But since INSERT + CONSTRUCT leaves bnode IDs unchanged, the inference scripts don't necessarily require URI-denoted resources.
Another cool aspect of SPARQLScript-based inferencing is that you can use a federated set of endpoints, each processing only part of a rule. The initial DBPedia mapper above, for example, uses locally available wikipedia links, but CrunchBase only provides very few of those. So I created a second script which retrieves DBPedia identifiers for local company homepages, using a combination of local queries and remote ones against the DBPedia SPARQL endpoint (in small iterations, and only for companies with at least one employee, but it works); a simplified sketch follows below.
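To give an idea of how such a two-stage script looks, here is a simplified sketch using SPARQLScript's ENDPOINT keyword to switch between stores. The cb:homepage_url property, the foaf:homepage lookup on the DBpedia side, the $local_endpoint placeholder, and the small LIMITs are illustrative assumptions, and the employee check is omitted:

  # 1) local query: companies with a homepage but no owl:sameAs link yet
  #    (cb:homepage_url is an assumed property name)
  $rows = SELECT ?res ?hp WHERE {
      ?res a cb:Company .
      ?res cb:homepage_url ?hp .
      # prevent dupes
      OPTIONAL { GRAPH ?g { ?res owl:sameAs ?v2 } }
      FILTER(!bound(?g))
    }
    LIMIT 25

  $triples = "";
  FOR ($row in $rows) {
    # 2) remote lookup against the DBpedia SPARQL endpoint
    ENDPOINT <http://dbpedia.org/sparql>
    $matches = SELECT ?dbp WHERE { ?dbp foaf:homepage <${row.hp}> } LIMIT 1
    FOR ($match in $matches) {
      $triples = "${triples} <${row.res}> owl:sameAs <${match.dbp}> . ";
    }
  }

  # 3) switch back and write the results to the local inference graph
  #    ($local_endpoint is a placeholder for the local store's endpoint)
  ENDPOINT <${local_endpoint}>
  if ($triples) {
    $ins_count = INSERT INTO <${target_g}> { ${triples} }
  }

The small LIMIT keeps each run cheap; re-running the script simply processes the next batch of unmapped companies, which is what the "small iterations" above refer to.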
Comments and Trackbacks
Hmm, I'm not sure which stability you mean. Email removal shouldn't affect overall dataset consistency. In order to keep the system in that state (if that's what you're referring to), you have to re-run the rule, or connect it to a trigger that fires on LOAD or INSERT.