<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Semantic Web Posts</title>
  <link rel="alternate" type="text/html" href="http://bnode.org/blog/sw_en" />
  <link rel="self" type="application/atom+xml" href="http://bnode.org/blog/atom1/sw_en.atom.atom" />
  <id>http://bnode.org/res/channel/sw_en</id>
  <updated>2010-08-13T21:20Z</updated>
  <author>
    <name>Benjamin Nowack</name>
  </author>
  <generator uri="http://semsol.com/" version="0.2.0">SemSol</generator>

  <entry>
    <title>Dynamic Semantic Publishing for any Blog (Part 2: Linked ReadWriteWeb)</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2010/08/13/dynamic-semantic-publishing-for-any-blog-part-2-linked-readwriteweb"/>
    <id>http://bnode.org/blog/2010/08/13/dynamic-semantic-publishing-for-any-blog-part-2-linked-readwriteweb</id>
    <published>2010-08-13T21:20Z</published>
    <updated>2010-08-13T21:13:56Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>A DSP proof of concept using ReadWriteWeb.com data.</summary>
    <category term="arc"/>
    <category term="bbc"/>
    <category term="blogdb"/>
    <category term="dynamic semantic publishing"/>
    <category term="entity hubs"/>
    <category term="linked data"/>
    <category term="paggr"/>
    <category term="prospect"/>
    <category term="readwriteweb"/>
    <category term="semanticweb"/>
    <category term="semsol"/>
    <category term="trice"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
The <a href="http://bnode.org/blog/2010/07/30/dynamic-semantic-publishing-for-any-blog-part-1">previous post</a> described a generic approach to BBC-style &quot;Dynamic Semantic Publishing&quot; and asked whether it could be applied to basically any weblog.<br />
<br />
Over the last few days, I spent some time on a test evaluation and demo system using data from the popular <a href="http://readwriteweb.com/">ReadWriteWeb tech blog</a>. The application is not public (I don't want to upset the content owners and don't have any spare server anyway), but you can watch a <a href="http://www.youtube.com/watch?v=6sHx2ghiifs">screencast</a> (embedded below).<br />
<br />
The application I created is a semantic dashboard which generates dynamic entity hubs and allows you to explore RWW data via multiple dimensions. To be honest, I was pretty surprised myself by the dynamics of the data. When I switched back to the official site after using the dashboard for some time, I totally missed the advanced filtering options.<br />
<br />
<object width="600" height="420"><param name="movie" value="http://www.youtube.com/v/6sHx2ghiifs&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x006699&amp;color2=0x54abd6"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/6sHx2ghiifs&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x006699&amp;color2=0x54abd6" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="600" height="420"></embed></object>
<br />
<br />
In case you are interested in the technical details, fasten your data seatbelt and read on.<br />
<br />

<h4>Behind the scenes</h4>
As mentioned, the framework is supposed to make things easy for site maintainers and should work with plain HTML as input. Direct access to internal data structures of the source system (database tables, post/author/commenter identifiers etc.) should not be needed. Even RDF experts don't have much experience with side effects of semantic systems directly hooked into running applications. And with RDF encouraging loosely coupled components anyway, it makes sense to keep the semantification on a separate machine.<br />
<br />
In order to implement the process, I used <a href="http://trice.semsol.org/">Trice</a> (once again), which supports simple agents out of the box. The bot-based approach already worked quite nicely in <a href="http://talis.com/">Talis</a>' <a href="http://fanhu.bz/">FanHubz</a> demonstrator, so I followed this route here, too. For &quot;Linked RWW&quot;, I only needed a very small number of bots, though.<br />
<br />
<img src="http://bnode.org/media/2010/08/rww_bot_console.gif" title="Trice Bot Console" alt="Trice Bot Console" /><br />
<br />
Here is a quick recap of the <a href="http://bnode.org/media/2010/08/dsp_architecture.gif">proposed dynamic semantic publishing process</a>, followed by a detailed description of the individual components:
<ul><li>Index and monitor the archives pages, build a registry of post URLs.</li>
<li>Load and parse posts into raw structures (title, author, content, ...).</li>
<li>Extract named entities from each post's main content section.</li>
<li>Build a site-optimized schema (an &quot;ontology&quot;) from the data structures generated so far.</li>
<li>Align the extracted data structures with the target ontology.</li>
<li>Re-purpose the final dataset (widgets, entity hubs, semantic ads, authoring tools).</li></ul>
<br />

<h4>Archives indexer and monitor</h4>
The archives indexer fetches the <a href="http://www.readwriteweb.com/archives.php">by-month archives</a>, extracts all link URLs matching the &quot;YYYY/MM&quot; pattern, and saves them in an <a href="http://arc.semsol.org/">ARC Store</a>.<br />
<br />
The implementation of this bot was straightforward (less than 100 lines of PHP code, including support for pagination); this is clearly something that can be turned into a standard component for common blog engines very easily. The result is a complete list of archives pages (so far still without any post URLs) which can be accessed through the RDF store's built-in SPARQL API:<br />
<br />
<img src="http://bnode.org/media/2010/08/rww_archives_rdf.gif" title="Archives triples via SPARQL" alt="Archives triples via SPARQL" /><br />
<br />
A second bot (the archives monitor) receives either a not-yet-crawled index page (if available) or the most current archives page as a starting point. Each post link of that page is then extracted and used to build a registry of post URLs. The monitoring bot is called every 10 minutes and keeps track of new posts.<br />
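<br />
The two bots described above can be sketched in a few lines. This is only an illustration in Python, not the PHP code used for the demo: the regular expression, the function names, and the in-memory registry (standing in for the ARC store) are all assumptions.<br />
<pre class="code">
```python
import re

# Assumed pattern: a by-month archive link ends in /YYYY/MM or /YYYY/MM/.
ARCHIVE_LINK = re.compile(r'href="([^"]*/\d{4}/\d{2}/?)"')


def extract_archive_urls(html):
    """Return the unique by-month archive URLs found in a page, sorted."""
    return sorted({match.group(1) for match in ARCHIVE_LINK.finditer(html)})


def register_new_posts(registry, post_urls):
    """Add unseen post URLs to the registry (a set) and return the new ones."""
    new_urls = []
    for url in post_urls:
        if url not in registry:
            registry.add(url)
            new_urls.append(url)
    return new_urls
```
</pre>
A monitor that runs every 10 minutes would call the second function with the links of the most current archives page and only pass the returned (new) URLs on to the post loader.<br />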
<br />

<h4>Post loader and parser</h4>
In order to later process post data at a finer granularity than the page level, we have to extract sub-structures such as title, author, publication date, tags, and so on. This is the harder part because most blogs don't use Linked Data-ready HTML in the form of Microdata or RDFa. Luckily, blogs are template-driven and we can use DOM paths to identify individual post sections, similar to how tools like the <a href="http://open.dapper.net/">Dapper Data Mapper</a> work. However, given the flexibility and customization options of modern blog engines, certain extensions are still needed. In the RWW case I needed site-specific code to expand multi-page posts, to extract a machine-friendly publication date, Facebook Likes and Tweetmeme counts, and to generate site-wide identifiers for authors and commenters.<br />
<br />
Writing this bot took several hours and almost 500 lines of code (after re-factoring), but the reward is a nicely structured blog database that can already be explored with an off-the-shelf RDF browser. At this stage we could already use the SPARQL API to easily create dynamic widgets such as &quot;related entries&quot; (via tags or categories), &quot;other posts by same author&quot;, &quot;most active commenters per category&quot;, or &quot;most popular authors&quot; (as shown in the example in the image below).<br />
<br />
<img src="http://bnode.org/media/2010/08/rww_post_structures.gif" title="Raw post structures" alt="Raw post structures" /><br />
<br />
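The DOM-path idea behind the post parser can be illustrated as follows. This is a minimal Python sketch, assuming the page is well-formed markup without namespaces; the paths and class names are hypothetical and would have to be adapted to each blog template.<br />
<pre class="code">
```python
import xml.etree.ElementTree as ET

# Hypothetical DOM paths for one blog template; real templates differ per site.
POST_PATHS = {
    "title": ".//h1[@class='entry-title']",
    "author": ".//span[@class='author']",
    "date": ".//span[@class='published']",
}


def parse_post(markup, paths=POST_PATHS):
    """Map each field name to the text of the first node matching its path."""
    root = ET.fromstring(markup)
    post = {}
    for field, path in paths.items():
        node = root.find(path)
        # Fields missing from the page are kept as None rather than dropped.
        post[field] = None if node is None else "".join(node.itertext()).strip()
    return post
```
</pre>
Site-specific extensions (multi-page posts, machine-friendly dates, social counters) would sit on top of such a generic mapping.<br />
<br />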

<h4>Named entity extraction</h4>
Now, the next bot can take each post's main content and <a href="http://bnode.org/blog/2010/07/28/linked-data-entity-extraction-with-zemanta-and-opencalais">enhance it with Zemanta and OpenCalais</a> (or any other entity recognition tool that produces RDF). The result of this step is a semantified, but rather messy dataset, with attributes from half a dozen RDF vocabularies.<br />
<br />

<h4>Schema/Ontology identification</h4>
Luckily, RDF was designed for working with multi-source data, and thanks to the SPARQL standard, we can use general purpose software to help us find our way through the enhanced assets. I used <a href="http://semsol.com/prospect">a faceted browser</a> to identify the site's main entity types (click on the image below for the full-size version).<br />
<br />
<a href="http://bnode.org/media/2010/08/rww_prospect.gif"><img src="http://bnode.org/media/2010/08/rww_prospect_small.gif" title="RWW through Paggr Prospect" alt="RWW through Paggr Prospect" /></a><br />
<br />
Although spotting inconsistencies (like Richard MacManus appearing multiple times in the &quot;author&quot; facet) is easier with a visual browser, a simple, generic SPARQL query can do the job as well:<br />
<br />
<img src="http://bnode.org/media/2010/08/rww_types.gif" title="RWW entity types" alt="RWW entity types" /><br />
<br />

<h4>Specifying the target ontology</h4>
The central entity types extracted from RWW posts are Organizations, People, Products, Locations, and Technologies. Together with the initial structures, we can now draft a consolidated RWW target ontology, as illustrated below. Each node gets its own identifier (a URI) and can thus be a bridge to the public <a href="http://linkeddata.org/">Linked Data</a> cloud, for example to import a company's competitor information.<br />
<br />
<img src="http://bnode.org/media/2010/08/rww_ont.gif" title="RWW ontology" alt="RWW ontology" /><br />
<br />

<h4>Aligning the data with the target ontology</h4>
In this step, we again use a software agent and break things down into smaller operations. These sub-tasks require some RDF and Linked Data experience, but basically, we are just manipulating the graph structure, which can be done quite comfortably with a SPARQL 1.1 processor that supports INSERT and DELETE commands. Here are some example operations that I applied to the RWW data:
<ul><li>Consolidate author aliases (&quot;richard-macmanus-1 = richard-macmanus-2&quot; etc.).</li>
<li>Normalize author tags, Zemanta tags, OpenCalais tags, and OpenCalais &quot;industry terms&quot; to a single &quot;tag&quot; field.</li>
<li>Consolidate the various type identifiers into canonical ones.</li>
<li>For each untyped entity, retrieve typing and label information from the Linked Data cloud (e.g. DBPedia, Freebase, or Semantic CrunchBase) and try to map them to the target ontology.</li>
<li>Try to consolidate &quot;obviously identical&quot; entities (I cheated by merging on labels here and there, but it worked).</li></ul>
Data alignment and QA is an iterative process (and a slightly slippery slope). The quality of public linked data varies, but the cloud is very powerful. Each optimization step adds to the network effects and you constantly discover new consolidation options. I spent just a few hours on the inferencer; after all, the Linked RWW demo is only meant to be a proof of concept.<br />
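<br />
The first two operations in the list above boil down to simple graph rewrites. The following Python sketch works on an in-memory set of (subject, predicate, object) tuples instead of SPARQL 1.1 INSERT/DELETE commands against a store; the predicate names and identifiers are made up for illustration.<br />
<pre class="code">
```python
def consolidate_aliases(triples, aliases):
    """Rewrite every subject and object that is an alias to its canonical ID."""
    return {
        (aliases.get(s, s), p, aliases.get(o, o))
        for s, p, o in triples
    }


def normalize_tags(triples, tag_predicates, target="tag"):
    """Replace the source-specific tag predicates with a single target one."""
    return {
        (s, target if p in tag_predicates else p, o)
        for s, p, o in triples
    }
```
</pre>
Because the data is a set, triples that become identical after a rewrite collapse automatically, which is roughly what a DELETE/INSERT pair achieves in a triple store.<br />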
<br />
After this step, we're basically done. From now on, the bots can operate autonomously and we can (finally) build our dynamic semantic publishing apps, like the <a href="http://paggr.com/">Paggr</a> Dashboard presented in the <a href="http://www.youtube.com/watch?v=6sHx2ghiifs">video</a> above.<br />
<br />
<a href="http://bnode.org/media/2010/08/rww_dashboard.gif"><img src="http://bnode.org/media/2010/08/rww_dashboard_small.gif" title="Dynamic RWW Entity Hub" alt="Dynamic RWW Entity Hub" /></a><br />
 <br />

<h4>Conclusion</h4>
Dynamic Semantic Publishing on mainstream websites is still new, and there are no complete off-the-shelf solutions on the market yet. Many of the individual components needed, however, are available. Additionally, the manual effort to integrate the tools is no longer incalculable research, but is getting closer to predictable &quot;standard&quot; development effort. If you are interested in a solution similar to the ones described in this post, please <a href="http://semsol.com/contact">get in touch</a>.
 
      </div>
    </content>
  </entry>

  <entry>
    <title>Dynamic Semantic Publishing for any Blog (Part 1)</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2010/07/30/dynamic-semantic-publishing-for-any-blog-part-1"/>
    <id>http://bnode.org/blog/2010/07/30/dynamic-semantic-publishing-for-any-blog-part-1</id>
    <published>2010-08-02T09:55Z</published>
    <updated>2010-08-14T16:23:53Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>Bringing automated semantic page generation a la BBC to standard web environments.</summary>
    <category term="arc"/>
    <category term="bbc"/>
    <category term="dynamic semantic publishing"/>
    <category term="entity hubs"/>
    <category term="linked data"/>
    <category term="prospect"/>
    <category term="readwriteweb"/>
    <category term="semanticweb"/>
    <category term="semsol"/>
    <category term="trice"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
&quot;Dynamic Semantic Publishing&quot; is a new technical term which was <a href="http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_world_cup_2010_dynamic_sem.html">introduced by the BBC's online team</a> a few weeks ago. It describes the idea of utilizing <a href="http://linkeddata.org/">Linked Data</a> technology to automate the aggregation and publication of interrelated content objects. The <a href="http://news.bbc.co.uk/sport1/hi/football/world_cup_2010/default.stm">BBC's World Cup website</a> was the first large mainstream website to use this method. It provides hundreds of automatically generated, topically composed pages for individual football entities (players, teams, groups) and related articles.<br />
<br />
Now, the added value of such linked &quot;entity hubs&quot; would clearly be very interesting for other websites and blogs as well. They are multi-dimensional entry points to a site and provide a much better and more user-engaging way to explore content than the usual flat archives pages, which normally don't have dimensions beyond date, tag, and author. Additionally, HTML aggregations with embedded Linked Data identifiers can improve search engine rankings, and they enable semantic ad placement, which are attractive by-products.<br />
<br />
<img src="http://bnode.org/media/2010/08/entity_hubs.gif" title="Entity hub examples" alt="Entity hub examples" /><br />
<br />
The architecture used by the BBC is optimized for their internal publishing workflow and thus not necessarily suited for small and medium-scale media outlets. So I've <a href="http://bnode.org/blog/2010/07/28/linked-data-entity-extraction-with-zemanta-and-opencalais">started</a> thinking about a lightweight version of the BBC infrastructure, one that would integrate more easily with typical web server environments and widespread blog engines.<br />
<br />

<h4>What could a generalized approach to dynamic semantic publishing look like?</h4>
We should assume setups where direct access to a blog's database tables is not available. Working with already published posts requires a template detector and custom parsers, but it lowers the entry barrier for blog owners significantly. And content importers can be reused to a large extent when sites are based on standard blog engines such as WordPress or Movable Type.<br />
<br />
The graphic below (<a href="http://bnode.org/media/2010/08/dsp_architecture.gif">large version</a>) illustrates a possible, generalized approach to dynamic semantic publishing.<br />

<a href="http://bnode.org/media/2010/08/dsp_architecture.gif"><img src="http://bnode.org/media/2010/08/dsp_architecture_small.gif" title="Dynamic Semantic Publishing" alt="Dynamic Semantic Publishing" /></a><br />
<br />
Process explanation:
<ul><li><strong>Step 1</strong>: A blog-specific crawling agent indexes articles linked from central archives pages. The index is stored as RDF, which enables the easy expansion of post URLs to richly annotated content objects.</li>
<li><strong>Step 2</strong>: Not-yet-imported posts from the generated blog index are parsed into core structural elements such as title, author, date of publication, main content, comments, Tweet counters, Facebook Likes, and so on. The semi-structured post information is added to the triple store for later processing by other agents and scripts. Again, we need site (or blog engine)-specific code to extract the various possible structures. This step could be accelerated by using an interactive extractor builder, though.</li>
<li><strong>Step 3</strong>: Post contents are passed to APIs like <a href="http://opencalais.com/">OpenCalais</a> or <a href="http://zemanta.com/">Zemanta</a> in order to extract stable and re-usable entity identifiers. The resulting data is added to the RDF Store.</li>
<li>After the initial semantification in step 3, a generic RDF data browser can be used to explore the extracted information. This simplifies general consistency checks and the identification of the site-specific ontology (concepts and how they are related). Alternatively, this could be done (in a less comfortable way) via the RDF store's SPARQL API.</li>
<li><strong>Step 4</strong>: Once we have a general idea of the target schema (entity types and their relations), custom SPARQL agents process the data and populate the ontology. They can optionally access and utilize public data.</li>
<li>After step 4, the rich resulting graph data allows the creation of context-aware widgets. These widgets (&quot;Related articles&quot;, &quot;Authors for this topic&quot;, &quot;Product experts&quot;, &quot;Top commenters&quot;, &quot;Related technologies&quot;, etc.) can now be used to build user-facing applications and tools.</li>
<li><strong>Use case 1</strong>: Entity hubs for things like authors, products, people, organizations, commenters, or other domain-specific concepts.</li>
<li><strong>Use case 2</strong>: Improving the source blog. The typical &quot;Related articles&quot; sections in standard blog engines, for example, don't take social data such as Facebook Likes or re-tweets into account. Often, they are just based on explicitly defined tags. With the enhanced blog data, we can generate aggregations driven by rich semantic criteria.</li>
<li><strong>Use case 3</strong>: Authoring extensions: After all, the automated entity extraction APIs are not perfect. With the site-wide ontology in place, we could provide content creators with convenient annotation tools to manually highlight some text and then associate the selection with a typed entity from the RDF store. Or they could add their own concepts to the ontology and share it with other authors. The manual annotations help increase the quality of the entity hubs and blog widgets.
</li></ul>
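To make use case 2 a bit more concrete, here is a rough Python sketch of a related-articles widget that combines tags with social data. The record layout and the ranking rule (shared tags first, then Likes plus re-tweets) are purely hypothetical; a real widget would query the RDF store instead.<br />
<pre class="code">
```python
def related_posts(current, posts, limit=3):
    """Rank other posts by shared tags, then by social signals."""
    scored = []
    for post in posts:
        if post["url"] == current["url"]:
            continue
        shared = len(set(post["tags"]).intersection(current["tags"]))
        if shared:
            social = post.get("likes", 0) + post.get("retweets", 0)
            scored.append((shared, social, post["url"]))
    # Most shared tags first; social counts break ties, then the URL.
    scored.sort(key=lambda row: (-row[0], -row[1], row[2]))
    return [url for _, _, url in scored[:limit]]
```
</pre>
Posts without any shared tag are left out entirely, so a very popular but unrelated post never shows up in the widget.<br />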
<br />

<h4>Does it work?</h4>
I explored this approach to dynamic semantic publishing with nearly nine thousand articles from <a href="http://readwriteweb.com/">ReadWriteWeb</a>. <a href="http://bnode.org/blog/2010/08/13/dynamic-semantic-publishing-for-any-blog-part-2-linked-readwriteweb">In the next post</a>, I'll describe a &quot;Linked RWW&quot; demo which combines <a href="http://trice.semsol.org/">Trice</a> bots, <a href="http://arc.semsol.org/">ARC</a>, <a href="http://semsol.com/prospect">Prospect</a>, and the <a href="http://bnode.org/blog/2010/07/28/linked-data-entity-extraction-with-zemanta-and-opencalais">handy semantic APIs provided by OpenCalais and Zemanta</a>.<br />
<br />


      </div>
    </content>
  </entry>

  <entry>
    <title>Linked Data Entity Extraction with Zemanta and OpenCalais</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2010/07/28/linked-data-entity-extraction-with-zemanta-and-opencalais"/>
    <id>http://bnode.org/blog/2010/07/28/linked-data-entity-extraction-with-zemanta-and-opencalais</id>
    <published>2010-07-28T09:50Z</published>
    <updated>2010-07-29T07:27:15Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>A comparison of the NER APIs by Zemanta and OpenCalais.</summary>
    <category term="blogdb"/>
    <category term="linkeddata"/>
    <category term="ner"/>
    <category term="nlp"/>
    <category term="opencalais"/>
    <category term="prospect"/>
    <category term="rdf"/>
    <category term="readwriteweb"/>
    <category term="rww"/>
    <category term="semanticweb"/>
    <category term="zemanta"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
I had another look at the Named Entity Extraction APIs by <a href="http://zemanta.com/">Zemanta</a> and <a href="http://www.opencalais.com/">OpenCalais</a> for some product launch demos. My <a href="http://bnode.org/blog/2009/01/16/connecting-the-lod-dots-with-calais-4-0-and-zemanta">first test from last year</a> concentrated more on the Zemanta API. This time I had a closer look at both services, trying to identify the &quot;better one&quot; for &quot;BlogDB&quot;, a semi-automatic blog semantifier.<br />
<br />
My main need is a service that receives a cleaned-up plain text version of a blog post and returns normalized tags and reusable entity identifiers. So, the findings in this post are rather technical and just related to the BlogDB requirements. I ignored features which could well be essential for others, such as Zemanta's &quot;related articles and photos&quot; feature, or OpenCalais' entity relations (&quot;X hired Y&quot; etc.).<br />
<br />

<h4>Terms and restrictions of the free API</h4>
<ul><li>The API terms are pretty similar (the wording is actually almost identical). You need an API key and both services can be used commercially as long as you give attribution and don't proxy/resell the service. </li>
<li>OpenCalais gives you more free API calls out of the box than Zemanta (50,000 vs. 1,000 per day). You can get a free upgrade to 10,000 Zemanta calls via a simple email, though (or through excessive API use; Andraž auto-upgraded my API limit when he noticed my <a href="http://bnode.org/blog/2009/01/16/connecting-the-lod-dots-with-calais-4-0-and-zemanta">crazy HDStreams test</a> back then ;-).</li>
<li>OpenCalais lets you process larger content chunks (up to 100K, vs. 8K at Zemanta).</li></ul>
<br />

<h4>Calling the API</h4>
<ul><li>Both interfaces are simple and well-documented. Calls to the OpenCalais API are a tiny bit more complicated as you have to encode certain parameters in an XML string. Zemanta uses simple query string arguments. I've added the respective PHP snippets below; the complexity difference is negligible.
<pre class="code">// Builds an OpenCalais request; the directives are passed as an XML string.
function getCalaisResult($id, $text) {
  $parms = '
    &lt;c:params xmlns:c=&quot;http://s.opencalais.com/1/pred/&quot;
              xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;&gt;
      &lt;c:processingDirectives
        c:contentType=&quot;TEXT/RAW&quot;
        c:outputFormat=&quot;XML/RDF&quot;
        c:calculateRelevanceScore=&quot;true&quot;
        c:enableMetadataType=&quot;SocialTags&quot;
        c:docRDFaccessible=&quot;false&quot;
        c:omitOutputtingOriginalText=&quot;true&quot;
        &gt;&lt;/c:processingDirectives&gt;
      &lt;c:userDirectives
        c:allowDistribution=&quot;false&quot;
        c:allowSearch=&quot;false&quot;
        c:externalID=&quot;' . $id . '&quot;
        c:submitter=&quot;http://semsol.com/&quot;
        &gt;&lt;/c:userDirectives&gt;
      &lt;c:externalMetadata&gt;&lt;/c:externalMetadata&gt;
    &lt;/c:params&gt;
  ';
  // POST arguments: API key, post text, and the XML directives from above.
  $args = array(
    'licenseID' =&gt; $this-&gt;a['calais_key'],
    'content' =&gt; urlencode($text),
    'paramsXML' =&gt; urlencode(trim($parms))
  );
  $qs = substr($this-&gt;qs($args), 1);
  $url = 'http://api.opencalais.com/enlighten/rest/';
  return $this-&gt;getAPIResult($url, $qs);
}
</pre>
<pre class="code">// Builds a Zemanta request; all options are plain query string arguments.
function getZemantaResult($id, $text) {
  $args = array(
    'method' =&gt; 'zemanta.suggest',
    'api_key' =&gt; $this-&gt;a['zemanta_key'],
    'text' =&gt; urlencode($text),
    'format' =&gt; 'rdfxml',
    // Only RDF links are requested; the other result sections are disabled.
    'return_rdf_links' =&gt; '1',
    'return_articles' =&gt; '0',
    'return_categories' =&gt; '0',
    'return_images' =&gt; '0',
    'emphasis' =&gt; '0',
  );
  $qs = substr($this-&gt;qs($args), 1);
  $url = 'http://api.zemanta.com/services/rest/0.0/';
  return $this-&gt;getAPIResult($url, $qs);
}
</pre> </li>
<li>The actual API call is then a simple POST:<pre class="code">function getAPIResult($url, $qs) {
  ARC2::inc('Reader');
  $reader = new ARC2_Reader($this-&gt;a, $this);
  // Send the query string as a form-encoded POST body.
  $reader-&gt;setHTTPMethod('POST');
  $reader-&gt;setCustomHeaders(&quot;Content-Type: application/x-www-form-urlencoded&quot;);
  $reader-&gt;setMessageBody($qs);
  $reader-&gt;activate($url);
  // Read the response stream chunk by chunk until it is exhausted.
  $r = '';
  while ($d = $reader-&gt;readStream()) {
    $r .= $d;
  }
  $reader-&gt;closeStream();
  return $r;
}
</pre></li>
<li>Both APIs are fast.
</li></ul>
<br />

<h4>API result processing</h4>
<ul><li>The APIs return rather verbose data, as they have to stuff in a lot of meta-data such as confidence scores, text positions, internal and external identifiers, etc. But they also offer RDF as one possible result format, so I could store the response data as a simple graph and then use SPARQL queries to extract the relevant information (tags and named entities). Below is the query code for Linked Data entity extraction from Zemanta's RDF. As you can see, the graph structure isn't trivial, but still understandable:
<pre class="code">SELECT DISTINCT ?id ?obj ?cnf ?name
FROM &lt;' . $g . '&gt; WHERE {
  ?rec a z:Recognition ;
       z:object ?obj ;
       z:confidence ?cnf .
  ?obj z:target ?id .
  ?id z:targetType &lt;http://s.zemanta.com/targets#rdf&gt; ;
      z:title ?name .
  FILTER(?cnf &gt;= 0.4)
} ORDER BY ?id
</pre>
</li></ul>
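For readers less familiar with SPARQL, the selection logic of the query above can be mirrored in plain Python. This is only a sketch: it assumes the recognitions have already been loaded into hypothetical dictionaries (the keys are made up), whereas the demo runs the query directly against the stored graph.<br />
<pre class="code">
```python
RDF_TARGET = "http://s.zemanta.com/targets#rdf"


def select_entities(recognitions, min_confidence=0.4):
    """Return distinct (id, name, confidence) rows for RDF targets, sorted by id."""
    rows = set()
    for rec in recognitions:
        # Same threshold as the FILTER clause in the query.
        if rec["confidence"] >= min_confidence:
            for target in rec["targets"]:
                if target["type"] == RDF_TARGET:
                    rows.add((target["id"], target["title"], rec["confidence"]))
    return sorted(rows)
```
</pre>
The set takes the role of DISTINCT, and sorting the tuples orders the rows by identifier first, as ORDER BY ?id does.<br />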
<br />

<h4>Extracting normalized tags</h4>
<ul><li>OpenCalais results contain a section with so-called &quot;SocialTags&quot; which are directly usable as plain-text tags.</li>
<li>The tag structures in the Zemanta result are called &quot;Keywords&quot;. In my tests they only contained a subset of the detected entities, and so I decided to use the labels associated with detected entities instead. This worked well, but the respective query is more complex.</li></ul>
<br />

<h4>Extracting entities</h4>
<ul><li>In general, OpenCalais results can be directly utilized more easily. They contain stable identifiers and the identifiers come with type information and other attributes such as stock symbols. The API result directly tells you how many Persons, Companies, Products, etc. were detected. And the URIs of these entity types are all from a single (OpenCalais) namespace. If you are not a Linked Data pro, this simplifies things a lot. You only have to support a simple list of entity types to build a working semantic application. If you want to leverage the wider <a href="http://linkeddata.org/">Linked Open Data</a> cloud, however, the OpenCalais response is just a first entry point. It doesn't contain community URIs. You have to use the OpenCalais website to first retrieve disambiguation information, which may then (often involving another request) lead you to the decentralized Linked Data identifiers.</li>
<li>Zemanta responses, in contrast, do not (yet, Andraž told me they are working on it) contain entity types at all. You always need an additional request to retrieve type information (unless you are doing nasty URI inspection, which is what I did with detected URIs from <a href="http://cb.semsol.org/">Semantic CrunchBase</a>). The retrieval of type information is done via Open Data servers, so you have to be able to deal with the usual down-times of these non-commercial services.</li>
<li>Zemanta results are very &quot;webby&quot; and full of community URIs. They even include sameAs information. This can be a bit overwhelming if you are not an RDFer, e.g. looking up a <a href="http://dbpedia.org/">DBPedia</a> URI will often give you dozens of entity types, and you need some experience to match them with your internal type hierarchy. But for an open data developer, the hooks provided by Zemanta are a dream come true.</li>
<li>With Zemanta associating shared URIs with all detected entities, I noticed network effects kicking in a couple of times. I used <a href="http://readwriteweb.com/">RWW</a> articles for the test, and in one post, for example, OpenCalais could detect the company &quot;Starbucks&quot; and &quot;Howard Schultz&quot; as their &quot;CEO&quot;, but their public RDF (when I looked up the &quot;Howard Schultz&quot; URI) didn't persist this linkage. The detection scope was limited to the passed snippet. Zemanta, on the other hand, directly gave me Linked Data URIs for both &quot;Starbucks&quot; and &quot;Howard Schultz&quot;, and these identifiers make it possible to re-establish the relation between the two entities at any time. This is a very powerful feature.
</li></ul>
<br />

<h4>Summary</h4>
Both APIs are great. The quality of the entity extractors is awesome. For the RWW posts, which deal a lot with Web topics, Zemanta seemed to have a couple of extra detections (such as &quot;ReadWriteWeb&quot; as company). As usual, some owl:sameAs information is wrong, and Zemanta uses incorrect Semantic CrunchBase URIs (&quot;.rdf#self&quot; instead of &quot;#self&quot; // <em>Update: to be fixed in the next Zemanta API revision</em>), but I blame us (the RDF community), not the API providers, for not making these things easier to implement.<br />
<br />
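Until such a fix is available, the Semantic CrunchBase URIs can be repaired on the client side. A minimal Python sketch, assuming the only defect is the &quot;.rdf#self&quot; suffix mentioned above; the function name is made up, and URIs from other hosts are left untouched:<br />
<pre class="code">
```python
CRUNCHBASE_PREFIX = "http://cb.semsol.org/"
BAD_SUFFIX = ".rdf#self"


def fix_crunchbase_uri(uri):
    """Rewrite a Semantic CrunchBase URI ending in '.rdf#self' to '#self'."""
    if uri.startswith(CRUNCHBASE_PREFIX) and uri.endswith(BAD_SUFFIX):
        return uri[: -len(BAD_SUFFIX)] + "#self"
    # Everything else, including already correct URIs, passes through as-is.
    return uri
```
</pre>
Running the function over all detected identifiers before storing them keeps the later consolidation queries free of special cases.<br />
<br />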
In the end, I decided to use both APIs in combination, with an optional post-processing step that builds a consolidated, internal ontology from the detected entities (OpenCalais has two Company types which could be merged, for example). Maybe I can make a <a href="http://semsol.com/prospect">Prospect</a> demo from the RWW data public, though I am not sure whether they would allow this. It's really impressive how much value the entity extraction services can add to blog data, though (see the screenshot below, which shows a pivot operation on products mentioned in posts by Sarah Perez). I'll write a bit more about the possibilities in another post.<br />
<br />
<a href="http://bnode.org/media/2010/07/blogdb_rww.gif"><img src="http://bnode.org/media/2010/07/blogdb_rww_small.gif" title="RWW posts via BlogDB" alt="RWW posts via BlogDB" /></a>


      </div>
    </content>
  </entry>

  <entry>
    <title>Contextual configuration - Semantic Web development for visually minded webmasters</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2010/05/10/contextual-configuration-semantic-web-development-for-visually-minded-webmasters"/>
    <id>http://bnode.org/blog/2010/05/10/contextual-configuration-semantic-web-development-for-visually-minded-webmasters</id>
    <published>2010-05-21T12:40Z</published>
    <updated>2010-05-21T13:06:11Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>A short screencast demonstrating contextual configuration via widgets in semsol's RDF CMS.</summary>
    <category term="cms"/>
    <category term="configuration"/>
    <category term="faceted browser"/>
    <category term="paggr"/>
    <category term="prospect"/>
    <category term="semanticweb"/>
    <category term="ux"/>
    <category term="widgets"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
Let's face it, building semantic web sites and apps is still far from easy. And to some extent, this is due to the configuration overhead. The RDF stack is built around declarative languages (for simplified integration at various levels), and as a consequence, configuration directives often end up in some form of declarative format, too. While fleshing out an RDF-powered website, you have to declare a ton of things. From namespace abbreviations to data sources and API endpoints, from vocabularies to identifier mappings, from queries to object templates, and what have you.<br />
<br />
Sadly, many of these configurations are needed to style the user interface, and because of RDF's open world context, designers have to know much more about the data model and possible variations than usually necessary. Or webmasters have to deal with design work. Not ideal either. If we want to bring RDF to mainstream web developers, we have to simplify the creation of user-optimized apps. The value proposition of semantics in the context of information overload is pretty clear, and some form of data integration is becoming mandatory for any modern website. But the entry barrier caused by large and complicated configuration files (Fresnel anyone?) is still too high. How can we get from our powerful, largely generic systems to end-user-optimized apps? Or the other way round: How can we support frontend-oriented web development with our flexible tools and freely mashable data sets? (Let me quickly mention Drupal here, which is doing a great job at near-seamlessly integrating RDF. OK, back to the post.)<br />
<br />
Enter RDF widgets. Widgets have obvious backend-related benefits like accessing, combining and re-purposing information from remote sources within a manageable code sandbox. But they can also greatly support frontend developers. They simplify page layout and incremental site building with instant visual feedback (add a widget, test, add another one, re-arrange, etc.). And, more importantly in the RDF case, they can offer a way to iteratively configure a system with very little technical overhead. Configuration options could not only be scoped to the widget at hand, but also to the <em>context</em> where the widget is currently viewed. Let's say you are building an RDF browser and need resource templates for all kinds of items. With contextual configuration, you could simply browse the site and at any position in the ontology or navigation hierarchy, you would just open a configuration dialog and define a custom template, if needed. Such an approach could enable systems that work out of the box (raw, but usable) and which could then be continually optimized, possibly even by site users.<br />
<br />
A lot of &amp;quot;could&amp;quot; and &amp;quot;would&amp;quot; in the paragraphs above, and the idea may sound quite abstract without actually seeing it. To illustrate the point I'm trying to make I've prepared a short video (embedded below). It uses <a href="http://cb.semsol.org/">Semantic CrunchBase</a> and <a href="http://semsol.com/prospect">Paggr Prospect</a> (our new faceted browser builder) as an example use case for in-context configuration.<br />
<br />
And if you are interested in using one of our solutions for your own projects, <a href="http://semsol.com/contact">please get in touch</a>!<br />
<br />
<br />
<object width="550" height="385"><param name="movie" value="http://www.youtube.com/v/Sz8ohHDViL8&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x006699&amp;color2=0x54abd6"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/Sz8ohHDViL8&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x006699&amp;color2=0x54abd6" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="550" height="385"></embed></object>
<br />
Paggr Prospect (part 1) <br />
<br />
<object width="550" height="385"><param name="movie" value="http://www.youtube.com/v/_yO_dEn0g0g&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x006699&amp;color2=0x54abd6"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/_yO_dEn0g0g&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x006699&amp;color2=0x54abd6" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="550" height="385"></embed></object>
<br />
Paggr Prospect (part 2)<br />

      </div>
    </content>
  </entry>

  <entry>
    <title>Trice' Semantic Richtext Editor</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2010/05/01/trice-semantic-richtext-editor"/>
    <id>http://bnode.org/blog/2010/05/01/trice-semantic-richtext-editor</id>
    <published>2010-05-01T16:35Z</published>
    <updated>2010-05-03T09:44:47Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>A screencast demonstrating the structured RTE bundled with the Trice CMS</summary>
    <category term="cms"/>
    <category term="editor"/>
    <category term="html5"/>
    <category term="linkeddata"/>
    <category term="markup"/>
    <category term="microdata"/>
    <category term="rdfa"/>
    <category term="rte"/>
    <category term="semanticweb"/>
    <category term="trice"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
In my <a href="http://bnode.org/blog/2010/04/15/could-having-two-rdf-in-htmls-actually-be-handy">previous post</a> I mentioned that I'm building a Linked Data CMS. One of its components is a rich-text editor that allows the creation (and embedding) of structured markup.<br />
<br />
An earlier version supported limited Microdata annotations, but now I've switched the mechanism and use an intermediate, but even simpler approach based on HTML5's handy data-* attributes. This lets you build almost arbitrary markup with the editor, including Microformats, Microdata, or RDFa. I don't know yet when the CMS will be publicly available (3 sites are under development right now), but as mentioned, I'd be happy about another pilot project or two. Below is a video demonstrating the editor and its easy customization options.<br />
<br />
<object width="550" height="385"><param name="movie" value="http://www.youtube.com/v/bn8DmFGk9rA&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x006699&amp;color2=0x54abd6"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/bn8DmFGk9rA&amp;hl=en_US&amp;fs=1&amp;rel=0&amp;color1=0x006699&amp;color2=0x54abd6" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="550" height="385"></embed></object>


      </div>
    </content>
  </entry>

  <entry>
    <title>Could having two RDF-in-HTMLs actually be handy?</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2010/04/15/could-having-two-rdf-in-htmls-actually-be-handy"/>
    <id>http://bnode.org/blog/2010/04/15/could-having-two-rdf-in-htmls-actually-be-handy</id>
    <published>2010-04-15T10:30Z</published>
    <updated>2010-05-10T08:15:05Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>A combination of  RDFa and Microdata would allow for separate semantic layers.</summary>
    <category term="cms"/>
    <category term="microdata"/>
    <category term="paggr"/>
    <category term="rdf"/>
    <category term="rdfa"/>
    <category term="semanticweb"/>
    <category term="stepbystep"/>
    <category term="trice"/>
    <category term="widgets"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
Apart from grumpy rants about the complexity of W3C's RDF specs and <a href="http://twitter.com/bengee/status/11886421732">semantic richtext editing excitement</a>, I haven't blogged or tweeted a lot recently. That's partly because there finally is increased demand for the stuff I'm doing at <a href="http://semsol.com/">semsol</a> (agency-style SemWeb development), but also because I've been working hard on getting my tools in a state where they feel more like typical Web frameworks and apps. <a href="http://talis.com/">Talis</a>' <a href="http://fanhu.bz/">Fanhu.bz</a> is an example where (I think) we found a good balance between powerful RDF capabilities (data re-purposing, remote models, data augmentation, a crazy army of inference bots) and a non-technical UI (simplistic visual browser, Twitter-based annotation interfaces).<br />
<br />
Another example is something I've been working on during the last months: I somehow managed to combine essential parts of <a href="http://paggr.com/">Paggr</a> (a drag&amp;drop portal system based on RDF- and SPARQL-based widgets) with an RDF CMS (I'm currently looking for pilot projects). And although I decided to switch entirely to <a href="http://www.w3.org/TR/microdata/">Microdata</a> for semantic markup after exploring it during the FanHubz project, I wonder if there might be room for having two separate semantic layers in this sort of widget-based website. Here is why:<br />
<br />
As mentioned, I've taken a widget-like approach for the CMS. Each page section is a resource on its own that can be defined and extended by the web developer, it can be styled by themers, and it can be re-arranged and configured by the webmaster. In the RDF CMS context, widgets can easily integrate remote data, and when the integrated information is exposed as machine-readable data in the front-end, we can get beyond the &amp;quot;just-visual&amp;quot; integration of current widget pages and <a href="http://bnode.org/blog/2009/06/04/eswc-2009-linked-data-dashboards">bring truly connectable and reusable information to the user interface</a>.<br />
<br />
Ideally, both the widgets' structural data and the content can be re-purposed by other apps. Just like in the early days of the Web, we could re-introduce a copy &amp; paste culture of things for people to include in their own sites. With the difference that RDF simplifies copy-by-reference and source attribution. And both developers and end-users could be part of the game this time.<br />
<br />
Anyway, one technical issue I encountered is when you have a page that contains multiple page items, but describes a single resource. With a single markup layer (say Microdata), you get a single tree where the context of the hierarchy is constantly switching between structural elements and content items (page structure -&amp;gt; main content -&amp;gt; page layout -&amp;gt; widget structure -&amp;gt; widget content). If you want to describe a single resource, you have to repeatedly re-introduce the triple subject (&amp;quot;this is about the page structure&amp;quot;, &amp;quot;this is about the main page topic&amp;quot;). The first screenshot below shows the different (grey) widget areas in the editing view of the CMS. In the second screenshot, you can see that the displayed information (the marked calendar date, the flyer image, and the description) in the main area and the sidebar is about a single resource (an event).<br />
<br />
<img src="http://bnode.org/media/2010/04/trice_cms_editing.gif" title="Trice CMS Editor" alt="Trice CMS Editor" /><br />
<small>Trice CMS editing view</small><br />
<br />
<img src="http://bnode.org/media/2010/04/trice_cms_view.gif" title="Trice CMS Editor" alt="Trice CMS Editor" /><br />
<small>Trice CMS page view with inline widgets describing one resource</small><br />
<br />
If I used two separate semantic layers, e.g. RDFa for the content (the event description) and Microdata for the structural elements (column widths, widget template URIs, widget instance URIs), I could describe the resource and the structure without repeating the event subject in each page item.<br />
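<br />
To make the two-layer idea a little more tangible, here is a rough sketch of the triples each layer might yield for the event page above. Everything in it is an assumption made for illustration: the <code>cms:</code> namespace, its terms, the widget and event URIs, and the literal values are placeholders, not the CMS's actual vocabulary or data.
<pre class="code">@base &amp;lt;http://example.org/events/launch-party&amp;gt; .
@prefix cms: &amp;lt;http://example.org/cms#&amp;gt; .
@prefix dct: &amp;lt;http://purl.org/dc/terms/&amp;gt; .
@prefix foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt; .

# Layer 1: page structure (would come from the Microdata markup)
&amp;lt;#sidebar-calendar&amp;gt; a cms:WidgetInstance ;
      cms:template &amp;lt;http://example.org/widgets/calendar&amp;gt; ;
      cms:columnWidth &amp;quot;240&amp;quot; .
&amp;lt;#main-flyer&amp;gt; a cms:WidgetInstance ;
      cms:template &amp;lt;http://example.org/widgets/flyer&amp;gt; ;
      cms:columnWidth &amp;quot;480&amp;quot; .

# Layer 2: content (would come from the RDFa markup),
# with a single subject shared by all widgets on the page
&amp;lt;#event&amp;gt; dct:date &amp;quot;2010-04-24&amp;quot; ;
      dct:description &amp;quot;Placeholder event description&amp;quot; ;
      foaf:depiction &amp;lt;flyer.jpg&amp;gt; .
</pre>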
<br />
To be honest, I'm not sure yet if this is really a problem, but I thought writing it down could kick off some thought processes (which now tend towards &amp;quot;No&amp;quot;). Keeping triples as stand-alone-ish as possible may actually be an advantage (even if subject URIs have to be repeated). No semantic markup solution so far provides full containment for reliable copy &amp; paste, but explicit subjects (or &amp;quot;itemid&amp;quot;s in Microdata-speak) could bring us a little closer.<br />
<br />
Conclusions? Err.., none yet. But hey, did you see the cool CMS screenshots?



      </div>
    </content>
  </entry>

  <entry>
    <title>Microdata, semantic markup for both RDFers and non-RDFers</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2010/01/26/microdata-semantic-markup-for-both-rdfers-and-non-rdfers"/>
    <id>http://bnode.org/blog/2010/01/26/microdata-semantic-markup-for-both-rdfers-and-non-rdfers</id>
    <published>2010-01-26T12:00Z</published>
    <updated>2010-01-26T18:20:03Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>RDF-in-HTML could have been so simple.</summary>
    <category term="arc"/>
    <category term="html5"/>
    <category term="microdata"/>
    <category term="rdf-in-html"/>
    <category term="rdfa"/>
    <category term="semanticweb"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
There's been a whole lot of discussion around <a href="http://dev.w3.org/html5/md/Overview.html">Microdata</a>, a new approach for embedding machine-readable information into forthcoming HTML5. What I find most attractive about Microdata is the fact that it was designed by HTMLers, not RDFers. It's refreshingly pragmatic, free of other RDF spec legacy, but still capable of expressing most of RDF.<br />
<br />
Unfortunately, <a href="http://rdfa.info">RDFa</a> lobbyists on the HTML WG mailing list forced the spec out of HTML5 core for the time being. This manoeuvre was understandable (a lot of energy went into RDFa, after all), but in my opinion very short-sighted. How many uphill battles did we have, trying to get RDF to the broader developer community? And how many were successful? Atom, microformats, OpenID, Portable Contacts, XRDS, Activity Streams (well, not really), these are examples where RDFers tried, but failed to promote some of their infrastructure into the respective solutions. Now: HTML5, where the initial RDF lobbying actually had an effect and led to a native mechanism for RDF-in-HTML. Yes, <strong>native</strong>, not in some separate spec. This would have become part of every HTML5 book, any HTML developer on this planet would have learned about it. Finally a battle won. And what a great one. HTML.<br />
<br />
But no, Microdata wasn't developed by an RDF group, so they voted it out again. Now, the really sad thing is, there could have been a solution that would have served everybody sufficiently well, both HTMLers and RDFers. The RDFa group recently realized that RDFa needs to be revised anyway, there is going to be an RDFa 1.1 which will require new parsers. If they'd swallowed their pride, they would most probably have been able to define RDFa 1.1 as a proper superset of Microdata.<br />
<br />
Here is a short overview of RDF features supported by Microdata:
<ul><li>Explicit resource containers, via @itemscope (in RDFa, the boundaries of a resource are often implicitly defined by @rel or @typeof)</li>
<li>Subject declaration, via @itemid (RDFa uses @about)</li>
<li>Main subject typing, via @itemtype (RDFa uses @typeof)</li>
<li>Predicate declaration, via @itemprop (RDFa uses @property, @rel, and @rev)</li>
<li>Literal objects, via node values (RDFa also allows hidden values via @content)</li>
<li>Non-literal objects, via @href, @src, etc. (RDFa also allows hidden values via @resource)</li>
<li>Object language, via @lang</li>
<li>Blank nodes</li></ul>

I won't go into details why hiding semantics in RDFa will be penalized by search engines as soon as spammers discover the possibilities, why reusing RDF/XML's attribute names was probably not a smart move with regard to attracting non-RDFers, why the new @vocab idea is impractical, or why namespace prefixes, as handy as they are in other RDF formats, are not too helpful in an HTML context. Let's simply state that there is a trade-off between extended features (RDFa) and simplicity (Microdata). So, what are the core features that an RDFer would really need beyond Microdata:
<ul><li>the possibility to preserve markup, but probably not necessarily as an explicit rdf:XMLLiteral</li>
<li>datatypes for literal objects (I personally never used them in practice in the last 6 years that I've been developing RDF apps, but I can see some use cases)</li></ul>

Markup preservation is currently turned on by default in RDFa and can be disabled through @datatype, so an RDFer-satisfying RDFa 1.1 spec could probably just be Microdata + @datatype + a few extended parsing rules to end up with the intended RDF. My experience with watching RDF spec creation tells me that the RDFa group won't pick this route (there simply is no &amp;quot;<a href="http://www.slideshare.net/dmc500hats/startup-metrics-for-pirates-fowa-london-oct-2009">Kill a Feature</a>&amp;quot; mentality in the RDF community), but hey, hope dies last.<br />
<br />
I've been using Microdata in two of my recent RDF apps and the CMS module of (ahem, still not documented) Trice, and it's been a great experience. <a href="http://arc.semsol.org/">ARC</a> is going to get a &amp;quot;microRDF&amp;quot; extractor that supports the RDF-in-Microdata markup below (Note: this output still requires a 2nd extraction process, as the current Microdata draft's RDF mechanism only produces intermediate RDF triples, which then still have to be post-processed. I hope <a href="http://lists.w3.org/Archives/Public/public-html/2010Jan/0912.html">my related suggestion</a> will become official, but I seem to be the only pro-Microdata RDFer on the HTML list right now, so it may just stay as a convention):
<br />
<br />
<strong>Microdata</strong>:
<pre class="code">&amp;lt;div itemscope itemtype=&amp;quot;<strong>http://xmlns.com/foaf/0.1/</strong>Person&amp;quot;&amp;gt;

  &amp;lt;!-- plain props are mapped to the itemtype's context --&amp;gt;
  &amp;lt;img itemprop=&amp;quot;<strong>img</strong>&amp;quot; src=&amp;quot;mypic.jpg&amp;quot; alt=&amp;quot;a pic of me&amp;quot; /&amp;gt;
  My name is &amp;lt;span itemprop=&amp;quot;<strong>name</strong>&amp;quot;&amp;gt;&amp;lt;span itemprop=&amp;quot;<strong>nick</strong>&amp;quot;&amp;gt;Alec&amp;lt;/span&amp;gt; Tronnick&amp;lt;/span&amp;gt;
  and I blog at &amp;lt;a itemprop=&amp;quot;<strong>weblog</strong>&amp;quot; href=&amp;quot;http://alec-tronni.ck/&amp;quot;&amp;gt;alec-tronni.ck&amp;lt;/a&amp;gt;.

  &amp;lt;!-- other RDF vocabs can be used via full itemprop URIs --&amp;gt;
  &amp;lt;span itemprop=&amp;quot;<strong>http://purl.org/vocab/bio/0.1/olb</strong>&amp;quot;&amp;gt;
    I'm a crash test dummy for semantic HTML.
  &amp;lt;/span&amp;gt;
&amp;lt;/div&amp;gt;
</pre>

<strong>Extracted RDF</strong>:
<pre class="code">@base &amp;lt;http://host/path/&amp;gt; .
@prefix foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt; .
@prefix bio: &amp;lt;http://purl.org/vocab/bio/0.1/&amp;gt; .
_:bn1 a foaf:Person ;
      foaf:img &amp;lt;mypic.jpg&amp;gt; ;
      foaf:name &amp;quot;Alec Tronnick&amp;quot; ;
      foaf:nick &amp;quot;Alec&amp;quot; ;
      foaf:weblog &amp;lt;http://alec-tronni.ck/&amp;gt; ;
      bio:olb &amp;quot;I'm a crash test dummy for semantic HTML.&amp;quot; .
</pre>
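<br />
If the outer <code>div</code> also declared an explicit subject, e.g. <code>itemid=&amp;quot;http://alec-tronni.ck/#me&amp;quot;</code> (a made-up URI, purely for illustration), I would expect the same convention to produce identical triples with that URI in place of the blank node. This is a sketch under that assumption, not output from an actual extractor run:
<pre class="code">@base &amp;lt;http://host/path/&amp;gt; .
@prefix foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt; .
@prefix bio: &amp;lt;http://purl.org/vocab/bio/0.1/&amp;gt; .
# subject taken from the (hypothetical) itemid instead of _:bn1
&amp;lt;http://alec-tronni.ck/#me&amp;gt; a foaf:Person ;
      foaf:img &amp;lt;mypic.jpg&amp;gt; ;
      foaf:name &amp;quot;Alec Tronnick&amp;quot; ;
      foaf:nick &amp;quot;Alec&amp;quot; ;
      foaf:weblog &amp;lt;http://alec-tronni.ck/&amp;gt; ;
      bio:olb &amp;quot;I'm a crash test dummy for semantic HTML.&amp;quot; .
</pre>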


      </div>
    </content>
  </entry>

  <entry>
    <title>Naming Properties and Relations (comment)</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/09/15/naming-properties-and-relations_comment"/>
    <id>http://bnode.org/blog/2009/09/15/naming-properties-and-relations_comment</id>
    <published>2009-09-15T17:45Z</published>
    <updated>2009-09-17T10:00:03Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>A local comment to JeniT's post about predicate names</summary>
    <category term="graphnote"/>
    <category term="microblogging"/>
    <category term="rdf"/>
    <category term="semantic logging"/>
    <category term="semanticweb"/>
    <category term="ui"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
I was incapable of adding a comment to <a href="http://www.jenitennison.com/blog/node/128">Jeni's interesting post about RDF predicate Names</a> (markdown-related, my fault), so I'll quickly post it here, as I'm pondering similar things, too.<br />
<br />
In her post, Jeni explores the issues around naming RDF terms. The community has gathered some experience and a number of suggestions over the last few years; some entry points are:
<ul><li><a href="http://dig.csail.mit.edu/breadcrumbs/node/72">Backward and Forward links in RDF just as important</a></li>
<li><a href="http://esw.w3.org/topic/HasPropertyOf">HasPropertyOf (ESW Wiki)</a></li>
<li><a href="http://esw.w3.org/topic/RoleNoun">RoleNoun (ESW Wiki)</a>
</li></ul><br />

I personally find &amp;quot;role-noun&amp;quot; easier to support in RDF apps than the older hasPropertyOf (now often considered anti-)pattern. And inverse properties are just painful, as they usually require some form of inference to streamline the user experience. <br />
<br />
Not sure if that's helpful information, but for a project around semantic note-taking/logging, I played with different notations users might be comfortable with, for entering factoids using an unstructured input form (à la Twitter). I could identify the following patterns that still seemed to be acceptable (as shared/supported syntax). All of them can be implemented using role-noun predicates (assuming that predicate labels are similar to the predicate names):
<ul><li>SUBJECT'(s)? PREDICATE (:|is) OBJECT</li>
<li>OBJECT is SUBJECT'(s)? PREDICATE</li>
<li>OBJECT is (the)? PREDICATE of SUBJECT</li>
<li>SUBJECT has PREDICATE (:)? OBJECT</li>
<li>(the|a)? PREDICATE(s)? of SUBJECT (is|are) OBJECT ((,|and|&amp;) OBJECT)*</li></ul>
(There are more patterns, for things like tagging and typing, but the examples above are the predicate-related grammar rules).<br />
<br />
As soon as you add (has|is|of) to one PREDICATE, you get problems with the other notations, so role-noun seems to be a good fit.<br />
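<br />
As a small illustration (the <code>:employer</code> predicate, the names, and the namespace are invented for this example), each of the five notations above could be written for one and the same factoid, and all of them would come down to a single triple with a role-noun predicate:
<pre class="code">@prefix : &amp;lt;http://example.org/notes#&amp;gt; .

# "Alice's employer: ACME"
# "ACME is Alice's employer"
# "ACME is the employer of Alice"
# "Alice has employer ACME"
# "the employer of Alice is ACME"
:Alice :employer :ACME .
</pre>
With a predicate named <code>hasEmployer</code> or <code>isEmployerOf</code> instead, the label would only read naturally in some of these sentences.<br />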
<br />
Unfortunately, one (non-trivial) problem remains: People (and Web 2.0 apps) also like 'SUBJECT PREDICATE_VERB OBJECT' (e.g. &amp;quot;likes&amp;quot;, &amp;quot;bookmarked&amp;quot;, &amp;quot;said&amp;quot;, &amp;quot;posted&amp;quot;, &amp;quot;is listening to&amp;quot; ...) and I don't have a proper idea how to handle those automatically yet, other than hard-coding support for the typical social media verbs. It could be possible to use WordNet to detect verbs and derive a canonicalized form, and then model those patterns as activities (activity = liking, bookmarking, saying, posting, listening, plus ACTIVITY_PERSON and ACTIVITY_TARGET or somesuch). If anyone has a suggestion, I'd be happy to hear it.
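<br />
<br />
To make the activity idea a bit more concrete, here is how a note like &amp;quot;Alice bookmarked http://example.org/article&amp;quot; might end up being modeled. The class and predicate names simply mirror the placeholders mentioned above and are assumptions, not terms from an existing vocabulary:
<pre class="code">@prefix : &amp;lt;http://example.org/notes#&amp;gt; .

# "Alice bookmarked http://example.org/article"
# the verb "bookmarked" is canonicalized to the activity "Bookmarking"
[] a :Bookmarking ;
   :activityPerson :Alice ;
   :activityTarget &amp;lt;http://example.org/article&amp;gt; .
</pre>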


      </div>
    </content>
  </entry>

  <entry>
    <title>New ARC2 release</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/08/21/new-arc2-release"/>
    <id>http://bnode.org/blog/2009/08/21/new-arc2-release</id>
    <published>2009-08-21T10:20Z</published>
    <updated>2009-08-21T10:40:22Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>Finally in sync with code.semsol.org and the BZR repository</summary>
    <category term="arc2"/>
    <category term="bzr"/>
    <category term="release"/>
    <category term="semanticweb"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
I moved <a href="http://code.semsol.org/source/arc/">ARC's codebase</a> to a BZR repository <a href="http://bnode.org/blog/2009/06/22/code-semsol-org-a-central-home-for-semsol-code">2 months ago</a> but didn't really find the time to synchronize it with the way I created bundles in the past. Today I finally linked the repository and its TGZ creation feature from the <a href="http://arc.semsol.org/download/">main download page</a>. This is the first bundle since March, so there are quite a number of fixes. Some tweaks were not logged, but from now on, the process should be more professional (thanks to the proper versioning system).<br />
<br />
Here is the raw list of changes; the most interesting ones are probably the improved RDFa extractor (cheers to Toby Inkster and Masahide Kanzaki for code) and the new auto-cleanup of unused values/hashes in the RDF store. I received a couple more patches which will be integrated in the coming weeks:
<ul><li>new component: Resource </li>
<li>new method: completeQuery (PREFIX-injection)</li>
<li>Reader: new method: getResponseHeaders</li>
<li>RDFa: fixes, +3 test case PASSes (thx to Toby Inkster &amp; Masahide Kanzaki)</li>
<li>Class: auto-populate POST (php5 bug)</li>
<li>Class: refactored *PName methods</li>
<li>new methods: toIndex, toTriples, checkRegex</li>
<li>Parsers: unsetting reader object to fix garbage collection</li>
<li>SelectQueryHandler: improved LIKE-check for REGEX-rewriting</li>
<li>Class: used prefixes were not logged, leading to serialization gaps</li>
<li>Class: fixed root calculation bug in calcURI</li>
<li>Class: new methods: toDataURI/fromDataURI</li>
<li>ARC2_SPARQLScriptProcessor: improved automatic PREFIX injection</li>
<li>ARC2_RemoteStore: added automatic PREFIX injection and getResourceLabel method</li>
<li>ARC2_StoreSelectQueryHandler: fixed missing brackets in getExpressionSQL.</li>
<li>Reader: Improved timeout handling</li>
<li>Reader: support for port in http header (thx to Roan O'Sullivan)</li>
<li>Slowly starting to switch to inline PHPDoc documentation</li>
<li>Atom_Parser: Addition: support for link types</li>
<li>DeleteQueryHandler: Addition: cleanValueTables method (auto-called every 500 DELETE queries)</li>
<li>Class: new method: resetErrors</li>
<li>Class: switch from getScriptURI to getRequestURI in init()</li></ul>
<br />
<strong>In related news:</strong>
<ul><li>Tuukka Hastrup created an <a href="http://tuukka.sioc-project.org/arc2-starter-pack/">ARC 2 Starter Pack</a> that simplifies the process of setting up an ARC store.</li>
<li>Andrew Ritz created a <a href="http://twilight-labs.co.cc/blog/?p=10">WordPress extension</a> that lets you embed results from remote SPARQL endpoints directly in your blog pages.
</li></ul>
      </div>
    </content>
  </entry>

  <entry>
    <title>SKOS + DC + Linked Data = Semantic Tagging?</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/08/19/skos-dc-linked-data-semantic-tagging"/>
    <id>http://bnode.org/blog/2009/08/19/skos-dc-linked-data-semantic-tagging</id>
    <published>2009-08-19T12:35Z</published>
    <updated>2009-08-19T13:04:42Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>Using Dublin Core terms to link SKOS concepts to Linked Data entities</summary>
    <category term="dc"/>
    <category term="dcmi"/>
    <category term="faviki"/>
    <category term="semanticweb"/>
    <category term="skos"/>
    <category term="tagging"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
Still looking for a simple way to tag concrete resources (to-do items, people, locations) with personal concepts (e.g. &amp;quot;non-profit&amp;quot;, &amp;quot;research&amp;quot;, &amp;quot;semweb&amp;quot;), and <strong>also</strong> with other non-conceptual resources (clients, projects), I skimmed through the fresh <a href="http://www.w3.org/TR/skos-reference/">SKOS Recommendation</a>. I'm still a fan of SKOS and frequently wonder about semweb apps where the internal models are grounded in pluggable, personal(!) SKOS schemes, instead of coordination-intensive RDF Schemas or OWL ontologies. I don't know if such an approach could really work; I guess network effects benefit more from rather tightly defined relations and identifiers. Mainly just to have it written down somewhere (this is really not well thought out yet), here are some of the related entry points and considerations:<br />
<ul><li><strong>Tagging should be personal.</strong><br />
While I like the idea of grounding tags in existing dictionaries such as DBpedia, tags seem to work best when they are as user-defined and informal as possible. Last year, I experimented with a tool that allowed me to tag things with other people's delicious tags. It just felt wrong; I wanted my &amp;quot;own&amp;quot; tags. (I think the latest <a href="http://www.faviki.com">Faviki</a> release is a nice example of combining the best of both worlds).</li>
<li><strong>SKOS supports personal tags</strong><br />
Concepts in SKOS are sort-of scoped (or &amp;quot;namespaced&amp;quot;). If I describe a &amp;quot;Fun&amp;quot; concept, it is defined as seen by the creator of the concept URI, i.e. I can annotate it with '<code>:Fun dct:creator &amp;lt;#me&amp;gt; ; dct:created &amp;quot;2009-08-19&amp;quot;</code>' etc, even though the general idea of Fun was clearly not invented by me, and definitely before today.</li>
<li><strong>Tags should be safely portable</strong><br />
Thanks to URIs, SKOS concepts can be ported to other applications, and they can be grouped and organized in so-called concept schemes, i.e. I could have a &amp;quot;Waving&amp;quot; in a &amp;quot;Dance&amp;quot; concept scheme, and also in a &amp;quot;Netiquette&amp;quot; scheme.</li>
<li><strong>There is a need to merge tag sets</strong><br />
If tags are used to organize all sorts of personal things, it should be possible to merge them into a unified model. Mainly for personal use (&amp;quot;personal world view&amp;quot;), but also for sharing with other people and linking to their views. This is again possible thanks to SKOS being based on RDF, URIs, and very loose semantics.</li>
<li><strong>There is a need to tag real-world objects with concepts</strong><br />
This is partly obvious. Tags are a means to an end. But while they are already widely used to annotate document-like resources (web pages, photos, etc), I'd also like to tag things like my projects, people in my address book, and similar non-documents. From the <a href="http://www.w3.org/TR/skos-primer/#secindexing">SKOS Primer</a>:
<cite class="cite">While the SKOS vocabulary itself does not include a mechanism for associating an arbitrary resource with a skos:Concept, implementors can turn to other vocabularies</cite> So, whatever predicate URI we are going to use, it's not going to be provided by SKOS directly. </li>
<li><strong>Maybe Dublin Core terms can link non-documents to concepts</strong><br />
This is a slightly controversial conclusion/assumption, given that DC terms are mainly associated with document metadata. But after exploring the <a href="http://dublincore.org/">DCMI website</a>, I can't find any clear evidence that their terms can't be used more generally. Both the <a href="http://dublincore.org/documents/usageguide/">Usage Guide</a> (thanks to <a href="http://twitter.com/_masaka/status/3403593488">Masahide</a> for the pointer) and the <a href="http://dublincore.org/documents/abstract-model/">Abstract Model</a> actually support this thought. The Usage Guide mentions that &amp;quot;DC metadata can be applied to other resources as well&amp;quot; (but notes that the suitability may depend on the particular context at hand), and the Abstract Model states that the notion of a Dublin Core &amp;quot;resource&amp;quot; is equivalent to &amp;quot;Resource&amp;quot; defined in <a href="http://www.w3.org/2000/01/rdf-schema#">RDF Schema</a>, which can be anything, even including Literals. So, we can most probably use <code>dct:subject</code> or <code>dct:relation</code> to tag a project or person with a SKOS concept. </li>
<li><strong>There is a need to associate concepts with real-world objects</strong><br />
If we organize our personal concept space with SKOS, we may also want to more formally specify our personal concepts, so that other applications or people can merge them with their tags. Therefore, we need a predicate that can relate concepts to non-concepts such as <a href="http://dbpedia.org/">DBPedia</a> identifiers. Such a mechanism could maybe also help with RDF's general problem of URI aliases. I could have a personal, canonical concept URI for a resource and use it as a container for the resource's various aliases. Again, SKOS does not provide a predicate for this use case, so we've got to look elsewhere. </li>
<li><strong>Maybe Dublin Core terms can link concepts to real-world objects</strong><br />
Another possibly controversial conclusion, but again there is supporting text in the <a href="http://dublincore.org/documents/abstract-model/#sect-4">DCMI specs</a>: &amp;quot;<cite class="cite">A value associated with the Dublin Core Subject property is a concept (a conceptual entity) or a physical object or person (a physical entity)</cite>&amp;quot;. So, if the value of dct:subject can be a non-document, we can say things like <code>:Berlin a skos:Concept; dct:subject dbpedia:Berlin .</code>. This is very interesting because it could allow us to use dct:subject in both ways: for the tagging of things, and also for grounding tags. FOAF has a handy <a href="http://xmlns.com/foaf/spec/#term_primaryTopic">primaryTopic</a> term, which could work in this context, too, but unfortunately, its domain is (currently) set to foaf:Document. <a href="http://danbri.org/">DanBri</a> also suggested the creation of a dedicated <code>skos:it</code> (or similar) predicate, which would be even better. </li>
<li><strong>Sometimes I'd like to &amp;quot;tag&amp;quot; real-world objects with real-world objects</strong><br />
Don't know if <em>tagging</em> is still the right word here, but what I mean is a generic relation for arbitrary things in a common application context. Often, we can do better by specifying the relation between two resources, but in other cases, a simple, maybe just temporary link, is better than laziness leading to a completely non-annotated resource. Given the two DCMI-related findings above, we could maybe conclude that a predicate like dct:relation can also be used to relate a project to a person, or the other way round, without having to invent a new predicate.
</li></ul>
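<br />
To make the list above a little more concrete, here is a rough Turtle sketch of the three patterns: grounding a personal concept, tagging non-documents with it, and a generic thing-to-thing link. The <code>ex:</code> namespace and everything in it (<code>ex:Berlin</code>, <code>ex:myProject</code>, <code>ex:me</code>) are made up for illustration, and whether <code>dct:subject</code> and <code>dct:relation</code> are really appropriate here is exactly the open question discussed above:<br />
<pre class="code">@prefix skos:    &amp;lt;http://www.w3.org/2004/02/skos/core#&amp;gt; .
@prefix dct:     &amp;lt;http://purl.org/dc/terms/&amp;gt; .
@prefix dbpedia: &amp;lt;http://dbpedia.org/resource/&amp;gt; .
@prefix ex:      &amp;lt;http://example.org/tags/&amp;gt; .   # hypothetical personal namespace

# A personal concept, grounded in a real-world identifier
ex:Berlin a skos:Concept ;
    skos:prefLabel "Berlin" ;
    dct:subject dbpedia:Berlin .

# Tagging non-documents (a project, a person) with that concept
ex:myProject dct:subject ex:Berlin .
ex:me        dct:subject ex:Berlin .

# A generic thing-to-thing link, for when no more specific predicate is at hand
ex:myProject dct:relation ex:me .</pre>
Note that <code>dct:subject</code> appears in both directions here (thing to concept, concept to thing), which is the double use suggested above.<br />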

&amp;lt;/brain:dump&amp;gt;

      </div>
    </content>
  </entry>

  <entry>
    <title>SemWeb T-Shirt Shop closed</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/07/24/semweb-t-shirt-shop-closed"/>
    <id>http://bnode.org/blog/2009/07/24/semweb-t-shirt-shop-closed</id>
    <published>2009-07-24T08:50Z</published>
    <updated>2009-07-24T08:57:48Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>I've closed the Spreadshirt shop we set up a year ago, due to lack of interest.</summary>
    <category term="semanticweb"/>
    <category term="shop"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
Just a quick FYI: I've closed the <a href="http://bnode.org/blog/2008/06/23/semantic-web-community-shop-now-open">SemWeb Spreadshirt Shop</a> from last year. I never had a payout (you have to reach a certain amount of profit before you earn actual money), and as I plan/have to discontinue most of my many pet projects anyway (Simplify Your Life etc.), this one was rather easy to start with.<br />
<br />
I guess my <a href="http://twitter.com/bengee">red semweb cap</a> just became a rarity ;)
      </div>
    </content>
  </entry>

  <entry>
    <title>The Semantic Web - Not a piece of cake...</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/07/08/the-semantic-web-not-a-piece-of-cake"/>
    <id>http://bnode.org/blog/2009/07/08/the-semantic-web-not-a-piece-of-cake</id>
    <published>2009-07-08T14:55Z</published>
    <updated>2009-07-08T15:05:14Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>The SemWeb layercake diagram as an isometric infographic</summary>
    <category term="infographics"/>
    <category term="isometric"/>
    <category term="layer cake"/>
    <category term="semanticweb"/>
    <category term="stack"/>
    <category term="technologies"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
For a client project I've been looking at <a href="http://en.wikipedia.org/wiki/Isometric_projection">Isometric Projection</a>, which is not only nice for mapping 3D objects to a 2D environment, but even more so for adding a 3rd dimension to (previously) flat visual objects. The additional axis allows for much more information to be provided, without (necessarily ;) sacrificing compactness and simplicity.<br />
<br />
While I was pushing small boxes around on a 30° grid, <a href="http://twitter.com/jahendler/statuses/2489423431">Jim Hendler tweeted</a> about his Layer Cake talk from the recent Dagstuhl meeting (which is awesome, BTW. <a href="http://www.cs.rpi.edu/~hendler/presentations/LayercakeDagstuhl-share.pdf">Read it</a>, if you haven't yet) and I started to wonder if an isometric version of the tech stack could help reduce the overload resulting from the current two-dimensional ones. Not really, I fear, but it was a fun experiment nonetheless. Might be worth exploring this a little further. At least the concepts can be separated from specific technologies and the application layer has a different angle than before (which I personally think makes more sense). Anyway, just wanted to share the result. Enjoy.<br />
<br />
<a href="http://bnode.org/media/2009/07/08/semantic_web_technology_stack.png"><img src="http://bnode.org/media/2009/07/08/semantic_web_technology_stack_small.png" title="Semantic Web Technology Stack" alt="Semantic Web Technology Stack" /></a><br />
<br />
Feel free to <a href="http://creativecommons.org/licenses/by/3.0/">use and share</a>.

      </div>
    </content>
  </entry>

  <entry>
    <title>Code.semsol.org - A central home for semsol code</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/06/22/code-semsol-org-a-central-home-for-semsol-code"/>
    <id>http://bnode.org/blog/2009/06/22/code-semsol-org-a-central-home-for-semsol-code</id>
    <published>2009-06-22T14:00Z</published>
    <updated>2009-06-23T06:37:56Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>Semsol gets code repositories and browsers</summary>
    <category term="arc"/>
    <category term="bzr"/>
    <category term="repository"/>
    <category term="semanticweb"/>
    <category term="semsol"/>
    <category term="trice"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
The code bundles on the <a href="http://arc.semsol.org/">ARC website</a> are generated in an inefficient manual process, and each patch has to wait for the next to-be-generated zip file. The developer community is growing (there are now 600 ARC downloads each month), I'm increasingly receiving patches and requests for a proper repository, and the <a href="http://trice.semsol.org/">Trice framework</a> is about to get online as well. So I spent last week on building a dedicated source code site for all <a href="http://semsol.com/">semsol</a> projects at <a href="http://code.semsol.org/">code.semsol.org</a>.<br />
<br />
So far, it's not much more than a directory browser with source preview and a little method navigator. But it will simplify code sharing and frequent updates for me, and hopefully also for ARC and Trice developers. You can check out various <a href="http://bazaar-vcs.org/">Bazaar</a> code branches and generate a bundle from any directory. The app can't display repository messages yet (the server doesn't have bzr installed, I'm just deploying branches using the handy FTP option), but I'll try to come up with a workaround or an alternative when time permits.<br />
<br />
<a href="http://code.semsol.org/"><img src="http://bnode.org/media/2009/06/22/code_browser.gif" title="Code Browser" alt="Code Browser" /></a>
      </div>
    </content>
  </entry>

  <entry>
    <title>CommonTag too complicated?</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/06/12/commontag-too-complicated"/>
    <id>http://bnode.org/blog/2009/06/12/commontag-too-complicated</id>
    <published>2009-06-12T11:25Z</published>
    <updated>2009-06-12T12:08:13Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>Not sure if the commontag effort sends the right message.</summary>
    <category term="commontag"/>
    <category term="microformats"/>
    <category term="modeling"/>
    <category term="semanticweb"/>
    <category term="tagging"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
<strong>Update: </strong> I just read the spec again, and I can't tag non-content with the CommonTag vocabulary. Too bad. Please ignore the last paragraph.
<div class="hr"><hr /></div>
Sorry for raising my voice here, but some of us are really working hard to show that SemWeb technologies <em>don't</em> have to be complicated, and unfortunately, the new <a href="http://commontag.org/">CommonTag</a> effort seems to send exactly the opposite message.<br />
<br />
Don't get me wrong, a widely used tagging ontology would be great. We do have 3 (or 4? 5?) tagging vocabularies already, but none really caught on, possibly because tagging is meant to be simple and the proposed solutions apparently weren't easy enough. CommonTag is promoted as being &amp;quot;simple&amp;quot; and &amp;quot;easy&amp;quot;, but after looking at the examples in the <a href="http://www.commontag.org/QuickStartGuide">QuickStart Guide</a>, I'm not so sure:<br />
<ul><li>The snippets are really off-putting (not only for Non-RDFers). Do I really need multiple nested HTML nodes to create something as simple as a tag? </li>
<li>Couldn't the term names be more intuitive? What could a ctag:Tag be? The actual tag or an intermediate resource that is then, err, tagged? A person ctag:tagged a resource, right? Ah, no.</li>
<li>Why aren't the term names at least consistent? &amp;quot;ctag:taggingDate&amp;quot; follows noun-role, &amp;quot;ctag:tagged&amp;quot; is a dunno, &amp;quot;ctag:means&amp;quot; is a present-form verb, &amp;quot;ctag:isAbout&amp;quot; sort-of follows the hasPropertyOf anti-pattern.</li>
<li>The vocabulary introduces aliases for well-deployed terms such as rdfs:label and dct:created, which makes its use in practical settings expensive (it'll ease things on the author side, though).</li></ul>
<br />
To be a little more constructive: Using the vocabulary doesn't have to lead to the complicated markup seen in the examples. I'm sure they'll soon get better snippets from someone in the RDFa community. And apart from that, there is also a handy term in the <a href="http://commontag.org/ns#">RDF Schema</a> which might just be what you are looking for: &amp;quot;ctag:isAbout&amp;quot;. It lets you directly point from a resource (default is the page) to a Linked Data identifier (e.g. from DBPedia), without the need for all those intermediate nodes (which lead to triple bloat and slow down SPARQL queries). CommonTag-consuming apps will have to implement some form of inferencing to handle &amp;quot;isAbout&amp;quot;, but as the term is in the spec, I assume they plan to.<br />
<br />
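As a rough illustration (in Turtle rather than RDFa, with a made-up page URI, and with the term usage based on my reading of the QuickStart examples, so treat it as a sketch and not as canonical CommonTag), the full tag structure and the isAbout shortcut compare like this:<br />
<pre class="code">@prefix ctag:    &amp;lt;http://commontag.org/ns#&amp;gt; .
@prefix dbpedia: &amp;lt;http://dbpedia.org/resource/&amp;gt; .

# Full structure: page -&amp;gt; intermediate tag node -&amp;gt; identifier (four triples)
&amp;lt;http://example.org/page&amp;gt; ctag:tagged [
    a ctag:Tag ;
    ctag:label "semantic web" ;
    ctag:means dbpedia:Semantic_Web
] .

# Shortcut: page -&amp;gt; identifier (one triple)
&amp;lt;http://example.org/page&amp;gt; ctag:isAbout dbpedia:Semantic_Web .</pre>
<br />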
Granular modeling of tags is apparently tricky, but shouldn't there be some sweet spot? Something a little more expressive than rel-tag but less complex than a fully spec'd Tag ontology? <a href="http://microformats.org/wiki/xfolk">xFolk</a> looks promising, or maybe the CommonTag group members could have agreed on formalizing and supporting &amp;quot;scoped rel-tag&amp;quot; (rel-tags with an optional RDFa &amp;quot;about&amp;quot; container). Most rel-tag-to-RDF converters have some form of scoping already anyway (because tags can apply to reviews, pages, vcards, etc.). <em>That</em> would have been a cool outcome after 1 year of stealth work.<br />
<br />
I may well be over-stressing the simplicity aspect here. Maybe CommonTag is &amp;quot;simple enough&amp;quot; for web publishers. There are some initial supporters, and for RDFers, the nested structures and bnodes will most probably be acceptable. So let's see how things evolve.<br />
<br />
<s>I personally think I'll have a closer look at ctag:isAbout. I'm still looking for an alternative to dc/dct:subject to tag arbitrary things with arbitrary identifiers, maybe CommonTag can provide it, although<br />
<pre class="code">&amp;lt;#me&amp;gt; ctag:isAbout dbpedia:Semantic_Web .</pre>
still doesn't sound right for a rich tag, and the domain is &amp;quot;ctag:TaggedContent&amp;quot; which sounds wrong for non-textual resources, too. (<a href="http://dublincore.org/documents/dcmi-terms/#terms-relation">dct:relation</a> is the best I could find so far for tagging things with things, but Dublin Core is coming from a publishing context and is therefore often recommended for describing publications only).<br />
</s>
      </div>
    </content>
  </entry>

  <entry>
    <title>ESWC 2009 Linked Data Dashboards</title>
    <link rel="alternate" type="text/html" hreflang="en" href="http://bnode.org/blog/2009/06/04/eswc-2009-linked-data-dashboards"/>
    <id>http://bnode.org/blog/2009/06/04/eswc-2009-linked-data-dashboards</id>
    <published>2009-06-04T13:20Z</published>
    <updated>2009-06-04T19:19:04Z</updated>
    <author>
      <name>Benjamin Nowack</name>
    </author>
    <summary>A first Paggr application went live during ESWC2009.</summary>
    <category term="confx"/>
    <category term="dashboards"/>
    <category term="eswc2009"/>
    <category term="linked data"/>
    <category term="paggr"/>
    <category term="semanticweb"/>
    <category term="sparqlets"/>
    <category term="sparqlscript"/>
    <category term="widgets"/>
    <content type="xhtml" xml:lang="en">
      <div xmlns="http://www.w3.org/1999/xhtml">
In case you missed the tweets or a local announcement: The first <a href="http://paggr.com/about">Paggr</a> application went online a few days ago. This year's <a href="http://eswc2009.org/">ESWC</a> Technologies Team pushed things a little further, with <a href="http://social.eswc2009.org/">RFID tracking</a> during the event and extended <a href="http://data.semanticweb.org/conference/eswc/2009">conference data</a> that includes detailed session and date/time information (kudos to Michael Hausenblas for RDFizing even PDFs).<br />
<br />
Based on this dataset, we provided a <a href="http://personal.eswc2009.org/">conference explorer</a> and stress-tested the <a href="http://data.semanticweb.org/">&amp;quot;Dog Food&amp;quot;</a> server while at it. The system survived, but I also learned a lot. We used about 50 RDF stores for the different public and user-specific dashboards, which basically worked nicely. However, rendering non-ugly resource summaries requires a bit of endpoint hammering, and some of the more complex path queries resulted in timeouts. Yesterday, I had to create a mirror from the <a href="http://data.semanticweb.org/dumps/conferences/">data dump</a> to route a couple of widgets through a replicated (ARC :-) endpoint. But then this is also one of the powerful possibilities that come with semantic web technologies. You can often switch or double the back-end repository in no time, and without any code changes. (And as all the Sparqlets are created in a <a href="http://personal.eswc2009.org/widgets/">web-based tool</a>, I didn't even have to upload a changed configuration file. I simply tweaked a SPARQLScript parameter.)<br />
<br />
Anyway, there are a couple of <a href="http://personal.eswc2009.org/live">public</a> <a href="http://personal.eswc2009.org/">dashboards</a>, in case you'd like to give it a try (it's still an early version). I also embedded a short screencast below. The system is going to be moved to a <a href="http://deri.ie/">DERI</a> server when the conference is over, but the URIs and data will probably stay stable. (And no, it won't really work with IE yet.) More to come!<br />
 <br />
<object width="550" height="385"><param name="movie" value="http://www.youtube.com/v/D7V4YNJHWwU&amp;hl=de&amp;fs=1&amp;color1=0x006699&amp;color2=0x54abd6"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/D7V4YNJHWwU&amp;hl=de&amp;fs=1&amp;color1=0x006699&amp;color2=0x54abd6" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="550" height="385"></embed></object>
<br />
<br />
<a href="http://paggr.com/media/2009/06/eswc2009.mov">HQ version (quicktime, 110MB)</a>


      </div>
    </content>
  </entry>

</feed>
