
Posts tagged with: paggr

Semantic WYSIWYG in-place editing with Swipe

Introducing Swipe, Paggr's Microdata editor.
Several months ago (ugh, time flies) I posted a screencast demoing a semantic HTML editor. Back then I used a combination of client-side and server-side components, which, I have to admit, led to quite a number of unnecessary server round-trips.

In the meantime, others have shown that powerful client-side editors can be implemented on top of HTML5, and so I've now rewritten the whole thing and turned it into a pure JavaScript tool as well. It now supports inline WYSIWYG editing and HTML5 Microdata annotations.

The code is still at beta stage, but today I put up an early demo website which I'll use as a sandbox. The editor is called Swipe (like the dance move, but it's an acronym, too). What makes Swipe special is its ability to detect the caret coordinates even when the cursor is inside a text node, which is usually not possible with W3C range objects. This little difference enables several new possibilities, like precise in-place annotations or "linked-data-as-you-type" functionality for user-friendly entity suggestions. More to come soon...

Swipe - Semantic WYSIWYG in-place editor

Dynamic Semantic Publishing for any Blog (Part 2: Linked ReadWriteWeb)

A DSP proof of concept using ReadWriteWeb.com data.
The previous post described a generic approach to BBC-style "Dynamic Semantic Publishing", and I wondered whether it could be applied to basically any weblog.

Over the last few days, I spent some time on a test evaluation and demo system using data from the popular ReadWriteWeb tech blog. The application is not public (I don't want to upset the content owners and don't have a spare server anyway), but you can watch a screencast (embedded below).

The application I created is a semantic dashboard which generates dynamic entity hubs and lets you explore RWW data along multiple dimensions. To be honest, I was pretty surprised myself by the dynamics of the data. When I switched back to the official site after using the dashboard for a while, I found myself missing the advanced filtering options.



In case you are interested in the technical details, fasten your data seatbelt and read on.

Behind the scenes

As mentioned, the framework is supposed to make things easy for site maintainers and should work with plain HTML as input. Direct access to internal data structures of the source system (database tables, post/author/commenter identifiers, etc.) should not be needed. Even RDF experts don't have much experience with the side effects of semantic systems hooked directly into running applications. And with RDF encouraging loosely coupled components anyway, it makes sense to keep the semantification on a separate machine.

In order to implement the process, I used Trice (once again), which supports simple agents out of the box. The bot-based approach already worked quite nicely in Talis' FanHubz demonstrator, so I followed this route here, too. For "Linked RWW", I only needed a very small number of bots, though.

Trice Bot Console

Here is a quick re-cap of the proposed dynamic semantic publishing process, followed by a detailed description of the individual components:
  • Index and monitor the archives pages, build a registry of post URLs.
  • Load and parse posts into raw structures (title, author, content, ...).
  • Extract named entities from each post's main content section.
  • Build a site-optimized schema (an "ontology") from the data structures generated so far.
  • Align the extracted data structures with the target ontology.
  • Re-purpose the final dataset (widgets, entity hubs, semantic ads, authoring tools).

Archives indexer and monitor

The archives indexer fetches the by-month archives, extracts all link URLs matching the "YYYY/MM" pattern, and saves them in an ARC Store.
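
To make this more concrete, here is a heavily condensed sketch of such an indexer bot. It assumes ARC2's getStore()/query() API; the store settings, graph URI, and vocabulary term are made-up examples, and the real bot additionally handles pagination and re-runs.

    <?php
    /* Condensed archives indexer sketch (assumes ARC2; the store settings,
       graph URI, and vocabulary term are placeholders). */
    include_once('arc/ARC2.php');

    $store = ARC2::getStore(array(
      'db_name' => 'linked_rww', 'db_user' => 'user', 'db_pwd' => 'pwd',
      'store_name' => 'rww',
    ));

    $html = file_get_contents('http://www.readwriteweb.com/archives/');

    /* collect all link targets that match the "YYYY/MM" archives pattern */
    preg_match_all('#href="([^"]+/archives/\d{4}/\d{2}[^"]*)"#', $html, $m);

    $triples = '';
    foreach (array_unique($m[1]) as $url) {
      $triples .= '<' . $url . '> a <http://example.org/terms#ArchivesPage> . ';
    }

    /* save the page URIs in a dedicated graph via ARC2's SPARQL+ INSERT */
    $store->query('INSERT INTO <http://example.org/graphs/archives> { ' . $triples . ' }');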

The implementation of this bot was straightforward (less than 100 lines of PHP code, including support for pagination); this is clearly something that could be turned into a standard component for common blog engines. The result is a complete list of archives pages (so far still without any post URLs) which can be accessed through the RDF store's built-in SPARQL API:

Archives triples via SPARQL

A second bot (the archives monitor) receives either a not-yet-crawled index page (if available) or the most current archives page as a starting point. Each post link of that page is then extracted and used to build a registry of post URLs. The monitoring bot is called every 10 minutes and keeps track of new posts.
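
A similarly condensed sketch of the monitor could look like the following; the "crawled" flag, the post URL pattern, and the graph names are again assumptions rather than the vocabulary the actual bot uses.

    <?php
    /* Condensed archives monitor sketch (assumes ARC2; vocabulary terms,
       graph URIs, and the post URL pattern are placeholders). */
    include_once('arc/ARC2.php');
    $store = ARC2::getStore(array(
      'db_name' => 'linked_rww', 'db_user' => 'user', 'db_pwd' => 'pwd',
      'store_name' => 'rww',
    ));

    /* pick a not-yet-crawled archives page, or fall back to the most recent one */
    $rows = $store->query('
      SELECT ?page WHERE {
        ?page a <http://example.org/terms#ArchivesPage> .
        OPTIONAL { ?page <http://example.org/terms#crawled> ?date . }
        FILTER(!bound(?date))
      } LIMIT 1
    ', 'rows');
    $most_current_page = 'http://www.readwriteweb.com/archives/2010/11'; /* example */
    $page = $rows ? $rows[0]['page'] : $most_current_page;

    /* extract the post links from that page and add them to the post registry */
    $html = file_get_contents($page);
    preg_match_all('#href="([^"]+/archives/[^"/]+\.php)"#', $html, $m);

    $triples = '';
    foreach (array_unique($m[1]) as $post_url) {
      $triples .= '<' . $post_url . '> a <http://example.org/terms#Post> . ';
    }
    $triples .= '<' . $page . '> <http://example.org/terms#crawled> "' . date('c') . '" . ';
    $store->query('INSERT INTO <http://example.org/graphs/posts> { ' . $triples . ' }');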

Post loader and parser

In order to later process post data at a finer granularity than the page level, we have to extract sub-structures such as title, author, publication date, tags, and so on. This is the harder part, because most blogs don't use Linked Data-ready HTML in the form of Microdata or RDFa. Luckily, blogs are template-driven, and we can use DOM paths to identify individual post sections, similar to how tools like the Dapper Data Mapper work. However, given the flexibility and customization options of modern blog engines, certain extensions are still needed. In the RWW case, I needed site-specific code to expand multi-page posts, to extract a machine-friendly publication date, Facebook Likes, and TweetMeme counts, and to generate site-wide identifiers for authors and commenters.
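
The DOM-path part of the parser can be sketched in a few lines; the XPath expressions below use made-up class names that stand in for the RWW-specific template paths.

    <?php
    /* Sketch of template-driven post parsing via DOM paths (the class names in
       the XPath expressions are placeholders for the site-specific ones). */
    $post_url = 'http://www.readwriteweb.com/archives/example_post.php'; /* example */

    $doc = new DOMDocument();
    @$doc->loadHTML(file_get_contents($post_url)); /* "@": tolerate sloppy HTML */
    $xpath = new DOMXPath($doc);

    $post = array(
      'url'     => $post_url,
      'title'   => trim($xpath->evaluate('string(//h1[@class="post-title"])')),
      'author'  => trim($xpath->evaluate('string(//span[@class="author"]/a)')),
      'tags'    => array(),
      'content' => '',
    );
    foreach ($xpath->query('//a[@rel="tag"]') as $node) {
      $post['tags'][] = trim($node->textContent);
    }
    if ($node = $xpath->query('//div[@class="entry-body"]')->item(0)) {
      $post['content'] = $doc->saveHTML($node);
    }
    /* ... map $post to RDF triples and INSERT them, as in the indexer sketch */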

Writing this bot took several hours and almost 500 lines of code (after re-factoring), but the reward is a nicely structured blog database that can be explored with an off-the-shelf RDF browser. At this stage, the SPARQL API already lets us create dynamic widgets such as "related entries" (via tags or categories), "other posts by the same author", "most active commenters per category", or "most popular authors" (as shown in the example in the image below).

Raw post structures
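
A widget like the "most popular authors" example mentioned above could be driven by a single query; here is a minimal sketch, assuming ARC2's SPARQL+ aggregate syntax and placeholder property names from the parsing step.

    <?php
    /* "Most popular authors" widget sketch (assumes ARC2's SPARQL+ aggregates;
       the property URIs are placeholders). */
    include_once('arc/ARC2.php');
    $store = ARC2::getStore(array(
      'db_name' => 'linked_rww', 'db_user' => 'user', 'db_pwd' => 'pwd',
      'store_name' => 'rww',
    ));
    $rows = $store->query('
      SELECT ?author SUM(?likes) AS ?total WHERE {
        ?post <http://example.org/terms#author> ?author ;
              <http://example.org/terms#facebookLikes> ?likes .
      }
      GROUP BY ?author
      ORDER BY DESC(?total)
      LIMIT 5
    ', 'rows');
    foreach ($rows as $row) {
      echo $row['author'] . ' (' . $row['total'] . ")\n";
    }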

Named entity extraction

Now, the next bot can take each post's main content and enhance it with Zemanta and OpenCalais (or any other entity recognition tool that produces RDF). The result of this step is a semantified, but rather messy dataset, with attributes from half a dozen RDF vocabularies.
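
In code, the enrichment boils down to "POST text, get RDF back, store it". The sketch below uses a placeholder extraction endpoint and parameters rather than the actual Zemanta or OpenCalais calls, and ARC2's generic RDF parser for the response.

    <?php
    /* Entity extraction sketch: the endpoint URL and its parameters are
       placeholders, not the real Zemanta/OpenCalais APIs. */
    include_once('arc/ARC2.php');
    $store = ARC2::getStore(array(
      'db_name' => 'linked_rww', 'db_user' => 'user', 'db_pwd' => 'pwd',
      'store_name' => 'rww',
    ));

    $post_uri  = 'http://www.readwriteweb.com/archives/example_post.php'; /* example */
    $post_text = 'Google today announced ...';                            /* extracted main content */

    $context = stream_context_create(array('http' => array(
      'method'  => 'POST',
      'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
      'content' => http_build_query(array(
        'api_key' => 'YOUR_KEY',
        'format'  => 'rdfxml',
        'text'    => $post_text,
      )),
    )));
    $rdf = file_get_contents('http://api.example.com/extract', false, $context);

    /* parse the returned RDF and keep it in a per-post graph */
    $parser = ARC2::getRDFParser();
    $parser->parse($post_uri, $rdf);
    $store->insert($parser->getTriples(), $post_uri . '#enrichments');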

Schema/Ontology identification

Luckily, RDF was designed for working with multi-source data, and thanks to the SPARQL standard, we can use general-purpose software to help us find our way through the enhanced assets. I used a faceted browser to identify the site's main entity types (click on the image below for the full-size version).

RWW through Paggr Prospect

Although spotting inconsistencies (like Richard MacManus appearing multiple times in the "author" facet) is easier with a visual browser, a simple, generic SPARQL query can do the job, too:
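
For example, grouping all subjects by their rdf:type gives a quick overview of what the combined extractors produced; a minimal sketch, assuming ARC2's SPARQL+ aggregate syntax:

    <?php
    /* List the entity types in the enhanced dataset, most frequent first
       (assumes ARC2's SPARQL+ aggregate syntax). */
    include_once('arc/ARC2.php');
    $store = ARC2::getStore(array(
      'db_name' => 'linked_rww', 'db_user' => 'user', 'db_pwd' => 'pwd',
      'store_name' => 'rww',
    ));
    $rows = $store->query('
      SELECT ?type COUNT(?s) AS ?count WHERE {
        ?s a ?type .
      }
      GROUP BY ?type
      ORDER BY DESC(?count)
    ', 'rows');
    foreach ($rows as $row) {
      echo $row['type'] . ': ' . $row['count'] . "\n";
    }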

RWW entity types

Specifying the target ontology

The central entity types extracted from RWW posts are Organizations, People, Products, Locations, and Technologies. Together with the initial structures, we can now draft a consolidated RWW target ontology, as illustrated below. Each node gets its own identifier (a URI) and can thus be a bridge to the public Linked Data cloud, for example to import a company's competitor information.

RWW ontology

Aligning the data with the target ontology

In this step, we again use a software agent and break things down into smaller operations. These sub-tasks require some RDF and Linked Data experience, but basically we are just manipulating the graph structure, which can be done quite comfortably with a SPARQL 1.1 processor that supports INSERT and DELETE commands. Here are some example operations that I applied to the RWW data (a sketch of two of them follows the list):
  • Consolidate author aliases ("richard-macmanus-1 = richard-macmanus-2" etc.).
  • Normalize author tags, Zemanta tags, OpenCalais tags, and OpenCalais "industry terms" to a single "tag" field.
  • Consolidate the various type identifiers into canonical ones.
  • For each untyped entity, retrieve typing and label information from the Linked Data cloud (e.g. DBpedia, Freebase, or Semantic CrunchBase) and try to map the types to the target ontology.
  • Try to consolidate "obviously identical" entities (I cheated by merging on labels here and there, but it worked).
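
Here is what two of these operations could look like in standard SPARQL 1.1 Update syntax; the property and instance URIs are placeholders, and a processor speaking ARC2's older SPARQL+ dialect would spell the INSERT/DELETE forms slightly differently.

    <?php
    /* Two alignment operations as SPARQL 1.1 Update sketches. The property and
       instance URIs are placeholders; each string would be sent to whatever
       update-capable SPARQL processor the setup uses. */
    $updates = array(

      /* consolidate an author alias into the canonical author resource */
      '
      DELETE { ?post <http://example.org/terms#author> <http://example.org/authors/richard-macmanus-2> }
      INSERT { ?post <http://example.org/terms#author> <http://example.org/authors/richard-macmanus-1> }
      WHERE  { ?post <http://example.org/terms#author> <http://example.org/authors/richard-macmanus-2> }
      ',

      /* normalize author tags, Zemanta tags, and OpenCalais terms to one "tag" field */
      '
      DELETE { ?post ?tagProperty ?tag }
      INSERT { ?post <http://example.org/terms#tag> ?tag }
      WHERE  {
        VALUES ?tagProperty {
          <http://example.org/terms#authorTag>
          <http://example.org/terms#zemantaTag>
          <http://example.org/terms#calaisTag>
          <http://example.org/terms#calaisIndustryTerm>
        }
        ?post ?tagProperty ?tag .
      }
      ',
    );
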
Data alignment and QA is an iterative process (and a slightly slippery slope). The quality of public Linked Data varies, but the cloud is very powerful. Each optimization step adds to the network effects, and you constantly discover new consolidation options. I spent just a few hours on the inferencer; after all, the Linked RWW demo is only meant to be a proof of concept.

After this step, we're basically done. From now on, the bots can operate autonomously and we can (finally) build our dynamic semantic publishing apps, like the Paggr Dashboard presented in the video above.

Dynamic RWW Entity Hub

Conclusion

Dynamic Semantic Publishing on mainstream websites is still new, and there are no complete off-the-shelf solutions on the market yet. Many of the individual components needed, however, are available. Additionally, the manual effort to integrate the tools is no longer incalculable research, but is getting closer to predictable, "standard" development effort. If you are interested in a solution similar to the ones described in this post, please get in touch.

Contextual configuration - Semantic Web development for visually minded webmasters

A short screencast demonstrating contextual configuration via widgets in semsol's RDF CMS.
Let's face it, building semantic web sites and apps is still far from easy, and to some extent this is due to the configuration overhead. The RDF stack is built around declarative languages (for simplified integration at various levels), and as a consequence, configuration directives often end up in some form of declarative format, too. While fleshing out an RDF-powered website, you have to declare a ton of things: namespace abbreviations, data sources and API endpoints, vocabularies, identifier mappings, queries, object templates, and what have you.

Sadly, many of these configurations are needed to style the user interface, and because of RDF's open-world nature, designers have to know much more about the data model and its possible variations than is usually necessary. Or webmasters have to deal with design work. Not ideal either. If we want to bring RDF to mainstream web developers, we have to simplify the creation of user-optimized apps. The value proposition of semantics in the context of information overload is pretty clear, and some form of data integration is becoming mandatory for any modern website. But the entry barrier caused by large and complicated configuration files (Fresnel, anyone?) is still too high. How can we get from our powerful, largely generic systems to end-user-optimized apps? Or the other way round: how can we support frontend-oriented web development with our flexible tools and freely mashable data sets? (Let me quickly mention Drupal here, which is doing a great job at near-seamlessly integrating RDF. OK, back to the post.)

Enter RDF widgets. Widgets have obvious backend-related benefits, like accessing, combining, and re-purposing information from remote sources within a manageable code sandbox. But they can also greatly support frontend developers. They simplify page layout and incremental site building with instant visual feedback (add a widget, test, add another one, re-arrange, etc.). And, more importantly in the RDF case, they offer a way to iteratively configure a system with very little technical overhead. Configuration options could be scoped not only to the widget at hand, but also to the context where the widget is currently viewed. Let's say you are building an RDF browser and need resource templates for all kinds of items. With contextual configuration, you could simply browse the site and, at any position in the ontology or navigation hierarchy, open a configuration dialog and define a custom template where needed. Such an approach could enable systems that work out of the box (raw, but usable) and can then be continually optimized, possibly even by site users.
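
To make the idea a little less abstract, here is a minimal sketch of such a context-scoped lookup; the configuration keys, context paths, and template names are all made-up examples.

    <?php
    /* Context-scoped configuration sketch: try the most specific setting first
       (context + resource type), then fall back to more generic ones.
       Keys, contexts, and template names are made-up examples. */
    function get_template($config, $context, $type) {
      $candidates = array(
        $context . '|' . $type, /* this type, viewed in this context */
        $context,               /* anything viewed in this context   */
        $type,                  /* this type, wherever it appears    */
        'default',              /* generic out-of-the-box template   */
      );
      foreach ($candidates as $key) {
        if (isset($config[$key])) return $config[$key];
      }
      return null;
    }

    $config = array(
      'default'                             => 'generic-resource.tpl',
      'foaf:Organization'                   => 'org-card.tpl',
      'browser/companies|foaf:Organization' => 'company-profile.tpl', /* added via the in-context dialog */
    );
    echo get_template($config, 'browser/companies', 'foaf:Organization'); /* "company-profile.tpl" */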

A lot of "could" and "would" in the paragraphs above, and the idea may sound quite abstract without actually seeing it. To illustrate the point I'm trying to make I've prepared a short video (embedded below). It uses Semantic CrunchBase and Paggr Prospect (our new faceted browser builder) as an example use case for in-context configuration.

And if you are interested in using one of our solutions for your own projects, please get in touch!



Paggr Prospect (part 1)


Paggr Prospect (part 2)

Could having two RDF-in-HTMLs actually be handy?

A combination of RDFa and Microdata would allow for separate semantic layers.
Apart from grumpy rants about the complexity of W3C's RDF specs and some semantic rich-text editing excitement, I haven't blogged or tweeted a lot recently. That's partly because there is finally increased demand for the stuff I'm doing at semsol (agency-style SemWeb development), but also because I've been working hard on getting my tools into a state where they feel more like typical Web frameworks and apps. Talis' Fanhu.bz is an example where (I think) we found a good balance between powerful RDF capabilities (data re-purposing, remote models, data augmentation, a crazy army of inference bots) and a non-technical UI (a simplistic visual browser, Twitter-based annotation interfaces).

Another example is something I've been working on over the last few months: I somehow managed to combine essential parts of Paggr (a drag & drop portal system based on RDF- and SPARQL-based widgets) with an RDF CMS (I'm currently looking for pilot projects). And although I decided to switch entirely to Microdata for semantic markup after exploring it during the FanHubz project, I wonder if there might be room for two separate semantic layers in this sort of widget-based website. Here is why:

As mentioned, I've taken a widget-like approach for the CMS. Each page section is a resource of its own that can be defined and extended by the web developer, styled by themers, and re-arranged and configured by the webmaster. In the RDF CMS context, widgets can easily integrate remote data, and when the integrated information is exposed as machine-readable data in the front-end, we can get beyond the "just visual" integration of current widget pages and bring truly connectable and reusable information to the user interface.

Ideally, both the widgets' structural data and the content can be re-purposed by other apps. Just like in the early days of the Web, we could re-introduce a copy & paste culture of things for people to include in their own sites. The difference is that RDF simplifies copy-by-reference and source attribution, and that both developers and end-users could be part of the game this time.

Anyway, one technical issue I encountered arises when a page contains multiple page items but describes a single resource. With a single markup layer (say, Microdata), you get a single tree in which the context of the hierarchy constantly switches between structural elements and content items (page structure -> main content -> page layout -> widget structure -> widget content). If you want to describe a single resource, you have to repeatedly re-introduce the triple subject ("this is about the page structure", "this is about the main page topic"). The first screenshot below shows the different (grey) widget areas in the editing view of the CMS. In the second screenshot, you can see that the information displayed in the main area and the sidebar (the marked calendar date, the flyer image, and the description) is all about a single resource (an event).

Trice CMS Editor
Trice CMS editing view

Trice CMS Editor
Trice CMS page view with inline widgets describing one resource

If I used two separate semantic layers, e.g. RDFa for the content (the event description) and Microdata for the structural elements (column widths, widget template URIs, widget instance URIs), I could describe the resource and the structure without repeating the event subject in each page item.

To be honest, I'm not sure yet if this is really a problem, but I thought writing it down could kick off some thought processes (which now tend towards "No"). Keeping triples as stand-alone-ish as possible may actually be an advantage (even if subject URIs have to be repeated). No semantic markup solution so far provides full containment for reliable copy & paste, but explicit subjects (or "itemid"s in Microdata-speak) could bring us a little closer.

Conclusions? Err.., none yet. But hey, did you see the cool CMS screenshots?

ESWC 2009 Linked Data Dashboards

A first Paggr application went live during ESWC2009.
In case you missed the tweets or a local announcement: The first Paggr application went online a few days ago. This year's ESWC Technologies Team pushed things a little further, with RFID tracking during the event and extended conference data that includes detailed session and date/time information (kudos to Michael Hausenblas for RDFizing even PDFs).

Based on this dataset, we provided a conference explorer and stress-tested the "Dog Food" server while at it. The system survived, but I also learned a lot. We used about 50 RDF stores for the different public and user-specific dashboards, which basically worked nicely. However, rendering non-ugly resource summaries requires a bit of endpoint hammering, and some of the more complex path queries resulted in timeouts. Yesterday, I had to create a mirror from the data dump to route a couple of widgets through a replicated (ARC :-) endpoint. But then, this is also one of the powerful possibilities that come with semantic web technologies: you can often switch or duplicate the back-end repository in no time, and without any code changes. (And as all the Sparqlets are created in a web-based tool, I didn't even have to upload a changed configuration file. I simply tweaked a SPARQLScript parameter.)
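
In code, the swap is little more than changing an endpoint URL. Here is a minimal sketch, assuming ARC2's remote store wrapper; the mirror URL is a placeholder, and the actual dashboards did this via a SPARQLScript parameter rather than PHP.

    <?php
    /* Endpoint switching sketch: the widget query stays the same, only the
       endpoint changes (assumes ARC2's remote store; the mirror URL is made-up). */
    include_once('arc/ARC2.php');

    $use_mirror = true; /* flip when the primary endpoint gets hammered */
    $endpoint = $use_mirror
      ? 'http://mirror.example.org/sparql'     /* replicated ARC endpoint */
      : 'http://data.semanticweb.org/sparql';  /* primary "Dog Food" endpoint */

    $store = ARC2::getRemoteStore(array('remote_store_endpoint' => $endpoint));
    $rows = $store->query('
      SELECT ?s ?label WHERE {
        ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
      } LIMIT 10
    ', 'rows');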

Anyway, there are a couple of public dashboards in case you'd like to give it a try (it's still an early version), and I also embedded a short screencast below. The system is going to be moved to a DERI server when the conference is over, but the URIs and data will probably stay stable. (And no, it won't really work in IE yet.) More to come!



HQ version (quicktime, 110MB)

Paggr screencast: Conference Explorer (proto)

Prototype screencast of a semantic conference explorer for ESWC 2009.
I just returned from a short, doc-enforced trip to Nice (awesome place, savoir-vivre and all that) and will fly to the NYC SemWeb Meetup in a few days. Before we went to France, I created another Paggr screencast. This one is the first to show the (user-facing) dashboard and widgets we plan to make available as a semantic conference explorer at ESWC 2009. Still some way to go, but I'm optimistic that we'll have a number of handy helpers online by the beginning of the event. I won't be able to attend in person, so I'm highly motivated to have at least a twitter and twitpic tracker up and running then.



HQ version (quicktime, 134MB)

Paggr article in Nodalities Magazine 6

The latest NodMag issue features an article about Paggr.
Talis' new Nodalities Mag is now available online (and the print version is on its way to subscribers). This issue contains six semantic web articles, including one about Paggr:
  • Linking Data and Semantics at O'Reilly - Gavin Carothers and Charles Greer tell O'Reilly Media's Linked Data story.
  • Discovering SPARQL - Alex Tucker exposes SPARQL endpoints via Bonjour.
  • Linked Data In(ter)Action - Benjamin Nowack discusses Paggr.
  • Introducing: STI International.
  • Social Semantic Web Scales in the Cloud - Simon Schenk discusses SemaPlorer.
  • Streams, Pools and Reservoirs - Leigh Dodds explores flowing data.

Paggr screencast: Linked Data Widget Builder

A screencast about Paggr's sparqlet builder.
Running an R&D-heavy agency in the current economic climate is pretty tough, but there are also a couple of new opportunities for semantic solutions that help reduce costs and do things more efficiently. I'm finally starting to get project requests that include some form of compensation. Not much yet (all budgets seem to be very tight these days), but it's a start, and together with support from Susanne, I could now continue working on Paggr, semsol's Netvibes-like dashboard system for the growing web of Linked Data.

An article about Paggr will be in the next Nodalities Magazine, and the ESWC2009 technologies team is considering a custom system for attendees, which is a great chance to get other conference organizers interested. (I see much potential in a white-label offering, but a more mainstream-ish version for Web 2.0 data is still on my mind. I just have to focus on getting self-sustained first.)

Below is a short screencast that demonstrates a first version of the sparqlet (= semantic widget) builder. I've de-coupled sparqlet-serving from the dashboard system, so that I'll be able to open-source the infrastructure parts of Paggr more easily. Another change from the October prototype is the theme-ability of both dashboards and widget servers. Lots of sun, sky, and sea for ESWC ;-)



HQ version (quicktime, 120MB)

The Linked Data Value Spiral

The value of Linked Data grows when it's utilized and re-distributed.
I'm currently writing an article about paggr for the Nodalities Magazine. As there is not too much to write about yet, I'm focusing on the basic idea (customizable Linked Data dashboards), its inspiration (TimBL's RDF Clipboard concept), enabling technologies and trends (Live Clipboard, widgets, AJAX homepages, sub-page-level interaction), and the user interface challenges related to generic interaction with Linked Data.

One thing I thought might be worth sharing separately is the "Linked Data Value Spiral" below. It tries to illustrate that semantic data doesn't have a single-loop life cycle, but that re-distributing utilized (= newly meshed/combined) information creates a self-reinforcing "Linked Data ecosystem". I tried to associate the individual value creation processes with SemWeb market sectors (RDF stores, for example, are typical information organization products; paggr tries to remove the bottleneck between utilization and re-distribution; etc.).

Linked Data Value Spiral

It's just an abstraction, and the boundaries are of course blurry (a SPARQL endpoint can help with both utilization and discovery), but I still find the simple spiral and its segments handy for classifying current products and companies. It even helped me a little to identify market opportunities and gaps:
  • The recent VoiD effort could have a significant impact on overall Semantic Web progress.
  • Entity extraction providers like Zemanta and OpenCalais could play a huge role in boosting creation processes.
  • What about "accelerator" products that offer a shortcut between utilization and creation (i.e. apps that create Linked Data while you are using them, with instant re-distribution)?
  • Is it a problem when a service like Freebase exports RDF but doesn't provide links to external datasets?
  • ...

Feel free to use and share.

paggr wins Semantic Web Challenge 2008

ISWC 2008 in Karlsruhe was just great. Even won the Semantic Web Challenge.
paggr wins semantic web challenge
What can I say? I'm still smiling like in the pic on the left (credits: Keith Alexander). And you have no idea how urgently I need the money ;-)

paggr has received very encouraging feedback (or premature praise, rather), so I'm busily working on getting the beta out as soon as possible, especially given that paggr wouldn't have had a chance to convince the judges without the great amount of Linked Data out there and all the painful spec work by the Semantic Web community. The ball's in my court to actually deliver now.

There are some items left on my todo list before I dare send out more invitation codes (some items were added after feedback at ISWC):
  • improved RDF exporter for portals and individual widgets (just finished the first version, using a new thingy called poshRDF)
  • the widget and agent builders should be visual, more like the cool SPARQLMotion editor (I'm working on that now)
  • dropping a widget item on the canvas should auto-open a corresponding details widget
  • widgets should be able to "listen to" other widgets for auto-refreshes
  • a setup wizard that lets you specify initial accounts and data sources

I assume that a fully generic semantic widget and agent platform might be either over- or underwhelming, so I plan to provide a set of ready-to-run apps for paggr. Here are some ideas:
  • feed reader with rich filtering and bookmarking (to del.icio.us?) and rating
  • microblog and aggregator (twitter + identi.ca + groups + filters + posting)
  • address book
  • semantic email client
  • calendaring
  • decentralized social network (portable personal profile + lifestream aggregation)
Do you have any preferences or ideas for apps you'd like to see on paggr? I'd be very happy about your suggestions. Leave a comment here or send a mail to paggr [at] semsol [dot] com.

paggr teaser video and pre-registration site online

paggr teaser video and landing page
I've been semi-silently working on something new. A combination of many semwebby things I came across and played with during the last 3 years or so:
  • semantic markup
  • smart data
  • an rdf clipboard
  • ajax
  • sparql sparql sparql
  • sparql + scripting
  • sparql + templates
  • sparql + widgets
  • lightweight, federated semweb services and bots
  • UIs for open data
  • semwikis
  • agile and collaborative web development

So, what happens when you put this all together? At least something interesting, and perhaps semsol's first commercial service. (Or a product; this is all just LAMP stuff and can easily be run in an intranet or on a hosted server.) Anyway, there's still some way to go. It's called paggr, the landing page is up, and today I created a first teaser/intro video.

I'll demo the beta (launch planned for November) at the upcoming ISWC during the poster session (my poster is about SPARQL+ and SPARQLScript, the two SPARQL extensions that paggr is based on). I may have early invites by then.

As a preparation for the hopefully busy fall and winter months, though, I'll be on vacation for the next two weeks. No Email, no Web, no Phone. Yay!



HQ version (quicktime, 130MB)

Semantic Web gets a mention in Visual-x mag webinale report

A nice paragraph about my talk at last week's webinale
The visual-x mag just published a webinale report that contains a nice summary of my talk (and even a link to paggr). Phew, this means that at least some people were not scared off, which is great personally, but also (and more importantly) from a SWEO perspective.

My first screencast

Tools used for a screencast about forthcoming paggr collector.
I created my first screencast this weekend, oh dear ;-)

I didn't really have a lot to talk about yet (I plan to do some screencasts for the paggr system), so this one was more about testing a number of different tools (the screencast itself is about paggr's soon-to-be-polymorphic drag and drop). I used a Windows box and found the following tools quite useful:
  • CamStudio for screen and audio recording,
  • a Magix tool (commercial; I assume there are alternatives) to cut the CamStudio AVI and create an MPEG version,
  • SUPER for converting the MPEG file to the Flash video (FLV) format,
  • and a free FLV player I found somewhere on the web. It hopefully isn't doing evil things on my server now...

The tools make the technical side of things rather convenient, but it was still more work than expected, and I sound just horrible (sometimes close to "zank you for traffeling wiz Deutsche Bahn", if you know what I mean).
