Posts tagged with: rdf

Naming Properties and Relations (comment)

A local comment to JeniT's post about predicate names
I was unable to add a comment to Jeni's interesting post about RDF predicate names (markdown-related, my fault), so I'll quickly post it here, as I'm pondering similar things, too.

In her post, Jeni explores the issues around naming RDF terms. The community has gathered some experience and suggestions over the last few years; some entry points are:
I personally find "role-noun" easier to support in RDF apps than the older hasPropertyOf (now often considered anti-)pattern. And inverse properties are just painful, as they usually require some form of inference to streamline the user experience.

Not sure if that's helpful information, but for a project around semantic note-taking/logging, I played with different notations users might be comfortable with, for entering factoids using an unstructured input form (à la Twitter). I could identify the following patterns that still seemed to be acceptable (as shared/supported syntax). All of them can be implemented using role-noun predicates (assuming that predicate labels are similar to the predicate names):
  • SUBJECT'(s)? PREDICATE (:|is) OBJECT
  • OBJECT is SUBJECT'(s)? PREDICATE
  • OBJECT is (the)? PREDICATE of SUBJECT
  • SUBJECT has PREDICATE (:)? OBJECT
  • (the|a)? PREDICATE(s)? of SUBJECT (is|are) OBJECT ((,|and|&) OBJECT)*
(There are more patterns, for things like tagging and typing, but the examples above are the predicate-related grammar rules).
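
The five notations above could be recognized with a handful of regular expressions. What follows is only a minimal sketch in Python, not the parser used in the project: it assumes single-word subjects and predicates, treats the predicate label as the predicate name, and the function name `parse_factoid` is made up for illustration.

```python
import re

# Each entry: (compiled pattern, may_list_several_objects).
# Assumption: subjects and predicates are single words; objects are free text.
_PATTERNS = [
    # SUBJECT'(s)? PREDICATE (:|is) OBJECT
    (re.compile(r"(?P<s>\w+)'s? (?P<p>\w+)(?::| is) (?P<o>.+)", re.I), False),
    # OBJECT is SUBJECT'(s)? PREDICATE
    (re.compile(r"(?P<o>.+?) is (?P<s>\w+)'s? (?P<p>\w+)", re.I), False),
    # OBJECT is (the)? PREDICATE of SUBJECT
    (re.compile(r"(?P<o>.+?) is (?:the )?(?P<p>\w+) of (?P<s>\w+)", re.I), False),
    # SUBJECT has PREDICATE (:)? OBJECT
    (re.compile(r"(?P<s>\w+) has (?P<p>\w+):? (?P<o>.+)", re.I), False),
    # (the|a)? PREDICATE(s)? of SUBJECT (is|are) OBJECT ((,|and|&) OBJECT)*
    (re.compile(
        r"(?:(?:the|a) )?(?P<p>\w+) of (?P<s>\w+) (?P<verb>is|are) (?P<o>.+)",
        re.I), True),
]

# Separators between several objects: ",", "and", "&".
_OBJECT_SEP = re.compile(r"\s*(?:,|\band\b|&)\s*")


def parse_factoid(text):
    """Return a list of (subject, predicate, object) tuples, or [] if nothing fits."""
    text = text.strip()
    for pattern, multi in _PATTERNS:
        m = pattern.fullmatch(text)
        if not m:
            continue
        s, p, o = m.group("s"), m.group("p").lower(), m.group("o").strip()
        if not multi:
            return [(s, p, o)]
        # Naive singularization: only strip a trailing "s" after "are".
        if m.group("verb").lower() == "are" and p.endswith("s"):
            p = p[:-1]
        return [(s, p, obj) for obj in _OBJECT_SEP.split(o) if obj]
    return []
```

All five notations end up as the same role-noun triple, e.g. `parse_factoid("Berlin is the hometown of Bob")` and `parse_factoid("Bob has hometown: Berlin")` both yield `("Bob", "hometown", "Berlin")`.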

As soon as you add (has|is|of) to one PREDICATE, you get problems with the other notations, so role-noun seems to be a good fit.

Unfortunately, one (non-trivial) problem remains: People (and Web 2.0 apps) also like 'SUBJECT PREDICATE_VERB OBJECT' (e.g. "likes", "bookmarked", "said", "posted", "is listening to" ...) and I don't have a proper idea yet how to handle those automatically, other than hard-coding support for the typical social media verbs. It could be possible to use WordNet to detect verbs and derive a canonicalized form, and then model those patterns as activities (activity = liking, bookmarking, saying, posting, listening, plus ACTIVITY_PERSON and ACTIVITY_TARGET or somesuch). If anyone has a suggestion, I'd be happy to hear it.
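
For illustration, the hard-coded variant could look roughly like the Python sketch below. It skips the WordNet idea entirely and simply maps a fixed verb table to activity names; the table, the dictionary keys, and the function name `parse_activity` are assumptions, not an existing API.

```python
import re

# Hard-coded verb phrases mapped to a canonical activity name (hypothetical table).
VERB_TO_ACTIVITY = {
    "likes": "liking",
    "bookmarked": "bookmarking",
    "said": "saying",
    "posted": "posting",
    "is listening to": "listening",
}

# Longest phrases first, so that multi-word verbs are tried before shorter ones.
_VERBS = sorted(VERB_TO_ACTIVITY, key=len, reverse=True)
_VERB_RE = re.compile(
    r"(?P<person>\w+) (?P<verb>%s) (?P<target>.+)"
    % "|".join(re.escape(v) for v in _VERBS),
    re.I,
)


def parse_activity(text):
    """Model 'SUBJECT VERB OBJECT' as an activity, or return None if no verb fits."""
    m = _VERB_RE.fullmatch(text.strip())
    if not m:
        return None
    return {
        "activity": VERB_TO_ACTIVITY[m.group("verb").lower()],
        "ACTIVITY_PERSON": m.group("person"),
        "ACTIVITY_TARGET": m.group("target"),
    }
```

Anything that does not start with a single-word subject followed by a known verb is rejected, so the role-noun notations can still be tried separately.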

Back from New York "Semantic Web for PHP Developers" trip

Gave a talk and a workshop in NYC about SemWeb technologies for PHP developers
/me at times square
I'm back from New York, where I was given the great opportunity to talk about two of my favorite topics: Semantic Web Development with PHP, and (not necessarily semantic) Software Development using RDF Technology. I was especially looking forward to the second one, as that perspective is not only easier to understand for people from a software engineering context, but also still a much-neglected marketing "back-door": If RDF simplifies working with data in general (and it does), then we should not limit its use to semantic web apps. Broader data distribution and integration may naturally follow in a second or third step once people use the technology (so much for my contribution to Michael Hausenblas' list of RDF MalBest Practices ;)

The talk on Thursday at the NY Semantic Web Meetup was great fun. But the most impressive part of the event was the people there. A lot to learn from on this side of the pond. Not only very practical and professional, but also extremely positive and open. Almost felt like being invited to a family party.

The positive attitude was even true for the workshop, which I clearly could have made more effective. I didn't expect (but should have) that many people would come w/o a LAMP stack on their laptops, so we lost a lot of time setting up MAMP/LAMP/WAMP before we started hacking ARC, Trice, and SPARQL.

Marco brought up a number of illustrative use cases. He maintains an (unofficial, sorry, can't provide a pointer) RDF wrapper for any group on meetup.com, so the workshop participants could directly work with real data. We explored overlaps between different Meetup groups, the order in which people joined selected groups, inferred new triples from combined datasets via CONSTRUCT, and played with not-yet-standard SPARQL features like COUNT and LOAD.

And having done the workshop should finally give me the last kick to launch the Trice site now. The code is out, and it's apparently not too tricky to get started even when the documentation is still incomplete. Unfortunately, I have a strict "no more non-profits" directive, but I think Trice, despite being FOSS, will help me get some paid projects, so I'll squeeze an official launch in sometime soon-ish.

Below are the slides from the meetup. I added some screenshots, but they are probably still a bit boring without the actual demos (I think a video will be put up in a couple of days, though).

ARC Graph Gear Serializer Plugin

Patrick Murray-John created an ARC2 converter for Graph Gear visualizations
Patrick Murray-John (who is currently Semantifying the University of Mary Washington) just released a first version of an ARC2 converter for Graph Gear visualizations. Looks pretty cool.
Graph Gear visualization from RDF via ARC

RDF/SPARQL-based web development for PHP coders: Meetup presentation and workshop in NYC

I'll give a talk and run a workshop in New York City in May.
The Linked Data meme is spreading and we have strong indications that web developers who understand and know how to apply practical semantic web technologies will soon be in high demand. Not only in enterprise settings but increasingly for mainstream and agency-level projects where scripting languages like PHP are traditionally very popular.

I can't really afford travelling to promote the interesting possibilities around RDF and SPARQL for PHP coders, so I'm more than happy that Meetup master Marco Neumann invited me to come over to New York and give a talk at the Meetup on May 21st. Expect a fun mixture of "Getting started" hints, demos, and lessons learned. In order to make this trip possible, Marco is organizing a half-day workshop on May 22nd, where PHP developers will get a hands-on introduction to essential SemWeb technologies. I'm really looking forward to it (and big thanks to Marco).

So, if you are a PHP developer wondering about the possibilities of RDF, Linked Data & Co, come to the Meetup, and if you also want to get your hands dirty (or just help me pay the flight ticket ;) the workshop could be something for you, too. I'll arrive a few days earlier, by the way, in case you want to add another quaff:drankBeerWith triple to your FOAF file ;)

ARC now also GPL-licensed

ARC is now available under the W3C Software or the GPL license
Arto Bendiken and Stéphane Corlosquet asked me to provide ARC also under the GPL (for Drupal, in addition to the current W3C Software License), so here you are.

ARC is already used by several modules that help turn Drupal into an RDF-powered CMS, for example the RDF API, the SPARQL extension, or the Calais module. The new license will make it easier for the Drupal community to directly bundle ARC with their RDF extensions. I guess that Drupal will have its own complete RDF toolkit one day, but it's great to see ARC being utilized for accelerating the development progress.

Paggr screencast: Linked Data Widget Builder

A screencast about Paggr's sparqlet builder.
Running an R&D-heavy agency in the current economic climate is pretty tough, but there are also a couple of new opportunities for these semantic solutions that help reduce costs and do things more efficiently. I'm finally starting to get project requests that include some form of compensation. Not much yet (all budgets seem to be very tight these days), but it's a start, and together with support from Susanne, I could now continue working on Paggr, semsol's Netvibes-like dashboard system for the growing web of Linked Data.

An article about Paggr will be in the next Nodalities Magazine, and the ESWC2009 technologies team is considering a custom system for attendees, which is a great chance to maybe get other conference organizers interested. (I see much potential in a white-label offering, but a more mainstream-ish version for Web 2.0 data is still on my mind. Just have to focus on getting self-sustained first.)

Below is a short screencast that demonstrates a first version of the sparqlet (= semantic widget) builder. I've de-coupled sparqlet-serving from the dashboard system, so that I'll be able to open-source the infrastructure parts of Paggr more easily. Another change from the October prototype is the theme-ability of both dashboards and widget servers. Lots of sun, sky, and sea for ESWC ;-)



HQ version (quicktime, 120MB)

Semantics @ SIMsKultur Online

SIMsKultur Online is adding semantics
Exciting times, it really looks like we are about to witness RDF's tipping point. Every other week we see another service adding semantic web support. I didn't even find time to play with O'Reilly's RDF data yet, and yesterday I already came across the next site: SIMsKultur not only added RDF export for all events (more info at evo42), but also put up a hacked smesher instance to enrich and filter their Tweets (work in progress). I've been told that even SPARQL support is on their list.

smesher @ SIMsKultur

This is exactly the stuff I was dreaming of when I started with RDF development: Web agencies enhancing their customers' experience with easy-to-deploy solutions. I didn't expect it to become such a marathon, and we're still not fully there yet, but it feels a lot like we're finally hitting the home stretch :-)

Quick thoughts on semantic microblogging

Motivation and wish list for a personal semantic microblogging system
This week, the first "Microblogging Conference Europe" will take place in Hamburg. I was lucky to get a late ticket (thanks to Dirk Olbertz, who won't be able to make it). The conference will have barcamp-style tracks, and (narrow-minded as I am) I started thinking about adding SemWeb power to microblogging.

The more I use Twitter and advanced clients like TweetDeck, the more I think that (slightly enhanced) microblogs could become great interfaces to the (personalized) Semantic Web. I'm already noticing that I don't use a feed reader or delicious to discover relevant content any more. I'm effectively saving time. But simultaneously it becomes obvious that Twitter can be a distracting productivity killer. So, here is the idea: Take all the good things from microblogging and add enough semantics to increase productivity again. And while at it, utilize this semantic microblog as a work/life/idea log.

A semantic microblog would simplify the creation of structured, machine-readable information, in part for personal use, and generally to let the computer take care of certain tasks or do things that I didn't think of yet.

I have only two days left to prepare a demo and a talk, so I'd better start developing. I'll keep the rest of this post short and log my progress on Twitter instead. The app will be called "smesher". I'm starting now (or rather tomorrow morning, have to leave in 15 mins).

Use cases

  • How much time did I spend doing support this month?
  • Who are my real contacts (evidence-driven, please, why do I have to manually add followees)?
  • Show me a complete history of activities related to project A
  • How much can I bill client B? (or even better: Generate an invoice for client B)
  • What was that great Tapas Bar we went to last summer again?
  • Where did I first meet C?
  • Bookmarks ranked by number of occurrences in other tweets
  • Show me all my blog posts about topic D
  • ...

Microblogs: Strengths

  • Microblogs are web-based
  • Microblogs are very easy to use ("less is more")
  • Microblogs offer a great communication channel (asynchronous, but almost instant)
  • Microblog clients are getting ubiquitous
  • Microblogs can be used as life logs
  • Microblogs can be used for note taking
  • Microblogs can be used for bookmarking
  • Microblogs can be used for announcements
  • Microblogs can accelerate software development (near-real-time feedback loop)
  • Microblog search (and the associated feeds) can be used to track interests
  • hashtags are a simple way to annotate posts
  • A Microblog can be used as an interface to bots

Some Requirements and Nice-to-haves for semantic microblogging

  • access to a post's default information (author, title, date, source)
  • support for evolving patterns (@-recipients, people mentioned, URLs mentioned, hashtags, Re-Tweets)
  • groups, or at least private notes (some posts just don't need to be on the public timeline ;)
  • complete archives
  • perhaps semantic auto-tagging
  • post-publication tags (I'll surely forget a necessary tag every now and then)
  • private tags?
  • keep the simple UI (no checkbox overload etc.)
  • support for machine tags or a similar grassroots extensibility mechanism to increase granularity without losing usability/simplicity
  • an API that supports user-defined and evolving structures
  • custom streams/tabs à la TweetDeck, but with semantic filtering (e.g. "This month's working hours")
  • URL expander for bit.ly etc.
  • rules to create/infer/extract information from (machine) tags and existing data, maybe recursively
  • Twitter/Identi.ca tracking/relaying
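
To make the machine-tag and semantic-filtering items a bit more concrete, here is a minimal Python sketch. It assumes machine tags in the `namespace:predicate=value` style, written as hashtags inside a post; the `#work:hours=...` tag and both function names are hypothetical and only serve the "This month's working hours" example.

```python
import re

# Machine tag in namespace:predicate=value style, written as a hashtag (assumed syntax).
_MACHINE_TAG = re.compile(r"#(?P<ns>\w+):(?P<pred>\w+)=(?P<value>[^\s#]+)")
# Plain hashtag that is not the namespace part of a machine tag.
_HASHTAG = re.compile(r"#(?P<tag>\w+)\b(?!:)")


def extract_tags(post):
    """Return (plain hashtags, machine tags as (namespace, predicate, value))."""
    machine = [(m["ns"], m["pred"], m["value"]) for m in _MACHINE_TAG.finditer(post)]
    plain = [m["tag"] for m in _HASHTAG.finditer(post)]
    return plain, machine


def total_hours(posts):
    """Semantic filter: sum all hypothetical work:hours values over a list of posts."""
    total = 0.0
    for post in posts:
        _, machine = extract_tags(post)
        total += sum(
            float(value)
            for ns, pred, value in machine
            if (ns, pred) == ("work", "hours")
        )
    return total
```

The posts stay readable as ordinary microblog entries, while the extra granularity lives entirely in the tags, so the input UI does not need to change.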

Approach

  • Getting Real (UI first etc., worked great last time)
  • RDF 'n' SPARQL FTW: I don't know what the final data model is going to be, and I want an API but don't have time to code it.

Related Work

Knowee - (The beginning of) a semantic social web address book

Knowee is a web address book that lets you integrate distributed social graph fragments. A new version is online at knowee.net.
Heh, this was planned as a one-week hack but somehow turned into a full rewrite that took all of December. Yesterday, I finally managed to tame the semantic bot army, and today I've added a basic RDF editor. A sponsored version is now online at knowee.net; a code bundle for self-hosting will be made available at knowee.org tomorrow.

What is Knowee?

Knowee started as a SWEO project. Given the insane number of online social networks we all joined, together with the increasing amount of machine-readable "social data" sources, we dreamed of a distributed address book, where the owner doesn't have to manually maintain contact data, but instead simply subscribes to remote sources. The address book could then update itself automatically. And -in full SemWeb spirit- you'd get access to your consolidated social graph for re-purposing. There are several open-source projects in this area, most notably NoseRub and DiSo. Knowee is aiming at interoperability with these solutions.
knowee concept

Ingredients

For a webby address book, we need to pick some data formats, vocabularies, data exchange mechanisms, and the general app infrastructure:
  • PHP + MySQL: Knowee is based on the ubiquitous LAMP stack. It tries to keep things simple, you don't need system-level access for third-party components or cron jobs.
  • RDF: Knowee utilizes the Resource Description Framework. RDF gives us a very simple model (triples), lots of different formats (JSON, HTML, XML, ...), and free, low-cost extensibility.
  • FOAF, OpenSocial, microformats, Feeds: FOAF is the leading RDF vocabulary for social information. Feeds (RSS, Atom) are the lowest common denominator for exchanging non-static information. OpenSocial and microformats are more than just schemas, but the respective communities maintain very handy term sets, too. Knowee uses equivalent representations in RDF.
  • SPARQL: SPARQL is the W3C-recommended Query language and API for the Semantic Web.
  • OpenID: OpenID addresses Identity and Authentication requirements.
I'm still working on a solution for access control; the current Knowee version is limited to public data and simple, password-based access restrictions. OAuth is surely worth a look, although Knowee's use case is a little different and may be fine with just OpenID + sessions. Another option could be the impressive FOAF+SSL proposal, though I'm not sure if they'll manage to provide a pure-PHP implementation for non-SSL-enabled hosts.

Features / Getting Started

This is a quick walk-through to introduce the current version.
Login / Signup
Log in with your (ideally non-XRDS) OpenID and pick a user name.

knowee login

Account setup
Knowee only supports a few services so far. Adding new ones is not hard, though. You can enable the SG API to auto-discover additional accounts. Hit "Proceed" when you're done.

knowee accounts

Profile setup
You can specify whether to make (parts of) your consolidated profile public or not. During the initial setup process, this screen will be almost empty, you can check back later when the semantic bots have done their job. Hit "Proceed".

knowee profile

Dashboard
The Dashboard shows your personal activity stream (later versions may include your contacts' activities, too), system information and a couple of shortcuts.
knowee dashboard

Contacts
The contact editor is still work in progress. So far, you can filter the list, add new entries, and edit existing contacts. The RDF editor is still pretty basic: changes will be saved to a separate RDF graph, but deleted/changed fields may re-appear after synchronization. This needs more work. The editor is schema-based and supports the vocabularies mentioned above. You'll be able to create your own fields at some later stage.

It's already possible to import FOAF profiles. Knowee will try to consolidate imported contacts so that you can add data from multiple sources, but then edit the information via a single form. The bot processor is extensible, so we'll be able to add additional consolidators at run-time; at the moment it only looks at "owl:sameAs".
knowee contacts
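
As an illustration of what owl:sameAs-based consolidation involves, here is a minimal Python sketch. It is not Knowee's bot code: it assumes the imported data is available as plain (subject, predicate, object) tuples and uses a small union-find structure to group resources that are linked by owl:sameAs; the function name `consolidate` is made up.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def consolidate(triples):
    """Merge the properties of all resources linked by owl:sameAs."""
    parent = {}

    def find(x):
        # Find the representative of x's group (with path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Deterministic choice: the lexicographically smallest URI wins.
            parent[max(ra, rb)] = min(ra, rb)

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(s, o)

    merged = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            merged[find(s)][p].add(o)
    return {s: {p: sorted(v) for p, v in props.items()} for s, props in merged.items()}
```

The result holds one entry per real-world contact, which is the shape a single edit form needs; further consolidators (for example matching on mailbox values) would only have to call `union` with their own evidence.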

Enabling the SPARQL API
In the "Settings" section you'll find a form that lets you activate a personal SPARQL API. You can enable/protect read and/or write operations. The SPARQL endpoint provides low-level access to all your data, allows you to explore your social graph, or lets you create backups of your activity stream.

knowee api

That's more or less it for this version. You can always reset or delete your account, and manually delete incorrectly monitored graphs. The knowee.net system is running on the GoGrid cloud, but I'm still tuning things to let the underlying RDF CMS make better use of the multi-server setup. If things go wrong, blame me, not them. Caching is not fully in place yet, and I've limited the installation to 100 accounts. Give it a try, I'd be happy about feedback.

OpenSocial in RDF

I've created an RDF converter for the OpenSocial field definitions.
I'm currently working on a new release of Knowee. This is another (long-promised) item on my ToDo list before I can finally concentrate on paggr (although it took too long already and hopefully won't break my neck. All the planned paid projects for bootstrapping paggr didn't happen, due to frozen budgets and politics. I hope the situation here improves soon.)

So, while I was trawling the vocabulary market, trying to gather terms for the stuff that Knowee works with (people, their profiles, contacts, accounts, and activities), I remembered OpenSocial, the effort to standardize basic interactions between social networking sites. I can use a good amount of FOAF, but OpenSocial has very handy things such as a generic "tags" field and a clean vCard mapping. And it's a super-set of Portable Contacts, too.

Today, I wrote a converter that extracts the field definitions from the JavaScript specification files, together with their labels, comments, domains, and value types. (A little too late, I found out that Dan Brickley had already done part of this a couple of months ago, could have saved me some work, d'oh.)

I've just added the osoc spec to web-semantics.org/ns. I hope it might be of use to others as well. Funnily, the "relationship" term was not part of any of the source files, maybe I still have to invent a property (a foaf:knows equivalent that also works with organizations).

poshRDF - RDF extraction from microformats and ad-hoc markup

poshRDF is a new attempt to extract RDF from microformats and ad-hoc markup
I've been thinking about this since Semantic Camp where I had an inspiring dialogue with Keith Alexander about semantics in HTML. We were wondering about the feasibility of a true microformats superset, where existing microformats could be converted to RDF without the need to write a dedicated extractor for each format. This was also about the time when "scoping" and context issues around certain microformats started to be discussed (What happens for example with other people's XFN markup, aggregated in a widget on my homepage? Does it affect my social graph as seen by XFN crawlers? Can I reuse existing class names for new formats, or do we confuse parsers and authors then? Stuff like that).

A couple of days ago I finally wrote up this "poshRDF" idea on the ESW wiki and started with an implementation for paggr widgets, which are meant to expose machine-readable data from RDFa, microformats, but also from user-defined, ad-hoc formats, in an efficient way. PoshRDF can enable single-pass RDF extraction for a set of formats. Previously, my code had to walk through the DOM multiple times, once for each format.

A poshRDF parser is going to be part of one of the next ARC revisions. I've just put up a site at poshrdf.org to host the dynamic posh namespace. For now the site links to a possibly interesting by-product: A unified RDF/OWL schema for the most popular microformats: xfn, rel-tag, rel-bookmark, rel-nofollow, rel-directory, rel-license, hcard, hcalendar, hatom, hreview, xfolk, hresume, address, and geolocation. It's not 100% correct, poshRDF is after all still a generic mechanism and doesn't cover format-specific interpretations. But it might be interesting for implementors. The schema could be used to generate dedicated parser configurations. It also describes the typical context of class names so that you can work around scoping issues (e.g. the XFN relations are usually scoped to the document or embedded hAtom entries).

I hope to find some time to build a JSON exporter and microformats validator on top of poshRDF in the not too distant future. Got to move on for now, though. Dear Lazyweb, feel free to jump in ;)

paggr teaser video and pre-registration site online

paggr teaser video and landing page
I've been semi-silently working on something new. A combination of many semwebby things I came across and played with during the last 3 years or so:
  • semantic markup
  • smart data
  • an rdf clipboard
  • ajax
  • sparql sparql sparql
  • sparql + scripting
  • sparql + templates
  • sparql + widgets
  • lightweight, federated semweb services and bots
  • UIs for open data
  • semwikis
  • agile and collaborative web development

So, what happens when you put this all together? At least something interesting, and perhaps semsol's first commercial service. (Or product, this is all just LAMP stuff and can easily be run in an intranet or on a hosted server). Anyway, still some way to go. It's called paggr, the landing page is up, and today I created a first teaser/intro video.

I'll demo the beta (launch planned for November) at the upcoming ISWC during the poster session (my poster is about SPARQL+ and SPARQLScript, the two SPARQL extensions that paggr is based on). I may have early invites by then.

As a preparation for the hopefully busy fall and winter months, though, I'll be on vacation for the next two weeks. No Email, no Web, no Phone. Yay!



HQ version (quicktime, 130MB)

Getting Real with RDF & SPARQL at DevX

DevX article about combining the Getting Real approach with SemWeb technologies
My "Getting Real" with RDF and SPARQL article is now available in DevX' Semantic Web zone:
"Getting Real" is an agile approach to web application development. This article explains how it can be successfully combined with the flexibility of semantic web technologies. The article is a look behind the scenes of dooit's first iteration (and an introduction to Trice, code included). The focus is not so much on the Web aspect of RDF, but rather on its ability to accelerate software development ("Data First", etc).

Any feedback is welcome, in comments here or over at the DevX site.

dooit - a live Getting Real experiment

I created an RDF app following the Getting Real approach
dooit
I've probably read Getting Real half a dozen times since the release of the free online version last year. The agile process seems to fit quite nicely with RDF-based tools (Semantic CrunchBase was the most recent proof of concept for me). I'm currently writing a DevX article about using RDF and SPARQL in combination with Getting Real and wondered about concrete numbers for such an approach. As I usually don't record hours for personal projects, I had to create a new one: sillily named "dooit", a to-do list manager.

dooit follows a lot of GR suggestions such as "UI first", not wasting too much time on a name, that less may be enough for 80% of the use cases, or that usage patterns may evolve as "just-as-good" replacements of features ("mm-dd" tags could for example enable calendar-like functionality).
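
The "mm-dd" tag idea can be sketched in a few lines of Python. This is an assumption-laden illustration rather than dooit's code: items are modelled as (title, tags) pairs, the year is passed in because the tags carry none, and the function names are invented.

```python
import re
from datetime import date

_MM_DD = re.compile(r"(\d{2})-(\d{2})")


def due_date(tags, year):
    """Return the date encoded by the first valid "mm-dd" tag, or None."""
    for tag in tags:
        m = _MM_DD.fullmatch(tag)
        if m:
            try:
                return date(year, int(m.group(1)), int(m.group(2)))
            except ValueError:
                continue  # looks like mm-dd but is not a real date, e.g. "13-40"
    return None


def agenda(items, year):
    """Calendar-like view: items with an "mm-dd" tag, sorted by date."""
    dated = [(due_date(tags, year), title) for title, tags in items]
    return sorted((d, t) for d, t in dated if d is not None)
```

No calendar feature has to be built into the data model for this; the usage pattern (a tag convention) plus a small filter is enough.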

I started the live experiment on Friday and finished the first iteration on Saturday. Below is a twitter log of the individual activities. I was using Trice as a Web framework, otherwise I would of course have spent much more time on generating forms and implementing AJAX handlers etc. So, the numbers only reflect the project-specific effort, but that's what I was interested in.
  • (Fr 08:24) trying the "Getting Real" approach for a small RDF app
  • (Fr 10:51) idea: a siiimple to-do list with taggable items
  • (Fr 11:02) nailing down initial feature set: ~15mins: add, edit, tick off taggable to-do items
  • (Fr 11:02) finding a silly product name: ~5mins: "dooit"
  • (Fr 11:27) creating paper sketches: ~20mins (IIRC, done yesterday evening)
  • (Fr 11:42) got unreal by first spending ~30mins on a logo
  • (Fr 12:07) Setting up blank Trice instance and basic layout to help with HTML creation: ~25mins
  • (Fr 13:52) first dooit HTML mock-up and CSS stylesheet: ~90mins
  • (Fr 17:14) JavaScript/AJAX hooks for editing in place, forms work, too, but w/o data access on the server: ~3h
  • (Fr 18:12) identifying RDF terms for the data structures: ~30min
  • (Fr 18:13) gotta run. time spent so far for creating RDF from a submitted form: 20mins
  • (Sa 14:40) continuing Getting Real live experiment
  • (Sa 14:41) "URIs everywhere" is one of the main issues for agile development of rdf-based apps. Will try to auto-gen them directly from the forms..
  • (Sa 19:04) rdf infrastructure work to auto-generate RDF from forms and to auto-fill forms from RDF: ~2h
  • (Sa 19:07) functions to send form data to RDF store via SPARQL DELETE/INSERT calls: ~1h
  • (Sa 19:09) replacing mockup template sections with SPARQL-generated snippets: ~1h (CRUD and filter-by-tag now in place, just ticking off items doesn't work yet)
  • (Sa 20:09) implementing rest of initial feature set, tests, fine-tuning: ~1 h. done :)
  • (Sa 20:14) Result of Getting Real experiment: http://semsol.org/dooit Got Real in ~12 hours
I think I can call it a success so far. One point about GR is staying focused, working from the UI to the code helps a lot here (as does live-logging, I guess ;). But I'm not done yet. Now that I have a first running version, I still have to see if my RDF-driven app can evolve, if the code is manageable and easy to change. I'm looking forward to finding that out, but my shiny new dooit list suggests finishing the DevX article first ;)
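
The two Saturday steps from the log (auto-generating URIs from forms, and sending form data to the store via DELETE/INSERT) could be sketched as follows. This is a Python illustration, not the Trice code: it emits standard SPARQL 1.1 Update syntax instead of the store-specific dialect used at the time, the base URI and vocabulary namespace are hypothetical, and form field names are assumed to be safe to use as property local names.

```python
import uuid

BASE = "http://example.org/dooit/"         # hypothetical base URI
VOCAB = "http://example.org/dooit/terms#"  # hypothetical vocabulary namespace


def mint_uri(kind):
    """Auto-generate a URI for a new resource of the given kind."""
    return "%s%s/%s" % (BASE, kind, uuid.uuid4().hex)


def literal(value):
    """Serialize a Python string as an escaped SPARQL string literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return '"%s"' % escaped


def form_to_update(form, kind="item"):
    """Turn submitted form fields into (uri, SPARQL Update string)."""
    uri = form.get("uri") or mint_uri(kind)
    triples = []
    for field, value in sorted(form.items()):
        if field == "uri" or value in ("", None):
            continue  # skip the identifier itself and empty fields
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append("<%s> <%s%s> %s ." % (uri, VOCAB, field, literal(v)))
    # Replace whatever was stored for this resource with the submitted state.
    update = "DELETE WHERE { <%s> ?p ?o } ;\nINSERT DATA {\n  %s\n}" % (
        uri, "\n  ".join(triples))
    return uri, update
```

Because the URI is minted when the form carries none, the form code never has to deal with identifiers explicitly, which is one way to soften the "URIs everywhere" issue mentioned in the log.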

CrunchBase Interview

I've been interviewed by the CrunchBase team.
Semantic CrunchBase seems to be worth the time I'm putting into it. Thanks to TechCrunch's and CrunchBase' great move to open their data and encourage reuse (and writing about the apps that use their API), I've had the chance to do a couple of SemWeb demos and reach out to the audience that could benefit as much (or maybe even more) from RDF & Co. as the groups we already have on board: Web app developers.

I also got an offer to write some related articles for DevX, and the CrunchBase team just published an interview where I (shamelessly) promote SemWeb development. I am already noticing an increased number of mails asking for RDF introductions, and people are even starting to just figure things out on their own, with friendly SPARQL paving the path.

This might be the right time for a SWEO II (with a focus on the "E") or a similar effort driven by the RDF community.
