I only wanted to track SemTech chatter, but it seems all semantics-related tweet streams are discussing just one thing right now: Schema.org. So I apparently will have to build a #semtech filtering app, but I couldn't resist and had a close look at Schema.org, too. And just like everybody else, I'll join the fun of polluting the web with yet another opinion about its potential impact on the Semantic Web initiative and related efforts.
What exactly is Schema.org?
- It is a list of instructions for adding structured data to HTML pages.
- Webmasters can choose from a long, but finite list of types and properties.
- Data-enhanced web pages trigger richer displays in Google/Bing/Yahoo search result pages.
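
To make the "structured data in HTML" part a bit more concrete, here is a minimal sketch of what a consumer might do with such markup. The sample page, the `MicrodataSketch` class, and the `extract_items` helper are my own illustrative assumptions, not anything published by Schema.org, and the parser only handles flat items with simple text or link values (no nested items, no `itemref`).

```python
from html.parser import HTMLParser

# Illustrative sample markup (an assumption, not taken from a real page).
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span itemprop="genre">Science fiction</span>
  <a itemprop="trailer" href="/trailers/avatar.html">Trailer</a>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Collects flat Microdata items; a sketch, not a conforming parser."""

    def __init__(self):
        super().__init__()
        self.items = []      # one dict per itemscope element
        self._prop = None    # property whose text content is being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # A bare itemscope attribute starts a new item.
        if "itemscope" in attrs:
            self.items.append({"type": attrs.get("itemtype"), "properties": {}})
        prop = attrs.get("itemprop")
        if prop and self.items:
            if tag == "a" and "href" in attrs:
                # For links, the property value is the URL, not the link text.
                self.items[-1]["properties"][prop] = attrs["href"]
            else:
                self._prop = prop
                self._text = []

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first closing tag ends the current property.
        if self._prop is not None:
            value = "".join(self._text).strip()
            self.items[-1]["properties"][self._prop] = value
            self._prop = None


def extract_items(html):
    """Return the list of items found in the given HTML string."""
    parser = MicrodataSketch()
    parser.feed(html)
    return parser.items


if __name__ == "__main__":
    for item in extract_items(SAMPLE_HTML):
        print(item["type"], item["properties"])
```

The point is less the parser itself than how little is needed: a known type plus a handful of named properties is all a search engine has to look for.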
Why the uproar?
- Schema.org proposes the use of Microdata, a rather new RDF format that was not developed by the RDF community.
- Schema.org introduces a new vocabulary which doesn't re-use terms from existing RDF schemas.
Who can benefit from it?
- The web, because the simple template-like instructions on schema.org will boost the amount of structured data, similar to Facebook's Open Graph Protocol.
- The semantic web market, by offering complementary as well as extending/competing solutions.
- SEO people, because they can offer their service with less effort.
- Website owners, who can more reliably customize their search engine displays and increase CTRs.
- Possibly HTML5 (doctype) deployment, because the supported structures are based on HTML5's Microdata.
- Verticals around popular topics (Music, Food, ...) because the format shakeout will make their parser writers' lives easier.
- Verticals who manage to successfully establish a schema.org extension (e.g. Job Offers).
- The search engine companies involved, because extracting (known) structures can be less expensive and more accurate than NLP and statistical analysis. Controlling the vocabulary also means being able to tailor it to semantic advertising needs; integrating the schema.org taxonomy with AdWords would make a lot of (business) sense. And finally, the search engines can more easily generate their own verticals now (as Google has already successfully done with shopping and recipe browsers), making it harder for specialized aggregators to gain market share.
- Spammers, unless the search engines manage to integrate the structured markup with their existing stats-based anti-spam algorithms.
Who might be threatened and how could they respond?
- Microformats and overlapping RDF vocabularies such as FOAF (unlikely) or GoodRelations, which Schema.org already calls "earlier work". Even if they continue to be supported for the time being, implementers will switch to schema.org vocabulary terms. One opportunity for RDF schema providers lies in grounding their terms in the schema.org taxonomy and highlighting use cases beyond the simple SEO/Ad objectives of Schema.org. RDF vocabs excel in the long tail, and there are many opportunities left (especially for non-motorcycle businesses ;-). This will work out best if there are finally going to be applications that utilize these advanced data structures. If the main consumers continue to be search engines, there is little incentive to invest in higher granularity.
- The RDFa community. They think they are under attack here, and I wonder if Manu is perhaps overreacting. Hey, if they had listened to me they wouldn't have this problem now, but they had several reasons to stick to their approach and I don't think these arguments simply get wiped away by Schema.org. They may have to spend some energy now on keeping Facebook on board, but there are enough other RDFa adopters that they shouldn't be worried too much. And, like the RDF vocab providers, they should highlight use cases beyond SEO. The good news is that potential spam problems, which are more likely to occur in the SEO context, will now get associated with Microdata, not RDFa. And the Schema.org graph can be manipulated by any site owner while Facebook's interest graph is built by authenticated users. Maybe the RDFa community shouldn't have taken the SEO train in the first place anyway. Now Schema.org simply stole the steam. After all, one possible future of the semantic web was to creatively destroy centralized search engines, and not to suck up to them. So maybe Schema.org can be interpreted as a kick in the back to get back on track.
- The general RDF community, but unnecessarily so. RDFers kicked off a global movement which they can be proud of, but they will have to accept that they no longer dictate what the semantic web is going to look like. Schema.org seems to be a syntax fight, but Microdata maps nicely to RDF, which RDFers often ignore (that's why schema.rdfs.org was so easy to set up). The real wakeup call is less obvious. I'm sure that until now, many RDFers didn't notice that a core RDF principle is dying. RDFers used to think that distinct identifiers for pages and their topics are needed. This assumption was already proved wrong when Facebook started their page-based OGP effort. Now, with Schema.org's canonical URLs, we have a second, independent group that is building a semantic web truly layered on top of the existing web, without identifier mirrors (and so far without causing any URI identity crisis). This evolving semantic web is closer to the existing web than the current linked data layer, and probably even more compatible with OWL, too. There is a lot we can learn. Instead of becoming protective, the RDF community should adapt and simplify their offerings if they want to keep their niches relevant. Luckily, this is already happening, as e.g. the Linked Data API demonstrates. And I'm very happy to see Ivan Herman increasingly speaking/writing about the need to finally connect web developers with the semantic web community.
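
As a rough illustration of the "Microdata maps nicely to RDF" claim, the sketch below turns one extracted item into N-Triples-style lines. It rests on several assumptions of mine: the page's canonical URL is used directly as the subject (in the page-based spirit described above, which is not necessarily what a formal Microdata-to-RDF mapping would do), every property value is treated as a plain string literal, and property URIs are built by appending the property name to the vocabulary namespace of the item type. The `item_to_ntriples` function and the example URL are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def item_to_ntriples(subject, item_type, properties):
    """Serialize one flat item as a list of N-Triples-style lines.

    subject    -- URL used as the subject (assumed: the page's canonical URL)
    item_type  -- full type URL, e.g. "http://schema.org/Movie"
    properties -- dict mapping property names to string values
    """
    # Assumption: the vocabulary namespace is the type URL minus its last segment.
    vocab = item_type.rsplit("/", 1)[0] + "/"
    lines = [f"<{subject}> <{RDF_TYPE}> <{item_type}> ."]
    for name, value in properties.items():
        # Escape backslashes and quotes so the literal stays well-formed.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{vocab}{name}> "{escaped}" .')
    return lines


if __name__ == "__main__":
    triples = item_to_ntriples(
        "http://example.com/movies/avatar",  # hypothetical page URL
        "http://schema.org/Movie",
        {"name": "Avatar", "genre": "Science fiction"},
    )
    print("\n".join(triples))
```

Note that there is no separate identifier for "the movie" as opposed to "the page about the movie" here; the page URL plays both roles, which is exactly the simplification discussed above.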
What about Facebook?
Probably the more interesting aspect of this story: what will Facebook do? Their interest graph combined with linked data has big potential, not only for semantic advertising. And Facebook is interested in getting as many of their hooks into websites as possible. Switching to Microdata and/or aligning their types with Schema.org's vocabulary could make sense. Webmasters would probably welcome such a consolidation step as well. On the other hand, Facebook is known for wanting to keep things under their own control, too, so the chance of them adopting Schema.org and Microdata is rather low. This could well turn into an RSS déjà vu with a small set of formats (OGP-RDFa, full RDFa, Schema.org-Microdata, full Microdata) fighting for publisher and developer attention.
I am glad that Microdata finally gets some deserved attention and that someone acknowledged the need for a format that is easy to write and to consume. Yes, we'll get another wave of "see, RDF is too complicated" discussions, but we should be used to them by now. I expect RDF toolkits to simply integrate Microdata parsers soon-ish (if we're good at one thing then it's writing parsers), and the Linked Data community gets just another taxonomy to link to. Schema.org owns the SEO use case now, but it's also a nice starting point for our more distributed vision. The semantic web vision is bigger than data formats and it's definitely bigger than SEO. The enterprise market which RDF has mainly been targeting recently is a whole different beast anyway. No kittens killed. Now go build some apps, please ;-)