skimming

skimming at XML 2007 (and The Cloud's Silver Lining)

I've not yet had a chance to write up my experiences at XML 2007 in Boston last week, but an announcement today by Amazon prompts me to at least summarise one of the talks I did, called XForms, REST, XQuery...skimming. (Slides)

It's a theme I've touched on a little in this blog, and have spoken on before (at an XML UK meeting last year, and at XTECH 2007 this year), and the idea goes something like this: building web applications by dynamically generating user interfaces that contain data is awkward, and difficult to maintain. Web services (which include REST and APP) remove some of the dependency on databases by hiding the data behind a simple abstraction layer--a kind of 'ODBC for the web'--and XQuery takes this further by giving us a standard language to get at the data.

But XForms is the icing on the cake, since it gives us a rich-client language that can be deployed like traditional web applications, yet is one that ensures a clean separation of the data and the user interface.

During the course of the presentation I showed how incredibly easy it is to create an application with XForms, since forms can be first created on the desktop--loading and saving their data from local XML files--before being deployed to any type of web server. By creating static forms that load their own data, rather than the more common model of dynamically generating static files that contain data, we end up with files that are completely agnostic about where they are delivered from. My demonstrations took them from the desktop, to a simple web server, to eXist (an XML database).

But these applications could even run in 'the cloud', with no web-server. That may sound odd, but I could store any of these XForms applications in SVN or Amazon's S3 and they would work as well as if they were in IIS or Apache. This is the ultimate in simple maintenance and future-proofing, since you are no longer reliant on any particular operating system or server-side language...the forms just 'are'.

GData

The final step of my presentation was to take this a little further and show an XForm that interacts with Google's GData; the form is simple, and accepts a longitude, latitude and comment which is then inserted as a row into a Google Spreadsheet, using Atom Publishing Protocol (APP). Another form is then used to retrieve the entire spreadsheet, and each row is used to create a pin on a map.
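
To make the 'insert a row' step a little more concrete, here is a minimal sketch of the kind of Atom entry such a form might submit. It is an illustration rather than the form from the talk: the column names (longitude, latitude, comment) simply mirror the fields described above, the gsx namespace is assumed to be the one used by the Spreadsheets list feed, and no request is actually sent.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Assumption: namespace for per-column values in the Spreadsheets list feed.
GSX_NS = "http://schemas.google.com/spreadsheets/2006/extended"


def build_row_entry(longitude, latitude, comment):
    """Serialise one spreadsheet row as an Atom entry.

    The column names are hypothetical; they mirror the three form fields.
    """
    ET.register_namespace("atom", ATOM_NS)
    ET.register_namespace("gsx", GSX_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    columns = (("longitude", longitude), ("latitude", latitude), ("comment", comment))
    for column, value in columns:
        # One child element per spreadsheet column.
        cell = ET.SubElement(entry, f"{{{GSX_NS}}}{column}")
        cell.text = str(value)
    return ET.tostring(entry, encoding="unicode")
```

With APP, the resulting entry would be sent in the body of a POST to the collection URI, which is the part the XForms submission handles declaratively.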

The great thing about this demonstration is that it shows that not only are our forms now agnostic about where they run--desktop, server-side, S3, etc.--but now the data becomes agnostic; we don't care how the data is stored, all we are concerned about is the protocol used to interact with it.

And crucially, XForms' ability to get at this data shows that it really is the missing link when it comes to building a new kind of web application.

SimpleDB

But this generalisation about data just took another leap with the announcement today by Amazon of a closed beta called SimpleDB. This will allow programmers to store and query data in a manner similar to the way that GData works.

By putting data into the cloud, and then using an easily deployed, declarative rich-client language like XForms, it is possible for the architecture of web applications to be altered in a quite fundamental way, in a direction that is completely different to what we're used to.

RDFa hacking at Hack Day

Hack Day: London, June 16/17 2007

So tomorrow morning it's off to Alexandra Palace for the start of Hack Day London. My wife is raising an eyebrow as I ready my sleeping-bag, and she may have a point...but I'm secretly quite looking forward to hanging out with a lot of smart people, hearing about their good ideas, with no-one around to tell me what time to go to bed.
For myself, I'm hoping to get involved in projects relating to metadata and rich-clients. I'm sure there will be people working in this area, but if not, I've got plenty of new things we've been experimenting with that I might be able to get others to take a look at.

RDFa Parsing

Perhaps the most interesting is the use of our RDFa parser within a blog or other document. By adding metadata to elements in a document you are providing a hook which gives you more information about some item. Onto this hook you can 'hang' further functionality or more information.

To illustrate, let's say I have a cookery blog on which I mentioned Canteen Cuisine by Marco Pierre White. Given that I probably want to make some money from the site with my Google Ads and reseller links, I will most likely place a link to a site like Amazon around the words "Canteen Cuisine":

I found a good recipe in <a href="http://www.amazon.com/gp/product/0091808189?ie=UTF8&tag=escuelerie-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0091808189">
Canteen Cuisine
</a> which you really should buy so that I get a kick-back.

Obviously this satisfies my commercial yearnings, but it doesn't really contribute to the great 'mass' of semantic information that others might find useful. It doesn't actually tell you which book is being referred to, only that there is a link to a page on Amazon. It would be far better--and easier, as it happens--to simply tag the book with an ISBN number, and leave the creation of links to Amazon, or Barnes and Noble, to some other layer of behaviour.
The key is to use RDFa to indicate a unique identifier for the book, and also to indicate an object type, i.e., to indicate that the item we're dealing with is really a book:

I found a good recipe in <span xmlns:bib="http://somebig.org/" about="urn:ISBN:0091808189" class="bib:book">
Canteen Cuisine
</span> which you really should buy so that I get a kick-back.

If you know RDF then you'll understand that the generated triple is:

<urn:ISBN:0091808189> rdf:type <http://somebig.org/book> .

If you don't know RDF don't worry; the main point is that we've added mark-up that says nothing more than:

"urn:ISBN:0091808189" represents a book.

But that simple fact is pretty useful, since now we could use the ISBN number to go and get more information from Amazon, and then display the data in a tooltip, or add links so that a reader of the cookery blog could buy the book or add it to their wish-list.
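
For anyone curious how mark-up like the span above turns into that one fact, here is a rough sketch in Python. It is not our parser, only an illustration of the idea: it looks for elements that carry both an about and a class attribute, and expands the prefixed class value using a namespace declared on the same element. A real RDFa processor does a good deal more (inherited namespaces, other attributes, nested subjects), and the class and function names here are made up for the example.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TypedItemFinder(HTMLParser):
    """Collect (subject, rdf:type, object) triples from @about plus @class."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        subject = attributes.get("about")
        curie = attributes.get("class")
        if not subject or not curie or ":" not in curie:
            return
        prefix, local_name = curie.split(":", 1)
        # Simplification: only a prefix declared on this same element is
        # resolved. HTMLParser lower-cases attribute names, hence lower().
        namespace = attributes.get("xmlns:" + prefix.lower())
        if namespace:
            self.triples.append((subject, RDF_TYPE, namespace + local_name))


def find_typed_items(markup):
    """Return every typed item found in a fragment of mark-up."""
    finder = TypedItemFinder()
    finder.feed(markup)
    finder.close()
    return finder.triples
```

Feeding it the span from the cookery example should give a single triple whose subject is urn:ISBN:0091808189 and whose object is http://somebig.org/book, while the plain Amazon link gives nothing at all, which is rather the point.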

Before we look at how we get that data, it's worth pointing out that Amazon does offer the Quick Linker widget for TypePad users, but you'll see that the mark-up looks very strange:

<a type=amzn asin=B000012345>I love this item</a>

In some ways it's similar to what I've shown with RDFa, where we use a small amount of mark-up as a hook to generate richer functionality, but in their case it's an Amazon-specific solution. A key design goal of RDFa is to provide a generic solution that would work with all types of data.

RDFa action handlers

Once the RDFa parser has 'found' items in the page it then does something with them. Up until now all of our experiments have been to simply display data in a different way, such as showing FOAF information as a card with the person's picture, or event information as a pin on a map. The Operator Firefox extension (which now supports RDFa as well as microformats) takes a slightly different approach and allows the user to do things with the data, such as add an hCard to their contacts database.

The Amazon book experiment is a first step towards combining these approaches. The action handler that gets run when a book is found actually goes to get further data about the book from Amazon, before it shows a panel that contains an image of the book and its title. This means that we don't need to know any URLs for the book on Amazon or where its images are located, since they are derived in the action handler; all we need is the ISBN number.
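
The shape of an action handler can be sketched roughly as follows. This is a simplified illustration in Python, not the code we run in the browser: the registry, the decorator and the lookup callable are all hypothetical names, and lookup stands in for whatever fetches the extra details about the book.

```python
# Hypothetical registry mapping a type URI to the function that handles it.
HANDLERS = {}


def action_handler(type_uri):
    """Register the decorated function as the handler for one item type."""
    def register(func):
        HANDLERS[type_uri] = func
        return func
    return register


@action_handler("http://somebig.org/book")
def show_book_panel(subject, lookup):
    # The handler only receives the identifier; the title and image are
    # fetched at this point rather than being written into the page.
    details = lookup(subject)
    return {
        "panel": "book",
        "title": details.get("title"),
        "image": details.get("image"),
    }


def run_handlers(items, lookup):
    """Run the registered handler, if any, for each (subject, type) pair."""
    results = []
    for subject, type_uri in items:
        handler = HANDLERS.get(type_uri)
        if handler is not None:
            results.append(handler(subject, lookup))
    return results
```

Items whose type has no registered handler are simply skipped, so new kinds of item can be supported by adding a handler without touching the page mark-up.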

Yahoo! Pipes

But an interesting twist here is that we don't go directly to Amazon to get the data, but instead go via Yahoo! Pipes. There are many reasons for doing this--not least that it fits well with my ideas about skimming--but the main one is simply because Pipes gives us JSON data for any feed we can access, which works nicely with the model we're building here. By running all of our data requests through Pipes we can provide a single format to our RDFa processor, even if we decide to change the source of our data at some later point, or add further services into the pipe to add more metadata.

The pipe that we've defined for Amazon actually takes the URI of the ISBN number (such as urn:ISBN:0091808189) rather than just the ISBN number, because at some point I'd like this to be a proper RDF query. So I'm actually saying, "give me everything you know about this URI", and whilst that URI currently represents a book, in the future the same query might be able to return information about URIs that represent people, ships, planets, and so on.
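
As a sketch of what 'give me everything you know about this URI' might look like on the wire, the snippet below builds a request URL that passes the whole URI to a pipe and asks for JSON back. The pipe id is a placeholder and the uri parameter name is an assumption made for this example, not the name our pipe actually uses; the helper that pulls the ISBN back out of the URN is likewise only illustrative.

```python
from urllib.parse import urlencode

# Assumptions: PIPE_ID is a placeholder and 'uri' is a hypothetical
# parameter name; only the general shape of the request matters here.
PIPE_RUN_URL = "http://pipes.yahoo.com/pipes/pipe.run"
PIPE_ID = "HYPOTHETICAL_PIPE_ID"


def build_pipe_request(resource_uri):
    """Build a URL asking the pipe for JSON about the given URI."""
    query = urlencode({"_id": PIPE_ID, "_render": "json", "uri": resource_uri})
    return PIPE_RUN_URL + "?" + query


def isbn_from_uri(resource_uri):
    """Return the ISBN if the URI is a urn:ISBN: identifier, else None."""
    scheme, _, rest = resource_uri.partition(":")
    namespace, _, value = rest.partition(":")
    if scheme.lower() == "urn" and namespace.lower() == "isbn" and value:
        return value
    return None
```

Passing the full URI rather than the bare number means the caller never needs to know what kind of thing it is asking about; deciding that a urn:ISBN: value should be looked up as a book is left to the far end.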

What's next?

Everything we've been working on so far is about providing a solid widget framework that can be used to enhance any page on any site, in a simple and consistent way. I'm sure that at Hack Day there will be people working on ideas to do with enhancing blogs, ideas related to using the semantic web, microformats (whether in the traditional or RDFa style), and many other things besides, and so I'm really hoping that this will be an opportunity to move some of this work on, in the context of real applications that people want to build.