So while waiting for my consulting contract from UVM to come through, I thought I’d while away the time with a METS XForm. This has immediately proved more challenging than I had anticipated. I think it would not be too hard to make institution-specific forms, but that it will be nearly impossible to create [...]
@role values for SVG
Submitted by administrator on Sat, 05/17/2008 - 19:12.
One idea that was part of the early design of the XHTML Role Attribute Module, but which has got a little lost, is that elements from any language can also provide a handy source of role values.
To reconstruct the logic:
A role value is simply a URI, that is, a resource. The reason for this is that it puts the extensibility hook we're creating straight into the world of RDF.
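The idea that a role value is 'simply a URI' can be illustrated with a small Python sketch that expands prefixed role values into full URIs. It assumes CURIE-style expansion, where the URI bound to a prefix is concatenated with the part after the colon. The PREFIXES mapping, the DEFAULT_VOCAB URI used for unprefixed values and the expand_role function are all hypothetical, chosen for illustration rather than taken from the Role Attribute Module.

```python
# Hypothetical prefix bindings; a real document would declare its own
# prefixes (for example with xmlns:xf="..." on the root element).
PREFIXES = {
    "xf": "http://example.org/xforms#",
    "xh": "http://example.org/xhtml#",
}

# Hypothetical vocabulary for unprefixed values such as role="main".
DEFAULT_VOCAB = "http://example.org/vocab#"


def expand_role(value, prefixes=PREFIXES):
    """Expand a space-separated @role value into a list of URIs."""
    uris = []
    for token in value.split():
        prefix, sep, reference = token.partition(":")
        if not sep:
            # Unprefixed value: resolve against the default vocabulary.
            uris.append(DEFAULT_VOCAB + token)
        elif prefix in prefixes:
            # CURIE-style expansion: prefix URI + local part.
            uris.append(prefixes[prefix] + reference)
        else:
            raise ValueError(f"undeclared prefix: {prefix!r}")
    return uris


print(expand_role("xf:hint"))  # ['http://example.org/xforms#hint']
```

Once every role has been turned into a URI like this, a processor can treat it as an ordinary RDF resource, which is exactly the extensibility hook described above.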
Now, some values of @role will need to be invented. This might be because they simply don't exist, or because we want the values to be 'cross-cutting', and apply to many different mark-up languages.
But many suitable values already exist, and they can be used in a variety of situations. For example, XForms has a hint element, which can be applied to its form controls:
<xf:input ref="surname">
<xf:label>Surname:</xf:label>
<xf:hint>Please enter your surname or family name</xf:hint>
</xf:input>
The semantics of 'XForms hint' are pretty well defined, so it should be straightforward to apply them to other situations. For example, an Ajax library could pick up a hint and do something with it in an (X)HTML document, even without XForms:
<input name="surname" />
<div role="xf:hint">
Please enter your name
</div>
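To make the Ajax idea concrete, here is a rough sketch of what such a library might do, written in Python with the standard xml.etree.ElementTree module rather than in browser JavaScript. It assumes well-formed, un-namespaced markup, that a hint belongs to the nearest preceding input among its siblings, and that 'doing something' means copying the hint into the input's title attribute so it shows as a tooltip. The attach_hints function and these pairing rules are hypothetical.

```python
import xml.etree.ElementTree as ET

HINT_ROLE = "xf:hint"


def attach_hints(root):
    """Copy the text of each role="xf:hint" element onto the nearest
    preceding <input> sibling as a title attribute (a tooltip).
    Returns the number of hints attached."""
    attached = 0
    for parent in root.iter():
        last_input = None
        for child in parent:
            if child.tag == "input":
                last_input = child
            elif HINT_ROLE in child.get("role", "").split() and last_input is not None:
                # Normalise whitespace so the tooltip reads as one line.
                hint = " ".join("".join(child.itertext()).split())
                last_input.set("title", hint)
                attached += 1
    return attached


page = ET.fromstring(
    '<form><input name="surname" />'
    '<div role="xf:hint">\n  Please enter your name\n</div></form>'
)
attach_hints(page)
print(page.find("input").get("title"))  # Please enter your name
```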
SVG
This whole topic came up recently because someone asked whether it would be possible to add some new role values to identify paragraphs, sections, headers and so on, which could be used in languages like SVG. The answer is that if we use the XHTML p, section, h1, h2, etc., values, then we don't need to invent new roles:
<svg:text role="xh:h1">Metadata</svg:text>
<svg:text role="xh:p">
Metadata is data about data...which is also data...kind of
turtles all the way down...
</svg:text>
As you can see, a role-aware voice system would be able to provide feedback to a user in any mark-up language, simply by knowing XHTML role values.
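A role-aware reader needs very little to do this. The following Python sketch walks the text elements of an SVG document and prefixes each one with a spoken cue chosen from its role value. The CUES table, the cue wording and the utterances function are hypothetical, and the sketch matches role values as literal strings; a fuller implementation would expand them to URIs first, so that the choice of prefix would not matter.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical spoken cues keyed by role value.
CUES = {"xh:h1": "Heading level 1", "xh:h2": "Heading level 2", "xh:p": "Paragraph"}


def utterances(svg_source):
    """List what a role-aware reader might announce for each SVG text element."""
    root = ET.fromstring(svg_source)
    spoken = []
    for text in root.iter(f"{{{SVG_NS}}}text"):
        words = " ".join("".join(text.itertext()).split())
        cue = CUES.get(text.get("role", ""))
        spoken.append(f"{cue}: {words}" if cue else words)
    return spoken


doc = """<svg xmlns="http://www.w3.org/2000/svg">
  <text role="xh:h1">Metadata</text>
  <text role="xh:p">Metadata is data about data...which is also data...kind of
    turtles all the way down...</text>
</svg>"""

for line in utterances(doc):
    print(line)
```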
Upcoming talks on RDF, RDFa and XForms
Submitted by administrator on Thu, 05/01/2008 - 12:50.
May is going to be pretty busy with talks about XForms, RDF and RDFa coming up.
First up is my talk "XForms, REST, XQuery...and skimming" at XTech 2008. The talk embraces themes I've been pursuing for a couple of years now: that as we put more functionality into the client, and servers get 'cleverer', it becomes much easier to build sophisticated web applications. Of course server technologies are moving so fast now that this whole approach makes more and more sense, so I'm looking forward to covering recent developments in my talk. For example, both Amazon and Google effectively have 'databases in the cloud' that can be used to store and query data via APIs, with literally no configuration.
A few weeks later I'm going to be giving a tutorial on RDF at SemTech. This is an interesting development for me, because RDF and the semantic web were always my first interests--before XForms and before XHTML 2. (As well as writing RDF parsers, and designing applications, I also contributed chapters on RDF and RDFS to a couple of books on metadata and XML.)
But one problem I always had when trying to build semantic-web applications was that defining the user interface was pretty hairy. This was partly because RDF Schema is tricky to process, but also because HTML was insufficiently powerful in its core feature-set, so the translation from RDF to HTML involved a lot of work.
It was therefore the need for a user interface language much richer than HTML that got me involved in the XForms standard (and led me to work with a team of people to produce the first fully conforming XForms processor, formsPlayer). So although it may not seem directly connected to the semantic web, I believe that in the coming period XForms will start to become a key part of the semantic web's architecture.
Another problem I kept coming up against whilst developing for the semantic web was the difficulty in actually publishing metadata. In particular I always found it frustrating that there was a lot of really useful metadata just sitting in ordinary web pages, and no-one could get at it. Attempting to resolve this problem gave rise to RDFa, and I'm excited that the RDFa in XHTML working draft is extremely close to becoming a stable recommendation. And as interest in RDFa grows, I'm pleased to say that some of my other presentations in May will be 'tech talks' on RDFa at Yahoo!, eBay and Google. (I'm really excited that I might be getting to meet some of the guys behind Yahoo!'s SearchMonkey.)
My final talk of the month will be at the excitingly-named Kings of Code, and I'm looking forward to talking about XHTML, XHTML 2, HTML 5, XAML, and anything else I can think of in relation to web languages.
Metadata Repositories vs. Metadata Registries
Submitted by administrator on Fri, 03/21/2008 - 16:06.
For several years people have been using the terms metadata Registry and metadata Repository inconsistently, imprecisely and almost interchangeably, and I would like to weigh in on how these terms could be used more precisely to help organizations manage their metadata processes effectively.
First, let's take the definition of a Repository. Webster defines a repository as "a place, room, or container where something is deposited or stored." Note that there is nothing in this definition about the quality of the things being stored, or about any process for checking whether new incoming items duplicate things already in the repository. If I have 100 users, they could each define "Customer" as they see fit and put their own definition into the metadata repository. No problems.
On the other hand, let's take the word Registry. A Registry has the connotation of being more than just a shared dumping ground. Registries have the additional capability of workflow processes that check that new metadata is not a duplicate (within a given namespace). One of Webster's definitions is "an official record book." Note the word official.
A Repository is like the front porch of a house: no locks prevent new things from landing there. A Registry, by contrast, is a protected back room where human-centric workflow processes are used to ensure that metadata items are non-duplicates, precise, consistent, concise, distinct, approved and unencumbered by business rules that would prevent reuse across an enterprise. These registries have become the central foundation on which agility can be baked into many enterprise processes. The latest version of Kimball's Data Warehouse Lifecycle Toolkit (which is actually a very good read) even goes as far as to call its process "metadata-driven", which is not so different from the model-driven development world.
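The difference between the front porch and the back room can be sketched in a few lines of Python. This is a deliberately simplified model, not an implementation of ISO/IEC 11179: the Repository and Registry classes are hypothetical, a 'duplicate' is reduced to the same term name (ignoring case and surrounding spaces) within the same namespace, and the human stewardship workflow is reduced to a single approve call.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Front porch: every submission is stored, duplicates and all."""
    items: list = field(default_factory=list)

    def deposit(self, term, definition):
        self.items.append((term, definition))


@dataclass
class Registry:
    """Back room: submissions are checked, then wait for a steward's approval."""
    approved: dict = field(default_factory=dict)  # (namespace, key) -> definition
    pending: dict = field(default_factory=dict)   # (namespace, key) -> definition

    @staticmethod
    def _key(namespace, term):
        return (namespace, term.strip().casefold())

    def submit(self, namespace, term, definition):
        """Queue a new item; reject it if the namespace already has that term."""
        key = self._key(namespace, term)
        if key in self.approved or key in self.pending:
            return False
        self.pending[key] = definition
        return True

    def approve(self, namespace, term):
        """Stand-in for the human review step: promote a pending item."""
        key = self._key(namespace, term)
        self.approved[key] = self.pending.pop(key)


repo = Repository()
for n in range(100):
    repo.deposit("Customer", f"user {n}'s definition")
print(len(repo.items))  # 100

registry = Registry()
print(registry.submit("sales", "Customer", "A party that buys our products"))  # True
print(registry.submit("sales", "customer", "Someone we send invoices to"))    # False
```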
Registries carry an implicit connotation of trust. They now serve as a central process for creating shared meaning across the enterprise. Definitions in a registry have been vetted by an enterprise-level organization that has the responsibility of enterprise data stewardship. They have a high probability of being consistent with industry best practices and vertical industry standards. Registries are the go-to source for creating canonical XML schemas, enterprise ontologies or conformed dimensions in an OLAP cube. Repositories, by contrast, hold personal or small departmental definitions that reflect an isolated view of the world.
None of these ideas are really new. They are at the core of the ISO/IEC 11179 metadata registry standard. Note that they don't call it a repository standard! People are just now starting to understand how important Registries are in most enterprise-wide systems. The growth of Business Intelligence and Enterprise Data Warehouse terminology and Service Oriented Architectures is a good place to see the rise of repositories and registries. We now see service registries, portlet registries, model registries...the list goes on-and-on.
Much of the background on the differences between repositories and registries can be traced way back to the early days of object-oriented systems, in the 1995 book Succeeding with Objects by Adele Goldberg and Kenneth Rubin. This was one of the first books on enterprise reuse strategies; the authors defined the concept of enterprise asset reuse and the need for a trust-driven repository as a basis for reusing assets. They identified a multi-step process for reviewing new submissions to determine whether a submission duplicated existing assets, and showed how critical it was to classify items in a registry and to search the existing registry for duplicates before new items are added. If you can get a copy of the book, I would suggest reading the section "Set Up a Process for Maintaining Reusable Assets" on page 245.
The book then goes on to show how organizations can and should be structured to reuse these assets and gives the pros and cons of the differing organization structures and their impact on reuse. This is the basis for the data governance and data stewardship movement in many organizations today.
So the next time someone uses the word registry or repository in a conversation, ask them whether they are using the definition that is consistent with the corporate business term registry, or their own private definition from their own repository of imprecisely used buzzwords.
XForms for Metadata Creation
Submitted by administrator on Fri, 02/29/2008 - 22:06.
I just got back from Code4lib 2008 in Portland, Oregon. Last year I did a lightning talk on XForms; this year I did a full 20-minute presentation. Well, actually I did half of a 20-minute presentation. I split the slot with Michael Park from Brown. We had both been working on MODS editors and had [...]
Working, not blogging
Submitted by administrator on Fri, 12/14/2007 - 15:14.
The past few weeks have been pretty busy for me, and I haven’t had much time or inclination to blog. But here is a round-up of what’s going on at the CDI these days.
MODS editor
The new MODS editor (talked about here, and you can see a version here) has been in production for about a [...]
MODS: a bumpy launch
Submitted by administrator on Tue, 12/04/2007 - 00:07.
Last week was our target date to launch the new MODS editor. On Wednesday we had a training session with the cataloging department to review the workflow and familiarize the catalogers with the form. Metadata has been a bit of a bottleneck for us because the two librarians who were doing the cataloging could only [...]
Authority Control: MODS & MADS
Submitted by administrator on Tue, 10/30/2007 - 17:52.
I’ve been working on implementing authority control in our metadata editor for the past few weeks. I’ll spare you the painful details and just outline the process, and direct you to a mostly working example of the MODS editor with name and subject suggest functions (the subject suggest is still a bit buggy).
I decided [...]
XForms and CSS
Submitted by administrator on Wed, 10/10/2007 - 19:54.
I’m in the final steps on my MODS XForm (check out the updated copy here), and I’ve run across quite a number of strange behaviors and downright infuriating bugs. Most of the problems I’ve had have been with CSS. I would really like to use display:table, display:table-row and display:table-cell to get my input boxes and labels to [...]
Metadata Web Services
Submitted by administrator on Wed, 10/10/2007 - 10:41.
I have been using the eXist native XML database/web server for about nine months now and it is starting to change the way I think about metadata management.
My latest project, for a financial institution, requires us to quickly build XForms to manage various kinds of metadata as well as data. What I am finding is that my old method of storing metadata in XML files on a file system and then transforming it with Apache Ant was complex. My new approach is to store the XML directly in eXist. This used to seem a little bit hard, since I thought you had to use the eXist web interface to upload each XML file, and Ant scripts to back up your eXist database.
This all changed when I was shown how to use the Microsoft Windows WebDAV tool. Copying files to eXist and backing up the entire data store is just a drag and drop using Windows.
Now an entirely new set of metadata web services is becoming much easier to build. Take the simple task of building a pick list of enumerated values for a form. XForms lets you use the select1 control and specify an itemset using an XPath expression. I can now just load the data elements into an instance and take the values and labels directly from the enumerations in the metadata registry files.
The only drawback is that you load more metadata (such as the full definitions) than you need to build the form. But once again eXist comes to the rescue: it takes just a few lines of code to create a little web service (using XQuery) that takes the name of a code table and returns just the label/value pairs. With this method the selection lists are always up to date and don't require any "batch" updates.
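The post describes this service as a few lines of XQuery running in eXist. As a language-neutral illustration of the same idea, here is a Python sketch using xml.etree.ElementTree; it is not the author's XQuery. The registry layout (codeTable and enumeration elements with label and value attributes and a definition child) is a hypothetical format assumed for the example, as are the function names.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry file: each enumeration carries a value, a label
# and a longer definition that the form does not need.
REGISTRY = """<registry>
  <codeTable name="AccountType">
    <enumeration value="CHK" label="Checking">
      <definition>An everyday transaction account.</definition>
    </enumeration>
    <enumeration value="SAV" label="Savings">
      <definition>An interest-bearing deposit account.</definition>
    </enumeration>
  </codeTable>
</registry>"""


def code_table_items(registry_xml, table):
    """Return only the label/value pairs for one code table."""
    root = ET.fromstring(registry_xml)
    for table_el in root.iter("codeTable"):
        if table_el.get("name") == table:
            return [
                {"label": e.get("label"), "value": e.get("value")}
                for e in table_el.iter("enumeration")
            ]
    return []  # unknown table: an empty pick list


def as_itemset(items):
    """Serialize the pairs as a small XML instance a form can load."""
    root = ET.Element("items")
    for pair in items:
        item = ET.SubElement(root, "item")
        ET.SubElement(item, "label").text = pair["label"]
        ET.SubElement(item, "value").text = pair["value"]
    return ET.tostring(root, encoding="unicode")


print(as_itemset(code_table_items(REGISTRY, "AccountType")))
```

On the form side, the itemset of a select1 control could then point at the returned item elements (for example nodeset="instance('codes')/item", with label and value children referring to label and value); the instance name codes is again hypothetical.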
What I am learning from this is that in the past, metadata management was usually an afterthought: something the coding-standards people used to enforce database column naming conventions. But with metadata stored in eXist, metadata becomes part of the application services. Building apps becomes a matter of assembling forms that pull metadata from the registry in real time.
If you are concerned that the metadata registry server will be overloaded with requests each time a form loads, remember that these services are RESTful, so the results can be cached and don't have to be regenerated. I still have more to learn about making these services fast, but since metadata is small it can almost always be held in RAM, so disk I/O is minimal.
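Because a RESTful service is addressed by its URL, the URL makes a natural cache key. Here is a minimal Python sketch of a time-limited, in-memory cache that a server could put in front of such a service; the TTLCache class, the 60-second lifetime and the example URL are assumptions for illustration. In practice, standard HTTP caching headers such as Cache-Control can also let browsers and proxies do much of the same job.

```python
import time


class TTLCache:
    """Reuse a generated response until it is older than ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so the sketch is easy to test
        self._store = {}     # key -> (time generated, response)

    def get(self, key, produce):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]            # still fresh: no regeneration
        response = produce()         # miss or expired: regenerate once
        self._store[key] = (now, response)
        return response


cache = TTLCache(ttl_seconds=60)
body = cache.get("/codes/AccountType", lambda: "<items/>")
```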
All of these developments are just small pieces of the puzzle of putting well-managed metadata at the core of your enterprise development methodologies. It is really the heart of the model-driven enterprise.
Let me know if you are creating metadata web services. I would like to know what things you feel are useful to your users.
- Dan
Inching closer to a MODS form
Submitted by administrator on Mon, 09/17/2007 - 13:42.
I’ve spent most of the last week developing the MODS XForm. You can take a look at its current incarnation here. The implementation of the form itself is not terribly difficult, although the nested repeats can get a little tricky. The real time sink is in trying to create a form that makes sense and [...]
XForms suggest
Submitted by administrator on Fri, 08/24/2007 - 18:51.
So I have an XForms suggest function mostly working (you can see an example here, click on the “subject analysis” tab and try typing in milk, or united states). I used this tutorial to get the XForm working, and an XQuery that looks for all unique subjects starting with the submitted value.
It is a neat [...]
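The server side of a suggest function like this boils down to a simple filter. The post uses an XQuery against eXist; the Python sketch below mirrors the same logic (distinct subjects starting with the submitted text) over an in-memory list, as an illustration rather than the author's query. Case-insensitive matching, alphabetical ordering and the limit parameter are assumptions added for the example.

```python
def suggest(subjects, typed, limit=10):
    """Return distinct subjects that start with the typed text,
    compared case-insensitively, in alphabetical order."""
    prefix = typed.strip().casefold()
    if not prefix:
        return []
    seen = {}
    for subject in subjects:
        key = subject.casefold()
        if key.startswith(prefix) and key not in seen:
            seen[key] = subject  # keep the first spelling encountered
    return sorted(seen.values(), key=str.casefold)[:limit]


subjects = ["Milk", "Milk supply", "milk", "United States", "Cheese"]
print(suggest(subjects, "milk"))  # ['Milk', 'Milk supply']
```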
Fixing the web, Part 1
Submitted by administrator on Wed, 08/22/2007 - 08:42.
The XHTML.com site has published the first part of a two-part series on what might be wrong with the web, and how we might fix it. The idea is that the first part consists of short opinion pieces by people involved in various ways with the web, and the second part is a collection of responses from anyone who wants to send something in.
Contributors are Chris Wilson, Daniel Glazman, Joe Clark, Doug Geoffray, Roberto Scano, Jeffrey Veen, Dave Raggett, Mike Andrews, James Pearce, Nova Spivack and myself.
In my comments I focused on the need for a better way to create standards:
I actually have a very positive view of the transition to the 'future Web', and I see many of the things that one might say we need for a better Web already emerging. For example, we need the differences between browsers to be removed, but due to the high quality of libraries such as YUI, Dojo, and Prototype this is well underway. We also need there to be a convergence of application development techniques, so that the same languages can be used for the Web, desktop applications, gadgets, widgets, and so on; but with an explosion of interest in HTML, JavaScript and XForms--as well as the appearance of multi-language platforms like Silverlight--this trend also looks set to continue.
So whilst I might say that these things are crucial to the future of the Web, they are also very much underway, and barring a collective overnight loss of memory by the development community, look certain to continue. But there is one thing that to me doesn't look so assured, and that is whether standards are agreed upon around Ajax, and more generally, the whole approach to standards development.
In general, standards (at least those from the W3C) are developed in a kind of 'kitchen-sink' style, where some specification tries to include just about everything that can be devised on a particular topic--and therefore takes years to write. A good example is SVG, and even, to some extent, my own favourite, XForms. In both cases there are many useful things that could be factored out of these specifications to be made available elsewhere as smaller, more manageable components, but it is only recently that this approach has started to be tried. Good examples of the 'bite-size' approach are the 'role attribute', 'access', and RDFa, which provide techniques for adding metadata to web documents as well as making them more accessible. (See also The XHTML role attribute: small and perfectly formed and Using RDFa in XHTML 1.)
A good sign that some are learning that this is a promising approach to writing specifications is the backplane initiative at the W3C. But unfortunately, an illustration that lessons haven't been learned is the W3C's embracing of HTML 5, which is not only a case study in kitchen-sink design--let's throw everything in there--but a case study too of the problems caused by the 'not invented here' mindset.
If the W3C continues to support the creation of enormous specifications that take years rather than months to complete, then it will almost certainly become increasingly irrelevant to the 'future Web'. Of course, based on the W3C's lack of coherent leadership around the whole HTML 5 question, some might not see that as necessarily a bad thing, but it does raise the question as to whether other organisations can fill the space with a better process. For example, can the Open Ajax Alliance take a much more focused approach to standards writing, creating small specifications that can be easily combined? If so, it's possible that the vacuum would be filled, and the future of the 'future Web' would stand on a firmer footing.
Whilst I'm not quite sure where the future of standards creation will come from, the sheer dynamism we see in the Web development space makes me certain that it is imminent. If we can devote even a fraction of that creative energy to evolving a new approach to standards development, the future of the 'future Web' looks very bright indeed. But if we don't, then for the foreseeable future we are looking at increasing fragmentation, and the lack of standardisation in the browser merely being transferred to a 'lack of standardisation in the libraries'.
Anyone is free to respond to the discussion; responses need to be sent in by September 10th. They'll then use the responses to create 'part 2'.
MODS XForm Update
Submitted by administrator on Tue, 08/14/2007 - 15:26.
I’ve finally had a chance to revisit my MODS XForm. Although I had been using Orbeon, I discovered this week that the origin and context attributes work in Firefox, so I have been doing most of my development with Firefox. I couldn’t find any documentation on this (looked here and here), but I have been [...]
mashup* punch-up round-up
Submitted by administrator on Fri, 02/23/2007 - 16:56.
No, last night's mashup* wasn't really a punch-up...but we did have a thoroughly enjoyable ding-dong on topics ranging from whether the next wave of the web would be sensual or semantic, whether rules-engines were the answer to our problems, whether placing a red cross (or a green tick) against an entry in search results was providing a service or 'policing the web', and much more besides. Combined with a great venue, and some interesting characters to share a bottle of beer with, it made for an enjoyable and interesting evening.
Although the meeting was 'about' the semantic web, organiser Sam Sethi was keen to avoid yet another discussion that remained high-level and abstract, and instead wanted to look at some of the exciting things that can be done once you get your hands on the data. Of course that didn't stop some people from trying to explain what 'semantic web' means, and unfortunately they did so with just enough clarity to convince everyone else that it was an unrealisable pipe-dream.
All of which indicated to me, at least, that Sam had been right when he made the decision to focus the evening on what can be done today. As part of illustrating this, he kicked off with some nice demos of the Firefox extensions Webcards and Tails, both of which process microformats located in the ordinary pages you view whilst you are browsing...and both of which look great.
My demo also involved a browser extension, this time for Internet Explorer, and one we created ourselves using the browser extension features of formsPlayer. The data I used for my demonstration used the RDFa format rather than microformats, and to illustrate what the sidebar can do I used Ivan Herman's home-page (which contains FOAF information) as well as my previous blog entry describing the mashup* event (which contains FOAF and iCal information).
The RDFa browser extension simply parses the HTML document in the main window, and stores any RDFa it finds. Then, using XForms and formsPlayer's ability to render custom widgets based on the datatype of the data, the information from the document is displayed in ways that are specific to each piece of data.
In the case of the mashup* event we have two high-level widgets, one for a person and one for an event. The person widget simply shows a person's name, a link to their email address, and their picture. The event widget shows the title of the event, a link to any further information about the event, a map of the location, and the time the event starts. Both the map and the clock are of course themselves widgets--the first an interactive Google Map, and the second an analogue clock created using SVG:
The architecture of this sidebar is interesting for two reasons. The first is that it allows different actions to be performed depending on the data that has been located, which is something Tails and Operator also allow. The second is that we get further flexibility in our RDFa sidebar when it comes to rendering this information: by allowing the widgets to be selected at run-time, based on the 'type' of the data being shown, we really play to the de-centred nature of the web, and allow users to do what they want with the data that they discover. We need this because RDFa is a generic language, and as such we don't know whether it will be used to carry information about people, events, chemical compounds, items for sale, and so on. But precisely because it's generic we don't need to write a parser for every possible language--we only need one. By allowing different widgets to be displayed depending on the 'type' of the data, we can easily create views on the data that are appropriate to different groups and users.
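The run-time selection of widgets described above is essentially a lookup from a type to a renderer, sitting behind one generic parser. The Python sketch below shows that shape; it is not formsPlayer's actual mechanism. It assumes the RDFa has already been extracted into (subject, predicate, object) triples, it uses compact names such as foaf:Person as plain strings, and the widget functions are hypothetical text stand-ins for the picture, map and clock widgets.

```python
from collections import defaultdict


def person_widget(props):
    return f"[person] {props.get('foaf:name', '?')} <{props.get('foaf:mbox', '')}>"


def event_widget(props):
    return f"[event] {props.get('cal:summary', '?')} at {props.get('cal:dtstart', '?')}"


def generic_widget(props):
    # Fallback for types nobody has written a widget for yet.
    return "[data] " + ", ".join(f"{k}={v}" for k, v in sorted(props.items()))


# Looked up at run time by rdf:type, so new widgets can be registered
# without touching the single, generic parser that produced the triples.
WIDGETS = {"foaf:Person": person_widget, "cal:Vevent": event_widget}


def render(triples):
    types = {}
    props = defaultdict(dict)
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            types[subject] = obj
        else:
            props[subject][predicate] = obj
    return [WIDGETS.get(types.get(s, ""), generic_widget)(props[s]) for s in props]
```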
I resisted the temptation last night to try to explain what 'semantic web' means, so I certainly won't bother trying to do it now. Last night's event convinced me that Sam was right to focus on what we can do with all of this, and the exciting collection of demonstrations makes me think that today, the 'doing' part is making the whole thing feel very real indeed.
Computers Don't Create Models (Today) – People Do
Submitted by administrator on Thu, 02/15/2007 - 20:12.
I had several good comments about my posting on the increasing role of semantics in the "Era of Tera". I did not mention some of the key concepts that the Intel paper discussed, or the impact these concepts will have on IT strategy. Here is an excerpt:
The problem is ordinary computers don't model things. Aside from supercomputers, today's computers aren't capable of developing mathematical models of complex objects, systems or processes. Nor are they powerful or fast enough to perform such tasks at speeds people demand. We can't plug in a statistical model for a rare malignant tumor or the behavioral pattern of a shoplifting employee and search for similar instances of the model in a data set. To benefit from the wealth of data building up in the world, we need to be able to communicate with computers in more abstract terms (high-level concepts or semantics). We need to speak in terms of models.
I believe that the core problem is that there will be a HUGE demand for highly-skilled data-modelers/ontologists in five years. But there will not be a large supply since this skill is not something that can be learned from a single class in college. Rather it is more of a tacit skill that is not easily codifiable.
That being said, there are many best practices that ARE codifiable. You can already purchase books on OWL/RDF and metadata registries, although most of them are mired in relational database models or written by academics with little real-world experience. You can read the Wikipedia article on “Data Stewardship”. What is still needed is a single set of best practices and tools for creating families of machine-readable data models that can be used as a basis for creating exchange models. And we need to do this without having to learn how to represent all 12 types of UML diagrams in XMI and transform them. The NIEM subset generator is a good example of a solid, XML-Schema-driven front end to this process of selecting elements from a metadata registry and putting them in your shopping cart.
The bottom line is that this is really about empowerment. Unless organizations introduce semantic web technologies at a grass-roots level and support them at the CEO level, many of them will be left behind in the Era of Tera.
