Mark Birbeck

Treating URIs as strings considered dangerous

Since URIs are often conveyed as strings it's tempting to manipulate them as such, but it's better--and safer--to delegate URI manipulation to special functions. These can then have their own unit-tests, which will take into account the edge-cases that can catch us out.
I've just been doing a quick code review in the Ubiquity XForms project, and one thing caught my eye that I thought might be worth a post.
Forms submission
In forms submission -- both in XForms and HTML forms -- we often need to add parameters to a URI.
For example, if we have the URI http://example.org, and the parameters a=b and c=d, then the resulting URI should be:

  http://example.org?a=b&c=d

It seems pretty straightforward that we need to add the parameters to the URI, with a '?' in between:

  [uri] + '?' + [parameters]

Adding parameters
We can see that the parameters themselves have been added by taking the name and value (a=b, etc.), and adding it to the URI, ensuring that for all parameters other than the first, there is a separator between. (The separator can be either an '&' or a ';'.)
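As a minimal sketch of the concatenation described so far, the following JavaScript joins the parameters with a separator and appends them after a '?'. The function names serializeParams and naiveAppend are made up for illustration, and the percent-encoding via encodeURIComponent is an assumption that the text above does not discuss.

```javascript
// Hypothetical helper: turn [name, value] pairs into "a=b&c=d".
// The separator defaults to '&' but could also be ';'.
function serializeParams(params, separator = "&") {
  return params
    .map(([name, value]) => encodeURIComponent(name) + "=" + encodeURIComponent(value))
    .join(separator);
}

// The naive rule: [uri] + '?' + [parameters]
function naiveAppend(uri, params) {
  return uri + "?" + serializeParams(params);
}

console.log(naiveAppend("http://example.org", [["a", "b"], ["c", "d"]]));
// prints "http://example.org?a=b&c=d"
```

This only covers the simplest case, a URI with no existing query and no fragment.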
Base URIs already containing parameters
The bug that needed fixing though, was that this 'naive' concatenation doesn't work if the URI you are dealing with already contains parameters.
For example, if we have the URI http://example.org?x=y, and we need to add the same two parameters we had before, then our simple concatenation would give us:

  http://example.org?x=y?a=b&c=d

when what we actually need is:

  http://example.org?x=y&a=b&c=d

As you can see, if we already have a '?' then we don't need to add another, so it seems that a simple addition to our concatenation code would be to use a call to indexOf to see if there is a '?' present, and only add another if we don't find one.
There's a small additional test we'll need to make which is to check whether the last character in the URI is a '?', as I'll explain.
Base URIs with empty query strings
Recall that we added the parameters by placing a=b and c=d onto the end of the URI, separated by '&':

  http://example.org?a=b&c=d

Now, if we already have a URI with a query then we need to ensure that there is an additional '&' placed before our first parameter:

  http://example.org?x=y&a=b&c=d

But what if the base URI has a query indicator (i.e., the '?') but no parameters? In other words, what if we have this URI:

  http://example.org?

In this situation we don't want to add the extra separator, otherwise we'll get this:

  http://example.org?&a=b&c=d

So our rules now become that we only want to precede our parameters with a separator if the '?' is not the last character in the URI. It's a little awkward, but thanks to lastIndexOf, I'm sure we can manage.
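The rules as they stand at this point could be sketched roughly as follows. The function name appendParamsSoFar is hypothetical, the query is assumed to be already serialised, and the code still treats the URI as a plain string.

```javascript
// String-based rules so far:
//  - no '?' in the URI: add one before the parameters
//  - '?' is the last character: add the parameters directly
//  - otherwise: add a separator before the parameters
function appendParamsSoFar(uri, query) {
  if (uri.indexOf("?") === -1) {
    return uri + "?" + query;
  }
  const endsWithQuestionMark = uri.lastIndexOf("?") === uri.length - 1;
  return uri + (endsWithQuestionMark ? "" : "&") + query;
}

console.log(appendParamsSoFar("http://example.org", "a=b&c=d"));     // prints "http://example.org?a=b&c=d"
console.log(appendParamsSoFar("http://example.org?x=y", "a=b&c=d")); // prints "http://example.org?x=y&a=b&c=d"
console.log(appendParamsSoFar("http://example.org?", "a=b&c=d"));    // prints "http://example.org?a=b&c=d"
```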
Base URIs already containing fragments
However, there's a further subtlety; what if the URI contains a fragment identifier?
For example, if we have the URI http://example.org#x, and we need to add the same two parameters we had before, then our simple concatenation would give us this:

  http://example.org#x?a=b&c=d

The fragment identifier part of the URI has now become x?a=b&c=d because it's always the last part of the URI. What we actually want is to insert the new parameters before the '#':

  http://example.org?a=b&c=d#x

Now we need to add another use of indexOf to check for a '#', and if we find one, use its position as the point at which to insert the parameters.
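A sketch of that extra step, under the same string-based assumptions, might look like this. The name appendBeforeFragment is made up for illustration.

```javascript
// Split off the fragment at the first '#', apply the '?' rules to what
// comes before it, then put the fragment back on the end.
function appendBeforeFragment(uri, query) {
  const hashAt = uri.indexOf("#");
  const base = hashAt === -1 ? uri : uri.slice(0, hashAt);
  const fragment = hashAt === -1 ? "" : uri.slice(hashAt);

  let joiner;
  if (base.indexOf("?") === -1) {
    joiner = "?";
  } else if (base.lastIndexOf("?") === base.length - 1) {
    joiner = "";
  } else {
    joiner = "&";
  }
  return base + joiner + query + fragment;
}

console.log(appendBeforeFragment("http://example.org#x", "a=b&c=d"));
// prints "http://example.org?a=b&c=d#x"
```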
Context is everything
However, the assumption behind using indexOf and lastIndexOf in this way is that a URI will contain only one '?'. A secondary assumption here is that the only time you'll ever see a '?' is as an indicator of the query part of the URI.
Both of these assumptions are incorrect.
Question marks in parameters
The first assumption is that you can only have one '?' in a URI. However, the query section of RFC 3986 explicitly flags up that the '?' character is a valid parameter value. For example, we can have a=finished? as a parameter.
This means it's quite easy to envisage scenarios where there is more than one '?' in the URI:

  http://example.org?a=finished?&c=d

This won't necessarily mess up our first use of indexOf, but it will mess up the use of lastIndexOf as a way to check whether you need to add an extra separator. Recall that we wanted to avoid turning this:

  http://example.org?

into this:

  http://example.org?&a=b&c=d

so we used lastIndexOf to check whether the last character in the URI was a '?'. But that algorithm will turn this:

  http://example.org?a=finished?

into this:

  http://example.org?a=finished?c=d

You might need to look closely to spot it, but because the last character was a '?', we haven't added a separator before the c, and as a result, instead of having two parameters (a=finished? and c=d) we have only one (a=finished?c=d).
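The failure is easy to reproduce. In this sketch, appendWithLastIndexOf is a hypothetical rendering of the string-based rules, and countParams is a deliberately simple helper that assumes the URI has a query and no fragment.

```javascript
// The lastIndexOf rule: skip the separator when '?' is the last character.
function appendWithLastIndexOf(uri, query) {
  if (uri.indexOf("?") === -1) {
    return uri + "?" + query;
  }
  const endsWithQuestionMark = uri.lastIndexOf("?") === uri.length - 1;
  return uri + (endsWithQuestionMark ? "" : "&") + query;
}

// Count '&'-separated parameters after the first '?'.
function countParams(uri) {
  const query = uri.slice(uri.indexOf("?") + 1);
  return query.split("&").length;
}

const result = appendWithLastIndexOf("http://example.org?a=finished?", "c=d");
console.log(result);              // prints "http://example.org?a=finished?c=d"
console.log(countParams(result)); // prints 1, although two parameters were intended
```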
Question marks in fragments
The interesting thing about the previous examples is that at least you know you have a query string, so you might be tempted to still use indexOf to manipulate things. After all, although we may have too many '?' characters, we still know that we have a query.
However, with the fragment section of RFC 3986 all bets are off; here we can see that '?' is explicitly allowed as a fragment character.
This means that it's possible to have a '?' in a URI even if it doesn't have a query. For example:

  http://example.org#finished?

This may seem like a contrived example, but actually it's not, for two reasons.
The first is that the fragment part of a URI is carefully defined to allow anything, because we don't know how it will be interpreted. You may think that "finished?" is not valid as an HTTP fragment, but what about in the scheme "xyz"?
And this is the key point; since HTML forms and XForms can ultimately deal with any scheme, since that's how the web is designed, we must write our algorithms defensively, and not assume anything.
Safe URI handling
Hopefully this delving into some of the subtleties of URI handling and parsing--and we haven't even begun to talk about turning relative paths into absolute paths, handling encoded characters, and so on--has done enough to convince you that you shouldn't manipulate URIs directly, as simple strings.
The only way to be completely sure of what is happening is to use special functions to unpack a URI into its various components, then manipulate those components--perhaps adding additional parameters to the list of query parameters, but it might also be to turn a relative path into an absolute path--before finally reassembling the URI.
This may sound like a lot of work, but it's the only way to be sure that characters don't get incorrectly interpreted as a consequence of their position in the URI not being taken into account.
In the backplanejs library these functions are in the URI module. (This is also imported into the Ubiquity XForms library.) Breaking up a URI simply involves calling spliturl, which returns an object containing all of the parts. For example:

  spliturl( "http://example.org?a=finished?" )

would give us the object:

  {
    scheme: "http:",
    authority: "example.org",
    path: "",
    query: "a=finished?",
    fragment: ""
  }

It's then an easy matter to manipulate the query part, before creating a new URI with the recomposeURI method.
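To make the unpack, modify, reassemble cycle concrete, here is a self-contained sketch built on the regular expression given in Appendix B of RFC 3986. It is not the backplanejs implementation: splitUriSketch, recomposeUriSketch and addQuery are hypothetical names, and unlike the object shown above, this sketch returns the scheme without its colon and leaves absent components undefined, so that an empty query (a '?' with nothing after it) can be told apart from no query at all.

```javascript
// Regular expression from RFC 3986, Appendix B.
const URI_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function splitUriSketch(uri) {
  const m = URI_PARTS.exec(uri);
  return { scheme: m[2], authority: m[4], path: m[5], query: m[7], fragment: m[9] };
}

// Reassembly following the component recomposition rules in RFC 3986, section 5.3.
function recomposeUriSketch(parts) {
  let result = "";
  if (parts.scheme !== undefined) result += parts.scheme + ":";
  if (parts.authority !== undefined) result += "//" + parts.authority;
  result += parts.path;
  if (parts.query !== undefined) result += "?" + parts.query;
  if (parts.fragment !== undefined) result += "#" + parts.fragment;
  return result;
}

// Add an already-serialised query string to whatever query is present.
function addQuery(uri, query) {
  const parts = splitUriSketch(uri);
  parts.query = parts.query ? parts.query + "&" + query : query;
  return recomposeUriSketch(parts);
}

console.log(addQuery("http://example.org?a=finished?", "c=d"));  // prints "http://example.org?a=finished?&c=d"
console.log(addQuery("http://example.org#finished?", "a=b&c=d")); // prints "http://example.org?a=b&c=d#finished?"
```

Because the '?' and '#' characters are only ever interpreted by the splitting step, a '?' inside a parameter value or a fragment is left alone when the query is changed.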
Conclusion
Since URIs are often conveyed as strings, it's tempting to manipulate them as such. But the problem with doing so is that the context of a character is rarely taken into account when processing.
It's better--and safer--to delegate URI manipulation to special functions. These can then have their own unit-tests, which will take into account the edge-cases that can catch us out.

XForms Developer Zone and User Group launched

We're pleased to be launching two new initiatives to help people who are interested in XForms.
The first is the all new XForms Developer Zone web-site -- or xformsdz, as we're calling it.
Whilst the Developer Zone will be unashamedly biased towards XForms, within that, we'll have discussions, articles, code snippets, and tutorials about any XForms processor we can find, and any application framework in which it's used.
To accompany the web-site, we're also launching a regular newsletter, and a London XForms User Group.
The London XForms User Group
The inaugural meeting of the user group will be in Clerkenwell, London, hosted by Skills Matter. We'll be looking at the architecture of XForms, in particular what it means to have MVC support built in from the ground up, and we'll be using XForms and Google Maps as a case-study.
The October meeting will look at the benefits of using XForms for the UK insurance industry, and we're lucky enough to have an industry expert -- Neal Champion -- speaking to us that evening.
If you'd like to come along, we'd love to see you. And even if you can't make it, join the user group mailing-list, and we'll keep you posted about XForms-related activities and events, in and around London.

Linking issues and revisions in Google Code

We use Google Code a lot in our projects, because it provides a great range of tools, with a straightforward interface.
It's particularly useful to have version control and issue-tracking in the same system, and it's very easy to cross-reference the two by referring to revisions in issue comments, and to issues in the commit messages.
On top of that, Google Code provides some nice code review facilities, and here too, it's possible to refer back to revisions and issues.
In this post I'll outline how we use these tools together in UXF, the Ubiquity XForms AJAX library.
Everything should have an issue
The first thing that's needed is an issue. This might be an issue in the traditional sense, of a bug. But it might also be a desirable feature, that someone would like to add. Whatever it is, we shouldn't really be starting to code until we've written down what we're going to do, or what problem we're trying to solve.
Let's use issue 515 as an illustration. At the top we see it has the following description:

We need a UI control that can adhere to data binding restrictions for
xsd:boolean / xforms:boolean. The control may render a checkbox, for
example.

Sample code snippet:

<xf:input ref="check" datatype="xforms:boolean">
<xf:label>Check: </xf:label>
</xf:input>

Based on conversation with John in last week's call, needed in this
iteration (setting priority to critical).
Code reviews and issues
This description is plenty to get started with, so Rahul gets on with his development and testing, until he is ready for a code review. When he is ready, he commits his code to the review area, with the following SVN message:

[ issue 515 ] Proposed changes for adding a UI control that can bind to boolean
data (renders a checkbox).

You can view this commit at r2919, and as you can see, Google Code provides a convenient link to all of the files involved in the commit. GC also provides tools for attaching comments to any line in those files.
This revision therefore becomes the location at which the review is conducted, perhaps involving many comments from many reviewers.
But note also that Rahul's reference to "issue 515" has become a link to the issue itself. This means that a reviewer has everything they need to start reviewing -- links to all of the files that have changed, a reference to the issue being addressed, and some tools that allow them to add comments to the files.
However, anyone viewing the issue directly would not necessarily know that it was being worked on, or what the progress was. So to keep things up-to-date, Rahul added the following comment to issue 515:

Code for such a UI control has been proposed for addition in r2919 and review
requested. Changing Status to InReview.

The issue is now not only marked as being InReview, but we also have a convenient link to the code review itself. If you were interested in this issue, you would now be able to add your comments to the review, try out the proposed code, and so on.
Committing the code to trunk
At some point the review will finish, and the code will be ready to go into the trunk. (There may be other reviews first, but at some point the cycle should end.)
In our example, Rahul commits his reviewed code to trunk at r2920, with the following message:

[ issue 515 ] UI control that can bind to boolean data. Committed after favorable
review of r2919, with suggested filename change.

Note that we still have a reference to the issue number, but most importantly, we have a reference to the code review at r2919. This means that anyone looking at the code that is committed, and who wanted to understand why a particular change was made, could find the history of the discussion in the code review.
But we're not finished yet.
If we go back to the code review at r2919, we'll find that the last comment Rahul has added says:

Committed to trunk in r2920.

Now, anyone who stumbles across the code review directly knows that the code they are looking at really did find its way into the trunk, and wasn't left hanging there uncommitted, or abandoned in favour of some other solution.
Closing the issue
The final step is of course to close off the issue, and here it's important to refer to the revision at which the code was committed. In our example, the last comment on issue 515 is this:

Committed to trunk in r2920 after favorable review.

Resolving issue as Fixed.

Now anyone interested specifically in the issue can see not only that it is closed, but also where the code is that addressed the issue.
Conclusion
This technique may seem a little laborious, but it actually takes longer to describe than to do.
And by following these steps we ensure that we know how a sequence of comments ends, whether in an issue or code review.
We need to know how an issue was addressed when it gets marked as fixed, what happens to code after a positive code review, and where code that is committed to trunk came from. In short, whether you are looking at committed code, an issue, or a code review, these procedures ensure that you can easily find the other two.

Customising initial instance data in an xform

One of our customers recently asked:
"I have a question on how we can open an XForms page from a non-XForms page. We are trying to have a summary page which is a non-XForm page and we would like to try to open an already existing XForms page by populating data on to it dynamically, by clicking on a component in the summary page. More like a summary-to-detail functionality....
We are trying to find the best way to do this."
It's an interesting question, and although we have worked out a couple of ways of looking at this on the client side, I thought the best answer for now was to use the server.
The easiest way to do this is for the page being linked from to pass whatever parameters are needed in the URL, and for a server-side process to make use of those values when preparing the XForm.
For example, the user might click on a link like this:

http://mysite.com/newform.xyz?name=Mark

Where xyz is whatever server-side processor you happen to be using, such as ASP,
PHP, JSP, etc.
The server-side script would then create a form as usual, but place the parameters that were passed in to it, into the instance data. Depending on the language, it would look something like this:

<xf:instance xmlns="">
<data>
<name><% =Query("name") %></name>
</data>
</xf:instance>

When the form reaches formsPlayer or Ubiquity XForms, it will just look like a normal form, with inline instance data -- the fact that it was generated on the server will be irrelevant.
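Since the server-side language is left open, here is one way the same step might look in JavaScript. Everything in it is an assumption for illustration: renderInstance and escapeXml are hypothetical names, the parameter is read with the standard URL class, and the value is XML-escaped before being placed in the instance, a precaution the template above does not show.

```javascript
// Escape the characters that are significant in XML text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical stand-in for the <% =Query("name") %> template:
// read the 'name' parameter from the request URL and place it in the instance.
function renderInstance(requestUrl) {
  const name = new URL(requestUrl).searchParams.get("name") || "";
  return [
    '<xf:instance xmlns="">',
    "<data>",
    "<name>" + escapeXml(name) + "</name>",
    "</data>",
    "</xf:instance>"
  ].join("\n");
}

console.log(renderInstance("http://mysite.com/newform.xyz?name=Mark"));
```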
Decoupling
An extension of this technique is to decouple the data from the form by 'bouncing' the parameter on from the form, to a subsequent request for data.
This is achieved by altering newform.xyz so that the parameter is used in the @src attribute on xf:instance, like this:

<xf:instance src="getdata.xyz?name=<% =Query("name") %>" />

The effect here is that when the form is loaded, a second request is immediately made to your server for the instance data, and the URL for that instance data will include the name parameter that was passed to the form generation script. getdata.xyz obviously needs to be a script too, but this time it only needs to worry about creating XML, and not the rest of the form. It might look something like this:

<data xmlns="">
<name><% =Query("name") %></name>
</data>
Centralising data management
This technique could be taken further and in the getdata.xyz script a query for data could be run, based on the parameter -- or parameters -- passed in. So the original link that the user clicks might look like this:

http://mysite.com/customer.xyz?id=72

The customer.xyz form might generate a 'bounce' to the data that looks like this:

<xf:instance src="getdata.xyz?type=customer&amp;id=<% =Query("id") %>" />

Now getdata.xyz could use type to determine which tables to query, and id to decide which item to retrieve; which means that all data requests could be passed through this one script.
REST and APIs
Now, one final step is to tidy up our URLs a bit, so that the script name is irrelevant, and the data structure comes to the fore. The link that the user clicks on might look like this:

http://mysite.com/customer/72

and the instance data request generated in the form might look like this:

<xf:instance src="customer/<% =Query("id") %>.xml" />

For example:

http://mysite.com/customer/72.xml

This is starting to look a lot like putting XForms on top of a RESTful API.
And if we threw in an XQuery database like eXist, we could dispense with having to manage any server-side scripts at all!

Duck-typing and XForms

In a recent code review on the Ubiquity XForms project, the question of whether to test for an element by name or properties came up. In this post we look at the benefits that can be had from using duck-typing as a way to manage objects' functionality, rather than the more usual hierarchical solutions.
OO hierarchies
The usual sort of example given when looking at and explaining object hierarchies, is to say that we have a class called Vehicle, from which we can derive a class Car and Bicycle.
What I've always hated about this kind of ordering is that you actually spend more time discussing where to put properties than actually writing code. For example, is a motorbike a type of bicycle, that just happens to have an engine? Is a van a small lorry?
And more to the point, who cares?
Well, the Department for Transport might care, but they also might change their definitions based on all sorts of criteria, such as lead in petrol, engine size, number of wheel axles, and so on. In fact, if you were in charge of building the DfT a system, then you would certainly be asking for trouble if you encoded their definitions of vehicles directly into the object hierarchy of your application.
Duck-typing
Many programmers will be familiar with the idea of duck-typing, which is a different way to look at objects in object-oriented systems.
The name comes from the idea that 'if something walks like a duck, and quacks like a duck, then it's a duck'. So if an object has a property of wheels and perhaps engine, then it's a Vehicle.
This can make for very efficient systems, because the 'type' of an object is no longer tied directly to its class in the source code, which as we know is very difficult to change, but is instead something that can be determined at run time. More than that, the 'type' of an object is worked out at the point of need, allowing objects to take on many personae, in a way that multiple inheritance is simply not able to cope with. For example, our Vehicle becomes an object of type 'printable', merely by the presence of a print() method.
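In JavaScript the idea can be sketched with a couple of run-time checks. The helper names isPrintable and looksLikeVehicle, and the sample objects, are invented for illustration.

```javascript
// An object is 'printable' if it has a print() method, whatever its class.
function isPrintable(obj) {
  return obj != null && typeof obj.print === "function";
}

// An object is treated as a vehicle if it has a 'wheels' property.
function looksLikeVehicle(obj) {
  return typeof obj === "object" && obj !== null && "wheels" in obj;
}

const bicycle = { wheels: 2 };
const van = {
  wheels: 4,
  engine: "diesel",
  print() { return "van with " + this.wheels + " wheels"; }
};
const report = { print() { return "quarterly report"; } };

// The 'type' is worked out at the point of need:
for (const thing of [bicycle, van, report]) {
  if (isPrintable(thing)) {
    console.log(thing.print());
  }
}
```

Here van is both a vehicle and printable, report is printable without being a vehicle, and neither fact depends on a class hierarchy.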
XForms
The XForms language wasn't designed with duck-typing in mind, but as I'll illustrate, duck-typing is an incredibly powerful way to implement a flexible and extensible XForms processor.
We'll use the XForms hint element to illustrate.
The hint element
The XForms hint element contains text or mark-up that will be displayed to a user when the hint event is fired, which is usually related to the mouseover event in browser-based processors. When the event occurs, the relevant text or mark-up is displayed to the user, and when some other event occurs (such as a mouseout, a timer expiring, etc.) then the hint is hidden.
The actual text to be displayed can be obtained inline from the element, or via the usual binding techniques, such as using @ref, @value or @bind.
So we have two aspects to a hint; the first concerns its behaviour -- and you could say this is the key to its hintyness -- whilst the second aspect concerns what should be displayed.
The second aspect -- the 'what to display' part -- is actually common to a number of elements in XForms, such as xf:output, xf:help, xf:alert, and so on; they can all either display inline text, or text that is obtained from the instance data.
In the UXF library, the two aspects of a hint are kept separate, on two separate objects, and these two objects are dynamically added to a hint object at run time. This is different to a traditional object model where we would have a hint object that inherits from the 'what to display' object, creating a rigid hierarchy; in our model we have a hintyness object that handles the 'show and hide' part, and we have a 'what to display' object that manages the data, and we bring the two together when we encounter the XForms hint element.
The beauty of this is its flexibility; we can 'attach' our 'what to display' object to anything, independent of whether it's a hint, and the object will manage the processing of @ref and @value. And we can also attach our 'hinty' object to anything to make it behave like a hint, independent of the 'what to display' behaviour; SVG circles or HTML5 video objects could all be given hint behaviour, simply by attaching the hintyness object.
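The following is a rough sketch of the idea using plain JavaScript objects rather than DOM elements. It is not the UXF code: hintBehaviour, displayBehaviour and attach are hypothetical names, and Object.assign stands in for whatever mechanism the library actually uses.

```javascript
// Behaviour 1: the 'hintyness' part, which only knows how to show and hide.
const hintBehaviour = {
  show() { this.visible = true; },
  hide() { this.visible = false; }
};

// Behaviour 2: the 'what to display' part. In this sketch a ref binding
// takes precedence over inline text.
const displayBehaviour = {
  getText(instanceData) {
    return this.ref !== undefined ? instanceData[this.ref] : this.inlineText;
  }
};

// Copy any number of behaviours onto a target object at run time.
function attach(target, ...behaviours) {
  return Object.assign(target, ...behaviours);
}

// A hint gets both behaviours; an SVG circle gets only the hint behaviour.
const hint = attach({ ref: "msg" }, hintBehaviour, displayBehaviour);
const circle = attach({ tag: "svg:circle" }, hintBehaviour);

hint.show();
console.log(hint.visible, hint.getText({ msg: "Please enter your name" }));
circle.show();
console.log(circle.visible, typeof circle.getText);
```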
The role attribute
The obvious way to indicate in mark-up that we have an object of a certain type is to use the element directly. For example, in XForms we could write:

<xf:hint>Please enter your name</xf:hint>

or:

<xf:hint ref="msg" />

But the role attribute opens up the possibility of assigning functionality to non-XForms elements, for example:

<span role="xf:hint">Please enter your name</span>

Now we get our duck-typing in reverse; 'if an element says that it wants to be a duck, give it the waddle and quack methods'.
Ubiquity XForms
The Ubiquity library makes use of this dynamic binding technique to create flexible run-time objects, through the use of a decoration mechanism; a set of rules have been created that indicate which objects to attach to which elements, and when a document loads, the functionality is 'wired in' to each element, as expressed in the rules. Adding new elements is easy, since it just requires adding more rules.
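A rule-driven decorator of this kind might be sketched as below. This is an illustration only, not the Ubiquity decorator: the rule format, the decorate function, and the use of plain objects with name and role properties in place of DOM elements are all assumptions.

```javascript
// Behaviours that can be wired in to an element.
const hintBehaviour = {
  show() { this.visible = true; },
  hide() { this.visible = false; }
};
const caseBehaviour = {
  toggle() { this.selected = !this.selected; }
};

// Each rule says which elements it applies to and what to attach.
// Supporting a new element means adding another rule.
const rules = [
  { matches: (el) => el.name === "xf:hint" || el.role === "xf:hint", behaviours: [hintBehaviour] },
  { matches: (el) => el.name === "xf:case" || el.role === "xf:case", behaviours: [caseBehaviour] }
];

// Run when a document loads: attach whatever the rules call for.
function decorate(element) {
  for (const rule of rules) {
    if (rule.matches(element)) {
      Object.assign(element, ...rule.behaviours);
    }
  }
  return element;
}

const span = decorate({ name: "span", role: "xf:hint" });
const plain = decorate({ name: "p" });
console.log(typeof span.show, typeof plain.show); // prints "function undefined"
```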
But the code review I mentioned earlier related to the other side of this, the original duck-typing question. We had a piece of code that needed to toggle an xf:case, and the natural inclination was to check the name of the element, to ensure we really had a casey element, i.e., one that could be toggled.
Yet to be an element that could be toggled only requires the presence of the toggle() method; just as anything that quacks is a duck, so anything that can be toggled is a case.
So instead of checking the element name:

if (NamespaceManager.compareFullName(oCase, "case", "http://www.w3.org/2002/xforms")) {
...
}

our code now checks for the method:

if (oCase && typeof oCase['toggle'] === 'function') {
oCase.toggle();
}
Loosely-coupled Objects
Another key design goal of the Ubiquity suite is that the objects that make up the system are 'loosely-coupled'. This means that they can change independently of each other, and as a consequence, improvements in one area should immediately benefit other areas.
In this case, imagine that a rule is added to the decorator module such that the presence of @role="xf:case" on a div causes all caseyness functionality to be attached. With the duck-typing way of testing whether to toggle, the following mark-up will just immediately start working:

<xf:switch>
<xf:case id="error">...</xf:case>

<div id="ok" role="xf:case">...</div>
</xf:switch>
With the prior method -- checking the name of the element -- it would not have worked until we had added code to check for the presence of @role. In other words, we would have been in exactly the same situation as a normal object hierarchy application, adding special code to test for specific things.
Conclusion
Whilst duck-typing can be confusing if used in every aspect of an application, it can work well when its boundaries are clear. In Ubiquity XForms and Ubiquity SMIL, elements can acquire rich combinations of functionality based on their position in a document, the element name, the presence of other attributes, and so on. In this situation, duck-typing is a natural fit, and keeps the architecture open.

Declarative Ajax Programming with Ubiquity XForms

This presentation by Mark Birbeck was part of XML-in-Practice 2008, organised by the IDEAlliance, and showed developers how to use XForms in their Ajax applications.
Presentation title: Declarative Ajax Programming with Ubiquity XForms (Slides for IE and FF only) (Details)
By: Mark Birbeck
Conference: XML-in-Practice 2008, held at the Marriott Crystal Gateway Hotel, Arlington, Virginia, December 8th to 10th, 2008.

Passing run-time parameters to internet applications

Determining the behaviour of an application at run-time using parameters is a well-established practice. But whilst it's possible with command-line and server-side applications, the scope for passing information to client-side internet applications is limited. With the growth in internet-facing desktop applications, widgets and gadgets, there is a need to pass parameters directly to the application, rather than via a server, and this post looks at how that might be achieved.
To illustrate, we can use the Unix command ls to see the contents of a directory, and add switches like -las to change the behaviour:

ls -las

Similarly, on the web we can pass parameters to server-side applications, using query strings:

http://www.google.co.uk/search?q=ubiquity

Let's look at how something similar might be done on the client.
Using XPointer
We already have a mechanism to pass variables to a client, via the URI, and that's the XPointer Framework. I've described before how we use this technique in UBX Viewer to pass parameters that are to become meta and link values; as UBX Viewer loads it uses these settings to configure itself.
But there are a number of reasons why we now need something more general purpose, and it can easily be based on the same principles.
#vars(...)
The technique we're using with the Ubiquity libraries is to pass information to be used by the client in the vars XPointer.
For example, if we have a slideshow application, and want to create a bookmark for the third slide, we might just do it like this:

.../slideshow.html#3
However, if the application is also able to indicate which theme should be used, it is better to have the capacity to set more parameters, which vars gives us:

.../slideshow.html#vars(slide=3, theme=city)
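On the client, reading such a fragment could look something like the sketch below. The function name parseVars is hypothetical, and the parsing rules (comma-separated name=value pairs with surrounding whitespace ignored) are an assumption based on the example above rather than a description of the Ubiquity code.

```javascript
// Extract name/value pairs from a '#vars(...)' fragment.
// Returns an empty object if the URI has no such fragment.
function parseVars(uri) {
  const hashAt = uri.indexOf("#");
  if (hashAt === -1) return {};

  const match = /^vars\((.*)\)$/.exec(uri.slice(hashAt + 1));
  if (!match) return {};

  const vars = {};
  for (const pair of match[1].split(",")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // skip anything that is not name=value
    vars[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
  }
  return vars;
}

const settings = parseVars("http://example.org/slideshow.html#vars(slide=3, theme=city)");
console.log(settings.slide, settings.theme); // prints "3 city"
```

In a browser the same function could be given window.location.href when the page loads.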
#instance(...)
Another example of how run-time information might be passed concerns XForms applications.
In XForms it is possible to indicate the data to be used in the form by using the instance element. The element can either hold information directly:

<xf:instance>
<details xmlns="">
<name>Harry Angstrom</name>
</details>
</xf:instance>

or via an external file:

<xf:instance src="data.xml" />

However, when creating a generic application the initial data will inevitably be different for each user. In this situation we can use another XPointer, called instance:

.../createProfile.html#instance(name="Harry Angstrom")
This has the effect of populating the default instance in the default model, with an element called name, using the same rules as described in XForms 'lazy authoring'.
#src(...)
The final way that we might want to specify data at run-time is to indicate the @src value of an element, so as to cause a different file to be loaded at run-time.
For example, if we wanted the instance data to be loaded from a different location, we would ensure that the instance has an @id value in the document:

<xf:instance id="p" src="" />

and then specify the data source in the URL:

.../createProfile.html#src(p=data.xml)
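The instance and src pointers share the same scheme(name=value) shape, so one parser could serve both. The sketch below is an assumption about how that might be done, not the Ubiquity implementation: parsePointer is a hypothetical name, values may be quoted or bare, and nested parentheses and escaping are not handled.

```javascript
// Parse a fragment such as 'instance(name="Harry Angstrom")' or 'src(p=data.xml)'
// into { scheme, args }. Returns null if the fragment does not have that shape.
function parsePointer(fragment) {
  const match = /^([A-Za-z_][\w.-]*)\((.*)\)$/.exec(fragment);
  if (!match) return null;

  const args = {};
  // A value is either double-quoted or runs up to the next ',' or ')'.
  const pairPattern = /([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"]*)"|([^,)]*))/g;
  let pair;
  while ((pair = pairPattern.exec(match[2])) !== null) {
    args[pair[1]] = pair[2] !== undefined ? pair[2] : pair[3].trim();
  }
  return { scheme: match[1], args: args };
}

console.log(parsePointer('instance(name="Harry Angstrom")'));
console.log(parsePointer("src(p=data.xml)"));
```

An application would then dispatch on the scheme: populating the default instance for instance, or setting the @src of the element whose @id matches the name for src.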
Conclusion
As the number of client-side internet applications grows there is an increasing need to make them flexible. Passing run-time parameters via the URL is one way this can be achieved.

The future of mark-up languages


This presentation by Mark Birbeck was given at the Kings of Code conference.

Presentation title: The future of mark-up languages (slides)
By: Mark Birbeck
Conference: Kings of Code, held at Pakhuis de Zwijger, in Amsterdam, on May 27th, 2008.


XForms Ubiquity

I just found out about a nice little XForms engine called Ubiquity. (Having dinner with Mark Birbeck, TV Raman, and Leigh Klotz certainly helps one find out about such things) :-)
It’s a JavaScript implementation done right. Open source under the Apache 2.0 license. Seems like a nice fit with, oh maybe MarkLogic Server? -m

Interview on Cubic Garden

Ian Forrester interviewed Mark Birbeck whilst they were both at XTech 2008. Subjects mentioned range from XForms, Ajax, Sidewinder, ODF, unobtrusive JavaScript, XUL, standards adoption, New Labour, Hegel's idealism, the Ajax Community, and the Ubiquity XForms processor.

Building Mobile Applications with xH


This presentation by Mark Birbeck was part of the Reinventing the Mobile Browsing Experience Through Ajax industry panel, held at WWW2007.

Ajax and progressive browser enhancement

There is still plenty of scope for the browser to evolve, but the truth is that progressive browser enhancement makes browser evolution far less significant than it has been until now. That's probably not a bad thing, given the chaos that is currently surrounding the development of HTML at the W3C.

Turn any web page into a desktop application

I've just finished implementing a new feature in the Sidewinder Viewer, which further simplifies the task of turning any web page into a desktop application. If invoked with a command line argument that specifies the document to load, the viewer now also checks the fragment identifier for the presence of a meta XPointer scheme. The expression associated with this scheme allows you to set a number of application-level properties, such as the window title, height, width, position and so on.