webBackplane

Treating URIs as strings considered dangerous

Since URIs are often conveyed as strings it's tempting to manipulate them as such, but it's better--and safer--to delegate URI manipulation to special functions. These can then have their own unit-tests, which will take into account the edge-cases that can catch us out.
I've just been doing a quick code review in the Ubiquity XForms project, and one thing caught my eye that I thought might be worth a post.
Forms submission
In forms submission -- both in XForms and HTML forms -- we often need to add parameters to a URI.
For example, if we have the URI http://example.org, and the parameters a=b and c=d, then the resulting URI should be:

  http://example.org?a=b&c=d

It seems pretty straightforward that we need to add the parameters to the URI, with a '?' in between:

  [uri] + '?' + [parameters]

Adding parameters
We can see that the parameters themselves have been added by taking the name and value (a=b, etc.), and adding it to the URI, ensuring that for all parameters other than the first, there is a separator between. (The separator can be either an '&' or a ';'.)
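
As a rough sketch of this naive approach: the function names below are illustrative rather than taken from the Ubiquity XForms code, and the use of encodeURIComponent to escape names and values is an assumption.

```javascript
// Hypothetical helper: serialise name/value pairs, placing a separator
// between every pair (so never before the first one).
function serialiseParameters(parameters, separator) {
  return parameters
    .map(function (pair) {
      // Assumption: names and values are escaped with encodeURIComponent.
      return encodeURIComponent(pair[0]) + "=" + encodeURIComponent(pair[1]);
    })
    .join(separator || "&");
}

// The naive concatenation: [uri] + '?' + [parameters]
function naiveAddParameters(uri, parameters) {
  return uri + "?" + serialiseParameters(parameters);
}

console.log(naiveAddParameters("http://example.org", [["a", "b"], ["c", "d"]]));
// → http://example.org?a=b&c=d
```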
Base URIs already containing parameters
The bug that needed fixing though, was that this 'naive' concatenation doesn't work if the URI you are dealing with already contains parameters.
For example, if we have the URI http://example.org?x=y, and we need to add the same two parameters we had before, then our simple concatenation would give us:

  http://example.org?x=y?a=b&c=d

when what we actually need is:

  http://example.org?x=y&a=b&c=d

As you can see, if we already have a '?' then we don't need to add another, so it seems that a simple addition to our concatenation code would be to use a call to indexOf to see if there is a '?' present, and only add another if we don't find one.
There's a small additional test we'll need to make which is to check whether the last character in the URI is a '?', as I'll explain.
Base URIs with empty query strings
Recall that we added the parameters by placing a=b and c=d onto the end of the URI, separated by '&':

  http://example.org?a=b&c=d

Now, if we already have a URI with a query then we need to ensure that there is an additional '&' placed before our first parameter:

  http://example.org?x=y&a=b&c=d

But what if the base URI has a query indicator (i.e., the '?') but no parameters? In other words, what if we have this URI:

  http://example.org?

In this situation we don't want to add the extra separator, otherwise we'll get this:

  http://example.org?&a=b&c=d

So our rules now become that we only want to precede our parameters with a separator if the '?' is not the last character in the URI. It's a little awkward, but thanks to lastIndexOf, I'm sure we can manage.
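
The rules so far could be written down something like this. It is only a sketch of the string-based approach being described, with an illustrative function name, and it assumes the parameters have already been serialised into a single string.

```javascript
// Sketch of the indexOf/lastIndexOf rules described above.
// `query` is assumed to be an already-serialised string such as "a=b&c=d".
function addParametersWithQueryCheck(uri, query) {
  // No '?' anywhere: add one before the parameters.
  if (uri.indexOf("?") === -1) {
    return uri + "?" + query;
  }
  // A '?' is the last character: the query is empty, so no separator.
  if (uri.lastIndexOf("?") === uri.length - 1) {
    return uri + query;
  }
  // Otherwise there are existing parameters, so add a separator first.
  return uri + "&" + query;
}

console.log(addParametersWithQueryCheck("http://example.org", "a=b&c=d"));
// → http://example.org?a=b&c=d
console.log(addParametersWithQueryCheck("http://example.org?x=y", "a=b&c=d"));
// → http://example.org?x=y&a=b&c=d
console.log(addParametersWithQueryCheck("http://example.org?", "a=b&c=d"));
// → http://example.org?a=b&c=d
```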
Base URIs already containing fragments
However, there's a further subtlety; what if the URI contains a fragment identifier?
For example, if we have the URI http://example.org#x, and we need to add the same two parameters we had before, then our simple concatenation would give us this:

  http://example.org#x?a=b&c=d

The fragment identifier part of the URI has now become x?a=b&c=d because it's always the last part of the URI. What we actually want is to insert the new parameters before the '#':

  http://example.org?a=b&c=d#x

Now we need to add another use of indexOf to check for a '#', and if we find one, use its position as the point at which to insert the parameters.
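
A minimal sketch of that extra check might look like the following. The function name is illustrative, and to keep the example short it only deals with the '#' and assumes the part before it has no query of its own.

```javascript
// Sketch: use the position of '#' as the insertion point for the parameters.
// Assumes the part of the URI before the '#' contains no query already.
function insertBeforeFragment(uri, query) {
  var hashIndex = uri.indexOf("#");
  if (hashIndex === -1) {
    // No fragment: the parameters simply go on the end.
    return uri + "?" + query;
  }
  var base = uri.slice(0, hashIndex);   // everything before the '#'
  var fragment = uri.slice(hashIndex);  // the '#' and everything after it
  return base + "?" + query + fragment;
}

console.log(insertBeforeFragment("http://example.org#x", "a=b&c=d"));
// → http://example.org?a=b&c=d#x
```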
Context is everything
However, the assumption behind using indexOf and lastIndexOf in this way is that a URI will contain only one '?'. A secondary assumption here is that the only time you'll ever see a '?' is as an indicator of the query part of the URI.
Both of these assumptions are incorrect.
Question marks in parameters
The first assumption is that you can only have one '?' in a URI. However, the query section of RFC 3986 explicitly flags up that the '?' character is valid within the query itself, which means it can appear in a parameter value. For example, we can have a=finished? as a parameter.
This means it's quite easy to envisage scenarios where there is more than one '?' in the URI:

  http://example.org?a=finished?&c=d

This won't necessarily mess up our first use of indexOf, but it will mess up the use of lastIndexOf as a way to check whether you need to add an extra separator. Recall that we wanted to avoid turning this:

  http://example.org?

into this:

  http://example.org?&a=b&c=d

so we used lastIndexOf to check whether the last character in the URI was a '?'. But that algorithm will turn this:

  http://example.org?a=finished?

into this:

  http://example.org?a=finished?c=d

You might need to look closely to spot it, but because the last character was a '?', we haven't added a separator before the c, and as a result, instead of having two parameters (a=finished? and c=d) we have only one (a=finished?c=d).
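
To see the problem in running code, here is a small sketch. The needsSeparator function is an illustrative stand-in for the lastIndexOf check, and the way the result is split back into parameters is a simplification.

```javascript
// Illustrative stand-in for the lastIndexOf check: only add a separator
// if the last character of the URI is not a '?'.
function needsSeparator(uri) {
  return uri.lastIndexOf("?") !== uri.length - 1;
}

var uri = "http://example.org?a=finished?";
var result = uri + (needsSeparator(uri) ? "&" : "") + "c=d";
console.log(result);
// → http://example.org?a=finished?c=d

// Read the parameters back: the query starts after the FIRST '?',
// and parameters are separated by '&'.
var parameters = result.slice(result.indexOf("?") + 1).split("&");
console.log(parameters.length);
// → 1  (the single parameter is "a=finished?c=d")
```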
Question marks in fragments
The interesting thing about the previous examples is that at least you know you have a query string, so you might be tempted to still use indexOf to manipulate things. After all, although we may have too many '?' characters, we still know that we have a query.
However, with the fragment section of RFC 3986 all bets are off; here we can see that '?' is explicitly allowed as a fragment character.
This means that it's possible to have a '?' in a URI even if it doesn't have a query. For example:

  http://example.org#finished?

This may seem like a contrived example, but actually it's not, for two reasons.
The first is that the fragment part of a URI is carefully defined to allow anything, because we don't know how it will be interpreted. You may think that "finished?" is not valid as an HTTP fragment, but what about in the scheme "xyz"?
The second, and this is the key point, is that HTML forms and XForms can ultimately deal with any scheme, because that's how the web is designed. So we must write our algorithms defensively, and not assume anything.
Safe URI handling
Hopefully this delving into some of the subtleties of URI handling and parsing--and we haven't even begun to talk about turning relative paths into absolute paths, handling encoded characters, and so on--has done enough to convince you that you shouldn't manipulate URIs directly, as simple strings.
The only way to be completely sure of what is happening is to use special functions to unpack a URI into its various components, then manipulate those components--perhaps adding parameters to the list of query parameters, or turning a relative path into an absolute one--before finally reassembling the URI.
This may sound like a lot of work, but it's the only way to be sure that characters don't get incorrectly interpreted as a consequence of their position in the URI not being taken into account.
In the backplanejs library these functions are in the URI module. (This is also imported into the Ubiquity XForms library.) Breaking up a URI simply involves calling spliturl, which returns an object containing all of the parts. For example:

  spliturl( "http://example.org?a=finished?" )

would give us the object:

  {
    scheme: "http",
    authority: "example.org",
    path: "",
    query: "a=finished?",
    fragment: ""
  }

It's then an easy matter to manipulate the query part, before creating a new URI with the recomposeURI method.
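
The split, manipulate and recompose steps can be sketched roughly as follows. This is not the backplanejs implementation: splitUri, recomposeUri and addQuery are hypothetical names, and the regular expression is the one given in Appendix B of RFC 3986. Unlike the object shown above, this sketch reports absent components as undefined rather than as empty strings, so that 'no query' and 'empty query' can be told apart.

```javascript
// Regular expression from RFC 3986, Appendix B.
var URI_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical splitter: absent components come back as undefined.
function splitUri(uri) {
  var m = URI_PARTS.exec(uri);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9]
  };
}

// Hypothetical recomposer, following RFC 3986 section 5.3.
function recomposeUri(parts) {
  var result = "";
  if (parts.scheme !== undefined) { result += parts.scheme + ":"; }
  if (parts.authority !== undefined) { result += "//" + parts.authority; }
  result += parts.path;
  if (parts.query !== undefined) { result += "?" + parts.query; }
  if (parts.fragment !== undefined) { result += "#" + parts.fragment; }
  return result;
}

// Add already-serialised parameters to whatever query is present.
function addQuery(uri, extra) {
  var parts = splitUri(uri);
  // An absent or empty query needs no separator before the new parameters.
  parts.query = parts.query ? parts.query + "&" + extra : extra;
  return recomposeUri(parts);
}

console.log(addQuery("http://example.org?a=finished?", "c=d"));
// → http://example.org?a=finished?&c=d
console.log(addQuery("http://example.org#finished?", "c=d"));
// → http://example.org?c=d#finished?
```

Because the query and fragment are identified by position before anything is changed, a '?' inside either of them is never mistaken for the query indicator.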
Conclusion
Since URIs are often conveyed as strings, it's tempting to manipulate them as such. But the problem with doing this is that the context of a character is rarely taken into account when processing.
It's better--and safer--to delegate URI manipulation to special functions. These can then have their own unit-tests, which will take into account the edge-cases that can catch us out.

Countdown to Safari

The next big task that I'm about to tackle on the Ubiquity XForms project is support for Safari and Chrome. In general, UXF has seemed to work pretty well in the WebKit-based browsers whenever I've tried it, but I've never run exhaustive tests.
So yesterday, for the first time, I ran the full XForms 1.1 test suite in Safari. The results were a huge disappointment: 443 tests were executed, of which 402 failed and just 41 passed. Suddenly, the job ahead seemed colossal.
After my initial shock wore off, and feeling dubious about the figures, I decided to watch the tests more closely as they were being run in Selenium. It quickly became apparent that a problem existed in our isModelReady() extension function, because Selenium was frequently executing its commands before the test forms had finished loading. Sure enough, some poking around revealed a bug that was easily resolved.
I'm pleased to be able to say that, post-commit, the test suite results for Safari are a lot more promising. Of the 443 tests that ran, 195 failed and 248 passed; a 56% (ish) pass rate. Not great, but far less daunting as a starting point. Among the failures, there look to be a few clumps of tests that may be of common origin, which bodes well for the possibility of quick wins. Submission looks like being the biggest single area that needs work, as chapter 11 failed almost in its entirety.
Issues have been raised for many of the failures in the bug tracker. So if anyone else feels like getting their hands dirty, feel free to take some. The long road to Safari support starts here!

XForms Developer Zone and User Group launched

We're pleased to be launching two new initiatives to help people who are interested in XForms.
The first is the all-new XForms Developer Zone web-site -- or xformsdz, as we're calling it.
Whilst the Developer Zone will be unashamedly biased towards XForms, within that, we'll have discussions, articles, code snippets, and tutorials about any XForms processor we can find, and any application framework in which it's used.
To accompany the web-site, we're also launching a regular newsletter, and a London XForms User Group.
The London XForms User Group
The inaugural meeting of the user group will be in Clerkenwell, London, hosted by Skills Matter. We'll be looking at the architecture of XForms, in particular what it means to have MVC support built in from the ground up, and we'll be using XForms and Google Maps as a case-study.
The October meeting will look at the benefits of using XForms for the UK insurance industry, and we're lucky enough to have an industry expert -- Neal Champion -- speaking to us that evening.
If you'd like to come along, we'd love to see you. And even if you can't make it, join the user group mailing-list, and we'll keep you posted about XForms-related activities and events, in and around London.

Linking issues and revisions in Google Code

We use Google Code a lot in our projects, because it provides a great range of tools, with a straightforward interface.
It's particularly useful to have version control and issue-tracking in the same system, and it's very easy to cross-reference the two by referring to revisions in issue comments, and to issues in the commit messages.
On top of that, Google Code provides some nice code review facilities, and here too, it's possible to refer back to revisions and issues.
In this post I'll outline how we use these tools together in UXF, the Ubiquity XForms AJAX library.
Everything should have an issue
The first thing that's needed is an issue. This might be an issue in the traditional sense, of a bug. But it might also be a desirable feature, that someone would like to add. Whatever it is, we shouldn't really be starting to code until we've written down what we're going to do, or what problem we're trying to solve.
Let's use issue 515 as an illustration. At the top we see it has the following description:

We need a UI control that can adhere to data binding restrictions for
xsd:boolean / xforms:boolean. The control may render a checkbox, for
example.

Sample code snippet:

<xf:input ref="check" datatype="xforms:boolean">
<xf:label>Check: </xf:label>
</xf:input>

Based on conversation with John in last week's call, needed in this
iteration (setting priority to critical).
Code reviews and issues
This description is plenty to get started with, so Rahul gets on with his development and testing, until he is ready for a code review. When he is ready, he commits his code to the review area, with the following SVN message:

[ issue 515 ] Proposed changes for adding a UI control that can bind to boolean
data (renders a checkbox).

You can view this commit at r2919, and as you can see, Google Code provides a convenient link to all of the files involved in the commit. GC also provides tools for attaching comments to any line in those files.
This revision therefore becomes the location at which the review is conducted, perhaps involving many comments from many reviewers.
But note also that Rahul's reference to "issue 515" has become a link to the issue itself. This means that a reviewer has everything they need to start reviewing -- links to all of the files that have changed, a reference to the issue being addressed, and some tools that allow them to add comments to the files.
However, anyone viewing the issue directly would not necessarily know that it was being worked on, or what the progress was. So to keep things up-to-date, Rahul added the following comment to issue 515:

Code for such a UI control has been proposed for addition in r2919 and review
requested. Changing Status to InReview.

The issue is now not only marked as being InReview, but we also have a convenient link to the code review itself. If you were interested in this issue, you would now be able to add your comments to the review, try out the proposed code, and so on.
Committing the code to trunk
At some point the review will finish, and the code will be ready to go into the trunk. (There may be other reviews first, but at some point the cycle should end.)
In our example, Rahul commits his reviewed code to trunk at r2920, with the following message:

[ issue 515 ] UI control that can bind to boolean data. Committed after favorable
review of r2919, with suggested filename change.

Note that we still have a reference to the issue number, but most importantly, we have a reference to the code review at r2919. This means that anyone looking at the code that is committed, and who wanted to understand why a particular change was made, could find the history of the discussion in the code review.
But we're not finished yet.
If we go back to the code review at r2919, we'll find that the last comment Rahul has added says:

Committed to trunk in r2920.

Now, anyone who stumbles across the code review directly knows that the code they are looking at really did find its way into the trunk, and wasn't left hanging there uncommitted, or abandoned in favour of some other solution.
Closing the issue
The final step is of course to close off the issue, and here it's important to refer to the revision at which the code was committed. In our example, the last comment on issue 515 is this:

Committed to trunk in r2920 after favorable review.

Resolving issue as Fixed.

Now anyone interested specifically in the issue can see not only that it is closed, but also where the code is that addressed the issue.
Conclusion
This technique may seem a little laborious, but it actually takes longer to describe than to do.
And by following these steps we ensure that we know how a sequence of comments ends, whether in an issue or code review.
We need to know how an issue was addressed when it gets marked as fixed, what happens to code after a positive code review, and where code that is committed to trunk came from. In short, whether you are looking at committed code, an issue, or a code review, these procedures ensure that you can easily find the other two.

A brief history of Ubiquity formsPlayer

Having spent the majority of my last four years working on the likes of Ubiquity XForms, Hubbub and UBX, I've found it rather nostalgic to get back to working on Ubiquity formsPlayer during the last few weeks. So much so, in fact, that I decided to take the opportunity to delve back through the source control logs and brush up on the old girl's history a bit.
The initial project commit was made by Mark, on the morning of Monday 13th May, 2002. I joined the project a shade less than a year later, on 6th May 2003, before making my first commit exactly one month after that, on 6th June. Full conformance to the XForms Recommendation (or Proposed Recommendation, as it was then) was announced on 22nd September of the same year, before formsPlayer 1.0 was made generally available on 13th February, 2004. The five-and-a-half years since that first release have seen eight subsequent point releases, four different source control repositories, an aborted formsPlayer 2.0 branch and (belatedly) two new versions of Internet Explorer from Microsoft.
Reading back through all that, the over-riding impression I get is one of maturity. And, although most of our energy has been directed elsewhere for the last year or so, Ubiquity formsPlayer remains a key project for us, for precisely that reason. Things may have gone quiet on the old formsPlayer website, but we're still actively developing the project for our customers and, now that it is open source, you can track our progress on the project's Google Code site.
The latest release is version 1.8.1023, built today and tested against versions 6, 7 and 8 of Internet Explorer. If you're still using an old version of formsPlayer, please consider upgrading and giving us your feedback in the issue tracker or on the mailing list.

How to harness the power of XHTML and XForms in your .NET applications

Recently, we've had a number of enquiries about embedding formsPlayer inside third-party applications, as more people are realising the benefits of using XForms as a dynamic UI framework. Fortunately, formsPlayer exposes a set of COM interfaces explicitly for this purpose and a number of commercial solutions exist that are currently relying on them. With this in mind, I've posted a tutorial article to The Code Project, one of the leading .NET developer resources, which demonstrates integration of these interfaces in a simple C# browser-like application.

Ajax and progressive browser enhancement

There is still plenty of scope for the browser to evolve, but the truth is that progressive browser enhancement makes browser evolution far less significant than it has been until now. That's probably not a bad thing, given the chaos that is currently surrounding the development of HTML at the W3C.

Turn any web page into a desktop application

I've just finished implementing a new feature in the Sidewinder Viewer, which further simplifies the task of turning any web page into a desktop application. If invoked with a command line argument that specifies the document to load, the viewer now also checks the fragment identifier for the presence of a meta XPointer scheme. The expression associated with this scheme allows you to set a number of application-level properties, such as the window title, height, width, position and so on.