xml

Impressions of SemTech 2010

Two weeks ago I attended my fifth Semantic Technology conference in San Francisco. It was a great conference, and I plan to attend again in the future! My biggest dilemma was which session to attend: at times there were as many as eight concurrent sessions.

There were a few big trends that I spotted.

The use of RDFa to annotate web pages using semantically precise elements was clearly a big trend. Jay Myers of Best Buy described how Best Buy sales went up 30% when they added RDFa tags to their product pages. Although many search engines are not transparent about their use of RDFa tags in page rankings, Jay’s results should make it clear that this strategy works.
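To make the idea of RDFa annotation concrete, here is a minimal sketch in Python. It is not a conforming RDFa parser and it is not Best Buy's markup: the page fragment, the `ex:widthInches` property and the `PropertyCollector` class are all made up for illustration. It only shows that once a page carries `property` attributes, a machine can read the product facts without guessing from keywords.

```python
from html.parser import HTMLParser

# Hypothetical product fragment. The gr: prefix stands for GoodRelations;
# ex:widthInches is an invented property used only for this sketch.
PAGE = """
<div typeof="gr:Offering">
  <span property="gr:name">Black refrigerator</span>
  <span property="ex:widthInches" content="30">30 in. wide</span>
</div>
"""

class PropertyCollector(HTMLParser):
    """Collect property -> value pairs from flat (non-nested) markup."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._open = None  # property whose text content is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # An explicit content attribute wins over the visible text.
            self.properties[prop] = attrs["content"]
        else:
            self._open = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._open is not None:
            self.properties[self._open] += data

    def handle_endtag(self, tag):
        # Simplification: assumes annotated elements are not nested.
        self._open = None

collector = PropertyCollector()
collector.feed(PAGE)
print(collector.properties)
```

A real consumer would also resolve prefixes to full URIs and track subjects; this sketch skips both.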

Jay told a great story about how hard it was, using a keyword-based search engine, to find a fridge that met his specific criteria: black, a specific size, and so on. The semantic web will change all of this!

The big factor here is Martin Hepp's GoodRelations ontology for products and services. This can really bring the Semantic Web to the masses. Finally, there is a way to code the hours of operation for your store so that you can use Siri to ask "What sushi restaurants are open at 10pm near here?"
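As a rough sketch of why coded opening hours matter, the Python below answers the "open at 10pm" question over a handful of triples. The restaurants are invented, and the data is flattened for brevity: GoodRelations attaches opening and closing times to an opening-hours specification rather than directly to the store, so treat the shape here as an assumption, not as the ontology's actual structure.

```python
from datetime import time

# Hypothetical (subject, predicate, object) triples. The ex: names are
# made up; gr:opens and gr:closes are attached directly to the store
# here only to keep the sketch short.
TRIPLES = [
    ("ex:SushiOne", "rdf:type", "ex:SushiRestaurant"),
    ("ex:SushiOne", "gr:opens", time(11, 0)),
    ("ex:SushiOne", "gr:closes", time(23, 0)),
    ("ex:SushiTwo", "rdf:type", "ex:SushiRestaurant"),
    ("ex:SushiTwo", "gr:opens", time(11, 30)),
    ("ex:SushiTwo", "gr:closes", time(21, 0)),
]

def value(subject, predicate):
    """Return the first object recorded for (subject, predicate), or None."""
    for s, p, o in TRIPLES:
        if s == subject and p == predicate:
            return o
    return None

def open_at(kind, when):
    """List subjects of the given type whose hours include `when`."""
    subjects = [s for s, p, o in TRIPLES if p == "rdf:type" and o == kind]
    result = []
    for s in subjects:
        opens, closes = value(s, "gr:opens"), value(s, "gr:closes")
        if opens is not None and closes is not None and opens <= when < closes:
            result.append(s)
    return result

print(open_at("ex:SushiRestaurant", time(22, 0)))  # ['ex:SushiOne']
```

The "near here" part of the question would need location data as well, which this sketch leaves out.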

There were also a lot of sessions on Linked Data and specifically on Linked Data in the government area. Jim Hendler and his students at RPI are scraping every data set on data.gov, converting them all to RDF, and storing them in a huge triple store.

Jim also provided one of the best sound-bites from the conference:

“Get AWAY from the Table”

This goes far beyond the NoSQL movement that we are seeing. It gets to the core of the problems with innovation in many organizations.

Jim is a consultant to the US data.gov transparency process and a key advocate of open linked data standards. It is interesting to see a friendly rivalry between Jim and Tim Berners-Lee, who is a consultant for the UK data transparency movement. Both are using RDF to convert data, but each is taking a slightly different strategic approach.

With the "table" remark, Jim was referring to the fact that many organizations over-use the relational model and need to understand that there are many alternatives, especially when doing mashups of data sets from many sources.
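One way to see the appeal for mashups is a small sketch with plain Python sets standing in for a triple store. The two "agencies", their entities and their values are all invented for illustration. The point is only that two sources with different properties can be combined without first agreeing on a table layout.

```python
# Two hypothetical data sets about the same made-up entities. Each source
# records different properties, so a shared table would need new columns.
agency_a = {
    ("ex:Bridge42", "ex:locatedIn", "ex:CountyX"),
    ("ex:Bridge42", "ex:yearBuilt", 1967),
}
agency_b = {
    ("ex:Bridge42", "ex:lastInspected", "2010-05-01"),
    ("ex:CountyX", "ex:population", 50000),
}

# Merging is a set union: no schema migration and no join keys to design.
merged = agency_a | agency_b

def describe(graph, subject):
    """Collect every property recorded for a subject, whatever its source."""
    return {p: o for s, p, o in graph if s == subject}

print(describe(merged, "ex:Bridge42"))
```

The merge only works this smoothly because both sources use the same identifier for the same thing, which is the hard part in practice.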

There were also many presentations on natural language processing and finding the true “meaning” of words within unstructured and semi-structured data sets.

I always attend Brian Sletten’s session on REST. Many people do not understand the relationship between REST, URIs and the semantic web stack. Joe Wicentowski’s work at the US Department of State on the URL rewriting frameworks within eXist has really reinforced how this can be done easily in a single XQuery module.

I presented a 3.5-hour tutorial session on Entity Extraction. Since it was scheduled for the last session of the last day of the conference, I thought the attendance would be very low. But the room was packed and most of the people stayed till the very end.

It was wonderful to finally meet Marie Wallace and DJ McCloskey from the LanguageWare team at IBM. Their support of the Apache UIMA standards will be a great step toward the creation of interoperable language analysis pipelines. Marie and DJ introduced me to the Millennium restaurant just a few blocks from the hotel. My wife Ann and I went there over the weekend and we now own one of their cookbooks.

By the way, Google now also supports recipes in the rich snippets in its search results, so you will soon be able to find "only recipes that take under 30 minutes" in a Google search.

There were also many sessions on the need for good taxonomies and ontologies in the enterprise. The use of SKOS for controlled vocabularies was a very hot topic, and there were dozens of presentations on how OWL is being used to capture and exchange business rules.

It was also really great to finally meet Jeni Tennison after reading her books and following her Tweets for a long time. She is deeply involved in the federal data transparency projects in the UK. Her use of RDF is setting new best practices for the entire community.

It was also great to catch up with Mark Birbeck, one of the chief architects of the Ubiquity XForms libraries. I am looking forward to trying out some of the new tools based on his backplane JavaScript libraries.

The people from Facebook also gave a presentation on how site owners can add a few lines to a web page to give users a “like” button, using a variation of RDF. Many people were a little disappointed to hear that Facebook will not be using namespaces in its interfaces. Facebook felt that most HTML coders could only handle a single namespace. Many people agreed with this finding and thought that, until 90% of HTML coders know what namespaces are, organizations like Facebook are stuck just adding new data to HTML meta elements.

One of the most interesting discussions I had was with the people from Cray. Apparently a large unnamed government agency has given Cray a very large contract to build customized ASIC (FPGA) chips to do graph analysis. Their ThreadStorm XMT architecture has 128 register sets and allows the CPUs to get continuous feeds of graph queries without waiting for memory. Their claim is that the federal agency has told them they are getting a 100X improvement in complex graph queries. The challenge is that the API is currently a low-level C interface and they have not yet put a SPARQL compiler in front of an XMT. So no good SPARQL benchmarks are yet available to compare the results of this hardware with a typical triple store. But even if it is fast, the price would be in the six digits for one of the XMT systems. So only very large organizations and government agencies might use this unless it was provided as a service.

One of the other things I found was how many people are using OWL and OWL reasoners like Pellet as replacements for traditional rules engines from companies like Fair Isaac. There are two reasons for this. First, many rules engine companies talk only about engine performance and not about the need for precise semantics in the data elements. With OWL and the methods behind the semantic web stack, we attempt to put semantics higher in the requirements of a system and use pre-built and pretested ontologies.

The second big reason that people use OWL is that the rules are created using open standards. That means you are not locked into any one vendor's rule format. The new W3C standards for rule interchange (RIF) will also start to break down many of the barriers to the creation and exchange of large industry vertical rule sets, so that each organization does not have to start from scratch with a rule base. This will be big for industries like insurance where claims processing ontologies are just being created. Thanks to Kendall Clark for good information on this topic.

I also had a great time talking to many people in the publishing area. Seth Maislin gave a great presentation on taxonomies. Seth and Marlene Rockmore joined me for lunch at a nearby Indian restaurant. Marlene is an expert on taxonomy development. She was working on a book on taxonomies with O’Reilly that is on hold for now, but we hope to see one from her in the future.

It was also nice to see how many federal agencies are now starting to use SPARQL in the intelligence community. I saw a good presentation by Dennis Wisnosky of the US DOD about how they plan to use NIEM and semantic technologies to cut their integration costs. Yeah NIEM! Most people don’t understand that NIEM is based on the RDF model; it just uses XML syntax to store the relationships. See my web-cast on the SemanticUniverse.com web site for more details. Hopefully his slides will be available in the future. Many people were taking photos of the slides since the DOD is not great at getting their slides out. Wonder why?

I also had lunch with several people from within the intelligence community and had great discussions about the pros and cons of pulling assertions out of NLP-analyzed annotated XML documents and the challenges relating to this. Keeping the links back to the original XML documents is critical for verifying the results of an assertion in context. Finding traceability, lineage, provenance and time-domain representations in RDF that don’t cause a 10X growth in the number of triples is a difficult problem.

Relating an RDF assertion to the XML node ID of its source document (the "fourth column" question) is a very difficult problem, and one that needs close research across native XML and RDF systems. One of the things I learned about the eXist-Lucene integration done by Wolfgang Meier is that keeping the node-id as the document ID in Lucene gives non-programmers the ability to configure customized search rank rules based on the context of the keyword in a document. This is where highly customizable structured search rocks. I think this innovation needs to be added to RDF triples projects. Keeping the XML node-id of an assertion with an RDF triple context could be a great way to prevent RDF triple bloat for context.
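The "fourth column" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how eXist or any particular triple store does it: the `Quad` type, the document paths and the node-id format after the `#` are all hypothetical. For comparison, standard RDF reification describes a single statement with four extra triples (`rdf:type rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) before any provenance is attached, which is where much of the growth comes from.

```python
from collections import namedtuple

# A quad is a triple plus one context value. Here the context is a
# hypothetical "document#node-id" string; the id format is made up.
Quad = namedtuple("Quad", "subject predicate object source")

quads = [
    Quad("ex:PersonA", "ex:met", "ex:PersonB", "reports/r1.xml#1.4.2"),
    Quad("ex:PersonA", "ex:worksFor", "ex:OrgC", "reports/r1.xml#1.7.1"),
    Quad("ex:PersonB", "ex:locatedIn", "ex:CityD", "reports/r2.xml#1.2.5"),
]

def sources_for(subject, predicate, obj):
    """Trace an assertion back to the XML node(s) it was extracted from."""
    return [q.source for q in quads
            if (q.subject, q.predicate, q.object) == (subject, predicate, obj)]

def assertions_from(document):
    """List every assertion extracted from one source document."""
    return [q for q in quads if q.source.split("#", 1)[0] == document]

print(sources_for("ex:PersonA", "ex:met", "ex:PersonB"))
print(len(assertions_from("reports/r1.xml")))
```

With this layout, provenance costs one extra field per assertion, and the source node can be reopened to check the assertion in context.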

Finally, I felt that David Wood and James Leigh's presentation on the Callimachus project had the most potential to have a "big impact" on the adoption of Semantic Web tools. This new open source project allows users to specify a simple HTML template with special XML tags that bring triples directly into a web application. This has the potential to allow far more non-SPARQL programmers to integrate RDF data directly into a web page, much like XQuery does today. With some work it might be possible to auto-generate the XForms bind elements for form rules. This would be a huge win for the integration of rules into web forms without needing to write JavaScript.

You can also see my tweets on the conference here: http://twitter.com/dmccreary

I hope to see more of you in San Francisco next year. Let me know if you are interested in co-presenting any papers!

XForms: binding to an optional element

I asked this on the XSLTForms list, but it’s worth casting a wider net.
Say I have an XML instance like this:
<root>
<format>xml</format>
</root>
Possible values for the format element are “xml”, “text”, “binary”, or a default setting, indicated by a complete absence of the <format> element. I’d like this to attach to a select1 with four choices:
<xf:select1 ref="format">…
* [...]
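The excerpt is cut off before any answer, so the following is not an XForms solution. It is only a Python sketch of the round trip the form has to perform on the instance data: an absent `<format>` element reads as the default choice, and picking the default removes the element again. The `DEFAULT` label and both function names are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

DEFAULT = "default"  # hypothetical label for the fourth select1 choice

def read_choice(root):
    """Map the instance to a choice: a missing <format> means default."""
    el = root.find("format")
    return DEFAULT if el is None else el.text

def write_choice(root, choice):
    """Apply a choice: default removes <format>, anything else sets it."""
    el = root.find("format")
    if choice == DEFAULT:
        if el is not None:
            root.remove(el)
        return
    if el is None:
        el = ET.SubElement(root, "format")
    el.text = choice

root = ET.fromstring("<root><format>xml</format></root>")
print(read_choice(root))        # xml
write_choice(root, DEFAULT)
print(read_choice(root))        # default
write_choice(root, "binary")
print(ET.tostring(root, encoding="unicode"))
```

Whatever the XForms markup ends up looking like, it has to create and delete the node in the same way rather than just setting its value.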

XForms 1.1 is out

XForms 1.1 is now a full W3C Recommendation. Compared to version 1.0, which went live a bit more than 6 years ago, version 1.1 offers lots of road-tested tools that make development easier and more powerful, including new datatypes and XPath functions, a significantly more powerful submission subsystem, and a more flexible event model.
And XSLTForms [...]

XProc and SMIL: Orchestrating Pipelines

Although the W3C's XML Pipeline Language (XProc) hasn't even left the stable yet, people are already looking beyond its original purpose. XProc was designed to solve the problem of how to describe the joining together of multiple XML processing steps. So, the question is, how do you extend XProc to handle new features like explicit concurrency...

Verifying XML Signatures on Lotus Forms Documents

Back in March, I wrote about the open standards basis of Lotus Forms documents. This entry included comments on the use of the XML Signatures standard in combination with XForms within the XFDL markup of Lotus Forms.



Now I'd like to draw your attention to a developerWorks article we've now published on the technical details of Verifying Lotus Forms XML Signatures with Java. This article explains how a JSR 105 compliant implementation, such as can be found in the Apache security library or in Java 6, can be used to validate the XML signatures created by the Lotus Forms client software (either the Web Form Server or the client-side Forms Viewer plugin).



Generally, a Lotus Forms document consolidates the client-side of the business process function. This could be a many-step process for an individual or it could be a process that spans many individuals who are collaborating to perform a business transaction. Applying an XML signature on a Lotus Form protects the markup of the consolidated client experience, not just the transactional data created by users. Users don't "see" the XML markup of data for a transaction. They visually see (or aurally sense with accessibility software) the whole "contract" that gives context to the data content. An XML signature applied by Lotus Forms client software signs the whole agreement. The above mentioned article explains how open standards based software can be used to complete the server-side function of validating the XML signatures in order to secure the transactions of a business process. Since Lotus Forms are XML documents based on XForms, this means that the entire business process workflow on a Lotus Form can be achieved with open standards based software.

Are we losing the Declarative Web?

I saw something the other day that I was both intrigued and bothered by in equal measure. 'Mozilla and the Khronos Group Announce Initiative to Bring Accelerated 3D to the Web'. Apparently, the working group will look at exposing OpenGL capabilities within ECMAScript. The intriguing part is that, as a fan of 3D Computer Graphics and Animation this has got to be a good sign, especially if it is exposed in this way; but the bothersome bit is how people will end up using it because it has been exposed in this way. The crux of the problem for me is the question, JavaScript - what's it good for? Absolutely...

When you're SMIL-ing, when you're SMIL-ing...

...the whole world smiles with you. No it's not a typo, the acronym for the W3C's Synchronized Multimedia Integration Language (SMIL) is pronounced "smile", and the SMIL Animation module sure makes me smile; even more so given the fact that I've seen it mentioned, outside of the usual multi-media circles, three times last year and once already this year...

XML 2008 liveblog: Ubiquity XForms

I will talk about one or more sessions from XML 2008 here.
Mark Birbeck of Web Backplane talking about Ubiquity XForms.
Browsers are slow to adopt new standards. Ajax libraries have attempted to work around this. Lots of experimentation which is both good and bad, but at least has legitimized extensions to browsers. JavaScript is the assembly [...]

XForms for Prototyping

A high-fidelity prototype provides the engineers and QA organization with a rich, interactive description of the product's intended functionality and design to be used as a reference basis for implementation and test. Whenever this subject is raised my thoughts turn immediately to XForms. The advantage of prototyping with XForms is that it is quick, declarative, readable and well defined.

XRX: Simple, Elegant, Disruptive - O'Reilly XML Blog

XRX is a new web development architecture that is a milestone
in elegant simplicity. XRX stands for XForms on the client, REST
interfaces, and XQuery on the server. Because XRX uses a single model
for data (XML), it avoids the translation complexity of other
architectures. The simplicity and elegance of XRX allows developers
to focus on other value-added features of web application
development and enables non-programmers to create a rich web
interaction experience without the need to use procedural
programming languages.

Saved By: draganigajic

Chiba - Trac

An alternative to Orbeon; seems better.

Saved By: draganigajic

The Future of XForms

Some of the recent talk on the Mozilla XForms Project's mailing list (dev-tech-xforms) has been about the winding-down in effort on the Mozilla XForms plug-in. There has been praise for the efforts of those developers involved in the project, and quite rightly so. However, some people may be seeing this as a bad sign for XForms in general. Well, not so I say and the reasons for this are three-fold...

XForms for XML Documents (forms)

Orbeon Forms - Web Forms for the Enterprise, Done the Right Way

w3blog » Blog Archive » Innovation, or the lack thereof... W3C, anyone?

Word To XML Converters

XForms

XRX: Performing Updates | O'Reilly News

Ten XML Schemas you should know

In this article, look at some top XML schemas that provide solutions for all sorts of problems, from the basics of Web services to data description. You'll also cover database-like solutions that involve contacts and invoices. The schemas in this article were chosen for their usefulness and utility, plus their impact on the XML community in how information is shared and exchanged using the XML format.

Saved By: Frederik Van Zande

XForms in Firefox

Elliotte Rusty Harold explains how to process XForms with Firefox.

Saved By: Timo Kankare