Overview of XQuery

Contributed by Howell   
Monday, 05 June 2006
What is XML Query?

XML Query, often abbreviated as XQuery, is a specification that's been around in one form or another for a few years now. The XML Query working group, chartered in September 1999, was tasked with creating a flexible query language to extract data from XML documents. The latest working drafts (see Resources) go a long way toward achieving this goal.

XQuery builds on the XPath specification. In fact, some of the features of XQuery have been acknowledged as being so fundamental that they have been incorporated into the XPath 2.0 specification, and this specification is now co-owned by the W3C's XML Query and XSL working groups. This is good news, as it means that style sheet authors will soon be able to take advantage of features like sequences, quantification, and stronger type control. Also, conditional expressions and iterators have been added to the XPath language, where previously they were part of the XSL language. This should allow for cleaner code in style sheets and fewer headaches for the style sheet creators.

 

FLWR expressions

The most powerful new feature in XQuery is the FLWR expression. FLWR (pronounced flower) is an acronym for For-Let-Where-Return, the clauses allowed in one of these expressions. FLWR expressions can perform many tasks you'd never dream of undertaking in XSL style sheets.

Each FLWR expression has one or more for or let clauses (in any combination), an optional where clause, and a return clause.

for clauses

You use for clauses to generate the tuples on which the rest of the expression is evaluated; Listing 1 shows a single for clause. The order of the tuples determines the order of evaluation.

 for $exp1 in (<a/>, <b/>)


The running program will evaluate the expression in Listing 1 twice, once with the $exp1 variable set to <a/> and once with it set to <b/>. If you introduce another for clause, the program will evaluate the expression once for each tuple in the Cartesian product of the two sequences. Take a look at the example in Listing 2, which uses more than one for clause.


for $exp1 in (<a/>, <b/>)
for $exp2 in (<c/>, <d/>)
 


In Listing 2, the program will evaluate the expression four times, once for each tuple:

(<a/>, <c/>)
(<a/>, <d/>)
(<b/>, <c/>)
(<b/>, <d/>)
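
To make the tuple order concrete, here is a minimal Python sketch (not XQuery) that mimics how the two for clauses in Listing 2 generate their tuples. The function name for_tuples and the use of plain strings in place of XML nodes are assumptions made for illustration only.

```python
from itertools import product

# Stand-ins for the two sequences bound in Listing 2. Plain strings are used
# here in place of real XML element nodes (an assumption for illustration).
exp1_values = ["<a/>", "<b/>"]
exp2_values = ["<c/>", "<d/>"]


def for_tuples(*sequences):
    """Return the tuples that nested for clauses would generate, in order.

    The first sequence varies slowest, like the outermost for clause.
    """
    return list(product(*sequences))


tuples = for_tuples(exp1_values, exp2_values)
for exp1, exp2 in tuples:
    print(f"({exp1}, {exp2})")
```

The first for clause behaves like the outer loop, so its variable changes only after the second clause has run through all of its values.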

let clauses

The let clause assigns a value or sequence to a variable. This can be useful shorthand for use in the where or return clauses.

where and return clauses

The where clause directs the program to discard particular tuples if they do not meet particular conditions. The return clause defines what to return for each tuple.

In this example, the query returns the names of all authors in the document who have written three or more books. Listing 3 shows an example document, and Listing 4 shows the query that operates on it.


<authorList>
  <author name="Kevin Williams">
    <book>Professional XML, 2nd Edition</book>
    <book>Professional XML Databases</book>
    <book>Professional XML Schemas</book>
  </author>
  <author name="John Q. Somebody">
    <book>Esoteric Topics in Programming, Vol. 1</book>
    <book>Esoteric Topics in Programming, Vol. 2</book>
  </author>
</authorList>
 


<frequentWriters>
{
  let $inDoc := document("authors.xml")
  for $author in $inDoc//author
  let $cb := count($author/book)
  where $cb >= 3
  return
    <author>{string($author/@name)}</author>
}
</frequentWriters>
 


The XQuery in Listing 4 would return the contents of Listing 5.


<frequentWriters>
<author>Kevin Williams</author>
</frequentWriters>
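
For readers more at home in a procedural language, the following Python sketch walks through the same for, let, where, and return steps as the query in Listing 4, using the standard library's xml.etree.ElementTree. It is an illustration of the clause semantics rather than a substitute for an XQuery engine; the function name frequent_writers and its minimum_books parameter are assumptions, and the document is embedded as a string instead of being loaded from authors.xml.

```python
import xml.etree.ElementTree as ET

# The document from Listing 3, embedded as a string for a self-contained sketch.
AUTHORS_XML = """
<authorList>
  <author name="Kevin Williams">
    <book>Professional XML, 2nd Edition</book>
    <book>Professional XML Databases</book>
    <book>Professional XML Schemas</book>
  </author>
  <author name="John Q. Somebody">
    <book>Esoteric Topics in Programming, Vol. 1</book>
    <book>Esoteric Topics in Programming, Vol. 2</book>
  </author>
</authorList>
"""


def frequent_writers(xml_text, minimum_books=3):
    """Mirror the FLWR expression in Listing 4 clause by clause."""
    in_doc = ET.fromstring(xml_text)                 # let $inDoc := ...
    result = ET.Element("frequentWriters")
    for author in in_doc.iter("author"):             # for $author in $inDoc//author
        book_count = len(author.findall("book"))     # let $cb := count($author/book)
        if book_count >= minimum_books:              # where $cb >= 3
            entry = ET.SubElement(result, "author")  # return <author>...</author>
            entry.text = author.get("name")
    return result


print(ET.tostring(frequent_writers(AUTHORS_XML), encoding="unicode"))
```

The where clause maps onto the if test: tuples that fail the condition never reach the return step.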


 The distinct-values function

XQuery also introduces a function that comes in very handy when performing data manipulations: the distinct-values function (also found in XPath 2.0). This function allows you to easily pivot a relationship in a document. For example, say you had the list of your software company's customers and the products they had purchased shown in Listing 6.


<customerList>
  <customer name="Big Bank, Inc.">
    <product name="MyDataMinder" />
    <product name="MyDataFinder" />
  </customer>
  <customer name="PharmaCorp, Inc.">
    <product name="MyDataFinder" />
    <product name="MyDataBinder" />
  </customer>
</customerList>
 

If you wanted to transform this document into one that lists all the products, along with the customers for each product, you would have a major task on your hands in an XSL style sheet: it's possible, but very ugly to code. Using XQuery, though, the problem becomes a simple one, as shown in Listing 7.


<productList>
{
  let $inDoc := document("customerList.xml")
  for $product in distinct-values($inDoc//customer/product/@name)
  return
    <product name="{$product}">
    {
      for $customer in $inDoc//customer
      where $customer/product/@name = $product
      return
        <customer name="{$customer/@name}" />
    }
    </product>
}
</productList>
 


Listing 7 would produce the output shown in Listing 8.


<productList>
  <product name="MyDataMinder">
    <customer name="Big Bank, Inc." />
  </product>
  <product name="MyDataFinder">
    <customer name="Big Bank, Inc." />
    <customer name="PharmaCorp, Inc." />
  </product>
  <product name="MyDataBinder">
    <customer name="PharmaCorp, Inc." />
  </product>
</productList>
 
Powerful, simple to use, and easy to understand: XQuery makes that kind of data manipulation easy. 
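
The pivot that Listing 7 performs can also be sketched in Python to show what distinct-values contributes. This is only an approximation of the query's logic under stated assumptions: the function name pivot_by_product is hypothetical, the document from Listing 6 is embedded as a string, and dict.fromkeys stands in for distinct-values by dropping repeated product names while keeping the order in which they were first seen.

```python
import xml.etree.ElementTree as ET

# The document from Listing 6, embedded as a string for a self-contained sketch.
CUSTOMERS_XML = """
<customerList>
  <customer name="Big Bank, Inc.">
    <product name="MyDataMinder" />
    <product name="MyDataFinder" />
  </customer>
  <customer name="PharmaCorp, Inc.">
    <product name="MyDataFinder" />
    <product name="MyDataBinder" />
  </customer>
</customerList>
"""


def pivot_by_product(xml_text):
    """Map each distinct product name to the customers that bought it."""
    in_doc = ET.fromstring(xml_text)
    customers = in_doc.findall(".//customer")

    # Stand-in for distinct-values(...): remove duplicates, keep first-seen order.
    product_names = list(dict.fromkeys(
        product.get("name")
        for customer in customers
        for product in customer.findall("product")
    ))

    pivot = {}
    for product_name in product_names:           # outer for clause
        pivot[product_name] = [
            customer.get("name")
            for customer in customers            # inner for clause
            if any(p.get("name") == product_name # inner where clause
                   for p in customer.findall("product"))
        ]
    return pivot


for product_name, customer_names in pivot_by_product(CUSTOMERS_XML).items():
    print(product_name, customer_names)
```

Without the deduplication step, MyDataFinder would appear twice in the result, once for each customer that bought it.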

When should you use XQuery?

When it's sensible to begin using XQuery really depends on when you're reading this column and how eager you are to start using a new spec. As of February 2002 the specification is still in Working Draft status, which means it can change significantly between now and the time it is released. Once it reaches Proposed Recommendation status, it's generally viewed as stable enough to try out -- in fact, the W3C encourages developers to use specs at this point to generate the feedback required to fine-tune the spec before it's blessed with Recommendation status. Spring 2002 would be a good time to get to know the spec if you think it will offer big enough benefits that you want to try it out as soon as it reaches Proposed Recommendation status.

No matter when you decide that XQuery might be a viable solution for you, here are a few guidelines to keep in mind regarding when it may be an appropriate part of your solution.

First of all, XQuery isn't a magic bullet. Even though syntactically it's much better than XSL for data manipulation (and it allows some things that XSL doesn't allow directly), the engine underneath still has to read each document, parse it, and then manipulate it using the query language. This makes XQuery a good solution for indexed document repositories (so-called XML "databases") that can quickly access atoms of document content, but it's not as good a solution for unindexed documents.

Second, XQuery contains some mechanisms for accessing multiple documents in a repository. The document function allows you to programmatically access multiple documents in the same query. However, the same problem applies: You still need to load and parse every document. For best performance, then, you're still better off using an XML database or some other sort of indexing model.

Finally, XQuery works best on "hybrid" documents -- documents that contain both narrative flow and quantified data. For example, a medical-transcription document might contain both a narrative of the surgeon's actions during an operation and specific amounts of medicine, blood, and other supplies used during the operation. That document would be ill-suited to storage in a relational database, but XQuery would do a good job of extracting the quantified information from the XML document directly. If your document is pure data, however, it still makes more sense to bring it into a relational database for manipulation.
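
As a rough illustration of what a hybrid document looks like and what extracting its quantified data involves, here is a small Python sketch. Everything in it is hypothetical: the transcript, the element and attribute names (operation, narrative, supply, name, amount, unit), and the function supplies_used are invented for this example and do not come from any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical hybrid document: narrative text with quantified data embedded
# in it. All element names, attribute names, and values are invented.
TRANSCRIPT_XML = """
<operation>
  <narrative>The surgeon prepared the site and administered
    <supply name="anesthetic" amount="5" unit="ml" /> before closing with
    <supply name="suture" amount="2" unit="packs" />.</narrative>
</operation>
"""


def supplies_used(xml_text):
    """Pull the quantified items out of the narrative, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (supply.get("name"), float(supply.get("amount")), supply.get("unit"))
        for supply in root.iter("supply")
    ]


for name, amount, unit in supplies_used(TRANSCRIPT_XML):
    print(name, amount, unit)
```

The narrative stays intact around the supply elements, which is the property that makes such a document awkward to flatten into relational tables.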

Conclusion

XQuery provides a strong grammar for the manipulation of data inside XML documents. It is best suited to documents that contain both narrative text and quantified data. For the best performance using XQuery on these types of documents, load them into some sort of indexed XML repository.

Whether the W3C will release the specification by summer remains to be seen; there are some gigantic unresolved issues at this time, including whether there should be reserved words in XPath 2.0 expressions. These issues will almost certainly take some time to resolve. However, being aware of your document needs now can position you to take the best advantage of this technology when it becomes widely available.

Resources

 Specification for XQuery 1.0.

 


©2006-2008 DeveloperZone.biz. All rights reserved.