XQuery and Information Retrieval
One problem we faced is that, although several sophisticated XML query
languages exist, none of them sufficiently supports a view of XML
documents in which the documents consist mainly of text rather than
element structure. For such documents, data (or information) retrieval
schemes that support conditions on text become as important as query
schemes that focus on path patterns. In a
WebDB 2002 paper, Jan-Marco Bremer and Michael Gertz propose an
extension of the XML query language XQuery by a powerful information
retrieval component, dubbed XQuery/IR, providing a well-defined and
easy-to-use model for integrating XML data and document retrieval
through dynamic ranking of document fragments. This is the
first work that not only defines a precise semantics for such an
information retrieval operator in XQuery, but also outlines a complete
framework for its realization. We are currently completing a journal
paper that details the full implementation of the new operator in
XQuery, with a particular focus on space- and access-efficient index
structures to support full-text indexing of XML documents and XQuery
optimization schemes. A first prototype is currently used in the
context of the Human Brain Project, in which text-rich XML documents
are integrated into an XML document repository using the above
document conversion approach.
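
To make the idea of dynamically ranking document fragments concrete, the following is a minimal Python sketch, not the XQuery/IR operator itself. It assumes a generic TF-IDF score computed at query time over the fragments selected by a path expression; the function names (`tokenize`, `rank_fragments`), the sample document, and the scoring formula are all hypothetical and chosen for illustration only.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_fragments(xml_source, path, query):
    """Rank the elements selected by `path` by a TF-IDF score for `query`.

    Generic TF-IDF, computed at query time over the selected fragments only.
    This is an assumption for illustration, not the XQuery/IR ranking model.
    """
    root = ET.fromstring(xml_source)
    fragments = root.findall(path)  # path condition (ElementTree XPath subset)
    # Term frequencies over all text nested inside each fragment.
    counts = [Counter(tokenize("".join(f.itertext()))) for f in fragments]
    n = len(fragments)
    scored = []
    for fragment, tf in zip(fragments, counts):
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for c in counts if term in c)  # fragments containing term
            if df:
                score += tf[term] * math.log(1 + n / df)
        scored.append((score, fragment))
    # Highest score first; the sort is stable, so ties stay in document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical sample data, not taken from the project repository.
SAMPLE = """
<articles>
  <article id="a1"><title>Cortex imaging</title>
    <body>Imaging of the visual cortex using fMRI.</body></article>
  <article id="a2"><title>Index structures</title>
    <body>Index structures for XML text retrieval.</body></article>
  <article id="a3"><title>Cortex models</title>
    <body>Models of cortex layers; cortex connectivity data.</body></article>
</articles>
"""

ranked = rank_fragments(SAMPLE, ".//article", "cortex")
ranked_ids = [frag.get("id") for score, frag in ranked if score > 0]
print(ranked_ids)
```

The point of the sketch is that the path expression decides which fragments are candidates, while the text condition decides their order. Because the collection statistics are computed over the selected fragments rather than over whole documents, the ranking depends on the query, which is what "dynamic" is taken to mean here.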
Personnel:
Michael Gertz (Computer Science)
Jan-Marco Bremer (Ph.D. student, Computer Science)
Funding:
In the context of the Human Brain Project (see the
Data Integration and Metadata Management Research Web page).
Publications: