1st Latin American Web Congress
Santiago 2003
Empowering Our Web · November 10-12
Organized by the Center for Web Research,
Dept. of Computer Science, University of Chile
with the sponsorship of IW3C2

XML Retrieval

(Half-day tutorial)
Instructor: Ricardo Baeza-Yates,
Universidad de Chile

Abstract
This tutorial covers the main concepts related to retrieving information from data structured in XML. The content is divided into three parts. The first covers the main concepts of XML, including defining, displaying and querying XML data. The second addresses the challenges of retrieving information from XML data, in particular ranking and benchmarking. The last covers indices for structure and their algorithms, focusing on the trade-off between expressivity and efficiency. Throughout the tutorial, the state of the art on these issues is emphasized, including currently available software.

Introduction
XML is becoming the de facto data standard for the Web. XML query languages, in particular the proposed XQuery language, combine SQL with concepts from object-oriented databases and information retrieval. Hence, XML querying is an important topic for both IR and DB researchers. However, little is known about the expressivity of these languages versus the efficiency of feasible implementations, a question related to structured text models and to results obtained in recent years.

Contents
The first part is devoted to XML standards, including the main concepts of XML, namespaces, DTDs and schemas, as well as XML software.

The second part focuses on XML query languages, including the history of their development, in particular XPath and XQuery.
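To make path-based querying concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. Note that this module implements only a limited subset of XPath (child and descendant steps, `*`, and simple predicates), not full XPath or XQuery. The sample document, its element names and the helpers `titles_by_author` and `titles_in_year` are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names are illustrative only.
DOC = """<bib>
  <book year="1999"><title>Book A</title><author>Ann</author><author>Bob</author></book>
  <book year="2003"><title>Book B</title><author>Carl</author></book>
  <article year="2003"><title>Paper C</title><author>Ann</author></article>
</bib>"""

root = ET.fromstring(DOC)


def titles_by_author(root, name):
    """Titles of the children of <bib> that have an <author> child equal to `name`.

    The predicate [author='...'] keeps elements with at least one <author>
    child whose full text equals the given string (ElementTree XPath subset).
    `name` is inserted unescaped, so it must not contain a single quote.
    """
    return [e.findtext("title") for e in root.findall(f"./*[author='{name}']")]


def titles_in_year(root, year):
    """Titles of any descendant element whose year attribute equals `year`."""
    return [e.findtext("title") for e in root.findall(f".//*[@year='{year}']")]


if __name__ == "__main__":
    print(titles_by_author(root, "Ann"))  # ['Book A', 'Paper C']
    print(titles_in_year(root, "2003"))   # ['Book B', 'Paper C']
```

Results come back in document order. Richer queries, such as joins across documents or constructing new XML results, are where a language like XQuery goes beyond what path expressions alone can do.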

The third part covers structured text models and their relation to XML. We compare their expressivity and efficiency and show that they can be used to implement XML query languages. We also give two examples of indices for XML.
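Structured text models commonly represent each element as a region, an interval of positions in the text, so that structural containment reduces to interval containment. The following is a minimal sketch of such a region index in Python, assuming a simple counter incremented on every open and close event as the position scheme. It is not necessarily one of the two indices presented in the tutorial, and the names `Region`, `build_region_index` and `descendants_of` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    tag: str
    start: int  # counter value when the element is opened
    end: int    # counter value when the element is closed


def build_region_index(root):
    """Map each tag name to the regions of all elements with that tag.

    One counter is incremented on every open and close event, so nesting
    of elements corresponds exactly to nesting of their intervals.
    """
    index = defaultdict(list)
    counter = 0

    def visit(elem):
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            visit(child)
        end = counter
        counter += 1
        index[elem.tag].append(Region(elem.tag, start, end))

    visit(root)
    # Keep each posting list in document order (by start position).
    for regions in index.values():
        regions.sort(key=lambda r: r.start)
    return index


def contains(a, d):
    """True iff region a strictly encloses region d (a is an ancestor of d)."""
    return a.start < d.start and d.end < a.end


def descendants_of(index, anc_tag, desc_tag):
    """Regions of desc_tag lying inside some anc_tag region, like //anc//desc.

    A naive nested loop for clarity; it does not exploit the sorted lists.
    """
    ancestors = index.get(anc_tag, [])
    return [d for d in index.get(desc_tag, [])
            if any(contains(a, d) for a in ancestors)]


if __name__ == "__main__":
    doc = ("<bib><book><title>A</title><chapter><title>A1</title></chapter></book>"
           "<article><title>B</title></article></bib>")
    idx = build_region_index(ET.fromstring(doc))
    print([r.start for r in descendants_of(idx, "book", "title")])  # [2, 5]
```

The nested loop keeps the containment test obvious; practical structural-join algorithms instead merge the sorted posting lists, which is where the expressivity versus efficiency trade-off of a given model becomes visible.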

The last part covers XML retrieval evaluation and the different techniques that have been used in the INEX initiative.

Official URL for LA-Web 2003:
http://la-web.org/
Questions/Comments: la-web@dcc.uchile.cl