XML Retrieval
(Half day tutorial)
Instructor: Ricardo Baeza-Yates,
Universidad de Chile
Abstract
This tutorial covers the main concepts related to retrieving information
from data structured in XML. The content is divided into three parts.
The first covers the main concepts of XML, including defining, displaying
and querying XML data. The second part addresses the challenges
of retrieving information from XML data, in particular ranking and
benchmarking. The last part covers indices for structure and their
algorithms, focusing on the trade-off between expressivity and efficiency.
Throughout the tutorial, the state of the art on these issues is emphasized,
including currently available software.
Introduction
XML is becoming the de facto data standard for the Web. XML query
languages, in particular the proposed XQuery language, combine SQL with
concepts from OO databases and information retrieval. Hence, XML querying
is an important topic for IR and DB researchers. However, little
is known about the expressivity of the language versus the efficiency
of feasible implementations, a trade-off related to structured text
models and to results obtained in recent years.
Contents
The first part is devoted to XML standards including the main concepts
of XML, namespaces, DTDs and schemas, as well as XML software.
The second part focuses on XML query languages, including the
history of their development. It covers XPath and XQuery.
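As a rough illustration of the kind of path query XPath expresses, the
sketch below uses Python's standard xml.etree.ElementTree module, which
supports only a limited subset of XPath. The sample document, its element
names and its values are invented for this example and are not taken from
the tutorial.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
DOC = """
<library>
  <book year="1999"><title>Title A</title></book>
  <book year="2005"><title>Title B</title></book>
</library>
"""

root = ET.fromstring(DOC)

# XPath-style location path with an attribute predicate:
# select the titles of all books whose year attribute equals 1999.
titles = [t.text for t in root.findall("./book[@year='1999']/title")]
print(titles)
```

XQuery builds on path expressions of this kind and adds constructs for
iterating over, joining and restructuring the selected nodes.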
The third part covers structured text models and their relation
to XML. We compare their expressivity and efficiency and show that
they can be used to implement XML query languages. We also give two examples
of indices for XML.
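To give a feel for what an index on structure can look like, here is a
minimal sketch of a simple path index, which maps each root-to-element
label path to the elements it reaches. This is an assumption chosen for
illustration, not necessarily one of the two indices presented in the
tutorial; the function name build_path_index and the sample document are
hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, invented for illustration only.
DOC = (
    "<library>"
    "<book><title>Title A</title></book>"
    "<book><title>Title B</title><author>Author B</author></book>"
    "</library>"
)

def build_path_index(root):
    """Map each root-to-element label path to the elements reached by it."""
    index = defaultdict(list)
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        index[path].append(node)
        # Push children in reverse so they are visited in document order.
        for child in reversed(list(node)):
            stack.append((child, path + "/" + child.tag))
    return index

index = build_path_index(ET.fromstring(DOC))

# A fixed path query becomes a single dictionary lookup.
titles = [e.text for e in index["/library/book/title"]]
print(titles)
```

Such an index answers fixed path queries without traversing the document,
but it does not directly support richer queries; this is one small instance
of the expressivity versus efficiency trade-off.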
The last part includes XML retrieval evaluation and the different
techniques that have been used in the INEX initiative.