ECS 289F        INFORMATION SYSTEMS INTEROPERABILITY             (4)     III

Lecture:            3 hours

Discussion:    1 hour

Prerequisite:   Course 165A

Grading:   Letter; projects (60%), homework (20%), final (20%)

Catalog Description:

Information integration and data exchange among web-based information
sources; data models: semistructured data and XML; metadata models;
ontologies; query processing; data integration architectures: wrappers
and mediators; search engines; digital libraries; scientific data
warehouses.

Goals:

The integration and exchange of heterogeneous information among
web-based information sources is a core concept in today's
information infrastructure.  Students will learn the concepts and
architectures underlying this infrastructure.  They will learn
non-standard data models and query languages associated with
semistructured data and XML and apply query processing strategies to
these models.  Students will learn the role of metadata models, the
concepts underlying them, and the use of metadata in information
integration scenarios.  They will learn and implement components of
these architectures, including source wrappers and query mediators.

The project work emphasizes developing and implementing data
integration environments using commercial and freely available
software.

Expanded Course Description:

I. Introduction and Overview
 A.  Problems in Information Integration and Data Exchange
 B.  Concepts and Architectures
 C. Applications

II. Semistructured Data
 A. Basic Concepts
 B. The Object Exchange Model (OEM)
 C. Unstructured Query Language (UnQL)

III. XML
 A.  Basic Concepts, Syntax
 B.  DTD and XML Schema
 C. XML Add-ons (XPath, XLink, XSL, SAX, DOM)

IV. Query Languages
 A.  Path Expressions
 B.  Lorel and UnQL
 C.  Query Languages for XML
 D.  Advanced Topics: Structural Recursion, Complexity Issues, Graph Theory

V. Type Systems
 A. Typing Semistructured Data
 B. Typing in XML
 C. Path Constraints
 D. General Integrity Constraints

VI. Metadata
 A.  Role of Metadata in Data Integration/Exchange
 B.  Metadata Standards
 C.  Ontologies
 D.  Data Annotation Techniques
 E.  Resource Description Framework (RDF)

VII. Systems
 A.  Mediators and Wrappers
 B.  Query Processing Aspects
 C.  The Stanford Lore System
 D.  XML-based Database Systems

VIII. Advanced Data Integration Infrastructures
 A.  Search Engines: Features, Anatomy, Concepts
 B.  Digital Libraries
 C.  Scientific Data Warehouses

Textbook:

No required textbook.  Recommended: S. Abiteboul, P. Buneman, D. Suciu:
Data on the Web -- From Relations to Semistructured Data and XML,
Morgan Kaufmann, 2000.

A collection of papers addressing specific topics will be distributed
in class.

Computer Usage:

Students work individually on projects in a UNIX workstation
environment, using standard UNIX tools (programming environments) as
well as major database systems (Oracle8i, Lore) and XML software
packages.

Engineering Design Statement:

The goal of the projects is to design, implement, and verify a data
integration architecture tailored to a particular application domain.
Students choose the application domain before the projects begin.
The projects include the design and implementation of source wrappers
and query mediators (with their associated metadata models) as well as
the implementation of a query processing engine.  The systems and
tools used for these projects resemble, to the extent possible, those
found in industry, including Oracle8i and diverse XML tools.  Projects
are graded on design, performance, correctness, and documentation.
Examination questions are based on models and techniques discussed in
lecture and used in the projects.
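
For illustration only, the wrapper/mediator division of labor named
above might be sketched as follows.  This is a minimal sketch, not a
prescribed project design: the common schema (Record), the class names
(DictWrapper, Mediator), the sample rows, and the single min_year
query parameter are all hypothetical, and Python stands in for
whatever tools a project actually uses.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Record:
    """Hypothetical mediated schema that every wrapper must emit."""
    title: str
    year: int
    source: str


class Wrapper(Protocol):
    """A source wrapper translates one source into Record objects."""
    def fetch(self) -> Iterable[Record]: ...


class DictWrapper:
    """Hypothetical wrapper over a source whose rows are dicts
    with source-specific field names."""

    def __init__(self, name: str, rows: List[dict],
                 title_key: str, year_key: str) -> None:
        self.name = name
        self.rows = rows
        self.title_key = title_key
        self.year_key = year_key

    def fetch(self) -> Iterable[Record]:
        for row in self.rows:
            # Map source-specific fields onto the mediated schema.
            yield Record(title=row[self.title_key],
                         year=int(row[self.year_key]),
                         source=self.name)


class Mediator:
    """Sends one query to all wrappers and merges the answers."""

    def __init__(self, wrappers: Iterable[Wrapper]) -> None:
        self.wrappers = list(wrappers)

    def query(self, min_year: int) -> List[Record]:
        hits = [r for w in self.wrappers for r in w.fetch()
                if r.year >= min_year]
        return sorted(hits, key=lambda r: (r.year, r.title))


# Two made-up sources with different field names and value types.
catalog = DictWrapper("catalog",
                      [{"name": "Data on the Web", "published": "2000"}],
                      "name", "published")
archive = DictWrapper("archive",
                      [{"t": "Old Report", "y": 1995}],
                      "t", "y")
mediator = Mediator([catalog, archive])
hits = mediator.query(min_year=1999)
```

In this sketch each wrapper hides its source's field names and value
types, so the mediator only ever sees the common schema; a real
project would replace the in-memory rows with database or XML sources.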

ABET Category Content:

  Engineering Science:  2  units
  Engineering Design:   2  units

Instructor:  M. Gertz

Prepared by:   M. Gertz (January 2000)

Overlap Statement:

This course overlaps only slightly with the topic "distributed query
processing" covered in ECS 265 (Distributed Database Systems).
Otherwise, there is no overlap with existing courses.