ECS 289F INFORMATION SYSTEMS INTEROPERABILITY (4) III

Lecture: 3 hours
Discussion: 1 hour
Prerequisite: Course 165A
Grading: Letter; projects (60%), homework (20%), final (20%)

Catalog Description:
Information integration and data exchange among web-based information sources; data models: semistructured data and XML; metadata models; ontologies; query processing; data integration architectures: wrappers and mediators; search engines; digital libraries; scientific data warehouses.

Goals:
The integration and exchange of heterogeneous information among web-based information sources is a core concept in today's information infrastructure. Students will learn the concepts and architectures underlying these infrastructures. They will learn non-standard data models and query languages associated with semistructured data and XML, and apply query processing strategies to these models. Students will learn the role and concepts underlying metadata models and the use of metadata in information integration scenarios. They will learn and implement components of these architectures, including source wrappers and query mediators. The project work emphasizes developing and implementing data integration environments using commercial and freely available software.

Expanded Course Description:

I. Introduction and Overview
   A. Problems in Information Integration and Data Exchange
   B. Concepts and Architectures
   C. Applications

II. Semistructured Data
   A. Basic Concepts
   B. The Object Exchange Model (OEM)
   C. Unstructured Query Language (UnQL)

III. XML
   A. Basic Concepts, Syntax
   B. DTD and XML Schema
   C. XML Add-ons (XPath, XLink, XSL, SAX, DOM)

IV. Query Languages
   A. Path Expressions
   B. Lorel and UnQL
   C. Query Languages for XML
   D. Advanced Topics: Structural Recursion, Complexity Issues, Graph Theory

V. Type Systems
   A. Typing Semistructured Data
   B. Typing in XML
   C. Path Constraints
   D. General Integrity Constraints

VI. Metadata
   A. Role of Metadata in Data Integration/Exchange
   B. Metadata Standards
   C. Ontologies
   D. Data Annotation Techniques
   E. Resource Description Framework (RDF)

VII. Systems
   A. Mediators and Wrappers
   B. Query Processing Aspects
   C. The Stanford Lore System
   D. XML-"based" Database Systems

VIII. Advanced Data Integration Infrastructures
   A. Search Engines: Features, Anatomy, Concepts
   B. Digital Libraries
   C. Scientific Data Warehouses

Textbook:
No required textbook. Recommended: S. Abiteboul, P. Buneman, D. Suciu (eds.): Data on the Web: From Relations to Semistructured Data and XML, Morgan Kaufmann, 2000. A collection of papers addressing specific topics will be distributed in class.

Computer Usage:
Students work individually on projects in a UNIX workstation environment, using standard UNIX tools (programming environments) as well as major database (Oracle8i, Lore) and XML software packages.

Engineering Design Statement:
The goal of the projects is to design, implement, and verify a data integration architecture tailored to a particular application domain. Students choose the application domain before the projects begin. The projects include the design and implementation of source wrappers and query mediators (with their associated metadata models), as well as the implementation of a query processing engine. To the extent possible, the systems and tools used for these projects resemble those found in industry, including Oracle8i and diverse XML tools. Projects are graded on design, performance, correctness, and documentation. Examination questions are based on the models and techniques discussed in the lectures and used in the projects.

ABET Category Content:
Engineering Science: 2 units
Engineering Design: 2 units

Instructor: M. Gertz
Prepared by: M. Gertz (January 2000)

Overlap Statement:
This course has very minor overlap with the topic "distributed query processing" covered in ECS 265 (Distributed Database Systems). Otherwise, there is no overlap with existing courses.