Data on the Web is not only unstructured and of diverse types, but also often comes without metadata that retrieval schemes could use. This complicates querying in particular for (distributed) scientific data sources that contain huge amounts of image, text, and raw experiment data and exhibit no database-like schema structure. Over the past several years, we have developed and extended a data annotation model that allows users to associate semantically rich metadata with (remote) Web data at different levels of granularity: whole documents or fragments of documents (regions of interest). The metadata schemes underlying annotations are based on conceptual structures such as ontologies and standard vocabularies, which ensures that only well-defined metadata can be associated with data. In combination with data annotations, conceptual structures allow users to query heterogeneous data sources in a uniform and integrated fashion at an abstract, conceptual level. Since the initial proposal of conceptualized data annotations, we have made several contributions in the area of metadata management and data integration.
Integrating Scientific Data
With faculty from the Department of Computer Science at UC Davis and
the Center for Neuroscience at UC Davis, we are working on the
development of architectures and models for integrating, managing, and
querying heterogeneous forms of Neuroscience data in collaborative
research environments. The integration approach utilizes a so-called
annotation graph model, which is based on representing and querying
graph structures, and turns out to be extremely useful in presenting,
managing and querying metadata schemes, data annotations, and
Web-accessible documents in a uniform and transparent manner. While
some work on ontologies and metadata simply focuses on associating
concepts with data, our approach also checks that metadata (schemes)
are used consistently in data annotations.
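
To make the idea of an annotation graph more concrete, the following is a minimal sketch in Python, not the project's actual model. It assumes a tiny "is-a" concept hierarchy standing in for an ontology, and it reduces the consistency check to one rule: an annotation may only use a concept that the metadata scheme defines. The class name AnnotationGraph, its methods, the concept names, and the example.org URLs are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class AnnotationGraph:
    """Hypothetical sketch: concepts, documents, and the annotation edges linking them."""

    # Concept name -> parent concept (None for a root); a stand-in for an ontology.
    concepts: Dict[str, Optional[str]] = field(default_factory=dict)
    # Annotation edges: (concept, document URL, optional region of interest).
    annotations: List[Tuple[str, str, Optional[str]]] = field(default_factory=list)

    def add_concept(self, name: str, parent: Optional[str] = None) -> None:
        # A parent must already be part of the metadata scheme.
        if parent is not None and parent not in self.concepts:
            raise ValueError(f"unknown parent concept: {parent}")
        self.concepts[name] = parent

    def annotate(self, concept: str, url: str, region: Optional[str] = None) -> None:
        # Simplified consistency check: reject metadata not defined in the scheme.
        if concept not in self.concepts:
            raise ValueError(f"concept not defined in metadata scheme: {concept}")
        self.annotations.append((concept, url, region))

    def _is_a(self, concept: Optional[str], ancestor: str) -> bool:
        # Walk up the hierarchy until the ancestor or a root is reached.
        while concept is not None:
            if concept == ancestor:
                return True
            concept = self.concepts[concept]
        return False

    def query(self, concept: str) -> List[Tuple[str, Optional[str]]]:
        # Conceptual query: return data annotated with the concept or any sub-concept.
        return [(url, region) for c, url, region in self.annotations
                if self._is_a(c, concept)]


g = AnnotationGraph()
g.add_concept("BrainRegion")
g.add_concept("Thalamus", parent="BrainRegion")

# Annotate a fragment (region of interest) of an image and a whole text document.
g.annotate("Thalamus", "http://example.org/atlas/slide42.png", region="x=10,y=20,w=64,h=64")
g.annotate("BrainRegion", "http://example.org/notes/exp7.txt")

results = g.query("BrainRegion")
```

In this sketch, a query for the general concept also returns data annotated with the more specific one, which is one simple way to read "querying at a conceptual level"; a real system would need a richer scheme and more than a membership test for consistency.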
Personnel:
Michael Gertz (Computer Science)
Jan-Marco Bremer (Ph.D. student, Computer Science)
Cheryl Kang (M.S. student, Computer Science)
Mike Hogarth (School of Medicine and Graduate Group of Medical Informatics)
Fredric Gorin (Center for Neuroscience)
Funding:
Human Brain Project: "Informatics of Human and Monkey Brain Atlases" (PI Edward G. Jones, Center for Neuroscience), at a level of about $7,000,000 for 5 years
Publications: