Biological Database Modeling

Biological data such as protein structure and function, DNA sequences, and metabolic pathways require modeling constructs that traditional conceptual models, such as the widely used entity-relationship (ER) model and its variant, the enhanced-ER (EER) model, do not provide. In particular, three constructs occur frequently in bioinformatics data: ordered relationships, functional processes, and three-dimensional structures. In addition, biological data modeling spans many levels of abstraction, from the DNA/RNA level up to cells, tissues, organs, and biological systems. In this chapter, we discuss some of the concepts needed to model biological data accurately. We propose extensions to the EER model that capture some of these concepts by introducing specialized formal relationships for ordering, processes, and molecular spatial structure. These extensions would support more accurate modeling of biological structures and ontologies, and they can be used in mediator and integrative systems for biological data sources. We propose new EER schema diagram notation to represent the ordering of DNA sequences, the three-dimensional structure of proteins, and the processes of metabolic pathways. We also show how these new concepts can be implemented in relational databases. Finally, we discuss why multilevel modeling would enhance the integration of biological models at different levels of abstraction, and we present some preliminary ideas for realizing a multilevel model.
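
To make the idea of an ordered relationship concrete, the following is a minimal sketch of one way such a relationship might be stored in a relational database. It is not the mapping developed in the chapter. The table names (gene, exon, gene_exon), the position column, and the helper functions build_demo and spliced_sequence are all hypothetical and chosen only for illustration. The sketch assumes that order can be captured by an integer position that is unique within each gene.

```python
import sqlite3


def build_demo():
    """Create an in-memory database holding one gene with three ordered exons."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (
            gene_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL
        );
        CREATE TABLE exon (
            exon_id INTEGER PRIMARY KEY,
            bases   TEXT NOT NULL
        );
        -- Hypothetical mapping of an ordered relationship: the position
        -- column records where each exon falls within its gene, and the
        -- composite primary key forbids two exons at the same position.
        CREATE TABLE gene_exon (
            gene_id  INTEGER NOT NULL REFERENCES gene(gene_id),
            position INTEGER NOT NULL CHECK (position >= 1),
            exon_id  INTEGER NOT NULL REFERENCES exon(exon_id),
            PRIMARY KEY (gene_id, position)
        );
        """
    )
    conn.execute("INSERT INTO gene VALUES (1, 'demo_gene')")
    conn.executemany(
        "INSERT INTO exon VALUES (?, ?)",
        [(1, "ATG"), (2, "GCC"), (3, "TAA")],
    )
    # Rows are inserted out of order on purpose: the order lives in the
    # position column, not in the physical insertion sequence.
    conn.executemany(
        "INSERT INTO gene_exon VALUES (?, ?, ?)",
        [(1, 3, 3), (1, 1, 1), (1, 2, 2)],
    )
    return conn


def spliced_sequence(conn, gene_id):
    """Concatenate the exon bases of a gene in their stored order."""
    rows = conn.execute(
        """
        SELECT e.bases
        FROM gene_exon AS ge
        JOIN exon AS e ON e.exon_id = ge.exon_id
        WHERE ge.gene_id = ?
        ORDER BY ge.position
        """,
        (gene_id,),
    )
    return "".join(bases for (bases,) in rows)


if __name__ == "__main__":
    demo = build_demo()
    print(spliced_sequence(demo, 1))
```

A plain many-to-many table would record which exons belong to a gene but not their sequence. Here the order is explicit data, so a query has to ask for it with ORDER BY, which is the kind of bookkeeping a dedicated ordered-relationship construct in the conceptual model would make visible at the schema level.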

