Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-02-11
2004-06-01
Robinson, Greta (Department: 2177)
active
06745204
ABSTRACT:
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by any one of the patent disclosure, as it appears in the Patent and Trademark files or records, but otherwise reserves all copyright rights whatsoever.
REFERENCE TO A COMPUTER PROGRAM LISTING APPENDIX
The present application includes and incorporates by reference a computer sequence listing appendix on a single compact disc and its duplicate. Each compact disc was created on Oct. 23, 2003 and includes the files Appendix A.doc (96 kB) and Appendix B.doc (160 kB).
FIELD OF THE INVENTION
The invention relates to a system, methods and products for managing, finding, and/or displaying biomolecular interactions.
BACKGROUND OF THE INVENTION
Technological advances and mounting interest have pushed proteomics into the scientific spotlight. This growing field encompasses the study of proteins, both in structure and in function, contained in a proteome, the protein equivalent of a genome. Because of increased interest and technique automation (Mendelsohn et al., 1999), the rate of proteomic data production is growing much as that of genomic data did a decade ago. For example, mass spectrometers, gene chips, and two-hybrid systems have made cellular signaling pathway mapping faster and easier, and consequently these are becoming large producers of data. Protein-protein interaction and more general biomolecule-biomolecule (protein-DNA, protein-RNA, protein-small molecule, etc.) interaction information is being generated and recorded in the literature. Lessons from the genomic era have taught us that large amounts of related data recorded in scientific journals soon become unmanageable. A well-designed common data specification based on a model of the biological information is therefore required to describe and store biomolecular interaction data.
SUMMARY OF THE INVENTION
The present inventors have designed a data specification for the storage and management of biomolecular interaction and biochemical pathway data that possesses the following properties:
1. It describes the full complexity of the biological data, from simple binary interactions to large-scale molecular complexes and networks of pathways and interactions. It stores protein, DNA, RNA, and other molecules in full atomic detail, since character-based sequence abstractions of biomolecules often miss important chemical features, such as methylation on DNA. This allows as much data as possible to be stored for scientific use in electronic form rather than in print.
2. It is easily computable. A computer can easily read, write, and traverse the specification. This facilitates maintenance of a database of such information, creation of advanced queries and querying tools and development of computer programs that use the information for data visualization, data mining, and visual data entry.
3. It is platform and database independent. Tools written for one platform can directly read data created on another platform, and the data structure itself can likewise be handled without modification.
4. It is succinct and easy for humans to understand. Field-to-data correspondence is very clear and a human-readable format of the specification is available.
The data structure was designed for a database referred to herein as “BIND” (Biomolecular Interaction Network Database). The data structure is written in a data specification language called Abstract Syntax Notation One (ASN.1, also known as X.208 or ISO-8824). The U.S. National Center for Biotechnology Information (NCBI) uses ASN.1 to describe and store all of its biological and publication data, including all of GenBank, MMDB and PubMed (Ostell and Kans, 1998). BIND inherits the NCBI data model, which provides a solid foundation for the BIND data specification through the use of mature NCBI data types that describe sequence, 3D structure, and publication reference information.
Although the specification is written in ASN.1, it is not restricted to this syntax. The data structures can be readily translated to other common data specification languages such as CORBA IDL (Object Management Group, 1996) or XML if the need arises. Aside from ASN.1, no other biological data specification is sufficiently rich in mature data types to use as a foundation for BIND without first building and testing those base data types.
The BIND data specification represents complex cellular pathway information efficiently in a computer. BIND defines three main data types: interactions, molecular complexes, and pathways. Each of these objects is composed of various component and descriptor objects that are either defined in the specification proper or inherited from the NCBI ASN.1 data specifications. For example, an interaction record contains, among other data objects, two BIND-objects. A BIND-object describes a molecule of any type and is itself defined using simpler sub-objects. Normally, a BIND-object describing a biopolymer sequence will store a simple link to a sequence database, such as GenBank (Benson et al., 1999). If, however, the sequence is not present in a public database, it can be fully represented using an embedded NCBI-Bioseq object. The NCBI-Bioseq object is how NCBI stores all of the sequences in GenBank and is a mature data structure. BIND also inherits the NCBI taxonomy model (also used and supported by EMBL, DDBJ and Swiss-Prot) and data, via an inherited NCBI-BioSource, and is designed so that interactions can be both inter- and intra-organismal. Sequence, structure, publication, taxonomy and small molecule databases provide a strong foundation for BIND.
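The relationship among the three main data types and the BIND-object can be sketched as follows. This is a minimal Python illustration, not the ASN.1 specification itself: the class and field names, the `is_linked` helper, the placeholder accession, and the idea that complexes and pathways refer to interactions by identifier are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BindObject:
    """Hypothetical stand-in for a BIND-object: a molecule of any type."""
    name: str
    molecule_type: str                        # e.g. "protein", "DNA", "small-molecule"
    database: Optional[str] = None            # e.g. "GenBank" when a public record exists
    accession: Optional[str] = None           # identifier within that database
    embedded_sequence: Optional[str] = None   # stand-in for an embedded NCBI-Bioseq

    def is_linked(self) -> bool:
        """True when the molecule is described by a link to a sequence database."""
        return self.database is not None and self.accession is not None


@dataclass
class Interaction:
    """An interaction record holds, among other data, two BIND-objects."""
    iid: int
    a: BindObject
    b: BindObject


@dataclass
class MolecularComplex:
    """Assumption: a complex lists the interactions it is built from."""
    cid: int
    interaction_ids: List[int] = field(default_factory=list)


@dataclass
class Pathway:
    """Assumption: a pathway lists the interactions it is built from."""
    pid: int
    interaction_ids: List[int] = field(default_factory=list)


# Usage: one molecule linked to a public database, one embedded directly.
# "EXAMPLE0001" is a made-up placeholder, not a real accession.
kinase = BindObject("kinase A", "protein", database="GenBank", accession="EXAMPLE0001")
novel = BindObject("novel peptide", "protein", embedded_sequence="MKT")
interaction = Interaction(iid=1, a=kinase, b=novel)
pathway = Pathway(pid=1, interaction_ids=[interaction.iid])
```

Keeping the database link and the embedded sequence as alternative optional fields mirrors the choice described above: a simple link is the normal case, and the full sequence is carried only when no public record exists.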
Broadly stated, the present invention contemplates a system for electronically managing, finding, and/or visualizing biomolecular interactions comprising a computer system including at least one computer receiving data on biomolecular interactions from a plurality of providers and processing such data to create and maintain images and/or text defining biomolecular interactions, said computer system, in response to data requests, creating and transmitting to a plurality of end-users, the images and/or text defining biomolecular interactions.
In an embodiment, a system for electronically managing, finding, and/or visualizing biomolecular interactions is provided comprising:
(a) a maintenance entity for receiving data on biomolecular interactions from a plurality of providers and means for receiving and processing such data to create and maintain images and/or text defining biomolecular interactions; and
(b) one or more computer systems maintained by the maintenance entity and having means for creating and transmitting to a plurality of end-users the images and/or text defining biomolecular interactions.
The system is useful in managing, finding, and/or displaying biomolecular interactions including interactions involving proteins, nucleic acids (RNA, DNA), and ligands, molecular complexes, and signaling pathways. The interactions are defined both at the molecular and atomic levels and in particular they may be defined by chemical graphs.
The invention also provides a method for displaying on a computer screen information concerning biomolecular interactions comprising retrieving an image and/or text defining a biomolecular interaction from a system of the invention.
The present invention also provides a data structure stored in the memory of a computer, the data structure having a plurality of records, each record containing a biomolecular interaction and information relating to the biomolecular interaction. In an embodiment, the biomolecular interaction is identified by chemical graphs. The information in the data structure may be accessible by using indices, which may represent selections of information from the chemical graphs.
The term “record” used herein generally refers to a row in a database table. Each record contains one or more fields or attributes. A given record may be uniquely specified by one or a combination of fields or attributes known as the record's primary key.
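The notions of a record, a primary key formed from one or more fields, and index-based access can be illustrated with a small in-memory table. This is only a sketch under stated assumptions: the `InteractionTable` class, its methods, and the field names are hypothetical, and the index here is a plain field-value lookup rather than one derived from chemical graphs.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]


class InteractionTable:
    """Hypothetical table of interaction records keyed by a primary key."""

    def __init__(self, key_fields: Tuple[str, ...]) -> None:
        self.key_fields = key_fields                      # fields forming the primary key
        self.rows: Dict[Tuple[str, ...], Record] = {}     # primary key -> record
        self.index: Dict[Tuple[str, str], Set[Tuple[str, ...]]] = {}

    def insert(self, record: Record) -> Tuple[str, ...]:
        """Store a record; reject it if its primary key is already present."""
        key = tuple(record[f] for f in self.key_fields)
        if key in self.rows:
            raise KeyError(f"duplicate primary key: {key}")
        self.rows[key] = record
        # Index every (field, value) pair so records can be found by attribute.
        for name, value in record.items():
            self.index.setdefault((name, value), set()).add(key)
        return key

    def find(self, name: str, value: str) -> List[Record]:
        """Return all records whose field `name` equals `value`."""
        keys = self.index.get((name, value), set())
        return [self.rows[k] for k in sorted(keys)]


# Usage: the pair of molecule names acts as a composite primary key.
table = InteractionTable(key_fields=("molecule_a", "molecule_b"))
table.insert({"molecule_a": "A", "molecule_b": "B", "kind": "protein-protein"})
table.insert({"molecule_a": "A", "molecule_b": "C", "kind": "protein-DNA"})
hits = table.find("molecule_a", "A")
```

A composite key is used here only to show that a primary key may combine several fields; a single identifier field would work the same way.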
Bader Gary
Hogue Christopher
Dodds, Jr. Harold E.
Merchant & Gould P.C.
Mount Sinai Hospital
Robinson Greta