Nucleic Acids Research 2005 33(Database issue):D5-D24; doi:10.1093/nar/gki139
© 2005, the authors
Nucleic Acids Research, Vol. 33, Database issue © Oxford University Press 2005; all rights reserved

The Molecular Biology Database Collection: 2005 update

Michael Y. Galperin*

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

* Tel: +1 301 435 5910; Fax: +1 301 435 7794; Email: galperin{at}

Received October 30, 2004; Revised and Accepted November 1, 2004

The Nucleic Acids Research Molecular Biology Database Collection is a public online resource that lists the databases described in this and previous issues of Nucleic Acids Research together with other databases of value to the biologist and available throughout the world. All databases included in this Collection are freely available to the public. The 2005 update includes 719 databases, 171 more than the 2004 one. The databases are organized in a hierarchical classification that simplifies the process of finding the right database for any given task. The growing number of databases related to immunology, plant and organelle research has been accommodated by separating them into three new categories. The database summaries provide brief descriptions of the databases, contact details, appropriate references and acknowledgements. The online summaries also serve as a venue for the maintainers of each database to introduce database updates and other improvements in the scope and tools. These updates are particularly important for those databases that have not been described in print in the recent past. The database list and summaries are available online at the Nucleic Acids Research web site.

In its 12th annual database issue, Nucleic Acids Research presents 135 new and recently updated molecular biology databases. The current release of the Nucleic Acids Research online Molecular Biology Database Collection (Table 1) includes 719 databases, an increase of 171 over last year (1). The database geography also continues to expand. This year we have the first databases from Brazil, Cuba, Estonia, Greece, Hungary (2,3), Malaysia, Taiwan (4,5) and Turkey. The database authors have again shown remarkable creativity in naming their databases: last year's ORFanage [the database of orphan ORFs (6)] has been joined by H-ANGEL [Human ANatomic Gene Expression Library (7)], PROPHECY [PROfiling of PHEnotypic Characteristics in Yeast (8)], PANDIT [Protein and Associated Nucleotide Domains with Inferred Trees (9)], SIEGE [Smoking-Induced Epithelial Gene Expression (10)] and other aptly named databases. The database list is divided into 14 major categories, 3 more than last year. One of them, the category for immunology-related databases, was created in response to the rapid growth in databases dedicated to immuno-polymorphisms, certainly an offshoot of the Human Genome Project. The proliferation of plant-related databases, sparked by the completion of the first two plant genomes (Arabidopsis thaliana and Oryza sativa) and steady progress in sequencing other plants, prompted elevation of their status from a subcategory to a separate category. One more category, organelle databases, was created to provide a single home for the databases on chloroplasts and mitochondria from various sources. As always, we hope that these database listings, organized into a hierarchical structure, will help introduce the community of biologists to the enormous body of data accumulated by their colleagues and simplify the process of finding the appropriate database for each particular task.

Table 1. Molecular Biology Database Collection

Certainly, this listing is far from exhaustive. To be included, databases had to be publicly available to any user and allow direct browsing of the data without downloading any special software that might interfere with institutional firewalls. This means leaving out several potentially interesting database projects.

Of the 548 databases featured in last year's compilation, 17 have been dropped from the list because they have been discontinued, merged into larger ones or, like the well-known Kabat database, converted to commercial access. The previous year saw a loss of 13 databases from a total of 386 in the 2003 release. These numbers and the history of Swiss-Prot and the GDB Human Genome Database show that the databases that offer useful content usually manage to survive, even if they have to change their funding scheme or migrate from one host institution to another. This means that the open database movement is here to stay, and more and more people in the community (as well as in the financing bodies) now appreciate the importance of open databases in spreading knowledge. It is worth noting that the majority of database authors and curators receive little or no remuneration for their efforts and that it is still difficult to obtain money for creating and maintaining a biological database. However, disk space is relatively cheap these days and database maintenance tools are fairly straightforward, so that a decent database can be created on a shoestring budget, often by a graduate student or as a result of a postdoctoral project. Many databases in this compilation originated just that way, as collections of data on a certain research topic that a particular lab was studying anyway, formatted in a user-friendly way by graduate or even undergraduate students as part of their dissertations or course work. Subsequent maintenance and further development of these databases, however, require a commitment that can only be applauded. For scientists from China, France, Japan, Russia and many other countries, making their databases available to the worldwide community also means maintaining them in English, the lingua franca of science, which does not always come easily. Such efforts deserve special appreciation.
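
As a rough check on the survival figures quoted above, the arithmetic can be sketched in a few lines of Python. The counts are taken directly from the text; the function names are hypothetical, and the sketch assumes that every entry not carried over from the previous release counts as a new addition.

```python
# Counts quoted in the text: databases listed per release, and how many
# of those were dropped before the following release.
LISTED = {2003: 386, 2004: 548, 2005: 719}
DROPPED = {2003: 13, 2004: 17}


def attrition_rate(listed: int, dropped: int) -> float:
    """Fraction of one release's databases missing from the next one."""
    return dropped / listed


def newly_added(prev_listed: int, dropped: int, next_listed: int) -> int:
    """Entries in the next release that were not carried over.

    Assumption (not stated in the text): every dropped database came
    from the previous release's list, so survivors = prev - dropped.
    """
    return next_listed - (prev_listed - dropped)


if __name__ == "__main__":
    for year in sorted(DROPPED):
        rate = attrition_rate(LISTED[year], DROPPED[year])
        added = newly_added(LISTED[year], DROPPED[year], LISTED[year + 1])
        print(f"{year} -> {year + 1}: {rate:.1%} dropped, {added} added")
```

Under these assumptions, only about 3% of the listed databases disappear from one release to the next, which is consistent with the observation that useful databases usually survive.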

Speaking of appreciation, those who maintain databases often do not get much credit for their work either. Other than publication in the Database Issue of Nucleic Acids Research or in Bioinformatics, or an occasional publication in some other journal, there is currently no straightforward way to announce progress. The online summaries published by the database maintainers on the NAR web site partially fill this void. Since the entry number assigned to each database (Table 1) will be stable, these updates can be cited in just the same way as any other online resource. At this time, I would suggest the following format for citing these summaries: ‘The ooTFD database (11) is listed with Accession No. 185 in the NAR Molecular Biology Database compilation (1); see the recent summary at’. Suggestions for a better format are certainly welcome. Suggestions for the inclusion of additional databases in this Collection, as well as for improvements to the category structure, are also encouraged and should be directed to the author at galperin{at} .

I thank Rich Roberts, Alex Bateman and my colleagues at NCBI for support and helpful advice, Alice Ellingham and Gill Smith for logistical support, and Claire Saxby, Amanda Titmas and Kate Welsby at Oxford University Press for their patience in handling this compilation.

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use permissions, please contact journals.permissions{at} .


  1. Galperin,M.Y. (2004) The Molecular Biology Database Collection: 2004 update. Nucleic Acids Res., 32, D3–D22.

  2. Barta,E., Sebestyén,E., Pálfy,T.B., Tóth,G., Ortutay,C.P. and Patthy,L. (2005) DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants. Nucleic Acids Res., 33, D86–D90.

  3. Tusnády,G.E., Dosztányi,Z. and Simon,I. (2005) PDB_TM: selection and membrane localization of transmembrane proteins in the Protein Data Bank. Nucleic Acids Res., 33, D275–D278.

  4. Huang,H.-D., Horng,J.-T., Lin,F.-M., Chang,Y.-C. and Huang,C.-C. (2005) SpliceInfo: an information repository for the modes of mRNA alternative splicing in human genome. Nucleic Acids Res., 33, D80–D85.

  5. Chang,Y.-H., Su,W.-H., Lee,T.-C., Sun,H.-F.S., Chen,C.-H., Pan,W.-H., Tsai,S.-F. and Jou,Y.-S. (2005) TPMD: a database and resources of microsatellite marker genotyped in Taiwanese populations. Nucleic Acids Res., 33, D174–D177.

  6. Siew,N., Azaria,Y. and Fischer,D. (2004) The ORFanage: an ORFan database. Nucleic Acids Res., 32, D281–D283.

  7. Tanino,M., Debily,M.-A., Tamura,T., Hishiki,T., Ogasawara,O., Murakawa,K., Kawamoto,S., Itoh,K., Watanabe,S., José de Souza,S., Imbeaud,S., Graudens,E., Eveno,E., Hilton,P., Sudo,Y., Kelso,J., Ikeo,K., Imanishi,T., Gojobori,T., Auffray,C., Hide,W. and Okubo,K. (2005) The Human Anatomic Gene Expression Library (H-ANGEL), the H-Inv integrative display of human gene expression across disparate technologies and platforms. Nucleic Acids Res., 33, D567–D572.

  8. Fernandez-Ricaud,L., Warringer,J., Ericson,E., Pylvänäinen,I., Kemp,G., Nerman,O. and Blomberg,A. (2005) PROPHECY – a database for high-resolution phenomics. Nucleic Acids Res., 33, D369–D373.

  9. Whelan,S., de Bakker,P.I. and Goldman,N. (2003) Pandit: a database of protein and associated nucleotide domains with inferred trees. Bioinformatics, 19, 1556–1563.

  10. Shah,V., Sridhar,S., Beane,J., Brody,J.S. and Spira,A. (2005) SIEGE: Smoking Induced Epithelial Gene Expression Database. Nucleic Acids Res., 33, D573–D579.

  11. Ghosh,D. (2000) Object-oriented transcription factors database (ooTFD). Nucleic Acids Res., 28, 308–310.