The Molecular Biology Database Collection: 2005 update -- Galperin 33 (Supplement 1): D5 -- Nucleic Acids Research

The Molecular Biology Database Collection: 2005 update

Michael Y. Galperin^*

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

^* Tel: +1 301 435 5910; Fax: +1 301 435 7794; Email: galperin{at}ncbi.nlm.nih.gov

Received October 30, 2004; Revised and Accepted November 1, 2004

ABSTRACT

The Nucleic Acids Research Molecular Biology Database Collection is a public online resource that lists the databases described in this and previous issues of Nucleic Acids Research together with other databases of value to the biologist and available throughout the world. All databases included in this Collection are freely available to the public. The 2005 update includes 719 databases, 171 more than the 2004 one. The databases are organized in a hierarchical classification that simplifies the process of finding the right database for any given task. The growing number of databases related to immunology, plant and organelle research have been accommodated by separating them into three new categories. The database summaries provide brief descriptions of the databases, contact details, appropriate references and acknowledgements. The online summaries also serve as a venue for the maintainers of each database to introduce database updates and other improvements in the scope and tools. These updates are particularly important for those databases that have not been described in print in the recent past. The database list and summaries are available online at the Nucleic Acids Research web site, http://nar.oupjournals.org/.

COMMENTARY

In its 12th annual database issue, Nucleic Acids Research presents 135 new and recently updated molecular biology databases. The current release of the Nucleic Acids Research online Molecular Biology Database Collection (Table 1) includes 719 databases, an increase of 171 over last year (1). The database geography also continues to expand. This year we have the first databases from Brazil, Cuba, Estonia, Greece, Hungary (2,3), Malaysia, Taiwan (4,5) and Turkey. The database authors have again shown remarkable creativity in naming their databases: last year's ORFanage [the database of orphan ORFs (6)] has been joined by H-ANGEL [Human ANatomic Gene Expression Library (7)], PROPHECY [PROfiling of PHEnotypic Characteristics in Yeast (8)], PANDIT [Protein and Associated Nucleotide Domains with Inferred Trees (9)], SIEGE [Smoking-Induced Epithelial Gene Expression (10)] and other aptly named databases. The database list is divided into 14 major categories, 3 more than last year. One of them, the category for immunology-related databases, was created in response to the rapid growth in databases dedicated to immuno-polymorphisms, certainly an offshoot of the Human Genome Project. The proliferation of plant-related databases, sparked by the completion of the first two plant genomes (Arabidopsis thaliana and Oryza sativa) and steady progress in sequencing other plants, prompted elevation of their status from a subcategory to a separate category. One more category, organelle databases, was created to provide a single home for the databases on chloroplasts and mitochondria from various sources. As always, we hope that these database listings, organized into a hierarchical structure, will help introduce the community of biologists to the enormous body of data accumulated by their colleagues and simplify the process of finding the appropriate database for each particular task.

Table 1. Molecular Biology Database Collection^a

Certainly, this listing is far from exhaustive. To be included, databases had to be publicly available to any user and allow direct browsing of the data without downloading any special software that might interfere with institutional firewalls. This means leaving out several potentially interesting database projects.

Of the 548 databases featured in last year's compilation, 17 have been dropped from the list because they have been discontinued, merged into larger ones or, like the well-known Kabat database, converted to commercial access. The previous year saw a loss of 13 databases from a total of 386 in the 2003 release. These numbers and the history of Swiss-Prot (http://www.expasy.org/announce/) and the GDB Human Genome Database (http://www.gdb.org/gdb/aboutGDB.html) show that the databases that offer useful content usually manage to survive, even if they have to change their funding scheme or migrate from one host institution to another. This means that the open database movement is here to stay, and more and more people in the community (as well as in the financing bodies) now appreciate the importance of open databases in spreading knowledge. It is worth noting that the majority of database authors and curators receive little or no remuneration for their efforts and that it is still difficult to obtain money for creating and maintaining a biological database. However, disk space is relatively cheap these days and database maintenance tools are fairly straightforward, so that a decent database can be created on a shoestring budget, often by a graduate student or as a result of a postdoctoral project. Many databases in this compilation originated just that way: as collections of data on a certain research topic that a particular lab was studying anyway, formatted in a user-friendly way by graduate or even undergraduate students as part of their dissertations or course work. Subsequent maintenance and further development of these databases, however, require a commitment that can only be applauded. For scientists from China, France, Japan, Russia and many other countries, making their databases available to the worldwide community also means maintaining them in English, the lingua franca of science, which does not always come easily. Such efforts deserve special appreciation.
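The attrition figures quoted above (17 of 548 databases dropped this year, 13 of 386 the year before) can be turned into rates with a few lines of arithmetic. The sketch below is only an illustration of that calculation; the function names are hypothetical, and the "gross additions" figure is derived from the counts given in the text rather than stated by the author.

```python
def attrition_rate(dropped: int, total: int) -> float:
    """Fraction of a release's databases that were dropped by the next release."""
    if total <= 0:
        raise ValueError("total must be positive")
    return dropped / total


def gross_additions(current_total: int, previous_total: int, dropped: int) -> int:
    """Databases newly added, given both release sizes and the number dropped."""
    return current_total - (previous_total - dropped)


# Counts taken from the text: 2004 release -> 2005 release, and 2003 -> 2004.
rate_2004 = attrition_rate(dropped=17, total=548)
rate_2003 = attrition_rate(dropped=13, total=386)
added_2005 = gross_additions(current_total=719, previous_total=548, dropped=17)

print(f"Dropped from the 2004 release: {rate_2004:.1%}")
print(f"Dropped from the 2003 release: {rate_2003:.1%}")
print(f"Databases newly listed in 2005: {added_2005}")
```

Both rates come out at a little over 3%, which is consistent with the observation that useful databases usually survive.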

Speaking of appreciation, those who maintain databases often do not get much credit for their work either. Other than publication in the Database Issue of Nucleic Acids Research or in Bioinformatics, or an occasional publication in some other journal, there is currently no straightforward way to announce progress. The online summaries published by the database maintainers on the NAR web site partially fill this void. Since the entry number assigned to each database (Table 1) will be stable, these updates can be cited in just the same way as any other online resource. At this time, I would suggest the following format for citing these summaries: ‘The ooTFD database (11) is listed with Accession No. 185 in the NAR Molecular Biology Database compilation (1); see the recent summary at http://www3.oup.co.uk/nar/database/summary/185’. Suggestions for a better format are certainly welcome. Suggestions for the inclusion of additional databases in this Collection, as well as for improvements to the category structure, are also encouraged and should be directed to the author at galperin{at}ncbi.nlm.nih.gov .
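The suggested citation format lends itself to being generated from a database's name, reference number and accession number. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the URL pattern is generalized from the single ooTFD example above, so it is assumed (not confirmed) that other accession numbers resolve the same way.

```python
# Assumption: the summary URL for any entry follows the pattern of the one
# example given in the text (accession number appended to a fixed prefix).
SUMMARY_URL_TEMPLATE = "http://www3.oup.co.uk/nar/database/summary/{accession}"


def summary_url(accession: int) -> str:
    """Build the assumed summary URL for a database accession number."""
    if accession <= 0:
        raise ValueError("accession number must be positive")
    return SUMMARY_URL_TEMPLATE.format(accession=accession)


def format_citation(name: str, db_ref: int, accession: int,
                    compilation_ref: int = 1) -> str:
    """Render a citation in the format suggested in the text.

    db_ref and compilation_ref are the reference-list numbers of the
    database paper and of the compilation paper, respectively.
    """
    return (
        f"The {name} database ({db_ref}) is listed with "
        f"Accession No. {accession} in the NAR Molecular Biology Database "
        f"compilation ({compilation_ref}); see the recent summary at "
        f"{summary_url(accession)}"
    )


# Reproduces the ooTFD example from the text.
print(format_citation("ooTFD", db_ref=11, accession=185))
```

Because the accession number is the only variable part of the URL, the stability of entry numbers is what keeps such citations valid.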

ACKNOWLEDGEMENTS

I thank Rich Roberts, Alex Bateman and my colleagues at NCBI for support and helpful advice, Alice Ellingham and Gill Smith for logistical support, and Claire Saxby, Amanda Titmas and Kate Welsby at Oxford University Press for their patience in handling this compilation.

Notes

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use permissions, please contact journals.permissions{at}oupjournals.org .

REFERENCES

1. Galperin,M.Y. (2004) The Molecular Biology Database Collection: 2004 update. Nucleic Acids Res., 32, D3–D22.
2. Barta,E., Sebestyén,E., Pálfy,T.B., Tóth,G., Ortutay,C.P. and Patthy,L. (2005) DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants. Nucleic Acids Res., 33, D86–D90.
3. Tusnády,G.E., Dosztányi,Z. and Simon,I. (2005) PDB_TM: selection and membrane localization of transmembrane proteins in the Protein Data Bank. Nucleic Acids Res., 33, D275–D278.
4. Huang,H.-D., Horng,J.-T., Lin,F.-M., Chang,Y.-C. and Huang,C.-C. (2005) SpliceInfo: an information repository for the modes of mRNA alternative splicing in human genome. Nucleic Acids Res., 33, D80–D85.
5. Chang,Y.-H., Su,W.-H., Lee,T.-C., Sun,H.-F.S., Chen,C.-H., Pan,W.-H., Tsai,S.-F. and Jou,Y.-S. (2005) TPMD: a database and resources of microsatellite marker genotyped in Taiwanese populations. Nucleic Acids Res., 33, D174–D177.
6. Siew,N., Azaria,Y. and Fischer,D. (2004) The ORFanage: an ORFan database. Nucleic Acids Res., 32, D281–D283.
7. Tanino,M., Debily,M.-A., Tamura,T., Hishiki,T., Ogasawara,O., Murakawa,K., Kawamoto,S., Itoh,K., Watanabe,S., José de Souza,S., Imbeaud,S., Graudens,E., Eveno,E., Hilton,P., Sudo,Y., Kelso,J., Ikeo,K., Imanishi,T., Gojobori,T., Auffray,C., Hide,W. and Okubo,K. (2005) The Human Anatomic Gene Expression Library (H-ANGEL), the H-Inv integrative display of human gene expression across disparate technologies and platforms. Nucleic Acids Res., 33, D567–D572.
8. Fernandez-Ricaud,L., Warringer,J., Ericson,E., Pylvänäinen,I., Kemp,G., Nerman,O. and Blomberg,A. (2005) PROPHECY – a database for high-resolution phenomics. Nucleic Acids Res., 33, D369–D373.
9. Whelan,S., de Bakker,P.I. and Goldman,N. (2003) Pandit: a database of protein and associated nucleotide domains with inferred trees. Bioinformatics, 19, 1556–1563.
10. Shah,V., Sridhar,S., Beane,J., Brody,J.S. and Spira,A. (2005) SIEGE: Smoking Induced Epithelial Gene Expression Database. Nucleic Acids Res., 33, D573–D579.
11. Ghosh,D. (2000) Object-oriented transcription factors database (ooTFD). Nucleic Acids Res., 28, 308–310.