Nucleic Acids Research 2005 33(Database issue):D5-D24;
doi:10.1093/nar/gki139
© 2005, the authors
Nucleic Acids Research, Vol. 33, Database issue © Oxford University Press 2005; all rights reserved
The Molecular Biology Database Collection: 2005 update
Michael Y. Galperin*
National Center for Biotechnology Information, National Library of Medicine,
National Institutes of Health, Bethesda, MD 20894, USA
* Tel: +1 301 435 5910; Fax: +1 301 435 7794; Email: galperin{at}ncbi.nlm.nih.gov
Received October 30, 2004; Revised and Accepted November 1, 2004
ABSTRACT
The Nucleic Acids Research Molecular Biology Database Collection
is a public online resource that lists the databases described
in this and previous issues of Nucleic Acids Research together
with other databases of value to the biologist and available
throughout the world. All databases included in this Collection
are freely available to the public. The 2005 update includes
719 databases, 171 more than the 2004 one. The databases are
organized in a hierarchical classification that simplifies the
process of finding the right database for any given task. The
growing number of databases related to immunology, plant and
organelle research has been accommodated by separating them
into three new categories. The database summaries provide brief
descriptions of the databases, contact details, appropriate
references and acknowledgements. The online summaries also serve
as a venue for the maintainers of each database to introduce
database updates and other improvements in the scope and tools.
These updates are particularly important for those databases
that have not been described in print in the recent past. The
database list and summaries are available online at the Nucleic
Acids Research web site, http://nar.oupjournals.org/.
COMMENTARY
In its 12th annual database issue, Nucleic Acids Research presents
135 new and recently updated molecular biology databases. The
current release of the Nucleic Acids Research online Molecular
Biology Database Collection (Table 1) includes 719 databases, an
increase of 171 over last year (1).
The database geography also continues to expand. This year we have
the first databases from Brazil, Cuba, Estonia, Greece, Hungary (2,3),
Malaysia, Taiwan (4,5) and Turkey. The database authors have again shown
remarkable creativity in naming their databases: last year's ORFanage [the
database of orphan ORFs (6)] has been joined by H-ANGEL [Human ANatomic Gene
Expression Library (7)], PROPHECY [PROfiling of PHEnotypic Characteristics in
Yeast (8)], PANDIT [Protein and Associated Nucleotide Domains with Inferred
Trees (9)], SIEGE [Smoking-Induced Epithelial Gene Expression (10)]
and other aptly named databases. The database list is divided
into 14 major categories, 3 more than last year. One of them,
the category for immunology-related databases, was created in
response to the rapid growth in databases dedicated to
immuno-polymorphisms, certainly an offshoot of the Human Genome
Project. The proliferation of plant-related databases, sparked by the
completion of the first two plant genomes (Arabidopsis
thaliana and Oryza sativa) and steady progress in
sequencing other plants, prompted elevation of their status from a
subcategory to a separate category. One more category, organelle
databases, was created to provide a single home for the databases on
chloroplasts and mitochondria from various sources. As always, we
hope that these database listings, organized into a hierarchical
structure, will help introduce the community of biologists to the
enormous body of data accumulated by their colleagues and simplify
the process of finding the appropriate database for each particular
task.
Certainly, this listing is far from exhaustive. To be included, databases had to
be publicly available to any user and allow direct browsing of the
data without downloading any special software that might interfere
with institutional firewalls. This means leaving out several
potentially interesting database projects.
Of the 548 databases featured in last year's compilation, 17 have
been dropped from the list because they have been discontinued,
merged into larger ones or, like the well-known Kabat database,
converted to commercial access. The previous year saw a loss of
13 databases from a total of 386 in the 2003 release. These numbers
and the history of Swiss-Prot (http://www.expasy.org/announce/)
and the GDB Human Genome Database (http://www.gdb.org/gdb/aboutGDB.html)
show that the databases that offer useful content usually manage
to survive, even if they have to change their funding scheme or
migrate from one host institution to another. This means that the
open database movement is here to stay, and more and more people in
the community (as well as in the financing bodies) now appreciate the
importance of open databases in spreading knowledge. It is worth
noting that the majority of database authors and curators receive
little or no remuneration for their efforts and that it is still
difficult to obtain money for creating and maintaining a biological
database. However, disk space is relatively cheap these days and
database maintenance tools are fairly straightforward, so that a
decent database can be created on a shoestring budget, often by a
graduate student or as a result of a postdoctoral project. Many
databases in this compilation originated just that way—as collections
of data on a certain research topic that a particular lab was
studying anyway, formatted in a user-friendly way by graduate or even
undergraduate students as part of their dissertations or course work.
Subsequent maintenance and further development of these databases,
however, require a commitment that can only be applauded. For
scientists from China, France, Japan, Russia and many other
countries, making their databases available to the worldwide
community also means maintaining them in English, the lingua
franca of science, which does not always come easily. Such
efforts deserve special appreciation.
Speaking of appreciation, those who maintain databases often do
not get much credit for their work either. Other than publication in
the Database Issue of Nucleic Acids Research or in
Bioinformatics, or an occasional publication in some other
journal, there is currently no straightforward way to announce
progress. The online summaries published by the database maintainers
on the NAR web site partially fill this void. Since the entry number
assigned to each database (Table 1) will be stable, these updates can be
cited in just the same way as any other online resource. At this time, I
would suggest the following format for citing these summaries: ‘The ooTFD
database (11) is listed with Accession No. 185 in the NAR Molecular Biology
Database compilation (1); see the recent summary at
http://www3.oup.co.uk/nar/database/summary/185’.
Suggestions for a better format are certainly welcome. Suggestions
for the inclusion of additional databases in this Collection,
as well as for improvements to the category structure, are also
encouraged and should be directed to the author at galperin{at}ncbi.nlm.nih.gov.
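The suggested citation format can be assembled mechanically from a database's name and its accession number. The following is a minimal Python sketch, assuming that the summary URL pattern seen in the ooTFD example holds for every entry in the Collection; the function name `cite_summary` and its parameters are hypothetical and are not part of any NAR tool.

```python
# Assumption: every summary lives at this URL pattern, as in the ooTFD example.
SUMMARY_URL = "http://www3.oup.co.uk/nar/database/summary/{accession}"


def cite_summary(name, db_ref, accession, collection_ref=1):
    """Build a citation sentence in the format suggested in the text.

    name           -- database name, e.g. "ooTFD"
    db_ref         -- reference number of the paper describing the database
    accession      -- accession number of the database in the Collection
    collection_ref -- reference number of the Collection paper itself
    """
    if accession <= 0:
        raise ValueError("accession number must be a positive integer")
    url = SUMMARY_URL.format(accession=accession)
    return (
        f"The {name} database ({db_ref}) is listed with Accession No. "
        f"{accession} in the NAR Molecular Biology Database compilation "
        f"({collection_ref}); see the recent summary at {url}"
    )


if __name__ == "__main__":
    print(cite_summary("ooTFD", 11, 185))
```

Because the accession number is meant to be stable, it is the only piece of the URL that varies in this sketch; everything else is fixed text taken from the suggested format.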
ACKNOWLEDGEMENTS
I thank Rich Roberts, Alex Bateman and my colleagues at NCBI for support and
helpful advice, Alice Ellingham and Gill Smith for logistical
support, and Claire Saxby, Amanda Titmas and Kate Welsby at Oxford
University Press for their patience in handling this
compilation.
Notes
The online version of this article has been published under an open
access model. Users are entitled to use, reproduce, disseminate, or
display the open access version of this article for non-commercial
purposes provided that: the original authorship is properly and fully
attributed; the Journal and Oxford University Press are attributed as
the original place of publication with the correct citation details
given; if an article is subsequently reproduced or disseminated not
in its entirety but only in part or as a derivative work this must be
clearly indicated. For commercial re-use permissions, please contact
journals.permissions{at}oupjournals.org.
REFERENCES
1. Galperin,M.Y. (2004) The Molecular Biology Database Collection: 2004
update. Nucleic Acids Res., 32, D3–D22.
2. Barta,E., Sebestyén,E., Pálfy,T.B., Tóth,G., Ortutay,C.P. and Patthy,L.
(2005) DoOP: Databases of Orthologous Promoters, collections of clusters of
orthologous upstream sequences from chordates and plants. Nucleic Acids
Res., 33, D86–D90.
3. Tusnády,G.E., Dosztányi,Z. and Simon,I. (2005) PDB_TM: selection and
membrane localization of transmembrane proteins in the Protein Data Bank.
Nucleic Acids Res., 33, D275–D278.
4. Huang,H.-D., Horng,J.-T., Lin,F.-M., Chang,Y.-C. and Huang,C.-C. (2005)
SpliceInfo: an information repository for the modes of mRNA alternative
splicing in human genome. Nucleic Acids Res., 33, D80–D85.
5. Chang,Y.-H., Su,W.-H., Lee,T.-C., Sun,H.-F.S., Chen,C.-H., Pan,W.-H.,
Tsai,S.-F. and Jou,Y.-S. (2005) TPMD: a database and resources of
microsatellite marker genotyped in Taiwanese populations. Nucleic Acids
Res., 33, D174–D177.
6. Siew,N., Azaria,Y. and Fischer,D. (2004) The ORFanage: an ORFan database.
Nucleic Acids Res., 32, D281–D283.
7. Tanino,M., Debily,M.-A., Tamura,T., Hishiki,T., Ogasawara,O., Murakawa,K.,
Kawamoto,S., Itoh,K., Watanabe,S., José de Souza,S., Imbeaud,S., Graudens,E.,
Eveno,E., Hilton,P., Sudo,Y., Kelso,J., Ikeo,K., Imanishi,T., Gojobori,T.,
Auffray,C., Hide,W. and Okubo,K. (2005) The Human Anatomic Gene Expression
Library (H-ANGEL), the H-Inv integrative display of human gene expression
across disparate technologies and platforms. Nucleic Acids Res., 33,
D567–D572.
8. Fernandez-Ricaud,L., Warringer,J., Ericson,E., Pylvänäinen,I., Kemp,G.,
Nerman,O. and Blomberg,A. (2005) PROPHECY – a database for high-resolution
phenomics. Nucleic Acids Res., 33, D369–D373.
9. Whelan,S., de Bakker,P.I. and Goldman,N. (2003) Pandit: a database of
protein and associated nucleotide domains with inferred trees.
Bioinformatics, 19, 1556–1563.
10. Shah,V., Sridhar,S., Beane,J., Brody,J.S. and Spira,A. (2005) SIEGE:
Smoking Induced Epithelial Gene Expression Database. Nucleic Acids Res.,
33, D573–D579.
11. Ghosh,D. (2000) Object-oriented transcription factors database (ooTFD).
Nucleic Acids Res., 28, 308–310.