Proteomics is getting its act together. A new international project announced last week will gather resources on the structures of proteins into a comprehensive, publicly accessible database that will help scientists interpret the flurry of data pouring out of the Human Genome Project and other sequencing efforts.
The National Human Genome Research Institute (NHGRI), along with five other centers and institutes at the National Institutes of Health, announced on 23 October that it will fund the Unified Protein Database, or UniProt for short. NIH will contribute $15 million to run the project over the next 3 years.
Until now, the most widely used protein database has been the Switzerland-based SWISS-PROT, says NHGRI program director Peter Good. But the time-intensive cataloging of detailed protein structural data, coupled with funding shortages, has left SWISS-PROT unable to keep up with the furious pace of progress in genomics, Good says. The situation led to the creation of TrEMBL, a computer-annotated offshoot of SWISS-PROT intended as temporary storage for its growing backlog. Meanwhile, the U.S.-based Protein Information Resource (PIR) worked independently to compile its own database.
Now the leaders of the three projects will team up to merge their databases into one. Joining forces will cut down on "the duplication of work so that money can be spent more wisely," says Rolf Apweiler of the European Bioinformatics Institute in Cambridge, United Kingdom, who currently leads SWISS-PROT. UniProt will incorporate the best of SWISS-PROT, TrEMBL, and PIR, Apweiler says. He expects the database to be online by the end of 2003 at www.uniprot.org.
"It comes at the right time," says structural biologist Jia-huai Wang of the Dana-Farber Cancer Institute in Boston. Sequencers have generated information on thousands of proteins, but structures are known for only a small fraction of them. "With a comprehensive and reliable database, scientists can make more accurate predictions" of a protein's structure and function based on information about related proteins, Wang says.
A centralized database is "critical," concurs Samir Hanash, an oncologist at the University of Michigan, Ann Arbor, and president of the Human Proteome Organisation. However, UniProt is just the beginning, Hanash cautions. Additional databases will be needed to store information about when and where proteins act in the body, he says.