In a provocative paper published this week, researchers say they have figured out a way to link a person's DNA to their anonymous genetic data in a certain kind of public research database. But the National Institutes of Health (NIH), which hosts one of the largest such databases, says it's not taking any new steps to prevent someone from using the method to breach privacy. That contrasts with NIH's response 4 years ago, when a similar study prompted the agency to pull genetic data from its public Web sites.
The issue then involved studies that compare DNA variants called single nucleotide polymorphisms (SNPs) in people with and without an illness to find disease risk markers. NIH had begun posting pooled SNP results from hundreds of people online, thinking privacy would not be breached. But then scientists reported in PLoS Genetics that if they had a sample of an individual's DNA, they could tell whether that person's SNP results were part of a public DNA pool. NIH and the Wellcome Trust removed the data from public sites; NIH now allows only approved researchers to download pooled data from SNP disease studies.
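To make the earlier finding concrete, here is a minimal Python sketch of the kind of test that can flag whether one person's DNA is in a pool. It is a simplified distance comparison, not the published study's code. It assumes genotypes coded as allele frequencies (0, 0.5 or 1), uses simulated data only, and the helper names (`membership_score`, `simulate_genotype`, `pool_frequencies`) are hypothetical. The idea is that a pool's average allele frequencies shift slightly toward each contributor, so when summed over thousands of SNPs, a contributor's genotype lands measurably closer to the pool than to the general population.

```python
import random

# Toy illustration only: simulated genotypes, not real study data.


def membership_score(person, pool, reference):
    """Sum over SNPs of |person - reference| - |person - pool|.

    All three arguments are lists of allele frequencies (one per SNP).
    A clearly positive total means the person's genotype sits closer to
    the pooled frequencies than to the reference population, which hints
    that the person's DNA is part of the pool.
    """
    return sum(abs(y - r) - abs(y - m)
               for y, m, r in zip(person, pool, reference))


def simulate_genotype(freqs, rng):
    """Draw two alleles per SNP; code the genotype as 0, 0.5 or 1."""
    return [(int(rng.random() < p) + int(rng.random() < p)) / 2 for p in freqs]


def pool_frequencies(people):
    """Average allele frequency per SNP across everyone in the pool."""
    return [sum(snp) / len(people) for snp in zip(*people)]


rng = random.Random(42)
n_snps, pool_size = 5000, 20
# Hypothetical population allele frequencies, used here as the reference.
reference = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
participants = [simulate_genotype(reference, rng) for _ in range(pool_size)]
pool = pool_frequencies(participants)
outsider = simulate_genotype(reference, rng)  # never added to the pool

member_score = membership_score(participants[0], pool, reference)
outsider_score = membership_score(outsider, pool, reference)
print(f"participant: {member_score:.1f}  outsider: {outsider_score:.1f}")
```

In this toy setup the participant's score comes out well above the outsider's. Real pools are larger, which weakens each person's pull on the average, but real genotyping arrays also measure far more SNPs, which is what lets the small per-SNP signal add up.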
Such access barriers are less common for a different type of genetic data: measures of gene activity derived by analyzing RNA levels in a tissue sample. Because this gene expression data wasn't thought to be traceable to an individual, researchers have routinely deposited RNA results in public databases. One example is NIH's Gene Expression Omnibus (GEO) database, which holds nearly 1000 datasets from gene expression studies of human tissue. Anyone can look up data for individuals who participated in, say, a study on breast cancer or childhood obesity.
Now it seems that this RNA data can be linked to a person's DNA after all. Eric Schadt and colleagues at Mount Sinai School of Medicine in New York City reported this week in Nature Genetics that they have developed a technique for generating a personal SNP profile, or DNA "barcode," for an individual from their gene expression results. This means that, in principle, someone who had a DNA sample from a participant in a study whose results are stored in GEO could generate a SNP barcode from it, match it to a GEO sample, and look at that participant's biological data.
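The final matching step can be sketched in a few lines of Python. This is a hypothetical illustration, not the Mount Sinai team's code. It assumes each GEO sample has already been converted into a predicted SNP barcode (0, 1 or 2 copies of an allele per SNP, `None` where no call was made) and simply picks the sample whose barcode agrees most often with a genotype read directly from someone's DNA. The published method's real work, inferring genotypes from expression levels, is not modeled here; the function names, sample IDs, and simulated error and missing rates are all arbitrary assumptions.

```python
import random

# Hypothetical setup: each expression sample has already been turned into a
# predicted genotype "barcode" (0, 1 or 2 copies of an allele per SNP, or None
# when no call could be made). Producing those predictions is the hard part of
# the published method and is NOT modeled here.


def concordance(barcode_a, barcode_b):
    """Fraction of SNPs called in both barcodes on which they agree."""
    pairs = [(a, b) for a, b in zip(barcode_a, barcode_b)
             if a is not None and b is not None]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def best_match(dna_barcode, expression_barcodes):
    """Return (sample_id, score) for the most concordant expression sample."""
    return max(((sample_id, concordance(dna_barcode, barcode))
                for sample_id, barcode in expression_barcodes.items()),
               key=lambda item: item[1])


def noisy_prediction(genotypes, rng, error_rate=0.3, missing_rate=0.1):
    """Simulate imperfect genotype calls inferred from expression data."""
    calls = []
    for g in genotypes:
        r = rng.random()
        if r < missing_rate:
            calls.append(None)  # no call for this SNP
        elif r < missing_rate + error_rate:
            calls.append(rng.choice([0, 1, 2]))  # random, possibly wrong, call
        else:
            calls.append(g)  # correct call
    return calls


rng = random.Random(7)
n_snps, n_samples = 300, 50
freqs = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
true_genotypes = {
    f"sample_{i}": [int(rng.random() < p) + int(rng.random() < p) for p in freqs]
    for i in range(n_samples)
}
# A public database holding only expression-derived (noisy) barcodes.
database = {sid: noisy_prediction(g, rng) for sid, g in true_genotypes.items()}

# Someone who obtains sample_17's DNA genotypes it directly...
dna_barcode = true_genotypes["sample_17"]
# ...and scans the database for the closest barcode.
sample_id, score = best_match(dna_barcode, database)
print(sample_id, round(score, 2))
```

Even with many of the expression-derived calls wrong or missing, the true participant stands out in this simulation, because unrelated people agree on only a minority of SNPs by chance.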