The field of functional genomics, which studies the activities of the genome and levels of gene expression rather than particular gene mutations, generally relies on aggregating information from many samples for its statistical power. Broad sharing of raw data is therefore vital; however, sharing is currently difficult because of privacy concerns for the individuals in those datasets, which leaves the data largely inaccessible behind firewalls. In a study publishing November 12 in the journal Cell, a team of investigators demonstrates that it’s possible to de-identify those data to protect patient privacy. They also demonstrate how, if these sanitization measures are not put in place, the raw data could be linked back to specific individuals through their gene variants by something as simple as an abandoned coffee cup.
“The purpose of this study is to come up with practical ways to broadly share the raw data without creating undue privacy concerns,” says senior author Mark Gerstein, a professor of bioinformatics at Yale University.
Functional genomics research is frequently tied to a specific disease. For example, an investigation into a particular psychiatric condition might look at the expression of certain genes in a type of neuron. And because their genetic material is included in such a study, an individual’s medical status with regard to that condition could inadvertently be revealed.
This can happen through what’s known as a quasi-identifier. If someone has enough individual data points about you, even if those data are not sensitive or unique on their own, they can be combined to create an identifier that is unique to you. In a non-genetic setting, this means if someone has your zip code, birthday, the model of car you drive, and other