By applying unsupervised and automated machine learning techniques to the analysis of millions of cancer cells, Rebecca Ihrie and Jonathan Irish, both associate professors of cell and developmental biology, have identified new cancer cell types in brain tumors. Machine learning refers to computer algorithms that can identify patterns within enormous quantities of data and get ‘smarter’ with more experience. The finding promises to help researchers better understand and target these cell types in research and therapeutics for glioblastoma, an aggressive brain tumor with high mortality, and it points to the broader applicability of machine learning to cancer research.
With their collaborators, Ihrie and Irish developed Risk Assessment Population IDentification (RAPID), an open-source machine learning algorithm that revealed coordinated patterns of protein expression and modification associated with survival outcomes.
The article, “Unsupervised machine learning reveals risk stratifying glioblastoma tumor cells,” was published online in the journal eLife on June 23. RAPID code and examples are available on the cytolab GitHub page.
For the past decade, the research community has been working to leverage machine learning’s ability to absorb and analyze more data for cancer cell research than the human mind alone can process. “Without any human oversight, RAPID combed through 2 million tumor cells—with at least 4,710 glioblastoma cells from each patient—from 28 glioblastomas, flagging the most unusual cells and patterns for us to look into,” said Ihrie. “We’re able to find the needles in the haystack without searching the entire haystack. This technology lets us devote our attention to better understanding the most dangerous cancer cells and to get closer to ultimately curing brain cancer.”
Fed into RAPID were data on cellular proteins that govern the identity and function of neural stem cells and other brain cells. The data type used is called single-cell mass