Can Big Genomic Data Reveal the Fundamental Units of the Brain?

January 20, 2016

By Aaron Krol

An adult mouse’s brain, an object not much bigger than the last joint of your pinky finger, contains around 75 million neurons. At the Allen Institute for Brain Science in Seattle, the Mouse Cell Types program, led by Hongkui Zeng, is trying to figure out just how many varieties of neurons make up this vast complex, and what makes each one unique.

Zeng’s research focuses on the primary visual cortex, a tiny sliver of the brain where signals from the eyes are processed and interpreted. Because vision is a relatively well-defined process, it’s thought to be a good model for connecting the behavior of individual neurons to larger brain functions.

“You really can’t understand a system until you understand its parts,” says Bosiljka Tasic, a founding member of the Mouse Cell Types program.

To a shocking extent, those parts are still a mystery. Many supposed cell types are based on little more than what you can see through a microscope: a neuron’s shape, or the pattern of rootlike dendrites extending from its body. These morphological traits, though important, are hard to see in full, and even harder to track methodically across thousands or millions of cells.

This month, Zeng’s team published a study in Nature Neuroscience that takes advantage of new technological developments to get a fine-grained look at the molecular toolkits of single neurons. Using newly refined methods to isolate single cells, Zeng’s lab collected over 1,600 brain cells from the visual cortexes of adult mice, intact and in good shape for sequencing. With advances in highly parallel, unbiased RNA sequencing, the group was able to measure each cell’s entire “transcriptome”―the array of RNA molecules that indicate which genes are actively producing proteins―at a depth that reveals even the scarcest RNA traces.

“We think this is probably the most comprehensive survey of a cortical area,” says Tasic, who co-led the study with her colleague Vilas Menon. “Many studies that are coming out now do very shallow sequencing… We wanted to go deeper.” With a median of 8.7 million sequencing reads per cell, the authors discovered a wealth of new RNA markers that define discrete groups of neurons. Some of these markers suggest that known cell types in the brain can be split into smaller sub-categories. A few even stake out rare types of neurons that may be new to science.

Yet the data collected for this study also confirms that the brain’s biology is neither tidy nor easy to unravel.

“There is this obsession in the field, and in many other areas of biology, that people always want cleanliness and discreteness,” Tasic says. Instead, her efforts to classify neurons have shown that “types” can be slippery, and many cells straddle the line between closely related groups. As projects like this one seek to redefine cell types for the genomics age, scientists will have to face these ambiguities and consider what they can tell us about the nature of the brain.

Patterns within Patterns

Whole transcriptomes provide an impressive amount of data with which to organize cells, but that data is hard to interpret in an unbiased way. “We’re trying, in some sense, to solve two problems simultaneously,” says Vilas Menon, co-lead author of the paper. “We’re trying to cluster the genes, and also to cluster the cells.”

To disentangle these problems, the team performed an iterative analysis. First, their software looked for RNA markers that diverged most widely between different cells, using those markers to sort all the cells in the study into large clusters. Then, they wiped the slate clean, looking for brand-new markers within each cluster to split the cells step by step into smaller groups. The smallest possible divisions, in which no new RNA markers could strongly distinguish cells from one another, became the group’s proposed “cell types.”
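The loop described above can be sketched in a few lines of Python. This is a toy illustration only, not the study's pipeline: the article does not name the two computational methods the team used, so a simple "widest gap in one gene" rule stands in for them. The function names (`best_split`, `iterative_clusters`), the thresholds (`min_gap`, `min_size`), and the expression values are all hypothetical.

```python
def best_split(cells, expr, min_size):
    """Find the gene and cut point with the widest expression gap.

    Returns (gap, gene_index, low_cells, high_cells), or None when the
    group is too small to leave min_size cells on each side of a cut.
    """
    best = None
    n_genes = len(expr[cells[0]])
    for g in range(n_genes):
        ordered = sorted(cells, key=lambda c: expr[c][g])
        # Only consider cuts that leave at least min_size cells per side.
        for i in range(min_size, len(ordered) - min_size + 1):
            gap = expr[ordered[i]][g] - expr[ordered[i - 1]][g]
            if best is None or gap > best[0]:
                best = (gap, g, ordered[:i], ordered[i:])
    return best


def iterative_clusters(cells, expr, min_gap=2.0, min_size=2):
    """Split cells recursively, picking a fresh marker gene inside each group.

    A group becomes a proposed "type" (a leaf) once no gene separates
    its cells by at least min_gap.
    """
    split = best_split(cells, expr, min_size)
    if split is None or split[0] < min_gap:
        return [sorted(cells)]
    _, _, low, high = split
    return (iterative_clusters(low, expr, min_gap, min_size)
            + iterative_clusters(high, expr, min_gap, min_size))


# Hypothetical expression levels for three genes in eight cells.
expr = {
    "a1": [0.1, 0.2, 1.0],
    "a2": [0.3, 0.1, 1.2],
    "a3": [0.2, 0.4, 0.9],
    "a4": [0.0, 0.3, 1.1],
    "b1": [10.0, 0.2, 1.0],
    "b2": [10.2, 0.1, 1.1],
    "c1": [9.9, 5.0, 1.2],
    "c2": [10.1, 5.2, 0.9],
}

types = iterative_clusters(list(expr), expr)
print(types)
```

In this toy data the first gene separates the "a" cells from everything else, and the second gene only becomes informative once the search is restarted inside the remaining group, where it divides the "b" cells from the "c" cells. That restart is the point of the iterative design: a marker that is drowned out across the whole dataset can still define a split within one cluster.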

The researchers used two different computational methods to define clusters, but both revealed the same basic hierarchy of types. “In general, the higher level splits correspond to what’s already known for these broad classes of neurons,” says Menon. For instance, the first split simply divided all the neurons in their data from a handful of other cell types present in the brain, like the glial cells that support the brain’s physical structure. The second split separated GABAergic cells, which mostly damp down chemical signals in the brain, from glutamatergic cells, which mostly spark and amplify signals.

Beyond this point, the patterns became more revealing. Within the glutamatergic cells, for example, later clustering tended to split neurons according to how deeply they were embedded in the cortex. A mouse’s primary visual cortex is organized in six layers, and the Allen Institute’s transcriptome data suggests that the neurons in each layer may be closely related to one another, or have similar functions that require the same genes to be activated. Yet the GABAergic cells did not split out so naturally by layer, implying that their development may follow very different rules.

At the narrowest levels of clustering, the genes that defined cell types sometimes came as complete surprises. Within a group of GABAergic neurons known for producing high levels of the hormone somatostatin, the authors found a subtype of cells expressing an additional gene called Chodl. “Nobody has ever heard of this marker Chodl,” says Tasic. “But it’s the most beautiful pattern you’ve ever seen, because it’s only in that cell type. This is the beauty of transcriptomics.”

With luck, genes like Chodl will provide new clues to the roles of specific cell types. If no other neurons make use of this gene, it’s reasonable to think it may have a very specialized function. But even if that’s not the case, highly unique markers like Chodl are invaluable for studying neurons more closely, letting scientists design new molecular and genetic tools to target single cell types for follow-up research.

“I see this as a first step in allowing us to selectively manipulate cell types,” says Tasic. “And then you can do all sorts of things to those cells. You can label them specifically, and study their morphology. You can perturb them. You can inactivate them. I think this will be the way to truly understand what these different cells do.”

Mountains and Ridges

“Technically, this is a very impressive achievement,” says Joshua Sanes, a neurobiologist at the Harvard Center for Brain Science. “It’s using a really nice combination of state-of-the-art methods to address what, to me, is a big problem in neurobiology.”

Like the researchers at the Allen Institute, Sanes is interested in the problem of defining cell types. (Both his group and Hongkui Zeng’s receive funding from the national BRAIN Initiative, which has provided grants for big data-gathering projects to attack this question.) It’s a vexing issue, both because it requires such an immense amount of data to address, and because biology again and again rejects easy categories.


To Sanes, one of the most interesting aspects of Tasic and Menon’s paper is their decision to point out neurons with traits of more than one cell type. Unlike other groups that may exclude ambiguous data from analysis, the Allen Institute accepted cells with “intermediate” transcriptomes as important findings of their study. In some cases―most notably, a class of glutamatergic neurons in layer four of the cortex―these intermediate cells are so abundant that two or more supposedly separate “types” almost seem to merge together.

“That could mean that, although some cells are in types, there’s a certain amount of slipperiness,” says Sanes. “It’s been pretty hard to define neurons in a way that will help research move forward.”

It’s possible that some classes of neurons don’t exist in discrete types at all, but include a spectrum of cells expressing different mixes of the same genes. Or transcriptomes may just not be the best way to define cell types―because neurons of the same type change their RNA arsenals depending on their stage of development, or the chemical signals they’re responding to.

“Some parts of the overall phenotypic landscape may have features of a continuum,” says Tasic, but that doesn’t mean that her group’s proposed cell types are not useful ways of thinking about neurobiology. “If there are two mountains that are connected by a ridge, there are still two mountains. The fact that you have a ridge is fine. Maybe that’s biology.”

From Rosetta Stones to Searchable Databases

Tasic, Menon, and their colleagues identified 49 cell types altogether, but the number is less important than the process that produced it. Almost certainly, there are still new cell types to discover, and perhaps further divisions within the types the Allen Institute has identified.

“I think it’s extremely unlikely they’ve gotten all the types,” says Sanes. “It’s terrific, but it’s not like you should think of this as a complete catalogue.” To isolate single neurons, the Allen Institute used a method called FACS, which relies on sampling many different strains of transgenic mice to collect both abundant and rare cell types. The authors agree that this approach leaves open the possibility that some rare types were not sampled, and future studies will use different methods of capturing single cells, adding yet more data to the mix. (At his lab, Sanes is working with a new method called Drop-seq, which the Allen Institute also plans to adopt.)

For work like this to be meaningful, it’s not necessary for the Allen Institute to come up with a complete encyclopedia of cell types on its own. What is essential is that the data be made easily available to neuroscientists everywhere, to compare with their own studies and gradually refine through new data gathering.

Today, this is far from assured. A lot of research on cell types is only available through journal articles, and there are few standards for formatting data so it can be shared and understood across institutions. This is apparent in some of the detective work that Zeng’s team did to see if their proposed cell types matched any previously identified types. Tasic, Menon, and colleagues trawled through the scientific literature looking for what they called “Rosetta stones,” unique molecular features that could clearly be seen in their own transcriptome data.

In the future, this work could be made almost automatic, especially as objective data types like RNA sequencing information become more common. Just a few weeks ago, many of the first recipients of BRAIN Initiative grants―including both Zeng and Sanes―met in Bethesda, Md., to discuss plans for sharing neurobiological data, and ways to make that data more uniform and searchable.

“I think the BRAIN Initiative has been helpful in drawing attention and funding,” says Sanes. “The NIH is doing everything it can to ensure data sharing, and I think the community is going along with that well.”

In the meantime, Zeng’s group has released their raw transcriptome data to GEO, an NIH-supported database of RNA information, and made an annotated version of their data available online on the Allen Institute website. Tasic and Menon hope that outside researchers will use these resources to design more detailed studies of specific neuron types. Neuroscience is still in the earliest stages of data gathering, but to truly understand the brain, scientists will eventually have to make the leap into exploring function, cell type by cell type.

“We can find genes that are differentially expressed at the level of the whole brain, but we really don’t know what these genes do,” Tasic says. “Once you see that this gene is expressed in a specific type, you can formulate a hypothesis.”