Sept. 18, 2006 | Everyone who works on systems biology seems to have his or her own definition. Here is a good basic definition from Wikipedia: “Systems biology is an academic field that seeks to integrate different levels of information to understand how biological systems function. By studying the relationships and interactions between various parts of a biological system...it is hoped that eventually an understandable model of the whole system can be developed.”
Systems biology is fueled by large-scale, global data sets such as those from arrays. The traditional methods of molecular biology gag on these large data sets because the field’s culture demands that every study tell a neat, mechanistic story.
A typical array experiment reveals 5,000 to 10,000 active genes. With traditional methods, these are filtered down to a list of a thousand or so genes whose expression changes (although no one ever explains why the only interesting genes are the ones that change; as any fan of Sherlock Holmes knows, the dog that doesn’t bark is sometimes the important clue!). The list is further pruned to a small number of known genes from known pathways. Inevitably, most of these pathways have been previously implicated, which gives the work its credibility, but one or two have not, which confers novelty and sizzle. The thousands of genes that don’t fit the story are simply ignored.
Systems biology tries to use more of the data by thinking globally and moving beyond the known pathways. This requires sophisticated mathematical and computational methods for analyzing the data to find interesting patterns that are not closely linked to known biology. There’s a tendency to focus on the math and computing, since this is the new stuff, and to conclude that systems biology is focused on theory. That conclusion is wrong.
Systems biology is squarely an experimental field that eats, drinks, and breathes data. To do systems biology, you need an experimental system that is amenable to large-scale experimentation. Ideally, you want to perturb your system in numerous ways (e.g., treat your system with several drugs at different dosages) and generate data using multiple complementary methods, for example, expression arrays to measure gene expression, ChIP-chip to get data on transcription factor binding, mass-spec proteomics to assess protein abundance, and single-cell microscopy to track protein localization. Time course data is especially valuable, as this lets you watch the system as it responds to each perturbation. You end up with large amounts of diverse data that grow further as you add data from external public or proprietary databases.
This is not for the data-phobic or those trying to get by on an R01 budget.
Challenges and Changes
Some of the computational challenges are obvious. You need good laboratory informatics to manage the experimental procedures and collect the data. For familiar data types, such as arrays, you need the usual software tools to analyze those data types individually. But you may also face new data types, such as protein-protein interaction data, which is widely used in systems biology.
The major new challenges arise from the need to integrate so much large-scale data. Typical large-scale data sets suffer from high error rates. You should not accept any single data point at face value. To draw valid inferences from such data, you have to jointly analyze many data points from different sources. Combining data in this way is a central theme of systems biology.
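To make the idea of combining data concrete, here is a minimal sketch of one simple approach, the "noisy-OR" rule, which is my illustration rather than a method prescribed by any particular package. It assumes each data source reports an independent confidence (between 0 and 1) that a given interaction is real. The gene names and confidence values are made up, and the function name `combine_evidence` is hypothetical.

```python
from math import prod  # available in Python 3.8+


def combine_evidence(confidences):
    """Combine per-source confidences with the noisy-OR rule.

    Assumes the sources are independent: the interaction is judged
    spurious only if every source is wrong, so the combined confidence
    is 1 - product(1 - p_i).
    """
    for p in confidences:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"confidence out of range: {p}")
    return 1.0 - prod(1.0 - p for p in confidences)


# Hypothetical observations: (gene pair) -> confidences from different sources.
observations = {
    ("geneA", "geneB"): [0.3, 0.4, 0.5],  # three weak, noisy sources
    ("geneA", "geneC"): [0.6],            # one moderately strong source
}

combined = {pair: combine_evidence(ps) for pair, ps in observations.items()}

for pair, score in sorted(combined.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

In this toy case, three weak observations that agree outrank a single stronger one, which is the point of joint analysis. Real data sets rarely satisfy the independence assumption, so treat this as a starting point, not a recipe.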
In a series of online articles at www.PharmaDD.com, I will review products, both commercial and academic, that play a role in systems biology and the other key fields supporting translational medicine. The first review, accompanying the Web version of this column, describes the software used at my home institution, the Institute for Systems Biology (ISB). This includes academic packages such as Cytoscape, Gaggle, SBEAMS, and GDxBase, and commercial software from Ingenuity. Do send along, for our consideration, the names of products you’d like us to review.
Systems biology is the next stop on biology’s long postgenomic journey. It remains to be seen whether it’ll be a good place to hang for a while or just a rest stop on the side of the highway. Either way, there’s lots to see and do. So join me online!
Guest columnist Nat Goodman is a regular columnist for Bio-IT World’s sister publication, Pharma DD, where this column first appeared in the September/October 2006 issue. Respond to Goodman's column at: www.goodman.pharmadd.com.