By James Golden

Nov. 12, 2002 | Bioinformatics as a business, not to be confused with bioinformatics as a field of study, is at an interesting crossroads. As an academic branch of learning, bioinformatics remains mostly what it always was — a cross-disciplinary endeavor between computer science and molecular biology. But bioinformatics as a money-making proposition has different criteria for success, and it has received a lot of bad press lately, some of it deserved. If you're in the business of developing and selling informatics platforms and tools, your company may have been the focus of some of that press.

The parent business, genomics, hasn't fared much better. Journalists frequently ask me to comment on "the failure of genomics," wanting to know why, two years after the completion of the Human Genome Project, we have yet to see a bonanza of new drugs and a boom in biotechnology stock valuations. The focus for commercial bioinformatics in the "post-genome" decade needs to move forward. Turning a speculative enterprise into a mature and profitable (if limited) industry requires a true understanding of the drug discovery process and the role that computational technology can play.

For the past decade, the focus of most drug discovery companies has been genomics: sequencing genomes, identifying the protein-coding genes, and evaluating gene function and transcription. More recent efforts have focused on understanding proteins, particularly their shapes and interactions. Throughout, bioinformatics has been a core technology in gathering and interpreting genomic data to elucidate the nature of genomes.

Developments in computer hardware and software have facilitated the assembly and analysis of whole genomes. Predictive models have enabled the identification of coding sequences, the identification and evolutionary comparison of genes, and, in many instances, functional assignment. High-performance computers and new software have shed light on the structures of these gene products. New concepts, such as Beowulf clusters, Hidden Markov Models, field-programmable gate arrays, and greedy algorithms, became part of the everyday biology lexicon. New companies, bolstered by funding from VC houses and investment banks, marketed and sold these technologies, while aspiring entrepreneurs assembled databases that could be licensed to pharmaceutical companies for millions of dollars.

In many respects, these efforts were highly successful. By 2001, we had a working draft of the human genome with more than 30,000 genes, thousands of which are potential drug targets. In the process, the line between commercial and academic bioinformatics became blurred, and like the Human Genome Project itself, academic and commercial entities created competing systems and software, all with the goal of sequencing and annotating the human genome.

During this golden age, bioinformaticists developed software that computational biologists could use to make biological discoveries based on genomic data. But the industry swerved off course by selling expensive systems that focused on the individual pieces of a solution, without heeding downstream processes that were the actual bread-and-butter of our customers. Bioinformatics has always been about integrating data and converting it into information. When it loses that focus, it loses its value to the customer.

Genomics Is Not Drug Discovery

Although drugs were being discovered and marketed long before the advent of genomics, the process to date has been largely hit or miss. Genomics has provided the first real opportunity for systematic target-based drug discovery — developing a drug for a specific genetic or proteomic target. What pharmaceutical companies are looking for now is a "pharmaceutically tractable genome" — how many of those more than 30,000 genes really are associated with a disease state, such that when mutated, they cause a cell to deviate from its normal development pathway? In essence, how many of those genes can we do something about? Biopharma companies want to develop screens for those druggable targets against which they can test their collections of therapeutic compounds. They also want to know how those compounds vary in efficacy and toxicity in different patients.

So far, bioinformatics has been an enabling technology for target discovery, but it has not been consistently applied to drug discovery. What is the role of bioinformatics in moving from targets to compounds to clinical trials, and how do we build a business around that form of discovery?

The answer requires a serious look at the business processes of our potential customers and an understanding of where our technologies meet key needs. To bring a drug to market requires years of coordinated research across a staggering array of disciplines. A clear win for our customers would be any product that can reliably reduce time or cost in bringing a therapeutic compound to market. A key barrier to success is a lack of integration and communication.

We have to develop a good understanding of the key pieces of target-based drug discovery and then understand how to integrate those pieces into a coherent whole. The process can be divided into four distinct parts: target discovery and validation; lead discovery and optimization; pharmacogenomics and clinical trials; and creating a bioinformatics infrastructure for bringing it all together. To understand our biopharmaceutical clients' business, we need to ask and answer questions in these four areas.

Pieces and Parts

The first objective of target discovery is to determine which genes are available for the bioinformatician to work with. This question has both biological and intellectual property implications. In moving from discovering a gene to advancing that gene as a compound target, we need to know who owns what properties of that gene. If we have freedom to operate, we need to know if that gene is involved in a disease that is relevant for the business. If a gene appears to be upregulated only in cancer cell lines, whereas the company's interest is treating cardiac disease, it may not make sense to develop that gene as a target. Once we have a qualified target, can we do anything about it? Can we modify that target and see some sort of phenotypic change in a cell?
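The questions above amount to an ordered triage, and a short sketch may make the sequence concrete. What follows is an illustration only, not a description of any vendor's or company's actual system: the record fields, the `triage` function, the decision labels, and the placeholder gene names are all hypothetical, and a real pipeline would draw each answer from IP searches, expression data, and functional assays rather than from hand-set flags.

```python
from dataclasses import dataclass


@dataclass
class CandidateTarget:
    """Hypothetical record for one candidate gene; every field is illustrative."""
    gene: str
    freedom_to_operate: bool        # no blocking IP claims were found
    disease_areas: frozenset        # disease areas in which the gene appears involved
    phenotype_on_modulation: bool   # modifying the target changed a cell phenotype


def triage(target, business_areas):
    """Ask the questions in order and return a (decision, reason) pair."""
    if not target.freedom_to_operate:
        return ("drop", "IP position unclear or blocked")
    if not (target.disease_areas & business_areas):
        return ("drop", "disease not relevant to the business")
    if not target.phenotype_on_modulation:
        return ("hold", "no phenotypic change seen yet")
    return ("advance", "qualified and tractable target")


if __name__ == "__main__":
    # Assumed example: a company whose interest is cardiac disease.
    focus = frozenset({"cardiac"})
    candidates = [
        CandidateTarget("GENE_A", True, frozenset({"cancer"}), True),
        CandidateTarget("GENE_B", True, frozenset({"cardiac"}), True),
        CandidateTarget("GENE_C", False, frozenset({"cardiac"}), True),
    ]
    for candidate in candidates:
        decision, reason = triage(candidate, focus)
        print(candidate.gene, decision, reason)
```

The ordering is a design choice in this sketch: the cheaper business and IP questions are asked before the one that requires wet-lab work, so a gene upregulated only in cancer cell lines is set aside by a cardiac-focused company before anyone spends assay time on it.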

The identification of druggable targets opens up new opportunities for compound screening and optimization. If we have a validated target, can we predict a structure that decreases the number of screens we need to identify an inhibiting compound? By reducing the number of assays we need to perform, we lower costs and increase throughput. What kinds of de novo and threading prediction algorithms are available, and how well do they work? How good are the models? With target structure and large compound libraries, can we use rational design techniques to create derivative compounds for more optimal screening? Can we combine the target and lead discovery processes to find new targets and optimized compounds at the same time?

Once a compound has been optimized and is ready to advance to clinical trials, we should have a set of biomarkers that can be used for monitoring safety and efficacy. Biomarkers are favored by the FDA and provide more complete information for its decision-making. Biomarkers come in a number of flavors, including genetic markers, RNA markers, protein markers, and metabolic intermediates. SNP identification is probably the most obvious technology for identifying genetic biomarkers. The bioinformatics techniques that allow us to create a druggable genome can also be used to create an ADME-T (absorption, distribution, metabolism, excretion, toxicity) genome. By identifying the key genes involved in the body's ability to process drugs, we can identify where our experimental compound might cause unintended consequences. These pharmacogenomic and pharmacogenetic techniques can be used throughout the target and lead discovery phases.

Tying It All Together

Each piece of the puzzle listed above represents an opportunity for informatics. The scope and value of each proposition may differ, but each step in the drug discovery process can benefit from automation and optimization. Some of these steps, such as fragment assembly and homology comparison, are well-served by currently available tools. Other processes, such as the management, storage, and comparison of large chemical libraries, benefit from multiple solutions focused on differing therapeutic areas. The business opportunity is in knowing where your solution fits into the complete process. An investment (or company creation) opportunity lies in mapping out the complete process and discovering which needs are underserved. Competitive intelligence is critical: know the strengths and weaknesses of both your competitors and your potential clients in all areas of the drug discovery process.

But pieces and parts are not where the key opportunities exist. Through all these stages of the drug discovery process, the sine qua non of bioinformatics is knowledge management, integration, and dissemination. The bioinformatics infrastructure is used to prioritize the genome, identify IP issues, elucidate the role of the target in a disease that is a key business of the company, and clearly link to the data coming from functional assays and pharmacogenomic experiments. This information has to be easily accessible to everyone in the research organization and the connections have to be clear. The ideal set of bioinformatics tools should fit together to act as an ERP (enterprise resource planning) and CRM (customer relationship management) system. In point of fact, there is no such thing as an off-the-shelf enterprise solution. Every pharmaceutical company uses some variant of the target-to-compound-to-trial process, but each company's mission is sufficiently different to negate a one-size-fits-all solution. The key to selling bioinformatics products is in making sure your company's solution clearly fits an established need and integrates seamlessly with other products already in place.

The business of bioinformatics has suffered over the past few years. But because bioinformatics is the most critical technology for unifying the disparate pieces of the drug discovery process, numerous opportunities remain for commercial success. In many respects, bioinformatics is drug discovery: at its best, it unites genomics-based targets with rationally designed screens and optimized compounds. It can be used to design a clinical trial and bring the patient data back into the discovery process, completing the circle of gene-to-drug-to-patient, and back to gene again, ever refining our knowledge of systems biology and the nature of disease.

James Golden is manager of business development at 454 Corp., a subsidiary of CuraGen Corp., where, until recently, he was the director of bioinformatics. He can be reached at