Rapidly combining a cosmos of chemical and biological data, Johnson & Johnson's multidisciplinary ABCD platform for drug discovery aims to give scientists access to more information and the ability to make better decisions.
By Mark D. Uehling
June 17, 2004 | With four generic chemical scaffolds as a starting point, Dimitris K. Agrafiotis of Johnson & Johnson is using a laptop to generate more than 308 million new molecules. This takes all of seven seconds. He asks a visitor to select which one of the 308 million new molecules he would like to inspect. Number 207,301,427? There it is. "I can browse every single compound," Agrafiotis says. "My challenge would be to find another company that can do what I just did — in a month. We have tools that can browse billion-member collections interactively."
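The article does not say how the J&J tools address an individual compound in a virtual library. One common way to browse a combinatorial library without enumerating it is to treat the compound number as a mixed-radix number whose digits select one substituent per attachment site. The Python sketch below illustrates that idea only; the scaffold names, the R-group list sizes, and the `locate` function are hypothetical and do not reproduce the 308 million figure.

```python
from math import prod

# Hypothetical library: each scaffold lists how many R-groups are
# available at each of its attachment sites (invented numbers).
SCAFFOLDS = [
    ("scaffold_A", [500, 400, 300]),
    ("scaffold_B", [1000, 100, 100]),
    ("scaffold_C", [200, 200]),
    ("scaffold_D", [50, 40, 30, 20]),
]


def library_size(site_sizes):
    """Number of products for one scaffold: product of its R-group list sizes."""
    return prod(site_sizes)


def total_size(scaffolds):
    """Total number of virtual compounds across all scaffolds."""
    return sum(library_size(sizes) for _, sizes in scaffolds)


def locate(index, scaffolds):
    """Map a 0-based global compound index to (scaffold name, R-group choices).

    No compound is ever built or stored; the index is decoded arithmetically.
    """
    if index < 0:
        raise IndexError("compound index out of range")
    for name, site_sizes in scaffolds:
        size = library_size(site_sizes)
        if index < size:
            # Mixed-radix decoding: peel off one digit per attachment site,
            # starting from the last (fastest-varying) site.
            choices = []
            for n in reversed(site_sizes):
                index, digit = divmod(index, n)
                choices.append(digit)
            return name, tuple(reversed(choices))
        index -= size  # skip past this scaffold's block of compounds
    raise IndexError("compound index out of range")


if __name__ == "__main__":
    print(total_size(SCAFFOLDS))
    print(locate(60_000_000, SCAFFOLDS))
```

Because the lookup is pure arithmetic, its cost depends on the number of scaffolds and attachment sites rather than on the size of the library, which is one plausible reason an arbitrary compound can be retrieved instantly.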
Agrafiotis is the architect guiding a global team of scientists and programmers. The group is melding a pre-existing IT project at J&J with the software from its 2003 acquisition of 3-Dimensional Pharmaceuticals (3DP), a drug discovery firm that counts Agrafiotis as an early employee. Thanks to a massive integration project, J&J says it is on the threshold of overcoming a fundamental limitation in drug discovery. The dilemma: how to integrate a wealth of chemical and biological data, including late-stage information about in vivo assays, into one standardized data warehouse for 1,000 J&J drug discovery scientists worldwide.
Dimitris Agrafiotis and his team at Johnson & Johnson are uniting in silico, in vitro, and in vivo data in one massive data warehouse.
In the genomic era, evaluating millions of virtually generated molecules overwhelms the most facile mind. So does assessing combinatorial chemistry libraries of actual compounds. But going forward, J&J hopes to have fewer failed compounds by giving scientists tools to pick molecules with the best druglike properties — and the fewest side effects.
"Everyone is trying to do this," notes Sangtae Kim, director of Shared Cyberinfrastructure at the National Science Foundation. Kim helped to engineer Eli Lilly's cross-disciplinary platform for drug discovery. Without more details, Kim says, it's impossible to assess what J&J has accomplished. In the industry at large, mergers and acquisitions usually have grim consequences for scientific collaboration — and that may also affect a company with as many farflung fiefdoms as J&J. "The more decentralized the company is, the harder it is going to be," Kim says.
But outside Philadelphia, in Exton, Penn., and at other J&J research sites, the company has growing pride in the IT it has built in-house. There is also subdued optimism that it could usher in significant improvement in lead optimization. J&J says elements of its Advanced Biology and Chemistry Discovery (ABCD) platform of applications and databases are already dispensing chemical data — and will dole out biological data by 2005.
"We have developed a set of tools to decide, out of the millions of compounds we can make, which should we make," says Roger Bone, vice president of drug discovery at J&J Pharmaceutical Research & Development (PRD). "In aggregate, we will reduce the amount of time the scientists spend getting the information."
Think Excel, on Steroids, for Scientists
Bone says one of the main accomplishments could be the often-promised, seldom-delivered vision of integrating biological or genetic information with small-molecule data from chemists and pharmacologists. "That's the thing that doesn't exist in off-the-shelf offerings from commercial vendors," Bone says. "They have tools that do great things, but they don't allow you to pull analytical data from instruments."
Bulldozing the Silo
Bone says J&J wants more of the same information available across oceans, across scientific disciplines, even across separate parts of J&J. "We are breaking down the barriers — the barriers that prevent us from accessing data," he says. "At other companies, there may be cultural barriers — the absorption, distribution, metabolism, and excretion (ADME) folks may not talk to the chemists."
"This all sounds good and reasonable. Different companies have made different degrees of progress. These enterprisewide data integration plans must be more holistic. That is an even greater degree of difficulty and complexity. Very few companies have mastered it at that level."
Sangtae Kim, National Science Foundation
Part of the process of building the company's ABCD suite has been knitting together a variety of databases. That has required painstaking and political discussions with participants from all over the company. Susan Ward, a consultant and veteran of information integration projects at Wyeth, Millennium Pharmaceuticals, and Infinity Pharmaceuticals, says that if J&J has done what it says, it will be significant. "It's not a technical issue, it's a cultural issue," she says — and one of the thorniest in drug discovery. "In the early stages, the data tend to be reasonably black and white. As you get to more complex factors and more complex systems, that becomes less true. People start getting very uncomfortable having the data disassociated from their judgment of what the data mean."
Norman Huebert might agree with her. The team leader for ADME at J&J PRD, he's spent a few years at other pharmaceutical companies, and says the culture of his famously decentralized employer is genuinely different. "There is a general impression that a lot of companies are accumulating the data," he allows, "but [the data are] not necessarily being used for decision making. This is where a system such as ABCD can facilitate. It makes that information readily available not only to the team leaders, but to the individual chemists. They can actually give a situation their intellectual analysis."
Mergers and acquisitions elsewhere in the industry, Huebert notes, can result in shotgun marriages of different compound-numbering schemes, different units of measurement, and different names for identical assays. Synchronizing all that is more easily said than done. Huebert says: "I am painfully aware of what happens when you try to combine disparate sets of compound collections with different numbering systems and assays and databases."
The prevailing paradigm in drug discovery is to take a single assay, a single measurement of potency. This becomes the rationale to advance (or kill) a compound. But the ABCD platform, Huebert says, has been designed to convert units from different instruments in different parts of the world, allowing apples-to-apples comparisons between different experiments. That allows more complex decisions that take more variables into account.
Says Huebert: "One of the advantages of the ABCD software is that as long as you can define the structure of the data from the different parts, and define the protocols that produce that data, and compare the similarity or difference between protocols, you can use data that have been derived from different sources."
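The kind of harmonization Huebert describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of ABCD's internals: the record layout, the `ASSAY_ALIASES` table, the assay names, and the choice of nanomolar as the common unit are all hypothetical.

```python
# Conversion factors from common molar concentration units to nanomolar.
TO_NANOMOLAR = {"M": 1e9, "mM": 1e6, "uM": 1e3, "nM": 1.0, "pM": 1e-3}

# Hypothetical site-specific assay names mapped to one agreed global name.
ASSAY_ALIASES = {
    "cyp3a4 inhib": "CYP3A4_inhibition",
    "3a4-inhibition": "CYP3A4_inhibition",
    "cyp3a4_inhibition": "CYP3A4_inhibition",
}


def normalize(record):
    """Return a copy of one assay result with a canonical assay name
    and its value expressed in nanomolar.

    Unknown units or assay names raise an error instead of being guessed,
    so that incomparable data never slips into a comparison silently.
    """
    unit = record["unit"]
    if unit not in TO_NANOMOLAR:
        raise ValueError(f"unknown unit: {unit!r}")
    key = record["assay"].strip().lower()
    if key not in ASSAY_ALIASES:
        raise ValueError(f"unknown assay name: {record['assay']!r}")
    return {
        "compound_id": record["compound_id"],
        "assay": ASSAY_ALIASES[key],
        "value_nM": record["value"] * TO_NANOMOLAR[unit],
    }


if __name__ == "__main__":
    # Two invented results for the same compound, reported by different sites.
    site_a = {"compound_id": "CMPD-1", "assay": "CYP3A4 inhib",
              "value": 0.25, "unit": "uM"}
    site_b = {"compound_id": "CMPD-1", "assay": "3A4-inhibition",
              "value": 250, "unit": "nM"}
    print(normalize(site_a))
    print(normalize(site_b))
```

Once both records carry the same assay name and unit, they can be compared directly. Judging whether two protocols are similar enough to compare, which Huebert also mentions, is a scientific decision that a lookup table like this cannot make on its own.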
Using different sources, in turn, allows J&J to frontload insights about a drug's good or bad ADME profiles and toxicity. The company's scientists on either side of the Atlantic can consider a multitude of such properties and do so earlier in the process than competitors. "We're optimizing all the properties," Huebert says. "This paradigm goes all the way back to where we get our first hits out of a high-throughput screen."
Prioritizing Compounds, Rationally
According to J&J's Jan Hoflack, vice president of medicinal chemistry and enabling technologies, high-throughput screening is not, strictly speaking, a problem at all.
In plainer language, J&J has moved all wart-finding endeavors to the earliest possible moment. "That's the whole point of the process," Huebert says. "We're trying to highlight the warts, the liabilities. The sooner you do that, the sooner you get on the right path." Indeed, Huebert reports that considering more than one assay, and considering ADME profiles very early, is already moving a few molecules a bit faster through the J&J research pipeline. "We already have examples of different projects in-house where this optimization was taken seriously," he says. "Those projects moved much faster."
Another member of the ABCD team, principal scientist Victor Lobanov, echoes that theme. "There is a lot of [commercial] software out there," he says. "What is unique to our approach is that we are integrating cheminformatic knowledge into the data-access/data-mining application. That is what will take us into an area where nobody has gone before."
Like several of his colleagues, Lobanov eventually comes around to Excel, the ubiquitous Microsoft spreadsheet, which approximates a standard way to handle lists of molecules. But in the industrial age of drug discovery, Excel does not allow scientists to easily share data — or to mine them.
Suppose the results of a 100,000-compound high-throughput screening campaign are available. "If you put all the data and mine them in one application, you will find yourselves limited," Lobanov says. "This is a limitation of Excel. We are creating a chemistry spreadsheet of high quality." In the case of the high-throughput data, some of the information will be accessible to everyone at the company.
As he speaks, Lobanov plays on his computer with clusters of hundreds of thousands or, in some cases, millions of molecules. Sifting and sorting them, he can map them into 3-D clumps — vast shapes of dots, each representing one compound, that coalesce into something like a cumulus cloud in the clear blue sky of his monitor. Then he zeroes in on one node of the cloud, dragging those molecules to another tabular worksheet, and sorts those.
He seems to have a dizzying number of ways to plot, sort, sift, and screen the molecules. Basically, any chemical quality of interest to a scientist can become an axis on a graph. "You can display up to four properties simultaneously," Lobanov notes. "This scatterplot can handle up to 1 million compounds. It becomes slower, but it is still very workable. For the project teams, it will be very useful to figure out which leads seem most promising."
But the ABCD platform is not only about chemistry. The system will eventually include tools for the biologists at J&J. Many were developed at J&J's La Jolla, Calif., site. There, Jackson Wan, head of bioinformatics and genomics technologies, talks about the gentle process of weaning his scientists from Excel. "Where things are more mature, where we need to share globally, you really don't want people to put data in Excel," Wan says. "The data formats are all different. It makes sense to develop a database."
The ABCD data warehouse will draw upon, among other things, mushrooming volumes of genetic and microarray data. Wan quickly describes compound registration software, and project-management and portfolio-evaluation software. He allows that J&J uses applications from LION bioscience, from Omniviz, even from Stanford University. "We don't want to duplicate what is done out there already," Wan says. "We're going to keep that and integrate it."
But sometimes, Wan says, other functionality is needed, and even the best custom-integration consultants that J&J has retained have needed a long time to be brought up to speed. Then, often as not, they depart prematurely. "We tried that many times," Wan says with some weariness.
"With the tools we build," Wan continues, "we try to show the most valuable information so people can make decisions. A lot of their job is to sift the good information from the bad. Which do you work on first?" So the ABCD platform has a variety of querying and reporting tools built in. It is also lightning-fast. "Speed is the issue. In our past system, that was a complaint — it took a lot of time for the answers to come up."
Can Your Vendor Do This?
In its ABCD platform, Johnson & Johnson will combine a variety of in silico, in vitro, and in vivo data in a single electronic environment for scientists.
Performance is also paramount for Agrafiotis, the ABCD project leader and an avid amateur soccer player. "Speed is of the essence," Agrafiotis says. "This is the core theme of our tool. The performance is unparalleled. I don't think anybody can come close to the speed, power, and scale of our tools."
It's not only a question of speed for its own sake, but also a recognition that biologists and chemists have better things to do than cut and paste from Excel, or recalculate laboratory values into different units. That's the boring, mundane work that computers can do — once the human beings have worked out the thorny issue of which name for the assay will be the global standard.
Agrafiotis seems to have thoroughly melded his training in chemistry and software development, honed at 3DP. J&J bought the company for $88 million, seeking compounds and informatics.
As Agrafiotis spins molecules, he is rapidly slicing and dicing information about their valence, aromaticity, ring structure, and stereochemistry — not to mention their bioavailability, half-life, purity, and lethal dose for 50 percent of a population. In all, J&J scientists using ABCD will be able to choose between thousands of in silico, in vitro, and in vivo data variables, computed or experimentally derived, stored in a variety of databases worldwide.
"You might want to ask whether the in vivo assays, pharmacology, disease models, and the in vivo pharmacokinetic analysis and toxicology are in the database. If they've managed to get all that in there, they are ahead of the game. Other companies have some of that, but I don't know of anyone else who's got all that comprehensively available."
Susan Ward, consultant
Agrafiotis is nothing if not opinionated. "I use IT in a more rigorous way than it is defined in industry," he says. "It is the plumbing. But it is also the algorithmics. You cannot dissociate the information from the science of the data analysis. Any solution has to involve not only solid tools but solid analytics." The classic Lipinski "Rule of Five" for optimal druglike properties is in the system. So is a beautiful way to summarize the status of a molecule or project in just one page.
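For readers unfamiliar with it, Lipinski's Rule of Five flags compounds likely to have poor oral absorption when they exceed these limits: no more than 5 hydrogen-bond donors, no more than 10 hydrogen-bond acceptors, a molecular weight of at most 500 daltons, and a calculated logP of at most 5. A compound is usually flagged when it breaks two or more of them. The Python sketch below applies the rule to precomputed descriptors. How ABCD computes or applies the rule is not described in this article, so the class, the function names, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties for one compound (values supplied by the caller)."""
    mol_weight: float       # daltons
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d):
    """Return the names of the Rule of Five criteria that the compound breaks."""
    violations = []
    if d.mol_weight > 500:
        violations.append("mol_weight")
    if d.logp > 5:
        violations.append("logp")
    if d.h_bond_donors > 5:
        violations.append("h_bond_donors")
    if d.h_bond_acceptors > 10:
        violations.append("h_bond_acceptors")
    return violations


def passes_rule_of_five(d, max_violations=1):
    """Conventional reading of the rule: at most one violation is tolerated."""
    return len(lipinski_violations(d)) <= max_violations


if __name__ == "__main__":
    # Invented descriptor values for two hypothetical compounds.
    small_polar = Descriptors(mol_weight=310.4, logp=2.1,
                              h_bond_donors=2, h_bond_acceptors=5)
    large_greasy = Descriptors(mol_weight=712.9, logp=6.3,
                               h_bond_donors=3, h_bond_acceptors=9)
    print(passes_rule_of_five(small_polar), lipinski_violations(small_polar))
    print(passes_rule_of_five(large_greasy), lipinski_violations(large_greasy))
```

Returning the list of violated criteria, rather than a bare pass or fail, fits the one-page summary idea: a chemist can see which property needs work.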
Agrafiotis knows that many companies have announced efforts to do what J&J has promised. He insists the ABCD project is different. "Modern pharmaceutical companies rely heavily on commercial [software] to implement their key information management functions," he notes. "Most of these systems address different aspects of the discovery pipeline and were never intended to work with each other. Consequently, a significant amount of effort and expense is consumed in integration activities, which can result in suboptimal systems that do not respond gracefully to change."
Nor does he like the Internet. "We need to be able to deliver large amounts of data to the user and allow the user to mine them interactively," Agrafiotis says. "The Web is not a good medium for that. The data analytics will be [Microsoft] .Net applications operating at native speeds. .Net allowed us to capitalize on our massive investment in C++ by encapsulating high-performance C++ components, and integrating them into highly sophisticated Windows applications."
He does like the system's highly detailed, even beautiful images of chemical structures. "All we are trying to do," Agrafiotis says, "is make sure the information is massaged and presented in an appropriate way."
Agrafiotis is understated and calm — a chemist to the core after postdoctoral training with Andrew Streitwieser at the University of California at Berkeley and Nobel laureate Elias Corey at Harvard University. But he is not shy when it comes to explaining the merits of the ABCD project. "The rest of the industry is based on gluing things together that were never meant to work together. When the dust settles, everything here will come from the same code. It will be a uniform code base. It is the very best in the industry."
PHOTO CREDITS: DIMITRIS AGRAFIOTIS AND SANGTAE KIM BY JAMES WASSERMAN; WARD BY MICHAEL MANNING