By Mark D. Uehling
December 15, 2002 | It's the dirty little secret of scientific IT: the Excel spreadsheet. Who can say what it holds -- a miracle drug? A new gene? Lab staff bets on the Super Bowl?
PerkinElmer Inc., GeneData AG, Bio-Rad Laboratories Inc., Bristol-Myers Squibb, and a few other companies all want to change that. But any change must be approached gingerly, so as not to frighten bench researchers who cling to their spreadsheets the way some children guard their favorite blankets.
For a variety of companies, all presenting their latest wares at the recent Chips to Hits meeting in Philadelphia, the prospect of instrument data flowing automatically into analytical applications is looming just over the horizon. The issue is not whether Excel spreadsheets will ever be banished by decree, but whether more efficient tools could convince scientists to make a slow transition to applications that manage and analyze large datasets more effectively.
For starters, Agilent Technologies and Rosetta Biosoftware (a business unit of Rosetta Inpharmatics Inc.) made a joint announcement about the ultimate end-user: the FDA. The two companies say the forthcoming version of Rosetta Resolver, version 4.0, will be fully compliant with 21 CFR Part 11, the encyclopedic data-custody regulation, and that data coming off Agilent instruments will land in Resolver in an easily audited, secure manner.
Other companies are also thinking about what happens to data both up- and downstream from a particular machine or software application. "We have one application that looks at all types of data," said Joseph Schambaugh, senior bioinformatics specialist at GeneData, citing new Refiner software that pre-processes information from oligo arrays, spotted arrays and commercial products from Affymetrix Inc. Everything goes directly into a GeneData database. Recognizing that some scientists like to get their hands dirty, tweaking their own algorithms, the software allows that.
ArrayInformatics, to take another example, is year-old software from PerkinElmer with new capabilities. The program takes data coming off PerkinElmer gene-expression equipment and passes them directly to applications that analyze the data. "You don't have to know how to use complicated data sets," says Robert Fleming-Jones, a bioinformatics specialist at PerkinElmer.
He notes the information goes directly from the scanner into an Oracle database. "It's an out-of-the-box solution that is very accessible and very easy to use. Once you press the start button, you end up with all the things you need in the database."
The ArrayInformatics software allows scientists to filter out low or noisy signals from microarrays, normalize the data across microarray experiments, standardize the data -- and even filter out uninteresting data. Says Fleming-Jones: "The users don't have to touch the data as it's coming off the machine. They just plug everything together in the LAN."
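To make the filtering and normalization steps concrete, here is a minimal Python sketch. It is not PerkinElmer's code or algorithm: the function names, the background-ratio cutoff, and the choice of median scaling are all assumptions made for illustration, and the intensity values are invented.

```python
from statistics import median


def filter_low_signal(spots, background, min_ratio=2.0):
    """Keep spots whose intensity is at least min_ratio times background.

    The cutoff rule is a hypothetical stand-in for a low-signal filter.
    """
    return {gene: v for gene, v in spots.items() if v >= min_ratio * background}


def median_normalize(arrays, target=1000.0):
    """Scale each array so that its median intensity equals target.

    Median scaling is one common way to normalize across arrays; the
    article does not say which method ArrayInformatics uses.
    """
    normalized = []
    for spots in arrays:
        scale = target / median(spots.values())
        normalized.append({gene: v * scale for gene, v in spots.items()})
    return normalized


# Invented intensities: array_b is simply a dimmer scan of the same sample.
array_a = {"geneA": 50.0, "geneB": 400.0, "geneC": 800.0, "geneD": 1200.0}
array_b = {"geneA": 30.0, "geneB": 200.0, "geneC": 400.0, "geneD": 600.0}

filtered = [filter_low_signal(a, background=40.0) for a in (array_a, array_b)]
normalized = median_normalize(filtered)
```

In this toy case the weak geneA spot drops out of both arrays, and after scaling the two arrays report the same values for the remaining genes, which is the point of normalizing across experiments.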
Having trained as a biochemist and an IT person, he is not unaware of the continuing popularity of spreadsheets, and understands that some scientists will be unwilling to let go of their favorite file format. "You can grab the data as a spreadsheet at any point in the process," Fleming-Jones notes of PerkinElmer's software.
The company designed the software to appeal to scientists, but it has the scientific workplace in mind, he says. "They don't have to put their postdocs and grad students on this [application]. They can use their technicians because it is going to get done right. The ease-of-use is appealing for everyone."
Bio-Rad, while not known as an informatics company, announced software called VersArray, designed to do more than just work with gene expression data off the company's machines. The application finds and "grids" spots on microarrays, placing them into a known matrix of rows and columns for further analysis. The software is fully integrated into a complete Bio-Rad system that prints the arrays, which a robot loads.
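The idea of "gridding" can be sketched in a few lines of Python. This is only an illustration under simple assumptions (a perfectly regular grid with a known origin and spot spacing), not a description of how VersArray works; the function name, parameters, and coordinates are hypothetical.

```python
def grid_spots(centroids, origin, pitch, n_rows, n_cols):
    """Map (x, y) spot centroids to (row, col) cells of a regular grid.

    origin is the expected center of the top-left spot and pitch is the
    spacing between neighboring spots; both are assumed to be known.
    """
    grid = {}
    ox, oy = origin
    for x, y in centroids:
        # Snap each centroid to the nearest grid position.
        col = round((x - ox) / pitch)
        row = round((y - oy) / pitch)
        # Ignore anything that falls outside the printed array.
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[(row, col)] = (x, y)
    return grid


# Invented centroids for a 2 x 2 array with spots about 20 units apart.
centroids = [(10.2, 10.1), (30.1, 9.8), (9.9, 29.7), (29.8, 30.3)]
grid = grid_spots(centroids, origin=(10.0, 10.0), pitch=20.0, n_rows=2, n_cols=2)
```

Once every spot has a row and column, its intensity can be matched to whatever was printed at that position, which is what makes downstream analysis possible.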
"This technology is for people who can't afford or don't want to go to Affymetrix," says Harry Naghieh, Bio-Rad's business unit manager for functional genomics. "I call it home-brew DNA chip-making. In a couple of days, you can be analyzing data."
At Bristol-Myers Squibb, Stan Hefta is director of proteomics in the company's Pharmaceutical Research Institute. At the Philadelphia meeting, Hefta described a group that had grown dramatically, both in terms of headcount and alliances with smaller companies like Affymetrix, Athersys, Exelixis, Lexicon, Lifespan, Orchid BioSciences, Pharmagen, and Sequitur. "We increased dramatically the number of platforms," said Hefta. "We increased the number of databases we subscribe to."
That profusion of activity, in turn, led inexorably to a decision to wrap up what the company was doing into one package. "If you went into the mass-spectroscopy lab," he told the meeting, "you probably wouldn't see many people working there. The instruments are chugging along, spewing out data. Emails are sent, automatically, saying 'The data is here, come and get it.'"
Hefta notes his own group is putting 50 gigabytes of data into Beowulf clusters. "The goal is to collect as much data as possible and analyze it later," he said. "All of our data collection is completely automated. All of the data processing is automated." Does that mean scientists never see it? No. Just that intermediate step -- analyzing individual gels and microarrays -- is left to computers.
Part of the puzzle is the BMS Mass Informatics System, software which has a proprietary algorithm to tell the poor-quality signals from the good-quality ones. Says Hefta: "We need to get this data into the hand of end users, biologists, as quickly as possible, so they can get in and probe the data in any way they want. We are integrating as quickly as possible into a systems biology strategy."