A Guide to GeneMatcher
Paracel's vice president of product management Joseph Borkowski explains the history and attributes of GeneMatcher, the company's massively parallel supercomputer designed to enhance bioinformatics algorithms, to editor-in-chief Kevin Davies. Feb. 18, 2004
"TextFinder searches huge amounts of unstructured text, accounting for spelling errors, transpositions, different kinds of words, proximities, etc. We have a large government agency that has been using this for over 10 years now. So the key thing is to search in real time, account for errors, misspellings, and gaps. The same kind of data search is now transposed into biological space. Instead of the errors being spelling errors in text documents, the 'errors' are changes in sequence due to evolution. GeneMatcher enables high-resolution searches of genomic data.
"When you do index searches on text, you can get quick results, but spelling is a problem. Same thing in genomic space — you can index the data pretty easily, and get the information back quickly, but if you start doing anything more complicated, it starts slowing down.
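The contrast Borkowski draws between a fast exact lookup and a slower, error-tolerant scan can be illustrated with a small sketch. It is not Paracel's implementation: the function name `best_fuzzy_match` and the unit cost for every edit are assumptions made for illustration, and the method shown is the textbook dynamic-programming approach to approximate substring matching. It reads every character of the text and reports the fewest edits (substitutions, insertions, deletions) needed to find the pattern anywhere in it.

```python
def best_fuzzy_match(pattern: str, text: str) -> tuple[int, int]:
    """Return (fewest_edits, end_index) for the best approximate
    occurrence of `pattern` anywhere in `text`.

    Hypothetical helper for illustration; each edit costs 1.
    """
    # prev[i] = fewest edits to match pattern[:i] against some
    # stretch of text ending at the previous text position.
    prev = list(range(len(pattern) + 1))
    best = (prev[-1], 0)
    for j, ch in enumerate(text, start=1):
        cur = [0]  # an empty pattern matches anywhere at no cost
        for i, pc in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (pc != ch),  # match or substitution
                prev[i] + 1,               # extra character in the text
                cur[i - 1] + 1,            # pattern character missing from text
            ))
        if cur[-1] < best[0]:
            best = (cur[-1], j)
        prev = cur
    return best


if __name__ == "__main__":
    text = "the human genme project"
    print(best_fuzzy_match("human", text))   # exact hit: zero edits
    print(best_fuzzy_match("genome", text))  # misspelling found with one edit
```

An exact index would return nothing for "genome" in this text; the exhaustive scan still finds the misspelled word, at the cost of work proportional to pattern length times text length.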
"So, with TextFinder in mind, the same technology can be used for searching genomic data, so you can get more complicated searches, higher-resolution searches. Basically, models such as Smith-Waterman and Hidden Markov Models, those kinds of searches are very easy for GeneMatcher to do.
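For readers unfamiliar with Smith-Waterman, the following is a minimal software sketch of the algorithm's scoring step. It says nothing about how GeneMatcher implements it in hardware. The function name and the scoring values (match +2, mismatch -1, gap -2, linear gap penalty) are illustrative assumptions; real searches typically use substitution matrices and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring only: fixed match/mismatch values and a
    linear gap penalty, all assumed for this sketch.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row of the matrix
    best = 0
    for ca in a:
        cur = [0]  # first column is always zero
        for j, cb in enumerate(b, start=1):
            score = max(
                0,                                               # start a new local alignment
                prev[j - 1] + (match if ca == cb else mismatch),  # align ca with cb
                prev[j] + gap,                                    # gap in b
                cur[j - 1] + gap,                                 # gap in a
            )
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


if __name__ == "__main__":
    # The shared region "ACGT" is found despite unrelated flanking bases.
    print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

Every cell of the score matrix is computed, which is why the method is thorough but expensive on a general-purpose processor: the work grows with the product of the two sequence lengths.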
"Both GeneMatcher and TextFinder actually search every single character in the database; they're not taking any shortcuts. With GeneMatcher, all the data get screened, and you can give as complicated a model as you want to search the data. So where we're getting a lot of sales is after the first round of annotation of the human genome, when researchers find regions where they want a more detailed search.
"Most of the large pharmas, a lot of biotechs, and a lot of the large government labs worldwide are using GeneMatcher, wherever people want high-resolution searches and want to mix the public data with their own data.
Mix and Match
"Genomic information is now moving into the clinical space, out of the in silico lab into the wet lab, and that's a much larger market. So there's a tremendous opportunity there for turnkey solutions. Take a lab of 100 people or so ... they're often not going to have a bioinformatics department. So that's where we come in.
"The GeneMatcher chip contains 192 custom processors. We designed all the circuitry. It's about the same size as a Pentium 4 chip, same number of transistors, but we were able to get rid of the parts of the chip that you don't need for biological sequence analysis. So this is how we were able to get a couple of hundred thousand processors into a single rack.
"A one-board GeneMatcher in various configurations sells for between $79,000 and $99,000.
"The real advantage of the GeneMatcher is that instead of ... hiring a bunch of people to put together a large server room with a large infrastructure to do bioinformatics, GeneMatcher has the accelerator component, but it also has its own Linux cluster. Each board of a GeneMatcher has about 3,000 processors on it, making it about 1,000 times faster than a general-purpose system. It's pretty smokin'!"