By Kevin Davies
September 15, 2009 | From humble beginnings in 2005, Danish software company CLC bio has emerged as one of the leading software providers for the exploding genomics and next-gen sequencing market. The Danes say they aim to be “among the most innovative bioinformatics companies in the 21st century.”
After earning his Ph.D. from the University of Florida, Bjarne Knudsen saw a business opportunity to improve the quality, efficiency, and user-friendliness of life science software. He returned to Denmark, recruited his brother Thomas (now CLC bio’s CEO) to the cause, hired a few local software developers, and CLC bio was born.
The origin of the name is a closely guarded secret. “‘Cake Loving Company’ is a good guess, but it’s incorrect,” says CLC bio North America CEO Jan Lomholdt.
CLC bio initially took a “Microsoft approach” to bioinformatics. “Everyone has to have it on their desktop, download it from the Internet. It was global thinking from the beginning,” says Lomholdt. The resulting Workbench suite, going head to head with the likes of DNASTAR, Vector NTI, and Gene Codes, proved competitive. It was platform independent and “people could just right-click and get [GenBank] directly into their program,” says Lomholdt. By 2006, CLC bio had serviced 100,000 free downloads.
Adding more functionality, CLC bio attracted its first industry customers. With next-gen sequencing platforms emerging, Lomholdt and colleagues worked with the major vendors—Roche/454, Illumina, Applied Biosystems (AB), and Helicos—to ensure the company’s software could handle all types of sequence data.
Released in 2008 at the Bio-IT World Expo, the Genomics Workbench operates as a desktop application for next-gen sequencing analyses. Early this year, CLC bio added a three-tier server architecture with CLC Genomics Server, providing a server structure for CPU-intensive jobs, a database component, and the ability to use Genomics Workbench as a thick client and develop and execute customized plug-ins centrally.
Lomholdt says that the major next-gen platforms are “very good in hardware technology [but] they are not good in doing software—and they know it. Customers demand downstream analysis. When we said, ‘Hey, we can do this,’ they said, ‘Finally, there is one we can point to when the customer gets mad!’”
Lomholdt admits that CLC bio was a little late in supporting AB’s color space, but the company now supports SOLiD data. Moreover, the vendors recognize that many customers use multiple platforms. “We can be the one that merges their different datasets,” says Lomholdt. “We can handle long or short reads and merge them to get a higher quality result.”
A coup was attracting the J. Craig Venter Institute (JCVI). “He [Venter] found out what we were doing, and said this is exactly what I need,” recalls Lomholdt. “We come from Denmark near LegoLand! We do bricks—build your own special plug-ins and increase the value of your program.”
Granger Sutton, JCVI’s senior director of informatics, said JCVI would implement the full enterprise platform as it integrated workflows “across different technologies and geographical sites.”
JCVI incorporated CLC bio’s accelerated versions of HMM algorithms into its pipeline for metagenomic annotations. According to JCVI director of bioinformatics software, Saul Kravitz, the single instruction, multiple data (SIMD)-accelerated tools increase analysis capacity and “take advantage of our annotation pipeline without any further hardware investments.”
One reason for CLC bio’s success is that it has consistently worked on algorithm optimization. Customers appreciate the improvements in RAM allocation, which allow de novo assembly of a human genome on a single computer with 32 gigabytes of RAM in 17 hours, as opposed to the massive RAM requirements of open-source alternatives, says Lomholdt.
“We have an assembler that can challenge MAQ—higher coverage and speed, less RAM consumption,” says Lomholdt. Last June, CLC bio unveiled CLC Genomics Machine, bundling next-gen sequencing software with the hardware. “IT is a very important component. The bioinformatician is important, the scientist is important. Now we have a solution for all three.”
Among other notable clients for CLC bio is Saudi Biosciences. The sequencing of the first Arab genome was outsourced to BGI Shenzhen, which transmitted the raw sequence data to CLC bio for the assembly as well as some “special analysis” for the Arabs (Lomholdt declined to elaborate). BGI’s head of bioinformatics, Ruiqiang Li, said Genomics Workbench is “simply in a league of its own when it comes to flexibility.”
Customers can obtain the software in many models—renting, leasing, owning, or site licenses. The Albert Einstein Epigenomics Center in New York is using the CLC portfolio for teaching, and the software is used as an educational tool at Harvard and elsewhere. “One of our efforts is to help scientists get the Nobel Prize! Some say publications, I say Nobel Prize—why not?! To do that, you have to visualize the analysis.”
The CLC Genomics Server won a Best of Show award at the 2009 Bio-IT World Expo. University of Pittsburgh’s Michael Barmada called the CLC Genomics Server an ideal platform and said, “it’s nice to see complex computational algorithms and routines presented in a user-friendly environment with a very elegant interface.”
The company now has 50 staff, with bureaus in Singapore, Brazil, India, and the U.K. “We think we’re in a very sweet position between the vendors, the customers, the HPC, the analysis, and the algorithm development,” says Lomholdt.
This article also appeared in the September-October 2009 issue of Bio-IT World Magazine.