July 29, 2010 | Computational biologists have a need for speed. The computing cluster at the Howard Hughes Medical Institute’s (HHMI) Janelia Farm Research Campus delivers the performance they require—at a mind-boggling 36 trillion operations per second.
In the course of their work, Janelia researchers generate millions of digitized images and gigabytes of data files, and they run algorithms daily that demand robust computational horsepower. Geneticists, molecular biologists, biophysicists, physiologists, and even electrical engineers pursue some of the most challenging problems in neuroscience, chief among them how individual neuronal circuits process information. Their discoveries depend, now more than ever, on the seamless interplay of scientists and computers.
Humming nonstop in Janelia’s compact computing center are 4,000 processors, 500 servers, and storage machines holding half a petabyte of data—about 50 Libraries of Congress worth of information. (See, “Marshall’s IT Plan for Janelia Farm,” Bio•IT World, Oct 2006)
Though there are many larger clusters around the world, this particular one is just right for Janelia Farm. “Beautifully conceived, ruthlessly efficient, and extraordinarily well run by the high-performance computing team,” according to Janelia researcher Sean Eddy, the system is designed to make digital images available lightning fast while muscling through the monster calculations required to help investigators conduct genome searches and catalog the inner workings and structures of the brain.
A group leader at Janelia Farm, Eddy deals in the realm of millions of computations daily as he compares sequences of DNA. He is a rare breed, both biologist and code jockey. “I’m asking biological questions, and designing technologies for other people to ask biological questions,” he says.
Eddy writes algorithms to help researchers extract information from DNA sequences (See, “Eddy Wins 2007 Franklin Award,” Bio•IT World, Apr 2007). It’s a gargantuan matching game where a biological sequence—DNA, RNA, or protein—is treated as a string of letters and compared with other sequences. “From a computer science standpoint, it’s similar to voice recognition and data mining,” he says. “You’re comparing one piece against another. We look for a signal in what looks like random noise.”
Eddy looks for the hand of evolution in DNA by comparing different organisms’ genomes. He’s searching for strings of DNA sequences that match—more than random chance would dictate.
“It’s a lot like recognizing words from different languages that have a common ancestry, thus probably the same meaning,” he explains. “In two closely related languages—Italian and Spanish, for example—it’s pretty obvious to anyone which words are basically the same. That would be like two genes from humans and apes.”
But in organisms that are more divergent, Eddy needs to understand how DNA sequences tend to change over time. “And it becomes a difficult specialty, with serious statistical analysis,” he says.
From a computational standpoint, that means churning through a lot of operations. Comparing two typical-sized protein sequences, to take a simple example, would require a whopping 10^200 operations if every possible alignment were checked. Classic algorithms, available since the 1960s, can trim that search to 160,000 computations—a task that would take only a millisecond or so on any modern processor. But in the genome business, people routinely do enormous numbers of these sequence comparisons—trillions and trillions of them. These “routine” calculations could take years if they had to be done on a single computer.
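The dynamic-programming idea behind those classic algorithms can be sketched in a few lines. This is a minimal illustration, not the code any production tool uses; the scoring values are invented for the example.

```python
# Minimal sketch of classic dynamic-programming sequence comparison
# (Needleman-Wunsch style). Scoring values are illustrative only.

def align_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score; fills a (len(a)+1) x (len(b)+1) table."""
    rows, cols = len(a) + 1, len(b) + 1
    prev = [j * gap for j in range(cols)]   # first row: all gaps
    for i in range(1, rows):
        curr = [i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]

# Two 400-residue sequences mean a 400 x 400 table: 160,000 cell updates,
# instead of enumerating the astronomically many possible alignments.
print(align_score("GATTACA", "GCATGCU"))
```

The table-filling loop is why the cost is the product of the two sequence lengths rather than an exponential blowup.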
That’s where the Janelia cluster comes in. Because a different part of the workload can easily be doled out to each of its 4,000 processors, researchers can get their answers 4,000 times faster—in hours instead of years. The solutions don’t tend to lead to eureka moments; rather, they provide reference data for genome researchers as they delve into the complexities of different organisms. “These computational tools are infrastructural, a foundation for many things,” Eddy says.
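The reason a cluster helps so much is that each comparison is independent of the others, so the workload can be handed out with almost no coordination. A toy sketch, using a local process pool as a stand-in for a real cluster scheduler; the `compare_pair` task here is a made-up placeholder, not Janelia's software:

```python
# Toy sketch of an embarrassingly parallel workload: independent
# sequence comparisons fanned out across worker processes.
from multiprocessing import Pool

def compare_pair(pair):
    a, b = pair
    # Stand-in for a real alignment: count positions where letters agree.
    return sum(x == y for x, y in zip(a, b))

if __name__ == "__main__":
    jobs = [("GATTACA", "GACTATA"), ("ACGT", "ACGA"), ("TTTT", "TTTA")]
    with Pool(processes=4) as pool:   # a cluster scheduler, in miniature
        scores = pool.map(compare_pair, jobs)
    print(scores)
```

Because no task depends on another's result, adding processors speeds the whole batch up almost linearly, which is exactly the 4,000-fold effect described above.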
While that may not sound dramatic, Eddy’s protein-matching algorithms are an industry standard, used by researchers as the search tool for a reference library called the Protein Families database, or Pfam. There are roughly 10 million proteins in the database. Luckily, about 80 percent of those sequences fall into a much smaller set of families, and Eddy has designed the analysis software to query for matches within this smaller set. “When a new sequence comes in, [Pfam] is like a dictionary—it’s always being added to,” he says. The database currently identifies 12,000 protein families.
There is also an RNA database called Rfam for which Eddy and his Janelia team have software design and upkeep responsibilities. Eddy has to keep one step ahead of his users, which means stressing his analysis tools to the failure point so he can improve them. “We set up experiments and try to break the software and push the envelope,” he says.
A Mosaic of Fly Neurons
The Janelia computing system is referred to as a “cluster” of processors by both its overseers and its users. The cluster serves 350 researchers and support staff and can scale up to serve many more if requirements demand it. Its design puts a premium on expandability, flexibility, and fast response, particularly since the scientific needs may change and evolve rapidly.
The computer cluster was recently upgraded as part of a regular four-year technology refresh. Built from commercially available hardware components from Intel, Dell, and Arista Networks, the system is what Eddy calls a “working class supercomputer.” Janelia is the first customer for this particular design—in fact, some of the components have serial number 1 or 2 and are signed by the engineers who built them. The new system is up to 10 times faster than the old one and has six times more memory.
“Janelia’s new computing cluster provides a platform that is an order of magnitude more responsive than the previous system and can be grown easily to accommodate changing requirements,” says Vijay Samalam, Janelia’s director of IT and scientific computing.
That expanded capacity is a big help to Janelia fellow Louis Scheffer, an electrical engineer and chip designer by training. He uses the cluster to help researchers map the brain wiring of the common fruit fly, Drosophila melanogaster. Essentially, it’s a massive three-dimensional image-manipulation challenge. First, slices of brain 1/1,000th the thickness of a human hair are digitally photographed with an electron microscope and stored. In each layer, the computer assigns colors to the neurons so researchers can trace their path. As an example, the medulla of the fly, part of the brain responsible for vision, requires more than 150,000 individual images to create the full mosaic, which is 1,700 layers (slices) deep.
But all these pictures must be knitted together so scientists can follow neural paths and see where they lead. Think Google Earth. As you pan across the globe, data are fed onto the screen so you can “fly” from one location to another, and more images are required as you drill down to examine surface topography. Making the transitions between images smooth requires fine-tuned alignment. “It’s not completely simple—there are a whole bunch of distortions to deal with,” says Scheffer. Some are caused by the electron microscope itself as it dries out the target specimen during imaging.
To align one image to its neighbor takes about one minute of computer time. But once matched, the resulting checkerboard must be stacked and aligned with the mosaic of images above and below. “You need to make about one million comparisons,” Scheffer says. “It would take [a personal] computer four years.” With Janelia’s parallel processors on the task, the job is done in a few hours.
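The pairwise matching step can be illustrated with a one-dimensional toy: slide one brightness profile across another and keep the shift that agrees best. Real mosaic registration is two-dimensional and must model distortions, so this is only a sketch of the idea:

```python
# Toy image registration: find the integer shift that best aligns two
# 1-D brightness profiles. Real alignment is 2-D and far more involved.

def best_shift(ref, img, max_shift=3):
    """Return the shift s for which ref[i] best matches img[i + s]."""
    best, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        score = 0
        for i, v in enumerate(ref):
            j = i + s
            if 0 <= j < len(img):
                score -= abs(v - img[j])  # smaller mismatch -> higher score
            # positions that slide off the edge simply don't contribute
        if score > best_score:
            best, best_score = s, score
    return best

profile = [0, 1, 5, 9, 5, 1, 0]
shifted = [1, 5, 9, 5, 1, 0, 0]  # the same profile moved over by one
print(best_shift(profile, shifted))
```

Each such comparison is independent of the others, which is why the million tile-to-tile matches parallelize so well across the cluster.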
But Scheffer’s matching is just the first step in the image-manipulation process. Janelia software engineer Philip Winston takes the processed pictures and does the unthinkable—he chops them up again. He creates smaller “tiles” of the photos, which can more easily be added to and removed from a computer screen as a researcher pans across an image. “To open a single image would take five minutes if you didn’t tile them,” says Winston. Only 20 tiles are required on the screen at any one time. Currently, Winston is working with four million tiles as part of the Janelia Fly Electron Microscope project to map the entire brain of the fruit fly.
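The payoff of tiling is easy to see in code: given a viewport, only the tiles it touches need to be fetched. The tile size and coordinates below are invented for illustration:

```python
# Sketch of tiled viewing: instead of loading one giant mosaic, load
# only the fixed-size tiles that intersect the current viewport.

TILE = 1024  # pixels per tile edge (illustrative value)

def tiles_for_viewport(x, y, width, height, tile=TILE):
    """Return (col, row) indices of every tile the viewport touches."""
    first_col, first_row = x // tile, y // tile
    last_col = (x + width - 1) // tile
    last_row = (y + height - 1) // tile
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

# A screen-sized window over a huge mosaic touches only a handful of tiles.
view = tiles_for_viewport(5000, 3000, 1920, 1080)
print(len(view), "tiles needed")
```

However large the mosaic grows, the number of tiles on screen stays roughly constant, so panning stays fast.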
Humans proofread the final fly-brain image for accuracy, to trace the neural paths and make sure the computer has identified structures correctly. “[People] are an important step,” says Winston. “Without them, the computer segmentation would be 95 percent right and we wouldn’t know about the other 5 percent.”
Scheffer and Winston’s ultimate goal is to completely automate the mapping process and to teach the computer to identify the inner structures of the fruit fly brain, in particular the different types of neurons, and the axons and dendrites branching out from them. “To do the whole fly brain we have to improve the automated segmentation,” says Winston. Scheffer hopes to achieve the computer-generated—and accurate—mapping of the brain within the next five years as more pieces are imaged and processed.
The increased speed of the new computing system will make that effort easier going forward. The cluster is faster than its predecessor for two reasons: it has more than four times as many individual processors, and it uses a networking technology that speeds up communication between processors. Ethernet has long been the protocol of choice for slower connections, but 10-gigabit Ethernet has not traditionally been used in top supercomputers, whose builders, when best performance was a must, relied until recently on a specialized networking technology called InfiniBand.
“Now Ethernet switches are as efficient as, or very close to, InfiniBand and you don’t need a different [networking] skill set,” says Spartaco Cicerchia, manager of network infrastructure at Janelia Farm. The bottom-line advantage is that Ethernet is easier to work with, familiar to more networking engineers, and tends to be cheaper to use.
Lower latency is now possible via Ethernet due to a relatively new networking standard called iWarp. Traditionally, computers’ processors must manage the flow of information packets as they pass between them. In the new systems, those packets are handled by a separate piece of Intel hardware. “Traditionally, [central processing units handle] network packets. However, when network interface speeds went from 1 to 10 gigabits per second, the load on CPUs increased by an order of magnitude,” says Goran Ceric, Janelia’s manager of scientific computing systems.
The creation of iWarp helped alleviate this issue and reduce latency and overhead. iWarp helps in three ways, according to Ceric: by processing network packets using specialized hardware instead of CPUs; by placing data directly into application buffers, thus eliminating intermediate packet copies; and by reducing a need for “context switching,” in which a processor must pass commands back and forth between an application and an operating system. “For many parallel applications,” he says, “if you can lower communications time between processors in different systems over the network, the better your performance is.”
The new network infrastructure has dropped the communications lag inside the Janelia cluster sixfold, from 60 to 10 microseconds.
Janelia Farm fellow Roian Egnor’s studies on the vocalizations of mice (and the neural pathways required) depend partly on heavy computational power. Though famed for their quiet ways, mice turn out to be chatterboxes. All their communications, unfortunately, happen at frequencies between 30 and 100 kilohertz, far above the range of human hearing.
She is collaborating with another Janelia fellow, Elena Rivas, who is starting to process the recorded communications using a statistical analysis tool called a hidden Markov model. The software is similar to that which Sean Eddy uses to compare millions of DNA strands. “The cool thing about hidden Markov models is, you can tell them ‘Look, here’s what I think are good examples of what I want you to characterize. Learn them, and then I’m going to give you unlabeled vocalizations and I want you to see which match and which don’t,’” Egnor says.
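The “learn from labeled examples, then score unlabeled ones” workflow Egnor describes can be sketched with a toy two-state hidden Markov model. The models and numbers below are invented; real vocalization models are far richer:

```python
# Toy HMM classification: score an unlabeled observation sequence under
# two candidate models and pick the better fit. All numbers are invented.

def forward_likelihood(obs, start, trans, emit):
    """Forward algorithm: total probability of obs under a small HMM."""
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in states) * emit[s][o]
                 for s in states]
    return sum(alpha)

# Two made-up "call type" models over low (0) / high (1) frequency symbols.
chirpy = dict(start=[0.5, 0.5],
              trans=[[0.2, 0.8], [0.8, 0.2]],   # tends to alternate
              emit=[[0.9, 0.1], [0.1, 0.9]])
steady = dict(start=[0.5, 0.5],
              trans=[[0.9, 0.1], [0.1, 0.9]],   # tends to stay put
              emit=[[0.9, 0.1], [0.1, 0.9]])

call = [0, 1, 0, 1, 0, 1]                        # an alternating vocalization
scores = {name: forward_likelihood(call, **m)
          for name, m in [("chirpy", chirpy), ("steady", steady)]}
print(max(scores, key=scores.get))
```

The alternating test sequence scores far higher under the model trained to alternate, which is the matching behavior Egnor describes.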
Rivas has reworked a standard protein analysis program called HMMER3 to handle Egnor’s data, which comes in one-terabyte chunks. “The core of the programs is identical,” Rivas says. “We’re going to try to determine the types of [mouse] vocalizations and try to model each one.” Once those models, or “families,” are delineated, researchers can then test new mouse vocalizations against these templates. The analysis takes the computing cluster only a few moments to run. “The beautiful thing about Janelia is that I stream that [information] to the data share, and Elena picks it up and starts working on it,” Egnor says.
Making that transfer possible is another hidden attribute of the Janelia research complex—its internal network. It’s the pipeline that carries huge image or auditory files without clogging or slowing down the system. In the startup phase, that meant overbuilding the fiber infrastructure as much as possible and designing it to handle unpredictable loads through 10-gigabit ports. Janelia’s network is fully meshed and runs at “line rate”—meaning that the 40-gigabit/second data-center backbone is available to every user at all times, rather than being designed to serve only a small percentage of researchers as they need it while the rest ponder their research or go to lunch.
The computing cluster communicates with the rest of the campus through 450 miles of fiber optic cable, operating at 1 gigabit/second to users’ desktops.
The updated cluster also runs at an impressive 84 percent efficiency on the Linpack Benchmark, the global standard traditionally used to measure performance and rank top supercomputers. Right now, the Janelia system would rank roughly in the top 200 of existing computing clusters, says Ceric. Janelia plans to enter its cluster in the next edition of the Linpack rankings this summer.
The installation’s increased efficiency is also better for the planet, since it gobbles less electricity. The old cluster ran at 25 million operations per second per watt; the new one produces 200 million operations per second per watt, eight times the work for the same power. And it throws off less heat that must ultimately be carried away by air conditioning. “It takes less power and we produce fewer BTUs,” says Cicerchia.
As Janelia researchers go about their day thinking up novel ways to explore neural networks, few contemplate the silicon marvel that quietly makes much of their work possible. But ask any of them to consider their research without the cluster and you quickly enter the realm of the unthinkable.
“In a single day at Janelia we can do something that would take 11 years on a single-processor workstation,” says Eddy. “We breathe CPU cycles like air.”
This article also appeared in the July-August 2010 issue of Bio-IT World Magazine and was adapted with permission from the HHMI Bulletin (May 2010).