By John Dodge

May 13, 2003 | In a tour de force of genomics, government research centers in Canada and the U.S. have decoded the genome of the coronavirus that is all but proven to be the cause of SARS (severe acute respiratory syndrome).

The British Columbia Cancer Agency (BCCA) in Vancouver first sequenced the SARS genome in the early hours of Sunday, April 13. The discovery will expedite tests to diagnose the disease, which has claimed more than 150 lives as of this writing, mostly in mainland China and Hong Kong.

"This information in and of itself does not provide a cure," said Marco Marra, director of the Michael Smith Genome Sciences Centre (GSC). "It provides a test for this particular virus and will test that very hypothesis that this is SARS."

With only a bit more embellishment, the course of sequencing events has the makings of a Tom Clancy thriller. The decision to decode the genome was made March 27, officials said.

"[Our success] was a combination of several events, serendipity being one of the most significant," said Rob Holt, GSC head of sequencing. The "challenge," he said, was producing a DNA copy of the virus's RNA genome to work with. "That was the main hurdle. It's very temperamental," he said.

"It grows very poorly in tissue," said Bob Brunham of the National Microbiology Laboratory (NML) in Winnipeg, where the SARS sample was produced. "We had to grow it up to big enough levels [for sequencing]. The starting material was very limiting."

After several days of effort, the NML shipped one millionth of a gram of genetic material on April 6 to the BCCA. Within a week, the sequencing fragments had been assembled into the completed genome. Once started, the sequencing itself was "fairly routine," Holt said. The genome sequence was assembled in a mere 12 hours, but that doesn't mean the episode lacked for suspense.

"It wasn't like 'Eureka!' but [there was] immense relief when the first sequences came off the machine," Marra said. "They were all viral sequences and were coming from different parts of the genome. [We gave] an enormous sigh of relief."

The SARS virus genome contains 29,736 bases and is available from GSC's very busy Web site (a small caveat on the site notes, "This sequence may contain errors").

Oh Canada
The Canadians took considerable pride in narrowly beating the U.S. Centers for Disease Control and Prevention (CDC) in the race to sequence SARS. On April 14, the CDC announced it had separately sequenced different patient samples of what is clearly the same virus. The U.S. sequence contained 15 more bases than its Canadian counterpart.

"Research laboratories can use this information to begin to target antiviral drugs to form the basis for developing vaccines and to develop diagnostic tests that can lead to early detection," said CDC Director Julie Gerberding.

"This is essentially a draft. Now we need to see if what we have identified in the laboratory matches what's causing disease in patients," said William Bellini, SARS coordinator at CDC.

Although understandably elated to have scooped the Americans, BCCA researchers underlined their collaboration with the CDC and other research centers. There have been more than a dozen SARS-related deaths in Canada, with hundreds more infected.

"This collaborative effort demonstrates that the use of genomics crosses the boundaries of health issues and gives us the confidence that we'll be able to meet similar challenges in the future," said Simon Sutcliffe, president of BCCA.

BCCA researchers have concluded that SARS will not mimic the 1918 influenza pandemic, given its relatively slow spread. However, unlike colds and the flu, which often abate as warm weather arrives, SARS is unlikely to abate in North America, they said. The CDC has warned that the virus mutates rapidly and must be closely tracked.

Sidebar: Technology Triumph
The Canadian effort to decode SARS is a veritable showcase of sequencing and computing technology. The Genome Sciences Centre (GSC) in Vancouver used a combination of IBM's biggest commercial servers, Linux workstations, 100-megabit networking, 802.11b wireless communications, and PDAs capable of wireless bar coding. The sequencing was done on Applied Biosystems (ABI) 3730 XL DNA sequencers.

"We've been in partnership with [GSC and the British Columbia Cancer Agency] for three years," said IBM Life Sciences' Sal Causi, based in Toronto. Causi said the BCCA has almost 400 nodes based on IBM xSeries and pSeries Regatta servers. GSC says it has 90 of those nodes running off eight-way eServer xSeries 440s in a Linux cluster using Beowulf technology. Various nodes are connected via a 100-megabit network over the University of British Columbia's BC Net. The storage capacity for the systems exceeds 4 terabytes.

"We did not anticipate it would be particularly taxing on the hardware," said GSC's director of bioinformatics, Steve Jones, referring to the trivial size of the SARS genome compared to those of mammals. "Everything just worked from the IT end. It was extremely exciting, but we've done most of this stuff before, usually in large mammalian sequencing projects."

The software workhorses were Phred, a base caller for reading the sequencing data, and Phrap, the assembly program for shotgun sequencing. Both were developed by University of Washington researcher Phil Green and are available free to academic research centers.
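
For readers unfamiliar with shotgun assembly, the idea can be sketched in a few lines of Python: short overlapping reads are merged, largest overlap first, until no overlaps remain. This is a toy illustration only, not Phrap's algorithm, which is far more elaborate and also weighs the base-quality values that Phred produces. The function names, the three reads, and the minimum overlap of three bases are all assumptions made for the example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 when the best match is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: keep merging the best-overlapping pair of reads."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        # Find the ordered pair (i, j) with the largest suffix/prefix overlap.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up 8-base reads that tile a made-up 16-base sequence.
reads = ["CCTTAGAC", "ATGGCGTA", "CGTACCTT"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTTAGAC']
```

A real assembler must also cope with sequencing errors, reads from either strand, and repeated regions, none of which this sketch attempts.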

The entire effort shows that GSC is making good on its mission statement, which is "to find innovative means to automate the sequencing and fingerprinting process, develop cost-effective measures that will make such research financially viable and utilize state-of-the-art computing facilities to collect, mine, analyze and disperse data collected at this and other genome facilities."

The cost of the project was not immediately available, but Nicole Adams, BCCA spokeswoman, said GSC would likely be reimbursed by Genome Canada, the primary funding organ for genomic and proteomic research in Canada.