
Sequencing, Sequencing, Sequencing

BGI-Shenzhen tackles the cute, the edible, and pretty much everything else.

By Alissa Poh

November 10, 2009
| SHENZHEN, CHINA—Boxing Day 2008: The Luohu border between Hong Kong and mainland China is crowded, smoky, and noisy. With my visitor’s visa to Shenzhen in hand, I’m cleared to visit the Beijing Genomics Institute’s (BGI’s) Shenzhen-based sequencing facility. Accompanied by my father and brother-in-law, I wave down the nearest cab. The cab driver doesn’t have the faintest idea where BGI is located. As none of us can handle his thick Mandarin accent, I’m forced to call Zhuo Li, vice president of BGI’s health care division. I hand the phone to the driver, and happily we’re deposited at BGI’s main entrance. It’s a tall gray-and-glass structure, distinctly newer and shinier than the neighboring buildings. My companions head across the street for a late breakfast (frog legs), and I wander in to meet Li.

The lobby lacks a smiling receptionist, tasteful paintings, or piped-in music. Save for supercomputers humming away within a glass-enclosed area and several ping-pong tables—naturally—in a corner, it’s Spartan. My sense is BGI’s staff wasn’t going to spend any time on décor that they could otherwise devote to research.

Li is tall, lean, and intense. He greets me in immaculate English, and escorts me on a whirlwind tour of the institute. It’s eleven floors of long hallways, each with its respective research unit—cloning, bioinformatics and the like—on one side, posters papering the opposite wall. Lab-coated staff are everywhere, poring over printouts, peering into cell-culture hoods, shuttling racks of test tubes from one lab to another. Most ignore me, apart from the occasional half-diffident glance.

BGI Beginnings
Li succinctly answers my questions, but as I discover, Chinese scientists are rather more close-mouthed than their Western colleagues. Getting them to elaborate beyond the facts is akin to tooth extraction.

It all started in August 1998 with the Human Genome Project, which geneticist Yang Huanming and three like-minded countrymen, all recently returned from U.S. postdocs, saw as the perfect way to position China on the genomics and sequencing stage. Yang’s plan was to utilize the Chinese Academy of Sciences (CAS), as its Institute of Genetics already had its own Human Genome Center. But he quickly concluded that CAS, bound by traditions, was lagging behind the rest of the world. In early 1999, he broke away, setting up BGI as a private, non-profit research organization. A few months later, at a conference at the Wellcome Trust Sanger Institute in the U.K., Yang announced China’s intention of becoming a global player in genomics.

Naturally, he was asked whether he had the money to realize his vision. As he later confessed to Science, he lied. Just four months after the conference, CAS funded three Chinese sequencing centers to tackle 1 percent of the human genome, with BGI receiving over half of the total award. But Yang didn’t know it at the time, standing at the podium and looking out at a sea of skeptical faces. He gambled on the funds somehow materializing, figuring that what the audience didn’t know couldn’t hurt them, or BGI’s image.

Three years later, the genomics world took notice when BGI metamorphosed from one man’s intangible dream to the cover of Science, having outraced its global competition to shotgun-sequence the indica rice genome. A reform-minded China proved that the will to succeed, spirited nationalism, and sheer manpower can be a potent combination. BGI split its sequencing team into 12-hour shifts so the machines could run 24/7 for the 74 days it took to finish indica. Dispensing with the commute between workplace and home, staff catnapped in hallways or simply dozed in their chairs.

By 2002, BGI had outgrown its initial home and relocated to an industrial park in Beijing, with an additional campus in Hangzhou. The original Beijing unit assumed responsibility for all commercial and outsourcing projects, while the Hangzhou branch focused on sequencing and academic research. Then in 2007, BGI made a major investment in next-gen sequencing technology—Illumina’s Solexa—and moved its headquarters to Shenzhen. The director is 33-year-old Jun Wang, a handsome, highly decorated Ph.D. from Peking University whose interest in genomics dates back a decade to the Human Genome Project.

The Chinese Way
New employees at BGI-Shenzhen don’t need reminding about the institute’s game plan. It’s right in their faces: poster-style and of billboard proportions, spanning an entire hallway. Printed in giant font, dead center, is a four-word slogan—“Sequencing is the basic!” It’s the foundation for moving into broader biological systems and processes—analysis of DNA variation and global methylation, protein networks, and metagenomics, ultimately providing individualized health care and agricultural advances.

Large-scale research is a mainstay of BGI-Shenzhen, and the “Tree of Life” project among its most prominent. Silkworms, cucumbers, chickens, and pigs are but a few examples of organisms large and small that the institute’s scientists have already sequenced. On the wall-sized poster, they’re lumped into three groups: animals, plants, and microorganisms. Animals are labeled “economic” (ducks, for instance); “endangered” (the Chinese river dolphin); or “model” (Drosophila). Similarly, microorganisms are categorized as industrial, pathogenic, or environmental. Projects past, present, and future are annotated, respectively, by red flags, green stars, and yellow circles.

BGI-Shenzhen is perhaps best known for the panda genome, as well as a Han-Chinese individual whose genome was but the third announced and published worldwide, after Watson and Venter.

Back in February 2008, the institute launched its International Giant Panda Genome project, aiming to sequence and assemble the draft sequence within six months. The honor fell to Jingjing, the prototype for the Beijing Olympics’ panda mascot. The project was wrapped up by October. This ranked among China’s top ten technology accomplishments for 2008, and is viewed as a major step toward understanding why pandas eat only bamboo and have poor libido. It also suggested that rather than being related to raccoons, they likely hail from the bear family.  

The first (human) Asian sequence is the starting point for BGI-Shenzhen’s Yanhuang project—so named for the Mandarin saying yan huang zi sun, or “descendants of Yan and Huang,” two emperors from ancient times that many Chinese consider their earliest ancestors. The institute has its sights set on sequencing at least 100 additional Chinese genomes, to better study genetic variations among China’s different populations.

“It got a lot of media attention,” Li says of the November 2008 publication in Nature. “Not long afterwards, we received RMB10 million [$1.46 million] from an anonymous Chinese donor. He’s interested in decoding personal genetic information to improve biomedical research, and wants to help this project move forward.”

Information gleaned from Yanhuang will contribute to the 1000 Genomes project, aimed at creating the most finely tuned reference map of human genetic variation to date, down to the 1 percent level. BGI-Shenzhen is one of the key players in this undertaking.

Other initiatives include a “strategic alliance,” since early 2008, with Knome, George Church’s personal genomics company. The latter gets prime access to BGI’s capabilities in whole-genome sequencing, assembly, and annotation for its private clients. BGI-Shenzhen is also one of 13 academic and industrial participants in MetaHIT, a four-year project financed by the European Commission to study connections between genes of the human intestinal microbiota and our health, zooming in on inflammatory bowel disease and obesity. In addition, a Sino-Danish diabetes project involves deep sequencing of exons and other conserved genomic regions from more than 4,000 individuals, in an attempt to discover genetic variations linked with obesity, type 2 diabetes, and hypertension.

“We’re a completely private organization, with an annual budget of [$30 million],” Li says. “So to feed ourselves and carry out all our projects, we rely on revenue from these collaborations, and our spin-off companies [ten in total].” BGI-Shenzhen also benefits from the generous support of Shenzhen’s municipal government.

‘Omics Know-How 
BGI-Shenzhen relies heavily on Illumina’s Genome Analyzers for its myriad sequencing projects; at last count, in April 2009, its fleet had expanded to 29 (eight are in Hong Kong). The machines are kept in continuous production, churning out data at a daily rate of 60 gigabases (Gb). “We could sequence the human genome 20 times a day,” Li says, “but we probably won’t load all our machines with just the one sample.” He’s not joking, and yet I wonder if one can’t take the last half of his statement as a bit of deadpan humor.
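For readers who want to check Li's arithmetic, here is a minimal sketch. It assumes a haploid human genome of roughly 3 gigabases, a figure the article does not state, and it treats "sequencing the genome" as a single 1x pass over it. The 30x coverage target in the second function is a hypothetical illustration, not a number from BGI.

```python
# Figures quoted in the article.
DAILY_OUTPUT_GIGABASES = 60.0
MACHINE_COUNT = 29

# Assumption (not from the article): one haploid human genome is ~3 gigabases.
HUMAN_GENOME_GIGABASES = 3.0


def genome_equivalents_per_day(daily_output, genome_size=HUMAN_GENOME_GIGABASES):
    """Number of 1x passes over a genome that fit in one day's raw output."""
    return daily_output / genome_size


def days_for_coverage(coverage, daily_output, genome_size=HUMAN_GENOME_GIGABASES):
    """Days of full-fleet output needed to reach an average coverage depth."""
    return coverage * genome_size / daily_output


def output_per_machine(daily_output, machines):
    """Average daily output per instrument, assuming an even split."""
    return daily_output / machines


if __name__ == "__main__":
    print(genome_equivalents_per_day(DAILY_OUTPUT_GIGABASES))
    # Hypothetical 30x coverage target, for illustration only.
    print(days_for_coverage(30, DAILY_OUTPUT_GIGABASES))
    print(round(output_per_machine(DAILY_OUTPUT_GIGABASES, MACHINE_COUNT), 2))
```

Under these assumptions the "20 times a day" figure is simply 60 divided by 3. A genome sequenced to real working depth would take proportionally longer.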

Might they eventually switch to another platform such as Life Technologies’ SOLiD system? “We’ve developed all our software and applications based on Illumina, which is why we mainly use their technology,” Li responds. “But we do have two SOLiD machines, and we may get more. It doesn’t necessarily mean we’ll switch; we’d like to make good use of both [technologies].”

The software developers work within BGI-Shenzhen’s energetic bioinformatics group, one of the largest in China, if not the world, directed by seven-year veteran Ruiqiang Li. “I don’t know any other place with so many bioinformaticians [200 and counting] under one roof,” Zhuo Li affirms, adding that many are 25 or younger, with some of the brightest stars barely out of college. Designing novel analysis tools capable of handling short-read sequences by the ton is among the group’s specialties. Their Short Oligonucleotide Analysis Package (SOAP), for instance, includes de novo software where assembling large genomes—panda, human and the like—takes just about two days.

BGI-Shenzhen also has an active health care platform, which Li manages. They’ve developed a variety of affordable, quality diagnostic tests—for instance, tissue-matching via the gold standard Sequencing-Based Typing (SBT). China is seeing an increasing number of bone marrow transplants, yet most diagnostic laboratories remain unequipped with the expensive SBT commercial kits. Hence BGI’s decision to manufacture their own SBT reagents and software.

Public health and smarter disease surveillance are additional foci, particularly “digitalized health,” complete with databases for personal health records. Li’s platform has successfully introduced this improved system to Chinese communities in Yunnan Province, Inner Mongolia, and Tibet—nearly 25,000 individuals in all—and with the support of their local government, they’re doing the same for residents in Shenzhen’s Yantian district, which surrounds the institute.

Not surprisingly, BGI dabbles in cloning and genetic engineering, mainly for agricultural and animal husbandry purposes. Researchers in this division use handmade cloning (HMC) technology, a cheaper and simpler alternative, to produce transgenic pigs. They’ve already created a porcine model of Alzheimer’s, collaborating with Danish scientists.

Looking Ahead
Rather than dwell on how many years they’ve been in existence, folks at BGI consider their institute “as young as genomics.” And much like the field itself—which has accelerated at lightning speed within the last decade—BGI now has five additional branches across China, plus a presence in Hong Kong, Denmark, and the U.S. (California). Coming from a tiny brick building devoid of staff, equipment, or money, it’s phenomenal growth. So what does BGI see in its future?

A “Personal Genomics Industry” by 2012, for starters. BGI believes the cost of human genome sequencing could soon drop below $1,000, ushering in an era in which digitalized and personalized health records are affordable. They’ve estimated the Chinese market for such services at a fat RMB1 trillion ($146.5 billion).

And of course, more sequencing—of the cute, the edible, and anything else imaginable. BGI-Shenzhen scientists are already working on the emperor penguin, the Tibetan antelope, and the polar bear. “These [creatures] will help us understand how living organisms adapt to extreme environments,” Li says. “And we think it is fun as well. Actually, we want to sequence everything—and we will.” 

Blueprint of the Super Computer Center
Genomics is a typical data-intensive computational application. The sequencing platform currently generates more than 10 Tb of raw data every day.

• By the end of 2008: 20 Tflops (tera floating-point operations per second), 1 PB storage
• By mid-2009: 50 Tflops, 5 PB storage
• By the end of 2009: 100 Tflops, 10 PB storage
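To put the storage milestones in context, the sketch below estimates how many days of raw output each one could hold. It rests on assumptions the sidebar does not confirm: that "10 Tb" means 10 terabytes per day in decimal units, that the rate stays constant, and that storage holds nothing but raw data.

```python
# Assumption: "10 Tb" of raw data per day is read as 10 terabytes (decimal).
RAW_TB_PER_DAY = 10.0
TB_PER_PB = 1000.0

# (milestone, Tflops, storage in PB), as listed in the sidebar.
MILESTONES = [
    ("end of 2008", 20, 1),
    ("mid-2009", 50, 5),
    ("end of 2009", 100, 10),
]


def runway_days(storage_pb, raw_tb_per_day=RAW_TB_PER_DAY):
    """Days until storage fills if only raw data is kept, at a constant rate."""
    return storage_pb * TB_PER_PB / raw_tb_per_day


if __name__ == "__main__":
    for label, tflops, storage_pb in MILESTONES:
        print(f"{label}: {tflops} Tflops, {storage_pb} PB, "
              f"about {runway_days(storage_pb):.0f} days of raw data")
```

In practice, intermediate files and finished assemblies would share the same disks, so the real margin would be smaller than this estimate.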

System Architecture
• Computational capability: >100 Tflops; Linux cluster system with 4-way x 4-core CPUs and 32-64 GB RAM per node; ~500 computing nodes;
• Storage: 10 PB, large-scale parallel file system, high-speed I/O;
• Network: 10 Gb computing Ethernet, 100 Mb management Ethernet;
• System: professional high-performance Linux cluster system and job management;
• Software: bioinformatics software developed by our own team
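The cluster figures above imply a rough per-core performance target, which the following sketch works out. It assumes that "4 ways x 4 cores" means four CPU sockets of four cores each per node, and it takes the approximate node count and the 100 Tflops goal at face value. The sidebar does not spell out either reading.

```python
NODES = 500            # "~500 computing nodes"
SOCKETS_PER_NODE = 4   # assumption: "4 ways" means four CPU sockets per node
CORES_PER_SOCKET = 4   # "4 cores" per CPU
TARGET_TFLOPS = 100.0  # ">100 Tflops"


def total_cores(nodes, sockets_per_node, cores_per_socket):
    """Total CPU cores across the cluster."""
    return nodes * sockets_per_node * cores_per_socket


def gflops_per_core(target_tflops, cores):
    """Average per-core throughput needed to reach the aggregate target."""
    return target_tflops * 1000.0 / cores


if __name__ == "__main__":
    cores = total_cores(NODES, SOCKETS_PER_NODE, CORES_PER_SOCKET)
    print(cores, "cores")
    print(gflops_per_core(TARGET_TFLOPS, cores), "Gflops per core")
```

On this reading, the design calls for about 8,000 cores, each contributing 12.5 Gflops on average to meet the stated goal.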

RMB60 million ($8.79 million)

This article also appeared in the November-December 2009 issue of Bio-IT World Magazine.