By Kevin Davies
May 22, 2012 | The National
Center for Supercomputing Applications (NCSA) has selected T-Finity tape
libraries from Boulder, CO-based Spectra Logic to provide hundreds of petabytes
of data storage for its upcoming Blue Waters supercomputing system, one of the
most powerful supercomputing systems in the world.
Based at the
University of Illinois at Urbana-Champaign and funded by the National Science
Foundation, Blue Waters will host one of the
world’s largest active file repositories stored on tape media, scaling to a total capacity of 380
petabytes (PB) within the next two years or so. That is the equivalent of
5,054 years of HDTV video, or a stack of books over nine times the distance
from the earth to the moon.
Scientists
will use the supercomputer for a variety of research applications, including hurricane and tornado
modeling, simulations of the Big Bang, and biomolecular structure and modeling studies,
where it will serve as a “computational microscope” (see below).
Michelle Butler, senior technical
program manager for network and storage engineering for the NCSA, struggles to
contain her enthusiasm for the behemoth currently being assembled. “It’s a fantastic machine,” she tells Bio-IT World. “I’ve been here for 22
years, doing storage all along. I just can’t wait to get going on this. I’ve
got a 2 Petabyte machine [on the side]. But I want to see the 25 PB [disk
storage] run and scream!”
Not Top 500
Blue Waters
would almost certainly rank in the top 3 or top 5 supercomputers in the
semi-annual Top 500 rankings, says Butler, if
the NCSA chose to compete. “But that’s not a very good use [of our resources],”
she says. “It [the computer] is to be used for science and engineering. The
code for the Top 500 run is not a good judge of supercomputing. It’s just a
line in the sand. It doesn’t do any I/O or any real scientific work. We propose
a scientific app to gauge these very large machines. That’s what they’re
supposed to be running, right?”
Six teams, including that of Klaus Schulten, physics professor at
the University of Illinois at Urbana-Champaign, have been using the Early
Science System (ESS) for a few months. Blue Waters goes into full production
this fall, with 26 research teams currently scheduled for time. “Our scientists
say they’ve gotten more results in 6-8 weeks than the last 3 years,” says
Butler.
Blue Waters ESS
consists of 48 Cray XE6 cabinets (about 16% of the final total of 303) with 2 PB of
online disk storage. Cray took over the project last summer, when IBM pulled
out of a contract with the University of Illinois. Within two weeks of the
contract being signed with NSF, Butler says, the first machines were already on
the floor. “The machine is fully delivered,” she says. “All the cabinets are
there. We’re ramping up—the last truck arrived last Friday.”
When
complete, Blue Waters will have 25 PB of disk storage. (That storage system, Sonexion, is provided by Xyratex, a partner of Cray.) From disk, data will
move to storage in another location, which includes a near-line system with
a 1.2-PB cache. “This is where the Spectra Logic library comes in,” says Butler.
“The environment will be 380 PB of raw storage. It will have 244 tape drives,
starting this summer, and scale to 366 tape drives next summer.”
The Blue
Waters file system actually exceeds read/write rates of 1 terabyte/sec,
settling at about 2.2 PB/hour, and the machine itself boasts more than 200,000 cores. “It’s 4-5
times faster than any machine in the U.S.,” claims Butler. NCSA is assembling
teams of experts to help the research teams get the most out of this immense
resource.
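As a rough check on how the two quoted rates relate, the unit conversion can be sketched in Python. This is only an illustration: it assumes decimal units (1 PB = 1,000 TB), which the article does not specify, and the function names are hypothetical.

```python
# Assumption: decimal (SI) units, 1 PB = 1,000 TB. The article does not
# state which convention its figures use.
TB_PER_PB = 1_000
SECONDS_PER_HOUR = 3_600


def tb_per_sec_to_pb_per_hour(tb_per_sec: float) -> float:
    """Convert a rate in TB/sec to PB/hour."""
    return tb_per_sec * SECONDS_PER_HOUR / TB_PER_PB


def pb_per_hour_to_tb_per_sec(pb_per_hour: float) -> float:
    """Convert a rate in PB/hour to TB/sec."""
    return pb_per_hour * TB_PER_PB / SECONDS_PER_HOUR


# Figures quoted in the article.
quoted_tb_per_sec = 1.0
quoted_pb_per_hour = 2.2

hourly_at_quoted_rate = tb_per_sec_to_pb_per_hour(quoted_tb_per_sec)
per_second_at_settled_rate = pb_per_hour_to_tb_per_sec(quoted_pb_per_hour)

print(f"1 TB/sec held for an hour: {hourly_at_quoted_rate:.1f} PB")
print(f"2.2 PB/hour averaged per second: {per_second_at_settled_rate:.2f} TB/sec")
```

Under that assumption, 1 TB/sec held for a full hour would move 3.6 PB, so the 2.2 PB/hour figure corresponds to roughly 0.6 TB/sec averaged over the hour.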
Virus Viewing
The leading life sciences
application on Blue Waters so far is managed by the University of Illinois’
Klaus Schulten, who leads a center for computational biology and software
development. “Our software is used by over 250,000 people. It’s particularly
successful on the most powerful computers in Japan, China, Europe and America,”
he says.
Schulten’s group is using Blue
Waters to perform molecular modeling studies of the HIV capsid. During HIV
infection, the virus’s capsid shell must open quickly to release its contents.
Schulten likens the task to finding the little perforation on a bag of
peanuts, a spot his team has successfully located on the capsid. Now his group is trying to
understand how capsid release is programmed in order to identify new drug
targets to combat infection.
“A computer alone can’t tell the
story—you have to do experiments,” says Schulten. In collaboration with
Angela Gronenborn (University of Pittsburgh), Schulten’s group is performing a
clever trick: juxtaposing the detailed crystal structure of the individual capsid
protein with lower-resolution electron microscopy results of the native capsid.
“The computer knows the crystal
structure of the individual proteins at high resolution, [lined up] like
football players singing the national anthem… As the game starts, we can’t see
the details of the players. So we try to superimpose the structures onto the
action of the game.”
“We’re usually very proud if we
can do a 1-million atom simulation,” says Schulten. “Now it’s 60 million atoms…
We literally have 20-100 times more data [than a few years ago]—we have to
put [the data] somewhere!”
Spectra Vision
The tape
storage RFP (request for proposals) process won by Spectra Logic was very open,
says Butler. “We didn’t want to narrow the research into what was [technically]
possible,” she says. NCSA outlined the ideal footprint, data capacity,
number of media slots, and reliability, and requested performance of 100 gigabytes/sec.
From a pool of ten vendors, four were seriously evaluated over about a year.
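To put the requested 100 gigabytes/sec in context, the following Python sketch combines it with the 380 PB capacity and the 244 and 366 tape-drive counts quoted earlier. It rests on assumptions the article does not confirm: decimal units, a rate sustained without interruption, and load spread evenly across every drive. Whether the 100 gigabytes/sec target applies to the tape drives alone is not stated, and the function names are hypothetical.

```python
# Figures quoted in the article. Units are assumed to be decimal (SI).
CAPACITY_PB = 380
TARGET_GB_PER_SEC = 100
DRIVE_COUNTS = (244, 366)  # first summer, following summer

GB_PER_PB = 1_000_000
MB_PER_GB = 1_000
SECONDS_PER_DAY = 86_400


def days_to_stream(capacity_pb: float, rate_gb_per_sec: float) -> float:
    """Days needed to move capacity_pb at a constant rate_gb_per_sec."""
    seconds = capacity_pb * GB_PER_PB / rate_gb_per_sec
    return seconds / SECONDS_PER_DAY


def per_drive_mb_per_sec(rate_gb_per_sec: float, drives: int) -> float:
    """Per-drive rate if the aggregate rate is split evenly across drives."""
    return rate_gb_per_sec * MB_PER_GB / drives


fill_days = days_to_stream(CAPACITY_PB, TARGET_GB_PER_SEC)
print(f"Streaming {CAPACITY_PB} PB at {TARGET_GB_PER_SEC} GB/sec: {fill_days:.1f} days")

for drives in DRIVE_COUNTS:
    share = per_drive_mb_per_sec(TARGET_GB_PER_SEC, drives)
    print(f"{drives} drives: about {share:.0f} MB/sec each")
```

Under these assumptions, writing the full 380 PB at the requested rate would take about 44 days of continuous streaming, and an even split works out to roughly 410 MB/sec per drive with 244 drives or about 273 MB/sec per drive with 366.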
According to
Bill Kramer, deputy director of the Blue Waters project, the T-Finity solution
appealed because of its “high
enterprise-level performance, ready data accessibility and massively scalable
capacity. We are confident it will provide our user community with fast,
reliable access to the massive volumes of critical data stored within Blue
Waters’ Petascale near-line file repository.”
The Spectra
tape libraries will enable NCSA to keep all of its near-line data accessible in
an active repository. NCSA will begin by deploying four 17-frame T-Finity tape
libraries in the first year of operation, followed by two additional libraries
the next year.
“We are pleased to partner with NCSA and support one of the most powerful and
cutting-edge supercomputers in the world,” said Nathan Thompson, Spectra
Logic’s founder and CEO. “It is gratifying to see tape-based storage play a
major role in one of the largest, best practice HPC deployments to date
and to help support the important scientific breakthroughs and advancements the
Blue Waters project will enable.”
The storage solution was architected by storage integrator NET Source, a member
of Spectra Logic’s SpectraEDGE partner program. The
T-Finity will be integrated with IBM’s enterprise TS1140 Technology tape
drives. “Given Spectra’s support of TS1140 Technology and proven storage
solutions, Spectra Logic was clearly the ideal solution to meet Blue Waters’
high performance, data-intensive storage needs,” commented NET Source president
Joe Fannin.