St. Jude Announces Availability Of Clinical Genome Sequencing Data In Real Time Via Cloud

July 11, 2019

By Benjamin Ross

St. Jude Children's Research Hospital has announced the real-time availability of clinical genome sequencing data on the St. Jude Cloud. The data will be uploaded to the cloud in a private, secure environment on a monthly basis, according to the hospital.

The announcement comes more than a year after the initial launch of the St. Jude Cloud. At the time, the institution reported the platform would allow scientists to explore more than 10,000 whole-genome, 6,000 whole-exome, and 1,500 RNA-Seq samples paired with clinical data from more than 10,000 pediatric cancer patients and survivors, with 10,000 whole-genome sequences expected to be available by 2019.

The initial release of data occurred in May, with a second round from 1,000 patients scheduled for July. In an official statement, St. Jude said data from approximately 500 subjects will be available every year for the foreseeable future.

Making data publicly available in real time was a goal for the platform from the beginning, Scott Newman, who leads the Bioinformatics Analysis group at St. Jude, tells Bio-IT World. "The real-time data initiative is a shift in how we think about data sharing."

Traditionally, sharing data from sequencing studies was the final step, coming only after researchers had identified a group of patients, applied for a grant to sequence their genomes, analyzed the data for a few years, and published a paper. The new model for sharing sequencing data is prospective, Newman says, which considerably speeds up the process.

"Now, as soon as clinical duties are complete, the case moves over to the research environment where it's de-identified and … the data gets uploaded to the cloud," says Newman. The real-time availability of the data in a centralized location allows researchers to avoid downloading large files of raw data and instead directly apply their bioinformatics tools to it.

"It has become exceedingly difficult for researchers to download that [kind of] data to their host institution," Newman says. "With whole genome sequencing, the average patient has a bundle of about 100-200 gigabytes of data, and we've got hundreds and now thousands of patients." The St. Jude Cloud provides an online space where researchers can find, and more easily ask scientific questions of, the data.

Newman says the leadership team at St. Jude, including Jinghui Zhang, chair of St. Jude's Department of Computational Biology, and James Downing, president and CEO of the hospital, recognized the need to develop a solution quickly, starting with better handling of St. Jude's own data.

St. Jude offers comprehensive whole-genome, -exome and -transcriptome sequencing to every patient who consents to the testing, Newman says. These bundles of sequencing data are easily shared internally. The far greater challenge was making the data available to external researchers, he adds.

The difficulty is that the data aren't attached to any research studies, says Newman, which left St. Jude uncertain about how to make them more widely accessible. But doing so would help fill a void in the pediatric space where "discovery is never over." By understanding the genetic drivers of pediatric cancers, he says, "we can better diagnose, better stratify and assign risk groups, better target therapies, and better understand cancer predisposition."