Biotechs, Pharma Talk Cloud at AWS Symposium

By Allison Proffitt

June 8, 2021 | At the AWS Healthcare and Life Sciences symposium last month, several biotech, pharma, and genomics organizations presented their cloud-enabled workflows and highlighted how Amazon Web Services is enabling their research and efforts.

Wilson To, Worldwide Head of Healthcare for AWS, opened the keynote presentation by crediting Amazon’s success to founder Jeff Bezos’ main tenets: put the customer first, invent on behalf of the customer (and be willing to fail), and finally be patient with outcomes. “It’s allowed us to be stubborn with our vision while being very flexible with the details,” To said.

For healthcare, To said AWS aims to accelerate the digitization and use of health and life sciences data to increase person-centered care, and has developed a dedicated healthcare practice over the past eight years to better serve the space. He highlighted Amazon HealthLake and Comprehend Medical as purpose-built services to advance healthcare ends.

AWS has worked closely with customers to deliver on the personalization of healthcare, To said. Here are a few of the highlighted projects and efforts.

For Insilico Medicine, Petrina Kamya outlined how Insilico’s drug discovery team (made up of AI scientists, computational chemists, structural biologists, medicinal chemists, and software developers) uses three drug discovery and development platforms hosted on AWS to discover novel targets and compounds and forecast clinical trial success. PandaOmics is the AI hypothesis-generation engine Insilico uses for ‘omics data analysis and interpretation and target identification. Chemistry42 designs small molecules using a multiparameter optimization approach to de novo drug design, ingesting experimental data as well as conducting virtual screenings against a library. InClinico, currently under development, is the platform for predicting how lead-like compounds will perform in clinical trials, including safety issues, efficacy, and trial success. All three platforms, Kamya said, are validated in Insilico’s own drug discovery pipeline. She highlighted 12 current programs, two of which are nearing IND-enabling.

Eli Lilly team members Kent Supancik and Chris Blessing reported their efforts to transform Lilly into a data- and insights-driven powerhouse. Their ongoing Enterprise Data program first seeks to improve operational efficiencies, reducing hardware and software redundancies, data maintenance costs, support costs, costs of redundant data and more. “After year one, we’ve been able to accelerate our return-on-investment projection by over 50%,” Supancik said, “meaning we’re going to cut that period of time in half that we believe we can get to those operational efficiencies by leveraging AWS capabilities.” The advantages won’t be limited to clearing redundancies. Supancik said the next step is to optimize top-line use cases including in research, real-world evidence, and sales and marketing. Lilly took a federated approach and worked with teams to deliver data to the use cases while keeping the platform agile and incremental, which built enthusiasm for the platform. AWS has been a crucial partner in this effort, Blessing said, serving as the integrated storage provider plus a holistic partner facilitating peer connections, Well-Architected reviews to identify best practices and areas for adjustment, and training.

George Asimenos gave an update on DNAnexus’s development for the UK Biobank of a cloud-based Research Analysis Platform. Announced last August, the platform is being developed by DNAnexus with AWS. (AWS sponsors the petabytes of storage needed for the main UK Biobank data.) The Research Analysis Platform lets approved researchers gain access to a specific subset of data they have requested for their stated research goals. Within the platform, researchers can run analyses via off-the-shelf analysis tools, their own Linux-based tools, or interactive tools like the IGV Genome Browser, JupyterLab, or DNAnexus’ Cohort Browser. The data are held in an AWS S3 bucket while the researchers’ virtual project spaces and analyses are provided in the DNAnexus layer. The data delivered to the researchers are “soft copies” of the originals, Asimenos pointed out, not multiple copies of the UK Biobank data. In addition, these soft copies are protected by pseudonymization. DNAnexus changes the identifiers in both file names and internal data before delivering data to research users. “It protects the original identifiers, and makes each researcher see a unique identifier… specifically for that researcher,” Asimenos said. Changing file names isn’t that hard, he conceded. But pseudonymizing file content—while maintaining a single file within S3—is much harder, he said. DNAnexus does this via their watermarker service, running in a Linux container in an Amazon EC2 instance adjacent to the S3 buckets. “We’ve been able to do this in such a way that adds zero overhead to the transfer,” Asimenos said, “which allows us to change the header on the compressed file without even decompressing the file.” The process moves researchers closer to the goal: democratized access to the UK Biobank dataset.
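As a rough illustration of the per-researcher pseudonymization idea described above, the Python sketch below derives a different but stable pseudonym for the same participant depending on which researcher is asking. It is not DNAnexus’s watermarker service, whose internals were not described in the talk; the keyed-hash (HMAC) approach, the function names, the key values, and the sample identifier are all assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonymize(participant_id: str, researcher_key: bytes, length: int = 12) -> str:
    """Return a stable pseudonym for participant_id that is specific to one researcher.

    Assumption: each researcher (or project) has its own secret key held by the
    platform. A keyed hash keeps the mapping deterministic per researcher while
    making pseudonyms from different researchers unlinkable without the keys.
    """
    digest = hmac.new(researcher_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def pseudonymize_filename(filename: str, participant_id: str, researcher_key: bytes) -> str:
    """Swap the original identifier in a file name for the researcher-specific one."""
    return filename.replace(participant_id, pseudonymize(participant_id, researcher_key))


if __name__ == "__main__":
    # Hypothetical keys and identifier, for demonstration only.
    key_a = b"researcher-a-secret"
    key_b = b"researcher-b-secret"
    original = "P1000001"

    print(pseudonymize_filename(f"{original}_chr1.vcf.gz", original, key_a))
    print(pseudonymize_filename(f"{original}_chr1.vcf.gz", original, key_b))
```

The two printed file names differ from each other and from the original, yet each researcher sees the same pseudonym every time. The harder part Asimenos described, rewriting identifiers inside a compressed file without decompressing it, is not attempted here.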

Peter Goodhand, CEO of the Global Alliance for Genomics and Health, gave an update on the scale of genomics data worldwide and outlined the case for global genomic data sharing. More data can clearly better demonstrate patterns in health and disease, increase statistical significance, and increase accuracy in diagnoses and precision medicine, but we have moved past the idea that a few groups can hold all the needed data. Instead, we must share resources, and Goodhand argued for a needed paradigm shift from copying and downloading our shared data to data visiting. There won’t be one supreme data sharing model, Goodhand said, but centralized knowledge bases, data commons, federated hub and spoke models, and linkages between distributed datasets will all have a role to play. Among its goals, GA4GH is working to help the genomics and health communities take full advantage of modern cloud environments by bringing algorithms to the data, Goodhand said. “We thought for many years as we saw this massive amount of data emerge that it could be overwhelming, a tsunami of data, that we wouldn’t be able to keep up with it from a technical point of view. So we organized in a way to turn it into a treasure trove of data that could be available to humanity. And the key to doing that in many, many instances is the use of federation and the use of the cloud.”

In the healthcare providers and payors track, Allison Heath at Children’s Hospital of Philadelphia (CHOP) gave insight into the Center for Data Driven Discovery in Biomedicine (D3b), a team of integrated scientists, clinicians, and researchers working to accelerate the translational space for healthcare. The Kids First Data Resource Center is a platform CHOP has built over the past few years focusing on childhood cancer and structural birth defects through collaborative research and data sharing. The program is built on AWS through CHOP’s partnership with the NIH STRIDES program. The Kids First portal is generally users’ first stop and it’s open to anyone, Heath said. The portal offers cohort building and analysis tools through Gen3 (offering framework services for approved users), Cavatica (with Seven Bridges), PedcBioPortal (knowledge base integrations) and more.

Kids First contains whole genome sequences for patients and many of their families. “We’re bringing in a lot of data here,” Heath said, “and we’ve created an infrastructure on S3… It’s really nice because then from that location—the storage—we can go in and look at different things, layer different tools, share it with different policies, etc.” For users, an on-demand Spark Cluster on AWS EC2 delivers the data they have permission to see to Zeppelin notebooks.

CHOP is now working on a similar pipeline for clinical data as well, Heath said, to link the research findings from genomics to the patient’s clinical experience. After looking at the landscape for clinical data tools, Heath said CHOP has landed on a FHIR-based approach. While FHIR was originally designed for electronic medical records, Heath has found that the FHIR framework has been easily extended to add research data types and deidentification. “We’ve been working with Amazon HealthLake to really start to bring in the research data that comes in from surveys or case report forms or other kinds of data matrices that people generate and be able to bring that into one system,” she said.

Tom Schoenherr, CCO of Ambry Genetics within Konica Minolta Precision Medicine (KMPM), looked at the role diagnostics and genetic testing can play in precision medicine. The biggest challenge, Schoenherr said, is the size of the data. “Our vision is not to just generate clinical data, both imaging data, pathology data, genetic information, but really to be able to label that information, structure it, integrate it, and allow our scientists and our bioinformatics teams to mine that in collaboration with our pharma partners and clinicians in an effort to drive incremental models and predictors for us to work hand in hand with pharmaceutical companies to drive new therapies as well as biomarkers,” Schoenherr said. He calls the goal “integrated diagnostics” and credits Amazon HealthLake with enabling the vision. KMPM is depending on AWS to power two platforms. CARE is a population health tool for assessment, risk and education. From Ambry’s high-risk screening tool through genetic testing, results return, and post-test genetic counseling, CARE manages the whole pipeline. CARE is already implemented at 215 sites reaching more than 3,000 physicians and more than 60M patients, Schoenherr said. CARE is solving problems for KMPM’s customers, he added.

LATTICE is an in-development precision diagnostics platform to integrate multi-omic, multi-modal data. KMPM and AWS are partnering on this platform with the aim of developing a global laboratory network. “This program, and this initiative being partnered with Amazon Web Services, is going to drive the incremental insights that our clinicians and our pharma partners are really looking for in really focusing on getting the right patient with the right test at the right time to drive true precision medicine.”