GSK’s Digital Transformation Roadmap

September 13, 2021 | When Mike Montello joined the GSK team, the first steps of the pharma’s digital transformation had already taken place. His focus was to integrate the enterprise digital transformation to support another transformation taking place in R&D. “We had made significant technical progress in understanding enterprise data and combining R&D data into a data lake,” Montello explained. From there, he and his R&D Tech group at GSK were tasked with accelerating adoption of digital, data, and analytics to improve overall research and development productivity and increase the probability of success of new medicines and vaccines.

Montello will be joining us next week at the Bio-IT World Conference & Expo being held in Boston and online. He’ll be speaking on the Plenary Panel on how pharma is Broadening the Data Ecosystem.

Digital transformation, Montello said, is not a painless checklist and is fueled by culture, data, and technology. Data must be tagged with metadata, the highest value data-driven use cases must be identified, and then the organization's data must be stitched together in a way that is both useful and meaningful. And along the way, cultures must change. And for Montello, all this is in service of a broader transformation for the entire enterprise.

In 2022 GSK will spin off its consumer business and become a standalone pharmaceutical and vaccine company with a focus on the science of the immune system, human genetics, and advanced technology.

Montello recently sat down with Stan Gloss, founding partner at BioTeam, to discuss why tech is central to GSK’s evolution and what he’s learned about the highs and lows of the digital transformation process, why senior leadership buy-in is so critical, and his advice for others seeking a similar trajectory. Bio-IT World was invited to listen in.

Editor’s Note: Trends from the Trenches is a regular column from the BioTeam, offering a peek behind the curtain of some of their most interesting case studies and projects at the intersection of science and technology.

Stan Gloss: Tell me about your background.

Mike Montello: I started my career in biopharmaceuticals and healthcare over 21 years ago and have worked across the industry at data, software, tech, clinical research, biotech, and consulting companies. Three years ago, I joined GSK to help drive the transformation that was just getting started in both tech and R&D. I could not have asked for a better opportunity. The intersection of science and technology is what excites me most, and it’s been really inspiring to help build the future and work alongside both Karenann Terrell (Chief Digital and Technology Officer) and Hal Barron (Chief Scientific Officer), who are not only world leaders in their respective fields but also shared a vision that technology should drive R&D.

But to get to know me and how I solve problems using digital approaches, it’s important to know my background, which has been at the convergence of art and engineering. As a child I was really into music, mathematics, and computing. I started programming at 12. By 14, I learned complex piano pieces like Gershwin’s Rhapsody in Blue that literally took one year to master. It instilled in me a work ethic that has shaped how I think about solving problems. Overcoming constant failure is the key—and working through constant feedback is just like code: write, fail, improve.

So now looking back, I can see that it’s been a combination of art, mathematics, and persistence that now exemplifies my approach to solving problems.

I have a degree in engineering, and post-university I spent about 12 years in consulting, which is how I got introduced to healthcare and the important purpose of improving patient lives. I have now spent a total of 21 years working at the convergence of science and technology, doing the best I can to accelerate the discovery and development of medicines.

What's your current role?

I lead the R&D Tech organization at GSK, which is part of the broader GSK Tech capability. We are accountable for the strategy and delivery of digital technology, big data, and analytics technology, and over 500 applications used in early research, first-in-human and late-stage clinical trials, through to medicines and vaccines approval. R&D Tech has transformed into a product-centric organization that runs much like a software engineering organization, and we are an interoperable capability that supports the end-to-end process of R&D.

What makes the role and team unique in the industry is the end-to-end platform approach from data generation (both internal and external) through to the big data and analytics tech used for data science, all treated as an interoperable architecture. This enables our outstanding team to have a tremendous impact on delivering new medicines and vaccines. In the last few years, GSK has had an impressive track record of delivery, with 11 approvals since mid-2017, a 90% success rate in phase 3, and a doubling of the number of assets we have in phase 3. It’s tremendously rewarding to have a role that contributes to improving healthcare and helping patients.

Where is GSK in its digital transformation journey?

GSK is about three years into its tech transformation, which is powering the digital, data, and analytics solutions across the enterprise. In R&D specifically, there has been a similar effort, which put a major emphasis on culture and how data can inform decision-making, including the ability to take smart risks. The digital transformation includes the implementation of strategic partners, the transformation of skills, the adoption of both agile ways of working and a product-centric approach to managing tech delivery, the acceleration and adoption of future-proof tech, and maintaining a robust core to enable the business to operate seamlessly and securely during transformation. I’ve been really pleased and amazed by how much progress we’ve made. Seeing the role data has played in helping scientists focus their efforts has been really rewarding.

How did you approach data in your Digital Transformation journey?

The big data journey in R&D started with a data lake. The first step was a concerted effort to integrate data that was sitting in silos and make the data accessible. Our data lake was called the R&D Information Platform (RDIP), and the first iteration allowed us to catalog our data and understand which use cases and secondary data assets would provide the most value to our medicines pipeline. The initial use cases included insights that enabled target ID, predict-first processes in research, medicine design, and digital CMC approaches. In the latest generation, we have created a step change in our RDIP capability and introduced a new approach to democratizing data using a data fabric for research and development and next-generation security technology. The use cases have now expanded to enable clinical study planning and design, clinical operations, and pipeline governance.

With RDIP, we can now leave the data where it is and virtually bring it together to create data products. The approach is modern, allows the teams to move at pace, and enables delivery of data for data science and AI/ML teams. We want scientists to be able to access the right data at the right time to enable decisions. Our approach to integrating both scientific and human data on one platform allows scientists to perform reverse translation, biomarker studies, and target discovery.

Was breaking down the old silos of data in your data lake difficult?

Yes, it was hard, and it continues to be an architectural challenge to enable scale. Curating our data required a significant investment in time, people, and underlying big data tech and computing. We started with a data center of excellence with data scientists and data engineers that began to curate data domains. There are data domains for assays, for molecules, and for genetics-associated data, as examples. Before the transformation, data moved in batches through traditional ETL. Now we are moving to a phase where data can stream on demand, or when events are triggered through the fabric. This investment in technology is helping to scale the solution to the next level.
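
As an illustration only, the contrast between batch movement and event-triggered delivery can be sketched in a few lines of Python. This is a minimal sketch and not GSK's fabric: the `EventBus` class, the `assay.result.created` topic name, and the sample records are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Tiny in-memory publish/subscribe bus (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, record: dict) -> None:
        # Each subscriber receives the record as soon as it is published.
        for handler in self._handlers[topic]:
            handler(record)


def batch_load(records: List[dict], sink: List[dict]) -> None:
    """Batch style: the sink sees nothing until the whole batch is loaded."""
    sink.extend(records)


# Made-up sample records standing in for laboratory output.
assay_results = [
    {"assay_id": "A-001", "value": 0.42},
    {"assay_id": "A-002", "value": 0.87},
]

# Batch: one load step after all records have accumulated.
batch_sink: List[dict] = []
batch_load(assay_results, batch_sink)

# Event-triggered: each record is delivered when its event fires.
stream_sink: List[dict] = []
bus = EventBus()
bus.subscribe("assay.result.created", stream_sink.append)
for record in assay_results:
    bus.publish("assay.result.created", record)
```

Both sinks end up with the same records. The difference is when each record becomes available to downstream consumers.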

Are your data FAIRified?

First, the data needs to be FAIR, and that’s a standard in the industry. Within R&D Tech, we have a talented and dedicated team that is focused on data quality, data protection, and data privacy. This team helps to automate data quality and works on important features like metadata management, de-identifying data, and ensuring the safe handling of data when used by scientists. These features are fundamental to ensuring data is findable, accessible, interoperable, and reusable. A dedicated data engineering team looks at sharing data from ethical, legal, and privacy perspectives.

To enable accessibility of data, we are advancing the use of natural language processing and knowledge graphs to auto-tag and organize information. Users can now search for information using natural language.

Did you have to tag all the data in your data lake with metadata? What were the challenges doing that?

Yes, the data is tagged with metadata. One of the key challenges is automating tagging because of the velocity and variety of the data that is generated across the research hubs and collaborations. We are constantly trying to decrease the delay in having production-ready datasets and aim for all data to be available within hours. Another challenge is that metadata needs to be applied at different layers of the architecture: for example, to data pipelines, to data assets and products, and to the models themselves. Metadata is applied to every bit, and through metadata we can qualify data lineage and have full traceability. When a model generates an insight, the data used in the model can be traced all the way back to the system of record through the use of metadata.
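
The traceability described above can be pictured as a walk over lineage metadata. The following Python sketch is illustrative only and assumes a simple catalog in which every artifact records what it was derived from. The `Artifact` class, the layer names, and the example entries are hypothetical and do not describe GSK's actual metadata model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Artifact:
    """One catalog entry with lineage metadata (hypothetical schema)."""
    name: str
    layer: str  # "system_of_record", "pipeline", "data_product", or "model"
    derived_from: List[str] = field(default_factory=list)


def trace_to_sources(name: str, catalog: Dict[str, Artifact]) -> Set[str]:
    """Return every system of record that the named artifact depends on."""
    sources: Set[str] = set()
    seen: Set[str] = set()
    stack = [name]
    while stack:
        current = catalog[stack.pop()]
        if current.name in seen:
            continue  # already visited via another lineage path
        seen.add(current.name)
        if current.layer == "system_of_record":
            sources.add(current.name)
        stack.extend(current.derived_from)
    return sources


# Made-up catalog: two systems of record feed a pipeline, a data product, and a model.
catalog = {a.name: a for a in [
    Artifact("eln_assays", "system_of_record"),
    Artifact("compound_registry", "system_of_record"),
    Artifact("assay_ingest", "pipeline", ["eln_assays"]),
    Artifact("potency_product", "data_product", ["assay_ingest", "compound_registry"]),
    Artifact("potency_model", "model", ["potency_product"]),
]}

print(sorted(trace_to_sources("potency_model", catalog)))
# prints ['compound_registry', 'eln_assays']
```

Because each layer carries its own `derived_from` metadata, the same walk works whether it starts from a model, a data product, or a pipeline.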

Applying all that metadata requires the cooperation of scientists and staff. How have you worked to gain cooperation?

When scientists create the system of record, scientists need to describe the data that is being created. When a data pipeline is created, a data engineer needs to describe the data pipeline. There's human change management that is part of this overall system and a culture of data integrity is critical to success. The culture prioritizes skills development around data literacy and treating data as a product.

Did you run into any cultural issues with the user community not wanting to change?

Throughout the transformation, there were early adopters and there were others who were resistant to change, which is normal. What pleasantly surprised me the most was that our talented teams were ready to change. Momentum was created through the creation of the first generation of the R&D Information Platform. But more importantly, a strong culture of purpose fueled the momentum to create a step change in our digital, data, and analytics capabilities, which allowed us to implement new strategic partnerships, a product-centric operating model, and a step change in the technology that powers big data and analytics in R&D.

Why are end-users resistant to sharing their data?

The complications of sharing and using data are not necessarily behavioral or related to end users wanting to keep their data. Resistance is created based on restrictions embedded in the data itself. We are trying to find ways in the foundation to automate data sharing in a way that protects privacy, security, and intellectual property. Organizations must make sure that data is being shared in a very safe way and ensure data integrity at every step.

What do you think are the biggest challenges that you have faced and what challenges do you see in this process going forward?

Making the volume of data accessible globally is the biggest technical challenge. Data is being generated by functional genomics and single-cell, high-throughput screening. The volume of data creates an engineering challenge because we must think about cost-effective storage and the use of a combination of on-premises and cloud computing. Petabytes of data are streaming from our laboratories around the world and need to be made accessible to researchers located in Hubs across the UK, Europe, and the US, where latency and application performance are concerns. We have outstanding data and software engineers who are solving these challenges.

Another challenge is implementing an effective data governance program, ensuring that the right person has access to context-specific data at the right time. Data governance at scale across a network of internal and external datasets is a greater challenge than the technical challenge.

What role does AI/ML play in GSK’s Digital Transformation?

GSK’s R&D strategy focuses on expanding the number of genetically validated targets in our research portfolio. We know they are at least twice as likely to succeed as medicines and they now make up 70% of our targets in research.

Combining human genetics with experimental tools from functional genomics with AI/ML will enhance the impact of our portfolio with increasing precision, speed, and scale.

Our Tech team works closely with our dedicated, in-house AI/ML team led by Kim Branson, who is a world leader at the intersection of medicine/AI and drug discovery. As Kim’s team grows to over 100 machine learning experts, our digital transformation is enabling analysis-ready datasets in hours and providing a qualified computational infrastructure set up to train and infer at scale.

How does the acquisition of third-party data come into play?

Third-party data is essential to enable the R&D strategy, and external data enables us to build a sizable training corpus for machine learning. No matter what data we have generated internally, we are always interested in gathering more data from different sources. Data allows us to build better models. GSK has put a strong focus on genetics because it can double the probability of success of a new medicine. So we have amassed a significant amount of data through strategic collaborations, both with genetic databases and with partners who work with us on functional genomic experiments, which take up a lot of compute power. We also have a knowledge graph that goes beyond even the significant genetic databases. Its relationships now number 700 billion triples.
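
For readers unfamiliar with the term, a knowledge graph stores relationships as (subject, predicate, object) triples, and questions are answered by matching patterns across them. The Python sketch below is a toy illustration of that idea only. The entity names, predicates, and helper functions are hypothetical and say nothing about the content or technology of GSK's graph.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def compounds_for_disease(triples: List[Triple], disease: str) -> List[str]:
    """Follow gene -> protein -> compound links for genes associated with a disease."""
    found: List[str] = []
    for gene, _, _ in match(triples, p="associated_with", o=disease):
        for _, _, protein in match(triples, s=gene, p="encodes"):
            for compound, _, _ in match(triples, p="inhibits", o=protein):
                found.append(compound)
    return found


# Made-up triples with placeholder entity names.
triples: List[Triple] = [
    ("GENE_X", "associated_with", "DISEASE_Y"),
    ("GENE_X", "encodes", "PROTEIN_Z"),
    ("COMPOUND_Q", "inhibits", "PROTEIN_Z"),
]

print(compounds_for_disease(triples, "DISEASE_Y"))
# prints ['COMPOUND_Q']
```

A linear scan like this would not scale to billions of triples. Production graph stores rely on indexes and dedicated query languages, but the pattern-matching idea is the same.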

Do you have to massage that data from external sources to make it usable in your frameworks, in your foundation?

It’s very dependent on the type of question you’re answering. The tools in the RDIP foundation allow data engineers to ingest and then prepare datasets to answer a scientific question. Many times, data does need to be prepared. There is typically more time spent in data preparation than in building a model or algorithm. Our data tech foundation includes a focus on automation to decrease the amount of time it takes to prepare datasets for analysis.

When you say data prep, is that similar to data curation? Is that a part of the curation step to curate it into the framework?

Data is ingested and delivered as a data product, a “swatch” within an overall fabric. This is like curation. Each data product that is produced is purpose driven. As an example, our team has built a series of data products which surface in an analytic called a control tower, enabling users to monitor performance and take action to improve R&D performance.

What career advice could you give others who are going down the same journey?

The best advice I would give is to never be content in your current position and always maintain a north star to what you want to achieve five years from today.

There was one time in my career where I became a bit content, and there was a restructuring at the company where I worked which changed my role. The restructure lit a fire for me to get moving to update my skills. The restructure also forced me to go out and see what other roles were available. Looking back, I remember I treated the situation in a positive light: as an opportunity. It was an opportunity to develop and do something new. After this situation, which happened early in my career, I adopted a growth mindset and now continuously challenge myself to learn and grow rather than wait for a trigger outside of my control.

The pace of adoption of tech is at a level I haven't seen before in the biopharmaceutical industry. Digital adoption has been accelerated as we managed continuity through the pandemic and launched solutions for COVID-19. It’s important to not sit idle and to keep learning. Keeping up is hard, but we must become comfortable with the idea that today’s skillsets may be obsolete in 3-5 years.

What advice would you give an organization that is going to travel down the digital transformation path? What did you learn along the transformation journey specifically?

There are a couple of things I would share. First is to lightly govern with a centrally guided approach, but to empower teams to locally execute. One of our first areas of focus was changing ways of working and pivoting to agile and product-centric teams, taking best practices from Silicon Valley and software engineering-based product companies. We run R&D Tech today in formation like a software company. We have a talented team of product directors who deliver business impacts in R&D, everything from improving target ID all the way through to the late-stage processes like pharmacovigilance and regulatory operations. Product directors deliver quarterly releases. We have a culture that accepts failure in one of the releases in order to build the learning into the next release. When you focus on value in quarterly releases, the cost of failure is a lot lower than with traditional deployments that could take a year or more.

The light governance included representation from each of the business units, e.g., R&D, Supply Chain, and Commercial. Each area of the enterprise could implement change at its own pace while sharing across the whole organization through a central team of product and agile leaders. A culture of transparency and continuous improvement was key to making progress at pace.