Celgene’s Big Data IQ

By Allison Proffitt 

September 28, 2017 | In recent years, Celgene has been doubling down on its information management. “Information analytics enables Celgene to deliver on our business strategy by making information available, accessible, and utilized in ways that help the business answer critical questions,” explained Frank Malta, vice president of IT at Celgene. In January, the company launched an internal big data platform to manage and analyze data more cohesively and to drive decision-making from pre-clinical research through commercialization.

About two and a half years ago, the company launched an initiative called iKU (pronounced IQ)—Information, Knowledge, and Utilization. The iKU initiative comprises a technology and platform foundation; data governance; strategic partnerships; data science; and finally, “a mindset change,” Malta said. iKU wasn’t meant to be just a technology platform, he added, but a shift in organizational thinking about how to use the ever-increasing volumes of data inside and outside the company.

“It’s a shift toward value-based and personalized care. There’s a real need to better understand both the profiles of our medicines—and the patients who are going to benefit from them—as early in the drug discovery process as possible,” agreed Patrick Loerch, senior director of data science at Celgene.

The platform—a key component of the iKU strategy—is called Synapse and has two modules, both launched in January 2017. “Explorer” catalogs the data within the organization and manages access to it. “Analyzer” provides the tools to conduct analyses within the data lake, using Hadoop through Cloudera.

“Historically, in traditional pharmaceutical companies you would have independent departments bringing in datasets and because of the size of the organization, it’s been very difficult to even have an understanding of what datasets are available,” Loerch explained. “With the Explorer functionality, we now have a catalog of all available data: both internal datasets as well as external datasets that we have access to through our partnerships. There’s a tremendous time savings… just in this first step of understanding what datasets are available and how to access them.”

Synapse lets Celgene ask questions that it couldn’t readily answer before: bigger-picture questions that have the potential to drive the business in new ways. “There are certain types of studies that inform decisions at multiple stages of the pipeline,” said Loerch. For example, “building out treatment flows based on real world data. It’s getting an understanding of how a given disease is being treated in the real world, which is a question that informs decisions throughout the entire pipeline.”

Celgene hoped for improved speed, access, and performance with the big data platform, but Loerch said the platform is also letting the company store and access more than just data. The platform houses best practices and business rules, as well as computer code through GitHub.

“As people work with a given dataset, we can actually capture the business rules that are used for a particular type of project—whether it’s a treatment flow project or creating a comparator patient cohort,” Loerch explained. “So that when other areas of the company come in, they can access not just the rules at a high level, but the translation of the rules into the SQL code that was used to pull the cohorts from a specific database.”
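To make the idea concrete, here is a minimal Python sketch of a business rule stored together with the SQL that implements it. It is not Celgene’s code: the `CohortRule` class, the `patients` table and its columns, and the example rule are all hypothetical. The only point illustrated is that the plain-language rule and its SQL translation travel as one object that other teams can reuse.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class CohortRule:
    """A business rule kept alongside the SQL that implements it (hypothetical schema)."""
    name: str
    description: str   # the rule in plain language, for non-programmers
    icd10_prefix: str  # e.g. "C90" (multiple myeloma and plasma cell neoplasms)
    min_age: int

    def to_sql(self) -> tuple[str, tuple]:
        # Parameterized query against an assumed table: patients(patient_id, age, icd10).
        sql = ("SELECT patient_id FROM patients "
               "WHERE icd10 LIKE ? AND age >= ? ORDER BY patient_id")
        return sql, (self.icd10_prefix + "%", self.min_age)


def pull_cohort(conn: sqlite3.Connection, rule: CohortRule) -> list[str]:
    """Run the rule's SQL and return the matching patient IDs."""
    sql, params = rule.to_sql()
    return [row[0] for row in conn.execute(sql, params)]
```

Keeping the query parameterized (the `?` placeholders bound by `sqlite3`) means another team can reuse the same rule with a different age threshold or diagnosis prefix without editing SQL by hand.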

New Mindset, New Questions

That kind of data sharing is great on paper, but Malta and Loerch laughed when we asked about data territoriality. “Inevitably there’s a learning curve; it takes time for a platform like this to be fully utilized,” Loerch said. “But in the end, I think people see the benefit. That’s really where a platform like this starts to take hold: when you start to realize the value from the specific business questions that we can now answer.”

Celgene knew that a new data environment and changed mindset would require a platform that is easy to use and well-managed, Malta said. “To do that, we have put into place data lead roles to provide governance. We’ve also put into place capabilities like tagging. You’ve got to tag it for it to be accessible and findable. We’ve also created consistent taxonomies that we all use,” Malta continued. “That type of structure is challenging to put in place, but it is what enables the speed that Patrick talked about earlier.”
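Malta’s point about tagging can be illustrated with a small Python sketch of a catalog that refuses untagged entries and accepts only tags drawn from a shared taxonomy. The facet names, allowed values, and the `Catalog` and `Dataset` classes are hypothetical, chosen for illustration; they are not details of how Explorer works.

```python
from dataclasses import dataclass, field

# Hypothetical shared taxonomy: each facet has a fixed set of allowed values.
TAXONOMY = {
    "stage": {"preclinical", "clinical", "commercial"},
    "source": {"internal", "partner"},
    "data_type": {"claims", "ehr", "genomic", "trial"},
}


@dataclass
class Dataset:
    name: str
    tags: dict[str, str] = field(default_factory=dict)


class Catalog:
    def __init__(self, taxonomy: dict[str, set[str]]):
        self.taxonomy = taxonomy
        self.datasets: list[Dataset] = []

    def register(self, dataset: Dataset) -> None:
        # "You've got to tag it": untagged datasets are rejected outright.
        if not dataset.tags:
            raise ValueError(f"{dataset.name}: dataset has no tags")
        # Only tags from the shared taxonomy are accepted, so searches stay consistent.
        for facet, value in dataset.tags.items():
            allowed = self.taxonomy.get(facet)
            if allowed is None or value not in allowed:
                raise ValueError(f"{dataset.name}: invalid tag {facet}={value}")
        self.datasets.append(dataset)

    def find(self, **criteria: str) -> list[str]:
        """Return names of datasets whose tags match every given facet=value pair."""
        return [d.name for d in self.datasets
                if all(d.tags.get(k) == v for k, v in criteria.items())]
```

Validating tags at registration time, rather than at search time, is what keeps a shared vocabulary from drifting as different departments add data.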

Synapse is also designed to be used by people across the organization. Data access is limited by permissions, consent, and geography, but Celgene did not want it limited by coding skill. The platform provides visibility into the data themselves, not just an overview of which data are available.
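One way to read “limited by permissions, consent, and geography” is as three checks that must all pass before a dataset is opened. The Python sketch below assumes hypothetical `DatasetPolicy` and `AccessRequest` records and example role, purpose, and region values; the article does not describe Synapse’s actual access model in this detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetPolicy:
    allowed_roles: frozenset[str]    # who holds permission to the dataset
    consented_uses: frozenset[str]   # purposes the data subjects consented to
    allowed_regions: frozenset[str]  # where the data may be accessed from


@dataclass(frozen=True)
class AccessRequest:
    role: str
    purpose: str
    region: str


def can_access(policy: DatasetPolicy, request: AccessRequest) -> bool:
    """Grant access only if permission, consent, and geography all allow it."""
    return (request.role in policy.allowed_roles
            and request.purpose in policy.consented_uses
            and request.region in policy.allowed_regions)
```

Nothing in the check depends on whether the requester writes code, which matches the stated goal of not limiting access by coding skill.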

“We create fit-for-purpose views of that data. That fit-for-purpose may be for a specific departmental need or a functional need,” Malta said. Loerch and his team have also created apps so that non-programmers can explore datasets.

Malta believes it’s already paying off. Teams used to spend more of their time preparing data, he said; now they can do real analyses and share them across business groups. “This is where the value comes from,” he said.

Loerch is especially intrigued by the cohort definitions that Synapse enables. “We’re able to take a lot of the great work being done out of HEOR [health economics and outcomes research] and market access and make that more broadly available,” he said.

Target identification, biomarker development, HEOR, and market access groups will each define a patient cohort slightly differently depending on their specific questions, he said. Synapse lets the different groups compare and contrast these cohort definitions when visualizing first-, second-, and third-line treatments for a disease using real-world data. The resulting treatment pathways are also updated in real time as the underlying dataset is updated.
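The treatment-flow idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Celgene’s method: it assumes each record is a hypothetical `(patient_id, regimen, start_date)` tuple and that every change of regimen starts a new line of therapy, whereas real-world line-of-therapy rules usually also account for treatment gaps, dose changes, and added drugs. Passing two different cohort definitions to the hypothetical `treatment_pathways` function shows how the same data yield different pathway counts.

```python
from collections import Counter, defaultdict
from datetime import date


def lines_of_therapy(records: list[tuple[str, str, date]]) -> dict[str, list[str]]:
    """Map each patient to an ordered list of regimens: line 1, line 2, ...

    Simplifying assumption: a change of regimen always starts a new line,
    and repeated records of the same regimen belong to the current line.
    """
    by_patient: dict[str, list[tuple[date, str]]] = defaultdict(list)
    for patient_id, regimen, start in records:
        by_patient[patient_id].append((start, regimen))

    lines: dict[str, list[str]] = {}
    for patient_id, events in by_patient.items():
        sequence: list[str] = []
        for _, regimen in sorted(events):  # chronological order
            if not sequence or sequence[-1] != regimen:
                sequence.append(regimen)
        lines[patient_id] = sequence
    return lines


def treatment_pathways(lines: dict[str, list[str]], cohort: set[str],
                       max_line: int = 3) -> Counter:
    """Count pathways (up to max_line lines) among patients in one cohort definition."""
    return Counter(" -> ".join(lines[p][:max_line]) for p in cohort if p in lines)
```

Because the cohort is just a set of patient IDs, each group can plug in its own definition while sharing the same pathway logic.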

Build vs Buy

Celgene built the platform after finding the solution ecosystem “somewhat limited,” Malta said. The company wanted a solution that spanned pre-clinical through commercial workflows. “It would be extremely difficult to find something you could buy that would meet all of our needs,” Loerch said.

The company already had a cloud-first approach, and Malta is enthusiastic about the gains that have been possible thanks to cloud computing.

“As an example, as part of our myeloma genome project, we were able to analyze vast amounts of data in a cloud environment in just days,” he said. “In a traditional compute manner it would have taken us months to stand up those servers and that capability. And, when we were done with the cloud, we were simply able to turn it off.”

Having a cloud-first strategy also lets the company quickly respond to business needs, Malta said, and it enables partnerships in new ways. “Many of these partners are already working in the cloud arena; many of them are using AWS like us. It creates a much simpler model in terms of sharing data.”

Malta said the company “hasn’t yet” found an on-premises solution that works as well as the cloud. “At this point, there’s a cost advantage to work in the cloud environment. But in the future, if we’ve got use cases that require a tremendous amount of persistent data, and continuous analytics, we will need to consider on-premises and hybrid capabilities.”

Synapse, then, is a cloud platform as well.

Explorer is based on the Data Asset Explorer product from Deloitte’s ConvergeHEALTH group and runs on the Salesforce platform; Analyzer is based on the ConvergeHEALTH Miner product, is housed in AWS, and uses Cloudera as its Hadoop back end to power analytics. Although Celgene built Synapse rather than buying it, the company did use what Malta called “accelerators” to speed up the build process, and ConvergeHEALTH worked with Celgene throughout the development and deployment stages.

In the nine months since Synapse launched, Malta and Loerch have been pleased by how the platform is being used and say it is serving the iKU strategy. “When we leverage the platform, we’re leveraging it to drive value into the organization,” Loerch said.
