NIH’s Moonshot Project To Globally Register Medicinal Ingredients

January 31, 2020

What’s in a name? When it comes to ingredients found in medicine, quite a lot of important detail. That’s why NIH’s NCATS (National Center for Advancing Translational Sciences) has been working to marshal a definitive resource for medicinal ingredients.

At NCATS, Daniel Katzel, a senior software engineer, is working with the Global Ingredient Archival System (ginas), a collaborative international effort to realize a global mechanism for substance information exchange. Ginas’ primary project—G-SRS (Global Substance Registration System)—is the software system that assists agencies worldwide in registering and documenting information about substances found in medicines. Ginas provides a common identifier for all of the substances used in medicinal products, using a consistent definition of substances globally, including active substances under clinical investigation, consistent with the ISO 11238 standard.

Katzel has been working in translational science for 15 years. He began his career at TIGR, The Institute for Genomic Research in Maryland (now the J. Craig Venter Institute), writing bioinformatics and genomics software to sequence, assemble, analyze, and annotate viral and bacterial genomes. Some 20,000 to 30,000 viral genome submissions to GenBank later, he joined Rancho BioSciences and now serves as a contractor at NCATS.

On behalf of Bio-IT World, Bridget Kotelly, conference producer with Cambridge Healthtech Institute, spoke with Katzel about the challenges he’s seen in translational science, and the effort to come to worldwide consensus on medicinal ingredient names. Their conversation has been edited for length and clarity.

Editor’s Note: Katzel will be speaking more about the applications arising from the G-SRS Moonshot program in April at the Bio-IT World Conference and Expo, in the program on Clinical Research and Translational Informatics.

Bio-IT World: What are the biggest challenges in translational science?

Daniel Katzel: We're generating a lot of data, and there is no good way to aggregate all that data and answer these giant systematic questions. I see translational research as trying to get systematic answers and trying to make systematic changes to how we approach drugs or treatments. To answer these kinds of questions, we need not only the data but also some way to compile it. The G-SRS project is one step toward that. For example, NCATS might ask, "How many diseases don't have a treatment yet?" The building block for that would be to say, "Well, first, how many diseases are there? Then can you give me a list of all the different treatments, and let's see how they overlap and which diseases are missing a treatment." But we don't have a good way to even have those [conversations] yet. So, that's the kind of thing that we're trying to answer.
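The "which diseases are missing a treatment" question Katzel describes is, at its core, a set difference once the data is in one place. The following is only a minimal sketch of that idea in Python; the disease and treatment names are hypothetical placeholders, not real records, and nothing here reflects how NCATS actually stores or queries its data.

```python
# Hypothetical toy data: placeholder names, not real diseases or drugs.
diseases = {"disease_a", "disease_b", "disease_c", "disease_d"}

# Each treatment maps to the set of diseases it is used for.
treatments = {
    "treatment_x": {"disease_a"},
    "treatment_y": {"disease_a", "disease_c"},
}


def untreated(diseases, treatments):
    """Return the diseases that no listed treatment covers."""
    # Union of every disease that appears under at least one treatment.
    covered = set().union(*treatments.values())
    return diseases - covered


missing = untreated(diseases, treatments)
print(sorted(missing))  # ['disease_b', 'disease_d']
```

The computation is trivial; the hard part, as the answer above makes clear, is agreeing on the two lists in the first place.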

Tell me a bit about G-SRS and Moonshot and how it’s addressing the challenge.

The G-SRS project is like a periodic table of all the ingredients relevant to human and animal health. It builds on top of the ISO 11238 standard, which is a way to define substances. That covers not just chemical substances like molecules, but also nucleic acids and parts of medicinal plants that are not really easily describable. There's now a standard for how to define these things.

The G-SRS project is a part of the ginas resource, the Global Ingredient Archival System. It is the first and, I think, only current implementation of this standard. It's being used internally at the FDA and at other regulatory agencies around the world, and hopefully its use will keep growing as more and more people get involved. The basic idea is that the G-SRS project will define all these different kinds of substances, so everyone knows that the exact same thing is being referred to in different places. The problem is that we have lots of different names for the same structure, depending on what country you're in or what company you work for. We have to have a way to say this and that are the same thing without depending on how it's drawn structurally or how it's named. If we're using some kind of name dictionary, on which most previous systems are based, we sometimes miss these connections when different data sources use different names for the same thing. The G-SRS project helps us link all these things together.

Then we can pull in data from the FDA, clinical trial information, product applications, and link them all together so we can have one giant portal where everything is in one place and linked together.
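To illustrate the payoff of a shared identifier, here is a minimal Python sketch that groups records from different sources under one substance ID even when the sources use different names. It is a simplification under stated assumptions: the class, the function, the source labels, and the IDs (`SUBSTANCE-0001` and so on) are hypothetical and do not reflect the G-SRS data model or API, which defines substances by their ISO 11238 definitions rather than by name lookup.

```python
from collections import defaultdict


class SubstanceRegistry:
    """Toy registry: every known name points at one shared substance ID."""

    def __init__(self):
        self._id_by_name = {}

    @staticmethod
    def _normalize(name):
        # Ignore case and extra whitespace when comparing names.
        return " ".join(name.lower().split())

    def register(self, substance_id, names):
        for name in names:
            self._id_by_name[self._normalize(name)] = substance_id

    def resolve(self, name):
        """Return the shared ID for a name, or None if it is unknown."""
        return self._id_by_name.get(self._normalize(name))


def link_records(registry, records):
    """Group (source, name) records by shared ID; unknown names go under None."""
    linked = defaultdict(list)
    for source, name in records:
        linked[registry.resolve(name)].append((source, name))
    return dict(linked)


registry = SubstanceRegistry()
# Placeholder IDs; the name pairs are well-known synonyms.
registry.register("SUBSTANCE-0001", ["acetaminophen", "paracetamol"])
registry.register("SUBSTANCE-0002", ["aspirin", "acetylsalicylic acid"])

records = [
    ("source_a", "Paracetamol"),
    ("source_b", "acetaminophen"),
    ("source_b", "Aspirin"),
    ("source_c", "unknown compound"),
]

linked = link_records(registry, records)
for substance_id, hits in linked.items():
    print(substance_id, hits)
```

Records that resolve to the same ID end up in the same group regardless of which name each source used, which is the kind of cross-source linking the portal described above depends on.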

Are you adding users? Are people accessing G-SRS worldwide?

That's a good question. There's an internal version at the FDA, and there are hundreds of people inside the FDA who have access to this information, including non-public information. We periodically publish the publicly available dataset [for use by others including regulatory agencies and pharmaceutical companies]. The G-SRS webpage, where we have an instance of just the public data, is available worldwide; anyone can go and look at it without logging in.

What about the Moonshot project?

They called it the moonshot because, like our original mission to the moon, this is a very ambitious goal. When we were trying to get to the moon, we developed all these other technologies and made some discoveries along the way, essentially spin-off technologies that we wouldn't have had if we hadn't had to go to the moon. A lot of similar things happened while we were building the G-SRS project. We made all these little side projects to help speed up the development of the user interface, for example. Chemicals that pharmaceutical companies were registering in G-SRS would require someone at the FDA to manually draw the chemical structures, which was time consuming and error prone, so we built a tool to do image structure recognition. I developed a system so that users can switch between the underlying cheminformatics libraries without the computational code having to be recompiled; you can use all these different libraries on the fly, so to speak.
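Switching libraries "on the fly" is commonly done by having calling code talk to one interface and choosing the implementation at run time. The Python sketch below shows that general pattern only; it is not Katzel's system. The `Toolkit` class, the `ChemBackend` interface, and the two toy backends are hypothetical stand-ins for real cheminformatics libraries, and they do nothing more than split a molecular formula into element symbols.

```python
import re
from typing import Dict, List, Protocol


class ChemBackend(Protocol):
    """Interface every backend must provide (hypothetical)."""

    def element_symbols(self, formula: str) -> List[str]: ...


class RegexBackend:
    """Toy backend: finds element symbols with a regular expression."""

    def element_symbols(self, formula: str) -> List[str]:
        return re.findall(r"[A-Z][a-z]?", formula)


class ScanBackend:
    """Toy backend: finds element symbols by scanning characters."""

    def element_symbols(self, formula: str) -> List[str]:
        symbols: List[str] = []
        for ch in formula:
            if ch.isupper():
                symbols.append(ch)    # an uppercase letter starts a symbol
            elif ch.islower() and symbols:
                symbols[-1] += ch     # a lowercase letter extends it
        return symbols


class Toolkit:
    """Facade: callers use this class and never import a backend directly."""

    def __init__(self, backends: Dict[str, ChemBackend], default: str):
        self._backends = dict(backends)
        self.use(default)

    def use(self, name: str) -> None:
        """Switch the active backend at run time."""
        if name not in self._backends:
            raise ValueError(f"unknown backend: {name}")
        self._active = self._backends[name]

    def element_symbols(self, formula: str) -> List[str]:
        return self._active.element_symbols(formula)


toolkit = Toolkit({"regex": RegexBackend(), "scan": ScanBackend()}, default="regex")
print(toolkit.element_symbols("NaCl"))  # ['Na', 'Cl']
toolkit.use("scan")                     # swap backend; calling code is unchanged
print(toolkit.element_symbols("NaCl"))  # ['Na', 'Cl']
```

Because callers depend only on the facade, swapping the backend requires no change to the code that uses it, which is the property the answer above describes.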

One big project built on top of the G-SRS project is called Inxight: Drugs, which is also made by NCATS. Curators at Rancho BioSciences are annotating additional data on top of the public FDA dataset, adding things like targets and metabolite data. We're pulling all this extra information into one portal, so you can see not just all the substances, but also how many of them are in approved drugs, what their targets are, and things like that. I think that is actually probably the most visited website in the G-SRS infrastructure.