Building A Marriage Between Medicine And Tech

July 12, 2019

With engineering as his background, John-William Sidhom, an MD/PhD candidate at the Bloomberg-Kimmel Institute for Cancer Immunotherapy at Johns Hopkins, says the ideal scenario would be a system in which scientific research asks the relevant questions, technology provides the answers, and the physician implements those solutions for the greatest clinical impact.

"The goal is to have these three dimensions where the engineering aspect brings in this computational ability to solve problems very creatively," he says. "As a scientist, I'm able to ask the most relevant questions and the most insightful ones, and then as a clinician, being able to really translate those findings into something that's clinically beneficial for patients and their family [is the primary goal]."

Connecting these fields would be a match made in heaven, says Sidhom, and we've already seen monumental steps being taken in deep learning algorithms and their impact on immunotherapies.

On behalf of Bio-IT World, Mary Ann Brown spoke with Sidhom about his recent work in deep learning, how the emergence of cancer immunotherapies may help break down existing data silos, and whether the field needs new technology or more analytics for the big data being generated.

Editor's note: Brown, Executive Director of Conferences at Cambridge Healthtech Institute, is planning a track dedicated to Informatics for Cancer Immunotherapies at the upcoming Immuno-Oncology Summit in Boston, August 5-9. Sidhom will be speaking on the program. Their conversation has been edited for length and clarity.

Bio-IT World: Your recent paper, DeepTCR, a Deep Learning Framework for Revealing Structural Concepts Within the TCR Repertoire (DOI: ), focuses on deep learning. Data analytics is a scientific discipline that has been added to your résumé. Please explain this addition.

John-William Sidhom: My background has been mostly in an area of mathematical modeling focused on mechanical dynamics, using mostly calculus and linear algebra in numerical methods. Coming into the cancer immunotherapy world, there's not so much mechanical modeling as there is data modeling.

With that, I had to adopt a new set of computational tools and approaches. A few years ago, I was at AACR, and Google was giving a talk focused on how they were using deep learning in other areas to reveal complex patterns in biological data.

I had heard about deep learning, but I had never heard it pitched in this way. I recognized that perhaps there was a lot of potential here to apply deep learning to omics, particularly cancer immunology genomics. I walked out of that talk, bought a book, probably the main textbook on deep learning, and read it. Then I took a course online and taught myself how to implement deep learning models.

Deep learning is built mostly on calculus and linear algebra, and since I had a background in mathematics, it wasn't too difficult to translate that mathematical foundation into deep learning and then begin applying it to cancer genomics.
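As a rough illustration of that point (not code from DeepTCR), the sketch below trains a single artificial neuron in plain Python on a hypothetical toy dataset. The forward pass is a dot product (linear algebra), and the weight update comes from the chain rule (calculus). The function names, learning rate, epoch count and the OR-style toy data are all illustrative assumptions.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(w: List[float], x: List[float]) -> float:
    """Linear algebra: the weighted sum computed by every neuron."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_neuron(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit one sigmoid neuron with batch gradient descent on cross-entropy loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, target in zip(X, y):
            p = sigmoid(dot(w, x) + b)
            # Calculus: for a sigmoid output with cross-entropy loss, the chain
            # rule gives dL/dz = p - target, hence dL/dw_i = (p - target) * x_i.
            err = p - target
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        # Step every parameter against its gradient, averaged over the batch.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Probability that input x belongs to the positive class."""
    return sigmoid(dot(w, x) + b)


# Hypothetical toy data: the logical OR of two binary inputs.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1, 1]
w, b = train_neuron(X, y)
print([round(predict(w, b, x), 2) for x in X])
```

A deep network stacks many such units into layers, but each one still reduces to matrix products on the way forward and chain-rule derivatives on the way back.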

The problem statement of the DeepTCR paper is that TCR sequencing (TCR-seq) is a very new technology at our disposal in the cancer immunology world, and it has been used mostly to assess the adaptive immune system. That's very important because we believe that T-cells are a very important player in an anti-tumor response, and, of course, T-cells act by recognizing an antigen.

The receptor tells us something about what they're targeting, or what they're using to kill off the tumor. The problem with TCR-seq is that the technology is very advanced, the data are very complex, and the patterns are very nuanced.

It seemed like a very good marriage between deep learning, which is very good at pattern recognition, and TCR-seq, which is very complex and rich with patterns, and so we created DeepTCR. DeepTCR learns relevant structural patterns in the T-cell receptor that may predict something about it, such as whether it is associated with response to an immunotherapy or whether it binds a certain antigen.
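To make "learning structural patterns" more concrete, here is a minimal sketch of one common building block for sequence models: one-hot encoding an amino-acid sequence (for example, a receptor's CDR3 region, an assumption here) and scanning it with a convolutional filter followed by max pooling. It is not DeepTCR's implementation. The filter is hand-built from a hypothetical motif ("GQG"), whereas a trained network would learn its filter weights from data, and the example sequences are made up.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq: str) -> List[List[float]]:
    """Encode each residue as a 20-dimensional indicator vector.

    Characters outside AMINO_ACIDS become all-zero vectors.
    """
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in seq]


def conv1d_max(encoded: List[List[float]], kernel: List[List[float]]) -> float:
    """Slide a width-k filter along the sequence and return the best match.

    Each window score is the sum of elementwise products between the filter
    and the encoded residues (a 1-D convolution in the deep-learning sense,
    with no padding). Taking the max over windows is global max pooling, so a
    motif counts wherever it sits in the sequence.
    """
    k = len(kernel)
    if len(encoded) < k:
        raise ValueError("sequence is shorter than the filter")
    scores = []
    for start in range(len(encoded) - k + 1):
        window_score = 0.0
        for offset in range(k):
            residue = encoded[start + offset]
            weights = kernel[offset]
            window_score += sum(a * w for a, w in zip(residue, weights))
        scores.append(window_score)
    return max(scores)


# Hypothetical filter that rewards the motif "GQG"; a trained network would
# learn these weights instead of having them written by hand.
motif_filter = one_hot("GQG")

for cdr3 in ["CASSLGQGAYEQYF", "CASSPDRNTEAFF"]:  # made-up example sequences
    print(cdr3, conv1d_max(one_hot(cdr3), motif_filter))
```

Max pooling is what lets a filter respond to a pattern regardless of its position, which suits receptors of varying length.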

While neural networks can be frowned upon due to the risk of model over-fitting or because they are perceived as "black boxes", we have found them to be very powerful in our hands for discovering complex patterns in immune-genomic data. This is how DeepTCR came about.

What are the results of your initial research?

In the current version of the manuscript, we have found that these deep learning models can predict antigen specificity with very high accuracy when given labeled training data. For example, if a network is trained to recognize TCRs specific to a certain antigen, it can then predict whether other TCRs will bind that antigen.
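The supervised workflow described here (train on TCRs with a known antigen label, then predict for unseen TCRs) can be sketched with a much simpler stand-in than a deep network. The example below swaps in a k-nearest-neighbour classifier over shared amino-acid 3-mers with Jaccard similarity. The sequences, the antigen labels and every function name are hypothetical, chosen only to show the train-then-predict pattern.

```python
from collections import Counter
from typing import List, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """All overlapping substrings of length k (the sequence's k-mer set)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared k-mers divided by total distinct k-mers; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(train: List[Tuple[str, str]], query: str, k: int = 3) -> str:
    """Label a query TCR by majority vote of its k most similar training TCRs."""
    q = kmers(query)
    ranked = sorted(train, key=lambda item: jaccard(q, kmers(item[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled training set: (CDR3 sequence, antigen it binds).
train = [
    ("CASSLGQGAYEQYF", "antigen_A"),
    ("CASSLGQGSYEQYF", "antigen_A"),
    ("CASSLGQGNYEQYF", "antigen_A"),
    ("CASRPDRNTEAFF", "antigen_B"),
    ("CASRPDRGTEAFF", "antigen_B"),
    ("CASRPDRSTEAFF", "antigen_B"),
]

print(knn_predict(train, "CASSLGQGTYEQYF"))
print(knn_predict(train, "CASRPDRATEAFF"))
```

A deep model replaces the fixed 3-mer similarity with features learned from the data, but the contract is the same: labelled receptors in, a predicted label for an unseen receptor out.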

Something we touch upon in our paper is that a lot of current analytics in TCR-seq focus on the single-sequence or single-receptor level, but in TCR-seq we often sequence a whole repertoire of T-cells, and that collection may carry a label. For example, a collection of T-cells might be the product of a certain immunotherapy.

In the paper, we also use a multi-instance learning approach where we can predict the type of immunotherapy a given TCR repertoire has been exposed to. This is an example where we can make predictions on not just a single TCR sequence but a collection or repertoire of sequences.
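Multiple-instance learning treats each repertoire as a bag of sequences that carries a single label. The sketch below shows only the core idea, assuming a hand-written per-sequence scorer in place of a learned one: score every TCR, then pool those scores into one repertoire-level value that could be compared against a threshold. The motif, the sequences and the function names are hypothetical; in a real model the instance scorer is learned from the bag labels alone.

```python
from typing import Callable, List


def motif_score(cdr3: str, motif: str = "GQG") -> float:
    """Hypothetical per-sequence scorer: 1.0 if the motif occurs, else 0.0."""
    return 1.0 if motif in cdr3 else 0.0


def repertoire_score(repertoire: List[str],
                     instance_score: Callable[[str], float],
                     pooling: str = "mean") -> float:
    """Pool per-sequence scores into a single score for the whole bag."""
    scores = [instance_score(seq) for seq in repertoire]
    if not scores:
        raise ValueError("repertoire is empty")
    if pooling == "mean":
        # With a 0/1 scorer this is the fraction of sequences carrying the pattern.
        return sum(scores) / len(scores)
    if pooling == "max":
        # With a 0/1 scorer this is 1.0 if any sequence carries the pattern.
        return max(scores)
    raise ValueError(f"unknown pooling: {pooling}")


# Made-up repertoires; only the bag, not each sequence, would carry a label.
repertoire_1 = ["CASSLGQGAYEQYF", "CASRPDRNTEAFF", "CASSPGQGNTIYF"]
repertoire_2 = ["CASRPDRNTEAFF", "CASSIRSSYEQYF"]

for name, rep in [("repertoire_1", repertoire_1), ("repertoire_2", repertoire_2)]:
    print(name,
          repertoire_score(rep, motif_score, "mean"),
          repertoire_score(rep, motif_score, "max"))
```

Mean pooling rewards repertoires in which a pattern is widespread, while max pooling flags a repertoire if even one sequence carries it; which assumption fits depends on the biology behind the label.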

Traditionally, complex data from cancer informatics and from immune-response studies have been siloed. With the emergence of cancer immunotherapies, are these silos being broken down?

I think the answer is an obvious yes. Immunology and cancer genetics have often been pretty siloed fields.

I think what we're starting to realize in medical science is that genetics is an important foundation for almost everything in biology, especially human biology. Genetics is the foundation, and everything can be built in light of a genetic understanding.

We're seeing an influx of sequencing technologies, which are genetically based, into immunology, creating large, comprehensive datasets that carry a lot of information. The result of this marriage of genetics and immunology has been big data, and with big data come very challenging analytical problems. This is where machine learning and AI are going to become very helpful and valuable in parsing, if you will, the child of these different disciplines.

You mentioned technologies, such as single-cell analysis, as well as big data. Is there new technology that needs to be developed, or do we need more analytics for the big data that is being generated?

In general, I would say analytics always lag behind technology because tech develops first and the analytics come after, but I would argue that there is still a need for tech development with very focused questions in mind.

For example, in our field, we are very interested in tying antigens to T-cells. We need very focused data for that; it won't be applicable everywhere, the way single-cell data is, but it will be very applicable to answering important questions in cancer immunology.

As you work on projects, you take available data that was perhaps created in an analytically agnostic way, with no particular plan for how it would be analyzed, so you need to design an algorithm to analyze that data.

As I've gone through this process, I've realized at certain points that I wish I had this dataset, or this type of data. If I had this type of data, I could answer this type of question. I think we need a better marriage between the analytical people and the tech developers, to develop more focused assays that answer very focused questions.

Analytics and tech development typically happen in a staggered fashion, with the tech leading the analytics. On the analytical end, you end up trying to make do with what the tech people have given you. It's not always perfect, and it's not always what you'd want, but you try to make the best of it.

If these two fields worked together simultaneously, the people developing tech would do so with certain analytical frameworks in mind. I think that marriage would make the tech a little more limited in the questions it can answer, but it could answer certain questions in a better, more focused way.