Deep Learning In The Clinic: Predicting Patient Prognosis

By Allison Proffitt

December 4, 2017 | In a paper published last month on arXiv, and not yet peer-reviewed, Stanford researchers outlined how machine learning can be made to predict prognosis in a clinical setting.

Accurately predicting prognosis allows doctors to better meet patients’ needs and wishes at the end of life, particularly by bringing in palliative care teams in a timely way. Several clinical tools are used to predict prognosis in terminally ill patients. The Palliative Prognostic Index (PPI), for example, calculates a score, derived from multiple regression analysis, from performance status, oral intake, edema, dyspnea at rest, and delirium.

But these indices are time-consuming and expensive to implement. Anand Avati, in the Department of Computer Science at Stanford University, is the first author on the paper in which researchers posited that there is a role here for big data.

“The proliferation of EHR systems in healthcare combined with advances in Machine Learning techniques on high dimensional data provides a unique opportunity to make contributions, especially in disease prognosis,” Avati and his colleagues write. “Technology can… play a crucial role by efficiently identifying patients who may benefit most from palliative care, but might otherwise be overlooked under current care models.”

With that in mind, the researchers designed a deep learning model to answer a question they consider a proxy for identifying which patients would benefit from palliative and end-of-life care: “Given a patient and a date, predict the mortality of that patient within 12 months from that date, using EHR data of that patient from the prior year.”

The researchers used STRIDE (Stanford Translational Research Integrated Database Environment), a clinical data warehouse supporting clinical and translational research at Stanford University. The portion of STRIDE used in the work comprises the EHR data of approximately 2 million adult and pediatric patients cared for at either the Stanford Hospital or the Lucile Packard Children’s hospital between 1995 and 2014.

The researchers were agnostic to disease type, disease stage, age, or other indicators, choosing instead to “build a deep learning model that considers every patient in the EHR (with a sufficiently long history), without limiting our analysis to any specific sub-population or cohort.”

The “supervised learning” dataset was made up of patients with EHR records and hospital visits between three and 12 months of death. Patients who died within that three-to-12-month window were considered positive cases; those who did not die within 12 months were considered negative cases. The inclusion criteria selected a total of 221,284 patients; 177,011 of those were used to train the model, and 22,139 were held out to validate it.
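The labeling rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `label_case`, the day counts used to approximate three and 12 months, the treatment of patients with no recorded death as negative cases, and the choice to return `None` for deaths that fall outside the described window are all assumptions made for the example.

```python
from datetime import date
from typing import Optional

# Assumed day-count approximations for "three months" and "12 months".
MIN_DAYS = 90
MAX_DAYS = 365


def label_case(prediction_date: date, death_date: Optional[date]) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None (not covered by the rule)."""
    if death_date is None:
        # Assumption: no recorded death means the patient survived past 12 months.
        return 0
    days_to_death = (death_date - prediction_date).days
    if days_to_death > MAX_DAYS:
        return 0  # died, but more than 12 months after the prediction date
    if MIN_DAYS <= days_to_death <= MAX_DAYS:
        return 1  # died inside the three-to-12-month window
    # Death sooner than three months (or before the prediction date):
    # the description above does not say how these are handled.
    return None
```

For example, `label_case(date(2010, 1, 1), date(2010, 7, 1))` falls inside the window and is labeled positive, while a patient with no recorded death is labeled negative.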

The model is a Deep Neural Network (DNN) comprising an input layer of 13,654 dimensions made up of diagnostic, CPT, and prescription codes; 18 hidden layers (each 512 dimensions); and a scalar output layer, the authors report. The software was implemented using the Python programming language (version 2.7), PyTorch framework, and the scikit-learn library (version 0.17.1). The training was performed on an NVIDIA TitanX (12GB RAM) with CUDA version 8.0.
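To give a sense of the model's scale, the sketch below counts the trainable parameters implied by the reported layer sizes. It assumes plain fully connected layers with a bias term on every layer, which the description above does not confirm; the helper `count_parameters` is a hypothetical name, and the example is written for Python 3 rather than the Python 2.7 the authors used.

```python
# Layer widths as reported: input, 18 hidden layers, scalar output.
INPUT_DIM = 13_654
HIDDEN_DIM = 512
NUM_HIDDEN_LAYERS = 18
OUTPUT_DIM = 1


def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network (assumed layout)."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus bias vector
    return total


layer_sizes = [INPUT_DIM] + [HIDDEN_DIM] * NUM_HIDDEN_LAYERS + [OUTPUT_DIM]
total_parameters = count_parameters(layer_sizes)
print(f"{total_parameters:,} trainable parameters")
```

Under these assumptions, the first layer, which maps the 13,654 input codes to 512 units, accounts for more than half of the total.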

The authors report that the model is “reasonably calibrated”, with a Brier score of 0.042. “Upon conducting a chart review of 50 randomly chosen patients in the top 0.9 precision bracket of the test set, the palliative care team found all were appropriate for a referral on their prediction date, even if they survived more than a year. This suggests that mortality prediction was a reasonable (and tractable) choice of a proxy problem to solve,” they write.
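For readers unfamiliar with the metric, a Brier score is the mean squared difference between each predicted probability and the actual 0/1 outcome, so 0 is a perfect score and lower is better. The sketch below shows the calculation on made-up numbers; the function name `brier_score` and the four example patients are illustrative assumptions, not data from the paper.

```python
def brier_score(predicted_probabilities, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(predicted_probabilities) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Made-up predictions for four hypothetical patients (not data from the paper).
example_probabilities = [0.9, 0.1, 0.8, 0.3]
example_outcomes = [1, 0, 1, 0]
example_score = brier_score(example_probabilities, example_outcomes)
```

As a point of comparison, a model that predicts 0.5 for every patient scores 0.25 regardless of the outcomes.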

The Right Problems

Deep learning techniques have demonstrated tremendous success in predictive ability, but the authors acknowledge the challenges in convincing physicians to use such predictive algorithms. “It is important to establish the trust of the practitioner in the model’s decisions for them to feel comfortable taking actions based on it,” they write.

Martha Presley, a palliative care physician at Aspire Health who was not affiliated with the work, sees great promise in the Stanford team’s approach.

“Current models of risk prediction in serious illness are tools for providers to use alongside clinical evaluation to formulate a prognosis. They are not independently reliable outside of the context of a holistic patient picture, and even the gestalt of a seasoned palliative care clinician dissected into its elements can only be several hundreds of items long,” she told Clinical Informatics News via email after reading the study. “The beauty of unsupervised machine learning is it can actually teach us what are greater predictors of mortality by identifying patterns among patients that pass away. Just because we anecdotally believe that recurrent hospitalizations for heart failure and a particular combination of drugs are a likely predictor of one year mortality does not mean that in aggregate in the data over a population that will bear out. The use of a DNN has value in both identifying patients as well as telling us how it identified the patients.”

But Presley notes some challenges as well. “Implementation of this for populations across varied health care settings will require collaboration of insurance companies as well as hospital systems and even local physician practices. The sharing of data is necessary to see the whole patient picture and more accurately identify patients,” she wrote.

The Stanford researchers report that the model is currently being piloted for daily, proactive outreach to newly admitted patients. They also plan to collect objective outcome data (such as rates of palliative care consults and rates of goals-of-care documentation) resulting from the use of the model.

This article also ran on Clinical Informatics News.