Overcoming ‘Analysis Paralysis’ in Cell Painting With Artificial Intelligence

Contributed Commentary by Victor Wong, Core Life Analytics, and Angeline Lim, Molecular Devices

November 25, 2022 | Gone are the days of only measuring single parameters in cell-based experiments. Instead, researchers should widen their focus, namely with the help of innovations like the Cell Painting assay—developed in 2013 by the Carpenter-Singh Lab at the Broad Institute of MIT and Harvard—for cytological characterization. This multiplexed phenotypic profiling approach introduces six fluorescent dyes that stain and identify up to eight cellular components and organelles after cells are cultured and treated with experimental conditions of interest. The end result is an abundant collection of information about the cellular response to perturbations—all nestled within colorful high-content images.

This information presents a great opportunity: researchers can access more information from an individual experiment than ever before. However, they are also faced with three major hurdles: figuring out how to mine all that data effectively, performing predictive analytics, and ensuring easy access to insights in perpetuity.

In my (Victor’s) experience, data analytics isn’t core to scientific education, so becoming proficient in it is a significant investment. With vast datasets becoming more commonplace, it’s critical that researchers have intelligent, robust analytics tools at their disposal that dive into metrics quickly and paint a holistic picture of the experiment.

Cell Painting analysis involves translating phenotypes into numerical data and then identifying which of them are similar or dissimilar in order to draw inferences. The assay generates terabytes of information. Conventional statistical analysis techniques are an option. Unfortunately, these usually only consider a few quantitative measures of cellular features such as size, intensity, and texture, leaving a plethora of helpful information in the dark.

This is where artificial intelligence (AI) and machine learning-enabled software can elevate data analytics, shining a light on previously hidden biological insights. Not only does such software capture the aforementioned quantitative measures, but it also draws scientific meaning from experiments, offering interpretations rather than just outputting data. For example, researchers can assess a compound’s effect on the entire phenotype by generating detailed cellular profiles. Furthermore, they can make unbiased predictions around drug efficacy and safety and narrow down mechanisms of action, leading to novel drug discoveries.

Researchers must keep the following three concepts in mind to realize the true potential of Cell Painting and maximize the use of AI-enabled data analysis software.

Rubbish In, Rubbish Out

Data mining is the process of finding hidden anomalies, patterns, and correlations within large datasets to predict outcomes accurately, something AI-enabled software can support. Data quality at the start of a research project sets the foundation for the quality of data mining, and with it the accuracy of any predictions.

This fact reinforces the importance of having exceptionally clear, robust raw images from the Cell Painting assay. If researchers rely on subpar imaging technology that is not optimized for high-throughput image acquisition or for handling highly multiplexed cell-based assays, they will face muddy data that can negatively impact the quality of analyses further in the workflow.

Impeccable data quality is especially critical when the software begins to apply AI-based algorithms during data analysis. Poor quality or biased data will affect the software’s decision-making, proving true that the algorithms are only as accurate as the information they’re fed. Therefore, the software should guide users through the necessary steps to pre-process their data. These steps include normalization, outlier removal, imputation of missing data, and more to ensure high-quality and unbiased readouts at the end of an assay.
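To make these pre-processing steps concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not the method used by any particular software package. The function name `preprocess`, the layout of the feature matrix (one row per well, one column per measured feature), the choice of a robust z-score against control wells, and the `outlier_z` threshold are all assumptions made for this example.

```python
import numpy as np

def preprocess(features, is_control, outlier_z=5.0):
    """Impute, normalize, and filter a (wells x features) matrix.

    Hypothetical helper for illustration; the threshold and the
    statistics chosen here are assumptions, not a standard.
    """
    X = np.asarray(features, dtype=float).copy()
    is_control = np.asarray(is_control, dtype=bool)

    # 1. Imputation: fill missing values (NaN) with the per-feature median.
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]

    # 2. Normalization: robust z-score relative to the control wells,
    #    using the median and the scaled median absolute deviation (MAD).
    ctrl = X[is_control]
    center = np.median(ctrl, axis=0)
    mad = np.median(np.abs(ctrl - center), axis=0)
    scale = 1.4826 * mad
    scale[scale == 0] = 1.0  # avoid division by zero for constant features
    Z = (X - center) / scale

    # 3. Outlier removal: drop wells where any feature is more than
    #    outlier_z robust standard deviations away from the controls.
    keep = np.all(np.abs(Z) <= outlier_z, axis=1)
    return Z[keep], keep
```

The function returns both the cleaned matrix and the boolean mask of wells that were kept, so removed wells can be reviewed rather than silently discarded. In practice the threshold would need care, since a well with a very strong but genuine phenotype can look like an outlier.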

Iterative Flexibility Is Key

AI-enabled software turns images from multiple experiments into numbers that can be rapidly collated and flagged for similarities. For example, consider if researchers were to analyze hundreds of thousands of organoids from multiple patients on a large screen, exposing them to a library of perturbations. In an initial analysis, one researcher could identify specific organoid phenotype profiles and add this new information to the dataset as metadata.

In subsequent iterations, that researcher—or someone else—could zoom in on these specific profiles. In the process, AI helps uncover patterns underlying these phenotypes, determine the biological mechanisms, and easily identify which perturbations affect these specific mechanisms. This iterative flexibility is critical. The software must flex to incorporate—and reference—new information as it’s uncovered to deliver the most thorough, predictive analytics.

Waste Nothing

The ability to build robust, reliable reference libraries, against which every future dataset can be compared, is enormously valuable in Cell Painting. Each time a new experiment runs with a new compound, it can be easily compared to existing datasets. AI can quickly point out when your cells react to a new biomolecule as they did to an anti-cancer drug tested last month, or even last year. Such “hits” could signal a potentially new, less harsh therapeutic worth looking into.
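The comparison against a reference library can be sketched as a similarity search. The example below is a simplified illustration in Python with NumPy, not a description of how any specific product works. It assumes each compound has already been reduced to a numeric profile vector, and it uses cosine similarity as one plausible measure; the function name `rank_reference_matches` and its parameters are hypothetical.

```python
import numpy as np

def rank_reference_matches(query, reference_profiles, reference_names, top_k=3):
    """Rank reference compounds by cosine similarity to a query profile.

    Hypothetical helper for illustration. Profiles are assumed to be
    pre-processed feature vectors of equal length.
    """
    q = np.asarray(query, dtype=float)
    R = np.asarray(reference_profiles, dtype=float)

    # Cosine similarity: dot product divided by the product of vector norms.
    denom = np.linalg.norm(q) * np.linalg.norm(R, axis=1)
    denom[denom == 0] = np.inf  # all-zero profiles get a similarity of 0
    sims = (R @ q) / denom

    # Highest similarity first; keep only the top_k matches.
    order = np.argsort(-sims)[:top_k]
    return [(reference_names[i], float(sims[i])) for i in order]
```

A new compound's profile would be passed as `query`, and the returned list names the reference compounds whose phenotypic profiles it most resembles. Because the library is only read, never modified, the same references can be reused for every later experiment.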

This ‘waste nothing’ mindset has the potential to reach far beyond any one research group or institution. For example, when reference libraries are publicly available—as is planned with the JUMP-Cell Painting Consortium featuring over 140,000 compounds—any researcher around the globe can access the data at any point in time, leading to increased collaboration and, ultimately, a transformation of the drug discovery process.

Future Implications

AI software helps researchers regain control of their innate creativity and curiosity by alleviating data overload and offering predictions that might otherwise have remained out of view. Together, researchers can ask more questions, run more experiments, and discover more answers to some of life’s most pressing challenges.

Victor Wong is Chief Scientific Officer at Core Life Analytics. In his role, he helps provide rapid and unbiased analytical tools to streamline the high-throughput therapeutic development process for scientists everywhere, focusing on assisting users in leveraging the full power of the company’s StratoMineR platform. He has diverse experience in many therapeutic fields, and his scientific focus has been on target discovery and validation to find novel treatments for diseases such as diabetes, hearing loss, and neurological disorders. Wong received his Ph.D. in Physiology at the University of Toronto, was a Canadian Institutes of Health Research Fellow, and conducted his postdoctoral training at Weill Cornell Medicine. He can be reached at victor@corelifeanalytics.com.

Angeline Lim is an Applications Scientist at Molecular Devices specializing in bioimaging. She provides scientific, technical, and applications support for the company’s portfolio of ImageXpress high-content imaging systems. She has over 10 years of research experience in imaging, genetics, cell, and molecular biology. Lim holds a Ph.D. in Molecular Cell and Developmental Biology from the University of California at Santa Cruz. She can be reached at angeline.lim@moldev.com.