Human Proteoform Project Could Be Biology’s Next Moonshot

By Deborah Borfitz

January 19, 2021 | Eight years ago, an international team of researchers proposed that the term “proteoform” be adopted to describe the vast number of forms of protein products from our genes (including changes due to genetic variations, alternative RNA splicing, and post-translational modifications) to reduce semantic ambiguity in the study of proteins. Because these proteoforms can be turned on or off, understanding them with absolute molecular precision is required to demystify how proteins function and “unlock the future of human biology,” says Neil Kelleher, professor of molecular biosciences, chemistry, and medicine at Northwestern University and faculty director of Northwestern Proteomics, as well as a world-renowned proteomics pioneer.

To that end, the nonprofit Consortium for Top-Down Proteomics recently proposed the Human Proteoform Project to generate a definitive reference set of the proteoforms produced from the genome. This will become a “seminal moment in science,” Kelleher says, and the initiative is the next obvious step now that the Human Genome Project has provided the blueprint for how proteins get made.

Details of the proposal were recently published in Science Advances (DOI: 10.1126/sciadv.abk0734). The end goal is creation of a Human Proteoform Atlas, a high-resolution reference proteome that will be public and available to all, including the many proteomics companies recently advancing in the private sector. It is possible to accomplish this ambitious project over the next decade, Kelleher says.

Mapping the open frontier of our proteome would have wide-ranging implications, he adds. It would improve the return on investment in clinical proteomics, chemical proteomics and drug development, regenerative medicine, and next-generation proteomics such as single-molecule protein sequencing.

Most people have more than a passing interest in proteins whether they are aware of it or not, says Kelleher, because proteins are involved in all human diseases. The Human Proteoform Project would enable earlier and more precise detection of those diseases.

That could help explain the rush of money from venture capitalists, institutional investors, and Wall Street into proteomics (by some accounting, roughly $3 billion in the past 18 months alone), says Kelleher. The recipients include biotechnology companies focused on promising technologies such as single-cell proteomics techniques, single-molecule proteoform analysis, and single-molecule protein sequencing.

In some sense, they are vying to become the “Illumina of proteomics,” he says, replicating the success of one of the biggest next-generation sequencing companies made possible by the Human Genome Project. In the years that followed, that publicly funded initiative stimulated the creation of about 300,000 new jobs as the price of sequencing genomes plummeted.

“Proteomics is on a path to become equal to genomics in terms of economics and benefits for the future of human health,” says Kelleher. With government support, the proteomics ecosystem could grow tenfold. A pre-competitive proteomics initiative launched now could therefore have accelerated impact relative to the Human Genome Project because of work already underway in the private sector.

‘Life Of Their Own’

Northwestern Proteomics, the leader in top-down proteomics, is certainly interested in advancing the Human Proteoform Project. The 60-scientist group maintains the proteoform informatics platform that will serve as the initial version of the Human Proteoform Atlas, Kelleher shares. Details about the creation of the web-based repository were just published in Nucleic Acids Research (DOI: 10.1093/nar/gkab1086).

The field has long been dominated by “bottom-up proteomics,” based on mass spectrometry, which generates about $5 billion per year in economic activity. Northwestern Proteomics and the Consortium for Top-Down Proteomics, where Kelleher serves as president of the board of directors, are instead concerned with the systematic discovery of intact proteoforms with all their molecular parts.

Even today, proteoform is probably a familiar term to only a minority of scientists, he says. Structural biologists may have concluded that study of the proteome has reached its pinnacle now that AI systems like AlphaFold (developed by Google’s sister company DeepMind) have figured out how to predict the way proteins fold via computer.

But the proteoforms, what Kelleher describes as “all the decorations that occur in life,” remain largely unknown. As an example, he points to the eyeballs, which yellow and get diseased with age because certain protein molecules don’t get repopulated.

It’s the same scenario across all major disease areas, he says, including cardiology, oncology, and, most especially, neurology and neurodegeneration. “Clinicians even call them proteinopathies, or diseases of proteins in your brain.”

By mapping out which proteins are created from the body’s 20,300 genes, the Human Proteoform Project will “elevate the whole ecosystem for biomedical research and for clinical practice,” says Kelleher. “There is a proteoform family for every human gene, and proteoforms have a life of their own. They can be activated or repressed after they are produced, and their diversity varies widely in our different cell types in unknown ways.”

Millions of unique proteoforms are created across the genome due to genetic variation, modification, or alternative splicing, making the project an almost unfathomably large undertaking. “All of this is radically open science,” Kelleher says, from which all humankind stands to benefit.

Top-Down Strategy

The Consortium for Top-Down Proteomics launched in 2012. It now has 400 members from around the world advocating for a government role in funding the Human Proteoform Project, says Kelleher.

The proposed approach is different from mainstream proteomics, which captures about 10% of the human proteome, he continues. He likens the bottom-up strategy to stamp collecting, where proteoforms are a collection of stamps that get shredded into about 50 pieces each, all about the same size, which then get blown about. Scientists must get down on the floor to collect all the little pieces and try to put the stamps (proteoforms) back together.

In contrast, a top-down strategy determines the precise weight of each stamp (proteoform), all of which are slightly different, says Kelleher. The stamps would also have distinct structural attributes. Scientists then “controllably break” the stamps into pieces to achieve 100% molecular precision for each one.
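The contrast between the two strategies can be illustrated with a toy sketch. It is not drawn from the consortium's methods: the two-peptide protein, its masses, and the function names below are hypothetical, and the mass arithmetic is deliberately simplified. It shows two samples containing different proteoforms that look identical once shredded into peptides, yet differ when each proteoform is weighed intact.

```python
from collections import Counter

# Hypothetical toy protein made of two peptides, each with one
# phosphorylation site. Masses are illustrative, not real data.
PEPTIDE_MASSES = {"pepA": 1000.0, "pepB": 1200.0}
PHOSPHO = 79.966  # approximate mass shift of one phosphorylation, in Da


def peptide_view(mixture):
    """Bottom-up style: shred every proteoform and pool the pieces."""
    pieces = Counter()
    for proteoform in mixture:
        for peptide, modified in proteoform.items():
            pieces[(peptide, modified)] += 1
    return pieces


def intact_view(mixture):
    """Top-down style: weigh each proteoform whole (simplified sum)."""
    masses = Counter()
    for proteoform in mixture:
        total = sum(
            PEPTIDE_MASSES[peptide] + (PHOSPHO if modified else 0.0)
            for peptide, modified in proteoform.items()
        )
        masses[round(total, 3)] += 1
    return masses


# Sample 1: one unmodified and one doubly modified proteoform.
sample_1 = [
    {"pepA": False, "pepB": False},
    {"pepA": True, "pepB": True},
]
# Sample 2: two singly modified proteoforms, one per site.
sample_2 = [
    {"pepA": True, "pepB": False},
    {"pepA": False, "pepB": True},
]

same_pieces = peptide_view(sample_1) == peptide_view(sample_2)
same_intact = intact_view(sample_1) == intact_view(sample_2)
print("Identical at the peptide level:", same_pieces)
print("Identical at the intact level:", same_intact)
```

In this toy case the pooled peptides cannot say which modifications travelled together on the same molecule, while the intact masses can. The two singly modified proteoforms in the second sample still share one intact mass, which is consistent with the article's point that top-down work also breaks the proteoforms apart in a controlled way after weighing them.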

The board of directors of the Consortium for Top-Down Proteomics is now forming an advisory board to broaden the advocacy base for the Human Proteoform Project, Kelleher reports. It will include current supporters of the consortium as well as scientific leaders.

The consortium has members from academic institutions, corporations, and government agencies worldwide, and its work is supported by sponsorships from Thermo Fisher Scientific, Bruker, SCIEX, Pfizer, and Agilent.

Players in the proteomics space include large companies (e.g., Thermo Fisher Scientific, Bruker, Agilent, Waters, and SCIEX), numerous small- to mid-size companies providing tools and services, and a growing assortment of small biotech companies attracting venture capital, says Kelleher. Additionally, many biopharmaceutical companies are already using top-down proteomics every day. “Half of the whole pipeline of new drugs are proteins, so that means proteoforms.”

Scaling The Atlas

The existing proteoform atlas, residing on the consortium’s website, contains a couple hundred thousand proteoforms. Northwestern Proteomics also has 50,000 unique human proteoforms from five human tissues, Kelleher says.

As envisioned, technology development over the first three to four years of the project will focus on advancing mass spectrometry—a “linear extension of the current state of play in top-down proteomics,” says Kelleher. After that, “the crystal ball as to what disruptive platforms could emerge gets a little hazy… which is why we’re excited by all those biotech proteomic entrepreneurial companies.”

The community will need to expand its team of proteoform informaticians to perhaps 40 or 50 software engineers, he adds. The Consortium for Top-Down Proteomics also has a working group of about 40 “computer geeks” around the world currently being funded by an assortment of small grants.

But the project can’t realistically happen on the scale proposed without major financial support from federal governments or foundations, Kelleher notes. The initial ask, now that the framework for the project has been outlined, is on the order of $100 million a year in support. For perspective, the Human Genome Project required approximately $3 billion in public investment over 13 years.