NIH unveils comprehensive proteogenomic dataset to help cancer researchers unravel molecular mysteries

Aug. 15, 2023

The National Institutes of Health (NIH) is releasing a comprehensive dataset that standardizes genomic, proteomic, imaging, and clinical data from individual studies of more than 1,000 tumors across 10 cancer types. Researchers from around the world will be able to use this publicly available resource to uncover new molecular insights into how cancers develop and progress. The dataset was generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) at the National Cancer Institute (NCI), part of NIH.

The pan-cancer proteogenomic dataset, which is described in a paper published in Cancer Cell, builds on decades of technological advances in proteomic science. The launch of this dataset supports the Cancer Moonshot goal of accelerating cancer research through improved sharing of data. Two additional research papers published in Cell by CPTAC investigators provide an initial demonstration of the dataset's potential as a valuable resource for scientific discovery. In the first paper, multi-omic analyses are used to link cancer driver mutations with protein patterns. The second paper delves into protein modifications that regulate cell signaling and physiology to show associations with DNA repair, metabolism, and immunity across different tumor types.

The pan-cancer proteogenomic dataset will be publicly available through the NCI Cancer Research Data Commons repositories. Proteomics data can be accessed via the Proteomic Data Commons at https://pdc.cancer.gov/pdc/cptac-pancancer. Genomic and transcriptomic data can be accessed via the Genomic Data Commons at https://portal.gdc.cancer.gov and the Cancer Data Service at https://dataservice.datacommons.cancer.gov.

NIH release