Deep dive into molecular testing to improve diagnosis of difficult tumors: A Q&A with Drs. Jie-Fu (Jeff) Chen and JinJuan Yao
The College of American Pathologists (CAP) recently published an article advocating for molecular testing for cancers of unknown primary (CUP) and other diagnostically ambiguous tumors.
The article, titled “Utility of Molecular Testing in the Diagnostic Workup of Difficult-to-Classify Tumors,” highlights the benefits of genomic, transcriptomic, and epigenetic profiling for pinpointing tumor origin, leading to personalized treatment for patients.
Authors Jie-Fu (Jeff) Chen, MD, FCAP, CAP Personalized Healthcare Committee member, and JinJuan Yao, MD, PhD, FCAP, offered Medical Laboratory Observer a deeper dive into the article, breaking down where molecular testing could aid the diagnosis of difficult tumors, how AI can be used to support clinical decision making, and next steps for labs.
When molecular testing is used to classify difficult tumors, which findings should carry the greatest diagnostic weight—truncal driver alterations, mutational signatures, or multi-omic patterns—and how do you prioritize these when results are discordant?
A: First, we need to emphasize that the use of molecular findings in classifying difficult tumors is highly dependent on the clinical and histologic context. While studies have suggested that some tumor types carry “pathognomonic” or “defining” molecular alterations, one must interpret the results carefully and together with other supporting evidence, such as the location of the tumor and its morphologic/immunophenotypic findings. In a different context, with appropriate prior knowledge about the patient’s oncologic history or findings from previous tumors, even genomic variants that are rarely observed can be very helpful. Here’s a breakdown of the common clinical scenarios where molecular tests may help with classifying difficult tumors, and the types of molecular findings that carry more diagnostic weight in those scenarios:
Scenario 1 — Cancer of unknown primary (CUP)
In this scenario, clinical and histologic findings provide extremely limited information. Here, the “pathognomonic” or highly specific molecular events (those observed exclusively in one or very few tumor types) carry the highest diagnostic weight. This is not limited to one category of molecular alteration. It is worth noting, however, that these highly specific molecular events and their corresponding tumor types represent only a small subset of cases. In most cases, the tumors do not carry such highly specific molecular events, and classification requires an integrative approach that considers all available molecular findings: truncal driver alterations, mutation signatures, gene expression, methylation, and even multiomic data. This is also the scenario where artificial intelligence (AI) or machine learning (ML)-based algorithms may provide valuable suggestions after appropriate training and validation on large datasets.
Scenario 2 — Difficult tumor with limited differential diagnosis or known history
In this scenario, clinical and histologic findings, including the patient’s oncologic history, have narrowed the differential diagnosis to a smaller range. The “pathognomonic” molecular events are still very helpful, but now we can also consider the truncal driver alterations that show significantly different prevalence across the tumor types in the differential diagnosis. These truncal driver alterations represent distinct mechanisms underlying the tumorigenic processes and, based on our knowledge of tumor biology, can help us point further toward one or a few specific tumor types. When pathognomonic or truncal driver alterations are not present, a definitive tumor classification is more likely to be reached using other supporting evidence such as mutation signature analysis, methylation classifiers, or gene expression clustering.
Scenario 3 — Known molecular profile from prior tumor
In this scenario, the question is whether the tumor of interest represents a recurrence/metastasis from the patient’s known neoplastic disease, or a new primary tumor that is biologically distinct from the prior one. When the molecular profile of the previous tumor is available, comparative molecular testing has been shown to provide a much more reliable determination of clonal relationship than clinical or histologic findings alone. In this setting, we rely on the genomic alterations shared between the tumors to establish the clonal relationship, with emphasis on non-driver and sometimes low-prevalence molecular events. Common truncal driver alterations are helpful in establishing independent primary tumors when the tumors carry different truncal drivers; however, a single shared hotspot driver mutation between two tumors may be insufficient to establish a clonal relationship, as these common truncal drivers have a higher chance of occurring randomly in two independent primary tumors.
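The reasoning in Scenario 3 can be illustrated with a minimal sketch in which each shared alteration is weighted by how rare it is, so that a rare shared passenger event counts for more than a common hotspot driver. This is not a validated clonality method: the variant names, prevalence values, default prevalence, and score threshold below are all hypothetical placeholders chosen only to show the logic.

```python
import math

# Hypothetical background prevalence of each alteration in the tumor types
# under consideration. Placeholder values for illustration, not real data.
BACKGROUND_PREVALENCE = {
    "DRIVER_X:hotspot": 0.30,       # common truncal driver hotspot
    "GENE_Y:passenger_1": 0.0005,   # rare non-driver event
    "GENE_Z:passenger_2": 0.0005,   # rare non-driver event
}
DEFAULT_PREVALENCE = 0.001  # assumed value when no estimate is available


def shared_evidence(tumor_a, tumor_b, prevalence=BACKGROUND_PREVALENCE):
    """Return the shared alterations and a score summing -log10(prevalence)."""
    shared = sorted(set(tumor_a) & set(tumor_b))
    score = sum(
        -math.log10(prevalence.get(v, DEFAULT_PREVALENCE)) for v in shared
    )
    return shared, score


def suggest_relationship(tumor_a, tumor_b, threshold=3.0):
    """Map the shared-alteration score to a coarse, non-binding suggestion."""
    shared, score = shared_evidence(tumor_a, tumor_b)
    if not shared:
        return "independent_favored"
    if score >= threshold:
        return "clonal_supported"
    # Shared events exist but are too common to exclude coincidence.
    return "insufficient"


prior = ["DRIVER_X:hotspot", "GENE_Y:passenger_1"]
current_same = ["DRIVER_X:hotspot", "GENE_Y:passenger_1", "GENE_Z:passenger_2"]
current_driver_only = ["DRIVER_X:hotspot"]

print(suggest_relationship(prior, current_same))
print(suggest_relationship(prior, current_driver_only))
```

In this toy example, sharing only the common hotspot driver stays below the threshold, while sharing the driver plus one rare passenger event crosses it. Any real assessment would still need correlation with clinical and histologic findings.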
In general, mutation signatures and multiomic patterns are usually considered supporting evidence in current practice. Mutation signatures often reflect broad categories of biologic processes that may contribute to tumorigenesis, and need to be considered alongside clinical and histologic features, as well as other molecular findings. The models or algorithms that analyze multiomic patterns are limited by their design and validation to predicting within a defined group of tumor types. In addition, the tumor types included in the training process were often generic categories and may not reflect the most up-to-date tumor classification or cover rare subtypes. Proper correlation with clinical and histologic findings therefore becomes more critical when interpreting predictions from these algorithms. In spite of the limitations, these novel algorithms powered by ML or AI-assisted approaches have demonstrated impressive capacity in predicting tumor classification and primary sites.
Your article highlights the growing role of machine learning and AI–based tumor classifiers. What level of transparency or validation should laboratories expect from these tools before integrating them into diagnostic workflows?
A: Before clinical integration, labs should perform comprehensive validation studies for AI classifiers, as they would for any high-impact assay. The intended use should have a clear scope, the performance should meet expectations, and the limitations should be clearly described.
The transparency should include the following aspects:
- Intended use + label set: exactly what tumor types are covered, and what is not covered.
- Training/validation cohort composition: size, specimen types (FFPE vs fresh), sites, demographics if relevant, tumor purity distribution, and class imbalance.
- Input requirements + QC thresholds: minimum coverage/call rates, tumor fraction requirements, batch effects, and failure modes.
- Confidence reporting: calibrated probabilities, top outputs, and “no-call/uncertain” behavior.
- Known limitations: performance on rare tumor types, out-of-distribution handling, and performance in treated/recurrent tumors.
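The confidence-reporting item above can be sketched as a small decision rule that returns the top-ranked outputs and falls back to "no-call" when the prediction is not confident enough. This is a minimal illustration, assuming the classifier already produces calibrated probabilities; the function name, the tumor type labels, and the threshold values are hypothetical and would need to be set during validation.

```python
def classify_with_no_call(probabilities, min_confidence=0.8,
                          min_margin=0.2, top_k=3):
    """Return a call (or "no-call") plus the top-k ranked outputs.

    probabilities: dict mapping tumor type label -> calibrated probability.
    min_confidence and min_margin are illustrative thresholds only.
    """
    if not probabilities:
        raise ValueError("no probabilities supplied")
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_p = ranked[0]
    runner_up_p = ranked[1][1] if len(ranked) > 1 else 0.0
    # Abstain when the top class is weak or too close to the runner-up.
    if best_p < min_confidence or (best_p - runner_up_p) < min_margin:
        call = "no-call"
    else:
        call = best_label
    return {"call": call, "top": ranked[:top_k]}


confident = classify_with_no_call(
    {"type_A": 0.90, "type_B": 0.07, "type_C": 0.03})
uncertain = classify_with_no_call(
    {"type_A": 0.50, "type_B": 0.45, "type_C": 0.05})
print(confident["call"], uncertain["call"])
```

Reporting the ranked list alongside the call lets the pathologist see the alternatives the model considered, even when it abstains.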
Thorough validations should also include:
- Independent external validation (not just internal cross-validation).
- Local verification on your own specimen types/workflow:
  - A representative set of real-world “easy” cases and “challenging” cases
  - Pre-analytical variations common in your lab (low-input, necrotic, low coverage, and so on)
- Reproducibility testing (run-to-run / instrument-to-instrument if relevant).
- Clinical risk controls: SOPs for discordant results, second-review triggers, and mandatory correlation with morphology/IHC.
- Ongoing monitoring: drift checks, periodic revalidation when the model updates or when your pre-analytics change.
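One simple form of the drift check mentioned above is to compare the recent no-call rate with the rate observed during validation. The sketch below is only an assumption about how a lab might implement such a check; the function names and the tolerance value are hypothetical, and a real monitoring plan would likely track several metrics.

```python
def no_call_rate(calls):
    """Fraction of results reported as "no-call"."""
    if not calls:
        raise ValueError("no calls supplied")
    return sum(1 for c in calls if c == "no-call") / len(calls)


def drift_check(baseline_calls, recent_calls, tolerance=0.10):
    """Flag when the recent no-call rate departs from the validation baseline.

    tolerance is an illustrative absolute difference, not a recommended value.
    """
    baseline = no_call_rate(baseline_calls)
    recent = no_call_rate(recent_calls)
    return {
        "baseline": baseline,
        "recent": recent,
        "flag": abs(recent - baseline) > tolerance,
    }


validation_calls = ["type_A"] * 9 + ["no-call"]       # 1 of 10 abstained
recent_batch = ["type_A"] * 6 + ["no-call"] * 4       # 4 of 10 abstained
print(drift_check(validation_calls, recent_batch))
```

A raised flag would not by itself indicate a model failure; it is a trigger for review, for example of a pre-analytical change or a shift in case mix.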
Bottom line: AI tools should be integrated as decision support, not sole arbiters—unless the evidence and regulatory context justify otherwise. Labs should insist on a “no-call” option and clear documentation of the model’s boundaries.
In the coming years, we will likely see more studies and algorithms favoring multi-layer evidence and standardized integration over any single “magic” assay. We also expect more evidence to emerge from experience applying these algorithms to real-world patient data in real-time clinical management and decision making.
Looking ahead, which molecular or multi-omic advances do you believe will have the greatest impact on resolving diagnostically ambiguous cancers over the next five years—and how should laboratories prepare now?
A: Currently, next-generation sequencing (NGS) is widely used for detecting mutations, copy number alterations, and fusions/rearrangements in tumor tissue and circulating tumor DNA (ctDNA). Beyond detection of individual alterations, more integrative analyses such as microsatellite instability, loss of heterozygosity, and mutation signatures are beginning to be incorporated into the assays. Methylation profiling and whole transcriptome sequencing are also becoming increasingly critical in some clinical settings. Over the next 5 years, we believe that more labs will be equipped with high-throughput molecular assays that generate more comprehensive profiles spanning the genome, transcriptome, methylome, and others. In the meantime, with the advances in AI and computational technologies, combined molecular or multiomic approaches will evolve rapidly and make a greater impact on resolving diagnostically ambiguous cancers. While all the novel molecular tools aim for more precise tumor classification and more clinical benefit, there will be a greater need for proper workflows and stewardship to ensure that resources and tests are used appropriately.
What we think laboratories can do to prepare now:
- Build a tiered testing algorithm: when to reflex from IHC to molecular testing, and how to prioritize DNA, RNA, methylation or other tests (based on tumor type and tissue constraints).
- Strengthen pre-analytics: tumor enrichment strategies, macro-dissection standards, QC gating, documentation of fixation/decalcification.
- Develop an “interpretation playbook”:
  - How to resolve discordant results
  - When to repeat tests with suboptimal quality
  - What orthogonal confirmations are required
- Invest in data integration: structured reports that unify morphology, IHC, and multi-omics; decision-support templates; tumor board pipelines.
- Plan for AI governance: validation SOPs, update control, drift monitoring, and accountability.
- Train the team: not just molecular staff—all practicing pathologists need practical literacy in what each assay can and cannot do.


