Transforming Cancer Registry Operations with Federated AI

Huren Sivaraj, 7 October 2025

Federated AI is not about replacing the people who build cancer registries.

It is about empowering stakeholders to re-imagine how a common foundational AI system can solve the challenges of two fundamentally different kinds of registry: epidemiological registries and clinical research registries.

Cancer registries remain one of the most critical and underappreciated services in cancer control. They provide the evidence base for how we measure disease burden, plan services, and assess outcomes. Yet, as oncology evolves, so must the systems that sustain it.

Over the past few years, we’ve seen a dual challenge emerge clearly: growing data complexity in the age of precision oncology, and limited human resources to keep pace with data collection and curation. Many registry teams now need to manage enormous volumes of unstructured reports with little automation support. The need is clear for a smarter, more adaptable data infrastructure that respects both the complexity of healthcare systems and data privacy.

Two Worlds of Cancer Registries

Across most healthcare systems, cancer data exists in two distinct but complementary domains. The first is epidemiological, typically managed at national or state level, focusing on incidence, survival, and population-level trends. It relies on structured notifications and standardised reporting. The second is clinical, developed within academic or research networks, integrating molecular, treatment, and outcomes data to drive translational research.

Both are essential, yet each remains incomplete when viewed through the lens of hospital and system-level investment. What has long been missing is a shared architecture: a single foundational data infrastructure capable of sustaining both epidemiological surveillance and clinical research needs.

Why the Time for Change is Now

The urgency is real. Cancer incidence is rising globally. Precision oncology has expanded what must be recorded. And policy leaders need faster insights to evaluate the impact of diagnostic and therapeutic changes. The ability to capture longitudinal outcomes has become non-negotiable.

Yet most hospitals still rely on manual registry abstraction. These processes are slow, fragmented, and expensive to sustain. Capability-building at the institutional level can no longer stop at single-time-point reporting. It needs to support multiple registry functions, from national surveillance to clinical research, all from the same digital foundation.

From Manual Review to Federated Automation

When we first began working with hospitals on registry automation, the core question was not “Can AI read a pathology report?” but “Can it do so with accuracy and governance that clinicians will trust?”

The answer has come through experience.

At an early registry partner, Oncoshot’s Well-Characterised Registry (WCR) platform has demonstrated over 95% validated accuracy in automating data extraction from pathology, CT, and clinical reports.

Of particular importance to the registry, the platform achieved 97.6% accuracy in histology extraction across more than 1,200 real-world pathology reports.

These results show that automation in cancer data abstraction is not just possible; it’s ready.

How the Solution Works

At its core, the WCR platform is powered by a federated, open-source large language model (LLM) trained on de-identified unstructured data. Its accuracy is benchmarked by comparing its output with data labelled by expert clinicians and data managers.
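To make "benchmarked against expert-labelled data" concrete, here is a minimal sketch of field-level accuracy scoring. The field names, the record layout (report ID mapped to a dictionary of fields), and the matching rule (case- and whitespace-insensitive exact match) are all assumptions for illustration, not Oncoshot’s actual validation protocol.

```python
from collections import defaultdict

# Hypothetical registry fields; the article does not specify the exact variable set.
FIELDS = ("histology", "tnm_stage", "tumour_size_mm")


def normalise(value):
    """Assumed matching rule: compare case- and whitespace-insensitively."""
    if value is None:
        return None
    return " ".join(str(value).lower().split())


def field_accuracy(predictions, gold_labels, fields=FIELDS):
    """Return per-field accuracy of model output against expert-labelled records.

    Both arguments map report_id -> {field: value}. Only fields the expert
    labelled are scored; a field missing from the prediction counts as an error.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for report_id, gold in gold_labels.items():
        pred = predictions.get(report_id, {})
        for field in fields:
            if field not in gold:
                continue  # no expert label for this field, so it cannot be scored
            total[field] += 1
            if normalise(pred.get(field)) == normalise(gold[field]):
                correct[field] += 1
    return {f: correct[f] / total[f] for f in fields if total[f]}
```

For example, with two labelled reports where the model gets one histology wrong but both stages right, `field_accuracy` returns `{"histology": 0.5, "tnm_stage": 1.0}`. Reporting accuracy per field, rather than one pooled figure, is what allows a headline number such as histology accuracy to be quoted separately.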

It operates within secure, isolated hospital environments, where all personally identifiable information is masked at source. The model identifies and structures key registry variables such as histology, diagnosis, tumour size, and TNM stage directly from unstructured documents.
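As a rough illustration of what "masked at source" means in practice, the sketch below replaces identifiers in report text with typed placeholders before any downstream processing. The patterns (an "MRN" label followed by 6 to 10 digits, dd/mm/yyyy dates, email addresses) are hypothetical examples; production de-identification relies on validated, locale-specific rules and usually named-entity recognition, and this is not the WCR platform’s masking method.

```python
import re

# Hypothetical identifier patterns for illustration only; real de-identification
# needs validated, locale-specific rules and usually NER, not a few regexes.
PII_PATTERNS = {
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def mask_pii(text):
    """Replace each matched identifier with a typed placeholder such as [MRN].

    Clinical content (histology, stage, size) is left untouched so that it
    can still be extracted after masking.
    """
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Typed placeholders such as `[DATE]` keep the sentence structure readable for the model while ensuring the original values never leave the hospital environment.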

Governance and compliance are built into every layer. Hospitals retain full oversight and audit trails. The system is built to learn locally, adapting to institutional patterns and to user feedback on data accuracy. It also scales across sites without compromising accuracy or privacy, while providing aggregate insights to central registry partners or sponsors.
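One way to picture "aggregate insights to central registry partners" is federated aggregation of summary statistics: each hospital counts cases locally, and only the counts travel to the central partner, which combines them. The sketch below assumes case counts by histology and a small-cell suppression threshold of 5; both the statistic and the threshold are illustrative assumptions, and this shows aggregation of counts rather than the federated training of the model itself.

```python
from collections import Counter

# Hypothetical small-cell suppression threshold; each registry sets its own rule.
MIN_CELL_SIZE = 5


def local_summary(records):
    """Runs inside a hospital: count cases per histology. No row-level data leaves."""
    return Counter(r["histology"] for r in records)


def combine_site_summaries(site_summaries, min_cell=MIN_CELL_SIZE):
    """Runs centrally: sum per-site counts, then suppress small totals.

    Suppressed cells are returned as None so that rare categories cannot be
    traced back to individual patients.
    """
    total = Counter()
    for summary in site_summaries:
        total.update(summary)
    return {k: (v if v >= min_cell else None) for k, v in total.items()}
```

For instance, two sites reporting 4 and 3 adenocarcinoma cases and one site reporting a single squamous cell case combine to 7 adenocarcinoma cases, with the squamous count suppressed. Applying suppression to the combined totals, rather than per site, is one design choice among several; some registries also suppress at site level.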

This combination of precision and sovereignty creates trust across both healthcare systems and registries.

A Platform That Learns and Scales

True scalability in healthcare data systems depends on three things: privacy, accuracy, and currency. The WCR platform has been designed around all three.

  • Privacy by design: Personal identifiers never leave the institution.

  • Guaranteed clinical accuracy: Continuous validation and QA ensure reliability.

  • Always-fresh data: Automated refreshes mean registries stay up to date without starting over.
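The idea of refreshing "without starting over" can be sketched as incremental processing: record a content hash for each document when it is extracted, and on the next run re-process only documents that are new or whose text has changed. The use of SHA-256 hashes keyed by document ID is an assumption for illustration, not a description of how the WCR platform schedules its refreshes.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def select_for_refresh(documents, processed_hashes):
    """Return IDs of documents that are new or changed since the last run.

    documents: doc_id -> current text
    processed_hashes: doc_id -> hash recorded when that document was last extracted
    Unchanged documents are skipped, so earlier extractions are reused.
    """
    return [
        doc_id
        for doc_id, text in documents.items()
        if processed_hashes.get(doc_id) != content_hash(text)
    ]
```

If report "b" was amended and report "c" arrived since the last run, only those two are sent for extraction; report "a" keeps its existing, already-validated values.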

This means hospitals can finally move from reactive data cleaning to continuous, proactive registry intelligence, a foundation that supports research, policy, and clinical trials alike.

Real-World Validation: Case Studies at IACR 2025

These principles are already being proven in practice.

Our upcoming presentations at the International Association of Cancer Registries (IACR) 2025 in Izmir, Türkiye, highlight two real-world implementations.

These studies reflect months of collaborative work with registry teams, pathology departments, and institutional leadership. They show that high accuracy and privacy compliance can be achieved together, and that automation can empower cancer registry professionals.

Join Us

If you’re attending IACR 2025, come see these results firsthand.

For institutions exploring how to modernise their registry infrastructure, our team would be glad to share what we’ve learned from these implementations.

📩 Contact us at support@oncoshot.com to schedule a discussion or demonstration.