Keeping private health information confidential is not only a legal obligation but one of the highest ethical priorities at Fred Hutch Cancer Center.
Electronic health records track all the clinical data relevant to a patient’s care and treatment, including test results, vital signs, progress notes and radiology reports. These records also contain personal details such as age, gender, ethnicity, income, education level and marital status, along with medical images and genome sequencing results that help guide each patient’s care.
The accumulated data of all Fred Hutch’s consenting patients is also valuable for cancer research.
Using sophisticated computing techniques, Fred Hutch researchers can find hidden patterns and connections in that patient data to better understand cancer and discover more effective ways to treat it. Other comprehensive cancer centers also are learning from their own patients’ records.
But the requirements of patient confidentiality make it difficult to combine data from different centers into a single computer model that can make powerful predictions and inferences based on what it learns from so many examples.
In October, Fred Hutch announced that it is spearheading a project with three other National Cancer Institute-designated cancer centers to make such a model using leading-edge artificial intelligence technology.
The Cancer AI Alliance — CAIA — includes Dana-Farber Cancer Institute, Memorial Sloan Kettering Cancer Center, The Sidney Kimmel Comprehensive Cancer Center and the Whiting School of Engineering at Johns Hopkins.
Fred Hutch secured the initial funding and will serve as the alliance’s coordinating center. CAIA is supported by more than $40 million in funding and AI technology and expertise from AWS, Deloitte, Microsoft, NVIDIA and Slalom.
The CAIA centers collectively will create a resource that enables researchers to build a computer model that learns from each cancer center’s patient records without that data ever leaving the secure confines of its home institution.
Sounds great, but what the heck is a computer model?
How a machine learns
Basically, a model is a simplified representation of reality, like a map.
Think about how you got to work today. You relied on a mental map based on all your previous commutes. The more trips you take, the more reliable your mental map becomes, which is important because it helps you make accurate predictions about traffic and how to arrive on time.
But a mental map is still just a model of your commute, not the real thing.
Detours, closures, accidents, weather and the Mariners’ home baseball games are all variables that can influence the commute, requiring you to constantly update the model in your head so that it more closely matches the reality of the streets.
Fred Hutch and the other CAIA centers want to make a computer model of cancer that can accurately predict how a patient’s cancer is likely to progress, what treatments are likely to work under which conditions, and how the cancer might change itself to evade those treatments.
The CAIA computer model needs to learn from lots of diverse patient data — the more examples, the better — to accurately represent what cancer looks like in the real world.
The variables of the model will be numbers representing all the different kinds of information included in a patient record, everything from age, race and smoking history to genetics, MRIs, blood tests and drug dosages.
Those variables — and their relative influence — are the model’s settings, which can be adjusted based on feedback that tracks whether the model’s predictions are getting closer to the mark or further away.
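To make that concrete, here is a toy sketch of the idea, not CAIA’s actual model: the "settings" are just numbers (weights), and each one is nudged a little whenever the model’s prediction misses the mark. The patient record and the variable names below are made up for illustration.

```python
# Toy illustration: a model's "settings" are numeric weights,
# adjusted by feedback on how far each prediction is from reality.

def predict(weights, features):
    """Score a record: a weighted sum of its variables."""
    return sum(w * x for w, x in zip(weights, features))

def train_step(weights, features, target, learning_rate=0.1):
    """Nudge each weight in the direction that shrinks the prediction error."""
    error = predict(weights, features) - target
    return [w - learning_rate * error * x for w, x in zip(weights, features)]

# Hypothetical record: [age (scaled), smoking history, biomarker level]
weights = [0.0, 0.0, 0.0]
record, outcome = [0.6, 1.0, 0.3], 1.0
for _ in range(100):
    weights = train_step(weights, record, outcome)
# After repeated feedback, predict(weights, record) sits very close to 1.0
```

Real clinical models are vastly larger, with millions of settings, but the feedback loop works on the same principle.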
How to share insights without sharing data
Fred Hutch can only train a model on Fred Hutch data, but its model would work better if it could also train on patient data from Boston, New York and Baltimore. And those centers’ models would work better if they trained on data from Seattle and the other cities.
An approach called federated learning makes that possible without sharing patient data. It works like this:
Each CAIA center will get a copy of the same overall model and train it on its own patient data, adjusting the variable settings to get more accurate predictions.
Then the centers will send those new, adjusted settings to a central location to update and improve the overall model.
When you update your mental commute map to reflect a Mariners’ home game, you don’t need to know who is at the game, who the starting pitcher is, or even who the Mariners are playing — just the date and time.
Similarly, when updating a cancer model, you don’t need to share the complete patient data profile. The centers will share only the adjusted settings, not the patient data that influenced those adjustments. The records themselves remain safely behind each institution’s firewalls.
Those shared adjustments — millions of them — are then merged to establish new consensus settings reflecting what the overall model has learned from the patient records at all the centers.
Each center then gets a copy of the updated overall model with the new consensus settings, which are further refined with the center’s own patient data. This cycle of local training and central updates to the overall model’s settings may repeat many times to tighten the match between the model of cancer and the reality of cancer across the CAIA centers.
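The whole cycle can be sketched in a few lines of code. This is a minimal, hypothetical illustration of federated averaging with made-up numbers, not CAIA’s actual software: each "center" trains on data that never leaves the function, shares only its adjusted settings, and a coordinator merges those settings into a new consensus model.

```python
# Minimal federated-averaging sketch (hypothetical data, not CAIA code).
# Only adjusted settings travel; the records themselves stay local.

def local_train(weights, records, learning_rate=0.1):
    """One local pass over a center's own data. The records never leave."""
    new = list(weights)
    for features, target in records:
        error = sum(w * x for w, x in zip(new, features)) - target
        new = [w - learning_rate * error * x for w, x in zip(new, features)]
    return new  # only these settings are shared with the coordinator

def merge(updates):
    """Coordinator: average the centers' settings into consensus settings."""
    n = len(updates)
    return [sum(ws) / n for ws in zip(*updates)]

# Hypothetical per-center data: (features, outcome) pairs
centers = [
    [([0.6, 1.0], 1.0), ([0.2, 0.0], 0.0)],   # "Seattle"
    [([0.5, 1.0], 1.0), ([0.1, 0.0], 0.0)],   # "Boston"
    [([0.7, 0.0], 0.0), ([0.4, 1.0], 1.0)],   # "New York"
]

consensus = [0.0, 0.0]
for _ in range(20):   # the cycle of local training and central updates
    updates = [local_train(consensus, data) for data in centers]
    consensus = merge(updates)
```

After enough rounds, the consensus settings reflect patterns present across all the centers, even though no center ever saw another’s records.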
These models — trained on the comprehensive and diverse data of patient experiences at four cancer centers and potentially more institutions — will help researchers make better sense of the complex molecular interactions underlying tumor biology, disease progression, response to therapy and resistance to treatment.
The approach will be particularly useful for scientists studying rare cancers and small populations, where any single center sees only a few patients; learning across all of them could shed light on new therapies.