Keeping private health information confidential is not only a legal obligation but one of the highest ethical priorities at Fred Hutch Cancer Center.
Electronic health records track all the clinical data relevant to a patient’s care and treatment, including test results, vital signs, progress notes and radiology reports. These records also contain personal details such as age, gender, ethnicity, income, education level and marital status, along with medical images and genome sequencing results that help guide each patient’s care.
The accumulated data of all Fred Hutch’s consenting patients is also valuable for cancer research.
Using sophisticated computing techniques, Fred Hutch researchers can find hidden patterns and connections in that patient data to better understand cancer and discover more effective ways to treat it. Other comprehensive cancer centers also are learning from their own patients’ records.
But the requirements of patient confidentiality make it difficult to combine data from different centers into a single computer model that can make powerful predictions and inferences based on what it learns from so many examples.
In October, Fred Hutch announced that it is spearheading a project with three other National Cancer Institute-designated cancer centers to make such a model using leading-edge artificial intelligence technology.
The Cancer AI Alliance — CAIA — includes Dana-Farber Cancer Institute, Memorial Sloan Kettering Cancer Center, The Sidney Kimmel Comprehensive Cancer Center and the Whiting School of Engineering at Johns Hopkins.
Fred Hutch secured the initial funding and will serve as the alliance’s coordinating center. CAIA is supported by more than $40 million in funding and AI technology and expertise from AWS, Deloitte, Microsoft, NVIDIA and Slalom.
The CAIA centers collectively will create a resource that enables researchers to build a computer model that learns from each cancer center’s patient records without that data ever leaving the secure confines of its home institution.
Sounds great, but what the heck is a computer model?
How a machine learns
Basically, a model is a simplified representation of reality, like a map.
Think about how you got to work today. You relied on a mental map based on all your previous commutes. The more trips you take, the more reliable your mental map becomes, which is important because it helps you make accurate predictions about traffic and how to arrive on time.
But a mental map is still just a model of your commute, not the real thing.
Detours, closures, accidents, weather and the Mariners’ home baseball games are all variables that can influence the commute, requiring you to constantly update the model in your head so that it more closely matches the reality of the streets.
Fred Hutch and the other CAIA centers want to make a computer model of cancer that can accurately predict how a patient’s cancer is likely to progress, what treatments are likely to work under which conditions, and how the cancer might change itself to evade those treatments.
The CAIA computer model needs to learn from lots of diverse patient data — the more examples, the better — to accurately represent what cancer looks like in the real world.
The variables of the model will be numbers representing all the different kinds of information included in a patient record, everything from age, race and smoking history to genetics, MRIs, blood tests and drug dosages.
Those variables — and their relative influence — are the model’s settings, which can be adjusted based on feedback that tracks whether the model’s predictions are getting closer to the mark or further away.
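To make that concrete, here is a toy sketch of the idea, not CAIA’s actual model: the "settings" are just numbers (weights), and each one is nudged a little whenever the model’s prediction misses the mark. The patient record and the variable names below are made up for illustration.

```python
# Toy illustration: a model's "settings" are numeric weights,
# adjusted by feedback on how far each prediction is from reality.

def predict(weights, features):
    """Score a record: a weighted sum of its variables."""
    return sum(w * x for w, x in zip(weights, features))

def train_step(weights, features, target, learning_rate=0.1):
    """Nudge each weight in the direction that shrinks the prediction error."""
    error = predict(weights, features) - target
    return [w - learning_rate * error * x for w, x in zip(weights, features)]

# Hypothetical record: [age (scaled), smoking history, biomarker level]
weights = [0.0, 0.0, 0.0]
record, outcome = [0.6, 1.0, 0.3], 1.0
for _ in range(100):
    weights = train_step(weights, record, outcome)
# After repeated feedback, predict(weights, record) sits very close to 1.0
```

Real clinical models are vastly larger, with millions of settings, but the feedback loop works on the same principle.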
How to share insights without sharing data
Fred Hutch can only train a model on Fred Hutch data, but its model would work better if it could also train on patient data from Boston, New York and Baltimore. And those centers’ models would work better if they trained on data from Seattle and the other cities.
An approach called federated learning makes that possible without sharing patient data. It works like this:
Each CAIA center will get a copy of the same overall model and train it on its own patient data, adjusting the variable settings to get more accurate predictions.
Then the centers will send those new, adjusted settings to a central location to update and improve the overall model.
When you update your mental commute map to reflect a Mariners’ home game, you don’t need to know who is at the game, who the starting pitcher is, or even who the Mariners are playing — just the date and time.
Similarly, when updating a cancer model, you don’t need to share the complete patient data profile. The centers will share only the adjusted settings, not the patient data that influenced those adjustments. The records themselves remain safely behind each institution’s firewalls.
Those shared adjustments — millions of them — are then merged to establish new consensus settings reflecting what the overall model has learned from the patient records at all the centers.
Each center then gets a copy of the updated overall model with the new consensus settings, which are further refined with the center’s own patient data. This cycle of local training and central updates to the overall model’s settings may repeat many times to tighten the match between the model of cancer and the reality of cancer across the CAIA centers.
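The whole cycle can be sketched in a few lines of code. This is a minimal, hypothetical illustration of federated averaging with made-up numbers, not CAIA’s actual software: each "center" trains on data that never leaves the function, shares only its adjusted settings, and a coordinator merges those settings into a new consensus model.

```python
# Minimal federated-averaging sketch (hypothetical data, not CAIA code).
# Only adjusted settings travel; the records themselves stay local.

def local_train(weights, records, learning_rate=0.1):
    """One local pass over a center's own data. The records never leave."""
    new = list(weights)
    for features, target in records:
        error = sum(w * x for w, x in zip(new, features)) - target
        new = [w - learning_rate * error * x for w, x in zip(new, features)]
    return new  # only these settings are shared with the coordinator

def merge(updates):
    """Coordinator: average the centers' settings into consensus settings."""
    n = len(updates)
    return [sum(ws) / n for ws in zip(*updates)]

# Hypothetical per-center data: (features, outcome) pairs
centers = [
    [([0.6, 1.0], 1.0), ([0.2, 0.0], 0.0)],   # "Seattle"
    [([0.5, 1.0], 1.0), ([0.1, 0.0], 0.0)],   # "Boston"
    [([0.7, 0.0], 0.0), ([0.4, 1.0], 1.0)],   # "New York"
]

consensus = [0.0, 0.0]
for _ in range(20):   # the cycle of local training and central updates
    updates = [local_train(consensus, data) for data in centers]
    consensus = merge(updates)
```

After enough rounds, the consensus settings reflect patterns present across all the centers, even though no center ever saw another’s records.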
These models — trained on the comprehensive and diverse data of patient experiences at four cancer centers and potentially more institutions — will help researchers make better sense of the complex molecular interactions underlying tumor biology, disease progression, response to therapy and resistance to treatment.
The approach will be particularly useful for scientists studying rare cancers and small populations, where any single center sees only a few patients; learning across all of them could shed light on new therapies.