A pitfall with broad implications for leukemia research
Bradley and Dvinge’s experiment revealed that their very premise about the origins of leukemia was wrong. Those abnormal RNAs turned out to be triggered not by cancer, but by the manner in which blood samples were being stored.
Their findings, described in an article published this week in the journal Proceedings of the National Academy of Sciences, revealed not the key insight into leukemia they’d originally imagined, but rather a pitfall with implications for similar leukemia research.
To study the molecular quirks of a cancer cell, Bradley, Dvinge and their colleagues needed to compare those quirks to the same molecules in healthy cells. It turns out that comparison is tricky when studying blood cancers. In studies of solid tumors such as breast or lung cancer, where the tumor is often confined to one or a few places in the body, researchers may ask patient study volunteers to donate a comparison sample from another, cancer-free part of their body – say, the other breast in a breast cancer patient whose tumor hasn’t spread.
But for leukemia patients, since the cancer is in their blood and therefore diffuse throughout the body, there’s no way to take a “healthy” sample from the same person. So researchers usually take blood samples from healthy people instead.
For their study, Bradley and Dvinge didn’t collect blood samples themselves but rather trawled large, publicly available databases that catalog the entire DNA and RNA repertoire of thousands of different cancer patients. Focusing just on leukemia, they looked at 15 different data sets of samples from nearly 600 patients with several kinds of leukemia. These data sets, amassed by several different research laboratories around the world, also included a handful of blood samples from healthy volunteers to serve as comparisons.
Over and over, the researchers found these high levels of abnormal RNA in leukemia cells from every collection of data – except for one: a data set that catalogs a relatively common adult blood cancer known as acute myeloid leukemia.
A light bulb went off
They also found that samples from a data set of children’s leukemias, collected by their Fred Hutch colleague and paper co-author, Dr. Soheil Meshinchi, showed the highest levels of abnormal RNA. At first, the researchers thought maybe they had identified a key distinction between pediatric and adult leukemias. But then a light bulb went off for Dvinge.
She realized there was another important difference between the database of pediatric leukemia and that of adults – pediatric cancers are rarer than adult cancers. And that led the researchers to a wildly different possible explanation for their findings.
“For the pediatric leukemias, many samples were collected worldwide and then shipped, by airplane or carrier, to a research center where they are then processed, because it’s a rare disease,” Bradley said. “Samples would sit around for a minimum of a day and frequently several days. So we thought we should test whether this mattered.”
The FedEx effect
That was the test that turned their hypothesis on its head.
Dvinge, working with Meshinchi and his team, conducted what she describes as a very simple experiment. The researchers took blood samples from two healthy men and two healthy women, split each tube of blood into several smaller tubes, processed and examined the RNA from one tube right away and let the others sit on the laboratory bench at room temperature for up to 48 hours.
“Lo and behold … the samples that sat around for 24 to 48 hours, or even eight hours, they all exhibited this global signal,” Dvinge said. “What we had thought was this massive biological effect present in all leukemias … was in fact more related to how the samples were processed. Just the fact that when blood sits around … in a carrier, in FedEx, or wherever it’s collected at the clinic, that actually causes changes in the RNA within the cell.”
As for the handful of healthy samples in the public data sets, those are nearly always collected at the research site and processed right away, Dvinge said, because it’s easier to find healthy volunteers than volunteers with a given cancer. So what had looked to the scientists like a major difference between people with leukemia and those without was actually a difference in how long their blood sat around in tubes before the RNA could be analyzed.
And the data set from adult leukemia patients that didn’t show the change in abnormal RNA? That was the one databank where researchers were able to process patients’ samples immediately, right where they were drawn.
A cautionary tale
The scientists briefly mourned their fallen hypothesis. “That was a tough week,” Dvinge said. But they soon realized they were onto something equally big, if not bigger. It wasn’t just the malformed RNA that looked different when blood sat around at room temperature, but nearly every kind of RNA, across the entire genome of these cells. Up to 40 percent of the apparent differences in RNA between leukemia patients and healthy people could be due to this artifact, the researchers found.
This matters not only because previously published studies on how RNA is altered in leukemia could be wrong, but also because, like Bradley and Dvinge, many other research teams are using these large, publicly available databanks of cancer samples in their studies – those same samples in which RNA was already affected.
Dr. Eirini Papapetrou, a researcher at Mount Sinai Hospital in New York who studies genes linked to leukemia and other blood diseases, said this finding is especially timely because many researchers are now looking at cancer RNA in a large-scale way.
“I think it is very important to have this in mind as more and more … RNA samples are prepared in different sites, so that people are aware of potential pitfalls that the actual sample preparation and preservation may have on their results,” she said.
That potential impact gave Bradley pause at first. He wasn’t sure how his peers might receive the news. The researchers performed experiment after experiment until they were convinced that this artifact was real and that it was truly widespread in the publicly available databanks.
“If you’re going to publish something like this that could impact a lot of studies, you want to really double-check everything,” Dvinge said.
Dvinge recently presented her results at a conference and the information was received very well by other scientists, she said. That gave the team the confidence to be straightforward about how their findings could impact other people’s work.
The researchers emphasized that this artifact is not due to a mistake anyone made, but rather due to the nature of the research.
“It’s not like people are sloppy when they collect the samples, it’s just that sometimes it’s not feasible to do it any other way,” Dvinge said.
‘Where the truth lies’
The way the scientists themselves stumbled on this finding ultimately made them realize how significant their data would be to the community. And they knew similar large-scale studies of cancer RNAs would only become more common as more samples are collected and technologies to examine them become more readily available.
Other research teams had found similar artifacts on a smaller scale, but nobody before Bradley and Dvinge had pored over the reams of public cancer data to figure out how widespread the problem was. As large data sets become more common in research, studies like this are needed, Papapetrou said, but these cautionary tales shouldn’t discourage scientists from using that information.
“It is important that Dr. Bradley and his colleagues undertook these tasks to try to sort out where the biological truth lies in the data and what can be just technical noise,” she said. “The message from this work is that we should not hide behind the possibility that there may be artifacts in our data, but we should find them to the best of our ability and try to sort them away from what is more likely to be a real finding.”
Bradley and his team provide signposts for that sorting. They found that the simple step of chilling tubes of blood on ice while they wait to be processed largely mitigates the artifact. For cases in which samples have already been collected, or in which it’s not possible to change how blood is handled in the clinic, the scientists came up with a set of “marker” RNAs that other scientists can use to determine whether the artifact is likely affecting their results. If so, they can correct for it.
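For readers who work with expression data, the idea of checking a sample against marker RNAs can be illustrated with a minimal Python sketch. It is not the researchers’ actual method: the marker names, the direction each marker moves, the pseudocount and the threshold are all hypothetical placeholders chosen for illustration. The sketch simply averages the signed log2 fold change of a few markers relative to a freshly processed reference sample and flags the sample if that average is large.

```python
import math

# Hypothetical marker RNAs and the assumed direction of change when blood
# sits at room temperature (+1 = rises, -1 = falls). These names are
# placeholders, not the markers reported in the study.
MARKERS = {"MARKER_A": +1, "MARKER_B": +1, "MARKER_C": -1}


def artifact_score(sample, reference, markers=MARKERS, pseudocount=1.0):
    """Mean signed log2 fold change of marker RNAs versus a fresh reference.

    `sample` and `reference` map RNA names to expression levels.
    The pseudocount (an assumption) avoids division by zero.
    """
    total = 0.0
    for name, direction in markers.items():
        ratio = (sample[name] + pseudocount) / (reference[name] + pseudocount)
        total += direction * math.log2(ratio)
    return total / len(markers)


def looks_delayed(sample, reference, threshold=1.0):
    """Flag a sample whose score exceeds an arbitrary, illustrative threshold."""
    return artifact_score(sample, reference) > threshold


# Made-up expression values for a promptly processed and a shipped sample.
fresh = {"MARKER_A": 7.0, "MARKER_B": 15.0, "MARKER_C": 31.0}
shipped = {"MARKER_A": 31.0, "MARKER_B": 63.0, "MARKER_C": 7.0}

print(artifact_score(fresh, fresh))      # score of the reference against itself
print(artifact_score(shipped, fresh))    # score of the shipped sample
print(looks_delayed(shipped, fresh))     # whether the shipped sample is flagged
```

In practice, the choice of markers and the cutoff would have to come from the published study or from a lab’s own time-course data, not from values like the ones above.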
In the long term, the impact of these findings might be greater than that of their disproven hypothesis about malformed RNA in leukemia, Bradley said, because they could help many more leukemia researchers uncover what is really true and keep them from wasting time on false results that ultimately won’t translate into help for patients.
“It was a really interesting case study of a project, in that it had this unexpected detective aspect to it,” Bradley said. “It’s certainly not what we were expecting.”