Finding branchpoint needles in a haystack of sequencing data

From the Bradley Lab, Basic Sciences & Public Health Sciences Divisions

Proper expression of proteins in eukaryotic cells requires precise stitching of protein-coding fragments, or exons, from precursor mRNAs that also contain non-coding introns. This process, known as splicing, can be regulated to produce slightly different proteins from the same template in different tissues. However, mis-regulation of mRNA splicing can lead to human disease, including cancer.

During intron removal, an RNA-protein complex known as the spliceosome recognizes and binds to four distinct intronic features: the 5' end (GU), a branchpoint nucleotide, a polypyrimidine tract, and the 3' end (AG). Once bound, the spliceosome catalyzes two successive trans-esterification reactions. The first, between the 5' GU and the branchpoint nucleotide, forms an intermediate called the lariat and releases the upstream exon. The second, between the newly released upstream exon and the 3' AG, excises the intron lariat and joins the exons into a translation-ready mRNA sequence (Figure 1).

Figure 1. Overview of splicing and the split-read approach to detecting branchpoints from reverse transcribed lariats. First, the lariat is formed via nucleophilic attack by the 2' OH group of the branchpoint (labeled as “A”) on the phosphate group between the 5' splice site and the upstream exon – the first trans-esterification reaction. This, in turn, releases the upstream exon. The 3' OH of the released upstream exon then participates in a nucleophilic attack on the phosphate group between the 3' splice site and the downstream exon, releasing the intronic lariat – the second trans-esterification reaction and the ligation of the two exons. Second, target sequences are aligned with known sequences, aiming to determine any differing branchpoints. Image provided by Jose Pineda.

While intron branchpoints are pivotal in the splicing process, it is difficult to comprehensively annotate and study them due to their transient nature. Furthermore, intron branchpoints are generally assumed to play basal (rather than regulatory) roles due to their evolutionary conservation in the splicing process. To test this assumption, Dr. Robert Bradley and M.D./Ph.D. candidate Jose Pineda from the Basic Sciences and Public Health Sciences Divisions led an investigation to catalog branchpoints in the human genome and elucidate any possible regulatory features. “We were further motivated by work showing the importance of splicing factor mutations in the development and progression of cancer,” said Pineda. “In particular, mutations in SF3B1, the splicing factor that is involved in branchpoint selection, are found in a wide range of hematopoietic cancers and solid tumors.” Their results were recently published in Genes & Development.

Intron lariats tend to be degraded quickly and are thus rarely reverse transcribed and incorporated into cDNA libraries. To find enough lariats to study, Pineda and Bradley sifted through 1.31 trillion reads from 17,164 RNA-seq data sets drawn from a diverse set of human tissues in both diseased and healthy states. These lariats were subsequently scanned using a “split-read” approach, which centers on comparing the cDNA splice site-branchpoint junctions to the genome, transcriptome and known 5'/3' splice sites (Figure 1). Because the number of detected branchpoints varied greatly among introns, Pineda and Bradley isolated and studied only constitutive introns, discovering that 95% of them contain two or more branchpoints, with a median of 6.75 branchpoints per intron.
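The idea behind a split-read search can be illustrated with a small Python sketch. This is not the authors' pipeline. It assumes that a read derived from a lariat contains intron sequence ending at the branchpoint, followed directly by sequence from the 5' splice site, and it uses exact string matching in place of a real aligner. The function name `find_branchpoint`, the `min_anchor` parameter and the toy intron sequence are all hypothetical.

```python
from typing import Optional


def find_branchpoint(read: str, intron: str, min_anchor: int = 8) -> Optional[int]:
    """Return the branchpoint position relative to the 3' splice site, or None.

    Assumed convention (illustrative only): the last intron nucleotide is -1.
    """
    # Try every split that leaves at least min_anchor bases on each side.
    for split in range(min_anchor, len(read) - min_anchor + 1):
        upstream_part, five_prime_part = read[:split], read[split:]
        # The second half of the read must match the very start of the intron.
        if not intron.startswith(five_prime_part):
            continue
        # The first half must match further downstream in the intron;
        # its last base is the candidate branchpoint.
        hit = intron.find(upstream_part, len(five_prime_part))
        if hit == -1:
            continue
        branchpoint_index = hit + len(upstream_part) - 1
        return branchpoint_index - len(intron)
    return None


# Hypothetical 52-nt intron (starts with GT, ends with AG), made up for this example.
toy_intron = "GTAAGTATCCGGAACTTGCACCATGCTACTGACTCTCTTCCTTTTCCCACAG"

# Simulated lariat read: 12 nt ending at the branchpoint A, then the first 10 nt of the intron.
lariat_read = toy_intron[20:32] + toy_intron[:10]

# A read taken from contiguous intron sequence should not be called as a lariat.
linear_read = toy_intron[5:27]

print(find_branchpoint(lariat_read, toy_intron))  # -21
print(find_branchpoint(linear_read, toy_intron))  # None
```

A real analysis would also have to tolerate sequencing errors and mismatches at the junction, and would check candidate reads against the genome and transcriptome to rule out ordinary alignments, as the comparison step described above implies.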

Although most introns contain multiple branchpoints, the study suggested a strong preference in the identity of the branchpoint nucleotide: 82.5% of constitutive intron branchpoints were adenines, consistent with earlier findings. Furthermore, some adenine branchpoints demonstrated a preference for a uracil 2 nucleotides (nt) upstream of the adenine. Non-adenine branchpoints did not display a preference at the surrounding positions.

The suggestion of adenine preference confirmed findings from early biochemical studies on the first intron of the human β-globin gene HBB that revealed a preference for an adenine branchpoint nucleotide 37 nt upstream from the 3' splice site. When this adenine was mutated to a guanine, branchpoint usage shifted to an adenine at the -24 nt position, which had a far lower binding energy between the branchpoint and the U2 snRNA, a small nuclear RNA that is part of the spliceosomal machinery, compared to the initial -37 nt adenine.

In the Bradley lab’s recent study, HBB splicing was explored further in both healthy and diseased human tissues. The researchers found that HBB branchpoint usage is highly tissue-specific; two different sets of non-overlapping branchpoints were utilized to form lariats in blood cells compared to metastatic prostate cancer cells. Normal blood, leukemic peripheral blood, and bone marrow cells utilized branchpoints at the -37, -78, -41 and -24 nt positions, whereas metastatic prostate cancer cells utilized branchpoints primarily at the -30 and -26 nt positions (Figure 2). Furthermore, the previously observed -37 nt branchpoint adenine was used 78% of the time in blood cells, whereas it was rarely utilized in the metastatic prostate cancer sample. This key finding suggests that branchpoint selection is more complex than initially thought, and that recognition and regulation could differ between cell types and in disease states.

Figure 2. Estimated branchpoint usage for the first intron of HBB in bone marrow/peripheral blood and in metastatic prostate cancer. P-values and error bars (95% confidence intervals) were computed with a binomial proportion test. Image provided by Jose Pineda.
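For readers unfamiliar with binomial proportion statistics, the sketch below shows one common way to compute a 95% confidence interval for a usage fraction and a P-value for the difference between two fractions. It uses the normal approximation and a pooled two-proportion z-test, which may differ from the exact procedure used in the study. The function names and the read counts are hypothetical and are not taken from the paper.

```python
import math


def proportion_ci(successes: int, total: int, z: float = 1.96) -> tuple:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clip to the valid range for a proportion.
    return max(0.0, p - half_width), min(1.0, p + half_width)


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided P-value from a pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # For a standard normal variable, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts of lariat reads supporting one branchpoint in two tissues.
blood_supporting, blood_total = 78, 100
tumor_supporting, tumor_total = 2, 50

print(proportion_ci(blood_supporting, blood_total))
print(two_proportion_p_value(blood_supporting, blood_total,
                             tumor_supporting, tumor_total))
```

The normal approximation becomes unreliable when counts are small or a proportion is near 0 or 1; in that situation an exact or Wilson interval is usually preferred.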

While the mechanisms behind regulation of intron branchpoint selection are incompletely understood, Pineda and Bradley suggest that even slight alterations in these processes could contribute to cancer, as seen with splicing factor SF3B1, which is mutated in some cancers and associated with abnormal 3' splice site recognition. When asked about next steps and further research goals, Pineda noted that “we are currently developing techniques to further identify more branchpoints and push the branchpoint annotation effort in the human genome to completion.”

 

Pineda JMB and Bradley RK. 2018. Most human introns are recognized via multiple and tissue-specific branchpoints. Genes & Development. 32(7-8): 577-591.

This research was supported by the ARCS Foundation, Leukemia & Lymphoma Society, Edward P. Evans Foundation and the National Institutes of Health.