Researchers discover one million new components of the human genome

“We’ve started to chip away at the dark genome by finding nearly one million previously unknown exons through a method called exon trapping”

Timothy Hughes, professor and chair of U of T’s department of molecular genetics in the Temerty Faculty of Medicine, is the principal researcher on a study that found nearly one million new exons, or stretches of DNA that are expressed in mature RNA (photo courtesy of the Donnelly Centre)

Researchers at the University of Toronto’s Donnelly Centre for Cellular and Biomolecular Research have found nearly one million new exons – stretches of DNA that are expressed in mature RNA – in the human genome.

There are around 20,000 protein-coding genes in humans that contain approximately 180,000 known internal exons. These protein-coding regions account for only one per cent of the entire human genome. The vast majority of what remains is a mystery – aptly referred to as the “dark genome.”

“We’ve started to chip away at the dark genome by finding nearly one million previously unknown exons through a method called exon trapping,” said Timothy Hughes, principal investigator on the study and professor and chair of the department of molecular genetics in U of T’s Temerty Faculty of Medicine.

“The technique involves an assay with plasmids to find exons in DNA fragments of unknown composition,” said Hughes, who holds the Canada Research Chair in decoding gene regulation and the John W. Billes Chair of Medical Research at U of T. “While exon trapping is not widely used anymore, it proved to be effective when used in combination with high-throughput sequencing to scan the entire human genome.”

The findings were published recently in the journal Genome Research.

Exons are segments of the genome that can encode proteins to direct tissue development and biological processes within the body. They are considered to be autonomous if they don’t require external assistance to splice into a mature RNA transcript, which is then translated into a protein.

The team behind the study was driven to test the exon definition model that guides research in molecular genetics after questioning one of its assumptions – that the accurate removal of non-protein-coding intron regions of the genome is aided by clear and consistent indicators of where exons begin and end. This assumption does not seem to hold in all cases as the splicing of exons does not always go smoothly, sometimes resulting in mature RNA transcripts that contain non-functional components.

“Almost none of the newly discovered exons are found consistently across genomes of different species,” said Hughes. “They seem to appear in the human genome mainly due to random mutation and are unlikely to play a significant role in our biology. This is evidence that evolution in humans involves a lot of trial and error – most likely enabled by the vast size of our genome.”

It is helpful to document randomly mutated exons within the human genome as their translation could potentially be harmful. Long non-coding RNA exons, which are autonomous but often have no known function, have been connected to the development of cancer. Of the roughly 1.25 million known and unknown exons the team found through exon trapping, almost four per cent were long non-coding RNA exons.

In addition, exons residing within non-coding introns, called pseudoexons, can acquire mutations that make a weak splice site stronger. This results in the pseudoexon being included in a mature RNA transcript, potentially leading to disease.

“This is an interesting study that broadens our knowledge of sequences across the human genome that have the potential to be recognized as exons in transcribed RNA,” said Benjamin Blencowe, professor of molecular genetics in U of T’s Temerty Faculty of Medicine, who was not involved in the study. “While the significance of the majority of the newly detected exons is unclear, some of them may be activated in certain contexts – for example, by disease mutations – and therefore cataloguing them is important. This study will further serve as a valuable resource facilitating ongoing efforts directed at deciphering the splicing code.”

A stronger understanding of the factors impacting exon inclusion in mature RNA can help improve programs like SpliceAI, a widely used tool for predicting splice sites and aberrant splicing. SpliceAI can be trained on new data such as that produced through this study to refine its prediction capabilities.

“SpliceAI often doesn’t provide details on the characteristics of exons and has a poor ability to predict splicing in exons that aren’t already catalogued,” said Hughes. “Our exon trapping data contains biologically meaningful information that can be fed into SpliceAI and other splicing predictors to open up new paths for exploring the dark genome.”

The research was supported by the Canadian Institutes of Health Research and the U.S. National Institutes of Health.
