Student Theses and Dissertations


Bahar Taneri



RU Laboratory

Gaasterland Laboratory


alternative splicing, alternative splicing databases, NOVA, eukaryotic gene expression


Analyzing transcriptomes in the context of all available genome and transcript sequence data has the potential to reveal biologically meaningful insight into the functional properties of genes and the complexity of genomes. Alternative splicing is one of the major mechanisms contributing to the complexity of genomes. This important cellular process generates several different messenger RNA transcripts from a single gene, expression of which produces structurally and functionally different proteins. Regulation of alternative splicing can be tissue-specific and can depend on developmental stage and/or physiological condition. Comprehensive analysis of alternative splicing is essential to fully understand the capacity of genomes and thus proteomes. Comparative analyses of alternative splicing across species can provide significant biological insight not only into the evolution of alternative splicing, but also into its regulation and functional significance. For comprehensive analyses of alternatively spliced genes, we developed and utilized databases of alternatively spliced transcripts in the transcriptomes of Homo sapiens, Mus musculus and Rattus norvegicus. Our databases allow in-depth analyses of alternative and constitutive exons within alternatively spliced genes. The interactive web implementation of our databases gives end-users the ability to instantly identify orthologous human-mouse, human-rat and mouse-rat gene pairs with their corresponding exons. A novel visualization method that we introduce provides easy access to conserved alternative splicing data and a tool to explore the evolutionary significance, regulation and function of this important biological process. Our statistical analysis showed a high prevalence of variant loci in the human, mouse and rat transcriptomes: 81% of human loci are variant, as are 74% of mouse loci and 58% of rat loci, revealing the widespread presence of alternative splicing in all three transcriptomes.
We further showed that alternative splicing events are mainly due to the presence or absence of cassette exons. More than 60% of alternative exons are cassette exons in all three transcriptomes. Specifically, to analyze the impact of alternative splicing on transcription factor protein structure, we studied the effect of cassette exons on the protein domain architectures of mouse transcription factors. We showed that alternative splicing preferentially adds or deletes domains important to the DNA-binding function of the transcription factors: 75% of the domains affected by cassette exons are DNA-binding domains. Further, we showed that there is a single transcription factor isoform within a given tissue and that isoforms differ across tissues, indicating tissue-specificity of alternatively spliced transcription factors. These results indicate that alternative splicing might contribute to differential gene expression via the creation of tissue-specific transcription factor isoforms. In addition, we showed that in the human transcriptome there is a high prevalence of transcript sequence data from cancer tissues. More than 80% of human variant loci contain transcripts from cancer tissues. We showed that cancer transcripts introduce variation beyond normal alternative splicing via cancer-specific cassette exons. In the majority of tissues, more than 20% of the cassette exons are from cancer transcripts only. Our results quantitatively validate the presence of aberrant alternative splicing in cancer sequence data. Lastly, through a comparative analysis of alternatively spliced genes in the transcriptomes of Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana and Plasmodium falciparum against those in the human, mouse and rat transcriptomes, we showed that there is more alternative splicing in the genomes of more complex organisms and that alternative splicing is elevated in mammalian genomes.


A thesis presented to the faculty of The Rockefeller University in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Included in

Life Sciences Commons