Student Theses and Dissertations

Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)

RU Laboratory

Bieniasz Laboratory


About 8 and 10 percent of the human and mouse genomes, respectively, consist of sequences of retroviral origin. Occasional infection of the germ line can lead to integrated retroviral genomes being vertically inherited as host alleles. Over thousands to millions of years, some of these sequences acquired inactivating mutations and were fixed in ancestral populations by genetic drift, while others became fixed by providing an evolutionary advantage to the host. These inherited proviruses are termed endogenous retroviruses (ERVs) and have been identified in a variety of animal species, representing an extensive viral “fossil” record of past retroviral infections. With the advent of whole-genome sequencing projects and high-throughput sequencing platforms, the wide diversity of these sequences and the important role they have played in the evolution of their hosts became evident. In the present study we developed a computational framework to identify ERVs in primate and murine genomes. The results of these genome screenings were used to identify suitable candidate sequences for paleovirological analyses, which led to the successful reconstruction of two ancient retroviruses. MuERV-L is an env-deficient, highly abundant, mouse-specific ERV that has undergone two amplification bursts, the more recent and prolific of which occurred ~2 million years ago (MYA), probably through entirely intracellular mechanisms. MuERV-L is transcriptionally active at the two-cell stage of the mouse embryo, and recent studies have implicated the co-option of its LTR as a promoter for totipotency genes. In the present work, we describe the analysis and reconstruction of an infectious ancestral MuERV-L (ancML) sequence through paleovirological analyses of MuERV-L elements in the mouse genome. The resulting ancML sequence was infectious in CHO cells and its replication was dependent on reverse transcription. We found that IFN-α could reduce ancML replication by ~20-fold.
Additionally, we found that the expression of mouse APOBEC3 was able to restrict the replication of ancML. However, inspection of endogenous MuERV-L sequences suggested that the impact of APOBEC3-mediated hypermutation on MuERV-L evolution was limited. We discuss the possibility that type I IFN responses (possibly acting through restriction factors) inhibit MuERV-L replication at the two-cell stage of the mouse embryo and have kept MuERV-L copy numbers under control. Although no extant human gammaretroviruses have been identified, HERV-T is a low-copy primate ERV lineage that is closely related to the gammaretrovirus genus. Through phylogenetic and genomic analysis of HERV-T insertions we defined three distinct lineages. Two lineages (HERV-T1 and HERV-T2) entered the primate germline after the Old World monkey-ape split ~30-32 MYA, whereas the other (HERV-T3) entered before this divergence, ~40 MYA. Phylogenetic analysis of complete (LTR-gag-pol-env-LTR) proviral sequences showed that HERV-T2 was subjected to APOBEC3-mediated hypermutation and subsequently expanded in apes, most likely through retrotransposon-like mechanisms. Phylogenetic and statistical analysis of HERV-T3 proviruses allowed us to estimate the sequence of their ~32 MY-old ancestor, revealing that its unusually long leader sequence encoded an 855-nucleotide ORF separated from gag by 36 nucleotides. This pre-gag ORF of unknown function putatively codes for a protein that includes a transmembrane domain. Additional analysis of the HERV-T3 ancestral sequence allowed us to reconstruct the corresponding env sequence (ancHTenv). We found that a modern gammaretrovirus (MLV) could be pseudotyped with ancHTenv, enabling it to infect a wide variety of primate cell lines with titers similar to those of MLV particles carrying the amphotropic MLV envelope. A single HERV-T proviral insertion in the genome of all great apes contains an env gene with full coding potential.
The protein encoded by the extant human HERV-T envelope gene (HsaHTenv) and the one estimated to have been encoded by the hominid ancestor were unable to generate infectious pseudotyped MLV particles, probably because HsaHTenv is not correctly processed into its mature, functional form. Statistical and phylogenetic analyses indicate that the env gene in this locus is evolving more slowly than the rest of the proviral sequences, and that selective pressures have acted on this locus to conserve its envelope sequence. Remarkably, we found that expression of HsaHTenv specifically blocked infection by MLV particles pseudotyped with ancHTenv, but not by particles pseudotyped with the amphotropic MLV envelope. Additionally, we identified MOT1 as the receptor used by ancHTenv. Further experiments are needed to test the hypothesis that HsaHTenv served as a restriction factor by interfering with the receptor once used by HERV-T. As paleovirology also studies the evolution of the host defense mechanisms that have been shaped by past retroviral infections, we investigated the origins and evolution of tetherin, an orphan antiviral protein with no known homologs. We found that tetherin function is encoded by genes that exhibit no sequence homology and share only a common architecture and location in modern jawed vertebrate genomes, indicating an origin ~450 MYA. Moreover, tetherin is part of a cluster of three potential sister genes that includes pv1 and a putative gene of unknown function, here referred to as tm-cc(at), which encode proteins of similar architecture. Some variants of these proteins exhibit antiviral activity, while others can be endowed with antiviral activity following a simple modification. Only in a slowly evolving species (coelacanths) does Tetherin exhibit homology to TMCC(aT).
We suggest that neofunctionalization, drift and positive selection drove a near-complete loss of sequence similarity among modern tetherin genes, and between tetherin and its sister genes. The scenarios by which this orphan gene may have arisen and evolved exemplify how protein modularity, evolvability and robustness can create new functions and preserve them despite sequence divergence driven by genetic conflict with past and present viruses.


A Thesis Presented to the Faculty of The Rockefeller University in Partial Fulfillment of the Requirements for the degree of Doctor of Philosophy

Included in

Life Sciences Commons