Institute of Molecular and Cell Biology, Mannheim University of Applied Sciences, Mannheim, Germany
Transcript processing. Northern blot analyses have shown that in humans, depending on the tissue, PEG10 transcripts of different sizes exist: one between 6 and 7 kb, corresponding to the major 6.6 kb PEG10 transcript, as well as minor transcripts of other sizes (Ono et al., 2001; Smallwood et al., 2003; Lux et al., 2005). The major 6.6 kb PEG10 transcript is polyadenylated, and at the distal end of exon 2 there are two canonical polyadenylation signal sequences, AATAAA. In a recent study, minor alternatively polyadenylated transcripts were isolated, but none of them contained the typical polyadenylation signal motif at its 3'-end or any known alternative polyadenylation signal motif (Lux et al., 2010). If PEG10 transcripts are truly subject to alternative polyadenylation, future studies will have to address whether this influences PEG10 expression, mRNA stability, mRNA localisation or translation, and whether it might be related to pathological processes. Because PEG10 is most likely derived from a retrotransposon, it is interesting to note that non-conserved poly(A) sites are associated with transposable elements to a much greater extent than conserved ones (Lee et al., 2008).
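As an illustration of how the canonical polyadenylation signal mentioned above can be located in a transcript, the following minimal Python sketch scans a sequence for AATAAA. The function name and the toy sequence are hypothetical and chosen for illustration only; the sequence is not taken from PEG10.

```python
import re

# Canonical polyadenylation signal named in the text (DNA alphabet).
CANONICAL_POLYA_SIGNAL = "AATAAA"


def find_polya_signals(sequence: str, signal: str = CANONICAL_POLYA_SIGNAL) -> list[int]:
    """Return the 1-based start positions of every occurrence of `signal`.

    A lookahead is used so that overlapping occurrences are also reported.
    RNA input (U) is converted to the DNA alphabet (T) before scanning.
    """
    sequence = sequence.upper().replace("U", "T")
    return [m.start() + 1 for m in re.finditer(f"(?={re.escape(signal)})", sequence)]


# Toy 3' region (NOT the PEG10 sequence): two signals separated by a spacer.
toy_3prime_region = "CCGT" + "AATAAA" + "GGCTT" + "AATAAA" + "CC"
print(find_polya_signals(toy_3prime_region))
```

A scan of this kind only finds the canonical hexamer; the minor transcripts described above lacked both this and known alternative signal motifs, so an empty result would be expected for them.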
Translation. To allow the -1 frameshift, the overlap between reading frame 1 (RF1) and reading frame 2 (RF2) contains a seven-nucleotide "slippery" sequence with typical consecutive homopolymeric triplets. In the PEG10 sequence G GGA AAC TC, the "slippery" heptanucleotide G GGA AAC follows the general pattern X XXY YYZ: the A- and P-site tRNAs detach from the zero-frame codons XXY and YYZ, re-pair after shifting back one nucleotide to XXX and YYY, and translation resumes with the codon after the YYY triplet. Thus, the deduced amino acid sequence of the frameshift site after frameshift translation is GNL. The heptanucleotide "slippery" sequence is completely conserved in all species, and the sequence of the downstream pseudoknot is completely conserved in the mammalian species except for one nucleotide change in the rodent sequence. A detailed analysis of the PEG10 frameshift sequence was carried out by Manktelow and colleagues (2005). Due to alternative splicing, two transcript variants exist, PEG10-A and PEG10-B, which give rise to several protein isoforms. These isoforms arise in three ways. First, different translation initiation sites are used in RF1 (Lux et al., 2010). Second, the reading frames are translated either into an RF1 protein or, by successful -1 frameshift translation, into an RF1/2 protein. Third, the RF1/2 translation products contain, shortly after the frameshift site, a functional aspartic protease motif typical of retroviruses, where it usually serves Gag-Pol protein processing; this leads to proteolytic cleavage products (Clark et al., 2007). It was demonstrated that upstream of the originally predicted ATG translation initiation site (TIS) a second, in-frame non-ATG initiation site exists. For clarity, the non-ATG initiation site will be named TIS-1a and the previously known ATG start site TIS-2. This non-ATG start codon, a CTG, lies 102 nucleotides upstream of the ATG start codon.
The alternative splice event that leads to transcript PEG10-B introduces an additional in-frame ATG start codon even further upstream of the previous two, which will be named TIS-1b. There might be an additional TIS further downstream of TIS-2 (Lux et al., 2010). Transcript PEG10-A codes for PEG10-RF1 (using TIS-2), PEG10-RF1a (using TIS-1a), PEG10-RF1/2 and PEG10-RF1a/2, whereas transcript PEG10-B can in theory give rise to all six isoforms: PEG10-RF1, PEG10-RF1a, PEG10-RF1b (using TIS-1b), PEG10-RF1/2, PEG10-RF1a/2 and PEG10-RF1b/2. Figure 1 shows schematically the different TIS for transcripts PEG10-A and PEG10-B. The deduced amino acid sequences of the different isoforms are listed in figure 2. Investigation of PEG10 translation in mouse placenta during gestation and in human placenta showed that in vivo both reading frames are translated, as an RF1 protein and as an RF1/2 fusion protein (Clark et al., 2007). The mouse RF1/2 protein is about 40 kDa larger than the corresponding human protein due to an in-frame insertion of approximately 600 nucleotides into the RF2 sequence. Interestingly, the sizes of the RF1 and RF1/2 proteins and the translational frameshift efficiency vary during gestation. From 9.5 dpc, when PEG10 expression in mice is first detectable, the 150 kDa frameshift protein is dominant. The frameshift efficiency, estimated with an RF1-specific antibody, showed an apparent decrease from 68% at 9.5 dpc to 43% by 21.5 dpc. At 15.5 dpc an additional protein of 105 kDa was detected in about equal amounts to the 150 kDa protein. In late gestation, at 21.5 dpc, it was present in greater amounts than the 150 kDa PEG10-RF1/2 protein. Mass spectrometry analysis identified the 105 kDa protein as a PEG10 product consisting primarily of PEG10 RF2 but containing peptides from both reading frames. PEG10 protein analysis of amniotic membrane showed a similar profile to that of placenta.
Expression started low at 9.5 dpc, then increased and continued throughout gestation. The RF1/2 fusion protein again was the dominant band. Surprisingly, at 10.5 dpc a transient RF1 protein of increased mass, about 50 kDa, was detected; it disappeared at later time points, leaving only the slightly smaller 47 kDa band, identical in size to that in placenta. Furthermore, in the same report three RF1 protein populations ranging from 47 to 55 kDa were detected in HepG2 cells. Whether these different PEG10 protein masses result from post-translational modifications, from the use of different TIS, or from a mixture of both is not clear and awaits further investigation. In addition, western blot analysis of adult mouse heart, spleen and brain tissue extracts with an RF1-specific antibody showed a weak, single protein band of around 50 kDa, but of different mass for each tissue (Clark et al., 2007). No RF1/2 proteins were detected. Based on their further analysis, the authors concluded that these proteins do not represent Peg10 proteins.
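The -1 frameshift at the "slippery" site described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the standard genetic code, the nine nucleotides G GGA AAC TC quoted in the text, and hypothetical function names. It encodes the simple slippage model described above, in which the P- and A-site tRNAs have already decoded the zero-frame codons XXY and YYZ before slipping back by one nucleotide, so that the next codon read begins with Z.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_slippery(hepta: str) -> bool:
    """True if the heptanucleotide fits the X XXY YYZ pattern (XXX and YYY homopolymeric)."""
    return len(hepta) == 7 and len(set(hepta[0:3])) == 1 and len(set(hepta[3:6])) == 1


def frameshift_site_peptide(site: str) -> str:
    """Amino acids incorporated across a -1 frameshift site.

    `site` holds the heptanucleotide X XXY YYZ plus at least two downstream
    nucleotides. XXY and YYZ are decoded in the zero frame; after the tRNAs
    slip back one nucleotide, the next codon starts at Z.
    """
    site = site.upper().replace(" ", "").replace("U", "T")
    if len(site) < 9 or not is_slippery(site[:7]):
        raise ValueError("expected X XXY YYZ followed by at least two nucleotides")
    p_site_codon = site[1:4]   # XXY, decoded before the slip
    a_site_codon = site[4:7]   # YYZ, decoded before the slip
    next_codon = site[6:9]     # first codon read in the -1 frame
    return "".join(CODON_TABLE[c] for c in (p_site_codon, a_site_codon, next_codon))


print(frameshift_site_peptide("G GGA AAC TC"))  # GNL, as stated in the text
```

The sketch does not model the downstream pseudoknot or the efficiency of the shift; it only reproduces which codons are read at the site.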
Protein domains/motifs. Bioinformatic analyses using different programmes, such as the Simple Modular Architecture Research Tool (SMART), the SUPERFAMILY Sequence Search (SCOP domains) and the Eukaryotic Linear Motif (ELM) resource for functional site prediction, predicted several domains and motifs. Some are shown schematically as examples in figure 3 for the 784 amino acid long PEG10-RF1b/2 protein. The zinc-finger domain was consistently identified, although its predicted extent varies from amino acids 357-389 to a core region of amino acids 370-386 for a ZNF-C2HC (CX2CX4HX4C) consensus sequence, which is highly conserved in Gag proteins in most retroviruses and some retrotransposons. There are two proline-rich regions, one at the N-terminus and one at the C-terminus. Proline-rich regions are recognized as presenting binding motifs to, for example, Src homology 2 (SH2) and SH3 domains. The ELM programme predicted the C-terminal proline stretch to be a possible binding site for SH3 domain-containing proteins. As already reported, PEG10 contains an aspartyl protease consensus sequence typical of retroviruses, AMIDSGA. To test whether this motif is catalytically active, the aspartate was mutated to an alanine (Clark et al., 2007). This change disrupted the protease activity and showed that the aspartyl protease is responsible for the cleavage of the full-length PEG10 frameshift protein into the RF1 and RF2 parts. Taking the protease activity into account, the previously estimated PEG10 frameshift efficiency of 15-30% (Shigemoto et al., 2001; Lux et al., 2005) was re-estimated to be 60% (Clark et al., 2007).
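The consensus notation used above (CX2CX4HX4C, where Xn stands for n arbitrary residues) maps directly onto a regular expression. The following minimal Python sketch shows that conversion and a simple motif scan. The helper names and the toy protein sequence are hypothetical; the toy sequence is synthetic and is not the PEG10 sequence, and the sketch is not a substitute for the SMART, SUPERFAMILY or ELM predictions.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Convert a consensus such as 'CX2CX4HX4C' to a regex ('C.{2}C.{4}H.{4}C').

    Each 'Xn' is read as 'any n residues'; all other letters are literal.
    """
    return re.sub(r"X(\d+)", lambda m: ".{" + m.group(1) + "}", consensus)


def find_motif(protein: str, pattern: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched residues) with 1-based inclusive coordinates."""
    return [(m.start() + 1, m.end(), m.group()) for m in re.finditer(pattern, protein)]


ZNF_C2HC = consensus_to_regex("CX2CX4HX4C")  # zinc-finger consensus from the text
PROTEASE_MOTIF = "AMIDSGA"                   # aspartyl protease motif from the text

# Synthetic toy sequence for illustration only (NOT PEG10).
toy_protein = "MKTCAACGGGGHLLLLCPPAMIDSGAQ"

print(find_motif(toy_protein, ZNF_C2HC))
print(find_motif(toy_protein, PROTEASE_MOTIF))
```

Exact-pattern matching of this kind reports only candidate sites; whether a motif is functional has to be tested experimentally, as was done for the protease motif by mutating the aspartate.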
Interacting proteins. Aside from the protease motif, it is not known for any of the other domains whether they are functional or whether they bind other proteins. The only binding partners currently known for PEG10 are the SIAH1 and SIAH2 proteins (Okabe et al., 2003) and the TGF-beta type I receptor ALK1 (Lux et al., 2005). All three proteins were identified by yeast two-hybrid screening with the PEG10-RF1 protein, and the interactions were confirmed by co-immunoprecipitation experiments. The exact SIAH1/SIAH2 binding region was not determined, but the ELM programme predicted a potential SIAH1 binding site (figure 3, PEG10-RF1b amino acids 329-337). Co-immunoprecipitation experiments in which PEG10-RF1 and several other type I and II receptors of the TGF-beta superfamily were overexpressed in COS-1 cells showed that PEG10 also interacts with other members of this receptor group (Lux et al., 2005). Nevertheless, when specifically investigated in the two-hybrid assay under stringent conditions, none of these receptors interacted with PEG10-RF1 to activate the reporter system. Thus, the most specific interaction appears to be with ALK1.
NCBI: 23089 MIM: 609810 HGNC: 14005 Ensembl: ENSG00000242265
dbSNP: 23089 ClinVar: 23089 TCGA: ENSG00000242265 COSMIC: PEG10
Andreas Lux
PEG10 (paternally expressed 10)
Atlas Genet Cytogenet Oncol Haematol. 2011-02-01
Online version: http://atlasgeneticsoncology.org/gene/44104/peg10-(paternally-expressed-10)