Valentina GUASCONI, Hakima YAHI, Slimane AIT-SI-ALI*
CNRS, UPR 9079 Institut André Lwoff 7, rue Guy Moquet Villejuif, France Email : aitsiali@vjf.cnrs.fr
* : corresponding author
December 2002
In eukaryotic cells, there are three different RNA polymerases (RNA Pol). Each RNA Pol is responsible for a different class of transcription: PolI transcribes rRNA (ribosomal RNA), PolII mRNA (messenger RNA), and PolIII tRNA (transfer RNA) and other small RNAs.
Any protein that is needed for the initiation of transcription is defined as a transcription factor. Many transcription factors act by recognizing cis-acting sites that are parts of promoters or enhancers. However, binding to DNA is not the only means of action for a transcription factor. A factor may recognize another factor, or may recognize RNA polymerase. In eukaryotes, transcription factors, rather than the enzymes themselves, are principally responsible for recognizing the promoter; this contrasts with the modus operandi of bacterial RNA polymerase, in which a basic enzyme recognizes the promoter, assisted in certain cases by accessory factors.
Transcription factors are able to bind to specific sets of short conserved sequences contained in each promoter. Some of these elements and factors are common, and are found in a variety of promoters and used constitutively; others are specific and their use is regulated.
The factors that assist RNA PolII can be divided into 3 general groups:
RNA PolII cannot initiate transcription itself, but is absolutely dependent on auxiliary transcription factors (called TFIIX, where "X" is a letter that identifies the individual factor). The enzyme together with these factors constitutes the basal (or minimal) transcriptional apparatus that is needed to transcribe any class II promoter (coding genes).
The first step in complex formation at a promoter containing a TATA box is binding of the factor TFIID to a region that extends upstream from the TATA sequence. TFIID is solely responsible for recognizing a promoter for RNA PolII. TFIID contains 2 types of components: the TATA-binding protein (TBP), a small protein of about 30 kDa, which is responsible for the recognition of the TATA box, and the so-called TAFs (for TBP-Associated Factors). Some TAFs are tissue-specific. The TAFs in TFIID are named in the form TAFII00, where "00" gives the molecular mass of the subunit.
Transcription factors act in a defined order to build a complex that is joined by RNA polymerase and is needed for the initiation of transcription. Footprinting of the DNA regions protected by the growing complex suggests the following model (Figure 1): commitment to a promoter is initiated when TFIID binds the TATA box; then TFIIA joins the complex. The following step is the addition of TFIIB, which binds downstream of the TATA box; it may provide the surface that is recognized by RNA polymerase.
The factor TFIIF consists of 2 subunits. The larger subunit (RAP74) has an ATP-dependent DNA helicase activity that could be involved in melting the DNA at initiation. The smaller subunit (RAP30) has some homology to the regions of bacterial sigma factor that contact the core polymerase; indeed, it binds tightly to RNA PolII. TFIIF may bring RNA PolII to the assembling transcription complex. The initiation reaction, as defined by the formation of the first phosphodiester bond, can occur at this stage. Some further general transcription factors, TFIIE, TFIIH and TFIIJ, are required to allow RNA PolII to start moving away from the promoter. TFIIH has several activities, including an ATPase, a helicase, and a kinase activity that can phosphorylate and activate RNA PolII; it is also involved in repair of DNA damage.
Most of the TFII factors are released before RNA PolII leaves the promoter. A model predicts that phosphorylation of the tail of RNA PolII is needed to release the Pol from the transcription factors, so that it can start the elongation step.
The efficiency and specificity with which a promoter is recognized depend upon short sequences, farther upstream of the TATA box, which are recognized by upstream and inducible factors. Examples of these sequences are the CAAT box and the GC box. The CAAT box plays a strong role in determining the efficiency of the promoter, and is recognized in different promoters by different factors, such as factors of the CTF family, the factors CP1 and CP2, and the factors C/EBP and ACF. The GC box is recognized by the factor Sp1. These factors have the ability to interact with one another by protein-protein interactions. The main purpose of the elements is to bring the factors they bind into the vicinity of the initiation complex, where protein-protein interactions determine the efficiency of the initiation reaction.
Figure 1: Schematic model for the assembly of the basal transcriptional apparatus.
Common types of motifs that are responsible for binding to DNA can be found. There are several groups of proteins that regulate transcription by using particular motifs to bind DNA :
III.1 Helix-turn-helix proteins
The helix-turn-helix motif was originally identified as the DNA-binding domain of phage repressors; one α-helix lies in the wide groove of DNA, the other lies at an angle across DNA. A related form of the motif is present in the homeodomain, a sequence first characterized in several proteins encoded by genes concerned with developmental regulation in Drosophila; it is also present in genes coding for mammalian transcription factors. The homeobox is a sequence that codes for a domain of 60 amino acids. The homeodomain is responsible for binding to DNA; the specificity of DNA recognition lies within the homeodomain. Its C-terminal region shows homology with the helix-turn-helix motif of prokaryotic repressors. Homeodomain proteins can be either transcriptional activators or repressors. The nature of the factor depends on the other domain(s); the homeodomain is responsible solely for binding to DNA. The activator or repressor domains both act by influencing the basal apparatus, possibly by binding to coactivators or corepressors; the repressor Eve, for example, interacts directly with TFIID.
III.2 Zinc finger proteins
The zinc-finger motif comprises a DNA-binding domain. It was originally found in the factor TFIIIA, which is required for RNA PolIII to transcribe 5S rRNA genes. These proteins take their name from their structure, in which a small group of conserved amino acids binds a zinc ion. Two types of DNA-binding proteins have structures of this type: the classic "zinc finger" proteins, and the steroid receptors.
A "finger protein" typically has a series of zinc fingers; the consensus sequence of a single finger is:
Cys-X2-4-Cys-X3-Phe-X5-Leu-X2-His-X3-His
The motif takes its name from the loop of amino acids that protrudes from the zinc-binding site and is described as the Cys2/His2 finger.
Figure 2: A series of three zinc fingers (e.g. transcription factor Sp1)
The fingers are usually organized as a single series of tandem repeats; the stretch of fingers ranges from 9 repeats that occupy almost the entire protein (as in TFIIIA) to just one small domain consisting of 2 fingers; the general transcription factor Sp1 has a DNA-binding domain that consists of 3 zinc fingers (Figure 2).
The C-terminal part of each finger forms α-helices that bind DNA; the N-terminal part forms β-sheets. The non-conserved amino acids in the C-terminal side of each finger are responsible for recognizing specific target sites.
Steroid receptors, which are activated by binding a particular steroid or related ligand (e.g. glucocorticoids, thyroid hormone, retinoic acid), and some other proteins, have another type of finger. The structure is based on a sequence with the zinc-binding consensus:
Cys-X2-Cys-X13-Cys-X2-Cys
These are called Cys2/Cys2 fingers. Proteins with Cys2/Cys2 fingers often have non-repetitive fingers, in contrast with the tandem repetition of the Cys2/His2 type. Binding sites on DNA are usually short and palindromic. The glucocorticoid and estrogen receptors each have 2 fingers, which form α-helices that fold together to form a large globular domain.
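The consensus notation used above can be turned mechanically into a pattern for scanning a protein sequence. The following is a minimal sketch, not a validated motif finder: the helper names (`consensus_to_regex`, `find_fingers`) and the toy sequence are assumptions made for illustration, "X" is taken to mean any residue, and real zinc-finger detection normally relies on curated profiles rather than a bare regular expression.

```python
import re

# One-letter codes for the residues named in the consensus lines of the text.
THREE_TO_ONE = {"Cys": "C", "His": "H", "Phe": "F", "Leu": "L"}


def consensus_to_regex(consensus):
    """Convert e.g. 'Cys-X2-Cys-X13-Cys-X2-Cys' into a regular expression."""
    tokens = consensus.split("-")
    parts = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in THREE_TO_ONE:
            parts.append(THREE_TO_ONE[token])
        elif token.startswith("X") and token[1:].isdigit():
            low = token[1:]
            # A range such as 'X2-4' is split into 'X2' and '4' by the hyphen,
            # so a bare number after an X token is read as the upper bound.
            if i + 1 < len(tokens) and tokens[i + 1].isdigit():
                parts.append(".{%s,%s}" % (low, tokens[i + 1]))
                i += 1
            else:
                parts.append(".{%s}" % low)
        else:
            raise ValueError("unrecognized token: %r" % token)
        i += 1
    return "".join(parts)


def find_fingers(protein, consensus):
    """Return (start, matched_text) for each non-overlapping consensus match."""
    pattern = re.compile(consensus_to_regex(consensus))
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


CYS2_CYS2 = "Cys-X2-Cys-X13-Cys-X2-Cys"
# Made-up toy sequence for illustration only, not a real receptor.
toy = "MK" + "C" + "AA" + "C" + "G" * 13 + "C" + "AA" + "C" + "LE"
hits = find_fingers(toy, CYS2_CYS2)
```

The same converter accepts the Cys2/His2 notation, including the variable spacer written as "X2-4".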
- Example : Nuclear hormone receptors
The nuclear hormone receptor superfamily includes receptors for thyroid and steroid hormones (glucocorticoids, mineralocorticoids, estrogens, androgens, progesterone), retinoids and vitamin D, as well as different "orphan" receptors whose ligand is unknown. Ligands for some of these receptors have recently been identified, showing that products of lipid metabolism such as fatty acids, prostaglandins, or cholesterol derivatives can regulate gene expression by binding to nuclear receptors. Many other orphan receptors may have a still unidentified ligand, but others may act in a constitutive manner or could be activated by other means, e.g., phosphorylation. Nuclear receptors act as ligand-inducible transcription factors by binding directly, as monomers, homodimers, or heterodimers with the retinoid X receptor (RXR), to DNA response elements of target genes. The effects of nuclear receptors on transcription are mediated through recruitment of coregulators. A subset of receptors binds corepressor factors and actively represses target gene expression in the absence of ligand. Corepressors are found within multicomponent complexes that contain HDAC activity; deacetylation leads to chromatin compaction and transcriptional repression. Upon ligand binding, the receptors undergo a conformational change that allows the recruitment of multiple coactivator complexes; some of these proteins are chromatin remodeling factors or possess HAT activity, whereas others may interact directly with the basic transcriptional machinery. Recruitment of coactivator complexes to the target promoter causes chromatin decompaction and transcriptional activation.
Nuclear receptors are grouped into a large superfamily that comprises different subfamilies. One large subfamily is formed by thyroid hormone receptors (TRs), retinoic acid receptors (RARs), and vitamin D receptors (VDRs), as well as different orphan receptors. Another subfamily contains the retinoid X receptors (RXRs), which bind 9-cis-retinoic acid and play an important role in nuclear receptor signaling, as they are partners for different receptors that bind as heterodimers to DNA. These nuclear receptors are often retained in the nucleus regardless of the presence of ligand. A third subfamily is formed by the steroid receptors, which undergo nuclear translocation upon ligand activation.
A typical nuclear receptor consists of a variable N-terminal region, a conserved DNA-binding domain (DBD), a linker region, and a conserved region that contains the ligand binding domain (LBD). The receptors also contain regions required for transcriptional activation, such as the autonomous transcriptional activation function, AF-1, that contributes to constitutive ligand-independent activation by the receptor, and a second transcriptional activation domain, AF-2, located in the COOH terminus of the LBD, which, unlike the AF-1 domain, is strictly ligand-dependent and conserved among members of the nuclear receptor superfamily.
The N-terminal domain, which is the most variable both in size and in sequence, and in many cases contains an AF-1 domain, shows promoter- and cell-specific activity, suggesting that it is likely to contribute to the specificity of action among receptor isoforms and that it could interact with cell type-specific factors.
The DBD, the most conserved domain of nuclear receptors, confers the ability to recognize specific target sequences and activate genes. It contains 9 cysteines, as well as other residues that are conserved across the nuclear receptor superfamily and are required for high affinity DNA binding. This domain comprises 2 zinc fingers and a COOH-terminal extension that contains the so-called T and A boxes. Amino acids required for discrimination of core DNA recognition motifs are present at the base of the first finger in a region termed the "P box", and other residues of the second finger that form the "D box" are involved in dimerization (Figure 3).
The LBD is a multifunctional domain that, in addition to the binding of ligand, mediates homo- and hetero-dimerization, ligand-dependent transcriptional activity, and, in some cases, hormone-reversible transcriptional repression. The LBD contains the COOH-terminal AF-2 motif responsible for ligand-dependent transcriptional activation.
Figure 3: The DNA-binding domain of a nuclear receptor.
Nuclear receptors regulate transcription by binding to specific DNA sequences in target genes known as hormone response elements, or HREs. These elements are located in regulatory sequences normally present in the 5′-flanking region of the target gene, or in enhancer regions upstream of the transcriptional initiation site. A sequence of 6 bp constitutes the core recognition motif. Most receptors bind as homo- or hetero-dimers to HREs composed typically of two core hexameric motifs. For dimeric HREs, the half-sites can be configured as palindromes (Pal), inverted palindromes (IPs), or direct repeats (DRs).
In contrast to steroid receptors, which almost exclusively recognize palindromic elements (with TGTTCT as typical sequence), nonsteroidal receptors can bind to HREs (TGACCT) with different configurations. In this case, both the arrangement of the motifs and the spacing between them determine selectivity and specificity. The length of the spacer region is an important determinant of the specificity of the hormonal responses. Thus DRs separated by 3, 4 and 5 bp (i.e., DR3, DR4, and DR5) mediate preferential regulation by vitamin D, thyroid hormone, and retinoic acid respectively. DR1 serves as the preferred HRE for the RXR. Steroid receptors almost exclusively bind as homodimers to the HREs. Although several nonsteroid nuclear receptors also bind DNA as homodimers, many nonsteroidal receptors bind to their HREs preferentially as heterodimers. In this case, the RXR is the promiscuous partner for different receptors. Typical heterodimeric receptors such as TR, RAR or VDR can bind to their response elements as homodimers, but heterodimerization with RXR strongly increases the efficiency of DNA binding and transcriptional activity.
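The spacing rule for direct repeats can be illustrated with a short scan of a DNA string. This is a sketch under stated assumptions: the function name `find_direct_repeats`, the example sequence, and the choice to scan one strand only with exact matches to the half-site TGACCT are all illustrative; natural response elements tolerate deviations from the consensus.

```python
import re

HALF_SITE = "TGACCT"  # nonsteroidal core motif as written in the text

# Spacer length -> receptor preferentially served, as listed in the text.
PREFERRED_RECEPTOR = {
    1: "RXR",
    3: "vitamin D receptor",
    4: "thyroid hormone receptor",
    5: "retinoic acid receptor",
}


def find_direct_repeats(dna, half_site=HALF_SITE, max_spacer=5):
    """Return (start, 'DRn', preferred receptor) for each pair of half-sites
    on the given strand separated by 0..max_spacer base pairs."""
    dna = dna.upper()
    # A lookahead is used so that every occurrence of the half-site is reported.
    starts = [m.start() for m in re.finditer("(?=" + re.escape(half_site) + ")", dna)]
    repeats = []
    for i, first in enumerate(starts):
        for second in starts[i + 1:]:
            spacer = second - first - len(half_site)
            if 0 <= spacer <= max_spacer:
                label = PREFERRED_RECEPTOR.get(spacer, "not listed in the text")
                repeats.append((first, "DR%d" % spacer, label))
    return repeats


# Made-up example: two half-sites separated by a 4 bp spacer.
example = "AA" + "TGACCT" + "ACGT" + "TGACCT" + "AA"
print(find_direct_repeats(example))
```

Palindromic and inverted-palindromic arrangements would additionally require searching for the reverse complement of the half-site, which this sketch omits.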
HREs are usually located close to recognition sequences for other transcription factors, and interaction between the receptors and these factors, which can result in functional synergism or repression, can play an important role in determining transcriptional rates. Such interactions may serve to restrict a hormonal response to cell types that express the appropriate set of transcription factors.
Regulation of transcription by nuclear receptors requires the recruitment of coregulators, with ligand-dependent exchange of corepressors for coactivators as the basic mechanism for switching gene repression to activation. Ligand-dependent recruitment of coactivators is dependent on AF-2. The best characterized group of nuclear receptor coactivators is the p160 family, which comprises SRC-1/NCoA-1, TIF-2/GRIP-1/NCoA-2, and p/CIP/ACTR/AIB1/TRAM1/RAC3; they all possess HAT activity. Furthermore, nuclear receptors are able to interact with CBP/p300, and with pCAF. Corepressors are NCoR (or RIP-13; associated with unliganded TR and RAR), and SMRT (silencing mediator for retinoic and thyroid hormone receptors); they act by recruiting a histone deacetylase to the promoter region.
Nuclear receptor coactivators with HAT activity appear to play a role in malignant transformation and more specifically in acute promyelocytic leukemia (APL). The genes for monocytic leukemia zinc finger protein (MOZ) and the coactivator TIF-2 are involved in the inv(8)(p11q13) that causes leukemia; the inversion creates a fusion protein between MOZ and TIF-2, which still contains the CBP interaction domain of MOZ. Another translocation in acute monocytic leukemia fuses MOZ with CBP. The leukemic cell phenotype observed in both cases could arise by a recruitment of CBP, resulting in modulation of the transcriptional activity of target genes by a mechanism involving abnormal histone acetylation. Furthermore, chromosomal translocations can create fusion proteins between RARα and either PML (promyelocytic leukemia protein) or PLZF (promyelocytic leukemia zinc finger protein), which are corepressor-associated proteins; these fusion proteins are oncogenic and result in APL.
III.3 Leucine zipper proteins
The leucine zipper is a stretch of amino acids rich in leucine residues that provide a dimerization motif. Dimerization allows the juxtaposition of the DNA-binding regions of each subunit. A leucine zipper forms an amphipathic helix in which the leucines of the zipper on one protein could protrude from the α-helix and interdigitate with the leucines of the zipper of another protein in parallel to form a coiled coil domain. The region adjacent to the leucine repeats is highly basic in each of the zipper proteins, and could comprise a DNA-binding site. The 2 leucine zippers in effect form a Y-shaped structure, in which the zippers comprise the stem, and the 2 basic regions bifurcate symmetrically to form the arms that bind to DNA (Figure 4). This is known as the bZIP structural motif. It explains why the target sequences for such proteins are inverted repeats with no separation. Zippers may be used to sponsor the formation of homodimers or heterodimers. There are 4 repeats in the protein C/EBP (a factor that binds as a dimer to both the CAAT box and the SV40 core enhancer), and 5 repeats in the factors Jun and Fos (which form the heterodimeric transcription factor AP1).
Figure 4: A leucine-zipper motif.
Example : AP1 (Activating Protein-1)
AP-1 was originally identified by its binding to a DNA sequence in the SV40 enhancer. Its major component is Jun, the product of the gene c-jun, which was identified by its relationship with the oncogene v-jun carried by an avian sarcoma virus. The mouse genome contains a family of c-jun related genes, JunB and JunD. All of them have leucine zippers that can interact to form homodimers or heterodimers (for a review, see Wisdom, 1999). The other major component of AP1 is the product of another gene with an oncogenic counterpart: the c-fos gene, which is the cellular homologue of the oncogene v-fos carried by a murine sarcoma virus. Expression of c-fos activates genes whose promoters or enhancers possess an AP1 target site. The c-fos product is a nuclear phosphoprotein that is one of a group of proteins (Fos-related antigens, FRA), which constitute a family of fos-like proteins. Fos also has a leucine zipper. Fos cannot form homodimers, but can form a heterodimer with Jun. A leucine zipper in each protein is required for the interaction. The ability to form dimers is a crucial part of the interaction of these factors with DNA. Fos cannot by itself bind DNA, possibly because of its failure to form a dimer, but the Jun-Fos heterodimer can bind to DNA with the same target specificity as the Jun-Jun dimer, and this heterodimer binds to the AP1 site with an affinity about 10 times that of the Jun homodimer.
These dimers can bind AP-1 DNA recognition elements (5′-TGA(G/C)TCA-3′), also known as TREs (phorbol 12-O-tetradecanoate-13-acetate (TPA) response elements), based on their ability to mediate transcriptional induction in response to the phorbol ester tumor promoter TPA. The DNA binding affinities and transactivation capacities of the Jun proteins vary considerably, with c-Jun exhibiting the highest activation potential; heterodimerization with c-Fos further increases c-Jun's transcriptional capacity through the formation of more stable dimers, while heterodimerization with JunB attenuates it.
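The AP-1 recognition element illustrates the point made for bZIP proteins that target sites are inverted repeats with no separation: each variant of the 7 bp site reads as the other variant on the opposite strand. A minimal sketch follows; the names `find_ap1_sites` and `reverse_complement` are illustrative helpers introduced here, and only exact matches to TGA(G/C)TCA are reported.

```python
import re

# The AP-1 / TRE consensus from the text, with G or C at the central position.
AP1_SITE = re.compile("TGA[GC]TCA")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def find_ap1_sites(dna):
    """Return (start, site) for each exact AP-1 consensus match."""
    return [(m.start(), m.group()) for m in AP1_SITE.finditer(dna.upper())]


# Each variant is the reverse complement of the other, so a site found on
# one strand is also a site on the other strand.
for site in ("TGAGTCA", "TGACTCA"):
    print(site, "->", reverse_complement(site))
```

Because the two half-sites (TGA and TCA) flank a single central base pair, a dimer with two symmetrically arranged basic regions can contact both halves at once.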
AP-1 proteins are transcription factors that contribute both to basal gene expression and to TPA-inducible gene expression. Many other stimuli, most notably serum, growth factors and oncoproteins, are also potent inducers of AP-1 activity; it is also induced by tumor necrosis factor (TNFα) and interleukin 1 (IL-1), as well as by a variety of environmental stresses, such as UV radiation.
AP-1 activity is important in growth control and plays a key role in cell transformation. Furthermore, since two of the AP-1 target genes are collagenase and IL-2, AP-1 is likely to be involved in inflammation and the innate immune response. AP-1 transcription factors have also been implicated in the control of cell death and proliferation; indeed, they regulate the expression of target genes involved in the two processes. For example, AP-1 may promote cell proliferation by activating the cyclin D1 gene, whose regulatory sequences contain two AP-1 binding sites.
c-Jun is a negative regulator of both p53 expression and its ability to activate target gene transcription. The effect of c-Jun on p53 is likely to be direct, and exerted through a variant AP-1 site in the p53 promoter, but in this case c-Jun represses rather than activates transcription. The major function of c-Jun induction in UV-irradiated cells seems to be the repression of p53-mediated p21cip1/waf1 induction, thereby allowing growth-arrested cells to re-enter the cell cycle (thus promoting apoptosis). JunD also interacts with the p53 pathway, since expression of p19Arf is down-regulated by JunD. The INK4a locus is a transcription unit shared by the p19Arf and p16 genes, and recent results suggest that JunB regulates p16 expression; thus the growth inhibitory activity of JunB is likely to be in part dependent on p16. Indeed, the p16 promoter contains 3 AP-1 binding sites. On the other hand, c-Jun down-regulates p16 transcription.
The pro-apoptotic capacity of c-Jun can be explained also by its ability to activate the FasL gene, which is induced by DNA damaging agents, and indeed contains one AP-1 binding site.
It should be said that, despite the involvement of Jun and Fos in growth control and oncogenesis, positively regulated AP-1 target genes that mediate cell cycle progression were never identified. Pro-mitogenic AP-1 complexes, especially those containing c-Jun, seem to accomplish their growth promoting functions through the repression of tumor-suppressor genes, such as p53, p21cip1/waf1 and p16.
III.4 Helix-loop-helix proteins
The amphipathic helix-loop-helix (HLH) motif has been identified in some developmental regulators and in genes coding for eukaryotic DNA-binding proteins. The proteins that have this motif have both the ability to bind DNA and to dimerize. They share a common type of sequence motif: a stretch of 40-50 amino acids contains 2 amphipathic α-helices separated by a linker region (the loop) of varying length. The proteins in this group form both homodimers and heterodimers by means of interactions between the hydrophobic residues on the corresponding faces of the 2 helices. The ability to form dimers resides with these amphipathic helices, and is common to all HLH proteins.
Most HLH proteins contain a region adjacent to the HLH motif itself that is highly basic, and which is needed for binding to DNA. Members of the group with such a region are called bHLH proteins. A dimer in which both subunits have the basic region can bind to DNA.
The bHLH proteins fall into 2 general groups. Class A consists of proteins that are ubiquitously expressed, including mammalian E12/E47. Class B consists of proteins that are expressed in a tissue-specific manner, including mammalian MyoD, Myf5, myogenin and MRF4 (a group of transcription factors that are involved in myogenesis, called myogenic regulatory factors, MRFs).
A common modus operandi for a tissue-specific bHLH protein may be to form a heterodimer with a ubiquitous partner. There is also a group of gene products that specify development of the nervous system in Drosophila melanogaster (where Ac-S is the tissue-specific component, and da is the ubiquitous component). The Myc proteins form a separate class of bHLH proteins.
III.4.1 Myogenic bHLH
Perhaps the best understood of the bHLH proteins are the MRFs (myogenic regulatory factors). These proteins are approximately 80% similar in their bHLH regions. In addition, the MRFs have homology outside the bHLH domain, including a cysteine-histidine-rich stretch adjacent to the basic region and a serine-threonine-rich region at the C-terminus. All MRFs are capable of converting the mesodermal cell line C3H10T1/2 into myoblasts; they are expressed solely in skeletal muscle.
The MRFs inhibit cell proliferation and directly or indirectly regulate a cascade of muscle-specific gene expression. The 4 members of the MyoD gene family are important for different stages of muscle development, and have the capacity to both auto- and cross-regulate their own and each other's expression. MyoD and Myf5 share an overlapping (redundant) function required for generating (or maintaining) muscle cell identity and activating myogenin; myogenin is required at a later stage of muscle development, specifically during the terminal differentiation of myoblasts to myotubes. MRF4 may have a role in later development.
MyoD forms heterodimers with ubiquitously expressed members of the E2A HLH protein family (alternatively spliced products of the E2A gene, E2-5, E12, E47). The MyoD-E2A heterodimers bind to the E-box motif (CANNTG), which functions as the cognate binding site for all bHLH factors and is present in most muscle-specific enhancers. MyoD is able to activate the expression of myogenin, p21, myosin heavy chain (MHC), and desmin. It binds cooperatively to 2 MyoD binding sites, and many MyoD target genes contain 2 or more sites. However, the MHC cis-regulatory region does not contain E boxes; it is possible that myogenic factors could bind sequences other than the CANNTG consensus, perhaps as components of complexes with other proteins.
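Since many MyoD target genes contain two or more E boxes, a first-pass look at a regulatory sequence often amounts to counting CANNTG matches. The sketch below is an illustration only: `find_e_boxes` and `has_multiple_e_boxes` are hypothetical helper names, N is taken as any of A, C, G or T, and a match says nothing about whether the site is actually bound in vivo.

```python
import re

# CANNTG with N = any base; the lookahead lets overlapping E boxes be reported.
E_BOX = re.compile("(?=(CA[ACGT]{2}TG))")


def find_e_boxes(dna):
    """Return (start, site) for every E-box consensus match, overlaps included."""
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(dna.upper())]


def has_multiple_e_boxes(dna, minimum=2):
    """True if the sequence carries at least `minimum` E-box matches."""
    return len(find_e_boxes(dna)) >= minimum


# Made-up sequences for illustration.
print(find_e_boxes("TTCAGCTGTT"))
print(has_multiple_e_boxes("CAGCTGAAAACACCTG"))
```

The E box is its own reverse complement at the fixed positions (CA...TG), so scanning one strand is sufficient for this consensus.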
In general, dimers formed from bHLH proteins differ in their abilities to bind to DNA. For example, MyoD-E47 heterodimer forms efficiently and binds strongly to DNA, while MyoD homodimers bind only poorly. Differences in DNA-binding result from properties of the region in or close to the HLH motif. Some HLH proteins lack the basic region and/or contain proline residues that appear to disrupt its function (e.g. Id). Proteins of this type have the same capacity to dimerize as bHLH proteins, but a dimer that contains one subunit of this type can no longer bind to DNA specifically.
The formation of muscle cells is triggered by a change in the transcriptional program that requires several bHLH proteins, including MyoD. The trigger for muscle differentiation is probably a heterodimer consisting of MyoD-E12 or MyoD-E47, rather than a MyoD homodimer. Before myogenesis begins, a member of the non-basic HLH type, the Id protein, may bind to MyoD and/or E12 and E47 to form heterodimers that cannot bind to DNA. It binds to E12/E47 better than to MyoD, and so might function by sequestering the ubiquitous bHLH partner. Over-expression of Id can prevent myogenesis, so the removal of Id could be the trigger that releases MyoD to initiate myogenesis.
III.4.2 Myc family
The members of the Myc family of oncogenes (c-Myc, N-Myc and L-Myc) code for nuclear phosphoproteins that appear to promote cell growth and transformation by regulating the transcription of target genes required for proliferation. Mutations that disrupt the regulation or expression level of Myc are found in cancers.
Myc's transcription functions require a C-terminal basic helix-loop-helix-leucine zipper (bHLHZip) DNA-protein interaction motif, which is necessary for the interaction with its bHLHZip partner Max and for the sequence-specific binding to DNA; the heterodimer binds to the conserved E-box motif CA(C/T)GTG, activating transcription of genes linked to this motif. However, the mechanisms by which Myc activates transcription remain unclear. Myc-Max transcriptional activity on either synthetic promoters or putative cellular target genes is weak and variable, on the order of 2-5 fold.
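The Myc-Max motif CA(C/T)GTG is a restricted subset of the general E-box consensus CANNTG. A short enumeration makes the relationship concrete; this is only a sketch of the sequence logic, with variable names chosen here for illustration, and it does not model binding affinity.

```python
import re
from itertools import product

# All 16 hexamers that satisfy the general E-box consensus CANNTG.
all_e_boxes = ["CA" + a + b + "TG" for a, b in product("ACGT", repeat=2)]

# Those that also satisfy the Myc-Max motif CA(C/T)GTG given in the text.
MYC_MAX_SITE = re.compile("CA[CT]GTG")
myc_max_e_boxes = [site for site in all_e_boxes if MYC_MAX_SITE.fullmatch(site)]

print(len(all_e_boxes), myc_max_e_boxes)
```

Only 2 of the 16 possible E boxes fit the Myc-Max motif, which is one way to see how different bHLH dimers can share the E-box framework yet select different targets.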
Myc contains conserved regulatory domains in the N-terminus (Myc boxes, MB1 and MB2), which regulate Myc's transactivation or transrepression functions. Through MB2, Myc interacts with the TRRAP co-factor, a component of the large SAGA (Spt-Ada-Gcn5-Acetyltransferase) complex, containing histone-acetyltransferase activity, which can be involved in Myc-mediated transactivation. Conversely, the Myc antagonist Mad, a bHLHZip protein that can dimerize with Max and bind to the same E-box consensus as Myc-Max, represses transcription by interacting with the corepressor Sin3, which in turn binds to class I histone deacetylases (HDAC 1 and 2). This suggests that Myc may function by facilitating the action of other transcription factors, opening up specific segments of chromatin. This may explain the relatively weak transactivation activity of Myc on its own.
Myc has been shown to increase transcription (1.5-2 fold) of cell cycle regulatory genes, such as those encoding Cyclin D1 and D2, Cyclin A, Cyclin E, Cdc25A, p19Arf and Id2. Myc also plays a role in the induction of apoptosis (e.g., p19Arf induction is important in p53-dependent apoptosis). Other putative Myc target genes are odc (ornithine decarboxylase), α-prothymosin (involved in cell cycle progression), nm23 (inhibitor of metastatic progression), tert, and cad.
A potentially important function for Myc might be to regulate the rate of growth (defined as an increase in cell mass and size) that is thought to be required for cell cycle progression and cell division; indeed, the Drosophila Myc ortholog (dmyc) is required to maintain the normal size of cells and organs. A majority of genes up-regulated following Myc induction in a variety of contexts are involved in ribosome biogenesis, energy and nucleotide metabolism, and translational control. Thus the capacity of Myc to drive cell cycle progression may be due, in part, to stimulation of growth.
IV.1 The p53 protein
The p53 gene was the first tumor-suppressor gene to be identified. p53 mutations are found in 50-55% of all human cancers. The human p53 protein contains 393 amino acids and is structurally and functionally divided into 4 domains:
All these pathways inhibit p53 degradation, thus stabilizing p53 at high concentration. The amount of p53 protein in cells is determined mainly by the rate at which it is degraded. The degradation proceeds through a process of ubiquitin-mediated proteolysis, involving Mdm2. This process is subject to a feedback loop, since Mdm2 is a p53 target gene.
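The consequence of this feedback loop can be illustrated with a toy numerical model. The sketch below is an assumption-laden illustration and not a published model of the pathway: the two rate equations, the function name `simulate_p53_mdm2`, and every rate constant are arbitrary choices in arbitrary units, picked only to show that lowering the efficiency of Mdm2-mediated degradation raises the steady-state level of p53.

```python
def simulate_p53_mdm2(k_deg, synthesis=1.0, induction=1.0, mdm2_decay=1.0,
                      dt=0.01, t_end=100.0):
    """Euler integration of a two-variable negative feedback loop.

    dp/dt = synthesis - k_deg * m * p        p53 made at a constant rate,
                                             degraded in proportion to Mdm2
    dm/dt = induction * p - mdm2_decay * m   Mdm2 expression driven by p53

    All parameters are illustrative values in arbitrary units.
    """
    p, m = 0.1, 0.1  # arbitrary starting levels of p53 and Mdm2
    for _ in range(int(t_end / dt)):
        dp = synthesis - k_deg * m * p
        dm = induction * p - mdm2_decay * m
        p += dp * dt
        m += dm * dt
    return p, m


# 'Stress' is represented here simply as a 4-fold lower degradation efficiency.
unstressed_p53, _ = simulate_p53_mdm2(k_deg=1.0)
stressed_p53, _ = simulate_p53_mdm2(k_deg=0.25)
print(unstressed_p53, stressed_p53)
```

In this toy model the steady state satisfies p = sqrt(synthesis × mdm2_decay / (k_deg × induction)), so a 4-fold drop in degradation efficiency gives only a 2-fold rise in p53: the induced Mdm2 partly counteracts the stabilization, which is the essence of the feedback described above.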
The downstream events mediated by p53 take place by 2 major pathways : cell cycle arrest and apoptosis.
p53 target genes can be divided into 4 categories :
Cell cycle inhibition. p53 directly stimulates the expression of p21waf1/cip1, an inhibitor of cyclin-dependent kinases (CDKs), which are key regulators of the cell cycle. p21 inhibits both the G1/S and the G2/M transition; p21 also binds to PCNA, and the p21-PCNA complex seems to block the role of PCNA as a DNA polymerase processivity factor in DNA replication. In epithelial cells, p53 also stimulates the expression of 14-3-3σ, which helps to maintain a G2 block. Cell cycle arrest can also be mediated by p53 induction of GADD45, which is also induced by DNA damage. GADD45 binds to PCNA, and can arrest the cell cycle. It also plays a direct role in DNA nucleotide excision repair.
Many genes seem to be repressed by p53, but the mechanisms are still unclear.
IV.2 The E2F family
E2F was originally discovered as a cellular activity that is required for adenovirus E1A transforming protein to mediate the transcriptional activation of the viral E2 promoter.
E2F controls the G1/S transition of the proliferative cell cycle by regulating the transcription of cellular genes that are essential for cell division. These encode cell cycle regulators, such as Cyclin E, Cyclin A, Cdc2, Cdc25, Rb and E2F1; enzymes that are involved in nucleotide biosynthesis, such as dihydrofolate reductase, thymidylate synthetase and thymidine kinase; and the main components of the DNA replication machinery, such as DNA Polymerase α, Cdc6, ORC1 and the minichromosome maintenance (MCM) proteins.
E2F is inhibited by its association with the tumor suppressor protein Rb, which is in turn regulated by phosphorylation (Figure 5): in G0/G1, Rb is hypophosphorylated and inhibits E2F activity; during G1, Rb is progressively phosphorylated and, as a consequence, loses its affinity for E2F, allowing cells to proceed through the G1/S transition. Rb is also an essential molecule in terminally differentiating cells (e.g. muscle cells), in which E2F target genes are irreversibly repressed. Rb can inhibit E2F activity by masking its transactivation domain, or by recruiting chromatin remodeling factors, including HDACs, members of the ATP-dependent complex SWI/SNF, DNMT1, and a histone H3 methyltransferase (HMT), most likely Suv39H1. In this scenario, it is possible that Rb represses E2F-dependent transcription to varying degrees, possibly in a two-step process. First, the Rb-E2F complex could switch off actively transcribed genes by recruiting HDACs to deacetylate lysine 9 of histone H3; subsequently, Suv39H1 could methylate histone H3 at lysine 9, creating a binding site for HP1 proteins. The tight association between HP1 and Suv39H1 would facilitate the modification of adjacent histone tails, and allow the silencing effect to be propagated throughout the locus.
Figure 5: The Rb/E2F pathway.
Eight human genes have been identified as components of the E2F transcriptional activity; these genes have been divided into 2 distinct groups: the E2Fs (1-6) and the DPs (1 and 2). The protein products of these two groups heterodimerize (through a leucine zipper domain) to give rise to functional E2F activity, and all possible combinations of E2F-DP complexes exist in vivo. The individual E2F-DP complexes preferentially recognize, via a bHLH DNA-binding motif, the same nucleotide sequence, TTTCCCGC; however, the individual E2F-DP species invoke very different transcriptional responses depending on the identity of the E2F moiety and the proteins associated with the complex. On the basis of their transcriptional properties, the E2F family can be divided into 3 distinct subgroups: E2F1, E2F2 and E2F3; E2F4 and E2F5; and E2F6. E2F1, 2 and 3 are potent transcriptional activators ("activating E2Fs"); by contrast, E2F4 and E2F5 seem to be primarily involved in the "active repression" of E2F-responsive genes by recruiting the pocket proteins and their associated histone-modifying enzymes. Finally, E2F6 also acts as a transcriptional repressor, but in a distinct, pocket-protein-independent manner.
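Since all E2F-DP complexes preferentially recognize the same sequence, a candidate promoter can be screened for it with a simple string search. The following is a minimal sketch, assuming exact matches to the TTTCCCGC sequence quoted above on either DNA strand; the function names and the example promoter fragment are hypothetical, and real binding sites tolerate variation that this sketch ignores.

```python
E2F_SITE = "TTTCCCGC"  # recognition sequence quoted in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_e2f_sites(sequence, site=E2F_SITE):
    """Return sorted (position, strand) pairs for exact matches of the site.

    Positions are 0-based and always refer to the given (top) strand.
    """
    seq = sequence.upper()
    hits = []
    for strand, motif in (("+", site), ("-", reverse_complement(site))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Hypothetical promoter fragment carrying one site on each strand.
promoter = "AATTTCCCGCAAGCGGGAAATT"
print(find_e2f_sites(promoter))
```

Finding a match says nothing about which E2F-DP species occupies the site; as noted above, the transcriptional outcome depends on the E2F moiety and its associated proteins.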
E2F1 was cloned by virtue of its ability to bind Rb. E2F1 was shown to bind to DNA in a DP-dependent manner, and the resulting complex is a strong transcriptional activator of E2F-responsive promoters. E2F2 and 3 are highly homologous to E2F1, and show similar DNA-binding and transactivation properties. E2F1, 2 and 3 could contribute to the repression of E2F-responsive genes by recruiting Rb. However, the key role of these E2Fs is the activation of genes that are essential for cellular proliferation and the induction of apoptosis. Because they are potent transcriptional activators of E2F-responsive genes, overexpression of any of these proteins is sufficient to induce quiescent cells to re-enter the cell cycle. Besides this role in the control of cell proliferation, deregulated E2F activity can trigger apoptosis. This apoptosis can be either p53-dependent, in which case it is likely to be mediated by the transcriptional activation of p19ARF, a known E2F-responsive gene, or p53-independent. In normal cells, the "activating" E2Fs are specifically regulated by their association with Rb, but not with the related pocket proteins p107 and p130. Their release is triggered by the phosphorylation of Rb in late G1 and correlates closely with the activation of E2F target genes. Analysis of E2F1 mutant mice has revealed its tumor-suppressor properties (E2F1-deficient mice develop a broad spectrum of tumors), possibly through its ability to induce apoptosis. Furthermore, E2F1 is likely to be involved in the DNA-damage-response pathway, since it can interact with proteins involved in DNA repair (such as Nbs1 and Mre11).
E2F4 and E2F5 are important in the induction of cell cycle exit and terminal differentiation. They were originally identified and cloned by virtue of their association with p107 and p130. The sequences of these proteins diverge considerably from those of the activating E2Fs. Significant levels of E2F4 and 5 are detected in quiescent (G0) cells, whereas E2F1, 2 and 3 are primarily restricted to actively dividing cells. In addition, whereas the activating E2Fs are specifically regulated by Rb, E2F5 is mainly regulated by p130, and E2F4 associates with each of the pocket proteins at different points in the cell cycle. As E2F4 is expressed at higher levels than the other E2F-family members, it accounts for at least half of the Rb-, p130- and p107-associated E2F activity in vivo. The two subgroups also differ in subcellular localization: E2F1, 2 and 3 are constitutively nuclear, whereas E2F4 and 5 are predominantly cytoplasmic, although association with Rb or p130 was shown to be enough to induce their nuclear localization in vivo. Indeed, in G0/G1 cells, E2F4 and 5 account for most of the nuclear E2F-DP-pocket protein complexes. As these complexes associate with HDACs in vivo, E2F4 and 5 are thought to be crucial in mediating the transcriptional repression of E2F-responsive genes.
E2F6 lacks the carboxy-terminal sequences that are responsible for both pocket protein binding and transactivation. Overexpression studies showed that E2F6 can repress E2F-responsive genes. The "repression domain" of E2F6 binds Ring1 and YY1 binding protein (RYBP), a component of the mammalian Polycomb (PcG) complex; indeed, E2F6 associates with numerous known PcG proteins in vivo, including the oncoprotein Bmi-1. This suggests that the transcriptionally repressive properties of E2F6 are mediated through its ability to recruit the PcG complex. Bmi-1 is involved in the regulation of senescence and tumorigenicity, and its transforming activity seems to depend on its ability to repress the p16INK4A and p19ARF tumor-suppressor genes, which are expressed from the INK4A locus; it is widely inferred that Bmi-1 mediates the direct transcriptional repression of INK4A through its participation in the PcG complex. As p19ARF is a known E2F-responsive gene, it is likely that E2F6 acts as the sequence-specific DNA-binding component of the Bmi-1-containing PcG complex.
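The properties of the three E2F subgroups described above can be collected in a small lookup table. This is a sketch for orientation only: the class and function names are hypothetical, and each entry is a simplification of the text (for example, E2F5 is listed with p130 because it is "mainly" regulated by it).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class E2FMember:
    name: str
    role: str                # "activator" or "repressor"
    pocket_proteins: tuple   # main pocket-protein partners named in the text
    note: str


E2F_FAMILY = (
    E2FMember("E2F1", "activator", ("Rb",), "constitutively nuclear"),
    E2FMember("E2F2", "activator", ("Rb",), "constitutively nuclear"),
    E2FMember("E2F3", "activator", ("Rb",), "constitutively nuclear"),
    E2FMember("E2F4", "repressor", ("Rb", "p107", "p130"),
              "predominantly cytoplasmic"),
    E2FMember("E2F5", "repressor", ("p130",), "predominantly cytoplasmic"),
    E2FMember("E2F6", "repressor", (),
              "pocket-protein independent; associates with PcG proteins"),
)


def members_bound_by(pocket_protein):
    """Names of E2Fs listed as partners of the given pocket protein."""
    return [m.name for m in E2F_FAMILY if pocket_protein in m.pocket_proteins]


def members_with_role(role):
    """Names of E2Fs with the given transcriptional role."""
    return [m.name for m in E2F_FAMILY if m.role == role]


print(members_bound_by("p130"))
print(members_with_role("activator"))
```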
IV.3 The STAT family
Signaling initiates when ligands such as cytokines bind to and activate their cell surface receptors, typically by inducing receptor aggregation (Figure 6). Cytokine receptors such as the interferon or IL-6 receptors, which lack intrinsic tyrosine kinase (TK) activity, recruit members of the Janus kinase (JAK) family of cytoplasmic TKs to act as intermediates for activation of STATs. The JAK family consists of Jak1, Jak2, Jak3 and Tyk2, each of which may be activated by a variety of receptors. Following ligand engagement, the receptor-associated JAKs become activated by phosphorylating themselves, and subsequently phosphorylate tyrosine residues within the receptor cytoplasmic tails. The receptor phosphotyrosines serve as docking sites for the recruitment of inactive cytoplasmic STAT monomers through interaction with the SH2 domain in STATs. JAK-mediated phosphorylation of an invariant tyrosine residue on the receptor-bound STAT monomer induces the phosphorylated STAT monomers to associate with each other through interaction of the phosphotyrosine of one molecule with the SH2 domain of another molecule. The activated dimers then translocate to the nucleus, where they bind to specific DNA-response elements in the promoters of target genes and thereby induce unique gene expression programs.
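The sequence of events just described is strictly ordered, so it can be sketched as a list of steps in which a block at one step prevents all later ones. The step names, the function and its parameters below are hypothetical labels introduced for illustration; the sketch covers only the cytokine receptor route through JAKs.

```python
JAK_STAT_STEPS = (
    ("receptor_aggregation",
     "ligand binding induces receptor aggregation"),
    ("jak_autophosphorylation",
     "receptor-associated JAKs phosphorylate themselves"),
    ("receptor_phosphorylation",
     "JAKs phosphorylate tyrosines in the receptor cytoplasmic tails"),
    ("stat_docking",
     "STAT monomers dock on receptor phosphotyrosines via their SH2 domain"),
    ("stat_phosphorylation",
     "JAKs phosphorylate the invariant tyrosine of the docked STAT"),
    ("stat_dimerization",
     "phospho-STATs dimerize through phosphotyrosine-SH2 interactions"),
    ("nuclear_translocation",
     "activated dimers translocate to the nucleus"),
    ("target_gene_induction",
     "dimers bind DNA-response elements in target gene promoters"),
)


def completed_steps(ligand_bound, jak_active=True, stat_sh2_intact=True):
    """Return the names of the steps reached, stopping at the first block."""
    if not ligand_bound:
        return []
    reached = []
    for name, _description in JAK_STAT_STEPS:
        if name == "jak_autophosphorylation" and not jak_active:
            break  # without JAK activity nothing downstream can occur
        if name == "stat_docking" and not stat_sh2_intact:
            break  # STATs cannot dock on the receptor without an SH2 domain
        reached.append(name)
    return reached


print(completed_steps(ligand_bound=True))
print(completed_steps(ligand_bound=True, jak_active=False))
```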
In contrast to typical cytokine receptor signaling, activated growth factor receptors with intrinsic TK activity, e.g. the EGF receptor and the PDGF receptor, may directly phosphorylate STAT proteins. In addition to growth factor receptors and JAKs, other TKs that directly phosphorylate STATs are cytoplasmic kinases such as Src and Abl.
The C-terminal portion of STATs contains several key elements required for STAT activation and function. The Src-homology 2 (SH2) domain is a common structural motif among signaling molecules that mediates protein-protein interactions via direct binding to specific phosphotyrosines. Phosphorylation of a critical tyrosine residue activates STATs by stabilizing the association of two STAT monomers through reciprocal phosphotyrosine-SH2 interactions to form a dimer. The extreme C-terminal part of STATs contains the transactivation domain that is required for transcriptional activation. In addition to activation by tyrosine phosphorylation, phosphorylation of a serine residue in the transactivation domain of some STATs (particularly Stat1 and Stat3) contributes to maximal transcriptional activity. In the N-terminal portion, STATs contain the DNA-binding domain and a region that mediates oligomerization between two STAT dimers bound to DNA.
Figure 6: Mechanisms of STAT signaling.
Activation of STATs results in expression of genes that control critical cellular functions including cell proliferation, survival, differentiation and development, as well as specialized cellular functions such as those associated with immune responses. Although the STAT family is structurally highly conserved, there are distinct differences both in primary sequence and in function; studies on knockout mice have helped to work out the functions of each STAT protein. Stat2 or Stat3 null mice are embryonic lethal, consistent with a fundamental role for these STAT proteins in development. Mice with targeted Stat1 gene disruption are viable, have an impaired response to interferons and show high susceptibility to bacterial and viral infections. Stat5 knockout mice are viable with phenotypic defects that are tissue-specific, including defects in mammary gland development and lactation during pregnancy, as well as in the sexually dimorphic pattern of liver gene expression, infertility and immune dysfunction.
In normal cells, ligand-dependent activation of the STATs is a transient process; in contrast, in many cancerous cell lines and tumors, the STAT proteins (in particular Stat1, 3 and 5) are persistently tyrosine-phosphorylated or activated. STATs play an important role in controlling cell cycle progression and apoptosis. Stat1 plays an important role in growth arrest and in promoting apoptosis, and is implicated as a tumor suppressor, while Stat3 and Stat5 are involved in promoting cell cycle progression and cellular transformation, and in preventing apoptosis.
Diverse oncoproteins (such as those of the Src and Abl families) can activate specific STATs (particularly Stat3 and Stat5), and constitutively activated STAT signaling directly contributes to oncogenesis. Indeed, inappropriate activation of specific STATs occurs with surprisingly high frequency in a wide variety of human cancers. Aberrant JAK activation in human cancers has been linked to a chromosomal abnormality in acute lymphocytic leukemia (ALL): translocation of the short arm of chromosome 9, containing the kinase domain of Jak2, to the short arm of chromosome 12, containing the oligomerization domain of the Ets-family transcription factor Tel, results in a fusion protein (Tel-Jak2) possessing constitutive kinase activity. Enforced dimerization contributed by the Tel portion of the molecule mimics cytokine-induced activation of Jak2 at membrane-bound receptors. Indeed, many STAT target genes are directly involved in the control of cell proliferation and survival.
Expression of a constitutively activated form of Stat3 induces transcription of reporter constructs containing promoter sequences derived from the bcl-xL gene. A strong correlation exists between elevated levels of members of the Bcl-2 family of anti-apoptotic regulatory proteins, including the bcl-xL protein, and human cancers. Constitutively activated STAT signaling, particularly involving Stat3 and Stat5, contributes to malignant progression by preventing apoptosis.
In addition, the expression of cyclin D1, which controls progression from G1 to S phase, is elevated in cells expressing the constitutively activated form of Stat3. Regulation of the cyclin D1 gene by Stat5 has also been observed. Thus, activation of the two STAT family members most strongly associated with oncogenesis, Stat3 and Stat5, results in increased expression of a critical regulator of cell cycle progression. These findings are consistent with the suggestion that inappropriate STAT activation contributes to oncogenesis by stimulating cell proliferation.
p21WAF1/CIP1 and c-Myc are also STAT target genes. Stat1 activation in response to interferon-γ results in induction of p21WAF1/CIP1 and growth arrest; indeed, the p21WAF1/CIP1 promoter contains three STAT binding sites, two of which bind Stat1 and one of which binds both Stat1 and Stat3. The c-Myc protein is a critical regulator of both cell proliferation and survival; Stat3 is directly or indirectly implicated in regulating c-Myc, and this transcriptional regulation appears to be mediated by Src family kinases.
Stat3 and Stat5 regulate multiple cellular genes that could contribute to oncogenesis by mechanisms that include promotion of cell cycle progression and prevention of apoptosis. While Stat1 is also activated in some tumors, it is likely that the major STATs involved in promoting oncogenesis are Stat3 and Stat5.
Atlas of Genetics and Cytogenetics in Oncology and Haematology
Transcription factors
Online version: http://atlasgeneticsoncology.org/deep-insight/20043/transcription-factors