(*) Corresponding authors : Philippe Dessen
April 2016
Content I- Chromosome rearrangements/Hybrid genes
Introduction
Most gene fusions alter the expression and/or the function of normal genes, and they are generally strong driver mutations in neoplasia. They can provide important information for the classification of tumors (e.g. the well-known problem of "small round blue cell tumors") and may become targets for therapy (e.g. tyrosine kinase inhibitors). Since the discovery of the "Philadelphia chromosome" (BCR/ABL1 fusion), hundreds and then thousands of gene fusions have been described. Several sets of hybrid genes (or "fusion genes") have been published during the last few years. The main resources in cytogenetics deal "every minute-every day" with all the structural and numerical chromosome rearrangements: translocations ("t"), inversions ("inv"), insertions ("ins") and dicentrics ("dic", accompanied by an acentric fragment, "ace", when the dicentric forms), which generate many hybrid genes/fusion proteins at the origin and/or in the process of cancer development; deletions ("del") and duplications ("dup"), which generate hybrid genes at the breakpoints and/or gene copy number changes; isochromosomes ("i"), double minutes ("dm"), homogeneously staining regions ("HSR"), monosomies and trisomies, with massive gene copy number changes; marker chromosomes, with high levels of any possible abnormality; and rings ("r"), so unstable that their significance in terms of carcinogenesis, at the cell level, remains anecdotal/unpredictable/unknown (see http://atlasgeneticsoncology.org/Educ/PolyMecaEng.html). Various types of databases have been developed. Most of these data are integrated in the COSMIC database (as studies presented in http://atlasgeneticsoncology.org/cosmicstudies.html) or in the Mitelman database, which results in redundant information across databases. The Atlas of Genetics and Cytogenetics in Oncology and Haematology (http://atlasgeneticsoncology.org) provides peer reviewed articles/cards on chromosome abnormalities, clinical entities and genes.
Primer sequences for verifying hybrid genes can be found in the literature (Lovf M et al., 2011; Skotheim RI et al., 2009; Urakami K et al., 2016). Hybrid genes can be present in tumors but also in normal tissues (Babiceanu M et al., 2016). The International System for Human Cytogenetic Nomenclature (ISCN) is the nomenclature used to describe normal and abnormal karyotypes. Languages with specific grammars have been invented in logic and in mathematics (see https://en.wikipedia.org/wiki/Portal:Logic). The ISCN follows this model. It uses operands and, to act on them, unary and binary operators: e.g. "r" (ring) is a unary operator because it acts on one operand (one chromosome), and "t" (translocation) is a binary operator because it acts on two operands (the 2 chromosomes involved in the translocation). ISCN originated at the Denver conference in 1960 (Proposed standard, 1960). Revisions and updates of the ISCN have made its interpretation more difficult (ISCN 2013). A new version is due for release by the end of 2016 (McGowan-Jordan J et al., 2016) but will not be freely available on the web.
I- Chromosome rearrangements/Hybrid genes
I-1 Mitelman Database
The Catalog of Chromosome Aberrations in Cancer, containing 3,844 cases, was first published in 1983. Successive printings were published by Karger, Alan R Liss, and Wiley-Liss (Sixth Edition, 1998). In 2000, with the support of the National Cancer Institute (NCI), the Catalog became an open online database (Figure 1). The last update, in February 2016, brought the total number of cases to 66,479, involving 10,277 gene fusions (Heim S and Mitelman F, 2015). The information is manually collected from the literature and subsequently organized into distinct sub-databases. The "Cases Quick Searcher" and the "Cases Full Searcher" contain the information related to chromosomal aberrations in individual cases, with the specific tumor characteristics.
The "Molecular Biology Associations Searcher" compiles events according to the gene rearrangements, with a mention of tumor histologies (Figure 2). It is accessed through a "Gene List" (from A2M, A2M/ALK, A2M/ARFGEF2, to ZZZ3/NCK1). The "Clinical Associations Searcher" is organized by tumor type, related to chromosomal aberrations and/or gene rearrangements. The starting point is the "Topography List" presenting the location of the tumor (from Adrenal, Anus, Bladder, Blood vessel, Bone to Vagina), paired with a "Morphology List" of histological subtypes of the tumor (from Acinic cell carcinoma, Acute basophilic leukemia, Acute eosinophilic leukemia, to Wilms tumour). Two other sub-databases are available: the "Recurrent Chromosome Aberrations Searcher", which provides a way to search recurrent chromosome abnormalities, and the "Reference Searcher", which queries the bibliographic references. Each sub-database gives the pertinent references with PMID numbers hyperlinked to PubMed.
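The operand/operator reading of the ISCN given in the Introduction can be illustrated with a minimal Python sketch that parses one simple term, such as t(9;9)(p13;p24), into an operator, its chromosome operands and its breakpoints, and checks the number of operands. Only the arities of "r" (unary) and "t" (binary) come from the text; the regular expression, the class and the function names are hypothetical, and real ISCN strings (full karyotypes, derivative chromosomes, complex translocations) are far richer than what this sketch accepts.

```python
import re
from dataclasses import dataclass

# Number of chromosome operands per operator. Only "r" (unary) and "t" (binary)
# are taken from the text; this table is an illustrative assumption, not the ISCN.
ARITY = {"r": 1, "t": 2}

# operator(chromosomes)(breakpoints), e.g. t(9;9)(p13;p24); breakpoints are optional.
REARRANGEMENT = re.compile(r"^([a-z]+)\(([^()]+)\)(?:\(([^()]+)\))?$")


@dataclass(frozen=True)
class Rearrangement:
    operator: str
    chromosomes: tuple
    breakpoints: tuple


def parse_rearrangement(text):
    """Parse one simple ISCN-style term and check its operand count."""
    match = REARRANGEMENT.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple rearrangement: {text!r}")
    operator, chromosomes, breakpoints = match.groups()
    if operator not in ARITY:
        raise ValueError(f"operator not covered by this sketch: {operator!r}")
    operands = tuple(chromosomes.split(";"))
    if len(operands) != ARITY[operator]:
        raise ValueError(
            f"{operator!r} expects {ARITY[operator]} operand(s), got {len(operands)}"
        )
    bands = tuple(breakpoints.split(";")) if breakpoints else ()
    return Rearrangement(operator, operands, bands)


if __name__ == "__main__":
    print(parse_rearrangement("t(9;9)(p13;p24)"))  # binary operator, two operands
    print(parse_rearrangement("r(7)"))             # unary operator, one operand
```

A term such as "t(9)" is rejected because the binary operator receives a single operand, which is the kind of grammatical check that the operator model makes possible.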
Figure 1: Mitelman database: "Cases Quick Searcher", "Molecular Biology Associations Searcher", and "Clinical Associations Searcher" (http://cgap.nci.nih.gov/Chromosomes/AllAboutMitelman)
Figure 2: PAX5/JAK2 in the Mitelman database (http://cgap.nci.nih.gov/Chromosomes/MSearchForm, click on "Expand Gene List" and enter: PAX5/JAK2)
This free access database shows raw data; it is (almost) complete, covering approximately 99.9% of the different published chromosomal rearrangements, and very reliable (each case is manually collected by prominent experts: Felix Mitelman, Bertil Johansson, and Fredrik Mertens). The Mitelman catalog and database remains an indispensable tool for every cancer cytogeneticist. The progress made in cancer cytogenetics would have been much slower without the Mitelman database.
I-2 Atlas of Genetics and Cytogenetics in Oncology and Haematology
The Atlas of Genetics and Cytogenetics in Oncology and Haematology (Dorkeld et al., 1999; Huret et al., 2013) (http://atlasgeneticsoncology.org) is a peer-reviewed, freely available, all-in-one online journal (ISSN: 1768-3262), encyclopaedia and database. It is an integrated structure covering the following topics: genes, cytogenetics and clinical entities in cancer, and cancer-prone diseases. The Atlas combines various types of information: genes, gene rearrangements, cytogenetics, protein domains, function, cell biology, pathways. It also covers clinical genetics and diseases, including hereditary cancer-prone diseases, with a focus on cancers and other medical conditions. Gathering all these different data helps to unify cancer genetics, whereas data found elsewhere are dispersed between several sites. The Atlas is the only cancer genetics database quoting prognosis. The iconography in the Atlas (32,554 images) is diverse (medical imaging, pathology, chromosomes, 3-D structure of proteins, genetic maps...).
The objectives of the project are to transfer scientific innovation towards research itself and, more precisely, towards patient care (translational health research); to assist medical treatment in rare forms of cancer; to make the fight against cancer more efficient; and to decrease costs in fundamental research, applied research and medical care, moving toward personalised cancer medicine. It is also a tool for researchers in genomics.
Content: The Atlas contains 45,500 pages (30,519 documents) written by 3,216 authors from roughly 50 countries (in decreasing order: France, USA, Italy, United Kingdom, Germany, Japan, Spain, Canada, China, The Netherlands...). The Atlas is mainly made up of structured review articles or "cards" (original monographs written by invited authors), but also contains traditional overviews, a portal directed to websites and databases dedicated to cancer and/or genetics, case reports in haematology, and teaching items in various languages. The Atlas constitutes a fountain of knowledge regarding the biology of normal and cancerous cells. There are 1,460 annotated gene cards (e.g. TP53 http://atlasgeneticsoncology.org/Genes/P53ID88.html) and 27,800 non-annotated gene cards, 600 leukemias (e.g. Classification of myelodysplastic syndromes http://atlasgeneticsoncology.org/Anomalies/ClassifMDSID1058.html ; see also Figure 3), 210 solid tumors (e.g. Head and Neck Paraganglioma, an overview http://atlasgeneticsoncology.org/Tumors/HeadNeckParagangliomaID6202.html ), 115 cancer prone diseases (e.g. Oculocutaneous Albinism http://atlasgeneticsoncology.org/Kprones/OculocutaneousAlbinismID10022.html), and 110 Deep Insights (e.g. The nuclear pore complex: structure and function http://atlasgeneticsoncology.org/Deep/NuclearPoreFunctionID20139.html). The Atlas items are usually looked up by chromosome, with the search box for genes or chromosomal abnormalities, or in the dedicated pages for solid tumors and for cancer-prone diseases.
However, a "Search by Chromosome band" has recently been developed: it is a synthesis of all hybrid gene resources for each chromosome band, comprising 435 pages that present the chromosomal abnormalities and the genes implicated, together with data collected from databases and the literature and links to the original web sites (e.g. http://atlasgeneticsoncology.org/Bands/1p36.html).
Figure 3: t(9;9)(p13;p24) PAX5/JAK2 in the Atlas (http://atlasgeneticsoncology.org/Anomalies/t0909p13p24ID1559.html)
Annotations/Meta-analyses: The Atlas is the only database that gives annotated data with meta-analyses, e.g. survival curves in the t(3;21)(q26;q22) RUNX1/MECOM (http://atlasgeneticsoncology.org/Anomalies/t0321ID1009.html) or in the t(1;11)(p32;q23) KMT2A/EPS15 (http://atlasgeneticsoncology.org/Anomalies/t0111p32q23ID1046.html), which are calculated from the available cases in the literature. Also, the uniquely detailed description of the domains of the gene SQSTM1 (http://atlasgeneticsoncology.org/Genes/GCSQSTM1.html) is the result of a careful annotation of data collected from various research papers. Since 2000, the Atlas has used radiating circles to illustrate partner genes in a translocation (see http://atlasgeneticsoncology.org/Partners.htm), and this representation has since been widely adopted.
Diagnosis and treatment: The Atlas may contribute to the cytogenetic diagnosis and may guide treatment decision making, particularly regarding rare diseases (rare diseases are numerous and, taken together, frequently encountered). From the section "Genes", one can obtain 600 genes implicated in , 732 in breast cancer, and 480 genes in prostate cancer (e.g. see paragraph "Other genes implicated" at: http://atlasgeneticsoncology.org/Tumors/breastID5018.html). With the improvement of techniques in genetics, it now appears that many subtypes of solid tumors may exist (there are potentially hundreds of breast cancer subtypes, defined by distinct genetic profiles, yet to be uncovered), following the leukemia model. Recently, new information on lung adenocarcinoma has opened the possibility of personalized medicine (see http://atlasgeneticsoncology.org/Tumors/TranslocLungAdenocarcID6751.html ).
Together with developments in cell biology, this suggests that the encyclopaedic content of the Atlas and of other similar data sources is probably a basis for developing personalized medicine for cancer.
ICD-O3 nomenclature: Nosology and phylum of solid tumors and hematological cancers can be found in the Atlas at http://atlasgeneticsoncology.org/Tumors/SolidNosology.html and http://atlasgeneticsoncology.org/Anomalies/ICD-OHematology.html.
Cell biology and physio-pathology: Information on cell biology and physio-pathology can be found in specific pages of the Atlas (e.g. Angiogenesis: http://atlasgeneticsoncology.org/Categories/Angiogenesis.html).
Links: The Atlas contains more than 17,000 internal hyperlinks. Gene cards are enriched by external links to up-to-date databases covering complementary aspects.
Educational tools: The Atlas has also developed educational tools in genetics in English, Spanish, and French (e.g. http://atlasgeneticsoncology.org/GeneticFr.html). Together with the information described above, this constitutes a step in continuing medical education.
Electronic journal: An open access electronic journal/pdf version of the Atlas has been developed by the Institute for Scientific and Technical Information (INIST) of the French National Centre for Scientific Research (CNRS). The archives of the journal (quarterly since 1997, bimonthly from 2008 and monthly from 2009) comprise 2,500 articles in more than 120 volumes, a collection of 10,000 pages, available at: http://irevues.inist.fr/atlasgeneticsoncology.
On the other hand, the Atlas, an encyclopedia with 45,000 pages of reference work, unfortunately remains incomplete and partially outdated. As a product of collaborative work, the accuracy and the renewal of the Atlas depend on the participation of colleagues.
I-3 COSMIC (http://cancer.sanger.ac.uk/cosmic)
COSMIC is a catalog of somatic mutations in cancer, developed by the Sanger Institute with the support of the Wellcome Trust. It covers nearly all types of abnormalities, from single nucleotide variations to chromosome rearrangements / hybrid genes. COSMIC stores and displays somatic mutation information and related details for human cancers. For hybrid genes, COSMIC v76 (Feb 2016) describes 17,245 fusions, with 283 curated fusion genes and 1,271 different pairs when taking inferred breakpoints into account (Figure 4). These fusions are part of a global database that mainly gathers somatic mutations in cancer. All the fusions are identified with a code (e.g. COSF699) and defined on the genome following the HGVS standard (http://www.hgvs.org/mutnomen/recs-DNA.html) (e.g. PLXND1{ENST00000324093}:r.13016TMCC1{ENST00000393238}:r.9185992) (Forbes SA et al., 2015). (http://cancer.sanger.ac.uk/cosmic/fusion/summary?id=1071) A synthesis of all these resources is integrated in the chromosomal band pages of the Atlas (http://atlasgeneticsoncology.org/Bands/1p36.html#GENES) with links to the original websites.
Figure 4: PAX5/JAK2 gene fusion at COSMIC
I-4 ChimerDB 2.0 (http://biome.ewha.ac.kr:8080/FusionGene/)
ChimerDB 2.0 is a database of hybrid genes, updated in 2010, with PubMed references and various information about the structure of chimeric genes (Kim N et al., 2006a; Kim P et al., 2006b).
I-5 TICdb (http://www.unav.es/genetica/TICdb/)
TICdb (v 3.3 August 2013) is a database of Translocation breakpoints In Cancer (Novo FJ et al., 2007). This update contains 1,313 sequences of hybrid genes found in human tumors, involving 420 different genes (Figure 5). For every fusion, TICdb returns the HGNC names of both partner genes and the original reference (either a GenBank or a PubMed ID), as well as the fusion sequence at the nucleotide level.
A complete list of genes and the fusion sequences can be obtained at http://www.unav.es/genetica/allseqsTICdb.txt.
Figure 5: PAX5 hybrid genes at TICdb (http://www.unav.es/genetica/TICdb/results.php?hgnc=PAX5&x=23&y=8)
I-6 ChiTARS (http://chitars.bioinfo.cnio.es/)
ChiTARS is a database of chimeric transcripts obtained from the analysis of ESTs or RNA sequencing, part of which have been experimentally validated. This database, which includes 20,750 chimeric human transcripts, was developed within the ENCODE project (Frenkel-Morgenstern M et al., 2013; Frenkel-Morgenstern M et al., 2015).
I-7 TCGA Fusion gene Data Portal (http://54.84.12.177/PanCanFusV2/)
The TCGA Fusion gene Data Portal presents the analysis of 20 tumor types of the TCGA program, as of December 2014, with 10,431 fusions in the 2,961 tumors that carry fusions (a mean of 3.5 fusions per sample) (Yoshihara K et al., 2015). This is the result of a specific pipeline for RNA-Seq data analysis (PRADA) developed at the MD Anderson Cancer Center.
I-8 FusionCancer (http://donglab.ecnu.edu.cn/databases/FusionCancer/)
This database of hybrid genes in human cancers (Wang Y, 2015) originated from the analysis of RNA-seq data from the Sequence Read Archive (SRA) at NCBI in 15 cancer types and contains 11,839 fusions, with structured information on cancer types, breakpoint accession numbers of SRA and chimeric sequences.
I-9 OMIM (http://www.omim.org/ see General resources in Genetics and/or Oncology)
The "Online Mendelian Inheritance in Man" (OMIM) catalog contains 1,523 entries mentioning "fusion gene" (Amberger JS et al., 2015).
I-10 Other resources
Books: "Cancer Cytogenetics: Chromosomal and Molecular Genetic Aberrations of Tumor Cells", by Sverre Heim and Felix Mitelman, is published by Wiley-Blackwell. The fourth edition of this major textbook (2015) contains 648 pages.
Some useful iconography of chromosome rearrangements from the UWCS laboratory, University of Wisconsin, can be found at http://www.slh.wisc.edu/clinical/cytogenetics/cancer/. An analysis of hybrid genes in 675 tumor cell lines has been performed by Genentech (Klijn C et al., 2015). Of the 2,200 gene fusions catalogued, 1,435 consist of genes not previously found in fusions. A synthesis of cell line analyses can be found in the Atlas at http://atlasgeneticsoncology.org/celllines.html. Finally, since the Mitelman database and the Atlas are complementary, both of these indispensable databases should be used.
II- Data for SKY and FISH
The fluorescence in situ hybridization (FISH) technique facilitates the identification of chromosomal structures using specific probes. It significantly improves the localisation of breakpoints on chromosomes through direct visualization of the hybridization of probes, using one or several colors associated with the probes. The big advantage of the FISH technique is that it can also be used on non-dividing cells (interphase nuclei). BAC clones are used for mapping studies as they contain large inserts of human DNA and can be fluorescently labeled to determine the localization of genes and to identify regions implicated in cancer chromosomal aberrations. The Cancer Chromosome Aberration Project (CCAP) has created a set of BAC clones mapped cytogenetically by FISH and physically by STSs to the human genome. The BAC data are integrated into various CGAP and NCBI databases to provide related clinical, histopathologic, genetic, and genomic information (http://cgap.nci.nih.gov/Chromosomes/CCAPBACClones), and more precisely for each chromosome (e.g. http://cgap.nci.nih.gov/Chromosomes/BACCloneMap?CHR=6). The Human BAC Array (http://mkweb.bcgsc.ca/bacarray/) is built using 32,855 clones from the RPCI-11, RPCI-13 and Caltech-D BAC libraries. The set achieves an average depth of coverage of 1.8X and an average effective resolution of 76 kb.
Genome-adjacent clones in the set overlap by an average of 73 kb. The set provides coverage of 98% of the human hg17 (May 2004) genome assembly and 98% of the human May 2005 BAC fingerprint map. The clone set is publicly available from BACPAC Resources. An easy way to select clones is through the Cytogenetic Resource of FISH-mapped, Sequence-tagged Clones at NCBI (http://www.ncbi.nlm.nih.gov/genome/cyto/cytobac.cgi?CHR=6&VERBOSE=ctg). All BACs can be located on the UCSC genome browser (http://genome.ucsc.edu) when the BAC end pairs track is selected. In addition, BACs from the fishClones file can be visualized on the chromosomal band pages of the Atlas (http://atlasgeneticsoncology.org/Bands/), which link to their GenBank sequences. More recently, several commercial companies have developed more specific catalogs of FISH clones as oligonucleotide probes (see also the chromosome pages of the Atlas for links). With differently labelled DNA probes (in general used as a mixture), combined green/red signals colocalize in yellow in normal cells. In a chromosome translocation the co-localized signal splits, resulting in separate green and red signals, while the unaffected chromosome keeps a yellow signal. Concerning the SKY techniques, there are some resources such as the SKY/M-FISH & CGH database at the NCBI (which provides a public platform for investigators to share and compare their molecular cytogenetic data, http://www.ncbi.nlm.nih.gov/sky/), with an ICD-O3 nomenclature (International Classification of Diseases - Oncology). There are also other resources, such as SKY Karyotypes and FISH analysis of Epithelial Cancer Cell Lines at Cambridge (http://www.pawefish.path.cam.ac.uk/).
III- Comparative genomic hybridization (CGH) resources
In 1992, Dan Pinkel and colleagues (Kallioniemi A et al., 1992) developed comparative genomic hybridization (CGH), which is independent of the morphological analysis of chromosomes. In its first stage of development, CGH was used on metaphases.
At the end of the 1990s, however, Solinas-Toldo et al. (Solinas-Toldo S et al., 1997) and Pinkel et al. (Pinkel D et al., 1998) proposed a new technique of DNA hybridization on arrays (first spotted with cDNA but rapidly, after 2002, with synthesized oligonucleotides of 50-80 mers). The genomic resolution improved to below 50-100 nucleotides as the density of probes increased, in parallel, from 20K up to 2M. Because the method measures a ratio of copy numbers (often expressed as the log2 of the ratio), it only detects imbalances between a disease sample and a normal sample, and it has been applied to several aspects of genetic imbalances. Numerous arrays have been designed, from pan-genomic arrays to custom designs specific to some abnormalities. For example, the GEO server (Gene Expression Omnibus) has 432 CGH platforms (233 of them human) and 71 SNP platforms (46 of them human). The processing of CGH data is not straightforward: it involves normalization of the raw data, centralization, segmentation into pieces of chromosomes with homogeneous copy number delimited by breakpoints, and finally annotation of the implicated genes. An optimal copy number profile with accurate breakpoints requires normalization (with correction for GC content) and centralization (especially when a large part of the profile is abnormal). This optimization also depends on the nature of the sample (such as clonality or the percentage of tumor cells). It is important to note the impact in clinical routine, for example to define actionable genes (Commo F et al., 2015). Another extension of this approach is the SNP array, which combines probes designed for copy-number measurement with probes specific to a known nucleotide variant ("single nucleotide polymorphism"). Its great advantage is the possibility of measuring ploidy (which cannot be measured by CGH alone, as the measure is a relative value that depends on the percentage of tumor cells).
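The dependence of the log2 ratio on the percentage of tumor cells can be made concrete with a short Python sketch. It assumes a simple mixing model that is not taken from the original text: a diploid reference (2 copies), a sample made of tumor and normal cells in a given fraction, and a signal proportional to the average copy number. The function names are hypothetical, and the median centering shown is only a simple stand-in for the centralization step, which in practice is more elaborate.

```python
import math
import statistics


def expected_log2_ratio(tumor_copies, tumor_fraction, normal_copies=2):
    """Expected log2(sample/reference) for a segment in a tumor/normal mixture.

    Assumption: the reference is diploid and the measured signal is
    proportional to the average copy number over all cells of the sample.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    mixed = tumor_fraction * tumor_copies + (1.0 - tumor_fraction) * normal_copies
    if mixed <= 0:
        raise ValueError("no DNA left in the segment: log2 ratio undefined")
    return math.log2(mixed / normal_copies)


def median_center(log2_ratios):
    """Shift a profile so that its median sits at 0 (a simple centering)."""
    middle = statistics.median(log2_ratios)
    return [value - middle for value in log2_ratios]


if __name__ == "__main__":
    # A one-copy loss and a one-copy gain at decreasing tumor cell fractions.
    for fraction in (1.0, 0.5, 0.2):
        loss = expected_log2_ratio(1, fraction)
        gain = expected_log2_ratio(3, fraction)
        print(f"tumor fraction {fraction:.0%}: loss {loss:+.2f}, gain {gain:+.2f}")
```

Under these assumptions a one-copy loss gives a log2 ratio of -1 in a pure tumor sample but a much weaker signal when half of the cells are normal, which is why the same threshold cannot be applied to all samples.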
Moreover, the segmentation of copy number can be correlated with the segmentation of LOH (loss of heterozygosity), which gives a better interpretation of the origin of abnormalities. Several sites are repositories for these CGH/SNP profiles:
III-1 GEO (http://www.ncbi.nlm.nih.gov/geo/)
GEO (Gene Expression Omnibus) is a public functional genomics data repository supporting MIAME-compliant data submissions. Array and sequence-based data are accepted. Tools are provided to help users query and download experiments and curated gene expression profiles. This database includes curated gene expression DataSets, as well as original series and platform records in the GEO repository. Mainly used for gene expression, GEO has a limited part dedicated to CGH datasets (1,358 experiments for human neoplasms). It is not easy to summarize copy number variation results directly on the site. The best way is to export the data (as GSExxxRAW.tar) and reanalyze them with specific software, such as Bioconductor packages or commercial tools (Clough E and Barrett T, 2016).
III-2 ArrayExpress (http://www.ebi.ac.uk/arrayexpress/)
ArrayExpress is a similar archive of functional genomics data: it stores data from high-throughput functional genomics experiments and provides these data for reuse by the research community (Petryszak R et al., 2016). There are several other sites that present reanalyzed data (public or local) with various analytic approaches and provide facilities for exploring abnormalities in different types of tumors.
III-3 Tumorscape (http://www.broadinstitute.org/tcga/home)
This portal (Broad Institute), created in 2010, is designed to facilitate the use and understanding of high resolution copy number data amassed from multiple cancer types. The 3,131 datasets partly originate from GEO and were reanalyzed with the GISTIC algorithm to identify regions that have been altered above the background rate and therefore may be subject to positive selection.
For each of these regions, one or more "peak regions", most likely to contain the target genes, are identified (Beroukhim R et al., 2010). The following functionalities are supported:
- Gene-level Analysis: One can query the level and significance of copy number alterations affecting any gene listed in Refseq (or miRNAs). Click "Analyses", then "by Gene".
- Analysis by cancer type: One can query the most significant regions of amplification and deletion in individual cancer types. Click "Analyses", then "by Cancer Type".
In Analysis by Gene, these data represent a GISTIC analysis performed on this cancer type. Across a large number of cancers, copy number alterations (amplifications or deletions) can be found almost anywhere in the genome. The evidence that a gene is targeted by these copy number alterations includes: i) presence in a peak region: these peak regions are the regions deemed most likely by GISTIC to contain the gene or genes being targeted by significant amplifications/deletions; ii) a significance (Q-value): this represents the likelihood that the gene only suffers amplifications/deletions at the background rate across the entire genome. The data can be visualized in IGV (Integrative Genomics Viewer).
III-4 MetaCGH (http://compbio.med.harvard.edu/metacgh/)
This website is designed to provide access to array CGH (comparative genomic hybridization) based copy number profiles of 8,227 human cancer genomes (Figure 6). See the description of the database for more information about its composition (Kim TM et al., 2013). An interactive web-based browser facilitates the exploration of the result set:
- Search for specific genes of interest. Alternative gene nomenclatures are supported.
- Browse cytobands by frequency of alteration.
- Visualize alteration frequency over the full set of tumor types for a gene of interest.
Figure 6: PAX3 gain and loss in tumors at MetaCGH (http://compbio.med.harvard.edu/metacghBrowser/).
III-5 CaSNP (http://cistrome.org/CaSNP/)
CaSNP is a comprehensive collection of copy number alterations (CNA) from SNP arrays. It collects 11,485 Affymetrix SNP arrays of 34 different cancer types in 105 studies to profile the genome-wide CNA and SNP in each. This includes all the cancer SNP profiles using Affymetrix SNP arrays (10K to 6.0) with raw data from GEO, with additional arrays from the TCGA consortium and a few individual publications. All CNA data stored in CaSNP are generated from raw data analyzed by the dCHIP-SNP software. Data can be visualized as a table or as a heatmap (Cao Q et al., 2011).
III-6 Cell line project (http://cancer.sanger.ac.uk/cell_lines)
For decades, human immortal cancer cell lines have constituted an accessible, easily usable set of biological models. In order to improve their utility, the Cancer Genome Project has embarked on a systematic characterization of the genetics and genomics of large numbers of cancer cell lines. Prior knowledge of their genetic abnormalities may allow a more informed choice of cancer cell lines in biological experiments and drug testing, and a more informed interpretation of results. Among other information (exome sequencing), the COSMIC Cell Lines Project includes genome-wide copy number analysis and genotyping information obtained with the Affymetrix SNP6 array and analyzed with the PICNIC algorithm. A complete list of cell lines can be found at http://cancer.sanger.ac.uk/cell_lines/cbrowse/all.
III-7 Cancer Cell Line Encyclopedia (http://www.broadinstitute.org/ccle/home)
For several years, the Broad Institute has developed resources for cell line data, especially copy number analysis with Affymetrix SNP6.0 arrays. These last two resources are complementary.
Several other sites presenting global resources from the TCGA or ICGC programs give access to copy number analyses for each disease (e.g. Broad GDAC FireBrowse, cBioPortal (see below), OASIS portal...).
III-8 ArrayMap (http://www.arraymap.org)
ArrayMap is a curated reference database and bioinformatics resource targeting copy number profiles that provides an entry point for meta-analysis and systems level data integration of high-resolution oncogenomic CNA data. The current data reflect 65,042 genomic copy number arrays, in 986 experimental series and on 333 array platforms (Cai H et al., 2015). A main interest of this resource (originating in great part from GEO datasets) is its fine classification with the ICD-O3 nomenclature. It is an elaborate and complete site for querying large amounts of CGH data in cancer. For the majority of the samples, probe level visualization and customized data representation facilitate gene level and genome-wide data review. Results from multi-case selections can be connected to downstream data analysis and visualization tools (linear, circularized or karyotype-like presentations). Numerous tools permit visualization of parts of profiles (selection of chromosomes or genes) and export of data in tabulated files. An API (with a relatively easy syntax) facilitates the automation of analyses. Moreover, a majority of the cards (leukemias or solid tumors) in the Atlas are linked, via ICD-O3 codes, to ArrayMap (Figure 7).
Figure 7: ArrayMap (http://www.arraymap.org/). Selection of 26 samples of T lymphoblastic leukaemia/lymphoma (ICD-O 9837/3) to obtain a "heatmap" of gain and loss for all the samples, showing the variability of CGH profiles.
IV- Mutation databases
The difference between single nucleotide polymorphisms (SNPs), which reflect variability within a population, and mutations acquired in a neoplastic process is crucial.
The determination of variants was previously obtained with SNP arrays, but is nowadays performed by massively parallel sequencing. As a result, a huge number of polymorphisms and mutations in tumors are compared with controls. The landscape of the majority of recurrent mutations is now known and can be used for diagnosis. Even in haematological malignancies, where chromosome rearrangements have been shown to play a major role, it now appears that some mutations at the nucleotide level can be very important in determining treatment in relation to patient outcome (e.g. ASXL1, ATM, BCL6, BRAF, KRAS and NRAS, CBL, CCND3, CDKN2A and CDKN2C, CEBPA, CRLF2, ETV6, FLT3, GATA2, ID3, IDH1, IDH2, IKZF1, JAK1, KIT, MYD88, NOTCH1, NPM1, RUNX1, TP53).
IV-1 COSMIC (http://cancer.sanger.ac.uk/cosmic)
COSMIC is designed to store and display somatic mutation information and related details, and contains information relating to human cancers. In v76 (Feb 2016), there are 3,942,175 mutations in 1,192,776 samples collected from 22,844 papers. The interface has been fully redesigned and offers multiple ways to view mutations, fusions, copy numbers, etc. (Forbes SA et al., 2015).
IV-2 CENSUS (http://cancer.sanger.ac.uk/census/)
The Cancer Gene Census is an ongoing effort to catalogue the genes for which mutations have been causally implicated in cancer. The original census and analysis was published in Nature Reviews Cancer, and supplemental analysis information related to the paper is also available. The census is regularly updated. In particular, Felix Mitelman and his colleagues have continued to provide information on more genes involved in uncommon translocations in leukaemias and lymphomas. Currently, more than 1% of all human genes are known to be mutated in cancer. Of these, roughly 90% show somatic mutations in cancer, 20% bear germline mutations that predispose to cancer and 10% show both somatic and germline mutations (Futreal PA et al., 2004).
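The three Census percentages quoted above overlap, since the genes showing both somatic and germline mutations are counted in each of the first two figures. The small Python sketch below splits them into disjoint classes, assuming the rounded values of the text; the function name is hypothetical.

```python
def partition_census(somatic_pct, germline_pct, both_pct):
    """Split overlapping percentages into disjoint classes (inclusion-exclusion)."""
    if both_pct > min(somatic_pct, germline_pct):
        raise ValueError("the overlap cannot exceed either class")
    return {
        "somatic only": somatic_pct - both_pct,
        "germline only": germline_pct - both_pct,
        "both": both_pct,
        "either": somatic_pct + germline_pct - both_pct,
    }


if __name__ == "__main__":
    # Rounded percentages quoted in the text (Futreal PA et al., 2004).
    print(partition_census(90, 20, 10))
```

With these rounded values, 80% of the Census genes carry somatic mutations only, 10% germline mutations only and 10% both, which accounts for all of the genes.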
IV-3 HGMD (http://www.hgmd.cf.ac.uk/ac/index.php)
The recognition that certain DNA sequences are hypermutable has yielded clues to the endogenous mutational mechanisms involved and has provided insights into the intricacies of the processes of DNA replication and repair (Cooper DN and Krawczak M, 1993). In practical terms, a fuller understanding of the mutational process may prove important in molecular diagnostic medicine by contributing to improvements in the design and efficacy of mutation search procedures and strategies for different genetic disorders. The Human Gene Mutation Database (HGMD) collects known (published) gene lesions responsible for human inherited disease. This database, whilst originally established for the study of mutational mechanisms in human genes (Cooper DN and Krawczak M, 1996), has now acquired a much broader utility in that it embodies an up-to-date and comprehensive reference source for the spectrum of inherited human gene lesions. Thus, HGMD provides information of practical diagnostic importance to i) researchers and diagnosticians in human molecular genetics, ii) physicians interested in a particular inherited condition in a given patient or family, and iii) genetic counselors. Note: HGMD has two types of access: a free public one with limited data and a professional one requiring a license.

IV-4 LOVD (http://www.lovd.nl/3.0/home)
LOVD stands for Leiden Open (source) Variation Database. Its purpose is to provide a flexible tool for the gene-centered collection and display of DNA variations. LOVD 3.0 extends this idea to also provide patient-centered data storage and NGS data storage, even for variants outside of genes. LOVD consists of both the database software and the content of Locus-Specific Mutation Databases (LSDBs) (http://grenada.lumc.nl/LSDBlist/lsdbs), which are curated by laboratories. A general access point gives links to each gene (92,241 entries in all) (Fokkema IF et al., 2011).
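The difference between a gene-centered and a patient-centered view of the same variants can be pictured with a toy data model. The sketch below is purely illustrative: the class, its fields and the sample records are hypothetical and do not reflect the actual schema or software of LOVD.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical record: one observed variant in one individual."""
    individual_id: str
    description: str            # placeholder label, not a real variant name
    gene: Optional[str] = None  # None when the variant lies outside any gene


def group_by(variants, key):
    """Index a list of variants by an arbitrary attribute."""
    index = defaultdict(list)
    for variant in variants:
        index[key(variant)].append(variant)
    return dict(index)


variants = [
    Variant("patient-1", "variant A", gene="TP53"),
    Variant("patient-1", "variant B"),               # intergenic variant
    Variant("patient-2", "variant C", gene="TP53"),
]

by_gene = group_by(variants, key=lambda v: v.gene)               # gene-centered view
by_patient = group_by(variants, key=lambda v: v.individual_id)   # patient-centered view

print("TP53 variants:", len(by_gene["TP53"]))
print("variants outside genes:", len(by_gene[None]))
print("variants in patient-1:", len(by_patient["patient-1"]))
```

In this toy model a purely gene-centered index would have no natural place for the intergenic record, whereas the patient-centered index keeps it alongside the other variants of the same individual.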
IV-5 TCGA cBioPortal (http://www.cbioportal.org/)
The cBioPortal for Cancer Genomics provides visualization, analysis and access for large-scale cancer genomics data sets (126 in April 2016). For each dataset, the portal presents several diagrams for mutations, copy number variations, survival analysis, and so on (Figure 8). It also helps in analysing a predefined list of genes (Deng M et al., 2016).

Figure 8: PAX5 alterations in cancer at cBioPortal (http://www.cbioportal.org/; Select Cancer Study: tick "all"; Enter Gene Set: write "PAX5").

IV-6 ICGC Data Portal (https://dcc.icgc.org/)
The ICGC Data Portal provides tools for visualizing, querying and downloading the data released quarterly by the consortium's member projects. The Pan-Cancer Analysis of Whole Genomes (PCAWG) study is an international collaboration to identify common patterns of mutations in more than 2,800 cancer whole genomes from the International Cancer Genome Consortium. The portal contains descriptions of 36,985,985 mutations in 57,773 genes and 17,867 donors, within 66 projects covering 21 primary sites (Zhang J et al., 2011).

IV-7 OASIS Portal (see above)
The OASIS Portal presents data from 30 datasets (from Acute Myeloid Leukemia to Uterine Corpus Endometrial Carcinosarcoma) with 6,817 mutations, 11,222 CNVs and expression data (8,178 RNA-Seq and 4,889 microarrays).

IV-8 IntOGen (http://www.intogen.org)
IntOGen collects and analyses somatic mutations in thousands of tumor genomes to identify cancer driver genes (Figure 9). As of the end of 2014, IntOGen defined a list of 459 driver genes in 28 cancer types (Gundem G et al., 2010).

Figure 9: PAX5 mutation frequency at IntOGen (http://www.intogen.org/search?gene=PAX5).

IV-9 BioMuta v2 (https://hive.biochemistry.gwu.edu/tools/biomuta/)
BioMuta v2.0 is a curated single-nucleotide variation (SNV) and disease association database in which the variations are mapped to the genome/protein/gene.
Oriented toward cancer, the database has 5,233,790 SNVs for 41 cancer types and gives the position of each mutation and its frequency in each cancer type (Wu TJ et al., 2014).

IV-10 DoCM (http://docm.genome.wustl.edu/)
The Database of Curated Mutations (DoCM) is a curated database of known, disease-causing mutations that provides easily explorable variant lists with direct links to source citations for easy verification. Curating the literature to produce a high-quality set of pathogenic somatic mutations is not straightforward. It requires sifting through the ever-growing body of cancer research literature (6% annual growth rate over the last 10 years), which for 2015 alone means over 156,399 cancer-related articles indexed by PubMed. This volume of literature makes it difficult to identify bona fide somatic mutations with characterized functional or clinical significance in cancer. Once identified, these mutations require significant curation effort to format and standardize them in a consistent way that enables databasing. For example, publications often specify only the amino acid change and the gene name to describe a mutation. DoCM addresses these challenges by acting as an accessible, open-source, and openly licensed somatic mutation repository that also enables community contributions.

IV-11 CIViC (https://civic.genome.wustl.edu/#/home)
The CIViC (Clinical Interpretations of Variants in Cancer) database is based on Evidence items, which reference their parent variants, variant groups, and genes. The various CIViC entities and their attributes can be explored using the menu. Precision medicine refers to the use of prevention and treatment strategies that are tailored to the unique features of each individual and their disease. In the context of cancer, this might involve the identification of specific mutations shown to predict response to a targeted therapy. The biomedical literature describing these associations is large and growing rapidly.
Currently these interpretations exist largely in private or encumbered databases, resulting in extensive repetition of effort. The database is only just starting, with 212 genes (474 variants) analysed from 870 publications.

IV-12 ExAC (http://exac.broadinstitute.org)
ExAC (Exome Aggregation Consortium) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a variety of large-scale sequencing projects, and to make summary data available to the wider scientific community. The data set provided on this website spans 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies. All of the raw data from these projects have been reprocessed through the same pipeline and jointly variant-called to increase consistency across projects. The data are available under the ODC Open Database License (ODbL). The ExAC data may be freely shared and modified, provided that any public use of the database, or of works produced from it, is attributed, that the resulting data sets are kept open, and that any shared or adapted version of the data is offered under the same ODbL license (Minikel EV et al., 2016).

Bibliography
The cancer genome Stratton MR, Campbell PJ, Futreal PA Nature 2009 Apr 9;458(7239):719-24PMID 19360079 The emerging complexity of gene fusions in cancer Mertens F, Johansson B, Fioretos T, Mitelman F Nat Rev Cancer 2015 Jun;15(6):371-81PMID 25998716 Zur Frage der Entstehung maligner Tumoren Boveri T. 
1914 Gustav Fischer A minute Chromosome in Human Chronic Granulocytic Leukemia Nowell PC, Hungerford DA Science 1960 132:1497 Fluorescent labeling of chromosomal DNA: superiority of quinacrine mustard to quinacrine Caspersson T, Zech L, Modest EJ Science 1970 Nov 13;170(3959):762PMID 5479635 Identification of a translocation with quinacrine fluorescence in a patient with acute leukemia Rowley JD Ann Genet 1973 Jun;16(2):109-12PMID 4125056 A cellular oncogene is translocated to the Philadelphia chromosome in chronic myelocytic leukaemia de Klein A, van Kessel AG, Grosveld G, Bartram CR, Hagemeijer A, Bootsma D, Spurr NK, Heisterkamp N, Groffen J, Stephenson JR Nature 1982 Dec 23;300(5894):765-7PMID 6960256 Letter: A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining Rowley JD Nature 1973 Jun 1;243(5405):290-3PMID 4126434 Characteristic chromosomal abnormalities in biopsies and lymphoid-cell lines from patients with Burkitt and non-Burkitt lymphomas Zech L, Haglund U, Nilsson K, Klein G Int J Cancer 1976 Jan 15;17(1):47-56PMID 946170 A new translocation in Burkitt's tumor cells Berger R, Bernheim A, Weh HJ, Flandrin G, Daniel MT, Brouet JC, Colbert N Hum Genet 1979;53(1):111-2PMID 535896 2/8 translocation in a Japanese Burkitt's lymphoma Miyoshi I, Hiraki S, Kimura I, Miyamoto K, Sato J Experientia 1979 Jun 15;35(6):742-3PMID 467575 Variant translocation in Burkitt lymphoma Van Den Berghe H, Gosseye CP, Englebienne V, Cornu G, Sokal G Cancer Genetics and Cytogenetics 1960, 1; 9-14 Chromosomes and causation of human cancer and leukemia. XXVI. Binding studies in acute lymphoblastic leukemia (ALL) Oshimura M, Freeman AI, Sandberg AA PMID 268996 15/17 translocation, a consistent chromosomal change in acute promyelocytic leukaemia Rowley JD, Golomb HM, Dougherty C Lancet 1977 Mar 5;1(8010):549-50PMID 65649 Chromosome abnormalities in poorly differentiated lymphocytic lymphoma Fukuhara S, Rowley JD, 
Variakojis D, Golomb HM Cancer Res 1979 Aug;39(8):3119-28PMID 582296 Nonrandom chromosome changes involving the Ig gene-carrying chromosomes 12 and 6 in pristane-induced mouse plasmacytomas Ohno S, Babonits M, Wiener F, Spira J, Klein G, Potter M Cell 1979 Dec;18(4):1001-7PMID 519762 Alveolar rhabdomyosarcoma: a cytogenetic and correlated cytological and histological study Seidal T, Mark J, Hagmar B, Angervall L Acta Pathol Microbiol Immunol Scand A 1982 Sep;90(5):345-54PMID 7148452 [Translocation of chromosome 22 in Ewing's sarcoma] Aurias A, Rimbaut C, Buffe D, Dubousset J, Mazabraud A C R Seances Acad Sci III 1983;296(23):1105-7PMID 6416623 [Chromosomal translocation (11; 22) in cell lines of Ewing's sarcoma] Turc-Carel C, Philip I, Berger MP, Philip T, Lenoir G C R Seances Acad Sci III 1983;296(23):1101-3PMID 6416622 Cytogenetics of a renal adenocarcinoma in a 2-year-old child de Jong B, Molenaar IM, Leeuw JA, Idenberg VJ, Oosterhuis JW Cancer Genet Cytogenet 1986 Mar 15;21(2):165-9PMID 3004698 6q- and loss of the Y chromosome--two common deviations in malignant human salivary gland tumors Stenman G, Sandros J, Dahlenfors R, Juberg-Ode M, Mark J Cancer Genet Cytogenet 1986 Aug;22(4):283-93PMID 3015376 The mixed salivary gland tumor - a normally benign human neoplasm frequently showing specific chromosomal abnormalities. 
Mark J, Dahlenfors R, Ekedahl C, Stenman G Cancer Genetics and Cytogenetics 1980 2, 231-24 Reciprocal translocation t(3;12)(q27;q13) in lipoma Heim S, Mandahl N, Kristoffersson U, Mitelman F, Rser B, Rydholm A, Willén H Cancer Genet Cytogenet 1986 Dec;23(4):301-4PMID 3779626 Cytogenetic studies of adipose tissue tumors Turc-Carel C, Dal Cin P, Rao U, Karakousis C, Sandberg AA I A benign lipoma with reciprocal translocation t(3;12)(q28;q14)PMID 3779624 Two site-specific deletions and t(1;14) translocation restricted to human T-cell acute leukemias disrupt the 5' part of the tal-1 gene Bernard O, Lecointe N, Jonveaux P, Souyri M, Mauchauffé M, Berger R, Larsen CJ, Mathieu-Mahul D Oncogene 1991 Aug;6(8):1477-88PMID 1886719 In vivo amplification of the PAX3-FKHR and PAX7-FKHR fusion genes in alveolar rhabdomyosarcoma Barr FG, Nauta LE, Davis RJ, Schäfer BW, Nycum LM, Biegel JA Hum Mol Genet 1996 Jan;5(1):15-21PMID 8789435 Deregulation of the platelet-derived growth factor B-chain gene via fusion with collagen gene COL1A1 in dermatofibrosarcoma protuberans and giant-cell fibroblastoma Simon MP, Pedeutour F, Sirvent N, Grosgeorge J, Minoletti F, Coindre JM, Terrier-Lacombe MJ, Mandahl N, Craver RD, Blin N, Sozzi G, Turc-Carel C, O'Brien KP, Kedra D, Fransson I, Guilbaud C, Dumanski JP Nat Genet 1997 Jan;15(1):95-8PMID 8988177 Large deletions at the t(9;22) breakpoint are common and may identify a poor-prognosis subgroup of patients with chronic myeloid leukemia Sinclair PB, Nacheva EP, Leversha M, Telford N, Chang J, Reid A, Bench A, Champion K, Huntly B, Green AR Blood 2000 Feb 1;95(3):738-43PMID 10648381 FUS-CREB3L2/L1-positive sarcomas show a specific gene expression profile with upregulation of CD24 and FOXL1 Möller E, Hornick JL, Magnusson L, Veerla S, Domanski HA, Mertens F Clin Cancer Res 2011 May 1;17(9):2646-56PMID 21536545 Genome profiling of chronic myelomonocytic leukemia: frequent alterations of RAS and RUNX1 genes Gelsi-Boyer V, Trouplin V, Adélaïde J, 
Aceto N, Remy V, Pinson S, Houdayer C, Arnoulet C, Sainty D, Bentires-Alj M, Olschwang S, Vey N, Mozziconacci MJ, Birnbaum D, Chaffanet M BMC Cancer 2008 Oct 16;8:299PMID 18925961 The recurrent SET-NUP214 fusion as a new HOXA activation mechanism in pediatric T-cell acute lymphoblastic leukemia Van Vlierberghe P, van Grotel M, Tchinda J, Lee C, Beverloo HB, van der Spek PJ, Stubbs A, Cools J, Nagata K, Fornerod M, Buijs-Gladdines J, Horstmann M, van Wering ER, Soulier J, Pieters R, Meijerink JP Blood 2008 May 1;111(9):4668-80PMID 18299449 Rearrangement of CRLF2 in B-progenitor- and Down syndrome-associated acute lymphoblastic leukemia Mullighan CG, Collins-Underwood JR, Phillips LA, Loudin MG, Liu W, Zhang J, Ma J, Coustan-Smith E, Harvey RC, Willman CL, Mikhail FM, Meyer J, Carroll AJ, Williams RT, Cheng J, Heerema NA, Basso G, Pession A, Pui CH, Raimondi SC, Hunger SP, Downing JR, Carroll WL, Rabin KR Nat Genet 2009 Nov;41(11):1243-6PMID 19838194 Oncogenic activation of FOXR1 by 11q23 intrachromosomal deletion-fusions in neuroblastoma Santo EE, Ebus ME, Koster J, Schulte JH, Lakeman A, van Sluis P, Vermeulen J, Gisselsson D, Øra I, Lindner S, Buckley PG, Stallings RL, Vandesompele J, Eggert A, Caron HN, Versteeg R, Molenaar JJ Oncogene 2012 Mar 22;31(12):1571-81PMID 21860421 Fusions involving protein kinase C and membrane-associated proteins in benign fibrous histiocytoma Pńaszczyca A, Nilsson J, Magnusson L, Brosjö O, Larsson O, Vult von Steyern F, Domanski HA, Lilljebjörn H, Fioretos T, Tayebwa J, Mandahl N, Nord KH, Mertens F Int J Biochem Cell Biol 2014 Aug;53:475-81PMID 24721208 FLNA, a new partner gene fused to MLL in a patient with acute myelomonoblastic leukaemia De Braekeleer E, Douet-Guilbert N, Morel F, Le Bris MJ, Meyer C, Marschalek R, Férec C, De Braekeleer M Br J Haematol 2009 Sep;146(6):693-5PMID 19622092 The MLL recombinome of acute leukemias in 2013 Meyer C, Hofmann J, Burmeister T, Gröger D, Park TS, Emerenciano M, Pombo de Oliveira M, 
Renneville A, Villarese P, Macintyre E, Cavé H, Clappier E, Mass-Malo K, Zuna J, Trka J, De Braekeleer E, De Braekeleer M, Oh SH, Tsaur G, Fechina L, van der Velden VH, van Dongen JJ, Delabesse E, Binato R, Silva ML, Kustanovich A, Aleinikova O, Harris MH, Lund-Aho T, Juvonen V, Heidenreich O, Vormoor J, Choi WW, Jarosova M, Kolenova A, Bueno C, Menendez P, Wehner S, Eckert C, Talmant P, Tondeur S, Lippert E, Launay E, Henry C, Ballerini P, Lapillone H, Callanan MB, Cayuela JM, Herbaux C, Cazzaniga G, Kakadiya PM, Bohlander S, Ahlmann M, Choi JR, Gameiro P, Lee DS, Krauter J, Cornillet-Lefebvre P, Te Kronnie G, Schäfer BW, Kubetzko S, Alonso CN, zur Stadt U, Sutton R, Venn NC, Izraeli S, Trakhtenbrot L, Madsen HO, Archer P, Hancock J, Cerveira N, Teixeira MR, Lo Nigro L, Möricke A, Stanulla M, Schrappe M, Sedék L, Szczepański T, Zwaan CM, Coenen EA, van den Heuvel-Eibrink MM, Strehl S, Dworzak M, Panzer-Grümayer R, Dingermann T, Klingebiel T, Marschalek R Leukemia 2013 Nov;27(11):2165-76PMID 23628958 The new cytogenetics: blurring the boundaries with molecular biology Speicher MR, Carter NP Nat Rev Genet 2005 Oct;6(10):782-92PMID 16145555 Array comparative genomic hybridization and its applications in cancer Pinkel D, Albertson DG Nat Genet 2005 Jun;37 Suppl:S11-7 Genetic diagnosis in malignant hemopathies: from cytogenetics to next-generation sequencing De Braekeleer E, Douet-Guilbert N, De Braekeleer M Expert Rev Mol Diagn 2014 Mar;14(2):127-9PMID 24437978 Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer Tomlins SA, Rhodes DR, Perner S, Dhanasekaran SM, Mehra R, Sun XW, Varambally S, Cao X, Tchinda J, Kuefer R, Lee C, Montie JE, Shah RB, Pienta KJ, Rubin MA, Chinnaiyan AM Science 2005 Oct 28;310(5748):644-8PMID 16254181 A landscape effect in tenosynovial giant-cell tumor from activation of CSF1 expression by a translocation in a minority of tumor cells West RB, Rubin BP, Miller MA, Subramanian S, Kaygusuz G, Montgomery K, Zhu S, 
Marinelli RJ, De Luca A, Downs-Kelly E, Goldblum JR, Corless CL, Brown PO, Gilks CB, Nielsen TO, Huntsman D, van de Rijn M Proc Natl Acad Sci U S A 2006 Jan 17;103(3):690-5PMID 16407111 Global survey of phosphotyrosine signaling identifies oncogenic kinases in lung cancer Rikova K, Guo A, Zeng Q, Possemato A, Yu J, Haack H, Nardone J, Lee K, Reeves C, Li Y, Hu Y, Tan Z, Stokes M, Sullivan L, Mitchell J, Wetzel R, Macneill J, Ren JM, Yuan J, Bakalarski CE, Villen J, Kornhauser JM, Smith B, Li D, Zhou X, Gygi SP, Gu TL, Polakiewicz RD, Rush J, Comb MJ Cell 2007 Dec 14;131(6):1190-203PMID 18083107 Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer Soda M, Choi YL, Enomoto M, Takada S, Yamashita Y, Ishikawa S, Fujiwara S, Watanabe H, Kurashina K, Hatanaka H, Bando M, Ohno S, Ishikawa Y, Aburatani H, Niki T, Sohara Y, Sugiyama Y, Mano H Nature 2007 Aug 2;448(7153):561-6PMID 17625570 Identification of a novel, recurrent HEY1-NCOA2 fusion in mesenchymal chondrosarcoma based on a genome-wide screen of exon-level expression data Wang L, Motoi T, Khanin R, Olshen A, Mertens F, Bridge J, Dal Cin P, Antonescu CR, Singer S, Hameed M, Bovee JV, Hogendoorn PC, Socci N, Ladanyi M Genes Chromosomes Cancer 2012 Feb;51(2):127-39PMID 22034177 Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing Campbell PJ, Stephens PJ, Pleasance ED, O'Meara S, Li H, Santarius T, Stebbings LA, Leroy C, Edkins S, Hardy C, Teague JW, Menzies A, Goodhead I, Turner DJ, Clee CM, Quail MA, Cox A, Brown C, Durbin R, Hurles ME, Edwards PA, Bignell GR, Stratton MR, Futreal PA Nat Genet 2008 Jun;40(6):722-9PMID 18438408 Transcriptome sequencing to detect gene fusions in cancer Maher CA, Kumar-Sinha C, Cao X, Kalyana-Sundaram S, Han B, Jing X, Sam L, Barrette T, Palanisamy N, Chinnaiyan AM Nature 2009 Mar 5;458(7234):97-101PMID 19136943 Chimeric transcript discovery by paired-end transcriptome sequencing 
Maher CA, Palanisamy N, Brenner JC, Cao X, Kalyana-Sundaram S, Luo S, Khrebtukova I, Barrette TR, Grasso C, Yu J, Lonigro RJ, Schroth G, Kumar-Sinha C, Chinnaiyan AM Proc Natl Acad Sci U S A 2009 Jul 28;106(30):12353-8PMID 19592507 Complex landscapes of somatic rearrangement in human breast cancer genomes Stephens PJ, McBride DJ, Lin ML, Varela I, Pleasance ED, Simpson JT, Stebbings LA, Leroy C, Edkins S, Mudie LJ, Greenman CD, Jia M, Latimer C, Teague JW, Lau KW, Burton J, Quail MA, Swerdlow H, Churcher C, Natrajan R, Sieuwerts AM, Martens JW, Silver DP, Langerød A, Russnes HE, Foekens JA, Reis-Filho JS, van 't Veer L, Richardson AL, Børresen-Dale AL, Campbell PJ, Futreal PA, Stratton MR Nature 2009 Dec 24;462(7276):1005-10PMID 20033038 Comprehensive molecular characterization of clear cell renal cell carcinoma Cancer Genome Atlas Research Network Nature 2013 Jul 4;499(7456):43-9PMID 23792563 Comprehensive genomic characterization of squamous cell lung cancers Cancer Genome Atlas Research Network Nature 2012 Sep 27;489(7417):519-25PMID 22960745 Comprehensive molecular characterization of urothelial bladder carcinoma Cancer Genome Atlas Research Network Nature 2014 Mar 20;507(7492):315-22PMID 24476821 Integrated genomic characterization of endometrial carcinoma Cancer Genome Atlas Research Network, Kandoth C, Schultz N, Cherniack AD, Akbani R, Liu Y, Shen H, Robertson AG, Pashtan I, Shen R, Benz CC, Yau C, Laird PW, Ding L, Zhang W, Mills GB, Kucherlapati R, Mardis ER, Levine DA Nature 2013 May 2;497(7447):67-73PMID 23636398 MHC class II transactivator CIITA is a recurrent gene fusion partner in lymphoid cancers Steidl C, Shah SP, Woolcock BW, Rui L, Kawahara M, Farinha P, Johnson NA, Zhao Y, Telenius A, Neriah SB, McPherson A, Meissner B, Okoye UC, Diepstra A, van den Berg A, Sun M, Leung G, Jones SJ, Connors JM, Huntsman DG, Savage KJ, Rimsza LM, Horsman DE, Staudt LM, Steidl U, Marra MA, Gascoyne RD Nature 2011 Mar 17;471(7338):377-81PMID 21368758 Use of 
whole-genome sequencing to diagnose a cryptic fusion oncogene Welch JS, Westervelt P, Ding L, Larson DE, Klco JM, Kulkarni S, Wallis J, Chen K, Payton JE, Fulton RS, Veizer J, Schmidt H, Vickery TL, Heath S, Watson MA, Tomasson MH, Link DC, Graubert TA, DiPersio JF, Mardis ER, Ley TJ, Wilson RK JAMA 2011 Apr 20;305(15):1577-84PMID 21505136 Genetic alterations activating kinase and cytokine receptor signaling in high-risk acute lymphoblastic leukemia Roberts KG, Morin RD, Zhang J, Hirst M, Zhao Y, Su X, Chen SC, Payne-Turner D, Churchman ML, Harvey RC, Chen X, Kasap C, Yan C, Becksfort J, Finney RP, Teachey DT, Maude SL, Tse K, Moore R, Jones S, Mungall K, Birol I, Edmonson MN, Hu Y, Buetow KE, Chen IM, Carroll WL, Wei L, Ma J, Kleppe M, Levine RL, Garcia-Manero G, Larsen E, Shah NP, Devidas M, Reaman G, Smith M, Paugh SW, Evans WE, Grupp SA, Jeha S, Pui CH, Gerhard DS, Downing JR, Willman CL, Loh M, Hunger SP, Marra MA, Mullighan CG Cancer Cell 2012 Aug 14;22(2):153-66PMID 22897847 The landscape and therapeutic relevance of cancer-associated transcript fusions Yoshihara K, Wang Q, Torres-Garcia W, Zheng S, Vegesna R, Kim H, Verhaak RG Oncogene 2015 Sep 10;34(37):4845-54PMID 25500544 Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer Mitelman F, Johansson B and Mertens F (Eds.) 
2016, http://cgap.nci.nih.gov/Chromosomes/Mitelman Atlas of genetics and cytogenetics in oncology and haematology in 2013 Huret JL, Ahmad M, Arsaban M, Bernheim A, Cigna J, Desangles F, Guignard JC, Jacquemot-Perbal MC, Labarussias M, Leberre V, Malo A, Morel-Pair C, Mossafa H, Potier JC, Texier G, Viguié F, Yau Chun Wan-Senon S, Zasadzinski A, Dessen P Nucleic Acids Res 2013 Jan;41(Database issue):D920-4PMID 23161685 The impact of translocations and gene fusions on cancer causation Mitelman F, Johansson B, Mertens F Nat Rev Cancer 2007 Apr;7(4):233-45PMID 17361217 Gene fusions associated with recurrent amplicons represent a class of passenger aberrations in breast cancer Kalyana-Sundaram S, Shankar S, Deroo S, Iyer MK, Palanisamy N, Chinnaiyan AM, Kumar-Sinha C Neoplasia 2012 Aug;14(8):702-8PMID 22952423 Implications of chimaeric non-co-linear transcripts Gingeras TR Nature 2009 Sep 10;461(7261):206-11PMID 19741701 SLC45A3-ELK4 is a novel and frequent erythroblast transformation-specific fusion transcript in prostate cancer Rickman DS, Pflueger D, Moss B, VanDoren VE, Chen CX, de la Taille A, Kuefer R, Tewari AK, Setlur SR, Demichelis F, Rubin MA Cancer Res 2009 Apr 1;69(7):2734-8PMID 19293179 New insights to the MLL recombinome of acute leukemias Meyer C, Kowarz E, Hofmann J, Renneville A, Zuna J, Trka J, Ben Abdelali R, Macintyre E, De Braekeleer E, De Braekeleer M, Delabesse E, de Oliveira MP, Cavé H, Clappier E, van Dongen JJ, Balgobind BV, van den Heuvel-Eibrink MM, Beverloo HB, Panzer-Grümayer R, Teigler-Schlegel A, Harbott J, Kjeldsen E, Schnittger S, Koehl U, Gruhn B, Heidenreich O, Chan LC, Yip SF, Krzywinski M, Eckert C, Möricke A, Schrappe M, Alonso CN, Schäfer BW, Krauter J, Lee DA, Zur Stadt U, Te Kronnie G, Sutton R, Izraeli S, Trakhtenbrot L, Lo Nigro L, Tsaur G, Fechina L, Szczepanski T, Strehl S, Ilencikova D, Molkentin M, Burmeister T, Dingermann T, Klingebiel T, Marschalek R Leukemia 2009 Aug;23(8):1490-9PMID 19262598 Next-generation sequencing 
of RNA and DNA isolated from paired fresh-frozen and formalin-fixed paraffin-embedded samples of human cancer and normal tissue Hedegaard J, Thorsen K, Lund MK, Hein AM, Hamilton-Dutoit SJ, Vang S, Nordentoft I, Birkenkamp-Demtröder K, Kruhøffer M, Hager H, Knudsen B, Andersen CL, Sørensen KD, Pedersen JS, Ørntoft TF, Dyrskjøt L PLoS One 2014 May 30;9(5):e98187PMID 24878701 The evolving classification of soft tissue tumours - an update based on the new 2013 WHO classification Fletcher CD Histopathology 2014 Jan;64(1):2-11PMID 24164390 The 2016 revision of the World Health Organization (WHO) classification of lymphoid neoplasms Swerdlow SH, Campo E, Pileri SA, Harris NL, Stein H, Siebert R, Advani R, Ghielmini M, Salles GA, Zelenetz AD, Jaffe ES Blood 2016 Mar 15PMID 26980727 Towards individualized follow-up in adult acute myeloid leukemia in remission Hokland P, Ommen HB Blood 2011 Mar 3;117(9):2577-84PMID 21097673 Liquid biopsy: monitoring cancer-genetics in the blood Crowley E, Di Nicolantonio F, Loupakis F, Bardelli A Nat Rev Clin Oncol 2013 Aug;10(8):472-84PMID 23836314 Microfluidic, marker-free isolation of circulating tumor cells from blood samples Karabacak NM, Spuhler PS, Fachin F, Lim EJ, Pai V, Ozkumur E, Martel JM, Kojic N, Smith K, Chen PI, Yang J, Hwang H, Morgan B, Trautwein J, Barber TA, Stott SL, Maheswaran S, Kapur R, Haber DA, Toner M Nat Protoc 2014 Mar;9(3):694-710PMID 24577360 A novel flow cytometry-based cell capture platform for the detection, capture and molecular characterization of rare tumor cells in blood Watanabe M, Serizawa M, Sawada T, Takeda K, Takahashi T, Yamamoto N, Koizumi F, Koh Y J Transl Med 2014 May 23;12:143PMID 24886394 Pharmacogenomic modeling of circulating tumor and invasive cells for prediction of chemotherapy response and resistance in pancreatic cancer Yu KH, Ricigliano M, Hidalgo M, Abou-Alfa GK, Lowery MA, Saltz LB, Crotty JF, Gary K, Cooper B, Lapidus R, Sadowska M, O'Reilly EM Clin Cancer Res 2014 Oct 
15;20(20):5281-9PMID 25107917 Identification of a population of blood circulating tumor cells from breast cancer patients that initiates metastasis in a xenograft assay Baccelli I, Schneeweiss A, Riethdorf S, Stenzinger A, Schillert A, Vogel V, Klein C, Saini M, Bäuerle T, Wallwiener M, Holland-Letz T, Höfner T, Sprick M, Scharpff M, Marmé F, Sinn HP, Pantel K, Weichert W, Trumpp A Nat Biotechnol 2013 Jun;31(6):539-44PMID 23609047 Development of personalized tumor biomarkers using massively parallel sequencing Leary RJ, Kinde I, Diehl F, Schmidt K, Clouser C, Duncan C, Antipova A, Lee C, McKernan K, De La Vega FM, Kinzler KW, Vogelstein B, Diaz LA Jr, Velculescu VE Sci Transl Med 2010 Feb 24;2(20):20ra14PMID 20371490 Activity of a specific inhibitor of the BCR-ABL tyrosine kinase in the blast crisis of chronic myeloid leukemia and acute lymphoblastic leukemia with the Philadelphia chromosome Druker BJ, Sawyers CL, Kantarjian H, Resta DJ, Reese SF, Ford JM, Capdeville R, Talpaz M N Engl J Med 2001 Apr 5;344(14):1038-42PMID 11287973 Efficacy and safety of a specific inhibitor of the BCR-ABL tyrosine kinase in chronic myeloid leukemia Druker BJ, Talpaz M, Resta DJ, Peng B, Buchdunger E, Ford JM, Lydon NB, Kantarjian H, Capdeville R, Ohno-Jones S, Sawyers CL N Engl J Med 2001 Apr 5;344(14):1031-7PMID 11287972 Imatinib mesylate in advanced dermatofibrosarcoma protuberans: pooled analysis of two phase II clinical trials Rutkowski P, Van Glabbeke M, Rankin CJ, Ruka W, Rubin BP, Debiec-Rychter M, Lazar A, Gelderblom H, Sciot R, Lopez-Terrada D, Hohenberger P, van Oosterom AT, Schuetze SM; European Organisation for Research and Treatment of Cancer Soft Tissue/Bone Sarcoma Group; Southwest Oncology Group J Clin Oncol 2010 Apr 1;28(10):1772-9PMID 20194851 Adjuvant treatment of GIST: patient selection and treatment strategies Joensuu H Nat Rev Clin Oncol 2012 Apr 24;9(6):351-8PMID 22525709 Philadelphia chromosome-positive acute lymphoblastic leukemia: current treatment and 
future perspectives Lee HJ, Thompson JE, Wang ES, Wetzler M Cancer 2011 Apr 15;117(8):1583-94PMID 21472706 RET fusion gene: translation to personalized lung cancer therapy Kohno T, Tsuta K, Tsuchihara K, Nakaoku T, Yoh K, Goto K Cancer Sci 2013 Nov;104(11):1396-400PMID 23991695 Tyrosine kinase gene rearrangements in epithelial malignancies Shaw AT, Hsu PP, Awad MM, Engelman JA Nat Rev Cancer 2013 Nov;13(11):772-87PMID 24132104 Targeting the MLL complex in castration-resistant prostate cancer Malik R, Khan AP, Asangani IA, Cielik M, Prensner JR, Wang X, Iyer MK, Jiang X, Borkin D, Escara-Wilke J, Stender R, Wu YM, Niknafs YS, Jing X, Qiao Y, Palanisamy N, Kunju LP, Krishnamurthy PM, Yocum AK, Mellacheruvu D, Nesvizhskii AI, Cao X, Dhanasekaran SM, Feng FY, Grembecka J, Cierpicki T, Chinnaiyan AM Nat Med 2015 Apr;21(4):344-52PMID 25822367 DOT1L inhibits SIRT1-mediated epigenetic silencing to maintain leukemic gene expression in MLL-rearranged leukemia Chen CW, Koche RP, Sinha AU, Deshpande AJ, Zhu N, Eng R, Doench JG, Xu H, Chu SH, Qi J, Wang X, Delaney C, Bernt KM, Root DE, Hahn WC, Bradner JE, Armstrong SA Nat Med 2015 Apr;21(4):335-43PMID 25822366 Inhibition of BET recruitment to chromatin as an effective treatment for MLL-fusion leukaemia Dawson MA, Prinjha RK, Dittmann A, Giotopoulos G, Bantscheff M, Chan WI, Robson SC, Chung CW, Hopf C, Savitski MM, Huthmacher C, Gudgin E, Lugo D, Beinke S, Chapman TD, Roberts EJ, Soden PE, Auger KR, Mirguet O, Doehner K, Delwel R, Burnett AK, Jeffrey P, Drewes G, Lee K, Huntly BJ, Kouzarides T Nature 2011 Oct 2;478(7370):529-33PMID 21964340 EZH2 inhibition as a therapeutic strategy for lymphoma with EZH2-activating mutations McCabe MT, Ott HM, Ganji G, Korenchuk S, Thompson C, Van Aller GS, Liu Y, Graves AP, Della Pietra A 3rd, Diaz E, LaFrance LV, Mellinger M, Duquenne C, Tian X, Kruger RG, McHugh CF, Brandt M, Miller WH, Dhanak D, Verma SK, Tummino PJ, Creasy CL Nature 2012 Dec 6;492(7427):108-12PMID 23051747 EZH2 inhibition 
sensitizes BRG1 and EGFR mutant lung tumours to TopoII inhibitors Fillmore CM, Xu C, Desai PT, Berry JM, Rowbotham SP, Lin YJ, Zhang H, Marquez VE, Hammerman PS, Wong KK, Kim CF Nature 2015 Apr 9;520(7546):239-42PMID 25629630 [Cytogenetics, cytogenomics and cancer: 2004 update] Bernheim A, Huret JL, Guillaud-Bataille M, Brison O, Couturier J; Groupe Français de Cytogénétique Oncologique Bull Cancer 2004 Jan;91(1):29-43PMID 14975803 Genetics and metabolism in Neurospora Beadle GW Physiol Rev 1945 Oct;25:643-63PMID 21004451 The GenBank nucleic acid sequence database Burks C, Fickett JW, Goad WB, Kanehisa M, Lewitter FI, Rindone WP, Swindell CD, Tung CS, Bilofsky HS Comput Appl Biosci 1985 Dec;1(4):225-33PMID 3880345 GenBank Burks C, Cassidy M, Cinkosky MJ, Cumella KE, Gilna P, Hayden JE, Keen GM, Kelley TA, Kelly M, Kristofferson D, et al Nucleic Acids Res 1991 Apr 25;19 Suppl:2221-5PMID 2041806 Recent changes in the GenBank On-line Service Benton D Nucleic Acids Res 1990 Mar 25;18(6):1517-20PMID 2326192 GenBank Benson DA, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW Nucleic Acids Res 2015 Jan;43(Database issue):D30-5PMID 25414350 The European Bioinformatics Institute in 2016: Data growth and integration Cook CE, Bergman MT, Finn RD, Cochrane G, Birney E, Apweiler R Nucleic Acids Res 2016 Jan 4;44(D1):D20-6PMID 26673705 Searching and Navigating UniProt Databases Pundir S, Magrane M, Martin MJ, O'Donovan C; UniProt Consortium Curr Protoc Bioinformatics 2015 Jun 19;50:1PMID 26088053 Genenames.org: the HGNC resources in 2015 Gray KA, Yates B, Seal RL, Wright MW, Bruford EA Nucleic Acids Res PMID 25361968 Database resources of the National Center for Biotechnology Information NCBI Resource Coordinators Nucleic Acids Res 2016 Jan 4;44(D1):D7-19PMID 26615191 Genic insights from integrated human proteomics in GeneCards Fishilevich S, Zimmerman S, Kohn A, Iny Stein T, Olender T, Kolker E, Safran M, Lancet D Database (Oxford) 2016 Apr 5;2016PMID 27048349 The UCSC 
Cancer Genomics Browser: update 2015 Goldman M, Craft B, Swatloski T, Cline M, Morozova O, Diekhans M, Haussler D, Zhu J Nucleic Acids Res 2015 Jan;43(Database issue):D812-7PMID 25392408 Ensembl 2016 Yates A, Akanni W, Amode MR, Barrell D, Billis K, Carvalho-Silva D, Cummins C, Clapham P, Fitzgerald S, Gil L, Girón CG, Gordon L, Hourlier T, Hunt SE, Janacek SH, Johnson N, Juettemann T, Keenan S, Lavidas I, Martin FJ, Maurel T, McLaren W, Murphy DN, Nag R, Nuhn M, Parker A, Patricio M, Pignatelli M, Rahtz M, Riat HS, Sheppard D, Taylor K, Thormann A, Vullo A, Wilder SP, Zadissa A, Birney E, Harrow J, Muffato M, Perry E, Ruffier M, Spudich G, Trevanion SJ, Cunningham F, Aken BL, Zerbino DR, Flicek P Nucleic Acids Res 2016 Jan 4;44(D1):D710-6PMID 26687719 Integrated genomic analyses of ovarian carcinoma Cancer Genome Atlas Research Network Nature 2011 Jun 29;474(7353):609-15PMID 21720365 International Cancer Genome Consortium Data Portal--a one-stop shop for cancer genomics data Zhang J, Baran J, Cros A, Guberman JM, Haider S, Hsu J, Liang Y, Rivkin E, Wang J, Whitty B, Wong-Erasmus M, Yao L, Kasprzyk A Database (Oxford) 2011 Sep 19;2011:bar026PMID 21930502 Making sense of cancer genomic data. Chin L, Hahn WC, Getz G, Meyerson M. Genes Dev. 2011 Mar 15;25(6):534-55. 
doi: 10.1101/gad.2017311.PMID 21406553 Human cancer databases (review) Pavlopoulou A, Spandidos DA, Michalopoulos I Oncol Rep 2015 Jan;33(1):3-18PMID 25369839 Oncogenomic portals for the visualization and analysis of genome-wide cancer data Klonowska K, Czubak K, Wojciechowska M, Handschuh L, Zmienko A, Figlerowicz M, Dams-Kozlowska H, Kozlowski P Oncotarget 2016 Jan 5;7(1):176-92PMID 26484415 Human genotype-phenotype databases: aims, challenges and opportunities Brookes AJ, Robinson PN Nat Rev Genet 2015 Dec;16(12):702-15PMID 26553330 Databases and web tools for cancer genomics study Yang Y, Dong X, Xie B, Ding N, Chen J, Li Y, Zhang Q, Qu H, Fang X Genomics Proteomics Bioinformatics 2015 Feb;13(1):46-50PMID 25707591 Variation Interpretation Predictors: Principles, Types, Performance, and Choice Niroula A, Vihinen M Hum Mutat 2016 Jun;37(6):579-97PMID 26987456 Deciphering ENCODE Diehl AG, Boyle AP Trends Genet 2016 Apr;32(4):238-49PMID 26962025 dbWGFP: a database and web server of human whole-genome single nucleotide variants and their functional predictions Wu J, Wu M, Li L, Liu Z, Zeng W, Jiang R Database (Oxford) 2016 Mar 17;2016PMID 26989155 Somatic mutation in cancer and normal cells Martincorena I, Campbell PJ Science 2015 Sep 25;349(6255):1483-9PMID 26404825 Cancer Cytogenetics: Chromosomal and Molecular Genetic Aberrations of Tumor Cells Sverre Heim and Felix Mitelman 2015, Wiley-Blackwell, New York A database on cytogenetics in haematology and oncology Dorkeld F, Bernheim A, Dessen P, Huret JL Nucleic Acids Res 1999 Jan 1;27(1):353-4PMID 9847226 A proposed standard system of nomenclature of human mitotic chromosomes Lancet 1960 May 14;1(7133):1063-5PMID 13857542 An International System for Human Cytogenetic Nomenclature Shaffer LG, McGowan-Jordan J, Schmid M, editors 2013, Basel: S.
Karger Mitelman database of chromosome aberrations and gene fusions in Cancer Mitelman F, Johansson B, Mertens F ChimerDB--a knowledgebase for fusion sequences Kim N, Kim P, Nam S, Shin S, Lee S Nucleic Acids Res 2006 Jan 1;34(Database issue):D21-4PMID 16381848 ChimerDB 2.0--a knowledgebase for fusion genes updated Kim P, Yoon S, Kim N, Lee S, Ko M, Lee H, Kang H, Kim J, Lee S Nucleic Acids Res PMID 19906715 TICdb: a collection of gene-mapped translocation breakpoints in cancer Novo FJ, de Mendíbil IO, Vizmanos JL BMC Genomics 2007 Jan 26;8:33PMID 17257420 OMIM.org: Online Mendelian Inheritance in Man (OMIM), an online catalog of human genes and genetic disorders Amberger JS, Bocchini CA, Schiettecatte F, Scott AF, Hamosh A Nucleic Acids Res PMID 25428349 COSMIC: exploring the world's knowledge of somatic mutations in human cancer Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, Ding M, Bamford S, Cole C, Ward S, Kok CY, Jia M, De T, Teague JW, Stratton MR, McDermott U, Campbell PJ Nucleic Acids Res 2015 Jan;43(Database issue):D805-11PMID 25355519 ChiTaRS: a database of human, mouse and fruit fly chimeric transcripts and RNA-sequencing data Frenkel-Morgenstern M, Gorohovski A, Lacroix V, Rogers M, Ibanez K, Boullosa C, Andres Leon E, Ben-Hur A, Valencia A Nucleic Acids Res 2013 Jan;41(Database issue):D142-51PMID 23143107 ChiTaRS 2.1--an improved database of the chimeric transcripts and RNA-seq data with novel sense-antisense chimeric RNA transcripts Frenkel-Morgenstern M, Gorohovski A, Vucenovic D, Maestre L, Valencia A Nucleic Acids Res PMID 25414346 A comprehensive transcriptional portrait of human cancer cell lines Klijn C, Durinck S, Stawiski EW, Haverty PM, Jiang Z, Liu H, Degenhardt J, Mayba O, Gnad F, Liu J, Pau G, Reeder J, Cao Y, Mukhyala K, Selvaraj SK, Yu M, Zynda GJ, Brauer MJ, Wu TD, Gentleman RC, Manning G, Yauch RL, Bourgon R, Stokoe D, Modrusan Z, Neve RM, de Sauvage FJ, Settleman J, Seshagiri S, Zhang Z Nat Biotechnol 2015
Mar;33(3):306-12PMID 25485619 FusionCancer: a database of cancer fusion genes derived from RNA-seq data Wang Y, Wu N, Liu J, Wu Z, Dong D Diagn Pathol 2015 Jul 28;10:131PMID 26215638 Fusion gene microarray reveals cancer type-specificity among fusion genes Løvf M, Thomassen GO, Bakken AC, Celestino R, Fioretos T, Lind GE, Lothe RA, Skotheim RI Genes Chromosomes Cancer 2011 May;50(5):348-57PMID 21305644 A universal assay for detection of oncogenic fusion transcripts by oligo microarray analysis Skotheim RI, Thomassen GO, Eken M, Lind GE, Micci F, Ribeiro FR, Cerveira N, Teixeira MR, Heim S, Rognes T, Lothe RA Mol Cancer 2009 Jan 19;8:5PMID 19152679 Next generation sequencing approach for detecting 491 fusion genes from human cancer Urakami K, Shimoda Y, Ohshima K, Nagashima T, Serizawa M, Tanabe T, Saito J, Usui T, Watanabe Y, Naruoka A, Ohnami S, Ohnami S, Mochizuki T, Kusuhara M, Yamaguchi K Biomed Res 2016;37(1):51-62PMID 26912140 Recurrent chimeric fusion RNAs in non-cancer tissues and cells Babiceanu M, Qin F, Xie Z, Jia Y, Lopez K, Janus N, Facemire L, Kumar S, Pang Y, Qi Y, Lazar IM, Li H Nucleic Acids Res 2016 Apr 7;44(6):2859-72PMID 26837576 Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, Pinkel D Science 1992 Oct 30;258(5083):818-21PMID 1359641 Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances Solinas-Toldo S, Lampel S, Stilgenbauer S, Nickolenko J, Benner A, Döhner H, Cremer T, Lichter P Genes Chromosomes Cancer 1997 Dec;20(4):399-407PMID 9408757 High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C, Kuo WL, Chen C, Zhai Y, Dairkee SH, Ljung BM, Gray JW, Albertson DG Nat Genet 1998 Oct;20(2):207-11PMID 9771718 Impact of centralization on aCGH-based genomic profiles for precision medicine 
in oncology Commo F, Ferté C, Soria JC, Friend SH, André F, Guinney J Ann Oncol 2015 Mar;26(3):582-8PMID 25538175 The Gene Expression Omnibus Database Clough E, Barrett T Methods Mol Biol 2016;1418:93-110PMID 27008011 Expression Atlas update--an integrated database of gene and protein expression in humans, animals and plants Petryszak R, Keays M, Tang YA, Fonseca NA, Barrera E, Burdett T, Füllgrabe A, Fuentes AM, Jupp S, Koskinen S, Mannion O, Huerta L, Megy K, Snow C, Williams E, Barzine M, Hastings E, Weisser H, Wright J, Jaiswal P, Huber W, Choudhary J, Parkinson HE, Brazma A Nucleic Acids Res 2016 Jan 4;44(D1):D746-52PMID 26481351 The landscape of somatic copy-number alteration across human cancers Beroukhim R, Mermel CH, Porter D, Wei G, Raychaudhuri S, Donovan J, Barretina J, Boehm JS, Dobson J, Urashima M, Mc Henry KT, Pinchback RM, Ligon AH, Cho YJ, Haery L, Greulich H, Reich M, Winckler W, Lawrence MS, Weir BA, Tanaka KE, Chiang DY, Bass AJ, Loo A, Hoffman C, Prensner J, Liefeld T, Gao Q, Yecies D, Signoretti S, Maher E, Kaye FJ, Sasaki H, Tepper JE, Fletcher JA, Tabernero J, Baselga J, Tsao MS, Demichelis F, Rubin MA, Janne PA, Daly MJ, Nucera C, Levine RL, Ebert BL, Gabriel S, Rustgi AK, Antonescu CR, Ladanyi M, Letai A, Garraway LA, Loda M, Beer DG, True LD, Okamoto A, Pomeroy SL, Singer S, Golub TR, Lander ES, Getz G, Sellers WR, Meyerson M Nature 2010 Feb 18;463(7283):899-905PMID 20164920 Functional genomic analysis of chromosomal aberrations in a compendium of 8000 cancer genomes Kim TM, Xi R, Luquette LJ, Park RW, Johnson MD, Park PJ Genome Res 2013 Feb;23(2):217-27PMID 23132910 CaSNP: a database for interrogating copy number alterations of cancer genome from SNP array data Cao Q, Zhou M, Wang X, Meyer CA, Zhang Y, Chen Z, Li C, Liu XS Nucleic Acids Res 2011 Jan;39(Database issue):D968-74PMID 20972221 arrayMap 2014: an updated cancer genome resource Cai H, Gupta S, Rath P, Ai N, Baudis M Nucleic Acids Res 2015 Jan;43(Database issue):D825-30PMID 
25428357 Detection of large-scale variation in the human genome Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C Nat Genet 2004 Sep;36(9):949-51PMID 15286789 Global variation in copy number in the human genome Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, González JR, Gratacòs M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW, Hurles ME Nature 2006 Nov 23;444(7118):444-54PMID 17122850 The Database of Genomic Variants: a curated collection of structural variation in the human genome MacDonald JR, Ziman R, Yuen RK, Feuk L, Scherer SW Nucleic Acids Res 2014 Jan;42(Database issue):D986-92PMID 24174537 DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources Firth HV, Richards SM, Bevan AP, Clayton S, Corpas M, Rajan D, Van Vooren S, Moreau Y, Pettett RM, Carter NP Am J Hum Genet 2009 Apr;84(4):524-33 A global reference for human genetic variation 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR Nature 2015 Oct 1;526(7571):68-74PMID 26432245 Integrating common and rare genetic variation in diverse human populations International HapMap 3 Consortium, Altshuler DM, Gibbs RA, Peltonen L, Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, Peltonen L, Dermitzakis E, Bonnen PE, Altshuler DM, Gibbs RA, de Bakker PI, Deloukas P, Gabriel SB, Gwilliam R, Hunt S, Inouye M, Jia X, Palotie A, Parkin M, Whittaker P, Yu F, Chang K, Hawes A, Lewis LR, Ren Y, Wheeler D, Gibbs RA, Muzny DM, Barnes C, Darvishi K, Hurles M, Korn JM, Kristiansson K, Lee C, McCarrol SA, 
Nemesh J, Dermitzakis E, Keinan A, Montgomery SB, Pollack S, Price AL, Soranzo N, Bonnen PE, Gibbs RA, Gonzaga-Jauregui C, Keinan A, Price AL, Yu F, Anttila V, Brodeur W, Daly MJ, Leslie S, McVean G, Moutsianas L, Nguyen H, Schaffner SF, Zhang Q, Ghori MJ, McGinnis R, McLaren W, Pollack S, Price AL, Schaffner SF, Takeuchi F, Grossman SR, Shlyakhter I, Hostetter EB, Sabeti PC, Adebamowo CA, Foster MW, Gordon DR, Licinio J, Manca MC, Marshall PA, Matsuda I, Ngare D, Wang VO, Reddy D, Rotimi CN, Royal CD, Sharp RR, Zeng C, Brooks LD, McEwen JE Nature 2010 Sep 2;467(7311):52-8PMID 20811451 Evolution and functional impact of rare coding variation from deep sequencing of human exomes Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, McGee S, Do R, Liu X, Jun G, Kang HM, Jordan D, Leal SM, Gabriel S, Rieder MJ, Abecasis G, Altshuler D, Nickerson DA, Boerwinkle E, Sunyaev S, Bustamante CD, Bamshad MJ, Akey JM; Broad GO; Seattle GO; NHLBI Exome Sequencing Project Science 2012 Jul 6;337(6090):64-9PMID 22604720 A census of human cancer genes Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton MR Nat Rev Cancer 2004 Mar;4(3):177-83PMID 14993899 Human Gene Mutation Database Cooper DN, Krawczak M Hum Genet 1996 Nov;98(5):629PMID 8882888 LOVD v. 2.0: the next generation in gene variant databases. Fokkema IF, Taschner PE, Schaafsma GC, Celli J, Laros JF, den Dunnen JT Hum Mutat. 
May;32(5):557-63PMID 21520333 Web-TCGA: an online platform for integrated analysis of molecular cancer data sets Deng M, Brägelmann J, Schultze JL, Perner S BMC Bioinformatics 2016 Feb 6;17:72PMID 26852330 IntOGen: integration and data mining of multidimensional oncogenomic data Gundem G, Perez-Llamas C, Jene-Sanz A, Kedzierska A, Islam A, Deu-Pons J, Furney SJ, Lopez-Bigas N Nat Methods 2010 Feb;7(2):92-3PMID 20111033 A framework for organizing cancer-related variations from existing databases, publications and NGS data using a High-performance Integrated Virtual Environment (HIVE) Wu TJ, Shamsaddini A, Pan Y, Smith K, Crichton DJ, Simonyan V, Mazumder R Database (Oxford) 2014 Mar 25;2014:bau022PMID 24667251 Quantifying prion disease penetrance using large population control cohorts Minikel EV, Vallabh SM, Lek M, Estrada K, Samocha KE, Sathirapongsasuti JF, McLean CY, Tung JY, Yu LP, Gambetti P, Blevins J, Zhang S, Cohen Y, Chen W, Yamada M, Hamaguchi T, Sanjo N, Mizusawa H, Nakamura Y, Kitamoto T, Collins SJ, Boyd A, Will RG, Knight R, Ponto C, Zerr I, Kraus TF, Eigenbrod S, Giese A, Calero M, de Pedro-Cuesta J, Haïk S, Laplanche JL, Bouaziz-Amar E, Brandel JP, Capellari S, Parchi P, Poleggi A, Ladogana A, O'Donnell-Luria AH, Karczewski KJ, Marshall JL, Boehnke M, Laakso M, Mohlke KL, Kähler A, Chambert K, McCarroll S, Sullivan PF, Hultman CM, Purcell SM, Sklar P, van der Lee SJ, Rozemuller A, Jansen C, Hofman A, Kraaij R, van Rooij JG, Ikram MA, Uitterlinden AG, van Duijn CM; Exome Aggregation Consortium (ExAC), Daly MJ, MacArthur DG Sci Transl Med 2016 Jan 20;8(322):322ra9PMID 26791950 Birth Defects Cytogenet Cell Genet. 
1974;13(3):1-216 Using ClinVar as a Resource to Support Variant Interpretation Harrison SM, Riggs ER, Maglott DR, Lee JM, Azzariti DR, Niehaus A, Ramos EM, Martin CL, Landrum MJ, Rehm HL Curr Protoc Hum Genet 2016 Apr 1;89:8PMID 27037489 Identification and analysis of deleterious human SNPs Yue P, Moult J J Mol Biol 2006 Mar 10;356(5):1263-74PMID 16412461 The NIH genetic testing registry: a new, centralized database of genetic tests to enable access to comprehensive information and improve transparency Rubinstein WS, Maglott DR, Lee JM, Kattman BL, Malheiro AJ, Ovetsky M, Hem V, Gorelenkov V, Song G, Wallin C, Husain N, Chitipiralla S, Katz KS, Hoffman D, Jang W, Johnson M, Karmanov F, Ukrainchik A, Denisenko M, Fomous C, Hudson K, Ostell JM Nucleic Acids Res 2013 Jan;41(Database issue):D925-35PMID 23193275 Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation O'Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, Astashyn A, Badretdin A, Bao Y, Blinkova O, Brover V, Chetvernin V, Choi J, Cox E, Ermolaeva O, Farrell CM, Goldfarb T, Gupta T, Haft D, Hatcher E, Hlavina W, Joardar VS, Kodali VK, Li W, Maglott D, Masterson P, McGarvey KM, Murphy MR, O'Neill K, Pujar S, Rangwala SH, Rausch D, Riddick LD, Schoch C, Shkeda A, Storz SS, Sun H, Thibaud-Nissen F, Tolstoy I, Tully RE, Vatsan AR, Wallin C, Webb D, Wu W, Landrum MJ, Kimchi A, Tatusova T, DiCuccio M, Kitts P, Murphy TD, Pruitt KD Nucleic Acids Res 2016 Jan 4;44(D1):D733-45PMID 26553804 The UCSC Genome Browser database: 2015 update Rosenbloom KR, Armstrong J, Barber GP, Casper J, Clawson H, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, Harte RA, Heitner S, Hickey G, Hinrichs AS, Hubley R, Karolchik D, Learned K, Lee BT, Li CH, Miga KH, Nguyen N, Paten B, Raney BJ, Smit AF, Speir ML, Zweig AS, Haussler D, Kuhn RM, Kent WJ Nucleic Acids Res 2015 Jan;43(Database issue):D670-81PMID 
25428374 BioGPS: building your own mash-up of gene annotations and expression profiles Wu C, Jin X, Tsueng G, Afrasiabi C, Su AI Nucleic Acids Res 2016 Jan 4;44(D1):D313-6PMID 26578587 UniProt: a hub for protein information UniProt Consortium Nucleic Acids Res 2015 Jan;43(Database issue):D204-12PMID 25348405 The neXtProt knowledgebase on human proteins: current status Gaudet P, Michel PA, Zahn-Zabal M, Cusin I, Duek PD, Evalet O, Gateau A, Gleizes A, Pereira M, Teixeira D, Zhang Y, Lane L, Bairoch A Nucleic Acids Res 2015 Jan;43(Database issue):D764-70PMID 25593349 PhosphoSitePlus, 2014: mutations, PTMs and recalibrations Hornbeck PV, Zhang B, Murray B, Kornhauser JM, Latham V, Skrzypek E Nucleic Acids Res 2015 Jan;43(Database issue):D512-20PMID 25514926 New and continuing developments at PROSITE Sigrist CJ, de Castro E, Cerutti L, Cuche BA, Hulo N, Bridge A, Bougueleret L, Xenarios I Nucleic Acids Res 2013 Jan;41(Database issue):D344-7PMID 23161676 The Pfam protein families database: towards a more sustainable future Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, Salazar GA, Tate J, Bateman A Nucleic Acids Res 2016 Jan 4;44(D1):D279-85PMID 26673716 The InterPro protein families database: the classification resource after 15 years Mitchell A, Chang HY, Daugherty L, Fraser M, Hunter S, Lopez R, McAnulla C, McMenamin C, Nuka G, Pesseat S, Sangrador-Vegas A, Scheremetjew M, Rato C, Yong SY, Bateman A, Punta M, Attwood TK, Sigrist CJ, Redaschi N, Rivoire C, Xenarios I, Kahn D, Guyot D, Bork P, Letunic I, Gough J, Oates M, Haft D, Huang H, Natale DA, Wu CH, Orengo C, Sillitoe I, Mi H, Thomas PD, Finn RD Nucleic Acids Res 2015 Jan;43(Database issue):D213-21PMID 25428371 GeneTests: an online genetic information resource for health care providers Pagon RA J Med Libr Assoc 2006 Jul;94(3):343-8PMID 16888670 Representation of rare diseases in health information systems: the Orphanet approach to serve a wide 
range of end users Rath A, Olry A, Dhombres F, Brandt MM, Urbero B, Ayme S Hum Mutat 2012 May;33(5):803-8PMID 22422702 Activation of proto-oncogenes by disruption of chromosome neighborhoods Hnisz D, Weintraub AS, Day DS, Valton AL, Bak RO, Li CH, Goldmann J, Lajoie BR, Fan ZP, Sigova AA, Reddy J, Borges-Rivera D, Lee TI, Jaenisch R, Porteus MH, Dekker J, Young RA Science 2016 Mar 25;351(6280):1454-8PMID 26940867 Discovery of unfixed endogenous retrovirus insertions in diverse human populations Wildschutte JH, Williams ZH, Montesion M, Subramanian RP, Kidd JM, Coffin JM Proc Natl Acad Sci U S A 2016 Apr 19;113(16):E2326-34PMID 27001843 All the World's a Stage: Facilitating Discovery Science and Improved Cancer Care through the Global Alliance for Genomics and Health Lawler M, Siu LL, Rehm HL, Chanock SJ, Alterovitz G, Burn J, Calvo F, Lacombe D, Teh BT, North KN, Sawyers CL; Clinical Working Group of the Global Alliance for Genomics and Health (GA4GH) Cancer Discov 2015 Nov;5(11):1133-6PMID 26526696

Written 2016-04
Etienne De Braekeleer, Jean Loup Huret, Hossain Mossafa, Katriina Hautaviita, Philippe Dessen
Cancer Genetics & Stem Cell Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, United Kingdom; Medical Genetics, Dept Medical Information, University Hospital, F-86021 Poitiers, France; Laboratoire CERBA, 95310 Saint Ouen l'Aumone, France; (Mouse genomics, Wellcome Trust Sanger Institute); UMR 1170 INSERM, Gustave Roussy, 114 rue Edouard Vaillant, F-94805 Villejuif, France.

Citation
This paper should be referenced as such: Etienne De Braekeleer, Jean Loup Huret, Hossain Mossafa, Katriina Hautaviita, Philippe Dessen. General resources in Genetics and/or Oncology. Atlas Genet Cytogenet Oncol Haematol. ;(6):359-379.
On line version : http://AtlasGeneticsOncology.org/Deep/Cancer_CytogenomicsID20145.htm
Figure 4: PAX5/JAK2 gene fusion at COSMIC

I-4 ChimerDB 2.0 (http://biome.ewha.ac.kr:8080/FusionGene/)
ChimerDB 2.0 is a database of hybrid genes, updated in 2010, with PubMed references and various information about the structure of chimeric genes (Kim N et al., 2006a; Kim P et al., 2006b).

I-5 TICdb (http://www.unav.es/genetica/TICdb/)
TICdb (v 3.3, August 2013) is a database of Translocation breakpoints In Cancer (Novo FJ et al., 2007). This update contains 1,313 sequences of hybrid genes found in human tumors, involving 420 different genes (Figure 5). For every fusion, TICdb returns the HGNC names of both partner genes and the original reference (either a GenBank or a PubMed ID), as well as the fusion sequence at the nucleotide level. A complete list of genes and fusion sequences can be obtained at http://www.unav.es/genetica/allseqsTICdb.txt.
Figure 5: PAX5 hybrid genes at TICdb (http://www.unav.es/genetica/TICdb/results.php?hgnc=PAX5&x=23&y=8)

I-6 ChiTaRS (http://chitars.bioinfo.cnio.es/)
ChiTaRS is a database of chimeric transcripts identified by the analysis of ESTs or RNA sequencing data, part of which have been experimentally validated. This database, which includes 20,750 chimeric human transcripts, has been developed within the ENCODE project (Frenkel-Morgenstern M et al., 2013; Frenkel-Morgenstern M et al., 2015).

I-7 TCGA Fusion gene Data Portal (http://54.84.12.177/PanCanFusV2/)
The TCGA Fusion gene Data Portal presents the analysis of 20 tumor types from the TCGA program, as of December 2014, with 10,431 fusions found in 2,961 tumors (a mean of 3.5 fusions per fusion-bearing sample) (Yoshihara K et al., 2015). These results were produced by a dedicated pipeline for RNA-Seq data analysis (PRADA) developed at the MD Anderson Cancer Center.

I-8 FusionCancer (http://donglab.ecnu.edu.cn/databases/FusionCancer/)
This database of hybrid genes in human cancers (Wang Y et al., 2015) originates from the analysis of RNA-seq data from the Sequence Read Archive (SRA) at NCBI for 15 cancer types. It contains 11,839 fusions, with structured information on cancer types, breakpoints, SRA accession numbers and chimeric sequences.

I-9 OMIM (http://www.omim.org/; see General resources in Genetics and/or Oncology)
The "Online Mendelian Inheritance in Man" (OMIM) catalog contains 1,523 entries mentioning "fusion gene" (Amberger JS et al., 2015).

I-10 Other resources
Books: "Cancer Cytogenetics: Chromosomal and Molecular Genetic Aberrations of Tumor Cells", by Sverre Heim and Felix Mitelman, is published by Wiley-Blackwell. This major textbook is in its fourth edition (2015) and contains 648 pages.
Some useful iconography of chromosome rearrangements from the UWCS laboratory, University of Wisconsin, can be found at http://www.slh.wisc.edu/clinical/cytogenetics/cancer/.
An analysis of hybrid genes in 675 tumor cell lines has been performed by Genentech (Klijn C et al., 2015). Of the 2,200 gene fusions catalogued, 1,435 consist of genes not previously found in fusions. A synthesis of cell line analyses can be found in the Atlas at http://atlasgeneticsoncology.org/celllines.html.
Finally, since the Mitelman database and the Atlas are complementary, both of these indispensable databases should be used.

II- Data for SKY and FISH
The fluorescence in situ hybridization (FISH) technique identifies chromosomal structures using specific probes. It significantly improves the localisation of breakpoints on chromosomes through direct visualization of the hybridized probes, using one or several colors associated with the probes. A major advantage of FISH is that it can also be used on non-dividing cells (interphase nuclei). BAC clones are used for mapping studies as they contain large inserts of human DNA and can be fluorescently labeled to determine the localization of genes and identify regions implicated in cancer chromosomal aberrations. The Cancer Chromosome Aberration Project (CCAP) has created a set of BAC clones mapped cytogenetically by FISH and physically by STSs to the human genome. The BAC data are integrated into various CGAP and NCBI databases to provide related clinical, histopathologic, genetic, and genomic information (http://cgap.nci.nih.gov/Chromosomes/CCAPBACClones), and more precisely for each chromosome (e.g. http://cgap.nci.nih.gov/Chromosomes/BACCloneMap?CHR=6). The Human BAC Array (http://mkweb.bcgsc.ca/bacarray/) is built using 32,855 clones from the RPCI-11, RPCI-13 and Caltech-D BAC libraries. The set achieves an average depth of coverage of 1.8X and an average effective resolution of 76 kb. Genome-adjacent clones in the set overlap by an average of 73 kb.
The set provides coverage of 98% of the human hg17 (May 2004) genome assembly and 98% of the human May 2005 BAC fingerprint map. The clone set is publicly available from BACPAC Resources. An easy way to select clones is through the Cytogenetic Resource of FISH-mapped, Sequence-tagged Clones at NCBI (http://www.ncbi.nlm.nih.gov/genome/cyto/cytobac.cgi?CHR=6&VERBOSE=ctg). All BACs can be located on the UCSC genome browser (http://genome.ucsc.edu) when the BAC end pairs track is selected. In addition, BACs from the fishClones file can be visualized on the chromosomal bands in the Atlas (http://atlasgeneticsoncology.org/Bands/), which links to their GenBank sequences. More recently, several commercial companies have developed more specific catalogs of FISH probes based on oligonucleotides (see also the chromosome pages of the Atlas for links). With differently labelled DNA probes (in general as a mixture), combined green/red signals colocalize in yellow in normal cells. In a chromosome translocation the co-localized signal splits, resulting in separate green and red signals, while the unaffected chromosome retains a yellow signal.
Concerning SKY techniques, resources include the SKY/M-FISH & CGH database at the NCBI (http://www.ncbi.nlm.nih.gov/sky/), which provides a public platform for investigators to share and compare their molecular cytogenetic data and uses the ICD-O-3 nomenclature (International Classification of Diseases for Oncology). Other resources include the SKY Karyotypes and FISH analysis of Epithelial Cancer Cell Lines at Cambridge (http://www.pawefish.path.cam.ac.uk/).

III- Comparative genomic hybridization (CGH) resources
In 1992, Dan Pinkel and colleagues (Kallioniemi A et al., 1992) developed comparative genomic hybridization (CGH), a technique that does not depend on the morphological analysis of chromosomes. In its first stage of development, CGH was performed on metaphases.
At the end of the 1990s, however, Solinas-Toldo et al. (Solinas-Toldo S et al., 1997) and Pinkel et al. (Pinkel D et al., 1998) proposed a new technique of DNA hybridization on arrays, first spotted with cDNA but rapidly, after 2002, with synthesized oligonucleotides (50-80 mers). The genomic resolution improved to below 50-100 nucleotides as the density of probes increased in parallel from 20K up to 2M. Because the method measures a ratio of copy numbers (often expressed as the log2 of the ratio), it only detects imbalances between a disease sample and a normal sample, and it has been applied to several aspects of genetic imbalance. Numerous arrays have been designed, from pan-genomic arrays to custom designs specific to particular abnormalities. For example, the GEO server (Gene Expression Omnibus) lists 432 CGH platforms (233 of them human) and 71 SNP platforms (46 of them human). The processing of CGH data is not straightforward: it involves normalization of the raw data, centralization, segmentation into chromosome pieces of homogeneous copy number delimited by breakpoints, and finally annotation of the genes involved. An optimal copy number profile with accurate breakpoints requires normalization (with correction for GC content) and centralization (especially when a large part of the profile is abnormal). This optimization also depends on the nature of the sample (such as its clonality or the percentage of tumor cells). This has an important impact in clinical routine, for example when defining actionable genes (Commo F et al., 2015).
Another extension of this approach is the SNP array, which combines probes designed for copy number measurement with probes specific to known nucleotide variants ("single nucleotide polymorphisms"). The great advantage is the possibility of measuring ploidy (which cannot be measured by CGH alone, as the measurement is a relative value that depends on the percentage of tumor cells).
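The relative nature of the CGH measurement can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the processing used by any of the tools cited here: the function names are hypothetical, the reference is assumed to be diploid, centralization is done with a plain median shift (a simple choice that is known to be fragile when a large part of the profile is abnormal), and the dilution formula assumes a clonal alteration mixed with normal cells.

```python
import math
from statistics import median


def log2_ratios(tumor, normal):
    """Per-probe log2(tumor / normal) from paired intensity values."""
    return [math.log2(t / n) for t, n in zip(tumor, normal)]


def centralize(ratios):
    """Shift the profile so that its median sits at 0.

    Assumption: the median probe reflects the balanced (2-copy) state.
    This fails when most of the genome is altered.
    """
    center = median(ratios)
    return [r - center for r in ratios]


def expected_log2_ratio(copy_number, tumor_fraction, normal_copies=2):
    """Expected log2 ratio of a clonal segment diluted by normal cells.

    Assumption: the sample is a mix of tumor cells carrying `copy_number`
    copies and normal cells carrying `normal_copies` copies.
    """
    mixed = tumor_fraction * copy_number + (1.0 - tumor_fraction) * normal_copies
    return math.log2(mixed / normal_copies)


if __name__ == "__main__":
    # The same one-copy loss gives a weaker signal as tumor content drops.
    for fraction in (1.0, 0.5, 0.3):
        print(fraction, round(expected_log2_ratio(1, fraction), 3))
```

The last loop shows why the percentage of tumor cells matters: the same single-copy loss produces a log2 ratio of -1 in a pure tumor sample but a much shallower value in a diluted one, so a log2 ratio alone cannot be converted into an absolute copy number.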
With SNP arrays, the segmentation of copy number can also be correlated with the segmentation of LOH (loss of heterozygosity), which allows a better interpretation of the origin of abnormalities. Several sites are repositories for these CGH/SNP profiles:

III-1 GEO (http://www.ncbi.nlm.nih.gov/geo/)
GEO (Gene Expression Omnibus) is a public functional genomics data repository supporting MIAME-compliant data submissions. Array and sequence-based data are accepted. Tools are provided to help users query and download experiments and curated gene expression profiles. This database includes curated gene expression DataSets, as well as original series and platform records in the GEO repository. Mainly used for gene expression, GEO has a limited part dedicated to CGH datasets (1,358 experiments for human neoplasms). It is not easy to summarize copy number variation results directly on the site. The best way is to export the data (as GSExxxRAW.tar) and reanalyze them with dedicated software, such as Bioconductor packages or commercial tools (Clough E and Barrett T, 2016).

III-2 ArrayExpress (http://www.ebi.ac.uk/arrayexpress/)
ArrayExpress is a similar archive of functional genomics data: it stores data from high-throughput functional genomics experiments and provides them to the research community for reuse (Petryszak R et al., 2016).
There are several other sites that present reanalyzed data (public or local) with various analytic approaches and provide facilities for exploring abnormalities in different types of tumors.

III-3 Tumorscape (http://www.broadinstitute.org/tcga/home)
This portal (Broad Institute), created in 2010, is designed to facilitate the use and understanding of high-resolution copy number data amassed from multiple cancer types. The 3,131 datasets originate partly from GEO and were reanalyzed with the GISTIC algorithm to identify regions that are altered above the background rate and therefore may be subject to positive selection.
For each of these regions, one or more "peak regions", most likely to contain the target genes, are identified (Beroukhim R et al., 2010). The following functionalities are supported: - Gene-level Analysis: One can query the level and significance of copy number alterations affecting any gene listed in Refseq (or miRNAs). Click "Analyses", then "by Gene". - Analysis by cancer type: One can query the most significant regions of amplification and deletion in individual cancer types. Click "Analyses", then "by Cancer Type". In Analysis by Gene, these data represent a GISTIC analysis performed on this cancer type. Across a large number of cancers, copy number alterations (amplifications or deletions) can be found almost anywhere in the genome. GISTIC identifies regions that are altered above the background rate and therefore may be subject to positive selection. For each of these regions, one or or more "peak regions", most likely to contain the target genes, are identified. The evidence that a gene is targeted by these copy number alterations includes: i) Presence in a peak region: these peak regions are the regions deemed most likely by GISTIC to contain the gene or genes being targeted by significant amplifications/deletions; ii) a significance (Q-value): this represents the likelihood that the gene only suffers amplifications/deletions at the background rate across the entire genome. The data can be visualized on the IGV (Integrated Genome Viewer). III-4 MetaCGH (http://compbio.med.harvard.edu/metacgh/) This website is designed to provide access to array CGH (comparative genomic hybridization) based on copy number profiles of 8,227 human cancer genomes (Figure 6). See the description of the database for more information about its composition (Kim TM et al., 2013). An interactive web-based browser facilitates the exploration of the result set: - Search for specific genes of interest. Support alternative gene nomenclatures. 
- Browse cytobands by frequency of alteration.
- Visualize alteration frequency over the full set of tumor types for a gene of interest.
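The notion of "alteration frequency over tumor types" used by such browsers can be illustrated with a minimal sketch. It assumes that each sample has already been reduced to one log2 copy number ratio for the gene of interest; the sample values, the tumor type labels and the gain/loss cut-offs of +/-0.3 are invented for illustration and are not taken from MetaCGH, whose actual calling procedure is described in Kim TM et al., 2013.

```python
from collections import defaultdict

# Hypothetical per-sample values for one gene: (tumor_type, log2_ratio).
# These numbers are invented for illustration, not real MetaCGH data.
SAMPLES = [
    ("rhabdomyosarcoma", 0.45),
    ("rhabdomyosarcoma", 0.05),
    ("melanoma", -0.40),
    ("melanoma", 0.35),
    ("melanoma", -0.02),
    ("neuroblastoma", 0.00),
]

# Assumed cut-offs on the log2 ratio; real pipelines tune these per platform.
GAIN_THRESHOLD = 0.3
LOSS_THRESHOLD = -0.3


def alteration_frequency(samples, gain=GAIN_THRESHOLD, loss=LOSS_THRESHOLD):
    """Return {tumor_type: {"gain": fraction, "loss": fraction}}."""
    counts = defaultdict(lambda: {"n": 0, "gain": 0, "loss": 0})
    for tumor_type, log2_ratio in samples:
        entry = counts[tumor_type]
        entry["n"] += 1
        if log2_ratio >= gain:
            entry["gain"] += 1
        elif log2_ratio <= loss:
            entry["loss"] += 1
    # Convert raw counts to fractions of samples per tumor type.
    return {
        tumor_type: {
            "gain": entry["gain"] / entry["n"],
            "loss": entry["loss"] / entry["n"],
        }
        for tumor_type, entry in counts.items()
    }


if __name__ == "__main__":
    for tumor_type, freq in sorted(alteration_frequency(SAMPLES).items()):
        print(f"{tumor_type}: gain {freq['gain']:.0%}, loss {freq['loss']:.0%}")
```

Plotting these two fractions per tumor type gives the kind of gain/loss bar chart shown for a gene in such browsers.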
Figure 6: PAX3 gain and loss in tumors at MetaCGH (http://compbio.med.harvard.edu/metacghBrowser/).

III-5 CaSNP (http://cistrome.org/CaSNP/)
CaSNP is a comprehensive collection of copy number alterations (CNA) from SNP arrays. It collects 11,485 Affymetrix SNP arrays of 34 different cancer types in 105 studies to profile the genome-wide CNA and SNP in each. This includes all the cancer SNP profiles using Affymetrix SNP arrays (10K to 6.0) with raw data from GEO, with additional arrays from the TCGA consortium and a few individual publications. All CNA data stored in CaSNP are generated from raw data analyzed with the dCHIP-SNP software. Data can be visualized as a table or a heatmap (Cao Q et al., 2011).

III-6 Cell line project (http://cancer.sanger.ac.uk/cell_lines)
For decades, human immortal cancer cell lines have constituted an accessible, easily usable set of biological models. In order to improve their utility, the Cancer Genome Project has embarked on a systematic characterization of the genetics and genomics of large numbers of cancer cell lines. Prior knowledge of their genetic abnormalities may allow a more informed choice of cancer cell lines in biological experiments and drug testing, and a more informed interpretation of results. Among other information (exome sequencing), the COSMIC Cell Lines Project includes genome-wide copy number analysis and genotyping information obtained with the Affymetrix SNP6 array and analyzed with the PICNIC algorithm. A complete list of cell lines can be found at http://cancer.sanger.ac.uk/cell_lines/cbrowse/all.

III-7 Cancer Cell Line Encyclopedia (http://www.broadinstitute.org/ccle/home)
For several years, the Broad Institute has developed resources for cell line data, especially copy number analysis with Affymetrix SNP6.0 arrays. These last two resources are complementary. Several other sites presenting global resources from the TCGA or ICGC programs give access, for each disease, to copy number analysis (e.g.
Broad GDAC FireBrowse, cBioPortal (see below), OASIS portal, etc.).

III-8 ArrayMap (http://www.arraymap.org)
ArrayMap is a curated reference database and bioinformatics resource targeting copy number profiles that provides an entry point for meta-analysis and systems-level data integration of high-resolution oncogenomic CNA data. The current data reflect 65,042 genomic copy number arrays, in 986 experimental series and on 333 array platforms (Cai H et al., 2015). A main interest of this resource (originating in great part from GEO datasets) is the fine classification with the ICD-O3 nomenclature. It is an elaborate and complete site for querying large amounts of cancer CGH data. For the majority of the samples, probe-level visualization and customized data representation facilitate gene-level and genome-wide data review. Results from multi-case selections can be connected to downstream data analysis and visualization tools (such as linear, circularized or karyotype-like presentations). Numerous tools permit visualization of parts of profiles (selection of chromosomes or genes) and export of data in tabulated files. An API (with relatively easy syntax) facilitates the automation of analyses. Moreover, a majority of cards (leukemia or solid tumors) in the Atlas are linked, via ICD-O3 codes, to ArrayMap (Figure 7).
Figure 7: ArrayMap (http://www.arraymap.org/). Selection of 26 samples of T lymphoblastic leukaemia/lymphoma (ICD-O 9837/3) to obtain a "heatmap" of gain and loss for all the samples, showing the variability of CGH profiles.

IV- Mutation databases
The distinction between single nucleotide polymorphisms (SNPs), which reflect variability within a population, and mutations acquired in a neoplastic process is crucial. Variants were previously determined with SNP arrays, but are nowadays determined by massively parallel sequencing. As a result, a huge quantity of polymorphisms and mutations in tumors is compared to controls. The landscape of the majority of recurrent mutations is now known and can be used for diagnosis. Even in haematological malignancies, where chromosome rearrangements have been shown to play a major role, it now appears that some mutations at the nucleotide level can be very important in determining treatments in relation to patient outcome (e.g. ASXL1, ATM, BCL6, BRAF, KRAS and NRAS, CBL, CCND3, CDKN2A and CDKN2C, CEBPA, CRLF2, ETV6, FLT3, GATA2, ID3, IDH1, IDH2, IKZF1, JAK1, KIT, MYD88, NOTCH1, NPM1, RUNX1, TP53).

IV-1 COSMIC (http://cancer.sanger.ac.uk/cosmic)
COSMIC is designed to store and display somatic mutation information and related details for human cancers. In v76 (Feb 2016), there are 3,942,175 mutations in 1,192,776 samples collected from 22,844 papers. The interface has been fully redesigned and offers multiple ways to view mutations, fusions, copy numbers, etc. (Forbes SA et al., 2015).

IV-2 CENSUS (http://cancer.sanger.ac.uk/census/)
The Cancer Gene Census is an ongoing effort to catalogue genes for which mutations have been causally implicated in cancer. The original census and analysis were published in Nature Reviews Cancer, and supplemental analysis information related to the paper is also available. The census is regularly updated.
In particular, Felix Mitelman and his colleagues have continued to provide information on more genes involved in uncommon translocations in leukaemias and lymphomas. Currently, more than 1% of all human genes are known to be mutated in cancer. Of these, roughly 90% show somatic mutations in cancer, 20% bear germline mutations that predispose to cancer and 10% show both somatic and germline mutations (Futreal PA et al., 2004).

IV-3 HGMD (http://www.hgmd.cf.ac.uk/ac/index.php)
The recognition that certain DNA sequences are hypermutable has yielded clues to the endogenous mutational mechanisms involved and has provided insights into the intricacies of the processes of DNA replication and repair (Cooper DN and Krawczak M, 1993). In practical terms, a fuller understanding of the mutational process may prove important in molecular diagnostic medicine by contributing to improvements in the design and efficacy of mutation search procedures and strategies for different genetic disorders. The Human Gene Mutation Database (HGMD) collects known (published) gene lesions responsible for human inherited disease. This database, whilst originally established for the study of mutational mechanisms in human genes (Cooper DN and Krawczak M, 1996), has now acquired a much broader utility in that it embodies an up-to-date and comprehensive reference source to the spectrum of inherited human gene lesions. Thus, HGMD provides information of practical diagnostic importance to i) researchers and diagnosticians in human molecular genetics, ii) physicians interested in a particular inherited condition in a given patient or family, and iii) genetic counselors. Note: HGMD has two types of access: a free public one with limited data and a professional one requiring a license.

IV-4 LOVD (http://www.lovd.nl/3.0/home)
LOVD stands for Leiden Open (source) Variation Database. The purpose of LOVD is to provide a flexible tool for gene-centered collection and display of DNA variations.
LOVD 3.0 extends this idea to also provide patient-centered data storage and NGS data storage, even for variants outside of genes. LOVD consists of both database software and the content of Locus Specific Mutation Databases (LSDB) (http://grenada.lumc.nl/LSDBlist/lsdbs), which are curated by laboratories. A general access page gives links to each gene (92,241 entries in all) (Fokkema IF et al., 2011).

IV-5 TCGA cBioPortal (http://www.cbioportal.org/)
The cBioPortal for Cancer Genomics provides visualization and analysis of, and access to, large-scale cancer genomics data sets (126 in April 2016). For each dataset the portal presents several diagrams for mutations, copy number variations, survival analysis and so on (Figure 8). It also provides help in analysing a list of predefined genes (Deng M et al., 2016).
Figure 8: PAX5 alterations in cancer at cBioPortal (http://www.cbioportal.org/; Select Cancer Study: tick "all"; Enter Gene Set: write "PAX5").

IV-6 ICGC Data Portal (https://dcc.icgc.org/)
The ICGC Data Portal provides tools for visualizing, querying and downloading the data released quarterly by the consortium's member projects. The Pancancer Analysis of Whole Genomes (PCAWG) study is an international collaboration to identify common patterns of mutations in more than 2,800 cancer whole genomes from the International Cancer Genome Consortium. The portal contains descriptions of 36,985,985 mutations in 57,773 genes and 17,867 donors within 66 projects in 21 primary sites (Zhang J et al., 2011).

IV-7 OASIS Portal
The OASIS Portal (see above) presents data from 30 datasets (from Acute Myeloid Leukemia to Uterine Corpus Endometrial Carcinosarcoma) with 6,817 mutations, 11,222 CNVs and expression data (8,178 RNA Seq and 4,889 microarrays).

IV-8 IntOGen (http://www.intogen.org)
IntOGen collects and analyses somatic mutations in thousands of tumor genomes to identify cancer driver genes (Figure 9). As of the end of 2014, IntOGen defined a list of 459 driver genes in 28 cancer types (Gundem G et al., 2010).
Figure 9: PAX5 mutation frequency at IntOGen (http://www.intogen.org/search?gene=PAX5).

IV-9 BioMuta v2 (https://hive.biochemistry.gwu.edu/tools/biomuta/)
BioMuta v2.0 is a curated single-nucleotide variation (SNV) and disease association database where the variations are mapped to the genome/protein/gene. Oriented toward cancer, the database has 5,233,790 SNVs for 41 cancer types and gives the position of each mutation and its frequency in each cancer type (Wu TJ et al., 2014).

IV-10 DoCM (http://docm.genome.wustl.edu/)
The Database of Curated Mutations (DoCM) is a curated database of known, disease-causing mutations that provides easily explorable variant lists with direct links to source citations for easy verification. Curation of the literature to produce a high-quality set of pathogenic somatic mutations is not straightforward. This requires sifting through the ever-growing body of cancer research literature (6% annual growth rate in the last 10 years), which for 2015 means over 156,399 articles related to cancer as indexed by PubMed. This volume of literature makes it difficult to identify bona fide somatic mutations with characterized functional or clinical significance in cancer. Once identified, these mutations require significant curation efforts to format and standardize them in a consistent way that enables databasing. For example, publications often only specify the amino acid change and gene name to describe the mutation. DoCM addresses these challenges by acting as an accessible, open-source, and openly licensed somatic mutation repository that also enables community contributions.

IV-11 CIViC (https://civic.genome.wustl.edu/#/home)
The CIViC (Clinical Interpretations of Variants in Cancer) database is based on Evidence items, which reference their parent variants, variant groups, and genes. One can explore the various CIViC entities and their attributes using the menu.
Precision medicine refers to the use of prevention and treatment strategies that are tailored to the unique features of each individual and their disease. In the context of cancer, this might involve the identification of specific mutations shown to predict response to a targeted therapy. The biomedical literature describing these associations is large and growing rapidly. Currently these interpretations exist largely in private or encumbered databases, resulting in extensive repetition of effort. The database is just starting, with 212 genes (474 variants) analysed from 870 publications.

IV-12 ExAC (http://exac.broadinstitute.org)
ExAC (Exome Aggregation Consortium) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a variety of large-scale sequencing projects, and to make summary data available to the wider scientific community. The data set provided on this website spans 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies. All of the raw data from these projects have been reprocessed through the same pipeline, and jointly variant-called to increase consistency across projects. The data are available under the ODC Open Database License (ODbL). One is free to share and modify the ExAC data as long as any public use of the database, or of works produced from it, keeps the resulting data sets open and offers the shared or adapted version of the data under the same ODbL license (Minikel EV et al., 2016).
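One common use of a population resource of this kind, in line with the SNP versus acquired mutation distinction made at the start of this section, is to set aside tumor variants that are frequent in the general population. The following is a minimal sketch under stated assumptions: the variant records, the placeholder gene names, the allele frequencies and the 0.1% cut-off are all hypothetical, and the lookup table merely stands in for frequencies that would in practice be taken from a downloaded population data set. Rarity in the population does not by itself prove that a variant is somatic; a matched normal sample remains the reference.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

VariantKey = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


@dataclass(frozen=True)
class TumorVariant:
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str  # placeholder gene label, for display only

    @property
    def key(self) -> VariantKey:
        return (self.chrom, self.pos, self.ref, self.alt)


# Hypothetical tumor calls; coordinates and gene names are invented.
TUMOR_VARIANTS = [
    TumorVariant("1", 1000, "A", "G", "GENE_A"),
    TumorVariant("2", 2000, "C", "T", "GENE_B"),
    TumorVariant("3", 3000, "G", "A", "GENE_C"),
]

# Hypothetical population allele frequencies keyed by variant.
POPULATION_AF: Dict[VariantKey, float] = {
    ("1", 1000, "A", "G"): 0.12,      # common polymorphism
    ("2", 2000, "C", "T"): 0.00001,   # very rare in the population
}

MAX_POPULATION_AF = 0.001  # assumed cut-off, to be adapted to the study


def split_by_population_frequency(
    variants: List[TumorVariant],
    population_af: Dict[VariantKey, float],
    max_af: float = MAX_POPULATION_AF,
) -> Tuple[List[TumorVariant], List[TumorVariant]]:
    """Return (rare_or_absent, common) lists, preserving input order."""
    rare: List[TumorVariant] = []
    common: List[TumorVariant] = []
    for variant in variants:
        # Variants absent from the population table are treated as AF = 0.
        allele_frequency = population_af.get(variant.key, 0.0)
        if allele_frequency > max_af:
            common.append(variant)
        else:
            rare.append(variant)
    return rare, common


if __name__ == "__main__":
    rare, common = split_by_population_frequency(TUMOR_VARIANTS, POPULATION_AF)
    print("kept as candidates:", [v.gene for v in rare])
    print("likely polymorphisms:", [v.gene for v in common])
```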
Atlas of Genetics and Cytogenetics in Oncology and Haematology
Cancer Cytogenomics resources
Online version: http://atlasgeneticsoncology.org/deep-insight/20145/cancer-cytogenomics-resources