Identification of genetic variants for clinical management of familial colorectal tumors

Background The genetic mechanisms for families who meet the clinical criteria for Lynch syndrome (LS) but do not carry pathogenic variants in the mismatch repair (MMR) genes are still undetermined. We aimed to study the potential contribution of genes other than MMR genes to the biological and clinical characteristics of Norwegian families fulfilling the Amsterdam (AMS) criteria or the revised Bethesda guidelines. Methods The Hereditary Cancer Biobank of the Norwegian Radium Hospital was interrogated to identify individuals with a high risk of developing colorectal cancer (CRC) for whom no pathogenic variants in MMR genes had been found in routine diagnostic DNA sequencing. Forty-four cancer susceptibility genes were selected and analyzed using our in-house designed TruSeq amplicon-based assay for targeted sequencing. RNA splicing- and protein-dedicated in silico analyses were performed for all variants of unknown significance (VUS). Variants predicted as likely to affect splicing were analyzed experimentally in minigene assays. Results We identified a patient who met the revised Bethesda guidelines and carried a likely pathogenic variant in CHEK2 (c.470T>C, p.I157T). In addition, 25 unique VUS were identified in 18 individuals, of which 2 exonic variants (MAP3K1 c.764A>G and NOTCH3 c.5854G>A) were analyzed in the minigene splicing assay and found not to affect RNA splicing. Conclusions Among high-risk CRC patients who fulfill the AMS criteria or the revised Bethesda guidelines, targeted gene sequencing identified a likely pathogenic variant and VUS in genes other than the MMR genes (CHEK2, NOTCH3 and MAP3K1). Our study suggests that genes currently excluded from routine molecular diagnostic screens may confer cancer susceptibility. Electronic supplementary material The online version of this article (10.1186/s12881-018-0533-9) contains supplementary material, which is available to authorized users.


Background
Heredity represents a major cause of colorectal cancer (CRC), with at least 20% of cases estimated to develop due to genetic factors and about 5% being linked to inherited variants in cancer-predisposing genes [1][2][3][4]. Currently, patients with CRC are referred to germline mismatch repair (MMR) testing based on the identification of high-risk phenotypic features (i.e. early age of onset, family history, clinical criteria), but beyond microsatellite instability (MSI) and MMR immunohistochemistry (IHC) testing for Lynch syndrome (LS), no systematic approach to hereditary risk assessment exists [5]. LS is caused by a defective MMR system due to the presence of germline defects in at least one of the MMR genes (MLH1, MSH2, MSH6, PMS2) or to deletions of the 3′ portion of the EPCAM gene [6]. LS is clinically classified according to the Amsterdam (AMS) criteria and/or the Bethesda guidelines, both relying on clinical information and family history. The Bethesda guidelines also take into account the MSI signature characteristic of MMR-deficient tumors [7][8][9][10]. LS patients have an increased lifetime risk of CRC (70-80%), endometrial cancer (50-60%), stomach cancer (13-19%), ovarian cancer (9-14%), cancers of the small intestine, the biliary tract and brain, as well as carcinoma of the ureters and renal pelvis [11].
However, a high proportion of cases who meet the clinical criteria for LS (~60%) do not carry pathogenic variants in the MMR genes and have been reported as familial colorectal cancer type X (FCCTX) or Lynch-like syndrome (LLS) according to their MSI status [12][13][14][15][16]. The genetic mechanisms are undetermined in the majority of these families [14].
DNA sequencing (DNA-seq) studies using multigene panels have reported that as many as ~18% of patients diagnosed with CRC below the age of 50 years have pathogenic variants in several genes that are not traditionally associated with CRC (ATM, CHEK2, BRCA1, BRCA2, CDKN2A and PALB2) [5,17]. Notably, there is a need to determine whether these variants contribute to hereditary CRC risk via the combination of low- and moderate-penetrance susceptibility alleles [5,17,18].
Given the high frequency and wide spectrum of pathogenic variants, it has been suggested that genetic counseling and testing with a multigene panel should be considered for all patients with early-onset CRC [17][19][20][21][22][23]. Importantly, the identification of high-risk CRC patients is a major issue, because morbidity and mortality from CRC and extracolonic cancers in these patients and their relatives can be decreased by early screening and intensive surveillance [19][24][25][26].
In an effort to discover inherited genetic variants that influence the biological and clinical characteristics of familial CRC in unrelated high-risk patients who had previously tested negative for pathogenic variants in MMR genes, we examined 44 cancer-associated genes using next-generation sequencing (NGS) and applied a minigene-based assay to analyze the impact of a subset of genetic variants on RNA splicing.

Study population
The Hereditary Cancer Biobank of the Norwegian Radium Hospital was used to identify unrelated high-risk CRC individuals from families that fulfilled the AMS criteria or the revised Bethesda guidelines [7][8][9][10][27]. Standard clinical diagnostic techniques had shown that none of the study subjects carried pathogenic variants or large genomic rearrangements in the MMR genes (MLH1, MSH2, MSH6 or PMS2).
Ethical approval for the study was granted by the Norwegian Data Inspectorate and Ethical Review Board (ref 2015/2382). All examined patients gave signed informed consent to participate in the study.

Targeted sequencing
Genomic DNA was isolated from peripheral blood samples, and targeted sequencing was carried out using a TruSeq amplicon-based assay v.1.5 on a MiSeq instrument, as previously described [28,29]. The 44-gene panel used in this study includes genes associated with cancer predisposition, as described in a prior study [28,29].

Sequencing data analysis
Paired-end sequence reads were aligned to the human reference genome (build GRCh37) using the BWA-MEM algorithm (v.0.7.8-r55) [30]. The initial sequence alignments were converted to BAM format and subsequently sorted and indexed with SAMtools (v.1.1) [30]. Genotyping of single nucleotide variants (SNV) and short indels was performed with GATK's HaplotypeCaller. Filtering of raw genotype calls and assessment of callable regions/loci were done according to GATK's best practice procedures, as described in more detail previously [28].

Nomenclature and classification of genetic variants
The nomenclature guidelines of the Human Genome Variation Society (HGVS) were used to describe the detected genetic variants [38]. The recurrence of the identified variants was established by interrogating four databases (in their latest releases as of November 2016): the Leiden Open Variation Database (LOVD), the Universal Mutation Database (UMD), ClinVar and the Human Gene Mutation Database (HGMD). The variants were classified according to the 5-tier classification system into the following categories: class 5 (pathogenic), class 4 (likely pathogenic), class 3 (uncertain variants or variants of unknown significance, VUS), class 2 (likely not pathogenic) and class 1 (not pathogenic) [3].

In silico analyses of VUS
Two types of bioinformatics methods were used to predict the impact of selected variants on RNA splicing. First, we used MaxEntScan (MES) and SSF-like (SSFL) to predict variant-induced alterations in 3′ and 5′ splice site strength, as described by Houdayer et al. 2012 [39], except that here both algorithms were interrogated using the integrated software tool Alamut Batch version 1.5 (Interactive Biosoftware, http://www.interactive-biosoftware.com). Second, to predict variant-induced impact on exonic splicing regulatory elements (ESR), we used the ΔtESRseq- [40], ΔHZei- [41], and SPANR-based [42] approaches, as described by Soukarieh et al. [43]. Score differences (Δ) between variant and wild-type (WT) cases were taken as proxies for assessing the probability of a splicing defect. More precisely, we considered that a variant mapping at a splice site was likely to negatively impact exon inclusion if ΔMES ≥ 15% and ΔSSFL ≥ 5% [39], whereas an exonic variant located outside the splice sites was considered a probable inducer of exon skipping if all 3 ESR-dedicated in silico tools gave negative Δ scores below the following thresholds: < −0.5 for ΔtESRseq-, < −10 for ΔHZei-, and < −0.5 for SPANR-based scores. In addition, we evaluated the possibility of variant-induced de novo splice sites by taking into consideration local changes in MES and SSFL scores. In this case, we considered that variants located outside the splice sites were likely to create a competing splice site if local MES scores were equal to or greater than those of the corresponding reference splice site for the same exon.
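The decision rules described above can be summarized in a short Python sketch. This is an illustration only, not the pipeline used in the study: the function names, the dictionary keys and the sign convention (ΔMES and ΔSSFL passed as percentage decreases relative to WT) are assumptions made here; only the numeric thresholds come from the text.

```python
# Thresholds as stated in the text; everything else is a hypothetical illustration.
MES_DROP_PCT = 15.0   # minimum MES decrease (%) at a splice site
SSFL_DROP_PCT = 5.0   # minimum SSFL decrease (%) at a splice site
ESR_THRESHOLDS = {"tESRseq": -0.5, "HZei": -10.0, "SPANR": -0.5}


def splice_site_variant_flagged(mes_drop_pct: float, ssfl_drop_pct: float) -> bool:
    """Variant at a splice site: flagged only if BOTH score decreases reach
    their thresholds (assumed convention: positive value = weaker site)."""
    return mes_drop_pct >= MES_DROP_PCT and ssfl_drop_pct >= SSFL_DROP_PCT


def esr_variant_flagged(delta_scores: dict) -> bool:
    """Exonic variant outside the splice sites: flagged only if ALL three
    ESR-dedicated delta scores fall below their thresholds."""
    return all(delta_scores[tool] < thr for tool, thr in ESR_THRESHOLDS.items())


def creates_competing_site(local_mes: float, reference_mes: float) -> bool:
    """De novo site: flagged if the local MES score is equal to or greater
    than that of the reference splice site of the same exon."""
    return local_mes >= reference_mes


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(splice_site_variant_flagged(20.0, 6.0))
    print(esr_variant_flagged({"tESRseq": -0.8, "HZei": -15.0, "SPANR": -0.6}))
    print(creates_competing_site(8.0, 8.0))
```

Requiring agreement of all three ESR tools makes the exonic rule conservative: a single score above its threshold is enough to leave the variant unflagged.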

Cell-based minigene splicing assays
In order to determine the impact of selected exonic variants on splicing, we performed functional assays based on the comparative analysis of the splicing pattern of WT and mutant reporter minigenes, as follows. First, genomic regions containing the exon of interest (internal exons only) and at least 150 nucleotides of the flanking introns were amplified by PCR [49] using patients' DNA as template and the primers indicated in Additional file 1: Table S1. Next, representative minigenes were created by inserting the PCR-amplified fragments into a previously linearized pCAS2 vector [43]. All constructs were sequenced to ensure that no unwanted mutations had been introduced into the inserted fragments during PCR or cloning. Then, WT and mutant minigenes were transfected into HeLa cells grown in 12-well plates (at ~70% confluence) using the FuGENE 6 transfection reagent (Roche Applied Science). Twenty-four hours later, total RNA was extracted using the NucleoSpin RNA II kit (Macherey-Nagel), and the minigenes' transcripts were analyzed by semi-quantitative RT-PCR using the OneStep RT-PCR kit (Qiagen), as previously described [43]. The sequences of the RT-PCR primers are shown in Additional file 1: Table S1. RT-PCR products were then separated by electrophoresis on a 2.5% agarose gel containing EtBr and visualized by exposure to UV light under saturating conditions using the Gel Doc XR image acquisition system (Bio-Rad), followed by gel purification and Sanger sequencing for proper identification of the minigenes' transcripts. Finally, splicing events were quantified by performing equivalent fluorescent RT-PCR reactions followed by capillary electrophoresis on an automated sequencer (Applied Biosystems) and computational analysis using the GeneMapper v5.0 software (Applied Biosystems).

Clinical characteristics and family history
Upon querying the Hereditary Cancer Biobank of the Norwegian Radium Hospital for cases that fulfill the AMS criteria and/or the revised Bethesda guidelines, we identified 34 unrelated potential high-risk CRC individuals who did not carry pathogenic variants in MMR genes. The median age at first CRC diagnosis was 51.5 years (range: 34-86 years).
Pedigree information showed that 13 (38%) families fulfilled the AMS I and/or II criteria and the revised Bethesda guidelines, while 21 (62%) met the revised Bethesda guidelines only (Table 1). Fifteen (44%) patients had tumors with MSI and/or MMR IHC data available, of which 2 (13%) were MSI-high and/or MMR deficient. Clinical, family and tumor data are detailed in Table 1.
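The proportions quoted above follow directly from the counts. A minimal Python check, using only the numbers given in the text (the helper name `pct` is introduced here for illustration):

```python
def pct(part: int, whole: int) -> int:
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


cohort = 34            # unrelated high-risk CRC individuals
ams_and_bethesda = 13  # families fulfilling AMS I/II and revised Bethesda
bethesda_only = 21     # families meeting revised Bethesda only
with_tumor_data = 15   # patients with MSI and/or MMR IHC data
msi_high_or_dmmr = 2   # MSI-high and/or MMR deficient among those 15

# The two pedigree groups partition the cohort.
assert ams_and_bethesda + bethesda_only == cohort

print(pct(ams_and_bethesda, cohort))           # 38
print(pct(bethesda_only, cohort))              # 62
print(pct(with_tumor_data, cohort))            # 44
print(pct(msi_high_or_dmmr, with_tumor_data))  # 13
```

Note that the last percentage uses the 15 patients with tumor data as denominator, not the full cohort.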

Germline findings
Given that the families that fulfilled the AMS criteria and/or the Bethesda guidelines did not carry pathogenic variants in the MMR genes, we hypothesized that other genes could be implicated in the genetic determinism of these phenotypes.
In order to pursue this hypothesis, we collected DNA samples from all probands and performed high-throughput sequencing of a panel of 44 cancer-associated genes. For the 34 samples, the mean depth of coverage ranged from 127 to 507, with the fraction of target bases with coverage ≥25 ranging from 80% to 93%. The NGS results revealed that each individual carried an average of 26 SNV (between 19 and 33 per individual) in the set of 44 cancer susceptibility genes, most of which were common polymorphisms (allele frequency ≥ 1% in the general population) according to the ExAC database, and some of which were classified as benign or likely benign (class 1 or class 2) according to either ClinVar or the American College of Medical Genetics and Genomics (ACMG) guidelines [35,50] (Table 2). Importantly, we identified a likely pathogenic variant in a moderate-penetrance gene (CHEK2 c.470T>C, p.I157T) in a female patient diagnosed with colon cancer at 42 years, melanoma at 44 years and breast cancer (BC) at 57 years, with a proficient IHC MMR profile and fulfilling the revised Bethesda guidelines (Patient 19,609) (Table 1).
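The two coverage metrics reported above (mean depth, and fraction of target bases at or above a minimum depth) can be computed from per-base depths as in the following Python sketch. It is a simplified illustration, not the procedure used in the study: the function name and the toy depth values are assumptions, and in practice the depths would come from a coverage tool run over the target regions.

```python
from statistics import mean
from typing import Iterable, Tuple


def coverage_summary(depths: Iterable[int], min_depth: int = 25) -> Tuple[float, float]:
    """Return (mean depth, fraction of bases with depth >= min_depth)."""
    depths = list(depths)
    if not depths:
        raise ValueError("no target bases supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return mean(depths), covered / len(depths)


if __name__ == "__main__":
    # Toy per-base depths for a 4-base target; real panels span many kilobases.
    mean_depth, fraction = coverage_summary([10, 30, 50, 110])
    print(mean_depth, fraction)
```

A sample can have a high mean depth and still a modest covered fraction if a few amplicons perform poorly, which is why both numbers are usually reported together.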

Protein and splicing-dedicated in silico analyses
The 25 unique VUS were analyzed by using five in silico prediction tools with different underlying algorithms to estimate the impact of the variants on the structure and function of the corresponding proteins.
The 5 prediction tools concordantly predicted a potentially damaging effect at the protein level for 2 out of the 25 VUS: MUTYH c.812G>A (p.R271Q) and MSH2 c.128A>G (p.Y43C) (Table 3).
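The notion of concordance used here, that is, a call is retained only when all tools agree, can be sketched in Python as follows. The tool names and the two-label vocabulary are placeholders introduced for illustration; the text does not name the five tools at this point, and real predictors report heterogeneous scores that would first need to be mapped to such labels.

```python
def consensus(predictions: dict) -> str:
    """Return 'deleterious' or 'benign' if every tool agrees, else 'conflicting'.

    `predictions` maps a tool name to 'deleterious' or 'benign'
    (hypothetical, harmonized labels).
    """
    if not predictions:
        raise ValueError("no predictions supplied")
    calls = set(predictions.values())
    if calls == {"deleterious"}:
        return "deleterious"
    if calls == {"benign"}:
        return "benign"
    return "conflicting"


if __name__ == "__main__":
    # Placeholder tool names and made-up calls.
    unanimous = {f"tool_{i}": "deleterious" for i in range(1, 6)}
    mixed = dict(unanimous, tool_3="benign")
    print(consensus(unanimous))  # deleterious
    print(consensus(mixed))      # conflicting
```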
Two out of the 25 VUS were bioinformatically predicted to affect RNA maturation by potentially modifying splicing signals (Table 3). More specifically, according to our in silico results, NOTCH3 c.5854G>A (identified in Patients 3222 and 4932) was predicted to induce exon 32 skipping by altering exonic splicing regulatory elements, whereas MAP3K1 c.764A>G (detected in Patient 21,368) was predicted to cause a deletion of the first 131 nucleotides of exon 3 (r.634_764del) through the creation of a putative new acceptor splice site. Skipping of NOTCH3 exon 32 would produce a transcript with a frameshift deletion of 98 nucleotides (NOTCH3 r.5816_5913del), potentially leading to the production of a carboxy-terminally truncated NOTCH3 protein, p.(Lys1940Glyfs*14). The MAP3K1 r.634_764del transcript would be expected to be degraded by nonsense-mediated decay and/or to result in a very short MAP3K1 protein, p.(Val212Leufs*45). NOTCH3 c.5854G>A was identified in two patients (Patients 3222 and 4932) who fulfilled the revised Bethesda guidelines and the AMS criteria, respectively, while MAP3K1 c.764A>G was identified in a patient (Patient 21,368) whose family fulfilled the revised Bethesda guidelines (Table 1).
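The sizes and reading-frame consequences of the two predicted deletions can be verified with simple arithmetic on the HGVS coordinates. The Python sketch below uses only the positions quoted above; the helper names are introduced here for illustration, and positions are assumed to be coding-sequence coordinates counted from the A of the start codon.

```python
def deleted_length(start: int, end: int) -> int:
    """Number of nucleotides removed by a deletion spanning start..end (inclusive)."""
    return end - start + 1


def is_frameshift(length: int) -> bool:
    """A deletion shifts the reading frame unless its length is a multiple of 3."""
    return length % 3 != 0


def codon_number(coding_pos: int) -> int:
    """1-based codon containing a 1-based coding position."""
    return (coding_pos - 1) // 3 + 1


if __name__ == "__main__":
    notch3 = deleted_length(5816, 5913)  # NOTCH3 r.5816_5913del (exon 32 skipping)
    map3k1 = deleted_length(634, 764)    # MAP3K1 r.634_764del
    print(notch3, is_frameshift(notch3))  # 98 True
    print(map3k1, is_frameshift(map3k1))  # 131 True
    # The MAP3K1 deletion starts in codon 212, matching p.(Val212Leufs*45).
    print(codon_number(634))              # 212
    # c.5854 lies within the skipped NOTCH3 interval.
    print(5816 <= 5854 <= 5913)           # True
```

Both deletion lengths leave a remainder of 2 when divided by 3, consistent with the frameshift annotations given for the predicted proteins.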

Minigene splicing assays
Because patient RNA was not available, we experimentally assessed the impact that these 2 variants (NOTCH3 c.5854G>A and MAP3K1 c.764A>G) might have on RNA splicing by performing cell-based minigene splicing assays.
As shown in Fig. 1, we found that NOTCH3 c.5854G>A and MAP3K1 c.764A>G did not modify the splicing pattern of the minigenes' transcripts. These data thus disagree with the in silico predictions and suggest either that exon 32 of NOTCH3 and exon 3 of MAP3K1 are refractory to splicing mutations (the predictions thus being incorrect) or that the minigenes used in our study do not fully reproduce the splicing pattern of the mutant exons in bona fide NOTCH3 and MAP3K1 transcripts (the predictions possibly being correct). Complementary studies using RNA from NOTCH3 c.5854G>A and MAP3K1 c.764A>G carriers need to be performed to verify the relevance of these results.

Discussion
The major unexpected finding in our Norwegian high-risk CRC cohort was the detection of a likely pathogenic variant in CHEK2 (c.470T>C, p.I157T), a moderate-penetrance gene not traditionally associated with CRC, in an individual with an LS-evocative personal/family history and a high number of Class 3 variants in BC- and CRC-associated genes. Interestingly, CHEK2 c.470T>C (p.I157T) has an allele frequency of 1.89 × 10⁻³ in the Norwegian population (http://norgene.no/vcf-miner/) and is reported in ClinVar as having conflicting interpretations of pathogenicity/being a risk factor (Variation ID: 5591). Importantly, there is no systematic classification for most of the genetic variants found by NGS, and, in more general terms, the impact of low- to moderate-penetrance pathogenic variants with respect to clinical management is not fully understood [52]. Co-segregation or case-control studies will be key to understanding whether such germline variants may have a modifying effect, since we do not yet have evidence-based guidelines for the majority of these genes.
On the other hand, CHEK2 germline variants have been described to confer an elevated risk of BC (relative risk = 3.0) [53]. However, pathogenic variants in CHEK2 are not frequently associated with cancer in high-risk BC families, prompting speculation that several low-penetrance or moderate-penetrance BC risk genes may segregate independently within these families [23,54,55]. Co-segregation analyses may help clarify whether this germline variant is implicated in CRC predisposition. Finally, we did not find pathogenic variants in POLE in our cohort, in contrast to what has been described in families with a high burden of colorectal adenomas and carcinomas in addition to extra-colonic cancers [56]. According to the Prospective LS Database (PLSDB), a total of 125 Norwegian families had a demonstrated pathogenic variant in either MLH1 (n = 21), MSH2 (n = 52), MSH6 (n = 36), or PMS2 (n = 16) [25]. On the other hand, a large portion of high-risk CRC families without a pathogenic variant in the MMR or EPCAM genes may be explained by a polygenic model involving a combination of multiple genomic risk factors, including the effect of low-penetrance susceptibility alleles [57], of high-penetrance genes that have not been tested, or of environmental factors. In addition, emerging data suggest that CRC cases negative for pathogenic MMR variants may contain a significantly higher number of copy-neutral loss of heterozygosity (cnLOH) regions, some located within well-known oncogenes and tumor suppressor genes, compared to cases of sporadic CRC [58]. These genomic variations, which were not investigated in this study, may provide an additional explanation for high-risk CRC phenotypes without MMR or EPCAM pathogenic variants.
Recent NGS studies described the presence of heterozygous pathogenic BRCA1/2 or APC variants as well as biallelic MUTYH alterations in individuals with clinical features resembling those of LS [5,22]. More precisely, those studies reported that 7% of patients with CRC carried pathogenic variants in non-LS genes, including 1.0% with BRCA1/2 mutations, and nearly two thirds of probands with high-penetrance non-LS mutations lacked clinical histories suggestive of their respective syndromes [5].
From 34 high-risk CRC individuals, our NGS panel testing identified one patient who carried a likely pathogenic variant in a gene with reportedly moderate penetrance. Our finding is in line with the mutation frequency in non-LS cancer susceptibility genes reported for individuals undergoing LS genetic testing (6%) [21] and for patients with BC who tested negative for BRCA1/2 variants (4%) [23]. Our results may have implications for an appropriate genetic
Besides the likely pathogenic CHEK2 variant, we identified a total of 25 variants in our cohort for which only limited data were available as to their clinical significance. We thus undertook bioinformatics analyses in an attempt to predict the biological impact of these Class 3 variants, both at the RNA and protein level, the ultimate goals being: (i) to discriminate pathogenic from nonpathogenic alterations in this set of variants and (ii) to further pinpoint the genetic determinants of high-risk CRC in our cohort. On the one hand, our RNA splicing-dedicated bioinformatics evaluation predicted that 2 out of the 25 VUS identified in this study (NOTCH3 c.5854G>A, p.V1952M and MAP3K1 c.764A>G, p.N255S) could potentially affect RNA splicing. These two variants were then experimentally analyzed in minigene splicing assays. Our results revealed that neither variant altered the splicing pattern of the representative minigenes, suggesting that they do not affect the splicing of NOTCH3 or MAP3K1 transcripts. Additional experiments based on the analysis of RNA from carriers of these variants will be important to verify our minigene results. On the other hand, our protein-dedicated bioinformatics analysis yielded 8 consistent predictions (2 VUS predicted as deleterious and 6 as benign) and several conflicting results that were not explored further.
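With a single carrier among 34 individuals (about 3%), the comparison with the published frequencies of 6% and 4% is subject to wide sampling uncertainty. The Python sketch below illustrates this with a Wilson score interval; the choice of this interval and the helper name are assumptions made here for illustration, not an analysis performed in the study.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    low, high = wilson_interval(1, 34)  # one carrier among 34 individuals
    print(f"observed: {1 / 34:.3f}, interval: {low:.3f} to {high:.3f}")
    # Both published frequencies fall inside the interval.
    print(low < 0.04 < high, low < 0.06 < high)
```

Under this assumption, the interval is broad enough to contain both published frequencies, which is consistent with describing the finding as in line with them while cautioning against stronger conclusions from a cohort of this size.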
In this scenario, not only functional tests, but also co-segregation studies will be key to understanding whether the VUS detected in this work are nonpathogenic or otherwise have a causal or a modifying effect. Importantly, we do not yet have evidence-based guidelines for the majority of the genes carrying the VUS identified in this study and, in more general terms, the impact of low- to moderate-penetrance pathogenic variants with respect to clinical management is not fully understood. Most of these variants may in the future be reclassified as deleterious or benign, but in the meantime, they cannot be used to make clinical decisions [59]. Informed (re)classification of VUS in cancer-associated genes may allow more appropriate risk management and may provide significant clues for the identification of additional patients carrying such uncommon variants.
NGS panel testing may benefit patients with a personal or family history compatible with more than one recognized inherited CRC syndrome. A CRC risk management strategy for these individuals is not yet available, and there is a need to identify new high-, moderate-, and low-penetrance gene variants that may affect the risk of CRC or LS-associated tumors in individuals who do not carry pathogenic MMR variants. The identification of such gene variants, in combination with family history, may contribute to more intensive surveillance and improved prevention [23].

Conclusions
Our study provides information on genetic loci that may be related to cancer susceptibility, indicating that genes not currently tested in routine diagnostics may be important for capturing cancer predisposition in these patients. In addition, we stratified 25 VUS using RNA splicing- and protein-dedicated in silico analyses. Further studies are necessary to make reliable estimates of cancer risk for the VUS found in this study and to allow appropriate genetic counseling for the patients and their relatives.
Surveillance for early cancer detection is essential to ensure optimal survival for patients afflicted with familial cancers. Our findings highlight the need for more studies to unravel the mechanisms underlying the development of CRC in high-risk patients and to identify new cancer predisposition genes.

Additional file
Additional file 1: Table S1. Primers used in the pCAS2 minigene splicing assay.