Systematic search for enhancer elements and somatic allelic imbalance at seven low-penetrance colorectal cancer predisposition loci

Background
Common single-nucleotide polymorphisms (SNPs) in ten chromosomal loci have been shown to predispose to colorectal cancer (CRC) in genome-wide association studies. A plausible biological mechanism of CRC susceptibility associated with genetic variation has so far only been proposed for three loci, each pointing to variants that affect gene expression through distant regulatory elements. In this study, we aimed to gain insight into the molecular basis of seven low-penetrance CRC loci tagged by rs4779584 at 15q13, rs10795668 at 10p14, rs3802842 at 11q23, rs4444235 at 14q22, rs9929218 at 16q22, rs10411210 at 19q13, and rs961253 at 20p12.

Methods
Possible somatic gain of the risk allele or loss of the protective allele was studied by analyzing allelic imbalance in tumour and corresponding normal tissue samples of heterozygous patients. Functional variants were sought within enhancer elements predicted in silico inside the CRC-associated linkage disequilibrium regions.

Results
No allelic imbalance targeting the SNPs was observed at any of the seven loci. Altogether, 12 SNPs that were predicted to disrupt potential transcription factor binding sequences were genotyped in the same population-based case-control series in which the seven tagging SNPs had originally been genotyped. None showed association with CRC.

Conclusions
The results of the allelic imbalance analysis suggest that the seven CRC risk variants are not somatically selected for in the neoplastic progression. The bioinformatic approach was unable to pinpoint cancer-causing variants at any of the seven loci. While it is possible that many of the predisposition loci for CRC are involved in control of gene expression by targeting transcription factor binding sites, other possibilities, such as regulatory RNAs, should also be considered.

Genome-wide association studies (GWASs) are based on genotyping SNPs which tag linkage disequilibrium (LD) blocks in the genome, thus capturing a high proportion of common genetic variation. Hence, the associating tag SNPs are usually not themselves causal but rather are in LD with disease-causing variants. We have previously demonstrated that the tag SNP rs6983267 at 8q24 directly disrupts a TCF-4 transcription factor binding site and enhances Wnt signalling in the colon [11]. This was supported by a simultaneous study showing a physical interaction between the rs6983267 region and the MYC proto-oncogene [12]. Furthermore, allelic imbalance (AI) at 8q24 seen in colorectal tumours favours the risk allele G of rs6983267, suggesting that the locus is somatically selected for in tumourigenesis [13]. At 18q21, a novel variant that correlates with rs4939827 reduces SMAD7 expression, leading to aberrant TGFβ (transforming growth factor beta) signalling [14]. Finally, rs16888589 at the 8q23 CRC locus was shown to influence EIF3H expression by physically interacting with the promoter [15]. Unlike at 8q24, no significant difference in the alleles targeted by imbalance was detected in rs4939827 at 18q21 [4,14] or in rs16892766 at 8q23 [15]. It is therefore likely that many low-penetrance cancer susceptibility loci can be explained by subtle changes in distant regulatory elements, and these changes can also play a role in somatic tumour development. Based on the genes located inside or near the CRC-associated regions, including GREM1 at 15q13 and BMP4 at 14q22, alterations in TGFβ-superfamily signalling appear to underlie several loci [10].
Possible biological mechanisms underlying CRC predisposition are yet to be discovered in the seven low-penetrance loci at 15q13, 10p14, 11q23, 14q22, 16q22, 19q13, and 20p12, which are the focus of this study. First, possible somatic selection of the risk alleles was evaluated in heterozygous individuals by examining the tagging SNPs in tumour and corresponding normal tissues. The location of the SNPs at predicted enhancer elements was then investigated with an in silico tool. As none of the tagging SNPs were predicted to be located at transcription factor binding sites, the analysis was extended to the given LD regions. Putative functional variants were sought by genotyping all the known SNPs inside the associated LD regions that were predicted to disrupt transcription factor binding sites.

Study population
A population-based series of 1 042 CRC samples collected since 1994 from nine Finnish central hospitals was used in this study [16,17]. Both germline DNA extracted from blood or normal colonic tissue and corresponding fresh-frozen tumour DNA were available. Information on histological tumour grade and Dukes' stage was obtained from pathology reports. The 837 control DNA samples used in this study were from anonymous healthy blood donors from the Finnish Red Cross Blood Transfusion Service. Samples and clinicopathologic data were obtained with informed consent and ethical review board approval in accordance with the Declaration of Helsinki.

Allelic imbalance (AI) analysis
Allelic ratios were compared by sequencing tumour and respective normal tissue DNA in heterozygous patients, as described previously [13,18,19]. In brief, the allelic ratio was calculated as Tumour (Allele1/Allele2) / Normal (Allele1/Allele2), and ratios of <0.6 or >1.67 were considered imbalance. Tumour and normal samples were sequenced using Applied Biosystems BigDye v3.1 sequencing chemistry and an ABI3730 Automatic DNA sequencer (Applied Biosystems, Foster City, CA, USA). Peak heights were manually measured from sequencing chromatograms using Chromas (http://www.technelysium.com.au) and Sequence Scanner (Applied Biosystems) software, and the allelic ratio was calculated from these measurements. First, 90 tumour-normal pairs were analyzed. If any trend towards imbalance was observed, all the available heterozygote samples were analyzed. All the tumours were microscopically evaluated by a pathologist and at least 64% of the analyzed tumours contained ≥70% of carcinoma tissue.
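The ratio and thresholds described above can be illustrated with a minimal Python sketch. The function names and the example peak heights are hypothetical and are not taken from the study; only the formula and the 0.6 and 1.67 cut-offs come from the text. Because 1.67 is approximately 1/0.6, the two thresholds are symmetric with respect to which allele is placed in the numerator.

```python
def allelic_ratio(tumour_a1, tumour_a2, normal_a1, normal_a2):
    """Tumour (Allele1/Allele2) divided by Normal (Allele1/Allele2).

    Inputs are peak heights read from the sequencing chromatograms.
    """
    return (tumour_a1 / tumour_a2) / (normal_a1 / normal_a2)


def classify_ai(ratio, low=0.6, high=1.67):
    """Apply the imbalance thresholds quoted in the text."""
    if ratio < low:
        # Allele 1 is under-represented in the tumour relative to normal.
        return "imbalance favouring allele 2"
    if ratio > high:
        # Allele 2 is under-represented in the tumour relative to normal.
        return "imbalance favouring allele 1"
    return "no imbalance"


# Hypothetical peak heights (arbitrary units), for illustration only.
ratio = allelic_ratio(tumour_a1=400, tumour_a2=1000,
                      normal_a1=900, normal_a2=1000)
print(round(ratio, 3), classify_ai(ratio))
```

Note that the ratio alone does not distinguish loss of one allele from gain of the other, which is why the labels above refer only to the favoured allele.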

Statistical analysis
All the analyses were performed with R software. An exact binomial test was used in the allelic imbalance analysis. Allelic odds ratios and 95% confidence intervals were calculated, and P-values were obtained with Pearson's Chi-squared test. To adjust for multiple testing we applied a Bonferroni correction (not shown in Table 1). Fisher's exact test was used in the analysis of clinicopathological characteristics.
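As an illustration of the allelic association statistics named above, the following Python sketch computes an allelic odds ratio, a 95% confidence interval, and a Pearson Chi-squared P-value from a 2x2 table of allele counts. The analyses in this study were run in R; this sketch is not the authors' code. It assumes a Wald interval on the log odds ratio and a Chi-squared test without continuity correction, neither of which is specified in the text, and the allele counts are hypothetical.

```python
import math


def allelic_association(case_a, case_b, ctrl_a, ctrl_b):
    """Odds ratio, Wald 95% CI, Chi-squared statistic and P-value.

    case_a/case_b: counts of allele A and allele B in cases.
    ctrl_a/ctrl_b: counts of allele A and allele B in controls.
    """
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)

    # Assumption: Wald interval on the log odds ratio scale.
    se = math.sqrt(1 / case_a + 1 / case_b + 1 / ctrl_a + 1 / ctrl_b)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se),
          math.exp(math.log(odds_ratio) + 1.96 * se))

    # Pearson Chi-squared for a 2x2 table, no continuity correction.
    n = case_a + case_b + ctrl_a + ctrl_b
    chi2 = (n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
            / ((case_a + case_b) * (ctrl_a + ctrl_b)
               * (case_a + ctrl_a) * (case_b + ctrl_b)))

    # Upper tail of the Chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, ci, chi2, p_value


# Hypothetical allele counts, for illustration only.
print(allelic_association(case_a=1500, case_b=500, ctrl_a=1150, ctrl_b=450))
```

Each genotyped individual contributes two alleles to such a table, so the column totals are twice the number of successfully genotyped cases and controls.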

Results
The tag SNPs of seven low-penetrance loci were sequenced in heterozygous tumour-normal pairs, in order to detect possible AI occurring in the neoplastic progression. The risk alleles were not significantly targeted by AI in any of the seven SNPs (Table 2). The frequency of overall imbalance (loss of either the risk or the neutral allele) ranged between 9 and 31% at the seven loci (Table 2). In rs10411210, ten tumours showed loss of the neutral allele and five tumours loss of the risk allele; however, AI occurred altogether in only 9% of the tumours (Table 2). No significant differences were observed between the two AI groups in terms of Dukes' stage (P = 0.3) or histological grade (P = 1.0).
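The rs10411210 counts above (ten tumours with loss of the neutral allele, five with loss of the risk allele) give a worked example of an exact binomial test. The Python sketch below assumes a two-sided test against a null probability of 0.5, meaning that either allele is equally likely to be lost; the text does not state the exact settings used, and the function name is hypothetical.

```python
from math import comb


def binom_test_two_sided(k, n, p=0.5):
    """Two-sided exact binomial P-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k out of n trials.
    """
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


# 10 of 15 imbalanced tumours lost the neutral allele.
p_value = binom_test_two_sided(10, 15)
print(round(p_value, 3))  # 0.302
```

Under these assumptions the 10:5 split is well within what chance alone would produce, in line with the absence of significant risk-allele selection reported for this SNP.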
elements with a score ≥ 300. One of the SNPs, rs11853552 at 15q13, was already previously genotyped by Jaeger et al. (2008) [5], and was therefore excluded from the analysis. The remaining 12 SNPs were genotyped in the same Finnish case-control series as the tag SNPs in previous studies (Table 1) [6,8,9]. None of the SNPs showed association with CRC (Table 1). Three of the 12 SNPs (rs28768389, rs12899808, and rs34812868 in 15q13) were genotyped using sequencing, and four additional polymorphisms that were not located in any predicted binding sites were observed in the sequencing fragments. One of these SNPs, rs35614970 (A6/A3), showed significant association with CRC (OR 1.2, 95% CI 1.03-1.38, P = 0.02). The frequency of A6 was 0.705 in controls and 0.741 in cases. This association did not remain significant after correction for multiple testing (P = 0.21, Bonferroni correction for 13 SNPs). None of the other additional SNPs (rs11071928, rs34944927, and a novel C to T change at chr15: 30 806 922) showed association with CRC.
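The figures reported for rs35614970 can be cross-checked with simple arithmetic. The Python sketch below recomputes the allelic odds ratio from the A6 frequencies and applies a Bonferroni correction; the function names are hypothetical, and the inputs are the rounded values printed in the text.

```python
def odds_ratio_from_freqs(freq_cases, freq_controls):
    """Allelic odds ratio from allele frequencies in cases and controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# A6 allele frequency: 0.741 in cases, 0.705 in controls.
print(round(odds_ratio_from_freqs(0.741, 0.705), 2))  # 1.2

# Rounded nominal P-value from the text, corrected for 13 SNPs.
print(round(bonferroni(0.02, 13), 2))  # 0.26
```

The recomputed odds ratio agrees with the reported OR of 1.2. The corrected value obtained from the rounded P of 0.02 is 0.26 rather than the reported 0.21; the reported figure would correspond to an unrounded nominal P of about 0.016, which is an inference from the arithmetic and not a value given in the text.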

Discussion
In this study, we exploited the same approach as for 8q24 to systematically analyze the molecular basis of seven low-penetrance CRC loci where the cancer-causing variants have not yet been identified. This is the first time, to our knowledge, that rs4779584, rs10795668, rs4444235, rs9929218, rs10411210, and rs961253 have been analyzed for possible AI in colorectal tumours. Sequencing of fresh-frozen tumour material provides accurate data on possible loss of the neutral allele or gain of the risk allele. In the case of selective imbalance, subsequent copy number analyses can reveal whether the role of a variant resembles that of a tumour suppressor or an oncogene, and guide further functional efforts. The 8q24 locus, where gain of the risk allele was observed in rs6983267 [11,13], currently seems to be the only low-penetrance risk locus for CRC that is somatically enriched during tumourigenesis. Lack of imbalance favouring the risk allele does not, however, preclude involvement in germline predisposition. It is therefore possible that some of the seven susceptibility variants analyzed in this study play a role in the early stages of neoplastic development, without providing further selective advantage in the somatic cancer progression.
Interestingly, 19q13 gain is common in both primary and metastatic CRC [25]. We did not, however, observe any significant difference in the alleles targeted by AI in rs10411210, nor an association of neutral allele loss with more advanced disease stage. Loss of heterozygosity involving 10p14 has also been reported to occur in CRC [26], but we found no evidence of risk allele selection in rs10795668. Furthermore, deletion of 11q23-q24 is a frequent event in CRC, among other tumour types [27]. However, Tenesa et al. (2008) found no AI in favour of the risk allele in rs3802842 based on up to 43 CRCs [7], which is now confirmed by our analysis of 89 CRCs. Finally, although 18q loss is common in CRC [28], Broderick et al. (2007) observed no selection of the risk allele at rs4939827 in 248 CRC cases [4].
The location of rs6983267 at a TCF-4 binding sequence was recently discovered using EEL [11], which prompted us to apply this powerful tool to the seven susceptibility loci as well. Transcription factor tissue specificities are incompletely understood, supporting the rationale of our unbiased approach of not restricting the analysis to colon-specific factors. Given that the tagging SNPs at the seven loci lie in noncoding regions, the most likely underlying mechanism is differential gene expression through enhancers or repressors. Although regulatory SNPs have been identified in the loci successfully fine-mapped so far, our study underscores the importance of also considering other mechanisms. For instance, sequence variation affecting noncoding regulatory RNAs, many of which have been linked to cancer-associated pathways, could explain some of the predisposition loci devoid of coding genes.

Conclusion
While successful at the 8q24 locus, the approach used in this study was unable to pinpoint causal variants at the seven low-penetrance CRC loci analyzed. Finding the underlying functional changes in the GWAS loci is a challenging, yet important, task in order to fully understand the biology behind common CRC susceptibility.