Shared ancestral susceptibility to colorectal cancer and other nutrition-related diseases

Background The majority of non-syndromic colorectal cancers (CRCs) can be described as complex diseases. A two-stage case–control study on CRC susceptibility was conducted to assess the influence of the ancestral alleles of polymorphisms previously associated with nutrition-related complex diseases. Methods In stage I, 28 single nucleotide polymorphisms (SNPs) were genotyped in a hospital-based Czech population (1025 CRC cases, 787 controls) using an allele-specific PCR-based genotyping system (KASPar®). In stage II, the five SNPs with the lowest p values were carried forward for replication in 1798 CRC cases and 1810 controls from a population-based German study (DACHS). Odds ratios (ORs) and 95% confidence intervals (CIs) for associations between genotypes and CRC risk were estimated using logistic regression. To identify signatures of selection, Fay-Wu's H and the integrated haplotype score (iHS) were estimated. Results In the Czech population, carriers of the ancestral alleles of AGT rs699 and CYP3A7 rs10211 had an increased risk of CRC (OR 1.26 and 1.38, respectively; two-sided p≤0.05), whereas carriers of the ancestral allele of ENPP1 rs1044498 had a decreased risk (OR 0.79; p≤0.05). For rs1044498, the strongest association was detected in the Czech male subpopulation (OR 0.61; p=0.0015). The associations were not replicated in the German population. Signatures of selection were found for all three analyzed genes. Conclusions Our study showed evidence of association of the ancestral alleles of polymorphisms in AGT and CYP3A7, and of the derived allele of a polymorphism in ENPP1, with an increased risk of CRC in Czechs, but not in Germans. The ancestral alleles of these SNPs have previously been associated with the nutrition-related diseases hypertension (AGT and CYP3A7) and insulin resistance (ENPP1).
Future studies may shed light on the complex genetic and environmental interactions between different types of nutrition-related diseases.


Background
Colorectal cancer (CRC; OMIM ID: 114500) is among the most common cancers in industrialized countries and one of the leading causes of cancer-related mortality [1]. Incidence rates vary with sex, age and country: they are higher among males than females and increase with age [2]. The differences in incidence rates across the globe are mainly attributed to differences in diet and other environmental factors.
In both sporadic and familial CRC, genes and environment together contribute to the risk of CRC. The majority of non-syndromic CRCs can be characterized as complex diseases [3]. The major factors that modify CRC risk are obesity, diabetes, red meat consumption, physical inactivity, alcohol consumption, chronic inflammation and cigarette smoking [4][5][6]. Additionally, the intake of vitamin D, calcium, fruit and vegetables may influence the risk of CRC [5]. Gene-environment interactions also underlie other complex diseases, such as obesity (OMIM ID: 601665) and diabetes mellitus type II (T2D; OMIM ID: 125853).
A specific feature of CRC and other complex diseases is that they occur mainly in people living in industrialized societies, in environments with an almost unlimited food supply and low energy expenditure [5,7]. Nutrition is one of the most important environmental factors influencing the fitness of an individual. In past centuries, the genetic constitution of an individual is thought to have been optimized for food utilization in order to protect against malnutrition. In modern societies, this ancestral genetic constitution may no longer be beneficial, because it does not protect against the relatively new condition of overnutrition. Variants that protect against overnutrition and related health issues are therefore expected to be rare [7]. Nevertheless, genetic variants that promote a carbohydrate-based nutrition, as well as genetic variants that show ancestral susceptibility to a nutrition-related disease, have already been described (inter alia [8][9][10]). Signatures of such processes can be detected in the human genome using genome-wide approaches that evaluate differences in worldwide allele frequencies and haplotype distributions (inter alia [11][12][13][14][15]).
Based on the interplay between genetic and environmental risk factors in nutrition-related complex diseases, we posed the following hypothesis: "Polymorphisms with ancestral alleles associating with a nutrition-related disease and showing signatures of recent selection may be associated with CRC risk." Here we outline the methods used to select such candidate genes and report the results of testing the selected variants for CRC risk in two large case-control studies.

Candidate SNP selection for the case-control study
The study focused on SNPs for which ancestral alleles have previously been associated with nutrition-related complex diseases other than CRC, such as obesity, T2D and metabolic syndrome. Information about such SNPs was collected from 30 published reports by searching the PubMed database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed) [16] for the keywords "diabetes", "obesity", "metabolic syndrome" (OMIM ID: 605552) and "hypertension" (OMIM ID: 145500) up to 06/2009. Most of the articles were based on genome-wide association studies or were meta-analyses. A complete list of the publications can be found in the reference list of Additional file 1.
From these 30 reports, associations with the risk of the diseases and with related quantitative traits were retrieved. The quantitative traits for diabetes were fasting glucose level and insulin resistance. For obesity, the traits were body mass index (BMI) and waist-to-hip ratio. The quantitative traits for hypertension and the metabolic syndrome were high-density lipoprotein (HDL) level, low-density lipoprotein (LDL) level, triglyceride level, salt sensitivity, blood pressure and insulin resistance. A complete list of the reported associations can be found in Additional file 1.
The candidate SNP selection for the association study took place in three major phases (Figure 1): (1) "Selection of Candidate SNPs": All published SNPs were evaluated for the nature of the risk allele, either ancestral (A) or derived (D), and for the allele frequency differences between African, European and Asian populations (YRI: Sub-Saharan African population, Yoruba in Ibadan, Nigeria; CEU: Caucasian population, Utah residents with Northern and Western European ancestry from the CEPH collection; HCB: East Asian population, Han Chinese in Beijing, China; JPT: East Asian population, Japanese in Tokyo, Japan). An allele was considered a "risk allele" when it was associated with a significantly increased risk of a key disease (OR > 1; statistical significance based on the criteria of the original publication) or with a significant increase of quantitative values in the original publication. The nature of the risk allele was determined using the NCBI database (http://www.ncbi.nlm.nih.gov) [16].
The reported ancestral susceptibility SNPs that showed an absolute allele frequency difference of >45% between the African and any non-African population were chosen for further investigation. The threshold of 45% was set to detect variants with a "major-to-minor" allele change between populations, which may indicate the influence of selective pressure. A second, lower threshold (25%) was set for the difference between the YRI and the CEU populations to account for the more recent separation of the European population (compared with the East Asian population) from the African population.
(2) "Candidate Gene Definition": The SNPs that passed the first selection criteria were evaluated for their location in the genome, possible functional effects, linkage disequilibrium (LD) with other polymorphisms within the gene region and the number of candidate SNPs in the gene region.
(3) "Tagging SNP Approach": In addition to the evaluation of the reported ancestral susceptibility SNPs, a tagging SNP approach was carried out for each candidate gene or gene region using the genotyping data of the CEU population and Haploview software [17]. Besides a minor allele frequency (MAF) of ≥5%, a tagging SNP had to meet at least one of the following criteria: be or capture a phase 1 SNP; be or capture a functional polymorphism; or capture a maximal number of SNPs within a candidate gene or gene region with >25% allele frequency difference between the YRI and the CEU population. In the majority of cases, the reported ancestral susceptibility SNP itself was genotyped. When that was not possible (e.g. because the assay design failed due to the structure of the surrounding sequence), another SNP in LD with the reported SNP (r² > 0.9) was selected for genotyping in order to gain indirect information about the reported ancestral susceptibility SNP. For large, diverse genes or gene clusters, additional tagging SNPs were selected to gain more information about the genes. These tagging SNPs also had to fulfil the criterion of >25% allele frequency difference between the YRI and CEU populations (Table 1).
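Both the proxy criterion (r² > 0.9) and the tagging step rely on the pairwise LD measure r². As a minimal sketch, assuming phased two-locus haplotypes coded 0/1 (Haploview estimates haplotype frequencies from genotype data, which this toy function does not attempt), r² can be computed from haplotype and allele frequencies; `ld_r2` is a hypothetical helper name and the example haplotypes are invented.

```python
from typing import Sequence, Tuple


def ld_r2(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is (allele at SNP 1, allele at SNP 2), coded 0/1.
    D = p11 - p1 * p2 and r^2 = D^2 / (p1 (1 - p1) p2 (1 - p2)).
    """
    n = len(haplotypes)
    p1 = sum(a for a, _ in haplotypes) / n        # freq of allele 1 at SNP 1
    p2 = sum(b for _, b in haplotypes) / n        # freq of allele 1 at SNP 2
    p11 = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    d = p11 - p1 * p2                             # linkage disequilibrium D
    return d * d / denom


# Hypothetical proxy check: the two SNPs always co-occur on the same haplotype.
perfect = [(1, 1)] * 3 + [(0, 0)] * 7
print(round(ld_r2(perfect), 6))   # 1.0
print(ld_r2(perfect) > 0.9)       # True: would qualify as a proxy
```

A proxy with r² close to 1 in the reference panel carries nearly the same genotype information as the reported SNP, which is why it can stand in when the assay for the reported SNP fails.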

Study populations
First, a hospital-based study population from the Czech Republic was analyzed. Between 09/2004 and 05/2009, 1025 CRC cases were recruited by nine oncological departments in the Czech Republic [32]. All patients had positive colonoscopic findings for malignancy, histologically confirmed as colon or rectal carcinoma. Patients who met the Amsterdam criteria I or II for hereditary nonpolyposis colorectal cancer (OMIM ID: 120435) were excluded from the study. During the same period, 787 healthy controls were recruited by five gastroenterological departments in the Czech Republic [32]. These were individuals undergoing colonoscopy for various gastrointestinal complaints, such as macroscopic bleeding, a positive faecal occult blood test or abdominal pain of unknown origin. Only individuals with negative colonoscopic findings for malignancies, colorectal adenomas, benign polyps or inflammatory bowel disease were eligible for the study. Besides general information about sex and age, information about BMI (OMIM ID: 606641) and diabetes status was available for most of the individuals (Table 2).
The five SNPs with the lowest p values in the Czech population were additionally analysed in a German population-based case-control study. The DACHS (Darmkrebs: Chancen der Verhütung durch Screening; colorectal cancer: chances for prevention through screening) study contributed 1798 cases and 1810 matched controls recruited from 01/2003 to 12/2007 in South-West Germany [33,34]. The patients included in the study had a first diagnosis of invasive CRC. Controls were randomly selected from lists of residents provided by the population registries. Detailed standardized questionnaires provided information about BMI at least five years before sampling and about diabetes status, in addition to general information about sex and age [34]. Table 2 outlines the characteristics of the Czech and the DACHS populations relevant for the study.

Statistical analysis
The observed genotype frequencies in the controls were tested for Hardy-Weinberg equilibrium (HWE) using χ² tests [35]. Odds ratios (ORs) and 95% confidence intervals (CIs) for associations between genotypes and CRC risk were estimated by logistic regression (PROC LOGISTIC, SAS Version 9.2; SAS Institute, Cary, NC) [36]. The estimated effects for all SNPs refer to the ancestral allele (A).
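For a biallelic SNP, the HWE check compares the observed control genotype counts with those expected from the allele frequency, using a Pearson χ² test with one degree of freedom. The sketch below is illustrative only: the study does not describe its implementation beyond the χ² test, `hwe_chi2` is a hypothetical name, and the genotype counts are invented.

```python
import math
from typing import Tuple


def hwe_chi2(n_aa: int, n_ad: int, n_dd: int) -> Tuple[float, float]:
    """1-df Pearson chi-square test for Hardy-Weinberg equilibrium.

    n_aa, n_ad, n_dd are control genotype counts (A = ancestral, D = derived).
    Returns (chi-square statistic, p value).
    """
    n = n_aa + n_ad + n_dd
    p = (2 * n_aa + n_ad) / (2 * n)      # ancestral allele frequency
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("HWE test is undefined for a monomorphic SNP")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ad, n_dd)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


print(hwe_chi2(25, 50, 25))   # (0.0, 1.0): exactly at equilibrium
print(hwe_chi2(50, 0, 50))    # no heterozygotes: strong deviation
```

The guard for monomorphic SNPs matters in practice: a SNP with only one allele in the controls has no expected heterozygotes, and the test statistic cannot be formed.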
Because most of the tested polymorphisms had a low allele frequency, the dominant model was applied in all estimations. The ORs were adjusted for age and sex. Additionally, a pooled analysis of the two studies was conducted, with ORs adjusted for age, sex and study population. Pair-wise gene-gene interactions were studied using logistic regression. Because men are at higher risk of CRC than women and BMI is one of the most important risk factors for non-syndromic CRC, analyses stratified by sex and BMI were performed. The BMI threshold was set at the median BMI of the respective study population. To assess effect modification by sex and BMI, multiplicative interaction terms were included in multivariate regression models. P values ≤ 0.05 were considered statistically significant. Bonferroni correction was not applied because it would have been overly conservative, given that the SNP selection was hypothesis-driven and all selected SNPs have previously been associated with a phenotype predisposing to CRC. Instead, a replication study in the German sample population was conducted to validate the initial results in the Czech population.
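Under the dominant model, carriers of at least one ancestral allele (AA or AD) are compared with DD homozygotes. The study's ORs were adjusted for age and sex in SAS PROC LOGISTIC; the minimal sketch below computes only the crude OR with a Woolf (log-scale Wald) 95% CI from the resulting 2 × 2 table. This coincides with an unadjusted logistic regression on a single carrier indicator, but not with the adjusted estimates reported here. The function name and counts are hypothetical.

```python
import math
from typing import Dict, Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def dominant_or(cases: Dict[str, int],
                controls: Dict[str, int]) -> Tuple[float, float, float]:
    """Crude OR for carriers of the ancestral allele (AA + AD) vs DD.

    Returns (OR, lower 95% limit, upper 95% limit) using the Woolf
    interval. No adjustment for age or sex is made.
    """
    a = cases["AA"] + cases["AD"]          # carrier cases
    b = controls["AA"] + controls["AD"]    # carrier controls
    c = cases["DD"]                        # non-carrier cases
    d = controls["DD"]                     # non-carrier controls
    if 0 in (a, b, c, d):
        raise ValueError("empty cell: the Woolf interval is undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return (math.exp(log_or),
            math.exp(log_or - Z_95 * se),
            math.exp(log_or + Z_95 * se))


# Invented genotype counts, for illustration only.
cases = {"AA": 150, "AD": 450, "DD": 400}
controls = {"AA": 100, "AD": 300, "DD": 400}
or_, lo, hi = dominant_or(cases, controls)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Collapsing AA and AD into one exposure group is what makes the dominant model attractive for rare alleles: the carrier group is larger than the AA group alone, which stabilizes the estimate.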

Signatures of selection
In addition to allele frequency differences between the African and the non-African populations, the study aimed to detect further signatures of selection in those genes that were associated with CRC in the case-control study. Highly variable allele frequencies in different populations might also be attributable to processes such as genetic drift, bottlenecks or founder effects that occur during the separation from the ancestral population. To address this problem, methods that are less susceptible to demographic influences, Fay-Wu's H and the integrated haplotype score (iHS), were applied [12,15]. The iHS measures the length of haplotypes around a given SNP relative to the whole genome. Absolute values above 1.5 (|iHS| > 1.5) give conclusive evidence for natural selection, while absolute values above 2 (|iHS| > 2.0) indicate a strong selection signal [15,37]. Values were estimated for the YRI, the CEU and the East Asian (ANS) populations.
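The iHS logic can be illustrated on toy data. The sketch below assumes phased haplotypes stored as strings of '0' (ancestral) and '1' (derived), positions in arbitrary distance units, and the commonly used definition: extended haplotype homozygosity (EHH) is integrated on both sides of the core SNP until it drops below 0.05, the raw score is ln(iHH_A / iHH_D), and raw scores are then standardized within bins of derived-allele frequency. All function names are hypothetical; published iHS values are computed genome-wide on HapMap data, which this example does not attempt.

```python
import math
import statistics
from collections import Counter
from typing import List, Sequence


def ehh(haps: Sequence[str], core: int, end: int) -> float:
    """Probability that two haplotypes (all carrying the same core allele)
    drawn at random are identical over the interval [core, end]."""
    n = len(haps)
    if n < 2:
        return 0.0
    lo, hi = min(core, end), max(core, end)
    counts = Counter(h[lo:hi + 1] for h in haps)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def ihh(haps: Sequence[str], core: int, positions: Sequence[float],
        cutoff: float = 0.05) -> float:
    """Integrate EHH (trapezoid rule) on both sides of the core SNP,
    stopping once EHH falls below `cutoff`."""
    total = 0.0
    for step in (-1, 1):
        prev_pos, prev_ehh = positions[core], ehh(haps, core, core)
        i = core + step
        while 0 <= i < len(positions):
            cur = ehh(haps, core, i)
            if cur < cutoff:
                break
            total += (prev_ehh + cur) / 2 * abs(positions[i] - prev_pos)
            prev_pos, prev_ehh = positions[i], cur
            i += step
    return total


def unstandardized_ihs(haps: Sequence[str], core: int,
                       positions: Sequence[float]) -> float:
    """ln(iHH_ancestral / iHH_derived); '0' = ancestral, '1' = derived."""
    anc = [h for h in haps if h[core] == "0"]
    der = [h for h in haps if h[core] == "1"]
    ihh_a, ihh_d = ihh(anc, core, positions), ihh(der, core, positions)
    if ihh_a == 0 or ihh_d == 0:
        raise ValueError("iHH is zero for one allele; score undefined")
    return math.log(ihh_a / ihh_d)


def standardize(raw_scores: List[float]) -> List[float]:
    """Z-score raw values from SNPs in the same derived-frequency bin."""
    mu, sd = statistics.mean(raw_scores), statistics.stdev(raw_scores)
    return [(x - mu) / sd for x in raw_scores]


def signal_strength(ihs: float) -> str:
    """Apply the |iHS| thresholds quoted in the text."""
    if abs(ihs) > 2.0:
        return "strong"
    if abs(ihs) > 1.5:
        return "evidence"
    return "none"


# Toy data: derived haplotypes share one long haplotype, ancestral ones do not.
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
haps = ["00100"] * 4 + ["01000", "10001", "11010", "00011"]
print(unstandardized_ihs(haps, core=2, positions=positions))  # negative
```

With this sign convention, a negative score means the derived allele sits on longer haplotypes than the ancestral allele, the pattern expected when a derived allele has risen in frequency recently.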

Candidate SNP selection
With the keywords "diabetes", "obesity", "metabolic syndrome" and "hypertension", we found 30 publications in PubMed reporting 246 polymorphisms associated with a key disease or with a related quantitative trait. Supplementary material (Additional file 1) provides information about all 246 polymorphisms, including chromosomal position, allele status and frequency, reported associations with the diseases or traits, information about the functionality of the SNPs, and a complete reference list. Carriers of the ancestral (A) allele of 130 SNPs (52.8%) had an increased risk of a key disease, whereas for 106 SNPs the increased risk was associated with the derived (D) allele. The majority of the SNPs were located in introns (all …). Twenty-nine SNPs fulfilled the selection criteria of an allele frequency difference of >45% between the African and the non-African populations and >25% between the YRI and the CEU populations. These SNPs and their corresponding genes were considered candidates for the CRC case-control study. After considering the location and function of the SNPs and the LD characteristics of the gene regions, 28 SNPs in 15 genes were selected for the case-control study (Table 1).
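The two allele-frequency criteria applied in this step can be expressed as a simple predicate. The sketch below is one reading of the criteria (the ancestral allele must be the reported risk allele, the YRI frequency must differ by >45% from at least one non-African HapMap panel, and the YRI vs CEU difference must exceed 25%, both conditions being required); `passes_selection` is a hypothetical name and the example frequencies are invented, not taken from the study.

```python
from typing import Dict

# HapMap panels treated as non-African in the selection step.
NON_AFRICAN_PANELS = ("CEU", "HCB", "JPT")


def passes_selection(freq: Dict[str, float], ancestral_is_risk: bool,
                     any_panel_threshold: float = 0.45,
                     ceu_threshold: float = 0.25) -> bool:
    """Return True if a reported SNP passes both allele-frequency filters.

    freq maps a HapMap panel ("YRI", "CEU", "HCB", "JPT") to the frequency
    (0-1) of one fixed allele; for a biallelic SNP the absolute difference
    is the same whichever allele is used.
    """
    if not ancestral_is_risk:   # only ancestral-susceptibility SNPs qualify
        return False
    yri = freq["YRI"]
    # Largest YRI vs non-African difference (must exceed 45%).
    max_diff = max(abs(yri - freq[p]) for p in NON_AFRICAN_PANELS)
    # YRI vs CEU difference (must exceed the lower 25% threshold).
    ceu_diff = abs(yri - freq["CEU"])
    return max_diff > any_panel_threshold and ceu_diff > ceu_threshold


# Invented frequencies, for illustration only.
print(passes_selection({"YRI": 0.90, "CEU": 0.40, "HCB": 0.20, "JPT": 0.25}, True))  # True
print(passes_selection({"YRI": 0.90, "CEU": 0.70, "HCB": 0.30, "JPT": 0.35}, True))  # False
```

The second example fails only on the CEU criterion: the East Asian panels differ enough from YRI, but the European panel does not.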

Association study
The genotype distributions of 27 of the 28 SNPs genotyped in the Czech control population were consistent with HWE. For ERBB3 rs11171739, the genotype distribution deviated from HWE (p < 0.0001), and the SNP was excluded from further analyses. Except for CYP3A5 rs776746, which was monomorphic in the Czech cohort, none of the observed allele frequencies differed significantly from those given in the NCBI database (CEU population).
Three SNPs in three genes showed modest associations with the risk of CRC in the Czech population (Table 3). The ancestral alleles of two SNPs were associated with an increased risk of CRC: AGT rs699 (OR 1.26; 95% CI 1.01-1.57) and CYP3A7 rs10211 (OR 1.38; 95% CI 1.04-1.83). In contrast, the ancestral allele of ENPP1 rs1044498 was associated with a decreased risk of CRC (OR 0.79; 95% CI 0.63-1.00). The gene-gene interaction analysis showed no evidence of epistasis (data not shown).
A replication study in the German DACHS population was carried out for the five SNPs with the lowest p values (rs699, rs10211, rs1044498, rs12592797, rs7298565). None of the analysed polymorphisms was associated with the risk of CRC in the German population alone or in the pooled analysis of the two populations (Figure 2).
Figure 2 Comparative data plot of the OR and 95% CI of the SNPs analyzed in the Czech and the DACHS cohorts; dominant model; individual Czech and DACHS data adjusted for age and sex; joint data adjusted for age and sex, stratified by study. CZ, Czech cohort; OR, odds ratio; CI, confidence interval.
In the data stratified by sex, ENPP1 rs1044498 was associated with a decreased risk of CRC in the male subgroup of the Czech population (OR 0.61; 95% CI 0.45-0.83; p = 0.0015; p for interaction = 0.01). No association was detected in the Czech female subgroup or in the German study (data not shown).
In the data stratified by BMI, modest associations were detected in the Czech subgroup with a BMI >27 for AGT rs699 (OR 1.54; 95% CI 1.05-2.25; p = 0.027; p for interaction = 0.036) and CYP3A7 rs10211 (OR 1.78; 95% CI 1.08-2.93; p = 0.023; p for interaction = 0.113). No association was detected in the German study (data not shown).

Signatures of selection
One important signature of selective pressure was already a criterion for selecting the SNPs: a high allele frequency difference between HapMap populations. Thus, all SNPs that were associated with the risk of CRC fulfilled this criterion. The highest allele frequency difference among all analysed SNPs was found for ENPP1 rs1044498 (YRI vs. HCB 94.4%; YRI vs. CEU 87.3%). AGT rs699 and CYP3A7 rs10211 showed allele frequency differences (YRI vs. CEU) of 52.5% and 67%, respectively.

Discussion
The present study aimed to link susceptibility variants of nutrition-related complex diseases to CRC. The results of the Czech hospital-based case-control study suggested that polymorphisms in AGT, CYP3A7 and ENPP1 may be associated with the risk of CRC. However, replication in the population-based German DACHS study did not confirm the associations.
Of the 246 SNPs that have been reported to be associated with a nutrition-related disease, 130 showed ancestral susceptibility to the overall risk of obesity, T2D, metabolic syndrome or hypertension. However, only 29 SNPs fulfilled the initial selection criterion of a >45% allele frequency difference between the YRI and any other HapMap population, which was taken to indicate selective pressure. Except for ABCA1, which has been found to be mutated in CRC tumour samples [38,39], none of the 15 genes in the present study has previously been associated with the risk of CRC (http://www.ncbi.nlm.nih.gov/; http://www.hugenavigator.net/CancerGEMKB/home.do) [40].
The association study in the Czech population indicated ancestral susceptibility to CRC for the missense AGT SNP rs699 and for the 3'UTR SNP rs10211 in CYP3A7. Interestingly, SNPs in these two genes have similar phenotypic effects, such as predisposing to hypertension and salt sensitivity [18].
Published data about AGT suggest that the ancestral allele of the probably pathogenic SNP rs699 (M268T), as well as the ancestral alleles of the missense SNP rs4762 (T207M) and the 5'UTR SNP rs5051, predispose to essential hypertension, increased plasma angiotensinogen and an increased frequency of preeclampsia (OMIM ID: 189800) [18,41,42]. Additionally, rs5051 (r² = 0.95 to rs699) has been shown to affect the transcription rate of AGT [41,42]. AGT (angiotensinogen [serpin peptidase inhibitor, clade A, member 8]) is an important member of the renin-angiotensin system, which regulates blood pressure and fluid homeostasis, probably by influencing sodium sensitivity [41,42].
Figure 3 Plot of Fay-Wu's H (a) and plot of the standardized integrated haplotype score (|iHS|) (b). Estimates for the SNPs that were associated with CRC in the Czech population. Comparison of the African (YRI), European (CEU) and East Asian (ANS) populations. All estimates refer to SNPs that are linked to the genotyped SNPs because direct values were not available for the genotyped SNPs. Asterisks (*) indicate values that provide conclusive evidence for natural selection [12,15,37].
The intronic CYP3A5 SNP rs776746 has also previously been associated with hypertension and salt sensitivity [9,18]. This SNP has been reported to result in an incorrectly spliced mRNA and a truncated, nonfunctional protein. In the Czech population, rs776746 was monomorphic. However, the CYP3A7 SNP rs10211, also located within the same cytochrome P450 gene cluster and linked to rs776746 in the HapMap CEU population (r² = 0.82), showed a nominally significant association with the risk of CRC. The genes of the cytochrome P450 family encode some of the most important enzymes involved in the metabolism of various xenobiotics and endogenous substrates, such as cholesterol, steroids, environmental carcinogens and drugs [43]. In particular, CYP3 enzymes are responsible for the metabolism of eicosanoids.
The allele frequencies of the genotyped SNPs rs699 and rs10211 and of the two functional SNPs rs5051 and rs776746 are highly variable among worldwide populations, with higher frequencies of the derived alleles in non-African populations, whereas the ancestral alleles predominate in the African population. The |iHS| values determined for SNPs that are fully linked to rs699 and rs10211 provided conclusive evidence for natural selection in the population of European ancestry. Additionally, rs2687075 (r² = 1.0 to rs10211) in CYP3A7 showed strongly negative values of Fay-Wu's H, which were considered signatures of a selective sweep in non-African populations [12,15]. Previous studies have already suggested AGT and CYP3A5 as targets of selection and have additionally linked the two genes directly through their related function [18,42]. In AGT, selection was suggested to act particularly on the promoter that contains rs699 and on SNPs in high LD with it. This selection was attributed to altered human requirements for maintaining sodium homeostasis [42]. Since the derived allele, which predisposes to salt tolerance, is not yet fixed in non-African populations, the remaining ancestral allele, which predisposes to salt sensitivity, shows ancestral susceptibility to related diseases such as hypertension, preeclampsia [41,42] and CRC.
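Why strongly negative values of Fay-Wu's H point to a sweep can be seen from the estimator itself. As a minimal sketch (unnormalized H, assuming the derived-allele count at each segregating site is known from the ancestral state; `fay_wu_h` and the inputs are hypothetical), H = θπ − θH, where θH weights each derived variant by the square of its count i, so an excess of high-frequency derived variants near a hitchhiking locus drives H below zero. Significance testing, which normally relies on coalescent simulations, is not shown.

```python
from typing import Sequence


def fay_wu_h(derived_counts: Sequence[int], n: int) -> float:
    """Unnormalized Fay and Wu's H = theta_pi - theta_H.

    derived_counts holds, for every segregating site, the number i of
    derived alleles among n sampled chromosomes (1 <= i <= n - 1).
    theta_pi = sum 2 i (n - i) / (n (n - 1))
    theta_H  = sum 2 i^2      / (n (n - 1))
    """
    if any(not 1 <= i <= n - 1 for i in derived_counts):
        raise ValueError("each site needs 1 <= derived count <= n - 1")
    denom = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / denom
    theta_h = sum(2 * i * i for i in derived_counts) / denom
    return theta_pi - theta_h


# Invented sites in a sample of n = 10 chromosomes.
print(fay_wu_h([9, 9, 8, 9], n=10))  # mostly high-frequency derived: negative
print(fay_wu_h([1, 1, 2], n=10))     # mostly rare derived: positive
```

Because θπ peaks for intermediate-frequency variants while θH keeps growing with i, the two estimators agree under neutrality on average and diverge only when the frequency spectrum is skewed toward high-frequency derived alleles.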
A possible association with the risk of CRC and signatures of recent selection were also observed for one polymorphism in ENPP1. However, contrary to the initial hypothesis, the ancestral allele of rs1044498 was associated with a decreased risk of CRC in the Czech population. ENPP1 is a member of the ecto-nucleotide pyrophosphatase/phosphodiesterase (ENPP) family. The encoded protein interacts with the insulin receptor, thereby inhibiting subsequent signalling. In previous studies, the ancestral allele of the missense polymorphism rs1044498 (Q121K) has been associated with increased insulin receptor binding, stronger inhibition of insulin signalling, insulin resistance, an increased risk of T2D and an increased risk of myocardial infarction (OMIM ID: 608446) [9,44,45]. The associations were most pronounced in cohorts that underwent lifestyle interventions to improve an individual's weight or cholesterol level [45]. It is possible that the effects of ENPP1 on the risk of metabolic syndrome and on the risk of CRC are highly dependent on additional environmental factors or modifiers. Unfortunately, we were not able to validate our results in the independent German case-control study. Since the Czech and the German populations should not differ significantly in their genetic constitution, differences in nutrition or other environmental factors may contribute to the observed results, or the associations may be chance findings [46,47]. Since the selection of candidate SNPs was based on complex gene-environment interactions, with the SNP contributing to a phenotype that predisposes to CRC, the detected associations are expected to be weaker than those for the original intermediate phenotype. Even in the original reports on the associations of these SNPs with nutrition-related diseases, the ORs were low in most cases.
As we studied only a few polymorphisms per gene, other polymorphisms in low LD (r² < 0.8) or rare SNPs (MAF < 0.05) that may contribute to the risk of CRC might have been missed. However, considering the previously reported associations of several SNPs in the three described genes with components of the metabolic syndrome, the ancestral nature of the risk alleles, and the detected signatures of selection, a true modest effect on CRC risk in the Czech population cannot be excluded [48]. In particular, the close resemblance of the detected associations and of the function of the SNPs in AGT and CYP3A7 may indicate a true effect of these polymorphisms on CRC susceptibility.

Conclusion
Our study showed evidence of association of the ancestral alleles of polymorphisms in AGT and CYP3A7 and the derived allele of a polymorphism in ENPP1 with an increased risk of CRC in Czechs, but not in Germans. The ancestral alleles of these SNPs have previously been associated with the nutrition-related diseases hypertension (AGT and CYP3A7) and insulin resistance (ENPP1). Future studies may shed light on the complex genetic and environmental interactions between different types of nutrition-related diseases. The application of additional selection criteria, such as ancestral susceptibility, signatures of selection, or pathway membership, might help to narrow down the numerous published polymorphisms and to find the most promising candidates for association studies. Large study populations that provide the possibility to define large subgroups with specific pre-diagnostic features may be used to review the actual function of such polymorphisms and may provide further insights into the evolution of common complex diseases.