The rs4939827 polymorphism in the SMAD7 gene and its association with Mediterranean diet in colorectal carcinogenesis

Background The objective of our investigation was to study the relationship between the rs4939827 SNP in the SMAD7 gene, the Mediterranean diet pattern and the risk of colorectal cancer (CRC). Methods We examined 1087 cases of colorectal cancer and 2409 population controls with available DNA samples from the MCC-Spain study, 2008–2012. Descriptive statistical analyses and multivariate logistic mixed models were performed. The potential synergistic effect of rs4939827 and the Mediterranean diet pattern was evaluated with logistic regression in different strata of adherence to the Mediterranean diet and genotype. Results High adherence to the Mediterranean diet was statistically significantly associated with colorectal cancer risk. A decreased risk of CRC was observed for the CC compared to the TT genotype (OR = 0.65, 95% CI = 0.51–0.81) of the rs4939827 SNP. We also observed an association between the Mediterranean diet pattern (a protective factor) and rs4939827. Although the decreased risk for the CC genotype was slightly more pronounced in subjects with high adherence to the Mediterranean diet, there was no statistically significant synergistic effect between the CC genotype and adherence to the Mediterranean dietary pattern. Conclusion The SMAD7 gene, and specifically the C allele, could be protective against colorectal cancer. An independent protective association was also observed between high adherence to the Mediterranean diet pattern and CRC risk. Findings from this study indicate that high adherence to the Mediterranean diet pattern has a protective role in CRC, probably involving the Transforming Growth Factor-β pathway. Electronic supplementary material The online version of this article (10.1186/s12881-017-0485-5) contains supplementary material, which is available to authorized users.


Background
Colorectal cancer (CRC) has the third highest incidence in men and the second highest in women worldwide [1]. Its incidence is increasing in developing countries [2], probably because of the increasing prevalence of environmental factors that contribute to the development of CRC, for instance dietary pattern and physical activity [3]. On the other hand, mortality is decreasing in countries with specialized care and better screening services [4]. Gene-Environment (GxE) interactions may play an essential role in increasing the susceptibility to developing colorectal cancer [3]. Among these factors, adherence to the Mediterranean Diet Pattern deserves particular attention, since lower adherence to this pattern is becoming more common as a direct consequence of the general westernization of the population's lifestyle [5][6][7]. Assessing adherence to a Mediterranean dietary pattern is a particularly interesting approach that may help to understand the relationship between diet and CRC [8]. However, the effect of this pattern on health depends on many individual aspects, including genetic factors and polymorphisms. The lack of knowledge about diet-polymorphism interactions is a major problem for public health [9].
Earlier studies show a possible relationship between the diet pattern and the rs4939827 SNP in SMAD7 (SMAD family member 7), which has previously been associated with CRC. The SMAD7 gene acts as an inhibitor of the Transforming Growth Factor Beta (TGF-β) family by blocking the signaling pathway (Fig. 1) [10]. TGF-β comprises a family of cytokines, multifunctional peptides that control processes such as cell proliferation. In this way, when TGF-β is inhibited by the interaction with SMAD7, cell proliferation is promoted, and this could lead to the development of cancer. Another key gene in the indirect regulation of SMAD7 expression is SKI (SKI proto-oncogene), which, like SMAD7, acts by blocking the TGF-β target genes [11,12].
The interaction between the Mediterranean diet and the SMAD7 gene may be due to the action of other genes involved in the TGF-β pathway, in particular SMAD3 (SMAD family member 3), which is inhibited by SMAD7 [13]. This dietary pattern is characterized by a high fiber content, and Na-butyrate is produced when this fiber is fermented. This compound interacts with SMAD3, enhancing the signaling of this pathway [11]. This supports a real relationship between the dietary pattern and SMAD7 expression.
The goal of this article is to investigate the relationship between the rs4939827 SNP, the Mediterranean Diet Pattern and the risk of CRC.

Study population
MCC-Spain is a multicentric case-control study with population controls that aims to evaluate the influence of environmental exposures and their interaction with genetic factors in common tumors in Spain (prostate, breast, colorectal, gastroesophageal and chronic lymphocytic leukemia). Between September 2008 and December 2013, subjects between 20 and 85 years of age (matched by 5-year age interval) with histologically confirmed, newly diagnosed colon or rectum cancer (ICD-10: C18, C19, C20, D01.0, D01.1, D01.2) were recruited in 23 Spanish hospitals from 12 Spanish provinces. Simultaneously, population-based controls, frequency-matched by age, sex and region to the joint distribution of the tumors included in MCC-Spain, were randomly selected from primary care centers within the hospitals' catchment areas. All participants signed an informed consent form. Approval for the study was obtained from the ethical review boards of all recruiting centers. Additional information regarding the study design is provided elsewhere [14]. From the MCC-Spain study, 3496 individuals with available DNA samples for genetic analysis were included in this analysis: 1087 cases of colorectal cancer and 2409 controls.

Lifestyle information
A computerized epidemiological questionnaire including self-reported socio-demographic and anthropometric data, family history of cancer, environmental exposures, use of selected drugs, reproductive history and current and past lifestyle behaviors (including leisure time physical activity and sedentary lifestyle) was administered by trained personnel in a face-to-face interview at enrolment. Waist and hip circumferences were measured by the interviewer [14]. Subjects were provided with a previously validated semi-quantitative Spanish Food Frequency Questionnaire (FFQ) [15], which was modified to include regional products. The FFQ included 140 food items and assessed usual dietary intake during the previous year. It also covered the specific cooking methods for meat, with pictures to establish how thoroughly cooked participants prefer it. The FFQ was self-administered and returned by mail or filled out face to face (global response rate 88%). Frequency data were used to derive the amount (g/day, g/1000 kcal) of each individual food type and of summary variables. The food composition table was compiled from the Centre for Higher Studies in Nutrition and Dietetics (CESNID) and other specific sources [16]. Cross-check questions on food group intakes were used to adjust the frequency of foods eaten and to reduce misreporting of food groups with large numbers of items [17,18]. Data from this questionnaire were used to compute the Mediterranean diet adherence score proposed by Sofi F. et al. [19], which takes into account 9 food groups (vegetables, legumes, meat, fish, whole-grain cereals, fruit, dairy products, alcohol, olive oil). The score ranges from zero to eighteen points. From that score, we created subsets of individuals with low (score <9), medium (from 9 to 11 points) or high (score higher than eleven points) adherence to this kind of diet [12].
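The adherence categories described above can be written as a small helper. This is only an illustrative sketch in Python: the function name `adherence_category` is hypothetical and is not part of the study's analysis code; the cut-offs (<9, 9 to 11, >11 on a 0 to 18 scale) are the ones stated in the text.

```python
def adherence_category(score: int) -> str:
    """Map a 0-18 Mediterranean diet score to an adherence category.

    Hypothetical helper; cut-offs follow the text: low (<9),
    medium (9-11), high (>11).
    """
    if not 0 <= score <= 18:
        raise ValueError("score must be between 0 and 18")
    if score < 9:
        return "low"
    if score <= 11:
        return "medium"
    return "high"


if __name__ == "__main__":
    for s in (5, 9, 11, 12):
        print(s, adherence_category(s))
```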
Since the Mediterranean diet has a high fiber content and the SMAD7 gene can interact with it, an analysis of this factor was considered essential.
For physical activity, the last 10 years were taken into account, including recreational physical activity. Activity was expressed in METs (metabolic equivalents of task), a unit of metabolic expenditure. The responses obtained allowed us to create four subpopulations: sedentary, low physical activity, average physical activity and intense physical activity.

Sample processing
Peripheral blood (27 ml) was drawn from participants, aliquoted into whole blood, plasma, cellular fraction for DNA extraction, and serum, and stored at −80°C. For subjects who declined to donate blood, saliva was collected with the Oragene® DNA Kit and stored at room temperature until DNA extraction. Biological samples for DNA extraction, as well as toenail and hair samples, were collected from interviewed participants. In 4 centers (Madrid, Cantabria, Asturias and Huelva), cases and controls also donated urine samples (60 ml) that were aliquoted and frozen at −80°C. Fresh tumor biopsies or paraffin-embedded samples are available in all participating hospitals. Standardized basic clinical and pathological information on the diagnosis and treatment of tumors was collected from hospital records using a predefined format.
Genetic analyses were carried out within MCC-Spain and also through participation in international consortia. The Infinium HumanExome BeadChip from Illumina was used to genotype >200,000 coding markers plus 6000 additional custom variants on the pathways of interest [14].

SNP selection
After a literature search, several SNPs that had been associated with CRC were selected. These SNPs were processed with the PLINK software to verify that they were present in the MCC-Spain database. This file contains the basic data needed for statistical analysis (case-control status, SNPs, and the identifier of each individual). Finally, after an ad hoc evaluation of the polymorphisms, rs4939827 was selected because it was the only SNP considered relevant in relation to colorectal cancer (see Additional file 1: Table S1), and the statistical analyses described in the following section were performed on it.

Statistical methods
First, a descriptive statistical analysis was performed to determine the characteristics of our study population. For age, the mean and standard deviation were calculated. For the remaining variables, we calculated frequencies in cases and controls using STATA. Table 1 shows the variables used in the analysis and how they were classified.
Then, multivariate logistic mixed models, including the study region as a random effect term, were fitted to evaluate the association between genotypes of the selected polymorphisms and the risk of CRC. The same analysis was performed to assess the association of the components of the Mediterranean diet with colorectal cancer. Adjusted odds ratios and 95% confidence intervals were calculated with the homozygous genotype of the most common allele (T allele) as the reference. The minimally adjusted odds ratio was calculated by adjusting for sex, age and educational background. The adjusted odds ratios were calculated taking into account the variables mentioned before plus BMI, total energy intake, first-degree family history, race and METs.
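As a reminder of how an odds ratio and its 95% confidence interval follow from a logistic-regression coefficient, the sketch below exponentiates the coefficient and its Wald limits. It is an illustration only: the function name and the example values of `beta` and `se` are hypothetical and are not taken from the fitted MCC-Spain models, and the study's own estimates were obtained with mixed models in STATA.

```python
import math


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96):
    """Return (OR, lower, upper) for a log-odds coefficient.

    beta: estimated coefficient on the log-odds scale
    se:   its standard error
    z:    normal quantile (1.96 gives an approximate 95% Wald interval)
    """
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


if __name__ == "__main__":
    # Hypothetical coefficient and standard error, for illustration only.
    or_, lower, upper = odds_ratio_with_ci(beta=-0.43, se=0.12)
    print(f"OR = {or_:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1 corresponds to a coefficient that differs from zero at roughly the 5% level under the Wald approximation.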
Finally, linear regression was used to assess associations between polymorphisms and the study variables. For this, the online software SNPstats was used, mainly under the log-additive model, although we also examined the results of other models (dominant, codominant, recessive and overdominant) [20]. Using STATA, the association of the rs4939827 SNP with the Mediterranean diet pattern was evaluated with logistic regression, stratifying by adherence to the Mediterranean diet and taking the genotype into account.
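The inheritance models named above differ only in how the three genotypes are coded as regression covariates. The following sketch shows the conventional codings, assuming T is the reference allele and C the variant allele as in this study; the function names are hypothetical and this is not the SNPstats implementation.

```python
def encode_genotype(genotype: str, model: str) -> int:
    """Code a T/C genotype as a single covariate under an inheritance model.

    Assumes T is the reference allele and C the variant allele.
    """
    g = genotype.upper()
    if len(g) != 2 or set(g) - {"T", "C"}:
        raise ValueError(f"unexpected genotype: {genotype!r}")
    c_count = g.count("C")  # number of copies of the C allele (0, 1 or 2)
    codings = {
        "log-additive": c_count,            # per-allele effect
        "dominant": int(c_count >= 1),      # CT + CC vs TT
        "recessive": int(c_count == 2),     # CC vs TT + CT
        "overdominant": int(c_count == 1),  # CT vs TT + CC
    }
    if model not in codings:
        raise ValueError(f"unknown model: {model!r}")
    return codings[model]


def codominant_indicators(genotype: str) -> tuple:
    """Codominant model: separate indicators for CT and CC (TT is reference)."""
    c_count = encode_genotype(genotype, "log-additive")
    return int(c_count == 1), int(c_count == 2)


if __name__ == "__main__":
    for g in ("TT", "CT", "CC"):
        print(g, encode_genotype(g, "log-additive"), codominant_indicators(g))
```

The codominant model needs two indicator variables, which is why it is handled by a separate function here.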

Results
The initial study population consisted of 6090 individuals: 2140 cases and 3950 controls. Among them, 3496 had both genetic information and the dietary pattern available; the flow of missing data can be seen in Fig. 2. Table 1 shows the main characteristics of the individuals included in the analysis. In the analysis of missing data, significant differences were observed only for men (in both the case and control groups). The remaining variables did not show significant differences (data not shown).

Descriptive analysis
In this study, 3496 participants were included: 1087 cases (65.1% males; 34.9% females) and 2409 controls (54% males; 46% females). The characteristics of these individuals are described in Table 1. The data belong to a multi case-control study in which controls were frequency-matched to all tumor types together (colorectal, breast, prostate, leukemia, stomach), which may explain the mismatch in sex distribution between cases and frequency-matched controls.
In the study population, the average age was 66.6 (±10.4) years in cases and 63.1 (±11.5) years in controls. Physical activity habits indicate a sedentary population: 43.0% of cases and 35.5% of controls did not practice any physical activity. According to BMI, only 23.8% of cases and 31.5% of controls had an adequate weight; most of the population was overweight or obese.

Mediterranean diet pattern and components analysis
The components of the Mediterranean diet pattern by case-control status are shown in Table 2. There are significant differences in alcohol intake in the last 10 years between cases and controls; however, we found no clear evidence that alcohol intake within the Mediterranean diet is associated with CRC. The opposite is found for vegetables and legumes, where a high intake may be protective against colorectal cancer. On the other hand, meat and dairy products are risk factors when they are consumed in high quantities. Other components (whole-grain cereals, fish and olive oil) do not show significant differences. Fruit could not be analyzed because there were not enough subjects (Table 2). Regarding adherence to the Mediterranean dietary pattern, 68.6% of cases and 75.3% of controls have a high or medium adherence. All these variables show significant differences between cases and controls (Table 3).
Table footnote: The minimally adjusted odds ratio (OR) was calculated adjusting for sex, age (as a continuous variable) and educational level in a multivariate logistic mixed model. Area was used as a random variable. The adjusted odds ratios were calculated taking into account the variables mentioned before plus BMI, total energy intake, first-degree family history, race and METs. (N: number; CI: confidence interval.) Cut-offs are based on the score criteria of Sofi's Mediterranean pattern [19].

Associations of rs4939827 with CRC and other factors
The rs4939827 SNP was statistically significantly associated with colorectal cancer under several models. The most significant result was obtained under the log-additive model (p-value = 1e-04). The association of this polymorphism with CRC was also significant under other models: codominant (p-value = 2e-04), dominant (p-value = 6e-04) and recessive (p-value = 8e-04). The analysis of the association of this polymorphism with risk factors is also shown in the summary table.

rs4939827 associations with risk factors of this study
When the association between rs4939827 and risk factors was analyzed, statistically significant results were observed for the degree of adherence to the Mediterranean dietary pattern. The other factors assessed did not show a significant association with the SNP (data not shown). We conducted an analysis for rs4939827 with an interaction term to evaluate the independent main effect of the Mediterranean diet and the main effect of the protective rs4939827 allele (see Additional file 2: Table S2). However, none of the results reached a significant p-value, although the overdominant model came close to the 0.05 threshold (p-value = 0.07), suggesting that the C allele is the dominant one.
The CC genotype (16.1% in cases and 20.5% in controls) seems to be protective compared to the TT genotype (35.6% in cases and 30.3% in controls) (adjusted OR for CT vs TT = 0.82, 95% CI = 0.69–0.98; adjusted OR for CC vs TT = 0.65, 95% CI = 0.51–0.81). When we examine the association with the pattern of adherence to the Mediterranean diet (the only statistically significant association), high adherence appears as a protective factor (Fig. 3 and Table 3). However, we could not detect a synergistic effect between the CC genotype and adherence to the Mediterranean dietary pattern.
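As a rough plausibility check, a crude (unadjusted) odds ratio for CC vs TT can be computed directly from the genotype frequencies reported above. The sketch below is not the study's method: it ignores all covariates and the random effect for region, so it is not expected to equal the adjusted estimate of 0.65, and the function name is hypothetical.

```python
def crude_odds_ratio(case_exposed: float, case_reference: float,
                     control_exposed: float, control_reference: float) -> float:
    """Unadjusted odds ratio from counts or percentages in a 2x2 layout."""
    return (case_exposed / case_reference) / (control_exposed / control_reference)


if __name__ == "__main__":
    # Reported genotype frequencies (%): CC 16.1 (cases), 20.5 (controls);
    # TT 35.6 (cases), 30.3 (controls).
    crude_or = crude_odds_ratio(16.1, 35.6, 20.5, 30.3)
    print(round(crude_or, 2))  # prints 0.67
```

The crude value points in the same direction as the adjusted estimate and is of similar magnitude.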

Discussion
SMAD7 is a target gene of the TGFβ-SMAD signaling pathway. For TGFβ to signal, it must bind to its type II receptor. This binding stimulates the type I receptor, which binds to the above complex; the complete assembly is able to regulate the formation and activation (by phosphorylation) of the heterodimeric complex SMAD2/3. After activation, this complex joins the SMAD4 (SMAD family member 4) protein, which helps the translocation of the complex to the nucleus. In this organelle, the expression of TGFβ target genes is activated, including the SMAD7 gene and other genes responsible for proliferation inhibition and cell division. This happens when the SKI protein is not in the nucleus; SKI is capable of binding to the heterotrimeric complex and inhibiting its function, thereby inhibiting the expression of the SMAD7 gene (Fig. 1) [6,7].
Fig. 3 Association between SNP rs4939827 and Mediterranean dietary pattern adherence in relation to colorectal cancer, taking into account the genotype. Reference: TT genotype. Vertical axis: adjusted odds ratio (OR), calculated taking into account sex, age, socioeconomic level, BMI, total energy intake, first-degree family history and METs, with area used as a random variable. Horizontal axis: Mediterranean Diet Pattern adherence.
In this study of 1087 cases of CRC and 2409 controls, the CC genotype of the rs4939827 SNP was associated with a reduced risk of colorectal cancer. This may be due to the participation of SMAD7 in modulating the TGFβ pathway. The SMAD7 protein is capable of binding with TGFβ, causing its ubiquitination and destruction. In this way, the action of TGFβ is inhibited and cell proliferation can occur. To clarify the role of SMAD7 in cell growth in CRC, Halder et al. [21] overexpressed this gene, which produced higher cell growth than under normal conditions.
At the same time, it has been observed that inhibition of SMAD7 with a specific oligonucleotide reduces cell growth in CRC [22]. These data lead to the hypothesis that the CC genotype of rs4939827 is protective because the normal expression of the SMAD7 gene is inhibited, preventing cell proliferation and reducing the susceptibility to colorectal cancer.
This hypothesis is supported by Zhang B. et al. (2014) [23], who share the idea that the rs4939827 gene variant may be associated with increased CRC survival. Similarly, the results of Slattery M. et al. (2010) [24] agree with ours, as they show that the CC genotype is inversely associated with the risk of colon cancer; they conclude that individuals carrying the homozygous rs4939827 variant genotype show a 27% reduced risk of colon cancer. Thompson C.L. et al. (2009) [25] analyzed the SNP under the dominant model, stratifying the population by gender. They found that the C allele was associated with a decreased risk of CRC in women, but they did not obtain significant results in men.
The CC genotype of this polymorphism has also been associated with other cancers. For example, according to Geng T.T. et al. (2015) [26], under the dominant model the rs4939827 polymorphism was significantly associated with a 0.67-fold decreased risk of esophageal cancer, possibly due to inhibition of the SMAD7 gene.
The T allele must also be taken into account. Although the T allele does not show a significant risk in our results, many authors support the view of Jung K.J. et al. (2015) [27], who defended the hypothesis that the T allele of this SNP is a risk factor for colorectal cancer and rectal cancer. Furthermore, according to the analysis by Baert-Desurmont S. et al. (2015) [28], the risk increases with the number of T alleles present in the genotype. Ho J.W. et al. (2011) [29] analyzed the risk of the TC heterozygous and TT homozygous genotypes. They identified both as risk genotypes, but only the homozygous genotype showed a significant result. Tenesa A. et al. (2008) [30], after taking into account the location of the tumor, noted that the T allele is a risk factor mainly for rectal cancer and not for colon cancer. We cannot verify that information with our results, since we did not identify the T allele as a risk allele; only the C allele could be identified as a protective allele. Tenesa A. et al. (2010) [31] also associated the T allele of the rs4939827 SNP with increased mortality. Passarelli M.N. et al. (2011) [32] found that SMAD7 variants that inhibit TGFβ completely may reduce its tumor suppressor activity (resulting in an increased risk), but can also reduce its metastasis-promoting activity (resulting in slower tumor progression and improved survival). Finally, Yao et al. (2015) [33] support the view that the T allele increases the risk of CRC in the Caucasian population.
When we analyze the association of the SNP with diet, our results are statistically significant. However, few authors refer to the association of this factor with the rs4939827 polymorphism. As mentioned before, the association with diet is attributed to the TGFβ signaling pathway genes that interact with SMAD7. There is evidence that this association may be due to the ingestion of high amounts of fiber in the Mediterranean diet. Vegetables and fruit, for example, are highly fermentable fiber sources, while products such as wheat bran are poorly fermentable. The difference lies in the amount of short-chain fatty acids formed after fermentation of the fiber by colonic bacteria, as described by Nguyen K.A. et al. (2006) [11]. Thus, food containing fermentable fiber (fruit and vegetables) will produce a higher amount of short-chain fatty acids than food containing poorly fermentable fiber. Therefore, there is a positive relationship between the consumption of large amounts of fiber and a lower risk of colon cancer. This may be because Na-butyrate (Na-B, the main product of the fermentation of fiber) is able to induce cell cycle arrest, cellular differentiation and even apoptosis. It seems that Na-B is capable of selectively inducing Smad3 phosphorylation, so the activity of the TGF-β pathway is enhanced [6,7]. Despite this, we did not find a statistically significant interaction in our results. This may be because the study did not have enough statistical power, or because the actual interaction takes place with SMAD3. As a result, TGF-β is never ubiquitinated and degraded, so the cyclin-dependent kinase inhibitors that restrain cell proliferation (p15, p21 and p27) are expressed [34]. Besides, one of the functions of SMAD7 is to produce cell cycle arrest in the G1 phase. However, because of this mutation, this activity cannot be carried out. Na-B helps to compensate for this function, further minimizing the risk of uncontrolled cell proliferation.
One limitation of this work is that the genotype × diet effect could be due to fiber intake rather than to adherence to the Mediterranean diet, although both concepts are related.

Conclusions
The SMAD7 gene plays an important role in colorectal cancer, as the C allele is protective against this disease. We also observed a protective association between the rs4939827 SNP and high adherence to the Mediterranean diet pattern, and we suppose that high adherence to this pattern probably has a protective role. Both factors may therefore act together through the TGFβ pathway in colorectal cancer.

Additional files
Additional file 1: Table S1. Analysis of SNPs related to CRC. The minimally adjusted odds ratio (OR) was computed adjusting by sex, age, educational level. Area, places where cases and controls were recruited, was used as random variable. Association is evaluated for carrying 1 or 2 SNP minor frequency alleles relative to a reference of zero SNP minor frequency allele.