Potential contribution of SIM2 and ETS2 functional polymorphisms in Down syndrome associated malignancies

Background: Proper expression and functioning of transcription factors (TFs) are essential for the regulation of different traits and could thus be crucial for the development of complex diseases. Subjects with Down syndrome (DS) have a higher incidence of acute lymphoblastic leukemia (ALL), while solid tumors, such as breast cancer (BC) and oral cancer (OC), are rare. Triplication of human chromosome 21 in DS is associated with altered genetic dosage of different TFs. V-ets erythroblastosis virus E26 oncogene homolog 2 (ETS2) and Single Minded 2 (SIM2) are two such TFs that regulate several downstream genes involved in developmental and neurological pathways. Here we studied functional genetic polymorphisms (fSNPs) in the genes encoding ETS2 and SIM2 in a group of patients and control subjects to better understand the association of these variants with DS phenotypes.
Methods: We employed an in silico approach to identify potential target pathways of ETS2 and SIM2. fSNPs in the genes encoding these two TFs were identified using available databases. Selected sites were genotyped in individuals with DS, their parents, patients with ALL, BC or OC, and ethnically matched control individuals. We further analyzed these data by population-based statistical methods.
Results: Allelic/genotypic association analysis showed significant (P < 0.03) differences for rs2070530, rs1051476, rs11254 and rs711 in DS subjects compared to controls. rs711 also exhibited a significantly different genotypic distribution pattern in parents of DS probands (P < 0.02) and BC patients (P < 0.02). Interaction analysis revealed an independent main effect of rs711 in all the groups, while rs11254 exhibited an independent main effect in DS subjects only. High entropy values were noticed for rs461155 in the solid tumor groups. Significant interactive effects of rs2070531 with rs1051475, rs1051476 and rs11254 were observed in all the groups except DS.
Conclusions: We infer from the present investigation that the difference in frequencies of fSNPs and their independent as well as interactive effects may be the cause of altered expression of SIM2 and ETS2 in DS and malignant groups, which affects different downstream biological pathways. Thus, altered expression of SIM2 and ETS2 could be one of the reasons for the variable occurrence of different malignant conditions in DS.


Background
Transcription factors (TFs) regulate pathways related to diseases either through their direct action on the target genes or by controlling downstream pathways. Hence they are important candidates for investigating the etiology of complex diseases. There are several TF encoding genes on human chromosome 21 (HSA21), and deregulated expression of any of these could influence downstream pathways. Due to trisomy of HSA21 in Down syndrome (DS) (MIM# 190685), genetic overdosage of a number of TF encoding genes is a distinct possibility. DS patients are prone to acute leukemia, including acute lymphoblastic leukemia (ALL), while solid tumors, especially breast cancer (BC), are rare [1]. We hypothesized that DS related abnormalities like intellectual disability, immunological imbalance, hormonal alteration, and predisposition to childhood acute leukemia could be due to improper expression and functioning of TFs located on HSA21. Because disease association studies have revealed a higher differential expression ratio in different tissues for the HSA21 genes encoding Single minded 2 (SIM2) and V-ets erythroblastosis virus E26 oncogene homolog 2 (ETS2) [2], here we explored the role of these two TFs in the DS phenotype and related malignancies.
SIM2 is important for normal neuronal development. SIM2 can heterodimerize with aryl hydrocarbon receptor nuclear translocator (ARNT) and translocate to the nucleus to transcriptionally regulate gene expression [3]. Expression of SIM2 mRNA has been detected in fetal brain regions associated with DS pathology [4]. SIM2 also plays an important role in carcinogenesis. After entry into a cell, carcinogenic compounds bind to the cytoplasmic Aryl hydrocarbon receptor (AhR) and are carried to the nucleus. Ligand-bound AhR together with ARNT [5] bind to the Xenobiotic Response Element present in the promoter region of certain genes encoding for oxidative enzymes [6][7][8]; transcriptional activation of these enzymes accelerates carcinogen metabolism [9]. SIM2 inhibits AhR/ARNT dimerization, thereby inhibiting carcinogen metabolism and promoting carcinogenesis [5,10]. In addition, SIM2 is the second most consistently over expressed gene in prostate cancer [11] and over expression of the short isoform of SIM2 (SIM2s) is reported in malignant colon, pancreas, and prostate tissues as compared to the corresponding normal tissues [9][10][11]. SIM2 has further been proposed to have a breast tumor suppressive activity [12] and a genome-wide linkage scan identified three putative breast cancer susceptibility loci, one of which (21q22) harbors SIM2 [13]. Therefore, SIM2 functions as a tumour selective marker and drug target in several types of malignancies [10].
Besides SIM2, ETS2 over expression induces craniofacial defects as well as skeletal anomalies in transgenic mice resembling DS [14]. Increased rate of neuronal apoptosis [15] and amyloid precursor protein (APP) gene transactivation are also observed upon ETS2 over expression [16], which might play an important role in the early onset of Alzheimer's disease and neuronal abnormalities in DS [16].
Given the functional importance of SIM2 and ETS2, we set out to investigate the role of alterations in their expression in disease etiology. Functional single nucleotide polymorphisms (fSNPs) in candidate genes are important indicators of their association with disease phenotypes. In the present study, fSNPs of SIM2 and ETS2 were analyzed for their potential role in Indian individuals suffering from DS, ALL and solid tumors, including BC and oral cancer (OC).

Methods
In silico analysis to predict pathways regulated by SIM2 and ETS2
We used computational methods to determine the probable pathways regulated by SIM2 and ETS2. The promoter sequences (from −5000 bp to +1000 bp) were retrieved from the Eukaryotic Promoter Database (EPD) (http://www.epd.isb-sib.ch/) and the Transcriptional Regulatory Element Database (TRED) (http://rulai.cshl.edu/TRED). The presence of SIM2 and/or ETS2 binding sites in the promoter sequences was identified by a Perl based program, "Consensus-Finder".
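The binding-site search described above can be illustrated with a minimal sketch. This is not the Perl "Consensus-Finder" program, whose internals are not described here; it only shows one plausible way to scan both strands of a promoter for a consensus written in IUPAC codes. The function names, the coordinate convention (index 0 corresponds to −5000) and the example motif `GGAW` are illustrative assumptions, not the consensus sequences used in the study.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(promoter, consensus, upstream=5000):
    """Return sorted (position, strand) pairs for every consensus match.

    Hypothetical convention: the promoter string starts `upstream` bases
    before the transcription start site, so string index 0 is reported as
    position -upstream. Reverse-strand hits are reported at the
    forward-strand coordinate of their leftmost base.
    """
    promoter = promoter.upper()
    # A lookahead lets overlapping occurrences be reported as well.
    body = "".join(IUPAC[base] for base in consensus.upper())
    pattern = re.compile("(?=(" + body + "))")
    hits = [(m.start() - upstream, "+") for m in pattern.finditer(promoter)]
    n, k = len(promoter), len(consensus)
    for m in pattern.finditer(reverse_complement(promoter)):
        hits.append((n - m.start() - k - upstream, "-"))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence and placeholder motif, for illustration only.
    print(find_sites("TTGGAATT", "GGAW", upstream=0))
```

A gene would then be counted as a putative common target when the scan returns at least one hit for each of the two factors' motifs.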
We retrieved the expression profiles of the genes harboring binding sites for SIM2 and ETS2 in different tissues using the GNF SymAtlas database (http://symatlas.gnf.org/SymAtlas/). Tissue-specific differential expression patterns (fold change) were calculated separately by comparison with the median expression value; a value greater than the median was considered over expression and a value less than the median was considered under expression. We then determined co-expression of putative target genes at the sites of over expression/under expression of SIM2 and ETS2, and promoter sites of these genes were analyzed by the GENEDOC and Promoter Scan tools respectively (http://www-bimas.cit.nih.gov/molbio/proscan/). Functions of these putative target genes, along with SIM2 and ETS2, in various biological pathways were analyzed by Panther (http://www.pantherdb.org/pathway/) and KEGG pathway (http://www.genome.ad.jp/kegg/pathway.html). The entire process is presented schematically in Figure 1.
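The median-based rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the input layout (one expression value per tissue for a single gene) and the definition of fold change as value divided by the median are assumptions for illustration, and the numbers are made up.

```python
from statistics import median


def classify_expression(values_by_tissue):
    """Label each tissue as 'over', 'under' or 'median' for one gene.

    Returns the median across tissues and, per tissue, a tuple of
    (call, fold_change). Fold change is taken here as value / median,
    which is an assumption about how the ratio was formed.
    """
    med = median(values_by_tissue.values())
    calls = {}
    for tissue, value in values_by_tissue.items():
        if value > med:
            call = "over"
        elif value < med:
            call = "under"
        else:
            call = "median"
        fold = value / med if med > 0 else None
        calls[tissue] = (call, fold)
    return med, calls


if __name__ == "__main__":
    # Made-up expression values for a single hypothetical gene.
    med, calls = classify_expression({"brain": 10.0, "liver": 2.0, "heart": 5.0})
    print(med, calls)
```

Co-expression with SIM2 and ETS2 could then be assessed by comparing the per-tissue calls of a candidate gene with the calls obtained for the two TFs.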

Subjects
Five ethnically matched groups of individuals were recruited for analysis of fSNPs. Healthy volunteers, without any clinical history of intellectual disability or malignant disorder, were recruited as controls (N = 149). Nuclear families having a child with DS (N = 132) were recruited from the outpatient department of Manovikas Kendra, Kolkata, and the trisomic status of the probands was confirmed by karyotyping. ALL patients (N = 38) were recruited from the Netaji Subhash Chandra Bose Cancer Research Institute, Kolkata. Genomic DNA from post-operative normal tissue adjacent to malignant BC (N = 49) and OC (N = 54) was collected from the Chittaranjan National Cancer Research Institute and the Indian Institute of Chemical Biology, Kolkata, respectively. All samples were acquired after obtaining informed written consent for participation. The Institutional Human Ethical Committee approved the study protocol.

Sample collection, DNA isolation and genotyping
Peripheral blood (~5 ml) collected from control individuals, DS probands, their parents and ALL patients was used for extraction of genomic DNA [33]. Target sequences were amplified and PCR amplicons were subjected to genotyping (Table 2).

Statistical analyses
Difference in allelic and genotypic frequency of the studied fSNPs in different study groups as compared to control was calculated by simple r x c contingency table (http://www.physics.csbsju.edu/stats/contingency_NROW_  [35]. Interaction among the genotypes of SIM2 and ETS2 was analyzed by multifactor dimensionality reduction (MDR) software (version 2.0 beta 8.1) [36] and values were expressed as   [37]. Genotype data of four fSNPs (rs461155, rs1051425 in ETS2 and rs2073601, rs2073416 in SIM2) were also included for LD, haplotype and SNP-SNP interaction analysis. For convenience, triplicate homozygous genotypes were considered as diploid homozygous genotypes in DS probands while the triplicate heterozygous genotypes were considered as the diploid heterozygous genotype for all the calculations to compare with respective reference diploid groups [32,38].
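Two steps described above lend themselves to a short sketch: collapsing trisomic genotypes to their diploid equivalents and computing a Pearson chi-square statistic on an r × c contingency table. The code below is an illustrative reimplementation, not the online calculator used in the study; function names and the genotype string encoding (for example `"AGG"`) are assumptions. The closed-form P values cover only 1 and 2 degrees of freedom, which correspond to a 2 × 2 allelic table and a 2 × 3 genotypic table.

```python
from math import erfc, exp, sqrt


def collapse_to_diploid(genotype):
    """Map a trisomic genotype string to the diploid genotype used for comparison.

    Triplicate homozygotes (e.g. 'GGG') become diploid homozygotes ('GG');
    any triplicate heterozygote (e.g. 'AAG' or 'AGG') becomes 'AG'.
    """
    alleles = sorted(set(genotype))
    if len(alleles) == 1:
        return alleles[0] * 2
    if len(alleles) == 2:
        return "".join(alleles)
    raise ValueError("expected a biallelic genotype, got %r" % genotype)


def chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_p(stat, dof):
    """Upper-tail P value; closed forms exist for 1 and 2 degrees of freedom."""
    if dof == 1:
        return erfc(sqrt(stat / 2.0))
    if dof == 2:
        return exp(-stat / 2.0)
    raise ValueError("closed form only provided for 1 or 2 degrees of freedom")


if __name__ == "__main__":
    # Made-up allele counts (rows: cases, controls; columns: allele 1, allele 2).
    stat, dof = chi_square([[10, 20], [20, 10]])
    print(stat, dof, chi_square_p(stat, dof))
```

Collapsing to diploid genotypes discards allele dosage in the probands; it is a convenience that allows trisomic and diploid groups to share the same genotype categories in one table.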

Results
In silico analysis to predict pathways regulated by SIM2 and ETS2
Computational expression analysis by GNF SymAtlas showed that all the splice variants of SIM2 and ETS2 were over expressed in 13 tissues and under expressed in 12 tissues (Additional file 1: Table S1). Both SIM2 and ETS2 binding sites were identified in 464 genes by the 'Consensus-Finder' program from the eukaryotic promoter database. These putative target genes of SIM2 and ETS2 were sorted into four groups (Additional file 1: Table S2). Gene set I contains 71 genes, which showed over expression in all the tissues where SIM2 and ETS2 were also over expressed. Gene set II comprises 9 genes, which showed down regulation in all the tissues where SIM2 and ETS2 were also down regulated. The 3rd and 4th sets of genes exhibited the reverse pattern of expression as compared to SIM2 and ETS2. In addition, SP1 and AP2 were identified as common TFs for both SIM2 and ETS2 target genes. Target pathway identification by Panther and KEGG (Table 3)  Interestingly, genes like KLK8, LCK, RELA, S100A8, S100A9, TRRAP, and GATA3 are known to have roles in malignant development.

In silico identification of functional variants
Different in silico tools identified functional genetic variants in SIM2 and ETS2. Among them, thirty five (seven in SIM2 and twenty eight in ETS2) SNPs were genotyped in this study. Functional significance of the SNPs is indicated in Table 1.

Allelic and genotypic frequency distribution
Comparative analysis of MAF in different populations revealed significant differences for many SNPs (rs374575, rs2070529, rs2070530, rs1051476, rs11254 and rs711 in CEU; rs2269188 and rs7276961 in HCB; rs2269188, rs2070529 and rs2070530 in JPT; rs2269188, rs374575, rs2070529 and rs711 in YRI) (Table 4). Among the seven SNPs studied in SIM2 (Table 1), only rs2269188 was polymorphic in the studied population. This SNP showed a significant difference in allelic (χ² = 6.333, P = 0.012, Power = 82.3%) and genotypic (χ² = 6.41, P = 0.041, Power = 74.17%) frequency only in ALL compared to the control (Additional file 1: Table S3). However, the differences were not significant after correction for multiple testing. Twenty eight SNPs in the ETS2 genomic region were analyzed and ten of them were polymorphic in the studied population. Eight of the ten ETS2 SNPs (rs374575, rs2070529, rs2070530, rs2070531, rs6517481, rs7276961, rs1051475 and rs1051476) did not show any significant difference in allelic frequency in DS probands, their parents and malignant groups (Additional file 1: Table S3). rs11254 showed a marginal allelic association in DS probands (P = 0.04712), which did not survive Bonferroni (BF) and Benjamini-Hochberg (BH) correction for multiple testing (Table 5). rs711 showed a significant increase in the 'G' allele frequency in probands with DS (χ² = 8.51, BF P and BH P = 0.03, Power = 43.47%) as compared to controls. Although a significant increase in the 'G' allele (χ² = 6.83, P = 0.00895, Power = 85.03%, OR = 2.6) was noticed in ALL patients, it was found to be only marginally significant after correction for multiple testing (BH P = 0.06). On the other hand, a significant increase in the 'A' allele (χ² = 9.91, BF P and BH P = 0.01, Power = 88.26%) was observed in BC patients (Table 5).
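The Bonferroni and Benjamini-Hochberg adjustments referred to above can be sketched in a few lines. This is the textbook form of both procedures; the software and the number of tests actually used in the study are not stated here, so the P values in the example are made up and the function names are illustrative.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted P values: each P multiplied by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.5]  # made-up raw P values
    print(bonferroni(raw))
    print(benjamini_hochberg(raw))
```

Bonferroni controls the family-wise error rate and is the more conservative of the two, which is why an association can be lost under BF while remaining marginal under BH.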

LD and haplotype analysis
SNP pairs that showed higher LD (high D' or r² value) in at least one combination, or different LD patterns in control and case groups during pairwise analysis by Haploview 4.1, were sorted out. In control individuals and parents of probands with DS, all the studied SNPs exhibited strong LD (Table 6). In particular, the rs6517481-rs7276961, rs1051475-rs1051476, rs2070529-rs2070530, rs2070531-rs6517481 and rs2070531-rs7276961 pairs exhibited strong LD in the other studied groups. Some paired combinations showed different LD patterns in different disease groups. For instance, rs11254 showed weak LD with all the sites in the DS and BC groups, while rs461155 showed weak LD in OC. Statistically significant differences in the frequency of several haplotypes were noticed between test and control groups (Figure 2). Notably, the 'A-C-C-C-T-C-C-A-A-T-C-C-C-G-G' haplotype showed a significant frequency difference in BC, DS probands and their parents when analyzed by Unphased. However, comparison by simple Chi-square test followed by analysis of the power of association by Piface (Additional file 1: Table S4) showed a statistically significant difference only for the DS probands and BC groups (p value 0.054 and 0.013 respectively).
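For reference, the two pairwise LD measures mentioned above can be computed from haplotype frequencies as in the sketch below. It assumes the two-locus haplotype frequency is already known; Haploview estimates it from unphased genotypes, a step that is not reproduced here. The function name and the example frequencies are illustrative.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at the first
    locus and allele B at the second; p_a and p_b are the allele frequencies.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d_prime, r_squared


if __name__ == "__main__":
    # Made-up frequencies: two alleles that always travel together.
    print(ld_measures(0.5, 0.5, 0.5))
```

D' reaches 1 whenever at least one of the four possible haplotypes is absent, whereas r² reaches 1 only when the two loci are fully predictive of each other, so a pair can show high D' together with low r².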

Discussion
The present study was aimed at identifying the possible involvement of SIM2 and ETS2, two TFs known to have gene overdosage in probands with DS exhibiting trisomy of HSA21. To identify SIM2 and ETS2 targets, we focused on 464 genes containing binding sites for both these factors in their regulatory regions (−5000 bp to +1000 bp). Following categorization based on expression pattern by GNF SymAtlas, 91 genes were identified as up- or down-regulated by these TFs (Additional file 1: Table S2). Genes like ABP1, HRB2, S100A8, THBS1, CYB561, GATA1, GATA3, SP1 and AP2 indicated potential activation by SIM2 and ETS2 (gene set I and II), while genes such as GCNT2, MASP1, LOC338328, PCSK4, ICAM1, LPPR4, SLC25A21, H1F0 and ATP1A1 indicated potential repression by these TFs (gene set III and IV). Many of the genes with binding sites for SIM2 and ETS2, viz. KLK8, LCK, TRRAP, GATA3, etc., were earlier reported to have a role in neurological as well as malignancy related pathways [28,39,40]. Analysis in the present study by Panther also revealed that genes such as KLK8, KRT16, and LCK, carrying binding sites for SIM2 and ETS2, are involved in the development and function of the neurological system. Hence over expression of SIM2 and ETS2 might alter expression of the downstream target genes, leading to different DS phenotypes. Previous analyses of DS yielded inconsistent observations on the expression of genes on HSA21 and other autosomes. For instance, a dosage dependent increase in transcription across different tissue/cell types was noticed in DS [41]. Analysis of lymphoblastoid cell lines generated from unrelated individuals revealed over expression of several HSA21 genes even in normal healthy volunteers [42]. In contrast, gene expression profile analysis of hearts of human fetuses with trisomy of HSA21 showed significant downregulation of 278 genes and upregulation of 195 genes as compared to controls [43].
On the other hand, serial analysis of gene expression in lymphocytes from children with DS revealed modest deregulation of autosomal genes [44]. Whole genome microarray in adult DS brains showed upregulation of 27% of genes on HSA21 as compared to 4.4% of genes on other autosomes [45]. Contrary to that, microarray analysis of cultured amniocytes and chorionic villus cells from fetuses with trisomy 13, 18, or 21 revealed lack of over expression of most of the HSA21 genes with only modest changes for genes on all other chromosomes [46]. It is possible that the differences in gene expression in HSA21 and other autosomes are due to the tissue of origin [47].
Differential expression of SIM2 and ETS2 target genes was also reported in different malignancies. For instance, the TRRAP gene, involved in transcriptional regulation and DNA repair, was found to be high in bone metastases from prostate cancer, intermediate in BC, and low in lung and kidney cancers [39]. KLK8 was upregulated in colorectal cancer and ovarian cancer while underexpressed in esophageal and cervical cancer [40,48]. Differential gene expression profiling of approximately 8000 genes in sixty different cancer cell lines revealed difference in gene expression pattern to be correlated with the tissue of origin and the physiological properties (e.g., doubling time, drug metabolism, and interferon response) of cell lines [49]. Difference in expression between specific cancer cell line and their nonmalignant counterparts was also noticed [49]. It is intriguing to note that genes like MAGEA3 and ATP1A1, which indicated potential over expression in our study, are also over expressed in leukemia/lymphoma [48,[50][51][52]. However, THBS1, which also indicated potential upregulation in the present study, was down regulated in leukemia and upregulated in lymphoma [48,53]. Thus, it remains unclear whether differential expression is also taking place for genes identified by our present in silico analysis. Further validation, involving expression analysis in various tumor tissues of individuals with DS, will be necessary.
Our next goal was to identify fSNPs in these two TFs. A number of SNPs with deleterious effects were identified in both the genes by our in silico approach. Analysis of allelic frequencies showed significant differences in MAF for the Indian control population as compared to other Asian, i.e. Japanese and Chinese, as well as Caucasian populations. Frequency distribution analysis revealed that the rs2269188 'G' allele was significantly more frequent in ALL subjects, but this did not survive correction for multiple testing. rs711 is a site for SR protein mediated splicing regulation and may generate splice variants. In the Korean population, rs711 was reported to be associated with increased risk for acute myeloid leukemia [54]. In the present study, the difference in allelic frequency for this site showed a trend toward significance in ALL even after correction for multiple testing, while DS probands, parents of DS probands and BC showed significant differences. MDR analysis supported evidence of an individual effect of this SNP in all the studied groups.
rs11254, rs2070530 and rs1051476 showed significant difference in genotype distribution in DS probands (BH P = 0.001, 0.01 and 0.03 respectively). Though there was individual effect of these SNPs (29.78%, 3.04%, and 2.56% respectively), no significant synergistic effect was observed. rs11254 showed a very high individual effect (29.78%) in DS probands which could be due to 100% reduction in heterozygosity. On the other hand in malignant groups, rs11254 showed interactive effect in synergistic mode with other SNPs (rs2070530, rs2070531, rs6517481, rs7276961, rs1051475 and rs1051476). Therefore, this SNP may act differently in DS and other malignant groups.
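The individual and synergistic effects discussed above are entropy-based quantities. The sketch below shows the underlying arithmetic under one assumption: that the reported percentages correspond to information gain about case/control status, that is, mutual information for a single SNP and interaction information for a pair. It is not the MDR software, and the function names and toy data are illustrative.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete observations."""
    n = len(values)
    return -sum((count / n) * log2(count / n) for count in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y): the individual effect of X on Y."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_gain(a, b, y):
    """I(A,B;Y) - I(A;Y) - I(B;Y): positive for synergy, negative for redundancy."""
    joint = list(zip(a, b))
    return mutual_information(joint, y) - mutual_information(a, y) - mutual_information(b, y)


if __name__ == "__main__":
    # Toy data in which status depends only on the combination of two loci.
    snp_a = [0, 0, 1, 1]
    snp_b = [0, 1, 0, 1]
    status = [0, 1, 1, 0]
    print(mutual_information(snp_a, status), interaction_gain(snp_a, snp_b, status))
```

In the toy data neither locus is informative alone, yet the pair determines status completely, which is the pattern described as synergy; a SNP with a large individual effect and no positive interaction gain matches the behaviour reported for rs11254 in DS probands.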
While comparing differences in haplotype frequencies generated by fifteen SNPs, we analyzed each pair by simple Chi-square tests to avoid errors due to multiple comparisons. The 'A-C-C-C-T-C-C-A-A-T-C-C-C-G-G' haplotype showed a statistically significant higher occurrence in the control group compared to DS probands and BC. The frequency of this haplotype was also higher than that of the other haplotypes generated from these 15 SNPs, and it may confer protection against the diseases.
MDR analysis exhibited a high individual entropy value for rs461155 in both the BC and OC groups. Involvement of the risk allele of rs461155 in subjects with these two solid tumors has also been reported earlier [32]. Therefore, from the present study we predict that rs461155 may individually play an important role in the solid tumor groups (BC and OC). On the other hand, rs2070530, rs2070531, rs6517481, rs7276961, rs1051475, rs1051476 and rs11254 may act together in the ALL, BC and OC groups, where rs11254 acts as a nodal SNP. In silico analysis revealed that rs11254 has the potential to change miRNA and TF binding sites in the 3'UTR of ETS2. The presence of the risk allele and inappropriate interaction of rs11254 can probably hamper proper expression of ETS2. There are various reports on loss of heterozygosity (LOH) of different genes under different malignant conditions like ovarian tumors [55], BC [56], head and neck squamous cell carcinoma [57], pituitary tumors [58], AML [59] etc. We found 100% LOH for rs11254 in DS probands.
Analysis of LD pattern of studied SNPs exhibited that rs6517481, rs7276961, rs1051475, rs1051476, rs2070529, rs2070530, rs2070531, rs461155 and rs11254 are in high LD in the studied population. MDR analysis also provided evidence of interaction between these SNPs in the malignant groups and parents of probands with DS and thus, may suggest combined effect of these fSNPs in the studied groups.
Similar to the present observation, SNP pairs rs2070529-rs2070530 were found to be in high LD in other populations studied in the HapMap; LD data for other SNP pairs were not available. Both haplotype distribution pattern and LD between different SNPs were found to vary in different groups examined in the present investigation, which could be attributed to the difference in allelic frequencies. Whether the observed difference is contributing to the disease etiology requires further analysis.
Our results do not imply that ETS2 and SIM2 are the only TFs on HSA21 with a role in oncogenesis, because several other TFs located on HSA21 are also associated with malignancies [31,60]. For example, increased expression of BACH1 (transcriptional regulator of megakaryocytic differentiation process) and SON (homologous sequence with MYC family of oncoproteins) was reported in association with myeloid leukemia in DS [61]. RUNX1 and ERG were hypothesized as candidates for leukemia in non-DS patients; however, triplicate dosages of these two genes were incapable of generating transient myeloproliferative leukemia in Ts1Cje mice and thus, these two genes may not be directly responsible for the development of leukemia in individuals with DS [62]. Further analysis of these TFs, in association with SIM2 and ETS2, would help us to understand their actual role in DS associated malignancies.

Conclusions
In summary, a) the rs2269188 'G' allele, showing a trend for higher occurrence in ALL patients (BH P = 0.06, OR = 2.6), may play a regulatory role in ALL by altering carcinogen metabolism; in mothers of probands with DS this SNP may also have some regulatory role, as the individual effect of this SNP calculated by MDR analysis was very high (Table 7); b) rs711 may have a very important role in DS and associated malignancies; c) the fSNP rs11254 may act as a core SNP in the interaction cluster of rs6517481, rs7276961, rs1051475, rs1051476, rs2070529, rs2070530 and rs2070531, thus playing a role in malignant development in BC, OC and ALL; in parents of DS probands these SNPs also showed strong interaction, while in DS a high individual effect of rs11254 was found; and d) rs2070530, rs711 and rs11254 (with 100% LOH) showed strong genotypic association with DS. This prominent difference in the status of fSNPs of SIM2 and ETS2 may indicate a significantly different pattern of SIM2 and ETS2 regulation in the studied groups, eventually leading to altered expression of their downstream genes associated with distinct disease phenotypes.

Additional files
Additional file 1: Table S1: Sites of overexpression and underexpression of SIM2 and ETS2. Table S2: Possible downstream genes of SIM2 and ETS2 identified by in silico analysis. Table S3: Details of allelic and genotypic association tests of the studied SNPs. Table S4: Comparative analysis of haplotypes in the different study groups.