Risk variants in BMP4 promoters for nonsyndromic cleft lip/palate in a Chilean population

Background Bone morphogenetic protein 4 gene (BMP4) plays a key role during maxillofacial development, as shown by the orofacial clefts observed in animals when this gene is conditionally inactivated. We recently reported an association between nonsyndromic cleft lip/palate (NSCLP) and BMP4 polymorphisms, having detected transmission deviations in case-parent trios for haplotypes spanning a region that contains a BMP4 promoter. The aim of the present study was to search for possible causal mutations within the BMP4 promoters (BMP4.1 and BMP4.2). Methods We analyzed the sequence of BMP4.1 and BMP4.2 in 167 Chilean NSCLP cases and 336 controls. Results We detected three novel variants in BMP4.1 (c.-5514G > A, c.-5365C > T and c.-5049C > T) that could be considered cleft risk factors because they were absent in controls. Additionally, carriers of the rs2855530 G allele (BMP4.2) showed an increased risk of NSCLP restricted to males (OR = 1.52; 95% C.I. = 1.07-2.15; p = 0.019). For this same SNP, the dominant genotype model showed a higher frequency of G/G+G/C and a lower frequency of C/C in cases than in controls, both in the total sample (p = 0.03) and in the male sample (p = 0.003). Bioinformatic analysis predicted that all the risk variants detected in this study could create new transcription factor binding motifs. Conclusions The sex-dependent association between rs2855530 and NSCLP could be indirectly related to the differential gene expression observed between sexes in animal models. We conclude that the risk variants detected herein could potentially alter BMP4 promoter activity in NSCLP. Further functional and developmental studies are necessary to support this hypothesis.


Background
Nonsyndromic cleft lip with or without cleft palate (NSCLP [MIM 119530]) is one of the most common human craniofacial birth defects, with both genetic and environmental components involved in its etiology [1]. Its prevalence ranges from 1/300 to 1/2500 depending on the ethnic origin of the population [2]. The rehabilitation of NSCLP involves a wide variety of primary and secondary medical complications. Together with the medical costs and the emotional burden on patients and their families, this makes NSCLP a public health problem [3].
The identification of genetic risk factors for NSCLP has been the subject of intensive research over the last decades. In the past few years the list of NSCLP candidate genes has increased rapidly, and their study has mainly focused on the search for coding mutations [4]. Some reports have estimated the contribution of individual genes to NSCLP at 2% for MSX1, 12-17% for IRF6, and 6% for the aggregate contribution of FOXE1, GLI2, MSX2, SKI, SATB2 and SPRY2 [5][6][7][8]. These reports have improved our understanding of the genetic factors involved in NSCLP, especially in families showing familial aggregation of this disorder [9][10][11].
An attractive NSCLP candidate gene is bone morphogenetic protein 4 (BMP4, 14q22-23 in humans), a member of the transforming growth factor-beta superfamily. BMP signals regulate many aspects of skeletal development, including cartilage and bone formation during craniofacial and limb development [12]. Gong and Guo reported that Bmp4 expression localizes to the site of fusion of the mouse facial prominences [13], and conditional inactivation of Bmp4 in a transgenic mouse line results in isolated cleft lip [14]. Together, these findings imply that Bmp4 in the ectoderm of the facial processes regulates lip fusion.
In humans, few studies on the role of BMP4 in NSCLP have been reported. Lidral and Moreno performed a genome-wide scan meta-analysis that showed evidence of linkage between NSCLP and the chromosomal region 14q21-24 [15]. Lin et al. performed an association study in a Chinese population using the nonsynonymous single-nucleotide polymorphism (SNP) rs17563 of BMP4 (p.Val152Ala) and reported that C allele carriers had an increased risk of NSCLP [16]. More recently, Jianyan et al. and Lin et al. reported an interaction between rs17563 and environmental factors such as maternal passive smoking in the expression of NSCLP [17,18]. Suzuki et al. analyzed the BMP4 coding sequence in a sample of patients with subepithelial, microform and overt cleft lip [19] and detected missense and nonsense mutations, absent in controls, in 0.7% of these patients. Together, these findings support a role for genetic variation of BMP4 in the pathogenesis of NSCLP.
Our group recently reported a mutation screening analysis of BMP4 in a sample of 150 Chilean NSCLP case-parent trios. This analysis covered the coding regions (exons and exon-intron boundaries) but excluded regulatory regions. Because no causal mutations were found, we genotyped three SNPs (two intronic and one located 5 kb upstream of BMP4) in the same sample of trios. Significant deviations from expected transmission were observed for haplotypes formed by rs1957860 and rs762642 [20]. These polymorphisms delimit a genomic region containing a promoter and an enhancer of BMP4 [21]. Consequently, in this new study we used direct sequencing in a Chilean case-control sample to search for NSCLP risk variants within the two BMP4 promoters: BMP4.1 (located upstream of exon 1) and BMP4.2 (located upstream of exon 2) (Figure 1) [22].

Patients and controls
Our study group consisted of 167 unrelated NSCLP cases and 336 controls. Cases were 64% male and 36% female. Among cases, 115 were sporadic (9 cleft lip and 106 cleft lip and palate) and 52 had a positive family history (5 cleft lip and 47 cleft lip and palate). The patients were identified and interviewed during clinical examinations between 2008 and 2011 at the following centers: Cleft Lip/Palate Clinic, School of Dentistry, University of Chile; Dental Service, Hospital Roberto del Rio; Cleft Lip/Palate Center, Hospital Exequiel Gonzalez Cortes; Maxillofacial Service, Hospital San Borja-Arriaran; Maxillofacial Service, Hospital Sotero del Rio (all located in Santiago, Chile); and the Corporation for the Help of Cleft Children (located in La Serena, approximately 400 kilometers north of Santiago). In-depth interviews with at least three family members were conducted to obtain detailed family information. A careful anamnesis was carried out to evaluate exposure to teratogenic substances such as phenytoin, warfarin and ethanol during pregnancy. The control group was recruited from blood donors at the Blood Bank, Hospital San Jose; after a careful interview, those with a negative family history of orofacial clefts were included in the study. Controls were 55% male and 45% female. The Institutional Review Boards of the Faculty of Medicine of the University of Chile and of the National Fund for Science and Technological Development (FONDECYT) approved our study, and all participants gave their informed consent.
The contemporary urban Chilean population is mainly the result of admixture between Amerindians (of Asian origin) and Spanish settlers, which began in the 16th and 17th centuries [23]. The relationship between Amerindian admixture, socioeconomic strata and NSCLP prevalence has been extensively studied in Chile [24,25]. All individuals included in our study belong to the middle-low and low socioeconomic strata, which show the highest rates of Amerindian admixture and NSCLP [25].

Molecular Analysis
Genomic DNA purification: Genomic DNA was extracted from peripheral blood white cells according to the method described by Chomczynski and Sacchi [26].
BMP4 promoters sequencing: the genomic segments corresponding to BMP4.1 and BMP4.2 were amplified by polymerase chain reaction (PCR). Primers were designed with the online tool Primer3 http://frodo.wi.mit.edu/primer3/, using as reference the BMP4.1 and BMP4.2 sequences described by Van den Wijngaard et al. and deposited in GenBank (accession numbers AF035427.1 and AF035428.1, respectively) [24]. BMP4.1 was described as a 1097 bp segment (chromosome position GRCh37:14:54424710-54423613) and BMP4.2 as a 1212 bp segment (chromosome position GRCh37:14:54422436-54421219) (Figure 1). Because the maximum length for a reliable sequencing read is 850 bp, each promoter was amplified as two overlapping fragments. AmpliTaq Gold® 360 (Applied Biosystems) was used as the DNA polymerase, with 35 amplification cycles according to the manufacturer's recommendations. Primers, amplified fragment lengths and annealing temperatures are listed in Additional File 1. All PCR products were visualized by 1.5% agarose gel electrophoresis and sent to Macrogen Inc. (Seoul, Korea), where they were sequenced using the forward primer. Samples with previously undescribed variants were also sequenced with the reverse primer to confirm these findings.
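The two-fragment strategy can be illustrated with a short Python sketch that tiles a target region into the fewest overlapping fragments no longer than a given read length. The 850 bp limit and the 1097 bp and 1212 bp promoter lengths come from the text; the function name `tile_region`, the 100 bp minimum overlap and the evenly spaced placement are illustrative assumptions. The sketch ignores primer placement, primer-binding sites outside the target and the low-quality first bases of a Sanger read, so it does not reproduce the primers actually used (listed in Additional File 1).

```python
import math


def tile_region(length: int, max_read: int = 850, min_overlap: int = 100) -> list[tuple[int, int]]:
    """Split a target of `length` bp (1-based, inclusive coordinates) into the
    fewest fragments of at most `max_read` bp, each overlapping the next by at
    least `min_overlap` bp. Hypothetical helper, for illustration only."""
    step = max_read - min_overlap
    if step <= 0:
        raise ValueError("max_read must be larger than min_overlap")
    if length <= max_read:
        return [(1, length)]
    # Each extra fragment can extend coverage by at most `step` new bases.
    n_fragments = 1 + math.ceil((length - max_read) / step)
    # Anchor the first fragment at the start and the last at the end,
    # spacing the remaining start positions evenly in between.
    last_start = length - max_read + 1
    starts = [1 + round(i * (last_start - 1) / (n_fragments - 1)) for i in range(n_fragments)]
    return [(s, s + max_read - 1) for s in starts]


if __name__ == "__main__":
    for name, size in [("BMP4.1", 1097), ("BMP4.2", 1212)]:
        print(name, tile_region(size))
    # BMP4.1 [(1, 850), (248, 1097)]
    # BMP4.2 [(1, 850), (363, 1212)]
```

With these assumed defaults both promoters need exactly two fragments, and the overlaps (603 bp and 488 bp) are far larger than the minimum, which leaves room to move primers away from low-quality read starts.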

Bioinformatic Analyses
The presence of previously described SNPs within BMP4.1 and BMP4.2 was examined using the SNP BLAST tool http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&BLAST_SPEC=SNP&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=dbSNP_homepage. Sequencing results were analyzed by multiple alignment with the ClustalW2 program http://www.ebi.ac.uk/Tools/msa/clustalw2/, comparing them with the aforementioned reference sequences deposited in GenBank. For sequence variants detected in cases but not in controls, and for those associated with NSCLP, we predicted their capability to disrupt or create mammalian transcription factor binding motifs using the programs Tfsitescan http://www.ifti.org/cgibin/ifti/Tfsitescan.pl and MatInspector http://www.genomatix.de/en/index.html.
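Tfsitescan and MatInspector rely on their own curated site libraries and weight matrices, which are not reproduced here. As a much simpler illustration of the underlying idea (a variant "creates" a motif when the alternative allele, but not the reference allele, matches a binding consensus), the sketch below scans a short window around a variant for IUPAC consensus strings on both strands. The consensus strings (WGATAR for GATA factors and the GGGCGG core GC box for Sp1) are simplified textbook forms chosen for illustration, the function names are hypothetical, and the example sequence is invented, not the BMP4 promoter sequence.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Simplified consensus strings (illustrative; not the programs' matrices).
MOTIFS = {"GATA": "WGATAR", "Sp1": "GGGCGG"}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def motif_hits(seq: str, motifs: dict[str, str]) -> set[str]:
    """Names of motifs whose consensus matches `seq` on either strand."""
    seq = seq.upper()
    hits = set()
    for name, consensus in motifs.items():
        pattern = re.compile("".join(IUPAC[base] for base in consensus))
        if pattern.search(seq) or pattern.search(reverse_complement(seq)):
            hits.add(name)
    return hits


def allele_windows(seq: str, pos: int, alt: str, flank: int = 10) -> tuple[str, str]:
    """Reference and alternative windows of +/- `flank` bp around 0-based `pos`."""
    lo, hi = max(0, pos - flank), min(len(seq), pos + flank + 1)
    return seq[lo:hi], seq[lo:pos] + alt + seq[pos + 1:hi]


def compare_alleles(seq: str, pos: int, alt: str,
                    motifs: dict[str, str] = MOTIFS) -> dict[str, set[str]]:
    """Motifs gained ('created') or lost ('disrupted') with the alternative allele."""
    ref_win, alt_win = allele_windows(seq, pos, alt)
    ref_hits, alt_hits = motif_hits(ref_win, motifs), motif_hits(alt_win, motifs)
    return {"created": alt_hits - ref_hits, "disrupted": ref_hits - alt_hits}


if __name__ == "__main__":
    # Hypothetical sequence: the G at 0-based position 7 is changed to A,
    # turning AGGTAA into AGATAA, which matches WGATAR.
    print(compare_alleles("CCTTCAGGTAAGCCT", 7, "A"))
    # {'created': {'GATA'}, 'disrupted': set()}
```

Restricting the comparison to a window around the variant means that a motif occurring elsewhere in the promoter is not attributed to the substitution; matrix-based tools add a quantitative match score on top of this presence/absence logic.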

Statistical Analyses
Allele and genotype frequencies of the BMP4.1 and BMP4.2 polymorphisms were estimated as simple proportions in cases and controls. Hardy-Weinberg equilibrium was assessed for these polymorphisms with the exact test implemented in the Arlequin statistical package [27]. To evaluate the association between NSCLP and BMP4.1 and/or BMP4.2 polymorphisms, allele and genotype odds ratios (OR) with 95% confidence intervals (C.I.) were estimated for the total sample and stratified by gender. These analyses were performed with the UNPHASED program, which applies a global likelihood-ratio significance test that does not require correction for multiple comparisons [28]. Additionally, UNPHASED provides a likelihood-ratio test specific to each allele and genotype [28]. Parallel association analyses were carried out with PLINK software [29].
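For a biallelic SNP, a Hardy-Weinberg exact test can be sketched with the enumeration algorithm of Wigginton, Cutler and Abecasis (2005), which computes the probability of every possible heterozygote count given the observed allele counts and sums the probabilities of outcomes no more likely than the observed one. This is a swapped-in illustration, not Arlequin's implementation (Arlequin's test is based on a Markov-chain procedure that also handles multiallelic loci), and the function name `hwe_exact_p` is hypothetical.

```python
def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Two-sided Hardy-Weinberg exact p-value for one biallelic SNP,
    following the Wigginton et al. (2005) recurrence."""
    n_hom_rare, n_hom_common = sorted((n_hom1, n_hom2))
    n = n_het + n_hom_rare + n_hom_common          # genotyped individuals
    if n == 0:
        raise ValueError("no genotypes")
    n_rare = 2 * n_hom_rare + n_het                # copies of the rarer allele
    probs = [0.0] * (n_rare + 1)                   # unnormalized, indexed by het count

    # Start from the most likely heterozygote count (same parity as n_rare).
    mid = n_rare * (2 * n - n_rare) // (2 * n)
    if (n_rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0

    # Walk downward: each step converts two heterozygotes into two homozygotes.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1

    # Walk upward from the same starting point.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= n_rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2.0) * (het + 1.0))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1

    total = sum(probs)
    p_obs = probs[n_het]
    # Sum the probabilities of all outcomes at most as likely as the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs) / total)


if __name__ == "__main__":
    print(hwe_exact_p(n_het=50, n_hom1=25, n_hom2=25))  # balanced sample, consistent with HWE
    print(hwe_exact_p(n_het=0, n_hom1=50, n_hom2=50))   # no heterozygotes, strong deviation
```

The recurrence avoids factorials, so it stays numerically stable for the sample sizes used here (167 cases, 336 controls); each group would be tested separately, as in the study.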

Results
The analysis of both BMP4 promoters in cases detected the presence of novel genetic variants. Four of them were found in BMP4. Table 2. Two SNPs were found in BMP4.1 (rs2855527 and rs77671695) and four in BMP4.2 (rs2855530, rs113141288, rs76953585 and rs113562279). No significant differences in the allele Genotypes for BMP4.1 and BMP4.2 SNPs showed no significant deviations from Hardy-Weinberg expectations in either cases or controls (data not shown). The genotype association analysis for BMP4.1 and BMP4.2 SNPs showed that only rs2855530 in BMP4.2 was associated with NSCLP: a positive association was found for the C/G genotype (OR = 1.75; 95% C.I. = 1.14-2.69; p = 0.012), while an inverse relation was observed for the C/C genotype (OR = 0.79; 95% C.I. = 0.43-1.47; p = 0.020) (Table 3). PLINK does not report an individual p-value for each genotype case-control comparison; however, the genotype association was significant when G/G+C/G frequencies were compared with C/C frequencies, with G/G+C/G more frequent in cases than in controls and C/C more frequent in controls than in cases (p = 0.032) (data not shown).
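The dominant-model comparison (G/G+C/G versus C/C) and an allelic comparison both reduce to 2x2 tables. The sketch below builds those tables from genotype counts and computes an odds ratio with a Woolf (log-based) 95% confidence interval and a Pearson chi-square p-value with 1 degree of freedom. These are standard textbook methods swapped in for illustration: UNPHASED uses likelihood-ratio tests, so its p-values can differ. The function names are hypothetical, and the genotype counts in the example are invented, not the counts behind Table 3.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def dominant_table(cases: dict[str, int], controls: dict[str, int], risk: str = "G"):
    """2x2 counts (a, b, c, d): carriers of `risk` (e.g. GG + CG) versus
    non-carriers (e.g. CC); a/b = cases, c/d = controls.
    Genotypes are two-letter strings such as "CG"."""
    def split(counts):
        carriers = sum(n for genotype, n in counts.items() if risk in genotype)
        return carriers, sum(counts.values()) - carriers
    return (*split(cases), *split(controls))


def allelic_table(cases: dict[str, int], controls: dict[str, int], risk: str = "G"):
    """2x2 counts of `risk` alleles versus other alleles (two per genotype)."""
    def split(counts):
        risk_alleles = sum(genotype.count(risk) * n for genotype, n in counts.items())
        return risk_alleles, 2 * sum(counts.values()) - risk_alleles
    return (*split(cases), *split(controls))


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = Z_95):
    """Odds ratio (a*d)/(b*c) with Woolf's log-scale confidence interval."""
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the Woolf interval undefined")
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def chi2_p_1df(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square p-value (1 df, no continuity correction)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical genotype counts, for illustration only.
    cases = {"GG": 5, "CG": 20, "CC": 25}
    controls = {"GG": 5, "CG": 15, "CC": 40}
    for label, table in [("dominant", dominant_table(cases, controls)),
                         ("allelic", allelic_table(cases, controls))]:
        or_, lo, hi = odds_ratio_ci(*table)
        print(f"{label}: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), p = {chi2_p_1df(*table):.3f}")
```

Running the same functions on the male and female subsets separately would mirror the sex-stratified comparisons reported for rs2855530.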
In male participants, the C/G genotype of rs2855530 showed an OR = 2.34 (95% C.I. = 1.33-4.09; p = 0.011) and the C/C genotype an OR = 0.51 (95% C.I. = 0.23-1.09; p = 0.003), while the same comparisons were non-significant in female participants (Table 4). Analysis with PLINK again corroborated these results: the G/G+C/G versus C/C comparison was more significant between male cases and male controls (p = 0.003) than in the total sample.
According to Tfsitescan and MatInspector, the c.-5514G > A change generates a new binding site for GATA-1. The c.-5365C > T variant produces a sequence that could be recognized by an RXR heterodimer transcription factor, and the c.-5049C > T change introduces a previously nonexistent site for TCF-1α. Finally, for BMP4.2, the same analyses showed that the rs2855530 G allele generates an Sp1 binding motif that is not detected when the C allele is present. Therefore, the bioinformatic analyses predicted that the BMP4.1 variants c.-5514G > A, c.-5365C > T and c.-5049C > T and the BMP4.2 SNP rs2855530 are all capable of creating mammalian transcription factor binding motifs.

Discussion
Regulatory variants are important for understanding phenotypic diversity and susceptibility to complex diseases. However, regulatory variants have not received the same scientific interest as coding variants [30]. According to the Human Gene Mutation Database http://www.hgmd.cf.ac.uk/ac/index.php, approximately 1.6% of the single base-pair substitutions described are regulatory mutations. Furthermore, there is abundant evidence that regulatory SNPs (rSNPs) affect phenotypic diversity and can also influence disease susceptibility by interacting with other variants located in their vicinity [31]. Such changes can potentially disrupt the DNA motifs recognized by transcription factors and consequently alter normal gene activation and/or transcriptional regulation [32]. In this context, and taking into account the absence of NSCLP causal mutations in the BMP4 coding regions together with the evidence of haplotype association reported by Suazo et al. [20], the present study focused on detecting NSCLP risk variants within both BMP4 promoters. In accordance with our previous report, we found NSCLP risk variants within the BMP4 promoters. Because of their low frequencies, it was not possible to establish whether these novel variants are in linkage disequilibrium with the SNPs associated with NSCLP described by Suazo et al. For BMP4.1, three novel substitutions were detected in cases (c.-5514G > A, c.-5365C > T and c.-5049C > T), which can be considered potential susceptibility variants because of their absence in controls. Moreover, c.-5365T and c.-5049T were found in the same NSCLP case and were inherited from his healthy mother. Therefore, although this haplotype can be considered a risk factor, it cannot produce NSCLP by itself and would require other genetic variants and/or environmental factors that are absent in this case's mother.
Consistent with our findings, several mutations in NSCLP candidate genes have been identified, mainly in sporadic cases [11]; such variants can be considered private mutations from private families. The novel variants described in our study share this characteristic.
In BMP4.2 we did not detect novel allelic variants. Among the known polymorphisms, SNP rs2855530 showed an association with NSCLP that presented a sexual dimorphism. Combining the results of the UNPHASED and PLINK association analyses, we conclude that the rs2855530 G allele and the G/G+C/G genotype (dominant model) should be considered risk factors restricted to males, given their higher frequency in male cases than in male controls. Conversely, the C/C genotype seems to be a protective factor in males, given that its frequency is higher in male controls. Our group has previously reported a sexual dimorphism in NSCLP, in which an STR allele of the MSX1 gene showed significant differences between male cases and male controls [33]. In animal models, sex-biased gene expression has been reported for gonadal and extragonadal tissues during embryogenesis, with sex hormones as the major determinants of these differences [34,35]. The human adult face displays a sexual dimorphism that seems to be established in the first years of life but could depend on factors expressed during prenatal life [36]. This evidence and our present findings are consistent with epidemiological data showing a higher frequency of NSCLP in males than in females.
The bioinformatic analysis predicts that the risk variants could create new transcription factor binding motifs that could be involved in NSCLP. The c.-5514A allele of BMP4.1 could introduce a new site for the hematopoietic transcription factor GATA-1 [37]. A different site for this factor has been reported within the human BMP4.1 promoter, and it has been shown to have a negative effect on BMP4 expression [22]. The c.-5365T allele generates a consensus sequence for the RXR transcription factor, which is involved in gene expression regulated by retinoic acid (RA). This role of RXR may explain why teratogenic doses of RA induce cleft palate at a lower frequency in Rxr-a knockout mice than in wild-type animals [38]. The BMP4.1 c.-5049T allele introduces a novel site for TCF-1α, a canonical Wnt pathway effector expressed in the processes that give rise to the mouse midface [39,40]. For BMP4.2, the bioinformatic analysis predicted that the rs2855530 G allele generates a novel binding site for Sp1, a ubiquitous transcription factor that can modulate gene expression in cellular processes such as differentiation, growth and apoptosis [41]. However, no information is available on Sp1 inactivation or overexpression in relation to craniofacial anomalies.
To our knowledge, the present study is the first to report novel regulatory risk variants for an NSCLP candidate gene. Three previous studies have associated rSNPs with this birth defect: rs642961, located in an IRF6 enhancer; rs28999109, within the PDGF-C promoter; and rs16260, located in the CDH1 promoter [42][43][44]. Bioinformatic analysis of the IRF6 and PDGF-C variants showed that the risk alleles disrupt potential transcription factor motifs; however, reporter gene assays detected significant alterations in gene expression only for the PDGF-C promoter variant [43]. As in these studies, functional analyses are necessary to confirm our findings.

Conclusions
In summary, we detected three novel potential causal NSCLP variants in the BMP4 promoters, which could account for approximately 1.2% of cases of this birth defect, as well as a risk SNP allele (rs2855530) with a clear sex-dependent association. These results are consistent with our previous report showing the absence of potential causal mutations in the BMP4 coding sequence. The bioinformatic analyses predicted that all these variants could generate novel transcription factor recognition sites. Future studies using functional and developmental approaches will be necessary to confirm the in vivo capability of these variants to alter BMP4 expression.
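The approximate 1.2% figure is consistent with a simple carrier fraction among the 167 cases. The text states that c.-5365T and c.-5049T were carried by the same case; assuming, for this illustration only, that c.-5514A was carried by a single other case, the three novel variants occur in two cases:

$$
\frac{\text{cases carrying a novel BMP4.1 variant}}{\text{total cases}} = \frac{2}{167} \approx 0.0120 \approx 1.2\%
$$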

Additional material
Additional file 1: Primer sequences and fragment sizes used for BMP4.1 and BMP4.2 PCR amplification and sequencing. A table showing the primers used for BMP4.1 and BMP4.2 PCR amplification and sequencing, the fragment sizes and other PCR conditions.