Replication and exploratory analysis of 24 candidate risk polymorphisms for neural tube defects

Background: Neural tube defects (NTDs), which are among the most common congenital malformations, are influenced by environmental and genetic factors. Low maternal folate is the strongest known contributing factor, making variants in genes of the folate metabolic pathway attractive candidates for NTD risk. Multiple studies have identified nominally significant allelic associations with NTDs. We tested whether associations detected in a large Irish cohort could be replicated in an independent population.
Methods: Replication tests of 24 nominally significant NTD associations were performed in racially/ethnically matched populations. Family-based tests of fifteen nominally significant single nucleotide polymorphisms (SNPs) were repeated in a cohort of NTD trios (530 cases and their parents) from the United Kingdom, and case–control tests of nine nominally significant SNPs were repeated in a cohort (190 cases, 941 controls) from New York State (NYS). Secondary hypotheses involved evaluating the latter set of nine SNPs for NTD association using alternate case–control models and NTD groupings in white, African American and Hispanic cohorts from NYS.
Results: Of the 24 SNPs tested for replication, ADA rs452159 and MTR rs10925260 were nominally associated with isolated NTDs. Of the secondary tests performed, ARID1A rs11247593 was associated with NTDs in whites, and ALDH1A2 rs7169289 was associated with isolated NTDs in African Americans.
Conclusions: We report a number of associations between SNP genotypes and neural tube defects. These associations were nominally significant but would not survive correction for multiple hypothesis testing. Such corrections are highly conservative for association studies of untested hypotheses, and may be too conservative for replication studies. We therefore believe the true effect of these four nominally significant SNPs on NTD risk will be more definitively determined by further study in other populations, and eventual meta-analysis.
Electronic supplementary material The online version of this article (doi:10.1186/s12881-014-0102-9) contains supplementary material, which is available to authorized users.

Rigorously establishing genetic risk for any multifactorial disorder is important but inherently difficult. Genetic risk is optimally detected with a large number of racially/ethnically homogeneous samples that include cases with a well-defined, highly penetrant phenotype. These criteria are difficult to meet for rare, complex disorders, so statistical power is often compromised to some extent. This, combined with publication bias (the tendency for significant associations to be preferentially published) [37-40], contributes to many associations in the literature that would not survive correction for multiple tests.
In the field of genetic risk for NTDs specifically, studies including large numbers of candidate polymorphisms and/or genes have implicated many candidate SNPs exhibiting nominal NTD associations [21,41-45]. Each of the cited studies examined a minimum of 37 SNPs for NTD risk, yet only the association of rs1907362 in cubilin (CUBN) with NTDs survived correction in the original study [21]. The most likely explanation is that the reported nominally significant associations were due to chance. However, the large number of tests performed, reduced power due to limited availability of samples, and/or the population-specific effect of a risk allele can all contribute to Type II errors. Replication studies are therefore essential for the validation of any genetic risk factor for NTDs.
In one of the largest studies to date, we previously evaluated the common genetic variation in 82 candidate genes for NTD risk in an Irish population [45]. The inclusion of 570 NTD cases (~95% spina bifida), their parents and 999 controls made it possible to use both family-based methods (the transmission disequilibrium test and log-linear analysis) and case-control methods (logistic regression). These methods were used to evaluate genetic risk for NTDs in cases (case effect) as well as genetic risk for an NTD pregnancy in mothers (maternal effect). In the current study, we test 24 of the resulting nominally significant association signals using 9 case-control tests and 14 family-based tests in racially/ethnically similar populations. The 9 SNPs that showed nominal significance by case-control analysis in the previously studied Irish NTD cohort were additionally evaluated for NTD risk using alternative models in racially/ethnically similar and distinct populations.

Statistical methods
For replication analyses, each candidate risk allele was retested using the same association test and model for which it was nominally significant in the primary study. These tests are noted in Tables 1 and 2 and include the following:
1) case-control logistic regression using a continuous, recessive or dominant genetic model;
2) log-linear tests of a case or maternal effect using a dominant or recessive genetic model; and
3) the Spielman transmission disequilibrium test (TDT) [46].
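The TDT listed above reduces to counting allele transmissions from heterozygous parents. The following is a minimal Python sketch of the Spielman TDT. It assumes complete trios and biallelic genotypes coded as the number of copies (0, 1 or 2) of the candidate allele. The function name `tdt` and this coding are illustrative assumptions, not the software used in the study, and the log-linear case and maternal effect models are not shown.

```python
import math


def tdt(trios):
    """Spielman TDT for one biallelic SNP (illustrative sketch).

    Each trio is (father, mother, child); genotypes are copies (0, 1, 2) of
    the candidate allele A. Returns (b, c, chi2, p): b = A alleles transmitted
    by heterozygous parents, c = other alleles transmitted by them.
    """
    b = c = 0
    for father, mother, child in trios:
        if any(g not in (0, 1, 2) for g in (father, mother, child)):
            raise ValueError("genotypes must be coded 0, 1 or 2")
        n_het = (father == 1) + (mother == 1)
        if n_het == 2:
            # Both parents heterozygous: the child's genotype is the number
            # of A alleles transmitted by the two parents together.
            b += child
            c += 2 - child
        elif n_het == 1:
            homozygous = mother if father == 1 else father
            # A homozygous parent transmits A only if it is AA (coded 2).
            transmitted = child - homozygous // 2
            if transmitted not in (0, 1):
                raise ValueError(f"non-Mendelian trio: {(father, mother, child)}")
            b += transmitted
            c += 1 - transmitted
        # Trios with no heterozygous parent carry no TDT information.
    if b + c == 0:
        return b, c, 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return b, c, chi2, p
```

The statistic compares the two discordant transmission counts in the same way as McNemar's test. This is why trios in which neither parent is heterozygous contribute nothing.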
Secondary hypotheses involved testing the candidate SNPs more broadly for NTD risk in other populations and/or NTD classifications. Logistic regression under three genetic models (continuous, recessive and dominant) was used to estimate genotype relative risks (GRRs) and 95% confidence intervals (CIs).
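The three genetic models differ only in how each genotype is coded before the logistic regression is fitted. The following is a minimal Python sketch under these assumptions:
- genotypes are coded as the number of copies (0, 1, 2) of the candidate risk allele;
- no covariates are included (the study's models may have included adjustments not described here);
- `encode` and `genotype_relative_risk` are hypothetical helper names.

The GRR is taken as exp(β) for the genotype term, with a Wald 95% CI.

```python
import math


def encode(genotype, model):
    """Code copies of the risk allele (0, 1, 2) under a genetic model."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    if model == "continuous":  # per-allele (log-additive) effect
        return float(genotype)
    if model == "dominant":    # carriers (1 or 2 copies) vs non-carriers
        return 1.0 if genotype >= 1 else 0.0
    if model == "recessive":   # 2 copies vs 0 or 1 copy
        return 1.0 if genotype == 2 else 0.0
    raise ValueError(f"unknown model: {model}")


def _score_and_information(x, y, b0, b1):
    """Score vector and Fisher information for logit P(case) = b0 + b1 * x."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)


def genotype_relative_risk(genotypes, is_case, model, z=1.959964):
    """Fit a one-SNP logistic regression by Newton-Raphson; return (OR, lower, upper)."""
    x = [encode(g, model) for g in genotypes]
    y = [1.0 if case else 0.0 for case in is_case]
    b0 = b1 = 0.0
    for _ in range(100):
        (g0, g1), (h00, h01, h11) = _score_and_information(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} * score (2 x 2 inverse written out).
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < 1e-10:
            break
    # Recompute the information at the final estimate for the standard error.
    _, (h00, h01, h11) = _score_and_information(x, y, b0, b1)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))  # Var(b1) = [I^{-1}]_{11}
    return math.exp(b1), math.exp(b1 - z * se), math.exp(b1 + z * se)
```

Under the dominant or recessive coding the predictor is binary, so the fitted GRR equals the cross-product odds ratio of the corresponding 2 × 2 table. The Newton-Raphson loop is written out only to keep the sketch dependency-free. In practice a statistics library (for example statsmodels' `Logit`) would normally be used.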

Study populations
Family-based replication was performed in a United Kingdom cohort of NTD case–parent trios. This cohort consists of three groups (Table 3). First, the 400 case families recruited with the assistance of the UK Association for Spina Bifida and Hydrocephalus (ASBAH) (England and Wales) are fully described elsewhere [47]. Second, 131 case families were recruited from Northern Ireland. Excluding the "Other NTDs" class, which may include a small number of families with multiple defects or spina bifida occulta, did not substantially change the results. Ethical approval for the use of these samples was granted by the UK Multi-Centre Research Ethics Committee (University of Newcastle, UK, and University of Northern Ireland, Coleraine) and the Institutional Review Board at the National Human Genome Research Institute (Bethesda, MD, USA).
Replication of findings from case-control analyses was performed in a NYS case-control cohort. NTD cases born statewide in 1998-2005 were identified through their inclusion in the NYS Congenital Malformations Registry (NYSCMR). Matched controls were selected as a random sample of non-malformed infants born in 1998-2005 from NYS Newborn Screening Program records. Demographic data were obtained by matching to NYS birth certificates. Four controls matched for maternal race/ethnicity were selected for each NTD case. Because of low numbers, the Asian and "other" racial/ethnic subgroups were excluded, and subjects coded as "Hispanic, white" and "Hispanic, other" were combined into a single Hispanic group for analysis. Case diagnoses allowed the NTDs to be classified into six subgroups by NTD subtype (anencephaly, encephalocele, spina bifida) and by whether the NTD was isolated or part of multiple defects (Table 4). For the replication analyses, only isolated NTDs of all three subtypes among non-Hispanic whites (NHWs) were considered, to most closely match the composition of the NTD cases in the original Irish cohort. Secondary hypotheses involved analyses of: 1) all

Genotyping
MTHFD1 rs2236225 was genotyped in the UK cohort as a PCR restriction fragment length polymorphism (PCR-RFLP) using MspI, as previously described [17,48]. Genotypes were obtained for 91.6% of NTD fathers, 92.5% of NTD cases and 93.6% of NTD mothers. Concordance was 100% for repeated (n = 83) and re-plated (n = 92) samples. Re-plating means testing a second sample from the same DNA source. These repeated and re-plated samples were typed with an independent assay based on detection of allele-specific primer extension by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (Sequenom, San Diego, CA, USA). The remaining SNPs in Table 2 were also genotyped in the UK cohort using the Sequenom platform.
Two independent assays of folate hydrolase (FOLH1) rs16906205 failed, so the proxy FOLH1 rs11040291 (r² = 1 in HapMap CEU) was genotyped and reported instead. Two independent assays to genotype methionine adenosyltransferase II, beta (MAT2B) rs17535909 were attempted, but both failed, and there was no proxy for this singleton SNP. For this set of 13 SNPs, the average call rates were ≥96.8% for each family group (NTD cases, NTD mothers or NTD fathers). Re-plated and re-genotyped samples covered >18% of the cohort, with 99.2% genotype concordance for this set of 13 SNPs. The 14 SNPs typed in the UK cohort exhibited non-Mendelian inheritance in <1% of families. These SNPs were also in Hardy-Weinberg equilibrium (HWE, p > 0.01) in each family group. Genotypes for families exhibiting non-Mendelian inheritance and other discordant genotypes were excluded from analysis.
For NYS samples, DNA was extracted from one 3-mm archived dried blood spot specimen [49] and whole-genome amplified using a primer extension preamplification method, as described previously [50]. SNPs were genotyped by KBiosciences (Herts, UK) using KASPar chemistry. Eight SNPs were genotyped in duplicate using independently whole-genome-amplified DNA aliquots, with 100% concordance in genotype calls. FOLH1 rs383028 was genotyped using genomic DNA because data from amplified DNA did not pass quality control criteria. The average call rate for the 9 SNPs (Table 1) was 99.9% for both cases and controls. Re-plated samples covered 6.5% of the cohort with 100% genotype concordance. No SNP deviated from HWE (p > 0.01) in any case or control group for any race/ethnicity.
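The HWE quality filter (p > 0.01) can be illustrated with the classical chi-square goodness-of-fit test on genotype counts. This is only a sketch. The text does not state which HWE test was used, and an exact test is often preferred when genotype counts are small. `hwe_chi_square` and `passes_hwe` are hypothetical names.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square (1 df) goodness-of-fit test of genotype counts against HWE."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip empty expected classes (monomorphic SNPs) to avoid division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, chi-square with 1 df
    return chi2, p_value


def passes_hwe(n_aa, n_ab, n_bb, threshold=0.01):
    """Apply the p > threshold filter described in the text."""
    return hwe_chi_square(n_aa, n_ab, n_bb)[1] > threshold
```

In the study this check was applied separately within each family group (UK) or each case/control group and race/ethnicity (NYS), so a sketch like this would be called once per group.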

Results
The primary aim of this study was to perform replication analyses of the nominally significant NTD associations identified in a recent study in an Irish population [45]. The secondary aim was to test a subset of these candidate SNPs for association using alternate risk models and populations.

Replication analyses
Replication criteria
The replication strategy was designed to retest nominally significant NTD-associated SNPs in racially/ethnically matched populations. Each SNP was retested using the same association test and genetic model that had previously yielded the lowest p-values among 1441 SNPs in 82 candidate genes tested in an Irish population [45]. Case-control association tests were performed in a cohort of 190 isolated NTD cases and 941 controls born to non-Hispanic white (NHW) mothers in NYS. Family-based tests were performed in 530 NTD trios (NTD cases and their parents) from the United Kingdom, including centers in Northern Ireland, England and Wales.
The top 25 groups of SNPs sharing high linkage disequilibrium (LD; D' > 0.9) with the lowest p-values for any test were selected for replication (52 SNPs total). Without access to NTD mothers and corresponding controls, several loci could not be retested. These included 9 independent mother-control signals (17 SNPs) in the following genes:
- adenosine deaminase (ADA);
- aldehyde dehydrogenase 1 family, member A2 (ALDH1A2);
- catechol-O-methyltransferase (COMT);
- CUBN;
- MTHFD1;
- methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like (MTHFD1L);
- brachyury (T);
- transcription factor AP-2 alpha (TFAP2A).

Among the remaining 35 SNPs, several sometimes had minimum p-values for the same test and model while sharing high LD (D' > 0.9). In those cases, only the SNP with the lowest observed p-value was selected for replication testing. This reduced the number of tests to 24 (9 case-control tests and 15 family-based tests). Results of these tests are shown in Tables 1 and 2.
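The LD-based reduction described above can be sketched as a greedy pass over SNPs sorted by p-value. Each SNP is kept unless it is in high LD (D' > 0.9) with an already-kept SNP significant for the same test and model. This Python sketch assumes pairwise D' values are already available in a dictionary and uses hypothetical SNP identifiers. A greedy pass can differ from explicit block-based grouping when LD is not transitive, so it approximates rather than reproduces the study's selection.

```python
def select_representatives(results, d_prime, threshold=0.9):
    """Keep one SNP per high-LD group for each (test, model) pair (greedy sketch).

    results: list of (snp, test, model, p_value) tuples.
    d_prime: dict mapping frozenset({snp_a, snp_b}) -> pairwise D'.
    """
    kept = []
    for snp, test, model, p in sorted(results, key=lambda r: r[3]):
        redundant = any(
            k_test == test
            and k_model == model
            and d_prime.get(frozenset((snp, k_snp)), 0.0) > threshold
            for k_snp, k_test, k_model, _ in kept
        )
        if not redundant:  # not in high LD with a better SNP for this test/model
            kept.append((snp, test, model, p))
    return kept


# Hypothetical input, for illustration only.
results = [
    ("snp1", "TDT", "allelic", 0.002),
    ("snp2", "TDT", "allelic", 0.004),
    ("snp3", "TDT", "allelic", 0.010),
    ("snp4", "logistic", "dominant", 0.003),
]
d_prime = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp2", "snp3")): 0.50,
    frozenset(("snp1", "snp3")): 0.30,
    frozenset(("snp1", "snp4")): 0.99,
}
selected = select_representatives(results, d_prime)
```

Here snp2 is dropped because it is in high LD with the better-ranked snp1 for the same test. snp4 is kept despite high LD with snp1 because it was significant under a different test and model.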

Replication of associations detected in case-control analyses
We used the NTD case samples from NYS to replicate case effects previously observed in the Irish NTD cohort. Each SNP was tested for association with NTDs by logistic regression under the same genetic model used for the original observation. No SNP was significantly associated with isolated NTDs in white cases from NYS (Table 1). Moreover, for eight of the nine SNPs, the genotype relative risk (GRR) indicated an effect of the candidate risk allele inconsistent with that observed in the original study.

Replication of associations detected in family-based analyses
Previously observed case or maternal effects detected by log-linear analysis were retested with the same genetic model in a combined cohort of NTD triads from the United Kingdom. Two SNPs, ADA rs452159 and MTR rs10925260, showed nominally significant association with NTDs (Table 2). These results would not withstand Bonferroni correction for multiple tests if all SNPs tested in the current study were considered. In contrast to the case-control analyses, the direction of effect of the GRRs for these family-based associations largely agreed between the initial and replication studies (10 of 14), regardless of significance.

Secondary hypotheses: exploratory analyses in new populations
Applying analyses that yielded nominal associations in the initial study
In addition to replication, these candidate SNPs were tested for association in other NTD populations using various models. The nine SNPs selected for case-control replication in the NYS cohort were first tested in African American and Hispanic isolated NTD cases and controls (Table 1; 2 tests/SNP). These tests used the same tests and models under which each SNP had originally been nominally associated in the Irish population. Of the nine SNPs examined in each of the two racial/ethnic groups, only ALDH1A2 rs7169289 was nominally associated with isolated NTDs. This association was in African Americans under a continuous model (GRR = 0.57 [0.34-0.98], p = 0.041). This "protective" effect contrasts with the risk effect seen for this SNP in the Irish cohort using a dominant model.

Case-control analyses in all NTD cases vs. isolated spina bifida cases
These SNPs were also tested by logistic regression using three genetic models (continuous, dominant and recessive) in all NTD cases and controls in each of the three racial/ethnic groups (9 tests/SNP). Of these 81 tests (nine SNPs, nine tests each), only one was nominally significant. AT rich interactive domain 1A (ARID1A) rs11247593 was associated with NTDs in non-Hispanic whites under a dominant model (GRR = 0.58 [0.35-0.97], p = 0.037). This protective effect contrasts with the risk effect (GRR > 1) seen in the Irish cohort using a continuous model (Table 1).
Lastly, the same tests were performed in a restricted subset of isolated spina bifida cases and controls in the three racial/ethnic groups (9 tests/SNP). These 81 tests generated two nominally significant findings. ALDH1A2 rs7169289 was nominally associated with isolated spina bifida in the African American population under continuous (GRR = 0.46 [0.25-0.87], p = 0.017) and dominant (GRR = 0.47 [0.24-0.94], p = 0.033) models. Similarly, a significant risk effect was observed in the original Irish population, but not in the NYS NHW population (Table 1).
These results are summarized in Table 5. They are nominally significant but would not withstand Bonferroni correction using the total number of tests as the correction factor.
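To make the Bonferroni point concrete, the sketch below compares the nominal p-values reported for the secondary analyses (0.041, 0.037, 0.017 and 0.033) with the threshold α/m at α = 0.05. Setting m = 81, the size of a single set of secondary tests, is a deliberately lenient assumption. The total number of tests is larger, which would only lower the threshold further. `bonferroni_survivors` is a hypothetical helper name.

```python
def bonferroni_survivors(p_values, n_tests, alpha=0.05):
    """Return the Bonferroni threshold alpha / n_tests and the p-values below it."""
    threshold = alpha / n_tests
    return threshold, [p for p in p_values if p < threshold]


# Nominal p-values reported above for the secondary analyses (Table 5).
reported = [0.041, 0.037, 0.017, 0.033]
# m = 81 is a lenient assumption: one set of secondary tests, not the full total.
threshold, survivors = bonferroni_survivors(reported, n_tests=81)
```

Even under this lenient choice, the threshold is about 0.0006, and none of the reported p-values falls below it.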

Discussion
Our study addressed two questions. First, are SNPs that were nominally significant upon testing for NTD association in an Irish population also associated with NTDs in a similar, independent population? Second, are the SNPs that were nominally significant by logistic regression in an Irish population also associated with NTDs in different racial/ethnic populations? This second question was assessed using a broader range of association tests, genetic models and NTD case groupings.
Our replication criteria were stringent: a SNP was retested only when the same association test and genetic model could be applied. Because we did not have samples from mothers of NYS NTD cases, we were unable to test for a maternal risk for nine independent mother-control signals in seven genes. Nine SNPs were retested for a case effect by logistic regression in a white NTD sample from NYS. In a UK NTD sample, log-linear analyses were used to retest ten SNPs for a case effect and five SNPs for a maternal effect. Of 14 SNPs previously observed to be associated with NTDs by family-based analysis in an Irish population, two showed nominally significant NTD association in trios from the UK (Table 2). ADA rs452159 falls in the first intron, at the border of a D' block encompassing the first exon and intron of the gene. MTR rs10925260 is in intron 23 and is part of a large block of D' LD encompassing the entire gene. Because both SNPs lie in regions of strong LD, these nominally significant associations may reflect a direct effect of the tested SNP or that of a causative SNP linked to it.
None of the 9 SNPs found to be nominally significant in Irish NTDs replicated in isolated NYS white NTD cases. However, two were nominally associated with NTDs under different conditions. ARID1A rs11247593 did not replicate by logistic regression using a continuous model in isolated NTDs (Table 1). It was, however, nominally significant when all white NTD cases and controls were tested using a dominant model (Table 5). ARID1A rs11247593 is intronic and is part of a large D' block extending over the entire gene, so the signal may be due to the tested SNP or a linked causative SNP. In addition, ALDH1A2 rs7169289 was nominally associated with isolated NTDs and isolated spina bifida in the African American population (Table 5). These results most likely represent a single association signal, as the genetic models (continuous, dominant) and case sets (isolated NTDs, isolated spina bifida) for these analyses overlap. The estimated effects observed in African American NTD cases from NYS (Table 5; OR = 0.46-0.57, p = 0.017-0.041) are similar to the original effect observed in an Irish population (Table 1; OR = 0.67 [0.52-0.86], p = 0.0016). ALDH1A2 rs7169289 lies just downstream of the gene, between two blocks of D' LD. The association signal may therefore be due to this SNP or to a linked SNP in either block.
As in the original study, none of the observed associations in the NYS and UK sample sets would have survived correction for multiple tests. One interpretation is that these associations are indeed due to chance and that the tested SNPs do not contribute to NTD risk. It is not clear, however, that Bonferroni correction is appropriate for replication studies. Every test performed in this study was based on an individual a priori hypothesis generated by our initial study and supported by previous data. Factors that can contribute to Type II error, such as population stratification and genetic heterogeneity, should also be considered.
The most important factor may be limited sample size, which compromises the power to detect true associations. Compared with the number of NTD cases (n ≈ 570) in the original Irish cohort, only a limited number of white isolated NTD cases from NYS (n ≈ 190) were available in the current study. With ~530 NTD trios, the UK NTD cohort appears comparable, but this is before taking into account the power required to replicate an association. The "winner's curse" is the tendency of an initial observation to overestimate the effect size or significance of a true association. Because of it, replication studies generally require larger sample sizes to detect the original effect [51,52]. This may explain why the well-known NTD risk allele of MTHFR rs1801133 (T) is not associated with NTDs by TDT (p = 0.742) in our large UK NTD cohort. In fact, it is nominally significant as a protective factor in a recessive model by log-linear analysis (GRR = 0.663 [0.441-0.997], p = 0.049). This association has been replicated in many studies in other populations, supporting its role in NTD risk. It would therefore be surprising if the MTHFR rs1801133 TT genotype did not contribute to NTDs in the UK population. Lack of replication power and chance seem the most likely explanations for this failure to replicate, and both must be considered for the other SNPs tested for replication in this study. Authentic validation of any genetic association requires accumulation of evidence over time, from multiple studies in independent populations. Lack of adequate power is a pervasive problem in the field of NTD genetics. One recent review estimated that ~1000 cases would be required to attain 80% power to detect an odds ratio of 2 or less, yet approximately one quarter of published NTD association studies used fewer than 100 cases [53]. This problem is compounded when evaluating candidate risk SNPs in less-studied populations.
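The effect of sample size on power can be illustrated with a normal-approximation power calculation for a two-sided allelic case-control test. This is a rough sketch under stated assumptions:
- alleles are treated as independent (HWE);
- α = 0.05;
- the control allele frequency and odds ratio are values the user must supply.

It is not the calculation used in the review cited above [53], and `allelic_power` is a hypothetical helper name.

```python
import math


def _normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def allelic_power(n_cases, n_controls, control_freq, odds_ratio, z_alpha=1.959964):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    p0 = control_freq
    case_odds = odds_ratio * p0 / (1.0 - p0)
    p1 = case_odds / (1.0 + case_odds)  # case allele frequency implied by the OR
    n1, n0 = 2 * n_cases, 2 * n_controls  # allele counts, assuming HWE
    p_bar = (n1 * p1 + n0 * p0) / (n1 + n0)
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n0))
    se_alt = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    delta = p1 - p0
    # Sum both rejection tails so that power equals alpha when OR = 1.
    return (_normal_cdf((delta - z_alpha * se_null) / se_alt)
            + _normal_cdf((-delta - z_alpha * se_null) / se_alt))
```

As a usage example, call `allelic_power(570, 999, f, r)` and `allelic_power(190, 941, f, r)` for an assumed allele frequency `f` and odds ratio `r`. These case and control numbers are those of the Irish and NYS white cohorts, so the comparison shows how much power is lost with fewer cases. Because of the winner's curse, `r` for a replication study should be chosen more conservatively than the originally reported estimate.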
We were able to examine candidate risk SNPs of interest in African American and Hispanic populations from NYS, but far fewer cases were available than for whites (Table 4). Some population-based studies have included African American cases in aggregate analyses of genetic risk for NTDs [54-56], and there are studies of NTD risk in African cohorts [57]. Nevertheless, this is the first report of candidate NTD SNPs evaluated in an African American case-control cohort. The nominal association observed for ALDH1A2 rs7169289 may be real, but needs confirmation.
Identifying potential NTD risk SNPs has proven much easier than validating them. A single replication study is not definitive, and the limited numbers in existing NTD cohorts may cause subsequent underpowered studies to fail to confirm reported associations. By combining data from multiple published studies, meta-analyses increase power and confidence in whether a SNP truly contributes to NTD risk. Meta-analyses involving aggregate data from several hundred to several thousand NTD cases and controls have confirmed MTHFR rs1801133 (c.677C > T) as a maternal [58] and case risk factor for NTDs [14,15,58,59]. Comparatively fewer data were available for other variants. Even so, meta-analyses indicate that 5-methyltetrahydrofolate-homocysteine methyltransferase reductase (MTRR) rs1801394 (c.66A > G) is a maternal risk factor for NTDs [60,61]. In contrast, MTR rs1805087 (c.2756A > G) does not contribute to maternal or case risk [15,22,60,62]. However, meta-analysis requires a large amount of data, including full genotype information. These data are absent from publications for the majority of NTD risk SNPs that show nominal significance. It is therefore essential to publish all association study data in a format that allows future meta-analyses (i.e., genotype counts or a way to determine them unambiguously). Accordingly, we have reported the genotype data for all 24 SNPs tested in our study (Additional file 1: Table S1 and Additional file 2: Table S2).
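Published genotype counts are what make such pooling possible. As an illustration, here is a minimal Python sketch of a fixed-effect, inverse-variance meta-analysis of per-study odds ratios. The odds ratios are computed from 2 × 2 genotype tables (for example, carriers vs non-carriers under a dominant model). The meta-analyses cited above may have used other methods, such as Mantel-Haenszel or random-effects models. The function names and table layout here are assumptions.

```python
import math


def log_or_from_counts(case_exposed, case_unexposed, control_exposed, control_unexposed):
    """Log odds ratio and Woolf standard error from one study's 2 x 2 table."""
    cells = (case_exposed, case_unexposed, control_exposed, control_unexposed)
    if min(cells) == 0:
        raise ValueError("zero cell: a continuity correction would be needed")
    log_or = math.log(case_exposed * control_unexposed / (case_unexposed * control_exposed))
    se = math.sqrt(sum(1.0 / c for c in cells))
    return log_or, se


def fixed_effect_meta(tables, z=1.959964):
    """Inverse-variance pooled OR and 95% CI across studies (fixed effect)."""
    weight_sum = weighted_sum = 0.0
    for table in tables:
        log_or, se = log_or_from_counts(*table)
        weight = 1.0 / se ** 2  # each study weighted by its precision
        weight_sum += weight
        weighted_sum += weight * log_or
    pooled = weighted_sum / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    return math.exp(pooled), math.exp(pooled - z * pooled_se), math.exp(pooled + z * pooled_se)
```

Pooling two identical studies leaves the odds ratio unchanged but shrinks its standard error by a factor of √2. This is the gain in precision that motivates meta-analysis of underpowered replication studies.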