Discovery of rare ancestry-specific variants in the fetal genome that confer risk of preterm premature rupture of membranes (PPROM) and preterm birth

Background Preterm premature rupture of membranes (PPROM) is the leading identifiable cause of preterm birth, a complication that is more common in African Americans. Attempts to identify genetic loci associated with preterm birth using genome-wide association studies (GWAS) have only been successful with large numbers of cases and controls, and there has yet to be a convincing genetic association to explain racial/ethnic disparities. Indeed, the search for ancestry-specific variants associated with preterm birth has led to the conclusion that spontaneous preterm birth could be the consequence of multiple rare variants. The hypothesis that preterm birth is due to rare genetic variants that would go undetected in standard GWAS has been explored in the present study. The detection and validation of these rare variants present challenges because of the low allele frequency. However, some success in the identification of fetal loci/genes associated with preterm birth using whole genome sequencing and whole exome sequencing (WES) has recently been reported. While encouraging, this is currently an expensive technology, and methods to leverage the sequencing data to quickly identify and cost-effectively validate variants are needed. Methods We developed a WES data analysis strategy based on neonatal genomic DNA from PPROM cases and term controls that was unencumbered by preselection of candidate genes, and capable of identifying variants in African Americans worthy of focused evaluation to establish statistically significant associations. Results We describe this approach and the identification of damaging nonsense variants of African ancestry in the DEFB1 and MBL2 genes that encode anti-microbial proteins that presumably defend the fetal membranes from infectious agents. Our approach also enabled us to rule out a likely contribution of a predicted damaging nonsense variant in the METTL7B gene. 
Conclusions Our findings support the notion that multiple rare population-specific variants in the fetal genome contribute to preterm birth associated with PPROM. Electronic supplementary material The online version of this article (10.1186/s12881-018-0696-4) contains supplementary material, which is available to authorized users.


Background
There are significant disparities in preterm birth rates in the United States, with African-Americans experiencing an increased burden [1,2]. Delivery after preterm premature rupture of the membranes (PPROM) is the leading identifiable cause of spontaneous preterm birth, and PPROM is more common in African-Americans. PPROM is believed to be caused, in part, by infection and inflammation, presumably incited by microbes ascending from the vagina. This leads to the release of pro-inflammatory cytokines and the activation of matrix-degrading proteases that break down the collagens that give the fetal membranes their tensile strength, resulting in unscheduled rupture [3][4][5].
Twin studies have revealed that both fetal and maternal genetic factors contribute to gestational age at delivery, but there is uncertainty about the roles played by specific fetal and maternal genes. Attempts to identify genetic loci associated with gestational age at delivery and preterm birth using genome-wide association studies (GWAS) have only been successful with large numbers of cases and controls (see reference [5] for a review). Moreover, these studies have not identified genes that could account for increased preterm births in African-Americans. Efforts to identify ancestry-specific variants using GWAS approaches have led to the conclusion that spontaneous preterm birth is likely to be the consequence of multiple common variants or rare variants not easily detected by GWAS [6]. This is not a surprising conclusion since GWAS is based on the "common disease-common variant" hypothesis, which posits that a significant proportion of the variance of common diseases is attributable to DNA variants that are present in > 1-5% of the population, and that there are many of these DNA variants, each contributing a small amount to the total risk of a particular disease [5].
As noted above, an alternative hypothesis is that diseases are associated with rare genetic variants that have relatively larger effect sizes and would go undetected in standard GWAS. The detection and validation of these rare variants present challenges because of the low allele frequency. Some success in the identification of fetal loci/genes associated with preterm birth using whole genome sequencing and whole exome sequencing (WES) has recently been reported [7][8][9]. While encouraging, this is currently an expensive technology, and methods to leverage the sequencing data to quickly identify and cost-effectively validate variants are needed.
We recently pursued the approach of searching for rare variants in fetal genes that could contribute to risk of PPROM, employing WES to identify the burden of damaging mutations in African-American fetal (neonatal) samples [8,9]. In one of our studies, our analysis of the WES data focused on genes that either negatively regulate the innate immune response or encode proteins that protect the host against microbes and their noxious products. Rather than utilizing a prospective candidate gene filter, we decided to develop a WES analysis plan that was not encumbered by preselection and was capable of identifying rare damaging variants in African-Americans worthy of focused evaluation to establish statistically significant association and the mechanism(s) underlying the mutation effect. We hoped to establish a simple, cost-effective process that could be applied to modest sample sizes.

Subjects
The subjects in the discovery WES (76 PPROM cases and 43 term controls) and initial confirmatory targeted genotyping for DEFB1 and MBL2 nonsense variants (188 PPROM cases and 175 term controls) have been previously described [8,9]. The METTL7B SNPs were evaluated with the WES cohort. They were neonates born of self-reported African-American women. Term controls consisted of neonates born from uncomplicated singleton pregnancies (> 37 weeks gestation). PPROM cases were from singleton pregnancies prior to 37 weeks of completed gestation. The diagnosis of membrane rupture was based on pooling of amniotic fluid in the vagina, amniotic fluid ferning patterns and a positive nitrazine test. Women with multiple gestations, fetal anomalies, trauma, connective tissue diseases and medical complications of pregnancy requiring induction of labor were excluded as previously described [8,9].
The previously published analysis of the DEFB1 and MBL2 nonsense variants included the WES cohort and initial confirmatory targeted genotyping cohort described above [9]. In the present study the METTL7B SNPs were evaluated with the original WES cohort. In addition, we performed targeted genotyping for the DEFB1 and MBL2 nonsense variants on 119 PPROM cases and 199 controls not previously reported. The subjects were recruited from the same populations as the previously reported cohorts using identical inclusion and exclusion criteria. Ninety-four of these PPROM cases and 94 term controls were used for genotyping METTL7B SNPs.

Whole exome sequencing and genotyping
The methods used for WES (50-100X coverage) and analysis of the sequencing data have been described in detail in previous publications [8,9]. With the number of PPROM cases (76), we had 78% power to detect variants with a minor allele frequency of 0.005. Targeted genotyping was performed on the Agena (previously Sequenom) MassArray iPLEX platform [8,9]. The primers used for METTL7B genotyping are presented in Additional file 1: Table S1. Only high confidence genotype calls were included in the analysis.

Estimation of African ancestry
To reduce the potential risk that population stratification biased the genetic association tests, the percent African ancestry of the PPROM neonates and term control neonates was determined using ancestry-informative markers as previously described [8,9]. No significant differences in the percentage of African ancestry were found between PPROM cases and term controls (mean ± S.D. West African ancestry: PPROM cases, 0.695 ± 0.073; term controls, 0.698 ± 0.087; p > 0.10) [9].

Selection strategy
We developed the following simple screening method for analyzing the WES data: 1) Identify predicted damaging nonsense variants (gnomAD "high confidence") present only in PPROM cases in the WES discovery panel; 2) Validate the nonsense variants by Sanger sequencing; 3) Verify it is a rare variant (minor allele frequency < 0.01) based on the genome aggregation database (gnomAD: http://gnomad.broadinstitute.org/); 4) Determine whether the variant/mutation is of African ancestry using a public database (gnomAD); 5) Determine whether the gene is under selective pressure, consistent with an essential role in a biological or pathophysiological process, by a literature review (PubMed: https://www.ncbi.nlm.nih.gov/pubmed/); 6) Assess whether heterozygous variants could potentially cause a biological effect by altering the expression level or activity of mature protein; 7) Determine whether the gene harboring the nonsense variant is expressed in fetal membranes; 8) Evaluate whether the gene could play a role in the existing pathophysiological concepts of PPROM from the literature; and 9) Conduct follow-up genotyping of the nonsense variant in independent cohorts to test the association of the identified variant with PPROM.
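The database-driven steps of this screen (1 to 4 and 7) can be expressed as a simple filter, while steps 5, 6 and 8 rely on literature review and step 9 on new genotyping. The following is a minimal sketch only, not the pipeline used in this study: the `Variant` record, its field names, and the three example entries are hypothetical and were introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """One candidate variant. All field names are hypothetical."""
    variant_id: str
    is_high_confidence_nonsense: bool   # step 1: gnomAD "high confidence" call
    case_alleles: int                   # step 1: minor alleles in PPROM cases (discovery WES)
    control_alleles: int                # step 1: minor alleles in term controls (discovery WES)
    sanger_confirmed: bool              # step 2
    gnomad_maf: float                   # step 3
    african_ancestry: bool              # step 4
    expressed_in_fetal_membranes: bool  # step 7


RARE_MAF = 0.01  # definition of "rare" used in the screen


def passes_screen(v: Variant) -> bool:
    """Apply the automatable filters; literature-based steps stay manual."""
    return all([
        v.is_high_confidence_nonsense,
        v.case_alleles > 0 and v.control_alleles == 0,  # present only in cases
        v.sanger_confirmed,
        v.gnomad_maf < RARE_MAF,
        v.african_ancestry,
        v.expressed_in_fetal_membranes,
    ])


# Invented example records (not study data).
candidates = [
    Variant("example_A", True, 2, 0, True, 0.004, True, True),
    Variant("example_B", True, 7, 0, True, 0.030, True, True),  # fails: not rare
    Variant("example_C", True, 1, 1, True, 0.002, True, True),  # fails: seen in a control
]

shortlist = [v.variant_id for v in candidates if passes_screen(v)]
print(shortlist)
```

Variants that survive this filter would still need the manual review steps and follow-up genotyping in independent cohorts before any association could be claimed.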

Statistical analysis
The minor allele frequencies for the DEFB1 and MBL2 nonsense variants examined in this report would require a very large number of PPROM cases and controls for an association study to achieve a power of 0.8 at a significance level of 0.05. Therefore, we combined all WES and genotyping data reported previously [9] with the results of the genotyping of the additional subjects for each nonsense variant in the analysis. With these sample sizes, failure to find a genetic association cannot rule out an association. Conversely, limited power does not invalidate a significant association that is found, with the caveat that significant findings from low-powered studies may not always replicate.
Associations were examined for statistical significance using Fisher's Exact test (1-tailed) to determine whether the nonsense variant was overrepresented in PPROM cases. Nominal p values are reported. After correcting for multiple tests (Bonferroni adjustment), a p value of < 0.017 would be considered the threshold for statistical significance, which was met for the DEFB1 and MBL2 nonsense variants studied.
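For readers who wish to reproduce this type of test, a one-tailed Fisher's Exact test on a 2 x 2 table of allele counts can be sketched as follows. This is an illustrative implementation using only the Python standard library, not the software used in this study. The allele counts in the example are invented, and dividing 0.05 by three tests (one per gene examined) is our assumption about how a threshold of approximately 0.017 arises.

```python
from math import comb


def fisher_exact_one_tailed(case_minor, case_major, ctrl_minor, ctrl_major):
    """P(at least `case_minor` minor alleles among case alleles),
    computed from the hypergeometric distribution with fixed margins."""
    n_case = case_minor + case_major
    n_minor = case_minor + ctrl_minor
    n_total = n_case + ctrl_minor + ctrl_major
    upper = min(n_case, n_minor)
    favourable = sum(
        comb(n_minor, k) * comb(n_total - n_minor, n_case - k)
        for k in range(case_minor, upper + 1)
    )
    return favourable / comb(n_total, n_case)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni adjustment."""
    return alpha / n_tests


# Hypothetical allele counts (NOT the values in Table 4).
p_value = fisher_exact_one_tailed(case_minor=8, case_major=592,
                                  ctrl_minor=1, ctrl_major=599)
threshold = bonferroni_threshold(0.05, 3)  # assumption: three variants tested
print(f"p = {p_value:.4f}, threshold = {threshold:.4f}, "
      f"significant = {p_value < threshold}")
```

The test is one-tailed because the hypothesis is directional: only an excess of the nonsense allele in PPROM cases counts as evidence.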

Results
We detected more than 800 different nonsense variants (stop gain, stop loss, and start loss) in the discovery WES sample of PPROM cases and term controls. Approximately 33% of these were unique to PPROM cases, the majority occurring in only one PPROM case, and 30% of the variant types were unique to term controls, the majority occurring in one term control (Table 1). The remaining approximately one third of the nonsense variants occurred in both PPROM cases and controls and, not unexpectedly, were the nonsense variants with the highest allele frequency, suggesting that these variants might be tolerated and do not contribute to PPROM risk. Most of these nonsense variants have been previously detected in the human genome. More than 1400 coding sequence frameshift variants and splicing variants, predicted to be damaging or possibly damaging, were detected. Since a number of these variants were not previously known, it is uncertain whether they are genuine variants or reflect sequencing errors in the WES. We suspect the latter since Sanger sequencing of a number of the DNA samples failed to confirm frameshift mutations. Consequently, we did not include the predicted damaging frameshift and splicing mutations in our screening paradigm.
A heterozygous nonsense variant (rs5743490) of African ancestry in the Defensin Beta 1 (DEFB1) gene, which encodes a small cysteine-rich cationic peptide that damages the cellular membranes of bacteria and some viruses, was found in PPROM cases in our initial WES and targeted genotyping [9], but not in neonates born at term (Tables 1, 2, 3 and 4). No other loss of function variants, including splicing variants and frameshift variants, were identified in DEFB1 in our WES. DEFB1 is expressed by the fetal membranes [9] (Additional file 2: Figure S1). The DEFB1 rs5743490 SNP has two alternative minor alleles: C/T (African ancestry), which creates a stop codon, and G/A (Latino ancestry), which produces a synonymous change. We verified by Sanger sequencing that the minor allele of rs5743490 that we detected encoded a stop codon [9]. This DEFB1 nonsense variant truncates the DEFB1 protein 4 amino acids into the mature peptide amino acid sequence so that no functional DEFB1 would be made [10]. However, the mutant protein, if expressed, could have dominant negative activity by preventing proteolytic processing of the un-mutated pro-peptide encoded by the normal allele. Thus, heterozygous mutations could possibly be functionally significant.
An additional 115 PPROM cases and 191 controls were subsequently genotyped for the DEFB1 nonsense mutation, yielding more nonsense mutations in PPROM cases, including a neonate with a homozygous DEFB1 nonsense variant, and only one mutant allele in a term control (Table 4). A statistically significant association of the rs5743490 nonsense mutation and PPROM was present in the combined cohorts (Table 4) (p < 0.004 by Fisher's Exact test, 1-tailed).
We discovered another rare nonsense variant of African ancestry (rs74754826) in the MBL2 gene, which encodes mannose binding lectin-2, a protein involved in anti-microbial host defense [9]. It was only detected in PPROM cases (WES and initial targeted genotyping), and it met the screening criteria for being a PPROM candidate gene (Tables 1, 2 and 3). One hundred and nineteen PPROM cases and 199 term controls were genotyped in the present study for this nonsense variant, and a statistically significant association of the minor allele with PPROM was found (p < 0.015 by Fisher's Exact test, 1-tailed) (Table 4).
We then applied our WES screening approach to look for other PPROM candidate genes, including genes where the nonsense mutation was of relatively high allele frequency in PPROM cases. We detected 7 heterozygous nonsense variants in the Methyltransferase Like 7B (METTL7B) gene in the WES discovery panel in PPROM cases, and none in term controls. This was the largest number of unique "PPROM mutation alleles". METTL7B transcripts were detected in human placenta and amnion by PCR with sequence verification of the amplicon (Additional file 2: Figure S1). A METTL7B SNP (rs146636131) adjacent to rs115687886 that is in phase modifies the codon to create a benign missense variant (p.Arg224Leu). Subjects with both rs115687886 and rs146636131 minor alleles were considered to have the missense variant rather than the nonsense mutation (Tables 1, 2, 3 and 4). Another rarer nonsense variant (rs138407179) was also detected. Both METTL7B nonsense variants are identified as causing loss of function with "high confidence" in the gnomAD database. rs138407179 has two alternate minor alleles, one encoding the stop codon of African ancestry (G/T) and another (G/A), which encodes a predicted damaging variant of South Asian ancestry.
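The reclassification of the nonsense call as a missense variant when the adjacent SNP is in phase can be illustrated with a small codon lookup. The sketch below is purely illustrative. The source reports only the protein-level change (p.Arg224Leu), so the reference codon `CGA` and the two base substitutions are assumptions chosen because they reproduce an Arg to Ter and an Arg to Leu change under the standard genetic code; they are not taken from the study or from gnomAD.

```python
# Subset of the standard genetic code needed for this illustration.
CODON_TO_AA = {"CGA": "Arg", "TGA": "Ter", "TTA": "Leu", "CTA": "Leu"}

# ASSUMPTIONS (hypothetical, for illustration only):
REF_CODON = "CGA"         # reference Arg codon
NONSENSE_SNP = (0, "T")   # first base C>T gives TGA (stop)
ADJACENT_SNP = (1, "T")   # second base G>T on the same chromosome


def haplotype_codon(ref_codon, snps):
    """Apply (position, alternate base) substitutions to one codon."""
    bases = list(ref_codon)
    for position, alt in snps:
        bases[position] = alt
    return "".join(bases)


def annotate(has_nonsense_allele, has_adjacent_allele_in_phase):
    """Return the amino acid encoded by the combined haplotype."""
    snps = []
    if has_nonsense_allele:
        snps.append(NONSENSE_SNP)
    if has_adjacent_allele_in_phase:
        snps.append(ADJACENT_SNP)
    return CODON_TO_AA[haplotype_codon(REF_CODON, snps)]


print(annotate(False, False))  # reference haplotype
print(annotate(True, False))   # nonsense allele alone
print(annotate(True, True))    # both minor alleles in phase
```

Annotating each SNP independently would label the carrier of both minor alleles as having a stop codon, which is why the two genotypes need to be considered together.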
Follow-up targeted genotyping of the METTL7B SNPs of interest on 94 PPROM cases and 94 term controls detected the nonsense variant in term controls, including a homozygote mutant. Statistical analysis of the combined WES data and follow-up genotyping revealed no statistically significant association of the METTL7B rs115687886 nonsense mutation with PPROM (Tables 3 and 4), a finding that was not unexpected because the minor allele of rs115687886 did not robustly meet all screening criteria as noted above.

Discussion
The simple screening approach outlined above may be useful to others seeking rare variants with moderate to high effect size associated with preterm birth in specific populations. The approach can also be used to identify rare mutations that are protective against PPROM by starting the screening with selection of variants found only in term controls, not in PPROM cases, and applying the subsequent filters. Because the majority of WES nonsense variants were each detected in a single PPROM case, while each individual case harbored multiple nonsense variants, a test of genetic burden can be conducted, as we have done in our previous studies [8,9], in addition to the more focused examination of the contributions of individual variants. Importantly, the patterns of nonsense mutations among PPROM cases and term controls could also point to pathways and gene networks that, when disrupted, promote PPROM or protect against it.
Our findings suggest that a rare damaging DEFB1 variant of African ancestry may have a role in the pathophysiology of PPROM, presumably because it facilitates a dysbiotic reproductive tract flora that invades and/or inflames the fetal membranes, leading to premature rupture. The DEFB1 gene has been under selective pressure [11,12], and has rare loss of function variants with four different ancestries (African, Latino, East Asian, European) reported in the gnomAD database. It will be of interest to determine in the future if the other DEFB1 loss of function variants play a role in preterm birth after PPROM in the respective populations.
The discovery that the DEFB1 nonsense variant is associated with PPROM prompted us to examine damaging variants in other beta defensin genes in our WES study. A stop-loss variant of African ancestry was identified in DEFB119 (rs12329612) in 19 PPROM cases, including 1 homozygote (20 alleles/152 total alleles), and 5 term controls, including 1 homozygote. A start-loss variant detected in DEFB128 (rs145944118) was found in one PPROM case, and another start-loss variant (rs18818350) was detected in DEFB132 in one term control. The functional significance of these variants and their relationship to preterm birth are currently unknown.
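The allele counts quoted for DEFB119 follow from counting one minor allele per heterozygote and two per homozygote. A minimal sketch of that arithmetic is shown below, with hypothetical function names. The case figures (19 carriers, 1 homozygote, 152 alleles from 76 cases) are those given in the text; the control denominator of 86 alleles is derived here on the assumption that all 43 WES term controls yielded genotype calls at this site.

```python
def minor_allele_count(carriers, homozygotes):
    """Minor alleles among carriers, where `carriers` includes homozygotes."""
    heterozygotes = carriers - homozygotes
    return heterozygotes + 2 * homozygotes


def allele_frequency(minor_alleles, n_subjects):
    """Minor allele frequency in a diploid sample of `n_subjects`."""
    return minor_alleles / (2 * n_subjects)


case_alleles = minor_allele_count(carriers=19, homozygotes=1)  # 18 + 2 = 20
ctrl_alleles = minor_allele_count(carriers=5, homozygotes=1)   # 4 + 2 = 6

print(f"PPROM cases:   {case_alleles}/{2 * 76} = "
      f"{allele_frequency(case_alleles, 76):.3f}")
# Assumption: all 43 WES term controls were called at this site.
print(f"Term controls: {ctrl_alleles}/{2 * 43} = "
      f"{allele_frequency(ctrl_alleles, 43):.3f}")
```

These frequencies are descriptive only; no test of association for DEFB119 is implied by this calculation.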
We discovered a significant association between a nonsense variant in an anti-microbial gene, MBL2, and PPROM. This association is consistent with the work of others who examined common MBL2 polymorphisms in fetal DNA from European populations and found increased risk of preterm birth [13,14]. MBL-2 presumably functions as part of the host defense system, which includes DEFB1 and prevents or limits infections that cause chorioamnionitis and PPROM. In contrast to our findings with the DEFB1 and MBL2 nonsense variants, the nonsense variant (rs115687886) in the METTL7B gene did not stand up to further scrutiny as a PPROM candidate. METTL7B encodes a putative methyltransferase whose transcript is elevated in blood leukocytes in the context of infection in pregnancy, providing a potential link to PPROM mechanisms [15]. However, METTL7B was initially reported to be a lipid droplet-associated protein whose function with respect to lipid metabolism remains obscure [16].
The METTL7B rs115687886 minor allele (in the absence of the adjacent in-phase SNP) truncates the protein at amino acid position 224 of the 244-amino acid protein. This truncation is outside of the methyltransferase domain (amino acids 75-172). The functional significance of this protein truncation has not been established to the best of our knowledge, which could make the mutation "ineligible" in our screening criteria. Moreover, the rs115687886 minor allele frequencies in our African-American term controls and PPROM cases are relatively high (Table 4) and outside of our definition of "rare" (allele frequency < 0.01). No splicing or frameshift variants predicted to disrupt the protein coding sequence were detected in METTL7B in the WES sample.
The other nonsense minor allele we examined (rs138407179), found in both a PPROM case and a term control, truncates the protein at amino acid residue 80, which likely damages the protein. Genotyping of additional PPROM cases and controls is required to determine if this nonsense variant is associated with PPROM. We could find no information in the literature regarding selective pressures impinging on the METTL7B gene.
Although our proposed strategy is consistent with guidelines for investigating causality of sequence variants in human disease, our approach has limitations, including the current cost of WES and the use of modest sample sizes, which may not have the power to detect pathophysiologically important rare variants/mutations [17]. The focus on nonsense variants found only in cases might exclude important PPROM-associated mutations from consideration if a nonsense variant occurred by chance in the control group of the discovery sample. Importantly, the analysis strategy used in this report did not encompass other potentially damaging variants including frameshift, splicing and damaging missense variants, including variants that cause gain of function. These variants/mutations could, of course, be incorporated into the screening algorithm. In addition, WES would not identify intragenic regulatory elements that have an impact on gene expression levels. Another limitation of our study is the absence of direct evidence for disrupted function of the DEFB1 and MBL-2 proteins derived from the respective mutant transcripts. That said, the DEFB1 nonsense mutation would not lead to production of a mature peptide, so it is most certainly damaging. However, its potential to act as a dominant negative that inhibits processing of the full-length DEFB1 pro-peptide from the major allele in heterozygous mutants remains to be explored. Likewise, the impact of the MBL2 nonsense mutation on protein function is only predicted, and studies need to be conducted with recombinant proteins to prove loss of function.
Our findings on the DEFB1 and MBL2 nonsense mutations are consistent with the notion that rare fetal mutations contribute to the disparities in preterm birth among African-Americans, and support the mining of rare mutations identified in WES as a portal to discovery of genes playing a role in preterm birth. A similar approach could be applied to other populations focusing on ancestry-enriched damaging variants. For example, there are rare damaging Latino (rs759177517; p.Tyr5Ter) and East Asian (rs140403947; p.Tyr60Ter) nonsense variants in the DEFB1 gene that could be evaluated for association with PPROM in the respective populations.