Identification of novel functional sequence variants in the gene for peptidase inhibitor 3

Background: Peptidase inhibitor 3 (PI3) inhibits neutrophil elastase and proteinase-3, and has a potential role in skin and lung diseases as well as in cancer. Genome-wide expression profiling of chorioamniotic membranes revealed decreased expression of PI3 in women with preterm premature rupture of membranes. To elucidate the molecular mechanisms contributing to the decreased expression in amniotic membranes, the PI3 gene was searched for sequence variations and the functional significance of the identified promoter variants was studied.
Methods: Single nucleotide polymorphisms (SNPs) were identified by direct sequencing of PCR products spanning a region from 1,173 bp upstream to 1,266 bp downstream of the translation start site. Fourteen SNPs were genotyped in 112 unrelated individuals and nine SNPs in 24 unrelated individuals. Putative transcription factor binding sites detected by in silico search were verified by electrophoretic mobility shift assay (EMSA) using HeLa and amnion cell nuclear extracts. Deviation from Hardy-Weinberg equilibrium (HWE) was tested by the χ² goodness-of-fit test. Haplotypes were estimated using the expectation maximization (EM) algorithm.
Results: Twenty-three sequence variations were identified by direct sequencing of polymerase chain reaction (PCR) products covering 2,439 nt of the PI3 gene (-1,173 nt of promoter sequences and all three exons). Analysis of 112 unrelated individuals showed that 20 variants had minor allele frequencies (MAF) ranging from 0.02 to 0.46, representing "true polymorphisms", while three had MAF ≤ 0.01. Eleven variants were in the promoter region; several putative transcription factor binding sites were found at these sites by database searches. Differential binding of transcription factors was demonstrated at two polymorphic sites by electrophoretic mobility shift assays, in both amniotic and HeLa cell nuclear extracts. Differential binding of the transcription factor GATA1 at the -689C>G site was confirmed by a supershift.
Conclusion: The promoter sequences of PI3 have a high degree of variability. Functional promoter variants provide a possible mechanism for explaining the differences in PI3 mRNA expression levels in the chorioamniotic membranes, and are also likely to be useful in elucidating the role of PI3 in other diseases.


Background
PI3 [GeneID: 5266] is a member of the 'trappin' gene family [1]. The trappin gene family members are defined by an amino-terminal transglutaminase substrate domain consisting of hexapeptide repeats with the consensus sequence GQDPVK and a carboxy-terminal four-disulphide bond core. PI3, also known as trappin-2, elafin, elastase specific inhibitor and skin-derived antileukoproteinase (SKALP), is a low-molecular-weight (6 kDa) serine protease inhibitor [2] that is capable of inhibiting neutrophil elastase (also known as elastase 2; ELA2; [GeneID: 1991]) and proteinase 3 (PRTN3; [GeneID: 5657]; also known as the Wegener autoantigen, P29). PI3 has been mapped to chromosome 20q12-13.1 [3], and this locus contains 14 genes expressing protease inhibitor domains with homology to whey acidic protein (WAP). The human PI3 gene spans about 11,620 bp and consists of three exons [2,4]. The gene has multiple transcription start sites and the mRNA has been reported to have an unusually short 5'-untranslated region (5'-UTR) [5].
Initially, PI3 was identified in human epidermis of psoriatic patients [6], and later in bronchial secretions from patients with bronchial carcinoma [7] and chronic obstructive pulmonary disease [2], as well as in epidermal [8] and breast tumors [9]. In addition to its antipeptidase role, PI3 has antimicrobial activity and is a component of the innate immune system that protects epithelial surfaces from infection [10-13]. Expression of PI3 can be induced by inflammatory mediators such as tumor necrosis factor (TNF) and interleukin 1 beta (IL1B) [14,15].
In our previous report, we identified PI3 as a down-regulated gene in the chorioamniotic membranes of patients with preterm premature rupture of membranes (PPROM) [16]. In this study, we investigated the possible molecular mechanisms that control the expression of PI3 by carrying out a detailed analysis of the PI3 gene sequences.

Genomic DNA isolation
Blood samples were obtained from 112 healthy unrelated African-American individuals after written informed consent. The collection of samples, and their utilization for research purposes, was approved by the Institutional Review Boards of Wayne State University and the National Institute of Child Health and Human Development, NIH. Genomic DNA was extracted from blood samples using the QIAGEN® DNA Blood BioRobot® 9604 kit (QIAGEN Inc., Valencia, CA).

Direct sequencing of PCR products
Genomic DNA was used as a template to generate three overlapping PCR products of 724 bp, 717 bp and 1,328 bp extending from 1,173 bp upstream to 1,266 bp downstream of the translation start site of the PI3 gene [GenBank: NT_011362]. Primers are listed in Table 1. All PCRs were carried out in 100-µl volumes containing 1.5 mM MgCl2, 0.2 mM dNTPs, 0.4 µM of each primer, 3 U of Taq DNA polymerase (Roche Molecular Systems, Inc., Branchburg, NJ) and 100 ng of genomic DNA. A 10-minute initial denaturation at 94°C was followed by 40 cycles consisting of 30 s denaturation at 94°C, 30 s annealing at 50°C to 55°C, and 1 minute extension at 72°C. PCR products were analyzed on 2% agarose gels, purified by ultrafiltration (Centricon Centrifugal Filter Devices, Millipore, Bedford, MA), and sequenced by cycle sequencing and dye terminator labeling (ABI® BigDye™ Terminator v1.1 Cycle Sequencing kit, Applied Biosystems, Foster City, CA). Sequencing reactions were purified using gel filtration columns (CENTRI-SEP, Princeton Separation, Adelphia, NJ) and run on a 310 or 3700 Genetic Analyzer (Applied Biosystems). Sequences were edited using BioEdit [17]. Fourteen SNPs were genotyped in 112 unrelated individuals and nine SNPs in 24 unrelated individuals (Table 2).
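As a quick consistency check on the amplicon design described above, the arithmetic can be sketched as follows. The variable names are illustrative only. The total overlap is derived from the stated sizes under the assumption that the three products tile the region exactly from -1,173 to +1,266; the individual overlap lengths are not given in the text.

```python
# Sizes stated in the text (bp); names are illustrative, not from the paper.
upstream_bp = 1173
downstream_bp = 1266
amplicon_sizes_bp = [724, 717, 1328]

# There is no nucleotide 0 in this numbering, so the span is a plain sum.
target_span_bp = upstream_bp + downstream_bp

# Assumption: the amplicons cover the target span exactly end to end, so any
# excess length is shared between the two overlap junctions.
total_amplified_bp = sum(amplicon_sizes_bp)
total_overlap_bp = total_amplified_bp - target_span_bp

# Primer amount per reaction: concentration (µM) x volume (µl) gives pmol.
primer_pmol_per_reaction = 0.4 * 100

print(target_span_bp, total_amplified_bp, total_overlap_bp, primer_pmol_per_reaction)
```

The computed span agrees with the 2,439 nt reported for the sequenced region.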

In silico search for transcription factor binding sites
The sequences in and around the SNP sites in the promoter region were searched for putative transcription factor binding sites using three different computer programs: TESS [18,19], Alibaba 2.1 [20,21], and MatInspector [22,23]. Default parameters were used as search criteria.

Electrophoretic mobility shift assays (EMSA)
Oligonucleotides (Table 1) and their complementary strands were designed and purchased gel-purified (IDT, Coralville, IA). Complementary oligonucleotides were annealed to each other to generate double-stranded probes. EMSAs were performed using commercially available HeLa cell nuclear extracts (Promega, Madison, WI) and nuclear extracts prepared from primary amnion cell cultures as previously described [24], since we had previously demonstrated that PI3 protein was produced by a variety of chorioamniotic membrane cell types, with the highest amount produced by the amniotic epithelial cells [16]. Primary amnion cell cultures were established using amniotic membranes obtained from women not in labor at term who underwent elective cesarean deliveries for obstetrical indications. All other reagents were purchased from a commercial source and used according to the manufacturer's protocol (Promega, Madison, WI). The concentration of poly(dI-dC) (Amersham Biosciences Corp., Piscataway, NJ) in the reaction was optimized to 0.05 µg/µl to minimize non-specific binding. The concentrations of components in 10-µl reaction mixtures were as follows: 1X binding buffer [without poly(dI-dC)], 3.75 µg of HeLa or amnion cell nuclear extract, 0.05 µg/µl poly(dI-dC) and 50 fmol of 32P-labeled double-stranded probe (>50,000 cpm). All the components except the 32P-labeled probe were added to the reaction and incubated for 15 min on ice and 10 min at 20°C, followed by the addition of the 32P-labeled probe and incubation for 20 min at 20°C. For competition experiments, a 100-fold molar excess of unlabeled double-stranded oligonucleotides was added to the reaction mixture prior to the addition of the labeled probe. For supershift experiments, polyclonal antibodies against AP1 (Cat. No. sc-253X and sc-44X; Santa Cruz Biotechnology, Santa Cruz, CA) and GATA1 (Active Motif, Carlsbad, CA) were used. For AP1, after 20 min of incubation at 20°C with the 32P-labeled probe, 400 ng of the corresponding antibodies were added to the reaction and incubated for another 15 min at 20°C. For the GATA1 assay, antibodies were added and incubated for 20 min at 20°C before adding the labeled probe [25]. Samples were run on nondenaturing 6% polyacrylamide gels in 0.5X TBE buffer at 100 V for 80 min. X-ray film (Kodak, Rochester, NY) was exposed to dried gels for 2 to 5 h at -80°C, depending on signal intensity.
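The reaction quantities stated above imply a few derived amounts that are easy to get wrong by a factor of a thousand. The following is a minimal sketch of that unit arithmetic; the variable names are illustrative and the derived values are calculated here, not reported in the text.

```python
# Values stated in the text; names are illustrative.
reaction_volume_ul = 10.0
probe_fmol = 50.0
competitor_fold_excess = 100
poly_didc_ug_per_ul = 0.05

# 1 fmol/µl equals 1 nmol/l, so fmol divided by µl gives nM directly.
probe_concentration_nM = probe_fmol / reaction_volume_ul

# A 100-fold molar excess of unlabeled competitor, converted from fmol to pmol.
competitor_pmol = probe_fmol * competitor_fold_excess / 1000.0

# Total poly(dI-dC) mass in the reaction.
poly_didc_ug = poly_didc_ug_per_ul * reaction_volume_ul

print(probe_concentration_nM, competitor_pmol, poly_didc_ug)
```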

Nomenclature for sequence variants and genes
The variants and nucleotides are described following the guidelines of the Human Genome Variation Society (HGVS) [26]. SNPs are described using the genomic sequence AL049767.12 as a reference and numbered relative to the translation start site. Official gene symbols provided by Human Genome Organization (HUGO) Nomenclature Committee (HGNC) were used [27].

Statistical analyses
Tests for deviations from HWE were performed using the χ² goodness-of-fit test. Haplotypes were estimated using the expectation maximization (EM) algorithm as implemented in the software Arlequin [28].
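For readers unfamiliar with the HWE test, a minimal sketch of a χ² goodness-of-fit test for one biallelic SNP is given below. It is an illustration under stated assumptions, not the authors' implementation: the function name and the genotype counts are hypothetical, one degree of freedom is assumed (three genotype classes, one estimated allele frequency), and only the Python standard library is used.

```python
from math import erfc, sqrt

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (minor allele frequency, chi-square, p-value) for one SNP.

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return min(p, q), chi2, p_value

# Hypothetical counts for 112 individuals (not data from this study).
maf, chi2, p_value = hwe_chi_square(50, 50, 12)
print(round(maf, 3), round(chi2, 4), round(p_value, 3))
```

A SNP would be flagged as deviating from HWE when the p-value falls below the chosen significance threshold, for example 0.05.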

SNP genotyping and haplotype construction
When this study was initiated, only two polymorphisms were known to exist in the PI3 gene, neither of them in the promoter region. We identified 23 SNPs (Table 2). Eleven SNPs were located in the promoter region, one in exon 1, seven in intron 1, two in exon 2 and two in intron 2. To obtain more reliable [...] (Table 2). The genotype frequencies of all SNPs were in HWE. Three SNPs (+50C>T, +959A>C and +1123C>T) altered codons (Table 2). However, only two SNPs (+50C>T and +959A>C) altered the encoded amino acid (Table 2). The T allele at +50 changed the 17th amino acid from threonine to methionine, and the C allele at +959 changed the 34th amino acid from threonine to proline. The 17th amino acid is part of the signal peptide sequence, whereas the 34th amino acid is part of the proprotein.
Figure 1. Locations of the 23 SNPs detected in the region from 1,173 bp upstream to 1,266 bp downstream of the translation start site of the PI3 gene.
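The relationship between +50C>T and the change at the 17th amino acid can be checked with a short sketch. It assumes that position +50 lies in uninterrupted coding sequence counted from the A of the start codon, which is consistent with the SNP being in exon 1 but is not stated explicitly. The codon it reports is inferred from the standard genetic code, not read from the reference sequence, and the function names are illustrative.

```python
from itertools import product

# Standard genetic code in TCAG order (one-letter amino acids, * = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_of(cds_position):
    """Map a 1-based coding position to (codon number, 0-based offset in codon)."""
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3

def consistent_codons(offset, ref, alt, aa_from, aa_to):
    """List (reference codon, variant codon) pairs compatible with the change."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa == aa_from and codon[offset] == ref:
            mutated = codon[:offset] + alt + codon[offset + 1:]
            if CODON_TABLE[mutated] == aa_to:
                hits.append((codon, mutated))
    return hits

codon_number, offset = codon_of(50)
# Threonine (T) to methionine (M) caused by C>T at that codon position.
print(codon_number, offset, consistent_codons(offset, "C", "T", "T", "M"))
```

Position +50 falls on the middle base of codon 17, and the only threonine codon that a C>T change there converts to methionine is ACG (to ATG). The same calculation cannot be applied directly to +959, because that coordinate is genomic and includes intron 1.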
Thirteen of the 23 SNPs were in complete linkage disequilibrium (Table 3). Altogether 16 haplotypes were identified (Table 3). PI3_F was the most common haplotype followed by PI3_H and PI3_K.
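To illustrate how the EM algorithm resolves phase and how complete linkage disequilibrium appears in the estimates, here is a simplified two-locus sketch. It is not Arlequin's implementation, which handles many loci at once. The function names and genotype counts are hypothetical, and r² is used here as one common measure of linkage disequilibrium.

```python
def em_two_locus(genotype_counts, max_iter=200, tol=1e-12):
    """Estimate two-locus haplotype frequencies by EM.

    genotype_counts maps (g1, g2) to a number of individuals, where g1 and g2
    are the counts (0, 1 or 2) of the variant allele at locus 1 and locus 2.
    """
    base = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    double_het = 0
    for (g1, g2), n in genotype_counts.items():
        if (g1, g2) == (1, 1):
            double_het += n  # phase is ambiguous only for double heterozygotes
            continue
        a = (g1 // 2, (g1 + 1) // 2)  # locus-1 alleles on the two chromosomes
        b = (g2 // 2, (g2 + 1) // 2)  # locus-2 alleles on the two chromosomes
        base[(a[0], b[0])] += n
        base[(a[1], b[1])] += n
    total = 2.0 * sum(genotype_counts.values())
    freq = {h: 0.25 for h in base}
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote carries 00 and 11.
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: update frequencies from expected haplotype counts.
        new = {
            (0, 0): (base[(0, 0)] + double_het * w) / total,
            (1, 1): (base[(1, 1)] + double_het * w) / total,
            (0, 1): (base[(0, 1)] + double_het * (1 - w)) / total,
            (1, 0): (base[(1, 0)] + double_het * (1 - w)) / total,
        }
        delta = max(abs(new[h] - freq[h]) for h in freq)
        freq = new
        if delta < tol:
            break
    return freq

def r_squared(freq):
    """Squared correlation between the two loci from haplotype frequencies."""
    p = freq[(1, 0)] + freq[(1, 1)]
    q = freq[(0, 1)] + freq[(1, 1)]
    d = freq[(0, 0)] * freq[(1, 1)] - freq[(0, 1)] * freq[(1, 0)]
    denom = p * (1 - p) * q * (1 - q)
    return d * d / denom if denom > 0 else float("nan")

# Hypothetical counts: the two loci always carry matching genotypes.
example = {(0, 0): 60, (1, 1): 40, (2, 2): 12}
freqs = em_two_locus(example)
print({h: round(f, 4) for h, f in freqs.items()}, round(r_squared(freqs), 4))
```

When two SNPs are in complete linkage disequilibrium, as in this toy input, only two of the four possible haplotypes receive non-zero estimated frequency.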

Effect of SNPs on protein binding
We performed in silico searches for putative transcription factor binding sites at the 11 SNP sites in the promoter region of the PI3 gene. Except for one SNP (-675C>T), all other sites showed potential differential binding for at least one transcription factor (Table 4). In other words, the transcription factor was predicted to bind to one of the alleles, but not the other for these 10 SNPs.
To verify the differential binding of transcription factors experimentally, we conducted electrophoretic mobility shift assays (EMSA) for 10 SNP sites located in the promoter region. Due to the presence of a long stretch of AC repeats, EMSA was not carried out for the SNP -868C>G. Of the 10 putative sites, six (-1077A>G, -1067A>G, -1063G>A, -960T>Del, -689C>G, -338G>A) showed differential binding by transcription factors in nuclear extracts derived from HeLa cells (not shown), while only two (-1063G>A and -689C>G) showed differential binding using amnion cell nuclear extract (Fig 2). Those SNP sites that did not show differential binding with HeLa cell nuclear extract also did not show differential binding with amnion cell nuclear extracts. A transcription factor in HeLa cell nuclear extract bound to -960T>Del. There was, however, no binding by a transcription factor in amnion cell nuclear extract to this same site. No transcription factor in either HeLa or amnion cell nuclear extract bound to the -258A>G site. Our interest was the transcriptional regulation of the PI3 gene in amnion cells [16]. We therefore focused on the two SNPs (-1063G>A and -689C>G, Fig 2) that showed differential binding by transcription factors derived from the amnion cell nuclear extract. For -1063A>G and -689C>G, the banding patterns representing the protein-DNA complexes were similar when using HeLa and amnion cell nuclear extracts, although the band intensities were lower with the latter, probably due to a lower concentration of functional proteins (Fig 2). To determine the specificity of the binding, we used a competition assay (Fig 3). The differential binding that persisted after cross-competition (100-fold) was considered to be due to the SNP. For -1063A>G, one protein-DNA complex persisted after a labeled double-stranded A-probe was competed with a double-stranded sequence differing only at the SNP (G instead of A).
For -689C>G, two protein-DNA complexes persisted after labeled double-stranded G-probe was competed with a double-stranded sequence differing only at the SNP (C instead of G) (Fig 3).
Our in silico search predicted that AP1 was the transcription factor that would bind differentially at the -1063 SNP. To investigate this, we added either a 100-fold excess of a competitor with the consensus sequence for AP1 binding or an anti-AP1 antibody to the reaction. No change in the banding pattern was observed in the competition assay (Fig 4). Similarly, no supershift with anti-c-jun or anti-c-fos antibody was observed for the -1063G>A polymorphism using amnion or HeLa cell nuclear extracts (Fig 4). Since a positive control using the consensus AP1 binding sequence demonstrated a supershift with anti-c-jun and anti-c-fos antibodies (Fig 4), the absence of a supershift was unlikely to be due to technical problems. We therefore concluded that the protein that binds to the A probe at nt -1063 does not contain the AP1 epitope.
For the SNP at nt -689, the transcription factor whose binding was predicted to change due to the SNP was GATA1 (Table 5). As shown in Fig 5, a consensus sequence containing the GATA1 binding site was able to compete with the -689G probe (Fig 5A), and a supershift was observed with anti-GATA1 antibody when using amniotic cell nuclear extract (Fig 5B), indicating that GATA1 binds to the G allele of the -689C>G polymorphism in the promoter region of the PI3 gene.
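The idea of an allele-specific binding site prediction, as described for the -689 site, can be sketched with a simple consensus match. This is a toy illustration only: TESS, Alibaba and MatInspector use weight matrices and curated databases, not the plain IUPAC pattern used here. The two short sequences are hypothetical and are not the PI3 promoter sequence. The motif WGATAR is the commonly cited GATA consensus, and the function names are illustrative.

```python
import re

# Subset of IUPAC codes needed for the example motif.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "R": "[AG]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def motif_hits(seq, motif):
    """Return (strand, 0-based start on the given sequence) for each match."""
    core = "".join(IUPAC[c] for c in motif)
    pattern = re.compile("(?=(" + core + "))")  # lookahead allows overlaps
    hits = [("+", m.start()) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [("-", len(seq) - m.start() - len(motif)) for m in pattern.finditer(rc)]
    return hits

# Hypothetical probes differing only at the central base (C versus G).
c_allele = "TTCTCATAAGC"
g_allele = "TTCTGATAAGC"
print(motif_hits(c_allele, "WGATAR"), motif_hits(g_allele, "WGATAR"))
```

In this toy pair the consensus is present only when the central base is G, which is the pattern of prediction that an EMSA with allele-specific probes is then used to test.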

Discussion
We observed a high degree of polymorphism within the PI3 gene with 23 SNPs detected, 11 of which were located in the promoter region. We found an amino acid substitution, T34P, in the 4th amino acid of the amino-terminal transglutaminase substrate domain, GQDPVK, of PI3. To determine if this SNP has a significant effect on the function of this domain, we searched for the consensus sequence of the transglutaminase substrate in other mammals. A similar sequence domain was identified in seminal vesicle protein I (Semg1) repeats in guinea pig (PROSITE documentation PDOC000282). Semg1 is a clotting protein that serves as the substrate in the formation of the copulatory plug [30]. Covalent clotting of this protein is catalyzed by a transglutaminase and involves the formation of γ-glutamyl-ε-lysine crosslinks. The con-
Figure 2. EMSA showing the banding patterns with HeLa and amniotic cell nuclear extracts for -1063A>G and -689C>G sites.
Putative binding of several transcription factors in the promoter region of the PI3 gene has been previously reported (Table 5) [5,9,14,15,31]. Except for two, none of these sites were polymorphic in our study. The two consensus sequences of NFKB1 at nt -964 to -956 and nt -340 to -331 identified in previous studies [5,9] contained the -960T>Del and -338G>A SNP sites, respectively, in our study. Based on our in silico analysis, the binding of transcription factors NFATC2 and AP1 could be altered by the sequence changes at -960T>Del and -338G>A (Table 4), but with EMSA we did not observe any differences between the two alleles using amnion cell nuclear extract. For -960T>Del we did not observe any binding to a transcription factor using amnion cell extract.
We identified 10 SNPs with alleles that were predicted to have different binding sites for one or more transcription factors by in silico searches (Table 4). We tested the predicted differential binding at these sites by EMSA with both HeLa and amnion cell nuclear extracts. HeLa cells are used widely for studying the functionality of promoter polymorphisms. Since PI3 mRNA is down-regulated in chorioamniotic membranes of patients with PPROM [16], we also used a nuclear extract derived from amnion cells. Six of the 10 sites exhibited differential binding to transcription factors with the HeLa extract, in contrast to only two with amnion cell nuclear extract. Since we did not observe a supershift at -1063G>A using antibodies against AP1 with nuclear extract from either HeLa or amnion cells, it is likely that the differential binding involved a transcription factor other than AP1, or that the antibodies did not recognize the relevant epitope. The presence of AP1 in the nuclear extracts of both cell types was confirmed by the supershift seen when using the AP1 consensus probe. We demonstrated the binding of GATA1 to the G allele at the -689C>G site by supershift with an antibody against GATA1.
Figure 3. EMSA showing self- and cross-competition for differential binding for -1063A>G and -689C>G sites with amniotic or HeLa cell nuclear extracts. The arrows indicate the differential binding that consistently persisted after cross-competition.
Figure 4. Results of competition and supershift experiments for the -1063A>G site using HeLa cell nuclear extract. S, shift; SS, supershift. Arrow on the left indicates a protein-DNA complex specific to the transcription factor binding to the A allele.
neutrophil elastase (ELA2, [LocusID: 1991]) are increased in the amniotic fluid of patients with PPROM and acute chorioamnionitis [32]. It is therefore plausible to speculate that the production of PI3 in the fetal membranes is to protect the tissue from the damage that could be caused by increased amounts of neutrophil elastase. Our recent study [16] showing decreased expression of PI3 in the chorioamniotic membranes from patients with PPROM supports our hypothesis that patients who are not capable of producing adequate amounts of PI3 may be predisposed to PPROM.
It has been suggested that PI3 is involved in the pathophysiology of many clinical conditions. For example, PI3 was found in the epidermis of patients with psoriasis, but not in normal human epidermis [33]. Higher levels of PI3 were also observed in bronchial secretions from patients with chronic obstructive pulmonary disease [2] and bronchial carcinoma [7], and the expression of PI3 was decreased in breast [9] and in epidermal tumors [8,9]. The SNPs identified here will likely be useful for studying the molecular mechanisms of these diseases.

Conclusion
A high degree of polymorphism was detected in the PI3 gene with 23 SNPs, 11 of which are in the promoter region. Two SNP sites (-1063G>A and -689C>G) showed differential binding of transcription factors in nuclear extracts derived from both amnion and HeLa cells, suggesting possible involvement of these two SNPs in the expression of the PI3 gene. Because the factor bound at the -1063G>A site was not AP1, as had been suggested by the in silico search, the bound transcription factor may not be in current databases and needs to be characterized. Binding of GATA1 to the G allele at the -689C>G site suggests the involvement of GATA1 in the transcriptional regulation of the PI3 gene in amnion cells. We have performed a genetic association study with PI3 variants, including the -689C>G variant, and found that it is associated with PPROM [manuscript in preparation]. We also previously demonstrated by immunohistochemistry that many cell types of the chorioamniotic membranes produce PI3 and that PI3 protein is decreased in chorioamniotic membranes from PPROM cases [16]. Together, these lines of evidence provide a plausible genetic explanation for the