Novel variants in the PRDX6 Gene and the risk of Acute Lung Injury following major trauma

Background: Peroxiredoxin 6 (PRDX6) is involved in redox regulation of the cell and is thought to be protective against oxidant injury. Little is known about genetic variation within the PRDX6 gene and its association with acute lung injury (ALI). In this study we sequenced the PRDX6 gene to uncover common variants, and tested association with ALI following major trauma.
Methods: To examine the extent of variation in the PRDX6 gene, we performed direct sequencing of the 5' UTR, exons, introns and the 3' UTR in 25 African American cases and controls and 23 European American cases and controls (selected from a cohort study of major trauma), which uncovered 80 SNPs. In silico modeling was performed using Patrocles and Transcriptional Element Search System (TESS). Thirty-seven novel and tagging SNPs were tested for association with ALI compared with ICU at-risk controls who did not develop ALI in a cohort study of 259 African American and 254 European American subjects that had been admitted to the ICU with major trauma.
Results: Resequencing of critically ill subjects demonstrated 43 novel SNPs not previously reported. Coding regions demonstrated no detectable variation, indicating conservation of the protein. Block haplotype analyses reveal that recombination rates within the gene seem low in both Caucasians and African Americans. Several novel SNPs appeared to have the potential for functional consequence using in silico modeling. Chi2 analysis of ALI incidence and genotype showed no significant association between the SNPs in this study and ALI. Haplotype analysis did not reveal any association beyond single SNP analyses.
Conclusions: This study revealed novel SNPs within the PRDX6 gene and its 5' and 3' flanking regions via direct sequencing. There was no association found between these SNPs and ALI, possibly due to a low sample size, which was limited to detection of relative risks of 1.93 and above. Future studies may focus on the role of PRDX6 genetic variation in other diseases, where oxidative stress is suspected.


Background
Acute Respiratory Distress Syndrome (ARDS) and Acute Lung Injury (ALI) affect 100,000-150,000 patients each year in the United States alone [1,2]. ALI is an inflammatory syndrome characterized by acute respiratory failure due to non-cardiogenic pulmonary edema and hypoxemia [3]. Oxidant stress caused by reactive oxygen species (ROS) is thought to be a major contributor to the pathogenesis of ALI. ROS can be generated by inflammatory cells or pulmonary endothelium and cause damage to proteins, DNA, and lipids [4].
The risk of developing ALI/ARDS is not uniformly distributed in the critically ill population, suggesting a genetic influence on outcomes [5]. Peroxiredoxins are a superfamily of non-heme and non-selenium peroxidases that are widely distributed throughout all phyla [6]. The Peroxiredoxin 6 gene (PRDX6) is located on chromosome 1q24 and is approximately 12 Kb in length, containing 5 exons. The Prdx6 protein encoded by PRDX6 is involved in redox regulation of the cell and has been shown in cell and animal models to be protective against oxidative injury [7]. Prdx6 also has phospholipase A2 activity and has an important role in lung surfactant metabolism [7]. In addition, Prdx6 has been shown to affect the cellular level of H2O2 produced in cells stimulated with platelet-derived growth factor or tumor necrosis factor-α, and to modulate signaling induced by those ligands [8], indicating that Prdx6 can affect cytokine levels and cell signaling cascades. Recent studies suggest that Prdx6 is only active following heterodimerization with glutathione-S-transferase pi (GSTpi), indicating that there is an important interaction between Prdx6 and GSTpi [6]. Despite these important functions, little is known about genetic variation within the PRDX6 gene [9] and its association with ALI.
To determine whether variation within PRDX6 is associated with ALI risk in either the African American (AA) or European American (EA) populations, we performed direct sequencing of the 5' untranslated region (UTR), exons, introns, and the 3' UTR in 48 subjects (25 African Americans and 23 European Americans) and identified 80 variants, many of which have not been previously reported. Eighteen of the 80 variants, along with 19 tagging SNPs selected using HapMap (http://hapmap.ncbi.nlm.nih.gov/), were tested for association with ALI using a custom genotyping platform.

Patient population
Between 1999 and 2006, patients were enrolled in a major trauma cohort study designed to study molecular risks for acute lung injury [10-12]. Participants met the following inclusion criteria: 1) admission to the intensive care unit (ICU) as a result of acute trauma, directly from the field or via that hospital's Emergency Department; and 2) an Injury Severity Score (ISS) ≥ 16, calculated on the basis of information available during the first 24 hours of hospitalization. The following demographic and clinical variables were collected upon admission to the ICU: age, gender, ISS, blunt mechanism, and Acute Physiology and Chronic Health Evaluation (APACHE) score (Table 1). Exclusion criteria were death or discharge from the ICU within 24 hours of admission, age less than 13 years, current or past evidence of congestive heart failure (CHF) or recent acute myocardial infarction, severe chronic respiratory disease, morbid obesity, burns on over 30% of the total body surface area, and lung or bone marrow transplant [10].
The definition of ALI was in accordance with the American European Consensus Conference (AECC) [3]. ALI and ARDS were defined as: acute onset; bilateral pulmonary infiltrates on chest X-ray consistent with pulmonary edema; absence of evidence of left atrial hypertension; and poor systemic oxygenation, with a ratio of arterial oxygen partial pressure (PaO2) to the fraction of inspired oxygen (FiO2) less than or equal to 300 for ALI and 200 for ARDS [3]. All chest X-rays were reviewed independently by 2 trained observers. In our population, more than 85% of subjects meeting criteria for ALI also met criteria for ARDS.

Clinical Data and Biosample Collection
Clinical data were collected by trained study nurses using a standardized research case report form designed for the trauma cohort study. Blood for analysis was obtained from residual blood samples in tubes containing ethylenediaminetetraacetic acid (EDTA) that had been previously drawn for other clinical purposes. Study personnel collected residual samples each day, centrifuged them, and separated the buffy coat layers, which were frozen at -80°C [10]. All clinical and biosample collection protocols were approved by the institutional review board (IRB) at the University of Pennsylvania School of Medicine under a waiver of informed consent.

PRDX6 resequencing
Genomic DNA was extracted from whole blood using QIAamp DNA Blood Midi Kits (Qiagen USA) and stored in the provided tris-EDTA buffer. DNA from 25 African American and 23 European American subjects from the major trauma cohort, with ALI status equally distributed within each group, was selected for sequencing of PCR fragments, providing 99% power to detect variants with minor allele frequencies of at least 5% [13]. PCR primers for 4 Kb upstream of the ATG start site, all exons, introns, and 4 Kb of the 3' UTR were designed using PCRoverlap (Children's Hospital of Philadelphia (CHOP) bioinformatics core) to generate amplicons between 600 and 800 bp that overlapped by  [14]. Genotyping novel variants not only served to test for association, but also allowed us to validate those SNPs in a larger population. Tagging SNPs were also selected to provide better coverage of the haplotype structure of PRDX6. SNPlex utilizes an oligonucleotide ligation/PCR assay with universal ZipChute probe detection to perform genotyping of up to 48 SNPs in a single reaction. ZipChute probes were custom designed and detected by capillary electrophoresis using the Applied Biosystems 3130 Analyzer, and genotype calls were determined using GeneMapper 4.0 (Applied Biosystems, Foster City, CA). All genotyping was performed in the University of Pennsylvania's Laboratory for Molecular Epidemiology (LME). Staff were blinded to disease status, and genotyping calls were performed in subsamples by plate. Each plate contained six positive controls to test for concordance. Genotyping calls were performed automatically using the algorithm described by De La Vega and colleagues [15].
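The 99% detection power quoted for the resequencing sample can be approximated with a simple calculation. The sketch below assumes that the 48 subjects contribute 96 independently sampled chromosomes and asks how likely it is that an allele of a given frequency is seen at least once; this is one common way to arrive at such a figure, and the method in reference [13] may differ. The function name is illustrative only.

```python
def detection_power(maf: float, n_subjects: int) -> float:
    """Probability of observing an allele at least once.

    Assumes diploid subjects and independent sampling of chromosomes,
    so 2 * n_subjects chromosomes are examined in total.
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must lie between 0 and 1")
    n_chromosomes = 2 * n_subjects
    # P(allele never seen) = (1 - maf) ** n_chromosomes
    return 1.0 - (1.0 - maf) ** n_chromosomes


if __name__ == "__main__":
    # 25 African American + 23 European American subjects = 48 subjects
    power = detection_power(maf=0.05, n_subjects=48)
    print(f"Power to detect a 5% allele in 48 subjects: {power:.3f}")
```

Under these assumptions the result is just above 0.99 for the pooled sample. Power within each ancestry group alone (25 or 23 subjects) would be lower.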

In silico modeling of putative function in SNP sites
We sought to test inferred function in silico using transcription factor binding and mRNA binding tools. TESS is a web-based software tool for locating possible transcription factor binding sites in DNA sequences using weight matrix models. It can also be used for browsing information about relevant transcription factors in the TRANSFAC database [16]. All SNPs discovered within the 5' UTR and the first intron were submitted to TESS as 21 base pair FASTA sequences with the reference allele of the SNP of interest in the 11th position. A second search was performed using the alternative allele in the 11th position. To eliminate poor matches due to background noise, transcription factors with log-likelihood scores (La) less than 12 were eliminated. TESS results were compared with experimental transcription factor binding site (TFBS) data registered in the University of California Santa Cruz (UCSC) Genome Browser by the Encyclopedia of DNA Elements (ENCODE) consortium [17]. The ENCODE data were filtered by chromosome and position.
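The construction of the TESS queries can be sketched as follows. This is only an illustration of the layout described above (10 flanking bases on each side, SNP allele at position 11, one record per allele) and of the La < 12 filter. The sequence, the record name and the scores are made up, and the functions are hypothetical helpers, not part of TESS.

```python
from typing import List, Tuple

FLANK = 10  # bases on each side, placing the SNP at position 11 of 21


def allele_windows(name: str, upstream: str, ref: str, alt: str,
                   downstream: str) -> str:
    """Return two FASTA records (reference and alternative allele)."""
    if len(upstream) != FLANK or len(downstream) != FLANK:
        raise ValueError(f"each flank must be exactly {FLANK} bases")
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("only single-base alleles are handled here")
    records = []
    for label, allele in (("ref", ref), ("alt", alt)):
        sequence = (upstream + allele + downstream).upper()
        records.append(f">{name}_{label}\n{sequence}")
    return "\n".join(records)


def filter_hits(hits: List[Tuple[str, float]],
                min_score: float = 12.0) -> List[Tuple[str, float]]:
    """Drop predicted factors whose log-likelihood score is below the cutoff."""
    return [(factor, score) for factor, score in hits if score >= min_score]


if __name__ == "__main__":
    # Hypothetical flanking sequence and alleles, for illustration only.
    print(allele_windows("snp_demo", "ACGTACGTAC", "A", "G", "GTACGTACGT"))
    # Hypothetical scores; only entries with La >= 12 are kept.
    print(filter_hits([("SP1", 14.0), ("GATA-1", 11.9)]))
```

Comparing the filtered hit lists for the two records shows whether the alternative allele creates, abolishes, or changes a predicted binding site.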
A search query was performed for potential miRNA binding sites in the 3' UTR of PRDX6 using Patrocles (http://www.patrocles.org/). Patrocles is an online database containing DNA sequence polymorphisms that are predicted to interrupt miRNA-mediated gene regulation [18]. The search was performed using "PRDX6" as a key word in the target gene id field, and miRNA target motifs were defined by Xie et al. [19] and Lewis et al. [20].

Statistical Analysis of ALI association
A total of 259 African American and 254 European American subjects enrolled in the major trauma cohort were used to test for association of novel variants and tagging SNPs with ALI. Association of each PRDX6 SNP with ALI was determined separately for European Americans and African Americans using an additive model Chi2 test, with a p-value < 0.0014 for African Americans considered significant. Dominant and recessive inheritance models were also tested using Chi2 analysis. Multivariable analyses of potential confounding were performed using logistic regression. Power was calculated using Power for Genetic Association analyses (PGA) software [21]. Using PGA, we estimated that a sample size of 250 subjects per race category would provide 80% power to detect relative risks of 2.26 or greater for SNPs with a prevalence of 0.05 or greater and 1.93 or greater for SNPs with a prevalence of 0.10 or greater, assuming a Bonferroni-corrected alpha = 0.0014 for African Americans and an incidence of ALI = 0.30 (Additional file 2). These statistical analyses were performed using Stata 11 (StataCorp, College Station, TX). Pairwise linkage disequilibrium (LD) was evaluated using Haploview (http://www.broadinstitute.org/mpg/haploview). SNPs with a genotype completion rate of 95% or greater were considered for analysis in Haploview. LD was calculated in terms of r2 values, and blocks were defined using Haploview's default algorithm, the confidence interval method of Gabriel and colleagues [22].
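As an illustration of the single-SNP analysis, the sketch below implements a Cochran-Armitage trend test, which is a common way to test an additive genetic model on a 2 x 3 genotype table. It is an assumption that this matches the additive Chi2 test run in Stata. The genotype counts are invented. The threshold of 0.0014 is consistent with dividing 0.05 by the 37 SNPs genotyped, although the divisor is not stated explicitly in the text.

```python
import math
from typing import Sequence, Tuple


def additive_trend_test(cases: Sequence[int],
                        controls: Sequence[int]) -> Tuple[float, float]:
    """Cochran-Armitage trend test with additive weights 0, 1, 2.

    cases and controls hold genotype counts ordered by the number of
    minor alleles carried (0, 1, 2). Returns (chi-square, p-value) on 1 df.
    """
    weights = (0, 1, 2)
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    t = sum(w * (r * n_controls - s * n_cases)
            for w, r, s in zip(weights, cases, controls))
    sum_w2n = sum(w * w * c for w, c in zip(weights, totals))
    sum_wn = sum(w * c for w, c in zip(weights, totals))
    variance = (n_cases * n_controls / n) * (n * sum_w2n - sum_wn ** 2)
    if variance == 0:
        return 0.0, 1.0  # no genotype or phenotype variation to test
    chi2 = t * t / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    threshold = bonferroni_alpha(0.05, 37)
    # Invented counts: genotypes with 0, 1, 2 minor alleles
    chi2, p = additive_trend_test(cases=(40, 30, 8), controls=(100, 65, 16))
    print(f"threshold={threshold:.4f} chi2={chi2:.3f} p={p:.3f} "
          f"significant={p < threshold}")
```

Dominant and recessive models can be handled in the same framework by changing the weights to (0, 1, 1) or (0, 0, 1).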
Haplotypes were inferred using the standard expectation maximization algorithm in Haploview [23,24] and the following confidence interval (CI) criteria: CI minima for strong LD: 0.7-0.98; upper CI maximum for strong LD: 0.98; fraction of strong LD in informative comparisons ≥ 0.95; and exclusion of markers with minor allele frequency (MAF) < 0.05. Haplotypes were tested for association with ALI using PLINK [25], first in a global association test, which performed contingency testing using all haplotypes of an LD block compared to no haplotypes, and then as individual haplotypes versus ALI coded in an additive fashion. Haplotype multiple testing was addressed by applying permutation tests (10,000 permutations).
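The idea behind the permutation step can be sketched in a few lines. The example below shuffles case/control labels and recomputes a simple statistic (the absolute difference in mean haplotype copy number between cases and controls) to obtain an empirical p-value. It is a generic label-permutation sketch with invented data and a deliberately simple statistic, and is not a reproduction of PLINK's permutation procedure.

```python
import random
from typing import Sequence


def mean_copy_difference(copies: Sequence[int],
                         is_case: Sequence[bool]) -> float:
    """Absolute difference in mean haplotype copies, cases vs controls."""
    case_vals = [c for c, flag in zip(copies, is_case) if flag]
    ctrl_vals = [c for c, flag in zip(copies, is_case) if not flag]
    if not case_vals or not ctrl_vals:
        raise ValueError("need at least one case and one control")
    return abs(sum(case_vals) / len(case_vals)
               - sum(ctrl_vals) / len(ctrl_vals))


def permutation_p_value(copies: Sequence[int], is_case: Sequence[bool],
                        n_permutations: int = 10_000, seed: int = 0) -> float:
    """Empirical p-value from shuffling phenotype labels."""
    rng = random.Random(seed)
    observed = mean_copy_difference(copies, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        if mean_copy_difference(copies, labels) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Invented data: copies of one haplotype (0, 1 or 2) per subject
    copies = [0, 1, 2, 1, 0, 0, 1, 1, 2, 0, 1, 0]
    is_case = [True, True, True, True, False, False,
               False, False, True, False, True, False]
    print(permutation_p_value(copies, is_case, n_permutations=10_000))
```

When several haplotypes are tested, taking the maximum statistic across haplotypes within each permutation is one way to control the family-wise error rate.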

Identification of novel polymorphisms in PRDX6
Direct sequencing of the PRDX6 gene in 48 subjects revealed 80 genetic variants, none of which were in coding regions (31 in the 3' UTR, 22 in the 5' UTR and 27 intronic) (Additional file 3). The variants identified via direct sequencing were compared with those registered in the NCBI dbSNP database (Build 130), and 43 were found to be novel SNPs (Table 2). Thirty-seven were matched with SNPs catalogued in dbSNP (Build 130) and Genewindow (http://genewindow.nci.nih.gov/Welcome) based on chromosome position (Table 3). Twenty-five of the novel SNPs uncovered had a MAF > 0.04 and were submitted to the NCBI to be catalogued and assigned ss numbers in the submitter records section of dbSNP (Table 2). Novel SNPs were also compared with SNPs registered in the 1000 Genomes database. Thirty-six of the 37 known SNPs overlapped with SNPs registered in 1000 Genomes, but only 16 of the 43 novel SNPs identified via our sequencing effort were also registered in 1000 Genomes (Table 4). Several variants were observed in only one individual. As a quality control measure, we present the confidence scores for these genotypes in Additional file 4. Confidence scores are reported as the percentage of overlap between heterozygote peaks. Previous studies indicated that two transcription factor binding sites, the ARE1 (-357 to -349) [26] and GRE2 (-750 to -738) (A. Fisher, unpublished observations), may play a role in the regulation of PRDX6. We were unable to sequence the ARE1 region and portions of the intronic regions due to the GC rich content of the flanking sequence (Figure 1). The GRE2 region was successfully sequenced, but showed no variation.

In Silico function of novel SNPs in PRDX6
The TESS results showed several potential transcription factor binding motifs in both the reference and alternative sequences. The reference and alternative sequences were submitted as independent queries, and transcription factors were returned for 19 positions in the reference sequence and 21 positions in the alternative sequence (Table 5). Twenty-seven of the 29 sequences submitted were shown to create, abolish, or change a transcription factor binding site. Fourteen of these SNPs were novel. Comparison of the transcription factors returned from the TESS query with the data from ENCODE showed that only 3 of these putative transcription factors have been tested by the ENCODE consortium: SP1, GATA-1, and c-Myc. ENCODE data for SP1, GATA-1, and c-Myc, filtered for PRDX6 and A549 cells, showed no evidence of binding to the PRDX6 sequences.
A Patrocles miRNA database search for PRDX6 revealed eight SNPs in the 3' UTR of PRDX6 as potential miRNA binding sites (Table 6). Of the eight SNPs returned from the search query, three matched SNPs from this study (rs4611, rs36005931, and rs2000). rs4611 and rs36005931 are located within octamers that have been conserved among several species, but do not correspond to a known miRNA. The G allele of rs2000 is part of an octamer capable of binding miR-942. A literature search for miR-942 returned only sequence data, with no known function to date.

Association of PRDX6 with ALI
The trauma cohort described in Table 1 was genotyped for 37 PRDX6 SNPs using SNPlex. All SNPs were tested for Hardy-Weinberg equilibrium (Additional file 5). Chi 2 analysis of incidence of ALI compared to genotype using an additive model showed no significant association between any of the SNPs in this study and ALI (Table 7). Dominant and recessive models failed to demonstrate an association between our SNPs of interest and ALI (Additional file 6). The genotype concordance rate based on assay positive controls was 100% and the frequency of missing genotypes is presented in Table 7. Logistic regression analysis after adjustment for age and ISS showed no association between ALI and our SNPs (Table 8).
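For readers unfamiliar with the Hardy-Weinberg check mentioned above, a minimal sketch follows. It uses the standard one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. The text does not state which HWE test was applied (an exact test is also common), so this is an assumption, and the counts shown are invented.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    Arguments are observed counts of the two homozygotes and the
    heterozygote. Returns (chi-square, p-value) on 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic markers)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Invented genotype counts for a single SNP
    chi2, p_value = hwe_chi_square(n_aa=180, n_ab=65, n_bb=9)
    print(f"chi2={chi2:.3f} p={p_value:.3f}")
```

Marked departures from Hardy-Weinberg proportions among controls are often used as a flag for genotyping error.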

Haplotype Analysis
Haplotype blocks were created for both African and European Americans using 27 and 28 SNP markers, respectively, across a region spanning 100.6 kb of chromosome 1. For African Americans, 14 SNP markers had a genotyping completion rate of less than 95% and were thus excluded from the haplotype analysis. For European Americans, 9 SNPs had a genotype completion rate of less than 95% and were excluded from the haplotype analysis.

Discussion
Prdx6 is a member of the thiol-specific antioxidant protein family and has been shown in overexpressing cell and mouse models to be protective against oxidant stress, while null models show sensitivity to oxidants [7,9,27]. Thus, PRDX6 is a suitable candidate gene for ALI risk. The extent of genetic variation within PRDX6 remains largely unknown; we therefore performed direct sequencing of the PRDX6 gene and identified novel variants for future study. We also tested the newly discovered SNPs and tagging SNPs for association with ALI using our trauma cohort, and did not demonstrate an association with trauma-related ALI.
We identified 43 novel variants among African American and European American subjects with either ALI or control status. None of the 43 SNPs identified were in coding regions, which may indicate that the Prdx6 protein is highly conserved across phyla. Approximately 19 kb on chromosome 1 was sequenced in order to achieve adequate coverage of the PRDX6 gene and flanking 5' and 3' UTRs. Special attention was given to the GRE2 and ARE1 regions, -749 to -737 and -357 to -349, respectively. The ARE1 within the PRDX6 promoter was shown to play a role in regulation of transcription and to be inducible under conditions of oxidative stress [26], and the GRE2 may be capable of binding transcription factors under oxidative stress conditions [28]. Due to the GC rich content of the region surrounding the ARE1, we were unable to optimize PCR conditions to prime through the secondary structure. The GRE2 region was sequenced, but no variation was noted. The GC rich region within the PRDX6 promoter might warrant further investigation, since methylation of DNA cytosine residues is often found in the sequence context CpG. Several new sequencing approaches are emerging that target methylation sites using restriction enzyme treatment followed by sequencing by synthesis [29].
In addition to comparing our results with NCBI's dbSNP, we compared our novel and known SNPs with the resequencing data registered in 1000 Genomes. The 1000 Genomes project aims to find most genetic variants with frequencies of at least 1%. Thus far, three sequencing projects contribute to the database: low coverage sequencing of 179 individuals from 4 populations, high coverage sequencing of 2 mother-father-child trios, and exon-targeted sequencing of 697 individuals from 7 populations [30]. Although 1000 Genomes aims to identify over 95% of variation in any individual, 27 of our novel SNPs and 1 previously recorded SNP are not present in the database, signifying a need for resequencing of extreme phenotypes, such as ALI cases.
Novel and previously recorded SNPs in the 5' UTR and first intron of PRDX6 were submitted to TESS to determine their likelihood of being in transcription factor binding sites. We found 19 motifs in the reference sequences that are capable of binding known transcription factors and 21 in the alternative sequence. A comparison between the results of the reference sequence search and the alternative revealed that in most cases, the SNP of interest changes the motif enough to cause a different transcription factor to bind that site or can cause a binding site to disappear and vice versa. After comparison with the ENCODE data, we found that our sequences have not yet been shown to bind the three overlapping transcription factors tested in ENCODE experimentally.
Known SNPs validated in the sequencing effort were compared using a Patrocles search query for miRNA target sites within PRDX6 to determine if any of our SNPs were in putative target sites for miRNAs. Three of the eight SNPs returned from the search corresponded with our known SNPs. Only one of the three SNPs was found to have a corresponding known miRNA (miR-942). Some miRNAs are known to control the expression of genes at the posttranscriptional level [31]. However, very limited data are available on miR-942.
We performed an association study for ALI using newly uncovered SNPs and SNPs selected from HapMap and NCBI's dbSNP and observed no significant association between any of the SNPs in this study and ALI. This lack of association may be due to several causes. First, only relatively large effects were detectable because of sample size limitations. We genotyped 513 subjects to test for an association between our selected SNPs and ALI, but this sample size was inadequate to detect relative risks below 1.93 and 1.69 for alleles with MAFs of 0.05 and 0.10, respectively. Second, our analyses were limited to patients with severe trauma. Thus, our study did not evaluate a possible association with other causes of ALI such as sepsis. Finally, it is possible that PRDX6 genetic variation may not modify the risk of ALI. The genotype data were used to construct haplotype blocks to better assess the PRDX6 gene structure. Haplotype analysis plays an important role in association studies between genotype and phenotype, since SNPs found to be in strong LD can capture most of the genetic variation across fairly large regions [24]. The haplotype blocks constructed from our genotype data did not show strong linkage disequilibrium using confidence intervals; therefore, tagging SNP strategies in future studies should be approached with caution.
Our resequencing data did not show any variation in the coding region of PRDX6. Had nonsynonymous SNPs been discovered, it would have prompted us to investigate whether any of these SNPs had any effect on protein structure, which could cause a loss of function in Prdx6. Since we cannot make a connection between coding region SNPs and conformational changes in the protein, we examined regulatory effects. We found several promoter SNPs that change the sequence of potential TFBSs based on conservation data. We were unable to confirm that these sequences were in fact TFBSs due to the lack of available data. However, if any of our promoter SNPs showed a significant association with ALI or another phenotype, perhaps using a larger sample size, future studies using promoter constructs could offer more information on upregulation of PRDX6. We also found several SNPs in the 3' UTR. It is possible that one or more of these SNPs is responsible for changing an miRNA binding site, thus repressing protein translation.
Our study has several limitations. One potential limitation of this study is the number of genotype call failures. Ten and nine markers for African Americans and European Americans, respectively, were eliminated from our analysis since they were under the 95% completion rate cut-off. This high rate of genotype failure was due to difficulties with consistent assay performance rather than DNA quality. If these genotypes had been obtained, it is possible that an association may have been observed. Also, we did not adjust our results for ancestry informative markers (AIMs). Instead our population was stratified based on skin color, which may not be an adequate proxy for population admixture effects. Another possible limitation is a candidate gene approach that focused on a single gene: PRDX6. ALI risk may be considered a complex phenotype, and thus is likely not fully explained by variation in a single gene [10].
Finally, we only tested for association in patients with ALI from severe trauma. Thus, it is possible that PRDX6 may play a role in the initiation or severity of ALI after other insults, including sepsis, or in determining recovery from ALI.
PRDX6 has been shown to play a role not only in ALI, but in other diseases as well. A recent study demonstrated that PRDX6 promotes lung cancer metastasis and invasion via phospholipase A2 activity in mice [32]. Another publication reported that PRDX6-transfected breast cancer cells metastasized more readily to the lungs when compared with control cells [26]. It is possible that our novel SNPs may function in lung cancer as well as ALI. The interaction between GSTpi and PRDX6 is another interesting subject for future studies. GSTpi expression is elevated in tumors from a variety of cancers, including lung cancer, compared to normal tissue [33]. Testing gene-gene interactions between PRDX6 and GSTpi would be an interesting future direction both in ALI and other diseases such as cancer.

Conclusion
In conclusion, this study revealed novel SNPs within the important anti-oxidant PRDX6 gene and its 5' and 3' flanking regions via direct sequencing. Several of these variants have putative function and may be useful for future gene association studies. Although no association was found between our novel and tagging SNPs and trauma-related ALI, future studies may focus on the role of PRDX6 variation in other at-risk groups, as well as in other diseases.