Mutations in the 3'-untranslated region of GATA4 as molecular hotspots for congenital heart disease (CHD)

Background The 3'-untranslated region (3'-UTR) of mRNA contains regulatory elements that are essential for the appropriate expression of many genes. These regulatory elements are involved in the control of nuclear transport, polyadenylation status, subcellular targetting as well as rates of translation and degradation of mRNA. Indeed, 3'-UTR mutations have been associated with disease, but frequently this region is not analyzed. To gain insights into congenital heart disease (CHD), we have been analyzing cardiac-specific transcription factor genes, including GATA4, which encodes a zinc finger transcription factor. Germline mutations in the coding region of GATA4 have been associated with septation defects of the human heart, but mutations are rather rare. Previously, we identified 19 somatically-derived zinc finger mutations in diseased tissues of malformed hearts. We now continued our search in the 609 bp 3'-UTR region of GATA4 to explore further molecular avenues leading to CHD. Methods By direct sequencing, we analyzed the 3'-UTR of GATA4 in DNA isolated from 68 formalin-fixed explanted hearts with complex cardiac malformations encompassing ventricular, atrial, and atrioventricular septal defects. We also analyzed blood samples of 12 patients with CHD and 100 unrelated healthy individuals. Results We identified germline and somatic mutations in the 3'-UTR of GATA4. In the malformed hearts, we found nine frequently occurring sequence alterations and six dbSNPs in the 3'-UTR region of GATA4. Seven of these mutations are predicted to affect RNA folding. We also found further five nonsynonymous mutations in exons 6 and 7 of GATA4. Except for the dbSNPs, analysis of tissue distal to the septation defect failed to detect sequence variations in the same donor, thus suggesting somatic origin and mosaicism of mutations. In a family, we observed c.+119A > T in the 3'-UTR associated with ASD type II. Conclusion Our results suggest that somatic GATA4 mutations in the 3'-UTR may provide an additional molecular rationale for CHD.


Background
GATA4 (MIM# 600576) is a transcription factor which is characterized by a highly conserved binding domain of two zinc fingers. It is expressed in the heart and is essential for mammalian cardiac development (see reviews [1][2][3]). In mice, germline ablation of the gene encoding Gata4 results in abnormal ventral folding of the embryo, failure to form a single ventral tube, and lethality [4,5]. Besides heart development, Gata4 is involved in the formation of multiple organs, such as intestine, liver, pancreas and swim bladder in zebrafish [6] as well as gastric epithelial development in mouse through interaction with Fog cofactors [7].
In humans, four GATA4 gene mutations have been identified in families with congenital heart defects (CHD) notably, atrial septal defects [8][9][10][11]. One mutation (p.Gly296Ser) affects a highly conserved amino acid after the C-terminal zinc finger, and disrupts physical interaction with TBX5 [8], a transcription factor described to be defective in Holt-Oram syndrome [12]. But GATA4 germline mutations are rather rare. Two studies analyzed 16 families each, but only 2/16 families with multiple affected members carried GATA4 mutations [10,11]. In 42 patients with non-syndromic atrioventricular septal defects (familial as well as sporadic), no GATA4 mutations have been detected [13].
It is of considerable importance that genetic analysis of blood samples may not reveal somatically-occurring sequence alterations in the diseased cardiac tissues, but these mutations may provide a molecular rationale for pathogenesis [14,15]. We recently reported evidence for somatically-derived NKX2-5 and TBX5 mutations as a novel mechanism for septation defects of the human heart [16][17][18], and used the same heart collection of malformed hearts to investigate GATA4 gene mutations, notably those affecting the coding of amino acids for zinc finger functions of the protein [19]. To identify further genetic alterations associated with CHD, we continued our search in the 609 bp of 3'-untranslated region (3'-UTR) of GATA4.
The 3'-UTR of GATA4 is relatively long and likely contains regulatory elements essential for the regulation and transport of the mRNA transcript [20]. Indeed, evidence is accumulating that the 3'-UTR of mRNA is involved in the control of nuclear transport, polyadenylation status, subcellular targetting as well as rates of translation and degradation of mRNA; thus sequence alterations in this region may lead to disease (see review [21,22]). Recently, a single nucleotide deletion in the 3'-UTR of high mobility group A1 gene (HMGA1) was identified in two type 2 diabetes patients exhibiting deficiency of the protein [23]. The mutation decreased HMGA1 mRNA half life in the lym-phoblast cells of the patients, and further deletion of the 3'-UTR of HMGA1 resulted in the decrease of reporter assay expression. To the best of our knowledge, the 3'-UTR of GATA4 has not been investigated particularly as regards its role in post-transcriptional gene expression. In this paper, we report the detection of frequent GATA4 sequence alterations in the 3'-UTR in the diseased tissues of malformed hearts, some of which are predicted to alter RNA secondary structure. As a result of aberrant RNA folding, these mutations may lead to CHD by affecting mRNA localization and translation of protein. We found further five nonsynonymous mutations in exons 6 and 7 of GATA4.

Results
GATA4 is located on chromosome 8p23.1-p22, consists of seven exons and codes for a 442-amino-acid protein (see Fig. 1A). We investigated the genomic DNA for sequence variations in the entire coding region and untranslated regions (UTR) of this gene. In the case of formalin-fixed material, exons 3, 4, 6 and 7 of malformed hearts of patients with complex CHD could be amplified. We reported earlier the detection of 23 nonsynonymous mutations in exons 3 and 4, notably mutations affecting the zinc fingers of GATA4 [19]. Here we report further five nonsynonymous mutations in exons 6 and 7, which affect highly conserved amino acids 361, 377, 430, 432 and 442 in the C-terminal region of GATA4 (Table 2). Whether these sequence alterations result in aberrant protein expression requires additional studies. Previously, it has been shown that a frameshift mutation past amino acid 359 severely altered the encoded GATA4 protein and was associated with ASD [8]. Nonetheless, mutations p.Leu432Ser and p.Ala442Thr identified by us would affect polarity of the protein, due to replacement of nonpolar, hydrophophic amino acids with polar, hydrophilic ones. In addition, p.Met361Val likely results in aberrant RNA folding, as determined by a method described below.
Notably, exon 7 of GATA4 consists of 1,708 bp, the majority (1,525 bp) being untranslated. From our collection of formalin-fixed hearts, we could amplify the first four fragments of exon 7 (see Fig. 1A and Table 1 for the primer sequences designated GTx7-1 to GTx7-4). These primer sequences cover a part of the coding region c.1221 to c.1329 (nt 1739-1847, NM_002052) and 3'-UTR from c.+1 to c.+609 (nt 1848-2456, NM_002052). We found 9 sequence alterations, occurring in 3 to 7 patients, and 6 dbSNPs in the analyzed 3'-UTR region of GATA4 (Table 2,  Table 3). On the basis of sequence electropherograms and/or PCR-RFLP assays, both heterozygous and homozygous genotypes were observed (Fig. 1B). Using the program GeneQuest (Lasergene 6.0), we determined the effect of the nine sequence alterations on RNA folding.
GeneQuest uses the Vienna RNA folding procedure, taken from Zuker's optimal RNA folding algorithm, to fold the sense strand of selected DNA regions as RNA. We investigated the first 500 bp of the 3'-UTR, simply because the sequence alterations identified by us are located within this segment. Furthermore, there is suggestion that the whole 3-'UTR is not necessary for localization, but that localization signals lie within the regions of < 100-200 nt [25]. Compared to the reference sequence, 7/9 of these would result in faulty RNA folding (Fig. 2). While c.+77C > T and c.+479A > G would likely not lead to faulty RNA folding, the same pattern was observed for c.+10T > C and c.+44T > A. In addition, we searched the 3'-UTR of GATA4 for highly conserved motifs that may be involved in posttranscriptional regulation, (see [26]). None of the conserved motifs reported so far have been detected, but the highly conserved motif (AATAAA) associated with polyadenylation signals were located at positions c.+678-683 and c.+1502-1507.
To compare healthy vs. diseased tissues from each patient, we selected specifically those who were positive for mutations in the diseased tissue. For instance, in the diseased tissues, we found 17/68 patients without nucleotide changes in analyzed region of the 3'-UTR except for the dbSNPs. Thus for this comparison, we analyzed 21 patients for GTx7-1 (21 × 2 × 289 bp = 12,138 nucleotides); 25 patients for GTx7-3 (25 × 2 × 281 bp = 14,050 nucleotides) and 24 patients for GTx7-4 (24 × 2 × 270 bp = 12,960 nucleotides). Except for the dbSNPs rs867858, rs1062219, rs884662, rs904018, rs12825 (see Table 2), none of the mutations were detected in the unaffected tis-Genetic analysis of GATA4 Figure 1 Genetic analysis of GATA4. (A) Schematic representation of GATA4 (genomic, mRNA and protein level). Reference sequence for GATA4 was based on NM_002052 and the intron-exon boundaries on [42]. (B) Examples of 3'-UTR mutations and genotypes obtained after direct sequencing of fragments amplified from diseased cardiac tissues of patients affected by CHD. Additional sequence alterations in the 3'-UTR of GATA4 were detected, e.g. F24ASD (c.+281G>A).
sues. This result suggests somatic origin and mosaicism of mutations.
In the formalin-fixed malformed hearts, we observed six dbSNPs which were all located within 234 nucleotides in exon 7 (Table 2). We could calculate the allele frequencies for five loci (Table 2) and found the dbSNPs to be in Hardy-Weinberg equilibrium. We cloned three fragments from hearts heterozygous for rs1062219 (c.+426C > T), rs884662 (c.+517T > C) and rs904018 (c.+532T > C). Surprisingly, we found three haplotypes after sequencing of at least four clones in each heart. In all three hearts, we found one type containing all reference alleles [c. We further analyzed the entire gene in blood samples of 12 patients with CHD (see Fig. 1A). Mostly dbSNPs were detected (Table 4); nonetheless we observed two sequence variations in the 3'-UTR (c.+119A > T and c.+1260G > A) and two intronic (c.-518-25C > T and c.-458+5A > G) which have not yet been reported as dbSNPs. The 3'-UTR alterations and c.-518-25C>T were detected in a family of four, in which the two children were affected by ASD, but the parents had no history of CHD. For c.+119A>T, the mother was homozygous for the reference allele AA, while the father and the children had the heterozygous genotype AT (Fig. 3A&B). This sequence variation would affect RNA folding (Fig. 3C). As control, we analyzed blood samples from 100 unrelated Caucasian healthy individuals. No sequence variations were found, except for a synonymous change in exon 3 (c.699G>A, p.233Thr), an intronic (c.783+16G>A) and dbSNPs (Table 4).

Discussion
Recent studies characterize further the importance of Gata4 in cardiac development. Indeed, small changes in the level of Gata4 protein expression can dramatically influence cardiac morphogenesis and embryonic survival [27]. Moreover, myocardial expression of Gata4 is required for proliferation of cardiomyocytes, formation of the endocardial cushions, development of the right ventricle, and septation of the outflow tract [28]. To gain further insights into the role of GATA4 mutations in CHD in humans, we have analyzed the Leipzig collection of malformed hearts for sequence alterations of this gene. We reported previously the identification of 23 nonsynonymous mutations, 19 of which are located within the highly conserved zinc fingers and basic regions of GATA4 [19]. Notably, we identified a frequent mutation (p.Cys292Arg) in VSDs, which affects one of the four zinc coordinating cysteines in the C-finger and is predicted to disrupt the secondary structure of the protein. Functional characterization of site-directed mutated residues in the C-finger, showed that disruption of the zinc coordinating cysteines, would abolish DNA binding, Gata4-Nkx2-5 interactions and synergy [29,30].
To identify genetic alterations associated with CHD, we continued our search for mutations in the 3'-UTR region of GATA4. In the diseased tissues of malformed hearts, we found frequent alterations in the 3'-UTR of GATA4, in which 7/9 would affect RNA folding. Additional sequence alterations in the 3'-UTR of GATA4 were detected, but these were mostly isolated cases and therefore not reported here. Further, we identified a germline mutation (c.+119A>T) in which two affected family members were positive. Similarly, it would alter RNA folding. Since the unaffected father carried c.+119A>T, we cannot discount the possibilities of a low penetrance mutation or no causal association at all. Whether the 3'-UTR mutations reported here contributed to cardiac malformations by affecting the secondary structure of mRNA or through other posttranscriptional regulation associated with 3'-UTR, requires further functional analysis. Nonetheless, it has been shown that two mutations in the 3'-UTR of rat Mti (metallothionein-1) that are predicted to alter the secondary structure of this region also impaired localization of the mRNA [25].
There is, however, accumulating evidence on the important role of 3'-UTR in post-transcriptional regulation. For instance, efficient termination and stability of mRNA are dependent on a properly configured 3'-UTR [31]. Localization of mRNAs is also due to specific sites or signals within the 3'-UTR [25,32]. Recently, a systematic search discovered 106 highly conserved motifs in human 3'-UTRs that may be involved in post-transcriptional regulation, including mRNA stability and degradation [26]. Many of these are associated with microRNAs, which are non-coding RNAs of 21-22 nucleotides and are complementary to the 3'-UTR motifs that mediate post-transcriptional regulation [33]. Similary, 53 sequence motifs have been catalogued in 3'-UTR of yeast mRNAs, including motifs corresponding to known RNA-binding protein sites and motifs associated with subcellular localization [34]. Furthermore, 83 disease-associated variants in the 3'-UTR of human protein-coding genes have been systemat-  ically analyzed, and that functionality of these variants correlated highly to predicted secondary structural changes [35].
Many studies are suggestive for Gata4 to network with other transcription factors to regulate cardiac gene expression (see reviews [1][2][3]). These protein-protein interactions are mediated through the proteins' binding domains and for Gata4, the zinc fingers are involved. For instance, N-terminal zinc finger interacts with Fog2, while the Cterminal zinc finger contacts Nkx2-5, Nf-At3, Mef2c and Hand2. Notably, the C-finger of Gata4 interacts with the third alpha helix (homeodomain) of Nkx2-5, to enable transcriptional synergy of regulated genes. Site-directed mutagenesis of zinc coordinating cysteines in the C-finger leads to abrogation of Gata4-Nkx2-5 interaction and loss of synergy [29,30]. We reported previously the identification of somatically-derived GATA4 zinc finger mutations in the same malformed hearts, including a frequent mutation (p.Cys292Arg) in VSDs, which affects one of the four zinc coordinating cysteines and is predicted to disrupt the secondary structure of the protein [19]. However, aside from p.Cys292Arg and unlike NKX2-5 [16,17] common nonsynonymous mutations are rare in the coding regions of GATA4. Thus, the further detection of mutations in the 3'-UTR of GATA4 in the same malformed hearts with zinc finger mutations suggests a possible further mechanisms of disease, in which the zinc fingers are not involved. Indeed, as shown in Table 3, in which GATA4 nonsynonymous and 3'-UTR mutations identified in hearts with septation defects are summarized, there were 14 with 3'-UTR mutations only. These consist of 3 in VSDs, 3 in ASDs, and 8 in AVSDs. From these 14 hearts, 12 carried 3'-UTR mutations that are predicted to result in aberrant RNA folding. As can be surmised from Table 3, it appears that 3'-UTR mutations are more frequent in ASDs and AVSDs than in VSDs. In VSDs, 23 out of 29 had mutations in the coding region, most of which were p.Cys292Arg and only 9 out of 29, carried 3'-UTR mutations. It is also of considerable interest that sporadic cases of CHD from Lebanese population were analyzed for mutations affecting the zinc fingers and basic region of GATA4 [36]. A mutation (p.Glu216Asp) in the N-terminal zinc finger was detected in 2/26 patients with Tetralogy of Fallot (TOF), and this mutation resulted in reduced transcriptional activity without affecting GATA4's ability to bind DNA, nor its interaction with ZFPM2 (FOG2). However, none of the other 94 patients with various CHD phenotypes carried any mutations.
To help elucidate genetic alterations in affected tissues of malformed hearts, we have investigated in the past a panel of cardiac transcription factor genes from the same 68 malformed hearts [16][17][18][19]37]. Direct sequencing revealed mutations in diseased tissues, which were basically absent in matched normal heart samples. Common occurring mutations were identified, especially in the binding domains of transcription factors, which could affect DNAprotein or protein-protein interactions leading to CHD. While certain transcription factor genes (NKX2-5, GATA4) exhibited a high rate of mutations, others were not or rarely affected (HEY2, MEF2C). Results of these studies enabled us to put forward a hypothesis of somatic mutations as a novel molecular cause of CHD. Furthermore, we found malformed hearts containing combination of mutations in the binding domains of several transcription factors, As example, we identified in two patients with AVSD combined mutations in the binding domains of HEY2, NKX2-5, TBX5 and GATA4 [37]. We also identified 21 patients with combined mutations in the GATA4 zinc fingers and the homeodomain of NKX2-5 (unpublished results).

nonsynonymous and 3'-UTR) in diseased tissues of malformed hearts (Continued)
Our malformed hearts were stored in formalin more than 40 years ago and there is speculation about artifacts arising from this procedure [38][39][40]. To ensure reliability of our findings, we investigated sequence variations in 10 formalin-fixed, but healthy hearts which were collected at the same time as the malformed hearts. Essentially, these hearts were free of GATA4 mutations, therefore providing convincing evidence for reliability of the assay. In addition, our identification of six dbSNPs within the 3'-UTR demonstrates faithful preservation of DNA and reliability in using valuable retrospective material with the obvious advantage of studying clearly defined pathology, as opposed to using surrogate tissue material distal to organ defect. Five dbSNPs were highly polymorphic (see Table  2) and were found to be in Hardy-Weinberg equilibrium.
This was confirmed with our control population of healthy individuals. But cloning of fragments amplified from diseased heart tissues of patients who were 'heterozygous' for three dbSNPs detected three haplotypes instead of the expected two, which may be due to duplication of GATA4 in analyzed tissues.
This result is similar to our previous observations on different cardiac transcription factors conducted on the same set of malformed hearts [16][17][18][19]37]. After cloning amplified fragments with several closely-spaced 'heterozygous' mutations as marker loci within a single gene, we observed several haplotypes in individual hearts, instead of two haplotypes as expected of a diploid genome. (Note, we used the term haplotype to define the set of alleles Effect of mutations in the 3'-UTR of GATA4 on mRNA folding Figure 2 Effect of mutations in the 3'-UTR of GATA4 on mRNA folding. Using the program GeneQuest (Lasergene 6.0), the effect of sequence alterations on RNA folding was predicted using the first c.+500 nt in the 3'-UTR of GATA4. GeneQuest uses the Vienna RNA folding procedure, taken from Zuker's optimal RNA folding algorithm [43], to fold the sense strand of selected DNA regions as RNA. Compared to the reference sequence, 7/9 mutations would result in faulty RNA folding, on which five patterns here are shown. Same pattern was observed for c.+10 T > C and c.+44T > A.
within an investigated gene locus). The cause of the observed multiple haplotypes is unknown, but may be explained by a mixed population of cardiomyocytes carrying different mutations or de novo chromosomal rearrangements and gene duplications in the heart tissues of patients affected by CHD. Our results for NKX2-5 by using a yeast-based assay to determine function suggest that different haplotypes can lead to different cardiac disease phenotypes [41]. Furthermore, the presence of combined mutant alleles may alter/modify pattern of mRNA folding. We found, for instance, patients who were either positive for c.+218C>T or c.+259 A>G or both. Two patients were heterozygous for both mutations, while one patient was homozygous for both variant alleles. Singly c.+218C>T or c.+259 A>G would lead to misfolding (see Fig. 2), but if combined only the pattern observed for c.+259 A>G would result. Moreover, different haplotypes due to combinations of closely-spaced polymorphisms in the 3'-UTR of genes can result in different mRNA stabilities in transient expression assays [35]. Nonetheless, as the malformed hearts were conserved in formalin, possibilities exist that the observed multiple haplotypes could be PCR errors resulting from fragmented DNA. We believe the contrary, however, as specific haplotypes were detected in several malformed hearts after cloning, but were absent in matched unaffected heart tissue (unpublished results).
To further support DNA assay reliability on the formalinfixed material, we examined in the course of our work with the Leipzig heart collection, additional genes which have been associated with cardiac malformations. We searched for sequence variations affecting the binding domains of MEF2C and HEY2. In the case of MEF2C gene, no nonsynonymous mutations have been detected in a total of 68 patients with complex CHD (unpublished results). A similar finding was observed with the HEY2 gene, where only three nonsynonymous mutations were found in 2 of 52 patients [37]. Our work on TBX5 in the same heart collection likewise supports reliability of using the material [18]. Mutations in this T-box transcription factor has been associated with Holt-Oram syndrome (HOS), a disorder characterized by heart and upper limb deformities [12]. We amplified 200, 307, 276, 203 and 515 bp containing exons 3, 4, 5, 7 and 8, respectively in the same heart collection. The sequences encoding the Tbox are located in exons 3, 4 and 5. We found a total of nine nonsynonymous mutations distributed in few patients with ASDs and AVSDs. None of the 29 VSDs carried any nonsynonymous mutations, only two synonymous mutations were found in two patients. To assess further method reliability, we investigated in the same patient cohort amplified fragments from other genes, with or without prior implication to heart development (e.g. CFC1, HMGN4 and BRCC2), in the same heart collection, as well as in formalin-fixed normal mouse heart s (e.g. Hand1, Nkx2-5 and Gata4) but did not find sequence alterations [37].

Conclusion
We identified hotspots associated with cardiac defects in the 3'-UTR region. Our results suggest that somatic GATA4 mutations in the 3'-UTR may provide an additional molecular rationale for CHD

Methods
Materials, genomic DNA isolation, and mutation detection have been described previously [16]. Briefly, we analyzed 68 formalin-fixed hearts of Caucasians with complex cardiac malformations, notably n = 29 with ventricular (VSD), n = 16 with atrial (ASD), and n= 23 with atrioventricular (AVSD) septal defects. The explanted hearts, which were collected between 1954-1982, were obtained from the Institute of Anatomy, University of Leipzig, Germany. We also analyzed blood samples of 12 Caucasian patients with CHD, and blood samples of 100 unrelated healthy Caucasian individuals. In blood samples, defects of patients with CHD included VSD, ASD, hypoplastic left heart syndrome (HLHS), transposition of the great arteries (TGA), sub-pulmonary stenosis (SPS) and heterotaxy. Except for two patients who came from the same family, 10/12 patients were unrelated individuals. In the formalin-fixed malformed hearts, diseased tissues in the vicinity of the septal defects were analyzed.

Figure 3
Detection of 3'-UTR mutations in blood samples of patients with CHD. (A) sequence electropherograms for c.+119A>T, showing heterozygous (AT) and homozygous (AA) genotypes; (B) in a family of four in which the two children had ASD; the father and the children had the heterozygous genotype AT (+), the mother was homozygous for the reference allele AA; (C) Compared to the reference sequence and on the basis of Vienna RNA folding procedure, c.+119A>T is predicted to affect RNA folding.
Matched healthy heart tissues were likewise investigated for sequence alterations. Materials used in this study were obtained in accordance to an approved protocol. JB has obtained approval to conduct genetic studies involving human materials from the Medical School Hannover. The formalin-fixed hearts were collected more than 40 years ago and cadavers were donated voluntarily by patients' relatives.
The primers and PCR conditions used in investigating sequence variations of GATA4 are given in Table 1. PCRamplified fragments were sequenced directly using BigDyeTerminator v3.1 Kit (Applied Biosystems, Darmstadt, Germany) and Applied Biosystems 3100 Genetic Analyzer. Sequences were analyzed using SeqScape 2.0 (Applied Biosystems) or DNASTAR Lasergene 6.0 (Madison, Wisconsin USA). The numbering of mutations within the coding region of GATA4 starts with nucleotide A of the first codon ATG, while that in the untranslated region was based on NM_002052. Nucleotide changes in the untranslated and intronic regions were numbered according to suggested nomenclature [24]. Sequence variations were verified by independent PCR, double-strand sequencing, PCR-RFLP, or cloning of heterozygous genotypes followed by subsequent sequencing of clones, allowing the identification of variant alleles. Unless reported as NCBI dbSNPs, we refer to nucleotide changes as sequence variations or mutations interchangeably, which simply means deviations from the reference GATA4 sequence (NM_002052), and disease-causing or not.