Genetic variants associated with fasting blood lipids in the U.S. population: Third National Health and Nutrition Examination Survey

Background: The identification of genetic variants related to blood lipid levels within a large, population-based and nationally representative study might lead to a better understanding of the genetic contribution to serum lipid levels in the major race/ethnic groups in the U.S. population.
Methods: Using data from the second phase (1991-1994) of the Third National Health and Nutrition Examination Survey (NHANES III), we examined associations between 22 polymorphisms in 13 candidate genes and four serum lipids: high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), total cholesterol (TC), and triglycerides (TG). Univariate and multivariable linear regression and within-gene haplotype trend regression were used to test for genetic associations assuming an additive mode of inheritance for each of the three major race/ethnic groups in the United States (non-Hispanic white, non-Hispanic black, and Mexican American).
Results: Variants within APOE (rs7412, rs429358), PON1 (rs854560), ITGB3 (rs5918), and NOS3 (rs2070744) were found to be associated with one or more blood lipids in at least one race/ethnic group in crude and adjusted analyses. In non-Hispanic whites, no individual polymorphisms were associated with any lipid trait. However, the PON1 A-G haplotype was significantly associated with LDL-C and TC. In non-Hispanic blacks, APOE variant rs7412 and haplotype T-T were strongly associated with LDL-C and TC; whereas, rs5918 of ITGB3 was significantly associated with TG. Several variants and haplotypes of three genes were significantly related to lipids in Mexican Americans: PON1 in relation to HDL-C; APOE and NOS3 in relation to LDL-C; and APOE in relation to TC.
Conclusions: We report the significant associations of blood lipids with variants and haplotypes in APOE, ITGB3, NOS3, and PON1 in the three main race/ethnic groups in the U.S. population using a large, nationally representative and population-based sample survey. Results from our study contribute to a growing body of literature identifying key determinants of plasma lipoprotein concentrations and could provide insight into the biological mechanisms underlying serum lipid and cholesterol concentrations.


Background
Decades of research have demonstrated that serum concentrations of blood lipids are associated with increased risk for cardiovascular disease and mortality [1][2][3][4]. Previous reports from the Framingham Heart Study suggested a strong positive relationship between coronary heart disease and elevated levels of total cholesterol (TC) and low-density lipoprotein cholesterol (LDL-C), in addition to an inverse relationship between the disease and high-density lipoprotein cholesterol (HDL-C) levels [5][6][7][8]. The genetic basis for elevation in lipid levels is not well understood, but substantial heritability has been demonstrated in twin [9] and family-based [10][11][12] studies, which have estimated that approximately 43% to 83% of the variance in blood lipid and lipoprotein levels is attributable to genetic factors. Recent candidate gene studies [13][14][15][16], as well as genome-wide association studies [17][18][19][20][21][22][23][24][25], have identified polymorphisms that account for a portion of the variation in blood lipid levels.
Many genes involved in metabolic pathways have been found to contribute to lipid level variability [14,26,27].
However, conflicting findings are common among genetic association studies. Inconsistencies might be caused by differences in study design, study populations (geographic and ethnic background), statistical methods and power, allele frequencies, and gene-environment interactions. It is not clear whether such findings are generalizable to the U.S. population. To assess genetic variation among racial and ethnic groups in the U.S. population, we need genetic information from a large, well-designed, and population-based U.S. survey that includes the three major race/ethnic groups, such as the Third National Health and Nutrition Examination Survey (NHANES III). Therefore, we sought to investigate the associations between 22 polymorphisms in 13 candidate genes and serum lipid concentrations using data from NHANES III, a nationally representative survey of the U.S. population.

Study population
NHANES III is a multi-stage complex probability survey conducted from 1988 to 1994 by the National Center for Health Statistics (NCHS) of the Centers for Disease Control and Prevention (CDC) [28,29]. The survey was designed to provide nationally representative statistics on the civilian, non-institutionalized U.S. population aged 2 months or older. A DNA bank was created from blood samples collected during the second phase of NHANES III (1991-1994) from participants aged 12 years or older. This DNA bank provided one of the first opportunities to assess genetic variation among major racial and ethnic groups using a well-designed, population-based, and nationally representative sample of the U.S. population. The bank contains specimens from 7,159 participants, 62% of whom originated from households containing multiple family members (mean: 1.59 participating members per household; range 1-11). More information on the DNA bank is available on the NCHS Web site [30].
We combined genetic data with behavioral, environmental, and clinical information available in NHANES III. We restricted our analyses to participants aged 17 years or older (n = 6,317). Of these, we included only those who self-reported as non-Hispanic white, non-Hispanic black, or Mexican American (n = 6,016), who had their blood drawn in the morning (n = 2,712), who fasted at least 9 hours (n = 2,488), and who did not take cholesterol-lowering medications (n = 2,413). This study was approved by the NCHS Ethics Review Board.

Selection of polymorphisms
We tested 22 polymorphisms in 13 candidate genes that were chosen from a set of variants that we previously genotyped in the NHANES III DNA Bank [31], including polymorphisms in ABCB1, ADH1C, ADRB2, ADRB3, APOE, ITGB3, MTHFR, MTRR, NOS3, SERPINE1, PON1, PPARG, and TNF (Additional file 1, Table S1). The candidate genes included in this current study were identified from systematic literature reviews on previously published associations with blood lipid levels, chosen based on the biology of the disease locus in relation to the outcomes, or chosen based on prior linkage studies. Information on the nucleotide or amino acid change for each variant is included in Additional file 1, Table S1.

Genotyping methods
All genotypes were analyzed using TaqMan (Applied Biosystems, Foster City, California) or MGB Eclipse (Nanogen, Bothell, Washington) assays. Polymorphisms that passed blind-replicate analyses (≥ 98% of genotypes matched) were tested for deviation from Hardy-Weinberg proportions (HWP) using standard chi-square goodness-of-fit tests. Variants that deviated from HWP at p < 0.01 for at least two of the three included race/ethnic groups (i.e., non-Hispanic white, non-Hispanic black, and Mexican American) were excluded from further analysis. Detailed genotyping methods and quality control criteria have been previously described [31] or can be obtained from NCHS (in the case of APOE).
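As an illustration of the Hardy-Weinberg screening step described above, the following is a minimal sketch of a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts, together with the exclusion rule (deviation at p < 0.01 in at least two of the three groups). The function names and the group labels are hypothetical, and the sketch ignores the survey weights; it is not the code used in the study.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions (1 df).

    Takes unweighted genotype counts and returns (statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

def exclude_variant(p_values_by_group, alpha=0.01):
    """Apply the stated rule: exclude if HWP deviation in >= 2 groups."""
    failures = sum(1 for p in p_values_by_group.values() if p < alpha)
    return failures >= 2
```

For example, counts of 25, 50, and 25 match Hardy-Weinberg proportions exactly and give a statistic of 0, whereas counts of 50, 0, and 50 (no heterozygotes) give a very small p-value.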

Laboratory measures and phenotype definitions
Details of the blood collection procedures and the laboratory evaluation of LDL-C, HDL-C, TC, and TG are available online [32]. Serum LDL-C was calculated using the Friedewald equation [33]. Participants who did not fast, who fasted fewer than 9 hours, or who had TG levels greater than 400 mg/dL were excluded from the analyses.
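For reference, the Friedewald equation estimates LDL-C (in mg/dL) as TC minus HDL-C minus TG/5. A minimal sketch combining the equation with the exclusion criteria stated above follows; the function name and argument names are hypothetical.

```python
def friedewald_ldl(tc, hdl, tg, fasting_hours):
    """Estimate LDL-C in mg/dL with the Friedewald equation.

    Returns None when the stated exclusion criteria apply
    (fasted fewer than 9 hours, or TG greater than 400 mg/dL).
    """
    if fasting_hours < 9 or tg > 400:
        return None
    # TG / 5 approximates VLDL cholesterol when lipids are in mg/dL
    return tc - hdl - tg / 5.0
```

For a fasting participant with TC = 200, HDL-C = 50, and TG = 150 mg/dL, the estimate is 200 - 50 - 30 = 120 mg/dL.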
Phenotypic covariates included in the analyses were previously reported to be associated with blood lipid levels [34]. In non-genetic models, age and body mass index (BMI) were both strongly associated with blood lipid concentrations within each race/ethnic group (Additional file 1, Table S2). The remaining risk factors were significantly associated with at least one lipid measured in at least one race/ethnic group. The covariates included in the final models were: age (17-39 years, 40-59 years, or ≥ 60 years); sex; education completed (less than high school, high school, or college and above); alcohol intake (none, <4 drinks per week, ≥ 4 drinks per week); smoking status (current smoker, former smoker, non-smoker); BMI; physical activity [none, low (active <5 times per week), or high (active ≥ 5 times per week)]; and log of total fat intake (g/day, reported in a dietary recall from the previous 24-hour period).

Statistical analysis
All analyses were performed using SAS-Callable SUDAAN 9.01 (Research Triangle Institute, Research Triangle Park, North Carolina) and SAS 9.1 (SAS Institute, Cary, North Carolina) to account for the NHANES III complex sampling design. For each genetic variant, univariate and multivariable regression models were used to test for genetic associations with each blood lipid measurement, stratified by self-reported race/ethnicity. Interaction between each variant and race/ethnicity was examined to test racial/ethnic differences in the genetic effects. We assumed an additive model of inheritance and used regression analyses to test the null hypothesis that LDL-C, HDL-C, TC, or TG levels did not differ by an increasing number of minor alleles. Beta-coefficient estimates and 95% confidence intervals for each variant were calculated in regression models using sample weights that were recalculated for the NHANES III DNA bank data. TG levels were log-transformed to approximate a normal distribution.
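To make the additive coding concrete, the sketch below codes each genotype as the number of minor alleles (0, 1, or 2) and fits a crude, sample-weighted linear regression of a lipid value on that count. It is an assumption-laden illustration only: the function names are hypothetical, and it returns weighted point estimates without the design-based variance estimation that SUDAAN provides for the complex sampling design.

```python
def minor_allele_count(genotype, minor_allele):
    """Additive coding: number of copies of the minor allele (0, 1, or 2)."""
    return genotype.count(minor_allele)

def weighted_slope(x, y, w):
    """Weighted least squares fit of y = intercept + beta * x.

    x: minor-allele counts, y: lipid values, w: sample weights.
    Returns (intercept, beta); no standard errors are computed here.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    beta = sxy / sxx
    return ybar - beta * xbar, beta
```

Under this coding, beta is the estimated change in the lipid level per additional copy of the minor allele, and the null hypothesis tested is beta = 0.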
Haplotype analysis was also performed for each of the seven genes for which at least two polymorphisms were genotyped: ADH1C, ADRB2, APOE, MTHFR, NOS3, PON1, and TNF. Haplotype frequencies were inferred within each racial/ethnic group using the Expectation-Maximization algorithm [35,36] available in the HAPLOTYPE procedure in SAS/Genetics. The inferred haplotypes with rare frequency (<1%) were combined into one variable ("other"). Haplotype trend regression analyses [37,38] were conducted using crude and multivariable regression models, as described above.
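The idea behind Expectation-Maximization haplotype inference can be sketched for the simplest case of two biallelic polymorphisms, where only double heterozygotes have ambiguous phase. The code below is a simplified illustration with hypothetical function names; it is not the SAS/Genetics implementation, and it ignores sample weights and missing genotypes.

```python
def em_two_snp_haplotypes(genotypes, max_iter=200, tol=1e-10):
    """Estimate haplotype frequencies for two biallelic SNPs by EM.

    genotypes: list of (g1, g2) minor-allele counts (0, 1, or 2) per person.
    Returns a dict keyed by (allele_at_snp1, allele_at_snp2), 1 = minor allele.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haps}           # uniform starting values
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: split the double heterozygote between the two phases
                cis = freq[(0, 0)] * freq[(1, 1)]
                trans = freq[(0, 1)] * freq[(1, 0)]
                w_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
                for h in ((0, 0), (1, 1)):
                    counts[h] += w_cis
                for h in ((0, 1), (1, 0)):
                    counts[h] += 1.0 - w_cis
            else:
                # Phase is unambiguous when at most one locus is heterozygous
                s1 = (g1 // 2, (g1 + 1) // 2)
                s2 = (g2 // 2, (g2 + 1) // 2)
                counts[(s1[0], s2[0])] += 1.0
                counts[(s1[1], s2[1])] += 1.0
        # M-step: update frequencies from the expected haplotype counts
        new_freq = {h: counts[h] / n_chrom for h in haps}
        converged = max(abs(new_freq[h] - freq[h]) for h in haps) < tol
        freq = new_freq
        if converged:
            break
    return freq

def collapse_rare(freq, threshold=0.01):
    """Combine haplotypes with frequency below the threshold into 'other'."""
    kept = {h: f for h, f in freq.items() if f >= threshold}
    other = sum(f for f in freq.values() if f < threshold)
    if other > 0:
        kept["other"] = other
    return kept
```

The estimated per-person haplotype probabilities (rather than hard assignments) are what a haplotype trend regression would then use as predictors.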
For both the single variant analyses and haplotype analyses, the p-value from Satterthwaite statistics was adjusted to control the false discovery rate (FDR) [39], a method for correcting for multiple testing, in each of the three race/ethnic groups separately. An association was considered significant at an FDR-adjusted p-value of < 0.05.
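The following sketch shows one common way to compute FDR-adjusted p-values, the Benjamini-Hochberg step-up procedure. That reference [39] corresponds to this particular procedure is an assumption here, since the text does not name it; the function name is hypothetical. In keeping with the description above, the adjustment would be applied separately to the set of p-values from each race/ethnic group.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumed procedure; the source only says the FDR was controlled.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With raw p-values of 0.01, 0.04, 0.03, and 0.20, the adjusted values are 0.04, 0.0533, 0.0533, and 0.20, so only the first test would meet the FDR-adjusted threshold of < 0.05.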
We used Quanto (University of Southern California, Los Angeles, California; http://hydra.usc.edu/gxe/) to estimate the power of our study. Assuming additive genetic models, we determined the beta-coefficients that correspond to a genetic variant explaining 1% of the variation in the lipid measurements for allele frequencies ranging from 0.01 to 0.5. For these beta-coefficients and allele frequencies, we calculated the lower and upper limits of our power which account for our multiple testing adjustments using the effective sample sizes (sample sizes multiplied by a design effect of 1.2 to account for the complex sampling design of NHANES III) of the three race/ethnicities.
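The link between a beta-coefficient and the proportion of variance explained can be sketched as follows. Under an additive model with genotypes in Hardy-Weinberg proportions, the variance of the minor-allele count is 2p(1 - p), so a variant with minor allele frequency p and effect beta explains R² = 2p(1 - p)·beta²/SD², where SD is the standard deviation of the trait. The function name and the SD of 37 mg/dL used in the usage note are illustrative assumptions, not values taken from the study or from Quanto.

```python
import math

def beta_for_variance_explained(r2, maf, trait_sd):
    """Per-allele effect size that explains a fraction r2 of trait variance.

    Assumes an additive model and Hardy-Weinberg proportions, so that
    Var(minor-allele count) = 2 * maf * (1 - maf).
    """
    genotype_variance = 2.0 * maf * (1.0 - maf)
    return math.sqrt(r2 * trait_sd ** 2 / genotype_variance)
```

With an assumed trait SD of 37 mg/dL and R² = 0.01, this gives a beta of about 5.2 mg/dL per allele at a minor allele frequency of 0.5 and about 26.3 mg/dL at a frequency of 0.01, showing why rarer variants need much larger effects to explain the same share of variance.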

Results
Characteristics of the participants included in this study are described in Table 1. Non-Hispanic whites (n = 989) were the oldest, had obtained higher levels of education, and were the most physically active compared to non-Hispanic blacks and Mexican Americans. Non-Hispanic blacks (n = 683) were least likely to have consumed any alcoholic drinks in the past week, were most likely to be current smokers, and had the highest mean body mass index (BMI). Mexican Americans (n = 741) were the youngest, had the highest proportion of male participants, and were the least likely to smoke. Blood lipid levels were also significantly different (at p < 0.05) across the three main race/ethnic groups. Non-Hispanic blacks tended to have the highest HDL-C levels compared with non-Hispanic whites and Mexican Americans. In contrast, non-Hispanic whites had the highest LDL-C and TC levels, whereas Mexican Americans had the highest serum triglyceride levels.
Allele frequencies for the study variants among the three racial/ethnic groups in the U.S. population are available in Additional file 1, Table S1. Each genetic variant was tested for association with each of the four blood lipid measurements. Table 2 lists the genetic variants with significant associations (false discovery rate [FDR]-adjusted p-value < 0.05) in at least one race/ethnic group for the crude or adjusted regression models. Complete results for all studied variants, with and without FDR adjustment of p-values, are available in Additional file 1, Table S3a-d (crude analyses) and Table S4a-d (covariate-adjusted analyses). The FDR-adjusted and unadjusted p-values for testing racial/ethnic differences in the genetic effects (SNP × race/ethnicity interaction) are also included in these Additional Tables. In fasting samples from non-Hispanic whites, none of the studied variants were found to be significantly associated with any blood lipids after adjustment for multiple testing. For non-Hispanic blacks, APOE rs7412 was strongly associated with both LDL-C and TC in crude and adjusted analyses. We observed that several polymorphisms were significantly associated with blood lipids in the Mexican American population, including: PON1 rs854560 with HDL-C in both crude and adjusted analyses; APOE rs7412 and rs429358 with LDL-C and TC in adjusted analyses; and NOS3 rs1799983 with LDL-C in adjusted models only. None of the 22 polymorphisms were found to be associated with triglyceride levels except for ITGB3 rs5918 in non-Hispanic blacks.
Haplotypes with significant associations with blood lipids in at least one race/ethnic group in crude or adjusted models are listed in Table 3. Complete results are included in Additional file 1, Tables S5a-d (crude analyses) and S6a-d (covariate-adjusted analyses). In the three race/ethnic groups, all polymorphisms within a gene were in linkage disequilibrium (p < 0.05 from linkage disequilibrium test; data not shown). Consistent with the results of the individual polymorphisms, haplotypes within three genes (APOE, NOS3, and PON1) were found to be significantly associated (FDR-adjusted p < 0.05) with blood lipid levels. In non-Hispanic whites, the only significant associations found were for the A-G haplotype of PON1 in relation to elevated LDL-C in crude and adjusted models. Among non-Hispanic blacks, an inverse association was found in crude and adjusted analyses between the T-T (ε2 isoform) haplotype of APOE and LDL-C (p ≤ 0.0010 for both) and TC (p = 0.052 for crude; p = 0.0020 for adjusted). In Mexican Americans, haplotypes of PON1 (A-A and A-G) were significantly associated with elevated HDL-C; and NOS3 haplotype T-T was significantly associated with decreased levels of LDL-C. In addition, carriers of the APOE C-C (ε4 isoform) haplotype had significantly increased LDL-C (borderline significant in crude model and strongly significant in adjusted model) and TC (in the adjusted model); whereas, APOE T-T (ε2) carriers had significantly decreased levels of LDL-C and TC (in adjusted models only). The A-G haplotype of ADRB2 was significantly associated with elevated HDL-C (in crude and adjusted models) among Mexican Americans (see Additional file 1, Tables S5a and S6a). However, the confidence intervals were relatively wide. Rare haplotypes (which were combined and coded as "other") of MTHFR and TNF were associated with blood lipids in at least one race/ethnic group (Additional file 1, Tables S5a-d, S6a-d). There were no common haplotypes significantly associated with TG levels in any race/ethnic group.

Discussion
In this study, we evaluated statistical associations between blood lipid levels and candidate genes involved in a number of biological pathways, such as nutrient metabolism, immune response and inflammation, oxidative stress, and homeostasis. To our knowledge, only one other published study (Keebler et al. [40]) describes genetic associations with blood lipid levels using a nationally representative sample of the U.S. population. That study also used data from the NHANES III survey, but it examined associations at 19 genome-wide validated loci in fasting and nonfasting samples. Those data were not available for our use while the present study was being conducted. We examined a different set of polymorphisms, which had been identified previously through candidate gene association studies. In our analyses, we used only fasting samples in accordance with guidelines from the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III, ATPIII) [34]. Our findings suggest, before and after adjustment for numerous demographic and behavioral characteristics in one or more race/ethnic groups, that blood lipid levels differ by an increasing number of minor alleles of polymorphisms in APOE, ITGB3, NOS3, and PON1. Our results also show that the A-G haplotype of ADRB2 was associated with elevated HDL-C among Mexican Americans. However, these results from crude and adjusted models are unstable (wide confidence intervals), and more data would be needed to support the association. We found that the group of rare haplotypes (frequency <1%) within MTHFR and TNF was associated with several blood lipids across race/ethnic groups, but we are unable to identify which rare haplotype(s) contribute to these findings. Consequently, we cannot interpret these associations.
In analyses of individual variants and of haplotypes, we found strong statistical associations between genetic variation in APOE and LDL-C and TC levels in non-Hispanic blacks and Mexican Americans. APOE, one of the most studied genes in risk assessment of cardiovascular disease, plays a key role in the metabolism of cholesterol and triglycerides by binding to receptors on the liver and helping to mediate the clearance of chylomicrons and very low-density lipoproteins from the bloodstream [41][42][43][44][45][46]. Allelic variation in APOE has been associated consistently with plasma concentrations of total cholesterol and LDL cholesterol [42,47], and with protein levels of APOB (the major protein of LDL, VLDL, and chylomicrons).
Our findings suggested an association of the NOS3 rs1799983 variant and T-T haplotype with LDL-C in Mexican Americans. NOS3 serves as a key enzyme of the endogenous nitrovasodilator system, which is essential for the regulation of vascular function and blood pressure, through the production of nitric oxide. The Glu298Asp variant (rs1799983) has been significantly associated with higher plasma LDL cholesterol, LDL particle size, and lower plasma HDL cholesterol; but no significant associations were found with the T-786C variant [48]. Numerous studies have also reported a positive association with the Glu298Asp variant and haplotypes containing this variant with higher triglycerides and LDL cholesterol in Venezuelans [49] and Greeks [50].
We found higher HDL-C among Mexican American carriers of the PON1 rs854560 (Leu55Met) variant and A-A and A-G haplotypes. Conversely, we found higher LDL-C in non-Hispanic white carriers of the A-G haplotype. PON1 is an HDL-associated esterase that hydrolyzes products of lipid peroxidation and prevents the oxidation of HDL and LDL. In fact, the antioxidant activity and anti-atherogenic effect of HDL is thought to be largely because of the paraoxonase located on the HDL particle. Variants in PON1 previously have been associated with serum HDL and LDL cholesterol levels [51,52], and with increased risk for stroke [53]. There have been multiple studies and meta-analyses evaluating the association of PON1 variants with blood lipids in several populations or community-based samples, but with inconsistent results [51,[54][55][56][57][58][59][60][61].
Our results suggest a strong association of ITGB3 with triglycerides in non-Hispanic blacks. ITGB3 is a membrane receptor for fibrinogen and von Willebrand factor that has an important role in platelet aggregation. The Pro33 allele (rs5918) has been associated with coronary thrombosis [62,63] and stroke [64,65]. A previous study examined associations between 15 single nucleotide polymorphisms across ITGB3 and cardiovascular disease-related traits in the Hutterites (e.g., plasma levels of HDL and LDL cholesterol and triglycerides) and suggested that ITGB3 has sex-specific associations with plasma lipoprotein(a) [66].
Although we did not assess racial/ethnic difference in the genetic effects, we observed that two associations, both involving the rs7412 variant in APOE, were significant in two racial/ethnic groups. No variants were significant across all three racial/ethnic groups after the FDR adjustment. Limited power and statistical chance may explain, at least in part, the lack of consistent findings across the three race/ethnicities. Alternatively, these differences may be caused by varying linkage disequilibrium patterns at causal loci across different race/ethnic populations or by gene-environment interactions that have not been identified or measured. As a result, it might not be unusual to find varying risks for a disease or trait at a given genomic locus across population subgroups. Consistent with this, a recent study examined 12 newly discovered genetic variants known to predict lipid levels in Europeans and also evaluated local ancestry at validated genes that influence lipid levels [67]. This study found genetic differences between the determinants of lipid phenotypes across different African and European populations. Such findings might suggest that many of the truly causal variants in different race/ethnic groups have yet to be discovered, as most genetic epidemiology studies have been performed in populations of European descent.
Although we identified associations of APOE, ITGB3, NOS3, and PON1 with blood lipid levels by examining polymorphisms individually, our results suggest that assessing genetic variation using haplotype methods might be more comprehensive and more informative. We found that although a single genetic variant might have a small (if any) effect in identifying a susceptibility locus for an outcome, the effect might reach statistical significance when combined with other variants within the gene. For example, after adding a single variant (APOE rs7412) to a regression model containing non-genetic risk factors, we were able to explain only slightly more variation in LDL-C (R² = 0.1448 for non-Hispanic white persons, 0.2065 for non-Hispanic black persons, and 0.1462 for Mexican American persons) compared to the variation explained by non-genetic risk factors alone (R² = 0.1163, 0.1533, and 0.1230, respectively). However, we observed that a larger proportion of the variation in LDL-C is explained by the model that contains the APOE T-T haplotype compared to the model containing the rs7412 variant alone (R² = 0.1521, 0.2073, and 0.1636, respectively). Overall, the variance in blood lipid levels explained by the contribution of each individual variant or haplotype is small (<5%; data not shown).
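The comparison above amounts to simple differences in R² between nested models. As a worked illustration using only the LDL-C values quoted in this paragraph (the variable names are hypothetical), the increments can be computed as:

```python
# R-squared for LDL-C models, as quoted in the text, by race/ethnic group
groups = ["non-Hispanic white", "non-Hispanic black", "Mexican American"]
r2_covariates_only = [0.1163, 0.1533, 0.1230]
r2_with_rs7412 = [0.1448, 0.2065, 0.1462]
r2_with_tt_haplotype = [0.1521, 0.2073, 0.1636]

# Additional variance explained beyond the non-genetic covariates
gain_variant = [round(v - b, 4) for v, b in zip(r2_with_rs7412, r2_covariates_only)]
gain_haplotype = [round(h - b, 4) for h, b in zip(r2_with_tt_haplotype, r2_covariates_only)]

for g, dv, dh in zip(groups, gain_variant, gain_haplotype):
    print(f"{g}: rs7412 +{dv:.4f}, T-T haplotype +{dh:.4f}")
```

On these figures, rs7412 adds roughly 0.02 to 0.05 to R², and the T-T haplotype model adds slightly more in each group.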
The present study has many notable strengths. First, the study was conducted using a large population-based and nationally representative survey of the United States. The wealth of data in NHANES facilitated the examination of genetic, environmental, and clinical data for each of the three major race/ethnicities in the United States. Moreover, whereas many previous reports were limited to a single population or were based on smaller study populations, we were able to conduct the analyses separately in each race/ethnicity, and were therefore able to account for the differences in allele frequencies, disease prevalence, and linkage disequilibrium patterns between these subpopulations. Finally, the control of hypercholesterolemia is an important clinical and public health objective. Awareness of, and screening for, hypercholesterolemia have become more common in recent years. Accordingly, treatment of the condition has increased since the initiation of the National Cholesterol Education Program in 1985. The use of cholesterol-lowering medications has increased steadily in U.S. adults aged ≥ 20 years, from 8.2% in 1999-2000 to 14.0% in 2005-2006, as measured in NHANES [68]. Among those diagnosed with hypercholesterolemia, the proportion on treatment increased from 32.4% to 38.9% in the 8-year period from 1999 to 2006 [69]. Association analyses of genetic variants involved in influencing blood lipid levels may therefore be complicated by a high prevalence of study participants who take lipid-lowering drugs. An advantage of this study in NHANES III is that a small number of participants taking such medication needed to be excluded (n = 75; 3% of fasting samples). Evaluation of such genetic associations in subsequent NHANES surveys will result in a loss of a higher number of participants in the analyses.
In addition to these strengths, we acknowledge several limitations. To help reduce the chance of potential false-positive results from multiple testing, we adjusted p-values to control the false discovery rate [39]. This method assumes that the tests are independent. Yet many of the test statistics might be correlated because of linkage disequilibrium between genetic variants [70]. The FDR adjustment, therefore, might result in overly conservative p-values, thus decreasing our ability to identify true associations.
Although we stratified the analysis by race/ethnicity, we cannot eliminate completely the possibility of confounding of our study results by population stratification. We were not able to assess population structure in our analysis and grouped participants by broad categories on the basis of self-reported race and ethnicity. Substantial admixture in the African American and Hispanic populations has been documented [71][72][73][74]. However, previous research conducted on the U.S. population has found little evidence for population substructure in whites [75].
Although the NHANES III data may be more representative of the U.S. population than other non-population-based samples, the statistical power to detect genetic associations was limited in this study. For example, we determined the beta-coefficients that correspond to the genetic variant explaining 1% of the variation in LDL-C. The beta-coefficients ranged from 5.2 to 26.3 depending on the frequency of the minor allele (MAF = 0.01 to 0.5). Using these beta-coefficients and corresponding allele frequencies described above for LDL-C, we found that our power would be 42-82% for non-Hispanic whites, 24-