In the present study we used genotype data from 52 populations in the Human Genome Diversity Project (HGDP) to characterize the worldwide patterns of risk allele frequencies (RAFs) of 158 common SNPs associated with cardiovascular diseases and related quantitative traits. Our null hypothesis was that RAFs of SNPs associated with cardiovascular diseases and intermediate phenotypes do not vary among the populations. Across the 158 susceptibility SNPs, we noted substantial variation in RAFs among the 52 populations, with some risk alleles fixed or absent in at least one population. In addition, we found that eight SNPs showed significant differences in RAFs among the seven geographic areas. These findings provide insights into the 'global' genetic epidemiology of cardiovascular disease.
Maximum differences in RAFs between any two populations ranged from 0.089 to 1.000 across SNPs, with a mean of 0.661 (Figure 1A). In comparison with the rest of the world, large differences in RAFs (ie, ΔF > 0.3 or ΔF < -0.3) were noted for the regions of Africa, East Asia, America, and Oceania (Additional file 4 and Figure 1B), consistent with the out-of-Africa hypothesis. Several explanations can be put forth for the larger differences in RAFs, including demographic changes and shared selective events. Hofer et al. suggested that large allele frequency differences between human continental groups are more likely to have arisen by genetic drift during population expansion after a bottleneck than by selection. Using all 158 SNPs sampled from the HGDP database, we also calculated Nei's genetic distances with the 'dist.genet' function in the 'ade4' package in R and then used multidimensional scaling (MDS) (the 'cmdscale' function in R) to assess population differences (Additional file 8). The MDS shows differentiation among the populations corresponding to three main clusters (ie, Europeans, Africans, and Asians). Analysis of molecular variance (AMOVA) using the 'amova' function in the 'pegas' package in R showed that, of the total variance, 28.6% was due to variance among the seven geographic regions, 18.4% was due to variance among populations within geographic regions, and the remaining 53.0% was due to variance among individuals within populations (Additional file 9).
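As an illustration of the two summary statistics described above, the following Python sketch computes the maximum pairwise RAF difference for a single SNP and ΔF for one region against the rest of the world. It is a minimal sketch under stated assumptions: the population names, regions, and allele counts are hypothetical, and ΔF is assumed here to be the pooled RAF of the region minus the pooled RAF of all other populations; the exact definition used for Figure 1B may differ.

```python
# Hypothetical allele counts for one SNP:
# population -> (region, risk alleles observed, total alleles typed).
# These values are illustrative only and are not taken from the HGDP data.
COUNTS = {
    "PopA": ("RegionX", 10, 40),
    "PopB": ("RegionX", 30, 60),
    "PopC": ("RegionY", 45, 50),
    "PopD": ("RegionY", 35, 50),
}


def population_rafs(counts):
    """Return the risk allele frequency (RAF) of each population."""
    return {pop: risk / total for pop, (_, risk, total) in counts.items()}


def max_pairwise_difference(rafs):
    """Largest absolute RAF difference between any two populations.

    For a single SNP this equals the highest RAF minus the lowest RAF.
    """
    values = list(rafs.values())
    return max(values) - min(values)


def delta_f(counts, region):
    """Pooled RAF of `region` minus pooled RAF of all other populations.

    This definition of delta-F is an assumption made for illustration.
    """
    in_risk = in_total = out_risk = out_total = 0
    for pop_region, risk, total in counts.values():
        if pop_region == region:
            in_risk += risk
            in_total += total
        else:
            out_risk += risk
            out_total += total
    return in_risk / in_total - out_risk / out_total


rafs = population_rafs(COUNTS)
print(round(max_pairwise_difference(rafs), 3))  # 0.65
print(round(delta_f(COUNTS, "RegionX"), 3))     # -0.4
```

With these made-up counts, RegionX would be flagged as showing a large difference from the rest of the world because ΔF < -0.3.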
The susceptibility variants analyzed in the present study are likely functional variants (or in linkage disequilibrium with the causal variants). The large differences in RAFs might be due to either natural selection or population demography. For example, the non-synonymous SNP rs602662 within FUT2 is in strong linkage disequilibrium with a nonsense mutation (rs601338), a plausible causal variant. SNP rs3184504 is located in exon 3 of SH2B3, which encodes the T-cell adapter protein LNK, and might be a causal variant (Table 1). The associations of the other six SNPs have been replicated in multiple cohorts or independent samples, suggesting that they are likely to be 'true' associated loci. We found additional evidence for recent positive selection (based on iHS) in four (rs1378942, rs653178, rs9388489, and rs3184504) of these eight SNPs. An elevated iHS score suggests that the ancestral allele itself or the selected allele hitchhiking with the ancestral allele may be the target of selection. The iHS for each of the four SNPs was positive, indicating that the ancestral allele was under selection.
We did not observe a significantly higher degree of population differentiation for cardiovascular disease susceptibility SNPs identified in GWAS (Figure 2B). The mean global FST of the 158 SNPs (0.1042) was not significantly higher than the FST for random markers. Lohmueller et al. found that 48 SNPs associated with common diseases were not significantly more differentiated across populations than random SNPs, and in another study of 25 disease-associated SNPs identified in GWAS, the mean global FST (0.100) was not significantly higher than that of random SNPs in the genome. Using FST and iHS, Southam et al. did not find consistent patterns of selection to confirm the 'thrifty-genotype' hypothesis for metabolic syndrome/diabetes based on HapMap data.
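For readers less familiar with FST, the sketch below shows one simple way to compute it for a biallelic SNP from per-population risk allele frequencies, using Wright's definition FST = (HT - HS) / HT with equal population weights. This is a simplified illustration and not necessarily the estimator used in this study, which is not specified in this section; the function name and the example frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's FST for one biallelic SNP from per-population allele frequencies.

    HS is the mean expected heterozygosity within populations and HT is the
    expected heterozygosity of the pooled population. All populations are
    weighted equally and sample-size corrections are ignored (assumptions).
    """
    if not freqs:
        raise ValueError("at least one population frequency is required")
    n = len(freqs)
    mean_p = sum(freqs) / n
    h_t = 2.0 * mean_p * (1.0 - mean_p)
    if h_t == 0.0:
        # Allele fixed or absent everywhere: no differentiation to measure.
        return 0.0
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / n
    return (h_t - h_s) / h_t


# Hypothetical frequencies, for illustration only.
print(fst_biallelic([0.0, 1.0]))            # 1.0 (fixed in one, absent in the other)
print(fst_biallelic([0.5, 0.5]))            # 0.0 (identical frequencies)
print(round(fst_biallelic([0.2, 0.8]), 3))  # 0.36
```

A risk allele that is fixed in one population and absent in another gives the maximum value of 1, whereas identical frequencies give 0; the global values reported here fall between these extremes.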
It should be noted that, after correction for multiple comparisons by either the Bonferroni method or the false discovery rate, only one SNP (rs174540) remained significant. Therefore population history and demography are likely to explain most of the differences in RAFs among populations. However, in previous studies, signatures of natural selection have also been noted in FTO (iHS = 1.991), FUT2, ATXN2 (high levels of LD), and SH2B3 (iHS = -2.02 for SNP rs3184504). The non-synonymous SNP rs3184504 in SH2B3, whose minor allele 'T' is associated with higher diastolic blood pressure, may be under recent positive selection. In the HapMap samples, this derived T allele has been shown to occur on a long haplotype (~1.5 Mb) (iHS = -2.76, P < 0.006), and local selection was noted (ie, FST = 0.260 for the CEU-YRI comparison and FST = 0.290 for the CEU-JPT/CHB comparison). The present analysis confirmed that in the HGDP sample, significant population differentiation (FST = 0.207, P = 0.048) could be attributed to a relatively higher RAF in the Middle East and Europe. A high global FST for rs7901695 of TCF7L2 (0.361) was noted in the HapMap samples, but not in the HGDP sample (0.188), possibly related to differences in sample selection between HapMap and HGDP.
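The effect of multiple-comparison correction on per-SNP tests can be sketched as follows. This Python example applies the Bonferroni threshold and, as an assumed choice of false discovery rate procedure, the Benjamini-Hochberg step-up rule; the P values are hypothetical and the function names are introduced here for illustration only.

```python
def bonferroni(pvalues, alpha=0.05):
    """Flag tests that pass the Bonferroni-corrected threshold alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Flag tests rejected by the Benjamini-Hochberg step-up procedure.

    Find the largest rank k such that the k-th smallest P value is
    <= (k / m) * alpha, then reject the k smallest P values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Hypothetical per-SNP P values, for illustration only.
pvals = [0.3, 0.0001, 0.045, 0.012, 0.02]
print(bonferroni(pvals))          # [False, True, False, False, False]
print(benjamini_hochberg(pvals))  # [False, True, False, True, True]
```

In this toy example three of the five tests are nominally significant after the Benjamini-Hochberg rule but only one survives Bonferroni correction, which illustrates how a nominal P value such as 0.048 can fail to remain significant once many SNPs are tested.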
Coop et al. examined the role of geography and population history in the spread of selectively favored alleles, using the HapMap and HGDP databases, and argued that strong, sustained selection that drives alleles from low frequency to near fixation has been relatively rare during the past ~70,000 years. The importance of geography for patterns of genetic variation has been established in previous studies [9, 29–31]. We examined the prevalence of large RAF differences between geographic regions and noted that the risk alleles of three SNPs with a significantly higher global FST are at very low frequency or absent (or at very high frequency or fixed) in several populations (Figure 3). Spatial and/or temporal variation of selective pressures, such as pathogens, climate or diet, may have restricted local selection to particular populations or environments. The 'ancestral-susceptibility' hypothesis for common 'complex' diseases states that the ancestral allele is maladaptive in the modern environment and associated with increased disease susceptibility. We found that the ancestral allele was the risk allele in 65 out of 158 SNPs (41.1%) based on the dbSNP and UCSC databases. Thus, a subset of the cardiovascular disease susceptibility SNPs conforms to the 'ancestral-susceptibility' hypothesis for common 'complex' diseases.
Geographic variation in the prevalence of phenotypes of medical relevance can partly be due to differences in RAFs among populations. Levels of total and LDL cholesterol in the Pima Indians (living in central and southern Arizona and in Sonora, Mexico) are lower than in populations of European origin. The risk allele of rs174570 in FADS2 (a gene that regulates unsaturation of fatty acids) is absent in the Pima (RAF = 0.000) but fixed in Africans (RAF = 1) (Figure 3), raising the possibility that low RAFs of such SNPs may contribute to the low LDL cholesterol levels in Pima Indians. Pairwise comparison of RAFs among the 52 populations and seven geographic areas indicated high population differentiation between American and non-American populations, consistent with a previous study showing that Native Americans have greater differentiation than populations from other continental regions.
Several limitations of our study need to be mentioned. There is a potential for bias in selecting SNPs from the GWAS database as well as from the HGDP database. The SNPs were selected based on GWAS in populations of European ancestry, and we were not able to characterize the patterns of geographic differences in RAFs for SNPs ascertained in other specific populations, such as Africans and Asians. In addition, the majority of the HGDP populations are poorly represented in the genotyping chips, and only 158 out of 292 SNPs have been genotyped in HGDP. The genotyping platforms used in published GWAS varied, and there was no standard threshold for declaring significant hits. Ascertainment bias in Oceania should be noted since only two populations (Papuan and NAN Melanesian) were sampled in this geographic region. Nonetheless, the present study highlights a novel approach to understanding the global genetic epidemiology of cardiovascular disease, the leading cause of death worldwide, and is also a step towards understanding the evolutionary genetics of this disease.