Identification of female-specific genetic variants for metabolic syndrome and its component traits to improve the prediction of metabolic syndrome in females

Background: Metabolic syndrome (MetS), defined as a cluster of metabolic risk factors including dyslipidemia, insulin resistance, and elevated blood pressure, is known to be partly heritable. MetS affects the lives of many people worldwide, and females have been reported to be more vulnerable to this cluster of risks. Methods: To elucidate genetic variants underlying MetS specifically in females, we performed a genome-wide association study (GWAS) for MetS and its component traits in a total of 9932 Korean female subjects (including 2276 MetS cases and 1692 controls). To facilitate the prediction of MetS in females, we calculated a genetic risk score (GRS) combining 14 single nucleotide polymorphisms (SNPs) detected in our GWA analyses of MetS. Results: GWA analyses identified 14 moderate female-specific signals for MetS (P_meta < 5 × 10⁻⁵). In addition, two genome-wide significant female-specific associations (P_meta < 5 × 10⁻⁸) were detected: rs455489 in DSCAM for fasting plasma glucose (FPG) and rs7115583 in SIK3 for high-density lipoprotein cholesterol (HDLC). Logistic regression analyses (adjusted for area and age) indicated that the GRS was associated with increased prevalence of MetS in females (P = 5.28 × 10⁻¹⁴) but not in males (P = 3.27 × 10⁻¹). Furthermore, in MetS prediction models using the GRS, the area under the receiver operating characteristic (ROC) curve (AUC) was higher in females (AUC = 0.85) than in males (AUC = 0.57). Conclusion: This study highlights new female-specific genetic variants associated with MetS and its component traits and suggests that the GRS of MetS variants is likely a useful predictor of MetS in females. Electronic supplementary material: The online version of this article (10.1186/s12881-019-0830-y) contains supplementary material, which is available to authorized users.


Background
Metabolic syndrome (MetS), defined as a cluster of metabolic risk factors including dyslipidemia, insulin resistance, and elevated blood pressure, is known to increase the risk of other non-communicable diseases such as type 2 diabetes (T2D) and cardiovascular diseases (CVD) [1]. In addition, recent studies have reported MetS to be associated with various types of cancer, including pancreatic, liver, breast, and colon cancer [2]. MetS affects approximately 20 to 30% of the general population worldwide and is associated with increased morbidity and mortality [2,3]. The prevalence of MetS is known to increase with age in females; furthermore, females are more prone to MetS owing to various cultural factors, including stress and low socioeconomic status [3]. Various meta-analyses have shown that the prevalence of MetS-associated cardiovascular diseases is higher in females than in males [4]. A recent systematic review reported that the prevalence of MetS in most countries in the Asia-Pacific region was higher in females. This finding is supported by 2007 data from South Korea showing a higher prevalence of MetS in females (32.9%) than in males (29%) [5]. Therefore, it is important to thoroughly understand the risk factors and pathophysiological mechanisms underlying MetS, as well as the nature of the interaction between sex and the prevalence of this disease.
Until recently, despite significant research interest, MetS was thought to be a risk factor independent of sex, and its associations with various diseases and its treatment options have been evaluated irrespective of sex [3]. Because MetS is influenced by both environmental and genetic factors, numerous genetic studies have been conducted to gain insight into the genetic basis of MetS and its component traits [6][7][8]. To the best of our knowledge, however, no female-specific genetic association study of MetS and its component traits has been conducted in any ethnic group, including East Asian and European populations. In this study, we sought to identify female-specific genetic variants associated with MetS and its component traits in Korean females by GWA analysis. In addition, by combining 14 single nucleotide polymorphisms (SNPs) detected in the MetS association analysis into a genetic risk score (GRS), we specifically aimed to investigate whether such a score might assist the prediction of MetS in South Korean females.

Subjects
Subjects for the discovery stage were recruited from the KARE (Korea Association Resource) study, a population-based cohort of 8842 participants. The KARE cohort consists of two population-based studies, the Ansung and Ansan cohorts, located in Gyeonggi Province near Seoul, the capital of the Republic of Korea. Details of the participants' recruitment criteria and the study design are provided elsewhere [9]. The KARE cohort included 4659 females, of whom 1211 were MetS cases and 639 were normal controls.
The replication stage included unrelated Korean participants from the Rural1816, Rural3667, and HEXA (Health Examinee shared control study) cohorts. The Rural1816 cohort includes 1816 subjects, of whom 957 are female; of these, 404 MetS cases and 21 normal controls were included in this study. The Rural1816 study combines subjects from the Wonju, Pyeong Chang, Gangneung, Geumsan, and Naju regional cohorts in Korea.
The Rural3667 cohort consists of 3667 subjects, of whom 2265 females were included in our study; of these, 424 were MetS cases and 330 were normal controls. The Rural3667 study combines subjects from the Yangpyeong, Namwon, and Goryeong regional cohorts in Korea. The study design and cohort characteristics of both Rural cohorts have been described previously [9].
A total of 2051 female subjects, including 237 MetS cases and 702 normal controls, were selected from the 3700 subjects of the HEXA study, which has been described elsewhere [7]. All subjects included in the discovery and replication stages provided written informed consent, as approved by the local review board. Clinical characteristics of each study cohort are summarized in Table 1.

Genotyping and quality control
For the KARE study subjects, genotyping was carried out using the Affymetrix Genome-Wide Human SNP Array 5.0. Details of genotyping quality control (QC) have been described previously [9]. To increase the coverage of common variants, SNP imputation of the KARE genotype data was performed with the IMPUTE program (http://mathgen.stats.ox.ac.uk/impute/impute_v2.html), using International HapMap data (phase 2, release 22) of 90 JPT and CHB individuals as the reference panel. Imputed SNPs with an imputation info score < 0.5, a missing genotype call rate > 1%, a minor allele frequency (MAF) < 0.01, or a Hardy-Weinberg equilibrium (HWE) test P-value < 1 × 10⁻⁷ were excluded [7].
Genotyping of Rural3667 subjects in the replication stage was conducted using the Illumina HumanOmni1-Quad v1 array. Samples with a genotype missing call rate > 1% or heterozygosity > 30% were excluded. Markers with a missing call rate > 5%, MAF < 0.01, or HWE test P-value < 1 × 10⁻⁶ were eliminated. After this QC process, 3667 subjects and 747,297 SNPs remained for subsequent analyses.
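The exclusion rules in the two paragraphs above can be written as simple per-marker and per-sample filters. The Python sketch below encodes the thresholds stated in the text for the KARE imputed SNPs and for the Rural3667 markers and samples. It is not the pipeline the authors used: the text does not name their QC tool, the data structures and function names are hypothetical, and "heterozygosity" is assumed to mean the fraction of heterozygous calls per sample.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MarkerStats:
    """Per-SNP summary statistics used for QC (hypothetical structure)."""
    snp_id: str
    missing_rate: float            # fraction of samples without a genotype call
    maf: float                     # minor allele frequency
    hwe_p: float                   # Hardy-Weinberg equilibrium test P-value
    info: Optional[float] = None   # imputation info score; None for typed SNPs


@dataclass
class MarkerThresholds:
    max_missing: float
    min_maf: float
    min_hwe_p: float
    min_info: Optional[float] = None  # only checked when set


# Thresholds as stated in the text for each dataset.
KARE_IMPUTED = MarkerThresholds(max_missing=0.01, min_maf=0.01, min_hwe_p=1e-7, min_info=0.5)
RURAL3667_MARKERS = MarkerThresholds(max_missing=0.05, min_maf=0.01, min_hwe_p=1e-6)


def marker_passes(m: MarkerStats, t: MarkerThresholds) -> bool:
    """True if the SNP is kept, i.e. it trips none of the exclusion rules."""
    if t.min_info is not None and (m.info is None or m.info < t.min_info):
        return False
    return (m.missing_rate <= t.max_missing
            and m.maf >= t.min_maf
            and m.hwe_p >= t.min_hwe_p)


def sample_passes_rural3667(missing_rate: float, heterozygosity: float) -> bool:
    """Rural3667 sample QC: drop samples with >1% missing calls or >30% heterozygosity
    (assumption: heterozygosity is the fraction of heterozygous calls)."""
    return missing_rate <= 0.01 and heterozygosity <= 0.30
```

The same SNP can pass one dataset's rules and fail the other's. For example, a marker with a 2% missing rate is removed under the KARE imputed-SNP rule (> 1%) but kept under the Rural3667 marker rule (> 5%).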
For the Rural1816 and HEXA study subjects, SNP genotyping was carried out using an Affymetrix Genome-Wide Human SNP array 6.0. Details on genotyping QC for genotype data have been described previously [7].

Phenotype measurements
In this study, MetS cases were defined according to International Diabetes Federation (IDF) criteria. The IDF criteria for MetS in Koreans require a waist circumference ≥ 85 cm in females (≥ 90 cm in males) plus any two of the following factors: (1) triglycerides (TG) ≥ 150 mg/dL, (2) HDLC < 50 mg/dL in females (< 40 mg/dL in males), (3) systolic/diastolic blood pressure (SBP/DBP) ≥ 130/85 mmHg, and (4) FPG ≥ 100 mg/dL [10,11]. The MetS control group consisted of individuals none of whose traits fell within any MetS range (i.e., for females, waist circumference < 80 cm, TG < 150 mg/dL, HDLC ≥ 50 mg/dL, SBP/DBP < 130/85 mmHg, and FPG < 100 mg/dL). Individuals with missing phenotypic values needed for MetS assessment were not included in the MetS case-control analysis (Table 1).
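The female case and control definitions above can be expressed as a small classifier. This Python sketch is an illustration only, not the authors' code. It rests on three assumptions: the blood-pressure criterion counts as met when either SBP ≥ 130 mmHg or DBP ≥ 85 mmHg, controls must have both values below the cut-offs, and women who are neither cases nor controls (for example, with a waist circumference between 80 and 85 cm) are left out of the case-control analysis. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FemalePhenotypes:
    waist_cm: Optional[float]
    tg: Optional[float]      # triglycerides, mg/dL
    hdlc: Optional[float]    # HDL cholesterol, mg/dL
    sbp: Optional[float]     # systolic blood pressure, mmHg
    dbp: Optional[float]     # diastolic blood pressure, mmHg
    fpg: Optional[float]     # fasting plasma glucose, mg/dL


def classify_female_mets(p: FemalePhenotypes) -> Optional[str]:
    """Return 'case', 'control', or None (excluded from the case-control analysis)."""
    values = (p.waist_cm, p.tg, p.hdlc, p.sbp, p.dbp, p.fpg)
    if any(v is None for v in values):
        return None  # a phenotype needed for MetS assessment is missing

    # Count the four non-waist risk factors (booleans sum as 0/1).
    n_risk = sum([
        p.tg >= 150,
        p.hdlc < 50,
        p.sbp >= 130 or p.dbp >= 85,  # assumption: either component counts
        p.fpg >= 100,
    ])

    if p.waist_cm >= 85 and n_risk >= 2:
        return "case"
    if p.waist_cm < 80 and n_risk == 0:
        return "control"
    return None  # assumption: intermediate individuals are neither cases nor controls
```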

Statistical analyses
Associations of genetic variants with MetS or six MetS component traits (waist circumference, FPG, SBP, DBP, HDLC, and TG) were tested with adjustment for age and participants' area of recruitment. Logistic (for MetS) or linear (for MetS component traits) regression analyses with these adjustments were performed under an additive model using PLINK version 1.07 (http://zzz.bwh.harvard.edu/plink/) [12]. Area of recruitment was included as a covariate because the KARE cohort in the discovery stage, as well as both Rural cohorts in the replication stage, combines subjects from several regional cohorts. However, we did not adjust for principal components, because previous studies demonstrated that population stratification is negligible in the KARE subjects used in our GWAS discovery stage [7,9]. Genetic variants with an association P-value < 0.01 in the discovery stage were further analyzed in the replication stage, using participants from the Rural1816, Rural3667, and HEXA studies. Meta-analyses combining the discovery and replication results were performed by inverse-variance weighting using the METAL program (http://csg.sph.umich.edu/abecasis/metal/index.html) [13] (Fig. 1).
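For reference, the fixed-effects inverse-variance combination behind such a meta-analysis can be sketched in a few lines of Python. This is a from-scratch illustration of the standard formulas: the pooled β is Σwᵢβᵢ / Σwᵢ with wᵢ = 1/SEᵢ², the pooled SE is 1/√Σwᵢ, and the two-sided P-value comes from the normal distribution. It is not METAL's implementation. For the MetS analyses, the βs would be log odds ratios from the logistic models. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(betas: Sequence[float],
                          ses: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effects inverse-variance meta-analysis.

    Returns (pooled beta, pooled standard error, two-sided P-value).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]   # more precise studies weigh more
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal P-value
    return beta, se, p


# Invented discovery and replication estimates for one SNP.
pooled_beta, pooled_se, pooled_p = inverse_variance_meta([0.2, 0.4], [0.1, 0.1])
```

With equal standard errors, the pooled estimate is simply the mean of the two βs (0.3 here). The pooled SE shrinks by a factor of √2, which is why combining stages can turn two modest signals into a stronger one.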

Generation of the genetic risk score (GRS)
A genetic risk score (GRS) was generated by combining the risk alleles of the 14 SNPs identified in the association analyses for MetS in this study. Each SNP contributed the number of risk alleles carried (0, 1, or 2). Under an additive genetic model, each SNP is assumed to be independently associated with risk [14]; therefore, an individual's GRS was calculated as the sum of risk-allele counts across the 14 SNPs. To reduce bias and noise, only individuals with no missing genotypes at any of the 14 SNPs were included in analyses employing the GRS.
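A minimal Python sketch of this unweighted, additive GRS follows. The SNP identifiers and risk alleles are hypothetical placeholders (the 14 actual SNPs are not reproduced here), and genotypes are assumed to be stored as two-letter strings such as "AG".

```python
from typing import Dict, Optional


def risk_allele_count(genotype: str, risk_allele: str) -> int:
    """Copies (0, 1, or 2) of risk_allele in a two-letter genotype such as 'AG'."""
    return genotype.upper().count(risk_allele.upper())


def genetic_risk_score(genotypes: Dict[str, Optional[str]],
                       risk_alleles: Dict[str, str]) -> Optional[int]:
    """Unweighted GRS: sum of risk-allele counts over all SNPs in risk_alleles.

    Returns None if any genotype is missing, mirroring the complete-genotype
    requirement described in the text.
    """
    score = 0
    for snp, allele in risk_alleles.items():
        g = genotypes.get(snp)
        if g is None or len(g) != 2:
            return None
        score += risk_allele_count(g, allele)
    return score


# Hypothetical three-SNP example (the study used 14 SNPs).
RISK_ALLELES = {"snp_a": "G", "snp_b": "C", "snp_c": "A"}
person = {"snp_a": "AG", "snp_b": "CC", "snp_c": "TT"}
print(genetic_risk_score(person, RISK_ALLELES))  # 3
```

Because every risk allele adds exactly one unit, a 14-SNP score of this kind ranges from 0 to 28. Each SNP is implicitly given equal weight, regardless of its estimated effect size.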

Construction of a MetS prediction model
The association between the GRS and MetS was tested using logistic regression analysis in females, in males, and in all KARE subjects (females and males combined). To predict the prevalence of MetS in KARE subjects, a logistic regression model was constructed comprising the GRS and two additional covariates, area and age. The constructed model was evaluated by receiver operating characteristic (ROC) analysis and area under the curve (AUC) calculations. Logistic regression analyses were performed using R software, version 3.3.2 (GNU General Public License). AUC calculation and ROC plot visualization were executed using R's ModelGood package. A P-value < 0.05 was considered significant.
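The AUC obtained from ROC analysis equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half (the normalized Mann-Whitney statistic). The Python sketch below computes it directly from predicted probabilities and 0/1 labels. It illustrates the statistic only and is not the ModelGood package's code; the example scores are invented.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random case > score of a random control), ties = 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:              # compare every case with every control
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Invented predicted probabilities (e.g. from an area + age + GRS model).
example_scores = [0.9, 0.4, 0.6, 0.2]
example_labels = [1, 1, 0, 0]
print(roc_auc(example_scores, example_labels))  # 0.75
```

An AUC of 0.5 corresponds to a model that ranks cases and controls no better than chance, and 1.0 to perfect separation. The pairwise loop is quadratic in sample size, which is fine for illustration but not for large cohorts.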

Identification of genetic loci for MetS and its component traits
To discover female-specific genetic loci for MetS, we conducted a two-stage, sex-stratified GWAS. In stage 1 (discovery), SNPs across the whole genome were tested for association with MetS in 1211 cases and 639 controls among KARE female subjects. All SNPs with an association P-value < 0.01 (15,482 SNPs) in the stage 1 logistic analysis (adjusted for area and age) were taken forward to the stage 2 replication study (Additional file 5: Figure S1). The replication analysis included 1065 cases and 1053 controls among female subjects from the Rural1816, Rural3667, and HEXA cohorts (Fig. 1). For SNPs validated for association with MetS in the replication stage (P-value < 0.05), meta-analysis was conducted by combining the association results from the discovery and replication stages. Our meta-analysis identified 14 SNPs meeting our arbitrary cut-off for moderate evidence of association (5 × 10⁻⁸ < P_meta < 5 × 10⁻⁵) (Additional file 5: Figure S2). The associations of these 14 female MetS signals were not detected in male subjects (Table 2 & Additional file 1: Table S1).
To understand the molecular basis of MetS in females, we also performed sex-stratified GWAS for six MetS component traits: waist circumference, TG, HDLC, SBP, DBP, and FPG. In the discovery stage, genome-wide scan data for each trait from a total of 4659 KARE female subjects were tested by linear regression analysis adjusted for area and age. Next, 5273 female subjects from three studies (Rural1816, Rural3667, and HEXA) were included in the replication stage to validate the signals selected in the discovery stage (P_discovery < 0.01). SNPs validated in the replication stage (P-value < 0.05) were used in the subsequent meta-analysis. Our meta-analyses combining the discovery and replication stages detected two female-specific genome-wide significant loci (P_meta < 5 × 10⁻⁸) (Fig. 2), as well as 33 female-specific loci showing moderate evidence of association with a given trait (5 × 10⁻⁸ < P_meta < 10⁻⁵) (Table 3, Additional file 2: Table S2 & Additional file 5: Figure S3).
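The staged filtering described in this and the previous paragraph can be summarized as a small decision function. This Python sketch uses the thresholds stated in the text: discovery P < 0.01, replication P < 0.05, genome-wide P_meta < 5 × 10⁻⁸, and a moderate cut-off of 5 × 10⁻⁵ for MetS or 10⁻⁵ for the component traits. The function name is hypothetical, and the sketch does not model any further checks the authors may have applied (for example, effect direction).

```python
from typing import Optional

DISCOVERY_P = 0.01      # carried forward to replication if P < 0.01
REPLICATION_P = 0.05    # validated in replication if P < 0.05
GENOME_WIDE_P = 5e-8


def classify_signal(p_discovery: float, p_replication: float, p_meta: float,
                    moderate_cutoff: float = 5e-5) -> Optional[str]:
    """Label a SNP according to the staged thresholds described in the text.

    moderate_cutoff is 5e-5 for MetS and 1e-5 for the component traits.
    """
    if p_discovery >= DISCOVERY_P or p_replication >= REPLICATION_P:
        return None  # not carried forward, or not validated in replication
    if p_meta < GENOME_WIDE_P:
        return "genome-wide significant"
    if p_meta < moderate_cutoff:
        return "moderate"
    return None
```

The `moderate_cutoff` parameter makes explicit that the MetS and component-trait analyses used different moderate thresholds. A meta-analysis P-value of 2 × 10⁻⁵ therefore counts as moderate for MetS but not for a component trait.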
Of the two loci reaching genome-wide significance, rs455489, located in an intron of the DSCAM gene, was significantly associated with FPG levels in females (P_meta-female = 2.92 × 10⁻⁹, β = 6.76 ± 1.14) but not in males (P_meta-male = 0.72) (Fig. 2a). The other genome-wide significant locus, rs7115583, located in an intron of the SIK3 gene, was associated with HDLC levels in females (P_meta-female = 3.58 × 10⁻⁸, β = 1.32 ± 0.24) (Fig. 2b); however, this SNP showed no association with HDLC levels in males in this study (P_meta-male = 0.09) (Table 3). Of the 33 moderate signals, 3 were associated with waist circumference, 6 with FPG, 6 with SBP, 3 with DBP, 6 with HDLC, and 9 with TG (Table 3 & Additional file 2: Table S2). The associations of these loci were detected only in females, not in males.
[…] high GRS tend to have higher susceptibility to MetS as compared to those with a low GRS (Fig. 3).

Generation of a GRS and a MetS prediction model
In an unadjusted logistic regression model, the GRS was significantly associated with the prevalence of MetS in female subjects from the KARE study (P = 4.70 × 10⁻¹¹). Binary logistic regression adjusted for area and age further strengthened the association between the GRS and MetS in females (P = 5.28 × 10⁻¹⁴). In this prediction model, comprising area, age, and GRS as predictor variables, each one-unit increase in GRS multiplied the odds of MetS in females by 1.27. In contrast, the GRS was not associated with MetS in male KARE subjects in this model (P = 0.33). Although the GRS was significantly associated with MetS in all KARE subjects combined (P = 1.69 × 10⁻⁵), the effect was small (OR = 1.09) (Table 4).
To evaluate the MetS prediction model, we conducted ROC curve analysis using R's ModelGood package. In female KARE subjects, the AUC of the model comprising area, age, and the GRS as predictor variables (AUC = 0.85) was higher than that of a model with area and age only (AUC = 0.83) or with the GRS only (AUC = 0.60) (Table 4 & Fig. 4a). The model with area, age, and GRS predicted MetS more effectively in females (AUC = 0.85) than in males (AUC = 0.57) or in all subjects (AUC = 0.72) (Table 4 & Fig. 4b). Taken together, the GRS generated from our association analysis for MetS appears to be a useful predictor of MetS in female populations but not in populations that include males.
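Because the logistic model is linear on the log-odds scale, the per-unit OR compounds multiplicatively over larger GRS differences. The Python sketch below makes this explicit. The function name is hypothetical, the 5-unit difference is an arbitrary illustrative value, and the calculation assumes that the model's linearity in GRS holds across the whole range.

```python
import math


def odds_multiplier(or_per_unit: float, delta_units: float) -> float:
    """Fold change in odds for a delta_units change in a predictor whose per-unit
    odds ratio is or_per_unit, assuming log-odds are linear in that predictor."""
    beta = math.log(or_per_unit)        # the logistic regression coefficient
    return math.exp(beta * delta_units)
```

For example, `odds_multiplier(1.27, 5)` is about 3.30. Under the linearity assumption, a woman whose GRS is 5 units higher would have roughly 3.3-fold higher odds of MetS, with area and age held fixed. The same 5-unit difference with the combined-sample OR of 1.09 gives only about 1.54-fold.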

Discussion
Several studies have been previously conducted to identify genetic loci influencing MetS or its component traits [15][16][17][18][19]. However, the total heritability of these traits is not yet fully understood [20]. To explain the missing heritability of these traits, more detailed investigations have been proposed, including analyses of less common variants, sequence-level data, epigenetic data, and environmental exposures [21,22].
It has been suggested that sex-specific genetic architecture also influences many complex diseases [23,24]. Thus, genetic studies should consider sex-specific effects in their design and interpretation to avoid missing a significant proportion of the genes that contribute to the risk of complex diseases. Although Zabaneh et al. [15] reported a genetic association study for MetS in Indian Asian men, most previous genetic studies of MetS or its component traits did not consider sex-specific effects.
Seeking more effective ways to treat MetS in female patients, we aimed to elucidate the female- […] Table S1). To gain statistical power to detect genome-wide significant signals, the recruitment of a larger number of female subjects may be essential. Gene ontology analysis using the DAVID 6.8 functional annotation tool (https://david.ncifcrf.gov/home.jsp) indicated that genes corresponding to the 14 loci associated with MetS were enriched for functions in the positive regulation of T cell proliferation or T cell co-stimulation (Table 5). Further studies will be needed to establish the functional relevance of these genes to the development of MetS in females.
Genome-wide association analysis of six MetS component traits in female populations detected two signals reaching genome-wide significance. One genetic variant, rs455489, associated with FPG (P_meta-female = 2.92 × 10⁻⁹, β = 6.76 ± 1.14), is located in an intron of the DSCAM (DS Cell Adhesion Molecule) gene (Table 3 & Fig. 2a). This result implies that each additional C allele at this SNP is associated with an increase of about 6.76 mg/dL in FPG in female populations. The product of the DSCAM gene is a member of the immunoglobulin superfamily of cell adhesion molecules (Ig-CAMs) and is involved in the development of the human central and peripheral nervous systems [25]. Although this gene is a candidate for Down syndrome and congenital heart disease [26], its involvement in FPG in females remains to be elucidated in further studies.
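If "β = 6.76 ± 1.14" is read as an effect estimate ± standard error, the approximate 95% confidence interval and two-sided Wald P-value can be recovered as shown below. This reading is an assumption, although the implied Wald statistic (6.76/1.14 ≈ 5.9) is consistent with the reported P-value. This Python sketch is illustrative only; the function name is hypothetical, and the published P-values come from the meta-analysis itself.

```python
import math
from typing import Tuple


def wald_summary(beta: float, se: float,
                 z_crit: float = 1.959964) -> Tuple[float, float, float]:
    """Return (lower 95% CI, upper 95% CI, two-sided P) for an estimate and its SE,
    assuming the estimate is approximately normally distributed."""
    lower = beta - z_crit * se
    upper = beta + z_crit * se
    z = beta / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return lower, upper, p_two_sided


# rs455489 (DSCAM): FPG change in mg/dL per additional C allele.
lo, hi, p = wald_summary(6.76, 1.14)
```

For rs455489, this gives a 95% CI of roughly 4.53 to 8.99 mg/dL per C allele. The same call with the SIK3 estimate (1.32, 0.24) gives roughly 0.85 to 1.79 mg/dL. Under the additive model, carrying two C alleles would correspond to an expected FPG difference of about 2 × 6.76 = 13.52 mg/dL relative to carrying none.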
Another SNP, rs7115583, showed a genome-wide significant association with HDLC in females (P_meta-female = 3.58 × 10⁻⁸, β = 1.32 ± 0.24) (Table 3 & Fig. 2b). This result indicates that each additional T allele at this SNP is associated with a 1.32 mg/dL rise in HDLC in females. SNP rs7115583 is located in an intron of SIK3 (SIK Family Kinase 3). In this region, stronger associations with HDLC were observed at the APOA5 (rs2075291) and BUD13 (rs11216129) genes than at SIK3. These variants, however, were also strongly associated with HDLC in male subjects (P_meta-male = 4.60 × 10⁻¹³ for rs2075291; P_meta-male = 8.42 × 10⁻⁸ for rs11216129). Genetic loci in or near APOA5 and BUD13 have been associated with HDLC levels in large-scale GWASs [7,27]. The SIK3 locus lies about 164 kb from the APOA5 and BUD13 loci, and our results suggest that it is the only independent (r² < 0.2) female-specific HDLC signal in this region. Expression of SIK3 has been reported to be related to ovarian cancer development [28], but the molecular biological function of this gene has yet to be elucidated.
In addition to the two genome-wide significant signals, 33 loci showed moderate evidence of association (5 × 10⁻⁸ < P_meta < 10⁻⁵) with a given MetS component trait in females. Among these, SNP rs1364120, moderately associated with DBP, is located in an intron of the CDH13 (Cadherin 13) gene, which encodes a member of the cadherin superfamily. The CDH13 locus has previously been identified as a susceptibility locus influencing blood pressure in a genome-wide study of two European populations [29]. The CDH13-encoded protein is known to protect vascular endothelial cells from apoptosis due to oxidative stress and is associated with resistance to atherosclerosis [30]. In this study, the minor allele of rs1364120 was associated with lower DBP (Table 3).
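The independence criterion used for the SIK3 signal (r² < 0.2) refers to the standard pairwise linkage disequilibrium measure. A minimal Python sketch of its definition from haplotype and allele frequencies follows. The text does not state which data or software were used to compute r², the function names are hypothetical, and the frequencies in the example are invented.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Pairwise LD r² between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b                       # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


def is_independent(r2: float, threshold: float = 0.2) -> bool:
    """Treat two signals as independent when r² is below the threshold used in the text."""
    return r2 < threshold


# Invented frequencies: D = 0.1 - 0.3 * 0.4 = -0.02, so r² is about 0.0079.
print(is_independent(ld_r2(0.1, 0.3, 0.4)))  # True
```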
SNP rs11082766, moderately associated with HDLC, is located in an intron of the LIPG (Lipase G, Endothelial Type) gene. The protein encoded by LIPG may be involved in lipoprotein metabolism and vascular biology; it has phospholipase and triglyceride lipase activity and hydrolyzes high-density lipoproteins (HDL) more efficiently than other lipoproteins [31]. Another SNP, rs3766235, located in an intron of the MKNK1 (MAP Kinase Interacting Serine/Threonine Kinase 1) gene, shows a moderate association with TG. MKNK1 encodes a Ser/Thr protein kinase known to be involved in the p38 MAPK signaling pathway, as well as in the regulation of lipid metabolism and insulin signaling cascades, implying a possible role in TG metabolism [32]. The functional relevance of the genes corresponding to the remaining 30 moderately associated loci will be investigated in future studies.
Because six traits (waist circumference, FPG, SBP, DBP, HDLC, and TG) are used to define MetS, it has been speculated that the genetic regulation of these traits is closely related to the development of MetS. To test this hypothesis, we examined the association with MetS of all 35 loci identified for MetS component traits (2 genome-wide significant signals and 33 moderately associated signals). Apart from three SNPs (rs10171377 for FPG, rs4883839 for HDLC, and rs6508974 for HDLC), these signals did not show association with MetS (Table 3). These results suggest that simply combining genetic factors associated with MetS component traits may not be sufficient to predict the development of MetS.
In this study, each risk allele of the MetS variants conferred only a modest effect on risk in females (Table 2). Thus, individual variants probably have limited ability to predict MetS. It has been suggested that a GRS combining multiple loci might improve the prediction of a target disease [33]. In this regard, we specifically aimed to investigate whether the GRS could reliably predict the prevalence of MetS in females. To the best of our knowledge, our study is the first to test a MetS prediction model using a GRS that combines variants associated with MetS specifically in females of an East Asian population.
We constructed the GRS using only KARE subjects, because genotype data for some of the 14 MetS SNPs detected in this study were not available in the other studies (Rural1816, Rural3667, and HEXA). To reduce bias and noise, only female individuals with no missing genotypes at the 14 SNPs were included in the construction of the MetS prediction model employing the GRS (912 cases and 473 controls). Our logistic model indicated that a GRS combining the 14 MetS SNPs is strongly associated with MetS prevalence in females: each one-unit increase in the GRS was associated with 1.19-fold higher odds of MetS (P = 4.70 × 10⁻¹¹). When the model was adjusted for area (of subject recruitment) and age, both the strength of the association and the effect size increased (P = 5.28 × 10⁻¹⁴, OR = 1.27). The AUC of this model, evaluated by ROC curve analysis, was 0.85, implying that a model comprising area, age, and GRS as predictor variables might be useful for predicting MetS in females (Table 4 & Fig. 4a). In contrast, the GRS was not associated with MetS in males (P = 0.33), although it was significantly associated with MetS in the combined population of male and female subjects (P = 1.69 × 10⁻⁵) (Table 4 & Fig. 4b).
To evaluate whether a sex-specific GRS better predicts MetS in individuals of the same sex, we also performed a meta-analysis for MetS in male participants. This identified 10 SNPs showing suggestive evidence of association with MetS in males (5 × 10⁻⁸ < P_meta < 5 × 10⁻⁵). The associations of these 10 SNPs were validated in replication analyses (Additional file 3: Table S3). Using the risk alleles for MetS detected in males, we calculated a male-specific GRS. Our logistic analyses demonstrated that the male GRS was strongly associated with MetS in male participants (P = 2 × 10⁻¹⁶, OR = 1.35) but only nominally associated in female participants (P = 0.013, OR = 1.08) (Additional file 4: Table S4). To predict MetS in males and females using the male GRS, we carried out ROC analyses and calculated AUCs, which were 0.66 and 0.54 in male and female participants, respectively (Additional file 4: Table S4 and Additional file 5: Figure S4). Taken together, these results suggest that a sex-specific GRS is a more effective predictor of MetS development in individuals of the same sex.
Table 5 Functional annotation results of 16 genes for MetS using DAVID Bioinformatics Resources 6.8 (GO, Gene Ontology; BP, biological process; CC, cellular component)

Conclusion
In this study, we detected new female-specific genetic variants influencing MetS and its component traits in Korean populations. Our findings, including genome-wide significant signals at the DSCAM and SIK3 loci for FPG and HDLC, respectively, as well as several signals moderately associated with traits of interest, provide new insights into the sex-specific causes underlying this cluster of risks. In addition, our study suggests that the GRS based on 14 SNPs detected in our GWA analyses for MetS is a useful predictor of MetS in female but not in male populations.