We have applied a multivariate approach to identify variants associated with more than one trait, initially using 4,986 unrelated individuals across 13 biochemical traits. Univariate testing of the significant or near-significant loci, on the full sample of 11,683 individuals, was then used to confirm these findings. We are interested firstly in the usefulness of multivariate analysis as a substitute for the more laborious and potentially less powerful approach of conducting multiple univariate analyses and comparing the results, and secondly in the details of the loci which are found to have effects on multiple variables in our data.
Testing one individual per family identified three known loci that were significantly or near-significantly associated with more than one trait, and replicated 11 loci in previously published genes that passed a genome-wide threshold of 5 × 10⁻⁸ for single variables (Table 1). When a lower genome-wide threshold (p < 9 × 10⁻⁵) was used, a further six published loci were also identified (Additional file 1, Table S4). The three loci reported in previous publications using univariate association analysis (highlighted in Table 1) showed significant or close to significant associations with more than one trait in our data, indicating the benefit of multivariate analysis for detecting pleiotropic loci.
We have identified polymorphisms showing strong evidence of allelic associations with HDL and triglycerides on chromosome 8 (LPL gene, MIM 609708); with GGT and possibly LDL and CRP on chromosome 12 (OASL gene, MIM 603281); and with HDL and LDL and possibly CRP and triglycerides on chromosome 19 (TOMM40 (MIM 608061)/APOE (MIM 107741)-C1 (MIM 107710)-C2 (MIM 608083)-C4 (MIM 600745) gene cluster). Each gene has been previously recognised in genome-wide association studies concentrating on a few of these variables. The functions of these genes are reasonably well established. LPL plays a key role in lipid metabolism and is responsible for hydrolysis of triglyceride molecules present in circulating lipoproteins. The APOE and APOC genes also play a key role in lipid metabolism and cholesterol transport by helping to stabilise and solubilise lipoproteins as they circulate in the blood [50, 51]. Both LPL and APOE polymorphisms have been found to be significantly associated with increases in LDL and decreases in HDL. The functional connection between the OASL gene (2',5'-oligoadenylate synthetase-like, also known as "thyroid hormone receptor interactor") and these phenotypes is unclear. However, nearby genes in linkage disequilibrium with the lead SNP in OASL include HNF1A and c12orf43. HNF1A is expressed in liver, kidney and endocrine pancreas and regulates a number of genes involved in lipoprotein metabolism, including apolipoproteins, cholesterol synthesis enzymes and bile acid transporters. HNF1A also has allelic associations with type 2 diabetes, CRP [55–57] and coronary heart disease. The findings for the lipids, in particular, were similar to those obtained in previous genetic association studies in the general population. However, the relationships between inflammation (as presumptively measured by CRP) and the traits associated with obesity and cardiovascular risk are of particular interest.
CRP was significantly, though not always strongly, correlated with each of the other traits at the phenotypic level and it also showed up in the multivariate association findings.
The multivariate approach helps us to understand the connections between variables. For example, for rs2075650 on chromosome 19, the multivariate approach suggested that LDL, CRP, HDL and triglycerides might be associated with this particular SNP. Although the effects on LDL, HDL and triglycerides are consistent with what was expected (that is, the LDL effect is inversely associated with the HDL effect, positively associated with the triglycerides effect, and the HDL effect is inversely associated with that on triglycerides), the effects on CRP are contrary to expectation. The effect direction is opposite to those for LDL and triglycerides, and the same as that for HDL. This suggests that the alleles or haplotype which have risk-increasing effects on lipids have a potentially protective effect on CRP and (so far as the effect on CRP reflects the degree of inflammation) on the inflammatory process. The effect estimate for LDL at rs2075650 obtained in our study was similar to that obtained by Aulchenko et al.: the effect of rs2075650 [G] on LDL was estimated as 0.160 ± 0.018 by Aulchenko et al. and 0.153 ± 0.020 in our analysis. No effect estimate for CRP at this SNP was available from previous studies for comparison. In addition, it was interesting to observe that rs4429638, which is in partial LD (r² = 0.4) with rs2075650, has allelic effects in opposite directions on LDL [52, 59–61] and CRP. Similarly, on chromosome 12, rs3213545 affects LDL and CRP (and GGT) but not HDL or triglycerides. Again, the allelic effects on LDL and CRP are in opposite directions. This shows the usefulness of the multivariate approach in helping to understand the connections between several trait-SNP associations, which can then be modelled and evaluated in more detail.
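As an informal check on the statement that the two LDL effect estimates for rs2075650 [G] are similar, the comparison can be sketched as a simple z-test on the difference. This is only an illustration, not part of the analysis reported here: it assumes that the ± values are standard errors, that the two samples are independent, and that both estimates are on the same scale. The function name `compare_effects` is hypothetical.

```python
import math


def compare_effects(beta_a, se_a, beta_b, se_b):
    """Two-sided z-test for the difference between two effect estimates.

    Assumes (for illustration only) that the estimates come from
    independent samples and that se_a and se_b are standard errors.
    """
    diff = beta_a - beta_b
    # Standard error of a difference of independent estimates
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Values quoted in the text: Aulchenko et al. versus the present analysis
z, p = compare_effects(0.160, 0.018, 0.153, 0.020)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

Under these assumptions the difference of 0.007 is small relative to its standard error of about 0.027, which is consistent with the two estimates being described as similar.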
Our study differs from previous investigations as it examines a large number of correlated biochemical traits, initially using unrelated individuals and following up the findings in other members of the families. It confirms some published associations and identifies new ones. As our cohort consisted of adolescents and adults, results from adults, adolescents and combined (adults and adolescents) cohorts were examined and compared. Because of the larger number of adults studied, results from adults were similar to the combined data. Results from the adolescents were not notably different from the combined data.
One main limitation of our approach is that only a subset of the data (from unrelated subjects) can be used for the initial multivariate analysis. Although using all the available data (that is, taking account of the family structure) would add power, it is too computationally intensive for genome-wide multivariate analysis. Even though only a subset of the data was used, the method applied in our study was very efficient and easy to perform. A more specific limitation of our data is that the glucose and insulin measurements were not made on fasting blood samples. In adults, we adjusted for the time since the last meal, but in adolescents we had to rely on the fact that they were seen at the same time of day and that blood was taken around three hours after the expected time of breakfast.
Another set of limitations is related to the use of biomarkers of risk or, for CRP, of systemic inflammation. It seems that some loci may affect HDL-C or triglycerides without affecting cardiovascular risk, and it is possible that some loci might affect serum CRP without affecting inflammation. Nevertheless, the divergence between allelic effects on risk factors deserves further examination.