Next-generation-sequencing-based identification of familial hypercholesterolemia-related mutations in subjects with increased LDL–C levels in a Latvian population

Background: Familial hypercholesterolemia (FH) is one of the most common monogenic disorders and is predominantly inherited as an autosomal dominant trait. Untreated, it leads to early coronary heart disease. The vast majority of FH cases in Latvia remain undiagnosed, and the identification and early treatment of affected individuals remain a challenge worldwide. Most cases of FH are caused by mutations in one of four genes: APOB, LDLR, PCSK9, or LDLRAP1. The spectrum of disease-causing variants is very diverse, and the variant detection panels usually used in diagnosis cover only a minority of them. DNA-based tests can, however, establish an FH diagnosis in patients with no physical signs and no known family history of the disease. Here, we evaluate the use of targeted next-generation sequencing (NGS) to identify cases of FH in a cohort of patients with coronary artery disease (CAD) and individuals with abnormal low-density lipoprotein–cholesterol (LDL–C) levels. Methods: We used targeted amplification of the coding regions of LDLR, APOB, PCSK9, and LDLRAP1, followed by NGS, in 42 CAD patients (LDL–C, 4.1–7.2 mmol/L) and 50 individuals from a population-based cohort (LDL–C, 5.1–9.7 mmol/L). Results: In total, 22 synonymous and 31 nonsynonymous variants, eight variants within 10 bp of intron–exon boundaries, and 50 other variants were found. We identified four pathogenic mutations (p.(Arg3527Gln) in APOB, and p.(Gly20Arg), p.(Arg350*), and c.1706-10G>A in LDLR) in seven patients (7.6 %). Three possibly pathogenic variants were also found in four patients. Conclusion: NGS-based methods can be used to detect FH in high-risk individuals even when they do not meet the defined clinical criteria. Electronic supplementary material: The online version of this article (doi:10.1186/s12881-015-0230-x) contains supplementary material, which is available to authorized users.


Background
Familial hypercholesterolemia (FH) was first described in 1920 [1] and is possibly the most common single-gene disorder. FH is characterized by elevated low-density lipoprotein-cholesterol (LDL-C) levels, tendon xanthoma, and arcus corneae before the age of 45 years. In the majority of cases, FH is inherited as an autosomal dominant trait, with a penetrance of almost 100 % [2]. Based on the Copenhagen General Population Study, the prevalence of heterozygous FH is about one in 200 individuals. However, the prevalence varies from one in 200 to one in 500 individuals, depending on the population [3]. The homozygous form of FH is much rarer, occurring in only one in 1,000,000 individuals [4][5][6].
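The heterozygous and homozygous prevalence figures above are linked by a simple population-genetics argument. The following is a minimal sketch, assuming Hardy-Weinberg equilibrium, random mating, and a single pooled frequency q for all FH-causing alleles (so that q² counts homozygotes and compound heterozygotes together). Real populations, with founder effects and consanguinity, can deviate from these assumptions.

```latex
\begin{aligned}
P_{\mathrm{het}} &= 2q(1-q) \approx 2q, \qquad P_{\mathrm{hom}} = q^{2} \approx \left(\frac{P_{\mathrm{het}}}{2}\right)^{2} \\
P_{\mathrm{het}} = \tfrac{1}{500} &\;\Rightarrow\; q \approx \tfrac{1}{1000} \;\Rightarrow\; P_{\mathrm{hom}} \approx \tfrac{1}{1\,000\,000} \\
P_{\mathrm{het}} = \tfrac{1}{200} &\;\Rightarrow\; q \approx \tfrac{1}{400} \;\Rightarrow\; P_{\mathrm{hom}} \approx \tfrac{1}{160\,000}
\end{aligned}
```

Under these assumptions, the one-in-500 heterozygous estimate corresponds to the classical homozygous figure of about one in 1,000,000. The higher one-in-200 estimate would imply roughly one biallelic case per 160,000 individuals.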
The accumulation of lipids and lipoproteins in the plasma is the first key process in the development of atherosclerotic lesions [7]. Atherosclerosis, the coronary artery disease (CAD) that results from it, and other vascular diseases are the most common causes of death worldwide [8]. Diagnosing FH early, before severe complications develop, is therefore crucial for starting appropriate treatment with lipid-lowering agents.
The diagnosis of FH is often based on physical signs (tendon xanthoma or arcus corneae before 45 years of age), laboratory findings (plasma LDL-C levels), and the patient's history of elevated LDL-C and cardiovascular disease. At least three diagnostic approaches are based on these signs [9]. However, the absence of physical signs is not sufficient to reject a diagnosis, because about 50 % of heterozygous FH patients do not display them [10], and heterozygous FH patients often show no evidence of premature CAD [4,11]. Moreover, in some monogenic hypercholesterolemias, plasma lipid levels may fall within the normal range [5]. Therefore, DNA-based tests are required to confirm a diagnosis of FH.
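To make concrete why the absence of physical signs matters for point-based criteria, the following is a minimal sketch of one widely used scheme. It assumes the Dutch Lipid Clinic Network (DLCN) point values and does not assume that reference [9] uses exactly this scheme. The family-history items are omitted, and the class and function names are hypothetical, so this is a simplified illustration rather than a clinical tool.

```python
from dataclasses import dataclass

# Simplified DLCN-style scoring (assumed point values; family-history items omitted).
# LDL-C bands in mmol/L, checked from the highest downwards: (lower bound, points)
LDL_C_BANDS = [(8.5, 8), (6.5, 5), (5.0, 3), (4.0, 1)]


@dataclass
class Subject:
    ldl_c: float                        # highest known LDL-C, mmol/L
    tendon_xanthoma: bool = False
    arcus_before_45: bool = False
    premature_cad: bool = False         # personal history of premature CAD
    functional_mutation: bool = False   # DNA-confirmed FH-causing mutation


def dlcn_score(s: Subject) -> int:
    # LDL-C: only the highest band reached counts
    score = next((pts for cutoff, pts in LDL_C_BANDS if s.ldl_c >= cutoff), 0)
    # Physical signs: only the highest-scoring sign counts
    if s.tendon_xanthoma:
        score += 6
    elif s.arcus_before_45:
        score += 4
    if s.premature_cad:
        score += 2
    if s.functional_mutation:
        score += 8
    return score


def dlcn_category(score: int) -> str:
    if score > 8:
        return "definite"
    if score >= 6:
        return "probable"
    if score >= 3:
        return "possible"
    return "unlikely"


if __name__ == "__main__":
    no_signs = Subject(ldl_c=7.0)  # elevated LDL-C, no recorded physical signs
    genotyped = Subject(ldl_c=7.0, functional_mutation=True)
    for s in (no_signs, genotyped):
        score = dlcn_score(s)
        print(score, dlcn_category(score))  # 5 possible, then 13 definite
```

In this sketch, a subject with an LDL-C of 7.0 mmol/L and no recorded signs scores only 5 points ("possible"). The same subject with a confirmed mutation is classified as "definite", which mirrors the argument that DNA testing is needed when physical signs are absent.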
The majority of FH patients have a defective low-density lipoprotein receptor (LDLR). This results in insufficient uptake of low-density lipoprotein (LDL) particles by cells and the accumulation of LDL-C in the plasma [12,13]. Extremely elevated plasma LDL-C levels greatly increase the risk of premature CAD, to more than 50 % before the age of 55 years [11][12][13][14]. Hundreds of LDLR mutations have been reported, and many of them cause FH (http://www.ucl.ac.uk/ldlr/Current/, http://www.hgmd.org/). A phenotypically similar condition, familial defective apolipoprotein B (FDB), is caused by defects in the apolipoprotein B gene (APOB) [12,15], especially by mutations in the LDLR-binding domain of APOB [14,16]. These mutations prevent LDLR from recognizing the APOB protein, so LDL particles carrying defective APOB cannot be taken up by cells. Various APOB mutations are related to FDB, most commonly p.(Arg3527Gln) [15,17,18]. Gain-of-function mutations in proprotein convertase subtilisin/kexin type 9 (PCSK9) have been suggested to increase the degradation of LDLR, reducing the number of LDLR molecules on the cell surface. This further reduces the uptake of LDL particles by cells, leading to yet another type of monogenic hypercholesterolemia [13][19][20][21]. There is also evidence that functional mutations in low-density lipoprotein receptor adaptor protein 1 (LDLRAP1) cause a recessive form of FH [22,23]. The plasma LDL-C levels in these patients tend to be intermediate between those in patients with heterozygous and homozygous familial LDLR deficiency [24]. This adaptor protein is crucial for the normal assembly of LDLR-rich coated pits on the cell surface and therefore for normal LDL particle uptake into the cell by endocytosis [23,25].
The proportion of patients identified with FH and receiving treatment remains very low, and most undiagnosed and untreated patients are young [13]. Recent findings also suggest that FH is more prevalent than previously thought and that the current diagnostic criteria may underestimate its actual prevalence [3]. In this context, DNA-based evidence of functional mutations in the LDLR, APOB, PCSK9, and LDLRAP1 genes is pivotal to making a correct diagnosis [26].
We conducted this study to assess the spectrum of FH-related mutations in a group of individuals selected mainly on the basis of elevated LDL-C levels, a group expected to include many individuals with undiagnosed FH. Another aim of the study was to evaluate the feasibility of targeted next-generation sequencing (NGS)-based testing for FH.

Subjects
We conducted this research using DNA samples from the Genome Database of the Latvian Population (LGDB), a government-funded biobank (described briefly in [27]) that included 25,292 participants in August 2014 (1.3 % of the Latvian population). In the LGDB, 4335 samples had data on total cholesterol (TC), high-density lipoprotein-cholesterol (HDL-C), LDL-C, and triglyceride (TG) levels. Two groups of subjects were included in the study. The first, designated the "population group" (POP), comprised 50 unrelated individuals from the extreme upper end of the LDL-C spectrum. In total, 292 individuals had LDL-C levels > 4.9 mmol/L, the cut-off value for LDL-C based on World Health Organization criteria [11], and from these, 50 samples from the upper end of the distribution (LDL-C range, 5.1–9.7 mmol/L) were chosen. The second group, the "CAD" group, comprised 42 patients recruited to the LGDB from the Latvian Center of Cardiology specifically for this study, up to March 2013. The inclusion criteria for this group were a history of coronary heart disease (previous myocardial infarction or angina) and/or evidence of coronary atherosclerosis established with invasive coronary angiography, together with a TC level > 7 mmol/L. All subjects who gave their consent to participate in the study were included without further selection. All individuals in the CAD group had received lipid-lowering therapy, and their highest known LDL-C value, whether measured before or after treatment, was used for this study. Written informed consent was obtained from all the participants. The study protocol was approved by the Central Medical Ethics Committee of Latvia (protocol nos. 2007 A-7 and 01-29.1/25). Sixteen samples (17 %) had also been included in our previous pilot study, which used the same sequencing panel [28].
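The POP selection (a threshold, then the upper end of the distribution) and the CAD inclusion rule can be expressed as simple filters. This is a minimal sketch under stated assumptions. The record layout, the `family_id` field used to enforce "unrelated", and the function names are hypothetical, and "upper end" is taken to mean the highest LDL-C values. Relatedness in the LGDB was not necessarily checked this way.

```python
from dataclasses import dataclass
from typing import List, Optional

WHO_LDL_C_CUTOFF = 4.9  # mmol/L, LDL-C cut-off cited in the text
CAD_TC_CUTOFF = 7.0     # mmol/L, TC threshold for the CAD group


@dataclass
class Participant:
    participant_id: str              # hypothetical biobank identifier
    ldl_c: Optional[float]           # mmol/L; None if no lipid data
    total_chol: Optional[float]      # mmol/L
    has_cad: bool = False            # MI/angina history or angiographic CAD
    family_id: Optional[str] = None  # hypothetical key used to enforce "unrelated"


def select_pop_group(participants: List[Participant], n: int = 50) -> List[Participant]:
    """Pick up to n unrelated participants with the highest LDL-C above the cut-off."""
    eligible = [p for p in participants
                if p.ldl_c is not None and p.ldl_c > WHO_LDL_C_CUTOFF]
    eligible.sort(key=lambda p: p.ldl_c, reverse=True)
    chosen: List[Participant] = []
    seen_families = set()
    for p in eligible:
        if p.family_id is not None:
            if p.family_id in seen_families:
                continue  # a relative with higher LDL-C was already chosen
            seen_families.add(p.family_id)
        chosen.append(p)
        if len(chosen) == n:
            break
    return chosen


def meets_cad_criteria(p: Participant) -> bool:
    """CAD group rule: established CAD and TC above 7 mmol/L."""
    return p.has_cad and p.total_chol is not None and p.total_chol > CAD_TC_CUTOFF
```

Sorting before deduplicating families keeps, for each family, the member with the highest LDL-C. This choice is made for the sketch and is not taken from the text.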

Next-generation sequencing (NGS)
We sequenced all the exons (including the exon-intron boundaries) and the 5′ and 3′ untranslated regions of four FH-related genes (LDLR, APOB, PCSK9, and LDLRAP1) on the Life Technologies Ion Torrent™ PGM platform, which is based on semiconductor sequencing technology [29]. For target enrichment, we developed an AmpliSeq™ assay, using the Ion AmpliSeq™ Designer tool to custom-design the assay primers. The design was based on the human reference genome (Hg19), using pipeline v1.

Variant detection and analysis
The raw data were aligned to the human reference genome (Hg19) using the Torrent Suite 4.0.2 alignment plugin v4.0-r77189. The variant-calling plugin (variantCaller v4.0-r76860) was used to call and annotate variants in the sequenced samples, with the Standard-Germ Line-PGM-Low Stringency settings, in which the minimum coverage for calling single-nucleotide polymorphisms (SNPs) is 6×. All called variants were visually inspected using the Integrative Genomics Viewer (IGV) v2.3 software (http://www.broadinstitute.org/software/igv/download) [30] to filter out possible PCR errors and false-positive variants caused by imperfections in the variant-calling algorithms or by platform-specific error profiles [31].
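Because IGV inspection was used to remove PCR and platform artefacts, and Ion Torrent indel errors tend to cluster near homopolymers, a pre-triage step can help prioritize which calls to review. This is a minimal sketch under stated assumptions. Only the 6× minimum depth comes from the text. The call record, the homopolymer length threshold of 4, and the function names are hypothetical, and this is not the Torrent variantCaller logic.

```python
from dataclasses import dataclass

MIN_DEPTH = 6         # minimum coverage for SNP calling stated in the text
HOMOPOLYMER_MIN = 4   # hypothetical run length treated as error-prone


@dataclass
class Call:
    pos: int    # 0-based position of the call (VCF-style anchor base for indels)
    ref: str
    alt: str
    depth: int  # read depth at the site


def homopolymer_run(seq: str, pos: int) -> int:
    """Length of the single-base run in seq that contains seq[pos]."""
    base = seq[pos]
    left, right = pos, pos
    while left > 0 and seq[left - 1] == base:
        left -= 1
    while right + 1 < len(seq) and seq[right + 1] == base:
        right += 1
    return right - left + 1


def triage(call: Call, ref_seq: str) -> str:
    """Return 'drop', 'review' (inspect in IGV) or 'keep' for one call."""
    if call.depth < MIN_DEPTH:
        return "drop"
    if len(call.ref) != len(call.alt):  # indel
        # the anchor base and the base after it can both border a homopolymer
        sites = [i for i in (call.pos, call.pos + 1) if i < len(ref_seq)]
        if max(homopolymer_run(ref_seq, i) for i in sites) >= HOMOPOLYMER_MIN:
            return "review"
    return "keep"
```

For example, with the toy reference `ACGTTTTTGCA`, a one-base deletion anchored at position 2 (`GT` to `G`) removes a base from the five-T run and is flagged for review. A well-covered SNV elsewhere is kept.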
Variants with minor allele frequencies above 1 % in the 1000 Genomes phase 1 genotype data from 1094 individuals worldwide (http://browser.1000genomes.org) were considered nonpathogenic. Similarly, deep intronic variants (more than 10 bp from exon-intron boundaries), other noncoding variants, and synonymous variants were excluded from further analysis. For annotation, the remaining variants were compared with the Human Gene Mutation Database (HGMD; http://www.hgmd.org/), the Familial Hypercholesterolemia Variant Database (FHVD; http://www.ucl.ac.uk/ugi/fh), and the scientific literature. In silico annotation was also performed using tools that predict the functional effects of human SNPs: PolyPhen-2 v2.2.2r398 [32], SIFT [33], and MutationTaster [34]. Variants described as causing FH in the FHVD and supported by evidence in the literature were considered pathogenic (Category 1). Nonsynonymous variants with unknown or unclear evidence of pathogenicity that all the prediction tools identified as damaging were considered possibly pathogenic (Category 2). All the remaining variants were classified as variants of uncertain clinical significance (Category 3). All Category 1 and Category 2 variants were validated by direct sequencing.
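The filtering and three-tier classification described above amount to a short decision procedure. This is a minimal sketch that assumes each variant has already been annotated. The field names, the consequence labels, and the normalization of PolyPhen-2, SIFT, and MutationTaster outputs to a common "damaging" label are hypothetical, because each tool reports its predictions with a different vocabulary.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MAF_CUTOFF = 0.01   # 1 % in 1000 Genomes phase 1
SPLICE_WINDOW = 10  # bp from an exon-intron boundary
NONSYNONYMOUS = {"missense", "nonsense", "frameshift", "inframe_indel"}


@dataclass
class Variant:
    name: str
    consequence: str                        # e.g. "missense", "synonymous", "intronic", "utr"
    maf_1000g: Optional[float] = None       # None if absent from 1000 Genomes
    dist_to_boundary: Optional[int] = None  # bp to nearest boundary (intronic only)
    fh_causing_in_fhvd: bool = False        # FHVD-listed as FH-causing, with literature support
    predictions: Tuple[str, ...] = ()       # normalized calls from the three tools


def passes_filters(v: Variant) -> bool:
    """Pre-classification exclusions described in the text."""
    if v.maf_1000g is not None and v.maf_1000g > MAF_CUTOFF:
        return False                        # common: considered nonpathogenic
    if v.consequence == "intronic":
        return v.dist_to_boundary is not None and v.dist_to_boundary <= SPLICE_WINDOW
    return v.consequence in NONSYNONYMOUS   # drops synonymous and other noncoding


def classify(v: Variant) -> Optional[int]:
    """Return category 1, 2 or 3, or None if the variant was filtered out."""
    if not passes_filters(v):
        return None
    if v.fh_causing_in_fhvd:
        return 1                            # database evidence overrides predictions
    if (v.consequence in NONSYNONYMOUS and len(v.predictions) == 3
            and all(p == "damaging" for p in v.predictions)):
        return 2
    return 3
```

In this sketch, FHVD evidence is checked before the predictions, which matches how LDLR p.(Gly20Arg) is treated in the Discussion: PolyPhen-2 calls it benign, yet it is a confirmed FH-causing variant. An intronic variant 9 bp from a boundary that is not listed in FHVD, such as c.2141-9T>G, falls through to Category 3.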

Results
For this research, we selected two groups: the POP group, 50 individuals from the general population with markedly elevated LDL-C levels (only one of whom had a prior history of CAD), and the CAD group, 42 patients with elevated LDL-C levels and established CAD. The clinical characteristics of our cohort are summarized in Table 1.
After removing sequencing errors caused by PCR mistakes, false calls near homopolymers arising from limitations of the variant-calling algorithm, and strand-specific errors characteristic of the Ion Torrent platform [31], we detected 114 different variants in the total study group. For three variants, the alternative allele frequency was 100 %. These were assumed to reflect inappropriate annotation of the reference genome and were not included in the list of variants [35]. Synonymous coding variants (n = 22) were likewise excluded from the list. In total, 31 nonsynonymous variants, eight variants within 10 bp of intron-exon boundaries, and 50 other variants were identified. Information on all (n = 89) identified variants is given in Additional file 1: Table S1. Four FH-causing variants (Category 1), three possibly pathogenic variants (Category 2), and eight variants of uncertain significance (Category 3) were identified (Table 2). The target coverage achieved in the regions of interest was 90.7 %, and the mean read depth was 117×.
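The variant counts reported above reconcile as a simple bookkeeping check that uses only numbers stated in this section:

```latex
\underbrace{114}_{\text{detected}}
- \underbrace{3}_{\text{100\,\% alt.\ allele}}
- \underbrace{22}_{\text{synonymous}}
= 89 =
\underbrace{31}_{\text{nonsynonymous}}
+ \underbrace{8}_{\le 10\text{ bp from boundary}}
+ \underbrace{50}_{\text{other}}
```

The 15 variants assigned to Categories 1 to 3 (4 + 3 + 8) are drawn from these 89.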
Four novel variants (not reported in any variant database) were identified: one missense change, p.(Tyr144His) in APOB, from Category 2; and two missense changes, p.(Val2095Glu) and p.(Met755Leu) (both in APOB), and one variant close to an LDLR splice site (c.2141-9T>G), from Category 3. Table 3 shows the individual patients with Category 1, 2, and 3 variants.
FH-causing mutations were identified in seven patients (7.6 %) and possibly pathogenic variants were found in another four patients. No significant difference between the CAD and POP groups was identified in the number of variants from Categories 1, 2, or 3. All Category 1 and Category 2 variants were verified with direct sequencing.

Discussion
The heterogeneity of FH-causing variants supports sequence-based detection as the primary diagnostic methodology, and recent advances in NGS-based techniques have made them more accessible for use in diagnostics. However, it is unclear how applicable this approach is to the identification of individuals with FH in primary care. In this study, we used our recently developed AmpliSeq-based assay [28] to test its ability to identify FH variants in high-risk groups from general and disease-specific populations, selected primarily on the basis of their available clinical and biochemical data. Accordingly, the sole criterion for inclusion in the POP group was elevated LDL-C levels, whereas the CAD group was selected based on clinical or angiographic evidence of CAD and high TC levels (>7 mmol/L). The two groups differed significantly in their LDL-C levels and in the presence of CAD or myocardial infarction. However, they did not differ in the number of variants identified, regardless of the variants' proposed functions. We found four pathogenic FH-causing mutations in seven subjects (7.6 % of the total group tested). The LDLR nonsense mutation p.(Arg350*) was found in one subject. This variant has previously been reported as FH-causing in 13 unrelated FH patients [13,36,37]. It encodes a truncated, nonfunctional LDLR and therefore reduces the uptake of LDL particles in the liver and other tissues. p.(Gly20Arg) in LDLR was found in two patients and has been identified in FH patients in many previous studies [38][39][40][41][42][43]. Although PolyPhen-2 predicted it to be benign, LDLR p.(Gly20Arg) is already included in the LIPOchip®-FH diagnostic panel [44] and is a confirmed FH-causing variant.
Similarly, c.1706-10G>A in LDLR, which we identified in one patient, has been repeatedly detected in FH patients [40][45][46][47][48][49] and is also used in an FH diagnostic panel [49]. Another variant, p.(Arg3527Gln) in APOB, was found in three subjects. It is the most common FH-causing mutation in APOB and is supported by extensive epidemiological and functional evidence [18][41][50][51][52]. Interestingly, another APOB variant, p.(Glu2566Lys), was found in all three of these subjects, suggesting strong linkage disequilibrium between the two variants. All the prediction tools used in this study predicted three other rare APOB variants to be damaging to protein function. One of these, p.(Arg1689His), is described in the HGMD database as possibly causing hypertriglyceridemia. However, this variant has also been reported in control subjects [53], suggesting that its influence on the FH phenotype is at most moderate. The other two rare APOB variants, which have not been described previously, were found in two (2.2 %) subjects. In addition to the variants described above, we found seven rare (<1 % in the 1000 Genomes phase 1 genotype data from 1094 individuals worldwide; http://browser.1000genomes.org) nonsynonymous variants of unknown clinical significance (Category 3), all located in the APOB gene. Apart from p.(Glu2566Lys), which is in linkage disequilibrium with the pathogenic mutation p.(Arg3527Gln), only p.(Ser2429Thr) has been reported in relation to hypertriglyceridemia [53] (Table 3). Additional functional and segregation studies will be required to determine whether the Category 2 and 3 variants are pathogenic.
However, this can be difficult for low-penetrance variants, as the APOB variants are thought to be. The challenge of interpreting rare variants identified with NGS is the main obstacle to introducing this technique into clinical diagnostics. In addition to rare variants, we also detected a number of common polymorphisms, including nonsynonymous variants that have been associated with elevated LDL-C, atherosclerosis, and other conditions (summarized in Additional file 1: Table S1). The extent to which these common variants influence the outcomes of patients with rare low-penetrance variants is unclear. One of our aims in this project was to compare two groups that may be representative of populations of undiagnosed FH patients. The POP group represents the general population but with markedly elevated LDL-C levels, and the CAD group represents coronary atherosclerosis patients, the majority of whom are on lipid-lowering therapy. Therefore, we did not attempt to match LDL-C levels between the two groups. Our study group does not represent a typical FH cohort, because no data on clinical phenotypes such as tendon xanthomata were collected in the POP group, and no such phenotypes were observed in the CAD group. Although some family histories were collected, they were not detailed enough to apply the commonly used FH criteria. Another relevant consideration is the advanced age of most subjects in the study, which may have caused variants with strong effects to be underrepresented. For example, the pathogenic LDLR mutation p.(Arg350*), which results in a nonfunctional protein, was identified in a 23-year-old patient, whereas the APOB variants known for their moderate effects were found in three patients aged 57–65 years. It is also well known that combinations of risk alleles at several small-effect SNPs can increase LDL-C levels [56][57][58].
Therefore, it cannot be excluded that these SNPs influenced LDL-C levels, the main selection criterion, in our cohort. An even more interesting question is how these SNPs modify LDL-C levels in patients with a monogenic background. However, much larger cohorts will be required to study the role of SNPs with sufficient statistical power. The main difference between the two groups was the presence of CAD in one of them, so an increased prevalence of FH-related variants might be expected in the CAD group. However, we observed no significant difference between the groups in the number of subjects with Category 1, 2, or 3 variants (p = 0.684), although the groups appeared to differ in the number of carriers of Category 1 variants specifically. Notably, the excess of Category 1 carriers was found in the POP group (n = 5, 10.0 %) rather than in the CAD group (n = 2, 4.8 %), contrary to expectation. This suggests that the rate of FH-related variants may be higher in populations with higher LDL-C levels, irrespective of the presence of CAD. The FH detection rate would likely be even higher in a population selected with a higher LDL-C cut-off. This is consistent with suggestions to raise total cholesterol cut-offs to improve the diagnostic rate for FH [59]. However, such conclusions must be validated in larger cohorts, because most similar studies, including ours, used very small cohorts and therefore lacked statistical power.
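The text reports p = 0.684 for carriers of any Category 1 to 3 variant but does not name the test, and the per-group counts for Categories 2 and 3 are not given in this section. The sketch below is an illustration only. It applies a two-sided Fisher's exact test, chosen for this sketch and not necessarily the authors' test, to the Category 1 carrier counts that are given (POP 5/50 vs. CAD 2/42). This shows how little power groups of this size have. The function name is hypothetical, and only the standard library is used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k: int) -> float:  # P(top-left cell == k) given the margins
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    ks = range(max(0, col1 - row2), min(row1, col1) + 1)
    # the small relative tolerance keeps floating-point ties with p_obs
    return sum(p for p in map(prob, ks) if p <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    # Category 1 carriers from the text: POP 5 of 50, CAD 2 of 42
    p = fisher_exact_two_sided(5, 45, 2, 40)
    print(f"POP {5 / 50:.1%} vs CAD {2 / 42:.1%}: two-sided p = {p:.3f}")  # p is roughly 0.45
```

The same "sum of tables no more likely than the observed one" definition of the two-sided p-value is the convention used by common statistical libraries, so results from this sketch should be directly comparable to theirs.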
Our detection rate of 7.6 % for FH-causing mutations (definite FH) is higher than that in a recently published study in which FH-causing variants were identified in 2.1 % of unrelated high-risk individuals [60]. Studies that used NGS to detect variants in FH patient populations defined by clinical criteria reported higher detection rates, ranging from 67 % in clinically defined FH patients to 26 % in possible FH patients [61]. Clearly, studies with stricter inclusion criteria, based on clinical signs of FH or a family history, show much higher detection rates. The recent study by Maglio and colleagues [62] showed a detection rate of approximately 60 % in a study group in which all the participants had a family history of CAD, 74 % were classified as having definite or probable FH, and 16 % displayed xanthomas.
A typical drawback of amplicon-based sequencing methods is incomplete coverage arising from the limits of multiplexing. With our approach, 92.1 % of the regions of interest were covered by design, whereas 90.7 % were covered in practice because some amplicons did not produce PCR products. However, the read depth for all successfully sequenced amplicons was sufficient to cover the expected regions completely. This coverage can be considered close to optimal, because all the known functional domains were covered and the omitted regions were mainly untranslated parts of the genes.
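The gap between design-level and achieved target coverage can be computed by intersecting the target intervals with the amplicon intervals, first with all designed amplicons and then with only those that amplified. This is a minimal sketch that assumes BED-style half-open coordinates on a single contig and uses hypothetical toy intervals and function names. In practice, a dedicated interval tool such as BEDTools would typically be used.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), BED-style, one contig


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_fraction(targets: List[Interval], amplicons: List[Interval]) -> float:
    """Fraction of target bases overlapped by at least one amplicon."""
    targets, amplicons = merge_intervals(targets), merge_intervals(amplicons)
    total = sum(e - s for s, e in targets)
    if total == 0:
        return 0.0
    # after merging, pairwise overlaps cannot double-count a base
    covered = sum(max(0, min(te, ae) - max(ts, as_))
                  for ts, te in targets for as_, ae in amplicons)
    return covered / total


if __name__ == "__main__":
    targets = [(100, 200), (300, 400)]              # toy regions of interest
    designed = [(90, 180), (170, 260), (300, 380)]  # toy amplicon design
    amplified = [(90, 180), (300, 380)]             # second amplicon failed
    print(covered_fraction(targets, designed))   # 0.9
    print(covered_fraction(targets, amplified))  # 0.8
```

In the toy example, a single failed amplicon lowers coverage from 90 % by design to 80 % in practice. This is the same kind of gap the text reports between 92.1 % and 90.7 %.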

Conclusions
In conclusion, we have shown that NGS-based methods can be used to detect FH in high-risk individuals, delivering a definitive diagnosis of FH in 7.6 % of subjects in a population who did not meet the defined clinical criteria for FH. We also discussed the possible roles of low-penetrance variants in the etiology of high LDL-C and the problems associated with interpreting NGS results. Cascade testing of family members is needed for subjects carrying variants of uncertain significance. Implementing the NGS-based approach may increase the proportion of diagnosed cases of FH, a currently underdiagnosed disease, and, even more importantly, identify individuals with milder forms of FH and elevated LDL-C.

Additional file
Additional file 1: Table S1. All variants found in study group.

Competing interests
The authors declare that they have no competing interests.
Authors' contributions
IR performed the molecular genetic studies and NGS, analyzed and interpreted the data, and drafted the manuscript. GL facilitated patient involvement, diagnosed the patients, and drafted the manuscript. RB carried out the NGS. DS carried out the NGS and analyzed the data. IS analyzed the data. DF participated in the study design and coordination, and helped to draft the manuscript. IE carried out the NGS. KV facilitated patient involvement. GO facilitated patient involvement. AE participated in the study design and coordination. JK conceived the study, participated in its design and coordination, and helped to draft the manuscript. All the authors have read and approved the final manuscript.