Through the DNA1 Advanced Alpha-1 Screening™ program, we have begun to encounter large numbers of novel sequence variants of the SERPINA1 gene, as evidenced by the data presented here. The present study supports several earlier investigations that uncovered previously uncharacterized and potentially pathogenic sequence variants of SERPINA1 [7, 9, 12, 19, 21]. A growing body of evidence suggests that novel sequence variants may be more clinically impactful than previously thought, with some reported to be associated with early-onset COPD [9].
Using NGS, we identified 21 rare/novel sequence variants of the SERPINA1 gene in patients suspected of having AATD. Most of the variants (n = 16) were SNVs. In addition, two base pair changes resulting in stop codon insertions, one base pair deletion, and two splice variants were discovered. All of the SNVs were previously recorded in the National Center for Biotechnology Information’s database of single nucleotide polymorphisms (dbSNP) and/or in the literature [19, 20, 22, 23, 24, 25, 26] (Table 1). The I50N variant (PiTijarafe) was previously confirmed as pathogenic in an in vitro cell model, and was associated with similar AAT expression to the Z variant [26]. Nonetheless, to the best of our knowledge, this is the first study to describe seven variants (E204K, P289S, Q40R, M221T, K174E, I9N [includes precursor] and P28L) alongside additional patient data. However, despite the availability of other data such as AAT levels, determining whether these variants are clinically relevant is challenging. We therefore sought to evaluate the utility of computational modeling to provide supporting evidence, in addition to observed AAT serum levels, of the pathogenicity of rare SNVs. We note that computational methods predict the effects of missense variants on either protein function (SVM and machine learning approaches) or the inherent stability of the tertiary/quaternary structure of a protein (FoldX). However, these predictions may not always correspond with clinical parameters, such as secreted protein serum levels, or the degree of pathogenicity in a particular organ.
The majority of the sequence variants identified in our cohort were predicted to be deleterious by computational methods. Only two variants were predicted to be probably neutral by all three computational techniques. Of the rare variants previously reported in the dbSNP only (E204K, P289S, Q40R, M221T, K174E, I9N [includes precursor] and P28L), P289S, M221T, and P28L were predicted to be probably deleterious and were accompanied by low AAT levels. In particular, the P289S variant was found in a 61-year-old patient with advanced emphysema, supporting the pathogenicity of this variant. The remaining variants were predicted to be neutral or possibly neutral, were accompanied by normal or low-normal AAT levels (although no AAT level was reported with the Q40R variant), and are less likely to be clinically relevant. Although there is some evidence of a relationship between AAT variants and cerebral aneurysms [27], we do not have sufficient evidence to conclude a causal relationship between the clinical presentation in patient CA97 and the E204K variant. For the rare variants predicted to be probably deleterious or possibly deleterious, in line with previous reports, we observed that the majority cluster around functional domains of AAT [20]. The mechanism of pathogenicity for most of these sequence variants (I50N, P289S, M385T, M221T, D341V, V210E, P369H, V333M and A142D) is likely to be via disruption of the tightly packed hydrophobic core of the AAT protein, and some may in turn disrupt the adjacent reactive center loop (RCL; Fig. 3) that inhibits proteases. One possible mechanism is that substantial changes to the core of the protein could result in misfolding of the protein within hepatocytes, such that only small amounts of AAT would be released, resulting in reduced levels of AAT in the peripheral circulation.
An alternative mechanism of pathogenicity might include missense changes that do not affect AAT folding and result in normal levels detected in serum, but have a deleterious effect on conformational changes required for sheet opening or protein-protein interactions necessary for inhibition of neutrophil elastase.
As expected, very low blood levels of AAT were found in heterozygotes for known deficiency alleles and new mutations. Two patients (12230 and 15230) in this study had very low AAT levels, around the range associated with PI*ZZ individuals (20–45 mg/dL) [1], and carried novel pathogenic variants in combination with the Z allele. Patients such as these would be strong candidates for AAT therapy if they presented with airflow obstruction and significant emphysema [28]. There are more than 6 million individuals in the United States alone with the PI*MZ genotype [5]. As shown by this study, it is possible that numerous other patients may be undiagnosed compound heterozygotes with rare/novel sequence variants not detectable by IEF or targeted genotyping. The concept of cumulative deleterious effects in compound heterozygotes has previously been described for the PI*FZ genotype [29]. The F allele is associated with normal AAT levels but reduced AAT functionality, while low circulating levels are observed in Z patients [29]. All AAT secreted by PI*FF homozygotes has reduced functionality, and these individuals have been shown to be at increased risk of lung damage caused by uninhibited elastase [29]. In PI*FZ heterozygotes, functionality and circulating levels are both reduced, resulting in an increased risk of emphysema compared with PI*FF patients [29].
Most novel sequence variants within our cohort were heterozygous with normal variants; it is therefore difficult to fully assess the impact of these variants on serum AAT levels and risk of emphysema. For known variants, the disease risk is well characterized. For example, individuals with the PI*MZ genotype have a greater degree of airflow obstruction than PI*MM individuals with comparable smoke exposure, and ever-smoking PI*MZ individuals have an increased risk of developing COPD [30]. However, the longitudinal disease risk associated with rare alleles is unknown, and AAT levels, although indicative of severity, are not conclusive. As the majority of these rare/novel variants will probably have different mechanisms of pathogenicity, it is possible that the disease risk differs from that of common heterozygotes and is specific to each variant. Further biochemical and clinical characterization is needed to fully understand how these sequence variants contribute to lung disease.
AATD is usually associated with single amino acid substitutions/deletions leading to subtle structural changes to the AAT protein; however, this study also identified splice variants, stop codons, and large deletions in SERPINA1. The potential contribution of these sequence variants to AATD should not be underestimated, especially when occurring in combination with damaging structural mutations. For example, in patient 6326, insertion of a stop codon at position 156 in combination with the Z mutation resulted in a severe reduction in antigenic AAT levels (2 mg/dL). This effect was not apparent in this patient’s sibling (patient 6376), whose AAT level was 98 mg/dL. Patient 6376 is heterozygous for the above-mentioned stop codon and the PI*I (R39C) allele; the PI*I mutation gives rise to a misfolded AAT protein, which is present in peripheral blood at near-normal concentrations [31]. This further demonstrates that rare and novel sequence variants can become more clinically relevant in combination with common deficiency alleles.
For patients with rare/novel mutations, apart from instances where the variants are deletions or null variants, it can be difficult to determine the impact of sequence variants and whether treatment with exogenous AAT is necessary. This study has demonstrated that computational analyses may be useful in understanding the potential impact of novel mutations. The three predictive computational methods presented were generally in agreement and in most cases related to the observed AAT levels. In particular, we found that the enhanced structural information that contributes to the SVM predictions may confer a greater sensitivity to deleterious variants, making it well suited to clinical genetics applications. The benchmarking analysis provides a strong validation for the balanced accuracy of the SVM predictions and supports its use in predicting the effects of the novel variants described in the current work. In addition, there was good agreement between results of the present analysis and previous studies [19, 20] (Table 2). One exception to the general agreement between this and previous studies may be P28L, with other computational measures suggesting that it is of intermediate pathogenicity. However, it is notable that the number of previously reported deleterious scores generally mirrors that reported in the present study across the categories of probably deleterious, possibly deleterious, possibly neutral, and probably neutral used here. In particular, in the probably neutral category, no deleterious scores are reported from this analysis or previous reports.
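To illustrate how agreement between three predictors could be summarized, the following is a minimal sketch only. The vote-counting rule, the mapping from vote count to category, and the method names are assumptions made for illustration; they are not the scoring scheme used in this study, and the example calls are placeholders rather than study data.

```python
# Hypothetical mapping from the number of "deleterious" calls (out of three
# methods) to a category label. This rule is an illustrative assumption.
CATEGORY_BY_VOTES = {
    3: "probably deleterious",
    2: "possibly deleterious",
    1: "possibly neutral",
    0: "probably neutral",
}


def consensus_category(calls):
    """Summarize per-method calls as one category.

    calls: dict mapping a method name to True if that method
    calls the variant deleterious, False otherwise.
    """
    if len(calls) != 3:
        raise ValueError("expected calls from exactly three methods")
    votes = sum(1 for is_deleterious in calls.values() if is_deleterious)
    return CATEGORY_BY_VOTES[votes]


# Placeholder calls for a made-up variant (not data from this study).
example_calls = {"method_a": True, "method_b": True, "method_c": False}
print(consensus_category(example_calls))
```

A simple vote count of this kind treats all methods as equally reliable; a weighted scheme would be needed to reflect differences in benchmarked accuracy between methods.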
Some important limitations of this study should be mentioned. This observational study was not controlled, i.e., there were no formal inclusion and exclusion criteria and no control group, and data were collected from a small (N = 23) patient population. In addition, genetic and non-genetic factors not related to the AAT sequence variants reported here may have contributed to the development of COPD. However, these factors are beyond the scope of the current report. Furthermore, computational modeling of missense variants only predicts whether a substitution is deleterious to protein function or stability. We do not know the exact mechanisms by which these substitutions lead to either reduced AAT levels or weakened elastase-inhibiting activity. It should also be noted that many different modeling software packages are available, and each may produce different results for a particular mutation, as demonstrated by Giacopuzzi et al. (Table 2). It was outside the scope of the present study to assess a wide range of modeling techniques, as a further aim of the study was to relate the computational scores to clinical parameters. However, Giacopuzzi et al. raise an important point, in that no individual computational method is infallible, and in an ideal situation, more than one technique should be consulted in the clinical decision-making process. Moreover, computational predictions may be inconsistent with findings of experimental characterization; therefore, ultimately, detailed biochemical functional analysis of the protein is required to validate the findings of computational analyses. Finally, clinical information on patient presentation is required in order to obtain a full picture of the patient’s individual disease risk.
Despite the above limitations, this study demonstrates that there are numerous potentially pathogenic novel variants beyond those commonly associated with AATD. Due to the progressive and irreversible destruction of lung tissue seen in severe AATD, early and accurate diagnosis is crucial to prevent further loss of lung tissue. Data from the RAPID/RAPID Extension trials have demonstrated that while treatment with AAT can slow the loss of lung tissue, tissue lost prior to commencing treatment cannot be regained [32, 33]. This is compounded by the fact that patients often experience long delays before receiving an accurate diagnosis [34], partly due to a lack of specialized testing. Early diagnosis also enables patients to implement lifestyle changes such as smoking cessation and avoidance of passive smoke. However, identifying rare/novel variants can be difficult, and this task may be impossible by traditional methods such as protein phenotyping via IEF [10].
The increasing availability of commercial DNA testing is helping to improve diagnosis of patients with AATD and rare genotypes [35]. However, many current approaches do not incorporate sequencing, and are unable to detect potentially pathogenic rare/novel variants that may lead to development of AATD. The need for faster screening and diagnosis of AATD has led to the development of the DNA1 Advanced Alpha-1 Screening™ program. DNA1 testing incorporates AAT levels, C-reactive protein serum levels, targeted genotyping (including the F and I alleles), and IEF, and reflexes to NGS when these methods prove insufficient. Our results support the proposal by Graham et al., who recommended that individuals with low serum levels and no resolution in targeted tests should be subjected to full-gene sequencing [12].
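The reflex-to-sequencing logic described above can be sketched as a simple decision rule. This is a hedged illustration only: the `ScreeningResult` type, its field names, and the cutoff passed in the example are hypothetical, and the sketch is not the DNA1 program's actual algorithm. The cutoff is deliberately a required argument because no threshold is specified here.

```python
from dataclasses import dataclass


@dataclass
class ScreeningResult:
    """Hypothetical container for first-line test results."""
    aat_mg_dl: float            # antigenic AAT serum level in mg/dL
    resolved_by_targeted: bool  # True if targeted genotyping/IEF explain the level


def should_reflex_to_ngs(result, low_aat_cutoff_mg_dl):
    """Return True when AAT is low and targeted tests did not resolve the case.

    low_aat_cutoff_mg_dl is supplied by the caller; this sketch does not
    assume any particular clinical threshold.
    """
    is_low = result.aat_mg_dl < low_aat_cutoff_mg_dl
    return is_low and not result.resolved_by_targeted


# Placeholder values for illustration only (not patient data or a
# recommended cutoff).
unresolved_low = ScreeningResult(aat_mg_dl=50.0, resolved_by_targeted=False)
print(should_reflex_to_ngs(unresolved_low, low_aat_cutoff_mg_dl=100.0))
```

In practice, the interpretation of a low AAT level would also need to account for C-reactive protein, which this sketch omits for brevity.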