In this study, we systematically evaluated the clinical utility of five SNPs identified in recent genome-wide association (GWA) studies and large cohort studies of lung cancer. Using data from a large case–control study of 5,068 participants, we found that four of these variants (rs2736100, rs402710, rs4488809, and rs4083914), previously identified in other populations, were also associated with lung cancer risk in a Chinese population. In addition, a weighted genetic risk score (wGRS) that accounts for the adjusted effect size of each SNP was a better predictor than a cGRS and was more strongly associated with lung cancer risk than any single SNP alone. Although the wGRS had only moderate predictive ability on its own, combining it with smoking status in a logistic regression model gave better discrimination between lung cancer cases and cancer-free controls (area under the ROC curve [AUC], 0.639).
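To make the difference between the two scores concrete, the sketch below computes both for two hypothetical individuals. It assumes that the cGRS is the unweighted count of risk alleles and that each SNP in the wGRS is weighted by the natural logarithm of its per-allele odds ratio, a common convention; the SNP labels and odds ratios are illustrative placeholders, not estimates from this study.

```python
import math

# Illustrative per-allele odds ratios for three hypothetical SNPs.
# These are placeholders, NOT estimates from this study.
PER_ALLELE_OR = {"snp_a": 1.21, "snp_b": 1.15, "snp_c": 1.10}


def cgrs(genotypes):
    """Count GRS (assumed definition): total number of risk alleles carried."""
    return sum(genotypes[snp] for snp in PER_ALLELE_OR)


def wgrs(genotypes):
    """Weighted GRS (assumed definition): risk-allele counts weighted by ln(OR)."""
    return sum(math.log(odds_ratio) * genotypes[snp]
               for snp, odds_ratio in PER_ALLELE_OR.items())


# Genotypes are coded as the number of risk alleles (0, 1 or 2) at each SNP.
person_x = {"snp_a": 2, "snp_b": 0, "snp_c": 0}  # two alleles at the larger-effect SNP
person_y = {"snp_a": 0, "snp_b": 0, "snp_c": 2}  # two alleles at the smaller-effect SNP

print(cgrs(person_x), cgrs(person_y))                      # identical counts
print(round(wgrs(person_x), 3), round(wgrs(person_y), 3))  # different weighted scores
```

Both hypothetical individuals carry two risk alleles and therefore share the same cGRS, but the wGRS ranks the carrier of the larger-effect variant higher; this is the property that allows a weighted score to discriminate better than a simple count.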
Several lung cancer risk assessment models have been proposed previously [12–15], but most rely on traditional risk factors such as family history of lung cancer, smoking status, environmental exposure, age and gender. In contrast, genetic scores derived from inherited variants have the advantage of remaining stable throughout an individual's lifetime.
Previous studies have indicated that inherited genetic variants may account for an important fraction of lung cancer risk [18, 19]. Recent GWA studies of lung cancer in populations of European ancestry identified three susceptibility loci: 5p15 (TERT–CLPTM1L), 15q25 (CHRNA3–5) and 6p21 (BAT3–MSH5) [4–9]. McKay et al. reported two independent markers of lung cancer risk in the 5p15 region, rs2736100 (TERT) and rs402710 (CLPTM1L), and the association between rs2736100 and lung cancer has also been replicated in Asian populations [20, 21]. Of the five SNPs evaluated in this study, rs2736100 showed a strong signal, consistent with these previous reports.
The 15q25 region, which encodes nicotinic acetylcholine receptor subunits, has been linked to lung cancer risk [6–8]. We evaluated rs1051730 from this region, but it showed no association with disease risk in the present study. A plausible explanation is that the minor allele frequency of rs1051730 in the Chinese Han population (MAF, 0.02) is too low to replicate the effects observed in European populations. The reported risk SNPs at 6p21 (rs3117582 and rs3131379) are not polymorphic in the Chinese Han population and were therefore excluded from this study. Finally, rs4488809 and rs4083914, previously identified by GWA and large cohort studies, were also significantly associated with lung cancer risk in this study [23, 24].
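A rough calculation illustrates why a MAF of 0.02 limits statistical power. The sketch below assumes Hardy–Weinberg equilibrium and, as a simplification, applies the reported MAF to all 5,068 participants (cases and controls may in fact differ); it then estimates how many participants would carry the minor allele at all.

```python
def hwe_genotype_freqs(maf):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium (assumed)."""
    q = maf        # minor-allele frequency
    p = 1.0 - q    # major-allele frequency
    return {"major_hom": p * p, "het": 2.0 * p * q, "minor_hom": q * q}


N_PARTICIPANTS = 5068  # cases plus controls, as reported in the text
MAF = 0.02             # reported MAF of rs1051730 in the Chinese Han population

freqs = hwe_genotype_freqs(MAF)
expected_carriers = N_PARTICIPANTS * (freqs["het"] + freqs["minor_hom"])
expected_minor_hom = N_PARTICIPANTS * freqs["minor_hom"]

print(f"carriers ~ {expected_carriers:.0f}, "
      f"minor-allele homozygotes ~ {expected_minor_hom:.0f}")
```

Under these assumptions only about 201 participants would carry the minor allele and about 2 would be homozygous for it, leaving relatively few informative individuals from which to estimate a per-allele effect.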
Of the five SNPs evaluated, the strongest signal was found for rs4488809, with each risk allele associated with a 21% increase in lung cancer risk. The other three associated SNPs (rs2736100, rs402710, and rs4083914) conferred smaller increases in risk (<18% per risk allele). The estimated proportion of genetic variation explained by these four SNPs was 4.02%, of which 1.82% was attributable to rs4488809 and 1.33% to rs2736100. This suggests that, when considered individually, the susceptibility loci identified by GWA and large cohort studies in other populations confer only small to moderate risks in a Chinese population and are of limited use for lung cancer risk assessment.
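If the 21% figure is read as a per-allele odds ratio of 1.21 from a log-additive (multiplicative) model, which is an assumption about how the estimate was parameterized, then the implied odds ratio for a homozygous carrier is the square of the per-allele value. The minimal sketch below makes that arithmetic explicit.

```python
def genotype_odds_ratio(per_allele_or, n_risk_alleles):
    """OR relative to non-carriers under a log-additive model.

    Each additional risk allele multiplies the odds by the per-allele OR.
    """
    if n_risk_alleles not in (0, 1, 2):
        raise ValueError("a biallelic SNP carries 0, 1 or 2 risk alleles")
    return per_allele_or ** n_risk_alleles


for n in (0, 1, 2):
    print(n, round(genotype_odds_ratio(1.21, n), 4))
```

Under this model a homozygous carrier of the rs4488809 risk allele would have an odds ratio of about 1.46 relative to non-carriers, which remains a modest effect for individual risk prediction.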
To address this limitation, a genetic risk score combining multiple loci may improve the identification of individuals at high risk of developing lung cancer. Our results showed that although the wGRS was strongly associated with lung cancer susceptibility, a model including the wGRS alone did not provide better predictive capacity than a model including traditional factors (c statistic for wGRS alone, 0.551). Smoking history was also associated with lung cancer risk in this study, in agreement with previous reports [12, 25]. Combining the wGRS with smoking status yielded better predictive ability (c statistic, 0.639), and the c statistic decreased by 0.020 when the wGRS was removed from this full model. This indicates that genetic risk factors can improve the discriminatory ability of a traditional assessment model, although the improvement was modest.
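The c statistic used here is equivalent to the ROC AUC: the probability that a randomly selected case receives a higher predicted score than a randomly selected control. The minimal sketch below computes it by pairwise comparison (tied pairs count one half) and shows how a drop in c statistic after removing a predictor is obtained; the predicted risks are invented toy values for six hypothetical participants, not study data.

```python
def c_statistic(case_scores, control_scores):
    """Concordance (c) statistic, equal to the ROC AUC.

    Fraction of case-control pairs in which the case has the higher score;
    tied pairs contribute one half.
    """
    concordant = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                concordant += 1.0
            elif s_case == s_ctrl:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Toy predicted risks for 3 hypothetical cases and 3 hypothetical controls.
full_cases, full_controls = [0.9, 0.7, 0.4], [0.8, 0.3, 0.2]        # e.g. smoking + genetic score
reduced_cases, reduced_controls = [0.6, 0.5, 0.4], [0.6, 0.3, 0.5]  # e.g. smoking only

c_full = c_statistic(full_cases, full_controls)
c_reduced = c_statistic(reduced_cases, reduced_controls)
print(f"full {c_full:.3f}, reduced {c_reduced:.3f}, drop {c_full - c_reduced:.3f}")
```

In this toy example the c statistic falls from 7/9 (about 0.78) to 5/9 (about 0.56) when the hypothetical genetic component is removed; the corresponding drop in the study was 0.020.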
This study has several limitations. First, the susceptibility loci identified and replicated in GWA and large cohort studies are likely associated with lung cancer risk through strong linkage disequilibrium with causal variants, and typically confer only moderate effects. Many additional susceptibility loci for lung cancer remain to be discovered, and rare, highly penetrant variants may explain the remaining heritability. Next-generation sequencing technologies may facilitate future research on such variants. Several newly identified SNPs have also been reported recently [28–30], and incorporating them may improve the classification of lung cancer risk. Second, because only a limited set of traditional risk factors was available, the full predictive model provided only a moderate level of classification accuracy (c statistic, 0.639), which is inadequate for risk prediction. Its discriminatory ability might be improved by including additional factors such as history of bronchitis, emphysema or pneumonia, asbestos exposure, and family history of lung cancer. Third, although our ROC AUC estimates were corrected for overfitting by bootstrapping and internal validation was conducted, the model lacked external validation. Finally, because this was a retrospective study, the results need to be validated in a large-scale prospective study.
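Bootstrap correction for overfitting can take several forms, and the text does not specify which was used. The sketch below shows one common variant, the optimism bootstrap: the model is refitted in each bootstrap resample, the difference between its c statistic in the resample and in the original data estimates the optimism, and the mean optimism is subtracted from the apparent c statistic. It relies on a plain gradient-ascent logistic regression and simulated data (a binary smoking indicator and a standard-normal genetic score) purely for illustration; it is not the authors' procedure, and a dedicated statistical package would normally be used.

```python
import math
import random


def auc_from_labels(scores, labels):
    """ROC AUC (c statistic) by pairwise comparison; ties count one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for a in cases:
        for b in controls:
            total += 1.0 if a > b else (0.5 if a == b else 0.0)
    return total / (len(cases) * len(controls))


def fit_logistic(X, y, lr=0.5, n_iter=200):
    """Logistic regression by batch gradient ascent on the mean log-likelihood.

    Returns [intercept, slope_1, ..., slope_k]; a stand-in for a proper solver.
    """
    k = len(X[0])
    w = [0.0] * (k + 1)
    n = len(y)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = yi - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def linear_predictor(w, X):
    """Linear predictor; the AUC depends only on ranks, so no sigmoid is needed."""
    return [w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)) for xi in X]


def optimism_corrected_auc(X, y, n_boot=30, seed=1):
    """Return (apparent AUC, mean bootstrap optimism, corrected AUC)."""
    rng = random.Random(seed)
    apparent = auc_from_labels(linear_predictor(fit_logistic(X, y), X), y)
    n = len(y)
    optimisms = []
    while len(optimisms) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:  # skip resamples lacking cases or controls
            continue
        wb = fit_logistic(Xb, yb)
        auc_boot = auc_from_labels(linear_predictor(wb, Xb), yb)  # evaluated on its own resample
        auc_orig = auc_from_labels(linear_predictor(wb, X), y)    # evaluated on the original data
        optimisms.append(auc_boot - auc_orig)
    optimism = sum(optimisms) / n_boot
    return apparent, optimism, apparent - optimism


# Simulated data: a binary smoking indicator and a standard-normal genetic score.
sim = random.Random(42)
X, y = [], []
for _ in range(120):
    smoker = 1.0 if sim.random() < 0.5 else 0.0
    grs = sim.gauss(0.0, 1.0)
    risk = 1.0 / (1.0 + math.exp(-(-0.5 + 0.8 * smoker + 0.3 * grs)))
    X.append([smoker, grs])
    y.append(1 if sim.random() < risk else 0)

apparent, optimism, corrected = optimism_corrected_auc(X, y)
print(f"apparent {apparent:.3f}, optimism {optimism:.3f}, corrected {corrected:.3f}")
```

Because every resample is drawn from the same study sample, this correction addresses overfitting only; it does not replace validation in an independent population.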