NGS based expanded pilot carrier screening study in North Indian population reveals unexpected results.

Background: To determine the carrier frequency and pathogenic variants of common genetic disorders in the north Indian population by using next generation sequencing (NGS). Methods: After pre-test counselling, 200 unrelated individuals (including 88 couples) were screened for pathogenic variants in 88 genes by NGS technology. The variants were classified as per American College of Medical Genetics criteria. Pathogenic and likely pathogenic variants were subjected to thorough literature-based curation in addition to the regular filters. Variants of unknown significance were not reported. Individuals were counselled explaining the implications of the results, and cascade screening was advised when necessary. Results: Of the 200 participants, 52 (26%) were found to be carriers of one or more disorders. Twelve individuals were identified as carriers for congenital deafness, giving a carrier frequency of one in 17 for one of the four genes tested (SLC26A4, GJB2, TMPRSS3 and TMC1, in decreasing order). Nine individuals were observed to be carriers for cystic fibrosis, with a frequency of one in 22. Three individuals were detected to be carriers for Pompe disease (frequency one in 67). None of the 88 couples screened were found to be carriers for the same disorder. The pathogenic variants observed in many disorders (such as deafness, cystic fibrosis, Pompe disease, Canavan disease, primary hyperoxaluria, junctional epidermolysis bullosa, galactosemia, medium chain acyl CoA deficiency etc.) were different from those commonly observed in the West.
[...] deferens), variable clinical severity and lack of availability of sweat testing, and absence of new born screening.

Most countries carry out prevention by screening for infections during pregnancy by serology, for chromosomal disorders by biochemical tests and for structural abnormalities by ultrasonography. However, population prevalence studies have shown that the number of single gene disorders is almost equal to or exceeds that of chromosomal disorders and congenital malformations combined [3,7,8,9]. The cost of prevention through screening for single gene disorders is much less than the cost of treatment. For example, in Cyprus, where thalassemia is common, it was shown that the cost of 8 weeks of prevention was equivalent to the cost of one week of treatment of the thalassemia population [10]. The Ministry of Health of Israel reported that the lifetime health care cost for persons with thalassemia vs the cost of a national screening program was in a ratio of 4.22 to 1 [11]. The need to reduce the prevalence of genetic disorders in developing countries is greater now, as the new treatments for genetic disorders are exorbitantly expensive and out of reach for these families [12,13]. Moreover, most of this expenditure has to be covered out of pocket by the patients or parents themselves.
Screening for carriers of single gene disorders such as cystic fibrosis and Tay-Sachs disease has also been shown to be cost effective [14,15]. Beauchamp et al examined the clinical impact of a 176-condition expanded carrier screening and demonstrated its cost-effectiveness in reducing the burden of Mendelian disease as compared with minimal screening [16]. Zhang et al considered the impact and cost-effectiveness of offering preventive population genomic screening for BRCA1/2, MLH1/MSH2 genes, cystic fibrosis, spinal muscular atrophy and fragile X syndrome to all young adults (18-25 years) in a single-payer health-care system in Australia, and reported that it would be highly cost-effective, but that ethical issues need to be considered [17].
The basic objective of carrier screening is to identify carriers and offer them reproductive options, ranging from choosing to marry someone who is not a carrier of the same disease (premarital screening) to prenatal diagnosis. In the event that both the husband and wife are carriers of the same disorder, preimplantation genetic diagnosis (after in vitro fertilisation) or prenatal diagnosis (during early stages of a naturally conceived pregnancy) can be carried out [18]. Screening only those families who have a previously affected child is very inefficient, as the majority of affected children are born to couples with no previous family history. Similarly, screening only those who have an a priori increased risk of being a carrier based on their personal and family history, those who are consanguineously married, or couples who are opting for sperm or egg donation (Assisted Reproduction Technologies) would still be an inadequate strategy to identify the carriers of genetic disorders. It is best to screen all couples for the genetic disorders common in that population.
World-wide, carrier screening has evolved from ancestry-based (e.g. in Jewish populations) to pan-ethnic testing, and from single gene disorders, such as cystic fibrosis or α/β-thalassemia, by Sanger sequencing or hematologic techniques, to multiple disorders through Next Generation Sequencing (NGS) [19]. In the West, carrier screening by NGS was initially limited to targeted genotyping because most of the pathogenic variants in the Caucasian population had been characterized and the results were easier to interpret as the subjects were screened for known variants [19]. This approach is not suitable in resource-poor countries as most of the pathogenic variants in different genes have not been characterized. However, screening later shifted to NGS of all coding exons of genes to identify carriers more efficiently. This is more suited to India and other resource-poor countries, identifying only the variants that are pathogenic or likely pathogenic and ignoring variants of uncertain significance.
Carrier screening studies for single gene disorders in India, as a service, have chiefly been carried out for β-thalassemia, based on hematologic techniques [20]. Isolated studies for p.Phe508del in cystic fibrosis [21], p.Trp24Ter in GJB2 related hearing loss [22] and, more recently, SMN1 deletion in SMA (spinal muscular atrophy) [23] have been performed as research studies. The objectives of the present study were to determine the carrier frequency of variants in 88 genes expected to be common in Asian Indians and to identify the pathogenic or likely pathogenic variants.

Subjects
This study was carried out at Sir Ganga Ram Hospital, a tertiary care multispecialty facility, over a period of 22 months from October 2016 through June 2018. Institutional ethical clearance was obtained prior to commencing the study (Ethical clearance number EC/08/16/1066). The molecular analysis was performed at Medgenome Laboratories Ltd, Bangalore. Two hundred unrelated individuals (n=101 male, n=99 female) aged 20-60 years, visiting the Medical Genetics and Obstetrics and/or Gynaecology out-patient clinic for various reasons unrelated to genetic disorders, were enrolled after pre-test counselling. Individuals known to be carriers of any genetic disease, or with a history of a chronic medical disorder or familial genetic disorder, were excluded from the study. The relevant history and clinical data of each individual were recorded on standard case record proformas (Supplementary file 1).

Sample size
A sample size of 200 unrelated individuals was planned for enrolment in this pilot study.

Statistical analysis
Descriptive analysis was done, and the outcome was reported as the proportion of carriers among the total individuals tested (n/200). Confidence intervals were calculated by the Wilson method [24].
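As an illustration of the interval named above, the following is a minimal sketch of the standard Wilson score interval for a binomial proportion, applied to the overall carrier count given in the Results (52 of 200). The function name `wilson_interval` and the use of a two-sided 95% level (z of about 1.96) are assumptions made for this example; the text does not state the confidence level or the software used.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion.

    The default z corresponds to a two-sided 95% level (an assumption here).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Overall carrier proportion from the Results: 52 carriers among 200 tested.
low, high = wilson_interval(52, 200)
print(f"carrier proportion {52 / 200:.3f}, interval {low:.3f} to {high:.3f}")
```

Unlike the simple normal approximation, the Wilson interval stays within 0 and 1 for small counts, which matters for the rarer disorders in a cohort of this size.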

Selection of gene panel
The selection of genes followed the Wilson and Jungner criteria [25]. Genes selected were those which cause high impact disorders that have a significant effect on lifespan or reduce quality of life; or genes with moderate impact that do not reduce lifespan but impact quality of life; or disorders with significant socioeconomic burden for which couples would consider prenatal diagnosis. The limited literature available on the prevalence of various monogenic disorders in India was reviewed. The study by Ankala et al [26] summarised the prevalence in India of galactosemia as 1:10,300, congenital adrenal hyperplasia (CAH) as 1:2,600, phenylketonuria as 1:18,300, and amino acid disorders as 1:3,600. The prevalence of childhood hearing loss in India was estimated as 1:500 in one study in 2009 [27]. The true prevalence of cystic fibrosis in India is unknown but was suspected to be high in a recent review by Mandal et al [28], based on the increased citations in recent years. Lazarin et al [29] also observed a carrier frequency of 1:40 for cystic fibrosis in the South Asian population, much higher than expected from data in India. The genetic register of patients evaluated at our centre was also analysed. Eighty-eight genes [72 autosomal recessive (AR), 7 X-linked (XL), 9 autosomal dominant (AD/AR)] were selected for testing (Supplementary file 2). A small number of disorders was targeted in order to develop a short but efficient panel that could be offered at a low cost. Two recessive disorders (cystic megalencephaly and calpainopathy) were included as they are common in an ethnic group (Agrawals) in North India [26]. Familial hypercholesterolemia, though an autosomal dominant disorder, was studied as it is life threatening and early treatment can save lives. The study was planned in accordance with the American College of Medical Genetics (ACMG) position statement on prenatal/preconception expanded carrier screening [30].
Some common disorders were excluded either because they are not detectable by NGS technology with accuracy or because the disorder can be screened easily by haematological tests. These included β-thalassemia, deletions in SMN1 (survival motor neuron 1) causing SMA (spinal muscular atrophy), FXS (fragile X syndrome), and DMD (Duchenne muscular dystrophy). Deletion study of CAH was excluded, although sequencing of the gene was performed. Large copy number variations in any of the 88 genes were also not included in this sequence-based study.

Pre and Post-test counselling
Prior to the testing, all individuals were counselled about the type of disorders being tested, the implications of being a carrier, the benefits of enrolment of the partner and the voluntary nature of testing. Relevant personal, family and ethnic data were recorded. Subjects were clinically examined to rule out any chronic disorder. In post-test counselling, carrier status and its implications, cascade screening of family members and the residual risks remaining after the results (unscreened disorders, chromosomal disorders and indels) were explained to the individuals. The study methodology is depicted in Figure 1.

Molecular and Bioinformatic analysis
DNA was extracted from blood using a Qiagen kit, and targeted genes were captured by a custom kit. The libraries were sequenced to a mean coverage of >80-100X on an Illumina sequencing platform. The sequences obtained were aligned to the human reference genome (GRCh37/hg19) using the BWA program [31,32] and analysed using Picard and GATK version 3.6 [33,34]. Gene annotation of the variants was performed using the VEP (Variant Effect Predictor) program against the Ensembl release 87 human gene model [35]. Clinically relevant pathogenic variants were annotated using published variants in the literature and a set of disease databases: ClinVar [36], OMIM (Online Mendelian Inheritance in Man) [37], GWAS catalogue (genome-wide association studies) [38], HGMD (Human Gene Mutation Database) [39] and SwissVar [40].
Validation of NGS results: All disease-associated variants were manually inspected using IGV (Integrative Genomics Viewer). All variants had a sequencing depth > 30, showed no strand bias, were of good mapping quality, and did not lie in highly repetitive regions. They were further validated using Sanger sequencing.

Results
Population demographics: Of the 200 individuals enrolled, 61.5% belonged to the 31-40 years age group (Table 1). [...] could not be classified as they were either unsure of their caste and origin or were born of an inter-caste marriage.
Carrier frequency: Of the 200 participants, 52 (26%) were found to be carriers of one or more disorders (Table 2). Congenital deafness was the most common disorder identified, with a carrier frequency of 1 in 17 for one of the four genes SLC26A4 (solute carrier family 26, member 4), GJB2 (gap junction beta 2 protein), TMPRSS3 (transmembrane protease, serine 3) and TMC1 (transmembrane channel like protein 1), in decreasing order. Cystic fibrosis was the second most commonly observed disorder, with a carrier frequency of 1 in 22. Three subjects were detected to be carriers for Pompe disease (frequency 1 in 67) (Table 2).
There was no couple where both husband and wife were carriers for the same disorder. No woman was found to be a carrier for the seven X-linked disorders included in the panel (Fabry disease, ornithine transcarbamylase deficiency, hemophilia A and B, Hunter syndrome, severe combined immunodeficiency (SCID) and adrenoleukodystrophy).
Of the 52 (26%) subjects found to be carriers, the majority were carriers for one disorder (n=47/200=23.5%) and five were carriers for two disorders (n=5/200=2.5%). No individual was found to be a carrier for three or more disorders.
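The "1 in N" frequencies and percentages quoted above follow directly from the carrier counts. The sketch below reproduces that arithmetic; the helper name `one_in_n` and the choice to round N to the nearest integer are assumptions made for illustration, and the counts (12, 9 and 3 carriers among 200) are those reported in this study.

```python
def one_in_n(carriers, total):
    """Express a carrier count as a '1 in N' frequency, with N rounded to the nearest integer."""
    if carriers <= 0:
        raise ValueError("at least one carrier is needed for a '1 in N' frequency")
    return round(total / carriers)


TOTAL = 200
carrier_counts = {
    "congenital deafness": 12,
    "cystic fibrosis": 9,
    "Pompe disease": 3,
}

for disorder, count in carrier_counts.items():
    print(f"{disorder}: {count}/{TOTAL} carriers, about 1 in {one_in_n(count, TOTAL)}")

# Carriers of one disorder and of two disorders, as percentages of the cohort.
single_pct = 100 * 47 / TOTAL
double_pct = 100 * 5 / TOTAL
print(f"one disorder: {single_pct:.1f}%, two disorders: {double_pct:.1f}%")
```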

Disease causing variants
The disease-causing variants were identified 57 times in 52 of the 200 subjects (Table 3). The number of distinct variants was 47, as some variants were identified more than once. The majority were of missense type (72.34%). Among the already reported variants, 29.5% (n=13/44) have been described in patients belonging to the Indian subcontinent (India, Pakistan, Bangladesh). The individual variants are listed in Tables 3 and 4 and discussed in more detail later. Three splice site variants were novel (not reported in the literature or locus-specific databases) and fulfilled ACMG criteria for pathogenicity (Table 4).
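The counts in this paragraph can be cross-checked against the carrier breakdown given in the Results (47 individuals carrying one disorder and 5 carrying two). The sketch below is only a consistency check of the reported figures; the variable names are illustrative, and the missense count of 34 is inferred from the stated 72.34% of 47 variants rather than quoted from the text.

```python
# Carrier breakdown reported in the Results.
single_carriers = 47   # carriers of one disorder
double_carriers = 5    # carriers of two disorders

carriers = single_carriers + double_carriers           # individuals with a finding
observations = single_carriers + 2 * double_carriers   # times a variant was identified

# Distinct variants and how many were novel.
distinct_variants = 47
novel_variants = 3
previously_reported = distinct_variants - novel_variants

# Share of previously reported variants described in the Indian subcontinent.
subcontinent_pct = 100 * 13 / previously_reported

# Missense count implied by the stated percentage (an inference, not a quoted figure).
missense_count = round(0.7234 * distinct_variants)
missense_pct = 100 * missense_count / distinct_variants

print(carriers, observations, previously_reported)
print(f"{subcontinent_pct:.1f}% subcontinent, {missense_pct:.2f}% missense")
```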

Discussion
The study was designed to determine the carrier frequency of single gene disorders other than β-thalassemia, for which a carrier frequency of about 3-4% has been shown in many studies in India [52]. Disorders such as spinal muscular atrophy (SMA), fragile X syndrome (FXS) and Duchenne muscular dystrophy (DMD), which are common in all populations including South Asians, were also excluded as these are difficult to detect with NGS [53]. Recently, we showed the carrier frequency of SMA in North India to be 2.25% [23]. However, the carrier frequency for other single gene recessive disorders is not known, and significant differences in prevalence and pathogenic variants have been seen in different populations [54].

CFTR (Cystic fibrosis transmembrane conductance regulator) pathogenic and likely pathogenic variants
There were nine disease-causing variants identified in the CFTR gene in this cohort. Of these, only one case had the common p.Phe508del pathogenic variant, i.e. 11% (n=1/9). Two pathogenic variants detected in the CFTR gene in this study have been observed before in our laboratory (p.Arg75Ter and p.Ser549Asn). The remaining six pathogenic variants have not been reported in Indians earlier (Table 3). The variants p.Ser549Asn, p.His199Tyr and p.Arg1070Gln have been described by multiple authors, and functional studies have been carried out classifying them as pathogenic as per ACMG criteria. The other four variants, p.Ile1366Phe, p.Cys491Phe, p.Phe1337Val and p.His620Leu, have been documented to be associated with disease but lack adequate functional studies. They meet the ACMG criteria for likely pathogenicity (Table 3). The CFTR c.3854C>T, p.Ala1285Val variant was identified in three individuals. Although it has been reported in the literature [55] in association with congenital bilateral absence of vas deferens (CBAVD), it is more likely to represent a common polymorphism because of its high frequency in NGS data from the Indian population (0.5% minor allele frequency in South Asians in gnomAD exomes). This variant was not included in the list of disease-associated variants.
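The polymorphism argument for p.Ala1285Val can be made concrete with a short calculation. The sketch below assumes Hardy-Weinberg proportions and assumes that each of the three individuals was heterozygous; neither assumption is stated in the text, and the function name `expected_carriers` is illustrative. It compares the number of heterozygotes expected in a cohort of 200 at the quoted 0.5% minor allele frequency with the three actually observed.

```python
def expected_carriers(allele_frequency, cohort_size):
    """Expected number of heterozygotes under Hardy-Weinberg proportions (2pq times cohort size)."""
    q = allele_frequency
    p = 1 - q
    return 2 * p * q * cohort_size


COHORT = 200
GNOMAD_SOUTH_ASIAN_MAF = 0.005   # 0.5% minor allele frequency quoted in the text

expected = expected_carriers(GNOMAD_SOUTH_ASIAN_MAF, COHORT)

# Three individuals carried the variant; assuming each is heterozygous,
# that is 3 variant alleles among 2 * 200 chromosomes.
observed = 3
observed_allele_frequency = observed / (2 * COHORT)

print(f"expected about {expected:.2f} heterozygotes, observed {observed}")
print(f"observed allele frequency {observed_allele_frequency:.4f}")
```

Under these assumptions about two heterozygotes would be expected by chance alone, so three observations are in line with a common population variant rather than a rare disease allele.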
Studies on the genetic profile of cystic fibrosis patients in India show high variability, and many rare and new variants have been observed, while only a few pathogenic variants (p.Arg1162Ter, p.Met1Thr, c.1161delC, p.Ser549Asp and c.1525-1G>A) are reported more than once [56,57,58]. This suggests a lack of founder or common mutations in the CFTR gene and thus emphasises the need for sequencing of all coding regions of the CFTR gene in suspected cases in the Indian population. In the present study, except for p.Phe508del, no other pathogenic variant was present in the ACMG panel for cystic fibrosis [59]. In view of the heterogeneity in pathogenic variants, Mandal et al. also suggested that a single panel of pathogenic variants cannot be used for diagnosis or carrier testing of CF in India [28]. Archibald et al. also observed that the pathogenic variants in cystic fibrosis vary according to ethnic origin [53]. Lim et al. reported, from the ExAC database, that the pathogenic variants in CFTR in non-Europeans are different from those in people of European descent. They noted that none of the current genetic screening panels or existing CFTR pathogenic variant databases cover a majority of deleterious variants in any geographical region outside of Europe [60].
Among the nine disease-causing variants identified in the CFTR gene in the present cohort, only one case had the common p.Phe508del pathogenic variant, i.e. 11% (n=1/9). Kapoor [...]
Biallelic variants in the GJB2 gene or deletion in the gene cause congenital nonprogressive mild to profound sensorineural hearing impairment. The pathogenic variants identified in GJB2 have been previously reported in Indian subjects. Ram Shankar et al studied the pathogenic variants in the GJB2 gene in Indian patients with deafness and found p.Trp24Ter to be the most common pathogenic variant in India [22]. In addition, they documented two other common pathogenic variants, p.Trp77Ter and IVS1+1G>A. These differ from the common pathogenic variants identified in the Western (c.35delG) [65], Japanese (c.235delC) and Korean (p.Val37Ile) populations [66,67].

SLC26A4 related hearing loss
Hearing loss due to SLC26A4 has been reported as the third most common cause of hearing loss in a study in a pan-ethnic population [68]. It occurs due to an enlarged vestibular aqueduct and temporal bone abnormalities, which can be appreciated on imaging. In addition to hearing loss, these individuals may have euthyroid goitre (Pendred syndrome). In this study, two of the four disease-causing variants reported have been previously described in individuals of Indian ethnic origin: p.Arg409Pro [69,70] and p.Ile490Leu [71]. Other variants found in our study include p.Gly334Val, which has been described chiefly in people of Mediterranean origin [72], and p.Phe335Leu, which is a common variant reported worldwide [73].
Carrier screening and prenatal diagnosis for a disorder like hearing loss, which impairs quality of life, can be perceived differently among families in different countries. Parental perceptions of congenital hearing loss in Indian culture, where resources are scarce, have been pointed out previously by Nahar et al. [74]. While some families are interested in using the information to help in the management, planning and emotional adjustment to the birth of a child with deafness, others opt for discontinuing an affected fetus, especially if financial resources are scarce.
GBA c.1448T>C, and c.866G>C, p.Gly289Ala
Biallelic variants in the GBA gene cause a deficiency of acid β-glucosidase, resulting in Gaucher disease, the most common lysosomal storage disorder in the world [75]. The variants p.Gly289Ala and p.Leu483Pro were observed in one individual in the present cohort. Ankleshwari et al. studied 33 Indian patients with Gaucher disease and identified p.Leu483Pro as the most common pathogenic variant, 60.6% (n=20/33). In addition, they reported p.Gly289Ala as a novel pathogenic variant in a patient with type I disease [76]. Homozygosity for the p.Leu483Pro variant is associated with neuronopathic involvement (type III), ranging from mild oculomotor apraxia to more severe involvement, as well as lethal cases of collodion skin baby phenotype [77,78]. The variant most commonly observed in the Western population (p.Asn370Ser) and associated with type I Gaucher disease is observed less commonly in India [77,79].
[...] [80]. Subsequently, this pathogenic variant has been reported in patients affected with infantile-onset Pompe disease in several studies [81]. This variant lies in exon 14 of the gene, reported to be a hot spot for this gene [81]. However, a study done on patients of Indian ethnicity reported no hot spots for this gene [82].
The ASPA gene encodes the aspartoacylase enzyme, deficiency of which results in Canavan disease. One individual was found to be a carrier for the p.Leu301Pro variant. This variant has been reported by our group previously in a patient of Indian ethnicity with classical Canavan disease and raised urine N-acetyl aspartate [86]. On the basis of the reported literature, this variant has been classified as likely pathogenic using ACMG criteria.
ACADM c.811G>A, p.Gly271Arg
Biallelic pathogenic variants in ACADM affect mitochondrial fatty acid β-oxidation due to deficiency of the enzyme medium-chain acyl-coenzyme A dehydrogenase. The p.Gly271Arg variant is a well reported pathogenic variant in the ACADM gene worldwide. It was observed in one individual in this study. The c.985A>G pathogenic variant commonly seen in the West, believed to be a founder pathogenic variant in Caucasians originating from an ancient Germanic tribe, was not observed in the present cohort [87].
Disorders like AR polycystic kidney disease, methylmalonic acidemia, galactosemia, Smith-Lemli-Opitz syndrome, oculocutaneous albinism type II, cystic megalencephalic leukoencephalopathy, phenylketonuria and junctional epidermolysis bullosa can be expected to be common in the Indian population, as at least two carriers of each were detected among the 200 individuals screened.
Other investigators and our group have identified a number of disorders with founder mutations among the Agarwal community [88,89]. Carriers for only two of these were identified in the current panel of genes: calpainopathy and [...] developed by ACMG for cystic fibrosis are not suitable and will miss many carriers.
This study highlights the importance of an Indian database in improving the classification of variants. It is creditable that many genetic centres are pooling their data to develop such a database. The high carrier frequency of cystic fibrosis, if substantiated in larger population studies, would be sufficient ground to initiate new-born screening in the Indian population. One major limitation of this study is the small sample size, and larger studies would be justified to serve as a valuable tool for reducing the burden of genetic disorders.
Availability of data and materials -The datasets generated and analysed during the current study are available in the European Variation Archive (EVA) repository, under the study browser at https://www.ebi.ac.uk/eva. The project number is PRJEB40310.

List Of Abbreviations
Competing interests -The authors declare that they have no competing interests.
Funding -The expense of NGS was funded by Medgenome Laboratories Pvt Ltd., Bangalore, India. Medgenome Laboratories Pvt Ltd. carried out the next generation sequencing and molecular analysis. The data was re-analysed at Sir Ganga Ram Hospital, New Delhi.
Authors' contributions -ICV, SBM, RDP and VLR conceptualized and designed the details of the study. KS enrolled the subjects and counselled them as a part of a thesis dissertation under the guidance of ICV, SBM, RDP, SK1, KG and IG.
VLR, SN and SS carried out the next generation sequencing and analysis of molecular data. SK2 re-analysed all the molecular data, formatted the files and uploaded them to a public database. RS and SK1 interpreted the variants and evaluated their occurrence in the in-house database. KS and SBM re-checked the individual annotated variant Excel sheets, compiled all the results, carried out the review of literature, and analysed the individual variants before post-test counselling of the individuals. KS, SBM and ICV drafted the manuscript and carried out the subsequent revisions. All authors read and approved the manuscript and agreed to be personally accountable for all the contributions in the manuscript.