Diagnosis of Noonan syndrome and related disorders using target next generation sequencing

Background Noonan syndrome is an autosomal dominant developmental disorder with high phenotypic variability, which shares clinical features with other rare conditions, including LEOPARD syndrome, cardiofaciocutaneous syndrome, Noonan-like syndrome with loose anagen hair, and Costello syndrome. This group of related disorders, the so-called RASopathies, is caused by germline mutations in distinct genes encoding components of the RAS-MAPK signalling pathway. Because of the high number of genes associated with these disorders, standard diagnostic testing requires expensive and time-consuming approaches based on Sanger sequencing. In this study we show how targeted Next Generation Sequencing (NGS) can enable accurate, faster and cost-effective diagnosis of RASopathies. Methods We used a validation set of 10 patients (6 positive controls previously characterized by Sanger sequencing and 4 negative controls) to assess the analytical sensitivity and specificity of targeted NGS. As a second step, a training set of 80 enrolled patients with a clinical suspicion of RASopathy was tested. Targeted NGS was successfully applied to 92% of the regions of interest, including exons of the following genes: PTPN11, SOS1, RAF1, BRAF, HRAS, KRAS, NRAS, SHOC2, MAP2K1, MAP2K2, CBL. Results All expected variants in patients belonging to the validation set were identified by targeted NGS, providing a detection rate of 100%. Furthermore, all the newly detected mutations in patients from the training set were confirmed by Sanger sequencing. False negative results were excluded by testing a randomly selected subset of the negative patients with Sanger sequencing.
Conclusion Here we show how molecular testing of RASopathies by targeted NGS could allow an early and accurate diagnosis for all enrolled patients, enabling a prompt diagnosis especially for those patients with mild, non-specific or atypical features, in whom detection of the causative mutation usually takes a long time with the standard routine. This approach strongly improved genetic counselling and clinical management.

So far, molecular characterization can be achieved in approximately 75-90% of affected individuals. Some distinct phenotypes have emerged in association with specific gene mutations.
At present, because of the high genetic heterogeneity of these disorders, which involve genes that together span about 30 kb of genomic DNA, the standard diagnostic testing protocol requires a multi-step approach based on Sanger sequencing. The selection of the genes to investigate at the first diagnostic level depends on the frequency of their association with the disorder and their relationship with a distinct phenotype. For this reason, accurate clinical evaluation and close interaction between clinical and molecular geneticists are mandatory for selecting the genes to be studied first. With this approach, the causative mutations can be identified in most cases. Some mutations cannot be identified at the first screening level, because some phenotypes may be related to mutations in different causative genes, because some clinical features associated with NS-related disorders may not be evident at younger ages, or because some extremely rare mutations are not routinely screened at first analysis. To detect these mutations, an additional screening level is required with a second panel of genes, which again should be guided by the clinical geneticist. In these latter cases, molecular diagnosis takes longer to identify the pathogenic mutation. Moreover, standard Sanger sequencing of multiple genes is expensive. For these reasons, genetically heterogeneous disorders demand innovative diagnostic protocols that can identify disease-causing mutations rapidly and routinely.
Here we report our experience with the use of targeted Next Generation Sequencing (NGS) for the diagnosis of RASopathies. Our study suggests that this protocol can be easily used as a standard diagnostic tool to identify disease-causing mutations, with a straightforward workflow from genomic DNA to the identification of genomic variants.

Subjects
Between June 2012 and June 2013, 80 patients (35 males and 45 females) with a clinical suspicion of any RASopathy were consecutively enrolled in this study. Mean age was 8 years (range 2 months to 16 years). All patients underwent a complete physical examination for major and minor anomalies by trained clinical geneticists (MCD, BD, RC). Two-dimensional Color-Doppler echocardiography, renal ultrasonography, and neurological/neuropsychiatric assessment for developmental delay or cognitive impairment were routinely performed. Clinical inclusion criteria were facial anomalies suggestive of RASopathies (presence of six or more features among hypertelorism, downslanting palpebral fissures, epicanthal folds, short broad nose, deeply grooved philtrum, high wide peaks of the vermilion, micrognathia, low-set and/or posteriorly angulated ears with thick helices, and low posterior hairline) [26], associated with at least one of the following clinical features: short stature, organ malformation (congenital heart defect or renal anomaly), developmental delay or cognitive deficit. All patients had normal standard chromosome analysis and array-CGH at a resolution of 75 kb.
A total of 10 DNA samples, including 6 positive controls and 4 negative controls previously characterized by standard Sanger sequencing, were used as a validation set for establishing the amplicon resequencing workflow and assessing the analytical sensitivity and specificity of targeted NGS. A second group of 80 DNA samples, extracted from patients manifesting a RASopathy phenotype, was used as the training set. Genomic DNA was extracted from circulating leukocytes according to standard procedures and quantified with a fluorescence-based method. Informed consent was obtained from the patients' parents. The study was approved by the institutional scientific board of Bambino Gesù Children Hospital and was conducted in accordance with the Helsinki Declaration.
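The clinical inclusion rule described above can be written as a simple check. The following is a minimal sketch, assuming the rule is read as six or more of the nine listed facial features plus at least one additional clinical feature; the feature labels and the function name are hypothetical and chosen only for illustration, they do not come from the study records.

```python
# Hypothetical labels for the nine facial features listed in the inclusion criteria.
FACIAL_FEATURES = {
    "hypertelorism",
    "downslanting_palpebral_fissures",
    "epicanthal_folds",
    "short_broad_nose",
    "deeply_grooved_philtrum",
    "high_wide_vermilion_peaks",
    "micrognathia",
    "low_set_or_posteriorly_angulated_ears",
    "low_posterior_hairline",
}

# Hypothetical labels for the additional clinical features.
ADDITIONAL_FEATURES = {
    "short_stature",
    "organ_malformation",  # congenital heart defect or renal anomaly
    "developmental_delay_or_cognitive_deficit",
}


def meets_inclusion_criteria(findings):
    """Return True if the findings satisfy the assumed inclusion rule:
    at least six facial features and at least one additional feature."""
    observed = set(findings)
    n_facial = len(FACIAL_FEATURES & observed)
    n_additional = len(ADDITIONAL_FEATURES & observed)
    return n_facial >= 6 and n_additional >= 1
```

A patient record would be passed as a list or set of the labels above; any label not in either set is ignored by the check.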

Targeted resequencing
Targeted resequencing was performed using a custom design: TruSeq® Custom Amplicon (Illumina, San Diego, CA) on the MiSeq® sequencing platform (Illumina, San Diego, CA). TruSeq Custom Amplicon (TSCA) is a fully integrated DNA-to-data solution, including online probe design and ordering through the Illumina website, the sequencing assay, automated data analysis, and offline software for reviewing results.

Probe design
Online probe design was performed by entering target genomic regions into Design Studio (DS) software (Illumina, San Diego, CA). Probe design (Locus Specific Oligos) was automatically performed by DS using a proprietary algorithm that considers a range of factors, including GC content, specificity, probe interaction and coverage. Once the design was completed, a list of 500 bp candidate amplicons (short regions of amplified DNA) was generated and the quality of each amplicon design assessed based on the predicted success score provided by DS.
For some targets, when required, the operator used DS to edit the design and raise the predicted success score to a minimum value of 60%. All exons with a lower success score were removed from the design and excluded from the final TSCA panel. The design was performed over a cumulative target region of 57,932 bp and generated a panel of 244 amplicons covering 98% of the cumulative region (Figure 1). The genes included in this panel were chosen on the basis of scientific evidence for a causative role in the disease [9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25]. The 11 genes, for a total of 132 exons, are listed in Table 1.

Library preparation and sequencing
The TSCA kit generates the desired targeted amplicons with the sequencing adapters and indices required for sequencing on the MiSeq® system, without any additional processing. Library preparation and sequencing runs were performed according to the manufacturer's procedure.

Data analysis
The MiSeq® system provides fully integrated on-instrument data analysis software. MiSeq Reporter software performs secondary analysis on the base calls and Phred-like quality scores (Qscore) generated by Real Time Analysis software (RTA) during the sequencing run. The TSCA workflow in MiSeq Reporter evaluates short regions of amplified DNA (amplicons) for variants through the alignment of reads against a "manifest file" specified when starting the sequencing run. The manifest file is provided by Illumina and contains all the information on the custom assay. The TSCA workflow requires the reference genome specified in the manifest file (Homo sapiens, hg19, build 37.2). The reference genome provides variant annotations and sets the chromosome sizes in the BAM file output. The TSCA workflow performs demultiplexing of indexed reads, generates FASTQ files, aligns reads to a reference, identifies variants, and writes output files to the Alignment folder. SNPs and short indels are identified using the Genome Analysis Toolkit (GATK). A reference gene database is available in the Annotation subfolder of the reference genome folder, and any SNPs or indels that occur within known genes are annotated. Each variant reported in the VCF output file was evaluated for coverage and Qscore and visualized with the Integrative Genomics Viewer (IGV) [27,28]. Based on the guidelines of the American College of Medical Genetics and Genomics [29], all regions sequenced at a depth <30 were considered not suitable for analysis. Furthermore, we established a minimum Qscore threshold of 30 (base call accuracy of 99.9%).
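The link between a Qscore of 30 and a base call accuracy of 99.9% follows from the standard Phred definition, Q = -10 log10(P), where P is the probability that a base call is wrong. The sketch below illustrates that conversion together with the two thresholds stated above. The function names are hypothetical, and the filter is only an illustration of the stated cut-offs (depth below 30 rejected, Qscore of at least 30 required), not the MiSeq Reporter implementation.

```python
import math

MIN_DEPTH = 30   # regions with depth < 30 are treated as not suitable for analysis
MIN_QSCORE = 30  # minimum Qscore threshold stated in the text


def phred_to_error_probability(qscore):
    """Invert Q = -10 * log10(P) to get the base call error probability P."""
    return 10 ** (-qscore / 10)


def error_probability_to_phred(p_error):
    """Phred quality score for a given base call error probability."""
    return -10 * math.log10(p_error)


def base_call_accuracy(qscore):
    """Accuracy is the complement of the error probability."""
    return 1 - phred_to_error_probability(qscore)


def passes_thresholds(depth, qscore):
    """Apply the depth and Qscore cut-offs to one variant or region."""
    return depth >= MIN_DEPTH and qscore >= MIN_QSCORE
```

With these definitions, a Qscore of 30 corresponds to an error probability of 1 in 1,000, which is the 99.9% accuracy quoted above, and each increase of 10 in Qscore lowers the error probability tenfold.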

Sanger sequencing validation
All mutations identified by MiSeq Reporter were validated by Sanger sequencing using standard protocols and, where possible, family members were tested to establish whether the mutation arose de novo. Figure 2 shows the flowchart of the method described above.

TSCA performance
All coding regions of the genes reported in Table 1 were uploaded into DS, for a total of 132 exons (cumulative target region of 57,932 bp). Of the uploaded exons, 98.5% were covered by the amplicon design with a predicted success score ≥60%. The remaining exons, either not entirely covered by DS or with a predicted success score <60%, were excluded from the final TSCA content panel. TSCA sequencing runs yielded 120 exons that were successfully and consistently sequenced (sequencing depth >30, Qscore >30), providing a total coverage of 91% of the exons uploaded into DS and of 92% of the exons covered by the DS design (Table 1). The TSCA approach thus reduced to 12 the number of exons requiring standard Sanger sequencing analysis.
In the validation set, the control samples confirmed both the expected mutations and the allele state. All variants were identified with a mean coverage of 318 and a mean Qscore of 38, providing a detection rate of 100% for the validation set (Table 2). Neither the positive nor the negative control samples showed any further unexpected variant, confirming the absence of any unreported variant in the validation set and of any false positive result.
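The exon coverage figures reported above can be checked with a few lines of arithmetic. This is a minimal sketch; the count of 130 exons retained in the design is not stated in the text and is an assumption obtained by applying the reported 98.5% to the 132 uploaded exons.

```python
EXONS_UPLOADED = 132      # exons uploaded into DS (from the text)
EXONS_SEQUENCED = 120     # exons successfully sequenced (from the text)
DESIGN_FRACTION = 0.985   # fraction of uploaded exons covered by the design (from the text)

# Assumption: the design fraction corresponds to a whole number of exons.
exons_in_design = round(EXONS_UPLOADED * DESIGN_FRACTION)

# Coverage relative to everything uploaded, and relative to the design only.
coverage_vs_uploaded = EXONS_SEQUENCED / EXONS_UPLOADED
coverage_vs_design = EXONS_SEQUENCED / exons_in_design

# Exons that still need standard Sanger sequencing.
exons_left_for_sanger = EXONS_UPLOADED - EXONS_SEQUENCED

print(f"Exons in design: {exons_in_design}")
print(f"Coverage vs uploaded exons: {coverage_vs_uploaded:.0%}")
print(f"Coverage vs designed exons: {coverage_vs_design:.0%}")
print(f"Exons left for Sanger sequencing: {exons_left_for_sanger}")
```

Under that assumption the two ratios round to the 91% and 92% quoted in the text, and the difference between uploaded and sequenced exons gives the 12 exons left for Sanger sequencing.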

Training set
Samples from the training set were investigated in three different sequencing runs, with an average coverage of [...] (Table 3). All variants were confirmed by Sanger sequencing and IGV, indicating the absence of any false positive result in the training set (Figure 3). Moreover, to exclude any possible false negative result, 10 randomly selected negative samples were further analyzed by Sanger sequencing (only "hot spot" exons), and 30 additional samples were analyzed for PTPN11 using both NGS and Sanger sequencing; all of them gave negative results.

Reproducibility
TSCA sequencing showed 100% reproducibility for all 120 exons, independently of DNA sample and sequencing run, making this approach suitable for diagnostic use. Figure 4 illustrates the performance of the same target region across three sequencing runs.

Discussion
The term RASopathy applies to a group of genetic disorders characterized by similar phenotypes and caused by mutations in the RAS-MAPK pathway. These disorders show a high degree of genetic heterogeneity, since individual diseases can arise from mutations in different genes. In addition, since different RASopathies share similar clinical features, their molecular characterization is complex, time-consuming and expensive. To improve the molecular testing of RASopathies, in this study we investigated a protocol based on targeted NGS on the Illumina MiSeq platform, enabling the analysis of all known causative genes in up to 96 patients in a single sequencing run. In particular, we analyzed 80 patients and identified 38 mutations in 6 of the 11 RAS pathway genes: PTPN11 (22/38 = 58%), SOS1 (9/38 = 24%), BRAF (2/38 = 5%), MEK2 (2/38 = 5%), RAF1 (2/38 = 5%) and CBL (1/38 = 3%). The relative frequency of mutations in the tested genes was in agreement with published results [30,36,37]. As shown in Table 3, in many patients the causal mutation was identified in the gene considered the most suitable candidate on the basis of mutation frequency and phenotypic characteristics. As expected, most NS patients had a PTPN11 mutation, while the second most frequently mutated gene was SOS1, followed by RAF1. Three patients with LEOPARD syndrome had one of the recurrent PTPN11 mutations previously associated with this phenotype (T468M or R498W) [20,33]. Among patients heterozygous for a pathogenic BRAF mutation, one was clinically diagnosed as affected by CFCS, while in the other case clinical evaluation could not establish whether he was affected by NS or CFCS. The patient's age at diagnosis is clearly important: this latter subject was 2 years old at the time of clinical evaluation, when he displayed only some features of CFCS [38]. Two other patients with CFCS had a mutation in the MEK2 gene, which is less commonly mutated in this disorder.
In addition, one NS patient had a mutation in CBL, a gene rarely associated with this disorder. Since all genes were analyzed in one run, the present protocol allowed the simultaneous identification of mutations affecting both the most frequently and the most rarely mutated genes, with a significant reduction of the time needed to reach the molecular characterization of the patient. This point is important for early diagnosis and classification of the different RASopathies, allowing more appropriate management and counselling. A wide spectrum of PTPN11 mutations has been associated with NS, including some rare mutations: I221V, Q256R, F285S, V428L (Table 4). All these variants are associated with the distinct facial gestalt of NS, which was markedly expressed in one patient (case no. 10) with F285S and mildly expressed in the others. Familial transmission from an affected individual was found in one instance and suspected in another case (no. 8), where the affected proband's brother was reported to have membranous subaortic stenosis and cryptorchidism. Mental retardation or cognitive deficit was not associated with these mutations, with the exception of the F285S mutation. Interestingly, the patient heterozygous for this latter mutation had a congenital pancreatic cyst, an unusual malformation in the RASopathies. In one patient (case no. 16) we identified two unpublished mutations affecting two consecutive PTPN11 amino acids, D395Y and Y396H. Both mutations were inherited from the affected father. Variability of clinical expression was recognizable in this family, since facial anomalies of NS were associated with developmental delay in the son only. The father had congenital total alopecia as a distinctive feature. The phenotype related to these mutations is therefore quite atypical, showing common Noonan-like facial anomalies associated with variable additional neural and ectodermal features.
The other 43 patients enrolled in this study were negative for the investigated RAS pathway genes. The proportion of negative patients was higher than in previous reports, likely because the clinical inclusion criteria were less stringent than in other studies [39][40][41], being based on NS facial anomalies and at least one additional feature. Interestingly, the mother of patient no. 18 was included in this study as the mutation-carrying parent of an affected child, although she did not meet the minimal clinical criteria for diagnosis of NS.
All the variants, as well as the results for the 4 negative control patients, were confirmed by Sanger sequencing. These data indicate a 100% detection rate of mutations involved in RASopathies and, most importantly, all these results were obtained with MiSeq on-board analysis, with the bioinformatic examination performed using only the user-friendly MiSeq Reporter software.
The absence of any false negative or false positive result and the possibility of easy and accurate data analysis make this approach a good diagnostic tool. Furthermore, the library preparation workflow is easy and the TSCA kit performance is stable. In fact, all experiments on the RAS pathway genes gave the same results in terms of coverage and quality in each run (Figure 4). Two major points to be considered in diagnostic protocols are the time needed to complete the entire workflow and the costs of the approach. Targeted NGS analysis of the complete coding sequences of the 11 genes in the RAS pathway for 96 patients takes about two months, including ten days for library preparation and data analysis and about 45 days to characterize uncovered regions using Sanger sequencing. Conversely, using Sanger sequencing to analyze the full coding sequence of the 11 genes would take about 16-18 months for the same number of patients. We also calculated that NGS analysis applied to the 92% of the regions of interest, plus Sanger sequencing of the remaining regions, would cost 6 times less than a protocol entirely based on Sanger sequencing (Figure 5). However, time and cost could be further reduced by designing a new panel that also includes RIT1, a gene recently associated with NS [42], and that reaches 100% coverage of the cumulative region. This would likely make Sanger sequencing unnecessary for the analysis, further reducing the time and cost of the entire process.
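The per-gene shares quoted above are simple ratios of the mutation counts to the total of 38. The sketch below recomputes them from the counts given in the text; the variable and function names are hypothetical. Note that 9/38 is about 23.7%, which rounds to 24%.

```python
# Mutation counts per gene as reported in the text
# (MEK2 is the protein encoded by the MAP2K2 gene listed in the panel).
MUTATION_COUNTS = {
    "PTPN11": 22,
    "SOS1": 9,
    "BRAF": 2,
    "MEK2": 2,
    "RAF1": 2,
    "CBL": 1,
}


def relative_frequencies(counts):
    """Return each gene's share of all mutations as a whole-number percentage."""
    total = sum(counts.values())
    return {gene: round(100 * n / total) for gene, n in counts.items()}


TOTAL_MUTATIONS = sum(MUTATION_COUNTS.values())
FREQUENCIES = relative_frequencies(MUTATION_COUNTS)

for gene, pct in FREQUENCIES.items():
    print(f"{gene}: {MUTATION_COUNTS[gene]}/{TOTAL_MUTATIONS} = {pct}%")
```

Rounding each share independently is a presentation choice; the counts themselves sum to the 38 mutations reported.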

Conclusion
This study demonstrates that NGS can be successfully applied to the molecular testing of RASopathies with a remarkable saving of time and cost, while maintaining the high quality of the results. Consistent with available records, our data confirm that the RASopathies are caused by germline mutations in different genes encoding components of the RAS-MAPK signalling pathway, with PTPN11, followed by SOS1, being the most frequently mutated genes in our cohort. Moreover, the NGS protocol allowed, with a high standard of coverage and quality, the early detection of rare mutations in other RAS-MAPK genes, avoiding the standard Sanger sequencing approach and its higher cost and longer turnaround. Taken together, these data highlight the usefulness of a molecular characterization that leads to an early diagnosis, especially for patients with mild, non-specific or atypical features, and that may support more appropriate genetic counselling and clinical management.