Examination of NRCAM, LRRN3, KIAA0716, and LAMB1 as autism candidate genes

Background: A substantial body of research supports a genetic involvement in autism. Furthermore, results from various genomic screens implicate a region on chromosome 7q31 as harboring an autism susceptibility variant. We previously narrowed this 34 cM region to a 3 cM critical region (located between D7S496 and D7S2418) using the Collaborative Linkage Study of Autism (CLSA) chromosome 7 linked families. This interval encompasses about 4.5 Mb of genomic DNA and encodes over fifty known and predicted genes. Four candidate genes (NRCAM, LRRN3, KIAA0716, and LAMB1) in this region were chosen for examination based on their proximity to the marker most consistently cosegregating with autism in these families (D7S1817), their tissue expression patterns, and likely biological relevance to autism.

Methods: Thirty-six intronic and exonic single nucleotide polymorphisms (SNPs) and one microsatellite marker within and around these four candidate genes were genotyped in 30 chromosome 7q31 linked families. Multiple SNPs were used to provide as complete coverage as possible since linkage disequilibrium can vary dramatically across even very short distances within a gene. Analyses of these data used the Pedigree Disequilibrium Test for single markers and a multilocus likelihood ratio test.

Results: As expected, linkage disequilibrium occurred within each of these genes but we did not observe significant LD across genes. None of the polymorphisms in NRCAM, LRRN3, or KIAA0716 gave p < 0.05, suggesting that none of these genes is associated with autism susceptibility in this subset of chromosome 7-linked families. However, with LAMB1, the allelic association analysis revealed suggestive evidence for a positive association, including one individual SNP (p = 0.02) and three separate two-SNP haplotypes across the gene (p = 0.007, 0.012, and 0.012).

Conclusions: NRCAM, LRRN3, and KIAA0716 are unlikely to be involved in autism. There is some evidence that variation in or near the LAMB1 gene may be involved in autism.


Background
Autism is a severe neurodevelopmental disorder that manifests itself during the first three years of life and persists throughout a patient's lifetime. Because of the frequency with which it occurs, its severity, and its impact on children and families, autism is a major public health concern. It is estimated to occur in ~1/300 births, affecting males three times more often than females. Autistic individuals display impairments in sociability and communication and also demonstrate repetitive and/or obsessive-compulsive behaviors [1][2][3][4][5][6].
A substantial body of data supports a genetic involvement in the etiology of autism. Further data suggest that at least one gene for autism lies on chromosome 7q22-31 [7][8][9][10]. Previously, the Collaborative Linkage Study of Autism (CLSA) narrowed the 34 cM 7q22-31 critical region to a 3 cM region between D7S496 and D7S2418 [11]. Four genes within this region, Neuronal Cell Adhesion Molecule (NRCAM) (XM_027222); Leucine Rich Repeat Protein Neuronal 3 (LRRN3) (XM_045261); KIAA0716 (NM_014705); and Laminin Beta-1 (LAMB1) (NM_002291), were chosen for further study. This choice was based equally on their proximity to the microsatellite marker (D7S1817) most consistently cosegregating in this group of families, their tissue expression patterns, and their likely biological relevance to the autism disease process as described below.
The NRCAM gene encodes the Nr-CAM protein that is expressed in structures in the developing brain, including the floor plate. In this neuronal region, Nr-CAM has been implicated in axonal guidance through interaction with TAG-1/axonin-1 [12,13]. Additionally, when it is presented as a substrate in in vitro studies, Nr-CAM induces neurite outgrowth from dorsal root ganglia neurons [13]. Nr-CAM also serves as a receptor for several different neuronal recognition molecules. In its role as a receptor, it is active during nervous system development in several different regions including the spinal cord, the visual system, and the cerebellum [12].
Studies in Drosophila demonstrate that many members of the LRR family provide an essential role in target recognition, axonal pathfinding, and cell differentiation during neural development [14,15]. NLRR-3, the murine ortholog of human LRRN3, was isolated using a human brain cDNA fragment encoding an LRR as a probe against a mouse brain cDNA [16]. NLRR-3 mRNA is expressed abundantly in the brain and has very little expression in other tissues. Its expression is developmentally regulated and is confined to the nervous system. Its molecular structure and its expression pattern suggest that the NLRR-3 protein plays a role in the development and maintenance of the murine nervous system through protein-protein interactions [17,18]. These murine studies further strengthen the relevance of the Drosophila results in that these LRR proteins could have similar, if not the same, integral functions in mammalian neural development. Other studies have also implicated NLRR-3 as an important component of the murine pathophysiological response to brain injury [17]. Since LRRN3 is located within this autism candidate region, and shares a high degree of homology to NLRR-3, it was considered a strong autism candidate gene.
Based on a hypothesis that large cDNAs (>4 kb) encoding large proteins (>50 kDa) in brain are likely to play an important role in mammals, studies to identify such novel genes have been performed [19]. During the course of these analyses, the KIAA0716 cDNA was identified.
Although little is known about KIAA0716 and its protein product, homology relationships suggest it is involved in cell signaling and communication based on its similarity to KIAA0299, which encodes the Dedicator of Cytokinesis 3 (DOCK3) protein. The DOCK proteins bind to the Src-Homology 3 (SH3) domain of the v-crk sarcoma virus CT10 oncogene homolog (CRK) protein and play an important role in signaling from focal adhesions [20].
Laminin is a large molecular weight glycoprotein present in basement membranes. Most laminin molecules are comprised of an α chain and two β chains that are assembled into a cruciform structure held together by disulfide bonds. Three types of β chains (β1 chain, β2 chain, and β3 chain) have been described and are thought to contribute to the functional variability of the members of the laminin family. The laminin-B1 (LAMB1) gene encodes the laminin B1 chain protein which is present at low levels in serum and is involved in cell attachment and chemotaxis presumably through its binding to the laminin receptor (expressed highly in the brain) (reviewed in [21]). Data from Xenopus studies demonstrate that laminin-1-beta is important in axonal guidance during embryonic development when it is co-expressed with netrin. More specifically, when netrin and laminin-1-beta are coexpressed, axons are repulsed into the areas where only netrin, and not laminin-1-beta, is present [22].
Based on their putative biological functions as well as their genetic locations within our previously reported 3 cM chromosome 7 autism critical region [11], these four neuronally expressed genes presented themselves as intriguing autism candidate genes. Thirty-six intronic and exonic single nucleotide polymorphisms (SNPs) spaced at roughly 10 kb intervals throughout these candidate genes were analyzed via both single-locus (Pedigree Disequilibrium Test, PDT) and multilocus (TRANSMIT) approaches. Additionally, linkage disequilibrium (LD) relationships between all tested SNPs were examined within the four candidates.

Sample composition
All probands met algorithm criteria of the Autism Diagnostic Interview-Revised (ADI-R) [23] and were at least four years old. All probands were also assessed with the Autism Diagnostic Observation Schedule (ADOS) [24] or a later revision (ADOS-G). Affected sibling pair (ASP) and trio families were recruited. Affected individuals were excluded if they had fragile X syndrome, tuberous sclerosis, or any other medical condition known to be associated with autism.
Families were recruited from three regions of the United States (Midwest, New England, and mid-Atlantic states) through three clinical data collection sites: the University of Iowa, Tufts University-New England Medical Center, and Johns Hopkins University. The Institutional Review Boards at each institution approved this study and appropriate informed consent was obtained from all subjects.
The sample comprised the 30 nuclear CLSA chromosome 7 families previously described [11]. Both parents were available for genotyping in all but two families. Twenty-nine families had two affected children while one family had three affected children. Briefly, each of these families had evidence for linkage within a 34 cM region of chromosome 7q22-31 with LOD scores = 0.55. The affected sibling pairs also shared a 3 cM region between D7S496-D7S2418 (120 cM-123 cM) to a greater extent than any other chromosome 7 region.
For exon screening analysis, twelve unrelated affected individuals from these families were chosen because they shared one or both of the two overtransmitted SNP haplotypes (CV2193686/CV3268606 and CV1091266/CV2193735).

Gene characterization and SNP choice
To determine the size and exon/intron structure of the four candidate genes, the Celera, Ensembl, and NCBI databases were used [25][26][27]. SNPs were chosen either from the NCBI dbSNP database [28] or generated through use of the Celera Discovery System and Celera Genomics' associated databases. An "RS" number indicates SNPs chosen from dbSNP while a "CV" number designates SNPs chosen from Celera RefSNP. When available, SNPs located in coding regions were chosen for analysis. In general, a 10 kilobase (kb) spacing of SNPs was sought to achieve as complete coverage as possible for a thorough analysis of each candidate gene.

Primer design
Once SNPs were chosen, polymerase chain reaction (PCR) primers were designed for single-stranded conformational polymorphism (SSCP) analysis using the Primer 3 web-based PCR primer design software [29]. As part of the primer design process, each forward and reverse primer from the recommended pairs was compared against the NCBI non-redundant (NR) and high throughput genomic sequence (HTGS) databases to determine whether or not it was sufficiently specific for accurate PCR amplification in the desired region. In some cases, it was impossible to design a set of primers that was specific enough for the SNP region (high numbers of BLAST hits in many genomic regions), and new SNPs were chosen for primer design and analysis. Table 1 lists the primer sets, product sizes, and allele frequencies for each of the SNP assays.

Primer table fragment (assay number, primer pair, PCR product size in bp; rows before assay 20 are incomplete):

Assay  Primer 1                  Primer 2                    Size (bp)
       (missing)                 ctggcctctctttacccaca        224
20     gtctgctcagccagagatca      gcagttccttgtttgcctgt        312
21     ctcccaaagtatgcacacga      tggcctgttctcttctgtga        218
22     tgggcagaataaatgtgcatc     tttggttttatcgggtgaca        271
22     tcattctaagaagtgggcagaa    tgacacgacagacccagaag        172
23     cctgagctcataaaggcatca     tgcctgtgtatttgggagtg        309
24     atcaagtcaatgcgagaacg      cctgcttgaccaaggtctgt        162
25     accatgccaggaaatctttg      gctagcattgcttattcccttc      477
26     tgggactatttgcaccaaaa      acgtgaaagttctcccttgg        273
26     tgatcctcatttgtgccaga      cagatattcggggtgagcat        159
27     cacatccccaaaccttacCA      tgccttggatagcattacca        259
28     ctgaggccatccactgagtt      gcctgatccttggcttgtt         249
29     cagagaatcacacagaaacaaatg  tcatttttgagccatttcca        281
30     ccccttagatgccatgtcttt     ggggacatctttgcttttca        258
31     aggtgtgttcccattcatgt      tctgggtcatttcaggaagg        281
32     aggagactggtggctccttt      ttgctggaaaattgacatgc        259
33     aaaataagcctgtgtgaaaagacc  ccacttctttctgtctaaatgtgg    219
34     agggattcatcaacaatcagtg    ggcatccatagtttatttaaaagtga  433

SNP assay testing and optimization
Two or more different SNP primer sets in the same 10 kb region were designed and subjected to a preliminary round of SSCP screening using two different conditions (15 Watts for 4 hours and 28 Watts for 1.5 hours at 4°C) to ensure that the assays worked and to make a rough estimation of allele frequencies in the test population. In the preliminary screening, 14 unrelated control individuals (28 chromosomes) were tested for each polymorphism. An assay was considered successful, as well as sufficiently polymorphic, when two or more individuals had a consistently different banding pattern in the testing phase and the assay was clear (i.e. a banding pattern with only three possible genotypes representing a single SNP). After all of the assays were initially tested, successful assays spaced at regular intervals throughout the candidate genes were chosen for analysis.

For assay RS280310, denaturing high performance liquid chromatography (dHPLC) was performed using previously published methods on the Transgenomic WAVE™ [30]. In a WAVE™ analysis, all homozygous individuals (regardless of their genotype) will yield a single dHPLC peak whereas heterozygous individuals will yield 2-4 dHPLC peaks based on the melting temperature of the PCR fragment heteroduplex or homoduplex and the actual run conditions. Hence, initially it was impossible for us to distinguish between an A/A homozygote and a C/C homozygote. To unambiguously identify homozygous genotypes, we chose three samples with homozygous peak patterns and sequenced them to determine whether each was A/A or C/C for the particular SNP. Next, we pooled (in a 1:1 ratio) each of the unknown homozygotes with a known (sequenced) A/A homozygote and ran these samples on the WAVE™ again. A single peak indicated an A/A homozygote, because the pool contained only one allele. Four peaks indicated a C/C homozygote, because the pooled A and C strands formed heteroduplexes.
To cross-validate the two genotyping methods, selected PCR-amplified samples from the RS280310 assay were also run under SSCP conditions and blindly scored. Results were consistent for every sample.

Microsatellite genotyping
During the characterization work of KIAA0716, we identified a novel polymorphic CA dinucleotide repeat within intron 6 (the Celera genomic sequence contained (CA)18, while the NCBI sequence contained (CA)22). A PCR assay was designed, characterized, and applied to the dataset. Primers were designed, controls were sized, and PCR amplification and analysis were performed using standard procedures. Briefly, PCR products were denatured and electrophoresed on 6% denaturing polyacrylamide gel electrophoresis (PAGE) gels. The gels were then stained with SYBR® Gold nucleic acid gel stain (Molecular Probes, Eugene, OR) and scanned on a Hitachi FMBIO II fluoroimager (Hitachi Instruments, San Jose, CA) using the appropriate filter for detection and visualization of the SYBR® Gold stained PCR fragments.

Exon screening
For LAMB1 exon screening analysis, denaturing high performance liquid chromatography (dHPLC) was performed via previously published methods using the Transgenomic WAVE™ [30]. First, PCR primers were designed for exonic amplification as previously described. Second, the assays were tested for specificity via PCR amplification and agarose gel electrophoresis. Third, the exonic assays were amplified in 12 unrelated affected individuals and run on the WAVE™ under denaturing conditions. Any samples with a discrepant dHPLC peak pattern were reamplified and subjected to dye-primer or dye-terminator sequencing using both forward and reverse primers. Table 2 lists the primer sets and PCR product size for each of the exon screening assays.

Genotype error checking
Genotypes were checked for Mendelian consistency. SNP haplotypes were constructed using SimWalk v2.0 [31] and plotted in Cyrillic (version 2.1; Cherwell Scientific Publishing, Oxford), and all recombination events were identified. Haplotype reconstruction was based on minimizing the number of recombination events on a chromosome. In cases where apparent excess recombination was observed, gels were reread. If necessary, the SNP assay was rerun to determine the most accurate genotype. Once all of the data corrections were made, the haplotypes generated from the SNP genotype information were compared to the previously generated microsatellite marker haplotypes within and around the candidate genes. In all cases, the haplotypes were consistent.
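The Mendelian consistency check mentioned above can be illustrated with a minimal sketch. It assumes a biallelic marker, both parents genotyped, and genotypes stored as 2-tuples of allele labels; the function names and the example data are hypothetical and are not taken from the study's own software.

```python
def mendel_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking one
    allele from each parent. Genotypes are 2-tuples of allele labels."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def inconsistent_markers(family):
    """family maps a marker name to a (father, mother, child) genotype
    triple; returns the markers that fail the check, i.e. the candidates
    for re-reading the gel or re-running the assay."""
    return [name for name, trio in family.items()
            if not mendel_consistent(*trio)]


# Hypothetical genotypes, for illustration only.
example = {
    "snp1": (("A", "C"), ("C", "C"), ("A", "C")),  # consistent
    "snp2": (("A", "A"), ("A", "A"), ("A", "C")),  # C has no parental source
}
print(inconsistent_markers(example))
```

Families with a missing parent, which occur in this dataset, would need a weaker check (for example, against the one available parent and the siblings) that this sketch does not attempt.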
Genotype frequencies of each of the various SNPs were also analyzed to determine that they conformed to Hardy-Weinberg equilibrium based on the observed allele frequencies.
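As an illustration of the Hardy-Weinberg check, the sketch below applies the standard one-degree-of-freedom chi-square goodness-of-fit test to genotype counts at a biallelic SNP. The source does not say which test or software was used, so this is an assumed, textbook version; the function name and the counts in the usage line are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Takes observed genotype counts (at least one individual) and returns
    (chi2, p_value). Expected counts come from the observed allele
    frequencies."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic markers).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 25 A/A, 50 A/C, 25 C/C match expectation exactly.
chi2, p_value = hwe_chi_square(25, 50, 25)
```

With small samples such as the one described here, an exact test is often preferred over the chi-square approximation; the sketch shows only the simpler calculation.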

Pedigree disequilibrium test (PDT)
Since a small number of families in our dataset were missing one parent or included half-siblings (one family), we used the PDT [32] to test for association between autism susceptibility and the genotyped SNPs. The PDT is an extension of the Transmission Disequilibrium Test (TDT) [33] that allows inclusion of extended families to test for allelic association. It has been shown [34] that the PDT is a valid and unbiased test of association even when linkage exists.
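For orientation, the sketch below shows the basic TDT that the PDT extends: it counts how often heterozygous parents transmit a given allele to an affected child and compares the two counts with a McNemar-type chi-square. This is not the PDT itself, which additionally combines information from related trios and discordant sibships within a pedigree. The sketch assumes a biallelic marker, fully genotyped and Mendelian-consistent trios, and hypothetical function names and data.

```python
import math


def tdt_counts(trios, allele):
    """Count transmissions (b) and non-transmissions (c) of `allele`
    from heterozygous parents to affected children.

    Each trio is (father, mother, child), genotypes as 2-tuples."""
    b = c = 0
    for father, mother, child in trios:
        f_het = father[0] != father[1]
        m_het = mother[0] != mother[1]
        k = child.count(allele)          # copies of the allele in the child
        if f_het and m_het:
            b += k                       # both parents informative
            c += 2 - k
        elif f_het or m_het:
            hom = mother if f_het else father
            from_hom = 1 if hom[0] == allele else 0
            t = k - from_hom             # copy received from the het parent
            b += t
            c += 1 - t
    return b, c


def tdt_statistic(b, c):
    """McNemar-type TDT chi-square (1 df) and its p-value."""
    if b + c == 0:
        return 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical trios, alleles coded "1" and "2".
trios = [
    (("1", "2"), ("2", "2"), ("1", "2")),
    (("1", "2"), ("1", "2"), ("1", "1")),
    (("1", "1"), ("1", "2"), ("1", "2")),
]
b, c = tdt_counts(trios, "1")
chi2, p_value = tdt_statistic(b, c)
```

Treating the affected siblings of one family as independent trios would not give a valid test of association in linked families, which is one reason a pedigree-level statistic such as the PDT is used instead.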

TRANSMIT
Haplotype analysis offers a valuable tool for investigating associations between disease loci and multiple markers [35]. Therefore, TRANSMIT [36] was applied to adjacent two-SNP genotypes to determine whether any of these haplotypes were preferentially transmitted. Three-SNP haplotype analyses were not attempted due to the small overall sample size. All sampled individuals were included in the analyses.

Linkage disequilibrium (LD)
LD was assessed for the SNPs using the Graphical Overview of Linkage Disequilibrium (GOLD) program [37] and characterized using the D' statistic [38].
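The D' statistic can be sketched as follows for a pair of biallelic SNPs. The example assumes that haplotype frequencies are already known or estimated; GOLD performs its own estimation from the genotype data, which is not reproduced here. The function name and the frequencies in the usage line are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute value of Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at the first
          locus and allele B at the second
    p_a, p_b: frequencies of allele A and allele B"""
    d = p_ab - p_a * p_b                  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:                        # a monomorphic locus: D' undefined
        return 0.0
    return abs(d / d_max)


# Hypothetical frequencies: alleles always travel together, so D' = 1.
complete_ld = d_prime(0.5, 0.5, 0.5)
```

Because D is scaled by its maximum possible value given the allele frequencies, D' can reach 1 even when one haplotype is rare, which is worth keeping in mind when reading LD blocks estimated from a small sample.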

Results
In the subset of CLSA families chosen for increased sharing of the 4.5 Mb region of 7q31, we genotyped and analyzed [...]

Figure: Linkage Disequilibrium results for NRCAM.

Once the genotype information was gathered and error checking was performed, association analyses were carried out to determine whether these genes are associated with autism susceptibility. The results from the allelic and haplotypic association analyses are given in Table 3. None of the individual polymorphisms in or surrounding NRCAM, LRRN3, or KIAA0716 demonstrated evidence for allelic association with autism. However, this analysis did reveal suggestive evidence for allelic association between LAMB1 and autism susceptibility for one SNP (CV2193735) located within intron 3 (p = 0.02). Examination of the two-SNP haplotypes produced marginally significant results for at least one combination in NRCAM, LRRN3, and KIAA0716. However, three different two-SNP combinations within LAMB1 (CV11428543/CV2193689, p = 0.007, both within intron 25; CV2193689/CV3268606, p = 0.012, within intron 25 and intron 23; and CV1091266/CV2193735, p = 0.012, within introns 6 and 3) displayed strongly significant results (Table 4). All SNPs were in Hardy-Weinberg equilibrium.
Examination of marker-to-marker linkage disequilibrium identified discrete blocks of LD present within each of the genes. With this set of SNPs, NRCAM has four such blocks of LD defined by its 11 SNPs. The SNPs within LAMB1 displayed very strong LD in three blocks with a somewhat weaker level of LD among a majority of the SNPs. Within LRRN3, there is one SNP (RS280309) that seems to be in strong LD with the majority of the other SNPs. The 8 SNPs in KIAA0716 also comprise definite LD blocks although [...]

Figure: Linkage Disequilibrium results for LRRN3.

After observing the multiple suggestive allelic association results in LAMB1, we screened its 34 exons for susceptibility variants using 12 unrelated affected individuals who shared one or both of the overtransmitted haplotypes (CV11428543/CV2193689-2/2 and CV1091266/CV2193735-1/2). All of the exons were PCR-amplified in these individuals and screened for variations via dHPLC on the Transgenomic WAVE™. Whenever one of the exonic assays displayed a discrepant dHPLC peak pattern in any of the individuals, the exon was reamplified and sequenced in the discrepant individual and a control with forward and reverse primers.
Figure 4. Linkage Disequilibrium results for KIAA0716. Each block represents the amount of LD between the indicated pair of SNPs.

Thirteen of the 34 exonic assays had discrepant WAVE™ patterns in at least one tested individual. Twelve of the 13 discrepant assays resulted in detectable SNPs while a SNP was not observed via sequencing in the other discrepant assay. Six of these detectable SNPs were in intronic flanking regions (exonic assays 1, 2, 4, 5, 26, and 34) while the other six were exonic. Four of the exonic SNPs resulted in synonymous codon substitutions (exonic assays 10, 15, 23, and 31) and the other two exonic SNPs created nonsynonymous amino acid changes (exonic assays 20 and 22). The G/A SNP in exon 20 at mRNA position 2915 creates a nonconserved glycine to serine change (G544S). A restriction assay was utilized to determine if this SNP was overrepresented in unrelated autistic individuals taken from the CLSA dataset (N = 90) compared to unrelated CEPH controls (N = 80). No significant difference in the allele frequencies between the two groups was found (χ² = 0.2177; p = 0.64). The exon 22 SNP is an A/G change at mRNA position 3402 creating a conserved glutamine to arginine change (Q706R). We assayed this SNP via SSCP in unrelated autistic probands taken from the CLSA and Autism Genetic Resource Exchange (AGRE) datasets (N = 215) and unrelated CEPH controls (N = 73) to determine if there was a difference in allele frequencies. We did not observe a significant difference in allele frequencies between the two groups (χ² = 0.00; p = 1.00).
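A case-control comparison of allele frequencies of this kind can be sketched as a 2 x 2 chi-square test on allele counts. The allele counts behind the reported statistics are not given in the text, so the numbers below are purely hypothetical, and the sketch assumes an uncorrected Pearson chi-square with one degree of freedom (whether a continuity correction was applied in the study is not stated).

```python
import math


def allele_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df, no continuity correction) comparing
    allele counts between cases and controls.

    case_a, case_b: counts of the two alleles in cases
    ctrl_a, ctrl_b: counts of the two alleles in controls"""
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    denom = row_case * row_ctrl * col_a * col_b
    if denom == 0:                       # an empty row or column: no test
        return 0.0, 1.0
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical allele counts (two alleles per genotyped individual).
chi2, p_value = allele_chi_square(60, 370, 20, 126)
```

Identical allele frequencies in the two groups give a statistic of zero and a p-value of one, which is the pattern reported for the exon 22 SNP.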

Discussion
Using a positional candidate gene approach we chose to examine LAMB1, NRCAM, KIAA0716, and LRRN3 for association to autism susceptibility.
Since it has been shown that using one or two SNPs is insufficient for a thorough candidate gene/disease association analysis [35], we chose multiple SNPs placed at regular intervals throughout these genes for our study.
Intronic and exonic SNPs were chosen to ensure that a susceptibility variant or a variant in LD with a true susceptibility allele would be detected. From this work, we observed some evidence for association between LAMB1 and autism, including one individual SNP (CV2193735) located within intron 3 (p = 0.02) and three separate two-SNP haplotypes (CV11428543/CV2193689, CV2193689/CV3268606, and CV1091266/CV2193735) across the gene's transcriptional unit (p = 0.007, 0.012, and 0.012 respectively).
Figure 5. Linkage Disequilibrium results for LAMB1. Each block represents the amount of LD between the indicated pair of SNPs.

These results have not been corrected for multiple testing since it is still unclear as to what level of correction should be applied in an association study such as this. A Bonferroni correction (p < .004 correcting on 12 tests and p < .002 correcting on 24 tests) is too stringent to apply to these data since these tests are not all based on independent data points. However, not correcting at all for multiple tests will invariably lead to a high number of false positives in any study. The proper approach toward correction for multiple comparisons has yet to be resolved. Therefore, we provide the nominal results to allow the reader to decide the level of error correction to apply.
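The Bonferroni thresholds quoted above follow from dividing the significance level by the number of tests. The short sketch below reproduces that arithmetic and compares the nominal LAMB1 p-values reported in this paper against each threshold. It assumes a significance level of 0.05, which matches the quoted thresholds; the function and variable names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


ALPHA = 0.05  # assumed family-wise significance level

# Nominal LAMB1 p-values reported in the text: one SNP, three haplotypes.
nominal_p = [0.02, 0.007, 0.012, 0.012]

for n_tests in (12, 24):
    threshold = bonferroni_threshold(ALPHA, n_tests)
    passing = [p for p in nominal_p if p < threshold]
    print(f"{n_tests} tests: threshold {threshold:.4f}, "
          f"{len(passing)} of {len(nominal_p)} nominal results pass")
```

Because SNPs in LD with one another do not provide independent tests, the effective number of tests is smaller than the raw count, which is the sense in which the text calls this correction too stringent.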

Recently, it has been shown that LD varies markedly over different chromosomal regions and distances. Furthermore, average LD measures cannot be accurately predicted from one chromosomal region to another [39][40][41]. Hence, we defined the pattern of LD to ensure that we had adequate SNP coverage for our association study and achieved a sufficient SNP spacing for an association examination to be performed. Given the recent studies showing the tendency for LD to occur in "blocks" of DNA that can range from ~5-100 kb [42,43] and the SNP coverage that was achieved in this study, it was hardly surprising that we observed distinct blocks of LD within these genes.
During the course of the LAMB1 exon screening, two SNPs that led to amino acid changes were discovered in exons 20 and 22. The nonconserved amino acid change in exon 20 results in a glycine to serine substitution. This glycine is located within one of the Laminin Epidermal Growth Factor-like domains of the protein and has been evolutionarily conserved in various species including Mus musculus, Gallus gallus, Rattus norvegicus, and C. elegans (Figure 6). Given the extent of the amino acid change from the conformationally important glycine, which can confer a large degree of local flexibility on its polypeptide, to serine, another residue that can have a large effect on the local polypeptide conformation, and the extent of evolutionary conservation of this particular amino acid region among various species, it was somewhat surprising to us that there was no significant difference in allele frequencies between the autism and control groups. Regardless, even if this substitution changes the protein conformation or alters protein activity, we conclude that it does not play a significant role in susceptibility to autism. The other amino acid change that we observed through exonic screening was within exon 22 and resulted in a conserved glutamine to arginine change. This glutamine is located within the myosin tail of the protein and is not evolutionarily conserved across species (data not shown).

Figure 6. Amino acid conservation among various organisms for human LAMB1 544G.

Conclusions
Extensive SNP genotyping in three genes within the autism candidate 7q31 region (NRCAM, LRRN3, and KIAA0716) did not reveal any genomic variation associated with autism. However, some evidence of association with a multi-locus haplotype in LAMB1 was observed. Although exon screening did not discover a common variation that alters the LAMB1 protein product in autistic individuals, it is possible that disease susceptibility could also be conferred by a variation in the gene's regulatory region or by an intronic variant that impairs or alters splicing. Thus LAMB1 remains a viable candidate gene and may be associated with autism susceptibility in a subset of autistic patients. Further testing of our genetic findings in other datasets is required to definitively confirm or negate these results.