An investigation of polymorphisms in the 17q11.2-12 CC chemokine gene cluster for association with multiple sclerosis in Australians

Background: Multiple sclerosis (MS) is a disorder of the central nervous system (CNS) characterised by inflammation and neuronal degeneration. It is believed to result from the complex interaction of a number of genes, each with modest effect. Chemokines are vital to the migration of cells to sites of inflammation, including the CNS, and many are implicated in MS pathogenesis. Most of the CC chemokine genes are encoded in a cluster on chromosome 17q11.2-12, which has been identified in a number of genome wide screens as being potentially associated with MS.

Methods: We conducted a two-stage analysis to investigate the chemokine gene cluster for association with MS. After sequencing the chemokine genes in several DNA pools to identify common polymorphisms, 12 candidate single-nucleotide polymorphisms (SNPs) were genotyped in a cohort of Australian MS trio families.

Results: Marginally significant (uncorrected) transmission distortion was identified for four of the SNPs after stratification for several factors. We also identified marginally significant (uncorrected) transmission distortion for haplotypes encompassing the CCL2 and CCL11 genes, using two independent cohorts, which was consistent with recent reports from another group.

Conclusion: Our results implicate several chemokines as possibly being associated with MS susceptibility, and given that chemokines and their receptors are suitable targets for therapeutic agents, further investigation is warranted in this region.


Background
Multiple sclerosis (MS) is the most common chronic neurological disease in young adults. It is characterised by inflammation of the central nervous system (CNS), believed to be the result of an autoimmune reaction resulting in demyelination and destruction of neural supporting cells [1]. Epidemiological studies suggest a multifactorial aetiology for MS, implicating a complex interplay between environmental and genetic factors [2]. In the past ten years, a large number of genome wide screens have been conducted, including the recent GAMES collaboration (Genetic Analysis of Multiple sclerosis in EuropeanS) [3]. Multiple regions of potential linkage and association with MS have been identified, suggesting that genetic predisposition to MS might result from the modest contribution of many genetic factors, which, if identified, may present important new therapeutic targets [4].
The inflammatory response that is characteristic of MS requires the targeted migration of leukocytes into the CNS, which is under the control of chemokines. Over 40 members of the human chemokine family have been identified, which act upon a variety of leukocytes via interactions with almost 20 seven-transmembrane domain chemokine receptors [5]. Chemokines are small molecules of approximately 8-10 kDa in size, and are primarily classified on the basis of the relative positioning of two conserved cysteines. In the CC chemokines, the two cysteines are adjacent, and in the CXC chemokines, a single amino acid residue separates the cysteines. Two smaller subgroups have also been identified: the CX3C chemokines and the C chemokines.
Functionally, CC chemokines chemoattract a wide range of cells, including lymphocytes, dendritic cells, monocytes and some granulocytes [6,7], whilst CXC chemokines are chemoattractant for neutrophils and lymphocytes [6,7]. There is increasing evidence for chemokines possessing abilities beyond that of migration, including T helper cell subset differentiation [8], T cell costimulation [9,10], and macrophage and natural killer cell maturation [11,12]. Substantial evidence supports the involvement of CC chemokines in the pathogenesis of MS. In the mouse model of MS, experimental autoimmune encephalomyelitis (EAE), knockout of CCL2 leads to resistance to disease induction [13], whilst CCL3, CCL4 and CCL5 have all been implicated in EAE development [14,15]. However, CCL3-knockout mice were found to be fully susceptible to myelin oligodendrocyte glycoprotein (MOG)-induced EAE [16]. In MS lesions, expression of CC chemokines, including CCL2, CCL3, CCL4, CCL5, CCL7 and CCL8, and their receptors, has been identified on a wide variety of cells, such as astrocytes, microglia and perivascular T cells [17][18][19][20][21][22]. In addition, altered levels of CC chemokines and receptors have been identified in the serum and cerebrospinal fluid (CSF) of MS patients; some are elevated (including CCL5), whilst CCL2 is decreased in the CSF [22][23][24][25][26], possibly due to removal by CCR2-positive migrating cells as they cross the blood-brain barrier [27].
Fourteen of the 28 CC chemokine genes are clustered on chromosome 17q11.2-12 (Figure 1) [28]. This cluster spans slightly less than 2 Mb, and is split into two subclusters separated by a gap of 1.5 Mb. The 17q11.2-12 region has been implicated in genome-wide screens for linkage and association with MS [29][30][31][32][33][34], and in a meta-analysis of three genome screens, the most significant nonparametric linkage score was obtained for this region [35]. The 17q region is also syntenic to an EAE quantitative trait locus on chromosome 10, which includes a chemokine gene cluster [36], and non-synonymous polymorphisms in murine CCL1, CCL2 and CCL12 were identified as candidates for the eae7 quantitative trait locus [37].
Candidate gene studies of CC chemokines in MS have not been extensive. Rather, the majority of studies have focussed on other diseases. Polymorphisms from several CC chemokine genes have been variably associated with diseases such as tuberculosis [38], HIV [39], arthritis [40], and asthma and atopy [41,42]. The CCR5Δ32 mutation has been widely studied in MS. Whilst it was found not to be a general susceptibility factor for MS [43], it may have effects on age of onset or disease progression [44,45]. More recently, several moderate single-locus and haplotypic associations with MS were identified for single nucleotide polymorphisms (SNPs) from the CC chemokine gene cluster [46].

Figure 1: Schematic representation of the 17q11.2-12 CC chemokine gene cluster (not to scale).
It was the aim of our study to conduct candidate gene analyses for the CC chemokines in order to identify polymorphisms and/or haplotypes associated with MS. We undertook a two-stage analysis. The first stage involved the scanning of the genes for published (online) and novel polymorphisms in several DNA pools by DNA sequencing. From these, 12 candidate SNPs were then individually genotyped in a cohort of MS trio families in the second stage. We also sought confirmation of our findings in an independent cohort of Australian MS families. Marginally significant (uncorrected) transmission distortion was identified for four of the SNPs, as well as for haplotypes encompassing the CCL2 and CCL11 genes.

Pooled DNA sequencing
DNA was extracted from whole blood using a rapid salting out method [48]. Methods for the construction of the DNA pools have been described elsewhere [49]. Four DNA pools were constructed: 217 HLA-DRB1*1501 positive sporadic MS patients; 155 HLA-DRB1*1501 negative sporadic MS patients; 169 MS probands from multicase families; 185 unrelated, unaffected controls. HLA-DRB1*1501 status was genotyped using a SYBR Green assay (Applied Biosystems (ABI)), described elsewhere [50].

Subjects
DNA sequencing primers were designed to encompass the exons of all genes analysed, as well as segments of the putative promoter region, and often part of the 3' untranslated region (UTR) (see additional file 1: 17qSuppTable1). Where possible, primers were also designed to maximise coverage of published SNPs, which were identified from online SNP databases.

Genotyping in Trio Families
SNaPshot
An initial set of seven CC chemokine gene SNPs was selected for individual genotyping in a cohort of 204 MS trio families (Table 1): CCL2 -2581, CCL2 -2138, CCL11 67, CCL5 -471, CCL15 -1284, CCL15 136+88 and CCL23 -289. These SNPs were genotyped using the SNaPshot assay, which allows the genotyping of multiple SNPs simultaneously in a single-tube multiplexed reaction, using primer-extension methodology [51] (see additional file 3: 17qSuppTable3 for SNaPshot primer details for each of the seven SNPs).
SNaPshot genotyping was performed on the pooled PCR products of each individual, amplified using either the ABI 2× PCR master mix, or the Fermentas 2× PCR master mix (Progen). SNaPshot reactions were optimised in individuals of known genotype. Between 100 and 200 ng of DNA was used from each individual with the appropriate amplification primer pair (Sigma Genosys) to a final concentration of 5 ng/μL. PCR products were amplified using the standard conditions, as described above. The successful amplification of PCR products was verified on 1% agarose gels. The PCR products for each individual were then pooled, and an aliquot purified using ExoSAP-IT (USB Corporation). To this aliquot were then added 2.5 μL of SNaPshot Multiplex reagent (ABI) and pooled SNaPshot primers as defined in Supplementary Table 3, and the SNaPshot procedure was followed as per the manufacturer's instructions (ABI). Samples were then scanned using an ABI 3100. SNaPshot results were viewed using the GeneScan™ software (ABI) and genotypes determined using the GeneMapper™ software package (ABI).

SNPlex™
An additional five CC chemokine gene SNPs were selected for genotyping in the full cohort of 373 MS trio families (Table 1): CCL11 -488, CCL8 205, CCL16 -595, CCL14 -649 and CCL23 316. These SNPs were genotyped using the SNPlex™ genotyping system (ABI), for which SUPAMAC (University of Sydney, Australia) were contracted. This system allows high-throughput genotyping of up to 48 SNPs simultaneously in a single tube using an oligonucleotide ligation assay. Initially, candidate SNP details were submitted, after which suitability for the assay was determined by running the SNPs through an algorithm. Oligonucleotides were then designed for each SNP, and applied to the DNA samples (500 ng genomic DNA). Samples were scanned using an ABI 3730. Genotyping results were determined using the GeneMapper™ software package (ABI), utilising an allelic discrimination cluster analysis to allocate genotype calls with maximum stringency.

Additional genotyping of validation cohort
Genotyping of the additional cohort of MS trio families was contracted to the Australian Genome Research Facility (AGRF; Brisbane, Australia). Only the CCL2 -2138A>T and CCL11 -488C>A SNPs were genotyped using the Sequenom Autoflex Mass Spectrometer. After submission of SNP details, PCR oligonucleotides were designed and applied to the DNA samples.

Analyses & statistics
Sequencing results from DNA pools and individuals were compared and scanned for published SNPs and novel polymorphisms. All SNPs were classified according to the nomenclature recommended by den Dunnen and Antonarakis [52], and are relative to the translation start site. For genes transcribed in the anti-sense direction (indicated in Table 1 with (-)), the complementary nucleotide for each allele was used, and is used throughout this manuscript. Minor allele frequencies (MAF) of SNPs were estimated by comparing the relative peak heights of the alleles. Where the individual was heterozygous for a particular SNP, the MAF in the DNA pools could be corrected for SNP-specific variation in peak-height intensity. The significance of differences between the MS patient pools and the control pool was measured by estimated relative risk (ERR) calculations.
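The peak-height approach described above can be illustrated with a minimal sketch. It assumes that the heterozygote correction takes the common form of a ratio k between the two allele peaks in a known heterozygote, where the true allele ratio is 1:1; the exact correction used in the study is not stated here, and the function names and example peak heights are hypothetical.

```python
def pooled_allele_frequency(peak_a, peak_b, het_peak_a=None, het_peak_b=None):
    """Estimate the frequency of allele A in a DNA pool from peak heights.

    Assumption: unequal signal from the two alleles is captured by
    k = het_peak_a / het_peak_b, measured in a known heterozygote.
    Without heterozygote data, k = 1 (no correction).
    """
    k = 1.0
    if het_peak_a is not None and het_peak_b is not None:
        k = het_peak_a / het_peak_b
    # Scaling allele B's peak by k puts both alleles on the same signal scale.
    return peak_a / (peak_a + k * peak_b)


def minor_allele_frequency(freq_a):
    """The MAF is the smaller of the two allele frequencies."""
    return min(freq_a, 1.0 - freq_a)


# Hypothetical peak heights: allele A gives 1.5x the signal of allele B
# in a heterozygote, so the raw estimate of 0.30 is an overestimate.
raw = pooled_allele_frequency(300, 700)
corrected = pooled_allele_frequency(300, 700, het_peak_a=600, het_peak_b=400)
```

In this hypothetical case the uncorrected estimate is 300/1000, whereas the corrected estimate is 300/1350, which shows how strongly allele-specific signal bias can affect a pooled MAF.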
Each SNP was analysed for transmission distortion within the trios using the transmission disequilibrium test (TDT) [53]. Analysis was performed using the GENEHUNTER program [54].
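For readers unfamiliar with the TDT, the following is a minimal sketch of the standard single-allele statistic, a McNemar-type χ² with one degree of freedom computed from heterozygous parents. It is not the GENEHUNTER implementation; the function name and the counts in the example are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Return (chi-square, P value) for the transmission disequilibrium test.

    transmitted / untransmitted: number of times heterozygous parents did or
    did not pass the allele of interest to an affected child.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p_value = tdt(60, 40)
```

Only heterozygous parents contribute, because a homozygous parent always transmits the same allele and is therefore uninformative.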
The HelixTree genetics analysis software (Golden Helix Inc, Bozeman, USA) was used to calculate linkage disequilibrium (LD) for the 12 SNPs analysed and to determine haplotypes in parents only. By using the Expectation/Maximisation (EM) algorithm, the software was able to calculate probabilities of each haplotype occurring, based on multi-locus genotypes. It calculated D' and r² values, which are accepted measures of LD, and performed a χ² comparison for each pair of SNPs. HaploBlockFinder [55] was used to establish the haplotype block structure across the CC chemokine gene cluster.
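The two LD measures can be sketched from their textbook definitions, given the frequency of one two-SNP haplotype and the two allele frequencies. This is an illustration only and does not reproduce the HelixTree implementation; the function name and example frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


# Hypothetical frequencies: allele A (0.3) is only ever seen with allele B (0.5).
d_prime, r2 = ld_measures(p_ab=0.3, p_a=0.3, p_b=0.5)
```

In this hypothetical case D' is at its maximum of 1 while r² is only about 0.43, which illustrates why both measures are usually reported: D' reflects the absence of historical recombination, whereas r² reflects how well one SNP predicts the other.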
TRANSMIT [56] was then used to analyse for haplotype transmission disequilibrium, estimating χ² values for individual haplotypes, as well as global χ² values for all haplotypes analysed. Mendelian transmission was checked using MERLIN [57].

Identification and analysis of SNPs in CC chemokine genes by DNA pool sequencing
The CC chemokine genes were scanned using DNA pool sequencing in order to identify common polymorphisms in the region, including the potential identification of novel SNPs, and to establish whether associations might exist for these SNPs. In all, 50 SNPs were identified across the CC chemokine gene cluster (a list of all variants identified is available in the additional file 2: 17qSuppTable2), 48 of which were known; novel SNPs were detected in CCL8 and CCL15. Five common SNPs within exons were confirmed, four of which cause codon changes. Twenty-nine SNPs had MAF >0.15 in the control pool, designated as 'common' here, and for which we had statistical power to detect associations for ERR >1.5.
The greater than 90% DNA sequence homology between CCL3 and CCL3L1, and between CCL4 and CCL4L1, made it impractical to design specific primers for each of these genes. Thus, these four genes were not sequenced.

Population genetics of individually genotyped markers
Based on their likely functional significance (codon changing, putative promoter region, reported functional significance) and ERR from the pooled DNA sequencing, 12 SNPs were chosen for genotyping in the MS cohort, either by SNaPshot or SNPlex™ (Table 1). The CCL8 205 SNP was originally chosen for analysis by SNPlex™, but failed to pass the SNPlex™ algorithm. Thus, it was replaced with the CCL8 -572 SNP, which is in LD with the original SNP (International HapMap Project [58]).
Genotype information retrieval for the seven SNPs genotyped using SNaPshot was 100%. Genotype information available for single-locus analysis for the five SNPs genotyped by SNPlex™ ranged from 84% to 87%; losses were due to failed genotyping and the exclusion of genotypes called with low stringency. The SNPlex™ genotyping, which is based on a highly multiplexed PCR, was quite sensitive to DNA quality. Unambiguous genotypes could be determined using the SNPlex™ calling algorithm, which we used at the highest stringency. The genotype frequencies for all SNPs conformed to Hardy-Weinberg equilibrium estimates, except for the CCL11 -488 SNP in mothers. The Mendelian error rate for the seven SNPs was ≤ 2%. Table 2 lists the MAF of the 12 markers as determined by individual genotyping.
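A Hardy-Weinberg check of the kind reported above can be sketched as a χ² goodness-of-fit test on genotype counts. The test actually used in the study is not specified, so the one-degree-of-freedom χ² form below is an assumption, and the function name and counts are hypothetical.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Return (chi-square, P value) for deviation from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # One degree of freedom: three genotype classes, minus one, minus one
    # estimated allele frequency.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts showing a deficit of heterozygotes.
chi2, p_value = hwe_chi2(30, 40, 30)
```

A deviation in one subgroup only, as seen here for a single SNP in mothers, can arise by chance when many SNP and subgroup combinations are tested, and can also signal genotyping error.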

Linkage disequilibrium and haplotypes
The HelixTree genetics analysis software (Golden Helix Inc, Bozeman, USA) was used to conduct a pair-wise analysis of LD across the 12 CC chemokine gene markers, using the parents only. This software calculated both D' and r² measures of LD, and the data are represented in Figure 2. These results were confirmed using the Haploview software package [59]. Results suggest a clear separation of the SNPs into two haplotype blocks of moderate to strong LD, confirmed by HaploBlockFinder [55] (represented in Figure 2). The first block extended from the CCL2 -2581 SNP to the CCL11 67 SNP; a distance of 33.1 kb. The second block extended from the CCL14 -649 SNP to the CCL23 -289 SNP, but did not include the CCL15 136+88 SNP, which interestingly was not in LD with any of its neighbouring SNPs. This block extended across 30.9 kb.
The HelixTree software was then used to construct haplotypes only across the two clear haplotype blocks, in parents only (350-650 individuals; actual numbers varied for each SNP); and to estimate haplotype frequencies. Table 2 lists the common haplotypes (frequency >0.05) across the two haplotype blocks and their frequencies within the unaffected parents. Four common haplotypes were identified in the 5' haplotype block, with frequencies between 0.18 and 0.33. In the 3' haplotype block, three common haplotypes were identified, but the TATA haplotype was the most common by a substantial margin (frequency = 0.76).
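Haplotype frequency estimation by EM can be illustrated for the simplest case of two biallelic SNPs, where only double heterozygotes have ambiguous phase. This is a sketch of the general technique, not the HelixTree implementation, which handles multi-locus genotypes; the function name, genotype coding and example data are hypothetical.

```python
def em_two_locus_haplotypes(genotypes, max_iter=200, tol=1e-12):
    """Estimate two-SNP haplotype frequencies from unphased genotypes by EM.

    Each genotype is a pair (g1, g2) giving the count (0, 1 or 2) of the
    allele coded '1' at SNP 1 and SNP 2. Haplotypes are keyed as (a1, a2).
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    fixed = dict.fromkeys(haps, 0.0)  # counts from phase-known genotypes
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase ambiguous: 00/11 or 01/10
            continue
        alleles1 = (g1 // 2, (g1 + 1) // 2)  # 0 -> (0,0), 1 -> (0,1), 2 -> (1,1)
        alleles2 = (g2 // 2, (g2 + 1) // 2)
        for hap in zip(alleles1, alleles2):
            fixed[hap] += 1.0
    total = 2.0 * len(genotypes)
    freqs = dict.fromkeys(haps, 0.25)
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote carries 00 and 11.
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        counts = dict(fixed)
        counts[(0, 0)] += n_double_het * w
        counts[(1, 1)] += n_double_het * w
        counts[(0, 1)] += n_double_het * (1.0 - w)
        counts[(1, 0)] += n_double_het * (1.0 - w)
        new_freqs = {h: counts[h] / total for h in haps}
        change = max(abs(new_freqs[h] - freqs[h]) for h in haps)
        freqs = new_freqs
        if change < tol:
            break
    return freqs


# Hypothetical data: two homozygotes and one double heterozygote.
freqs = em_two_locus_haplotypes([(0, 0), (2, 2), (1, 1)])
```

In this toy data set the phase-known individuals carry only the 00 and 11 haplotypes, so the EM iterations assign the double heterozygote almost entirely to the 00/11 configuration.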

Analysis for transmission distortion across single markers and haplotypes
Single-locus transmission disequilibrium
Analyses for transmission distortion were undertaken for the 12 chemokine gene markers in all of the trios, as well as subgroups stratified based on HLA-DRB1*1501 status, disease course and gender. Results are listed in Table 3.

Haplotype TDT
Whilst single-locus analyses did not suggest that any of the SNPs were significantly associated with MS susceptibility, it is conceivable that they may define MS-associated haplotypes. Using the TRANSMIT [56] software, haplotype transmission was evaluated for common haplotypes (>5% frequency) of decreasing size from either block, as well as pairwise analyses across all twelve SNPs. Analyses were conducted only where genotyping information was available for all individuals across all 12 SNPs (n = 162 families), with additional trio families removed after identification of Mendelian inconsistencies. Table 4 lists common haplotypes for which transmission distortion was identified. All of these findings were identified within the 5' haplotype block (CCL2 -2581.CCL2 -2138.CCL11 -488.CCL11 67). Marginally significant transmission distortion was found for two four-marker haplotypes, and several three-marker haplotypes (P ≤ 0.05). Marginally significant transmission distortion was also identified for pairwise analyses for CCL2 -2581.CCL2 -2138 (P(uncorrected) = 0.05) and CCL2 -2138.CCL11 -488 haplotypes (P(uncorrected) = 0.04). No significant transmission distortion was found for haplotypes across the 3' haplotype block.

Figure 2: LD plot of CC chemokine gene cluster SNPs.

Verification of haplotype results in an independent cohort
Independent validation of the results discussed above was sought in an independent cohort of 208 Australian MS trio families obtained from the Southern MS Genetics Consortium. It was determined that genotyping the CCL2 -2138A>T and CCL11 -488C>A SNPs was sufficient for information extraction across the four markers within the 5' haplotype block.
Genotype and minor allele frequencies for both SNPs were equivalent to those obtained for our cohort, and were in Hardy-Weinberg equilibrium. LD properties between the two markers were also in agreement with our results. No evidence for single-locus transmission distortion was identified for either SNP, supporting our original findings. Full haplotype transmission data could be obtained for 169 trios; losses were due to genotyping failure for either SNP. Two-marker haplotype analysis revealed slight transmission distortion for the A.C and T.C haplotypes in the same direction as our original findings, but the distortion was not significant (Table 4(b)). However, combining the total datasets of the two cohorts restored the original trends for transmission distortion for the A.C and T.C haplotypes (P = 0.05) (Table 4(c)).

Discussion
In this study, we have analysed the members of the CC chemokine gene cluster for association with MS. Variant chemokine expression could diminish or enhance the inflammatory response characteristic of MS. The majority of CC chemokine genes are located in a chromosomal region (17q11.2-12) that has shown suggestive linkage and association with MS in a number of genome wide screens [29][30][31][32][33][34]. Most recently, in a large genome wide linkage screen, no genome-wide significant results could be identified beyond the MHC [60]. However, suggestive linkage was identified on chromosome 17q23. An important deduction from this work was that linkage studies are under-powered to detect modest associations even in large cohorts.
We utilised a two-stage approach for this study. The first stage was sequencing of the 17q11.2-12 CC chemokine gene cluster in four DNA pools, three composed of MS patients and one of controls. This approach allowed estimation of MAF for all common (MAF >0.15) SNPs across the CC chemokine genes in our Australian Caucasian cohort, and identification of two novel SNPs. This technique has been validated by our group [61]. Twelve SNPs were chosen for further analysis by individual genotyping, based on their likely functional significance (codon changing, location in the putative promoter region), prevalence (MAF >0.1), and/or uneven representation between the MS and control pools.
Overall, a large amount of genetic data has been gathered from pooled DNA sequencing and individual genotyping. The pooled sequencing provided a close estimate of MAF for SNPs across the 17q chemokine gene region, and verified the existence of common SNPs published in online databases. The individual genotyping data from 350-650 unaffected parents allowed the determination of MAF for the 12 SNPs chosen for further study, establishment of LD structure across the region, and calculation of haplotype frequencies in our unaffected Australian cohort. The MAF data facilitate power calculations for additional disease-susceptibility studies and, together with the haplotype data, should prove useful in studies of population variation for these immunologically important genes. From individual genotyping, MAF were established for the 12 markers in a large unaffected cohort (up to 600 unaffected parents), whilst LD analysis revealed the haplotype block structure across the cluster, and haplotype frequencies were also established.
Measurement of transmission distortion for each of the 12 SNPs in MS trios provided some evidence for over-transmission of several of the SNPs after stratification, but these trends would not survive a conservative correction for multiple comparisons.
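The effect of a conservative correction can be made concrete with a Bonferroni sketch. The number of tests is an assumption for illustration: 12 is used here as one test per SNP, whereas stratification by HLA-DRB1*1501 status, disease course and gender would increase the number of comparisons further. The P values in the example are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (raw P, adjusted P, significant?) for each test after Bonferroni correction."""
    m = len(p_values)
    threshold = alpha / m  # per-test significance threshold
    return [(p, min(1.0, p * m), p <= threshold) for p in p_values]


# Hypothetical: one marginal result (P = 0.04) among 12 single-SNP tests.
results = bonferroni([0.04] + [0.5] * 11)
raw_p, adjusted_p, significant = results[0]
```

With 12 tests the per-test threshold falls to roughly 0.004, so an uncorrected P value of 0.04 corresponds to an adjusted value of about 0.48 and would not be declared significant.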
The CCL5 -471C>T SNP, which we found might be associated with MS in HLA-DRB1*1501 negative patients, is of potential functional relevance, as it creates a new transcription factor binding site [41], and has shown associations with atopic dermatitis, atopy and asthma [41,42]. CCL5 is an important inflammatory chemokine, with a range of activities upon eosinophils [62], monocytes and activated T cells [63,64], and has been identified repeatedly in the CNS and CSF of MS patients [19,21,22]. Our data also identify CCL14 and CCL23 as warranting further investigation, with marginally significant (uncorrected) trends towards transmission distortion found in RR-MS and SP-MS patients for CCL14 -649T>A, and in males for CCL23 -289A>C. Little is known for either of these chemokines, beyond basic functionality, and neither has been studied with regard to MS pathogenesis. The CCL23 -289A>C SNP was found to affect a potential SP-1 binding site using the TRANSFAC database [65].
Upon analysing for haplotype transmission distortion across the CC chemokine gene cluster, we found suggestive evidence for association of haplotypes encompassing CCL2 and CCL11 (Table 4), which lie within a haplotype block spanning 33.1 kb. CCL2 has been implicated in MS pathogenesis. It is chemotactic for T cells and monocytes [66,67], is important in the induction of inflammation in EAE [13], has been identified in MS lesions [17][18][19], and in contrast to EAE studies has been associated with remission of MS [68].
Recently (and subsequent to the completion of our study), a similar study of polymorphisms across the CC chemokine gene cluster was conducted by Vyshkina et al [46]. In this study, 31 SNPs derived from online databases were selected for genotyping in a variety of individuals. The basis of selection of SNPs was not discussed. This is in contrast to our approach, in which an informed decision for SNP selection was based on several criteria, discussed above, with a particular focus on SNPs with potential functional consequences. The SNPs assayed by Vyshkina et al [46] were predominantly non-coding, with no SNPs from putative promoter regions. Fifteen SNPs overlapped with those identified in our pooled DNA sequencing. Two SNPs, both exonic, were individually genotyped in our study; the remainder did not pass the first stage. One exonic SNP was excluded as it was not common, whilst an exonic SNP in CCL4 was not analysed as this gene was excluded from our study. Similar to our study, Vyshkina et al [46] found no strong evidence for association with any single locus. In addition, we identified similar LD structure surrounding CCL2.CCL11 and between CCL14 and CCL23, and identified haplotypic associations for the CCL2.CCL11 haplotype block. Whilst the SNPs analysed in this block were different between the two studies, it might be assumed that they are subject to the LD within the block. Thus, whilst we cannot specifically say that we have replicated the haplotype associations of Vyshkina et al [46], we would suggest that the determination of haplotype tagging SNPs in this block is justified. It was interesting that our TDT analysis was conducted entirely in sporadic MS cases, whilst that of Vyshkina et al [46] had an emphasis on familial MS, yet both studies found weak association for the CCL2.CCL11 haplotype block with MS, suggestive of a general MS susceptibility factor within this block. This group has since verified their results in a second-phase study [69].
Thus, in a combined population totalling almost 1000 MS families (331 Australian; 644 North American), the results implicate a haplotype association encompassing the CCL2.CCL11 markers.

Conclusion
We have conducted a two-stage analysis of polymorphisms across the CC chemokine gene cluster. We identified novel SNPs in this region, and added further information to the data available on LD structure and haplotypes across the cluster. Interestingly, our best single-locus findings were for promoter SNPs (CCL5 -471C>T, CCL14 -649T>A, CCL23 -289A>C), which might affect the relative expression of these chemokines and lead to downstream effects on leukocyte migration to the CNS, and therefore influence MS pathogenesis. We also confirmed the potential presence of a haplotypic association across CCL2 and CCL11. Further validation of the association of these SNPs in independent cohorts, and confirmation of their functional significance, would support therapeutic targeting of these chemokines and their receptors.