The value of some Corsican sub-populations for genetic association studies

Background Genetic isolates with a history of a small founder population, long-lasting isolation and population bottlenecks represent exceptional resources for the identification of disease genes. In these populations the disease allele shows Linkage Disequilibrium (LD) with markers over significant genetic intervals, thereby facilitating disease locus identification. In a previous study we examined the extent of LD in the Xq13 region in three Corsican sub-populations from the inner mountainous region of the island. On the basis of those results we proposed a multistep procedure for studies aimed at identifying genes involved in complex diseases in Corsica. Prerequisites for the proposed multistep procedure are the presence of different degrees of LD on the island and a common genetic derivation of the different Corsican sub-populations. To evaluate whether these conditions hold, in the present paper we extended the analysis to the Corsican coastal populations. Methods Samples were analyzed using seven dinucleotide microsatellite markers on chromosome Xq13-21: DXS983, DXS986, DXS8092, DXS8082, DXS1225, DXS8037 and DXS995, spanning approximately 4.0 cM (13.3 Mb). We also investigated the distribution of the DXS1225-DXS8082 haplotype, which has recently been proposed as a good marker of population genetic history due to its low recombination rate. Results The results obtained indicate a decrease of LD on the island from the central mountainous sub-populations toward the coastal sub-populations. In addition, the analysis of the DXS1225-DXS8082 haplotype revealed: 1) the presence of a particular haplotype with high frequency; 2) the derivation from a common genetic pool of the sub-populations examined in the present study. Conclusion These results indicate that the Corsican sub-populations are useful for the fine mapping of genes contributing to complex diseases.


Background
Isolates have been of considerable use in genetic studies aimed at identifying mutations underlying rare diseases [1]. Moreover, isolated populations also afford several advantages in unravelling the genetics of complex diseases [2,3]. The identification of genes involved in the pathogenesis of multifactorial diseases would contribute towards a better understanding of the physiopathology of these conditions. Furthermore, the prevention and development of new therapeutic approaches would be significantly enhanced.
Association studies are critically dependent on the extent of Linkage Disequilibrium (LD). Several reports have underlined how LD is more extended in founder populations [4]. LD extended over large regions increases the power of association studies, since the number of markers to be analyzed is at least 30% lower than in outbred populations [4].
The genetic homogeneity found in isolated populations is a great advantage in the identification of large genomic regions containing the disease-associated locus, while fine mapping could require a recently expanded population.
The aim of this paper is to evaluate some Corsican sub-populations as candidates for identifying genes implicated in complex diseases. The genetic structure of the Corsican population has been studied by several authors [5][6][7][8][9]. Corsica is the fourth largest island in the Mediterranean Sea (after Sicily, Sardinia and Cyprus) (figure 1). It is located southwest of Italy, southeast of France, and north of the island of Sardinia. Corsica has 1000 km of coastline and is very mountainous, with Monte Cinto as the highest peak at 2706 m and 20 other summits of more than 2000 m. Low exogamy and migration rates, coupled with the existence of geographic barriers, have generated a strong genetic drift on the island [7][8][9].
Corsica has been invaded several times (Greeks, Carthaginians, Romans, Vandals, Byzantines, Saracens, Pisans, Genoese, Austrians, English and French). In the great majority of cases, these invasions were limited to the coastal areas and left slight marks on the gene pool of the native populations [6]. Strong evidence also suggests an internal microgeographic diversity, with the most conserved population located in the mountainous regions at the center of the island [6], as also reflected in several dialectal linguistic subdivisions of a basic language similar to the Tuscan dialect, plus remains of an archaic substratum shared by Corsica and Sardinia [10]. The island is separated from Sardinia by the Strait of Bonifacio.
The nearby isolated founder population of Sardinia has been extensively investigated and has provided a marked contribution towards the mapping of rare monogenic and complex diseases [3,11,12]. We proposed Corsica and Sardinia as a unique opportunity to study genes involved in complex diseases. This idea was based on the fact that the populations of these two islands both derive from a single ancestral population that migrated to Corsica and Sardinia during the last ice age. The genetic proximity between the two islands has already been emphasized elsewhere [13,14]. In a previous study we examined the extent of LD in three central Corsican sub-populations (Niolo, Corte and Bozio) using 7 microsatellite markers in the Xq13-21 region. Our results displayed a high degree of LD for the sub-populations of Bozio and Niolo and a lower degree for Corte [15]. We previously proposed a multistep procedure involving LD mapping in populations characterized by different degrees of LD extension but sharing a common founder population [15,16]. The proposed procedure involved: a) identification of a large genomic region containing the disease-associated locus in a small sub-isolated population of the central region of Corsica, and replication in a small sub-isolated population of central Sardinia; b) mapping in populations inside Sardinia and Corsica that show a weaker degree of LD extension; c) fine mapping in open populations in which the extent of LD is low, such as the general population of Sardinia and Corsica; d) final mapping in an open population in which the level of background LD is very low, such as the African population [15]. In the present study we report the LD pattern of Xq13-21 microsatellite markers in a Corsican coastal sub-population in order to assess the presence of a decrease of LD extension on the island. We have also examined the distribution of the DXS1225-DXS8082 haplotypes [17,18] in the four Corsican sub-populations investigated.

Figure 1 Corsica island. Map labels: Bozio, Niolo.

Methods
In a previous paper we studied the extent of LD in three sub-populations located in the central mountainous area (Bozio, Niolo and Corte) [15]. Here we present new data on a sample of the Corsican coastal population from the Ajaccio (n = 50) and Bonifacio (n = 32) areas. Finally, we evaluated a pooled sample of the different sub-populations.
The use of microsatellite markers in LD mapping is being superseded by SNPs. Nevertheless, microsatellite markers remain a valuable tool for the first screening of background LD in a given population.
DNA samples were collected from unrelated male individuals. These samples were analyzed using seven dinucleotide microsatellite markers on chromosome Xq13-21: DXS983, DXS986, DXS8092, DXS8082, DXS1225, DXS8037 and DXS995 [12], spanning approximately 4.0 cM (13.3 Mb). The analysis of this region has been widely used as a measure of background LD in a given population and to compare the levels of LD between populations [2,18,19,20,21].
Microsatellites were analysed using an ABI prism 377 DNA analyser. Genotypes were processed by Genescan v3.1 and Genotyper v2.5 software. A DNA standard, consisting of the CEPH control individual number 1347.02 (Applied Biosystems) was incorporated in all the runs to verify accuracy of typing.
The non-random allelic association between pairs of microsatellite loci was tested by an extension of Fisher's exact test on contingency tables. p values were corrected by the step-down Holm-Sidak procedure, with the formula $p_{\mathrm{corrected}} = 1 - (1 - p)^{n}$, where n is the number of p values smaller than or equal to that being corrected [22]. Each corrected p value was considered significant when < 0.05.
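As an illustration of the correction step, the following is a minimal Python sketch of a step-down Holm-Sidak adjustment. It is not the authors' code. The function names and the example p values are hypothetical, and the exponent follows the conventional step-down form (the number of hypotheses still under test, i.e. m - i + 1 for the i-th smallest of m p values), which is an assumption that should be checked against the definition of n given above and against [22].

```python
def sidak(p, n):
    """Sidak-type correction of a single p value: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n


def holm_sidak(p_values):
    """Return step-down Holm-Sidak adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Assumption: n is the number of hypotheses still under test.
        n = m - rank
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, sidak(p_values[idx], n))
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical raw p values for four marker pairs.
p_raw = [0.001, 0.04, 0.03, 0.2]
p_adj = holm_sidak(p_raw)
significant = [p < 0.05 for p in p_adj]
```

With these hypothetical inputs only the first marker pair remains significant at the 0.05 threshold after correction.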
In order to compare LD strength between samples, we used the multiallelic normalized disequilibrium coefficient D' for each pair of marker loci [23]. As a small sample size may lead to overestimation of D' values, these were calculated after correction by means of a bootstrap procedure [24]. Graphical displays of LD between microsatellite pairs were produced with the GOLD software http://www.sph.umich.edu/csg/abecasis/GOLD/.
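The D' calculation can be sketched in Python as follows. This is a hedged illustration, not the implementation used in the study. It assumes the commonly used weighted-sum form of the multiallelic coefficient, D' = Σ_i Σ_j p_i q_j |D_ij / D_max,ij|, and haplotypes of known phase, as is the case for X-linked markers typed in males. The function name and the allele sizes are hypothetical, and the bootstrap correction for small sample size is not included.

```python
from collections import Counter


def multiallelic_d_prime(haplotypes):
    """Weighted multiallelic D' for a list of (allele_a, allele_b) haplotypes."""
    n = len(haplotypes)
    # Observed haplotype and single-locus allele frequencies.
    hap_freq = {h: c / n for h, c in Counter(haplotypes).items()}
    p = {a: c / n for a, c in Counter(a for a, _ in haplotypes).items()}
    q = {b: c / n for b, c in Counter(b for _, b in haplotypes).items()}

    total = 0.0
    for a, pa in p.items():
        for b, qb in q.items():
            # Disequilibrium for this allele pair.
            d = hap_freq.get((a, b), 0.0) - pa * qb
            # Maximum possible |D| given the allele frequencies and sign of D.
            if d < 0:
                d_max = min(pa * qb, (1 - pa) * (1 - qb))
            else:
                d_max = min(pa * (1 - qb), (1 - pa) * qb)
            if d_max > 0:
                total += pa * qb * abs(d) / d_max
    return total


# Hypothetical data: complete association versus no association.
complete = [(216, 225)] * 5 + [(218, 221)] * 5
independent = [(216, 225), (216, 221), (218, 225), (218, 221)]
d_complete = multiallelic_d_prime(complete)
d_independent = multiallelic_d_prime(independent)
```

In this sketch complete association between the two loci gives D' = 1 and independent assortment gives D' = 0, which matches the colour scale used for the graphical displays.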
The genetic differentiation test was carried out with Genepop 3.3 (an updated version of the Genepop 1.2 software). The method, based on the distribution of alleles in the various populations, is described by Raymond and Rousset (1995) [25]. The unbiased estimate of the p value was obtained using a Markov chain method. The genetic differentiation test was applied to all pairs of populations.
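The logic of a pairwise differentiation test at a single locus can be illustrated with a simple label-permutation test on the allele count table. This is a swapped-in technique chosen for brevity, not Genepop's Markov chain exact test, and it will not reproduce Genepop's p values. The function names, the choice of the G statistic, the number of permutations and the example data are all assumptions.

```python
import math
import random
from collections import Counter


def g_statistic(pop_a, pop_b):
    """Log-likelihood ratio (G) statistic of the 2 x k allele count table."""
    counts_a, counts_b = Counter(pop_a), Counter(pop_b)
    n_a, n_b = len(pop_a), len(pop_b)
    total = n_a + n_b
    g = 0.0
    for allele in set(counts_a) | set(counts_b):
        col = counts_a[allele] + counts_b[allele]
        for obs, row in ((counts_a[allele], n_a), (counts_b[allele], n_b)):
            if obs > 0:
                g += 2.0 * obs * math.log(obs * total / (row * col))
    return g


def permutation_p_value(pop_a, pop_b, n_perm=2000, seed=0):
    """Estimate the p value for H0: identical allele distribution in both samples."""
    rng = random.Random(seed)
    observed = g_statistic(pop_a, pop_b)
    pooled = list(pop_a) + list(pop_b)
    hits = 0
    for _ in range(n_perm):
        # Reassign alleles to the two samples at random, keeping sample sizes.
        rng.shuffle(pooled)
        permuted = g_statistic(pooled[:len(pop_a)], pooled[len(pop_a):])
        if permuted >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical allele sizes at one locus in two samples.
p_diff = permutation_p_value([216] * 20, [218] * 20)
p_same = permutation_p_value([216, 218] * 10, [216, 218] * 10)
```

Two samples with no alleles in common yield a very small p value, whereas two samples with identical allele counts yield a p value of 1.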
An unbiased estimate of gene diversity, together with its variance over the r loci analysed, was calculated according to Nei [26]. Gene diversity is often referred to as expected heterozygosity and is defined as the probability that two randomly chosen alleles from the population are different.
The study complies with the Declaration of Helsinki and received ethical approval from the University of Corte. Each subject, of Corsican ancestry up to grandparents, gave informed consent.
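For reference, the per-locus estimate can be sketched in Python. The sketch assumes the standard form of Nei's unbiased estimator, H = n/(n - 1) · (1 - Σ p_i²), where n is the number of sampled chromosomes and p_i is the frequency of the i-th allele. This form is an assumption, since the equations are not reproduced in the text. The function names and allele sizes are hypothetical, and the variance is not computed here.

```python
from collections import Counter


def gene_diversity(alleles):
    """Unbiased gene diversity: n/(n-1) * (1 - sum of squared allele frequencies)."""
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two sampled alleles are required")
    sum_sq = sum((count / n) ** 2 for count in Counter(alleles).values())
    return n / (n - 1) * (1.0 - sum_sq)


def mean_gene_diversity(loci):
    """Average the per-locus gene diversity over a list of loci."""
    values = [gene_diversity(alleles) for alleles in loci]
    return sum(values) / len(values)


# Hypothetical allele sizes at two loci typed in four chromosomes.
locus_1 = [216, 216, 218, 218]
locus_2 = [225, 225, 225, 225]
h_1 = gene_diversity(locus_1)
h_mean = mean_gene_diversity([locus_1, locus_2])
```

For `locus_1`, four of the six possible pairs of sampled alleles differ, so the estimate is 2/3, in line with the definition of gene diversity as the probability that two randomly chosen alleles are different.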

Results
The extent of linkage disequilibrium was assessed in each Corsican sub-population using seven markers on Xq13-21 and two different approaches: Fisher's exact test was used to evaluate the statistical significance of the allelic association between all pairs of loci, and Lewontin's coefficient D' was applied to assess the strength of LD. Table 1 reports the significance of non-random allelic association between pairs of microsatellite loci, based on Fisher's exact test, together with the D' values. Findings obtained for the previously analyzed sub-populations from Bozio, Niolo and Corte [15] and those from the coastal population sample are reported. Furthermore, an analysis was performed to evaluate LD in a pooled sample in order to mimic a random sample of the South Central Corsican population as a whole.
The coastal population showed a low level of LD, rather similar to the values obtained for Corte. In both sub-populations only one marker pair (DXS8082-DXS1225) displayed a significant p value following correction for multiple testing, with a high D' value (p = 0.000, D' = 0.636 and p = 0.000, D' = 0.622, respectively). This marker pair has shown a strong association in all the populations analyzed so far [18].
The lowest levels of D' were shown by the pooled Corsican sample. Only the DXS8082-DXS1225 pair showed a corrected D' exceeding 0.5. Nevertheless, significant values were revealed in four marker pairs by means of Fisher's exact test.
The Bozio sub-population exhibited the highest level of LD in Corsica, followed by Niolo [15]. Based on the D' values reported in Table 1, the graphical representation of the patterns of LD in the different sub-populations shows a decreasing extension of LD with distance from Bozio and Niolo towards Corte and the coastal populations (figure 2). Above all, the pooled Corsican sample shows a reduced LD extension, with low D' values (blue and dark blue areas) spread over most of the physical map. Table 2 shows the results of the differentiation test. The null hypothesis was that the allele distribution was identical across sub-populations. A high level of differentiation was observed among all sub-population pairs, with the exception of Niolo and Bozio, which were undifferentiated (p = 0.17), in agreement with previous results [15].
The distribution of DXS1225-DXS8082 haplotypes was also analyzed (Table 3). The most frequent haplotype in the Corsican sub-populations was 216-225, followed by the 218-225 haplotype. The first haplotype was frequently observed in the sub-populations of Niolo (31%) and Bozio (25%) as well as in the coastal sample (22%), whereas the second was more frequent in Corte (20%). The 210-219 haplotype, reported by Laan [18] as the most widely represented haplotype in Europe, was absent from the Corsican samples under investigation, implying a very low frequency or its absence in the Corsican population.
In addition, we analysed the number of alleles and the gene diversity estimates in each sub-population and in the pooled sample. None of the samples showed a mean gene diversity different from that of the other sub-populations (Table 4). Moreover, Table 4 shows that the average number of alleles is comparable in each sub-population analysed, while the number of alleles increases in the pooled sample (mean = 11.29). Therefore, different patterns of alleles are present in the different sub-populations, contributing to the observed differentiation.

Discussion
The findings of the present study illustrate a decline in the extent of LD in the coastal Corsican population compared to the inner mountainous regions of Niolo and Bozio. We have also shown the presence of a high frequency of a particular DXS1225-DXS8082 haplotype in most of the Corsican sub-populations examined, implying a common genetic origin (with the possible exception of Corte). The decrease observed in the level of LD in the coastal population (figure 2) implies the suitability of this sub-population for use in association studies based on a multistep procedure [15].
In order to mimic a random South Central Corsican sample we also analyzed the LD level in the Xq13-21 region in a pool of Corsican samples. LD extent drops dramatically in the pooled sample (figure 2), making a general Corsican sampling unsuitable for long range mapping. In agreement with this result, the analysis of the LD level in 50 random samples drawn from the pooled Corsican sample provided D' values similar to those of the unselected pooled sample (data not shown). The judgment on the extent of LD in a population should take into account both the D' values and the significance of the allelic association [27].

Figure 2 Patterns of LD in the Corsican sub-populations, based on the D' values in Table 1 (panels include: Coastal population, Pooled sample). Colours reflect corrected D' values from red (D' = 1) to deep blue (D' = 0). Patterns of LD for the Bozio, Niolo and Corte sub-populations appear slightly different from the distributions shown in Latini et al. [15], due to the updated Ensembl physical map.
The presence of a high frequency of the same DXS1225-DXS8082 haplotype (216-225) both in the coastal sub-population sample and in the sub-populations from the Niolo and Bozio regions suggests their origin from a common genetic pool. Because recombination events between these two markers are extremely rare, new variants arise mainly from a mutation at one of the two loci. The footprint of a demographic event persists for a longer period in the haplotype distribution within a region characterized by a low crossing-over rate than in a single marker or between actively recombining markers [18]. Nevertheless, it is worth noting that the most frequent haplotype in the Corte sub-population is 218-225, which is only one dinucleotide repeat away from 216-225.
All the sub-populations examined show a strong degree of differentiation according to the differentiation test, with the exception of Bozio and Niolo, in agreement with previous results [15]. These data are also in agreement with the well documented microgeographic differentiation in Corsica [6]. The apparent discrepancy between the haplotype distribution and the differentiation test is best explained by the evolutionary forces causing microdifferentiation. In fact, microdifferentiation is often due to bottlenecks and genetic drift from a common genetic pool, which change gene frequencies and thus restrict the number of haplotypes in the isolated populations compared to the coastal sample.
The average gene diversity estimate for each Corsican sub-population was quite similar to values reported for other populations and indicates that they possess similar homogeneous genetic architectures. The pattern of gene diversity obtained for all single markers showed no clear correlation with the extent of LD, thus suggesting that the increase in linkage disequilibrium is probably not due to selection, but rather to the demographic history of the studied populations [17].
The island of Corsica is geographically very close to the other Mediterranean island of Sardinia. The isolated founder Sardinian population has been extensively investigated and has provided a marked contribution towards the mapping of rare monogenic and complex diseases [1,3]. The Corsican and Sardinian populations derive from a common genetic founding pool and have been subjected over the centuries to similar evolutionary forces, such as isolation, consanguinity, and bottlenecks caused by famine and epidemics [5,6,13]. Moreover, the two populations share a similar dietary regime and climate (Mediterranean). Based on these considerations, it is reasonable to suggest that the two island populations may have selected the same kinds of alleles associated with specific common diseases. The distribution of the DXS1225-DXS8082 haplotypes highlights the genetic peculiarity of the populations of Corsica and Sardinia, thereby confirming previous data [13,14]. Indeed, the 210-219 haplotype, the most frequent in Europe [18], is also rarely observed in Sardinia, where the most frequently represented haplotype is 216-221 (Zavattari, personal communication). This haplotype, which has not been reported in the European population, likely stems from the 216-225 Corsican haplotype.
The most frequent haplotypes of microsatellite loci DXS1225-DXS8082 in the Corsican sub-populations under investigation (Table 3) have not been reported in other populations to date (see Laan et al. [18]). Nevertheless, we cannot exclude that these haplotypes are present in the other populations examined, or that the absence of some haplotypes from our samples is due to the reduced sample size and/or to a founder effect.

Conclusion
Taken together, these data suggest the suitability of the Corsican sub-populations as candidates for identifying genes underlying complex diseases by means of LD association studies performed according to a multistep procedure, as described previously [15]. As previously proposed [15], we confirm that the joint study of Corsica and Sardinia would be a unique opportunity to study genes involved in complex diseases, according to a multistep procedure based on the presence of a decrease of LD extension within the sub-populations of the two islands. Among the possible target diseases, cardiovascular diseases have a high incidence in Corsica [28] (http://www.fnors.org/asp/travaux/accueil.htm) and could therefore represent good candidates for association studies.
Lastly, it should be underlined that particular caution should be taken when using "isolated populations", such as those of the Sardinian and Corsican islands as a whole, for the purpose of performing long range LD mapping, due to the possibility that microdifferentiation may limit LD extension, as reported here for the Corsican population.