Enhanced genetic maps from family-based disease studies: population-specific comparisons
© He et al; licensee BioMed Central Ltd. 2011
Received: 13 August 2010
Accepted: 19 January 2011
Published: 19 January 2011
Accurate genetic maps are required for successful and efficient linkage mapping of disease genes. However, most available genome-wide genetic maps were built using only small collections of pedigrees, and therefore have large sampling errors. A large set of genetic studies genotyped by the NHLBI Mammalian Genotyping Service (MGS) provides appropriate data for generating more accurate maps.
We collected a large sample of uncleaned genotype data for 461 markers generated by the MGS using the Weber screening sets 9 and 10. This collection includes genotypes for over 4,400 pedigrees containing over 17,000 genotyped individuals from different populations. We identified and cleaned numerous relationship and genotyping errors, and verified the marker orders. We used this dataset to test for population-specific genetic maps, and to re-estimate the genetic map distances with greater precision; standard errors for all intervals are provided. The map-interval sizes from the European (or European descent), Chinese, and Hispanic samples are in good agreement with each other. We found one map interval on chromosome 8p with a statistically significant size difference between the European and Chinese samples, and several map intervals with significant size differences between the African American and Chinese samples. When comparing Palauan with European samples, a statistically significant difference was detected at the telomeric region of chromosome 11p. Several significant differences were also identified between populations in chromosomal and genome lengths.
Our new population-specific screening set maps can be used to improve the accuracy of disease-mapping studies. As a result of the large sample size, the average length of the 95% confidence interval (CI) for a 10 cM map interval is only 2.4 cM, which is considerably smaller than on previously published maps.
Genetic maps are the foundation of linkage mapping for disease genes. Accurate genetic maps can greatly increase the power of a linkage study, especially for multipoint analysis. The accuracy of genetic maps is largely a function of the number of actual recombination events present in the data. Despite the importance of precise genetic maps for linkage studies, most genome-wide genetic maps [2–7] were built using a small collection of pedigrees comprising only the eight largest families (188 meioses total) in the Centre d'Etude du Polymorphisme Humain (CEPH) reference panel. Therefore, the 95% confidence interval (CI) for a 10 cM map interval from this small sample is large, at least 9.1 cM. deCODE Genetics constructed a substantially improved genetic map by genotyping 146 nuclear families containing 1,257 meioses [9, 10]. However, primarily because the grandparents of these small families from Iceland were not genotyped, the average number of informative meioses is only approximately 400, leading to an average 95% CI of 6.1 cM for a 10 cM map interval.
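The dependence of CI width on the number of informative meioses can be illustrated with a rough calculation. The sketch below is not the calculation used for the figures quoted above; it assumes a simple binomial standard error for the recombination fraction, the Kosambi map function, and a normal-approximation interval, so its results should be of the same order as the quoted values without necessarily matching them. The function names are illustrative.

```python
import math

def kosambi_theta(d_cm):
    """Recombination fraction for a map distance in cM (Kosambi map function)."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)

def kosambi_cm(theta):
    """Map distance in cM for a recombination fraction (inverse Kosambi)."""
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))

def ci_width_cm(n_meioses, d_cm=10.0, z=1.96):
    """Approximate width (cM) of the 95% CI for an interval of d_cm.

    Assumption: every meiosis is fully informative, so the estimated
    recombination fraction has binomial variance theta * (1 - theta) / n.
    """
    theta = kosambi_theta(d_cm)
    se = math.sqrt(theta * (1.0 - theta) / n_meioses)
    lo = max(theta - z * se, 0.0)
    hi = min(theta + z * se, 0.499)  # keep the inverse map function finite
    return kosambi_cm(hi) - kosambi_cm(lo)

# Meiosis counts mentioned in the text for the CEPH-based and deCODE maps.
for n in (188, 400):
    print(n, round(ci_width_cm(n), 1))
```

Because the width shrinks roughly with the square root of the number of meioses, a several-fold increase in sample size is needed to halve the interval.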
Several studies have shown that the use of inaccurate genetic maps during linkage analysis can reduce power and induce bias in the results [11, 12]. These effects are more pronounced for analyses using sex-specific maps, since they are each based on only half the meiotic count of sex-averaged maps, and therefore sampling errors pose an even greater problem. Halpern and Whittemore showed that when distances from different maps were used in a multipoint analysis of prostate cancer, significantly different results could be produced.
We have used existing genotype data from 26 disease studies to generate improved genetic maps. The NHLBI Mammalian Genotyping Service (MGS) performed genome-wide linkage genotyping for hundreds of genetics studies using the Weber screening panels, with markers roughly evenly spaced along each chromosome at intervals of about 10 cM. The genotypes that they generated for these studies are appropriate for the construction of more accurate maps. Here, we describe the construction of high-precision sex-averaged and sex-specific genetic maps utilizing genotypes from over 4,000 pedigrees that were previously genotyped by the MGS. Constructing genetic maps on a large collection of general pedigrees is extremely computationally demanding, especially in the presence of genotype errors, missing data, and multiple ethnicities. We have effectively analyzed this large heterogeneous data collection, either by using joint analyses or by combining the results from individual datasets. Our datasets were derived from different self-described populations: European/European descent, Chinese, Hispanic, African American, and Palauan. There are suggestions that the distribution of recombination may vary among some populations [15–17]. Therefore, genotypes from different ethnic groups were also evaluated separately to test whether we could detect population-specific distributions of recombination, and to produce population-specific genetic maps for the populations for which we had sufficient data.
Table 1. A list of projects that contributed data. Populations listed include Hispanic (Costa Rica); European (Europe, USA, Canada); and European (Europe, USA), Chinese.
Table 2. Summary of our cleaned data in five populations.
For rigorous quality control, we requested uncleaned genotype data and corresponding family relationship information from the PIs, and we performed thorough data cleaning. While we requested uncleaned genotype data so that we could apply an identical cleaning protocol to all the data sets, the primary studies of these data applied their own rigorous data cleaning steps prior to their own analyses. We evaluated the amount of missing genotype data per study and per marker to help ensure that none of the studies or markers were especially poorly genotyped. We identified and cleaned pedigree relationship errors first, followed by genotyping and gender-assignment errors. Pedigree relationship errors can result from different sources, such as undisclosed adoptions, misattributed paternity, sample mix-ups, and incorrect family history, all of which can lead to inaccurate results in linkage analyses. Genome scan data can be highly informative for detecting pedigree errors. We employed PREST in our study, which implements identity-by-state, identity-by-descent, and likelihood-based methods to test whether the pattern of allele sharing between relative pairs is consistent with the stated relationship in the pedigrees. Individuals whose relationships in a pedigree were clearly wrong were excluded from our map construction. Genotyping errors can dramatically reduce the power of linkage studies. We used PedCheck to identify Mendelian inconsistencies and remove them from our data. For each marker that shows Mendelian inconsistencies, the PedCheck cleaning function sets all genotypes to unknown for each pedigree with inconsistencies. We detected subjects that were assigned an incorrect gender through identification of an over-abundance of homozygous female or heterozygous male genotypes for markers on the X chromosome. We identified 124 individuals who were coded as males but were highly likely to be females, or vice versa.
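The Mendelian-consistency rule described above can be sketched for the simplest case, a fully genotyped parent-child trio at one autosomal marker. This is a simplified illustration, not PedCheck's algorithm, which works on whole pedigrees and handles untyped relatives; the function names and the data layout are assumptions made for the example.

```python
def trio_is_consistent(child, father, mother):
    """True if the child's alleles can be split one from each parent.

    Genotypes are (allele, allele) tuples; None means not genotyped.
    """
    if child is None or father is None or mother is None:
        return True  # nothing to contradict at this marker
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)

def blank_inconsistent_marker(pedigree_genotypes, trios):
    """Apply the 'set everything to unknown' rule at one marker.

    pedigree_genotypes: dict mapping person id -> genotype at this marker.
    trios: list of (child, father, mother) person ids within the pedigree.
    If any trio is inconsistent, all genotypes for this pedigree at this
    marker are returned as None; otherwise an unchanged copy is returned.
    """
    for child, father, mother in trios:
        if not trio_is_consistent(pedigree_genotypes.get(child),
                                  pedigree_genotypes.get(father),
                                  pedigree_genotypes.get(mother)):
            return {person: None for person in pedigree_genotypes}
    return dict(pedigree_genotypes)
```

Blanking the whole pedigree at the offending marker discards some valid genotypes, but it avoids having to decide which individual carries the error.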
The data included 6 markers from the Y chromosome non-pseudoautosomal region. Since males carry a single copy of these Y markers and should therefore appear homozygous, any heterozygous Y genotypes suggest genotyping errors. Therefore, these markers provide genotyping error rate information. In summary, we detected 11 heterozygous genotypes in 35,375 Y genotypes (0.016%).
Handling large pedigrees
Several linkage programs based on the Lander-Green algorithm have been developed, each with specific advantages and disadvantages. We are not aware of any single program that could perform all of the types of analyses required for our study, so we employed a combination of five programs: Allegro, CRI-MAP, MENDEL, MERLIN, and METAMAP. While the vast majority of our pedigrees were small enough for Allegro and MERLIN to handle, we had some very large pedigrees (N = 80) that we either split or trimmed into smaller sub-pedigrees for analyses with Allegro and MERLIN; this was not necessary for analyses with CRI-MAP. This trimming and splitting reduced our computational time by more than 93%. To construct a single, accurate estimate of the map based on data from two or more populations, we used the program METAMAP to combine the population-specific map estimates.
Study- and Population-specific Marker Alleles
Even though all the genotyping was performed in the same center, the codes used by PIs to describe marker alleles are not necessarily consistent across all studies. To handle this problem, we obtained PCR bandsizes rather than allele codes from each PI. In some rare cases MGS devised multiple primers for the same marker and the allele sizes changed when the primers were altered. Therefore, it was important to use study-specific marker allele labels and allele frequencies when different primers were used in different studies for the same marker. In addition, since different populations might have different allele frequencies, population-specific alleles were also required. We incorporated study- and population-specific marker alleles into our linkage analyses by creating study/population-specific marker copies or by adjusting the PCR bandsizes to be the same for different primers.
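The marker-copy approach can be sketched as a simple relabeling step. The code below is an illustration only: the data layout, the function name, and the "marker@study" naming scheme are assumptions, and the sketch does not show how the copies are then assigned a common map position in the linkage programs.

```python
def make_study_specific_copies(genotypes, study_of, shared_markers=()):
    """Give each study its own copy of every marker.

    genotypes: dict person id -> dict marker name -> genotype.
    study_of: dict person id -> study (or population) label.
    shared_markers: markers whose allele labels are known to be comparable
        across studies; these keep their original names.
    A person is simply untyped at the copies belonging to other studies,
    so allele labels and frequencies are never mixed between studies.
    """
    relabeled = {}
    for person, by_marker in genotypes.items():
        study = study_of[person]
        relabeled[person] = {}
        for marker, genotype in by_marker.items():
            if marker in shared_markers:
                name = marker
            else:
                name = f"{marker}@{study}"  # hypothetical naming scheme
            relabeled[person][name] = genotype
    return relabeled
```

With this layout, allele frequencies estimated for one copy draw only on the individuals from the corresponding study.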
Ordering Markers and Comparing with the Physical Maps
Determining the correct marker orders was the first step in our map construction. Discrepancies have been previously noted between some of the Weber screening sets and physical positions. We used the Marshfield genetic maps and Weber screening set maps to initially determine marker order. Physical positions of the markers were obtained from NCBI and the UCSC Human Genome Browser. We used Multithreaded Electronic PCR (me-PCR) [26, 27] to identify physical positions for markers not already identified in the published sequence. When comparing the marker order from the published maps with the order determined on assembled sequence, we also identified a few discrepancies with the screening set map orders. We used linkage analysis of our data to resolve these marker order discrepancies and determine the final map order. In all cases, the linkage analyses we performed confirmed the physical order.
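The comparison between a published genetic order and physical positions can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and data layout are hypothetical, the genetic order is assumed to run in the direction of increasing base-pair position, and only adjacent pairs are checked. Candidates flagged this way would still need to be resolved by linkage analysis, as described above.

```python
def check_marker_order(chrom, genetic_order, physical_pos):
    """Flag markers whose physical placement disagrees with a genetic map.

    chrom: chromosome the genetic map assigns these markers to.
    genetic_order: marker names in published genetic-map order.
    physical_pos: dict marker name -> (chromosome, base-pair position).
    Returns (misplaced, inverted): markers that physically map to another
    chromosome, and adjacent pairs whose physical order is reversed.
    """
    misplaced = [m for m in genetic_order if physical_pos[m][0] != chrom]
    kept = [m for m in genetic_order if physical_pos[m][0] == chrom]
    inverted = [(a, b) for a, b in zip(kept, kept[1:])
                if physical_pos[a][1] > physical_pos[b][1]]
    return misplaced, inverted

# Made-up marker names and positions, for illustration only.
positions = {
    "mkA": ("20", 1_000_000),
    "mkB": ("20", 3_000_000),
    "mkC": ("20", 2_000_000),
    "mkD": ("2", 5_000_000),
}
print(check_marker_order("20", ["mkA", "mkB", "mkC", "mkD"], positions))
```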
Precise Estimation of Map Distances
With markers carefully ordered, we computed accurate map distances. We also tested the hypothesis that the distribution of recombination does not vary significantly among different ethnic groups. CRI-MAP is the only program that could handle all of our large pedigrees intact, and it runs very quickly. Therefore, we used CRI-MAP for initial estimates of inter-marker distances. However, because CRI-MAP does not perform full-likelihood analyses, some level of information loss is expected, which can lead to potential biases in parameter estimates [28, 29]. Therefore, we calculated more accurate map distance estimates by using the full-likelihood program Allegro. Allegro applies the expectation-maximization (EM) algorithm for map estimation and can be used for estimation of both sex-averaged and sex-specific maps. Because our European data set was extremely large and was derived from different marker sets, we first built maps separately for each Weber set (9 and 10) and then combined them using the METAMAP program; we did the same for the Chinese data set. METAMAP combines maps from different Weber sets (i.e. different studies) using weights that are inversely proportional to the variance of map distance estimates. The variances used by METAMAP were estimated using the non-parametric bootstrap.
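Inverse-variance weighting, the principle described above for combining map estimates, can be sketched for a single interval that was estimated in more than one dataset. This is a generic illustration of the weighting scheme, not METAMAP's implementation, which must also handle intervals that are defined by different markers in different sets; the function name is an assumption.

```python
def combine_inverse_variance(estimates, variances):
    """Combine independent estimates of one map distance.

    Each estimate is weighted by 1 / variance, so more precise datasets
    contribute more. Returns (combined estimate, variance of the combined
    estimate), assuming the input estimates are independent.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total
    return combined, 1.0 / total
```

The combined variance is never larger than the smallest input variance, which is why pooling results across datasets improves precision.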
Testing for Population-specific Recombination
We used numerical optimization with the MERLIN program to compute the map distances and corresponding variance-covariance matrices for the data in each population. MERLIN does not currently have any built-in map estimation routines. However, it can compute the log-likelihood of the pedigree data for a given map. To estimate our map distances, we used the box-constrained optimization function "optim" of the R programming environment (L-BFGS-B method) to maximize the log-likelihood. The "optim" function optionally returns the Hessian matrix at the convergence point. Inverting the Hessian produced the variance-covariance matrix, which we used in the Wald test for statistical comparisons of the population-specific genetic maps. The variance of chromosomal and genome length was obtained by summing the individual terms in the variance-covariance matrix. We evaluated whether there are any differences between the maps, and if so, where the differences lie. We compared pairs of maps to identify differences in the estimated size of a) individual map intervals, b) individual chromosome map lengths, and c) map length over the entire genome. When performing multiple statistical tests, the Type 1 error rate may increase considerably. Using the QVALUE program, we corrected for multiple comparisons at a genome-wide level by controlling the Benjamini-Hochberg False Discovery Rate (FDR). We report the p-values after correction for multiple testing. A significance level of 0.05 was used in all the tests.
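The two statistical steps in this comparison, a Wald test on the difference between two estimates and a Benjamini-Hochberg adjustment of the resulting p-values, can be sketched as follows. The sketch assumes the two population samples are independent, so the variance of the difference is the sum of the two variances, and it implements the plain Benjamini-Hochberg step-up adjustment rather than the QVALUE program itself. Function names and the example numbers are illustrative.

```python
import math

def wald_p_value(est1, var1, est2, var2):
    """Two-sided p-value for H0: the two map distances are equal.

    Assumes independent samples, so Var(est1 - est2) = var1 + var2.
    """
    z = (est1 - est2) / math.sqrt(var1 + var2)
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up interval estimates (cM) and variances for three map intervals.
raw = [wald_p_value(9.5, 0.8, 2.1, 1.2),
       wald_p_value(10.2, 0.5, 9.8, 0.6),
       wald_p_value(7.3, 0.4, 5.9, 0.5)]
print(benjamini_hochberg(raw))
```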
Among the 26 studies used in our analyses, the median amount of missing genotype data was 3.3% with only two studies missing more than 10% of genotypes (13% and 15%, respectively). Only 0.9% of the markers were missing more than 20% of genotypes, with most having a high missing rate in only a single study and none having a high missing data rate in more than three studies. Many pedigree errors were identified in the uncleaned data that we received. Problems that frequently occurred included half siblings coded as full siblings, non-biological sibs coded as biological sibs, and non-biological children present in the pedigrees. In some rare cases, more complex relationship mistakes were identified. We detected incorrect familial relationships in 129 families. We corrected 75 of them by deleting 124 problematic pedigree members, and removed the remaining 54 entire families that had serious relationship errors. In total, we deleted 499 individuals that accounted for about 3% of the data to eliminate these pedigree errors. Additional pedigrees were excluded from analysis if they did not match one of the 5 main ethnic populations or if pedigree-relationship data were not provided. Next, PedCheck detected approximately 10,000 Mendelian inconsistencies and 1.8% of the genotypes in our study were removed by PedCheck to create Mendelianly-consistent data.
Our final cleaned data contained 15,525 genotyped individuals from 4,237 pedigrees with 5.7 million genotypes (Table 2). The accuracy of map estimates relies greatly on the sample size. The improvement of map distance estimates as the sample size increased was evident. CRI-MAP detected an average of 7,926 informative meioses for our markers. Using 7,926 informative meioses, the expected 95% CI for a 10 cM map interval is 1.6 cM wide, which is much narrower than on any existing map.
We determined the map order for the markers used in these disease-mapping studies. Most of the markers are present on the Marshfield map. While the map orders on the Marshfield map, Weber set maps, and physical maps were consistent with each other for the majority of the markers, we found several mistakes in the Marshfield map and the Weber screenset maps. Linkage results from CRI-MAP were used to clarify these map order problems. Marker D20S159 was assigned to chromosome 20 on the Marshfield and Weber set 10 maps. However, both its physical location and our linkage results confirmed that it is located on chromosome 2. Also, an X chromosome marker, DXS9893, was assigned to an incorrect position in the Weber set 10, where it was listed as being about 44 cM upstream of the position identified by me-PCR and confirmed by our linkage analysis. We also detected two minor map order inversions in the Marshfield map, one on chromosome 6 (the correct order: D6S1034-D6S1006-D6S2434) and another on chromosome 20 (the correct order: D20S451-D20S164-D20S171). Linkage analyses confirmed that the physical map orders are correct for both of these cases.
Enhanced Genetic Maps
Table 3. Genetic map lengths in different populations (Kosambi cM). Marker sets listed: Weber Set 9 and 10 (two populations), Weber Set 9 (one population), and Weber Set 10 (two populations).
Due to the large sample size, we observed a large number of informative meioses, which underpins the precision of our map estimates. The standard error of a 10 cM map interval was only 0.6 cM on the sex-averaged map. Therefore, for a map interval of 10 cM, the estimated 95% CI was only 2.4 cM long (2 × 1.96 × 0.6 ≈ 2.4). Since only about half the meioses were used when estimating each sex-specific map, the standard errors that we observed in female and male maps were somewhat larger than those in the sex-averaged maps: for a map interval of 10 cM, the sex-specific standard errors were usually around 1 cM.
Our detailed sex-averaged maps, female maps, male maps, and their corresponding standard error (S.E. for theta) in each map interval are listed in Additional File 1: European_maps.xls. Population-specific map distances were estimated using the African American, Chinese, Hispanic and Palauan datasets and are described below (see "Genetic maps from non-European Populations" and Additional Files 2, 3, 4, 5: Chinese_maps.xls, African American_maps.xls, Hispanic_maps.xls, Palauan_maps.xls.)
In response to a concern raised during the review process about the possible impact of null alleles on our results, we evaluated the frequency of null alleles in our data. We used the MENDEL program to estimate the null allele frequency of each marker, separately by population. We found only one marker had a null allele frequency > 0.05, a level below which our simulations indicate negligible impact of null alleles on the accuracy of estimates of recombination rate (data not shown). This marker was D6S1959, which Jorgenson et al. also found to have a null allele, and its frequency was > 0.05 in the African American, Chinese, and Palauan datasets. As a sensitivity analysis, we compared the map lengths obtained ignoring null alleles for the two map intervals flanking this marker with estimates obtained while modeling null alleles using MENDEL. In all three populations where the null allele frequency is > 0.05, the estimates of recombination fraction allowing for null alleles are very similar to the estimates obtained with a conventional analysis that does not allow for null alleles. Therefore, with only one marker out of 461 showing a modest frequency of null alleles, and having demonstrated that the impact of that marker on our map estimates is small, we are confident that null alleles do not have a substantial impact on our analyses and conclusions.
Comparing Our Maps with Marshfield/Weber Maps
Comparisons of Population-specific Maps
Table 4. Summary of significant between-population map comparison results
Fields shown where present: Weber sets; significant sex-averaged intervals; chromosomes with significantly different lengths; overall map length (Haldane cM).
European vs. Chinese
Weber sets: 9 and 10
Significant sex-averaged intervals: chromosome 8p: D8S1130-D8S1106 (p = 1.17E-05)
Overall map length: p = 7.29E-08 (4,185 vs. 4,036 cM)
European vs. Hispanic
Overall map length: p = 2.96E-05 (4,183 vs. 3,997 cM)
Chinese vs. Hispanic
Overall map length: NS (p = 0.19) (4,063 vs. 3,997 cM)
African-American vs. European
Chromosomes with significantly different lengths: 1, 2, 5, 6, 11, 18, 19, 21
Overall map length: p < 1E-10 (4,218 vs. 4,011 cM)
African-American vs. Chinese
Significant sex-averaged intervals: chromosome 2p: D2S2952-D2S1400 (p = 0.036); chromosome 6q: D6S305-D6S1277 (p = 0.0008); chromosome 8p: D8S264-D8S277 (p = 0.0065); chromosome 8p: D8S277-D8S1130 (p = 5.80E-09); chromosome 8p: D8S1130-D8S1106 (p = 8.29E-06); chromosome 18p: GATA178F11-D18S481 (p = 0.012)
Chromosomes with significantly different lengths: 1, 2, 3, 4, 5, 7, 11, 12, 13, 14, 16, 17, 18, 21
Overall map length: p < 1E-10 (4,218 vs. 3,883 cM)
European vs. Palauan
Significant sex-averaged intervals: chromosome 11p: D11S1984-D11S2362 (p = 6.8E-05)
Overall map length: NS (p = 0.70) (4,182 vs. 4,158 cM)
Chinese vs. Palauan
Overall map length: NS (p = 0.17) (4,066 vs. 4,158 cM)
Hispanic vs. Palauan
Overall map length: p = 0.03 (3,998 vs. 4,158 cM)
Several significant differences were identified when comparing the European and Chinese data. The European map was significantly longer than the Chinese map (European = 4,185 cM, Chinese = 4,036 cM, p = 7.29E-08); most chromosome maps were longer with chromosomes 4 and 14 being statistically significant (p = 0.04 and p = 1.47E-04, respectively); and one specific map interval was significantly longer in the European map (5.3 cM) compared with the Chinese map (1.7 cM): 8p: D8S1130-D8S1106 (p = 1.17E-05) (Table 4). An 8p inversion polymorphism has been previously reported at this map location.
We did not observe any significant map interval differences between the Hispanic and European maps after the correction for multiple testing. However, two chromosomes, chromosome 1 (p = 0.01) and the X chromosome (p = 0.01), showed significant differences in length. The chromosome 1 and X maps were 12% and 26% longer in Europeans than in Hispanics, respectively. The overall map length was also significantly different (European = 4,183 cM, Hispanic = 3,997 cM, p = 2.96E-05), with the European map about 5% longer than the Hispanic map (Table 4).
The Chinese and Hispanic samples were also compared with each other using the Weber set 10 markers. The smallest p-value was observed at the map interval D10S189-D10S1412 on chromosome 10, but it is not genome-wide significant (p = 0.26). For the map length comparisons, the difference was significant only for chromosome 14 (p = 0.04), where the Hispanic map was about 23% longer than the Chinese (Table 4). The overall Hispanic map was about 2% shorter than the Chinese map, which is not significant (Chinese = 4,063 cM, Hispanic = 3,997 cM, p = 0.19).
The African American data were genotyped using the Weber set 9. When comparing them with the European data, we did not observe any map interval differences of genome-wide significance after the correction for multiple testing. The smallest p-value was 0.14, at the map interval D16S748-D16S764 on chromosome 16. The African American data had longer map lengths for all the autosomes, whereas the map length for the X chromosome was nearly the same. The differences are significant for eight chromosomes (1, 2, 5, 6, 11, 18, 19, and 21). Finally, the difference in the overall map length was highly significant (African American = 4,218 cM, European = 4,011 cM, p < 1E-10) between the two populations, with the African American map about 5% longer than the European map (Table 4).
We also compared the African American population with the Chinese population using the Weber set 9, and we detected six map interval differences of genome-wide significance. For four of these intervals the p-values were very small: one was located on chromosome 6 (D6S305-D6S1277) and the other three were consecutive intervals on the short arm of chromosome 8 (D8S264-D8S277-D8S1130-D8S1106). The D6S305-D6S1277 interval size was 7.3 cM and 3.2 cM in the African American and Chinese data, respectively. The three consecutive map intervals on chromosome 8p were located in the same common inversion polymorphism region as the difference we observed between the European and Chinese maps. The African American map interval sizes were 17.5 cM, 4.4 cM, and 6.0 cM, while the Chinese sizes were 10.8 cM, 12.9 cM, and 1.8 cM, respectively. The other two map intervals significant at the FDR = 0.05 level were located on chromosome 2 (D2S2952-D2S1400) and on chromosome 18 (GATA178F11-D18S481) with p = 0.036 and p = 0.012, respectively. The African American data have longer map lengths for all the chromosomes, and the differences are significant for 14 of them. The largest difference was observed at chromosome 21, where the African American map was about 19% longer than the Chinese map. The overall African American map was 9% longer than the Chinese map, and this difference was highly statistically significant (African American = 4,218 cM, Chinese = 3,883 cM, p < 1E-10) (Table 4).
We also compared the Palauan and European data using the Weber set 10 markers. One map interval difference of genome-wide significance, D11S1984-D11S2362 (p = 6.8E-05), was at the distal end of the short arm of chromosome 11 (Table 4). The map distance in the Palauan and European data was 2.1 cM and 9.5 cM, respectively. The map lengths of each chromosome and the overall map length in the two populations did not differ significantly (European = 4,182 cM, Palauan = 4,158 cM, p = 0.70). When the Palauan data were compared with the Chinese data, we observed the smallest p-value at the same D11S1984-D11S2362 map interval, but it was not statistically significant (p = 0.24). We also did not observe any significant difference in the overall map lengths (Chinese = 4,066 cM, Palauan = 4,158 cM, p = 0.17) or in chromosome map lengths. When the Palauan results were compared with the Hispanic results, the only significant difference detected was in the overall map length (Hispanic = 3,998 cM, Palauan = 4,158 cM, p = 0.03), where the Palauan map was about 4% longer than the Hispanic map (Table 4).
Genetic maps from non-European Populations
Because we observed significant map-length differences between some population groups, we also separately constructed sex-averaged and sex-specific genetic maps in the four non-European populations using Allegro. These maps are summarized in Table 3 and are included as Additional Files 2, 3, 4, 5 (Chinese_maps.xls, African American_maps.xls, Hispanic_maps.xls, Palauan_maps.xls). The Chinese and African American sample sizes are large, so their data alone can provide accurate map estimates for future linkage scans in the two populations. Our Hispanic and Palauan sample sizes are comparatively small. Map lengths for each of these populations were estimated using different sets of markers, so their map lengths are not directly comparable with each other.
We have constructed high-precision genetic maps with a very large data set generated by the NHLBI Mammalian Genotyping Service (MGS) and performed a systematic comparison of genetic maps across different populations. Accurate gene mapping requires high-quality genetic maps. However, errors from a variety of sources cannot be avoided. We collected the genotype data in an uncleaned format and performed thorough and consistent data cleaning. Using the program PREST, we verified pedigree structures and detected over one hundred pedigrees with relationship errors in these samples. Data with undetected pedigree errors could lead to inaccurate linkage results that can influence the conclusion regarding the presence or absence of linkage. The fact that we found so many relationship errors in these uncleaned data is a reminder of the need for rigorous verification of pedigree information in linkage studies.
Different studies may use different labels to represent alleles, and allele frequencies may vary in different populations. Therefore, it is important for linkage programs to use study/population-specific marker allele labels and frequencies when jointly analyzing data from different studies and populations. Unfortunately, most available linkage software (except newer versions of MENDEL) cannot handle study/population-specific alleles directly. In this study, we employed two useful approaches: we created dummy marker copies in each dataset for any markers genotyped in different studies, or made proper bandsize adjustments for those markers that were genotyped with multiple primers. We were then able to implement linkage analyses using study/population-specific alleles without the need to modify existing software.
By comparing the genetic orders of autosomal markers from the Weber sets 9 and 10 with their physical positions, DeWan et al. identified 7 markers in Set 10 and 5 markers in Set 9 whose physical orders were inconsistent with their genetic orders. With our large data collection, we confirmed that most of these previously identified inconsistencies resulted from the imprecision of the physical map used in that comparison. With the latest (more accurate) physical assembly data, we only detected one inconsistency that had been encountered by DeWan et al.: marker D20S159 was assigned to the wrong chromosome in the Weber set 10 (assigned to chromosome 20 instead of chromosome 2). In addition, we identified a marker order mistake on the X chromosome: marker DXS9893 in the Weber set 10 was incorrectly placed 44 cM upstream of its actual location. These ordering problems could seriously impair the validity and accuracy of results of any linkage analysis that used these markers. In order to obtain correct linkage results for previously published genome scans, multipoint linkage analyses should be repeated on these regions with the correct map orders.
We tested for population-specific recombination across five ethnic groups. Numerical optimization is extremely time-consuming when the number of estimated recombination fractions (N) becomes large because the computational complexity is generally on the order of N² for each iteration. The great advantage of the numerical optimization method is that we can incorporate the covariance terms into our calculation as well as directly confirm the success of convergence, which improves our statistical tests. It was necessary to include the covariance terms because the map distance estimates of map intervals on the same chromosome are not always independent. Our results showed that adjacent map interval estimates are usually negatively correlated with each other, while map intervals far apart tend to be independent (results not shown).
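The effect of the covariance terms on the variance of a chromosome length can be illustrated with a small calculation. The length is the sum of the interval estimates, so its variance is the sum of every entry of the variance-covariance matrix, not just the diagonal. The matrix below is made up for illustration, with negative covariances between adjacent intervals as described above; the function names are assumptions.

```python
def length_variance(cov):
    """Variance of the summed interval estimates: sum of all matrix entries."""
    return sum(sum(row) for row in cov)

def length_variance_ignoring_cov(cov):
    """Variance obtained if the intervals were wrongly treated as independent."""
    return sum(cov[i][i] for i in range(len(cov)))

# Made-up 3-interval variance-covariance matrix (cM squared).
cov = [
    [0.36, -0.10, 0.00],
    [-0.10, 0.36, -0.10],
    [0.00, -0.10, 0.36],
]
print(length_variance(cov), length_variance_ignoring_cov(cov))
```

With negatively correlated neighbours, dropping the off-diagonal terms overstates the variance of the total length and would make length comparisons conservative.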
When comparing the maps interval by interval, the results from the European, Chinese, and Hispanic samples were in good agreement with each other. One region on chromosome 8p showed significant length differences between the European and Chinese maps, and between the African American and Chinese maps. This map interval lies within the 8p (8p23.1-8p22) inversion polymorphism region, which also harbors recurrent chromosomal rearrangements, including an inverted duplication deletion (8p23) [17, 39, 40]. This region harbors several members of the olfactory receptor gene family and is flanked by repeated inverted sequences which mediate homologous unequal recombination. The frequency of 8p inversion carriers has been estimated at 39% in a Japanese population and 26% in Europeans [39, 40]. Since an inversion has the potential to influence the computed map distance, either by suppressing recombination or altering regional physical distances, different map lengths could be observed when inversion frequencies differ among populations. Due to the sparseness of the Weber screening sets, it is not possible for us to investigate the potential impact of this inversion polymorphism on these maps in more detail. We also detected three other significantly different map intervals when comparing the African American and the Chinese samples (Table 4).
A highly significant difference in the D11S1984-D11S2362 map interval size was observed between the Palauans and the Europeans. This map interval is located within 5 Mb of the beginning of chromosome 11, where an exceptionally high level of structural variation has been reported recently. Tuzun et al. identified 297 sites of structural variation (inversions, deletions, and insertions) in the whole genome, six of which were clustered in this narrow region. It would be interesting to evaluate the Palauan population for the presence and frequency of structural variants in this region.
We also compared the map lengths of individual chromosomes and of the entire genome across the populations. We identified several chromosomes with significantly different map lengths between populations, and the full-genome-length comparisons showed the African American map to be longer than the European and Chinese maps (consistent with Jorgenson et al.), the European map to be longer than the Chinese map (consistent with Ju et al.), and the European map to be longer than the Hispanic map. Map lengths are expected to vary from one dataset to another based on differences in sample sizes, pedigree structure, genotyping completeness, and marker heterozygosities.
The accuracy of map estimates can be measured by the standard errors and the 95% CIs. Because of the large sample size of the European data, the standard errors for our enhanced sex-averaged map are quite small and the 95% CI for a 10 cM map interval in Europeans is only approximately 2.4 cM long.
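The relationship between a standard error and the length of a 95% CI can be sketched as follows. The sketch assumes a symmetric normal-approximation interval (estimate ± z × SE), which may differ from the interval construction actually used for these maps; the function names are hypothetical.

```python
from statistics import NormalDist


def ci_from_se(estimate_cm, se_cm, level=0.95):
    """Symmetric normal-approximation CI for a map distance (assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * se_cm
    return estimate_cm - half_width, estimate_cm + half_width


def se_from_ci_length(ci_length_cm, level=0.95):
    """Standard error implied by the total length of a symmetric CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return ci_length_cm / (2.0 * z)


# A 2.4 cM long 95% CI corresponds to a standard error of roughly 0.61 cM.
implied_se = se_from_ci_length(2.4)
lower, upper = ci_from_se(10.0, implied_se)
```

Under this approximation, a 10 cM interval with a 2.4 cM long CI would span roughly 8.8 to 11.2 cM.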
Our European and Chinese enhanced maps are the first population-specific genetic maps constructed using a meta-analysis approach to combine maps built from separate, marker set-specific datasets. The method that we adopted has efficiency comparable to that of joint analysis of pooled data. In addition, combining maps from different datasets avoids the practical difficulty of pooling a large heterogeneous data collection for a joint analysis. Without any need to access our original data, other investigators can easily incorporate their own data and improve these maps in the future.
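The idea of combining per-dataset estimates without pooling the raw genotypes can be illustrated with a generic fixed-effect, inverse-variance weighted average. This is a simplified stand-in chosen for illustration and is not claimed to be the exact estimator of the meta-analysis approach adopted here; the function name and input values are hypothetical.

```python
import math


def combine_interval_estimates(estimates):
    """Inverse-variance weighted mean of (distance_cm, se_cm) pairs.

    Each pair is an estimate of the same map interval from one dataset.
    Assumes independent datasets and a common true distance (fixed effect).
    """
    weights = [1.0 / (se_cm ** 2) for _, se_cm in estimates]
    total_weight = sum(weights)
    pooled = sum(w * d for w, (d, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical estimates of one interval from two datasets.
pooled_cm, pooled_se_cm = combine_interval_estimates([(10.0, 1.0), (12.0, 2.0)])
# pooled_cm is 10.4 and pooled_se_cm is about 0.89
```

Only the estimate and its standard error are needed from each dataset, so a new study can be added by appending one more pair to the list.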
The enhanced linkage maps from this study are being used to improve estimates of map distances on the Rutgers Map. The Rutgers Map provides map positions for over 28,000 markers (SNPs and microsatellite markers) using a combination of physical positions and linkage-based distance estimates. The Rutgers Map interpolation tool can be used to interpolate linkage map positions for any marker based on its physical position. This resource facilitates the use of genetic maps of SNPs for genome scans for linkage to genetic traits. While the Rutgers Map includes nearly all markers available for construction of linkage maps, these markers were only genotyped in a relatively small pedigree set, with an average of 301 informative meioses per marker. Incorporation of the map distance estimates obtained from these enhanced linkage maps will improve the accuracy of the Rutgers Map.
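Interpolating a genetic position from a physical position can be sketched with simple linear interpolation between flanking mapped markers. This is only an assumed, simplified scheme and does not describe the actual algorithm of the Rutgers Map interpolation tool; the function name and anchor values are hypothetical.

```python
from bisect import bisect_right


def interpolate_cm(physical_bp, anchors):
    """Linearly interpolate a genetic position (cM) from a physical position.

    anchors: list of (physical_bp, genetic_cm) pairs for one chromosome,
    sorted by strictly increasing physical position.
    """
    positions = [bp for bp, _ in anchors]
    if not positions[0] <= physical_bp <= positions[-1]:
        raise ValueError("position lies outside the mapped anchor range")
    i = bisect_right(positions, physical_bp)
    if i == len(anchors):
        # Query coincides with the last anchor.
        return anchors[-1][1]
    (bp0, cm0), (bp1, cm1) = anchors[i - 1], anchors[i]
    fraction = (physical_bp - bp0) / (bp1 - bp0)
    return cm0 + fraction * (cm1 - cm0)


# Hypothetical anchors: (base-pair position, cM position).
example_anchors = [(1_000_000, 0.0), (3_000_000, 4.0), (5_000_000, 5.0)]
mid_first = interpolate_cm(2_000_000, example_anchors)   # 2.0 cM
mid_second = interpolate_cm(4_000_000, example_anchors)  # 4.5 cM
```

Linear interpolation assumes a constant recombination rate between adjacent anchors, so its accuracy depends on how precisely the anchor distances themselves are estimated.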
In summary, we have evaluated 461 markers from the common Weber screening set maps using a very large set of genotype data. We used these data to obtain highly precise estimates of recombination-based map distances and to correct marker order discrepancies, resulting in enhanced linkage maps that can facilitate more accurate genome-wide linkage analyses. We also used these data to identify several discrepancies in map distances between specific ethnic populations, and to provide population-specific maps for African American, Chinese, Hispanic, and Palauan samples. For regions where map lengths differ among populations, using the population-specific map distances may allow for more accurate linkage analyses. Our data support the suggestion that there may be population differences in genomic structure, and that ignoring such differences could have a negative impact on genetic analyses.
This study was supported by NIH grants HL071029 and GM080221. We thank Alejandro Nato, Dr. Xiangyang Kong, Dr. Karl Broman, Dr. Leonid Kruglyak, Dr. Kyriacos Markianos, and Dr. Michael Frigge for sharing ideas and helpful communications. We thank Dr. James Weber for helpful advice and aid in contacting the PIs of the individual studies, and acknowledge the grant that funded the Marshfield Mammalian Genotyping Service (N01HV048141). Most of our computational jobs were done using the computer cluster at the Department of Human Genetics of the University of Pittsburgh, and we thank Ryan Evans for his help on computation and Dr. Michael Barmada for creating and maintaining this important resource. We thank all of the Enhanced Map Consortium scientists who provided data for this study (and acknowledge in parentheses grants that funded sample collection): Graeme Bell (P60DK020595), Wade Berrettini, Dorret Boomsma and Jouke Jan Hottenga, Rita Cantor, William Catalona, Judy Cho, Patrick Concannon (R01DK46635), Lynn DeLisi (NIDA-R01DA021576 and NIMH-R21MH083205), Richard Duerr (Scaife Family Foundation and Crohn's & Colitis Foundation of America), Steven Hunt and D.C. Rao (HyperGEN project: R01HL54471, R01HL54472, R01HL54473, R01HL54495, R01HL54496, R01HL54497, R01HL54509, R01HL54515, and R01HL55673), Howard Jacob, Michael Klein, Helena Kuivaniemi (HL064310), Suzanne Leal (R01DC03594), Jeffrey Murray, Marina Myles-Worsley (R01MH54186, R01MH560908), Mario Pirastu, Alan Shuldiner (R01AR46838), Gerard Tromp (R01NS034395), Abhay Vats (DK02854, DK64933), Scott Weiss, Xiping Xu.
- Botstein D, White RL, Skolnick M, Davis RW: Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet. 1980, 32 (3): 314-331.
- A comprehensive genetic linkage map of the human genome. NIH/CEPH Collaborative Mapping Group. Science. 1992, 258 (5079): 148-162.
- Gyapay G, Morissette J, Vignal A, Dib C, Fizames C, Millasseau P, Marc S, Bernardi G, Lathrop M, Weissenbach J: The 1993-94 Genethon human genetic linkage map. Nat Genet. 1994, 7 (2 Spec No): 246-339.
- Murray JC, Buetow KH, Weber JL, Ludwigsen S, Scherpbier-Heddema T, Manion F, Quillen J, Sheffield VC, Sunden S, Duyk GM, Weissenbach J, Gyapay G, Dib C, Morrissette J, Lathrop GM, Vignal A, White R, Matsunami N, Gerken S, Melis R, Albertsen H, Plaetke R, Odelberg S, Ward D, Dausset J, Cohen D, Cann H: A comprehensive human linkage map with centimorgan density. Cooperative Human Linkage Center (CHLC). Science. 1994, 265 (5181): 2049-2054.
- Matise TC, Perlin M, Chakravarti A: Automated construction of genetic linkage maps using an expert system (MultiMap): a human genome linkage map. Nat Genet. 1994, 6 (4): 384-390.
- Dib C, Faure S, Fizames C, Samson D, Drouot N, Vignal A, Millasseau P, Marc S, Hazan J, Seboun E, Lathrop M, Gyapay G, Morissette J, Weissenbach J: A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature. 1996, 380 (6570): 152-154.
- Broman KW, Murray JC, Sheffield VC, White RL, Weber JL: Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am J Hum Genet. 1998, 63 (3): 861-869.
- Dausset J, Cann H, Cohen D, Lathrop M, Lalouel JM, White R: Centre d'etude du polymorphisme humain (CEPH): collaborative genetic mapping of the human genome. Genomics. 1990, 6 (3): 575-577.
- Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, Shlien A, Palsson ST, Frigge ML, Thorgeirsson TE, Gulcher JR, Stefansson K: A high-resolution recombination map of the human genome. Nat Genet. 2002, 31 (3): 241-247.
- Weber JL: The Iceland map. Nat Genet. 2002, 31 (3): 225-226.
- Daw EW, Thompson EA, Wijsman EM: Bias in multipoint linkage analysis arising from map misspecification. Genet Epidemiol. 2000, 19 (4): 366-380.
- Fingerlin TE, Abecasis GR, Boehnke M: Using sex-averaged genetic maps in multipoint linkage analysis when identity-by-descent status is incompletely known. Genet Epidemiol. 2006, 30 (5): 384-396.
- Halpern J, Whittemore AS: Multipoint linkage analysis. A cautionary note. Hum Hered. 1999, 49 (4): 194-196.
- Yuan B, Vaske D, Weber JL, Beck J, Sheffield VC: Improved set of short-tandem-repeat polymorphisms for screening the human genome. Am J Hum Genet. 1997, 60 (2): 459-460.
- Weitkamp LR: Proceedings: Population differences in meiotic recombination frequency between loci on chromosome 1. Cytogenet Cell Genet. 1974, 13 (1): 179-182.
- Jorgenson E, Tang H, Gadde M, Province M, Leppert M, Kardia S, Schork N, Cooper R, Rao DC, Boerwinkle E, Risch N: Ethnicity and human genetic linkage maps. Am J Hum Genet. 2005, 76 (2): 276-290.
- Ju YS, Park H, Lee MK, Kim JI, Sung J, Cho SI, Seo JS: A genome-wide Asian genetic map and ethnic comparison: the GENDISCAN study. BMC Genomics. 2008, 9: 554.
- McPeek MS, Sun L: Statistical tests for detection of misspecified relationships by use of genome-screen data. Am J Hum Genet. 2000, 66 (3): 1076-1094.
- O'Connell JR, Weeks DE: PedCheck: a program for identification of genotype incompatibilities in linkage analysis. Am J Hum Genet. 1998, 63 (1): 259-266.
- Gudbjartsson DF, Jonasson K, Frigge ML, Kong A: Allegro, a new computer program for multipoint linkage analysis. Nat Genet. 2000, 25 (1): 12-13.
- Lander ES, Green P: Construction of multilocus genetic linkage maps in humans. Proc Natl Acad Sci USA. 1987, 84 (8): 2363-2367.
- Lange K, Cantor R, Horvath S, Perola M, Sabatti C, Sinsheimer J, Sobel E: MENDEL version 4.0: A complete package for the exact genetic analysis of discrete traits in pedigree and population data sets. Am J Hum Genet. 2001, 69 (supplement): 504.
- Abecasis GR, Cherny SS, Cookson WO, Cardon LR: Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30 (1): 97-101.
- Stewart WC: Improving estimates of genetic maps: a meta-analysis-based approach. Genet Epidemiol. 2007, 31 (5): 408-416.
- DeWan AT, Parrado AR, Matise TC, Leal SM: The map problem: a comparison of genetic and sequence-based physical maps. Am J Hum Genet. 2002, 70 (1): 101-107.
- Murphy K, Raj T, Winters RS, White PS: me-PCR: a refined ultrafast algorithm for identifying sequence-defined genomic elements. Bioinformatics. 2004, 20 (4): 588-590.
- Schuler GD: Sequence mapping by electronic PCR. Genome Res. 1997, 7 (5): 541-550.
- Goldgar DE, Green P, Parry DM, Mulvihill JJ: Multipoint linkage analysis in neurofibromatosis type I: an international collaboration. Am J Hum Genet. 1989, 44 (1): 6-12.
- Stewart WC, Thompson EA: Improving estimates of genetic maps: a maximum likelihood approach. Biometrics. 2006, 62 (3): 728-734.
- Dempster A, Laird N, Rubin D: Maximum likelihood from incomplete data via the EM algorithm. J Roy Statist Soc. 1977, 39: 1-38.
- Efron B, Tibshirani R: Statistical data analysis in the computer age. Science. 1991, 253 (5018): 390-395.
- Byrd RH, Lu P, Nocedal J, Zhu C: A limited memory algorithm for bound constrained optimization. SIAM J Scientific Computing. 1995, 16: 1190-1208.
- Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003, 100 (16): 9440-9445.
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRSSB. 1995, 57: 125-133.
- Clopper C, Pearson E: The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika. 1934, 26: 404-413.
- Broman K, Matsumoto N, Giglio S, Martin C, Roseberry J, Zuffardi O, Ledbetter D, Weber J, eds: Common long human inversion polymorphism on chromosome 8p. Science and Statistics: A Festschrift for Terry Speed. 2003.
- Cherny SS, Abecasis GR, Cookson WO, Sham PC, Cardon LR: The effect of genotype and pedigree error on linkage analysis: analysis of three asthma genome scans. Genet Epidemiol. 2001, 21 (Suppl 1): S117-122.
- Lange K, Weeks D, Boehnke M: Programs for Pedigree Analysis: MENDEL, FISHER, and dGENE. Genet Epidemiol. 1988, 5 (6): 471-472.
- Giglio S, Broman KW, Matsumoto N, Calvari V, Gimelli G, Neumann T, Ohashi H, Voullaire L, Larizza D, Giorda R, Weber JL, Ledbetter DH, Zuffardi O: Olfactory receptor-gene clusters, genomic-inversion polymorphisms, and common chromosome rearrangements. Am J Hum Genet. 2001, 68 (4): 874-883.
- Shimokawa O, Kurosawa K, Ida T, Harada N, Kondoh T, Miyake N, Yoshiura K, Kishino T, Ohta T, Niikawa N, Matsumoto N: Molecular characterization of inv dup del(8p): analysis of five cases. Am J Med Genet A. 2004, 128A (2): 133-137.
- Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H, Albertson D, Pinkel D, Olson MV, Eichler EE: Fine-scale structural variation of the human genome. Nat Genet. 2005, 37 (7): 727-732.
- Matise TC, Chen F, Chen W, De La Vega FM, Hansen M, He C, Hyland FC, Kennedy GC, Kong X, Murray SS, Ziegle JS, Stewart WC, Buyske S: A second-generation combined linkage physical map of the human genome. Genome Res. 2007, 17 (12): 1783-1786.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2350/12/15/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.