Analysis of Xq27-28 linkage in the international consortium for prostate cancer genetics (ICPCG) families

Background: Genetic variants are likely to contribute to a portion of prostate cancer risk. Full elucidation of the genetic etiology of prostate cancer is difficult because of incomplete penetrance and genetic and phenotypic heterogeneity. Current evidence suggests that genetic linkage to prostate cancer has been found on several chromosomes including the X; however, identification of causative genes has been elusive.
Methods: Parametric and non-parametric linkage analyses were performed using 26 microsatellite markers in each of 11 groups of multiple-case prostate cancer families from the International Consortium for Prostate Cancer Genetics (ICPCG). Meta-analyses of the resultant family-specific linkage statistics across all 1,323 families and in several predefined subsets were then performed.
Results: Meta-analyses of linkage statistics resulted in a maximum parametric heterogeneity lod score (HLOD) of 1.28, and an allele-sharing lod score (LOD) of 2.0 in favor of linkage to Xq27-q28 at 138 cM. In subset analyses, families with average age at onset less than 65 years exhibited a maximum HLOD of 1.8 (at 138 cM) versus a maximum regional HLOD of only 0.32 in families with average age at onset of 65 years or older. Surprisingly, the subset of families with only 2–3 affected men and some evidence of male-to-male transmission of prostate cancer gave the strongest evidence of linkage to the region (HLOD = 3.24, 134 cM). For this subset, the HLOD was slightly increased (HLOD = 3.47 at 134 cM) when families used in the original published report of linkage to Xq27-28 were excluded.
Conclusions: Although there was not strong support for linkage to the Xq27-28 region in the complete set of families, the subset of families with earlier age at onset exhibited more evidence of linkage than families with later onset of disease. A subset of families with 2–3 affected individuals and with some evidence of male to male disease transmission showed stronger linkage signals. Our results suggest that the genetic basis for prostate cancer in our families is much more complex than a single susceptibility locus on the X chromosome, and that future explorations of the Xq27-28 region should focus on the subset of families identified here with the strongest evidence of linkage to this region.



Background
Prostate cancer (PC) is the most common male cancer in developed countries [1]. In the United States, each year there are over 200,000 newly diagnosed cases and over 30,000 deaths attributable to prostate cancer [2]. Family history, older age and African-American ancestry are the most important risk factors established to date. Inherited genetic factors might account for a proportion of the familial risk, but discovering the actual genetic basis of prostate cancer has been very difficult, probably because of the large number of loci involved, the incomplete and possibly low penetrance associated with these loci, and the likely clinical and genetic heterogeneity of this disease.
In 1996, the first prostate cancer linkage report implicated chromosome 1q23-25 [3], but subsequent linkage studies have reached contradictory conclusions. In the same year the International Consortium for Prostate Cancer Genetics (ICPCG), consisting of researchers from 11 groups around the world, was formed. With the initial aim of examining linkage and trying to replicate previous linkage findings, the ICPCG pooled 1,323 pedigrees with clinically (but not genetically) defined "hereditary prostate cancer" (HPC). Given the large number of families in this dataset, it was hoped that the pooled data would provide increased power to confirm or exclude linkage, and would allow informative linkage analyses of large homogeneous subsets in an attempt to control for some of the likely heterogeneity that would otherwise weaken the ability to detect linkage. The ICPCG analysis of 775 families supported the finding of a prostate cancer susceptibility gene linked to 1q24-25 in a defined subset of prostate cancer families with early age at onset, at least 5 affected relatives and evidence of male-to-male transmission [4]. The RNASEL gene was later implicated as harboring rare variant alleles that increase risk of prostate cancer and may account for this linkage signal [5]. Evidence has been accumulating in support of RNASEL as a prostate cancer risk locus, with several recent large case-control and cohort studies and a very large meta-analysis all showing significant associations of prostate cancer risk with polymorphisms in this locus [6][7][8][9][10].
Several other susceptibility loci presumed to contain rare variants of large effect on individual risk of prostate cancer have been suggested [3,4] (reviewed elsewhere [30,41,42]). In addition, recent genome-wide association studies (GWAS) have implicated multiple loci at which common variants (single nucleotide polymorphisms; SNPs), not necessarily functional themselves, are associated with small effects on individual risk of prostate cancer [43][44][45][46][47][48][49]. For a review see Varghese and Easton [50]. Work is proceeding to identify more susceptibility loci: by GWAS of common SNP risk alleles, by conventional linkage analyses aimed at detecting genes with rare, high-penetrance risk alleles, and by whole-exome and whole-genome sequencing analyses that can be used in conjunction with linkage and GWAS results.
In 1998, a study of 360 multiple-case families found evidence for a prostate cancer susceptibility locus on chromosome X in the region Xq27-q28 (HPCX) [38]. A subset of 52 Finnish families from this study was used to examine whether phenotypic subsets of families exhibited different evidence for linkage to this region. Families with no male-to-male (NMM) transmission and late age at onset of prostate cancer (> 65 years) exhibited stronger evidence of linkage to the Xq27-28 region than did the complete set of families [33]. There have been five replication studies: four supported linkage of prostate cancer susceptibility to this region [6,21,51-53], with the study of large Utah pedigrees yielding independent genome-wide significant evidence of linkage [21], and one did not [12]. A fine-mapping study in the Finnish population examined association of prostate cancer with microsatellite markers in the HPCX Xq27-28 region using 108 independent prostate cancer patients selected from families with multiple affected men (55 of whom were from the linkage study above) and 257 controls (anonymous, healthy male blood donors) from the same Finnish population. Significant association was observed for two markers in the region, DXS1205 (p = 0.0003) and bG82i1.1 (p = 0.0006), with stronger association at DXS1205 in the subset of 60 cases from families with no evidence of male-to-male transmission (p = 0.0002) [11]. Association of these two markers with prostate cancer risk has been replicated in an Ashkenazi Jewish founder population [6]: in 979 prostate cancer cases and 1,251 controls, positive associations were observed for allele 135 of the bG82i1.1 marker (OR = 1.77, P = 0.01) and allele 188 of DXS1205 (OR = 1.65, P = 0.02).
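The allele-specific odds ratios quoted above compare how often one marker allele appears among case versus control chromosomes. As a minimal sketch, and not necessarily the test used in the cited studies, an allele-versus-rest odds ratio with a Woolf (log-scale) 95% confidence interval can be computed from a 2x2 table of allele counts. The function name and the counts in the example are hypothetical, chosen only for illustration.

```python
import math


def allele_odds_ratio(case_allele, case_other, ctrl_allele, ctrl_other, z=1.96):
    """Odds ratio for one allele versus all other alleles, with a Woolf CI.

    Counts are allele counts. For an X-linked marker typed in men, each man
    is hemizygous and therefore contributes a single allele.
    """
    counts = (case_allele, case_other, ctrl_allele, ctrl_other)
    if min(counts) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (case_allele * ctrl_other) / (case_other * ctrl_allele)
    # Standard error of log(OR) is sqrt(1/a + 1/b + 1/c + 1/d).
    se = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts (NOT the published data): 60 of 300 case alleles and
# 40 of 400 control alleles carry the allele of interest.
print(allele_odds_ratio(60, 240, 40, 360))  # odds ratio is 2.25
```

A real analysis would also need a significance test (for example a chi-square or exact test) and, for a multi-allelic microsatellite, some correction for the number of alleles examined.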
Under the Xq27-q28 linkage peak is a 750-kb region containing five SPANX genes (SPANX-A1, -A2, -B, -C, and -D). The SPANX genes encode nucleus-associated sperm proteins, and their expression has been detected in a variety of cancers. Although they were originally suggested as candidate genes for the HPCX susceptibility locus [54], more recent work has found no association between prostate cancer and mutations in any of these genes [55]. However, a more complex involvement of these genes remains possible. Putative candidate genes for association with prostate cancer have also been found in other regions of the X chromosome.
Gudmundsson et al. conducted a genome-wide SNP association study of prostate cancer in over 23,000 Icelanders, followed by a separate replication study. One of the two novel SNPs identified by this study, rs5945572, lies at Xp11.22 (odds ratio (OR) = 1.23) [56]. Eeles et al. also found association with this region in a large GWAS [47]. However, the odds ratios for the risk genotypes at this putative locus are quite small, so this locus is unlikely to be responsible for the linkage signal observed on Xq in highly aggregated pedigrees.
The aims of this study were to examine the evidence for linkage of prostate cancer to chromosome X in 1,323 multiple-case prostate cancer families from the ICPCG genotyped for a consensus map of 26 microsatellite markers, using both parametric and nonparametric allele-sharing linkage analyses. Pedigree subsets were defined by presence or absence of male-to-male disease transmission (absence being a surrogate for X-linked inheritance), the Carter criteria for HPC [57,58], average age at onset of affected men in the family (<65 years of age or ≥ 65 years), and number of men in a family with confirmed PC. Determining whether any of these subsets show stronger evidence of linkage to the region may guide the selection of cases for future mutational analysis in this region.

Methods
This analysis was performed on 1,323 families with hereditary prostate cancer ascertained by 11 groups participating in the ICPCG. The process of ascertaining families and confirming diagnosis of prostate cancer differed among the groups, but in all samples, men were considered to be affected with prostate cancer only if medical records or death certificates could confirm the diagnosis. The 11 groups that participated in this linkage analysis are described elsewhere [59].
In the statistical analysis, all families were first analyzed together. In addition, several subsets of families were created based on pedigree characteristics. A pedigree was classified as satisfying the Carter criteria for hereditary prostate cancer [57,58] if at least one of the following conditions was met: 1) three consecutive generations of PC along a line of descent; 2) at least three first-degree relatives with a diagnosis of PC; 3) two or more relatives with a diagnosis of PC at age ≤ 55 years. Pedigrees were also classified according to whether transmission of PC in the family appeared consistent with X-linked transmission (yes versus no versus unclear). A pedigree was considered consistent with X-linked transmission if every affected male had a family history of prostate cancer only on the maternal side of the family, so that there was no evidence of male-to-male transmission of a prostate cancer risk allele. A pedigree was considered inconsistent with X-linked transmission if the family contained at least one affected father-affected son pair or if at least one affected son had an affected paternal uncle or paternal grandfather (male-to-male transmission). Pedigrees containing at least one male who had a family history of prostate cancer on both sides of his family (bilineal) were also considered inconsistent with X-linked transmission. Pedigrees containing only a sibship of affected men with no information about the prostate cancer history on the maternal or paternal sides of the family were considered unclear for X-linked transmission. Pedigrees were also classified as to whether or not the average age at onset of affected men in the family was less than 65 years of age.
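The subset definitions above are simple decision rules once a pedigree has been summarized. The sketch below encodes them under stated assumptions: the `PedigreeSummary` fields are hypothetical and presume that facts such as "longest run of affected generations" were already derived from the raw pedigree, and any pattern the rules do not cover is treated as "unclear", which is our assumption rather than a documented ICPCG rule.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PedigreeSummary:
    """Hypothetical per-pedigree summary; deriving these fields from a raw
    pedigree file is assumed to have been done elsewhere."""
    consecutive_affected_generations: int  # longest affected run along one line of descent
    largest_first_degree_cluster: int      # most affected men who are all first-degree relatives
    ages_at_diagnosis: list[int]           # one age per affected man
    father_son_pair: bool                  # any affected father-affected son pair
    affected_paternal_uncle_or_grandfather: bool
    bilineal_male: bool                    # an affected man with PC history on both sides
    sibship_only: bool                     # affected brothers only, no parental-side history
    maternal_history_only: bool            # every affected man's history is maternal-side only


def meets_carter_criteria(p: PedigreeSummary) -> bool:
    """True if any one of the three Carter criteria listed in the Methods holds."""
    early_cases = sum(1 for age in p.ages_at_diagnosis if age <= 55)
    return (p.consecutive_affected_generations >= 3
            or p.largest_first_degree_cluster >= 3
            or early_cases >= 2)


def x_linked_class(p: PedigreeSummary) -> str:
    """'no' = male-to-male or bilineal, 'unclear' = sibship only, 'yes' = maternal-only history."""
    if p.father_son_pair or p.affected_paternal_uncle_or_grandfather or p.bilineal_male:
        return "no"
    if p.sibship_only:
        return "unclear"
    if p.maternal_history_only:
        return "yes"
    return "unclear"  # assumption: patterns not covered by the stated rules are left unclear


def early_onset(p: PedigreeSummary) -> bool:
    """True if the mean age at onset of affected men is below 65 years."""
    return mean(p.ages_at_diagnosis) < 65


fam = PedigreeSummary(
    consecutive_affected_generations=2,
    largest_first_degree_cluster=2,
    ages_at_diagnosis=[52, 54, 70],
    father_son_pair=False,
    affected_paternal_uncle_or_grandfather=False,
    bilineal_male=False,
    sibship_only=False,
    maternal_history_only=True,
)
print(meets_carter_criteria(fam), x_linked_class(fam), early_onset(fam))  # True yes True
```

Checking the male-to-male and bilineal conditions first reflects the rule that a single such pattern makes the whole pedigree inconsistent with X-linked transmission, regardless of its other relatives.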
Each group had genotyped a different set of markers in the Xq27-q28 region. In order to use the available genotype data without re-genotyping a common panel of markers, a consensus map of the genetic markers from the different groups was created as follows. A total of 26 different markers on chromosome Xq (see Additional file 1: Table S1) were genotyped by ICPCG members. For our analysis, the order of these markers was determined from UCSC Goldenpath (version hg13, released Nov. 14, 2002). Marker distances were based on the deCODE map [60]. All markers were successfully mapped to the Goldenpath or deCODE maps, and cM distances were interpolated for some markers that were located on Goldenpath but not on the deCODE map (Table 1). Because some groups did not have either the first or last markers from this consensus map, dummy noninformative markers (i.e., homozygous for all subjects) were used as anchors for these groups. This allowed us to align all groups' linkage files to the consensus map even though different groups used different markers. All groups computed parametric multipoint LOD scores and non-parametric multipoint allele-sharing LOD scores at 1-cM intervals along the consensus map, using the GENEHUNTER-PLUS software [61][62][63] run through common Perl scripts. These analyses were repeated using only the 964 families that were not included in the original publication of linkage to the Xq27-28 region [38]. The output files containing pedigree-specific parametric LOD scores and intermediate files for computing nonparametric Kong and Cox allele-sharing LODs for each pedigree were sent to the Data Coordinating Center, which then combined the data for the linkage analyses. The planned analyses were developed and approved by members of the ICPCG. The allele frequencies for each marker in each group were estimated by counting alleles across all families, ignoring genetic relationships.
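Two steps in this paragraph are simple enough to sketch: placing a marker that has a physical (Goldenpath) position but no deCODE position onto the genetic map, and estimating allele frequencies by counting. The paper does not say how the interpolation was done, so the sketch assumes linear interpolation on physical position between flanking markers present on both maps; the function names and example positions are hypothetical, and treating men as contributing one allele at X-linked markers is likewise our assumption about how hemizygous genotypes were counted.

```python
from bisect import bisect_left
from collections import Counter


def interpolate_cm(bp, anchors):
    """Linearly interpolate a genetic position (cM) from a physical position (bp).

    anchors: (bp, cM) pairs for markers placed on both maps, sorted by bp.
    Positions outside the anchored interval are rejected rather than extrapolated.
    """
    positions = [pos for pos, _ in anchors]
    if not positions[0] <= bp <= positions[-1]:
        raise ValueError("position lies outside the anchored interval")
    i = bisect_left(positions, bp)
    if positions[i] == bp:
        return anchors[i][1]
    (b0, c0), (b1, c1) = anchors[i - 1], anchors[i]
    return c0 + (c1 - c0) * (bp - b0) / (b1 - b0)


def allele_frequencies(genotypes):
    """Allele frequencies by simple counting across all typed individuals,
    ignoring relationships. Each genotype is a tuple of allele labels (one
    allele for a hemizygous male at an X-linked marker, two for a female);
    None marks an untyped individual."""
    counts = Counter()
    for g in genotypes:
        if g is not None:
            counts.update(g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical flanking markers, not real Goldenpath or deCODE coordinates.
anchors = [(1_000_000, 130.0), (3_000_000, 134.0)]
print(interpolate_cm(2_000_000, anchors))  # 132.0
print(allele_frequencies([(188,), (188, 190), None, (190, 190)]))  # {188: 0.4, 190: 0.6}
```

Counting alleles without regard to relationships is quick but can bias frequencies when large families share rare alleles; that trade-off is inherent in the approach described, not something the sketch fixes.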
All groups ran analyses using the widely spaced genome-wide screening (GWS) markers (shown in bold text in Additional file 1: Table S1 and Table 1). Because some groups had also genotyped fine-mapping (FM) markers after finding suggestive evidence for linkage, our primary analyses used the GWS markers to avoid biases due to different information content across datasets. We also performed secondary analyses that included both GWS and FM markers. Because individual groups had fine-mapped at different densities and therefore obtained different levels of information content in their families, the amount of information contributed by the FM markers varied considerably across samples in these secondary analyses. This variability in marker density across studies could result in bias, so the combined analyses using the GWS markers are considered more reliable. We created two marker maps, one for the GWS markers and one for the FM markers (Table 1 shows the merged map of the GWS and FM markers).
A parametric model for dominant X-linked inheritance was used: the "Smith" model [3], with 2 liability classes, adapted to affecteds-only analysis, X-linkage and a sex-limited trait. Multipoint parametric and non-parametric analyses were performed using GENEHUNTER-PLUS. After combining the results, multipoint heterogeneity LOD scores (HLODs) [63] were computed using the LOD scores from all sites. For the nonparametric allele-sharing LODs, the Kong and Cox allele-sharing statistics were computed using output files from GENEHUNTER-PLUS [62].
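The HLOD combines family-specific LOD scores at one map position while allowing only a fraction alpha of families to be linked: HLOD = max over 0 <= alpha <= 1 of sum_i log10(alpha * 10^(LOD_i) + 1 - alpha). The sketch below is an illustration of that admixture formula, not the Data Coordinating Center's actual code; the function name and the grid resolution are assumptions. In the analyses described here it would be applied separately at each 1-cM position to the per-family LODs pooled from all sites.

```python
import math


def hlod(family_lods, steps=1000):
    """Heterogeneity LOD by grid search over the linked fraction alpha.

    Returns (hlod, alpha). alpha = 0 gives HLOD = 0, so the result is never
    negative. Each term alpha*10**lod + 1 - alpha is positive for alpha in
    [0, 1], so log10 is always defined.
    """
    best_h, best_alpha = 0.0, 0.0
    for k in range(steps + 1):
        alpha = k / steps
        h = sum(math.log10(alpha * 10.0 ** lod + 1.0 - alpha) for lod in family_lods)
        if h > best_h:
            best_h, best_alpha = h, alpha
    return best_h, best_alpha


# Toy per-family LODs at a single position (hypothetical values).
print(hlod([2.0, -2.0]))  # maximum at alpha = 0.5
```

With one family linked (LOD 2) and one unlinked (LOD -2), a homogeneity analysis (alpha fixed at 1) scores 0, whereas allowing heterogeneity recovers an HLOD of about 1.41 at alpha = 0.5. That gain is why HLODs are preferred when families are expected to differ in their underlying cause. A finer grid or a one-dimensional optimizer could replace the grid search.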

Results
In the nonparametric analyses, the analysis of all families using the GWS marker set resulted in an allele-sharing LOD of 2.0 in favor of linkage to Xq27-q28 at 138 cM, which is well below the commonly accepted threshold for claiming statistically significant evidence for linkage. Non-parametric analyses using the fine-mapping (FM) marker set always resulted in the same or lower allele-sharing LODs (e.g. 1.22 at 125 cM in the complete dataset). The subsets that resulted in higher allele-sharing LODs with the GWS markers were the 732 families with 2-3 affecteds (allele-sharing LOD = 2.56 at 134 cM), the 627 families with mean age at onset <65 years (allele-sharing LOD = 2.34 at 138 cM), and the subset of 288 families with 2-3 affecteds that appeared to exhibit male-to-male transmission (allele-sharing LOD = 3.49 at 134 cM). The subsets of families whose patterns of prostate cancer appeared consistent with X-linked inheritance did not have high positive allele-sharing LODs (Table 2).
In the parametric, multipoint HLOD analyses, when all families were analyzed using the GWS marker set, the maximum HLOD was 1.28 at 138 cM (Figure 1a). When the FM marker set was used, the maximum HLOD was 0.45 at 125 cM in the complete set of families.
Subset analyses yielded larger HLOD scores in some subsets under this 2-liability-class dominant parametric model. When using the GWS marker set, the subset of 104 families consistent with X-linked transmission, the 484 unclear families and the 735 non-X-linked (male-to-male transmission) families all gave positive HLODs, with the strongest signal observed in the last group. When the FM marker set was used, the same pattern was observed: the X-linked transmission families gave a maximum HLOD = 0.246 at 132 cM, the unclear families gave a maximum HLOD = 0.142 at 143 cM, and the non-X-linked (male-to-male transmission) families yielded a maximum HLOD = 0.62 at 153 cM. The subset of 627 families with mean age at onset <65 years gave HLOD = 1.8 at 138 cM using the GWS markers, whereas the 696 families with mean age at onset of 65 years or older had maximum HLOD = 0.32 at 120 cM. Subdivisions based on the Carter criteria alone were not highly correlated with linkage evidence. The number of affected males in the family had a larger effect on linkage evidence, particularly when combined with the pattern of transmission. The 732 families with 2-3 affecteds per family had maximum HLOD = 2.01 at 134 cM, whereas the 438 families with 4-5 affected males had HLOD = 0.1 at 168 cM and the 153 families with 6 or more affected males had HLOD = 1.4 at 153 cM. Consistent with the non-parametric analyses, the strongest evidence for linkage occurred in the subset of 288 families with 2-3 affected males and at least some evidence of male-to-male transmission: maximum multipoint HLOD = 3.24 at 134 cM (Figure 1b). Interestingly, when the analysis of this subset was restricted to the 248 families that were not included in the originally published linkage study [38], the maximum multipoint HLOD increased slightly to 3.47 at 134 cM, which exceeds the 3.3 value suggested by Lander and Kruglyak [64] for genome-wide significance (Figure 1c).
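The LOD thresholds discussed here map onto pointwise p-values. As a rough sketch, assume the usual asymptotic result for a one-sided, one-parameter linkage test, in which 2 ln(10) x LOD under the null is a 50:50 mixture of a point mass at 0 and a chi-square with 1 degree of freedom, so that p = 1 - Phi(sqrt(2 ln(10) x LOD)). This is only an approximation for an HLOD, whose extra admixture parameter changes the null distribution. The function names and the count of 10 analyses in the example are hypothetical, chosen only to illustrate a Bonferroni-style correction for testing several subsets.

```python
import math


def lod_to_pointwise_p(lod):
    """Approximate one-sided pointwise p-value for a LOD score.

    Assumes 2*ln(10)*LOD ~ 0.5*chi2_0 + 0.5*chi2_1 under the null, i.e.
    p = 1 - Phi(z) with z = sqrt(2*ln(10)*LOD). A LOD <= 0 gives p = 1.
    """
    if lod <= 0:
        return 1.0
    z = math.sqrt(2.0 * math.log(10.0) * lod)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper normal tail 1 - Phi(z)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value for n_tests analyses, capped at 1."""
    return min(1.0, p * n_tests)


for lod in (3.3, 3.47):
    print(lod, lod_to_pointwise_p(lod))

# Hypothetical: correct the 3.47 result for 10 subset analyses.
print(bonferroni(lod_to_pointwise_p(3.47), 10))
```

Under these assumptions a LOD of 3.3 corresponds to a pointwise p of roughly 5e-5, and 3.47 to roughly 3e-5. Multiplying by even a moderate number of subset analyses therefore pushes the result above genome-wide significance, while it can stay well below a replication level of 0.01.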
The subset of 330 families with 2-3 affected males who also met the Carter criteria gave similarly strong linkage results with a maximum multipoint HLOD = 2.38 at 137 cM in all such families (Figure 1d) and 2.74 at 138 cM in the 284 families in this subset that were not included in the original X-linkage publication [38] (Figure 1e).

Discussion
In the analyses presented here, there appeared to be little distinction between families with phenotypic segregation patterns consistent with X-linked inheritance (no male-to-male transmission) or those with evidence of male-to-male transmission when considering linkage evidence provided by those subsets of families for a PC susceptibility locus at Xq27-q28. Families with smaller numbers of affected men appeared to contribute the most evidence to linkage in this region. While classification of each family based on proportion of affected men out of total men old enough to be affected might provide more homogeneous subsets, this was not feasible for this study given the many sources of families with quite different ascertainment schemes and different degrees of completeness of pedigree data collection. In addition, given that prostate cancer is a late age at onset disease and is quite common, we are only able to assign "unaffected" status to men who are over the age of 75 years and who have a normal digital-rectal exam and normal PSA at that advanced age. There are small numbers of such well-characterized, elderly unaffected men in this set of families and most of the families do not have any of them, which would make a proportion misleading. Families with 2-3 affecteds per family had maximum HLOD = 2.01 at 134 cM. However, the strongest suggestion of linkage was observed in both the parametric and non-parametric analyses in families with 2-3 affecteds and possible male-to-male transmission. This pattern was observed whether we included the families from the initial linkage publication in the analysis or not. This subset of families, when excluding the families from the original linkage publication, had a multipoint HLOD of 3.47 at 134 cM. 
The HLOD in these new families was slightly over the Lander and Kruglyak threshold of 3.3 for genome-wide "significant" linkage but this threshold does not account for our multiple testing of different subsets, which likely requires a larger threshold to claim robust statistical evidence of linkage. However, this level of significance would meet the Lander and Kruglyak threshold for replication of a previously significant linkage (p = 0.01 or a LOD of approximately 1.0) even after correction for the multiple analyses. One candidate locus, SPANXB1, lies under this linkage peak. Since prostate cancer is fairly common, our analysis models allowed for the presence of sporadic cases in the families and the families with possible male-to-male transmission included some bilineal families. Thus, it is possible that in the male-to-male transmission family subsets, some families show sharing of X-chromosome markers among the maternally-related affected relative pairs in these pedigrees and no sharing among the paternally related affected pairs, thus giving evidence for X-linkage in these families. It appears that the evidence for linkage to Xq27-q28 is being driven mainly by families not included in the original linkage study [38] and these new families have had very few FM markers genotyped in this region (Additional file 1: Table S1). Interestingly, when the FM markers were added to the analyses of these same subsets, the HLODs no longer reached the Lander and Kruglyak genome-wide significance threshold. The information content when using only the GWS markers was fairly consistent across all families. However, when the FM markers were added, the information content differed greatly across groups of families and between the original and new families. An additional difference between the original and new families is that the families from the original linkage study had no markers genotyped more centromeric than 144 cM. 
Thus, it is possible that the differing position of linkage peaks between the analyses of the original and new families, coupled with differential linkage information across these datasets, is contributing to the inconsistent results. Our complete prostate cancer pedigree resource, with additional fine-mapping in the Xq27-28 region, provided at best modest suggestive evidence for linkage to this region. However, subsets of families with fewer affected individuals and, paradoxically, families with some possible evidence of male-to-male disease transmission showed stronger linkage signals. The same result was observed in the analysis of very large Utah pedigrees [21], in which the best evidence for linkage was observed in the set of pedigrees with a maximum of 5 generations and an average of 2.5 genotyped prostate cancer cases. Although somewhat stronger linkage evidence in families with male-to-male disease transmission is counterintuitive if a single X-chromosome locus causes disease, it might indicate a more complex interaction between other susceptibility loci on the autosomes and a locus at Xq27-28. Alternatively, this finding might simply reflect high locus heterogeneity and/or important environmental risk factors in the causation of prostate cancer, such that families segregating an X-linked risk allele may have at least one affected family member who does not carry this risk allele and who is paternally related to another affected family member. Figure 2 shows one such pedigree that exhibits both potential male-to-male transmission and potential maternal inheritance of PC. In this family, with a maximum LOD of 1.7 in the HPCX region, all maternally related affected males share a linked haplotype in this region and the one paternally related affected male does not share this haplotype.
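Finally, it is possible that the true causal allele in this region lies within the pseudoautosomal region PAR2, which is near this linkage peak. Because loci in PAR2 can recombine between the X and Y chromosomes, a risk allele there could be passed from father to son, which would fit the linkage evidence in families with apparent male-to-male transmission. Since female carriers cannot become affected with prostate cancer and no marker loci have been genotyped in the PAR2 region in our families, our current data are inadequate to resolve this.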

Conclusions
Although our results do not provide strong evidence for a major prostate cancer susceptibility gene located in this region of Xq27-28, there is some evidence for a locus that may contribute to risk in families with 2-3 affecteds and in some larger families such as the one in Figure 2. This locus does not appear linked to prostate cancer risk in a high proportion of larger HPC families with many affected males. Given these observations, gene identification efforts in highly penetrant families would be better targeted to other chromosomal regions, perhaps using whole-exome or whole genome DNA sequencing techniques. Gene identification at Xq27-28 should be aimed at the families with smaller numbers of affected men identified here as belonging to the most strongly linked subset and to specific large families with strongly positive LOD scores.

Additional file
Additional file 1: Table S1. Markers genotyped and used in the linkage analyses by each data collection group [3].

Competing interests
The authors declare that they have no competing interests.
Figure 2 Pedigree that exhibits both potential male-to-male transmission and potential maternal inheritance of prostate cancer. In this family, with a maximum LOD of 1.7 in the HPCX region, all five maternally related affected males share a linked haplotype (shaded black) in this region and the one paternally-related affected male does not share this haplotype. The numbers in the shapes are liability classes based on affection status and age.
Translational Genomics Research Institute, Genetic Basis of Human Disease Research Division, Phoenix AZ, USA. 11 ACTANE consortium. 12 Cancer Epidemiology Centre, Cancer Council Victoria, Melbourne Australia. 13 Centre for Molecular, Environmental, Genetic and Analytic Epidemiology, School of