Genetic variation in the NBS1, MRE11, RAD50 and BLM genes and susceptibility to non-Hodgkin lymphoma

Background: Translocations are hallmarks of non-Hodgkin lymphoma (NHL) genomes. Because lymphoid cell development processes require the creation and repair of double-stranded breaks, it is not surprising that disruption of this type of DNA repair can cause cancer. The members of the MRE11-RAD50-NBS1 (MRN) complex and BLM have central roles in maintenance of DNA integrity. Severe mutations in any of these genes cause genetic disorders, some of which are characterized by increased risk of lymphoma.
Methods: We surveyed the genetic variation in these genes in constitutional DNA of NHL patients by means of gene re-sequencing, then conducted genetic association tests for susceptibility to NHL in a population-based collection of 797 NHL cases and 793 controls.
Results: 114 SNPs were discovered in our sequenced samples, 61% of which were novel and not previously reported in dbSNP. Although four variants, two in RAD50 and two in NBS1, showed association results suggestive of an effect on NHL, they were not significant after correction for multiple tests.
Conclusion: These results suggest an influence of RAD50 and NBS1 on susceptibility to diffuse large B-cell lymphoma and marginal zone lymphoma. Larger association and functional studies could confirm such a role.


Background
Non-Hodgkin lymphoma (NHL) is a heterogeneous group of hematological malignancies that in aggregate constitutes the fifth highest cause of cancer mortality in the United States [1] and Canada [2]. NHL subtypes vary in presentation, survival expectation, morbidity and responses to treatment. Chromosomal translocations are so characteristic of NHL that many genes now known to be important in the development of cancer, such as BCL2 [3], were originally discovered due to their position at recurrent translocation breakpoints in NHL tumours.
During development and differentiation, the DNA of B- and T-cells is subject to double-stranded breaks necessary for the rearrangement of immunoglobulin genes. Genes functioning in double-stranded break repair are involved in successfully controlling and repairing these breaks, thus protecting the genome from molecular events that could lead to cancer. This study examined four genes with key roles in maintaining genome stability: the three genes encoding the MRN complex (MRE11, RAD50 and NBS1) and the Bloom syndrome gene (BLM). We have previously shown association with NHL of a genetic variant in H2AX, which encodes a histone involved in signalling the presence of double-stranded breaks [4]. The MRN complex forms foci at sites of double-stranded breaks induced by ionizing radiation or immunoglobulin rearrangements during B- and T-cell development, sensing DNA damage and initiating DNA repair [5][6][7].
The chromosome instability syndromes (reviewed in [8]) form a group of rare autosomal recessive diseases characterized by an increased risk of cancer. This group includes ataxia-telangiectasia (AT, OMIM 208900), Nijmegen breakage syndrome (NBS, OMIM 251260), Bloom syndrome (OMIM 210900) and Fanconi anemia (OMIM 227650). NBS includes an increased risk of lymphoid malignancies [9], particularly B-cell lymphoma [10,11]. Some patients with an NBS-like phenotype have mutations in RAD50 [12]. Hypomorphic mutations in MRE11 result in an AT-like disorder (AT-LD). NBS and AT-LD share many features, including immunodeficiency and genome instability caused by failure of timely activation of cell cycle checkpoint pathways [13][14][15][16].
Mutations in NBS1 cause aplastic anemia and acute lymphoblastic leukemia [17,18]. RAD50 variants have also been associated with an increased risk of sporadic [12], but not necessarily familial breast cancer [19,20]. MRE11 inactivation has been identified in colorectal cancer cell lines and primary tumours [21], suggesting that inactivation of the MRN complex could be a frequent event in cancers.
Bloom syndrome is also marked by a predisposition to cancer, particularly lymphoma and leukemia in young patients [22]. Although homozygous loss of Blm in mice leads to embryonic lethality, heterozygotes show increased risk of neoplasia, with augmented T-cell tumourigenesis [23]. This haploinsufficiency is supported by the increased risk of cancer in BLM heterozygotes of Ashkenazi Jewish descent [24], although there is some controversy regarding this finding [25]. This illustrates BLM's role in response to DNA damage [26], particularly during DNA replication stress [27].
While both Nbs1 [28] and Mre11 [29] null mutants are inviable in vertebrates, the hypermorphic Rad50S mutation causes hematopoietic stem cell failure so that mice that do not die of lymphoma die of bone marrow attrition [30], highlighting the delicate balance the MRN complex exerts on cell survival. This is illustrated by the dosage sensitivity to this mutation and the bidirectional phenotypic rescue in Rad50S/S Atm-/- mice [31], leading the authors to speculate that while mutations that cause gross chromosomal instability would have a wide array of outcomes, less severe mutations would primarily affect tissues developed from a limited number of precursor stem cells. Since the hematopoietic system is such a system, this reinforces the need to look for variants in genes already known to be associated with severe genetic disorders, with the rationale that varying degrees of mutation severity affect the spectrum of possible effects.
To systematically investigate the role of NBS1, MRE11, RAD50 and BLM in susceptibility to NHL, we carried out re-sequencing of these four genes to establish the spectrum of genetic variation in NHL cases, and genotyped 797 NHL cases and 793 controls. Just as total inactivation of a gene and attenuation of its activity lead to different phenotypes in mice, we expected that subtle variation in DNA repair genes could be pertinent to NHL risk in the general population, while complete inactivation of these genes leads to rare and severe syndromes.

Study population
The methodology has been described previously [32,33]. Informed consent was obtained as approved by the joint University of British Columbia/British Columbia Cancer Agency Research Ethics Board. All HIV-negative NHL cases diagnosed in British Columbia from March 2000 to February 2004, residing in the Greater Vancouver Regional District and greater Victoria (Capital Regional District), aged 20 to 79 were invited to participate. Cases were reviewed and coded using the World Health Organization classification by an experienced lymphoma pathologist (RDG). Population controls were identified from the Client Registry of the British Columbia Ministry of Health and were frequency matched to cases by sex, age, and area of residence in a 1:1 ratio. 828 cases and 848 controls completed at least part of a study questionnaire; however, only those subjects with DNA available were used in this study. Table 1 summarizes the characteristics of the 797 cases and 793 controls available for analysis.

DNA extraction and sequencing
Genomic DNA was extracted from whole blood (in 10% of cases from a mouthwash or saliva sample) using the PureGene DNA isolation kit (Gentra Systems) following manufacturer's instructions. DNA was then quantified using PicoGreen (Molecular Probes) in a Victor2 fluorescence plate reader (Perkin-Elmer).
The genomic sequences for all genes were downloaded from the UCSC genome browser [34]. All coding and non-coding exons were sequenced, as well as 1000 base pairs upstream of transcription start. Conserved non-coding sequence regions (CNS regions) were identified using the VISTA genome browser [35]. The six most highly conserved CNS regions with at least 100 base pairs of at least 70% identity with the mouse and rat homolog were also sequenced.
Primers were selected for all amplicons using Primer3 [36]. The -21M13F (TGTAAAACGACGGCCAGT) forward or M13R (CAGGAAACAGCTATGAC) extensions were added to the 5' ends of the forward and reverse PCR primers, respectively, to allow uniform sequencing conditions. PCR and sequencing reactions were carried out as previously described [37]. Primers and conditions used in PCR reactions are listed in Additional file 1. The quality of sequencing reads was assessed using Phred [38,39], potential variants identified by Polyphred version 5 [40] and all sequences assembled with reference sequences using Phrap [41] and viewed in Consed version 12 [42].
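The primer tailing step described above can be sketched as follows. This is a minimal illustration only: the two M13 tail sequences are those quoted in the text, whereas the gene-specific primer sequences and the function name `add_m13_tails` are hypothetical and are not taken from Additional file 1.

```python
# M13 tail sequences as quoted in the text (written 5'->3').
M13_FORWARD_TAIL = "TGTAAAACGACGGCCAGT"  # -21M13F
M13_REVERSE_TAIL = "CAGGAAACAGCTATGAC"   # M13R


def add_m13_tails(forward_primer, reverse_primer):
    """Prepend the M13 tails to the 5' ends of a PCR primer pair.

    Both primers are expected as 5'->3' strings; the tail is placed in
    front of the gene-specific sequence, i.e. at its 5' end.
    """
    tailed_forward = M13_FORWARD_TAIL + forward_primer.upper()
    tailed_reverse = M13_REVERSE_TAIL + reverse_primer.upper()
    return tailed_forward, tailed_reverse


# Hypothetical gene-specific primers, for illustration only.
example_forward = "ACGTTGCAAGGCTTAGCATC"
example_reverse = "GGATCCTTAGCAAGTCCGTA"

tailed_forward, tailed_reverse = add_m13_tails(example_forward, example_reverse)
print(tailed_forward)
print(tailed_reverse)
```

Because every amplicon then carries the same two tails, a single pair of sequencing primers can be used for all sequencing reactions.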
Haplotypes of variants with minor allele frequency (MAF) >5% in the sequence data were inferred using PHASE v2.1.1 [43,44]. Four tagSNPs were selected for each gene using TagSNP, version 1.1 [45]. Three additional SNPs of potential functional relevance in NBS1 were also tested.

Genotyping
TaqMan® was used for all genotyping. Assays were designed using the Assays-by-Design service (Applied Biosystems). Primers and probes used are listed in Additional file 2. 10 ng of each sample was aliquoted in 384-well plates and the DNA dried down at room temperature. TaqMan reactions were carried out in 5 µL volumes as per the manufacturer's protocols. Fluorescence data were obtained in the ABI PRISM 7900 HT, after 10 min at 95°C, followed by 40 cycles of 92°C for 15 s and 60°C for 1 min. The SDS2.2 software (Applied Biosystems) was used to assign genotypes to individual samples.

Statistical Analyses
Statistical analyses were carried out as described previously [32]. Briefly, all controls were tested for deviation from Hardy-Weinberg equilibrium. Odds ratios (OR) and 95% confidence intervals were estimated using logistic regression. These analyses were conducted using SPSS version 15, with adjustment for sex, age group (categories: 20-49, 50-59, 60-69, 70+), residence (Vancouver or Victoria), and for ethnicity (Caucasian, Asian, South Asian, Mixed, Unknown/Refused) when all cases and all controls were analyzed together. Heterozygotes and rare homozygotes were combined for analysis when the number of rare homozygotes was less than five. Tests were not performed when the sum of the number of heterozygotes and rare homozygotes was less than five for cases or controls. Tests for trend were conducted when there were at least five samples in each genotype category for both cases and controls. Multiple testing correction was carried out by the false discovery rate (FDR) method [46]. Because we tested nineteen markers, the p-value of the most significant marker must be below the threshold of 0.0026 to be considered significant. The haplotypes inferred were analyzed as categorical variables and assessed for risk effect using R version 2.1.1 [47]. Haplotypes with frequency <4.5% were combined into a "rare" category.

[...] non-synonymous mutations, 4 of which were ranked as "probably" or "possibly damaging" by PolyPhen [50]. Only one of these, BLM_X13_(2603)_C/T, was observed more than once, with a MAF of 5.6%. Fifty-five (48%) variants were "singletons", meaning the minor allele was only observed once in this data set of 87 samples, or 174 chromosomes. Forty-one (36%) variants were "common", with MAF of at least 5%. 59% of variants were pre- [...]

Genotyping
Haplotypes were inferred using the 41 variants that were observed more than once in the sequence data, using PHASE v2.1.1 [43,44]. The number of haplotypes inferred for each gene is indicated in Table 2. Haplotype tagging SNPs (tagSNPs) were selected using TagSNP version 1.1 [45]. Nineteen variants were chosen for genotyping and are indicated in bold in Additional file 4.
The 19 tagSNPs were genotyped in 797 cases and 793 controls, with an average genotype call rate of 97.6%. Their respective MAFs, as calculated using all 1590 samples, are in Additional file 2. The concordance of genotypes (in the 87 samples that were sequenced) between the independent methods of sequencing and TaqMan genotyping was complete; no discrepancies were found. As a quality assurance measure, we also genotyped the 19 SNPs in DNA samples from five three-generation CEPH families (purchased from Coriell Cell Repositories, NJ, USA) and confirmed that the alleles segregated according to Mendelian inheritance.
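The two summary quantities reported above, genotype call rate and minor allele frequency (MAF), can be computed from genotype calls as in the following sketch. The genotype encoding (two-letter strings, `None` for a failed call), the function names and the example data are assumptions made for illustration; they do not reproduce the study data or the SDS2.2 output format.

```python
def call_rate(genotypes):
    """Fraction of samples with a genotype call (None marks a failed call)."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF of a biallelic SNP from genotype strings such as 'CC', 'CT', 'TT'."""
    allele_counts = {}
    for genotype in genotypes:
        if genotype is None:
            continue  # failed calls contribute no alleles
        for allele in genotype:
            allele_counts[allele] = allele_counts.get(allele, 0) + 1
    total_alleles = sum(allele_counts.values())
    if total_alleles == 0:
        raise ValueError("no called genotypes")
    if len(allele_counts) < 2:
        return 0.0  # monomorphic in this sample
    return min(allele_counts.values()) / total_alleles


# Hypothetical calls for ten samples: 5 CC, 3 CT, 1 TT, 1 failed call.
example_calls = ["CC"] * 5 + ["CT"] * 3 + ["TT"] + [None]

example_call_rate = call_rate(example_calls)
example_maf = minor_allele_frequency(example_calls)
print(f"call rate = {example_call_rate:.3f}, MAF = {example_maf:.3f}")
```

In this toy example, 9 of 10 samples are called, and the T allele accounts for 5 of the 18 called alleles.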

NHL association tests
We compared all European ancestry controls against all European ancestry NHL cases, all B-cell NHL, all T-cell NHL and major subtypes individually. One of the variants, MRE11_5UP_(-1456)_C/T, was excluded from analysis due to deviation from Hardy-Weinberg equilibrium in controls. Results for the two most common subtypes, diffuse large B-cell lymphoma (DLBCL) and follicular lymphoma (FL), and results suggestive of association with Marginal Zone lymphoma/Mucosa-Associated Lymphoid Tissue (MZ/MALT) are shown in Table 3; see Additional file 9 for all results. RAD50_IVS22(+24)_A/G showed a possible association with DLBCL that was strong enough to influence the overall NHL analysis (ptrend of 0.022 for DLBCL [...]). Combined analyses of all samples from all ethnicities were also performed, adjusting for ethnicity in the model (data not shown); some SNPs (usually the same as in the European ancestry only analysis) again showed results suggestive of association but failed to reach p < 0.05 upon correction for multiple testing. The ethnic diversity of our study population could mask a real signal and so we focused on the European subpopulation.
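The Hardy-Weinberg check that led to the exclusion of one variant can be illustrated with a short sketch. The text does not state which test was used, so the one-degree-of-freedom chi-square goodness-of-fit test below is an assumption, as are the function name and the genotype counts, which are hypothetical.

```python
import math


def hardy_weinberg_chi_square(n_major_hom, n_het, n_minor_hom):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium.

    Returns (chi_square, p_value) for observed genotype counts of a
    biallelic SNP. Assumes both alleles are observed in the sample.
    """
    n = n_major_hom + n_het + n_minor_hom
    p = (2 * n_major_hom + n_het) / (2 * n)  # major allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("SNP is monomorphic; test is undefined")
    observed = (n_major_hom, n_het, n_minor_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical control genotype counts.
chi_ok, p_ok = hardy_weinberg_chi_square(49, 42, 9)     # matches HWE expectations
chi_bad, p_bad = hardy_weinberg_chi_square(60, 20, 20)  # deficit of heterozygotes
print(f"in equilibrium: chi2 = {chi_ok:.3f}, p = {p_ok:.3f}")
print(f"deviating:      chi2 = {chi_bad:.3f}, p = {p_bad:.2e}")
```

A marker whose control genotypes resemble the second example would be a candidate for exclusion, since such a deviation in controls often indicates a genotyping problem rather than biology.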
The haplotypes inferred from individual SNP genotypes were also tested for association with NHL using R version 2.1.1 (data not shown). No haplotype was more significantly associated with NHL than the individual SNPs forming that haplotype.
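The multiple-testing criterion applied to these association results can be sketched as follows. The sketch assumes that the FDR method cited in the statistical analyses is the Benjamini-Hochberg step-up procedure, which is consistent with the stated threshold for the most significant of nineteen markers (0.05 × 1/19 ≈ 0.0026). The function names are illustrative, and apart from the reported p-trend of 0.022 the p-values are hypothetical.

```python
def bh_threshold(rank, n_tests, alpha=0.05):
    """Benjamini-Hochberg threshold for the p-value at a given rank (1 = smallest)."""
    return rank / n_tests * alpha


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans (original order): True if significant under BH."""
    n_tests = len(p_values)
    order = sorted(range(n_tests), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at or below its threshold.
    largest_passing_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= bh_threshold(rank, n_tests, alpha):
            largest_passing_rank = rank
    # All p-values ranked at or below k are declared significant.
    significant = [False] * n_tests
    for rank, index in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            significant[index] = True
    return significant


first_rank_threshold = bh_threshold(1, 19)
print(f"threshold for the top-ranked of 19 markers: {first_rank_threshold:.4f}")

# 0.022 is the p-trend reported in the text; the other 18 values are hypothetical.
example_p_values = [0.022, 0.03, 0.04, 0.045] + [0.10 + 0.05 * i for i in range(15)]
example_significant = benjamini_hochberg(example_p_values)
print("any marker significant after correction:", any(example_significant))
```

With p-values of this kind, several markers fall below the nominal 0.05 level, yet none passes its rank-specific threshold.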

Discussion
RAD50, NBS1, MRE11 and BLM were re-sequenced in 87 NHL cases to characterize the variation in these genes in NHL cases in our population. All genes had similar numbers of variants and similar nucleotide diversity, albeit slightly greater for NBS1 (Table 2). All four genes showed evidence of negative selection, as indicated by a Ka/Ks value of less than one (0.56 for all four genes combined), which we would expect for genes involved in such a conserved and critical process as DNA repair. The most variable gene, NBS1, also showed the lowest conservation.
Two SNPs in RAD50 were suggestive of association with specific NHL subtypes [...]. In agreement with most other studies [53][54][55][56][57], we did not find that variants in NBS1 conferred an increased risk of lymphoma, although there remain some contradictory positive reports [58][59][60][61]. In contrast, non-synonymous mutations in NBS1 have been shown to be associated with acute lymphoblastic leukemia in German [17] and Polish [62] children. A study by Rollinson et al. [63] of haplotypic variation in NHL found no increased risk associated with haplotypes of NBS1 and RAD50; however, they observed the variant rs601341 in MRE11 to have a protective effect on FL and a protective effect of an MRE11 haplotype on DLBCL. We did not sequence the part of intron 18 where rs601341 is located and so did not explicitly test this SNP. The difference between our results and those of Rollinson et al. could be the result of a SNP-specific effect, and/or the different populations studied.
Although there have been other studies of susceptibility to NHL looking at the genes addressed in this study, most have relied on the genotyping of rare variants discovered in studies of the rare recessive syndromes discussed above. Genotyping was generally done using single-strand conformation polymorphisms [17,53,54,56,58,61,62] or by TaqMan [63]. One study [63] used public databases to collect the information on the SNPs in the regions of interest. However, sequencing of germline DNA of patients with sporadic lymphoma to systematically identify genetic variants had not been previously done. Our systematic characterization of these genes provides valuable information on the variation found in these genes in individuals with NHL. Previous systematic investigations of another double-stranded break repair gene, ATM, by our group did not reveal any association between common variants in ATM and NHL or its subtypes [32]. In contrast, a common SNP in the promoter region of H2AX showed a protective effect on NHL and on FL in particular [4].
Limitations of our study include the histological heterogeneity of NHL, which is composed of many subtypes, many of which are rare. Identification of genetic susceptibility factors that differ between subtypes will be limited by the lack of availability of adequate sample numbers for less common subtypes. The clinical diversity of NHL enabled us to make the strongest conclusions only for DLBCL and FL. Our sample is also ethnically heterogeneous, and so has reduced power to detect genetic factors that are present only in specific ethnic groups. Future replication of results in the context of large international consortia, such as the InterLymph Consortium [64], will help to overcome such limitations.

Conclusion
While the genes in this study were not significantly associated with NHL independently, it is possible that they could modify NHL risk in combination with other variants. Larger studies would be required to detect such gene-gene interactions. Our observation of possible associations of SNPs in RAD50 with DLBCL and MZ/MALT lymphomas may contribute to the refinement of biological hypotheses for confirmation in larger association studies and functional studies. Mechanisms of tumourigenesis, and the basis for NHL susceptibility, probably differ between NHL subtypes. Specific observations such as these will help us understand the etiological basis for the diversity of NHL types.

Table notes: If fewer than 5 samples were in a category, the analysis is not valid and is marked by "-". Analyses were not done for subtypes that had fewer than 5 heterozygotes and minor homozygotes combined. Analysis is adjusted for gender, ethnicity, age, and residence. The p-value for the test for trend is shown in italic type. p-values less than 0.05 are in bold. All results are in Additional file 9.