Genetic association analysis of miRNA SNPs implicates MIR145 in breast cancer susceptibility

Background MicroRNAs (miRNAs) are important small non-coding RNA molecules that regulate gene expression in cellular processes related to the pathogenesis of cancer. Genetic variation in miRNA genes could impact their synthesis and cellular effects and single nucleotide polymorphisms (SNPs) are one example of genetic variants studied in relation to breast cancer. Studies aimed at identifying miRNA SNPs (miR-SNPs) associated with breast malignancies could lead towards further understanding of the disease and to develop clinical applications for early diagnosis and treatment. Methods We genotyped a panel of 24 miR-SNPs using multiplex PCR and chip-based matrix assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) analysis in two Caucasian breast cancer case control populations (Primary population: 173 cases and 187 controls and secondary population: 679 cases and 301 controls). Association to breast cancer susceptibility was determined using chi-square (X2) and odds ratio (OR) analysis. Results Statistical analysis showed six miR-SNPs to be non-polymorphic and twelve of our selected miR-SNPs to have no association with breast cancer risk. However, we were able to show association between rs353291 (located in MIR145) and the risk of developing breast cancer in two independent case control cohorts (p = 0.041 and p = 0.023). Conclusions Our study is the first to report an association between a miR-SNP in MIR145 and breast cancer risk in individuals of Caucasian background. This finding requires further validation through genotyping of larger cohorts or in individuals of different ethnicities to determine the potential significance of this finding as well as studies aimed to determine functional significance.


Background
MicroRNAs (miRNAs) are small non-coding single stranded RNA molecules (21 -25 nucleotides) involved in negative regulation of gene expression [1,2]. miRNAs are transcribed from long microRNA primary transcripts (pri-miRNAs) containing miRNA precursors (pre-miRNA, stem-loop molecules of 55 -70 nucleotides). Pre-miRNAs are usually generated via the canonical pathway [3], where pri-miRNAs are cleaved by the complex formed by the nuclear ribonuclease (RNase) III DROSHA and the RNAbinding protein DCGR8 (DiGeorge syndrome critical region gene 8) [4]. Following transport to the cytoplasm by XPO5 (exportin5) and the nuclear protein ran-GTP, they are processed by DICER1 [5,6]. The resultant duplex molecule contains both a single-stranded mature miRNA sequence and a complementary miRNA* strand which is released and degraded while the mature miRNA is loaded into the RNA-induced silencing complex (RISC) and merged into the Argonaute/EIFC2C (Ago) proteins [7,8].
Each miRNA may bind to up to 200 gene targets and multiple binding sites for different miRNAs, making a highly complex web of interactions affecting a variety of biological pathways [9,10].
miRNAs are very likely to play an important role in cancer biology due to their role in the regulation of important cellular processes including growth, differentiation and cell survival [11][12][13][14][15][16]. Several mechanisms result in changes in miRNA synthesis and expression in relation to cancer including: point mutations in miRNA and mRNA sequences, loss or mutation in the promoter regions for specific miRNA clusters, epigenetic changes and alterations in pathway related to dsRBD proteins [17][18][19]. Single nucleotide polymorphisms (SNPs) in miRNA genes (miR-SNPs) are one example of point mutation (albeit one occurring in the past) that could affect miRNA function in one of three possible ways: altering transcription of the primary miRNA transcript; processing of the pri-miRNA and pre-miRNA and; through their effects on modulation of miRNA-mRNA interactions [20][21][22][23]. As a result miR-SNPs have been associated with different types of cancer, including chronic lymphocytic leukaemia, gastric, lung and thyroid carcinoma [24][25][26][27].
Breast cancer is the second most commonly diagnosed cancer worldwide (1.67 million new cases, 25%) and the most common type of cancer for women in developed countries (793,684 new cases, 28%) [28]. Recent evidence suggests a role for miR-SNPs in breast cancer susceptibility including work by Hu et al. where the presence of mutant alleles of MIR196A2 rs11614913 and MIR499A rs3746444 significantly increased breast cancer risk in Chinese women [29]. However in genetic association analysis of Caucasian populations and functional studies in breast cancer cell lines performed by Hoffman et al. the presence of SNP in MIR196A2 rs11614913 was significantly associated with reduced risk of breast cancer, as well as less efficient processing of MIR196A2 and reduced capacity to regulate target genes, indicating additional factors may be at work in this SNP's effect on breast cancer [30]. Additionally, Kontorovich et al. found 2 SNPs, rs6505162 and rs895819 located in MIR423 and MIR27A precursors respectively, to be significantly associated with decreased risk of breast cancer in BRCA2 mutation carriers from a Jewish population [31]. rs895819 was also significantly associated with reduced risk of developing breast cancer in families with a history of non-BRCA related breast cancer in a later study performed by Yang et al [32]. Finally rs2910164, a miR-SNP located in the 3p strand of MIR146A, was found to be associated with a younger age of diagnosis in familial breast cancer for BRCA1 mutation carriers [33].
In this study, we have investigated a panel of miR-SNPs present in miRNA genes previously identified to be involved in the pathophysiological mechanisms of breast cancer. Our initial selection included 24 variants   located in 10 miRNA genes or in close proximity which were genotyped using multiplex PCR and matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-TOF MS) analysis. Genotyping results of the 19 miR-SNPs successfully genotyped and included in this study identified six previously reported microRNA SNPs as non-polymorphic. From the remaining polymorphisms twelve miR-SNPs showed no significant differences between cases and controls and one variant in MIR145 was identified to be associated with breast cancer susceptibility in both of our Australian Caucasian breast cancer case-control populations. included in this study were between 33 and 80 years of age, with an average age of 60.16 and they were screened for personal and/or familial history of breast, ovarian or any other type of cancer. Control population for the GU-CCQ BB was established from 2 sources: The control group for this cohort was comprised of genotyping result data taken from 201 healthy females belonging to the phase 1 European population from the 1000Genomes project. Efforts were made to select a subgroup of individuals that were comparable to the case group in terms of age, ethnicity and sex [34].

Study populations
Genomic DNA sample preparation from whole human blood Genomic DNA was extracted from whole blood samples using a modified salting out method described previously [35,36]. DNA samples were evaluated by spectrophotometry using the Thermo Scientific NanoDrop™ 8000 UV-Vis Spectrophotometer (Thermo Fisher Scientific Inc., Wilmington, DE. USA) to determine DNA yield and 260/280 ratios [37][38][39]. Samples with a reading below 1.7 for their 260/280 ratio were purified using an ethanol precipitation protocol to guarantee DNA sample purity [40].
miRNA SNP selection Figure 1 shows the selection process we followed to determine miRNA SNPs (miR-SNPs) that could be included in our study. Two datasets, "The whole miRNA-disease association data" and "The miRNA function set data" from the human miRNA disease database (HMMDD) created by Lu et al. [41] and updated in January 2012, were used to select 8 diseases and/or pathological characteristics and 24 biological and/or cellular functions related to breast cancer (See Table 1). As shown in Fig. 1, we picked the 50 miRNA genes from each dataset that were present in the majority of selected features for inclusion in the following steps. This list was narrowed down to the 25 miRNA genes on each dataset with the strongest evidence in order to maximise the potential for identification of biologically relevant molecules using two main criteria: miRNAs involved in the largest number of selected features from each group followed by a literature search to confirm the number of publications showing significant relationships to cancer biology or the possession of known functional effects of polymorphisms within the miRNA itself. Following this, we chose 10 miRNA genes from the 25 genes on both lists, again prioritising by number of functions and publications, and conducted a search to identify SNPs using both dbSNP database from The National Center for Biotechnology Information (NCBI) [42] and 1000 Genomes project browser [43]. Final selection of SNPs was done using this algorithm: All microRNA-SNPs located inside the pre-miRNA gene were automatically included in the SNP selection. However, SNPs located outside of the pre-miRNA gene were assessed using the following criteria: miR-SNPs located up to 500bp upstream or downstream from pre-miRNA were automatically included in the SNP selection. On the other hand, SNPs located more than 500bp from the 3' or 5' end were chosen only if they had a previously reported minor allele frequency  higher than 5% in Caucasian populations. As a result 56 microRNA SNPs were identified in this preliminary selection (Data not shown) (See Fig. 1).

Primer design
Using the MassARRAY® Assay Design Suite v1.0 software (SEQUENOM Inc., San Diego, CA, USA) we were able to create a single multiplex PCR genotyping assay containing 24 miR-SNPs from our preliminary selection (See Table 2). We designed forward and reverse PCR primers and one iPLEX® (extension) primer and verified that the mass of extension primers differed by at least 30 Da among different SNPs and by 5 Da between alternative alleles of the same marker to achieve successful marker and allele identification by mass spectrometry analysis. Primers were manufactured by Integrated DNA Technologies (IDT®) Pte. Ltd. (Baulkham Hills, NSW 2153, Australia) and primer information is shown in Table 3.

Primary multiplex PCR
Genotyping was undertaken following the iPLEX™ GOLD genotyping protocol using the iPLEX® Gold Reagent Kit (SEQUENOM Inc., San Diego, CA, USA). Primer extension reactions were performed according to the instructions for the SEQUENOM linear adjustment method included in the iPLEX™ GOLD genotyping protocol (SEQUENOM Inc., San Diego, CA, USA

MALDI-TOF MS analysis and data analysis
A total of 12-16 nl of each iPLEX® reaction product were transferred onto a SpectroCHIP® II G96 (SEQUE-NOM Inc., San Diego, CA, USA) using SEQUENOM® MassARRAY® Nanodispenser (SEQUENOM Inc., San Diego, CA, USA). SpectroCHIP® analysis was carried out by SEQUENOM® MassArray® Analyzer 4 and the SpectroAcquire software Version 4.0 (SEQUENOM Inc., San Diego, CA, USA). Finally data analysis for genotype determination was done using the MassAR-RAY® Typer software version 4.0 (SEQUENOM Inc., San Diego, CA, USA). In order to confirm the genotypes obtained, randomly selected samples (5 each for case and control cohorts) from each genotype (n = 240) were validated by Sanger Sequencing to ensure accuracy of genotyping results. In all cases, the Sanger Sequencing confirmed the genotyping obtained using MassARRAY.

Statistical analysis
Statistical analysis of genotypes and alleles was conducted using Plink software version 1.07 (http:// pngu.mgh.harvard.edu/purcell/plink/) [44]. The α for p-values was set at 0.05 to determine statistically significant association with breast cancer. Genotype and    allele frequencies for each miRNA SNP in our case and control populations were established and we used Hardy-Weinberg equilibrium (HWE) to evaluate deviation between observed and expected frequencies for identification of unexpected population or genotyping biases [45,46]. We performed Chi square analysis to evaluate differences in genotype and allele frequencies between cases and controls for each independent population [47]. Finally we calculated odds ratio (OR) and obtained 95% confidence interval (CI) 95% to assess disease risk.

Results and discussion
MicroRNAs are some of the small non-coding RNA molecules responsible for gene regulation at the translational level. They require a very complex series of nuclear and cytoplasmic processes for their synthesis and to achieve their functional effects on genes involved in key cellular functions like replication and cell differentiation [7,10,13]. As a result they are likely to play a role in the development and progression of cancer due to the important biological mechanisms they regulate [12,15,16]. They have been shown to have different roles in various types of cancer including breast cancer [18,24]. Breast cancer is a cause for concern since recent reports by the IARC show it has very high incidence and mortality rates around the world [28]. Analysis of single nucleotide polymorphisms in well-defined case control cohorts has provided information on miR-SNPs involved in the pathophysiology of different types of cancer including breast cancer [25][26][27][29][30][31][32][33]. Therefore we selected a panel of 24 miR-SNPs related to 9 miRNA genes previously identified to play a role in breast cancer to genotype in our Australian Caucasian breast cancer case control populations. Genotyping of our selected miRNA variants in the GRC-BC cohort showed six of them to be non-polymorphic (rs73798217, rs112394324, rs35301225, rs1547354, rs7050391 and rs3851812) and another five of the chosen miR-SNPs failed to successfully deliver genotypes (rs2829801, rs7395206, rs2858059, rs1143770 and rs77585961). On the other hand, genotype and allele frequencies of the remaining 13 miR-SNPs in the GRC-BC cases and controls showed these to be closely similar to those found in Hapmap for Caucasian populations and they were also in Hardy Weinberg Equilibrium (HWE) (p > 0.05). Ultimately, however, chi-square analysis of genotyping results of 12 SNPs located in relation to seven miRNAs (MIR210, MIR221, MIR222, MIR21, MIRLET7A1, MIRLET7A2 and MIR145) showed no significant differences for genotype and allele frequencies between cases and controls in our GRC-BC population (See Tables 4, 5 , 6, 7, 8 and 9).
In contrast, we were able to determine significant differences at the allelic level for rs353291 after chisquare analysis in our GRC-BC cohort (p = 0.041) although no significant difference in genotype frequencies (p = 0.09) (See Table 10). We then proceeded to genotype this SNP in the GU-CCQ BB population and we also found the genotype and allele frequencies closely matched Hapmap frequencies for Caucasian populations and cases and controls were in HWE. Statistical analysis of genotyping in our replication population showed similar findings to those obtained in the GRB-BC cohort shown in Table 10. We were able to find significant differences in allele frequencies between cases and controls in the GU-CCQ BB population (p = 0.006) and also statistically significant differences at the genotype level (p = 0.02). Finally we calculated odds ratios for alleles in the GRC-BC and GU-CCQ BB populations to be 1.37 (CI 95%: 1.01-1.84) and 1.31 (CI 95%: 1.04-1.70) respectively, suggesting the presence of the C allele at this locus increases the risk of developing breast cancer. However it should be noted that if we consider multiple testing, our finding for the analysis of allele frequencies for both populations for this variant is not significant if we determine Bonferroni correction of the α for p-value to be 2.08 x 10 −3 . However these results are still potentially interesting particularly considering the similarity of allelic significance in both independent case-control cohorts and point to the need for further genotyping in extended populations.
miR-SNP rs353291 is located 450 bp upstream from the MIR145 gene, inside the miRNA 143 host gene transcript, in the long arm of chromosome 5 region 32 at position 148,810,746. To the best of our knowledge there are no previous association studies in relation to this miR-SNP and breast cancer risk in Caucasians or any populations of other ethnicities. However, MIR145 has been previously reported to play a role in cancer biogenesis and progression. Downregulation of MIR145 is a common finding previously reported in colorectal [48,49], bladder [50], lung [51,52] and oesophageal [53] cancers possibly leading to poor prognosis. Similarly, it has also been found to have a tumour suppressor role in breast tissue particularly in the myoepithelial/basal cell compartment and it is found to be downregulated in breast cancer tissue samples [54][55][56]. Research using breast cancer cell lines showed it regulates genes involved in modulation of apoptosis [57,58]. Finally downregulation of MIR145 has been associated with a more aggressive behaviour of breast malignancies based on results performed in both breast cancer cell line and tissue samples [59,60]. However the molecular mechanisms leading to decreased expression of MIR145 still remain unknown and our finding could potentially help to provide further knowledge on these mechanisms through further functional validation on breast cancer cell culture and/or animal/ models. It is also possible that rs353291 is linked to a different nearby SNP not considered in this research that may have direct functional effects on MIR145, so additional sequencing/genotyping studies may need to be performed prior to functional assessment of the link between rs353291 and breast cancer.

Conclusions
We were able to determine that the presence of a polymorphism in miR-SNP rs353291 is associated with an increased risk of developing breast cancer based on our findings. To the best of our knowledge, this is the first report on breast cancer risk association for this variant in individuals of Caucasian background. This finding could potentially explain the previously described role that MIR145 plays in breast cancer documented in the literature, but it requires confirmation via functional studies using cell culture or animal models. It also requires further validation on larger Caucasian population as well as in cohorts of individuals with different ethnical backgrounds before it can be translated into clinical applications used in breast cancer diagnosis or treatment.