Association study between common variations in some candidate genes and prostate adenocarcinoma predisposition through multi-stage approach in Iranian population

Background Prostate cancer is one of the five most common cancers and ranks second in incidence and third in mortality in the Iranian population. The purpose of this study was to evaluate the association of rs16901979, rs4242382 and rs1447295 at the 8q24 locus, rs2735839 (KLK3 gene) and rs721048 (EHBP1 gene) with prostate adenocarcinoma through a multi-stage approach, in order to identify polymorphisms associated with prostate cancer that could be used as screening factors. Screening tests can identify people at risk of developing the disease before diagnosis and before any symptoms appear. Methods The case-control study included 103 cases (prostate adenocarcinoma) and 100 controls (benign prostatic hyperplasia, BPH). Tetra-primer ARMS-PCR was used to genotype each participant. A multi-stage approach was used for an efficient genomic study; this design allows fewer participants to be genotyped. Chi-squared tests, Fisher's exact test and logistic regression were used to investigate the association of the SNPs with prostate cancer and Gleason score. Results In the first stage (59 men), the frequencies of rs16901979, rs4242382, rs1447295, rs2735839 and rs721048 in the prostate adenocarcinoma group were compared with those in the control group at a liberal threshold (P-value < 0.3) in order to select polymorphisms for follow-up. There was no significant difference in the genotype frequencies of rs16901979 (P = 0.671) and rs721048 (P = 0.474) between the case group and the BPH group. These polymorphisms were therefore eliminated, and in the second stage (144 men), rs4242382, rs2735839 and rs1447295 were evaluated (P-value < 0.05). In the total population (203 men), there was a significant difference in the genotype frequencies of rs4242382 (P = 0.001), rs2735839 (P < 0.001) and rs1447295 (P = 0.005), even after Bonferroni correction (significance threshold p = 0.016). The effect of these three polymorphisms on prostate cancer was not modified by age or PSA.
There was a significant difference in the allelic frequency of A vs G (rs4242382, rs2735839) in all classes of Gleason score and of A vs C (rs1447295) at Gleason score ≥ 8. Conclusions The results of this study for rs2735839, rs4242382 and rs1447295 indicate an association of these polymorphisms with prostate adenocarcinoma predisposition in the Iranian population. The exposure effect is homogeneous across age and PSA level categories. These three polymorphisms should be studied in a larger population to confirm these results.


Keywords: Benign prostatic hyperplasia (BPH), PSA, 8q24, EHBP1, Single nucleotide polymorphism (SNP), KLK3

Background
Prostate cancer is a malignancy of men that in most cases is prostatic adenocarcinoma, in which tumor cells arise and proliferate in the epithelial tissue [1][2][3][4]. About 70% of prostate tumors originate from the peripheral zone, 25% from the transition zone and 5% from the central zone [5][6][7][8]. The incidence of this cancer varies among populations: the highest and lowest rates have been found in African-American and South Asian populations, respectively [9,10]. Although the incidence and mortality rates of prostate cancer are declining in the United States and some other Western countries, the incidence is increasing in less developed and developing countries [9][10][11][12]. In Iran, as in other developing countries, the prevalence is increasing [13]. According to the GLOBOCAN 2018 report, the Age-Standardized Incidence Rate (ASIR) and Age-Standardized Mortality Rate (ASMR) of prostate cancer in Iran were 16.6 and 8.3 per 100,000 population, respectively. It therefore ranks second in incidence and third in mortality [13].
In the early stages of the cancer, when the tumor is confined to the prostate tissue, symptoms are rare. The main symptoms that arise as the cancer progresses include frequent urination, urinary incontinence, hematuria (blood in the urine) and persistent pain in the lower back or pelvic cavity. Early diagnosis and accurate classification of prostate cancer are vital because the survival rate falls sharply once the cancer spreads beyond the prostate. Because of tissue heterogeneity in prostate cancer, there is no specific imaging method for early diagnosis. Hence, the first step in diagnosis is usually the prostate-specific antigen (PSA) test and digital rectal exam (DRE) [2][3][4]. Although PSA is specific to the prostate gland, it is not specific to prostate cancer [14]. Benign prostatic hyperplasia (BPH), prostatitis, infection and DRE all increase the serum PSA level [15]. Conversely, in 20-40% of patients whose prostate cancer is confined to the prostate, the PSA level remains at or below 4 ng/ml. Therefore, this diagnostic test alone has low screening value [16][17][18]. Needle biopsies are also carried out in advanced stages of the cancer for microscopic examination [19]. Screening tests can identify people at risk of developing the disease before diagnosis and before any symptoms appear [20][21][22][23]. Experimental and clinical observations indicate that age, geographical area, ethnicity, family history, obesity, androgens and various genetic alterations play a role in the pathogenesis of prostate cancer [24][25][26][27][28]. The genetic alterations involved in tumorigenesis include somatic copy number alterations (SCNAs), structural rearrangements, point mutations, single nucleotide polymorphisms (SNPs) and miRNAs [20]. SNPs are important in genomic studies because they are significantly associated with various biological traits.
These genetic markers account for over 90% of the genetic variation that affects gene function and underlies differences in individual susceptibility to cancer [29][30][31]. There are several designs for efficient genomic studies, such as the multi-stage approach, in which fewer individuals need to be genotyped. In the first stage, the complete set of SNPs is examined in a limited number of individuals (selected from the two extremes of the normal distribution to maximize the difference between groups) at a liberal P-value. In the next stages, the SNPs selected in the first stage are studied in a larger population at a more stringent P-value. Eventually, of the several candidate SNPs in the first stage, only a small number prove to be truly associated with the target trait [32,33].
The SNPs investigated in this study are rs2735839, rs721048, rs4242382, rs16901979 and rs1447295. rs4242382, rs16901979 and rs1447295 are located at the 8q24 locus, a susceptibility locus for a wide range of cancers [23]. This locus, commonly referred to as a gene desert, is highly conserved and, based on fine mapping, consists of three regions (region 1: 128.54-128.62 Mb; region 2: 128.12-128.28 Mb; region 3: 128.47-128.54 Mb) [34]. The nearest genes to this chromosomal region are MYC (telomeric end) and FAM84B (centromeric end) [35]. The mechanism by which 8q24 can lead to prostate cancer is still not well understood [23], but it has been proposed that these variants change MYC protein expression and affect the Wnt signaling pathway [35]. rs4242382 is located 230 kb from the MYC gene and, along with rs16901979, lies in 8q24 region 2 [23]. rs1447295 C > A is located about 263 kb from the telomeric end of MYC, in 8q24 region 1 [34][35][36][37][38]. rs2735839 G/A, an intergenic polymorphism, is located on chromosome 19q13.33, 600 base pairs downstream of the untranslated region of the kallikrein-related peptidase 3 (KLK3) gene [39][40][41]. The KLK3 gene encodes the prostate-specific antigen, which is widely used in prostate cancer screening [42]. rs721048 is located in an intron of the EHBP1 gene at 2p15. This gene encodes the Eps15 homology domain binding protein, which links clathrin-dependent endocytosis to the cytoskeleton [22].
The purpose of this study was to evaluate the association of rs16901979, rs4242382, rs1447295, rs2735839 and rs721048 with prostate adenocarcinoma and clinical information, compared with BPH, through a multi-stage approach in the Iranian population. Published results from different populations are sometimes concordant and sometimes not. To date, these polymorphisms have not been studied in the Iranian population. Polymorphisms associated with prostate cancer are candidate biomarkers for screening tests.

Patients
In this case-control study, 103 men with prostate adenocarcinoma (cases) and 100 age-matched men with BPH (controls) who were referred to Shahid Hasheminejad Hospital in Tehran, Iran, from January 2016 to June 2017 were assessed. Sampling was random. Diagnosis was based on PSA level, DRE, prostate biopsy and physician confirmation, so that only prostate cancer patients with adenocarcinoma and BPH patients with no history of cancer were included. Men with BPH were used as controls because their biopsies showed that they did not have cancer. Controls were drawn from the same ethnic population as the cases. Patients' clinical data included age at diagnosis, serum PSA level, prostate volume, Gleason score, extraprostatic extension and perineural invasion.
Written informed consent was obtained from all participants. This research and all methods were performed in accordance with the ethical principles, national norms, standards, relevant guidelines and regulations for conducting medical research in Iran. The study has been ap- According to this calculation, the minimum number of participants required for this study was 78; however, more were studied.

DNA extraction
Genomic DNA was extracted from peripheral blood lymphocytes using the FavorPrep™ Blood Genomic DNA Extraction Mini Kit (Favorgen, Taiwan) according to the manufacturer's instructions. The quality and quantity of each DNA sample were verified by 1.5% agarose gel electrophoresis and NanoDrop™ spectrophotometry, respectively. The 260/280 ratio was 1.8.

SNP genotyping
Tetra-primer amplification refractory mutation system PCR (T-ARMS-PCR) was used to genotype each participant. Primers were designed using the Primer1 database and validated through the Primer-BLAST-NCBI database. In this method, four primers are used in a single reaction: two non-specific outer primers and two allele-specific inner primers. To enhance specificity, mismatches are introduced at the first and third positions from the 3′ end of each of the two inner primers. Primer sequences, the nucleotide that determines the SNP allele (bold letters) and primer products are shown in Table 1. The positive control (C+) is the product of the two non-specific outer primers. A negative control was also used to confirm the absence of contamination in the PCR.
MAF: minor allele frequency, which should be greater than 0.01 to confirm a SNP in that genetic region.
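As an illustration of how a tetra-primer ARMS-PCR lane is read, the sketch below calls a genotype from the set of bands observed on the gel. The function name `call_genotype`, the fragment sizes (400, 250 and 200 bp) and the G/A alleles are hypothetical placeholders rather than the values in Table 1, and real band calling needs size tolerances that are ignored here.

```python
def call_genotype(bands, control_size, allele_sizes):
    """Call a genotype from the fragment sizes (bp) seen in one gel lane.

    bands: set of observed fragment sizes.
    control_size: size of the outer-primer (positive control) product.
    allele_sizes: dict mapping each allele to its allele-specific product size.
    Returns a genotype string such as "GG" or "AG", or None if the lane
    cannot be interpreted.
    """
    # Without the outer-primer band the reaction is treated as failed.
    if control_size not in bands:
        return None
    alleles = sorted(a for a, size in allele_sizes.items() if size in bands)
    if len(alleles) == 1:
        return alleles[0] * 2       # one allele-specific band: homozygote
    if len(alleles) == 2:
        return "".join(alleles)     # both allele-specific bands: heterozygote
    return None                     # control band only: no call


# Hypothetical product sizes, for illustration only.
SIZES = {"G": 250, "A": 200}
homozygote = call_genotype({400, 250}, 400, SIZES)
heterozygote = call_genotype({400, 250, 200}, 400, SIZES)
```

The control band is checked first so that a lane with no amplification is never mistaken for a genotype.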

Statistical analysis
Chi-squared and Fisher's exact tests, with p < 0.3 in the first stage and p < 0.05 in the second stage, were used to evaluate genotype frequency, allelic frequency, Hardy-Weinberg equilibrium (HWE), and the multiplicative and additive genetic models. To investigate the association of the SNPs with prostate cancer and Gleason score under the multiplicative and additive models, and to assess effect modification, odds ratios (OR) with 95% confidence intervals (95% CI) were calculated by logistic regression and the chi-squared test. To address multiple testing across the 3 SNPs examined in the second stage, a Bonferroni correction was applied and statistical significance was defined as p < 0.016. Statistical analyses were performed using SPSS v.25.0.
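The two quantities named above can be sketched in a few lines. The study itself used SPSS; the code below is only an illustration, relying on the fact that for a single binary predictor the logistic-regression odds ratio and its Wald interval coincide with the cross-product ratio of the 2×2 table. The function names and the allele counts are hypothetical, not data from this study.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a: risk alleles in cases,     b: reference alleles in cases,
    c: risk alleles in controls,  d: reference alleles in controls.
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical allele counts (206 case alleles, 200 control alleles).
or_, ci_low, ci_high = odds_ratio_wald_ci(40, 166, 18, 182)

# Three SNPs in the second stage: 0.05 / 3 = 0.0166..., reported above as 0.016.
threshold = bonferroni_alpha(0.05, 3)
```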

Patients characteristics (stage I)
In the first stage, 59 men were selected: 30 men with prostate adenocarcinoma and 29 men with BPH. Subjects in the prostate adenocarcinoma group were the unhealthiest men; each had to meet at least two of three criteria, namely Gleason score > 7, positive perineural invasion and extraprostatic extension. Subjects in the BPH group were the healthiest men, selected on the basis of PSA < 4. The age range was 50-84 years in the case group and 47-78 years in the control group, and the mean age was 71.77 ± 9.22 and 62.66 ± 7.848 years, respectively.

Statistical analysis (stage I)
In the first stage, the complete set of SNPs was examined at a liberal P-value (< 0.3) to find meaningful polymorphisms. The genotype frequencies of these polymorphisms in each of the two groups are reported separately in Table 2. There was no significant difference in the genotype distribution of rs721048 (p = 0.474) and rs16901979 (p = 0.612) between the case and control groups. By the chi-square test, both the prostate cancer group (P = 0.952 and P = 0.612) and the BPH group (P = 0.72 and P = 0.706) were in HWE for rs721048 and rs16901979, respectively. Therefore, multiplicative and additive genetic models were examined to evaluate the association of these polymorphisms with the incidence of cancer. In this study, GG (rs721048) and CC (rs16901979) are the wild-type genotypes, so they were used as reference genotypes. According to Table 3, by Fisher's exact test and logistic regression, neither of these two polymorphisms was associated with prostate cancer under the multiplicative or additive genetic models. rs16901979 and rs721048 were not significant in any of the analyses, so they were eliminated, and in the next stage only rs4242382, rs2735839 and rs1447295 were evaluated.
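The stage-one selection rule can be sketched as a simple filter. The P-values below are the first-stage genotype P-values reported in this paper (those for rs4242382, rs1447295 and rs2735839 are taken from the Discussion); the function name `select_for_next_stage` is hypothetical, and the sketch reduces the decision to the genotype P-value alone, whereas the paper also considered the genetic models.

```python
STAGE_ONE_ALPHA = 0.3  # liberal first-stage threshold

# First-stage genotype P-values as reported in this paper.
# For rs16901979 the paper gives 0.612 here and 0.671 elsewhere;
# either value is above the threshold.
stage_one_p = {
    "rs16901979": 0.612,
    "rs721048": 0.474,
    "rs4242382": 0.236,
    "rs1447295": 0.171,
    "rs2735839": 0.005,
}


def select_for_next_stage(p_values, alpha):
    """Return the SNPs whose P-value is below the stage threshold."""
    return sorted(snp for snp, p in p_values.items() if p < alpha)


carried_forward = select_for_next_stage(stage_one_p, STAGE_ONE_ALPHA)

# Bonferroni threshold for the SNPs that reach the second stage.
stage_two_alpha = 0.05 / len(carried_forward)
```

Filtering at a liberal threshold keeps the number of SNPs genotyped in the full sample small, which in turn keeps the second-stage multiple-testing penalty small.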

Patients characteristics (stage II)
At this stage, genotyping of the three selected polymorphisms was performed in the rest of the participants (Table 4).

Statistical analysis (stage II)
Because the first-stage analysis was performed only to select meaningful polymorphisms for examination in the second stage, the statistical analysis of this stage included all participants (203 men). First, Hardy-Weinberg equilibrium was assessed in the two groups for the three polymorphisms (rs4242382, rs1447295, rs2735839) using the chi-square test.  10.065], P = 0.00). All of these associations remained significant after Bonferroni correction (p < 0.016).
Table 6 presents the genotype and allelic frequencies, the OR [95% CI] (logistic regression) and the P-value (Fisher's exact test) of the risk allele compared with the wild-type allele in three categories of Gleason score. According to this table, there is a significant difference in the frequency of allele A vs G in all three categories of Gleason score at rs4242382 and rs2735839. At rs1447295, there is a significant difference in the frequency of allele A vs C only at Gleason score ≥ 8. After Bonferroni correction, rs4242382 was not associated with Gleason score = 7, and rs1447295 was not associated with any Gleason score category. Using Fisher's exact test, no significant association was found between the genotype distribution at rs4242382, rs1447295 and rs2735839 and perineural invasion or extraprostatic extension (Table 7).
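The Hardy-Weinberg check mentioned at the start of this analysis can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test. This is a minimal illustration, not the SPSS procedure used in the study; the function name and the genotype counts are hypothetical, and the code assumes both alleles are present in the sample.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for a biallelic SNP.

    n_aa, n_ab, n_bb: observed counts of the two homozygotes and the
    heterozygote. Returns (chi-square statistic, P-value on 1 df).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the first set matches HWE proportions exactly,
# the second has a clear heterozygote deficit.
chi2_fit, p_fit = hwe_chi_square(64, 32, 4)
chi2_dev, p_dev = hwe_chi_square(50, 20, 30)
```

A P-value above the chosen threshold is read as no evidence of departure from equilibrium, which is how the groups in this study are described as being under HWE.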

Effect modification
In this study, age and PSA were investigated as potential effect modifiers. As shown in Table 8, age and PSA were each categorized into 3 subgroups, and the association of rs4242382, rs2735839 and rs1447295 with prostate cancer was investigated in each subgroup (df = 2). However, for none of these polymorphisms did the overall OR differ significantly from the subgroup odds ratios, so age and PSA are not considered effect modifiers in this study.
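To make the homogeneity result concrete, the sketch below converts the chi-square statistics reported in the Discussion into P-values. It assumes, as the text suggests, that each statistic is a test of homogeneity across three subgroups and therefore has 2 degrees of freedom, for which the upper-tail probability has the closed form exp(-x/2). The function name `chi2_sf_df2` is hypothetical.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square statistic with 2 df."""
    return math.exp(-x / 2)


# Homogeneity statistics reported in this paper (age first, then PSA).
reported = {
    "rs4242382": {"age": 0.756, "PSA": 1.84},
    "rs2735839": {"age": 3.303, "PSA": 4.68},
    "rs1447295": {"age": 1.848, "PSA": 0.184},
}

p_values = {
    snp: {factor: chi2_sf_df2(stat) for factor, stat in stats.items()}
    for snp, stats in reported.items()
}

# True if no subgroup test reaches the conventional 0.05 level.
no_modification = all(
    p > 0.05 for by_factor in p_values.values() for p in by_factor.values()
)
```

Under this assumption every P-value exceeds 0.05, which agrees with the conclusion that neither age nor PSA modifies the effect.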

Discussion
In the present study, polymorphisms rs16901979, rs1447295, rs4242382, rs2735839 and rs721048 were evaluated in prostate cancer patients compared with a control group (BPH) in the Iranian population for the first time. In the first stage [43], which included the unhealthiest and healthiest participants, rs16901979 and rs721048 did not show any significant association with prostate adenocarcinoma. There was no significant difference in the genotype frequencies of rs16901979 (P = 0.671) and rs721048 (P = 0.474) between the case and control groups. Under the multiplicative and additive genetic models, no association of these polymorphisms with prostate adenocarcinoma was observed, so rs16901979 and rs721048 were eliminated at the first stage. At the liberal first-stage threshold, there was a significant difference in the genotype distributions of rs4242382, rs1447295 and rs2735839 between the case and control groups (P = 0.236, 0.171 and 0.005, respectively). In the total population, all of these associations remained significant after Bonferroni correction (p < 0.016). There was a significant difference in the allelic frequency of A vs G (rs4242382, rs2735839) in all three classes of Gleason score and of A vs C (rs1447295) at Gleason score ≥ 8, which probably indicates an association of this polymorphism (rs1447295) with invasive prostate cancer; after Bonferroni correction, however, rs4242382 was not associated with Gleason score = 7, and rs1447295 was not associated with any Gleason score category. No significant association was found between genotype frequency and clinical features such as perineural invasion and extraprostatic extension. Among the genomic changes associated with the development of prostate cancer, variants in the 8q24 chromosomal region are very important. In European and American populations, several variants at this locus have been identified for prostate cancer, but fewer such studies have been conducted in Asian populations [23].
More than 11 genome-wide association studies (GWAS) have identified a strong association between the 8q24 chromosomal region and the risk of this cancer. Variants at this locus can affect the regulation or transcription of a causative gene outside the region. MYC (a proto-oncogene), at the telomeric end, is a strong candidate gene. The transcription factor encoded by c-MYC regulates the expression of multiple genes responsible for cell proliferation, cell differentiation and apoptosis [35]. Expression of FAM84B (at the centromeric end) increases during prostate tumorigenesis and progression [35,37]. rs2735839 G/A is located 600 base pairs downstream of the 3′ untranslated region of the KLK3 gene, which encodes PSA [40,41]. PSA (30 kDa) is a protease released by prostate gland epithelial cells into seminal fluid and blood. PSA can cleave a number of proteins, such as the insulin-like growth factor binding proteins (IGFBP2, 3 and 5), parathyroid hormone-related protein (PTH-related protein), latent TGF-β2, fibronectin and laminin (extracellular matrix components), whose degradation is the first step in tumorigenesis, metastasis and development of prostate cancer [44][45][46]. In prostate cancer, the serum PSA level usually increases, which can induce mutations in P53 and up-regulation of the B-cell lymphoma 2 protein, which inhibits apoptosis in tumor cells [47]. rs721048 A/G is located in an intron of the EHBP1 gene, which encodes the Eps15 homology domain binding protein. Upregulation of the NPF motif contained in EHBP1 causes degradation of the actin structure. However, the precise mechanism of EHBP1 in the onset and progression of prostate cancer is still unknown [22].
Effect modification occurs when the association between exposure and outcome varies according to a third factor; that is, the exposure can have a different effect in different subgroups [48,49]. We investigated whether age and PSA were effect modifiers of the association between exposure (rs4242382, rs2735839 and rs1447295) and outcome (prostate cancer), but the effects of rs4242382 (χ² = 0.756, χ² = 1.84), rs2735839 (χ² = 3.303, χ² = 4.68) and rs1447295 (χ² = 1.848, χ² = 0.184) on prostate cancer were not modified by age and PSA, respectively. Therefore, the exposure effect is homogeneous across age and serum PSA level categories. The 8q24 locus was initially identified by a genome-wide linkage study in Icelandic families. Follow-up of these subjects showed that the microsatellite DG8S737 (an AC dinucleotide repeat) and rs1447295 had the strongest association with the risk of prostate cancer [50]. In a cohort study by Suuriniemi et al., rs1447295 was strongly associated with prostate cancer in 597 prostate cancer patients and 548 European-American controls [51], which is similar to the results of this study.  [52]. The effect of rs16901979 has been reported more often in African-American than in Asian populations [23]. In a study of rs16901979 in a population of African descent, genotype AA and allele A were associated with an increased risk of prostate cancer (OR = 1.84, 95% CI = 1.26-2.69, P = 0.002 and OR = 1.36, 95% CI = 1.13-1.64, P = 0.001, respectively), and AC + AA was associated with Gleason score ≥ 7 (invasive prostate cancer) [53,54]; this is contrary to the present study, a difference that may reflect ethnic differences. The results of a study by Cheng-Xiao Zhao et al. indicate that 8q24 rs4242382-A may be related to increased susceptibility to prostate cancer in Asian, Caucasian and African-American populations. Therefore, this polymorphism may be a multi-ethnic marker [23].
In a meta-analysis, a total of 20,239 cases and 20,439 controls were studied for rs1447295 C > A, and 1850 cases and 2090 controls for rs16901979 C > A. rs1447295 was associated with the risk of prostate cancer in Caucasians and Asians, but not in African-Americans. The effect of rs16901979 was stronger in African-Americans than in Asians [34].
In the first GWAS, by Eeles and colleagues, rs2735839 was identified as a risk factor for prostate cancer [55] and showed a stronger association with PSA level than previously reported polymorphisms [56]. Pomerantz et al. found that rs2735839 (A) was related to mortality in prostate cancer [57]. However, in a 2018 meta-analysis of three KLK3 polymorphisms, no association was found between rs2735839 and prostate cancer [58]. rs721048 (A) was first recognized as a risk factor for prostate cancer by GWA studies in the Caucasian population. Through a meta-analysis that included 48,135 cases and 10,254 controls, Xiang Ao et al. showed a strong association of this polymorphism with prostate cancer (OR = 1.14, 95% CI = 1.11-1.17, P < 0.001) [22]. Unlike in the Icelandic population, this polymorphism is not associated with cancer in the populations of the Netherlands, Spain and Sweden [43], nor in the present study.

Conclusions
The results of this study indicate that polymorphisms rs2735839, rs4242382 and rs1447295 are associated with prostate cancer predisposition in the Iranian population; therefore, they may be considered biomarkers for prostate cancer. The exposure effect is homogeneous across age and serum PSA level categories. Allele A of rs4242382 and rs2735839 was associated with all three Gleason score categories, and allele A of rs1447295 only with Gleason score ≥ 8. Future studies should examine these polymorphisms in a larger population and separately in each Iranian ethnic group.

Research limitations
Generally, markers should be validated by testing their effectiveness in determining the target phenotype in independent populations and different genetic backgrounds, which is referred to as marker validation. In the present study, the association was studied in the Iranian population as a whole. Therefore, each of these polymorphic markers should be investigated separately in each Iranian ethnic group. Because of the limited sample size of the present study, the association of these polymorphisms should be investigated in a larger population to reach a more accurate conclusion. Performing NGS on patient samples can provide more comprehensive results regarding this cancer in the Iranian population.