A new panel of SNPs to assess thyroid carcinoma risk: a pilot study in a Brazilian admixture population

Background Thyroid cancer is a common malignant disease of the endocrine system with increasing incidence rates over the last few decades. In this study, we sought to analyze the possible association of 45 single nucleotide polymorphisms (SNPs) with thyroid cancer in a population from Rio Grande do Norte, Brazil. Methods Based on histological analysis by a pathologist, 80 normal thyroid specimens of tissue adjacent to thyroid tumors were obtained from the biobank at the Laboratory of Pathology of Liga Norte Riograndense Contra o Câncer, Natal, RN. Patient samples were then genotyped using the MassARRAY platform (Sequenom, Inc.), followed by statistical analysis employing the SNPassoc package in R. The genotypic frequencies of all 45 SNPs, obtained from the International HapMap Project database and based on data from the ancestral populations of European and African origin, were used to compose the control group. Results In our study, the following 9 SNPs showed significant differences in frequency between the study and control groups: rs3744962, rs258107, rs1461855, rs4075022, rs9943744, rs4075570, rs2356508, rs17485896, and rs2651339. Furthermore, the SNPs rs3744962 C/T and rs258107 C/T were associated with a relative risk for thyroid carcinoma of 3.78 (p = 6.27 × 10^-5) and 2.91 (p = 8.27 × 10^-5), respectively, after Bonferroni's correction for multiple comparisons. Conclusions These nine polymorphisms could be potential biomarkers of predisposition to thyroid carcinoma in the population from Rio Grande do Norte. However, complementary studies including a control group with samples obtained from healthy subjects in Rio Grande do Norte state should be conducted to confirm these results. Electronic supplementary material The online version of this article (10.1186/s12881-017-0502-8) contains supplementary material, which is available to authorized users.


Background
Thyroid cancer (TC) is the most common endocrine malignancy, and its incidence rate has been increasing noticeably for decades [1]. If the observed trends are maintained, TC will replace colorectal cancer as the fourth leading cancer diagnosis in the United States by 2030 [2]. Such an increase is likely due to improved diagnosis, and largely or completely reflects the overdiagnosis of indolent disease [3]. According to the Brazilian National Cancer Institute, about 6,960 new cases of TC were expected in Brazil in 2016 [4]. The state of Rio Grande do Norte, located in the northeast region of Brazil, presents the fourth highest estimated TC incidence rate in the country, which corresponds to 130 cases per 100,000 women and 30 cases per 100,000 men [4].
The thyroid gland is mainly composed of two cell types: follicular cells and parafollicular cells. Based on histological and clinical parameters, follicular cell-derived carcinomas, which correspond to more than 90% of thyroid carcinomas, are typically divided into well differentiated, poorly differentiated, and undifferentiated (or anaplastic) carcinomas. Well differentiated thyroid carcinomas (WDTC) are further subdivided histologically into papillary thyroid carcinoma and follicular thyroid carcinoma, which correspond to 80% and 10% of thyroid carcinomas, respectively [5]. Both these carcinomas can progress to poorly differentiated carcinoma or completely lose differentiation, giving rise to anaplastic carcinoma [6]. On the other hand, parafollicular cell-derived medullary carcinoma accounts for about 5% of thyroid carcinomas [5,7].
Recent progress in understanding the molecular pathogenesis of TC has paved the way for developing more effective treatment strategies. This progress has mainly resulted from the identification of numerous genetic and epigenetic changes, including mutations, gene copy-number gain, gene translocation, and aberrant gene methylation. These changes alter the signaling pathways that regulate cell proliferation and survival, such as the MAPK and PI3K-AKT pathways among others, and their discovery is reshaping TC medicine [8,9]. In addition, it is widely accepted that genetic predisposition to WDTC involves common low-penetrance and rare moderate-penetrance genetic variants that interact with each other and with the environment to determine individual susceptibility [10].
Recent genome-wide association studies (GWAS), based on high-throughput genotyping, sought to identify single nucleotide polymorphisms (SNPs) associated with increased risk of thyroid tumorigenesis and to better explain the role of genetic variations in predisposition to TC [10-22]. However, some studies have reported contradictory results, probably due to genetic diversity both within and among populations. This observation raises the question of whether a new panel of SNPs could be used as potential predisposition markers for TC.
Therefore, this study aimed to investigate whether the selected SNPs could be associated with predisposition to TC in patients diagnosed with this neoplasia in the state of Rio Grande do Norte. Considering that the population of this region is characterized by a marked ethnic mixture, this study may contribute to elucidating both the molecular basis underlying predisposition to TC and the effect of population admixture on SNP-based association studies.

Genotyping
SNP genotyping of the case group was performed on a panel of 45 SNP assays using the MassARRAY SNP genotyping system (Agena Bioscience, San Diego, USA) according to the manufacturer's instructions at the National Genotyping Center (CEGEN). The primers for amplification and extension were designed using the Extend Primer Assay Design software v4. Sequenom iPLEX GOLD chemistry was used for locus-specific amplification, followed by a single-base primer extension reaction, which generated products of different masses that were quantitatively analyzed using MALDI-TOF mass spectrometry. The resulting data were analyzed using TyperAnalyzer software v4, followed by manual inspection of the spectra by trained personnel [23]. All assays were performed in 384-well plates, including negative controls and a trio of Coriell samples (Na10830, Na10831, and Na12147) for quality control.

Statistical analysis
Statistical tests were performed using the SNPassoc package in R v2.14.1 (R Development Core Team, 2006) [24]. For each SNP, associations were assessed by applying logistic regression to estimate the odds ratios (ORs) with 95% confidence intervals and P values. A P value < 0.05 was considered statistically significant.
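As an illustration of the quantities reported by this kind of analysis, the following minimal Python sketch computes an odds ratio, its Wald 95% confidence interval, and a two-sided P value from a 2 × 2 table of carriers and non-carriers. For a single binary predictor, this coincides with the estimate from an unadjusted logistic regression. The function name `odds_ratio_wald` and the counts are hypothetical and are not taken from the study, which used the SNPassoc package in R.

```python
import math


def odds_ratio_wald(case_carriers, case_noncarriers,
                    ctrl_carriers, ctrl_noncarriers, z_crit=1.959964):
    """Odds ratio, Wald 95% CI and two-sided P value from a 2 x 2 table.

    Hypothetical helper for illustration only; all four counts must be > 0.
    """
    a, b, c, d = case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")

    # Log odds ratio and its standard error (Woolf's formula).
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    # Wald confidence interval, back-transformed to the odds-ratio scale.
    lower = math.exp(log_or - z_crit * se)
    upper = math.exp(log_or + z_crit * se)

    # Two-sided P value from the standard normal distribution.
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(log_or), lower, upper, p_value


# Hypothetical counts: 80 cases and 180 controls.
or_, lower, upper, p = odds_ratio_wald(40, 40, 45, 135)
print(f"OR = {or_:.2f}, 95% CI = ({lower:.2f}, {upper:.2f}), p = {p:.2e}")
```

An OR above 1 with a confidence interval that excludes 1 would be read as a risk association, and an OR below 1 as a protective association.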

Results
Eighty TC patients from Brazil were genotyped for 45 SNPs to identify potential molecular markers of predisposition to this neoplasia. Among all the patients studied, 77 were diagnosed with papillary thyroid carcinoma and 3 with follicular thyroid carcinoma. The age of one patient was not determined, and the staging, location, and size of the tumor were also not determined in some patients. Lymph node metastasis was not evaluated in 61 of the 80 patients. These data are shown in Table 1.
The allelic frequencies of the 45 SNPs were analyzed, and the cutoff value for missing genotypes was set at 10%. The allelic composition, major allele frequency, and the p value for the Hardy-Weinberg equilibrium (HWE) of each SNP are shown in Table 2. Of these 45 SNPs, 11 were excluded from the study because they were not in HWE (p < 0.05). The allelic frequencies from the Brazilian population were compared with those available in the HapMap database for 180 individuals of European and African ancestry, which constituted the control group.
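The HWE filter can be illustrated with a short Python sketch. The text does not state which HWE test was applied, so the Pearson chi-square test with one degree of freedom is used here as an assumption; the function name `hwe_chi_square` and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Takes observed counts of the three genotypes (AA, Ab, bb) and returns
    the chi-square statistic and its P value. Illustrative helper only.
    """
    n = n_aa + n_ab + n_bb
    if n <= 0:
        raise ValueError("at least one genotype must be observed")

    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under HWE: p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical SNP with a deficit of heterozygotes.
chi2, p_value = hwe_chi_square(40, 20, 40)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}, excluded = {p_value < 0.05}")
```

Under the criterion used in this study, a SNP with p < 0.05 in such a test would be excluded from the association analysis.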
The association analysis between the patient and control groups for SNPs with a p value greater than 0.05 for Hardy-Weinberg equilibrium is shown in Table 3 and Additional file 1: Table S1, which classify SNPs according to the inheritance model: dominant (AA/Ab + bb), recessive (AA + Ab/bb), overdominant (Ab/AA + bb), and codominant (AA/Ab/bb). As observed, 7 SNPs showed a significant association in the codominant model (rs1461855, rs2356508, rs258107, rs3744962, rs4075022, rs4075570, and rs9943744), 5 in the dominant model (rs1461855, rs258107, rs3744962, rs4075022, and rs9943744), 4 in the recessive model (rs1461855, rs2651339, rs4075570, and rs9943744), and 6 in the overdominant model (rs1461855, rs17485896, rs2356508, rs258107, rs3744962, and rs4075022). Most SNPs presented statistical significance under more than one model of inheritance. Furthermore, the SNPs rs3744962 C/T and rs258107 C/T were associated with a relative risk for TC of 3.78 (p = 6.27 × 10^-5) and 2.91 (p = 8.27 × 10^-5), respectively, after Bonferroni's correction for multiple comparisons. Figure 1 shows the results of the logistic regression analysis of significant SNPs according to the model of inheritance, with their respective odds ratios (OR) and p values. Among the 9 SNPs analyzed (rs1461855, rs17485896, rs2356508, rs258107, rs2651339, rs3744962, rs4075022, rs4075570, and rs9943744), 7 showed OR values greater than 1, meaning that those alleles represent a risk for thyroid cancer development. In particular, the SNP rs3744962 showed the greatest risk, with an OR value of 3.78. On the other hand, 2 SNPs (rs1461855 and rs2356508) were associated with a protective effect, with OR values less than 1.
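The four inheritance models differ only in how the three genotypes are grouped before the regression is fitted. The Python sketch below shows one common way of coding them as indicator variables, assuming that A denotes the major allele and b the minor allele; the function name `code_genotype` is hypothetical and the coding is an illustration, not the internal implementation of SNPassoc.

```python
# Number of minor (b) alleles carried by each genotype.
MINOR_ALLELE_COUNT = {"AA": 0, "Ab": 1, "bb": 2}


def code_genotype(genotype, model):
    """Return the indicator variable(s) for a genotype under a given model.

    AA is the reference category in every model. Illustrative helper only.
    """
    count = MINOR_ALLELE_COUNT[genotype]
    if model == "dominant":        # AA versus Ab + bb
        return (int(count >= 1),)
    if model == "recessive":       # AA + Ab versus bb
        return (int(count == 2),)
    if model == "overdominant":    # Ab versus AA + bb
        return (int(count == 1),)
    if model == "codominant":      # AA, Ab and bb as three separate categories
        return (int(count == 1), int(count == 2))
    raise ValueError(f"unknown inheritance model: {model}")


for model in ("dominant", "recessive", "overdominant", "codominant"):
    coded = [code_genotype(g, model) for g in ("AA", "Ab", "bb")]
    print(f"{model:>12}: AA={coded[0]}, Ab={coded[1]}, bb={coded[2]}")
```

Because the same genotype data are regrouped for each model, a SNP with a strong effect will often reach significance under several models at once, as observed above.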
All these previous studies conclude that the genetic ancestry of the Brazilian population is widely heterogeneous and is characterized by extensive admixture from three different ancestral roots: Amerindians, Europeans, and Africans. Furthermore, the composition of the Brazilian population ancestry follows a clear trend related to historical facts [37,38]. European colonization of Brazil occurred in successive migration waves, starting with the arrival of around 500,000 Portuguese men between the years 1500 and 1800, when they met the local Amerindian population. Portuguese-Amerindian admixture began soon after the arrival of the first colonizers. Simultaneously, about 4 million Africans were forcibly brought to Brazil as slaves until the mid-eighteenth century, providing the second major ethnic contribution to the Brazilian population. Later, other waves of immigration to Brazil occurred, mainly from Italy, Portugal, Spain, Germany, Syria, Lebanon, and Japan, adding even more complexity to the already multi-ethnic, highly admixed Brazilian population [37,38].
Brazil is the fifth largest country in the world and is divided into five geographical regions. In all these regions, European ancestry is predominant, with proportions ranging from 60.6% in the Northeast to 77.7% in the South [34, 39-42]. The African and Amerindian contributions to the population genetic background vary according to the region. The African ancestry proportion was the second largest in the Northeast (30.3%), followed in decreasing order by the Southeast, South, and North. On the other hand, populations in the North carry a significant proportion of Amerindian ancestry, while this proportion remains relatively uniform in the other regions. In particular, the northeast region of Brazil, where the state of Rio Grande do Norte is located, presents the highest African ancestry contribution in the country, ranging from 18.6 to 56.8% [34, 39-43].
The heterogeneity and admixture of Brazilians constitute an important issue to be considered in genetic association studies [34]. They also have important clinical implications for the design and interpretation of clinical trials and for genetic counseling. Because of its heterogeneous Amerindian, European, and African ancestral roots, the Brazilian population has been considered an important model for population genetics. In this context, our work aimed to genotype and assess the allele frequency of 45 SNPs to identify potential molecular markers of predisposition to TC in the population of Rio Grande do Norte, compared to the frequency of these alleles in European and African populations.
An extensive review of the literature showed that considerable progress towards understanding complex diseases has been made in recent years due to the development of high-throughput genotyping technologies [44]. Furthermore, several GWAS have identified SNPs associated with many complex diseases or traits. In the particular case of TC, genetic risk loci are being characterized in terms of the allelic variants or genes responsible for the association with the disease, as well as the affected biological pathway and the main cell type driving its pathology [45].
In this regard, a growing number of studies have identified several SNPs associated with TC risk [10-17, 19, 20, 22, 46-49]. In the study by Gudmundsson and collaborators [46], a GWAS analysis performed in a population from Iceland showed a strong association of SNPs rs965513 and rs944289 with papillary and follicular thyroid cancer. Furthermore, rs944289 and rs965513 correspond to gene variants of thyroid transcription factor 1 and 2, respectively. Both TC risk alleles were associated with low concentrations of thyroid stimulating hormone, and SNP rs965513 was also associated with low thyroxine concentration and high triiodothyronine concentration [46]. Later, the same group found that the rs966423, rs2439302, and rs116909374 variants were also associated with TC [11].
The association of rs944289 and rs965513 with TC risk was also shown in genetic studies conducted in Asian populations [14,50,51]. Additionally, Wang and collaborators [48] confirmed that SNPs rs966423 and rs2439302, previously reported to be associated with TC in European populations, are also risk factors in the Chinese population.
Similarly, another study demonstrated that SNPs rs2910164, rs965513, rs1867277, rs6983267, and rs944289 showed a significant association with TC risk in patients from the United Kingdom [12]. Furthermore, according to Jendrzejewski [52], the SNP rs944289 can predispose to papillary thyroid carcinoma through deregulation of the expression of PTCSC3, which acts as a tumor suppressor. A correlation study between Forkhead box E1 (FOXE1) gene variants rs894673, rs1867277, and rs3758249 and histopathological features of TC suggested that FOXE1 variations generate a higher risk for poor histopathological features in papillary thyroid carcinoma [17]. In addition, Mancikova et al. [10] showed an association between SNPs rs7028661 and rs7037324, located near the FOXE1 locus, and TC risk. Moreover, the rare alleles of three SNPs (rs2997312, rs10788123, and rs1254167) showed suggestive evidence of association with higher risk for the disease. On the other hand, the SNP rs4075570 conferred protection in the series studied.
Finally, Lidral [15] showed that the rs7850258 G allele, associated with cleft lip, cleft palate, and hypothyroidism, has significantly greater enhancer activity than the A allele, which is associated with thyroid cancer. Wokolorczyk et al. [47] showed that the SNP rs6983267 could be a good candidate multi-cancer susceptibility marker, since this SNP is associated with a wide range of cancers affecting the colon, prostate, breast, bladder, larynx, lung, kidney, and thyroid. These are some examples of the rapidly increasing number of studies focused on identifying genetic variants associated with TC predisposition and development.
An important challenge faced by molecular epidemiological association studies of candidate disease-susceptibility genes is to define variants that are functionally implicated in the disease. In addition to identifying polymorphic genetic variants and their possible association with diseases, it is necessary to understand the functional relevance of these SNPs. This is particularly urgent because the amount of genomic information that is available greatly exceeds the information about the function of the variants reported in human disease studies [53]. Importantly, most recent genetic studies only establish statistical associations between genetic markers and the disease, without supporting evidence of functional relevance. In the context of predisposition to TC, advances in molecular technologies are shedding new light on the molecular mechanisms of genetic variants.
As an example, the association of polymorphisms in DNA repair genes XRCC1 (rs25487, rs1799782) and XRCC3 (rs861539) with thyroid cancer risk and progression can be considered. As Yan and collaborators showed, the XRCC1 variant can interact with the XRCC3 variant to significantly increase differentiated thyroid carcinoma (DTC) susceptibility [22]. Similarly, a study to assess the role of polymorphisms in the LEP (rs7799039 and rs2167270) and LEPR (rs1137101 and rs1137100) genes in DTC susceptibility and their effect on leptin levels showed that LEP polymorphisms modify serum leptin concentrations in patients with DTC. Furthermore, the polymorphisms rs7799039 and rs1137101 increase the risk of DTC development, though they do not correlate with tumor aggressiveness [16]. As another example, supporting evidence of functional relevance was presented by Ceolin and collaborators [19], who evaluated the frequency of RET (REarranged during Transfection) proto-oncogene 3'UTR variants (rs76759170 and rs3026785) in Medullary Thyroid Carcinoma (MTC) patients. In silico analysis indicated that these variants might affect the secondary structure of RET mRNA, suggesting that they might play a role in the posttranscriptional control of RET transcripts.
Fig. 1 Associations between 9 SNPs and thyroid cancer susceptibility analyzed by the Forest plot. a All SNPs that show significant risk associated with thyroid cancer development. b All SNPs that show significant protection against thyroid cancer development. The summary represents a combination of SNPs that show risk or protection regarding cancer development
In our search for molecular markers of predisposition to TC, we assessed the allele and genotype frequencies of 45 SNPs, including four of the aforementioned SNPs. We observed that SNPs rs2910164 and rs965513, despite being in HWE, were not associated with TC risk in the Brazilian population. Furthermore, SNPs rs6983267 and rs944289 were not in Hardy-Weinberg equilibrium in our studied groups. As observed, the frequencies of these particular SNPs differ from those observed for European, American, and Asian populations, where some of these markers were associated with an increased risk of developing TC. These results reinforce the concept that the genetic constitution of a population, and the contribution of its ancestral roots, is an important and influential parameter in genetic association studies.
Other studies evaluated potential markers for predisposition to thyroid cancer in larger samples [10,46]. These studies were performed in European populations, and studies of this kind had not previously been performed in a Brazilian population. Therefore, this pilot study presents a good opportunity to examine the feasibility of our approach, exploring a novel panel of SNPs that could potentially predict TC risk. We have thus first investigated these SNPs in a pilot study, with a plan to validate our findings in a larger sample in the future. It is critical to improve our knowledge of both thyroid cancer risk factors and markers that predict aggressive disease in order to reduce disease incidence and unnecessary treatments that result in undesirable side effects with long-term financial and clinical impacts.

Conclusions
Of a total of 35 SNPs that were found in HWE in the Brazilian population, logistic regression analysis of 7 SNPs showed statistical significance when evaluated according to the genetic model of inheritance. For the first time, our results suggest that a new panel composed of SNPs rs3744962, rs258107, rs4075022, rs9943744, rs4075570, rs17485896, and rs2651339 could indicate predisposition to TC development in the population of Rio Grande do Norte. Furthermore, they may be considered suitable molecular markers for early diagnosis of the disease. These SNPs could be useful for predicting risk and for genetic counseling in clinical practice, thus avoiding invasive methods of analysis and often inconclusive diagnoses. In addition, SNPs rs1461855 and rs2356508 might be associated with a protective effect against TC development. Nevertheless, complementary studies with a larger patient population, as well as a control group composed of samples from the state of Rio Grande do Norte, should be conducted to confirm these results.
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Authors' contributions
ICCS, DM and JCS performed the sample collection and processing. ICCS, JNGA and VHRD contributed to this work by performing the genomic extraction of the samples. ICCS, MT and GMB performed the genotype analysis of the samples. CCOR performed the histopathological analysis of the samples. AC made substantial contributions to the acquisition of data. JG was involved in drafting the manuscript and revising it critically for important intellectual content. ADL analyzed and interpreted the patient data regarding SNP frequency. VS made substantial contributions to conception and design, analysis and interpretation of data. All authors read and approved the final manuscript.

Ethics approval and consent to participate
As this is a retrospective study in which biopsy samples were obtained from 2003 to 2011, the Ethics Committee in Research from Hospital Liga Norte Riograndense Contra o Câncer approved the conduct of the research without informed consent of the participants for this particular case (Protocol number 558788). Nevertheless, the authors protected the privacy of the participants and maintained the confidentiality of their data.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.