Weighted correlation network and differential expression analyses identify candidate genes associated with BRAF gene in melanoma

Background Primary cutaneous malignant melanoma is a cancer of the pigment cells of the skin, some of which are accompanied by BRAF mutation. Melanoma incidence and mortality rates have been rising around the world. As the current knowledge about pathogenesis, clinical and genetic features of cutaneous melanoma is not very clear, we aim to use bioinformatics to identify the potential key genes involved in the expression and mutation status of BRAF. Methods Firstly, we used UCSC public hub datasets of melanoma (Lin et al., Cancer Res 68(3):664, 2008) to perform weighted genes co-expression network analysis (WGCNA) and differentially expressed genes analysis (DEGs), respectively. Secondly, overlapping genes between significant gene modules and DEGs were screened and validated at transcriptional levels and overall survival in TCGA and GTEx datasets. Lastly, the functional enrichment analysis was accomplished to find biological functions on the web-server database. Results We performed weighted correlation network and differential expression analyses, using gene expression data in melanoma samples. We identified 20 genes whose expression was correlated with the mutation status of BRAF. For further validation, three of these genes (CYR61, DUSP1, and RNASE4) were found to have similar expression patterns in skin tumors from TCGA compared with normal skin samples from GTEx. We also found that weak expression of these three genes was associated with worse overall survival in the TCGA data. These three genes were involved in the nucleic acid metabolic process. Conclusion In this study, CYR61, DUSP1, and RNASE4 were identified as potential genes of interest for future molecular studies in melanoma, which would improve our understanding of its causes and underlying molecular events. These candidate genes may provide a promising avenue of future research for therapeutic targets in melanoma. Electronic supplementary material The online version of this article (10.1186/s12881-019-0791-1) contains supplementary material, which is available to authorized users.


Background
Skin cutaneous melanoma (SKCM) is a malignant cancer that originates from melanocytes and exists in different forms. The main types are basal cell cancer (BCC), squamous cell cancer (SCC) and melanoma [1,2]. Melanoma is the most dangerous type of skin cancer. The primary cause of melanoma is ultraviolet light (UV) exposure in those with low levels of skin pigment [1,2]. The UV light may come from the sun or other sources, such as artificial light devices. Besides, about 25 % of melanoma derives from moles. Those with many moles, a history of affected family members, and who have poor immune function were at greater risk [2]. A number of rare genetic defects such as xeroderma pigmentosum also increase risk [3]. Diagnosis can be finished by biopsy of any concerning skin lesion [2].
At least 50 % of melanomas harbor a V600E mutation in the BRAF gene. Tumors with BRAF mutations could respond to BRAF kinase inhibitor vemurafenib that was approved by the FDA in 2011 for therapy of patients with advanced melanoma and late-stage (metastatic) melanoma [4,5]. Recently, the FDA approved the other two drugs named dabrafenib and ipilimumab as therapy for patients with BRAF V600E mutation-positive in melanoma [6].
Existing research has revealed that cancer cannot be caused by only one gene or factor. It must be a network of different genes and pathways working together. Weighted gene co-expression network analysis (WGCNA) [7] is a methodology used to analyze novel gene modules co-expressing in gene expression data. Many studies have shown that WGCNA can be used to explore genes, a network of genes and correlation of genes in different cancers [8,9]. Moreover, differentially expressed genes (DEGs) analysis method has been applied in gene expression data [10].
In this paper, the study was designed to find potential genes and correlated pathways associated with the expression level and mutation status of BRAF in melanoma samples. By analyzing gene expression data [11] from UCSC public hub with the WGCNA algorithm and DEGs analysis, significant gene modules associated with the expression level of BRAF were identified and differentially expressed genes associated with the mutation status of BRAF were screened, then overlapping genes were validated in TCGA and GTEx database.

Data collection
A dataset containing the gene expression and basic phenotypes information of 95 melanoma samples was downloaded from the Cancer Browser website (https:// xenabrowser.net/datapages/?cohort=Melanoma%20 (Lin%202,008)). The gene expression information was experimentally collected through GeneChip Fluidics Station (Affymetrix), and the matrix values were log 2 ratio transformed. Genes were mapped onto Affymetrix HT-HGU133A probeMAP.

Study population
Melanoma samples that had both expression data and BRAF mutation status were included for further analysis. According to this criterion, there were 67 melanoma samples (30 BRAF wild-type and 37 BRAF mutation) corresponding to our analysis requirement.

Data processing
After the dataset was downloaded, probe identification numbers (IDs) were transformed into gene symbols. For multiple probes corresponding to one gene, the probe with the most significant p-value from the downstream differential analysis was retained as the gene expression value. As for DEGs analysis, we divided 67 samples into two groups (BRAF wild-type and BRAF mutation group) for screening differentially expressed genes. As for WGCNA analysis, we used BRAF gene expression values as clinical trait data. Figure 1 shows the paths of the data analysis.

Weighted gene co-expression network construction
The full set of genes with available expression data (10,994 genes) was applied to find the scale-free gene modules of co-expression and highly correlated genes constructed by WGCNA [7]. To construct a weighted gene network, the soft threshold power β was set to 3, which was the lowest power based on scale-free topology [12]. We set the parameter maxBlockSize = 11,000, and TOMType = "unsigned". Topological overlap matrix (TOM) was calculated by adjacency transformation, and the value (1-TOM) was designated to the distance for identification of hierarchical clustering genes and modules. The minimum module size was set to 30.

Module clinical feature associations
In order to identify modules that were significantly associated with the designated clinical trait (the expression level of BRAF), we plotted the heat map of modules-trait relationship according to the tutorial of the WGCNA package for R.

Identification of DEGs
Linear models for microarray data (limma package) is a library used for analyzing gene expression microarray data [13], especially for the assessment of differential expression and the analysis of designed experiments [14,15]. limma package in R has been applied to identify the DEGs between BRAF mutation and wild-type (marked as control group) samples. Genes with |log2 fold change (FC)| ≥ 1 and adjusted p-value < 0.05 as the cut-off criterion were selected for subsequent analysis.

Validation of candidate genes
The overlapping genes between significant modules and DEGs were chosen as the potential genes for deep analysis and validation. GEPIA [16] (website: http://gepia.cancerpku.cn/) is a web server for analyzing the RNA sequencing expression data of 9736 tumors and 8587 normal samples from the TCGA and the GTEx projects, using a standard processing pipeline. Survival analysis and expression consistency evaluation of potential genes were carried out in GEPIA built-in SKCM and GTEx datasets, which contain 461TCGA-SKCM tumor patients, 1TCGA-SKCM normal control, and 557 GTEx normal skin samples. For the transcriptional level validation, the criteria of significant results was set to |log2 fold change| ≥ 1 and p-value < 0.01. For the overall survival analysis in TCGA datasets, the 458 samples with available overall survival data were divided into high and low expression groups using the median TPM as a breakpoint, and significance was determined using a logrank test with p < 0.05.

Functional enrichment analysis
GenCLiP 2.0 [17] is a web-based text-mining server, which can analyze human genes associated with biological functions and molecular networks. We uploaded filtered genes to online analysis tool GenCLiP 2.0 (http://ci.smu. edu.cn/GenCLiP2/ analysis.php) to find correlated significant pathways.

Expression value analysis of microarray data
We chose 10,994 genes and 67 samples to construct the gene co-expression network by WGCNA. Figure 2a showed the relationship between the expression level of BRAF and melanoma samples.

Weighted gene co-expression network construction
Choosing a proper soft-thresholding power is a critical step when constructing a WGCNA network. As shown in Fig. 2b, power value 3(β = 3) was selected to produce a hierarchical clustering tree (Fig. 3) with different colors representing different modules.

Module clinical feature associations
Since we had a summary profile (eigengene) for each module, we simply correlated eigengenes with external traits (marked BRAF expression) and looked for the most significant associations. It was clear that the MEbrown (1021 genes) was most positive associated with the expression of BRAF (Fig. 4a). The results also   demonstrated that the MEturquoise (1858 genes) was most negative associated with the expression of BRAF (Fig. 4a). As shown in Fig. 4b, there were 27 eigengenes. The upper panels presented hierarchical clustering dendrograms of the eigengenes, in which the dissimilarity of eigengenes had been visualized. The bottle heatmaps presented the eigengene adjacencies for the expression of BRAF. The dendrogram indicated that brown and black modules were highly related and their correlations were stronger than their individual correlations with the expression BRAF (Fig. 4b).

Identification of DEGs
Compared with BRAF wild type group, a total of 36 genes were identified in BRAF mutation group by the threshold of |log2 fold change (FC)| ≥ 1 and adjusted p-value < 0.05, of which 9 were up-regulated genes and 27 were down-regulated genes ( Table 1).
In order to verify these 20 overlapping candidate genes, we validated on online web server GEPIA, which contained the TCGA and GTEx melanoma samples. Figure 6a-b demonstrated the expression level of 3 genes in BRAF wild-type and BRAF mutation samples of melanoma, which was in accordance with its expression level in normal and tumor patients of SKCM. It was also revealed that low expression of these three genes has a worse overall survival in SKCM patients (Fig. 6c). Besides, we had discarded the other 17 genes that did not exhibit significant differential expression in the TCGA/GTEx data concordant with that observed in the Lin et al. data, and  Table 1 Thirty-six differentially expressed genes (DEGs) were identified from melanoma, including 9 up-regulated genes and 27 down-regulated genes. (The up-regulated genes were listed from the largest to the smallest of fold changes, and downregulated genes were listed from the smallest to largest)

Functional enrichment analysis
We used online website GenCLiP 2.0 tools to perform the functional and signaling pathway enrichment analysis of the above three genes (CYR61, DUSP1, and RNASE4). As shown in Table 2, the potential candidate genes (CYR61, DUSP1, and RNASE4) were involved in the nucleic acid metabolic process, while CYR61 and DUSP1 were most significantly enriched in the growth factor binding, ERK1 and ERK2 cascade, and regulation of ERK1 and ERK2 cascade.

Discussion
Melanoma is the most fatal form of skin cancer and strikes tens of thousands of people worldwide each year. The amount of cases is increasing faster than any other type of malignant cancer [18]. Many patients with BRAF mutation have received target treatments and therapies which activate their body's own immune system. There is BRAF mutation in melanoma. Besides, mutation also exists in NRAS gene and PTEN gene. Some scientists have struggled to find drugs targeting the mutated NRAS protein or NRAS protein [19], while others have uncovered a mechanism of resistance of targeted therapies for melanoma and identified compounds that inhibit eIF4F and enhance the effectiveness of vemurafenib in mice with melanomas [20].
In this study, firstly we applied WGCNA to identify the two key modules in melanoma that were associated with the expression of BRAF gene (the brown module was positive, and the turquoise was negative). At the same time, we identified the DEGs in the BRAF mutation group compared with BRAF wild-type group. Then, we chose the overlapping genes between modules and DEGs. Finally, as to the gene expression level and overall survival validation, we expand the scope of comparison range to the tumor group versus the normal group in TCGA/GTEx datasets.
We found that CYR61, DUSP1, and RNASE4 were significantly related to gene expression level and survival analysis results. CYR61 (Cysteine-rich angiogenic inducer 61) is a secreted, matricellular protein [21], which is associated with a range of cellular activities, such as cell adhesion, migration, differentiation, proliferation, apoptosis [21,22]. Beak et al. suggested that CYR61 was highly expressed in colorectal carcinomas (CRC) and CYR61 might play a role as meaningful targets for therapeutic intervention of patients with CRC [23]. D' Antonio et al. also found that decreased expression level of CRY61 was associated with prostate cancer recurrence after surgical treatment [24]. DUSP1 (Dual specificity protein phosphatase 1) is an oncogene that is associated with cancer progression in gastric cancer as well as a negative regulator of the mitogen-activated protein kinase (MAPK) signaling pathway, has anti-inflammatory properties [25][26][27]. Xiaoyi et al. also found that DUSP1 phosphatase regulated the pro-inflammatory milieu in head and neck squamous cell carcinoma [28], in addition to promoting angiogenesis, invasion, and metastasis in non-small-cell lung cancer (NSCLC) [29]. RNASE4 (Ribonuclease 4) is an RNase that belongs to the pancreatic ribonuclease family and has marked specificity towards the 3′ side of uridine nucleotides [30]. Unfortunately, to date there has been no research focused on the relationship between these several genes with melanoma.
The primary purpose of the study focuses on the prediction of key potential genes in cancers via data mining and data analysis. Though we have validated results in the TCGA and GTEx datasets, results need to be confirmed through molecular and cellular experiments.

Conclusions
Firstly, we have identified overlapping genes associated with the expression and the mutation status of BRAF in melanoma through WGCNA and DEGs analysis, Fig. 6 a The gene expression (log2 ratio value) of CYR61, DUSP1, and RNASE4 in melanoma samples (unpaired t test, * indicates p < 0.01). b Validation of the gene expression of CYR61, DUSP1, and RNASE4 in TCGA-SKCM (including 461 tumor patients and 1 normal control) and GTEx (including 557 normal control). The cutoff was set to |log2 fold change (FC)| ≥ 1, and p < 0.01. * indicates p < 0.01. c Overall survival analysis of the expression level of CYR61, DUSP1, and RNASE4 in TCGA-SKCM on GEPIA website respectively. Then, validation was applied to these overlapping genes, and three genes (CYR61, DUSP1, and RNASE4) were screened. However, more direct evidence is needed to confirm their association with melanoma. The study may be helpful for future studies concerning melanoma with the aim of finding potential key molecule targets of melanoma.