Mutation spectrum in human colorectal cancers and potential functional relevance
© Yin et al.; licensee BioMed Central Ltd. 2013
Received: 13 August 2012
Accepted: 10 January 2013
Published: 8 March 2013
Somatic variants, which occur in the genome of all cells, are widely accepted to play a critical role in cancer development, as their accumulation in genes can affect cell proliferation and the cell cycle.
In order to understand the role of somatic mutations in human colorectal cancers, we characterized the mutation spectrum in two colorectal tumor tissues and their matched normal tissues, by analyzing deep-sequenced transcriptome data.
We found a higher rate of somatic variants in tumor tissues than in normal tissues, but observed no trend in mutation properties. By applying a series of stringent filters, we identified 418 genes with tumor-specific disruptive somatic variants. Of these genes, three in the mucin protein family (MUC2, MUC4, and MUC12) are of particular interest. The expression of mucin proteins has been reported to correlate with the progression of colorectal cancer; somatic variants within these genes may therefore interrupt their normal expression and thus contribute to tumorigenesis.
Our findings provide evidence of the utility of RNA-Seq in mutation screening in cancer studies, and suggest a list of candidate genes for future colorectal cancer diagnosis and treatment.
Keywords: Colorectal cancers, Mutation spectrum, RNA-Seq, Transcriptome
As the third most common malignancy and the fourth leading cause of cancer mortality, colorectal cancer is an important threat to human health, accounting for 1 million new cases worldwide each year. The close correspondence between incidence rates and economic development reflects a westernized lifestyle and attendant risk factor exposures. As a complex condition, colorectal tumor progression is associated with both genetic and environmental factors. To date, only a few common low-penetrance variants contributing to cancer risk have been identified using genome-wide association studies (GWAS), and the underlying mechanisms and genes involved in tumor development remain largely unknown.
Recently, the importance of somatic mutations in cancer development has been widely accepted. It is thought that cancer evolves through the accumulation of somatic mutations in specific genes, which vary by tumor type. Evidence shows that the mutation frequency of candidate cancer genes is much higher than expected, and that the particular combination of mutations can influence a tumor's properties [3–6]. These mutations are caused by a combination of environmental and heritable factors. Since the release of the human genome sequence, great efforts have been made to identify somatic variants in colorectal cancers. For example, Sanger sequencing was applied to 13,023 genes and identified 189 genes with an unexpected excess of somatic mutations in human breast and colorectal cancers. Another group used the mismatch repair detection (MRD) approach to screen 93 matched tumor-normal sample pairs and 22 cell lines for somatic mutations in 30 cancer-relevant genes, and found a total of 152 somatic mutations in breast and colorectal cancers, including mutations in previously reported genes such as BRAF and KRAS.
The recent development of high-throughput sequencing methods has provided an unprecedented opportunity to conduct whole-genome scale studies at an affordable cost, and these methods are extensively applied in transcriptome profiling. This approach, termed RNA-Seq, gives a far more precise measurement of transcript expression levels and a far more detailed characterization of their isoforms [9, 10], and has led to successes including the identification of differentially expressed genes, fusion genes in tumor tissue [12–14], and allele-specifically expressed genes [15, 16]. Moreover, it can serve as an efficient and cost-effective approach to systematically screen variants in transcribed regions [17–20]. To gain insight into the variation spectrum in tumor samples, we developed a variant discovery pipeline and applied it to deep-sequencing transcriptome data from 2 colorectal cancer tissues and their matched normal tissues. We found more variants in tumor tissues than in normal tissues. After additional filters, we also identified tumor-specific mutations in previously unreported genes, which supplement the growing list of candidate colorectal cancer genes.
Whole-transcriptome sequencing data of paired tumor and normal tissues from 2 stage III colorectal cancer patients were downloaded from the NCBI Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo) under accession number SRP006900. Specifically, 65-bp single-end short reads had been generated on an Illumina Genome Analyzer following the standard procedure.
All single-end reads were aligned to the UCSC human genome reference assembly (hg19), limited to chromosomes 1–22, X and Y. The alignment was carried out using BWA with default parameters, which allows 4% mismatches in each alignment.
In each tissue sample, we called variants from the read alignment using the SAMtools package. To avoid potential PCR duplicate fragments, we set -D to 100 when invoking the vcfutils.pl script; the results changed little when this option was set to 1000 (~3% increase in the total number of variants). Next, we applied several filters to reduce possible false positive calls.
Filter 1.1 We first removed variants whose probability of being called in error was greater than 0.01. This was done by requiring a value ≥20 in the ‘QUAL’ column of the VCF files generated by SAMtools.
Filter 1.2 We eliminated false positives caused by extremely high sequence coverage. To obtain an upper bound for sequence coverage, we took the variants passing filter 1.1 that were also present in dbSNP build 135 and designated them the known set. We then chose the cut-off as the coverage below which 97.5% of known variants fell, and applied it to the remaining variants. This step was performed independently for each sample.
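The two initial filters can be illustrated with a minimal Python sketch. It is not the authors' code: the dictionary layout (`id`, `qual`, `depth`), the function names, and the nearest-rank percentile rule are assumptions made for illustration, and the sketch assumes the coverage cut-off is applied to every variant that passes filter 1.1. The QUAL threshold of 20 corresponds to an error probability of 0.01 because Phred-scaled quality is defined as Q = -10 log10(p).

```python
import math


def phred_to_error_prob(qual):
    """Convert a Phred-scaled quality score into an error probability."""
    return 10 ** (-qual / 10)


def coverage_cutoff(known_depths, quantile=0.975):
    """Nearest-rank percentile: smallest depth such that at least
    `quantile` of the known variants have depth at or below it."""
    if not known_depths:
        raise ValueError("no known variants available to derive a cut-off")
    ordered = sorted(known_depths)
    rank = math.ceil(quantile * len(ordered))
    return ordered[rank - 1]


def apply_initial_filters(variants, known_ids, min_qual=20.0, quantile=0.975):
    """Apply filter 1.1 (QUAL) and filter 1.2 (coverage) to one sample.

    `variants` is a list of dicts with hypothetical keys:
      'id'    -> (chrom, pos, ref, alt)
      'qual'  -> value of the VCF QUAL column
      'depth' -> read depth at the variant position
    `known_ids` is the set of variant ids present in the known set (e.g. dbSNP).
    Returns the surviving variants and the depth cut-off that was used.
    """
    # Filter 1.1: keep calls with error probability <= 0.01 (QUAL >= 20).
    passed_qual = [v for v in variants if v["qual"] >= min_qual]
    # Filter 1.2: derive the depth cut-off from known variants only.
    known_depths = [v["depth"] for v in passed_qual if v["id"] in known_ids]
    cutoff = coverage_cutoff(known_depths, quantile)
    kept = [v for v in passed_qual if v["depth"] <= cutoff]
    return kept, cutoff
```

Because the cut-off is derived from the known set of each sample separately, calling `apply_initial_filters` once per sample mirrors the per-sample procedure described above.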
Identification of somatic variants
Somatic variants were called by comparing paired normal and tumor tissues. We used custom tools to parse the variants that passed the initial filters, applying the following additional filters:
Filter 2.1 Variants in genomic regions of low quality were first excluded from further analysis. Poor-quality regions were defined as regions with read coverage in only one sample of a pair, which could be caused by random bias.
Filter 2.2 We next removed variants that were present in dbSNP135, leaving novel variants.
Filter 2.3 We then removed variants found in both the matched normal and tumor tissues.
Filter 2.4 To reduce false positives caused by alignment difficulties around indels, we calculated the local mismatch rate as the percentage of mismatches within 10-bp downstream and upstream of a variant. Variants with high local mismatch rate (≥0.1, or ≥2 mismatches) were discarded.
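Filters 2.1 to 2.4 can be sketched as a single pass over the variants of one sample. This is an illustrative sketch rather than the custom tools used in the study: the function names, the tuple representation `(chrom, pos, ref, alt)`, and the idea of supplying mismatch positions as a per-chromosome set are all assumptions. With a 10-bp flank on each side, 2 mismatches in the 20 flanking bases give a rate of 0.1, which matches the stated threshold.

```python
def local_mismatch_rate(pos, mismatch_positions, flank=10):
    """Fraction of the 2*flank flanking bases that are mismatches,
    not counting the variant position itself."""
    count = sum(
        1 for p in mismatch_positions
        if p != pos and abs(p - pos) <= flank
    )
    return count / (2 * flank)


def somatic_candidates(sample_variants, paired_variants, covered_in_pair,
                       known_variants, mismatches_by_chrom,
                       flank=10, max_rate=0.1):
    """Return variants of one sample that survive filters 2.1-2.4.

    sample_variants     -> iterable of (chrom, pos, ref, alt) for this sample
    paired_variants     -> set of variants called in the matched sample
    covered_in_pair     -> set of (chrom, pos) covered by reads in the matched sample
    known_variants      -> set of known variants (e.g. from dbSNP)
    mismatches_by_chrom -> dict chrom -> set of mismatch positions in this sample
    """
    kept = []
    for variant in sample_variants:
        chrom, pos, _ref, _alt = variant
        # Filter 2.1: position must be covered in both samples of the pair.
        if (chrom, pos) not in covered_in_pair:
            continue
        # Filter 2.2: drop known variants, leaving novel ones.
        if variant in known_variants:
            continue
        # Filter 2.3: drop variants shared with the matched sample.
        if variant in paired_variants:
            continue
        # Filter 2.4: drop variants in regions with many nearby mismatches.
        rate = local_mismatch_rate(pos, mismatches_by_chrom.get(chrom, ()), flank)
        if rate >= max_rate:
            continue
        kept.append(variant)
    return kept
```

Running the function once with the tumor sample as `sample_variants` and once with the normal sample yields tumor-specific and normal-specific candidate lists, respectively.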
Gene ontology analysis
Gene ontology (GO) annotations were assigned to genes using the Bioconductor (http://www.bioconductor.org) package “org.Hs.eg.db”. Enrichment tests were performed using the “topGO” package.
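For readers unfamiliar with GO enrichment testing, the basic idea can be shown with a one-sided hypergeometric test in Python. This is only a sketch of a generic over-representation test; it does not reproduce topGO, which is an R package and offers additional algorithms that account for the GO graph structure. The function name and argument names are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_annotated, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_universe  -> number of genes in the background set
    n_annotated -> background genes annotated with the GO term
    n_selected  -> genes of interest (e.g. genes with somatic variants)
    n_overlap   -> genes of interest annotated with the GO term
    """
    total = comb(n_universe, n_selected)
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

In practice one p-value is computed per GO term and the results are then ranked or corrected for multiple testing.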
Read alignment and mutation spectrum
[Table: Sample and alignment summary; columns include the number of unique reads]
Identification of somatic variants
To investigate the potential effect of variants on oncogenesis, we next compared somatic variants between paired normal and tumor samples. Several additional filters were applied to call high-confidence somatic variants. First, we removed variant positions that were covered in only one sample of a pair, to avoid false positives probably caused by sequencing bias, leaving 18,970 tumor and 16,409 normal variants per sample. Next, we filtered out known variants found in dbSNP135, leaving 11,749 novel variants per tumor sample and 9,857 per normal sample. We also removed variants found in both a tumor sample and its matched normal sample, as well as variants with a high local mismatch rate (≥2 mismatches in the flanking 20-bp region), which might result from local misalignment. In total, we obtained 3,382 tumor-specific and 1,812 normal-specific novel variants per sample, across all autosomes and sex chromosomes.
Of note, the ratio of tumor to normal variants is significantly higher for novel variants than for all variants (3,382/1,812 versus 23,549/19,383, P < 2.2 × 10^-16, Fisher’s exact test), whereas the transition/transversion ratio does not differ between tumor and normal samples (2,054/719 versus 3,929/1,466, P = 0.235, Fisher’s exact test). It is therefore unlikely that the excess of somatic variants in tumor samples is due to a high false positive rate.
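The two comparisons above are 2×2 contingency tests. The following Python sketch shows how a two-sided Fisher's exact test can be computed from such counts using only the standard library. It is an independent illustration, not the software used in the study; the function names are hypothetical, and the two-sided rule used here (summing all tables that are no more probable than the observed one) is one common convention.

```python
import math


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    log_denom = _log_comb(row1 + row2, col1)

    def log_prob(x):
        # Hypergeometric log-probability of seeing x in the top-left cell.
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - log_denom

    low = max(0, col1 - row2)
    high = min(row1, col1)
    log_observed = log_prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    terms = []
    for x in range(low, high + 1):
        lp = log_prob(x)
        if lp <= log_observed + 1e-7:
            terms.append(math.exp(lp))
    return min(1.0, math.fsum(terms))


# Counts taken from the text: tumor/normal for novel versus all variants,
# and the two pairs of counts reported for the transition/transversion comparison.
p_novel_vs_all = fisher_exact_two_sided(3382, 1812, 23549, 19383)
p_ti_tv = fisher_exact_two_sided(2054, 719, 3929, 1466)
```

Working in log space with `math.lgamma` keeps the computation stable for counts in the tens of thousands, where the binomial coefficients themselves would be astronomically large.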
[Table: List of genes that contain somatic disruptive variants in both tumor samples in this study]
Functional characterization of genes with somatic variants
[Table: Enriched molecular function categories in GO analysis. Categories listed include “interspecies interaction between organisms” and “cell projection organization”.]
Characterization of potential colorectal cancer genes
It is well known that the accumulation of somatic variants is a basic mechanism leading to the development of malignancy. With the development of massively parallel sequencing, which makes large-scale sequencing affordable and available, the number of somatic variants found in colorectal cancer has grown rapidly, including variants in MLH3, BRAF, GALNT12, and TP53 [32–36]. In the present analysis, we identified 418 genes with somatic disruptive variants in two tumor samples. Among these were previously identified genes, such as TP53, and tumor-related genes or oncogenes, such as RAB5C, PIM-3, TPT1, and ST14. Here we present only several high-confidence candidate genes that were found in both tumor samples and are good targets for diagnostic marker and drug development.
Guanine nucleotide binding protein (G protein), beta polypeptide 2-like 1 (GNB2L1), also known as RACK1, encodes a ubiquitously expressed scaffolding protein and plays a crucial regulatory role in tumor growth. We detected a 1-bp insertion in both tumor samples, and another 2-bp insertion and a C->T point mutation in one tumor sample. These changes could affect the normal function of GNB2L1 and thus tumor progression.
We also found several members of the mucin protein family with somatic variants in both tumor samples. Mucin proteins are the major constituents of mucus, the viscous secretion that covers epithelial surfaces. There were 2 indels in MUC2, 10 indels and point variants in MUC4, and 1 indel and 1 point variant in MUC12. Since the expression of mucin proteins has been correlated with the aggressiveness of colorectal cancer, the excess of disruptive variants in mucin genes further supports their importance in colorectal carcinogenesis.
Recent advances in sequencing technologies continue to reduce sequencing costs and increase sequence output at an unprecedented rate, making RNA-Seq an appropriate method for characterizing transcriptome profiles, such as gene expression differences or splicing variations. Wang et al. also used RNA-Seq data to derive sample-specific protein databases. By applying this method to two colorectal cancer cell lines, SW480 and RKO, they found a significant improvement in protein identification. In addition, RNA-Seq can be used for variant detection in transcribed regions, which makes it suitable for identifying somatic mutations [17–20, 40, 41]. However, concerns have been raised that variant calling from RNA-Seq is prone to error and could generate a high false discovery rate. To minimize this, we implemented a series of stringent filters in our bioinformatic discovery pipeline. First, we required each variant to have a quality score of at least 20, removing poorly called variants. Next, we used variants found in dbSNP135 to calibrate a coverage cut-off and filtered out variants with extremely high read coverage. We also applied additional stringent filters to call high-confidence tissue-specific novel variants, including removing variants with a high local mismatch rate. In our final list, we identified more somatic variants in tumor samples than in normal samples, and some variants were in tumor-related genes. Because our filters are strict, we expect that more genes contain tumor-specific somatic variants than we report here.
It is widely acknowledged that the accumulation of mutations in oncogenes and tumor suppressor genes is the main cause of human cancer. Mutations that occur only in tumor tissues provide important information for understanding the biological processes underlying carcinogenesis, and facilitate the development of diagnostic and therapeutic markers. With the development of sequencing techniques and the corresponding decrease in costs, large-scale studies identifying somatic mutations in colorectal cancers have begun to accumulate. In one study, Sjöblom et al. used a polymerase chain reaction (PCR) approach to analyze 13,023 genes in 11 breast and 11 colorectal cancers, and found an average of ~90 mutated genes per tumor sample. Using stringent criteria, they identified 189 significantly mutated genes, which affect a wide range of cellular functions, including transcription, adhesion, and invasion. In another study, Timmermann et al. applied next-generation sequencing to the whole exome of primary colon tumors and adjacent unaffected normal colonic tissue. More than 50,000 small nucleotide variations were identified for each tissue, and 359 and 45 significant mutations were reported in the microsatellite instable (MSI) and microsatellite stable (MSS) colon cancers, respectively. Somatic mutations were found in the intracellular kinase domain of bone morphogenetic protein receptor 1A (BMPR1A), germline mutations of which are associated with juvenile polyposis syndrome. In the present study, we analyzed RNA-Seq data from 2 colorectal tumors and their matched normal tissues to compare their mutation spectra. In general, tumor tissues were enriched in somatic variants compared with normal tissues. By mapping short reads to 54,665 annotated human genes, we detected 418 genes with somatic variants in tumor tissues, including 3 mucin genes found in both tumor samples.
Mucins are complex glycoproteins that play important roles in protecting epithelial surfaces, and alterations in mucin expression and in the extent of their glycosylation have been reported to be associated with neoplastic progression and metastasis in several human cancers [42–44]. Since disruptive variants may radically change protein function rather than gene expression, we further used the SIFT tool to assess their effects on protein function. Ten of 12 variants were classified as tolerated, meaning they have a limited impact on protein function. Thus it is more likely that these disruptive mutations in mucin genes affect gene expression and thereby contribute to tumorigenesis. Additionally, mucins can form insoluble mucus to protect the gut lumen; amino acid changes in these genes could therefore modify the micro-environment. This change may in turn lead to the proliferation of some bacteria, such as Fusobacterium nucleatum and Coriobacteria, which have been reported to be significantly over-represented in colorectal tumor specimens [46, 47]. The somatic disruptive mutations found in these genes suggest that abnormal mucin expression is related to colorectal tumorigenesis.
RNA-Seq, combined with carefully designed filters, is a powerful tool for identifying somatic mutations in protein-coding regions. The list of genes we found in this study represents only a minimal set of candidate genes, owing to the stringent criteria we applied. However, the identification of several oncogenes and tumorigenesis genes, as well as signaling pathway genes, provides meaningful candidates for understanding the molecular mechanism of colorectal cancer and for future drug target development. Although additional validation and functional examination would be helpful, RNA-Seq, with a well-developed bioinformatic pipeline, can serve as the first step for somatic variant screening in human cancers.
1. Tenesa A, Dunlop MG: New insights into the aetiology of colorectal cancer from genome-wide association studies. Nat Rev Genet. 2009, 10 (6): 353-358. 10.1038/nrg2574.
2. Vogelstein B, Kinzler KW: Cancer genes and the pathways they control. Nat Med. 2004, 10 (8): 789-799. 10.1038/nm1087.
3. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, Teague J, Butler A, Stevens C: Patterns of somatic mutation in human cancer genomes. Nature. 2007, 446 (7132): 153-158. 10.1038/nature05610.
4. Jones S, Zhang X, Parsons DW, Lin JC, Leary RJ, Angenendt P, Mankoo P, Carter H, Kamiyama H, Jimeno A: Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science. 2008, 321 (5897): 1801-1806. 10.1126/science.1164368.
5. Sjoblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Mandelker D, Leary RJ, Ptak J, Silliman N: The consensus coding sequences of human breast and colorectal cancers. Science. 2006, 314 (5797): 268-274. 10.1126/science.1133427.
6. Wood LD, Parsons DW, Jones S, Lin J, Sjoblom T, Leary RJ, Shen D, Boca SM, Barber T, Ptak J: The genomic landscapes of human breast and colorectal cancers. Science. 2007, 318 (5853): 1108-1113. 10.1126/science.1145720.
7. Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, Pukkala E, Skytthe A, Hemminki K: Environmental and heritable factors in the causation of cancer–analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med. 2000, 343 (2): 78-85. 10.1056/NEJM200007133430201.
8. Bentivegna S, Zheng J, Namsaraev E, Carlton VE, Pavlicek A, Moorhead M, Siddiqui F, Wang Z, Lee L, Ireland JS: Rapid identification of somatic mutations in colorectal and breast cancer tissues using mismatch repair detection (MRD). Hum Mutat. 2008, 29 (3): 441-450. 10.1002/humu.20672.
9. Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10 (1): 57-63. 10.1038/nrg2484.
10. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.
11. Zhang LQ, Cheranova D, Gibson M, Ding S, Heruth DP, Fang D, Ye SQ: RNA-seq Reveals Novel Transcriptome of Genes and Their Isoforms in Human Pulmonary Microvascular Endothelial Cells Treated with Thrombin. PLoS One. 2012, 7 (2): e31229. 10.1371/journal.pone.0031229.
12. Ju YS, Lee WC, Shin JY, Lee S, Bleazard T, Won JK, Kim YT, Kim JI, Kang JH, Seo JS: A transforming KIF5B and RET gene fusion in lung adenocarcinoma revealed from whole-genome and transcriptome sequencing. Genome Res. 2012, 22 (3): 436-445.
13. Kohno T, Ichikawa H, Totoki Y, Yasuda K, Hiramoto M, Nammo T, Sakamoto H, Tsuta K, Furuta K, Shimada Y: KIF5B-RET fusions in lung adenocarcinoma. Nat Med. 2012, 18 (3): 375-377. 10.1038/nm.2644.
14. Lee CH, Ou WB, Marino-Enriquez A, Zhu M, Mayeda M, Wang Y, Guo X, Brunner AL, Amant F, French CA: 14-3-3 fusion oncogenes in high-grade endometrial stromal sarcoma. Proc Natl Acad Sci USA. 2012, 109 (3): 929-934. 10.1073/pnas.1115528109.
15. Gregg C, Zhang J, Butler JE, Haig D, Dulac C: Sex-specific parent-of-origin allelic expression in the mouse brain. Science. 2010, 329 (5992): 682-685. 10.1126/science.1190831.
16. Gregg C, Zhang J, Weissbourd B, Luo S, Schroth GP, Haig D, Dulac C: High-resolution analysis of parent-of-origin allelic expression in the mouse brain. Science. 2010, 329 (5992): 643-648. 10.1126/science.1190830.
17. Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G: Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods. 2008, 5 (7): 613-619. 10.1038/nmeth.1223.
18. Cirulli ET, Singh A, Shianna KV, Ge D, Smith JP, Maia JM, Heinzen EL, Goedert JJ, Goldstein DB: Screening the human exome: a comparison of whole genome and whole transcriptome sequencing. Genome Biol. 2010, 11 (5): R57. 10.1186/gb-2010-11-5-r57.
19. Kridel R, Meissner B, Rogic S, Boyle M, Telenius A, Woolcock B, Gunawardana J, Jenkins C, Cochrane C, Ben-Neriah S: Whole transcriptome sequencing reveals recurrent NOTCH1 mutations in mantle cell lymphoma. Blood. 2012, 119 (9): 1963-1971. 10.1182/blood-2011-11-391474.
20. Canovas A, Rincon G, Islas-Trejo A, Wickramasinghe S, Medrano JF: SNP discovery in the bovine milk transcriptome using RNA-Seq technology. Mamm Genome. 2010, 21 (11–12): 592-598.
21. Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25 (14): 1754-1760. 10.1093/bioinformatics/btp324.
22. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25 (16): 2078-2079. 10.1093/bioinformatics/btp352.
23. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K: dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001, 29 (1): 308-311. 10.1093/nar/29.1.308.
24. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
25. Alexa A, Rahnenfuhrer J, Lengauer T: Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics. 2006, 22 (13): 1600-1607. 10.1093/bioinformatics/btl140.
26. Lam HY, Pan C, Clark MJ, Lacroute P, Chen R, Haraksingh R, O'Huallachain M, Gerstein MB, Kidd JM, Bustamante CD: Detecting and annotating genetic variations using the HugeSeq pipeline. Nat Biotechnol. 2012, 30 (3): 226-229. 10.1038/nbt.2134.
27. Bass BL: RNA editing by adenosine deaminases that act on RNA. Annu Rev Biochem. 2002, 71: 817-846. 10.1146/annurev.biochem.71.110601.135501.
28. Hung MC, Link W: Protein localization in disease and therapy. J Cell Sci. 2011, 124 (Pt 20): 3381-3392.
29. Fabbro M, Henderson BR: Regulation of tumor suppressors by nuclear-cytoplasmic shuttling. Exp Cell Res. 2003, 282 (2): 59-69. 10.1016/S0014-4827(02)00019-8.
30. Dansen TB, Burgering BM: Unravelling the tumor-suppressive functions of FOXO proteins. Trends Cell Biol. 2008, 18 (9): 421-429. 10.1016/j.tcb.2008.07.004.
31. Nymark P, Wikman H, Ruosaari S, Hollmen J, Vanhala E, Karjalainen A, Anttila S, Knuutila S: Identification of specific gene copy number changes in asbestos-related lung cancer. Cancer Res. 2006, 66 (11): 5737-5743. 10.1158/0008-5472.CAN-06-0199.
32. Timmermann B, Kerick M, Roehr C, Fischer A, Isau M, Boerno ST, Wunderlich A, Barmeyer C, Seemann P, Koenig J: Somatic mutation profiles of MSI and MSS colorectal cancer identified by whole exome next generation sequencing and bioinformatics analysis. PLoS One. 2010, 5 (12): e15661. 10.1371/journal.pone.0015661.
33. Li WQ, Kawakami K, Ruszkiewicz A, Bennett G, Moore J, Iacopetta B: BRAF mutations are associated with distinctive clinical, pathological and molecular features of colorectal cancer independently of microsatellite instability status. Mol Cancer. 2006, 5: 2. 10.1186/1476-4598-5-2.
34. Guda K, Moinova H, He J, Jamison O, Ravi L, Natale L, Lutterbaugh J, Lawrence E, Lewis S, Willson JK: Inactivating germ-line and somatic mutations in polypeptide N-acetylgalactosaminyltransferase 12 in human colon cancers. Proc Natl Acad Sci USA. 2009, 106 (31): 12921-12925. 10.1073/pnas.0901454106.
35. Godai TI, Suda T, Sugano N, Tsuchida K, Shiozawa M, Sekiguchi H, Sekiyama A, Yoshihara M, Matsukuma S, Sakuma Y: Identification of colorectal cancer patients with tumors carrying the TP53 mutation on the codon 72 proline allele that benefited most from 5-fluorouracil (5-FU) based postoperative chemotherapy. BMC Cancer. 2009, 9: 420. 10.1186/1471-2407-9-420.
36. Iacopetta B: TP53 mutation in colorectal cancer. Hum Mutat. 2003, 21 (3): 271-276. 10.1002/humu.10175.
37. Wang F, Osawa T, Tsuchida R, Yuasa Y, Shibuya M: Downregulation of receptor for activated C-kinase 1 (RACK1) suppresses tumor growth by inhibiting tumor cell proliferation and tumor-associated angiogenesis. Cancer Sci. 2011, 102 (11): 2007-2013. 10.1111/j.1349-7006.2011.02065.x.
38. Manne U, Weiss HL, Grizzle WE: Racial differences in the prognostic usefulness of MUC1 and MUC2 in colorectal adenocarcinomas. Clin Cancer Res. 2000, 6 (10): 4017-4025.
39. Wang X, Slebos RJ, Wang D, Halvey PJ, Tabb DL, Liebler DC, Zhang B: Protein identification using customized protein sequence databases derived from RNA-Seq data. J Proteome Res. 2012, 11 (2): 1009-1017. 10.1021/pr200766z.
40. Chepelev I, Wei G, Tang Q, Zhao K: Detection of single nucleotide variations in expressed exons of the human genome using RNA-Seq. Nucleic Acids Res. 2009, 37 (16): e106. 10.1093/nar/gkp507.
41. Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T, McDonald H, Varhol R, Jones S, Marra M: Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. Biotechniques. 2008, 45 (1): 81-94. 10.2144/000112900.
42. Ho SB, Niehans GA, Lyftogt C, Yan PS, Cherwitz DL, Gum ET, Dahiya R, Kim YS: Heterogeneity of mucin gene expression in normal and neoplastic tissues. Cancer Res. 1993, 53 (3): 641-651.
43. Byrd JC, Bresalier RS: Mucins and mucin binding proteins in colorectal cancer. Cancer Metastasis Rev. 2004, 23 (1–2): 77-99.
44. Biemer-Huttmann AE, Walsh MD, McGuckin MA, Ajioka Y, Watanabe H, Leggett BA, Jass JR: Immunohistochemical staining patterns of MUC1, MUC2, MUC4, and MUC5AC mucins in hyperplastic polyps, serrated adenomas, and traditional adenomas of the colorectum. J Histochem Cytochem. 1999, 47 (8): 1039-1048. 10.1177/002215549904700808.
45. Ng PC, Henikoff S: SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003, 31 (13): 3812-3814. 10.1093/nar/gkg509.
46. Castellarin M, Warren RL, Freeman JD, Dreolini L, Krzywinski M, Strauss J, Barnes R, Watson P, Allen-Vercoe E, Moore RA: Fusobacterium nucleatum infection is prevalent in human colorectal carcinoma. Genome Res. 2012, 22 (2): 299-306. 10.1101/gr.126516.111.
47. Kostic AD, Gevers D, Pedamallu CS, Michaud M, Duke F, Earl AM, Ojesina AI, Jung J, Bass AJ, Tabernero J: Genomic analysis identifies association of Fusobacterium with colorectal carcinoma. Genome Res. 2012, 22 (2): 292-298. 10.1101/gr.126573.111.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2350/14/32/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.