Cross-pollination of research findings, although uncommon, may accelerate discovery of human disease genes
© Duda et al.; licensee BioMed Central Ltd. 2012
Received: 26 April 2012
Accepted: 24 November 2012
Published: 28 November 2012
Technological leaps in genome sequencing have resulted in a surge in discovery of human disease genes. These discoveries have led to increased clarity on the molecular pathology of disease and have also demonstrated considerable overlap in the genetic roots of human diseases. In light of this large genetic overlap, we tested whether cross-disease research approaches lead to faster, more impactful discoveries.
We leveraged several gene-disease association databases to calculate a Mutual Citation Score (MCS) for 10,853 pairs of genetically related diseases to measure the frequency of cross-citation between research fields. To assess the importance of cooperative research, we computed an Individual Disease Cooperation Score (ICS) and the average publication rate for each disease.
For all disease pairs with one gene in common, we found that the degree of genetic overlap was a poor predictor of cooperation (r²=0.3198) and that the vast majority of disease pairs (89.56%) never cited previous discoveries of the same gene in a different disease, irrespective of the level of genetic similarity between the diseases. A fraction (0.25%) of the pairs demonstrated cross-citation in greater than 5% of their published genetic discoveries and 0.037% cross-referenced discoveries more than 10% of the time. We found strong positive correlations between ICS and publication rate (r²=0.7931), and an even stronger correlation between the publication rate and the number of cross-referenced diseases (r²=0.8585). These results suggested that cross-disease research may have the potential to yield novel discoveries at a faster pace than singular disease research.
Our findings suggest that the frequency of cross-disease study is low despite the high level of genetic similarity among many human diseases, and that collaborative methods may accelerate and increase the impact of new genetic discoveries. Until we have a better understanding of the taxonomy of human diseases, cross-disease research approaches should become the rule rather than the exception.
The pace of genetic discovery in human diseases has accelerated exponentially through the invention of high-throughput technologies and the advent of next-generation sequencing approaches. What were once considered to be well-defined boundaries between human diseases are increasingly appearing as arbitrary and blurred. Recent research has shown that many human diseases share large numbers of genes and genetic networks, and therefore likely share molecular mechanisms that will elucidate shared causes and shared treatments.
This emerging picture of commonality among human diseases suggests that research, in particular our collective understanding of the genetic causes of human diseases, should benefit from a shift away from singularly focused research towards multi-disease focused efforts that look across existing disease circumscriptions rather than within. Indeed, recent efforts have begun to bear out this hypothesis, including comparative studies among autism and related neurodevelopmental disorders, as well as various disease-network approaches. Such innovative approaches to the study of human diseases will facilitate the creation of a more appropriate genetically based taxonomy of disease.
In the present study, we capitalized on the last fifteen years of genetic research to address whether cross-pollination in genetic research of human diseases has served to increase the pace of discovery. Specifically, we constructed authoritative gene lists for 193 human diseases defined in the Medical Subject Headings database and mapped the publication records for each disease to the citation histories of all others. This enabled us to track patterns of shared discovery and evaluate the impact on the rate of new gene discovery in cases where cross-disease research occurred, as well as in cases where it did not.
The Mutual Citation Score
C_ij = Number of citation events where a gene discovery in disease i is cited back to the discovery of the same gene in disease j
C_ji = Number of citation events where a gene discovery in disease j is cited back to the discovery of the same gene in disease i
P_i = Total number of genetic publications in disease i
P_j = Total number of genetic publications in disease j
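The equation that combines these four quantities is not present in this text. One form consistent with the definitions above, and with the description of the MCS as the share of all genetic discoveries in a disease pair that cross-reference the other disease, would be the following. It is an assumption made for illustration, not a quotation of the published formula.

```latex
\mathrm{MCS}(i,j) = \frac{C_{ij} + C_{ji}}{P_i + P_j}
```

Under this assumed form, the score is symmetric in i and j and equals 0 when neither field cites the other.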
We attempted to control for self-citation by excluding citation events where both publications had the same last author. We computed the MCS for 10,853 disease pairs that had at least one gene in common. Fundamentally, for a pair of diseases, the MCS describes the proportion of all genetic discoveries that cross-reference the other disease. Theoretically, the values for the MCS range from 0 to 1, where 0 signifies no record of cross-citations between the two diseases. However, because the MCS is based on the event when a gene is initially discovered in a disease, it would be impossible to have a disease pair with an MCS=1.
For all 193 diseases, the research tool Genotator was used to provide gene lists and genetic publication histories dated from 1996–2010 (Genotator files obtained March 2011). Unrelated publications were omitted from the publication histories of stroke, asthma, hypersensitivity, leukemia, diabetes mellitus and hemochromatosis after manual examination of the abstracts revealed no mention of the disease. Publication dates as well as author and citation information were obtained via PubMed Entrez Utilities. Although citation histories were only provided for PubMed Central publications, this subset of 3,868 articles supplied an accurate representation of general research trends between diseases. No disease was significantly under- or overrepresented in PubMed Central (Additional file 1: Table S1) and the disorders were all treated equally using this systematic approach.
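The counting procedure described in this section can be sketched in Python. This is a minimal illustration, not the pipeline used in the study: the `Publication` record, the `citations` mapping, and the toy data are hypothetical, and the final ratio assumes the MCS divides the two citation counts by the combined number of genetic publications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """Hypothetical record for one gene-disease publication."""
    pmid: str
    disease: str
    gene: str
    last_author: str


def count_cross_citations(citing_pubs, cited_pubs, citations):
    """Count events where a gene discovery in one disease cites the
    discovery of the same gene in the other disease. Events in which
    both papers share a last author are skipped (self-citation control)."""
    cited_by_pmid = {p.pmid: p for p in cited_pubs}
    events = 0
    for pub in citing_pubs:
        for ref_pmid in citations.get(pub.pmid, ()):
            ref = cited_by_pmid.get(ref_pmid)
            if ref is None or ref.gene != pub.gene:
                continue  # not a same-gene discovery in the other disease
            if ref.last_author == pub.last_author:
                continue  # treated as self-citation
            events += 1
    return events


def mutual_citation_score(pubs_i, pubs_j, citations):
    """Assumed form: (C_ij + C_ji) / (P_i + P_j)."""
    c_ij = count_cross_citations(pubs_i, pubs_j, citations)
    c_ji = count_cross_citations(pubs_j, pubs_i, citations)
    total = len(pubs_i) + len(pubs_j)
    return (c_ij + c_ji) / total if total else 0.0


# Toy data (all identifiers and author names are made up).
asthma = [
    Publication("1", "asthma", "ADRB2", "Lee"),
    Publication("2", "asthma", "GSTP1", "Kim"),
]
hypersensitivity = [
    Publication("3", "hypersensitivity", "ADRB2", "Park"),
    Publication("4", "hypersensitivity", "GSTP1", "Kim"),
]
# Maps a citing PMID to the PMIDs it cites.
citations = {"3": ["1"], "4": ["2"]}

toy_mcs = mutual_citation_score(asthma, hypersensitivity, citations)
print(toy_mcs)
```

In the toy data, publication 3 citing publication 1 counts as one event, while publication 4 citing publication 2 is dropped because both share the last author "Kim", so one event is spread over four publications.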
The Individual Disease Cooperation Score
In other words, the ICS for any disease (i) is simply the summation of all MCS(i,j) values for which (i) is part of the disease pair (i, j).
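As a minimal sketch of this summation, the function below accumulates pairwise MCS values into one ICS per disease. The dictionary layout, the disease labels, and the MCS values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def individual_cooperation_scores(mcs_by_pair):
    """Sum MCS(i, j) over every pair that contains a given disease.

    mcs_by_pair maps an unordered pair (disease_i, disease_j), stored
    once, to its MCS value.
    """
    ics = defaultdict(float)
    for (disease_i, disease_j), mcs in mcs_by_pair.items():
        # Each pair contributes its score to both member diseases.
        ics[disease_i] += mcs
        ics[disease_j] += mcs
    return dict(ics)


# Hypothetical MCS values for three placeholder diseases.
toy_pairs = {
    ("A", "B"): 0.10,
    ("A", "C"): 0.05,
    ("B", "C"): 0.00,
}
toy_ics = individual_cooperation_scores(toy_pairs)
print(toy_ics)
```

Because each pair is stored once and credited to both members, a disease with many weakly cooperating partners can reach the same ICS as a disease with one strongly cooperating partner.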
Genetic similarity does not predict cooperative research
The disease pair of asthma and hypersensitivity, two highly genetically related and co-morbid conditions, returned the highest MCS of all disease pairs (MCS=0.39297), translating to mutual citation in approximately 40% of all publications relevant to either disorder. This pair shared 455 (74.71%) of their total combined genes, and 1066 citation events were recorded for 59 (13%) of the genes common to both disorders. Among the cross-referenced genes were several known to be high priority candidate genes in both asthma and hypersensitivity, including GSTP1, ADAM33 and ADRB2.
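The genetic overlap percentages reported here read as the number of shared genes divided by the total combined gene list of the pair. A minimal sketch under that assumption (shared genes over the union of both gene lists; the function name and the placeholder genes are hypothetical) is:

```python
def genetic_overlap(genes_a, genes_b):
    """Shared genes as a fraction of the combined (union) gene list.
    Assumes 'total combined genes' means the union of both lists."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy gene lists; GENE_A and GENE_B are placeholders, not real symbols.
list_a = {"GSTP1", "ADAM33", "ADRB2", "GENE_A"}
list_b = {"GSTP1", "ADAM33", "ADRB2", "GENE_B"}
toy_overlap = genetic_overlap(list_a, list_b)
print(toy_overlap)  # 3 shared genes out of 5 combined

# Under the same reading, 455 shared genes at 74.71% overlap would
# imply a combined list of roughly 455 / 0.7471 genes.
implied_union = round(455 / 0.7471)
print(implied_union)
```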
Diabetes mellitus and obesity had the second highest MCS of 0.18314. The genetic overlap (35.7%) between this pair consisted of 418 genes. We found that 118 (28%) of the genes shared between diabetes mellitus and obesity were cross-referenced between the two research fields, including many genes that are strongly linked to both diseases such as FTO, involved in regulation of global metabolic rate and body fat accumulation, as well as ADIPOQ, which is implicated in the control of fat metabolism and insulin sensitivity.
We observed that a vast majority of disease pairs (99.75%) returned MCS values lower than 0.05, irrespective of the amount of genetic overlap between the pair. For example, lung neoplasms and bladder neoplasms were found to share 431 genes (60.88%), the third highest genetic overlap of all disease pairs examined. Less than 5% (n=21) of the genes common to both diseases were cross-referenced, contributing to the comparatively low MCS of 0.02578. Despite the low level of cooperative research, many of the genes in common, including GSTM1 and TP53, have been shown to be highly associated with the molecular pathology of both diseases.
A similar pattern was observed for bipolar disorder and obsessive-compulsive disorder, two diseases not only known to be genetically linked but also known to occur together commonly in patients [16, 17]. We found that this disease pair had a genetic overlap of 53.48% consisting of 307 genes; however, the only cross-disease citations were for BDNF and SLC6A4, genes that are widely implicated in over 300 other disorders and potentially less directly related to the mechanistic causes of disease than other genes. The resulting MCS for the pair was 0.004732; thus, mutual citation between the fields of bipolar disorder and obsessive-compulsive disorder occurred in less than 0.5% of their combined set of genetic publications.
Disease pairs with comparable genetic overlaps have highly variable values for MCS
Parkinson Disease & Gaucher Disease
Fragile X Syndrome & ADHD
Obesity & Deafness
α-1 Antitrypsin Deficiency & Lung Neoplasms
Rheumatic Fever & Rheumatic Heart Disease
Hypertension & Breast Neoplasms
Cystic Fibrosis & Sarcoma
Ovarian Neoplasms & Breast Neoplasms
Multiple Sclerosis & Rheumatoid Arthritis
Spinal Muscular Atrophy & Fragile X Syndrome
Diabetes Mellitus & Obesity
Autism & ADHD
Myotonic Dystrophy & Fragile X Syndrome
Asthma & Hypersensitivity
Cryptococcosis & Poliomyelitis
Cross-pollination increases rate of discovery
The three disease fields that published most often were found to be diabetes mellitus, hypertension and obesity. Diabetes mellitus was found to have a publication rate of 0.87089 publications per day, translating to approximately 320 publications per year. Similarly, we calculated that the field of hypertension produces about 220 publications every year (pub rate=0.59701) and obesity, on average, releases 185 genetic publications per year (pub rate=0.50788). These diseases were also among the top fifteen most cooperative disease fields, as calculated by the ICS.
The opposite was observed for disease fields that collaborated infrequently, or not at all, with related disease fields. These diseases were substantially slower at publishing novel genetic discoveries. For example, fragile X syndrome had a low publication rate of 0.018222 publications per day, or about six publications per year, as well as a poor ICS (ICS=0.1881). Likewise, fragile X syndrome was found to have cited only six of 70 related disease fields in its history of genetic publications. We also discovered that myotonic dystrophy failed to cite any other disease field in its genetic research, despite having high genetic overlap with over 40 diseases, including spinal muscular atrophy (38.10%) and fragile X syndrome (40.00%). As a consequence of this low level of inter-disease collaboration, myotonic dystrophy had a publication rate of 0.007194, translating to an average of only two or three genetic publications per year. This discovery rate is more than 100 times slower than that of diabetes mellitus, the leader in genetic publication, even though diabetes mellitus and obesity, the relationship that contributes most to the ICS of diabetes mellitus, share only 35.70% of their implicated genes, less than myotonic dystrophy shares with either spinal muscular atrophy or fragile X syndrome. Our data provide evidence that disease fields such as myotonic dystrophy and fragile X syndrome could greatly improve their publication rates by participating in more cooperative research.
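The per-day publication rates quoted in this section can be converted to yearly figures and compared directly. The short sketch below uses the rates reported in the text; the 365.25-day year and the helper name are assumptions made for the illustration.

```python
DAYS_PER_YEAR = 365.25  # assumed average year length


def publications_per_year(rate_per_day):
    """Convert a publications-per-day rate to publications per year."""
    return rate_per_day * DAYS_PER_YEAR


# Per-day rates as reported in the text.
rates = {
    "diabetes mellitus": 0.87089,
    "hypertension": 0.59701,
    "fragile X syndrome": 0.018222,
    "myotonic dystrophy": 0.007194,
}

for disease, rate in rates.items():
    print(f"{disease}: {publications_per_year(rate):.1f} publications/year")

# How many times faster diabetes mellitus publishes than myotonic dystrophy.
fold_difference = rates["diabetes mellitus"] / rates["myotonic dystrophy"]
print(f"fold difference: {fold_difference:.0f}")
```

The ratio of the two per-day rates comes out slightly above 120, in line with the statement that the myotonic dystrophy field publishes more than 100 times more slowly than the diabetes mellitus field.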
By analyzing the publication histories of 193 genetic disorders over the last fifteen years, we found that cross-pollination of genetic discovery, or cross-disease research, is uncommon. Surprisingly, only 0.25% of all genetically related disease pairs studied exhibited inter-disease collaboration in greater than 5% of their total genetic research (MCS ≥0.05), and only about 10% of all disease pairs participated in cooperative research at any level (MCS >0.00), despite often large numbers of genes in common between the diseases.
We also found that when cross-pollination of genetic discovery does occur, the pace of discovery in the field is accelerated. Both the number of collaborative relationships -- the number of different diseases whose research was cited by a genetically similar disease -- and the Individual Disease Cooperation Score (ICS) were found to have significant positive correlations with publication rate. These results indicate that diseases that engage in collaborative research frequently and with a wide variety of related diseases tend to publish genetic discoveries more quickly than non-cooperative disease fields. Because the balance in co-citation among a majority of the disease pairs studied was high, we can conclude that the correlation between cross-pollination and accelerated discovery was due to a bidirectional sharing of research findings rather than a gene bandwagon effect and unidirectional tracking of “hot” disease fields.
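A correlation of this kind can be summarized by the squared Pearson correlation coefficient between a per-disease score and its publication rate. The text does not state which estimator was used, so the sketch below is an assumption, and the paired values are invented purely to make it runnable.

```python
def pearson_r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-disease values: ICS and publications per day.
toy_ics_values = [0.0, 0.2, 0.5, 0.9, 1.4]
toy_pub_rates = [0.01, 0.10, 0.30, 0.45, 0.85]

toy_r2 = pearson_r_squared(toy_ics_values, toy_pub_rates)
print(f"r^2 = {toy_r2:.4f}")
```

A value near 1 indicates that the two quantities rise together almost linearly. As with any correlation, it does not by itself establish which quantity drives the other.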
Our results suggest that the field of human disease research has historically functioned mainly in a disconnected and single-disease focused manner rather than through collaborative, multi-disease spanning efforts. Part of this no doubt stems from our existing taxonomy of human disease and the importance of specialization, and potentially also from the funding priorities of federal funding agencies. However, our study provides evidence in support of the possible benefits of a shift towards more collaborative cross-disease research methods. Genetic discoveries made by cross-disease collaborations could provide insight into multi-disease indications for drugs, and since fields that participate in cross-disease collaborations tend to make discoveries faster, such a strategy could accelerate the beginning of clinical trials for these newly proposed therapies.
The present study demonstrates the benefits of a cross-disease research model for genetic research in human diseases as they are currently defined. We found that a vast majority of genetically related diseases show no evidence for collaborative research practices over the last fifteen years. However, we observed that both the amount of cooperative research and the number of collaborative relationships in a particular disease field showed a strong positive correlation with an accelerated discovery rate in that disease field. These results suggest that cross-disease research will become increasingly common and could accelerate the pace of discovery in the field as a whole, leading to faster understanding of the genetic roots of disease, faster development of multi-indicated drugs, and perhaps also leading to a new taxonomy of human disease that is informed by the expansive overlap in underlying genetic composition rather than by symptomatic traits.
MCS: Mutual citation score
ICS: Individual disease cooperation score
ADHD: Attention deficit disorder with hyperactivity
We thank members of the Wall Lab, including Todd DeLuca, Jae-Yoon Jung and Jike Cui, for thoughtful discussions and constructive advice. This project was aided in part by funding from the National Institutes of Health under Grant No. 1R01MH090611-01A1, and a Translational Research Program Grant from Children’s Hospital, Boston.
1. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, et al: The human disease network. Proc Natl Acad Sci U S A. 2007, 104 (21): 8685-8690. 10.1073/pnas.0701361104.
2. Wall DP, Esteban FJ, Deluca TF, Huyck M, Monaghan T, et al: Comparative analysis of neurological disorders focuses genome-wide search for autism genes. Genomics. 2009, 93 (2): 120-129. 10.1016/j.ygeno.2008.09.015.
3. Hidalgo CA, Blumm N, Barabási AL, Christakis NA: A dynamic network approach for the study of human phenotypes. PLoS Comput Biol. 2009, 5 (4): e1000353. 10.1371/journal.pcbi.1000353.
4. Medical subject headings. http://www.nlm.nih.gov/mesh
5. Wall DP, Pivovarov R, Tong M, Jung JY, Fusaro VA, et al: Genotator: a disease-agnostic tool for genetic annotation of disease. BMC Med Genomics. 2010, 3 (50).
6. Pubmed entrez utilities. http://www.ncbi.nlm.nih.gov/entrez
7. Global surveillance, prevention and control of chronic respiratory diseases: a comprehensive approach. Edited by: Bousquet J, Khaltaev N. 2007, Geneva, Switzerland: WHO Press.
8. Autworks. http://tools.autworks.hms.harvard.edu/genes/4638
9. Autworks. http://tools.autworks.hms.harvard.edu/genes/15478
10. Autworks. http://tools.autworks.hms.harvard.edu/genes/286
11. Khaodhiar L, McCowen KC, Blackburn GL: Obesity and its comorbid conditions. Clin Cornerstone. 1999, 2 (3): 17-31. 10.1016/S1098-3597(99)90002-9.
12. Autworks. http://tools.autworks.hms.harvard.edu/genes/24678
13. Autworks. http://tools.autworks.hms.harvard.edu/genes/13633
14. Autworks. http://tools.autworks.hms.harvard.edu/genes/4632
15. Autworks. http://tools.autworks.hms.harvard.edu/genes/11998
16. Hantouche EG, Kochman F, Demonfaucon C, Barrot I, Millet B, et al: Bipolar obsessive-compulsive disorder: confirmation of results of the “ABC-OCD” survey in 2 populations of patient members versus non-members of an association. Encephale. 2002, 28 (1): 21-28.
17. Perugi G, Toni C, Frare F, Travierso MC, Hantouche E, et al: Obsessive-compulsive-bipolar comorbidity: a systematic exploration of clinical features and treatment outcome. J Clin Psychiatry. 2002, 63 (12): 1129-1134. 10.4088/JCP.v63n1207.
18. Autworks. http://tools.autworks.hms.harvard.edu/genes/1033
19. Autworks. http://tools.autworks.hms.harvard.edu/genes/11050

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2350/13/114/prepub