Genetic studies of the Roma (Gypsies): a review

Background Data provided by the social sciences as well as genetic research suggest that the 8-10 million Roma (Gypsies) who live in Europe today are best described as a conglomerate of genetically isolated founder populations. The relationship between the traditional social structure observed by the Roma, where the Group is the primary unit, and the boundaries, demographic history and biological relatedness of the diverse founder populations appears complex and has not been addressed by population genetic studies. Results Recent medical genetic research has identified a number of novel, or previously known but rare conditions, caused by private founder mutations. A summary of the findings, provided in this review, should assist diagnosis and counselling in affected families, and promote future collaborative research. The available incomplete epidemiological data suggest a non-random distribution of disease-causing mutations among Romani groups. Conclusion Although far from systematic, the published information indicates that medical genetics has an important role to play in improving the health of this underprivileged and forgotten people of Europe. Reported carrier rates for some Mendelian disorders are in the range of 5-15%, sufficient to justify newborn screening and early treatment, or community-based education and carrier testing programs for disorders where no therapy is currently available. To be most productive, future studies of the epidemiology of single gene disorders should take social organisation and cultural anthropology into consideration, thus allowing the targeting of public health programs and contributing to the understanding of population structure and demographic history of the Roma.


Introduction
The Roma (Gypsies) became one of the peoples of Europe around one thousand years ago, when they first arrived in the Balkans [1,2]. The current size of the European Romani population, around 8 million [2], is equivalent to that of an average European country (Figure 1). While human rights and socio-economic issues related to the Roma are increasingly becoming the focus of political debate and media coverage throughout Europe, their poor health status [3-6] is rarely discussed and still awaits the attention of the medical profession.
This review of genetic studies of the Roma was prompted by two recent developments: (i) Studies conducted over the last decade have resulted in the identification of a number of novel single gene disorders and disease-causing mutations. The accumulating data are already sufficient to outline a pattern and draw conclusions about public health policies and future research. (ii) The economic and political changes in Eastern Europe and the wars in former Yugoslavia have led to the west-bound migration of large numbers of Roma [7,8], changing the traditional demographic profile of Gypsy minorities across Europe. A predictable consequence of this new diaspora is that medical practitioners in many countries will encounter Romani patients with previously unknown or very rare disorders. A summary of the available information should facilitate diagnostic investigations and counselling in these affected families and stimulate international collaboration.

Materials and Methods
Literature searches were performed using the U.S. National Library of Medicine PubMed/MEDLINE databases for the period 1960 to December 2000. Database searches using the keyword "Gypsies" identified 297 articles, whilst the keyword "Gypsy" produced 573 articles. The discrepancy is due mainly to the inclusion of articles about the "gypsy retrotransposable element" and the "gypsy moth". Searches using the terms "Roma", "Romani" and "Romany" yielded results that were either not relevant to the topic (e.g. Roma, the capital of Italy) or incomplete.
The majority of the 297 articles dealt with issues beyond the focus of this review, namely social problems related to the health of the Roma (28.6%), or general medical problems (29.6%). The remainder were reports on genetic research, of which 41 studies (13.8%) were in the field of clinical genetics, 44 (14.8%) were molecular studies of genetic disorders, and 39 (13.1%) covered population genetic research. In the clinical and molecular genetics fields, we have given preference to publications which were not limited to single case descriptions, and dealt with disorders with public health impact. Population genetics papers were selected on the basis of the compatibility of study design, specifically the analysis of comparable polymorphic systems.
Complementary data on history, linguistics, cultural anthropology and demography were found through standard library and bibliographic searches, and included publications recommended by consulting experts in Romani studies (Drs. Elena Marushiakova and Vesselin Popov from the Bulgarian Academy of Sciences and Dr. Ian Hancock from the University of Texas at Austin).

The "Track Record" of Genetics
Genetic studies of the Roma have been conducted for over 70 years, with thousands of individuals sampled across Europe. During the years of the Third Reich, Gypsies, together with Jews, attracted the special attention of German geneticists [9]. A grant proposal signed by Nobel prize winner Ferdinand Sauerbruch and funded by the Deutsche Forschungsgemeinschaft designed the "genetic and medical research" at the death camp at Auschwitz [9]. The Race Hygiene and Population Biology Research Centre, established in 1936, organised thorough records of Jewish and Romani pedigrees and provided "the scientific basis" for the "final solution", the annihilation of millions of Jews and Roma in the concentration camps of Nazi-occupied Europe.
Post-war genetic research has been preoccupied with the Indian origins of the Roma [10-16], pursuing the "Indian connection" even in studies meant to focus on severe genetic disorders [17]. Most studies have remained in the realm of scientific exploration, away from the health needs of the Roma. Many publications display judgemental and paternalistic attitudes that would be considered unacceptable if used with regard to other populations.
This historical "track record", the persisting practices of discrimination and marginalisation [3-6], and the fact that, unlike the Jews, the Finns and the French Canadians, the Roma are still the "object" of investigations conducted by outsiders, are all likely to impact on the attitudes of the Roma towards genetics. Building up the trust and collaboration necessary for both public health programs and research should become a goal of the health care systems of Europe.

Population Genetics
Population genetic studies have used mostly "classical" polymorphisms to investigate Romani individuals from different European countries and address three main questions: (i) similarity between Roma and Indians; (ii) relatedness to European populations; (iii) affinities between Romani populations from different countries [10-24]. Single locus comparisons have resulted in controversy, with some pointing to close genetic affinity between Roma and Indians, and others indicating that the Roma are indistinguishable from Europeans. Heterogeneity between countries has become apparent and has led to the conclusion that the European Roma are composed of two different populations, characterised respectively by a high and a low frequency of blood group B [23], or defined as East and West European Roma, with the former closely related to Indian populations [16]. Heterogeneity of Romani populations within the same country has been suggested by the very small number of studies addressing this issue [19,21,25,26].
In an attempt to summarise the existing data, we have conducted a multilocus re-analysis of several marker systems using comparable studies of the Roma in different European countries [11,22-24,26,27], Europeans [28-33] and north Indian populations [34-36]. The comparisons (Figure 2, Table 1) provide a general indication that most of the Roma are genetically closer to Indians than to European populations, hardly surprising in view of the linguistic theories on the Indian origins of the Roma proposed two centuries ago [1]. More importantly, the analysis highlights the internal diversity of the Roma, who appear to be genetically far more heterogeneous than autochthonous European populations.
Figure 2: Multilocus comparison between Romani populations from different European countries, autochthonous European populations and populations from north India. The polymorphic systems included in the analysis comprised A1A2BO, MN, haptoglobin and Rh (CDE), with a total of 11 independent alleles. Information on these markers was available for the Roma in Slovakia (n = 350) [26], Hungary (n = 507) [11], England (n = 109) [23], Slovenia (n = 350) [27], Sweden (n = 115) [24] and Wales (n = 84) [22], for non-Roma Europeans (n = 5169) and for two north Indian populations, Rajput (n = 175) [34,35] and Punjabi (n = 140) [35,36]. Genetic distances between pairs of populations were computed by means of Reynolds' coancestry coefficient [84] and displayed as a neighbour-joining tree [85]. The robustness of the branches in the tree was assessed with a bootstrap approach [86]. The analysis was conducted using the PHYLIP 3.57c package [87].
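As an illustration of the kind of pairwise distance named in the Figure 2 legend, the following is a minimal sketch of a commonly quoted large-sample form of the Reynolds coancestry distance, computed from allele frequencies at several loci. It is an assumption that this simplified form (which ignores sample-size corrections) matches what the analysis package computed; the function name and the example frequencies are hypothetical and are not data from the studies cited above.

```python
from typing import Sequence


def reynolds_distance(pop_a: Sequence[Sequence[float]],
                      pop_b: Sequence[Sequence[float]]) -> float:
    """Simplified Reynolds coancestry distance between two populations.

    Each population is a list of loci; each locus is a list of allele
    frequencies that sum to 1. Sample-size corrections are ignored
    (an assumption of this sketch).
    """
    if len(pop_a) != len(pop_b):
        raise ValueError("populations must be typed at the same loci")

    numerator = 0.0    # summed squared frequency differences over all loci
    denominator = 0.0  # summed (1 - sum of frequency products) over all loci
    for freqs_a, freqs_b in zip(pop_a, pop_b):
        if len(freqs_a) != len(freqs_b):
            raise ValueError("each locus must list the same alleles")
        numerator += sum((a - b) ** 2 for a, b in zip(freqs_a, freqs_b))
        denominator += 1.0 - sum(a * b for a, b in zip(freqs_a, freqs_b))

    if numerator == 0.0:
        return 0.0  # identical frequencies at every locus
    return numerator / (2.0 * denominator)


# Hypothetical two-locus example (invented frequencies, illustration only).
population_1 = [[0.30, 0.10, 0.60], [0.55, 0.45]]
population_2 = [[0.20, 0.25, 0.55], [0.50, 0.50]]
distance = reynolds_distance(population_1, population_2)
```

A matrix of such pairwise values is the usual input to a neighbour-joining tree; resampling loci with replacement and rebuilding the tree gives the bootstrap support mentioned in the legend.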

Diseases and mutations identified to date
As a result of traditionally low socio-economic status and limited access of the Roma to health care, their unique genetic heritage has long escaped the attention of European medicine and is now being randomly "discovered".
To date, nine Mendelian disorders caused by private "Romani" mutations have been described (Table 2).
In view of the lack of systematic studies, the list cannot be comprehensive and is likely to represent the biases and interests of individual medical researchers working in this field. Data in the literature, particularly from the Spanish Collaborative Study of Congenital Malformations [47], point to the existence of a number of additional rare single gene disorders, whose molecular basis is still to be identified. These include hereditary idiopathic torsion dystonia (ITD) [48], epidermolysis bullosa [49], albinism [49], and some rare autosomal recessive malformation syndromes, such as Bowen-Conradi, Jarcho-Levin, Meckel, Smith-Lemli-Opitz, and Fraser [47,49].
A third group of Mendelian disorders includes common conditions, where the mutation prevalent in the surrounding or in global populations is likely to have been introduced by admixture, for example cystic fibrosis and delF508 [50], phenylketonuria and the R252W and IVS10nt546 mutations [51,52], and medium chain acyl-coenzyme A dehydrogenase (MCAD) deficiency and G985 [53].
At the same time, many studies emphasise the small size of the conserved region of homozygosity and the diversity of disease haplotypes observed even within single affected kindreds [37,40,42,44,54] (Figure 3). Haplotype diversification, generated by numerous historical recombinations and marker mutations [39] as a consequence of the old age [37,43] and high frequencies of disease-causing mutations, has important implications for gene mapping studies: (i) Homozygosity mapping, relying on consanguinity in the affected families, is not applicable in studies using the standard genetic maps and can result in spuriously negative results [54]. (ii) The diversity of historical recombinations becomes a powerful tool in the subsequent refined genetic mapping and positional cloning of disease genes [39,55].

Epidemiological data
Reported gene frequencies are high for both private and "imported" mutations, and often exceed by an order of magnitude those for global populations. For example, galactokinase deficiency, whose worldwide frequency is 1:150,000 to 1:1,000,000 [56,57], affects 1 in 5,000 Romani children [44]; autosomal dominant polycystic kidney disease (ADPKD) has a prevalence of 1:1000 worldwide [58] and 1:40 among the Roma in some parts of Hungary [17]; primary congenital glaucoma ranges between 1:5,000 and 1:22,000 worldwide [59,60] and about 1:400 among the Roma in Central Slovakia [61,62].
Table 1 (note): The populations and references included in the comparison are as indicated in Figure 2. Genetic variance was apportioned between and within populations and between and within groups of populations by means of the Analysis of Molecular Variance [88], as implemented in the Arlequin 1.1 package [89].
Carrier rates for a number of disorders have been estimated to be in the 5 to 20% range (Table 3).
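Since most carrier-rate estimates of this kind are derived from prevalence figures rather than from direct mutation testing, the conversion can be sketched as follows. This is a minimal illustration that assumes autosomal recessive inheritance and Hardy-Weinberg proportions (random mating, no stratification), assumptions that the population structure described in this review may well violate; the function names are hypothetical.

```python
import math


def carrier_rate_from_incidence(incidence: float) -> float:
    """Heterozygote (carrier) frequency 2pq for a recessive disorder.

    Assumes Hardy-Weinberg proportions, so that the incidence of
    affected births equals q squared.
    """
    if not 0.0 < incidence < 1.0:
        raise ValueError("incidence must be a proportion between 0 and 1")
    q = math.sqrt(incidence)  # frequency of the disease allele
    return 2.0 * q * (1.0 - q)


def incidence_from_carrier_rate(carrier_rate: float) -> float:
    """Inverse conversion: expected incidence q squared for a carrier rate.

    Solves 2q(1 - q) = carrier_rate for the smaller root q.
    """
    if not 0.0 < carrier_rate <= 0.5:
        raise ValueError("carrier rate must be in (0, 0.5]")
    q = (1.0 - math.sqrt(1.0 - 2.0 * carrier_rate)) / 2.0
    return q * q


# Incidences quoted in the text, used here only as illustrative inputs.
rate_1_in_5000 = carrier_rate_from_incidence(1 / 5000)  # about 0.028
rate_1_in_400 = carrier_rate_from_incidence(1 / 400)    # about 0.095
```

Under these assumptions an incidence of 1 in 5,000 corresponds to a carrier rate of roughly 2.8%, and 1 in 400 to roughly 9.5%. In a stratified population the same overall incidence can arise from much higher carrier rates concentrated in a few groups, which is why directly measured rates are preferable where available.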
Although incomplete, the available data already lead to some practical conclusions: (i) What may appear to be a novel disorder confined to a single family, could in fact be an indication of a common problem affecting large numbers of individuals. Research should therefore extend beyond case descriptions and aim at more comprehensive epidemiological information. (ii) The emphasis on consanguinity in affected families displaces the focus from an obvious need for public health intervention to patterns of personal behaviour. In the face of the reported high gene frequencies, consanguinity is no more relevant than it would be as a cause of beta-thalassemia in Mediterranean countries. (iii) High gene frequencies may result in the parallel segregation of phenotypically similar but genetically distinct disorders within the same kindred [40,42]. This clustering should be borne in mind in diagnostic studies, where assumptions based on pedigree structure should be avoided and independent clinical and genetic assessment should be conducted in all cases.
Figure 3 (caption, beginning missing): ... [40], the conserved region of homozygosity (red bars) was found to span only <500 kb. Courtesy of Dr. Tamara Rogers.
Table 2 (footnote, beginning missing): ... has been identified in Romani families from Bulgaria. It has not been confirmed in the Hungarian ADPKD families, but appears probable because of a reported common migration history of all affected groups.
Table 3 (footnotes): Most estimates are based on prevalence figures. *Carrier rates determined through direct mutation detection are indicated in red. **The LGMD2C carrier rates for the general Romani population of Bulgaria are probably an overestimate since the screening was conducted in a geographical region where the high risk groups are clustered. ***The screening for the G985 mutation in Spain, performed in Gypsy groups residing in different parts of the country, revealed substantial differences between groups.
Research into Mendelian disorders has provided ample evidence of genetic stratification, with mutations occurring at high frequencies in some Romani communities and altogether absent in others, located in close geographic proximity. In some cases, such as Glanzmann thrombasthenia [63,64], LGMD2C [65,66], galactokinase deficiency [44], CCFDN [42] and HMSN-R [40], the identity of the affected groups has been specified. Other studies, for example of congenital glaucoma [61,62] and ADPKD [17], provide only an indication of the area of residence of the affected communities. In the few cases where gene frequencies can be compared between high-risk groups and the general Romani population of the same country, substantial differences become apparent (Table 3).
At the same time, founder mutations have spread with the Romani diaspora and are shared by affected individuals throughout Europe (Figure 4). International collaboration has already made a substantial contribution to the study of disease phenotypes in large samples of genetically homogeneous patients [46,67-71] as well as to the refined mapping of disease genes [55]. Such collaboration will be essential for future research into new disorders, founder mutations and factors modifying disease severity, and for understanding the epidemiology of genetic diseases of the Roma. The first steps towards European collaboration have been made, with the founding of the Gypsy Genetic Heritage Consortium in 1997, and the forthcoming ENMC workshop on neuromuscular disorders in Gypsies.

Discussion
The pattern emerging from genetic research is that of a conglomerate of founder populations which extend across Europe but at the same time differ within individual countries, and whose demographic history, internal structure and relationships are poorly understood. An insight is provided by the social sciences.
The 18th century theory on the Indian origins of the Roma [reviewed in 1] is based on the similarities between Romani and languages spoken in the Indian subcontinent and is supported by genetic evidence. However, the lack of close relationship to any specific language or dialect has left unresolved the question of the original ethnic composition of the proto-Roma, with both single [72,73] and diverse [74] origins proposed by linguists. Translated into the language of genetics, this is a relevant question related to the homogeneity or diversity of the founding population.
Inferred from linguistic influences retained in all Romani dialects, the major migration routes pass through Persia, Armenia, Greece and the Slavic-speaking parts of the Balkans [1]. The first documents pointing to the arrival of the Roma in the Balkans date from the 11th-12th century [1,75]. By the 15th century, mention of their presence can be found in historical records from all parts of Europe [1,2].
Historical demographic data are limited; however, tax registries and census data give an approximate idea of population size and rate of demographic growth through the centuries (Table 4). A small size of the original population is suggested by the fact that although most of the migrants arriving in Europe in the 11th-12th century remained within the limits of the Ottoman Empire [1,75], the overall number of Roma in its Balkan provinces in the 15th century was estimated at only 17,000.
During its subsequent history in Europe, this founder population split into numerous socially divided and geographically dispersed endogamous groups, with historical records from different parts of the continent consistently describing the travelling Gypsies as "a group of 30 to 100 people led by an elder" [1,2]. These splits, a possible compound product of the ancestral tradition of the jatis of India and the new social pressures in Europe (e.g. Gypsy slavery in Romania [76] and repressive legislation banning Gypsies from most western European countries [1,2]), can be regarded as secondary bottlenecks, reducing further the number of unrelated founders in each group. The historical formation of the present-day 8 million Romani population of Europe is therefore the product of the complex initial migrations of numerous small groups, superimposed on which are two large waves of recent migrations from the Balkans into Western Europe: in the 19th to early 20th century, after the abolition of slavery in Romania [1,2,76], and over the last decade, after the political changes in Eastern Europe [7,8].
The Group is still the primary building block of the social organisation of the Roma [1,2]. Group identity and the ensuing divisions and rules of endogamy are based on tradition, customs and organs of self-rule, language and dialects, trades, history of migrations, and religion. Individual groups can be classified into major metagroups [1,2,
Figure 4 (caption, beginning missing):
• Congenital myasthenia in Serbia, Macedonia, Greece, Bohemia and Germany [46].
• Hereditary motor and sensory neuropathy-Russe in Bulgaria [40], Romania and Spain [our unpublished findings].
The existing data are the product of ad-hoc collaborative studies and are not likely to represent the true spread of Romani founder mutations. The distribution of LGMD2C in Western Europe and in Bulgaria leads to the prediction that the disorder occurs and awaits detection along the entire European migration route, spanning the Balkans and Central Europe. Filling the gaps in the map will be particularly useful in the case of treatable disorders which are strong candidates for newborn screening, such as galactokinase deficiency and congenital myasthenia.
Linguistics, history and cultural anthropology suggest two major, equally plausible historical scenarios that could lead to a "jigsaw puzzle" of founder populations: (i) a genetically substructured ancestral population, where the old social traditions of strict endogamy have been retained and subsequent splits of the comprising groups have enhanced the original genetic differences; (ii) a small homogeneous ancestral population spawning numerous subgroups where strong drift effects have resulted in substantial genetic divergence. Genetic research has indeed faced the "jigsaw puzzle" and has thus far been unable to resolve it. The genetic data provide evidence of population stratification; however, a closer examination is precluded by the random cross-section sampling design of most population genetic studies, where the traditional social organisation and self-identity of the Roma have been ignored and subjects classified on the basis of the political boundaries of Europe. The relationship between social organisation and genetic structure does not appear to be straightforward and is still to be addressed in population genetic research based on the long-standing identity of Gypsy groups. The issue is of relevance to public health policies and the targeted prevention of Mendelian disorders, as well as to future studies of genetically complex disorders.
The existing information on single gene disorders is certainly not exclusive to the Roma. The phenomenon of clustering of rare disorders and private founder mutations has been studied in detail in well characterised founder populations, such as the Jews [77,78], Finns [79,80] and French Canadians [81]. Unlike the above examples, however, genetic studies of the Roma have failed to take the immediate benefits of research back to the individuals and families that have been the object of research. Yet by now it should be obvious that genetics has an important role to play in improving the quality of health care for the Roma. Treatable disorders such as galactokinase and MCAD deficiency, with an expected incidence of affected births in the range of 1:1,000 to 1:5,000, meet the standard criteria for newborn screening more than does phenylketonuria, with its average incidence of 1:10,000. Adding the simple, sensitive and specific mutation tests to existing newborn screening programs would be technically simple and highly efficient due to the homogeneous genetic basis of the disorders.
Carrier testing should be made available to Romani communities at high risk for severe untreatable disorders. Information on the identity of affected Romani populations is important for public health intervention since it would allow the planning and facilitate the implementation of targeted prevention programs, especially in the Eastern European countries where economic resources are limited. The importance of the educational component of such programs has already been demonstrated by the highly successful prevention of Tay-Sachs disease among Ashkenazi Jews [82] and the failure of sickle-cell screening among Afro-Americans [83]. This component would be particularly important for a population like the Roma, which has been subject to racism and persecution throughout its co-existence with European societies.
The attention of geneticists is increasingly attracted by genetically isolated populations in the third world. In terms of living standards and the major health indicators, the Roma are much closer to the developing world than to their European neighbours [3]. This forgotten people of Europe can be regarded as a test case for the capacity of genetics to provide better health.