The human acetylation polymorphism is one of the most intensively studied pharmacogenetic traits that underlie interindividual and interethnic differences in response to xenobiotics. Genetically determined differences in N-acetylation capacity have proved to be important determinants of both the effectiveness of therapeutic response and the development of adverse drug reactions and toxicity during drug treatment [1, 2]. Some of the drugs metabolized by acetylation are crucial in the treatment of diseases of worldwide concern, such as tuberculosis, AIDS-related complex diseases, and hypertension. Moreover, numerous association studies have linked the acetylation phenotype to susceptibility to a variety of complex human diseases, the most consistent findings being those regarding urinary bladder cancer, asthma and other allergic disorders [3–6].
Single nucleotide polymorphisms (SNPs) in the coding region of the NAT2 gene determine the acetylation phenotype. Thanks to the well-established genotype-phenotype correlation at this locus, the individual acetylation status can be reliably predicted from the haplotype combination at NAT2, according to the acknowledged classification of NAT2 haplotypes into either low-activity or fully functional alleles (see the consensus NAT2 gene nomenclature website: http://www.louisville.edu/medschool/pharmacology/NAT.html). Slow, intermediate and rapid acetylators are defined as carriers of zero, one or two functional haplotypes, respectively. Many studies did not distinguish between rapid and intermediate acetylators, categorizing both types of subjects as rapid acetylators.
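The classification rule just described (zero, one or two functional haplotypes) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular study: the function name, the `merge_rapid` option and the one-element set of functional alleles are assumptions made for the example, and the authoritative allele classification remains the consensus nomenclature cited above.

```python
# Illustrative allele classification (an assumption for this sketch only).
# The consensus NAT2 nomenclature is the authoritative source.
FUNCTIONAL_ALLELES = frozenset({"NAT2*4"})


def acetylation_phenotype(hap_a, hap_b, functional=FUNCTIONAL_ALLELES,
                          merge_rapid=False):
    """Infer the acetylation phenotype from a phased pair of haplotypes.

    Carriers of zero, one or two functional haplotypes are classified as
    slow, intermediate or rapid acetylators, respectively. With
    merge_rapid=True, intermediate acetylators are reported as rapid,
    mirroring studies that do not separate the two classes.
    """
    n_functional = (hap_a in functional) + (hap_b in functional)
    phenotype = ("slow", "intermediate", "rapid")[n_functional]
    if merge_rapid and phenotype == "intermediate":
        return "rapid"
    return phenotype


if __name__ == "__main__":
    print(acetylation_phenotype("NAT2*5B", "NAT2*6A"))
    print(acetylation_phenotype("NAT2*4", "NAT2*6A"))
    print(acetylation_phenotype("NAT2*4", "NAT2*6A", merge_rapid=True))
```

Note that the rule takes a phased haplotype pair as input, which is why the linkage phase matters for the final classification.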
Determining the linkage phase of SNP alleles along the coding sequence of NAT2 is crucial to unequivocally assign an individual's multi-site NAT2 genotype to a particular combination of two multilocus haplotypes and to correctly infer that individual's acetylation status. Ambiguous NAT2 genotyping data that may lead to patient misclassification indeed appear to be common in human populations [7]. Special analytical techniques have been designed to unambiguously determine the allocation of SNP alleles to either DNA strand. However, molecular haplotyping methods are labour-intensive and expensive, and are not suitable for routine clinical applications. A cheap and straightforward alternative for haplotype reconstruction is the use of computational algorithms. Several studies have demonstrated the high effectiveness of haplotype reconstruction algorithms in the particular case of the NAT2 gene by comparing the computationally inferred haplotypes with the real ones resolved by molecular haplotyping techniques [7–9]. Yet such in silico approaches may be inconvenient for clinicians unfamiliar with computational haplotype analysis. Moreover, algorithmic techniques are statistical and require the analysis of a population rather than of one or a few individuals.
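To make the phase ambiguity concrete, the following Python sketch enumerates every pair of haplotypes compatible with an unphased multi-site genotype. It is an illustration only: the function name and the example alleles are hypothetical, and it does not implement any of the statistical reconstruction algorithms mentioned above, which additionally weigh the candidate pairs by population haplotype frequencies. A genotype that is heterozygous at k sites (k ≥ 1) is compatible with 2^(k-1) distinct haplotype pairs, so ambiguity arises as soon as two sites are heterozygous.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate the haplotype pairs compatible with an unphased genotype.

    genotype: sequence of (allele, allele) tuples, one per SNP site,
              with no information on which strand carries which allele.
    Returns a set of frozensets, each holding the two haplotypes of one
    possible phase configuration (a single haplotype if both are equal).
    """
    pairs = set()
    # At each site, choose which observed allele goes on the first strand;
    # the other allele necessarily goes on the second strand.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = tuple(site[c] for site, c in zip(genotype, choice))
        hap2 = tuple(site[1 - c] for site, c in zip(genotype, choice))
        pairs.add(frozenset((hap1, hap2)))  # unordered, so duplicates merge
    return pairs


if __name__ == "__main__":
    # Hypothetical genotype, heterozygous at two sites.
    genotype = [("T", "C"), ("G", "A")]
    for pair in compatible_haplotype_pairs(genotype):
        print(sorted(pair))
```

For the two-site example, one configuration places both variant alleles on the same haplotype and the other splits them across the two haplotypes; the two configurations may correspond to different numbers of functional haplotypes and hence to different acetylation phenotypes.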
To circumvent these limitations, Kuznetsov et al. [10] developed a web server, NAT2PRED, that allows fast determination of NAT2 acetylation phenotypes (slow, intermediate, and rapid) from unphased genotype data at six polymorphic positions in NAT2. These six SNPs are the most commonly reported ones in surveys of NAT2 sequence variation in human populations [11]. NAT2PRED removes the need to reconstruct haplotypes by implementing a supervised pattern recognition method that was trained on the NAT2 genotyping data of 1,377 subjects of known acetylation status. The tool is publicly available at http://nat2pred.rit.albany.edu and has a simple, intuitive user interface. It showed a nearly perfect classification accuracy of 99.9% in a sample mostly composed of Caucasians [10]. However, it remains unclear to what extent this tool can be applied to individuals of any ethnicity, since it was developed on a dataset in which 94% of subjects were Caucasian. The accuracy of NAT2PRED needs to be assessed before its application can be advocated on a large scale.
The objective of the present study was to empirically evaluate the performance of NAT2PRED in a wide collection of population samples from around the world. To this end, we performed an extensive survey of the literature to identify those samples that were adequately genotyped for all the common SNPs in NAT2. In total, the collected data comprised 8,489 individuals from 56 human populations representing major geographic regions: sub-Saharan Africa (12 samples), Europe and North Africa (23), Central/South Asia (5), East Asia (13), and America (3). In each sample, NAT2 haplotypes were reconstructed using either molecular or computational methods. These data therefore provided an opportunity to compare, for each subject, the acetylation status inferred by NAT2PRED from the unphased NAT2 genotype data with the "true" one predicted from the pair of haplotypes reconstructed through molecular or computational haplotyping. NAT2PRED correctly identified slow acetylators with a sensitivity above 99% for all populations outside sub-Saharan Africa.
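The subject-by-subject comparison described above amounts to computing agreement statistics between two label vectors. The Python sketch below shows one way to obtain overall accuracy and the sensitivity for the slow class; it is an assumption-laden illustration (the function name and the toy labels are invented for the example) and not the evaluation code used in this study.

```python
def evaluate_predictions(predicted, reference, positive="slow"):
    """Compare predicted phenotypes with haplotype-based reference ones.

    Returns (accuracy, sensitivity), where sensitivity is the fraction of
    reference `positive` subjects that were also predicted as `positive`.
    Sensitivity is NaN when the reference contains no positive subject.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one subject is required")

    n_correct = sum(p == r for p, r in zip(predicted, reference))
    n_positive = sum(r == positive for r in reference)
    n_true_positive = sum(
        p == positive and r == positive for p, r in zip(predicted, reference)
    )

    accuracy = n_correct / len(reference)
    sensitivity = n_true_positive / n_positive if n_positive else float("nan")
    return accuracy, sensitivity


if __name__ == "__main__":
    # Toy labels for four hypothetical subjects (not real data).
    predicted = ["slow", "slow", "rapid", "intermediate"]
    reference = ["slow", "slow", "slow", "intermediate"]
    accuracy, sensitivity = evaluate_predictions(predicted, reference)
    print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}")
```

Computing these statistics separately for each population sample, rather than on the pooled data, is what allows differences between geographic regions to be detected.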