
Characterization of APOBEC3 variation in a population of HIV-1 infected individuals in northern South Africa



The apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like 3 (APOBEC3) genes A3D, A3F, A3G and A3H have all been implicated in the restriction of human immunodeficiency virus type 1 (HIV-1) replication. Polymorphisms in these genes are likely to impact viral replication and fitness, contributing to viral diversity. Currently, only a few studies indicate that polymorphisms in the A3 genes may be correlated with infection risk and disease progression.


To characterize polymorphisms in the coding regions of these APOBEC3 genes in an HIV-1 infected population from the Limpopo Province of South Africa, APOBEC3 gene fragments were amplified from genomic DNA of 192 HIV-1 infected subjects and sequenced on an Illumina MiSeq platform. SNPs were confirmed and compared to SNPs in other populations reported in the 1000 Genomes Phase III and HapMap databases, as well as in the ExAC exome database. Hardy-Weinberg Equilibrium was calculated and haplotypes were inferred using the LDlink 3.0 web tool. Linkage Disequilibrium (LD) for these SNPs was calculated in the total 1000 Genomes and AFR populations using the same tool.


Known variants compared to the GRCh37 consensus genome sequence were detected at relatively high frequencies (> 5%) in all of the APOBEC3 genes. A3H showed the most variation, with several of the variants present in both alleles in almost all of the patients. Several minor allele variants (< 5%) were also detected in A3D, A3F and A3G. In addition, novel R6K, L221R and T238I variants in A3D and I117I in A3F were observed. Four, five, four, and three haplotypes were identified for A3D, A3F, A3G, and A3H respectively.


The study showed significant polymorphisms in the APOBEC3D, 3F, 3G and 3H genes in our South African HIV-1 infected cohort. For all of these genes, the polymorphisms were generally present at higher frequencies than reported in other 1000 Genomes populations and in the ExAC exome consortium database.



The genes for the apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like protein gene family (APOBEC3), a family of seven members (APOBEC3 A, B, C, D, F, G and H), are situated on human chromosome 22. The proteins encoded by these genes are cytidine deaminases that have been classified as restriction factors because of their role as innate immunity factors. They provide host cell defense against a diverse set of retroviruses, endogenous retroelements and DNA viruses, including human immunodeficiency virus (HIV) [1,2,3]. APOBEC proteins restrict HIV through deamination of cytosines in viral cDNA during reverse transcription, causing G-to-A hypermutations in the viral DNA product, which results in degradation and viral inhibition [4]. The Vif protein of HIV has evolved to counteract this restriction by binding to APOBEC proteins and targeting them for proteasomal degradation.

One of the most studied APOBEC proteins and the first that was discovered to restrict HIV-1 replication is APOBEC3G. In the absence of the HIV-1 Vif protein, APOBEC3G is efficiently packaged into viral particles, causing restriction during reverse transcription. The gene was originally identified as an HIV restriction factor because its expression converted a T-cell line that could support the replication of an HIV lacking vif into one that had a non-permissive phenotype [1].

Three other members of the APOBEC family, APOBEC3D (A3D), APOBEC3F (A3F), and APOBEC3H (A3H) can also be packaged into HIV particles and inhibit viral replication, when stably expressed in human T-cell lines [5]. Endogenous A3D and A3F combine to generate the 5′-GA-to-AA mutation pattern observed in vif-negative HIV grown in the non-permissive T-cell line CEM2n [6, 7]. Of the seven different human haplotypes of APOBEC3H, only hapII, hapV and hapVII are stable at the protein level and capable of HIV restriction [8,9,10,11,12].

Several APOBEC3 (A3D, A3F, A3G and A3H) genes are known to possess common polymorphisms that render them defective with reduced antiviral activity and increased sensitivity to HIV-1 Vif [5, 13,14,15,16]. The genetic associations between natural polymorphisms in APOBEC genes and the ability of the resulting proteins to restrict HIV and the contribution of polymorphisms to overall HIV diversity and disease progression have not received widespread attention. Polymorphisms in APOBEC genes could also play a significant role in HIV-1 evolution and diversity, especially in African populations, where the prevalence of HIV-1 is still increasing.

African populations are characterized by a high level of genetic diversity owing to a large number of variable genes and alleles [17,18,19,20]. Patterns of genetic variation in the African population are influenced by a demographic history that includes changes in population size, admixture and locus-specific forces such as natural selection, recombination and mutation. Genetic studies of structural variation of genes across ethnically diverse populations have been conducted [21]. Many population genetic studies of African populations are based on analysis of genetic markers genotyped in a small number of people in selected populations, in projects such as the 1000 Genomes Project (2010) and the International Haplotype Map (HapMap) Project [22,23,24]. Although these projects are valuable in their description of the overall human genetic diversity, they are limited in their coverage of African populations [25]. Thus, it is important to continue to add information about African populations that are underrepresented in human genomic studies, such as the South African population.

South Africa embodies a rich collection of ethnic backgrounds in addition to the more recent Caucasian immigrants. The major ethnic groups include the Bapedi, Basotho, Ndebele, Swati, Tsonga, Tswana, Xhosa, Venda and Zulu. The genetic substructure of these populations has been assessed by studying the Y-chromosome and autosomal DNA, resulting in a clustering into three specific groups: Tswana/Sotho, Nguni and Venda [26, 27]. It is of clear interest to characterize the APOBEC3 gene polymorphisms existing in these various populations, since they may play a crucial role in the restriction and evolution of HIV-1.

In the current study, we characterized the genetic variability within the coding regions of A3D, A3F, A3G and A3H to document the level of diversity in samples obtained from HIV-1 positive individuals attending three HIV clinics in the Limpopo Province of Northern South Africa.


Study population and DNA extraction

The study population was comprised of a total of 192 HIV-1 positive individuals from several ethnic groups (Venda, Bapedi, Tswana, Tsonga and Swati) who presented for routine care in clinics and hospitals in the Waterberg and Vhembe districts of the Limpopo province in Northern South Africa. There were 116 females and 76 males with an age range from 4 to 98 years and their viral load and CD4+ cell count ranged from < 20 to 623,250 copies/ml and 5 to 1353 cells/μl, respectively (Additional file 1: Table S1). These individuals were recruited from July 2013 to December 2015. DNA was extracted from peripheral blood mononuclear cells (PBMC), using the QIAamp DNA blood mini kit (Qiagen) according to the manufacturer’s instructions.

Primer design

Primers to amplify the four APOBEC3 genes (A3D, A3F, A3G, and A3H) were designed using Geneious® 8.1.5 software (Biomatters, Inc.). A nested PCR strategy was used to amplify each APOBEC gene. The outer primer set was designed to flank and amplify a long gene fragment in the 1st polymerase chain reaction (PCR), while two sets of primers were designed to amplify two fragments of each gene in a nested PCR using the 1st round PCR product as the template (Table 1). The primer sets were chosen using the information for the A3D, A3F, A3G and A3H genes in the Ensembl Genome Browser (ENSG00000243811, ENSG00000128394, ENSG00000239713 and ENSG00000100298).

Table 1 List of APOBEC3 primers designed; primer name, sequence and product size are indicated

Polymerase chain reaction (PCR) to amplify A3D, A3F, A3G and A3H genes

The Takara (LA) PCR Kit Ver. 2.1 for long DNA fragment amplification (Clontech) was used to amplify the complete 12.16 kb A3D, 13.31 kb A3F, 10.74 kb A3G, and 6.8 kb A3H genes in a 1st round PCR reaction using genomic patient DNA. The 1st round primary PCR products were then used as templates in “nested” PCR reactions to generate shorter PCR products. All the PCR reactions contained: 1X PCR Mg2+ plus buffer, 400 μM dNTPs, 0.2 μM of each primer (Table 1) and 1.25 units of LA Taq high fidelity polymerase in a total volume of 20 μl. The following cycling conditions were used for all PCR reactions: initial denaturation at 94 °C for 1 min, 30 cycles of denaturation at 98 °C for 10 s, annealing at temperatures varying from 53 °C to 68 °C for 15 min (depending on primers) and extension at 72 °C for 10 min. Final amplicons were purified using AMPure XP beads (Beckman Coulter) and quantified using a Qubit 3.0 Fluorometer with the dsDNA HS kit (Invitrogen). Equimolar concentrations of the two shorter amplicons generated for each gene were pooled and normalized to 1 ng using 10 mM Tris elution buffer.
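The equimolar pooling step can be illustrated with a short calculation. The sketch below is not the authors' protocol: it assumes the common approximation of 660 g/mol per base pair of double-stranded DNA, and the fragment names, lengths and Qubit readings are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (standard approximation, an assumption here)

def ng_per_ul_to_nM(conc_ng_per_ul: float, length_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/ul) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)

def equimolar_volumes(amplicons: dict, fmol_each: float) -> dict:
    """Return the volume (ul) of each amplicon that contains fmol_each femtomoles.

    amplicons maps a name to (concentration in ng/ul, length in bp).
    1 nM equals 1 fmol/ul, so volume = fmol / nM.
    """
    return {
        name: fmol_each / ng_per_ul_to_nM(conc, bp)
        for name, (conc, bp) in amplicons.items()
    }

# Hypothetical nested-PCR fragments of one gene: same mass concentration, different lengths
amplicons = {"fragment_1": (10.0, 5000), "fragment_2": (10.0, 2500)}
volumes = equimolar_volumes(amplicons, fmol_each=10.0)
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} ul")
```

At equal mass concentration the longer fragment has fewer molecules per microlitre, so proportionally more volume of it is needed to pool equal numbers of molecules.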

Fragmentation, tagmentation and addition of Illumina indices

Purified Tn5 transposase enzyme was used to fragment about 1-10 ng of DNA amplicons to sizes ranging from 35 bp to 700 bp and to tag the fragments with sequencing adaptors, in a manner similar to the protocol used in the Illumina Nextera Kit. The reaction mixture contained: 4 μl tagmentation buffer (5X TAPS-DMF), 1-5 μl Tn5 transposase (1X-5X) and 1-10 ng DNA, with nuclease-free water added to a final volume of 20 μl. The reaction was performed at 55 °C for 5 min. The Tn5 transposase enzyme was produced and characterized in the University of Virginia laboratory, using published protocols [28]. Following this step, unique Illumina dual-index barcodes (index 1 (i7) and index 2 (i5)) were added to each sample in a short PCR of 12 cycles, followed by a second AMPure XP bead purification, generating 300-500 bp indexed fragments for sequencing. Using the full complement of Nextera XT indices, up to 96 individual samples were pooled for each run.

Library normalization, pooling and sequencing

After purification, libraries were size-verified using a 2100 Bioanalyzer with a High Sensitivity DNA assay kit (Agilent Genomics), quantified and normalized to a concentration of 4 nM each. The normalized libraries were then pooled and denatured into single strands. To ensure good cluster generation, 1.8 pM of the pooled library, spiked with 25–30% PhiX, was then loaded into the sequencing cartridge. Biological sample sheets were created in BaseSpace by labeling each sample with the appropriate index and setting up a sequencing run for the MiSeq. Each run generated approximately 25 million reads/sequences per sample.

Demultiplexing and sequence quality control evaluation

Sequences were demultiplexed automatically on the MiSeq as part of the data processing and read-pairing steps. FASTQ files representing the two paired-end reads were generated for each sample. Sequence quality was validated using the quality control tools for sequence manipulation on the Galaxy NGS platform, which include the FastQC program.

Sequence filtering, trimming, mapping and variant calling

Sequencing data quality, including the duplication rate, percent GC, and read quality, was assessed with quality control tools for high-throughput sequencing data [29, 30]. After filtering low coverage samples, reads were aligned against the human genome with BWA-MEM [31]. Alignments were sorted, marked for duplicates, and indexed using SAMtools [32]. Variants were called using FreeBayes, a haplotype-based tool to detect variants using short-read sequencing data [33]. Variant calls were normalized and decomposed with vt, a unified representation of genetic variants, and functionally annotated using SnpEff, a program for annotating and predicting the effects of single nucleotide polymorphisms [34]. Comprehensive annotation and prioritization was performed using the GEMINI framework for Integrative Exploration of Genetic Variation and Genome Annotations [35]. All further data manipulation and analysis was performed using R, a Language and Environment for Statistical Computing [36].
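The core of such a workflow (align, sort, index, call) can be sketched as a list of command lines. This is a minimal illustration only: the exact options used in the study are not given in the text, all file names are hypothetical, the reference is assumed to be already indexed with `bwa index`, and the duplicate-marking, vt, SnpEff and GEMINI steps are deliberately left out.

```python
def build_pipeline(reference: str, fastq_r1: str, fastq_r2: str, sample: str):
    """Return (argv, stdout_file) pairs for a minimal align, sort, index and call run.

    stdout_file names the file that should capture the command's standard output,
    or is None when the command writes its own output file.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf"
    return [
        (["bwa", "mem", reference, fastq_r1, fastq_r2], sam),  # alignments go to stdout
        (["samtools", "sort", "-o", bam, sam], None),          # coordinate-sorted BAM
        (["samtools", "index", bam], None),                    # BAM index for random access
        (["freebayes", "-f", reference, bam], vcf),            # VCF goes to stdout
    ]

# Print the commands for one hypothetical sample instead of executing them
steps = build_pipeline("GRCh37.fa", "s1_R1.fastq", "s1_R2.fastq", "s1")
for argv, out in steps:
    print(" ".join(argv) + (f" > {out}" if out else ""))
```

Building the commands as data keeps the sketch runnable without the tools installed; each entry could be passed to `subprocess.run` once the paths are real.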

Statistical analysis

Hardy-Weinberg equilibrium (HWE) and allele frequency comparisons

All variant loci detected within the coding regions of these genes were tested for deviation from Hardy-Weinberg Equilibrium (HWE) using an Excel HWE calculator and a chi-squared test, with P < 0.05 indicating inconsistency with HWE [37]. To statistically assess the differences between allele frequencies in our SA population and other populations, a Fisher’s exact test was conducted using the online GraphPad QuickCalcs tool, with the exception of the comparison with the large ExAC exome population, where a chi-squared test was used.
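The HWE check amounts to comparing observed genotype counts with the counts expected from the allele frequencies. The sketch below is a generic one-degree-of-freedom chi-squared version, not the Excel calculator cited above. The example counts are one reading of the A3D R248K figures given in the Results (35 carriers among 168 subjects, 2 of them homozygous, hence 33 heterozygotes); that reading is an assumption.

```python
import math

def hwe_chi_square(n_ref_hom: int, n_het: int, n_alt_hom: int):
    """Chi-squared test (1 degree of freedom) for deviation from HWE at a biallelic site."""
    n = n_ref_hom + n_het + n_alt_hom
    q = (2 * n_alt_hom + n_het) / (2 * n)  # alternate allele frequency
    p = 1.0 - q                            # reference allele frequency
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Assumed genotype counts for A3D R248K: 133 reference homozygotes, 33 heterozygotes, 2 homozygotes
chi2, p_value = hwe_chi_square(133, 33, 2)
print(f"chi2 = {chi2:.4f}, P = {p_value:.3f}")
```

With these counts the P-value is far above 0.05, in line with the statement that no A3D variant deviated from HWE. A site with only homozygotes of both kinds, such as `hwe_chi_square(50, 0, 50)`, gives a very small P-value.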

Pairwise linkage disequilibrium (LD) and haplotype assignment

Pairwise linkage disequilibrium (LD) analysis between the SNPs in each gene was performed to test whether they were in LD in the African population from the 1000 Genomes (1000G) Project phase 3 (version 5), as well as in the entire 1000G population. This was done using the LDmatrix and LDpair modules of the LDlink 3.0 web tool. This tool investigates patterns of linkage disequilibrium, returning the calculated D prime (D’), R squared (R2) and goodness-of-fit statistics (chi-squared and p-values) for the dbSNP rs numbers used as input. Haplotypes for each APOBEC3 gene were defined using the LDhap module, which calculates population-specific frequencies of all haplotypes observed for a list of query variants, using data from the 1000 Genomes Project phase 3 (version 5) [38]. The haplotypes present in each individual were then tallied from our sequence data, and the frequency of each haplotype within the population was calculated.
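For readers unfamiliar with the two LD measures, the following sketch computes D, D’ and R2 from haplotype and allele frequencies using the standard textbook definitions. It does not reproduce LDlink itself, and the frequencies in the example are hypothetical.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # D' scales D by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Hypothetical case: a rare allele A (10%) that always occurs together with a common allele B (50%)
d, d_prime, r2 = ld_stats(p_ab=0.10, p_a=0.10, p_b=0.50)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r2 = {r2:.3f}")
```

The example shows how D’ can equal 1 while R2 stays near 0.11: when allele frequencies differ strongly, complete association in one direction still gives a weak correlation between the loci.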


Single nucleotide polymorphisms (SNPs), detection of indels and verification

There is limited availability of APOBEC3 gene sequences from African populations, and when sequencing has been performed, it has often been limited to A3G [39]. In this study, we applied next generation sequencing to determine variation in the coding exons of the APOBEC genes A3D, A3F, A3G and A3H in DNA from 192 HIV-1 positive individuals residing in the Limpopo province of northern South Africa. The proteins expressed from these genes have all been shown to be capable of HIV restriction [5]. APOBEC 3 variation in this region has not been reported previously.


The A3D gene is 12.1 kb long (Table 1) and has seven exons, with exon 5 shown to display the most variation. Good quality A3D sequences after targeted DNA amplification of the exons were successfully obtained for 168/192 subjects. In the DNA from these 168 individuals, 8 nonsynonymous and 2 synonymous changes were identified when compared to the GRCh37 build of the human genome (Table 2). Of the 168 subjects analyzed, 48.8% (82/168) carried nonsynonymous or synonymous changes at one or more positions in the coding region of the A3D gene, while no changes were detected in the remaining 51.2% (86/168). These changes included several previously identified changes. There were no insertions or deletions observed in A3D in the sequenced samples. Variant R248K was the most frequent, observed in 20.8% (35/168) of the patients, with 2 homozygotes, followed by R97C, which was found in 11.9% (20/168), with 1 homozygote. Three variants, R6K, L221R, and T238I, that have not been reported elsewhere, were observed as heterozygotes in 10.1, 1.8 and 4.8% of the patients, respectively. No variants deviated from HWE (Table 2). Linkage disequilibrium (LD) values for the four SNPs with known allele frequencies in the 1000 Genomes populations were calculated using the total 1000G population, as well as the AFR group (see Additional file 2: Table S2). Most of the variants are not in LD (cut off > 0.1) in these populations, except for R248K and T316T, which are in marginal LD (D’ = 1, R2 = 0.122) in the overall population, but not in the AFR group.

Table 2 APOBEC 3D, 3F, 3G and 3H nonsynonymous and synonymous changes, genotypes, amino acid position and change in the protein, frequencies and Hardy Weinberg Equilibrium calculations from the study population


The A3F gene is 13.3 kb long (Table 1). Two major transcript isoforms have been described for this gene (APOBEC3F-201 and APOBEC3F-202 in Ensembl). These contain seven and three exons, respectively, and share one exon (exon 2). The most variation has been observed in APOBEC3F-201 exon 4. The A3F exons were all successfully amplified and sequenced from a total of 154/192 subjects. Synonymous or nonsynonymous changes were observed in 98.1% (151/154) of the subjects, while 1.9% (3/154) had no change relative to the GRCh37 human genome build (Table 2). In the 154 samples successfully sequenced, there were seven nonsynonymous changes (R48P, A78V, I87L, Q87L, A108S, V231I and Y307C) and seven synonymous changes (I117I, S118S, R143R, Y196Y, S229S, S327S and E245E). A78V and A108S were the most frequent nonsynonymous changes in A3F, found in 38.3 and 64.9% of the subjects, respectively (Table 2). A few of these variants (A108S, R143R, Y196Y and E245E) deviated from HWE (P-values < 0.05). The synonymous I117I mutation has not been reported previously. No insertions or deletions were observed for A3F in the sequenced samples. LD values for rs variants with known allele frequencies in the 1000G database for the overall and AFR group are shown in Additional file 3: Table S3. As can be seen in the table, several of the A3F variants are in strong LD with each other in these populations.


The A3G gene was the first APOBEC3 gene described as encoding an HIV restriction factor and it remains the most studied. The gene is 10.7 kb and has 8 exons (Table 1). We successfully amplified A3G from 165/192 subjects. A total of four nonsynonymous (H186R, R256H, Q275E and G363R) and four synonymous changes (S60S, A109A, F119F and L371L) were observed in A3G, with the most frequent being H186R (61.8%) and Q275E (32.7%) (Table 2). All of these variants have been described previously. In total, nonsynonymous or synonymous changes were observed in 91.5% (151/165) of our patients, whereas 8.5% (14/165) had no changes relative to the reference GRCh37 human genome. There were no insertions or deletions observed in this gene. No variants deviated from HWE (Table 2). LD values could be calculated for all of these variants with the exception of A109A (Additional file 4: Table S4), which had a very low frequency in our population. Most of the variants are not in LD, but H186R and Q275E are in marginal LD (D’ = 1, R2 = 0.108) in the AFR group.


A3H is the shortest, but most polymorphic of the APOBEC3 genes we analyzed. It is 6.8 kb in length (Table 1) and contains 5 exons, with the most variation in exons 1, 2 and 3. We observed nonsynonymous or synonymous changes in all the study subjects that we obtained sequences from (133/192). We found 6 nonsynonymous changes (N15Δ, R18L, G105R, K121E, K140E and E178D) and one synonymous change (T43T) (Table 2). The N15Δ deletion was the only deletion observed and it occurred in 104 of 133 subjects (78.2%), either in a homozygous (49) or heterozygous (55) form. No insertions were found. The T43T, G105R, K121E, K140E and E178D variants occurred mostly in homozygous form, in 95.5–100% of all subjects (Table 2). The K140E variant is also present as a homozygous variant in 100% of the 1000G and ExAC databases (see Table 4) and is thus likely to represent a sequencing error in the reference genome or an extremely rare variant in the human population. All of the other A3H variants deviated significantly from HWE (P-value < 0.05; Table 2). All of the variants with the exception of K140E (where this could not be calculated) are in LD in the overall 1000G population and many are also in LD in the AFR group (Additional file 5: Table S5).
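The relationship between the carrier frequency quoted above and the corresponding allele frequency can be made explicit with a small calculation. The function name is illustrative; the counts are the N15Δ genotype counts given in this paragraph.

```python
def allele_frequency(n_hom: int, n_het: int, n_subjects: int) -> float:
    """Variant allele frequency from genotype counts in diploid subjects."""
    # Each homozygote carries two copies of the variant, each heterozygote one
    return (2 * n_hom + n_het) / (2 * n_subjects)

def carrier_frequency(n_hom: int, n_het: int, n_subjects: int) -> float:
    """Fraction of subjects carrying at least one copy of the variant."""
    return (n_hom + n_het) / n_subjects

# N15Δ in A3H: 49 homozygous and 55 heterozygous carriers among 133 subjects
freq_15del = allele_frequency(49, 55, 133)      # 153 of 266 alleles
carriers_15del = carrier_frequency(49, 55, 133)  # 104 of 133 subjects
print(f"allele frequency = {freq_15del:.3f}, carrier frequency = {carriers_15del:.3f}")
```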

Determination of APOBEC 3 haplotypes

In order to better understand the A3 genetic changes observed in each subject, all clusters of variation within the genes were assigned into haplotypes as described in materials and methods and their frequencies calculated. These haplotypes were classified as either confirmed or unconfirmed based on the number of heterozygous variants. This classification was necessary due to the fact that the NGS reads were short and thus in many cases we could not determine if SNPs occurred on the same chromosome (Table 3). Nonsynonymous variants were considered and their genotypes (homozygous or heterozygous) were indicated. Low frequency variants (MAF < 5%) were excluded from the haplotype assignment. Comparisons were made to the GRCh37 human genome whose combinations are represented as haplotypes in A3D, A3F and A3G (Table 3). We identified four confirmed haplotypes for A3D, four confirmed haplotypes for A3F and four confirmed haplotypes for A3G (Table 3). It is worth noting that only haplotypes for A3G and A3H have been described previously [12, 15, 40, 41]. In the case of A3H, there are seven well characterized and six additional haplotypes that were recognized more recently. The seven well characterized haplotypes of A3H were recently described as having an impact on the genetic diversity of HIV-1 Vifs in the global pandemic [12, 15, 16]. All of the known A3H haplotypes (I-XIII) are combinations of 5 nucleotide changes located in exons 2, 3 and 4. Haplotypes II, V, and VII have been termed stable, because of the observed relatively long half-lives of the encoded proteins, enabling them to restrict HIV-1. Four of the haplotypes (I, III, IV, VI) have been termed unstable, since the encoded protein half-lives have been shown to be short, resulting in complete loss of the ability to restrict HIV [12, 39]. 
In our subjects, we identified 4 haplotypes for A3H: the stable haplotype II (15N, 18R, 105R, 121E, 178D), haplotype III (15Δ, 18R, 105R, 121E, 178D), haplotype IV (15Δ, 18L, 105R, 121E, 178D) and haplotype X (15Δ, 18R, 105R, 121E, 178E) (Table 3) [11, 12, 39]. Haplotypes III, IV and X all have the amino acid 15 deletion, which is known to make the APOBEC3H protein unstable. From the data in Table 2 and this haplotype analysis, we can conclude that 41.4% of our patient population cannot express any stable APOBEC3H protein and thus lack the ability to restrict HIV using APOBEC3H.
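The tallying described here can be sketched as follows. The set of stable haplotypes (II, V and VII) is taken from the text; the list of per-subject haplotype pairs is hypothetical and is not the study data.

```python
from collections import Counter

STABLE_A3H = {"II", "V", "VII"}  # haplotypes described in the text as stable

def haplotype_frequencies(diplotypes):
    """Frequency of each haplotype among all chromosomes (two per subject)."""
    counts = Counter(hap for pair in diplotypes for hap in pair)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}

def fraction_without_stable(diplotypes):
    """Fraction of subjects carrying no stable A3H haplotype on either chromosome."""
    lacking = sum(1 for pair in diplotypes if not (set(pair) & STABLE_A3H))
    return lacking / len(diplotypes)

# Hypothetical haplotype pairs for four subjects
example = [("II", "III"), ("III", "IV"), ("II", "II"), ("IV", "X")]
freqs = haplotype_frequencies(example)
no_stable = fraction_without_stable(example)
print(freqs, no_stable)
```

A subject is counted as lacking stable A3H only when both chromosomes carry an unstable haplotype, which is why haplotype frequencies alone do not give this fraction directly.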

Table 3 Haplotypes frequencies for A3D, A3F, A3G and A3H

Allele frequencies and their comparison with other populations

We next compared the nonsynonymous and synonymous variant frequencies in the South African population in our study to previously reported variant frequencies in the following populations: African (AFR), East Asian (EAS), European (EUR), Ad Mixed American (AMR), and South Asian (SAS), as reported in the 1000 Genomes Project phase III, the HapMap project (NCBI), the dbSNP database and the Ensembl genome browser. We also compared our allele frequencies to the ExAC consortium database that contains sequences from more than 60,000 individuals (Table 4).
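A comparison of allele counts between two populations with Fisher’s exact test can be sketched in a few lines. This is a generic two-sided implementation, not the GraphPad tool used in the study; the first table is a small textbook-style example and the comparison-population counts in the second are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are populations; columns are variant and reference allele counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x variant alleles in row 1 with fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

p_small = fisher_exact_two_sided(3, 1, 1, 3)       # 34/70 by direct enumeration
p_alleles = fisher_exact_two_sided(153, 113, 40, 160)  # second row is hypothetical
print(f"{p_small:.4f}")
```

Alleles rather than individuals are counted, so each subject contributes two observations to the table.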

Table 4 Comparison of A3D, A3F, A3G and A3H allele frequencies between our South African population (SA) and populations in the 1000 Genome Project including: East Asian (EAS), European (EUR), African (AFR), Ad Mixed American (AMR), South Asian (SAS), as well as data from the Exome Aggregation Consortium (ExAC)

In a previous study by Duggal and colleagues that compared APOBEC3 variation between Africans, Asians and Europeans, nonsynonymous variation in A3D (R97C, R248K); A3F (A108S, V231I, Y307C); A3G (H186R, E275Q (now Q275E)) and A3H (15Δ, R18L, R105G (now G105R), E121K/D, E178D) was reported [13]. Our data suggest that several variants occur more frequently in our South African population than in the “African” population they previously studied [13]. These include R97C and T238A in A3D; A108S and Y307C in A3F; Q275E in A3G and N15Δ, R18L, G105R and E178D in A3H (Table 4).

Overall, the EAS, EUR, AMR, SAS populations and the ExAC consortium database showed a higher level of APOBEC3 conservation than our study population (Table 4). For example, the A3D sequences in these populations were more closely related to the reference GRCh37 human genome (98–100%) than in our SA population, resulting in significant p-values for all the variant comparisons where allele frequencies were available. In the case of A3F and A3G, several variants were also present more frequently than in the other populations (see Table 4). In the case of A3H, the N15Δ variant was clearly present at a significantly higher frequency in our population compared to the others. This was also the case for all of the other observed variants, with the exception of R18L and K140E, the latter of which, as discussed above, is likely a sequencing error or an extremely rare variant. R18L was significantly lower in all of the populations, with the exception of the AFR population, where it was not significantly different. This is in contrast to all of the other variants, which were significantly higher in our SA population than in the AFR population. In the case of A3D, A3F and A3G, the frequencies of some of the variants were also significantly higher in our population than in the AFR population, whereas others showed more similar allele frequencies (see Table 4).

The term “Africans” has been loosely used to describe datasets generated from different parts of the African continent. To provide a more accurate comparison, we next compared the variants detected in our study to the various components of the AFR data set that consist of more specific African subpopulations or people of African descent (Table 5). These included Americans of African Ancestry in USA (ASW); African Caribbeans in Barbados (ACB); Gambians in the Western Gambia (GWD); Esan in Nigeria (ESN); Luhya in Webuye, Kenya (LWK); Mende in Sierra Leone (MSL) and Yoruba in Ibadan, Nigeria (YRI). We noticed higher levels of single nucleotide changes in our population (with significant p-values) compared to most of the other populations for the following variants: T238A in A3D, S327S in A3F, S60S, Q275E and G363R in A3G and all of the variants in A3H with the exception of R18L (and K140E, see above) (Table 5). Notably, the variant frequency of R97C in A3D is almost the same as in ASW and LWK but higher than in the other populations. The frequency of R48P in A3F and the frequency of R256H in A3G were similar among all Africans.

Table 5 Comparison of A3D, A3F, A3G and A3H allele frequencies between our South African population (SA) and other African populations in the 1000 Genome Project including: the African Caribbeans in Barbados (ACB), Americans of African Ancestry in USA (ASW), Esan in Nigeria (ESN), Gambian in the Western Gambia (GWD), Luhya in Webuye, Kenya (LWK), Mende in Sierra Leone (MSL) and Yoruba in Ibadan, Nigeria (YRI)


In this study, we characterized SNPs and indels within the coding exons of several human APOBEC3 genes (A3D, A3F, A3G and A3H) to document the level of diversity in these genes in HIV infected individuals in a diverse South African population residing in the Limpopo Province in Northern South Africa. We observed a high level of A3 diversity and a higher prevalence of certain variants than has previously been observed in other African populations. Interestingly, some of these variants have previously been linked to HIV disease progression [14, 39, 42] (see below). The use of next generation sequencing also allowed the identification of SNP genotypes that were not previously identified in South Africa, since previous studies used older methods such as TaqMan, SNP array genotyping assays, restriction fragment length polymorphism (RFLP) or Sanger sequencing [39].

Common variants in APOBEC3 genes have been intensively studied and many have been found to have differential effects on antiviral activity [7, 13, 14, 39, 42]. For example, the variants R97C and R248K in A3D have been reported to moderately decrease antiviral activity [13]. In contrast, the A3F variants A108S, V231I and Y307C have been reported to have potent antiviral activity against HIV-1 ΔVif strains [43, 44]. SNPs in A3G can also alter its antiviral activity and sometimes enhance the rate of HIV-1 disease progression, as reported in a cohort of HIV-1 subtype C infected South African women and a US based cohort of African Americans [14, 39]. In particular, the H186R variant has previously been associated with more rapid decline in CD4+ cells and accelerated disease progression [14, 39, 42]. Our study shows that this variant is present in much higher frequency in our SA population than in the non-African populations and in the ExAC database (Table 4). This variant is similar in prevalence in our population to that in several other African populations (Table 5).

Recent studies have shown A3H to be the most polymorphic member of the A3 family. The A3H variants (15Δ, R18L, G105R, K121E, E178D), which make up 7 different haplotypes, have been functionally described in other studies, showing varying protein expression and stability [8, 11, 16, 45,46,47,48]. Data from the 1000 Genomes Project suggest that stable A3H haplotypes (II, V and VII) predominate in Africa while unstable haplotypes (I, III, IV, VI) are more prevalent in Asia [15]. Interestingly, the frequency of the unstable A3H haplotypes III and IV (which cannot restrict HIV) was unexpectedly high in our study population. This can be attributed mainly to the high prevalence of the deletion at amino acid residue 15 (Tables 2, 3, 4 and 5), which showed an allele frequency of almost 60% in our population. This is very different from what was reported in previous studies of Africans, in which stable A3H haplotypes were reported to be dominant [15] (see also Table 5). Data from two recent studies illustrate that stable A3H haplotypes may function as contemporary HIV-1 restriction factors, contributing to limiting viral replication and rates of transmission [12, 15]. It is unclear what role, if any, the unstable A3H haplotypes III and IV, which are the only ones present in over 40% of the patients we analyzed, may play in the high prevalence and transmission of HIV-1 in Limpopo.

Because HIV-1 Vif acts as an antagonist to APOBEC proteins including A3H, we speculate that the distribution of stable versus unstable A3H haplotypes in our study might also influence Vif variation in HIV in our study population. Studies performed in primary CD4+ lymphocytes have shown that HIV-1 Vif variants with certain amino acid residues (F39 and H48), known as hyper Vifs, are better capable of neutralizing stable A3H genotypes, implying that HIV-1 Vif might adapt to the A3H haplotype in a particular population [15]. We are presently analyzing HIV-1 Vif sequences from our study subjects in order to determine a possible correlation between the A3H haplotypes and HIV-1 Vif genetic variation in this rural area of South Africa.

All the subjects in this study were HIV infected and were mostly at the chronic stage of infection. Even though there is to date no strong evidence that APOBEC3 genotypes significantly affect HIV infection risk, it remains possible that HIV-1 negative subjects in Limpopo would present a significantly different A3 profile. If this turns out to be the case, it could imply that A3 genotypes, either alone or in combination, influence HIV transmission. It will thus be important to compare HIV positive and negative individuals in future studies of APOBEC3 variants in this region. It is also possible that the overall APOBEC3 expression landscape may turn out to affect disease progression. However, exploring this hypothesis would require studies in which clinical data are correlated with APOBEC3 expression. Future studies of this kind are clearly warranted, since a previous report comparing HIV-1 non-controllers versus long-term non-progressors (LTNP) reported that LTNPs express higher levels of A3G and A3F proteins [49].


We have shown that significant A3 variation exists among HIV patients in an ethnically diverse population in northern South Africa by providing extensive data for four different A3 genes that are known to restrict HIV infection but have previously been only sparsely studied in African populations. Our NGS results provide a baseline for future studies that could functionally characterize the SNPs identified in the APOBEC3 genes in this population and specifically analyze how they affect restriction of HIV replication and Vif function. Such studies will increase our understanding of how the APOBEC3 protein landscape might have shaped the HIV epidemic in northern South Africa.



1000G: 1000 Genomes
ACB: African Caribbeans in Barbados
AFR: African
AMR: Admixed American
APOBEC3: Apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like 3
ASW: Americans of African Ancestry in USA
EAS: East Asian
ESN: Esan in Nigeria
ExAC: Exome Aggregation Consortium
GWD: Gambians in the Western Gambia
HWE: Hardy-Weinberg Equilibrium
LD: Pairwise linkage disequilibrium
LWK: Luhya in Webuye, Kenya
MSL: Mende in Sierra Leone
NGS: Next generation sequencing
PBMC: Peripheral blood mononuclear cells
PCR: Polymerase chain reaction
SAS: South Asian
SNP: Single nucleotide polymorphism
YRI: Yoruba in Ibadan, Nigeria


  1. Sheehy AM, Gaddis NC, Choi JD, Malim MH. Isolation of a human gene that inhibits HIV-1 infection and is suppressed by the viral Vif protein. Nature. 2002;418(6898):646–50.

  2. Chiu Y-L, Greene WC. The APOBEC3 cytidine deaminases: an innate defensive network opposing exogenous retroviruses and endogenous retroelements. Annu Rev Immunol. 2008;26(1):317–53.

  3. Harris RS, Liddament MT. Retroviral restriction by APOBEC proteins. Nat Rev Immunol. 2004;4:868–77.

  4. Zhang H, Yang B, Pomerantz RJ, Zhang C, Arunachalam SC, Gao L. The cytidine deaminase CEM15 induces hypermutation in newly synthesized HIV-1 DNA. Nature. 2003;424(6944):94–8.

  5. Hultquist JF, Lengyel JA, Refsland EW, LaRue RS, Lackey L, Brown WL, et al. Human and rhesus APOBEC3D, APOBEC3F, APOBEC3G, and APOBEC3H demonstrate a conserved capacity to restrict Vif-deficient HIV-1. J Virol. 2011;85(21):11220–34.

  6. Refsland EW, Hultquist JF, Harris RS. Endogenous origins of HIV-1 G-to-A hypermutation and restriction in the nonpermissive T cell line CEM2n. PLoS Pathog. 2012;8(7):39.

  7. An P, Penugonda S, Thorball CW, Bartha I, Goedert JJ, Donfield S, et al. Role of APOBEC3F gene variation in HIV-1 disease progression and pneumocystis pneumonia. PLoS Genet. 2016;12(3):e1005921.

  8. Harari A, Ooms M, Mulder LCF, Simon V. Polymorphisms and splice variants influence the antiretroviral activity of human APOBEC3H. J Virol. 2009;83(1):295–303.

  9. Dang Y, Wang X, Esselman WJ, Zheng Y-H. Identification of APOBEC3DE as another antiretroviral factor from the human APOBEC family. J Virol. 2006;80(21):10522–33.

  10. OhAinle M, Kerns JA, Li MMH, Malik HS, Emerman M. Antiretroelement activity of APOBEC3H was lost twice in recent human evolution. Cell Host Microbe. 2008;4(3):249–59.

  11. Wang X, Abudu A, Son S, Dang Y, Venta PJ, Zheng Y-H. Analysis of human APOBEC3H haplotypes and anti-human immunodeficiency virus type 1 activity. J Virol. 2011;85(7):3142–52.

  12. Ooms M, Brayton B, Letko M, Maio SM, Pilcher CD, Hecht FM, et al. HIV-1 Vif adaptation to human APOBEC3H haplotypes. Cell Host Microbe. 2013;14(4):411–21.

  13. Duggal NK, Fu W, Akey JM, Emerman M. Identification and antiviral activity of common polymorphisms in the APOBEC3 locus in human populations. Virology. 2013;443(2):329–37.

  14. An P, Bleiber G, Duggal P, Nelson G, May M, Mangeat B, et al. APOBEC3G genetic variants and their influence on the progression to AIDS. J Virol. 2004;78(20):11070–6.

  15. Refsland EW, Hultquist JF, Luengas EM, Ikeda T, Shaban NM, Law EK, et al. Natural polymorphisms in human APOBEC3H and HIV-1 Vif combine in primary T lymphocytes to affect viral G-to-A mutation levels and infectivity. PLoS Genet. 2014;10(11):e1004761.

  16. Ooms M, Majdak S, Seibert CW, Harari A, Simon V. The localization of APOBEC3H variants in HIV-1 virions determines their antiviral activity. J Virol. 2010;84(16):7961–9.

  17. Cavalli-Sforza LL. Genes, people and languages. Sci Am. 1991;265(5):104–10.

  18. Tishkoff SA, Reed FA, Friedlaender FR, Ehret C, Ranciaro A, Froment A, et al. The genetic structure and history of Africans and African Americans. Science. 2009;324(5930):1035–44.

  19. Tishkoff SA, Williams SM. Genetic analysis of African populations: human evolution and complex disease. Nat Rev Genet. 2002;3(8):611–21.

  20. Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, et al. Genotype, haplotype and copy-number variation in worldwide human populations. Nature. 2008;451(7181):998–1003.

  21. Conrad DF, Hurles ME. The population genetics of structural variation. Nat Genet. 2007;39(7S):S30–6.

  22. Batzer MA, Deininger PL. Alu repeats and human genomic diversity. Nat Rev Genet. 2002;3:370–9.

  23. Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, et al. Genetic structure of human populations. Science. 2002;298(5602):2381–5.

  24. Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008;319(5866):1100–4.

  25. Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, et al. A map of human genome variation from population scale sequencing. Nature. 2010;467(7319):1061–73.

  26. Lane AB, Soodyall H, Arndt S, Ratshikhopha ME, Jonker E, Freeman C, et al. Genetic substructure in South African Bantu-speakers: evidence from autosomal DNA and Y-chromosome studies. Am J Phys Anthropol. 2002;119(2):175–85.

  27. Mitchell P. Genetics and southern African prehistory: An archaeological view. J Anthropol Sci. 2010;88:73–92.

  28. Picelli S, Faridani OR, Björklund ÅK, Winberg G, Sagasser S, Sandberg R. Full-length RNA-seq from single cells using Smart-seq2. Nat Protoc. 2014;9(1):171–81.

  29. Andrews S. FastQC: A quality control tool for high throughput sequence data. 2010.

  30. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32(19):3047–8.

  31. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 2013.

  32. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.

  33. Tan A, Abecasis GR, Kang HM. Unified representation of genetic variants. Bioinformatics. 2015;31(13):2202–4.

  34. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012;6(2):80–92.

  35. Paila U, Chapman BA, Kirchner R, Quinlan AR. GEMINI: integrative exploration of genetic variation and genome annotations. PLoS Comput Biol. 2013;9(7):e1003153.

  36. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2010. ISBN 3-900051-07-0.

  37. Court MH. Court’s (2005–2008) online calculator. Tufts University Web site. 2012.

  38. Machiela MJ, Chanock SJ. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics. 2015;31(21):3555–7.

  39. Reddy K, Winkler CA, Werner L, Mlisana K, Abdool Karim SS, Ndung’u T. APOBEC3G expression is dysregulated in primary HIV-1 infection and polymorphic variants influence CD4+ T-cell counts and plasma viral load. AIDS. 2010;24(2):195–204.

  40. Feng Y, Chelico L. Intensity of deoxycytidine deamination of HIV-1 proviral DNA by the retroviral restriction factor APOBEC3G is mediated by the noncatalytic domain. J Biol Chem. 2011;286(13):11415–26.

  41. Mhandire K, Duri K, Mhandire D, Musarurwa C, Stray-Pedersen B, Dandara C. Evaluating the contribution of APOBEC3G haplotypes on influencing HIV infection in a Zimbabwean paediatric population. S Afr Med J. 2016;106:S119–23.

  42. Compaore TR, Soubeiga ST, Ouattara AK, Obiri-Yeboah D, Tchelougou D, Maiga M, et al. APOBEC3G variants and protection against HIV-1 infection in Burkina Faso. PLoS One. 2016;11(1):e0146386.

  43. Mulder LCF, Ooms M, Majdak S, Smedresman J, Linscheid C, Harari A, et al. Moderate influence of human APOBEC3F on HIV-1 replication in primary lymphocytes. J Virol. 2010;84(18):9613–7.

  44. Duggal NK, Malik HS, Emerman M. The breadth of antiviral activity of Apobec3DE in chimpanzees has been driven by positive selection. J Virol. 2011;85(21):11361–71.

  45. Tan L, Sarkis PTN, Wang T, Tian C, Yu X-F. Sole copy of Z2-type human cytidine deaminase APOBEC3H has inhibitory activity against retrotransposons and HIV-1. FASEB J. 2009;23(1):279–87.

  46. Li MMH, Wu LI, Emerman M. The range of human APOBEC3H sensitivity to lentiviral Vif proteins. J Virol. 2010;84(1):88–95.

  47. Zhen A, Wang T, Zhao K, Xiong Y, Yu X-F. A single amino acid difference in human APOBEC3H variants determines HIV-1 Vif sensitivity. J Virol. 2010;84(4):1902–11.

  48. Zhen A, Du J, Zhou X, Xiong Y, Yu XF. Reduced APOBEC3H variant anti-viral activities are associated with altered RNA binding activities. PLoS One. 2012;7(7):e38771.

  49. Jin X, Brooks A, Chen H, Bennett R, Reichman R, Smith H. APOBEC3G/CEM15 (hA3G) mRNA levels associate inversely with human immunodeficiency virus viremia. J Virol. 2005;79:11513–6.



The authors are grateful to the study participants; to Jing Huang at the Myles Thaler Center for Human Retrovirus Research at the University of Virginia, USA, for assisting with NGS; and to Elizabeth Mashu Etta of the HIV/AIDS & Global Health Research Programme, University of Venda, for assisting with sample collection and processing.


Research reported in this publication was supported by the Myles H. Thaler Research Endowment at the University of Virginia and the South African Medical Research Council (RCDI) through funding received from the South African National Treasury and the South African National Research Foundation (GUN109312, GUN86037). The contents are solely the responsibility of the authors and do not necessarily represent the official views of the University of Virginia, the South African Medical Research Council or the National Research Foundation.

Nontokozo D. Matume was supported by the Research Capacity Development Initiative of the Medical Research Council (RCDI project number: 57009), and the Fogarty International Center/NIH (D43TW006578) as well as by research funds from the Myles H. Thaler Center for AIDS and Human Retrovirus Research at the University of Virginia.

Denis M. Tebit was supported by funds from the Myles H. Thaler Center for AIDS and Human Retrovirus Research at the University of Virginia, and also received partial support through a Carnegie African Diaspora Fellowship Award.

David Rekosh was partially supported by funds from the Myles H. Thaler Professorship at the University of Virginia.

Marie-Louise Hammarskjold was partially supported by funds from the Charles H. Ross Jr. Professorship at the University of Virginia.

In all cases, the funders had no role in study design, data collection, analysis and interpretation of data, or in the writing of the manuscript and decision to submit it for publication.

Availability of data and materials

All of the individual patient sequences used in this study (see Additional file 1: Table S1) have been submitted to the NCBI Sequence Read Archive (Project number: PRJNA429751). The BioSample accession numbers for the individual patients are SAMN08358664–SAMN08358841.

Author information




NDM performed the laboratory experiments, analyzed the data and prepared the first draft of the manuscript. DMT was involved in the interpretation of the data and revised the first draft of the manuscript. LRG was involved with the design of the sequencing primers and in the performance of the sequencing experiments. She was also instrumental in the analysis of data and the re-analysis in the revision of the manuscript. SDT performed the bioinformatic (SNP) analysis. DR conceptualized the experiments, was involved in the interpretation of the data and in the revision of the manuscript. POB arranged for the patient samples, was involved in the interpretation of the data and revision of the manuscript. MLH conceptualized the experiments, was involved in the interpretation of the data and revised the manuscript. All authors approved the final version of the manuscript.

Corresponding authors

Correspondence to Pascal O. Bessong or Marie-Louise Hammarskjöld.

Ethics declarations

Ethics approval and consent to participate

The study protocol was approved by the Research Ethics Committee of the University of Venda (SMNS/13/MBY/01/0625) and the University of Virginia Institutional Review Board (IRB-HSR #16815). Permission to access public sector health facilities was obtained from the Limpopo Provincial Department of Health, South Africa. Written informed consent was obtained from all study participants prior to demographic and clinical data collection, and blood draw. Written consent was obtained from a parent or guardian on behalf of participants under the age of 16.

Consent for publication

Personal identifiers were stripped prior to sample processing and data analysis, so a request for consent is not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Table S1. Study Participants’ Demographic Information: Gender, Age, Ethnicity, Geography, HIV Viral Load, CD4+ cell count, APOBEC3 genes sequenced. (DOCX 178 kb)

Additional file 2:

Table S2. APOBEC3D Linkage Disequilibrium Calculations: D’ and R2 values. (PDF 42 kb)

Additional file 3:

Table S3. APOBEC3F Linkage Disequilibrium Calculations: D’ and R2 values. (PDF 52 kb)

Additional file 4:

Table S4. APOBEC3G Linkage Disequilibrium Calculations: D’ and R2 values. (PDF 33 kb)

Additional file 5:

Table S5. APOBEC3H Linkage Disequilibrium Calculations: D’ and R2 values. (PDF 50 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Matume, N.D., Tebit, D.M., Gray, L.R. et al. Characterization of APOBEC3 variation in a population of HIV-1 infected individuals in northern South Africa. BMC Med Genet 20, 21 (2019).

