An access process has been defined for Generation Scotland resources and is fully operational. Family-based designs for genome-wide association studies are of renewed interest. High-density genome-wide genotype data on 10,000 GS:SFHS DNA samples will soon be generated, together with whole exome sequencing of DNA from nearly 1,000 participants. However, before resources are committed to such large-scale genotyping or DNA sequencing, it is important that family relationships are verified where possible and that the quality of the samples is confirmed. Confident identification of pedigree errors could also allow correction of the stored data, thus improving its accuracy.
This study found high or very high call rates for DNA from all of the samples tested and for all of the 32 SNPs assayed, together with a high rate of consistency with recorded pedigrees. The 32 SNPs in this analysis were chosen for a lung function study rather than for pedigree testing, but they provide a good range of allele frequencies in this Northern European population (Table 1), and most are independent, allowing the recorded family structures to be tested. The relative frequency of the major and minor alleles is an important determinant of how informative a biallelic SNP assay is in a pedigree. Error detection rates are lowest when the two alleles have equal frequencies, i.e. there is a higher chance that any (unrelated) trio would show a genotype consistent with Mendelian inheritance, despite not actually being related. Conversely, with a minor allele frequency of 10%, it is more likely that an inheritance discrepancy would be evident in an incorrect pedigree. Detection rates are generally lower when the error occurs in a parent than in an offspring. Teo et al. described Nucl3ar software, which assesses the extent of pedigree-inconsistent genotype configurations in the presence of genotyping errors. That work recognised the same problem, which was addressed here by analysis using other software, as described.
Any pedigree errors will be unequivocally apparent with higher-throughput data such as GWAS. With the current data, however, the expected inconsistency rate when a pedigree error is present would require detailed simulation to calculate, as it depends on allele frequency in the population and in the families studied. There are 27 possible genotype configurations at a SNP for the three individuals in a trio, of which 15 are pedigree consistent and 12 are pedigree inconsistent (see Teo et al., Figure 2, for a summary diagram). The few inconsistent trios detected here could have arisen because of errors in pedigree data collection, in sample handling or labelling in the clinic or laboratory, in sample selection for genotyping, or in genotype calling. Pedigree data was recorded during the volunteer recruitment process and has not been independently verified. Cross-checking with the General Register Office for Scotland would be laborious and outside the terms of consent. Participants could also have failed to disclose adoption, or there could be a different biological father to that recorded.
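The count of 27 configurations, split into 15 pedigree-consistent and 12 pedigree-inconsistent, can be reproduced by direct enumeration. The following is a minimal sketch, assuming a biallelic autosomal SNP with alleles labelled A and B, no missing calls, and no de novo mutation; the function and variable names are hypothetical and are not taken from Nucl3ar or from the software used in this study.

```python
from itertools import product

# The three possible genotypes at a biallelic SNP (alleles labelled A and B).
GENOTYPES = ("AA", "AB", "BB")

def is_mendelian_consistent(father, mother, child):
    """Return True if the child could carry one allele from each parent."""
    possible_children = {
        "".join(sorted(paternal + maternal))
        for paternal in father
        for maternal in mother
    }
    return child in possible_children

# Every (father, mother, child) genotype combination: 3 * 3 * 3 = 27.
configurations = list(product(GENOTYPES, repeat=3))
consistent = [c for c in configurations if is_mendelian_consistent(*c)]
inconsistent = [c for c in configurations if not is_mendelian_consistent(*c)]

print(len(configurations), len(consistent), len(inconsistent))  # 27 15 12
```

The check treats each SNP in isolation and says nothing about whether an observed inconsistency stems from a pedigree error or a genotyping error; separating those sources requires information across several independent SNPs, as discussed in the text.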
Reliable estimates for non-paternity rates are difficult to establish, with high quoted rates often proving to be anecdotal. A median rate of paternal discrepancy of 3.7% was reported in a review of 17 populations studied for reasons other than disputed paternity. The true rate may lie closer to 1% in the UK and elsewhere in Europe [14, 15]. The analysis of 925 trios presented here (Table 3) cannot unequivocally distinguish the source of all of the few apparent errors present, with inconsistencies in the father, mother or either parent apparently occurring at approximately equal frequency. Consideration of the 16 trios with two or more independent SNPs showing inconsistency (Table 3, fourth column) shows that in 3 of the trios the inconsistency in the child is not with the genotype of the mother, indicating a maximum estimated non-paternity rate in the Scottish Family Health Study of less than 1.5% (13 trios out of 925 analysed). The true rate is likely to be considerably lower, as it is unlikely that all discrepant results are caused by incorrect pedigrees. These relatively low rates may in part be due to non-participation in the study by women who knew the paternity of their child was uncertain. Participants were informed (in the information online) that “As part of the Scottish Family Health Study, researchers will perform tests to check that family members are genetically related, because this is essential for the success of the study. The researchers who carry out these tests will not know, or be able to find out, the identities of the people who gave the samples. Generation Scotland will not pass the results of family testing back to families”. Our study provides a first estimate of these kinds of errors in the Scottish Family Health Study.
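The upper bound quoted above follows from simple arithmetic on the trio counts given in the text. As a minimal sketch (variable names are hypothetical, and the calculation assumes, as the text does, that every remaining trio is counted as a possible non-paternity event):

```python
# Counts taken from the text (Table 3).
trios_analysed = 925
trios_with_multiple_inconsistent_snps = 16
trios_set_aside = 3  # the 3 trios singled out in the text

# Trios that remain as candidate non-paternity events.
candidate_trios = trios_with_multiple_inconsistent_snps - trios_set_aside

# Upper-bound rate, treating every candidate trio as a true non-paternity.
max_non_paternity_rate = candidate_trios / trios_analysed

print(candidate_trios, f"{max_non_paternity_rate:.2%}")  # 13 1.41%
```

Because some of these inconsistencies are likely to reflect sample handling or genotyping errors, this figure is a ceiling rather than an estimate of the true rate.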
More refined estimates will be generated once it is feasible to run genome-wide genotyping arrays for these samples, as the extensive information on such arrays will improve both the sensitivity of error detection and the resolution of genotyping errors from pedigree inconsistencies. Whilst genome-wide genotyping lies outside the scope of the current study, the wealth of phenotype data available in GS:SFHS means that it will prove a rich resource for genome-wide association studies in the near future.