This article has Open Peer Review reports available.
Generation Scotland: Donor DNA Databank; A control DNA resource
© Kerr et al; licensee BioMed Central Ltd. 2010
Received: 26 May 2010
Accepted: 23 November 2010
Published: 23 November 2010
Many medical disorders of public health importance are complex diseases caused by multiple genetic, environmental and lifestyle factors. Recent technological advances have made it possible to analyse the genetic variants that predispose to complex diseases. Reliable detection of these variants requires genome-wide association studies in sufficiently large numbers of cases and controls. This approach is often hampered by difficulties in collecting appropriate control samples. The Generation Scotland: Donor DNA Databank (GS:3D) aims to help solve this problem by providing a resource of control DNA and plasma samples accessible for research.
GS:3D participants were recruited from volunteer blood donors attending Scottish National Blood Transfusion Service (SNBTS) clinics across Scotland. All participants gave full written consent for GS:3D to take spare blood from their normal donation. Participants also supplied demographic data by completing a short questionnaire.
Over five thousand complete sets of samples, data and consent forms were collected. DNA and plasma were extracted and stored. The data and samples were unlinked from their original SNBTS identifier number. The plasma, DNA and demographic data are available for research. New data obtained from analysis of the resource will be fed back to GS:3D and will be made available to other researchers as appropriate.
Recruitment of blood donors is an efficient and cost-effective way of collecting thousands of control samples. Because the collection is large, subsets of controls can be selected, based on age range, gender, and ethnic or geographic origin. The GS:3D resource should reduce time and expense for investigators who would otherwise have had to recruit their own controls.
This paper describes the collection and initial characterisation of a resource of control DNA and plasma samples accessible for research, the Generation Scotland Donor DNA Databank (GS:3D). Common conditions such as cancer, cardiovascular disease, diabetes and mental illness create a heavy burden of morbidity and mortality in developed countries and are consequently of major public health importance [1, 2]. These diseases have a significant heritable component but are difficult to analyse by traditional genetic techniques because they typically result from the combined effect of multiple genetic, lifestyle and environmental factors rather than from the effect of a single gene [2–4]. The completion of the Human Genome Project and the availability of cost-effective, high-throughput methods for systematically characterising genome-wide sequence variation have made it possible to dissect the genetic variants that predispose to complex diseases [2, 3]. This is accomplished through genome-wide association studies in which genetic variants (such as single nucleotide polymorphisms or copy number variants) are typed across the whole genome in large numbers of cases and controls. If a statistically significant increase in the frequency of a variant is observed in cases compared to controls, the region of the genome in linkage disequilibrium with the variant is implicated in disease risk [2, 3]. Genome-wide association studies have already yielded promising results for a number of common diseases [5, 6], with wide-ranging implications for the future of healthcare including better understanding of disease mechanisms, more accurate diagnosis, and personalised therapy [2, 7]. Nevertheless, reliable detection of disease-associated genetic variants requires very large sample and data sets and considerable associated infrastructure [5, 7]. 
Generation Scotland is a multi-institution, cross-disciplinary collaboration which aims to create an ethically sound, family- and population-based resource for identifying the genetic basis of common complex diseases [8, 9]. One of Generation Scotland's resources, the Generation Scotland: Donor DNA Databank (GS:3D), is a collection of 5,000 control DNA and plasma samples taken with full consent from blood donors across Scotland. This will help solve the problem of gathering the large numbers of controls that are essential for the success of case-control studies. GS:3D has also helped to develop an information technology infrastructure to support large-scale genetics research studies, including the implementation of systems to ensure efficient management of data and samples. GS:3D samples, together with non-identifiable demographic information, will be made available to researchers under appropriate standards of governance.
The objectives of GS:3D were:
To recruit and minimally phenotype a cohort of healthy control participants from across Scotland, facilitating studies aimed at identification of genetic variants relevant to common complex diseases.
To extend a technology platform of informatics in genetics research, including the creation of user-friendly databases of study data.
Recruitment of participants
Participants were volunteer blood donors, who must satisfy routine donor selection criteria. A potential donor is excluded if he or she:
has a history of angina, ulcerative colitis or Crohn's disease.
is taking anticoagulant medication.
is taking beta blockers to treat cardiovascular disease.
has had cardiac surgery, a malignancy, a stroke or transient ischaemic attack.
has ischaemic heart disease, dementia, inflammatory bowel disease, multiple sclerosis, narcolepsy or active rheumatoid arthritis.
has diabetes insipidus or diabetes mellitus which requires medication.
requires maintenance treatment for mental health problems.
Potential donors are also excluded if their haemoglobin level is below a threshold (125 g/L for females, 135 g/L for males), and donors must weigh at least 7 st 12 lb (50 kg). Blood donors are therefore people in good general health, particularly with regard to cardiovascular health.
Collection and processing of consent forms and demographic data
The questionnaires and attached consent forms from each session (up to 30 participants) were posted to the central research office in Edinburgh. On arrival, unique research ID labels corresponding to the appropriate paired filter and blood samples (see below) were added to the questionnaire and consent forms. Questionnaire data were entered into a secure database by optical mark read (OMR) scanning with a DRS Photoscribe PS900 IM2 scanner, which also scanned the research ID barcode.
Processing of leucodepletion filters and blood samples
As part of the routine processing of volunteer blood donations in the U.K., blood is filtered to remove white cells (leucodepletion), and the used filters are an excellent source of DNA. Leucodepletion filters identified as coming from study participants and containing cells for DNA extraction were sent from the SNBTS process and testing laboratories to the research laboratory (WTCRF Genetics Core, University of Edinburgh) in batches and stored for up to seven days prior to extraction. Whole blood samples taken from the prefiltration pouch were also received by the research laboratory in 9 ml EDTA (Becton Dickinson) tubes and stored at room temperature for up to five days from the date of collection until plasma was purified and stored. Although some plasma analytes are unstable over this period of time, many epidemiologically useful measurements can reliably be made. Filters and EDTA tubes were physically paired by matching SNBTS barcodes. A unique GS:3D research sample ID was then assigned to each SNBTS barcode. Paired samples were logged by scanning barcodes into the GS:3D database, a bespoke study management program. The GS:3D sample ID was subsequently entered into a Starlims Laboratory Information Management System (LIMS), which assigned the required number of LIMS barcode labels and storage space for the resulting processed sample aliquots of DNA and plasma. Use of a LIMS in a core laboratory driven by standard operating procedures (SOPs) and operating to a Good Laboratory Practice (GLP) standard helps to minimise the risk of sample mix-up.
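The pairing step described above can be sketched in a few lines of Python. This is only an illustration of matching filters and tubes on a shared barcode and issuing a research ID. The ID format (`GS3D-00001`), the function name and the exception categories are assumptions made for the example, not the behaviour of the GS:3D database or the Starlims LIMS.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class PairedSample:
    research_id: str    # hypothetical research sample ID format
    snbts_barcode: str  # donation barcode shared by the filter and the EDTA tube


def pair_samples(filter_barcodes, tube_barcodes, prefix="GS3D"):
    """Pair filters and tubes that share a barcode; list the unmatched items."""
    filters, tubes = set(filter_barcodes), set(tube_barcodes)
    ids = count(1)
    paired = [
        PairedSample(f"{prefix}-{next(ids):05d}", barcode)
        for barcode in sorted(filters & tubes)
    ]
    # Unmatched items are reported rather than silently dropped, so that
    # every exception can be recorded in an audit trail.
    exceptions = {
        "filter_without_tube": sorted(filters - tubes),
        "tube_without_filter": sorted(tubes - filters),
    }
    return paired, exceptions


paired, exceptions = pair_samples(
    ["B001", "B002", "B003"],  # made-up barcodes read from filters
    ["B002", "B003", "B004"],  # made-up barcodes read from EDTA tubes
)
```

In this sketch an unmatched filter is only flagged. Whether such a filter was still processed for DNA in the study is not something the code tries to model.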
Extraction and storage of DNA
The initial steps of the protocol for extracting DNA from the leucodepletion filters followed the method described by Cook et al, 2003. All filters underwent a "draw-through" step to wash out and collect leucocytes. This involved cutting the inlet and outlet tubes at either side of the filter and dispensing 20 ml of phosphate buffered saline (PBS) (GIBCO) pH 7.4 through the filter in the counter-direction to that indicated by the arrow on the side of the filter. PBS was dispensed using a 30 ml Terumo syringe (TERUSS-30E5Z1) and 20 ml of draw-through was collected in a labelled 50 ml Falcon tube with screw lid (Greiner Bio-one Ltd) and stored at -40°C until ready for extraction. As a quality control procedure, 100 μl of blood cells from the 20 ml sample were spotted onto an FTA nucleic acid capture card (Whatman) which was archived at room temperature in a secure store. This is designed to allow investigation of any sample mix-up during the process of DNA extraction.
Each 20 ml sample was split into two samples of 10 ml each for DNA extraction using a Nucleon Kit (Tepnel Life Science) with the BACC3 protocol. At the DNA precipitation stage, the upper phases from the two corresponding DNA extractions (originating from the same filter) were layered into a 15 ml EZ Flip tube. The precipitated DNA was hooked out and placed directly into a labelled 2.0 ml microtube (Scientific Specialities Inc) containing 1.5 ml TE buffer pH 7.5 (10 mM Tris-Cl pH 7.5, 1 mM EDTA pH 8.0). Microtubes were rotated for 2 weeks at room temperature until the DNA was fully re-suspended. Eight samples from every batch of 92 were electrophoresed on a 1% agarose gel to test the integrity of the DNA, and all were satisfactory. DNA concentrations (ng/μl) and levels of protein and RNA contamination were determined for all samples using the NanoDrop method (Thermo Scientific). From each DNA master stock, 500 μl were transferred to a deep well plate and then normalised to 50 ng/μl to make working stock plates. The remaining 1000 μl were archived in a microtube at -40°C.
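Normalising a stock to 50 ng/μl is a standard dilution calculation (C1 × V1 = C2 × V2). The sketch below shows the arithmetic only. The 1000 μl final volume and the 400 ng/μl stock concentration are hypothetical values chosen for the example, and the function name is not taken from any laboratory software used in the study.

```python
def dilution_volumes(stock_ng_per_ul, final_volume_ul, target_ng_per_ul=50.0):
    """Return (stock_ul, buffer_ul) that give final_volume_ul at the target concentration."""
    if stock_ng_per_ul < target_ng_per_ul:
        # Dilution can only lower a concentration, never raise it.
        raise ValueError("stock is below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul  # C2*V2 / C1
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical stock at 400 ng/ul diluted to 1000 ul of working stock
stock_ul, buffer_ul = dilution_volumes(400.0, 1000.0)
print(f"{stock_ul:.0f} ul stock + {buffer_ul:.0f} ul buffer")
```

A stock measured below the target cannot be normalised by dilution, which is why the function raises an error in that case instead of returning a negative buffer volume.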
Purification and storage of plasma
Whole blood in EDTA tubes was centrifuged for 15 minutes at 2000 g to separate plasma. Two aliquots of 1 ml of plasma were dispensed into 2 × 1.4 ml tubes (Fluid X Robo-rack-96) and labelled with a printed LIMS label and barcode. Pierceable lids (TPE Capclusters) were fitted to the tubes which were stored at -80°C in 96 position racks.
A total of 5,934 participants were recruited to the study by SNBTS donor services staff. However, leucodepletion filters were not received from some participants, and these individuals were excluded as they were of no value to the study. Filters were not received by the research laboratory for a variety of reasons. These include the participant being enrolled in the study but then failing to provide a full donation of blood; the identifying marker on the filter not being noticed by the SNBTS laboratory and therefore not being diverted into the onward transport protocol after processing; or the blood failing one of the many sensitive safety tests routinely performed on it by the SNBTS.
A total of 5,230 filters which passed all tests were received by the University of Edinburgh WTCRF research lab. However, the corresponding consent and questionnaire forms were not received for every filter. This occurred for a variety of reasons including a single batch of 22 forms being lost in transit. All consents received were checked by the research team, and some were found to be invalid (unsigned or undated, etc), which meant that the participant had to be removed from the study. A small number of samples were destroyed due to problems during processing in the research lab. Together, this resulted in a total of exactly 5,000 participants with a complete set of data, valid consent and filter processed to cells ready for DNA extraction. Two of these participants withdrew from the study, after leaving the clinic and having time to reflect.
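The attrition between recruitment and a complete sample set follows directly from the counts reported above. The short sketch below only restates that arithmetic. The stage labels are informal names chosen for the example.

```python
# Counts taken from the text; stage labels are informal.
stages = [
    ("recruited", 5934),
    ("filter received", 5230),
    ("complete set", 5000),
]

recruited = stages[0][1]
# Participants lost between each pair of consecutive stages
losses = [prev[1] - cur[1] for prev, cur in zip(stages, stages[1:])]
# Percentage of recruited participants remaining at each stage
retention = [round(100 * n / recruited, 1) for _, n in stages]

for (label, n), pct in zip(stages, retention):
    print(f"{label}: {n} ({pct}% of recruited)")
```

On these figures, 704 recruits had no filter reach the research laboratory and a further 230 were lost to missing or invalid paperwork or to processing problems.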
[Table: Recorded ethnicity of GS:3D participants. Column: All GS:3D (%). Rows: White - Scottish; White - Other British; White - Irish; White - Other; Asian - Indian; Asian - Pakistani; Asian - Bangladeshi; Asian - Chinese; Asian - Other; Black - Caribbean; Black - African; Black - Other.]
The primary objective of this study was to establish a collection of samples and an accompanying database of demographic data as a resource for future analysis. Association studies, particularly genome-wide association studies using high density microarrays, are currently a widely used design for investigating the genetics underlying complex diseases [5, 6]. This approach looks for a statistical association between genetic variants (SNPs or copy number variants) and a defined phenotype. An association study is usually conducted in a population-based sample of unrelated affected and unaffected individuals (a case-control study). The selection of appropriate and sufficient control samples is therefore crucial for success [5, 7]. Ideally, control samples should reflect the ethnic and geographic (as a proxy for genetic) composition of case samples, and blood donors have been shown to be an appropriate control group for many complex conditions [5, 6]. The statistical power of case-control studies can be increased by increasing the numbers of controls.
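The effect of adding controls on power can be illustrated with a normal approximation to a two-sided test comparing allele frequencies in cases and controls. This is a rough sketch, not the analysis method of any particular study. The number of cases, the allele frequencies and the significance threshold of 5e-8 are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def allelic_power(n_cases, controls_per_case, p_case, p_control, alpha=5e-8):
    """Approximate power of a two-sided two-proportion z-test on allele counts."""
    n1 = 2 * n_cases                      # alleles sampled from cases
    n0 = 2 * n_cases * controls_per_case  # alleles sampled from controls
    pooled = (n1 * p_case + n0 * p_control) / (n1 + n0)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    se_alt = sqrt(p_case * (1 - p_case) / n1 + p_control * (1 - p_control) / n0)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The negligible chance of rejecting in the wrong direction is ignored.
    return NormalDist().cdf((abs(p_case - p_control) - z_crit * se_null) / se_alt)


# Hypothetical scenario: 2000 cases, allele frequency 0.25 in cases vs 0.20 in controls
ratios = [1, 1.5, 3, 7.5]
powers = [allelic_power(2000, r, p_case=0.25, p_control=0.20) for r in ratios]
for r, pw in zip(ratios, powers):
    print(f"{r} controls per case: power ~ {pw:.2f}")
```

With the number of cases held fixed, the approximate power rises as controls are added, although each additional control contributes less than the one before.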
Data and sample extraction and access
The GS:3D DNA samples all passed the routine quality control tests of intactness and purity described in Methods. While a few samples had low yield (less than 30 μg), the vast majority yielded over 1 mg of purified DNA. This is a significant amount of material, as each Taqman SNP genotype assay consumes up to 20 ng of DNA, and each Illumina whole genome scan (of up to 1 million markers) consumes around 200 ng of DNA. The GS:3D resource therefore should have an extremely long lifespan. If necessary, at some future point sample stocks could be replenished through the technique of whole genome amplification.
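The lifespan claim can be checked with simple arithmetic on the figures quoted above. The sketch below ignores pipetting losses and dead volume, so the results are upper bounds. The function name is illustrative.

```python
NG_PER_UG = 1000


def assays_supported(yield_ug, ng_per_assay):
    """Whole number of assays a DNA yield could supply, ignoring handling losses."""
    return (yield_ug * NG_PER_UG) // ng_per_assay


typical_ug = 1000  # 1 mg, the yield exceeded by most samples
low_ug = 30        # upper limit of the "low yield" category

print(assays_supported(typical_ug, 20))   # Taqman assays at 20 ng each
print(assays_supported(typical_ug, 200))  # whole genome scans at about 200 ng each
print(assays_supported(low_ug, 20))
print(assays_supported(low_ug, 200))
```

Even a sample at the 30 μg boundary would cover on the order of a hundred whole genome scans, while a typical 1 mg sample would cover several thousand.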
Genotyping of the DNA samples is at an early stage, but initial results show the first few hundred samples to be of good quality, with a high call rate in Taqman SNP genotyping assays (L. Murphy, WTCRF, pers. comm.). Master and working stocks of DNA and aliquots of plasma are available to researchers in the U.K. for hypothesis-driven analyses with appropriate ethical approval. An access process has been defined with reference to Wellcome Trust and Medical Research Council guidance. Access to the samples and data is reviewed by the Generation Scotland Resource Management and Development Committee, and only de-identified data can be made available. Researchers will be obliged to return derived data after an agreed time period, and to acknowledge the resource in any resulting publications.
The development of an IT infrastructure that can support genetics research is an important part of Generation Scotland. Innovative aspects of the IT infrastructure in GS:3D are primarily in the use of optical mark read (OMR) technology to collect questionnaire data and the development and implementation of web-based study management tools. A web-based interface provides password-protected access to summary statistics on the questionnaire data. Participants are identified by a unique research ID number, the production and usage of which is tightly monitored. Systems were developed to ensure efficient and confidential handling and management of all data and samples. The GS:3D database was used to record all exceptions (e.g. consent form not fully filled in, no tube received for plasma), resulting in a complete audit trail of all samples and data that were part of the study. These study management systems are outlined in Macleod et al, 2009. In due course a cumulative genotype database will be available to researchers through a web portal with appropriate permissions and security. Genotype data across the whole set of DNA samples, ideally genome wide at high density, should be available in the future.
All components of GS:3D, including the protocol and written materials provided to participants, have received ethical approval from the NHS Research Ethics Committee for Scotland A (REC reference number: 06/MRE00/105). In addition, local approval has been obtained from NHS Lothian, Glasgow and North of Scotland Research Ethics Committees, from NHS Research and Development Offices, and from an SNBTS management committee. GS:3D has been granted Research Tissue Bank status by the Tayside Committee on Medical Research Ethics B (REC Reference Number: 10/S1402/21).
The physical and administrative separation of the clinical (SNBTS) and research (University of Edinburgh) teams was designed to minimise the risk to the privacy of participants, while maximising rate of recruitment.
No personal or identifier information was given to the research team by the SNBTS and there is no mention of participation in the study on the donor service record. After a period of 28 days during which participants were able to withdraw, all links to the SNBTS identifier were broken.
It is impossible to know what precise purposes the resource will be used for in the future; therefore fully informed consent for the use of data and DNA and plasma samples cannot be obtained. Instead, "blanket" consent was sought, with ethics approval through the research tissue bank access process required for each new use.
It was emphasised that participation was entirely voluntary. Withdrawal was allowed up to 28 days after the donation session, upon which all data and samples relating to the withdrawing participant were destroyed. After the 28 day period, withdrawal was not possible because the samples and data were de-identified.
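The withdrawal rule amounts to a date check followed by removal of the link between the research ID and the SNBTS identifier. The sketch below is a minimal illustration under stated assumptions: day 28 is treated as inclusive, the link is modelled as a dictionary, and all names, IDs and dates are hypothetical and do not describe the study's actual systems.

```python
from datetime import date, timedelta

WITHDRAWAL_WINDOW = timedelta(days=28)


def can_withdraw(donation_date, request_date):
    """True while the request falls within 28 days of donation (day 28 inclusive, an assumption)."""
    return donation_date <= request_date <= donation_date + WITHDRAWAL_WINDOW


def break_expired_links(links, donation_dates, today):
    """Keep only research-ID -> SNBTS-identifier links whose withdrawal window is still open."""
    return {
        research_id: snbts_id
        for research_id, snbts_id in links.items()
        if can_withdraw(donation_dates[research_id], today)
    }


# Hypothetical records
links = {"R1": "S-111", "R2": "S-222"}
donation_dates = {"R1": date(2009, 3, 1), "R2": date(2009, 3, 20)}
remaining = break_expired_links(links, donation_dates, today=date(2009, 4, 2))
```

Once a link has been dropped, nothing in the remaining data can map a research ID back to a donor, which is why withdrawal is no longer possible after the window closes.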
GS:3D will be of particular utility to studies of Caucasian populations, but should also have wider applications, for example in testing new genotyping methodologies. The resource is complementary to other population-based genetic epidemiology studies, such as the Generation Scotland: Scottish Family Health Study and the UK Biobank, which were established primarily to characterise genes and genetic risk in the population. The model of recruitment described here differs from that of most genetic cohort studies by using the infrastructure and expertise of the Blood Transfusion Service. Other study designs usually involve research clinics, which have the advantage of allowing more study-specific data to be collected, but the disadvantage of significantly increased recruitment costs. Such models of recruitment require funding at the level of programme grants, rather than the project grant which was sufficient to implement GS:3D. Although detailed phenotyping of traits relevant to complex diseases is not available in GS:3D, inclusion of participants who would meet the criteria for a case is likely to be rare, due to the stringent blood donor exclusion criteria described in the Methods section. Furthermore, low levels of such misclassification should only have a small adverse effect on power. The Wellcome Trust Case-Control Consortium initially used a control:case ratio of 1.5:1, combining 1958 Birth Cohort and UK Blood Transfusion Service controls, but also expanded this to up to 7.5:1 by including cases for other diseases as controls. This expansion increased evidence for association at most of the loci that received the strongest support from the primary analysis.
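The small effect of misclassification can be made concrete with a simple mixture model. If a fraction m of the controls are in fact unrecognised cases, the allele frequency observed in controls shifts towards the case frequency and the detectable difference shrinks by a factor of (1 - m). The sketch below assumes this model. The allele frequencies and the 2% misclassification rate are hypothetical, and the sample size inflation of 1 / (1 - m)^2 is a rough approximation that treats the variance as unchanged.

```python
def attenuated_difference(p_case, p_control, misclassified_fraction):
    """Case-control allele frequency difference when some controls are latent cases."""
    m = misclassified_fraction
    observed_control = (1 - m) * p_control + m * p_case  # mixture of true controls and cases
    return p_case - observed_control


def sample_size_inflation(misclassified_fraction):
    """Approximate factor by which sample size must grow to recover the lost power."""
    return 1 / (1 - misclassified_fraction) ** 2


# Hypothetical values: 2% of controls are unrecognised cases
diff = attenuated_difference(p_case=0.25, p_control=0.20, misclassified_fraction=0.02)
inflation = sample_size_inflation(0.02)
print(f"difference {diff:.4f}, sample size inflation x{inflation:.3f}")
```

Under these assumptions a 2% misclassification rate would call for only about 4% more samples, which fits the expectation that its effect on power is small.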
The GS:3D study protocol has allowed the efficient generation of a new large scale resource of DNA and plasma samples. This collection is suitable for use in genetic studies of human disease and the sample size is large enough to give substantial numbers of controls selected on the basis of age range, gender, ethnic or geographic origin. The use of blood donors is a cost-effective way to collect a large number of control DNA samples. The availability of the GS:3D resource should reduce costs to investigators who would otherwise have had to recruit their own controls.
We are extremely grateful to all the participants, and the SNBTS donor services, transport and Process and Testing laboratory teams. The contribution of Carol Garnett, SNBTS National Standards Manager, was particularly valuable. We thank SNBTS Local Investigators Dr Rhona Watkins, Glasgow, Dr Sam Rawlinson, Dundee and Dr Peter Forsyth, Inverness. Samples were booked in and processed by the expert staff at the Genetics Core of the Wellcome Trust Clinical Research Facility, Edinburgh. The whole Generation Scotland team, which includes academic researchers, IT staff, laboratory technicians, clerical workers, statisticians and research managers, was vital to the success of the study. We gratefully acknowledge the statistical genetics expertise provided by Dr Andy Macleod and constructive advice on drafting the manuscript from Dr Isabel Hanson. Generation Scotland (GS) is a collaborative initiative between the Universities of Aberdeen, Dundee, Edinburgh and Glasgow and the National Health Service (NHS) Scotland. The Chief Scientist Office of the Scottish Government and the Scottish Funding Council provide core support for Generation Scotland. GS:3D was funded by a project grant from the Scottish Executive Health Department, Chief Scientist Office, grant number CZB/4/285.
- Lopez AD, Mathers CD, Ezzati M, Jamison DT, Murray CJ: Global and regional burden of disease and risk factors, 2001: systematic analysis of population health data. Lancet. 2006, 367: 1747-1757. 10.1016/S0140-6736(06)68770-9.
- Lango H, Weedon MN: What will whole genome searches for susceptibility genes for common complex disease offer to clinical practice?. J Intern Med. 2008, 263: 16-27. 10.1111/j.1365-2796.2007.01895.x.
- Hirschhorn JN, Daly MJ: Genome-wide association studies for common diseases and complex traits. Nat Rev Genet. 2005, 6: 95-108. 10.1038/nrg1521.
- McCarthy MI, Hirschhorn JN: Genome-wide association studies: potential next steps on a genetic journey. Hum Mol Genet. 2008, 17 (R2): R156-165. 10.1093/hmg/ddn289.
- Wellcome Trust Case Control Consortium: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007, 447: 661-678. 10.1038/nature05911.
- Johnson AD, O'Donnell CJ: An open access database of genome-wide association results. BMC Med Genet. 2009, 10: 6. 10.1186/1471-2350-10-6.
- McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JP, Hirschhorn JN: Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet. 2008, 9: 356-69. 10.1038/nrg2344.
- Smith BH, Campbell H, Blackwood D, Connell J, Connor M, Deary IJ, Dominiczak AF, Fitzpatrick B, Ford I, Jackson C, Haddow G, Kerr S, Lindsay R, McGilchrist M, Morton R, Murray G, Palmer CNA, Pell JP, Ralston SH, St Clair D, Sullivan F, Watt G, Wolf R, Wright A, Porteous D, Morris AD: Generation Scotland: the Scottish Family Health Study; a new resource for researching genes and heritability. BMC Medical Genetics. 2006, 7: 74. 10.1186/1471-2350-7-74.
- Macleod AK, Liewald DCM, McGilchrist MM, Morris AD, Kerr SM, Porteous DJ: Some principles and practices of genetic biobanking studies. Eur Resp J. 2009, 33: 419-425. 10.1183/09031936.00043508.
- GS:3D web page. [http://www.generationscotland.org/gs3D.htm]
- Cook MA, Jobson SE, Atkinson DC, Lowe DP, Farmer SL, Alvi-Ali WJ, Smith NA, Briggs DC: Used leucodepletion filters as a source of large quantities of DNA suitable for the study of genetic variations in human populations. Transfus Med. 2003, 13: 77-82. 10.1046/j.1365-3148.2003.00423.x.
- Clark S, Youngman LD, Palmer A, Parish S, Peto R, Collins R: Stability of plasma analytes after delayed separation of whole blood: implications for epidemiological studies. Int J Epidemiology. 2003, 32: 125-130. 10.1093/ije/dyg023.
- Lasken RS, Egholm M: Whole genome amplification: abundant supplies of DNA from precious samples or clinical specimens. Trends Biotechnol. 2003, 21: 531-535. 10.1016/j.tibtech.2003.09.010.
- Generation Scotland Access Policy web page. [http://www.generationscotland.org/access_policy.htm]
- Elliott P, Peakman TC, UK Biobank: The UK Biobank sample handling and storage protocol for the collection, processing and archiving of human blood and urine. Int J Epidemiol. 2008, 37: 234-44. 10.1093/ije/dym276.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2350/11/166/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.