Study population and sample design
The Baependi Heart Study is a longitudinal genetic epidemiological study of cardiovascular disease risk factors. Baseline enrollment occurred between December 2005 and January 2006 and selected 1,857 individuals from 95 families residing in the municipality of Baependi, a city in the Southeast of Brazil. Probands were identified from the community at large in several stages. Eleven of the twelve census districts were selected for study, and residential addresses within each district were chosen at random (first by randomly selecting a street, then by randomly selecting a household). Only individuals aged 18 and older living in the selected household were eligible to participate in the study.
Once a proband was enrolled, all of his/her first-degree (parents, siblings, and offspring), second-degree (half-siblings, grandparents/grandchildren, uncles/aunts, nephews/nieces, and double cousins), and third-degree (first cousins, great-uncles/great-aunts, and great-nephews/great-nieces) relatives, as well as the relatives of his/her spouse, who were at least 18 years old were invited to participate. After the first contact with the proband, the first-degree relatives were invited to participate by phone, including all living relatives in the city of Baependi (urban and rural areas) and surrounding cities. To recruit participants, the study was advertised through provincial, religious, and municipal authorities, on local television, in newspapers and radio messages, through physicians, and by phone calls. For physical examination, a clinic was established in an easily accessible sector of Baependi.
Information regarding family relationships, sociodemographic characteristics, medical history, and environmental risk factors such as physical activity, smoking habit, and alcohol use was collected through a questionnaire answered by each participant. The questionnaire was based on the WHO-MONICA epidemiological instrument, and it was administered and filled out by research assistants specially trained for this task.
The study protocol was approved by the ethics committee of the Hospital das Clínicas, University of São Paulo, Brazil, and each subject provided informed written consent before participation.
The smoking profile of this population was delineated through four dimensions: smoking initiation, persistence, quantity (measured as average daily cigarette consumption), and age-at-onset of regular cigarette use. No other form of tobacco (pipe, cigar, etc.) was considered.
Smoking data were collected through five general questions. Two of these questions concerned smoking cessation and exposure to cigarette smoke, but these data were not used in this study.
The first question assessed the smoking status of the individuals through three possible choices. "Did you already smoke cigarettes?" (1) Yes, in the past, but not currently; (2) Yes, and I still smoke; (3) I do not smoke. Option (2) refers to regular cigarette use, and the three choices thus correspond to (1) former smokers, (2) current smokers, and (3) non-smokers, respectively. Individuals who never smoked, or who tried smoking a few times but never smoked regularly, were classified as non-smokers. The second question concerned the age-at-onset of regular smoking: "How old were you when you started smoking regularly?" The last question assessed the average daily consumption of cigarettes: "How many cigarettes do you/did you use to smoke per day?" The last two questions were answered by former and current smokers.
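The coding of these three questionnaire items into analysis traits can be sketched as follows. This is only an illustration: the function name, the dictionary keys, and the choice of coding current smokers as 1 for persistence are assumptions, not the study's actual data dictionary.

```python
# Hypothetical coding of the three questionnaire items described above.
# Names and 0/1 codes are illustrative assumptions, not the study's own.

FORMER, CURRENT, NON_SMOKER = 1, 2, 3  # answer options (1), (2), (3)


def smoking_traits(status, onset_age=None, cigs_per_day=None):
    """Derive the analysis traits from one participant's answers."""
    if status not in (FORMER, CURRENT, NON_SMOKER):
        raise ValueError(f"unknown smoking status code: {status}")
    ever = status in (FORMER, CURRENT)
    return {
        # Initiation: ever smokers (1) versus never smokers (0).
        "initiation": 1 if ever else 0,
        # Persistence: current (1) versus former (0); undefined for
        # non-smokers, who do not enter this contrast.
        "persistence": (1 if status == CURRENT else 0) if ever else None,
        # Onset age and quantity apply to former and current smokers only.
        "onset_age": onset_age if ever else None,
        "cigs_per_day": cigs_per_day if ever else None,
    }
```

Returning `None` for non-smokers keeps them out of the persistence and quantity analyses without dropping them from the initiation analysis.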
Smoking initiation and persistence were analyzed as dichotomous variables, contrasting ever versus never smokers and former versus current smokers, respectively. Smoking quantity, the average number of cigarettes smoked per day, was analyzed as a continuous variable. A natural log-transformation was applied to this trait so that it satisfied the normality assumption required for the analysis. The skewness and kurtosis statistics after natural log-transformation were -0.39 and 2.31, respectively.
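A minimal sketch of the transformation and the two shape statistics is given here. The source does not say which estimators were used or whether the reported kurtosis is on the scale where a normal distribution equals 3, so the moment-based, non-excess definitions in this example are an assumption.

```python
import math


def log_transform(values):
    """Natural log of each (strictly positive) daily cigarette count."""
    return [math.log(v) for v in values]


def skewness_kurtosis(values):
    """Moment-based skewness and non-excess kurtosis (normal = 3).

    Assumption: population (biased) moments; the study's software may
    apply small-sample corrections.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("variance is zero; shape statistics undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2
```

For example, `skewness_kurtosis(log_transform(cigs))` with `cigs` a list of daily counts for ever-smokers returns the pair of statistics on the log scale.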
Familial correlations were computed with the pairwise weighting scheme for all main types of relative pairs available in the pedigrees, using the FCOR program of the SAGE package.
Polygenic heritability estimates for smoking initiation, persistence, and quantity were calculated using the variance-components approach contained in the SOLAR package. In the variance component model, the level of the trait for individual $i$ (denoted by $\gamma_i$) is modeled as:

$$\gamma_i = \mu + \sum_j \beta_j X_{ij} + g_i + e_i$$

where $\mu$ is the general mean of the trait, and $\beta_j$ is the regression coefficient for covariate $j$, when applicable, which assumes the value $X_{ij}$ for individual $i$. The remaining parameters $g_i$ and $e_i$ are the residual genetic effect due to the polygenic term and the random error component, respectively. The random effects $g_i$ and $e_i$ are assumed to be uncorrelated and normally distributed with mean zero and variances $\sigma_g^2$ and $\sigma_e^2$, respectively. As usual, the error component is unique to each individual, whereas the polygenic component is shared between individuals in proportion to their kinship coefficient. Thus, the covariance between traits for individuals $i$ and $i'$ is given by:

$$\operatorname{Cov}(\gamma_i, \gamma_{i'}) = \begin{cases} \sigma_g^2 + \sigma_e^2, & i = i' \\ 2\phi_{ii'}\,\sigma_g^2, & i \neq i' \end{cases}$$

The parameter $2\phi_{ii'}$ is the coefficient of relationship between individuals $i$ and $i'$. The likelihood of the traits of family members is assumed to follow a multivariate normal distribution. Estimates of the mean and variance components were obtained using maximum likelihood methods.
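To make the covariance structure concrete, the following sketch builds the trait covariance matrix for a small pedigree from its kinship coefficients and computes the heritability as the polygenic share of the total variance. It is not SOLAR code; the function names, the example family, and the variance values are hypothetical, and the kinship matrix is assumed to carry 0.5 on the diagonal (non-inbred individuals).

```python
def trait_covariance(kinship, var_g, var_e):
    """Covariance of the trait among pedigree members.

    kinship[i][j] is the kinship coefficient phi between members i and j
    (0.5 on the diagonal for non-inbred individuals, by assumption).
    The polygenic variance is shared in proportion to 2 * phi; the error
    variance is added on the diagonal only.
    """
    n = len(kinship)
    return [
        [2.0 * kinship[i][j] * var_g + (var_e if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


def heritability(var_g, var_e):
    """Polygenic heritability: polygenic variance over total variance."""
    return var_g / (var_g + var_e)


# Hypothetical nuclear family: father, mother, and two children.
# Unrelated spouses have phi = 0; parent-child and full sibs have 0.25.
kinship = [
    [0.50, 0.00, 0.25, 0.25],
    [0.00, 0.50, 0.25, 0.25],
    [0.25, 0.25, 0.50, 0.25],
    [0.25, 0.25, 0.25, 0.50],
]
cov = trait_covariance(kinship, var_g=0.6, var_e=0.4)
```

With covariates in the model, the same ratio describes the share of the residual variance, after covariate adjustment, that is attributable to the polygenic term.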
Two models were fitted to the smoking initiation, persistence, and quantity data: one with no covariate effects (model I), and one including age, sex, age², and age-by-sex interaction effects simultaneously (model II). In all analyses, the covariate age represents the age at the time of interview.
Household group analyses were also performed using the SOLAR system. An additional variance parameter was added to model the effect of common environment, which captures any non-genetic factor shared between individuals living in the same household at the time of the study. Using current residential addresses to define households, we obtained 740 nuclear families from the 95 families of the Baependi Heart Study. Household effects were investigated in both polygenic models (models I and II).
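One common way to write the household extension is shown here as a sketch. The source does not give this formula: the symbols $\sigma_c^2$ (common-environment variance) and $h_{ii'}$ (an indicator equal to 1 when individuals $i$ and $i'$ live in the same household, and 0 otherwise) are assumed notation, while $\sigma_g^2$, $\sigma_e^2$, and $2\phi_{ii'}$ denote the polygenic variance, the error variance, and the coefficient of relationship.

$$\operatorname{Cov}(\gamma_i, \gamma_{i'}) = \begin{cases} \sigma_g^2 + \sigma_c^2 + \sigma_e^2, & i = i' \\ 2\phi_{ii'}\,\sigma_g^2 + h_{ii'}\,\sigma_c^2, & i \neq i' \end{cases}$$

Under this form, relatives who share a household are more alike than their kinship alone predicts, and unrelated household members such as spouses have covariance $\sigma_c^2$.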
Models with distinct genetic and environmental variance components were also employed to evaluate the evidence of heterogeneity between genders in the heritability estimates of smoking initiation, persistence, and quantity, following the method described by Giolo et al. Assuming that the phenotypes in males and females are influenced by the same set of genes with distinct effects in each gender, models I and II, described above, were fitted to the data. Again, the covariate age represents the age at the time of interview. Four situations regarding the genetic and environmental variance components in the two genders were considered: homogeneity in both variance components; heterogeneity in at least one of the variance components; heterogeneity only in the environmental variance components; heterogeneity only in the genetic variance components. Likelihood ratio tests were applied to identify the models that best fit the data.
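The likelihood ratio comparison between two nested models can be sketched as follows. The sketch assumes the usual chi-square reference distribution with degrees of freedom equal to the number of constrained parameters; the source does not state the reference distribution used, and tests of a variance component against zero would need a boundary-adjusted mixture instead. The function name and the log-likelihood values in the usage note are hypothetical.

```python
import math


def lrt_p_value(loglik_null, loglik_alt, df):
    """Likelihood ratio statistic and chi-square p-value for nested models.

    loglik_null: maximized log-likelihood of the constrained model
                 (e.g. equal variance components in both genders).
    loglik_alt:  maximized log-likelihood of the unconstrained model.
    df:          number of constraints; only 1 and 2 are handled here,
                 where the chi-square tail has a closed form.
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)
    elif df == 2:
        p = math.exp(-stat / 2.0)             # P(chi2_2 > stat)
    else:
        raise ValueError("this sketch only covers df = 1 or df = 2")
    return stat, p
```

For instance, `lrt_p_value(-100.0, -97.0, df=2)` compares a model with homogeneity in both variance components against one in which both may differ by gender.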
The variance-components approach is not appropriate for analyzing age-at-onset data because of the presence of censored observations. For age-at-onset of regular cigarette use, we used the random effects Cox proportional hazards model proposed by Pankratz et al. (2005), implemented in the coxme function of the R kinship library.
We fitted two mixed-effects proportional hazards models to assess the genetic and shared family environmental factors influencing the age-at-onset of regular cigarette use. The first model includes a polygenic effect shared by individuals within the family according to their degree of relationship. The second model simultaneously includes both shared polygenic and shared family environmental effects. These same models were also employed to analyze the heterogeneity in variance components by gender. In all analyses, the covariate sex was included as a fixed effect in the model. Confidence intervals of the variance components were computed with a profile-likelihood method, which reduces the log-likelihood to a function of the single parameter of interest by treating the others as nuisance parameters and maximizing over them. R code was used to obtain these confidence intervals.
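The profile-likelihood recipe can be illustrated on a much simpler model than the mixed-effects Cox model. In this toy sketch, which is not the study's R code, the parameter of interest is the variance of a normal sample and the mean is the nuisance parameter; the 95% interval collects the variance values whose profile log-likelihood lies within half the chi-square (1 df) quantile of the maximum. All names are hypothetical.

```python
import math

CHI2_1_95 = 3.841458820694124  # 95% quantile of chi-square with 1 df


def profile_loglik(var, data):
    """Normal log-likelihood at `var`, maximized over the nuisance mean."""
    n = len(data)
    mean = sum(data) / n  # maximizer over the mean for any value of var
    ss = sum((x - mean) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi * var) - ss / (2.0 * var)


def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def profile_ci(data):
    """Maximum likelihood estimate and 95% profile-likelihood interval."""
    n = len(data)
    mean = sum(data) / n
    var_hat = sum((x - mean) ** 2 for x in data) / n
    top = profile_loglik(var_hat, data)

    def drop(v):
        # Positive outside the interval, negative inside it.
        return 2.0 * (top - profile_loglik(v, data)) - CHI2_1_95

    lower = bisect(drop, var_hat * 1e-3, var_hat)
    hi = 2.0 * var_hat
    while drop(hi) < 0:  # widen the bracket until it crosses the cutoff
        hi *= 2.0
    upper = bisect(drop, var_hat, hi)
    return var_hat, lower, upper
```

In the mixed-effects Cox setting the inner maximization has no closed form, so each evaluation of the profile would require refitting the model with the variance component held fixed; the interval search itself follows the same pattern.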
It is not possible to obtain direct heritability estimates from mixed-effects Cox models, since there is no random error variance component. However, the estimated variance components may be interpreted as measures of familial aggregation. The relative risk of the smoking behavior that corresponds to a random effect is obtained by exponentiating the square root of its variance component. Relative risks for each covariate were obtained by exponentiating their regression coefficients.
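Both conversions are one-line computations, sketched here with hypothetical function names. Any numeric inputs a reader tries are illustrative and not estimates from the study.

```python
import math


def random_effect_relative_risk(variance):
    """Relative risk for a random effect: exp of the square root of its
    variance component, i.e. the risk ratio for a one-standard-deviation
    shift in the random effect."""
    if variance < 0:
        raise ValueError("a variance component cannot be negative")
    return math.exp(math.sqrt(variance))


def covariate_relative_risk(beta):
    """Relative risk for a fixed covariate: exp of its coefficient."""
    return math.exp(beta)
```

A variance component of zero gives a relative risk of 1, that is, no familial aggregation attributable to that random effect.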