Computing the Carrier Probability
Mendelian models require knowledge of which diseases each relative developed and the age at which each was diagnosed. For example, for BRCAPRO, the relevant information is the age at onset of ovarian cancer or breast cancer. Although there can be many causes of censoring [20], we restrict to a single independent non-informative censoring time: the minimum of the age of death and the last age at which the person was known to be alive (after which no information is available). In this framework, everyone is eventually censored, but disease history up to the age of censoring is observed. Denote the age at which censoring occurs for each family member $i$ ($i = 0$ is the consultand) as $U_i$.
A Mendelian model considers $D$ types of diseases that could occur. Each person has a binary vector indicating disease history, $c_i = (c_{i,1}, \ldots, c_{i,D})$, where $c_{i,k} = 1$ indicates that disease $k$ occurred at age $y_{i,k}$; let $y_i = (y_{i,1}, \ldots, y_{i,D})$ be the vector of all ages of disease occurrence. In $y_i$, the age for any disease that did not occur is irrelevant, and so can be set to 0. Denoting disease information as $T_i = \{y_i, c_i\}$, each person's history is the information $H_i = \{U_i, T_i\}$ and the full family history is the collection $H = \{H_0, H_1, \ldots\}$.
Additionally, each person can have auxiliary variables $x_i$; let $x = \{x_0, x_1, \ldots\}$. Auxiliaries can be any extra information known by the consultand, for example, environmental factors, genetic test results, or ethnicity. For example, in BRCAPRO, $x_0$ indicates if the consultand is of Ashkenazi Jewish ancestry, an ethnic group with increased prevalence of BRCA mutations. Implicitly, all probabilities in this paper will condition on $x$, so for simplicity we only explicitly show $x$ in the conditioning when useful.
Mendelian models assume that individuals independently inherit one allele from each parent at each autosomal locus and that the alleles are either normal or mutated. Let $\gamma_i \in \{0, 1\}$ indicate carrying the genotype(s) that confer(s) disease risk: for example, $\gamma_i = 1$ for a dominant trait when the member carries at least one mutant allele, but for a recessive trait $\gamma_i = 1$ implies that the relative carries two mutant alleles. We call $\gamma_i$ the carrier status. The prevalence of $\gamma_i = 1$ amongst people with consultand-specific auxiliaries $x_0$ is $\pi_x$.
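The aim of a Mendelian model is to compute the consultand's carrier probability $P(\gamma_0 = 1 \mid H, x)$. By Bayes rule, the odds of the consultand being a carrier is the product of the carrier odds in the population and the Bayes Factor (BF):
$$\frac{P(\gamma_0 = 1 \mid H, x)}{P(\gamma_0 = 0 \mid H, x)} = \frac{\pi_x}{1 - \pi_x} \times \frac{P(H \mid \gamma_0 = 1, x)}{P(H \mid \gamma_0 = 0, x)}. \tag{1}$$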
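The odds form of Bayes rule can be illustrated with a minimal Python sketch. The function name `carrier_probability` and the numeric inputs are hypothetical, chosen only to show the arithmetic of multiplying prior odds by a Bayes Factor and converting back to a probability.

```python
def carrier_probability(prevalence, bayes_factor):
    """Posterior carrier probability from population prevalence and a Bayes Factor."""
    prior_odds = prevalence / (1.0 - prevalence)       # carrier odds in the population
    posterior_odds = prior_odds * bayes_factor         # Bayes rule in odds form
    return posterior_odds / (1.0 + posterior_odds)     # convert odds back to a probability


# Hypothetical inputs: 1% prevalence and a family history with Bayes Factor 20.
example_probability = carrier_probability(0.01, 20.0)
print(f"carrier probability: {example_probability:.3f}")
```

A Bayes Factor of one leaves the prevalence unchanged, which is a quick sanity check on any implementation.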
The BF is a ratio of likelihoods. We compute the likelihood, assuming that each member's phenotype $H_i$ is independent of all other members' phenotypes $H_{-i}$ and auxiliaries $x_{-i}$ given that member's carrier status $\gamma_i$ and auxiliary variables $x_i$ [21]:
$$P(H_i \mid \gamma_i, x, H_{-i}) = P(H_i \mid \gamma_i, x_i). \tag{2}$$
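The likelihood is
$$P(H \mid \gamma_0) = \sum_{\gamma_1, \gamma_2, \ldots} P(\gamma_1, \gamma_2, \ldots \mid \gamma_0) \prod_{i \ge 0} P(H_i \mid \gamma_i, x_i). \tag{3}$$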
This depends on family history only through the contributions $P(H_i \mid \gamma_i, x_i)$, so we focus on computing these. To ease notation, auxiliaries $x_i$ are always implicitly conditioned on, and will be made explicit only when useful. For more details, a discussion of underlying assumptions, and an explicit derivation of the Bayes Factor, see [5, 21].
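How the individual contributions combine into a family likelihood can be sketched by brute-force summation over the relatives' carrier statuses. This is an illustration only: `family_likelihood`, `toy_prior`, and every number are hypothetical, the conditional genotype distribution is supplied by the caller rather than derived from Mendelian transmission, and the enumeration grows exponentially with family size, so it is not meant to reflect how a production model performs the sum.

```python
from itertools import product


def family_likelihood(contribs, relatives_prior, g0):
    """P(H | gamma_0 = g0), summing over all carrier statuses of the relatives.

    contribs[i][g]         : P(H_i | gamma_i = g), with index 0 the consultand
    relatives_prior(gs, g0): P(gamma_1, ..., gamma_n = gs | gamma_0 = g0)
    """
    n_relatives = len(contribs) - 1
    total = 0.0
    for gs in product((0, 1), repeat=n_relatives):
        term = relatives_prior(gs, g0) * contribs[0][g0]
        for i, g in enumerate(gs, start=1):
            term *= contribs[i][g]
        total += term
    return total


def toy_prior(gs, g0):
    """Hypothetical one-relative prior: carrier with prob. 0.5 if g0 = 1, else 0.01."""
    p_carrier = 0.5 if g0 == 1 else 0.01
    (g1,) = gs
    return p_carrier if g1 == 1 else 1.0 - p_carrier


# Hypothetical contributions (non-carrier value, carrier value) for two people.
contribs = [(0.9, 0.6), (0.2, 0.7)]
bayes_factor = family_likelihood(contribs, toy_prior, 1) / family_likelihood(contribs, toy_prior, 0)
print(f"Bayes Factor: {bayes_factor:.4f}")
```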
Each person's likelihood contribution $P(H_i \mid \gamma_i, x_i)$ will be computed assuming that competing risks are independent given carrier status and auxiliaries [20]. This is plausible for BRCAPRO because time to ovarian cancer and ipsi/contra-lateral breast cancers appear to be mutually independent in BRCA mutation carriers, except for dependence caused by medical interventions like oophorectomy [22, 23], and interventions are explicitly handled in this paper. Auxiliaries $x_i$ can include all information necessary to make the assumption more plausible [20]. Thus for simplicity and relevance to BRCAPRO, we restrict to independent competing risks.
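Under independent competing risks, define the hazard for each disease $k$ at age $y$ given carrier status as $\lambda_k(y \mid \gamma)$. The disease-specific survival, the probability of surviving disease $k$ to age $y$, is
$$S_k(y \mid \gamma) = \exp\left\{ -\int_0^y \lambda_k(t \mid \gamma)\, dt \right\}. \tag{4}$$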
The disease-specific density, the probability of getting disease $k$ at age $y$, is
$$f_k(y \mid \gamma) = \lambda_k(y \mid \gamma) \times S_k(y \mid \gamma). \tag{5}$$
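Each likelihood contribution $P(H_i \mid \gamma_i)$ is an ignorably right-censored survival likelihood contribution, which is the product of disease-specific densities for diseases that occurred and the disease-specific survivals for diseases that did not occur [20]. Writing $u_i$ for the observed value of $U_i$,
$$P(H_i \mid \gamma_i) = \prod_{k=1}^{D} f_k(y_{i,k} \mid \gamma_i)^{c_{i,k}}\, S_k(u_i \mid \gamma_i)^{1 - c_{i,k}}. \tag{6}$$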
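The survival, density, and likelihood contribution described above can be sketched numerically. The sketch assumes yearly piecewise-constant hazards stored in a list indexed by age, so the cumulative hazard is approximated by a sum over completed years; the function names and the constant hazard values are hypothetical, not estimates used by BRCAPRO.

```python
import math


def survival(hazard, y):
    """Disease-specific survival to age y: exp(-cumulative hazard over ages 0..y-1)."""
    return math.exp(-sum(hazard[:y]))


def density(hazard, y):
    """Disease-specific density at age y: hazard at y times survival to y."""
    return hazard[y] * survival(hazard, y)


def contribution(hazards, ages, occurred, censor_age):
    """Product of densities for diseases that occurred and survivals for those that did not."""
    value = 1.0
    for hazard, y, c in zip(hazards, ages, occurred):
        value *= density(hazard, y) if c else survival(hazard, censor_age)
    return value


# Hypothetical constant yearly hazards for two diseases, ages 0..99.
breast = [0.002] * 100
ovarian = [0.0005] * 100

# Breast cancer at age 45, no ovarian cancer, censored at age 60.
example_contribution = contribution([breast, ovarian], ages=[45, 0], occurred=[1, 0], censor_age=60)
print(f"likelihood contribution: {example_contribution:.6f}")
```

Evaluating `contribution` once with carrier hazards and once with non-carrier hazards gives the two values whose ratio enters the Bayes Factor.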
Incorporating Medical Interventions
Medical interventions censor the natural time to disease, leaving only the time to disease after intervention. Along with the pre-intervention quantities $Y_i$, $C_i$, $U_i$, there is the age of intervention $I_i$ and the post-intervention quantities: post-intervention disease types $C_i^*$, ages of disease $Y_i^*$, and censoring age $U_i^*$. Let $T_i^* = \{Y_i^*, C_i^*\}$ and let the post-intervention history be $H_i^* = \{U_i^*, T_i^*\}$. If intervention occurs, then set $U_i = I_i$. Furthermore, any genetic test results known on relatives can be included as an auxiliary $x_{\text{test}}$. Genetic test results provide important information, and Mendelian models can account for imperfect test sensitivity and specificity [5, 24].
To compute each family member's likelihood contribution including potential interventions, figure 2 shows the conditional dependencies between all pre/post-intervention quantities [25]. This graph shows the flow of information from carrier status to pre-intervention disease to intervention to post-intervention disease, and determines which quantities the contribution requires and which it can ignore. The graph does not show any quantities in the intervention decision that are obviously unrelated to carrier status, like desire for children. Such quantities can be ignored because they provide no information about carrier status. In the graph, $T_i$, $x_{\text{test}}$, $U_i$, and $H_{-i}$ are the four factors that point to choosing intervention $I_i$. But since $U_i$ does not connect back to $\gamma_i$, it provides no information on carrier status. Thus only $T_i$, $x_{\text{test}}$, and $H_{-i}$ affect a person's decision to have an intervention and could yield information about carrier status. The likelihood contributions in equation (6) clearly depend on $T_i$ and $x_{\text{test}}$. In addition, the contributions also condition on $H_{-i}$: $H_{-i}$ only disappears from (6) because of assumption (2). The expressions below will explicitly show the conditioning on $H_{-i}$ to clarify that $H_{-i}$ is accounted for by the likelihood contributions. Thus the likelihood contributions contain all quantities related to both carrier status and intervention.
Each person's likelihood contribution depends on whether intervention was chosen. First consider the contribution from a person who did not choose intervention:
$$P(I_i > u_i, H_i \mid \gamma_i, x_i, H_{-i}) = P(I_i > u_i \mid H_i, \gamma_i, x_i, H_{-i}) \times P(H_i \mid \gamma_i, x_i). \tag{7}$$
Note that the second factor is the usual contribution from equation (6), which does not handle interventions. The first factor tries to extract information about carrier status from the choice not to undergo intervention. But by figure 2, as long as the full family history and any genetic test results are known, all three paths back to $\gamma_i$ are blocked. Thus the choice of intervention carries no information about carrier status once the full family history and genetic test results are known, so the first factor is independent of $\gamma_i$ and drops out of the likelihood. The contribution if intervention was not chosen (7) is therefore the same as the contribution in equation (6), which does not consider interventions:
$$P(I_i > u_i, H_i \mid \gamma_i, x_i, H_{-i}) \propto P(H_i \mid \gamma_i, x_i).$$
Next, the contribution from a person choosing intervention at age $I$, with post-intervention history $H_i^*$, is
$$P(I_i = I, H_i, H_i^* \mid \gamma_i, x_i, H_{-i}) = P(H_i^* \mid I_i = I, H_i, \gamma_i, x_i) \times P(I_i = I \mid H_i, \gamma_i, x_i, H_{-i}) \times P(H_i \mid \gamma_i, x_i).$$
The last two factors can be treated the same as in equation (7), so the contribution is
$$\propto P(H_i^* \mid I_i = I, H_i, \gamma_i, x_i)\, P(H_i \mid \gamma_i, x_i). \tag{8}$$
The second factor is the pre-intervention contribution, and the first factor is an analogous post-intervention factor.
The post-intervention factor in (8) can be estimated from survival data. By figure 2, conditioning on $I_i$ (as the post-intervention factor does) breaks all links from the post-intervention censoring age $U_i^*$ to both the post-intervention disease information $T_i^*$ and $\gamma_i$. Thus $U_i^*$ is independent non-informative censoring given $I_i$, so standard survival analysis can estimate the post-intervention disease hazards $\lambda_k^*(y \mid I_i, H_i, \gamma_i)$. A simple way to do this is to fit a Cox model for time to disease with auxiliaries, pre-intervention disease history, and intervention age as time-dependent covariates [26]. The hazard ratios from this Cox model are multiplied with a pre-intervention hazard estimate (perhaps from the same dataset, or taken from other penetrance studies) to yield the post-intervention hazard. Then the post-intervention disease-specific survival is
$$S_k^*(y \mid I_i = I, H_i, \gamma_i) = \exp\left\{ -\int_I^y \lambda_k^*(t \mid I_i = I, H_i, \gamma_i)\, dt \right\}. \tag{9}$$
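Note that the hazards are cumulated starting from intervention age $I$. The post-intervention disease-specific density is $f_k^*(y \mid I_i, H_i, \gamma_i) = \lambda_k^*(y \mid I_i, H_i, \gamma_i) \times S_k^*(y \mid I_i, H_i, \gamma_i)$. Writing $c_{i,k}^*$, $y_{i,k}^*$, and $u_i^*$ for the post-intervention disease indicators, disease ages, and censoring age, the likelihood contribution for a person who chose intervention is
$$\left[ \prod_{k=1}^{D} f_k^*(y_{i,k}^* \mid I, H_i, \gamma_i)^{c_{i,k}^*}\, S_k^*(u_i^* \mid I, H_i, \gamma_i)^{1 - c_{i,k}^*} \right] \times P(H_i \mid \gamma_i, x_i). \tag{10}$$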
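The contributions to the Bayes Factor in equation (1) are the ratio of likelihood contributions (10) for $\gamma_i = 1$ to $\gamma_i = 0$ [20]. With $f_k^*$ and $S_k^*$ denoting the post-intervention disease-specific density and survival, the post-intervention part of this ratio is
$$\prod_{k=1}^{D} \left[ \frac{f_k^*(y_{i,k}^* \mid I, H_i, \gamma_i = 1)}{f_k^*(y_{i,k}^* \mid I, H_i, \gamma_i = 0)} \right]^{c_{i,k}^*} \left[ \frac{S_k^*(u_i^* \mid I, H_i, \gamma_i = 1)}{S_k^*(u_i^* \mid I, H_i, \gamma_i = 0)} \right]^{1 - c_{i,k}^*}. \tag{11}$$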
Note that if the hazard ratios for carriers and non-carriers are equal, then at the age of oophorectomy itself, the densities for carriers and non-carriers are equal to each other (and likewise the survivals), and thus (11) is one. At ages beyond the age of oophorectomy, the hazards start cumulating as in equation (9), the densities and survivals begin to differ, and (11) differs from one. The amount of information in oophorectomy can be measured by the ratio of the carrier hazard ratio to the non-carrier hazard ratio; the further this ratio is from one, the further (11) will be from one, and thus the more important it is to account for oophorectomy. However, a hazard ratio of one still has implications for all post-oophorectomy ages, so we stress that a ratio of one must still be accounted for.
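The way hazards cumulate from the intervention age can be sketched for a single disease. This is an illustration under stated assumptions: the post-intervention hazard is taken to be a constant hazard ratio times a yearly piecewise-constant pre-intervention hazard, the function names are hypothetical, and the pre-intervention hazard values are invented for the example. Only the hazard ratios 0.12 and 0.05 come from the ovarian/peritoneal estimates discussed in this paper.

```python
import math


def post_survival(pre_hazard, hazard_ratio, intervention_age, y):
    """Post-intervention survival to age y, cumulating hazards from the intervention age."""
    return math.exp(-hazard_ratio * sum(pre_hazard[intervention_age:y]))


def post_density(pre_hazard, hazard_ratio, intervention_age, y):
    """Post-intervention density at age y: post-intervention hazard times survival."""
    return hazard_ratio * pre_hazard[y] * post_survival(pre_hazard, hazard_ratio, intervention_age, y)


def post_factor_ratio(carrier_hazard, noncarrier_hazard, hr_carrier, hr_noncarrier,
                      intervention_age, y, occurred):
    """Carrier to non-carrier ratio of the post-intervention factor for one disease."""
    f = post_density if occurred else post_survival
    return (f(carrier_hazard, hr_carrier, intervention_age, y)
            / f(noncarrier_hazard, hr_noncarrier, intervention_age, y))


# Hypothetical yearly pre-intervention hazards, ages 0..99.
carrier = [0.01] * 100
noncarrier = [0.0005] * 100

# Oophorectomy at 40, disease-free: ratio at the intervention age and 20 years later.
ratio_at_40 = post_factor_ratio(carrier, noncarrier, 0.12, 0.05, 40, 40, occurred=False)
ratio_at_60 = post_factor_ratio(carrier, noncarrier, 0.12, 0.05, 40, 60, occurred=False)
print(ratio_at_40, ratio_at_60)
```

In this disease-free example the survival ratio equals one at the intervention age and drifts away from one as follow-up accumulates.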
Incorporate Oophorectomy into BRCAPRO
Including an intervention requires estimating post-intervention disease-specific hazards, the simplest way being multiplying each pre-intervention hazard by a hazard ratio to get post-intervention hazards. BRCAPRO uses pre-oophorectomy hazards estimated by [27]. We use the most recently estimated hazard ratios for obtaining post-oophorectomy breast cancer hazards for mutation carriers [15]. It is critical to consider all factors that could modify the appropriate hazard ratio to use. For example, [15] estimates hazard ratios for breast cancer within groups defined by pre-oophorectomy disease history, age at oophorectomy, time since oophorectomy, and by BRCA1 vs. BRCA2. Although [15] does not formally test if the hazard ratios differ within each group, we must informally assess whether the differences they found are strongly statistically significant. For example, [15] finds that those with BRCA1 mutations have a hazard ratio of 0.43 (0.29, 0.65) and those with BRCA2 mutations a hazard ratio of 0.57 (0.28, 1.15). Since these two estimates must be nearly independent, we can calculate a p-value of 0.50 for the difference in hazard ratios between the two loci; thus we are justified in using the same hazard ratio for both loci. For age at oophorectomy, we cannot calculate a p-value, but the degree of overlap of the confidence intervals between age ranges in [15] suggests that the differences in the hazard ratio are probably statistically insignificant, and thus justifies the use of a common hazard ratio over all ages. The overall hazard ratio found by [15] was 0.46 (0.32, 0.65), but for 15 years after oophorectomy, they find a hazard ratio of 1.30 (0.51, 3.30). The two intervals overlap, but it's possible that the two are significantly different. However, neither [16] nor [28] noticed this in their data, and the 1.30 hazard ratio is estimated quite imprecisely, making us hesitant to use this in a model for clinical decision-making. 
Also, if only a few modifying factors exist, using a single hazard ratio for everyone is advantageous because this overall hazard ratio would be most precisely estimated and is relevant for most consultands. Thus we use the overall hazard ratio of 0.46.
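The informal comparison of the BRCA1 and BRCA2 hazard ratios can be reproduced approximately with a two-sided z-test on the log hazard ratios. The sketch assumes the reported intervals are 95% confidence intervals that are symmetric on the log scale and that the two estimates are independent; the function names are hypothetical, and this is not necessarily the exact calculation behind the quoted p-value.

```python
import math


def log_hr_se(lower, upper, z=1.96):
    """Standard error of a log hazard ratio, recovered from its confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def hr_difference_p_value(hr_a, ci_a, hr_b, ci_b):
    """Two-sided p-value for the difference of two independent log hazard ratios."""
    diff = math.log(hr_a) - math.log(hr_b)
    se = math.hypot(log_hr_se(*ci_a), log_hr_se(*ci_b))   # SE of the difference
    z_stat = abs(diff) / se
    return math.erfc(z_stat / math.sqrt(2))               # equals 2 * (1 - Phi(z_stat))


# BRCA1: 0.43 (0.29, 0.65); BRCA2: 0.57 (0.28, 1.15), as reported from [15].
p_value = hr_difference_p_value(0.43, (0.29, 0.65), 0.57, (0.28, 1.15))
print(f"p = {p_value:.2f}")
```

Under these assumptions the result rounds to the p-value of 0.50 quoted in the text.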
For oophorectomy and ovarian/peritoneal cancers among mutation carriers, we combined three papers: one retrospective study reports an overall hazard ratio of 0.04 with CI (0.01,0.16) [16] and two prospective studies report 0.15 with CI (0.02,1.31) [28] and 0.20 with CI (0.07,0.58) [11]. Since all references combine ovarian cancer and primary peritoneal carcinoma into a single endpoint, separate effects of oophorectomy on each cancer cannot be estimated, so we also treat them as a single endpoint. All papers report that these hazard ratios do not depend on pre-oophorectomy disease history, age at oophorectomy, time since oophorectomy, or BRCA1 vs. BRCA2 status. We combine these three results with a fixed-effect meta-analysis [29], averaging the log hazard ratios weighted by their inverse variances, yielding an estimate of 0.12 with CI (0.05,0.25).
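The pooled estimate can be reproduced approximately with an inverse-variance fixed-effect calculation on the log scale. The sketch assumes the reported intervals are 95% confidence intervals that are symmetric on the log hazard ratio scale; the function name `fixed_effect_meta` is hypothetical.

```python
import math


def fixed_effect_meta(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling of hazard ratios given as (hr, lower, upper)."""
    total_weight = 0.0
    weighted_sum = 0.0
    for hr, lower, upper in estimates:
        se = (math.log(upper) - math.log(lower)) / (2 * z)   # SE of the log hazard ratio
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * math.log(hr)
    pooled_log = weighted_sum / total_weight
    pooled_se = 1.0 / math.sqrt(total_weight)
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se))


# The three carrier studies cited in the text: [16], [28], [11].
studies = [(0.04, 0.01, 0.16), (0.15, 0.02, 1.31), (0.20, 0.07, 0.58)]
pooled_hr, pooled_lower, pooled_upper = fixed_effect_meta(studies)
print(f"{pooled_hr:.2f} ({pooled_lower:.2f}, {pooled_upper:.2f})")
```

Under these assumptions the pooled values round to the 0.12 with CI (0.05, 0.25) reported in the text.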
Unfortunately, there are no comparable studies of oophorectomy and breast cancer in BRCA non-carriers, only studies that mix carriers with non-carriers. A population-based study found a hazard ratio of 0.50 [30] and another study among women with a family history of breast cancer reported a hazard ratio of 0.41 [31]. Since these ratios are close to the carrier ratio, we set non-carriers to have the same breast cancer hazard ratio of 0.46 as the carriers. For ovarian/peritoneal cancer, the only comparable study among non-carriers reports a hazard ratio of 0.05 with CI (0.01,0.22) [19]. Although [19] does not report a p-value testing for different effects of oophorectomy in carriers vs. non-carriers, since their two estimates should be nearly independent, we calculate p = 0.048 for the difference between the two effects. Thus we set the non-carrier hazard ratio for ovarian/peritoneal cancer after oophorectomy to 0.05.