This document accompanies the publication “A method to estimate probability of disease and vaccine efficacy from clinical trial immunogenicity data”. It describes how the PoDBAY package is applied to PoDBAY efficacy estimation, using examples based on clinical trial data.
The goal of the PoDBAY efficacy estimation analysis is to:
We describe two application scenarios for PoDBAY efficacy estimation:
PoDBAY efficacy is estimated in two consecutive steps, as described in the Methods section of the publication.
Notes:
The PoD curve (point estimate together with confidence intervals) is estimated in three steps; further details can be found in the Methods section of the publication.
Titers of all diseased and all non-diseased subjects are used to estimate the PoD curve parameters, yielding the estimates $p_{max}^`$, $et_{50}^`$ and $\gamma^`$.
Titers of all diseased and all non-diseased subjects are pooled and bootstrapped. For each bootstrapped titer, the probability of disease is calculated from the PoD curve with parameters $p_{max}^`$, $et_{50}^`$ and $\gamma^`$, and a new disease status is assigned to each titer based on that probability.
Titers of all new diseased and all new non-diseased subjects are used to re-estimate the PoD curve parameters, yielding the estimates $p_{max}^{``}$, $et_{50}^{``}$ and $\gamma^{``}$.
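To make the estimation in step 1 (and its repetition in step 3) more concrete, here is a minimal base-R sketch of a maximum-likelihood fit of the PoD curve. It assumes the curve form $PoD(T) = p_{max} / (1 + (T/et_{50})^{\gamma})$ on log2 titers, as we read it from the Methods section of the publication, and a Bernoulli likelihood for disease status. The helpers podCurve and negLogLik and the mock data are hypothetical, and PoDBAY's internal estimation may differ in parametrisation, optimiser and handling of titers at or below zero.

```r
# Minimal sketch, NOT PoDBAY's internal code: maximum-likelihood fit of an
# assumed PoD curve pmax / (1 + (T / et50)^slope) with a Bernoulli likelihood.
set.seed(1)

podCurve <- function(titer, pmax, et50, slope) {
  # Assumption: titers at or below 0 are treated as 0, which gives PoD = pmax
  t <- ifelse(titer > 0, titer, 0)
  pmax / (1 + (t / et50)^slope)
}

negLogLik <- function(par, titers, status) {
  # Unconstrained parametrisation keeps 0 < pmax < 1, et50 > 0 and slope > 0
  p <- podCurve(titers,
                pmax  = plogis(par[1]),
                et50  = exp(par[2]),
                slope = exp(par[3]))
  -sum(dbinom(status, size = 1, prob = p, log = TRUE))
}

# Hypothetical mock trial: 2,000 log2 titers, disease drawn from a known curve
titers <- c(rnorm(1000, mean = 7, sd = 2), rnorm(1000, mean = 5, sd = 2))
status <- rbinom(length(titers), size = 1,
                 prob = podCurve(titers, pmax = 0.0343, et50 = 6.05, slope = 28.5))

# Nelder-Mead (optim's default) tolerates Inf values of the objective
fit <- optim(c(qlogis(0.05), log(6), log(10)), negLogLik,
             titers = titers, status = status,
             control = list(maxit = 2000))

estimate <- c(pmax  = plogis(fit$par[1]),
              et50  = exp(fit$par[2]),
              slope = exp(fit$par[3]))
estimate
```

With only a few dozen cases among 2,000 subjects the estimates are noisy, which is why PoDBAY repeats the estimation and reports intervals rather than a single fit.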
Diseased and non-diseased subject-level data are required. We'll use the PoDBAY::diseased and PoDBAY::nondiseased mock-up data. Both datasets contain population summary statistics (N, mean, sd) and individual subject-level data (log2 titers, disease status (DS)).
Only the individual subject level data (log2 titers, DS) are used for the PoD curve estimation as described above.
library(PoDBAY)
data(diseased)
data(nondiseased)
str(diseased)
#> Reference class 'Population' [package ".GlobalEnv"] with 8 fields
#> $ N : int 35
#> $ mean : num 3.83
#> $ stdDev : num 1.66
#> $ unknownDistribution: logi FALSE
#> $ UDFunction :function ()
#> $ titers : Named num [1:35] 5.59 6.07 2.43 5.84 6.29 ...
#> ..- attr(*, "names")= chr [1:35] "vacc" "vacc" "vacc" "vacc" ...
#> $ PoDs : num(0)
#> $ diseaseStatus : logi [1:35] TRUE TRUE TRUE TRUE TRUE TRUE ...
#> and 24 methods, of which 10 are possibly relevant:
#> assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#> getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
str(nondiseased)
#> Reference class 'Population' [package ".GlobalEnv"] with 8 fields
#> $ N : int 1965
#> $ mean : num 6.01
#> $ stdDev : num 2.3
#> $ unknownDistribution: logi FALSE
#> $ UDFunction :function ()
#> $ titers : Named num [1:1965] 5.75 7.37 5.33 10.19 7.66 ...
#> ..- attr(*, "names")= chr [1:1965] "vacc" "vacc" "vacc" "vacc" ...
#> $ PoDs : num(0)
#> $ diseaseStatus : logi [1:1965] FALSE FALSE FALSE FALSE FALSE FALSE ...
#> and 24 methods, of which 10 are possibly relevant:
#> assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#> getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
Note: To convert your data into a Population class object, use the generatePopulation() function from the PoDBAY package. See vignette("population", package = "PoDBAY") for further details.
Once the data are prepared, the PoDParamEstimation function is used to estimate the PoD curve parameters in the three steps described above. For more details on its usage, see the examples in ?PoDParamEstimation.
estimatedParameters <- PoDParamEstimation(diseasedTiters = diseased$titers,
nondiseasedTiters = nondiseased$titers,
nondiseasedGenerationCount = nondiseased$N,
repeatCount = 50)
Step 1: $p_{max}^`$, $et_{50}^`$ and $\gamma^`$
Results of the first step of the PoD-titer relationship estimation are stored in estimatedParameters$resultsPriorReset.
#> # A tibble: 50 × 3
#> pmax slope et50
#> <dbl> <dbl> <dbl>
#> 1 0.0343 28.5 6.05
#> 2 0.0343 28.5 6.05
#> 3 0.0343 28.5 6.05
#> 4 0.0343 28.5 6.05
#> 5 0.0343 28.5 6.05
#> 6 0.0343 28.5 6.05
#> 7 0.0343 28.5 6.05
#> 8 0.0343 28.5 6.05
#> 9 0.0343 28.5 6.05
#> 10 0.0343 28.5 6.05
#> # ℹ 40 more rows
Note that the parameter estimates are identical across all repeatCount iterations. This is expected, because step 1 of this example uses the same diseased and non-diseased subjects in every iteration.
Step 2: Bootstrap and re-assignment of disease status
Titers of all diseased and all non-diseased subjects are pooled and bootstrapped. For each bootstrapped titer, the probability of disease is calculated from the PoD curve with parameters $p_{max}^`$, $et_{50}^`$ and $\gamma^`$, and a new disease status is assigned to each titer based on that probability.
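A base-R sketch of this bootstrap and re-assignment step, run on hypothetical mock titers with the same group sizes as PoDBAY::diseased and PoDBAY::nondiseased. The PoD curve form and the helper podCurve are our assumptions, and the parameter values are simply the step 1 estimates printed above; this is not PoDBAY's internal code.

```r
# Sketch of step 2, NOT PoDBAY's internal code. Assumes the PoD curve form
# pmax / (1 + (T / et50)^slope) and uses hypothetical mock titers.
set.seed(2)

podCurve <- function(titer, pmax, et50, slope) {
  t <- ifelse(titer > 0, titer, 0)   # assumption: titers <= 0 give PoD = pmax
  pmax / (1 + (t / et50)^slope)
}

# Hypothetical mock data with the group sizes of PoDBAY::diseased / nondiseased
diseasedTiters    <- rnorm(35,   mean = 3.83, sd = 1.66)
nondiseasedTiters <- rnorm(1965, mean = 6.01, sd = 2.30)

# Pool all titers and draw a non-parametric bootstrap sample of the same size
pooled <- c(diseasedTiters, nondiseasedTiters)
boot   <- pooled[sample.int(length(pooled), replace = TRUE)]

# Re-assign disease status: each titer becomes a case with probability PoD(titer),
# evaluated at the step 1 estimates shown above
pod       <- podCurve(boot, pmax = 0.0343, et50 = 6.05, slope = 28.5)
newStatus <- runif(length(boot)) < pod

newDiseased    <- boot[newStatus]
newNondiseased <- boot[!newStatus]
c(diseased = length(newDiseased), nondiseased = length(newNondiseased))
```

Because both the resampling and the status draw are random, the number of new cases varies between iterations, which is what makes the step 3 estimates differ.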
Step 3: $p^{``}_{max}$, $et^{``}_{50}$ and $\gamma^{``}$
Results of the third step of the PoD-titer relationship estimation are stored in estimatedParameters$results.
#> # A tibble: 50 × 3
#> pmax slope et50
#> <dbl> <dbl> <dbl>
#> 1 0.0349 31.4 5.89
#> 2 0.0361 26.9 6.37
#> 3 0.0371 28.5 6.01
#> 4 0.0288 28.5 6.31
#> 5 0.0321 33.2 6.02
#> 6 0.0331 27.4 5.98
#> 7 0.0464 16.3 5.66
#> 8 0.0479 17.7 5.69
#> 9 0.0378 28.5 6.01
#> 10 0.0324 29.5 6.07
#> # ℹ 40 more rows
The non-parametric bootstrap described in step 2 is applied inside the function, so in this case the estimated PoD curve parameters differ from iteration to iteration.
The point estimate of the PoD curve parameters, which represents the PoD-titer relationship, is computed from the step 1 results (estimatedParameters$resultsPriorReset).
The 95% confidence intervals of the PoD curve parameters are computed from the step 3 results (estimatedParameters$results).
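As a rough illustration of how the two result sets could be summarised, the sketch below takes the median of the step 1 rows as the point estimate and a 2.5%/97.5% percentile interval of the step 3 rows as the 95% confidence interval. Both summary rules are assumptions on our part (PoDBAY's own functions may summarise differently), and only the ten rows printed above are used, so the resulting interval is illustrative only.

```r
# Illustration only: summary rules (median, percentile interval) are assumptions.
# Values are the first 10 rows printed above for step 1 and step 3.
step1 <- data.frame(pmax  = rep(0.0343, 10),
                    slope = rep(28.5, 10),
                    et50  = rep(6.05, 10))
step3 <- data.frame(
  pmax  = c(0.0349, 0.0361, 0.0371, 0.0288, 0.0321,
            0.0331, 0.0464, 0.0479, 0.0378, 0.0324),
  slope = c(31.4, 26.9, 28.5, 28.5, 33.2, 27.4, 16.3, 17.7, 28.5, 29.5),
  et50  = c(5.89, 6.37, 6.01, 6.31, 6.02, 5.98, 5.66, 5.69, 6.01, 6.07)
)

# Point estimate: column-wise median of the (identical) step 1 rows
pointEstimate <- sapply(step1, median)

# 95% percentile interval: 2.5% and 97.5% quantiles of each step 3 column
ci95 <- sapply(step3, quantile, probs = c(0.025, 0.975))

pointEstimate
ci95
```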
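PoDBAY efficacy (point estimate together with confidence intervals) is estimated in three steps; further details can be found in the Methods section of the publication.
Step 1: The efficacy point estimate is calculated from the PoD curve point estimate (PoDParamsPointEst, from step 1, PoD-titer relationship estimation, Trial A) and the vaccinated and control population summary statistics.
Step 2: The population means are jittered by drawing from their sampling distributions.
Step 3: The efficacy set is calculated from the set of PoD curve parameter estimates (estimatedParameters$results, from step 1, PoD-titer relationship estimation, Trial A), the jittered means (from step 2) and the standard deviations.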
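The efficacy point estimate of step 1 can be sketched as follows, assuming that log2 titers in each arm are normally distributed with the given mean and standard deviation, and that efficacy is one minus the ratio of the expected PoD in the vaccinated arm to the expected PoD in the control arm. The PoD curve form, the treatment of titers at or below zero, and the helper names (podCurve, expectedPoD, efficacy) are our assumptions; efficacyComputation may be implemented differently. The inputs are the PoDBAY::vaccinated and PoDBAY::control summary statistics and the PoD curve point estimate reported in this vignette.

```r
# Sketch, NOT PoDBAY's efficacyComputation:
# VE = 1 - E[PoD | vaccinated] / E[PoD | control], log2 titers ~ N(mean, sd^2)
podCurve <- function(titer, pmax, et50, slope) {
  # Assumption: titers at or below 0 are treated as 0, which gives PoD = pmax
  t <- ifelse(titer > 0, titer, 0)
  pmax / (1 + (t / et50)^slope)
}

# Expected PoD in a population whose log2 titers follow N(mean, sd^2)
expectedPoD <- function(params, mean, sd) {
  integrand <- function(x) {
    podCurve(x, params[["pmax"]], params[["et50"]], params[["slope"]]) *
      dnorm(x, mean = mean, sd = sd)
  }
  # +/- 10 sd covers essentially all of the normal probability mass
  integrate(integrand, lower = mean - 10 * sd, upper = mean + 10 * sd)$value
}

efficacy <- function(params, means, sds) {
  1 - expectedPoD(params, means$vaccinated, sds$vaccinated) /
      expectedPoD(params, means$control, sds$control)
}

# Summary statistics of PoDBAY::vaccinated / PoDBAY::control and the PoD curve
# point estimate reported in this vignette
params <- c(pmax = 0.03430108, slope = 28.5166, et50 = 6.051696)
means  <- list(vaccinated = 7, control = 5)
sds    <- list(vaccinated = 2, control = 2)

ve <- efficacy(params, means, sds)
ve
```

With these inputs the sketch should land in the same region as the EfficacyPointEst reported in the results of this example, although small differences are to be expected if PoDBAY integrates or handles low titers differently.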
Vaccinated and control population summary statistics (N, mean, sd) are required. We'll use the PoDBAY::vaccinated and PoDBAY::control mock-up data. Both datasets contain population summary statistics (N, mean, sd) and individual subject-level log2 titers.
Only the population summary statistics (N, mean, sd) are used for the PoDBAY efficacy estimation as described above.
data(vaccinated)
data(control)
str(vaccinated)
#> Reference class 'Population' [package ".GlobalEnv"] with 8 fields
#> $ N : num 1000
#> $ mean : num 7
#> $ stdDev : num 2
#> $ unknownDistribution: logi FALSE
#> $ UDFunction :function ()
#> $ titers : num [1:1000] 5.75 7.37 5.33 10.19 7.66 ...
#> $ PoDs : num [1:1000] 0.0137 0.00311 0.01952 0.00034 0.00241 ...
#> $ diseaseStatus : logi [1:1000] FALSE FALSE FALSE FALSE FALSE FALSE ...
#> and 24 methods, of which 10 are possibly relevant:
#> assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#> getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
str(control)
#> Reference class 'Population' [package ".GlobalEnv"] with 8 fields
#> $ N : num 1000
#> $ mean : num 5
#> $ stdDev : num 2
#> $ unknownDistribution: logi FALSE
#> $ UDFunction :function ()
#> $ titers : num [1:1000] 7.27 7.22 3.26 5.42 5.14 ...
#> $ PoDs : num [1:1000] 0.00339 0.00354 0.04762 0.0181 0.02261 ...
#> $ diseaseStatus : logi [1:1000] FALSE FALSE FALSE FALSE FALSE FALSE ...
#> and 24 methods, of which 10 are possibly relevant:
#> assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#> getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
Note: To convert your data into a Population class object, use the generatePopulation() function from the PoDBAY package. See vignette("population", package = "PoDBAY") for further details.
Once the data are prepared, the efficacyComputation function is used to compute the efficacy point estimate, as described in step 1 above.
The jittering of the population means by drawing from their sampling distributions (step 2) is done inside the PoDBAYEfficacy function. The efficacy set is then estimated as described in step 3.
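Steps 2 and 3 can be sketched in the same spirit. Here we assume that each arm's mean is jittered by drawing from a normal sampling distribution with standard error sd/sqrt(N), and that efficacy is recomputed for each PoD parameter set with the jittered means and the original standard deviations. The helper names are hypothetical, PoDBAYEfficacy may differ in detail, and only the ten step 3 parameter sets printed earlier are used, so the resulting quantiles are illustrative and not comparable to the efficacyCI reported below.

```r
# Sketch of steps 2-3, NOT PoDBAYEfficacy's code. Assumptions: normal sampling
# distribution of the mean (SE = sd / sqrt(N)) and the PoD curve form below.
set.seed(4)

podCurve <- function(titer, pmax, et50, slope) {
  t <- ifelse(titer > 0, titer, 0)   # assumption: titers <= 0 give PoD = pmax
  pmax / (1 + (t / et50)^slope)
}

expectedPoD <- function(pmax, et50, slope, mean, sd) {
  integrand <- function(x) podCurve(x, pmax, et50, slope) * dnorm(x, mean, sd)
  integrate(integrand, lower = mean - 10 * sd, upper = mean + 10 * sd)$value
}

# First 10 step 3 parameter sets printed earlier (illustration only)
paramSets <- data.frame(
  pmax  = c(0.0349, 0.0361, 0.0371, 0.0288, 0.0321,
            0.0331, 0.0464, 0.0479, 0.0378, 0.0324),
  slope = c(31.4, 26.9, 28.5, 28.5, 33.2, 27.4, 16.3, 17.7, 28.5, 29.5),
  et50  = c(5.89, 6.37, 6.01, 6.31, 6.02, 5.98, 5.66, 5.69, 6.01, 6.07)
)

# Summary statistics of PoDBAY::vaccinated and PoDBAY::control
pop <- list(vaccinated = c(N = 1000, mean = 7, sd = 2),
            control    = c(N = 1000, mean = 5, sd = 2))

# Step 2: draw a jittered mean from the assumed sampling distribution
jitterMean <- function(p) rnorm(1, mean = p[["mean"]], sd = p[["sd"]] / sqrt(p[["N"]]))

# Step 3: efficacy for every parameter set, each with freshly jittered means
efficacySet <- vapply(seq_len(nrow(paramSets)), function(i) {
  ps <- paramSets[i, ]
  mV <- jitterMean(pop$vaccinated)
  mC <- jitterMean(pop$control)
  1 - expectedPoD(ps$pmax, ps$et50, ps$slope, mV, pop$vaccinated[["sd"]]) /
      expectedPoD(ps$pmax, ps$et50, ps$slope, mC, pop$control[["sd"]])
}, numeric(1))

quantile(efficacySet, probs = c(0.025, 0.5, 0.975))
```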
The analysis provides the following results:
result <- list(
EfficacyPointEst = EfficacyPointEst,
efficacyCI = unlist(CI),
PoDParamsPointEst = PoDParamsPointEst,
PoDParametersCI = unlist(PoDParametersCI),
PoDCurve = PoDCurve
)
result
#> $EfficacyPointEst
#> [1] 0.5383405
#>
#> $efficacyCI
#> mean median CILow CIHigh
#> 0.5429269 0.5492815 0.4767003 0.6106386
#>
#> $PoDParamsPointEst
#> pmax slope et50
#> pmax 0.03430108 28.5166 6.051696
#>
#> $PoDParametersCI
#> PmaxCILow PmaxCIHigh Et50CILow Et50CIHigh SlopeCILow SlopeCIHigh
#> 0.02425859 0.04787302 5.53529308 6.36748061 12.86874466 34.17583340
#>
#> $PoDCurve
A frequent case is that serum samples at baseline and after vaccination are collected and assayed only in a subset of subjects (the “immunogenicity sample” or “immunogenicity subset”), while titers are also measured for all disease cases at the same time points. In this case, the general method for PoD curve estimation described above can be extended; further details can be found in Appendix A of the publication.
The PoD curve (point estimate together with confidence intervals) is estimated in six steps.
Titers of all non-diseased subjects are generated by random sampling with replacement from the immunogenicity subset.
Titers of all diseased subjects and all generated non-diseased subjects (generated in step 1) are used to estimate the PoD curve parameters, yielding the estimates $p_{max}^`$, $et_{50}^`$ and $\gamma^`$.
Titers of all diseased and all non-diseased subjects are pooled and bootstrapped. For each bootstrapped titer, the probability of disease is calculated from the PoD curve with parameters $p_{max}^`$, $et_{50}^`$ and $\gamma^`$, and a new disease status is assigned to each titer based on that probability.
A new immunogenicity subset is selected from all new non-diseased subjects, such that the ratio of all new diseased subjects to the non-diseased subjects in the new immunogenicity subset matches the corresponding ratio in the original data.
Titers of all new non-diseased subjects are generated by random sampling with replacement from the new immunogenicity subset.
Titers of all new diseased subjects and all newly generated non-diseased subjects (generated in step 5) are used to re-estimate the PoD curve parameters, yielding the estimates $p_{max}^{``}$, $et_{50}^{``}$ and $\gamma^{``}$.
Assume a hypothetical clinical trial of 2,000 subjects in which plasma samples are collected and assayed for only 200 subjects in the immunogenicity study. Of the 2,000 subjects, 35 become disease cases, and their titers are measured at the same time point. In the end, we have titer information for the 200 subjects in the immunogenicity study and for the 35 diseased subjects.
Population | # subjects (N) |
---|---|
Whole Trial | |
All subjects | 2,000 |
Diseased | 35 |
Non-diseased | 1,965 |
Measured titers | |
Diseased | 35 |
Immunogenicity sample | 200 |
Note that disease status is unknown in the immunogenicity sample, because the sample is drawn before the disease outcomes of the study are known. Vaccination status, however, is known.
In our example, the steps are as follows:
Titers of all non-diseased subjects (N = 1,965) are generated by random sampling with replacement from the immunogenicity subset (N = 200).
Titers of all diseased (N = 35) and all generated non-diseased (N = 1,965) subjects are used to estimate the PoD curve parameters.
Titers of all diseased (N = 35) and all generated non-diseased (N = 1,965) subjects are pooled and bootstrapped (N = 2,000), and disease status is re-assigned.
A new immunogenicity subset is selected from all new non-diseased subjects ($N^` = 2000 - X$), such that the ratio of its size to the number of new diseased subjects ($N^` = X$) matches the ratio in the original data (200:35).
Population | # subjects ($N^`$) |
---|---|
New diseased | X |
New non-diseased | 2000 − X |
New Immunogenicity sample | $X * \frac{200}{35}$ |
Titers of all new non-diseased subjects are generated by random sampling with replacement from the new immunogenicity subset ($N^` = X * \frac{200}{35}$).
Titers of all new diseased ($N^` = X$) and all newly generated non-diseased subjects ($N^` = 2000 - X$) are used for the second estimation of the PoD curve parameters.
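The data-generation steps of this variant (steps 1, 3, 4 and 5) can be sketched in base R with the numbers of this example (35 diseased, 1,965 non-diseased, an immunogenicity subset of 200). All titers are hypothetical mock values, the PoD curve form is our assumption, and the parameter values are merely illustrative. The helper draw() is ours; it indexes via sample.int() to avoid sample()'s special behaviour for length-one numeric vectors. This is not PoDBAY's internal code.

```r
# Sketch of the immunogenicity-subset variant, NOT PoDBAY's internal code.
# Mock titers are hypothetical; the PoD curve form is an assumption.
set.seed(5)

podCurve <- function(titer, pmax, et50, slope) {
  t <- ifelse(titer > 0, titer, 0)   # assumption: titers <= 0 give PoD = pmax
  pmax / (1 + (t / et50)^slope)
}

# Draw `size` elements of x (safe even when x has length one)
draw <- function(x, size, replace) x[sample.int(length(x), size, replace = replace)]

diseasedTiters   <- rnorm(35,  mean = 3.83, sd = 1.66)  # all cases are assayed
immunoTiters     <- rnorm(200, mean = 5.93, sd = 2.45)  # immunogenicity subset
nondiseasedCount <- 1965                                # non-diseased in the trial

# Step 1: generate titers for all non-diseased by resampling the subset
genNondiseased <- draw(immunoTiters, nondiseasedCount, replace = TRUE)

# Step 3: pool, bootstrap and re-assign disease status with the step 2 curve
pooled    <- c(diseasedTiters, genNondiseased)
boot      <- draw(pooled, length(pooled), replace = TRUE)
newStatus <- runif(length(boot)) < podCurve(boot, pmax = 0.0324, et50 = 6.05, slope = 34.3)
X <- sum(newStatus)
stopifnot(X > 0)

# Step 4: new immunogenicity subset keeps the original subset:cases ratio (200:35)
ratio      <- length(immunoTiters) / length(diseasedTiters)
subsetSize <- min(round(X * ratio), length(boot) - X)
newImmuno  <- draw(boot[!newStatus], subsetSize, replace = FALSE)

# Step 5: generate titers for all new non-diseased from the new subset
newGenNondiseased <- draw(newImmuno, length(boot) - X, replace = TRUE)

c(newDiseased = X, newSubset = subsetSize, newNondiseased = length(newGenNondiseased))
```

Step 6 would then refit the PoD curve to c(boot[newStatus], newGenNondiseased), for example with a likelihood fit of the kind used in the general case.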
Diseased and non-diseased subject-level data are required. We'll use the PoDBAY::diseased and PoDBAY::nondiseased mock-up data. Both datasets contain population summary statistics (N, mean, sd) and individual subject-level data (log2 titers, disease status (DS)).
Only the individual subject level data (log2 titers, DS) are used for the PoD curve estimation as described above.
We create the immunogenicity sample from our mock-up data as described above, starting with titer information for the 200 subjects in the immunogenicity study and the 35 diseased subjects.
data(diseased)
data(nondiseased)
# Immunogenicity sample created
ImmunogenicitySample <- BlindSampling(diseased, nondiseased, method = list(name = "Fixed", value = 200))
nondiseasedImmunogenicitySample <- ImmunogenicitySample$ImmunogenicityNondiseased
str(diseased)
#> Reference class 'Population' [package ".GlobalEnv"] with 8 fields
#> $ N : int 35
#> $ mean : num 3.83
#> $ stdDev : num 1.66
#> $ unknownDistribution: logi FALSE
#> $ UDFunction :function ()
#> $ titers : Named num [1:35] 5.59 6.07 2.43 5.84 6.29 ...
#> ..- attr(*, "names")= chr [1:35] "vacc" "vacc" "vacc" "vacc" ...
#> $ PoDs : num(0)
#> $ diseaseStatus : logi [1:35] TRUE TRUE TRUE TRUE TRUE TRUE ...
#> and 24 methods, of which 10 are possibly relevant:
#> assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#> getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
str(nondiseasedImmunogenicitySample)
#> Reference class 'Population' [package "PoDBAY"] with 8 fields
#> $ N : int 196
#> $ mean : num 5.93
#> $ stdDev : num 2.45
#> $ unknownDistribution: logi FALSE
#> $ UDFunction :function ()
#> $ titers : Named num [1:196] 7.7 8.59 8.15 9.1 11.89 ...
#> ..- attr(*, "names")= chr [1:196] "vacc" "vacc" "vacc" "vacc" ...
#> $ PoDs : num(0)
#> $ diseaseStatus : logi [1:196] FALSE FALSE FALSE FALSE FALSE FALSE ...
#> and 24 methods, of which 10 are possibly relevant:
#> assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#> getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
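The 196 non-diseased subjects above (out of the 200 sampled) are consistent with drawing the immunogenicity sample blindly from all 2,000 trial subjects, so that a few disease cases end up in it and are excluded from the non-diseased sample. The sketch below illustrates this interpretation; it is our assumption about what BlindSampling(..., method = list(name = "Fixed", value = 200)) does, not its actual implementation.

```r
# Illustration of an assumed fixed-size blind sample, NOT BlindSampling's code:
# 200 of the 2,000 trial subjects are drawn regardless of disease status.
set.seed(6)
trialStatus <- c(rep(TRUE, 35), rep(FALSE, 1965))   # TRUE = diseased
sampled     <- sample.int(length(trialStatus), size = 200)  # without replacement

nondiseasedInSample <- sum(!trialStatus[sampled])
diseasedInSample    <- sum(trialStatus[sampled])
c(nondiseased = nondiseasedInSample, diseased = diseasedInSample)
```

On average about 200 × 35 / 2,000 = 3.5 cases fall into such a sample, in line with the 196 non-diseased subjects reported above.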
Note: From now on, the analysis and the functions used are the same as in the general case. Only the input variable changes, from nondiseased to nondiseasedImmunogenicitySample. The nondiseasedGenerationCount argument stays the same, because the total number of non-diseased subjects in the whole trial is unchanged.
Once the data are prepared, the PoDParamEstimation function is used to estimate the PoD curve parameters in the six steps described above. For more details on its usage, see the examples in ?PoDParamEstimation.
estimatedParametersAP <- PoDParamEstimation(diseasedTiters = diseased$titers,
nondiseasedTiters = nondiseasedImmunogenicitySample$titers,
nondiseasedGenerationCount = nondiseased$N,
repeatCount = 50)
Step 1: $p_{max}^`$, $et_{50}^`$ and $\gamma^`$
Results of the first step of the PoD-titer relationship estimation are stored in estimatedParametersAP$resultsPriorReset.
#> # A tibble: 49 × 3
#> pmax slope et50
#> <dbl> <dbl> <dbl>
#> 1 0.0324 34.3 6.05
#> 2 0.0318 34.3 6.08
#> 3 0.0322 34.3 6.08
#> 4 0.0324 34.3 6.08
#> 5 0.0315 34.3 6.10
#> 6 0.0316 34.3 6.07
#> 7 0.0325 34.3 6.09
#> 8 0.0322 34.3 6.10
#> 9 0.0316 34.3 6.09
#> 10 0.0323 34.3 6.07
#> # ℹ 39 more rows
Note that the parameter estimates now differ between repeatCount iterations. This is expected, because in step 1 of this example the titers of all non-diseased subjects are regenerated in every iteration by random sampling with replacement from the immunogenicity subset.
Step 2: Data generation and re-assignment of disease status
Step 3: $p^{``}_{max}$, $et^{``}_{50}$ and $\gamma^{``}$
Results of the sixth step of the PoD-titer relationship estimation are stored in estimatedParametersAP$results.
#> # A tibble: 49 × 3
#> pmax slope et50
#> <dbl> <dbl> <dbl>
#> 1 0.0347 39.8 5.99
#> 2 0.0231 16.0 5.89
#> 3 0.0351 21.4 5.77
#> 4 0.0419 12.1 5.46
#> 5 0.0301 39.8 6.04
#> 6 0.0483 4.59 3.92
#> 7 0.0393 43.7 5.94
#> 8 0.0386 36.8 6.06
#> 9 0.0314 40.5 6.35
#> 10 0.0263 32.7 6.00
#> # ℹ 39 more rows
The non-parametric bootstrap described in step 3, together with the creation of the new immunogenicity sample in steps 4 and 5, is applied inside the function.
The point estimate of the PoD curve parameters, which represents the PoD-titer relationship, is computed from the step 1 results (estimatedParametersAP$resultsPriorReset).
The 80%, 90% and 95% confidence intervals of the PoD curve parameters are computed from the step 6 results (estimatedParametersAP$results).
There are two possible situations:
We will describe the approach in the situation where Trial A = Trial B.
As stated above, the only difference is in data availability: vaccinated and control population summary statistics (N, mean, sd) are still required. We therefore calculate the summary statistics for both populations from the immunogenicity subset created in the PoD-titer relationship estimation step.
# Immunogenicity sample - vaccinated
str(ImmunogenicitySample$ImmunogenicityVaccinated)
#> Reference class 'Population' [package "PoDBAY"] with 8 fields
#> $ N : int 97
#> $ mean : num 7.03
#> $ stdDev : num 2.34
#> $ unknownDistribution: logi FALSE
#> $ UDFunction :function ()
#> $ titers : Named num [1:97] 7.7 8.59 8.15 9.1 11.89 ...
#> ..- attr(*, "names")= chr [1:97] "vacc_FALSE" "vacc_FALSE" "vacc_FALSE" "vacc_FALSE" ...
#> $ PoDs : num(0)
#> $ diseaseStatus : logi [1:97] FALSE FALSE FALSE FALSE FALSE FALSE ...
#> and 24 methods, of which 10 are possibly relevant:
#> assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#> getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
# Immunogenicity sample - control
str(ImmunogenicitySample$ImmunogenicityControl)
#> Reference class 'Population' [package "PoDBAY"] with 8 fields
#> $ N : int 103
#> $ mean : num 4.73
#> $ stdDev : num 2.13
#> $ unknownDistribution: logi FALSE
#> $ UDFunction :function ()
#> $ titers : Named num [1:103] 2.17 4.53 2.51 6.08 4.22 ...
#> ..- attr(*, "names")= chr [1:103] "control_FALSE" "control_FALSE" "control_FALSE" "control_FALSE" ...
#> $ PoDs : num(0)
#> $ diseaseStatus : logi [1:103] FALSE FALSE FALSE FALSE FALSE FALSE ...
#> and 24 methods, of which 10 are possibly relevant:
#> assignPoD, getDiseasedCount, getDiseasedTiters, getNondiseasedCount,
#> getNondiseasedTiters, getTiters, getUnknown, initialize, popFun, popX
means <- list("vaccinated" = ImmunogenicitySample$ImmunogenicityVaccinated$mean,
"control" = ImmunogenicitySample$ImmunogenicityControl$mean)
standardDeviations <- list("vaccinated" = ImmunogenicitySample$ImmunogenicityVaccinated$stdDev,
"control" = ImmunogenicitySample$ImmunogenicityControl$stdDev)
EfficacyPointEst <- efficacyComputation(PoDParamsPointEst,
means,
standardDeviations)
EfficacyPointEst
#> [1] 0.5298448
The analysis provides the following results:
result <- list(
EfficacyPointEst = EfficacyPointEst,
efficacyCI = unlist(CI),
PoDParamsPointEst = PoDParamsPointEst,
PoDParametersCI = unlist(PoDParametersCI),
PoDCurve = PoDCurve
)
result
#> $EfficacyPointEst
#> [1] 0.5298448
#>
#> $efficacyCI
#> mean median CILow95 CIHigh95 CILow90 CIHigh90 CILow80 CIHigh80
#> 0.5375903 0.5419903 0.4432219 0.6231131 0.4527044 0.6106371 0.4608834 0.5999084
#>
#> $PoDParamsPointEst
#> pmax slope et50
#> pmax 0.03218207 33.85656 6.08513
#>
#> $PoDParametersCI
#> PmaxCILow PmaxCIHigh Et50CILow Et50CIHigh SlopeCILow SlopeCIHigh
#> 0.02140353 0.04420208 5.36664708 6.54030103 6.63482304 43.47149473
#>
#> $PoDCurve