Title: | Assessing Performance of Prediction Models for Predicting Patient-Level Treatment Benefit |
---|---|
Description: | Methods for assessing the performance of a prediction model with respect to identifying patient-level treatment benefit. All methods are applicable for continuous and binary outcomes, and for any type of statistical or machine-learning prediction model as long as it uses baseline covariates to predict outcomes under treatment and control. |
Authors: | Orestis Efthimiou |
Maintainer: | Orestis Efthimiou <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.2 |
Built: | 2024-11-12 05:41:52 UTC |
Source: | https://github.com/esm-ispm-unibe-ch/predieval |
This function produces a plot to illustrate the calibration for benefit for a prediction model. The samples are split into a number of groups according to their predicted benefit, and within each group the function estimates the observed treatment benefit and compares it with the predicted one
bencalibr( data = NULL, Ngroups = 5, y.observed, treat, predicted.treat.0, predicted.treat.1, type = "continuous", smoothing.function = "lm", axis.limits = NULL )
bencalibr( data = NULL, Ngroups = 5, y.observed, treat, predicted.treat.0, predicted.treat.1, type = "continuous", smoothing.function = "lm", axis.limits = NULL )
data |
An optional data frame containing the required information. |
Ngroups |
The number of groups to split the data. |
y.observed |
The observed outcome. |
treat |
A vector with the treatment assignment. This must be 0 (for control treatment) or 1 (for active treatment). |
predicted.treat.0 |
A vector with the model predictions for each patient, under the control treatment. For the case of a binary outcome this should be probabilities of an event. |
predicted.treat.1 |
A vector with the model predictions for each patient, under the active treatment. For the case of a binary outcome this should be probabilities of an event. |
type |
The type of the outcome, "binary" or "continuous". |
smoothing.function |
The method used to smooth the calibration line. Can be "lm", "glm", "gam", "loess", "rlm". More details can be found in https://ggplot2.tidyverse.org/reference/geom_smooth.html. |
axis.limits |
Sets the limits of the graph. It can be a vector of two values, i.e. the lower and upper limits for x and y axis. It can be omitted. |
The calibration plot
# continuous outcome dat1=simcont(200)$dat head(dat1) lm1=lm(y.observed~(x1+x2+x3)*t, data=dat1) dat.t0=dat1; dat.t0$t=0 dat.t1=dat1; dat.t1$t=1 dat1$predict.treat.1=predict(lm1, newdata = dat.t1) # predictions in treatment dat1$predict.treat.0=predict(lm1, newdata = dat.t0) # predicions in control bencalibr(data=dat1, Ngroups=10, y.observed, predicted.treat.1=predict.treat.1, predicted.treat.0=predict.treat.0, type="continuous", treat=t, smoothing.function = "lm", axis.limits = c(-2, 2)) # binary outcome dat2=simbinary(500)$dat head(dat2) glm1=glm(y.observed~(x1+x2+x3)*t, data=dat2, family = binomial(link = "logit")) dat2.t0=dat2; dat2.t0$t=0 dat2.t1=dat2; dat2.t1$t=1 dat2$predict.treat.1=predict(glm1, newdata = dat2.t1) # predictions in treatment dat2$predict.treat.0=predict(glm1, newdata = dat2.t0) # predicions in control bencalibr(data=dat2, Ngroups=6, y.observed, predicted.treat.1=expit(predict.treat.1), predicted.treat.0=expit(predict.treat.0), type="binary", treat=t, smoothing.function = "lm")
# continuous outcome dat1=simcont(200)$dat head(dat1) lm1=lm(y.observed~(x1+x2+x3)*t, data=dat1) dat.t0=dat1; dat.t0$t=0 dat.t1=dat1; dat.t1$t=1 dat1$predict.treat.1=predict(lm1, newdata = dat.t1) # predictions in treatment dat1$predict.treat.0=predict(lm1, newdata = dat.t0) # predicions in control bencalibr(data=dat1, Ngroups=10, y.observed, predicted.treat.1=predict.treat.1, predicted.treat.0=predict.treat.0, type="continuous", treat=t, smoothing.function = "lm", axis.limits = c(-2, 2)) # binary outcome dat2=simbinary(500)$dat head(dat2) glm1=glm(y.observed~(x1+x2+x3)*t, data=dat2, family = binomial(link = "logit")) dat2.t0=dat2; dat2.t0$t=0 dat2.t1=dat2; dat2.t1$t=1 dat2$predict.treat.1=predict(glm1, newdata = dat2.t1) # predictions in treatment dat2$predict.treat.0=predict(glm1, newdata = dat2.t0) # predicions in control bencalibr(data=dat2, Ngroups=6, y.observed, predicted.treat.1=expit(predict.treat.1), predicted.treat.0=expit(predict.treat.0), type="binary", treat=t, smoothing.function = "lm")
Calculates the expit of a real number
expit(x)
expit(x)
x |
A real number |
exp(x)/(1+exp(x))
expit(2.3)
expit(2.3)
Calculates the logit of a real number between 0 and 1
logit(x)
logit(x)
x |
A real number between 0 and 1 |
log(x/(1-x))
logit(0.2)
logit(0.2)
This function calculates a series of measures to assess decision accuracy, discrimination for benefit, and calibration for benefit of a prediction model.
predieval( repeats = 50, Ngroups = 10, X, treat, Y, predicted.treat.1, predicted.treat.0, type = "continuous", bootstraps = 500, Threshold = 0 )
predieval( repeats = 50, Ngroups = 10, X, treat, Y, predicted.treat.1, predicted.treat.0, type = "continuous", bootstraps = 500, Threshold = 0 )
repeats |
The number of repetitions for the algorithm. |
Ngroups |
The number of groups to split the data. |
X |
A dataframe with patient covariates. |
treat |
A vector with the treatment assignment. This must be 0 (for control treatment) or 1 (for active treatment). |
Y |
The observed outcome. For binary outcomes this should be 0 or 1 |
predicted.treat.1 |
A vector with the model predictions for each patient, under the active treatment. For the case of a binary outcome this should be probabilities of an event. |
predicted.treat.0 |
A vector with the model predictions for each patient, under the control treatment. For the case of a binary outcome this should be probabilities of an event. |
type |
The type of the outcome, "binary" or "continuous". |
bootstraps |
The number of bootstrap samples to be used for calculating confidence intervals. |
Threshold |
Threshold for treatment benefit |
A table with all estimated measures of performance.
# continuous outcome dat0=simcont(500)$dat head(dat0) # Randomly shuffle the data dat<-dat0[sample(nrow(dat0)),] # Create random folds dat$folds <- cut(seq(1,nrow(dat)),breaks=10,labels=FALSE) # Obtain out-of-sample predictions dat.out.CV<-list() for (i in 1:10){ dat.in.CV=dat[dat$folds!=i,] dat.out.CV[[i]]=dat[dat$folds==i,] dat1<-dat.out.CV[[i]]; dat1$t=1 dat0<-dat.out.CV[[i]]; dat0$t=0 m1=lm(data=dat.in.CV, y.observed~x1*t+x2*t) dat.out.CV[[i]]$predict.treat.1=predict(newdata=dat1, m1)# predictions in treatment dat.out.CV[[i]]$predict.treat.0=predict(newdata=dat0, m1)# predicions in control } dat.CV=dat.out.CV[[1]] for (i in 2:10){ dat.CV=rbind(dat.CV,dat.out.CV[[i]])} # assess model performance predieval(repeats=20, Ngroups=c(5:10), X=dat.CV[,c("x1", "x2","x3")], Y=dat.CV$y.observed, predicted.treat.1 = dat.CV$predict.treat.1, predicted.treat.0 = dat.CV$predict.treat.0, treat=dat.CV$t, type="continuous") # binary outcome dat0=simbinary(500)$dat head(dat0) # Randomly shuffle the data dat<-dat0[sample(nrow(dat0)),] # Create random folds dat$folds <- cut(seq(1,nrow(dat)),breaks=10,labels=FALSE) dat.out.CV<-list() for (i in 1:10){ dat.in.CV=dat[dat$folds!=i,] dat.out.CV[[i]]=dat[dat$folds==i,] dat1<-dat.out.CV[[i]]; dat1$t=1 dat0<-dat.out.CV[[i]]; dat0$t=0 glm1=glm(y.observed~(x1+x2+x3)*t, data=dat.in.CV, family = binomial(link = "logit")) dat.out.CV[[i]]$predict.treat.1=predict(newdata=dat1, glm1)# predictions in treatment dat.out.CV[[i]]$predict.treat.0=predict(newdata=dat0, glm1)# predicions in control } dat.CV=dat.out.CV[[1]] for (i in 2:10){ dat.CV=rbind(dat.CV,dat.out.CV[[i]])} predieval(repeats=20, Ngroups=c(5:10), X=dat.CV[,c("x1", "x2","x3")], Y=dat.CV$y.observed, predicted.treat.1 = expit(dat.CV$predict.treat.1), predicted.treat.0 = expit(dat.CV$predict.treat.0), treat=dat.CV$t, type="binary",bootstraps = 50)
# continuous outcome dat0=simcont(500)$dat head(dat0) # Randomly shuffle the data dat<-dat0[sample(nrow(dat0)),] # Create random folds dat$folds <- cut(seq(1,nrow(dat)),breaks=10,labels=FALSE) # Obtain out-of-sample predictions dat.out.CV<-list() for (i in 1:10){ dat.in.CV=dat[dat$folds!=i,] dat.out.CV[[i]]=dat[dat$folds==i,] dat1<-dat.out.CV[[i]]; dat1$t=1 dat0<-dat.out.CV[[i]]; dat0$t=0 m1=lm(data=dat.in.CV, y.observed~x1*t+x2*t) dat.out.CV[[i]]$predict.treat.1=predict(newdata=dat1, m1)# predictions in treatment dat.out.CV[[i]]$predict.treat.0=predict(newdata=dat0, m1)# predicions in control } dat.CV=dat.out.CV[[1]] for (i in 2:10){ dat.CV=rbind(dat.CV,dat.out.CV[[i]])} # assess model performance predieval(repeats=20, Ngroups=c(5:10), X=dat.CV[,c("x1", "x2","x3")], Y=dat.CV$y.observed, predicted.treat.1 = dat.CV$predict.treat.1, predicted.treat.0 = dat.CV$predict.treat.0, treat=dat.CV$t, type="continuous") # binary outcome dat0=simbinary(500)$dat head(dat0) # Randomly shuffle the data dat<-dat0[sample(nrow(dat0)),] # Create random folds dat$folds <- cut(seq(1,nrow(dat)),breaks=10,labels=FALSE) dat.out.CV<-list() for (i in 1:10){ dat.in.CV=dat[dat$folds!=i,] dat.out.CV[[i]]=dat[dat$folds==i,] dat1<-dat.out.CV[[i]]; dat1$t=1 dat0<-dat.out.CV[[i]]; dat0$t=0 glm1=glm(y.observed~(x1+x2+x3)*t, data=dat.in.CV, family = binomial(link = "logit")) dat.out.CV[[i]]$predict.treat.1=predict(newdata=dat1, glm1)# predictions in treatment dat.out.CV[[i]]$predict.treat.0=predict(newdata=dat0, glm1)# predicions in control } dat.CV=dat.out.CV[[1]] for (i in 2:10){ dat.CV=rbind(dat.CV,dat.out.CV[[i]])} predieval(repeats=20, Ngroups=c(5:10), X=dat.CV[,c("x1", "x2","x3")], Y=dat.CV$y.observed, predicted.treat.1 = expit(dat.CV$predict.treat.1), predicted.treat.0 = expit(dat.CV$predict.treat.0), treat=dat.CV$t, type="binary",bootstraps = 50)
This function generates a dataframe with 6 patient covariates and a binary outcome simulated from a model that uses the covariates.
simbinary(Npat = 100)
simbinary(Npat = 100)
Npat |
Number of patients to simulate. |
The function returns a dataframe with:
x1, x2, x3, x4: patient covariates.
t= treatment assignment (0 for control, 1 for active).
logit.control= the logit of the probability of an outcome in the control treatment.
logit.active= the logit of the probability of an outcome in the active treatment.
benefit= treatment benefit in log odds ratio.
py=the probability of the outcome for each patient, under the treatment actually administered.
logit.py= the logit of py.
y.observed= the observed outcome
dat1=simbinary(100)$dat head(dat1)
dat1=simbinary(100)$dat head(dat1)
This function generates a dataframe with 6 patient covariates and a continuous outcome simulated from a model that uses the covariates.
simcont(Npat = 100)
simcont(Npat = 100)
Npat |
Number of patients to simulate. |
The function returns a dataframe with:
x1, x2, x3, x4: patient covariates.
t= treatment assignment (0 for control, 1 for active).
y.control= the outcome if the patient takes the control treatment.
y.active= the outcome if the patient takes the active treatment.
benefit= the treatment benefit, i.e. y.active-y.control.
y.observed= the observed outcome.
dat1=simcont(100)$dat head(dat1)
dat1=simcont(100)$dat head(dat1)