Since residuals can be correlated, potentially existing observed outcomes of the same individual can be informative for predicting the unobserved valued of the same individual.
Assume that the data is sorted such that are observed and are not. The special case of all outcomes being unobserved (new individual) is covered with .
Let further
be a block decomposition where and similarly for the other blocks.
Predictions can then be made based on the conditional distribution
with
and
Note that does not depend on .
predict
For implementing predict(), only
is required.
For predict(interval = "confidence") additionally
standard errors are required. These could be derived using the delta
methods since
is a function of the estimated model parameters
and
.
This would require the Jacobian
in addition to the estimated variance covariance matrix of the parameter
estimate
,
.
Standard errors for
are then given by the square root of the diagonal elements of
For
predict(interval = "prediction") one would use the square
root of the diagonal elements of
instead. The delta method could again be used to make upper and lower
boundaries reflect parameter estimation uncertainty.
Alternatively, both intervals can be derived using a parametric
bootstrap sample of the unrestricted parameters
.
This would probably also be easier for the
interval = "prediction" case.
Please note that for these intervals, we assume that the distribution is approximately normal: we use to construct it, where is the th element of , and is the element of .
To create simulation of responses from a fitted model, we have multiple situations: whether this simulation is conditional on both and estimates, or it is marginal?
Under conditional simulation setting, the variance-covariance matrix, and the expectation of are already given in Mathematical Derivations.
Please note that in implementation of predict function,
we only use the diagonal elements of
,
however, here we need to make use of the full matrix
to obtain correctly correlated simulated observations.
To simulate marginally, we take the variance of and into consideration. For each simulation, we first generate assuming it approximately follows a multivariate normal distribution. Then, conditional on the we sampled, we generate also assuming it approximately follows a multivariate normal distribution.
Now we have and estimates, and we just follow the conditional simulation.
simulate
To implement simulate function, we first ensure that the
expectation
()
and variance-covariance matrix
()
are generated in predict function, for each of the
subjects.
For simulate(method = "conditional"), we use the
estimated
and
to construct the
and
directly, and generate response with
distribution.
For simulate(method = "marginal"), for each repetition
of simulation, we generate
from the mmrm fit, where the estimate of
and variance-covariance matrix of
are provided. Using the generated
,
we then obtain the
and its variance-covariance matrix, with
and the data used in fit.
Then we sample
as follows. We note that on the C++ side we already have
the robust Cholesky decomposition of the inverse of its asymptotic
covariance matrix:
Hence we make sure to report the lower
triangular matrix
and the diagonal matrix
back to the R side, and afterwards we can generate
samples as follows:
where
is drawn from the standard multivariate normal distribution, since
We note that calculating
is efficient via backwards solving
since
is upper right triangular.
Then we simulate the observations once with
simulate(method = "conditional", beta = beta_sample, theta = theta_new).
We pool all the repetitions together and thus obtain the marginal
simulation results.
predict and simulate
Results
We summarize the different options for predict and
simulate methods and explain how they relate to each
other.
predict options
predict(type = "confidence") gives the variance of the
predictions conditional on the
estimate, taking into account the uncertainty of estimating
.
So here we ignore the uncertainty in estimating
.
It also does not add measurement error
.
We can use this prediction when we are only interested in predicting the
mean of the unobserved
,
assuming the estimated
as the true variance parameters.predict(type = "prediction") in contrast takes into
account the full uncertainty, including the variance of
and the measurement error
.
We can use this prediction when we are interested in reliable confidence
intervals for the unobserved
as well as the observed
,
assuming we would like to predict repeated observations from the same
subjects and time points.simulate options
simulate(type = "conditional") simulates observations
while keeping the
and
estimates fixed. It adds measurement errors
when generating the simulated values. Hence the mean of the simulated
values will be within the confidence intervals from
predict(type = "conditional").simulate(type = "marginal") simulates observations
while taking into account the uncertainty of
and
through sampling from their asymptotic frequentist distributions. On top
of that, it also adds measurement errors
.
This hence is using the same distribution as
predict(type = "prediction").SAS
In SAS, from proc mixed, we are able to
generate predictions using the outp argument in the
model statement. For example:
PROC MIXED DATA = fev_data method=reml;
CLASS RACE(ref = 'Asian') AVISIT(ref = 'VIS4') SEX(ref = 'Male') ARMCD(ref = 'PBO') USUBJID;
MODEL FEV1 = ARMCD / ddfm=Satterthewaite solution chisq outp=pred;
REPEATED AVISIT / subject=USUBJID type=un r rcorr;
LSMEANS ARMCD / pdiff=all cl alpha=0.05 slice=AVISIT;
RUN;
However, there are some differences between the SAS
implementation and our mmrm package, described as
follows:
mmrm and SAS both provide predicted
means (conditional on other observations) for unobserved records,
SAS also provides predicted means for observed records
while mmrm does not. The rationale is that in the
mmrm package we want to be consistent with the notion of
predictions conditional on the observed records - which means that
observed records are observed and therefore there is no prediction
uncertainty anymore.mmrm
and SAS. While in SAS the prediction standard
error is conditional on the estimated variance parameters
,
in mmrm the marginal prediction standard error is provided.
The rationale is that in the mmrm package we want to take
into account the full uncertainty about parameter estimates including
.SAS are based on the t
distribution, while currently in mmrm we use the normal
distribution. We will be considering an extension towards using the t
distribution in the future and welcome feedback on this detail.