Skewness and staging: Does the floor effect induce bias in multilevel AR(1) models?
Reproducible codes
0 Introduction
This document contains the reproducible code for the manuscript Skewness and staging: Does the floor effect induce bias in multilevel AR(1) models? by Haqiqatkhah, Ryan, and Hamaker (2022). Please cite this document with the following info:
Haqiqatkhah, M. M., Ryan, O., & Hamaker, E. L. (2022). Skewness and staging: Does the floor effect induce bias in multilevel AR(1) models? PsyArXiv. https://doi.org/10.31234/osf.io/myuvr
In this study, we simulated multilevel data from four data generating mechanisms (DGMs), namely, the AR(1), \(\chi^2\)AR(1), BinAR(1), and PoDAR(1) models with different parameter sets. The simulation was conducted using a modular pipeline design, inspired by Bien's (2016) R package simulator, consisting of the following components:
- Simulation: generating the datasets
- Analysis: modeling the data
- Harvesting: collecting the relevant parameter estimates
- Reporting: making tables and plots
These components were placed in a pipeline that managed:
- Making the simulation design matrix that includes all relevant conditions
- Book-keeping data files belonging to each replication of each condition
- Performing simulations in batch
- Performing analyses in batch
- Collecting the data in batch
This document is structured as follows. In Section 1, we explain the four components and the functions used therein. Then, in Section 2, we explain the wrapper functions used in the pipeline and show how the pipeline was (and can be) executed. Then, in Section 3, we discuss how the harvested data were used to make the figures used in the paper (and others that were not included). Finally, in Section 4, we discuss how supplementary plots were made and how the empirical data analysis [on COGITO data; Schmiedek, Lövdén, and Lindenberger (2010)] was done. Note that although the code provided here is cleaned up as much as possible, it is not necessarily succinctly written; some functions were written to accommodate the most general functionality, which turned out not to be necessary for the simulation study.
The GitHub repository of this study contains all of the code necessary to run the study (in the scripts folder). Furthermore, because running the whole study would take a huge amount of time (it took us more than 88 days on a 24-core server and produced 2.69 TB of data), the raw and summarized datasets of the estimated parameters are provided in simulation-files/harvests. To give an impression of the simulated datasets and the analysis output files, the .rds files of the first replication of the study are also provided in the simulation-files folder, and the Mplus files of the first replication of the \(N=100, T=100\) condition are provided in the Mplus-files folder. Finally, all the figures that can be generated based on the study results are provided in the figures folder. The figures are also referenced individually in Section 3 and Section 4 of this document.
To replicate the study from scratch, you should first either clone the repository (using git clone https://github.com/psyguy/skewness-staging.git) or download the repository as a zip file and extract it on your machine. Then you can sequentially run the .R files you find in the scripts folder. Note that you need a licensed copy of Mplus version 8.6 (Muthén and Muthén 2017) on your machine to run the whole study. Instead of running the scripts separately, if you have Quarto installed, you can also compile Code documentations.qmd, located in the root directory, after setting the following variables to TRUE:
Finally, in case you want to change the scripts (e.g., to run a smaller portion of the simulation, or to try other parameters), you should look up the knitr parameters called in each chunk (with <<some_param>>) and find the corresponding code (under ## @knitr some_param) in the scripts folder.
1 Core components
1.1 The Simulation component
The Simulation component consists of three sets of functions:
- Functions that implement the DGMs and generate univariate (\(N=1\)) time series of length \(T\) from the parameters given to them;
- Wrappers that interface the DGM functions;
- A wrapper to generate datasets (consisting of \(N\) time series of length \(T\)) with a given DGM.
1.1.1 Data-generating model specifications
First, we define functions for each data-generating model (DGM) that produce univariate, single-subject (\(N=1\)) time series of a desired length \(T\) (default: T = 100) with the two canonical parameters and a given random seed (default: seed = 0). All model(-implied) parameters are saved in a list (called pa).
For each model, the first observation (\(X_1\)) is randomly drawn from the model-implied marginal distribution, which eliminates the need for removing a burn-in window at the beginning of the data. After the data are generated, if the argument only.ts is set to TRUE, the raw data (as a vector of length T) are returned. Otherwise, the function calculates the empirical dynamic (\(\phi\)) and marginal (\(\mu\), \(\sigma^2\), and \(\gamma\)) parameters based on the simulated data and saves them in a list (Empirical.Parameters). Furthermore, two \(\mathrm{\LaTeX{}}\)-ready strings (Model.Description and Model.Description.Short) are made which include a summary of the model parameters (and can be used, e.g., in plots). Finally, in case only.ts != TRUE, the function returns a list consisting of the time series (stored in x), verbal descriptions of the dataset (Model.Description and Model.Description.Short), the theoretical (i.e., model-implied) parameters (Model.Parameters), and the empirical (i.e., sample) parameter estimates (Empirical.Parameters).
The canonical parameters of the AR(1) model with normally distributed residuals (which we referred to as NAR(1) in the simulation) are the autoregressive parameter \(\phi\) (default: phi = 0.4), the mean \(\mu\) (default: Mean = 50), and the marginal variance \(\sigma^2\) (default: var.marginal = 4). Based on the marginal variance, the residual variance (var.resid) is calculated via \(\sigma^2_\epsilon = \sigma^2 (1 - \phi^2)\).
1.1.1.0.1 Construction
The time series is constructed by first generating a zero-centered time series \(\tilde X_t\) (x_cent
). To do so, first the initial observation in the time series (x_cent[1]
) is sampled from normal distribution with mean zero and a variance equal to the marginal variance of the model:
\[ \tilde X_1 \sim \mathcal{N}(0, \sigma^2) \]
Then, the remainder of the time series is generated using the definition of the AR(1) model (note that here the residual variance is used in the normal distribution):
\[ \begin{aligned} \tilde X_{t} &= \phi \tilde X_{t-1} + \epsilon_{t} \\ \epsilon_{t} &\sim \mathcal{N}(0, {\sigma^2_{\epsilon}}) \end{aligned} \]
Finally, the mean is added to the zero-centered time series to obtain the final time series with mean \(\mu\):
\[ X_t = \tilde X_t + \mu \]
1.1.1.0.2 Code
Click to expand the code
dgm_nar <- function(...){
pa <- list(...)
if(is.list(pa$pa)) pa <- pa$pa
## setting default seed if not given
if(is.null(pa$phi)) pa$phi <- 0.4
if(is.null(pa$Mean)) pa$Mean <- 50
if(is.null(pa$var.marginal)) pa$var.marginal <- 4
if(is.null(pa$var.resid)) pa$var.resid <- pa$var.marginal * (1 - pa$phi ^ 2)
if(is.null(pa$k)) pa$k <- 100
if(is.null(pa$T)) pa$T <- 100
if(is.null(pa$seed)) pa$seed <- 0
## Making sure var.marginal and var.resid are correctly related
pa$var.marginal <- pa$var.resid / (1 - pa$phi ^ 2)
### first make time series centered around zero
## drawing the first sample x_cent_1
set.seed(pa$seed)
x_cent <- rep(NA, pa$T)
x_cent[1] <- rnorm(n = 1,
mean = 0,
sd = sqrt(pa$var.marginal)
)
## making the rest of the centered time series
for (t in 2:pa$T){
x_cent[t] <- pa$phi*x_cent[t-1] + rnorm(n = 1,
mean = 0,
sd = sqrt(pa$var.resid)
)
}
## adding the mean to the centered time series
x <- x_cent + pa$Mean
## quick output of raw time series without book-keeping variables/parameters
if(!is.null(pa$only.ts))
if(pa$only.ts==TRUE) return(x)
Empirical.Parameters = list(Mean = mean(x),
Variance = var(x),
Skewness = moments::skewness(x),
AR = acf(x, lag.max = 1, plot = FALSE)$acf[2]
)
## making a LaTeX-ready list description of the model
Model.Description <- paste0("$\\mu = ", # opening "$" so the LaTeX string is well-formed
round(pa$Mean,2),
"(",
round(Empirical.Parameters$Mean,2),
")",
",\\; \\gamma = ",
0,
"(",
round(Empirical.Parameters$Skewness,2),
")",
",\\; \\phi = ",
round(pa$phi,2),
"(",
round(Empirical.Parameters$AR,2),
")",
",\\; \\sigma^2_{marginal} = ",
round(pa$var.marginal,2),
"(",
round(Empirical.Parameters$Variance,2),
")",
",\\; T = ",
pa$T,
"$")
Model.Description.Short <- paste0("$\\mu = ",
round(pa$Mean, 2),
",\\; \\sigma^2_{\\epsilon} = ",
round(pa$var.resid, 2),
",\\; \\phi = ",
round(pa$phi, 2),
"\\; \\rightarrow",
"\\; \\sigma^2 = ",
round(Empirical.Parameters$Variance,2),
",\\; \\gamma = ",
round(Empirical.Parameters$Skewness,2),
"$")
## making the output object
output <- list(x = x,
Model.Description = Model.Description,
Model.Description.Short = Model.Description.Short,
Model.Parameters = pa,
Empirical.Parameters = Empirical.Parameters)
return(output)
}
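As a minimal usage sketch (not part of the original pipeline; it assumes the function above is sourced and the moments package is installed, since dgm_nar calls moments::skewness internally):
## Sketch: generate one NAR(1) series and compare model-implied vs. sample values
nar.example <- dgm_nar(phi = 0.4, Mean = 50, var.marginal = 4,
                       T = 200, seed = 1)
nar.example$Empirical.Parameters # sample mean, variance, skewness, and lag-1 AR
## raw vector only (lighter output), e.g., for quick plotting
x <- dgm_nar(only.ts = TRUE, T = 200, seed = 1)
plot(x, type = "l")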
The canonical parameters of the \(\chi^2\)AR(1) model (which we referred to as ChiAR(1) in the simulation) are the autoregressive parameter \(\phi\) (default: phi = 0.4) and the degrees of freedom \(\nu\) (default: nu = 3). We set the intercept to zero (c = 0).1
1.1.1.0.3 Construction
Similar to the AR(1) model, we need to sample the first observation of the \(\chi^2\)AR(1) model from its marginal distribution. However, since this model does not have a closed-form marginal distribution, as an approximation, we instead sample x[1]
from a \(\chi^2\) distribution with \(\nu\) degrees of freedom:
\[ X_1 \sim \chi^2(\nu) \]
Then, we generate the remainder of the time series using the definition of the \(\chi^2\)AR(1) model:
\[ \begin{aligned} X_{t} &= c + \phi X_{t-1} + a_{t} \\ a_{t} &\sim \chi^2(\nu). \end{aligned} \]
1.1.1.0.4 Code
Click to expand the code
dgm_chiar <- function(...){
pa <- list(...)
if(is.list(pa$pa)) pa <- pa$pa
## setting default seed if not given
if(is.null(pa$phi)) pa$phi <- 0.4
if(is.null(pa$nu)) pa$nu <- 3
if(is.null(pa$c)) pa$c <- 0
if(is.null(pa$k)) pa$k <- 100
if(is.null(pa$T)) pa$T <- 100
if(is.null(pa$seed)) pa$seed <- 0
## drawing the first sample x_1
set.seed(pa$seed)
x <- rep(NA, pa$T)
x[1] <- rchisq(n = 1,
df = pa$nu) # X_1 ~ chi^2(nu), per the construction above
## making the rest of the time series
for (t in 2:pa$T){
x[t] <- pa$c + pa$phi*x[t-1] + rchisq(n = 1,
df = pa$nu)
}
## quick output of raw time series without book-keeping variables/parameters
if(!is.null(pa$only.ts))
if(pa$only.ts==TRUE) return(x)
Empirical.Parameters = list(Mean = mean(x),
Variance = var(x),
Skewness = moments::skewness(x),
AR = acf(x, lag.max = 1, plot = FALSE)$acf[2]
)
## making a LaTeX-ready list description of the model
Model.Description <- paste0("$\\chi^2AR(1):", # \\; with",
"\\; \\mu = ",
round(pa$Mean,2),
"(",
round(Empirical.Parameters$Mean,2),
")",
",\\; \\gamma = ",
round(pa$Skewness,2),
"(",
round(Empirical.Parameters$Skewness,2),
")",
",\\; \\phi = ",
round(pa$phi,2),
"(",
round(Empirical.Parameters$AR,2),
")",
",\\; \\nu = ",
round(pa$nu,2),
",\\; c = ",
round(pa$c,3),
",\\; T = ",
pa$T,
"$")
Model.Description.Short <- paste0("$c = ",
round(pa$c,2),
",\\; \\nu = ",
round(pa$nu, 2),
",\\; \\phi = ",
round(pa$phi, 2),
"\\; \\rightarrow",
"\\; \\mu = ",
round(Empirical.Parameters$Mean, 2),
",\\; \\sigma^2 = ",
round(Empirical.Parameters$Variance,2),
",\\; \\gamma = ",
round(Empirical.Parameters$Skewness,2),
"$")
## making the output object
output <- list(x = x,
Model.Description = Model.Description,
Model.Description.Short = Model.Description.Short,
Model.Parameters = pa,
Empirical.Parameters = Empirical.Parameters)
return(output)
}
The canonical parameters of the BinAR(1) model (which we referred to as BinAR(1) in the simulation) are the survival probability \(\alpha\) (default: alpha = 0.5) and the revival probability \(\beta\) (default: beta = 0.4). By default, the maximum value on the scale, \(k\), was set to k = 10.
1.1.1.0.5 Construction
We first calculate the \(\theta\) parameter, which characterizes the marginal distribution of the BinAR(1) model:
\[
\theta = \frac{\beta}{1-(\alpha-\beta)}
\]
Then we draw \(X_1\) (x[1]) from the marginal distribution of the model:
\[ X_1 \sim Binom(k, \theta) \]
The rest of the time series is generated sequentially: for each time point \(t\), values for the numbers of survived (S_t in the code) and revived (R_t) elements of the BinAR(1) model are drawn based on the previous observation (\(X_{t-1}\)) and then added together:
\[ \begin{aligned} S_{t} &\sim Binom(X_{t-1}, \alpha) \\ R_t &\sim Binom(k -X_{t-1}, \beta) \\ X_{t} &= S_t + R_t \end{aligned} \]
1.1.1.0.6 Code
Click to expand the code
dgm_binar <- function(...){
pa <- list(...)
if(is.list(pa$pa)) pa <- pa$pa
## setting default seed if not given
if(is.null(pa$alpha)) pa$alpha <- 0.5
if(is.null(pa$beta)) pa$beta <- 0.4
if(is.null(pa$k)) pa$k <- 10
if(is.null(pa$T)) pa$T <- 100
if(is.null(pa$seed)) pa$seed <- 0
## making other parameters
pa$rho <- pa$alpha - pa$beta
pa$theta <- pa$beta/(1-pa$rho)
## drawing the first sample x_1
set.seed(pa$seed)
x <- rep(NA, pa$T)
x[1] <- rbinom(n = 1,
size = pa$k ,
prob = pa$theta)
## making the rest of the time series
for (t in 2:pa$T){
S_t <- rbinom(n = 1,
size = x[t-1],
prob = pa$alpha)
R_t <- rbinom(n = 1,
size = pa$k - x[t-1],
prob = pa$beta)
x[t] <- S_t + R_t
}
## quick output of raw time series without book-keeping variables/parameters
if(!is.null(pa$only.ts))
if(pa$only.ts==TRUE) return(x)
Empirical.Parameters = list(Mean = mean(x),
Variance = var(x),
Skewness = moments::skewness(x),
AR = acf(x, lag.max = 1, plot = FALSE)$acf[2]
)
## making a LaTeX-ready list description of the model
Model.Description <- paste0("$BinAR(1):",
"\\; \\mu = ",
round(pa$Mean,2),
"(",
round(Empirical.Parameters$Mean,2),
")",
",\\; \\gamma = ",
round(pa$Skewness,2),
"(",
round(Empirical.Parameters$Skewness,2),
")",
",\\; \\rho = ",
round(pa$rho,3),
"(",
round(Empirical.Parameters$AR,2),
")",
",\\; \\alpha = ",
round(pa$alpha,2),
",\\; \\beta = ",
round(pa$beta,2),
",\\; \\theta = ",
round(pa$theta,3),
",\\; T = ",
pa$T,
"$")
Model.Description.Short <- paste0("$k =",
round(pa$k,2),
",\\; \\alpha = ",
round(pa$alpha,2),
",\\; \\beta = ",
round(pa$beta,2),
"\\; \\rightarrow",
"\\; \\rho = ",
round(Empirical.Parameters$AR, 2),
",\\; \\mu = ",
round(Empirical.Parameters$Mean, 2),
",\\; \\sigma^2 = ",
round(Empirical.Parameters$Variance,2),
",\\; \\gamma = ",
round(Empirical.Parameters$Skewness,2),
"$")
## making the output object
output <- list(x = x,
Model.Description = Model.Description,
Model.Description.Short = Model.Description.Short,
Model.Parameters = pa,
Empirical.Parameters = Empirical.Parameters)
return(output)
}
The canonical parameters of the PoDAR(1) model (which we referred to as PoDAR(1) in the simulation) are the persistence probability \(\tau\) (default: tau = 0.7) and the average rate \(\lambda\) (default: lambda = 0.5).
1.1.1.0.7 Construction
To generate the time series, we first draw the first observation \(X_1\) (x[1]) from a Poisson distribution with rate \(\lambda\):
\[ X_1 \sim Poisson(\lambda) \]
We then generate the rest of the time series by first drawing \(Z_t\) from a Poisson distribution with rate \(\lambda\) and \(P_t\) from a binomial distribution with size 1 and probability of success \(\tau\) (which is equivalent to a Bernoulli distribution with probability \(\tau\)). Then, we calculate \(X_t\) based on the previous observation (x[t-1]) and the values of \(Z_t\) (Z_t in the code) and \(P_t\) (V_t in the code), using the definition of the PoDAR(1) model:
\[ \begin{aligned} Z_t &\sim Poisson(\lambda) \\ P_t &\sim Binom(1, \tau) \\ X_t &= P_t X_{t-1} + (1-P_t) Z_t \end{aligned} \]
1.1.1.0.8 Code
Click to expand the code
dgm_podar <- function(...){
pa <- list(...)
if(is.list(pa$pa)) pa <- pa$pa
## setting default seed if not given
if(is.null(pa$tau)) pa$tau <- 0.7
if(is.null(pa$lambda)) pa$lambda <- 0.5
if(is.null(pa$k)) pa$k <- 6
if(is.null(pa$T)) pa$T <- 100
if(is.null(pa$seed)) pa$seed <- 0
## drawing the first sample x_1
set.seed(pa$seed)
x <- rep(NA, pa$T)
x[1] <- rpois(n = 1,
lambda = pa$lambda)
## making the rest of the time series
for (t in 2:pa$T){
V_t <- rbinom(n = 1,
size = 1,
prob = pa$tau)
Z_t <- rpois(n = 1,
lambda = pa$lambda)
x[t] <- V_t*x[t-1] + (1-V_t)*Z_t
}
## quick output of raw time series without book-keeping variables/parameters
if(!is.null(pa$only.ts))
if(pa$only.ts==TRUE) return(x)
Empirical.Parameters = list(Mean = mean(x),
Variance = var(x),
Skewness = moments::skewness(x),
AR = acf(x, lag.max = 1, plot = FALSE)$acf[2]
)
## making a LaTeX-ready list description of the model
Model.Description <- paste0("$PoDAR(1):",
"\\; \\mu = ",
round(pa$Mean,2),
"(",
round(Empirical.Parameters$Mean,2),
")",
",\\; \\gamma = ",
round(pa$Skewness,2),
"(",
round(Empirical.Parameters$Skewness,2),
")",
"\\; \\tau = ",
round(pa$tau,3),
"(",
round(Empirical.Parameters$AR,2),
")",
",\\; \\lambda = ",
round(pa$lambda,3),
",\\; T = ",
pa$T,
"$")
Model.Description.Short <- paste0("$\\lambda = ",
round(pa$lambda,3),
",\\; \\tau = ",
round(pa$tau, 2),
"\\; \\rightarrow",
"\\; \\rho = ",
round(Empirical.Parameters$AR, 2),
",\\; \\mu = ",
round(Empirical.Parameters$Mean, 2),
",\\; \\sigma^2 = ",
round(Empirical.Parameters$Variance,2),
",\\; \\gamma = ",
round(Empirical.Parameters$Skewness,2),
"$")
## making the output object
output <- list(x = x,
Model.Description = Model.Description,
Model.Description.Short = Model.Description.Short,
Model.Parameters = pa,
Empirical.Parameters = Empirical.Parameters)
return(output)
}
1.1.2 General DGM wrappers
Given that, in each model, two canonical parameters characterize the dynamic and marginal features of the generated time series, and given that we have analytic formulas that link the canonical parameters to the model-implied \(\phi\), \(\mu\), \(\sigma^2\), and \(\gamma\), we use a function (dgm_parameterizer) to calculate the canonical parameters from two given parameters and make a complete list of parameters (called pa). This list also includes non-parameter variables, importantly the time series length \(T\) (saved in pa$T) and the random seed used in the dgm_* functions (saved in pa$seed). A wrapper function (dgm_generator) is used as an interface to all dgm_* functions; it first makes sure the given parameters are sufficient for data generation, makes a complete parameter list pa with the help of dgm_parameterizer, and passes pa to the respective DGM generating function.
The function dgm_parameterizer calculates the canonical/model-implied parameters of a given DGM (specified using the Model argument) based on the parameters given to it as arguments and saves them in a list of parameters (pa), which is returned by the function. By giving default values to some parameters, the function makes sure that the provided set of parameters is sufficient to characterize the dynamic parameter of the model (i.e., the autoregression \(\phi\)) and at least one of the marginal parameters (importantly, the mean \(\mu\)).
Click to expand the code
dgm_parameterizer <- function(...){
pa <- list(...)
if(is.list(pa$pa)) pa <- pa$pa
if(is.null(pa$Model)) pa$Model <- "ChiAR(1)"
if(is.null(pa$phi)) pa$phi <- 0.2
## %%%%%%%%%%%%
## for NAR(1)
## %%%%%%%%%%%%
if(tolower(pa$Model) == "nar(1)" | tolower(pa$Model) == "nar"){
if(is.null(pa$k)) pa$k <- 100
## Calculating model parameters
## if mean is given
if (!is.null(pa$Mean)) {
# then from the mean formula
pa$c <- pa$Mean * (1 - pa$phi)
}
## if mean is not given, we get it from the intercept
else{
# set intercept to zero, if already not defined
if(is.null(pa$c)) pa$c <- 0
# then from the mean formula
pa$Mean <- pa$c / (1 - pa$phi)
}
## Now we certainly have the mean calculated
## The within-person (marginal) variance is more interpretable. So
## we always calculate the residual variance based on the marginal
## variance, unless var.resid is known BUT var.marginal is NOT.
if (!is.null(pa$var.resid) & is.null(pa$var.marginal)) {
# from the marginal variance formula
pa$var.marginal <- pa$var.resid / (1 - pa$phi ^ 2)
}
## If none of them are known we first give var.marginal a default
## value and calculate var.resid based on that.
if (is.null(pa$var.resid) & is.null(pa$var.marginal)) {
# we set a default value for the marginal variance
pa$var.marginal <- 4
}
## Then, we calculate var.resid (again) with var.marginal to make
## sure that the marginal variance had been given more importance
pa$var.resid <- pa$var.marginal * (1 - pa$phi ^ 2)
## Returning the parameter list
return(pa)
}
## %%%%%%%%%%%%
## for ChiAR(1)
## %%%%%%%%%%%%
if(tolower(pa$Model) == "chiar(1)" | tolower(pa$Model) == "chiar" |
tolower(pa$Model) == "chi2ar(1)" | tolower(pa$Model) == "chi2ar"){
if(is.null(pa$k)) pa$k <- 100
## Calculating model parameters
## if mean and skewness are given
if (!is.null(pa$Mean) & !is.null(pa$Skewness)) {
# from the skewness formula
pa$nu <-
8 * (1 - pa$phi ^ 2) ^ 3 / ((pa$Skewness ^ 2) * (1 - pa$phi ^
3) ^ 2)
# then from the mean formula
pa$c <- pa$Mean * (1 - pa$phi) - pa$nu
# returning the parameter list
return(pa)
}
## if mean and skewness are not given at the same time, we use c
## and one other parameter
else{
# set intercept to zero, if already not defined
if(is.null(pa$c)) pa$c <- 0
## if mean is given
if (!is.null(pa$Mean)) {
# then from the mean formula
pa$nu <- pa$Mean * (1 - pa$phi) - pa$c
# from the skewness formula
pa$Skewness <-
2 * (1 - pa$phi ^ 2) ^ 1.5 / (sqrt(pa$nu / 2) * (1 - pa$phi ^
3))
# returning the parameter list
return(pa)
}
## if skewness is given
if (!is.null(pa$Skewness)) {
# from the skewness formula
pa$nu <-
8 * (1 - pa$phi ^ 2) ^ 3 / ((pa$Skewness ^ 2) * (1 - pa$phi ^
3) ^ 2)
# then from the mean formula
pa$Mean <- (pa$c + pa$nu) / (1 - pa$phi)
# returning the parameter list
return(pa)
}
## if nu is given
if(!is.null(pa$nu)){
# from the mean formula
pa$Mean <- (pa$c + pa$nu)/(1-pa$phi)
# from the skewness formula
pa$Skewness <-
2 * (1 - pa$phi ^ 2) ^ 1.5 / (sqrt(pa$nu / 2) * (1 - pa$phi ^
3))
# returning the parameter list
return(pa)
}
}
}
## %%%%%%%%%%%%
## for BinAR(1)
## %%%%%%%%%%%%
if (tolower(pa$Model) == "binar(1)" | tolower(pa$Model) == "binar") {
if (is.null(pa$k))
pa$k <- 10
## Calculating model parameters
## if mean is given
if (!is.null(pa$Mean)) {
# from the mean formula: mean = k*theta
pa$theta <- pa$Mean / pa$k
## we then calculate skewness based on theta
pa$Skewness <- (1 - 2 * pa$theta) / sqrt(pa$k * pa$theta * (1 - pa$theta))
## we then calculate beta based on theta and phi
pa$beta <- pa$theta * (1 - pa$phi)
# then we calculate alpha
pa$alpha <- pa$phi + pa$beta
# finally we calculate k*beta, which is equivalent to c
pa$c <- pa$k * pa$beta
# returning the parameter list
return(pa)
}
## if skewness is given
if (!is.null(pa$Skewness)) {
# from skewness formula: skewness = (1-2*theta)/sqrt(k*theta*(1-theta))
# it is easier to write with ks = k*(skewness^2)
ks <- pa$k*pa$Skewness^2
pa$theta <- (ks + 4 -
sqrt(ks * (ks + 4))) / (2 * ks + 8)
# The above formula only gives theta < 0.5, thus for negative skewness
# we must use 1-theta instead
if(pa$Skewness < 0) pa$theta <- 1 - pa$theta
# we then calculate mean based on theta
pa$Mean <- pa$k * pa$theta
# we then calculate beta based on theta and phi
pa$beta <- pa$theta * (1 - pa$phi)
# then we calculate alpha
pa$alpha <- pa$phi + pa$beta
# finally we calculate k*beta, which is equivalent to c
pa$c <- pa$k * pa$beta
# returning the parameter list
return(pa)
}
## if theta is given
if (!is.null(pa$theta)) {
# from theta formula
pa$beta <- pa$theta * (1 - pa$phi)
# from beta formula
pa$alpha <- pa$phi + pa$beta
# we then calculate mean based on theta
pa$Mean <- pa$k * pa$theta
# we then calculate skewness based on theta
pa$Skewness <- (1 - 2 * pa$theta) / sqrt(pa$k * pa$theta * (1 - pa$theta))
# finally we calculate k*beta, which is equivalent to c
pa$c <- pa$k * pa$beta
# returning the parameter list
return(pa)
}
## if alpha is given
if (!is.null(pa$alpha)) {
# from beta formula
pa$beta <- pa$alpha - pa$phi
# from theta formula
pa$theta <- pa$beta / (1 - pa$phi)
# we then calculate mean based on theta
pa$Mean <- pa$k * pa$theta
# we then calculate skewness based on theta
pa$Skewness <- (1 - 2 * pa$theta) / sqrt(pa$k * pa$theta * (1 - pa$theta))
# finally we calculate k*beta, which is equivalent to c
pa$c <- pa$k * pa$beta
# returning the parameter list
return(pa)
}
## if beta is given
if (!is.null(pa$beta)) {
# from beta formula
pa$alpha <- pa$phi + pa$beta
# from theta formula
pa$theta <- pa$beta / (1 - pa$phi)
# we then calculate mean based on theta
pa$Mean <- pa$k * pa$theta
# we then calculate skewness based on theta
pa$Skewness <- (1 - 2 * pa$theta) / sqrt(pa$k * pa$theta * (1 - pa$theta))
# finally we calculate k*beta, which is equivalent to c
pa$c <- pa$k * pa$beta
# returning the parameter list
return(pa)
}
## if intercept given
if (!is.null(pa$c)) {
# from intercept formula c = k*beta
pa$beta <- pa$c / pa$k
# from beta formula
pa$alpha <- pa$phi + pa$beta
# from theta formula
pa$theta <- pa$beta / (1 - pa$phi)
# we then calculate mean based on theta
pa$Mean <- pa$k * pa$theta
# we then calculate skewness based on theta
pa$Skewness <- (1 - 2 * pa$theta) / sqrt(pa$k * pa$theta * (1 - pa$theta))
# returning the parameter list
return(pa)
}
}
## %%%%%%%%%%%%
## for PoDAR(1)
## %%%%%%%%%%%%
if (tolower(pa$Model) == "podar(1)" | tolower(pa$Model) == "podar") {
if (is.null(pa$k))
pa$k <- 50
# phi and tau are the same, then if tau is defined, it overrules phi
if(!is.null(pa$tau)) pa$phi <- pa$tau
# and if tau is not defined, then tau will get the value of phi
if(is.null(pa$tau)) pa$tau <- pa$phi
# DAR(1) intercept is zero
pa$c <- 0
## Calculating model parameters
## if mean is given
if (!is.null(pa$Mean)) {
# mean of Poisson is lambda
pa$lambda <- pa$Mean
## we then calculate skewness based on lambda
pa$Skewness <- pa$lambda^(-0.5)
# returning the parameter list
return(pa)
}
## if skewness is given
if (!is.null(pa$Skewness)) {
# from skewness formula: skewness = 1/sqrt(lambda)
pa$lambda <- pa$Skewness^(-2)
# we then calculate the mean based on lambda
pa$Mean <- pa$lambda
# returning the parameter list
return(pa)
}
## if lambda is given
if (!is.null(pa$lambda)) {
# mean of Poisson is lambda
pa$Mean <- pa$lambda
## we then calculate skewness based on lambda
pa$Skewness <- pa$lambda^(-0.5)
# returning the parameter list
return(pa)
}
}
}
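As a brief illustration (a sketch, not part of the pipeline), the parameterizer can complete a \(\chi^2\)AR(1) parameter list from a target mean and skewness:
## Sketch: completing a ChiAR(1) parameter list from Mean and Skewness
pa.chiar <- dgm_parameterizer(Model = "ChiAR(1)",
                              phi = 0.4,
                              Mean = 10,
                              Skewness = 1.5)
pa.chiar$nu # degrees of freedom implied by the skewness formula
pa.chiar$c  # intercept implied by the mean formula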
The function dgm_generator gets a set of parameters (either as separate arguments or as a list of parameters, like the one returned by dgm_parameterizer) and saves them in a list called pa. It checks whether \(\phi\) is included in the list (if not, it sets the default value pa$phi = 0.2), and checks whether at least one other parameter (which, together with \(\phi\), is required to characterize the marginal properties of the DGMs) is provided (if not, it sets the default value pa$Mean = 5 for \(\mu\)). Furthermore, if the DGM name, time series length, and random seed are not provided, it gives them default values (respectively: Model = "ChiAR(1)", T = 100, and seed = 0) and adds them to pa.
Then, it passes the pa list to dgm_parameterizer to do the necessary conversions and complete the list of canonical and model-implied parameters. Finally, given the model name, it checks whether the non-canonical parameters \(k\) and \(c\) are set (otherwise it assigns appropriate defaults to them), and passes the complete parameter list to the respective DGM function.
Click to expand the code
dgm_generator <- function(...){
pa <- list(...)
if(is.list(pa$pa)) pa <- pa$pa
## setting default seed if not given
if(is.null(pa$Model)) pa$Model <- "ChiAR(1)"
if(is.null(pa$phi)) pa$phi <- 0.2
# if no model parameter is given, then mean is set to a default 5
if (is.null(pa$Mean) &
is.null(pa$Skewness) &
is.null(pa$c) &
is.null(pa$nu) &
is.null(pa$alpha) &
is.null(pa$beta) &
is.null(pa$theta)
) pa$Mean <- 5
# if(is.null(pa$Mean)) pa$Variance <- 10
# if(is.null(pa$Skewness)) pa$Skewness <- 3
if(is.null(pa$T)) pa$T <- 100
if(is.null(pa$seed)) pa$seed <- 0
if(is.null(pa$only.ts)) pa$only.ts <- FALSE
## If you set `pa$only.ts` parameter as TRUE, the dgm_ functions produce only
## the raw time series (a single numeric vector) which is way lighter and way
## faster:
# pa$only.ts <- TRUE
## calculating model parameters
pa <- dgm_parameterizer(pa = pa)
### making models
## %%%%%%%%%%%%
## NAR(1)
## %%%%%%%%%%%%
if(tolower(pa$Model) == "nar(1)" | tolower(pa$Model) == "nar"){
# default maximum scale value
if(is.null(pa$k)) pa$k <- 100
## %% Generating the data
o <- dgm_nar(pa = pa)
}
## %%%%%%%%%%%%
## ChiAR(1)
## %%%%%%%%%%%%
if(tolower(pa$Model) == "chiar(1)" | tolower(pa$Model) == "chiar" |
tolower(pa$Model) == "chi2ar(1)" | tolower(pa$Model) == "chi2ar"){
# default maximum scale value
if(is.null(pa$k)) pa$k <- 100
## %% Generating the data
o <- dgm_chiar(pa = pa)
}
## %%%%%%%%%%%%
## BinAR(1)
## %%%%%%%%%%%%
if(tolower(pa$Model) == "binar(1)" | tolower(pa$Model) == "binar"){
# default maximum scale value
if(is.null(pa$k)) pa$k <- 10
## %% Generating the data
o <- dgm_binar(pa = pa)
}
## %%%%%%%%%%%%
## DAR(1)
## %%%%%%%%%%%%
if(tolower(pa$Model) == "dar(1)" | tolower(pa$Model) == "dar"){
# default maximum scale value
if(is.null(pa$k)) pa$k <- 10
## %% Generating the data
o <- dgm_dar(pa = pa)
}
## %%%%%%%%%%%%
## PoDAR(1)
## %%%%%%%%%%%%
if(tolower(pa$Model) == "podar(1)" | tolower(pa$Model) == "podar"){
# default maximum scale value
if(is.null(pa$k)) pa$k <- 100
## %% Generating the data
o <- dgm_podar(pa = pa)
}
## Also allow a data frame output
# if(!is.null(p$as.dataframe) & p$as.dataframe){
#
# }
return(o)
}
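A minimal usage sketch of the general interface (the parameter values here are illustrative):
## Sketch: full output object for a BinAR(1) series
o <- dgm_generator(Model = "BinAR(1)", phi = 0.4, Mean = 2,
                   T = 100, seed = 1)
o$Model.Parameters$alpha # canonical parameters filled in by dgm_parameterizer
o$Model.Parameters$beta
## raw vector only (lighter and faster), here for a PoDAR(1) series
x <- dgm_generator(Model = "PoDAR(1)", phi = 0.7, Mean = 4,
                   T = 100, seed = 1, only.ts = TRUE)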
1.1.3 Dataset generation
The machinery described above can be used to generate individual (\(N=1\)) time series. However, for the simulation study, we need datasets comprising multiple (\(N=25, 50, 100\)) individuals. As discussed in the paper, all individuals in a dataset of a DGM share the same autoregressive parameter (\(\phi_i=0.4\); in the v2 implementation, the individual \(\phi_i\)s are sampled around this value, as shown in make_datasets below), and the individual differences are only in the individual means (\(\boldsymbol{\mu} = [\mu_1, \mu_2, \dots, \mu_N]\)). Thus, we write a function (dgm_make.sample) that can generate, for each DGM, a dataset of \(N\) individuals based on an \(N\)-dimensional vector of individual means. We then need to find appropriate parameters for the level-2 distributions (Gaussian and \(\chi^2\) distributions) of each DGM, such that we get a considerable proportion of individuals with considerably high skewness while respecting the lower and upper bounds of the values supported by each model. Finally, with a wrapper function (make_datasets), we facilitate making datasets by automatically generating a means vector suitable for each DGM.
The function dgm_make.sample generates a dataset of time series of length T with the autoregressive parameter phi from a desired DGM (determined by the Model argument), given a vector of means (passed as the argument Means). The length of Means determines the number of individuals in the dataset (N <- length(Means)). If Means is not provided, a randomly generated vector of \(N = 100\) means is used as the default. Since each individual time series is generated with a random seed, we need a vector of N unique seeds, which can be provided using the seeds argument. In case seeds is not provided, it is generated based on the provided means (seeds.from.means), and if it is a scalar, the seeds vector is created by adding the scalar to seeds.from.means (which allows generating different datasets with the same mean distributions).
Click to expand the code
dgm_make.sample <- function(Model = "ChiAR(1)",
Means = rnorm(100, 5, 3),
T = 100,
phi = 0.4,
Phis = NULL,
seeds = NULL
){
N <- length(Means)
seeds.from.means <- 1000*N*Means
if(is.null(seeds)) seeds <- seeds.from.means
if(length(seeds)<=1) seeds <- seeds + seeds.from.means
df <- data.frame(subject = rep(1:N, each = T),
t = rep(1:T, times = N),
x = rep(NA, N*T))
# Adapting for v2 implementation
if(length(Phis) != N) Phis <- rep(phi, N)
for(s in 1:N){
x <- dgm_generator(
Model = Model,
only.ts = TRUE,
T = T,
phi = Phis[s],
Mean = Means[s],
seed = seeds[s])
df$x[((s-1)*T + 1):(s*T)] <- x
}
return(df)
}
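A minimal usage sketch (with an illustrative means vector):
## Sketch: a small dataset of N = 5 PoDAR(1) individuals
d <- dgm_make.sample(Model = "PoDAR(1)",
                     Means = c(1, 2, 3, 4, 5),
                     T = 50,
                     phi = 0.4)
head(d) # long format: subject, t, x
aggregate(x ~ subject, d, mean) # per-person sample means track the Means vector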
For each alternative DGM (the \(\chi^2\)AR(1), BinAR(1), and PoDAR(1) models), we need to determine appropriate parameters for the level-2 distribution of means such that we have enough skewness in the generated datasets. To do so, we made a function (Mean.vs.Skewness) to help us experiment with different values of \(\mu\) and \(\sigma^2\) (of the Gaussian level-2 distribution) and \(\nu\) (of the \(\chi^2\) level-2 distribution) for each alternative DGM. Note that we start by generating more than enough samples from each distribution (\(10 \times N\)) and subsample \(N\) values after applying the model-specific lower and upper bounds.
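The full Mean.vs.Skewness function is available in the scripts folder; the following is only a minimal sketch of the underlying sampling-with-bounds idea (the function name sample_means and its defaults are illustrative, not the actual implementation):
## Illustrative sketch: draw 10*N candidate means from a Gaussian level-2
## distribution, drop out-of-bounds values, and keep N of the remainder
sample_means <- function(N = 100,
                         l2.Mean = 4, l2.Variance = 4,
                         lower = 0, upper = 100) {
  m <- rnorm(10 * N, l2.Mean, sqrt(l2.Variance))
  m <- m[m >= lower & m <= upper]
  sample(m, N)
}
hist(sample_means(), breaks = 30) # inspect the resulting distribution of means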
We notice that we get the desired distribution of skewness with the following parameters:
Model | \(\mu\) | \(\sigma^2\) | \(\nu\) |
---|---|---|---|
\(\chi^2\)AR(1) | 10 | 10 | 5 |
BinAR(1) | 2 | 1 | 2.9 |
PoDAR(1) | 4 | 4 | 1.5 |
These parameters give us the following distributions:
We then use a wrapper (make_datasets) around dgm_make.sample that generates datasets for all four DGMs with the appropriate level-2 parameters specified above. Note that here we first generate more than enough (i.e., \(2 \times N\)) samples of means to make sure we end up with \(N\) samples after applying the upper and lower bounds. The datasets, with some additional variables, are then saved in separate .rds files using the wrapper functions described in Section 2.2.
Click to expand the code
make_datasets <- function(Model = "DAR",
T = 100,
N = 100,
phi = 0.4,
l2.distribution = "Gaussian",
uSeed = 0,
version = "v2") {
# save the global RNG state and set it back before leaving
if (!exists(".Random.seed")) set.seed(NULL) # ensure .Random.seed exists in a fresh session
seed.old <- .Random.seed
on.exit({
.Random.seed <<- seed.old
})
set.seed(uSeed)
if (tolower(Model) == "nar") {
model.name <- "NAR"
lower_bound <- 0
upper_bound <- 100
lev2.Mean <- 50
lev2.Variance <- 4
chi2.df <- 2
}
if (tolower(Model) == "chiar" | tolower(Model) == "chi2ar") {
model.name <- "Chi2AR"
lower_bound <- 0
upper_bound <- 100
lev2.Mean <- 10
lev2.Variance <- 10
chi2.df <- 5
}
if (tolower(Model) == "binar") {
model.name <- "BinAR"
lower_bound <- 0
upper_bound <- 10 # maximum scale value k of the BinAR(1) model
lev2.Mean <- 2
lev2.Variance <- 1
chi2.df <- 2.9
}
if (tolower(Model) == "podar") {
model.name <- "PoDAR"
lower_bound <- 0
upper_bound <- 100
lev2.Mean <- 4
lev2.Variance <- 4
chi2.df <- 1.5
}
# sampling within-person mean from level 2 distribution
if (l2.distribution == "Gaussian")
Means <- rnorm(2 * N, lev2.Mean, sqrt(lev2.Variance))
if (l2.distribution == "Chi2")
Means <- rchisq(2 * N, chi2.df)
# removing out-of-bounds means
Means[Means < lower_bound] <- NA
Means[Means > upper_bound] <- NA
# keeping N samples from the in-bound means
Means <- Means %>% na.omit() %>% sample(N)
# Adapting for v2 implementation
Phis <- rep(phi, N)
## Implementing v2
if(version == "v2"){
# sampling within-person phis
Phis <- rnorm(2 * N, phi, 0.1)
# removing out-of-bounds phis
Phis[Phis <= 0] <- NA
Phis[Phis >= 1] <- NA
# keeping N samples from the in-bound phis
Phis <- Phis %>% na.omit() %>% sample(N)
}
# Making a dataframe using dgm_make.sample
sample_df <- dgm_make.sample(
Model = Model,
Means = Means,
T = T,
Phis = Phis,
seeds = NULL
)
if(version == "v2") return(list(dataset = sample_df,
Means = Means,
Phis = Phis))
return(sample_df)
}
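A usage sketch (uSeed and the condition values are illustrative; the pipe inside make_datasets requires dplyr or magrittr to be loaded):
## Sketch: one BinAR(1) dataset with chi^2-distributed means (v2 output)
library(dplyr)
out <- make_datasets(Model = "BinAR",
                     T = 100,
                     N = 25,
                     phi = 0.4,
                     l2.distribution = "Chi2",
                     uSeed = 1)
str(out$dataset) # long-format data frame: subject, t, x
range(out$Means) # sampled within-person means, within the model bounds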
1.2 The Analysis component
We analyzed each dataset with AR(1) models with fixed and random residual variance using Mplus v. 8.1 (Muthén and Muthén 2017). To interface Mplus from R, we used the package MplusAutomation (Hallquist and Wiley 2018) and wrote a function (run_MplusAutomation) that, for each iteration of each condition, saves the dataset as a .dat file, generates the .inp input script for the desired analysis type, and runs the model on that dataset. Mplus then saves the output files (.out and .gh5). After the analysis, run_MplusAutomation extracts the parameter estimates and returns them in an R object, which, with some additional variables, is saved in separate .rds files using the wrapper functions described in Section 2.2.
Click to expand the code
run_MplusAutomation <- function(df,
PROCESSORS = 2,
CHAINS = 2,
THIN = 2,
BITERATIONS.min = 2000,
BITERATIONS.max =
BITERATIONS.min * BITERATION.minmax.factor,
BITERATION.minmax.factor = 2.5,
out.folder = "Mplus-files/",
model_what = "resid.random",
file.name = Sys.Date(),
remove.files = FALSE,
drop.gh5 = FALSE,
version = "v2") {
inp.name <- paste0(out.folder,
file.name)
MplusAutomation::prepareMplusData(df,
paste0(inp.name, ".dat"))
VARIABLE <- glue::glue("CLUSTER = subject;
LAGGED = x(1);
TINTERVAL = t(1);")
ANALYSIS <- glue::glue(
"TYPE = TWOLEVEL RANDOM;
ESTIMATOR = BAYES;
PROCESSORS = {PROCESSORS};
CHAINS = {CHAINS};
THIN = {THIN};
BITERATIONS = {BITERATIONS.max}({BITERATIONS.min});"
)
PLOT <- glue::glue("TYPE = PLOT3;
FACTORS = ALL (500);")
if (model_what == "resid.random")
model_string <- glue::glue("%WITHIN%
phi | x ON x&1;
logv | x;
%BETWEEN%
x phi logv WITH x phi logv;")
if (model_what == "resid.fixed")
model_string <- glue::glue("%WITHIN%
phi | x ON x&1;
%BETWEEN%
x phi WITH x phi;")
if (model_what == "within.between") {
model_string <- glue::glue("%WITHIN%
x;
%BETWEEN%
x;")
VARIABLE <- "CLUSTER = subject;"
PLOT <- "TYPE = PLOT3;"
}
model.ar1 <- MplusAutomation::mplusObject(
TITLE = inp.name,
rdata = df,
usevariables = c("subject", "t", "x"),
VARIABLE = VARIABLE,
ANALYSIS = ANALYSIS,
MODEL = model_string,
OUTPUT = "TECH1 TECH2 TECH3 TECH8 FSCOMPARISON STANDARDIZED STDYX STDY;",
PLOT = PLOT
)
fit.ar1 <- MplusAutomation::mplusModeler(
model.ar1,
check = FALSE,
modelout = paste0(inp.name, ".inp"),
hashfilename = FALSE,
run = 1L
)
# Adapting for v2 implementation:
if(version == "v2") remove.files <- drop.gh5 <- TRUE
# Removing Mplus files generated by the R function
if (remove.files == TRUE)
remove.files <- file.remove(file.path(
out.folder,
dir(
path = out.folder,
pattern = file.name,
ignore.case = TRUE
)
))
# Dropping gh5 from the R object
if(drop.gh5 == TRUE) fit.ar1$results$gh5 <- NULL
return(fit.ar1)
}
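As a usage sketch (this assumes a licensed Mplus installation, that the out.folder already exists, and that the functions above are sourced; the dataset and file names are illustrative):
## Sketch: analyze one simulated dataset with the random residual variance model
d <- make_datasets(Model = "PoDAR", N = 25, T = 50, uSeed = 1)$dataset
fit <- run_MplusAutomation(df = d,
                           model_what = "resid.random",
                           out.folder = "Mplus-files/",
                           file.name = "example-podar")
fit$results$parameters$unstandardized # posterior summaries per parameter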
In each analysis, we ran two MCMC chains (CHAINS = 2), and, to reduce autocorrelation in the estimated parameters, we set THIN = 5 so that Mplus saved every 5th sample. By setting BITERATIONS = 5000(2000), we made sure to have between 2000 and 5000 samples (after thinning) for each parameter from each chain. Mplus considers the first half of each chain as burn-in samples and discards them; thus, with at least 1000 retained samples per chain, we got at least 2000 "independent" samples from both chains combined. Finally, with FACTORS = ALL (500), we asked Mplus to draw 500 samples for each individual when estimating the level-1 parameters. We visually inspected the traceplots and autocorrelation plots of the parameter estimates for a sample of analyzed datasets and observed good convergence. Furthermore, to make sure that the number of iterations and the thinning used in the analyses provide sufficiently accurate estimates, we re-analyzed two replications of each alternative DGM (with Gaussian and \(\chi^2\)-distributed means, \(N=100\), and \(T=100\)) with BITERATIONS = 12500(5000) and THIN = 20, which led to estimates of the parameters of interest (unstd X.WITH.PHI, unstd Variances.PHI, and stdyx X.WITH.PHI) that were almost identical (up to the third decimal) to those estimated with BITERATIONS = 5000(2000) and THIN = 5 (see below).
The generated input files looked like the following. Note that the TITLE and DATA strings in the .inp files are unique to the dataset being analyzed and include the unique dataset seed uSeed (passed to make_datasets to generate the dataset), the number of individuals in the dataset N, the length of the time series T, the model type used (resid.fixed or resid.random), and the replication number Rep (see Section 2.1 for further details).
By estimating the covariances between the mean and the autoregression (x phi WITH x phi under the %BETWEEN% command), the following Mplus script runs an AR(1) model with a random-effect mean and autoregressive parameter but with a fixed-effect residual variance.
Click to expand the code
TITLE:
fit_uSeed-000000_N-100_T-100_type-resid.fixed_Rep-1
DATA:
FILE = "fit_uSeed-000000_N-100_T-100_type-resid.fixed_Rep-1.dat";
VARIABLE:
NAMES = subject t x;
MISSING=.;
CLUSTER = subject;
LAGGED = x(1);
TINTERVAL = t(1);
ANALYSIS:
TYPE = TWOLEVEL RANDOM;
ESTIMATOR = BAYES;
PROCESSORS = 1;
CHAINS = 2;
THIN = 5;
BITERATIONS = 5000(2000);
MODEL:
%WITHIN%
phi | x ON x&1;
%BETWEEN%
x phi WITH x phi;
OUTPUT:
TECH1 TECH2 TECH3 TECH8 FSCOMPARISON STANDARDIZED STDYX STDY;
PLOT:
TYPE = PLOT3;
FACTORS = ALL (500);
By estimating the logarithm of the residual variance at level 1 (logv | x under the %WITHIN% command) and estimating the covariances between the level-2 mean, autoregression, and residual variance (x phi logv WITH x phi logv under the %BETWEEN% command), the following Mplus script runs an AR(1) model with a random-effect mean, autoregressive parameter, and residual variance.
Click to expand the code
TITLE:
fit_uSeed-000000_N-100_T-100_type-resid.random_Rep-1
DATA:
FILE = "fit_uSeed-000000_N-100_T-100_type-resid.random_Rep-1.dat";
VARIABLE:
NAMES = subject t x;
MISSING=.;
CLUSTER = subject;
LAGGED = x(1);
TINTERVAL = t(1);
ANALYSIS:
TYPE = TWOLEVEL RANDOM;
ESTIMATOR = BAYES;
PROCESSORS = 1;
CHAINS = 2;
THIN = 5;
BITERATIONS = 5000(2000);
MODEL:
%WITHIN%
phi | x ON x&1;
logv | x;
%BETWEEN%
x phi logv WITH x phi logv;
OUTPUT:
TECH1 TECH2 TECH3 TECH8 FSCOMPARISON STANDARDIZED STDYX STDY;
PLOT:
TYPE = PLOT3;
FACTORS = ALL (500);
The table below shows the estimated parameters of two replications with BITERATIONS = 5000(2000) and THIN = 5 (specified in the table as 2k x 5) and with BITERATIONS = 12500(5000) and THIN = 20 (specified in the table as 5k x 20). The relevant parameters (the level-2 covariances unstd X.WITH.PHI, correlations stdyx X.WITH.PHI, and variances unstd Variances.PHI) are highlighted in orange.
1.3 The Harvesting component
Extracting parameter estimates of individual analyses
The function fit_extract gets an .rds file generated by do_fit_doFuture (see Section 2.2), which includes book-keeping information and the Mplus output object generated by run_MplusAutomation (see Section 1.2). It extracts the unstandardized and stdyx.standardized Mplus parameter estimates and stores them, along with the additional book-keeping information, in a dataframe.
Click to expand the code
fit_extract <- function(rds.file){
m <- readRDS(rds.file)
est.par <- m[["fit.Dataset"]][["results"]][["parameters"]]
# make empty dataframe for NAs
empty <- data.frame(matrix(ncol = 19, nrow = 0))
colnames(empty) <- c("uSeed", "type", "l2.dist", "Model", "N", "phi", "T",
"Rep", "standardization", "est", "posterior_sd", "pval",
"lower_2.5ci", "upper_2.5ci", "sig", "BetweenWithin",
"param.name", "fit.ElapsedTime", "fit.File")
if(length(est.par) == 0) return(empty)
if(is.null(est.par[["unstandardized"]])) return(empty)
if(is.null(est.par[["stdyx.standardized"]])) return(empty)
unstd <- est.par[["unstandardized"]] %>%
mutate(param.name = paste(paramHeader,
param,
sep = ".")
) %>%
select(-paramHeader:-param) %>%
mutate(standardization = "unstd",
.before = est)
stdyx <- est.par[["stdyx.standardized"]] %>%
mutate(param.name = paste(paramHeader,
param,
sep = ".")
) %>%
select(-paramHeader:-param) %>%
mutate(standardization = "stdyx",
.before = est)
m$fit.Dataset <- NULL
res <- unstd %>%
rbind(stdyx) %>%
mutate(fit.ElapsedTime = (difftime(m[["fit.EndTime"]],
m[["fit.StartTime"]],
units="mins")
) %>%
as.numeric(),
fit.File = gsub(".*/", "", m$fit.File)) %>%
mutate(uSeed = m$uSeed,
type = m$type,
l2.dist = m$l2.dist,
Model = m$Model,
N = m$N,
phi = m$phi,
T = m$T,
Rep = m$Rep,
.before = standardization
)
return(res)
}
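A sketch of how fit_extract can be applied in batch (the folder path and file pattern are illustrative; the actual book-keeping is handled by the pipeline wrappers in Section 2.2):
## Sketch: harvest all fit .rds files in a folder into one dataframe
library(dplyr) # fit_extract uses dplyr verbs internally
library(purrr)
rds.files <- list.files("simulation-files", # illustrative path
                        pattern = "^fit.*\\.rds$",
                        full.names = TRUE)
harvest <- map_dfr(rds.files, fit_extract)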
1.4 The Reporting component
This component includes a function that cleans the harvested dataset (and extracts the relevant information), together with functions that generate dataset profile plots, results figures, and parameter-estimate coverage plots.
Given that the figures contain customized typefaces (from the CMU Serif and Merriweather font families), we need to load the fonts:
Click to expand the code
library(showtext)
library(RColorBrewer) # for brewer.pal() used below
# Check the current search path for fonts
font_paths()
#> [1] "C:\\Windows\\Fonts"
# List available font files in the search path
f <- font_files()
font_add("CMU Classical Serif", "cmunci.ttf")
font_add("CMU Serif Upright Italic", "cmunui.ttf")
font_add("CMU Serif", "cmunrm.ttf")
font_add("Merriweather Regular", "Merriweather-Regular.ttf")
font_add("Merriweather Light", "Merriweather Light.ttf")
font_families()
## automatically use showtext for new devices
showtext_auto()
p.colors <-
brewer.pal(name = "YlOrBr", n = 9)[c(5, 7, 9)]
palette_nar <- brewer.pal(name = "YlOrBr", n = 9)[c(5, 7, 9)]
palette_chiar <- brewer.pal(name = "GnBu", n = 9)[c(5, 7, 9)]
palette_binar <- brewer.pal(name = "YlGn", n = 9)[c(5, 7, 9)]
palette_podar <- brewer.pal(name = "BuPu", n = 9)[c(5, 7, 9)]
1.4.1 Cleaning the harvested dataset
Given that the output of fit_extract contains much more information than needed, the function harvest_cleanup extracts the relevant parameter estimates, namely the point estimates and the lower and upper bounds of the 95% credibility intervals (lower_2.5ci and upper_2.5ci) of the level-2 covariance (unstd X.WITH.PHI), the level-2 correlation (stdyx X.WITH.PHI), and the fixed-effect autoregressive parameter (unstd Means.PHI). It then cleans the dataset and makes it tidy, and calculates the outcomes of interest: the (absolute) average estimates, the mean absolute and squared errors, the root mean squared error, the estimation bias and variance, the number and percentage of non-converged datasets, and the percentages of significantly positive, significantly negative, and non-significant parameter estimates.
Click to expand the code
harvest_cleanup <- function(harv,
return.abridged = FALSE) {
d_abridged <- harv %>%
na.omit() %>%
mutate(std_parname = paste(standardization, param.name)) %>%
filter(std_parname %in% c("stdyx X.WITH.PHI",
"unstd X.WITH.PHI",
"unstd Means.PHI")) %>%
select(-standardization) %>%
mutate(
parameter = case_when(
std_parname == "stdyx X.WITH.PHI" ~ "Correlation",
std_parname == "unstd X.WITH.PHI" ~ "Covariance",
std_parname == "unstd Means.PHI" ~ "Fixed.Phi",
),
.after = Rep
) %>%
select(uSeed:sig) %>%
mutate(
true.value = case_when(
parameter == "Correlation" ~ 0,
parameter == "Covariance" ~ 0,
parameter == "Fixed.Phi" ~ 0.4
),
.after = sig
) %>%
mutate(l2.X.Model = as.factor(paste(l2.dist, Model)),
.before = N) %>%
mutate(
sign.X.sig = case_when(
upper_2.5ci < true.value ~ "Negative",
lower_2.5ci > true.value ~ "Positive",
TRUE ~ "Zero"
)
) %>%
group_by(type,
l2.X.Model,
N,
T,
parameter) %>%
mutate(
mean.est = mean(est),
mean.abs.est = mean(abs(est)),
MAE = mean(abs(est - true.value)),
MSE = mean((est - true.value) ^ 2),
RMSE = sqrt(mean((est - true.value) ^ 2)),
Bias = mean(est) - true.value,
Variance = var(est),
n.converged.datasets = n(),
nonconverged.percent = round(100 * (1000 - n()) / 1000, 2)
) %>%
arrange(est) %>%
ungroup() %>%
group_by(type,
N,
T,
l2.dist,
Model) %>%
mutate(ord = order(est) - n()/2) %>%
group_by(sign.X.sig,
.add = TRUE) %>%
mutate(percent = round(100 * n() / n.converged.datasets, 1),
.after = sign.X.sig) %>%
relocate(sign.X.sig:nonconverged.percent,
.after = parameter) %>%
relocate(est,
.after = percent)
levels(d_abridged$sign.X.sig) <- c(
"Significant negative estimates",
"Non-significant estimates",
"Significant positive estimates"
)
if(return.abridged == TRUE) return(d_abridged)
d_summary <- d_abridged %>%
select(type:nonconverged.percent) %>%
select(-Rep, -est) %>%
distinct() %>%
mutate(`Type-1 error` = percent,
.after = Variance
)
d_important <- d_summary[with(d_summary,
order(type,
l2.X.Model,
parameter,
N,
T)), ] %>%
mutate(`Model name` = factor(paste0(Model, "(1)"),
levels = c("NAR(1)",
"Chi2AR(1)",
"BinAR(1)",
"PoDAR(1)")
),
`Means distribution` = paste0(l2.dist, "-distributed means"),
Resid = case_when(type == "resid.fixed" ~ "`Fixed Residual Variance`",
type == "resid.random" ~ "`Random Residual Variance`"),
.after = l2.X.Model)
levels(d_important$`Model name`) <- c(`NAR(1)` = "AR(1)",
`Chi2AR(1)` = TeX("$\\chi^2$AR(1)"),
`BinAR(1)` = "BinAR(1)",
`PoDAR(1)` = "PoDAR(1)")
return(d_important)
}
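A usage sketch, assuming harvest is a dataframe of stacked fit_extract outputs (as in the sketch in Section 1.3):
## Sketch: from a harvested dataframe to the summaries used for plotting
library(latex2exp) # harvest_cleanup uses TeX() for facet labels
d_important <- harvest_cleanup(harvest) # condition-level summary
d_abridged <- harvest_cleanup(harvest, return.abridged = TRUE) # per-replication estimates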
1.4.2 Making distribution plots of simulated datasets
These functions visualize datasets (simulated or empirical) by generating individual histograms and pairplots of summary statistics, together with a function that generates (and combines) these two plots for any given dataset simulated by do_sim_parallel in the pipeline (see Section 2.2).
The function plot_histograms makes per-person histograms of the individuals in a dataset (stored as a dataframe) and arranges them based on the individual means.
Click to expand the code
plot_histograms <- function(d,
binwidth = 0.5,
nrow.facet = 20,
p.colors = c("#FE9929",
"#CC4C02",
"#662506")) {
fill_col <- p.colors[2]
x.range <- d$x %>% range()
d %>%
group_by(subject) %>%
mutate(mm = mean(x)) %>%
arrange(desc(mm)) %>%
ggplot(aes(x = x,
y = ..ndensity..)) +
geom_histogram(binwidth = binwidth,
fill = fill_col,
center = 0) +
facet_wrap(~ reorder(subject, mm),
nrow = nrow.facet) +
geom_hline(yintercept = 0,
size = 0.1) +
ggtitle(NULL) +
theme_tufte() +
theme(
strip.background = element_blank(),
aspect.ratio = 1,
strip.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
axis.text = element_blank()
)
}
The function plot_pairplots calculates the mean, variance, and skewness of each individual in a dataset (stored as a dataframe) and generates pairplots of their joint distribution.
Click to expand the code
plot_pairplots <- function(d,
p.colors = c("#FE9929",
"#CC4C02",
"#662506"),
rel_title = 2,
rel_lines = 0.5,
rel_dots = 3,
bins = 30) {
color_hist <- p.colors[2]
color_scatter <- p.colors[1]
d <- d %>%
group_by(subject) %>%
summarise(
Mean = mean(x,
na.rm = TRUE),
Variance = var(x,
na.rm = TRUE),
Skewness = moments::skewness(x,
na.rm = TRUE)
) %>%
na.omit()
cors <- d %>%
select(Mean, Variance, Skewness) %>%
correlation::correlation() %>%
as.data.frame() %>%
mutate(
value = round(r, 3),
significance = cut(
p,
breaks = c(0, 0.001, 0.01, 0.05, 1),
include.lowest = T,
labels = c('***', '**', '*', '')
),
string = paste0(value, significance)
) %>%
select(-r:-n_Obs)
margin.thing <- 5
## Correlations
p_cor_Mean.Variance <-
ggplot() +
geom_text(aes(
x = mean(range(d$Variance)),
y = mean(range(d$Mean)),
label = paste(
"Corr:",
cors %>%
filter(Parameter1 == "Mean",
Parameter2 == "Variance") %>%
pull(string)
)
)) +
theme_tufte() +
theme(
text = element_text(size = rel(rel_title)),
axis.title = element_blank(),
axis.ticks = element_blank(),
axis.text = element_blank()
)
p_cor_Mean.Skewness <-
ggplot() +
geom_text(aes(
x = mean(range(d$Skewness)),
y = mean(range(d$Mean)),
label = paste(
"Corr:",
cors %>%
filter(Parameter1 == "Mean",
Parameter2 == "Skewness") %>%
pull(string)
)
)) +
theme_tufte() +
theme(
text = element_text(size = rel(rel_title)),
axis.title = element_blank(),
axis.ticks = element_blank(),
axis.text = element_blank()
)
p_cor_Variance.Skewness <-
ggplot() +
geom_text(aes(
x = mean(range(d$Skewness)),
y = mean(range(d$Variance)),
label = paste(
"Corr:",
cors %>%
filter(Parameter1 == "Variance",
Parameter2 == "Skewness") %>%
pull(string)
)
)) +
theme_tufte() +
theme(
text = element_text(size = rel(rel_title)),
axis.title = element_blank(),
axis.ticks = element_blank(),
axis.text = element_blank()
)
## Histograms
p_mean <-
d %>%
ggplot(aes(x = Mean)) +
geom_histogram(fill = color_hist,
bins = bins) +
ggtitle("Mean") +
theme_tufte() +
theme(
plot.title = element_text(
size = rel(rel_title),
hjust = 0.5,
margin = margin(t = margin.thing,
b = -margin.thing)
),
axis.title = element_blank(),
axis.ticks.x = element_blank(),
axis.text.x = element_blank()
)
p_variance <-
d %>%
ggplot(aes(x = Variance)) +
geom_histogram(fill = color_hist,
bins = bins) +
ggtitle("Variance") +
theme_tufte() +
theme(
plot.title = element_text(
size = rel(rel_title),
hjust = 0.5,
margin = margin(t = margin.thing,
b = -margin.thing)
),
axis.title = element_blank(),
axis.ticks = element_blank(),
axis.text = element_blank()
)
p_skewness <-
d %>%
ggplot(aes(x = Skewness)) +
geom_histogram(fill = color_hist,
bins = bins) +
ggtitle("Skewness") +
geom_vline(
xintercept = 0,
alpha = 0.5,
color = "gray45",
linetype = "solid",
size = rel(rel_lines)
) +
geom_vline(
xintercept = c(-0.5, 0.5),
color = "gray45",
linetype = "dotted",
size = rel(rel_lines)
) +
geom_vline(
xintercept = c(-1, 1),
color = "gray45",
linetype = "dashed",
size = rel(rel_lines)
) +
theme_tufte() +
theme(
plot.title = element_text(
size = rel(rel_title),
hjust = 0.5,
margin = margin(t = margin.thing,
b = -margin.thing)
),
axis.title = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = element_blank()
)
## scatter plots
p_scatter_Mean.Variance <- d %>%
ggplot(aes(x = Mean,
y = Variance)) +
geom_point(
color = color_scatter,
shape = "bullet",
alpha = 0.85,
size = rel(rel_dots)
) +
theme_tufte() +
theme(
axis.title = element_blank(),
axis.ticks.x = element_blank(),
axis.text.x = element_blank()
)
p_scatter_Mean.Skewness <- d %>%
ggplot(aes(x = Mean,
y = Skewness)) +
geom_hline(
yintercept = 0,
alpha = 0.5,
color = "gray45",
linetype = "solid",
size = rel(rel_lines)
) +
geom_hline(
yintercept = c(-0.5, 0.5),
color = "gray45",
linetype = "dotted",
size = rel(rel_lines)
) +
geom_hline(
yintercept = c(-1, 1),
color = "gray45",
linetype = "dashed",
size = rel(rel_lines)
) +
geom_point(
color = color_scatter,
shape = "bullet",
alpha = 0.7,
size = rel(rel_dots)
) +
theme_tufte() +
theme(axis.title = element_blank())
p_scatter_Variance.Skewness <- d %>%
ggplot(aes(x = Variance,
y = Skewness)) +
geom_hline(
yintercept = 0,
alpha = 0.5,
color = "gray45",
linetype = "solid",
size = rel(rel_lines)
) +
geom_hline(
yintercept = c(-0.5, 0.5),
color = "gray45",
linetype = "dotted",
size = rel(rel_lines)
) +
geom_hline(
yintercept = c(-1, 1),
color = "gray45",
linetype = "dashed",
size = rel(rel_lines)
) +
geom_point(
color = color_scatter,
shape = "bullet",
alpha = 0.7,
size = rel(rel_dots)
) +
theme_tufte() +
theme(
axis.title = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = element_blank()
)
## Putting plots together
p_pairplots <-
(p_mean + p_cor_Mean.Variance + p_cor_Mean.Skewness) /
(p_scatter_Mean.Variance + p_variance + p_cor_Variance.Skewness) /
(p_scatter_Mean.Skewness + p_scatter_Variance.Skewness + p_skewness) &
# plot_layout(guides = "collect") &
plot_annotation(theme = theme(
plot.title = element_text(family = "Merriweather Regular"),
axis.text = element_text(size = rel(rel_title) / 2)
))
## Returning the 3x3 pairplots
p_pairplots
}
The function plot_dataset.profile gets a simulated object (generated by do_sim_parallel) and combines the individual histograms and pairplots in a single figure.
Click to expand the code
plot_dataset.profile <- function(sim.object,
p.colors = NULL,
title_l2.dist = FALSE) {
d_sim <- sim.object$sim.Dataset
l2.distr <- ifelse(sim.object$l2.dist == "Chi2",
"$\\chi^2$",
"Gaussian")
model <- sim.object$Model
title <- model
binwidth <- 0.5
if (model == "NAR") {
title <- "AR"
palette_color <- palette_nar
}
if (model == "Chi2AR") {
title <- "$\\chi^2$AR"
palette_color <- palette_chiar
}
if (model == "BinAR")
palette_color <- palette_binar
if (model == "PoDAR")
palette_color <- palette_podar
if (is.null(p.colors))
p.colors <- palette_color
title <- paste0(title,
"(1)",
" $\\phantom{\\chi^2}$")
if (title_l2.dist)
title <- paste0(l2.distr,
"-distributed means",
" $\\phantom{\\chi^2}$")
p_out <- plot_histograms(
d = d_sim,
binwidth = binwidth,
nrow.facet = 10,
p.colors = p.colors
) +
plot_spacer() +
plot_pairplots(d = d_sim,
p.colors = p.colors) +
plot_layout(widths = c(1, 0.05, 1)) +
plot_annotation(
title = TeX(title),
theme = theme(
plot.title = element_text(size = rel(2.5),
family = "CMU Serif"),
plot.subtitle = element_blank()
)
)
p_out %>% wrap_elements()
}
We also wrote the function save_dataset_profile to put two dataset profile plots together, producing figures similar to Figures S1, S2, S4, and S5 in the Supplemental Materials, and to save the final figure as a PDF file.
Click to expand the code
save_dataset_profile <- function(upper,
lower,
file.name = "Dataset profile",
path = "figures",
title = NULL) {
p_profiles <-
(
plot_dataset.profile(upper, title_l2.dist = TRUE) /
plot_spacer() /
plot_dataset.profile(lower, title_l2.dist = TRUE)
) +
plot_layout(heights = c(1, 0.02, 1)) +
plot_annotation(theme = theme(
plot.title = element_text(
size = rel(4),
hjust = 0.5,
family = "CMU Serif"
),
plot.subtitle = element_blank()
))
ggsave(
paste0(file.name,
".pdf"),
p_profiles,
path = path,
width = 2 * 15,
height = 2.02 * 1.1 * 15,
units = "cm"
)
}
1.4.3 Making plots of estimation results
The functions here make the main figures of the paper, which show aspects of model fit (e.g., estimation bias, RMSE, and Type-I error rates).
The function plot_Model.x.Resid
makes a figure with two columns (for the models with fixed and random residual variance) that shows the outcomes of a fit measure (e.g., bias) for all four DGMs with a given level-2 distribution of means, across the different \(N\)s and \(T\)s.
Click to expand the code
plot_Model.x.Resid <- function(d_important,
which.parameter,
which.measure = "Positive error",
Means.dist = c("Gaussian", "Chi2"),
line.width = 1,
font.scale = 5) {
ddd <- d_important %>%
filter(parameter == which.parameter)
if (!grepl("error",
tolower(which.measure),
fixed = TRUE)) {
ddd$value <- ddd %>% pull(which.measure)
ddd <- ddd %>% filter(sign.X.sig == "Zero")
y.axis <- which.measure
}
if (grepl("positive",
tolower(which.measure),
fixed = TRUE)) {
which.measure <- "Type-1 error"
ddd$value <- ddd %>% pull(which.measure)
ddd <- ddd %>% filter(sign.X.sig == "Positive")
y.axis <- "Positive error"
}
if (grepl("negative",
tolower(which.measure),
fixed = TRUE)) {
which.measure <- "Type-1 error"
ddd$value <- ddd %>% pull(which.measure)
ddd <- ddd %>% filter(sign.X.sig == "Negative")
y.axis <- "Negative error"
}
if (grepl("total",
tolower(which.measure),
fixed = TRUE)) {
which.measure <- "Type-1 error"
ddd$value <- 100 - (ddd %>% pull(which.measure))
ddd <- ddd %>% filter(sign.X.sig == "Zero")
y.axis <- "Total error"
}
y.range <- ddd$value %>% range()
y.range[1] <- min(-0.001, y.range[1])
y.range[2] <- max(0.001, y.range[2])
## Making sure the zero values in errors are not missing
ddd <- ddd %>%
filter(l2.dist == Means.dist) %>%
ungroup() %>%
complete(Resid, `Model name`, N, T,
fill = list(value = 0))
ddd$N <- as.factor(ddd$N)
ddd$T <- as.factor(ddd$T)
title <- ifelse(
Means.dist == "Gaussian",
TeX("Gaussian-distributed means $\\phantom{\\chi^2}$"),
TeX("$\\chi^2$-distributed means")
)
output.plot <- ddd %>%
ggplot() +
aes(
x = T,
y = value,
group = N,
color = N
) +
## draw solid y=0 axis line
geom_hline(
yintercept = 0,
linetype = "solid",
color = "black",
alpha = 1
) +
geom_line(size = rel(1),
alpha = 0.8,
lineend = "round") +
geom_point(size = rel(1.5), alpha = 1) +
scale_color_manual(values = brewer.pal(name = "PuRd", n = 9)[c(4, 6, 9)]) +
facet_grid(
rows = vars(`Model name`),
cols = vars(Resid),
labeller = label_parsed
) +
theme_light() +
theme_pubclean() +
scale_y_continuous(limits = y.range) +
ylab(y.axis) +
ggtitle(title) +
theme(
plot.title = element_text(
family = "CMU Serif"),
panel.spacing = unit(0.7, "lines"),
legend.position = "bottom",
legend.key = element_rect(colour = NA, fill = NA),
text = element_text(size = 10,
family = "CMU Serif")
)
if (which.measure == "Type-1 error") {
output.plot <- ddd %>%
ggplot() +
aes(
x = T,
y = value,
group = N,
color = N
) +
## draw solid y=0 axis line
geom_hline(
yintercept = 0,
linetype = "solid",
color = "black",
alpha = 1
) +
## manually add needed grids
geom_hline(
yintercept = c(10, 20, 40, 60),
linetype = "dotted",
color = "#BEBEBE",
alpha = 1
) +
geom_line(size = rel(1),
alpha = 0.8,
lineend = "round") +
geom_point(size = rel(1.5), alpha = 1) +
scale_color_manual(values = brewer.pal(name = "PuRd", n = 9)[c(4, 6, 9)]) +
facet_grid(
rows = vars(`Model name`),
cols = vars(Resid),
labeller = label_parsed
) +
theme_light() +
theme_pubclean() +
scale_y_continuous(breaks = c(2.5, 10, 20, 40, 60, 80, 90, 100),
## looked up myself to assure + and - ranges are equal
limits = c(0, 61)) +
ylab(y.axis) +
ggtitle(title) +
theme(
plot.title = element_text(
family = "CMU Serif"),
panel.spacing = unit(0.7, "lines"),
legend.position = "bottom",
legend.key = element_rect(colour = NA, fill = NA),
legend.key.width = unit(10, "mm"),
text = element_text(size = 10,
family = "CMU Serif"),
## remove grid lines
panel.grid.major.y = element_blank()
)
## add thresholds
if (y.axis == "Total error")
output.plot <- output.plot +
geom_hline(yintercept = 5,
linetype = "dashed",
alpha = 0.7)
if (y.axis != "Total error")
output.plot <- output.plot +
geom_hline(yintercept = 2.5,
linetype = "dashed",
alpha = 0.7)
}
output.plot <- output.plot +
geom_hline(yintercept = 0,
alpha = 0.6)
return(output.plot)
}
The function plot_quadrants
makes a larger figure that includes four quadrants, each generated with plot_Model.x.Resid
, which together show the main results of the paper.
Click to expand the code
plot_quadrants <- function(d_important,
which.parameter = "Correlation",
upper.measure = "Bias",
lower.measure = "RMSE",
line.width = 2,
font.scale = 8) {
p.upper.left <- plot_Model.x.Resid(d_important,
which.parameter,
upper.measure,
"Gaussian",
line.width,
font.scale) + xlab(NULL)
p.upper.right <- plot_Model.x.Resid(d_important,
which.parameter,
upper.measure,
"Chi2",
line.width,
font.scale) + xlab(NULL) + ylab(NULL)
p.lower.left <- plot_Model.x.Resid(d_important,
which.parameter,
lower.measure,
"Gaussian",
line.width,
font.scale) + ggtitle(NULL)
p.lower.right <- plot_Model.x.Resid(d_important,
which.parameter,
lower.measure,
"Chi2",
line.width,
font.scale) + ggtitle(NULL) + ylab(NULL)
p.patchwork <-
(p.upper.left | p.upper.right) / (p.lower.left | p.lower.right)
title <- paste(upper.measure,
"and",
lower.measure,
"in the estimated",
tolower(which.parameter))
if (which.parameter == "Fixed.Phi")
title <- "Level-2 $\\phi$"
if (grepl("error", tolower(upper.measure), fixed = TRUE)) {
title <- paste("Type-I error rates in the estimated",
tolower(which.parameter))
}
p.final <- p.patchwork +
plot_layout(guides = "collect") +
plot_annotation(
theme = theme(plot.title =
element_text(
size = 20,
family = "CMU Serif",
hjust = 0.5
))) &
theme(legend.position = "bottom")
return(p.final)
}
1.4.4 Making plots of parameter estimates coverage
The function plot_caterpillar
, for a given condition and a given parameter, makes plots of the 95% confidence intervals (as vertical bars) of all converged replications within that condition. The bars are ordered by their point estimates and colored according to whether the interval crosses zero (the correct estimate) or lies entirely below or above zero (significantly negative or positive estimates).
Click to expand the code
plot_caterpillar <- function(d_abridged,
l2.dist_ = "Chi2",
Model_ = "PoDAR",
analysis.type = "resid.fixed",
parameter_ = "covariance",
legend.key.width = 5,
legend.line.width = 10) {
title <- ifelse(
analysis.type == "resid.fixed",
"fixed residual variance",
"random residual variance"
) %>%
paste("Modeled with", .)
dd <- d_abridged %>%
filter(l2.dist == l2.dist_,
Model == Model_) %>%
filter(tolower(parameter) == tolower(parameter_)) %>%
filter(type == analysis.type) %>%
select(type,
N,
T,
l2.dist,
Model,
est,
sig,
lower_2.5ci,
upper_2.5ci
) %>%
group_by(N, T, type, Model, l2.dist) %>%
mutate(sign.X.sig = as.factor(sign(est)*sig)) %>%
# mutate(sign.X.sig = factor(sign.X.sig, levels = c("Negative Type-I error",
# "Non-significant estimates",
# "Positive Type-I error"))) %>%
mutate(mean.est = mean(est),
n.datasets = n(),
nonconverged.percent = round(100*(1000-n())/1000,2)) %>%
group_by(sign.X.sig,
.add = TRUE) %>%
mutate(error.percents = round(100*n()/n.datasets,2)) %>%
arrange(N, T,
.by_group = TRUE) %>%
mutate(ordering = est # + 100*sign(est)*sig
) %>%
arrange(ordering) %>%
ungroup() %>%
group_by(type,
N,
T,
l2.dist,
Model) %>%
mutate(ord = order(ordering) - n()/2) %>%
mutate(NN = as.factor(paste("N =", N)),
TT = as.factor(paste("T =", T)))
levels(dd$sign.X.sig) <- c("Significant negative estimates",
"Non-significant estimates",
"Significant positive estimates")
y.range <- c(min(dd$lower_2.5ci), max(dd$upper_2.5ci))
dd %>%
mutate(NN = as.factor(paste("N =", N)),
TT = as.factor(paste("T =", T))) %>%
ggplot() +
aes(x = ord,
color = sign.X.sig) +
geom_segment(
aes(
x = ord,
xend = ord,
y = upper_2.5ci,
yend = lower_2.5ci
),
alpha = 0.8 + 0.2,
size = rel(0.06)
) +
geom_segment(
aes(
x = ord,
xend = ord,
y = est + min(0.01, 0.05 * abs(upper_2.5ci - lower_2.5ci)),
yend = est - min(0.01, 0.05 * abs(upper_2.5ci - lower_2.5ci))
),
alpha = 0.8 + 0.2,
color = "white",
size = rel(0.06)
) +
geom_hline(
yintercept = 0,
linetype = "solid",
size = rel(0.15),
color = "black",
alpha = 0.8
) +
theme_pubclean() +
guides(colour = guide_legend(override.aes = list(
linewidth = rel(1.5),
alpha = 1
))) +
scale_color_manual(values =
c("#F67E4BFF",
"#98CAE1FF",
"#364B9AFF")) +
scale_y_continuous(limits = y.range) +
facet_grid(rows = vars(NN),
# rows = vars(rev(TT)),
cols = vars(TT)) +
ggtitle(title) +
theme(
legend.background = element_rect(colour = NA, fill = NA),
legend.key = element_rect(colour = NA, fill = NA),
legend.title = element_blank(),
axis.title.x = element_blank(),
axis.ticks.x = element_blank(),
# "Score",
axis.text.x = element_blank(),
legend.key.width = unit(legend.key.width * 2, "mm"),
axis.text.y = element_text(size = 10),
text = element_text(family = "CMU Serif",
size = 12)
) +
xlab(NULL) +
ylab(paste("Estimated", parameter))
}
1.4.5 Making plots for DGM time series
A function was also written to make profiles of individual time series, each including the time series itself, its marginal distribution, and its sample ACF.
Click to expand the code
plot_dgm.profile <- function(d,
p.colors = colors.nar.khaki,
Model.name = "NAR(1)",
burnin_proportion = 0.9){
binwidth_ <- 0.5
## To make sure the plot title has the same height as Chi2AR model
Model.name <- paste("$\\phantom{\\chi^2}$",
Model.name,
"$\\phantom{\\chi^2}$")
d$obj.id <- d$obj.id %>% factor(unique(d$obj.id))
d <- d %>%
filter(t > max(d$t)*burnin_proportion) %>%
mutate(t = t - burnin_proportion*max(t))
acf.lag.max <- 10
d_acf <- d %>%
group_by(obj.id) %>%
summarize(lag = stats::acf(value,
lag.max = acf.lag.max,
plot = FALSE)$lag %>% as.numeric(),
acf = stats::acf(value,
lag.max = acf.lag.max,
plot = FALSE)$acf %>% as.numeric()
)
empirical.rho <- d_acf %>%
filter(lag == 1)
empirical.rho <- empirical.rho[rep(1:nrow(empirical.rho),
times = acf.lag.max*20 + 1), ]
e.l <- rep(seq(0, acf.lag.max, 0.05),
each = 3)
empirical.rho$lag <- e.l
empirical.rho$acf <- empirical.rho$acf^empirical.rho$lag
# time series plot
p_ts <- d %>%
ggplot(aes(x = t,
y = value,
group = obj.id)) +
geom_line(aes(color = obj.id),
alpha = 1,
size = 1) +
geom_hline(aes(yintercept = Mean),
size = 0.75,
linetype = "dashed",
color = "honeydew3") +
ggtitle("Time series") +
xlab("t") +
theme(
axis.title.y = element_blank(),
legend.position = "none", # c(0.1,0.95),
legend.background = element_rect(colour = NA, fill = NA),
axis.title.x = element_text(size = 17,
family = "Merriweather Regular"),
axis.text = element_text(size = 20,
family = "Merriweather Regular"),
legend.text.align = 0,
legend.title = element_blank(),
legend.key = element_rect(colour = NA,
fill = NA),
panel.background = element_blank()) +
scale_colour_manual(values = p.colors,
labels = unname(TeX(unique(as.character(d$obj.id))))
)
p_hist <- d %>%
ggplot(aes(x = value,
group = obj.id,
fill = obj.id)) +
geom_histogram(aes(y=..ndensity..),
center = 0,
binwidth = binwidth_) +
facet_wrap(~obj.id,
nrow = 3) +
geom_hline(yintercept = 0,
size = 0.5) +
geom_segment(aes(x = Mean,
xend = Mean,
y = 0,
yend = 1),
size = 0.75,
linetype = "dashed",
color = "honeydew3") +
theme_tufte() +
ggtitle("Marginal distribution") +
xlab("X") +
scale_y_continuous(limits = c(min(d_acf$acf), 1)) +
theme(strip.background = element_blank(),
strip.text = element_blank(),
axis.title.x = element_text(size = 17,
family = "Merriweather Regular"),
axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
legend.position = "none",
legend.text.align = 0,
legend.title = element_blank(),
legend.key = element_rect(colour = NA,
fill = NA),
panel.background = element_blank()) +
scale_fill_manual(values = p.colors,
unname(TeX(unique(as.character(d$obj.id))))
)
### ACF plots
d_acf <- d %>%
group_by(obj.id) %>%
summarize(lag = stats::acf(value,
lag.max = acf.lag.max,
plot = FALSE)$lag %>% as.numeric(),
acf = stats::acf(value,
lag.max = acf.lag.max,
plot = FALSE)$acf %>% as.numeric(),
empirical.rho = stats::acf(value,
lag.max = acf.lag.max,
plot = FALSE)$acf[2] %>% as.numeric())
p_acf <- d_acf %>%
ggplot(aes(x = lag,
y = acf,
color = obj.id)) +
geom_segment(aes(x = lag,
xend = lag,
y = 0,
yend = acf
),
size = 1.5,
lineend = "butt") +
# exponential fit
geom_line(data = empirical.rho,
aes(x = lag,
y = acf),
linetype = 2,
color = "honeydew3",
size = 0.75) +
facet_wrap(~obj.id,
nrow = 3) +
geom_hline(aes(yintercept = 0),
size = 0.5) +
ggtitle("Sample ACF") +
theme_tufte() +
theme(strip.background = element_blank(),
strip.text = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_text(size = 17,
family = "Merriweather Regular"),
legend.position = "none",
legend.text.align = 0,
legend.title = element_blank(),
legend.key = element_rect(colour = NA,
fill = NA)
) +
scale_x_continuous(breaks = seq(0, acf.lag.max, 2)) +
xlab("Lag") +
ylab("Autocorrelation function") +
theme(panel.background = element_blank()) +
scale_color_manual(values = p.colors,
unname(TeX(unique(as.character(d$obj.id))))
)
p_profile.main <- p_ts + plot_spacer() + p_hist + plot_spacer() + p_acf +
plot_layout(widths = c(4, 0.05, 1.5, 0.05, 2)) &
theme(
# legend.key.width = unit(2, "line"),
# legend.text=element_text(size=10),
# legend.position = "none",
axis.text = element_text(size = 17,
family = "Merriweather Regular"),
plot.title = element_text(size = 15+5,
family = "Merriweather Regular")
)
p_out <- p_profile.main
return(p_out)
}
2 Pipeline
We implemented each of these tasks in separate functions that are essentially wrapper functions (with parallel-computing implementations) around the modular components. Using these wrapper functions, each replication of each simulated condition was saved in a separate .rds
file. These data files were fed to the analysis wrapper function, whose output was saved in separate .rds
data files. To collect the relevant parameter estimates, another wrapper function was used to read these data files and save the desired parameters in a dataframe, which was then used in reporting.
2.1 Book-keeping
The outcomes of the components are saved in separate .rds
files indexed by unique, descriptive names, and each replication is given a unique numeric identifier that is also used as the random seed for generating the dataset within that replication. The file names and addresses are stored in two dataframes along with the model parameters used to generate each dataset, and these dataframes are used when reading and writing data files in other components.
The function make_sim_refs
makes a table consisting of all possible combinations of the conditions (provided as a list conditions
), and replicates it Reps
times (number of replications, \(R = 1000\)). Then, to make sure that each simulated dataset is produced with a unique random seed, the function generates a unique uSeed
based on the number (and values) of conditions and the replication number. Furthermore, an initial simulation seed (simSeed
) is included in the generation of uSeed
, which makes it possible to run different batches of simulations (for instance, if one wants to run the simulation for another 1000 replications).
For each condition, a unique seed is generated by weighting the integer-coded conditions (stored in a matrix d.integer
) by prime numbers (chosen so that they are not among the prime factors of the condition codes) and summing them up, that is, by taking the dot product d.integer %*% primes.seq
. Then, to make the seeds unique per replication, the replication number is concatenated to the left of d.integer %*% primes.seq
, and everything is put into a dataframe (called d.headers
, which is eventually returned by the function) with unique file names for the individual .rds
files (that include uSeed
as well as the replication number and the value of condition used in data generation, e.g., sim_uSeed-13293_l2.dist-Chi2_Model-BinAR_N-100_phi-0.4_T-100_sim.Seed-0_Rep-1
) and the address of the folder in which the .rds
files were stored. Finally, if desired, the simulation reference table d.headers
is saved as .csv
and .rds
.
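To make the seed construction concrete, the following toy sketch (with made-up condition codes) shows the weighting scheme; the full function below additionally handles factor recoding, file naming, and book-keeping.
## Toy illustration of the uSeed construction (hypothetical values)
d.integer <- matrix(c(1, 2,    # integer-coded conditions, replication 1
                      2, 2),   # integer-coded conditions, replication 2
                    nrow = 2, byrow = TRUE)
primes.seq <- c(3, 7)          # primes not among the codes' prime factors
Rep <- c(1, 2)
## weighted sum per row, with the replication number pasted on the left
uSeed <- paste0(Rep, d.integer %*% primes.seq)
uSeed # "117" "220"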
Click to expand the code
make_sim_refs <-
function(conditions = list(T = c(30, 100),
N = c(100),
Model = c("BinAR", "Chi2AR", "DAR"),
l2.dist = c("Gaussian", "Chi2"),
phi = c(0.4)
),
Reps = 100,
simSeed = 0,
save.directory = "self-sim",
save.refs.filename = paste("sim_refs",
Sys.Date()),
prefix.sim.datasets = "sim_"){
dir.create(save.directory, showWarnings = FALSE)
# allow Reps to be used as a vector of indexes
if(length(Reps)<2) Reps = seq_len(Reps)
# save global seed of the global env and set it back before leaving
seed.old <- .Random.seed
on.exit({
.Random.seed <<- seed.old
})
# sorting conditions alphabetically
conditions <- conditions[order(names(conditions))]
conditions$simSeed <- simSeed
conditions$Rep <- Reps
n.conditions <- length(conditions)
# making the first columns of the data frame
d <- conditions %>%
expand.grid(stringsAsFactors = TRUE)
# transforming factors to numerics
d.numeric <- d
factor.columns <- sapply(d.numeric, is.factor)
d.numeric[factor.columns] <-
sapply(d.numeric[factor.columns], as.numeric)
# getting rid of non-integers and Rep
d.integer <- d.numeric[, -n.conditions] %>%
apply(2,
function(x)
x %>%
as.character() %>%
gsub("\\.", "", .) %>%
as.numeric())
# to make unique seeds, we must sum conditions weighted by prime numbers
# but the primes must not be among prime factors of conditions
# conditions prime factors
cpfs <- d.integer %>%
unlist() %>%
as.numeric() %>%
unique() %>%
primes::prime_factors() %>%
unlist() %>%
unique()
primes.seq <- c(cpfs,
primes::generate_n_primes(ncol(d.integer) + length(cpfs)))
primes.seq <-
primes.seq[!(duplicated(primes.seq) |
duplicated(primes.seq, fromLast = TRUE))]
uSeed <- d.integer %*% primes.seq %>%
as.character() %>%
paste0(as.character(d.numeric$Rep),
.)
d.headers <- d %>%
mutate(
uSeed = uSeed,
sim.Path = save.directory,
sim.File = NA
) %>%
relocate(uSeed, .before = 1)
for (r in 1:nrow(d.headers)) {
only.headers <- d.headers %>%
select(-sim.Path:-sim.File) %>%
colnames()
r.values <- d.headers[r, only.headers]
factor.columns <- sapply(r.values, is.factor)
r.values[factor.columns] <-
sapply(r.values[factor.columns], as.character)
d.headers[r, "sim.File"] <- only.headers %>%
paste0("-") %>%
paste0(r.values) %>%
paste(collapse = "_") %>%
paste0(".rds") %>%
paste0(prefix.sim.datasets, .) # prefix for simulated datasets
}
d <- d.headers
# getting rid of factors and making them into strings
factor.columns <- sapply(d.headers, is.factor)
d.headers[factor.columns] <-
sapply(d[factor.columns], as.character)
# making uSeed numeric
d.headers$uSeed <- d.headers$uSeed %>% as.numeric()
# Save the references dataframe to a file, if desired
if(!is.null(save.refs.filename)){
saveRDS(d.headers,
file = here::here(save.directory, paste0(save.refs.filename,
".rds"))
)
write.csv(d.headers,
file = here::here(save.directory, paste0(save.refs.filename,
".csv")),
row.names = FALSE)
}
# return the reference dataframe
return(d.headers)
}
Note that we initially started with two values for the length of the time series (\(T = 30\), corresponding to one month of measurements, and \(T = 100\) for a reasonably long time series) and one sample size (\(N = 100\), with the intention of sub-sampling the datasets for \(N = 25\) and \(N = 50\)), and we planned to include five DGMs in the simulations: the AR(1), \(\chi^2\)AR(1), BinAR(1), and PoDAR(1) models, and a DAR(1) model with a binomial marginal distribution. Eventually (and midway through the simulations), we decided to omit the \(T = 30\) condition and the DAR(1) model from our simulation. However, as omitting these would have changed the calculations of uSeed
, we decided to keep the simulation reference table intact and omit Model == "DAR"
and T == 30
later on (see Section 2.3 for details).
Further note that we only generated datasets with \(N = 100\) and \(T = 100\), and sub-sampled the other sample sizes and time series lengths (\(N = 25, 50\) and \(T = 25, 50\)) from them. Although this had computational benefits (less simulation time and less storage), the main motivation behind this decision was to mimic empirical data collection: Sub-sampling the large datasets can be equated with collecting less data than in the ideal case in which we have many participants with many measurements (so, sub-sampling what could have been collected). Furthermore, comparing datasets with the same DGM and level-2 distribution becomes more meaningful: All datasets with the same Rep
value within the same DGM (say, PoDAR(1) model) and with the same level-2 distribution (say, \(\chi^2\) distribution) have the same uSeed
, thus they can be matched. Consequently, the differences we observe for different \(N\)s and \(T\)s within each cell of the figures reported in the paper can mimic the effect of “non-ideal” sampling.
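As an illustration of the sub-sampling itself (implemented inside do_fit_doFuture, see Section 2.2): it simply takes the first \(N\) persons and the first \(T\) time points of a stored \(N = 100, T = 100\) dataset. A minimal sketch, with a hypothetical file name:
## Sub-sampling a stored 100 x 100 dataset down to N = 25 and T = 50
## (the file name here is hypothetical)
sim.file <- readRDS("sim-files/sim_example.rds")
df_sub <- sim.file$sim.Dataset %>%
  dplyr::filter(subject <= 25,
                t <= 50)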
Changing the conditions (omitting Model == "DAR"
and T == 30
from the simulation reference dataset, and adding additional conditions for N
and T
) is done when running the pipeline (see Section 2.3) and the sub-sampling is implemented when running the analysis in do_fit_doFuture
(see Section 2.2).
Another dataframe was generated to contain the references of the analysis results per simulation condition and analysis condition (or hyperparameters), along with the file names and directory of the individual .rds
analysis outputs. The hyperparameters included the analysis type (with fixed vs. random residual variance) and the number of iterations and thinning in the MCMC algorithm, which were then used as arguments of run_MplusAutomation
.
Click to expand the code
make_fit_refs <-
function(sim_refs,
hyperparameters = list(iter = c(2000),
thin = c(5),
type = c("resid.random",
"resid.fixed")),
save.directory = "self-sim",
save.refs.filename = paste("fit_refs",
Sys.Date()),
prefix.fit.datasets = "fit_"){
dir.create(save.directory, showWarnings = FALSE)
h <- hyperparameters %>%
expand.grid(stringsAsFactors = TRUE)
dd <- sim_refs[rep(1:nrow(sim_refs), times = nrow(h)),]
rownames(dd) <- NULL
hh <- h[rep(1:nrow(h), each = nrow(sim_refs)),]
rownames(hh) <- NULL
d <- cbind(dd,hh)
d <- d %>%
mutate(fit.Path = save.directory,
fit.File = NA
)
only.headers <- names(hyperparameters) %>%
c("uSeed",
"N",
"T",
.,
"Rep")
only.headers <- only.headers[!(only.headers %in% c("iter", "thin"))]
for (r in 1:nrow(d)) {
r.values <-d[r, only.headers]
factor.columns <- sapply(r.values, is.factor)
r.values[factor.columns] <-
sapply(r.values[factor.columns], as.character)
d[r, "fit.File"] <- only.headers %>%
paste0("-") %>%
paste0(r.values) %>%
paste(collapse = "_") %>%
paste0(".rds") %>%
paste0(prefix.fit.datasets, .) # prefix for fitted datasets
print(r)
}
if(!is.null(save.refs.filename)){
saveRDS(d,
file = here::here(save.directory,
paste0(save.refs.filename,
".rds"))
)
write.csv(d,
file = here::here(save.directory,
paste0(save.refs.filename,
".csv")),
row.names = FALSE)
}
return(d)
}
2.2 Pipeline functions
We implemented the components described in Section 1 using the following functions.
The function do_sim_parallel
is a wrapper around make_datasets
(see Section 1.1.3) and uses the clusterApplyLB
function from the package snow
(Tierney et al. 2021) for parallelization. This wrapper gets the dataframe of simulation file references (sim_refs
, which is the output of make_sim_refs
), and for each of its rows, simulates an \(N \times T\) dataframe and stores it in a list, together with the other information in that row and the times at which simulating the dataset started and ended, and saves each list as a separate .rds
file in the target folder (specified with sim.Path
). Having the additional data in the saved files allows using them independently of the table of references. Note that to prevent overloading the hard drive with many parallel write requests (to save the simulated datasets), for the first nClust
rounds, using the Sys.sleep
function, the data generation on each core is started after a varying amount of delay. Given that each dataset simulation in the first round may take a different amount of time, after the first round is complete the simulations on different cores fall out of sync, and there is no further need for delays in later rounds.
Click to expand the code
do_sim_parallel <-
function(sim_refs,
nClust = 48,
save.directory = "self-sim",
alternative.sim.Path = NULL,
clusterLOG.filename = paste0("sim_clusterLOG_",
Sys.Date(),
".txt"),
sleeptime = 1) {
cl <- snow::makeSOCKcluster(nClust,
outfile = here::here(save.directory,
clusterLOG.filename))
debug <- TRUE
d <- sim_refs
## Start clusters:
snow::clusterExport(cl,
c("d",
"alternative.sim.Path",
"debug"),
envir = environment())
## Get the timings
t.snow <- snow::snow.time({
## Run the cluster
snow::clusterApplyLB(cl = cl,
seq_len(nrow(d)),
function(i) {
if (i <= nClust)
Sys.sleep(sleeptime * i)
d_i <- as.list(d[i,])
arguments <-
as.list(d_i[2:(length(d_i) - 4)])
arguments$seed <- d_i$uSeed
sim.Path <-
ifelse(is.null(alternative.sim.Path),
d_i$sim.Path,
alternative.sim.Path)
sim.StartTime <- Sys.time()
output_make_datasets <-
do.call(make_datasets,
arguments)
if(is.data.frame(output_make_datasets)){
d_i$sim.Dataset <- output_make_datasets
}else{
# Adapting for v2 implementation
d_i$sim.Dataset <- output_make_datasets$dataset
d_i$given.Means <- output_make_datasets$Means
d_i$given.Phis <- output_make_datasets$Phis
}
d_i$sim.StartTime <- sim.StartTime
d_i$sim.EndTime <- Sys.time()
d_i$sim.ElapsedTime <-
d_i$sim.EndTime - d_i$sim.StartTime
saveRDS(d_i,
file = here::here(sim.Path,
d_i$sim.File))
})
})
## Stop the cluster:
snow::stopCluster(cl)
return(t.snow)
}
The function do_fit_doFuture
is a wrapper around run_MplusAutomation
(see Section 1.2) and uses the package doFuture
(Bengtsson 2020) as a parallelization backend to plyr::a_ply
. The function gets a dataframe containing references of analyses results (fit_refs
, which is the output of make_fit_refs
), and row by row reads the simulated datasets, sub-samples the data if \(N < 100\) or \(T < 100\) (see the second note under make_sim_refs
in Section 2.1) and runs the analysis based on the hyperparameters specified in fit_refs
. It then saves the outcome of the analysis (the Mplus model object produced by MplusAutomation::mplusModeler
and returned by run_MplusAutomation
, which includes Mplus outputs and parameter estimates), along with additional information (the other values in the same row of fit_refs
as well as timings) in separate .rds
files in the target folder. Note that, again, to prevent the overload of the hard drive with many parallel read and write requests (to read the simulated datasets and save the Mplus outputs), for the first nClust
rounds, using the Sys.sleep
function, the data analysis on each core is started after a varying amount of delay. Given that each analysis instance in the first round may take a different amount of time, after the first round is complete the analyses on different cores fall out of sync, and there is no further need for delays in later rounds.
Click to expand the code
do_fit_doFuture <- function(fit_refs,
nClust = 48,
nPROC = 1,
save.directory = "self-sim",
alternative.fit.Path = NULL,
# model_what = "resid.random",
clusterLOG.filename = paste0("fit_clusterLOG_",
Sys.Date(),
".txt"),
sleeptime = 1,
version = "v2") {
## To make sure the required functions are loaded on each cluster
library(tidyverse)
source(here::here("scripts",
"analysis-component.R"))
debug <- TRUE
d <- fit_refs
registerDoFuture()
plan("multisession")
plyr::a_ply(d,
1,
function(d_i) {
## recover the row index (a_ply() does not pass one to the function)
i <- which(d$fit.File == d_i$fit.File)[1]
if (i <= nClust)
Sys.sleep(sleeptime * i)
d_i <- as.list(d_i)
fit.StartTime <-
Sys.time()
sim.file <- readRDS(file = here::here(d_i$sim.Path,
d_i$sim.File))
df <- sim.file$sim.Dataset %>%
filter(subject <= d_i$N,
t <= d_i$T)
# Adapting for v2 implementation
df_sum <- df %>%
group_by(subject) %>%
summarise(sample.Means = mean(x),
sample.Variances = var(x),
sample.Skewnesses = moments::skewness(x))
file.name <-
gsub(".rds", "", d_i$fit.File)
print(paste(file.name,
"started at",
fit.StartTime))
tryRes <-
try(output.fit <-
run_MplusAutomation(
df = df,
PROCESSORS = nPROC,
BITERATIONS.min = d_i$iter,
THIN = d_i$thin,
model_what = d_i$type,
file.name = file.name
))
d_i$fit.Dataset <-
tryRes
# Adapting for v2 implementation
d_i$given.Means <- sim.file$Means %>% head(d_i$N)
d_i$given.Phis <- sim.file$Phis %>% head(d_i$N)
# Adapting for v2 implementation
d_i$sample.Means <- df_sum$sample.Means
d_i$sample.Variances <- df_sum$sample.Variances
d_i$sample.Skewnesses <- df_sum$sample.Skewnesses
d_i$fit.StartTime <-
fit.StartTime
d_i$fit.EndTime <-
Sys.time()
d_i$fit.ElapsedTime <-
d_i$fit.EndTime - d_i$fit.StartTime
fit.Path <-
ifelse(is.null(alternative.fit.Path),
d_i$fit.Path,
alternative.fit.Path)
saveRDS(d_i,
file = here::here(fit.Path,
d_i$fit.File))
print(paste(file.name,
"finished at",
d_i$fit.EndTime))
},
.parallel = TRUE)
}
The function do_harvest_doFuture
is a wrapper around fit_extract
(see Section 1.3) and uses the package doFuture
as a parallelization backend to foreach
. It gets a list of .rds
files (that can be taken from or fit_refs
, or by using list.files
function in the directory in which the analysis results are saved) and returns a large dataframe including parameter estimates (and additional information) extracted by fit_extract
. Again, like in do_sim_parallel
and do_fit_doFuture
, the first nClust
files are read with a varying amount of delay to prevent overloading the hard drive with many parallel read requests.
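A minimal sketch of do_harvest_doFuture, reconstructed from this description, is given below; note that the argument names and defaults are assumptions, and that fit_extract is the extractor function of Section 1.3.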
Click to expand the code
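## A minimal sketch of do_harvest_doFuture based on the description above;
## argument names and defaults are assumptions, and fit_extract() is the
## extractor function defined in Section 1.3.
library(doFuture)   # also attaches foreach and future
do_harvest_doFuture <- function(fit.files,
                                nClust = 48,
                                sleeptime = 1) {
  registerDoFuture()
  plan("multisession")
  ## read and extract the fit files in parallel, row-binding the results
  foreach(i = seq_along(fit.files),
          .combine = dplyr::bind_rows) %dopar% {
    ## stagger the first nClust reads to avoid overloading the hard drive
    if (i <= nClust)
      Sys.sleep(sleeptime * i)
    fit_extract(readRDS(fit.files[i]))
  }
}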
2.3 Pipeline code
The following is the code we used to run the whole pipeline, consisting of three chunks that are responsible for the following:
- Creating the book-keeping reference files (using functions described in Section 2.1).
- Running the whole simulation study (using
do_sim_parallel
and do_fit_doFuture
described in Section 2.2). The code in this chunk can be interrupted (deliberately or otherwise) and resumed by running it again.
- Harvesting the parameter estimates and storing them in a single dataframe (using do_harvest_doFuture
of Section 2.2). The resulting dataframe is then used to make the plots shown in Section 3.
But first, we need to set the directories in which the results of each chunk are saved:
## Root directory of simulation study files
dir_files <- "simulation-files"
## Where reference tables are saved
dir_references <- paste(dir_files,
"refs",
sep = "/")
## Where simulated datasets are saved
dir_simulation <- paste(dir_files,
"sim-files",
sep = "/")
## Where analysis output files are saved
dir_analysis <- paste(dir_files,
"fit-files",
sep = "/")
## Where harvested study results are saved
dir_harvest <- paste(dir_files,
"harvest-files",
sep = "/")
## where plots are saved
dir_plots <- "figures"
The following code makes sim_refs
and fit_refs
tables and makes backups of them. After the base table is created, the rows with T != 100
and Model == "DAR"
are omitted, and the loop that follows adds the other values of \(N\)s and \(T\)s (see the notes under make_sim_refs
in Section 2.1).
Click to expand the code
sim_refs_base <- make_sim_refs(
conditions = list(
T = c(30, 100),
N = c(100),
Model = c("BinAR",
"Chi2AR",
"DAR",
"PoDAR",
"NAR"),
l2.dist = c("Gaussian",
"Chi2"),
phi = c(0.4)
),
simSeed = 0,
Reps = 1000,
save.directory = dir_simulation
) %>%
filter(T == 100, Model != "DAR")
## Adding another values for N and T; see the text
sim_refs <- NULL
for (TT in c(25, 50, 100)) {
for (NN in c(25, 50, 100)) {
sim_refs <- sim_refs_base %>%
mutate(N = NN,
T = TT) %>%
rbind(sim_refs)
}
}
## Saving a backup of the dataframe of references of simulation files
saveRDS(sim_refs,
here::here(dir_references,
"sim-refs.rds"))
## Making the dataframe of analysis output reference files
fit_refs <- make_fit_refs(sim_refs = sim_refs,
save.directory = dir_analysis)
## Saving a backup of the dataframe of references of analysis output files
saveRDS(fit_refs,
here::here(dir_references,
"fit-refs.rds"))
Here we read the reference tables sim_refs
and fit_refs
. We then make a list of already simulated files (that are located in the sim.Path
directory) and a list of analysis output files (that are located in the fit.Path
directory). Then, by comparing sim_refs
and fit_refs
with the lists of simulated and analyzed files, we only simulate/analyze the datasets that have not yet been simulated/analyzed. This allows us to resume the simulation study after an interruption to the R session.
Click to expand the code
# Simulation part ---------------------------------------------------------
## Reading the backup of the dataframe of references of simulation files
sim_refs <- readRDS(here::here(dir_references,
"sim-refs.rds"))
## Making a list of already simulated datasets
sim_refs_done <- list.files(here::here(dir_simulation),
pattern = "sim_uSeed-")
## Finding the datasets that are yet to be simulated
sim_refs_remaining <- sim_refs %>%
filter(!(sim.File %in% sim_refs_done))
## Simulating the remaining datasets
Sys.time()
system.time(
t.sim <- do_sim_parallel(sim_refs = sim_refs_remaining,
save.directory = dir_simulation)
)
Sys.time()
# Analysis part -----------------------------------------------------------
## Reading the backup of the dataframe of references of analysis output files
fit_refs <- readRDS(here::here(dir_references,
"fit-refs.rds"))
## Making a list of analysis output files
fit_files_done <- list.files(here::here(dir_analysis),
pattern = "fit_uSeed-")
## Finding the analyses that are yet to be done
fit_refs_remaining <- fit_refs %>%
filter(!(fit.File %in% fit_files_done))
## Running the analyses of remaining datasets in parallel
Sys.time()
system.time(
t.fit <- do_fit_doFuture(
fit_refs = fit_refs_remaining,
nClust = 48,
nPROC = 1,
sleeptime = 3,
save.directory = dir_analysis
)
)
Sys.time()
Finally, the analyzed datasets are read using do_harvest_doFuture
in parallel and stored in an .rds
file suffixed with date and time (to prevent over-writing the previously saved harvest files). Furthermore, using the function harvest_cleanup
(of Section 1.4.1), an abridged version (d_abridged
, including all results of all replications for the parameters of interest) and a more summarized version (d_important
, containing the outcome measures of interest per condition) of the raw harvest dataframe are made and saved in two .rds
files (again suffixed with date and time).
Click to expand the code
## Harvesting the results in parallel
l.files <- list.files(path = here::here(dir_analysis),
pattern = glob2rx("*.rds"))
fit.files <- l.files %>%
paste(here::here(dir_analysis),
.,
sep = "/")
## Optionally, load the most recent previously saved raw harvest file
## (head(1) selects the single newest file)
# harvest_raw <- list.files(here::here(dir_harvest),
#                           pattern = "harvest-raw_") %>%
#   sort(decreasing = TRUE) %>%
#   head(1) %>%
#   paste(here::here(dir_harvest), ., sep = "/") %>%
#   readRDS()
Sys.time()
system.time(
harvest_raw <- do_harvest_doFuture(fit.files)
)
Sys.time()
saveRDS(harvest_raw,
here::here(dir_harvest,
paste0(
"harvest-raw_",
format(Sys.time(),
"%Y-%m-%d_%H-%M"),
".rds"
)))
d_abridged <- harvest_raw %>%
harvest_cleanup(return.abridged = TRUE)
saveRDS(d_abridged ,
here::here(dir_harvest,
paste0(
"harvest-abridged_",
format(Sys.time(),
"%Y-%m-%d_%H-%M"),
".rds"
)))
d_important <- harvest_raw %>% harvest_cleanup()
saveRDS(d_important ,
here::here(dir_harvest,
paste0(
"harvest-important_",
format(Sys.time(),
"%Y-%m-%d_%H-%M"),
".rds"
)))
3 Simulation results
We read the harvested data files (harvest-abridged_***.rds and harvest-important_***.rds
) and make the outcome plots using plot_Model.x.Resid
and plot_quadrants
functions (of Section 1.4.3) and the MCMC estimates coverage plots using the function plot_caterpillar
(of Section 1.4.4). To ensure small file sizes and good resolution, all plots are saved as separate PDF files (which can easily be included in a \(\mathrm{\LaTeX{}}\) script) in a specific directory. Before going any further, we should read the harvest dataframes and set the plot output folder.
d_abridged <- list.files(here::here(dir_harvest),
pattern = "harvest-abridged_") %>%
sort(decreasing = TRUE) %>%
head(1) %>% # take the most recent harvest file
paste(here::here(dir_harvest),
.,
sep = "/") %>%
read_rds()
d_important <- list.files(here::here(dir_harvest),
pattern = "harvest-important_") %>%
sort(decreasing = TRUE) %>%
head(1) %>% # take the most recent harvest file
paste(here::here(dir_harvest),
.,
sep = "/") %>%
read_rds()
3.1 Plots of main results
The left and right panels (showing the results for Gaussian- and \(\chi^2\)-distributed means, respectively) are made for each outcome measure of interest using plot_Model.x.Resid
, combined using the package patchwork
, and saved as PDF files.
Click to expand the code
for (parameter in c("Correlation", "Covariance")) {
for (outcome in c("Bias",
"Variance",
"RMSE",
"MAE",
"Positive error",
"Negative error",
"Total error")) {
p.single.left <- plot_Model.x.Resid(d_important,
parameter,
outcome,
"Gaussian",
line.width = 2,
font.scale = 8)
p.single.right <- plot_Model.x.Resid(d_important,
parameter,
outcome,
"Chi2",
line.width = 2,
font.scale = 8) + ylab(NULL)
p.patchwork <- (p.single.left | p.single.right)
title <- paste(outcome,
"in the estimated",
tolower(parameter)
)
p.final <- p.patchwork +
plot_layout(guides = "collect") +
plot_annotation(
## removed the big title and subtitle here
# title = TeX(title),
theme = theme(plot.title =
element_text(size = 20,
family = "CMU Serif",
hjust = 0.5))
) &
theme(legend.position = "bottom")
if(grepl("error", tolower(outcome), fixed = TRUE)) p.final <- p.final +
scale_y_continuous(breaks = c(5, 10, 20, 40, 60, 80, 90, 100))
ggsave(paste0(parameter,
" (",
outcome,
").pdf"),
p.final,
path = dir_plots,
width = 21,
height = 21.5*30/37 - 1,
units = "cm")
}
}
The plots containing pairs of outcome measures (with the results for Gaussian- and \(\chi^2\)-distributed means in the left and right columns, respectively) are made for each pair of outcome measures of interest using plot_quadrants
, combined using the package patchwork
, and saved as PDF files.
Click to expand the code
for (parameter in c("Correlation", "Covariance")) {
for (upper_lower in list(c("Bias", "RMSE"),
c("Bias", "Variance"),
c("Bias", "MAE"),
c("Positive error", "Negative error"),
c("Positive error", "Total error"),
c("Negative error", "Total error"))) {
upper <- upper_lower[1]
lower <- upper_lower[2]
p.q <- plot_quadrants(d_important,
parameter,
upper,
lower)
## Saving the plot
ggsave(
paste0(parameter,
" (",
upper,
" and ",
lower,
").pdf"),
p.q,
path = dir_plots,
width = 21,
height = 30,
units = "cm"
)
}
}
The generated plots are located in the figures
folder; the file names follow the patterns used in the ggsave calls above (e.g., Correlation (Bias).pdf for the single-outcome figures and Correlation (Bias and RMSE).pdf for the quadrant figures).
3.2 Plots of MCMC parameter estimates coverage
It should be noted that plot_caterpillar
requires the abridged dataset of harvested results (d_abridged
), which contains the estimates of all replications within each condition.
Click to expand the code
# Making lists of level-2 distributions, model names, and analyses --------
list_l2.dist <- c("Gaussian",
"Chi2")
list_model <- c("NAR",
"Chi2AR",
"BinAR",
"PoDAR")
list_model.type <- c("resid.fixed",
"resid.random")
list_parameter <- c("Correlation",
"Covariance")
# Plots with 9 panels -----------------------------------------------------
for (single_parameter in list_parameter) {
cross3(list_model,
list_l2.dist,
list_model.type) %>%
plyr::l_ply(function(x){
single_model <- x[1]
single_model_name <- ifelse(single_model == "Chiar",
"$\\chi^2$AR(1)",
paste0(single_model, "(1)"))
single_l2.dist <- x[2]
single_l2.dist_name <- ifelse(single_l2.dist == "Chi2",
"$\\chi^2$",
single_l2.dist)
model_type <- x[3]
caterpillar_single <- plot_caterpillar(d_abridged,
l2.dist_ = single_l2.dist,
Model_ = single_model,
parameter_ = single_parameter,
analysis.type = model_type)
p.final <- caterpillar_single +
plot_layout(guides = "collect") +
plot_annotation(
theme = theme(plot.title =
element_text(size = 12,
family = "CMU Serif",
hjust = 0.5),
plot.subtitle =
element_text(size = 12,
family = "CMU Serif",
hjust = 0.5))
) &
theme(legend.position = "bottom")
ggsave(paste0("caterpillar-",
single_parameter,
"-",
single_model,
"-",
single_l2.dist,
"-",
model_type,
".pdf"),
p.final,
path = dir_plots,
width = 18,
height = 11,
units = "cm")
})
}
# Plots with 18 panels ----------------------------------------------------
cross3(list_model,
list_l2.dist,
list_parameter) %>%
plyr::l_ply(function(x){
single_model <- x[1]
single_model_name <- ifelse(single_model == "Chiar",
"$\\chi^2$AR(1)",
paste0(single_model, "(1)"))
single_l2.dist <- x[2]
single_l2.dist_name <- ifelse(single_l2.dist == "Chi2",
"$\\chi^2$",
single_l2.dist)
single_parameter <- x[3]
caterpillar_fixed <- plot_caterpillar(d_abridged,
l2.dist_ = single_l2.dist,
Model_ = single_model,
parameter_ = single_parameter,
analysis.type = "resid.fixed")
caterpillar_random <- plot_caterpillar(d_abridged,
l2.dist_ = single_l2.dist,
Model_ = single_model,
parameter_ = single_parameter,
analysis.type = "resid.random")
p.patchwork <- caterpillar_fixed / caterpillar_random
p.final <- p.patchwork +
plot_layout(guides = "collect") +
plot_annotation(
theme = theme(plot.title =
element_text(size = 12,
family = "CMU Serif",
hjust = 0.5),
plot.subtitle =
element_text(size = 12,
family = "CMU Serif",
hjust = 0.5))
) &
theme(legend.position = "bottom")
ggsave(paste0("caterpillar-",
single_parameter,
"-",
single_model,
"-",
single_l2.dist,
"-",
"both",
".pdf"),
p.final,
path = dir_plots,
width = 18,
height = 18,
units = "cm")
})
The generated plots are located in the figures
folder and can be accessed via the following tables (the file names follow the pattern used in the ggsave calls above):

Correlation:

Model | Gaussian-distributed means | \(\chi^2\)-distributed means
---|---|---
AR(1) | caterpillar-Correlation-NAR-Gaussian-both.pdf | caterpillar-Correlation-NAR-Chi2-both.pdf
\(\chi^2\)AR(1) | caterpillar-Correlation-Chi2AR-Gaussian-both.pdf | caterpillar-Correlation-Chi2AR-Chi2-both.pdf
BinAR(1) | caterpillar-Correlation-BinAR-Gaussian-both.pdf | caterpillar-Correlation-BinAR-Chi2-both.pdf
PoDAR(1) | caterpillar-Correlation-PoDAR-Gaussian-both.pdf | caterpillar-Correlation-PoDAR-Chi2-both.pdf

Covariance:

Model | Gaussian-distributed means | \(\chi^2\)-distributed means
---|---|---
AR(1) | caterpillar-Covariance-NAR-Gaussian-both.pdf | caterpillar-Covariance-NAR-Chi2-both.pdf
\(\chi^2\)AR(1) | caterpillar-Covariance-Chi2AR-Gaussian-both.pdf | caterpillar-Covariance-Chi2AR-Chi2-both.pdf
BinAR(1) | caterpillar-Covariance-BinAR-Gaussian-both.pdf | caterpillar-Covariance-BinAR-Chi2-both.pdf
PoDAR(1) | caterpillar-Covariance-PoDAR-Gaussian-both.pdf | caterpillar-Covariance-PoDAR-Chi2-both.pdf
4 Additional figures and analyses
4.1 DGM time series plots
We make plots including time series plots, marginal distributions, and sample ACFs for three time series generated by each DGM.
Click to expand the code
make_dgm_df <- function(obj.1,
obj.2,
obj.3 = NULL){
d.1 <- data.frame(t = 1:length(obj.1$x),
value = obj.1$x,
obj.id = obj.1$Model.Description.Short)
d.2 <- data.frame(t = 1:length(obj.2$x),
value = obj.2$x,
obj.id = obj.2$Model.Description.Short)
d <- rbind(d.1, d.2)
if(!is.null(obj.3)){
d.3 <- data.frame(t = 1:length(obj.3$x),
value = obj.3$x,
obj.id = obj.3$Model.Description.Short)
d <- rbind(d, d.3)
}
d <- d %>%
group_by(obj.id) %>%
mutate(Mean = mean(value))
return(d)
}
Click to expand the code
ts.length <- 5000
d.3.nar <- make_dgm_df(
dgm_nar(
phi = 0.4,
var.resid = 20,
T = ts.length,
Mean = 85,
seed = 1 + 9
),
dgm_nar(
phi = 0.8,
var.resid = 20,
T = ts.length,
Mean = 55,
seed = 2 + 4
),
dgm_nar(
phi = 0.4,
var.resid = 47,
T = ts.length,
Mean = 20,
seed = 3 + 2
)
)
profile_nar <- plot_dgm.profile(d.3.nar,
brewer.pal(name = "YlOrBr", n = 9)[c(5, 7, 9)],
"$AR(1)$")
d.3.chiar <- make_dgm_df(
dgm_generator(
Model = "ChiAR",
phi = 0.4,
nu = 25,
T = ts.length,
seed = 1 + 1
),
dgm_generator(
Model = "ChiAR",
phi = 0.7,
nu = 5,
T = ts.length,
seed = 2 + 5
),
dgm_generator(
Model = "ChiAR",
phi = 0.4,
nu = 1,
T = ts.length,
seed = 3 + 1
)
)
profile_chiar <- plot_dgm.profile(d.3.chiar,
brewer.pal(name = "GnBu", n = 9)[c(5, 7, 9)],
"$\\chi^2AR(1)$")
d.3.binar <- make_dgm_df(
dgm_generator(
Model = "BinAR",
k = 7,
alpha = 0.85,
phi = 0.45,
T = ts.length,
seed = 1 + 5
),
dgm_generator(
Model = "BinAR",
k = 7,
alpha = 0.85,
phi = 0.7,
T = ts.length,
seed = 2 + 6
),
dgm_generator(
Model = "BinAR",
k = 7,
alpha = 0.5,
phi = 0.45,
T = ts.length,
seed = 3
)
)
profile_binar <- plot_dgm.profile(d.3.binar,
brewer.pal(name = "YlGn", n = 9)[c(5, 7, 9)],
"$BinAR(1)$")
d.3.podar <- make_dgm_df(
dgm_generator(
Model = "PoDAR",
tau = 0.4,
Mean = 40,
T = ts.length,
seed = 1 + 4
),
dgm_generator(
Model = "PoDAR",
tau = 0.8,
Mean = 10,
T = ts.length,
seed = 2 + 4
),
dgm_generator(
Model = "PoDAR",
tau = 0.4,
Mean = 1,
T = ts.length,
seed = 3 + 2
)
)
profile_podar <- plot_dgm.profile(d.3.podar,
brewer.pal(name = "BuPu", n = 9)[c(5, 7, 9)],
"$\\PoDAR(1)$")
Click to expand the code
ggsave(
"Profile NAR.pdf",
profile_nar,
path = dir_plots,
height = 10 * 1.8,
width = 21 * 2,
units = "cm"
)
ggsave(
"Profile Chi2AR.pdf",
profile_chiar,
path = dir_plots,
height = 10 * 1.8,
width = 21 * 2,
units = "cm"
)
ggsave(
"Profile BinAR.pdf",
profile_binar,
path = dir_plots,
height = 10 * 1.8,
width = 21 * 2,
units = "cm"
)
ggsave(
"Profile PoDAR.pdf",
profile_podar,
path = dir_plots,
height = 10 * 1.8,
width = 21 * 2,
units = "cm"
)
ggsave(
"Profile four DGMs.pdf",
profile_nar /
profile_chiar /
profile_binar /
profile_podar,
path = dir_plots,
height = 10 * 1.8 * 4,
width = 21 * 2,
units = "cm"
)
ggsave(
"Profile alternative DGMs.pdf",
profile_chiar /
profile_binar /
profile_podar,
path = dir_plots,
height = 10 * 1.8 * 3,
width = 21 * 2,
units = "cm"
)
4.2 Profiles of the simulated datasets
We read the first replication of the simulated datasets with \(N = 100, T = 100\) and plot them using the function plot_dataset.profile
(from Section 1.4.2).
Click to expand the code
# Reading simulated datasets and saving in lists --------------------------
dgm_datasets_gaussian.means <- list(
nar = "sim_uSeed-13330_l2.dist-Gaussian_Model-NAR",
chiar = "sim_uSeed-13297_l2.dist-Gaussian_Model-Chi2AR",
binar = "sim_uSeed-13286_l2.dist-Gaussian_Model-BinAR",
podar = "sim_uSeed-13319_l2.dist-Gaussian_Model-PoDAR"
) %>%
lapply(function(x) read_rds(paste0("simulation-files/sim-files/",
x,
"_N-100_phi-0.4_T-100_sim.Seed-0_Rep-1.rds")))
dgm_datasets_chi2.means <- list(
nar = "sim_uSeed-13337_l2.dist-Chi2_Model-NAR",
chiar = "sim_uSeed-13304_l2.dist-Chi2_Model-Chi2AR",
binar = "sim_uSeed-13293_l2.dist-Chi2_Model-BinAR",
podar = "sim_uSeed-13326_l2.dist-Chi2_Model-PoDAR"
) %>%
lapply(function(x) read_rds(paste0("simulation-files/sim-files/",
x,
"_N-100_phi-0.4_T-100_sim.Seed-0_Rep-1.rds")))
# Plot profiles per DGM ---------------------------------------------------
purrr::pwalk(
list(
dgm_datasets_gaussian.means,
dgm_datasets_chi2.means,
names(dgm_datasets_gaussian.means)
),
~ save_dataset_profile(..1,
..2,
paste0("profiles-dataset-", ..3))
)
4.3 COGITO data analysis
In the paper, we reported an analysis of the distress time series in the COGITO dataset [Schmiedek, Lövdén, and Lindenberger (2010)]. The model estimates person-specific means, autoregressive parameters (phi), and log residual variances (logv), together with their level-2 covariances. Because we are not allowed to make the dataset public, we provide the Mplus input script and the analysis output here.
Click to expand the code
TITLE:
Multilevel AR(1) model of COGITO distress
DATA:
FILE = "cogito-na1-distress.dat";
VARIABLE:
NAMES = subject t x;
MISSING=.;
CLUSTER = subject;
LAGGED = x(1);
TINTERVAL = t(1);
ANALYSIS:
TYPE = TWOLEVEL RANDOM;
ESTIMATOR = BAYES;
PROCESSORS = 2;
CHAINS = 2;
THIN = 5;
BITERATIONS = 5000(2000);
MODEL:
%WITHIN%
phi | x ON x&1;
logv | x;
%BETWEEN%
x phi logv WITH x phi logv;
OUTPUT:
TECH1 TECH2 TECH3 TECH8 FSCOMPARISON STANDARDIZED STDYX STDY;
PLOT:
TYPE = PLOT3;
FACTORS = ALL (500);
Click to expand the code
Mplus VERSION 8.6
MUTHEN & MUTHEN
09/28/2022 1:15 PM
INPUT INSTRUCTIONS
TITLE:
Multilevel AR(1) model of COGITO distress
DATA:
FILE = "cogito-na1-distress.dat";
VARIABLE:
NAMES = subject t x;
MISSING=.;
CLUSTER = subject;
LAGGED = x(1);
TINTERVAL = t(1);
ANALYSIS:
TYPE = TWOLEVEL RANDOM;
ESTIMATOR = BAYES;
PROCESSORS = 2;
CHAINS = 2;
THIN = 5;
BITERATIONS = 5000(2000);
MODEL:
%WITHIN%
phi | x ON x&1;
logv | x;
%BETWEEN%
x phi logv WITH x phi logv;
OUTPUT:
TECH1 TECH2 TECH3 TECH8 FSCOMPARISON STANDARDIZED STDYX STDY;
PLOT:
TYPE = PLOT3;
FACTORS = ALL (500);
*** WARNING
One or more individual-level variables have no variation within a
cluster for the following clusters.
Variable Cluster IDs with no within-cluster variation
X 46 176 189 141 184 122 90 81 39 48 177 108 178 63 170 159 54 28 68 55 31 112
77 93 117 165 40 84
X&1 46 176 189 141 184 122 90 81 39 48 177 108 178 63 170 159 54 28 68 55 31 112
77 93 117 165 40 84
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
Multilevel AR(1) model of COGITO distress
SUMMARY OF ANALYSIS
Number of groups 1
Number of observations 32518
Number of dependent variables 1
Number of independent variables 1
Number of continuous latent variables 2
Observed dependent variables
Continuous
X
Observed independent variables
X&1
Continuous latent variables
PHI LOGV
Variables with special functions
Cluster variable SUBJECT
Within variables
X&1
Estimator BAYES
Specifications for Bayesian Estimation
Point estimate MEDIAN
Number of Markov chain Monte Carlo (MCMC) chains 2
Random seed for the first chain 0
Starting value information UNPERTURBED
Algorithm used for Markov chain Monte Carlo GIBBS(PX1)
Convergence criterion 0.500D-01
Maximum number of iterations 5000
K-th iteration used for thinning 5
Specifications for Bayes Factor Score Estimation
Number of imputed data sets 500
Iteration intervals for thinning 1
Input data file(s)
cogito-na1-distress.dat
Input data format FREE
SUMMARY OF DATA
Number of clusters 204
Size (s) Cluster ID with Size s
114 45
117 22
121 65 195
123 64
124 102 73
125 143
127 46 43 156 176 78
128 154 189 44
129 89 57 131
130 80 133 168 141 118 147
131 139 181 184 186 144 49
132 127 146 122 196
133 61 153 7
134 90 81 39
135 48 132
136 150 193 98 18
137 94 30 177 24 106
138 128 23 187 10 151 152 108
139 164 20
140 182 1
141 87 53 36 136 178 51 149
142 63 170 140 114 50 161 130 198 202
143 88 82 14 137
144 120 29 115 2 172 197 185 159
145 17 109 101 138
146 54 37
148 96
149 129 166 3
150 134 155 4 19 104 191
151 16 92 28
152 121 68 55 31
153 6
154 112 192
155 163 13 70
156 180 5 107 59
157 83 86 77 12 26 9 41
158 160
159 38
160 105 67 66 173 174 75
162 11
163 103 200 35
164 142 99
165 190 188
166 126 72 148 113
167 93
168 56
171 60
172 124
173 169 97 15 183 25
174 91
177 27 71
178 117
179 175
180 8 204
183 47 111
185 79
186 76 165
188 69
189 194
190 110
192 100 145 52 21
193 119
197 62 42
198 116
200 157
201 203
204 74 199
205 34
206 135
211 162
212 123
214 40
220 95
227 125
238 171
239 201
250 167
262 179 32
265 85
276 84
303 33
306 58
373 158
SUMMARY OF MISSING DATA PATTERNS
Number of missing data patterns 4
MISSING DATA PATTERNS (x = not missing)
1 2 3 4
X x x
X&1 x x
MISSING DATA PATTERN FREQUENCIES
Pattern Frequency Pattern Frequency Pattern Frequency
1 15032 3 5301
2 5505 4 6680
COVARIANCE COVERAGE OF DATA
Minimum covariance coverage value 0.100
PROPORTION OF DATA PRESENT
Covariance Coverage
X
________
X 0.632
UNIVARIATE SAMPLE STATISTICS
UNIVARIATE HIGHER-ORDER MOMENT DESCRIPTIVE STATISTICS
Variable/ Mean/ Skewness/ Minimum/ % with Percentiles
Sample Size Variance Kurtosis Maximum Min/Max 20%/60% 40%/80% Median
X 0.892 1.843 0.000 62.19% 0.000 0.000 0.000
20537.000 2.101 3.178 7.000 0.84% 0.000 2.000
WARNING: PROBLEMS OCCURRED IN SEVERAL ITERATIONS IN THE COMPUTATION OF THE STANDARDIZED ESTIMATES FOR SEVERAL
CLUSTERS. THIS IS MOST LIKELY DUE TO AR COEFFICIENTS GREATER THAN 1 OR PARAMETERS GIVING NON-STATIONARY MODELS.
SUCH POSTERIOR DRAWS ARE REMOVED. THE FOLLOWING CLUSTERS HAD SUCH PROBLEMS:
202
THE MODEL ESTIMATION TERMINATED NORMALLY
USE THE FBITERATIONS OPTION TO INCREASE THE NUMBER OF ITERATIONS BY A FACTOR
OF AT LEAST TWO TO CHECK CONVERGENCE AND THAT THE PSR VALUE DOES NOT INCREASE.
MODEL FIT INFORMATION
Number of Free Parameters 9
Information Criteria
Deviance (DIC) 61653.300
Estimated Number of Parameters (pD) 12317.423
MODEL RESULTS
Posterior One-Tailed 95% C.I.
Estimate S.D. P-Value Lower 2.5% Upper 2.5% Significance
Within Level
Between Level
X WITH
PHI 0.118 0.020 0.000 0.082 0.157 *
LOGV 1.370 0.192 0.000 1.030 1.776 *
PHI WITH
LOGV 0.234 0.047 0.000 0.150 0.338 *
Means
X 0.866 0.067 0.000 0.732 0.997 *
PHI 0.217 0.017 0.000 0.182 0.249 *
LOGV -1.314 0.168 0.000 -1.641 -0.983 *
Variances
X 0.888 0.094 0.000 0.727 1.105 *
PHI 0.052 0.006 0.000 0.041 0.066 *
LOGV 5.745 0.576 0.000 4.754 7.000 *
STANDARDIZED MODEL RESULTS
STDYX Standardization
Posterior One-Tailed 95% C.I.
Estimate S.D. P-Value Lower 2.5% Upper 2.5% Significance
Within-Level Standardized Estimates Averaged Over Clusters
PHI | X ON
X&1 0.217 0.008 0.000 0.202 0.232 *
LOGV |
X 0.903 0.004 0.000 0.894 0.911 *
Between Level
X WITH
PHI 0.551 0.062 0.000 0.415 0.659 *
LOGV 0.607 0.046 0.000 0.506 0.689 *
PHI WITH
LOGV 0.435 0.067 0.000 0.295 0.560 *
Means
X 0.918 0.084 0.000 0.753 1.078 *
PHI 0.955 0.097 0.000 0.756 1.140 *
LOGV -0.549 0.075 0.000 -0.700 -0.402 *
Variances
X 1.000 0.000 0.000 1.000 1.000
PHI 1.000 0.000 0.000 1.000 1.000
LOGV 1.000 0.000 0.000 1.000 1.000
STDY Standardization
Posterior One-Tailed 95% C.I.
Estimate S.D. P-Value Lower 2.5% Upper 2.5% Significance
Within-Level Standardized Estimates Averaged Over Clusters
PHI | X ON
X&1 0.321 0.067 0.000 0.191 0.458 *
LOGV |
X 0.903 0.004 0.000 0.894 0.911 *
Between Level
X WITH
PHI 0.551 0.062 0.000 0.415 0.659 *
LOGV 0.607 0.046 0.000 0.506 0.689 *
PHI WITH
LOGV 0.435 0.067 0.000 0.295 0.560 *
Means
X 0.918 0.084 0.000 0.753 1.078 *
PHI 0.955 0.097 0.000 0.756 1.140 *
LOGV -0.549 0.075 0.000 -0.700 -0.402 *
Variances
X 1.000 0.000 0.000 1.000 1.000
PHI 1.000 0.000 0.000 1.000 1.000
LOGV 1.000 0.000 0.000 1.000 1.000
R-SQUARE
Within-Level R-Square Averaged Across Clusters
Posterior One-Tailed 95% C.I.
Variable Estimate S.D. P-Value Lower 2.5% Upper 2.5%
X 0.097 0.004 0.000 0.089 0.106
BETWEEN-LEVEL FACTOR SCORE COMPARISONS
Results for Factor PHI
Ranking Cluster Factor Score Ranking Cluster Factor Score Ranking Cluster Factor Score
1 202 0.983 2 1 0.754 3 179 0.696
4 5 0.676 5 149 0.663 6 129 0.636
7 85 0.633 8 87 0.626 9 147 0.620
10 92 0.610 11 167 0.573 12 88 0.567
13 23 0.559 14 97 0.554 15 142 0.550
16 44 0.544 17 150 0.521 18 183 0.518
19 201 0.514 20 12 0.511 21 64 0.508
22 107 0.499 23 119 0.491 24 35 0.491
25 70 0.481 26 124 0.475 27 204 0.458
28 73 0.449 29 110 0.443 30 143 0.441
31 65 0.437 32 22 0.433 33 37 0.433
34 154 0.431 35 75 0.427 36 196 0.424
37 8 0.422 38 198 0.405 39 25 0.396
40 200 0.391 41 59 0.389 42 188 0.388
43 30 0.379 44 136 0.376 45 53 0.375
46 103 0.374 47 32 0.372 48 197 0.371
49 38 0.361 50 139 0.358 51 83 0.355
52 116 0.349 53 56 0.347 54 158 0.340
55 126 0.331 56 171 0.326 57 135 0.325
58 168 0.324 59 131 0.321 60 3 0.320
61 203 0.319 62 182 0.318 63 14 0.315
64 132 0.312 65 74 0.310 66 6 0.307
67 91 0.303 68 151 0.296 69 61 0.295
70 134 0.294 71 191 0.292 72 95 0.291
73 160 0.290 74 15 0.289 75 24 0.284
76 186 0.282 77 72 0.280 78 51 0.280
79 156 0.275 80 123 0.266 81 114 0.265
82 152 0.261 83 164 0.257 84 7 0.254
85 104 0.251 86 193 0.250 87 58 0.248
88 33 0.246 89 62 0.244 90 78 0.238
91 138 0.237 92 169 0.232 93 153 0.232
94 173 0.231 95 192 0.230 96 145 0.230
97 174 0.226 98 127 0.218 99 137 0.214
100 102 0.213 101 190 0.209 102 161 0.209
103 36 0.208 104 66 0.192 105 42 0.167
106 89 0.163 107 133 0.160 108 185 0.154
109 4 0.152 110 101 0.147 111 79 0.145
112 100 0.139 113 111 0.139 114 52 0.128
115 21 0.127 116 20 0.123 117 18 0.123
118 128 0.120 119 50 0.119 120 148 0.109
121 13 0.108 122 27 0.105 123 29 0.099
124 69 0.097 125 115 0.093 126 19 0.089
127 155 0.083 128 130 0.076 129 144 0.075
130 187 0.075 131 175 0.071 132 194 0.069
133 94 0.065 134 118 0.065 135 43 0.063
136 11 0.060 137 71 0.057 138 80 0.053
139 140 0.050 140 10 0.049 141 17 0.039
142 181 0.037 143 34 0.036 144 106 0.035
145 108 0.033 146 9 0.031 147 90 0.031
148 57 0.028 149 63 0.028 150 96 0.026
151 109 0.026 152 184 0.025 153 162 0.024
154 199 0.024 155 81 0.023 156 77 0.021
157 125 0.021 158 55 0.021 159 47 0.021
160 141 0.021 161 112 0.020 162 40 0.020
163 165 0.020 164 176 0.020 165 177 0.019
166 93 0.019 167 159 0.019 168 68 0.019
169 170 0.019 170 39 0.018 171 189 0.016
172 82 0.016 173 54 0.016 174 16 0.015
175 178 0.015 176 45 0.014 177 117 0.014
178 84 0.014 179 28 0.014 180 122 0.014
181 46 0.013 182 121 0.013 183 48 0.012
184 31 0.011 185 146 0.011 186 26 0.009
187 172 0.009 188 86 0.009 189 163 0.007
190 60 0.007 191 166 0.007 192 98 0.000
193 99 -0.007 194 105 -0.007 195 76 -0.010
196 113 -0.012 197 67 -0.015 198 41 -0.016
199 180 -0.019 200 195 -0.020 201 157 -0.022
202 2 -0.048 203 49 -0.049 204 120 -0.081
Results for Factor LOGV
Ranking Cluster Factor Score Ranking Cluster Factor Score Ranking Cluster Factor Score
1 171 1.801 2 97 1.493 3 95 1.471
4 132 1.452 5 72 1.393 6 169 1.383
7 118 1.326 8 6 1.308 9 152 1.301
10 83 1.262 11 32 1.154 12 100 1.087
13 38 0.990 14 24 0.910 15 69 0.906
16 196 0.906 17 103 0.880 18 33 0.849
19 56 0.822 20 164 0.817 21 14 0.798
22 120 0.776 23 23 0.738 24 151 0.720
25 59 0.712 26 203 0.700 27 101 0.693
28 119 0.692 29 12 0.686 30 50 0.646
31 129 0.626 32 42 0.621 33 114 0.607
34 74 0.603 35 22 0.579 36 30 0.551
37 2 0.542 38 91 0.533 39 1 0.519
40 44 0.482 41 80 0.480 42 124 0.473
43 149 0.452 44 123 0.425 45 53 0.390
46 111 0.381 47 37 0.371 48 27 0.362
49 153 0.356 50 173 0.348 51 78 0.329
52 125 0.324 53 168 0.322 54 144 0.293
55 147 0.289 56 43 0.285 57 183 0.256
58 167 0.224 59 199 0.164 60 107 0.158
61 145 0.153 62 88 0.146 63 10 0.120
64 8 0.094 65 204 0.089 66 150 0.085
67 18 0.058 68 13 0.047 69 7 0.009
70 87 0.007 71 4 -0.047 72 104 -0.048
73 200 -0.073 74 194 -0.087 75 148 -0.100
76 52 -0.100 77 136 -0.109 78 94 -0.117
79 110 -0.137 80 138 -0.143 81 192 -0.145
82 79 -0.148 83 61 -0.164 84 182 -0.178
85 127 -0.181 86 47 -0.184 87 35 -0.193
88 15 -0.196 89 51 -0.217 90 66 -0.233
91 64 -0.250 92 3 -0.260 93 36 -0.277
94 179 -0.292 95 71 -0.335 96 131 -0.350
97 166 -0.365 98 76 -0.384 99 70 -0.436
100 20 -0.451 101 197 -0.469 102 65 -0.472
103 174 -0.495 104 157 -0.495 105 109 -0.521
106 11 -0.528 107 135 -0.566 108 190 -0.586
109 202 -0.587 110 21 -0.594 111 85 -0.607
112 89 -0.628 113 193 -0.633 114 62 -0.648
115 75 -0.654 116 142 -0.655 117 134 -0.710
118 187 -0.711 119 198 -0.717 120 29 -0.721
121 102 -0.792 122 58 -0.813 123 106 -0.816
124 133 -0.840 125 25 -0.859 126 195 -0.903
127 162 -0.934 128 115 -0.967 129 161 -0.989
130 188 -0.990 131 130 -1.012 132 137 -1.027
133 60 -1.082 134 57 -1.133 135 143 -1.185
136 26 -1.189 137 158 -1.208 138 128 -1.232
139 201 -1.269 140 191 -1.271 141 185 -1.334
142 139 -1.335 143 116 -1.424 144 175 -1.433
145 155 -1.460 146 181 -1.497 147 86 -1.670
148 113 -1.696 149 96 -1.821 150 105 -1.962
151 180 -1.978 152 5 -1.986 153 126 -2.057
154 41 -2.178 155 49 -2.219 156 45 -2.257
157 73 -2.268 158 186 -2.276 159 140 -2.423
160 19 -2.508 161 67 -2.543 162 98 -2.814
163 16 -2.986 164 121 -3.190 165 34 -3.202
166 99 -3.234 167 154 -3.532 168 156 -3.610
169 17 -3.896 170 160 -4.161 171 146 -4.569
172 82 -4.574 173 172 -4.588 174 163 -4.599
175 9 -4.619 176 92 -4.986 177 63 -6.213
178 108 -6.213 179 93 -6.213 180 84 -6.213
181 54 -6.213 182 81 -6.213 183 141 -6.213
184 68 -6.213 185 165 -6.213 186 178 -6.213
187 122 -6.213 188 159 -6.213 189 55 -6.213
190 184 -6.213 191 48 -6.213 192 170 -6.213
193 39 -6.213 194 31 -6.213 195 176 -6.213
196 177 -6.213 197 112 -6.213 198 189 -6.214
199 28 -6.214 200 46 -6.214 201 90 -6.214
202 117 -6.214 203 40 -6.214 204 77 -6.214
Results for X (referred to as plausible label B_X)
Ranking Cluster Factor Score Ranking Cluster Factor Score Ranking Cluster Factor Score
1 85 3.459 2 51 3.280 3 97 3.219
4 196 3.096 5 179 3.033 6 107 2.829
7 5 2.753 8 87 2.679 9 12 2.672
10 129 2.665 11 149 2.651 12 72 2.597
13 95 2.498 14 182 2.482 15 202 2.445
16 3 2.434 17 37 2.434 18 44 2.408
19 124 2.395 20 50 2.359 21 151 2.314
22 32 2.267 23 125 2.229 24 74 2.223
25 123 2.213 26 8 2.202 27 171 2.135
28 158 2.108 29 119 2.084 30 23 2.073
31 38 2.060 32 6 2.055 33 22 2.040
34 185 2.039 35 24 2.035 36 145 2.024
37 14 1.997 38 29 1.949 39 193 1.917
40 120 1.866 41 83 1.854 42 4 1.824
43 59 1.794 44 65 1.787 45 132 1.693
46 203 1.687 47 33 1.661 48 197 1.619
49 147 1.615 50 152 1.547 51 100 1.458
52 136 1.421 53 153 1.407 54 115 1.354
55 114 1.337 56 1 1.323 57 25 1.298
58 110 1.263 59 195 1.262 60 91 1.261
61 69 1.258 62 174 1.256 63 103 1.253
64 35 1.201 65 101 1.201 66 169 1.193
67 204 1.177 68 183 1.156 69 167 1.152
70 94 1.131 71 52 1.131 72 164 1.111
73 64 1.107 74 20 0.946 75 190 0.926
76 61 0.908 77 30 0.903 78 102 0.856
79 88 0.838 80 71 0.796 81 118 0.774
82 134 0.772 83 56 0.768 84 131 0.757
85 80 0.750 86 10 0.722 87 78 0.713
88 53 0.693 89 2 0.677 90 111 0.649
91 135 0.632 92 173 0.625 93 66 0.617
94 58 0.604 95 104 0.581 96 168 0.571
97 144 0.567 98 200 0.565 99 18 0.550
100 42 0.547 101 150 0.501 102 7 0.481
103 138 0.473 104 194 0.456 105 79 0.447
106 13 0.438 107 75 0.422 108 70 0.414
109 192 0.392 110 133 0.385 111 166 0.376
112 199 0.376 113 198 0.360 114 15 0.352
115 27 0.338 116 157 0.335 117 89 0.331
118 137 0.321 119 43 0.312 120 142 0.312
121 109 0.303 122 36 0.297 123 130 0.281
124 148 0.279 125 191 0.276 126 106 0.274
127 161 0.273 128 11 0.272 129 21 0.271
130 76 0.267 131 127 0.265 132 139 0.261
133 155 0.256 134 116 0.253 135 47 0.233
136 188 0.223 137 201 0.200 138 143 0.185
139 126 0.177 140 175 0.165 141 62 0.159
142 128 0.144 143 26 0.142 144 105 0.142
145 60 0.141 146 41 0.129 147 57 0.123
148 49 0.121 149 180 0.111 150 186 0.103
151 113 0.099 152 181 0.092 153 19 0.089
154 162 0.080 155 187 0.077 156 73 0.070
157 86 0.069 158 67 0.063 159 45 0.043
160 96 0.043 161 98 0.040 162 99 0.039
163 154 0.038 164 140 0.032 165 16 0.030
166 156 0.029 167 34 0.023 168 17 0.021
169 121 0.021 170 146 0.010 171 172 0.010
172 9 0.010 173 82 0.010 174 163 0.009
175 160 0.006 176 92 0.004 177 112 0.000
178 28 0.000 179 177 0.000 180 189 0.000
181 90 0.000 182 117 0.000 183 48 0.000
184 170 0.000 185 68 0.000 186 108 0.000
187 55 0.000 188 184 0.000 189 39 0.000
190 31 0.000 191 141 0.000 192 40 0.000
193 159 0.000 194 93 0.000 195 54 0.000
196 84 0.000 197 176 0.000 198 77 0.000
199 178 0.000 200 165 0.000 201 63 0.000
202 122 0.000 203 46 0.000 204 81 0.000
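The three ranking tables above list, for each between-level factor, every cluster's estimated factor score in three (Ranking, Cluster, Factor Score) column groups per row. If you want these rankings in tidy form for plotting or further checks, a minimal base-R sketch follows; the helper name `parse_factor_ranking` and the input `ranking_lines` are hypothetical and not part of the study's pipeline:

```r
# Minimal sketch: tidy the Mplus factor-score ranking rows printed above.
# Assumes `ranking_lines` holds only the numeric rows, each containing three
# "ranking cluster score" triplets (hypothetical helper, not pipeline code).
parse_factor_ranking <- function(ranking_lines) {
  nums <- as.numeric(unlist(strsplit(trimws(ranking_lines), "\\s+")))
  out <- as.data.frame(matrix(nums, ncol = 3, byrow = TRUE))
  names(out) <- c("ranking", "cluster", "factor_score")
  out[order(out$ranking), ]
}

# Example with the first ranking row of the PHI table:
parse_factor_ranking("1 202 0.983 2 1 0.754 3 179 0.696")
```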
TECHNICAL 1 OUTPUT
PARAMETER SPECIFICATION FOR WITHIN
NU
X X&1
________ ________
0 0
LAMBDA
X X&1
________ ________
X 0 0
X&1 0 0
THETA
X X&1
________ ________
X 0
X&1 0 0
ALPHA
X X&1
________ ________
0 0
BETA
X X&1
________ ________
X 0 0
X&1 0 0
PSI
X X&1
________ ________
X 0
X&1 0 0
PARAMETER SPECIFICATION FOR BETWEEN
NU
X
________
0
LAMBDA
PHI LOGV X
________ ________ ________
X 0 0 0
THETA
X
________
X 0
ALPHA
PHI LOGV X
________ ________ ________
1 2 3
BETA
PHI LOGV X
________ ________ ________
PHI 0 0 0
LOGV 0 0 0
X 0 0 0
PSI
PHI LOGV X
________ ________ ________
PHI 4
LOGV 5 6
X 7 8 9
STARTING VALUES FOR WITHIN
NU
X X&1
________ ________
0.000 0.000
LAMBDA
X X&1
________ ________
X 1.000 0.000
X&1 0.000 1.000
THETA
X X&1
________ ________
X 0.000
X&1 0.000 0.000
ALPHA
X X&1
________ ________
0.000 0.000
BETA
X X&1
________ ________
X 0.000 0.000
X&1 0.000 0.000
PSI
X X&1
________ ________
X 0.000
X&1 0.000 1.049
STARTING VALUES FOR BETWEEN
NU
X
________
0.000
LAMBDA
PHI LOGV X
________ ________ ________
X 0.000 0.000 1.000
THETA
X
________
X 0.000
ALPHA
PHI LOGV X
________ ________ ________
0.000 0.000 0.892
BETA
PHI LOGV X
________ ________ ________
PHI 0.000 0.000 0.000
LOGV 0.000 0.000 0.000
X 0.000 0.000 0.000
PSI
PHI LOGV X
________ ________ ________
PHI 1.000
LOGV 0.000 1.000
X 0.000 0.000 1.051
PRIORS FOR ALL PARAMETERS PRIOR MEAN PRIOR VARIANCE PRIOR STD. DEV.
Parameter 1~N(0.000,infinity) 0.0000 infinity infinity
Parameter 2~N(0.000,infinity) 0.0000 infinity infinity
Parameter 3~N(0.000,infinity) 0.0000 infinity infinity
Parameter 4~IW(0.000,-4) infinity infinity infinity
Parameter 5~IW(0.000,-4) infinity infinity infinity
Parameter 6~IW(0.000,-4) infinity infinity infinity
Parameter 7~IW(0.000,-4) infinity infinity infinity
Parameter 8~IW(0.000,-4) infinity infinity infinity
Parameter 9~IW(0.000,-4) infinity infinity infinity
TECHNICAL 2 OUTPUT
TECHNICAL 3 OUTPUT
ESTIMATED COVARIANCE MATRIX FOR PARAMETER ESTIMATES
1 2 3 4 5
________ ________ ________ ________ ________
1 0.303243D-03
2 0.111479D-02 0.283039D-01
3 0.573716D-03 0.676933D-02 0.443802D-02
4 -0.532720D-05 0.110025D-04 -0.339096D-05 0.391271D-04
5 -0.194184D-04 0.959434D-05 -0.223709D-05 0.151406D-03 0.225353D-02
6 -0.254633D-04 -0.198442D-02 0.203844D-03 0.528874D-03 0.143923D-01
7 -0.155307D-04 -0.632696D-04 0.219154D-04 0.734729D-04 0.609272D-03
8 -0.181356D-04 -0.550199D-03 0.392759D-03 0.269224D-03 0.525530D-02
9 -0.372339D-04 -0.473692D-03 0.237455D-03 0.130175D-03 0.166469D-02
ESTIMATED COVARIANCE MATRIX FOR PARAMETER ESTIMATES
6 7 8 9
________ ________ ________ ________
6 0.331846D+00
7 0.346864D-02 0.388632D-03
8 0.814117D-01 0.196578D-02 0.368172D-01
9 0.193057D-01 0.115614D-02 0.130115D-01 0.880102D-02
ESTIMATED CORRELATION MATRIX FOR PARAMETER ESTIMATES
1 2 3 4 5
________ ________ ________ ________ ________
1 1.000
2 0.381 1.000
3 0.495 0.604 1.000
4 -0.049 0.010 -0.008 1.000
5 -0.023 0.001 -0.001 0.510 1.000
6 -0.003 -0.020 0.005 0.147 0.526
7 -0.045 -0.019 0.017 0.596 0.651
8 -0.005 -0.017 0.031 0.224 0.577
9 -0.023 -0.030 0.038 0.222 0.374
ESTIMATED CORRELATION MATRIX FOR PARAMETER ESTIMATES
6 7 8 9
________ ________ ________ ________
6 1.000
7 0.305 1.000
8 0.737 0.520 1.000
9 0.357 0.625 0.723 1.000
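Note that the TECH3 covariance matrices above print values with Fortran-style `D` exponents (e.g., `0.303243D-03` means \(0.303243 \times 10^{-3}\)). Should you parse such values yourself, rather than through the pipeline's harvesting functions, a one-line base-R conversion suffices (the variable `tech3_values` below is a hypothetical example):

```r
# Fortran-style "D" exponents to numeric (e.g., "0.303243D-03" = 3.03243e-4)
tech3_values <- c("0.303243D-03", "0.111479D-02")  # hypothetical input vector
as.numeric(sub("D", "e", tech3_values, fixed = TRUE))
```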
TECHNICAL 8 OUTPUT
TECHNICAL 8 OUTPUT FOR BAYES ESTIMATION
CHAIN BSEED
1 0
2 285380
ITERATION    POTENTIAL SCALE REDUCTION    PARAMETER WITH HIGHEST PSR
100 1.056 6
200 1.000 1
300 1.007 6
400 1.000 6
500 1.002 5
600 1.008 5
700 1.002 5
800 1.001 5
900 1.003 6
1000 1.002 5
1100 1.002 5
1200 1.001 6
1300 1.000 5
1400 1.000 1
1500 1.000 5
1600 1.001 5
1700 1.000 5
1800 1.001 5
1900 1.000 3
2000 1.001 3
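The potential scale reduction (PSR) values listed above are a Gelman–Rubin-type convergence diagnostic: values close to 1 indicate that the between-chain variance is negligible relative to the within-chain variance, as is the case here from early iterations onward. As a rough illustration only (this is the textbook Gelman–Rubin statistic, not Mplus's exact computation), the diagnostic can be sketched as:

```r
# Textbook Gelman-Rubin diagnostic for one parameter (illustration only;
# Mplus's PSR is its own variant). `chains` is a list of numeric draw vectors.
psr <- function(chains) {
  n <- length(chains[[1]])            # iterations per chain
  W <- mean(sapply(chains, var))      # within-chain variance
  B <- n * var(sapply(chains, mean))  # between-chain variance (of chain means)
  sqrt(((n - 1) / n * W + B / n) / W)
}

set.seed(1)
psr(list(rnorm(1000), rnorm(1000)))   # two well-mixed chains: close to 1
```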
SUMMARIES OF PLAUSIBLE VALUES (N = NUMBER OF OBSERVATIONS * NUMBER OF IMPUTATIONS)
SAMPLE STATISTICS
Means
PHI LOGV B_X
________ ________ ________
0.222 -1.237 0.907
Covariances
PHI LOGV B_X
________ ________ ________
PHI 0.050
LOGV 0.221 5.341
B_X 0.117 1.304 0.888
Correlations
PHI LOGV B_X
________ ________ ________
PHI 1.000
LOGV 0.428 1.000
B_X 0.558 0.599 1.000
SUMMARY OF PLAUSIBLE STANDARD DEVIATION (N = NUMBER OF OBSERVATIONS)
SAMPLE STATISTICS
Means
PHI_SD LOGV_SD B_X_SD
________ ________ ________
0.094 0.126 0.114
Covariances
PHI_SD LOGV_SD B_X_SD
________ ________ ________
PHI_SD 0.000
LOGV_SD 0.000 0.002
B_X_SD 0.000 0.002 0.008
Correlations
PHI_SD LOGV_SD B_X_SD
________ ________ ________
PHI_SD 1.000
LOGV_SD -0.028 1.000
B_X_SD -0.257 0.444 1.000
PLOT INFORMATION
The following plots are available:
Histograms (sample values, estimated factor scores)
Scatterplots (sample values, estimated factor scores)
Between-level histograms (sample values, sample/estimated means/variances, estimated factor scores)
Between-level scatterplots (sample values, sample/estimated means/variances, estimated factor scores)
Time series plots (sample values, ACF, PACF, estimated factor scores)
Histogram of subjects per time point
Time interval plots
Bayesian posterior parameter distributions
Bayesian posterior parameter trace plots
Bayesian autocorrelation plots
Latent variable distribution plots
DIAGRAM INFORMATION
Mplus diagrams are currently not available for multilevel analysis.
No diagram output was produced.
Beginning Time: 13:15:14
Ending Time: 13:37:51
Elapsed Time: 00:22:37
Footnotes
In its more general form, the \(\chi^2\)AR(1) model can include an intercept: \(X_{t} = c + \phi X_{t-1} + a_t\), with \(a_t \sim \chi^2(\nu)\). Since the intercept was set to zero in the simulation study, we discussed the zero-intercept version of this model (\(c = 0\)) in the paper. See the Supplemental Materials for more details.
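For concreteness, a minimal sketch of simulating a single \(\chi^2\)AR(1) series with this intercept term is given below. The function name and parameter values are arbitrary illustrations; the pipeline's own generating functions, described in Section 1, should be used to reproduce the study.

```r
# Minimal sketch: chi^2 AR(1) with intercept, X_t = c + phi * X_{t-1} + a_t,
# with a_t ~ chi^2(nu). Arbitrary example values; not the study's generating code.
simulate_chi2ar1 <- function(t_max, c = 0, phi = 0.4, nu = 2) {
  x <- numeric(t_max)
  x[1] <- rchisq(1, df = nu) / (1 - phi)  # rough start near the process level
  for (t in 2:t_max) x[t] <- c + phi * x[t - 1] + rchisq(1, df = nu)
  x
}

set.seed(1)
head(simulate_chi2ar1(100))
```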