Skewness and staging: Does the floor effect induce bias in multilevel AR(1) models?

Reproducible codes

Author

MH Manuel Haqiqatkhah

0 Introduction

This document contains the reproducible code for the manuscript Skewness and staging: Does the floor effect induce bias in multilevel AR(1) models? by Haqiqatkhah, Ryan, and Hamaker (2022). Please cite this document with the following info:

Haqiqatkhah, M. M., Ryan, O., & Hamaker, E. L. (2022). Skewness and staging: Does the floor effect induce bias in multilevel AR(1) models? PsyArXiv. https://doi.org/10.31234/osf.io/myuvr

In this study, we simulated multilevel data from three data generating mechanisms (DGMs), namely, the AR(1), \(\chi^2\)AR(1), BinAR(1), and PoDAR(1) models with different parameter sets. The simulation was conducted using the following modular pipeline design, inspired by Bien’s R package simulator (2016), consisting of the following components:

  1. Simulation: generating the datasets
  2. Analysis: modeling the data
  3. Harvesting: collecting the relevant parameter estimates
  4. Reporting: making tables and plots

And the components were placed in a pipeline, that managed:

  1. Making the simulation design matrix that include all relevant conditions
  2. Book-keeping data files belonging to each replication of each condition
  3. Performing simulations in batch
  4. Performing Analyses in batch
  5. Collecting the data in batch

This document is structured as follows. In Section 1, we explain the four components and the functions used therein. Then, in Section 2 we explain the wrapper functions used in the pipeline, and show how the pipeline was—and can be—executed. Then, in Section 3, we discuss how the harvested data was used to make the figures used in the paper (and others that were not included). Finally, in Section 4 we discuss how supplementary plots were made and how the empirical data analysis [on COGITO data; Schmiedek, Lövdén, and Lindenberger (2010)] was done. Note that although the codes provided here are cleaned as much as possible, they are not necessarily succinctly written; some functions were written to accommodate the most general functionalities which turned out to be not necessary for the simulation study.

Addendum: Version 2

Following a request from a reviewer to implement and run another set of analyses, some additional functions and adaptations have been made. These changes are specified with v2 throughout this document and in codes and do not change the legacy code and text.

The GitHub repository of this study contains all of the code necessary to run the study (in the scripts folder ). Furthermore, because running the whole study would take a huge amount of time (it took us more than 88 days on a 24-core server and produced 2.69 TB of data), the the raw and summarized datasets of the estimated parameters are provided in simulation-files/harvests . To give an impression of the simulated datasets and the analysis output files, the .rds files of the first replication of the study are also provided in simulation-files folder , and the Mplus files of the first replication of the \(N=100, T=100\) condition is provided in the Mplus-files folder . Finally, all the figures that can be generated based on the study results are provided in the figures folder . The figures are also referenced individually in Section 3 and Section 4 of this document.

To replicate the study from the scratch, you should first either clone the repository (using git clone https://github.com/psyguy/skewness-staging.git) or download the repository as a zip file and extract it on your machine. Then you can sequentially run the .R files you find in the scripts folder. Note that you would need to have a licensed version of Mplus version 8.6 (Muthén and Muthén 2017) on your machine to run the whole study. Instead of running the scripts separately, if you have Quarto installed , you can also compile Code documentations.qmd located in the root directory using Quarto after setting the following variables to TRUE:

study_RunPipeline <- FALSE
study_MakeMainPlots <- FALSE
study_MakeAdditionalPlots <- FALSE

Finally, in case you want to change the scripts (e.g., to run a smaller portion of the simulation, or try other parameters, etc.), you should look up the kintr parameters called in each chunk (with <<some_param>>) and find the corresponding code (under ## @knitr some_param) in the scripts folder.

1 Core components

1.1 The Simulation component

The Simulation component consists of three sets of functions:

  1. Functions that implement the DGMs and generate univariate (\(N=1\)) time series of length \(T\) from the parameters given to them;
  2. Wrappers that interface the DGM functions;
  3. A wrapper to generate datasets (consisting on \(N\) time series of length \(T\)) with a given DGM

1.1.1 Data-generating models specifications

First we define functions for each data-generating models (DGMs) that can produce univariate, single-subject (\(N=1\)) time series of desired length \(T\) (default: T = 100) with the two canonical parameters and a given random seed (default: seed = 0). All model(-implied) parameters are saved in a list (called pa).

For each model, the first observation (\(X_1\)) is randomly drawn from the model-implied marginal distribution, to eliminate the need for removing the burn-in window in the beginning of the data. After the data is generated, in case the argument only.ts is set to be TRUE, the raw data (as a vector of length T) is returned. Otherwise, the function calculates empirical dynamic (\(\phi\)) and marginal (\(\mu\), \(\sigma^2\), and \(\gamma\)) parameters based on the simulated data, and save it in a list (Empirical.Parameters). Furthermore, two \(\mathrm{\LaTeX{}}\)-ready strings (Model.Description and Model.Description.Short) are made which include a summary of the model parameters (that can be used, e.g., in plots). Finally, in case only.ts != TRUE, the function returns a list consisting of the time series (stored in x), verbal description of the dataset (Model.Description and Model.Description.Short), theoretical (i.e., model-implied) parameters (Model.Parameters), and empirical (i.e., sample) estimated parameters (Empirical.Parameters).

The canonical parameters of the AR(1) model with normally distributed residuals (which we referred to as NAR(1) in the simulation) are the autoregressive parameter \(\phi\) (default: phi = 0.4), mean \(\mu\) (default: Mean = 50), and the marginal variance \(\sigma^2\) (default: var.marginal = 4). Based on the marginal variance, the residual variance (var.resid) is calculated via \(\sigma^2_\epsilon = \sigma^2 (1 - \phi^2)\).

1.1.1.0.1 Construction

The time series is constructed by first generating a zero-centered time series \(\tilde X_t\) (x_cent). To do so, first the initial observation in the time series (x_cent[1]) is sampled from normal distribution with mean zero and a variance equal to the marginal variance of the model:

\[ \tilde X_1 \sim \mathcal{N}(0, \sigma^2) \]

Then, the remainder of the time series is generated using the definition of the AR(1) model (note that here the residual variance is used in the normal distribution):

\[ \begin{aligned} \tilde X_{t} &= \phi \tilde X_{t-1} + \epsilon_{t} \\ \epsilon_{t} &\sim \mathcal{N}(0, {\sigma^2_{\epsilon}}) \end{aligned} \]

Finally, the mean is added to the centered zero-centered time series to reach the final time series with mean \(\mu\):

\[ X_t = \tilde X_t + \mu \]

1.1.1.0.2 Code
Click to expand the code
dgm_nar <- function(...){

  pa <- list(...)

  if(is.list(pa$pa)) pa <- pa$pa

  ## setting default seed if not given
  if(is.null(pa$phi)) pa$phi <- 0.4
  if(is.null(pa$Mean)) pa$Mean <- 50
  if(is.null(pa$var.marginal)) pa$var.marginal <- 4
  if(is.null(pa$var.resid)) pa$var.resid <- pa$var.marginal * (1 - pa$phi ^ 2)
  if(is.null(pa$k)) pa$k <- 100
  if(is.null(pa$T)) pa$T <- 100
  if(is.null(pa$seed)) pa$seed <- 0

  ## Making sure var.marginal and var.resid are correctly related
  pa$var.marginal <- pa$var.resid / (1 - pa$phi ^ 2)

  ### first make time series centered around zero
  ## drawing the first sample x_cent_1
  set.seed(pa$seed)
  x_cent <- rep(NA, pa$T)
  x_cent[1] <- rnorm(n = 1,
                     mean = 0,
                     sd = sqrt(pa$var.marginal)
  )

  ## making the rest of the centered time series
  for (t in 2:pa$T){

    x_cent[t] <- pa$phi*x_cent[t-1] + rnorm(n = 1,
                                            mean = 0,
                                            sd = sqrt(pa$var.resid)
    )
  }
  ## adding the mean to the centered time series
  x <- x_cent + pa$Mean

  ## quick output of raw time series without book-keeping variables/parameters
  if(!is.null(pa$only.ts))
    if(pa$only.ts==TRUE) return(x)

  Empirical.Parameters = list(Mean = mean(x),
                              Variance = var(x),
                              Skewness = moments::skewness(x),
                              AR = acf(x, lag.max = 1, plot = FALSE)$acf[2]
  )

  ## making a LaTeX-ready list description of the model
  Model.Description <- paste0("\\mu = ",
                              round(pa$Mean,2),
                              "(",
                              round(Empirical.Parameters$Mean,2),
                              ")",
                              ",\\; \\gamma = ",
                              0,
                              "(",
                              round(Empirical.Parameters$Skewness,2),
                              ")",
                              ",\\; \\phi = ",
                              round(pa$phi,2),
                              "(",
                              round(Empirical.Parameters$AR,2),
                              ")",
                              ",\\; \\sigma^2_{marginal} = ",
                              round(pa$var.marginal,2),
                              "(",
                              round(Empirical.Parameters$Variance,2),
                              ")",
                              ",\\; T = ",
                              pa$T,
                              "$")

  Model.Description.Short <- paste0("$\\mu = ",
                                    round(pa$Mean, 2),
                                    ",\\; \\sigma^2_{\\epsilon} = ",
                                    round(pa$var.resid, 2),
                                    ",\\; \\phi = ",
                                    round(pa$phi, 2),
                                    "\\; \\rightarrow",
                                    "\\; \\sigma^2 = ",
                                    round(Empirical.Parameters$Variance,2),
                                    ",\\; \\gamma = ",
                                    round(Empirical.Parameters$Skewness,2),
                                    "$")


  ## making the output object
  output <- list(x = x,
                 Model.Description = Model.Description,
                 Model.Description.Short = Model.Description.Short,
                 Model.Parameters = pa,
                 Empirical.Parameters = Empirical.Parameters)

  return(output)
}

The canonical parameters of the \(\chi^2\)AR(1) model (which we referred to as ChiAR(1) in the simulation) are the autoregressive parameter \(\phi\) (default: phi = 0.4), and degrees of freedom \(\nu\) (default: nu = 3). We set the intercept to zero (c = 0).1

1.1.1.0.3 Construction

Similar to the AR(1) model, we need to sample the first observation of the \(\chi^2\)AR(1) model from its marginal distribution. However, since this model does not have a closed-form marginal distribution, as an approximation, we instead sample x[1] from a \(\chi^2\) distribution with \(\nu\) degrees of freedom:

\[ X_1 \sim \chi^2(\nu) \]

Then, we generate the remainder of the time series using the definition of the \(\chi^2\)AR(1) model:

\[ \begin{aligned} X_{t} &= c + \phi X_{t-1} + a_{t} \\ a_{t} &\sim \chi^2(\nu). \end{aligned} \]

1.1.1.0.4 Code
Click to expand the code
dgm_chiar <- function(...){

  pa <- list(...)

  if(is.list(pa$pa)) pa <- pa$pa

  ## setting default seed if not given
  if(is.null(pa$phi)) pa$phi <- 0.4
  if(is.null(pa$nu)) pa$nu <- 3
  if(is.null(pa$c)) pa$c <- 0
  if(is.null(pa$k)) pa$k <- 100
  if(is.null(pa$T)) pa$T <- 100
  if(is.null(pa$seed)) pa$seed <- 0

  ## drawing the first sample x_1
  set.seed(pa$seed)
  x <- rep(NA, pa$T)
  x[1] <- rchisq(n = 1,
                 df = pa$Mean)

  ## making the rest of the time series
  for (t in 2:pa$T){

    x[t] <- pa$c + pa$phi*x[t-1] + rchisq(n = 1,
                                          df = pa$nu)
  }

  ## quick output of raw time series without book-keeping variables/parameters
  if(!is.null(pa$only.ts))
    if(pa$only.ts==TRUE) return(x)

  Empirical.Parameters = list(Mean = mean(x),
                              Variance = var(x),
                              Skewness = moments::skewness(x),
                              AR = acf(x, lag.max = 1, plot = FALSE)$acf[2]
  )

  ## making a LaTeX-ready list description of the model
  Model.Description <- paste0("$\\chi^2AR(1):", # \\; with",
                              "\\; \\mu = ",
                              round(pa$Mean,2),
                              "(",
                              round(Empirical.Parameters$Mean,2),
                              ")",
                              ",\\; \\gamma = ",
                              round(pa$Skewness,2),
                              "(",
                              round(Empirical.Parameters$Skewness,2),
                              ")",
                              ",\\; \\phi = ",
                              round(pa$phi,2),
                              "(",
                              round(Empirical.Parameters$AR,2),
                              ")",
                              ",\\; \\nu = ",
                              round(pa$nu,2),
                              ",\\; c = ",
                              round(pa$c,3),
                              ",\\; T = ",
                              pa$T,
                              "$")

  Model.Description.Short <- paste0("$c = ",
                                    round(pa$c,2),
                                    ",\\; \\nu = ",
                                    round(pa$nu, 2),
                                    ",\\; \\phi = ",
                                    round(pa$phi, 2),
                                    "\\; \\rightarrow",
                                    "\\; \\mu = ",
                                    round(Empirical.Parameters$Mean, 2),
                                    ",\\; \\sigma^2 = ",
                                    round(Empirical.Parameters$Variance,2),
                                    ",\\; \\gamma = ",
                                    round(Empirical.Parameters$Skewness,2),
                                    "$")
  ## making the output object
  output <- list(x = x,
                 Model.Description = Model.Description,
                 Model.Description.Short = Model.Description.Short,
                 Model.Parameters = pa,
                 Empirical.Parameters = Empirical.Parameters)

  return(output)
}

The canonical parameters of the BinAR(1) model (which we referred to as BinAR(1) in the simulation) are the survival probability \(\alpha\) (default: alpha = 0.5) and the revival probability \(\beta\) (default: beta = 0.4). By default, the maximum value on scale \(k\) was set to k = 10.

1.1.1.0.5 Construction

We first calculate the \(\theta\) parameter, which characterizes the marginal distribution of the BinaR(1) model:

\[ \theta = \frac{k \beta}{1-(\alpha-\beta)} \] Then we draw \(X_1\) (x[1]) from the marginal distribution of the model:

\[ X_1 \sim Binom(k, \theta) \]

The rest of time series is generated sequentially, for each time point \(t\), by drawing values for the number of survived (S_t[t]) and revived (R_t[t]) elements of the BinAR(1) model based on the previous observations (\(X_{t-1}\)), and then adding them:

\[ \begin{aligned} S_{t} &\sim Binom(X_{t-1}, \alpha) \\ R_t &\sim Binom(k -X_{t-1}, \beta) \\ X_{t} &= S_t + R_t \end{aligned} \]

1.1.1.0.6 Code
Click to expand the code
dgm_binar <- function(...){

  pa <- list(...)

  if(is.list(pa$pa)) pa <- pa$pa

  ## setting default seed if not given
  if(is.null(pa$alpha)) pa$alpha <- 0.5
  if(is.null(pa$beta)) pa$beta <- 0.4
  if(is.null(pa$k)) pa$k <- 10
  if(is.null(pa$T)) pa$T <- 100
  if(is.null(pa$seed)) pa$seed <- 0

  ## making other parameters
  pa$rho <- pa$alpha - pa$beta
  pa$theta <- pa$beta/(1-pa$rho)


  ## drawing the first sample x_1
  set.seed(pa$seed)
  x <- rep(NA, pa$T)
  x[1] <- rbinom(n = 1,
                 size = pa$k ,
                 prob = pa$theta)

  ## making the rest of the time series
  for (t in 2:pa$T){
    S_t <- rbinom(n = 1,
                  size = x[t-1],
                  prob = pa$alpha)
    R_t <- rbinom(n = 1,
                  size = pa$k - x[t-1],
                  prob = pa$beta)
    x[t] <- S_t + R_t
  }

  ## quick output of raw time series without book-keeping variables/parameters
  if(!is.null(pa$only.ts))
    if(pa$only.ts==TRUE) return(x)

  Empirical.Parameters = list(Mean = mean(x),
                              Variance = var(x),
                              Skewness = moments::skewness(x),
                              AR = acf(x, lag.max = 1, plot = FALSE)$acf[2]
  )

  ## making a LaTeX-ready list description of the model
  Model.Description <- paste0("$BinAR(1):",
                              "\\; \\mu = ",
                              round(pa$Mean,2),
                              "(",
                              round(Empirical.Parameters$Mean,2),
                              ")",
                              ",\\; \\gamma = ",
                              round(pa$Skewness,2),
                              "(",
                              round(Empirical.Parameters$Skewness,2),
                              ")",
                              ",\\; \\rho = ",
                              round(pa$rho,3),
                              "(",
                              round(Empirical.Parameters$AR,2),
                              ")",
                              ",\\; \\alpha = ",
                              round(pa$alpha,2),
                              ",\\; \\beta = ",
                              round(pa$beta,2),
                              ",\\; \\theta = ",
                              round(pa$theta,3),
                              ",\\; T = ",
                              pa$T,
                              "$")

  Model.Description.Short <- paste0("$k =",
                                    round(pa$k,2),
                                    ",\\; \\alpha = ",
                                    round(pa$alpha,2),
                                    ",\\; \\beta = ",
                                    round(pa$beta,2),
                                    "\\; \\rightarrow",
                                    "\\; \\rho = ",
                                    round(Empirical.Parameters$AR, 2),
                                    ",\\; \\mu = ",
                                    round(Empirical.Parameters$Mean, 2),
                                    ",\\; \\sigma^2 = ",
                                    round(Empirical.Parameters$Variance,2),
                                    ",\\; \\gamma = ",
                                    round(Empirical.Parameters$Skewness,2),
                                    "$")
  ## making the output object
  output <- list(x = x,
                 Model.Description = Model.Description,
                 Model.Description.Short = Model.Description.Short,
                 Model.Parameters = pa,
                 Empirical.Parameters = Empirical.Parameters)

  return(output)
}

The canonical parameters of the PoDAR(1) model (which we referred to as PoDAR(1) in the simulation) are the persistence probability \(\tau\) (default: tau = 0.7) and the average rate \(\lambda\) (default: lambda = 0.5).

1.1.1.0.7 Construction

To generate the time series, we first draw the first observation \(X_1\) (x[1]) from a Poisson distribution with rate \(\lambda\):

\[ X_1 \sim Poisson(\lambda) \]

And generate the rest of the time series by first drawing \(Z_t\) from a Poisson distribution with rate \(\lambda\) and \(P_t\) from a binomial distribution with size probability of success \(\tau\) (that is equivalent to a Bernoulli distribution with probability \(\tau\)). Then, we calculate \(X_t\) based on the previous observation (x[t-1]) and values of \(Z_t\) (Z_t[t]) and \(P_t\) (P_t[t]), using the definition of the PoDAR(1) model:

\[ \begin{aligned} Z_t &\sim Poisson(\lambda) \\ P_t &\sim Binom(1, \tau) \\ X_t &= P_t X_{t-1} + (1-P_t) Z_t \end{aligned} \]

1.1.1.0.8 Code
Click to expand the code
dgm_podar <- function(...){

  pa <- list(...)

  if(is.list(pa$pa)) pa <- pa$pa

  ## setting default seed if not given
  if(is.null(pa$tau)) pa$tau <- 0.7
  if(is.null(pa$lambda)) pa$lambda <- 0.5
  if(is.null(pa$k)) pa$k <- 6
  if(is.null(pa$T)) pa$T <- 100
  if(is.null(pa$seed)) pa$seed <- 0


  ## drawing the first sample x_1
  set.seed(pa$seed)
  x <- rep(NA, pa$T)
  x[1] <- rpois(n = 1,
                lambda = pa$lambda)

  ## making the rest of the time series
  for (t in 2:pa$T){
    V_t <- rbinom(n = 1,
                  size = 1,
                  prob = pa$tau)
    Z_t <- rpois(n = 1,
                 lambda = pa$lambda)
    x[t] <- V_t*x[t-1] + (1-V_t)*Z_t
  }

  ## quick output of raw time series without book-keeping variables/parameters
  if(!is.null(pa$only.ts))
    if(pa$only.ts==TRUE) return(x)

  Empirical.Parameters = list(Mean = mean(x),
                              Variance = var(x),
                              Skewness = moments::skewness(x),
                              AR = acf(x, lag.max = 1, plot = FALSE)$acf[2]
  )

  ## making a LaTeX-ready list description of the model
  Model.Description <- paste0("$PoDAR(1):",
                              "\\; \\mu = ",
                              round(pa$Mean,2),
                              "(",
                              round(Empirical.Parameters$Mean,2),
                              ")",
                              ",\\; \\gamma = ",
                              round(pa$Skewness,2),
                              "(",
                              round(Empirical.Parameters$Skewness,2),
                              ")",
                              "\\; \\tau = ",
                              round(pa$tau,3),
                              "(",
                              round(Empirical.Parameters$AR,2),
                              ")",
                              ",\\; \\lambda = ",
                              round(pa$lambda,3),
                              ",\\; T = ",
                              pa$T,
                              "$")


  Model.Description.Short <- paste0("$\\lambda = ",
                                    round(pa$lambda,3),
                                    ",\\; \\tau = ",
                                    round(pa$tau, 2),
                                    "\\; \\rightarrow",
                                    "\\; \\rho = ",
                                    round(Empirical.Parameters$AR, 2),
                                    ",\\; \\mu = ",
                                    round(Empirical.Parameters$Mean, 2),
                                    ",\\; \\sigma^2 = ",
                                    round(Empirical.Parameters$Variance,2),
                                    ",\\; \\gamma = ",
                                    round(Empirical.Parameters$Skewness,2),
                                    "$")
  ## making the output object
  output <- list(x = x,
                 Model.Description = Model.Description,
                 Model.Description.Short = Model.Description.Short,
                 Model.Parameters = pa,
                 Empirical.Parameters = Empirical.Parameters)


  return(output)
}

# General DGM wrappers ----------------------------------------------------

1.1.2 General DGM wrappers

Given that, in each model, two canonical parameters characterize the dynamic and marginal features of the generated time series, and given that we have analytic formulas that link the canonical parameters to the model-implied \(\phi\), \(\mu\), \(\sigma^2\), and \(\gamma\), we use a function (dgm_parameterizer) to calculate canonical parameters from two given parameters, and make a complete list of parameters (called pa). This list also includes non-parameter variables, importantly, the time series length \(T\) (saved in pa$T) and the random seed used in the dgm_* functions (saved in pa$seed). A wrapper function (dgm_generator) is used as an interface to all dgm_* functions, which first makes sure the given parameters are sufficient for data generation, makes a complete parameter list pa with the help of dgm_parameterizer, and passes pa to the respective DGM generating function.

The function dgm_parameterizer calculates canonical/model-implied parameters of a given DGM (specified using the Model argument) based on the parameters given to it as arguments, and saves them in a list of parameters (pa), which s returned by the function. The function makes sure that the set of parameters provided are sufficient to characterize the dynamic parameter of the model (i.e., the autoregression \(\phi\)) and at least one of the marginal parameters (importantly, the mean \(\mu\)) but giving default values to some parameters.

Click to expand the code
dgm_parameterizer <- function(...){

  pa <- list(...)

  if(is.list(pa$pa)) pa <- pa$pa

  if(is.null(pa$Model)) pa$Model <- "ChiAR(1)"
  if(is.null(pa$phi)) pa$phi <- 0.2

  ## %%%%%%%%%%%%
  ## for NAR(1)
  ## %%%%%%%%%%%%

  if(tolower(pa$Model) == "nar(1)" | tolower(pa$Model) == "nar"){

    if(is.null(pa$k)) pa$k <- 100

    ## Calculating model parameters
    ## if mean is given
    if (!is.null(pa$Mean)) {
      # then from the mean formula
      pa$c <- pa$Mean * (1 - pa$phi)
    }
    ## if mean is not given, we get it from the intercept
    else{
      # set intercept to zero, if already not defined
      if(is.null(pa$c)) pa$c <- 0
      # then from the mean formula
      pa$Mean <- pa$c / (1 - pa$phi)
    }

    ## Now we certainly have the mean calculated

    ## The within-person (marginal) variance is more interpretable. So
    ## we always calculate the residual variance based on the marginal
    ## variance, unless var.resid is known BUT var.marginal is NOT.
    if (!is.null(pa$var.resid) & is.null(pa$var.marginal)) {
      # from the marginal variance formula
      pa$var.marginal <- pa$var.resid / (1 - pa$phi ^ 2)
    }
    ## If none of them are known we first give var.marginal a default
    ## value and calculate var.resid based on that.
    if (is.null(pa$var.resid) & is.null(pa$var.marginal)) {
      # we set a default value for the marginal variance
      pa$var.marginal <- 4
    }
    ## Then, we calculate var.resid (again) with var.marginal to make
    ## sure that the marginal variance had been given more importance
    pa$var.resid <- pa$var.marginal * (1 - pa$phi ^ 2)

    ## Returning the parameter list
    return(pa)

  }


  ## %%%%%%%%%%%%
  ## for ChiAR(1)
  ## %%%%%%%%%%%%

  if(tolower(pa$Model) == "chiar(1)" | tolower(pa$Model) == "chiar" |
     tolower(pa$Model) == "chi2ar(1)" | tolower(pa$Model) == "chi2ar"){

    if(is.null(pa$k)) pa$k <- 100

    ## Calculating model parameters
    ## if mean and skewness are given
    if (!is.null(pa$Mean) & !is.null(pa$Skewness)) {
      # from the skewness formula
      pa$nu <-
        8 * (1 - pa$phi ^ 2) ^ 3 / ((pa$Skewness ^ 2) * (1 - pa$phi ^
                                                           3) ^ 2)
      # then from the mean formula
      pa$c <- pa$Mean * (1 - pa$phi) - pa$nu
      # returning the parameter list
      return(pa)
    }
    ## if mean and skewness are not given at the same time, we use c
    ## and one other parameter
    else{

      # set intercept to zero, if already not defined
      if(is.null(pa$c)) pa$c <- 0

      ## if mean is given
      if (!is.null(pa$Mean)) {
        # then from the mean formula
        pa$nu <- pa$Mean * (1 - pa$phi) - pa$c
        # from the skewness formula
        pa$Skewness <-
          2 * (1 - pa$phi ^ 2) ^ 1.5 / (sqrt(pa$nu / 2) * (1 - pa$phi ^
                                                             3))
        # returning the parameter list
        return(pa)
      }

      ## if skewness is given
      if (!is.null(pa$Skewness)) {
        # from the skewness formula
        pa$nu <-
          8 * (1 - pa$phi ^ 2) ^ 3 / ((pa$Skewness ^ 2) * (1 - pa$phi ^
                                                             3) ^ 2)
        # then from the mean formula
        pa$Mean <- (pa$c + pa$nu) / (1 - pa$phi)
        # returning the parameter list
        return(pa)
      }

      ## if nu is given
      if(!is.null(pa$nu)){
        # from the mean formula
        pa$Mean <- (pa$c + pa$nu)/(1-pa$phi)
        # from the skewness formula
        pa$Skewness <-
          2 * (1 - pa$phi ^ 2) ^ 1.5 / (sqrt(pa$nu / 2) * (1 - pa$phi ^
                                                             3))
        # returning the parameter list
        return(pa)
      }

    }


  }


  ## %%%%%%%%%%%%
  ## for BinAR(1)
  ## %%%%%%%%%%%%

  if (tolower(pa$Model) == "binar(1)" | tolower(pa$Model) == "binar") {

    if (is.null(pa$k))
      pa$k <- 10

    ## Calculating model parameters

    ## if mean is given
    if (!is.null(pa$Mean)) {
      # from skewness formula: m = k*theta
      pa$theta <- pa$Mean / pa$k
      ## we then calculate skewness based on theta
      pa$Skewness <- (1 - 2 * pa$theta) / sqrt(pa$k * pa$theta * (1 - pa$theta))
      ## we then calculate beta based on theta and phi
      pa$beta <- pa$theta * (1 - pa$phi)
      # then we calculate alpha
      pa$alpha <- pa$phi + pa$beta
      # finally we calculate k*beta, which is equivalent to c
      pa$c <- pa$k * pa$beta
      # returning the parameter list
      return(pa)
    }

    ## if skewness is given
    if (!is.null(pa$Skewness)) {
      # from skewness formula: skewness = (1-2*theta)/sqrt(k*theta*(1-theta))
      # it is easier to write with ks = k*(skewness^2)
      ks <- pa$k*pa$Skewness^2
      pa$theta <- (ks + 4 -
                     sqrt(ks * (ks + 4))) / (2 * ks + 8)
      # The above formula only gives theta < 0.5, thus for negative skewness
      # we must use 1-theta instead
      if(pa$Skewness < 0) pa$theta <- 1 - pa$theta
      # we then calculate mean based on theta
      pa$Mean <- pa$k * pa$theta
      # we then calculate beta based on theta and phi
      pa$beta <- pa$theta * (1 - pa$phi)
      # then we calculate alpha
      pa$alpha <- pa$phi + pa$beta
      # finally we calculate k*beta, which is equivalent to c
      pa$c <- pa$k * pa$beta
      # returning the parameter list
      return(pa)
    }

    ## if theta is given
    if (!is.null(pa$theta)) {
      # from theta formula
      pa$beta <- pa$theta * (1 - pa$phi)
      # from beta formula
      pa$alpha <- pa$phi + pa$beta
      # we then calculate mean based on theta
      pa$Mean <- pa$k * pa$theta
      # we then calculate skewness based on theta
      pa$Skewness <- (1 - 2 * pa$theta) / sqrt(pa$k * pa$theta * (1 - pa$theta))
      # finally we calculate k*beta, which is equivalent to c
      pa$c <- pa$k * pa$beta
      # returning the parameter list
      return(pa)
    }

    ## if alpha is given
    if (!is.null(pa$alpha)) {
      # from beta formula
      pa$beta <- pa$alpha - pa$phi
      # from theta formula
      pa$theta <- pa$beta / (1 - pa$phi)
      # we then calculate mean based on theta
      pa$Mean <- pa$k * pa$theta
      # we then calculate skewness based on theta
      pa$Skewness <- (1 - 2 * pa$theta) / sqrt(pa$k * pa$theta * (1 - pa$theta))
      # finally we calculate k*beta, which is equivalent to c
      pa$c <- pa$k * pa$beta
      # returning the parameter list
      return(pa)
    }

    ## if beta is given
    if (!is.null(pa$beta)) {
      # from beta formula
      pa$alpha <- pa$phi + pa$beta
      # from theta formula
      pa$theta <- pa$beta / (1 - pa$phi)
      # we then calculate mean based on theta
      pa$Mean <- pa$k * pa$theta
      # we then calculate skewness based on theta
      pa$Skewness <- (1 - 2 * pa$theta) / sqrt(pa$k * pa$theta * (1 - pa$theta))
      # finally we calculate k*beta, which is equivalent to c
      pa$c <- pa$k * pa$beta
      # returning the parameter list
      return(pa)
    }

    ## if intercept given
    if (!is.null(pa$c)) {
      # from intercept formula c = k*beta
      pa$beta <- pa$c / pa$k
      # from beta formula
      pa$alpha <- pa$phi + pa$beta
      # from theta formula
      pa$theta <- pa$beta / (1 - pa$phi)
      # we then calculate mean based on theta
      pa$Mean <- pa$k * pa$theta
      # we then calculate skewness based on theta
      pa$Skewness <- (1 - 2 * pa$theta) / sqrt(pa$k * pa$theta * (1 - pa$theta))
      # returning the parameter list
      return(pa)
    }

  }

  ## %%%%%%%%%%%%
  ## for PoDAR(1)
  ## %%%%%%%%%%%%

  if (tolower(pa$Model) == "podar(1)" | tolower(pa$Model) == "podar") {

    if (is.null(pa$k))
      pa$k <- 50

    # phi and tau are the same, then if tau is defined, it overrules phi
    if(!is.null(pa$tau)) pa$phi <- pa$tau
    # and if tau is not defined, then tau will get the value of phi
    if(is.null(pa$tau)) pa$tau <- pa$phi
    # DAR(1) intercept is zero
    pa$c <- 0

    ## Calculating model parameters

    ## if mean is given
    if (!is.null(pa$Mean)) {
      # mean of Poisson is lambda
      pa$lambda <- pa$Mean
      ## we then calculate skewness based on lambda
      pa$Skewness <- pa$lambda^(-0.5)
      # returning the parameter list
      return(pa)
    }

    ## if skewness is given
    if (!is.null(pa$Skewness)) {
      # from skewness formula: skewness = 1/sqrt(lambda)
      pa$lambda <- pa$Skewness^(-2)
      # we then calculate mean based on theta
      pa$Mean <- pa$lambda
      # returning the parameter list
      return(pa)
    }

    ## if lambda is given
    if (!is.null(pa$lambda)) {
      # mean of Poisson is lambda
      pa$Mean <- pa$lambda
      ## we then calculate skewness based on lambda
      pa$Skewness <- pa$lambda^(-0.5)
      # returning the parameter list
      return(pa)
    }
  }

}

The function dgm_generator gets a set of parameters (either as separate arguments, or a list of parameters, like the one returned by dgm_parameterizer), saves them in a list called pa. It checks whether \(\phi\) is included in the list (if not, sets the default value pa$phi = 0.2), and checks if at least one other parameter (which, together with \(\phi\), is required to characterize the marginal properties of the DGMs) is calculated for it (if not, it sets the default value pa$Mean = 5 for \(\mu\)). Furthermore, if the DGM name, time series length, and the random seed are not provided, it gives them default values (respectively: Model = "ChiAR(1)", T = 100, and seed = 0) and adds them to pa.

Then, it passes the pa list to dgm_parameterizer to do the necessary conversions to complete the list of canonical and model-implied parameters. Finally, given the model name, it checks if non-canonical parameters \(k\) and \(c\) are set (otherwise assigns appropriate defaults to them), and passes the complete parameter list to the respective DGM function.

Click to expand the code
dgm_generator <- function(...){

  pa <- list(...)

  if(is.list(pa$pa)) pa <- pa$pa

  ## setting default seed if not given
  if(is.null(pa$Model)) pa$Model <- "ChiAR(1)"
  if(is.null(pa$phi)) pa$phi <- 0.2
  # if no model parameter is given, then mean is set to a default 5
  if (is.null(pa$Mean) &
      is.null(pa$Skewness) &
      is.null(pa$c) &
      is.null(pa$nu) &
      is.null(pa$alpha) &
      is.null(pa$beta) &
      is.null(pa$theta)
  ) pa$Mean <- 5
  # if(is.null(pa$Mean)) pa$Variance <- 10
  # if(is.null(pa$Skewness)) pa$Skewness <- 3
  if(is.null(pa$T)) pa$T <- 100
  if(is.null(pa$seed)) pa$seed <- 0

  if(is.null(pa$only.ts)) pa$only.ts <- FALSE

  ## If you set `pa$only.ts` parameter as TRUE, the dgm_ functions produce only
  ## the raw time series (a single numeric vector) which is way lighter and way
  ## faster:

  # pa$only.ts <- TRUE


  ## calculating model parameters
  pa <- dgm_parameterizer(pa = pa)

  ### making models

  ## %%%%%%%%%%%%
  ## NAR(1)
  ## %%%%%%%%%%%%
  if(tolower(pa$Model) == "nar(1)" | tolower(pa$Model) == "nar"){
    # default maximum scale value
    if(is.null(pa$k)) pa$k <- 100
    ## %% Generating the data
    o <- dgm_nar(pa = pa)
  }

  ## %%%%%%%%%%%%
  ## ChiAR(1)
  ## %%%%%%%%%%%%
  if(tolower(pa$Model) == "chiar(1)" | tolower(pa$Model) == "chiar" |
     tolower(pa$Model) == "chi2ar(1)" | tolower(pa$Model) == "chi2ar"){
    # default maximum scale value
    if(is.null(pa$k)) pa$k <- 100
    ## %% Generating the data
    o <- dgm_chiar(pa = pa)
  }

  ## %%%%%%%%%%%%
  ## BinAR(1)
  ## %%%%%%%%%%%%
  if(tolower(pa$Model) == "binar(1)" | tolower(pa$Model) == "binar"){
    # default maximum scale value
    if(is.null(pa$k)) pa$k <- 10
    ## %% Generating the data
    o <- dgm_binar(pa = pa)
  }

  ## %%%%%%%%%%%%
  ## DAR(1)
  ## %%%%%%%%%%%%
  if(tolower(pa$Model) == "dar(1)" | tolower(pa$Model) == "dar"){
    # default maximum scale value
    if(is.null(pa$k)) pa$k <- 10
    ## %% Generating the data
    o <- dgm_dar(pa = pa)
  }

  ## %%%%%%%%%%%%
  ## PoDAR(1)
  ## %%%%%%%%%%%%
  if(tolower(pa$Model) == "podar(1)" | tolower(pa$Model) == "podar"){
    # default maximum scale value
    if(is.null(pa$k)) pa$k <- 100
    ## %% Generating the data
    o <- dgm_podar(pa = pa)
  }

  ## Also allow a data frame output
  # if(!is.null(p$as.dataframe) & p$as.dataframe){
  #
  # }

  return(o)
}


# Dataset generation ------------------------------------------------------

1.1.3 Dataset generation

The machinery described above can be used to generate individual (\(N=1\)) time series. However, for the simulation study, we need datasets comprising of multiple (\(N=25, 50, 100\)) individuals. As we discussed in the paper, in our study, all individuals in a dataset of a DGM share the same autoregressive parameter (\(\phi_i=0.4\)) and the individual differences are only in the individual means (\(\mathbb{\mu} = [\mu_1 , \mu_2, \dots, \mu_N]\)). Thus, we write a function (dgm_make.sample) that can generate, for each DGM, a dataset of \(N\) individuals based on an \(N\)-dimensional vector of individual means, all with the same \(\phi_i\). We then need to find the appropriate parameters for the level-2 distributions (Gaussian and \(\chi^2\) distributions) for each DGM, such that the we get a considerable proportion of individuals with considerably high skewness while respecting the lower and upper bounds of values supported by each model. Finally, with a wrapper function (make_datasets), we facilitate making dataset by automatically generating the means vector suitable for each DGM.

The function dgm_make.sample generates a dataset of time series of length T with the autoregressive parameter phi from a desired DGM (determined by the Model argument) given a vector of means (passed as the argument Means). The length of Means determine the number of individuals in the dataset (N <- length(Means)). If Means is not provided, a randomly generated vector of \(N = 100\) is used as default. Since each individual time series is generated with a random seed, we need a vector of N unique seeds, which can be provided using the seeds argument. In case seeds is not provided, it is generated based on the provided means (seeds.from.means), and if it is a scalar, the seeds vector is created by adding the scalar to the seeds.from.means (which would allow generating different datasets with the same mean distributions).

Click to expand the code
dgm_make.sample <- function(Model = "ChiAR(1)",
                            Means = rnorm(100, 5, 3),
                            T = 100,
                            phi = 0.4,
                            Phis = NULL,
                            seeds = NULL
){

  N <- length(Means)
  seeds.from.means <- 1000*N*Means

  if(is.null(seeds)) seeds <- seeds.from.means
  if(length(seeds)<=1) seeds <- seeds + seeds.from.means

  df <- data.frame(subject = rep(1:N, each = T),
                   t = rep(1:T, times = N),
                   x = rep(NA, N*T))

  # Adapting for v2 implementation
  if(length(Phis) != N) Phis <- rep(phi, N)

  for(s in 1:N){
    x <- dgm_generator(
      Model = Model,
      only.ts = TRUE,
      T = T,
      phi = Phis[s],
      Mean = Means[s],
      seed = seeds[s])

    df$x[((s-1)*T + 1):(s*T)] <- x
  }

  return(df)
}

For each alternative DGM—the \(\chi^2\)AR(1), BinAR(1), and PoDAR(1) models—we should determine appropriate parameters for the level-2 distribution of means such that we have enough skewness in the generated datasets. To do so, we make a function (Mean.vs.Skewness) to help us experiment with different values for \(\mu\) and \(\sigma^2\) (of the Gaussian level-2 distribution) and \(\nu\) (of the \(\chi^2\) level-2 distribution) for each alternative DGM. Note that we start by generating more than enough samples for each distribution (\(10 \times N\)) and subsample \(N\) values after applying the model-specific lower and upper bounds.

We notice that we get the desired distribution of skewness with the following parameters:

Parameters of level-2 distribution of means
Model \(\mu\) \(\sigma^2\) \(\nu\)
\(\chi^2\)AR(1) 10 10 5
BinAR(1) 2 1 2.9
PoDAR(1) 4 4 1.5

Giving us the following distributions:

We then use a wrapper (make_datasets) around dgm_make.sample that generates datasets for all four DGMs with the appropriate level-2 parameters specified above. Note that here we first generate more than enough (i.e., \(2 \times N\)) samples of means to make sure we end up with \(N\) samples after applying the upper and lower bounds. The datasets are then saved, with some additional variables in separate .rds files, using wrapper functions described in Section 2.2.

Addendum: Version 2

By setting version = "v2" in make_datasets (now the default), the vector of Phis is randomly drawn from \(\mathcal{N}(0.4, \sqrt{0.1})\). More specifically, like the means, \(2 \times N\) values of \(\phi\) are generated, the values not between 0 and 1 are removed, and \(N\) values are randomly drawn from the remaining.

The function returns a list in this case, which contain the generated dataset dataset, and the vectors of means Means and \(\phi\)s Phis used to generate it.

Click to expand the code
make_datasets <- function(Model = "DAR",
                          T = 100,
                          N = 100,
                          phi = 0.4,
                          l2.distribution = "Gaussian",
                          uSeed = 0,
                          version = "v2") {
  # save global seed of the global env and set it back before leaving
  seed.old <- .Random.seed
  on.exit({
    .Random.seed <<- seed.old
  })
  set.seed(uSeed)

  if (tolower(Model) == "nar") {
    model.name <- "NAR"
    lower_bound <- 0
    upper_bound <- 100
    lev2.Mean <- 50
    lev2.Variance <- 4
    chi2.df <- 2
  }

  if (tolower(Model) == "chiar" | tolower(Model) == "chi2ar") {
    model.name <- "Chi2AR"
    lower_bound <- 0
    upper_bound <- 100
    lev2.Mean <- 10
    lev2.Variance <- 10
    chi2.df <- 5
  }

  if (tolower(Model) == "binar") {
    model.name <- "BinAR"
    lower_bound <- 0
    upper_bound <- k
    lev2.Mean <- 2
    lev2.Variance <- 1
    chi2.df <- 2.9
  }

  if (tolower(Model) == "podar") {
    model.name <- "PoDAR"
    lower_bound <- 0
    upper_bound <- 100
    lev2.Mean <- 4
    lev2.Variance <- 4
    chi2.df <- 1.5
  }

  # sampling within-person mean from level 2 distribution
  if (l2.distribution == "Gaussian")
    Means <- rnorm(2 * N, lev2.Mean, sqrt(lev2.Variance))
  if (l2.distribution == "Chi2")
    Means <- rchisq(2 * N, chi2.df)

  # removing out-of-bounds means
  Means[Means < lower_bound] <- NA
  Means[Means > upper_bound] <- NA

  # keeping N samples from the in-bound means
  Means <- Means %>% na.omit() %>% sample(N)

  # Adapting for v2 implementation
  Phis <- rep(phi, N)

  ## Implementing v2
  if(version == "v2"){
    # sampling within-person phis
    Phis <- rnorm(2 * N, phi, 0.1)

    # removing out-of-bounds phis
    Phis[Phis <= 0] <- NA
    Phis[Phis >= 1] <- NA

    # keeping N samples from the in-bound means
    Phis <- Phis %>% na.omit() %>% sample(N)
  }

  # Making a dataframe using dgm_make.sample
  sample_df <- dgm_make.sample(
    Model = Model,
    Means = Means,
    T = T,
    Phis = Phis,
    seeds = NULL
  )

  if(version == "v2") return(list(dataset = sample_df,
                                  Means = Means,
                                  Phis = Phis))
  return(sample_df)

}

1.2 The Analysis component

We analyzed each dataset with AR(1) models with fixed and random residual variance using Mplus v. 8.1 (Muthén and Muthén 2017). To interface Mplus from R, we used the package MplusAutomation (Hallquist and Wiley 2018) and wrote a function (run_MplusAutomation) that for each iteration of each condition would save the dataset as a .dat file, generate the .inp input script for the desired analysis type, and run the model for that dataset. Mplus then saves the output files (.out and .gh5). After the analysis, run_MplusAutomation extracts parameter estimates and returns them in an R object, which, with some additional variables, are saved in separate .rds files using wrapper functions described in Section 2.2.

Addendum: Version 2

In v2, to make the analysis and data storage more efficient, run_MplusAutomation removes the Mplus files generated by MplusAutomation::mplusModeler (.inp, .dat, .out, and .gh5), and drops $results$gh5 from mplusModeler’s output (which would save a lot of space).

Click to expand the code
run_MplusAutomation <- function(df,
                                PROCESSORS = 2,
                                CHAINS = 2,
                                THIN = 2,
                                BITERATIONS.min = 2000,
                                BITERATIONS.max =
                                  BITERATIONS.min * BITERATION.minmax.factor,
                                BITERATION.minmax.factor = 2.5,
                                out.folder = "Mplus-files/",
                                model_what = "resid.random",
                                file.name = Sys.Date(),
                                remove.files = FALSE,
                                drop.gh5 = FALSE,
                                version = "v2") {

  inp.name <- paste0(out.folder,
                     file.name)

  MplusAutomation::prepareMplusData(df,
                                    paste0(inp.name, ".dat"))

  VARIABLE <- glue::glue("CLUSTER = subject;
                      LAGGED = x(1);
                      TINTERVAL = t(1);")

  ANALYSIS  <- glue::glue(
    "TYPE = TWOLEVEL RANDOM;
    ESTIMATOR = BAYES;
    PROCESSORS = {PROCESSORS};
    CHAINS = {CHAINS};
    THIN = {THIN};
    BITERATIONS = {BITERATIONS.max}({BITERATIONS.min});"
  )

  PLOT <- glue::glue("TYPE = PLOT3;
                     FACTORS = ALL (500);")


  if (model_what == "resid.random")
    model_string <- glue::glue("%WITHIN%
    phi | x ON x&1;
    logv | x;
    %BETWEEN%
    x phi logv WITH x phi logv;")

  if (model_what == "resid.fixed")
    model_string <- glue::glue("%WITHIN%
    phi | x ON x&1;
    %BETWEEN%
    x phi WITH x phi;")


  if (model_what == "within.between") {
    model_string <- glue::glue("%WITHIN%
    x;
    %BETWEEN%
    x;")
    VARIABLE <- "CLUSTER = subject;"
    PLOT <- "TYPE = PLOT3;"

  }


  model.ar1 <- MplusAutomation::mplusObject(
    TITLE = inp.name,
    rdata = df,
    usevariables = c("subject", "t", "x"),
    VARIABLE = VARIABLE,
    ANALYSIS = ANALYSIS,
    MODEL = model_string,
    OUTPUT = "TECH1 TECH2 TECH3 TECH8 FSCOMPARISON STANDARDIZED STDYX STDY;",
    PLOT = PLOT
  )


  fit.ar1 <- MplusAutomation::mplusModeler(
    model.ar1,
    check = FALSE,
    modelout = paste0(inp.name, ".inp"),
    hashfilename = FALSE,
    run = 1L
  )

  # Adapting for v2 implementation:
  if(version == "v2") remove.files <- drop.gh5 <- TRUE

  # Removing Mplus files generated by the R function
  if (remove.files == TRUE)
    remove.files <- file.remove(file.path(
      out.folder,
      dir(
        path = out.folder,
        pattern = file.name,
        ignore.case = TRUE
      )
    ))

  # Dropping gh5 from the R object
  if(drop.gh5 == TRUE) fit.ar1$results$gh5 <- NULL

  return(fit.ar1)
}

In each analysis, we simulated two MCMC chains (CHAINS = 2), and to reduce autocorrelation in the estimated parameters, by defining THIN = 5 we asked Mplus to save every 5th sample. By setting BITERATIONS = 5000(2000), we made sure to have between 2000 to 5000 samples (after thinning) for each parameter from each chain. Mplus considers the first half of each chain as burn-in samples and discards them, thus, in total, we got at least 2000 “independent” samples from both chains combined. Finally, with FACTORS = ALL (500) we asked Mplus to draw 500 samples for each individuals when estimating level-1 parameters. We visually inspected the traceplots and autocorrelation plots of parameter estimates and of a sample of analyzed datasets and good convergence was observed. Furthermore, to make sure the number of iterations and thinning used in the analyses provide sufficiently accurate estimates, we re-analyzed two replications of each alternative DGM with Gaussian and \(\chi^2\)-distributed means with \(N=100\) and \(T=100\) with BITERATIONS = 12500(5000) and THIN = 20, which led to estimates of the parameters of interest (unstd X.WITH.PHI, unstd Variances.PHI, and stdyx X.WITH.PHI) almost identical (up to the third decimal) to those estimated with BITERATIONS = 5000(2000) and THIN = 5 (see below).

The generated input files looked like the following. Note that the TITLE and DATA strings in the .inp files are unique to the dataset being analyzed and included the unique dataset seed uSeed (passed to make_datasets to generate datasets), the number of individuals in the dataset N, the length of the time series T, the model types used (resid.fixed or resid.random), and the replication number Rep (see Section 2.1 for further details).

By estimating the covariances between mean and autoregression (x phi WITH x phi under the BETWEEN% command), the following Mplus script runs an AR(1) with random effect mean and autoregressive parameter but with fixed effect residual variance.

Click to expand the code
TITLE:
fit_uSeed-000000_N-100_T-100_type-resid.fixed_Rep-1

DATA:
FILE = "fit_uSeed-000000_N-100_T-100_type-resid.fixed_Rep-1.dat";

VARIABLE:
NAMES = subject t x;
MISSING=.;
CLUSTER = subject;
LAGGED = x(1);
TINTERVAL = t(1);

ANALYSIS:
TYPE = TWOLEVEL RANDOM;
ESTIMATOR = BAYES;
PROCESSORS = 1;
CHAINS = 2;
THIN = 5;
BITERATIONS = 5000(2000);

MODEL:
%WITHIN%
phi | x ON x&1;
%BETWEEN%
x phi WITH x phi;

OUTPUT:
TECH1 TECH2 TECH3 TECH8 FSCOMPARISON STANDARDIZED STDYX STDY;

PLOT:
TYPE = PLOT3;
FACTORS = ALL (500);

By estimating the logarithm of the residual variance at level 1 (by logv | x under the %WITHIN% command) and estimating the covariances between the level-2 mean, autoregression, and residual variance (x phi logv WITH x phi logv under the %BETWEEN% command), the following Mplus script runs an AR(1) model with random effect mean, autoregressive parameter, and residual variance.

Click to expand the code
TITLE:
fit_uSeed-000000_N-100_T-100_type-resid.random_Rep-1

DATA:
FILE = "fit_uSeed-000000_N-100_T-100_type-resid.random_Rep-1.dat";

VARIABLE:
NAMES = subject t x;
MISSING=.;
CLUSTER = subject;
LAGGED = x(1);
TINTERVAL = t(1);

ANALYSIS:
TYPE = TWOLEVEL RANDOM;
ESTIMATOR = BAYES;
PROCESSORS = 1;
CHAINS = 2;
THIN = 5;
BITERATIONS = 5000(2000);

MODEL:
%WITHIN%
phi | x ON x&1;
logv | x;
%BETWEEN%
x phi logv WITH x phi logv;

OUTPUT:
TECH1 TECH2 TECH3 TECH8 FSCOMPARISON STANDARDIZED STDYX STDY;

PLOT:
TYPE = PLOT3;
FACTORS = ALL (500);

The table below shows the estimated parameters of two replications of with BITERATIONS = 5000(2000) and THIN = 5 (specified in the table with 2k x 5) and BITERATIONS = 12500(5000) and THIN = 20 (specified in the table with 5k x 20). The relevant parameters (level-2 correlations X.WITH.PHI unstd, covariances X.WITH.PHI stdyx, and variance Variances.PHI unstd) are highlighted with orange.

1.3 The Harvesting component

Extracting parameter estimates of individual analyses

The function fit_extract gets an .rds file generated by do_fit_doFuture (see Section 2.2)—which includes book-keeping information and Mplus output object that was generated by run_MplusAutomation (see Section 1.2)—and extracts the unstandardized and stdyx.standardized Mplus parameter estimates and stores them, along with the additional book-keeping information, in a dataframe.

Click to expand the code
fit_extract <- function(rds.file){

  m <- readRDS(rds.file)
  est.par <- m[["fit.Dataset"]][["results"]][["parameters"]]

  # make empty dataframe for NAs
  empty <- data.frame(matrix(ncol = 19, nrow = 0))
  colnames(empty) <- c("uSeed", "type", "l2.dist", "Model", "N", "phi", "T",
                       "Rep", "standardization", "est", "posterior_sd", "pval",
                       "lower_2.5ci", "upper_2.5ci", "sig", "BetweenWithin",
                       "param.name", "fit.ElapsedTime", "fit.File")

  if(length(est.par) == 0) return(empty)
  if(is.null(est.par[["unstandardized"]])) return(empty)
  if(is.null(est.par[["stdyx.standardized"]])) return(empty)

  unstd <- est.par[["unstandardized"]] %>%
    mutate(param.name = paste(paramHeader,
                              param,
                              sep = ".")
    ) %>%
    select(-paramHeader:-param) %>%
    mutate(standardization = "unstd",
           .before = est)
  stdyx <- est.par[["stdyx.standardized"]] %>%
    mutate(param.name = paste(paramHeader,
                              param,
                              sep = ".")
    ) %>%
    select(-paramHeader:-param) %>%
    mutate(standardization = "stdyx",
           .before = est)

  m$fit.Dataset <- NULL

  res <- unstd %>%
    rbind(stdyx) %>%
    mutate(fit.ElapsedTime = (difftime(m[["fit.EndTime"]],
                                       m[["fit.StartTime"]],
                                       units="mins")
    ) %>%
      as.numeric(),
    fit.File = gsub(".*/", "", m$fit.File)) %>%
    mutate(uSeed = m$uSeed,
           type = m$type,
           l2.dist = m$l2.dist,
           Model = m$Model,
           N = m$N,
           phi = m$phi,
           T = m$T,
           Rep = m$Rep,
           .before = standardization
    )

  return(res)

}

1.4 The Reporting component

This component includes a function that clean the harvested dataset (and extract the relevant information) and functions to generate dataset profile plots, results figures, and parameter estimates coverage plots.

Given that the figures contain customized typefaces (from the CMU Serif and Merriweather font families), we need to make load the fonts:

Click to expand the code
library(showtext)

# Check the current search path for fonts
font_paths()
#> [1] "C:\\Windows\\Fonts"

# List available font files in the search path
f <- font_files()

font_add("CMU Classical Serif", "cmunci.ttf")
font_add("CMU Serif Upright Italic", "cmunui.ttf")
font_add("CMU Serif", "cmunrm.ttf")

font_add("Merriweather Regular", "Merriweather-Regular.ttf")
font_add("Merriweather Light", "Merriweather Light.ttf")

font_families()

## automatically use showtext for new devices
showtext_auto()


p.colors <-
  brewer.pal(name = "YlOrBr", n = 9)[c(5, 7, 9)]

palette_nar <- brewer.pal(name = "YlOrBr", n = 9)[c(5, 7, 9)]
palette_chiar <- brewer.pal(name = "GnBu", n = 9)[c(5, 7, 9)]
palette_binar <- brewer.pal(name = "YlGn", n = 9)[c(5, 7, 9)]
palette_podar <- brewer.pal(name = "BuPu", n = 9)[c(5, 7, 9)]

# Pair plots --------------------------------------------------------------

1.4.1 Cleaning the harvested dataset

Given that the output of fit_extract has too much information in it, the function harvest_cleanup extracts the relevant parameter estimates—the point estimates and the upper and lower 2.5 crediblity intervals of level-2 covariance (unstd X.WITH.PHI), level-2 correlation (stdyx X.WITH.PHI), and fixed-effect autoregressive parameter at level 2 (unstd Means.PHI)—cleans the datasets and makes it tidy, and calculates the outcomes of interest, namely, the (absolute) average estimates, mean absolute and squared error, root mean squared error, estimation bias and variance, number and percentage of of non-converged datasets, and the percentages of significantly positive or negative or non-significant parameter estimates.

Click to expand the code
harvest_cleanup <- function(harv,
                            return.abridged = FALSE) {

  d_abridged <- harv %>%
    na.omit() %>%
    mutate(std_parname = paste(standardization, param.name)) %>%
    filter(std_parname %in% c("stdyx X.WITH.PHI",
                              "unstd X.WITH.PHI",
                              "unstd Means.PHI")) %>%
    select(-standardization) %>%
    mutate(
      parameter = case_when(
        std_parname  == "stdyx X.WITH.PHI" ~ "Correlation",
        std_parname  == "unstd X.WITH.PHI" ~ "Covariance",
        std_parname  == "unstd Means.PHI" ~ "Fixed.Phi",
      ),
      .after = Rep
    ) %>%
    select(uSeed:sig) %>%
    mutate(
      true.value = case_when(
        parameter == "Correlation" ~ 0,
        parameter == "Covariance" ~ 0,
        parameter == "Fixed.Phi" ~ 0.4
      ),
      .after = sig
    ) %>%
    mutate(l2.X.Model = as.factor(paste(l2.dist, Model)),
           .before = N) %>%
    mutate(
      sign.X.sig = case_when(
        upper_2.5ci < true.value ~ "Negative",
        lower_2.5ci > true.value ~ "Positive",
        TRUE ~ "Zero"
      )
    ) %>%
    group_by(type,
             l2.X.Model,
             N,
             T,
             parameter) %>%
    mutate(
      mean.est = mean(est),
      mean.abs.est = mean(abs(est)),
      MAE = mean(abs(est - true.value)),
      MSE = mean((est - true.value) ^ 2),
      RMSE = sqrt(mean((est - true.value) ^ 2)),
      Bias = mean(est) - true.value,
      Variance = var(est),
      n.converged.datasets = n(),
      nonconverged.percent = round(100 * (1000 - n()) / 1000, 2)
    ) %>%
    arrange(est) %>%
    ungroup() %>%
    group_by(type,
             N,
             T,
             l2.dist,
             Model) %>%
    mutate(ord = order(est) - n()/2) %>%
    group_by(sign.X.sig,
             .add = TRUE) %>%
    mutate(percent = round(100 * n() / n.converged.datasets, 1),
           .after = sign.X.sig) %>%
    relocate(sign.X.sig:nonconverged.percent,
             .after = parameter) %>%
    relocate(est,
             .after = percent)

  levels(d_abridged$sign.X.sig) <- c(
    "Significant negative estimates",
    "Non-significant estimates",
    "Significant positive estimates"
  )

  if(return.abridged == TRUE) return(d_abridged)

  d_summary <- d_abridged %>%
    select(type:nonconverged.percent) %>%
    select(-Rep, -est) %>%
    distinct() %>%
    mutate(`Type-1 error` = percent,
           .after = Variance
    )


  d_important <- d_summary[with(d_summary,
                                order(type,
                                      l2.X.Model,
                                      parameter,
                                      N,
                                      T)), ] %>%
    mutate(`Model name` = factor(paste0(Model, "(1)"),
                                 levels = c("NAR(1)",
                                            "Chi2AR(1)",
                                            "BinAR(1)",
                                            "PoDAR(1)")
    ),
    `Means distribution` = paste0(l2.dist, "-distributed means"),
    Resid = case_when(type == "resid.fixed" ~ "`Fixed Residual Variance`",
                      type == "resid.random" ~ "`Random Residual Variance`"),
    .after = l2.X.Model)

  levels(d_important$`Model name`) <- c(`NAR(1)` = "AR(1)",
                                        `Chi2AR(1)` = TeX("$\\chi^2$AR(1)"),
                                        `BinAR(1)` = "BinAR(1)",
                                        `PoDAR(1)` = "PoDAR(1)")

  return(d_important)

}

1.4.2 Making distribution plots of simulated datasets

These functions visualize datasets (simulated or empirical) by generating individual histograms and pairplots of summary statistics of datasets, and a function to generate (and combine) these two plots for any given dataset simulated by do_sim_parallel in the pipeline (see Section 2.2).

The function plot_histograms makes person histograms of the individuals in a dataset (stored as dataframes) and arranges them based on individual means.

Click to expand the code
plot_histograms <- function(d,
                            binwidth = 0.5,
                            nrow.facet = 20,
                            p.colors = c("#FE9929",
                                         "#CC4C02",
                                         "#662506")) {
  fill_col <- p.colors[2]
  x.range <- d$x %>% range()

  d %>%
    group_by(subject) %>%
    mutate(mm = mean(x)) %>%
    arrange(desc(mm)) %>%
    ggplot(aes(x = x,
               y = ..ndensity..)) +
    geom_histogram(binwidth = binwidth,
                   fill = fill_col,
                   center = 0) +
    facet_wrap(~ reorder(subject, mm),
               nrow = nrow.facet) +
    geom_hline(yintercept = 0,
               size = 0.1) +
    ggtitle(NULL) +
    theme_tufte() +
    theme(
      strip.background = element_blank(),
      aspect.ratio = 1,
      strip.text = element_blank(),
      axis.title = element_blank(),
      axis.ticks = element_blank(),
      axis.text = element_blank()
    )

}

The function plot_pairplots calculates the mean, variance, and skewness of each individual in a dataset (stored as dataframes) and generates pairplots of their joint distribution.

Click to expand the code
plot_pairplots <- function(d,
                           p.colors = c("#FE9929",
                                        "#CC4C02",
                                        "#662506"),
                           rel_title = 2,
                           rel_lines = 0.5,
                           rel_dots = 3,
                           bins = 30) {
  color_hist <- p.colors[2]
  color_scatter <- p.colors[1]

  d <- d %>%
    group_by(subject) %>%
    summarise(
      Mean = mean(x,
                  na.rm = TRUE),
      Variance = var(x,
                     na.rm = TRUE),
      Skewness = moments::skewness(x,
                                   na.rm = TRUE)
    ) %>%
    na.omit()

  cors <- d %>%
    select(Mean, Variance, Skewness) %>%
    correlation::correlation() %>%
    as.data.frame() %>%
    mutate(
      value = round(r, 3),
      significance = cut(
        p,
        breaks = c(0, 0.001, 0.01, 0.05, 1),
        include.lowest = T,
        labels = c('***', '**', '*', '')
      ),
      string = paste0(value, significance)
    ) %>%
    select(-r:-n_Obs)

  margin.thing <- 5

  ##  Correlations

  p_cor_Mean.Variance <-
    ggplot() +
    geom_text(aes(
      x = mean(range(d$Variance)),
      y = mean(range(d$Mean)),
      label = paste(
        "Corr:",
        cors %>%
          filter(Parameter1 == "Mean",
                 Parameter2 == "Variance") %>%
          pull(string)
      )
    )) +
    theme_tufte() +
    theme(
      text = element_text(size = rel(rel_title)),
      axis.title = element_blank(),
      axis.ticks = element_blank(),
      axis.text = element_blank()
    )

  p_cor_Mean.Skewness <-
    ggplot() +
    geom_text(aes(
      x = mean(range(d$Skewness)),
      y = mean(range(d$Mean)),
      label = paste(
        "Corr:",
        cors %>%
          filter(Parameter1 == "Mean",
                 Parameter2 == "Skewness") %>%
          pull(string)
      )
    )) +
    theme_tufte() +
    theme(
      text = element_text(size = rel(rel_title)),
      axis.title = element_blank(),
      axis.ticks = element_blank(),
      axis.text = element_blank()
    )

  p_cor_Variance.Skewness <-
    ggplot() +
    geom_text(aes(
      x = mean(range(d$Skewness)),
      y = mean(range(d$Variance)),
      label = paste(
        "Corr:",
        cors %>%
          filter(Parameter1 == "Variance",
                 Parameter2 == "Skewness") %>%
          pull(string)
      )
    )) +
    theme_tufte() +
    theme(
      text = element_text(size = rel(rel_title)),
      axis.title = element_blank(),
      axis.ticks = element_blank(),
      axis.text = element_blank()
    )

  ## Histograms

  p_mean <-
    d %>%
    ggplot(aes(x = Mean)) +
    geom_histogram(fill = color_hist,
                   bins = bins) +
    ggtitle("Mean") +
    theme_tufte() +
    theme(
      plot.title = element_text(
        size = rel(rel_title),
        hjust = 0.5,
        margin = margin(t = margin.thing,
                        b = -margin.thing)
      ),
      axis.title = element_blank(),
      axis.ticks.x = element_blank(),
      axis.text.x = element_blank()
    )

  p_variance <-
    d %>%
    ggplot(aes(x = Variance)) +
    geom_histogram(fill = color_hist,
                   bins = bins) +
    ggtitle("Variance") +
    theme_tufte() +
    theme(
      plot.title = element_text(
        size = rel(rel_title),
        hjust = 0.5,
        margin = margin(t = margin.thing,
                        b = -margin.thing)
      ),
      axis.title = element_blank(),
      axis.ticks = element_blank(),
      axis.text = element_blank()
    )

  p_skewness <-
    d %>%
    ggplot(aes(x = Skewness)) +
    geom_histogram(fill = color_hist,
                   bins = bins) +
    ggtitle("Skewness") +
    geom_vline(
      xintercept = 0,
      alpha = 0.5,
      color = "gray45",
      linetype = "solid",
      size = rel(rel_lines)
    ) +
    geom_vline(
      xintercept = c(-0.5, 0.5),
      color = "gray45",
      linetype = "dotted",
      size = rel(rel_lines)
    ) +
    geom_vline(
      xintercept = c(-1, 1),
      color = "gray45",
      linetype = "dashed",
      size = rel(rel_lines)
    ) +
    theme_tufte() +
    theme(
      plot.title = element_text(
        size = rel(rel_title),
        hjust = 0.5,
        margin = margin(t = margin.thing,
                        b = -margin.thing)
      ),
      axis.title = element_blank(),
      axis.ticks.y = element_blank(),
      axis.text.y = element_blank()
    )

  ## scatter plots

  p_scatter_Mean.Variance <- d %>%
    ggplot(aes(x = Mean,
               y = Variance)) +
    geom_point(
      color = color_scatter,
      shape = "bullet",
      alpha = 0.85,
      size = rel(rel_dots)
    ) +
    theme_tufte() +
    theme(
      axis.title = element_blank(),
      axis.ticks.x = element_blank(),
      axis.text.x = element_blank()
    )



  p_scatter_Mean.Skewness <- d %>%
    ggplot(aes(x = Mean,
               y = Skewness)) +
    geom_hline(
      yintercept = 0,
      alpha = 0.5,
      color = "gray45",
      linetype = "solid",
      size = rel(rel_lines)
    ) +
    geom_hline(
      yintercept = c(-0.5, 0.5),
      color = "gray45",
      linetype = "dotted",
      size = rel(rel_lines)
    ) +
    geom_hline(
      yintercept = c(-1, 1),
      color = "gray45",
      linetype = "dashed",
      size = rel(rel_lines)
    ) +
    geom_point(
      color = color_scatter,
      shape = "bullet",
      alpha = 0.7,
      size = rel(rel_dots)
    ) +
    theme_tufte() +
    theme(axis.title = element_blank())



  p_scatter_Variance.Skewness <- d %>%
    ggplot(aes(x = Variance,
               y = Skewness)) +
    geom_hline(
      yintercept = 0,
      alpha = 0.5,
      color = "gray45",
      linetype = "solid",
      size = rel(rel_lines)
    ) +
    geom_hline(
      yintercept = c(-0.5, 0.5),
      color = "gray45",
      linetype = "dotted",
      size = rel(rel_lines)
    ) +
    geom_hline(
      yintercept = c(-1, 1),
      color = "gray45",
      linetype = "dashed",
      size = rel(rel_lines)
    ) +
    geom_point(
      color = color_scatter,
      shape = "bullet",
      alpha = 0.7,
      size = rel(rel_dots)
    ) +
    theme_tufte() +
    theme(
      axis.title = element_blank(),
      axis.ticks.y = element_blank(),
      axis.text.y = element_blank()
    )

  ## Putting plots together

  p_pairplots <-
    (p_mean + p_cor_Mean.Variance + p_cor_Mean.Skewness) /
    (p_scatter_Mean.Variance + p_variance + p_cor_Variance.Skewness) /
    (p_scatter_Mean.Skewness + p_scatter_Variance.Skewness + p_skewness) &
    # plot_layout(guides = "collect") &
    plot_annotation(theme = theme(
      plot.title = element_text(family = "Merriweather Regular"),
      axis.text = element_text(size = rel(rel_title) / 2)
    ))

  ## Returning the 3x3 pairplots
  p_pairplots

}

The function plot_dataset.profile gets a simulated object (generated by do_sim_parallel) and combines individual histograms and pairplots in a single figure.

Click to expand the code
plot_dataset.profile <- function(sim.object,
                                 p.colors = NULL,
                                 title_l2.dist = FALSE) {
  d_sim <- sim.object$sim.Dataset

  l2.distr <- ifelse(sim.object$l2.dist == "Chi2",
                     "$\\chi^2$",
                     "Gaussian")
  model <- sim.object$Model
  title <- model

  binwidth <- 0.5
  if (model == "NAR") {
    title <- "AR"
    palette_color <- palette_nar
  }
  if (model == "Chi2AR") {
    title <- "$\\chi^2$AR"
    palette_color <- palette_chiar
  }
  if (model == "BinAR")
    palette_color <- palette_binar
  if (model == "PoDAR")
    palette_color <- palette_podar

  if (is.null(p.colors))
    p.colors <- palette_color

  title <- paste0(title,
                  "(1)",
                  " $\\phantom{\\chi^2}$")

  if (title_l2.dist)
    title <- paste0(l2.distr,
                    "-distributed means",
                    " $\\phantom{\\chi^2}$")


  p_out <-  plot_histograms(
    d = d_sim,
    binwidth = binwidth,
    nrow.facet = 10,
    p.colors = p.colors
  ) +
    plot_spacer() +
    plot_pairplots(d = d_sim,
                   p.colors = p.colors) +
    plot_layout(widths = c(1, 0.05, 1)) +
    plot_annotation(
      title = TeX(title),
      theme = theme(
        plot.title = element_text(size = rel(2.5),
                                  family = "CMU Serif"),
        plot.subtitle = element_blank()
      )
    )

  p_out %>% wrap_elements()

}

We also write the function save_dataset_profile to put together two dataset profile plots to get to plots similar to those in Figures S1, S2, S4, and S5 in the Supplemental Materials, and save the final figure as a PDF file.

Click to expand the code
save_dataset_profile <- function(upper,
                                 lower,
                                 file.name = "Dataset profile",
                                 path = "figures",
                                 title = NULL) {
  p_profiles <-
    (
      plot_dataset.profile(upper, title_l2.dist = TRUE) /
        plot_spacer() /
        plot_dataset.profile(lower, title_l2.dist = TRUE)
    ) +
    plot_layout(heights = c(1, 0.02, 1)) +
    plot_annotation(theme = theme(
      plot.title = element_text(
        size = rel(4),
        hjust = 0.5,
        family = "CMU Serif"
      ),
      plot.subtitle = element_blank()
    ))

  ggsave(
    paste0(file.name,
           ".pdf"),
    p_profiles,
    path = path,
    width = 2 * 15,
    height = 2.02 * 1.1 * 15,
    units = "cm"
  )

}

1.4.3 Making plots of estimation results

The functions here make the main figures of the paper, which include aspects of model fit (e.g., estimation bias and RMSE, and Type-I error rates, etc.).

The function plot_Model.x.Resid makes a figure with two columns (for the models with fixed or random residual variance), that show the outcomes of a fit measure (e.g., bias) for all the four DGMs with a given level-2 distribution of means for different \(N\)s and \(T\)s.

Click to expand the code
plot_Model.x.Resid <- function(d_important,
                               which.parameter,
                               which.measure = "Positive error",
                               Means.dist = c("Gaussian", "Chi2"),
                               line.width = 1,
                               font.scale = 5) {

  ddd <- d_important %>%
    filter(parameter == which.parameter)


  if (!grepl("error",
             tolower(which.measure),
             fixed = TRUE)) {
    ddd$value <- ddd %>% pull(which.measure)
    ddd <- ddd %>% filter(sign.X.sig == "Zero")
    y.axis <- which.measure
  }


  if (grepl("positive",
            tolower(which.measure),
            fixed = TRUE)) {
    which.measure <- "Type-1 error"
    ddd$value <- ddd %>% pull(which.measure)
    ddd <- ddd %>% filter(sign.X.sig == "Positive")
    y.axis <- "Positive error"
  }

  if (grepl("negative",
            tolower(which.measure),
            fixed = TRUE)) {
    which.measure <- "Type-1 error"
    ddd$value <- ddd %>% pull(which.measure)
    ddd <- ddd %>% filter(sign.X.sig == "Negative")
    y.axis <- "Negative error"
  }

  if (grepl("total",
            tolower(which.measure),
            fixed = TRUE)) {
    which.measure <- "Type-1 error"
    ddd$value <- 100 - (ddd %>% pull(which.measure))
    ddd <- ddd %>% filter(sign.X.sig == "Zero")
    y.axis <- "Total error"
  }


  y.range <- ddd$value %>% range()
  y.range[1] <- min(-0.001, y.range[1])
  y.range[2] <- max(0.001, y.range[2])

  ## Making sure the zero values in errors are not missing
  ddd <- ddd  %>%
    filter(l2.dist == Means.dist) %>%
    ungroup() %>%
    complete(Resid, `Model name`, N, T,
             fill = list(value = 0))

  ddd$N <- as.factor(ddd$N)
  ddd$T <- as.factor(ddd$T)


  title <- ifelse(
    Means.dist == "Gaussian",
    TeX("Gaussian-distributed means $\\phantom{\\chi^2}$"),
    TeX("$\\chi^2$-distributed means")
  )

  output.plot <- ddd %>%
    ggplot() +
    aes(
      x = T,
      y = value,
      group = N,
      color = N
    ) +
    ## draw solid y=0 axis line
    geom_hline(
      yintercept = 0,
      linetype = "solid",
      color = "black",
      alpha = 1
    ) +
    geom_line(size = rel(1),
              alpha = 0.8,
              lineend = "round") +
    geom_point(size = rel(1.5), alpha = 1) +
    scale_color_manual(values = brewer.pal(name = "PuRd", n = 9)[c(4, 6, 9)]) +
    facet_grid(
      rows = vars(`Model name`),
      cols = vars(Resid),
      labeller = label_parsed
    ) +
    theme_light() +
    theme_pubclean() +
    scale_y_continuous(limits = y.range) +
    ylab(y.axis) +
    ggtitle(title) +
    theme(
      plot.title = element_text(
        family = "CMU Serif"),
      panel.spacing = unit(0.7, "lines"),
      legend.position = "bottom",
      legend.key = element_rect(colour = NA, fill = NA),
      text = element_text(size = 10,
                          family = "CMU Serif")
    )

  if (which.measure == "Type-1 error") {
    output.plot <- ddd %>%
      ggplot() +
      aes(
        x = T,
        y = value,
        group = N,
        color = N
      ) +
      ## draw solid y=0 axis line
      geom_hline(
        yintercept = 0,
        linetype = "solid",
        color = "black",
        alpha = 1
      ) +
      ## manually add needed grids
      geom_hline(
        yintercept = c(10, 20, 40, 60),
        linetype = "dotted",
        color = "#BEBEBE",
        alpha = 1
      ) +
      geom_line(size = rel(1),
                alpha = 0.8,
                lineend = "round") +
      geom_point(size = rel(1.5), alpha = 1) +
      scale_color_manual(values = brewer.pal(name = "PuRd", n = 9)[c(4, 6, 9)]) +
      facet_grid(
        rows = vars(`Model name`),
        cols = vars(Resid),
        labeller = label_parsed
      ) +
      theme_light() +
      theme_pubclean() +
      scale_y_continuous(breaks = c(2.5, 10, 20, 40, 60, 80, 90, 100),
                         ## looked up myself to assure  + and - ranges are equal
                         limits = c(0, 61)) +
      ylab(y.axis) +
      ggtitle(title) +
      theme(
        plot.title = element_text(
          family = "CMU Serif"),
        panel.spacing = unit(0.7, "lines"),
        legend.position = "bottom",
        legend.key = element_rect(colour = NA, fill = NA),
        legend.key.width = unit(10, "mm"),
        text = element_text(size = 10,
                            family = "CMU Serif"),
        ## remove grid lines
        panel.grid.major.y = element_blank()
      )

    ## add thresholds
    if (y.axis == "Total error")
      output.plot <- output.plot +
        geom_hline(yintercept = 5,
                   linetype = "dashed",
                   alpha = 0.7)

    if (y.axis != "Total error")
      output.plot <- output.plot +
        geom_hline(yintercept = 2.5,
                   linetype = "dashed",
                   alpha = 0.7)

  }

  output.plot <- output.plot +
    geom_hline(yintercept = 0,
               alpha = 0.6)

  return(output.plot)

}

The function plot_quadrants makes a larger figure that includes four quadrants, each generated with plot_Model.x.Resid., that show the main results of the paper.

Click to expand the code
plot_quadrants <- function(d_important,
                           which.parameter = "Correlation",
                           upper.measure = "Bias",
                           lower.measure = "RMSE",
                           line.width = 2,
                           font.scale = 8) {
  p.upper.left <- plot_Model.x.Resid(d_important,
                                     which.parameter,
                                     upper.measure,
                                     "Gaussian",
                                     line.width,
                                     font.scale) + xlab(NULL)

  p.upper.right <- plot_Model.x.Resid(d_important,
                                      which.parameter,
                                      upper.measure,
                                      "Chi2",
                                      line.width,
                                      font.scale) + xlab(NULL) + ylab(NULL)


  p.lower.left <- plot_Model.x.Resid(d_important,
                                     which.parameter,
                                     lower.measure,
                                     "Gaussian",
                                     line.width,
                                     font.scale)  + ggtitle(NULL)

  p.lower.right <- plot_Model.x.Resid(d_important,
                                      which.parameter,
                                      lower.measure,
                                      "Chi2",
                                      line.width,
                                      font.scale)  + ggtitle(NULL) + ylab(NULL)


  p.patchwork <-
    (p.upper.left | p.upper.right) / (p.lower.left | p.lower.right)

  title <- paste(upper.measure,
                 "and",
                 lower.measure,
                 "in the estimated",
                 tolower(which.parameter))

  if (which.parameter == "Fixed.Phi")
    title <- "Level-2 $\\phi$"

  if (grepl("error", tolower(upper.measure), fixed = TRUE)) {
    title <- paste("Type-I error rates in the estimated",
                   tolower(which.parameter))
  }


  p.final <- p.patchwork +
    plot_layout(guides = "collect") +
    plot_annotation(
      theme = theme(plot.title =
                      element_text(
                        size = 20,
                        family = "CMU Serif",
                        hjust = 0.5
                      ))) &
    theme(legend.position = "bottom")

  return(p.final)

}

1.4.4 Making plots of parameter estimates coverage

The function plot_caterpillar , for a given condition and a given parameter, makes plots of 95% confidence intervals (as vertical bars) of all converged replications within that condition. The lines are ordered based on the point estimates and are colored based on whether they crossed zero (the correct estimate) or whether the whole 95% interval was bellow or above zero (significantly negative or positive estimates).

Click to expand the code
plot_caterpillar <- function(d_abridged,
                             l2.dist_ = "Chi2",
                             Model_ = "PoDAR",
                             analysis.type = "resid.fixed",
                             parameter_ = "covariance",
                             legend.key.width = 5,
                             legend.line.width = 10) {

  title <- ifelse(
    analysis.type == "resid.fixed",
    "fixed residual variance",
    "random residual variance"
  ) %>%
    paste("Modeled with", .)

  dd <- d_abridged %>%
    filter(l2.dist == l2.dist_,
           Model == Model_) %>%
    filter(tolower(parameter) == tolower(parameter_)) %>%
    filter(type == analysis.type) %>%
  select(type,
         N,
         T,
         l2.dist,
         Model,
         est,
         sig,
         lower_2.5ci,
         upper_2.5ci
  ) %>%
    group_by(N, T, type, Model, l2.dist) %>%
    mutate(sign.X.sig = as.factor(sign(est)*sig)) %>%
    # mutate(sign.X.sig = factor(sign.X.sig, levels = c("Negative Type-I error",
    #                                                   "Non-significant estimates",
    #                                                   "Positive Type-I error"))) %>%
    mutate(mean.est = mean(est),
           n.datasets = n(),
           nonconverged.percent = round(100*(1000-n())/1000,2)) %>%
    group_by(sign.X.sig,
             .add = TRUE) %>%
    mutate(error.percents = round(100*n()/n.datasets,2)) %>%
    arrange(N, T,
            .by_group = TRUE) %>%
    mutate(ordering = est # + 100*sign(est)*sig
    ) %>%
    arrange(ordering) %>%
    ungroup() %>%
    group_by(type,
             N,
             T,
             l2.dist,
             Model) %>%
    mutate(ord = order(ordering) - n()/2) %>%
    mutate(NN = as.factor(paste("N =", N)),
           TT = as.factor(paste("T =", T)))


  levels(dd$sign.X.sig) <- c("Significant negative estimates",
                            "Non-significant estimates",
                            "Significant positive estimates")

  y.range <- c(min(dd$lower_2.5ci), max(dd$upper_2.5ci))

  dd %>%
    mutate(NN = as.factor(paste("N =", N)),
           TT = as.factor(paste("T =", T))) %>%
    ggplot() +
    aes(x = ord,
        color = sign.X.sig) +
    geom_segment(
      aes(
        x = ord,
        xend = ord,
        y = upper_2.5ci,
        yend = lower_2.5ci
      ),
      alpha = 0.8 + 0.2,
      size = rel(0.06)
    ) +
    geom_segment(
      aes(
        x = ord,
        xend = ord,
        y = est + min(0.01, 0.05 * abs(upper_2.5ci - lower_2.5ci)),
        yend = est - min(0.01, 0.05 * abs(upper_2.5ci - lower_2.5ci))
      ),
      alpha = 0.8 + 0.2,
      color = "white",
      size = rel(0.06)
    ) +
    geom_hline(
      yintercept = 0,
      linetype = "solid",
      size = rel(0.15),
      color = "black",
      alpha = 0.8
    ) +
    theme_pubclean() +
    guides(colour = guide_legend(override.aes = list(
      linewidth = rel(1.5),
      alpha = 1
    ))) +
    scale_color_manual(values =
                         c("#F67E4BFF",
                           "#98CAE1FF",
                           "#364B9AFF")) +
    scale_y_continuous(limits = y.range) +
    facet_grid(rows = vars(NN),
               # rows = vars(rev(TT)),
               cols = vars(TT)) +
    ggtitle(title) +
    theme(
      legend.background = element_rect(colour = NA, fill = NA),
      legend.key = element_rect(colour = NA, fill = NA),
      legend.title = element_blank(),
      axis.title.x = element_blank(),
      axis.ticks.x = element_blank(),
      # "Score",
      axis.text.x = element_blank(),
      legend.key.width = unit(legend.key.width * 2, "mm"),
      axis.text.y = element_text(size = 10),
      text = element_text(family = "CMU Serif",
                          size = 12)
    ) +
    xlab(NULL) +
    ylab(paste("Estimated", parameter))

}

1.4.5 Making plots for DGM time series

A function was also made to make profiles of individual time series, that include the time series itself, its marginal distribution, and its sample ACF.

Click to expand the code
plot_dgm.profile <- function(d,
                             p.colors = colors.nar.khaki,
                             Model.name = "NAR(1)",
                             burnin_proportion = 0.9){

  binwidth_ <- 0.5

  ## To make sure the plot title has the same height as Chi2AR model
  Model.name <- paste("$\\phantom{\\chi^2}$",
                      Model.name,
                      "$\\phantom{\\chi^2}$")

  d$obj.id <- d$obj.id %>% factor(unique(d$obj.id))
  d <- d %>%
    filter(t > max(d$t)*burnin_proportion) %>%
    mutate(t = t - burnin_proportion*max(t))

  acf.lag.max <- 10

  d_acf <- d %>%
    group_by(obj.id) %>%
    summarize(lag = stats::acf(value,
                               lag.max = acf.lag.max,
                               plot = FALSE)$lag %>% as.numeric(),
              acf = stats::acf(value,
                               lag.max = acf.lag.max,
                               plot = FALSE)$acf %>% as.numeric()
    )
  empirical.rho <- d_acf %>%
    filter(lag == 1)
  empirical.rho <- empirical.rho[rep(1:nrow(empirical.rho),
                                     times = acf.lag.max*20 + 1), ]

  e.l <- rep(seq(0, acf.lag.max, 0.05),
             each = 3)
  empirical.rho$lag <- e.l
  empirical.rho$acf <- empirical.rho$acf^empirical.rho$lag


  # time series plot
  p_ts <- d %>%
    ggplot(aes(x = t,
               y = value,
               group = obj.id)) +
    geom_line(aes(color = obj.id),
              alpha = 1,
              size = 1) +
    geom_hline(aes(yintercept = Mean),
               size = 0.75,
               linetype = "dashed",
               color = "honeydew3") +
    ggtitle("Time series") +
    xlab("t") +
    theme(
      axis.title.y = element_blank(),
      legend.position = "none", # c(0.1,0.95),
      legend.background = element_rect(colour = NA, fill = NA),
      axis.title.x = element_text(size = 17,
                                  family = "Merriweather Regular"),
      axis.text = element_text(size = 20,
                               family = "Merriweather Regular"),
      legend.text.align = 0,
      legend.title = element_blank(),
      legend.key = element_rect(colour = NA,
                                fill = NA),
      panel.background = element_blank()) +
    scale_colour_manual(values = p.colors,
                        labels = unname(TeX(unique(as.character(d$obj.id))))
    )

  p_hist <- d %>%
    ggplot(aes(x = value,
               group = obj.id,
               fill = obj.id)) +
    geom_histogram(aes(y=..ndensity..),
                   center = 0,
                   binwidth = binwidth_) +
    facet_wrap(~obj.id,
               nrow = 3) +
    geom_hline(yintercept = 0,
               size = 0.5) +
    geom_segment(aes(x = Mean,
                     xend = Mean,
                     y = 0,
                     yend = 1),
                 size = 0.75,
                 linetype = "dashed",
                 color = "honeydew3") +
  theme_tufte() +
    ggtitle("Marginal distribution") +
    xlab("X") +
    scale_y_continuous(limits = c(min(d_acf$acf), 1)) +
    theme(strip.background = element_blank(),
          strip.text = element_blank(),
          axis.title.x = element_text(size = 17,
                                      family = "Merriweather Regular"),
          axis.ticks.y = element_blank(),
          axis.text.y = element_blank(),
          axis.title.y = element_blank(),
          legend.position = "none",
          legend.text.align = 0,
          legend.title = element_blank(),
          legend.key = element_rect(colour = NA,
                                    fill = NA),
          panel.background = element_blank()) +
    scale_fill_manual(values = p.colors,
                      unname(TeX(unique(as.character(d$obj.id))))
    )


  ### ACF plots

  d_acf <- d %>%
    group_by(obj.id) %>%
    summarize(lag = stats::acf(value,
                               lag.max = acf.lag.max,
                               plot = FALSE)$lag %>% as.numeric(),
              acf = stats::acf(value,
                               lag.max = acf.lag.max,
                               plot = FALSE)$acf %>% as.numeric(),
              empirical.rho = stats::acf(value,
                                         lag.max = acf.lag.max,
                                         plot = FALSE)$acf[2] %>% as.numeric())




  p_acf <- d_acf %>%
    ggplot(aes(x = lag,
               y = acf,
               color = obj.id)) +
    geom_segment(aes(x = lag,
                     xend = lag,
                     y = 0,
                     yend = acf
    ),
    size = 1.5,
    lineend = "butt") +
    # exponential fit
    geom_line(data = empirical.rho,
              aes(x = lag,
                  y = acf),
              linetype = 2,
              color = "honeydew3",
              size = 0.75) +
    facet_wrap(~obj.id,
               nrow = 3) +
    geom_hline(aes(yintercept = 0),
               size = 0.5) +
    ggtitle("Sample ACF") +
    theme_tufte() +
    theme(strip.background = element_blank(),
          strip.text = element_blank(),
          axis.title.y = element_blank(),
          axis.title.x = element_text(size = 17,
                                      family = "Merriweather Regular"),
          legend.position = "none",
          legend.text.align = 0,
          legend.title = element_blank(),
          legend.key = element_rect(colour = NA,
                                    fill = NA)
    ) +
    scale_x_continuous(breaks = seq(0, acf.lag.max, 2)) +
    xlab("Lag") +
    ylab("Autocorrelation function") +
    theme(panel.background = element_blank()) +
    scale_color_manual(values = p.colors,
                       unname(TeX(unique(as.character(d$obj.id))))
    )

  p_profile.main <- p_ts + plot_spacer() + p_hist + plot_spacer() + p_acf +
    plot_layout(widths = c(4, 0.05, 1.5, 0.05, 2)) &
    theme(
      # legend.key.width = unit(2, "line"),
      # legend.text=element_text(size=10),
      # legend.position = "none",
      axis.text = element_text(size = 17,
                               family = "Merriweather Regular"),
      plot.title = element_text(size = 15+5,
                                family = "Merriweather Regular")
    )

  p_out <- p_profile.main

  return(p_out)

}

2 Pipeline

We implemented each of these tasks in separate functions that were essentially wrapper functions (with parallel-computing implementation) around the modular components. Using these wrapper functions, each replication of each simulated condition was saved in a separate .rds file. These data files were fed to the analysis wrapper function whose output was saved in separate .rds data files. To collect relevant parameter estimates, another wrapper function was used to read the data files and save the desired parameters in a dataframe, which then used in reporting.

2.1 Book-keeping

The outcomes of the components are saved in separate .rds files and indexed by unique, descriptive names, and each replication is given a unique numeric identifier that is also used as the random seed used to generate the dataset within each replication. The file names and address are stored in two dataframes along with model parameters used to generate each dataset, and these dataframes are used when reading and writing data files in other components.

The function make_sim_refs makes a table consisting of all possible combinations of each conditions (that are provided as a list conditions), and replicates it Reps times (number of replications, \(R = 1000\)). Then, to make sure that each simulated dataset is produced with a unique random seed, the function generates a unique uSeed based on the number (and values) of conditions and the replication number. Furthermore, an initial simulation seed (simSeed) is included in generation of uSeed which makes it possible to have different batches of simulations (for instance, if one wants to run the simulation for another 1000 times).

For each condition, a unique seed is generated by weighting different conditions (that are indexed in a vector d.integer) by prime numbers (that are not among the prime factors of the number of conditions) and summing them up (which is done with dot-producting d.integer %*% primes.seq). Then, to make unique seeds per replications, the replication number is concatenated to the left side of d.integer %*% primes.seq, and everything is put into a dataframe (called d.headers, that is eventually returned by the function) with unique file names for individual .rds files (that include uSeed as well as the replication number and the value of condition used in data generation, e.g., sim_uSeed-13293_l2.dist-Chi2_Model-BinAR_N-100_phi-0.4_T-100_sim.Seed-0_Rep-1) and the address of the folder in which the .rds files were stored. Finally, if desired, the simulation reference table d.headers is saved as .csv and .rds.

Click to expand the code
make_sim_refs <-
  function(conditions = list(T = c(30, 100),
                             N = c(100),
                             Model = c("BinAR", "Chi2AR", "DAR"),
                             l2.dist = c("Gaussian", "Chi2"),
                             phi = c(0.4)
  ),
  Reps = 100,
  simSeed = 0,
  save.directory = "self-sim",
  save.refs.filename = paste("sim_refs",
                             Sys.Date()),
  prefix.sim.datasets = "sim_"){

    dir.create(save.directory, showWarnings = FALSE)

    # allow Reps to be used as a vector of indexes
    if(length(Reps)<2) Reps = seq_len(Reps)

    # save global seed of the global env and set it back before leaving
    seed.old <- .Random.seed
    on.exit({
      .Random.seed <<- seed.old
    })

    # sorting conditions alphabetically
    conditions <- conditions[order(names(conditions))]

    conditions$simSeed <- simSeed
    conditions$Rep <- Reps

    n.conditions <- length(conditions)

    # making the first columns of the data frame
    d <- conditions %>%
      expand.grid(stringsAsFactors = TRUE)

    # transforming factors to numerics
    d.numeric <- d
    factor.columns <- sapply(d.numeric, is.factor)
    d.numeric[factor.columns] <-
      sapply(d.numeric[factor.columns], as.numeric)

    # getting rid of non-integers and Rep
    d.integer <- d.numeric[, -n.conditions] %>%
      apply(2,
            function(x)
              x %>%
              as.character() %>%
              gsub("\\.", "", .) %>%
              as.numeric())

    # to make unique seeds, we must sum conditions weighted by prime numbers
    # but the primes must not be among prime factors of conditions

    # conditions prime factors
    cpfs <- d.integer %>%
      unlist() %>%
      as.numeric() %>%
      unique() %>%
      primes::prime_factors() %>%
      unlist() %>%
      unique()

    primes.seq <- c(cpfs,
                    primes::generate_n_primes(ncol(d.integer) + length(cpfs)))

    primes.seq <-
      primes.seq[!(duplicated(primes.seq) |
                     duplicated(primes.seq, fromLast = TRUE))]

    uSeed <- d.integer %*% primes.seq %>%
      as.character() %>%
      paste0(as.character(d.numeric$Rep),
             .)

    d.headers <- d %>%
      mutate(
        uSeed = uSeed,
        sim.Path = save.directory,
        sim.File = NA
      ) %>%
      relocate(uSeed, .before = 1)

    for (r in 1:nrow(d.headers)) {
      only.headers <- d.headers %>%
        select(-sim.Path:-sim.File) %>%
        colnames()
      r.values <- d.headers[r, only.headers]
      factor.columns <- sapply(r.values, is.factor)
      r.values[factor.columns] <-
        sapply(r.values[factor.columns], as.character)
      d.headers[r, "sim.File"] <- only.headers %>%
        paste0("-") %>%
        paste0(r.values) %>%
        paste(collapse = "_") %>%
        paste0(".rds") %>%
        paste0(prefix.sim.datasets, .) # prefix for simulated datasets
    }

    d <- d.headers

    # getting rid of factors and making them into strings
    factor.columns <- sapply(d.headers, is.factor)
    d.headers[factor.columns] <-
      sapply(d[factor.columns], as.character)

    # making uSeed numeric
    d.headers$uSeed <- d.headers$uSeed %>% as.numeric()

    # Save the references dataframe to a file, if desired
    if(!is.null(save.refs.filename)){
      saveRDS(d.headers,
              file = here::here(save.directory, paste0(save.refs.filename,
                                                       ".rds"))
      )
      write.csv(d.headers,
                file = here::here(save.directory, paste0(save.refs.filename,
                                                         ".csv")),
                row.names = FALSE)
    }

    # return the reference dataframe
    return(d.headers)
  }

Note that initially, we started by having two values for the length of the time series (\(T = 30\) corresponding to one month of measurements, and \(T = 100\) for a reasonably long time series) and one sample size (\(N = 100\), and subsample the datasets for \(N = 25\) and \(N = 50\)), and we decided to include five DGMs in the simulations: the AR(1), \(\chi^2\)AR(1), BinAR(1), PoDAR(1) models, and a DAR(1) model with binomial marginal distribution). Eventually (and midway through the simulations), we decided to omit the \(T = 30\) condition and the DAR(1) model from our simulation. However, as omitting these would have led to change in the calculations of uSeed, we decided to keep it the simulation reference table intact and omit Model == "DAR" and T == 30 later on (see Section 2.3 for details).

Further note that we only generated datasets with \(N = 100\) and \(T = 100\), and sub-sampled other sample sizes and time series lengths (\(N = 25, 50\) and \(T = 25, 50\)) from them. Although this had computational benefits (less simulation time, and less storage required), the main motivation behind this decision was to be able to mimic empirical data collection: Sub-sampling the large datasets would be equated with collecting less data than the ideal case in which we have many participants with many measurements (so sub-sampling what could have been collected). Furthermore, comparing the datasets with the same DGM and level-2 distribution would be more meaningful: All datasets with same Rep value within the same DGM (say, PoDAR(1) model) and with the same level-2 distribution (say, \(\chi^2\) distribution) have the same uSeed, thus they can be matched. Consequently, the differences we observe for different \(N\)s and \(T\)s within each cell of the figures reported in the paper can mimic the effect of “non-ideal” sampling.

Changing the conditions (omitting Model == "DAR" and T == 30 from the simulation reference dataset, and adding additional conditions for N and T) is done when running the pipeline (see Section 2.3) and the sub-sampling is implemented when running the analysis in do_fit_doFuture (see Section 2.2).

Another dataframe was generated to include the references of analyses results per simulation condition and analysis condition (or hyperparameters), and file names and directory of individual .rds analysis outputs. The hyperparameters included the analysis type (with fixed vs. random residual variance) and the number of iterations and thinning in the MCMC algorithm, which were then used as arguments of run_MplusAutomation.

Click to expand the code
make_fit_refs <-
  function(sim_refs,
           hyperparameters = list(iter = c(2000),
                                  thin = c(5),
                                  type = c("resid.random",
                                           "resid.fixed")),
           save.directory = "self-sim",
           save.refs.filename = paste("fit_refs",
                                      Sys.Date()),
           prefix.fit.datasets = "fit_"){

    dir.create(save.directory, showWarnings = FALSE)

    h <- hyperparameters %>%
      expand.grid(stringsAsFactors = TRUE)

    dd <- sim_refs[rep(1:nrow(sim_refs), times = nrow(h)),]
    rownames(dd) <- NULL
    hh <- h[rep(1:nrow(h), each = nrow(sim_refs)),]
    rownames(hh) <- NULL

    d <- cbind(dd,hh)
    d <- d %>%
      mutate(fit.Path = save.directory,
             fit.File = NA
      )

    only.headers <- names(hyperparameters) %>%
      c("uSeed",
        "N",
        "T",
        .,
        "Rep")
    only.headers <- only.headers[!(only.headers %in% c("iter", "thin"))]

    for (r in 1:nrow(d)) {
      r.values <-d[r, only.headers]
      factor.columns <- sapply(r.values, is.factor)
      r.values[factor.columns] <-
        sapply(r.values[factor.columns], as.character)
      d[r, "fit.File"] <- only.headers %>%
        paste0("-") %>%
        paste0(r.values) %>%
        paste(collapse = "_") %>%
        paste0(".rds") %>%
        paste0(prefix.fit.datasets, .) # prefix for fitted datasets
      print(r)
    }

    if(!is.null(save.refs.filename)){
      saveRDS(d,
              file = here::here(save.directory,
                                paste0(save.refs.filename,
                                       ".rds"))
      )
      write.csv(d,
                file = here::here(save.directory,
                                  paste0(save.refs.filename,
                                         ".csv")),
                row.names = FALSE)
    }

    return(d)
  }

2.2 Pipeline functions

We implemented the components described in Section 1 using the following functions.

The function do_sim_parallel is a wrapper around make_datasets (see Section 1.1.3) and uses the clusterApplyLB function from the package snow (Tierney et al. 2021) for parallelization. This wrapper gets the dataframe of simulation file references (sim_refs, which is the output of make_sim_refs), and for each of its rows, simulates an \(N \times T\) dataframe and stores it with other information in the same row—and also additional info about the time the simulating the dataset started and ended—in a list, and saves each list as a separate .rds file in the target folder (specified with sim.Path). Having the additional data in the saved files allow using them independent from the table of references. Note that to prevent the overload of the hard drive with many parallel write requests (to save the simulated datasets), for the first nClust rounds, using the Sys.sleep function, the data generation is started after a varying amount of delay. Given that every dataset simulation in the first round may take different amount of times, after the first round is complete, the simulations on different cores fall out of sync and there is no more need for adding delays for the next rounds.

Addendum: Version 2

In v2, make_dataset returns a list. Note that do_sim_parallel also saves the vectors of means and \(\phi\)s in the saved simulation file as $given.Means and $given.Phis.

Click to expand the code
do_sim_parallel <-
  function(sim_refs,
           nClust = 48,
           save.directory = "self-sim",
           alternative.sim.Path = NULL,
           clusterLOG.filename = paste0("sim_clusterLOG_",
                                        Sys.Date(),
                                        ".txt"),
           sleeptime = 1) {
    cl <- snow::makeSOCKcluster(nClust,
                                outfile = here::here(save.directory,
                                                     clusterLOG.filename))
    debug <- TRUE

    d <- sim_refs

    ## Start clusters:

    snow::clusterExport(cl,
                        c("d",
                          "alternative.sim.Path",
                          "debug"),
                        envir = environment())

    ## Get the timings

    t.snow <- snow::snow.time({
      ## Run the cluster
      snow::clusterApplyLB(cl = cl,
                           seq_len(nrow(d)),
                           function(i) {
                             if (i <= nClust)
                               Sys.sleep(sleeptime * i)

                             d_i <- as.list(d[i,])
                             arguments <-
                               as.list(d_i[2:(length(d_i) - 4)])
                             arguments$seed <- d_i$uSeed

                             sim.Path <-
                               ifelse(is.null(alternative.sim.Path),
                                      d_i$sim.Path,
                                      alternative.sim.Path)

                             sim.StartTime <- Sys.time()

                             output_make_datasets <-
                               do.call(make_datasets,
                                       arguments)

                             if(is.data.frame(output_make_datasets)){
                               d_i$sim.Dataset <- output_make_datasets
                             }else{
                               # Adapting for v2 implementation
                               d_i$sim.Dataset <- output_make_datasets$dataset
                               d_i$given.Means <- output_make_datasets$Means
                               d_i$given.Phis <- output_make_datasets$Phis
                               }
                             d_i$sim.StartTime <- sim.StartTime
                             d_i$sim.EndTime <- Sys.time()
                             d_i$sim.ElapsedTime <-
                               d_i$sim.EndTime - d_i$sim.StartTime

                             saveRDS(d_i,
                                     file = here::here(sim.Path,
                                                       d_i$sim.File))

                           })

    })

    ## Stop the cluster:
    snow::stopCluster(cl)

    return(t.snow)

  }

The function do_fit_doFuture is a wrapper around run_MplusAutomation (see Section 1.2) and uses the package doFuture (Bengtsson 2020) as a parallelization backend to plyr::a_ply. The function gets a dataframe containing references of analyses results (fit_refs, which is the output of make_fit_refs), and row by row reads the simulated datasets, sub-samples the data if \(N < 100\) or \(T < 100\) (see the second note under make_sim_refs in Section 2.1) and runs the analysis based on the hyperparameters specified in fit_refs. It then saves the outcome of the analysis (the Mplus model object produced by MplusAutomation::mplusModeler and returned by run_MplusAutomation), which includes Mplus outputs and parameter estimates) with additional information (other values in the same row of fit_refs as well as timings) in separate .rds files in the target folder. Note that, again, to prevent the overload of the hard drive with many parallel read and write requests (to read the simulated datasets and save the Mplus outputs), for the first nClust rounds, using the Sys.sleep function, the data analysis is started after a varying amount of delay. Given that every analysis instance in the first round take different amount of times, after the first round is complete the analyses on cores fall out of sync and there is no more need for adding delays for the next rounds.

Addendum: Version 2

In v2, the given.Means and given.Phis of the \(N\) subjects included in the analysis, as well as the sample summary statistics of the sub-sampled dataset (sample.Means, sample.Variances, and sample.Skewnesses) are also saved in the fit object.

Click to expand the code
do_fit_doFuture <- function(fit_refs,
                            nClust = 48,
                            nPROC = 1,
                            save.directory = "self-sim",
                            alternative.fit.Path = NULL,
                            # model_what = "resid.random",
                            clusterLOG.filename = paste0("fit_clusterLOG_",
                                                         Sys.Date(),
                                                         ".txt"),
                            sleeptime = 1,
                            version = "v2") {

  ## To make sure the required functions are loaded on each cluster
  library(tidyverse)
  source(here::here("scripts",
                    "analysis-component.R"))




  debug <- TRUE

  d <- fit_refs

  registerDoFuture()

  plan("multisession")


  plyr::a_ply(d,
              1,
              function(d_i) {
                if (i <= nClust)
                  Sys.sleep(sleeptime * i)

                d_i <- as.list(d_i)

                fit.StartTime <-
                  Sys.time()

                sim.file <- readRDS(file = here::here(d_i$sim.Path,
                                            d_i$sim.File))

                df <- sim.file$sim.Dataset %>%
                  filter(subject <= d_i$N,
                         t <= d_i$T)

                # Adapting for v2 implementation
                df_sum <- df %>%
                  group_by(subject) %>%
                  summarise(sample.Means = mean(x),
                            sample.Variances = var(x),
                            sample.Skewnesses = moments::skewness(x))

                file.name <-
                  gsub(".rds", "", d_i$fit.File)

                print(paste(file.name,
                            "started at",
                            fit.StartTime))

                tryRes <-
                  try(output.fit <-
                        run_MplusAutomation(
                          df = df,
                          PROCESSORS = nPROC,
                          BITERATIONS.min = d_i$iter,
                          THIN = d_i$thin,
                          model_what = d_i$type,
                          file.name = file.name
                        ))

                d_i$fit.Dataset <-
                  tryRes

                # Adapting for v2 implementation
                d_i$given.Means <- sim.file$Means %>% head(d_i$N)
                d_i$given.Phis <- sim.file$Phis %>% head(d_i$N)

                # Adapting for v2 implementation
                d_i$sample.Means <- df_sum$sample.Means
                d_i$sample.Variances <- df_sum$sample.Variances
                d_i$sample.Skewnesses <- df_sum$sample.Skewnesses

                d_i$fit.StartTime <-
                  fit.StartTime
                d_i$fit.EndTime <-
                  Sys.time()
                d_i$fit.ElapsedTime <-
                  d_i$fit.EndTime - d_i$fit.StartTime

                fit.Path <-
                  ifelse(is.null(alternative.fit.Path),
                         d_i$fit.Path,
                         alternative.fit.Path)

                saveRDS(d_i,
                        file = here::here(fit.Path,
                                          d_i$fit.File))

                print(paste(file.name,
                            "finished at",
                            d_i$fit.EndTime))
              },
              .parallel = TRUE)


}

The function do_harvest_doFuture is a wrapper around fit_extract (see Section 1.3) and uses the package doFuture as a parallelization backend to foreach. It gets a list of .rds files (that can be taken from or fit_refs, or by using list.files function in the directory in which the analysis results are saved) and returns a large dataframe including parameter estimates (and additional information) extracted by fit_extract. Again, like in do_sim_parallel and do_fit_doFuture, the first nClust files are read with varying amount of delay to prevent the overload of the hard drive by multiple read requests.

Click to expand the code
do_harvest_doFuture <- function(fit.files,
                                nClust = 48,
                                sleeptime = 1){

  registerDoFuture()

  plan("multisession")

  results <- foreach(i = 1:length(fit.files),
                     .combine = rbind,
                     .errorhandling = 'remove') %dopar% {
                       if (i <= nClust)
                         Sys.sleep(sleeptime * i)
                       fit_extract(fit.files[i])
                     }
  return(results)
}

2.3 Pipeline code

The following is the code we used to run the whole pipeline, which consists of three chunk that are responsible of the following:

  1. Creating the book-keeping reference files (using functions described in Section 2.1).
  2. Running the whole simulation study (using do_sim_parallel and do_fit_doFuture described in Section 2.2). The codes in this chunk can be interrupted (deliberately or otherwise) and resumed by running it again.
  3. Harvesting the parameter estimates and storing them in a single dataframe (using do_harvest_doFuture of Section 2.2). The resulting dataframe is then used to make the plots using in Section 3)

But first we need to set the directories in which the results of each chunk is saved:

## Root directory of simulation study files
dir_files <- "simulation-files"

## Where reference tables are saved
dir_references <- paste(dir_files,
                        "refs",
                        sep = "/")

## Where simulated datasets are saved
dir_simulation <- paste(dir_files,
                        "sim-files",
                        sep = "/")

## Where analysis output files are saved
dir_analysis <- paste(dir_files,
                      "fit-files",
                      sep = "/")

## Where harvested study results are saved
dir_harvest <- paste(dir_files,
                     "harvest-files",
                     sep = "/")

## where plots are saved

dir_plots <- "figures"

The following code makes sim_refs and fit_refs tables and make backups of them. In line 17 the rows with T != 100 and Model == "DAR" are omitted, and with the codes in lines 22-31 we add other values of \(N\)s and \(T\)s (see the notes under make_sim_refs in Section 2.1).

Click to expand the code
sim_refs_base <- make_sim_refs(
  conditions = list(
    T = c(30, 100),
    N = c(100),
    Model = c("BinAR",
              "Chi2AR",
              "DAR",
              "PoDAR",
              "NAR"),
    l2.dist = c("Gaussian",
                "Chi2"),
    phi = c(0.4)
  ),
  simSeed = 0,
  Reps = 1000,
  save.directory = dir_simulation
) %>%
  filter(T == 100, Model != "DAR")

## Adding another values for N and T; see the text

sim_refs <- NULL

for (TT in c(25, 50, 100)) {
  for (NN in c(25, 50, 100)) {
    sim_refs <- sim_refs_base %>%
      mutate(N = NN,
             T = TT) %>%
      rbind(sim_refs)
  }
}

## Saving a backup of the dataframe of references of simulation files

saveRDS(sim_refs,
        here::here(dir_references,
                   "sim-refs.rds"))

## Making the dataframe of analysis output reference files

fit_refs <- make_fit_refs(sim_refs = sim_refs,
                          save.directory = dir_analysis)


## Saving a backup of the dataframe of references of analysis output files

saveRDS(fit_refs,
        here::here(dir_references,
                   "fit-refs.rds"))

Here we reads the reference tables sim_refs and fit_refs. We then make a list of already simulated files (that are located in the sim.Path directory) and a list of analysis output files (that are located in the fit.Path directory). Then, by comparing sim_refs and fit_refs with the the lists of simulated and analyzed files, we only simulate/analyze the datasets that have not been yet simulated/analyzed. This allows us resume the simulation study after an interruption to the R session.

Click to expand the code
# Simulation part ---------------------------------------------------------

## Reading the backup of the dataframe of references of simulation files

sim_refs <- readRDS(here::here(dir_references,
                               "sim-refs.rds"))

## Making a list of already simulated datasets

sim_refs_done <- list.files(here::here(dir_simulation),
                            pattern = "sim_uSeed-")

## Finding the datasets that are yet to be simulated

sim_refs_remaining <- sim_refs %>%
  filter(!(sim.File %in% sim_refs_done))

## Simulating the remaining datasets

Sys.time()
system.time(
  t.sim <- do_sim_parallel(sim_refs = sim_refs_remaining,
                           save.directory = dir_simulation)
)
Sys.time()


# Analysis part -----------------------------------------------------------

## Reading the backup of the dataframe of references of analysis output files

fit_refs <- readRDS(here::here(dir_references,
                               "fit-refs.rds"))

## Making a list of analysis output files

fit_files_done <- list.files(here::here(dir_simulation),
                             pattern = "fit_uSeed-")

## Finding the analyses that are yet to be done

fit_refs_remaining <- fit_refs %>%
  filter(!(fit.File %in% fit_files_done))

## Running the analyses of remaining datasets in parallel

Sys.time()
system.time(
  t.fit <- do_fit_doFuture(
    fit_refs = fit_refs_remaining,
    nClust = 48,
    nPROC = 1,
    sleeptime = 3,
    save.directory = dir_analysis
  )
)
Sys.time()

Finally, the analyzed datasets are read using do_harvest_doFuture in parallel and stored in an .rds file suffixed with date and time (to prevent over-writing the previously saved harvest files). Furthermore, using the function harvest_cleanup (of Section 1.4.1), an abridged version (d_abridged, including all results of all replications for the parameters of interest) and a more summarized version (d_important, containing the outcome measures of interest per condition) of the raw harvest dataframe are made and saved in two .rds files (again suffixed with date and time).

Click to expand the code
## Harvesting the results in parallel

l.files <- list.files(path = here(dir_analysis),
                      pattern = glob2rx("*.rds"))


fit.files <- l.files %>%
  paste(here::here(dir_analysis),
        .,
        sep = "/")


list.files(here::here(dir_harvest),
           pattern = "harvest-raw_") %>%
  sort(decreasing = TRUE) %>%
  paste(here::here(dir_harvest),
        .,
        sep = "/") %>%
  readRDS()

Sys.time()
system.time(
  harvest_raw <- do_harvest_doFuture(fit.files)
)
Sys.time()


saveRDS(harvest_raw,
        here::here(dir_harvest,
                   paste0(
                     "harvest-raw_",
                     format(Sys.time(),
                            "%Y-%m-%d_%H-%M"),
                     ".rds"
                   )))


d_abridged <- harvest_raw %>%
  harvest_cleanup(return.abridged = TRUE)

saveRDS(d_abridged ,
        here::here(dir_harvest,
                   paste0(
                     "harvest-abridged_",
                     format(Sys.time(),
                            "%Y-%m-%d_%H-%M"),
                     ".rds"
                   )))


d_important <- harvest_raw %>% harvest_cleanup()

saveRDS(d_important ,
        here::here(dir_harvest,
                   paste0(
                     "harvest-important_",
                     format(Sys.time(),
                            "%Y-%m-%d_%H-%M"),
                     ".rds"
                   )))

3 Simulation results

We read the abridged harvested data file (harvest-important_***.rds) and make the outcome plots using plot_Model.x.Resid and plot_quadrants functions (of Section 1.4.3 ) and the MCMC estimates coverage plots using the function plot_caterpillar (of Section 1.4.4). To ensure small file size and good resolutions, all plots are saved in separate PDF files (which can easily be included in a \(\mathrm{\LaTeX{}}\) script) in a specific directory. Before going any further, we should read the abridged harvest dataframe and set the plot output folder.

d_abridged <- list.files(here::here(dir_harvest),
                          pattern = "harvest-abridged_") %>%
  sort(decreasing = TRUE) %>%
  paste(here::here(dir_harvest),
        .,
        sep = "/") %>%
  read_rds()

d_important <- list.files(here::here(dir_harvest),
                          pattern = "harvest-important_") %>%
  sort(decreasing = TRUE) %>%
  paste(here::here(dir_harvest),
        .,
        sep = "/") %>%
  read_rds()

3.1 Plots main results

The left and right panels (including results for Gaussian and \(\chi^2\) distributed means) are made for each outcome measure of interest using plot_Model.x.Resid, combined using the package patchwork, and saved as PDF files.

Click to expand the code
for (parameter in c("Correlation", "Covariance")) {
  for (outcome in c("Bias",
                    "Variance",
                    "RMSE",
                    "MAE",
                    "Positive error",
                    "Negative error",
                    "Total error")) {

    p.single.left <- plot_Model.x.Resid(d_important,
                                       parameter,
                                       outcome,
                                       "Gaussian",
                                       line.width = 2,
                                       font.scale = 8)

    p.single.right <- plot_Model.x.Resid(d_important,
                                        parameter,
                                        outcome,
                                        "Chi2",
                                        line.width = 2,
                                        font.scale = 8) + ylab(NULL)

    p.patchwork <- (p.single.left | p.single.right)

    title <- paste(outcome,
                   "in the estimated",
                   tolower(parameter)
    )

    p.final <- p.patchwork +
      plot_layout(guides = "collect") +
      plot_annotation(
        ## removed the big title and subtitle  here
        # title = TeX(title),
        theme = theme(plot.title =
                        element_text(size = 20,
                                     family = "CMU Serif",
                                     hjust = 0.5))
      ) &
      theme(legend.position = "bottom")

    if(grepl("error", tolower(outcome), fixed = TRUE)) p.final <- p.final +
      scale_y_continuous(breaks = c(5, 10, 20, 40, 60, 80, 90, 100))

    ggsave(paste0(parameter,
                  " (",
                  outcome,
                  ").pdf"),
           p.final,
           path = dir_plots,
           width = 21,
           height = 21.5*30/37 - 1,
           units = "cm")

  }
}

The generated plots are located in figures folder and can be accessed via the following tables:

Individual plots of outcome measures
Outcome measures Correlations Covariances
Bias
Variance
RMSE
MAE
Positive Type-I error rate
Negative Type-I error rate
Total Type-I error rate

The plots containing pairs of outcome measures (respectively including results for Gaussian and \(\chi^2\) distributed means) are made for each outcome measure of interest, combined using the package patchwork, and saved as PDF files.

Click to expand the code
for (parameter in c("Correlation", "Covariance")) {
  for (upper_lower in list(c("Bias", "RMSE"),
                           c("Bias", "Variance"),
                           c("Bias", "MAE"),
                           c("Positive error", "Negative error"),
                           c("Positive error", "Total error"),
                           c("Negative error", "Total error"))) {
    upper <- upper_lower[1]
    lower <- upper_lower[2]

    p.q <- plot_quadrants(d_important,
                           parameter,
                           upper,
                           lower)
      ## Saving the plot
      ggsave(
        paste0(parameter,
               " (",
               upper,
               " and ",
               lower,
               ").pdf"),
        p.q,
        path = dir_plots,
        width = 21,
        height = 30,
        units = "cm"
      )
  }
}

The generated plots are located in figures folder and can be accessed via the following tables:

Combined plots of two outcome measures
Outcome measures Correlations Covariances
Bias and Variance
Bias and RMSE
Bias and MAE
Positive and negative Type-I error
Positive and total Type-I error
Negative and total Type-I error

3.2 Plots of MCMC parameter estimates coverage

It should be noted that plot_caterpillar requires the abridged dataset of harvested results (d_abridged) that contain estimates of all replications within each condition.

Click to expand the code
# Making lists of level-2 distributions, model names, and analyses --------

list_l2.dist <- c("Gaussian",
                  "Chi2")

list_model <- c("NAR",
                "Chi2AR",
                "BinAR",
                "PoDAR")

list_model.type <- c("resid.fixed",
                     "resid.random")

list_parameter <- c("Correlation",
                    "Covariance")

# Plots with 9 panels -----------------------------------------------------

for (single_parameter in list_parameter) {

cross3(list_model,
       list_l2.dist,
       list_model.type) %>%
  plyr::l_ply(function(x){
    single_model <- x[1]
    single_model_name <- ifelse(single_model == "Chiar",
                                "$\\chi^2$AR(1)",
                                paste0(single_model, "(1)"))

    single_l2.dist <- x[2]
    single_l2.dist_name <- ifelse(single_l2.dist == "Chi2",
                                  "$\\chi^2$",
                                  single_l2.dist)

    model_type <- x[3]

    caterpillar_single <- plot_caterpillar(d_abridged,
                                            l2.dist_ = single_l2.dist,
                                            Model_ = single_model,
                                            parameter_ = single_parameter,
                                            analysis.type = model_type)

    p.final <- caterpillar_single +
      plot_layout(guides = "collect") +
      plot_annotation(
        theme = theme(plot.title =
                        element_text(size = 12,
                                     family = "CMU Serif",
                                     hjust = 0.5),
                      plot.subtitle =
                        element_text(size = 12,
                                     family = "CMU Serif",
                                     hjust = 0.5))
      ) &
      theme(legend.position = "bottom")

    ggsave(paste0("caterpillar-",
                  single_parameter,
                  "-",
                  single_model,
                  "-",
                  single_l2.dist,
                  "-",
                  model_type,
                  ".pdf"),
           p.final,
           path = dir_plots,
           width = 18,
           height = 11,
           units = "cm")

  })
}

# Plots with 18 panels ----------------------------------------------------


cross3(list_model,
       list_l2.dist,
       list_parameter) %>%
  plyr::l_ply(function(x){

    single_model <- x[1]
    single_model_name <- ifelse(single_model == "Chiar",
                                "$\\chi^2$AR(1)",
                                paste0(single_model, "(1)"))

    single_l2.dist <- x[2]
    single_l2.dist_name <- ifelse(single_l2.dist == "Chi2",
                                  "$\\chi^2$",
                                  single_l2.dist)

    single_parameter <- x[3]
    caterpillar_fixed <- plot_caterpillar(d_abridged,
                                          l2.dist_ = single_l2.dist,
                                          Model_ = single_model,
                                          parameter_ = single_parameter,
                                          analysis.type = "resid.fixed")
    caterpillar_random <- plot_caterpillar(d_abridged,
                                           l2.dist_ = single_l2.dist,
                                           Model_ = single_model,
                                           parameter_ = single_parameter,
                                           analysis.type = "resid.random")

    p.patchwork <- caterpillar_fixed / caterpillar_random

    p.final <- p.patchwork +
      plot_layout(guides = "collect") +
      plot_annotation(
        theme = theme(plot.title =
                        element_text(size = 12,
                                     family = "CMU Serif",
                                     hjust = 0.5),
                      plot.subtitle =
                        element_text(size = 12,
                                     family = "CMU Serif",
                                     hjust = 0.5))
      ) &
      theme(legend.position = "bottom")

    ggsave(paste0("caterpillar-",
                  single_parameter,
                  "-",
                  single_model,
                  "-",
                  single_l2.dist,
                  "-",
                  "both",
                  ".pdf"),
           p.final,
           path = dir_plots,
           width = 18,
           height = 18,
           units = "cm")

  })

The generated plots are located in figures folder and can be accessed via the following tables:

MCMC parameter estimates coverage plots for correlations
Model Gaussian-distributed means \(\chi^2\)-distributed means
AR(1)

Fixed res. var

Random res. var

Combined

Fixed res. var

Random res. var

Combined

\(\chi^2\)AR(1)

Fixed res. var

Random res. var

Combined

Fixed res. var

Random res. var

Combined

BinAR(1)

Fixed res. var

Random res. var

Combined

Fixed res. var

Random res. var

Combined

PoDAR(1)

Fixed res. var

Random res. var

Combined

Fixed res. var

Random res. var

Combined

MCMC parameter estimates coverage plots for covariances
Model Gaussian-distributed means \(\chi^2\)-distributed means
AR(1)

Fixed res. var

Random res. var

Combined

Fixed res. var

Random res. var

Combined

\(\chi^2\)AR(1)

Fixed res. var

Random res. var

Combined

Fixed res. var

Random res. var

Combined

BinAR(1)

Fixed res. var

Random res. var

Combined

Fixed res. var

Random res. var

Combined

PoDAR(1)

Fixed res. var

Random res. var

Combined

Fixed res. var

Random res. var

Combined

4 Additional figures and analyses

4.1 DGM time series plots

We make plots including time series plots, marginal distributions, and sample ACFs for three time series generated by each DGMs.

Click to expand the code
make_dgm_df <- function(obj.1,
                        obj.2,
                        obj.3 = NULL){

  d.1 <- data.frame(t = 1:length(obj.1$x),
                    value = obj.1$x,
                    obj.id = obj.1$Model.Description.Short)

  d.2 <- data.frame(t = 1:length(obj.2$x),
                    value = obj.2$x,
                    obj.id = obj.2$Model.Description.Short)

  d <- rbind(d.1, d.2)

  if(!is.null(obj.3)){
    d.3 <- data.frame(t = 1:length(obj.3$x),
                      value = obj.3$x,
                      obj.id = obj.3$Model.Description.Short)
    d <- rbind(d, d.3)
  }

  d <- d %>%
    group_by(obj.id) %>%
    mutate(Mean = mean(value))

  return(d)
}
Click to expand the code
ts.length <- 5000


d.3.nar <- make_dgm_df(
  dgm_nar(
    phi = 0.4,
    var.resid = 20,
    T = ts.length,
    Mean = 85,
    seed = 1 + 9
  ),
  dgm_nar(
    phi = 0.8,
    var.resid = 20,
    T = ts.length,
    Mean = 55,
    seed = 2 + 4
  ),
  dgm_nar(
    phi = 0.4,
    var.resid = 47,
    T = ts.length,
    Mean = 20,
    seed = 3 + 2
  )
)


profile_nar <- plot_dgm.profile(d.3.nar,
                                brewer.pal(name = "YlOrBr", n = 9)[c(5, 7, 9)],
                                "$AR(1)$")


d.3.chiar <- make_dgm_df(
  dgm_generator(
    Model = "ChiAR",
    phi = 0.4,
    nu = 25,
    T = ts.length,
    seed = 1 + 1
  ),
  dgm_generator(
    Model = "ChiAR",
    phi = 0.7,
    nu = 5,
    T = ts.length,
    seed = 2 + 5
  ),
  dgm_generator(
    Model = "ChiAR",
    phi = 0.4,
    nu = 1,
    T = ts.length,
    seed = 3 + 1
  )
)


profile_chiar <- plot_dgm.profile(d.3.chiar,
                                  brewer.pal(name = "GnBu", n = 9)[c(5, 7, 9)],
                                  "$\\chi^2AR(1)$")


d.3.binar <- make_dgm_df(
  dgm_generator(
    Model = "BinAR",
    k = 7,
    alpha  = 0.85,
    phi = 0.45,
    T = ts.length,
    seed = 1 + 5
  ),
  dgm_generator(
    Model = "BinAR",
    k = 7,
    alpha = 0.85,
    phi = 0.7,
    T = ts.length,
    seed = 2 + 6
  ),
  dgm_generator(
    Model = "BinAR",
    k = 7,
    alpha = 0.5,
    phi = 0.45,
    T = ts.length,
    seed = 3
  )
)

profile_binar <- plot_dgm.profile(d.3.binar,
                                  brewer.pal(name = "YlGn", n = 9)[c(5, 7, 9)],
                                  "$BinAR(1)$")


d.3.podar <- make_dgm_df(
  dgm_generator(
    Model = "PoDAR",
    tau = 0.4,
    Mean = 40,
    T = ts.length,
    seed = 1 + 4
  ),
  dgm_generator(
    Model = "PoDAR",
    tau = 0.8,
    Mean = 10,
    T = ts.length,
    seed = 2 + 4
  ),
  dgm_generator(
    Model = "PoDAR",
    tau = 0.4,
    Mean = 1.,
    T = ts.length,
    seed = 3 + 2
  )
)


profile_podar <- plot_dgm.profile(d.3.podar,
                                  brewer.pal(name = "BuPu", n = 9)[c(5, 7, 9)],
                                  "$\\PoDAR(1)$")
Click to expand the code
ggsave(
  "Profile NAR.pdf",
  profile_nar,
  path = dir_plots,
  height = 10 * 1.8,
  width = 21 * 2,
  units = "cm"
)

ggsave(
  "Profile Chi2AR.pdf",
  profile_chiar,
  path = dir_plots,
  height = 10 * 1.8,
  width = 21 * 2,
  units = "cm"
)


ggsave(
  "Profile BinAR.pdf",
  profile_binar,
  path = dir_plots,
  height = 10 * 1.8,
  width = 21 * 2,
  units = "cm"
)


ggsave(
  "Profile PoDAR.pdf",
  profile_podar,
  path = dir_plots,
  height = 10 * 1.8,
  width = 21 * 2,
  units = "cm"
)


ggsave(
  "Profile four DGMs.pdf",
  profile_nar /
    profile_chiar /
    profile_binar /
    profile_podar,
  path = dir_plots,
  height = 10 * 1.8 * 4,
  width = 21 * 2,
  units = "cm"
)


ggsave(
  "Profile alternative DGMs.pdf",
  profile_chiar /
    profile_binar /
    profile_podar,
  path = dir_plots,
  height = 10 * 1.8 * 3,
  width = 21 * 2,
  units = "cm"
)

The generated time series plots of the AR(1) , \(\chi^2\)AR(1) , BinAR(1) , and PoDAR(1) models are saved in the figures folder (see also combined plots, and ).

4.2 Profiles of the simulated datasets

We read the first replication of simulated datasets with \(N = 100, T = 100\) and plot using the function plot_dataset.profile (from Section 1.4.2).

Click to expand the code
# Reading simulated datasets and saving in lists --------------------------

dgm_datasets_gaussian.means <- list(
  nar = "sim_uSeed-13330_l2.dist-Gaussian_Model-NAR",
  chiar = "sim_uSeed-13297_l2.dist-Gaussian_Model-Chi2AR",
  binar = "sim_uSeed-13286_l2.dist-Gaussian_Model-BinAR",
  podar = "sim_uSeed-13319_l2.dist-Gaussian_Model-PoDAR"
) %>%
  lapply(function(x) read_rds(paste0("simulation-files/sim-files/",
                                     x,
                                     "_N-100_phi-0.4_T-100_sim.Seed-0_Rep-1.rds")))


dgm_datasets_chi2.means <- list(
  nar = "sim_uSeed-13337_l2.dist-Chi2_Model-NAR",
  chiar = "sim_uSeed-13304_l2.dist-Chi2_Model-Chi2AR",
  binar = "sim_uSeed-13293_l2.dist-Chi2_Model-BinAR",
  podar = "sim_uSeed-13326_l2.dist-Chi2_Model-PoDAR"
) %>%
  lapply(function(x) read_rds(paste0("simulation-files/sim-files/",
                                     x,
                                     "_N-100_phi-0.4_T-100_sim.Seed-0_Rep-1.rds")))



# Plot profiles per DGM ---------------------------------------------------

purrr::pwalk(
  list(
    dgm_datasets_gaussian.means,
    dgm_datasets_chi2.means,
    names(dgm_datasets_gaussian.means)
  ),
  ~ save_dataset_profile(..1,
                         ..2,
                         paste0("profiles-dataset-", ..3))

)

The generated distribution plots of the AR(1) , \(\chi^2\)AR(1) , BinAR(1) , and PoDAR(1) models are saved in the figures folder.

4.3 COGITO data analysis

In the paper, we reported an analysis of distress time series in of the COGITO dataset Schmiedek, Lövdén, and Lindenberger (2010). Because we are not allowed to make the dataset public, here we provide the Mplus input script and the analysis output.

Click to expand the code
TITLE:
Multilevel AR(1) model of COGITO distress

DATA:
FILE = "cogito-na1-distress.dat";
 
VARIABLE:

NAMES = subject t x; 
MISSING=.;
CLUSTER = subject;
LAGGED = x(1);
TINTERVAL = t(1);

ANALYSIS:

TYPE = TWOLEVEL RANDOM;
ESTIMATOR = BAYES;
PROCESSORS = 2;
CHAINS = 2;
THIN = 5;
BITERATIONS = 5000(2000);

MODEL:
%WITHIN%
phi | x ON x&1;
logv | x;

%BETWEEN%
x phi logv WITH x phi logv;

OUTPUT:
TECH1 TECH2 TECH3 TECH8 FSCOMPARISON STANDARDIZED STDYX STDY;

PLOT:
TYPE = PLOT3;
FACTORS = ALL (500);
Click to expand the code
Mplus VERSION 8.6
MUTHEN & MUTHEN
09/28/2022   1:15 PM

INPUT INSTRUCTIONS

  TITLE:
  Multilevel AR(1) model of COGITO distress

  DATA:
  FILE = "cogito-na1-distress.dat";

  VARIABLE:

  NAMES = subject t x;
  MISSING=.;
  CLUSTER = subject;
  LAGGED = x(1);
  TINTERVAL = t(1);

  ANALYSIS:

  TYPE = TWOLEVEL RANDOM;
  ESTIMATOR = BAYES;
  PROCESSORS = 2;
  CHAINS = 2;
  THIN = 5;
  BITERATIONS = 5000(2000);

  MODEL:
  %WITHIN%
  phi | x ON x&1;
  logv | x;

  %BETWEEN%
  x phi logv WITH x phi logv;

  OUTPUT:
  TECH1 TECH2 TECH3 TECH8 FSCOMPARISON STANDARDIZED STDYX STDY;

  PLOT:
  TYPE = PLOT3;
  FACTORS = ALL (500);



*** WARNING
  One or more individual-level variables have no variation within a
  cluster for the following clusters.

     Variable   Cluster IDs with no within-cluster variation

      X           46 176 189 141 184 122 90 81 39 48 177 108 178 63 170 159 54 28 68 55 31 112
                  77 93 117 165 40 84
      X&1         46 176 189 141 184 122 90 81 39 48 177 108 178 63 170 159 54 28 68 55 31 112
                  77 93 117 165 40 84

   1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS




Multilevel AR(1) model of COGITO distress

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                       32518

Number of dependent variables                                    1
Number of independent variables                                  1
Number of continuous latent variables                            2

Observed dependent variables

  Continuous
   X

Observed independent variables
   X&1

Continuous latent variables
   PHI         LOGV

Variables with special functions

  Cluster variable      SUBJECT

  Within variables
   X&1


Estimator                                                    BAYES
Specifications for Bayesian Estimation
  Point estimate                                            MEDIAN
  Number of Markov chain Monte Carlo (MCMC) chains               2
  Random seed for the first chain                                0
  Starting value information                           UNPERTURBED
  Algorithm used for Markov chain Monte Carlo           GIBBS(PX1)
  Convergence criterion                                  0.500D-01
  Maximum number of iterations                                5000
  K-th iteration used for thinning                               5
Specifications for Bayes Factor Score Estimation
  Number of imputed data sets                                  500
  Iteration intervals for thinning                               1

Input data file(s)
  cogito-na1-distress.dat
Input data format  FREE


SUMMARY OF DATA

     Number of clusters                        204

       Size (s)    Cluster ID with Size s

        114        45
        117        22
        121        65 195
        123        64
        124        102 73
        125        143
        127        46 43 156 176 78
        128        154 189 44
        129        89 57 131
        130        80 133 168 141 118 147
        131        139 181 184 186 144 49
        132        127 146 122 196
        133        61 153 7
        134        90 81 39
        135        48 132
        136        150 193 98 18
        137        94 30 177 24 106
        138        128 23 187 10 151 152 108
        139        164 20
        140        182 1
        141        87 53 36 136 178 51 149
        142        63 170 140 114 50 161 130 198 202
        143        88 82 14 137
        144        120 29 115 2 172 197 185 159
        145        17 109 101 138
        146        54 37
        148        96
        149        129 166 3
        150        134 155 4 19 104 191
        151        16 92 28
        152        121 68 55 31
        153        6
        154        112 192
        155        163 13 70
        156        180 5 107 59
        157        83 86 77 12 26 9 41
        158        160
        159        38
        160        105 67 66 173 174 75
        162        11
        163        103 200 35
        164        142 99
        165        190 188
        166        126 72 148 113
        167        93
        168        56
        171        60
        172        124
        173        169 97 15 183 25
        174        91
        177        27 71
        178        117
        179        175
        180        8 204
        183        47 111
        185        79
        186        76 165
        188        69
        189        194
        190        110
        192        100 145 52 21
        193        119
        197        62 42
        198        116
        200        157
        201        203
        204        74 199
        205        34
        206        135
        211        162
        212        123
        214        40
        220        95
        227        125
        238        171
        239        201
        250        167
        262        179 32
        265        85
        276        84
        303        33
        306        58
        373        158



SUMMARY OF MISSING DATA PATTERNS

     Number of missing data patterns             4

     MISSING DATA PATTERNS (x = not missing)

           1  2  3  4
 X         x  x
 X&1       x     x

     MISSING DATA PATTERN FREQUENCIES

    Pattern   Frequency     Pattern   Frequency     Pattern   Frequency
          1       15032           3        5301
          2        5505           4        6680


COVARIANCE COVERAGE OF DATA

Minimum covariance coverage value   0.100

     PROPORTION OF DATA PRESENT

           Covariance Coverage
              X
              ________
 X              0.632



UNIVARIATE SAMPLE STATISTICS

     UNIVARIATE HIGHER-ORDER MOMENT DESCRIPTIVE STATISTICS

         Variable/         Mean/     Skewness/   Minimum/ % with                Percentiles
        Sample Size      Variance    Kurtosis    Maximum  Min/Max      20%/60%    40%/80%    Median

     X                     0.892       1.843       0.000   62.19%       0.000      0.000      0.000
           20537.000       2.101       3.178       7.000    0.84%       0.000      2.000

     WARNING: PROBLEMS OCCURRED IN SEVERAL ITERATIONS IN THE COMPUTATION OF THE STANDARDIZED ESTIMATES FOR SEVERAL
     CLUSTERS. THIS IS MOST LIKELY DUE TO AR COEFFICIENTS GREATER THAN 1 OR PARAMETERS GIVING NON-STATIONARY MODELS.
     SUCH POSTERIOR DRAWS ARE REMOVED. THE FOLLOWING CLUSTERS HAD SUCH PROBLEMS:
          202

THE MODEL ESTIMATION TERMINATED NORMALLY

     USE THE FBITERATIONS OPTION TO INCREASE THE NUMBER OF ITERATIONS BY A FACTOR
     OF AT LEAST TWO TO CHECK CONVERGENCE AND THAT THE PSR VALUE DOES NOT INCREASE.



MODEL FIT INFORMATION

Number of Free Parameters                               9

Information Criteria

          Deviance (DIC)                        61653.300
          Estimated Number of Parameters (pD)   12317.423



MODEL RESULTS

                                Posterior  One-Tailed         95% C.I.
                    Estimate       S.D.      P-Value   Lower 2.5%  Upper 2.5%  Significance

Within Level

Between Level

 X        WITH
    PHI                0.118       0.020      0.000       0.082       0.157      *
    LOGV               1.370       0.192      0.000       1.030       1.776      *

 PHI      WITH
    LOGV               0.234       0.047      0.000       0.150       0.338      *

 Means
    X                  0.866       0.067      0.000       0.732       0.997      *
    PHI                0.217       0.017      0.000       0.182       0.249      *
    LOGV              -1.314       0.168      0.000      -1.641      -0.983      *

 Variances
    X                  0.888       0.094      0.000       0.727       1.105      *
    PHI                0.052       0.006      0.000       0.041       0.066      *
    LOGV               5.745       0.576      0.000       4.754       7.000      *


STANDARDIZED MODEL RESULTS


STDYX Standardization

                                Posterior  One-Tailed         95% C.I.
                    Estimate       S.D.      P-Value   Lower 2.5%  Upper 2.5%  Significance

Within-Level Standardized Estimates Averaged Over Clusters

 PHI | X ON
    X&1                0.217       0.008      0.000       0.202       0.232      *

 LOGV |
    X                  0.903       0.004      0.000       0.894       0.911      *

Between Level

 X        WITH
    PHI                0.551       0.062      0.000       0.415       0.659      *
    LOGV               0.607       0.046      0.000       0.506       0.689      *

 PHI      WITH
    LOGV               0.435       0.067      0.000       0.295       0.560      *

 Means
    X                  0.918       0.084      0.000       0.753       1.078      *
    PHI                0.955       0.097      0.000       0.756       1.140      *
    LOGV              -0.549       0.075      0.000      -0.700      -0.402      *

 Variances
    X                  1.000       0.000      0.000       1.000       1.000
    PHI                1.000       0.000      0.000       1.000       1.000
    LOGV               1.000       0.000      0.000       1.000       1.000


STDY Standardization

                                Posterior  One-Tailed         95% C.I.
                    Estimate       S.D.      P-Value   Lower 2.5%  Upper 2.5%  Significance

Within-Level Standardized Estimates Averaged Over Clusters

 PHI | X ON
    X&1                0.321       0.067      0.000       0.191       0.458      *

 LOGV |
    X                  0.903       0.004      0.000       0.894       0.911      *

Between Level

 X        WITH
    PHI                0.551       0.062      0.000       0.415       0.659      *
    LOGV               0.607       0.046      0.000       0.506       0.689      *

 PHI      WITH
    LOGV               0.435       0.067      0.000       0.295       0.560      *

 Means
    X                  0.918       0.084      0.000       0.753       1.078      *
    PHI                0.955       0.097      0.000       0.756       1.140      *
    LOGV              -0.549       0.075      0.000      -0.700      -0.402      *

 Variances
    X                  1.000       0.000      0.000       1.000       1.000
    PHI                1.000       0.000      0.000       1.000       1.000
    LOGV               1.000       0.000      0.000       1.000       1.000


R-SQUARE

Within-Level R-Square Averaged Across Clusters

                                Posterior  One-Tailed         95% C.I.
    Variable        Estimate       S.D.      P-Value   Lower 2.5%  Upper 2.5%

    X                  0.097       0.004      0.000       0.089       0.106


BETWEEN-LEVEL FACTOR SCORE COMPARISONS

 Results for Factor PHI

 Ranking  Cluster    Factor Score     Ranking  Cluster    Factor Score     Ranking  Cluster    Factor Score
     1       202          0.983           2         1          0.754           3       179          0.696
     4         5          0.676           5       149          0.663           6       129          0.636
     7        85          0.633           8        87          0.626           9       147          0.620
    10        92          0.610          11       167          0.573          12        88          0.567
    13        23          0.559          14        97          0.554          15       142          0.550
    16        44          0.544          17       150          0.521          18       183          0.518
    19       201          0.514          20        12          0.511          21        64          0.508
    22       107          0.499          23       119          0.491          24        35          0.491
    25        70          0.481          26       124          0.475          27       204          0.458
    28        73          0.449          29       110          0.443          30       143          0.441
    31        65          0.437          32        22          0.433          33        37          0.433
    34       154          0.431          35        75          0.427          36       196          0.424
    37         8          0.422          38       198          0.405          39        25          0.396
    40       200          0.391          41        59          0.389          42       188          0.388
    43        30          0.379          44       136          0.376          45        53          0.375
    46       103          0.374          47        32          0.372          48       197          0.371
    49        38          0.361          50       139          0.358          51        83          0.355
    52       116          0.349          53        56          0.347          54       158          0.340
    55       126          0.331          56       171          0.326          57       135          0.325
    58       168          0.324          59       131          0.321          60         3          0.320
    61       203          0.319          62       182          0.318          63        14          0.315
    64       132          0.312          65        74          0.310          66         6          0.307
    67        91          0.303          68       151          0.296          69        61          0.295
    70       134          0.294          71       191          0.292          72        95          0.291
    73       160          0.290          74        15          0.289          75        24          0.284
    76       186          0.282          77        72          0.280          78        51          0.280
    79       156          0.275          80       123          0.266          81       114          0.265
    82       152          0.261          83       164          0.257          84         7          0.254
    85       104          0.251          86       193          0.250          87        58          0.248
    88        33          0.246          89        62          0.244          90        78          0.238
    91       138          0.237          92       169          0.232          93       153          0.232
    94       173          0.231          95       192          0.230          96       145          0.230
    97       174          0.226          98       127          0.218          99       137          0.214
   100       102          0.213         101       190          0.209         102       161          0.209
   103        36          0.208         104        66          0.192         105        42          0.167
   106        89          0.163         107       133          0.160         108       185          0.154
   109         4          0.152         110       101          0.147         111        79          0.145
   112       100          0.139         113       111          0.139         114        52          0.128
   115        21          0.127         116        20          0.123         117        18          0.123
   118       128          0.120         119        50          0.119         120       148          0.109
   121        13          0.108         122        27          0.105         123        29          0.099
   124        69          0.097         125       115          0.093         126        19          0.089
   127       155          0.083         128       130          0.076         129       144          0.075
   130       187          0.075         131       175          0.071         132       194          0.069
   133        94          0.065         134       118          0.065         135        43          0.063
   136        11          0.060         137        71          0.057         138        80          0.053
   139       140          0.050         140        10          0.049         141        17          0.039
   142       181          0.037         143        34          0.036         144       106          0.035
   145       108          0.033         146         9          0.031         147        90          0.031
   148        57          0.028         149        63          0.028         150        96          0.026
   151       109          0.026         152       184          0.025         153       162          0.024
   154       199          0.024         155        81          0.023         156        77          0.021
   157       125          0.021         158        55          0.021         159        47          0.021
   160       141          0.021         161       112          0.020         162        40          0.020
   163       165          0.020         164       176          0.020         165       177          0.019
   166        93          0.019         167       159          0.019         168        68          0.019
   169       170          0.019         170        39          0.018         171       189          0.016
   172        82          0.016         173        54          0.016         174        16          0.015
   175       178          0.015         176        45          0.014         177       117          0.014
   178        84          0.014         179        28          0.014         180       122          0.014
   181        46          0.013         182       121          0.013         183        48          0.012
   184        31          0.011         185       146          0.011         186        26          0.009
   187       172          0.009         188        86          0.009         189       163          0.007
   190        60          0.007         191       166          0.007         192        98          0.000
   193        99         -0.007         194       105         -0.007         195        76         -0.010
   196       113         -0.012         197        67         -0.015         198        41         -0.016
   199       180         -0.019         200       195         -0.020         201       157         -0.022
   202         2         -0.048         203        49         -0.049         204       120         -0.081

 Results for Factor LOGV

 Ranking  Cluster    Factor Score     Ranking  Cluster    Factor Score     Ranking  Cluster    Factor Score
     1       171          1.801           2        97          1.493           3        95          1.471
     4       132          1.452           5        72          1.393           6       169          1.383
     7       118          1.326           8         6          1.308           9       152          1.301
    10        83          1.262          11        32          1.154          12       100          1.087
    13        38          0.990          14        24          0.910          15        69          0.906
    16       196          0.906          17       103          0.880          18        33          0.849
    19        56          0.822          20       164          0.817          21        14          0.798
    22       120          0.776          23        23          0.738          24       151          0.720
    25        59          0.712          26       203          0.700          27       101          0.693
    28       119          0.692          29        12          0.686          30        50          0.646
    31       129          0.626          32        42          0.621          33       114          0.607
    34        74          0.603          35        22          0.579          36        30          0.551
    37         2          0.542          38        91          0.533          39         1          0.519
    40        44          0.482          41        80          0.480          42       124          0.473
    43       149          0.452          44       123          0.425          45        53          0.390
    46       111          0.381          47        37          0.371          48        27          0.362
    49       153          0.356          50       173          0.348          51        78          0.329
    52       125          0.324          53       168          0.322          54       144          0.293
    55       147          0.289          56        43          0.285          57       183          0.256
    58       167          0.224          59       199          0.164          60       107          0.158
    61       145          0.153          62        88          0.146          63        10          0.120
    64         8          0.094          65       204          0.089          66       150          0.085
    67        18          0.058          68        13          0.047          69         7          0.009
    70        87          0.007          71         4         -0.047          72       104         -0.048
    73       200         -0.073          74       194         -0.087          75       148         -0.100
    76        52         -0.100          77       136         -0.109          78        94         -0.117
    79       110         -0.137          80       138         -0.143          81       192         -0.145
    82        79         -0.148          83        61         -0.164          84       182         -0.178
    85       127         -0.181          86        47         -0.184          87        35         -0.193
    88        15         -0.196          89        51         -0.217          90        66         -0.233
    91        64         -0.250          92         3         -0.260          93        36         -0.277
    94       179         -0.292          95        71         -0.335          96       131         -0.350
    97       166         -0.365          98        76         -0.384          99        70         -0.436
   100        20         -0.451         101       197         -0.469         102        65         -0.472
   103       174         -0.495         104       157         -0.495         105       109         -0.521
   106        11         -0.528         107       135         -0.566         108       190         -0.586
   109       202         -0.587         110        21         -0.594         111        85         -0.607
   112        89         -0.628         113       193         -0.633         114        62         -0.648
   115        75         -0.654         116       142         -0.655         117       134         -0.710
   118       187         -0.711         119       198         -0.717         120        29         -0.721
   121       102         -0.792         122        58         -0.813         123       106         -0.816
   124       133         -0.840         125        25         -0.859         126       195         -0.903
   127       162         -0.934         128       115         -0.967         129       161         -0.989
   130       188         -0.990         131       130         -1.012         132       137         -1.027
   133        60         -1.082         134        57         -1.133         135       143         -1.185
   136        26         -1.189         137       158         -1.208         138       128         -1.232
   139       201         -1.269         140       191         -1.271         141       185         -1.334
   142       139         -1.335         143       116         -1.424         144       175         -1.433
   145       155         -1.460         146       181         -1.497         147        86         -1.670
   148       113         -1.696         149        96         -1.821         150       105         -1.962
   151       180         -1.978         152         5         -1.986         153       126         -2.057
   154        41         -2.178         155        49         -2.219         156        45         -2.257
   157        73         -2.268         158       186         -2.276         159       140         -2.423
   160        19         -2.508         161        67         -2.543         162        98         -2.814
   163        16         -2.986         164       121         -3.190         165        34         -3.202
   166        99         -3.234         167       154         -3.532         168       156         -3.610
   169        17         -3.896         170       160         -4.161         171       146         -4.569
   172        82         -4.574         173       172         -4.588         174       163         -4.599
   175         9         -4.619         176        92         -4.986         177        63         -6.213
   178       108         -6.213         179        93         -6.213         180        84         -6.213
   181        54         -6.213         182        81         -6.213         183       141         -6.213
   184        68         -6.213         185       165         -6.213         186       178         -6.213
   187       122         -6.213         188       159         -6.213         189        55         -6.213
   190       184         -6.213         191        48         -6.213         192       170         -6.213
   193        39         -6.213         194        31         -6.213         195       176         -6.213
   196       177         -6.213         197       112         -6.213         198       189         -6.214
   199        28         -6.214         200        46         -6.214         201        90         -6.214
   202       117         -6.214         203        40         -6.214         204        77         -6.214

 Results for X (referred to as plausible label B_X)

 Ranking  Cluster    Factor Score     Ranking  Cluster    Factor Score     Ranking  Cluster    Factor Score
     1        85          3.459           2        51          3.280           3        97          3.219
     4       196          3.096           5       179          3.033           6       107          2.829
     7         5          2.753           8        87          2.679           9        12          2.672
    10       129          2.665          11       149          2.651          12        72          2.597
    13        95          2.498          14       182          2.482          15       202          2.445
    16         3          2.434          17        37          2.434          18        44          2.408
    19       124          2.395          20        50          2.359          21       151          2.314
    22        32          2.267          23       125          2.229          24        74          2.223
    25       123          2.213          26         8          2.202          27       171          2.135
    28       158          2.108          29       119          2.084          30        23          2.073
    31        38          2.060          32         6          2.055          33        22          2.040
    34       185          2.039          35        24          2.035          36       145          2.024
    37        14          1.997          38        29          1.949          39       193          1.917
    40       120          1.866          41        83          1.854          42         4          1.824
    43        59          1.794          44        65          1.787          45       132          1.693
    46       203          1.687          47        33          1.661          48       197          1.619
    49       147          1.615          50       152          1.547          51       100          1.458
    52       136          1.421          53       153          1.407          54       115          1.354
    55       114          1.337          56         1          1.323          57        25          1.298
    58       110          1.263          59       195          1.262          60        91          1.261
    61        69          1.258          62       174          1.256          63       103          1.253
    64        35          1.201          65       101          1.201          66       169          1.193
    67       204          1.177          68       183          1.156          69       167          1.152
    70        94          1.131          71        52          1.131          72       164          1.111
    73        64          1.107          74        20          0.946          75       190          0.926
    76        61          0.908          77        30          0.903          78       102          0.856
    79        88          0.838          80        71          0.796          81       118          0.774
    82       134          0.772          83        56          0.768          84       131          0.757
    85        80          0.750          86        10          0.722          87        78          0.713
    88        53          0.693          89         2          0.677          90       111          0.649
    91       135          0.632          92       173          0.625          93        66          0.617
    94        58          0.604          95       104          0.581          96       168          0.571
    97       144          0.567          98       200          0.565          99        18          0.550
   100        42          0.547         101       150          0.501         102         7          0.481
   103       138          0.473         104       194          0.456         105        79          0.447
   106        13          0.438         107        75          0.422         108        70          0.414
   109       192          0.392         110       133          0.385         111       166          0.376
   112       199          0.376         113       198          0.360         114        15          0.352
   115        27          0.338         116       157          0.335         117        89          0.331
   118       137          0.321         119        43          0.312         120       142          0.312
   121       109          0.303         122        36          0.297         123       130          0.281
   124       148          0.279         125       191          0.276         126       106          0.274
   127       161          0.273         128        11          0.272         129        21          0.271
   130        76          0.267         131       127          0.265         132       139          0.261
   133       155          0.256         134       116          0.253         135        47          0.233
   136       188          0.223         137       201          0.200         138       143          0.185
   139       126          0.177         140       175          0.165         141        62          0.159
   142       128          0.144         143        26          0.142         144       105          0.142
   145        60          0.141         146        41          0.129         147        57          0.123
   148        49          0.121         149       180          0.111         150       186          0.103
   151       113          0.099         152       181          0.092         153        19          0.089
   154       162          0.080         155       187          0.077         156        73          0.070
   157        86          0.069         158        67          0.063         159        45          0.043
   160        96          0.043         161        98          0.040         162        99          0.039
   163       154          0.038         164       140          0.032         165        16          0.030
   166       156          0.029         167        34          0.023         168        17          0.021
   169       121          0.021         170       146          0.010         171       172          0.010
   172         9          0.010         173        82          0.010         174       163          0.009
   175       160          0.006         176        92          0.004         177       112          0.000
   178        28          0.000         179       177          0.000         180       189          0.000
   181        90          0.000         182       117          0.000         183        48          0.000
   184       170          0.000         185        68          0.000         186       108          0.000
   187        55          0.000         188       184          0.000         189        39          0.000
   190        31          0.000         191       141          0.000         192        40          0.000
   193       159          0.000         194        93          0.000         195        54          0.000
   196        84          0.000         197       176          0.000         198        77          0.000
   199       178          0.000         200       165          0.000         201        63          0.000
   202       122          0.000         203        46          0.000         204        81          0.000


TECHNICAL 1 OUTPUT

     PARAMETER SPECIFICATION FOR WITHIN

           NU
              X             X&1
              ________      ________
                    0             0

           LAMBDA
              X             X&1
              ________      ________
 X                  0             0
 X&1                0             0

           THETA
              X             X&1
              ________      ________
 X                  0
 X&1                0             0

           ALPHA
              X             X&1
              ________      ________
                    0             0

           BETA
              X             X&1
              ________      ________
 X                  0             0
 X&1                0             0

           PSI
              X             X&1
              ________      ________
 X                  0
 X&1                0             0

     PARAMETER SPECIFICATION FOR BETWEEN

           NU
              X
              ________
                    0

           LAMBDA
              PHI           LOGV          X
              ________      ________      ________
 X                  0             0             0

           THETA
              X
              ________
 X                  0

           ALPHA
              PHI           LOGV          X
              ________      ________      ________
                    1             2             3

           BETA
              PHI           LOGV          X
              ________      ________      ________
 PHI                0             0             0
 LOGV               0             0             0
 X                  0             0             0

           PSI
              PHI           LOGV          X
              ________      ________      ________
 PHI                4
 LOGV               5             6
 X                  7             8             9

     STARTING VALUES FOR WITHIN

           NU
              X             X&1
              ________      ________
                0.000         0.000

           LAMBDA
              X             X&1
              ________      ________
 X              1.000         0.000
 X&1            0.000         1.000

           THETA
              X             X&1
              ________      ________
 X              0.000
 X&1            0.000         0.000

           ALPHA
              X             X&1
              ________      ________
                0.000         0.000

           BETA
              X             X&1
              ________      ________
 X              0.000         0.000
 X&1            0.000         0.000

           PSI
              X             X&1
              ________      ________
 X              0.000
 X&1            0.000         1.049

     STARTING VALUES FOR BETWEEN

           NU
              X
              ________
                0.000

           LAMBDA
              PHI           LOGV          X
              ________      ________      ________
 X              0.000         0.000         1.000

           THETA
              X
              ________
 X              0.000

           ALPHA
              PHI           LOGV          X
              ________      ________      ________
                0.000         0.000         0.892

           BETA
              PHI           LOGV          X
              ________      ________      ________
 PHI            0.000         0.000         0.000
 LOGV           0.000         0.000         0.000
 X              0.000         0.000         0.000

           PSI
              PHI           LOGV          X
              ________      ________      ________
 PHI            1.000
 LOGV           0.000         1.000
 X              0.000         0.000         1.051

     PRIORS FOR ALL PARAMETERS            PRIOR MEAN      PRIOR VARIANCE     PRIOR STD. DEV.

     Parameter 1~N(0.000,infinity)           0.0000            infinity            infinity
     Parameter 2~N(0.000,infinity)           0.0000            infinity            infinity
     Parameter 3~N(0.000,infinity)           0.0000            infinity            infinity
     Parameter 4~IW(0.000,-4)              infinity            infinity            infinity
     Parameter 5~IW(0.000,-4)              infinity            infinity            infinity
     Parameter 6~IW(0.000,-4)              infinity            infinity            infinity
     Parameter 7~IW(0.000,-4)              infinity            infinity            infinity
     Parameter 8~IW(0.000,-4)              infinity            infinity            infinity
     Parameter 9~IW(0.000,-4)              infinity            infinity            infinity


TECHNICAL 2 OUTPUT


TECHNICAL 3 OUTPUT

           ESTIMATED COVARIANCE MATRIX FOR PARAMETER ESTIMATES
                     1             2             3             4             5
              ________      ________      ________      ________      ________
        1   0.303243D-03
        2   0.111479D-02  0.283039D-01
        3   0.573716D-03  0.676933D-02  0.443802D-02
        4  -0.532720D-05  0.110025D-04 -0.339096D-05  0.391271D-04
        5  -0.194184D-04  0.959434D-05 -0.223709D-05  0.151406D-03  0.225353D-02
        6  -0.254633D-04 -0.198442D-02  0.203844D-03  0.528874D-03  0.143923D-01
        7  -0.155307D-04 -0.632696D-04  0.219154D-04  0.734729D-04  0.609272D-03
        8  -0.181356D-04 -0.550199D-03  0.392759D-03  0.269224D-03  0.525530D-02
        9  -0.372339D-04 -0.473692D-03  0.237455D-03  0.130175D-03  0.166469D-02

           ESTIMATED COVARIANCE MATRIX FOR PARAMETER ESTIMATES
                     6             7             8             9
              ________      ________      ________      ________
        6   0.331846D+00
        7   0.346864D-02  0.388632D-03
        8   0.814117D-01  0.196578D-02  0.368172D-01
        9   0.193057D-01  0.115614D-02  0.130115D-01  0.880102D-02

           ESTIMATED CORRELATION MATRIX FOR PARAMETER ESTIMATES
                     1             2             3             4             5
              ________      ________      ________      ________      ________
        1       1.000
        2       0.381         1.000
        3       0.495         0.604         1.000
        4      -0.049         0.010        -0.008         1.000
        5      -0.023         0.001        -0.001         0.510         1.000
        6      -0.003        -0.020         0.005         0.147         0.526
        7      -0.045        -0.019         0.017         0.596         0.651
        8      -0.005        -0.017         0.031         0.224         0.577
        9      -0.023        -0.030         0.038         0.222         0.374

           ESTIMATED CORRELATION MATRIX FOR PARAMETER ESTIMATES
                     6             7             8             9
              ________      ________      ________      ________
        6       1.000
        7       0.305         1.000
        8       0.737         0.520         1.000
        9       0.357         0.625         0.723         1.000


TECHNICAL 8 OUTPUT


   TECHNICAL 8 OUTPUT FOR BAYES ESTIMATION

     CHAIN    BSEED
     1        0
     2        285380

                     POTENTIAL       PARAMETER WITH
     ITERATION    SCALE REDUCTION      HIGHEST PSR
     100              1.056               6
     200              1.000               1
     300              1.007               6
     400              1.000               6
     500              1.002               5
     600              1.008               5
     700              1.002               5
     800              1.001               5
     900              1.003               6
     1000             1.002               5
     1100             1.002               5
     1200             1.001               6
     1300             1.000               5
     1400             1.000               1
     1500             1.000               5
     1600             1.001               5
     1700             1.000               5
     1800             1.001               5
     1900             1.000               3
     2000             1.001               3


SUMMARIES OF PLAUSIBLE VALUES (N = NUMBER OF OBSERVATIONS * NUMBER OF IMPUTATIONS)

     SAMPLE STATISTICS

           Means
              PHI           LOGV          B_X
              ________      ________      ________
                0.222        -1.237         0.907

           Covariances
              PHI           LOGV          B_X
              ________      ________      ________
 PHI            0.050
 LOGV           0.221         5.341
 B_X            0.117         1.304         0.888

           Correlations
              PHI           LOGV          B_X
              ________      ________      ________
 PHI            1.000
 LOGV           0.428         1.000
 B_X            0.558         0.599         1.000


SUMMARY OF PLAUSIBLE STANDARD DEVIATION (N = NUMBER OF OBSERVATIONS)

     SAMPLE STATISTICS

           Means
              PHI_SD        LOGV_SD       B_X_SD
              ________      ________      ________
                0.094         0.126         0.114

           Covariances
              PHI_SD        LOGV_SD       B_X_SD
              ________      ________      ________
 PHI_SD         0.000
 LOGV_SD        0.000         0.002
 B_X_SD         0.000         0.002         0.008

           Correlations
              PHI_SD        LOGV_SD       B_X_SD
              ________      ________      ________
 PHI_SD         1.000
 LOGV_SD       -0.028         1.000
 B_X_SD        -0.257         0.444         1.000


PLOT INFORMATION

The following plots are available:

  Histograms (sample values, estimated factor scores)
  Scatterplots (sample values, estimated factor scores)
  Between-level histograms (sample values, sample/estimated means/variances, estimated factor scores)
  Between-level scatterplots (sample values, sample/estimated means/variances, estimated factor scores)
  Time series plots (sample values, ACF, PACF, estimated factor scores)
  Histogram of subjects per time point
  Time interval plots
  Bayesian posterior parameter distributions
  Bayesian posterior parameter trace plots
  Bayesian autocorrelation plots
  Latent variable distribution plots

DIAGRAM INFORMATION

  Mplus diagrams are currently not available for multilevel analysis.
  No diagram output was produced.

     Beginning Time:  13:15:14
        Ending Time:  13:37:51
       Elapsed Time:  00:22:37



MUTHEN & MUTHEN
3463 Stoner Ave.
Los Angeles, CA  90066

Tel: (310) 391-9971
Fax: (310) 391-8971
Web: www.StatModel.com
Support: Support@StatModel.com

Copyright (c) 1998-2021 Muthen & Muthen

References

Bengtsson, Henrik. 2020. “A Unifying Framework for Parallel and Distributed Processing in r Using Futures,” August. https://arxiv.org/abs/2008.00553.
Bien, Jacob. 2016. “The Simulator: An Engine to Streamline Simulations.” arXiv:1607.00021 [Stat], June. http://arxiv.org/abs/1607.00021.
Hallquist, Michael N., and Joshua F. Wiley. 2018. MplusAutomation : An R Package for Facilitating Large-Scale Latent Variable Analyses in M Plus.” Structural Equation Modeling: A Multidisciplinary Journal 25 (4): 621–38. https://doi.org/10.1080/10705511.2017.1402334.
Haqiqatkhah, MH Manuel, Oisín Ryan, and Ellen L. Hamaker. 2022. “Skewness and Staging: Does the Floor Effect Induce Bias in Multilevel AR(1) Models?” https://doi.org/10.31234/osf.io/myuvr.
Muthén, Linda K., and Bengt O. Muthén. 2017. Mplus Users Guide. Eighth Edition. Los Angeles, CA: Muthén & Muthén.
Schmiedek, Florian, Martin Lövdén, and Ulman Lindenberger. 2010. “Hundred Days of Cognitive Training Enhance Broad Cognitive Abilities in Adulthood: Findings from the COGITO Study.” Frontiers in Aging Neuroscience 2. https://doi.org/10.3389/fnagi.2010.00027.
Tierney, Luke, A. J. Rossini, Na Li, and H. Sevcikova. 2021. Snow: Simple Network of Workstations. https://CRAN.R-project.org/package=snow.

Footnotes

  1. The \(\chi^2\)AR(1), in a more general form, can have an intercept (\(X_{t} = c + \phi X_{t-1} + a_t, \quad a_t \sim \chi^2(\nu)\). Since the intercept was set to zero in the simulation study, we discussed a zero-intercept version of this model (\(c=0\)) in the paper. See the Supplemental Materials for more details.↩︎