Daily dynamics and weekly rhythms: A tutorial on seasonal ARMA models combined with day-of-week effects

Haqiqatkhah, Mohammadhossein Manuel; Hamaker, Ellen L.

doi:10.31234/osf.io/duvqh

Daily dynamics and weekly rhythms

Reproducible Code

Reproducible code for Daily dynamics and weekly rhythms: A tutorial on seasonal ARMA models combined with day-of-week effects.

Authors

Affiliations

Haqiqatkhah, Mohammadhossein Manuel

Utrecht University

Hamaker, Ellen L.

Utrecht University

1 Introduction

This document contains reproducible code of the manuscript by Haqiqatkhah and Hamaker (2024) on combining day-of-week effects and week-to-week dynamics with day-to-day dynamics. For attribution, please cite as

Haqiqatkhah, M. M., & Hamaker, E. L. (2024, February 20). Daily dynamics and weekly rhythms: A tutorial on seasonal ARMA models combined with day-of-week effects. PsyArXiv Preprints. https://doi.org/10.31234/osf.io/duvqh

This document has two main sections:

In Section 2, we present the functions used for making the visualizations (Section 2.1), generating simulated time series (Section 2.2), running the Shiny app accompanying the paper (Section 2.3), and fitting models to empirical data (Section 2.4).

In Section 3, we provide the empirical dataset and additional plots for empirical time series (Section 3.1), which is followed by the reproducible code for generating figures shown in the paper (Section 3.2), running all the the analyses on the empirical dataset (Section 3.3) and making the tables reported in the paper (Section 3).

To replicate the study from the scratch, you should first either clone the repository (using git clone https://github.com/psyguy/WeCycle.git) or download the repository as a zip file and extract it on your machine. Then you can run the .R files you find in the scripts folder with the following order:

source("scripts/initialization.R")
source("scripts/functions_simulation.R")
source("scripts/functions_visualization.R")
source("scripts/functions_modeling.R")
source("scripts/run_figures.R")
source("scripts/run_analyses.R")

Alternatively, if you have Quarto installed installed on your machine, you can compile index.qmd located in the root directory using Quarto after setting the following variables to TRUE:

load_functions <- FALSE
run_figures <- FALSE
run_analyses <- FALSE

2 R functions

The code used for plotting the time series and fitting the models requires the time series $y_t$ to be stored in a data.frame (which, let us call df) with at least three columns:

t: Indicating the time of the measurement
y: The value $y_t$ on time $t$
weekday (or weekday_num): The name (or number) of the weekday corresponding to t. In the case of the former, it should be in the form of capitalized three letter codes ("Mon", "Tue", …, "Sun"). Note that we consider Monday to be the first day of the week, and Sunday be the 0th/7th day of the week.

The column weekday (and weekday_num) can be generated if the date (e.g., as date) is included in df, using lubridate::wday and setting:

df <- df %>% 
  mutate(weekday = lubridate::wday(date,
                                   week_start = 1,
                                   label = TRUE),
         weekday_num = lubridate::wday(date,
                                       week_start = 1,
                                       label = FALSE))

The missing values in the time series should be explicitly indicated with NA in the dataframe—that is, we should have a row for each time point—which can be achieved by:

df <- df %>%
  right_join(data.frame(t = min(df$t):max(df$t)),
             by = "t") %>% 
  arrange(t)

2.1 Time series visualizations

The visualizations shown in discussed in the paper were plotted using separate functions for each plot, that are named accordingly plot_hist() , plot_seq(), plot_dowe(), plot_psd(), plot_acf(), and plot_pacf(). The main argument of these functions is d, which can be a numerical vector (for which the weekdays are added, starting by Monday), or it can be a dataframe with the columns specifiedexplained earlier.

Click to reveal plot_hist()

plot_hist <- function(d = NULL,
                      title = " ",
                      subtitle = NULL,
                      remove_titles = TRUE,
                      remove_xlab = TRUE,
                      scale_rel = 0.9,
                      max_acf.lag = 35,
                      max_period = 15,
                      ymin = 0-0.1,
                      ymax = 4+0.1,
                      max_t = 140,
                      max_weeks = 25,
                      col_weekly = "lightsteelblue4",
                      col_dowe.line = "mediumorchid4",
                      col_dowe.point = "deeppink1",
                      col_ts = "lightsteelblue4",
                      col_ts.point = "cornflowerblue",
                      col_hist = "cornflowerblue",
                      col_acf = "darkolivegreen3",
                      col_pacf = "darkorange3",
                      col_spec = "darkorchid4",
                      col_hlines = "dimgray") {

  # Transforming the input to an appropriate dataframe
  d <- d %>% data_shaper()

  # Making sure the optimal theme is in place
  theme_set(ggthemes::theme_few())

  breaks_y <- seq(floor(ymin),
                  ceiling(ymax))
  # Making sure the limits are not off
  ymin <- min(min(d$y), ymin)
  ymax <- max(max(d$y), ymax)

  p_out <- d %>%
    mutate(group_mean = mean(y,
                             na.rm = TRUE)) %>%
    group_by(weekday,
             .add = TRUE) %>%
    mutate(weekday_mean = mean(y,
                               na.rm = TRUE)) %>%
    ungroup() %>%
    ggplot() +
    aes(x = y) +
    geom_histogram(
      aes(y = after_stat(ndensity)),
      center = 0,
      bins = 40,
      fill = col_hist
    ) +
    geom_vline(
      aes(xintercept = group_mean),
      linetype = "dashed",
      linewidth = rel(scale_rel * 0.2)
    ) +
    labs(subtitle = "Distribution",
         x = "y",
         y = title) +
    scale_x_continuous(breaks = breaks_y) +
    xlim(ymin, ymax) +
    theme(
      panel.grid.major = element_blank(),
      # axis.title.y = element_blank(),
      axis.ticks.y = element_blank(),
      axis.line.y = element_blank(),
      axis.title.y = element_text(size = rel(1.4*scale_rel)),
      axis.text.y = element_blank()
    )

  if (remove_titles)
    p_out <- p_out + theme(plot.subtitle = element_blank())
  if (remove_xlab)
    p_out <- p_out + xlab(NULL)

  p_out

}

Click to reveal plot_seq()

plot_seq <- function(d = NULL,
                     title = NULL,
                     subtitle = NULL,
                     remove_titles = TRUE,
                     remove_xlab = TRUE,
                     scale_rel = 0.9,
                     max_acf.lag = 35,
                     max_period = 15,
                     ymin = 0-0.1,
                     ymax = 4+0.1,
                     max_t = 140,
                     max_weeks = 25,
                     col_weekly = "lightsteelblue4",
                     col_dowe.line = "mediumorchid4",
                     col_dowe.point = "deeppink1",
                     col_ts = "lightsteelblue4",
                     col_ts.point = "cornflowerblue",
                     col_hist = "cornflowerblue",
                     col_acf = "darkolivegreen3",
                     col_pacf = "darkorange3",
                     col_spec = "darkorchid4",
                     col_hlines = "dimgray") {


  # Transforming the input to an appropriate dataframe
  d <- d %>% data_shaper()

  # Making sure the optimal theme is in place
  theme_set(ggthemes::theme_few())

  # Making sure the limits are not off
  ymin <- min(min(d$y), ymin)
  ymax <- max(max(d$y), ymax)

  p_out <- d %>%
    data_shaper() %>%
    mutate(group_mean = mean(y,
                             na.rm = TRUE)) %>%
    group_by(weekday,
             .add = TRUE) %>%
    mutate(weekday_mean = mean(y,
                               na.rm = TRUE)) %>%
    ungroup() %>%
    ggplot() +
    aes(x = t,
        y = y) +
    geom_hline(
      aes(yintercept = group_mean),
      linetype = "dashed",
      linewidth = rel(scale_rel * 0.2)
    ) +
    geom_line(
      color = col_ts,
      alpha = 1,
      linewidth = rel(scale_rel * 0.2)
    ) +
    geom_point(aes(y = y),
               color = col_ts.point,
               size = rel(scale_rel * 0.12)) +
    scale_y_continuous(# breaks = breaks_y,
      limits = c(ymin, ymax)) +
    xlim(0, max_t) +
    labs(subtitle = "Sequence plot",
         x = "t",
         y = "y")

  if (remove_titles)
    p_out <- p_out + theme(plot.subtitle = element_blank())
  if (remove_xlab)
    p_out <- p_out + xlab(NULL)
  p_out

}

Click to reveal plot_dowe()

plot_dowe <- function(d = NULL,
                      title = NULL,
                      subtitle = NULL,
                      remove_titles = TRUE,
                      remove_xlab = TRUE,
                      scale_rel = 0.9,
                      max_acf.lag = 35,
                      max_period = 15,
                      ymin = 0-0.1,
                      ymax = 4+0.1,
                      max_t = 140,
                      max_weeks = 25,
                      col_weekly = "lightsteelblue4",
                      col_dowe.line = "mediumorchid4",
                      col_dowe.point = "deeppink1",
                      col_ts = "lightsteelblue4",
                      col_ts.point = "cornflowerblue",
                      col_hist = "cornflowerblue",
                      col_acf = "darkolivegreen3",
                      col_pacf = "darkorange3",
                      col_spec = "darkorchid4",
                      col_hlines = "dimgray") {

  # Transforming the input to an appropriate dataframe
  d <- d %>% data_shaper()

  # Making sure the optimal theme is in place
  theme_set(ggthemes::theme_few())

  # Making sure the limits are not off
  ymin <- min(min(d$y), ymin)
  ymax <- max(max(d$y), ymax)

  p_out <- d %>%
    data_shaper() %>%
    mutate(group_mean = mean(y,
                             na.rm = TRUE)) %>%
    group_by(weekday,
             .add = TRUE) %>%
    mutate(weekday_mean = mean(y,
                               na.rm = TRUE)) %>%
    ungroup() %>%
    group_by(week_num,
             .add = FALSE) %>%
    filter(week_num <= max_weeks) %>%
    ggplot() +
    aes(x = weekday,
        y = y,
        group = week_num) +
    geom_hline(
      aes(yintercept = group_mean),
      linetype = "dashed",
      linewidth = rel(scale_rel * 0.2),
      alpha = 1
    ) +
    geom_line(
      alpha = 0.5,
      color = col_weekly,
      linewidth = rel(scale_rel * 0.4)
    ) +
    geom_point(aes(y = y),
               alpha = 0.6,
               color = col_ts.point,
               size = rel(scale_rel * 0.12)) +
    geom_line(
      aes(y = weekday_mean),
      color = col_dowe.line,
      alpha = 1,
      linewidth = rel(scale_rel * 1)
    ) +
    geom_point(aes(y = weekday_mean),
               color = col_dowe.point,
               size = rel(scale_rel * 0.8)) +
    labs(subtitle = "Weekly plot",
         x = "Weekdays",
         y = "y") +
    scale_y_continuous(#breaks = breaks_y,
      limits = c(ymin, ymax)) +
    theme(panel.grid.major.x = element_blank(),
          axis.title.y = element_blank(),
          axis.text.x = element_text(angle = 90,
                                     size = rel(1 * scale_rel)))

  if (remove_titles)
    p_out <- p_out + theme(plot.subtitle = element_blank())
  if (remove_xlab)
    p_out <- p_out + xlab(NULL)
  p_out

}

Click to reveal plot_psd()

plot_psd <- function(d = NULL,
                     title = NULL,
                     subtitle = NULL,
                     remove_titles = TRUE,
                     remove_xlab = TRUE,
                     scale_rel = 0.9,
                     max_acf.lag = 35,
                     max_period = 15,
                     ymin = 0-0.1,
                     ymax = 4+0.1,
                     max_t = 140,
                     max_weeks = 25,
                     col_weekly = "lightsteelblue4",
                     col_dowe.line = "mediumorchid4",
                     col_dowe.point = "deeppink1",
                     col_ts = "lightsteelblue4",
                     col_ts.point = "cornflowerblue",
                     col_hist = "cornflowerblue",
                     col_acf = "darkolivegreen3",
                     col_pacf = "darkorange3",
                     col_spec = "darkorchid4",
                     col_hlines = "dimgray") {

  # Transforming the input to an appropriate dataframe
  d <- d %>% data_shaper()

  # Making sure the optimal theme is in place
  theme_set(ggthemes::theme_few())

  # Imputing the missing values using seasonal Kalman smoothing
  y_imp <- d$y %>%
    ts(frequency = 7) %>%
    imputeTS::na_kalman()

  ## Calculate Fourier components
  y_imp <- y_imp %>% as.numeric
  n <- length(y_imp)
  Freq = (1:n - 1) / n
  var_component <- Mod(fft(scale(y_imp, scale = FALSE))) ^ 2 / n^2
  df_fourier <- data.frame(Freq = Freq,
                           rel_power = 100*var_component/var(y_imp))%>%
    mutate(Period = round(1 / Freq, 1)) %>%
    filter(Freq != 0,
           Freq <= 0.5,
           Period <= max_period) %>%
    summarise(rel_power = sum(rel_power),
              .by = Period) %>%
    mutate(Freq = 1 / Period,
           .before = 1)

  max_var_component <- max(df_fourier$rel_power)

  p_out <- df_fourier %>%
    ggplot() +
    aes(
      x = Period,
      xend = Period,
      y = rel_power,
      yend = 0
    ) +
    geom_rect(
      aes(
        xmin = 6.5,
        xmax = 7.5,
        ymin = 0,
        ymax = Inf
      ),
      alpha = 0.7,
      fill = "azure2"
    ) +
    geom_segment(linewidth = rel(scale_rel * 0.5),
                 color = col_spec) +
    scale_x_continuous(breaks =
                         seq(0,
                             max_period - 0.5,
                             7)) +
    scale_y_continuous(# labels = scaleFUN,
      breaks = scales::breaks_pretty(4),
      limits = c(0, max(1, max_var_component))) +
    theme(
      # axis.title.y = element_blank(),
      # axis.ticks.y = element_blank(),
      # axis.line.y = element_blank(),
      # axis.text.y = element_blank(),
      panel.grid.major = element_blank()
    ) +
    labs(subtitle = "Power spectral density",
         x = "Period (in days)",
         y = "% total power")

  if(max_var_component < 1)
    p_out <- p_out +
    scale_y_continuous(breaks = c(0,1),
                       limits = c(0, 1))

  if (remove_titles)
    p_out <- p_out + theme(plot.subtitle = element_blank())
  if (remove_xlab)
    p_out <- p_out + xlab(NULL)
  p_out

}

Click to reveal plot_acf()

plot_acf <- function(d = NULL,
                     title = NULL,
                     subtitle = NULL,
                     remove_titles = TRUE,
                     remove_xlab = TRUE,
                     scale_rel = 0.9,
                     max_acf.lag = 35,
                     max_period = 15,
                     ymin = 0-0.1,
                     ymax = 4+0.1,
                     max_t = 140,
                     max_weeks = 25,
                     col_weekly = "lightsteelblue4",
                     col_dowe.line = "mediumorchid4",
                     col_dowe.point = "deeppink1",
                     col_ts = "lightsteelblue4",
                     col_ts.point = "cornflowerblue",
                     col_hist = "cornflowerblue",
                     col_acf = "darkolivegreen3",
                     col_pacf = "darkorange3",
                     col_spec = "darkorchid4",
                     col_hlines = "dimgray") {

  # Transforming the input to an appropriate dataframe
  d <- d %>% data_shaper()

  # Making sure the optimal theme is in place
  theme_set(ggthemes::theme_few())

  # Imputing the missing values using seasonal Kalman smoothing
  y_imp <- d$y %>%
    ts(frequency = 7) %>%
    imputeTS::na_kalman()

  breaks_acf <- seq(0,
                    max_acf.lag,
                    by = 14 * floor(max_acf.lag / 7 / 3))

  df_acf <- data.frame(
    lag = c(0:max_acf.lag),
    acf = stats::acf(y_imp,
                     lag.max = max_acf.lag,
                     plot = FALSE)$acf %>%
      as.numeric()
  )

  p_out <- df_acf %>%
    ggplot(aes(x = lag,
               y = acf)) +
    geom_segment(
      aes(
        x = lag,
        xend = lag,
        y = 0,
        yend = acf
      ),
      linewidth = rel(scale_rel * 35 / max_acf.lag / 2),
      color = col_acf,
      lineend = "butt"
    ) +
    geom_hline(
      yintercept = 0,
      linewidth = rel(scale_rel * 0.3),
      linetype = "solid",
      color = col_hlines
    ) +
    scale_x_continuous(breaks = breaks_acf) +
    scale_y_continuous(breaks = c(-.5, 0, 0.5, 1),
                       limits = c(-.25, 1.1)) +
    labs(subtitle = "ACF",
         x = "Lag",
         y = "") +
    theme(
      panel.grid.major = element_blank(),
      axis.title.y = element_blank()
    )

  if (remove_titles)
    p_out <- p_out + theme(plot.subtitle = element_blank())
  if (remove_xlab)
    p_out <- p_out + xlab(NULL)
  p_out

}

Click to reveal plot_pacf()

plot_pacf <- function(d = NULL,
                      title = NULL,
                      subtitle = NULL,
                      remove_titles = TRUE,
                      remove_xlab = TRUE,
                      scale_rel = 0.9,
                      max_pacf.lag = 35,
                      max_period = 15,
                      ymin = 0-0.1,
                      ymax = 4+0.1,
                      max_t = 140,
                      max_weeks = 25,
                      col_weekly = "lightsteelblue4",
                      col_dowe.line = "mediumorchid4",
                      col_dowe.point = "deeppink1",
                      col_ts = "lightsteelblue4",
                      col_ts.point = "cornflowerblue",
                      col_hist = "cornflowerblue",
                      col_acf = "darkolivegreen3",
                      col_pacf = "darkorange3",
                      col_spec = "darkorchid4",
                      col_hlines = "dimgray") {

  # Transforming the input to an appropriate dataframe
  d <- d %>% data_shaper()

  # Making sure the optimal theme is in place
  theme_set(ggthemes::theme_few())

  # Imputing the missing values using seasonal Kalman smoothing
  y_imp <- d$y %>%
    ts(frequency = 7) %>%
    imputeTS::na_kalman()

  breaks_acf <- seq(0,
                    max_pacf.lag,
                    by = 14 * floor(max_pacf.lag / 7 / 3))

  df_pacf <- data.frame(
    lag = c(0:max_pacf.lag),
    pacf = stats::pacf(y_imp,
                       max(1, max_pacf.lag),
                       na.action = na.exclude,
                       plot = FALSE)$acf %>%
      as.numeric() %>%
      c(0, .)
  )

  p_out <- df_pacf %>%
    ggplot(aes(x = lag,
               y = pacf)) +
    geom_segment(
      aes(
        x = lag,
        xend = lag,
        y = 0,
        yend = pacf
      ),
      linewidth = rel(scale_rel * 35 / max_pacf.lag / 2),
      color = col_pacf,
      lineend = "butt"
    ) +
    geom_hline(
      yintercept = 0,
      linewidth = rel(scale_rel * 0.3),
      linetype = "solid",
      color = col_hlines
    ) +
    scale_x_continuous(breaks = breaks_acf) +
    scale_y_continuous(breaks = c(-.5, 0, 0.5, 1),
                       limits = c(-.25, 1.1)) +
    labs(subtitle = "PACF",
         x = "Lag",
         y = "") +
    theme(
      panel.grid.major = element_blank(),
      axis.title.y = element_blank()
    )

  if (remove_titles)
    p_out <- p_out + theme(plot.subtitle = element_blank())
  if (remove_xlab)
    p_out <- p_out + xlab(NULL)
  p_out

}

To make sure that the data provided to these functions are in the correct format (with the above-mentioned columns), the function data_shaper() is called within these plotting functions to do the job. This function also amends a new column (week_num) to the dataframe that counts the week number since the start of the time series, which is needed for plotting the DOWEs in plot_dowe().

Click to reveal data_shaper()

data_shaper <- function(d,
                        minimal_output = FALSE) {

  weekdays <- c("Mon",
                "Tue",
                "Wed",
                "Thu",
                "Fri",
                "Sat",
                "Sun")

  if (!is.data.frame(d)){
    d <- data.frame(
      t = 1:length(d),
      y = d,
      weekday = rep(weekdays, length.out = length(d)),
      week_num = rep(1:ceiling(length(d) / 7), each = 7)[1:length(d)]
    )
    # if(minimal_output == TRUE) return(d)
  }

  if ("date" %in% colnames(d))
    d <- d %>%
      mutate(
        weekday = lubridate::wday(date,
                                  week_start = 1,
                                  label = TRUE),
        weekday_num = lubridate::wday(date,
                                      week_start = 1,
                                      label = FALSE)
      )

  if (!("weekday_num" %in% colnames(d)))
    d$weekday_num <- match(d$weekday, weekdays)

  if (!("week_num" %in% colnames(d))) {
    # Initialize week number and an empty vector to store week numbers
    week_number <- 1
    week_numbers <- numeric(length(d$weekday_num))
    # Check if the sequence starts with a day other than Monday and adjust week_number accordingly
    if (d$weekday_num[1] != 1) {
      week_number <- 1
    } else {
      week_number <-
        2  # Start from week 2 if the first day is Monday, to handle edge cases
    }
    # Iterate through the days, increasing week number after encountering a Sunday
    for (i in 1:length(d$weekday_num)) {
      week_numbers[i] <- week_number
      if (d$weekday_num[i] == 7 &&
          i != length(d$weekday_num)) {
        # Check for Sunday and not the last element
        week_number <- week_number + 1
      }
    }
    d$week_num <- week_numbers
  }

  # Substituting NA's for implicit missing values
  d_out <- d %>%
    right_join(data.frame(t = min(d$t):max(d$t)),
               by = "t") %>%
    mutate(weekday = weekday %>% factor(weekdays)) %>%
    arrange(t)

  return(d_out)

}

Then, the function plot_row_assembly() uses the above functions to put them together in a row and returns either the plot, or saves it in a file in the figures folder with dimensions and sizes that render in nice proportions when the plot is saved with a .svg (good for putting in Word or online) or .pdf (good for $\LaTeX$ manuscripts) file formats.

This function takes a list of time series (either as vectors or dataframes) in its list_data argument, and adds vertical labels to each row of the plots with the values passed to list_labels.

Click to reveal plot_row_assembly()

plot_row_assembly <- function(list_data,
                              list_labels = NULL,
                              save_name = "plot-rows.pdf",
                              save_dir = "figures",
                              ...) {
  n_rows <- length(list_data)

  l_hist <- list()
  l_seq <- list()
  l_dowe <- list()
  l_psd <- list()
  l_acf <- list()
  l_pacf <- list()

  title_r <- NULL

  for (r in 1:n_rows) {
    d <- list_data[[r]]

    if (length(list_labels) == n_rows)
      title_r <- list_labels[[r]]

    rm_titles <- TRUE
    rm_xlab <- TRUE

    if (r == 1) {
      rm_titles <- FALSE
      rm_xlab <- TRUE
    }

    if (r == n_rows) {
      rm_titles <- TRUE
      rm_xlab <- FALSE
    }

    if (n_rows == 1) {
      rm_titles <- FALSE
      rm_xlab <- FALSE
    }

    l_hist[[r]] <-  #label_plot(r) +
      plot_hist(
        d,
        title = title_r,
        remove_titles = rm_titles,
        remove_xlab = rm_xlab,
        ...
      )
    l_seq[[r]] <-  plot_seq(d,
                            remove_titles = rm_titles,
                            remove_xlab = rm_xlab,
                            ...)
    l_dowe[[r]] <-  plot_dowe(d,
                              remove_titles = rm_titles,
                              remove_xlab = rm_xlab,
                              ...)
    l_psd[[r]] <-  plot_psd(d,
                            remove_titles = rm_titles,
                            remove_xlab = rm_xlab,
                            ...)
    l_acf[[r]] <-  plot_acf(d,
                            remove_titles = rm_titles,
                            remove_xlab = rm_xlab,
                            ...)
    l_pacf[[r]] <-  plot_pacf(d,
                              remove_titles = rm_titles,
                              remove_xlab = rm_xlab,
                              ...)

  }


  p_tot <- cowplot::plot_grid(
    l_hist %>% wrap_plots(ncol = 1),
    l_seq %>% wrap_plots(ncol = 1),
    l_dowe %>% wrap_plots(ncol = 1),
    l_psd %>% wrap_plots(ncol = 1),
    l_acf %>% wrap_plots(ncol = 1),
    l_pacf %>% wrap_plots(ncol = 1),
    nrow = 1,
    rel_widths = c(.75, 2, 1, 1, 1, 1)
  )

  if (is.character(save_name))
    ggsave(
      here(save_dir, save_name),
      p_tot,
      width = 40,
      height = 4 * n_rows + 0.5,
      units = "cm"
    )
  else
    return(p_tot)

}

2.2 Simulating time series

Pure time series (i.e., without time index and weekday) are generated using m_sim() that generate $y_t = \mu_t + a_t$ by adding a deterministic vector mu_t to a stochastic time series a_t. The innovations used in for simulation are generated from a random seed set by seed argument. The arguments burnin indicate the number of samples thrown away as burn-in samples.

The function generates three time-varying versions of $\mu_t$ as explained in the paper in three vectors:

$D_t$ (as d_t_0) based on the weekday means (DOWEs) that are passed as a vector to dowe (default: all zeros);
$W_t$ (as w_t_0) with workdays mean of 0, and weekend effect determined by the argument wee; and
$H_t$ (as h_t_0) with overall mean of 0 and given amplitude amp (default: 0), and peak shift peak_shift (default: 1, for Monday).

The time-varying mean mu_t is constructed by adding c to d_t_0, w_t_0, h_t_0. This approach allows users to investigate the effect imposing multiple mechanisms for the mean structure $\mu_t$. Needless to say, $D_t$, in general, can account for any arbitrary shape in the mean structure; yet, our implementation may come in handy in the Shiny app (see Section 2.3).

The stochastic component $a_t$ is generated using sarima::sim_sarima(). Since this function cannot generate white noise process, if no SAR[I]MA component is specified, a_t is generated manually. In our simulation function m_sim() we allow including non-seasonal and seasonal differencing in the model (which we did not discuss in the paper, but implemented in the Shiny app).

Click to reveal m_sim()

m_sim <- function(n = 10000,
                  ar = 0,
                  ma = 0,
                  sar = 0,
                  sma = 0,
                  c = 0,
                  iorder = 0,
                  siorder = 0,
                  sigma2 = 1,
                  dowe = rep(0, 7),
                  wee = 0,
                  amp = 0,
                  peak_shift = 1,
                  burnin = 500,
                  seed = 0) {

  ## Generating (time-varying) mean structure mu_t
  # Generating sequence of d_t
  d_t_0 <- cbind(
    Mon = rep(c(1, 0, 0, 0, 0, 0, 0), length.out = n),
    Tue = rep(c(0, 1, 0, 0, 0, 0, 0), length.out = n),
    Wed = rep(c(0, 0, 1, 0, 0, 0, 0), length.out = n),
    Thu = rep(c(0, 0, 0, 1, 0, 0, 0), length.out = n),
    Fri = rep(c(0, 0, 0, 0, 1, 0, 0), length.out = n),
    Sat = rep(c(0, 0, 0, 0, 0, 1, 0), length.out = n),
    Sun = rep(c(0, 0, 0, 0, 0, 0, 1), length.out = n)
  ) %*%
    dowe %>%
    as.numeric()

  ## Generating sequence of w_t_0, which is w_t with c = 0
  # Making weekday-weekend dummies matrix
  w_t_0 <- c(rep(0, 5), rep(wee, 2)) %>% rep(length.out = n)

  # Generating sequence of h_t_0, which is h_t with c = 0
  h_t_0 <- amp*cos((2*pi/7)*((1:n) - peak_shift))

  # Adding d_t, h_t, and w_t together for extra capabilities
  mu_t <- c + d_t_0 + w_t_0 + h_t_0

  ## Generating stochastic component a_t
  # Setting the seed
  set.seed(seed)

  # Making the list of model specifications
  l_model = list(ar = ar,
                 ma = ma,
                 sar = sar,
                 sma = sma,
                 iorder = iorder,
                 siorder = siorder,
                 nseasons = 7,
                 sigma2 = sigma2
  )

  # simulating the data
  if (sum(abs(as.numeric(l_model))) > 7 + sigma2)
    a_t <- sarima::sim_sarima(n = n,
                              model = l_model,
                              n.start = burnin)
  else
    a_t <- rnorm(n + burnin,
                 mean = 0,
                 sd = sqrt(sigma2)) %>%
    tail(n)

  ## Making the complete time series
  y_t <- mu_t + a_t
  return(y_t)
}

2.3 Shiny app

The Shiny app makes use of plot_sim_rows() to simulate data and make visualizations. The code for this app is as follows:

Click to reveal the Shiny app source code

library(shiny)
library(patchwork)
library(tidyverse)
library(imputeTS)
library(forecast)
library(stats)
library(sarima)
library(lubridate)

## To run the app locally, uncomment the following two lines and
## comment the next two `source(...)` function calls:
# source(here::here("scripts/functions_simulation.R"))
# source(here::here("scripts/functions_visualization.R"))

## To deploy on server, comment the above `source(...)`
## function calls and uncomment the following:
source("functions_simulation.R")
source("functions_visualization.R")

options(warn = -1)

ui <- fluidPage(
  tags$style(
    type = "text/css",
    "
               .shiny-input-container { overflow: hidden; }
               .shiny-input-spinner { display: none; }
               .shiny-slider-input { display: inline-block; width: 70%; vertical-align: middle; }
               .slider-label { display: inline-block; width: 20%; text-align: right; margin-right: 10px; vertical-align: middle; }
               "
  ),

  titlePanel("Simulating SARMA processes with time-dependent mean structures"),

  sidebarLayout(
    sidebarPanel(
      style = "height: 90vh; overflow-y: auto;",
      # Make the sidebar responsive

      actionButton("go", "Update the plots"),
      # N, seed, mean
      hr(style = "border-top: 1px solid #000000;"),

      fluidRow(
        column(
          3,
          numericInput(
            "fixed_n",
            "N (x100):",
            value = 1.5,
            width = '120px',
            step = 1
          )
        ),
        column(
          3,
          numericInput("fixed_sigma2", "sigma2:", value = 1, width = '60px')
        ),
        column(
          3,
          numericInput(
            "fixed_seed",
            "Seed:",
            value = 0,
            step = 1,
            width = '60px'
          )
        ),
        column(3, numericInput(
          "fixed_c", "c:", value = 0, width = '60px'
        ))
      ),
      hr(style = "border-top: 1px solid #000000;"),

      # phi, Phi
      fluidRow(
        column(
          6,
          tags$div(class = "slider-label", "ϕ:"),
          sliderInput(
            "fixed_ar",
            NULL,
            min = -0.9,
            max = 0.9,
            value = 0.45,
            step = 0.15
          )
        ),
        column(
          6,
          tags$div(class = "slider-label", "θ:"),
          sliderInput(
            "fixed_ma",
            NULL,
            min = -0.9,
            max = 0.9,
            value = 0.5,
            step = 0.15
          )
        )
      ),

      # theta, Theta
      fluidRow(
        column(
          6,
          tags$div(class = "slider-label", "Φ:"),
          sliderInput(
            "fixed_sar",
            NULL,
            min = -0.9,
            max = 0.9,
            value = 0.2,
            step = 0.15
          )
        ),
        column(
          6,
          tags$div(class = "slider-label", "Θ:"),
          sliderInput(
            "fixed_sma",
            NULL,
            min = -0.9,
            max = 0.9,
            value = 0.1,
            step = 0.1
          )
        )
      ),
      hr(style = "border-top: 1px solid #000000;"),

      # Days
      fluidRow(
        column(3, numericInput(
          "Mon", "Mon:", value = 1, width = '60px'
        )),
        column(3, numericInput(
          "Tue", "Tue:", value = 0, width = '60px'
        )),
        column(3, numericInput(
          "Wed", "Wed:", value = 2, width = '60px'
        )),
        column(3, numericInput(
          "Thu", "Thu:", value = -2, width = '60px'
        ))
      ),

      fluidRow(
        column(3, numericInput(
          "Fri", "Fri:", value = 3, width = '60px'
        )),
        column(3, numericInput(
          "Sat", "Sat:", value = 1, width = '60px'
        )),
        column(3, numericInput(
          "Sun", "Sun:", value = -1, width = '60px'
        )),
        column(3, numericInput(
          "Mult", "Scale:", value = 0.5, width = '60px'
        )),
        column(3) # Empty column to align the days correctly
      ),

      # Amplitude and phase shift
      fluidRow(
        column(
          4,
          numericInput("fixed_amp", "Amplitude:", value = 1.5, width = '100px')
        ),
        column(
          4,
          tags$div(class = "slider-label", "φ:"),
          sliderInput(
            "fixed_peak_shift",
            NULL,
            min = 0,
            max = 7,
            step = 0.5,
            value = 2
          )
        ),
        # Weekend effect
        column(
          4,
          numericInput(
            "fixed_wee",
            "Weekend effect:",
            value = 2,
            width = '100px'
          )
        )
      ),
      width = 4
    ),

    # Plot
    mainPanel(plotOutput(
      "myImage", height = "150px", width = "350px"
    ),
    width = 8)
  )
)


server <- function(input, output) {
  p <- eventReactive(input$go, {
    plot_sim_rows(
      fixed_c = input$fixed_c,
      fixed_dowe = c(
        input$Mon,
        input$Tue,
        input$Wed,
        input$Thu,
        input$Fri,
        input$Sat,
        input$Sun
      ) * input$Mult,
      fixed_amp = input$fixed_amp,
      fixed_peak_shift = input$fixed_peak_shift,
      fixed_wee = input$fixed_wee,
      fixed_sigma2 = input$fixed_sigma2,
      fixed_ma = input$fixed_ma,
      fixed_ar = input$fixed_ar,
      fixed_sar = input$fixed_sar,
      fixed_sma = input$fixed_sma,
      fixed_n = input$fixed_n * 100,
      fixed_seed = input$fixed_seed,
      prefix = NULL,
      for_shiny = TRUE,
      # prefix = NULL,
      file_format = "svg"
    )

  },
  ignoreNULL = FALSE)


  output$myImage <- renderPlot({
    print(p())
  }, width = 40 * 28.35 / 1.4,
  height = (4 * 7 + 0.5) * 28.35 / 2) # Convert cm to pixels if needed

}


shinyApp(ui = ui, server = server)

The app is hoseted online at https://psyguy.shinyapps.io/weekly-rhythms. To run the app locally, run source(here::here("scripts", "app.R").

2.4 Modeling time series data

There are to functions that are used for analyzing the data: m_fit() does the model fitting, and m_estimates() extracts the parameter estimates and post-processes them to get CIs and information criteria, and, if necessary, performs bootstrapping.

2.4.1 Fitting time series models

To carry out the analyses in the paper, we wrote a wrapper function called m_fit() around forecast::Arima() with additional capabilities which is a generalization of the code snippets shown in the paper.

This function flexibly determines the model orders by parsing the model name (as they were presented in the paper) passed as a string to model_string, which helps using it in elsewhere, e.g., in a Shiny app.

Click to reveal m_fit()

m_fit <- function(d,
                  model_string = "d + sarma(1,1)(1,1)",
                  id = NULL,
                  save_fit = FALSE,
                  save_est = FALSE,
                  save_name = NULL,
                  save_prefix_fit = "fit",
                  save_folder_fit = "fits",
                  save_prefix_est = "est",
                  save_folder_est = "ests",
                  ...) {

  model_string <- model_string %>%
    tolower() %>%
    gsub(" ", "", .)

  # Extracting mean structure
  mu <- "c"
  if (grepl("d", model_string, fixed = TRUE))
    mu <- "d"
  if (grepl("w", model_string, fixed = TRUE))
    mu <- "w"
  if (grepl("h", model_string, fixed = TRUE))
    mu <- "h"

  # Extracting [S]AR[I]MA orders
  orders <- model_string %>%
    str_extract_all("\\d+") %>%
    unlist() %>%
    as.numeric()


  # Adding NA to make sure the first element is for Monday
  y_t <- d$y %>%
    c(rep(NA, (d$weekday_num[1] - 1) %% 7),
      .)
  # Getting the length of the time series
  n <- length(y_t)

  ## Building mu_t
  # To make use Arima consistently, instead of changing
  # include.mean for mu_y = c, we model mean with a constant
  # variable for mu_t
  if (mu == "c")
    mu_t <- rep(1, n)

  # Making the dummies matrix
  if (mu == "d")
    mu_t <- cbind(
      Mon = rep(c(1, 0, 0, 0, 0, 0, 0), length.out = n),
      Tue = rep(c(0, 1, 0, 0, 0, 0, 0), length.out = n),
      Wed = rep(c(0, 0, 1, 0, 0, 0, 0), length.out = n),
      Thu = rep(c(0, 0, 0, 1, 0, 0, 0), length.out = n),
      Fri = rep(c(0, 0, 0, 0, 1, 0, 0), length.out = n),
      Sat = rep(c(0, 0, 0, 0, 0, 1, 0), length.out = n),
      Sun = rep(c(0, 0, 0, 0, 0, 0, 1), length.out = n)
    )

  # Making weekday-weekend dummies matrix
  if (mu == "w")
    mu_t <- cbind(
      # Weekday = rep(c(1, 1, 1, 1, 1, 0, 0), length.out = n),
      c = rep(c(1, 1, 1, 1, 1, 1, 1), length.out = n),
      Weekend = rep(c(0, 0, 0, 0, 0, 1, 1), length.out = n)
    )

  # Making the harmonic matrix
  if (mu == "h")
    mu_t <- cbind(
      c = rep(1, n),
      a_cos = cos((1:n) * 2 * pi / 7),
      b_sin = sin((1:n) * 2 * pi / 7)
    )


  ## Defining the model orders from the model string
  # Default: white noise
  order_daily <- c(0, 0, 0)
  # Default: non-seasonal
  order_seasonal <- c(0, 0, 0)
  seasonal <- NULL

  # Setting orders conditional on the model string
  if (grepl("ar(", model_string, fixed = TRUE)) {
    order_daily <- c(orders[1], 0, 0)
  }
  if (grepl("sar(", model_string, fixed = TRUE)) {
    order_daily <- c(orders[1], 0, 0)
    order_seasonal <- c(orders[2], 0, 0)
  }
  if (grepl("ma(", model_string, fixed = TRUE)) {
    order_daily <- c(0, 0, orders[1])
  }
  if (grepl("sma(", model_string, fixed = TRUE)) {
    order_daily <- c(0, 0, orders[1])
    order_seasonal <- c(0, 0, orders[2])
  }
  if (grepl("arma(", model_string, fixed = TRUE)) {
    order_daily <- c(orders[1], 0, orders[2])
  }
  if (grepl("sarma(", model_string, fixed = TRUE)) {
    order_daily <- c(orders[1], 0, orders[2])
    order_seasonal <- c(orders[3], 0, orders[4])
  }

  if (sum(order_seasonal) != 0)
    seasonal <- list(order = order_seasonal,
                     period = 7)

  ## Fitting the model to the time series
  m <- forecast::Arima(
    y_t,
    order = order_daily,
    seasonal = seasonal,
    xreg = mu_t,
    include.mean = FALSE
  )

  if (is.null(save_name))
    save_name <- paste0(
      mu,
      "+sarma(",
      order_daily[1],
      ",",
      order_daily[3],
      ")(",
      order_seasonal[1],
      ",",
      order_seasonal[3],
      ")"
    )

  if(!is.null(id)) save_name <- paste0("id-",
                                       id,
                                       "_",
                                       save_name)
  if (save_fit == TRUE)
    saveRDS(m,
            file = save_name %>%
              paste0(save_prefix_fit,
                     "_",
                     .,
                     ".rds") %>%
              here::here(save_folder_fit,
                         .))
  if (save_est == TRUE)
    m %>%
    m_estimates(...) %>%
    saveRDS(file = save_name %>%
              paste0(save_prefix_est,
                     "_",
                     .,
                     ".rds") %>%
              here::here(save_folder_est,
                         .))
  ## Returning the model object
  return(m)
}

This function can also save the fit object in a folder (specified by save_folder_fit). To streamline the model fitting and post-processing the fit files, m_fit() will call m_estimates() if save_est is set to TRUE.

2.4.2 Extracting parameter estimates

The output of m_fit() is a list of class Arima that contains multiple components (see ?stats::arima and ?forecast::Arima), but does not contain the estimation standard errors and significances. Furthermore, in case $H_t$ is included in the model, the output does not contain estimates for peak shift $\psi$ and amplitude $S$.

The function m_estimates() extracts the point estimates of the parameters( m$coef) and calculates their estimation standard errors using the estimated variance matrix of the coefficients (m$var.coef), and calculates 2.5% and 97.5% confidence intervals and significance of the estimates, and stores them in a dataframe called estimates. The function returns a list that contains this dataframe as well as a 1-row dataframe information_criteria that includes different information criteria (AIC, AICc, BIC) as well as the absolute value of the log-likelihood (that could have been used as a criterion in model selection, which we did not use in our paper).

Click to reveal m_estimates()

m_estimates <- function(m,
                        boot_plot = FALSE,
                        boot_n = 10000,
                        boot_seed = 0) {
  # Point estimates of the parameters
  o_est <- coef(m)
  # Varviance-covariance matrix of the parameter estimates
  o_vcov <- vcov(m)

  # Changing the name of "xreg" to "c" for mu_t == "c"
  xreg_index <- names(o_est) == "xreg"
  names(o_est)[xreg_index] <- "c"
  rownames(o_vcov)[xreg_index] <- "c"
  colnames(o_vcov)[xreg_index] <- "c"

  o_sd <- o_vcov %>%
    diag() %>%
    sqrt()

  # Making the output table
  estimates <- data.frame(
    parameter = "sigma2",
    est = m$sigma2,
    se = NA,
    CI_2.5 = NA,
    CI_97.5 = NA,
    sig = "*"
  )
  rownames(estimates) <- "sigma2"

  if (length(o_est) > 0)
    estimates <- data.frame(parameter = o_est %>% names,
                            est = o_est,
                            se = o_sd) %>%
    mutate(
      CI_2.5 = est + qnorm(0.025) * se,
      CI_97.5 = est + qnorm(0.975) * se,
      sig = case_when(CI_2.5 * CI_97.5 > 0 ~ "*",
                      TRUE ~ "ns")
    ) %>%
    rbind(estimates)

  rownames(estimates) <- NULL

  # Bootstrapping -----------------------------------------------------------

  # Checking if h_t was fitted which would require bootstrapping
  if (sum(names(m$coef) == "a_cos",
          names(m$coef) == "b_sin")) {
    # Selecting est and vcov of only the relevant parameters for h_t
    o_est_h <- o_est[c("c", "a_cos", "b_sin")]
    o_vcov_h <-
      o_vcov[c("c", "a_cos", "b_sin"), c("c", "a_cos", "b_sin")]

    # Bootstrap sampling
    set.seed(boot_seed)

    boot_parameters <- MASS::mvrnorm(boot_n,
                                     o_est_h,
                                     o_vcov_h)

    boot_c <- boot_parameters[, "c"]
    boot_a <- boot_parameters[, "a_cos"]
    boot_b <- boot_parameters[, "b_sin"]

    boot_s <- sqrt(boot_a^2 + boot_b^2)
    boot_psi_radian <- atan2(boot_b, boot_a) %>%
      as.numeric() %>%
      as.circular()
    boot_psi_radian_quantiles <- boot_psi_radian %>%
      as.circular() %>%
      quantile(c(.025, .5, .975)) %>%
      as.numeric()

    boot_psi <- boot_psi_radian * 7 / (2 * pi)
    boot_psi_radian_mod <- boot_psi_radian %% (2 * pi)
    # plot(boot_s, boot_psi_radian)

    # Bootstrapped summaries
    b_c <- mean(boot_c)
    b_s <- median(boot_s)
    b_s_sd <- sd(boot_s)
    b_s_CIs <- as.numeric(quantile(boot_s, c(0.025, 0.975)))
    b_psi <- as.numeric(median(boot_psi_radian)) * 7 / (2 * pi)
    b_psi_sd <- as.numeric(sd(boot_psi_radian)) * 7 / (2 * pi)
    b_psi_CIs <-
      as.numeric(quantile(boot_psi_radian, c(0.025, 0.975))) * 7 / (2 * pi)

    # Adding the estimates to the table
    boot_estimates <- data.frame(
      parameter = c("amp", "peak_shift"),
      est = c(b_s, b_psi),
      se = c(b_s_sd, b_psi_sd),
      CI_2.5 = c(b_s_CIs[1], b_psi_CIs[1]),
      CI_97.5 = c(b_s_CIs[2], b_psi_CIs[2]),
      sig = c("*", "*")
    )

    estimates <- rbind(estimates, boot_estimates)

    if (boot_plot) {
      t <- seq(0, 7, .1)
      tt <- length(t)
      rr <- 100
      df <- data.frame(
        r = rep(1:rr, each = tt) %>% as.character(),
        c = rep(boot_c %>% head(rr), each = tt),
        s = rep(boot_s %>% head(rr), each = tt),
        psi = rep(boot_psi %>% head(rr), each = tt),
        t = rep(t, times = rr)
      ) %>%
        mutate(h_t = c + s * cos((t - psi) * 2 * pi / 7))

      p_boot <- df %>%
        group_by(r) %>%
        ggplot(alpha = 0.05,
               lwd = 0.1,
               linetype = "solid") +
        aes(x = t,
            y = h_t,
            group = r) +
        # Separate curves for the first 100 bootstrapped samples
        geom_line(aes(
          x = t,
          y = h_t,
          group = r,
          color = "single_curves"
        )) +
        # Average of the separate curves the first 100 bootstrapped samples
        stat_summary(
          fun.y = mean,
          geom = "line",
          lwd = 1.5,
          aes(group = -1, color = "mean_curve"),
          alpha = 1
        ) +
        # Curve calculated by bootstrapping point estimates
        geom_function(
          fun = function(t)
            b_c + b_s * cos(2 * pi / 7 * (t - b_psi)),
          mapping = aes(color = "boot_curve"),
          linetype = "solid",
          lwd = .5,
          alpha = 0.5
        ) +
        geom_hline(yintercept = b_c + b_s,
                   color = "red",
                   linetype = "solid",
                   lwd = 1,
                   alpha = 0.85) +
        geom_vline(xintercept = b_psi %% 7,
                   color = "red",
                   linetype = "solid",
                   lwd = 1,
                   alpha = 0.85) +
        # Curve calculated by a and b point estimates without bootstrapping
        geom_function(
          fun = function(t)
            o_est_h[1] +
            o_est_h[2] * cos(t * 2 * pi / 7) +
            o_est_h[3] * sin(t * 2 * pi / 7),
          mapping = aes(color = "naive_curve"),
          linetype = "twodash",
          lwd = .5,
          alpha = 0.5
        ) +
        geom_hline(yintercept = o_est_h[1] + sqrt(o_est_h[2]^2 + o_est_h[3]^2),
                   color = "green",
                   linetype = "twodash",
                   lwd = 1,
                   alpha = 0.85) +
        geom_vline(xintercept = (calc_peak_theta(o_est_h[2], o_est_h[3])*7/(2*pi)) %% 7,
                   color = "green",
                   linetype = "twodash",
                   lwd = 1,
                   alpha = 0.85) +
        geom_vline(xintercept = (atan(o_est_h[3]/o_est_h[2])*7/(2*pi)) %% 7,
                   color = "black",
                   linetype = "dotted",
                   lwd = 1,
                   alpha = 0.85)
        scale_color_manual(
          name = "Y series",
          values = c(
            "single_curves" = "pink",
            "mean_curve" = "darkblue",
            "boot_curve" = "red",
            "naive_curve" = "green"
          )
        )

    }
  }

  estimates[2:5] <- estimates[2:5] %>% round(4)

  information_criteria <- data.frame(
    AIC = m$aic,
    AICc = m$aicc,
    BIC = m$bic,
    ll_abs = abs(m$loglik)
  )

  o <- list(estimates = estimates,
            information_criteria = information_criteria)

  # if (exists("p_boot"))
  #   o$boot_plot <- p_boot

  return(o)

}

2.4.2.1 Estimating harmonic parameters by bootstrapping

In case the harmonic mean structure is in the model, this function also calculates points estimates and their CIs of peak shift $\psi$ and amplitude $S$ by means of bootstrapping by drawing boot_n samples (default: 10000) from the multivariate normal distribution attributable implied by the estimated variance matrix of the coefficients.

The amplitude $S$ and peak shift $\psi$ are respectively calculated by two auxiliary functions, calc_amp() and calc_peak_theta(), which are vectorized (using Vectorize()) to increase the performance when applied to bootstrapped values. Note that calc_peak_theta(a, b) is equivalent to base::atan2(b, a), which is the 2-argument arctangent function; we explicitly defined this function in our code to mirror the expression provided in Equation 22b of the paper.

Note that the variable $\psi$ is circular with a period of 7, that is, peak shifts of $\psi$ and $\psi + 7$ and $\psi - 7$ are equivalent, thus its value cannot calculate summary statistics such as mean, median, and quantiles based on its $[0, 7)$ values; for instance, an individual with $\psi_1 = 1$ (peaking on Monday) is more similar to someone with $\psi_2 = 6$ (peaking on Saturday) than someone with and $\psi_3 = 4$ (peaking on Thursday). See Cremers and Klugkist (2018) for more insights.

To take the circularity of this variable into account, we use circular package (Lund et al. 2023) and the function circular::as.circular, that treats the output of atan2() in radians and assures that its circular equivalent spans $[0, 2\pi)$. Consequently, there is no need of taking the modulo of the values boot_psi_radian (as explained in Equation 23 of the manuscript), and we only need to multiply them by $\frac{7}{2\pi}$ to reach to boot_psi_radian (that spans $[0, 7)$). Furthermore, given that the distribution of $\psi$ (as well as $S$) will be skewed due to the nonlinear transformation of random variable $a$ and $b$, we use the median to determine the point estimates of $S$ and $\psi$, and calculate the confidence intervals based on 2.5 and 97.5% quantiles.

3 Reproducing figures, analyses, and results

3.1 Importing and investigating the empirical data

The empirical data used in this paper comes form Wright, Hopwood, and Simms (2015). For our study, we used the average PA scores of 98 individuals, who had less than 50% missing responses. This subset of data is stored as r_pa.rds in the data/ folder of the repository, which does not contain demographic information, and the unique person identifies have been shuffled and stored as id. Other variables in the dataset are the measurement date (date), the measurement time (t, starting from the first day of the study on 2013-02-11), the weekday associated with the measurement date (weekday), the average PA time series (y), the weekday number (weekday_num) with Monday being the first day of the week, and the week number (week_num) associated with that date.

The data is shown in the following table, and it can be filtered based on some variables. The data can also be copied to the clipboard or downloaded as PDF, CSV, or an Excel file. In case you use the empirical data in a study, please cite Wright, Hopwood, and Simms (2015) as well.

To generate the visualizations of all individuals, we make 14 batches of seven individuals using the following code:

Click to reveal code

for(batch in 1:14){
  ids <- split(1:98,
               ceiling(seq_along(1:98)/7))[[batch]]
  l_selected_batch <- list()
  for (id_ in ids){
    l_selected_batch[[paste("ID:", id_)]] <- d_pa %>%
      filter(id == id_)
  }
  plot_row_assembly(l_selected_batch,
                    names(l_selected_batch),
                    max_period = 35,
                    save_name = paste0("rows-empirical-35_batch-",
                                       batch,
                                       ".svg"))

}

The resulting figures are shown below:

3.2 Reproducing figures in the paper

3.2.1 Visualizing empirical time series

The empirical time series shown of Persons A-F in the Figure 1 of the paper belonged to individuals with identifiers 70, 14, 12, 43, 97, and 20. To make the same plot, we used the following script, which makes use of plot_row_assembly():

Click to reveal code

d_pa <- readRDS(here::here("data",
                           "d_pa.rds"))
dir.create(file.path(here::here("figures")))

names_selected_6 <-
  c("Person A",
    "Person B",
    "Person C",
    "Person D",
    "Person E",
    "Person F")

d_selected_6 <- d_pa %>%
  mutate(
    individual = case_when(
      id == 70 ~ "Person A",
      id == 14 ~ "Person B",
      id == 12 ~ "Person C",
      id == 43 ~ "Person D",
      id == 97 ~ "Person E",
      id == 20 ~ "Person F",
      TRUE ~ "rest"
    ) %>%
      as.factor()
  ) %>%
  filter(individual != "rest")

l_selected_6 <- list()
for (name in names_selected_6){
  l_selected_6[[name]] <- d_selected_6 %>%
    filter(individual == name)
}

plot_row_assembly(l_selected_6, names(l_selected_6),
                  max_period = 35,
                  save_name = "rows-empirical-35.svg")

That generates the following figure:

Note that here we saved the figure as an svg file to be viewed in the HTML output, which has resulted in fonts different from those shown in the paper.

3.2.2 Visualizing simulated time series

Figures 2-5 were generated by plot_sim_rows(), which uses simulating data with m_sim() and plotting them using plot_row_assembly(), and is defined as follows:

Click to reveal code

plot_sim_rows <- function(fixed_c = 0,
                          fixed_dowe = rep(0, 7),
                          fixed_amp = 0,
                          fixed_peak_shift = 0,
                          fixed_wee = 0,
                          fixed_sigma2 = 1,
                          fixed_ma = 0.6,
                          fixed_ar = 0.7,
                          fixed_sar = 0.4,
                          fixed_sma = 0.5,
                          fixed_n = 10000,
                          fixed_seed = 0,
                          prefix = "c",
                          for_shiny = FALSE,
                          file_format = "svg",
                          ...
){

  if(for_shiny == TRUE)
    l_d_sim_w <- list(
      c = m_sim(
        c = fixed_c,
        sigma2 = fixed_sigma2,
        ar = fixed_ar,
        ma = fixed_ma,
        sar = fixed_sar,
        sma = fixed_sma,
        n = fixed_n,
        seed = fixed_seed
      ),
      D_t = m_sim(
        dowe = fixed_dowe,
        sigma2 = fixed_sigma2,
        ar = fixed_ar,
        ma = fixed_ma,
        sar = fixed_sar,
        sma = fixed_sma,
        n = fixed_n,
        seed = fixed_seed
      ),
      H_t = m_sim(
        c = fixed_c,
        amp = fixed_amp,
        peak_shift = fixed_peak_shift,
        sigma2 = fixed_sigma2,
        ar = fixed_ar,
        ma = fixed_ma,
        sar = fixed_sar,
        sma = fixed_sma,
        n = fixed_n,
        seed = fixed_seed
      ),
      W_t = m_sim(
        c = fixed_c,
        wee = fixed_wee,
        sigma2 = fixed_sigma2,
        ar = fixed_ar,
        ma = fixed_ma,
        sar = fixed_sar,
        sma = fixed_sma,
        n = fixed_n,
        seed = fixed_seed
      )
    )

  if(for_shiny == FALSE)
    l_d_sim_w <- list(
      wn = m_sim(
        c = fixed_c,
        dowe = fixed_dowe,
        amp = fixed_amp,
        peak_shift = fixed_peak_shift,
        wee = fixed_wee,
        sigma2 = fixed_sigma2,
        n = fixed_n,
        seed = fixed_seed
      ),
      ma = m_sim(
        c = fixed_c,
        dowe = fixed_dowe,
        amp = fixed_amp,
        peak_shift = fixed_peak_shift,
        wee = fixed_wee,
        sigma2 = fixed_sigma2,
        ma = fixed_ma,
        n = fixed_n,
        seed = fixed_seed
      ),
      ar = m_sim(
        c = fixed_c,
        dowe = fixed_dowe,
        amp = fixed_amp,
        peak_shift = fixed_peak_shift,
        wee = fixed_wee,
        sigma2 = fixed_sigma2,
        ar = fixed_ar,
        n = fixed_n,
        seed = fixed_seed
      ),
      arma = m_sim(
        c = fixed_c,
        dowe = fixed_dowe,
        amp = fixed_amp,
        peak_shift = fixed_peak_shift,
        wee = fixed_wee,
        sigma2 = fixed_sigma2,
        ar = fixed_ar,
        ma = fixed_ma,
        n = fixed_n,
        seed = fixed_seed
      ),
      sma = m_sim(
        c = fixed_c,
        dowe = fixed_dowe,
        amp = fixed_amp,
        peak_shift = fixed_peak_shift,
        wee = fixed_wee,
        sigma2 = fixed_sigma2,
        ma = fixed_ma,
        sma = fixed_sma,
        n = fixed_n,
        seed = fixed_seed
      ),
      sar = m_sim(
        c = fixed_c,
        dowe = fixed_dowe,
        amp = fixed_amp,
        peak_shift = fixed_peak_shift,
        wee = fixed_wee,
        sigma2 = fixed_sigma2,
        ar = fixed_ar,
        sar = fixed_sar,
        n = fixed_n,
        seed = fixed_seed
      ),
      sarma = m_sim(
        c = fixed_c,
        dowe = fixed_dowe,
        amp = fixed_amp,
        peak_shift = fixed_peak_shift,
        wee = fixed_wee,
        sigma2 = fixed_sigma2,
        ar = fixed_ar,
        ma = fixed_ma,
        sar = fixed_sar,
        sma = fixed_sma,
        n = fixed_n,
        seed = fixed_seed
      )
    )

  save_name <- paste0("rows-sim-7-",
                      prefix,
                      ".",
                      file_format)
  if(is.null(prefix) | for_shiny == TRUE){
    save_name <- NULL
    prefix <- "a_t"
  }


  plot_row_assembly(l_d_sim_w,
                    names(l_d_sim_w) %>%
                      toupper() %>%
                      paste0(prefix, "+", .),
                    save_name = save_name,
                    ...)

}

Running the following code makes the simulated figures:

Click to reveal code

plot_sim_rows(fixed_c = 0,
              prefix = "c")

plot_sim_rows(fixed_dowe = c(0.1, 0.3, 0.3, -0.7, 0.3, 0.9, -0.5),
              prefix = "d")

plot_sim_rows(fixed_c = 0.1,
              fixed_amp = 0.8,
              fixed_peak_shift = 6,
              prefix = "h")

plot_sim_rows(fixed_c = 0.1,
              fixed_wee = 0.8,
              prefix = "w")

:::

3.3 Reproducing the analyses of the empirical dataset

The following code was used for the empirical analyses in the paper. In short, we first load the required packages and functions, read the empirical data, and make sure that the folders for saving models fits and estimates exist. Then, a dataframe conditions_table is made, and parallel processing is initiated.

The analyses are done by iterating over rows of this dataframe using foreach() in parallel, and the rows in the dataframe which determine the data of which individual should be analyzed by what mean structure and what SARMA model orders at a given iteration. By running the loop, a fit file and a est file are generated and saved, respectively, in fits/ and ests/ folders. The analyses took less than two minutes on our server with 46 parallel threads.

Click to reveal code for running the analyses

source(here::here("scripts", "initialization.R"))
source(here::here("scripts", "functions_modeling.R"))

# Reading the empirical data
d_pa <- readRDS("data/d_pa.rds")

# Making sure the folders for saving models fits and estimates exist
dir.create(file.path(here::here("fits")))
dir.create(file.path(here::here("ests")))

# Making the table of conditions (i.e, model orders etc.)
conditions_table <- expand.grid(
  id = 1:98,
  mu = c("c", "d", "h", "w"),
  ar = c(0, 1),
  ma = c(0, 1),
  sar = c(0, 1),
  sma = c(0, 1)
)

# Initialization
Sys.setenv("OMP_THREAD_LIMIT" = 46)
options(warn = -1)

# Launching parallel computation
doFuture::registerDoFuture()
future::plan("multisession",
     workers = 46)

# Running the fitting and estimation in parallel
# (t_start <- Sys.time())
fit_logs <- foreach(i = 1:nrow(conditions_table)) %dopar% {

  t_c <- conditions_table[i, ]
  # Getting the data
  d_pa_i <- d_pa %>% filter(id == t_c$id)

  dump <- tryCatch({
    paste0(t_c$mu,
           "+sarma(",
           t_c$ar,
           ",",
           t_c$ma,
           ")(",
           t_c$sar,
           ",",
           t_c$sma,
           ")") %>%
      m_fit(
        d_pa_i,
        .,
        id = t_c$id,
        save_folder_fit = "fits",
        save_folder_est = "ests",
        save_fit = TRUE,
        save_est = TRUE
      )
  }, error = function(e)
    NULL)

  # paste(i,
  #       "done at",
  #       Sys.time()) %>%
  #   print()
}
# (t_end <- Sys.time())
# t_end - t_start

3.4 Results

After fitting the models, the estimate files are read from ests/ to make a large dataframe (harvest) of containing all model estimates and information criteria for all models and all individuals. This dataframe is them modified to include other properties, such as timescales, dynamics, and total model, which is saved as results_estimates and is easier for further inspections.

Click to reveal code for extracting the results

# Harvesting the estimates from est_* files -------------------------------

# Listing the estimated files
list_est_files <- list.files("ests", pattern = ".rds", full.names = TRUE)

# Launching parallel computation
doFuture::registerDoFuture()
future::plan("multisession",
     workers = 46)

# Running the harvest
# Sys.time()
harvest <- foreach(
  i = 1:length(list_est_files),
  .combine = rbind,
  .errorhandling = 'remove'
) %dopar% {
  file_name <- list_est_files[i]
  ee <- readRDS(file_name)
  mu <- gsub(".*_([chdw])\\+.*",
             "\\1",
             file_name)

  nums <- file_name %>%
    str_extract_all("\\d+") %>%
    unlist() %>%
    as.numeric()

  harv <- ee$estimates %>%
    cbind(
      id = nums[1],
      mu = mu,
      ar = nums[2],
      ma = nums[3],
      sar = nums[4],
      sma = nums[5],
      .
    ) %>%
    cbind(ee$information_criteria)
  harv
}
# Sys.time()

# Adding more descriptive variables to the results dataframe
results_estimates <- harvest %>%
  ungroup() %>%
  mutate(
    error_name = paste0("sarma",
                        "(", ar, ",", ma, ")",
                        "(", sar, ",", sma, ")"),
    error_name_words = case_when(
      error_name == "sarma(0,0)(0,0)" ~ "01_wn",
      error_name == "sarma(1,0)(0,0)" ~ "02_ar(1)",
      error_name == "sarma(0,1)(0,0)" ~ "03_ma(1)",
      error_name == "sarma(1,1)(0,0)" ~ "04_arma(1,1)",
      error_name == "sarma(0,0)(1,0)" ~ "05_sar(0)(1)",
      error_name == "sarma(1,0)(1,0)" ~ "06_sar(1)(1)",
      error_name == "sarma(0,0)(0,1)" ~ "07_sma(0)(1)",
      error_name == "sarma(0,1)(0,1)" ~ "08_sma(1)(1)",
      error_name == "sarma(0,0)(1,1)" ~ "09_sarma(0,0)(1,1)",
      error_name == "sarma(0,1)(1,0)" ~ "10_sarma(0,1)(1,0)",
      error_name == "sarma(0,1)(1,1)" ~ "11_sarma(0,1)(1,1)",
      error_name == "sarma(1,0)(0,1)" ~ "12_sarma(1,0)(0,1)",
      error_name == "sarma(1,0)(1,1)" ~ "13_sarma(1,0)(1,1)",
      error_name == "sarma(1,1)(0,1)" ~ "14_sarma(1,1)(0,1)",
      error_name == "sarma(1,1)(1,0)" ~ "15_sarma(1,1)(1,0)",
      error_name == "sarma(1,1)(1,1)" ~ "16_sarma(1,1)(1,1)"
    ) %>%
      as.factor(),
    model_name = paste0(mu, "+", error_name),
    timescale_daily = (ar == 1 | ma == 1),
    timescale_weekly = (sar == 1 | sma == 1),
    error_timescale = case_when(
      !timescale_weekly ~ "daily", # To include wwhite noise as well
      !timescale_daily & timescale_weekly ~ "weekly",
      timescale_daily &
        timescale_weekly ~ "daily + weekly"
    ) %>%
      factor(levels = c("daily", "weekly", "daily + weekly")),
    model_timescale = paste0(mu, " + ", error_timescale),
    dynamics_ar = (ar == 1 | sar == 1),
    dynamics_ma = (ma == 1 | sma == 1),
    error_dynamics = case_when(
      dynamics_ar & !dynamics_ma ~ "(s)ar",
      !dynamics_ar & dynamics_ma ~ "(s)ma",
      dynamics_ar &
        dynamics_ma ~ "(s)ar + (s)ma",
      TRUE ~ "wn"
    ) %>%
      factor(levels = c("wn", "(s)ar", "(s)ma", "(s)ar + (s)ma")),
    model_dynamics = paste0(mu, " + ", error_dynamics),
    .after = sma
  )

# Saving the processed estimates & ICs extracted from est_ files
saveRDS(results_estimates,
        here::here("data", "results_estimates.rds"))

# Reporting the results ---------------------------------------------------

Lastly, the results_estimates is summarized to only contain the number of times each of the 64 models have won according to different information criteria, and the results reported in Table 2 and 3 are obtained using the stats::xtabs(), as implemented in the code.

Click to reveal code for reporting the results

```{r}
#| eval: true
#| code-fold: true
#| code-summary: "Click to reveal code for reporting the results"
#| include: true

# Reading the results dataframe
results_estimates <- readRDS(here::here("data", "results_estimates.rds"))

# Counting selected models
r_s <- results_estimates %>%
  filter(parameter == "sigma2") %>%
  select(-parameter:-sig) %>%
  group_by(id) %>%
  mutate(
    win_AIC = case_when(min(AIC) == AIC ~ 1,
                        TRUE ~ 0),
    win_AICc = case_when(min(AICc) == AICc ~ 1,
                         TRUE ~ 0),
    win_BIC = case_when(min(BIC) == BIC ~ 1,
                        TRUE ~ 0)
  ) %>%
  group_by(across(mu:model_dynamics),
           .add = FALSE) %>%
  summarise(
    n_w_AIC = sum(win_AIC),
    n_w_AICc = sum(win_AICc),
    n_w_BIC = sum(win_BIC)
  )

## Making the tables
## For other ICs, change AICc to AIC or BIC

# Making the detailed table with 16 rows
r_s %>%
  xtabs(n_w_AICc  ~ error_name_words + mu,
      .) %>%
  kable(caption =
"Fequency of selected models based on AICc
groupped by $a_t$ and $\\mu_t$")

# Main body of the summarized table
r_s %>%
  filter(error_timescale == "daily") %>%
  xtabs(n_w_AICc ~ error_dynamics + mu,
      .) %>%
  kable(caption =
          "Fequency of selected models based on AICc
with **daily dynamics**, groupped by $a_t$ dynamics and $\\mu_t$")

# Main body of the summarized table
r_s %>%
  filter(error_timescale == "weekly") %>%
  xtabs(n_w_AICc ~ error_dynamics + mu,
        .) %>%
  kable(caption =
          "Fequency of selected models based on AICc
with **weekly dynamics**, groupped by $a_t$ dynamics and $\\mu_t$")

# Main body of the summarized table
r_s %>%
  filter(error_timescale == "daily + weekly") %>%
  xtabs(n_w_AICc ~ error_dynamics + mu,
        .) %>%
  kable(caption =
          "Fequency of selected models based on AICc
with **seasonal dynamics**, groupped by $a_t$ dynamics and $\\mu_t$")

# Rows total per mu (Total a)
r_s %>%
  xtabs(n_w_AICc ~ error_dynamics + mu,
      .) %>%
  kable(caption =
          "Fequency of selected models based on AICc,
groupped by $a_t$ dynamics and $\\mu_t$")

# Rows sum all mu (Sum b)
r_s %>%
  xtabs(n_w_AICc ~ error_dynamics,
      .) %>%
  kable(caption =
          "Fequency of selected models based on AICc,
groupped by $a_t$ dynamics (Sum b)")

# Columns total per mu (Total c)
r_s %>%
  xtabs(n_w_AICc ~  error_timescale + mu ,
      .) %>%
  kable(caption =
          "Fequency of selected models based on AICc
with **daily dynamics**, groupped by $a_t$ timescale and $\\mu_t$ (Total c)")

# Column sum all mu (Sum d)
r_s %>%
  xtabs(n_w_AICc ~ error_timescale,
      .) %>%
  kable(caption =
          "Fequency of selected models based on AICc
with **daily dynamics**, groupped by $a_t$ timescale (Sum d)")

# Totals per mu
r_s %>%
  xtabs(n_w_AICc ~ mu,
      .) %>%
  kable(caption =
          "Fequency of selected models based on AICc, groupped by $\\mu_t$")
```

Fequency of selected models based on AICc groupped by $a_t$ and $\mu_t$
	c	d	h	w
01_wn	8	1	4	5
02_ar(1)	11	1	4	3
03_ma(1)	3	0	0	2
04_arma(1,1)	18	1	8	4
05_sar(0)(1)	1	0	0	0
06_sar(1)(1)	0	0	0	0
07_sma(0)(1)	1	0	0	0
08_sma(1)(1)	0	0	0	0
09_sarma(0,0)(1,1)	3	2	1	0
10_sarma(0,1)(1,0)	0	0	0	0
11_sarma(0,1)(1,1)	0	0	0	0
12_sarma(1,0)(0,1)	0	1	0	0
13_sarma(1,0)(1,1)	0	2	0	1
14_sarma(1,1)(0,1)	1	0	0	0
15_sarma(1,1)(1,0)	3	0	0	2
16_sarma(1,1)(1,1)	1	4	0	2

Fequency of selected models based on AICc with **daily dynamics**, groupped by $a_t$ dynamics and $\mu_t$
	c	d	h	w
wn	8	1	4	5
(s)ar	11	1	4	3
(s)ma	3	0	0	2
(s)ar + (s)ma	18	1	8	4

Fequency of selected models based on AICc with **weekly dynamics**, groupped by $a_t$ dynamics and $\mu_t$
	c	d	h
wn	0	0	0
(s)ar	1	0	0
(s)ma	1	0	0
(s)ar + (s)ma	3	2	1

Fequency of selected models based on AICc with **seasonal dynamics**, groupped by $a_t$ dynamics and $\mu_t$
	c	d	w
wn	0	0	0
(s)ar	0	0	0
(s)ma	0	0	0
(s)ar + (s)ma	5	7	5

Fequency of selected models based on AICc, groupped by $a_t$ dynamics and $\mu_t$
	c	d	h	w
wn	8	1	4	5
(s)ar	12	1	4	3
(s)ma	4	0	0	2
(s)ar + (s)ma	26	10	9	9

Fequency of selected models based on AICc, groupped by $a_t$ dynamics (Sum b)
error_dynamics	Freq
wn	18
(s)ar	20
(s)ma	6
(s)ar + (s)ma	54

Fequency of selected models based on AICc with **daily dynamics**, groupped by $a_t$ timescale and $\mu_t$ (Total c)
	c	d	h	w
daily	40	3	16	14
weekly	5	2	1	0
daily + weekly	5	7	0	5

Fequency of selected models based on AICc with **daily dynamics**, groupped by $a_t$ timescale (Sum d)
error_timescale	Freq
daily	73
weekly	8
daily + weekly	17

Fequency of selected models based on AICc, groupped by $\mu_t$
mu	Freq
c	50
d	12
h	17
w	19

References

Cremers, Jolien, and Irene Klugkist. 2018. “One Direction? A Tutorial for Circular Data Analysis Using R With Examples in Cognitive Psychology.” Frontiers in Psychology 9.

Haqiqatkhah, MohammadHossein Manuel, and Ellen Hamaker. 2024. “Daily Dynamics and Weekly Rhythms: A Tutorial on Seasonal ARMA Models Combined with Day-of-Week Effects,” February. https://doi.org/10.31234/osf.io/duvqh.

Lund, Ulric, Claudio Agostinelli, Hiroyoshi Arai, Alessando Gagliardi, Eduardo García-Portugués, Dimitri Giunchi, Jean-Olivier Irisson, Matthew Pocernich, and Federico Rotolo. 2023. “Circular: Circular Statistics.”

Wright, Aidan G. C., Christopher J. Hopwood, and Leonard J. Simms. 2015. “Daily Interpersonal and Affective Dynamics in Personality Disorder.” Journal of Personality Disorders 29 (4): 503–25. https://doi.org/10.1521/pedi.2015.29.4.503.

Citation

BibTeX citation:

@article{mohammadhosseinmanuel2024,
  author = {Haqiqatkhah, Mohammadhossein Manuel and Hamaker, Ellen L.},
  title = {Daily Dynamics and Weekly Rhythms: {A} Tutorial on Seasonal
    {ARMA} Models Combined with Day-of-Week Effects},
  journal = {PsyArXiv Preprints},
  date = {2024-02-20},
  url = {https://psyarxiv.com/duvqh},
  doi = {10.31234/osf.io/duvqh},
  langid = {en}
}

For attribution, please cite this work as:

Haqiqatkhah, Mohammadhossein Manuel, and Hamaker, Ellen L. 2024. “Daily Dynamics and Weekly Rhythms: A Tutorial on Seasonal ARMA Models Combined with Day-of-Week Effects.” PsyArXiv Preprints, February. https://doi.org/10.31234/osf.io/duvqh.