Daily dynamics and weekly rhythms
Reproducible Code
Reproducible code for Daily dynamics and weekly rhythms: A tutorial on seasonal ARMA models combined with day-of-week effects.
1 Introduction
This document contains reproducible code of the manuscript by Haqiqatkhah and Hamaker (2024) on combining day-of-week effects and week-to-week dynamics with day-to-day dynamics. For attribution, please cite as
Haqiqatkhah, M. M., & Hamaker, E. L. (2024, February 20). Daily dynamics and weekly rhythms: A tutorial on seasonal ARMA models combined with day-of-week effects. PsyArXiv Preprints. https://doi.org/10.31234/osf.io/duvqh
This document has two main sections:
In Section 2, we present the functions used for making the visualizations (Section 2.1), generating simulated time series (Section 2.2), running the Shiny app accompanying the paper (Section 2.3), and fitting models to empirical data (Section 2.4).
In Section 3, we provide the empirical dataset and additional plots for empirical time series (Section 3.1), which is followed by the reproducible code for generating figures shown in the paper (Section 3.2), running all the the analyses on the empirical dataset (Section 3.3) and making the tables reported in the paper (Section 3).
To replicate the study from the scratch, you should first either clone the repository (using git clone https://github.com/psyguy/WeCycle.git) or download the repository as a zip file and extract it on your machine. Then you can run the .R files you find in the scripts folder with the following order:
Alternatively, if you have Quarto installed installed on your machine, you can compile index.qmd located in the root directory using Quarto after setting the following variables to TRUE:
2 R functions
The code used for plotting the time series and fitting the models requires the time series \(y_t\) to be stored in a data.frame (which, let us call df) with at least three columns:
t: Indicating the time of the measurementy: The value \(y_t\) on time \(t\)weekday(orweekday_num): The name (or number) of the weekday corresponding tot. In the case of the former, it should be in the form of capitalized three letter codes ("Mon","Tue", …,"Sun"). Note that we consider Monday to be the first day of the week, and Sunday be the 0th/7th day of the week.
The column weekday (and weekday_num) can be generated if the date (e.g., as date) is included in df, using lubridate::wday and setting:
The missing values in the time series should be explicitly indicated with NA in the dataframe—that is, we should have a row for each time point—which can be achieved by:
2.1 Time series visualizations
The visualizations shown in discussed in the paper were plotted using separate functions for each plot, that are named accordingly plot_hist() , plot_seq(), plot_dowe(), plot_psd(), plot_acf(), and plot_pacf(). The main argument of these functions is d, which can be a numerical vector (for which the weekdays are added, starting by Monday), or it can be a dataframe with the columns specifiedexplained earlier.
Click to reveal plot_hist()
plot_hist <- function(d = NULL,
title = " ",
subtitle = NULL,
remove_titles = TRUE,
remove_xlab = TRUE,
scale_rel = 0.9,
max_acf.lag = 35,
max_period = 15,
ymin = 0-0.1,
ymax = 4+0.1,
max_t = 140,
max_weeks = 25,
col_weekly = "lightsteelblue4",
col_dowe.line = "mediumorchid4",
col_dowe.point = "deeppink1",
col_ts = "lightsteelblue4",
col_ts.point = "cornflowerblue",
col_hist = "cornflowerblue",
col_acf = "darkolivegreen3",
col_pacf = "darkorange3",
col_spec = "darkorchid4",
col_hlines = "dimgray") {
# Transforming the input to an appropriate dataframe
d <- d %>% data_shaper()
# Making sure the optimal theme is in place
theme_set(ggthemes::theme_few())
breaks_y <- seq(floor(ymin),
ceiling(ymax))
# Making sure the limits are not off
ymin <- min(min(d$y), ymin)
ymax <- max(max(d$y), ymax)
p_out <- d %>%
mutate(group_mean = mean(y,
na.rm = TRUE)) %>%
group_by(weekday,
.add = TRUE) %>%
mutate(weekday_mean = mean(y,
na.rm = TRUE)) %>%
ungroup() %>%
ggplot() +
aes(x = y) +
geom_histogram(
aes(y = after_stat(ndensity)),
center = 0,
bins = 40,
fill = col_hist
) +
geom_vline(
aes(xintercept = group_mean),
linetype = "dashed",
linewidth = rel(scale_rel * 0.2)
) +
labs(subtitle = "Distribution",
x = "y",
y = title) +
scale_x_continuous(breaks = breaks_y) +
xlim(ymin, ymax) +
theme(
panel.grid.major = element_blank(),
# axis.title.y = element_blank(),
axis.ticks.y = element_blank(),
axis.line.y = element_blank(),
axis.title.y = element_text(size = rel(1.4*scale_rel)),
axis.text.y = element_blank()
)
if (remove_titles)
p_out <- p_out + theme(plot.subtitle = element_blank())
if (remove_xlab)
p_out <- p_out + xlab(NULL)
p_out
}Click to reveal plot_seq()
plot_seq <- function(d = NULL,
title = NULL,
subtitle = NULL,
remove_titles = TRUE,
remove_xlab = TRUE,
scale_rel = 0.9,
max_acf.lag = 35,
max_period = 15,
ymin = 0-0.1,
ymax = 4+0.1,
max_t = 140,
max_weeks = 25,
col_weekly = "lightsteelblue4",
col_dowe.line = "mediumorchid4",
col_dowe.point = "deeppink1",
col_ts = "lightsteelblue4",
col_ts.point = "cornflowerblue",
col_hist = "cornflowerblue",
col_acf = "darkolivegreen3",
col_pacf = "darkorange3",
col_spec = "darkorchid4",
col_hlines = "dimgray") {
# Transforming the input to an appropriate dataframe
d <- d %>% data_shaper()
# Making sure the optimal theme is in place
theme_set(ggthemes::theme_few())
# Making sure the limits are not off
ymin <- min(min(d$y), ymin)
ymax <- max(max(d$y), ymax)
p_out <- d %>%
data_shaper() %>%
mutate(group_mean = mean(y,
na.rm = TRUE)) %>%
group_by(weekday,
.add = TRUE) %>%
mutate(weekday_mean = mean(y,
na.rm = TRUE)) %>%
ungroup() %>%
ggplot() +
aes(x = t,
y = y) +
geom_hline(
aes(yintercept = group_mean),
linetype = "dashed",
linewidth = rel(scale_rel * 0.2)
) +
geom_line(
color = col_ts,
alpha = 1,
linewidth = rel(scale_rel * 0.2)
) +
geom_point(aes(y = y),
color = col_ts.point,
size = rel(scale_rel * 0.12)) +
scale_y_continuous(# breaks = breaks_y,
limits = c(ymin, ymax)) +
xlim(0, max_t) +
labs(subtitle = "Sequence plot",
x = "t",
y = "y")
if (remove_titles)
p_out <- p_out + theme(plot.subtitle = element_blank())
if (remove_xlab)
p_out <- p_out + xlab(NULL)
p_out
}Click to reveal plot_dowe()
plot_dowe <- function(d = NULL,
title = NULL,
subtitle = NULL,
remove_titles = TRUE,
remove_xlab = TRUE,
scale_rel = 0.9,
max_acf.lag = 35,
max_period = 15,
ymin = 0-0.1,
ymax = 4+0.1,
max_t = 140,
max_weeks = 25,
col_weekly = "lightsteelblue4",
col_dowe.line = "mediumorchid4",
col_dowe.point = "deeppink1",
col_ts = "lightsteelblue4",
col_ts.point = "cornflowerblue",
col_hist = "cornflowerblue",
col_acf = "darkolivegreen3",
col_pacf = "darkorange3",
col_spec = "darkorchid4",
col_hlines = "dimgray") {
# Transforming the input to an appropriate dataframe
d <- d %>% data_shaper()
# Making sure the optimal theme is in place
theme_set(ggthemes::theme_few())
# Making sure the limits are not off
ymin <- min(min(d$y), ymin)
ymax <- max(max(d$y), ymax)
p_out <- d %>%
data_shaper() %>%
mutate(group_mean = mean(y,
na.rm = TRUE)) %>%
group_by(weekday,
.add = TRUE) %>%
mutate(weekday_mean = mean(y,
na.rm = TRUE)) %>%
ungroup() %>%
group_by(week_num,
.add = FALSE) %>%
filter(week_num <= max_weeks) %>%
ggplot() +
aes(x = weekday,
y = y,
group = week_num) +
geom_hline(
aes(yintercept = group_mean),
linetype = "dashed",
linewidth = rel(scale_rel * 0.2),
alpha = 1
) +
geom_line(
alpha = 0.5,
color = col_weekly,
linewidth = rel(scale_rel * 0.4)
) +
geom_point(aes(y = y),
alpha = 0.6,
color = col_ts.point,
size = rel(scale_rel * 0.12)) +
geom_line(
aes(y = weekday_mean),
color = col_dowe.line,
alpha = 1,
linewidth = rel(scale_rel * 1)
) +
geom_point(aes(y = weekday_mean),
color = col_dowe.point,
size = rel(scale_rel * 0.8)) +
labs(subtitle = "Weekly plot",
x = "Weekdays",
y = "y") +
scale_y_continuous(#breaks = breaks_y,
limits = c(ymin, ymax)) +
theme(panel.grid.major.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_text(angle = 90,
size = rel(1 * scale_rel)))
if (remove_titles)
p_out <- p_out + theme(plot.subtitle = element_blank())
if (remove_xlab)
p_out <- p_out + xlab(NULL)
p_out
}Click to reveal plot_psd()
plot_psd <- function(d = NULL,
title = NULL,
subtitle = NULL,
remove_titles = TRUE,
remove_xlab = TRUE,
scale_rel = 0.9,
max_acf.lag = 35,
max_period = 15,
ymin = 0-0.1,
ymax = 4+0.1,
max_t = 140,
max_weeks = 25,
col_weekly = "lightsteelblue4",
col_dowe.line = "mediumorchid4",
col_dowe.point = "deeppink1",
col_ts = "lightsteelblue4",
col_ts.point = "cornflowerblue",
col_hist = "cornflowerblue",
col_acf = "darkolivegreen3",
col_pacf = "darkorange3",
col_spec = "darkorchid4",
col_hlines = "dimgray") {
# Transforming the input to an appropriate dataframe
d <- d %>% data_shaper()
# Making sure the optimal theme is in place
theme_set(ggthemes::theme_few())
# Imputing the missing values using seasonal Kalman smoothing
y_imp <- d$y %>%
ts(frequency = 7) %>%
imputeTS::na_kalman()
## Calculate Fourier components
y_imp <- y_imp %>% as.numeric
n <- length(y_imp)
Freq = (1:n - 1) / n
var_component <- Mod(fft(scale(y_imp, scale = FALSE))) ^ 2 / n^2
df_fourier <- data.frame(Freq = Freq,
rel_power = 100*var_component/var(y_imp))%>%
mutate(Period = round(1 / Freq, 1)) %>%
filter(Freq != 0,
Freq <= 0.5,
Period <= max_period) %>%
summarise(rel_power = sum(rel_power),
.by = Period) %>%
mutate(Freq = 1 / Period,
.before = 1)
max_var_component <- max(df_fourier$rel_power)
p_out <- df_fourier %>%
ggplot() +
aes(
x = Period,
xend = Period,
y = rel_power,
yend = 0
) +
geom_rect(
aes(
xmin = 6.5,
xmax = 7.5,
ymin = 0,
ymax = Inf
),
alpha = 0.7,
fill = "azure2"
) +
geom_segment(linewidth = rel(scale_rel * 0.5),
color = col_spec) +
scale_x_continuous(breaks =
seq(0,
max_period - 0.5,
7)) +
scale_y_continuous(# labels = scaleFUN,
breaks = scales::breaks_pretty(4),
limits = c(0, max(1, max_var_component))) +
theme(
# axis.title.y = element_blank(),
# axis.ticks.y = element_blank(),
# axis.line.y = element_blank(),
# axis.text.y = element_blank(),
panel.grid.major = element_blank()
) +
labs(subtitle = "Power spectral density",
x = "Period (in days)",
y = "% total power")
if(max_var_component < 1)
p_out <- p_out +
scale_y_continuous(breaks = c(0,1),
limits = c(0, 1))
if (remove_titles)
p_out <- p_out + theme(plot.subtitle = element_blank())
if (remove_xlab)
p_out <- p_out + xlab(NULL)
p_out
}Click to reveal plot_acf()
plot_acf <- function(d = NULL,
title = NULL,
subtitle = NULL,
remove_titles = TRUE,
remove_xlab = TRUE,
scale_rel = 0.9,
max_acf.lag = 35,
max_period = 15,
ymin = 0-0.1,
ymax = 4+0.1,
max_t = 140,
max_weeks = 25,
col_weekly = "lightsteelblue4",
col_dowe.line = "mediumorchid4",
col_dowe.point = "deeppink1",
col_ts = "lightsteelblue4",
col_ts.point = "cornflowerblue",
col_hist = "cornflowerblue",
col_acf = "darkolivegreen3",
col_pacf = "darkorange3",
col_spec = "darkorchid4",
col_hlines = "dimgray") {
# Transforming the input to an appropriate dataframe
d <- d %>% data_shaper()
# Making sure the optimal theme is in place
theme_set(ggthemes::theme_few())
# Imputing the missing values using seasonal Kalman smoothing
y_imp <- d$y %>%
ts(frequency = 7) %>%
imputeTS::na_kalman()
breaks_acf <- seq(0,
max_acf.lag,
by = 14 * floor(max_acf.lag / 7 / 3))
df_acf <- data.frame(
lag = c(0:max_acf.lag),
acf = stats::acf(y_imp,
lag.max = max_acf.lag,
plot = FALSE)$acf %>%
as.numeric()
)
p_out <- df_acf %>%
ggplot(aes(x = lag,
y = acf)) +
geom_segment(
aes(
x = lag,
xend = lag,
y = 0,
yend = acf
),
linewidth = rel(scale_rel * 35 / max_acf.lag / 2),
color = col_acf,
lineend = "butt"
) +
geom_hline(
yintercept = 0,
linewidth = rel(scale_rel * 0.3),
linetype = "solid",
color = col_hlines
) +
scale_x_continuous(breaks = breaks_acf) +
scale_y_continuous(breaks = c(-.5, 0, 0.5, 1),
limits = c(-.25, 1.1)) +
labs(subtitle = "ACF",
x = "Lag",
y = "") +
theme(
panel.grid.major = element_blank(),
axis.title.y = element_blank()
)
if (remove_titles)
p_out <- p_out + theme(plot.subtitle = element_blank())
if (remove_xlab)
p_out <- p_out + xlab(NULL)
p_out
}Click to reveal plot_pacf()
plot_pacf <- function(d = NULL,
title = NULL,
subtitle = NULL,
remove_titles = TRUE,
remove_xlab = TRUE,
scale_rel = 0.9,
max_pacf.lag = 35,
max_period = 15,
ymin = 0-0.1,
ymax = 4+0.1,
max_t = 140,
max_weeks = 25,
col_weekly = "lightsteelblue4",
col_dowe.line = "mediumorchid4",
col_dowe.point = "deeppink1",
col_ts = "lightsteelblue4",
col_ts.point = "cornflowerblue",
col_hist = "cornflowerblue",
col_acf = "darkolivegreen3",
col_pacf = "darkorange3",
col_spec = "darkorchid4",
col_hlines = "dimgray") {
# Transforming the input to an appropriate dataframe
d <- d %>% data_shaper()
# Making sure the optimal theme is in place
theme_set(ggthemes::theme_few())
# Imputing the missing values using seasonal Kalman smoothing
y_imp <- d$y %>%
ts(frequency = 7) %>%
imputeTS::na_kalman()
breaks_acf <- seq(0,
max_pacf.lag,
by = 14 * floor(max_pacf.lag / 7 / 3))
df_pacf <- data.frame(
lag = c(0:max_pacf.lag),
pacf = stats::pacf(y_imp,
max(1, max_pacf.lag),
na.action = na.exclude,
plot = FALSE)$acf %>%
as.numeric() %>%
c(0, .)
)
p_out <- df_pacf %>%
ggplot(aes(x = lag,
y = pacf)) +
geom_segment(
aes(
x = lag,
xend = lag,
y = 0,
yend = pacf
),
linewidth = rel(scale_rel * 35 / max_pacf.lag / 2),
color = col_pacf,
lineend = "butt"
) +
geom_hline(
yintercept = 0,
linewidth = rel(scale_rel * 0.3),
linetype = "solid",
color = col_hlines
) +
scale_x_continuous(breaks = breaks_acf) +
scale_y_continuous(breaks = c(-.5, 0, 0.5, 1),
limits = c(-.25, 1.1)) +
labs(subtitle = "PACF",
x = "Lag",
y = "") +
theme(
panel.grid.major = element_blank(),
axis.title.y = element_blank()
)
if (remove_titles)
p_out <- p_out + theme(plot.subtitle = element_blank())
if (remove_xlab)
p_out <- p_out + xlab(NULL)
p_out
}To make sure that the data provided to these functions are in the correct format (with the above-mentioned columns), the function data_shaper() is called within these plotting functions to do the job. This function also amends a new column (week_num) to the dataframe that counts the week number since the start of the time series, which is needed for plotting the DOWEs in plot_dowe().
Click to reveal data_shaper()
data_shaper <- function(d,
minimal_output = FALSE) {
weekdays <- c("Mon",
"Tue",
"Wed",
"Thu",
"Fri",
"Sat",
"Sun")
if (!is.data.frame(d)){
d <- data.frame(
t = 1:length(d),
y = d,
weekday = rep(weekdays, length.out = length(d)),
week_num = rep(1:ceiling(length(d) / 7), each = 7)[1:length(d)]
)
# if(minimal_output == TRUE) return(d)
}
if ("date" %in% colnames(d))
d <- d %>%
mutate(
weekday = lubridate::wday(date,
week_start = 1,
label = TRUE),
weekday_num = lubridate::wday(date,
week_start = 1,
label = FALSE)
)
if (!("weekday_num" %in% colnames(d)))
d$weekday_num <- match(d$weekday, weekdays)
if (!("week_num" %in% colnames(d))) {
# Initialize week number and an empty vector to store week numbers
week_number <- 1
week_numbers <- numeric(length(d$weekday_num))
# Check if the sequence starts with a day other than Monday and adjust week_number accordingly
if (d$weekday_num[1] != 1) {
week_number <- 1
} else {
week_number <-
2 # Start from week 2 if the first day is Monday, to handle edge cases
}
# Iterate through the days, increasing week number after encountering a Sunday
for (i in 1:length(d$weekday_num)) {
week_numbers[i] <- week_number
if (d$weekday_num[i] == 7 &&
i != length(d$weekday_num)) {
# Check for Sunday and not the last element
week_number <- week_number + 1
}
}
d$week_num <- week_numbers
}
# Substituting NA's for implicit missing values
d_out <- d %>%
right_join(data.frame(t = min(d$t):max(d$t)),
by = "t") %>%
mutate(weekday = weekday %>% factor(weekdays)) %>%
arrange(t)
return(d_out)
}Then, the function plot_row_assembly() uses the above functions to put them together in a row and returns either the plot, or saves it in a file in the figures folder with dimensions and sizes that render in nice proportions when the plot is saved with a .svg (good for putting in Word or online) or .pdf (good for \(\LaTeX\) manuscripts) file formats.
This function takes a list of time series (either as vectors or dataframes) in its list_data argument, and adds vertical labels to each row of the plots with the values passed to list_labels.
Click to reveal plot_row_assembly()
plot_row_assembly <- function(list_data,
list_labels = NULL,
save_name = "plot-rows.pdf",
save_dir = "figures",
...) {
n_rows <- length(list_data)
l_hist <- list()
l_seq <- list()
l_dowe <- list()
l_psd <- list()
l_acf <- list()
l_pacf <- list()
title_r <- NULL
for (r in 1:n_rows) {
d <- list_data[[r]]
if (length(list_labels) == n_rows)
title_r <- list_labels[[r]]
rm_titles <- TRUE
rm_xlab <- TRUE
if (r == 1) {
rm_titles <- FALSE
rm_xlab <- TRUE
}
if (r == n_rows) {
rm_titles <- TRUE
rm_xlab <- FALSE
}
if (n_rows == 1) {
rm_titles <- FALSE
rm_xlab <- FALSE
}
l_hist[[r]] <- #label_plot(r) +
plot_hist(
d,
title = title_r,
remove_titles = rm_titles,
remove_xlab = rm_xlab,
...
)
l_seq[[r]] <- plot_seq(d,
remove_titles = rm_titles,
remove_xlab = rm_xlab,
...)
l_dowe[[r]] <- plot_dowe(d,
remove_titles = rm_titles,
remove_xlab = rm_xlab,
...)
l_psd[[r]] <- plot_psd(d,
remove_titles = rm_titles,
remove_xlab = rm_xlab,
...)
l_acf[[r]] <- plot_acf(d,
remove_titles = rm_titles,
remove_xlab = rm_xlab,
...)
l_pacf[[r]] <- plot_pacf(d,
remove_titles = rm_titles,
remove_xlab = rm_xlab,
...)
}
p_tot <- cowplot::plot_grid(
l_hist %>% wrap_plots(ncol = 1),
l_seq %>% wrap_plots(ncol = 1),
l_dowe %>% wrap_plots(ncol = 1),
l_psd %>% wrap_plots(ncol = 1),
l_acf %>% wrap_plots(ncol = 1),
l_pacf %>% wrap_plots(ncol = 1),
nrow = 1,
rel_widths = c(.75, 2, 1, 1, 1, 1)
)
if (is.character(save_name))
ggsave(
here(save_dir, save_name),
p_tot,
width = 40,
height = 4 * n_rows + 0.5,
units = "cm"
)
else
return(p_tot)
}2.2 Simulating time series
Pure time series (i.e., without time index and weekday) are generated using m_sim() that generate \(y_t = \mu_t + a_t\) by adding a deterministic vector mu_t to a stochastic time series a_t. The innovations used in for simulation are generated from a random seed set by seed argument. The arguments burnin indicate the number of samples thrown away as burn-in samples.
The function generates three time-varying versions of \(\mu_t\) as explained in the paper in three vectors:
\(D_t\) (as
d_t_0) based on the weekday means (DOWEs) that are passed as a vector todowe(default: all zeros);\(W_t\) (as
w_t_0) with workdays mean of 0, and weekend effect determined by the argumentwee; and\(H_t\) (as
h_t_0) with overall mean of 0 and given amplitudeamp(default: 0), and peak shiftpeak_shift(default: 1, for Monday).
The time-varying mean mu_t is constructed by adding c to d_t_0, w_t_0, h_t_0. This approach allows users to investigate the effect imposing multiple mechanisms for the mean structure \(\mu_t\). Needless to say, \(D_t\), in general, can account for any arbitrary shape in the mean structure; yet, our implementation may come in handy in the Shiny app (see Section 2.3).
The stochastic component \(a_t\) is generated using sarima::sim_sarima(). Since this function cannot generate white noise process, if no SAR[I]MA component is specified, a_t is generated manually. In our simulation function m_sim() we allow including non-seasonal and seasonal differencing in the model (which we did not discuss in the paper, but implemented in the Shiny app).
Click to reveal m_sim()
m_sim <- function(n = 10000,
ar = 0,
ma = 0,
sar = 0,
sma = 0,
c = 0,
iorder = 0,
siorder = 0,
sigma2 = 1,
dowe = rep(0, 7),
wee = 0,
amp = 0,
peak_shift = 1,
burnin = 500,
seed = 0) {
## Generating (time-varying) mean structure mu_t
# Generating sequence of d_t
d_t_0 <- cbind(
Mon = rep(c(1, 0, 0, 0, 0, 0, 0), length.out = n),
Tue = rep(c(0, 1, 0, 0, 0, 0, 0), length.out = n),
Wed = rep(c(0, 0, 1, 0, 0, 0, 0), length.out = n),
Thu = rep(c(0, 0, 0, 1, 0, 0, 0), length.out = n),
Fri = rep(c(0, 0, 0, 0, 1, 0, 0), length.out = n),
Sat = rep(c(0, 0, 0, 0, 0, 1, 0), length.out = n),
Sun = rep(c(0, 0, 0, 0, 0, 0, 1), length.out = n)
) %*%
dowe %>%
as.numeric()
## Generating sequence of w_t_0, which is w_t with c = 0
# Making weekday-weekend dummies matrix
w_t_0 <- c(rep(0, 5), rep(wee, 2)) %>% rep(length.out = n)
# Generating sequence of h_t_0, which is h_t with c = 0
h_t_0 <- amp*cos((2*pi/7)*((1:n) - peak_shift))
# Adding d_t, h_t, and w_t together for extra capabilities
mu_t <- c + d_t_0 + w_t_0 + h_t_0
## Generating stochastic component a_t
# Setting the seed
set.seed(seed)
# Making the list of model specifications
l_model = list(ar = ar,
ma = ma,
sar = sar,
sma = sma,
iorder = iorder,
siorder = siorder,
nseasons = 7,
sigma2 = sigma2
)
# simulating the data
if (sum(abs(as.numeric(l_model))) > 7 + sigma2)
a_t <- sarima::sim_sarima(n = n,
model = l_model,
n.start = burnin)
else
a_t <- rnorm(n + burnin,
mean = 0,
sd = sqrt(sigma2)) %>%
tail(n)
## Making the complete time series
y_t <- mu_t + a_t
return(y_t)
}2.3 Shiny app
The Shiny app makes use of plot_sim_rows() to simulate data and make visualizations. The code for this app is as follows:
Click to reveal the Shiny app source code
library(shiny)
library(patchwork)
library(tidyverse)
library(imputeTS)
library(forecast)
library(stats)
library(sarima)
library(lubridate)
## To run the app locally, uncomment the following two lines and
## comment the next two `source(...)` function calls:
# source(here::here("scripts/functions_simulation.R"))
# source(here::here("scripts/functions_visualization.R"))
## To deploy on server, comment the above `source(...)`
## function calls and uncomment the following:
source("functions_simulation.R")
source("functions_visualization.R")
options(warn = -1)
ui <- fluidPage(
tags$style(
type = "text/css",
"
.shiny-input-container { overflow: hidden; }
.shiny-input-spinner { display: none; }
.shiny-slider-input { display: inline-block; width: 70%; vertical-align: middle; }
.slider-label { display: inline-block; width: 20%; text-align: right; margin-right: 10px; vertical-align: middle; }
"
),
titlePanel("Simulating SARMA processes with time-dependent mean structures"),
sidebarLayout(
sidebarPanel(
style = "height: 90vh; overflow-y: auto;",
# Make the sidebar responsive
actionButton("go", "Update the plots"),
# N, seed, mean
hr(style = "border-top: 1px solid #000000;"),
fluidRow(
column(
3,
numericInput(
"fixed_n",
"N (x100):",
value = 1.5,
width = '120px',
step = 1
)
),
column(
3,
numericInput("fixed_sigma2", "sigma2:", value = 1, width = '60px')
),
column(
3,
numericInput(
"fixed_seed",
"Seed:",
value = 0,
step = 1,
width = '60px'
)
),
column(3, numericInput(
"fixed_c", "c:", value = 0, width = '60px'
))
),
hr(style = "border-top: 1px solid #000000;"),
# phi, Phi
fluidRow(
column(
6,
tags$div(class = "slider-label", "ϕ:"),
sliderInput(
"fixed_ar",
NULL,
min = -0.9,
max = 0.9,
value = 0.45,
step = 0.15
)
),
column(
6,
tags$div(class = "slider-label", "θ:"),
sliderInput(
"fixed_ma",
NULL,
min = -0.9,
max = 0.9,
value = 0.5,
step = 0.15
)
)
),
# theta, Theta
fluidRow(
column(
6,
tags$div(class = "slider-label", "Φ:"),
sliderInput(
"fixed_sar",
NULL,
min = -0.9,
max = 0.9,
value = 0.2,
step = 0.15
)
),
column(
6,
tags$div(class = "slider-label", "Θ:"),
sliderInput(
"fixed_sma",
NULL,
min = -0.9,
max = 0.9,
value = 0.1,
step = 0.1
)
)
),
hr(style = "border-top: 1px solid #000000;"),
# Days
fluidRow(
column(3, numericInput(
"Mon", "Mon:", value = 1, width = '60px'
)),
column(3, numericInput(
"Tue", "Tue:", value = 0, width = '60px'
)),
column(3, numericInput(
"Wed", "Wed:", value = 2, width = '60px'
)),
column(3, numericInput(
"Thu", "Thu:", value = -2, width = '60px'
))
),
fluidRow(
column(3, numericInput(
"Fri", "Fri:", value = 3, width = '60px'
)),
column(3, numericInput(
"Sat", "Sat:", value = 1, width = '60px'
)),
column(3, numericInput(
"Sun", "Sun:", value = -1, width = '60px'
)),
column(3, numericInput(
"Mult", "Scale:", value = 0.5, width = '60px'
)),
column(3) # Empty column to align the days correctly
),
# Amplitude and phase shift
fluidRow(
column(
4,
numericInput("fixed_amp", "Amplitude:", value = 1.5, width = '100px')
),
column(
4,
tags$div(class = "slider-label", "φ:"),
sliderInput(
"fixed_peak_shift",
NULL,
min = 0,
max = 7,
step = 0.5,
value = 2
)
),
# Weekend effect
column(
4,
numericInput(
"fixed_wee",
"Weekend effect:",
value = 2,
width = '100px'
)
)
),
width = 4
),
# Plot
mainPanel(plotOutput(
"myImage", height = "150px", width = "350px"
),
width = 8)
)
)
server <- function(input, output) {
p <- eventReactive(input$go, {
plot_sim_rows(
fixed_c = input$fixed_c,
fixed_dowe = c(
input$Mon,
input$Tue,
input$Wed,
input$Thu,
input$Fri,
input$Sat,
input$Sun
) * input$Mult,
fixed_amp = input$fixed_amp,
fixed_peak_shift = input$fixed_peak_shift,
fixed_wee = input$fixed_wee,
fixed_sigma2 = input$fixed_sigma2,
fixed_ma = input$fixed_ma,
fixed_ar = input$fixed_ar,
fixed_sar = input$fixed_sar,
fixed_sma = input$fixed_sma,
fixed_n = input$fixed_n * 100,
fixed_seed = input$fixed_seed,
prefix = NULL,
for_shiny = TRUE,
# prefix = NULL,
file_format = "svg"
)
},
ignoreNULL = FALSE)
output$myImage <- renderPlot({
print(p())
}, width = 40 * 28.35 / 1.4,
height = (4 * 7 + 0.5) * 28.35 / 2) # Convert cm to pixels if needed
}
shinyApp(ui = ui, server = server)The app is hoseted online at https://psyguy.shinyapps.io/weekly-rhythms. To run the app locally, run source(here::here("scripts", "app.R").
2.4 Modeling time series data
There are to functions that are used for analyzing the data: m_fit() does the model fitting, and m_estimates() extracts the parameter estimates and post-processes them to get CIs and information criteria, and, if necessary, performs bootstrapping.
2.4.1 Fitting time series models
To carry out the analyses in the paper, we wrote a wrapper function called m_fit() around forecast::Arima() with additional capabilities which is a generalization of the code snippets shown in the paper.
This function flexibly determines the model orders by parsing the model name (as they were presented in the paper) passed as a string to model_string, which helps using it in elsewhere, e.g., in a Shiny app.
Click to reveal m_fit()
m_fit <- function(d,
model_string = "d + sarma(1,1)(1,1)",
id = NULL,
save_fit = FALSE,
save_est = FALSE,
save_name = NULL,
save_prefix_fit = "fit",
save_folder_fit = "fits",
save_prefix_est = "est",
save_folder_est = "ests",
...) {
model_string <- model_string %>%
tolower() %>%
gsub(" ", "", .)
# Extracting mean structure
mu <- "c"
if (grepl("d", model_string, fixed = TRUE))
mu <- "d"
if (grepl("w", model_string, fixed = TRUE))
mu <- "w"
if (grepl("h", model_string, fixed = TRUE))
mu <- "h"
# Extracting [S]AR[I]MA orders
orders <- model_string %>%
str_extract_all("\\d+") %>%
unlist() %>%
as.numeric()
# Adding NA to make sure the first element is for Monday
y_t <- d$y %>%
c(rep(NA, (d$weekday_num[1] - 1) %% 7),
.)
# Getting the length of the time series
n <- length(y_t)
## Building mu_t
# To make use Arima consistently, instead of changing
# include.mean for mu_y = c, we model mean with a constant
# variable for mu_t
if (mu == "c")
mu_t <- rep(1, n)
# Making the dummies matrix
if (mu == "d")
mu_t <- cbind(
Mon = rep(c(1, 0, 0, 0, 0, 0, 0), length.out = n),
Tue = rep(c(0, 1, 0, 0, 0, 0, 0), length.out = n),
Wed = rep(c(0, 0, 1, 0, 0, 0, 0), length.out = n),
Thu = rep(c(0, 0, 0, 1, 0, 0, 0), length.out = n),
Fri = rep(c(0, 0, 0, 0, 1, 0, 0), length.out = n),
Sat = rep(c(0, 0, 0, 0, 0, 1, 0), length.out = n),
Sun = rep(c(0, 0, 0, 0, 0, 0, 1), length.out = n)
)
# Making weekday-weekend dummies matrix
if (mu == "w")
mu_t <- cbind(
# Weekday = rep(c(1, 1, 1, 1, 1, 0, 0), length.out = n),
c = rep(c(1, 1, 1, 1, 1, 1, 1), length.out = n),
Weekend = rep(c(0, 0, 0, 0, 0, 1, 1), length.out = n)
)
# Making the harmonic matrix
if (mu == "h")
mu_t <- cbind(
c = rep(1, n),
a_cos = cos((1:n) * 2 * pi / 7),
b_sin = sin((1:n) * 2 * pi / 7)
)
## Defining the model orders from the model string
# Default: white noise
order_daily <- c(0, 0, 0)
# Default: non-seasonal
order_seasonal <- c(0, 0, 0)
seasonal <- NULL
# Setting orders conditional on the model string
if (grepl("ar(", model_string, fixed = TRUE)) {
order_daily <- c(orders[1], 0, 0)
}
if (grepl("sar(", model_string, fixed = TRUE)) {
order_daily <- c(orders[1], 0, 0)
order_seasonal <- c(orders[2], 0, 0)
}
if (grepl("ma(", model_string, fixed = TRUE)) {
order_daily <- c(0, 0, orders[1])
}
if (grepl("sma(", model_string, fixed = TRUE)) {
order_daily <- c(0, 0, orders[1])
order_seasonal <- c(0, 0, orders[2])
}
if (grepl("arma(", model_string, fixed = TRUE)) {
order_daily <- c(orders[1], 0, orders[2])
}
if (grepl("sarma(", model_string, fixed = TRUE)) {
order_daily <- c(orders[1], 0, orders[2])
order_seasonal <- c(orders[3], 0, orders[4])
}
if (sum(order_seasonal) != 0)
seasonal <- list(order = order_seasonal,
period = 7)
## Fitting the model to the time series
m <- forecast::Arima(
y_t,
order = order_daily,
seasonal = seasonal,
xreg = mu_t,
include.mean = FALSE
)
if (is.null(save_name))
save_name <- paste0(
mu,
"+sarma(",
order_daily[1],
",",
order_daily[3],
")(",
order_seasonal[1],
",",
order_seasonal[3],
")"
)
if(!is.null(id)) save_name <- paste0("id-",
id,
"_",
save_name)
if (save_fit == TRUE)
saveRDS(m,
file = save_name %>%
paste0(save_prefix_fit,
"_",
.,
".rds") %>%
here::here(save_folder_fit,
.))
if (save_est == TRUE)
m %>%
m_estimates(...) %>%
saveRDS(file = save_name %>%
paste0(save_prefix_est,
"_",
.,
".rds") %>%
here::here(save_folder_est,
.))
## Returning the model object
return(m)
}This function can also save the fit object in a folder (specified by save_folder_fit). To streamline the model fitting and post-processing the fit files, m_fit() will call m_estimates() if save_est is set to TRUE.
2.4.2 Extracting parameter estimates
The output of m_fit() is a list of class Arima that contains multiple components (see ?stats::arima and ?forecast::Arima), but does not contain the estimation standard errors and significances. Furthermore, in case \(H_t\) is included in the model, the output does not contain estimates for peak shift \(\psi\) and amplitude \(S\).
The function m_estimates() extracts the point estimates of the parameters( m$coef) and calculates their estimation standard errors using the estimated variance matrix of the coefficients (m$var.coef), and calculates 2.5% and 97.5% confidence intervals and significance of the estimates, and stores them in a dataframe called estimates. The function returns a list that contains this dataframe as well as a 1-row dataframe information_criteria that includes different information criteria (AIC, AICc, BIC) as well as the absolute value of the log-likelihood (that could have been used as a criterion in model selection, which we did not use in our paper).
Click to reveal m_estimates()
m_estimates <- function(m,
boot_plot = FALSE,
boot_n = 10000,
boot_seed = 0) {
# Point estimates of the parameters
o_est <- coef(m)
# Varviance-covariance matrix of the parameter estimates
o_vcov <- vcov(m)
# Changing the name of "xreg" to "c" for mu_t == "c"
xreg_index <- names(o_est) == "xreg"
names(o_est)[xreg_index] <- "c"
rownames(o_vcov)[xreg_index] <- "c"
colnames(o_vcov)[xreg_index] <- "c"
o_sd <- o_vcov %>%
diag() %>%
sqrt()
# Making the output table
estimates <- data.frame(
parameter = "sigma2",
est = m$sigma2,
se = NA,
CI_2.5 = NA,
CI_97.5 = NA,
sig = "*"
)
rownames(estimates) <- "sigma2"
if (length(o_est) > 0)
estimates <- data.frame(parameter = o_est %>% names,
est = o_est,
se = o_sd) %>%
mutate(
CI_2.5 = est + qnorm(0.025) * se,
CI_97.5 = est + qnorm(0.975) * se,
sig = case_when(CI_2.5 * CI_97.5 > 0 ~ "*",
TRUE ~ "ns")
) %>%
rbind(estimates)
rownames(estimates) <- NULL
# Bootstrapping -----------------------------------------------------------
# Checking if h_t was fitted which would require bootstrapping
if (sum(names(m$coef) == "a_cos",
names(m$coef) == "b_sin")) {
# Selecting est and vcov of only the relevant parameters for h_t
o_est_h <- o_est[c("c", "a_cos", "b_sin")]
o_vcov_h <-
o_vcov[c("c", "a_cos", "b_sin"), c("c", "a_cos", "b_sin")]
# Bootstrap sampling
set.seed(boot_seed)
boot_parameters <- MASS::mvrnorm(boot_n,
o_est_h,
o_vcov_h)
boot_c <- boot_parameters[, "c"]
boot_a <- boot_parameters[, "a_cos"]
boot_b <- boot_parameters[, "b_sin"]
boot_s <- sqrt(boot_a^2 + boot_b^2)
boot_psi_radian <- atan2(boot_b, boot_a) %>%
as.numeric() %>%
as.circular()
boot_psi_radian_quantiles <- boot_psi_radian %>%
as.circular() %>%
quantile(c(.025, .5, .975)) %>%
as.numeric()
boot_psi <- boot_psi_radian * 7 / (2 * pi)
boot_psi_radian_mod <- boot_psi_radian %% (2 * pi)
# plot(boot_s, boot_psi_radian)
# Bootstrapped summaries
b_c <- mean(boot_c)
b_s <- median(boot_s)
b_s_sd <- sd(boot_s)
b_s_CIs <- as.numeric(quantile(boot_s, c(0.025, 0.975)))
b_psi <- as.numeric(median(boot_psi_radian)) * 7 / (2 * pi)
b_psi_sd <- as.numeric(sd(boot_psi_radian)) * 7 / (2 * pi)
b_psi_CIs <-
as.numeric(quantile(boot_psi_radian, c(0.025, 0.975))) * 7 / (2 * pi)
# Adding the estimates to the table
boot_estimates <- data.frame(
parameter = c("amp", "peak_shift"),
est = c(b_s, b_psi),
se = c(b_s_sd, b_psi_sd),
CI_2.5 = c(b_s_CIs[1], b_psi_CIs[1]),
CI_97.5 = c(b_s_CIs[2], b_psi_CIs[2]),
sig = c("*", "*")
)
estimates <- rbind(estimates, boot_estimates)
if (boot_plot) {
t <- seq(0, 7, .1)
tt <- length(t)
rr <- 100
df <- data.frame(
r = rep(1:rr, each = tt) %>% as.character(),
c = rep(boot_c %>% head(rr), each = tt),
s = rep(boot_s %>% head(rr), each = tt),
psi = rep(boot_psi %>% head(rr), each = tt),
t = rep(t, times = rr)
) %>%
mutate(h_t = c + s * cos((t - psi) * 2 * pi / 7))
p_boot <- df %>%
group_by(r) %>%
ggplot(alpha = 0.05,
lwd = 0.1,
linetype = "solid") +
aes(x = t,
y = h_t,
group = r) +
# Separate curves for the first 100 bootstrapped samples
geom_line(aes(
x = t,
y = h_t,
group = r,
color = "single_curves"
)) +
# Average of the separate curves the first 100 bootstrapped samples
stat_summary(
fun.y = mean,
geom = "line",
lwd = 1.5,
aes(group = -1, color = "mean_curve"),
alpha = 1
) +
# Curve calculated by bootstrapping point estimates
geom_function(
fun = function(t)
b_c + b_s * cos(2 * pi / 7 * (t - b_psi)),
mapping = aes(color = "boot_curve"),
linetype = "solid",
lwd = .5,
alpha = 0.5
) +
geom_hline(yintercept = b_c + b_s,
color = "red",
linetype = "solid",
lwd = 1,
alpha = 0.85) +
geom_vline(xintercept = b_psi %% 7,
color = "red",
linetype = "solid",
lwd = 1,
alpha = 0.85) +
# Curve calculated by a and b point estimates without bootstrapping
geom_function(
fun = function(t)
o_est_h[1] +
o_est_h[2] * cos(t * 2 * pi / 7) +
o_est_h[3] * sin(t * 2 * pi / 7),
mapping = aes(color = "naive_curve"),
linetype = "twodash",
lwd = .5,
alpha = 0.5
) +
geom_hline(yintercept = o_est_h[1] + sqrt(o_est_h[2]^2 + o_est_h[3]^2),
color = "green",
linetype = "twodash",
lwd = 1,
alpha = 0.85) +
geom_vline(xintercept = (calc_peak_theta(o_est_h[2], o_est_h[3])*7/(2*pi)) %% 7,
color = "green",
linetype = "twodash",
lwd = 1,
alpha = 0.85) +
geom_vline(xintercept = (atan(o_est_h[3]/o_est_h[2])*7/(2*pi)) %% 7,
color = "black",
linetype = "dotted",
lwd = 1,
alpha = 0.85)
scale_color_manual(
name = "Y series",
values = c(
"single_curves" = "pink",
"mean_curve" = "darkblue",
"boot_curve" = "red",
"naive_curve" = "green"
)
)
}
}
estimates[2:5] <- estimates[2:5] %>% round(4)
information_criteria <- data.frame(
AIC = m$aic,
AICc = m$aicc,
BIC = m$bic,
ll_abs = abs(m$loglik)
)
o <- list(estimates = estimates,
information_criteria = information_criteria)
# if (exists("p_boot"))
# o$boot_plot <- p_boot
return(o)
}2.4.2.1 Estimating harmonic parameters by bootstrapping
In case the harmonic mean structure is in the model, this function also calculates points estimates and their CIs of peak shift \(\psi\) and amplitude \(S\) by means of bootstrapping by drawing boot_n samples (default: 10000) from the multivariate normal distribution attributable implied by the estimated variance matrix of the coefficients.
The amplitude \(S\) and peak shift \(\psi\) are respectively calculated by two auxiliary functions, calc_amp() and calc_peak_theta(), which are vectorized (using Vectorize()) to increase the performance when applied to bootstrapped values. Note that calc_peak_theta(a, b) is equivalent to base::atan2(b, a), which is the 2-argument arctangent function; we explicitly defined this function in our code to mirror the expression provided in Equation 22b of the paper.
Note that the variable \(\psi\) is circular with a period of 7, that is, peak shifts of \(\psi\) and \(\psi + 7\) and \(\psi - 7\) are equivalent, thus its value cannot calculate summary statistics such as mean, median, and quantiles based on its \([0, 7)\) values; for instance, an individual with \(\psi_1 = 1\) (peaking on Monday) is more similar to someone with \(\psi_2 = 6\) (peaking on Saturday) than someone with and \(\psi_3 = 4\) (peaking on Thursday). See Cremers and Klugkist (2018) for more insights.
To take the circularity of this variable into account, we use circular package (Lund et al. 2023) and the function circular::as.circular, that treats the output of atan2() in radians and assures that its circular equivalent spans \([0, 2\pi)\). Consequently, there is no need of taking the modulo of the values boot_psi_radian (as explained in Equation 23 of the manuscript), and we only need to multiply them by \(\frac{7}{2\pi}\) to reach to boot_psi_radian (that spans \([0, 7)\)). Furthermore, given that the distribution of \(\psi\) (as well as \(S\)) will be skewed due to the nonlinear transformation of random variable \(a\) and \(b\), we use the median to determine the point estimates of \(S\) and \(\psi\), and calculate the confidence intervals based on 2.5 and 97.5% quantiles.
3 Reproducing figures, analyses, and results
3.1 Importing and investigating the empirical data
The empirical data used in this paper comes form Wright, Hopwood, and Simms (2015). For our study, we used the average PA scores of 98 individuals, who had less than 50% missing responses. This subset of data is stored as r_pa.rds in the data/ folder of the repository, which does not contain demographic information, and the unique person identifies have been shuffled and stored as id. Other variables in the dataset are the measurement date (date), the measurement time (t, starting from the first day of the study on 2013-02-11), the weekday associated with the measurement date (weekday), the average PA time series (y), the weekday number (weekday_num) with Monday being the first day of the week, and the week number (week_num) associated with that date.
The data is shown in the following table, and it can be filtered based on some variables. The data can also be copied to the clipboard or downloaded as PDF, CSV, or an Excel file. In case you use the empirical data in a study, please cite Wright, Hopwood, and Simms (2015) as well.
To generate the visualizations of all individuals, we make 14 batches of seven individuals using the following code:
Click to reveal code
for(batch in 1:14){
ids <- split(1:98,
ceiling(seq_along(1:98)/7))[[batch]]
l_selected_batch <- list()
for (id_ in ids){
l_selected_batch[[paste("ID:", id_)]] <- d_pa %>%
filter(id == id_)
}
plot_row_assembly(l_selected_batch,
names(l_selected_batch),
max_period = 35,
save_name = paste0("rows-empirical-35_batch-",
batch,
".svg"))
}The resulting figures are shown below:
3.2 Reproducing figures in the paper
3.2.1 Visualizing empirical time series
The empirical time series shown of Persons A-F in the Figure 1 of the paper belonged to individuals with identifiers 70, 14, 12, 43, 97, and 20. To make the same plot, we used the following script, which makes use of plot_row_assembly():
Click to reveal code
d_pa <- readRDS(here::here("data",
"d_pa.rds"))
dir.create(file.path(here::here("figures")))
names_selected_6 <-
c("Person A",
"Person B",
"Person C",
"Person D",
"Person E",
"Person F")
d_selected_6 <- d_pa %>%
mutate(
individual = case_when(
id == 70 ~ "Person A",
id == 14 ~ "Person B",
id == 12 ~ "Person C",
id == 43 ~ "Person D",
id == 97 ~ "Person E",
id == 20 ~ "Person F",
TRUE ~ "rest"
) %>%
as.factor()
) %>%
filter(individual != "rest")
l_selected_6 <- list()
for (name in names_selected_6){
l_selected_6[[name]] <- d_selected_6 %>%
filter(individual == name)
}
plot_row_assembly(l_selected_6, names(l_selected_6),
max_period = 35,
save_name = "rows-empirical-35.svg")That generates the following figure:
Note that here we saved the figure as an svg file to be viewed in the HTML output, which has resulted in fonts different from those shown in the paper.
3.2.2 Visualizing simulated time series
Figures 2-5 were generated by plot_sim_rows(), which uses simulating data with m_sim() and plotting them using plot_row_assembly(), and is defined as follows:
Click to reveal code
plot_sim_rows <- function(fixed_c = 0,
fixed_dowe = rep(0, 7),
fixed_amp = 0,
fixed_peak_shift = 0,
fixed_wee = 0,
fixed_sigma2 = 1,
fixed_ma = 0.6,
fixed_ar = 0.7,
fixed_sar = 0.4,
fixed_sma = 0.5,
fixed_n = 10000,
fixed_seed = 0,
prefix = "c",
for_shiny = FALSE,
file_format = "svg",
...
){
if(for_shiny == TRUE)
l_d_sim_w <- list(
c = m_sim(
c = fixed_c,
sigma2 = fixed_sigma2,
ar = fixed_ar,
ma = fixed_ma,
sar = fixed_sar,
sma = fixed_sma,
n = fixed_n,
seed = fixed_seed
),
D_t = m_sim(
dowe = fixed_dowe,
sigma2 = fixed_sigma2,
ar = fixed_ar,
ma = fixed_ma,
sar = fixed_sar,
sma = fixed_sma,
n = fixed_n,
seed = fixed_seed
),
H_t = m_sim(
c = fixed_c,
amp = fixed_amp,
peak_shift = fixed_peak_shift,
sigma2 = fixed_sigma2,
ar = fixed_ar,
ma = fixed_ma,
sar = fixed_sar,
sma = fixed_sma,
n = fixed_n,
seed = fixed_seed
),
W_t = m_sim(
c = fixed_c,
wee = fixed_wee,
sigma2 = fixed_sigma2,
ar = fixed_ar,
ma = fixed_ma,
sar = fixed_sar,
sma = fixed_sma,
n = fixed_n,
seed = fixed_seed
)
)
if(for_shiny == FALSE)
l_d_sim_w <- list(
wn = m_sim(
c = fixed_c,
dowe = fixed_dowe,
amp = fixed_amp,
peak_shift = fixed_peak_shift,
wee = fixed_wee,
sigma2 = fixed_sigma2,
n = fixed_n,
seed = fixed_seed
),
ma = m_sim(
c = fixed_c,
dowe = fixed_dowe,
amp = fixed_amp,
peak_shift = fixed_peak_shift,
wee = fixed_wee,
sigma2 = fixed_sigma2,
ma = fixed_ma,
n = fixed_n,
seed = fixed_seed
),
ar = m_sim(
c = fixed_c,
dowe = fixed_dowe,
amp = fixed_amp,
peak_shift = fixed_peak_shift,
wee = fixed_wee,
sigma2 = fixed_sigma2,
ar = fixed_ar,
n = fixed_n,
seed = fixed_seed
),
arma = m_sim(
c = fixed_c,
dowe = fixed_dowe,
amp = fixed_amp,
peak_shift = fixed_peak_shift,
wee = fixed_wee,
sigma2 = fixed_sigma2,
ar = fixed_ar,
ma = fixed_ma,
n = fixed_n,
seed = fixed_seed
),
sma = m_sim(
c = fixed_c,
dowe = fixed_dowe,
amp = fixed_amp,
peak_shift = fixed_peak_shift,
wee = fixed_wee,
sigma2 = fixed_sigma2,
ma = fixed_ma,
sma = fixed_sma,
n = fixed_n,
seed = fixed_seed
),
sar = m_sim(
c = fixed_c,
dowe = fixed_dowe,
amp = fixed_amp,
peak_shift = fixed_peak_shift,
wee = fixed_wee,
sigma2 = fixed_sigma2,
ar = fixed_ar,
sar = fixed_sar,
n = fixed_n,
seed = fixed_seed
),
sarma = m_sim(
c = fixed_c,
dowe = fixed_dowe,
amp = fixed_amp,
peak_shift = fixed_peak_shift,
wee = fixed_wee,
sigma2 = fixed_sigma2,
ar = fixed_ar,
ma = fixed_ma,
sar = fixed_sar,
sma = fixed_sma,
n = fixed_n,
seed = fixed_seed
)
)
save_name <- paste0("rows-sim-7-",
prefix,
".",
file_format)
if(is.null(prefix) | for_shiny == TRUE){
save_name <- NULL
prefix <- "a_t"
}
plot_row_assembly(l_d_sim_w,
names(l_d_sim_w) %>%
toupper() %>%
paste0(prefix, "+", .),
save_name = save_name,
...)
}Running the following code makes the simulated figures:
:::
3.3 Reproducing the analyses of the empirical dataset
The following code was used for the empirical analyses in the paper. In short, we first load the required packages and functions, read the empirical data, and make sure that the folders for saving models fits and estimates exist. Then, a dataframe conditions_table is made, and parallel processing is initiated.
The analyses are done by iterating over rows of this dataframe using foreach() in parallel, and the rows in the dataframe which determine the data of which individual should be analyzed by what mean structure and what SARMA model orders at a given iteration. By running the loop, a fit file and a est file are generated and saved, respectively, in fits/ and ests/ folders. The analyses took less than two minutes on our server with 46 parallel threads.
Click to reveal code for running the analyses
source(here::here("scripts", "initialization.R"))
source(here::here("scripts", "functions_modeling.R"))
# Reading the empirical data
d_pa <- readRDS("data/d_pa.rds")
# Making sure the folders for saving models fits and estimates exist
dir.create(file.path(here::here("fits")))
dir.create(file.path(here::here("ests")))
# Making the table of conditions (i.e, model orders etc.)
conditions_table <- expand.grid(
id = 1:98,
mu = c("c", "d", "h", "w"),
ar = c(0, 1),
ma = c(0, 1),
sar = c(0, 1),
sma = c(0, 1)
)
# Initialization
Sys.setenv("OMP_THREAD_LIMIT" = 46)
options(warn = -1)
# Launching parallel computation
doFuture::registerDoFuture()
future::plan("multisession",
workers = 46)
# Running the fitting and estimation in parallel
# (t_start <- Sys.time())
fit_logs <- foreach(i = 1:nrow(conditions_table)) %dopar% {
t_c <- conditions_table[i, ]
# Getting the data
d_pa_i <- d_pa %>% filter(id == t_c$id)
dump <- tryCatch({
paste0(t_c$mu,
"+sarma(",
t_c$ar,
",",
t_c$ma,
")(",
t_c$sar,
",",
t_c$sma,
")") %>%
m_fit(
d_pa_i,
.,
id = t_c$id,
save_folder_fit = "fits",
save_folder_est = "ests",
save_fit = TRUE,
save_est = TRUE
)
}, error = function(e)
NULL)
# paste(i,
# "done at",
# Sys.time()) %>%
# print()
}
# (t_end <- Sys.time())
# t_end - t_start3.4 Results
After fitting the models, the estimate files are read from ests/ to make a large dataframe (harvest) of containing all model estimates and information criteria for all models and all individuals. This dataframe is them modified to include other properties, such as timescales, dynamics, and total model, which is saved as results_estimates and is easier for further inspections.
Click to reveal code for extracting the results
# Harvesting the estimates from est_* files -------------------------------
# Listing the estimated files
list_est_files <- list.files("ests", pattern = ".rds", full.names = TRUE)
# Launching parallel computation
doFuture::registerDoFuture()
future::plan("multisession",
workers = 46)
# Running the harvest
# Sys.time()
harvest <- foreach(
i = 1:length(list_est_files),
.combine = rbind,
.errorhandling = 'remove'
) %dopar% {
file_name <- list_est_files[i]
ee <- readRDS(file_name)
mu <- gsub(".*_([chdw])\\+.*",
"\\1",
file_name)
nums <- file_name %>%
str_extract_all("\\d+") %>%
unlist() %>%
as.numeric()
harv <- ee$estimates %>%
cbind(
id = nums[1],
mu = mu,
ar = nums[2],
ma = nums[3],
sar = nums[4],
sma = nums[5],
.
) %>%
cbind(ee$information_criteria)
harv
}
# Sys.time()
# Adding more descriptive variables to the results dataframe
results_estimates <- harvest %>%
ungroup() %>%
mutate(
error_name = paste0("sarma",
"(", ar, ",", ma, ")",
"(", sar, ",", sma, ")"),
error_name_words = case_when(
error_name == "sarma(0,0)(0,0)" ~ "01_wn",
error_name == "sarma(1,0)(0,0)" ~ "02_ar(1)",
error_name == "sarma(0,1)(0,0)" ~ "03_ma(1)",
error_name == "sarma(1,1)(0,0)" ~ "04_arma(1,1)",
error_name == "sarma(0,0)(1,0)" ~ "05_sar(0)(1)",
error_name == "sarma(1,0)(1,0)" ~ "06_sar(1)(1)",
error_name == "sarma(0,0)(0,1)" ~ "07_sma(0)(1)",
error_name == "sarma(0,1)(0,1)" ~ "08_sma(1)(1)",
error_name == "sarma(0,0)(1,1)" ~ "09_sarma(0,0)(1,1)",
error_name == "sarma(0,1)(1,0)" ~ "10_sarma(0,1)(1,0)",
error_name == "sarma(0,1)(1,1)" ~ "11_sarma(0,1)(1,1)",
error_name == "sarma(1,0)(0,1)" ~ "12_sarma(1,0)(0,1)",
error_name == "sarma(1,0)(1,1)" ~ "13_sarma(1,0)(1,1)",
error_name == "sarma(1,1)(0,1)" ~ "14_sarma(1,1)(0,1)",
error_name == "sarma(1,1)(1,0)" ~ "15_sarma(1,1)(1,0)",
error_name == "sarma(1,1)(1,1)" ~ "16_sarma(1,1)(1,1)"
) %>%
as.factor(),
model_name = paste0(mu, "+", error_name),
timescale_daily = (ar == 1 | ma == 1),
timescale_weekly = (sar == 1 | sma == 1),
error_timescale = case_when(
!timescale_weekly ~ "daily", # To include wwhite noise as well
!timescale_daily & timescale_weekly ~ "weekly",
timescale_daily &
timescale_weekly ~ "daily + weekly"
) %>%
factor(levels = c("daily", "weekly", "daily + weekly")),
model_timescale = paste0(mu, " + ", error_timescale),
dynamics_ar = (ar == 1 | sar == 1),
dynamics_ma = (ma == 1 | sma == 1),
error_dynamics = case_when(
dynamics_ar & !dynamics_ma ~ "(s)ar",
!dynamics_ar & dynamics_ma ~ "(s)ma",
dynamics_ar &
dynamics_ma ~ "(s)ar + (s)ma",
TRUE ~ "wn"
) %>%
factor(levels = c("wn", "(s)ar", "(s)ma", "(s)ar + (s)ma")),
model_dynamics = paste0(mu, " + ", error_dynamics),
.after = sma
)
# Saving the processed estimates & ICs extracted from est_ files
saveRDS(results_estimates,
here::here("data", "results_estimates.rds"))
# Reporting the results ---------------------------------------------------Lastly, the results_estimates is summarized to only contain the number of times each of the 64 models have won according to different information criteria, and the results reported in Table 2 and 3 are obtained using the stats::xtabs(), as implemented in the code.
Click to reveal code for reporting the results
```{r}
#| eval: true
#| code-fold: true
#| code-summary: "Click to reveal code for reporting the results"
#| include: true
# Reading the results dataframe
results_estimates <- readRDS(here::here("data", "results_estimates.rds"))
# Counting selected models
r_s <- results_estimates %>%
filter(parameter == "sigma2") %>%
select(-parameter:-sig) %>%
group_by(id) %>%
mutate(
win_AIC = case_when(min(AIC) == AIC ~ 1,
TRUE ~ 0),
win_AICc = case_when(min(AICc) == AICc ~ 1,
TRUE ~ 0),
win_BIC = case_when(min(BIC) == BIC ~ 1,
TRUE ~ 0)
) %>%
group_by(across(mu:model_dynamics),
.add = FALSE) %>%
summarise(
n_w_AIC = sum(win_AIC),
n_w_AICc = sum(win_AICc),
n_w_BIC = sum(win_BIC)
)
## Making the tables
## For other ICs, change AICc to AIC or BIC
# Making the detailed table with 16 rows
r_s %>%
xtabs(n_w_AICc ~ error_name_words + mu,
.) %>%
kable(caption =
"Fequency of selected models based on AICc
groupped by $a_t$ and $\\mu_t$")
# Main body of the summarized table
r_s %>%
filter(error_timescale == "daily") %>%
xtabs(n_w_AICc ~ error_dynamics + mu,
.) %>%
kable(caption =
"Fequency of selected models based on AICc
with **daily dynamics**, groupped by $a_t$ dynamics and $\\mu_t$")
# Main body of the summarized table
r_s %>%
filter(error_timescale == "weekly") %>%
xtabs(n_w_AICc ~ error_dynamics + mu,
.) %>%
kable(caption =
"Fequency of selected models based on AICc
with **weekly dynamics**, groupped by $a_t$ dynamics and $\\mu_t$")
# Main body of the summarized table
r_s %>%
filter(error_timescale == "daily + weekly") %>%
xtabs(n_w_AICc ~ error_dynamics + mu,
.) %>%
kable(caption =
"Fequency of selected models based on AICc
with **seasonal dynamics**, groupped by $a_t$ dynamics and $\\mu_t$")
# Rows total per mu (Total a)
r_s %>%
xtabs(n_w_AICc ~ error_dynamics + mu,
.) %>%
kable(caption =
"Fequency of selected models based on AICc,
groupped by $a_t$ dynamics and $\\mu_t$")
# Rows sum all mu (Sum b)
r_s %>%
xtabs(n_w_AICc ~ error_dynamics,
.) %>%
kable(caption =
"Fequency of selected models based on AICc,
groupped by $a_t$ dynamics (Sum b)")
# Columns total per mu (Total c)
r_s %>%
xtabs(n_w_AICc ~ error_timescale + mu ,
.) %>%
kable(caption =
"Fequency of selected models based on AICc
with **daily dynamics**, groupped by $a_t$ timescale and $\\mu_t$ (Total c)")
# Column sum all mu (Sum d)
r_s %>%
xtabs(n_w_AICc ~ error_timescale,
.) %>%
kable(caption =
"Fequency of selected models based on AICc
with **daily dynamics**, groupped by $a_t$ timescale (Sum d)")
# Totals per mu
r_s %>%
xtabs(n_w_AICc ~ mu,
.) %>%
kable(caption =
"Fequency of selected models based on AICc, groupped by $\\mu_t$")
```| c | d | h | w | |
|---|---|---|---|---|
| 01_wn | 8 | 1 | 4 | 5 |
| 02_ar(1) | 11 | 1 | 4 | 3 |
| 03_ma(1) | 3 | 0 | 0 | 2 |
| 04_arma(1,1) | 18 | 1 | 8 | 4 |
| 05_sar(0)(1) | 1 | 0 | 0 | 0 |
| 06_sar(1)(1) | 0 | 0 | 0 | 0 |
| 07_sma(0)(1) | 1 | 0 | 0 | 0 |
| 08_sma(1)(1) | 0 | 0 | 0 | 0 |
| 09_sarma(0,0)(1,1) | 3 | 2 | 1 | 0 |
| 10_sarma(0,1)(1,0) | 0 | 0 | 0 | 0 |
| 11_sarma(0,1)(1,1) | 0 | 0 | 0 | 0 |
| 12_sarma(1,0)(0,1) | 0 | 1 | 0 | 0 |
| 13_sarma(1,0)(1,1) | 0 | 2 | 0 | 1 |
| 14_sarma(1,1)(0,1) | 1 | 0 | 0 | 0 |
| 15_sarma(1,1)(1,0) | 3 | 0 | 0 | 2 |
| 16_sarma(1,1)(1,1) | 1 | 4 | 0 | 2 |
| c | d | h | w | |
|---|---|---|---|---|
| wn | 8 | 1 | 4 | 5 |
| (s)ar | 11 | 1 | 4 | 3 |
| (s)ma | 3 | 0 | 0 | 2 |
| (s)ar + (s)ma | 18 | 1 | 8 | 4 |
| c | d | h | w | |
|---|---|---|---|---|
| wn | 0 | 0 | 0 | 0 |
| (s)ar | 1 | 0 | 0 | 0 |
| (s)ma | 1 | 0 | 0 | 0 |
| (s)ar + (s)ma | 3 | 2 | 1 | 0 |
| c | d | h | w | |
|---|---|---|---|---|
| wn | 0 | 0 | 0 | 0 |
| (s)ar | 0 | 0 | 0 | 0 |
| (s)ma | 0 | 0 | 0 | 0 |
| (s)ar + (s)ma | 5 | 7 | 0 | 5 |
| c | d | h | w | |
|---|---|---|---|---|
| wn | 8 | 1 | 4 | 5 |
| (s)ar | 12 | 1 | 4 | 3 |
| (s)ma | 4 | 0 | 0 | 2 |
| (s)ar + (s)ma | 26 | 10 | 9 | 9 |
| error_dynamics | Freq |
|---|---|
| wn | 18 |
| (s)ar | 20 |
| (s)ma | 6 |
| (s)ar + (s)ma | 54 |
| c | d | h | w | |
|---|---|---|---|---|
| daily | 40 | 3 | 16 | 14 |
| weekly | 5 | 2 | 1 | 0 |
| daily + weekly | 5 | 7 | 0 | 5 |
| error_timescale | Freq |
|---|---|
| daily | 73 |
| weekly | 8 |
| daily + weekly | 17 |
| mu | Freq |
|---|---|
| c | 50 |
| d | 12 |
| h | 17 |
| w | 19 |
References
Citation
@article{mohammadhosseinmanuel2024,
author = {Haqiqatkhah, Mohammadhossein Manuel and Hamaker, Ellen L.},
title = {Daily Dynamics and Weekly Rhythms: {A} Tutorial on Seasonal
{ARMA} Models Combined with Day-of-Week Effects},
journal = {PsyArXiv Preprints},
date = {2024-02-20},
url = {https://psyarxiv.com/duvqh},
doi = {10.31234/osf.io/duvqh},
langid = {en}
}