Bootstrap Balanced Resampling Analysis

Overview

This document applies bootstrap balanced resampling to assess whether developmental effects estimated with linear mixed-effects models (LMMs) are robust to unequal sampling across animals.

The Problem: Unbalanced Clustered Data

In longitudinal studies, animals often contribute different numbers of observations. This imbalance can bias population-level estimates toward the patterns of well-sampled individuals.
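A toy illustration of this bias, using made-up values and a simple mean in place of an LMM estimate: when one animal contributes nine observations and another contributes one, the pooled mean sits close to the well-sampled animal, while averaging per-animal means gives both animals equal influence.

```r
# Made-up data: animal "A" contributes 9 observations, animal "B" only 1
obs <- data.frame(
  animal = c(rep("A", 9), "B"),
  value  = c(rep(10, 9), 0)
)

# Pooling all observations lets the well-sampled animal dominate
pooled_mean <- mean(obs$value)

# Averaging per-animal means weights each animal equally
animal_means  <- tapply(obs$value, obs$animal, mean)
balanced_mean <- mean(animal_means)

pooled_mean    # 9
balanced_mean  # 5
```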

The Solution: Bootstrap Balanced Resampling

Bootstrap balanced resampling addresses this by:

1. Finding the minimum number of observations per animal (n_min) across all animals
2. Repeating B times (e.g., 500 to 1000 iterations):
   - Randomly sample n_min observations from each animal
   - Fit the LMM on the balanced subsample
   - Store the fixed-effect slope estimate
3. Computing summary statistics over the B stored slopes: mean slope, SE, and 95% confidence interval
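The steps above can be sketched as follows. This is a hypothetical helper, not the VNS implementation: it assumes a data frame with columns `animal_id`, `dph` (age in days, matching the per-day slopes plotted later) and `y` (the metric), draws observations within each animal without replacement, and uses `nlme::lme` with a random intercept as a stand-in for whatever LMM the package fits internally.

```r
library(nlme)  # lme() and fixef(); nlme is a recommended package shipped with R

# Hypothetical helper (not the VNS API). Assumes columns animal_id, dph, y.
balanced_boot_slopes <- function(data, n_boot = 500) {
  # Step 1: smallest number of observations contributed by any animal
  n_min <- min(table(data$animal_id))
  rows_by_animal <- split(seq_len(nrow(data)), data$animal_id)

  slopes <- rep(NA_real_, n_boot)
  for (b in seq_len(n_boot)) {
    # Step 2a: draw n_min rows per animal, without replacement (assumption)
    keep <- unlist(lapply(rows_by_animal,
                          function(ix) ix[sample.int(length(ix), n_min)]))
    sub_df <- data[keep, ]
    # Steps 2b/2c: fit a random-intercept LMM and keep the dph slope;
    # a failed fit is recorded as NA instead of stopping the loop
    slopes[b] <- tryCatch(
      fixef(lme(y ~ dph, random = ~ 1 | animal_id, data = sub_df))[["dph"]],
      error = function(e) NA_real_
    )
  }
  slopes
}

# Usage on simulated data with unequal sampling (true slope = 0.5)
set.seed(1)
n_per <- c(4, 6, 8, 10, 12, 5)
toy <- do.call(rbind, lapply(seq_along(n_per), function(i) {
  dph <- runif(n_per[i], 20, 60)
  data.frame(animal_id = factor(paste0("a", i)),
             dph = dph,
             y = 0.5 * dph + rnorm(1, sd = 5) + rnorm(n_per[i], sd = 1))
}))

slopes <- balanced_boot_slopes(toy, n_boot = 200)

# Step 3: summaries; SE taken as the SD of the bootstrap slopes (a common convention)
c(mean = mean(slopes, na.rm = TRUE),
  se   = sd(slopes, na.rm = TRUE),
  quantile(slopes, c(0.025, 0.975), na.rm = TRUE))
```

Recording failed fits as `NA` keeps a single non-converging subsample from aborting a long run; the number of `NA`s is itself worth reporting.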

Comparison with Weighted LMMs

Approach	Mechanism	Advantage
Inverse Weighting	Downweight observations from large clusters	Uses all data
Bootstrap Balanced	Subsample to equal n per cluster	Provides a full distribution of estimates

We use the inverse-weighted LMM as the benchmark and the bootstrap to assess whether its estimates are consistent under balanced sampling.
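For intuition on the inverse-weighting row, here is a minimal sketch of one common weighting scheme, assuming each observation is weighted by 1 / n_i (n_i = observations from its animal) so that every animal's weights sum to one. The exact weights used by `balance_method = "inverse"` and how they are passed to the LMM are not shown in this document, so treat `inverse_weights()` as a hypothetical helper.

```r
# Hypothetical helper: inverse cluster-size weights, w_ij = 1 / n_i
inverse_weights <- function(animal_id) {
  n_i <- table(animal_id)                  # observations per animal
  as.numeric(1 / n_i[as.character(animal_id)])
}

animal_id <- c("A", "A", "A", "A", "B")
w <- inverse_weights(animal_id)
w                           # 0.25 0.25 0.25 0.25 1.00
tapply(w, animal_id, sum)   # each animal's weights sum to 1
```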

Interpretation Guide

Bootstrap Result	Interpretation
95% CI excludes zero	Effect persists under balanced sampling (robust to sample imbalance)
95% CI includes zero	Effect may be unstable or driven by specific animals
Same direction as weighted LMM	Converging evidence across methods
Opposite direction to weighted LMM	Effect may be driven by a few dominant individuals

Validation Criteria

A developmental effect is considered robust if:

✅ Bootstrap 95% CI excludes zero
✅ Bootstrap slope has same sign as weighted LMM
✅ Bootstrap and weighted estimates are within a reasonable range of each other
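The three criteria can be checked mechanically. In this hypothetical sketch, `boot_slopes` is a vector of bootstrap slopes and `weighted_slope` is the weighted-LMM estimate; "reasonable range" has no fixed definition in this document, so it is operationalised here, as an assumption, as the weighted estimate falling inside the bootstrap 95% percentile interval.

```r
# Hypothetical helper applying the three validation criteria
check_robust <- function(boot_slopes, weighted_slope) {
  ci <- quantile(boot_slopes, c(0.025, 0.975), names = FALSE)
  c(
    ci_excludes_zero = ci[1] > 0 || ci[2] < 0,
    same_sign        = sign(mean(boot_slopes)) == sign(weighted_slope),
    # Assumed reading of "within reasonable range"
    weighted_in_ci   = weighted_slope >= ci[1] && weighted_slope <= ci[2]
  )
}

boot <- seq(0.5, 1.5, length.out = 101)   # illustrative slopes, not results
check_robust(boot, 1.1)    # all three TRUE, so the effect counts as robust
check_robust(boot, -0.2)   # TRUE FALSE FALSE
```

A metric passes only when `all(check_robust(...))` is `TRUE`.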

Load Package and Data

# Load the VNS package and record its version for reproducibility
library("VNS")
packageVersion("VNS")

# Load population data
load("./data/population_data.rda")

Number of bootstrap iterations

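N_BOOT <- 1000

# Fix the random seed so bootstrap resamples are reproducible
# (assumes run_bootstrap_metrics draws from R's default RNG)
set.seed(1)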

Temporal Difference Metrics

Define Metrics

td_metrics <- list(
  "Peak Timing (Median)" = "median_peak_position_ms",
  "Temporal Entropy" = "mean_entropy",
  "Gini Coefficient" = "mean_gini",
  "Backward Shift Rate" = "backward_shift_rate"
)
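Each metric list maps a display label to a column name. Because a typo in a column name would only surface after many refits, a cheap pre-flight check can help. `check_metric_columns()` is a hypothetical helper (not part of VNS) and assumes the results object is a data frame with one numeric column per metric.

```r
# Hypothetical pre-flight check: every metric column exists and is numeric
check_metric_columns <- function(data, metrics) {
  cols <- unlist(metrics, use.names = FALSE)
  missing <- setdiff(cols, names(data))
  if (length(missing) > 0) {
    stop("Missing metric columns: ", paste(missing, collapse = ", "))
  }
  non_numeric <- cols[!vapply(data[cols], is.numeric, logical(1))]
  if (length(non_numeric) > 0) {
    stop("Non-numeric metric columns: ", paste(non_numeric, collapse = ", "))
  }
  invisible(TRUE)
}

# e.g. check_metric_columns(temporal_shift_res, td_metrics)
```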

Weighted LMM Benchmark

td_weighted <- analyze_td_metrics(temporal_shift_res, balance_method = "inverse")

Bootstrap Analysis

td_boot <- run_bootstrap_metrics(
  data = temporal_shift_res,
  metrics = td_metrics,
  n_boot = N_BOOT
)

knitr::kable(td_boot$summary, caption = "TD Metrics: Bootstrap Results (95% CI)")
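The summary table reports one row per metric. As a rough sketch of how such a row could be derived from a vector of bootstrap slopes, the hypothetical `summarise_boot_slopes()` below reuses the column names from the combined summary later in this document; the definitions (SE as the SD of the slopes, percentile CI, `Significant` meaning the CI excludes zero, and the `Direction` labels) are assumptions, not the documented behaviour of `run_bootstrap_metrics`.

```r
# Hypothetical sketch of one summary row from bootstrap slopes
summarise_boot_slopes <- function(metric, slopes) {
  ci    <- quantile(slopes, c(0.025, 0.975), names = FALSE, na.rm = TRUE)
  slope <- mean(slopes, na.rm = TRUE)
  data.frame(
    Metric      = metric,
    Slope       = slope,
    SE          = sd(slopes, na.rm = TRUE),   # bootstrap SE (assumed definition)
    CI_Lower    = ci[1],
    CI_Upper    = ci[2],
    Significant = ci[1] > 0 || ci[2] < 0,     # CI excludes zero
    Direction   = if (slope > 0) "Increase" else if (slope < 0) "Decrease" else "None",
    stringsAsFactors = FALSE
  )
}

row <- summarise_boot_slopes("Example", seq(0.5, 1.5, length.out = 101))
row
```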

Bootstrap Distributions

# One histogram per metric: dashed gray = zero, solid red = mean slope,
# dashed red = 2.5th and 97.5th percentiles (95% CI)
op <- par(mfrow = c(2, 2))
for (name in names(td_boot$results)) {
  slopes <- td_boot$results[[name]]$boot_samples[, "dph"]
  hist(slopes, breaks = 30, col = "steelblue", border = "white",
       main = name, xlab = "Slope (per day)")
  abline(v = 0, col = "gray40", lwd = 2, lty = 2)
  abline(v = mean(slopes), col = "red", lwd = 2)
  abline(v = quantile(slopes, c(0.025, 0.975)), col = "red", lty = 2)
}
par(op)  # restore the previous plotting layout

Amplitude Metrics

Define Metrics

amp_metrics <- list(
  "Early Peak Amplitude" = "early_mean",
  "Late Peak Amplitude" = "late_mean",
  "Early/Late Ratio" = "amplitude_ratio"
)

Weighted LMM Benchmark

amp_weighted <- analyze_amplitude_metrics(amplitude_shift_res, balance_method = "inverse")

Bootstrap Analysis

amp_boot <- run_bootstrap_metrics(
  data = amplitude_shift_res,
  metrics = amp_metrics,
  n_boot = N_BOOT
)

knitr::kable(amp_boot$summary, caption = "Amplitude Metrics: Bootstrap Results (95% CI)")

Bootstrap Distributions

# One histogram per metric: dashed gray = zero, solid red = mean slope,
# dashed red = 2.5th and 97.5th percentiles (95% CI)
op <- par(mfrow = c(1, 3))
for (name in names(amp_boot$results)) {
  slopes <- amp_boot$results[[name]]$boot_samples[, "dph"]
  hist(slopes, breaks = 30, col = "darkgreen", border = "white",
       main = name, xlab = "Slope (per day)")
  abline(v = 0, col = "gray40", lwd = 2, lty = 2)
  abline(v = mean(slopes), col = "red", lwd = 2)
  abline(v = quantile(slopes, c(0.025, 0.975)), col = "red", lty = 2)
}
par(op)  # restore the previous plotting layout

Information Flow Metrics

Define Metrics

cH_metrics <- list(
  "Forward Entropy" = "H_forward",
  "Backward Entropy" = "H_backward",
  "Prediction Asymmetry" = "asymmetry"
)

Weighted LMM Benchmark

cH_weighted <- analyze_cH_metrics(cH_res, balance_method = "inverse")

Bootstrap Analysis

cH_boot <- run_bootstrap_metrics(
  data = cH_res,
  metrics = cH_metrics,
  n_boot = N_BOOT
)

knitr::kable(cH_boot$summary, caption = "Information Flow Metrics: Bootstrap Results (95% CI)")

Bootstrap Distributions

# One histogram per metric: dashed gray = zero, solid red = mean slope,
# dashed red = 2.5th and 97.5th percentiles (95% CI)
op <- par(mfrow = c(1, 3))
for (name in names(cH_boot$results)) {
  slopes <- cH_boot$results[[name]]$boot_samples[, "dph"]
  hist(slopes, breaks = 30, col = "darkorange", border = "white",
       main = name, xlab = "Slope (per day)")
  abline(v = 0, col = "gray40", lwd = 2, lty = 2)
  abline(v = mean(slopes), col = "red", lwd = 2)
  abline(v = quantile(slopes, c(0.025, 0.975)), col = "red", lty = 2)
}
par(op)  # restore the previous plotting layout

Combined Summary

# Add category labels
td_boot$summary$Category <- "TD Metrics"
amp_boot$summary$Category <- "Amplitude"
cH_boot$summary$Category <- "Info Flow"

# Combine
all_summary <- rbind(td_boot$summary, amp_boot$summary, cH_boot$summary)

# Reorder columns
all_summary <- all_summary[, c("Category", "Metric", "Slope", "SE", "CI_Lower", "CI_Upper", "Significant", "Direction")]

knitr::kable(all_summary, caption = "Bootstrap Analysis: All Metrics Summary", row.names = FALSE)
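To condense the combined table, one option is to count, per category, how many metrics have a bootstrap 95% CI that excludes zero. The sketch below runs on a made-up stand-in for `all_summary` (placeholder metric names, not results) and assumes `Significant` is a logical column; if it holds labels such as "Yes"/"No", replace `sum` with `function(s) sum(s == "Yes")`.

```r
# Made-up stand-in for all_summary (placeholder values, not results)
toy_summary <- data.frame(
  Category    = c("TD Metrics", "TD Metrics", "Amplitude", "Info Flow"),
  Metric      = c("m1", "m2", "m3", "m4"),
  Significant = c(TRUE, FALSE, TRUE, FALSE)
)

# Number of metrics per category whose bootstrap 95% CI excludes zero
robust_counts <- aggregate(Significant ~ Category, data = toy_summary, FUN = sum)
robust_counts
```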

Conclusion

Session Info

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.39   R6_2.6.1        fastmap_1.2.0   xfun_0.58      
##  [5] cachem_1.1.0    knitr_1.51      htmltools_0.5.9 rmarkdown_2.31 
##  [9] lifecycle_1.0.5 cli_3.6.6       sass_0.4.10     jquerylib_0.1.4
## [13] compiler_4.4.1  tools_4.4.1     evaluate_1.0.5  bslib_0.11.0   
## [17] yaml_2.3.12     rlang_1.2.0     jsonlite_2.0.0

Bootstrap Balanced Resampling Analysis

2025-12-10
