This guide covers how to structure your data for webforest, handle missing values, and prepare data from common meta-analysis workflows.

Key Principle: Column Names, Not Values

webforest uses a column-mapping pattern: you specify which columns contain your data, not the values themselves. This makes it easy to use any data frame without renaming columns.

# Your data can have any column names
my_data <- data.frame(my_study = ..., my_or = ..., my_lo = ..., my_hi = ...)

# Just map them to the right arguments
forest_plot(my_data, point = "my_or", lower = "my_lo", upper = "my_hi", label = "my_study")
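In practice, a column mapping means each argument holds a column name, and the values are looked up by name (data[[name]] in base R). The minimal sketch below illustrates the idea. resolve_mapping() is a hypothetical helper written for illustration only. It is not part of webforest's API, and webforest's internal lookup may work differently.

```r
# Hypothetical helper (not part of webforest): turn a column mapping into
# the vectors it points to, failing early on a misspelled column name.
resolve_mapping <- function(data, ...) {
  mapping <- list(...)                      # e.g. point = "my_or"
  missing_cols <- setdiff(unlist(mapping), names(data))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }
  lapply(mapping, function(col) data[[col]])
}

my_data <- data.frame(
  my_study = c("Smith 2020", "Jones 2021"),
  my_or = c(0.72, 0.85),
  my_lo = c(0.55, 0.70),
  my_hi = c(0.95, 1.03)
)

vals <- resolve_mapping(my_data,
  point = "my_or", lower = "my_lo", upper = "my_hi", label = "my_study"
)
vals$point
```

The column names stay entirely under your control. Only the argument names (point, lower, upper, label) are fixed.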

Required Columns

At minimum, forest_plot() needs four column mappings:

Argument  Description                      Example
point     Point estimate (effect size)     Hazard ratio, odds ratio, mean difference
lower     Lower confidence interval bound  95% CI lower bound
upper     Upper confidence interval bound  95% CI upper bound
label     Row label text                   Study name, subgroup

# Minimal example
library(webforest)
data <- data.frame(
  study = c("Smith 2020", "Jones 2021", "Lee 2022"),
  hr = c(0.72, 0.85, 0.91),
  lo = c(0.55, 0.70, 0.75),
  hi = c(0.95, 1.03, 1.10)
)

forest_plot(data,
  point = "hr", lower = "lo", upper = "hi", label = "study",
  scale = "log", null_value = 1
)

Optional Columns

Beyond the core four, you can map additional columns for styling and display:

Category      Columns                                               Purpose
Grouping      group                                                 Hierarchical nesting
Row styling   row_type, row_bold, row_indent, row_color, row_badge  Per-row appearance
Cell styling  style_cols, style_bold, style_color, style_bg         Per-cell formatting
Display       Any column referenced in columns = list(...)          Extra data columns

Handling Missing Values (NA)

webforest gives NA values specific meanings, which you can use to build structured layouts:

Header and Spacer Rows

Rows with row_type = "header" or "spacer" typically have NA for effect estimates. The plot renders these as label-only rows without intervals:

structured <- data.frame(
  label = c("Primary Outcomes", "  CV Death", "  MI", "", "Secondary"),
  hr = c(NA, 0.82, 0.79, NA, NA),
  lower = c(NA, 0.72, 0.68, NA, NA),
  upper = c(NA, 0.94, 0.92, NA, NA),
  rtype = c("header", "data", "data", "spacer", "header"),
  rbold = c(TRUE, FALSE, FALSE, FALSE, TRUE)
)

forest_plot(structured,
  point = "hr", lower = "lower", upper = "upper", label = "label",
  row_type = "rtype", row_bold = "rbold",
  scale = "log", null_value = 1
)
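Label-only rendering depends on the NA estimates lining up with row_type. A quick base-R check can catch a header row that accidentally carries numbers. It also lists data rows that have lost their estimate. This is a sketch that reuses the column names from the example above (rtype, hr, lower, upper), so adapt them to your own mapping.

```r
structured <- data.frame(
  label = c("Primary Outcomes", "  CV Death", "  MI", "", "Secondary"),
  hr = c(NA, 0.82, 0.79, NA, NA),
  lower = c(NA, 0.72, 0.68, NA, NA),
  upper = c(NA, 0.94, 0.92, NA, NA),
  rtype = c("header", "data", "data", "spacer", "header")
)

est <- structured[, c("hr", "lower", "upper")]

# Header and spacer rows should carry no estimates at all
layout_rows <- structured$rtype %in% c("header", "spacer")
layout_ok <- all(is.na(est[layout_rows, ]))

# Data rows may legitimately lack an estimate (label-only rows);
# list them so you can confirm each one is intentional
label_only <- structured$label[structured$rtype == "data" & is.na(structured$hr)]

layout_ok
label_only
```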

Missing Effect Estimates

For data rows where an effect couldn’t be calculated, set point, lower, and upper to NA. The label is still displayed, but no interval is plotted. This is useful for subgroups with insufficient data.
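One way to produce such rows is to blank out the estimate when a subgroup is too small to estimate reliably. The sketch below assumes an events column and a minimum of 10 events. Both the column and the threshold are illustrative choices, not webforest requirements.

```r
subgroups <- data.frame(
  label = c("Age < 65", "Age 65-84", "Age >= 85"),
  events = c(120, 95, 4),
  hr = c(0.81, 0.77, 0.40),
  lower = c(0.66, 0.61, 0.09),
  upper = c(0.99, 0.97, 1.80)
)

min_events <- 10   # illustrative threshold, not a webforest default
too_small <- subgroups$events < min_events

# NA in all three estimate columns keeps the row and its label,
# but leaves no interval to draw
subgroups[too_small, c("hr", "lower", "upper")] <- NA
subgroups
```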

Styling Column NAs

NA in styling columns (e.g., row_color, row_badge) means “use the default”: no special styling is applied to that row.
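As a result, a styling column only needs values on the rows you want to change. The sketch below colors the rows whose whole interval lies below 1 and leaves every other row as NA. The column name rcolor is arbitrary. The hex color string is an assumption about the color format row_color accepts, so check the forest_plot() reference for the formats it supports.

```r
tab <- data.frame(
  study = c("Smith 2020", "Jones 2021", "Lee 2022"),
  hr = c(0.72, 0.85, 0.91),
  lower = c(0.55, 0.70, 0.75),
  upper = c(0.95, 1.03, 1.10)
)

# Color only rows whose whole interval lies below the null value of 1;
# every other row gets NA, i.e. the default styling.
# The hex string format is an assumption about what row_color accepts.
tab$rcolor <- ifelse(tab$upper < 1, "#1b7837", NA_character_)
tab$rcolor
```

The column would then be mapped with row_color = "rcolor", following the same pattern as row_type and row_bold.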

Scale Considerations

Log Scale

Warning: Log Scale Requires Positive Values

When using scale = "log", all values in point, lower, and upper must be positive. Zero or negative values will cause rendering errors.


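# This will cause issues:
bad_data <- data.frame(
  study = "Problematic",
  or = 0,        # Zero breaks log scale
  lower = -0.1,  # Negative breaks log scale
  upper = 1.5
)

# Solution: filter before plotting (uses dplyr).
# is.na(or) keeps header/spacer rows, which filter() would otherwise
# drop because their conditions evaluate to NA.
library(dplyr)
good_data <- bad_data |>
  filter(is.na(or) | (or > 0 & lower > 0 & upper > 0))  # drops the bad row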

Typical null_value for log scale: 1 (ratio of 1 = no effect)
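Ratio data often arrive on the log scale, as log odds ratios or log hazard ratios with standard errors, and negative values are normal there. Back-transform them before plotting with scale = "log". Passed unchanged, they would violate the positivity requirement above. This is a minimal sketch that assumes columns named log_or and se and a normal-approximation 95% interval.

```r
studies <- data.frame(
  study = c("A", "B", "C"),
  log_or = c(-0.33, 0.10, -0.05),
  se = c(0.14, 0.20, 0.09)
)

z <- qnorm(0.975)   # about 1.96 for a 95% interval

# Build the interval on the log scale, then exponentiate each value
studies$or    <- exp(studies$log_or)
studies$lower <- exp(studies$log_or - z * studies$se)
studies$upper <- exp(studies$log_or + z * studies$se)

# Exponentiated values are always positive, so they are safe for scale = "log"
all(studies[, c("or", "lower", "upper")] > 0)
```

The resulting interval is symmetric on the log scale but asymmetric around the odds ratio. That is why a log axis displays it evenly.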

Linear Scale

Linear scale accepts any numeric values, including zero and negatives:

# Mean difference example (linear scale)
diff_data <- data.frame(
  comparison = c("Treatment A", "Treatment B", "Treatment C"),
  mean_diff = c(-2.5, 1.3, -0.8),
  lower = c(-4.1, -0.2, -2.1),
  upper = c(-0.9, 2.8, 0.5)
)

forest_plot(diff_data,
  point = "mean_diff", lower = "lower", upper = "upper",
  label = "comparison",
  scale = "linear", null_value = 0,
  axis_label = "Mean Difference (95% CI)"
)

Typical null_value for linear scale: 0 (difference of 0 = no effect)
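For differences, the interval is built directly on the plotting scale with no transformation. The sketch below assumes an estimate column md and a standard-error column se, both names illustrative. It also flags which intervals contain the null value of 0.

```r
diffs <- data.frame(
  comparison = c("Treatment A", "Treatment B"),
  md = c(-2.5, 1.3),
  se = c(0.8, 0.75)
)

z <- qnorm(0.975)   # normal-approximation 95% interval
diffs$lower <- diffs$md - z * diffs$se
diffs$upper <- diffs$md + z * diffs$se

# Intervals that contain 0 (the null_value) are compatible with no effect
diffs$crosses_null <- diffs$lower < 0 & diffs$upper > 0
diffs
```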

Creating Grouping Columns

Single-Level Grouping

Use a categorical column to group rows:

trials <- data.frame(
  study = c("ADVANCE", "SPRINT", "ACCORD", "ONTARGET"),
  region = c("Europe", "North America", "North America", "Global"),
  hr = c(0.91, 0.75, 0.88, 0.94),
  lower = c(0.83, 0.64, 0.76, 0.86),
  upper = c(1.01, 0.87, 1.01, 1.02)
)

forest_plot(trials,
  point = "hr", lower = "lower", upper = "upper",
  label = "study", group = "region",
  scale = "log", null_value = 1
)
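If you want the groups in a particular order, arrange the rows before plotting. This guide does not say whether webforest orders groups by first appearance or by some other rule, so sorting the rows yourself is the conservative choice. The sketch below sorts a trimmed copy of the trials data by an explicit region order, which is an illustrative choice.

```r
# A trimmed copy of the trials data (only the columns needed here)
trials <- data.frame(
  study = c("ADVANCE", "SPRINT", "ACCORD", "ONTARGET"),
  region = c("Europe", "North America", "North America", "Global"),
  hr = c(0.91, 0.75, 0.88, 0.94)
)

# Desired display order of the groups (an illustrative choice)
region_order <- c("Global", "North America", "Europe")

# order() on a factor sorts by level position; study name breaks ties.
# The region column itself stays character.
region_f <- factor(trials$region, levels = region_order)
trials <- trials[order(region_f, trials$study), ]
rownames(trials) <- NULL
trials$study
```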

Hierarchical (Nested) Grouping

Pass a vector of column names, outermost level first, for nested subgroups:

nested <- data.frame(
  study = c("Site A", "Site B", "Site C", "Site D", "Site E", "Site F"),
  region = c("Americas", "Americas", "Americas", "Europe", "Europe", "Europe"),
  country = c("USA", "USA", "Brazil", "UK", "Germany", "Germany"),
  hr = c(0.72, 0.85, 0.79, 0.88, 0.91, 0.76),
  lower = c(0.58, 0.71, 0.62, 0.74, 0.78, 0.61),
  upper = c(0.89, 1.02, 1.01, 1.05, 1.06, 0.95)
)

forest_plot(nested,
  point = "hr", lower = "lower", upper = "upper",
  label = "study",
  group = c("region", "country"),  # Nested: region > country
  scale = "log", null_value = 1
)
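Nested grouping assumes each inner level sits inside exactly one outer level. A country listed under two regions would make the hierarchy ambiguous, so a quick check before plotting is worthwhile. The base-R sketch below uses the region and country columns from the example.

```r
nested <- data.frame(
  region = c("Americas", "Americas", "Americas", "Europe", "Europe", "Europe"),
  country = c("USA", "USA", "Brazil", "UK", "Germany", "Germany")
)

# Number of distinct regions each country appears under
regions_per_country <- tapply(nested$region, nested$country,
                              function(x) length(unique(x)))

# Countries that break the hierarchy (empty if the nesting is clean)
bad_countries <- names(regions_per_country)[regions_per_country > 1]
bad_countries
```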

Working with Meta-Analysis Results

From metafor

library(metafor)
library(dplyr)

# studies is assumed to have columns: study, log_or (log odds ratio), se
# Run a random-effects meta-analysis
res <- rma(yi = log_or, sei = se, data = studies, method = "REML")

# Convert to webforest format (back-transform from log OR to OR)
forest_data <- studies |>
  mutate(
    or = exp(log_or),
    lower = exp(log_or - 1.96 * se),
    upper = exp(log_or + 1.96 * se),
    rtype = "data",
    rbold = FALSE
  ) |>
  # Add pooled estimate as a summary row
  bind_rows(
    tibble(
      study = "Pooled Estimate",
      or = exp(as.numeric(res$b)),  # res$b is a 1x1 matrix
      lower = exp(res$ci.lb),
      upper = exp(res$ci.ub),
      rtype = "summary",
      rbold = TRUE
    )
  )

forest_plot(forest_data,
  point = "or", lower = "lower", upper = "upper",
  label = "study",
  row_type = "rtype", row_bold = "rbold",
  scale = "log", null_value = 1
)

From meta

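library(meta)
library(dplyr)

# Run meta-analysis (studies has columns: study, log_or, se)
m <- metagen(TE = log_or, seTE = se, studlab = study, data = studies)

# Extract study-level data (TE, lower, upper are on the log scale)
forest_data <- tibble(
  study = m$studlab,
  or = exp(m$TE),
  lower = exp(m$lower),
  upper = exp(m$upper),
  weight = m$w.random / sum(m$w.random) * 100,
  rtype = "data",
  rbold = FALSE
) |>
  bind_rows(
    tibble(
      study = "Random Effects",
      or = exp(m$TE.random),
      lower = exp(m$lower.random),
      upper = exp(m$upper.random),
      rtype = "summary",
      rbold = TRUE
    )
  )

forest_plot(forest_data,
  point = "or", lower = "lower", upper = "upper",
  label = "study",
  row_type = "rtype", row_bold = "rbold",
  scale = "log", null_value = 1
)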

Common Data Transformations

Adding Row Types

library(dplyr)

# Transform flat data into structured layout
raw <- data.frame(
  outcome = c("CV Death", "MI", "Stroke"),
  category = c("Primary", "Primary", "Secondary"),
  hr = c(0.82, 0.79, 0.88),
  lower = c(0.72, 0.68, 0.74),
  upper = c(0.94, 0.92, 1.05)
)

structured <- raw |>
  group_by(category) |>
  group_modify(~ {
    header <- tibble(
      outcome = .y$category,
      hr = NA, lower = NA, upper = NA,
      rtype = "header", rbold = TRUE, rindent = 0
    )
    data <- .x |>
      mutate(
        outcome = paste0("  ", outcome),
        rtype = "data", rbold = FALSE, rindent = 1
      )
    bind_rows(header, data)
  }) |>
  ungroup()
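If you prefer to avoid dplyr, you can build the same header-plus-indented-rows layout with split() and do.call(rbind, ...). This base-R sketch mirrors the columns used above (rtype, rbold, rindent). Like group_by(), split() orders the sections alphabetically by category.

```r
raw <- data.frame(
  outcome = c("CV Death", "MI", "Stroke"),
  category = c("Primary", "Primary", "Secondary"),
  hr = c(0.82, 0.79, 0.88),
  lower = c(0.72, 0.68, 0.74),
  upper = c(0.94, 0.92, 1.05)
)

sections <- lapply(split(raw, raw$category), function(d) {
  # One header row per category, with no estimates
  header <- data.frame(
    outcome = d$category[1], hr = NA, lower = NA, upper = NA,
    rtype = "header", rbold = TRUE, rindent = 0
  )
  # The category's data rows, indented under the header
  rows <- data.frame(
    outcome = paste0("  ", d$outcome), hr = d$hr,
    lower = d$lower, upper = d$upper,
    rtype = "data", rbold = FALSE, rindent = 1
  )
  rbind(header, rows)
})

structured <- do.call(rbind, unname(sections))
rownames(structured) <- NULL
structured$outcome
```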

Computing Weight Percentages

library(dplyr)

studies |>
  mutate(
    # Fixed-effect (inverse-variance) weight
    weight = 1 / se^2,
    weight_pct = weight / sum(weight) * 100
  )
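The weights above are fixed-effect weights. If the pooled row comes from a random-effects model, each study's weight is 1 / (se^2 + tau^2), where tau^2 is the between-study variance. The sketch below estimates tau^2 with the DerSimonian-Laird method in base R. That is a different estimator from the REML fit in the metafor example, so the percentages will not match that model exactly. With a fitted metafor model, res$tau2 can be used in place of tau2_dl.

```r
studies <- data.frame(
  study = c("A", "B", "C", "D"),
  log_or = c(-0.40, 0.05, -0.25, -0.60),
  se = c(0.15, 0.20, 0.12, 0.25)
)

y <- studies$log_or
v <- studies$se^2
k <- length(y)

# Fixed-effect (inverse-variance) weights and pooled estimate
w_fe <- 1 / v
theta_fe <- sum(w_fe * y) / sum(w_fe)

# DerSimonian-Laird estimate of the between-study variance tau^2
Q <- sum(w_fe * (y - theta_fe)^2)
C <- sum(w_fe) - sum(w_fe^2) / sum(w_fe)
tau2_dl <- max(0, (Q - (k - 1)) / C)

# Random-effects weights, as percentages for a display column
w_re <- 1 / (v + tau2_dl)
studies$weight_pct_fe <- w_fe / sum(w_fe) * 100
studies$weight_pct_re <- w_re / sum(w_re) * 100
studies
```

Adding tau^2 to every variance pulls the weights toward equality. As a result, the largest study's share never increases under the random-effects model.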

Formatting Confidence Intervals

If you want a pre-formatted CI column for display:

library(dplyr)

trials <- trials |>
  mutate(
    ci_text = sprintf("%.2f (%.2f-%.2f)", hr, lower, upper)
  )

Use col_text("ci_text", "HR (95% CI)") to display it, or use col_interval() for automatic formatting.
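Two details are worth handling when pre-formatting. First, header and spacer rows have NA estimates, which sprintf() renders as "NA (NA-NA)". Second, a hyphen separator is hard to read when bounds are negative, as in "-2.50 (-4.10--0.90)" on a linear scale. The base-R sketch below leaves NA rows blank and uses " to " as the separator. format_ci() is a hypothetical helper, not a webforest function.

```r
# Hypothetical helper: format "est (lower to upper)", blank for NA rows
format_ci <- function(est, lower, upper, digits = 2, sep = " to ") {
  fmt <- paste0("%.", digits, "f (%.", digits, "f", sep, "%.", digits, "f)")
  out <- sprintf(fmt, est, lower, upper)
  out[is.na(est) | is.na(lower) | is.na(upper)] <- ""
  out
}

ci <- format_ci(c(NA, -2.5, 1.3), c(NA, -4.1, -0.2), c(NA, -0.9, 2.8))
ci
```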

Data Validation Tips

  1. Check for non-positive values before using a log scale: any(c(data$hr, data$lower, data$upper) <= 0, na.rm = TRUE) (na.rm = TRUE skips header and spacer rows; without it the result is NA)
  2. Verify CI ordering: all(data$lower <= data$hr & data$hr <= data$upper, na.rm = TRUE)
  3. Check column types: sapply(data, class); numeric columns shouldn’t be character
  4. Preview structured data: print the data frame to verify header/spacer row placement
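# Quick validation function
validate_forest_data <- function(data, point, lower, upper, scale = "linear") {
  issues <- character()

  p <- data[[point]]
  l <- data[[lower]]
  u <- data[[upper]]

  # Check only rows with all three values present; fully NA rows are
  # header, spacer, or label-only rows
  valid <- !is.na(p) & !is.na(l) & !is.na(u)

  if (scale == "log" && any(p[valid] <= 0 | l[valid] <= 0 | u[valid] <= 0)) {
    issues <- c(issues, "Log scale requires all positive values")
  }

  if (any(l[valid] > p[valid] | p[valid] > u[valid])) {
    issues <- c(issues, "CI bounds should satisfy: lower <= point <= upper")
  }

  # Rows with only some of the three values missing are usually data errors
  partial <- xor(is.na(p), is.na(l)) | xor(is.na(p), is.na(u))
  if (any(partial)) {
    issues <- c(issues, "Some rows have only some of point/lower/upper missing")
  }

  if (length(issues) == 0) {
    message("Data looks valid!")
  } else {
    warning(paste(issues, collapse = "\n"))
  }
  invisible(issues)
}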