| Title: | Statistical Tools Designed for End Users |
|---|---|
| Description: | The statistical tools in this package do one of four things: 1) Enhance basic statistical functions with more flexible inputs, smarter defaults, and richer, clearer, and ready-to-use output (e.g., t.test2()) 2) Produce publication-ready commonly needed figures with one line of code (e.g., plot_cdf()) 3) Implement novel analytical tools developed by the authors (e.g., twolines()) 4) Deliver niche functions of high value to the authors that are not easily available elsewhere (e.g., clear(), convert_to_sql(), resize_images()). |
| Authors: | Uri Simonsohn [aut, cre] |
| Maintainer: | Uri Simonsohn <[email protected]> |
| License: | GPL-3 |
| Version: | 0.3.0.9001 |
| Built: | 2026-05-21 11:00:23 UTC |
| Source: | https://github.com/urisohn/statuser |
Clears plot, global environment, and console. On first use the user is prompted to authorize clearing the environment to comply with CRAN rules.
clear()clear()
This function performs three cleanup operations:
Plot: Closes all open graphics devices (except the null device)
Global environment: Removes all objects from the global environment
Console: Clears the console screen (only in interactive sessions)
clear() will not modify the global environment unless you have
previously typed "yes" when prompted. If you do not type "yes", you are asked
again next time; only "yes" is remembered for future sessions.
Warning: This function deletes all objects in the global environment. Save anything that you wish to keep before running.
Invisibly returns NULL. Prints a colored confirmation message.
# Interactive use: clear workspace, console, and plots # First run may prompt; once you type "yes", your preference is saved. clear()# Interactive use: clear workspace, console, and plots # First run may prompt; once you type "yes", your preference is saved. clear()
When ‘stimulus.plot()' with 'plot.type = ’effects'' is run, resamples are saved in package state. If an identical call is run again (same data, dv, condition, number of simulations, etc.), stored results are loaded instead of re-calculated. Force recalculation by clearing the cache with this function.
clear_stimulus_cache()clear_stimulus_cache()
Reads a CSV file and generates SQL statements to insert all rows. Optionally can also generate a CREATE TABLE statement. The function automatically infers column types (REAL for numeric, DATE for date strings matching YYYY-MM-DD format, TEXT otherwise).
convert_to_sql(input, output, create_table = FALSE)convert_to_sql(input, output, create_table = FALSE)
input |
Character string. Path to the input CSV file. |
output |
Character string. Path to the output SQL file where the statements will be written. |
create_table |
Logical. If |
The function performs the following steps:
Reads the CSV file using read.csv() with
stringsAsFactors = FALSE
Infers SQL column types:
Numeric columns become REAL
Date columns (matching YYYY-MM-DD format) become DATE
All other columns become TEXT
If create_table = TRUE, generates a CREATE TABLE
statement using the base filename (without extension) as the table name
Generates INSERT INTO statements for each row
Writes all SQL statements to the output file
Single quotes in text values are escaped by doubling them (SQL standard). Numeric values are inserted without quotes, while text and date values are wrapped in single quotes.
Invisibly returns NULL. The function writes SQL statements to the specified output file.
# Convert a CSV file to SQL (INSERT statements only) tmp_csv <- tempfile(fileext = ".csv") tmp_sql <- tempfile(fileext = ".sql") write.csv( data.frame(id = 1:2, value = c("a", "b"), date = c("2024-01-01", "2024-02-02")), tmp_csv, row.names = FALSE ) convert_to_sql(tmp_csv, tmp_sql) # Convert a CSV file to SQL with CREATE TABLE statement convert_to_sql(tmp_csv, tmp_sql, create_table = TRUE)# Convert a CSV file to SQL (INSERT statements only) tmp_csv <- tempfile(fileext = ".csv") tmp_sql <- tempfile(fileext = ".sql") write.csv( data.frame(id = 1:2, value = c("a", "b"), date = c("2024-01-01", "2024-02-02")), tmp_csv, row.names = FALSE ) convert_to_sql(tmp_csv, tmp_sql) # Convert a CSV file to SQL with CREATE TABLE statement convert_to_sql(tmp_csv, tmp_sql, create_table = TRUE)
Returns a dataframe with one row per group.
desc_var(y, group = NULL, data = NULL, digits = 3)desc_var(y, group = NULL, data = NULL, digits = 3)
y |
A numeric vector of values, a column name (character string or unquoted) if |
group |
Optional grouping variable, if not provided computed for the full data.
Ignored if |
data |
Optional data frame containing the variable(s). |
digits |
Number of decimal places to round to. Default is 3. |
The dependent variable ('y') must be numeric. If you pass an existing non-numeric variable (e.g., character, factor), 'desc_var()' will stop with a clear error indicating that the variable must be numeric.
A data frame with one row per group (or one row if no group is specified) containing:
group: Group identifier
mean: Mean
sd: Standard deviation
se: Standard error
median: Median
min: Minimum
max: Maximum
mode: Most frequent value
freq_mode: Frequency of mode
mode2: 2nd most frequent value
freq_mode2: Frequency of 2nd mode
n.total: Number of observations
n.missing: Number of observations with missing (NA) values
n.unique: Number of unique values
Columns may also carry a "label" attribute, which provides a short
human-readable description of each column.
# With grouping df <- data.frame(y = rnorm(100), group = rep(c("A", "B"), 50)) desc_var(y, group, data = df) # Without grouping (full dataset) desc_var(y, data = df) # Direct vectors y <- rnorm(100) group <- rep(c("A", "B"), 50) desc_var(y, group) # With custom decimal places desc_var(y, group, data = df, digits = 2) # Using formula syntax: y ~ x desc_var(y ~ group, data = df) # Using formula syntax with multiple grouping variables: y ~ x1 + x2 df2 <- data.frame(y = rnorm(200), x1 = rep(c("A", "B"), 100), x2 = rep(c("X", "Y"), each = 100)) desc_var(y ~ x1 + x2, data = df2)# With grouping df <- data.frame(y = rnorm(100), group = rep(c("A", "B"), 50)) desc_var(y, group, data = df) # Without grouping (full dataset) desc_var(y, data = df) # Direct vectors y <- rnorm(100) group <- rep(c("A", "B"), 50) desc_var(y, group) # With custom decimal places desc_var(y, group, data = df, digits = 2) # Using formula syntax: y ~ x desc_var(y ~ group, data = df) # Using formula syntax with multiple grouping variables: y ~ x1 + x2 df2 <- data.frame(y = rnorm(200), x1 = rep(c("A", "B"), 100), x2 = rep(c("X", "Y"), each = 100)) desc_var(y ~ x1 + x2, data = df2)
Formats p-values for clean display in figures and tables. e.g., p = .0231, p<.0001
format_pvalue(p, digits = 4, include_p = FALSE)format_pvalue(p, digits = 4, include_p = FALSE)
p |
A numeric vector of p-values to format. |
digits |
Number of decimal places to round to. Default is 4. |
include_p |
Logical. If TRUE, includes "p" prefix before the formatted value (e.g., "p = .05"). Default is FALSE. |
A character vector of formatted p-values.
# Basic usage format_pvalue(0.05) format_pvalue(0.0001) # More rounding format_pvalue(0.0001,digits=2) # Vector input format_pvalue(c(0.05, 0.001, 0.00001, 0.99)) # With p prefix format_pvalue(0.05, include_p = TRUE)# Basic usage format_pvalue(0.05) format_pvalue(0.0001) # More rounding format_pvalue(0.0001,digits=2) # Vector input format_pvalue(c(0.05, 0.001, 0.00001, 0.99)) # With p prefix format_pvalue(0.05, include_p = TRUE)
Probes an interaction by estimating (or accepting) a model and computing: - simple slopes ("spotlights") using predicted values - Johnson-Neyman ("jn") curves using marginal effects
interprobe( x = NULL, z = NULL, y = NULL, model = NULL, data = NULL, moderator.on.x.axis = TRUE, k = NULL, spotlights = NULL, spotlight.labels = NULL, histogram = TRUE, max.unique = 11, n.bin.continuous = 10, n.max = 50, xlab = "", ylab1 = "", ylab2 = "", cols = c("red4", "dodgerblue", "green4"), main1 = "GAM Simple Slopes", main2 = "GAM Johnson-Neyman", legend.round = c(1, 4), draw = "both", save.as = NULL, xlim = NULL, ylim1 = NULL, ylim2 = NULL, x.ticks = NULL, y1.ticks = NULL, y2.ticks = NULL, legend.simple.slopes = NULL, legend.jn = NULL, quiet = FALSE, probe.bins = 100 )interprobe( x = NULL, z = NULL, y = NULL, model = NULL, data = NULL, moderator.on.x.axis = TRUE, k = NULL, spotlights = NULL, spotlight.labels = NULL, histogram = TRUE, max.unique = 11, n.bin.continuous = 10, n.max = 50, xlab = "", ylab1 = "", ylab2 = "", cols = c("red4", "dodgerblue", "green4"), main1 = "GAM Simple Slopes", main2 = "GAM Johnson-Neyman", legend.round = c(1, 4), draw = "both", save.as = NULL, xlim = NULL, ylim1 = NULL, ylim2 = NULL, x.ticks = NULL, y1.ticks = NULL, y2.ticks = NULL, legend.simple.slopes = NULL, legend.jn = NULL, quiet = FALSE, probe.bins = 100 )
x |
The focal predictor. Can be a name (bare or quoted) when 'data' or 'model' is provided, or a numeric/factor vector when probing from vectors. |
z |
The moderator. Same options as 'x'. |
y |
The dependent variable. Same options as 'x'. Not required when 'model' is supplied. |
model |
By default 'interprobe' estimates a GAM model predicting 'y' with 'x' and 'z'. You can instead probe a linear interaction by setting model=linear. You can also probe a model of your choice by running it separately, saving the output, and submitting it as the model argument to interprobe. This is the way to include covariates for a probed interaction. |
data |
Optional data frame containing 'x', 'z', and 'y'. |
moderator.on.x.axis |
Logical. If TRUE (default), moderator ('z') is shown on the x-axis. |
k |
Integer. Smoothness parameter passed to 'mgcv::gam()' when estimating with the default GAM engine. |
spotlights |
Numeric vector of length 3. Values at which curves are computed. |
spotlight.labels |
Character vector of length 3. Labels for the legend. |
histogram |
Logical. If TRUE (default), show sample size distribution under the plot. |
max.unique |
Integer. Threshold for treating a variable as continuous vs discrete. |
n.bin.continuous |
Integer. Number of bins used in histogram when binning continuous values. |
n.max |
Integer. Sample size at which line darkness/width saturates. |
xlab |
Character. X-axis label. |
ylab1 |
Character. Y-axis label for simple slopes panel. |
ylab2 |
Character. Y-axis label for JN panel. |
cols |
Character vector of length 3. Colors for the three curves. |
main1 |
Character. Title for simple slopes panel. |
main2 |
Character. Title for JN panel. |
legend.round |
Integer vector length 2. Min/max decimals in legend. |
draw |
Which plots to draw: '"both"' (default), '"simple slopes"' (or legacy '"simple.slopes"'), or '"jn"'. |
save.as |
Optional file path to save plot ('.png' or '.svg'). |
xlim |
Numeric vector length 2. X-axis limits. |
ylim1 |
Numeric vector length 2. Y-axis limits for simple slopes. |
ylim2 |
Numeric vector length 2. Y-axis limits for JN. |
x.ticks |
Optional custom x-axis ticks. |
y1.ticks |
Optional custom y-axis ticks for panel 1. |
y2.ticks |
Optional custom y-axis ticks for panel 2. |
legend.simple.slopes |
Optional legend title for simple slopes. |
legend.jn |
Optional legend title for JN. |
quiet |
Logical. If TRUE, reduces console output. |
probe.bins |
Integer. Resolution for probing curves (larger = smoother/slower). |
Designed for GAM models but works with any model supported by 'marginaleffects' (including 'lm', 'glm', 'mgcv::gam', and 'lm2' / 'estimatr::lm_robust').
Invisibly returns a list with:
simple.slopes: data.frame of predicted values and confidence intervals
johnson.neyman: data.frame of marginal effects and confidence intervals
frequencies: data.frame with bin frequencies used for shading/histogram
gam_results: the fitted GAM model when estimated inside interprobe()
gam_results_testing: when interprobe() estimates a GAM internally and x
has exactly 2 unique values, a separate GAM fit used for interaction testing (with a ti()
term and numeric coding of x)
lm2_results: when interprobe() estimates a GAM internally, also returns the
corresponding linear fit lm2(y ~ x * z) (or NULL if package estimatr
is not installed)
List with objects that are automatically named.
list2(...)list2(...)
... |
Objects to include in the list. Objects are automatically named based on their variable names unless explicit names are provided. |
list2(x , y) is equivalent to list(x = x , y = y)
list2(x , y2 = y) is equivalent to list(x = x , y2 = y)
A named list. Each element is named after the variable passed to
the function (or the explicit name if provided). The structure is identical
to a standard R list created with list.
x <- 1:5 y <- letters[1:3] z <- matrix(1:4, nrow = 2) # Create named list from objects my_list <- list2(x, y, z) names(my_list) # "x" "y" "z" # Works with explicit names too my_list2 <- list2(a = x, b = y) names(my_list2) # "a" "b"x <- 1:5 y <- letters[1:3] z <- matrix(1:4, nrow = 2) # Create named list from objects my_list <- list2(x, y, z) names(my_list) # "x" "y" "z" # Works with explicit names too my_list2 <- list2(a = x, b = y) names(my_list2) # "a" "b"
Runs a linear regression with better defaults (robust SE), and richer & better
formatted output than lm. For robust and clustered errors it relies on lm_robust.
The output reports classical and robust errors, number of missing observations per
variable, an effect size column (standardized regression coefficient), and a red.flag column per variable
flagging the need to conduct specific diagnostics. It relies by default on HC3 for standard errors;
lm_robust relies on HC2 (and Stata's 'reg y x, robust' on HC1), which can have
inflated false-positive rates in smaller samples (Long & Ervin, 2000).
se_type |
The type of standard error to use. Default is |
notes |
|
clusters |
An optional variable indicating clusters for cluster-robust standard
errors. When specified, |
fixed_effects |
An optional right-sided formula containing the fixed effects
to be projected out (absorbed) before estimation. Useful for models with many
fixed effect groups (e.g., |
... |
Additional arguments passed to |
Robust standard errors and clustered standard errors are computed using
lm_robust; see the documentation of that function for details.
With clusters, lm_robust uses CR2 by default.
The output shows both standard errors; when clustering is used it reports all three.
The red.flag column is based on the difference between robust and classical standard errors.
The red.flag column provides diagnostic warnings:
!, !!, !!!: Robust and classical standard errors differ by
more than 25%, 50%, or 100%, respectively. Large differences may suggest model
misspecification or outliers (but they may also be benign). When encountering a red flag,
authors should plot the distributions to look for outliers or skewed data, and use scatter.gam
to look for possible nonlinearities in the relevant variables.
King & Roberts (2015) propose a higher cutoff, at 100%, and a bootstrapped significance test;
statuser does not follow either recommendation. The former seems too liberal, the
latter too time consuming to include in every regression, plus the focus here is on individual variables rather than joint tests.
X: For interaction terms, the component variables are correlated (|r| > 0.3 or p < .05),
which means the interaction term is likely to be biased. See Simonsohn (2024) "Interacting with curves"
doi:10.1177/25152459231207787.
An object of class c("lm2", "lm_robust", "lm"). This inherits
from lm_robust and can be used with packages like
marginaleffects. The object contains all components of an lm_robust
object plus additional attributes:
A data frame with columns: term, estimate,
SE.robust, SE.classical, t, df, p.value,
B (standardized coefficient), and optionally SE.cluster when
clustered standard errors are used.
The underlying lm object with classical standard errors.
Integer vector of missing value counts per variable.
Total number of observations excluded due to missing values.
Logical indicating whether clustered standard errors were used.
When printed, displays a formatted regression table with robust and classical standard errors, effect sizes, and diagnostic red flags.
King, G., & Roberts, M. E. (2015). How robust standard errors expose methodological problems they do not fix, and what to do about it. Political Analysis, 23(2), 159-179.
Long, J. S., & Ervin, L. H. (2000). Using heteroscedasticity consistent standard errors in the linear regression model. The American Statistician, 54(3), 217-224.
Simonsohn, U. (2024). Interacting with curves: How to validly test and probe interactions in the real (nonlinear) world. Advances in Methods and Practices in Psychological Science, 7(1), 1-22.
lm_robust, scatter.gam, lm2_notes
# Basic usage with data argument lm2(mpg ~ wt + hp, data = mtcars) # Without data argument (variables from environment) y <- mtcars$mpg x1 <- mtcars$wt x2 <- mtcars$hp lm2(y ~ x1 + x2) # RED FLAG EXAMPLES # Example 1: red flag catches a nonlinearity # True model is quadratic: y = x^2 set.seed(123) x <- runif(200, -3, 3) y <- x^2 + rnorm(200, sd = 2) # lm2() shows red flag due to misspecification lm2(y ~ x) # Follow up with scatter.gam() to diagnose it scatter.gam(x, y) # Example 2: red flag catches an outlier in y # True model is y = x, but one observation has a very large y value set.seed(123) x <- sort(rnorm(200)) y <- round(x + rnorm(200, sd = 2), 1) y[200] <- 100 # Outlier # lm2() flags x lm2(y ~ x) # Look at distribution of y to spot the outlier plot_freq(y) # Example 3: red flag catches an outlier in one predictor # True model is y = x1 + x2, but x2 has an extreme value set.seed(123) x1 <- round(rnorm(200),.1) x2 <- round(rnorm(200),.1) y <- x1 + x2 + rnorm(200, sd = 0.5) x2[200] <- 50 # Outlier in x2 # lm2() flags x2 (but not x1) lm2(y ~ x1 + x2) # Look at distribution of x2 to spot the outlier plot_freq(x2) # CLUSTERED STANDARD ERRORS # When observations are grouped (e.g., students within schools), # use clusters to account for within-group correlation set.seed(123) n_clusters <- 20 n_per_cluster <- 15 cluster_id <- rep(1:n_clusters, each = n_per_cluster) cluster_effect <- rnorm(n_clusters, sd = 2)[cluster_id] x <- rnorm(n_clusters * n_per_cluster) y <- 1 + 0.5 * x + cluster_effect + rnorm(n_clusters * n_per_cluster) mydata <- data.frame(y = y, x = x, cluster_id = cluster_id) # Clustered SE (CR2) - note the SE.cluster column in output lm2(y ~ x, data = mydata, clusters = cluster_id) # FIXED EFFECTS # Use fixed_effects to absorb group-level variation (e.g., firm or year effects) # This is useful for panel data or when you have many fixed effect levels set.seed(456) n_firms <- 30 n_years <- 5 firm_id <- rep(1:n_firms, each = n_years) year <- rep(2018:2022, times = n_firms) firm_effect <- rnorm(n_firms, sd = 3)[firm_id] x <- rnorm(n_firms * n_years) y <- 2 + 0.8 * x + firm_effect + rnorm(n_firms * n_years) panel <- data.frame(y = y, x = x, firm_id = factor(firm_id), year = factor(year)) # Absorb firm fixed effects (coefficient on x is estimated, firm dummies are not shown) lm2(y ~ x, data = panel, fixed_effects = ~ firm_id) # Two-way fixed effects (firm and year) lm2(y ~ x, data = panel, fixed_effects = ~ firm_id + year)# Basic usage with data argument lm2(mpg ~ wt + hp, data = mtcars) # Without data argument (variables from environment) y <- mtcars$mpg x1 <- mtcars$wt x2 <- mtcars$hp lm2(y ~ x1 + x2) # RED FLAG EXAMPLES # Example 1: red flag catches a nonlinearity # True model is quadratic: y = x^2 set.seed(123) x <- runif(200, -3, 3) y <- x^2 + rnorm(200, sd = 2) # lm2() shows red flag due to misspecification lm2(y ~ x) # Follow up with scatter.gam() to diagnose it scatter.gam(x, y) # Example 2: red flag catches an outlier in y # True model is y = x, but one observation has a very large y value set.seed(123) x <- sort(rnorm(200)) y <- round(x + rnorm(200, sd = 2), 1) y[200] <- 100 # Outlier # lm2() flags x lm2(y ~ x) # Look at distribution of y to spot the outlier plot_freq(y) # Example 3: red flag catches an outlier in one predictor # True model is y = x1 + x2, but x2 has an extreme value set.seed(123) x1 <- round(rnorm(200),.1) x2 <- round(rnorm(200),.1) y <- x1 + x2 + rnorm(200, sd = 0.5) x2[200] <- 50 # Outlier in x2 # lm2() flags x2 (but not x1) lm2(y ~ x1 + x2) # Look at distribution of x2 to spot the outlier plot_freq(x2) # CLUSTERED STANDARD ERRORS # When observations are grouped (e.g., students within schools), # use clusters to account for within-group correlation set.seed(123) n_clusters <- 20 n_per_cluster <- 15 cluster_id <- rep(1:n_clusters, each = n_per_cluster) cluster_effect <- rnorm(n_clusters, sd = 2)[cluster_id] x <- rnorm(n_clusters * n_per_cluster) y <- 1 + 0.5 * x + cluster_effect + rnorm(n_clusters * n_per_cluster) mydata <- data.frame(y = y, x = x, cluster_id = cluster_id) # Clustered SE (CR2) - note the SE.cluster column in output lm2(y ~ x, data = mydata, clusters = cluster_id) # FIXED EFFECTS # Use fixed_effects to absorb group-level variation (e.g., firm or year effects) # This is useful for panel data or when you have many fixed effect levels set.seed(456) n_firms <- 30 n_years <- 5 firm_id <- rep(1:n_firms, each = n_years) year <- rep(2018:2022, times = n_firms) firm_effect <- rnorm(n_firms, sd = 3)[firm_id] x <- rnorm(n_firms * n_years) y <- 2 + 0.8 * x + firm_effect + rnorm(n_firms * n_years) panel <- data.frame(y = y, x = x, firm_id = factor(firm_id), year = factor(year)) # Absorb firm fixed effects (coefficient on x is estimated, firm dummies are not shown) lm2(y ~ x, data = panel, fixed_effects = ~ firm_id) # Two-way fixed effects (firm and year) lm2(y ~ x, data = panel, fixed_effects = ~ firm_id + year)
lm2 printAfter printing an lm2 object, call lm2_notes() to display the
full Notes: block (significance legend, column definitions, red-flag
guidance, etc.).
lm2_notes()lm2_notes()
Invisibly returns the notes character string, or NULL if none stored.
Add options to set color and to end execution of code (to be used as error message)
message2(..., col = "cyan", font = 1, stop = FALSE)message2(..., col = "cyan", font = 1, stop = FALSE)
... |
Message content to be printed. Multiple arguments are pasted together. |
col |
text color. Default is "cyan". |
font |
Integer. 1 for plain text (default), 2 for bold text. |
stop |
Logical. If TRUE, stops execution (like |
This function prints colored messages to the console. If ANSI color codes are supported
by the terminal, the message will be colored. Otherwise, it will be printed as plain text.
If stop = TRUE, execution will be halted after printing the message.
No return value, called for side effects. Prints a colored message
to the console. If stop = TRUE, execution is halted after printing
the message.
message2("This is a plain cyan message", col = "cyan", font = 1) message2("This is a bold cyan message", col = "cyan", font = 2) message2("This is a bold red message", col = "red", font = 2) cat("this will be shown") try(message2("This stops execution", stop = TRUE), silent = TRUE) cat("this will be shown after the try")message2("This is a plain cyan message", col = "cyan", font = 1) message2("This is a bold cyan message", col = "cyan", font = 2) message2("This is a bold red message", col = "red", font = 2) cat("this will be shown") try(message2("This stops execution", stop = TRUE), silent = TRUE) cat("this will be shown after the try")
Plots empirical cumulative distribution functions (ECDFs) separately for each unique value of a grouping variable, with support for vectorized plotting parameters. If no grouping variable is provided, plots a single ECDF.
plot_cdf( formula, y2 = NULL, data = NULL, order = NULL, show.ks = TRUE, show.quantiles = TRUE, ... )plot_cdf( formula, y2 = NULL, data = NULL, order = NULL, show.ks = TRUE, show.quantiles = TRUE, ... )
formula |
Two possible uses (similar to
|
y2 |
optional second variable when contrasting two variables |
data |
An optional data frame containing the variables in the formula.
If |
order |
Controls the order in which groups appear in the plot and legend.
Use |
show.ks |
Logical. If TRUE (default), shows Kolmogorov-Smirnov test results when there are exactly 2 groups. If FALSE, KS test results are not displayed. |
show.quantiles |
Logical. If TRUE (default), shows horizontal lines and results at 25th, 50th, and 75th percentiles when there are exactly 2 groups. If FALSE, quantile lines and results are not displayed. |
... |
Additional arguments passed to plotting functions. Can be single values
(applied to all groups) or vectors (applied element-wise to each group).
Common parameters include |
Invisibly returns a list containing:
ecdfs: A list of ECDF function objects, one per group. Each can be
called as a function to compute cumulative probabilities (e.g., result$ecdfs[[1]](5)
returns P(X <= 5) for group 1).
ks_test: (Only when exactly 2 groups) The Kolmogorov-Smirnov test result
comparing the two distributions. Access p-value with result$ks_test$p.value.
quantile_regression_25: (Only when exactly 2 groups) Quantile regression
model for the 25th percentile.
quantile_regression_50: (Only when exactly 2 groups) Quantile regression
model for the 50th percentile (median).
quantile_regression_75: (Only when exactly 2 groups) Quantile regression
model for the 75th percentile.
warnings: Any warnings captured during execution (if any).
# Basic usage with single variable (no grouping) y <- rnorm(100) plot_cdf(y) # Basic usage with formula syntax and grouping group <- rep(c("A", "B", "C"), c(30, 40, 30)) plot_cdf(y ~ group) # With custom colors (scalar - same for all) plot_cdf(y ~ group, col = "blue") # With custom colors (vector - different for each group) plot_cdf(y ~ group, col = c("red", "green", "blue")) # Multiple parameters plot_cdf(y ~ group, col = c("red", "green", "blue"), lwd = c(1, 2, 3)) # With line type and point character plot_cdf(y ~ group, col = c("red", "green", "blue"), lty = c(1, 2, 3), lwd = 2) # Using data frame df <- data.frame(value = rnorm(100), group = rep(c("A", "B"), 50)) plot_cdf(value ~ group, data = df) plot_cdf(value ~ group, data = df, col = c("red", "blue")) # Compare two vectors y1 <- rnorm(50) y2 <- rnorm(50, mean = 1) plot_cdf(y1, y2) # Formula syntax without data (variables evaluated from environment) widgetness <- rnorm(100) gender <- rep(c("M", "F"), 50) plot_cdf(widgetness ~ gender) # Using the returned object df <- data.frame(value = c(rnorm(50, 0), rnorm(50, 1)), group = rep(c("A", "B"), each = 50)) result <- plot_cdf(value ~ group, data = df) # Use ECDF to find P(X <= 0.5) for group A result$ecdfs[[1]](0.5) # Access KS test p-value result$ks_test$p.value # Summarize median quantile regression summary(result$quantile_regression_50)# Basic usage with single variable (no grouping) y <- rnorm(100) plot_cdf(y) # Basic usage with formula syntax and grouping group <- rep(c("A", "B", "C"), c(30, 40, 30)) plot_cdf(y ~ group) # With custom colors (scalar - same for all) plot_cdf(y ~ group, col = "blue") # With custom colors (vector - different for each group) plot_cdf(y ~ group, col = c("red", "green", "blue")) # Multiple parameters plot_cdf(y ~ group, col = c("red", "green", "blue"), lwd = c(1, 2, 3)) # With line type and point character plot_cdf(y ~ group, col = c("red", "green", "blue"), lty = c(1, 2, 3), lwd = 2) # Using data frame df <- data.frame(value = rnorm(100), group = rep(c("A", "B"), 50)) plot_cdf(value ~ group, data = df) plot_cdf(value ~ group, data = df, col = c("red", "blue")) # Compare two vectors y1 <- rnorm(50) y2 <- rnorm(50, mean = 1) plot_cdf(y1, y2) # Formula syntax without data (variables evaluated from environment) widgetness <- rnorm(100) gender <- rep(c("M", "F"), 50) plot_cdf(widgetness ~ gender) # Using the returned object df <- data.frame(value = c(rnorm(50, 0), rnorm(50, 1)), group = rep(c("A", "B"), each = 50)) result <- plot_cdf(value ~ group, data = df) # Use ECDF to find P(X <= 0.5) for group A result$ecdfs[[1]](0.5) # Access KS test p-value result$ks_test$p.value # Summarize median quantile regression summary(result$quantile_regression_50)
Plots the distribution of a variable by group, simply: plot_density(y ~ x)
plot_density( formula, y2 = NULL, data = NULL, order = NULL, show_means = TRUE, ... )plot_density( formula, y2 = NULL, data = NULL, order = NULL, show_means = TRUE, ... )
formula |
Two possible uses (similar to
|
y2 |
optional second variable when contrasting two variables |
data |
An optional data frame containing the variables in the formula. |
order |
Controls the order in which groups appear in the plot and legend.
Use |
show_means |
Logical. If TRUE (default), shows points at means. |
... |
Additional arguments passed to plotting functions. |
Plot parameters like
col, lwd, lty, and pch can be specified as:
A single value: applied to all groups
A vector: applied to groups in order of unique group values
Invisibly returns a list with the following element:
A named list of density objects (class "density"),
one for each group. Each density object contains x (evaluation points),
y (density estimates), bw (bandwidth), and other components
as returned by density. If no grouping variable is
provided, the list contains a single element named "all".
The function is primarily called for its side effect of creating a plot.
# Basic usage with formula syntax (no grouping) y <- rnorm(100) plot_density(y) # With grouping variable group <- rep(c("A", "B", "C"), c(30, 40, 30)) plot_density(y ~ group) # With custom colors (scalar - same for all) plot_density(y ~ group, col = "blue") # With custom colors (vector - different for each group) plot_density(y ~ group, col = c("red", "green", "blue")) # Multiple parameters plot_density(y ~ group, col = c("red", "green", "blue"), lwd = c(1, 2, 3)) # With line type plot_density(y ~ group, col = c("red", "green", "blue"), lty = c(1, 2, 3), lwd = 2) # Using data frame df <- data.frame(value = rnorm(100), group = rep(c("A", "B"), 50)) plot_density(value ~ group, data = df) plot_density(value ~ group, data = df, col = c("red", "blue")) # Compare two vectors y1 <- rnorm(50) y2 <- rnorm(50, mean = 1) plot_density(y1, y2)# Basic usage with formula syntax (no grouping) y <- rnorm(100) plot_density(y) # With grouping variable group <- rep(c("A", "B", "C"), c(30, 40, 30)) plot_density(y ~ group) # With custom colors (scalar - same for all) plot_density(y ~ group, col = "blue") # With custom colors (vector - different for each group) plot_density(y ~ group, col = c("red", "green", "blue")) # Multiple parameters plot_density(y ~ group, col = c("red", "green", "blue"), lwd = c(1, 2, 3)) # With line type plot_density(y ~ group, col = c("red", "green", "blue"), lty = c(1, 2, 3), lwd = 2) # Using data frame df <- data.frame(value = rnorm(100), group = rep(c("A", "B"), 50)) plot_density(value ~ group, data = df) plot_density(value ~ group, data = df, col = c("red", "blue")) # Compare two vectors y1 <- rnorm(50) y2 <- rnorm(50, mean = 1) plot_density(y1, y2)
Creates a frequency plot showing the frequency of every observed value, optionally by group. Most frequent values are labeled by default.
plot_freq( formula, y2 = NULL, data = NULL, freq = TRUE, order = NULL, col = "dodgerblue", lwd = 9, width = NULL, value.labels = "auto", ticks.max = 30, show.x.value = "auto", show.legend = TRUE, legend.title = NULL, col.text = NULL, ... )plot_freq( formula, y2 = NULL, data = NULL, freq = TRUE, order = NULL, col = "dodgerblue", lwd = 9, width = NULL, value.labels = "auto", ticks.max = 30, show.x.value = "auto", show.legend = TRUE, legend.title = NULL, col.text = NULL, ... )
formula |
Two possible uses (similar to
|
y2 |
optional second variable when contrasting two variables |
data |
An optional data frame containing the variables in the formula. |
freq |
Logical. If TRUE (default), displays frequencies. If FALSE, displays percentages. |
order |
Controls the order in which groups appear in the plot and legend.
Use |
col |
Color for the bars. |
lwd |
Line width for the frequency bars. Default is 9. |
width |
Numeric. Width of the frequency bars. If NULL (default), width is automatically calculated based on the spacing between values. |
value.labels |
Controls value labeling. If numeric, shows labels for the |
ticks.max |
Integer. Maximum number of unique x values to label on the x-axis. If there are more
than |
show.x.value |
Either |
show.legend |
Logical. If TRUE (default), displays a legend when |
legend.title |
Character string. Title for the legend when |
col.text |
Color for the value labels. If not specified, uses |
... |
Pass on any argument accepted by |
This function creates a frequency plot where each observed value is shown with its frequency. Unlike a histogram (no binning) and unlike a barplot (which omits unobserved levels), unobserved values are shown at frequency 0.
Invisibly returns a data frame with values and their frequencies.
# Simple example x <- c(1, 1, 2, 2, 2, 5, 5) plot_freq(x) # Pass on some common \code{plot()} arguments plot_freq(x, col = "steelblue", xlab = "Value", ylab = "Frequency",ylim=c(0,7)) # Add to an existing plot plot_freq(x, col = "dodgerblue") # Compare two vectors y1 <- c(1, 1, 2, 2, 2, 5, 5) y2 <- c(1, 2, 2, 3, 3, 3) plot_freq(y1, y2) # Using a data frame with grouping df <- data.frame(value = c(1, 1, 2, 2, 2, 5, 5), group = c("A", "A", "A", "B", "B", "A", "B")) plot_freq(value ~ 1, data = df) # single variable plot_freq(value ~ group, data = df) # with grouping # Control group order in legend and plot plot_freq(value ~ group, data = df, order = c("B", "A")) # B first, then A plot_freq(value ~ group, data = df, order = -1) # Reverse default order# Simple example x <- c(1, 1, 2, 2, 2, 5, 5) plot_freq(x) # Pass on some common \code{plot()} arguments plot_freq(x, col = "steelblue", xlab = "Value", ylab = "Frequency",ylim=c(0,7)) # Add to an existing plot plot_freq(x, col = "dodgerblue") # Compare two vectors y1 <- c(1, 1, 2, 2, 2, 5, 5) y2 <- c(1, 2, 2, 3, 3, 3) plot_freq(y1, y2) # Using a data frame with grouping df <- data.frame(value = c(1, 1, 2, 2, 2, 5, 5), group = c("A", "A", "A", "B", "B", "A", "B")) plot_freq(value ~ 1, data = df) # single variable plot_freq(value ~ group, data = df) # with grouping # Control group order in legend and plot plot_freq(value ~ group, data = df, order = c("B", "A")) # B first, then A plot_freq(value ~ group, data = df, order = -1) # Reverse default order
Plots fitted GAM values for focal predictor, keeping any other predictors in the model at a specified quantile (default: median)
plot_gam( model, predictor, quantile.others = 50, col = "blue4", bg = adjustcolor("dodgerblue", 0.2), plot2 = "auto", col2 = NULL, bg2 = "gray90", ... )plot_gam( model, predictor, quantile.others = 50, col = "blue4", bg = adjustcolor("dodgerblue", 0.2), plot2 = "auto", col2 = NULL, bg2 = "gray90", ... )
model |
A GAM model object fitted using |
predictor |
Character string specifying the name of the predictor variable to plot on the x-axis. |
quantile.others |
Number between 1 and 99 for quantile at which all other predictors are held constant. Default is 50 (median). |
col |
Color for the prediction line. Default is "blue4". |
bg |
Background color for the confidence band. Default is
|
plot2 |
How to plot the distribution in the lower plot. Options: |
col2 |
Color for the lines/bars in the bottom distribution plot. Default is "dodgerblue" |
bg2 |
Background color for the bottom distribution plot. Default is "gray90". |
... |
Additional arguments passed to |
Invisibly returns a list containing:
predictor_values: The sequence of predictor values used
predicted: The predicted values
se: The standard errors
lower: Lower confidence bound (predicted - 2*se)
upper: Upper confidence bound (predicted + 2*se)
library(mgcv) # Fit a GAM model data(mtcars) mtcars$cyl <- factor(mtcars$cyl) # Convert to factor before fitting GAM model <- gam(mpg ~ s(hp) + s(wt) + cyl, data = mtcars) # Plot effect of hp (with other variables at median) plot_gam(model, "hp") # Plot effect of hp (with other variables at 25th percentile) plot_gam(model, "hp", quantile.others = 25) # Customize plot plot_gam(model, "hp", main = "Effect of Horsepower", col = "blue", lwd = 2)library(mgcv) # Fit a GAM model data(mtcars) mtcars$cyl <- factor(mtcars$cyl) # Convert to factor before fitting GAM model <- gam(mpg ~ s(hp) + s(wt) + cyl, data = mtcars) # Plot effect of hp (with other variables at median) plot_gam(model, "hp") # Plot effect of hp (with other variables at 25th percentile) plot_gam(model, "hp", quantile.others = 25) # Customize plot plot_gam(model, "hp", main = "Effect of Horsepower", col = "blue", lwd = 2)
Plots means, with confidence intervals, and (optionally) p-values for differences of means and interactions
plot_means( formula, data = NULL, cluster = NULL, tests = "auto", quiet = FALSE, order = NULL, legend.title = NULL, col = NULL, col.text = NULL, values.cex = 1, values.align = "top", values.round = 1, pvalue.cex = 0.9, pvalue.col = "gray50", ci.level = 95, buffer.top = "auto", ... )plot_means( formula, data = NULL, cluster = NULL, tests = "auto", quiet = FALSE, order = NULL, legend.title = NULL, col = NULL, col.text = NULL, values.cex = 1, values.align = "top", values.round = 1, pvalue.cex = 0.9, pvalue.col = "gray50", ci.level = 95, buffer.top = "auto", ... )
formula |
A formula, e.g., |
data |
Optional data frame containing variables in the formula. |
cluster |
Optional clustering variable when there are repeated
observations per cluster
e.g., |
tests |
specifies which comparisons of means to report. Syntax involves
putting column numbers in a character string, with a - to symbolize a
comparison and a + symbolizing combination. For example tests="1-2" reports
t-test comparing columns 1 and 2. tests="1-2,3-4" t-tests comparing columns
1 & 2 and another 3 & 4. To run an interaction use parentheses:
tests= |
quiet |
Logical. When |
order |
Controls the order of |
legend.title |
Character string. Title for the legend. If |
col |
Color(s) for |
col.text |
Color for confidence intervals and other non-bar annotations.
If |
values.cex |
Numeric scalar controlling text size for mean value labels (and related annotations). |
values.align |
Where within the bars to put the mean value labels: |
values.round |
Non-negative integer. Number of decimal places for mean value labels. |
pvalue.cex |
Numeric scalar controlling p-value label size. |
pvalue.col |
Color for p-value brackets/labels. |
ci.level |
Confidence interval level for |
buffer.top |
Either |
... |
Additional arguments passed to |
When tests="auto", the function reports a small default set of
differences-of-means tests (when applicable) and, in 2x2 designs, an
interaction test:
If cluster is NULL, these are Welch
two-sample t-tests computed with t.test(..., var.equal=FALSE). If
cluster is provided, these comparisons are computed from a
regression using lm2() with clustered standard errors.
The interaction is tested using a linear regression fit
with lm2(), even when cluster is NULL; when
cluster is provided, the interaction test uses clustered standard
errors.
The regression-based tests use heteroskedasticity-robust inference (HC3) when
cluster is NULL. HC3 is a common small-sample adjustment to
White-type robust standard errors and is used to reduce sensitivity to
heteroskedasticity. When cluster is provided, plot_means()
instead uses clustered standard errors (robust to within-cluster correlation).
In the returned $means table, ciL and ciH are the lower and
upper bounds of a ci.level% confidence interval for the mean (when
available). The same confidence level is used for the confidence-interval
whiskers drawn in the figure.
A minimal list returned invisibly with two elements:
meansA data frame of means (and, when available, confidence intervals) aligned to the plotting grid.
testsA data frame of comparisons used for p-value
annotation (or NULL if not applicable).
df <- data.frame(y = rnorm(100), group = rep(c("A", "B"), 50)) plot_means(y ~ group, data = df) df2 <- data.frame( y = rnorm(200), x1 = rep(c("A", "B"), 100), x2 = rep(c("X", "Y"), each = 100) ) plot_means(y ~ x1 + x2, data = df2) df3 <- data.frame( y = rnorm(600), x1 = rep(c("control", "treatment"), times = 300), x2 = rep(rep(c("low", "high"), each = 150), times = 2), x3 = rep(c("online", "lab"), each = 300) ) plot_means(y ~ x1 + x2 + x3, data = df3)df <- data.frame(y = rnorm(100), group = rep(c("A", "B"), 50)) plot_means(y ~ group, data = df) df2 <- data.frame( y = rnorm(200), x1 = rep(c("A", "B"), 100), x2 = rep(c("X", "Y"), each = 100) ) plot_means(y ~ x1 + x2, data = df2) df3 <- data.frame( y = rnorm(600), x1 = rep(c("control", "treatment"), times = 300), x2 = rep(rep(c("low", "high"), each = 150), times = 2), x3 = rep(c("online", "lab"), each = 300) ) plot_means(y ~ x1 + x2 + x3, data = df3)
Predict method for lm2 objects
## S3 method for class 'lm2' predict(object, newdata, ...)## S3 method for class 'lm2' predict(object, newdata, ...)
object |
An object of class |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the original model data is used. |
... |
Additional arguments passed to |
A vector of predicted values (or a list with fit and se.fit
if se.fit = TRUE, or a matrix with fit, lwr, upr
if interval is specified)
Print method for desc_var objects
## S3 method for class 'desc_var' print(x, ...)## S3 method for class 'desc_var' print(x, ...)
x |
An object of class |
... |
Additional arguments passed to print.data.frame |
Invisibly returns the original object
Print method for lm2 objects
## S3 method for class 'lm2' print(x, notes, ...)## S3 method for class 'lm2' print(x, notes, ...)
x |
An object of class |
notes |
|
... |
Additional arguments (ignored) |
Invisibly returns the original object
Print method for t.test2 output
## S3 method for class 't.test2' print(x, ...)## S3 method for class 't.test2' print(x, ...)
x |
An object of class |
... |
Additional arguments passed to print |
Invisibly returns the input object x. Called for its side effect
of printing a formatted t-test summary to the console, including means,
confidence intervals, test statistics, p-values, sample sizes, and
APA-formatted results.
Print method for table2 output with centered column variable name
Print method for table2 objects
## S3 method for class 'table2' print(x, ...) ## S3 method for class 'table2' print(x, ...)## S3 method for class 'table2' print(x, ...) ## S3 method for class 'table2' print(x, ...)
x |
An object of class |
... |
Additional arguments (ignored) |
Invisibly returns the input object x. Called for its side effect
of printing a formatted cross-tabulation table to the console. The output
includes frequencies, optional relative frequencies (row, column, or overall
proportions), and chi-squared test results when applicable.
Invisibly returns the original object
Saves images to PNG with a specified width. As input it accepts (SVG, PDF, EPS, JPG, JPEG, TIF, TIFF, BMP, PNG) Saves to subdirectory '/resized' within input folder (or same directory as file if input is a single file)
resize_images(path, width)resize_images(path, width)
path |
Character string. Path to a folder containing image files, or path to a single image file. |
width |
Numeric vector. Target width(s) in pixels for the output PNG files. Can be a single value (recycled for all files) or a vector matching the number of files found. |
This function:
Searches for image files with extensions: svg, pdf, eps, jpg, jpeg, tif, tiff, bmp, png
Creates a "resized" subfolder in the target directory if it doesn't exist
Converts each file to PNG format at the specified width(s)
Saves output files as: originalname_width.png in the resized subfolder
Supported input formats:
Vector graphics: SVG, PDF, EPS (rasterized using rsvg/magick)
Raster images: JPG, JPEG, TIF, TIFF, BMP, PNG
Invisibly returns TRUE on success.
Dependencies required: rsvg, magick, and tools (base R).
SVG files are rasterized using rsvg::rsvg(), while PDF/EPS and other
formats are handled by magick::image_read().
# Create a temporary PNG file and resize it tmp_png <- tempfile(fileext = ".png") grDevices::png(tmp_png, width = 400, height = 300) old_par <- graphics::par(no.readonly = TRUE) graphics::par(mar = c(2, 2, 1, 1)) graphics::plot(1:2, 1:2, type = "n") grDevices::dev.off() graphics::par(old_par) resize_images(tmp_png, width = 80)# Create a temporary PNG file and resize it tmp_png <- tempfile(fileext = ".png") grDevices::png(tmp_png, width = 400, height = 300) old_par <- graphics::par(no.readonly = TRUE) graphics::par(mar = c(2, 2, 1, 1)) graphics::plot(1:2, 1:2, type = "n") grDevices::dev.off() graphics::par(old_par) resize_images(tmp_png, width = 80)
Creates a scatter plot with a GAM (Generalized Additive Model) smooth line.
Supports both scatter.gam(x, y) and scatter.gam(y ~ x).
scatter.gam( x, y, data.dots = TRUE, three.dots = FALSE, data = NULL, k = NULL, plot.dist = NULL, dot.pch = 16, dot.col = adjustcolor("gray", 0.7), jitter = FALSE, ... )scatter.gam( x, y, data.dots = TRUE, three.dots = FALSE, data = NULL, k = NULL, plot.dist = NULL, dot.pch = 16, dot.col = adjustcolor("gray", 0.7), jitter = FALSE, ... )
x |
A numeric vector of x values, or a formula of the form |
y |
A numeric vector of y values. Not used if |
data.dots |
Logical. If TRUE, displays data on scatterplot |
three.dots |
Logical. If TRUE, divides x into tertiles and puts markers on the average x & y for each |
data |
An optional data frame containing the variables |
k |
Optional integer specifying the basis dimension for the smooth term
in the GAM model (passed to |
plot.dist |
Character string specifying how to plot the distribution of |
dot.pch |
Plotting character for data points when |
dot.col |
Color for data points when |
jitter |
Logical. If TRUE, applies a small amount of jitter to data points to reduce overplotting. Default is FALSE. |
... |
Additional arguments passed to
|
This function fits a GAM model with a smooth term for x and plots the fitted
smooth line. The function uses the mgcv package's gam() function.
When three.dots = TRUE, the x variable is divided into three equal-sized
groups (tertiles), and the mean x and y values for each group are plotted as
points. This provides a simple summary of the relationship across the range of x.
Invisibly returns the fitted GAM model object.
scatter.smooth for a simpler loess-based scatter plot smoother.
# Generate sample data for examples x <- rnorm(100) y <- 2*x + rnorm(100) # Plot GAM smooth line only scatter.gam(x, y) # Equivalent call using formula syntax (y ~ x) scatter.gam(y ~ x) # Include scatter plot with underlying data points behind the GAM line scatter.gam(x, y, data.dots = TRUE) # Include summary points showing mean x and y for each tertile bin scatter.gam(x, y, three.dots = TRUE) # Customize the plot with a custom title, line color, and line width scatter.gam(x, y, data.dots = TRUE, col = "red", lwd = 2, main = "GAM Fit") # Control smoothness of the GAM line by specifying the basis dimension scatter.gam(x, y, k = 10)# Generate sample data for examples x <- rnorm(100) y <- 2*x + rnorm(100) # Plot GAM smooth line only scatter.gam(x, y) # Equivalent call using formula syntax (y ~ x) scatter.gam(y ~ x) # Include scatter plot with underlying data points behind the GAM line scatter.gam(x, y, data.dots = TRUE) # Include summary points showing mean x and y for each tertile bin scatter.gam(x, y, three.dots = TRUE) # Customize the plot with a custom title, line color, and line width scatter.gam(x, y, data.dots = TRUE, col = "red", lwd = 2, main = "GAM Fit") # Control smoothness of the GAM line by specifying the basis dimension scatter.gam(x, y, k = 10)
Beeswarm plot for compared-stimulus designs from "Stimulus Sampling Reimagined" (Simonsohn, Montealegre, & Evangelidis, 2025).
stimulus.beeswarm( data, dv, stimulus, condition, flip.conditions = FALSE, dv.is.percentage = FALSE, simtot = 500, confidence = 95, ylim = c(), ylab1 = "", ylab2 = "", xlab1 = "", xlab2 = NULL, dot.spacing = "auto", col1 = "blue4", col2 = "red4", main = "", watermark = TRUE, save.as = "", svg.width = "", svg.height = "", ... )stimulus.beeswarm( data, dv, stimulus, condition, flip.conditions = FALSE, dv.is.percentage = FALSE, simtot = 500, confidence = 95, ylim = c(), ylab1 = "", ylab2 = "", xlab1 = "", xlab2 = NULL, dot.spacing = "auto", col1 = "blue4", col2 = "red4", main = "", watermark = TRUE, save.as = "", svg.width = "", svg.height = "", ... )
data |
Data frame containing variables to be analyzed. |
dv |
Name of the dependent variable. |
stimulus |
Name of the variable containing the stimulus ID. |
condition |
Name of the variable containing the condition indicator. |
flip.conditions |
If 'TRUE', reverse the order of condition labels. |
dv.is.percentage |
If 'TRUE', format values as percentages. |
simtot |
Number of bootstraps for the confidence band under homogeneity. |
confidence |
Confidence level for the band (default 95). |
ylim |
Optional y-axis limits. |
ylab1, ylab2
|
Labels on the y-axis (optional). |
xlab1, xlab2
|
Labels on the x-axis (optional). |
dot.spacing |
Horizontal distance between stimulus labels ('"auto"' by default). |
col1, col2
|
Colors for the two conditions. |
main |
Plot title. |
watermark |
If 'TRUE', display package version watermark. |
save.as |
File path for saving the figure ('.svg' or '.png', optional). |
svg.width |
Optional width when saving. |
svg.height |
Optional height when saving. |
... |
Additional arguments passed to [graphics::plot.default()]. |
Invisibly, a two-column matrix of beeswarm coordinates.
Build stimulus plots comparing results for individual stimuli in an experiment (Simonsohn, Montealegre, & Evangelidis, 2025).
stimulus.plot( plot.type = "means", data, dv, condition, stimulus, participant = "", save.as = "", svg.width = "", svg.height = "", sort.by = "", flip.conditions = FALSE, model = c(), overall.estimate = c(), overall.ci = c(), overall.p = c(), overall.label = c(), ylab1 = "", ylab2 = "", xlab1 = "Stimuli", xlab2 = "", decimals = "auto", null.method = "shuffle", dv.is.percentage = FALSE, legend.title = "", simtot = 1000, watermark = TRUE, seed = 2024, ylim = c(), main = "", ... )stimulus.plot( plot.type = "means", data, dv, condition, stimulus, participant = "", save.as = "", svg.width = "", svg.height = "", sort.by = "", flip.conditions = FALSE, model = c(), overall.estimate = c(), overall.ci = c(), overall.p = c(), overall.label = c(), ylab1 = "", ylab2 = "", xlab1 = "Stimuli", xlab2 = "", decimals = "auto", null.method = "shuffle", dv.is.percentage = FALSE, legend.title = "", simtot = 1000, watermark = TRUE, seed = 2024, ylim = c(), main = "", ... )
plot.type |
Either '"means"' or '"effects"'; determines what is plotted on the y-axis. |
data |
Data frame containing variables to be analyzed. |
dv |
Name of the dependent variable (e.g., ‘dv = ’y''); quotes are not required. |
condition |
Name of the variable containing the condition indicator. |
stimulus |
Name of the variable containing the stimulus ID. |
participant |
Name of the variable containing participant IDs; necessary for valid inference when ‘plot.type = ’effects'' and each participant provided more than one observation. |
save.as |
File path for saving the figure ('.svg' or '.png', optional). |
svg.width |
Optional width when saving to SVG/PNG. |
svg.height |
Optional height when saving to SVG/PNG. |
sort.by |
Variable to sort stimuli by. Defaults to sorting by observed effect size. |
flip.conditions |
If 'TRUE', subtract the first condition from the second instead of the default. |
model |
Method used to compute overall average: ''regression'‘, '’intercepts'', ''slopes'‘, and/or '’all''. |
overall.estimate |
Scalar or vector of overall average effect computed outside statuser. |
overall.ci |
Confidence interval bounds for 'overall.estimate'. |
overall.p |
P-value for overall average effect computed outside statuser. |
overall.label |
Label for overall averages on the x-axis. |
ylab1, ylab2
|
Labels on the y-axis (optional). |
xlab1, xlab2
|
Labels on the x-axis (optional). |
decimals |
Number of decimals for value labels ('"auto"' by default). |
null.method |
Null method for effects plots: '"shuffle"' or '"demean"'. |
dv.is.percentage |
If 'TRUE', format the dependent variable as percentages. |
legend.title |
Text with title above legend (optional). |
simtot |
Number of resamples for heterogeneity under null (effects plots). |
watermark |
If 'TRUE', display package version watermark. |
seed |
Seed used when resampling for effects plots. |
ylim |
Optional y-axis limits. |
main |
Plot title. |
... |
Additional arguments passed to [graphics::plot.default()]. |
Invisibly, a data frame (means plot) or list (effects plot).
Summary method for lm2 objects
## S3 method for class 'lm2' summary(object, ...)## S3 method for class 'lm2' summary(object, ...)
object |
An object of class |
... |
Additional arguments passed to |
Invisibly returns the original object
The basic t-test function in R, t.test, does not report
the observed difference of means, does not stipulate which mean
is subtracted from which (i.e., whether it computed A-B or B-A), and
presents the test results on the console in a verbose unorganized
paragraph of text. t.test2 improves on all those counts, and
in addition, it reports the number of observations per group and if any observations
are missing it issues a warning. It returns a dataframe instead of a list.
The function is also registered as t.test2 for the t
generic to satisfy S3 registration checks while keeping direct calls to
t.test2(...) unchanged.
x |
The first data argument passed to |
... |
Arguments passed to |
A data frame with class c("t.test2", "data.frame") containing
a single row with the following columns:
One or two columns containing group means, named after
the input variables (e.g., men, women) or Group 1,
Group 2 for long names.
For two-sample tests, the difference between means
(e.g., men-women).
For two-sample independent tests only, Cohen's d computed as the mean difference divided by the pooled standard deviation.
The confidence level as a string (e.g., "95 percent").
Lower and upper bounds of the confidence interval.
The t-statistic.
Degrees of freedom.
The p-value.
Sample sizes, named N(group1), N(group2) or
N1, N2. For paired tests, a single N column.
For paired tests only, the correlation between pairs.
Attributes store additional information including missing value counts and test type (one-sample, two-sample, paired, Welch vs. Student).
# Two-sample t-test men <- rnorm(100, mean = 5, sd = 1) women <- rnorm(100, mean = 4.8, sd = 1) t.test2(men, women) # Paired t-test x <- rnorm(50, mean = 5, sd = 1) y <- rnorm(50, mean = 5.2, sd = 1) t.test2(x, y, paired = TRUE) # One-sample t-test data <- rnorm(100, mean = 5, sd = 1) t.test2(data, mu = 0) # Formula syntax data <- data.frame(y = rnorm(100), group = rep(c("A", "B"), 50)) t.test2(y ~ group, data = data)# Two-sample t-test men <- rnorm(100, mean = 5, sd = 1) women <- rnorm(100, mean = 4.8, sd = 1) t.test2(men, women) # Paired t-test x <- rnorm(50, mean = 5, sd = 1) y <- rnorm(50, mean = 5.2, sd = 1) t.test2(x, y, paired = TRUE) # One-sample t-test data <- rnorm(100, mean = 5, sd = 1) t.test2(data, mu = 0) # Formula syntax data <- data.frame(y = rnorm(100), group = rep(c("A", "B"), 50)) t.test2(y ~ group, data = data)
The function table does not show variable
names when tabulating from a dataframe, requires running another
function, prop.table, to tabulate proportions
and yet another function, chisq.test to test difference of
proportions. table2 does what those three functions do, producing easier to
read output, and always shows variable names.
... |
same arguments as |
prop |
report a table with:
|
digits |
Number of decimal values to show for proportions |
chi |
Logical. If |
correct |
Logical. If |
A list (object of class "table2") with the following components:
freq: frequency table
prop: proportions table
chisq: chi-square test
# Create example data df <- data.frame( group = c("A", "A", "B", "B", "A"), status = c("X", "Y", "X", "Y", "X") ) # Enhanced table with variable names (2 variables) table2(df$group, df$status) # Enhanced table with variable names (3 variables) df3 <- data.frame( x = c("A", "A", "B", "B"), y = c("X", "Y", "X", "Y"), z = c("high", "low", "high", "low") ) table2(df3$x, df3$y, df3$z) # Table with proportions table2(df$group, df$status, prop = 'all') # Overall proportions table2(df$group, df$status, prop = 'row') # Row proportions table2(df$group, df$status, prop = 'col') # Column proportions # Table with chi-square test table2(df$group, df$status, chi = TRUE,prop='all')# Create example data df <- data.frame( group = c("A", "A", "B", "B", "A"), status = c("X", "Y", "X", "Y", "X") ) # Enhanced table with variable names (2 variables) table2(df$group, df$status) # Enhanced table with variable names (3 variables) df3 <- data.frame( x = c("A", "A", "B", "B"), y = c("X", "Y", "X", "Y"), z = c("high", "low", "high", "low") ) table2(df3$x, df3$y, df3$z) # Table with proportions table2(df$group, df$status, prop = 'all') # Overall proportions table2(df$group, df$status, prop = 'row') # Row proportions table2(df$group, df$status, prop = 'col') # Column proportions # Table with chi-square test table2(df$group, df$status, chi = TRUE,prop='all')
Adds to text() optional background color and horizontal alignment (align='center')
x, y
|
coordinates for text placement |
labels |
text to display |
align |
alignment in relation to x coordinate ('left','center','right') |
bg |
background color |
cex |
character expansion factor |
pad |
left/right padding in percentage (e.g., .03) |
pad_v |
top/bottom padding in percentage (e.g., .25) |
... |
Additional arguments passed to |
No return value, called for side effects. Adds text with an optional background rectangle to an existing plot.
# Create a simple plot plot(1:10, 1:10, type = "n", main = "text2() - Alignment & Color") # Alignment with respect to x=5 text2(5, 8, "align='left' from 5", align = "left", bg = "yellow1") text2(5, 7, "align='right' from 5", align = "right", bg = "blue", col = "white") text2(5, 6, "align='center' from 5", align = "center", bg = "black", col = "white") abline(v = 5, lty = 2) # Multiple labels with different alignments text2(c(2, 5, 8), c(5, 5, 5), labels = c("Left", "Center", "Right"), align = c("left", "center", "right"), bg = c("pink", "lightblue", "lightgreen")) # Text with custom font color (passed through ...) text2(5, 3, "Red Text", col = "red", bg = "white") # Padding examples plot(1:10, 1:10, type = "n", main = "Padding Examples") # Default padding (pad=0.03, pad_v=0.25) text2(5, 8, "Default padding", bg = "lightblue") # More horizontal padding text2(5, 6, "Wide padding", pad = 0.2, bg = "lightgreen") # More vertical padding text2(5, 4, "Tall padding", pad_v = 0.8, bg = "lightyellow") # Both padding increased text2(5, 2, "Extra padding", pad = 0.15, pad_v = 0.6, bg = "pink")# Create a simple plot plot(1:10, 1:10, type = "n", main = "text2() - Alignment & Color") # Alignment with respect to x=5 text2(5, 8, "align='left' from 5", align = "left", bg = "yellow1") text2(5, 7, "align='right' from 5", align = "right", bg = "blue", col = "white") text2(5, 6, "align='center' from 5", align = "center", bg = "black", col = "white") abline(v = 5, lty = 2) # Multiple labels with different alignments text2(c(2, 5, 8), c(5, 5, 5), labels = c("Left", "Center", "Right"), align = c("left", "center", "right"), bg = c("pink", "lightblue", "lightgreen")) # Text with custom font color (passed through ...) text2(5, 3, "Red Text", col = "red", bg = "white") # Padding examples plot(1:10, 1:10, type = "n", main = "Padding Examples") # Default padding (pad=0.03, pad_v=0.25) text2(5, 8, "Default padding", bg = "lightblue") # More horizontal padding text2(5, 6, "Wide padding", pad = 0.2, bg = "lightgreen") # More vertical padding text2(5, 4, "Tall padding", pad_v = 0.8, bg = "lightyellow") # Both padding increased text2(5, 2, "Extra padding", pad = 0.15, pad_v = 0.6, bg = "pink")
Implements the two-lines test for U-shaped (or inverted U-shaped) relationships introduced by Simonsohn (2018).
twolines( f, graph = 1, link = "gaussian", data = NULL, pngfile = "", quiet = FALSE )twolines( f, graph = 1, link = "gaussian", data = NULL, pngfile = "", quiet = FALSE )
f |
A formula object specifying the model (e.g., y ~ x1 + x2 + x3). The first predictor is the one tested for a u-shaped relationship. |
graph |
Integer. If 1 (default), produces a plot. If 0, no plot is generated. |
link |
Character string specifying the link function for the GAM model. Default is "gaussian". |
data |
An optional data frame containing the variables in the formula. If not provided, variables are evaluated from the calling environment. |
pngfile |
Optional character string. If provided, saves the plot to a PNG file with the specified filename. |
quiet |
Logical. If TRUE, suppresses the Robin Hood details messages. Default is FALSE. |
Reference: Simonsohn, Uri (2018) "Two lines: A valid alternative to the invalid testing of U-shaped relationships with quadratic regressions." AMPPS, 538-555. doi:10.1177/2515245918805755
The test beings fitting a GAM model, predicting y with a smooth of x, and optionally with covariates. It identifies the interior most extreme value of fitted y, and adjusts from the matching x-value to set the breakpoint relying on the Robin Hood procedure introduced also by Simonsohn (2018). It then estimates the (once) interrupted regression using that breakpoint, and reports the slope and significance of the average slopes at either side of it. A U-shape is significant if the slopes are of opposite sign and are both individually significant.
A list containing:
All elements from reg2(): b1, b2, p1, p2,
z1, z2, u.sig, xc, glm1, glm2, rob1,
rob2, msg, yhat.smooth
yobs: Observed y values (adjusted for covariates if present)
y.hat: Fitted values from GAM
y.ub, y.lb: Upper and lower bounds for fitted values
y.most: Most extreme fitted value
x.most: x-value associated with most extreme fitted value
f: Formula as character string
bx1, bx2: Linear and quadratic coefficients from preliminary quadratic regression
minx: Minimum x value
midflat: Median of flat region
midz1, midz2: Z-statistics at midpoint
# Simple example with simulated data set.seed(123) x <- rnorm(100) y <- -x^2 + rnorm(100) data <- data.frame(x = x, y = y) result <- twolines(y ~ x, data = data) # With covariates z <- rnorm(100) y <- -x^2 + 0.5*z + rnorm(100) data <- data.frame(x = x, y = y, z = z) result <- twolines(y ~ x + z, data = data) # Without data argument (variables evaluated from environment) x <- rnorm(100) y <- -x^2 + rnorm(100) result <- twolines(y ~ x)# Simple example with simulated data set.seed(123) x <- rnorm(100) y <- -x^2 + rnorm(100) data <- data.frame(x = x, y = y) result <- twolines(y ~ x, data = data) # With covariates z <- rnorm(100) y <- -x^2 + 0.5*z + rnorm(100) data <- data.frame(x = x, y = y, z = z) result <- twolines(y ~ x + z, data = data) # Without data argument (variables evaluated from environment) x <- rnorm(100) y <- -x^2 + rnorm(100) result <- twolines(y ~ x)
Assign labels to describe variables in a data frame. Can be called on a single variable or full data, to assign or to read existing labels.
var_labels(x) var_labels(x) <- valuevar_labels(x) var_labels(x) <- value
x |
A vector or a data frame. |
value |
Character label(s) to assign. |
For a vector, this reads/writes a base-R "label" attribute.
For a data frame, this reads/writes the "label" attribute of each
column.
- If x is a data frame: a named character vector with one entry per
column (missing labels are returned as NA_character_).
- Otherwise: a single character string (or NA_character_).
df <- data.frame(x = 1:3, y = 4:6) # Set labels for all columns var_labels(df) <- c("this is x", "this is y") var_labels(df) # Set a label for a single column var_labels(df$x) <- "this is x" var_labels(df$x)df <- data.frame(x = 1:3, y = 4:6) # Set labels for all columns var_labels(df) <- c("this is x", "this is y") var_labels(df) # Set a label for a single column var_labels(df$x) <- "this is x" var_labels(df$x)