Skip to contents

Computes the Hodges-Lehmann pseudomedian and bootstrap confidence interval.

Usage

pmedian(
  data,
  formula,
  conf_level = 0.95,
  conf_method = "percentile",
  n_resamples = 1000L,
  agg_fun = "error"
)

pmedian2(
  x,
  y = NULL,
  conf_level = 0.95,
  conf_method = "percentile",
  n_resamples = 1000L
)

Arguments

data

(data.frame)
The data frame of interest.

formula

(formula)
A formula of form:

y ~ group | block

Use when data is in tall format. y is the numeric outcome, group is the binary grouping variable, and block is the subject/item-level variable indicating pairs of observations. group will be converted to a factor and the first level will be the reference value. For example, when levels(data$group) <- c("pre", "post"), the focal level is 'post', so differences are post - pre. Pairs with non-finite values (infinite or missing) are silently dropped. See agg_fun for handling duplicate cases of grouping/blocking combinations.

y ~ x

Use when data is in wide format. y and x must be numeric vectors. Differences are calculated as data$y - data$x. Pairs with non-finite values (infinite or missing) are silently dropped.

~ x

Use when data$x represents pre-calculated differences or for the one-sample case. Non-finite values (infinite or missing) are silently dropped.

conf_level

(Scalar numeric: 0.95; [0, 1))
The confidence level. If conf_level = 0, no confidence interval is calculated.

conf_method

(Scalar character: c("percentile", "bca"))
The type of bootstrap confidence interval.

n_resamples

(Scalar integer: 1000L; [10L, Inf))
The number of bootstrap resamples. If conf_level = 0, no resampling is performed.

agg_fun

(Scalar character or function: "error")
Used for aggregating duplicate cases of grouping/blocking combinations when data is in tall format and formula has structure y ~ group | block. "error" (default) will return an error if duplicate grouping/blocking combinations are encountered. Select one of "first", "last", "sum", "mean", "median", "min", or "max" for built in aggregation handling (each applies na.rm = TRUE). Or define your own function. For example, myfun <- function(x) {as.numeric(quantile(x, 0.75, na.rm = TRUE))}.

x

(numeric)
Numeric vector of data. Values with non-finite values (infinite or missing) are silently dropped.

y

(numeric: NULL)
Numeric vector of data or NULL. If NULL (default), a one-sample test is performed using x. If numeric, differences are calculated as x - y. Pairs with non-finite values (infinite or missing) are silently dropped.

Value

A list with the following elements:

SlotSubslotNameDescription
1pseudomedianMeasure of centrality.
2lowerLower bound of confidence interval for the pseudomedian.
3upperUpper bound of confidence interval for the pseudomedian.
4methodEstimate method.
5infoAdditional information.
51n_sampleNumber of observations in the original data.
52n_analyticNumber of observations after removing non-finite values from the original data.
53data_typeData type.
54focal_nameName of the focal variable (differences are focal - reference).
55reference_nameName of the reference variable (differences are focal - reference).
6callA named list of the function's arguments (use as.call() to convert to a call).

Details

This function generates a confidence interval for the pseudomedian based on the observed data, not based on an inversion of the signed-rank test srt().

The Hodges-Lehmann estimator is the median of all pairwise averages of the sample values. $$\mathrm{HL} = \mathrm{median} \left\{ \frac{x_i + x_j}{2} \right\}_{i \le j}$$ This pseudomedian is a robust, distribution-free estimate of central tendency for a single sample, or a location-shift estimator for paired data. It's resistant to outliers and compatible with rank-based inference.

The percentile and BCa bootstrap confidence interval methods are described in chapter 5.3 of Davison and Hinkley (1997) .

This function is mainly a wrapper for the function Hmisc::pMedian().

References

Davison AC, Hinkley DV (1997). Bootstrap Methods and their Application, 1 edition. Cambridge University Press. ISBN 9780511802843, doi:10.1017/CBO9780511802843 .

Harrell Jr FE (2025). Hmisc: Harrell Miscellaneous. R package version 5.2-4, https://hbiostat.org/R/Hmisc/.

See also

Examples

#----------------------------------------------------------------------------
# pmedian() example
#----------------------------------------------------------------------------
library(rankdifferencetest)

# Use example data from Kornbrot (1990)
data <- kornbrot_table1

# Create tall-format data for demonstration purposes
data_tall <- reshape(
  data = kornbrot_table1,
  direction = "long",
  varying = c("placebo", "drug"),
  v.names = c("time"),
  idvar = "subject",
  times = c("placebo", "drug"),
  timevar = "treatment",
  new.row.names = seq_len(prod(length(c("placebo", "drug")), nrow(kornbrot_table1)))
)

# Subject and treatment should be factors. The ordering of the treatment factor
# will determine the difference (placebo - drug).
data_tall$subject <- factor(data_tall$subject)
data_tall$treatment <- factor(data_tall$treatment, levels = c("drug", "placebo"))

# Rate transformation inverts the rank ordering.
data$placebo_rate <- 60 / data$placebo
data$drug_rate <- 60 / data$drug
data_tall$rate <- 60 / data_tall$time

# Estimates
pmedian(
  data = data,
  formula = placebo ~ drug
)
#> Error in get_dep_data(data = x, formula = formula, single_term = TRUE,     agg_fun = agg_fun): unused argument (single_term = TRUE)

pmedian(
  data = data_tall,
  formula = time ~ treatment | subject
)
#> Error in get_dep_data(data = x, formula = formula, single_term = TRUE,     agg_fun = agg_fun): unused argument (single_term = TRUE)

pmedian2(
  x = data$placebo_rate,
  y = data$drug_rate
)
#> 
#> Hodges-Lehmann estimator and percentile bootstrap confidence
#> interval
#> 
#> Paired differences: data$placebo_rate - data$drug_rate
#> 
#> Pseudomedian: -1.78
#> 95% CI: -6.37, -0.107

pmedian(
  data = data_tall,
  formula = rate ~ treatment | subject
)
#> Error in get_dep_data(data = x, formula = formula, single_term = TRUE,     agg_fun = agg_fun): unused argument (single_term = TRUE)