Test statistic distribution under the null

Constructs a list which defines the test statistic reference distribution under the null hypothesis.

Usage

asymptotic()

simulated(method = "approximate", nsims = 1000L, ncores = 1L, ...)

Arguments

method: (Scalar string: "approximate")
The method used to derive the distribution of the test statistic under the null hypothesis. Must be one of "approximate" (default) or "exact". See 'Details' for additional information.
nsims: (Scalar integer: 1000L; [2, Inf))
The number of resamples for method = "approximate". Ignored for method = "exact" unless the number of exact resamples exceeds approximately 1e6, in which case method = "approximate" is used as a fallback. In the power() context, nsims is the number of datasets simulated under the null hypothesis; there you would typically set nsims greater than or equal to the number of simulated datasets in the design row of the power analysis. See 'Details' for additional information.
ncores: (Scalar integer: 1L; [1, Inf))
The number of cores (number of worker processes) to use. Do not set greater than the value returned by parallel::detectCores().
...: Optional arguments for internal use.
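
One cautious way to pick ncores is to cap a desired worker count at the detected core count. This is a minimal sketch: requested is a hypothetical value, and because parallel::detectCores() can return NA on some platforms, the sketch falls back to a single core in that case.

```r
requested <- 4L  # hypothetical number of workers you would like to use
available <- parallel::detectCores()

# detectCores() may return NA when the core count cannot be determined.
ncores <- if (is.na(available)) 1L else min(requested, available)
```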

Value

list

Details

The default asymptotic test is performed for distribution = asymptotic().

When setting argument distribution = simulated(method = "exact"), the exact randomization test is defined by:

Independent two-sample tests
1. Calculate the observed test statistic.
2. Check if choose(n1 + n2, n1) < 1e6, i.e. whether the number of distinct group assignments is below one million.
  1. If TRUE continue with the exact randomization test.
  2. If FALSE revert to the approximate randomization test.
3. For each of the group assignments enumerated by combn(x=n1+n2, m=n1):
  1. Assign corresponding group labels.
  2. Calculate the test statistic.
4. Calculate the exact randomization test p-value as the mean of the logical vector resampled_test_stats >= observed_test_stat.
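
The independent-sample steps above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: exact_independent() is a hypothetical helper, and the absolute difference in means stands in for the actual test statistic.

```r
# Hypothetical helper: exact randomization test for two independent samples.
# The default `stat` is a stand-in, not the statistic used by the package.
exact_independent <- function(x1, x2,
                              stat = function(a, b) abs(mean(a) - mean(b))) {
  pooled <- c(x1, x2)
  n1 <- length(x1)
  n <- length(pooled)

  # Count the assignments before enumerating them.
  stopifnot(choose(n, n1) < 1e6)

  observed <- stat(x1, x2)

  # Each column of `idx` holds the positions assigned to group 1.
  idx <- combn(n, n1)
  resampled <- apply(idx, 2, function(i) stat(pooled[i], pooled[-i]))

  # Proportion of assignments at least as extreme as the observed one.
  mean(resampled >= observed)
}

p_exact <- exact_independent(c(1, 2, 3), c(4, 5, 6))
p_exact
#> [1] 0.1
```

With three observations per group there are choose(6, 3) = 20 assignments, and only the two fully separated ones reach the observed statistic, giving 2/20.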
Dependent two-sample tests
1. Calculate the observed test statistic.
2. Check if npairs < 21 (maximum 2^20 resamples)
  1. If TRUE continue with the exact randomization test.
  2. If FALSE revert to the approximate randomization test.
3. For each of the 2^npairs within-pair label assignments:
  1. Assign corresponding pair labels.
  2. Calculate the test statistic.
4. Calculate the exact randomization test p-value as the mean of the logical vector resampled_test_stats >= observed_test_stat.
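
The dependent-sample steps above can be sketched the same way. This is a minimal illustration, assuming that "assigning pair labels" means swapping the two observations within a pair; exact_paired() is a hypothetical helper and the absolute mean difference is a stand-in test statistic.

```r
# Hypothetical helper: exact randomization test for paired samples.
exact_paired <- function(x1, x2,
                         stat = function(a, b) abs(mean(a - b))) {
  npairs <- length(x1)
  stopifnot(length(x2) == npairs, npairs < 21)  # at most 2^20 resamples

  observed <- stat(x1, x2)

  # Each row of `swaps` says which pairs have their labels exchanged.
  swaps <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), npairs)))
  resampled <- apply(swaps, 1, function(s) {
    stat(ifelse(s, x2, x1), ifelse(s, x1, x2))
  })

  mean(resampled >= observed)
}

p_exact_paired <- exact_paired(c(5, 7, 9, 6), c(3, 4, 5, 5))
p_exact_paired
#> [1] 0.125
```

Four pairs give 2^4 = 16 assignments; only the original labeling and its complete reversal reach the observed statistic, giving 2/16.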

For argument distribution = simulated(method = "approximate"), the approximate randomization test is defined by:

Independent two-sample tests
1. Calculate the observed test statistic.
2. For nsims iterations:
  1. Randomly assign group labels.
  2. Calculate the test statistic.
3. Append the observed test statistic to the vector of resampled test statistics.
4. Calculate the approximate randomization test p-value as the mean of the logical vector resampled_test_stats >= observed_test_stat.
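
A base R sketch of the independent-sample steps above, under the same caveats: approx_independent() is a hypothetical helper, and the absolute difference in means is only a stand-in for the package's test statistic.

```r
# Hypothetical helper: approximate randomization test, independent samples.
approx_independent <- function(x1, x2, nsims = 1000L,
                               stat = function(a, b) abs(mean(a) - mean(b))) {
  pooled <- c(x1, x2)
  n1 <- length(x1)
  observed <- stat(x1, x2)

  # Randomly reassign group labels nsims times.
  resampled <- replicate(nsims, {
    i <- sample.int(length(pooled), n1)
    stat(pooled[i], pooled[-i])
  })

  # Appending the observed statistic makes the mean equal to
  # (1 + sum(resampled >= observed)) / (nsims + 1).
  mean(c(resampled, observed) >= observed)
}

set.seed(1)
p_approx <- approx_independent(c(1, 2, 3, 2, 1), c(6, 7, 8, 7, 9), nsims = 199L)
```

Because the observed statistic is counted among the resamples, the result can never be smaller than 1 / (nsims + 1).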
Dependent two-sample tests
1. Calculate the observed test statistic.
2. For nsims iterations:
  1. Randomly assign pair labels.
  2. Calculate the test statistic.
3. Append the observed test statistic to the vector of resampled test statistics.
4. Calculate the approximate randomization test p-value as the mean of the logical vector resampled_test_stats >= observed_test_stat.
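
For paired data, the random step can be read as independently swapping labels within each pair. The sketch below assumes that interpretation; approx_paired() is a hypothetical helper and the absolute mean difference is a stand-in test statistic.

```r
# Hypothetical helper: approximate randomization test, paired samples.
approx_paired <- function(x1, x2, nsims = 1000L,
                          stat = function(a, b) abs(mean(a - b))) {
  stopifnot(length(x1) == length(x2))
  observed <- stat(x1, x2)

  resampled <- replicate(nsims, {
    # Decide independently for each pair whether to exchange its labels.
    swap <- sample(c(TRUE, FALSE), length(x1), replace = TRUE)
    stat(ifelse(swap, x2, x1), ifelse(swap, x1, x2))
  })

  # The observed statistic is appended before taking the proportion.
  mean(c(resampled, observed) >= observed)
}

set.seed(1)
p_approx_paired <- approx_paired(c(5, 7, 9, 6, 8), c(3, 4, 5, 5, 6),
                                 nsims = 199L)
```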

In the power analysis setting, power(), data for groups 1 and 2 can be simulated from their known distributions under the null hypothesis. The tests above are nonparametric randomization tests; in this setting, approximate parametric tests are performed instead.

For example, power(wald_test_nb(distribution = simulated())) would result in an approximate parametric Wald test defined by:

For each relevant design row in data:
1. For simulated(nsims=integer()) iterations:
  1. Simulate new data for group 1 and group 2 under the null hypothesis.
  2. Calculate the Wald test statistic, $\chi^2_{null}$.
2. Collect all $\chi^2_{null}$ into a vector.
3. For each of the sim_nb(nsims=integer()) simulated datasets:
  1. Calculate the Wald test statistic, $\chi^2_{obs}$.
  2. Calculate the p-value from the empirical null distribution of test statistics, $\chi^2_{null}$, as the mean of the logical vector null_test_stats >= observed_test_stat.
4. Collect all p-values into a vector.
5. Calculate power as sum(p <= alpha) / nsims, where nsims is the number of simulated datasets from step 3.
Return all results from power().
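
The flow for a single design row can be sketched in base R without calling the package. Everything here is an assumption made for illustration: the parameter values, the use of rnbinom() with its mu/size parameterization, a shared mean of mean1 under the null, and a squared log ratio of sample means standing in for the Wald statistic.

```r
set.seed(42)

# Hypothetical design row.
n1 <- 30; n2 <- 30
mean1 <- 10; ratio <- 2
size <- 5            # rnbinom() size parameter (assumed, same in both groups)
alpha <- 0.05
nsims_null <- 500L   # plays the role of simulated(nsims = ...)
nsims_data <- 200L   # plays the role of sim_nb(nsims = ...)

# Stand-in statistic, not the package's Wald statistic.
stat <- function(y1, y2) log((mean(y2) + 0.5) / (mean(y1) + 0.5))^2

# Steps 1-2: empirical null distribution (both groups share mean1).
null_stats <- replicate(nsims_null, stat(
  rnbinom(n1, mu = mean1, size = size),
  rnbinom(n2, mu = mean1, size = size)
))

# Steps 3-4: one p-value per dataset simulated under the alternative.
p <- replicate(nsims_data, {
  obs <- stat(
    rnbinom(n1, mu = mean1, size = size),
    rnbinom(n2, mu = mean1 * ratio, size = size)
  )
  mean(null_stats >= obs)
})

# Step 5: proportion of simulated datasets that reject the null.
power_hat <- sum(p <= alpha) / nsims_data
```

Keeping nsims_null at least as large as nsims_data follows the guidance given for the nsims argument.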

Randomization tests use the positive-biased p-value estimate in the style of Davison and Hinkley (1997) (see also Phipson and Smyth (2010)):

$$ \hat{p} = \frac{1 + \sum_{i=1}^B \mathbb{I} \{\chi^2_i \geq \chi^2_{obs}\}}{B + 1}. $$

The number of resamples defines the minimum observable p-value (e.g. nsims=1000L results in min(p-value)=1/1001). It's recommended to set $\text{nsims} \gg \frac{1}{\alpha}$.
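
The estimator and its lower bound can be checked with a few lines of base R. p_hat() is a hypothetical helper written for this illustration; it is not a function exported by the package.

```r
# Hypothetical helper implementing (1 + sum(I{stat_i >= obs})) / (B + 1).
p_hat <- function(null_stats, observed) {
  (1 + sum(null_stats >= observed)) / (length(null_stats) + 1)
}

# Even when no resampled statistic reaches the observed one,
# the estimate cannot drop below 1 / (B + 1).
p_min <- p_hat(null_stats = rep(0, 1000), observed = 1)
p_min
#> [1] 0.000999001

# Equivalent to appending the observed statistic and taking the mean.
null_stats <- c(0.2, 1.5, 3.1, 0.7)
same <- all.equal(p_hat(null_stats, 1.0), mean(c(null_stats, 1.0) >= 1.0))
```

Here nsims = 1000 gives a floor of 1/1001, so a test at alpha = 0.0005 could never reject regardless of the data.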

References

Davison AC, Hinkley DV (1997). Bootstrap Methods and their Application, 1st edition. Cambridge University Press. ISBN 9780521574716, doi:10.1017/CBO9780511802843.

Phipson B, Smyth GK (2010). “Permutation P-values Should Never Be Zero: Calculating Exact P-values When Permutations Are Randomly Drawn.” Statistical Applications in Genetics and Molecular Biology, 9(1). ISSN 1544-6115, doi:10.48550/arXiv.1603.05766.

Examples

#----------------------------------------------------------------------------
# asymptotic() examples
#----------------------------------------------------------------------------
library(depower)

set.seed(1234)
data <- sim_nb(
  n1 = 60,
  n2 = 40,
  mean1 = 10,
  ratio = 1.5,
  dispersion1 = 2,
  dispersion2 = 8
)

data |>
  wald_test_nb(distribution = asymptotic())
#> $chisq
#> [1] 11.35158
#> 
#> $df
#> [1] 1
#> 
#> $p
#> [1] 0.0007538376
#> 
#> $ratio
#> $ratio$estimate
#> [1] 1.542934
#> 
#> $ratio$lower
#> [1] NA
#> 
#> $ratio$upper
#> [1] NA
#> 
#> 
#> $mean1
#> [1] 9.316667
#> 
#> $mean2
#> [1] 14.375
#> 
#> $dispersion1
#> [1] 1.545421
#> 
#> $dispersion2
#> [1] 11.08002
#> 
#> $n1
#> [1] 60
#> 
#> $n2
#> [1] 40
#> 
#> $method
#> [1] "Asymptotic Wald test for independent negative binomial ratio of means"
#> 
#> $ci_level
#> NULL
#> 
#> $equal_dispersion
#> [1] FALSE
#> 
#> $link
#> [1] "log"
#> 
#> $ratio_null
#> [1] 1
#> 
#> $mle_code
#> [1] 0
#> 
#> $mle_message
#> [1] "relative convergence (4)"
#> 

#----------------------------------------------------------------------------
# simulated() examples
#----------------------------------------------------------------------------
data |>
  wald_test_nb(distribution = simulated(nsims = 200L))
#> $chisq
#> [1] 11.35158
#> 
#> $df
#> [1] 1
#> 
#> $p
#> [1] 0.00990099
#> 
#> $ratio
#> $ratio$estimate
#> [1] 1.542934
#> 
#> $ratio$lower
#> [1] NA
#> 
#> $ratio$upper
#> [1] NA
#> 
#> 
#> $mean1
#> [1] 9.316667
#> 
#> $mean2
#> [1] 14.375
#> 
#> $dispersion1
#> [1] 1.545421
#> 
#> $dispersion2
#> [1] 11.08002
#> 
#> $n1
#> [1] 60
#> 
#> $n2
#> [1] 40
#> 
#> $method
#> [1] "Approximate randomization Wald test for independent negative binomial ratio of means"
#> 
#> $ci_level
#> NULL
#> 
#> $equal_dispersion
#> [1] FALSE
#> 
#> $link
#> [1] "log"
#> 
#> $ratio_null
#> [1] 1
#> 
#> $mle_code
#> [1] 0
#> 
#> $mle_message
#> [1] "relative convergence (4)"
#>