3/17/2020

\(p\)-values

One of the most common ways to report statistical test outcomes is by using \(p\)-values, or attained significance levels.

For tests whose rejection region \(\{T\geq k\}\) uses a threshold \(k=k(\alpha)\) chosen from the requested significance level \(\alpha\), a realization \(t\) of \(T\) splits the interval \([0,1]\) of possible levels into two parts, the levels at which the test rejects and the levels at which it does not: \[ [0,1] = \{\alpha : t \geq k(\alpha)\} \cup \{\alpha : t < k(\alpha)\} \]

The smallest value in \(\{\alpha : t \geq k(\alpha)\}\) is the “best possible” rejection level: it corresponds to the smallest rejection region that still rejects at \(T=t\).

Definition

Given a test with test statistic \(T\) and a realization \(t\) of \(T\), the corresponding \(p\)-value is the smallest level \(\alpha\) at which the test rejects the null hypothesis for \(T=t\).
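
As a rough numerical illustration of the definition, the sketch below assumes an upper-tailed test whose statistic is standard normal under the null, so that \(k(\alpha)\) can be computed with qnorm(1 - alpha). The observed value t_obs and the grid of levels are made up for the example. It searches the grid for the smallest level that still rejects and compares that with the upper tail probability.

```r
# Assumption: T ~ N(0, 1) under H0, rejection region {T >= k(alpha)}.
t_obs <- 1.8                                 # hypothetical observed statistic
alpha <- seq(0.0001, 0.9999, by = 0.0001)    # grid of candidate levels
k <- qnorm(1 - alpha)                        # threshold k(alpha) for each level
rejects <- t_obs >= k                        # does the level-alpha test reject?

p_grid <- min(alpha[rejects])                # smallest rejecting level on the grid
p_exact <- pnorm(t_obs, lower.tail = FALSE)  # upper tail probability at t_obs

c(grid = p_grid, exact = p_exact)
```

The two numbers should agree up to the grid spacing.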

\(p\)-values with known distributions

For many if not most tests we meet in this course, the distribution of the test statistic under the null hypothesis is well known.

Calculating a \(p\)-value often reduces to evaluating the CDF \(F_T\) of the test statistic under the null hypothesis. Let the test statistic \(T\) be realized by an observed \(t\).

\(H_0: T=0\) and Rejection Region: \(\{T \geq k\}\). The smallest rejection region that still rejects at \(T=t\) is given by \(k=t\).

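The corresponding \(p = \alpha = \mathbb{P}(T\geq t \mid H_0) = 1-F_T(t)\), where the last equality assumes that \(T\) is continuous. For a discrete \(T\), \(\mathbb{P}(T\geq t \mid H_0) = 1-F_T(t)+\mathbb{P}(T=t \mid H_0)\).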
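
A minimal sketch of the upper-tailed case, assuming for illustration that \(T\) follows a chi-square distribution with 3 degrees of freedom under the null. Both the distribution and the observed value x_obs are hypothetical choices, not taken from a particular test.

```r
# Assumption: T ~ chi-square(df = 3) under H0; x_obs is a made-up observation.
x_obs <- 7.81
df <- 3

p_upper <- 1 - pchisq(x_obs, df = df)                      # 1 - F_T(t)
p_upper_alt <- pchisq(x_obs, df = df, lower.tail = FALSE)  # same tail, computed directly

c(p_upper, p_upper_alt)
```

The lower.tail = FALSE form computes the tail directly, which avoids cancellation when the \(p\)-value is very small.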

\(H_0: T=0\) and Rejection Region: \(\{T \leq k\}\). The smallest rejection region that still rejects at \(T=t\) is given by \(k=t\).

The corresponding \(p = \alpha = \mathbb{P}(T\leq t \mid H_0) = F_T(t)\).
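
For the lower-tailed case, the \(p\)-value is the CDF evaluated at the observation. The sketch assumes a Student's t null distribution with 10 degrees of freedom and a made-up observed value; both are illustrative assumptions.

```r
# Assumption: T ~ t(df = 10) under H0; t_obs is a made-up observation.
t_obs <- -2.1
df <- 10

p_lower <- pt(t_obs, df = df)   # F_T(t) = P(T <= t | H0)
p_lower
```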

\(H_0: T=0\) and Rejection Region: \(\{|T| \geq k\}\). The smallest rejection region that still rejects at \(T=t\) is given by \(k=|t|\).

The corresponding \(p = \alpha = \mathbb{P}(|T|\geq |t| \mid H_0) = F_T(-|t|) + (1-F_T(|t|))\) for a continuous \(T\).

When the null distribution of \(T\) is symmetric around 0, the two tails are equal and \(p=2F_T(-|t|)\).
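
The two-sided formulas can be checked numerically. The sketch assumes a standard normal null distribution, which is symmetric around 0, and a made-up observed value.

```r
# Assumption: T ~ N(0, 1) under H0; t_obs is a made-up observation.
t_obs <- -1.96

# General form: lower tail below -|t| plus upper tail above |t|.
p_two_tails <- pnorm(-abs(t_obs)) + (1 - pnorm(abs(t_obs)))

# Shortcut for a symmetric null distribution.
p_symmetric <- 2 * pnorm(-abs(t_obs))

c(p_two_tails, p_symmetric)   # both close to 0.05
```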

\(p\)-values as random variables

Since \(p\) depends on the actual outcome \(t\) of \(T\), it is itself the realization of a random variable, which we denote by \(P\).

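Consider the case where the rejection region is \(\{T \leq k\}\), so that \(P = F_T(T)\). Assume that \(H_0\) holds and that \(F_T\) is continuous and strictly increasing, so that \(F_T^{-1}\) exists. For the CDF of \(P\) at \(p\in[0,1]\), we get: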

\[ \begin{aligned} F_P(p) &= \mathbb{P}(P\leq p) \\ &= \mathbb{P}(F_T(T)\leq p) \\ &= \mathbb{P}\left(F_T^{-1}(F_T(T))\leq F_T^{-1}(p)\right) \\ &= \mathbb{P}\left(T\leq F_T^{-1}(p)\right) \\ &= F_T\left(F_T^{-1}(p)\right) = p \end{aligned} \]

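The other cases follow analogously. In each case \(F_P(p)=p\) on \([0,1]\): under the null hypothesis, the \(p\)-value of a continuous test statistic is uniformly distributed on \([0,1]\).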
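
A small simulation can make this result tangible. The sketch assumes a standard normal null distribution and the lower-tailed rejection region used in the derivation. The seed and the number of simulations are arbitrary.

```r
# Assumption: T ~ N(0, 1) under H0, rejection region {T <= k}.
set.seed(1)                        # arbitrary seed, for reproducibility only
n_sim <- 10000

t_sim <- rnorm(n_sim)              # simulated test statistics under H0
p_sim <- pnorm(t_sim)              # lower-tailed p-values P = F_T(T)

mean(p_sim <= 0.05)                # fraction of p-values below 0.05
quantile(p_sim, c(0.1, 0.5, 0.9))  # sample quantiles of the p-values
ks.test(p_sim, "punif")            # Kolmogorov-Smirnov comparison with Uniform(0, 1)
```

If the \(p\)-values are uniform, the fraction below 0.05 should be near 0.05 and the sample quantiles near 0.1, 0.5 and 0.9.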

Random Distributions in R

  • rdist(n, ...) - Generate \(n\) random numbers from the distribution dist (a placeholder for the distribution's name)
  • pdist(x, ...) - Calculate the CDF: \(F(x)\)
  • qdist(p, ...) - Calculate the inverse CDF: \(F^{-1}(p)\)
  • ddist(x, ...) - Calculate the density (or, for discrete distributions, the probability mass): \(f(x)\)

Examples

  • rnorm, pnorm, qnorm, dnorm - Normal distribution
  • rbinom, pbinom, qbinom, dbinom - Binomial distribution
  • rpois, ppois, qpois, dpois - Poisson distribution
  • rexp, pexp, qexp, dexp - Exponential distribution
  • rt, pt, qt, dt - Student’s t-distribution
  • rchisq, pchisq, qchisq, dchisq - chi-square distribution
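
The naming scheme can be tried out directly. The calls below are a short sketch using the normal and binomial families; the parameter values and the seed are arbitrary choices for illustration.

```r
set.seed(42)                           # arbitrary seed, for reproducibility only

x <- rnorm(5, mean = 10, sd = 2)       # r: five random draws from N(10, 2^2)
cdf_val <- pnorm(12, mean = 10, sd = 2)    # p: CDF, P(X <= 12)
q_val <- qnorm(0.975)                  # q: quantile, the inverse CDF at 0.975
dens_val <- dnorm(0)                   # d: standard normal density at 0
pmf_val <- dbinom(3, size = 10, prob = 0.5)  # d for a discrete family: P(X = 3)

round_trip <- qnorm(pnorm(1.3))        # q inverts p, so this recovers 1.3
```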