Two-sample Inference: Proportions, Variances

Mikael Vejdemo-Johansson

Two-sample tests for proportions

Let \(X_1 \sim Binomial(n_1, p_1)\) and \(X_2 \sim Binomial(n_2, p_2)\). We are interested in relative behaviors of \(p_1\) and \(p_2\).

There are two “obvious” ways to compare \(p_1\) to \(p_2\): by their difference \(p_1-p_2\), or by their ratio \(p_1/p_2\).

Testing for difference of proportions

The parameter \(p_1-p_2\) does not have an obvious exact distribution, but we can use the central limit theorem here. \(p_1-p_2\) has an unbiased, asymptotically normal estimator in \(\hat{p}_1 - \hat{p}_2\), and we can see that for \(n_1, n_2\) large enough that \(n_1\hat{p}_1, n_1\hat{q}_1, n_2\hat{p}_2, n_2\hat{q}_2 \geq 10\),

\[ \hat{p}_1-\hat{p}_2 \overset\approx\sim \mathcal{N}(p_1-p_2, \sigma^2_{\hat{p}_1} + \sigma^2_{\hat{p}_2}) \qquad\text{and}\qquad \sigma^2_{\hat{p}_1} + \sigma^2_{\hat{p}_2} = \frac{p_1q_1}{n_1} + \frac{p_2q_2}{n_2} \]

We may proceed to construct a Wald test as usual, with one important tweak - we need to keep track of what we are assuming at any given point.

If our assumption is \(p_1\neq p_2\) (because our null hypothesis is a non-zero difference in proportions, or because we are calculating power or confidence intervals), then we are justified (for large samples) to insert \(\hat{p}_1\) for \(p_1\) and \(\hat{p}_2\) for \(p_2\).

If we are assuming \(p_1=p_2=p\), then we need to pool our data. Instead of separate samples of size \(n_1\) and \(n_2\) from different populations, we have a single sample \(X_1+X_2\sim Binomial(n_1+n_2, p)\) and get a pooled estimator:

\[ \hat{p} = \frac{X_1+X_2}{n_1+n_2} = \frac{n_1}{n_1+n_2}\hat{p}_1 + \frac{n_2}{n_1+n_2}\hat{p}_2 \]
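As a sketch in Python (the function name and the numpy/scipy usage are my own, not from the text), the pooled test statistic for \(H_0: p_1 - p_2 = 0\) can be computed as:

```python
import numpy as np
from scipy import stats

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z test of H0: p1 - p2 = 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)            # pooled estimator under H0
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    return z, 2 * stats.norm.sf(abs(z))       # z and two-tailed p-value
```

With the aspirin data used later in these slides (\(x = 189, 104\); \(n = 11034, 11037\)) this gives \(z \approx 5.0\).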

Two-sample tests for proportions

From the Wald construction we get:

Large-Sample Hypothesis Test for Difference of Proportions

Write \(\hat{p}_1 = X_1/n_1\), \(\hat{p}_2 = X_2/n_2\), \(\hat{p} = (X_1+X_2)/(n_1+n_2)\), and \(q_1, q_2, q, \hat{q}_1, \hat{q}_2, \hat{q}\) all defined as usual.

Null Hypothesis
\(H_0: p_1 - p_2 = \Delta_0\), often with \(\Delta_0 = 0\)
Test Statistic
\(Z = \frac{(\hat{p}_1-\hat{p}_2) - \Delta_0}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}\)
| Alternative Hypothesis | Rejection Region for Level \(\alpha\) | Power at \(p_1\) and \(p_2\) | Sample size needed for power \(1-\beta\) at \(p_1\) and \(p_2\) |
|-|-|-|-|
| Upper Tail | \(z \geq z_\alpha\) | \(1-\Phi\left(\frac{z_\alpha\sqrt{\overline{p}\overline{q}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}-(p_1-p_2)}{\sigma}\right)\) | \(\frac{\left(z_\alpha\sqrt{(p_1+p_2)(q_1+q_2)/2}+z_\beta\sqrt{p_1q_1+p_2q_2}\right)^2}{(p_1-p_2)^2}\) |
| Two-tailed | \(\lvert z\rvert \geq z_{\alpha/2}\) | \(\Phi\left(\frac{z_{\alpha/2}\sqrt{\overline{p}\overline{q}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}-(p_1-p_2)}{\sigma}\right) - \Phi\left(\frac{-z_{\alpha/2}\sqrt{\overline{p}\overline{q}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}-(p_1-p_2)}{\sigma}\right)\) | \(\frac{\left(z_{\alpha/2}\sqrt{(p_1+p_2)(q_1+q_2)/2}+z_\beta\sqrt{p_1q_1+p_2q_2}\right)^2}{(p_1-p_2)^2}\) |
| Lower Tail | \(z \leq -z_\alpha\) | \(\Phi\left(\frac{-z_\alpha\sqrt{\overline{p}\overline{q}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}-(p_1-p_2)}{\sigma}\right)\) | \(\frac{\left(z_\alpha\sqrt{(p_1+p_2)(q_1+q_2)/2}+z_\beta\sqrt{p_1q_1+p_2q_2}\right)^2}{(p_1-p_2)^2}\) |

Where we also define \(\overline{p} = (n_1p_1+n_2p_2)/(n_1+n_2)\), \(\overline{q} = 1-\overline{p}\), and \(\sigma=\sqrt{\frac{p_1q_1}{n_1}+\frac{p_2q_2}{n_2}}\).
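The sample-size column can be turned into a small helper (a sketch; equal group sizes \(n_1=n_2=n\) assumed, function name my own):

```python
import numpy as np
from scipy import stats

def sample_size_two_prop(p1, p2, alpha=0.05, beta=0.2, two_sided=True):
    """n per group (equal sizes) for the two-proportion z test,
    level alpha with power 1 - beta at (p1, p2)."""
    q1, q2 = 1 - p1, 1 - p2
    z_a = stats.norm.isf(alpha / 2 if two_sided else alpha)
    z_b = stats.norm.isf(beta)
    num = (z_a * np.sqrt((p1 + p2) * (q1 + q2) / 2)
           + z_b * np.sqrt(p1 * q1 + p2 * q2)) ** 2
    return num / (p1 - p2) ** 2
```

For example, detecting \(p_1 = 0.6\) vs \(p_2 = 0.5\) with a one-sided level \(0.05\) test at power \(0.9\) requires roughly \(n \approx 422\) per group.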

Confidence Interval
\(\hat{p}_1-\hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1}+\frac{\hat{p}_2\hat{q}_2}{n_2}}\) or, especially for small sample sizes, you may want to use \(\tilde{p}_1=(X_1+1)/(n_1+2)\) and \(\tilde{p}_2=(X_2+1)/(n_2+2)\) instead of \(\hat{p}_1\) and \(\hat{p}_2\).
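A sketch of both interval variants (function name my own; `adjusted=True` uses \(\tilde{p} = (X+1)/(n+2)\) with \(n+2\) in the standard error as well):

```python
import numpy as np
from scipy import stats

def diff_prop_ci(x1, n1, x2, n2, conf=0.95, adjusted=False):
    """Large-sample CI for p1 - p2; adjusted=True uses (X+1)/(n+2)."""
    if adjusted:                              # better small-sample coverage
        p1, p2, m1, m2 = (x1 + 1) / (n1 + 2), (x2 + 1) / (n2 + 2), n1 + 2, n2 + 2
    else:
        p1, p2, m1, m2 = x1 / n1, x2 / n2, n1, n2
    z = stats.norm.isf((1 - conf) / 2)
    se = np.sqrt(p1 * (1 - p1) / m1 + p2 * (1 - p2) / m2)
    return p1 - p2 - z * se, p1 - p2 + z * se
```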

What about ratios instead of differences?

Chapter 10; Exercise 54

The parameter of interest is the risk ratio or relative risk \(RR = \frac{p_1}{p_2}\). We pick as our estimator of interest \(\widehat{RR} = \hat{p}_1/\hat{p}_2\).

For large \(n_1, n_2\), we get an approximately normal distribution of the log of the risk ratio: \[ \log{\widehat{RR}} = \log\hat{p}_1 - \log\hat{p}_2 \sim\mathcal{N}\left(\log{RR}, \frac{n_1-X_1}{n_1X_1} + \frac{n_2-X_2}{n_2X_2}\right) \]

From this, we may compute a large-sample confidence interval and formulate a large-sample test by the Wald approach:

Large Sample Risk Ratio Test

Null Hypothesis
\(H_0: RR = RR_0\), with \(RR_0\) usually taken to be 1.
Test Statistic
\(Z = \frac{\log(\widehat{RR})-\log{RR_0}}{\sqrt{\frac{n_1-X_1}{n_1X_1}+\frac{n_2-X_2}{n_2X_2}}}\)
| Alternative Hypothesis | Rejection Region for Level \(\alpha\) | p-value |
|-|-|-|
| Upper Tail | \(z > z_\alpha\) | \(1-\Phi(z)\) |
| Two-tailed | \(\lvert z\rvert > z_{\alpha/2}\) | \(2\min\{\Phi(z), 1-\Phi(z)\}\) |
| Lower Tail | \(z < -z_\alpha\) | \(\Phi(z)\) |
Confidence Interval at confidence level \(1-\alpha\)
\(\log{RR} \in \log\widehat{RR} \pm z_{\alpha/2}\sqrt{\frac{n_1-X_1}{n_1X_1}+\frac{n_2-X_2}{n_2X_2}}\)
\(\exp\left[\log\widehat{RR} \pm z_{\alpha/2}\sqrt{\frac{n_1-X_1}{n_1X_1}+\frac{n_2-X_2}{n_2X_2}}\right]\)
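A minimal sketch of the log-transformed interval (function name my own):

```python
import numpy as np
from scipy import stats

def risk_ratio_ci(x1, n1, x2, n2, conf=0.95):
    """Point estimate and large-sample CI for RR = p1/p2 (via log scale)."""
    rr = (x1 / n1) / (x2 / n2)
    se = np.sqrt((n1 - x1) / (n1 * x1) + (n2 - x2) / (n2 * x2))
    z = stats.norm.isf((1 - conf) / 2)
    return rr, rr * np.exp(-z * se), rr * np.exp(z * se)
```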

What about ratios instead of differences?

Chapter 10; Exercise 54

Example 1.3: Heart attacks were reported to be reduced by taking aspirin, based on a randomized controlled experiment. The data given were \(x\) and \(n\); the rest is computed:

Code
import numpy as np
import scipy.stats

n = np.array([11034, 11037])   # control, aspirin
x = np.array([189, 104])
p_hat = x/n
pqn = p_hat * (1-p_hat) / n    # per-group variance terms for the difference
rrvar = ((n-x)/(n*x)).sum()    # variance of the log risk ratio
p_diff = p_hat[1]-p_hat[0]
RR_hat = p_hat[1]/p_hat[0]
se_diff = np.sqrt(pqn.sum())
se_RR = np.sqrt(rrvar)
z025 = scipy.stats.norm.isf(0.025)

print(rf"""
Treatment | $n$ | $x$ | $\hat{{p}}$ | Intervals | $\hat{{p}}_A-\hat{{p}}_C$ | $\widehat{{RR}}$ | 95% CI | low | high
-|-|-|-|-|-|-|-|-|-
Control | {n[0]} | {x[0]} | {p_hat[0]:.4f} | Estimators | {p_diff:.4f} | {RR_hat:.4f} | RR | {np.exp(np.log(RR_hat) - z025*se_RR):.4f} | {np.exp(np.log(RR_hat) + z025*se_RR):.4f}
Aspirin | {n[1]} | {x[1]} | {p_hat[1]:.4f} | Std. Err.s | {se_diff:.4f} | {se_RR:.4f} | Difference | {p_diff - z025*se_diff:.4f} | {p_diff + z025*se_diff:.4f}
""")
| Treatment | \(n\) | \(x\) | \(\hat{p}\) | Intervals | \(\hat{p}_A-\hat{p}_C\) | \(\widehat{RR}\) | 95% CI | low | high |
|-|-|-|-|-|-|-|-|-|-|
| Control | 11034 | 189 | 0.0171 | Estimators | -0.0077 | 0.5501 | RR | 0.4337 | 0.6978 |
| Aspirin | 11037 | 104 | 0.0094 | Std. Err.s | 0.0015 | 0.1213 | Difference | -0.0107 | -0.0047 |

Using the difference of proportions confidence interval we can say that there is a reduction of between \(0.47\%\) and \(1.07\%\) (in absolute terms) in the prevalence of heart attack with aspirin.

Using the RR confidence interval, we can say that you are about twice as likely (between \(43\%\) and \(131\%\) more likely) to have a heart attack without aspirin as with aspirin.

Odds

Probabilities may often be given as odds (also known as “odds for”):

\[ o = \frac{p}{1-p} \qquad\qquad p = \frac{o}{o+1} \]

Odds of \(1:1\) correspond to a probability of \(50\%\); \(2:1\) is approximately \(67\%\), etc.

Where probabilities are computed as \(\frac{\text{# favored outcomes}}{\text{# outcomes}}\), odds are computed as \(\frac{\text{# favored outcomes}}{\text{# unfavored outcomes}}\). They give an easy way to compute fair payouts in betting setups.
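The two conversions, as one-liners:

```python
def odds(p):
    """Odds 'for' an event with probability p."""
    return p / (1 - p)

def prob(o):
    """Probability corresponding to odds o."""
    return o / (o + 1)
```

So `odds(0.5)` is 1 and `prob(2)` is \(2/3\).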

Odds Ratios

Especially in medical and epidemiological research, a statistic called the odds ratio (OR) is commonly used to quantify the strength of association between two events. It is the ratio of conditional odds \(\frac{o(A | B)}{o(A | \text{not }B)}\).

The OR is often used in lieu of the risk ratio when one of the events is sufficiently rare (\(\PP(A) < 10\%\)), because under this assumption the odds ratio approximates the risk ratio.

|  | Treatment | Control | Margin |
|-|-|-|-|
| Event | \(X_{11}\) | \(X_{01}\) | \(X_{*1}\) |
| Non-event | \(X_{10}\) | \(X_{00}\) | \(X_{*0}\) |
| Margin | \(X_{1*}\) | \(X_{0*}\) | \(N\) |

With this notation, to compare the association of a treatment and an event, we might use the risk ratio: \[ \frac{\PP(\text{event}|\text{treatment})}{\PP(\text{event}|\text{control})} = \frac{X_{11}/X_{1*}}{X_{01}/X_{0*}} = \frac{X_{11}/(X_{11}+X_{10})}{X_{01}/(X_{01}+X_{00})} \]

Another option would be the odds ratio: \[ \frac{o(\text{event}|\text{treatment})}{o(\text{event}|\text{control})} = \frac{X_{11}/X_{10}}{X_{01}/X_{00}} \]

Under the rare disease assumption, we can expect \(X_{11}\) and \(X_{01}\) to be much smaller than \(X_{10}\) and \(X_{00}\), so that \[ \frac{X_{11}/(X_{11}+X_{10})}{X_{01}/(X_{01}+X_{00})} \approx \frac{X_{11}/X_{10}}{X_{01}/X_{00}} \]

Odds Ratios

While less conceptually accessible than the risk ratio, the odds ratio has important and useful theoretical properties:

Invertibility
If we switch from an event to its complement - from event-free survival to adverse risk - then the risk ratio does not simply invert, whereas the odds ratio does.
As an example, suppose there was a risk of \(4/100\) for adverse events in a drug treatment group, and of \(2/100\) in a control group. We get \(RR=\frac{4/100}{2/100}=2\) and \(OR=\frac{4/96}{2/98}\approx2.04166\).
Flipping to the complement however, we get \(RR=\frac{96/100}{98/100}\approx0.9796\), but \(OR=\frac{96/4}{98/2}\approx0.4898= 1/2.04166\).
Availability from stratified samples
It may be difficult to get reliable data on an entire population: for the risk ratio we need to know the total numbers of the treatment and control groups. For the odds ratio, we might be able to compute with a sample instead of the entire population.
For an example, suppose a village with 1000 inhabitants was exposed to radiation, and we want to quantify whether a particular rare disease has increased as a result. We could have had the total number of exposed individuals \(X_{E*}=400\) out of which \(X_{ED}=20\) developed the disease and \(X_{EH}=380\) stayed healthy; unexposed individuals \(X_{N*}=600\) with \(X_{ND}=6\) developing the disease and \(X_{NH}=594\) staying healthy.
Risk ratio is \(\frac{X_{ED}/X_{E*}}{X_{ND}/X_{N*}}=\frac{20/400}{6/600} = 5\). Odds ratio is \(\frac{X_{ED}/X_{EH}}{X_{ND}/X_{NH}}=\frac{20/380}{6/594} \approx 5.2\).
But maybe we cannot count the entire population, though we can take samples. With a rare enough disease it might be difficult to randomly select people who caught the disease, so we structure sampling differently: pick all diseased individuals you can find, and then sample an equal number of healthy individuals. Say you are treating \(26\) diseased individuals, of whom \(20\) had been exposed. You randomly recruit \(26\) healthy individuals and find that \(10\) of them had been exposed.
Risk ratio cannot be computed from the sample - we still need the population sizes.
Odds ratio is \(\frac{X_{ED}/X_{EH}}{X_{ND}/X_{NH}} = \frac{20/10}{6/16} \approx 5.33\).
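The case-control computation above, as a quick check (variable names my own):

```python
# Case-control sample from the example: 26 diseased (20 exposed, 6 not),
# 26 randomly sampled healthy (10 exposed, 16 not).
x_ed, x_nd = 20, 6     # diseased: exposed / unexposed
x_eh, x_nh = 10, 16    # sampled healthy: exposed / unexposed
or_hat = (x_ed / x_eh) / (x_nd / x_nh)   # close to the population OR of about 5.2
```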

Odds Ratios - Inference

It turns out that the log odds ratio can be estimated with the sample log odds ratio \(L\): \[ \begin{align*} \log OR &= \log \frac{p_{11}/p_{10}}{p_{01}/p_{00}} = \log p_{11} + \log p_{00} - \log p_{10} - \log p_{01} \\ L &= \log \frac{\hat p_{11}/\hat p_{10}}{\hat p_{01}/\hat p_{00}} = \log \frac{X_{11}/X_{10}}{X_{01}/X_{00}} = \log \frac{X_{11}X_{00}}{X_{01}X_{10}} \end{align*} \]

The sample log odds ratio is asymptotically approximately normal, unbiased, and with standard error \[ SE = \sqrt{\frac{1}{X_{11}}+\frac{1}{X_{01}}+\frac{1}{X_{10}}+\frac{1}{X_{00}}} \]

Using Wald’s construction we get a hypothesis test and confidence intervals from this information - where for a CI for the odds ratio, we need to exponentiate the bounds in the Wald CI.

Example

Using the radiation and disease example on the previous slide, we had \(X_{11}=20\), \(X_{10}=6\), \(X_{01}=10\), \(X_{00}=16\), which gives us

\[ \begin{align*} L &= \log 20 + \log 16 - \log 6 - \log 10 \approx 1.6740 & SE &= \sqrt{\frac{1}{20}+\frac{1}{6}+\frac{1}{10}+\frac{1}{16}} \approx 0.6158 \\ CI_L &= \left(L-z_{0.975}SE, L+z_{0.975}SE\right) \approx (0.4671, 2.8809) & CI_{OR} &= \exp[CI_L] \approx (1.5954, 17.8295) \\ p &= 2\min\{\Phi(L/SE), 1-\Phi(L/SE)\} \approx 0.0066 \end{align*} \]
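The same computation in Python (scipy assumed):

```python
import numpy as np
from scipy import stats

x11, x10, x01, x00 = 20, 6, 10, 16        # counts from the example
L = np.log(x11 * x00 / (x10 * x01))       # sample log odds ratio
se = np.sqrt(1/x11 + 1/x10 + 1/x01 + 1/x00)
z975 = stats.norm.isf(0.025)
ci_or = np.exp([L - z975 * se, L + z975 * se])   # CI for the OR itself
p_val = 2 * stats.norm.sf(abs(L / se))           # two-sided p-value
```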

Paired proportions: McNemar’s Test

Chapter 10; Exercise 55

With \(n\) experimental units, two measurements are performed (paired data), producing a contingency table:

Table 1: Two-way contingency for paired data
|  | After Success | After Failure | Marginals |
|-|-|-|-|
| Before Success | \(X_{11}\) | \(X_{10}\) | \(X_{1*}\) |
| Before Failure | \(X_{01}\) | \(X_{00}\) | \(X_{0*}\) |
| Marginals | \(X_{*1}\) | \(X_{*0}\) | \(N\) |
a. State the null and alternative in terms of \(p_{00}, \dots\)

Notice that the marginal probabilities tell us about total successes before and after the treatment. So the null hypothesis (no increase) would be \(H_0:p_{*1} - p_{1*} = 0\), whereas the alternative (increased support) would be \(H_a:p_{*1} - p_{1*} > 0\).

b. Construct an estimator for the difference

We can estimate the difference by using the marginal sums:

\[ \hat{p}_{*1} = \frac{X_{*1}}{N} = \frac{X_{11}+X_{01}}{N} \qquad \hat{p}_{1*} = \frac{X_{1*}}{N} = \frac{X_{11}+X_{10}}{N} \qquad \hat{\Delta}_p = \hat{p}_{*1}-\hat{p}_{1*} = \frac{X_{01} - X_{10}}{N} \]

\(\hat\Delta_p\) estimates the difference of the marginal probabilities.

Paired proportions: McNemar’s Test

Aside: multinomial covariance

While each entry behaves like a \(Binomial(N, p_{ij})\) variable, their interactions are easier to understand with the multinomial distribution - the vector \((X_{11}, X_{10}, X_{01}, X_{00})\) follows \(Multinomial(N, (p_{11}, p_{10}, p_{01}, p_{00}))\).

We can write each count as a sum of indicator functions \(X_i = \sum_k \boldsymbol{1}_i(k)\), where \(\boldsymbol{1}_i(k)\) indicates that the \(k\)th trial landed in category \(i\); each indicator function is a Bernoulli-distributed variable.

Then: \[ \begin{align*} Cov(X_i, X_j) &= Cov\left(\sum_k\boldsymbol{1}_i(k), \sum_l\boldsymbol{1}_j(l)\right) \\ &= \sum_k\sum_l Cov(\boldsymbol{1}_i(k), \boldsymbol{1}_j(l)) \\ &= \sum_k\left[Cov(\boldsymbol{1}_i(k), \boldsymbol{1}_j(k)) + \sum_{l, l\neq k}\underbrace{Cov(\boldsymbol{1}_i(k),\boldsymbol{1}_j(l))}_{=0\text{ by independence}}\right] \\ \text{assuming } i\neq j \quad &= \sum_k\left[ \underbrace{\EE[\boldsymbol{1}_i(k)\cdot\boldsymbol{1}_j(k)]}_{=0} - \EE[\boldsymbol{1}_i(k)]\cdot\EE[\boldsymbol{1}_j(k)] \right] \\ &= -\sum_k\EE[\boldsymbol{1}_i(k)]\cdot\EE[\boldsymbol{1}_j(k)] = -np_ip_j \end{align*} \]

Paired proportions: McNemar’s Test

Aside: variance of \(X_i-X_j\)

The variables \(X_i, X_j\) from a multinomial vector are not independent, so any calculation of variance involving both will need to involve covariance.

\[ \begin{align*} \VV[X_i-X_j] &= \VV[X_i] + \VV[X_j] + 2\cdot(-1)\cdot Cov(X_i, X_j) \\ &= np_i(1-p_i) + np_j(1-p_j) - 2\cdot(-np_ip_j) \\ &= \color{Teal}{np_i} \color{DarkMagenta}{- np_i^2} + \color{Teal}{np_j} \color{DarkMagenta}{- np_j^2 + 2np_ip_j} \\ &= \color{Teal}{np_i+np_j} \color{DarkMagenta}{-n(p_i-p_j)^2} \end{align*} \]
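Both identities can be checked by simulation (hypothetical category probabilities; seeded for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 1000, np.array([0.2, 0.3, 0.5])            # hypothetical trials / probabilities
draws = rng.multinomial(N, p, size=200_000)
cov_01 = np.cov(draws[:, 0], draws[:, 1])[0, 1]   # sample Cov(X_0, X_1)
var_diff = np.var(draws[:, 0] - draws[:, 1])      # sample Var(X_0 - X_1)
# Theory: Cov = -N p_0 p_1 = -60 and Var = N(p_0 + p_1) - N(p_0 - p_1)^2 = 490
```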

Paired proportions: McNemar’s Test

Chapter 10; Exercise 55

c. Use \((X_i-X_j)/n \overset\approx\sim \mathcal{N}(p_i-p_j, (p_i+p_j-(p_i-p_j)^2)/n)\) to construct test statistic

Our null hypothesis is equal probabilities, so \(p_{01} = p_{10}\).

\[ \hat\Delta_p = \frac{X_{01}-X_{10}}{N} \overset\approx\sim \mathcal{N}\left( p_{01}-p_{10}, \frac{p_{01}+p_{10}-(p_{01}-p_{10})^2}{N} \right) = \mathcal{N}(0, (p_{01}+p_{10})/N)\\ Z = \frac{\hat\Delta_p}{\sqrt{(\hat{p}_{01}+\hat{p}_{10})/N}} = \frac{X_{01}/N - X_{10}/N}{\sqrt{(X_{01}+X_{10})/N}/\sqrt{N}} = \frac{X_{01}-X_{10}}{\sqrt{X_{01}+X_{10}}} \overset\approx\sim \mathcal{N}(0,1) \]

d. Concrete worked example
|  | After Success | After Failure | Marginals |
|-|-|-|-|
| Before Success | \(350\) | \(150\) | \(\color{DarkMagenta}{500}\) |
| Before Failure | \(200\) | \(300\) | \(\color{DarkMagenta}{500}\) |
| Marginals | \(\color{DarkMagenta}{550}\) | \(\color{DarkMagenta}{450}\) | 1000 |

So we compute: \(Z = \frac{200-150}{\sqrt{200+150}} \approx 2.6726\), for a two-sided p-value of \(0.0075\)
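The test statistic and p-value, as a quick computation:

```python
import numpy as np
from scipy import stats

x01, x10 = 200, 150                    # discordant counts from the table
z = (x01 - x10) / np.sqrt(x01 + x10)
p_val = 2 * stats.norm.sf(abs(z))      # two-sided p-value
```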

Paired proportions: McNemar’s Test

Continuity Correction

Edwards (1948) suggested a continuity correction version of the test, where the test statistic is adjusted to \[ Z^2 = \frac{(|X_{01}-X_{10}|-1)^2}{X_{01}+X_{10}} \sim \chi^2(1) \]

Small Sample Sizes

The chi-squared or standard normal approximations assume large enough data, usually taken to mean \(X_{01}+X_{10}\geq 25\). For smaller sample sizes, an exact binomial test can be used instead:

Out of the \(X_{01}+X_{10}\) pairs that changed category between the measurements, we can view - under the null hypothesis of no difference between before and after - the count \(X_{01}\) as \(X_{01}\sim Binomial(X_{01}+X_{10}, 1/2)\), and compute a two-tailed p-value by doubling the smaller tail as usual.
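A sketch of the exact test using `scipy.stats.binomtest` (hypothetical discordant counts):

```python
from scipy import stats

# Under H0, the discordant count X01 ~ Binomial(X01 + X10, 1/2).
x01, x10 = 8, 2    # hypothetical small-sample discordant counts
res = stats.binomtest(x01, n=x01 + x10, p=0.5, alternative='two-sided')
```

For these counts the exact two-sided p-value is \(112/1024 \approx 0.109\).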

Two-Sample Tests for Variances

From the \(F\) distribution to inference

Theorem

Let \(X_1,\dots,X_{n_X}\sim\mathcal{N}(\mu_X,\sigma_X^2)\) iid and independently \(Y_1,\dots,Y_{n_Y}\sim\mathcal{N}(\mu_Y, \sigma_Y^2)\) iid.

Then \[ F = \frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F(n_X-1, n_Y-1) \]

Proof

We know \(\frac{(n_X-1)S_X^2}{\sigma_X^2}\sim\chi^2(n_X-1)\) and \(\frac{(n_Y-1)S_Y^2}{\sigma_Y^2}\sim\chi^2(n_Y-1)\). Divide each variable by its respective degrees of freedom, and we get a ratio of independent rescaled chi-squared variables, hence an F-distributed variable:

\[ \frac {\frac{\color{Teal}{(n_X-1)}S_X^2}{\sigma_X^2}{\Large/}\color{Teal}{(n_X-1)}} {\frac{\color{DarkMagenta}{(n_Y-1)}S_Y^2}{\sigma_Y^2}{\Large/}\color{DarkMagenta}{(n_Y-1)}} = \frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \]

Important

That this ratio follows an F-distribution relies crucially on the samples being independent and normal. This distribution, and everything we derive from it, is highly sensitive to departures from normality.

From the \(F\) distribution to inference

From this we get a pivot quantity as well as a test statistic. If we (under the null hypothesis) assume \(\sigma_X^2=\sigma_Y^2\), then they cancel each other out in the ratio.

The F test for equality of variances

Null Hypothesis
\(H_0:\sigma_X^2 = \sigma_Y^2\)
Test Statistic
\(f = s_X^2/s_Y^2\)
| Alternative Hypothesis | Rejection Region for Level \(\alpha\) | p-value |
|-|-|-|
| Upper Tail | \(f\geq F_{\alpha,n_X-1,n_Y-1}\) | \(1-CDF_{F(n_X-1,n_Y-1)}(f)\) |
| Two-tailed | \(f\geq F_{\alpha/2,n_X-1,n_Y-1}\) or \(f\leq F_{1-\alpha/2,n_X-1,n_Y-1}\) | \(2\min\{CDF_{F(n_X-1,n_Y-1)}(f), 1-CDF_{F(n_X-1,n_Y-1)}(f)\}\) |
| Lower Tail | \(f\leq F_{1-\alpha,n_X-1,n_Y-1}\) | \(CDF_{F(n_X-1,n_Y-1)}(f)\) |
Confidence Interval
\(\frac{\sigma_X^2}{\sigma_Y^2}\in\left(\frac{s_X^2}{s_Y^2}\cdot\frac{1}{F_{\alpha/2, n_X-1, n_Y-1}},\frac{s_X^2}{s_Y^2}\cdot\frac{1}{F_{1-\alpha/2, n_X-1, n_Y-1}}\right)\)
Compute Critical Values
R: qf(1-alpha, d1, d2)
Python: scipy.stats.f(d1,d2).isf(alpha)
Compute p-Values
R: pf(f, d1, d2) (lower), pf(f, d1, d2, lower.tail=FALSE) (upper)
Python: scipy.stats.f(d1,d2).cdf(f) (lower), scipy.stats.f(d1,d2).sf(f) (upper)

Here, we write \(F_{\alpha,d_1,d_2} = CDF_{F(d_1,d_2)}^{-1}(1-\alpha)\).
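Putting the box together as a sketch (function name my own; two-sided version):

```python
import numpy as np
from scipy import stats

def var_f_test(x, y):
    """Two-sided F test of H0: sigma_X^2 = sigma_Y^2 for normal samples."""
    sx2, sy2 = np.var(x, ddof=1), np.var(y, ddof=1)
    f = sx2 / sy2
    cdf = stats.f(len(x) - 1, len(y) - 1).cdf(f)
    return f, 2 * min(cdf, 1 - cdf)   # statistic and two-sided p-value
```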