Categorical Data

Mikael Vejdemo-Johansson

Source code for these slides

Your Homework

13:4 - Justin Ramirez

Based on the question:

The null hypothesis is \(H_0: p_{10} = p_{20} = \dots = p_{80} = {1\over 8} = 0.125\).

\(H_0\) is rejected if the test statistic \(X^2\) is greater than or equal to the critical value \(X^2_{\alpha, k-1}\), where \(\alpha=0.10\), \(k=8\), and \(df=k-1=7\).

For the chi-squared distribution with 7 df and \(\alpha=0.10\), the critical value is \(X^2_{0.10, 7} \approx 12.017\), so \(H_0\) is rejected if \(X^2\geq 12.017\).

\(n=120\). The expected frequency is \(E_i=np_{i0} = 120\cdot 0.125 = 15\).

From the given frequency table, the observed frequencies are: \[ O_1 = 12, O_2 = 16, O_3 = 17, O_4 = 15, O_5 = 13, O_6 = 20, O_7 = 17, O_8 = 10 \]

The test statistic value is \[ \begin{align*} X^2 &= \sum_{i=1}^k{(O_i-E_i)^2\over E_i} = {(12-15)^2\over 15} + \dots + {(10-15)^2\over 15} \\ &= {9\over 15}+{1\over 15}+{4\over 15}+{0\over 15}+{4\over 15}+{25\over 15}+{4\over15}+{25\over 15} \\ &= {24\over 5} = 4.80 \end{align*} \]

Since \(4.80 < X^2_{0.10,7} \approx 12.017\), the null hypothesis is not rejected.
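The arithmetic can be checked numerically. The following is a minimal R sketch using base R's `qchisq` and `chisq.test`; the variable names are illustrative and not part of the original solution.

```r
# Observed frequencies from the table, n = 120
observed <- c(12, 16, 17, 15, 13, 20, 17, 10)
k <- length(observed)

# Expected counts under H0: every category has probability 1/8
expected <- sum(observed) * rep(1 / k, k)

# Pearson chi-squared statistic and the critical value at alpha = 0.10
X2 <- sum((observed - expected)^2 / expected)
critical <- qchisq(1 - 0.10, df = k - 1)

# chisq.test with explicit null probabilities runs the same goodness-of-fit test
fit <- chisq.test(observed, p = rep(1 / k, k))

# H0 is rejected only if the statistic reaches the critical value
reject <- X2 >= critical
```

Here `fit$statistic` should agree with the hand-computed `X2`, and `reject` encodes the decision rule stated above.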

13:14 - Maxim Kleyer

a)

\[ \begin{align*} \PP(X=x) &= p^{x-1}\overbrace{q}^{1-p}, \quad x=1,2,\dots \\ \mathcal{L}(p) &= \prod_{i=1}^n p^{x_i-1}q = p^{\sum x_i-n}q^n \\ \ell(p) = \log\mathcal{L}(p)&= \left(\sum x_i-n\right)\log p+n\log(1-p) \\ {d\ell(p)\over dp} &= {\sum x_i-n\over p}-{n\over 1-p} \qquad \text{set }=0 \\ 0 &= {\left(\sum x_i-n\right)(1-p)-np\over p(1-p)} \\ 0 &= {\sum x_i - n - p\sum x_i + \color{DarkMagenta}{np - np}\over p(1-p)} \\ p\sum x_i &= \sum x_i-n \\ \hat p &= {\sum x_i-n\over \sum x_i} = {363-130\over 363} \approx .642 \end{align*} \]
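As a sanity check on the derivation, the closed-form estimate can be compared with a numerical maximizer of the log-likelihood. This is a minimal R sketch, assuming only the summary values \(n=130\) and \(\sum x_i=363\) that appear in the last line; the names `loglik`, `p_hat`, and `p_num` are illustrative.

```r
# Summary statistics used in the derivation
n <- 130
sum_x <- 363

# Log-likelihood for P(X = x) = p^(x-1) * (1 - p)
loglik <- function(p) (sum_x - n) * log(p) + n * log(1 - p)

# Closed-form MLE obtained by setting the derivative to zero
p_hat <- (sum_x - n) / sum_x

# Numerical maximization over (0, 1) as an independent check
p_num <- optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)$maximum

c(closed_form = p_hat, numerical = p_num)
```

The two values should agree up to the tolerance of `optimize`.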

13:14 - Maxim Kleyer

b) \(H_0:\) the data do not fit the geometric distribution; \(H_a:\) the data do fit the geometric distribution.

MVJ: This is exactly the opposite of the \(H_0\) and \(H_a\) we established for chi-squared tests.

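| \(x\) | Observed \(O\) | Expected \(E\) | \(O-E\) | \((O-E)^2\) | \((O-E)^2\over E\) |
|---|---|---|---|---|---|
| 1 | 48 | 46.54 | 1.46 | 2.1316 | 0.04580 |
| 2 | 31 | 29.87 | 1.13 | 1.2769 | 0.04275 |
| 3 | 20 | 19.18 | 0.82 | 0.6724 | 0.03506 |
| 4 | 9 | 12.31 | -3.31 | 10.9561 | 0.89002 |
| 5 | 6 | 7.90 | -1.90 | 3.6100 | 0.45696 |
| 6 | 5 | 5.07 | -0.07 | 0.0049 | 0.00097 |
| \(\geq 7\) | 11 | 3.25 | 7.75 | 60.0625 | 18.48077 |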

\(\chi^2 = \sum{(O-E)^2\over E} \approx 20.231\), \(\alpha=0.05\). \(df=k-1-m = 7-1-1 = 5\).

\(\chi^2_{0.05,5}\approx 11.07\)

\(20.231 > 11.07\), so we reject \(H_0\).
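The expected counts and the statistic can be recomputed in R. This is a minimal sketch, assuming the parametrization \(\PP(X=x)=p^{x-1}(1-p)\) from part a) and the rounded estimate \(\hat p \approx 0.642\); the variable names are illustrative. One assumption to flag: the pooled cell here uses the tail probability \(\PP(X\geq 7)=p^6\), so that the expected counts add up to \(n\). That gives an expected count of about 9.1 for the last cell rather than the tabulated 3.25, so the statistic from this sketch will not match the table.

```r
# Observed counts for x = 1, ..., 6 and the pooled cell x >= 7
observed <- c(48, 31, 20, 9, 6, 5, 11)
n <- sum(observed)
p_hat <- 0.642  # rounded MLE from part a)

# Cell probabilities: P(X = x) = p^(x-1) * (1 - p) for x = 1..6,
# and the tail P(X >= 7) = p^6 for the pooled cell (assumption of this sketch)
probs <- c(p_hat^(0:5) * (1 - p_hat), p_hat^6)
expected <- n * probs

# Pearson statistic; df = k - 1 - m with m = 1 estimated parameter
X2 <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1 - 1

# Critical value at alpha = 0.05 and the corresponding p-value
critical <- qchisq(1 - 0.05, df)
p_value <- pchisq(X2, df, lower.tail = FALSE)
```

Comparing `X2` with `critical` then gives the decision under this choice of tail cell.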

13:24 - Nicholas Basile

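library(knitr)     # provides kable
library(magrittr)  # provides the %>% pipe
# Observed counts entered column by column; the last row and last column hold the marginal totals
tbl.2w = matrix(c(409,512,921,11,4,15,22,14,36,7,11,18,277,220,497,726,761,1487), nrow=3)
# Render the table and shade the totals row and column
tbl.2w %>%
  kable %>%
  kableExtra::column_spec(6, background="#eeeeee") %>%
  kableExtra::row_spec(3, background="#eeeeee")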
409 11 22 7 277 726
512 4 14 11 220 761
921 15 36 18 497 1487

\(E_i = {\text{row total}\cdot\text{column total}\over\text{total}}\):

# Expected counts entered column by column (2 rows, 5 columns)
E.i = matrix(c(449.66,471.34,7.32,7.68,17.58,18.42,8.79,9.21,242.65,254.35), nrow=2)
knitr::kable(E.i)
449.66 7.32 17.58 8.79 242.65
471.34 7.68 18.42 9.21 254.35
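Rather than typing the expected counts in by hand, they can be computed from the margins. This is a minimal R sketch of the formula above using base R's `outer`, `rowSums`, and `colSums`; the names `observed` and `expected` are illustrative.

```r
# Observed 2 x 5 table without the marginal totals, entered column by column
observed <- matrix(c(409, 512, 11, 4, 22, 14, 7, 11, 277, 220), nrow = 2)

# Expected count per cell: (row total) * (column total) / (grand total)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Round to two decimals for comparison with the table
round(expected, 2)
```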

\(\chi^2 = \sum\left(O_i-E_i\over \sqrt{E_i}\right)^2 \approx 23.13\)

Assuming \(\alpha=0.05\), with \((2-1)(5-1)=4\) df, the p-value is approximately \(0.0001 < 0.05\), so the null hypothesis is rejected.
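The whole test can also be run in one call. This is a minimal R sketch, assuming base R's `chisq.test` applied to the observed counts without their marginal totals; for a table larger than 2 x 2 no continuity correction is applied, so the statistic it reports is the plain Pearson sum. Its value can be compared with the hand computation above.

```r
# Observed 2 x 5 table without the marginal totals, entered column by column
observed <- matrix(c(409, 512, 11, 4, 22, 14, 7, 11, 277, 220), nrow = 2)

# Pearson chi-squared test on the two-way table
test <- chisq.test(observed)

test$statistic  # Pearson chi-squared statistic
test$parameter  # degrees of freedom, (rows - 1) * (columns - 1)
test$p.value    # compare against alpha = 0.05
test$expected   # expected counts under the null hypothesis
```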