Review for Test 2

Test 2 will be November 20th, and cover the following sections from the book: 4.1, 4.2, 4.3, 4.4; 5.2, 5.3; 6.1, 6.2; and 7.1.

Here are some sample questions. There are many more potential questions that could appear on the test.

Question

An experiment consists of flipping a coin and tossing one die. Describe the sample space. What outcomes are in the event ``a heads and even roll’’? What is the probability of this event? What assumptions did you make to compute the probability?

ANS: There are 12 outcomes: (H,1), (H,2), …, (H,6), (T,1), …, (T,6). The event consists of (H,2), (H,4), and (H,6). This has probability 3/12 = 1/4 if the outcomes are equally likely, which will be the case if the coin and die are fair and independent.
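
As a quick check, the sample space can be enumerated in R. This is a minimal sketch assuming a fair coin and a fair die; the names `outcomes`, `event`, and `prob` are only illustrative.

```r
# All 12 (coin, die) pairs, assumed equally likely
outcomes <- expand.grid(coin = c("H", "T"), die = 1:6, stringsAsFactors = FALSE)

# The event "a heads and an even roll"
event <- outcomes$coin == "H" & outcomes$die %% 2 == 0

nrow(outcomes)       # size of the sample space
outcomes[event, ]    # the outcomes in the event
prob <- mean(event)  # fraction of equally likely outcomes in the event
prob
```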

Question

In computing normal probabilities, we use \(P(Z > z) = 1 - P(Z \leq z)\). What ``rule’’ allows us to do this? Compute \(P(Z > 2.34)\).

ANS: The complement rule. In R, we have:

1 - pnorm(2.34)
## [1] 0.00964187

Question

If we toss two coins (a penny and a nickel), is the result on the penny disjoint from the result on the nickel? If not, what is their relationship (in terms of probability)?

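ANS: The two are independent, and not disjoint. Disjoint events cannot both occur, but any result on the penny can occur together with any result on the nickel, and knowing one tells you nothing about the other.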

Question

Suppose we have a probability distribution:

k     1    2    3   4
-------------------------
P(k) .43  .25  .17  x

What is x? What is the mean of the probability distribution?

ANS: We have:

x = 1 - (.43 + .25 + .17)
x
## [1] 0.15

And the mean is

1 * .43 + 2 * .25 + 3*.17 + 4*x
## [1] 2.04

Question

Let \(X\) be a random variable with distribution given by:

k       0    1    2    3    4
-----------------------------
P(X=k) .10  .10  .20   x   .30

Find x. Compute \(\mu\). Compute \(\sigma^2\).

ANS: As before, all probabilities must sum to 1:

x = 1 - (.1 +  .1 +  .2 +  .3)
x
## [1] 0.3

The mean is:

mu =  0*.1 +  1*.1 +2*.2+3*x + 4*.3
mu
## [1] 2.6

And the variance is

sigma2 =  (0-mu)^2*.1 +  (1-mu)^2*.1 +(2-mu)^2*.2+(3-mu)^2*x + (4-mu)^2*.3
sigma2
## [1] 1.64
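
The same sums can be written with vectors, which scales better to longer tables. This is only a sketch; `k` and `p` are illustrative names for the values and their probabilities, with \(x = 0.3\) already filled in.

```r
k <- 0:4
p <- c(.10, .10, .20, .30, .30)          # x = 0.3 filled in
stopifnot(isTRUE(all.equal(sum(p), 1)))  # probabilities must sum to 1

mu <- sum(k * p)                 # mean: weighted average of the values
sigma2 <- sum((k - mu)^2 * p)    # variance: weighted average squared deviation
c(mu = mu, sigma2 = sigma2, sigma = sqrt(sigma2))
```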

Question

Let \(X\) be a continuous random variable with density given by \(f(x) = 1/2\) for \(0 \leq x \leq 2\) and \(0\) otherwise. What is \(P(1/2 < X < 7/8)\)? Using symmetry, what is \(\mu\)?

ANS: The area is \(1/2 \cdot (7/8 - 1/2) = 3/16\). By symmetry, the mean is at 1.
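
This density is the uniform distribution on \([0, 2]\), so the answer can be checked with R's built-in uniform functions. A small sketch, where the variable names are illustrative:

```r
# X is uniform on [0, 2], so punif gives P(X <= x)
prob <- punif(7/8, min = 0, max = 2) - punif(1/2, min = 0, max = 2)
prob   # equals (1/2) * (7/8 - 1/2) = 3/16

# The mean as the integral of x * f(x) over [0, 2]
mu <- integrate(function(x) x * dunif(x, min = 0, max = 2), lower = 0, upper = 2)$value
mu
```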

Question

Let \(X\) have a distribution:

k         -1   0   1
---------------------
P(X=k)    1/4 1/2 1/4

Compute \(\mu\) and \(\sigma\) for this random variable.

ANS: We have \(\mu = -1\cdot 1/4 + 0\cdot 1/2 + 1\cdot 1/4 = 0\). The value of \(\sigma^2\) is \((-1)^2\cdot 1/4 + (1)^2\cdot 1/4 = 1/2\). So \(\sigma = \sqrt{1/2}\).

Suppose we have \(n\) independent random variables, \(X_1\), \(X_2\), …, \(X_n\), with this same distribution. Let \(S = X_1 + X_2 + \dots + X_n\). What are \(\mu_S\) and \(\sigma_S\)?

ANS: \(\mu_S = n\cdot \mu = 0\); \(\sigma^2_S = n\sigma^2 = n/2\), so \(\sigma_S = \sqrt{n/2}\).
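
A simulation can make these formulas plausible. The sketch below assumes \(n = 50\) and 10000 repetitions, both arbitrary choices, and compares the simulated mean and standard deviation of \(S\) with \(0\) and \(\sqrt{n/2}\).

```r
set.seed(1)   # arbitrary seed, for reproducibility only
n <- 50       # hypothetical choice of n

# Each repetition draws n values from the distribution and adds them up
S <- replicate(10000, sum(sample(c(-1, 0, 1), n, replace = TRUE, prob = c(1/4, 1/2, 1/4))))

c(mean_S = mean(S), sd_S = sd(S), theory_sd = sqrt(n/2))
```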

Question

The law of large numbers states that if \(\mu\) is the population mean and \(X_1, X_2, \dots, X_n\) is an i.i.d. sample from this population, then as \(n\) grows, \(\bar{x}\) gets as close to \(\mu\) as you specify and then stays that close.

Suppose you toss a fair coin 100 times and you have 30 heads and 70 tails. Does the law of large numbers say that in the next 100 coin tosses you will have 70 heads and 30 tails?

ANS: The law of large numbers does not imply things will immediately even out, only eventually.
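
A short calculation illustrates what ``eventually’’ means. Assuming the coin is fair, the expected number of heads in any further tosses is half of them, whatever happened earlier. The early deficit is not made up; it is swamped by the later tosses. The variable names below are illustrative.

```r
heads_so_far <- 30
tosses_so_far <- 100
more <- c(100, 1000, 10000, 1e6)    # additional fair tosses

# Expected overall proportion of heads after the additional tosses
expected_prop <- (heads_so_far + 0.5 * more) / (tosses_so_far + more)
round(expected_prop, 4)
```

The expected proportion after 100 more tosses is 0.4, not 0.5, and it only creeps toward 0.5 as the number of additional tosses grows.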

Question

At the beginning of the term, we used the terms ‘shape’, ‘center’, and ‘spread’ to describe a distribution. For large \(n\), describe the distribution of \(\bar{x}_n\) using these terms. Be as specific as you can be. Did you make any assumptions? How do you know you are right?

ANS: The center is \(\mu_{\bar{x}} = \mu\), the spread is \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\); finally if \(n\) is large enough or the population is normal, the shape will be bell shaped.

Question

In Chapter 3, a principle of experimental design is ``Repeat’’ (each treatment on many units to reduce chance variation in the results). Give an example where you have a formula that demonstrates this.

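ANS: The standard deviation of \(\bar{x}\), which measures chance variation, is \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\). It gets smaller by a factor of \(1/\sqrt{n}\) as the number of units \(n\) grows.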

Question

A population has mean \(\mu{}\), standard deviation \(\sigma{}\) and median \(M\).

Let \(X_1\), \(X_2\), …, \(X_n\) be an i.i.d. sample of size \(n\) from this population. Assume \(n\) is ``large enough’’ for the central limit theorem to apply.

Which is true: \(P(X_1 > \mu) = 1/2\) or \(P(X_1 > M) = 1/2\)?

ANS: The latter – for a single value, the median splits the area in half.

Which is true: \(P(\bar{x}_n > \mu) = 1/2\) or \(P(\bar{x}_n > M) = 1/2\)?

ANS: For large \(n\), the shape is symmetric – if the central limit theorem applies – so the former is true.

Which is true \(\mu_{\bar{x}} = \mu{}\) or \(\mu_{\bar{x}} = \mu/\sqrt{n}\)?

ANS: the former

Which is true \(\sigma_{\bar{x}} = \sigma{}\) or \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\)? (or both?)

ANS: the latter.

Question

A population has mean \(\mu=0\), standard deviation \(\sigma=6.74\) and \(X_1\), \(X_2\), …, \(X_n\) is an i.i.d. sample with \(n\) large enough for the central limit theorem to apply.

How large must \(n\) be so that \(P(-1 < \bar{x} < 1) = 1/2\)? (What must the \(z\)-score of \(1\) be?)

ANS: We use \(z\) scores two ways. First, we have

\[ P(-1 < \bar{x} < 1) = P(\frac{-1 - \mu}{\sigma/\sqrt{n}} < Z < \frac{1 - \mu}{\sigma/\sqrt{n}}) = P(-\sqrt{n}/\sigma < Z < \sqrt{n}/\sigma). \]

But we can find \(z^*\) with \(P(-z^* < Z < z^*)=1/2\) using a table, or using qnorm:

zstar  = qnorm(0.5 +  0.25)  # add in left tail too, as in class
zstar
## [1] 0.6744898

Solving \(\sqrt{n}/\sigma = z^*\), we have:

sigma = 6.74
n = (sigma * zstar)^2
n
## [1] 20.66667
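
As a sanity check, the value of \(n\) can be plugged back into the probability. The sketch below also rounds up, since a sample size must be a whole number; `check` is an illustrative name.

```r
sigma <- 6.74
zstar <- qnorm(0.75)
n <- (sigma * zstar)^2

# Plug n back in: the z-score of 1 is sqrt(n)/sigma
check <- pnorm(sqrt(n)/sigma) - pnorm(-sqrt(n)/sigma)
check          # should be 0.5
ceiling(n)     # smallest whole sample size with probability at least 1/2
```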

Question

Assume the mean number of skittles in a standard size bag is 15 and the standard deviation is 2. Assume the population of skittles in a bag is ``bell shaped’’ (whatever that means for a discrete distribution).

What is the probability that there are \(21\) or more skittles in a given bag?

ANS: Here \(\sigma =2\), so we get:

mu = 15
sigma = 2
1 - pnorm((21 - mu)/sigma)
## [1] 0.001349898

Now consider 10 bags. What is the probability the average number of skittles in the bags is 18 or more?

ANS: What changes now is \(n=10\), and the cutoff is 18:

n=10
1 - pnorm((18 - mu)/(sigma/sqrt(n)))

This is tiny, roughly one in a million, because the \(z\) score is so large:

(18 - mu)/(sigma/sqrt(n))
## [1] 4.743416

Now consider 100 bags. What is the probability the average number of skittles in the bags is 16 or more?

ANS:

With \(n=100\) and a cutoff of 16, we now get:

n = 100
1 - pnorm((16 - mu)/(sigma/sqrt(n)))
## [1] 2.866516e-07

Question

The binomial distribution is used to count the number of ``successes’’ in a fixed number (\(n\)) of trials. What scenarios below will it apply to:

  • Toss a coin 100 times, let \(X\) be the number of heads

ANS: Yes, \(n=100\), \(p=1/2\)

  • Roll a die 100 times, let \(X\) be the number of twos.

ANS: Yes, \(n=100\), \(p=1/6\).

  • Survey 100 Americans at random (with replacement), let \(X\) be the number who have been to Staten Island

ANS: Yes, \(n=100\), but here \(p\) is unknown

  • Ask your friends if they can drive you home. Let \(X\) be the number of friends you ask until you find a ride.

ANS: No. A binomial needs a fixed number of trials.

Question

Let \(p\) be a population proportion and \(\hat{p}\) be a sample proportion from a SRS of size \(n\) drawn from the large population.

Let \(n=123\) and \(p = 1/23\). What is \(\mu_{\hat{p}}\)? What is \(\sigma_{\hat{p}}\)?

ANS: use \(\mu=p\) and \(\sigma = \sqrt{p(1-p)/n}\):

n = 123
p = 1/23
mu = p
sigma = sqrt(p*(1-p)/n)
c(mu=mu, sigma=sigma)
##         mu      sigma 
## 0.04347826 0.01838785

Is \(np > 10\)?

ANS: No. We would need both \(np\) and \(n(1-p)\) to be bigger than 10, and here \(np\) is only about 5.3:

c(n*p, n*(1-p))
## [1]   5.347826 117.652174

Use the normal approximation to compute \(P(\hat{p} > 0.10)\).

ANS: Using the values of mu and sigma above, the normal probability is below. It is only a rough approximation, since \(np < 10\):

1 - pnorm((0.10 - mu)/sigma)
## [1] 0.001056531
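
Because \(np\) is only about 5.3, it is worth seeing how far off the approximation might be. The sketch below assumes the count \(X = n\hat{p}\) is exactly binomial with parameters \(n\) and \(p\); then \(\hat{p} > 0.10\) means \(X > 12.3\), that is, \(X \geq 13\). The names `normal_approx`, `cutoff`, and `exact` are illustrative.

```r
n <- 123
p <- 1/23
normal_approx <- 1 - pnorm((0.10 - p) / sqrt(p * (1 - p) / n))

cutoff <- floor(0.10 * n)              # largest count with phat <= 0.10 (here 12)
exact <- 1 - pbinom(cutoff, n, p)      # P(X >= 13) under the binomial model
c(normal_approx = normal_approx, exact = exact)
```

The binomial here is skewed to the right, so the normal curve understates the upper tail.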

Question

Toss a fair coin 250 times. What is the probability you have 150 or more heads? (Use the normal approximation to give an answer).

ANS: Here we work with the count of heads, so \(\mu = np\) and \(\sigma = \sqrt{np(1-p)}\):

n = 250
p = 1/2
mu = n*p
sigma = sqrt(n*p*(1-p))
1 -pnorm((150 - mu)/sigma)
## [1] 0.0007827011

Question

A survey of 1000 people answering yes and no is taken. If the population proportion of ``yes’’ answerers is assumed to be \(0.60\), what is the probability that 62% or more in the survey will answer yes? (Use the normal approximation to give an answer).

ANS: We use \(\hat{p}\) here, not a binomial count, so the formulas are adjusted:

n = 1000
p = 0.60
mu = p
sigma = sqrt(p*(1-p)/n)
1 - pnorm((0.62 - p)/sigma)
## [1] 0.0983528

Question

Mary finds a 90% confidence interval for some \(\mu\) by taking a sample, and John finds his own 90% confidence interval for the same \(\mu\) by taking a survey. It turns out the two intervals did not overlap on any values. Can they both have computed their values properly? Explain.

ANS: Yes. A given CI is not guaranteed to contain \(\mu\); it is only the process that is likely (90% of the time) to produce an interval that does. At least one of these two intervals must miss \(\mu\).
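
A simulation shows what ``the process is likely’’ means. This is a sketch under made-up assumptions: a normal population with \(\mu = 10\), known \(\sigma = 2\), and samples of size 25, none of which come from the question.

```r
set.seed(2)                      # arbitrary seed, for reproducibility only
mu <- 10; sigma <- 2; n <- 25    # hypothetical population and sample size
zstar <- qnorm(0.90 + 0.05)
moe <- zstar * sigma / sqrt(n)

# Draw 10000 samples and record whether each 90% interval covers mu
xbars <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
covered <- (xbars - moe < mu) & (mu < xbars + moe)
coverage <- mean(covered)
coverage   # close to 0.90, so about 1 interval in 10 misses mu
```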

Question

How much sleep do CSI students get, on average? To investigate, a student researcher interviewed 12 students at random and found this summary data:

xbar   s    n
--------------
 6.7  1.2  12

  • If the student assumes the population standard deviation is \(\sigma=1\), find the \(90\%\) confidence interval based on this data.

  • If the student does not assume the population standard deviation is \(\sigma=1\), but does assume the data is from a normal population, find the \(90\%\) confidence interval based on this data.

ANS:

We have two MOE’s to compute:

First, assuming \(\sigma=1\) is known, we use \(z^*\):

xbar = 6.7
s = 1.2
n = 12
sigma = 1

zstar = qnorm(0.9 + 0.05)
MOE = zstar * sigma/sqrt(n)
c(xbar - MOE, xbar + MOE)
## [1] 6.225172 7.174828

If we don’t assume \(\sigma\) is known, then we use \(t^*\):

tstar = qt(0.9 + 0.05, df = n-1)
MOE = tstar * s/sqrt(n)
c(xbar - MOE, xbar + MOE)
## [1] 6.077887 7.322113

Question

A researcher knows that a population of interest is normally distributed with unknown mean \(\mu\), but known standard deviation, \(\sigma=10\). To estimate \(\mu\), she will take a random sample. How large a random sample is needed so that a 90% confidence interval has a margin of error of \(1\)? Repeat with a 99% confidence interval.

ANS:

We need to solve \(z^* \sigma/\sqrt{n} = 1\), or \(n = (z^* \sigma)^2\). For the two answers we have:

sigma = 10
zstar = qnorm(0.9 + 0.05)
(zstar * sigma)^2
## [1] 270.5543

And, for 99% confidence:

sigma = 10
conf = 0.99
zstar = qnorm(conf + (1-conf)/2)
(zstar * sigma)^2
## [1] 663.4897
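
Since a sample size must be a whole number, and rounding down would make the margin of error slightly larger than 1, the values are rounded up. A small helper makes this explicit; `sample_size` is a hypothetical name, not a built-in R function.

```r
# Smallest n whose margin of error zstar * sigma / sqrt(n) is at most moe
sample_size <- function(conf, sigma, moe) {
  zstar <- qnorm(1 - (1 - conf) / 2)   # same as qnorm(conf + (1 - conf)/2)
  ceiling((zstar * sigma / moe)^2)
}

n90 <- sample_size(0.90, sigma = 10, moe = 1)
n99 <- sample_size(0.99, sigma = 10, moe = 1)
c(n90 = n90, n99 = n99)
```

This gives 271 and 664 for the two confidence levels.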

Question

A CSI student researcher wants to know if students get less sleep around final exam period. From earlier work, she is confident that the population mean for amount of sleep in normal class periods is 7 hours with a population standard deviation of 1.25. During finals, she also assumes a normal distribution of sleep times, and she takes a survey of 25 students, randomly selected, and computes this sample data:

xbar   s   n
-------------
6.8   1.1  25

  • Write out a null and alternative hypothesis corresponding to the question of whether students get less sleep.

ANS: \(H_0: \mu = 7\), \(H_a: \mu < 7\)

  • Assuming \(\sigma=1.25\) applies to finals week sleep times, what test statistic would you use for a significance test? What would be its sampling distribution under the null hypothesis?

ANS: The \(Z\) statistic, \(Z = (\bar{x} - \mu)/(\sigma/\sqrt{n})\), which has a standard normal distribution if \(n\) is large enough or the population is normal

  • Assuming \(\sigma=1.25\) does not apply to finals week sleep times, what test statistic would you use for a significance test? What would be its sampling distribution under the null hypothesis?

ANS: The \(T\) statistic, \(T = (\bar{x} - \mu)/(s/\sqrt{n})\), which has a \(t\) distribution with \(n-1=24\) degrees of freedom if the population is normally distributed

  • Assuming \(\sigma=1.25\) applies to finals week sleep times, is the computed \(p\)-value less than \(\alpha=0.05\)?

ANS: We find the observed value this way:

mu = 7
sigma = 1.25
n = 25
xbar = 6.8
s = 1.1
Zobs = (xbar - mu)/(sigma/sqrt(n))
Zobs
## [1] -0.8

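We can find a \(p\) value with:

pnorm(Zobs)  ## left side is what we have in H_a
## [1] 0.2118554

Or simply compare Zobs to the one-sided critical value:

alpha = 0.05
qnorm(alpha)
## [1] -1.644854

The \(p\) value is larger than \(\alpha\) (equivalently, \(-0.8\) is not below \(-1.645\)), so the answer is no.

  • Assuming \(\sigma=1.25\) does not apply to finals week sleep times, is the computed \(p\)-value less than \(\alpha=0.05\)?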

ANS: Here we need to compute the SE not the SD:

SE = s/sqrt(n)
Tobs = (xbar - mu)/SE

The \(p\) value is computable with R:

pt(Tobs, df=n-1)
## [1] 0.1861708

Or, we could find the one-sided \(t^*\) corresponding to \(\alpha\) and compare it to Tobs:

qt(alpha, df=n-1)

In both cases, the difference is not statistically significant.
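
The two tests can be placed side by side. This is a self-contained sketch; `mu0`, `p_z`, and `p_t` are illustrative names.

```r
mu0 <- 7; xbar <- 6.8; n <- 25; alpha <- 0.05
sigma <- 1.25   # used only for the z test
s <- 1.1        # used only for the t test

p_z <- pnorm((xbar - mu0) / (sigma / sqrt(n)))        # left tail, since H_a is mu < 7
p_t <- pt((xbar - mu0) / (s / sqrt(n)), df = n - 1)   # left tail of t with 24 df

c(p_z = p_z, p_t = p_t)
c(reject_z = p_z < alpha, reject_t = p_t < alpha)     # both FALSE here
```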

  • Were you a student researcher, would you think assuming \(\sigma\) is the same for finals period and the regular class period to be a reasonable or unreasonable assumption?

ANS: Likely it is not a reasonable assumption; sleep times may well be more variable during finals.

Question

Note: This is a matched sample problem. It will appear on the final, but not on this exam.

A student researcher wants to know if men and women play computer games the same amount of time per week on average. She has limited manpower and knows data may be quite variable, so she constructs a matched sample experiment, where she takes college aged men and women who are dating and finds their respective times. The data she finds is given here:

Partnership   1    2    3    4    5    6 |  xbar   s     n
----------------------------------------------------------
Male          3   13   12    0    4   25 |   9.5  9.18   6
Female        0    9    9    5    5    4 |   5.3  3.38   6
----------------------------------------------------------
M - F         3    4    3   -5   -1   21 |   4.2  8.91   6

  • The sample average for males is more than that of females, but is the difference statistically significant? Construct a one-sided test that the difference of means is greater than \(0\) (\(\mu_{male}\) \(>\) \(\mu_{female}\)). That is, what are \(H_0\) and \(H_a\)?

  • What test statistic will you use, and what is its sampling distribution (assuming normal populations, which isn’t really such a good idea)?

  • What \(p\)-value was found?
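
One way to set this up, offered as a sketch rather than the official answer, is to work with the differences \(d\) = male − female, test \(H_0: \mu_d = 0\) against \(H_a: \mu_d > 0\), and use a \(T\) statistic with \(n-1 = 5\) degrees of freedom. The variable names are illustrative.

```r
male   <- c(3, 13, 12, 0, 4, 25)
female <- c(0,  9,  9, 5, 5,  4)
d <- male - female

# Hand computation: T = dbar / (s_d / sqrt(n)), with n - 1 degrees of freedom
n <- length(d)
Tobs <- mean(d) / (sd(d) / sqrt(n))
p_value <- 1 - pt(Tobs, df = n - 1)    # right tail, since H_a is mu_d > 0

# The same test via the built-in t.test function
res <- t.test(male, female, paired = TRUE, alternative = "greater")
c(Tobs = Tobs, p_value = p_value, t.test_p = res$p.value)
```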