Test 2 will be on November 20th and will cover the following sections from the book: 4.1, 4.2, 4.3, 4.4; 5.2, 5.3; 6.1, 6.2; and 7.1.
Here are some sample questions. There are many more potential questions that could appear on the test.
An experiment consists of flipping a coin and tossing one die. Describe the sample space. What outcomes are in the event “heads and an even roll”? What is the probability of this event? What assumptions did you make to compute the probability?
ANS: There are 12 outcomes: (H,1), (H,2), …, (H,6), (T,1), …, (T,6). The event consists of (H,2), (H,4), and (H,6). This has probability 3/12 = 1/4 if the outcomes are equally likely, which will be the case if the coin and the die are fair and independent.
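The counting in this answer can be checked by listing the sample space in R. This is only a sketch under the same assumptions (fair coin, fair die, independent); the names outcomes, in_event, and prob_event are made up for illustration.

```r
# Enumerate all (coin, die) pairs; each is assumed equally likely
outcomes = expand.grid(coin = c("H", "T"), die = 1:6)
# Flag the event "heads and an even roll"
in_event = outcomes$coin == "H" & outcomes$die %% 2 == 0
outcomes[in_event, ]
# Probability = (outcomes in the event) / (all outcomes)
prob_event = sum(in_event) / nrow(outcomes)
prob_event
```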
In computing normal probabilities, we use \(P(Z > z) = 1 - P(Z \leq z)\). What “rule” allows us to do this? Compute \(P(Z > 2.34)\).
ANS: The complement rule. With \(z = 2.34\), we have in R:
1 - pnorm(2.34)
## [1] 0.00964187
If we toss two coins (a penny and a nickel), is the result on the penny disjoint from the result on the nickel? If not, what is their relationship (in terms of probability)?
ANS: The two are independent, and not disjoint.
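To see the difference between the two ideas, here is a small sketch that lists the four equally likely outcomes (assuming fair coins) and looks at the events “penny is heads” and “nickel is heads”. The variable names are illustrative only. Disjoint events would have joint probability 0; independent events have joint probability equal to the product of the individual probabilities.

```r
# Four equally likely (penny, nickel) outcomes, assuming fair coins
coins = expand.grid(penny = c("H", "T"), nickel = c("H", "T"))
penny_heads = coins$penny == "H"
nickel_heads = coins$nickel == "H"
# Joint probability versus the product of the two marginal probabilities
p_both = mean(penny_heads & nickel_heads)
p_product = mean(penny_heads) * mean(nickel_heads)
c(p_both = p_both, p_product = p_product)
```

The joint probability is not 0 (so the events are not disjoint) and it matches the product (so they are independent).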
Suppose we have a probability distribution:
k 1 2 3 4
-------------------------
P(k) .43 .25 .17 x
What is x? What is the mean of the probability distribution?
ANS: We have:
x = 1 - (.43 + .25 + .17)
x
## [1] 0.15
And the mean is
1 * .43 + 2 * .25 + 3*.17 + 4*x
## [1] 2.04
Let \(X\) be a random variable with distribution given by:
k 0 1 2 3 4
-----------------------------
P(X=k) .10 .10 .20 x .30
Find x. Compute \(\mu\). Compute \(\sigma^2\).
ANS: As before, all probabilities must sum to 1:
x = 1 - (.1 + .1 + .2 + .3)
x
## [1] 0.3
The mean is:
mu = 0*.1 + 1*.1 +2*.2+3*x + 4*.3
mu
## [1] 2.6
And the variance is
sigma2 = (0-mu)^2*.1 + (1-mu)^2*.1 +(2-mu)^2*.2+(3-mu)^2*x + (4-mu)^2*.3
sigma2
## [1] 1.64
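The same three steps can be written with vectors, which scales better to longer tables. This is a sketch of the same calculation, not a different method; the names k, probs, mu_X, and sigma2_X are chosen here for illustration.

```r
k = 0:4
probs = c(.10, .10, .20, NA, .30)           # NA marks the unknown x
# Probabilities must sum to 1, so x is whatever is left over
probs[is.na(probs)] = 1 - sum(probs, na.rm = TRUE)
mu_X = sum(k * probs)                       # mean: sum of k * P(X = k)
sigma2_X = sum((k - mu_X)^2 * probs)        # variance: sum of (k - mu)^2 * P(X = k)
c(x = probs[4], mu = mu_X, sigma2 = sigma2_X)
```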
Let \(X\) be a continuous random variable with density given by \(f(x) = 1/2\) for \(0 \leq x \leq 2\) and \(0\) otherwise. What is \(P(1/2 < X < 7/8)\)? Using symmetry, what is \(\mu\)?
ANS: The area is \(1/2 \cdot (7/8 - 1/2) = 3/16\). By symmetry, the mean is at 1.
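This density is the uniform distribution on \([0, 2]\), so the area can be checked with punif, and the mean can be checked by numerically integrating \(x f(x)\). A sketch, with illustrative variable names:

```r
# P(1/2 < X < 7/8) for the uniform density f(x) = 1/2 on [0, 2]
prob_unif = punif(7/8, min = 0, max = 2) - punif(1/2, min = 0, max = 2)
prob_unif
# Mean as the integral of x * f(x) over [0, 2]
mu_unif = integrate(function(x) x * dunif(x, min = 0, max = 2),
                    lower = 0, upper = 2)$value
mu_unif
```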
Let \(X\) have a distribution:
k -1 0 1
---------------------
P(X=k) 1/4 1/2 1/4
Compute \(\mu\) and \(\sigma\) for this random variable.
ANS: We have \(\mu = -1\cdot 1/4 + 0\cdot 1/2 + 1\cdot 1/4 = 0\). Since \(\mu = 0\), the variance is \(\sigma^2 = (-1)^2\cdot 1/4 + (1)^2\cdot 1/4 = 1/2\). So \(\sigma = \sqrt{1/2}\).
Suppose we have \(n\) independent random variables, \(X_1\), \(X_2\), …, \(X_n\), with this same distribution. Let \(S = X_1 + X_2 + \dots + X_n\). What are \(\mu_S\) and \(\sigma_S\)?
ANS: \(\mu_S = n\cdot \mu = 0\); \(\sigma^2_S = n\sigma^2 = n/2\), so \(\sigma_S = \sqrt{n/2}\).
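A quick simulation can be used to sanity-check these formulas. This is a sketch only: the choice \(n = 50\), the seed, and the number of repetitions are arbitrary assumptions, and the simulated values will only be close to the theoretical ones, not equal.

```r
set.seed(1)                        # arbitrary seed, for reproducibility
n_terms = 50                       # illustrative choice of n
values = c(-1, 0, 1)
probs = c(1/4, 1/2, 1/4)
# Each replicate draws X_1, ..., X_n and records their sum S
S = replicate(10000, sum(sample(values, n_terms, replace = TRUE, prob = probs)))
# Compare with mu_S = 0 and sigma_S = sqrt(n/2)
c(sim_mean = mean(S), sim_sd = sd(S), theory_sd = sqrt(n_terms / 2))
```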
The law of large numbers states that if \(\mu\) is the population mean and \(X_1, X_2, \dots, X_n\) is an i.i.d. sample from this population, then as \(n\) grows \(\bar{x}\) gets as close to \(\mu\) as you specify and then stays that close.
Suppose you toss a fair coin 100 times and you get 30 heads and 70 tails. Does the law of large numbers say that in the next 100 coin tosses you will have 70 heads and 30 tails?
ANS: No. The law of large numbers does not say the counts will even out right away; it only says the proportion of heads eventually approaches 1/2.
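A simulation makes the point concrete: start from the 30 heads and 70 tails, then keep tossing a fair coin. The running proportion drifts toward 1/2 because the early deficit is swamped by later tosses, not because later tosses compensate for it. The seed and the number of extra tosses below are arbitrary assumptions.

```r
set.seed(2)                                     # arbitrary seed
first = c(rep(1, 30), rep(0, 70))               # the 30 heads (1) and 70 tails (0) already seen
later = rbinom(100000, size = 1, prob = 0.5)    # further fair tosses
tosses = c(first, later)
# Running proportion of heads after each toss
running = cumsum(tosses) / seq_along(tosses)
running[c(100, 1000, 10000, length(tosses))]
```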
At the beginning of the term, we used the terms ‘shape’, ‘center’, and ‘spread’ to describe a distribution. For large \(n\), describe the distribution of \(\bar{x}_n\) using these terms. Be as specific as you can be. Did you make any assumptions? How do you know you are right?
ANS: The center is \(\mu_{\bar{x}} = \mu\), the spread is \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\); finally, if \(n\) is large enough or the population is normal, the shape is bell shaped (approximately normal). This assumes an i.i.d. sample; the shape follows from the central limit theorem.
In Chapter 3, a principle of experimental design is “Repeat” (each treatment on many units to reduce chance variation in the results). Give an example where you have a formula that demonstrates this.
ANS: The standard deviation of \(\bar{x}\), which measures chance variation, is \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\), so it shrinks by a factor of \(1/\sqrt{n}\) as \(n\) grows.
A population has mean \(\mu{}\), standard deviation \(\sigma{}\) and median \(M\).
Let \(X_1\), \(X_2\), …, \(X_n\) be an i.i.d. sample of size \(n\) from this population. Assume \(n\) is “large enough” for the central limit theorem to apply.
Which is true: \(P(X_1 > \mu) = 1/2\) or \(P(X_1 > M) = 1/2\)?
ANS: The latter – for a single value, the median splits the area in half.
Which is true: \(P(\bar{x}_n > \mu) = 1/2\) or \(P(\bar{x}_n > M) = 1/2\)?
ANS: For large \(n\), the shape is symmetric – if the central limit theorem applies – so the former is true.
Which is true \(\mu_{\bar{x}} = \mu{}\) or \(\mu_{\bar{x}} = \mu/\sqrt{n}\)?
ANS: the former
Which is true \(\sigma_{\bar{x}} = \sigma{}\) or \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\)? (or both?)
ANS: the latter.
A population has mean \(\mu=0\), standard deviation \(\sigma=6.74\) and \(X_1\), \(X_2\), …, \(X_n\) is an i.i.d. sample with \(n\) large enough for the central limit theorem to apply.
How large must \(n\) be so that \(P(-1 < \bar{x} < 1) = 1/2\)? (What must the \(z\)-score of \(1\) be?)
ANS: We use \(z\) scores two ways. First, we have
\[ P(-1 < \bar{x} < 1) = P(\frac{-1 - \mu}{\sigma/\sqrt{n}} < Z < \frac{1 - \mu}{\sigma/\sqrt{n}}) = P(-\sqrt{n}/\sigma < Z < \sqrt{n}/\sigma). \]
But we can find \(z^*\) with \(P(-z^* < Z < z^*)=1/2\) using a table, or using qnorm:
zstar = qnorm(0.5 + 0.25) # add in left tail too, as in class
zstar
## [1] 0.6744898
Solving \(\sqrt{n}/\sigma = z^*\), we have:
sigma = 6.74
n = (sigma * zstar)^2
n
## [1] 20.66667
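Since a sample size has to be a whole number, one reasonable reading is to round up to the first \(n\) where the probability reaches 1/2. The sketch below checks that choice; prob_within_1 and n_needed are hypothetical names introduced here.

```r
sigma_pop = 6.74
# P(-1 < xbar < 1) for a given n, using the normal approximation with mu = 0
prob_within_1 = function(n) {
  pnorm(sqrt(n) / sigma_pop) - pnorm(-sqrt(n) / sigma_pop)
}
# Round the exact solution up to the next whole number
n_needed = ceiling((sigma_pop * qnorm(0.75))^2)
c(n = n_needed, at_n = prob_within_1(n_needed), one_less = prob_within_1(n_needed - 1))
```

The probability is just above 1/2 at the rounded-up \(n\) and just below 1/2 one step earlier.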
Assume the mean number of skittles in a standard size bag is 15 and the standard deviation is 2. Assume the population of skittles in a bag is “bell shaped” (whatever that means for a discrete distribution).
What is the probability that there are \(21\) or more skittles in a given bag?
ANS: Here \(\sigma =2\), so we get:
mu = 15
sigma = 2
1 - pnorm((21 - mu)/sigma)
## [1] 0.001349898
Now consider 10 bags. What is the probability the average number of skittles in the bags is 18 or more?
ANS: What changes now is \(n=10\), and the cutoff is 18:
n=10
1 - pnorm((18 - mu)/(sigma/sqrt(n)))
This is tiny, roughly \(10^{-6}\), because the \(z\) score is so large:
(18 - mu)/(sigma/sqrt(n))
## [1] 4.743416
Now consider 100 bags. What is the probability the average number of skittles in the bags is 16 or more?
ANS:
With \(n=100\), we now get:
n = 100
1 - pnorm((16 - mu)/(sigma/sqrt(n)))
## [1] 2.866516e-07
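A note on very small tail probabilities: when the \(z\) score is large, pnorm(z) can be so close to 1 that 1 - pnorm(z) rounds to exactly 0. Asking pnorm for the upper tail directly avoids the subtraction. The value \(z = 9.5\) below is just an illustrative choice.

```r
z_big = 9.5                                 # illustrative large z score
# Subtracting from 1 loses the tiny tail to floating-point rounding
by_subtraction = 1 - pnorm(z_big)
# lower.tail = FALSE computes P(Z > z) directly
direct_tail = pnorm(z_big, lower.tail = FALSE)
c(by_subtraction = by_subtraction, direct_tail = direct_tail)
```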
The binomial distribution is used to count the number of “successes” in a fixed number (\(n\)) of trials. Which of the scenarios below does it apply to?
ANS: Yes, \(n=100\), \(p=1/2\).
ANS: Yes, \(n=100\), \(p=1/6\).
ANS: Yes, \(n=100\), but here \(p\) is unknown.
ANS: No. A binomial needs a fixed number of trials.
Let \(p\) be a population proportion and \(\hat{p}\) be a sample proportion from a SRS of size \(n\) drawn from the large population.
Let \(n=123\) and \(p = 1/23\). What is \(\mu_{\hat{p}}\)? What is \(\sigma_{\hat{p}}\)?
ANS: use \(\mu=p\) and \(\sigma = \sqrt{p(1-p)/n}\):
n = 123
p = 1/23
mu = p
sigma = sqrt(p*(1-p)/n)
c(mu=mu, sigma=sigma)
## mu sigma
## 0.04347826 0.01838785
Is \(np > 10\)?
ANS: No. For the normal approximation we would need both \(np\) and \(n(1-p)\) to be bigger than 10, and here \(np\) is not:
c(n*p, n*(1-p))
## [1] 5.347826 117.652174
Use the normal approximation to compute \(P(\hat{p} > 0.10)\).
ANS: Using the values above, the normal approximation gives (keeping in mind that it is rough here, since \(np < 10\)):
1 - pnorm((0.10 - mu)/sigma)
## [1] 0.001056531
Toss a fair coin 250 times. What is the probability you have 150 or more heads? (Use the normal approximation to give an answer).
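ANS: As before, but since we are counting heads we use the binomial mean \(np\) and standard deviation \(\sqrt{np(1-p)}\):
n = 250
p = 1/2
mu = n*p
sigma = sqrt(n*p*(1-p))
1 - pnorm((150 - mu)/sigma)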
## [1] 0.0007827011
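For comparison, R can also compute the exact binomial probability with pbinom. This is offered as a check on the approximation, not as what the question asks for; the variable names are illustrative. Note that \(P(X \geq 150) = P(X > 149)\), which is why 149 appears below.

```r
n_toss = 250
p_heads = 1/2
# Exact: P(X >= 150) = P(X > 149) for X ~ Binomial(250, 1/2)
exact = pbinom(149, size = n_toss, prob = p_heads, lower.tail = FALSE)
# Normal approximation, as in the answer above
approx = 1 - pnorm((150 - n_toss * p_heads) / sqrt(n_toss * p_heads * (1 - p_heads)))
c(exact = exact, approx = approx)
```

Both values are small and of the same order, though they are not identical.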
A survey of 1000 people answering yes and no is taken. If the population proportion of “yes” answerers is assumed to be \(0.60\), what is the probability that 62% or more in the survey will answer yes? (Use the normal approximation to give an answer).
ANS: We use \(\hat{p}\) here, not a binomial count, so the formulas are adjusted:
n = 1000
p = 0.60
mu = p
sigma = sqrt(p*(1-p)/n)
1 - pnorm((0.62 - p)/sigma)
## [1] 0.0983528
Mary finds a 90% confidence interval for some \(\mu\) by taking a sample, and John finds his own 90% confidence interval for the same \(\mu\) by taking a survey. It turns out the two intervals did not overlap on any values. Can they both have computed their values properly? Explain.
ANS: Yes. A given CI is not guaranteed to contain \(\mu\); it is only the process that is likely (90% of the time) to produce an interval that does.
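The “process” interpretation can be illustrated by simulation: build many 90% intervals from fresh samples and count how often they contain \(\mu\). Everything below (the population values, the sample size, the seed, the number of repetitions) is a hypothetical setup chosen for illustration.

```r
set.seed(3)                                  # arbitrary seed
mu_true = 10; sigma_true = 2; n_samp = 25    # hypothetical population and sample size
zstar_90 = qnorm(0.95)
# For each simulated sample, record whether its 90% CI contains mu_true
covers = replicate(2000, {
  xbar_sim = mean(rnorm(n_samp, mean = mu_true, sd = sigma_true))
  moe_sim = zstar_90 * sigma_true / sqrt(n_samp)
  (xbar_sim - moe_sim < mu_true) && (mu_true < xbar_sim + moe_sim)
})
mean(covers)    # proportion of intervals that contain mu_true
```

Roughly 90% of the intervals cover \(\mu\), which means roughly 10% miss, so two properly computed intervals can disagree.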
How much sleep do CSI students get, on average? To investigate, a student researcher interviewed 12 students at random and found this summary data:
xbar s n
-----------
6.7 1.2 12
If the student assumes the population standard deviation is \(\sigma=1\), find the \(90\%\) confidence interval based on this data.
If the student does not assume the population standard deviation is \(\sigma=1\), but does assume the data is from a normal population, find the \(90\%\) confidence interval based on this data.
ANS:
We have two MOEs to compute:
First, assuming \(\sigma = 1\) is known, we use \(z^*\):
xbar = 6.7
s = 1.2
n = 12
sigma = 1
zstar = qnorm(0.9 + 0.05)
MOE = zstar * sigma/sqrt(n)
c(xbar - MOE, xbar + MOE)
## [1] 6.225172 7.174828
If we don’t assume \(\sigma\) is known, then we use \(t^*\):
tstar = qt(0.9 + 0.05, df = n-1)
MOE = tstar * s/sqrt(n)
c(xbar - MOE, xbar + MOE)
## [1] 6.077887 7.322113
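The two calculations differ only in which critical value and which standard deviation are used, so they can be wrapped in one small function. ci_mean is a hypothetical helper written for this page, not a built-in R function.

```r
# Hypothetical helper: CI for a mean from summary statistics.
# If sigma is supplied, use z*; otherwise use t* with n - 1 df and s.
ci_mean = function(xbar, n, conf, s = NULL, sigma = NULL) {
  upper_p = conf + (1 - conf) / 2            # area to the left of the critical value
  if (!is.null(sigma)) {
    moe = qnorm(upper_p) * sigma / sqrt(n)
  } else {
    moe = qt(upper_p, df = n - 1) * s / sqrt(n)
  }
  c(lower = xbar - moe, upper = xbar + moe)
}
ci_z = ci_mean(6.7, 12, 0.90, sigma = 1)     # sigma assumed known
ci_t = ci_mean(6.7, 12, 0.90, s = 1.2)       # sigma unknown, use s
rbind(ci_z, ci_t)
```

The \(t\) interval is wider, reflecting the extra uncertainty from estimating \(\sigma\) with \(s\).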
A researcher knows that a population of interest is normally distributed with unknown mean \(\mu\), but known standard deviation, \(\sigma=10\). To estimate \(\mu\), she will take a random sample. How large a random sample is needed so that a 90% confidence interval has a margin of error of \(1\)? Repeat with a 99% confidence interval.
ANS:
We need to solve \(z^* \sigma/\sqrt{n} = 1\), or \(n = (z^* \sigma)^2\). For the two answers we have:
sigma = 10
zstar = qnorm(0.9 + 0.05)
(zstar * sigma)^2
## [1] 270.5543
So \(n = 271\) after rounding up. And for 99%:
sigma = 10
conf = 0.99
zstar = qnorm(conf + (1-conf)/2)
(zstar * sigma)^2
## [1] 663.4897
So \(n = 664\) after rounding up.
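The same formula works for any margin of error \(m\): solving \(z^* \sigma/\sqrt{n} = m\) gives \(n = (z^* \sigma/m)^2\), rounded up to a whole number. A sketch, where n_for_moe is a hypothetical helper name:

```r
# Hypothetical helper: smallest n giving margin of error at most moe
n_for_moe = function(sigma, moe, conf) {
  zstar = qnorm(conf + (1 - conf) / 2)
  ceiling((zstar * sigma / moe)^2)           # round up to a whole sample size
}
c(n_90 = n_for_moe(10, 1, 0.90), n_99 = n_for_moe(10, 1, 0.99))
```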
A CSI student researcher wants to know if students get less sleep around the final exam period. From earlier work, she is confident that the population mean for amount of sleep in normal class periods is 7 hours with a population standard deviation of 1.25. During finals, she also assumes a normal distribution of sleep times, and she takes a survey of 25 randomly selected students and computes this sample data:
xbar s n
-------------
6.8 1.1 25
ANS: \(H_0: \mu = 7\), \(H_a: \mu < 7\)
ANS: The \(Z\) statistic, which has a standard normal distribution if \(n\) is large enough or the population is normal.
ANS: The \(T\) statistic, which has a \(t\) distribution with \(n-1=24\) degrees of freedom if the population is normally distributed.
ANS: We find the observed value this way:
mu = 7
sigma = 1.25
n = 25
xbar = 6.8
s = 1.1
Zobs = (xbar - mu)/(sigma/sqrt(n))
Zobs
## [1] -0.8
We can find a \(p\) value with:
pnorm(Zobs) ## left side is what we have in H_a
Or simply compare this to the one-sided critical value:
alpha = 0.05
qnorm(alpha)
## [1] -1.644854
ANS: Here we need to compute the SE not the SD:
SE = s/sqrt(n)
Tobs = (xbar - mu)/SE
The \(p\) value is computable with R:
pt(Tobs, df=n-1)
## [1] 0.1861708
Or, we could find the one-sided \(t^*\) corresponding to \(\alpha\) and compare:
qt(alpha, df=n-1)
In both cases, the difference is not statistically significant.
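To put the two tests side by side, the sketch below recomputes both observed statistics and their left-tail \(p\) values from the summary data, using \(\alpha = 0.05\). The variable names are illustrative.

```r
# Summary data from the problem
xbar_s = 6.8; mu_0 = 7; n_s = 25
Z_stat = (xbar_s - mu_0) / (1.25 / sqrt(n_s))   # sigma = 1.25 assumed known
T_stat = (xbar_s - mu_0) / (1.1 / sqrt(n_s))    # s = 1.1 used in place of sigma
# Left-tail p values, matching H_a: mu < 7
p_z = pnorm(Z_stat)
p_t = pt(T_stat, df = n_s - 1)
c(p_z = p_z, p_t = p_t)
c(p_z = p_z, p_t = p_t) < 0.05                   # TRUE would mean "reject H_0"
```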
ANS: Likely it is not
ANS: This is a matched sample problem. Wait… it will appear on the final, but not on this exam.
A student researcher wants to know if men and women play computer games the same amount of time per week on average. She has limited manpower and knows data may be quite variable, so she constructs a matched sample experiment, where she takes college aged men and women who are dating and finds their respective times. The data she finds is given here:
Partnership 1 2 3 4 5 6 | xbar s n
----------------------------------------------------
Male 3 13 12 0 4 25 | 9.5 9.18 6
Female 0 9 9 5 5 4 | 5.3 3.38 6
----------------------------------------------------
M - F 3 4 3 -5 -1 21 | 4.2 8.91 6
The sample average for males is more than that of females, but is the difference statistically significant? Construct a one-sided test that the difference of means is greater than \(0\) (\(\mu_{male}\) \(>\) \(\mu_{female}\)). That is, what are \(H_0\) and \(H_a\)?
What test statistic will you use, and what is its sampling distribution (assuming normal populations, which isn’t really such a good idea)?
What \(p\)-value was found?
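For a matched sample, one common approach is a one-sample \(t\) test on the differences. The sketch below assumes that approach, with \(H_0: \mu_d = 0\) and \(H_a: \mu_d > 0\) for the male minus female differences; it computes the statistic by hand and then checks it against t.test. The variable names are illustrative.

```r
male = c(3, 13, 12, 0, 4, 25)
female = c(0, 9, 9, 5, 5, 4)
d = male - female                              # matched differences, M - F
# One-sample t statistic on the differences, with n - 1 = 5 degrees of freedom
T_d = mean(d) / (sd(d) / sqrt(length(d)))
p_d = pt(T_d, df = length(d) - 1, lower.tail = FALSE)   # right tail for H_a: mu_d > 0
c(T = T_d, p = p_d)
# The same test using the built-in function
res = t.test(d, mu = 0, alternative = "greater")
res$p.value
```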