R has many functions for carrying out significance tests and computing confidence intervals, though none built in for the case of unknown \(\mu\) but known \(\sigma\).
Here is one, borrowed (and simplified) from the BSDA package:
z.test <- function(x, mu = 0, sigma = NULL,
                   conf.level = 0.95, alternative = "two.sided") {
  # sigma is the known population standard deviation and must be supplied
  if (is.null(sigma)) stop("sigma (the known population SD) must be supplied")
  choices <- c("two.sided", "greater", "less")
  alt <- pmatch(alternative, choices)      # allows partial matching, e.g. "g"
  alternative <- choices[alt]
  dname <- deparse(substitute(x))          # name of the data, used when printing
  x <- x[!is.na(x)]                        # drop missing values
  estimate <- mean(x)
  SE <- sigma / sqrt(length(x))            # standard error of the sample mean
  zobs <- (estimate - mu) / SE             # observed test statistic
  names(mu) <- "mean"
  names(zobs) <- "z"
  names(estimate) <- "sample mean of x"
  method <- "One-sample z-Test"
  if (alternative == "less") {
    pval <- pnorm(zobs)                    # P(Z <= zobs)
    zstar <- qnorm(conf.level)
    MOE <- zstar * SE
    cint <- c(NA, (estimate - mu) + MOE)   # upper confidence bound only
  } else if (alternative == "greater") {
    pval <- 1 - pnorm(zobs)                # P(Z >= zobs)
    zstar <- qnorm(conf.level)
    MOE <- zstar * SE
    cint <- c((estimate - mu) - MOE, NA)   # lower confidence bound only
  } else {
    pval <- 2 * pnorm(-abs(zobs))          # P(|Z| >= |zobs|)
    alpha <- 1 - conf.level
    zstar <- qnorm(1 - alpha / 2)
    MOE <- zstar * SE
    cint <- (estimate - mu) + c(-MOE, MOE)
  }
  cint <- cint + mu                        # shift back to the scale of x
  attr(cint, "conf.level") <- conf.level
  rval <- list(statistic = zobs, p.value = pval, conf.int = cint,
               estimate = estimate, null.value = mu,
               alternative = alternative, method = method, data.name = dname)
  class(rval) <- "htest"                   # print() then formats it like t.test()
  rval
}
You need to copy-and-paste this into your R session for it to work. (So go ahead, copy it over into your R Script.)
The observant student will note this is the exact same function used for finding confidence intervals under similar assumptions!
This function is used like R's other test functions, such as t.test: it computes a confidence interval and performs a test of significance. Here we focus on using it to carry out a test of significance.
Example (6.17 of the book): water quality testing. Water should have a lead level of no more than 15 ppb. Assume the amount of lead measured in a sample is random and follows a normal distribution with unknown mean \(\mu\) and known standard deviation \(\sigma = 0.25\). Given a sample with values 15.84, 15.33, and 15.58, perform a one-sided test of significance of whether the mean lead level is greater than \(\mu=15\).
We first specify the null and alternative hypotheses. Here we have a one-sided alternative:
\[ H_0: \mu = 15 \quad \text{versus} \quad H_A: \mu > 15. \]
The test statistic will be \(Z = (\bar{x}-\mu)/(\sigma/\sqrt{n})\), with \(\mu\) set to its null value of 15. It is important to know that, under our assumptions and when the null hypothesis is true, \(Z\) has a standard normal distribution.
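As a check on the formula, here is a minimal sketch that computes \(Z\) and the one-sided ("greater") \(p\)-value by hand in base R for the water data. The names mu0, SE, z, and p_greater are our own choices; the numbers should agree with what z.test reports for this example.

```r
# Water-quality data and known standard deviation from the example
water_sample <- c(15.84, 15.33, 15.58)
sigma <- 0.25
mu0 <- 15                                  # null value of the mean

n <- length(water_sample)
SE <- sigma / sqrt(n)                      # standard error of the sample mean
z <- (mean(water_sample) - mu0) / SE       # observed test statistic
p_greater <- 1 - pnorm(z)                  # P(Z >= z) when H0 is true

round(z, 4)            # 4.0415
signif(p_greater, 4)   # 2.656e-05
```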
QUESTION: What assumption(s) ensures that \(Z\) will have a standard normal distribution?
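Once that is confirmed, the z.test function will carry out the work. Besides the data and the known standard deviation (sigma=...), we need to specify the null value of the mean (mu=...) and the type of alternative (alternative="greater", alternative="less", or alternative="two.sided").
water_sample = c(15.84, 15.33, 15.58)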
sigma = 0.25
z.test(water_sample, sigma=sigma, mu=15, alternative="greater")
##
## One-sample z-Test
##
## data: water_sample
## z = 4.0415, p-value = 2.656e-05
## alternative hypothesis: true mean is greater than 15
## 95 percent confidence interval:
## 15.34592 NA
## sample estimates:
## sample mean of x
## 15.58333
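The lower bound 15.34592 in this output can be reproduced by hand. In the "greater" branch of the function, the bound is \(\bar{x} - z^* \sigma/\sqrt{n}\), where \(z^*\) is the 95th percentile of the standard normal. A minimal sketch in base R (variable names are our own):

```r
water_sample <- c(15.84, 15.33, 15.58)
sigma <- 0.25
conf.level <- 0.95

zstar <- qnorm(conf.level)                         # about 1.645
MOE <- zstar * sigma / sqrt(length(water_sample))  # one-sided margin of error
lower <- mean(water_sample) - MOE                  # lower confidence bound for mu
round(lower, 5)                                    # 15.34592
```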
A \(p\)-value is returned.
QUESTION: What is the \(p\)-value?
QUESTION: Using \(\alpha=0.10\), is this difference (between the observed and expected values) statistically significant?
Suppose the number of Skittles in a bag has unknown mean \(\mu\) but known variance \(\sigma^2 = 10\). A student wishes to test whether the average number of Skittles differs from \(20\) (a number presumably read on the internet). To investigate, they take a Halloween-size bag of candy holding 10 bags of Skittles and find these values:
19 14 23 17 15 21 24 18 16 20
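One detail to watch: this example gives the variance \(\sigma^2 = 10\), while the sigma argument of z.test is the standard deviation, so it should be given sqrt(10). With \(n = 10\), the standard error then happens to be exactly 1. A small sketch (the names skittles and sigma2 are our own):

```r
skittles <- c(19, 14, 23, 17, 15, 21, 24, 18, 16, 20)
sigma2 <- 10                          # known population variance from the example
sigma <- sqrt(sigma2)                 # z.test expects the standard deviation
SE <- sigma / sqrt(length(skittles))  # standard error of the sample mean
SE                                    # 1
```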
QUESTION: The central limit theorem states that if \(n\) is large enough, \(\bar{x}\) (and hence \(Z\)) will have an approximately normal distribution. A sample size of \(n=10\) is usually large enough if the data are not too skewed or too long-tailed. Make a graphic to investigate whether the sample indicates that the population is skewed and/or long-tailed. What do you conclude?
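Common graphics for this check are a histogram and a normal quantile plot. As an illustration only, here is a sketch on a simulated right-skewed sample rather than the Skittles counts, so the exercise itself is left to you; the seed and the name x_demo are arbitrary choices.

```r
set.seed(42)                  # arbitrary seed, for reproducibility
x_demo <- rexp(10)            # simulated right-skewed sample of size 10
op <- par(mfrow = c(1, 2))    # two plots side by side
hist(x_demo, main = "Histogram of x_demo")
qqnorm(x_demo)                # systematic curvature away from the line
qqline(x_demo)                #   suggests skew or long tails
par(op)                       # restore the previous plotting settings
```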
QUESTION: State the null and alternative hypotheses that match this example.
QUESTION: Use z.test to compute the \(p\)-value. Is the difference statistically significant at the \(\alpha=0.10\) significance level?
A researcher wants to know if students are having less fun. To get a sense, she uses a validated test that was widely distributed amongst students in 2015 and had an average value of \(82\) with a standard deviation of \(5\). The researcher gives the test to 8 students and gets these values back:
85 75 81 70 78 76 80 85
The researcher assumes the population is normally distributed, so for any sample size \(\bar{x}\) (and hence \(Z\)) has a normal distribution. She assumes the population has an unknown mean \(\mu\) but a known standard deviation \(\sigma=5\).
QUESTION: What are the null and one-sided alternative hypotheses for this example? (Are students having less fun than in 2015, when a large sample established that the average score on this test was 82?)
QUESTION: What \(p\)-value did the researcher find under her assumptions? Is the difference statistically significant at the \(\alpha=0.05\) significance level?
QUESTION: If a two-sided test WERE computed, would she have found the difference statistically significant at the \(\alpha=0.05\) significance level?
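Because the standard normal is symmetric, the two-sided \(p\)-value is twice the one-sided one whenever the observed \(Z\) falls in the direction of the one-sided alternative. A sketch with a made-up value \(z = -1.8\) (hypothetical, not the researcher's result):

```r
z <- -1.8                           # hypothetical observed statistic
p_less <- pnorm(z)                  # one-sided p-value, alternative "less"
p_two <- 2 * pnorm(-abs(z))         # two-sided p-value
c(p_less = p_less, p_two = p_two)   # about 0.036 and 0.072
```

With these made-up numbers the one-sided result is significant at \(\alpha = 0.05\) but the two-sided one is not.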
Why the fuss about assuming \(Z\) has a normal distribution? If it does not, the computed \(p\)-value may be off, and decisions may be made incorrectly. To illustrate (maybe), we use an exponential population with mean \(1\) and standard deviation \(1\). In a test with \(n=3\), if the normal distribution applies, we expect the observed value of \(Z\) to exceed 1.96 only 2.5% of the time.
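Both numbers in this setup can be checked quickly: the 2.5% is the upper-tail area of the standard normal beyond 1.96, and a large simulated sample from rexp (rate 1 by default) has mean and standard deviation close to 1. A sketch; the seed and the names p_tail and big are arbitrary.

```r
# Upper-tail area of the standard normal beyond 1.96
p_tail <- pnorm(1.96, lower.tail = FALSE)
p_tail                                # about 0.025

# Exp(1) has mean 1 and standard deviation 1; a large sample agrees
set.seed(1)
big <- rexp(1e5)
c(mean = mean(big), sd = sd(big))     # both close to 1
```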
Let’s see if this is the case. Copy and paste this command into your R session:
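M = 1          # number of simulated samples
n = 3          # sample size
mu = 1         # mean of the Exp(1) population
sigma=1        # standard deviation of the Exp(1) population
replicate(M, {
  xs = rexp(n)                  # one random sample of size n
  SD = sigma/sqrt(n)            # standard error of the sample mean
  zobs = (mean(xs) - mu)/SD     # observed Z for this sample
  zobs
})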
## [1] -0.8236451
As written (with M=1), this computes just one sample and, from it, one observed value of \(Z\).
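The key tool here is replicate(k, expr), which evaluates expr afresh k times and collects the results into a vector. A tiny illustration (the names same_each_time and sample_means are our own):

```r
# A deterministic expression gives the same value on every evaluation
same_each_time <- replicate(3, { 2 + 2 })
same_each_time                               # 4 4 4

# A random expression is re-run each time, giving a new value each time
set.seed(2)                                  # arbitrary seed
sample_means <- replicate(5, mean(rexp(3)))  # 5 sample means, one per evaluation
length(sample_means)                         # 5
```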
QUESTION: Change to M=20. What proportion of the values are greater than \(1.96\)? (Count by hand.)
QUESTION: Change to M=10000. Save the observed values as zobs (assign the result of replicate to it) and use sum(zobs > 1.96)/M to find the proportion greater than 1.96. Does this suggest a probability of \(0.025\) for the event?