Some vocabulary from chapter 1 on data, surveys and experiments.
Chapter 1.
The Population – The collection of objects or items of interest in a statistical study.
The Sample – the subset of the population that is used to study characteristics of the entire population.
The sample is drawn from the sampled population, while the goal is to infer characteristics of the target population. If we take every member of the population, we no longer have a sample but a census.
Experimental Unit – the individual items in the population that are studied in the sample.
The experimental unit is the smallest entity in the study.
For each experimental unit, variables (characteristics that can be measured) are defined.
An observation is a value of the variable for a given experimental unit.
Note: In a study the experimental units are all different, but the variables are the same for every unit: for each experimental unit, an observation is recorded for each variable. The data are often recorded in a spreadsheet or table, where the columns are headed by variable names and each row records the observations for one experimental unit (or subject).
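A minimal sketch of this table layout in Python, using made-up survey data (the students and the variable names `age`, `major`, and `hours_studied` are hypothetical): each row is one experimental unit, each key is a variable, and each stored value is one observation.

```python
# Hypothetical survey data: each dict is one experimental unit (a student),
# each key is a variable, and each value is an observation.
data = [
    {"student": "A", "age": 19, "major": "Math", "hours_studied": 5},
    {"student": "B", "age": 21, "major": "Biology", "hours_studied": 3},
    {"student": "C", "age": 20, "major": "Math", "hours_studied": 8},
]

# The variables (column headers) are the same for every experimental unit.
variables = list(data[0].keys())

# One observation: the value of the variable "major" for student B.
observation = data[1]["major"]

# Print as a table: columns headed by variable names, one row per unit.
print(" | ".join(variables))
for row in data:
    print(" | ".join(str(row[v]) for v in variables))
```

Here `observation` is `"Biology"`, and every row shares the same four variables even though the units themselves differ.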
The Distribution of a variable specifies all the possible values of a variable and how likely these values are to occur. It illustrates the pattern or variation of the data.
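As an illustration, the observed (empirical) distribution of a categorical variable can be tabulated by counting how often each value occurs and dividing by the number of observations. This sketch assumes hypothetical observations of a variable "major"; the relative frequencies from a sample estimate how likely each value is in the population.

```python
from collections import Counter

def relative_frequencies(values):
    """Return each distinct value with the proportion of observations taking it."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in sorted(counts.items())}

# Hypothetical observations of the variable "major" for 8 students.
majors = ["Math", "Biology", "Math", "History", "Math", "Biology", "Math", "History"]
dist = relative_frequencies(majors)

# A crude text bar chart showing the pattern of the data.
for value, p in dist.items():
    print(f"{value:8s} {p:.3f} {'*' * round(p * 20)}")
```

For these data the distribution is Biology 0.25, History 0.25, Math 0.5, and the proportions sum to 1.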
Some questions: Give an example of a survey and identify each of these terms in it. Why is a census not always a good idea? (Cost, time, it can destroy the population, we may not have access to the whole population, and it can be inaccurate!)
Section 2 – Sources of Data
Collecting Data – Some Key Steps
Decide on the objectives.
Choose the variables to measure.
Choose an appropriate design for producing the data.
Collect the data.
As to design, there are two broad categories:
Experiments – we try to detect a cause-and-effect relationship between variables. The experimental units would be different trials, and we might test whether one variable affects the values of another.
Surveys – we just want to collect data rather than influence it. In fact, we want the opposite: to be as unobtrusive as possible so that we get as accurate a snapshot as possible.
Sampling: We take a subset of the population and survey it. This can be done in several ways:
Representative Sample – we try to match our sample to the overall population (similar proportions). Generally a good idea.
Bias – a systematic undercounting or overcounting.
Simple Random Sample – a sample in which each experimental unit is chosen at random, so that all samples of the same size have the same chance of being chosen (exchangeable).
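A minimal sketch contrasting a simple random sample with a biased sampling scheme, assuming a made-up population of 1,000 students (600 undergraduates, 400 graduates). Python's `random.sample` draws without replacement so that every subset of size k is equally likely, which matches the definition of a simple random sample. The "biased" scheme (only sampling the first 300 students, standing in for "whoever is easy to reach") and the helper `proportions` are hypothetical, chosen only to show systematic overcounting.

```python
import random
from collections import Counter

def proportions(units, key):
    """Proportion of units taking each value of the variable `key`."""
    counts = Counter(unit[key] for unit in units)
    return {value: counts[value] / len(units) for value in sorted(counts)}

# Hypothetical population: ids 0-599 are undergrads, ids 600-999 are grads.
population = [{"id": i, "level": "undergrad" if i < 600 else "grad"} for i in range(1000)]

rng = random.Random(42)  # fixed seed so the run is reproducible

# Simple random sample of size 100: every 100-unit subset is equally likely.
srs = rng.sample(population, k=100)

# Biased scheme: sample only from the first 300 units, all undergrads,
# so graduates are systematically undercounted.
biased = rng.sample(population[:300], k=100)

print("population:", proportions(population, "level"))
print("SRS:       ", proportions(srs, "level"))
print("biased:    ", proportions(biased, "level"))
```

The population proportions are 0.4 grad and 0.6 undergrad. The SRS proportions vary from run to run but tend to land near these values, while the biased sample contains only undergraduates no matter how it is drawn, which is the systematic error that "bias" refers to.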
Section 3 – Are the Data Good? Some questions to ask when we find a data set:
Bias: can we trust the creators and describers of the data?
How were the data collected?
Do the results seem reasonable?
Are the results useful, relevant, and properly recorded?