Some vocabulary from chapter 1 on data, surveys and experiments.

Chapter 1.

The Population – The collection of objects or items of interest in a statistical study.
The Sample – The subset of the population that is used to study characteristics of the entire population.
- The sample is drawn from the sampled population; the goal is to infer characteristics of the target population. If we take every member of the population, we no longer have a sample but a census.
Experimental Unit – The individual items in the population that are studied in the sample.
- The experimental unit is the smallest entity in the study.
- For each experimental unit, variables (characteristics that can be measured) are defined.
- An observation is the value of a variable for a given experimental unit.
  
  Note: The experimental units in a study are all different, but the variables are the same for every unit. For each experimental unit, one observation is recorded for each variable. These are often recorded in a spreadsheet or table format: the columns are headed by variable names, and each row records the observations for one experimental unit or subject.
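The table layout in the note can be sketched in a few lines of Python. This is only an illustration: the subjects, the variable names (`age`, `commute_minutes`) and the helper `column` are made up for the example and do not come from the chapter.

```python
# Hypothetical survey data (invented for illustration).
# Each dict is one experimental unit (a row of the table),
# each key is a variable (a column heading),
# and each value is one observation.
rows = [
    {"subject": "A", "age": 34, "commute_minutes": 25},
    {"subject": "B", "age": 51, "commute_minutes": 40},
    {"subject": "C", "age": 29, "commute_minutes": 10},
]

# The variables are the same for every experimental unit.
variables = list(rows[0].keys())


def column(rows, variable):
    """Return the observations of one variable, one per experimental unit."""
    return [row[variable] for row in rows]


print(variables)
print(column(rows, "age"))
```

Reading across a row gives everything recorded for one subject; reading down a column gives all observations of one variable.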
The Distribution of a variable specifies all the possible values of a variable and how likely these values are to occur. It illustrates the pattern or variation of the data.
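For observed data, one rough stand-in for the distribution is the table of relative frequencies: each value that occurred, with the fraction of observations that took that value. The sketch below assumes a small made-up list of observations; the function name `empirical_distribution` is hypothetical.

```python
from collections import Counter


def empirical_distribution(observations):
    """Map each observed value to its relative frequency in the data."""
    counts = Counter(observations)
    n = len(observations)
    return {value: count / n for value, count in counts.items()}


# Made-up observations of a single variable (favourite colour).
colors = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
dist = empirical_distribution(colors)

for value, share in dist.items():
    print(value, share)
```

Note that this only describes the sample at hand. The distribution of the variable in the whole population may include values that never showed up in the sample.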
Some questions: Give an example survey and identify each of these terms in it. Why is a census not a great idea in all cases? (Cost, time, it can destroy the population, we may not have access to all of the population, and it can be inaccurate!)

Section 2 – Sources of Data

Collecting Data: some key steps
- What are the objectives?
- Choose the variables to measure.
- What is the appropriate design for producing the data?
- Collect the data.
As to design, there are two broad categories:
- experiments – try to detect a cause-and-effect relationship between variables. The experimental units would be different trials, and we might try to see whether one variable affects the values of another.
- surveys – we just want to collect data, not influence it. We want to be as unobtrusive as possible, to get as accurate a snapshot as possible.
Sampling: We take a subset of the population and perform a survey. This can be done in various ways:
- Representative sample – we try to match our sample to the overall population (similar proportions). Generally a good idea.
- Bias – a systematic under- or over-counting.
- Simple Random Sample – a sample where each experimental unit is chosen at random, so that all samples of the same size have the same chance of being chosen. (Exchangeable)
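A simple random sample can be sketched with Python's standard `random` module, whose `sample` method draws without replacement so that every subset of the requested size is equally likely. The population of 100 numbered units and the function name `simple_random_sample` are assumptions made for the example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every size-n subset is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical population: 100 experimental units labelled 1..100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=1)
print(sorted(sample))
```

Fixing the seed is only for reproducibility in notes like these; in a real survey the draw would normally be left unseeded.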

Section 3 – Are the data good? Here are some questions to ask when we find a data set:

Bias: can we trust the creators and describers of the data?
How were the data collected?
Do the results seem reasonable?
Are the results useful, relevant, and recorded properly?

Section 4 – More on surveys and their design. Here are keys to be aware of:

What is the sampled population, and what is the target population? Does the sampled population have characteristics similar to those of the target?
How do we make contact? By phone, in person, on the web, radio call-in, asking friends, etc.
How many people respond? (Israeli polls. Many undecided)
The wording and presentation of a question.
The timing of an interview.
How large is the sample? Is larger better? When?
What is the technique of sampling? Some are:
- Convenience sampling: a journalist calls his buddies from college to “take the pulse”.
- Self-selected sample: introduces biases.
- Systematic: units are chosen by a fixed rule, e.g. all people with last letter “c”.
- Stratified: break the population into chunks (strata) and then randomly sample within each chunk. Our chunks are proportional to known information.
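Stratified sampling can be sketched as follows, assuming proportional allocation: each chunk contributes to the sample in proportion to its share of the population, and units are drawn at random within each chunk. The function name `stratified_sample`, the `stratum_of` argument and the undergrad/grad population are all invented for the illustration.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n, seed=None):
    """Proportional stratified sample of roughly n units.

    stratum_of(unit) returns the label of the chunk the unit belongs to.
    Rounding means the total can differ slightly from n when the
    proportions do not divide evenly.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    total = len(units)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / total)  # this chunk's share of n
        sample.extend(rng.sample(members, k))  # random within the chunk
    return sample


# Hypothetical population: 60 undergraduates ("u") and 40 graduates ("g").
units = [("u", i) for i in range(60)] + [("g", i) for i in range(40)]
chosen = stratified_sample(units, lambda unit: unit[0], 10, seed=0)
print(chosen)
```

With 60% undergraduates and 40% graduates, a sample of 10 is made up of 6 undergraduates and 4 graduates, whichever individuals happen to be drawn.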
Some experimental ideas:
- With subjects, we have treatments and observations. Both are variables: the former we control, the latter we measure.
- A confounding variable is one that is unknown, but may be the true cause of the effect. E.g., in observational studies we can’t get full control, so a statement such as “smoking causes cancer” can be refuted (weakly) by saying the accompanying personality or lifestyle is the true cause.
  - We can avoid some confounding variables, such as bias and the placebo effect, by randomly assigning subjects to two blocks: a control group and an experimental group. The study can be double-blind (good) or use matched pairs (so both groups are similar).
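Random assignment to a control group and an experimental group can be sketched by shuffling the subjects and splitting the shuffled list in half. This is a minimal illustration under assumed names (`randomize_groups`, 20 numbered subjects); it does not model blinding or matched pairs.

```python
import random


def randomize_groups(subjects, seed=None):
    """Randomly split subjects into (control, experimental) groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical study with 20 subjects labelled 0..19.
control, experimental = randomize_groups(range(20), seed=3)
print(sorted(control))
print(sorted(experimental))
```

Because chance alone decides who lands in which group, any unknown confounding variable should be spread roughly evenly between the two groups, at least on average.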