WHAT IS STATISTICS?
o The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling
o The subject of statistics can be divided into descriptive statistics (describing data) and inferential statistics (drawing conclusions from data). (Source: dictionary.com)
WHY SHOULD WE STUDY STATISTICS?
Descriptive Statistics: To describe a phenomenon
o Summary and presentation of data
Inferential Statistics: To draw conclusions
o Making statements or predictions about the population based on statistical information
POPULATION & SAMPLE
POPULATION: the group of all objects or individuals of interest
o All York Students
o Canadians
SAMPLE: a subset of the population
o 40 York students chosen at random
o People interviewed for the latest election poll
o We refer to the individual components of a sample as "observations"
PARAMETERS AND STATISTICS
Very generally we can say that:
o Populations are described by PARAMETERS
o Samples are described by STATISTICS
For example:
Parameter: the average hair length of all domestic cats (reflects the true value for the population)
Statistic: the average hair length of cats in my sample (it's an estimate)
Statistical inference: the process of drawing a conclusion about the population based on the sample (with a stated level of confidence and significance)
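The parameter/statistic distinction can be illustrated with a small sketch. The population here is simulated (the hair lengths, the population size of 10,000 and the sample size of 50 are made-up assumptions, not real cat data); the point is only that the parameter is computed from every member, while the statistic is computed from the sample and serves as an estimate.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: hair length (cm) of 10,000 cats (simulated values).
population = [random.gauss(4.0, 1.0) for _ in range(10_000)]

# Parameter: computed from the whole population (usually unknown in practice).
mu = statistics.mean(population)

# Statistic: computed from a random sample of 50 cats; it estimates mu.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"parameter (population mean): {mu:.2f}")
print(f"statistic (sample mean):     {x_bar:.2f}")
```

The two printed values will usually be close but not identical; that gap is what statistical inference has to account for.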
FINAL DEFINITIONS
A variable is a characteristic of a population or sample.
o student grades, height, income, etc.
Variables have values
o student marks (0..100)
Data are the observed values of a variable.
o student marks: {67, 74, 71, 83, 93, 55, 48}
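As a small illustration of descriptive statistics, the student marks listed above can be summarized with Python's standard library. This is only a sketch of one possible summary; which measures to report is a choice, not something the definitions prescribe.

```python
import statistics

# Data: the observed values of the variable "student marks".
marks = [67, 74, 71, 83, 93, 55, 48]

mean_mark = statistics.mean(marks)      # arithmetic average
median_mark = statistics.median(marks)  # middle value when sorted

print("n      =", len(marks))              # 7
print("mean   =", round(mean_mark, 2))     # 70.14
print("median =", median_mark)             # 71
print("range  =", min(marks), "to", max(marks))  # 48 to 93
```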
OBTAINING THE DATA
We have a phenomenon of interest and we would like to collect data to study it further
o We can directly collect the data: this is called PRIMARY DATA.
o We can use data collected by others (e.g. Statistics Canada; market research companies; etc.): this is called SECONDARY DATA
HOW DO WE COLLECT PRIMARY DATA?
1. By observation
2. By experiment
3. By survey
The difference is generally in the amount of control exercised by the researcher and the strength of the inference that can be made
DECISIONS INVOLVED IN SAMPLING
Sample Population
o From which population do we sample?
o Why is this important? What do we have to consider?
Sample Size
o How large should the sample be?
Sampling Method
o How should we pick the sample out of the population?
SAMPLE SIZE DEPENDS ON
o The size of the population
The sample size will INCREASE with the population size (but only slightly once the population is large)
o The variation in the population
The sample size will INCREASE with the variation
o The amount of error that can be tolerated
The sample size will DECREASE with the accepted error
o The amount of resources available
The sample size will INCREASE with resources
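The first three relationships above can be checked numerically. The sketch below assumes a common textbook formula for estimating a mean, n0 = (z * sigma / E)^2, followed by a finite population correction; the formula, the function name sample_size and the example numbers are assumptions for illustration and do not come from these notes. Resources are not modelled.

```python
import math

def sample_size(sigma, error, z=1.96, population_size=None):
    """Sample size needed to estimate a mean to within +/- error.

    sigma: population standard deviation (the variation)
    error: tolerated margin of error
    z: critical value (1.96 corresponds to 95% confidence)
    population_size: if given, apply the finite population correction
    """
    n0 = (z * sigma / error) ** 2  # size for a very large population
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)

print(sample_size(sigma=10, error=2))                          # 97
print(sample_size(sigma=20, error=2))                          # 385: more variation, larger sample
print(sample_size(sigma=10, error=1))                          # 385: less tolerated error, larger sample
print(sample_size(sigma=10, error=2, population_size=500))     # 81
print(sample_size(sigma=10, error=2, population_size=50_000))  # 96: larger population, slightly larger sample
```

Note how going from a population of 500 to 50,000 raises the required sample only from 81 to 96, while doubling the variation roughly quadruples it.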
HOW TO CREATE THE SAMPLE
There are several statistical sampling methods you can use:
1. Simple Random Sample
2. Stratified Random Sample
3. Cluster Sample
SIMPLE RANDOM SAMPLE (SRS)
Each subject is equally likely to be chosen
o Like raffles, drawing from a hat, etc.
o Subject choice is determined by random numbers
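A minimal sketch of a simple random sample, using random numbers from Python's standard library. The list of 1,000 student IDs is hypothetical; the sample size of 40 echoes the earlier example of 40 students chosen at random.

```python
import random

rng = random.Random(42)  # seeded generator so the draw is reproducible

# Hypothetical sampling frame: one ID per student in the population.
population = [f"student_{i:04d}" for i in range(1, 1001)]

# Draw 40 students without replacement; every student is equally likely to be chosen.
sample = rng.sample(population, 40)

print(len(sample))   # 40
print(sample[:5])    # first few selected IDs
```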
STRATIFIED RANDOM SAMPLE
The population is divided into mutually exclusive subgroups called strata
o e.g. age, gender, home type
Within strata, the sampling is random (simple)
Advantages:
o Ensures the sample has the same structure as the population
o Inferences can also be made about the subcategories
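Stratified sampling might be sketched as follows. This version assumes proportional allocation (the same sampling fraction in every stratum), which is what makes the sample mirror the population's structure; the helper name stratified_sample, the two age strata and their sizes are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(units, stratum_of, fraction, rng):
    """Take a simple random sample of the same fraction from every stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # group units by stratum

    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))       # simple random sample within the stratum
    return sample

rng = random.Random(1)

# Hypothetical population of 1,000 people: (id, age group).
population = [(i, "under_25") for i in range(600)] + \
             [(i, "25_plus") for i in range(600, 1000)]

sample = stratified_sample(population, lambda unit: unit[1], 0.10, rng)
counts = Counter(group for _, group in sample)
print(counts["under_25"], counts["25_plus"])  # 60 40, the same 60/40 split as the population
```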
CLUSTER SAMPLING
The population is divided into groups, called clusters
o e.g. geographical regions, classrooms in a school
Each cluster ideally has the same characteristics as the population
We use simple random sampling to select only a few clusters
We then use either simple random or stratified sampling within each cluster
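The two stages of cluster sampling can be sketched like this. The school with 20 classrooms of 25 students, the choice of 4 clusters and the 10 students drawn per cluster are all made-up numbers, and simple random sampling is used at the second stage (stratified sampling would be the other option mentioned above).

```python
import random

rng = random.Random(7)

# Hypothetical school: 20 classrooms (clusters) of 25 students each.
clusters = {
    f"class_{c:02d}": [f"c{c:02d}_s{s:02d}" for s in range(25)]
    for c in range(20)
}

# Stage 1: simple random sample of a few clusters.
chosen = rng.sample(sorted(clusters), 4)

# Stage 2: simple random sample of students within each chosen cluster.
sample = [student for c in chosen for student in rng.sample(clusters[c], 10)]

print(chosen)
print(len(sample))  # 40
```

Only the chosen classrooms need to be visited, which is the practical appeal of the method.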
SAMPLING ERRORS
A sampling error refers to the difference between the sample statistic and the population parameter
Example: survey shows 51% of students work when in fact only 50.42% work
We will learn how to deal with this error in later classes
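To make the idea concrete, the sketch below builds a hypothetical population of 10,000 students in which exactly 50.42% work (matching the example above), draws one random sample, and reports the sampling error. The population size, the sample size of 400 and the seed are assumptions; a different seed gives a different error.

```python
import random

rng = random.Random(3)

# Hypothetical population: 1 = works, 0 = does not work; 5,042 of 10,000 work.
population = [1] * 5042 + [0] * 4958
p = sum(population) / len(population)  # population parameter

sample = rng.sample(population, 400)   # simple random sample
p_hat = sum(sample) / len(sample)      # sample statistic

sampling_error = p_hat - p             # statistic minus parameter
print(f"parameter: {p:.2%}")
print(f"statistic: {p_hat:.2%}")
print(f"sampling error: {sampling_error:+.2%}")
```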
NON-SAMPLING ERRORS
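A non-sampling error arises from how the data are acquired or how the sample is selected, not from chance variation between samples
o Errors in data acquisition: inaccuracies and mistakes; less-than-truthful responses
o Non-response bias: only people with a certain agenda respond to the survey
o Selection bias: the sampling plan leaves some members of the population unable, or less likely, to be selected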