Main Topics (Aka BIG topics)
1. Regression
2. Two-Way Tables
3. Sampling Distribution of X-bar
Side Topics (Small topics)
1. Statistic versus Parameter
2. Control Charts
3. Response versus Explanatory Variables
4. Probability
1. REGRESSION
Main Point: Regression is all about knowing your vocab, vocab, vocab. And how to apply it of course.
Regression is used on bivariate, quantitative data (if you don't know what that means, I'd look it up!). The whole point is to find out if there is an association between your two variables.
a. Scatterplots
Scatterplots have Strength, Form and Direction. Understand what each of these means and how to recognize them.
Form: Linear/non linear
Direction: Negative/positive
Strength: Correlation coefficient, r
Correlation Coefficient: Be able to recognize false statements about r, be able to guesstimate r based on a graph, and know how outliers affect r.
Things to remember about r:
-No units
-Only quantitative data
-Only linear data
-Between -1 and 1
-Affected by outliers
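If it helps to see those facts about r in action, here is a minimal sketch. The data (hours studied and exam score) are made up for illustration, and `pearson_r` is just a helper name chosen here, not something from your output or notes.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5, 6]
score = [52, 58, 61, 70, 74, 79]

r = pearson_r(hours, score)

# No units: measuring x in minutes instead of hours leaves r unchanged.
r_minutes = pearson_r([h * 60 for h in hours], score)

# Affected by outliers: one unusual point (7 hours, score 40) pulls r toward 0.
r_outlier = pearson_r(hours + [7], score + [40])

print(round(r, 3), round(r_minutes, 3), round(r_outlier, 3))
```

With this made-up data, r starts out very close to 1, stays the same after the unit change, and drops sharply once the single outlier is added.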
r-squared (r^2): "The percent variation in y, explained by x" See previous posts for a good explanation on this. Be able to "interpret in context" (i.e. replace the highlighted words) as well as recognize this definition to know when they WANT r-squared.
TEST YOURSELF: What is the difference between r and r^2? What do they tell us?
Least Squares Regression Line: "Minimizing the sum of the squared residuals". Never hurts to know a definition. Basically, this is our best fit line that allows us to predict values based on the equation of the line. Which is:
y-hat = bx + a
Where y-hat is your predicted y, b=slope and a= y-intercept.
Realize you never have to come up with b and a by yourself, they will always be given in outputs. Practice knowing how to read these outputs!!
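Even though b and a come from the output, it can help to see where they come from once. This is only a sketch with made-up data (hours studied and exam score); the function names are chosen here for illustration. It uses the standard least-squares formulas, then checks r-squared as the share of the variation in y that the line explains.

```python
def least_squares(xs, ys):
    """Return (b, a) for the line y-hat = b*x + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # y-intercept
    return b, a

def sum_squared_residuals(xs, ys, b, a):
    """Sum of (observed y - predicted y)^2 for the line y-hat = b*x + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5, 6]
score = [52, 58, 61, 70, 74, 79]

b, a = least_squares(hours, score)
sse = sum_squared_residuals(hours, score, b, a)

# r-squared: share of the variation in y explained by the line.
mean_score = sum(score) / len(score)
total_variation = sum((y - mean_score) ** 2 for y in score)
r_squared = 1 - sse / total_variation

# Predict y for a new x by plugging it into the line.
predicted = b * 3.5 + a

print(f"y-hat = {b:.3f}x + {a:.3f}, r^2 = {r_squared:.3f}")
```

Reading the slope in context for this made-up data: for every additional hour studied, the exam score goes up by about 5.5 points on average. Nudging b or a away from these values makes the sum of squared residuals bigger, which is what "least squares" means.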
Slope: "Average change in y for every one unit increase in x". This is the big one! Know how to interpret it in context (i.e., exchanging out the highlighted words) and know how to recognize the definition when they are asking just for the number. (Example: "What is the average change in farm population for every year that passes?" Do you see the "change in y" for a "unit increase in x"? They would just want the slope value, found from the output.)
Realize that it may not be WORD FOR WORD this definition. They can change up the order, for example, saying "for every one unit increase in x, what is the average change in y?" or, for a practical example, "For every year that passes, what is the change in farm population on average?" If you KNOW the definition, you should be able to recognize it.
Summary: Know your definitions, because everything is given to you in the output. You merely need to recognize what they are asking. Don't panic. You know this stuff.
2. TWO-WAY TABLES
Main Point: This is about calculation. Two-way tables assess relationships (aka, association between variables) for bivariate categorical data.
Marginal Distributions
These deal with "overalls". They are exclusively in the margins. Don't leave the margins. Okay? Because they are distributions, expect more than one number.
Conditional Distributions
These are based on a condition, or rather, a specific. They are still distributions, but they are based on the "inside" numbers.
Conditional Values
These are a single value based on two conditions. The first condition (the "for those who..." part) sets the total you divide by.
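Here is a small sketch of all three calculations. The counts are made up for illustration (housing versus bike ownership) and are NOT the table from this post; the variable names are chosen here as well.

```python
# Hypothetical counts (made up for illustration, not the table from this post).
table = {
    "on campus":  {"bike": 20, "no bike": 60},
    "off campus": {"bike": 80, "no bike": 40},
}

row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {}
for cells in table.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count
grand_total = sum(row_totals.values())

# Marginal distributions: totals in the margins divided by the grand total.
marginal_housing = {row: t / grand_total for row, t in row_totals.items()}
marginal_bike = {col: t / grand_total for col, t in col_totals.items()}

# Conditional distribution: one row's inside counts divided by that row's total.
bike_given_on_campus = {
    col: count / row_totals["on campus"]
    for col, count in table["on campus"].items()
}

# Conditional values: a single number. The "for those who..." group is the denominator.
p_bike_given_on = table["on campus"]["bike"] / row_totals["on campus"]  # 20 / 80
p_on_given_bike = table["on campus"]["bike"] / col_totals["bike"]       # 20 / 100

print(marginal_housing)
print(marginal_bike)
print(bike_given_on_campus)
print(p_bike_given_on, p_on_given_bike)
```

Notice the last two values use the same inside count (20) but different totals, so they come out different (0.25 versus 0.2). Which condition comes first matters.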
TEST YOURSELF
In case the above didn't make much sense, let's practice!
[Table: "Study in the Library mainly" by year in school. The original table is not reproduced here.]
1. What is the marginal distribution of years?
2. What is the marginal distribution of people who study in the library?
3. What is the conditional distribution of sophomores?
4. What is the conditional distribution of people who DO study in the library?
5. For those who are seniors, what proportion study in the library?
6. For those who study in the library, what proportion are seniors?
Answers:
1. 0.1677, 0.3829, 0.2974, 0.1518
2. 0.48, 0.515
3. 0.5619, 0.438
4. 0.294, 0.444, 0.1503, ... (keep going)
5. 0.354
6. 0.111
Summary: Know your calculations.
3. SAMPLING DISTRIBUTIONS OF X-BAR
Main Point: This is probably the most concept-heavy portion of the second test, and thus, the thing people have the hardest time with. Know. These. Concepts. Understand them well, and you won't have a problem.
What is a sampling distribution of x-bar? It is the distribution of sample means from every possible sample of a particular size n.
Know the difference between a population graph and a sampling distribution of x-bar graph. (Here's a hint to remind you: population = individuals, sampling distribution of x-bar = sample means.)
Central Limit Theorem: For a non-normal population, if n >= 30, then the sampling distribution of x-bar is approximately normal. (As a side note: for an already normal population, the sampling distribution of x-bar is exactly normal for any n, not just n >= 30.)
Know how to use the formula associated with sampling distributions of x-bar, i.e. z = (x-bar - mu) / (sigma / sqrt(n)). Remember, you use it in the same way as the population z-score. But practice anyways.
Things to know: The standard deviation of a sampling distribution of x-bar is sigma/sqrt(n).
The mean of a sampling distribution of x-bar is ALWAYS. ALWAYS. ALWAYS equal to mu. Always. Doesn't matter what n is. Always. Always.
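To practice the formula, here is a quick sketch with made-up numbers (population mean 70, population standard deviation 12, sample size 36). The function names are chosen here for illustration, and the normal probability is computed with the error function in place of a z-table.

```python
import math

def z_for_sample_mean(x_bar, mu, sigma, n):
    """z = (x-bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

def normal_cdf(z):
    """P(Z <= z) for a standard normal, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical numbers: population mean 70, population sd 12, sample size 36.
mu, sigma, n = 70, 12, 36
std_dev_of_x_bar = sigma / math.sqrt(n)      # 12 / 6 = 2

z = z_for_sample_mean(73, mu, sigma, n)      # (73 - 70) / 2 = 1.5
p_below = normal_cdf(z)                      # P(x-bar <= 73)

print(std_dev_of_x_bar, z, round(p_below, 4))  # 2.0 1.5 0.9332
```

The only change from the population z-score is the denominator: sigma/sqrt(n) in place of sigma.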
TEST YOURSELF: What happens to the following graphs?
Let's start with a population of 15,000 that is severely left skewed.
1. If I take a sample of 800 people and graph it, what does it look like?
2. If I take all possible samples of size n=10 and graph it, what does it look like?
3. If I take all possible samples of size n=600 and graph it, what does it look like?
Answers:
1. The key here is that I only took ONE SAMPLE. Remember, the CLT only applies to sampling distributions of x-bar. So, we would expect it to look like the population.
2. This is a sampling distribution of x-bar; however, n<30, so the CLT doesn't apply. This graph would be closer to normal than the population, but it cannot be considered normal.
3. This is a sampling distribution of x-bar and n>30, so the CLT applies and we have an approximately normal distribution.
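If you want to see these three answers play out, here is a rough simulation sketch. Everything in it is an assumption for illustration: the left-skewed population is invented, 2,000 random samples stand in for "all possible samples", and `skewness` is a simple helper written here (about 0 for a symmetric graph, negative for left skew).

```python
import random
import statistics

def skewness(values):
    """Simple skewness measure: about 0 for symmetric data, negative for left skew."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(population, n, repetitions, rng):
    """Means of many random samples of size n (a stand-in for 'all possible samples')."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repetitions)]

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical left-skewed population of 15,000 values (most near 100, long tail below).
population = [100 - rng.expovariate(1 / 10) for _ in range(15_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

one_sample = rng.sample(population, 800)               # question 1: a single sample
means_10 = sample_means(population, 10, 2000, rng)     # question 2: n = 10
means_600 = sample_means(population, 600, 2000, rng)   # question 3: n = 600

print("population skew:      ", round(skewness(population), 2))
print("one sample of 800:    ", round(skewness(one_sample), 2))
print("sample means, n = 10: ", round(skewness(means_10), 2))
print("sample means, n = 600:", round(skewness(means_600), 2))
print("mu vs. mean of the n = 10 sample means:", round(mu, 2), round(statistics.fmean(means_10), 2))
```

The single sample should stay about as skewed as the population, the n=10 sample means should be less skewed but still noticeably off, and the n=600 sample means should have skewness near 0. The mean of the sample means should also land very close to mu.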
As a side note, don't forget about the law of large numbers, which only deals with samples (not sampling distributions of x-bar). Check out your notes on that one.
Summary: Again, this is concept heavy. Of course, it does require some calculations by way of the formula, but if you were okay with these on the first exam, you should be okay now.
4. THE SMALL STUFF
Be sure to go over your definitions for:
Statistic versus parameter (be able to tell the difference)
Control Charts (remember the equations for the upper and lower limits aren't on your equation sheet. Memorize them!)
Probability (know facts about the probability distribution and what it takes to have a proper one)
Response versus explanatory variables (be able to tell the difference).
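Two of these small topics are easy to check with a quick sketch. Big assumption here: the control limits below are the usual three-sigma limits for an x-bar chart, mu +/- 3*sigma/sqrt(n), so check them against your own notes. The function names and the numbers are made up for illustration.

```python
import math

def x_bar_control_limits(mu, sigma, n):
    """Assumed 3-sigma limits for an x-bar chart: mu +/- 3 * sigma / sqrt(n)."""
    margin = 3 * sigma / math.sqrt(n)
    return mu - margin, mu + margin

def is_valid_distribution(probabilities, tolerance=1e-9):
    """Each probability is between 0 and 1, and together they sum to 1."""
    in_range = all(0 <= p <= 1 for p in probabilities)
    return in_range and abs(sum(probabilities) - 1) <= tolerance

# Hypothetical process: target mean 50, sd 4, samples of size 16.
lower, upper = x_bar_control_limits(50, 4, 16)   # 50 - 3 and 50 + 3

print(lower, upper)                              # 47.0 53.0
print(is_valid_distribution([0.2, 0.5, 0.3]))    # True: all in range, sums to 1
print(is_valid_distribution([0.6, 0.6, -0.2]))   # False: a probability can't be negative
```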
And of course, brush up on knowing the different experimental designs!
GOOD LUCK!!
-Hillary