Stat 121 Fall 2011

Thursday, April 5, 2012

Final Exam Review

So this review will basically just cover the new material since Exam 3. If you want to see topics from Exam 3 and before, you can look at the three other exam reviews I have written for each exam.

However, after lab today a student said "I know everything we've learned all interconnects...but how do we keep it STRAIGHT?" It was a good question. So, the first part of this review will be an overall, comprehensive look at some things we have done this semester. You should be able to click on them to make the charts larger.

1. A list of all "tests" we have performed and why/when we perform them. (There is a similar document on Blackboard. It gives the same information; mine is just in a different format. Use whichever one makes the most sense for you.)

2. A list of all conditions for any test we have done.

3. A comprehensive list of symbols and their meanings.

Okay, on to new material.

New material mostly covers:
1. Proportions
2. Chi-Square
3. Tests of Significance on Slope (Regression)
4. (small topic) Can you tell which test to use?

1. Proportions

Proportions represent categorical variables, like survey questions. You should know:

a. How to compute proportions.

p-hat is just "number of successes" over the total sample size, or n.

Be able to use the test equations. BE CAREFUL about which "p" you are using: whether it is the "null p" or the "p-hat, sample p".
Seriously though, they love to test you on that. Especially true or false. For example, "For a one-sided confidence interval estimation, we would test normality by checking np>=10 and n(1-p)>=10." You said true, right? Well, it's FALSE. For confidence intervals, we check with p-hat, not p. Watch out.
Speaking of np checks, know the np checks for normality. In case you forgot, they are: np>=10 and n(1-p)>=10, using the null p for a test and p-hat for a confidence interval.

Know how to calculate two-sample proportions, namely what the plain "p-hat" is there (the pooled proportion: total successes from both samples over the total of both sample sizes).
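
Here is a minimal sketch of those calculations in Python, just to show which "p" goes where. The function names and the numbers (12 successes out of 40, a null p of 0.2) are made up for illustration; they are not from the homework.

```python
def sample_proportion(successes, n):
    """p-hat: number of successes over the sample size."""
    return successes / n

def normality_ok(p, n):
    """The np checks: np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def pooled_proportion(x1, n1, x2, n2):
    """Pooled p-hat for two samples: all successes over all observations."""
    return (x1 + x2) / (n1 + n2)

n = 40
p_hat = sample_proportion(12, n)   # hypothetical sample: 12 successes
p_null = 0.2                       # hypothetical null p

# Test of significance: check with the NULL p.
print("test check passes:", normality_ok(p_null, n))
# Confidence interval: check with P-HAT.
print("interval check passes:", normality_ok(p_hat, n))
```

With these made-up numbers the two checks disagree (8 is less than 10 for the null p, but 12 and 28 both pass for p-hat), which is exactly why the choice of "p" matters.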

b. General things

Be able to write parameters for proportions (homework is good for this).
Know facts about the sampling distribution of p-hat (HINT: It's a lot like facts about the sampling distribution of x-bar...)
Really anything asked in a four step process (be able to conclude, get a p-value, etc).

2. CHI-SQUARE

a. Computing chi-square

Remember chi-square is just on two-way tables.
We compute expected counts (the zorro method: row total X column total / table total), and then the chi-square contributor for EACH cell.
We sum up all of the chi-square contributors to get the chi-square test statistic. Using the new table, which works exactly like the t-table, we get a p-value.
Don't forget about conditions.
Chi-square is always testing to see if there is SOME association between the variables.
Remember the relationship between expected counts and observed counts.
You get degrees of freedom in a different way [(r-1)*(c-1)]
Don't talk about causation. Just don't do it. We need an experiment for causation.
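
If it helps to see the whole computation in one place, here is a rough sketch in Python. It assumes the usual formulas: expected count = row total X column total / table total, and each cell's contributor = (observed - expected)^2 / expected. The table of counts is invented for illustration.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / table_total
            # Chi-square contributor for this one cell.
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)   # (r-1)*(c-1)
    return statistic, df

# Hypothetical 2x2 table of observed counts.
stat, df = chi_square([[10, 20], [20, 10]])
print(stat, df)
```

You would still take the statistic and df to the chi-square table to get the p-value.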

3. REGRESSION / SLOPE ANALYSIS

a. Theory

The theory of tests of significance on regression is that SLOPE determines if there is an association or not. Thus we are testing if slope is zero versus slope is not zero (greater than/ less than / not equal to).
Be able to write/interpret a parameter in context.
"Slope is the average change in y for every one unit increase in x"
Change out "y" and "x" for your actual variables, and you are on your merry way.
Be sure you know how to check conditions

b. Calculations

Remember, regression is really all based on the "output". Be sure you know how to read them. Then the equations become easy.
Don't be fooled: DF=n-2.
Conclude in a similar fashion as with every test.
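
On the exam you will read the slope and its standard error straight off the output, but here is a sketch of what the software is doing behind the scenes. It assumes the standard least-squares formulas (t = slope / standard error of the slope, with df = n-2). The function name and the data are hypothetical.

```python
import math

def slope_t_test(x, y):
    """Return (slope, standard error of slope, t statistic, df)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Sum of squared residuals around the best-fit line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2                      # don't be fooled: n-2, not n-1
    s = math.sqrt(sse / df)         # standard deviation about the line
    se_slope = s / math.sqrt(sxx)
    return slope, se_slope, slope / se_slope, df

# Made-up data, five points.
print(slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```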

c. Reminders

Don't forget other stuff you "used to" know about regression. Namely

How to write best-fit equations based on the output
how to plug in numbers to those equations
r
r^2 (and interpretation of it)

d. Confidence Intervals versus Prediction intervals

This is pretty simple stuff. Confidence intervals are on means. Prediction intervals are on individuals.
Which one is wider, and why?

4. WHICH TEST IS IT?

As it is the end of a semester, we have gone through a whole lot of different procedures. They are going to ask you questions about "which procedure are we using?" Use the chart I gave above about all the different procedures to help you with this one.

Above all, take these slow. Eliminate one thing at a time.

Example:

"Suzy wants to test which oven is best. In oven one, she bakes ten loaves of bread and times the average time it takes for them to bake. She then puts ten loaves in oven two, and calculates the average baking time for those. She discovers, with a p-value of 0.002, that oven two cooks faster"

Answers:
a. One-sample t-test for means
b. One-sample z-test for means
c. One-sample z-test for proportions
d. Two-sample t-test for means
e. Two-sample z-test for means
f. Two-sample z-test for proportions
g. Two-sample t-confidence interval for means
h. Two-sample z-confidence interval for proportions
i. ANOVA
j. Chi-Square
k. Regression

Whoa. There are a lot of choices. Ask yourself some questions.
1. Is this for means or proportions? She is calculating the average bake time. Means.
2. Is this z or t? No mention of sigma. T-test.
3. Is this a confidence interval or test? It mentions a p-value, so test.
4. Is this one-sample, two-sample or matched pairs? There are two ovens, so two different data samples collected. This is two-sample.

The correct answer is (d).

You'll most likely never actually have to go through all of those questions, but they are good types of questions to ask to narrow things down.
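
If you like thinking in flowcharts, the four questions can be sketched as a tiny Python function. This is only an illustration of the elimination process for the means/proportions procedures (it ignores ANOVA, chi-square and regression), and the function and argument names are my own invention.

```python
def name_procedure(parameter, sigma_known, has_p_value, samples):
    """Build a procedure name from the answers to the four questions.

    parameter:   "means" or "proportions"
    sigma_known: True if the problem gives sigma (only matters for means)
    has_p_value: True for a test, False for a confidence interval
    samples:     "One-sample", "Two-sample" or "Matched pairs"
    """
    if parameter == "proportions":
        dist = "z"                       # proportions always use z
    else:
        dist = "z" if sigma_known else "t"
    kind = "test" if has_p_value else "confidence interval"
    return f"{samples} {dist}-{kind} for {parameter}"

# Suzy's ovens: average bake time, no sigma, a p-value, two ovens.
print(name_procedure("means", False, True, "Two-sample"))
```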

For help on other exam material, see my other reviews posted.

GOOD LUCK GUYS. YOU'LL DO GREAT.

Have a great life :)
-Hillary

Tuesday, April 3, 2012

Assignment 29

THE LAST *REAL* Assignment!! Assignment 30 is just an extra credit TA evaluation :) Congrats, you made it!

So...I want to do all of Assignment 29 in class. So, I'm not going to write about it here just yet. If we don't get to some questions in any of my classes and I think you need clarification, check back here after Thursday. I'll write something then.

Also, My Final Exam review will be coming soon.

-Hillary

Assignment 28

(I didn't go over the Chi-Square assignment since we did all but three problems in class =] )

Questions 1-3 should be a great review of regression. We will review this in class, but you should be able to get these pretty easily.

We will be doing questions 4-8 in class. We haven't gone over this concept yet.

Questions 8 until the end are GREAT final review questions. Basically, this is taking ALL of the tests we have done and making you decide which one we should be using!

Here's a quick reminder of what the three we are comparing here are:

Chi-Square: Compares multiple proportions (more than two). AKA, we need two categorical variables. This means we will always be using a two-way table.

Regression: Compares two quantitative variables. (Like weight versus height). We use scatterplots to recognize correlation in these (using best fit lines).

ANOVA: Compares multiple means. (like I want to know if a certain scent makes people stay in a restaurant longer, so I compare the mean time of three, like lemon, sage and lavender)

Hope that helps!
-Hillary

Wednesday, March 28, 2012

Assignment 26

Questions 1-4 are about confidence intervals on p. These confidence intervals are exactly like confidence intervals for means! Just follow the equation.

Some things to note:
So the equation is:

p-hat +/- z* [ sqrt( (p-hat*(1-p-hat)) / n ) ]

So we can see that this equation follows what we are used to. P-hat is like x-bar: the sample proportion. We know how to find z*- it hasn't changed! The next portion is just the standard error- just like we are used to!

Keep note though, I said it was the standard error. Can you tell the reason? It's because we are using p-hat, not Po, since we do not have a hypothesis.

The confidence interval conclusion has not changed. (BE CAREFUL: We aren't talking about the true mean any more! Remember the proper parameter- check question 1 or last assignment for help.)
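
For anyone who wants to check their arithmetic, here is a small Python sketch of that interval. The function name is made up, the default z* of 1.96 is the usual 95% value, and the 50-out-of-100 sample is invented.

```python
import math

def proportion_ci(successes, n, z_star=1.96):
    """Confidence interval for p: p-hat +/- z* * sqrt(p-hat(1-p-hat)/n)."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)   # uses p-hat, not Po
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 50 "yes" answers out of 100.
low, high = proportion_ci(50, 100)
print(f"({low:.3f}, {high:.3f})")
```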

Question 5: Here's a hint: Whenever the question is "Why can't we" it means CHECK CONDITIONS!!

Question 6: Check out the equation on the right hand side of the equation sheet, in the row right under the heading "proportions".

Questions 9-12: We will be doing questions 13-16 in class, so these questions will be easier to answer after that!

Good Luck!
-Hillary

Wednesday, March 21, 2012

Exam 3 Review

Exam 3 is here! Remember this exam typically has the lowest averages, so we typically say it is the most challenging. This may or may not be true for you specifically, but prepare well.

What makes it challenging? It is a LOT of interpretation. Most people are comfortable with all the calculations. We test you on your understanding of the concepts and definitions.

Basically, this means know your definitions. Know them in and out. Know how to interpret them and recognize them.

Main Topics:
1. Tests of Significance
2. Confidence Interval Estimations
a. For both 1&2, need to know t and z tests, four step process
3. ANOVA

Side Topics
1. Type I/Type II errors
2. What type of procedure is this?
3. Two-sided confidence intervals
4. Sample size
5. Symbols

1. Tests of Significance

Definitions to KNOW:

Test of significance: An outcome that is unlikely to happen if a claim is true is good evidence that the claim is not true. (This is the theory of a test of significance. Remember the coin example?)
p-value: The probability of getting an x-bar as extreme or more extreme if the null hypothesis is true. (KNOW THIS. You will need to be able to INTERPRET this as well. Meaning if I give you an actual situation, you could put numbers into the right locations. You can see previous posts for a more in depth explanation of this).
Parameter: The mean of what you are finding out about the population. Okay, so this isn't really a definition, but be comfortable writing parameters. (Remember we need the MEAN and the POPULATION).
Null/Alt Hypothesis: Null Hypothesis: Statement of no change. Alternative: What we want to prove.

Procedure

Obviously you need to be comfortable with every part of the four step process for a test of significance (for both t and z tests).

Write the parameter, null and alternative hypothesis and state the level of significance. I've talked about these previously, but make sure you can do them.

Conditions
For a Z-test the conditions are (and are met by):
1. Randomization: Met through SRS OR RAT.
2. Normality: Met through CLT OR graph displaying approximate normality.
3. Sigma is known: Yes or no. They give it to you or they don't.

For a t-test conditions are (and are met by):
1. Randomization: Met through SRS OR RAT.
2. Normality: Met through CLT OR graph displaying no extreme skewness or outliers.

For a z-test, we use the equation z = (x-bar - mu) / (sigma/sqrt(n)). This is called the test statistic. We then go to the z-table and get the p-value.

Things to remember about how to find z-test p-values:

One-sided test with Ha: Mu<#: Read p-value directly off table.
One-sided test with Ha: Mu>#: 1-table value = P-value.
Two-sided test with x-bar < null hypothesis mu: 2*(table value)= p-value.
Two-sided test with x-bar > null hypothesis mu: 2*(1-table value)=p-value.

Don't take my word for it though...draw the picture :)
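
If you want to double check your table reading, here is a sketch of those four cases in Python. The "table value" is the area to the LEFT of z, which is computed here with the error function instead of a printed table. The function names and the example numbers are hypothetical.

```python
import math

def normal_cdf(z):
    """Area to the left of z under the standard normal curve (the table value)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_test_p_value(x_bar, mu0, sigma, n, alternative):
    """alternative is "less", "greater" or "two-sided"."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    table_value = normal_cdf(z)
    if alternative == "less":
        p_value = table_value                 # read directly off the table
    elif alternative == "greater":
        p_value = 1 - table_value
    else:
        # Two-sided: double whichever tail the test statistic landed in.
        p_value = 2 * min(table_value, 1 - table_value)
    return z, p_value

# Made-up example: x-bar = 52, null mu = 50, sigma = 10, n = 100.
print(z_test_p_value(52, 50, 10, 100, "greater"))
print(z_test_p_value(52, 50, 10, 100, "two-sided"))
```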

For a t-test, we use the equation t = (x-bar - mu) / (s/sqrt(n)). This is called the t test statistic. We then go to the t-table and get the p-value.

We like the t table because it already accounts for if it's a one-sided or two sided or if it is greater than or less than. The basic process to find the p-value is as follows:

Take your t test-statistic
Find your degrees of freedom (df) (n-1)
Enter the table on your df row.
Find the two values that sandwich your t test stat.
Follow those two columns down to the bottom.
Decide if you have a one-sided or two-sided test
Read the two p-values off
Say "Number on right < P-value < Number on left"

I have an example on a previous post.
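
The sandwich steps above can also be sketched in Python. This uses a short excerpt of the df = 10 row of a standard t-table (one-sided tail areas paired with critical values); your printed table has more columns. Doubling the one-sided values for a two-sided test is the same as reading the two-sided row at the bottom. The names here are my own, and the sketch assumes the sign of your t agrees with the direction of Ha.

```python
# Excerpt of the df = 10 row: (one-sided tail area, t critical value).
T_ROW_DF10 = [(0.10, 1.372), (0.05, 1.812), (0.025, 2.228),
              (0.01, 2.764), (0.005, 3.169)]

def bracket_p_value(t_stat, row, two_sided=False):
    """Return (number on right, number on left) so that
    number on right < p-value < number on left.
    Returns None if t falls outside this excerpt of the row."""
    t = abs(t_stat)
    factor = 2 if two_sided else 1
    # Walk across neighboring columns looking for the sandwich.
    for (p_left, t_left), (p_right, t_right) in zip(row, row[1:]):
        if t_left <= t < t_right:
            return factor * p_right, factor * p_left
    return None

print(bracket_p_value(2.0, T_ROW_DF10))                  # one-sided
print(bracket_p_value(2.0, T_ROW_DF10, two_sided=True))  # two-sided
```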

Conclude

Compare p-value with alpha
Reject/Fail to Reject Null (p-value < alpha, reject; p-value > alpha, fail to reject).
Conclude in context.

2. Confidence Interval Estimation

Definitions to KNOW:

What is a confidence interval: It is used to estimate the mean. Gives reasonable values for the mean, etc.
Confidence Level: If the procedure were repeated many times, confidence level is the percentage of INTERVALS we would expect to contain the true mean. (This is an important one. Realize what confidence level is NOT: It is NOT how often our interval will contain mu, or x-bar or the percentage of time we are right).

Margin of Error: the amount we expect our mean (mu) to differ from our sample mean (x-bar).

Procedure

Write the parameter, choose confidence level. Conditions are the same as for test of hypothesis.

Z confidence interval

Use equation x-bar +/- z* (sigma/sqrt(n)).

Finding z*

Go to the t-table
Find the column with your confidence level (top of chart)
Follow it down to the third row from the bottom labeled "z*".

t confidence interval

Use equation x-bar +/- t* (s/sqrt(n))

Finding t*

Find degrees of freedom
Go to where your df and your confidence level intersect
That is your t*
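
Both intervals have the same shape, so one small Python sketch covers them. Pass sigma and z* for a z interval, or s and t* for a t interval. The function name and all numbers are invented; 1.96 is the usual z* for 95% and 2.228 is t* for 95% with df = 10.

```python
import math

def mean_ci(x_bar, spread, n, critical_value):
    """x-bar +/- critical value * (spread / sqrt(n)).

    spread is sigma for a z interval or s for a t interval;
    critical_value is z* or t* from the table."""
    margin = critical_value * spread / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# z interval: sigma known (hypothetical x-bar = 50, sigma = 10, n = 25).
print(mean_ci(50, 10, 25, 1.96))

# t interval: sigma unknown (hypothetical s = 10, n = 11, so df = 10).
print(mean_ci(50, 10, 11, 2.228))
```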

Conclude

Use the cookie-cutter answer to conclude for confidence intervals.

"We are _______% confident that the true mean _____________ lies between (_____,_____)"

a. Difference between Z and t tests and four step process

We basically stepped through the four step process for both confidence intervals and tests of significance above. Wouldn't hurt to go over the different sections, though.

Remember, we use a z-test if sigma (population standard deviation) is KNOWN; we use a t-test if sigma is UNKNOWN.

For multiple choice, it's helpful to really remember if you are using a t or z test. Remember, a z-test will give you a one-number p-value. A t-test will give you a range of values. Keep this in mind for your answers.

It may help you to remember the differences between the distributions (we talked about this briefly in class, check your notes). For example, the t-distribution has more area in the tails (less precise). As n increases, the t-distribution becomes more like the z-distribution in shape.

3. ANOVA (Analysis of Variance)

Anova is all about reading the output. Be sure you know how to:

Write hypotheses
Find the p-value
check conditions
Conclude IN CONTEXT based on the confidence intervals

If you want to go through an example, you can see my post about Assignment 24.

SIDE TOPICS

1. Type I/ Type II Errors

You should know the definitions of these/ what they are. I can't really give you the graph on the blog, but check over your notes and be sure to understand/know where everything is on the graph.

Be able to know the connection between alpha and beta, and when we might make alpha big or small (relatively).

Type I Error: Rejecting a true null hypothesis

Type II Error: Failing to Reject a false null hypothesis

Power: The probability of rejecting a false null hypothesis (in other words, of accepting a true alternative hypothesis)

Also be able to do something like this:

Ho: Hillary is not addicted to "drawsomething"

Ha: Hillary is addicted to "drawsomething"

If she is not addicted, she gets to keep the app. If she is addicted, the app gets deleted. Be able to write the errors and power for something like that.

2. What type of procedure is this?

Be able to recognize procedures (aka is this a "one sample t-test" or a "two sample z-test" etc).

Think of the chart we drew in class. You can choose one thing from each column.

One Sample | z | test (of significance)

Two Sample | t | confidence interval estimation

Matched Pairs | |

3. Two-sided confidence interval estimation

I hesitate to even write about this. This was a question on the homework we didn't have time to go over in class. This is when you can use a confidence interval to decide whether to reject a null hypothesis.

BE AWARE: This can ONLY be used in a TWO-SIDED TEST. And it should ONLY be used when specifically asked for. You should never, ever conclude a confidence interval this way. Concluding confidence intervals means using the "cookie-cutter" answer. Only if they ask you to do this should you, regardless of whether it is a two-sided test or not.

Okay. Read that warning a few times. Make sure you understand.

Anyways, let's say we have the following hypotheses:

Ho: Mu=15

Ha: Mu does not equal 15.

Since we have a two-sided test, we could use a confidence interval to decide whether to reject or fail to reject the null hypothesis.

For example, let's say we got the interval (20, 28). Because the purpose of a confidence interval is to estimate the true mean, what could we say in this case? Clearly, 15 is not in the interval. So we could reject the null in this case.

Let's say we got an interval of (13,20)? Well, because fifteen is IN the interval, there is not sufficient evidence that mu is NOT 15. Thus, we fail to reject the null.
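
The rule in those two examples boils down to one check, sketched here in Python (the function name is made up). Remember it only applies to a two-sided test, and only when you are specifically asked to do it.

```python
def two_sided_decision(interval, mu0):
    """Reject the null only if the null value falls outside the interval."""
    low, high = interval
    return "fail to reject" if low <= mu0 <= high else "reject"

print(two_sided_decision((20, 28), 15))   # 15 is not in the interval
print(two_sided_decision((13, 20), 15))   # 15 is in the interval
```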

4. Sample Size Estimation

This is super easy. It's just an equation. You did it once on the homework. The equation you use is on the second row, far right side, of your equation sheet.

The only thing you need to know is that you always round UP. No matter what. The reason is that this is solving for a specific number of people you need to sample in order to meet specifications given to you. So if, for example, your equation gave you:

n= 24.3 people.

If you use 24 people, you don't meet the requirements. So you must round up to 25.
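
Here is the round-up rule as a short Python sketch. I am assuming the equation on your sheet is the usual one for estimating a mean, n = (z* X sigma / margin of error)^2, so check it against the equation sheet. The numbers are invented.

```python
import math

def sample_size(z_star, sigma, margin_of_error):
    """Smallest n that meets the margin of error.
    Assumes n = (z* * sigma / m)^2, then ALWAYS rounds up."""
    n = (z_star * sigma / margin_of_error) ** 2
    return math.ceil(n)

# Hypothetical: 95% confidence, sigma = 15, margin of error = 6.
# The raw answer is just over 24, so we need 25 people.
print(sample_size(1.96, 15, 6))
```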

5. Symbols

So symbols are on every exam. But a few of you brought them up in class, and since I like you guys, I'm going to do a section on them. Here's symbols you should know:

µ (possible answers: mean of the sampling dist. of x-bar, population mean)

σ (population standard deviation)

x-bar (sample mean)

σ/√n (standard deviation of the sampling distribution of x-bar)

s/√n (standard error of the sampling distribution of x-bar, or just standard error)

s (standard deviation of a sample)

This is not necessarily a comprehensive list, but it should help.

Finally, don't forget the written portion. Be sure you:

Go over assignment 21. It was there for a reason.
Know how to do the four step for both two-sample and matched pairs. (this includes things like, how does the parameter change? When do you do each one? What do you graph? What equations do you use? We went over all this in class).

Remember, this is not necessarily a comprehensive list, just things I think will help.

GOOD LUCK!!!

-Hillary

Monday, March 19, 2012

Assignment 24

We already went through a bulk of assignment 24, but I realize it was very rushed, so I want to go through an example that isn't on your homework (but will clearly help you do all of your homework.) It also may be a good idea to go through this example as an exam review if you have already finished the homework. Just try to answer the questions before looking at my answers.

Problem (in honor of this Thursday night's event):

The Capitol wants to know what the average survival time in hours of the tributes from each of the twelve districts will be for the 74th hunger games. Sample data was collected from a random sample of 20 hunger games. Data can be considered to be normally distributed.

1. What type of study is this?

Observational. The hunger games (the past 20 sampled) have already happened. We are observing. (Although the hunger games themselves are certainly not an observational study...)

2. What are the hypotheses of this test?

Ho: µ1=µ2=µ3=µ4=µ5=µ6=µ7=µ8=µ9=µ10=µ11=µ12

Ha: Not all means are the same.

Below is the ANOVA output.

3. Is the normality condition met?

Yes, stated in the problem. (Could be met through graphing as well.)

4. Is the randomization condition met?

Yes, stated in the problem.

5. Is the equal variance condition met?

The largest Standard Deviation is: 28. The Smallest standard deviation is: 20

28/20=1.4 < 2. This condition is met.
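
That check is simple enough to do by hand, but here it is as a one-line Python sketch in case you want to try it on other outputs. The function name is made up, and the list holds the largest and smallest standard deviations from this example.

```python
def equal_variance_ok(standard_deviations):
    """Equal variance condition: largest SD / smallest SD must be less than 2."""
    return max(standard_deviations) / min(standard_deviations) < 2

print(equal_variance_ok([28, 20]))   # 28/20 = 1.4
```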

6. How does p-value compare to alpha?

We can find the p-value on the chart. It is 0 (rounded). Because we have 95% confidence intervals, that gives us an alpha=0.05. Thus, 0<0.05, so the result is significant. Thus, at least one of the means differs.

7. What does this mean in context? (In other words, what do the confidence intervals tell us? or What means differ? or Are some districts likely to last longer? All of these ways are reasonable ways to ask the same question).

We can see that districts 3-11 have confidence intervals that significantly overlap. This shows that their true means are unlikely to differ by much. However, we can see that the confidence intervals for districts one and two sit much higher, showing that on average their tributes last much longer. District 12's confidence interval sits lower than the rest, showing that its tributes last much less time in the arena.

May the odds ever be in your favor...(can you tell I'm excited?)

-Hillary

Assignment 23

Sorry this is late. Wrote about it but it must have not saved...

For the first few questions, think about what each part of the procedure does for you.

You can have a:

One sample: (one population, one group of data)

Two Sample: (two populations, a treatment and no treatment, two sets of data)

Matched Pairs: (two sets of data, yet both are on the same individuals).

AND

t-test: sigma unknown

z-test: sigma known

For questions 3-12, we are doing a two sample test because we have men and women scores for the same thing.

Remember, the parameter differs from a one sample. What is the word we need in there that makes it different? HINT: It starts with a "D".

Remember this question is a great way to prepare for the exam written part.

Good Luck!

-Hillary