Wednesday, March 28, 2012

Assignment 26

Questions 1-4 are about confidence intervals on p. These confidence intervals work exactly like confidence intervals for means! Just follow the equation.

Some things to note:
So the equation is:

p-hat +/- z* [ sqrt( (p-hat*(1 - p-hat))/n ) ]

So we can see that this equation follows what we are used to. P-hat is like x-bar: the sample proportion. We know how to find z*- it hasn't changed! The next portion is just the standard error- just like we are used to!

Keep note though, I said it was the standard error. Can you tell the reason? It's because we are using p-hat, not Po, since we do not have a hypothesis.
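If it helps to see the equation in action, here is a minimal sketch in Python. The function name `proportion_ci` and the numbers (60 successes out of 100) are made up for illustration, not from the homework, and z* is computed instead of being read off the table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Confidence interval for p: p-hat +/- z* sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / n
    # z* cuts off the middle `confidence` area of the standard normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error: uses p-hat because there is no hypothesized Po.
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin

# Hypothetical data: 60 successes in a sample of 100.
low, high = proportion_ci(successes=60, n=100)
print(f"({low:.3f}, {high:.3f})")
```

You would still check conditions and write the conclusion yourself; this only does the arithmetic.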

The confidence interval conclusion has not changed. (BE CAREFUL: We aren't talking about the true mean any more! Remember the proper parameter- check question 1 or last assignment for help.)

Question 5: Here's a hint: Whenever the question is "Why can't we" it means CHECK CONDITIONS!!

Question 6: Check out the equation on the right hand side of the equation sheet, in the row right under the heading "proportions".

Questions 9-12: We will be doing questions 13-16 in class, so these questions will be easier to answer after that!

Good Luck!
-Hillary

Wednesday, March 21, 2012

Exam 3 Review

Exam 3 is here! Remember this exam typically has the lowest averages, so we typically say it is the most challenging. This may or may not be true for you specifically, but prepare well.

What makes it challenging? It is a LOT of interpretation. Most people are comfortable with all the calculations. We test you on your understanding of the concepts and definitions.

Basically, this means know your definitions. Know them in and out. Know how to interpret them and recognize them.

Main Topics:
1. Tests of Significance
2. Confidence Interval Estimations
a. For both 1&2, need to know t and z tests, four step process
3. ANOVA

Side Topics
1. Type I/Type II errors
2. What type of procedure is this?
3. Two-sided confidence intervals
4. Sample size
5. Symbols


1. Tests of Significance

Definitions to KNOW:
  • Test of significance: An outcome that is unlikely to happen if a claim is true is good evidence that the claim is not true. (This is the theory of a test of significance. Remember the coin example?)

  • p-value: The probability of getting an x-bar as extreme or more extreme if the null hypothesis is true. (KNOW THIS. You will need to be able to INTERPRET this as well. Meaning if I give you an actual situation, you could put numbers into the right locations. You can see previous posts for a more in depth explanation of this).

  • Parameter: The mean of what you are finding out about the population. Okay, so this isn't really a definition, but be comfortable writing parameters. (Remember we need the MEAN and the POPULATION).

  • Null/Alt Hypothesis: Null Hypothesis: Statement of no change. Alternative: What we want to prove.

Procedure

Obviously you need to be comfortable with every part of the four step process for a test of significance (for both t and z tests).

Write the parameter, null and alternative hypothesis and state the level of significance. I've talked about these previously, but make sure you can do them.

Conditions
For a Z-test the conditions are (and are met by):
1. Randomization: Met through SRS OR RAT.
2. Normality: Met through CLT OR graph displaying approximately normality.
3. Sigma is known: Yes or no. They give it to you or they don't.

For a t-test conditions are (and are met by):
1. Randomization: Met through SRS OR RAT.
2. Normality: Met through CLT OR graph displaying no extreme skewness or outliers.

For a z-test, we use the equation z = (x-bar - mu) / (sigma/sqrt(n)). This is called the test statistic. We then go to the z-table and use it to get the p-value.

Things to remember about how to find z-test p-values:
  • One-sided test with Ha: Mu<#: Read p-value directly off table.
  • One-sided test with Ha: Mu>#: 1-table value = P-value.
  • Two-sided test with x-bar < null hypothesis mu: 2*(table value)= p-value.
  • Two-sided test with x-bar > null hypothesis mu: 2*(1-table value)=p-value.
Don't take my word for it though...draw the picture :)
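Here is a rough sketch of those four rules in Python, in case you want to check your table work. The function name `z_test_p_value` and the example numbers are my own assumptions for illustration; `NormalDist().cdf(z)` plays the role of the "table value" (the area to the left of z).

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(x_bar, mu0, sigma, n, alternative):
    """Return (z, p-value) for a one-sample z-test.

    alternative is "less", "greater", or "two-sided".
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))
    table_value = NormalDist().cdf(z)  # area to the left of z
    if alternative == "less":
        p_value = table_value
    elif alternative == "greater":
        p_value = 1 - table_value
    elif alternative == "two-sided":
        # Doubles whichever tail the test statistic falls in,
        # which covers both two-sided rules at once.
        p_value = 2 * min(table_value, 1 - table_value)
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value

# Hypothetical numbers: x-bar = 52, null mu = 50, sigma = 6, n = 36.
z, p_greater = z_test_p_value(52, 50, 6, 36, "greater")
_, p_two_sided = z_test_p_value(52, 50, 6, 36, "two-sided")
print(round(z, 2), round(p_greater, 4), round(p_two_sided, 4))
```

Notice the two-sided p-value is just double the one-sided one, which is exactly what the picture shows.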

For a t-test, we use the equation t = (x-bar - mu) / (s/sqrt(n)). This is called the t test statistic. We then go to the t-table and get a range for the p-value.

We like the t table because it already accounts for if it's a one-sided or two sided or if it is greater than or less than. The basic process to find the p-value is as follows:
  1. Take your t test-statistic
  2. Find your degrees of freedom (df) (n-1)
  3. Enter the table on your df row.
  4. Find the two values that sandwich your t test stat.
  5. Follow those two columns down to the bottom.
  6. Decide if you have a one-sided or two-sided test
  7. Read the two p-value values off
  8. Say "Number on right < P-value < Number on left"
I have an example on a previous post.

Conclude
  1. Compare p-value with alpha
  2. Reject/Fail to Reject Null (p-value < alpha, reject; p-value > alpha, fail to reject).
  3. Conclude in context.


2. Confidence Interval Estimation

Definitions to KNOW:

  • What is a confidence interval: It is used to estimate the mean. Gives reasonable values for the mean, etc.

  • Confidence Level: If the procedure were repeated many times, confidence level is the percentage of INTERVALS we would expect to contain the true mean. (This is an important one. Realize what confidence level is NOT: It is NOT how often our interval will contain mu, or x-bar or the percentage of time we are right).
  • Margin of Error: the amount we expect our mean (mu) to differ from our sample mean (x-bar).

Procedure

Write the parameter, choose confidence level. Conditions are the same as for test of hypothesis.

Z confidence interval

Use equation x-bar +/- z* (sigma/sqrt(n)).

Finding z*

  1. Go to the t-table
  2. Find the row with your confidence level in it (top of chart)
  3. Follow it down to the third row from the bottom labeled "z*".
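If you want to double-check the z* you read off the table, here is a small sketch. The helper names `z_star` and `z_interval` and the example numbers are assumptions for illustration; on the exam you will still use the table.

```python
from math import sqrt
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* that captures the middle `confidence` area."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def z_interval(x_bar, sigma, n, confidence):
    """x-bar +/- z* (sigma/sqrt(n))."""
    margin = z_star(confidence) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# The usual confidence levels, rounded like the table.
for level in (0.90, 0.95, 0.99):
    print(level, round(z_star(level), 3))

# Hypothetical numbers: x-bar = 100, sigma = 15, n = 25, 95% confidence.
print(z_interval(100, 15, 25, 0.95))
```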

t confidence interval

Use equation x-bar +/- t* (s/sqrt(n))

Finding t*

  1. Find degrees of freedom
  2. Go to where your df and your confidence level intersect
  3. That is your t*

Conclude

Use the cookie-cutter answer to conclude for confidence intervals.

"We are _______% confident that the true mean _____________ lies between (_____,_____)"


a. Difference between Z and t tests and four step process

We basically stepped through the four step process for both confidence intervals and tests of significance above. Wouldn't hurt to go over the different sections, though.

Remember, we use a z-test if sigma (population standard deviation) is KNOWN, and we use a t-test if sigma is UNKNOWN.

For multiple choice, it's helpful to really remember if you are using a t and z test. Remember, a z-test will give you a one-number p-value. A t-test will give you a range of values. Keep this in mind for your answers.

It may help you to remember the differences between the distributions (we talked about this briefly in class, check your notes). For example, the t-distribution has more area in the tails (less precise). As n increases, the t-distribution becomes more like the z-distribution in shape.


3. ANOVA (Analysis of Variance)

Anova is all about reading the output. Be sure you know how to:

  • Write Hypothesis
  • Find the p-value
  • check conditions
  • Conclude IN CONTEXT based on the confidence intervals
If you want to go through an example, you can see my post about Assignment 24.

SIDE TOPICS

1. Type I/ Type II Errors

You should know the definitions of these/ what they are. I can't really give you the graph on the blog, but check over your notes and be sure to understand/know where everything is on the graph.

Be able to know the connection between alpha and beta, and when we might make alpha big or small (relatively).

Type I Error: Rejecting a true null hypothesis
Type II Error: Failing to Reject a false null hypothesis
Power: The probability of rejecting a false null hypothesis (in other words, accepting a true alternative hypothesis)

Also be able to do something like this:
Ho: Hillary is not addicted to "drawsomething"
Ha: Hillary is addicted to "drawsomething"

If she is not addicted, she gets to keep the app. If she is addicted, the app gets deleted. Be able to write the errors and power for something like that.

2. What type of procedure is this?

Be able to recognize procedures (aka is this a "one sample t-test" or a "two sample z-test" etc).

Think of the chart we drew in class. You can choose one thing from each column.

One Sample | z | test (of significance)
Two Sample | t | confidence interval estimation
Matched Pairs | |


3. Two-sided confidence interval estimation

I hesitate to even write about this. This was a question on the homework we didn't have time to go over in class. This is when you can use a confidence interval to prove/disprove a hypothesis.

BE AWARE: This can ONLY be used in a TWO-SIDED TEST. And it should ONLY be used when specifically asked for. You should never, ever conclude a confidence interval this way. Concluding confidence intervals means using the "cookie-cutter" answer. Only if they ask you to do this should you, regardless if it is a two-sided test or not.

Okay. Read that warning a few times. Make sure you understand.

Anyways, Let's say we have the following hypothesis:
Ho: Mu=15
Ha: Mu does not equal 15.

Since we have a two-sided test, we could use a confidence interval to decide whether to reject or fail to reject the null hypothesis.

For example, let's say we got the interval (20, 28). Because the purpose of a confidence interval is to estimate the true mean, what could we say in this case? Clearly, 15 is not in the interval. So we could reject the null in this case.

Let's say we got an interval of (13,20)? Well, because fifteen is IN the interval, there is not sufficient evidence that mu is NOT 15. Thus, we fail to reject the null.
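The logic of those two examples fits in a few lines. This is only a sketch: the function name `two_sided_decision` is made up, and it assumes the confidence level matches alpha (for example, a 95% interval with alpha = 0.05).

```python
def two_sided_decision(interval, mu0):
    """Decide a TWO-SIDED test from a confidence interval.

    Assumes the interval's confidence level is 1 - alpha.
    """
    low, high = interval
    if low <= mu0 <= high:
        # The null value is a reasonable value for mu.
        return "fail to reject"
    return "reject"

print(two_sided_decision((20, 28), 15))  # 15 is outside the interval
print(two_sided_decision((13, 20), 15))  # 15 is inside the interval
```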

4. Sample Size Estimation

This is super easy. It's just an equation. You did it once on the homework. The equation you use is on the second row, far right side, of your equation sheet.

The only thing you need to know is that you always round UP. No matter what. The reason is that this is solving for a specific number of people you need to sample in order to meet specifications given to you. So if, for example, your equation gave you:

n= 24.3 people.

If you use 24 people, you don't meet the requirements. So you must round up to 25.
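Here is a quick sketch of the rounding rule. I am assuming the equation on your sheet is the usual one for a mean, n = (z* sigma / m)^2, where m is the margin of error you want; check it against your equation sheet. The function name and the numbers are made up for illustration.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n that meets the margin of error.

    Assumes n = (z* sigma / m)^2, then ALWAYS rounds up.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = (z_star * sigma / margin) ** 2
    return ceil(n_exact)

# Hypothetical numbers: sigma = 12, desired margin of error = 5.
print(required_sample_size(sigma=12, margin=5))

# The example from above: 24.3 people rounds UP to 25.
print(ceil(24.3))
```

Using `ceil` instead of `round` is the whole point: `round(24.3)` would give 24, which does not meet the requirements.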

5. Symbols

So symbols are on every exam. But a few of you brought them up in class, and since I like you guys, I'm going to do a section on them. Here are the symbols you should know:

µ (possible answers: mean of the sampling dist. of x-bar, population mean)
σ (population standard deviation)
x-bar (sample mean)
σ/√n (standard deviation of the sampling distribution of x-bar)
s/√n (standard error of the sampling distribution of x-bar, or just standard error)
s (standard deviation of a sample)

This is not necessarily a comprehensive list, but it should help.


Finally, don't forget the written portion. Be sure you:
  • Go over assignment 21. It was there for a reason.
  • Know how to do the four step for both two-sample and matched pairs. (this includes things like, how does the parameter change? When do you do each one? What do you graph? What equations do you use? We went over all this in class).

Remember, this is not necessarily a comprehensive list, just things I think will help.

GOOD LUCK!!!
-Hillary

Monday, March 19, 2012

Assignment 24

We already went through a bulk of assignment 24, but I realize it was very rushed, so I want to go through an example that isn't on your homework (but will clearly help you do all of your homework.) It also may be a good idea to go through this example as an exam review if you have already finished the homework. Just try to answer the questions before looking at my answers.
Problem (in honor of this Thursday night's event):
The Capitol wants to know what the average survival time in hours of the tributes from each of the twelve districts will be for the 74th hunger games. Sample data was collected from a random sample of 20 hunger games. Data can be considered to be normally distributed.

1. What type of study is this?
Observational. The hunger games (the past 20 sampled) have already happened. We are observing. (Although the hunger games themselves are certainly not an observational study...)

2. What are the hypotheses of this test?
Ho: µ1=µ2=µ3=µ4=µ5=µ6=µ7=µ8=µ9=µ10=µ11=µ12
Ha: Not all means are the same.

Below is the ANOVA output.


3. Is the normality condition met?
Yes, stated in the problem.

4. Is the randomization condition met?
Yes, stated in the problem. (could be met through graphing as well).

5. Is the equal variance condition met?
The largest Standard Deviation is: 28. The Smallest standard deviation is: 20

28/20=1.4 < 2. This condition is met.
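The same check as a tiny sketch, in case you have a long list of standard deviations to go through. The function name `equal_variance_ok` is made up, and I only plug in the largest and smallest values from this example.

```python
def equal_variance_ok(std_devs, max_ratio=2.0):
    """Rule of thumb: largest SD / smallest SD must be less than 2."""
    ratio = max(std_devs) / min(std_devs)
    return ratio, ratio < max_ratio

# Largest and smallest standard deviations from the example above.
ratio, ok = equal_variance_ok([28, 20])
print(ratio, ok)
```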

6. How does p-value compare to alpha?
We can find the p-value on the chart. It is 0. Because we have 95% confidence intervals, that gives us an alpha=0.05. Thus, 0<0.05, so the result is significant. Thus, at least one of the means differs.

7. What does this mean in context? (In other words, what do the confidence intervals tell us? or What means differ? or Are some districts likely to last longer? All of these ways are reasonable ways to ask the same question).
We can see that districts 3-11 have confidence intervals that significantly overlap. This shows that their true means are unlikely to differ by much. However, we can see that districts one and two's confidence intervals are much larger, showing that on average they last much longer. District 12's confidence interval shows us that it differs by them lasting much less time in the arena.

May the odds ever be in your favor...(can you tell I'm excited?)
-Hillary

Assignment 23

Sorry this is late. Wrote about it but it must have not saved...

For the first few questions, think about what each part of the procedure does for you.

You can have a:

One sample: (one population, one group of data)
Two Sample: (two populations, a treatment and no treatment, two sets of data)
Matched Pairs: (two sets of data, yet both are on the same individuals).

AND

t-test: sigma unknown
z-test: sigma known

For questions 3-12, we are doing a two sample test because we have men and women scores for the same thing.

Remember, the parameter differs from a one sample. What is the word we need in there that makes it different? HINT: It starts with a "D".

Remember this question is a great way to prepare for the exam written part.

Good Luck!
-Hillary

Thursday, March 15, 2012

Assignment 22

Way to go today! I realize it was long and boring, but you guys made it through it! Great news though, we are back on schedule now which means more homework in class! But onto assignment 22.

Something I forgot to mention in class is that we call sigma/sqrt(n) the standard deviation of x-bar.

Well s/sqrt (n) = standard error.

Questions 2-5 are just a simple one-sample t-test, so that should be simple enough.

Questions 9-15: What kind of procedure is this?

In this case, they didn't necessarily give you the differences, but you can clearly see that each student has TWO measurements: two thighs were hit with tennis balls. (And of course, I gave you this example in class, so that might have helped :) )

Be careful on calculating the t-test statistic. Remember we only care about the differences. You need to use statcrunch to do this.
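To see what "only the differences" means, here is a sketch with completely made-up measurements (NOT the homework data). You should still use statcrunch for the assignment; this just shows what is happening behind the scenes.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data: two measurements on each of five students.
first = [12.1, 10.4, 11.8, 9.9, 13.0]
second = [10.9, 10.1, 10.6, 9.7, 11.8]

# Matched pairs: collapse the two columns into ONE list of differences.
differences = [a - b for a, b in zip(first, second)]

d_bar = mean(differences)   # sample mean of the differences
s_d = stdev(differences)    # sample standard deviation of the differences
n = len(differences)

# From here on it is just a one-sample t-test on the differences (null: mean difference = 0).
t = (d_bar - 0) / (s_d / sqrt(n))
df = n - 1
print(round(d_bar, 2), round(t, 2), df)
```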

Good Luck!

Sunday, March 11, 2012

Assignment 21 + t-test Practice

We talked about assignment 21 a lot on Thursday, so I don't feel the need to go over it a whole lot. We answered all of the "Plan" section, and even one of the answers to the "Plan" section of part B.

A common mistake on this one is the "list and state how the conditions are met". This means you must STATE the condition (like Normality:) then after the colon, state how it is met for this particular problem.

Remember how the conditions change slightly for a t-test.

On part B, be careful with the t-test confidence interval: we are using t*, not Z*, which I think is a common mistake.

Okay, now onto t-test practice so you can actually solve the problems!

Let's do an example!

Hillary thinks that the statistics department isn't correctly stating the actual amount of late-fee money they receive from Stat 121 students. They claim that on average each test gives them 5,000 dollars. Hillary takes a simple random sample of 10 different testing periods over the last five years and gets a mean of 5,800 dollars and standard deviation of 750. Alpha = 0.05. Assume test fees are normally distributed.

STATE: Is the true mean income earned by Stat 121 late fees greater than 5,000 dollars?

Okay. So there are a few things we notice here off the bat. First where is the standard deviation from? It says in the problem it is from the sample, meaning that we know S, not sigma. This means we will be doing a t-test. Also, the STATE lets us know what our hypothesis will end up being (greater than).

For the sake of this problem, we aren't going to go through the entire Plan or Solve steps, only because the point of this problem is to help you learn how to use the t-table.

Ho: Mu=5000
Ha: Mu > 5000

t = (5800 - 5000) / [750/sqrt(10)] = 3.37

Now we go to the t chart. We need one more thing though before we use it: degrees of freedom. Remember, df= n-1.

So in this case, df= 10-1 = 9.

Go to the df = 9 row in the t-table. Find the two values that sandwich our t value.

I see that the t* values of 3.250 and 3.690 sandwich our t.
I then trace my fingers down to the "one sided t test" row (because we are greater than) and read off the two p value values: .005 and .0025

Thus I can say my p value is: .0025 < p value < .005.
Conclude as usual.
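Here is a sketch of the table lookup for this problem. I typed in the df = 9 row and the one-sided tail probabilities from a standard t-table (double-check them against your own table), and the function name `bracket_p_value` is made up for illustration.

```python
from math import sqrt

# One-sided tail probabilities (the column labels) and the df = 9 row
# of a standard t-table. Check these against your own table.
TAIL_PROBS = [0.25, 0.20, 0.15, 0.10, 0.05, 0.025, 0.02, 0.01, 0.005, 0.0025, 0.001, 0.0005]
DF9_ROW = [0.703, 0.883, 1.100, 1.383, 1.833, 2.262, 2.398, 2.821, 3.250, 3.690, 4.297, 4.781]

def bracket_p_value(t, row, tail_probs, two_sided=False):
    """Return (lower, upper) bounds on the p-value from one table row.

    Assumes t is positive and points in the direction of Ha.
    """
    factor = 2 if two_sided else 1
    lower, upper = 0.0, factor * 0.5
    for t_star, prob in zip(row, tail_probs):
        if t >= t_star:
            upper = factor * prob   # t is past this column
        else:
            lower = factor * prob   # first column that t does not reach
            break
    return lower, upper

t = (5800 - 5000) / (750 / sqrt(10))
lower, upper = bracket_p_value(t, DF9_ROW, TAIL_PROBS)
print(round(t, 2), lower, upper)
```

Both bounds are below alpha = 0.05, so the whole range leads to the same decision.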

Now let's try a t confidence interval.

The only thing that changes for a t test versus a z test is we are finding t* instead of z*.

Let's try finding t* using our problem above.

We need two things to find t*. One, degrees of freedom which we already found to be 9. Second is confidence level. Since we had an alpha= 0.05, it follows that our confidence level is 95%.


Now we simply find where our df and confidence level intersect. This is our t*.

From the table, I get t*= 2.262.
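Just to finish the thought, here is a sketch that plugs this t* into x-bar +/- t* (s/sqrt(n)) using the numbers from the late-fee problem. The variable names are my own.

```python
from math import sqrt

x_bar, s, n = 5800, 750, 10   # from the late-fee problem above
t_star = 2.262                # read from the t-table: df = 9, 95% confidence

margin = t_star * s / sqrt(n)
interval = (x_bar - margin, x_bar + margin)
print(round(margin, 2), tuple(round(v, 2) for v in interval))
```

Then you would conclude with the cookie-cutter sentence about the true mean late-fee income.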

Hope that helps! Remember don't wait to do assignment 21. The open lab will be overflowing on Wednesday. Get it done! As always if you have questions email me!

-Hillary

Assignment 20

Most of assignment 20 is stuff you are familiar with. Remember, since the first few questions are about "she knows the standard deviation should" we are still talking about sigma, thus doing z-tests. Try not to get confused! Otherwise, it's just the standard procedure.

Remember the definition for p-value:

"P-value is the probability of getting an x-bar as extreme or more extreme if the null hypothesis were true"

This definition has several pieces that you can "sub in" values for. Let's try an example.

Let's do the example we talked about in class, the pink cookies from the vending machine. We get an x-bar of 650 calories, and we are testing:

Ho: Mu=600 cal
Ha: Mu>600 cal.

We calculate a p-value of 0.03. If the question asked us to interpret the p-value in context, we might say:

"The probability is 3% of getting a value as high or higher than 650 calories if the true calories of the cookies was 600."

I highlighted the same colors of the sentence that correspond to the definition sentence. See how the main points are there and how you can recognize them? There are obviously different ways to re-arrange the sentence, but all the main parts have to be there.

The last questions (questions 8-11) are what we couldn't go over and you should have learned in class. Just some hints (we will go over what it REALLY means this Thursday.)

A type I error is REJECTING a TRUE null hypothesis.

A type II error is FAILING to REJECT a FALSE null hypothesis.

For example:

Ho: The cake is done.
Ha: The cake is not done.

In a type I error, we REJECT a null hypothesis that was actually TRUE. So we would say that the cake is not done when the cake was actually done (meaning we left it in the oven and overcooked it).

In a type II error, we would take the cake out, but it wasn't done yet (because we failed to reject the null, but it was false).

alpha=probability of a type I error
beta=probability of a type II error.

I hope that helps you answer the questions, although it contains none of the explanation. I think it will make more sense once we go over it.

Good luck!
-Hillary

Monday, March 5, 2012

Assignment 18

Assignment 18 is testing your knowledge on all the vocabulary we talked about on Thursday dealing with tests of significance.

Don't get confused by the wording on question 1. You know what the mean and standard deviation are of a sampling distribution of x-bar: remember, we always assume the null hypothesis is true.

Remember: Test-statistic = z-score.

I realize in question 6 that they do not give you an alpha. But even without an alpha, you should be able to answer the question. Which p-value gives us more evidence against the null (helps us accept the alternative)? What does it mean when p-value is low? When p-value is high? How do we get these p-values? If our test statistic (z) is further from the mean, does that give us a high or low p-value? DRAW IT OUT ON A GRAPH! It will help. I promise.

The rest of the questions step you through the process we talked about at the end of class.

Be careful on p-value in question 12: Our alternative hypothesis is greater than. What proportion do we want from the table?

Good Luck!
-Hillary

Assignment 17

We pretty much did all of assignment 17 in class on Thursday. Something I didn't mention:

Statistically significant means the result is unlikely to have happened due to chance alone. In other words, if the p-value is less than alpha and we reject the null hypothesis, our result was statistically significant.

-Hillary

Thursday, March 1, 2012

Assignment 16

I am SO SORRY this is so late. This assignment used to be Assignment 17 so I was expecting to be able to do more of it in class.

The key here is to remember that you only need to choose ONE POPULATION.

So, for example, you'd have one of these populations:

Middle-aged American women of a healthy weight and BMI that don't drink wine.

OR

Middle-aged American women of a healthy weight and BMI that do drink wine.

Once you choose one, roll with it for the rest of the time.

Thus when you write the parameter, just write about one of the populations (the one you chose). How does a parameter differ from the population? What word do we need to add? (Hint...it starts with "m").

Remember the difference between confidence level and confidence interval. Interval is what we actually report, LEVEL is that complicated definition we talked about. Here's a hint for question 5, you should NOT answer:

"That is the percentage of the time we will find mu in the interval". This is WRONG. Hopefully this WRONG answer will help you remember the right one :)

The last question we want to use our conclude cookie-cutter answer.

Good Luck!