Monday, January 30, 2012

Exam 1 Review

Here are some things to focus on for exam 1. This is not meant to be a comprehensive list. It is merely trying to help you focus on some important things to study.

1. Definitions

Definitions are a huge part of this exam. Be sure to understand them, not just memorize them. Some definitions to know:

  • Population versus Sample (can you identify the sample/population?)

  • Experiment versus Observational study (How can you tell? Be sure to know! Experiment = treatments applied; observational study = just observing, with no treatment imposed, e.g. looking at something that has already happened)

  • A control / comparison

  • Replication (be careful with this one!)

  • Explanatory Variable versus Response Variable (explanatory = treatments, response = a measurable thing on the individual)
2. Randomization

This is a huge part of all your tests. KNOW the randomized designs and how to recognize them.

  • Know the difference between RANDOMIZED SAMPLING and RANDOMIZED DESIGNS.

  • Randomized Sampling includes: SRS (Simple Random Sample), Stratified Sampling, and Multi-Stage Sampling

  • Randomized Experimental Designs: CRD (completely randomized design), Block Design and Matched Pairs.

  • Realize that if it's an experiment, we care more about the randomized experimental design than the sampling design.
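If it helps to see the difference between RANDOMIZED SAMPLING and RANDOMIZED DESIGNS in action, here is a rough Python sketch using only the standard-library `random` module. The function names and the student lists are made up for illustration, and stratified and multi-stage sampling are left out to keep it short. The thing to notice: sampling decides WHO gets studied, while the design decides WHICH TREATMENT each subject gets.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every possible set of n individuals has the same chance of being chosen."""
    return rng.sample(population, n)

def completely_randomized_design(subjects, treatments, rng):
    """CRD: shuffle all subjects together, then deal them out evenly to the treatments."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

def block_design(blocks, treatments, rng):
    """Block design: a separate CRD inside each block of similar subjects."""
    return {name: completely_randomized_design(members, treatments, rng)
            for name, members in blocks.items()}

rng = random.Random(121)  # fixed seed only so the example is reproducible
students = [f"student{i}" for i in range(1, 21)]  # hypothetical population of 20

# Randomized SAMPLING: choose WHO is in the study.
sample = simple_random_sample(students, 8, rng)

# Randomized DESIGN: choose WHICH TREATMENT each chosen subject gets.
crd = completely_randomized_design(sample, ["treatment", "control"], rng)

# Hypothetical blocks (e.g., class standing), randomized separately.
blocks = {"freshmen": ["ann", "ben", "cal", "dee"], "seniors": ["eve", "fay", "gus", "hal"]}
block_groups = block_design(blocks, ["treatment", "control"], rng)

print(sample)
print(crd)
print(block_groups)
```

A block design here is just a CRD run separately inside each block, which is why the blocks have to be formed before any randomizing happens.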
3. Graphing

  • Know what types of graphs are categorical and quantitative (and which ones we like and how to recognize them)

  • Five-number summary: how to find it and how to recognize it on a box plot. (The percentage between Q1 and Q3 is....)

  • Shape, Center and Spread of a graph. (Shape: skewed right/left, etc.; Center: mean/median and when to use each; Spread: IQR and standard deviation. Know things about both.)
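Here is one way to compute the five-number summary and both measures of spread in Python, as a sketch. It assumes the "median of each half" method for the quartiles (the one you'd do by hand); calculators and other software sometimes use a different quartile rule, so their Q1 and Q3 can come out slightly different. The data list is made up for illustration.

```python
from statistics import median, stdev

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumes the "median of each half" method: Q1 is the median of the values
    below the overall median and Q3 the median of the values above it (the
    middle value is left out of both halves when n is odd).
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]  # skip the middle value when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def spread(data):
    """Return (IQR, sample standard deviation)."""
    _, q1, _, q3, _ = five_number_summary(data)
    return q3 - q1, stdev(data)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up numbers for illustration
print(five_number_summary(data))  # (1, 2.5, 5, 7.5, 9)
print(spread(data))
```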
4. Z-Scores!
  • Check my post below for a comprehensive review of z-score equation problems.

Remember, many things are concept-based, so make sure you understand why, not just how.

Good Luck!
-Hillary

Z-Score Review

Z-scores are going to be an important part of the upcoming test (well, and the rest of the semester). The way I see it, there are three main types of z-score problems.

1. A "higher than" or "lower than" Problem: These types of problems give you an "x" and want you to find the percentage of something higher or lower than that value.

2. An "in between" problem: These types of problems want you to find the percentage or proportion between two x-values.

3. A "give you the percentage/proportion" problem: These problems give you a proportion and ask you to work backwards to solve for x.

Using the examples in class (although the numbers may be slightly different), I'll give you a problem of each type. We talked about Jimmer's statistics with the Kings, according to ESPN.

Mu: 8.8 ppg (points per game)
Sigma: 1.4 ppg

Type 1: Find the proportion of games that Jimmer scores above 11 ppg.

Type 2: Find the proportion of games that Jimmer scores between 7 and 11 ppg.

Type 3: Find the threshold ppg for the top 5% of all of Jimmer's games.


Solution:

Type 1. This question is asking us for a proportion above. Thus, we plug the numbers into our z-score equation: z = (11-8.8)/1.4 = 1.57
We take this z-score and look it up on the z-table. From the table, we read 0.9418. But since we want the proportion above, and the z-table only gives us the proportion to the left (underneath), we subtract the table value from one. Thus, our answer is 1 - 0.9418 = 0.0582, or 5.82%.
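If you want to check a Type 1 answer, Python's built-in `statistics.NormalDist` (Python 3.8 or later) can stand in for the z-table. This is just a sketch for checking your work, not how you'll do it on the exam: `cdf` gives the area to the LEFT, just like the table, so we still subtract from one for "above".

```python
from statistics import NormalDist

mu, sigma = 8.8, 1.4   # Jimmer's mean and standard deviation (ppg) from the post
x = 11

z = round((x - mu) / sigma, 2)   # 1.57, rounded to two decimals like the table
left = NormalDist().cdf(z)       # area to the LEFT of z (what the z-table gives)
above = 1 - left                 # we want ABOVE, so subtract from one
print(f"z = {z}, table = {left:.4f}, above = {above:.4f}")

# Skipping the rounding step gives a slightly different answer (about 0.0580).
exact = 1 - NormalDist(mu, sigma).cdf(x)
print(f"unrounded: {exact:.4f}")
```

Rounding z to 1.57 first reproduces the table answer of 0.0582, so a difference in the last digit is just rounding.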

Type 2. This question is asking us to find the proportion between two values. This might seem difficult, but if you draw it out it will make more sense. The very first thing you need to do in between problems is compute a separate z-score for each x-value. In this case, our x-values are 7 and 11. Since we already did the problem for 11, all we need is to do it for 7: z = (7-8.8)/1.4 = -1.29. Looking this up on the z-table, we get a proportion of 0.0985.

Now think of the graph. The table gives us the area to the left of 7 and the area to the left of 11. Draw both of them on a normal curve on a piece of paper. See what we have to do? It's clear from the graph that we just need to subtract the smaller proportion from the larger proportion.

So, 0.9418-0.0985= 0.8433 = 84.33%
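The same idea for the "in between" problem, again as a sketch using `statistics.NormalDist` in place of the table: look up both left areas, then subtract the smaller from the larger.

```python
from statistics import NormalDist

mu, sigma = 8.8, 1.4
std_normal = NormalDist()  # mean 0, sd 1, like the z-table

z_low = round((7 - mu) / sigma, 2)    # -1.29
z_high = round((11 - mu) / sigma, 2)  #  1.57

# Area to the left of 11 minus area to the left of 7 = area between them.
between = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(f"{std_normal.cdf(z_high):.4f} - {std_normal.cdf(z_low):.4f} = {between:.4f}")
```

This prints `0.9418 - 0.0985 = 0.8433`, matching the hand calculation.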

Type 3: The last type is typically the hardest. They are giving us a proportion. Where do we find proportions? That's right, in the middle of the z-table. We must work backwards: we use the proportion to find a z-score, then solve our equation for x.

This one is even trickier, though. Because we want the top five percent, we have to remember to look up the area to the LEFT, since that is what the table tells us. Thus, we look up 95% (0.95) in the table. The closest proportion I can find is .9505 (you could also use .9495, which is equally close; going above or below doesn't matter, as long as it's the closest). This corresponds to a z-score of 1.65.

Plugging it into my equation: 1.65 = (x-8.8)/1.4, so x = 8.8 + 1.65*1.4 = 11.11 ppg.
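For Type 3 we go backwards, which is what `NormalDist.inv_cdf` does: give it a left-area proportion and it returns the matching z-score. A sketch:

```python
from statistics import NormalDist

mu, sigma = 8.8, 1.4

# "Top 5%" means 95% of the area is to the LEFT, so ask for the 0.95 point.
z = NormalDist().inv_cdf(0.95)   # about 1.645; the table's closest entries give 1.64 or 1.65
x = mu + z * sigma               # solve z = (x - mu) / sigma for x
print(f"z = {z:.3f}, threshold = {x:.2f} ppg")

# Same idea in one call, working directly on the ppg scale:
print(f"{NormalDist(mu, sigma).inv_cdf(0.95):.2f} ppg")
```

This gives a threshold of about 11.10 ppg. The table answer of 11.11 differs only because the table forces us to pick 1.64 or 1.65.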


Hopefully that helps with the concept. Here are some more practice questions I'd try out (the numbers don't correspond to the types; try to figure out which type each one is for yourself). The answers are listed afterward, with some tips in case you got one wrong.

Let's look at the average number of months of dating before engagement at BYU. Let's say the average is 5 months with a standard deviation of 2 months. Find the following.

1. The number of months until engagement that marks the cutoff for the bottom 10 percent of BYU students.

2. The percentage of students who get engaged at 3 months of dating or less.

3. The proportion of students who get engaged at 9 months or more.

4. The number of months that marks the cutoff for the top two percent of students.

5. The proportion of students who get engaged between 4 and 13 months.


Try these on your own! But here are the answers:


1. Type 3 problem. -1.28 is the z-score from the table, so the x-value you get is x = 5 + (-1.28)(2) = 2.44 months.

2. Type 1 problem. Since it is "or less" you keep the proportion from the table. Z-score= -1. Answer: 0.1587.

3. Type 1 problem. This is an "or more" so you need to subtract the proportion from 1. Z-score= 2. Answer: 0.0228.

4. Type 3 problem: This is "in the top," so we look up 98% in the table. The closest proportion is 0.9798, which gives a z-score of 2.05. Answer: x = 5 + 2.05(2) = 9.1 months.

5. Type 2 problem. Z-score for 4: -0.5, proportion: 0.3085. Z-score for 13: 4. Proportion: ... Wait... what? Four? But that isn't on our table! That's okay. What is the proportion for four? It just means that (essentially) EVERYTHING is under it on the graph, so our proportion is 1 (the whole area). Similarly, if we got a z-score of -4, what would we assume? (No area, so = 0.)

So, now we subtract. 1- 0.3085 = 0.6915 = 69.15%.
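To check your practice answers, here is a short sketch that runs all five with `statistics.NormalDist` instead of the table. It uses unrounded z-scores, so expect small differences in the last digit: about 2.44, 0.1587, 0.0228, 9.11, and 0.6914 (the table gives 0.6915 because it treats z = 4 as exactly 1).

```python
from statistics import NormalDist

months = NormalDist(mu=5, sigma=2)  # BYU dating-to-engagement example from the post

answers = {
    1: months.inv_cdf(0.10),             # cutoff for the bottom 10%
    2: months.cdf(3),                    # 3 months or less
    3: 1 - months.cdf(9),                # 9 months or more
    4: months.inv_cdf(0.98),             # cutoff for the top 2%
    5: months.cdf(13) - months.cdf(4),   # between 4 and 13 months
}
for number, value in answers.items():
    print(number, round(value, 4))
```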

Some things to remember:

NEVER. NEVER EVER. Subtract z-scores. Or add them. ONLY SUBTRACT OR ADD PROPORTIONS.

Be careful about "above" or "below". Know what the table shows you. Draw pictures if in doubt.
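Here is a quick sketch of why, using the Jimmer "in between" numbers and `statistics.NormalDist().cdf` as the z-table: subtracting the proportions gives the right answer, while subtracting the z-scores first and then looking up the result gives something completely different.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # area to the left of z, like the z-table

z7, z11 = -1.29, 1.57   # z-scores for 7 and 11 ppg from the Jimmer example

right = phi(z11) - phi(z7)   # subtract PROPORTIONS: about 0.8433
wrong = phi(z11 - z7)        # subtract z-scores first, then look up: about 0.9979
print(f"right: {right:.4f}  wrong: {wrong:.4f}")
```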

Good Luck!

Tuesday, January 24, 2012

Assignment 6

We will be talking about the concepts of Standard Deviation and the 68-95-99.7 rule in class on Thursday.

I won't have time to go over the IQR Rule, so hopefully I can put a good explanation of it here.

The IQR Rule is a way we determine if we have outliers. As we know, the IQR is the inter-quartile range, or (Q3-Q1). Here is the rule:

If an observation is GREATER than (1.5 multiplied by IQR) plus Q3, then the observation is a high outlier. If an observation is LESS than Q1 minus (1.5 multiplied by IQR), then the observation is a low outlier. In mathematical form:

If Observation < Q1 - 1.5*(Q3-Q1), then the observation is a low outlier, and
If Observation > Q3 + 1.5*(Q3-Q1), then the observation is a high outlier.

Let's do an example to make sure it makes sense. Let's say that my five number summary is as follows: 2, 10, 15, 20, 50. I want to know if 50 or 2 is an outlier. Let's do the math.

(Q3-Q1) = (20-10)=10.
1.5* 10 = 15

To Check for the high outlier:
Q3+15=20+15=35. Since 50>35, 50 is an outlier.

To check for the low outlier:
Q1-15 = 10-15 = -5. Because 2 is NOT less than -5, 2 is NOT an outlier.
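The IQR rule is easy to write as a couple of small Python functions (a sketch; the function names are made up). This reproduces the example above from the five-number summary alone.

```python
def iqr_outlier_fences(q1, q3):
    """Return (low fence, high fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def classify(observation, q1, q3):
    """Label an observation using the fences from iqr_outlier_fences."""
    low, high = iqr_outlier_fences(q1, q3)
    if observation < low:
        return "low outlier"
    if observation > high:
        return "high outlier"
    return "not an outlier"

# Five-number summary from the example: min, Q1, median, Q3, max
minimum, q1, med, q3, maximum = 2, 10, 15, 20, 50
print(iqr_outlier_fences(q1, q3))            # (-5.0, 35.0)
print(maximum, classify(maximum, q1, q3))    # 50 high outlier
print(minimum, classify(minimum, q1, q3))    # 2 not an outlier
```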

Hopefully that helps with that concept.

Questions 9-13 should not be that difficult. These all deal with concepts we discussed last Thursday, particularly the mean and median ones. Remember, the median cuts the data in HALF (not necessarily where the peak is). Then remember where the mean goes based on skewed data.

Hope that helps,
Hillary

Assignment 5

We went through most of the assignment already in class (the ones we didn't are very similar in nature to the rest of the questions).

The only question I have advice about is Question 13. You can do this by sight: look at the graphs. When we say "clear," we mean: is there a data point that is further away from the others by quite a margin? Don't overthink it.

-Hillary

Tuesday, January 17, 2012

Assignment 4

We will go through the other difficult ones in class. Here is some help on the rest:

Questions 5-7: This is a really long, hard problem if done by hand, so I already plotted the data for you to make your life easier :). Be sure you understand HOW I plotted this data and that you can interpret the graph. (At times you will be required to plot your own data, for example on the written portion of an exam, so please be sure you understand how to graph.)

0 | 4
0 | 66777888
1 | 11233
1 | 78999
2 | 012334
2 | 578899
3 | 0001
3 | 88
4 |
4 |
5 |
5 | 7

See how I "split the stems" as it asked? This keeps any one line of the chart from getting super long. Splitting means giving each stem two lines (like the double 2's or double 1's shown above): leaves 0-4 go on the first line and leaves 5-9 on the second. Realize that my scale is 1|7 = 1700. I also rounded to the nearest hundred, as asked.
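If you want to see HOW the split stems are built, here is a minimal Python sketch. It assumes the values have already been rounded to the nearest hundred, uses the thousands digit as the stem and the hundreds digit as the leaf, and puts leaves 0-4 on the first line of each stem and 5-9 on the second. The values are just the ones read back off the plot above, so running it should reproduce that plot.

```python
def split_stemplot(values):
    """Build split-stem lines for values already rounded to the nearest hundred.

    Stem = thousands digit, leaf = hundreds digit (scale 1|7 = 1700).
    Each stem gets two lines: leaves 0-4 on the first, 5-9 on the second.
    """
    hundreds = sorted(v // 100 for v in values)
    lines = []
    for stem in range(hundreds[0] // 10, hundreds[-1] // 10 + 1):
        for low, high in ((0, 4), (5, 9)):
            leaves = "".join(str(h % 10) for h in hundreds
                             if h // 10 == stem and low <= h % 10 <= high)
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# The rounded values read back off the plot above.
values = [400, 600, 600, 700, 700, 700, 800, 800, 800,
          1100, 1100, 1200, 1300, 1300, 1700, 1800, 1900, 1900, 1900,
          2000, 2100, 2200, 2300, 2300, 2400, 2500, 2700, 2800, 2800, 2900, 2900,
          3000, 3000, 3000, 3100, 3800, 3800, 5700]
print("\n".join(split_stemplot(values)))
```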

Use this to find the rest of the answers. I understand it may be difficult to say the exact shape (is it only "slightly" or "totally" skewed?), but the medians are different, so you should be able to determine the answer. (You may ask: if you rounded, how am I to know the actual median? Say the median fell on a 2|8 (it doesn't; this is just an example). If it was the first 2|8, I would use the number "2828" because it is the lower of the 2800 numbers. Hope that makes sense.)
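Going the other way, reading a stemplot back into numbers, is what you need for the median. Here is a sketch that turns each "stem | leaves" line into values and then uses the standard-library `statistics` module. Remember the values are rounded to the nearest hundred, so the median you get this way is approximate, which is exactly the issue above. Running it prints n, the median, and the mean; comparing the mean to the median is a quick check on the direction of the skew.

```python
from statistics import mean, median

# The split-stem plot from this post, one line per stem (scale: 1|7 = 1700).
STEMPLOT = """\
0 | 4
0 | 66777888
1 | 11233
1 | 78999
2 | 012334
2 | 578899
3 | 0001
3 | 88
4 |
4 |
5 |
5 | 7"""

def read_stemplot(text, stem_unit=1000, leaf_unit=100):
    """Turn "stem | leaves" lines back into (rounded) data values."""
    values = []
    for line in text.splitlines():
        stem, _, leaves = line.partition("|")
        for leaf in leaves.strip():
            values.append(int(stem) * stem_unit + int(leaf) * leaf_unit)
    return values

values = read_stemplot(STEMPLOT)
print("n =", len(values))
print("median =", median(values))
print("mean =", round(mean(values), 1))
```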

Hope that helps :)

-Hillary

Assignment 3

Sorry this is so late. The next assignments will be ON TIME (meaning at least a day or two before they are due). I was sick this weekend and just didn't get as much done as usual!

We talked about this assignment already, but just some reminders.

Make sure you have an actual experiment. This means applying a treatment (be sure you aren't just observing).

Answer the whole question. (This is especially true for #2. When can we infer?)

Careful on the response variable. Make sure it is measurable and on the individual.

Good Luck!
-Hillary

Wednesday, January 11, 2012

Welcome! and Assignment 2

Welcome to Stat 121! Hopefully this blog will be useful to you throughout the semester. As a reminder, come here for homework questions we didn't cover in lab as well as helpful reviews for all of the exams.

Some reminders on housekeeping duties at the beginning of the semester:

  • Be sure to register your iClicker (using your netID! NOT your 9-digit number!)
  • Buy StatsPortal, if you haven't
  • Put your netID into StatsPortal at the top of the page

Now, on to some help with Assignment 2: