Lab 2H

Directions: Follow along with the slides and answer the questions in **red** font in your journal.

- The *normal distribution* is a curve we often see in real data.
  - We see it in people’s blood pressures and in measurement errors.

- When data appear to be *normally distributed*, we can use the *normal model* to:
  - Simulate *normally distributed* data.
  - Easily compute probabilities.

- In this lab, we’ll look at some previous data sets to see if we can find data that are roughly normally distributed.
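
The two uses of the normal model listed above can be sketched in base R with `rnorm` (simulate) and `pnorm` (compute a probability). The mean of 120 and standard deviation of 15 are made-up values for illustration, not figures from any data set in this lab.

```r
# Illustrative values only: a hypothetical mean and standard deviation.
set.seed(123)                             # make the simulation reproducible
sim <- rnorm(1000, mean = 120, sd = 15)   # simulate 1000 normally distributed values

# Probability, under this normal model, that a value falls below 135.
p_below <- pnorm(135, mean = 120, sd = 15)

# Proportion of the simulated values that fall below 135, for comparison.
prop_below <- mean(sim < 135)

print(c(model = p_below, simulated = prop_below))
```

When the normal model is a good description, the simulated proportion should land close to the probability that `pnorm` reports.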

- The normal distribution is *symmetric about the mean*:
  - The `mean` is found in the very center of the distribution.
  - And the curve looks the same to the left of the mean as it does on the right.

- Use the following to draw a normal distribution:
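
A minimal sketch of such a call, assuming the `plotDist` function named in the hint later in this lab is the one from the `mosaic` package and that the package is installed. The values `mean = 0` and `sd = 1` are only a starting point:

```r
# Assumption: plotDist() is provided by the mosaic package.
library(mosaic)

# Draw a normal curve centered at 0 with a standard deviation of 1.
plotDist("norm", mean = 0, sd = 1)
```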

- To draw a normal curve, we need to know exactly 2 things:
  - The `mean` and `sd`.

- The `sd`, or *standard deviation*, is a measure of spread that’s similar to the `MAD`.
- **Which part of the normal curve changes when the value of the** `mean` changes?
- **Which part of the normal curve changes when the value of the** `sd` changes?
  - *Hint*: Try changing the `mean` and `sd` values in the `plotDist` function.
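
To see how the `sd` compares with the `MAD` as a measure of spread, here is a small base-R sketch. It assumes `MAD` refers to the mean absolute deviation and computes it by hand rather than calling a `MAD` function. The eight data values are made up for illustration.

```r
# Made-up values for illustration only.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

center <- mean(x)                       # the mean of the values
mad_by_hand <- mean(abs(x - center))    # average distance from the mean
sd_value <- sd(x)                       # sample standard deviation

print(c(mean = center, MAD = mad_by_hand, sd = sd_value))
```

Both numbers describe how far values typically fall from the mean. They are on the same scale but are not equal.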

- Load the `cdc` data and use the `histogram` function to answer the following:
  - **Based on what you know about these variables, which of the variables do you think have distributions that will look like the normal distribution?**
  - **Make histograms of these variables. Which ones look like the normal distribution?**
  - *Hint*: To help answer this question, try including the option `fit = "normal"` in the `histogram` function. You might also try faceting by `gender`.
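
The hint can be sketched as follows. This assumes the formula-style `histogram` function with a `fit` option comes from the `mosaic` package and that the `cdc` data ship with the `mobilizr` package. Both package names are assumptions, and `weight` is used only as an example variable.

```r
# Assumption: mosaic provides the formula-style histogram() with a fit option.
library(mosaic)
# Assumption: mobilizr provides the cdc data set.
library(mobilizr)
data(cdc)

# Histogram of one variable with a fitted normal curve overlaid.
histogram(~weight, data = cdc, fit = "normal")

# The same histogram, faceted by gender.
histogram(~weight | gender, data = cdc, fit = "normal")
```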

- Data scientists like using normal models because they often resemble real data.
  - *But not EVERYTHING is normally distributed.*

- As a data scientist in training, you must decide when a normal model seems appropriate.
- No model is ever perfect 100% of the time.
- If you choose a model, you should be able to justify why you chose it.

**For each of the following, determine which, if any, appear to be normally distributed. Explain your reasoning:**

- **The** `weight` of people in our `cdc` data, faceted by `gender`.
- **The difference in** `mean` weights between `Male`s and `Female`s for 500 random shuffles.
- **The difference in** `median` weights between `Male`s and `Female`s for 500 random shuffles.
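
One way to produce 500 shuffled differences is sketched below in base R on simulated stand-in data. The `group` and `weight` columns are generated, not taken from `cdc`, and the helper `shuffled_diff` is hypothetical. Shuffling the group labels breaks any link between group and weight, and the statistic is recomputed after each shuffle.

```r
set.seed(42)  # make the shuffles reproducible

# Simulated stand-in data; swap in the real data when working the lab.
dat <- data.frame(
  group  = rep(c("Male", "Female"), each = 100),
  weight = c(rnorm(100, mean = 70, sd = 12), rnorm(100, mean = 60, sd = 10))
)

# Hypothetical helper: shuffle the group labels, then return the
# difference (Male minus Female) in the chosen statistic of weight.
shuffled_diff <- function(data, stat = mean) {
  shuffled <- sample(data$group)  # random permutation of the labels
  stat(data$weight[shuffled == "Male"]) - stat(data$weight[shuffled == "Female"])
}

# Repeat the shuffle 500 times for each statistic.
diff_means   <- replicate(500, shuffled_diff(dat, stat = mean))
diff_medians <- replicate(500, shuffled_diff(dat, stat = median))

# Histograms of the shuffled differences, ready to inspect by eye.
hist(diff_means, main = "Shuffled differences in mean weight")
hist(diff_medians, main = "Shuffled differences in median weight")
```

Passing the statistic as an argument keeps the shuffle logic identical for the mean and the median, so any difference in the two histograms comes from the statistic itself.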