Lab 2H

Directions: Follow along with the slides, completing
the questions in **blue** on your
computer, and answering the questions in **red** in your
journal.

- The **normal distribution** is a curve we often see in real data.
  - We see it in people’s blood pressures and in measurement errors.

- When data appear to be *normally distributed*, we can use the *normal model* to:
  - Simulate *normally distributed* data.
  - Easily compute probabilities.
- In this lab, we’ll look at some previous datasets to see if we can find data that are roughly normally distributed.
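
Both uses of the normal model can be sketched in base R. This is a minimal illustration rather than part of the lab itself: the blood-pressure mean of 120 and standard deviation of 10 are made-up values, and `sim_bp`, `p_below_130`, and `p_between` are hypothetical names.

```r
set.seed(123)  # makes the simulated values repeatable

# Simulate 1000 normally distributed values (hypothetical blood pressures).
sim_bp <- rnorm(1000, mean = 120, sd = 10)

# Probability that a value from this normal model falls below 130
# (one sd above the mean), which is about 0.84.
p_below_130 <- pnorm(130, mean = 120, sd = 10)

# Probability that a value falls between 110 and 130
# (within one sd of the mean), which is about 0.68.
p_between <- pnorm(130, mean = 120, sd = 10) - pnorm(110, mean = 120, sd = 10)

p_below_130
p_between
```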

- The normal distribution is *symmetric about the mean*:
  - The `mean` is found in the very center of the distribution.
  - And the curve looks the same to the left of the mean as it does on the right.
- Use the following to draw a normal distribution:
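
A minimal sketch of one way to draw the curve, assuming the `mosaic` package (which provides `plotDist`) is installed. The values 0 and 1 for `mean` and `sd` are illustrative choices only.

```r
# Assumes the mosaic package is installed; it provides plotDist().
library(mosaic)

# Draw a normal curve with an illustrative mean of 0 and sd of 1.
# Any mean and any positive sd can be used in their place.
plotDist("norm", mean = 0, sd = 1)
```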

- To draw a normal curve, we need to know exactly 2 things:
  - The `mean` and `sd`.
- The `sd`, or **standard deviation**, is a measure of spread that’s similar to the `MAD`.
- **Which part of the normal curve changes when the value of the** `mean` changes?
- **Which part of the normal curve changes when the value of the** `sd` changes?
  - *Hint*: Try changing the `mean` and `sd` values in the `plotDist` function.

- Load the `cdc` data and use the `histogram` function to answer the following:
  - **Think about the** `height` and `weight` variables. Based on what you know about these variables, which of the variables do you think have distributions that will look like the normal distribution?
  - **Make histograms of these variables. Which ones look like the normal distribution?**
  - *Hint*: To help answer this question, try including the option `fit = "normal"` in the histogram function. You might also try faceting by `gender`.
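
The hint can be sketched on made-up data, which leaves the real `cdc` questions for you to answer. This assumes the `mosaic` package is installed and loaded, which lets `histogram` accept a `fit` argument. `fake_data`, `group`, and `measurement` are hypothetical names, not columns of `cdc`.

```r
# Assumes the mosaic package is installed.
library(mosaic)

set.seed(42)
# Hypothetical data: 200 simulated measurements in each of two groups.
fake_data <- data.frame(
  group = rep(c("A", "B"), each = 200),
  measurement = c(rnorm(200, mean = 50, sd = 5),
                  rnorm(200, mean = 60, sd = 5))
)

# One histogram of all values with a fitted normal curve drawn on top.
histogram(~ measurement, data = fake_data,
          type = "density", fit = "normal")

# Faceting: one panel per group, each with a fitted normal curve.
histogram(~ measurement | group, data = fake_data,
          type = "density", fit = "normal")
```

With the lab's data, the same pattern would use a `cdc` variable in place of `measurement` and `gender` in place of `group`.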

- Data scientists like using normal models because they often resemble real data.
  - *But not EVERYTHING is normally distributed.*

- As a data scientist in training, you must decide when a normal model
seems appropriate.
- No model is ever perfect 100% of the time.
- If you choose a model, you should be able to justify why you chose it.

- **For each of the following, determine which, if any, appear to be normally distributed. Explain your reasoning:**
  - **The difference in** `percentages` between male and female survivors in a slasher film for 500 random shuffles.
  - **The difference in** `median` fares between survivors and non-survivors on the Titanic for 500 random shuffles.
  - **The difference in** `mean` fares between survivors and non-survivors on the Titanic for 500 random shuffles.

- Hint: Refer to Lab 2E and 2F.
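
As a reminder of what "500 random shuffles" produces, here is a minimal base R sketch on made-up data. It is not the code from Lab 2E or 2F: `outcome`, `label`, and `shuffled_diffs` are hypothetical names, and the simulated values stand in for a real dataset.

```r
set.seed(2024)  # makes the shuffles repeatable

# Hypothetical data: 100 numeric outcomes and a two-level group label.
outcome <- rexp(100, rate = 1 / 30)
label <- rep(c("yes", "no"), each = 50)

# Shuffle the labels 500 times; each time, record the difference in means.
shuffled_diffs <- replicate(500, {
  shuffled <- sample(label)
  mean(outcome[shuffled == "yes"]) - mean(outcome[shuffled == "no"])
})

# A histogram of the 500 differences shows the shape you need to judge.
hist(shuffled_diffs, main = "Differences in means from 500 shuffles")
```

Replacing `mean` with `median` inside the braces gives the matching sketch for a difference in medians.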