Lab 2F

Directions: Follow along with the slides and answer the questions in **red** font in your journal.

- In the previous lab, we learned that by using a `do`-loop and the `shuffle` function, we could simulate randomly shuffling our data many times.
  - This helps us determine how likely it is that a difference between groups is due to chance.

- For this lab, we will extend these ideas to *numerical* variables by using random shuffling and numerical summaries.
- The question we will investigate in this lab is:

*Is there any evidence to suggest that those who survived paid a higher fare than those who died?*

- We will consider wealthier passengers to be those that paid a higher `fare` for their ticket.

- The Titanic was a ship that sank en route to the U.S.A. from England after hitting an iceberg in 1912.
  - At the time, it was claimed that the Titanic was *unsinkable* … it wasn’t … because it did.

- Use the `data` function to load the `titanic` passenger and survival data.
- Create a boxplot of the `fare`s paid by passengers and facet the plot based on whether the passenger survived or not.
  - **Based on the plot, do you believe that passengers who paid a higher fare on the Titanic were more likely to survive? Explain why and describe how certain you are of being correct.**

- Start your analysis by calculating how much more the *typical* survivor paid than the *typical* non-survivor in our data.
  - Based on the distributions of fares paid, which numerical summary that describes the *typical* value might be preferred?
  - **What was the *typical* fare paid by survivors? Non-survivors? How much more did the typical survivor pay?**
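
As a minimal sketch of this calculation, the base R code below finds the median fare in each survival group and subtracts one from the other. The `toy` data frame and its values are made up for illustration (they are not the `titanic` data), and the group labels `"Yes"` and `"No"` are an assumption about how survival is coded. The median is used only because a later step of this lab names it; the same code works with another summary function.

```r
# Hypothetical toy data, NOT the real titanic passenger data.
toy <- data.frame(
  survived = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  fare     = c(80, 26, 13, 150, 8, 7, 13, 10, 30, 9)
)

# Median fare within each survival group.
group_medians <- tapply(toy$fare, toy$survived, median)

# Typical survivor fare minus typical non-survivor fare.
observed_diff <- group_medians[["Yes"]] - group_medians[["No"]]

print(group_medians)
print(observed_diff)
```

With the real data, the same idea would apply to the `fare` variable and the survival variable in `titanic`, whatever labels that variable actually uses.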

- Use the `do` and the `shuffle` functions to shuffle the passengers’ survival status 500 times.
  - Use the previous lab if you need some help on how to do this.
  - For each shuffle, compute each group’s `median` fare paid.
  - `Assign` your shuffled data the name `shuffled_survival`.

- After shuffling your data, use the `mutate` function to create a variable called `diff`, which is the median fare of survivors minus the median fare of non-survivors. (Assign your mutated data the name `shuffled_survival` again.)
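
The shuffle-and-difference steps can be sketched in base R, assuming `replicate` and `sample` as stand-ins for the lab's `do` and `shuffle` so that the example runs without extra packages. The `toy` data, the helper `median_diff`, and the name `prop_as_large` are hypothetical; only `shuffled_survival` and `diff` come from the lab. This is one possible way to organize the computation, not the lab's required code.

```r
set.seed(1)  # makes the random shuffles reproducible

# Hypothetical toy data, NOT the real titanic passenger data.
toy <- data.frame(
  survived = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  fare     = c(80, 26, 13, 150, 8, 7, 13, 10, 30, 9)
)

# Hypothetical helper: median fare of survivors minus median fare of non-survivors.
median_diff <- function(fare, status) {
  median(fare[status == "Yes"]) - median(fare[status == "No"])
}

# Difference in the data as actually observed.
observed_diff <- median_diff(toy$fare, toy$survived)

# Shuffle the survival labels 500 times; sample() returns a random
# permutation, which breaks any real link between fare and survival.
shuffled_survival <- data.frame(
  diff = replicate(500, median_diff(toy$fare, sample(toy$survived)))
)

# Share of shuffles whose difference is at least as large as the observed one.
prop_as_large <- mean(shuffled_survival$diff >= observed_diff)
print(prop_as_large)
```

A histogram of `shuffled_survival$diff` with the observed difference marked on it shows how unusual the real difference is compared with what chance alone produces. A small proportion suggests that shuffling rarely creates a gap that large.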

**By using your shuffled data, answer the research question we posed at the beginning of the lab.**

*Is there any evidence to suggest that those who survived paid a higher fare than those who died?*

**Write up your answer as a statistical analysis. Create a plot and explain how the plot supports your conclusion. Be sure to also explain why shuffling your data is important.**

What if, instead of calculating the median fare for each group after a shuffle, we calculated the mean fare and took the difference (mean_survivor – mean_victim)?

**If we did this 500 times, what do you predict the distribution of differences will look like?**

- Use the `do` and the `shuffle` functions to shuffle the passengers’ survival status 500 times.
  - For each shuffle, compute each group’s mean fare paid.
- After shuffling your data, use the `mutate` function to create a variable called `diff`, which is the mean fare of survivors minus the mean fare of non-survivors.
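
Because the mean version differs from the median version only in the summary used, one way to sketch it is a small function that takes the summary as an argument. The function `shuffle_diffs`, its parameters, and the `toy` data are hypothetical, and base R `replicate` and `sample` again stand in for `do` and `shuffle`.

```r
set.seed(2)  # makes the random shuffles reproducible

# Hypothetical toy data, NOT the real titanic passenger data.
toy <- data.frame(
  survived = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"),
  fare     = c(80, 26, 13, 150, 8, 7, 13, 10, 30, 9)
)

# Hypothetical helper: shuffle the labels n times and, for each shuffle,
# return stat(survivor fares) minus stat(non-survivor fares).
shuffle_diffs <- function(fare, status, stat, n = 500) {
  replicate(n, {
    shuffled <- sample(status)
    stat(fare[shuffled == "Yes"]) - stat(fare[shuffled == "No"])
  })
}

mean_diffs   <- shuffle_diffs(toy$fare, toy$survived, mean)
median_diffs <- shuffle_diffs(toy$fare, toy$survived, median)

# Observed difference in mean fares for the unshuffled toy data.
observed_mean_diff <- mean(toy$fare[toy$survived == "Yes"]) -
  mean(toy$fare[toy$survived == "No"])

# Compare the spread of the two sets of shuffled differences.
print(summary(mean_diffs))
print(summary(median_diffs))
```

Passing the summary as `stat` keeps the shuffling logic identical for both analyses, so any change in the conclusion comes from the choice of mean versus median rather than from the code.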

**What does the shuffled data reveal? Does the answer to the research question below change when using the mean fares instead of the median fares?**

*Is there any evidence to suggest that those who survived paid a higher fare than those who died?*