Lab 2F

Directions: Follow along with the slides, completing
the questions in **blue** on your
computer, and answering the questions in **red** in your
journal.

- In the previous lab, we learned that by using a `do`-loop and the `shuffle` function, we could simulate randomly shuffling our data many times.
- This helps us determine how likely it is that a difference between groups is due to chance.

- For this lab, we will extend these ideas to *numerical* variables by using random shuffling and numerical summaries.
- The question we will investigate in this lab is:

*Is there any evidence to suggest that those who survived paid a
higher fare than those who died?*

- We will consider wealthier passengers to be those that paid a higher `fare` for their ticket.

- The Titanic was a ship that sank en route to the U.S.A. from England
after hitting an iceberg in 1912.
- At the time, it was claimed that the Titanic was
*unsinkable*… it wasn’t … because it did.

- Use the `data` function to load the `titanic` passenger and survival data.
- Create a boxplot of the `fare`s paid by passengers and facet the plot based on whether the passenger survived or not.
- **Based on the plot, do you believe that passengers who paid a higher fare on the Titanic were more likely to survive? Explain why and describe how certain you are of being correct.**

- Start your analysis by calculating how much more the *typical* survivor paid than the *typical* non-survivor in our data.
- **Based on the distributions of fares paid, which numerical summary that describes the *typical* value might be preferred?**
- **What was the *typical* fare paid by survivors? Non-survivors? How much more did the typical survivor pay?**
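
As a rough illustration of comparing typical values between two groups, the sketch below uses base R's `tapply` on a small made-up data frame. The `toy` name, its `"Yes"`/`"No"` labels, and all of its fares are hypothetical and are not the `titanic` data. It computes both the median and the mean per group so the two summaries can be compared side by side.

```r
# Hypothetical toy data: made-up fares, NOT the titanic data.
toy <- data.frame(
  survived = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"),
  fare     = c(10, 30, 50, 500, 5, 8, 12, 20)
)

# Typical fare per group, summarised two different ways.
group_medians <- tapply(toy$fare, toy$survived, median)
group_means   <- tapply(toy$fare, toy$survived, mean)

# How much more the typical "Yes" passenger paid than the typical "No" passenger.
median_diff <- as.numeric(group_medians["Yes"] - group_medians["No"])
mean_diff   <- as.numeric(group_means["Yes"] - group_means["No"])

print(group_medians)
print(group_means)
print(c(median_diff = median_diff, mean_diff = mean_diff))
```

Running the same kind of comparison on the real `fare` values is left for the lab questions above.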

- Use the `do` and the `shuffle` functions to `shuffle` the passengers’ survival status 500 times.
  - Use the previous lab if you need some help on how to do this.
  - For each shuffle, compute each group’s median fare paid.
  - Assign your shuffled data the name `shuffled_survival`.

- After shuffling your data, use the `mutate` function to create a variable called `diff` which is the `median` fare of survivors minus the `median` fare of non-survivors.
  - Assign your mutated data the name `shuffled_survival` again.
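
The shuffle-then-subtract idea in the steps above can be sketched in base R. This is not the lab's intended code: `replicate` and `sample` stand in for the roles that `do` and `shuffle` play, and the `toy` data frame, its `"Yes"`/`"No"` labels, and the names `one_shuffle` and `shuffled_toy` are all hypothetical, with made-up fares rather than the `titanic` data.

```r
# Hypothetical toy data: made-up fares, NOT the titanic data.
toy <- data.frame(
  survived = rep(c("Yes", "No"), each = 6),
  fare     = c(12, 25, 30, 48, 60, 75, 7, 8, 10, 13, 20, 26)
)

set.seed(1)  # only so this sketch is reproducible

# One shuffle: scramble the survival labels, then take each group's median fare.
one_shuffle <- function() {
  shuffled <- sample(toy$survived)  # random permutation of the labels
  c(Yes = median(toy$fare[shuffled == "Yes"]),
    No  = median(toy$fare[shuffled == "No"]))
}

# Repeat 500 times; after transposing, each row holds one shuffle's medians.
shuffled_toy <- as.data.frame(t(replicate(500, one_shuffle())))

# Difference in medians for each shuffle (survivors minus non-survivors).
shuffled_toy$diff <- shuffled_toy$Yes - shuffled_toy$No

# Observed difference in the un-shuffled toy data, for comparison.
observed_diff <- median(toy$fare[toy$survived == "Yes"]) -
  median(toy$fare[toy$survived == "No"])

# Share of shuffles whose difference is at least as large as the observed one.
share_as_large <- mean(shuffled_toy$diff >= observed_diff)
print(share_as_large)
```

Because shuffling breaks any real link between the labels and the fares, the shuffled differences show what chance alone tends to produce. A small share suggests the observed difference would be unusual under chance.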

**Using your shuffled data, answer the research question we posed at the beginning of the lab.**

*Is there any evidence to suggest that those who survived paid a
higher fare than those who died?*

**Write up your answer as a statistical analysis. Create a plot and explain how the plot supports your conclusion. Be sure to also explain why shuffling your data is important.**

What if, instead of calculating the median fare for each group after a shuffle, we calculated the mean fare and took the difference (mean_survivor – mean_victim)?

**If we did this 500 times, what do you predict the distribution of differences will look like?**

- Use the `do` and the `shuffle` functions to shuffle the passengers’ survival status 500 times.
  - For each shuffle, compute each group’s mean fare paid.
- After shuffling your data, use the `mutate` function to create a variable called `diff` which is the mean fare of survivors minus the mean fare of non-survivors.
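
To see how the choice of summary can change the shuffled differences, here is a minimal base R sketch that runs the same shuffle with either `mean` or `median`. The `toy` data frame (which deliberately includes one very large made-up fare), its `"Yes"`/`"No"` labels, and the helper name `shuffled_diff` are hypothetical, not part of the lab or the `titanic` data.

```r
# Hypothetical toy data with one very large fare: made-up values, NOT the titanic data.
toy <- data.frame(
  survived = rep(c("Yes", "No"), each = 6),
  fare     = c(12, 25, 30, 48, 60, 500, 7, 8, 10, 13, 20, 26)
)

set.seed(2)  # only so this sketch is reproducible

# Shuffle the labels once and return (survivor summary) - (non-survivor summary).
shuffled_diff <- function(summary_fun) {
  shuffled <- sample(toy$survived)
  summary_fun(toy$fare[shuffled == "Yes"]) - summary_fun(toy$fare[shuffled == "No"])
}

# 500 shuffled differences for each choice of summary.
mean_diffs   <- replicate(500, shuffled_diff(mean))
median_diffs <- replicate(500, shuffled_diff(median))

# Compare the centre and spread of the two sets of shuffled differences.
print(summary(mean_diffs))
print(summary(median_diffs))
```

Passing the summary function as an argument keeps the shuffling logic identical, so any difference between the two results comes only from using the mean versus the median.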

**What does the shuffled data reveal? Does the answer to the research question below change when using the mean fares instead of the median fares?**

*Is there any evidence to suggest that those who survived paid a
higher fare than those who died?*