Lab 2D

Directions: Follow along with the slides, completing
the questions in **blue** on your
computer, and answering the questions in **red** in your
journal.

- In the last lab, we looked at how we can use computer simulations to
compute estimates of simple probabilities.
  - Like the probability of drawing a song genre from a playlist.
- We also saw that performing *more* simulations:
  - Took *longer* to finish.
  - Had estimates that *varied less*.
- In this lab, we’ll extend our simulation methods to cover situations
that are more complex.
  - We’ll learn how to estimate their probabilities.
- We will also look at the role of sampling *with* or *without replacement*.

- In `R`, simulate a *playlist of songs* containing 30 `"rap"` songs, 23 `"country"` songs and 47 `"rock"` songs. *Assign* the `c`ombined playlist the name `songs`.
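
A minimal sketch of one way to build the playlist, assuming the base R functions `rep` and `c` (the lab may intend a different construction):

```r
# Repeat each genre label as many times as it appears in the playlist,
# then combine the three pieces into a single vector with c().
songs <- c(rep("rap", 30), rep("country", 23), rep("rock", 47))

length(songs)  # total number of songs in the playlist
table(songs)   # number of songs of each genre
```

Each element of `songs` stands for one song, so drawing an element at random is the same as drawing a song at random.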

- Simulate choosing a single song 50 times. Then
use your simulated draws to estimate the probability of choosing a
*rap* song.
  - The actual (theoretical) probability of choosing a *rap song* in this case is `0.30`.
  - **Write a sentence comparing your estimated probability to the actual probability.**
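
The estimate can be sketched in base R as follows. This is only one possible approach: the names `picks` and `est_rap` and the seed value are illustrative assumptions, and the previous lab's own simulation functions could be used instead.

```r
set.seed(123)  # arbitrary seed, only so that the run can be repeated
songs <- c(rep("rap", 30), rep("country", 23), rep("rock", 47))

# Choosing one song 50 separate times is the same as drawing 50 songs
# with replacement, because the playlist is restored after every draw.
picks <- sample(songs, size = 50, replace = TRUE)

# The proportion of the 50 draws that were rap is the estimated probability.
est_rap <- mean(picks == "rap")
est_rap
```

Comparing `est_rap` with `0.30` shows how far a 50-draw estimate lands from the theoretical value on this particular run.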

- So far, you’ve selected songs *with replacement*.
  - It’s called that because each time you made a selection, you started with the same playlist. That is, you chose a song, wrote down its data, and then placed it back on the list.

- It’s also possible to select *without replacement* by setting the `replace` option in the `sample` function to `FALSE`.
  - Take a sample of `size` 100 from our playlist of songs *without replacement*. Assign this sample the name `without`.
  - **Run** `tally(without)` and describe the output. Does something similar happen if you sample *with replacement*?
    - Notice that the tilde `~` was not needed with the `tally` function. This is because `without` was not a variable within a data frame but rather a vector which acts like a lone variable.
  - **What happens if** `size = 101` and `replace = FALSE`?
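
A hedged base R sketch of these steps. It uses `table` in place of the lab's `tally` so that no extra package is needed, and the name `too_many` is an illustrative assumption.

```r
set.seed(1)
songs <- c(rep("rap", 30), rep("country", 23), rep("rock", 47))

# Without replacement, a sample of size 100 uses every song exactly once;
# only the order of the songs is random.
without <- sample(songs, size = 100, replace = FALSE)
table(without)

# Asking for more songs than the playlist holds cannot work without
# replacement. tryCatch() stores the error message instead of stopping.
too_many <- tryCatch(
  sample(songs, size = 101, replace = FALSE),
  error = function(e) conditionMessage(e)
)
too_many
```

Rerunning the first `sample` call with `replace = TRUE` gives genre counts that change from run to run, which is a useful contrast with the fixed counts above.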

- Imagine the following two scenarios.
  - You have a coin with two sides: *Heads* and *Tails*. You’re not sure if the coin is fair and so you want to estimate the probability of getting a *Head*.
  - A child reaches into a candy jar with 10 *strawberry*, 50 *chocolate* and 25 *watermelon* candies. The child is able to grab three candies with their hand and you’re interested in the probability that all three candies will be chocolate.
- **Which of these scenarios would you sample *with replacement* and which would you sample *without replacement*? Why?**
- **Write down the line of code you would run to** `sample` from the candy jar. Assume the simulated jar is named `candies`.

- In reality, songs from a playlist are chosen without replacement.
  - This way, you won’t hear the same song several times in a row.

- Let’s write a more realistic simulation and estimate the probability
that, if we select two songs at random without replacement, both
are rap songs.
  - Use the `do` function to perform 10 simulated `sample`s of `size` 2, without replacement, *assign* the simulations the name `draws`, and then `View` your file. Use `set.seed(1)`.
  - **What are the variable names? What happened in the first simulation? Did any of your 10 simulations contain two** *rap* songs?
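
For readers without the lab's `do` function at hand, the same kind of table can be sketched in base R with `replicate`. This is an assumed stand-in, not the lab's own code, and its draws need not match the ones `do` produces.

```r
set.seed(1)
songs <- c(rep("rap", 30), rep("country", 23), rep("rock", 47))

# replicate() runs the two-song draw 10 times and returns a 2 x 10 matrix;
# t() flips it so that each row holds one simulation.
draws <- as.data.frame(t(replicate(10, sample(songs, size = 2, replace = FALSE))))

names(draws)  # as.data.frame() names the two columns V1 and V2
draws         # one row per simulation: first draw, second draw
```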

- To estimate the probability from our simulations, we need to find the proportion of times that the event we’re interested in occurs in the simulations.
  - In other words, we need to count the number of times the desired event occurred and divide by the number of attempts we made (the number of simulations).
- The next slides will show you two ways to do this.

- One way we can estimate the probability of drawing two songs of the
*same* genre is to use the following trick to count the number of *rap* songs in each of the 10 simulations:

**Let’s break down the code above by running each part of the code one piece at a time. As you run each line of code below describe the output.**

- Remember to assign a name to your mutated dataset.
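
A minimal sketch of one possible counting approach, written in base R and run one piece at a time. The column names `V1` and `V2`, the helper name `is_rap` and the new column `nrap` are assumptions for illustration, and this is not necessarily the code the slides use.

```r
set.seed(1)
songs <- c(rep("rap", 30), rep("country", 23), rep("rock", 47))
draws <- as.data.frame(t(replicate(10, sample(songs, size = 2, replace = FALSE))))

# Piece 1: compare every entry with "rap", giving a table of TRUE/FALSE values.
is_rap <- draws == "rap"

# Piece 2: TRUE counts as 1, so each row sum is the number of rap songs
# in that simulation. The result is stored under a name in the data.
draws$nrap <- rowSums(is_rap)

# Piece 3: the proportion of simulations in which both songs were rap.
mean(draws$nrap == 2)
```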

- Another method we can use to estimate the probability of complex
events is to use the following 2-step procedure:
  - Subset or filter the rows of the simulations that match our desired outcomes.
  - Count the number of rows in the subset and divide by the number of simulations.

- The result that you obtain is an estimate of the probability that a specific combination of events occurred.
- We’ll see an example of this method on the next slide.

- Fill in the blanks below to:
  - Create a subset of our simulations when both draws were `"rap"` songs.
  - Count the number of rows in this subset.
  - And divide by the total number of repeated simulations.
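
The three steps can be sketched in base R as follows, assuming the simulations are stored in a data frame with columns `V1` (first draw) and `V2` (second draw). The name `both_rap` is illustrative, and the lab's own fill-in-the-blank code may use different functions.

```r
set.seed(1)
songs <- c(rep("rap", 30), rep("country", 23), rep("rock", 47))
draws <- as.data.frame(t(replicate(10, sample(songs, size = 2, replace = FALSE))))

# Step 1: keep only the simulations in which both draws were rap.
both_rap <- draws[draws$V1 == "rap" & draws$V2 == "rap", ]

# Steps 2 and 3: count the rows that were kept and divide by the
# total number of simulations.
nrow(both_rap) / nrow(draws)
```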

Answer the following questions by performing 500 simulations of sampling 2 songs from a playlist of 30 `"rap"`, 23 `"country"` and 47 `"rock"` songs. You might consider running `set.seed` so that your results can be reproduced:

- **Calculate estimated probabilities for the following situations:**
  - You draw two `"rap"` songs.
  - You draw a `"rap"` song in the first draw and a `"country"` song in the 2nd.
- **Create a** `histogram` that displays the number of times a `"rap"` song occurred in each simulation. That is, how often were zero rap songs drawn? A single rap song? Two rap songs?
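
A hedged base R sketch of the whole exercise, assuming the two songs are drawn without replacement as in the earlier slides. The seed and all variable names are illustrative, and base `hist` stands in for the lab's `histogram` function.

```r
set.seed(2024)  # arbitrary seed so the results can be reproduced
songs <- c(rep("rap", 30), rep("country", 23), rep("rock", 47))
n_sims <- 500
draws <- as.data.frame(t(replicate(n_sims, sample(songs, size = 2, replace = FALSE))))

# Estimated probabilities: the proportion of simulations matching each event.
p_two_rap <- mean(draws$V1 == "rap" & draws$V2 == "rap")
p_rap_then_country <- mean(draws$V1 == "rap" & draws$V2 == "country")

# Exact values for comparison, under the same without-replacement assumption.
exact_two_rap <- (30 / 100) * (29 / 99)
exact_rap_then_country <- (30 / 100) * (23 / 99)

# Number of rap songs in each simulation, and how often 0, 1 or 2 occurred.
nrap <- rowSums(draws == "rap")
table(nrap)
hist(nrap, breaks = seq(-0.5, 2.5, by = 1), main = "Rap songs per simulation")
```

Centering the histogram breaks on 0, 1 and 2 gives each possible count its own bar.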

- Using what you’ve learned in the previous two labs, answer the following question by performing two computer simulations with 500 repetitions apiece:
  - *If we draw 5 songs from a playlist of 30 rap, 23 country
and 47 rock songs, how does the estimated probability of all 5 songs
being rap songs change if we draw the songs with or without
replacement?*
- For each simulation:
  - Create a histogram for the number of *rap* songs that occurred for each of the 500 repetitions.
- **Describe how the distribution of the number of** *rap* songs changes depending on whether we use replacement or not.
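
One way to set up the two simulations side by side, sketched in base R. The helper `count_rap`, the seed and the variable names are hypothetical choices made for this example.

```r
set.seed(2024)  # arbitrary seed so the results can be reproduced
songs <- c(rep("rap", 30), rep("country", 23), rep("rock", 47))

# Hypothetical helper: the number of rap songs in each of n_sims draws
# of 5 songs, sampling with or without replacement.
count_rap <- function(replace, n_sims = 500) {
  replicate(n_sims, sum(sample(songs, size = 5, replace = replace) == "rap"))
}

nrap_with <- count_rap(replace = TRUE)
nrap_without <- count_rap(replace = FALSE)

# Estimated probability that all 5 songs are rap, for each sampling scheme.
mean(nrap_with == 5)
mean(nrap_without == 5)

# Exact values for comparison.
0.3^5
choose(30, 5) / choose(100, 5)

# One histogram per simulation, with one bar for each count from 0 to 5.
bins <- seq(-0.5, 5.5, by = 1)
hist(nrap_with, breaks = bins, main = "With replacement")
hist(nrap_without, breaks = bins, main = "Without replacement")
```

Drawing 5 rap songs is a rare event under either scheme, so 500 repetitions may contain very few such draws, or none at all. The estimates can therefore differ noticeably from the exact values.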