Lab 2I
Directions: Follow along with the slides and answer the questions in red font in your journal.
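The mean of random shuffles also produces differences that are normally distributed.
In this lab, we'll use R functions to simulate random draws from a normal distribution and to calculate probabilities with a normal model.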
Load the titanic data and calculate the mean age of people in the data, but shuffle their survival status 500 times.
Assign this data the name shfls.
With shfls, use mutate to add a new variable to the data set. This new variable should have the name diff and should be the mean age of those who survived minus the mean age of those who died.
Calculate the mean and sd of the diff variable.
Assign these values the names diff_mean and diff_sd.
Check whether the diff variable looks approximately normally distributed.
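The setup steps above can be sketched in base R as follows. This is only an illustration under stated assumptions: fake_titanic is a hypothetical stand-in with simulated ages and survival labels (the real titanic data is not reproduced here), one_shuffle is a made-up helper name, and base R assignment is used in place of the lab's mutate step.

```r
# Hypothetical stand-in for the titanic data: ages and survival labels
# are simulated, so the numbers below say nothing about the real data.
set.seed(123)
fake_titanic <- data.frame(
  age      = round(runif(300, min = 1, max = 70)),
  survived = sample(c("Yes", "No"), size = 300, replace = TRUE)
)

# One shuffle: scramble the survival labels, then take the mean age per group.
one_shuffle <- function(data) {
  shuffled <- sample(data$survived)
  tapply(data$age, shuffled, mean)  # named values "No" and "Yes"
}

# Repeat the shuffle 500 times; each row holds one shuffle's two group means.
shfls <- as.data.frame(t(replicate(500, one_shuffle(fake_titanic))))

# diff = mean age of those who survived minus mean age of those who died.
shfls$diff <- shfls$Yes - shfls$No

# Summaries that define the normal model used later.
diff_mean <- mean(shfls$diff)
diff_sd   <- sd(shfls$diff)

# Visual check: the histogram should look roughly bell-shaped.
hist(shfls$diff, main = "Shuffled differences in mean age", xlab = "diff")
```

Because the labels are shuffled at random, diff_mean should land close to 0, and diff_sd describes how far a difference typically strays from 0 by chance alone.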
Since the distribution of our diff
variable appears normally distributed, we can use a normal model to estimate the probability of seeing differences that are more extreme than our actual data.
Fill in the blanks to calculate the probability of an even smaller difference occurring than our actual difference using a normal model.
The probability you calculated in the previous slide is an estimate for how often we expect to see a difference smaller than the actual one we observed, by chance alone.
If we instead wanted to calculate the probability that the difference would be larger than the one observed, we could run (fill in the blanks):
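As a hedged illustration of both tail calculations, here is a minimal sketch using pnorm with made-up numbers. The values of example_mean, example_sd, and observed are hypothetical and are not the answers to the blanks; substitute your own diff_mean, diff_sd, and observed difference.

```r
# Hypothetical values for illustration only; they are not results from the lab.
example_mean <- 0     # center of the shuffled differences
example_sd   <- 1.5   # spread of the shuffled differences
observed     <- -2    # a made-up observed difference

# P(difference < observed) under the normal model: the lower tail.
p_smaller <- pnorm(observed, mean = example_mean, sd = example_sd)

# P(difference > observed): the complement of the lower tail.
p_larger <- 1 - p_smaller

# pnorm can also return the upper tail directly.
p_larger_alt <- pnorm(observed, mean = example_mean, sd = example_sd,
                      lower.tail = FALSE)

c(smaller = p_smaller, larger = p_larger)
```

The two probabilities always add up to 1, which is why subtracting the lower tail from 1 gives the upper tail.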
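Use the rnorm function to simulate random draws from a normal distribution.
Simulate heights where the mean height is 67 inches and the standard deviation is 3 inches.
Plot the simulated heights with a histogram.
Use pnorm to calculate probabilities based on a specified quantity.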
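A minimal sketch of these steps in base R is shown below. The sample size of 1000 and the cutoff of 70 inches are assumptions chosen for illustration, and base R's hist stands in for the histogram function.

```r
set.seed(1)

# Simulate heights from a normal model with mean 67 and sd 3.
# The sample size (1000) is an assumption made for this sketch.
heights <- rnorm(1000, mean = 67, sd = 3)

# Plot the simulated heights; the shape should be roughly bell-shaped.
hist(heights, main = "Simulated heights", xlab = "Height (inches)")

# Model probability that a height is below 70 inches.
# 70 is an arbitrary example quantity, one sd above the mean.
p_model <- pnorm(70, mean = 67, sd = 3)

# Proportion of simulated heights below 70, for comparison.
p_sim <- mean(heights < 70)

c(model = p_model, simulated = p_sim)
```

Since 70 is exactly one standard deviation above the mean, p_model is about 0.84, and the simulated proportion should be close to that value.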
Conduct one of the statistical investigations below:
titanic data
cdc data