Lab 4D
Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.
Use the movie dataset to investigate the following question: Which variables are better predictors of a movie’s critics_rating when the predictions are made using a line of best fit?
The correlation coefficient describes the strength and direction of the linear trend.
It’s only useful when the trend is linear and both variables are numeric.
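A minimal sketch of why the linearity condition matters, using base R's cor() on made-up numbers. The vectors x, y_linear and y_curved are invented for illustration and do not come from the movie data.

```r
# Toy values invented for illustration; they are not from the movie data.
x <- -3:3

# A perfectly linear, decreasing relationship.
y_linear <- 10 - 2 * x

# A perfectly predictable but curved (quadratic) relationship.
y_curved <- x^2

r_linear <- cor(x, y_linear)  # strong negative linear trend, close to -1
r_curved <- cor(x, y_curved)  # close to 0, despite the obvious pattern

print(r_linear)
print(r_curved)
```

The curved case shows that a correlation coefficient near zero does not mean the variables are unrelated, only that a line describes the trend poorly.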
Are these variables linearly related? Why or why not?
Load the movie data using the data command.
critics_rating contains values between 0 and 100, 100 being the best.
audience_rating contains values that range between 0 and 10, 10 being the best.
n_critics and n_audience describe the number of reviews used for the ratings.
gross and budget describe the amount of money the film made and took to make.
Use the cor() function to find the correlation coefficient of the variables from the previous plot, which happen to be audience_rating and critics_rating.
The cor() function removes any observations which contain an NA value in either variable.
The inputs to the cor function work just like the inputs of the xyplot function.
critics_rating.
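The cor() behaviour described above (inputs like xyplot's, presumably a formula and a data argument, with rows containing an NA dropped) can be imitated in base R. Base R's own cor() takes two vectors and returns NA when either contains a missing value, so this sketch wraps it in a hypothetical helper, cor_formula, which is not part of any package. The data frame toy holds invented values, not the real movie data.

```r
# Invented ratings for illustration only; not the real movie data.
toy <- data.frame(
  audience_rating = c(6.1, 7.4, NA, 8.0, 5.2),
  critics_rating  = c(55, 71, 64, NA, 40)
)

# Hypothetical helper: formula-style correlation that keeps only the rows
# where both variables are observed.
cor_formula <- function(formula, data) {
  # model.frame() with na.omit drops any row with an NA in either variable.
  mf <- model.frame(formula, data = data, na.action = na.omit)
  cor(mf[[1]], mf[[2]])
}

# Base cor() on the raw columns gives NA because of the missing values.
r_raw <- cor(toy$audience_rating, toy$critics_rating)

# The helper uses only the three complete rows.
r_complete <- cor_formula(critics_rating ~ audience_rating, data = toy)

print(r_raw)
print(r_complete)
```

In base R the same result is available directly with cor(x, y, use = "complete.obs"); the helper only exists to mirror the formula-style call.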
critics_rating and each of the two variables.
critics_rating.
Fit lm models to predict critics_rating with each variable and compute the MSE for each.
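One way the lm-and-MSE comparison might look, sketched on invented data. The data frame toy, the helper mse, and the choice of audience_rating and n_critics as the two predictors are all assumptions made for illustration. The MSE here is computed on the same rows used to fit each model, which may differ from how the lab expects it to be computed.

```r
# Invented data for illustration only; not the real movie data.
toy <- data.frame(
  critics_rating  = c(40, 55, 62, 71, 80, 90),
  audience_rating = c(5.0, 6.1, 6.5, 7.4, 8.0, 8.8),
  n_critics       = c(120, 80, 150, 60, 200, 90)
)

# Hypothetical helper: mean squared error as the average squared residual.
mse <- function(fit) mean(residuals(fit)^2)

# One line of best fit per candidate predictor.
fit_audience <- lm(critics_rating ~ audience_rating, data = toy)
fit_ncritics <- lm(critics_rating ~ n_critics, data = toy)

# Put each predictor's correlation and MSE side by side.
results <- data.frame(
  predictor = c("audience_rating", "n_critics"),
  r   = c(cor(toy$critics_rating, toy$audience_rating),
          cor(toy$critics_rating, toy$n_critics)),
  mse = c(mse(fit_audience), mse(fit_ncritics))
)
print(results)
```

On these invented numbers, the predictor with the larger absolute correlation also has the smaller MSE.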
critics_rating.
movie data. Plot the variables using the xyplot() function.
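A short sketch of plotting a pair of numeric variables with xyplot(), assuming the lab's xyplot() behaves like the one in the lattice package. The data frame toy and its values are invented, and the choice of gross and budget is only an example.

```r
library(lattice)  # xyplot() is provided by lattice

# Invented values for illustration only; not the real movie data.
toy <- data.frame(
  budget = c(10, 25, 40, 60, 90, 150),
  gross  = c(12, 60, 35, 140, 110, 320)
)

# The formula reads y ~ x: gross on the vertical axis, budget on the horizontal.
p <- xyplot(gross ~ budget, data = toy,
            xlab = "budget", ylab = "gross")
print(p)  # lattice plots are drawn when the object is printed

# Correlation coefficient for the same pair of variables.
r <- cor(toy$gross, toy$budget)
print(r)
```

Looking at the scatterplot first helps decide whether the trend is linear enough for the correlation coefficient to be meaningful.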