Lab 4D
Directions: Follow along with the slides, completing
the questions in blue on your
computer, and answering the questions in red in your
journal.
In this lab, we use the movie dataset to investigate the following question:
Which variables are better predictors of a movie’s critics_rating when the predictions are made using a line of best fit?
The correlation coefficient describes the strength and direction of the linear trend.
It’s only useful when the trend is linear and both variables are
numeric.
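To see why the coefficient is only useful for linear trends, here is a minimal sketch in base R. The vectors x, y_up, y_down and y_curve are made up for illustration and are not part of the movie data.

```r
# Made-up numeric vectors (not from the movie data).
x       <- -3:3
y_up    <- 2 * x + 1   # perfectly linear, increasing
y_down  <- -x          # perfectly linear, decreasing
y_curve <- x^2         # perfectly related to x, but not linearly

r_up    <- cor(x, y_up)     # strong positive linear trend
r_down  <- cor(x, y_down)   # strong negative linear trend
r_curve <- cor(x, y_curve)  # no linear trend, even though y depends on x

print(c(up = r_up, down = r_down, curve = r_curve))
```

The sign of the coefficient gives the direction of the trend and its size gives the strength. The curved example shows that a coefficient near zero does not mean the variables are unrelated, only that a line does not describe the relationship.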
Are these variables linearly related? Why or why not?
Load the movie data using the data command.
critics_rating contains values between 0 and 100, 100 being the best.
audience_rating contains values that range between 0 and 10, 10 being the best.
n_critics and n_audience describe the number of reviews used for the ratings.
gross and budget describe the amount of money the film made and took to make.
Use the cor() function to find the correlation coefficient of the variables from the previous plot, which happen to be audience_rating and critics_rating.
The cor() function removes any observations which contain an NA value in either variable.
The inputs to the cor function work just like the inputs of the xyplot function.
[…] critics_rating.
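The NA handling described for cor() can be sketched with base R alone. This is an illustration under assumptions: the data frame toy and its values are invented, and base R's cor() is used so the code runs without the course package. Unlike the lab's cor(), the base version does not drop incomplete observations unless asked to through its use argument.

```r
# Invented stand-in for a few rows of the movie data (values are hypothetical).
toy <- data.frame(
  audience_rating = c(6.1, 7.4, NA, 8.0, 5.2, 6.8),
  critics_rating  = c(55, 78, 64, 90, NA, 70)
)

# With missing values present, base R's default returns NA.
r_all <- cor(toy$audience_rating, toy$critics_rating)

# Keep only observations where both variables are present.
r_complete <- cor(toy$audience_rating, toy$critics_rating,
                  use = "complete.obs")

# The same thing done by hand: drop incomplete rows, then correlate.
kept     <- toy[complete.cases(toy), ]
r_manual <- cor(kept$audience_rating, kept$critics_rating)

print(c(all = r_all, complete = r_complete, manual = r_manual))
```

In the lab environment the call would presumably use the same formula-style inputs as xyplot, something like cor(audience_rating ~ critics_rating, data = movie), with the NA removal done for you.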
Compute the correlation coefficient between critics_rating and each of the two variables.
[…] critics_rating.
Fit lm models to predict critics_rating with each variable and compute the MSE for each.
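One way to carry out the comparison is sketched below. The data frame toy, its values, and the helper mse() are hypothetical, and the MSE here is computed on the same observations used to fit each line, which may differ from how the lab expects it to be computed.

```r
# Invented stand-in for the movie data (values are hypothetical).
toy <- data.frame(
  critics_rating  = c(55, 78, 64, 90, 41, 70, 83, 60),
  audience_rating = c(6.1, 7.4, 6.5, 8.0, 5.2, 6.8, 7.9, 6.0),
  n_critics       = c(120, 95, 210, 180, 60, 150, 90, 200)
)

# Hypothetical helper: mean squared error of a fitted model's residuals.
mse <- function(fit) mean(residuals(fit)^2)

# One line of best fit per candidate predictor.
fit_audience <- lm(critics_rating ~ audience_rating, data = toy)
fit_ncritics <- lm(critics_rating ~ n_critics, data = toy)

# Correlation of each predictor with critics_rating.
r_audience <- cor(toy$critics_rating, toy$audience_rating)
r_ncritics <- cor(toy$critics_rating, toy$n_critics)

mse_audience <- mse(fit_audience)
mse_ncritics <- mse(fit_ncritics)

print(data.frame(
  predictor = c("audience_rating", "n_critics"),
  r         = c(r_audience, r_ncritics),
  mse       = c(mse_audience, mse_ncritics)
))
```

For a line of best fit on the same observations, the predictor whose correlation with critics_rating is farther from zero always produces the smaller MSE, which is why the correlation coefficient gives a preview of how well the predictions will turn out.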
[…] critics_rating.
[…] movie data. Plot the variables using the xyplot() function.
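A plotting call might look like the sketch below. It assumes the lab's xyplot() behaves like the one in the lattice package, and it uses an invented data frame toy in place of the movie data.

```r
library(lattice)

# Invented stand-in for the movie data (values are hypothetical).
toy <- data.frame(
  critics_rating  = c(55, 78, 64, 90, 41, 70, 83, 60),
  audience_rating = c(6.1, 7.4, 6.5, 8.0, 5.2, 6.8, 7.9, 6.0)
)

# Formula input: the response goes on the left of ~, the predictor on the right.
p <- xyplot(critics_rating ~ audience_rating, data = toy)

# Printing the object draws the scatterplot.
print(p)
```

Looking at the plot first is a way to check whether the trend is roughly linear before relying on the correlation coefficient.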