Lab 4D
Directions: Follow along with the slides and answer the questions in red font in your journal.
Use the movie data set to investigate the following question: Which variables are better predictors of a movie’s audience_rating when the predictions are made using a line of best fit?
The correlation coefficient is only useful when the trend is linear and both variables are numeric.
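To see why linearity matters, here is a minimal sketch in base R. The values of x and y are made up for illustration and do not come from the movie data. The first pair is perfectly related by a curve, yet its correlation coefficient is essentially zero; the second pair is linear.

```r
# Hypothetical values: y is completely determined by x, but by a curve.
x <- -3:3
y <- x^2

# The correlation coefficient only measures the linear part of a trend,
# so a symmetric curve like this one gives a value of about 0.
r_curve <- cor(x, y)

# A perfectly linear relationship, for comparison.
y_line <- 2 * x + 1
r_line <- cor(x, y_line)

print(c(curve = r_curve, line = r_line))
```

A correlation near zero therefore does not mean "no relationship"; it only means "no linear relationship", which is why plotting the data first is worthwhile.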
Are these variables linearly related? Why or why not?
Load the movie data using the data command.
critics_rating contains values between 0 and 100, 100 being the best.
audience_rating contains values between 0 and 10, 10 being the best.
n_critics and n_audience describe the number of reviews used for the ratings.
gross and budget describe the amount of money the film made and took to make.
Use the cor() function to find the correlation coefficient of the variables from the previous plot, audience_rating and critics_rating.
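As a rough illustration of this step, the sketch below computes a correlation coefficient in base R. The data frame movie_sample and its six rows are hypothetical stand-ins, not values from the real movie data. Base R's cor() takes two numeric vectors; the formula-style call used in the lab depends on the package loaded in the lab environment, so it is not shown here.

```r
# Hypothetical stand-in for the movie data (made-up values).
movie_sample <- data.frame(
  critics_rating  = c(35, 50, 62, 74, 88, 95),        # 0 to 100 scale
  audience_rating = c(4.1, 5.6, 5.9, 7.2, 7.8, 8.9)   # 0 to 10 scale
)

# Base R: pass the two numeric columns directly.
r <- cor(movie_sample$audience_rating, movie_sample$critics_rating)

print(r)
```

The two ratings use different scales (0 to 10 and 0 to 100), but this does not matter: the correlation coefficient has no units and always falls between -1 and 1.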
The cor() function removes any observations that contain an NA value in either variable.
Compute the correlation with the cor function. Its inputs work just like the inputs of the xyplot function.
critics_rating.
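The NA behavior described above is worth checking in whatever environment you use. In plain base R, cor() returns NA when either input has a missing value unless you ask it to drop incomplete observations. The sketch below uses made-up vectors (critics and audience are hypothetical) to show the difference.

```r
# Hypothetical ratings with one missing value in each variable.
critics  <- c(35, 50, NA, 74, 88, 95)
audience <- c(4.1, 5.6, 5.9, NA, 7.8, 8.9)

# Base R default: any NA in the inputs makes the result NA.
r_default <- cor(critics, audience)

# Ask cor() to use only observations where both values are present.
r_complete <- cor(critics, audience, use = "complete.obs")

# The same thing done by hand: keep rows with no NA in either variable.
keep <- complete.cases(critics, audience)
r_manual <- cor(critics[keep], audience[keep])

print(c(default = r_default, complete = r_complete, manual = r_manual))
```

Here only four of the six observations are used, so it is a good habit to check how many rows were dropped before interpreting the result.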
critics_rating and each of the two variables.
critics_rating.
Fit lm models to predict critics_rating with each variable and compute the MSE for each.
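One way to carry out this comparison is sketched below in base R. The data frame movie_sample holds made-up values, and mse_for is a hypothetical helper name introduced only for this example. The MSE here is computed on the same data used to fit each line.

```r
# Hypothetical stand-in for the movie data (made-up values).
movie_sample <- data.frame(
  critics_rating  = c(35, 50, 62, 74, 88, 95),
  audience_rating = c(4.1, 5.6, 5.9, 7.2, 7.8, 8.9),
  n_critics       = c(120, 45, 200, 80, 150, 60)
)

# Hypothetical helper: fit a line of best fit and return its
# mean squared error (the average of the squared residuals).
mse_for <- function(formula, data) {
  fit <- lm(formula, data = data)
  mean(residuals(fit)^2)
}

mse_audience  <- mse_for(critics_rating ~ audience_rating, movie_sample)
mse_n_critics <- mse_for(critics_rating ~ n_critics, movie_sample)

print(c(audience_rating = mse_audience, n_critics = mse_n_critics))
```

In this made-up sample, audience_rating gives the smaller MSE. With a single predictor, the variable whose correlation with the outcome is farther from 0 always produces the smaller MSE for its line of best fit, which is the link between the two halves of this lab.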
critics_rating.
movie data.
xyplot() function.