Lab 4B
Directions: Follow along with the slides, completing
the questions in blue on your
computer, and answering the questions in red in your
journal. 
 
Load the arm_span data again.
Create an xyplot with height on the y-axis and armspan on the x-axis.
Type add_line() to run the add_line function; you’ll be prompted to click twice in the plot window to create a line that you think fits the data well.
Use your line to predict people’s heights based on their armspan, and add these predictions to the arm_span data.
Calculate the residuals for your line and add them to arm_span.
What do the residuals measure?
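As a rough illustration of the prediction and residual steps, here is a base R sketch. The five armspan and height values are made up (they are not the real arm_span data), and the slope and intercept are hypothetical stand-ins for whatever line you drew with add_line(). The lab uses mutate() for these steps; plain `$` assignment is used here only so the sketch runs without extra packages.

```r
# Made-up stand-in for the arm_span data (cm); not the real values
arm_span <- data.frame(
  armspan = c(150, 160, 165, 170, 180),
  height  = c(152, 158, 167, 171, 178)
)

# Hypothetical line read off an add_line() plot: height = 17 + 0.9 * armspan
my_slope     <- 0.9
my_intercept <- 17

# Predicted height for each person: intercept + slope * armspan
arm_span$my_pred <- my_intercept + my_slope * arm_span$armspan

# Residual: actual height minus predicted height
arm_span$my_resid <- arm_span$height - arm_span$my_pred

arm_span
```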
One method we might consider to measure our model’s accuracy is to sum the residuals.
Fill in the blanks below to calculate our accuracy summary.
Hint: Like mutate, the first argument of summarize is a dataframe, and the second argument is the action to perform on a column of the dataframe. Whereas mutate adds a new column to the dataframe, summarize (usually) returns a single-number summary.
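Summing the residuals produces one number, which is what summarize() returns in the lab. A minimal base R sketch, assuming made-up data and a hypothetical hand-drawn line:

```r
# Made-up armspan and height values (cm); not the real arm_span data
armspan <- c(150, 160, 165, 170, 180)
height  <- c(152, 158, 167, 171, 178)

# Hypothetical hand-drawn line: height = 17 + 0.9 * armspan
my_pred  <- 17 + 0.9 * armspan
my_resid <- height - my_pred

# Accuracy summary: a single number, like the output of summarize()
total_resid <- sum(my_resid)
total_resid   # -1.5
```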
Describe and interpret, in words, what the output of your accuracy summary means.
Write down why adding positive and negative errors together is problematic for assessing prediction accuracy.
When we add residuals, the positive errors in our predictions (underestimates) are cancelled out by the negative errors (overestimates). This gives the impression that our model makes better predictions than it actually does.
To solve this problem, we square the errors, because squared values are never negative.
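A tiny example of why squaring helps, using two hypothetical residuals of equal size and opposite sign:

```r
# Two hypothetical residuals: one prediction 5 cm too low, one 5 cm too high
resid <- c(5, -5)

sum(resid)     # 0: the errors cancel out
sum(resid^2)   # 50: squared errors cannot cancel
```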
The mean squared error (MSE) is calculated by squaring all of the residuals, and then taking the mean of the squared residuals.
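In symbols, writing n for the number of people, y_i for person i’s actual height, and \hat{y}_i for the height the line predicts for that person (notation chosen here for illustration, not taken from the lab), each term inside the parentheses is one residual:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
```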
Fill in the blanks below to calculate the MSE of your line.
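For checking your approach, a base R sketch of the MSE calculation. The data values and the line are made up for illustration; in the lab you would use summarize() on the arm_span data instead.

```r
# Made-up armspan and height values (cm); not the real arm_span data
armspan <- c(150, 160, 165, 170, 180)
height  <- c(152, 158, 167, 171, 178)

# Residuals for a hypothetical hand-drawn line: height = 17 + 0.9 * armspan
my_resid <- height - (17 + 0.9 * armspan)

# MSE: square every residual, then take the mean of the squares
my_mse <- mean(my_resid^2)
my_mse   # 2.65
```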
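R can also find a line of best fit for predicting the height of people with similar armspans, using the function lm, which stands for linear model. Save the result as best_fit.
Type best_fit into the console to see the slope and intercept of the regression line.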
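A sketch of fitting the regression line with lm() on made-up data. The formula height ~ armspan (response on the left, predictor on the right) is standard lm() syntax, but the exact call on the lab slide is not reproduced here, so treat it as an assumption.

```r
# Made-up stand-in for the arm_span data (cm); not the real values
arm_span <- data.frame(
  armspan = c(150, 160, 165, 170, 180),
  height  = c(152, 158, 167, 171, 178)
)

# Fit height as a linear function of armspan
best_fit <- lm(height ~ armspan, data = arm_span)

best_fit         # prints the call plus the intercept and slope
coef(best_fit)   # just the coefficients: (Intercept) and armspan
```

For these made-up values the intercept is 15.05 and the slope is 0.91, so the fitted line is height = 15.05 + 0.91 * armspan.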
Add this line to a scatterplot by filling in the blanks below.
Making predictions is simpler with lines, or models, that R already knows about than with lines we come up with ourselves.
Make predictions of people’s heights using best_fit. The predict function takes a linear model as input and outputs the predictions of that model.
The lm() function creates the line of best fit equation by finding the line that minimizes the mean squared error. In other words, no other straight line has a smaller MSE for these data.
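The sketch below, again on made-up data, uses predict() to get the regression line’s predictions, computes their MSE, and then checks the minimization claim by nudging the intercept or slope slightly. The helper line_mse() is hypothetical, written only for this illustration.

```r
# Made-up stand-in for the arm_span data (cm); not the real values
arm_span <- data.frame(
  armspan = c(150, 160, 165, 170, 180),
  height  = c(152, 158, 167, 171, 178)
)
best_fit <- lm(height ~ armspan, data = arm_span)

# predict() with no new data returns one prediction per row used in the fit
lm_pred <- predict(best_fit)

# MSE of the regression line's predictions
lm_mse <- mean((arm_span$height - lm_pred)^2)
lm_mse   # 2.55

# Hypothetical helper: MSE of any line with the given intercept and slope
line_mse <- function(intercept, slope) {
  pred <- intercept + slope * arm_span$armspan
  mean((arm_span$height - pred)^2)
}

# Nudge the lm() intercept or slope in either direction
b <- coef(best_fit)
nudged <- c(
  line_mse(b[1] + 1, b[2]),
  line_mse(b[1] - 1, b[2]),
  line_mse(b[1], b[2] + 0.01),
  line_mse(b[1], b[2] - 0.01)
)
all(nudged > lm_mse)   # TRUE: every nudged line has a larger MSE
```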
Calculate the MSE for the values predicted using the regression line.
Compare the MSE of the linear model you fitted using
add_line() to the MSE of the linear model obtained with
lm(). Which linear model performed
better?
Ask your neighbors if any of their lines beat the
lm line in terms of the MSE. Were any of them
successful?