Lab 4B
Directions: Follow along with the slides, completing the questions in blue on your computer, and answering the questions in red in your journal.
... arm_span data again.
Create an xyplot with height on the y-axis and armspan on the x-axis.
Type add_line() to run the add_line function; you’ll be prompted to click twice in the plot window to create a line that you think fits the data well.
... heights based on their armspan:
... arm_span data.
... arm_span.
What do the residuals measure?
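As a rough sketch of the steps above, a hand-picked line can be turned into a prediction function, and its predictions and residuals stored as new columns. The data frame toy, its values, the function name predict_height, and the slope and intercept are all invented for illustration; they are not the arm_span data or the line you drew.

```r
# Hypothetical toy data standing in for arm_span (values invented for illustration)
toy <- data.frame(
  armspan = c(150, 160, 165, 170, 180),
  height  = c(152, 158, 167, 169, 181)
)

# A hand-picked line: height = slope * armspan + intercept.
# The slope (1) and intercept (2) are assumptions, not values from the lab.
predict_height <- function(armspan) {
  1 * armspan + 2
}

# Store the predictions and the residuals (actual minus predicted) as new columns.
# With dplyr loaded, the first step could be written as:
#   toy <- mutate(toy, predicted_height = predict_height(armspan))
toy$predicted_height <- predict_height(toy$armspan)
toy$residual <- toy$height - toy$predicted_height

print(toy)
```

With this sign convention, a positive residual means the line predicted too low and a negative residual means it predicted too high.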
One method we might consider to measure our model’s accuracy is to sum the residuals.
Fill in the blanks below to calculate our accuracy summary.
Hint: Like mutate, the first argument of summarize is a dataframe, and the second argument is the action to perform on a column of the dataframe. Whereas the output of mutate is a column, the output of summarize is (usually) a single number summary.
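One possible shape for such a summary is sketched below in base R, with the dplyr form shown in a comment. The data frame toy and its column names are hypothetical stand-ins, not the lab's data.

```r
# Hypothetical actual and predicted heights (values invented for illustration)
toy <- data.frame(
  height           = c(152, 158, 167, 169, 181),
  predicted_height = c(152, 162, 167, 172, 182)
)

# Residual = actual minus predicted
toy$residual <- toy$height - toy$predicted_height

# Collapse the residual column into a single number.
# With dplyr loaded, an equivalent call would be:
#   summarize(toy, sum_resid = sum(residual))
sum_resid <- sum(toy$residual)
print(sum_resid)  # -8
```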
Describe and interpret, in words, what the output of your accuracy summary means.
Write down why adding positive and negative errors together is problematic for assessing prediction accuracy.
When adding residuals, the positive errors in our predictions (underestimates) are cancelled out by the negative errors (overestimates), which gives the impression that our model is making better predictions than it actually is.
To solve this problem, we square the errors, because squared values are never negative.
The mean squared error (MSE) is calculated by squaring all of the residuals, and then taking the mean of the squared residuals.
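A small made-up example may help show the difference. The four residuals below are invented so that the overestimates and underestimates cancel exactly.

```r
# Invented residuals: two underestimates and two overestimates
residual <- c(5, -5, 3, -3)

# Summing lets positive and negative errors cancel
sum_resid <- sum(residual)
print(sum_resid)  # 0

# Squaring first removes the signs, so the errors can no longer cancel
mse <- mean(residual^2)
print(mse)        # 17
```

The sum suggests perfect predictions, even though every prediction was off by 3 or 5 units; the MSE of 17 reflects those misses.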
Fill in the blanks below to calculate the MSE of your line.
... height of people with similar armspans.
... lm, which stands for linear model:
Type best_fit into the console to see the slope and intercept of the regression line.
Add this line to a scatterplot by filling in the blanks below.
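The sketch below shows one way to fit a line with lm, read off its coefficients, and draw it. It uses base R graphics (plot and abline) as a stand-in for the plotting functions used in the lab, and the data frame toy is invented for illustration.

```r
# Hypothetical toy data standing in for arm_span (values invented for illustration)
toy <- data.frame(
  armspan = c(150, 160, 165, 170, 180),
  height  = c(152, 158, 167, 169, 181)
)

# Fit a linear model predicting height from armspan
best_fit <- lm(height ~ armspan, data = toy)

# The first coefficient is the intercept, the second is the slope
print(coef(best_fit))

# Base R stand-in for the lab's scatterplot: draw the points, then the fitted line
plot(height ~ armspan, data = toy)
abline(best_fit)
```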
... R is familiar with is simpler than with lines, or models, we come up with ourselves.
... best_fit:
The predict function takes a linear model as input and outputs the predictions of that model.
The lm() function creates the line of best fit equation by finding the line that minimizes the mean squared error. In other words, it is the best-fitting line possible by that measure.
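As a hedged illustration, predictions from a fitted model and the resulting MSE might be computed as follows. The data frame toy and its column names are invented; the lab's own blanks may be laid out differently.

```r
# Hypothetical toy data standing in for arm_span (values invented for illustration)
toy <- data.frame(
  armspan = c(150, 160, 165, 170, 180),
  height  = c(152, 158, 167, 169, 181)
)

best_fit <- lm(height ~ armspan, data = toy)

# predict() with only a model returns its predictions for the data it was fitted to.
# With dplyr loaded: toy <- mutate(toy, predicted_height = predict(best_fit))
toy$predicted_height <- predict(best_fit)
toy$residual <- toy$height - toy$predicted_height

# Mean of the squared residuals
mse_lm <- mean(toy$residual^2)
print(mse_lm)
```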
Calculate the MSE for the values predicted using the regression line.
Compare the MSE of the linear model you fitted using add_line() to the MSE of the linear model obtained with lm(). Which linear model performed better?
Ask your neighbors if any of their lines beat the lm line in terms of the MSE. Were any of them successful?
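One way to run this kind of comparison is sketched below. The helper mse_of_line, the toy data, and the hand-picked slope and intercept are all hypothetical and chosen only for illustration.

```r
# Hypothetical toy data standing in for arm_span (values invented for illustration)
toy <- data.frame(
  armspan = c(150, 160, 165, 170, 180),
  height  = c(152, 158, 167, 169, 181)
)

# Hypothetical helper: MSE of any line, given its slope and intercept
mse_of_line <- function(slope, intercept, data) {
  predicted <- slope * data$armspan + intercept
  mean((data$height - predicted)^2)
}

# MSE of the regression line
best_fit <- lm(height ~ armspan, data = toy)
mse_lm <- mean(resid(best_fit)^2)

# MSE of a hand-picked line (slope 1 and intercept 2 are assumptions)
mse_hand <- mse_of_line(slope = 1, intercept = 2, data = toy)

print(c(lm = mse_lm, hand = mse_hand))
print(mse_lm <= mse_hand)  # TRUE
```

On the data a regression line was fitted to, a different line can at best tie its MSE, so the comparison should come out the same way for any slope and intercept passed to the helper.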