# Previously

• In the previous lab, we learned we could make predictions about one variable by utilizing the information of another.
• In this lab, we will learn how to measure the accuracy of our predictions.
• This in turn will let us evaluate how well a model performs at making predictions.
• We’ll also use this information later to compare different models to find which model makes the best predictions.

# Predictions using a line

• Load the arm_span data again.
• Create an xyplot with height on the y-axis and armspan on the x-axis.
• Type add_line() to run the add_line function; you’ll be prompted to click twice in the plot window to create a line that you think fits the data well.
• Fill in the blanks below to create a function that will make predictions of people’s heights based on their armspan:
make_predictions <- function(armspans) {
____ * armspans + ____
}

• Fill in the blanks to include your predictions in the arm_span data.
____ <- mutate(____, predictions = ____(____))
• Now that we’ve made our predictions, we’ll need to figure out a way to decide how accurate our predictions are.
• We’ll want to compare our predicted heights to the actual heights.
• At the end, we’ll want to come up with a single number summary that describes our model’s accuracy.

# Sums of differences

• One method we might consider to measure our model’s accuracy is to sum the differences in the actual heights minus our predicted heights.
• What do these differences measure?
• Fill in the blanks below to create a function which calculates the sum of differences:
accuracy <- function(actual, predicted) {
sum(____ - ____)
}
• Then fill in the blanks to calculate our accuracy summary.
summarize(____, ____(____, ____))

# Checking our work

• Describe and interpret, in words, what the output of your accuracy summary means.
• Compare your accuracy summary with a neighbor. Whose line was more accurate and why?
• Write down why adding positive and negative errors together is problematic for accessing prediction accuracy.
• Why does calculating the squared values for the differences solve this problem?
• Alter your accuracy function to first calculate the differences, then square them and finally take the mean of the squared differences. This is called the mean squared error (MSE).
• Calculate the MSE of your line.

• Create a regression line as you did in the previous lab, for height and armspan.
• We also refer to regression lines as linear models.
• Assign this model the name best_fit.
• Making predictions with models R is familiar with is simpler than with lines, or models, we come up with ourselves.
• Fill in the blanks to make predictions using best_fit:
____ <- mutate(____, predictions = predict(____))
• Calculate the MSE for these new predicted values.

# The magic of lm()

• The lm() function creates the line of best fit equation by finding the line that minimizes the mean squared error. Meaning, it’s the best fitting line possible.
• Compare the MSE value you calculated using the line you fitted with add_line() to the the same value you calculated using the lm function.
• Ask your neighbors if any of their lines beat the lm line in terms of the MSE. Were any of them successful?
• To see how the lm line fits your data, create a scatterplot and then run:
add_line(intercept = ____, slope = ____)