Lab 4B

Directions: Follow along with the slides and answer the questions in **red** font in your journal.

- In the previous lab, we learned we could make predictions about one variable by utilizing the information of another.
- In this lab, we will learn how to measure the accuracy of our predictions.
- This in turn will let us evaluate how well a model performs at making predictions.
- We'll also use this information later to compare different models to find which model makes the best predictions.

- Load the `arm_span` data again.
  - Create an `xyplot` with `height` on the y-axis and `armspan` on the x-axis.
  - Type `add_line()` to run the `add_line` function; you'll be prompted to click twice in the plot window to create a line that you think fits the data well.

- Fill in the blanks below to create a function that will make predictions of people's `height`s based on their `armspan`:

```
make_predictions <- function(armspans) {
  ____ * armspans + ____
}
```

- Fill in the blanks to include your predictions in the `arm_span` data.

```
____ <- mutate(____, predictions = ____(____))
```

- Now that we've made our predictions, we'll need to figure out a way to decide how accurate our predictions are.
  - We'll want to compare our *predicted heights* to the *actual heights*.
  - At the end, we'll want to come up with a single number summary that describes our model's accuracy.

- One method we might consider to measure our model's accuracy is to sum the differences between the actual heights and our predicted heights (actual minus predicted).
  - **What do these differences measure?**
  - Fill in the blanks below to create a function which calculates the sum of differences:

```
accuracy <- function(actual, predicted) {
  sum(____ - ____)
}
```

- Then fill in the blanks to calculate our accuracy summary.

```
summarize(____, ____(____, ____))
```

- **Describe and interpret, in words, what the output of your accuracy summary means.**
- **Compare your accuracy summary with a neighbor. Whose line was more accurate and why?**

- **Write down why adding positive and negative errors together is problematic for assessing prediction accuracy.**
- **Why does calculating the squared values for the differences solve this problem?**

- Alter your accuracy function to first calculate the differences, then square them, and finally take the `mean` of the squared differences. This is called the *mean squared error* (MSE).
  - Calculate the MSE of your line.
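
The mean squared error described in words above can also be written as a formula. The notation is an assumption introduced here for illustration, not taken from the slides: $y_i$ is the actual height of person $i$, $\hat{y}_i$ is the predicted height, and $n$ is the number of people in the data.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

Because every difference is squared before averaging, each term is zero or positive, so the MSE is zero only when every prediction matches the actual value exactly.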

- Create a *regression line*, as you did in the previous lab, for `height` and `armspan`.
  - We also refer to *regression lines* as *linear models*.
  - Assign this model the name `best_fit`.

- Making predictions with models that `R` is familiar with is simpler than making them with lines, or models, that we come up with ourselves.
  - Fill in the blanks to make predictions using `best_fit`:

```
____ <- mutate(____, predictions = predict(____))
```

- Calculate the MSE for these new predicted values.

- The `lm()` function creates the *line of best fit* equation by finding the line that minimizes the *mean squared error*. In other words, it's the *best fitting line possible*.
  - Compare the MSE value you calculated using the line you fitted with `add_line()` to the same value you calculated using the `lm` function.
  - Ask your neighbors if any of their lines beat the `lm` line in terms of the MSE. Were any of them successful?

- To see how the `lm` line fits your data, create a scatterplot and then run:

```
add_line(intercept = ____, slope = ____)
```
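
The idea that the `lm` line has the smallest possible mean squared error can be checked on a small example. The sketch below is only an illustration under stated assumptions: the `toy` data frame holds made-up measurements (not the lab's `arm_span` data), the helper name `mse` is hypothetical, and the "guess" line stands in for whatever line you might draw by hand. It uses base `R` only.

```r
# Made-up measurements in centimeters; these are NOT the lab's arm_span data.
toy <- data.frame(
  armspan = c(150, 158, 163, 170, 176, 182),
  height  = c(152, 157, 165, 168, 177, 180)
)

# Hypothetical helper: the mean of the squared (actual - predicted) differences.
mse <- function(actual, predicted) {
  mean((actual - predicted)^2)
}

# A hypothetical hand-picked line: height = 1 * armspan + 0.
guess_predictions <- 1 * toy$armspan + 0

# The line fitted by lm(), and its predictions for the same people.
toy_fit <- lm(height ~ armspan, data = toy)
fit_predictions <- predict(toy_fit)

guess_mse <- mse(toy$height, guess_predictions)  # 3 for this toy data
fit_mse <- mse(toy$height, fit_predictions)

guess_mse
fit_mse
```

Any other slope and intercept can be swapped in for the guess. On the same data, the MSE of the `lm` line should never be larger than the MSE of a line chosen by hand.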