Lab 4A

Directions: Follow along with the slides and answer the questions in **red** font in your journal.

- Anyone can make predictions.
- Data scientists use data to inform their predictions: they use what they learn from a sample to make predictions about the whole population.

- In this lab, we’ll learn how to make predictions by finding the *line of best fit*.
  - You will also learn how to use the information from one variable to make predictions about another variable.

- Use the `data()` function to load the `arm_span` data.
  - This data comes from a sample of 90 people in the Los Angeles area.
  - The measurements of `height` and `armspan` are in inches.
  - A person’s `armspan` is the maximum distance between their fingertips when they spread their arms out wide.
- Make a plot of the `height` variable.
  - **If you had to predict the height of someone in the Los Angeles area, what single height would you choose and why?**
  - **Would you describe this as a *good* guess? What might you try to improve your predictions?**
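If you must give everyone the same guess, a natural choice is a typical value of `height`, such as its mean or median. Below is a minimal base R sketch. The data frame `toy` and its values are made up for illustration and are *not* the `arm_span` data, so base your journal answers on the lab's real data.

```r
# Hypothetical toy data standing in for arm_span (values invented, in inches)
toy <- data.frame(
  armspan = c(60, 62, 62, 63, 65, 65, 66, 70),
  height  = c(61, 63, 62, 64, 66, 64, 67, 71)
)

# A single guess for everyone: a "typical" height
guess_mean   <- mean(toy$height)
guess_median <- median(toy$height)
print(c(mean = guess_mean, median = guess_median))

# How far off the single guess is for each person (actual minus guess)
misses <- toy$height - guess_mean
print(range(misses))
```

The spread of `misses` is one way to judge whether a single guess is *good*.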

- Create two subsets of our `arm_span` data:
  - One for `armspan >= 61 & armspan <= 63`.
  - A second for `armspan >= 64 & armspan <= 66`.
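In base R, `subset()` keeps the rows that meet a condition, and `&` requires both conditions to hold. The sketch below uses the hypothetical `toy` data. The names `toy`, `arm_62`, and `arm_65` are illustrative, and the lab's slides may use a different function to create subsets.

```r
# Hypothetical toy data standing in for arm_span (values invented, in inches)
toy <- data.frame(
  armspan = c(60, 62, 62, 63, 65, 65, 66, 70),
  height  = c(61, 63, 62, 64, 66, 64, 67, 71)
)

# People whose armspan is between 61 and 63 inches, inclusive
arm_62 <- subset(toy, armspan >= 61 & armspan <= 63)

# People whose armspan is between 64 and 66 inches, inclusive
arm_65 <- subset(toy, armspan >= 64 & armspan <= 66)

print(nrow(arm_62))
print(nrow(arm_65))
```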
- Create a histogram for the `height` of people in each subset. Answer the following based on the data:
  - **What `height` would you predict if you knew a person had an `armspan` around 62 inches?**
  - **What `height` would you predict if you knew a person had an `armspan` around 65 inches?**
  - **Does knowing someone’s `armspan` help you predict their height? Why or why not?**
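One way to turn each histogram into a single prediction is to take the mean `height` within each subset. The sketch below works on the hypothetical `toy` data. `hist()` is base R's histogram function and may differ from the plotting function used in the slides.

```r
# Hypothetical toy data standing in for arm_span (values invented, in inches)
toy <- data.frame(
  armspan = c(60, 62, 62, 63, 65, 65, 66, 70),
  height  = c(61, 63, 62, 64, 66, 64, 67, 71)
)
arm_62 <- subset(toy, armspan >= 61 & armspan <= 63)
arm_65 <- subset(toy, armspan >= 64 & armspan <= 66)

# One histogram of height per subset
hist(arm_62$height, main = "armspan 61 to 63 in", xlab = "height (in)")
hist(arm_65$height, main = "armspan 64 to 66 in", xlab = "height (in)")

# Predicted height for each group: the group's mean height
pred_62 <- mean(arm_62$height)
pred_65 <- mean(arm_65$height)
print(c(around_62 = pred_62, around_65 = pred_65))
```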

- Notice the trend: people with a larger `armspan` also tend to have a larger mean `height`.
  - One way of describing this sort of trend is with a line.

- Data scientists often *fit* lines to their data to make predictions.
  - By *fit*, we mean finding a line that is close to as many of the data points as possible.
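To make "close to the data points" concrete, you can measure the vertical gap between each point and a candidate line. In the sketch below, the line `height = 2 + 0.97 * armspan` is a hypothetical guess and `toy` is made-up data.

```r
# Hypothetical toy data standing in for arm_span (values invented, in inches)
toy <- data.frame(
  armspan = c(60, 62, 62, 63, 65, 65, 66, 70),
  height  = c(61, 63, 62, 64, 66, 64, 67, 71)
)

# Hypothetical candidate line: height = a + b * armspan
a <- 2
b <- 0.97
predicted <- a + b * toy$armspan

# Vertical gap between each point and the line (positive = point above the line)
gaps <- toy$height - predicted
print(round(gaps, 2))

# How many points fall within 1 inch of the line?
print(sum(abs(gaps) <= 1))
```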
- Create a scatterplot for `height` and `armspan`. Then run the following code. Draw a line by clicking twice on the *Plot* pane.

- Draw a line that you think is a good *fit* and write down its equation. Using this equation:
  - **Predict how tall a person with a 62-inch `armspan` and a person with a 65-inch `armspan` would be.**
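Once you have an equation, a prediction is just arithmetic: plug the `armspan` into the line. In this R sketch, `my_line()` and its coefficients (2 and 0.97) are hypothetical placeholders for your own equation.

```r
# Hypothetical hand-drawn line: height = 2 + 0.97 * armspan
# Replace 2 and 0.97 with the intercept and slope of your own line.
my_line <- function(armspan) 2 + 0.97 * armspan

# Predicted heights for armspans of 62 and 65 inches
print(my_line(c(62, 65)))
```

For example, 2 + 0.97 × 62 = 62.14 inches.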

- A line also lets us make predictions for `armspan` values that aren’t in our data.
  - **How tall would you predict a person with a 63.5-inch `armspan` to be?**
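The line does not need a person with exactly that `armspan` in the data. The sketch below continues the hypothetical example. The `toy_armspan` values and `my_line()` are made up.

```r
# Hypothetical armspans in a small sample (inches)
toy_armspan <- c(60, 62, 62, 63, 65, 65, 66, 70)
my_line <- function(armspan) 2 + 0.97 * armspan

# Nobody in the sample has a 63.5-inch armspan...
print(63.5 %in% toy_armspan)   # FALSE
# ...but the line still produces a prediction
print(my_line(63.5))
```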

**Compare your answers with a neighbor. Did both of you come up with the same equation for a line? If not, can you tell which line fits the data best?**
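One common yardstick for "which line fits best" is the sum of squared vertical gaps between the points and the line. The smaller the total, the closer the line is to the data overall. This is only a sketch: the two equations and the `toy` data are hypothetical, and `sse()` is a helper defined here, not a course function.

```r
# Hypothetical toy data standing in for arm_span (values invented, in inches)
toy <- data.frame(
  armspan = c(60, 62, 62, 63, 65, 65, 66, 70),
  height  = c(61, 63, 62, 64, 66, 64, 67, 71)
)

# Two hypothetical lines that two neighbors might have drawn
line_A <- function(x) 2 + 0.97 * x
line_B <- function(x) 10 + 0.85 * x

# Sum of squared vertical gaps between the data and a line
sse <- function(line, data) sum((data$height - line(data$armspan))^2)

print(c(A = sse(line_A, toy), B = sse(line_B, toy)))
```

Squaring keeps positive and negative gaps from cancelling each other out.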

- If you went around your class, you would find that each student drew a different line that they feel *fits* the data best.
  - This is a problem because everyone’s line will make slightly different predictions.

- To avoid this variation in predictions, data scientists use *regression lines*.
  - A regression line connects the mean `height` of people with similar `armspan` values.
- Fill in the blanks below to create a *regression line* using `lm()`, short for *linear model*:
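For reference, a complete `lm()` call in base R uses a formula such as `height ~ armspan`. The variable to predict goes on the left of `~` and the predictor on the right. The sketch below fits this formula to the hypothetical `toy` data. With the real data you would pass `arm_span` instead.

```r
# Hypothetical toy data standing in for arm_span (values invented, in inches)
toy <- data.frame(
  armspan = c(60, 62, 62, 63, 65, 65, 66, 70),
  height  = c(61, 63, 62, 64, 66, 64, 67, 71)
)

# Linear model: predict height from armspan
fit <- lm(height ~ armspan, data = toy)

# Printing the model shows the intercept and the slope (the armspan coefficient)
print(fit)
```

A least-squares line fitted with an intercept always passes through the point (mean `armspan`, mean `height`).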

- Use the output of the code from the previous slide to write down the equation of the *regression line* in the form `y = a + bx`.
- Add this line to a scatterplot by filling in the blanks below:
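For comparison, `coef()` pulls the intercept `a` and slope `b` out of a fitted model, and base R's `abline()` draws the line `y = a + bx` on a scatterplot. The sketch below uses the hypothetical `toy` data, and the slides may use different plotting functions.

```r
# Hypothetical toy data standing in for arm_span (values invented, in inches)
toy <- data.frame(
  armspan = c(60, 62, 62, 63, 65, 65, 66, 70),
  height  = c(61, 63, 62, 64, 66, 64, 67, 71)
)
fit <- lm(height ~ armspan, data = toy)

# Intercept (a) and slope (b) of the regression line y = a + bx
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["armspan"]]
cat(sprintf("height = %.2f + %.2f * armspan\n", a, b))

# Scatterplot with the regression line added
plot(height ~ armspan, data = toy, xlab = "armspan (in)", ylab = "height (in)")
abline(a = a, b = b)
```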

- Predict the height of a person with a 63.5-inch `armspan` and compare it with a neighbor. Make sure you both arrive at the same predicted value.
  - **Measure your `armspan` and use the regression line to predict your height. How close was the prediction?**
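`predict()` plugs new `armspan` values into a fitted model, which gives the same answer as computing `a + b * 63.5` by hand. In this sketch, `toy`, the 68-inch armspan, and the 67-inch measured height are all hypothetical stand-ins for your own measurements.

```r
# Hypothetical toy data standing in for arm_span (values invented, in inches)
toy <- data.frame(
  armspan = c(60, 62, 62, 63, 65, 65, 66, 70),
  height  = c(61, 63, 62, 64, 66, 64, 67, 71)
)
fit <- lm(height ~ armspan, data = toy)

# Predictions for a 63.5-inch armspan and a hypothetical measured armspan of 68 inches
new_people <- data.frame(armspan = c(63.5, 68))
preds <- predict(fit, newdata = new_people)
print(preds)

# Same first prediction computed by hand from y = a + bx
by_hand <- coef(fit)[[1]] + coef(fit)[[2]] * 63.5
print(by_hand)

# How close was the prediction? (hypothetical measured height of 67 inches)
my_height <- 67
print(my_height - preds[[2]])
```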