A Diamond in the Rough

Lab 1F

Directions: Follow along with the slides and answer the questions in red font in your journal.

Messy data? Get used to it

Messy data?

The American Time Use Survey

Load and go:

data(atu_dirty)
View(atu_dirty)

Description of ATU Variables

New name, same old data

atu_cleaner <- rename(atu_dirty, age = V1,
                       gender = V2)
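
To see what rename() does without loading the full survey, here is a minimal sketch on a made-up data frame. It assumes the dplyr package is installed and that the lab's rename() behaves like dplyr's; the data frame toy and its values are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for atu_dirty: two columns with unhelpful names.
toy <- data.frame(V1 = c("25", "40"), V2 = c("01", "02"),
                  stringsAsFactors = FALSE)

# rename() takes new_name = old_name pairs; the values are untouched.
toy_renamed <- rename(toy, age = V1, gender = V2)

names(toy_renamed)
```

Only the column names change, which is why the slide calls it "same old data".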

Next up: Strings

"string"
"A1B2c3"
"Hot Cocoa"
"0015"

Numbers are words? (Sometimes)

Changing strings into numbers

as.numeric("3.14")
## [1] 3.14
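
As a small illustration using the example strings from the earlier slide, the sketch below shows what as.numeric() does with a padded number and with a string that is not a number at all. The variable names are made up for this example.

```r
# A string that looks like a number converts cleanly; leading zeros are dropped.
padded <- as.numeric("0015")

# A string that is not a number becomes NA. R also issues a warning,
# which is silenced here so the example runs quietly.
not_a_number <- suppressWarnings(as.numeric("Hot Cocoa"))

padded
is.na(not_a_number)
```

Checking for NA values after a conversion is a quick way to spot entries that were never numbers to begin with.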

Mutating in action

atu_cleaner <- mutate(atu_cleaner,
                      age = as.numeric(age),
                      ___ = as.numeric(___))
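
The same pattern can be tried on a toy data frame first. This is only a sketch: it assumes dplyr is installed, and the data frame toy and its score column are hypothetical, not part of the ATU data.

```r
library(dplyr)

# Hypothetical data: both columns were read in as strings.
toy <- data.frame(age = c("25", "40"), score = c("3.5", "7"),
                  stringsAsFactors = FALSE)

# mutate() overwrites a column when the name on the left already exists.
toy <- mutate(toy,
              age = as.numeric(age),
              score = as.numeric(score))

class(toy$age)
mean(toy$age)
```

Once the columns are numeric, functions such as mean() work on them; on the original strings they would not.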

Deciphering Categorical Variables

Factors and Levels

tally(~gender, data = atu_cleaner)
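
To make factors and levels concrete, here is a short base R sketch. The vector gender_codes is invented for illustration, and table() is used in place of tally() so that no extra package is needed.

```r
# Hypothetical coded values, stored as a factor.
gender_codes <- factor(c("01", "02", "02", "01", "02"))

# levels() lists the distinct categories the factor can take.
levels(gender_codes)

# table() counts how many values fall in each level.
counts <- table(gender_codes)
counts
```

The levels here are the codes "01" and "02", which say nothing about what each category means.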

A level by any other name…

atu_cleaner <- mutate(atu_cleaner,
                      gender = recode(gender,
                                      "01" = "Male",
                                      "02" = "Female"))

Allow me to explain

atu_cleaner <- mutate(atu_cleaner,
                      gender = recode(gender,
                                      "01" = "Male",
                                      "02" = "Female"))

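The recode() step can also be tried on its own, outside mutate(). The sketch below assumes dplyr is installed and that the lab's recode() behaves like dplyr's; the vector codes is made up for illustration.

```r
library(dplyr)

# Hypothetical factor holding the raw survey codes.
codes <- factor(c("01", "02", "02", "01"))

# Each "old" = "new" pair relabels one level; levels not listed are left alone.
gender_labels <- recode(codes, "01" = "Male", "02" = "Female")

levels(gender_labels)
as.character(gender_labels)
```

Wrapping this call in mutate(), as the slides do, writes the relabeled values back into the gender column of the data frame.
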
Finish it off!

The final lines

Run the code below:

atu_clean <- atu_cleaner

Run the code below:

save(atu_clean, file = "atu_clean.Rda")
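
A quick way to confirm that a saved file can be read back is a save and load round trip. The sketch below uses a hypothetical data frame toy_clean and a temporary file, so it does not touch atu_clean.Rda.

```r
# Hypothetical cleaned data frame standing in for atu_clean.
toy_clean <- data.frame(age = c(25, 40), gender = c("Male", "Female"))

# Save to a temporary location so the sketch leaves nothing in the project.
path <- file.path(tempdir(), "toy_clean.Rda")
save(toy_clean, file = path)

# Remove the object, then load() restores it under its original name.
rm(toy_clean)
load(path)
exists("toy_clean")
```

Note that load() brings the object back under the name it was saved with, so no assignment is needed.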

Flex your skills

Run the code below:

histogram(~calories | healthy_level, data = food)

Notice that the healthy_level categories are now numbers rather than tick marks. This is an improvement, but an even better solution would be to recode the categories.