dplyr is the tidyverse’s grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges. Think of dplyr verbs as the fundamental building blocks for data transformation.
library(tidyverse)

# We'll use the built-in mtcars dataset for examples
data(mtcars)
cars <- as_tibble(mtcars, rownames = "model")
glimpse(cars)
# A tibble: 2 × 2
  model           mpg
  <chr>         <dbl>
1 Mazda RX4        21
2 Mazda RX4 Wag    21
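The two-row table above is the kind of result a `filter()` followed by `select()` produces. The code for that step does not appear in this section, so the following is only a sketch: the condition `mpg == 21` is an assumption chosen because it is consistent with the rows shown, not something the text states.

```r
library(dplyr)
library(tibble)

cars <- as_tibble(mtcars, rownames = "model")

# Assumed condition (not from the text): keep cars with exactly 21 mpg,
# then keep only the model and mpg columns
mazda <- cars %>%
  filter(mpg == 21) %>%
  select(model, mpg)

mazda
```

Other conditions, such as matching on the model name, could return the same two rows.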
Filtering Missing Values
# Create data with NAs
cars_with_na <- cars %>%
  mutate(mpg = if_else(row_number() %in% c(2, 5, 8), NA_real_, mpg))

# Remove rows with NA in mpg
cars_with_na %>%
  filter(!is.na(mpg)) %>%
  select(model, mpg) %>%
  head()
# A tibble: 6 × 5
  model               mpg    wt efficiency    size
  <chr>             <dbl> <dbl> <chr>         <chr>
1 Mazda RX4          21    2.62 Efficient     Light
2 Mazda RX4 Wag      21    2.88 Efficient     Light
3 Datsun 710         22.8  2.32 Efficient     Light
4 Hornet 4 Drive     21.4  3.22 Efficient     Heavy
5 Hornet Sportabout  18.7  3.44 Not Efficient Heavy
6 Valiant            18.1  3.46 Not Efficient Heavy
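The table above carries two derived columns, `efficiency` and `size`, that label each car based on `mpg` and `wt`. As a minimal sketch of how such two-way labels can be built with `mutate()` and `if_else()`, the example below assumes cutoffs of 20 mpg and a weight of 3 (in 1000 lbs). Both cutoffs are guesses that happen to agree with the six rows shown; the text does not state them.

```r
library(dplyr)
library(tibble)

cars <- as_tibble(mtcars, rownames = "model")

# Cutoffs (20 mpg, wt of 3) are assumptions, not values given in the text
labelled <- cars %>%
  mutate(
    efficiency = if_else(mpg > 20, "Efficient", "Not Efficient"),
    size = if_else(wt > 3, "Heavy", "Light")
  ) %>%
  select(model, mpg, wt, efficiency, size)

head(labelled)
```

`if_else()` suits a single yes/no split; with three or more categories, `case_when()` is usually easier to read.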
# Using case_when() for multiple conditions
cars %>%
  mutate(
    performance = case_when(
      hp < 100 ~ "Low Power",
      hp < 150 ~ "Medium Power",
      hp < 200 ~ "High Power",
      TRUE ~ "Very High Power"
    ),
    efficiency = case_when(
      mpg > 30 ~ "Excellent",
      mpg > 25 ~ "Good",
      mpg > 20 ~ "Fair",
      TRUE ~ "Poor"
    )
  ) %>%
  select(model, hp, mpg, performance, efficiency) %>%
  head(10)
# A tibble: 10 × 5
   model                hp   mpg performance     efficiency
   <chr>             <dbl> <dbl> <chr>           <chr>
 1 Mazda RX4           110  21   Medium Power    Fair
 2 Mazda RX4 Wag       110  21   Medium Power    Fair
 3 Datsun 710           93  22.8 Low Power       Fair
 4 Hornet 4 Drive      110  21.4 Medium Power    Fair
 5 Hornet Sportabout   175  18.7 High Power      Poor
 6 Valiant             105  18.1 Medium Power    Poor
 7 Duster 360          245  14.3 Very High Power Poor
 8 Merc 240D            62  24.4 Low Power       Fair
 9 Merc 230             95  22.8 Low Power       Fair
10 Merc 280            123  19.2 Medium Power    Poor
# Group by cylinder and calculate summaries
cars %>%
  group_by(cyl) %>%
  summarize(
    n = n(),
    avg_mpg = mean(mpg),
    avg_hp = mean(hp),
    avg_wt = mean(wt),
    .groups = "drop"
  )
# Summarize all numeric columns
cars %>%
  summarize(across(where(is.numeric), mean)) %>%
  round(2)
# A tibble: 1 × 11
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  20.1  6.19  231.  147.   3.6  3.22  17.8  0.44  0.41  3.69  2.81
Exercises
Exercise 1: Basic Verbs
Using the iris dataset:
1. Select only the Petal columns and Species
2. Filter for Petal.Length > 4
3. Create a new column for Petal.Area (Length × Width)
4. Arrange by Petal.Area descending
5. Show only the top 10 rows
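One possible approach to Exercise 1, offered as a sketch rather than the only correct answer. Each pipeline step maps to one numbered item above; the result name `ex1` is just an illustrative choice.

```r
library(dplyr)

ex1 <- iris %>%
  select(starts_with("Petal"), Species) %>%         # 1. Petal columns and Species
  filter(Petal.Length > 4) %>%                      # 2. keep longer petals
  mutate(Petal.Area = Petal.Length * Petal.Width) %>%  # 3. new column
  arrange(desc(Petal.Area)) %>%                     # 4. largest area first
  head(10)                                          # 5. top 10 rows

ex1
```

`slice_max(Petal.Area, n = 10)` could replace the last two steps, though it may return more than 10 rows when there are ties.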
Exercise 2: Grouped Operations
Using mtcars:
1. Group by number of gears
2. Calculate average mpg and hp for each group
3. Add a column showing the difference from overall average mpg
4. Keep only groups where average mpg > 20
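A sketch of one way to approach Exercise 2. It assumes "overall average mpg" means the mean over all cars rather than the mean of the group means; the names `overall_mpg`, `mpg_diff`, and `ex2` are illustrative.

```r
library(dplyr)

# Assumption: the overall average is taken across every car in mtcars
overall_mpg <- mean(mtcars$mpg)

ex2 <- mtcars %>%
  group_by(gear) %>%                                   # 1. one group per gear count
  summarize(
    avg_mpg = mean(mpg),                               # 2. group averages
    avg_hp = mean(hp),
    .groups = "drop"
  ) %>%
  mutate(mpg_diff = avg_mpg - overall_mpg) %>%         # 3. gap to overall mean
  filter(avg_mpg > 20)                                 # 4. keep efficient groups

ex2
```

Because `summarize()` returns one row per group, the final `filter()` acts on groups rather than on individual cars.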
Exercise 3: Complex Pipeline
Create a pipeline that:
1. Filters cars with 4 or 6 cylinders
2. Creates an efficiency score (mpg × 1000 / (hp × wt))
3. Groups by cylinder count
4. Finds the most and least efficient car in each group
5. Presents the results in a clean summary table
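A possible pipeline for Exercise 3, as a sketch under a few assumptions: the column names (`efficiency_score`, `most_efficient`, and so on) are illustrative, and "clean summary table" is read here as one row per cylinder count.

```r
library(dplyr)
library(tibble)

cars <- as_tibble(mtcars, rownames = "model")

ex3 <- cars %>%
  filter(cyl %in% c(4, 6)) %>%                          # 1. 4- or 6-cylinder cars
  mutate(efficiency_score = mpg * 1000 / (hp * wt)) %>% # 2. score from the exercise
  group_by(cyl) %>%                                     # 3. group by cylinders
  summarize(                                            # 4-5. best and worst per group
    most_efficient = model[which.max(efficiency_score)],
    best_score = max(efficiency_score),
    least_efficient = model[which.min(efficiency_score)],
    worst_score = min(efficiency_score),
    .groups = "drop"
  )

ex3
```

`which.max()` and `which.min()` return the first match, so a tie within a group would report only one car.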
Summary
dplyr provides a powerful, intuitive grammar for data manipulation:
select(): Choose your columns
filter(): Choose your rows
mutate(): Create new variables
arrange(): Order your data
summarize(): Calculate summaries
group_by(): Split-apply-combine operations
These verbs can be combined in endless ways to solve virtually any data manipulation challenge. The key is to think of data transformation as a series of simple steps, each accomplishing one specific task.
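To illustrate the "series of simple steps" idea, here is a short sketch that uses all six verbs in one pipeline. The 100 hp cutoff and the `power_to_weight` column are hypothetical choices made for the example, not taken from the text above.

```r
library(dplyr)
library(tibble)

cars <- as_tibble(mtcars, rownames = "model")

summary_tbl <- cars %>%
  select(model, mpg, hp, wt, cyl) %>%     # choose columns
  filter(hp > 100) %>%                    # choose rows (hypothetical cutoff)
  mutate(power_to_weight = hp / wt) %>%   # create a new variable
  group_by(cyl) %>%                       # split by cylinder count
  summarize(                              # one summary row per group
    n = n(),
    avg_mpg = mean(mpg),
    avg_power_to_weight = mean(power_to_weight),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_mpg))                  # order the result

summary_tbl
```

Each line does one job, so the pipeline can be read top to bottom and any step can be commented out to inspect the intermediate result.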