Tibbles: Modern Data Frames
library(tidyverse)
What are Tibbles?
Tibbles are a modern reimagining of data frames, designed to work seamlessly within the tidyverse. While they behave like data frames in most ways, tibbles have several improvements that make them safer and more user-friendly for data analysis.
Creating Tibbles
From Scratch with tibble()
The most common way to create a tibble is with the tibble() function:
# Create a tibble
my_tibble <- tibble(
id = 1:5,
name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
score = c(85.5, 92.3, 78.9, 95.1, 88.7),
passed = score >= 80
)
my_tibble
# A tibble: 5 × 4
id name score passed
<int> <chr> <dbl> <lgl>
1 1 Alice 85.5 TRUE
2 2 Bob 92.3 TRUE
3 3 Charlie 78.9 FALSE
4 4 Diana 95.1 TRUE
5 5 Eve 88.7 TRUE
Converting Data Frames to Tibbles
You can convert existing data frames to tibbles:
# Traditional data frame
df <- data.frame(
x = 1:3,
y = c("a", "b", "c")
)
# Convert to tibble
tbl <- as_tibble(df)
# Compare the printing
print(df)
x y
1 1 a
2 2 b
3 3 c
print(tbl)
# A tibble: 3 × 2
x y
<int> <chr>
1 1 a
2 2 b
3 3 c
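One detail worth knowing when converting: tibbles do not keep row names. The sketch below uses the built-in mtcars data frame, which is not part of the example above, and assumes you want the row names preserved as an ordinary column. The column name "model" is an arbitrary choice for illustration.

```r
library(tibble)

# mtcars stores the car models as row names
head(rownames(mtcars), 3)

# By default, as_tibble() drops row names
cars_tbl <- as_tibble(mtcars)

# The rownames argument moves them into a regular first column instead
cars_named <- as_tibble(mtcars, rownames = "model")
cars_named
```

If the identifiers in the row names matter for later joins or filters, converting with the rownames argument avoids losing them silently.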
Using tribble() for Row-wise Creation
tribble() allows you to create tibbles row-by-row, which is great for small datasets:
# Create a tibble row by row
grades <- tribble(
~student, ~math, ~science, ~english,
#---------|------|---------|---------|
"Alice", 85, 90, 88,
"Bob", 92, 88, 85,
"Charlie", 78, 82, 90,
"Diana", 95, 92, 87
)
grades
# A tibble: 4 × 4
student math science english
<chr> <dbl> <dbl> <dbl>
1 Alice 85 90 88
2 Bob 92 88 85
3 Charlie 78 82 90
4 Diana 95 92 87
Key Differences from Data Frames
1. Enhanced Printing
Tibbles have a more informative print method:
# Create a larger tibble
large_tibble <- tibble(
id = 1:1000,
value = rnorm(1000),
category = sample(letters[1:5], 1000, replace = TRUE),
timestamp = Sys.time() + 1:1000
)
# Tibble shows first 10 rows and all columns that fit on screen
large_tibble
# A tibble: 1,000 × 4
id value category timestamp
<int> <dbl> <chr> <dttm>
1 1 0.00233 b 2025-09-22 00:35:53
2 2 0.361 a 2025-09-22 00:35:54
3 3 -0.476 c 2025-09-22 00:35:55
4 4 -0.884 e 2025-09-22 00:35:56
5 5 -1.01 d 2025-09-22 00:35:57
6 6 -1.25 b 2025-09-22 00:35:58
7 7 2.38 a 2025-09-22 00:35:59
8 8 0.828 a 2025-09-22 00:36:00
9 9 -0.0807 b 2025-09-22 00:36:01
10 10 0.360 e 2025-09-22 00:36:02
# ℹ 990 more rows
# Data frame would print all 1000 rows!
# df <- as.data.frame(large_tibble)
# df # Don't run this - it would print everything!
2. Column Types are Preserved
Tibbles don’t automatically convert strings to factors:
# data.frame() converted strings to factors by default before R 4.0.0;
# since R 4.0.0 it keeps them as character, as the output below shows
df <- data.frame(
text = c("apple", "banana", "cherry"),
value = 1:3
)
# Tibble preserves strings as character vectors
tbl <- tibble(
text = c("apple", "banana", "cherry"),
value = 1:3
)
# Check the types
str(df)
'data.frame': 3 obs. of 2 variables:
$ text : chr "apple" "banana" "cherry"
$ value: int 1 2 3
str(tbl)
tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
$ text : chr [1:3] "apple" "banana" "cherry"
$ value: int [1:3] 1 2 3
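On a current R installation both structures show chr, so the difference is easy to miss. As a minimal sketch, the old data frame behavior can be reproduced by setting stringsAsFactors = TRUE explicitly. The object names here are illustrative only.

```r
library(tibble)

fruit <- c("apple", "banana", "cherry")

# Explicitly request the pre-4.0.0 default to reproduce the old behavior
old_style <- data.frame(text = fruit, stringsAsFactors = TRUE)
class(old_style$text)   # "factor"

# tibble() never converts strings on its own
tbl_style <- tibble(text = fruit)
class(tbl_style$text)   # "character"

# When a factor is wanted, ask for it explicitly
tbl_factor <- tibble(text = factor(fruit))
class(tbl_factor$text)  # "factor"
```

With tibbles the conversion is always an explicit decision in the code, which is what makes the column types predictable across R versions.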
3. Stricter Subsetting
Tibbles are stricter about subsetting: [ always returns another tibble, and $ never does partial name matching:
# Create a tibble
tbl <- tibble(x = 1:3, y = 4:6)
# Single bracket always returns a tibble
tbl[1] # Returns a tibble with one column
# A tibble: 3 × 1
x
<int>
1 1
2 2
3 3
tbl["x"] # Returns a tibble with column x
# A tibble: 3 × 1
x
<int>
1 1
2 2
3 3
# Double bracket and $ extract the column
tbl[[1]] # Returns the vector
[1] 1 2 3
tbl$x # Returns the vector
[1] 1 2 3
tbl[["x"]] # Returns the vector
[1] 1 2 3
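# Tibbles don't do partial matching
# tbl$xy # Returns NULL with a warning: there is no column named xy
# (with a column named value, tbl$val would also give NULL plus a warning,
# whereas a data frame would silently return the value column)
Working with Tibbles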
Adding and Modifying Columns
# Start with a basic tibble
students <- tibble(
name = c("Alice", "Bob", "Charlie"),
midterm = c(85, 78, 92),
final = c(88, 82, 90)
)
# Add columns with mutate
students <- students %>%
mutate(
average = (midterm + final) / 2,
grade = case_when(
average >= 90 ~ "A",
average >= 80 ~ "B",
average >= 70 ~ "C",
TRUE ~ "D"
)
)
students
# A tibble: 3 × 5
name midterm final average grade
<chr> <dbl> <dbl> <dbl> <chr>
1 Alice 85 88 86.5 B
2 Bob 78 82 80 B
3 Charlie 92 90 91 A
Accessing Tibble Information
# Create a sample tibble
data <- tibble(
id = 1:100,
group = sample(LETTERS[1:4], 100, replace = TRUE),
value = rnorm(100, mean = 50, sd = 10)
)
# Get dimensions
nrow(data)
[1] 100
ncol(data)
[1] 3
dim(data)
[1] 100 3
# Get column names
names(data)
[1] "id" "group" "value"
colnames(data)
[1] "id" "group" "value"
# Check if it's a tibble
is_tibble(data)
[1] TRUE
is.data.frame(data) # TRUE - tibbles are also data frames
[1] TRUE
# Get a glimpse of the structure
glimpse(data)
Rows: 100
Columns: 3
$ id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1…
$ group <chr> "D", "B", "C", "D", "B", "D", "B", "B", "C", "C", "A", "B", "C",…
$ value <dbl> 47.78940, 45.17783, 54.64428, 41.40415, 46.95217, 41.23244, 54.6…
Advanced Tibble Features
List Columns
Tibbles can contain list columns, allowing complex data structures:
# Create a tibble with list columns
complex_data <- tibble(
id = 1:3,
name = c("Experiment A", "Experiment B", "Experiment C"),
measurements = list(
c(1.2, 1.5, 1.3),
c(2.1, 2.3, 2.2, 2.4),
c(3.1, 3.0)
),
metadata = list(
list(date = "2024-01-01", operator = "Alice"),
list(date = "2024-01-02", operator = "Bob"),
list(date = "2024-01-03", operator = "Charlie")
)
)
complex_data
# A tibble: 3 × 4
id name measurements metadata
<int> <chr> <list> <list>
1 1 Experiment A <dbl [3]> <named list [2]>
2 2 Experiment B <dbl [4]> <named list [2]>
3 3 Experiment C <dbl [2]> <named list [2]>
# Access list column elements
complex_data$measurements[[1]]
[1] 1.2 1.5 1.3
complex_data$metadata[[2]]$operator
[1] "Bob"
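Element-by-element access extends naturally to computing one summary per row. The following is a sketch under the assumption that each element of the list column is a numeric vector. It rebuilds a reduced version of the tibble above so that it runs on its own; the names experiments, summarized, n_obs, and mean_value are illustrative.

```r
library(tibble)
library(dplyr)
library(purrr)

experiments <- tibble(
  id = 1:3,
  measurements = list(
    c(1.2, 1.5, 1.3),
    c(2.1, 2.3, 2.2, 2.4),
    c(3.1, 3.0)
  )
)

summarized <- experiments %>%
  mutate(
    # lengths() returns the length of every list element
    n_obs = lengths(measurements),
    # map_dbl() applies mean() to each element and returns a double vector
    mean_value = map_dbl(measurements, mean)
  )
summarized
```

map_dbl() fails if any element does not reduce to a single number, which acts as a useful check that the list column really holds what you expect.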
Nested Tibbles
You can nest tibbles within tibbles:
# Create nested data
sales_data <- tibble(
region = c("North", "South", "East", "West"),
data = list(
tibble(month = 1:3, sales = c(100, 120, 110)),
tibble(month = 1:3, sales = c(80, 85, 90)),
tibble(month = 1:3, sales = c(95, 100, 105)),
tibble(month = 1:3, sales = c(110, 115, 125))
)
)
sales_data
# A tibble: 4 × 2
region data
<chr> <list>
1 North <tibble [3 × 2]>
2 South <tibble [3 × 2]>
3 East <tibble [3 × 2]>
4 West <tibble [3 × 2]>
# Access nested tibble
sales_data$data[[1]]
# A tibble: 3 × 2
month sales
<int> <dbl>
1 1 100
2 2 120
3 3 110
# Work with nested data using purrr
sales_data %>%
mutate(total_sales = map_dbl(data, ~sum(.x$sales)))
# A tibble: 4 × 3
region data total_sales
<chr> <list> <dbl>
1 North <tibble [3 × 2]> 330
2 South <tibble [3 × 2]> 255
3 East <tibble [3 × 2]> 300
4 West <tibble [3 × 2]> 350
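Nested tibbles are usually built from flat data rather than typed by hand. As a sketch, tidyr's nest() and unnest() convert between the two shapes. The example uses a two-region subset of the data above so that it is self-contained; the names nested, flat, and renested are illustrative.

```r
library(tibble)
library(tidyr)

nested <- tibble(
  region = c("North", "South"),
  data = list(
    tibble(month = 1:3, sales = c(100, 120, 110)),
    tibble(month = 1:3, sales = c(80, 85, 90))
  )
)

# unnest() expands each nested tibble into ordinary rows
flat <- unnest(nested, data)
flat

# nest() packs the named columns back into one tibble per region
renested <- nest(flat, data = c(month, sales))
renested
```

Going back and forth like this makes it possible to run per-group computations on the nested form and return to a flat table for plotting or export.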
Tibble Validation
Tibbles perform validation to ensure data integrity:
# Recycling rules are stricter
# This works - length 1 recycles
tibble(x = 1:4, y = 1)
# A tibble: 4 × 2
x y
<int> <dbl>
1 1 1
2 2 1
3 3 1
4 4 1
# This errors - inconsistent lengths
# tibble(x = 1:4, y = 1:3) # Error!
# Column names must be unique
# tibble(x = 1, x = 2) # Error!
# All columns must have the same length (or length 1)
tibble(
x = 1:3,
y = 1, # Length 1 - OK, will be recycled
z = 4:6 # Length 3 - OK
)
# A tibble: 3 × 3
x y z
<int> <dbl> <int>
1 1 1 4
2 2 1 5
3 3 1 6
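The commented-out calls above can be checked without stopping the script. This is a minimal sketch: fails() is a hypothetical helper written for this illustration, not a tibble function, and it only reports whether an expression raised an error.

```r
library(tibble)

# Hypothetical helper: TRUE if evaluating expr raises an error
fails <- function(expr) {
  tryCatch({
    expr
    FALSE
  }, error = function(e) TRUE)
}

fails(tibble(x = 1:4, y = 1:3))  # TRUE: lengths 4 and 3 cannot be recycled
fails(tibble(x = 1, x = 2))      # TRUE: duplicate column name
fails(tibble(x = 1:4, y = 1))    # FALSE: length 1 is recycled

# Opting in to automatic renaming via .name_repair
repaired <- tibble(x = 1, x = 2, .name_repair = "unique")
names(repaired)
```

The .name_repair argument is the documented way to accept duplicated names on purpose, for example when importing a messy spreadsheet, while keeping the strict default everywhere else.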
Tibbles in Data Analysis Workflows
Example: Customer Analysis Pipeline
# Create customer data
customers <- tibble(
customer_id = 1:20,
age = sample(18:65, 20, replace = TRUE),
region = sample(c("North", "South", "East", "West"), 20, replace = TRUE),
purchases = sample(0:50, 20, replace = TRUE),
member_since = sample(2018:2024, 20, replace = TRUE)
)
# Analysis pipeline
customer_summary <- customers %>%
mutate(
years_member = 2024 - member_since,
age_group = case_when(
age < 30 ~ "Young",
age < 50 ~ "Middle",
TRUE ~ "Senior"
)
) %>%
group_by(region, age_group) %>%
summarize(
n_customers = n(),
avg_purchases = mean(purchases),
total_purchases = sum(purchases),
avg_tenure = mean(years_member),
.groups = "drop"
) %>%
arrange(desc(total_purchases))
customer_summary
# A tibble: 12 × 6
region age_group n_customers avg_purchases total_purchases avg_tenure
<chr> <chr> <int> <dbl> <int> <dbl>
1 South Middle 4 21.5 86 2.25
2 South Young 3 25.3 76 4.67
3 West Middle 3 16.3 49 1.33
4 East Senior 1 46 46 0
5 South Senior 1 43 43 1
6 East Middle 1 33 33 4
7 North Middle 2 16.5 33 3.5
8 North Senior 1 22 22 0
9 North Young 1 22 22 4
10 West Young 1 22 22 4
11 West Senior 1 7 7 3
12 East Young 1 5 5 2
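Because the customer data comes from sample(), the summary above will differ on every run. One possible way to make such an example reproducible is to fix the random seed first. In this sketch the helper make_customers() and the seed value 42 are assumptions made for illustration, and only three of the columns are generated.

```r
library(tibble)

# Hypothetical helper: builds the same random customers for a given seed
make_customers <- function(seed) {
  set.seed(seed)  # fix the random number stream before sampling
  tibble(
    customer_id = 1:20,
    age = sample(18:65, 20, replace = TRUE),
    purchases = sample(0:50, 20, replace = TRUE)
  )
}

first_run <- make_customers(42)
second_run <- make_customers(42)

# Same seed, same data
identical(first_run, second_run)
```

Setting the seed once at the top of a script has the same effect, as long as the order of the random calls does not change.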
Combining Multiple Tibbles
# Product information
products <- tibble(
product_id = 1:5,
product_name = c("Widget A", "Widget B", "Gadget C", "Gadget D", "Tool E"),
price = c(19.99, 29.99, 49.99, 39.99, 24.99)
)
# Sales records
sales <- tibble(
sale_id = 1:10,
product_id = sample(1:5, 10, replace = TRUE),
quantity = sample(1:5, 10, replace = TRUE),
date = seq.Date(from = as.Date("2024-01-01"),
length.out = 10,
by = "day")
)
# Join tibbles
sales_details <- sales %>%
left_join(products, by = "product_id") %>%
mutate(revenue = quantity * price) %>%
select(sale_id, date, product_name, quantity, price, revenue)
sales_details
# A tibble: 10 × 6
sale_id date product_name quantity price revenue
<int> <date> <chr> <int> <dbl> <dbl>
1 1 2024-01-01 Tool E 1 25.0 25.0
2 2 2024-01-02 Tool E 3 25.0 75.0
3 3 2024-01-03 Tool E 5 25.0 125.
4 4 2024-01-04 Gadget D 3 40.0 120.
5 5 2024-01-05 Gadget C 3 50.0 150.
6 6 2024-01-06 Gadget C 3 50.0 150.
7 7 2024-01-07 Widget A 4 20.0 80.0
8 8 2024-01-08 Widget A 1 20.0 20.0
9 9 2024-01-09 Widget B 4 30.0 120.
10 10 2024-01-10 Gadget C 1 50.0 50.0
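In the join above every product_id has a match. The sketch below shows what left_join() does when a key is missing, using small made-up tables in which product_id 9 is deliberately absent from products. anti_join() is one way to list such unmatched rows before they turn into NA values downstream.

```r
library(tibble)
library(dplyr)

products <- tibble(
  product_id = 1:3,
  product_name = c("Widget A", "Widget B", "Gadget C")
)

# product_id 9 is deliberately missing from products
sales <- tibble(
  sale_id = 1:4,
  product_id = c(1L, 3L, 9L, 1L)
)

# left_join() keeps every sale; unmatched rows get NA in the new columns
joined <- left_join(sales, products, by = "product_id")
joined

# anti_join() returns only the sales that have no matching product
orphans <- anti_join(sales, products, by = "product_id")
orphans
```

Checking the anti_join() result first is a cheap safeguard, since an NA price would otherwise propagate into any revenue calculation.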
Best Practices with Tibbles
1. Use Consistent Naming
# Good: Consistent, descriptive names
good_tibble <- tibble(
customer_id = 1:3,
customer_name = c("Alice", "Bob", "Charlie"),
customer_age = c(25, 30, 35),
purchase_amount = c(100, 200, 150)
)
# Better: still snake_case, but shorter, without the redundant customer_ prefix
better_tibble <- tibble(
id = 1:3,
name = c("Alice", "Bob", "Charlie"),
age = c(25, 30, 35),
amount_purchased = c(100, 200, 150)
)
2. Document Complex Structures
# Document what each column represents
experiment_data <- tibble(
# Unique identifier for each experiment
exp_id = 1:5,
# Timestamp when experiment started (from Sys.time(), printed in the session's time zone)
start_time = Sys.time() + 0:4 * 3600,
# List of measurements taken during experiment
measurements = list(
rnorm(10), rnorm(12), rnorm(8), rnorm(15), rnorm(10)
),
# Nested metadata about experimental conditions
conditions = list(
list(temp = 20, pressure = 1.0),
list(temp = 25, pressure = 1.0),
list(temp = 20, pressure = 1.5),
list(temp = 25, pressure = 1.5),
list(temp = 22.5, pressure = 1.25)
)
)
glimpse(experiment_data)
Rows: 5
Columns: 4
$ exp_id <int> 1, 2, 3, 4, 5
$ start_time <dttm> 2025-09-22 00:35:52, 2025-09-22 01:35:52, 2025-09-22 02:3…
$ measurements <list> <0.3860134, 0.4324334, -0.5975443, -0.4130127, -0.433889…
$ conditions <list> [20, 1], [25, 1], [20, 1.5], [25, 1.5], [22.5, 1.25]
3. Prefer Tibbles for New Code
# When reading data, get tibbles directly
# read_csv() returns a tibble
# read.csv() returns a data frame
# When creating data, use tibble() not data.frame()
modern_approach <- tibble(
x = 1:3,
y = c("a", "b", "c")
)
# When converting existing data frames
legacy_df <- data.frame(x = 1:3, y = 4:6)
modernized <- as_tibble(legacy_df)
Common Operations Reference
# Create a reference tibble
ref <- tibble(
id = 1:5,
value = c(10, 20, 30, 40, 50),
category = c("A", "B", "A", "B", "A")
)
# Selection operations
ref %>% select(id, value) # Select columns
# A tibble: 5 × 2
id value
<int> <dbl>
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50
ref %>% select(-category) # Drop columns
# A tibble: 5 × 2
id value
<int> <dbl>
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50
ref %>% filter(value > 20) # Filter rows
# A tibble: 3 × 3
id value category
<int> <dbl> <chr>
1 3 30 A
2 4 40 B
3 5 50 A
ref %>% slice(1:3) # Select rows by position
# A tibble: 3 × 3
id value category
<int> <dbl> <chr>
1 1 10 A
2 2 20 B
3 3 30 A
# Modification operations
ref %>% mutate(double = value * 2) # Add column
# A tibble: 5 × 4
id value category double
<int> <dbl> <chr> <dbl>
1 1 10 A 20
2 2 20 B 40
3 3 30 A 60
4 4 40 B 80
5 5 50 A 100
ref %>% rename(val = value) # Rename column
# A tibble: 5 × 3
id val category
<int> <dbl> <chr>
1 1 10 A
2 2 20 B
3 3 30 A
4 4 40 B
5 5 50 A
ref %>% arrange(desc(value)) # Sort rows
# A tibble: 5 × 3
id value category
<int> <dbl> <chr>
1 5 50 A
2 4 40 B
3 3 30 A
4 2 20 B
5 1 10 A
# Aggregation operations
ref %>%
group_by(category) %>%
summarize(
mean_value = mean(value),
total = sum(value),
count = n()
)
# A tibble: 2 × 4
category mean_value total count
<chr> <dbl> <dbl> <int>
1 A 30 90 3
2 B 30 60 2
Exercises
Exercise 1: Create and Explore
Create a tibble with information about 5 books (title, author, year, pages). Print it and use glimpse() to examine its structure.
Exercise 2: List Columns
Create a tibble where each row represents a student and includes:
- Name (character)
- Grades (list of numeric scores)
- Subjects (list of course names)
Exercise 3: Tibble vs Data Frame
Create the same dataset as both a data frame and a tibble. Compare:
- How they print
- How they handle column access
- What happens with partial name matching
Exercise 4: Complex Analysis
Using the diamonds dataset from ggplot2 (it is already a tibble, which you can confirm with is_tibble()):
1. Group by cut and color
2. Calculate average price and carat
3. Create a list column with all clarity values per group
4. Find which combination has the highest average price
Summary
Tibbles are the foundation of tidy data analysis:
- Safer defaults: No automatic type conversion, no partial matching
- Better printing: Shows only what fits, includes type information
- Modern features: List columns, nested data structures
- Consistent behavior: Predictable subsetting and modification
- Tidyverse integration: Works seamlessly with all tidyverse functions
As you work more with the tidyverse, tibbles will become your default data structure. They eliminate many of the surprises and inconsistencies of traditional data frames while adding powerful new capabilities.
Next, we’ll learn about importing data with readr to get your data into tibble format!