library(tidyverse)
Tibbles: Modern Data Frames
What are Tibbles?
Tibbles are a modern reimagining of data frames, designed to work seamlessly within the tidyverse. While they behave like data frames in most ways, tibbles have several improvements that make them safer and more user-friendly for data analysis.
Creating Tibbles
From Scratch with tibble()
The most common way to create a tibble is with the tibble() function:
# Create a tibble
my_tibble <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  score = c(85.5, 92.3, 78.9, 95.1, 88.7),
  passed = score >= 80
)
my_tibble
# A tibble: 5 × 4
id name score passed
<int> <chr> <dbl> <lgl>
1 1 Alice 85.5 TRUE
2 2 Bob 92.3 TRUE
3 3 Charlie 78.9 FALSE
4 4 Diana 95.1 TRUE
5 5 Eve 88.7 TRUE
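One detail in the example above is easy to miss: tibble() builds its columns in order, so passed can refer to the score column defined just before it. The sketch below shows the same pattern next to data.frame(), which does no such lookup. The names exam_score, scores and df_attempt are made up for illustration, and the data.frame() call only fails if no object called exam_score exists in your session.

```r
library(tibble)

# tibble() creates columns one at a time, so later columns can use earlier ones
scores <- tibble(
  exam_score = c(85.5, 78.9),
  passed = exam_score >= 80
)
scores$passed

# data.frame() evaluates every argument in the calling environment, so the
# same call fails unless a separate `exam_score` object happens to exist
df_attempt <- tryCatch(
  data.frame(exam_score = c(85.5, 78.9), passed = exam_score >= 80),
  error = function(e) "error"
)
df_attempt
```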
Converting Data Frames to Tibbles
You can convert existing data frames to tibbles:
# Traditional data frame
df <- data.frame(
  x = 1:3,
  y = c("a", "b", "c")
)
# Convert to tibble
tbl <- as_tibble(df)
# Compare the printing
print(df)
x y
1 1 a
2 2 b
3 3 c
print(tbl)
# A tibble: 3 × 2
x y
<int> <chr>
1 1 a
2 2 b
3 3 c
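Row names are one place where the conversion changes what you see: as_tibble() drops them unless asked to keep them. Here is a minimal sketch using the built-in mtcars data frame. The column name "model" is an arbitrary choice for this illustration.

```r
library(tibble)

# mtcars stores the car model in its row names
head(rownames(mtcars), 3)

# By default as_tibble() discards row names ...
cars_plain <- as_tibble(mtcars)

# ... while `rownames =` keeps them as an ordinary first column
cars_named <- as_tibble(mtcars, rownames = "model")
names(cars_named)[1:3]
```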
Using tribble() for Row-wise Creation
tribble() allows you to create tibbles row by row, which is great for small datasets:
# Create a tibble row by row
grades <- tribble(
  ~student,  ~math, ~science, ~english,
  #---------|------|---------|---------|
  "Alice",   85,    90,       88,
  "Bob",     92,    88,       85,
  "Charlie", 78,    82,       90,
  "Diana",   95,    92,       87
)
grades
# A tibble: 4 × 4
student math science english
<chr> <dbl> <dbl> <dbl>
1 Alice 85 90 88
2 Bob 92 88 85
3 Charlie 78 82 90
4 Diana 95 92 87
Key Differences from Data Frames
1. Enhanced Printing
Tibbles have a more informative print method:
# Create a larger tibble
large_tibble <- tibble(
  id = 1:1000,
  value = rnorm(1000),
  category = sample(letters[1:5], 1000, replace = TRUE),
  timestamp = Sys.time() + 1:1000
)
# Tibble shows first 10 rows and all columns that fit on screen
large_tibble
# A tibble: 1,000 × 4
id value category timestamp
<int> <dbl> <chr> <dttm>
1 1 0.00233 b 2025-09-22 00:35:53
2 2 0.361 a 2025-09-22 00:35:54
3 3 -0.476 c 2025-09-22 00:35:55
4 4 -0.884 e 2025-09-22 00:35:56
5 5 -1.01 d 2025-09-22 00:35:57
6 6 -1.25 b 2025-09-22 00:35:58
7 7 2.38 a 2025-09-22 00:35:59
8 8 0.828 a 2025-09-22 00:36:00
9 9 -0.0807 b 2025-09-22 00:36:01
10 10 0.360 e 2025-09-22 00:36:02
# ℹ 990 more rows
# Data frame would print all 1000 rows!
# df <- as.data.frame(large_tibble)
# df # Don't run this - it would print everything!
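The ten-row default can be overridden for a single call. This short sketch uses the n and width arguments of the tibble print method. The tibble wide_long and the captured printed vector are invented for this example.

```r
library(tibble)

wide_long <- tibble(id = 1:1000, value = id * 2)

# Show more rows for a single call
print(wide_long, n = 15)

# Show every column, even if the output wraps past the console width
print(wide_long, width = Inf)

# The printed text still ends with a note about the rows that were left out
printed <- capture.output(print(wide_long, n = 15))
tail(printed, 1)
```

Passing n = Inf prints every row, which is the tibble equivalent of the data frame behaviour described above.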
2. Column Types are Preserved
Tibbles don’t automatically convert strings to factors:
# data.frame() converted strings to factors by default before R 4.0.0
# (stringsAsFactors = TRUE); from R 4.0.0 on it keeps them as character
df <- data.frame(
  text = c("apple", "banana", "cherry"),
  value = 1:3
)
# Tibble preserves strings as character vectors
tbl <- tibble(
  text = c("apple", "banana", "cherry"),
  value = 1:3
)
# Check the types
str(df)
'data.frame': 3 obs. of 2 variables:
$ text : chr "apple" "banana" "cherry"
$ value: int 1 2 3
str(tbl)
tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
$ text : chr [1:3] "apple" "banana" "cherry"
$ value: int [1:3] 1 2 3
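Because the output above comes from a recent R version, both structures show chr. To see the difference the heading refers to, you can opt in to the old behaviour explicitly. This is a small sketch; the object names are chosen only for this example.

```r
library(tibble)

fruit <- c("apple", "banana", "cherry")

# Opting in to the pre-4.0.0 data.frame() default
df_factor <- data.frame(text = fruit, stringsAsFactors = TRUE)
class(df_factor$text)   # "factor"

# tibble() has no such switch: a character vector stays character
tb_chr <- tibble(text = fruit)
class(tb_chr$text)      # "character"

# Converting to a factor in a tibble is always an explicit step
tb_fct <- tibble(text = factor(fruit))
levels(tb_fct$text)
```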
3. Stricter Subsetting
Tibbles are more strict about subsetting:
# Create a tibble
tbl <- tibble(x = 1:3, y = 4:6)
# Single bracket always returns a tibble
tbl[1] # Returns a tibble with one column
# A tibble: 3 × 1
x
<int>
1 1
2 2
3 3
tbl["x"] # Returns a tibble with column x
# A tibble: 3 × 1
x
<int>
1 1
2 2
3 3
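# Double bracket and $ extract the column
tbl[[1]] # Returns the vector
[1] 1 2 3
tbl$x # Returns the vector
[1] 1 2 3
tbl[["x"]] # Returns the vector
[1] 1 2 3
# Tibbles don't do partial matching: $ needs the exact column name
# tbl$xy # No such column: returns NULL with a warning (not an error)
# With a column named value, a data frame resolves $val to it; a tibble does not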
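The two behaviours that most often surprise people moving from data frames can be put side by side. This is a minimal sketch with made-up column names (value, label); the warning a tibble raises on an unknown column is suppressed here so the return value is easy to see.

```r
library(tibble)

df <- data.frame(value = 1:3, label = c("a", "b", "c"))
tb <- as_tibble(df)

# Selecting one column with [ , j]: the data frame drops to a vector,
# the tibble stays a tibble
class(df[, "value"])
class(tb[, "value"])

# Partial name with $: the data frame matches `val` to `value`,
# the tibble returns NULL and raises a warning (suppressed here)
df$val
suppressWarnings(tb$val)
```

Code that relies on either data frame shortcut will behave differently once its input becomes a tibble, which is why [[ ]] or the full column name is the safer habit.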
Working with Tibbles
Adding and Modifying Columns
# Start with a basic tibble
students <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  midterm = c(85, 78, 92),
  final = c(88, 82, 90)
)
# Add columns with mutate
students <- students %>%
  mutate(
    average = (midterm + final) / 2,
    grade = case_when(
      average >= 90 ~ "A",
      average >= 80 ~ "B",
      average >= 70 ~ "C",
      TRUE ~ "D"
    )
  )
students
# A tibble: 3 × 5
name midterm final average grade
<chr> <dbl> <dbl> <dbl> <chr>
1 Alice 85 88 86.5 B
2 Bob 78 82 80 B
3 Charlie 92 90 91 A
Accessing Tibble Information
# Create a sample tibble
data <- tibble(
  id = 1:100,
  group = sample(LETTERS[1:4], 100, replace = TRUE),
  value = rnorm(100, mean = 50, sd = 10)
)
# Get dimensions
nrow(data)
[1] 100
ncol(data)
[1] 3
dim(data)
[1] 100 3
# Get column names
names(data)
[1] "id" "group" "value"
colnames(data)
[1] "id" "group" "value"
# Check if it's a tibble
is_tibble(data)
[1] TRUE
is.data.frame(data) # TRUE - tibbles are also data frames
[1] TRUE
# Get a glimpse of the structure
glimpse(data)
Rows: 100
Columns: 3
$ id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1…
$ group <chr> "D", "B", "C", "D", "B", "D", "B", "B", "C", "C", "A", "B", "C",…
$ value <dbl> 47.78940, 45.17783, 54.64428, 41.40415, 46.95217, 41.23244, 54.6…
Advanced Tibble Features
List Columns
Tibbles can contain list columns, allowing complex data structures:
# Create a tibble with list columns
complex_data <- tibble(
  id = 1:3,
  name = c("Experiment A", "Experiment B", "Experiment C"),
  measurements = list(
    c(1.2, 1.5, 1.3),
    c(2.1, 2.3, 2.2, 2.4),
    c(3.1, 3.0)
  ),
  metadata = list(
    list(date = "2024-01-01", operator = "Alice"),
    list(date = "2024-01-02", operator = "Bob"),
    list(date = "2024-01-03", operator = "Charlie")
  )
)
complex_data
# A tibble: 3 × 4
id name measurements metadata
<int> <chr> <list> <list>
1 1 Experiment A <dbl [3]> <named list [2]>
2 2 Experiment B <dbl [4]> <named list [2]>
3 3 Experiment C <dbl [2]> <named list [2]>
# Access list column elements
complex_data$measurements[[1]]
[1] 1.2 1.5 1.3
complex_data$metadata[[2]]$operator
[1] "Bob"
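Beyond pulling out single elements, a list column can be summarised row by row. The sketch below reuses the measurement values from above in a smaller tibble. The names experiments, n_obs and max_value are invented for this example.

```r
library(tibble)
library(dplyr)
library(purrr)

experiments <- tibble(
  id = 1:3,
  measurements = list(c(1.2, 1.5, 1.3), c(2.1, 2.3, 2.2, 2.4), c(3.1, 3.0))
)

summary_tbl <- experiments %>%
  mutate(
    n_obs = lengths(measurements),          # number of values in each row's vector
    max_value = map_dbl(measurements, max)  # one number per list element
  )
summary_tbl
```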
Nested Tibbles
You can nest tibbles within tibbles:
# Create nested data
sales_data <- tibble(
  region = c("North", "South", "East", "West"),
  data = list(
    tibble(month = 1:3, sales = c(100, 120, 110)),
    tibble(month = 1:3, sales = c(80, 85, 90)),
    tibble(month = 1:3, sales = c(95, 100, 105)),
    tibble(month = 1:3, sales = c(110, 115, 125))
  )
)
sales_data
# A tibble: 4 × 2
region data
<chr> <list>
1 North <tibble [3 × 2]>
2 South <tibble [3 × 2]>
3 East <tibble [3 × 2]>
4 West <tibble [3 × 2]>
# Access nested tibble
sales_data$data[[1]]
# A tibble: 3 × 2
month sales
<int> <dbl>
1 1 100
2 2 120
3 3 110
# Work with nested data using purrr
sales_data %>%
  mutate(total_sales = map_dbl(data, ~sum(.x$sales)))
# A tibble: 4 × 3
region data total_sales
<chr> <list> <dbl>
1 North <tibble [3 × 2]> 330
2 South <tibble [3 × 2]> 255
3 East <tibble [3 × 2]> 300
4 West <tibble [3 × 2]> 350
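Nested tibbles do not have to be built by hand. As a sketch, tidyr's unnest() and nest() convert between the nested and the flat layout. The two-region tibble nested_sales below is a cut-down, hypothetical version of the sales data used above.

```r
library(tibble)
library(tidyr)

nested_sales <- tibble(
  region = c("North", "South"),
  data = list(
    tibble(month = 1:2, sales = c(100, 120)),
    tibble(month = 1:2, sales = c(80, 85))
  )
)

# Flatten: one row per region and month
flat <- unnest(nested_sales, cols = data)
flat

# Nest again: month and sales are packed back into a list column called data
renested <- nest(flat, data = c(month, sales))
renested
```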
Tibble Validation
Tibbles perform validation to ensure data integrity:
# Recycling rules are stricter
# This works - length 1 recycles
tibble(x = 1:4, y = 1)
# A tibble: 4 × 2
x y
<int> <dbl>
1 1 1
2 2 1
3 3 1
4 4 1
# This errors - inconsistent lengths
# tibble(x = 1:4, y = 1:3) # Error!
# Column names must be unique
# tibble(x = 1, x = 2) # Error!
# All columns must have the same length (or length 1)
tibble(
x = 1:3,
y = 1, # Length 1 - OK, will be recycled
z = 4:6 # Length 3 - OK
)
# A tibble: 3 × 3
x y z
<int> <dbl> <int>
1 1 1 4
2 2 1 5
3 3 1 6
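The two commented-out calls above can be run safely by catching the error. This is a sketch: error_message() is a small helper written for this example, not part of tibble, and the last call shows the .name_repair argument as one way to get repaired names instead of an error.

```r
library(tibble)

# Helper for this sketch: return the error message instead of stopping
error_message <- function(expr) {
  tryCatch({
    expr
    NA_character_
  }, error = function(e) conditionMessage(e))
}

error_message(tibble(x = 1:4, y = 1:3))  # lengths 4 and 3 cannot be recycled
error_message(tibble(x = 1, x = 2))      # duplicated column name

# Asking for repair instead of an error makes the names unique
repaired <- suppressMessages(tibble(x = 1, x = 2, .name_repair = "unique"))
names(repaired)
```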
Tibbles in Data Analysis Workflows
Example: Customer Analysis Pipeline
# Create customer data
customers <- tibble(
  customer_id = 1:20,
  age = sample(18:65, 20, replace = TRUE),
  region = sample(c("North", "South", "East", "West"), 20, replace = TRUE),
  purchases = sample(0:50, 20, replace = TRUE),
  member_since = sample(2018:2024, 20, replace = TRUE)
)
# Analysis pipeline
customer_summary <- customers %>%
  mutate(
    years_member = 2024 - member_since,
    age_group = case_when(
      age < 30 ~ "Young",
      age < 50 ~ "Middle",
      TRUE ~ "Senior"
    )
  ) %>%
  group_by(region, age_group) %>%
  summarize(
    n_customers = n(),
    avg_purchases = mean(purchases),
    total_purchases = sum(purchases),
    avg_tenure = mean(years_member),
    .groups = "drop"
  ) %>%
  arrange(desc(total_purchases))
customer_summary
# A tibble: 12 × 6
region age_group n_customers avg_purchases total_purchases avg_tenure
<chr> <chr> <int> <dbl> <int> <dbl>
1 South Middle 4 21.5 86 2.25
2 South Young 3 25.3 76 4.67
3 West Middle 3 16.3 49 1.33
4 East Senior 1 46 46 0
5 South Senior 1 43 43 1
6 East Middle 1 33 33 4
7 North Middle 2 16.5 33 3.5
8 North Senior 1 22 22 0
9 North Young 1 22 22 4
10 West Young 1 22 22 4
11 West Senior 1 7 7 3
12 East Young 1 5 5 2
Combining Multiple Tibbles
# Product information
products <- tibble(
  product_id = 1:5,
  product_name = c("Widget A", "Widget B", "Gadget C", "Gadget D", "Tool E"),
  price = c(19.99, 29.99, 49.99, 39.99, 24.99)
)
# Sales records
sales <- tibble(
  sale_id = 1:10,
  product_id = sample(1:5, 10, replace = TRUE),
  quantity = sample(1:5, 10, replace = TRUE),
  date = seq.Date(from = as.Date("2024-01-01"),
                  length.out = 10,
                  by = "day")
)
# Join tibbles
sales_details <- sales %>%
  left_join(products, by = "product_id") %>%
  mutate(revenue = quantity * price) %>%
  select(sale_id, date, product_name, quantity, price, revenue)
sales_details
# A tibble: 10 × 6
sale_id date product_name quantity price revenue
<int> <date> <chr> <int> <dbl> <dbl>
1 1 2024-01-01 Tool E 1 25.0 25.0
2 2 2024-01-02 Tool E 3 25.0 75.0
3 3 2024-01-03 Tool E 5 25.0 125.
4 4 2024-01-04 Gadget D 3 40.0 120.
5 5 2024-01-05 Gadget C 3 50.0 150.
6 6 2024-01-06 Gadget C 3 50.0 150.
7 7 2024-01-07 Widget A 4 20.0 80.0
8 8 2024-01-08 Widget A 1 20.0 20.0
9 9 2024-01-09 Widget B 4 30.0 120.
10 10 2024-01-10 Gadget C 1 50.0 50.0
Best Practices with Tibbles
1. Use Consistent Naming
# Good: Consistent, descriptive names
good_tibble <- tibble(
  customer_id = 1:3,
  customer_name = c("Alice", "Bob", "Charlie"),
  customer_age = c(25, 30, 35),
  purchase_amount = c(100, 200, 150)
)
# Better: Same snake_case style, without the redundant customer_ prefix
better_tibble <- tibble(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  amount_purchased = c(100, 200, 150)
)
2. Document Complex Structures
# Document what each column represents
experiment_data <- tibble(
  # Unique identifier for each experiment
  exp_id = 1:5,
  # Timestamp when experiment started (UTC)
  start_time = Sys.time() + 0:4 * 3600,
  # List of measurements taken during experiment
  measurements = list(
    rnorm(10), rnorm(12), rnorm(8), rnorm(15), rnorm(10)
  ),
  # Nested metadata about experimental conditions
  conditions = list(
    list(temp = 20, pressure = 1.0),
    list(temp = 25, pressure = 1.0),
    list(temp = 20, pressure = 1.5),
    list(temp = 25, pressure = 1.5),
    list(temp = 22.5, pressure = 1.25)
  )
)
glimpse(experiment_data)
Rows: 5
Columns: 4
$ exp_id <int> 1, 2, 3, 4, 5
$ start_time <dttm> 2025-09-22 00:35:52, 2025-09-22 01:35:52, 2025-09-22 02:3…
$ measurements <list> <0.3860134, 0.4324334, -0.5975443, -0.4130127, -0.433889…
$ conditions <list> [20, 1], [25, 1], [20, 1.5], [25, 1.5], [22.5, 1.25]
3. Prefer Tibbles for New Code
# When reading data, get tibbles directly
# read_csv() returns a tibble
# read.csv() returns a data frame
# When creating data, use tibble() not data.frame()
modern_approach <- tibble(
  x = 1:3,
  y = c("a", "b", "c")
)
# When converting existing data frames
legacy_df <- data.frame(x = 1:3, y = 4:6)
modernized <- as_tibble(legacy_df)
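The read_csv() versus read.csv() point from the comments above can be checked without a file on disk. This sketch assumes readr 2.0 or later, where I() marks a string as literal data and show_col_types silences the column summary. The CSV text is made up for the example.

```r
library(readr)
library(tibble)

csv_text <- "x,y\n1,a\n2,b\n3,c"

# readr: I() marks the string as literal data rather than a file path
from_readr <- read_csv(I(csv_text), show_col_types = FALSE)
is_tibble(from_readr)

# base R: same data, returned as a plain data frame
from_base <- read.csv(text = csv_text)
is_tibble(from_base)
```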
Common Operations Reference
# Create a reference tibble
ref <- tibble(
  id = 1:5,
  value = c(10, 20, 30, 40, 50),
  category = c("A", "B", "A", "B", "A")
)
# Selection operations
ref %>% select(id, value) # Select columns
# A tibble: 5 × 2
id value
<int> <dbl>
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50
ref %>% select(-category) # Drop columns
# A tibble: 5 × 2
id value
<int> <dbl>
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50
ref %>% filter(value > 20) # Filter rows
# A tibble: 3 × 3
id value category
<int> <dbl> <chr>
1 3 30 A
2 4 40 B
3 5 50 A
ref %>% slice(1:3) # Select rows by position
# A tibble: 3 × 3
id value category
<int> <dbl> <chr>
1 1 10 A
2 2 20 B
3 3 30 A
# Modification operations
ref %>% mutate(double = value * 2) # Add column
# A tibble: 5 × 4
id value category double
<int> <dbl> <chr> <dbl>
1 1 10 A 20
2 2 20 B 40
3 3 30 A 60
4 4 40 B 80
5 5 50 A 100
ref %>% rename(val = value) # Rename column
# A tibble: 5 × 3
id val category
<int> <dbl> <chr>
1 1 10 A
2 2 20 B
3 3 30 A
4 4 40 B
5 5 50 A
ref %>% arrange(desc(value)) # Sort rows
# A tibble: 5 × 3
id value category
<int> <dbl> <chr>
1 5 50 A
2 4 40 B
3 3 30 A
4 2 20 B
5 1 10 A
# Aggregation operations
ref %>%
  group_by(category) %>%
  summarize(
    mean_value = mean(value),
    total = sum(value),
    count = n()
  )
# A tibble: 2 × 4
category mean_value total count
<chr> <dbl> <dbl> <int>
1 A 30 90 3
2 B 30 60 2
Exercises
Exercise 1: Create and Explore
Create a tibble with information about 5 books (title, author, year, pages). Print it and use glimpse() to examine its structure.
Exercise 2: List Columns
Create a tibble where each row represents a student and includes:
- Name (character)
- Grades (list of numeric scores)
- Subjects (list of course names)
Exercise 3: Tibble vs Data Frame
Create the same dataset as both a data frame and a tibble. Compare:
- How they print
- How they handle column access
- What happens with partial name matching
Exercise 4: Complex Analysis
Using the diamonds dataset from ggplot2 (it is already a tibble, so as_tibble() leaves it unchanged):
1. Group by cut and color
2. Calculate average price and carat
3. Create a list column with all clarity values per group
4. Find which combination has the highest average price
Summary
Tibbles are the foundation of tidy data analysis:
- Safer defaults: No automatic type conversion, no partial matching
- Better printing: Shows only what fits, includes type information
- Modern features: List columns, nested data structures
- Consistent behavior: Predictable subsetting and modification
- Tidyverse integration: Works seamlessly with all tidyverse functions
As you work more with the tidyverse, tibbles will become your default data structure. They eliminate many of the surprises and inconsistencies of traditional data frames while adding powerful new capabilities.
Next, we’ll learn about importing data with readr to get your data into tibble format!