Tibbles: Modern Data Frames

Author: IND215

Published: September 22, 2025

What are Tibbles?

Tibbles are a modern reimagining of data frames, designed to work seamlessly within the tidyverse. They behave like data frames in most ways, but they change a few defaults to make analysis safer and easier: they print compactly, they never partially match column names, and subsetting with `[` always returns another tibble.

library(tidyverse)

Creating Tibbles

From Scratch with `tibble()`

The most common way to create a tibble is with the tibble() function:

# Create a tibble
my_tibble <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  score = c(85.5, 92.3, 78.9, 95.1, 88.7),
  passed = score >= 80
)

my_tibble

# A tibble: 5 × 4
     id name    score passed
  <int> <chr>   <dbl> <lgl> 
1     1 Alice    85.5 TRUE  
2     2 Bob      92.3 TRUE  
3     3 Charlie  78.9 FALSE 
4     4 Diana    95.1 TRUE  
5     5 Eve      88.7 TRUE
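Notice that `passed` was computed from `score` inside the same call. The sketch below illustrates why that works: tibble() builds columns in order, so later columns can refer to earlier ones. The column name `qty` is only illustrative, and the example assumes no variable called `qty` exists in your session.

```r
library(tibble)

# tibble() evaluates its arguments sequentially,
# so `doubled` can refer to the `qty` column defined just before it
ordered <- tibble(qty = 1:3, doubled = qty * 2)
ordered

# data.frame() evaluates every argument in the calling environment,
# so the same expression fails when no variable `qty` exists there
df_attempt <- tryCatch(
  data.frame(qty = 1:3, doubled = qty * 2),
  error = function(e) "error"
)
df_attempt
```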

Converting Data Frames to Tibbles

You can convert existing data frames to tibbles:

# Traditional data frame
df <- data.frame(
  x = 1:3,
  y = c("a", "b", "c")
)

# Convert to tibble
tbl <- as_tibble(df)

# Compare the printing
print(df)

  x y
1 1 a
2 2 b
3 3 c

print(tbl)

# A tibble: 3 × 2
      x y    
  <int> <chr>
1     1 a    
2     2 b    
3     3 c
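One thing to watch when converting: tibbles do not keep row names. As a small sketch using the built-in mtcars data, the `rownames` argument of as_tibble() moves them into a regular column; the column name "model" is our own choice.

```r
library(tibble)

# By default the row names of mtcars (the car models) are dropped
plain <- as_tibble(mtcars)

# Passing rownames = "<column name>" keeps them as an ordinary column
with_names <- as_tibble(mtcars, rownames = "model")

names(plain)[1]       # first column is still mpg
with_names$model[1]   # the first car model, now stored as data
```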

Using `tribble()` for Row-wise Creation

tribble() allows you to create tibbles row-by-row, which is great for small datasets:

# Create a tibble row by row
grades <- tribble(
  ~student,  ~math, ~science, ~english,
  #---------|------|---------|---------|
  "Alice",      85,       90,       88,
  "Bob",        92,       88,       85,
  "Charlie",    78,       82,       90,
  "Diana",      95,       92,       87
)

grades

# A tibble: 4 × 4
  student  math science english
  <chr>   <dbl>   <dbl>   <dbl>
1 Alice      85      90      88
2 Bob        92      88      85
3 Charlie    78      82      90
4 Diana      95      92      87
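To see that tribble() is only a different layout for the same data, the short sketch below builds one small table both ways and compares the columns. The object names `by_row` and `by_col` are illustrative.

```r
library(tibble)

# Row-wise layout: column names are given as formulas (~name)
by_row <- tribble(
  ~student, ~math,
  "Alice",     85,
  "Bob",       92
)

# Column-wise layout of the same data
by_col <- tibble(
  student = c("Alice", "Bob"),
  math = c(85, 92)
)

# Both approaches produce the same columns
identical(by_row$student, by_col$student)
identical(by_row$math, by_col$math)
```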

Key Differences from Data Frames

1. Enhanced Printing

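Tibbles have a more informative print method. By default it shows only the first 10 rows, only the columns that fit on screen, and the type of each column: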

# Create a larger tibble
large_tibble <- tibble(
  id = 1:1000,
  value = rnorm(1000),
  category = sample(letters[1:5], 1000, replace = TRUE),
  timestamp = Sys.time() + 1:1000
)

# Tibble shows first 10 rows and all columns that fit on screen
large_tibble

# A tibble: 1,000 × 4
      id    value category timestamp          
   <int>    <dbl> <chr>    <dttm>             
 1     1  0.00233 b        2025-09-22 00:35:53
 2     2  0.361   a        2025-09-22 00:35:54
 3     3 -0.476   c        2025-09-22 00:35:55
 4     4 -0.884   e        2025-09-22 00:35:56
 5     5 -1.01    d        2025-09-22 00:35:57
 6     6 -1.25    b        2025-09-22 00:35:58
 7     7  2.38    a        2025-09-22 00:35:59
 8     8  0.828   a        2025-09-22 00:36:00
 9     9 -0.0807  b        2025-09-22 00:36:01
10    10  0.360   e        2025-09-22 00:36:02
# ℹ 990 more rows

# Data frame would print all 1000 rows!
# df <- as.data.frame(large_tibble)
# df  # Don't run this - it would print everything!
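When ten rows are not enough, you can ask for more. This is a minimal sketch of the `n` and `width` arguments of the tibble print method; `demo_tbl` is an illustrative object, and the exact footer text depends on your installed tibble and pillar versions.

```r
library(tibble)

demo_tbl <- tibble(id = 1:1000, value = id * 2)

# Show the first 15 rows instead of the default 10
print(demo_tbl, n = 15)

# width = Inf prints every column, even if they wrap past the console width
print(demo_tbl, n = 3, width = Inf)
```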

2. Column Types are Preserved

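tibble() never converts strings to factors. data.frame() did so by default before R 4.0.0 (`stringsAsFactors = TRUE`); the output below comes from R 4.0 or later, where both keep strings as character vectors:

# Data frame: converted strings to factors by default before R 4.0.0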
df <- data.frame(
  text = c("apple", "banana", "cherry"),
  value = 1:3
)

# Tibble preserves strings as character vectors
tbl <- tibble(
  text = c("apple", "banana", "cherry"),
  value = 1:3
)

# Check the types
str(df)

'data.frame':   3 obs. of  2 variables:
 $ text : chr  "apple" "banana" "cherry"
 $ value: int  1 2 3

str(tbl)

tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
 $ text : chr [1:3] "apple" "banana" "cherry"
 $ value: int [1:3] 1 2 3
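If you want to see the old behaviour, or you actually need a factor, you can request it explicitly. The sketch below assumes R 4.0 or later; the object names are illustrative.

```r
library(tibble)

fruit <- c("apple", "banana", "cherry")

# Opt in to the pre-4.0 data.frame behaviour
df_old_style <- data.frame(text = fruit, stringsAsFactors = TRUE)

# tibble() keeps the strings as they are
tbl_chr <- tibble(text = fruit)

# With a tibble, a factor column exists only when you create one yourself
tbl_fct <- tibble(text = factor(fruit))

class(df_old_style$text)
class(tbl_chr$text)
class(tbl_fct$text)
```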

3. Stricter Subsetting

Tibbles are stricter about subsetting: `[` always returns a tibble, while `[[` and `$` extract a single column:

# Create a tibble
tbl <- tibble(x = 1:3, y = 4:6)

# Single bracket always returns a tibble
tbl[1]       # Returns a tibble with one column

# A tibble: 3 × 1
      x
  <int>
1     1
2     2
3     3

tbl["x"]     # Returns a tibble with column x

# A tibble: 3 × 1
      x
  <int>
1     1
2     2
3     3

# Double bracket and $ extract the column
tbl[[1]]     # Returns the vector

[1] 1 2 3

tbl$x        # Returns the vector

[1] 1 2 3

tbl[["x"]]   # Returns the vector

[1] 1 2 3

# Tibbles don't do partial matching: $ needs the full column name
# tbl$xy     # Unknown column: returns NULL with a warning, not an error
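The difference is easiest to see side by side. In this sketch the column names `value` and `other` are illustrative; the same data is held once as a data frame and once as a tibble.

```r
library(tibble)

df <- data.frame(value = 1:3, other = 4:6)
tbl <- as_tibble(df)

# Data frame: $ partially matches "val" to the column "value"
df$val

# Tibble: no partial matching, so the result is NULL (with a warning)
tbl_val <- suppressWarnings(tbl$val)
tbl_val

# Data frame: selecting one column with [ , ] silently drops to a vector
df[, "value"]

# Tibble: the same call still returns a one-column tibble
tbl[, "value"]
```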

Working with Tibbles

Adding and Modifying Columns

# Start with a basic tibble
students <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  midterm = c(85, 78, 92),
  final = c(88, 82, 90)
)

# Add columns with mutate
students <- students %>%
  mutate(
    average = (midterm + final) / 2,
    grade = case_when(
      average >= 90 ~ "A",
      average >= 80 ~ "B",
      average >= 70 ~ "C",
      TRUE ~ "D"
    )
  )

students

# A tibble: 3 × 5
  name    midterm final average grade
  <chr>     <dbl> <dbl>   <dbl> <chr>
1 Alice        85    88    86.5 B    
2 Bob          78    82    80   B    
3 Charlie      92    90    91   A
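Besides mutate(), the tibble package has its own helpers for growing a table. The sketch below uses add_column() and add_row() on a smaller copy of the students data; the `id` values and the extra student are made up for illustration.

```r
library(tibble)

students_small <- tibble(
  name = c("Alice", "Bob", "Charlie"),
  midterm = c(85, 78, 92)
)

# Insert a new column in front of an existing one
with_id <- add_column(students_small, id = 1:3, .before = "name")

# Append a new row; values are matched to columns by name
with_new_student <- add_row(students_small, name = "Diana", midterm = 95)

with_id
with_new_student
```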

Accessing Tibble Information

# Create a sample tibble
data <- tibble(
  id = 1:100,
  group = sample(LETTERS[1:4], 100, replace = TRUE),
  value = rnorm(100, mean = 50, sd = 10)
)

# Get dimensions
nrow(data)

[1] 100

ncol(data)

[1] 3

dim(data)

[1] 100   3

# Get column names
names(data)

[1] "id"    "group" "value"

colnames(data)

[1] "id"    "group" "value"

# Check if it's a tibble
is_tibble(data)

[1] TRUE

is.data.frame(data)  # TRUE - tibbles are also data frames

[1] TRUE

# Get a glimpse of the structure
glimpse(data)

Rows: 100
Columns: 3
$ id    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1…
$ group <chr> "D", "B", "C", "D", "B", "D", "B", "B", "C", "C", "A", "B", "C",…
$ value <dbl> 47.78940, 45.17783, 54.64428, 41.40415, 46.95217, 41.23244, 54.6…

Advanced Tibble Features

List Columns

Tibbles can contain list columns, allowing complex data structures:

# Create a tibble with list columns
complex_data <- tibble(
  id = 1:3,
  name = c("Experiment A", "Experiment B", "Experiment C"),
  measurements = list(
    c(1.2, 1.5, 1.3),
    c(2.1, 2.3, 2.2, 2.4),
    c(3.1, 3.0)
  ),
  metadata = list(
    list(date = "2024-01-01", operator = "Alice"),
    list(date = "2024-01-02", operator = "Bob"),
    list(date = "2024-01-03", operator = "Charlie")
  )
)

complex_data

# A tibble: 3 × 4
     id name         measurements metadata        
  <int> <chr>        <list>       <list>          
1     1 Experiment A <dbl [3]>    <named list [2]>
2     2 Experiment B <dbl [4]>    <named list [2]>
3     3 Experiment C <dbl [2]>    <named list [2]>

# Access list column elements
complex_data$measurements[[1]]

[1] 1.2 1.5 1.3

complex_data$metadata[[2]]$operator

[1] "Bob"
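List columns become more useful once you summarise them row by row. As a sketch, the code below rebuilds the `measurements` column from above and computes one count and one mean per experiment with lengths() and purrr's map_dbl(); the new column names are our own.

```r
library(tibble)
library(dplyr)
library(purrr)

experiments <- tibble(
  id = 1:3,
  measurements = list(
    c(1.2, 1.5, 1.3),
    c(2.1, 2.3, 2.2, 2.4),
    c(3.1, 3.0)
  )
)

# lengths() counts the elements of each list entry;
# map_dbl() applies mean() to each entry and returns a numeric vector
experiment_summary <- experiments %>%
  mutate(
    n_measurements = lengths(measurements),
    mean_measurement = map_dbl(measurements, mean)
  )

experiment_summary
```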

Nested Tibbles

You can nest tibbles within tibbles:

# Create nested data
sales_data <- tibble(
  region = c("North", "South", "East", "West"),
  data = list(
    tibble(month = 1:3, sales = c(100, 120, 110)),
    tibble(month = 1:3, sales = c(80, 85, 90)),
    tibble(month = 1:3, sales = c(95, 100, 105)),
    tibble(month = 1:3, sales = c(110, 115, 125))
  )
)

sales_data

# A tibble: 4 × 2
  region data            
  <chr>  <list>          
1 North  <tibble [3 × 2]>
2 South  <tibble [3 × 2]>
3 East   <tibble [3 × 2]>
4 West   <tibble [3 × 2]>

# Access nested tibble
sales_data$data[[1]]

# A tibble: 3 × 2
  month sales
  <int> <dbl>
1     1   100
2     2   120
3     3   110

# Work with nested data using purrr
sales_data %>%
  mutate(total_sales = map_dbl(data, ~sum(.x$sales)))

# A tibble: 4 × 3
  region data             total_sales
  <chr>  <list>                 <dbl>
1 North  <tibble [3 × 2]>         330
2 South  <tibble [3 × 2]>         255
3 East   <tibble [3 × 2]>         300
4 West   <tibble [3 × 2]>         350
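You rarely need to build nested tibbles by hand. As a sketch, tidyr's unnest() flattens a nested column into ordinary rows and nest() packs them up again; this assumes tidyr 1.0 or later, and the two-region data is a cut-down version of the example above.

```r
library(tibble)
library(tidyr)

nested_sales <- tibble(
  region = c("North", "South"),
  data = list(
    tibble(month = 1:3, sales = c(100, 120, 110)),
    tibble(month = 1:3, sales = c(80, 85, 90))
  )
)

# Flatten: one row per region and month
flat_sales <- unnest(nested_sales, data)

# Pack the month and sales columns back into one tibble per region
renested_sales <- nest(flat_sales, data = c(month, sales))

flat_sales
renested_sales
```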

Tibble Validation

tibble() checks its inputs when a tibble is created: only length-1 vectors are recycled, and column names must be unique:

# Recycling rules are stricter
# This works - length 1 recycles
tibble(x = 1:4, y = 1)

# A tibble: 4 × 2
      x     y
  <int> <dbl>
1     1     1
2     2     1
3     3     1
4     4     1

# This errors - inconsistent lengths
# tibble(x = 1:4, y = 1:3)  # Error!

# Column names must be unique
# tibble(x = 1, x = 2)  # Error!

# All columns must have the same length (or length 1)
tibble(
  x = 1:3,
  y = 1,      # Length 1 - OK, will be recycled
  z = 4:6     # Length 3 - OK
)

# A tibble: 3 × 3
      x     y     z
  <int> <dbl> <int>
1     1     1     4
2     2     1     5
3     3     1     6
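To see these checks without stopping your script, you can wrap the failing calls in tryCatch(). This is a sketch for exploration only; the exact wording of the error messages depends on your tibble version.

```r
library(tibble)

# Capture the error message instead of stopping
length_error <- tryCatch(
  tibble(x = 1:4, y = 1:3),
  error = function(e) conditionMessage(e)
)

duplicate_error <- tryCatch(
  tibble(x = 1, x = 2),
  error = function(e) conditionMessage(e)
)

# data.frame() silently recycles a length-2 vector to fill 4 rows...
df_recycled <- data.frame(x = 1:4, y = 1:2)

# ...while tibble() refuses the same input
tibble_refuses <- inherits(
  tryCatch(tibble(x = 1:4, y = 1:2), error = function(e) e),
  "error"
)

length_error
duplicate_error
df_recycled
tibble_refuses
```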

Tibbles in Data Analysis Workflows

Example: Customer Analysis Pipeline

# Create customer data
customers <- tibble(
  customer_id = 1:20,
  age = sample(18:65, 20, replace = TRUE),
  region = sample(c("North", "South", "East", "West"), 20, replace = TRUE),
  purchases = sample(0:50, 20, replace = TRUE),
  member_since = sample(2018:2024, 20, replace = TRUE)
)

# Analysis pipeline
customer_summary <- customers %>%
  mutate(
    years_member = 2024 - member_since,
    age_group = case_when(
      age < 30 ~ "Young",
      age < 50 ~ "Middle",
      TRUE ~ "Senior"
    )
  ) %>%
  group_by(region, age_group) %>%
  summarize(
    n_customers = n(),
    avg_purchases = mean(purchases),
    total_purchases = sum(purchases),
    avg_tenure = mean(years_member),
    .groups = "drop"
  ) %>%
  arrange(desc(total_purchases))

customer_summary

# A tibble: 12 × 6
   region age_group n_customers avg_purchases total_purchases avg_tenure
   <chr>  <chr>           <int>         <dbl>           <int>      <dbl>
 1 South  Middle              4          21.5              86       2.25
 2 South  Young               3          25.3              76       4.67
 3 West   Middle              3          16.3              49       1.33
 4 East   Senior              1          46                46       0   
 5 South  Senior              1          43                43       1   
 6 East   Middle              1          33                33       4   
 7 North  Middle              2          16.5              33       3.5 
 8 North  Senior              1          22                22       0   
 9 North  Young               1          22                22       4   
10 West   Young               1          22                22       4   
11 West   Senior              1           7                 7       3   
12 East   Young               1           5                 5       2
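Because the customer data is generated with sample(), your numbers will differ from the output shown here. If you want the same table on every run, set a seed first. A small sketch follows; the seed value 215 is arbitrary.

```r
# Setting the same seed before sampling reproduces the same draws
set.seed(215)
ages_first_run <- sample(18:65, 5, replace = TRUE)

set.seed(215)
ages_second_run <- sample(18:65, 5, replace = TRUE)

identical(ages_first_run, ages_second_run)
```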

Combining Multiple Tibbles

# Product information
products <- tibble(
  product_id = 1:5,
  product_name = c("Widget A", "Widget B", "Gadget C", "Gadget D", "Tool E"),
  price = c(19.99, 29.99, 49.99, 39.99, 24.99)
)

# Sales records
sales <- tibble(
  sale_id = 1:10,
  product_id = sample(1:5, 10, replace = TRUE),
  quantity = sample(1:5, 10, replace = TRUE),
  date = seq.Date(from = as.Date("2024-01-01"),
                  length.out = 10,
                  by = "day")
)

# Join tibbles
sales_details <- sales %>%
  left_join(products, by = "product_id") %>%
  mutate(revenue = quantity * price) %>%
  select(sale_id, date, product_name, quantity, price, revenue)

sales_details

# A tibble: 10 × 6
   sale_id date       product_name quantity price revenue
     <int> <date>     <chr>           <int> <dbl>   <dbl>
 1       1 2024-01-01 Tool E              1  25.0    25.0
 2       2 2024-01-02 Tool E              3  25.0    75.0
 3       3 2024-01-03 Tool E              5  25.0   125. 
 4       4 2024-01-04 Gadget D            3  40.0   120. 
 5       5 2024-01-05 Gadget C            3  50.0   150. 
 6       6 2024-01-06 Gadget C            3  50.0   150. 
 7       7 2024-01-07 Widget A            4  20.0    80.0
 8       8 2024-01-08 Widget A            1  20.0    20.0
 9       9 2024-01-09 Widget B            4  30.0   120. 
10      10 2024-01-10 Gadget C            1  50.0    50.0
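In the example above every sale has a matching product. With real data that is not guaranteed, so it is worth checking. The sketch below uses a deliberately broken sale (product_id 7, which is not in the product table) to show what left_join() does with it and how anti_join() finds it; the table names are illustrative.

```r
library(tibble)
library(dplyr)

product_list <- tibble(
  product_id = 1:3,
  price = c(19.99, 29.99, 49.99)
)

sale_records <- tibble(
  sale_id = 1:4,
  product_id = c(1L, 3L, 3L, 7L),   # 7 has no matching product
  quantity = c(2L, 1L, 4L, 1L)
)

# left_join() keeps every sale; unmatched rows get NA for price
joined_sales <- left_join(sale_records, product_list, by = "product_id")

# anti_join() returns only the sales with no matching product
unmatched_sales <- anti_join(sale_records, product_list, by = "product_id")

joined_sales
unmatched_sales
```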

Best Practices with Tibbles

1. Use Consistent Naming

# Good: Consistent, descriptive names
good_tibble <- tibble(
  customer_id = 1:3,
  customer_name = c("Alice", "Bob", "Charlie"),
  customer_age = c(25, 30, 35),
  purchase_amount = c(100, 200, 150)
)

# Alternative: shorter snake_case names, for when the table is clearly about customers
better_tibble <- tibble(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  amount_purchased = c(100, 200, 150)
)

2. Document Complex Structures

# Document what each column represents
experiment_data <- tibble(
  # Unique identifier for each experiment
  exp_id = 1:5,

  # Timestamp when experiment started (printed in the session's time zone)
  start_time = Sys.time() + 0:4 * 3600,

  # List of measurements taken during experiment
  measurements = list(
    rnorm(10), rnorm(12), rnorm(8), rnorm(15), rnorm(10)
  ),

  # Nested metadata about experimental conditions
  conditions = list(
    list(temp = 20, pressure = 1.0),
    list(temp = 25, pressure = 1.0),
    list(temp = 20, pressure = 1.5),
    list(temp = 25, pressure = 1.5),
    list(temp = 22.5, pressure = 1.25)
  )
)

glimpse(experiment_data)

Rows: 5
Columns: 4
$ exp_id       <int> 1, 2, 3, 4, 5
$ start_time   <dttm> 2025-09-22 00:35:52, 2025-09-22 01:35:52, 2025-09-22 02:3…
$ measurements <list> <0.3860134, 0.4324334, -0.5975443, -0.4130127, -0.433889…
$ conditions   <list> [20, 1], [25, 1], [20, 1.5], [25, 1.5], [22.5, 1.25]

3. Prefer Tibbles for New Code

# When reading data, get tibbles directly
# read_csv() returns a tibble
# read.csv() returns a data frame

# When creating data, use tibble() not data.frame()
modern_approach <- tibble(
  x = 1:3,
  y = c("a", "b", "c")
)

# When converting existing data frames
legacy_df <- data.frame(x = 1:3, y = 4:6)
modernized <- as_tibble(legacy_df)
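The reading difference mentioned in the comments can be checked without a file on disk. This sketch assumes readr 2.0 or later, where literal data is wrapped in I(); the CSV text is made up for illustration.

```r
library(readr)
library(tibble)

csv_text <- "x,y\n1,a\n2,b"

# readr: returns a tibble
from_readr <- read_csv(I(csv_text), show_col_types = FALSE)

# base R: returns a plain data frame
from_base <- read.csv(text = csv_text)

is_tibble(from_readr)
is_tibble(from_base)
```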

Common Operations Reference

# Create a reference tibble
ref <- tibble(
  id = 1:5,
  value = c(10, 20, 30, 40, 50),
  category = c("A", "B", "A", "B", "A")
)

# Selection operations
ref %>% select(id, value)              # Select columns

# A tibble: 5 × 2
     id value
  <int> <dbl>
1     1    10
2     2    20
3     3    30
4     4    40
5     5    50

ref %>% select(-category)              # Drop columns

# A tibble: 5 × 2
     id value
  <int> <dbl>
1     1    10
2     2    20
3     3    30
4     4    40
5     5    50

ref %>% filter(value > 20)             # Filter rows

# A tibble: 3 × 3
     id value category
  <int> <dbl> <chr>   
1     3    30 A       
2     4    40 B       
3     5    50 A

ref %>% slice(1:3)                     # Select rows by position

# A tibble: 3 × 3
     id value category
  <int> <dbl> <chr>   
1     1    10 A       
2     2    20 B       
3     3    30 A

# Modification operations
ref %>% mutate(double = value * 2)     # Add column

# A tibble: 5 × 4
     id value category double
  <int> <dbl> <chr>     <dbl>
1     1    10 A            20
2     2    20 B            40
3     3    30 A            60
4     4    40 B            80
5     5    50 A           100

ref %>% rename(val = value)            # Rename column

# A tibble: 5 × 3
     id   val category
  <int> <dbl> <chr>   
1     1    10 A       
2     2    20 B       
3     3    30 A       
4     4    40 B       
5     5    50 A

ref %>% arrange(desc(value))           # Sort rows

# A tibble: 5 × 3
     id value category
  <int> <dbl> <chr>   
1     5    50 A       
2     4    40 B       
3     3    30 A       
4     2    20 B       
5     1    10 A

# Aggregation operations
ref %>%
  group_by(category) %>%
  summarize(
    mean_value = mean(value),
    total = sum(value),
    count = n()
  )

# A tibble: 2 × 4
  category mean_value total count
  <chr>         <dbl> <dbl> <int>
1 A                30    90     3
2 B                30    60     2

Exercises

Exercise 1: Create and Explore

Create a tibble with information about 5 books (title, author, year, pages). Print it and use glimpse() to examine its structure.

Exercise 2: List Columns

Create a tibble where each row represents a student and includes:

- Name (character)
- Grades (list of numeric scores)
- Subjects (list of course names)

Exercise 3: Tibble vs Data Frame

Create the same dataset as both a data frame and a tibble. Compare:

- How they print
- How they handle column access
- What happens with partial name matching

Exercise 4: Complex Analysis

Using the diamonds dataset from ggplot2 (loaded with the tidyverse, and already a tibble):

1. Group by cut and color
2. Calculate average price and carat
3. Create a list column with all clarity values per group
4. Find which combination has the highest average price

Summary

Tibbles are the foundation of tidy data analysis:

- Safer defaults: No automatic type conversion, no partial matching
- Better printing: Shows only what fits, includes type information
- Modern features: List columns, nested data structures
- Consistent behavior: Predictable subsetting and modification
- Tidyverse integration: Works seamlessly with all tidyverse functions

As you work more with the tidyverse, tibbles will become your default data structure. They eliminate many of the surprises and inconsistencies of traditional data frames while adding powerful new capabilities.

Next, we’ll learn about importing data with readr to get your data into tibble format!