Vectors: R’s Building Blocks

Author

IND215

Published

September 22, 2025

Introduction to Vectors

Vectors are the most fundamental data structure in R. In fact, even single values in R are vectors with one element! Understanding vectors is crucial because virtually everything in R is built upon them.

A vector is an ordered sequence of elements that all share one type, such as numeric, character, or logical. Think of it as a container that holds multiple values in a specific order.
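One way to check the claim that single values are vectors is to inspect one with the same tools used for longer vectors. The sketch below uses an illustrative variable, `single_value`, which is not part of the original examples.

```r
# `single_value` is a made-up name used only for this illustration
single_value <- 42

length(single_value)             # 1: a single value is a vector of length one
is.vector(single_value)          # TRUE
single_value[1]                  # indexing works as on any other vector
identical(single_value, c(42))   # TRUE: wrapping it in c() changes nothing
```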

Creating Vectors

The c() Function

The most common way to create vectors is using the c() function (which stands for “combine” or “concatenate”):

# Numeric vectors
numbers <- c(1, 2, 3, 4, 5)
temperatures <- c(72.5, 75.2, 68.9, 80.1, 77.3)

# Character vectors
first_names <- c("Alice", "Bob", "Charlie", "Diana")  # avoids masking base R's names()
colors <- c("red", "green", "blue", "yellow")

# Logical vectors
answers <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# Print the vectors
print(numbers)
[1] 1 2 3 4 5
print(first_names)
[1] "Alice"   "Bob"     "Charlie" "Diana"  
print(answers)
[1]  TRUE FALSE  TRUE  TRUE FALSE

Vector Properties

Every vector has important properties:

scores <- c(85, 92, 78, 96, 88)

# Length: number of elements
length(scores)
[1] 5
# Type: what kind of data
typeof(scores)
[1] "double"
# Class: object class
class(scores)
[1] "numeric"
# Structure: comprehensive overview
str(scores)
 num [1:5] 85 92 78 96 88
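The "double" result above can be surprising for whole numbers. As a small sketch with illustrative variable names, the following compares the two numeric storage types: plain numbers are stored as doubles, while the `L` suffix and the colon operator produce integers.

```r
# All three variable names are illustrative
whole_doubles <- c(1, 2, 3)      # plain numbers are stored as doubles
true_integers <- c(1L, 2L, 3L)   # the L suffix requests integer storage
colon_seq     <- 1:3             # the colon operator also yields integers

typeof(whole_doubles)   # "double"
typeof(true_integers)   # "integer"
typeof(colon_seq)       # "integer"

# The values compare equal even though the storage types differ
all(whole_doubles == true_integers)       # TRUE
identical(whole_doubles, true_integers)   # FALSE: identical() also checks type
```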

Creating Sequences

R provides several ways to create vectors with patterns:

# Simple sequences
seq1 <- 1:10          # 1, 2, 3, ..., 10
seq2 <- 10:1          # 10, 9, 8, ..., 1

# Using seq() function
seq3 <- seq(from = 0, to = 100, by = 10)    # 0, 10, 20, ..., 100
seq4 <- seq(0, 1, length.out = 11)         # 11 equally spaced numbers

# Repeated values
rep1 <- rep(5, times = 8)                  # 5, 5, 5, 5, 5, 5, 5, 5
rep2 <- rep(c(1, 2, 3), times = 3)        # 1, 2, 3, 1, 2, 3, 1, 2, 3
rep3 <- rep(c(1, 2, 3), each = 3)         # 1, 1, 1, 2, 2, 2, 3, 3, 3

print(seq3)
 [1]   0  10  20  30  40  50  60  70  80  90 100
print(rep2)
[1] 1 2 3 1 2 3 1 2 3
print(rep3)
[1] 1 1 1 2 2 2 3 3 3
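When a sequence is meant to index another vector, `seq_along()` and `seq_len()` are usually safer than `1:length(x)`, because the colon operator counts downward when the length is zero. A minimal sketch with made-up data:

```r
# `items` and `empty` are illustrative example vectors
items <- c("a", "b", "c")
empty <- character(0)

seq_along(items)   # 1 2 3
seq_len(3)         # 1 2 3

# With an empty vector, 1:length(x) counts down from 1 to 0
1:length(empty)    # 1 0
seq_along(empty)   # integer(0), so a loop over it runs zero times
```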

Vector Indexing and Subsetting

Accessing Elements by Position

fruits <- c("apple", "banana", "cherry", "date", "elderberry")

# Single element (note: R uses 1-based indexing!)
fruits[1]     # First element
[1] "apple"
fruits[3]     # Third element
[1] "cherry"
fruits[5]     # Last element
[1] "elderberry"
# Multiple elements
fruits[c(1, 3, 5)]    # Elements 1, 3, and 5
[1] "apple"      "cherry"     "elderberry"
fruits[1:3]           # Elements 1 through 3
[1] "apple"  "banana" "cherry"
fruits[c(2, 4)]       # Elements 2 and 4
[1] "banana" "date"  
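Hard-coding `fruits[5]` only works while the vector has exactly five elements. As a sketch, the last element can instead be found from the vector's length, or with `head()` and `tail()`:

```r
fruits <- c("apple", "banana", "cherry", "date", "elderberry")

fruits[length(fruits)]   # "elderberry": last element, whatever the length
tail(fruits, 1)          # same result using tail()
head(fruits, 2)          # "apple" "banana"
tail(fruits, 2)          # "date" "elderberry"
```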

Negative Indexing

Use negative indices to exclude elements:

numbers <- c(10, 20, 30, 40, 50)

# Exclude specific elements
numbers[-1]           # All except the first
[1] 20 30 40 50
numbers[-c(1, 5)]     # All except first and last
[1] 20 30 40
numbers[-(2:4)]       # All except elements 2 through 4
[1] 10 50
print(numbers[-c(1, 5)])
[1] 20 30 40

Logical Indexing

Use logical vectors to subset based on conditions:

ages <- c(23, 35, 28, 42, 19, 31, 27)

# Find elements meeting a condition
ages > 30             # Logical vector
[1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
ages[ages > 30]       # Elements where condition is TRUE
[1] 35 42 31
# Multiple conditions
ages[ages >= 25 & ages <= 35]     # Ages between 25 and 35
[1] 35 28 31 27
ages[ages < 25 | ages > 40]       # Ages less than 25 OR greater than 40
[1] 23 42 19
# Store logical vector for reuse
adults <- ages >= 18
ages[adults]
[1] 23 35 28 42 19 31 27

Subsetting with Names

Vectors can have named elements:

# Create named vector
student_grades <- c(alice = 92, bob = 87, charlie = 95, diana = 89)
print(student_grades)
  alice     bob charlie   diana 
     92      87      95      89 
# Access by name
student_grades["alice"]
alice 
   92 
student_grades[c("alice", "charlie")]
  alice charlie 
     92      95 
# Get names
names(student_grades)
[1] "alice"   "bob"     "charlie" "diana"  
# Add names to existing vector
scores <- c(85, 90, 78, 92)
names(scores) <- c("Math", "Science", "English", "History")
print(scores)
   Math Science English History 
     85      90      78      92 
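Because indexing by name accepts a whole vector of names, a named vector can double as a small lookup table. The data below is hypothetical and only meant to illustrate the idea:

```r
# Hypothetical lookup table: state abbreviations mapped to full names
state_names <- c(CA = "California", NY = "New York", TX = "Texas")
codes <- c("NY", "CA", "NY", "TX")

full <- state_names[codes]   # look up every code in one step
unname(full)                 # "New York" "California" "New York" "Texas"

# A name that is not present yields NA rather than an error
state_names["FL"]
```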

Element-wise Operations

One of R’s greatest strengths is vectorization: an operation applies to every element of a vector at once, with no explicit loop.

Arithmetic Operations

# Create vectors
a <- c(2, 4, 6, 8, 10)
b <- c(1, 2, 3, 4, 5)

# Element-wise arithmetic
a + b     # Add corresponding elements
[1]  3  6  9 12 15
a - b     # Subtract corresponding elements
[1] 1 2 3 4 5
a * b     # Multiply corresponding elements
[1]  2  8 18 32 50
a / b     # Divide corresponding elements
[1] 2 2 2 2 2
a ^ b     # Raise a to the power of b
[1]      2     16    216   4096 100000
# Operations with single values (recycling)
a + 10    # Add 10 to each element
[1] 12 14 16 18 20
a * 2     # Multiply each element by 2
[1]  4  8 12 16 20
a / 2     # Divide each element by 2
[1] 1 2 3 4 5
print(a + b)
[1]  3  6  9 12 15
print(a * 2)
[1]  4  8 12 16 20

Comparison Operations

scores <- c(85, 92, 78, 96, 88, 74, 91)

# Comparisons return logical vectors
high_scores <- scores > 90
passing_scores <- scores >= 80
failing_scores <- scores < 70

print(high_scores)
[1] FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE
print(passing_scores)
[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE
# Count TRUE values
sum(high_scores)      # How many scored above 90?
[1] 3
sum(passing_scores)   # How many passed?
[1] 5
# Percentage calculations
mean(high_scores) * 100    # Percentage with high scores
[1] 42.85714
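One caveat with `==` on doubles is that decimal fractions are not always stored exactly, so an exact comparison can fail where the arithmetic looks obviously equal. A hedged sketch of two common workarounds, using illustrative values and an arbitrary tolerance of `1e-8`:

```r
0.1 + 0.2 == 0.3                    # FALSE because of floating-point rounding
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: all.equal() allows a small tolerance

# Element-wise comparison with an explicit tolerance (1e-8 is an arbitrary choice)
computed <- c(0.1 + 0.2, 0.5 - 0.4)
expected <- c(0.3, 0.1)
abs(computed - expected) < 1e-8     # TRUE TRUE
```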

Logical Operations

x <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
y <- c(FALSE, FALSE, TRUE, TRUE, TRUE)

# Element-wise logical operations
x & y     # AND operation
[1] FALSE FALSE  TRUE FALSE  TRUE
x | y     # OR operation
[1]  TRUE FALSE  TRUE  TRUE  TRUE
!x        # NOT operation
[1] FALSE  TRUE FALSE  TRUE FALSE
print(x & y)
[1] FALSE FALSE  TRUE FALSE  TRUE
print(x | y)
[1]  TRUE FALSE  TRUE  TRUE  TRUE
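Element-wise results are often reduced to a single answer with `any()` and `all()`. The sketch below also shows `xor()` and the `&&` operator, which works on single values and is the usual choice inside `if()` conditions. The `scores` vector is example data.

```r
x <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
y <- c(FALSE, FALSE, TRUE, TRUE, TRUE)

any(x & y)   # TRUE: at least one position is TRUE in both
all(x | y)   # FALSE: position 2 is FALSE in both
xor(x, y)    # TRUE FALSE FALSE TRUE FALSE

# && and || operate on single values, which suits if() conditions
scores <- c(85, 92, 78)
if (length(scores) > 0 && all(scores >= 0)) {
  message("scores look valid")
}
```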

Vector Functions

R provides many built-in functions that work with vectors:

Mathematical Functions

values <- c(1, 4, 9, 16, 25)

# Basic functions
sum(values)           # Sum of all elements
[1] 55
mean(values)          # Average
[1] 11
median(values)        # Median
[1] 9
min(values)           # Minimum value
[1] 1
max(values)           # Maximum value
[1] 25
range(values)         # Min and max
[1]  1 25
var(values)           # Variance
[1] 93.5
sd(values)            # Standard deviation
[1] 9.66954
# Element-wise mathematical functions
sqrt(values)          # Square root of each element
[1] 1 2 3 4 5
log(values)           # Natural logarithm
[1] 0.000000 1.386294 2.197225 2.772589 3.218876
round(log(values), 2)  # Round to 2 decimal places
[1] 0.00 1.39 2.20 2.77 3.22
print(sqrt(values))
[1] 1 2 3 4 5
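Between functions that collapse a vector to one number and functions that transform each element independently sit the cumulative functions, which return a running result. A short sketch:

```r
values <- c(1, 4, 9, 16, 25)

cumsum(values)             # 1 5 14 30 55: running total
diff(values)               # 3 5 7 9: differences between neighbours
cumprod(c(1, 2, 3, 4))     # 1 2 6 24: running product
cummax(c(3, 1, 4, 1, 5))   # 3 3 4 4 5: running maximum
```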

Statistical Functions

# Generate some sample data
set.seed(123)
sample_data <- round(rnorm(20, mean = 75, sd = 10))

# Comprehensive statistics
summary(sample_data)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  55.00   69.75   76.00   76.40   80.50   93.00 
# Quantiles
quantile(sample_data)
   0%   25%   50%   75%  100% 
55.00 69.75 76.00 80.50 93.00 
quantile(sample_data, probs = c(0.25, 0.5, 0.75, 0.95))
  25%   50%   75%   95% 
69.75 76.00 80.50 92.05 
# Ranking and ordering
sort(sample_data)                    # Sort in ascending order
 [1] 55 62 68 69 69 70 71 73 76 76 76 79 79 80 80 82 87 91 92 93
sort(sample_data, decreasing = TRUE) # Sort in descending order
 [1] 93 92 91 87 82 80 80 79 79 76 76 76 73 71 70 69 69 68 62 55
order(sample_data)                   # Indices that would sort the vector
 [1] 18  8  9  1 15 20 10  2  4  5 14 12 13  7 17 19 11  3  6 16
rank(sample_data)                    # Ranks of each element
 [1]  4.5  8.0 18.0 10.0 10.0 19.0 14.5  2.0  3.0  7.0 17.0 12.5 12.5 10.0  4.5
[16] 20.0 14.5  1.0 16.0  6.0
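The indices returned by `order()` are most useful for rearranging one vector according to the values of another. A minimal sketch, using made-up student data:

```r
# Example data: two parallel vectors (names and scores are illustrative)
student <- c("Alice", "Bob", "Charlie", "Diana")
score   <- c(88, 95, 79, 91)

by_score <- order(score, decreasing = TRUE)   # 2 4 1 3
student[by_score]   # "Bob" "Diana" "Alice" "Charlie"
score[by_score]     # 95 91 88 79
```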

Finding and Counting

grades <- c(85, 92, 78, 96, 88, 74, 91, 89)

# Find specific values
which(grades > 90)           # Positions of elements > 90
[1] 2 4 7
which.max(grades)            # Position of maximum value
[1] 4
which.min(grades)            # Position of minimum value
[1] 6
# Check for presence
85 %in% grades               # Is 85 in the vector?
[1] TRUE
c(85, 100) %in% grades       # Which of these are in the vector?
[1]  TRUE FALSE
# Count specific values
sum(grades == 85)            # How many times does 85 appear?
[1] 1
sum(grades > 90)             # How many scores above 90?
[1] 3
# Unique values
duplicated_values <- c(1, 2, 2, 3, 3, 3, 4)
unique(duplicated_values)    # Get unique values
[1] 1 2 3 4
duplicated(duplicated_values) # Which are duplicates?
[1] FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE
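To count every distinct value at once, rather than one value at a time with `sum(x == value)`, `table()` is a common choice. The sketch below uses hypothetical survey responses:

```r
# Hypothetical survey responses
responses <- c("yes", "no", "yes", "maybe", "yes", "no")

counts <- table(responses)   # frequency of each distinct value
counts[["yes"]]              # 3
counts[["no"]]               # 2
names(which.max(counts))     # "yes": the most frequent response
```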

Modifying Vectors

Adding Elements

# Start with a vector
original <- c(1, 2, 3)

# Add elements to the end
extended <- c(original, 4, 5)
print(extended)
[1] 1 2 3 4 5
# Add elements to the beginning
prepended <- c(0, original)
print(prepended)
[1] 0 1 2 3
# Insert elements in the middle
# (This requires more complex indexing)
middle_insert <- c(original[1:2], 2.5, original[3])
print(middle_insert)
[1] 1.0 2.0 2.5 3.0
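Base R's `append()` offers an alternative to slicing and recombining by hand. Its `after` argument gives the position after which the new values go. A brief sketch:

```r
original <- c(1, 2, 3)

# Insert 2.5 after the second element
append(original, 2.5, after = 2)         # 1.0 2.0 2.5 3.0

# after = 0 places the new values at the front
append(original, c(10, 20), after = 0)   # 10 20 1 2 3
```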

Replacing Elements

scores <- c(85, 92, 78, 96, 88)

# Replace specific positions
scores[3] <- 82          # Replace 3rd element
scores[c(1, 5)] <- c(87, 90)  # Replace 1st and 5th elements

print(scores)
[1] 87 92 82 96 90
# Conditional replacement
ages <- c(23, 35, 28, 42, 19, 31, 27)
ages[ages < 25] <- 25    # Set minimum age to 25
print(ages)
[1] 25 35 28 42 25 31 27
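Setting a floor or ceiling, as in the age example, can also be written with `pmax()` and `pmin()`, which return a new vector instead of modifying the original. This is a sketch of that alternative, and the upper limit of 40 is an arbitrary value chosen for illustration:

```r
ages <- c(23, 35, 28, 42, 19, 31, 27)

pmax(ages, 25)             # 25 35 28 42 25 31 27: floor at 25
pmin(ages, 40)             # 23 35 28 40 19 31 27: ceiling at 40
pmin(pmax(ages, 25), 40)   # 25 35 28 40 25 31 27: clamp to [25, 40]

# replace() also returns a modified copy and leaves `ages` untouched
replace(ages, ages < 25, 25)   # 25 35 28 42 25 31 27
```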

Removing Elements

numbers <- c(10, 20, 30, 40, 50)

# Remove by position
shortened <- numbers[-3]              # Remove 3rd element
multiple_removed <- numbers[-c(1, 5)] # Remove 1st and 5th

print(shortened)
[1] 10 20 40 50
print(multiple_removed)
[1] 20 30 40
# Remove by condition
grades <- c(85, 92, 78, 96, 88, 74, 91)
passing_only <- grades[grades >= 80]  # Keep only passing grades
print(passing_only)
[1] 85 92 96 88 91
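A related pitfall is combining negative indexing with `which()`: when nothing matches the condition, `-which(...)` is an empty index and the result is an empty vector. The sketch below contrasts that with logical negation, which behaves as expected in that case.

```r
numbers <- c(10, 20, 30, 40, 50)

# No element exceeds 100, so which() returns integer(0)
numbers[-which(numbers > 100)]   # numeric(0): every element is lost

# Negating the logical condition keeps everything, as intended
numbers[!(numbers > 100)]        # 10 20 30 40 50

# Remove specific values with %in%
numbers[!numbers %in% c(20, 40)] # 10 30 50
```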

Working with Missing Values

Creating and Detecting Missing Values

# Vector with missing values
incomplete_data <- c(1, 2, NA, 4, 5, NA, 7)

# Detect missing values
is.na(incomplete_data)
[1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
which(is.na(incomplete_data))     # Positions of NA values
[1] 3 6
sum(is.na(incomplete_data))       # Count of NA values
[1] 2
# Complete cases (non-missing)
complete.cases(incomplete_data)
[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE
incomplete_data[complete.cases(incomplete_data)]
[1] 1 2 4 5 7

Handling Missing Values in Calculations

data_with_na <- c(10, 15, NA, 20, 25, NA, 30)

# Many functions return NA if any element is NA
mean(data_with_na)           # Returns NA
[1] NA
sum(data_with_na)            # Returns NA
[1] NA
# Use na.rm = TRUE to exclude NA values
mean(data_with_na, na.rm = TRUE)
[1] 20
sum(data_with_na, na.rm = TRUE)
[1] 100
sd(data_with_na, na.rm = TRUE)
[1] 7.905694
# length() counts every element, including NA
length(data_with_na)         # Counts NA values too
[1] 7
length(na.omit(data_with_na)) # Length after removing NA
[1] 5
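Missing values also affect comparisons and subsetting: comparing anything with `NA` yields `NA`, and an `NA` in a logical index produces an `NA` in the result. The sketch below shows this, plus one simple way to fill gaps with the mean. Mean imputation is only an illustration here, not a recommendation for real analyses.

```r
data_with_na <- c(10, 15, NA, 20, 25, NA, 30)

NA == NA                                  # NA: use is.na() to test for missing values
data_with_na[data_with_na > 18]           # NA 20 25 NA 30
data_with_na[which(data_with_na > 18)]    # 20 25 30: which() drops the NA positions

# Replace each NA with the mean of the observed values (illustrative only)
filled <- ifelse(is.na(data_with_na),
                 mean(data_with_na, na.rm = TRUE),
                 data_with_na)
filled                                    # 10 15 20 20 25 20 30
```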

Vector Recycling

When vectors of different lengths are used together, R “recycles” the shorter vector:

# Vectors of different lengths
long_vector <- c(1, 2, 3, 4, 5, 6)
short_vector <- c(10, 20)

# The short vector gets recycled
result <- long_vector + short_vector
print(result)  # c(11, 22, 13, 24, 15, 26)
[1] 11 22 13 24 15 26
# Recycling with single values
add_five <- long_vector + 5  # 5 is recycled to match length
print(add_five)
[1]  6  7  8  9 10 11
# Warning when lengths don't divide evenly
uneven_example <- c(1, 2, 3, 4, 5) + c(10, 20, 30)  # Warning!
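Even in the uneven case R still returns a result; it recycles the shorter vector partway and issues a warning. The sketch below captures that warning with `withCallingHandlers()` so that both the message and the values are visible. The handler code is an illustration, not part of the original example.

```r
uneven <- withCallingHandlers(
  c(1, 2, 3, 4, 5) + c(10, 20, 30),
  warning = function(w) {
    # Report the warning text, then continue with the computation
    message("Caught warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# The short vector is reused as 10, 20, 30, 10, 20
uneven   # 11 22 33 14 25
```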

Practical Examples

Example 1: Grade Analysis

# Student grades for a class
student_names <- c("Alice", "Bob", "Charlie", "Diana", "Eve", "Frank", "Grace")
midterm_scores <- c(85, 78, 92, 88, 79, 94, 87)
final_scores <- c(88, 82, 89, 91, 83, 96, 90)

# Calculate overall grades (60% final, 40% midterm)
overall_grades <- 0.4 * midterm_scores + 0.6 * final_scores

# Assign letter grades
letter_grades <- ifelse(overall_grades >= 90, "A",
                       ifelse(overall_grades >= 80, "B",
                             ifelse(overall_grades >= 70, "C",
                                   ifelse(overall_grades >= 60, "D", "F"))))

# Create a summary
grade_summary <- data.frame(
  Student = student_names,
  Midterm = midterm_scores,
  Final = final_scores,
  Overall = round(overall_grades, 1),
  Grade = letter_grades
)

print(grade_summary)
  Student Midterm Final Overall Grade
1   Alice      85    88    86.8     B
2     Bob      78    82    80.4     B
3 Charlie      92    89    90.2     A
4   Diana      88    91    89.8     B
5     Eve      79    83    81.4     B
6   Frank      94    96    95.2     A
7   Grace      87    90    88.8     B
# Class statistics
cat("Class average:", round(mean(overall_grades), 1), "\n")
Class average: 87.5 
cat("Students with A:", sum(letter_grades == "A"), "\n")
Students with A: 2 
cat("Passing rate:", round(mean(overall_grades >= 60) * 100, 1), "%\n")
Passing rate: 100 %
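Nested `ifelse()` calls become hard to read as the number of bands grows. One alternative, sketched here under the same grade boundaries, is `cut()`, which assigns each value to an interval. The overall grades are copied from the table above so that the block runs on its own.

```r
# Overall grades copied from the summary table above
overall_grades <- c(86.8, 80.4, 90.2, 89.8, 81.4, 95.2, 88.8)

letter_grades <- cut(
  overall_grades,
  breaks = c(-Inf, 60, 70, 80, 90, Inf),
  labels = c("F", "D", "C", "B", "A"),
  right  = FALSE   # intervals are [lower, upper), so exactly 90 counts as "A"
)

as.character(letter_grades)   # "B" "B" "A" "B" "B" "A" "B"
```

Note that `cut()` returns a factor, so `as.character()` is needed if plain strings are required.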

Example 2: Temperature Conversion

# Daily temperatures in Fahrenheit
fahrenheit_temps <- c(68, 72, 75, 71, 69, 74, 78, 76, 73, 70)

# Convert to Celsius
celsius_temps <- (fahrenheit_temps - 32) * 5/9

# Categorize temperatures
temp_categories <- ifelse(celsius_temps < 15, "Cold",
                         ifelse(celsius_temps < 25, "Mild", "Warm"))

# Summary
temp_summary <- data.frame(
  Day = 1:10,
  Fahrenheit = fahrenheit_temps,
  Celsius = round(celsius_temps, 1),
  Category = temp_categories
)

print(temp_summary)
   Day Fahrenheit Celsius Category
1    1         68    20.0     Mild
2    2         72    22.2     Mild
3    3         75    23.9     Mild
4    4         71    21.7     Mild
5    5         69    20.6     Mild
6    6         74    23.3     Mild
7    7         78    25.6     Warm
8    8         76    24.4     Mild
9    9         73    22.8     Mild
10  10         70    21.1     Mild
# Find extreme days
hottest_day <- which.max(celsius_temps)
coldest_day <- which.min(celsius_temps)

cat("Hottest day:", hottest_day, "with", round(celsius_temps[hottest_day], 1), "°C\n")
Hottest day: 7 with 25.6 °C
cat("Coldest day:", coldest_day, "with", round(celsius_temps[coldest_day], 1), "°C\n")
Coldest day: 1 with 20 °C

Common Mistakes and Best Practices

1. Remember 1-based Indexing

my_vector <- c("a", "b", "c", "d", "e")

# R uses 1-based indexing (not 0-based like many languages)
my_vector[1]  # First element (not my_vector[0])
[1] "a"
my_vector[5]  # Last element
[1] "e"
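What makes this mistake easy to miss is that an index of 0, or one past the end, does not raise an error. A short sketch of what R returns instead:

```r
my_vector <- c("a", "b", "c", "d", "e")

my_vector[0]         # character(0): an empty vector, not an error
my_vector[6]         # NA: positions past the end give a missing value
my_vector[c(0, 1)]   # "a": zeros in an index are silently dropped
```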

2. Vector Type Consistency

# Vectors can only hold one type of data
mixed_attempt <- c(1, "two", 3, "four")
print(mixed_attempt)  # Everything becomes character!
[1] "1"    "two"  "3"    "four"
typeof(mixed_attempt)
[1] "character"
# Use lists for mixed types (covered in next section)
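The conversion follows a fixed order: logical values become integers, integers become doubles, and anything mixed with text becomes character. The sketch below illustrates that order and shows an explicit conversion back to numbers; `suppressWarnings()` is used only to keep the example quiet.

```r
typeof(c(TRUE, 2L))    # "integer": logical is promoted to integer
typeof(c(2L, 3.5))     # "double": integer is promoted to double
typeof(c(3.5, "a"))    # "character": everything becomes text

c(TRUE, FALSE, 2)      # 1 0 2: TRUE is 1 and FALSE is 0

# Converting back explicitly: text that is not a number becomes NA
as.numeric(c("1", "3"))                       # 1 3
suppressWarnings(as.numeric(c("1", "two")))   # 1 NA (normally with a warning)
```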

3. NA Propagation

# One NA can affect entire calculations
values_with_na <- c(1, 2, NA, 4, 5)
mean(values_with_na)           # Returns NA
[1] NA
mean(values_with_na, na.rm = TRUE)  # Proper way to handle
[1] 3

Exercises

Exercise 1: Vector Creation and Manipulation

  1. Create a vector of the first 20 even numbers
  2. Create a vector with your name repeated 5 times
  3. Create a vector of 15 random numbers between 1 and 100

Exercise 2: Data Analysis Practice

Given these test scores: scores <- c(78, 85, 92, 88, 79, 95, 87, 83, 90, 86)

  1. Calculate the mean, median, and standard deviation
  2. Find how many scores are above average
  3. Identify the positions of scores above 90
  4. Replace any score below 80 with 80

Exercise 3: Real-world Application

You have monthly sales data: sales <- c(12000, 15000, 13500, 16000, 14200, 17500)

  1. Calculate the total sales for the six months shown
  2. Find the month with highest sales
  3. Calculate the percentage increase from the first month to the last month
  4. Identify months where sales exceeded $15,000

Summary

Vectors are fundamental to R programming:

  • Creation: Use c(), sequences (1:10, seq()), and repetition (rep())
  • Indexing: Access elements by position [1], multiple positions [c(1,3,5)], or conditions [x > 5]
  • Operations: Vectorized arithmetic and comparisons work element-wise
  • Functions: Many built-in functions work naturally with vectors
  • Modification: Add, replace, or remove elements as needed

Key principles:

  • Vectors hold elements of the same type
  • R uses 1-based indexing
  • Operations are vectorized by default
  • Missing values (NA) propagate through calculations

Understanding vectors is essential because they form the foundation for more complex data structures like data frames and lists, which we’ll explore next!