Vectors: R’s Building Blocks

Author

IND215

Published

September 22, 2025

Introduction to Vectors

Vectors are the most fundamental data structure in R. In fact, even single values in R are vectors with one element! Understanding vectors is crucial because virtually everything in R is built upon them.

A vector is a sequence of data elements of the same type. Think of it as a container that holds multiple values in a specific order.
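
The claim that a single value is itself a vector can be checked directly. This is a minimal sketch; `single_value` is just an illustrative name, not something used elsewhere in this tutorial.

```r
# A lone number is stored as a vector of length one
single_value <- 42
is.vector(single_value)   # TRUE
length(single_value)      # 1
single_value[1]           # 42, indexed like any other vector
```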

Creating Vectors

The `c()` Function

The most common way to create vectors is using the c() function (which stands for “combine” or “concatenate”):

# Numeric vectors
numbers <- c(1, 2, 3, 4, 5)
temperatures <- c(72.5, 75.2, 68.9, 80.1, 77.3)

# Character vectors
people <- c("Alice", "Bob", "Charlie", "Diana")   # named `people` so it does not mask base::names()
colors <- c("red", "green", "blue", "yellow")

# Logical vectors
answers <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# Print the vectors
print(numbers)

[1] 1 2 3 4 5

print(people)

[1] "Alice"   "Bob"     "Charlie" "Diana"

print(answers)

[1]  TRUE FALSE  TRUE  TRUE FALSE

Vector Properties

Every vector has important properties:

scores <- c(85, 92, 78, 96, 88)

# Length: number of elements
length(scores)

[1] 5

# Type: what kind of data
typeof(scores)

[1] "double"

# Class: object class
class(scores)

[1] "numeric"

# Structure: comprehensive overview
str(scores)

 num [1:5] 85 92 78 96 88
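
The `"double"` result above may be surprising for whole numbers. The sketch below, using illustrative variable names, shows how the storage type depends on how the values are written, even though all three vectors count as numeric.

```r
whole_numbers <- c(1, 2, 3)      # plain numbers are stored as doubles
int_literals  <- c(1L, 2L, 3L)   # the L suffix requests integers
int_sequence  <- 1:3             # the colon operator also yields integers

typeof(whole_numbers)     # "double"
typeof(int_literals)      # "integer"
typeof(int_sequence)      # "integer"
is.numeric(int_literals)  # TRUE, integers are still numeric
```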

Creating Sequences

R provides several ways to create vectors with patterns:

# Simple sequences
seq1 <- 1:10          # 1, 2, 3, ..., 10
seq2 <- 10:1          # 10, 9, 8, ..., 1

# Using seq() function
seq3 <- seq(from = 0, to = 100, by = 10)    # 0, 10, 20, ..., 100
seq4 <- seq(0, 1, length.out = 11)         # 11 equally spaced numbers

# Repeated values
rep1 <- rep(5, times = 8)                  # 5, 5, 5, 5, 5, 5, 5, 5
rep2 <- rep(c(1, 2, 3), times = 3)        # 1, 2, 3, 1, 2, 3, 1, 2, 3
rep3 <- rep(c(1, 2, 3), each = 3)         # 1, 1, 1, 2, 2, 2, 3, 3, 3

print(seq3)

 [1]   0  10  20  30  40  50  60  70  80  90 100

print(rep2)

[1] 1 2 3 1 2 3 1 2 3

print(rep3)

[1] 1 1 1 2 2 2 3 3 3
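
Two related helpers, `seq_len()` and `seq_along()`, are often preferred when a sequence is built from a length. The sketch below uses made-up data to show why `1:length(x)` can misbehave on an empty vector.

```r
items <- c("a", "b", "c")
seq_along(items)               # 1 2 3
seq_len(4)                     # 1 2 3 4

empty <- character(0)
risky <- 1:length(empty)       # 1 0, counts down instead of being empty
safe  <- seq_along(empty)      # integer(0), so a loop over it runs zero times
```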

Vector Indexing and Subsetting

Accessing Elements by Position

fruits <- c("apple", "banana", "cherry", "date", "elderberry")

# Single element (note: R uses 1-based indexing!)
fruits[1]     # First element

[1] "apple"

fruits[3]     # Third element

[1] "cherry"

fruits[5]     # Last element

[1] "elderberry"

# Multiple elements
fruits[c(1, 3, 5)]    # Elements 1, 3, and 5

[1] "apple"      "cherry"     "elderberry"

fruits[1:3]           # Elements 1 through 3

[1] "apple"  "banana" "cherry"

fruits[c(2, 4)]       # Elements 2 and 4

[1] "banana" "date"
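
Hard-coding `5` for the last element only works while the vector has five elements. A small sketch of two length-independent alternatives, and of what happens when an index runs past the end:

```r
fruits <- c("apple", "banana", "cherry", "date", "elderberry")

last_by_length <- fruits[length(fruits)]   # "elderberry"
last_by_tail   <- tail(fruits, 1)          # "elderberry"
beyond_end     <- fruits[6]                # NA, no error is raised
```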

Negative Indexing

Use negative indices to exclude elements:

numbers <- c(10, 20, 30, 40, 50)

# Exclude specific elements
numbers[-1]           # All except the first

[1] 20 30 40 50

numbers[-c(1, 5)]     # All except first and last

[1] 20 30 40

numbers[-(2:4)]       # All except elements 2 through 4

[1] 10 50

print(numbers[-c(1, 5)])

[1] 20 30 40
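
Two details are easy to miss here: negative indexing returns a new vector and leaves the original untouched, and positive and negative indices cannot be combined in one call. The sketch below catches the resulting error with `tryCatch()` so that it runs to completion; the `"error"` placeholder string is only for illustration.

```r
numbers <- c(10, 20, 30, 40, 50)

without_first <- numbers[-1]   # 20 30 40 50
length(numbers)                # 5, the original vector is unchanged

# Mixing signs is an error, so the handler's value is returned instead
mixed_result <- tryCatch(numbers[c(-1, 2)], error = function(e) "error")
mixed_result                   # "error"
```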

Logical Indexing

Use logical vectors to subset based on conditions:

ages <- c(23, 35, 28, 42, 19, 31, 27)

# Find elements meeting a condition
ages > 30             # Logical vector

[1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE

ages[ages > 30]       # Elements where condition is TRUE

[1] 35 42 31

# Multiple conditions
ages[ages >= 25 & ages <= 35]     # Ages between 25 and 35

[1] 35 28 31 27

ages[ages < 25 | ages > 40]       # Ages less than 25 OR greater than 40

[1] 23 42 19

# Store logical vector for reuse
adults <- ages >= 18
ages[adults]

[1] 23 35 28 42 19 31 27

Subsetting with Names

Vectors can have named elements:

# Create named vector
student_grades <- c(alice = 92, bob = 87, charlie = 95, diana = 89)
print(student_grades)

  alice     bob charlie   diana 
     92      87      95      89

# Access by name
student_grades["alice"]

alice 
   92

student_grades[c("alice", "charlie")]

  alice charlie 
     92      95

# Get names
names(student_grades)

[1] "alice"   "bob"     "charlie" "diana"

# Add names to existing vector
scores <- c(85, 90, 78, 92)
names(scores) <- c("Math", "Science", "English", "History")
print(scores)

   Math Science English History 
     85      90      78      92
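
Names can also be used on the left side of an assignment, and asking for a name that does not exist returns `NA` rather than an error. A minimal sketch; the name `"zoe"` is hypothetical and deliberately absent from the vector.

```r
student_grades <- c(alice = 92, bob = 87, charlie = 95, diana = 89)

student_grades["bob"] <- 90               # replace a value by name
missing_name <- student_grades["zoe"]     # NA, because no element is named "zoe"
"zoe" %in% names(student_grades)          # FALSE, a safer check before subsetting
plain_values <- unname(student_grades)    # 92 90 95 89, names dropped
```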

Element-wise Operations

One of R’s greatest strengths is vectorization: operations apply to every element of a vector automatically, without an explicit loop.

Arithmetic Operations

# Create vectors
a <- c(2, 4, 6, 8, 10)
b <- c(1, 2, 3, 4, 5)

# Element-wise arithmetic
a + b     # Add corresponding elements

[1]  3  6  9 12 15

a - b     # Subtract corresponding elements

[1] 1 2 3 4 5

a * b     # Multiply corresponding elements

[1]  2  8 18 32 50

a / b     # Divide corresponding elements

[1] 2 2 2 2 2

a ^ b     # Raise a to the power of b

[1]      2     16    216   4096 100000

# Operations with single values (recycling)
a + 10    # Add 10 to each element

[1] 12 14 16 18 20

a * 2     # Multiply each element by 2

[1]  4  8 12 16 20

a / 2     # Divide each element by 2

[1] 1 2 3 4 5

print(a + b)

[1]  3  6  9 12 15

print(a * 2)

[1]  4  8 12 16 20
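
To make the idea of vectorization concrete, the sketch below writes out the loop that `a + b` replaces and confirms both give the same result. The names `loop_sum` and `vectorized_sum` are illustrative.

```r
a <- c(2, 4, 6, 8, 10)
b <- c(1, 2, 3, 4, 5)

# Explicit loop: add one pair of elements at a time
loop_sum <- numeric(length(a))
for (i in seq_along(a)) {
  loop_sum[i] <- a[i] + b[i]
}

# Vectorized form: one expression, same result
vectorized_sum <- a + b
identical(loop_sum, vectorized_sum)   # TRUE
```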

Comparison Operations

scores <- c(85, 92, 78, 96, 88, 74, 91)

# Comparisons return logical vectors
high_scores <- scores > 90
passing_scores <- scores >= 80
failing_scores <- scores < 70

print(high_scores)

[1] FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE

print(passing_scores)

[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE

# Count TRUE values
sum(high_scores)      # How many scored above 90?

[1] 3

sum(passing_scores)   # How many passed?

[1] 5

# Percentage calculations
mean(high_scores) * 100    # Percentage with high scores

[1] 42.85714

Logical Operations

x <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
y <- c(FALSE, FALSE, TRUE, TRUE, TRUE)

# Element-wise logical operations
x & y     # AND operation

[1] FALSE FALSE  TRUE FALSE  TRUE

x | y     # OR operation

[1]  TRUE FALSE  TRUE  TRUE  TRUE

!x        # NOT operation

[1] FALSE  TRUE FALSE  TRUE FALSE

print(x & y)

[1] FALSE FALSE  TRUE FALSE  TRUE

print(x | y)

[1]  TRUE FALSE  TRUE  TRUE  TRUE
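
Element-wise results are often reduced to a single answer with `any()` and `all()`. A short sketch using the same `x` and `y` as above, plus `xor()` for exclusive OR:

```r
x <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
y <- c(FALSE, FALSE, TRUE, TRUE, TRUE)

any_both   <- any(x & y)   # TRUE, at least one position is TRUE in both
all_either <- all(x | y)   # FALSE, position 2 is FALSE in both
exclusive  <- xor(x, y)    # TRUE FALSE FALSE TRUE FALSE
```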

Vector Functions

R provides many built-in functions that work with vectors:

Mathematical Functions

values <- c(1, 4, 9, 16, 25)

# Basic functions
sum(values)           # Sum of all elements

[1] 55

mean(values)          # Average

[1] 11

median(values)        # Median

[1] 9

min(values)           # Minimum value

[1] 1

max(values)           # Maximum value

[1] 25

range(values)         # Min and max

[1]  1 25

var(values)           # Variance

[1] 93.5

sd(values)            # Standard deviation

[1] 9.66954

# Element-wise mathematical functions
sqrt(values)          # Square root of each element

[1] 1 2 3 4 5

log(values)           # Natural logarithm

[1] 0.000000 1.386294 2.197225 2.772589 3.218876

round(sqrt(values), 2) # Round to 2 decimal places

[1] 1 2 3 4 5

print(sqrt(values))

[1] 1 2 3 4 5

Statistical Functions

# Generate some sample data
set.seed(123)
sample_data <- round(rnorm(20, mean = 75, sd = 10))

# Comprehensive statistics
summary(sample_data)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  55.00   69.75   76.00   76.40   80.50   93.00

# Quantiles
quantile(sample_data)

   0%   25%   50%   75%  100% 
55.00 69.75 76.00 80.50 93.00

quantile(sample_data, probs = c(0.25, 0.5, 0.75, 0.95))

  25%   50%   75%   95% 
69.75 76.00 80.50 92.05

# Ranking and ordering
sort(sample_data)                    # Sort in ascending order

 [1] 55 62 68 69 69 70 71 73 76 76 76 79 79 80 80 82 87 91 92 93

sort(sample_data, decreasing = TRUE) # Sort in descending order

 [1] 93 92 91 87 82 80 80 79 79 76 76 76 73 71 70 69 69 68 62 55

order(sample_data)                   # Indices that would sort the vector

 [1] 18  8  9  1 15 20 10  2  4  5 14 12 13  7 17 19 11  3  6 16

rank(sample_data)                    # Ranks of each element

 [1]  4.5  8.0 18.0 10.0 10.0 19.0 14.5  2.0  3.0  7.0 17.0 12.5 12.5 10.0  4.5
[16] 20.0 14.5  1.0 16.0  6.0
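
A common use of `order()` is to rearrange one vector according to the values in another. The sketch below uses made-up `players` and `points` vectors that are not part of the sample data above.

```r
players <- c("Ana", "Ben", "Cy", "Dee")
points  <- c(31, 45, 28, 39)

# Positions of points from highest to lowest
ranking <- order(points, decreasing = TRUE)   # 2 4 1 3

players[ranking]   # "Ben" "Dee" "Ana" "Cy"
points[ranking]    # 45 39 31 28
```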

Finding and Counting

grades <- c(85, 92, 78, 96, 88, 74, 91, 89)

# Find specific values
which(grades > 90)           # Positions of elements > 90

[1] 2 4 7

which.max(grades)            # Position of maximum value

[1] 4

which.min(grades)            # Position of minimum value

[1] 6

# Check for presence
85 %in% grades               # Is 85 in the vector?

[1] TRUE

c(85, 100) %in% grades       # Which of these are in the vector?

[1]  TRUE FALSE

# Count specific values
sum(grades == 85)            # How many times does 85 appear?

[1] 1

sum(grades > 90)             # How many scores above 90?

[1] 3

# Unique values
duplicated_values <- c(1, 2, 2, 3, 3, 3, 4)
unique(duplicated_values)    # Get unique values

[1] 1 2 3 4

duplicated(duplicated_values) # Which are duplicates?

[1] FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE
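
When a count is needed for every distinct value rather than one value at a time, `table()` is one option. A minimal sketch on the same `duplicated_values` vector:

```r
duplicated_values <- c(1, 2, 2, 3, 3, 3, 4)

counts <- table(duplicated_values)
counts                # one count per distinct value
as.vector(counts)     # 1 2 3 1

# Names of a table are character strings, so the result is "3", not 3
most_common <- names(counts)[which.max(counts)]
```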

Modifying Vectors

Adding Elements

# Start with a vector
original <- c(1, 2, 3)

# Add elements to the end
extended <- c(original, 4, 5)
print(extended)

[1] 1 2 3 4 5

# Add elements to the beginning
prepended <- c(0, original)
print(prepended)

[1] 0 1 2 3

# Insert elements in the middle
# (This requires more complex indexing)
middle_insert <- c(original[1:2], 2.5, original[3])
print(middle_insert)

[1] 1.0 2.0 2.5 3.0
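
Base R's `append()` offers a shorter way to insert at a given position through its `after` argument. A small sketch with the same `original` vector:

```r
original <- c(1, 2, 3)

inserted_middle <- append(original, 2.5, after = 2)       # 1.0 2.0 2.5 3.0
inserted_front  <- append(original, c(8, 9), after = 0)   # 8 9 1 2 3
```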

Replacing Elements

scores <- c(85, 92, 78, 96, 88)

# Replace specific positions
scores[3] <- 82          # Replace 3rd element
scores[c(1, 5)] <- c(87, 90)  # Replace 1st and 5th elements

print(scores)

[1] 87 92 82 96 90

# Conditional replacement
ages <- c(23, 35, 28, 42, 19, 31, 27)
ages[ages < 25] <- 25    # Set minimum age to 25
print(ages)

[1] 25 35 28 42 25 31 27

Removing Elements

numbers <- c(10, 20, 30, 40, 50)

# Remove by position
shortened <- numbers[-3]              # Remove 3rd element
multiple_removed <- numbers[-c(1, 5)] # Remove 1st and 5th

print(shortened)

[1] 10 20 40 50

print(multiple_removed)

[1] 20 30 40

# Remove by condition
grades <- c(85, 92, 78, 96, 88, 74, 91)
passing_only <- grades[grades >= 80]  # Keep only passing grades
print(passing_only)

[1] 85 92 96 88 91

Working with Missing Values

Creating and Detecting Missing Values

# Vector with missing values
incomplete_data <- c(1, 2, NA, 4, 5, NA, 7)

# Detect missing values
is.na(incomplete_data)

[1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE

which(is.na(incomplete_data))     # Positions of NA values

[1] 3 6

sum(is.na(incomplete_data))       # Count of NA values

[1] 2

# Complete cases (non-missing)
complete.cases(incomplete_data)

[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE

incomplete_data[complete.cases(incomplete_data)]

[1] 1 2 4 5 7
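
Missing values also affect logical subsetting: a comparison with `NA` yields `NA`, and an `NA` index produces an `NA` in the result. The sketch below, on a made-up `readings` vector, shows two ways to avoid that.

```r
readings <- c(5, NA, 12, 8)

readings > 6                                   # FALSE NA TRUE TRUE

with_na    <- readings[readings > 6]           # NA 12 8
without_na <- readings[which(readings > 6)]    # 12 8, which() drops the NA
also_clean <- readings[!is.na(readings) & readings > 6]   # 12 8
```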

Handling Missing Values in Calculations

data_with_na <- c(10, 15, NA, 20, 25, NA, 30)

# Many functions return NA if any element is NA
mean(data_with_na)           # Returns NA

[1] NA

sum(data_with_na)            # Returns NA

[1] NA

# Use na.rm = TRUE to exclude NA values
mean(data_with_na, na.rm = TRUE)

[1] 20

sum(data_with_na, na.rm = TRUE)

[1] 100

sd(data_with_na, na.rm = TRUE)

[1] 7.905694

# length() counts every element, NA included
length(data_with_na)

[1] 7

length(na.omit(data_with_na)) # Length after removing NA

[1] 5
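
Instead of dropping missing values, they can be replaced using the logical indexing shown earlier. The sketch below fills each `NA` with the mean of the observed values; mean imputation is only one possible choice, used here for illustration, and `filled` is a hypothetical name.

```r
data_with_na <- c(10, 15, NA, 20, 25, NA, 30)

filled <- data_with_na                                      # work on a copy
filled[is.na(filled)] <- mean(data_with_na, na.rm = TRUE)   # mean of observed values is 20
filled                                                      # 10 15 20 20 25 20 30
```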

Vector Recycling

When vectors of different lengths are used together, R “recycles” the shorter vector:

# Vectors of different lengths
long_vector <- c(1, 2, 3, 4, 5, 6)
short_vector <- c(10, 20)

# The short vector gets recycled
result <- long_vector + short_vector
print(result)  # c(11, 22, 13, 24, 15, 26)

[1] 11 22 13 24 15 26

# Recycling with single values
add_five <- long_vector + 5  # 5 is recycled to match length
print(add_five)

[1]  6  7  8  9 10 11

# Warning when lengths don't divide evenly
uneven_example <- c(1, 2, 3, 4, 5) + c(10, 20, 30)  # Warning!
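
The uneven case still returns a result; R recycles as far as needed and signals a warning. The sketch below captures the warning text in an illustrative variable, `caught`, so that both the values and the message can be inspected.

```r
caught <- NULL
uneven_example <- withCallingHandlers(
  c(1, 2, 3, 4, 5) + c(10, 20, 30),
  warning = function(w) {
    caught <<- conditionMessage(w)   # keep the warning text for inspection
    invokeRestart("muffleWarning")   # continue without printing the warning
  }
)

uneven_example   # 11 22 33 14 25: the shorter vector restarts at 10 for the 4th element
caught           # the warning message R issued
```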

Practical Examples

Example 1: Grade Analysis

# Student grades for a class
student_names <- c("Alice", "Bob", "Charlie", "Diana", "Eve", "Frank", "Grace")
midterm_scores <- c(85, 78, 92, 88, 79, 94, 87)
final_scores <- c(88, 82, 89, 91, 83, 96, 90)

# Calculate overall grades (60% final, 40% midterm)
overall_grades <- 0.4 * midterm_scores + 0.6 * final_scores

# Assign letter grades
letter_grades <- ifelse(overall_grades >= 90, "A",
                       ifelse(overall_grades >= 80, "B",
                             ifelse(overall_grades >= 70, "C",
                                   ifelse(overall_grades >= 60, "D", "F"))))

# Create a summary
grade_summary <- data.frame(
  Student = student_names,
  Midterm = midterm_scores,
  Final = final_scores,
  Overall = round(overall_grades, 1),
  Grade = letter_grades
)

print(grade_summary)

  Student Midterm Final Overall Grade
1   Alice      85    88    86.8     B
2     Bob      78    82    80.4     B
3 Charlie      92    89    90.2     A
4   Diana      88    91    89.8     B
5     Eve      79    83    81.4     B
6   Frank      94    96    95.2     A
7   Grace      87    90    88.8     B

# Class statistics
cat("Class average:", round(mean(overall_grades), 1), "\n")

Class average: 87.5

cat("Students with A:", sum(letter_grades == "A"), "\n")

Students with A: 2

cat("Passing rate:", round(mean(overall_grades >= 60) * 100, 1), "%\n")

Passing rate: 100 %
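
Nested `ifelse()` calls become hard to read as the number of bands grows. One alternative is `cut()`, sketched below on the rounded overall grades from the table above; the name `letter_grades_cut` is illustrative, and the result is a factor rather than a character vector.

```r
overall_grades <- c(86.8, 80.4, 90.2, 89.8, 81.4, 95.2, 88.8)

letter_grades_cut <- cut(
  overall_grades,
  breaks = c(-Inf, 60, 70, 80, 90, Inf),
  labels = c("F", "D", "C", "B", "A"),
  right  = FALSE   # intervals are [lower, upper), so exactly 90 counts as "A"
)

as.character(letter_grades_cut)   # "B" "B" "A" "B" "B" "A" "B"
```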

Example 2: Temperature Conversion

# Daily temperatures in Fahrenheit
fahrenheit_temps <- c(68, 72, 75, 71, 69, 74, 78, 76, 73, 70)

# Convert to Celsius
celsius_temps <- (fahrenheit_temps - 32) * 5/9

# Categorize temperatures
temp_categories <- ifelse(celsius_temps < 15, "Cold",
                         ifelse(celsius_temps < 25, "Mild", "Warm"))

# Summary
temp_summary <- data.frame(
  Day = 1:10,
  Fahrenheit = fahrenheit_temps,
  Celsius = round(celsius_temps, 1),
  Category = temp_categories
)

print(temp_summary)

   Day Fahrenheit Celsius Category
1    1         68    20.0     Mild
2    2         72    22.2     Mild
3    3         75    23.9     Mild
4    4         71    21.7     Mild
5    5         69    20.6     Mild
6    6         74    23.3     Mild
7    7         78    25.6     Warm
8    8         76    24.4     Mild
9    9         73    22.8     Mild
10  10         70    21.1     Mild

# Find extreme days
hottest_day <- which.max(celsius_temps)
coldest_day <- which.min(celsius_temps)

cat("Hottest day:", hottest_day, "with", round(celsius_temps[hottest_day], 1), "°C\n")

Hottest day: 7 with 25.6 °C

cat("Coldest day:", coldest_day, "with", round(celsius_temps[coldest_day], 1), "°C\n")

Coldest day: 1 with 20 °C

Common Mistakes and Best Practices

1. Remember 1-based Indexing

my_vector <- c("a", "b", "c", "d", "e")

# R uses 1-based indexing (not 0-based like many languages)
my_vector[1]  # First element (not my_vector[0])

[1] "a"

my_vector[5]  # Last element

[1] "e"
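
Index `0` is not an error in R, which makes this mistake easy to overlook. A brief sketch of what it returns:

```r
my_vector <- c("a", "b", "c", "d", "e")

zero_index <- my_vector[0]         # character(0), an empty vector
length(zero_index)                 # 0
with_zero  <- my_vector[c(0, 2)]   # "b", the zero is silently ignored
```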

2. Vector Type Consistency

# Vectors can only hold one type of data
mixed_attempt <- c(1, "two", 3, "four")
print(mixed_attempt)  # Everything becomes character!

[1] "1"    "two"  "3"    "four"

typeof(mixed_attempt)

[1] "character"

# Use lists for mixed types (covered in next section)
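
The conversion to character above is one step of a general rule: when types are mixed, `c()` converts everything to the most flexible type present. A short sketch of the usual ordering (logical, then integer, then double, then character), with illustrative variable names:

```r
logical_and_integer  <- c(TRUE, 2L)    # TRUE becomes 1L
integer_and_double   <- c(2L, 3.5)     # 2L becomes 2
double_and_character <- c(3.5, "a")    # 3.5 becomes "3.5"

typeof(logical_and_integer)    # "integer"
typeof(integer_and_double)     # "double"
typeof(double_and_character)   # "character"

# Converting back must be done explicitly
as.numeric(c("1", "3"))        # 1 3
```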

3. NA Propagation

# One NA can affect entire calculations
values_with_na <- c(1, 2, NA, 4, 5)
mean(values_with_na)           # Returns NA

[1] NA

mean(values_with_na, na.rm = TRUE)  # Proper way to handle

[1] 3

Exercises

Exercise 1: Vector Creation and Manipulation

Create a vector of the first 20 even numbers
Create a vector with your name repeated 5 times
Create a vector of 15 random numbers between 1 and 100

Exercise 2: Data Analysis Practice

Given these test scores: scores <- c(78, 85, 92, 88, 79, 95, 87, 83, 90, 86)

Calculate the mean, median, and standard deviation
Find how many scores are above average
Identify the positions of scores above 90
Replace any score below 80 with 80

Exercise 3: Real-world Application

You have monthly sales data: sales <- c(12000, 15000, 13500, 16000, 14200, 17500)

Calculate the total sales across the six months
Find the month with highest sales
Calculate the percentage increase from the first month to the last month
Identify months where sales exceeded $15,000

Summary

Vectors are fundamental to R programming:

Creation: Use c(), sequences (1:10, seq()), and repetition (rep())
Indexing: Access elements by position [1], multiple positions [c(1,3,5)], or conditions [x > 5]
Operations: Vectorized arithmetic and comparisons work element-wise
Functions: Many built-in functions work naturally with vectors
Modification: Add, replace, or remove elements as needed

Key principles:

- Vectors hold elements of the same type
- R uses 1-based indexing
- Operations are vectorized by default
- Missing values (NA) propagate through calculations

Understanding vectors is essential because they form the foundation for more complex data structures like data frames and lists, which we’ll explore next!