Vectors are the most fundamental data structure in R. In fact, even single values in R are vectors with one element! Understanding vectors is crucial because virtually everything in R is built upon them.
A vector is an ordered sequence of data elements that all share the same type, such as numeric, character, or logical. Think of it as a container that holds multiple values in a specific order.
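A quick check at the console makes both claims concrete. This is a minimal sketch in base R; the names `single_value` and `heights` are illustrative and do not come from the tutorial.

```r
# A single value is already a vector of length one
single_value <- 42
is.vector(single_value)   # TRUE
length(single_value)      # 1

# Every element of an atomic vector shares one type
heights <- c(1.62, 1.75, 1.80)
typeof(heights)           # "double"
```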
Creating Vectors
The c() Function
The most common way to create a vector is the c() function (short for “combine” or “concatenate”).
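As a small illustration of the "combine" idea, the sketch below shows that c() accepts existing vectors as well as single values and flattens them into one vector. The variable names are illustrative.

```r
# Combine individual values into one vector
numbers <- c(1, 2, 3)

# c() also accepts existing vectors and flattens them into a single vector
more_numbers <- c(numbers, 4, c(5, 6))
more_numbers              # 1 2 3 4 5 6
length(more_numbers)      # 6
```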
Practical Examples
Example 1: Grade Analysis
  Student Midterm Final Overall Grade
1   Alice      85    88    86.8     B
2     Bob      78    82    80.4     B
3 Charlie      92    89    90.2     A
4   Diana      88    91    89.8     B
5     Eve      79    83    81.4     B
6   Frank      94    96    95.2     A
7   Grace      87    90    88.8     B
# Class statistics
cat("Class average:", round(mean(overall_grades), 1), "\n")
Class average: 87.5
cat("Students with A:", sum(letter_grades == "A"), "\n")
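The page's source assigns the Grade column with nested ifelse() calls and cutoffs of 90, 80, 70, and 60. As a hedged alternative (not the tutorial's own method), base R's cut() can express the same banding in one call. The sketch below retypes the Overall values from the table above; `overall` and `letters_by_cut` are illustrative names.

```r
# Overall scores as shown in the table above
overall <- c(Alice = 86.8, Bob = 80.4, Charlie = 90.2, Diana = 89.8,
             Eve = 81.4, Frank = 95.2, Grace = 88.8)

# right = FALSE makes each interval closed on the left, so [80, 90) is a B
letters_by_cut <- cut(
  overall,
  breaks = c(-Inf, 60, 70, 80, 90, Inf),
  labels = c("F", "D", "C", "B", "A"),
  right  = FALSE
)

table(letters_by_cut)                   # counts per grade, including empty levels
names(overall)[letters_by_cut == "A"]   # "Charlie" "Frank"
```

The result is a factor rather than a character vector, which keeps the grade levels in a fixed order for tables and plots.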
Common Mistakes and Best Practices
1. Remember 1-based Indexing
my_vector <- c("a", "b", "c", "d", "e")
# R uses 1-based indexing (not 0-based like many languages)
my_vector[1] # First element (not my_vector[0])
[1] "a"
my_vector[5] # Last element
[1] "e"
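One reason this mistake is easy to miss is that R does not raise an error for index 0 or for a position past the end. The following sketch, using the same `my_vector`, shows what comes back instead.

```r
my_vector <- c("a", "b", "c", "d", "e")

my_vector[0]                   # character(0): an empty vector, not an error
my_vector[6]                   # NA: position 6 does not exist
my_vector[length(my_vector)]   # "e": a length-independent way to get the last element
```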
2. Vector Type Consistency
# Vectors can only hold one type of data
mixed_attempt <- c(1, "two", 3, "four")
print(mixed_attempt) # Everything becomes character!
[1] "1" "two" "3" "four"
typeof(mixed_attempt)
[1] "character"
# Use lists for mixed types (covered in next section)
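The conversion to character follows R's coercion order for atomic vectors: logical, then integer, then double, then character. The sketch below walks through that order and shows that a list keeps each element's own type; `mixed_list` is an illustrative name.

```r
typeof(c(TRUE, 2L))        # "integer": logical is promoted to integer
typeof(c(2L, 3.5))         # "double": integer is promoted to double
typeof(c(3.5, "four"))     # "character": everything becomes text

# A list keeps each element's own type
mixed_list <- list(1, "two", TRUE)
sapply(mixed_list, typeof) # "double" "character" "logical"

# Converting text back to numbers gives NA (with a warning) where it fails
suppressWarnings(as.numeric(c("1", "two", "3")))  # 1 NA 3
```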
3. NA Propagation
# One NA can affect entire calculations
values_with_na <- c(1, 2, NA, 4, 5)
mean(values_with_na) # Returns NA
[1] NA
mean(values_with_na, na.rm = TRUE) # Proper way to handle
[1] 3
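NA also propagates through comparisons, which matters when subsetting by condition. A short sketch with the same `values_with_na` vector:

```r
values_with_na <- c(1, 2, NA, 4, 5)

values_with_na > 2                          # FALSE FALSE NA TRUE TRUE
values_with_na[values_with_na > 2]          # NA 4 5: the NA slips through
values_with_na[which(values_with_na > 2)]   # 4 5: which() keeps only TRUE positions
sum(is.na(values_with_na))                  # 1 missing value
```

Wrapping the condition in which() is one common way to keep NA out of a subset; filtering with `!is.na()` first is another.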
Exercises
Exercise 1: Vector Creation and Manipulation
1. Create a vector of the first 20 even numbers
2. Create a vector with your name repeated 5 times
3. Create a vector of 15 random numbers between 1 and 100
Exercise 2: Data Analysis Practice
Given these test scores: scores <- c(78, 85, 92, 88, 79, 95, 87, 83, 90, 86)
1. Calculate the mean, median, and standard deviation
2. Find how many scores are above average
3. Identify the positions of scores above 90
4. Replace any score below 80 with 80
Exercise 3: Real-world Application
You have monthly sales data: sales <- c(12000, 15000, 13500, 16000, 14200, 17500)
1. Calculate the total yearly sales
2. Find the month with highest sales
3. Calculate the percentage increase from the first month to the last month
4. Identify months where sales exceeded $15,000
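As a hint rather than a solution, the percentage increase between two values is (last - first) / first * 100. The sketch below applies that formula to hypothetical data (`visits` is invented for illustration and is not the exercise's sales vector).

```r
# Hypothetical data, not the exercise's sales figures
visits <- c(200, 230, 260, 300)

first_value <- visits[1]
last_value  <- visits[length(visits)]

pct_change <- (last_value - first_value) / first_value * 100
pct_change            # 50

which.max(visits)     # 4: position of the largest value
which(visits > 250)   # 3 4: positions above a threshold
```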
Summary
Vectors are fundamental to R programming:
Creation: Use c(), sequences (1:10, seq()), and repetition (rep())
Indexing: Access elements by position [1], multiple positions [c(1,3,5)], or conditions [x > 5]
Operations: Vectorized arithmetic and comparisons work element-wise
Functions: Many built-in functions work naturally with vectors
Modification: Add, replace, or remove elements as needed
Key principles:
Vectors hold elements of the same type
R uses 1-based indexing
Operations are vectorized by default
Missing values (NA) propagate through calculations
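The points above can be seen together in a few lines. This is a compact recap sketch in base R; `x` and `x_with_gap` are illustrative names.

```r
x <- seq(2, 10, by = 2)   # creation: 2 4 6 8 10
x[c(1, 3, 5)]             # indexing by position: 2 6 10
x[x > 5]                  # indexing by condition: 6 8 10
x * 10                    # vectorized arithmetic: 20 40 60 80 100

x_with_gap <- c(x, NA)          # modification: append a missing value
sum(x_with_gap)                 # NA: the missing value propagates
sum(x_with_gap, na.rm = TRUE)   # 30
```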
Understanding vectors is essential because they form the foundation for more complex data structures like data frames and lists, which we’ll explore next!
---
title: "Vectors: R's Building Blocks"
author: "IND215"
date: today
format:
  html:
    toc: true
    toc-depth: 3
    code-fold: false
    code-tools: true
---

## Introduction to Vectors

Vectors are the most fundamental data structure in R. In fact, even single values in R are vectors with one element! Understanding vectors is crucial because virtually everything in R is built upon them.

A **vector** is a sequence of data elements of the same type. Think of it as a container that holds multiple values in a specific order.

## Creating Vectors

### The `c()` Function

The most common way to create vectors is using the `c()` function (which stands for "combine" or "concatenate"):

```{r}
#| label: vector-creation

# Numeric vectors
numbers <- c(1, 2, 3, 4, 5)
temperatures <- c(72.5, 75.2, 68.9, 80.1, 77.3)

# Character vectors
names <- c("Alice", "Bob", "Charlie", "Diana")
colors <- c("red", "green", "blue", "yellow")

# Logical vectors
answers <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# Print the vectors
print(numbers)
print(names)
print(answers)
```

### Vector Properties

Every vector has important properties:

```{r}
#| label: vector-properties

scores <- c(85, 92, 78, 96, 88)

# Length: number of elements
length(scores)

# Type: what kind of data
typeof(scores)

# Class: object class
class(scores)

# Structure: comprehensive overview
str(scores)
```

### Creating Sequences

R provides several ways to create vectors with patterns:

```{r}
#| label: vector-sequences

# Simple sequences
seq1 <- 1:10  # 1, 2, 3, ..., 10
seq2 <- 10:1  # 10, 9, 8, ..., 1

# Using seq() function
seq3 <- seq(from = 0, to = 100, by = 10)  # 0, 10, 20, ..., 100
seq4 <- seq(0, 1, length.out = 11)        # 11 equally spaced numbers

# Repeated values
rep1 <- rep(5, times = 8)           # 5, 5, 5, 5, 5, 5, 5, 5
rep2 <- rep(c(1, 2, 3), times = 3)  # 1, 2, 3, 1, 2, 3, 1, 2, 3
rep3 <- rep(c(1, 2, 3), each = 3)   # 1, 1, 1, 2, 2, 2, 3, 3, 3

print(seq3)
print(rep2)
print(rep3)
```

## Vector Indexing and Subsetting

### Accessing Elements by Position

```{r}
#| label: vector-indexing

fruits <- c("apple", "banana", "cherry", "date", "elderberry")

# Single element (note: R uses 1-based indexing!)
fruits[1]  # First element
fruits[3]  # Third element
fruits[5]  # Last element

# Multiple elements
fruits[c(1, 3, 5)]  # Elements 1, 3, and 5
fruits[1:3]         # Elements 1 through 3
fruits[c(2, 4)]     # Elements 2 and 4
```

### Negative Indexing

Use negative indices to exclude elements:

```{r}
#| label: negative-indexing

numbers <- c(10, 20, 30, 40, 50)

# Exclude specific elements
numbers[-1]        # All except the first
numbers[-c(1, 5)]  # All except first and last
numbers[-(2:4)]    # All except elements 2 through 4

print(numbers[-c(1, 5)])
```

### Logical Indexing

Use logical vectors to subset based on conditions:

```{r}
#| label: logical-indexing

ages <- c(23, 35, 28, 42, 19, 31, 27)

# Find elements meeting a condition
ages > 30        # Logical vector
ages[ages > 30]  # Elements where condition is TRUE

# Multiple conditions
ages[ages >= 25 & ages <= 35]  # Ages between 25 and 35
ages[ages < 25 | ages > 40]    # Ages less than 25 OR greater than 40

# Store logical vector for reuse
adults <- ages >= 18
ages[adults]
```

### Subsetting with Names

Vectors can have named elements:

```{r}
#| label: named-vectors

# Create named vector
student_grades <- c(alice = 92, bob = 87, charlie = 95, diana = 89)
print(student_grades)

# Access by name
student_grades["alice"]
student_grades[c("alice", "charlie")]

# Get names
names(student_grades)

# Add names to existing vector
scores <- c(85, 90, 78, 92)
names(scores) <- c("Math", "Science", "English", "History")
print(scores)
```

## Element-wise Operations

One of R's greatest strengths is **vectorization** - operations work on entire vectors automatically:

### Arithmetic Operations

```{r}
#| label: vector-arithmetic

# Create vectors
a <- c(2, 4, 6, 8, 10)
b <- c(1, 2, 3, 4, 5)

# Element-wise arithmetic
a + b  # Add corresponding elements
a - b  # Subtract corresponding elements
a * b  # Multiply corresponding elements
a / b  # Divide corresponding elements
a ^ b  # Raise a to the power of b

# Operations with single values (recycling)
a + 10  # Add 10 to each element
a * 2   # Multiply each element by 2
a / 2   # Divide each element by 2

print(a + b)
print(a * 2)
```

### Comparison Operations

```{r}
#| label: vector-comparisons

scores <- c(85, 92, 78, 96, 88, 74, 91)

# Comparisons return logical vectors
high_scores <- scores > 90
passing_scores <- scores >= 80
failing_scores <- scores < 70

print(high_scores)
print(passing_scores)

# Count TRUE values
sum(high_scores)     # How many scored above 90?
sum(passing_scores)  # How many passed?

# Percentage calculations
mean(high_scores) * 100  # Percentage with high scores
```

### Logical Operations

```{r}
#| label: vector-logical

x <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
y <- c(FALSE, FALSE, TRUE, TRUE, TRUE)

# Element-wise logical operations
x & y  # AND operation
x | y  # OR operation
!x     # NOT operation

print(x & y)
print(x | y)
```

## Vector Functions

R provides many built-in functions that work with vectors:

### Mathematical Functions

```{r}
#| label: vector-math-functions

values <- c(1, 4, 9, 16, 25)

# Basic functions
sum(values)     # Sum of all elements
mean(values)    # Average
median(values)  # Median
min(values)     # Minimum value
max(values)     # Maximum value
range(values)   # Min and max
var(values)     # Variance
sd(values)      # Standard deviation

# Element-wise mathematical functions
sqrt(values)            # Square root of each element
log(values)             # Natural logarithm
round(sqrt(values), 2)  # Round to 2 decimal places

print(sqrt(values))
```

### Statistical Functions

```{r}
#| label: vector-stats

# Generate some sample data
set.seed(123)
sample_data <- round(rnorm(20, mean = 75, sd = 10))

# Comprehensive statistics
summary(sample_data)

# Quantiles
quantile(sample_data)
quantile(sample_data, probs = c(0.25, 0.5, 0.75, 0.95))

# Ranking and ordering
sort(sample_data)                     # Sort in ascending order
sort(sample_data, decreasing = TRUE)  # Sort in descending order
order(sample_data)                    # Indices that would sort the vector
rank(sample_data)                     # Ranks of each element
```

### Finding and Counting

```{r}
#| label: vector-finding

grades <- c(85, 92, 78, 96, 88, 74, 91, 89)

# Find specific values
which(grades > 90)  # Positions of elements > 90
which.max(grades)   # Position of maximum value
which.min(grades)   # Position of minimum value

# Check for presence
85 %in% grades          # Is 85 in the vector?
c(85, 100) %in% grades  # Which of these are in the vector?

# Count specific values
sum(grades == 85)  # How many times does 85 appear?
sum(grades > 90)   # How many scores above 90?

# Unique values
duplicated_values <- c(1, 2, 2, 3, 3, 3, 4)
unique(duplicated_values)      # Get unique values
duplicated(duplicated_values)  # Which are duplicates?
```

## Modifying Vectors

### Adding Elements

```{r}
#| label: vector-modification

# Start with a vector
original <- c(1, 2, 3)

# Add elements to the end
extended <- c(original, 4, 5)
print(extended)

# Add elements to the beginning
prepended <- c(0, original)
print(prepended)

# Insert elements in the middle
# (This requires more complex indexing)
middle_insert <- c(original[1:2], 2.5, original[3])
print(middle_insert)
```

### Replacing Elements

```{r}
#| label: vector-replacement

scores <- c(85, 92, 78, 96, 88)

# Replace specific positions
scores[3] <- 82               # Replace 3rd element
scores[c(1, 5)] <- c(87, 90)  # Replace 1st and 5th elements
print(scores)

# Conditional replacement
ages <- c(23, 35, 28, 42, 19, 31, 27)
ages[ages < 25] <- 25  # Set minimum age to 25
print(ages)
```

### Removing Elements

```{r}
#| label: vector-removal

numbers <- c(10, 20, 30, 40, 50)

# Remove by position
shortened <- numbers[-3]               # Remove 3rd element
multiple_removed <- numbers[-c(1, 5)]  # Remove 1st and 5th
print(shortened)
print(multiple_removed)

# Remove by condition
grades <- c(85, 92, 78, 96, 88, 74, 91)
passing_only <- grades[grades >= 80]  # Keep only passing grades
print(passing_only)
```

## Working with Missing Values

### Creating and Detecting Missing Values

```{r}
#| label: missing-values

# Vector with missing values
incomplete_data <- c(1, 2, NA, 4, 5, NA, 7)

# Detect missing values
is.na(incomplete_data)
which(is.na(incomplete_data))  # Positions of NA values
sum(is.na(incomplete_data))    # Count of NA values

# Complete cases (non-missing)
complete.cases(incomplete_data)
incomplete_data[complete.cases(incomplete_data)]
```

### Handling Missing Values in Calculations

```{r}
#| label: na-calculations

data_with_na <- c(10, 15, NA, 20, 25, NA, 30)

# Many functions return NA if any element is NA
mean(data_with_na)  # Returns NA
sum(data_with_na)   # Returns NA

# Use na.rm = TRUE to exclude NA values
mean(data_with_na, na.rm = TRUE)
sum(data_with_na, na.rm = TRUE)
sd(data_with_na, na.rm = TRUE)

# Functions that handle NA by default
length(data_with_na)           # Counts NA values too
length(na.omit(data_with_na))  # Length after removing NA
```

## Vector Recycling

When vectors of different lengths are used together, R "recycles" the shorter vector:

```{r}
#| label: vector-recycling

# Vectors of different lengths
long_vector <- c(1, 2, 3, 4, 5, 6)
short_vector <- c(10, 20)

# The short vector gets recycled
result <- long_vector + short_vector
print(result)  # c(11, 22, 13, 24, 15, 26)

# Recycling with single values
add_five <- long_vector + 5  # 5 is recycled to match length
print(add_five)

# Warning when lengths don't divide evenly
uneven_example <- c(1, 2, 3, 4, 5) + c(10, 20, 30)  # Warning!
```

## Practical Examples

### Example 1: Grade Analysis

```{r}
#| label: grade-analysis

# Student grades for a class
student_names <- c("Alice", "Bob", "Charlie", "Diana", "Eve", "Frank", "Grace")
midterm_scores <- c(85, 78, 92, 88, 79, 94, 87)
final_scores <- c(88, 82, 89, 91, 83, 96, 90)

# Calculate overall grades (60% final, 40% midterm)
overall_grades <- 0.4 * midterm_scores + 0.6 * final_scores

# Assign letter grades
letter_grades <- ifelse(overall_grades >= 90, "A",
  ifelse(overall_grades >= 80, "B",
    ifelse(overall_grades >= 70, "C",
      ifelse(overall_grades >= 60, "D", "F"))))

# Create a summary
grade_summary <- data.frame(
  Student = student_names,
  Midterm = midterm_scores,
  Final = final_scores,
  Overall = round(overall_grades, 1),
  Grade = letter_grades
)
print(grade_summary)

# Class statistics
cat("Class average:", round(mean(overall_grades), 1), "\n")
cat("Students with A:", sum(letter_grades == "A"), "\n")
cat("Passing rate:", round(mean(overall_grades >= 60) * 100, 1), "%\n")
```

### Example 2: Temperature Conversion

```{r}
#| label: temperature-conversion

# Daily temperatures in Fahrenheit
fahrenheit_temps <- c(68, 72, 75, 71, 69, 74, 78, 76, 73, 70)

# Convert to Celsius
celsius_temps <- (fahrenheit_temps - 32) * 5 / 9

# Categorize temperatures
temp_categories <- ifelse(celsius_temps < 15, "Cold",
  ifelse(celsius_temps < 25, "Mild", "Warm"))

# Summary
temp_summary <- data.frame(
  Day = 1:10,
  Fahrenheit = fahrenheit_temps,
  Celsius = round(celsius_temps, 1),
  Category = temp_categories
)
print(temp_summary)

# Find extreme days
hottest_day <- which.max(celsius_temps)
coldest_day <- which.min(celsius_temps)
cat("Hottest day:", hottest_day, "with", round(celsius_temps[hottest_day], 1), "°C\n")
cat("Coldest day:", coldest_day, "with", round(celsius_temps[coldest_day], 1), "°C\n")
```

## Common Mistakes and Best Practices

### 1. Remember 1-based Indexing

```{r}
#| label: indexing-reminder

my_vector <- c("a", "b", "c", "d", "e")

# R uses 1-based indexing (not 0-based like many languages)
my_vector[1]  # First element (not my_vector[0])
my_vector[5]  # Last element
```

### 2. Vector Type Consistency

```{r}
#| label: type-consistency

# Vectors can only hold one type of data
mixed_attempt <- c(1, "two", 3, "four")
print(mixed_attempt)  # Everything becomes character!
typeof(mixed_attempt)

# Use lists for mixed types (covered in next section)
```

### 3. NA Propagation

```{r}
#| label: na-propagation

# One NA can affect entire calculations
values_with_na <- c(1, 2, NA, 4, 5)
mean(values_with_na)                # Returns NA
mean(values_with_na, na.rm = TRUE)  # Proper way to handle
```

## Exercises

### Exercise 1: Vector Creation and Manipulation

1. Create a vector of the first 20 even numbers
2. Create a vector with your name repeated 5 times
3. Create a vector of 15 random numbers between 1 and 100

### Exercise 2: Data Analysis Practice

Given these test scores: `scores <- c(78, 85, 92, 88, 79, 95, 87, 83, 90, 86)`

1. Calculate the mean, median, and standard deviation
2. Find how many scores are above average
3. Identify the positions of scores above 90
4. Replace any score below 80 with 80

### Exercise 3: Real-world Application

You have monthly sales data: `sales <- c(12000, 15000, 13500, 16000, 14200, 17500)`

1. Calculate the total yearly sales
2. Find the month with highest sales
3. Calculate the percentage increase from the first month to the last month
4. Identify months where sales exceeded $15,000

## Summary

Vectors are fundamental to R programming:

- **Creation**: Use `c()`, sequences (`1:10`, `seq()`), and repetition (`rep()`)
- **Indexing**: Access elements by position `[1]`, multiple positions `[c(1,3,5)]`, or conditions `[x > 5]`
- **Operations**: Vectorized arithmetic and comparisons work element-wise
- **Functions**: Many built-in functions work naturally with vectors
- **Modification**: Add, replace, or remove elements as needed

Key principles:

- Vectors hold elements of the same type
- R uses 1-based indexing
- Operations are vectorized by default
- Missing values (NA) propagate through calculations

Understanding vectors is essential because they form the foundation for more complex data structures like data frames and lists, which we'll explore next!