Events and Data
This guide covers how to work with relational event data in REM.jl.
Events
An Event represents a single directed interaction between actors:
using REM
# Basic event: sender, receiver, time
e = Event(1, 2, 1.0)
# With optional event type and weight
e = Event(1, 2, 1.0; eventtype=:email, weight=2.0)Event Fields
| Field | Type | Description | Default |
|---|---|---|---|
sender | Int | ID of the event sender | Required |
receiver | Int | ID of the event receiver | Required |
time | T | Timestamp of the event | Required |
eventtype | Symbol | Category of the event | :event |
weight | Float64 | Weight/magnitude | 1.0 |
Accessing Event Data
e = Event(1, 2, 3.5; eventtype=:phone, weight=2.0)
e.sender # 1
e.receiver # 2
e.time # 3.5
e.eventtype # :phone
e.weight # 2.0Timestamp Types
REM.jl supports various timestamp types:
using Dates
# Numeric timestamps
Event(1, 2, 1.0) # Float64
Event(1, 2, 1) # Int
# Calendar timestamps
Event(1, 2, DateTime(2024, 1, 15, 10, 30)) # DateTime
Event(1, 2, Date(2024, 1, 15)) # DateAll events in a sequence must have the same timestamp type.
Event Sequences
An EventSequence is a time-sorted collection of events:
events = [
Event(1, 2, 3.0), # Not in chronological order...
Event(2, 1, 1.0),
Event(1, 3, 2.0),
]
seq = EventSequence(events) # Automatically sorted by time
# After sorting: times are [1.0, 2.0, 3.0]Accessing Sequence Data
# Basic access
seq[1] # First event (earliest time)
seq[end] # Last event (latest time)
length(seq) # Number of events
# Metadata
seq.n_actors # Number of unique actors
seq.actors # Set of actor IDs
seq.eventtypes # Set of event types
# Iteration
for event in seq
println(event.sender, " → ", event.receiver)
end
# Collect times
times = [e.time for e in seq]Adding Events
Events are inserted maintaining time order:
# Insert a new event
push!(seq, Event(3, 1, 1.5))
# The sequence remains sorted by timeCreating Empty Sequences
# Empty sequence for Float64 timestamps
seq = EventSequence{Float64}()
# Add events incrementally
push!(seq, Event(1, 2, 1.0))
push!(seq, Event(2, 1, 2.0))Loading Data
From DataFrame
The most common way to load events:
using DataFrames
df = DataFrame(
sender = [1, 2, 1],
receiver = [2, 1, 3],
time = [1.0, 2.0, 3.0]
)
seq = load_events(df)Custom Column Names
When your DataFrame has different column names:
df = DataFrame(
from = [1, 2, 1],
to = [2, 1, 3],
timestamp = [1.0, 2.0, 3.0],
type = [:email, :email, :meeting],
importance = [1.0, 2.0, 1.5]
)
seq = load_events(df;
sender_col = :from,
receiver_col = :to,
time_col = :timestamp,
type_col = :type,
weight_col = :importance
)String Actor Names
When actors are identified by names rather than numeric IDs:
df = DataFrame(
sender = ["Alice", "Bob", "Alice", "Carol"],
receiver = ["Bob", "Alice", "Carol", "Bob"],
time = [1.0, 2.0, 3.0, 4.0]
)
seq = load_events(df; actor_names=true)
# Actors are assigned numeric IDs internally
# Access the mapping through the returned sequence
println(seq.n_actors) # 3From CSV File
Load directly from a CSV file:
# Basic usage
seq = load_events("events.csv")
# With options
seq = load_events("events.csv";
sender_col = :source,
receiver_col = :target,
time_col = :timestamp,
actor_names = true
)DateTime Parsing
For string timestamps that need parsing:
df = DataFrame(
sender = [1, 2, 1],
receiver = [2, 1, 3],
time = ["2024-01-01T10:00:00", "2024-01-01T11:00:00", "2024-01-01T12:00:00"]
)
seq = load_events(df; time_type=DateTime)Node Attributes
Node attributes store actor-level covariates for use with attribute statistics.
Creating Attributes
# Categorical attribute with default value
gender = NodeAttribute(:gender,
Dict(1 => "M", 2 => "F", 3 => "M"), # Actor ID → value
"Unknown" # Default for unspecified actors
)
# Numeric attribute
age = NodeAttribute(:age,
Dict(1 => 25.0, 2 => 30.0, 3 => 28.0),
0.0 # Default
)
# Boolean attribute
is_manager = NodeAttribute(:manager,
Dict(1 => true, 2 => false, 3 => true),
false
)Accessing Attribute Values
gender[1] # "M"
gender[2] # "F"
gender[4] # "Unknown" (default - actor 4 not in dict)
age[1] # 25.0
age[99] # 0.0 (default)Modifying Attributes
# Set a value
age[4] = 35.0
# Update existing
age[1] = 26.0Using Attributes in Statistics
# Homophily: same gender
NodeMatch(gender)
# Difference: age difference
NodeDifference(age)
# Main effects
SenderAttribute(age)
ReceiverAttribute(age)
# Specific combinations
NodeMix(gender, "M", "F") # Male sender, female receiverActor Sets
For specifying custom sets of actors:
# From numeric IDs
actors = ActorSet([1, 2, 3, 4, 5])
# From names (creates ID mapping)
actors = ActorSet(["Alice", "Bob", "Carol", "David"])
# Access mappings
actors.name_to_id["Alice"] # 1
actors.id_to_name[1] # "Alice"
actors.ids # [1, 2, 3, 4]
# Check membership
2 in actors # true
10 in actors # falseRisk Sets
Risk sets define which dyads could potentially experience an event. This is used internally for case-control sampling.
rs = RiskSet(
5, # Index of focal event
[1, 2, 3], # Potential senders
[1, 2, 3, 4]; # Potential receivers
exclude_self_loops = true # Exclude s == r (default: true)
)
# Number of dyads in risk set
n_dyads(rs) # 3*4 - 3 = 9 (excluding self-loops)Working with Different Time Scales
Numeric Time
For abstract time units:
events = [
Event(1, 2, 0.0),
Event(2, 1, 1.0),
Event(1, 2, 2.5),
]
seq = EventSequence(events)
# Decay with numeric halflife
decay = halflife_to_decay(10.0) # Half weight after 10 time unitsDateTime
For real calendar time:
using Dates
events = [
Event(1, 2, DateTime(2024, 1, 1, 9, 0)), # 9:00 AM
Event(2, 1, DateTime(2024, 1, 1, 10, 30)), # 10:30 AM
Event(1, 3, DateTime(2024, 1, 1, 14, 0)), # 2:00 PM
]
seq = EventSequence(events)
# Decay: halflife of 1 hour = 3600 seconds
decay = halflife_to_decay(3600.0)
state = NetworkState(seq; decay=decay)Date
For daily granularity:
using Dates
events = [
Event(1, 2, Date(2024, 1, 1)),
Event(2, 1, Date(2024, 1, 8)), # One week later
Event(1, 3, Date(2024, 1, 15)), # Two weeks later
]
seq = EventSequence(events)
# Decay: halflife of 7 days = 7 * 86400 seconds
decay = halflife_to_decay(7.0 * 86400)Data Validation
Self-Loops
Events where sender equals receiver generate a warning:
e = Event(1, 1, 1.0) # Warning: Self-loop detectedTo filter self-loops:
events = [e for e in raw_events if e.sender != e.receiver]
seq = EventSequence(events)Missing Data
For DataFrames with missing values:
# Filter rows with missing values before loading
df_clean = dropmissing(df, [:sender, :receiver, :time])
seq = load_events(df_clean)Duplicate Events
Events at the exact same time between the same actors are allowed but may affect some statistics:
events = [
Event(1, 2, 1.0),
Event(1, 2, 1.0), # Duplicate - both are included
]
seq = EventSequence(events)
length(seq) # 2Utility Functions
Time Conversions
# Convert halflife to decay rate
decay = halflife_to_decay(10.0) # λ such that weight = 0.5 at t = 10
# Convert back
halflife = decay_to_halflife(decay)
# Compute decay weight for elapsed time
weight = compute_decay_weight(decay, elapsed_time)Sequence Statistics
# Time span
first_time = seq[1].time
last_time = seq[end].time
duration = last_time - first_time
# Event counts by actor
using StatsBase
senders = [e.sender for e in seq]
sender_counts = countmap(senders)
# Unique dyads
dyads = Set((e.sender, e.receiver) for e in seq)
n_unique_dyads = length(dyads)