Full Process
Let me explain the technical details of route inference and anomaly detection based on the paper:
Route Inference Technical Details:
- Single-scale Inference:
- Uses RNN to process embeddings at each timestamp:
$h_i = g_1(\tilde{e}_i, h_{i-1}), \quad i = 1, 2, \ldots, n$
where $g_1$ is the RNN module and $h_{i-1}$ is the previous hidden state
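Here is a minimal runnable sketch of this recurrence, assuming a GRU cell plays the role of $g_1$ and using toy dimensions (both are illustrative, not the paper's exact configuration):
```python
import torch
import torch.nn as nn

# Minimal sketch of the single-scale recurrence h_i = g1(e_i, h_{i-1}).
# The GRUCell and the dimensions are assumptions for illustration.
emb_dim, hidden_dim, n = 8, 16, 5
g1 = nn.GRUCell(emb_dim, hidden_dim)   # stands in for the RNN module g1
e = torch.randn(n, emb_dim)            # fused embeddings e_1..e_n
h = torch.zeros(hidden_dim)            # h_0
for i in range(n):
    h = g1(e[i].unsqueeze(0), h.unsqueeze(0)).squeeze(0)  # h_i = g1(e_i, h_{i-1})
print(h.shape)                         # final hidden state h_n for this scale
```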
- Multi-scale Inference:
- Uses Gaussian mixture model with C components
- Parameters calculated through linear transformations:
$\mu^{(s)} = f_3\Big(\sum_k \lambda_k^{(s)} h_k^{(s)}\Big), \quad \sigma^{2\,(s)} = f_4\Big(\sum_k \lambda_k^{(s)} h_k^{(s)}\Big)$
$\mu^{(t)} = f_5\Big(\sum_k \lambda_k^{(t)} h_k^{(t)}\Big), \quad \sigma^{2\,(t)} = f_6\Big(\sum_k \lambda_k^{(t)} h_k^{(t)}\Big)$
where:
- $h_k^{(s)}$ and $h_k^{(t)}$ are the final hidden states at scale $k$
- $\lambda_k^{(s)}$ and $\lambda_k^{(t)}$ are the scale weighting parameters
- $f_3$–$f_6$ are fully connected layers (see the sketch below)
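The following sketch shows how the scale-weighted hidden states could be mapped to the Gaussian parameters; the layer sizes and the use of plain nn.Linear layers for $f_3$–$f_6$ are assumptions for illustration:
```python
import torch
import torch.nn as nn

hidden_dim, latent_dim, n_scales = 16, 8, 3

# Illustrative stand-ins for f3..f6: plain fully connected layers.
f3 = nn.Linear(hidden_dim, latent_dim)   # spatial mean head
f4 = nn.Linear(hidden_dim, latent_dim)   # spatial (log-)variance head
f5 = nn.Linear(hidden_dim, latent_dim)   # temporal mean head
f6 = nn.Linear(hidden_dim, latent_dim)   # temporal (log-)variance head

# Final hidden states from each scale and their weighting parameters (assumed shapes).
h_s = torch.randn(n_scales, hidden_dim)          # h_k^(s)
h_t = torch.randn(n_scales, hidden_dim)          # h_k^(t)
lam_s = torch.softmax(torch.randn(n_scales), 0)  # lambda_k^(s)
lam_t = torch.softmax(torch.randn(n_scales), 0)  # lambda_k^(t)

fused_s = (lam_s.unsqueeze(1) * h_s).sum(0)      # sum_k lambda_k^(s) h_k^(s)
fused_t = (lam_t.unsqueeze(1) * h_t).sum(0)

mu_s, logvar_s = f3(fused_s), f4(fused_s)        # mu^(s) and sigma^2(s) (as log-variance here)
mu_t, logvar_t = f5(fused_t), f6(fused_t)        # mu^(t) and sigma^2(t)
```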
Anomaly Detection Technical Details:
- Score Calculation:
For a full trajectory:
$\text{Score}(T) = 1 - \max_{c^{(s)},\, c^{(t)}} \exp\!\Big[\frac{1}{n}\log p_\gamma\big(T^{(s)} \mid \mu_{c^{(s)}}\big)\, p_\gamma\big(T^{(t)} \mid \mu_{c^{(t)}}\big)\Big]$
For online detection:
$\text{Score}(T_{\le i}) = 1 - \max_{c^{(s)},\, c^{(t)}} \exp\!\Big[\frac{\log p_\gamma\big(T^{(s)}_{\le i} \mid \mu_{c^{(s)}}\big)\, p_\gamma\big(e^{(s)}_i \mid T^{(s)}_{\le i}, \mu_{c^{(s)}}\big)}{i+1} + \frac{\log p_\gamma\big(T^{(t)}_{\le i} \mid \mu_{c^{(t)}}\big)\, p_\gamma\big(e^{(t)}_i \mid T^{(t)}_{\le i}, \mu_{c^{(t)}}\big)}{i+1}\Big]$
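As a hedged sketch, here is how per-cluster, length-normalized log-likelihoods would be turned into this score (the cluster count and the values are made up):
```python
import torch

# Made-up per-cluster log-likelihoods for the spatial and temporal views of one trajectory
n = 10                                          # trajectory length
log_p_s = torch.tensor([-3.2, -1.8, -4.0])      # log p_gamma(T^(s) | mu_c^(s)) per cluster
log_p_t = torch.tensor([-2.9, -2.1, -3.6])      # log p_gamma(T^(t) | mu_c^(t)) per cluster

# Score(T) = 1 - max_c exp(log[p_s * p_t] / n): take the best-matching cluster in each view,
# normalize by trajectory length, combine the two views, and subtract from 1.
likelihood_s = torch.exp(log_p_s / n).max()
likelihood_t = torch.exp(log_p_t / n).max()
score = 1 - likelihood_s * likelihood_t
print(float(score))   # close to 0 => normal, close to 1 => anomalous
```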
- Ranking Model for Updates:
$\text{rank}_c(T) = \big[1 + e^{-(w \cdot p(c \mid r^{(s)}) + b)}\big]^{-1} \cdot N$
where:
- $w$ and $b$ are the weight and bias
- $p(c \mid r^{(s)})$ is the probability of type $c$
- $N$ is the total number of trajectories
The ranking loss function:
$\mathcal{L}(w, b) = \sum_{i=1}^{N} \big(y_i - \text{rank}_c(T_i)\big)^2$
where yi is the true rank of trajectory Ti
This enables constant-time updates of the trajectory rankings in the model, as sketched below.
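A small sketch of this ranking model, with toy values for $w$, $b$, the cluster probabilities $p(c \mid r^{(s)})$, and the true ranks $y_i$:
```python
import torch

def rank_c(p_c, w, b, N):
    """rank_c(T) = sigmoid(w * p(c|r^(s)) + b) * N: map a cluster probability to a rank."""
    return torch.sigmoid(w * p_c + b) * N

# Toy setup (all values are illustrative): N trajectories, scalar weight/bias,
# per-trajectory probabilities of the selected type c, and "true" ranks y_i.
N = 1000
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
p = torch.rand(8)                         # p(c | r^(s)) for 8 trajectories
y = torch.randint(1, N, (8,)).float()     # true ranks y_i

# Ranking loss L(w, b) = sum_i (y_i - rank_c(T_i))^2
loss = ((y - rank_c(p, w, b, N)) ** 2).sum()
loss.backward()   # a gradient step on w, b gives a constant-time rank update per trajectory
print(loss.item(), w.grad, b.grad)
```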
Full Process: A Worked Example
Let me break this down with a concrete example to help explain how route inference and anomaly detection work:
- Route Inference Example:
Let's say we have a trajectory T with three points:
Spatial points: [(1,1), (2,2), (3,3)]
Timestamps: [0, 10, 20]
After embedding, we get:
Spatial embeddings: [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
Temporal embeddings: [[0.05, 0.1], [0.15, 0.2], [0.25, 0.3]]
- Gaussian Mixture Model:
Let's say we have 3 route types (Gaussian components):
Component 1: mean = 0.5, std = 0.1
Component 2: mean = 0.7, std = 0.15
Component 3: mean = 0.3, std = 0.05
- Probability Calculation:
For the first point:
Spatial probabilities:
- Component 1: 0.00034
- Component 2: 0.00034
- Component 3: 0.00034
Temporal probabilities:
- Component 1: 0.000040
- Component 2: 0.000084
- Component 3: 0.0000037
- Anomaly Score Calculation:
Max spatial probability = 0.00034 (Component 1)
Max temporal probability = 0.000084 (Component 2)
Anomaly score = 1 - (0.00034 * 0.000084) ≈ 0.99999997
- Ranking Model Example:
Let's say we're updating the dataset with a new trajectory:
Initial probabilities: [0.00034, 0.00034, 0.00034]
Apply the ranking function: $\text{rank}_c(T) = \big[1 + e^{-(0.00034 \cdot w + b)}\big]^{-1} \cdot N$
If N = 1000, w = 1, b = 0:
Rank ≈ 500 (middle of the dataset; see the quick check after this list)
In this example:
- The high anomaly score (close to 1) suggests this trajectory is likely anomalous
- The ranking puts it in the middle of the dataset, suggesting moderate similarity to normal patterns
- For online detection, we would update this score as new points arrive
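A quick check of the arithmetic in this example (the component probabilities are the illustrative values above):
```python
import math

# Anomaly score from the best spatial and temporal component probabilities above
p_s, p_t = 0.00034, 0.000084
score = 1 - p_s * p_t
print(f"{score:.8f}")         # 0.99999997 -> likely anomalous

# Rank with N = 1000, w = 1, b = 0 and probability 0.00034
N, w, b, p = 1000, 1.0, 0.0, 0.00034
rank = 1 / (1 + math.exp(-(w * p + b))) * N
print(round(rank))            # ~500, the middle of the dataset
```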
Would you like me to explain any specific part in more detail or provide another example with different parameters?
Code Walkthrough
I'll walk through a numeric example of how temporal embedding fusion works with multiple scales in MST-OATD.
Let's say we have a trajectory with 8 points, each with a temporal embedding of dimension 4 for simplicity:
# Initial temporal embeddings (8 points x 4 dimensions)
t_inputs = [
    [1.0, 0.5, 0.3, 0.2],  # t1
    [1.2, 0.6, 0.4, 0.3],  # t2
    [1.4, 0.7, 0.5, 0.4],  # t3
    [1.6, 0.8, 0.6, 0.5],  # t4
    [1.8, 0.9, 0.7, 0.6],  # t5
    [2.0, 1.0, 0.8, 0.7],  # t6
    [2.2, 1.1, 0.9, 0.8],  # t7
    [2.4, 1.2, 1.0, 0.9],  # t8
]

# Parameters
s1_size = 2  # Scale 1: groups of 2
s2_size = 4  # Scale 2: groups of 4

# Scale 1 (segments of size 2)
scale1_embeddings = [
    [1.1, 0.55, 0.35, 0.25],  # Mean of t1, t2
    [1.5, 0.75, 0.55, 0.45],  # Mean of t3, t4
    [1.9, 0.95, 0.75, 0.65],  # Mean of t5, t6
    [2.3, 1.15, 0.95, 0.85],  # Mean of t7, t8
]

# Scale 2 (segments of size 4)
scale2_embeddings = [
    [1.3, 0.65, 0.45, 0.35],  # Mean of t1, t2, t3, t4
    [2.1, 1.05, 0.85, 0.75],  # Mean of t5, t6, t7, t8
]

# Weights learned for combining scales
W1 = 0.4  # Weight for original scale
W2 = 0.3  # Weight for scale 1
W3 = 0.3  # Weight for scale 2

# Final embedding combines all scales with a weighted sum
final_state = (
    W1 * original_state   # Original temporal features
    + W2 * scale1_state   # Scale 1 features (pairs)
    + W3 * scale2_state   # Scale 2 features (groups of 4)
)
Let's see how one point's embedding gets updated through the attention mechanism:
# For the t4 point, attention weights might look like:
attention_weights = [
    0.1,  # attention to the t1-t2 group
    0.5,  # attention to the t3-t4 group (highest, as it contains t4)
    0.3,  # attention to the t5-t6 group
    0.1,  # attention to the t7-t8 group
]

# Final embedding for t4 after attention:
t4_attended = (
    0.1 * [1.1, 0.55, 0.35, 0.25]    # from the first group
    + 0.5 * [1.5, 0.75, 0.55, 0.45]  # from the second group
    + 0.3 * [1.9, 0.95, 0.75, 0.65]  # from the third group
    + 0.1 * [2.3, 1.15, 0.95, 0.85]  # from the fourth group
) = [1.66, 0.83, 0.63, 0.53]
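A small runnable check of the segment means and the attention-weighted result above:
```python
import torch

t = torch.tensor([[1.0, 0.5, 0.3, 0.2], [1.2, 0.6, 0.4, 0.3],
                  [1.4, 0.7, 0.5, 0.4], [1.6, 0.8, 0.6, 0.5],
                  [1.8, 0.9, 0.7, 0.6], [2.0, 1.0, 0.8, 0.7],
                  [2.2, 1.1, 0.9, 0.8], [2.4, 1.2, 1.0, 0.9]])

scale1 = t.reshape(4, 2, 4).mean(dim=1)    # means over groups of 2 -> matches scale1_embeddings
scale2 = t.reshape(2, 4, 4).mean(dim=1)    # means over groups of 4 -> matches scale2_embeddings

attn = torch.tensor([0.1, 0.5, 0.3, 0.1])  # attention of t4 over the four scale-1 segments
t4_attended = attn @ scale1
print(t4_attended)                         # tensor([1.6600, 0.8300, 0.6300, 0.5300])
```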
This multi-scale approach allows the model to:
- Capture local temporal patterns (original scale)
- Learn patterns between pairs of points (scale 1)
- Learn longer-range patterns (scale 2)
- Use attention to weight the importance of different temporal segments
The final embedding for each point thus contains information about both its immediate temporal neighborhood and longer-range temporal dependencies.
Example: Trajectory Inference, GMM Classification, and Reconstruction Based on the Code
Below is a detailed example of inference, GMM classification, and trajectory reconstruction after the trajectory embeddings have been aggregated. The example is based entirely on the provided mst_oatd.py and mst_oatd_trainer.py code.
Scenario:
Suppose multi-scale fusion of the trajectory embeddings produces the following aggregated embedding vector:
encoder_final_state = torch.tensor([0.7, 0.5, 0.8])
This is the trajectory representation obtained from the GRU encoders at the three scales, fused with the learnable weights $W_1, W_2, W_3$.
Assume the model defines 3 Gaussian mixture model (GMM) clusters, each representing a different normal trajectory pattern.
Step 1: Mapping to the Latent Space (Reparameterization)
Code snippet:
mu = self.fc_mu(encoder_final_state)
logvar = self.fc_logvar(encoder_final_state)
z = self.reparameterize(mu, logvar)
Example explanation:
- Map the trajectory embedding into the latent space:
Compute the mean $\mu$ and the log-variance $\log \sigma^2$ through linear layers:
mu = fc_mu(encoder_final_state) # assume fc_mu outputs [0.6, 0.4, 0.7]
logvar = fc_logvar(encoder_final_state) # assume fc_logvar outputs [-0.5, -0.7, -0.3]
- Generate the latent vector $z$ with the reparameterization trick:
std = torch.exp(0.5 * logvar) # standard deviation: std = exp([-0.25, -0.35, -0.15]) ≈ [0.78, 0.70, 0.86]
eps = torch.randn_like(std) # eps ~ N(0, 1); assume eps = [0.2, -0.3, 0.1]
z = mu + eps * std # z = [0.6, 0.4, 0.7] + [0.2, -0.3, 0.1] * [0.78, 0.70, 0.86] ≈ [0.756, 0.19, 0.786]
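A self-contained sketch of this step; fc_mu and fc_logvar are stand-in nn.Linear layers with toy sizes rather than the exact layers from mst_oatd.py:
```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim = latent_dim = 3

fc_mu = nn.Linear(hidden_dim, latent_dim)       # stand-in for self.fc_mu
fc_logvar = nn.Linear(hidden_dim, latent_dim)   # stand-in for self.fc_logvar

def reparameterize(mu, logvar):
    # z = mu + eps * std, with eps ~ N(0, 1) and std = exp(0.5 * logvar)
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std

encoder_final_state = torch.tensor([0.7, 0.5, 0.8])
mu = fc_mu(encoder_final_state)
logvar = fc_logvar(encoder_final_state)
z = reparameterize(mu, logvar)
print(z)  # one stochastic sample of the latent trajectory representation
```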
Step 2: GMM Classification
Code snippet:
mu_prior = self.mu_prior # mean of each cluster
log_var_prior = self.log_var_prior # log-variance of each cluster
Assume the Gaussian mixture model has 3 clusters with the following means and log-variances:
mu_prior = torch.tensor([
    [0.5, 0.4, 0.6],  # cluster 1
    [0.7, 0.5, 0.8],  # cluster 2
    [0.4, 0.3, 0.5],  # cluster 3
])
log_var_prior = torch.tensor([
    [-0.4, -0.5, -0.6],  # cluster 1
    [-0.3, -0.3, -0.4],  # cluster 2
    [-0.6, -0.7, -0.5],  # cluster 3
])
Compute the (log-)probability of the trajectory embedding $z$ under each cluster:
prob_c1 = -0.5 * torch.sum(((z - mu_prior[0]) ** 2) / torch.exp(log_var_prior[0]))
prob_c2 = -0.5 * torch.sum(((z - mu_prior[1]) ** 2) / torch.exp(log_var_prior[1]))
prob_c3 = -0.5 * torch.sum(((z - mu_prior[2]) ** 2) / torch.exp(log_var_prior[2]))
Assume the computed results are:
prob_c1 = -1.2
prob_c2 = -0.7
prob_c3 = -1.5
Pick the cluster with the highest probability: cluster 2 (largest value, i.e. the closest normal trajectory pattern).
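The same computation can be vectorized over all clusters (a sketch; as in the snippet above, the constant term of the Gaussian log-density is dropped):
```python
import torch

z = torch.tensor([0.756, 0.19, 0.786])
mu_prior = torch.tensor([[0.5, 0.4, 0.6],
                         [0.7, 0.5, 0.8],
                         [0.4, 0.3, 0.5]])
log_var_prior = torch.tensor([[-0.4, -0.5, -0.6],
                              [-0.3, -0.3, -0.4],
                              [-0.6, -0.7, -0.5]])

# Unnormalized Gaussian log-densities of z under each cluster
log_probs = -0.5 * torch.sum((z - mu_prior) ** 2 / torch.exp(log_var_prior), dim=1)
best = torch.argmax(log_probs)
print(log_probs, best)  # best == tensor(1), i.e. cluster 2
```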
Step 3: Trajectory Reconstruction
Code snippet:
decoder_outputs, _ = self.decoder(decoder_inputs, z)
In the reconstruction stage, the decoder takes the latent representation $z$ as its initial state and generates trajectory embeddings. Assume the output is:
decoder_outputs = torch.tensor([
    [1.05, 0.55, 0.35, 0.25],
    [1.25, 0.65, 0.45, 0.35],
    [1.45, 0.75, 0.55, 0.45],
])  # reconstructed trajectory over 3 time steps
A fully connected layer maps the decoder outputs back to the original trajectory space:
output = self.fc_out(decoder_outputs)
Assume the output is:
output = torch.tensor([
    [1.1, 0.6, 0.4, 0.3],
    [1.3, 0.7, 0.5, 0.4],
    [1.6, 0.8, 0.6, 0.5],
])
Step 4: Anomaly Detection Probability Calculation
In mst_oatd_trainer.py, anomaly detection is driven by the probability of generating the trajectory:
likelihood = torch.exp(-torch.sum((output - embeddings) ** 2, dim=-1))
score = 1 - likelihood.max()
Assume the original embeddings are:
embeddings = torch.tensor([
    [1.0, 0.5, 0.3, 0.2],
    [1.2, 0.6, 0.4, 0.3],
    [1.4, 0.7, 0.5, 0.4],
])
Compute the reconstruction error and convert it into a likelihood:
error = torch.sum((output - embeddings) ** 2, dim=-1)  # [0.04, 0.04, 0.07]
likelihood = torch.exp(-error)                         # [0.96, 0.96, 0.93]
score = 1 - likelihood.max()                           # 1 - 0.96 = 0.04
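Running this step end-to-end with the values assumed above:
```python
import torch

output = torch.tensor([[1.1, 0.6, 0.4, 0.3],
                       [1.3, 0.7, 0.5, 0.4],
                       [1.6, 0.8, 0.6, 0.5]])
embeddings = torch.tensor([[1.0, 0.5, 0.3, 0.2],
                           [1.2, 0.6, 0.4, 0.3],
                           [1.4, 0.7, 0.5, 0.4]])

error = torch.sum((output - embeddings) ** 2, dim=-1)
likelihood = torch.exp(-error)
score = 1 - likelihood.max()
print(error)   # tensor([0.0400, 0.0400, 0.0700])
print(score)   # ~0.04, a low anomaly score (normal trajectory)
```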
Final results:
- Latent representation $z$: (0.756, 0.19, 0.786)
- Selected GMM cluster: cluster 2
- Reconstructed trajectory: (1.1, 0.6, 0.4, 0.3), (1.3, 0.7, 0.5, 0.4), (1.6, 0.8, 0.6, 0.5)
- Anomaly score: 0.04 (close to 0, indicating a normal trajectory)
Summary:
- GMM classification lets the model pick the normal pattern the trajectory belongs to.
- The decoder reconstructs the trajectory, which is compared with the original trajectory to compute the reconstruction error.
- An anomaly score close to 1 indicates an anomalous trajectory; a score close to 0 indicates a normal one.
Embedding and Detection Code
From analyzing the code, here's how trajectories are processed and anomaly scores calculated:
- Trajectory Embedding Process in mst_oatd.py:
# Initial spatial embedding through graph convolution
H = D.mm(A).mm(self.V).mm(D)  # Normalize adjacency matrix
nodes = H.mm(self.embedding(self.nodes))
s_inputs = torch.index_select(nodes, 0, trajs.flatten())

# Temporal embedding
t_inputs = self.d2v(times)

# Combine via cross-attention
att_s, att_t = self.co_attention(s_inputs, t_inputs)
st_inputs = torch.concat((att_s, att_t), dim=2)

# Multi-scale processing via different RNNs at scales s1 and s2
encoder_inputs_s1 = pack_padded_sequence(self.attention_layer(st_inputs, lengths))
encoder_inputs_s2 = self.scale_process(st_inputs, self.s1_size, [int(i // self.s1_size) for i in lengths])
encoder_inputs_s3 = self.scale_process(st_inputs, self.s2_size, [int(i // self.s2_size) for i in lengths])

# Combine scales with learned weights
encoder_final_state = (self.W1 * encoder_final_state_s1
                       + self.W2 * encoder_final_state_s2
                       + self.W3 * encoder_final_state_s3)
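For intuition, here is a hedged sketch of what the scale_process segment pooling does conceptually for a single trajectory (averaging consecutive embeddings into coarser segments); the real implementation additionally handles batching and padded lengths:
```python
import torch

def scale_process_sketch(st_inputs, seg_size):
    """Average consecutive embeddings into segments of length seg_size (toy version)."""
    n, d = st_inputs.shape
    n_seg = n // seg_size
    return st_inputs[: n_seg * seg_size].reshape(n_seg, seg_size, d).mean(dim=1)

st_inputs = torch.randn(8, 4)                    # 8 fused spatio-temporal embeddings
coarse_s1 = scale_process_sketch(st_inputs, 2)   # 4 segments (scale s1)
coarse_s2 = scale_process_sketch(st_inputs, 4)   # 2 segments (scale s2)
print(coarse_s1.shape, coarse_s2.shape)          # torch.Size([4, 4]) torch.Size([2, 4])
```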
- Anomaly Score Calculation in mst_oatd_trainer.py:
def detection(self):
    # Compute likelihood across all clusters
    for batch in self.outliers_loader:
        trajs, times, seq_lengths = batch
        c_likelihood_s = []
        c_likelihood_t = []
        # Get likelihood for each cluster
        for c in range(self.n_cluster):
            output_s, _, _, _ = self.MST_OATD_S(trajs, times, seq_lengths, batch_size, "test", c)
            likelihood_s = - self.detec(output_s.reshape(-1, output_s.shape[-1]),
                                        trajs.to(self.device).reshape(-1))
            likelihood_s = torch.exp(torch.sum(mask * (likelihood_s.reshape(batch_size, -1)), dim=-1)
                                     / torch.sum(mask, 1))
            output_t, _, _, _ = self.MST_OATD_T(trajs, times, seq_lengths, batch_size, "test", c)
            likelihood_t = - self.detec(output_t.reshape(-1, output_t.shape[-1]),
                                        times_token.to(self.device).reshape(-1))
            likelihood_t = torch.exp(torch.sum(mask * (likelihood_t.reshape(batch_size, -1)), dim=-1)
                                     / torch.sum(mask, 1))
            c_likelihood_s.append(likelihood_s)
            c_likelihood_t.append(likelihood_t)
        # Final anomaly score combines spatial and temporal likelihoods
        likelihood_s = torch.cat(c_likelihood_s).max(0)[0]
        likelihood_t = torch.cat(c_likelihood_t).max(0)[0]
        score = 1 - likelihood_s * likelihood_t
The code shows that anomaly scores are computed using:
- Maximum likelihood across all clusters
- Product of spatial and temporal likelihoods
- Normalized by trajectory length via the mask
- Negative cross-entropy loss as the base likelihood measure
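For illustration, a compact sketch of that final combination, with made-up per-cluster likelihood tensors standing in for the model outputs:
```python
import torch

# Made-up length-normalized likelihoods: (n_cluster, batch) for the spatial and temporal views
likelihood_s = torch.tensor([[0.62, 0.10], [0.81, 0.05], [0.40, 0.02]])
likelihood_t = torch.tensor([[0.55, 0.20], [0.70, 0.15], [0.30, 0.08]])

# Take the best-matching cluster per trajectory, then combine the two views
best_s = likelihood_s.max(0)[0]   # per-trajectory max over clusters
best_t = likelihood_t.max(0)[0]
score = 1 - best_s * best_t       # close to 1 => anomalous, close to 0 => normal
print(score)                      # tensor([0.4330, 0.9800])
```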