What
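Cosine similarity is a metric for measuring how similar two vectors are: it is the cosine of the angle \(\theta\) between them.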
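\[
\cos\theta = \frac{A \cdot B}{|A|\,|B|}
\]
- \(A \cdot B\): the inner (dot) product of the two vectors
- \(|A|\): the norm (length) of a vector
- \(\cos\theta\): takes values in \([-1, 1]\)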
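As a quick sanity check on the range, the formula can be evaluated for three simple cases: same direction, perpendicular, and opposite direction. This is a minimal sketch in plain Python; the helper name `cos_sim` is made up for this illustration and assumes equal-length, nonzero vectors.

```python
import math


def cos_sim(a, b):
    """Cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


same = cos_sim([1, 0], [2, 0])        # same direction
orthogonal = cos_sim([1, 0], [0, 3])  # perpendicular
opposite = cos_sim([1, 0], [-4, 0])   # opposite direction
print(same, orthogonal, opposite)     # 1.0 0.0 -1.0
```

Note that the vector lengths differ in each pair; only the direction affects the result.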
Why
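Cosine similarity is cheap to compute. For sparse vectors, only the nonzero components need to be considered: zero entries contribute nothing to the dot product or to the norms.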
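The sparse case can be sketched as follows. This is only an illustration under stated assumptions: each vector is stored as a `{index: value}` dict holding its nonzero entries, neither vector is empty, and the function name `sparse_cosine_similarity` is hypothetical.

```python
import math


def sparse_cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse vectors stored as {index: value}."""
    # Iterate over the smaller dict; only indices present in both contribute.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(value * b[index] for index, value in a.items() if index in b)
    # The norms also depend only on the stored (nonzero) values.
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


# Two vectors in a large space with only a few nonzero entries each.
u = {0: 3.0, 7: 4.0}
v = {7: 4.0, 1000: 3.0}
print(sparse_cosine_similarity(u, v))  # 0.64
```

The work done is proportional to the number of stored entries, not to the dimension of the space.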
How
NumPy implementation
import numpy as np


def cosine_similarity(vec1, vec2) -> float:
    # Euclidean (L2) norm of each vector
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    # Dot product divided by the product of the norms
    return np.dot(vec1, vec2) / (norm_vec1 * norm_vec2)


if __name__ == '__main__':
    print(cosine_similarity([1, 2, 3], [1, 2, 3]))
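One caveat with the function above: if either input is the zero vector, the denominator is zero and the result is undefined. A possible guard is sketched below. The name `safe_cosine_similarity`, the `eps` threshold, and the choice to return 0.0 in that case are all assumptions made for this example, not an established convention.

```python
import numpy as np


def safe_cosine_similarity(vec1, vec2, eps: float = 1e-12) -> float:
    """Cosine similarity that returns 0.0 when either vector is (near) zero."""
    v1 = np.asarray(vec1, dtype=float)
    v2 = np.asarray(vec2, dtype=float)
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    if denom < eps:
        # Assumption: treat the undefined case as "no similarity".
        return 0.0
    return float(np.dot(v1, v2) / denom)


print(safe_cosine_similarity([0, 0, 0], [1, 2, 3]))  # 0.0
```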
PyTorch implementation
import torch
import torch.nn.functional as F

vec1 = torch.FloatTensor([1, 2, 3, 4])
vec2 = torch.FloatTensor([5, 6, 7, 8])
# dim=0 because both inputs are 1-D tensors
cos_sim = F.cosine_similarity(vec1, vec2, dim=0)
print(cos_sim)