Here I explore an issue where sklearn's paired_cosine_distances
function returns erroneous values when one of the paired vectors has zero norm.
import numpy as np
from sklearn.metrics.pairwise import paired_cosine_distances, paired_euclidean_distances
from sklearn.feature_extraction.text import TfidfVectorizer

# Paired rows; the third pair puts the zero vector [0, 0] against [0, 1].
X = np.array([[1, 1], [0, 1], [0, 0], [1, 0]])
Y = np.array([[1, 1], [0, 1], [0, 1], [0, 1]])

paired_cosine_distances(X, Y)
# Outputs: array([0. , 0. , 0.5, 1. ])

# Row-wise dot products between the paired vectors
(X * Y).sum(axis=-1)
# Outputs: array([2, 1, 0, 0])
The dot product between [0, 0] and [0, 1] is zero, so the cosine similarity should also be zero (or, strictly speaking, undefined, since [0, 0] has zero norm).
However, sklearn's paired_cosine_distances reports a distance of 0.5 for this pair, which implies a cosine similarity of 0.5 and is not realistic.
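To make the zero-norm problem concrete, here is a minimal sketch of the textbook cosine formula applied to that third pair (plain NumPy, not sklearn's code path); the variable names are just for illustration:

import numpy as np

a = np.array([0.0, 0.0])  # the zero-norm vector from the third pair
b = np.array([0.0, 1.0])

dot = float(a @ b)                                           # 0.0
norm_product = float(np.linalg.norm(a) * np.linalg.norm(b))  # 0.0 * 1.0 = 0.0

# Textbook cosine similarity is dot / (||a|| * ||b||), which here is 0/0: undefined.
cos_sim = dot / norm_product if norm_product > 0 else float("nan")
print(cos_sim)  # nan -> the similarity (and the distance 1 - sim) is undefined, not 0.5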
The r