I was born with congenital anosmia, i.e., I cannot and have never been able to smell. Farts, flowers, cookies, and perfume: I have no personal experience of any of these smells. Yet, I can tell you that farts take over a room and cookies smell like home. This is all picked up through context. It’s picked up from watching my friends retching at the stench of the small animal that died in the vents of my middle school. For me, a smell is defined by its relation to other smells and the emotive descriptions of others. This is not altogether different from a sentence embedding.
I want to eventually build a system that can help me interpret smells. Now, I know I could ask an LLM (or a friend) to describe the smell of something but I do wonder if vector addition could provide some unexpected insights. What smells are quite similar but distant in context? I’d also like to try using reduced vectors to generate music or some other synesthetic output.
In this post, I’m going to explore vector addition and vector rotations as a means of modifying and interpreting these embeddings. My explorations are (mostly) a failure, although hopefully my process will save someone else some time.
If you have any ideas or corrections, please email me at ted@timbrell.dev
Background
My inspiration for this comes from “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”. Except in this case, I’m using sentence embeddings rather than word embeddings. If I want to embed smells, I’ll need to be able to input “the smell of red wine.”
from openai import OpenAI
import numpy as np
import heapq
import pandas as pd
import os
openai_client = OpenAI()
def get_embedding_openai(names):
    # One API call for the whole batch; returns an array of shape (len(names), dim).
    response = openai_client.embeddings.create(
        input=names, model="text-embedding-3-small"
    )
    return np.array([d.embedding for d in response.data])

king, queen, man, woman, prince, princess = get_embedding_openai(
    [
        "king of england",
        "queen of england",
        "man",
        "woman",
        "prince of england",
        "princess of england",
    ]
)
son, daughter, actor, actress, steward, stewardess = get_embedding_openai(
    [
        "son",
        "daughter",
        "actor",
        "actress",
        "steward",
        "stewardess",
    ]
)
Q: Wait aren’t you supposed to be using smells?
It’s pretty hard to reason about something you can’t experience. I might know fresh coffee smells good in the morning, but I can’t tell how similar or dissimilar that is to the smell of dew in the morning. I’ll get to smells in a later post.
I’m already making a jump from word embeddings to sentence embeddings, so I believe it’s worth revisiting gender before moving on. Gender also has the benefit of being built into the English language, which makes examples easy to generate. I’ll be using cosine similarity and Euclidean distance to get a sense of the distance between the vectors.
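def cosine_similarity(vec1, vec2):
    # Cosine of the angle between two vectors; returns 0.0 if either is a zero vector.
    vec1 = np.array(vec1)
    vec2 = np.array(vec2)
    dot_product = np.dot(vec1, vec2)
    magnitude_vec1 = np.linalg.norm(vec1)
    magnitude_vec2 = np.linalg.norm(vec2)
    if magnitude_vec1 == 0 or magnitude_vec2 == 0:
        return 0.0
    return dot_product / (magnitude_vec1 * magnitude_vec2)

def euc_dist(a, b):
    # Note: this sums absolute coordinate differences, which is the L1
    # (Manhattan) distance. A true Euclidean distance would be
    # np.linalg.norm(a - b). Left as-is so it matches the numbers reported below.
    return sum(abs(a - b))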
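One thing worth flagging about the helpers above: `euc_dist` sums absolute coordinate differences, which is the L1 (Manhattan) distance rather than the Euclidean (L2) one. The sketch below, using toy unit vectors rather than real embeddings, shows the two side by side. It also checks the identity that ties Euclidean distance to cosine similarity for unit-length vectors, ‖a − b‖² = 2 − 2·cos(a, b). The function names `l1_distance` and `l2_distance` are mine, not from the post.

```python
import numpy as np


def l1_distance(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return float(np.sum(np.abs(a - b)))


def l2_distance(a, b):
    """Euclidean (L2) distance: length of the difference vector."""
    return float(np.linalg.norm(a - b))


# Toy unit vectors (illustrative values, not real embeddings).
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.6, 0.8, 0.0])

# Both vectors have length 1, so the dot product is the cosine similarity.
cos_ab = float(np.dot(a, b))

l1 = l1_distance(a, b)
l2 = l2_distance(a, b)

# For unit vectors, squared Euclidean distance is a function of the cosine alone.
identity_gap = abs(l2**2 - (2 - 2 * cos_ab))

print(f"L1={l1:.4f}  L2={l2:.4f}  identity gap={identity_gap:.2e}")
```

If the embeddings are unit length, a true Euclidean distance carries the same ranking information as cosine similarity, while the L1 version does not have to. The numbers in this post look consistent with unit-length vectors: 2·sin(0.7133 / 2) ≈ 0.698, which matches the king − queen offset magnitude reported further down.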
Simple example: vector offsets and addition
Let’s try to get the vector for “King” from the vector for “Queen”.
male_offset = man - woman
added_queen = queen + male_offset
print(f"{cosine_similarity(king, queen)=}")
print(f"{cosine_similarity(king, added_queen)=}")
cosine_similarity(king, queen)=np.float64(0.7561968293567973)
cosine_similarity(king, added_queen)=np.float64(0.7436281583952487)
print(f"{euc_dist(king, queen)=}")
print(f"{euc_dist(king, added_queen)=}")
euc_dist(king, queen)=np.float64(21.68610072977549)
euc_dist(king, added_queen)=np.float64(23.88454184561374)
Well, that’s annoying. Unlike what I’d expect from the word embedding paper, our vector for “Queen” plus the gender offset is further away from the vector for “King” both in angle and Euclidean distance.
I’m also surprised by just how little the similarity metrics moved. Then again, the geometry is unclear here. The vector offset might be going in the wrong direction or under/overshooting.
f"man - woman offset magnitude: {np.linalg.norm(male_offset)}", f"King - queen offset magnitude {np.linalg.norm(king - queen)}"
('man - woman offset magnitude: 0.7648199511941038',
'King - queen offset magnitude 0.6982881772713763')
So we’re moving, roughly, the same distance as we’d need to reach the “king” vector.
f"{np.arccos(cosine_similarity(added_queen, queen))} radians between added_queen and queen"
f"{np.arccos(cosine_similarity(king, queen))} radians between king and queen"
'0.7318444725115928 radians between added_queen and queen'
'0.7133150836556438 radians between king and queen'
And we’re changing our angle by roughly the same amount as expected… just not in the right direction.
Let’s take a look at the cosine similarity between these gendered offsets
gender_vectors = [
man - woman,
king - queen,
prince - princess,
son - daughter,
actor - actress,
steward - stewardess,
]
# Normalize each offset by its own norm (cosine similarity is scale-invariant,
# so this does not change the table below).
for idx in range(len(gender_vectors)):
    gender_vectors[idx] /= np.linalg.norm(gender_vectors[idx])
res = np.zeros(shape=(len(gender_vectors), len(gender_vectors)))
for r in range(len(gender_vectors)):
    for c in range(len(gender_vectors)):
        res[r, c] = cosine_similarity(gender_vectors[r], gender_vectors[c])
pd.DataFrame(res)
offset | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
0 | 1.000000 | 0.455847 | 0.438728 | 0.244890 | 0.276235 | 0.214461 |
1 | 0.455847 | 1.000000 | 0.657469 | 0.222574 | 0.386973 | 0.355544 |
2 | 0.438728 | 0.657469 | 1.000000 | 0.229321 | 0.337566 | 0.385467 |
3 | 0.244890 | 0.222574 | 0.229321 | 1.000000 | 0.154418 | 0.111242 |
4 | 0.276235 | 0.386973 | 0.337566 | 0.154418 | 1.000000 | 0.232568 |
5 | 0.214461 | 0.355544 | 0.385467 | 0.111242 | 0.232568 | 1.000000 |
Despite the thought that these are just gendered versions of the same concept… the offsets point in quite different directions. son - daughter (row 3) differs from steward - stewardess (row 5) by 1.46 radians (or 84 degrees).
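The conversion from a cosine similarity to an angle is just an arccosine. As a small worked check, the snippet below takes two entries from the table above (the least and most aligned off-diagonal pairs) and converts them to radians and degrees. The dictionary labels are mine, added for readability.

```python
import numpy as np

# Cosine similarities copied from the table above.
pairs = {
    "son-daughter vs steward-stewardess": 0.111242,
    "king-queen vs prince-princess": 0.657469,
}

angles_rad = {name: float(np.arccos(cos)) for name, cos in pairs.items()}
angles_deg = {name: float(np.degrees(rad)) for name, rad in angles_rad.items()}

for name in pairs:
    print(f"{name}: {angles_rad[name]:.2f} rad, {angles_deg[name]:.1f} degrees")
```

Even the best-aligned pair of offsets, the two royal titles, is still tens of degrees apart.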
Rotation
I’m not up to date on research into embeddings, but I find the use of vector addition for these analyses odd. I know that these vectors are generated through a series of additions and activations, but if these models are normalizing everything to a unit vector and comparing everything with cosine similarity, are we not inherently saying that it’s the angles that matter?
To that end, what if I rotate the “queen” vector along the plane created by the “man” and “woman” vectors? The vectors for “king” and “queen” have to be offset from our vectors for “man” and “woman”. We also know that embeddings capture more concepts than just the N dimensions represented in the vector. A rotation, while more expensive, could help in the case of an angular difference between the initial vector pair and the compared vector pair.
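To make the idea of rotating within a plane concrete, here is a toy four-dimensional sketch. It builds the rotation that carries the direction of `a` onto the direction of `b`, using the formula R = I + sin θ (v aᵀ − a vᵀ) + (cos θ − 1)(a aᵀ + v vᵀ), where v is the unit component of b orthogonal to a. The function name `plane_rotation` and the vectors are illustrative, and the sketch assumes the two inputs are not collinear.

```python
import numpy as np


def plane_rotation(a, b):
    """Rotation taking the direction of `a` to the direction of `b`,
    acting only inside the plane spanned by the two vectors."""
    a_hat = a / np.linalg.norm(a)
    b_hat = b / np.linalg.norm(b)
    theta = float(np.arccos(np.clip(np.dot(a_hat, b_hat), -1.0, 1.0)))
    # Component of b orthogonal to a (a single Gram-Schmidt step).
    v = b_hat - np.dot(b_hat, a_hat) * a_hat
    v_norm = np.linalg.norm(v)
    if v_norm < 1e-8:  # collinear inputs: no unique plane, fall back to identity
        return np.eye(len(a)), theta
    v = v / v_norm
    R = (
        np.eye(len(a))
        + np.sin(theta) * (np.outer(v, a_hat) - np.outer(a_hat, v))
        + (np.cos(theta) - 1.0) * (np.outer(a_hat, a_hat) + np.outer(v, v))
    )
    return R, theta


a = np.array([1.0, 0.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 0.0, 0.0])   # 45 degrees away from a, in the x-y plane
c = np.array([0.0, 0.0, 2.0, -1.0])  # orthogonal to that plane

R, theta = plane_rotation(a, b)
rotated_a = R @ a  # lands on the unit vector along b
rotated_c = R @ c  # untouched, since c has no component in the plane

print(np.round(rotated_a, 4), np.round(rotated_c, 4), round(theta, 4))
```

The useful property for this experiment is that anything orthogonal to the plane is left alone, and lengths are preserved, so a rotated unit vector stays a unit vector.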
Below, we try rotating our queen vector with the rotation matrix found from getting to “man” from “woman”.
def compute_nd_rotation_matrix(a, b):
    # Rotation in the plane spanned by a and b that takes a's direction to b's.
    a_norm = a / np.linalg.norm(a)
    b_norm = b / np.linalg.norm(b)
    cos_theta = np.dot(a_norm, b_norm)
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    angle = np.arccos(cos_theta)
    # Component of b orthogonal to a.
    v = b_norm - np.dot(b_norm, a_norm) * a_norm
    v_norm = np.linalg.norm(v)
    if v_norm < 1e-8:  # a and b are collinear
        # Return the same (matrix, angle) pair as the normal path.
        return np.eye(len(a)), angle
    v = v / v_norm
    identity = np.eye(len(a))
    outer_aa = np.outer(a_norm, a_norm)
    outer_av = np.outer(a_norm, v)
    outer_va = np.outer(v, a_norm)
    outer_vv = np.outer(v, v)
    R = (
        identity
        + np.sin(angle) * (outer_va - outer_av)
        + (np.cos(angle) - 1) * (outer_vv + outer_aa)
    )
    return R, angle
gender_rotation, gender_angle = compute_nd_rotation_matrix(woman, man)
rotated_queen = np.dot(gender_rotation, queen)
def highlight_max(s):
    is_max = s == s.max()
    return ["font-weight: bold" if v else "" for v in is_max]

def highlight_min(s):
    is_min = s == s.min()
    return ["font-weight: bold" if v else "" for v in is_min]

def compute_results(*, target, source, offset, rotation):
    # Compare the target against the original, offset-added, and rotated source.
    target_norm = target / np.linalg.norm(target)
    source_norm = source / np.linalg.norm(source)
    added_source = source_norm + offset
    added_source /= np.linalg.norm(added_source)
    rotated_source = np.dot(rotation, source_norm)
    rotated_vector_metrics = {
        "cosine_similarity": cosine_similarity(target_norm, rotated_source),
        "euclidean_distance": euc_dist(target_norm, rotated_source),
    }
    summed_vector_metrics = {
        "cosine_similarity": cosine_similarity(target_norm, added_source),
        "euclidean_distance": euc_dist(target_norm, added_source),
    }
    original_vector_metrics = {
        "cosine_similarity": cosine_similarity(target_norm, source_norm),
        "euclidean_distance": euc_dist(target_norm, source_norm),
    }
    df = pd.DataFrame(
        {
            "Original Vector": original_vector_metrics,
            "Summed Vector": summed_vector_metrics,
            "Rotated Vector": rotated_vector_metrics,
        }
    ).T
    return df

def style_results(df):
    # Bold the best value in each column: highest similarity, lowest distance.
    styled_df = df.style.apply(highlight_max, subset=["cosine_similarity"])
    styled_df.apply(highlight_min, subset=["euclidean_distance"])
    return styled_df

style_results(
    compute_results(
        target=king, source=queen, offset=male_offset, rotation=gender_rotation
    )
)
Queen -> King | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.756197 | 21.686100 |
Summed Vector | 0.743628 | 22.333814 |
Rotated Vector | 0.800727 | 19.759377 |
np.arccos(0.756197)- np.arccos(0.800727)
np.float64(0.07102636138332474)
The rotation helps! Though, it only moves us 0.07 radians (4 degrees) closer.
Let’s try with other gendered titles.
Below, we try “prince” and “princess”:
style_results(compute_results(target=prince, source=princess, rotation=gender_rotation, offset=male_offset))
Princess -> Prince | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.798122 | 19.734372 |
Summed Vector | 0.752733 | 21.830848 |
Rotated Vector | 0.831910 | 17.951380 |
This yields similar results, although this is just another title for royalty.
style_results(compute_results(target=son, source=daughter, rotation=gender_rotation, offset=male_offset))
Daughter -> Son | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.506902 | 30.924019 |
Summed Vector | 0.505972 | 31.099625 |
Rotated Vector | 0.519920 | 30.628173 |
style_results(compute_results(target=actor, source=actress, rotation=gender_rotation, offset=male_offset))
Actress -> Actor | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.618884 | 27.222238 |
Summed Vector | 0.565936 | 29.213596 |
Rotated Vector | 0.644523 | 26.165533 |
style_results(compute_results(target=steward, source=stewardess, rotation=gender_rotation, offset=male_offset))
Stewardess -> Steward | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.753096 | 21.497546 |
Summed Vector | 0.633975 | 26.397621 |
Rotated Vector | 0.760474 | 21.201120 |
I have two takeaways here: 1) the summed vector is worse in every pairing, and 2) the rotated vector is encoding some aspect of gender (though the improvement is quite small). Let’s explore each of these.
1. The summed vector is further from the target than the original vector for all title pairs.
I find this result suspicious. Let’s try scaling the offset vector to see if I can get a better result.
from scipy.optimize import minimize

def objective(k, source, offset, target):
    # Negative cosine similarity, so that minimizing it maximizes similarity.
    adjusted_vector = source + k * offset
    return -cosine_similarity(adjusted_vector, target)

options = []
for target, source in [
    (king, queen),
    (prince, princess),
    (son, daughter),
    (actor, actress),
    (steward, stewardess),
]:
    initial_k = 0.0
    result = minimize(objective, initial_k, args=(source, male_offset, target))
    optimal_k = result.x[0]
    options.append(optimal_k)
    print(optimal_k)
average_k = sum(options) / len(options)
print(f"Average K: {average_k}")
0.4442216918664982
0.3758034405295738
0.45508087793089264
0.3248524288805894
0.18004962701645405
Average K: 0.3560016132448016
Above, I’m printing the individual best-fit scalar modifier for our gender offset vector for each pair. Every optimal value is below 1, so the unscaled offset overshoots in every case.
For simplicity, let’s average the optimal scalar and recompute the similarity stats. This is not optimal as I should be minimizing on the batch and then testing on out-of-sample data.
It goes without saying that a single, consistent magnitude for the offset would have been nice. In the case where we don’t have a known target, that offset would allow us to naively add/subtract the gender offset to a source vector and have confidence in its meaning.
for target, source in [
    (king, queen),
    (prince, princess),
    (son, daughter),
    (actor, actress),
    (steward, stewardess),
]:
    display(
        style_results(
            compute_results(
                target=target,
                source=source,
                rotation=gender_rotation,
                offset=male_offset * average_k,
            )
        )
    )
    print()
Queen -> King | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.756197 | 21.686100 |
Summed Vector | 0.801380 | 19.744261 |
Rotated Vector | 0.800727 | 19.759377 |
Princess -> Prince | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.798122 | 19.734372 |
Summed Vector | 0.833044 | 18.017618 |
Rotated Vector | 0.831910 | 17.951380 |
Daughter -> Son | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.506902 | 30.924019 |
Summed Vector | 0.537458 | 29.976789 |
Rotated Vector | 0.519920 | 30.628173 |
Actress -> Actor | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.618884 | 27.222238 |
Summed Vector | 0.638717 | 26.398096 |
Rotated Vector | 0.644523 | 26.165533 |
Stewardess -> Steward | cosine_similarity | euclidean_distance |
---|---|---|
Original Vector | 0.753096 | 21.497546 |
Summed Vector | 0.753192 | 21.533765 |
Rotated Vector | 0.760474 | 21.201120 |
There we go! Our summed vector is now better or at least matches our original vector’s similarity. The summed vector now also matches the performance of the rotated vector, although it required an additional optimization step and K chosen in-sample.
Our largest outlier pair when optimizing for our scalar K was “steward” and “stewardess”. The optimal scalar K for that pair is about half the average. Still, the addition does essentially no harm there: the cosine similarity is unchanged to three decimal places and the distance is only marginally worse. The rotated vector, by contrast, does make progress in approaching the target vector.
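The in-sample caveat could be handled with a leave-one-out loop: fit one shared K on all pairs but one, then score the held-out pair. The sketch below shows the mechanics on synthetic vectors, not on the embeddings from this post, so its numbers say nothing about the real data. The helper `fit_shared_k`, the grid search (swapped in for `scipy.optimize.minimize` to keep the example dependency-light), and every constant here are assumptions for illustration.

```python
import numpy as np


def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def fit_shared_k(sources, targets, offset, grid):
    """Grid-search the single k maximizing mean cosine(source + k * offset, target)."""
    scores = [
        np.mean([cosine(s + k * offset, t) for s, t in zip(sources, targets)])
        for k in grid
    ]
    return float(grid[int(np.argmax(scores))])


# Synthetic stand-ins for embeddings (an assumption, not the post's data).
rng = np.random.default_rng(0)
dim, n_pairs = 32, 6
offset = rng.normal(size=dim)
offset /= np.linalg.norm(offset)
sources = rng.normal(size=(n_pairs, dim))
sources /= np.linalg.norm(sources, axis=1, keepdims=True)
# Each target is its source shifted along the offset, plus pair-specific noise.
targets = sources + 0.4 * offset + 0.05 * rng.normal(size=(n_pairs, dim))

grid = np.linspace(0.0, 1.5, 151)
held_out_gains = []
for i in range(n_pairs):
    train = [j for j in range(n_pairs) if j != i]
    k = fit_shared_k(sources[train], targets[train], offset, grid)
    before = cosine(sources[i], targets[i])
    after = cosine(sources[i] + k * offset, targets[i])
    held_out_gains.append(after - before)

print(f"mean held-out gain in cosine similarity: {np.mean(held_out_gains):.4f}")
```

With only five real pairs, each fold would be fit on four, so the estimate would be noisy, but it would at least avoid scoring a pair with a K that pair helped choose.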
I recognize that only using the “man” and “woman” vectors to generate the offset is a bit silly. Using a broader collection of gendered words, sentences, titles, etc., and averaging them to create average “man” and “woman” vectors before taking the offset is best practice. However, I’m going to have limited data once I get to smells so I’m trying to keep this simple.
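For reference, the averaged-offset approach mentioned above amounts to subtracting one group centroid from the other. The sketch below uses synthetic vectors in place of real embeddings; `mean_offset`, the group arrays, and the noise level are all hypothetical and only illustrate the mechanics.

```python
import numpy as np


def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def mean_offset(group_a, group_b):
    """Difference between group centroids: mean(group_a) - mean(group_b)."""
    return np.mean(group_a, axis=0) - np.mean(group_b, axis=0)


# Synthetic data: each pair differs by a shared direction plus pair-specific noise.
rng = np.random.default_rng(1)
dim, n_pairs = 16, 8
shared_direction = np.zeros(dim)
shared_direction[0] = 1.0
group_b = rng.normal(size=(n_pairs, dim))
group_a = group_b + shared_direction + 0.5 * rng.normal(size=(n_pairs, dim))

single_pair_offset = group_a[0] - group_b[0]
averaged_offset = mean_offset(group_a, group_b)

print(f"single pair vs shared direction: {cosine(single_pair_offset, shared_direction):.3f}")
print(f"averaged vs shared direction:    {cosine(averaged_offset, shared_direction):.3f}")
```

Averaging is equivalent to taking the mean of the per-pair offsets, so pair-specific noise tends to cancel while whatever direction the pairs share is kept.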
Now that I’ve resolved the issue with vector addition, let’s go back to rotations!
2. The rotated vector is closer to the target!
The rotated vector is helping a little but isn’t closing much of the gap between the vectors. As we saw earlier, we’re rotating by roughly the correct amount, but on the wrong plane.
I think I’m encoding some concept of gender in the rotation, but am I accounting for it entirely? While I might say there are no differences between a King and a Queen, the studies on bias in LLMs show us that isn’t the case. I expect some difference in the transformed vectors no matter what naive transformations are performed. Gender stereotypes encoded into the embedding of “prince of England” might not be encoded into the general embedding for “man” (or are lessened through averaging). So, is the remaining distance due to other features/meanings, or am I failing to account for general aspects of gender?
To start with, let’s make this a fair comparison with the offset and optimize the angle (magnitude) of rotation. After all, my hypothesis is that the two-dimensional plane can be treated as a feature, and its angle as a magnitude.
A scalar product for our angle wouldn’t be all that interpretable so instead, I’ll optimize for the angle of rotation directly.
def compute_nd_rotation_matrix(a, b, angle=None):
    # Same plane as before (spanned by a and b), but the rotation angle can be
    # overridden. When angle is None, use the angle between a and b.
    a_norm = a / np.linalg.norm(a)
    b_norm = b / np.linalg.norm(b)
    cos_theta = np.dot(a_norm, b_norm)
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    if angle is None:
        angle = np.arccos(cos_theta)
    v = b_norm - np.dot(b_norm, a_norm) * a_norm
    v_norm = np.linalg.norm(v)
    if v_norm < 1e-8:  # a and b are collinear
        # Return the same (matrix, angle) pair as the normal path.
        return np.eye(len(a)), angle
    v = v / v_norm
    identity = np.eye(len(a))
    outer_aa = np.outer(a_norm, a_norm)
    outer_av = np.outer(a_norm, v)
    outer_va = np.outer(v, a_norm)
    outer_vv = np.outer(v, v)
    R = (
        identity
        + np.sin(angle) * (outer_va - outer_av)
        + (np.cos(angle) - 1) * (outer_vv + outer_aa)
    )
    return R, angle
def objective(m, source, base_source, base_target, target):
    # m is the rotation angle (radians) within the base_source/base_target plane.
    R, angle = compute_nd_rotation_matrix(base_source, base_target, m)
    adjusted_vector = np.dot(R, source)
    return -cosine_similarity(adjusted_vector, target)

options = []
for target, source in [
    (king, queen),
    (prince, princess),
    (son, daughter),
    (actor, actress),
    (steward, stewardess),
]:
    initial_m = gender_angle
    result = minimize(objective, initial_m, args=(source, woman, man, target))
    optimal_m = result.x[0]
    options.append(optimal_m)
    print(optimal_m)
print()
average_m = sum(options) / len(options)
print(f"Average Optimized Angle: {average_m} (radians)")
print(f"Original Angle: {gender_angle} (radians)")
print(f"Difference: {abs(average_m - gender_angle)} radians, {np.rad2deg(abs