Semantic similarity between words — a thought experiment
In the thought experiment on semantic similarity between words, we explore the semantic similarities between target words. In other circumstances, we do have expectations about the semantic associations that tie words together. For example, there’s abundant research suggesting that professions are subject to gender-based stereotypes. From childhood, we’re socialized into cultural representations of what male and female jobs look like. Let’s see how ‘doctor’ and ‘nurse’ map onto the words ‘father’ and ‘mother’.
We start by importing the libraries we need, including spaCy, which provides the language model.
```python
>>> import numpy as np
>>> import matplotlib
>>> import matplotlib.pyplot as plt
>>> from sklearn.manifold import TSNE
>>> import spacy
```
Here’s the list containing the four words that will form our bare-bones semantic network (i.e., a set of words along with the semantic relationships linking them).
```python
>>> my_words = ["doctor", "father", "mother", "nurse"]
```
The next step is to retrieve the word embedding associated with each word in ‘my_words’. Before doing that, we need to load a language model (step 1). Then, we create an empty dictionary to store the word embeddings (step 2) and populate it with the retrieved word vectors (step 3). Finally, we arrange the vectors in a NumPy array (step 4) and reduce its dimensionality to two with t-SNE (step 5).
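```python
>>> nlp = spacy.load("en_core_web_lg")  # step 1
>>> word_embeddings = {}  # step 2
>>> for item in my_words:  # step 3
...     word_embeddings[item] = nlp.vocab[item].vector
...
>>> X = np.array([word_embeddings[item] for item in my_words])  # step 4
>>> # step 5; perplexity must be smaller than the number of samples (four
>>> # here) in recent scikit-learn versions, so the default of 30 is overridden
>>> X_embedded = TSNE(
...     n_components=2, learning_rate="auto", init="random", perplexity=2
... ).fit_transform(X)
```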
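The two-dimensional projection is only a visual summary of the embeddings. A more direct check is the cosine similarity between pairs of word vectors. The sketch below is an illustration, not part of the original script: the function name `cosine_similarity` and the three-dimensional vectors in `toy_embeddings` are made up so that the example runs without downloading a spaCy model.

```python
import numpy as np


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy vectors, made up for illustration (NOT real spaCy embeddings).
toy_embeddings = {
    "doctor": np.array([1.0, 1.0, 0.0]),
    "father": np.array([1.0, 0.0, 1.0]),
    "mother": np.array([-1.0, 0.0, 1.0]),
    "nurse": np.array([-1.0, 1.0, 0.0]),
}

# Pairwise similarity matrix; rows and columns follow the order of `words`.
words = list(toy_embeddings)
similarity = np.array(
    [
        [cosine_similarity(toy_embeddings[a], toy_embeddings[b]) for b in words]
        for a in words
    ]
)
```

To run the same check on the real embeddings, pass the ‘word_embeddings’ dictionary built in step 3 in place of `toy_embeddings`.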
The scatter diagram created with Matplotlib suggests that:
dimension 1 maps words onto gender differences
dimension 2 distinguishes words associated with kinship (lower section) and words associated with professions (upper section)
mapping the lower and upper sections onto each other, it seems that ‘father stands to doctor as mother stands to nurse’. In other words, the embeddings of ‘doctor’ and ‘nurse’ reflect gender-based stereotypes.
```python
>>> fig = plt.figure(figsize=(4, 4))
>>> ax = fig.add_subplot(1, 1, 1)
>>> for item, coordinates in zip(my_words, X_embedded):
...     ax.scatter(coordinates[0], coordinates[1], color="k")
...     ax.annotate(item, (coordinates[0] + 2, coordinates[1] + 2))
...
>>> ax.set_xlabel("Dimension 1")
>>> ax.set_ylabel("Dimension 2")
>>> # increase margins
>>> ax.margins(0.4)
>>> ax.set_xticks([])
>>> ax.set_yticks([])
>>> plt.show()
```
This snippet comes from the Python script “whatlies.py”, hosted in the GitHub repo simoneSantoni/NLP-orgs-markets.
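The analogy ‘father stands to doctor as mother stands to nurse’ can also be tested with vector arithmetic rather than read off a plot. The sketch below is an illustration, not part of “whatlies.py”: it compares the offset from ‘father’ to ‘doctor’ with the offset from ‘mother’ to ‘nurse’. The vectors in `toy` are made up and chosen so that the analogy holds exactly; real embeddings would give values below 1.

```python
import numpy as np


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy vectors, made up for illustration (NOT real spaCy embeddings).
toy = {
    "doctor": np.array([1.0, 1.0, 0.0]),
    "father": np.array([1.0, 0.0, 1.0]),
    "mother": np.array([-1.0, 0.0, 1.0]),
    "nurse": np.array([-1.0, 1.0, 0.0]),
}

# Offsets from the kinship term to the profession, one per gender.
male_offset = toy["doctor"] - toy["father"]
female_offset = toy["nurse"] - toy["mother"]

# If the analogy holds, the two offsets point in a similar direction.
offset_similarity = cosine_similarity(male_offset, female_offset)

# Equivalent formulation: doctor - father + mother should land near nurse.
predicted = toy["doctor"] - toy["father"] + toy["mother"]
analogy_similarity = cosine_similarity(predicted, toy["nurse"])
```

With the toy vectors, both `offset_similarity` and `analogy_similarity` equal 1 by construction. Replacing `toy` with vectors taken from `nlp.vocab` shows how closely the real embeddings follow the analogy.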