NLP, Organizations, and Markets
Hi there! I’m Simone Santoni, an economic sociologist with a taste for fancy methods 🧐
It seems you landed on my ‘NLP, Organizations, and Markets’ GitHub Page. Here, you can find ideas, toolkits, and examples showing how to harness NLP to understand the functioning of organizations and markets. You’ll also come across ‘theoretical stuff’ regarding the foundations of NLP and some prominent algorithms (e.g., Mikolov et al.’s word2vec).
Mainly, this GitHub Page condenses five years of experience I developed by researching, using, and teaching NLP. Talking about teaching: during the Summer Term of 2022, I’ll be offering a course on NLP for the MSc students of Bayes Business School (formerly, Cass Business School). I’ll expand the materials available on this GitHub Page on a rolling basis as I go through the various weeks of my NLP course, ending in the first week of July. Meanwhile… hang tight 🎢
Scope
What you’ll find in this GitHub Page
Key aspects of algorithms playing a core role in NLP (e.g., word2vec)
A high-level description of prominent NLP tools (e.g., topic modeling)
Examples showing how to use prominent NLP tools to get a closer understanding of organizations and markets
Python scripts to deploy ‘as is’ or adapt to make things happen
Sample datasets
… and what you won’t find
A comprehensive survey of NLP-related algorithms and tools
Materials on NLP applications that pursue operational goals (e.g., chatbots, translation)
Advice on how to improve the performance of extant NLP-related algorithms
Building blocks
Building block
Synopsis
Tags
This block highlights the origins of NLP, some key NLP frameworks, and what we can do with them. A series of examples and Python scripts show how to manipulate text with NLTK, the Swiss army knife of NLP, and spaCy, a top-class library for pre-processing and analyzing text corpora at scale
Computational linguistics
NLP frameworks
spaCy
Text pipelines
NLTK
In this block, the focus is on two analytical and computational strategies for modeling the meaning contained in a corpus of text: i) human-annotated dictionaries and ii) word vectors. A series of examples and Python scripts show how to leverage human-annotated dictionaries and learn word vectors using text corpora regarding organizations and markets
Computational linguistics
Distributional hypothesis
Connotations
Word vectors
Human-annotated dictionaries
NLTK
This block focuses on embeddings — a framework that relies on ML/DL to learn word vectors. A series of examples and Python scripts show how to harness word vectors for the analysis of organizations and markets
word2vec
Model language
GloVe
fasttext
BERT
Gensim
NumPy
SciPy
sent2vec
doc2vec
Foundational models
Revealing the hidden themes in a corpus of text is the subject of this block. We’ll see how to design and evaluate a topic model and how to post-process its outcomes. A series of examples and Python scripts show how to deploy topic modeling to analyze text corpora comprising corporate filings, financial analyst reports, or product reviews
LDA
Tomotopy
Gensim
Unsupervised learning
This block tackles the problem of text classification, the task behind sentiment analysis and many other NLP applications. A series of examples and Python scripts illustrate how to implement different classifiers, from the Naive Bayes Classifier to deep-learning-powered classifiers. Special attention is devoted to product review data
Sentiment analysis
Affect lexicons
NBC
Custom affect lexicons
PyTorch
NLP frameworks
Semi-supervised learning
Word vectors
Supervised learning
Here, the focus is on various tasks that fall within the remit of information extraction. Examples include named entity recognition and the identification of events, times, and relations among entities. A series of Python scripts illustrate how to extract ‘structured’ information out of a variety of text corpora comprising data on organizations and markets
Named entity recognition
Prodigy
flair
spaCy
Supervised learning
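To give a flavor of what the text classification block covers, here is a minimal sketch of a Naive Bayes Classifier written with the Python standard library only. It is an illustration under stated assumptions, not one of the course scripts: the toy product reviews, the whitespace tokenizer, and the function names (`tokenize`, `train_naive_bayes`, `predict`) are all made up for this example, and add-one (Laplace) smoothing is just one common choice.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return label_counts, word_counts, vocabulary


def predict(text, label_counts, word_counts, vocabulary):
    """Return the label with the highest log-posterior for the given text."""
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        log_prob = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            count = word_counts[label][token]
            # Add-one smoothing avoids zero probabilities for unseen pairs
            log_prob += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = log_prob
    return max(scores, key=scores.get)


# Hypothetical product reviews, invented for illustration only
reviews = [
    ("great product works perfectly", "positive"),
    ("excellent quality great value", "positive"),
    ("terrible quality broke quickly", "negative"),
    ("awful product waste of money", "negative"),
]
model = train_naive_bayes(reviews)
prediction = predict("great value", *model)
print(prediction)
```

With real product review data you would want a proper tokenizer and a held-out test set; the point here is only the counting-and-smoothing logic.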
How to approach the individual building blocks
Goals
If you’re stubborn enough to engage with all my materials, then you may (say, will 💪) be able to:
clean, prepare, and transform text corpora containing organization- and market-level data
design and operate a variety of NLP pipelines
associate the most appropriate NLP framework/tools with specific analytic problems
translate NLP outcomes into valuable insights regarding organizations and markets
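As a taste of the first goal, the sketch below cleans two sentences and turns them into bag-of-words counts using only the Python standard library. Everything in it is an assumption made for illustration: the sample sentences, the tiny `STOPWORDS` set, and the helper names are hypothetical, and the course materials rely on dedicated libraries such as NLTK and spaCy rather than hand-rolled code like this.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real pipelines use much larger ones
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "its", "for"}


def clean(text):
    """Lowercase, replace anything that is not a letter or whitespace, squeeze spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text into tokens, dropping stopwords and one-letter leftovers."""
    return [t for t in clean(text).split() if t not in STOPWORDS and len(t) > 1]


def bag_of_words(corpus):
    """Map each document to a Counter of token frequencies."""
    return [Counter(tokenize(doc)) for doc in corpus]


# Hypothetical sentences in the style of financial news
corpus = [
    "The firm raised its guidance for 2022.",
    "Analysts cut the firm's rating; the stock fell 5%.",
]
bows = bag_of_words(corpus)
for bow in bows:
    print(dict(bow))
```

Note the trade-off baked into `clean`: stripping digits and punctuation discards figures such as "5%", which may matter a great deal in market-related text.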
Materials
Materials are organized around 🛠️NLP tools. For each NLP tool, I provide a learning package comprising a set of (narrated) decks, quizzes to self-evaluate your learning, Python scripts and problem sets to consolidate your NLP skills, and case studies showing how to mobilize NLP tools to address relevant problems concerning the functioning of organizations and markets.
For Bayes Business School students
The Moodle page of SMM694 contains all the important information regarding the organization of the module I’ll be teaching over Summer 2022. Students are required to refer to the Moodle page 🙏. In the interest of redundancy, I report some critical information here, such as the calendar of lectures, a suggested timeline (i.e., what to study when), and the text of the assignments.
Spotlight
The economic relevance of NLP
Where NLP comes from
FAQ
Who is this for?
People with a minimal programming background and a taste for analyzing organizations and markets
Why did you build this?
Because I think NLP tools can radically advance our understanding of organizations and markets. However, NLP tools are not yet widely adopted
How to use this?
Familiarize yourself with NLP tools, test your knowledge, and apply what you learn to problems regarding organizations and markets
Love this GitHub Page?
Get in touch by dropping a line to simone.santoni.1@city.ac.uk
Built with Notion.so & Loconotion