NLP in a Great Hurry

(Abstracted with permission from NLP in a Hurry by Pier Lim.)

Here is a collection of different Python libraries for natural language processing (NLP) which can be invaluable for rapid prototyping.

Semantic Similarity

Sentence Transformers: https://github.com/UKPLab/sentence-transformers

BERT and XLNet produce rather poor sentence embeddings out of the box. This library helps you produce sentence embeddings tuned for your specific task, which is useful for anything to do with semantic textual similarity, clustering and semantic search.
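As a rough illustration of semantic search with this library, the sketch below embeds a query and a few candidate sentences, then ranks the candidates by cosine similarity. The helper names (`cosine_similarity`, `rank_by_similarity`) and the choice of the pretrained model `all-MiniLM-L6-v2` are assumptions made for this example, not something the original article prescribes. The library is imported inside the function so the similarity helper can be used on its own.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, sentences, model_name="all-MiniLM-L6-v2"):
    """Return (sentence, score) pairs sorted from most to least similar to the query.

    model_name is an assumed choice of pretrained model; swap in whichever
    model suits your task. The first call downloads the model weights.
    """
    # Lazy import: only needed when embeddings are actually computed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    embeddings = model.encode([query] + list(sentences))
    query_vec, sentence_vecs = embeddings[0], embeddings[1:]
    scored = [
        (sentence, cosine_similarity(query_vec, vec))
        for sentence, vec in zip(sentences, sentence_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_by_similarity("How do I reset my password?", ["Steps to change your login credentials", "Our office opening hours"])` should place the first candidate on top, although the exact scores depend on the model used.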

Rule-based Text Sentiment for Social Media

VaderSentiment: https://github.com/cjhutto/vaderSentiment

While deep learning models are cool, rule-based models still have their place under the sun, especially when you don’t have much data or time to tune a model. The library describes itself as specifically attuned to sentiments expressed in social media, which means that emoticons and sentiment intensity markers (e.g. “!!!”) are taken into account.
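A minimal sketch of scoring short social media texts with VADER might look like the following. The helper names `label_sentiment` and `score_texts` are made up for this example. The ±0.05 cut-offs on the compound score follow the convention suggested in the library's README, and they are a starting point rather than a rule.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score (between -1 and 1) to a coarse label.

    The default threshold of 0.05 is the conventional cut-off; treat it
    as an assumption and tune it for your own data.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_texts(texts):
    """Return (text, compound score, label) for each input text."""
    # Lazy import: the lexicon ships with the package, so no training is needed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    results = []
    for text in texts:
        # polarity_scores returns a dict with the keys neg, neu, pos and compound.
        scores = analyzer.polarity_scores(text)
        compound = scores["compound"]
        results.append((text, compound, label_sentiment(compound)))
    return results
```

Running `score_texts(["The food was great!!! :)", "The service was terrible."])` is a quick way to see how punctuation and emoticons shift the compound score.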

Named Entity Recognition

spaCy: https://spacy.io/usage/linguistic-features#named-entities-101

spaCy is a popular open-source library that is suitable for production use. Apart from the default entity types, spaCy also lets us add arbitrary classes to the NER model by updating it with new training examples. This blog post shows how.
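For the default entities, a sketch along these lines is usually enough to get started. It assumes the small English pipeline `en_core_web_sm` has already been installed (for example with `python -m spacy download en_core_web_sm`). The helper names `extract_entities` and `group_by_label` are illustrative only. Adding custom classes requires training and is not covered here.

```python
def extract_entities(text, model_name="en_core_web_sm"):
    """Return (entity text, entity label) pairs found by a pretrained spaCy pipeline.

    model_name is an assumed default; any installed pipeline with an NER
    component should work.
    """
    # Lazy import so the grouping helper below works without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def group_by_label(entities):
    """Group (text, label) pairs into a dict mapping each label to its entity texts."""
    grouped = {}
    for text, label in entities:
        grouped.setdefault(label, []).append(text)
    return grouped
```

For example, `group_by_label(extract_entities("Apple is looking at buying a U.K. startup for $1 billion"))` should return a dictionary keyed by labels such as ORG, GPE and MONEY, though the exact entities found depend on the pipeline version.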

Production-Ready BERT Models

BERT-as-a-Service: https://github.com/hanxiao/bert-as-service

BERT-as-a-Service wraps the BERT code and serves it over ZeroMQ, so you can serve BERT embeddings with just a few lines of code. The project describes itself as fast (optimised), scalable and reliable.

Related Articles

Analysis of Tweets on the Hong Kong Protest Movement 2019 with Python demonstrates the use of the Vader tool.

Author