Lev Konstantinovskiy - Text similiarity with the next generation of word embeddings in Gensim





The interactive transcript could not be loaded.


Rating is available when the video has been rented.
This feature is not available right now. Please try again later.
Published on Jul 26, 2017

What is the closest word to "king"? Is it "Canute" or is it "crowned"? There are many ways to define "similar words" and "similar texts". Depending on your definition you should choose a word embedding to use. There is a new generation of word embeddings added to Gensim open source NLP package using morphological information and learning-to-rank: Facebook's FastText, VarEmbed and WordRank.

There are many ways to find similar words/docs with an open-source Natural Language processing library Gensim that I maintain. I will give an overview of modern word embeddings like Google's Word2vec, Facebook's FastText, GloVe, WordRank, VarEmbed and discuss what business tasks fit them best.

What is the most similar word to "king"? It depends on what you mean by similar. "King" can be interchanged with "Canute", but it's attribute is "crown". We will discuss how to achieve these two kinds of similarity from word embeddings.


PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

Comments are disabled for this video.
When autoplay is enabled, a suggested video will automatically play next.

Up next

to add this to Watch Later

Add to

Loading playlists...