Robert Meyer - Analysing user comments with Doc2Vec and Machine Learning classification

18,716 views

Published on Jul 26, 2017

Description
I used the Doc2Vec framework to analyze user comments on German online news articles and uncovered some interesting relations among the data. Furthermore, I fed the resulting Doc2Vec document embeddings as inputs to a supervised machine learning classifier. Can we determine for a particular user comment from which news site it originated?

Abstract
Doc2Vec is a nice neural network framework for text analysis. The machine learning technique computes so-called document and word embeddings, i.e., vector representations of documents and words. These representations can be used to uncover semantic relations. For instance, Doc2Vec may learn that the word "King" is similar to "Queen" but less so to "Database".
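
As a rough illustration of what "similar" means for embeddings, the sketch below compares vectors with cosine similarity, a common choice for this purpose. The three-dimensional vectors are made up for the example and are not output from Doc2Vec or from the talk; real embeddings typically have tens to hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, invented for illustration only.
embeddings = {
    "King": [0.9, 0.8, 0.1],
    "Queen": [0.8, 0.9, 0.2],
    "Database": [0.1, 0.0, 0.9],
}

sim_queen = cosine_similarity(embeddings["King"], embeddings["Queen"])
sim_database = cosine_similarity(embeddings["King"], embeddings["Database"])

print(f"King vs Queen:    {sim_queen:.2f}")
print(f"King vs Database: {sim_database:.2f}")
```

With these toy values, "King" scores much closer to "Queen" than to "Database", which is the kind of relation the abstract describes.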

I used the Doc2Vec framework to analyze user comments on German online news articles and uncovered some interesting relations among the data. Furthermore, I fed the resulting Doc2Vec document embeddings as inputs to a supervised machine learning classifier. Given a particular comment, can we determine which news site it originated from? Are there patterns among user comments? Can we identify stereotypical comments for different news sites? Besides presenting the results of my experiments, I will give a short introduction to Doc2Vec.
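
The idea of feeding document embeddings into a supervised classifier can be sketched as follows. The abstract does not say which classifier was used, so this example assumes a simple nearest-centroid classifier. The two-dimensional vectors and the labels `site_a` and `site_b` are hypothetical stand-ins for real Doc2Vec comment embeddings and news site names.

```python
def centroid(vectors):
    """Return the component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Return the squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(vectors, labels):
    """Group training vectors by label and compute one centroid per label."""
    grouped = {}
    for vector, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vector)
    return {label: centroid(group) for label, group in grouped.items()}


def predict(centroids, vector):
    """Return the label whose centroid lies closest to the given vector."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], vector))


# Hypothetical comment embeddings with the site each comment came from.
train_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_labels = ["site_a", "site_a", "site_b", "site_b"]

centroids = fit_centroids(train_vectors, train_labels)
predicted = predict(centroids, [0.7, 0.3])
print(predicted)
```

In practice the training vectors would come from a trained Doc2Vec model, and a held-out set of comments would be needed to judge how well the site of origin can be predicted.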

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
