Hendrik Heuer - Data Science for Digital Humanities: Extracting meaning from Images and Text





Published on Jul 26, 2017

Analyzing millions of images and enormous text sources using machine learning and deep learning techniques is simple and straightforward in the Python ecosystem. Powerful machine learning algorithms and interactive visualization frameworks make it easy to conduct and communicate large scale experiments. Exploring this data can yield new insights for researchers, journalists, and businesses.

The focus of this talk is extracting meaning from data and making powerful methods usable by everybody. With the advent of big data, new approaches and technologies are needed to tackle the increase in volume, variety, and velocity of data. This talk illustrates how analysts, journalists, and scientists can benefit from exploratory data analysis and data science.

Imagine a journalist who wants to cross-reference the names on the guest list of a parliament with online information about lobbyists to identify which party meets which company. A business analyst might want to quantify which topics certain customers are discussing on Twitter or what their sentiment towards a particular product is. Exploratory data analysis and data science techniques enable researchers, journalists, and businesses to ask bigger and more ambitious questions than anybody before them and to leverage the abundance of information that is available today.
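The guest-list scenario can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the method shown in the talk: the sample names, parties, and companies are made up, and `normalize_name` and `cross_reference` are hypothetical helpers. Matching is done on normalized names, so "Müller, Anna" and "Anna Muller" count as the same person.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Lowercase a name, strip accents and punctuation, and sort its tokens,
    so that 'Müller, Anna' and 'Anna Muller' produce the same key."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def cross_reference(guest_list, lobbyists):
    """Count how often each party was visited by a lobbyist of each company.

    guest_list: iterable of (visitor_name, party_visited) pairs
    lobbyists:  iterable of (lobbyist_name, company) pairs
    Returns a dict of the form {party: {company: number_of_visits}}.
    """
    company_by_name = {normalize_name(name): company for name, company in lobbyists}
    meetings = defaultdict(lambda: defaultdict(int))
    for visitor, party in guest_list:
        company = company_by_name.get(normalize_name(visitor))
        if company is not None:  # visitor is a known lobbyist
            meetings[party][company] += 1
    return {party: dict(counts) for party, counts in meetings.items()}


# Made-up sample data, for illustration only.
GUEST_LIST = [
    ("Müller, Anna", "Party A"),
    ("John Smith", "Party B"),
    ("Anna Muller", "Party A"),
    ("Jane Doe", "Party B"),
]
LOBBYISTS = [
    ("Anna Muller", "Acme Corp"),
    ("Smith, John", "Globex"),
]

meetings = cross_reference(GUEST_LIST, LOBBYISTS)
print(meetings)
```

Exact matching on normalized names is a deliberate simplification. Common names will produce false positives, so real data would likely need additional evidence (dates, employers) or fuzzy matching before drawing conclusions.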

The Digital Humanities are located at the intersection of computing and the disciplines of the humanities. They can benefit from the massive-scale automated analysis of content like images and text. Researchers, analysts, and journalists can quantify the state of society from publicly available data like tweets. It is now possible to construct an almost complete map of our civilization just by looking at the tags and GPS coordinates of Flickr photos.

A vast Python ecosystem supports this, including machine learning frameworks like scikit-learn, dedicated deep learning frameworks like Keras, and topic modeling tools like gensim. All these tools are open source and can be integrated into powerful data science pipelines. Rather than training neural networks from scratch, pretrained features for text and images can be adapted for fast results.
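As a rough sketch of what such a pipeline can look like, the following example chains a TF-IDF vectorizer and a logistic regression classifier with scikit-learn to label short texts by sentiment. It is not taken from the talk: the training sentences and labels are invented, the dataset is far too small for real use, and the names `TRAIN_TEXTS`, `TRAIN_LABELS`, and `sentiment_model` are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy data; a real analysis would need thousands of labeled texts.
TRAIN_TEXTS = [
    "I love this phone, great battery",
    "great camera, love the screen",
    "terrible battery, awful support",
    "awful screen, terrible phone",
]
TRAIN_LABELS = ["positive", "positive", "negative", "negative"]

# The pipeline turns raw text into TF-IDF features and feeds them to the
# classifier, so fitting and predicting both work directly on strings.
sentiment_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
sentiment_model.fit(TRAIN_TEXTS, TRAIN_LABELS)

new_texts = ["love it, great", "awful and terrible"]
predictions = sentiment_model.predict(new_texts)
for text, label in zip(new_texts, predictions):
    print(f"{label}: {text}")
```

Bundling the vectorizer and the classifier into one object means the same preprocessing is applied at training and prediction time, which is the main reason to prefer a pipeline over calling each step by hand.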


PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
