Miroslav Batchkarov - Gold standard data: lessons from the trenches

360 views

Published on Jul 26, 2017

Description
The first stage in a data science project is often to collect training data. However, getting a good data set is surprisingly tricky and takes longer than one expects. This talk describes our experiences in labelling gold-standard data and the lessons we learnt the hard way. We will present three case studies from natural language processing and discuss the challenges we encountered.

Abstract
It is often said that rather than spending a month figuring out how to apply unsupervised learning to a problem domain, a data scientist should spend a week labelling data. However, the difficulty of annotating data is often underestimated. Gathering a sufficiently large collection of good-quality labelled data requires careful problem definition and multiple iterations. In this talk, I will describe three case studies and the lessons learnt from them. Each case shows several aspects of the process that should be considered in advance to ensure the project is successful.

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
