Vaibhav Singh, Jaroslaw Szymczak - Machine Learning to moderate ads

Published on Jul 26, 2017

Vaibhav Singh, Jaroslaw Szymczak - Machine Learning to moderate ads in a real-world classifieds business

In today's world of online business, it is difficult to moderate all the content coming to your site. In this talk we share our experience of building machine learning models that moderate more than 100 million classified ads every month. The audience will get a glimpse of real-world content moderation and the race to beat online fraudsters and scammers.

In an online classifieds business, spam and fraud grow along with the business. One way to inhibit this is to moderate all incoming advertisements, either with static filters or with human moderators, but neither goes very far once the business deals with millions of advertisements every day. Static filters can flag good advertisements as bad, and they require humans to add, remove, and tune them. Employing enough human moderators to review every incoming advertisement, on the other hand, simply does not scale. We believe machine learning models are the right way to address this kind of problem: they identify patterns in the data and classify ads, reducing both the overhead of maintaining complex filters and the number of human moderators needed.
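To make the idea concrete, here is a minimal, hypothetical sketch of learning moderation patterns from data instead of hand-writing static filters. The ad texts and labels below are toy data, and the choice of TF-IDF features with logistic regression is illustrative, not the speakers' actual model:

```python
# Toy sketch: learn fraud patterns from moderated ads rather than
# maintaining hand-written static filters. Data and model choice are
# illustrative, not the talk's production setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set: ad text with human-moderator verdicts (1 = fraud, 0 = ok).
ads = [
    "brand new iphone sealed box warranty included",
    "cheap iphone wire money western union urgent",
    "selling my old bicycle good condition pickup only",
    "send payment upfront via gift card limited offer",
]
labels = [0, 1, 0, 1]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(ads, labels)

# The fitted model scores unseen ads instead of a human reading each one.
proba_fraud = model.predict_proba(["wire money upfront for this iphone"])[0][1]
print(round(proba_fraud, 2))
```

A probability output (rather than a hard yes/no) is what later makes threshold-based error control possible.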

In this talk we share our experiences in building machine learning models that act as human moderators. The talk mainly covers the following topics:

- Creating a simple platform architecture that can serve predictions for millions of requests without spending too many resources on DevOps and machines
  - Batching requests so as to use CPUs optimally
  - Containerising code for ease of deployment
- Creating models from training sets with millions of rows and thousands of features that can be trained on simple machines rather than complex Spark/Hadoop architectures
  - Using SVMLight files as the data format rather than huge dataframes that cannot fit in memory
- Orchestrating the model generation pipeline using the Luigi workflow engine
- Controlling the error rate using prediction probability thresholds
- Evaluating moderation/fraud detection models
- Managing hundreds of models and their performance across all geographical regions
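The batching point above can be sketched as micro-batching: buffer incoming prediction requests and score them in one model call, so a vectorised code path is used instead of one call per ad. All names here (`batch_predict`, `MAX_BATCH`, the keyword stand-in model) are hypothetical:

```python
# Hypothetical micro-batching sketch: collect requests into a batch,
# then score the whole batch with a single model call.
from queue import Queue, Empty

MAX_BATCH = 32           # score at most this many ads per model call
MAX_WAIT_SECONDS = 0.05  # flush a partial batch after this long

def batch_predict(ads):
    # Stand-in for a real vectorised model call that scores a whole
    # list of ads in one pass (1 = fraud, 0 = ok).
    return [1 if "wire money" in ad else 0 for ad in ads]

def drain_batch(requests: Queue):
    """Collect up to MAX_BATCH requests, waiting briefly for stragglers."""
    batch = []
    try:
        while len(batch) < MAX_BATCH:
            batch.append(requests.get(timeout=MAX_WAIT_SECONDS))
    except Empty:
        pass  # partial batch: flush what we have
    return batch

requests = Queue()
for ad in ["nice sofa", "wire money now", "used laptop"]:
    requests.put(ad)

batch = drain_batch(requests)
print(batch_predict(batch))  # one model call for the whole batch
```

The timeout keeps latency bounded when traffic is light, while the batch cap keeps CPU utilisation high when traffic is heavy.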
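The SVMLight point refers to the svmlight/libsvm text format: one line per example, `label index:value ...`, storing only the non-zero features, so millions of sparse rows can be streamed from disk rather than held in a dense in-memory dataframe. A minimal hand-rolled writer and parser (illustrative; real pipelines typically use library loaders):

```python
# Sketch of the svmlight/libsvm sparse text format:
#   "<label> <index>:<value> <index>:<value> ..."
# Only non-zero features are written, so sparse rows stay small on disk.

def to_svmlight_line(label, features):
    """features: {feature_index: value} containing only non-zero entries."""
    pairs = " ".join(f"{i}:{v}" for i, v in sorted(features.items()))
    return f"{label} {pairs}"

def from_svmlight_line(line):
    head, *pairs = line.split()
    features = {}
    for p in pairs:
        i, v = p.split(":")
        features[int(i)] = float(v)
    return int(head), features

line = to_svmlight_line(1, {3: 0.5, 17: 1.0})
print(line)  # → 1 3:0.5 17:1.0
print(from_svmlight_line(line))
```

Because the file is line-oriented, training code can stream it example by example, which is what makes single-machine training on millions of rows feasible.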
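Controlling the error rate with probability thresholds can be sketched as three-way routing: auto-approve low-risk ads, auto-reject high-risk ones, and send only the uncertain middle band to human moderators. The threshold values below are made up; in practice they would be tuned on a validation set to hit a target error rate:

```python
# Illustrative threshold routing. The thresholds are hypothetical and
# would be tuned on held-out data to meet a target error rate.
APPROVE_BELOW = 0.20  # p(fraud) below this: publish automatically
REJECT_ABOVE = 0.90   # p(fraud) above this: block automatically

def route(p_fraud):
    if p_fraud < APPROVE_BELOW:
        return "approve"
    if p_fraud > REJECT_ABOVE:
        return "reject"
    return "human_review"  # only this band costs moderator time

print([route(p) for p in (0.05, 0.5, 0.97)])
# → ['approve', 'human_review', 'reject']
```

Widening or narrowing the review band trades automatic-decision errors against human moderation cost.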
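On evaluation: fraud is usually a rare class, so accuracy is misleading and per-class precision and recall are more informative. A small self-contained sketch with toy labels (the data is invented for illustration):

```python
# Toy evaluation sketch: precision and recall on the fraud class,
# which matter more than accuracy on an imbalanced moderation problem.
def precision_recall(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of flagged ads, how many were fraud
    recall = tp / (tp + fn) if tp + fn else 0.0     # of fraud ads, how many were caught
    return precision, recall

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]  # 1 = fraud (toy data)
y_pred = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
print(precision_recall(y_true, y_pred))  # → (0.75, 0.75)
```

Precision governs how many good ads get wrongly blocked; recall governs how much fraud slips through, which is why both are tracked per model and per region.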


PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
