Ryan Henderson - One in a billion: finding matching images in very large corpora







Published on May 31, 2016

PyData Berlin 2016

The goal was to support not only high write volumes (over 10k images per second) but also fast lookup of similar images, around 1-2 s per query, across more than 1B images. Although similar paid services and free image-hashing libraries exist, this may be the first complete, free, open-source solution. Available at: https://github.com/ascribe/image-match

image-match started as an internal project. We needed a way, given some target image, to find similar images downloaded by our web crawler (think TinEye).

So we needed not only fast, accurate lookup across millions or even billions of images, but also very high-volume insertion: around 10k images per second.

In my talk, I will cover:

- The Problem: why is finding similar images hard?
- Algorithm: based on this paper
- Performance: but does it scale?
- Alternatives
