I had a pretty extensive stint at Avanade. Who knows about Avanade? It's a consulting firm, 80% owned by Accenture and 20% owned by Microsoft, and I inherited some of its traits: I'm a Windows guy, so I don't use a Mac. After that stint, I ventured into a startup of my own in the retail domain. The use case was very simple, and it's no secret: whenever you go to a retail outlet, you get a paper receipt dispensed by a printer. Our idea was to convert it into an e-receipt delivered over NFC to your mobile phone, and then build a big cloud service to collect all the e-receipts and build on top of them. Something similar to Mint.com, but applied to the retail domain. And we failed fast. After that, I've been consulting for a company called Playware Studios, working as an engineering manager, and we do data science work as well.

Before I go into the slides proper, I just want to give you some context. We all use Google, always; without Google we'd be lost. You can type anything and get results, right? What about image search? You remember there's a little nifty icon there: you can click on it and paste a URL or upload an image. So let's take this image of a bag and upload it. There you go: it shows the image and its size, it tries to identify what's in the image, which is a bag, and then it gives you similar bags, the images nearest to that particular one. Yeah, sometimes it's very basic. But think about how many resources, how much machine learning code, how much infrastructure went into building this whole search engine, especially for images. Keep that thought aside.

Let me move on to the slides now. Whether I'm going to inspire you or bore you, I'll know at the end of the talk. First I'll talk about the motivation and the challenge behind this experiment to build a visual search engine, and how I got from concept to code; I pretty much hacked it together in two to three weeks. Then the tech stack involved, then the code and a demo, then the topic itself, search singularity, and finally a call for collaboration.

Since I was talking about Google, there's one more motivation I want to mention. I'm an avid user of this app. You have a lot of things at home that you want to declutter and sell away, and you can always save money with it, so I think you've guessed what the app is. It's purely coincidental that I picked Carousell as the use case: when I finished this experiment I blogged about it, around the end of the November-December timeline, and I didn't expect I'd be presenting the same thing at Carousell. You know the tagline. At that point I was thinking: what if I could snap an image and look it up on Carousell to see whether that object of my interest is available in the second-hand market? If somebody wants to sell that particular book for, say, $2, that's a great bargain for me. So that was one of the motivations. I thought, well, if there were such a nifty feature, it would be very useful.
So perhaps then you'd have to change the tagline to something like this. That's one of the motivations. So how do I build a solution, a reverse visual search engine? These questions came to mind. What is the source: where can I get the images? What kind of compute should I use: cloud, or just my laptop, since this is an experiment? What machine learning models can I use to speed up the search? How performant should it be: should it return results in a minute, or maybe three seconds? On Google we saw it return in about 0.5 seconds. And how do I present it: a mobile app or a web app? Those were the considerations when I started the experiment.

I was also doing a thought experiment: if I were running a company, would I build or buy? There are quite a few third-party services on the web. First is ViSenze, a cloud web service that does visual search: basically, if you have millions of images, you send them over and they give you an API to search them. ViSenze is a local firm; the other ones are from the US. Thread Genius is the late entrant into the market, and they're very good as well. In my case, my personal take and motivation to build this myself was, first, to build a simple prototype for the web, not a mobile application, because that would be more complicated. Second, to avoid data leakage: if I'm running a startup with lots and lots of images, I don't want to send them all to a third-party service. And last, to learn new things in machine learning and Python. That's the idea behind it.

So again, concept to code. First, sources. I was looking at Carousell and decided I'd take some images from Carousell itself to build the prototype: build a crawler, capture the metadata, and put the images into an image repository. Second, use the cloud; that's where the title comes from, that you can build a reverse search engine for $0. If you go to Google Cloud now and put in your credit card, they give you three months of free compute time on pretty large servers. The other theme, as you can see, is BYOS; the metaphor I'm trying to borrow is BYOD, bring your own device: if I'm a startup, I want to build my own service.

On development: when I did this experiment, the timeline was November-December last year, and until November TensorFlow was not available on the Windows platform, so I couldn't use my Windows machine. That's one of the key points here; I had to move to Linux. Of course, I didn't want to build my own model or train on a huge number of images, so I went for a pre-trained ML model and used it to extract features from the images. The last target was a three-to-four-second search response over 100k images. As for the web UI, I wanted to learn Python Flask; I'll come back to that. Basically, it should let you upload an image, similar to Google, and it should let me crop the image: whenever you take a photo, it's going to be huge, and you may want to focus on a particular area, so you need cropping functionality.
It should also check for appropriateness: if you build a commercial service, somebody is going to upload inappropriate content at some point, so it should flag whether an upload is appropriate or not.

Now, the tech stack. The crawler is pure Python code. I'm using SQLite as my image repository, with Linux file storage to keep all the image files and index files. I used Google Cloud, which gave me the three-month free trial, and settled on an Ubuntu Linux VM; I don't have a Mac, so I had no choice but to go with that. Since I'm a Microsoft guy, I went with VS Code, a very nifty editor. I used the Inception v3 model, pre-trained on ImageNet, which is a very good starting point. The search is the typical k-NN search, and for the web UI I used Flask, Angular, and Bootstrap. So that's the tech stack.

OK, the code. I searched GitHub for similar code I could use as a starting point. I did end up finding one project, but it was totally buggy. I think that's typical of GitHub: whenever you publish a personal project, it works at that point in time, but later some of the services it depends on are gone. Still, it gave me a good starting point. The whole code is available here, so you can go and download it, and there's also a blog post detailing everything. Let me briefly show how the code is organized. There are two main Python files: the first extracts features from your images, and the second does the crawling. The web app is under the app code, and all the settings are under this file, which lets you run the web app on the web. This one is very important: the network.protobuf file is the Inception v3 protobuf file, the pre-trained model published by Google. And the last one is your database, the SQLite database.

I'll briefly run through the crawler before I go into the code and show you how the images are crawled and stored. When I started looking at this Carousell app, I wanted to know how images are served to the mobile app, and for that you pretty much have to do a bit of reverse engineering: you take your Android phone, set up Fiddler, and watch what goes back and forth, and that's how you find the API signature. After finding the API signature, I built an iterative routine to fetch images from that API. How? Basically, you use requests.get, parse the returned JSON, and insert the rows into the SQLite DB. I designed the crawler to be idempotent: whenever there's an issue and the crawler crashes, you just restart it, and it fetches the images again, skipping any image that's already there. I also assumed the API serves only JPEG and PNG images. These are the modules I used in Python: requests, json, and urllib.

Now, this is pretty much the server I'm running in the cloud. It's running on Azure; I just started a Linux instance on Azure. Just to show you: that's the home directory.
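To make that concrete, here is a minimal sketch of the idempotent fetch-and-store loop just described. The endpoint URL, the JSON field names, and the table schema are placeholders I've made up for illustration, not the project's actual API or schema:

```python
import sqlite3
import requests

API_URL = "https://example.com/api/listings"  # placeholder, not the real API

conn = sqlite3.connect("images.db")
conn.execute("""CREATE TABLE IF NOT EXISTS images
                (id TEXT PRIMARY KEY, url TEXT, title TEXT)""")

resp = requests.get(API_URL, params={"page": 1})
resp.raise_for_status()
for item in resp.json().get("results", []):  # hypothetical JSON shape
    # INSERT OR IGNORE is what makes the crawler idempotent: after a
    # crash you just restart it, and rows that already exist are skipped.
    conn.execute("INSERT OR IGNORE INTO images VALUES (?, ?, ?)",
                 (item["id"], item["image_url"], item["title"]))
conn.commit()
```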
So I did a git pull, and the code is under the TensorFlow search folder, organized as I described: you have your crawler here, inception.py holds the indexer, the settings file is here, and the app code, the web app, is over here. Now I'm going to run the crawler using Fabric, with the task that downloads the shop-site images. Before I run it, let me show you the code, fabfile.py. This is the routine, the crawler: basically it's going to get some images from Carousell. This is the API, and a couple of things: whatever images I get, I insert into the SQLite3 DB. The code runs, fetches the images, whether PNG or JPEG, and pushes them to the SQLite DB. One more thing: there's a settings file that lets you control the steps, how many iterations and how many collections you want to do; you set those here. So I simply run the shop-site images task with sudo fab, and it downloads all the images and pushes them to the SQLite DB. For this demo I downloaded about 50 images; five of them had issues, so 45 came in.

This is how the storage is organized: all your images go to the images folder; when I build the index, the index files go to the index folder; and this is the database where all the metadata and image locations are kept. Once the images are processed, they move to another folder. If I go to the images folder, the images we just downloaded are there. Next I'm going to index these files, so I run sudo fab index. Going back to the code, here is the index routine. All the images in that folder are run through TensorFlow, we extract features from them, and we store the results as index files. As you can see here, we load the Inception protobuf file, and then for each of the images we extract the features. That's what I'm running now. Oops, my location was wrong. It should be done in a few seconds, because we only have about 50 images. So, yeah, it's done. If I go back to that directory, the images have all moved, and if I go to the index folder, you can see the index files created for all of our roughly 50 images.

Now I want to launch the web app, so I move back to that directory and run the server. This launches the web app; essentially it has loaded all the NPY files, all the index files we built out of TensorFlow. As you can see here, this machine is an Azure machine. I'll go back to my server; this is the IP where the machine is running. So I'm going to run this, and while it's running, perhaps...
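While that runs, here is a rough illustration of what an index routine like this does internally, written against the TensorFlow 1.x API that was current at the time. The file paths are illustrative; pool_3:0 and DecodeJpeg/contents:0 are the tensor names in Google's released Inception v3 graph:

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x style API

# Load the pre-trained Inception v3 graph from its protobuf file.
with tf.gfile.GFile("network.protobuf", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name="")

with tf.Session() as sess:
    # pool_3:0 is the 2048-dimensional feature layer of Inception v3.
    pool3 = sess.graph.get_tensor_by_name("pool_3:0")
    for path in ["images/bag1.jpg", "images/bag2.jpg"]:  # iterate the images folder
        image_data = tf.gfile.GFile(path, "rb").read()
        # Feed raw JPEG bytes into the graph and squeeze to a flat vector.
        feat = sess.run(pool3, {"DecodeJpeg/contents:0": image_data})
        np.save(path + ".npy", np.squeeze(feat))
```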
OK, alternatively, I'll move back to the slides. Oh, no, that one is local. Never mind; I do have one more instance that I built and kept running, and this is how it appears: a shop-site visual search server. I'll come back to this UI; let me switch back to the slides for now.

Just now I showed you the TensorFlow pre-trained model, the protobuf file we use to extract the features. This Inception model is provided by Google. If you want to try community-developed pre-trained models, this is a very good website: a lot of people have trained their own models on top of it, and by now we even have Inception v4. You can go there, take a look, use those models and the code, and try it out. What we do in the indexer is use one particular tensor, the pool_3:0 layer of Inception, to extract the image features; that's why you see the NPY files being generated. We chunk 500 images together into a single file; that's how we handle a very large number of files.

With respect to the UI, it uses the Flask framework with Jinja2 templates, plus Angular and Bootstrap, of course. I also added the Clarifai API. What does it do? Object classification: it guesses what the object inside the image is, similar to what Google was doing; remember, it made a guess and said it's a bag. I use the same API for the other part too: it flags whether the content you uploaded is appropriate or inappropriate.

How do we do the nearest-neighbour search? We use the Annoy package. It's a fantastic package: it gives you nearest-neighbour search over very large, high-dimensional data. If you look at the NPY files we generated, every image yields a 2048-dimensional feature vector, no matter what resolution the image is. Annoy searches over those vectors very fast. You can also use a brute-force method: compute the Euclidean distance to every indexed image and take the nearest ones.

Let me move back to the demo. This particular website is running on a different machine; actually I'm running one more app on a different Azure Linux machine over there. I'll add an image and do a search, using the same bag image I uploaded to Google earlier. The approximate search is the fast one we just discussed, and search-by-image is the brute-force search. Here we go: these are the nearest images for that bag. The other feature I spoke about is the image tags: as you can see, it detected that it's a bag, luggage, leather, and so on. It did the search in 2.5 seconds; Google was 0.5 seconds, so we can't match that. The approximate search finished in 1.6 seconds. The appropriateness check is here: it gives a probability that the content is appropriate, 0.99. If you try it yourself with other images, you'll see the appropriateness scores.
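Here is a minimal sketch of the two search paths shown in the demo: approximate search with Annoy and brute-force Euclidean search with NumPy. The feature-file path and the tree count are illustrative, not taken from the project:

```python
import numpy as np
from annoy import AnnoyIndex

DIM = 2048  # size of an Inception pool_3 feature, regardless of image resolution

features = np.load("index/features.npy")  # illustrative path, shape (n_images, 2048)
query = features[0]

# Approximate: build an Annoy index once, then query in milliseconds.
ann = AnnoyIndex(DIM, "euclidean")
for i, vec in enumerate(features):
    ann.add_item(i, vec)
ann.build(10)  # 10 trees; more trees = better accuracy, slower build
approx_ids = ann.get_nns_by_vector(query, 10)

# Brute force: Euclidean distance to every vector held in memory.
dists = np.linalg.norm(features - query, axis=1)
brute_ids = np.argsort(dists)[:10]
```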
Next I want to do the search-by-image, the brute-force one. Let's see: the results are better, but it takes more time. I can also exclude some of the content if I want to focus on a particular area: I mark the region I don't want, and the search excludes that part. Let me try another image with the approximate search. There you go, the nearest matches are here. Using Angular and Bootstrap, I also introduced an icon here that takes you straight to Carousell. So you could build the whole site as an aggregator of e-commerce sites: pull in all their images and metadata, and provide a service where people quickly search and then buy their second-hand item from whichever service offers it. Let's try the exclude feature again; maybe I want to exclude this part. There you go: it excluded some of the content at the top and searched based on the rest. As you can see, I cut off the head, and most of the results have the head cut off too. Let me switch back to the slides.

Coming back to search singularity: why do I want search to be remarkable? Here is one way to make your visual search truly remarkable. If you take a snap of any picture and give it to a visual search engine, it should immediately tell you what's in the image. Perhaps it should even identify the brand, and more: it should pick up the context and location from your mobile phone and add further metadata for you. Then the experience becomes seamless: you can buy straight away, or it can list all the services offering that item, along with their offers. That would be very cool.

How do we achieve that? Going through the literature, I found one way. First, build an object detector: whenever you take a snap, there will be multiple objects in it, and you need to find each of them. One approach is a region proposal network, the latest in the literature, which lets you quickly find the various sub-objects in an image. Once you've identified the sub-objects, you do feature extraction on each: within this figure there may be boots, a jacket, whatever, and for each object you extract features. You can also look at metric learning, one of the newer additions to machine learning, which learns the distance metric used for nearest-neighbour search. We used Euclidean distance, but there are many other distances, and one of the latest methods is large margin nearest neighbours (LMNN), which is very good for high-dimensional data points. So basically you can combine these two pieces; a rough sketch of searching under a learned metric follows below.
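As referenced above, here is a minimal sketch of what nearest-neighbour search under a learned metric looks like. The matrix L stands in for whatever a method like LMNN would learn; here it is just a placeholder:

```python
import numpy as np

def nearest_in_learned_metric(query, index_vectors, L, k=5):
    """Rank index_vectors by the Mahalanobis-style distance ||L (x - q)||."""
    projected = (index_vectors - query) @ L.T  # map differences into the learned space
    dists = np.linalg.norm(projected, axis=1)
    return np.argsort(dists)[:k]

# With L as the identity this reduces to the plain Euclidean search used
# in the demo; LMNN training would supply a better L.
feats = np.random.rand(100, 2048).astype(np.float32)  # stand-in features
ids = nearest_in_learned_metric(feats[0], feats, np.eye(2048, dtype=np.float32))
```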
You can also gather training data from various web resources. If you're Carousell or some other service, you can push all your images into this pipeline, use GPU servers, use Airflow, with the ML stack over here, and then you can offer the singularity API, which would be really cool.

Another take on singularity: are you aware of the Solr/Lucene search engine? I think we could attempt to build something similar to Solr and Lucene, where it can ingest any image, of any type and resolution, in batch or in real time. It should take all the images, and even feeds of videos if possible. Then it should have a pluggable crawler engine, so you can point it at any website or API and it collects all the images. Pre-trained models should also be pluggable within the engine, so you can drop in any pre-trained model. Then you store all your metadata in Lucene, and basically you have a kind of Solr for images. We have Solr for search, text search, so why don't we take that idea and adapt it for images? This is the architecture, perhaps.
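To illustrate the pluggable-model idea, here is a minimal sketch of the kind of interface such an engine could depend on. All the names are hypothetical, not from the talk's codebase:

```python
from abc import ABC, abstractmethod
import numpy as np

class FeatureExtractor(ABC):
    """The engine depends only on this interface, so any pre-trained model
    (Inception v3 or v4, a community fashion model...) can be plugged in."""
    @abstractmethod
    def extract(self, image_bytes: bytes) -> np.ndarray:
        ...

class DummyExtractor(FeatureExtractor):
    def extract(self, image_bytes: bytes) -> np.ndarray:
        # Stand-in: hash the bytes into a fixed-size vector. A real plugin
        # would run the bytes through its model (see the earlier sketch).
        rng = np.random.default_rng(abs(hash(image_bytes)))
        return rng.random(2048, dtype=np.float32)

def ingest(image_bytes: bytes, extractor: FeatureExtractor) -> np.ndarray:
    vec = extractor.extract(image_bytes)
    # ...store vec in the vector index and the metadata in the document store
    return vec

vec = ingest(b"fake image bytes", DummyExtractor())
```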
Coming back to the final slide: I'm working on a very interesting web app and API. It's about neural nets for unsupervised entity extraction. I'm also looking at a human-intervention framework: once the first pass is done, some entities may not be extracted properly, and those misfits can be posted back to Amazon Mechanical Turk so people can figure out what the entity actually is. It's a framework where, after your ML job runs, the data points that were predicted properly are kept, and the missing ones go back for manual intervention. The third area is crowdsourced data entry. This app covers all three areas, and whoever is interested can talk to me, because I've just started on it. Finally, the code is there: go to this GitHub address and all the code is over there. That's it for my presentation. Any questions?

No, I didn't, because I wanted to learn TensorFlow, and I didn't want to lean on the APIs I listed. Google provides its own engine to extract features: you push your image and they do the work, but I didn't try that. The question was whether I tried Google's visual API engine. Google is also doing this at a very, very big scale through captchas: when you use Google captchas on your website and you select the three images with a street sign, or the images with mountains, that data feeds back into their own models. What was mentioned is the Google API engine: it has a couple of functions, one is object detection, what's inside an image, whether it's a shoe or a watch, identifying the various sub-parts, and they have inappropriate-content flagging as well. There are numerous features out there, but for my experiment I wanted to learn, and anyway I don't have the money to download all 3 TB of training data and play around with it. Any other questions?

Your question is whether I have to iterate over every image and generate the features. Yes, obviously, in this approach you have to: you extract features from every image, and only then can you search. The question was about performance and how many images were used. The instance I was running has about 39,000 images; the site I showed you has roughly 40,000 images, and the search response is about 3 seconds for the brute-force nearest search and about 1.5 seconds for the fast approximate search.

Coming back to that, what happens as the image collection grows larger and larger? Right now all the indexes are in files, and for a brute-force nearest search I load all the features, 2048-dimensional vectors, into memory and search there. So there is a limitation; one way to address it is to find a database fast enough to store all the indexes. No, I haven't tried it on a GPU yet; maybe I should give that a try. What's the storage mechanism for the indexes? It's the NPY files, which hold the features in binary form, and you load them into memory.

The question is, am I categorizing the images? No. For this experiment I just wanted to see whether I could perform a nearest-neighbour search using TensorFlow features. But as I shared under search singularity, you can take an image, do object identification, and derive a lot of attributes from it. For example, in the fashion domain you can take an image and say what kind of tee it is, whether it's men's, women's, or kids'. You can start doing that classification, store the attributes, and that's how you speed up the whole thing.

Someone shared that they put a whole model on a Raspberry Pi, and in real time it can detect a moving train in an image. Yeah, possible, could be. Actually, that's pretty interesting, because a Raspberry Pi doesn't have much compute power, yet it managed it in real time; incidentally, that was TensorFlow too. I deduce the next question is whether TensorFlow models can be used on a mobile phone: yes, TensorFlow is available on mobile, and you can use it for object detection and everything.

On the learning curve: you do have to learn tons of things here. Which was hardest to learn? Well, it wasn't TensorFlow; it was Flask, Fabric, and so on, which I was totally new to. TensorFlow itself was easy: use the protobuf model, get the features out, and you're done. You also have to figure out how to do the nearest search.

I thought somebody would ask this question. There was no issue with throttling, but you do need a check on your API: if requests are coming from one particular IP address very, very fast, put in a throttle check, stop that IP, and don't serve it results anymore. That's a good practice, so you don't allow people to scrape as much as they want. But then, Google is scraping every site anyway, so you have to be mindful of that.
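For what that throttle check could look like, here is a minimal sketch of a per-IP rate limit implemented as a Flask before_request hook; the window and limit values are arbitrary:

```python
import time
from collections import defaultdict, deque
from flask import Flask, abort, request

app = Flask(__name__)
WINDOW_SECONDS = 10.0
MAX_REQUESTS = 20          # per IP per window; arbitrary illustrative values
hits = defaultdict(deque)  # IP address -> timestamps of recent requests

@app.before_request
def throttle():
    now = time.time()
    recent = hits[request.remote_addr]
    while recent and now - recent[0] > WINDOW_SECONDS:
        recent.popleft()   # drop hits that fell out of the window
    if len(recent) >= MAX_REQUESTS:
        abort(429)         # too many requests: stop serving this IP
    recent.append(now)
```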
No, I'm not; it's a public API anyway. I have no idea. I could build this into a web service that you offer to commercial organizations for visual search, but there's a lot of improvement needed: it's very, very crude, the response time is slow, and you'd have to really take it for a ride and see how it holds up.

Why did I call it search singularity? OK, the idea behind naming the talk "search singularity" was, first, to grab attention, and next, to talk about how far search has improved compared to before, and how, especially with machine learning, it's going to get better and better. That's where singularity comes in: how remarkable is search going to be in the future? I hope this gave you a glimpse of it. In the future, you'll take your mobile phone, take a snap, and get tons of information about that snap. It could even tell you who a person is, their history, everything. That's going to happen.

The question was, if there were hundreds of people, could it correctly identify the right one? Well, not so much at this point. We're talking about something more domain-oriented: you can take this whole concept, apply it to the fashion domain, say, and make the search extremely accurate in a way Google hasn't done yet, because Google is very generic. You can take it to different domains and make it better and better. It's always ongoing, always evolving; there's no perfection in anything. Take any endeavour in life, any process: accuracy will keep improving. Right now, as you saw, when the features match well, the search is OK; sometimes you get very bizarre results as well. That's where you have to improve.

Honestly, I didn't want to experiment with extra features yet, because I wanted to make the application complete and running first. Perhaps in the next iteration I'd look at autoencoders and that kind of thing. The question is, why did I use Google's pre-trained model rather than other models? Well, if you're going to use TensorFlow, the official model available is Inception, and v3 is the latest from Google, very easy and quick to use. What I listed here is that there are other options as well: people in the community have taken the Inception v3 model and trained it further for certain domains. If a model were available for the fashion domain and I was building this service for fashion, I could very well take that community model, apply it, and see whether it gives better results, rather than training from scratch. The idea is that community models are an option too, and of late we have Inception v4 coming as well.

Did I compare my service's performance with the other services I listed at the beginning? Well, I just wanted to give you an idea with this very rudimentary prototype. The basic premise I listed is BYOS: build your own service rather than depending on somebody else.
So that's the idea, and no, I didn't compare. If you wanted to do a comparison, you'd have to go and get the APIs, push your images through, and so on. There could be some trials, but ViSenze isn't giving a trial, and iQnect and the other services aren't giving trials either; only Clarifai does, which is why I'm using its trial to do the object detection. Most of these services are very enterprise-oriented, so you have to pay them to get access. So I can't compare. Oh, thank you. That's it. So thank you.