it, like, by showing you the code, the prototxt files to be specific. So what we will try to do is the same thing visually, as one of our activities. But apart from that, one more thing I would like to share: there was an LSUN saliency challenge, where our lab was declared the winner of that saliency challenge with DeepFix. So first of all, it was 2013 when I started my work in deep learning at IISc, and my work was in salient object segmentation. I was the one who actually installed Caffe in our lab. Before that we used to work on things like conditional random fields, or SIFT, or wavelets, and this deep learning completely changed the way our lab works. So my talk will try to cover what a CNN is, at first. Also, one more thing: we use a Titan X. It's very cheap and you can use it to do your training and so on; NVIDIA also sponsors our research. So what I will try to cover is, first of all, what a CNN is as such. Then I will talk about the tool which I developed. I developed it because I found that there was a need for a visual representation of the layers, and that's where I started. At that time there was no DIGITS or TensorFlow or anything like that; they came very recently. And also we will have a fun activity. I know that a lot of jargon will be there, so just brace yourself. If you go into the signal processing literature, you will find a very good explanation of what convolution is: it's nothing but how a system responds to an input. But for 2D, it becomes even simpler: it's nothing but a filter which is applied, like a soap, all over the previous input. I should have mentioned the references, by the way; this figure is from a reference, the previous one I created myself. But the point is that if you see the right side, you will see how convolution is nothing but applying a filter.
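The idea of sliding a filter over the input can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the talk; strictly speaking it computes cross-correlation (the filter is not flipped), which is what CNN frameworks actually compute for their "convolution" layers.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small filter over a 2D image (valid mode, no padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # response at (y, x) = sum of element-wise products
            # between the filter and the patch under it
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2)) / 4.0   # a simple 2x2 averaging (blur) filter
print(conv2d(image, kernel).shape)  # (3, 3): the filter fits in 3x3 positions
```

The same loop, with many filters stacked and the weights learned by back propagation, is exactly what a convolutional layer does.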
So how does a simple neuron differ from a convolutional neuron, or a convolutional layer to be specific? If you take a simple neuron, let's say there are three units in the previous layer and two units in the next layer, which is the feature map. The total number of connections will be 3 × 2 = 6 for a fully connected layer; a fully connected layer and a simple neuron are the same thing. Before convolutional neural networks came, it was just fully connected. But in a convolutional neuron, what you try to train is restricted by the weights. Between the input layer and the output layer, if you have, let's say, only a 1×1 weight blob, then you restrict your training to just that 1×1 weight, so only one parameter is changed all the time. Another way to look at it: let's say you have the restriction that you can have only one layer, and you need to decide whether to use a convolutional layer or a simple fully connected layer. What would you do? There are two problems I would ask you about, like what's intuitive. If you have an image and you want to decide whether that image is of a cat or a dog, given that the images contain only cats and dogs, what would be more sensible? There it is important that every pixel in the image has a connection from the previous layer to this cat-or-dog category; that is what a fully connected layer, or a simple neural network, gives you. But if you have a particular image and, let's say, you want to make it sharp (although there are static filters for that, say you want it learned for some problem), then you would probably prefer training a filter, not an all-to-all connection. That is where the convolutional neural network changes things. You can also think of it like a cake, where you have an image and a lot of stacked-up filters and pooling layers. I am very happy that Abhishek discussed all this some time back.
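The parameter savings from weight sharing can be made concrete with a quick count. A hedged sketch, using made-up layer sizes just for illustration:

```python
def fc_params(in_units, out_units, bias=True):
    # fully connected: every input unit connects to every output unit
    return in_units * out_units + (out_units if bias else 0)

def conv_params(k, in_channels, out_channels, bias=True):
    # convolutional: one k x k filter per (in, out) channel pair,
    # shared across all spatial positions of the image
    return k * k * in_channels * out_channels + (out_channels if bias else 0)

# a 32x32x3 image flattened into a 10-way fully connected layer,
# versus a single 3x3 conv layer with 16 output feature maps
print(fc_params(32 * 32 * 3, 10))   # 30730 parameters
print(conv_params(3, 3, 16))        # 448 parameters
```

The convolutional layer trains two orders of magnitude fewer parameters here, precisely because the same small filter is reused at every pixel.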
So these layers are stacked over one another, and the hot chocolate is nothing but back propagation, which changes all the weights down the stack. I think it's close enough, so when you go and eat cake, just think of it as a CNN. Now, this is the way the pipeline is; I just like creating some animations at times when I'm free. So what are the applications of CNNs? CNNs are useful for a variety of tasks; we use them for depth estimation, for example. My work specifically is in saliency, this eye fixation prediction. When you look at a scene, let's say I'm looking at you all, my eye will roll around a few people; I am trying to find out where my eye is fixating. I will go around a few important people, or a few people who are in, let's say, a crowd or something. So my work was to understand how eye fixation works, and we have some research about that published some time back. There are other applications I want to mention too. MNIST has become really famous after Geoffrey Hinton and Andrew Ng made it really popular. In MNIST, you have a scribble of some particular digit and you want to find out which class it hits. Let's say an 8 is there, so it should hit the right coordinate, 8. But let's say it hits the wrong coordinate: you have a probability vector, and if the arg max of that probability vector is not hitting the correct coordinate, you are trying to make it hit right. That is the object recognition problem, in this case character recognition. The other things are pose estimation and depth estimation. In the case of depth estimation, the output is not just a single 1×4096 kind of layer; it is rather a 2D layer, so there the way you back propagate changes drastically. So first of all, how did I come to the problem of why you need Expresso, if you already have a lot of tools?
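The "hit the right coordinate" idea is just the arg max of the network's probability vector. A tiny sketch with a hypothetical softmax output (the numbers are made up):

```python
import numpy as np

# hypothetical softmax output for one MNIST digit:
# ten class probabilities, one per digit 0-9
probs = np.array([0.01, 0.02, 0.01, 0.05, 0.02,
                  0.04, 0.03, 0.02, 0.78, 0.02])
true_label = 8

predicted = int(np.argmax(probs))   # index of the highest probability
print(predicted == true_label)      # the arg max hits the right coordinate
```

Training is the process of pushing probability mass onto the correct coordinate for every image, so that this comparison comes out true as often as possible.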
It is built over Caffe. I chose Caffe because it's really fast, for the same reasons which Abhishek mentioned. What it does is that it takes Caffe's prototxt and makes it visual for you. And not just that, it allows you to visualize the responses in so many ways. Another advantage: when an industry develops a tool, it develops it for its own usage, but when a research lab develops one, it also tries to integrate the plethora of tools which, at least for us, the computer vision community commonly uses. So DenseCRF, LibSVM, tools like that are very commonly used by the computer vision community, and here they are integrated with the deep neural network. All these tools are integrated, and we were planning to integrate a lot of other tools with it as well. The other thing is, you know, you have to sit back when you start training. I remember the first speaker was telling us that training time was reduced from months to a week; I would say it has been reduced from a week to days, and not just days, to six hours. GPUs have been really pivotal in the growth of convolutional neural networks. The thing is, when you start training, you have a lot of concurrent tasks: you might want to train multiple things, you might want to import data alongside, you might want to visualize. All these need to be concurrent, so it needs to be parallel. So what I did is I made it multi-threaded, and each thread creates a subprocess, things like that. I tried to have concurrency, and there were some challenges because I was using PyQt, so I had to face some issues. The other thing is that whatever you see as a GUI has metadata behind it. Designing that metadata is very important, because if you try to overdo things, then it becomes really clumsy.
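The threads-spawning-subprocesses pattern can be sketched in plain Python. This is a minimal illustration of the pattern, not Expresso's actual code; the task commands are placeholders:

```python
import subprocess
import sys
import threading

def run_task(cmd, results, key):
    """Each thread launches its job as a subprocess, so training,
    data import and visualization can all run at the same time
    without blocking the GUI's main loop."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    results[key] = proc.stdout.strip()

results = {}
threads = [
    threading.Thread(
        target=run_task,
        # placeholder "work": each subprocess just prints a message
        args=([sys.executable, "-c", f"print('task {i} done')"], results, i))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results.values()))
```

Running the heavy work in subprocesses also sidesteps the GIL, which matters when one of the concurrent tasks is CPU-bound.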
So you want to make it very simple yet complete. That was the whole thing, and we integrated it with different tools also. Something like this: async threads creating subprocesses, because in PyQt you can only do it this way. So this is how it works. Now let's start the activity which I was talking about. But yeah, one second. I will be using the CIFAR-10 dataset. CIFAR-10 has 10 labels, with 50,000 train and 10,000 test images. Each of them has dimension 32×32×3, which is RGB, and the labels associated are 0 to 9. That's the CIFAR-10 dataset. So, first task. This is the tool which I am talking about, which I developed and which was reviewed by Ravi. In this tool you can see four views. The first task is training. So just pay attention: if you paid attention to what Abhishek did, he described the train, solver and deploy prototxts, and I will be doing the same thing in a visual way. Here in this data view, you will be able to visualize what data you have imported, and you can have different formats, as you can see. First I'll just load the data for both train and test. You can see here that the data is imported, and along with it the cardinality, everything is visible. Also in the top right corner you will see a notification bar, which will indicate that the import has happened. Then you just import a particular train/test net, I mean the net which is responsible for training. So here you can see... let me just... You need to understand it's a researcher's work, not an industry work. What we try to do is give a visual understanding, because there are some intricate calculations happening. If you have a stride or some pooling parameters in a layer, then the next layer's output becomes different. All these things need to be visualized, and that's why we added it.
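The CIFAR-10 layout just described can be written down as array shapes. A small sketch with placeholder arrays (no real download, just the dimensions from the talk):

```python
import numpy as np

# CIFAR-10 as described: 50,000 train / 10,000 test images,
# each 32x32 pixels with 3 RGB channels, labels in 0..9
train_images = np.zeros((50000, 32, 32, 3), dtype=np.uint8)  # placeholder data
test_images = np.zeros((10000, 32, 32, 3), dtype=np.uint8)   # placeholder data
train_labels = np.random.randint(0, 10, size=50000)

print(train_images.shape)                       # (50000, 32, 32, 3)
print(int(train_labels.min()), int(train_labels.max()))
```

These are exactly the cardinalities and dimensions the data view shows after the import.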
I'll just finish up. Then we add some auxiliary parameters here, and we select the net here. Now we go to the train view for the actual training. We select the solver parameters, which are required for back propagation, and then we select the data. Finally we see the training in progress. Here you can see that the cross-entropy loss is actually decreasing very rapidly from where it started, and that progress is also visible here. Once you train, the trained model is available here. Now all you need to do is benchmarking. So what you do is, for the test dataset, you extract the features from the probability layer: you select the probability layer and click on Evaluate. What it will do is take the arg max of that probability, compare it with the actual labels and see whether it is different or not. If it is different, then it's a zero; if not, a one; and from that it will calculate some accuracy figures. And just very quickly, I will take one more minute. For visualization also, we have two kinds of visualization. Let's say I have nine images and I want to visualize the responses of those nine images for a particular layer, say conv1 or pool1; that you can visualize. But you also have the facility of visualizing, for a particular image, the responses from all the layers. These two things are provided, and you can also apply a pipeline of operations like log and invert. Log is important because it balances the differences in the responses; it levels things out. So yeah, I would conclude my talk now, and all I would request is... So this is the summary: Caffe trains, it creates a Caffe model. You can also fine-tune, by the way. Then this Caffe model goes through the test data; you give it images, and it will also do feature extraction. And you can...
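The benchmarking step (arg max versus label, then average) and the log visualization op can both be sketched in a few lines. A minimal illustration, not the tool's actual code; the example probabilities are made up:

```python
import numpy as np

def accuracy(prob_matrix, labels):
    """Benchmarking as described: arg max each probability row,
    compare with the true label (1 if equal, 0 otherwise), average."""
    preds = np.argmax(prob_matrix, axis=1)
    return float(np.mean(preds == labels))

def log_scale(responses, eps=1e-6):
    """The 'log' visualization op: compress large differences between
    layer responses so faint activations remain visible alongside
    strong ones. eps avoids log(0)."""
    return np.log(np.abs(responses) + eps)

probs = np.array([[0.1, 0.9],    # predicted 1
                  [0.8, 0.2],    # predicted 0
                  [0.3, 0.7]])   # predicted 1
labels = np.array([1, 0, 0])
print(accuracy(probs, labels))   # 2 of 3 correct
```

The same `accuracy` idea, run over the whole 10,000-image test set, is what Evaluate reports.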
There is a GitHub link, and there is a Google group. You can also contribute to the code; it's open source. So that's all. And yeah, questions. And then there is lunchtime. I know you are hungry. So...