Hi everyone, this is my first time here at Hagwell. As introduced, I work right across the street at Franklin, as an engineer in a startup that's into cognitive media solutions. Be it surveillance or fashion attributes, everything today uses artificial intelligence, and at the core of it, deep learning.

I participated in the Red Hat Open Innovation Labs Hackathon 2017. There were five problem statements, which you can check out from the paper if you want. We chose the one on crowd management in common spaces, where we had to collect parameters like the number of footfalls, heat maps, and feedback from the visitors. Suppose you are in some MRT station and you want to calculate the footfall: there are various ways to do it. One is using sensors, another is using, say, a camera. We went with the camera-based solution because we wanted to leverage the computing power of the Raspberry Pi to the fullest. And it was not only a computer-vision-based solution: we wanted to integrate it with cutting-edge artificial intelligence sitting on an Amazon EC2 instance, because the Raspberry Pi doesn't have a GPU and artificial intelligence requires heavy computation. So we linked the Raspberry Pi to the Amazon EC2 cloud to give the first real distributed artificial intelligence solution. That was the problem statement, basically: we had to do the number of footfalls and heat maps, and for feedback nothing was specified, but we will see how we did it.

The underlying theory of footfall is that we had to detect the ups and the downs, people moving in and people moving out. For now we use computer vision, and I will just demo how we do it. For example, here is a video I have selected, and as you can see, for humans it is very easy to count how many people are moving in, say, the up direction and the down direction, but for a computer it is surprisingly difficult.

To start with, the Raspberry Pi doesn't have much processing capability as far as images are concerned, and we wanted to do two things: the image processing on the Raspberry Pi itself, and as close to real time as possible. The problem with current cameras is that they are full resolution, thousands by thousands of pixels, so the search time for a particular feature grows that many times, on the order of O(n²), as we all might know. So we took a theory into practical application: it is called the image pyramid, and in computer vision terms it is called either upsampling or downsampling. You resize the image keeping the aspect ratio constant, which means your feature is preserved, it does not get distorted, and we used it to reduce the search time for a particular feature.

Now, how do we detect moving objects? For instance, there are two frames here: one is called the background model and one is the current frame. When you subtract them, the background gets subtracted totally, and only the moving objects, the ones that are in this frame but were not in the previous one, appear. Then you threshold it to get a black-and-white image. In computer vision this is called a blob, BLOB. So I will show you: the same video, and as you can see we have got blobs here. These are the blobs extracted by pixel differencing and background subtraction. Now if I run this video again, as you can see on the left it is also counting; I will just move this frame. What it does, basically, is that when the blobs are detected, we trace each blob using its external perimeter, which is called contour detection, then we fit a bounding box around it and find the center of that bounding box. We compare the current frame with the previous frame and apply a theory called the Hungarian algorithm, which is nothing but data association: a point in the current frame gets associated with a point in the next frame, and the best or nearest match is selected. And not only can it count, it can do other crazy stuff as well: for example, if the camera calibration is right, it can give you the velocity of the objects too.
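To make that concrete, here is a minimal sketch of the whole chain just described (pyramid downsampling, background subtraction, thresholding, contour detection, and Hungarian data association), assuming OpenCV 4, NumPy, and SciPy. The file name, the counting-line position, the first-frame background model, and the area and distance thresholds are all illustrative choices, not values from the talk.

```python
import cv2
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

cap = cv2.VideoCapture("station.mp4")          # hypothetical input video
ok, first = cap.read()
# Crude background model: the first frame (the talk's model may be richer).
# Image pyramid: one pyrDown halves each dimension with the aspect ratio
# preserved, so the per-frame search cost drops by roughly 4x.
bg = cv2.cvtColor(cv2.pyrDown(first), cv2.COLOR_BGR2GRAY)

LINE_Y = bg.shape[0] // 2                      # virtual counting line at mid-frame
MAX_JUMP = 50                                  # reject matches farther than this (px)
prev_pts = np.empty((0, 2))
count_up = count_down = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(cv2.pyrDown(frame), cv2.COLOR_BGR2GRAY)

    # Background subtraction + threshold -> binary blob (BLOB) mask
    diff = cv2.absdiff(bg, gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

    # Contour detection around each blob, then bounding-box centres
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    pts = []
    for c in contours:
        if cv2.contourArea(c) < 200:           # drop tiny noise blobs
            continue
        x, y, w, h = cv2.boundingRect(c)
        pts.append((x + w / 2.0, y + h / 2.0))
    pts = np.array(pts).reshape(-1, 2)

    # Data association: the Hungarian algorithm matches the previous frame's
    # centres to the current ones by minimising total pairwise distance.
    if len(prev_pts) and len(pts):
        cost = cdist(prev_pts, pts)
        rows, cols = linear_sum_assignment(cost)
        for r, c in zip(rows, cols):
            if cost[r, c] > MAX_JUMP:
                continue
            y0, y1 = prev_pts[r, 1], pts[c, 1]
            # The per-frame displacement also gives a pixel velocity; with
            # camera calibration that maps to real-world velocity.
            if y0 > LINE_Y >= y1:
                count_up += 1                  # crossed the line moving up
            elif y0 < LINE_Y <= y1:
                count_down += 1                # crossed the line moving down
    prev_pts = pts

print("up:", count_up, "down:", count_down)
```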
Now, one might ask why we use this, or what the application is. It has tremendous applications in surveillance. We get a lot of attributes associated with each object. This is a computer-vision-based solution, remember, but if you integrate TensorFlow with it and you have a classifier or a detector, then, for example, in a video scene where cars are moving by and you want to count them, not only can you count them, you can also map each one to the car make, model, license plate, an ID number, and a timestamp. So if something happens, let's say in the future, you don't have to go through the entire video; you can just go to the particular event and you will get the results.

So this is how we did it. These are the attributes we generated: count in, count out, total in, AHU percentage. For example, in this room, if the number of people increases, the air-con temperature should go down, but it doesn't, because currently it is set-point based. So we actually proved, using an LED, that when the number of people or the crowd density increases, you can have an automation protocol linked with it. AHU is the air handling unit, a frequent term for the chillers and mechanical components used to control the air-con.

For heat maps we came up with an approach on the spot. These are the blobs rendered, for example. What we do is reshape the image into the form of a square and divide it into n × n grids. Since it's a black-and-white image, we know the area of each square, and we divide the number of non-zero pixels, that is the white pixels, by the total area; if the ratio is greater than some percentage, we give that cell a colour. So wherever the blob moves, depending on how much of each grid cell it occupies, the heat map is made.
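As a small sketch of that grid idea: assuming `mask` is the binary blob image from the subtraction step above, split it into an n × n grid and rate each cell by its fraction of white pixels. The grid size, the trimming shortcut, and the colour thresholds are illustrative choices.

```python
import numpy as np

def grid_heatmap(mask, n=8):
    """Split the blob mask into an n x n grid and return per-cell occupancy:
    the fraction of non-zero (white) pixels relative to each cell's area."""
    h, w = mask.shape
    mask = mask[: h - h % n, : w - w % n]      # trim so both sides divide by n
    cells = mask.reshape(n, mask.shape[0] // n, n, mask.shape[1] // n)
    return (cells > 0).mean(axis=(1, 3))       # n x n occupancy values in [0, 1]

occ = grid_heatmap(mask, n=8)
# Colour-code each cell by occupancy: 0 = cool, 1 = warm, 2 = hot
levels = np.digitize(occ, [0.2, 0.5])
```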
Okay, user feedback: that was completely automated. Currently, if you look at the subway stations, the MRT stations, when you move out there are a few buttons you have to press, like "you are happy" or "you are sad". We wanted to do it live and in an automated way, so we used the Microsoft Vision API for face recognition. At a particular spot in the image, whenever people pass, we take their face readings and map their emotions directly to the database. That means we don't require anyone to press anything; the face says it all. For example, if some line is not working, a lot of people will be either angry or upset. Usually it doesn't tell you much, but when something goes wrong, it gives you an idea that yes, something is wrong.

Now, future expansion, which we actually demoed, though right now I don't have Wi-Fi so I cannot connect to the cloud. The thing is, in the blob-detection method we have discussed, even if a dog passes, it is detected as a blob. So we take the rectangle that is generated and pass it through a classifier, and we classify it as a person, or whatever the class is, depending on the number of classes; the training improves as the number of samples increases. We have got multiple classes like cats, dogs, cars, bikes, and persons, so we detect a particular object and map it to the attribute. If we want to count only persons, we will count only persons; there is a small sketch of this step after the Q&A below. Future expansion is a CNN-based object detection scheme, and it can be an object classification scheme as well. And one reason why we came runners-up and didn't win the competition is that, as they said, the solution had to be end to end, it had to have an app, and we couldn't make a web app or a messenger bot. So end-to-end integration with the web would be the next thing. That's about it, thank you. Any questions are welcome.

To the question about the video from the top: the people there are quite dark in the center, but in images that have a lot of light, the blobs can very quickly become a cloud of points, and subtraction is the most simple method. In videos like this, subtraction really is easy, I agree, but when it's a low-hanging camera where people appear bigger in size, we usually go for object detection first, followed by tracking. That means the detection paradigm would be AI, machine-learning based; for example, the current best-performing detectors are YOLO, that is You Only Look Once, which is built on top of TensorFlow. Any other questions? Thank you.
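To round the talk off, here is a small sketch of that classification step from the future-expansion discussion: the blob's bounding rectangle is passed through a classifier so that only the wanted class, say persons, gets counted. The team's actual classifier ran on TensorFlow; as a self-contained stand-in, this sketch uses OpenCV's built-in HOG person detector instead, and the resize size and weight threshold are illustrative.

```python
import cv2

# OpenCV's pre-trained HOG + linear-SVM people detector (a stand-in for the
# TensorFlow classifier described in the talk)
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def is_person(frame, box, min_weight=0.5):
    """Return True if the blob's bounding box (x, y, w, h) looks like a person."""
    x, y, w, h = box
    crop = frame[y:y + h, x:x + w]
    crop = cv2.resize(crop, (128, 256))        # comfortably above HOG's 64x128 window
    rects, weights = hog.detectMultiScale(crop)
    return len(rects) > 0 and float(max(weights)) > min_weight

# Usage: count a blob only when is_person(frame, (x, y, w, h)) returns True,
# so a passing dog no longer inflates the footfall count.
```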