Good morning to one and all present here. Today, we, the members of the Ideal Rupal team, are standing before you to deliver a presentation on our project titled "Building a Raspberry Pi based device to ask questions and get answers in video format". Our team consists of three members, namely Navneet Agarwal, Tushank Gupta and myself, Shreya Mishra.

The entire Rupal project was mainly composed of the following modules: first, the Raspberry Pi based hardware device; second, the Android app development; and third, the video question analysis. Our team worked on the Raspberry Pi component, so our objective was to build a Raspberry Pi based hardware device to ask questions.

The main purpose of our project is to encourage people to ask more questions in a simple and hassle-free manner. Although several platforms exist to ask questions in textual format, very few of them support video format because of the complexity involved in video analysis. Videos enable the expert to answer questions according to the understanding ability of the asker.

The key feature of our Raspberry Pi device is its ease of use and access by anyone working in the institution where it is installed. Even a person who is not technically sound can use it conveniently: a student, a teacher or even non-technical staff can use it with equal ease. This generates ample scope for our machine to be installed in thousands of institutions and to encourage people to learn more by asking more questions.

To highlight the importance of video-based questions, I'd like to take an example. Look at this picture. Suppose the person in the picture asks, "How do I send a message using a smartphone?" Will you answer this question?
Obviously not, because, as is evident from the picture, this person seems to be a student, educated enough to know how to operate a smartphone. But if the same question is asked by this person, who appears to be a poor farmer or a labourer, then we obviously will answer it, because he is not educated enough to know how to use a smartphone.

For our project, we have used the following technologies: first, the Raspberry Pi, which is the core component of our project; second, FFmpeg, which we have used for recording audio and video in synchronization; third, Tkinter, a Python library which we have used for the graphical user interface; fourth, the face_recognition library, which we have used for matching questions with answers; fifth, python-vlc, which we have used for playing back the answer videos; and sixth, Spring Boot with MongoDB, which we have used for the development of the REST API.

Since our project is based on hardware, these are the hardware components: first, the Raspberry Pi (model 3B+ or higher); second, a 3.2 inch TFT display; third, a 16 GB or larger micro SD card; fourth, a USB webcam with mic; and fifth, a micro USB power supply. Now my friend Navneet will continue.

Our project mainly consists of seven independent modules. The first module is used for initialization. When the device is set up for the first time, initialization is necessary. It can be done by running the Python file init.py, which assigns a unique machine ID to the device; the institute ID is fetched from the central server by authenticating with an email and password. This process sets up the device under the institution's profile on the central server.

The second module is for recording videos. Here the FFmpeg command is generated and executed using a Python program.
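A minimal sketch of how such an FFmpeg invocation might be built and run from Python. The device paths, codec flags, and function names here are assumptions for illustration, not the project's exact command:

```python
import subprocess

def build_record_cmd(output_path, duration_s=60):
    """Build an FFmpeg command that records webcam video with synchronized audio.

    Input devices (/dev/video0, ALSA default) are typical for a Raspberry Pi
    with a USB webcam, but are assumptions here.
    """
    return [
        "ffmpeg",
        "-f", "v4l2", "-video_size", "320x240", "-i", "/dev/video0",  # USB webcam
        "-f", "alsa", "-i", "default",                                # webcam mic
        "-t", str(duration_s),                                        # recording length
        output_path,                                                  # e.g. question.avi
    ]

def record_question(output_path, duration_s=60):
    """Execute the generated FFmpeg command, raising on failure."""
    subprocess.run(build_record_cmd(output_path, duration_s), check=True)
```

Separating command construction from execution makes the recording step easy to test without a camera attached.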
Video is recorded in the AVI format because MP4 requires more hardware resources. The video resolution is set to 320 x 240 pixels.

The third module is the logger. The logger is used to generate the JSON files. A JSON file contains information related to the video, such as the name of the video, the time and date of its creation, the institute ID, the machine ID, as well as the extension of the video. Here the new-log-entry method is used to generate the JSON file. This module is also used to store and retrieve data from the SQLite database.

The fourth module is face recognition, which is used to map questions to answers. Here we use an open source Python library called face_recognition, which works on the dlib backend. The face encodings of a person asking a question are stored in a NumPy zip (.npz) file. Recognition accuracy can be altered by adjusting the tolerance value in the config.py file.

To record a video using the device: when the device is started, there are three buttons on the screen. The first is the start button, the second is the sync button and the third is the get answer button. To start recording, press the start button. It will first register your face and then recording will start. How does it work? It first generates the face encodings and stores them in the NumPy zip file. Then it generates a JSON file which contains the metadata related to the video, such as its name and other information. Then it starts recording the video. While the video is recording, it displays two buttons on the screen: the first is the stop button and the second is cancel. To save the video successfully, press the stop button. If you want to discard a video, press the cancel button. Now Tushank will continue.

The next module is used to play answer videos on the device. This works with the help of the VLC bindings for Python.
When a user presses the get answer button, face recognition is done first: the user is looked up in the database to find which questions he has asked, and the device checks if an answer is available for any of those questions. The first available answer gets played. This is how it works: first, the device captures the face encodings of the person in front of the camera. Then these face encodings are matched against the stored face encodings and a list of questions is generated. The device checks whether an answer to any of these questions is available, and the first available answer is played.

Another important module is the synchronization module. This module is responsible for sending question videos to the server and downloading answer videos from the server. It sends the videos and the JSON files to a REST API enabled server and stores entries in the SQLite database. An MD5 checksum is also computed to ensure successful file transfer.

The device info module displays three important properties of the device: the machine ID, the IP address and the storage currently left on the device. When the user presses the sync button for at least two seconds, this information is displayed as shown here. So you can press the sync button once to sync a video, or long-press it to see the device info.

This is how the synchronization of a video with the server works. First, the device sends a request with the JSON file containing the video's metadata; the server stores this information in a Mongo database and responds with a unique question ID. Then the device sends the video along with this question ID to the server. The server calculates the MD5 checksum of the file received and responds back to the device. If the MD5 matches that of the original file, the device adds an entry to the SQLite database and then deletes the video from the local file system. So this is the whole picture, the flow of information over the network.
The question videos are sent from RPi devices to the RPi device server, which forwards these questions to another central server, which also receives questions from Android and iOS apps. These questions are then forwarded to the video analysis server, which generates tags for each of these videos. These tags are used to identify the experts who have the required knowledge about these questions. The experts receive these questions on their devices, and when they answer, the answers get back to the original users via these servers. So this is the final outcome of our project, this device. Thank you.

So one question is: suppose I don't get a matching answer to my question. What do you do? You don't get an answer on the device?

Then the device won't get the answer; the question is not answered yet.

So is there any way to resolve it, to find out which questions are not answered?

There is a database that contains all the questions which are not answered. Once a question gets answered, its entry is deleted from the database.

So what do you do with the list of unanswered questions?

When the device syncs with the server, it checks all those questions, and if an answer is available for any of them, it is downloaded to the device.

My question is slightly different: is there any mechanism that you provide? I think it may be out of scope of your project. There could be some other module on the central server which will actually look for questions which are not actually answered, and then identify the potential experts who can answer those questions.

I think that could be an extension.

What is the size of the video that you record?

A one-minute video will be around 2-3 MB.

2-3 MB. And if the device gets filled up with that storage, I mean, if you have lots of videos coming into that device, what do you do?

Then it will give an error.
We haven't reached that yet.

Okay. So where will this device be used and who will use it? Where will you place it?

In institutions where there is a lack of internet connectivity. This device can be installed in village institutions, where students who can't afford smartphones can come and ask questions. Whenever the institute connects this device to the internet, it sends the questions to the server, and after they get answered, the answers are sent back to this device.

So first I will come and ask a question, and then later on I will come and check if there is an answer. Is it the same way to ask the question and reply to that question? Do we have to use the same device? How will it be answered, through video or through chat?

It will be answered through the Android app. The next team has made an Android app.

Yes. And you have used FFmpeg, and that is a very old technology.

There is no Python library which can sync both audio and video at the same time. FFmpeg is very good in that it can record video with synchronized audio easily.

That is okay. I know the codec; I think it was developed 20 years back. There are so many libraries available which can produce good quality audio and video at a smaller size. In future you should try those. So because of these limitations you used this?

Due to the power consumption problem as well: this is a small device and its processing power is also low.

Once I get the answer to my question, I will get a list of videos and I will see that these are the questions which have been answered. You said they are deleted. So do I have the option of deleting, or how does the answer video get deleted?

If the asker watches his answer, then after that the answer is deleted.

Automatically?

Right now it is automatic.

Because suppose I want to see the answer again after one or two hours.

In future we can have the option that one can decide when to delete it.
Okay, thanks. So when a video is recorded, are all the videos stored on this device?

The videos are stored on the device until the device synchronizes with the server. When a file is successfully transferred, the video is deleted from the device.

And you have mentioned one more server, the video analytics server. What is the use of that?

It generates tags about the video, such as which subject the video is related to. The third team will explain how the tags are generated.

Suppose someone asks a question and receives the answer, but he is not satisfied with the answer. What is your device going to do in that situation? You don't have any button or anything.

Right now it is not there, but this can be done in future.

But definitely, if I am not satisfied with the answer, or it was not properly answered?

Yes. Right now, maybe you have to record that question again, and maybe this time it will be answered by some other teacher who can satisfy that student.

I don't think the question should have to be recorded again; rather, there should be a way to mark that there was no proper answer. Whatever video-format answer has come, if I am not satisfied, it can be deleted from that particular device while waiting for another, because it is a forum, if I am not wrong. So there should be multiple answers coming for that particular question; it should not be stopped.

Yes. Right now, for one question we have only one answer; one video will come to our device.

Suppose I see that a particular question is there and a person has answered it, and I want to modify that answer or put up another, more elaborate one. That option should be there. It should be highlighted, and a "not satisfied" button should also be there. Just a point of reference for the future work.

Yes.

I think you are making a wrong assumption here. I believe you are not in control of whether your device will receive the answer or not. Once the main server is sending you the answer video, your device is supposed to receive it.
Once the device receives the video, it deletes that entry from the database so that it does not check for that same question again. Otherwise there will be a long list.

No, no, then it is wrong. See, on the server there is no limitation; the limitation is just for this device. Correct?

Yes.

And if you save textual information here, you can store a lot of data. I believe lakhs, maybe crores of entries can be in that list. The video cannot be there, but the question ID can still be there; it will take hardly one MB of space on this device. Correct? So your server should always maintain a list of questions asked by a particular device, and teachers are generating answers at any time, because it is a forum. The question will be there all the time on the Android devices, and teachers will generate the answer. Once the answer is generated, no matter when, even after a year, it should be synced back to the device. And there should be a limit: after 7 or 8 days, or 30 days, if nobody has watched it, delete it from the device. That's okay. But let many answers sync to the device; there is no problem in that.

As Yogendra said, textual information can also be there. A teacher may answer in text form too; I am talking about this particular device only. Since you have buttons, with the help of those buttons you can have the text information displayed, which can be scrolled up and down. The teacher can answer in video form and also give some text, and to view that text something should be there.

So it recognizes faces. Who is responsible for creating those face entries? How are they generated?

We are using an open source library, face_recognition. It captures the face encodings when the person first asks a question.

Suppose I am a new user; I don't have entries in the database.

Whenever you ask a question, your face encodings are first stored along with the question ID.
So it is stored automatically?

There is a procedure. You press the start button to start the recording. First it registers your face to generate the face encodings, and then the recording starts. In future, when you press the get answer button, it matches those stored face encodings with your current face encodings.

Suppose I have to ask another question then?

Then the first question for which an answer is available will be shown.

The first time, it recognizes my face. Later, if I ask more questions, will it recognize me again or will it take the old entry? How does it match?

The next time you ask a question, the face encodings are once again stored on the device along with the new question ID.

Then what are you doing with the old one? You are unnecessarily creating the face entries again and again. When one question is already answered, are the entries user specific or question specific?

Question specific.

They should be user specific also; one user can ask many questions.

Yes.

Tell me one thing. I am a new user of your device. I start recording a video; it is taking my face information and putting it somewhere in a file. Is that file a textual file or a database-related file?

Textual.

Textual information. Okay, so it is a binary file, and my information is stored there. Next time, again, I ask a question; it will again detect my face and try to identify it. Will it create a new entry or will it match with the existing entries?

It will create a new entry along with the question ID. For each question there is a face encoding.

Then there is a big problem. See, if I have asked 10 questions, and I am getting answers to all the questions here on the device, and I come there to get the answers, what will happen then?
Which answer will I get?

All of them will get matched to your face.

All of them will get matched. So then why are you creating multiple entries for a face? After generating the encoding, you can check whether that entry already exists or not, and simply match against it, rather than creating one more file. Getting me? There should be one entry per new face, not for a repeating face as well.

But then it will create problems in matching the answer to the question: which answer is matched to which question? Because we are recognizing the face, and based on that only we are matching.

No, that's okay. Let us say I have asked 10 questions. In terms of a table, one column will be the asker and one column will be the question ID. Correct? The asker will always be the same; you should associate the question IDs with the answers.

Yes. Right now we are associating a face encoding with the question ID.

The face ID, you are storing it locally?

Locally, yes.

For all the users?

All users who asked on this device, yes.

So once it is recorded, the face ID is there and the question is synced to the server. What about the face ID?

The face ID remains only on this device; it is not sent to the server.

Okay. If that is the case, then whenever the same person comes, the device should first go through that table of face IDs, recognize him and use the existing entry. It should not capture him again and create a new ID. If it is locally stored, you should use the same entry: try to identify, see that the identity already matches, and then don't create a new entry. Otherwise there will be multiple entries, taking up space, and unnecessarily more processing will happen again.

Yes.

Do you get a different signature each time? I call it a signature because you encode my face and put it somewhere, maybe in binary. Right?

Yes. Okay.
So does your algorithm give a single signature for the same face even after 10, 20 or 30 days?

I think yes.

How have you checked it?

We tested it.

What was the test? How have you tested it?

Yes, it works.

So what do you capture? When you see a face, what do you look for, and what is the algorithm?

We have not developed the face recognition algorithm ourselves; it is an open source library.

What is the library?

face_recognition (face underscore recognition). It works on dlib.

Yeah, thanks. Thank you.