My name is Abhijeet Dubey, this is Nitin, and we are students here in the CS department of IIT Bombay. We will be mentoring this project. So basically, what this project is about: as you have seen earlier, on the Sunbird platform there are lots and lots of videos from which you learn. There are also documents that you go through, and then you get tests and all. But there is no proper tagging involved. What happens as of now is that at the time of uploading, the instructor either provides manual tags or doesn't provide tags at all, so there is no organization of the content. What we are trying to do here is, instead of this manual tagging, extract semantic structure from the document, that is, its entities and attributes, and then provide tags for the document or video automatically. Nitin will give you the overview of what smart tagging is and how we are going to go about it.
Hi, I am Nitin, and I will explain why we are doing the smart tagging project. When we have an online learning platform with an automated evaluation process, we require all the content used in the evaluation to carry certain tags and metadata. That gives us data on where the students are lagging: which portions, which difficulty level, the time required to solve certain problems, and the skills required to solve them. Then, when a student has difficulty solving a certain set of problems, we can identify which areas he or she is weak in and provide extra resources for that student. So, to provide a customized learning experience, we need data on which type of content the student is weak or strong in. We can achieve that by tagging each piece of content with what type of content it is.
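To make the idea of finding a student's weak areas from tagged content concrete, here is a minimal sketch in Python. The attempt records, the tag names, and the `weak_tags` helper are all hypothetical and invented for illustration; the talk does not specify how this data is stored or scored.

```python
from collections import defaultdict

# Hypothetical attempt records: (tags on the question, answered correctly?)
attempts = [
    (["programming", "easy"], True),
    (["programming", "hard"], False),
    (["machine learning", "hard"], False),
    (["machine learning", "easy"], False),
    (["programming", "easy"], True),
]


def weak_tags(attempts, threshold=0.5):
    """Return tags whose fraction of correct answers is below the threshold."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for tags, is_correct in attempts:
        for tag in tags:
            total[tag] += 1
            correct[tag] += int(is_correct)
    return sorted(tag for tag in total if correct[tag] / total[tag] < threshold)


print(weak_tags(attempts))
```

Without tags on each question, there would be nothing to group the results by, which is why the tagging has to come first.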
What is mostly done in the industry, or at least what was done previously, is that the expert faculty would tag the content that they designed themselves. They would tag a machine learning course with the particular difficulty set that they had created. Even for a huge online learning course, the tagging used to be done by human beings. The problem with that is that it is an expensive process. People have to go through each individual video and determine its difficulty level; each video, each quiz, each document, any content that we have, has to be tagged individually by a person. It is expensive and it is time consuming. Another problem is the potential for human bias. Basically, if we divide the data set and give it to three people, what is difficult for one of them may be easy for another. So there is a possibility of bias creeping in, and we want to avoid all of this.
So we plan on automating the whole tagging system, as an extension for the Sunbird platform. What exactly is smart tagging? As Abhijeet explained, we try to identify the semantic structure of each individual piece of content and then predict which tags it can hold. For each video, we try to convert it into a text document or something similar, and then we try to work out that this may be related to theory, this may be related to artificial intelligence, this may be related to programming, and so on.
These are the skills that we expect from the interns. You should have basic programming skills and basic data structure and algorithm skills, and you should be familiar with at least one programming language, preferably Python. We will probably be using Python, but any other programming language will be fine.
You can pick up Python on the way. We will be using a lot of machine learning, so any familiarity with basic machine learning would be a plus, but it is not required. We will be using a lot of NLP, natural language processing, as well, and if you are familiar with any of that, that would also be a huge plus. But basically, you can learn NLP and machine learning here itself. We don't expect you to be familiar with machine learning and NLP from day one. You will get time to learn both. I will hand over the mic to Abhijeet.
So basically, what we are going to do over the period of this project is this. First, we will look at some basic ML and NLP terms and the approaches used. This would basically be a one-week thing where we get ourselves familiar with machine learning and natural language processing. Then we will read about the existing approaches for this tagging problem. Then we will use a couple of libraries to build a model: we will train a dummy model so that you get familiar with the basic libraries. After that we will improve the existing models and try to implement new ones. So basically it will be programming intensive. We expect you to explore the existing work, and if you are not familiar with machine learning and NLP, that's fine. We expect that after the completion of this project you will be familiar with what machine learning is, what NLP is, and how you can train your models. Another thing is that training these models on the data is computation intensive, so we are probably going to need GPUs for that. Once we have finalized our model and are sure that it is working fine, I'll give you access to one of the GPUs on which we can do the final training.
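To give a feel for the "dummy model" step mentioned above, here is a minimal sketch of a text tagger in plain Python. It is an assumption-laden illustration, not the project's model: the talk does not name the libraries or the algorithm, so this uses a small multinomial Naive Bayes classifier written by hand, and the `DummyTagger` class, the training sentences, and the tags are all made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class DummyTagger:
    """Tiny multinomial Naive Bayes tagger (illustrative only)."""

    def fit(self, documents, tags):
        self.word_counts = defaultdict(Counter)  # tag -> word frequencies
        self.doc_counts = Counter(tags)          # tag -> number of training docs
        self.vocabulary = set()
        for text, tag in zip(documents, tags):
            tokens = tokenize(text)
            self.word_counts[tag].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for tag, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[tag].values())
            score = math.log(n_docs / total_docs)  # log prior of the tag
            for token in tokens:
                # Add-one smoothing so an unseen word does not zero out the score.
                count = self.word_counts[tag][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[tag] = score
        return max(scores, key=scores.get)


# Hypothetical toy training data, invented for this example.
docs = [
    "gradient descent trains a neural network model",
    "a classifier learns from labelled training data",
    "loops and functions in python code",
    "write a function that sorts a list with a loop",
]
labels = ["machine learning", "machine learning", "programming", "programming"]

tagger = DummyTagger().fit(docs, labels)
print(tagger.predict("training a neural network"))
```

A model this small runs instantly on a laptop; the GPUs only become relevant once the models and the data get much larger.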
The final outcome is one API which will have the trained model in it. The input will be the document and the output will be the relevant tags for the document. This is not specific to Sunbird; it is a general tagging module. You can incorporate it into Sunbird or into anything else. One of the stretch goals we have in mind, if we can complete this within the timeline, is to start doing the tagging right from the video itself. Currently we are focusing on extracting tags from the document by analyzing the structure of the document. If we are able to complete that in time, then we will start predicting tags for the video itself. The initial component will convert the audio of the video into text, and then the existing text-to-tags module will be used. So this is one additional stretch goal. I'm not sure whether we are going to complete it or not, but yeah. After this, you'll basically be familiar with what machine learning is and what all the hype about machine learning is these days. Cool. Any questions? If you have any questions, you can mail us or contact us.
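As a rough illustration of the API described above (document in, tags out) and of the video stretch goal, here is a minimal sketch in Python. Everything named here is hypothetical: `KEYWORDS` is a keyword table standing in for the trained model, and `tag_document`, `tag_video`, and the `transcribe` callable are invented names. No real speech-to-text library is assumed, so the audio-to-text step is passed in as a function.

```python
import json
from typing import Callable, Dict, List

# Hypothetical stand-in for the trained model: tag -> indicative keywords.
KEYWORDS: Dict[str, List[str]] = {
    "machine learning": ["model", "training", "classifier", "neural"],
    "programming": ["python", "function", "loop", "code"],
}


def predict_tags(text: str, top_k: int = 2) -> List[str]:
    """Return up to top_k tags whose keywords occur in the text, best match first."""
    words = text.lower().split()
    scores = {tag: sum(words.count(k) for k in kws) for tag, kws in KEYWORDS.items()}
    ranked = sorted((t for t in scores if scores[t] > 0), key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def tag_document(document: str) -> str:
    """API entry point: document text in, JSON with the relevant tags out."""
    return json.dumps({"tags": predict_tags(document)})


def tag_video(video_path: str, transcribe: Callable[[str], str]) -> str:
    """Stretch goal: convert the video's audio to text, then reuse the text tagger."""
    return tag_document(transcribe(video_path))


print(tag_document("This python function uses a loop"))
```

Because `tag_video` only adds a transcription step in front of `tag_document`, the text-to-tags module can be built and tested first and reused unchanged if the stretch goal is reached.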