In this lecture we will talk about the course project; there is a small project in this course. Before that, we will describe the data, how we processed it, what each column in the data sheet means, and what you have to predict. The course project is based on the MOOC data we collected from IITBombayX. In week 2 we discussed what data can be collected from a MOOC. To recollect: we can collect the learner (student) ID, the session ID, the IP address, the forum discussions, the numbers of upvotes and downvotes, and the learners' video-watching behavior, such as whether they seek within a video, watch it at different playback speeds such as 1x or 1.5x, or turn on captions. We also looked at two sets of data being processed. One was raw data from the MOOC courses recording that a student performed a seek in a video and where in the video the seek happened; the other was similar raw data for scrolling through a book or PDF in the MOOC. We have collected and pre-processed this data, and we will provide you with the pre-processed data for the course project.

Now I will show you the pre-processed data in an Excel sheet and explain the goal of this project and what you have to do. The goal of this course project is to predict when a user will drop out. The first column is the user ID, which is anonymized. Each row in the sheet is one week of data, that is, one user's interaction with the MOOC during one week. The first user took the course for 5 weeks (weeks 1, 2, 3, 4 and 5). The second user also interacted for 5 weeks. The course itself ran for only 4 weeks, so a user who interacted for 5 weeks has completed all the weeks of the course.
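Because the sheet has one row per user per week, a first step is usually to group the rows by user and see which weeks each user was active in. The sketch below is a minimal illustration under stated assumptions: rows are reduced to hypothetical `(user_id, week)` pairs, and the helper names `weeks_per_user` and `completed` are our own. The completion rule (active in every week up to `required_weeks`) is an assumption for illustration only; the project data supplies its own dropout label.

```python
from collections import defaultdict

def weeks_per_user(rows):
    """Map each user ID to the set of week numbers in which it has a row."""
    weeks = defaultdict(set)
    for user_id, week in rows:
        weeks[user_id].add(week)
    return dict(weeks)

def completed(active_weeks, required_weeks):
    """Assumed rule: completed if the user is active in every week 1..required_weeks."""
    return set(range(1, required_weeks + 1)) <= active_weeks

# Hypothetical sample: user "u1" active in weeks 1-5, user "u7" only in weeks 1-3.
rows = [("u1", w) for w in range(1, 6)] + [("u7", w) for w in range(1, 4)]
activity = weeks_per_user(rows)
status = {user: completed(weeks, required_weeks=5) for user, weeks in activity.items()}
print(status)  # {'u1': True, 'u7': False}
```

Using a set of weeks rather than a simple count means a user with a gap (for example weeks 1, 2, 4, 5) is not mistaken for one who was active every week.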
Similarly, user 3, user 4 and most of the others interacted for 5 weeks. A few users interacted with the course for the first two or three weeks and then dropped out. We want to predict which users will drop out, and in which week. For example, user 1 interacted with the MOOC for 5 weeks and completed the course, so this student did not drop out.

Now the columns. After the user ID comes the on-slide seek count. Seeking means moving along the video timeline while you watch: for example, after watching for 1 minute you can seek back, or jump ahead to the 2-minute mark. This column counts how many times the student sought in videos during that week. Next is the week number: a user may have rows for 5 weeks, 4 weeks, or only 1 or 2 weeks, with one row per week. Then comes the number of times the user interacted in a forum (for example, 2 times), and the number of minutes the student spent watching videos (for example, 1 minute, 67 minutes or 79 minutes). Another column indicates how many times the student actively participated in the discussion forum, and another counts navigation events, that is, how many times the student moved from one page to another, for example from a video to another page. The time columns record how long the student spent on the course, broken down by each type of page in the LMS. The grade column shows the student's performance in that week's assignment; we will indicate the maximum marks for each week. The attempts column records how many times the student attempted that assignment, whether once or several times. The vote column counts how many times the student upvoted someone's comment or discussion thread, and another column counts how many times the user participated in a thread.
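One way to keep the weekly columns straight is to model a row as a small record type. The sketch below covers only a subset of the columns described above, and every field name is a hypothetical stand-in for the real column headers, which will come with the project. The `grade_fraction` helper assumes the week's maximum marks are supplied separately, as the lecture says they will be.

```python
from dataclasses import dataclass, astuple

@dataclass
class WeekRecord:
    """One sheet row: one user's activity in one week (hypothetical field names)."""
    user_id: str
    week: int
    seek_count: int          # number of video seeks in the week
    forum_interactions: int  # forum interactions in the week
    video_minutes: float     # minutes spent watching videos
    navigation_count: int    # page-to-page navigation events
    grade: float             # marks scored on the week's assignment
    attempts: int            # attempts on the assignment
    votes: int               # upvotes given to comments or threads

    def grade_fraction(self, max_marks: float) -> float:
        """Grade divided by the week's maximum marks (0.0 if max_marks is 0)."""
        return self.grade / max_marks if max_marks else 0.0

    def features(self):
        """Numeric activity values only, dropping the user ID and week number."""
        return list(astuple(self))[2:]

record = WeekRecord("u1", 1, 3, 2, 67.0, 10, 8.0, 1, 0)
print(record.grade_fraction(10))  # 0.8
print(record.features())          # [3, 2, 67.0, 10, 8.0, 1, 0]
```

Normalizing the grade by the week's maximum marks keeps weeks with different mark totals on a comparable scale before they are fed to a classifier.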
Another column records whether the user searched the forum. The forum has a lot of comments, and a user can search for any keyword in it; this column gives the number of searches. Further columns record how often the user looked at info pages or inline comments, jumped to course pages or to a particular HTML page, closed a page, played a video, failed a problem check, and created a problem. We will provide the details of each column with the project.

So what do you have to predict? Student dropout. For each user we have weekly interaction data from the LMS, together with a label saying whether the student dropped out or not, so this is a binary classification problem. You need to load this data into the machine-learning software Weka, build a binary classifier, and use 10-fold cross-validation to predict student dropout from the given data. With the course project we will provide the details of each column, what to predict and which software to use, and we will also talk about cross-validation. That is all about the data. Thank you.
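Weka performs 10-fold cross-validation for you (it is offered as a test option in the Explorer), so no code is needed for the project itself. To show what the procedure does, here is a minimal stdlib-only sketch. It splits the examples into 10 folds, keeping the dropout/non-dropout ratio roughly equal in each fold (stratification), trains on 9 folds, tests on the held-out one, and repeats for every fold. The model is a placeholder that always predicts the majority training label, similar in spirit to Weka's ZeroR baseline; it is not the classifier you are expected to build, and all function names are our own.

```python
import random
from collections import Counter

def stratified_folds(labels, k=10, seed=1):
    """Assign each example index to one of k folds, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    pos = 0  # continues across classes so fold sizes stay balanced
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for i in idxs:
            folds[pos % k].append(i)
            pos += 1
    return folds

def fit_majority(X_train, y_train):
    """Placeholder model: always predict the most common training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda x: majority

def cross_validate(X, y, fit, k=10, seed=1):
    """Accuracy over k-fold CV; fit(X_train, y_train) must return a predict function."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(y)

# Hypothetical data: 30 dropouts (label 1) and 70 completers (label 0).
X = [[i] for i in range(100)]
y = [1] * 30 + [0] * 70
print(cross_validate(X, y, fit_majority))  # 0.7
```

Every example is tested exactly once, by a model that never saw it during training, so the averaged accuracy is an estimate of how the classifier would do on new learners. A real dropout model should beat this 0.7 majority baseline on such data.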