In this learning dialogue, we will discuss data collection in a MOOC environment. You are taking this course on a massive open online course platform, that is, a MOOC. Assume that you are a course administrator in the MOOC: you have access to all the programs and all the data, and you can write whatever scripts are needed to collect data. With that administrative access and the ability to write code, what data would you collect about the learners in the MOOC? You can pause this video and write down your answers; after completing this task, resume the video to continue.
For data collection in a MOOC, there are some common parameters: the timestamp of each event or action; the learner ID, which tells who the learner is; and the session ID, because the same learner can log into the MOOC multiple times in a day or over time. So the session ID is important. There is also the IP address, to know where the user is logging in from. You can collect this information for all users across the MOOC. This is key information: it identifies which user, which session, and where the user is from, and the timestamp gives fine-grained data about when the event occurred.
Apart from this, you might have thought of collecting page views, discussions, navigation, and behaviors in the video. I would like to go through each one in detail. For a page view, we need to note which page was viewed, how much time the student spent on that page, what the page title is, and where the student is heading next.
Depending on the MOOC platform, you can perform various activities in the discussion forum. On a comment, you can delete, reply, report, downvote, or like it. On a thread, you can create, delete, follow, unfollow, update, or just visit it. In a visit, you only view the thread: you read it, but you take no action in it. You have not created any comment and you have not followed anything.
How do we know whether you visited a thread and read it or not? If the timestamps tell us that you were in the thread for more than several seconds, say three or five seconds, you might have been looking at something in the thread. But it is also possible that you opened the thread page, went away, and came back after five minutes. So there should also be an upper threshold for deciding whether a student is reading a thread, just glancing at it, or creating something in it. In the forum, the learner can also search for something and follow a particular user. These kinds of activities should be logged in the MOOC.
Apart from this, the student navigates in the MOOC from one page to another. This navigation behavior can also be logged. A MOOC basically contains three main things: resources, videos, and the discussion forum. So we need to log all the information as students interact with the MOOC. For video behaviors, you can collect more information, such as play, pause, seek, speed change, and transcript use.
Let us look at raw MOOC data collected from an IITBombayX course. In the raw data, the username is anonymized, the event source is browser, and the event name is seek video. The learner is seeking in the video, and the timestamp tells when it happened. The user ID is also there, along with the organization offering the course, IITBombayX, and the course ID. For a seek event, the context we need is from which time to which time the student moved in the video. The new time is 0.557 and the old time is 625.9, so the student seeks from 625.9 to 0.557. The seek type is on-slide seek, so he used the slider to seek. So, if a particular event or action is seek video, we need to know a couple of things: which particular video the student was seeking in, and the starting and ending times of the seek.
Let us look at another example. In this example, the event source is browser.
This is a textbook event; the event name is actually scrolling. The student is scrolling the reading resources. It has the details of which page, where it is from, again the student ID, and everything. Apart from this, it has other important information, namely which direction the student is scrolling: downward or upward? We may not need such rich data, but it is good to log all of it to understand what the student would have done at a particular time and why the student is not performing well at that time.
These two examples, together with the examples shown for collecting data in the classroom environment and in the work environment, are meant to show that you have to think about what kind of data can be collected in different environments.
You collect the raw data yourself, or you ask the MOOC platform to provide it. Most MOOC platforms, such as IITBombayX, Coursera, or edX, can provide log data to you if you are offering a course on the platform. The raw data should be converted into actions and events. It can be parsed into the student ID, the session ID, the action, and the important parameters we need for that action. So you can develop some scripts, in any programming language, to convert this log data into readable data in CSV format. Each action should contain the timestamp, user ID, and session ID; the action name (is it a seek video, a scroll, a creation in the discussion forum, a deletion in the discussion forum?); and the context of the action. For example, for seek video we saw the start time, the end time, and the duration of the seek. For a question, the context is the response, that is, what answer the student entered. For a reading page, it is the name of the page being read; for a video, it is the playback speed. For a forum, it is the title of the forum. This kind of information should be captured in the context of the action.
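
The conversion just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two sample log lines are invented for this sketch (only the seek times 625.9 and 0.557 come from the example above), and the field names such as `session`, `event_type`, `old_time`, and `direction`, as well as the helpers `to_action` and `log_to_csv`, are hypothetical rather than the schema of any particular platform. A real log would need its own field mapping.

```python
import csv
import io
import json

# Hypothetical raw log lines. Field names and values are assumptions made
# for this sketch; they are not taken from a real platform export.
RAW_LOG = [
    json.dumps({
        "time": "2020-01-01T10:00:00",
        "user_id": "u1",
        "session": "s1",
        "event_source": "browser",
        "event_type": "seek_video",
        "event": {"old_time": 625.9, "new_time": 0.557, "type": "onSlideSeek"},
    }),
    json.dumps({
        "time": "2020-01-01T10:05:00",
        "user_id": "u1",
        "session": "s1",
        "event_source": "browser",
        "event_type": "book_scroll",
        "event": {"page": 3, "direction": "down"},
    }),
]

# Columns every action row carries, as listed in the lecture.
FIELDS = ["timestamp", "user_id", "session_id", "action", "context"]


def to_action(line):
    """Flatten one raw JSON log line into a single action row."""
    record = json.loads(line)
    details = record.get("event", {})
    if record["event_type"] == "seek_video":
        # For a seek, keep where the learner jumped from and to.
        context = {
            "from_time": details["old_time"],
            "to_time": details["new_time"],
            "seek_type": details.get("type"),
        }
    else:
        # Other actions keep their details unchanged as context.
        context = details
    return {
        "timestamp": record["time"],
        "user_id": record["user_id"],
        "session_id": record["session"],
        "action": record["event_type"],
        "context": json.dumps(context, sort_keys=True),
    }


def log_to_csv(lines):
    """Convert raw log lines into CSV text with one row per action."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for line in lines:
        writer.writerow(to_action(line))
    return buffer.getvalue()


if __name__ == "__main__":
    print(log_to_csv(RAW_LOG))
```

Keeping the context as a JSON string in a single column is one possible design choice: every action shares the same fixed columns, while action-specific details such as seek times or scroll direction stay together and can be expanded later.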
If you want to learn about preprocessing, that is, what to apply after collecting the data and writing a script to extract these actions, there are a lot of videos and lectures about preprocessing in ML and data mining courses. Teaching preprocessing is beyond the scope of this course.