Welcome to the learning analytics tools course. In the last video, we saw how to collect data from an open-ended learning environment. However, the data collected is clickstream data, and we collect a great deal of it: every click a student makes. We call this raw data.

We saw that we can collect data from a classroom environment, a MOOC, or a TELE. In a classroom environment, we collect performance, engagement, and attendance data, and that data is not in a raw format. It is already in a kind of feature format, and you can create new features from it. In a MOOC, we talked about looking at page views and video-watching behavior, which is again raw data. We have to extract or construct features from that raw data. In a TELE also, we showed that a lot of clickstream or user-interaction data can be collected. How do we create features from this kind of raw data? We look at that in this video.

So, this is the raw data we collected in the MOOC. This is the sample raw data we collected from MEttLE. Here is the raw data. The first column is the user-readable action: this user started solving the problem, then he is submitting some questions, then he is in the functional model, checking something. After checking the functional model, the student moved to the main model, that is, the main problem page. After that, he went to the evaluate sub-problem, then to the information center and the simulator; the student is moving through different subtasks in the main problem page. Where is the student within that particular problem page? For example, this column tells you that the student is in the problem map, that the context is "begin solving the problem", and that now he is in the functional model. So, where the student currently is can be captured here.
Then the student's detail, that is, what he is doing on that particular page, is given by this task column: he is checking, or using the calculator in the simulator, or performing other actions. So, the student's detailed action can be captured here, along with the timestamp. This data comes directly from MongoDB, converted to CSV. We can get this kind of data if we write a logging mechanism that logs students' interactions with a learning environment.

But I will show you better data. From the CSV downloaded from MongoDB, with very minimal effort, we can assign a student ID such as A1 or A2. The timestamp can be converted to a readable format instead of the Unix (UTC) timestamp we saw in the last table. Then we can make the actions readable, for example "the student is in functional modeling". Here I went only to the first level of action; I did not go to the second level, that is, what the student is doing within functional modeling. That depends on your research question. You can then say what he is doing in that model: something else, info center, simulator, some contextual information. You can add another column for more contextual information. It depends on the research question we are asking.

So, my suggestion is: capture all the data in a raw format, then construct your own features based on your research question, your aim, and how you want to construct the features. So, this is the data we can obtain from MongoDB with very little effort.

So, let us think about it. We have this data, so let us do a small activity. Given the raw data you saw in the last two slides, can you list at least three features? Please pause this video, write down your answers, and resume to continue.

So, one feature in a TELE can be the average time on the functional model.
For example, a student will use MEttLE over multiple sessions and may move among the functional model, qualitative model, and quantitative model multiple times. What is the average time a student spent on the functional model? That can be a feature. So, how do you compute it? You can use simple Excel sheet tools and tricks: for a given student ID, find each time that student used the functional model, and use the timestamp data to compute the average. You will know how to use an Excel sheet if you have watched the week 2 video.

Now, can you consider the frequency of actions? The first feature is time; the second is the frequency of actions: the number of times a student uses the functional model in a session, or the number of times in a week. So, you can come up with different sets of features. The two basic, common features we should consider are time and frequency. As with the functional model, you can create features for the qualitative model, information center, calculator, and simulator; you can create a lot of features from the log data.

Also, you can compute features like average time per session. What is the average time a student spends per session? And is it different between group A and group B? Suppose you have two groups, a control group and an experimental group, and you introduce a new intervention in the OELE for group B. Then you want to see whether this recommendation or intervention improves students' average time spent on the OELE. So, you can compute these kinds of features from the raw log data.

So, how do you construct features? The main thing is domain knowledge. That is why domain expertise is so important for constructing these features.
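As a rough sketch of the time and frequency features just described, here is how they might be computed in Python with pandas rather than in Excel. The column names (`student_id`, `session`, `action`, `timestamp`) and the sample rows are made up for illustration. I am also assuming that the time spent on an activity is the gap until the student's next logged action in the same session, so the last action of a session has no measurable duration.

```python
import pandas as pd

# Hypothetical cleaned log: one row per action, stamped when the student entered it.
rows = [
    ("A1", 1, "functional_model",  "2024-01-10 10:00:00"),
    ("A1", 1, "qualitative_model", "2024-01-10 10:02:00"),
    ("A1", 1, "functional_model",  "2024-01-10 10:05:00"),
    ("A1", 1, "simulator",         "2024-01-10 10:09:00"),
    ("A1", 2, "functional_model",  "2024-01-11 11:00:00"),
    ("A1", 2, "calculator",        "2024-01-11 11:06:00"),
    ("A2", 1, "functional_model",  "2024-01-10 09:00:00"),
    ("A2", 1, "info_center",       "2024-01-10 09:01:00"),
    ("A2", 1, "functional_model",  "2024-01-10 09:04:00"),
    ("A2", 1, "main_model",        "2024-01-10 09:07:00"),
]
log = pd.DataFrame(rows, columns=["student_id", "session", "action", "timestamp"])
log["timestamp"] = pd.to_datetime(log["timestamp"])
log = log.sort_values(["student_id", "session", "timestamp"]).reset_index(drop=True)

# Assumption: duration of an action = time until the next action in the same session.
next_ts = log.groupby(["student_id", "session"])["timestamp"].shift(-1)
log["duration_s"] = (next_ts - log["timestamp"]).dt.total_seconds()

# Time feature: average seconds per visit to the functional model, per student.
fm = log[log["action"] == "functional_model"]
avg_time_fm = fm.groupby("student_id")["duration_s"].mean()

# Frequency feature: number of functional-model visits per student per session.
freq_fm = fm.groupby(["student_id", "session"]).size()

# Average time per session: (last timestamp - first timestamp), averaged per student.
sessions = log.groupby(["student_id", "session"])["timestamp"]
session_len_s = (sessions.max() - sessions.min()).dt.total_seconds()
avg_session_time_s = session_len_s.groupby(level="student_id").mean()

print(avg_time_fm)
print(freq_fm)
print(avg_session_time_s)
```

To compare group A with group B, one could add a group column per student and average `avg_session_time_s` within each group; which actions and time windows to use still depends on the research question.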
You need domain expertise in the problem you are handling, and you may also know from experience that a student might perform certain actions in order to complete task Y. That kind of domain knowledge will be helpful for creating the features. Or, please read research papers that talk about features and feature construction; from those, you can start your own feature construction. The basic things, as I mentioned, are time and frequency.

Excel is an easy tool to start with; it does not require much coding or any knowledge of programming. This week, we will teach how to use an Excel sheet as a learning analytics tool. Python or R scripts are also really good. So, even if you have not worked with any programming language, I recommend you start learning one of these scripting languages. Especially in Python, please use the library called pandas. It helps you process raw data from a CSV or in a matrix format. Once you have the raw data loaded into Python, you can perform many of the same operations we do in an Excel sheet. The pandas library is really good, and I recommend learning Python. But if you watch the demo of how to use Excel for LA, that is enough.

So, in this video, we talked about what raw data is, an example of raw data from MongoDB for MEttLE, and how to construct features. For basic feature construction, please consider the time and frequency of each action and create features from those. Thank you.
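For those who want to try pandas, here is a minimal sketch of the clean-up step discussed in this video: loading a CSV export, assigning short student IDs such as A1 and A2, and converting Unix timestamps to a readable format. The column names (`user`, `context`, `task`, `timestamp`), the user strings, and the rows are hypothetical, and I am assuming the timestamps are in seconds; a real export may differ (for example, milliseconds).

```python
import io
import pandas as pd

# Hypothetical CSV export of the raw log; a real file would be read with pd.read_csv(path).
raw_csv = io.StringIO(
    "user,context,task,timestamp\n"
    "u_7f3a,problem_map,begin_solving,1700000000\n"
    "u_7f3a,functional_model,check,1700000060\n"
    "u_91bc,problem_map,begin_solving,1700000030\n"
    "u_7f3a,main_model,evaluate_subproblem,1700000180\n"
    "u_91bc,info_center,simulator,1700000200\n"
)
raw = pd.read_csv(raw_csv)

# Assign short student IDs (A1, A2, ...) in order of first appearance.
codes, _ = pd.factorize(raw["user"])
raw["student_id"] = ["A" + str(code + 1) for code in codes]

# Convert Unix timestamps (assumed to be seconds) to readable UTC datetimes.
raw["time"] = pd.to_datetime(raw["timestamp"], unit="s", utc=True)

# Keep the first-level action and one extra column of contextual detail.
clean = raw.rename(columns={"context": "action", "task": "detail"})
clean = clean[["student_id", "time", "action", "detail"]]

print(clean)
```

Keeping the raw export untouched and deriving `clean` from it follows the suggestion above: capture everything in raw form first, then decide how much detail each research question needs.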