Welcome to the Learning Analytics Tools course. This is Introduction to Machine Learning, Part Two. Machine learning involves a sequence of steps. The first step is data collection and data processing. This is similar to what we saw in learning analytics. In this step we have to decide what data to collect and why we need to collect it. Importantly, we also have to make sure that the data is actually available. We also have to do data pre-processing, such as handling missing values and removing outliers, and ensure that there is no bias in the data. If you find any errors, you have to remove those data. The second step is choosing a model or algorithm, that is, deciding which model to use for the data and the research question you have selected. This step includes choosing a suitable algorithm. The third step is training and testing. Now you have collected data, you have a research question, and you have a model to use for your analysis. The next thing is to train and test: you split your data into training data and testing data. Training is the process of determining the parameters of the model from the data. You create the model using the training data, and the testing data is used to evaluate the performance of the model. You can create more than one model and compare their performance using the test data. Now let us look at the types of ML algorithms. There are four types: supervised learning, unsupervised learning, reinforcement learning, and recommender systems. In this course we will talk about supervised and unsupervised machine learning techniques. What about reinforcement learning? In recent years, reinforcement learning has mostly been used together with neural networks. A recommender system, in an education context, is a system that makes recommendations based on the student's interactions.
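The training-and-testing step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's own tooling: the (attendance, final marks) pairs are hypothetical, the 30% test share is an arbitrary choice, the function names are made up for the example, and mean absolute error is just one possible way to compare models on the test data.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (training, testing) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def mean_absolute_error(model, test_rows):
    """Average absolute gap between the model's prediction and the true label."""
    return sum(abs(model(x) - y) for x, y in test_rows) / len(test_rows)


# Hypothetical (attendance %, final marks) pairs, for illustration only.
data = [(55, 50), (60, 58), (70, 66), (75, 72), (80, 79), (90, 85), (95, 92)]
train, test = train_test_split(data, test_fraction=0.3, seed=42)

# Two toy "models": A is built from the training data, B is a hand-picked rule.
mean_mark = sum(y for _, y in train) / len(train)


def model_a(x):
    return mean_mark  # always predicts the training average


def model_b(x):
    return x - 3  # a hypothetical linear rule, not fitted to anything


for name, model in [("A", model_a), ("B", model_b)]:
    print(name, round(mean_absolute_error(model, test), 2))
```

The model with the lower error on the held-out test rows is the better of the two on this data; the test rows are never used to build either model.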
It recommends what to do next, for example the next question in a questionnaire, new content, or a new example. In other contexts, a recommender system might recommend a new algorithm, a new program, or a new strategy. Let us start with supervised algorithms. A supervised algorithm develops a model or function that maps an input to an output with the help of labeled data. So you have to find the mapping between the input and the output. What is the input and what is the output? Let us see with an example. Suppose you have collected data on three students. Imagine they are working on a MOOC or Moodle on a daily basis; let us say it is a MOOC. The first feature is average session time. In a MOOC, a student might log into the course several times, so what is the student's average session time when interacting with the MOOC? Say it is 34 minutes; it is given in minutes. The second feature is the number of videos the student watched overall, say 12, 30, or 25. Another input is the average time spent on each video, in minutes. You watched 12 videos, but what is the average time spent on each video? Say 2 minutes. You might have watched some videos for more than 2 minutes and some for less than 2 minutes, but you can compute the average. We can also use the number of interactions in the forum, like 10, 20, or 6. So I assume there are four features I can collect from the MOOC data. I have the log data, and I computed these four features using Excel or by writing a script to extract them from the log data. I also have each student's performance in the final exam, that is, the exam conducted after the MOOC course. I will consider the inputs that I can observe, x1, x2, x3, and x4, as features, written xi.
Here i is 1, 2, 3, 4 for the four features, and I will consider the performance as the predicted value, or dependent value. The performance depends on these four values, so it is called the dependent variable, and the four features are the independent variables. We call the performance the label y, and the features are x1, x2, x3, x4. Here I have students a1, a2, and a3. Now we want to create a model from this data so that if a new student a4 comes with, say, an average session time of 42 minutes, 36 videos watched, an average of only 1.5 minutes per video, and 30 forum interactions, we can say what that student's performance will be. So the idea in supervised machine learning is this: you have labeled data, that is, the students' interaction data (the observed input) together with the label; you create a model using this data and then use it to predict the performance of a new student, on data you have not yet seen. That is supervised learning. Now let us understand what xi and yi mean. Labeled data is a set of pairs (xi, yi), where yi is either an element of a finite set of classes or a real number. xi are the four features we collected from the MOOC interaction data in the previous slide, and yi there is the student's performance, or score. yi can be pass or fail, where pass can be denoted as 1 and fail as 0; or it can be apple, orange, or another fruit, which we can denote as 1, 2, 3, so that yi is in the set {1, 2, 3} in that example. So yi can be binary, 0 or 1, or it can be a real number, as in the earlier example, like exam performance of, say, 63, 40, or 70.
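The difference between a feature vector and the kinds of labels described above can be made concrete with a small Python sketch. The feature tuple for student a4 uses the values from the example; the helper name `encode_multiclass` and its rule of numbering classes by first appearance are illustrative assumptions, not a standard API.

```python
# One feature vector x_i for the new student a4 in the example:
# (avg session minutes, videos watched, avg minutes per video, forum interactions)
x_a4 = (42, 36, 1.5, 30)

# Binary labels: pass -> 1, fail -> 0, as in the lecture.
PASS_FAIL = {"pass": 1, "fail": 0}
binary_y = [PASS_FAIL[result] for result in ["pass", "fail", "pass"]]


def encode_multiclass(labels, start=1):
    """Give each distinct class name an integer code start, start + 1, ... by first appearance."""
    codes = {}
    for label in labels:
        if label not in codes:
            codes[label] = start + len(codes)
    return [codes[label] for label in labels], codes


fruit_y, fruit_codes = encode_multiclass(["apple", "orange", "other", "apple"])

# Regression labels need no encoding: they are already real numbers.
exam_scores = [63.0, 40.0, 70.0]
print(binary_y, fruit_y, fruit_codes)
```

The class codes 1, 2, 3 are only names for categories; unlike exam scores, their numeric order carries no meaning.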
Here are some examples of supervised learning. We can predict students' final exam scores based on their interactions in the class or in the learning environment. Or we can predict which MOOC users will drop out of the course within 5 or 6 weeks. Dropout prediction is a binary classification: the student will drop out or will not. Another example is predicting which users will uninstall an app, that is, the churn rate. Suppose you are creating an educational mobile app and want to know which users will uninstall it. Can you predict that from their interactions with the app? Then you are predicting uninstall versus keep, which is again a 0/1 binary classification problem. A more fine-grained example is predicting a student's performance on the next question, based on the student's interaction with the system over, say, the last 10 minutes or the last couple of sessions. You want to predict whether the student will be able to answer the next set of questions given to them. If the student is not going to answer, you might consider giving less difficult questions, providing hints, or asking them to read something; that will save the student's time, and the student can also learn better. These are examples of supervised learning. In all these examples we know what we want to predict. If we want to predict the final exam score, we must have the final exam scores, and we can collect them. If we want to know whether students drop the course, we know what we are predicting and what values it takes. You also have to come up with a set of input features, like x1, x2, x3, x4. So in supervised learning you know both the input and what to predict. If we have both xi and yi, it is a supervised learning approach.
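Dropout prediction as a 0/1 classification can be sketched with a nearest-neighbor classifier, one of the supervised algorithms this course covers later. The training rows and their labels below are entirely hypothetical, k = 3 is an arbitrary choice, and only two of the MOOC features are used so the example stays small. In practice features usually need scaling first, since plain Euclidean distance lets the feature with the largest range dominate.

```python
import math
from collections import Counter


def knn_predict(train, x_new, k=3):
    """Majority label among the k training points closest to x_new (Euclidean distance)."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    neighbors = sorted(train, key=lambda row: math.dist(row[0], x_new))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical rows: (avg session minutes, videos watched) -> 1 = dropped out, 0 = stayed.
train = [
    ((10, 3), 1), ((12, 5), 1), ((8, 2), 1),
    ((34, 12), 0), ((40, 30), 0), ((30, 25), 0),
]

print(knn_predict(train, (11, 4)))   # → 1 (closest to the low-activity learners)
print(knn_predict(train, (42, 36)))  # → 0 (closest to the high-activity learners)
```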
Given the examples in the previous slide, can you list two other examples of supervised learning problems from the data collection we discussed last week? For each, list the independent variables xi and the dependent variable, or label, yi. After listing them, resume the video to continue. You might have listed examples with some xi and yi. Let us consider a simple example: there are 7 students, identified by student ID, and for each we have the attendance percentage in the semester and the final marks in the semester. Attendance is xi and final marks are yi: I want to predict the students' final marks in the exam using the attendance percentage. Or you can have multiple input values, say x1 and x2, for example attendance and mid-semester marks. Then I have two input features, and I want to predict the students' final marks. So yi, the final marks, is a real number, and x1 and x2 are the inputs. In this case we have both x1 and x2 and the final marks, so a supervised learning technique should be used for the prediction. So what is unsupervised learning? We said that supervised learning has both xi and yi; unsupervised learning has only xi. Either we do not know what we want to predict, that is, we have observed the students' interactions but are not sure what to predict; or we do not have historical data. In the previous slide, the students' attendance, mid-semester marks, and exam scores were available, for example from your teaching records of previous years, where some students had already taken the exam. If you want to study the performance of the current batch but do not have any historical data on scores, you can use an unsupervised learning approach in that case as well.
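Predicting final marks (a real number) from attendance with a single input feature is a simple linear regression problem. The sketch below fits the least-squares line in plain Python; the seven attendance and mark values are hypothetical, not the lecture's data, so the predicted value is only illustrative. With two inputs such as attendance and mid-semester marks, the same idea extends to multiple linear regression, which this sketch does not cover.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares line y = slope * x + intercept for one input feature."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical attendance (%) and final marks for seven students.
attendance = [55, 60, 68, 72, 80, 88, 95]
final_marks = [48, 55, 60, 66, 74, 80, 90]

slope, intercept = fit_simple_linear_regression(attendance, final_marks)
predicted = slope * 65 + intercept  # estimated final marks at 65% attendance
print(round(slope, 3), round(intercept, 2), round(predicted, 1))
```

The closed-form slope is the covariance of x and y divided by the variance of x, and the fitted line always passes through the point (mean of x, mean of y).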
In unsupervised learning we have x1, x2, but there is no y. We do not know what we are predicting, but we have collected a lot of data, such as engagement, and we want to see whether any patterns or clusters emerge from the data. For example, consider classifying a news article as sports, entertainment, or politics. If humans had labeled many news articles with these categories, this would be a supervised problem. Now consider a single news topic: say there was a festival in a particular state, and we want to collect all the related articles from different newspapers and news publishers. That can be an unsupervised problem, because we may not know the labels ahead of time. The system automatically groups similar news items: for the festival news, it uses the keywords in those articles, checks whether similar keywords appear in articles from other news publishers, then collates everything and puts it under one news heading. If you go to Google News (news.google.com), you will see this kind of grouping, done with unsupervised learning algorithms. Another example is grouping users based on profile data. Suppose you are running a project or lab course and want to group the students based on profile data, like their previous background, their branch, whether they came from Class 12 or a diploma, or other background information. You can create groups from this information and assign tasks to them. Or you can group the students in a class based on their engagement in the class.
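The keyword-based grouping of news items described above can be sketched as follows. The lecture does not say how similarity is measured, so this assumes Jaccard similarity (shared keywords divided by all distinct keywords), a hypothetical threshold of 0.3, and a simple greedy grouping. The articles and keywords are made up, and this is an illustration of the idea, not how Google News actually works.

```python
def jaccard(a, b):
    """Share of keywords two articles have in common (0 = none, 1 = identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_keywords(articles, threshold=0.3):
    """Put each article into the first group whose seed article is similar enough, else start a new group."""
    groups = []  # each group is a list of titles; its first member is the seed
    seeds = []   # keyword set of each group's first article
    for title, keywords in articles:
        for group, seed in zip(groups, seeds):
            if jaccard(keywords, seed) >= threshold:
                group.append(title)
                break
        else:  # no existing group was similar enough
            groups.append([title])
            seeds.append(keywords)
    return groups


# Hypothetical articles with hand-extracted keywords; no category labels are given.
articles = [
    ("Paper A", {"festival", "state", "celebration", "crowds"}),
    ("Paper B", {"festival", "state", "procession", "crowds"}),
    ("Paper C", {"cricket", "match", "score"}),
    ("Paper D", {"festival", "celebration", "state", "lights"}),
]
print(group_by_keywords(articles))
```

The three festival articles end up in one group and the cricket article in its own group, without any article ever being labeled.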
Some students will be highly engaged and some will not. You might find groupings of these students and then mix them to get better peer learning. So, in a nutshell, the difference between unsupervised and supervised learning is as follows. Consider attendance percentage and midterm marks plotted on the X and Y axes, with no idea which group each student belongs to. If I apply a grouping, or clustering, algorithm, I can group these points into a first group and a second group. Or, if I think two groups may not be right, I can ask for more groups, say four: this can be the first group, this the second, this the third, and this the fourth. Whether two or four clusters is better, we have to decide using a measure of error, aiming for minimal error; we will talk about that later in the clustering class. So given this data without any labels, a clustering algorithm can group it into two or four groups, identifying groups with similar behavior in the data. In supervised learning, by contrast, we know there are two groups here: these students scored less than 70 marks in the final exam, and these students scored 70 or more. So this is a two-class classification problem: class 1 is less than 70 marks, and class 2 is greater than or equal to 70 marks. Now you know the label Y, and you know X1, attendance, and X2, midterm marks, so this is a supervised learning problem. In the first version of the problem we have X1 and X2 but no Y, so it is not a supervised problem; we can come up with clusters, two or more, depending on the data and on how the students interact.
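The clustering idea above can be sketched with k-means, one of the clustering algorithms this course names, run on unlabeled (attendance, midterm) points for k = 1, 2, and 4. The points are hypothetical, and the initialization (the first k points) is a naive assumption chosen to keep the example deterministic; library implementations use smarter starts. The "error" here is the within-cluster sum of squared distances (SSE). Note that SSE keeps shrinking as k grows (it reaches zero when every point is its own cluster), so in practice one looks for the k after which extra clusters stop reducing the error much.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm); returns (centroids, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]  # naive init: the first k points
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # move the centroid to the mean of its members
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:  # an empty cluster keeps its old centroid
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


def sse(points, centroids, labels):
    """Within-cluster sum of squared distances, the 'error' used to compare values of k."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical (attendance %, midterm marks) points with no labels.
points = [(40, 30), (90, 85), (45, 35), (85, 80), (50, 32), (88, 78)]

for k in (1, 2, 4):
    centroids, labels = kmeans(points, k)
    print(k, labels, round(sse(points, centroids, labels), 1))
```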
We will talk about clustering in detail in a separate class; this is to introduce the difference between supervised and unsupervised learning. Since you have now seen supervised and unsupervised learning, can you list two unsupervised learning problems from the data collection we did last week? After listing them, please resume the video to continue. As I mentioned, you can use unsupervised algorithms to form groups for a class project or a lab project. You may not want to form groups of similar students; instead you may want to mix and match, and for that you first need to find which students behave similarly. You can also use unsupervised algorithms to find groups who need remedial content or extra coaching, an exam that is less difficult, or a special course. You can also compare behavior between two groups: if you have group A and group B, you can identify patterns within each group, such as the order of the actions they perform, and compare the patterns of group A with those of group B. In this way you can mine patterns with unsupervised algorithms. You can also build clusters from interaction data in Moodle or Tilly, group the students into multiple groups, and provide different levels of recommendations to each group.
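Comparing "the order of actions" between two groups can be sketched by counting consecutive action pairs in each group's session logs. This is a simplifying assumption: treating a pattern as a pair of back-to-back actions is only the simplest form of sequential pattern, the action names and logs are hypothetical, and the helper names are made up for the example.

```python
from collections import Counter


def action_bigrams(sessions):
    """Count consecutive action pairs (a, b), meaning 'b right after a', across a group's sessions."""
    counts = Counter()
    for actions in sessions:
        counts.update(zip(actions, actions[1:]))
    return counts


def compare_groups(group_a, group_b, top=3):
    """Most common action pairs in each group, with each pair's share of that group's pairs."""
    result = {}
    for name, sessions in (("A", group_a), ("B", group_b)):
        counts = action_bigrams(sessions)
        total = sum(counts.values()) or 1  # avoid dividing by zero for empty logs
        result[name] = [(pair, round(n / total, 2)) for pair, n in counts.most_common(top)]
    return result


# Hypothetical session logs: each session is an ordered list of actions.
group_a = [["video", "quiz", "forum"], ["video", "quiz", "video", "quiz"]]
group_b = [["quiz", "hint", "quiz"], ["quiz", "hint", "video"]]

for name, top_pairs in compare_groups(group_a, group_b).items():
    print(name, top_pairs)
```

Here group A mostly goes from a video to a quiz, while group B mostly asks for a hint right after a quiz, which is the kind of contrast the comparison is meant to surface.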
The types of supervised learning are classification and regression. Classification is further divided into binary classification and multiclass classification. As we saw, in binary classification the label is 0 or 1, while in multiclass classification it can be 1, 2, 3, 4, or apple, banana, and so on. In regression, y is usually a real number, a continuous variable, like a performance of 65, 70, or 75. A classification problem aims to develop a model that separates data into categorical classes; here I want to separate the data into two classes, so it is a binary classification problem. The goal is to predict the class, or category, that a particular instance belongs to. For example, if I have new data, say attendance 63 and midterm marks 40 or 50, does this point belong to class 1 or class 2? That depends on where you draw the line separating the classes. If I draw the line differently, this point might belong to the other class. We will talk about how to draw the line in detail later, but in a classification algorithm, when new data comes in, the goal is to put it into one class or the other. Regression is subdivided into simple and multiple regression, and each can be linear or nonlinear. We will talk about simple linear regression now, where the output is a continuous variable. This is the same plot we discussed a few slides ago, with attendance and final marks. If I draw a straight line that fits all this data, that assumes there is a linear relationship between attendance and final marks, and I have to fit these values with a straight line. So simple linear regression aims to fit
the relationship between two variables, here x1 and yi, assuming that this relationship is linear. Consider a new student with attendance equal to 65%. What will this student's mark be in the final exam? Since we fit this line using a linear regression algorithm, we can estimate the student's final score: at 65 we read the value off the line, and the mark will be around 68 or 69. So in linear regression, instead of classifying the new student's data into class 1, class 2, or class 3, we predict the score as a continuous variable. You have now seen an introduction to classification and regression; we will talk about these algorithms in detail later. Given this introduction, can you list at least two differences between classification and regression? After listing them, resume the video to continue. A classification algorithm builds a model that separates the data into two classes (binary) or more (multiclass), and y is usually discrete or categorical, for example whether the student will pass or fail, or whether the student will get more than 80 marks. In regression, the model tries to fit the given data points, mapping xi to yi, where y is continuous, a real number, and it predicts the score. For example, when a new student comes, instead of deciding whether the student will pass or fail, or will get more or less than 70 or 80 marks, regression tries to predict the actual score. That is the difference between the two. A few algorithms for supervised learning are linear regression, nearest neighbor, Naive Bayes, and decision trees. We will see these four algorithms in detail, but we will not take up support vector machines or random forests. Unsupervised learning again has two types, clustering and competitive learning, just as supervised learning has classification and regression. There are many types of clustering; we give two of them here, k-means clustering and hierarchical clustering, and we will discuss them in detail. To summarize, in this video we saw what supervised learning and unsupervised learning are, and the types of supervised learning algorithms, classification and regression. For unsupervised learning we introduced clustering without discussing it much. We will discuss each of these algorithms with examples in detail in further classes. Thank you.