Hello everybody, welcome to this conference. My name is Stefan Urena, and I am the Chief Data and AI Officer at ESSEC Business School. Today we are going to talk about using Moodle Analytics to prevent student failure. First, just a few words about ESSEC Business School. ESSEC is one of the leading business schools in Europe. Here are some metrics from the Financial Times rankings: we were ranked ninth among European business schools in 2022, sixth for the Master in Management 2022, third for the Master in Finance, and ninth for Executive Education. We have 69,000 graduates worldwide. We have four campuses, two in the Paris region, one in Morocco, and one in Singapore, plus an online campus, which we are going to talk about. We also have 220 university partnerships in 46 countries and 31 double degrees, as well as 169 permanent faculty of 37 nationalities, 24 teaching and research chairs, and more than 1,000 partner companies. In terms of students, we have more than 7,000 students in full-time undergraduate and graduate programs from almost 100 nationalities, and 5,000 managers in executive education. So that was just to give you some insight into our school.

Today we are going to talk about a specific problem, which is preventing student failure in online degrees and online learning. I guess many of you have been developing online programs over the years, as we have, especially since the pandemic, and this trend keeps soaring. The problem, for us at least, is that we run a lot of online certificates and even degrees, like a Master for Executives in Digital Transformation, and that involves hundreds and sometimes thousands of students. So how can a teacher follow thousands of students in a specific course? That is very difficult, even with pedagogical assistants. Given the number of students involved each year, dozens of them fail, maybe sometimes hundreds, since, as I said, we have thousands of students. How can teachers follow all those students at such a volume? And how can they identify those at risk without the personal contact of on-site teaching?

We found a solution in Moodle. It is a building block incorporated directly inside Moodle, called Moodle Analytics. What is it about? It is a component that brings predictive models of student success in online courses, based primarily on the students' interaction with the Moodle platform. When students are connected to Moodle and interact with the platform, this generates logs, and those logs are then used to analyze their interaction with the platform and their likelihood of succeeding. With this automated system, we can then send notifications automatically to students at risk. I am going to show you the platform in a moment. One of the good things is that it helps you stay compliant with the GDPR, because everything stays inside Moodle: it is a component of Moodle, so you do not need to extract data from Moodle and load it into other data platforms to train models and make predictions. It is fully incorporated inside Moodle. So how does it work?
So I guess all of you know Moodle. You go to the platform, then to Site administration, and there you have an entry named Analytics models inside the Analytics section. Then you get this interface. I am sorry the screenshot is not very readable, but I will explain it. On this interface you can see three lines. The first line says "Courses at risk of not starting". The one we are interested in is the third one, "Students at risk of not achieving the minimum grade to pass the course". It is a model that ships with Moodle, and we used it; I will explain how.

So in Moodle Analytics you choose a model, or a target, as we say in machine learning, in order to predict something. What we aim to predict here is, as it is written above, students at risk of not achieving the minimum grade to pass the course. This target aims at preventing student failure. So how does it work? You select this target, you enable it, and then you select a few indicators. These indicators, which in machine learning we call features, are variables that are correlated with what we aim to predict. For instance, look at the one in blue here; it is hard to read, but it says "Course access before start date". It is a feature that can carry some signal about whether the student is likely to succeed. For instance, if a student visits the platform before the beginning of the course, it means they are more engaged in the course than those who do not go to the platform before the start, so it is a good signal. There are other predictors, like any write action in the platform, any write action in the course, or the read actions amount; all these features capture some signal about whether the student is really involved in the course. Then you choose an analysis interval, and then you choose some training data to train those models on.

The good thing is that all of this is done automatically. You do not need to go to some external platform from the GAFAM; you can do all of this directly inside Moodle, without any knowledge of machine learning or data analytics. You just have to click and select the target you want, the features you want, and the training data, that is, the historical courses that you ran in the past, to train the models and make predictions. This is just a screenshot from the documentation explaining the features that you can see inside the platform. For instance, the second one, "Course access before start date", is a binary indicator set to one if the user accessed the course before the start date, and zero otherwise. You have all the descriptions in the Moodle Analytics documentation.
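To make the indicator idea concrete, here is a minimal sketch in Python (Moodle itself computes its indicators internally, in PHP); the log format and field names below are hypothetical, purely for illustration of how a binary feature like "course access before start date" could be derived from raw access logs:

```python
from datetime import datetime

# Hypothetical access log: one entry per platform event.
# Field names are illustrative, not Moodle's actual log schema.
access_logs = [
    {"student": "alice", "action": "viewed", "time": datetime(2024, 1, 3)},
    {"student": "bob",   "action": "viewed", "time": datetime(2024, 1, 10)},
]

course_start = datetime(2024, 1, 8)

def course_access_before_start(student, logs, start):
    """Binary indicator: 1 if the student accessed the course
    before its start date, 0 otherwise."""
    return int(any(
        log["student"] == student and log["time"] < start
        for log in logs
    ))

for student in ("alice", "bob"):
    print(student, course_access_before_start(student, access_logs, course_start))
# alice accessed on Jan 3, before the Jan 8 start -> 1 (positive signal)
# bob first accessed on Jan 10, after the start   -> 0
```

The point is simply that each indicator compresses the raw log stream into one number per student that carries some signal about engagement.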
So, I have presented this building block, Moodle Analytics, which, as I said, aims at predicting students at risk of failing their online course. The system is good and user friendly; you can use it quickly without any knowledge of machine learning, you just have to read the documentation. But now, how can you be sure that the system is precise? That is to say, it makes predictions saying that these students are at risk, but how can we be sure that these students really are at risk? That is important to verify, because if you send notifications to students who are not at risk, it can be a real problem: you will be contacting students for nothing.

And so here are some metrics that we computed; I am going to explain them. This is a confusion matrix, a matrix used to evaluate the quality of classification algorithms, and from it we get two metrics, precision and recall. Precision aims at limiting false alerts. What do we see? In the column in blue, predicted positive, we have 51 people: 47 were real positives and 4 were real negatives. What does that mean? It means that out of the 51 people the algorithm predicted as at risk, 47 really failed in the historical data. In machine learning, we train algorithms and then test them on new data that the algorithm does not know, but for which we have the outcome; so we compare the algorithm's predictions to the real outcomes. In the end, 82% of the people considered at risk by the algorithm really failed. That gives you good confidence in the precision of the algorithm: there are not many false alerts, so you can be confident that the students you are going to contact really are at risk, that it is very likely they will fail, and that you are not contacting them for nothing. So that is good news.

The second metric is called recall. What does recall mean? It aims at limiting misclassified failing students. As you can see in the row in red, real positive, we have two cells: under predicted positive we have 47, and under predicted negative we have 18. What does that mean? The algorithm predicted that 47 students were going to fail, and compared to the truth, they really failed, so 47 were well classified. But in the predicted negative column, we see that 18 people were considered not at risk by the algorithm while in reality they were real positives, meaning they failed. So the algorithm missed 18 people. That means the algorithm was able to detect 76% of the people who were really going to fail, but missed 24%. I would say that, on average, when you have metrics above 75% for precision and recall, the algorithm is quite good and you can be confident using it; it will bring value to the school. You will be able to contact students who are very likely to fail and, at the same time, reach almost all the students who are going to fail. So these algorithms are good, and the data model behind Moodle Analytics, which captures, as I said, all the logs and interactions of the students with the platform, brings real value because it is very precise.
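For reference, here is how precision and recall fall out of a confusion matrix, as a minimal Python sketch. The counts below are made up for illustration (chosen so the two rates land near the figures quoted above); they are not the exact cells of our matrix:

```python
def precision(tp, fp):
    """Of everyone the model flagged as at risk, what fraction really failed?
    High precision means few false alerts."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everyone who really failed, what fraction did the model flag?
    High recall means few missed at-risk students."""
    return tp / (tp + fn)

# Hypothetical counts: true positives, false positives, false negatives.
tp, fp, fn = 47, 10, 15

print(f"precision = {precision(tp, fp):.0%}")  # precision = 82%
print(f"recall    = {recall(tp, fn):.0%}")     # recall    = 76%
```

Both metrics matter here: precision tells you an alert is worth acting on, and recall tells you how many of the failing students the alerts actually cover.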
Here is the interface. The predictions are made automatically by the algorithm, and once you have them, you can see the list of the students who are at risk. As you can see in this interface, you can click a button; we cannot see it well here, but you can select all the students you want and contact them directly by sending a message to all of them automatically. That is something good, because in the end you have a platform that can do machine learning in a user-friendly way; you do not need to understand everything behind it.

There is a procedure, and I will recap it now. In the end you get an interface with all the students at risk, and you can contact them directly without building any extra features beyond what Moodle already offers. So, to recap: first, you need to have some courses on your Moodle platform, because you need some history to train the algorithms. If you do not have any courses at the beginning, you need to run a few, and then you will have historical data that Moodle Analytics can use to train the algorithms and produce predictions that are quite good, as I just showed you. So first, have courses on Moodle. Then you select the target, as I explained. For instance, the one we are interested in here was students at risk of not achieving the minimum grade to pass the course, but there are other targets, like predicting students at risk of dropping out. There are different definitions; you can see all of them in the Moodle Analytics documentation and choose the one that fits your problem. Then, once you have selected a specific target, you select the predictors. The predictors, which in machine learning we call features, are variables that you believe are correlated with the target you aim to predict; the algorithm will use them to calibrate itself and make predictions as accurate as possible. Then, once your online courses are in Moodle and you have selected the model or target and the features, you run the algorithms, evaluate them, and check their precision. One interesting measure is the harmonic mean of the precision and the recall that we just saw; it is called the F1 score. This F1 score is already built into Moodle Analytics, so when you evaluate your algorithms to check their quality, you will get this measure. It is not labeled F1 score there; it is labeled precision, I think. I think they use this word because it is easier for people from the business side to understand, if you do not come from a statistical background, but this measure gives you the precision of the algorithm. If it is above 75%, as I said, it is very valuable, even 70%; you will see, when you run it, how much signal the system is able to capture from your courses.
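For reference, here is what that harmonic mean looks like, as a quick Python sketch using roughly the rates quoted earlier in the talk:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: the F1 score stays high
    only when both metrics are high."""
    return 2 * precision * recall / (precision + recall)

# With roughly the rates quoted earlier (82% precision, 76% recall):
print(f"F1 = {f1_score(0.82, 0.76):.2f}")  # F1 = 0.79
```

Unlike a simple average, the harmonic mean punishes a model that is strong on one metric but weak on the other, which is why it is used as a single summary score.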
Then, when you are confident in your algorithm and your system, you have to include the teachers of those courses in the process, to explain to them what you have been doing and to get their agreement to use the system in their classes. That can sometimes be a challenge, because you have to explain to the teachers what you are doing and it is not always easy, but take the time to explain it, and then you will be able to run it with them. There is also something important that I mentioned quickly: at ESSEC, for instance, we have online courses that last five weeks, and we launch these algorithms one week after the start of the course, because we want to make the predictions as early as possible in order to prevent failure. So one week after the start of a five-week course, the algorithms run and are able to flag the students who really are at risk, and that is something valuable to us.

Once the algorithms are running and you have some predictions, you then have to explain to the teachers and the pedagogical assistants how to contact the students from the interface I showed you a few minutes ago, in order to send them messages saying: hey, you are probably lagging behind the average of the cohort, you should hurry up and catch up, something like that. You are free to choose the kind of message you want to send. Another piece of advice: do not hesitate to repeat this a few weeks or a few months later, depending on the pace of your course. The more often you run those algorithms and try to detect the students at risk over time, the fewer students are likely to fail. So I encourage you to run them several times. Thank you for your attention, and I am available for questions. Thank you.

Thank you. Well, thanks for this very interesting presentation; it was so interesting to see some insights from an institution that is actually using Moodle Learning Analytics. I have a bunch of questions, since this is also the focus of my research, but maybe I will just ask one and then see how much time we have. My first question is: could you just remind us when you started using Moodle Learning Analytics, and have you observed any decrease in student dropout since?

Yes, we carried out some A/B tests, but we did it only once, so we want to carry out more A/B tests to compare the effect of what we are doing here with doing nothing. What I can tell you from the first A/B test is that we saw 11% fewer failing students in the cohort in which we carried out these actions, but we want to run more tests to have more statistical evidence that this has an impact. The point is that there are two things. The first is to measure the precision of the algorithms; this is certain and accurate, we measured it. We made some extra developments to get the confusion matrix, because you do not have it in Moodle Analytics: you just have the F1 score, the global precision, let's say. To have a more in-depth analysis, we computed recall, precision, and the confusion matrix. So what I can tell you is that the precision of these algorithms at detecting students at risk is the numbers I showed you, which is quite high: 76% recall and 82% precision. Then, to measure the effect of the intervention on the students, we saw that 11% fewer students failed in the first test. But we only sent one email, so we are now carrying out new experiments, sending more emails, as I just told you, to try to reduce the number of failing students further. That is something we are working on; maybe I will be able to give you the numbers by the end of the year, and we can keep in contact if you want to talk about it. Thank you.

Hi, thank you for the talk. I was interested in knowing roughly how long it takes you to establish that the metrics you are getting are precise. You were saying that you need to be running courses to check that the algorithm is working. So how long does it take you?
When can you say, hey, this is working now? Well, it depends on whether you have historical data, because in machine learning you need training data and test data. If you already have some courses on your platform, let's say three or four, for instance, then you can just follow the Moodle Analytics documentation and train some models. It is as easy as I tried to show here, although I admit that the quality of what you could see on the slides was not as good as I expected. Once you run the algorithms by following the Moodle Analytics documentation, you get the results in a few minutes. So the cost, the barrier to entry, is just reading the documentation and doing the experiment: go to the Analytics models section, choose the target you want, select the predictors, then the analysis interval depending on the timeline of your course, then choose the historical courses you have, run it, and you will get the output, the precision, quite quickly.

Yes, do you have any experience of how this impacts the system performance-wise when you run an analysis? Is it better to run it during the night hours, and when do you run your analyses?

Okay, well, you can configure them to run automatically at the time you want, and it is true that we do that during the night. When you want to run one manually, you can, but it is better to run it from the command line rather than from the web interface. You can do it from the interface, but for programmers it is better from the command line, you know, the black screen where you write code.

Hi, thank you for breaking this down; it has always been like a mystery box to me. What I wanted to ask about is the read actions amount indicator. How does it actually predict or calculate? Does it measure reading by dedication time or from the logs within the course itself? How does it actually work?

Well, it is a machine learning algorithm, so it takes many variables as a vector of inputs, and it is a function that tries to minimize its error: it iterates many times until it is well calibrated, that is, until the error is as low as possible. Then you run this function on new data that the algorithm has not seen, and you measure its performance; those performances are the ones I showed here, evaluated on unseen data. I can give you more information afterwards if you want.
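To give a rough idea of what "a function that tries to minimize its error" means, here is a toy logistic-regression training loop in Python. This is a generic sketch of the principle, not Moodle's actual implementation (Moodle ships its own machine-learning backends), and the three indicator columns are hypothetical:

```python
import math

# Toy training set: one row per student from a past course.
# Hypothetical indicators: accessed course before start (0/1),
# number of write actions, number of read actions.
# The label is 1 if the student failed, 0 if they passed.
data = [
    ([1, 12, 40], 0),
    ([0,  2,  5], 1),
    ([1,  8, 30], 0),
    ([0,  1,  3], 1),
    ([0,  5, 12], 1),
    ([1, 10, 25], 0),
]

weights = [0.0, 0.0, 0.0]
bias = 0.0
lr = 0.05  # learning rate: how far each correction moves the weights

def predict(x):
    """Predicted probability of failure, via the logistic function."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 / (1 + math.exp(-z))

# Gradient descent: repeatedly nudge the weights in the direction
# that reduces the prediction error on the training data.
for _ in range(2000):
    for x, y in data:
        error = predict(x) - y
        for i, xi in enumerate(x):
            weights[i] -= lr * error * xi
        bias -= lr * error

# Once calibrated, the function is applied to a new, unseen student.
# Low activity should translate into a high predicted risk of failure:
print(f"predicted risk: {predict([0, 1, 4]):.0%}")
```

The read actions amount is just one of the input columns; the training loop decides, from the historical outcomes, how much weight that count should carry in the final risk score.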
Yeah, well, thank you for the talk. As far as I understand the algorithm, you need to have similar courses, the one you train on and the one you predict on. Is this also your experience, or did you manage to generalize it, because instructors change their courses all the time?

Yes, I understand your question: you mean whether the structure of the course has an impact. Well, from our experience, we have courses that have roughly the same timeline, so I cannot speak for courses that are very different. If you make the predictions one week after the start, for instance, students in a six-month course and students in a five-week course are not going to behave the same way at that point. So I would say it is probably better to do the training on courses whose structure is quite similar, even if it is not exactly the same, but not to mix six-month courses with five-week courses. Even if the platform itself, let's say, does not care, it does not make sense to make predictions from courses that are so different.

What about the domain? No, we do not care about the domain. Whether it is finance, corporate social responsibility, entrepreneurship, whatever, it works for any domain. You're welcome. Sorry? No. No, no, I mean the server on which you run Moodle. Well, if you have any questions, my email is urena, so u-r-e-n-a, at essec.edu; you will be able to see my name on the conference schedule. Feel free to ask, I will answer with pleasure. Thank you.