So let me give you a little bit of background. Learning management systems such as Moodle are now widely deployed and have become an integral part of teaching at universities, and we use them extensively to teach courses. You've probably heard that here at the University of Minnesota we have one of the largest Moodle installations; Moodle has become the standard LMS over the last four or five years, a lot of homegrown systems have been retired, and everybody is using Moodle. It's used extensively here at the University to distribute course material, host discussion forums, deliver graded work like quizzes and assignments, and maintain a grade book. So it's a great tool that lets us measure how well students engage with the course material and with the class. What we were interested in finding out is whether we can leverage information extracted from student engagement within a Moodle site to help us predict how well a student will do on the next graded activity. The idea here is not to predict whether the student will get an A, B, or C at the end of the term; rather, as the course goes on, based on the historical information up to that point, how accurately can we predict the grade the student will achieve on the next graded activity?
By graded activity here we mean primarily assignments and quizzes. You can imagine how a system like this would be used: as a kind of early warning system, to figure out which students are not performing well, or are not performing as well as the system predicts they should be. So the task, as I said, is to predict the grade a student will achieve on a graded activity, based on information associated with the student's prior performance. The primary data for the study was extracted from the University of Minnesota's Moodle installation; I believe it was 2012–2013 data. We got data from around 11,000 students spanning over 800 courses, and within that data set there were 114,000 assignment submissions, 75,000 quiz submissions, and roughly a quarter of a million student forum postings. That's the size of the data we played with. We followed a traditional machine learning, or statistical learning, approach to tackle the problem. The very first thing we did was analyze the Moodle log data to extract a set of features. Actually, let me back up: we used three types of features. One set had to do with student-performance-specific features, things like the cumulative GPA of the student prior to taking the course and the cumulative grade the student has achieved up to that point in the course. We also had features relating to the particular activity in the course.
Namely: the type of activity, the course level, and the department. Then we analyzed the Moodle log files to extract a set of features describing how the student interacts with the Moodle site: the number of discussions the student initiated, the number of posts the student wrote, the number of posts the student read, the number of pages viewed, the number of times the student did something with a wiki, counts of other activities, and so forth. We then broke those counts into separate features reflecting the delta-T between the activity's due date and when the student performed each action: counters over different time intervals prior to the activity due date, covering only the period after the last graded activity. In other words, the Moodle features were extracted from the window between two graded activities. For instance, if there was a graded activity in the third week and another in the fifth week, and we wanted to predict the fifth-week activity, we extracted the data from between the third and the fifth week. Those were the features extracted from our dataset, and we played with a couple of different models. To give you my bias up front, our goal was really to develop genuinely new machine learning methods to tackle this problem. So the very first method we used was a strawman baseline, a very simple linear regression model, and the question was: can we actually do better than that? For those of you who are not familiar with standard linear regression:
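To make the time-bucketed feature extraction concrete, here is a minimal sketch. The bucket boundaries, event format, and function name are all hypothetical illustrations; the actual intervals and feature set used in the study are not specified in the talk.

```python
from datetime import datetime, timedelta

# Hypothetical delta-T buckets, in days before the activity's due date.
BUCKETS = [(0, 1), (1, 3), (3, 7), (7, 14)]

def bucketed_counts(events, window_start, due_date):
    """Count one student's Moodle actions per delta-T bucket before `due_date`,
    restricted to the window after the previous graded activity."""
    counts = {b: 0 for b in BUCKETS}
    for ts, _action in events:
        if not (window_start <= ts < due_date):
            continue  # only the period between the two graded activities counts
        days_out = (due_date - ts).total_seconds() / 86400.0
        for lo, hi in BUCKETS:
            if lo <= days_out < hi:
                counts[(lo, hi)] += 1
                break
    return [counts[b] for b in BUCKETS]

# Example: previous graded activity in week 3, activity to predict due in week 5.
week3 = datetime(2013, 2, 18)
week5 = datetime(2013, 3, 4)
events = [
    (week5 - timedelta(days=2), "forum_post"),   # 2 days before due date
    (week5 - timedelta(days=10), "page_view"),   # 10 days before due date
    (week3 - timedelta(days=1), "page_view"),    # before the window: ignored
]
print(bucketed_counts(events, week3, week5))  # → [0, 1, 0, 1]
```

In practice one such vector would be computed per action type (posts written, posts read, page views, wiki edits, ...) and concatenated with the performance and activity features.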
The goal is to predict the grade that the student will achieve on a particular activity. So g-hat subscript s,a is the predicted grade that student s will achieve on activity a, and that grade is modeled as a linear regression with an intercept term b (sometimes called a bias), plus a weight vector w, which is the linear model, applied to f subscript s,a, the feature vector representing the student and the activity; that vector contains all the different features I described before. This is the standard linear regression model for predicting the grade. Note that this model is global, shared across all students: what we estimate from the data is a single weight vector that tells us how important the different features are in predicting the grade. Then the thing we developed is what we call the collaborative multi-regression model, which is a little more complicated. The idea is that, instead of having a single linear regression model that applies to all the students, we postulate that there is a small number of linear regression models that can explain the different students, and the specific regression model for each student is a linear combination of those. Mathematically, the predicted grade g-hat subscript s,a is equal to two constant terms, b_s plus b_c, the student bias and the course bias. Those bias terms measure, for instance, how well the student performs in general (a constant tied to the student) as well as how easy or hard a course is (a constant tied to the course). Then the last term involves a feature vector, the same as before.
W is now a matrix whose columns correspond to the different linear regression models, and p_s is a student-specific weight vector that linearly combines the outputs of those different regressions. That's probably a lot of math for some of you, but it works; trust me on that. The key idea is the following. At one extreme we have the global model, which assumes a single regression model explains all the students. At the opposite extreme, every student has their own model, but then we don't have enough data per student to estimate it. This model sits somewhere in between: it assumes the students belong to different groups, each group has its own model, there is only a small number of those models, and the student-specific model is a linear combination of them. That's why we call it a collaborative multi-regression model; these ideas come from collaborative filtering, for those of you familiar with that line of research. So that's what I just said. Now let me give you some results, so you can see whether this thing works or not. This is a plot with a bunch of different things on it. First of all, the figure of merit we compare here is the root mean square error (RMSE). We have normalized the grades to be between zero and one, one being a perfect grade and zero being not so perfect. What we compare is the error, the difference between our predicted grade and the grade the student actually achieved, and the smaller the better.
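The two models just described can be written out as follows. The notation here follows the talk (g-hat for the predicted grade, b for the biases, w and W for the regression weights, p_s for the student-specific combination weights, f_{s,a} for the feature vector); the exact notation in the underlying paper may differ.

```latex
% Baseline: one global linear regression shared by all students
\hat{g}_{s,a} = b + \mathbf{w}^{\top} \mathbf{f}_{s,a}

% Collaborative multi-regression: k regression models (the columns of W),
% mixed per student by the weight vector p_s, plus student and course biases
\hat{g}_{s,a} = b_s + b_c + \mathbf{p}_s^{\top} \mathbf{W}^{\top} \mathbf{f}_{s,a}
```

Here W^T f_{s,a} gives the k individual regression outputs, and p_s blends them into one prediction for student s.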
Then we compare two different models and two different sets of features. One set of features contains nothing from the Moodle site: those are the features that have to do primarily with the student's GPA, their current performance in the course, the type of the class, the type of the assignment, and the level of the class (first year, second year, and so forth). And we compare the two models, the single regression model and the collaborative multi-regression model. The two squares at the top correspond to the performance of the single regression model: the red is the case in which we don't use any of the Moodle features, and the green is when we use the rest plus the Moodle features. You can see that, for the single regression model, adding Moodle features to the mix gives some improvement. Now, the RMSE here is close to 0.2; if you think about it, that means we're about 20% off from the actual grade. Once we switch to the more sophisticated collaborative multi-regression models, you can see that the performance improves further. Along the x-axis we increase the number of regression models, the number of columns of that W matrix that we estimate; the performance gets somewhat better, and at some point the gains start to diminish. And we see the same thing as before: comparing using versus not using features extracted from Moodle, there is a significant benefit to utilizing the Moodle features.
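For reference, the figure of merit is just the root mean square error over grades normalized to [0, 1]. A quick sketch; the grade values below are made up for illustration and are not from the study's data:

```python
import math

def rmse(predicted, actual):
    """Root mean square error over grades normalized to [0, 1]."""
    assert len(predicted) == len(actual) and predicted
    se = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(se / len(predicted))

# Toy example: predicted vs. actually achieved normalized grades.
pred = [0.85, 0.70, 0.95, 0.60]
true = [0.90, 0.80, 0.90, 0.75]
print(round(rmse(pred, true), 3))  # → 0.097
```

An RMSE of 0.2 on this scale corresponds to being off by about 20% of the full grade range on a typical prediction.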
The magenta line over there is the best performance we get; this is the case in which we use the multi-regression models that contain, in addition to the rest of the features, the features extracted from the student's interaction with the Moodle site. The interesting part is that the prediction error is roughly between 0.14 and 0.15, so roughly half a grade step or less in terms of how well we can predict the grade. That is actually quite good. The takeaway message of the slide is that we can estimate the grade a student will get fairly reasonably, which means the models we learned actually learned something interesting. That brings us to the next step: we can start analyzing what those models are. So this is a plot in which I show the weight of each feature, or at least some of the features, when I use a single model, two models, and three models. In the case of one model, which is this first bar over here, you see that one of the most important features was whether or not the student actually viewed the material. That makes sense: it's very hard to submit an assignment if you haven't actually viewed the assignment. The other feature that is also important is how well the student has done so far in the class, the cumulative grade, which really means that future performance does correlate with past performance.
That's why that feature has a very high weight. It's also interesting that the assignment indicator has a big weight: whether the graded activity is an assignment or not impacts the prediction. Now, the interesting thing happens when we analyze the model in which we had only two regressions; this is the middle set of results, and you can see that the two models have different characteristics. There is a group of students, the blue ones, for which the cumulative grade and cumulative GPA are the features with high weight, and which have very low weight for anything that has to do with Moodle activities. And then the green model is the one well suited for predicting the students whose grades do depend on Moodle activity. So you can see that for different student populations, the models that predict their grades depend on different sets of features. You also see some differences when you go to three models and four models and so forth; it just becomes progressively harder and harder to figure out what's going on at that point. Now, the other thing we did is we took a somewhat deeper look at the two-model case. On the scatter plot on the right (your right, my left), we look at the prediction that we compute for each student, which is a linear combination of the predictions coming from the two models. In my mathematical model there was that vector p_s, the student-specific weight vector, and what this plots is how much of the prediction comes from the first model and how much comes from the second model,
for each particular student. If you do a scatter plot of that, you see that a lot of the students sit right in the middle; in other words, for them the two models are weighted roughly the same. But we also have a group on one side and a group on the other side. Call them the green students, for whom the first model is more important in determining the final grade, and the blue points, for whom the second model is more important. The green students are the ones whose prediction is driven by the Moodle interaction features as well as the assignment-related feature (whether or not the activity is an assignment), and the blue students are the ones for whom the prediction is primarily driven by GPA, that is, by how well they have been performing. Then we did an analysis to figure out whether those subsets of students (each dot over there is a student) have different characteristics. We looked at some very basic statistics, means and standard deviations, and the interesting part is that the red students, the ones in the middle, tend to have the highest GPA. One way to interpret that is that these are the type of students for whom the models that predict their performance weight past performance and Moodle interaction roughly equally. Then the green students have a higher GPA than the blue students.
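The green/blue/red grouping described here can be sketched as a simple rule on the two components of each student's mixture vector p_s. The 0.15 margin and the normalization below are my own illustrative choices, not thresholds from the study:

```python
MARGIN = 0.15  # illustrative: how far from an even split counts as "dominant"

def student_group(p_s):
    """Classify a student by their two-model mixture weights p_s = (w1, w2)."""
    w1, w2 = p_s
    share1 = w1 / (w1 + w2)  # fraction of the prediction from model 1
    if share1 > 0.5 + MARGIN:
        return "green"  # Moodle-interaction / assignment features dominate
    if share1 < 0.5 - MARGIN:
        return "blue"   # past-performance (GPA) features dominate
    return "red"        # both models weighted roughly the same

print([student_group(p) for p in [(0.8, 0.2), (0.1, 0.9), (0.55, 0.45)]])
# → ['green', 'blue', 'red']
```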
The blue students show essentially no Moodle interaction, which seems to indicate that, between the blue and the green groups, the students whose performance is better predicted by models that utilize Moodle features tend to also achieve better grades; the blue students are the ones with the lowest mean grade. We then analyzed whether there is a department-specific signature in terms of the two models. First, let me explain that red line in the middle. The way Asma drew the plot, she computed the difference of blue minus green and then sorted the departments by it; blue minus green starts out positive, at some point becomes negative, and somewhere in between it crosses zero. That's the red line. It's an artifact of the sorting, so you can ignore it for now. The interesting part is that for some departments (more precisely, the departments in which the courses were taken), the majority of the students have predictions primarily driven by past performance, whereas in some other departments the predictions are primarily driven by the Moodle interaction features. We tried to tease out why. Our first hypothesis was that the departments whose predictions were primarily driven by non-Moodle features were departments whose courses did not really have any Moodle content; if there is no Moodle content, you cannot interact with it. That was not the case.
Because, when you look at the amount of material the instructors had put up and the types of activities, discussion forums and so forth, there was a lot of that going on. It's simply that those Moodle interactions were not driving the prediction; they were not indicative of it. The interesting part, I think, is that computer science is actually somewhere here in the middle. And then we have some other departments in which, for a lot of the students taking those courses, the Moodle features were very important in determining their performance: ESL, psychology, public health, journalism, and so forth, as far as I can read the acronyms. So again, the verdict is still out; we haven't done a very deep dive to figure out exactly why this is the case. But we did analyze the computer science courses a bit, because that's our home department, and what we found is that there is a lot of content on the computer science Moodle sites, but our students don't really interact much with that content. The primary reason, or at least the hypothesis we have, is that they do all of their quote-unquote online learning outside the Moodle site: they just Google, find the stuff they want somewhere else, and go from there. So, to summarize: using Moodle interaction features, we were able to improve the prediction accuracy; that was the first plot I showed you.
And that should not be surprising, because the Moodle features are probably the only way we can measure student engagement. The features that matter mostly continue to be related to whether or not a student has viewed the course materials, as well as how well the student has performed so far. And again, there is a department-specific signature in terms of which set of features is best at determining student performance. That, to a large extent, concludes the technical part of the talk. Let me have one slide to give you the big picture. One of the things we're doing here at the university is a large project, started this year, in which we try to leverage the whole field of learning analytics and apply big-data approaches to educational data. We're focusing on three key tasks: improving academic pathways, enabling more effective pedagogy (I should be able to pronounce that word, but I have a problem), and aiding in retention and persistence. The idea is to leverage all the data we have within the university, coupled with advanced data mining and machine learning methods, to tackle those problems. One component of that is being able to analyze and model the Moodle data we have here at the university; we have a very large historical Moodle data set, and we're hoping to leverage it to identify things like, for instance, what is the preferred learning style of a student, or under what kind of teaching environment students perform better.
Then we can inform the instructors about their customers, so to speak: here's what they like and here's what they don't, here are the conditions under which they perform better and the conditions under which they don't. And we can also provide that information to the students, so a student can make a much more informed decision about what kinds of courses to take, as well as use it for academic advising. I'll stop here. Questions?

Quick question; we've sort of run out of time, but you're going to make me run all the way to the back of the room too, aren't you? Phew, getting my fitness workout in here. My question, which you might not be able to answer, is somewhat speculative. Back one page, you talked about some of your findings, and one of them was that, if I got it right, in some departments the viewing tended to contribute and in some it didn't. I'm guessing you didn't have much of a view into what those materials were, but speculatively, do you think it could be that the materials that required further action were the ones that were predictive, and those that were just there for viewing were not?

Yeah, that's a good question, and I have no answer to it, as you have correctly predicted. There are a lot of "why" questions that would require a very deep dive into the data to figure out, but that could be the case. Our initial hypothesis was that it was just lack of material, and that was not the case; the material was there. Another hypothesis we had, and this is just a guess, is that when we look at some of the departments we understand, like the ones in science and engineering, we know how our students behave.
We put a lot of material on Moodle. Every time I teach, I have a lot of material there, so the whole copy-paste works extremely well. But our students just Google. It's amazing: some of them will get the assignment, or maybe look at something once, and they will not go there again. The next time, they just Google; it has a much better search interface than having to log in to the Moodle site and navigate over there to get what you want. But again, we haven't really done a deep analysis.

I'll try to keep this quick. This is probably just another guess after seeing all this research, and thank you for it; it's really great to have this data in front of us to have these conversations, and I hope we can do more of this at our own institutions. But just a comment about the change in those different regression weights: is the GPA perhaps too blunt an object if we want to predict success based on previous performance by students? Should we be looking at the objectives of those courses, and whether they are prerequisites, and the number of prerequisites required for the current course, to weight whether students are going to be successful, tying in their early performance?

So this is actually a very good question, but for an entirely different project we have, on trying to predict course performance so that we can do course recommendation. It's amazing: students are very rational individuals. They will only take courses in which they expect to do well, number one, and they remain registered in those courses only if they're doing well. And this data is a subset of the courses and students; these are students that actually completed.
To a large extent, students will not stay in a course if their performance is below what they thought they were capable of. That's why, at least here at the University of Minnesota, the enrollment changes during the first two to three weeks of courses are just dramatic, and I'm sure the same thing happens everywhere else. So that's one comment. On the question of prerequisites: most of the students who take those courses usually have the prerequisites. Now, on whether the cumulative GPA is a very crude measure, I completely agree with you. One of the things I've always been complaining about, at least in our institution, is that at the end of the day we get a single grade, and that grade really means nothing on its own. A course is not really one topic; it has a bunch of different components. If a student has an A or A+, I know they did well in everything. If the student failed, they probably didn't do well in anything. Anywhere in between, I have no idea. So right now the GPA is just a crude average over the different components of a course. It would be nice to have some sort of component-level grade; that would be great. All right, thank you very much.

Can I ask you to join me in thanking George for his presentation? Thank you.