pandas and Python to easily investigate your Moodle data. So we have all this Moodle data, and we want to improve learner outcomes, we want to improve retention, we want to increase student satisfaction. I'm going to argue that Python is a good tool to use. It's open source, it's easy to use and read, it has good documentation, and there's a large number of scientific libraries like NumPy and SciPy. pandas is the Python data analysis library. It's also an open source library, with easy-to-use data structures, which I hope to show you, very high performance, and good documentation, and I've included the link there.

One of the ways to interact with Python is to use the Jupyter Notebook and IPython. They allow for interactive development, and a notebook can contain live code, equations, visualizations, and explanatory text all in one. So you could use a notebook to teach someone else how to do what you've accomplished. You can share notebooks through GitHub, and maybe some of the people that are interested can help me get a community going there. Anaconda is the easy way to get Python and pandas onto your computer. It's multi-platform, and it could be a standard platform for this community, so that we all have the same version of pandas and it's not some difference in Python or pandas leading to the weirdness that we're seeing.

Moodle Adminer is a plug-in for Moodle based on Adminer. It supports MySQL, PostgreSQL, Oracle, and MSSQL, and it makes it very easy to download data. Why do we want to download data? It minimizes impact on the live system, and you really want to do data analysis on a static dataset. If the data is always changing, you're not really doing analysis. The one catch with Moodle Adminer is that you have to be a Moodle admin to use it; if you're a faculty member, you can download data directly from Moodle.
You can download from Moodle logs, you can download quiz results, and the example I'm going to work with is data downloaded from a quiz. This is what it looks like to be working in a Jupyter notebook. It runs on a server or on your laptop or desktop, and the interface is a web browser.

To start using pandas, it's just as simple as import pandas as pd. Everything prefixed with pd is pandas now. Some of this other stuff here is to make the graphs show up nicely, and then with the next statement, quiz_a equals read_csv, I've loaded the quiz data that I just downloaded from my CSV file. It's that easy. Then if I look at what quiz_a actually is: I've cleaned this up a little bit, because when you download the data you get names and emails and other things in there, and you don't want to be sharing that with everyone. I've cleaned that stuff up; it's very easy to do. pandas is intuitive. After I've loaded the data, I can just use quiz_a.head() to see that, oh yes, things imported correctly and I don't have student names. pandas does the work for you. I haven't done anything beyond loading the data, and I can already get the count, the mean, the standard deviation, and the breakdown in quartiles. I can look at the data for everyone that has a score greater than 85, for instance. Very easy to do, and it reads sensibly: if you just look at that code, you can guess what it's doing.

It's very easy to do plots. Now, this isn't the correct plot for this data, but I just wanted to show you that you could simply do a plot. With a little more work, you can do a scatter plot. This is time in seconds from when the quiz opened to when they took it (it was open for two and a half to three days) plotted against the grade. I think you can already see a pattern, though you'd have to actually run stats to know. Visually, it looks like better grades on the first day; it spreads out a little more on the second day, with some lower grades.
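The steps just described can be sketched in a few lines of pandas. Note that the column names and values below are made up for illustration; in practice you would call pd.read_csv on the CSV file you downloaded from your quiz, and your export's columns will differ.

```python
import io
import pandas as pd

# In practice you'd load the file you downloaded from the quiz, e.g.:
#   quiz_a = pd.read_csv("quiz_a.csv")
# Here a small inline CSV stands in for that download.
csv_data = io.StringIO("""\
student,time_taken_s,grade
s01,3600,92
s02,86400,78
s03,172800,85
s04,200000,61
s05,7200,88
""")
quiz_a = pd.read_csv(csv_data)

print(quiz_a.head())               # sanity check: data loaded, no real names
print(quiz_a["grade"].describe())  # count, mean, std, and quartiles for free
high_scores = quiz_a[quiz_a["grade"] > 85]  # everyone scoring above 85

# Plotting is one more line (requires matplotlib):
#   quiz_a.plot(kind="scatter", x="time_taken_s", y="grade")
```

The filtering line reads almost like the sentence it came from, which is the intuitiveness being described here.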
We're still getting some good grades, but lower ones too. On that last day, we still got good grades, but we also see some of the worst grades. Here's a histogram of that. I can also split things into the last day versus the first days, and if you look very closely at this, you can see the minimums are different and the means are different. Again, you'd want to run real statistics to do an analysis, but right now we're just looking to see if this is an interesting question worth investigating, and you could do all that real statistics using pandas. Here's a histogram of that information, and then here's the final few hours, and you can see that's where we do have some good grades.

When you're working with this data, ethics are involved. Are you looking at data to improve your teaching? Are you doing research? You really should run it by your institutional review board, keep your data safe, and clean your data: remove names and all that kind of stuff. What is my analysis picking up on? This is audience participation. Can you see the difference between these two rows here? Anyone? This was submitted to arXiv, an open-access place to post papers, and the argument is that the top row are criminals. These weren't mug shots; they were driver's license photos that had been published on wanted posters. The bottom row are innocent people. To me, and this is from someone else who actually analyzed this and picked the paper apart, the top row has frowns and the bottom row has smiles. Is the machine learning picking up on frowns and smiles, or on criminals and non-criminals? Can we really judge people from pictures? For anyone that does this kind of stuff, look at the article "Physiognomy's New Clothes"; I think it will open your eyes to the misuse of data. Overfitting: the machine memorizes the right answers and doesn't generalize.
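The last-day-versus-first-days split above is a one-line filter in pandas. Again the numbers and column names here are invented stand-ins for the real quiz export, and the 48-hour cutoff is just an assumed boundary for a quiz open two and a half to three days:

```python
import io
import pandas as pd

# A small inline table stands in for the downloaded quiz export.
csv_data = io.StringIO("""\
time_taken_s,grade
3600,92
7200,88
50000,84
90000,70
180000,95
200000,55
210000,62
""")
quiz_a = pd.read_csv(csv_data)

# Treat anything submitted after 48 hours as the "last day".
cutoff = 48 * 3600
first_days = quiz_a[quiz_a["time_taken_s"] < cutoff]
last_day = quiz_a[quiz_a["time_taken_s"] >= cutoff]

# Side-by-side summary: the mins and means differ between the groups.
summary = pd.DataFrame({"first_days": first_days["grade"].describe(),
                        "last_day": last_day["grade"].describe()})
print(summary)

# A histogram of either group is one line (requires matplotlib):
#   last_day["grade"].hist()
```

Eyeballing the summary table like this is exactly the "is this question worth investigating" step; real statistics would come after.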
That's essentially what overfitting is. And then, are we picking up on stereotypes: smiling versus frowning, trustworthy versus untrustworthy? Machines are very good at finding patterns that aren't there. This is a plate of spaghetti, with the image produced by Deep Dream looking for dog faces. We can do that same thing; machines just help speed up the process. So be very ethical examining data. Science really does work if you're careful, recognize and control for your biases, and don't stretch your conclusions. And this is the last one: am I doing research, am I just looking for potential lines of inquiry, or am I looking for ideas for course improvement? Think about what you're doing. Research really needs to be done through your institutional review board and done ethically. And now I'm open for questions.

I think it was very easy to do. If you look at the documentation, play around, and don't get frustrated with your first wrong answer, you can figure out how to do this stuff too. Okay. Now, yeah, that's what I'm looking to gain interest in. Okay, so we should get together and think about how we want to do it. It's NBI, I'm the king. So far nothing earth-shaking: people turning things in late are doing worse, and people working in the middle of the night are doing worse. The number of clicks in a course is, not surprisingly, not entirely related to grades. But if you analyze whether there are long gaps between interactions, there's something there. It's not always that way, though; it's really hard to tease out clicks and that kind of thing, because someone could have just downloaded everything out of the course from the get-go. Yeah, and I think that's the thing: you do need to be real careful. Sometimes you might want to get hold of someone at your institution who is a statistician, because there's the right type of test.
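As one illustration of the "right type of test" point, a statistician might steer you toward a test whose assumptions actually fit your data. A permutation test is one assumption-light option, sketched here with made-up grades for hypothetical early and late submitters; this is not the talk's own analysis, just an example of the kind of check a statistician could recommend:

```python
import numpy as np

# Hypothetical data: grades of early vs. late submitters (invented).
rng = np.random.default_rng(0)
early = np.array([92, 88, 85, 90, 84])
late = np.array([70, 95, 55, 62, 74])

observed = early.mean() - late.mean()
pooled = np.concatenate([early, late])

# Shuffle the group labels many times and count how often a difference
# at least as large as the observed one shows up purely by chance.
count = 0
n_iter = 10_000
for _ in range(n_iter):
    rng.shuffle(pooled)
    diff = pooled[:len(early)].mean() - pooled[len(early):].mean()
    if abs(diff) >= abs(observed):
        count += 1
p_value = count / n_iter
```

Whether a permutation test, a t-test, or a rank-based test is the right one for your quiz data is exactly the question to take to a statistician.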
When you do a histogram, there's the right number of bins for your data. There are rules. And if you were going to submit something for publication, that journal would know what the right answer is and would call you out if you're not following the rules. So I will avail myself of that, because I think it's very important to do things the right way, and some people have invested a lot of time and figured out the right way for some datasets.
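Those bin-selection rules are in fact built into the tooling. NumPy implements several published estimators, so you can follow the rules without looking them up by hand; the grades below are invented for illustration:

```python
import numpy as np

# NumPy ships several published bin-selection rules:
# "sturges", "fd" (Freedman-Diaconis), "scott", or "auto".
grades = np.array([55, 62, 70, 74, 78, 84, 85, 88, 90, 92, 95])
edges_auto = np.histogram_bin_edges(grades, bins="auto")
edges_sturges = np.histogram_bin_edges(grades, bins="sturges")
```

Passing the same rule name as the bins argument to a plotting call gives you a histogram binned by that rule rather than by guesswork.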