Hello everyone, my name is Andrew, I'm from Catalyst, and I'm here to talk to you about Moodle asset archiving: cleaning up your Moodle with a growing footprint of data.
Catalyst IT is the premium sponsor here, and we were awarded partner of the year on Monday, which is amazing. Thank you very much, Moodle. We've been doing Moodle for a number of years, we have a global team, and we do big Moodles. We have been doing big Moodles since around 2008, and we were one of the first organisations to be associated with Moodle on AWS. So based on that experience I wanted to share some insights, some stories, and some things that might be of use to you with a growing Moodle footprint.
As part of digital asset archiving, the first thing I wanted to do is talk a little bit about the word archiving, because there are a lot of definitions out there on the internet. Archiving is not new; we've been archiving physical documents for some time. I sort of came up with the first definition of archiving, the long one there, which I'm not going to read. The layman's definition I like is keeping things available for future use, but with less storage and clutter, and I think that's an applicable definition for Moodle.
I want to focus on the future use component, because in the future you should actually be able to use it. It's obvious, but I say it all the same. And I'll mention this a few times: you need to be aware of legal considerations with archiving. It's real, it's the world we live in, in terms of retention and deletion. The restore process, meaning how people will get access to archived data, also needs to be explored, documented and made clear beforehand, because it matters: it plays towards the ability to do future use.
So once again, compliance and archiving. A lot of organisations have pretty ordinary, old and sometimes silly backup and archiving policies, because they were made at a time when policy depended on how many tape drives or how many small discs you had available. A good archiving policy doesn't mean you're compliant, and being compliant doesn't mean you have a good archiving policy.
So here's a question: who remembers their first ever digital camera? Yes, I remember my first ever digital camera. Let's say that was in 2008, right? So who could, if they were at home, get access to all of the photos they took in 2008? Okay. Who could get access to all the photos taken in October 2008, quickly and easily? Okay. Who woke up one day and decided to get rid of all their photos older than five years because they just don't need them? Okay, I certainly don't.
I just wanted to give an example, and here are some photos from us, some old photos of Catalyst, and some of these are from well before 2008 and since. They're an example of a data set that shows that not every data asset needs to go through a cycle of use, archiving and deletion, right? There are digital assets that you pretty much want to keep forever. A big data footprint is not always wrong; there is not a process where everything has to be discarded.
So, Moodle and archiving: what are the common problems? I only have 15 minutes or so, so I just wanted to pick the big ones.
So what are the organizational problems that people are trying to solve around a big Moodle? Because that's generally the context. Often it's too much repetition, clutter and junk: lots of copies of courses and stale data. It's server and cloud constraints: we're running out of storage on our infrastructure, we don't have enough disks, there are too many files. It's cost management, because we're spending too much on our IT infrastructure. Or, once again, it's compliance and legal considerations, which are important but usually quite frustrating.
So let's explore that problem of "our Moodle site data is overflowing or growing too fast". For a lot of organizations in the COVID period, their Moodle got fatter and fatter as more and more data went in. This is meant to be a picture of a hard disk exploding from too much data; it was the best of the bad options that AI gave me.
So let's imagine a world in which there were 8 terabytes of data in 2019, then 12 in 2020, and then 15, and then 18, and then 24. That's a pretty common scenario, and Catalyst has numerous Moodles that are well over 100 terabytes, by the way. That looks like a problematic trajectory to organizations, because they might have limited storage or they might have ideas about what their budgets are, and something needs to be done.
When you are faced with this issue, don't just look at the storage total, because it's not necessarily about the right amount of storage that should live in your Moodle. Ask questions. Hopefully there are some useful tips in here that might actually solve some people's problems, but often it's not quite that simple. How are your course backups set up? Course backups can often be set up in ways in which you're storing far too many copies of the same thing.
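As an illustration of looking past the storage total, the growth trajectory from the example above (8, 12, 15, 18, 24 TB) can be turned into year-over-year and compound growth rates. This is only a sketch: the talk names 2019 and 2020, the later years are assumed to be consecutive, and the 40 TB budget in the usage note is a made-up figure, not one from the talk.

```python
# Storage totals in TB from the talk's example.
# Assumption: the figures after 2020 belong to consecutive years.
footprint_tb = {2019: 8, 2020: 12, 2021: 15, 2022: 18, 2023: 24}


def year_over_year(series):
    """Return the fractional growth for each year relative to the previous one."""
    years = sorted(series)
    return {
        year: (series[year] - series[prev]) / series[prev]
        for prev, year in zip(years, years[1:])
    }


def compound_annual_growth(series):
    """Return the constant yearly rate that links the first and last totals."""
    years = sorted(series)
    first, last = years[0], years[-1]
    return (series[last] / series[first]) ** (1 / (last - first)) - 1


def years_until_exceeded(series, budget_tb):
    """Count the years of compound growth before the footprint passes budget_tb."""
    rate = compound_annual_growth(series)
    if rate <= 0:
        raise ValueError("footprint is not growing; no projection possible")
    size = series[max(series)]
    elapsed = 0
    while size <= budget_tb:
        size *= 1 + rate
        elapsed += 1
    return elapsed


if __name__ == "__main__":
    for year, growth in year_over_year(footprint_tb).items():
        print(f"{year}: {growth:.1%}")
    print(f"compound rate: {compound_annual_growth(footprint_tb):.1%}")
```

With a hypothetical 40 TB budget, `years_until_exceeded(footprint_tb, 40)` gives a rough horizon for the budget conversation, on the assumption that growth continues at the same compound rate.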
Are you using media files in ways that are swelling your data storage unnecessarily? Putting big raw media files inside Moodle is not particularly efficient. They should either be transcoded better or you should be using a video repository. Can you see any patterns or classifications around the swelling in data? Is it particular faculties, particular courses, particular professors? The other one is how much of your data is associated with courses that are no longer being delivered, or have not been delivered in some time. That's a useful one in some of the difficult discussions around defining old content. What else can you see about your data footprint? Things like log storage, which is not a magic wand but is still worth asking about. And the first question, which I put last, is: what is your organization's data storage budget? Often there isn't one, but having one allows you to make decisions as the data grows.
So the other problem is "we've got too much old stuff". We've got this Moodle that has old courses and we know we've got to do something about it. It causes problems: we're not using the material, it's outdated, and we don't want people to mistakenly choose it.
So let's imagine a situation, and we've had this discussion so many times when we work with clients, especially with the 3.9 to 4.1 migration, sorry, upgrade. Imagine you've got a reasonably sized Moodle with 3,000 courses in it. You're a university. You've got a course, AI and Economics 2022; we're keeping that one, of course. But you've got a course that's, you know, HML Marketing 1998. Okay, Moodle wasn't around in 1998, so let's say 2008: an old course that no one's using and that you're never going to deliver again, and you go, yep, that's old, we've got to get rid of it.
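Two of the questions above, which faculties account for the swelling and how much data sits in courses that are no longer delivered, can be sketched as a small analysis. Everything here is hypothetical: the `CourseUsage` record, its field names and the sample sizes are illustrative stand-ins for whatever export or report your own site produces, and they do not reflect Moodle's database schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CourseUsage:
    """Hypothetical per-course record; not a Moodle table."""
    name: str
    faculty: str
    size_gb: float
    last_delivered: int  # year the course was last taught


def storage_by_faculty(courses):
    """Total storage per faculty, largest first."""
    totals = defaultdict(float)
    for course in courses:
        totals[course.faculty] += course.size_gb
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


def stale_share(courses, current_year, idle_years):
    """Fraction of all storage held by courses idle for at least idle_years."""
    total = sum(course.size_gb for course in courses)
    if total == 0:
        return 0.0
    stale = sum(
        course.size_gb
        for course in courses
        if current_year - course.last_delivered >= idle_years
    )
    return stale / total


# Made-up sample data, loosely echoing the course names used in the talk.
sample = [
    CourseUsage("AI and Economics 2022", "Business", 40.0, 2022),
    CourseUsage("Marketing 2008", "Business", 10.0, 2008),
    CourseUsage("Film Production", "Arts", 150.0, 2021),
]

if __name__ == "__main__":
    print(storage_by_faculty(sample))
    print(f"stale share: {stale_share(sample, 2023, 5):.0%}")
```

In this made-up sample the old course holds only a small share of the total, which is the kind of result that makes "just delete the old stuff" less effective than expected.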
The challenge we have seen with our clients is what we call the gray area between the definite yes and the definite no. For example, fashion in 1999, or 1980s Libya: that might be a really important part of analyzing social change in the Middle East, something that someone still actually wants to use, and it might have good learning assets in it. Philosophical maths, chair dancing, polar bears, whatever. It's not always easy to define the yes or no when you look at an already full Moodle. Hence this idea of just getting rid of all the old or bad content often isn't quite as easy as people think, and at the end of the exercise you don't really get rid of much. I'm in Europe so you are cool which I believe needs to be careful.
So the other one: does anyone's organization have, and if you're prepared to, I'll put my hand up too because we do, what we call an archive Moodle instance? That's an old version of Moodle from before you upgraded that you keep somewhere so you can go and look at it. This is a popular approach to maintaining copies of old content, but you do need to be very careful, because an archive Moodle, like 3.3 or 3.1 or, soon, 3.9, is actually a production Moodle, with the expectations around data protection and backups and all those sorts of things, and sometimes it doesn't get treated like a production Moodle.
Moodle doesn't know it's an archive Moodle; it doesn't have a native archive mode. So there are things that could happen in your archive Moodle that would be very bad: for example, if cron decided to run and send lots of emails out, or someone updated a forum post. There are things around application updates. Potentially access control could all be cooked, because you remove it from your IDP and you put in a couple of logins so that people can log in, and then you suddenly lose all this access
control that's very important around who can see data. Should students have access? Absolutely not, but someone might decide that's a good idea. What is your end of life for these applications? Because they shouldn't just sit there. It shouldn't be available on the internet, but often it is, because that's what makes it more usable. So we're very, very cautious about archive instances of old Moodles, because they tend to just hang around forever, and the risk around data breaches and student details being published, or any of that stuff, is very, very real and only getting more so. It's not to say that people are hacking more, but certainly the consequences of these things are more and more real.
So the other approach is shipping off course backups. It's fine, but course backups are a relatively inefficient way of doing storage, and you also run the risk of, once again, losing access control, and digital repositories are sometimes a bit of a mess.
So a big Moodle is not a bad Moodle. I want to leave you with that thought, and we love big Moodles. That's all I've got time for. Come and see us at our booth. Thank you.
So we have a few minutes for questions, if there are any. Literal mic runner.
Do you have any suggestions when dealing with files in Moodle? I see all your points on this chart, but with the storing of files in Moodle, you have to go into the tables, view the huge tables, find the hash of that file and then delete it if you want to get rid of it, or if you want to find duplicate videos and everything. What's missing just now is a sort of admin file explorer where you can just see where a file is used or whatever. Do you have any hints on how you deal with thousands of files and get rid of them?
So we follow the wise words of our clients on these things, but: look, analyze, report, explore. Moodle should actually defend against duplicate files being uploaded, because of the way the abstraction layer works for the file system, so you shouldn't
get a lot of duplicate copies of large files. But you need to analyze, you need to look for certain things, and it isn't always easy. First you need to define the question you're trying to answer: I want to find all the files that are in courses that no one has enrolled in for however long, say. That's a solvable problem. There is a lot of SQL out there that has been published that would help you get some of the way, but some of the SQL will become quite complicated, and it maybe even needs to be a custom script or a custom report. You could conceivably plug your Moodle database into a business intelligence system or something like a reporting tool, I can tell the board, but you need to actively and iteratively explore, understanding the question you're trying to answer, because it can be quite difficult.
I think that's really interesting. We have a kind of unique use case: we have students who do a two-year course of 20 modules, but the awarding body requires that they can access it for five years, which is frustrating. That means we often end up with hundreds of students who haven't logged in for three or four years. That, to me, is clutter, but I appreciate the need. What are your thoughts on, for example, downloading that category or that collection of modules and having a separate Moodle, so that should they be required to access it again, it's in a separate space, away from the main VLE? Does that cause more complications, or is it better to just accept that I'm going to have hundreds of dormant accounts and live with that?
It depends, but I will say that there is an overhead to every Moodle instance, especially if it's a busy one that has a bit of data and is important organizationally. And if you're using a Moodle in ways that aren't for actually delivering education, you do need to think about making sure that that's not going to bite you. But you could choose either
approach. It might also depend, in your case, on whether or not they're using an IDP, an organization-level identity provider, which might conceivably roll their accounts over, and that might be a consideration. But both are viable; it's just a matter of what works for you. I think there'd be pluses and minuses to both approaches.
We have time for one last question.
Hello, I'm from the Profutura Foundation. I had a question regarding certifications, because we've been running for six years and we've managed not to delete anything yet, but we get teachers asking for certifications of courses they did six years ago. The only way we can actually manage to get those certifications is to have the course actually available. So how have you managed that before? Because when you work with public organizations, sometimes they require exactly the same modules: even if the course has evolved and you have a second version of the same course, they need the exact activity times and the exact modules they actually did. So how do you manage them? Did you generate them before archiving, or something like that, so you have the actual record that the course was done?
There is a great convenience to having everything in Moodle and staying in Moodle. It depends to some degree on how much data gets generated and how big the footprint is. If it's not causing you operational problems to keep things inside Moodle, then maybe you just keep things inside Moodle. But if you export it, then you need to be very clear that you've exported it in a way that will make sense 10 years in the future or so, because that's one of the great risks of removing things from Moodle: versions change, things change, and it will probably work, but you're not as sure as if it's living inside Moodle.
There's nothing wrong with leaving things inside Moodle, but some organizations just have such a massive footprint of data. If you have courses with people uploading a lot of files and all that stuff, then it's more of an issue. But if you have a compliance-based training system, for example, where there isn't really that much data coming from the user, and it's just tasks and certifications and a small level of assessment, the data footprint is quite small, so you don't have to pull things out. We're sort of hooked on this idea of rotating and removing and purging data, but historically that was because we didn't have any storage. Now we do, so you don't have to remove it. And it sounds as though it's easy for you if it stays where it is, so leave it where it is.