[Welsh-language introduction; the automatic transcription of this passage is unrecoverable.]
[Further Welsh-language remarks; transcription unrecoverable.] …was that the data would be used to drive interventions: that our personal academic coaches would use engagement data, and the traffic lights implicit in that service, to make interventions. This did not happen. I can tell you more about why, but it did not happen for a number of key reasons, nothing to do with the data, mostly to do with human systems. Another aim of the programme was simply to explore the potential of predictive analytics, which for all of us in the programme was quite an early offering. The schema of what an analytics service looks like is something like this. At the bottom is all the data we're collecting about our students: course-level information, module-level information, staff on a module, et cetera, going from left to right. Then we've got all the past data about the students, the courses and modules they're enrolled on, and then we're moving into activity: clicks, visits, the VLE, book borrowing (we use the Alma system), attendance monitoring. There's a whole other story about attendance monitoring, which I won't get into. Coming next year we have Panopto sessions viewed and, quite excitingly, electronic resources accessed through the library. This all gets sent up. The critical thing is that the data has to conform to a data specification, the UDD. That's the master plan: if it's not in the UDD, it's not getting in. It is all sent up into the learning data hub, and this is where the magic happens; this is where the learning analytics process does its stuff.
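The "if it's not in the UDD, it's not getting in" gate described above can be sketched as a simple conformance check on each record before it is sent up to the hub. This is a minimal illustration only; the field names below are hypothetical stand-ins, not the actual UDD schema.

```python
# Illustrative conformance check for activity records against a data
# specification. Field names are made up for the sketch, not the real UDD.
REQUIRED_FIELDS = {
    "student_id": str,
    "module_id": str,
    "event_type": str,   # e.g. "vle_click", "attendance", "book_loan"
    "timestamp": str,    # ISO 8601 string
}

def conforms(record: dict) -> bool:
    """Accept a record only if every required field is present with the
    expected type -- if it's not in the spec, it's not getting in."""
    return all(
        field in record and isinstance(record[field], expected)
        for field, expected in REQUIRED_FIELDS.items()
    )

record = {
    "student_id": "s123",
    "module_id": "CS101",
    "event_type": "vle_click",
    "timestamp": "2018-11-05T09:30:00Z",
}
print(conforms(record))                   # True
print(conforms({"student_id": "s123"}))   # False: missing required fields
```

The point of a gate like this is that everything downstream (the dashboard, the app, the predictor) can assume clean, uniform records.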
The outputs, the top layer: the most obvious one for our staff is the staff dashboard. This is the descriptive data screen that staff access to view all that good stuff about the students. The students themselves are getting an app. They use that app to consume a smaller subset of the data, but they also use it to check into events for attendance monitoring. We've been cunning little foxes, and we've now got our own way of drawing this data back in through a reporting service and adding stuff that we can't get into the UDD. That's data we know is important, but it doesn't actually form part of the UDD, and we're going to be using it for an exciting next stage next year. If this was all working nicely, you'd have alerts and interventions coming out of it. That's the programme. We're in this for another two years; our licence lasts until 2021. This is one year out of three, so watch this space. The scourge of the black spot. Again, if you were part of the analytics programme back in the early days, you'd have had to choose a data science partner. We chose Tribal and their mature product, Student Insight. Any Student Insight users here? Good, because otherwise I couldn't talk about what happened for us. Student Insight is a very mature product, but the problem we found, when we looked at the screens it was presenting, was that they showed a percentage likelihood of risk for each student, based on quite a lot of personal characteristic data: parents' educational background, highest tariff on entry, this type of thing. The term "black spot" was coined because our personal academic coaches were saying to us: if a student comes to us with some of these characteristics and they have a bad prediction rating, I can't do anything about that. I can be aware of it, but it sets up a bad precedent. It gives them a black spot, as it were.
It goes against the ethos of the coaching model that was being promoted there. Staff were reluctant to engage with it. As it turned out, it was also very much a deficit model of the data. It wasn't looking at the cohort or a course; it was looking at an individual. However, last summer the supplier said that if we wanted to continue from that point, we would have to pay a licence fee. We hadn't even validated the results. We weren't even sure about the results, but we were going to be expected to pay six figures. We were chucked; we were essentially dropped from the partnership. That left us working with JISC. JISC, by this time, had selected a consortium led by the consultants Unicon and Marist College. The difference with this model was that it focused almost entirely on descriptive data about behaviour and activities. The personal characteristics in there were very minor, very background: no black spot. The model only starts to generate predictions when the activity data appears. We were very happy with that. Our data was quite mature, so we were able to get our predictive analytics model working quite quickly. We enabled it last September, for a full year. I'm going to skip this whole section on how it works; you don't need to know about that. ROC curves for any of the data scientists; then the results. We were capturing the predictions coming out of 61 courses, about 3,000 students. We chose those 61 courses because of their low, medium and high retention rates. We were very interested to see whether the predictor was any good at predicting academic risk. What it was predicting, or set out to predict, or claimed to predict, was the likelihood of a student not progressing on the course, deemed an academic risk. We had already developed a method for anonymising the data, so we could give it to a student: we gave it to an MSc student. We created a research proposal.
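The anonymisation step that made it safe to hand prediction data to a student researcher could work along these lines. The talk doesn't describe the actual method, so this is only a sketch of one common approach (salted hashing of identifiers), with an illustrative salt and made-up IDs.

```python
import hashlib

# Sketch: replace student IDs with salted hashes so records can still be
# joined across data sets without exposing identities. The salt value is
# illustrative; in practice it would be kept secret and out of the data.
SALT = b"keep-this-out-of-the-released-data"

def pseudonymise(student_id: str) -> str:
    return hashlib.sha256(SALT + student_id.encode()).hexdigest()[:12]

predictions = [("s123", 0.81), ("s456", 0.12)]
anonymised = [(pseudonymise(sid), score) for sid, score in predictions]

# The same input always maps to the same pseudonym, so the predicted and
# actual results for one student can still be matched up by the researcher.
print(pseudonymise("s123") == pseudonymise("s123"))   # True
```

The design point is that the mapping is deterministic (so predicted and actual outcomes can be compared per student) but not reversible without the salt.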
The MSc student took it and ran with it. The challenge was: take this data, take the actual results from these 61 courses, and compare the predicted result with the actual result. Not just as a whole, but, very interestingly, through the year. At what point was this predictor any good, for what reasons, and for which courses? In the literature, there are very few people, if anybody, doing this kind of work on timeliness: when does the prediction become useful? At a secondary level, we wanted to know how the model performed over the course of the year. What are we trying to do? We want to intervene early and we want to intervene effectively. We want an early indication of risk. The JISC model differs slightly from what we might be looking at, in that it defines risk as the bottom 15% of student average grades. If anyone is familiar with the types of progression codes you can get for a student (we've got about 65), it's a moot point what categorises success and failure. Is resitting an assessment a risk? A lot of academics would say it probably is. Does the student see it as a risk, et cetera? I've got five minutes, so I'm going to crack on. I'm going to skip past precision versus recall, although it's an important concept. The results: this is what you're all here for. On the left, you've got the students in our sample who we knew were actually at risk; on the right, the ones who were not at risk. In red are the ones the predictor flagged as at risk. Now, you might think 75%, that's not bad. But "predicted" here just means the student was picked up in a predictive score once in the entire year. Taken over the whole year, that's not so bad. Unfortunately, viewed through the year, it doesn't look so good. It's probably easier to see on a monthly view, but this is a weekly view of the whole data set. As you can see, right where you might want to see results, there was a gap.
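Precision and recall, the concept skipped past above, are worth pinning down, since the conclusions later lean on both. A minimal sketch on made-up sets of students:

```python
# Precision vs recall on a toy example: which students the predictor
# flagged as at risk, versus which students actually failed to progress.
def precision_recall(predicted_at_risk: set, actually_at_risk: set):
    true_pos = len(predicted_at_risk & actually_at_risk)
    # Precision: of the students flagged, how many really were at risk?
    precision = true_pos / len(predicted_at_risk)
    # Recall: of the students really at risk, how many did we flag?
    recall = true_pos / len(actually_at_risk)
    return precision, recall

predicted = {"a", "b", "c", "d"}   # flagged by the predictor
actual = {"c", "d", "e", "f"}      # actually failed to progress
p, r = precision_recall(predicted, actual)
print(p, r)   # 0.5 0.5
```

Low precision means coaches waste time on students who were never at risk; low recall means at-risk students slip through unflagged. A predictor can be poor on either axis independently, which is why the talk reports the two numbers separately.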
Unfortunately, the predictor service stopped working for about seven weeks, which is why there's a gap in the middle. Right, so at the top, the graph is reversed: the top layer is the known at-risk students, the bottom is the not-at-risk. What we really want is the top layer to be all red and the bottom layer to be all blue. All red would mean we could actually predict which students were at risk. As you can see, it's not particularly effective, especially when you most want it to be effective, right at the start of the year, because then you can make an intervention. By the time it starts to improve, it's the end of the year. Not particularly helpful. However, that's not the whole story. For some courses, it's very good. Here's the BSc Computer Games Development course, and this is a very good graph: mostly red at the top, mostly blue at the bottom; compared with the first bar charts, that's about 55%. It's even better on this one, the BSc Computer Science, our best performing example. Here, 75% of the students who actually failed were identified in that November data grab. That's really exciting. If we had intervened using that data, we would have hit all the students who would actually have failed. That's a really good example. On the other side, unfortunately, we've got some very, very poor examples. Theatre and drama: it considered everybody to be at risk. Aeronautical engineering: nobody was at risk until the very end, when some people were flagged, but by then they had sat their exams and we already knew they were at risk. The foundation degree in community was even worse. So what's going on here? Well, I'll tell you what we think is going on: the computing courses are the only ones with blanket coverage of attendance monitoring.
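The core comparison the student project ran, at which point in the year the predictor had caught the students who would actually fail, can be sketched as a cumulative coverage calculation over the weekly data grabs. All the figures below are made up for illustration; they are not the project's data.

```python
# Sketch: after each weekly data grab, what fraction of the students who
# actually failed had been flagged at least once so far? Figures invented.
weekly_flags = {           # week number -> students flagged that week
    5:  {"s1"},
    10: {"s1", "s2"},
    20: {"s1", "s2", "s3"},
}
actually_failed = {"s1", "s2", "s3", "s4"}

coverage = {}
flagged_so_far = set()
for week in sorted(weekly_flags):
    flagged_so_far |= weekly_flags[week]
    coverage[week] = len(flagged_so_far & actually_failed) / len(actually_failed)

print(coverage)   # {5: 0.25, 10: 0.5, 20: 0.75}
```

In this toy run, coverage only climbs as the year goes on, which is exactly the pattern the talk describes: by the time the predictor improves, it is too late for an early intervention.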
So we've got a real problem in our institution with data gaps, and attendance monitoring is a big problem for us, something we need to do a lot more work on. For that course and the way they teach it, attendance is critical, and they put great store by it. So, surprise, surprise, the takeaway is: where the data is good, the predictor works; where you've got no data, it doesn't work very well. There are some graphs which demonstrate this. You can see that the ones in red are the best performers. This is a particularly interesting one: clearly, on a good performing course, it outperforms the average by miles. Et cetera, et cetera. Now, the ROC curve, for anyone familiar with ROC curves: a receiver operating characteristic curve plots the true positive rate against the false positive rate. What you don't want is this graph. This graph is showing almost a straight line, which is barely better than guesswork. And that's November. So in November, if you just got a monkey to chuck darts, you'd probably do as well as the predictor was doing. However, in June it looked like that, and that's a good ROC curve. So, preliminary conclusions, in one minute, perfect. We do happily conclude that JISC is using robust methods in its prediction calculations. There's nothing technically wrong with what they're doing, and it's been done well. Overall, the accuracy for the sample courses is approximately 60%. Precision, which measures how accurate a prediction was when the model flagged a student, was only 25%, and didn't get up to 35% until the end of the year. Recall, which measures how many of the students who actually were at risk it captured, started around 25% and never got much above 50%. So for the majority of courses, the predictions aren't useful until too late in the year. Where the data is richer, on certain courses, it performed much, much better.
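The ROC comparison above, a near-diagonal "monkey throwing darts" curve in November versus a good curve in June, can be illustrated with a small hand-rolled computation. The scores and labels below are invented; this is a sketch of the standard ROC/AUC construction, not the project's analysis code.

```python
# Sketch of a ROC curve: rank students by predicted risk score and trace
# true-positive rate against false-positive rate. AUC near 0.5 means the
# predictor is barely better than guesswork; near 1.0 is a good curve.
def roc_points(scored):
    """scored: list of (risk_score, actually_at_risk) pairs."""
    scored = sorted(scored, key=lambda x: -x[0])   # highest risk first
    pos = sum(1 for _, y in scored if y)
    neg = len(scored) - pos
    tp, fp, points = 0, 0, [(0.0, 0.0)]
    for _, y in scored:
        if y:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))        # (FPR, TPR)
    return points

def auc(points):
    # Trapezoidal area under the (FPR, TPR) curve.
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Perfect separation: every at-risk student scores above every safe one.
good = [(0.9, True), (0.8, True), (0.3, False), (0.1, False)]
print(auc(roc_points(good)))   # 1.0
```

In practice a library routine (e.g. scikit-learn's `roc_curve`/`auc`) would be used, but the construction is exactly this ranking-and-counting exercise.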
But given the overall performance, we were absolutely right to hold it back, put it in a black box, not act on it, and test it. So we did the right thing. Our next steps: now that we've got a method for examining individual course performance, we need to continue to investigate what data is contributing to the highest performing courses, why, and what we can do about it. We're going to work with the suppliers to understand this research and confirm the findings, particularly around our two different definitions of risk. As for whether our definition of risk differs materially from the bottom-15% definition, it turns out it wouldn't have made much difference, so I think the conclusions are broadly correct. We'll continue to collect data for our sample courses, and for those courses where it is actually performing quite well, we're going to carry out a pilot where we use this data in anger with our course teams, saying: take action on this data, because we can now be comfortable that on these courses, and for these reasons, it works. Director's cut later. Thank you, Martin, very much indeed. Very well timed. We do have about three minutes for questions, if there are any in the audience. Yes, please, second row from the front there. Thank you very much. Matt Offord from the Adam Smith Business School. I just wonder if you could talk a little bit more about the black spot. Is this a kind of machine learning problem where we pick up a lot of stereotypes due to bad data in the first place? I think it's the model. What we were certainly seeing within the Tribal model was that it looked at outcomes and at the factors that lead to certain outcomes, and it put a lot of emphasis on background characteristics: the kinds of things students can't change, like where they're from, their qualifications on entry, the way they got here. So it doesn't tell the right story.
What we liked about the Unicon and Marist model was that that stuff was there at the start but was massively outweighed by behaviour. We don't care where you've come from; it's what you're doing here that counts: the grades you're getting, your attendance, that type of thing. That's why we were more ethically drawn to it at the time. Thank you. Martin, we have a number of questions on Vevox. I don't know if you want to select any one in particular you'd like to respond to. The first one right away, yes. I think there's a huge data literacy question about staff looking at and understanding a dashboard. It's something we're working through; we've only been doing this for a year with our personal academic coaches, and it's something we're constantly working to improve through our staff training. So data literacy, reading graphs and drawing meaning from them, is very important. I can't quite read that one, sorry. No, the prediction algorithm is not created by our staff; it's created by JISC and the supplier, and the perceived value of the data rests on the accuracy of the system. Students weren't seeing any of this data, they weren't aware of it, and no action was being taken on it at all; it was just sitting in the black box as an experiment. Lots of questions coming in now. I think we've literally got one minute. So the longer question was, I think, pointing towards courses with lots of coursework, checkpoints, opportunities for writing drafts: do they give more possibilities to identify students at risk? It's all about the data. The courses that are performing better seem to be the ones where coursework is graded early and put into the system. Quite often our courses might not put that grade information into the data system at all; they'll be using attendance data, they'll be using the VLE.
So we've got a problem in our institution with lack of engagement with the VLE, as has everyone else; our VLE usage rates are very low. But on courses where there is good use of it, you can actually use it as a metric for engagement, and it does seem to be meaningful: there is a correlation with success. So I think that's a good question. But a lot of it is about that activity turning into actionable data that we can put into the system. Great, thank you very much. Colleagues, apologies, because there are still a number of questions coming in here. It's provoked a lot of thought, which is fantastic. However, it's now time to provoke a lot of lunch. So if we could just thank Martin and our other speakers again.