I'm joining from the land of the Wurundjeri people of the Kulin Nation, and I'd like to pay my respects to their Elders past, present and emerging. When Gerard and I were corresponding and planning this session, we kept thinking, gee, we could spend an hour on each slide we've got here, so we've got the timing down to the wire. As Kaz mentioned, we're happy to take questions on notice and respond to them in writing after the session, because we know this is a topic of deep interest. But we're also each happy to hang on the line for about 20 minutes after the formal close of this session, up until about 6:50, if anyone would like to stay on for any burning questions.

To get into the formal presentation, or hopefully knowledge transfer, tonight, we'd first like to acknowledge that this topic can be really daunting. New technology like AI can be scary, it can be complex, and it can be hard to find a way in, especially if you're busy with your day-to-day or your business as usual. It's easier to ignore the technology, hope it goes away or not engage with it, but we're really hoping tonight to give you some capability and some confidence to engage with it in a positive way, to take it forward in your practice as evaluators, and to have some comfort in using the technology. My tongue-in-cheek aspiration is: this is AI 101 for evaluators, so let's get everyone on the line to AI 102 by the end of the hour.

I like to start off with a bit of a call to action. There's so much being written, published and talked about on AI at the moment, so why should you care? I landed on three main reasons for tonight, for this audience. First, there are estimates that there are at least 275 AI and automated decision tools in the New South Wales public sector. That's the result of a stocktake undertaken across the New South Wales state government.
If we take my bad maths and multiply that across the Australian states and territories, that puts us at about 1,900 AI and automated decision tools across Australian state and territory bodies. That doesn't count federal or local government bodies, the for-purpose sector or the private sector. So there's a lot happening in this space already, and there has been for decades; generative AI has really brought it to the fore.

That leads to the second point: almost every organisation in the country is asking what AI means for them. They're thinking about it at the executive, management and operational levels. They're wondering: what does this mean for the skill sets we have now and will need in the future? What does this mean for our productivity, our competitiveness, our ability to add value and to serve our customers and our target populations? This is a huge topic of discussion right now, so it's really important to be part of that conversation, to lean in, in an educated and informed manner, and to help guide and shape it as part of our duty and role as evaluators.

Thirdly, as you'll see, new solutions and applications are being introduced and developed at an exponential rate, and so too are the jobs and skills required to develop, implement, manage, monitor and assure all these solutions, while at the same time traditional roles are under threat. What we're seeing is a real shift in the capabilities, skills and aptitudes that are required. We've seen this shift coming for some time with movements like the future of work, but it's now happening in real time, as we live and breathe.
So it's really important, more than ever, to have that future-focused posture: to be aware of and able to take advantage of the technology; on an individual level, to ensure you're not left behind; and on an organisational level, to ensure your organisation can make the most of the opportunities while mitigating the risks that do exist.

Now, some common terminology and concepts. There's a lot on the slide here, and if I went through every box we'd be here all night, so what I've tried to do is group like concepts together. I'm happy to provide a copy of these slides afterwards if you'd like as well; just find me on LinkedIn by Googling Christy Hornby.

I've started at the top left-hand corner with some general terms and why we're here. AI, artificial intelligence, is essentially machines making decisions or providing insights or knowledge. Automated decision-making, ADM, has been around for some time. ADM was what was used in Robodebt. ADM has also long been used in recruitment processes, where it might pull out key words; if a candidate has included the word 'leadership', for example, it might flag that candidate as a top candidate. And it's been used for years by banking, insurance and financial institutions to determine your credit score and whether you'll be approved or denied a home loan. So some of this technology has been around for a really long time.

If we move to the right, generative AI is what all this buzz is about. It too has been in development for some time, but it was really the introduction of ChatGPT to the world, free and accessible to all, really democratised, at the end of 2022, that brought this discussion forward. The text in the middle, AGI, artificial general intelligence, is the concept of machines being able to think like a human.
Traditionally, AI has been used for specific purposes. Think of self-driving cars, used for the purpose of driving, or the recruitment AI used for the purpose of recommending preferred candidates. AGI is a more generalised concept, where machines can think like a human. They're not linked to just one application or one focus area; their knowledge and insight can span a whole range of different areas. Some in the community are arguing we've arrived, we're at AGI, the technology is surpassing human knowledge and can mimic the human brain and human insight. Others argue we haven't arrived there yet but will in years to come. So there's a bit of a debate about that, but AGI is where the field looks to be heading.

Over on the right-hand side are some older or more common models of AI that you might see. A reactive machine is one that, as the name says, reacts to the inputs you provide it; it might use decision trees or different algorithms. The same goes for limited memory machines. They might have some information about a particular task or topic, but they can really only provide advice within the scope of the inquiry you give them, with very specific parameters. Theory of mind is a little further along the journey, but not quite as far as generative AI; it can provide some insight and mimic human behaviour, feedback and knowledge to some extent, but it's still fairly rudimentary technology.

If we jump down to the bottom right-hand side, this constellation of concepts and terminology refers to the different types of machines that are now available and the ways that they learn. On the left-hand side are LLMs, large language models: your Google Bard, your ChatGPT, your Claude, et cetera.
These take huge volumes of written words and are able to produce written content back to you. But there are also MFMs, multimodal foundation models, and that one's a tongue twister, try saying it fast. These are the ones that can produce audio, video and images.

Something that concerns me is when I hear people say AI is not very advanced, when they're really referring to ChatGPT. I agree the publicly available ChatGPT isn't very powerful and is prone to a lot of errors, but I think the conflation of ChatGPT with AI is erroneous. It's really important to keep in mind that while ChatGPT has been, I guess, the inflection point for us all thinking about this, it's not the sole type of AI technology, solution or product out there. The field is far broader, which we'll talk about in a sec.

But how do these LLMs and MFMs learn and grow and produce the amazing results we get when we interact with them? First there's ML, machine learning. You might have heard of training data. Machine learning is when a machine, a program, is fed lots and lots of information as training data. In the recruitment scenario, it might be fed lots and lots of profiles: these are successful candidates for these kinds of roles. The machine is trained on that data and then, with some reinforcement, it learns what a good answer and a bad answer look like.

Deep learning builds on machine learning and training data, but it has more nodes and is able to use more complex constructs and interactions, to go further and become self-improving over time, needing fewer and fewer human prompts. Then there are neural networks; there's a whole bunch of different kinds, with all different acronyms associated with them, so I've just left it generic for now.
But neural networks are essentially a machine mimicking the human brain. That's the further extension of having different nodes and being able to connect different concepts, and it's where we start to get into that realm of AGI, artificial general intelligence. But all of these, to a greater or lesser degree, still need reinforcement learning. You continue to teach them what's a good example and what's a bad example. In that recruitment scenario, which is quite a common one, you might reinforce the reduction or minimisation of bias. Studies have shown that machines used in recruitment scenarios commonly produce white Anglo-Saxon males as the preferred candidates. Why? Because the training data shows white Anglo-Saxon males have traditionally been successful in those roles. So you need to keep monitoring what the machines produce and reinforcing the learning in the right direction.

How do they do all this? Through natural language processing, which is their ability to read and understand written content, and through natural language generation. There are other types, like natural language understanding, but those are the main two to be aware of: reading, scanning and taking in huge volumes of text, understanding human syntax and expression, and putting it out again as an output.

Lastly, on the bottom right-hand side of the slide are some common concepts to be aware of when you're interacting with these tools, whether they're large language models or multimodal foundation models. Prompt engineering is an emerging discipline; we're seeing a whole range of new skills and jobs pop into existence. Prompt engineering is the science and new discipline of how to get the best results possible out of AI.
What sort of prompts, commands or guidance do you provide the machines to best get the answer you want and need, in the quickest amount of time? Hallucination is a term that refers to the tendency of a machine to be wrong, sometimes convincingly realistic but still wrong. For example, the publicly available ChatGPT doesn't have any data more current than September 2021; it's a known limitation of the model. However, if you ask it for information on something that happened in 2023, it might hallucinate and make up that information for you, presented in a very plausible, credible way. So hallucination refers to the ability of these machines to make errors, and you just need to be aware that that's possible. Temperature is the last point: a setting you can use to change how the model responds. Lower the temperature and the output becomes more focused and deterministic; raise it and the output becomes more varied and creative, but less predictable. So they're three concepts to be aware of when you're engaging with AI tools.

In terms of the current state of play globally, this is again a topic I could spend a very long time on, but I'm really keen to give attendees here the Cliff's Notes version. Over on the left, the US and China are where a lot of this development is happening. No surprise with the US: it's got Silicon Valley, a well-established tech industry and lots of knowledge and capability, so a lot of the innovation we're seeing is coming out of private companies in the US. China also has a vested interest in developing this technology and promulgating it, for various reasons that are speculated on by writers far more knowledgeable on the subject than me. So a lot of the development is happening in the US and China as a result.
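A brief aside to make the temperature setting mentioned a moment ago concrete: inside a language model, temperature rescales the model's scores over candidate next words before one is sampled. This is a minimal, stdlib-only sketch; the scores (logits) are invented purely for illustration, not taken from any real model.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide each raw score (logit) by the temperature before normalising.
    # Low temperature sharpens the distribution (more deterministic output);
    # high temperature flattens it (more varied, "creative" output).
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]   # invented scores for three candidate tokens
cool = softmax_with_temperature(logits, 0.5)
hot = softmax_with_temperature(logits, 2.0)
# The top-scoring token takes a much larger share of the probability at low
# temperature than at high temperature, so low-temperature output is more
# predictable and high-temperature output is more of a lottery.
```

The same handful of candidate words is always in play; temperature only changes how heavily the model favours the front-runner when it picks one.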
The US is looking at some legislation, and it's also just recently introduced a requirement for its federal bodies to have chief AI officers. That's a really interesting development: it's saying to its federal government bodies, this is a risk, you need to be on top of this technology and have established executive positions for it. China is regulating specific applications of AI; it's chosen particular industries and applications to regulate and is taking a bit of a wait-and-see approach on the others.

The EU, in the middle at the top: the strongest legislation to date is the EU AI Act. Unsurprisingly, they're the ones with the GDPR, the General Data Protection Regulation, the data privacy legislation introduced a few years ago, so it's unsurprising that the EU is quite on the front foot here. The EU AI Act is really unique in that it actually says there are some uses of AI that are unacceptable, that we just won't tolerate as a society, and they're outright banned. Then there are high-risk applications that are accepted but have stringent reporting requirements around them. The implementation of the EU AI Act will be very interesting to watch. The UK has some emerging legislation that it's developing, but it's also working in partnership with its industries. Canada has an AI Act that it's just passed, or is just about to pass, as well. So they've got some legislation down the middle column there.

And us, over in Oceania: Australia and New Zealand are taking more of a principles- and ethics-framework-based approach at the moment. Australia has the AI Ethics Framework, introduced in 2019. I'll touch on local factors in a moment, but DISR and CSIRO have been doing some wonderful work in the thinking around this as well. So that's just a really quick snapshot of what's happening in some key countries. And then there are some international approaches happening as well.
There's the Global Partnership on AI, which has a range of member countries signed up, and a few other global bodies looking at how we harness this technology, make the most of it and manage the risks. As I alluded to, within Australia DISR, the Department of Industry, Science and Resources, and CSIRO are really flying the flag here. DISR provided the interim guidance to federal agencies, probably a year or more ago now, that looked at the introduction of generative AI and provided some parameters and guidance for Australian Public Service members on what to do and how to use it. DISR has also commissioned and made public a range of information and other knowledge materials on the topic. I highly commend the rapid response review, which looks at the state of play of the technology and provides a really nice, easy layperson's view of the world and what to be aware of. CSIRO has the National AI Centre and Data61. CSIRO runs a community of practice around AI, does informative webinars and also has a range of resources. So they're my top tips for rapidly upskilling yourself: go and access the very user-friendly materials available through DISR and CSIRO. They're doing great work and making it available to us all, so why not take advantage of it?

Just a last little bit of background before I get into some of the meatier material we're all here for. Given my concern that ChatGPT is sometimes conflated with AI, this is just a small smattering of some of the solutions that are available. Gerard, who you'll hear from soon, is a big fan of Claude, and I think the quote was, 'Claude takes ChatGPT out the back and beats it up.' I'm sure Gerard can correct that saying. But ChatGPT, Claude, Bard and Baidu's model are all similar, in the large language model space.
Copilot: Microsoft has a whole range of different Copilot applications that integrate with its, I guess, other enterprise solutions, a whole bunch of add-ons that extend them. Baidu is also in the large language model space, more of a Chinese one. DALL-E and Synthesia (not Runway, Runway is a video one) can produce images. Interestingly enough, I borrowed a book from the library last week that had images generated by DALL-E. So that's taken the job of an illustrator: this children's book had DALL-E-generated images in it. Magic Form is something you can use to support customer interactions. GitHub Copilot: my partner's a software engineer, so they're exploring the use of that. Runway is a video technology. And Otter AI I've used before as well; it captures your meeting notes and summarises them. It doesn't just capture a transcript and feed it back to you; it's really fascinating in that it's able to generate entirely new paragraphs summarising what has been said, just off the voice input it's given. So my main call to action here is: be aware there's a whole world out there, with new applications developing every day. Keep your mind open to the almost infinite number of applications of AI; this is why it's so important to remain abreast of this emerging technology.

Christy, can I just interrupt quickly, because it aligns to a question that's been asked around which of the tools you think are the most current in terms of their data. There's been a bit of chat and some feedback on that, but I'd be keen to hear your thoughts as well: which one's the most current and most used?

It's really difficult, because my evaluator's brain goes, it depends; based on what criteria, right? I would say for accessibility, I think ChatGPT is still the most well-known brand. It's the one most people globally are interacting with.
Its currency is a bit out of date, though, so that's where I'd counsel: just be clear what you're using it for. If you're using it to summarise information, pull a general information pack together or write a letter for you, perfect use. If you're using it to conduct research into highly complex subject matter, maybe use other technology, or maybe still use a human-based approach. Bard is quite good; it takes the capability a little further and is more current and contemporary, but isn't as well known and therefore isn't as accessible or front of mind for others. DALL-E is quite good for image generation, though you do have to prompt and poke it a lot, and Bill, the CEO of the AES, was just sharing some challenges they had with image generation for Fastivale using one of these visualisation tools. So they're not magic yet. Otter and Fireflies are two voice recognition tools that are quite good as well. So I'd say horses for courses: think about what you want to use the technology for, and even then, hop on Dr Google; Google can give you a sense of which applications are best for which use case.

And with that, that's really where some of the key opportunities and risks lie, because the technology is so accessible. We've got unlimited access to this very, very powerful computing technology, but we need to use it wisely. Fans of the Spider-Man movies will know that with great power comes great responsibility, and that's what I'd urge us all on the line to think about.

In terms of some of the real opportunities for us all here, the number one for me is a real societal shift: the democratisation of knowledge and power for those who may not traditionally have been able to access it. If you've got English as a second language, or if you haven't been able to undertake further studies, this technology is for you.
You can hop on and find an answer, or access writing or image generation capability that you'd normally have to pay for or undertake advanced studies for. A good example is Photoshop. Back in the day, that software package commanded an RRP of $1,000-plus. Now you can get a similar result, with a little bit of prompting and poking, from the image generation software and solutions that are out and available.

All this means greater process efficiency. Tasks that used to be quite manual and quite onerous can now be done more efficiently and more effectively, ensuring you've got appropriate quality assurance in place, of course. It means more time available for individuals and organisations for strategic thinking, and more value created as a result. I think this can really unlock some productivity for the economy, and hopefully we will finally reach that nirvana speculated about back in the Jetsons, of not needing to work 40 hours a week to make a living.

I think there's also a real opportunity here with the new jobs and skills being created: new opportunities for all. It's breaking down and challenging our traditional paradigms of what education meant, of what skills meant. There are going to be a whole host of new pathways for everyone, and that will also break down some barriers that have systemically been in place for some parts of society. We'll also see knowledge breakthroughs in key areas, which we already have. Healthcare diagnostics have just gone through the roof; there's scanning technology available for X-rays that produces around a 98 per cent success rate on some cancer diagnoses and similar. But we're also seeing it in food production optimisation and climate change action, really looking for solutions to those wicked problems.
We can use machines that can suddenly go one plus one equals three, instead of one plus one equals two, and augment our own capabilities as a species much faster and for much less cost.

There is always a downside, though, and there are some really significant risks here. Top of my list is privacy and confidentiality breaches. With anything that's open source, or anything whose training data feeds back to the host organisation or the owner of that technology, anything you put into the machines is gone. It's gone forever and it's out there in the ether. There's also a famous case from early 2023 of a Victorian mayor who was incorrectly described by ChatGPT as a perpetrator in a bribery case, when in fact he had been the whistleblower. If you put his name into the tool, it came back with that false claim. He went to court to try and clear his name but wasn't able to, and so that man's reputation is just caught in the machine and will keep coming up on Google as a result of this incorrect hallucination.

There's also the risk of misinformation and disinformation. With hallucination, the machines can inadvertently lie; if you prompt them enough, they will try to give you an answer even if it's not the right answer. But I've added disinformation as well as misinformation because we know there have been challenges in the past. We really are living in an alternative-fact, post-truth world, and with things like the US elections coming up and the ability to create and manipulate audio and video now easier than ever, there's a real risk there. Perhaps we can't always believe everything we see and hear.

There can also be a heightened risk of discrimination towards marginalised people. That example of recruitment bias is a true one, a live one, and not just there, but also in the processing of passports and of mortgage applications. These algorithms, these machines, can affect your life in more ways than one.
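To make that recruitment-bias mechanism concrete, here is a deliberately tiny, hypothetical sketch of how a model "learns" from historical hiring records and how bias in those records flows straight through to its recommendations. All keywords and data here are invented for illustration; real systems are far more complex, but the failure mode is the same.

```python
from collections import Counter

# Invented historical records: (candidate keyword profile, hired?).
# Past hiring favoured "golf" profiles regardless of capability.
training_data = [
    ({"leadership", "golf"}, True),
    ({"leadership", "golf"}, True),
    ({"leadership", "netball"}, False),   # equally qualified, not hired
    ({"teamwork", "golf"}, True),
    ({"teamwork", "netball"}, False),
]

# "Training": count how often each keyword co-occurs with a hire.
hire_counts, total_counts = Counter(), Counter()
for profile, hired in training_data:
    for kw in profile:
        total_counts[kw] += 1
        if hired:
            hire_counts[kw] += 1

def score(profile):
    # Average historical hire rate of the candidate's keywords.
    rates = [hire_counts[kw] / total_counts[kw]
             for kw in profile if kw in total_counts]
    return sum(rates) / len(rates) if rates else 0.0

# The model now prefers "golf" simply because past hiring did,
# even though the keyword says nothing about capability.
print(score({"leadership", "golf"}))     # scores high
print(score({"leadership", "netball"}))  # scores low
```

This is why auditing the training data, and monitoring outputs over time, matters as much as testing the model itself: the model is faithfully reproducing the pattern it was given.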
There are many schools of thought on it, but one way AI is distinguished from ADM, automated decision-making, is that ADM is rule-based, so you can trace a decision back, whereas AI can be a black box, and you can't. So you may never know the reason you were rejected for something, but it can significantly affect your life nonetheless.

There will be job losses, and there will be structural change in our societies. Are we ready for it? We don't know. There's no forward plan for our country, for our nation, for the world, for what's going to happen when these potentially large restructures and transformations of industries occur. There's no plan yet, which again is part of my call to action: make sure we're part of a community of thinkers applying our knowledge and our insights to solve these problems for society.

Last but not least, a lot of the challenges with humans in general are exacerbated by this massive computing power. The ability to access knowledge quickly and to write things quickly really exacerbates a lot of our existing risks around using technology and using correct, proper and true information. It takes the risks that existed in our society before and puts them on speed, because of the extra volume we're able to unlock now.

So there are opportunities and risks, but to start taking us towards Gerard's demo, which I'm very excited for, I've got a couple of last takeaways for you. What does this mean for evaluators evaluating AI-enabled programs, and conversely, what does this mean for evaluators using AI in their practice? My top four suggestions, the framework we've developed at Grosvenor, for when you're evaluating AI-enabled programs: first, you must evaluate at all stages of the program lifecycle.
That means thinking about how you assess the model's design, its training, its testing, its implementation and its post-implementation. Second, you need to evaluate more than usual to understand where the human is in the system. You might hear the term 'human in the loop'; we don't think that's enough. We think it encourages a compliance tick-box mentality: yep, we've got a human in the loop, tick, good. We think you should know where the human is in the system, the human should be centred in the system, and, as a result, you should know how oversight, governance and the avoidance of bias are being managed. With that, you'll need to pay specific attention to the training data. Third, you need to determine the impact of the AI-enabled program as you would for any other evaluation, but with a particular focus on the quality of the training and/or input data (a possible area for improvement you might turn your attention to), the decision-making process, and the impact on the human, considering both positive and negative outcomes. Fourth, and hopefully a very loud and clear message throughout: this all only works if you understand the technology yourself. To discharge your duty to conduct high-quality evaluations and provide useful insights, findings and recommendations, you do need a layperson's basic working understanding of the technology, to be able to discharge your role as an evaluator from this point forward.

If you're considering using AI in your practice, here are my four (really a cheeky six) top tips for evaluators using AI. One, which is really numbers one, two and three because it's so, so, so important: be conscious of the strengths and limitations of the technology you're using. That goes to the earlier question as well, of which application is best; it depends on what for.
Luckily, because it's such a well-covered field, there's a lot being written about it, and you can find out which tool might be best for your purposes with a bit of review and research, as you would when procuring or using any technology. Bias in training data is a huge issue, so just be sure you're not inadvertently perpetuating any inequalities or incorrect information. It's really your responsibility now to educate yourself regarding AI and make sure you're using it responsibly.

Once you've covered appropriate use and the essence of it, I recommend validating your outputs, at least the first few times: the advanced equivalent of 'look, say, cover, write, check'. What output is the machine giving you? What output would you get using a more traditional means? How close or congruent are they? Check that a few times, adjust and tweak as needed, and once you're comfortable with that, you can progress more rapidly. But I'd still recommend repeating that process fairly frequently, just to make sure there's no drift away from what you'd expect the results to be, and to minimise the risk of hallucination.

Next, subscribe to threads and channels about AI. It's very easy to get a lot of information like this in bite-size, digestible chunks, so stay across the latest updates and any bugs. There can be instability issues, or major defects in different versions that are released, so stay across what's changing to make sure you remain contemporary.

And last but not least, again: give it a go. We are a passionate community, a smart community, an ethical community. I think the world needs our thinking power and our consideration, and that involves being comfortable experimenting, trying something new and adapting, to be future-ready and to provide the best quality advice possible.
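One hedged, hypothetical way to put that validation tip into practice is to score the agreement between machine-generated outputs and outputs produced the traditional way, before relying on the tool at scale. The sketch below uses simple percent agreement on invented data (themes coded against ten survey comments); the 90 per cent threshold is an arbitrary placeholder, not a recognised standard.

```python
def percent_agreement(machine_codes, human_codes):
    """Share of items where the two coders assigned the same code."""
    assert len(machine_codes) == len(human_codes)
    matches = sum(m == h for m, h in zip(machine_codes, human_codes))
    return matches / len(machine_codes)

# Invented example: themes assigned to ten survey comments.
human = ["cost", "access", "cost", "quality", "access",
         "cost", "quality", "access", "cost", "quality"]
machine = ["cost", "access", "cost", "quality", "cost",
           "cost", "quality", "access", "cost", "access"]

agreement = percent_agreement(machine, human)
print(f"Agreement: {agreement:.0%}")  # 80% on this invented data
if agreement < 0.9:  # placeholder threshold; set your own
    print("Below threshold: review the disagreements and adjust prompts.")
```

For serious use you would want a chance-corrected statistic such as Cohen's kappa and a close read of the disagreements themselves, but even this crude check makes "drift" visible when you re-run it over time.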
So without further ado, I'll hand over to Gerard from ARTD for a live demo. I've seen parts of this before at AES Brisbane and it was amazing, so I'm looking forward to tonight's session and hope you all enjoy it as much as I will.

Great, thanks, Christy. And hi everyone from beautiful Gunditjmara country down in Waterville, Victoria. I'd like to also acknowledge Aboriginal Elders past, present and emerging, and extend that acknowledgement to Aboriginal, Torres Strait Islander, Pasifika, Māori and First Nations people who are joining us tonight. I'm going to be jumping back and forth between screen shares tonight because I'll be demoing across a few different apps, but I'm going to start by sharing my PowerPoint presentation, because we always need to have a PowerPoint presentation. So let's play that from the start.

I've titled this 'applied machine learning approaches in evaluation' because I feel like 'AI' doesn't quite cut it; it's a subset of where the opportunities lie in this area, and the field is evolving constantly. There's a lot of terminology that gets bandied about, and I prefer 'machine learning' because it allows me to encompass a whole range of tools. But I'll start by talking about evaluation encompassing a whole range of disciplines. Now, the Scriven fans in the audience might go, no, that's anathema. But I used to come at evaluation from the idea of it being a polydiscipline, drawing upon elements of market research and psychology and public policy to do its work. The flip side of it is that those various disciplines all have some form of evaluation at their core. What we're seeing with this technological development, machine learning and large language models and, yes, artificial intelligence, is an effect across all of these disciplines.
So regardless of which school of thought you subscribe to, it's pure logic that if these other disciplines are being impacted, then evaluation is going to be impacted as well. So in the next few slides, what I'm gonna do is give you a bit of an overview of where I'm coming from on machine learning and artificial intelligence, and then I'm gonna get into the applications of it and show you some real-life examples, live, of using some of these models to do evaluation tasks and evaluation-like tasks, and see how they perform, what we can learn from that, and what the implications are for practice as evaluators. So how do we apply AI or machine learning in evaluation? Well, broadly, and I mean that in two senses of the word. Firstly, AI we have to define more broadly. As I said, artificial intelligence is often bandied about. We're talking about things like ChatGPT, Claude, Gemini, et cetera, et cetera. But these are really a subset of large language models, which in turn are a subset of machine learning more broadly. So that's sense one. Within that, there's different degrees of machine learning. You can have fully automated machine learning, out there on the right, which is where everything or almost everything is being done by the machine learning system and you just put in the inputs and get the outputs. But there's different degrees of automation depending on the kind of work that you're doing. Sometimes you'll be doing part automation, where you put in an input, you'll get the output, refine that, put it back in as an input. The back-and-forth chat is an example of that, but there are other types of machine learning that rely on a back and forth where the machine learning is just doing steps along the way. And then you've got low automation, where there's actually a lot of human input to get, for example, the data ready so that it can be fed into the model.
And then you've got to post-process the data coming out of the model to turn it into a form that's useful. These are all types of machine learning, and there's a lot of different models and algorithms and approaches that sit under that. What they all have in common is they operate from a design perspective of training, testing, and then applying. So these machine learning models have to have an understanding of the world in which they're working, and that's where we do the training. And then we test it on fresh data to make sure that it's able to attack new problems and come up with cogent responses. Once we're satisfied with that, we then apply that in practice. And then there's a cycle of refinement that goes on throughout. The other part goes back to this polydiscipline idea. AI applies broadly. There isn't one of these domains, no, not even philosophy, where I haven't seen AI being talked about. In fact, philosophy is where a lot of the deep discussions around AI are happening, and there's some fascinating articles out there. There was one in the New Yorker recently that talked about some of the big debates happening philosophically with AI and the interface with technology. But across all these domains, we're seeing applications. And as I said, because these influence evaluation and are influenced by evaluation, we can simply expect that evaluation and AI are going to be things that happen together. So with that in mind, I'm gonna run through applications of AI and machine learning in evaluation across all stages of the life cycle. And for this, I'm using the Rainbow Framework that you might have seen on Better Evaluation. It's a framework like any other. It's not necessarily gonna cover all the evaluation space, but it gives us a good guide for the various steps of an evaluation taking place. And what I'm gonna do is show you at each stage how we can apply machine learning techniques. This is just the first taster.
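That train, test, then apply cycle can be sketched in a few lines. This is a hedged toy illustration with invented scores and labels and a deliberately trivial model (nearest class mean on a single score), just to make the three phases visible without any ML library:

```python
# Hedged toy sketch of the train / test / apply cycle, using a deliberately
# trivial model (nearest class mean on a single score) so that the three
# phases are visible without any ML library. Scores and labels are made up.

def train(examples):
    """Learn one mean score per label from (score, label) pairs."""
    sums, counts = {}, {}
    for score, label in examples:
        sums[label] = sums.get(label, 0.0) + score
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, score):
    """Assign the label whose learned mean is closest to the score."""
    return min(model, key=lambda label: abs(model[label] - score))

# 1. Train on labelled examples.
model = train([(0.9, "positive"), (0.8, "positive"),
               (0.2, "negative"), (0.1, "negative")])

# 2. Test on fresh, held-out data before trusting the model.
held_out = [(0.85, "positive"), (0.15, "negative"), (0.7, "positive")]
accuracy = sum(predict(model, s) == l for s, l in held_out) / len(held_out)

# 3. Apply in practice only once testing looks acceptable; otherwise refine.
if accuracy >= 2 / 3:
    print(predict(model, 0.95))  # -> positive
```

The cycle of refinement is the loop around steps 2 and 3: if held-out accuracy is poor, go back and retrain rather than applying.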
So I got Claude to actually define the Rainbow Framework for me. Because Claude is very good at being articulate, it provided a nice little summary there of what the Rainbow Framework is. But let's get into the applications themselves. So on the manage side, we talk in the Rainbow Framework about stakeholder engagement, identifying ethical considerations and even just the project management and document management. And where I see large language models in particular working well here is where they are assistants, advisors and repositories. So they can help do some of these tasks. They can be a sounding board for your ideas, and they can come up with potential things that you might have missed. And that's gonna be a thread that we carry through a lot of the stages of evaluation. But the repository part is also interesting. Some of the more recent developments have allowed large language models to be applied to your document systems and custom data, and to start to look through that and structure it and add metadata as necessary. So I'm going to stop sharing for a second because I'm going to now demo what we're doing here. So for this, I'm actually not going to use one of the online services. And this is one of the big developments that's really gaining steam, and it's all around locally hosted large language models. And this is what's got me excited, because at ARTD, we have an AI working group, we have an AI policy. That policy sets out all the AI tools that we know have potential use cases and what the use cases are. And for most of them, we're saying you do not touch this, because of security risks and because of our contractual obligations to our clients. What I've been testing out here with offline, locally hosted large language models is systems where we're not sending any data out to the cloud. It is kept secure in an isolated environment. And we want to see whether this is viable. And so today I'm working in a piece of software called Anything LLM.
It's a platform, and there's quite a few that have been developed, and it allows me to talk with a range of different models in different contexts, load in test documents and test it out. The model I'll be using today is one called Zephyr 7B Beta. It's a locally hosted model. The 7B is its size: the model has about seven billion parameters. Now, by comparison, some of those bigger cloud-hosted models are anywhere between 33 billion and a trillion. You can't run them on your home machine. This one can actually run on a standard desktop computer. And it's a little slower, but not by much. And some of the more recently developed ones, Zephyr 7B Beta included, are actually performing not far off ChatGPT 3.5, for example, and in some cases exceeding it. So in this live demo, what I'm gonna do is go across to this thread here, and I'm gonna copy in a prompt. Now, the first thing you should know when doing work with AI is that it's only as good as the instructions that it gets. And what can really help in prompting an AI, in getting a large language model to provide responses consistent with your expectations, is to give it a character to adopt and give it a context. And so I'm just pasting in here now: you're a program evaluator that's designing an evaluation of a youth-focused mental health service in a regional community. So this is where it's immediately started to hallucinate. And I knew this might be a problem with this particular model. It doesn't quite process questions, but that's okay. What it has provided as a response is a little bit of understanding: okay, I get that this is the context. That's okay. What I'm now going to do is a follow-up question.
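The character-plus-context prompting pattern just described can be sketched as the role/content message structure that most chat-style LLM interfaces accept. The exact interface is an assumption about whichever tool you are using; the wording here mirrors the demo prompt:

```python
# Hedged sketch of the character-plus-context prompting pattern from the
# demo, expressed as the role/content message list that most chat-style
# LLM interfaces accept. The interface shape is an assumption; the prompt
# wording mirrors the one pasted into Anything LLM.

def build_messages(persona, question):
    """Package a persona-setting system message plus the user's question."""
    return [
        {"role": "system", "content": persona},
        {"role": "user", "content": question},
    ]

messages = build_messages(
    persona=("You're a program evaluator that's designing an evaluation of "
             "a youth-focused mental health service in a regional community."),
    question=("Who might the key stakeholders be, how might they engage, "
              "and what interests might they have? Put it in a table."),
)
print(messages[0]["role"])  # -> system
```

Keeping the persona in a system-style message, separate from each question, is what lets every follow-up question inherit the same context.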
And I'm working with a completely fictitious design, but I wanna show you that there's enough knowledge underpinning these models that, with the right prompts, it can actually come back with some really useful insights that you can then take and refine with your knowledge in your context. So I say to it, okay, who might the key stakeholders be? How might they engage and what interests might they have? Useful questions. And I also said, can you put it in a table, because I want a nice structure that I can copy across to another document. And you can see, right now, live, it is generating that table and listing out the stakeholders. Now, interestingly, I compared the results of this prompt with what Claude puts out, so one of the big cloud services, and it was almost identical in terms of which groups of stakeholders it was identifying. So the thoroughness was there. That gives me good confidence that there's something here in these offline LLMs. And it's also added a little bit at the end to say, I hope this helps. All these assistants are designed to be helpful by default. That's a blessing and a curse. They sometimes don't tell you that they may not know something. But so far, so good. Going back to the slides, we then have Define, the next stage in this framework, in this process. And what it can do is create descriptions from existing documents. It can do program logics and theories of change. This was our first test use case that we did in our research and development, and we were pretty impressed. It's a starting point. Like anything, you know, it's not perfect, but it's taken you enough of the way there that you can really start working with it. The other thing that it's useful for is identifying unintended potential results and impacts, because as a single evaluator, my brain space is only gonna cover so much context.
Having a sounding board, having something that may prompt ideas, is really useful, because even if it doesn't get it perfect, it might prompt something in my brain that I can go and investigate. So, doing the stop-start sharing again, I'll go back across to Anything LLM and say, okay, let's define a few things here. What might be a typical theory of change for this kind of a program? Now, interestingly, when I've used the words theory of change across different large language models, they've come up with different definitions of theory of change. So some of them do actually treat it as a program logic-like structure. Others go with this broader, generalized concept, and some will provide an example of a prose-like theory of change. In this case, we've got a structure. It's identified the kinds of things that you would have in the theory of change; it hasn't actually generated it. And some of that's down to the language of the prompting. If you don't get the result you like, you can always go back and say, hey, I was actually looking for something a bit more like this. What can you give me that looks like this? And it goes, oh, okay, I'll try that. So if it doesn't work the first time, you do have the opportunity to refine it and teach it what you want. Because remember, these models are only as good as the information that goes in. And so if the answer doesn't look the way that you're expecting or wanting, always check what you've instructed it and how it's interpreted that. So that's helped. It's given some prompts about what we might be looking for in the theory of change. Fantastic. The next step is around framing and how we set that up. So with framing, we can do various things. This is one I did last year, looking at whether ChatGPT could actually analyze a rubric. And not only did I show that it kinda can, but it can also come up with its own rubrics, which is fantastic.
And it can set up some of these structures that can guide our evaluation practice. So there's that one. I can also ask a large language model, what's the best way to work with you to do the task that I want you to do? So in this case, I was asking ChatGPT, how do I get you to review a rubric? And it just went, boom, here's how you would write a prompt for me. Also on the framing side, I can get it to generate key evaluation questions, for example. And so what I can do now is go back into that same demo and say, what might be the key evaluation questions I wanna ask when doing a process evaluation of this program, using a realist evaluation framework? Now, hopefully this works. There's always the option for failure here. And it's just waiting. So I think what it's, yep, here we go. So it's done the intro text, but now it's actually creating a set of key evaluation questions. And they may not be perfect, but this is a very good start. And that's really useful for us to interrogate. It tests our thinking and it gives us questions that we can then move from. And that's where I see large language models as having real value: they can assist that thinking process more than take the place of it. So whilst it's generating the key questions, in the interest of time, I'll keep moving this presentation along, because we've still got a few more areas where things can happen. So in the Describe part, you can use these large language models to ask questions about what might be a good sample frame, how might we set that up, what might be good measures in this case. It can help with data collection tool design and come up with survey question ideas. Again, they may not be perfect, but your job is then to refine them so that they are. There's all sorts of things about analysis planning and even data visualization. I've been looking at some demos of ChatGPT-4. It can actually now spit out visualizations of data, which is pretty impressive.
If you give it a spreadsheet, it'll analyze it, create bar charts. It's not perfect. And in fact, the demo I want to show is based on something slightly different, which is where I've got RStudio, and all that code there was generated from a large language model with the prompt, hey, could you synthesize some data? And it went, no. And I said, well, can you show me the R code to synthesize some data? And it went, yep. Again, it's only dependent on your input, so check your input and know the limitations of some of these models. And in this case, it doesn't understand that it can synthesize data, but it can synthesize R code, which can synthesize data. So we just had to remember that there's that step in the process. And I said, okay, now give me some R code to visualize the data, because I've learned this time that, no, it's probably not gonna be able to visualize it directly, but it will give me some great R code to do so. And I took that, ran the code, and it popped up that bar chart. Now this is where it also gets quite interesting, because I can get it to tell me, based on the data there highlighted, what's going on in the data itself. So it can start to do a descriptive analysis of what's taking place in the data. And that's particularly interesting to me as somebody who's very much on the quant analysis side. But what I'm going to do is actually show you this, actually I won't because of time, but I can show it perhaps later if people are interested. I can get Claude to actually do the analysis, and it'll provide averages, standard deviations, talk about outliers. But what I will do is move across into the next stage of the framework, which is about, actually before we do, just quickly explaining this slide: for people who have seen my presentation last year, this shows just the variety of different approaches that constitute machine learning in some degree. On the right-hand side, you've got GPT 3.5, and there are other language analysis approaches.
So you have BERT, you've got LDA, which is very old school, you've got QAD, which is my particular method, my pet method, and then just pure human coding. And I did an analysis last year comparing all these tools on a range of different dimensions, and you can see that there's different performance outputs. Moving to the next stage, we talk about understanding. So it can take the data, understand it, and start to interrogate it. It can think about things like counterfactuals and identify comparative analyses. It can look at potential alternative explanations for the patterns you're seeing in your data. And an example of this I'll show live, the same example of that mental health program. Now I create a scenario and I say, look, on the whole, the program's worked. However, for a certain group, it hasn't worked as well, and what might be some factors that could explain why? And each LLM provides a slightly different version of this. Here it's actually gone and just said, I'll just look at the key evaluation questions. So that's not quite what I wanted, but even within that, you're starting to think, okay, there are some contextual factors. It's guiding me towards the kinds of questions I should be asking of the data. So it's not too bad. It's not quite the right response. And when I've run this through something like Claude, it's provided the response in the form of an answer that says, okay, you should consider these factors. But it's come back with questions that do reflect the kind of factors we'd wanna look at. So this is where it's evolving towards some really insightful behaviors. And you can work with that response. Back to the presentation, we've got the synthesis element. And this is where I was talking a bit about some of the visualization that can be done, some of the reporting. One demo that I have from last year was where I got ChatGPT to actually run a rubric on some different types of data sets. So we had paper abstracts.
So it was all the abstracts from AES23. We had a database of recipes and a database of resumes that was openly available. And we created a rubric around each of those data sets and let ChatGPT have a crack at it. And these were the raw results. We had sort of 10 examples from each of the data sets, and it went through and analyzed it and scored it and showed its reasoning when I interrogated it. So it is capable of doing some of this synthesis, but you do need to check it. You do need to test it. It's not perfect, but what I've seen, even just over the past six months, is that it's getting better. And there are new products in the market that are tuned towards different tasks, and those ones are going to have significant implications for our work. Then you've got reporting, the last stage. And this is where I got a bit lazy. I went, hey, I've just left the bullet points here. How about I get an LLM to generate something a little more prose-like? And so again, I'll bring up Anything LLM, and I'm going to switch over to a new thread because I wanted to start from a clean slate. Otherwise any answers it gives will be in the context of previous answers that it's given, so it'll start talking about that demo program. And this is quite common with a lot of these back-and-forth type interfaces with LLMs. So in this case, I'm gonna say, right, take those bullet points and turn them into a 200-word prose summary, and do it in plain English, because my regular English is far too academic. And so you can see it's turning bullet points into prose text, which is pretty cool. It's a useful tool. Again, you have to check the output to make sure that it's reflecting what you were thinking when you wrote those bullet points. But you can see it's taken those bullet points and started to build that out into something that reads like prose, which is pretty good.
If I put it into some of the state-of-the-art LLMs, the big online ones, it's even more elegantly written, and it might just come back with one or two paragraphs where it's all integrated and threaded and very articulate. Another example that we can do is take prose text and turn it into something a bit more summarized, a bit more clear. So for this, I'm actually taking text from a paper I wrote, and it was very academic and the peer reviewers didn't necessarily like it, but it is dense, and it's like, okay, what does this mean for somebody in plain English? And it's had to have a good think about it, but here we go. It's starting to summarize things in a bit more clear English. But you know what? I don't think that's clear enough. I want to take that, and when it's done doing this, I'm gonna, oh, it won't let me until it's finished its current output. So I say, that's great, can you put it in bullet points, and here we go. As I said, this is all running on a desktop machine, which is why it's taking a couple of seconds. You know, it's not gonna be as fast as a giant server farm, but you can see it's put it into nice bullet points, and that's useful, that's fantastic. Now, moving on from there, I just want to go into some final points very quickly. I know I've taken up a lot of people's time tonight and I appreciate your patience on this, but I want to say some rules about this. Firstly, always thank your assistants. They are models, they are just at their core probability engines, but I don't want to leave anything to chance. And it's good manners. And it also helps train the models, especially the online ones; positive feedback is a reinforcement loop in some of those models to train them. So be thankful when it does work, because that creates better models in future. What are the implications?
Well, machine learning approaches can be cheap, some of them are free, but you get what you pay for, and that's where we run into things like security issues, because there's no such thing as a free lunch and it's the same deal here. If you're using a free service, you are the product. So really be careful in selecting your tools and understanding what their security practices are. It's one of the reasons that we've switched over to testing offline LLMs, so that we've got a greater degree of control over that data. And the compromise is that it doesn't perform quite as well, and we have to do a lot more work of tuning, so that's where we're paying for it in terms of time and effort to get it set up. Choose the right tool for the job. Different machine learning approaches and tools have different use cases, and I saw there was a great question there: what are the best AI tools for different things? People are still figuring their way out. I did put a link in there to a digest that I get by email every day, and it often lists some of the new tools coming out. That's where people are doing the work to take some of these models and really fine-tune them for specific applications, and so knowing about what's out there might give you some good steers on what to use. And for some tasks, these models are matching and exceeding human performance. You saw, even in a live demo on a desktop machine, it was coming out with text faster than I could type it, at least, and faster than I think a lot of people could type it, so I consider that exceeding human performance. Some caveats. We've talked about security and privacy; you've got to be really careful, especially when you are working with data that is not yours. So I really do advise caution: test, test, test, and only when you are extremely confident about the quality of the outputs and the security of your data can you start having the conversation about applying it. Data limitations, AKA context.
We talked about the inputs being important; context is everything, and there's only so much that these models can process and keep in memory at a given time, so that also influences the output. The coding barrier is still there. It's getting lower, but to really get the most out of these, you still need to know how to program a machine. And yes, some of these large language models can generate code for you, but if the code breaks, you still need to know enough to fix it. Then there's the algorithmic bias, which Christy's discussed and I'll reinforce: the training data is biased. There are things in place that are trying to work against this, but you've still got to question everything in your data. And hallucination remains a problem. This was Claude when I asked it originally about the rainbow framework to give me a nice potted definition of it. I thought, oh, it's just a different framework, and went on with my day, and then went, no, I think I would have heard of this framework; it looks like a really good framework. And I did my research: there's no Mark Matheson who came up with this framework, and there's no framework out there called the rainbow framework that looks like this. But I'll tell you what, that's a nice framework. I'm thinking I might just use it myself in future. But what I had to do was go back to Claude and say, no, I meant the Rainbow Framework on Better Evaluation that links with this, and it went, oh, that framework, right, and gave me a nice potted summary. So you have to question everything that comes out of these. Don't take it at face value, and use your instincts as an evaluator and your experience as an evaluator to critically analyze these outputs and question them for veracity.
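One small, concrete way to practice that questioning, sketched here in Python with invented numbers: when a model asserts a summary statistic about your data, recompute it yourself and flag any mismatch before the claim goes anywhere near a report.

```python
# Hedged sketch: recompute a model-claimed statistic from the source data
# and flag any disagreement, rather than taking the output at face value.
# The scores and the "claimed" value are invented for illustration.
import statistics

scores = [4, 5, 3, 4, 5, 4, 2, 5, 4, 4]
claimed_mean = 4.2  # what the model's summary asserted

actual_mean = statistics.mean(scores)
if abs(actual_mean - claimed_mean) > 0.05:
    # The 0.05 tolerance is arbitrary; tighten it for published figures.
    print(f"Check this: model said {claimed_mean}, data says {actual_mean:.2f}")
else:
    print("Claim matches the data")
```

The same pattern extends to anything checkable: counts, quoted passages, cited frameworks, references.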
You know, I know Christy said human in the loop can be a tick-box exercise, and as soon as she said that I went, oh gosh, guess what's in my slide deck. But it stands to reason that with any application there needs to be active participation of humans, and scrutiny of the processes both within the system and from outside the system, to make sure that it is functioning correctly. And that's where we still provide value as a profession: that critical thinking. And that brings me back to this. This is the core of both schools of thought, that evaluation is about critical thinking. And even though large language models and machine learning are starting to do critical-thinking-like tasks and getting quite competent at them, we still need the human critical thinking and the experience and skills that trained and experienced evaluators provide to make those tools function at their best and create their best contribution. So thank you so much for your patience in going through that demo. I'll take a look at the chat comments now, but I know that I need to hand it back to Christy and Christian, who are going to do a little thing for you about some research in this space. Well, thanks, Gerard. While I'm bringing the slides up, there's a question in the chat for you. How much does Anything LLM cost? Well, how much do you wanna pay? Cause I can give you a quote. No, Anything LLM is free. It is actually free software. That being said, the business model is more about hopefully providing some enterprise cloud-served systems using this platform as the front end, but they also recognized that there was a huge open-source research community out there that is gonna be below their threshold, and people who wanna work in a desktop environment. So for now, it's free. So if you're interested in trying some of these out, you can take a look there. The other piece of software that was running in the background is actually powering Anything LLM, because Anything LLM is kind of a front end.
On the back end, I'm using a piece of software called LM Studio, which allows you to use some of those self-contained models as a virtual server that Anything LLM communicates with. The reason I use Anything LLM is because it can actually take in documents, which LM Studio can't. So it's just sort of an interfacing thing, but both those tools are free for now. Yeah, great. Thanks, Gerard. That's good to know. So just to sort of close us out, and as Gerard said, thanks all for your patience. It's a topic we could yammer on about for a long time, so maybe next time we'll look at a half-day workshop for all interested. But this is an evolving field, and we've actually got on the line Christian, who's a Master of Evaluation student doing research on this right at the moment. So I'll pass over to Christian to talk for a couple of moments about that and what he'd like from those on the line, if you'd like to participate. Christian? Can I put my first slide up? So do I share the screen? Let's see what I've got. You're sharing my screen? Yeah. So hi everyone, I'm Christian, Christian Chabouda. I'm a student at the University of Melbourne pursuing my Master of Evaluation, and I'm excited to share my project. My final project is on generative AI use in program evaluation. This project is more than just academic; it's about exploring how generative AI is going to modify our field of work. So generative AI, we know it generates new content from massively large data sets, as we've seen with ChatGPT, Copilot, Google Gemini and other specific tools designed to process data and help with evaluation work. So slide two. Oh yeah, no, no, that's that one. The previous one first. Can you put the previous one up? So currently this AI technology is considered to be transforming the way we work. It's a transformational path.
This reminds me, from my long background in humanitarian aid, of the time around 2007 to 2009 when smartphones appeared in the field: field data collection completely changed, and the skills that went with it got new requirements too. So here, what I plan to do, with two main work streams, is first to review the articles published from 2022 onwards. We've seen that ChatGPT started in November 2022. And more importantly, I want to get input from experienced evaluators like you. So I guess this project will only get better with your help and your experiences. So, that's the last slide. Over the next weeks, I would like to collect your views and your opinions about how generative AI is modifying our field of work, and that would be either through a short interview, 20 to 30 minutes, or a quick survey of about six or seven questions; the link has been put in the chat, or you can use the QR code on the screen at this time. Joining, of course, is voluntary, and all data will be de-identified in any reports or publications, and you'll be able to get a summary of the results if you wish. So thank you, and I hope I get a lot of feedback on that. Thanks, Christian. And I think that's a nice sort of wrap-up before we close out. While Gerard was presenting, there was some commentary in the chat of, what does this mean? Essentially, does this make our job paint by numbers? I think it just sort of increases the colors and the number of colors on the palette that we've got. So there are lots of options out there for using this technology. Some of it isn't surpassing human ability at the moment, some of it is, but it's just important to stay current and sort of consider this. But I think what's most important is to upskill ourselves as a profession to be ready for when we encounter AI-enabled programs in the world, which we are doing.
So we even had an RFP last week about evaluating an AI-enabled generative program for a government department. So it is happening. We will see those requests and requirements more and more, so it's really important that we're across this. And I guess to continue the learning journey, the AES, as Kaz mentioned at the start, is doing FestEVAL 2024. So that's running across May 28th to 30th, and it focuses on AI and evaluation. So if you liked this session, if you like this topic, if you want to know more, keep an eye on your inboxes and some more information will come out from the AES on that. So thanks all very much. We understand we've run a bit over; it's very hard because we're both very enthusiastic on this topic. So thanks very much, and feel free to reach out to either of us on LinkedIn if you'd like to connect or understand more. We'll stay on for a couple of burning questions. Oh, Kaz. I was going to say, rather than going to the chat, I was just going to wait and see what happened. I do have questions, but if it just ends up being asked in the room, I can hold them for another time. No, that's all right. You're welcome to ask away, because sometimes there needs to be the first volunteer tribute for others to feel confident. I'm really interested, and, Gerard, I don't know if it's more from your view, but as we go about thinking about how we use these models, that kind of value for effort and resource, like where's the, maybe not the tipping point, but sometimes we're doing small pieces, we might be doing thematic analysis, and we might be thinking about the coding approach that you take. Where's the kind of tipping point between the learning to actually be able to do that and put that practice into place, versus doing it as an individual? So I'm just trying to make sense here: you're thinking about, like, what's the training required, or? I guess both.
What's the level of training required to understand how to do these things effectively, and then the time required to actually do them as well?

I think it's really important, and I think Christy has said this, that we should have at least a basic understanding of how these things operate. I actually took the time a couple of months ago, almost as a refresher for me because I've been working with big data and machine learning for decades, to take what I think was LinkedIn's or Microsoft's online course through LinkedIn on the foundations of AI, just to see what was being said in the space. It had an exceptionally good module on ethics in AI. It also had a module that got into some more technical aspects of how these models work; that one is probably too much for a lot of people, but it helped me at least to understand how these models work. I sometimes joke that ChatGPT, Claude, all of these are just very advanced probability engines. All they are ever doing is asking: given the previous words, what's the next word? And it's a probability. It's a very reductive analogy, but it really is autocomplete on steroids; that's how those models essentially function. Now, again, I say it's reductive because I know how they do function, and it is much more complicated than that. But at the end of the day, thinking of it as jumped-up autocomplete helps ground you: it's not necessarily a real intelligence that you're talking to. And for that reason, it puts you in the mindset that this is a tool, like any tool. If you learn how to use that tool well and wield it effectively, it can be very powerful. So it's worth exploring how you can best use it.
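That "autocomplete on steroids" idea can be sketched in a few lines. This is a toy illustration only, assuming a hand-made bigram table with invented words and probabilities; real models work over sub-word tokens, with the distribution computed by a large neural network rather than a lookup table.

```python
import random

# Toy next-word predictor: map each previous word to a probability
# distribution over possible next words, then sample repeatedly.
BIGRAMS = {
    "the":        {"evaluation": 0.6, "program": 0.4},
    "evaluation": {"found": 0.7, "showed": 0.3},
    "program":    {"delivered": 1.0},
}

def next_word(prev, rng):
    """Sample the next word given only the previous one."""
    dist = BIGRAMS[prev]
    words = list(dist)
    weights = [dist[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

def generate(start, n, seed=0):
    """Generate up to n more words, one probabilistic step at a time."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        if out[-1] not in BIGRAMS:
            break  # no continuation known for this word
        out.append(next_word(out[-1], rng))
    return " ".join(out)
```

Each call to `next_word` is the whole trick: given what came before, pick the next word according to a probability distribution. Everything else in a large language model is about making that distribution vastly better informed.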
And I couldn't even begin to recommend where the good guides are, because there are so many out there. As you saw in my demo, sometimes the best thing is to ask the tool itself: its training data includes material on prompting large language models, so it understands the concept. Sometimes just asking it how it would like to be addressed can help you explore. And that's the other part: find the time to experiment in safe scenarios and environments. Have a play around. Learn how it makes mistakes; learn how you make mistakes with the input. Then you can evolve from that, because the truth is, anything I've demoed today will be obsolete in a couple of weeks. GPT-3.5 is old news, GPT-4 is old news; every week there's a new model out there, a new shiny, and they are doing really impressive things. The things I'm doing with desktop models now, I'm seeing evolutions of that which are already starting to exceed where we're at: smaller, faster, lighter models that still produce really good and encouraging text outputs.

I'd add to that, though, that there are two ways to think about effort, Kaz. One is what your effort would be to acquire the skills in a traditional way. Gerard has programming and coding skills that I don't have, so for me to learn those skills I'd have to go to university and do a three-, four- or potentially five-year degree. What this technology does is democratize that: "hey, I need to code and develop this sort of thing, how do I do it?" You can have a conversation back and forth with the machine and shortcut some of the learning that people like Gerard have done the hard way, for want of a better term. But then there's effort in terms of your question of, well, what's the return on investment if I'm doing just a small task?
Well, it's probably not there for one-offs, but what if that task is repeatable? If you're always doing a thematic summary, you might develop some prompts that work really well for you. That's where you can say, "hey, can you check ResearchGate for XYZ", or whatever database it might have access to, "and pull this information for me?" So my top tip, in addition to Gerard's advice to just experiment and have a go, would be to spend half an hour looking at the discipline of prompt engineering, because that will shortcut the effort equation as well. With prompt engineering, like you saw in Gerard's opening, you tell the machine what role you want it to play, "you are a program evaluator" in this context, and how you want it to reply. Do you want it to be funny? Do you want it to be serious? Do you want it to be in the style of, I don't know, Daniel Andrews? It can respond to the prompts that you're giving it, and that will shortcut the effort that you need to put into it.

My favorite one is to get it to respond in a pirate voice. Arr!

Thank you. We do have a couple of questions coming in as well, and feel free, those who dropped them in, to chime in. One of the first ones was: how do we address the ethical concerns when we analyze data involving human subjects? And there's a follow-on question around whether any project has had ethical clearance yet. I'm kind of intrigued: do you need to tell ethics committees that you're using AI, and in what form?

Good question. We aren't using it in our business yet for client data; I'm more playing around with it to understand, from a theoretical perspective, what governing and evaluating AI means. So we haven't had that conundrum yet, because we're still using human and traditional methods of doing that.
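The role-and-style prompting described above can be sketched as a reusable message pair. The system/user structure below follows the chat-message convention most hosted LLM APIs use, but the function name and wording here are purely illustrative, not any vendor's actual API.

```python
# Assemble a role prompt as chat messages: a "system" message sets the
# role and tone, a "user" message carries the actual task. Reusable
# templates like this are where the effort investment starts to pay off.
def build_prompt(role, task, style="serious"):
    """Return a system+user message pair for a chat-style model."""
    system = (
        f"You are {role}. "
        f"Respond in a {style} tone, and state any assumptions you make."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": task},
    ]

messages = build_prompt(
    role="an experienced program evaluator",
    task="Summarise the main themes in these interview notes.",
)
```

Swapping the `role` or `style` arguments ("funny", "in the style of a pirate") is all it takes to change how the model replies, which is the point made above: the prompt, not the model, carries most of your repeatable effort.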
In terms of the greatest insight I can provide: I've evaluated automated decision-making processes before, where they linked up data across several systems. One involved a vulnerable cohort where we had access to their names, home addresses, phone numbers and chronic medical conditions, and it would spit out a risk rating of each person's likelihood of hospitalization; the program was trying to reduce the number of potentially preventable hospitalizations. That one required ethics clearance through the high-risk process, because the act of linking data and using an algorithm to do so increases the risk that individuals could be identified and have their benefits impacted. So I'd say that where you're getting into the AI and automated decision-making space, it does tend to increase the ethics risk profile, and you do need to consider that, consistent with the National Statement and all the other guidelines you'd use regardless.

Yeah, that's pretty much my answer on that: use the National Statement. The thing is, a lot of these machine learning approaches have been used on data like this, automated decision-making linkages. They do get used and they do get cleared. It comes back to the principles, and some of that is transparency. If you're throwing people's data into ChatGPT, that's unethical, because you're giving that data to somebody you're not controlling.

And that's again the reason we're not using data in that way. I'm coming from the perspective of: would my client let me do that? Would I let me do that? Would stakeholders let me do that? And I am seeing contracts with no-AI clauses. I reviewed a contract in a different domain to evaluation this week that had very clear no-AI clauses and very circumscribed permissions. I think if you're using AI services, and I include things like automatic transcription in that.
So if you're using automatic transcription, even Microsoft's, be upfront and clear about it, and give people the option to provide informed consent for the use of their data, because it is their data; what they're saying is their data. You'll see that on Zoom tonight it said, you're being recorded, click OK if you're cool with that. It's the same kind of principle; it's just a new tool, and we extend the principle to it.

Thank you. There are possibly still some questions in the chat, but I'm conscious that it's hit seven o'clock and people might actually like to go and have dinner. Shall we wrap it up now, look at the questions in the chat, and send out some messaging around them as a follow-on if we can?

Yeah, very happy to. If it's possible to export the chat, Gerard and I can work through the questions over the next week and circulate that. I think everyone on the line has seen both of our passion for this, mine from a theoretical and process angle and Gerard's amazing brain from an applied angle. We'd love to help the profession in any way that we can.

Yeah, and there are brilliant questions in there, and if we had enough time I would love to get into the depths of them. But we'll do what we can to answer them in the next week or so.

Maybe we'll take you up on that couple of hours and create a real session around it. Or if people have the chance to join the festival, that might be the great space to do it as well. Thank you so very much for your time and your insight. It's always a great learning opportunity for me when I hear the two of you speak; it reminds me how little I know, how quickly it's evolving, and how much I'll need to do to catch up. And that's okay. But thank you very much, and thanks everyone that has stayed on.