to press the recording button. Thank you very much. So welcome again, Paul. And now the floor is yours. You should have sharing privileges. Please go ahead.

Yeah, thanks. Thank you, Igor, and it's great to see everyone. It feels like back to school in September. I expect that, like many of you, the whole topic of AI has been flooding our media and flooding our inboxes, and like you, I've found it a bit overwhelming from time to time. I was fortunate, having transitioned out of the executive director role, to have a little bit of time to do a deep dive on AI. So I spent a chunk of time looking at AI, but explicitly from an open perspective, because a lot of what we're reading about AI is about other topics, all of which are interesting, but who's really looking at it from an open perspective? With that in mind, I wrote this rather long post. It's too long to really go into in detail today, but I would like to provide at least a high-level summary of the major topics it explores. So let me describe what those major topics are. I will try to keep this to 20 or 25 minutes or so. Obviously there's a lot we could discuss; I simply want to, in this session, introduce some of what I think of as the major topics of discussion relevant to our work in open.

So here are the topics I try to cover in the post. One has to do with the source data that is used to train AI, including the origins of that data, the legality of its use, and the implications of that data. Here we're talking about things like scraping the web as a dataset that gets used to train AI models, or the use of Wikipedia itself. Secondly, I want to talk about open licensing, because the whole AI field has opened up a really interesting conversation around open licensing that goes beyond just the use of Creative Commons licenses and open source software licenses to look at things like responsible licenses. The whole field of licensing is having a significant moment, with a serious conversation around responsible versus open, I'll say.

Then, when I read all of the coverage of AI, it's often difficult to know exactly what part of AI we're talking about. So I have also generated a diagram of what I call the AI tech stack, and I describe the ways in which open plays out across each of the different layers of that stack. It's helpful when you're reading something about AI to think, okay, that's this part of the whole AI technology ecosystem, and they're talking about that little piece. So we will have a look at that diagram, and I'll talk briefly about the layers and how open is different across them.

A huge topic, which I spend considerable time on, is models: how models are being developed and the different kinds of models. A really hot topic right now, in my view, is the iteration of models based on user data, that is, data from our own interactions with these AI applications, as well as starting with a foundational model, which is general purpose, and then customizing that model with either proprietary data or additional data that goes beyond the initial training set.

A topic that is somewhat talked about, but one that I feel we ought to be spending more time on, is the pedagogical way in which AI is being trained. This is artificial intelligence, so it's about learning.
We're seeing these massive large language models, so what is the learning methodology being used to train them? My at least superficial review of what's currently happening is that it's very behaviorist, very much reinforcement learning. So the historically old and rather primitive forms of learning are the ones being used to train AI, and I think as educators involved in education we ought to be paying attention to that.

And then the last three things, and again this is too much to cover in any detail today, but we will look at them briefly: AI ethics, and where open sits in the context of that; AI regulation, with a particular lens on what the US is doing but more explicitly on the EU AI Act that is currently under significant development; and lastly, the last thing I did in this post is something that I'd like to ask all of us in this meeting today to think about, which is how we can move from a rather passive involvement in what is happening, where the stuff is going on and we have to figure out what the heck is going on, and often the horse is well out of the barn, so to speak, to being more proactively open about what role we think open can play in this AI ecosystem, and actually begin to express and vocalize recommendations around how we would like to see this play out. So as I talk to you today and walk through some of what I've been exploring, I would just ask you to think about what is one recommendation you would make around how open can play a significant role in this AI ecosystem.

So that's a high-level summary of what I wrote about, which is a massively long blog post, too long didn't read for a lot of people, and there are a lot of links throughout it that take you off to further exploration if you wish. Maybe what I'll do is just take a moment to share my screen, and then, let's see, here we go, and I'll use that to walk down through some of what I've been exploring. I've given you the link, so you can jump around as you wish. But maybe before I get going, out of all those things that I just talked about, is there something that some of you, or one of you, really want to talk about? You can always drop something into chat, although I'm not looking at chat, so you might want to just speak up if I get going on a roll.

The top of the post talks about my own personal explorations, because I think, like many of you, one of the ways you learn about this is just getting hands on with it, right? So I've definitely spent some hands-on time with ChatGPT, DALL-E and, more lately, Stable Diffusion, which are really interesting. But when you interact with them, especially, I'd say, ChatGPT, it is initially hugely impressive because it's very conversational and often provides really good overview or summary answers to any text prompt you give it. When I used it, though, I certainly experienced the issue of deeper responses being lacking: if you know a topic really well and you ask questions about it, it might not be able to generate responses as deep as you might hope for. And also, if you ask it, what sources did you draw on to come up with this answer?
It either will say it can't tell you, or, if you press it, it will sometimes, in my experience, actually generate fictitious references. That's the hallucination side of some of these AI systems, and it can be quite disturbing given the reliance, the very heavy usage, that all of these things are getting right now.

So once I started to experience some of the hallucinations, I became very interested in the underlying dataset. What is the data that these systems are being trained on? How reliable is it? How full of bias is it? And that leads into some really, really contentious areas. The source data being used to train AI is highly contentious. If you look at ChatGPT, which is the one I dug into, there's a whole bunch of articles referenced in this post that provide some information about ChatGPT's underlying data, enough to say that it's things like Wikipedia, Common Crawl, and a set of books. So they are definitely using data scraped from the web as training data, as well as openly licensed datasets like Wikipedia, to train their AI models. (Just below there's a small sketch of what pulling an openly licensed dataset like that can look like in practice.)

The way this has become contentious is: is that legal? The big question is whether it is legal to just scrape the web and suck all of that up into training data to be used to create a model for your AI system. Clearly it's legal to use Wikipedia as a source, but interestingly, from my point of view at least, Wikipedia's Creative Commons license is CC BY-SA, so when you customize training datasets based on Wikipedia data, you ought to then have a requirement to share back what your customization of that data has been, if it is customized. It was really interesting to me to learn just how important openness is here, and how Wikipedia data, for example, is treasured as a high-quality input source for training AI models. And there's a whole bunch being talked about in terms of Wikipedia not only as a source of data, but as something that's potentially at risk if the AI models come to be used as a reference source instead of Wikipedia.

So there's some really interesting material here about the history of the use of openly licensed images and text to train AI models. I won't go into some of the contentious ones, but I will say that this whole issue of legal versus illegal is one of the media topics you see referenced all the time, with authors suing OpenAI, the maker of ChatGPT, for using their books, say, as a dataset to train its model. The way it's boiling down appears to be, and I feel like this is going to be settled in court, these arguments: is the stuff produced by AI really copyrightable, and is it fair use to take these sources of data from the web and use them to train AI?
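As a concrete illustration of that source-data layer, here is a minimal sketch of loading an openly licensed corpus as training text, assuming the Hugging Face datasets library. The dataset name, config and filtering step are just examples chosen for illustration; this is not how OpenAI or anyone else actually assembled their training corpora.

```python
# pip install datasets
from datasets import load_dataset

# Wikipedia is published under CC BY-SA; Hugging Face hosts periodic dumps.
# (Dataset name and config are illustrative; check the hub for current versions.)
wiki = load_dataset("wikimedia/wikipedia", "20231101.en", split="train", streaming=True)

def clean(example):
    """Very rough cleaning: keep reasonably long articles, flag the rest."""
    text = example["text"].strip()
    return {"text": text, "keep": len(text) > 500}

corpus = (clean(article) for article in wiki)

# Take a small sample to inspect what would flow into model training.
shown = 0
for article in corpus:
    if article["keep"]:
        print(article["text"][:200], "...\n")
        shown += 1
    if shown >= 5:
        break
```

The share-alike point made above is exactly about what happens after a step like this: once the Wikipedia-derived data is cleaned, filtered and enhanced, CC BY-SA would arguably call for that customized dataset to be shared back.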
Back to the fair use question: I think I lay out the questions around fair use in the post, and that's what the AI developers will be arguing, that it's fair use of this data. But of course many people feel that it isn't fair use, and that the big issue is really whether the use of this data interferes with the livelihood of those whose data is being used. Certainly artists and others are suing, the record labels are suing; this is a massively contested space, the underlying data being used to train the models. And in terms of openness, it used to be that some of that information was readily provided, but organizations like OpenAI are now no longer saying where their data is coming from, and even with the Meta model that came out, they're not going to disclose what the underlying data sources are. That's actually an interesting issue, because it's perhaps going to lead to regulations that require you to disclose what data was used to train your model, because otherwise, can we trust it? Is it reliable? Is it quality? Is it biased? If you don't know what went into it, then it's difficult to say.

I did have a chance to go to an AI commons roundtable at the Internet Archive back in December, where there was a really interesting exploration of RAIL licenses. If you've never heard of these, they are different from the Creative Commons or open source software licenses in that they contain rules or requirements around how you can use the item being licensed under a RAIL license. There's a list of extensive prohibitions that address a lot of the concerns that have been raised about AI, and if you look at some of the Responsible AI License (RAIL) end user license agreements, you can read in the introductory conduct and prohibition section all the things they're trying to prevent you from doing with the licensed thing being made available under a RAIL license. You can't stalk, you can't impersonate someone, you can't collect or store personal data, et cetera, et cetera. There's quite an extensive set of prohibitions, but the rest of the license is very much modeled on open source software and Creative Commons licenses. So think of a RAIL license as being like a Creative Commons or open source software kind of license with additional prohibitions, a set of behaviors you're being prohibited from doing.

And this is very similar to something I was arguing for at Creative Commons, which is that I wish we asked creators to express their intent: what is the reason why you are openly licensing your thing? Because if we want to enable better sharing, I think it would be enabled by ensuring that that creator's intent is actually honored. Creative Commons is actually doing some interesting work in this space, and there's a whole set of questions around whether the existing licenses, like open source software or Creative Commons licenses, will adapt and modify their terms, or whether we'll have new licensing regimes like the RAIL license, or even Meta, which came up with its own license for its model. So there's lots of proliferation of licenses happening within the AI space, and I think it's really interesting. I think there's also lots of what people are calling open washing, because people are saying they're open sourcing such and such, but they aren't complying with the Creative Commons or open source software kind of rules.
And I feel like there's a significant conversation going on right now around this topic of open licensing.

I spent quite a bit of time trying to figure out what the layers of the AI stack are. There's a bunch of images here that I've lifted from other players which begin to show what the layers of an AI stack are. This one, for example, shows compute hardware at the bottom, and then a cloud platform that allows others to access that, because the compute hardware is super expensive, and then on top of that the models, and then on top of that apps, or end-to-end apps. So there's a stack, if you will, that you can imagine being created here. I think it's interesting, when I look at some of these diagrams, to see the absence of the data as part of the stack. So in my picture I added the data back in, and I provide some information about each of those layers in the stack and try to describe them, because it turns out that openness pertains to each layer of the stack in a slightly different way. This is a quite interesting image in the way it shows how data gets ingested and then cleaned, validated and transformed before it actually gets labeled and used more extensively for training. And this one shows how a model gets trained.

I really, really think the whole area of models is fascinating and has huge potential in our own work. So I did spend quite a bit of time looking at models, digging into the difference between general AI models, specific AI models and hyper-local models. There are multiple kinds of models being created, foundational models too, of course. The general and foundational models are general purpose models; ChatGPT, for example, is very general purpose, trained for general use, but then you can customize it with specific or even hyper-local data to make it more specific to a particular task you want to accomplish. The general and foundational models are often actually designed to be open source and to be used as a starting point from which you can build specific AI applications. So what became apparent to me is that open and open source are already hugely important in the AI development space. If you're wondering whether open has a future, it definitely has a big future, at least the way it's playing out currently in the AI space. So I dove into models, and I think the model area is really quite fascinating from my point of view.

I'm going to skim down and try to keep this short. There are a couple of diagrams here that examine how a model can be customized or iterated, and they look at three different ways: customizing a model using your own additional data, observing usage patterns and the way a user uses it, and looking at what gets generated. All of those can be used to iterate a model, so that the model remains a constantly-being-improved thing, if you will. (There's a small sketch just below of what the first of those, customizing a general model with your own data, can look like.) But this issue about whether, when you interact with ChatGPT or Stable Diffusion or any of these other AI applications, your data and the way that you're using it can be used to further train the model is contentious. You may have seen that Zoom had a big thing about whether it was going to use the data that happens when we use Zoom, like we are now, to further enhance their AI model. Microsoft is having a similar moment right now.
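As a rough sketch of that "customize a general model with your own additional data" idea, here is what a minimal fine-tuning run can look like using the Hugging Face transformers library. The small base model, the two toy sentences and the training settings are all placeholders chosen for illustration; this is not the pipeline any particular vendor uses, just the general shape of the technique.

```python
# pip install transformers datasets torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "distilgpt2"  # a small openly licensed model, standing in for a foundation model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# "Hyper-local" data: a couple of institution-specific sentences (entirely made up).
texts = [
    "Our open education program publishes all course materials under CC BY.",
    "Learners can remix any module as long as attribution is preserved.",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=64),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="local-model", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to=[]),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()              # nudge the general model toward the local data
trainer.save_model("local-model")
```

The same mechanism is what makes the user-data question contentious: whatever extra text flows in at this step, whether deliberately curated or quietly harvested from usage, shapes the resulting model.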
Coming back to Microsoft: it's really difficult to understand whether Microsoft is using your data to customize their model or not. ChatGPT allows you to opt out, so if you don't want your own usage of it to be used to train the model, you can actually opt out. So there's a whole bunch of things happening around our use of these systems and the extent to which they can be used by the technology provider, for their own benefit, to improve their underlying technology. I think that's going to be a really contentious area, and I think there are some interesting issues there that relate to our open field.

And then in terms of applications and apps, there's just a huge number of fields in which this is all playing out, and a huge range of applications and apps. We're really just starting to see the beginnings; I think we're going to continue to see more and more stuff coming out of AI to do more and more things. Everything from transcription to your own kind of personal assistant is going to be a big development over time.

I'm just going to go down now to, oh, and I do want to say something about this, because models, in my view, are really a significant area of play here, and then the APIs, because in the technology stack, at the higher levels, they're making their models available through application programming interfaces for others to use to build their own version of a model. So I think personalized and organizationally specific models will become an interesting aspect of this. And again, the models can be made available anywhere from fully closed to fully open: fully closed meaning you've got to pay significant dollars, fully open would be free.

But then here's the stack that I ended up creating. It looks like this: at the bottom is compute hardware; then there's a cloud platform that makes the hardware available to others to use, because it's so expensive to create your own; then there's source data, the source data being used to train your AI model; then there's enhanced data; then there's the model itself; then the application programming interfaces that are used to make the models available; on top of that, apps and applications; on top of that, our own user interactions, how our prompts and questions and use can be used to enhance the model; and then we have what we are actually getting as generative output out of a generative AI application.

And this is really interesting, because there have been various users who have tried to essentially claim intellectual property or copyright over that output, but so far, at least, the courts are ruling that output coming from an AI application is not copyrightable, which, for me, is an under-talked-about piece of the AI space, because in essence it implies that the outputs are essentially public domain. And in the context of open and the commons, if all AI outputs are public domain, that's actually a really big thing. I'm not sure that I'm right about that, but if it's true that they're not copyrightable and you can't claim IP over them, then potentially all AI outputs are public domain.

Yeah, there's a question on that topic. So, I don't know how well you know the court cases: is this just the generative output in its pure form, or are these works created by a human who used the generative output to create content?
And it seems to me the most common use of AI that I've seen is that you don't take the pure output; you might use it as a starter, or you might have it build an outline and then you build something. And I guess the question gets around to: do you even have to cite that you used AI, right? Are we at the point where you have to cite that you used a calculator, or Excel, to get to some sort of output? This is a weird space here, but do you have any sense of where these court cases fall right now in terms of how much of the creating is in what they output?

Yeah, definitely. You reference a really important thing, TJ, which is, well, I would say there is a court case already that said that someone who started with the pure AI-generated output and then customized it was allowed to claim ownership over the customized version. But even that, I think, will be contentious. I think we are only at the very beginning of all these court cases, and it's really a bit of an unknown field, because I would also say, TJ, that I think we're butting up against the limitations of copyright and IP to handle some of the decision making around issues like that. So it's watch this space, kind of thing, and see what happens.

Yeah, learning, I mentioned this already. What's the pedagogical approach by which AI is being trained? There's lots of stuff about machine learning and neural networks; it gets quite engineering specific and very complicated fast. But when you start digging into exactly how it's trained, I would say the models are not something you're going to be engaging with in the intellectual, physical, cultural, emotional, social kind of way that we definitely see as part of the contemporary landscape for pedagogies.

Ethics is a hugely fascinating area, and I've highlighted some of the big headlines: is it the greatest art heist in history, is it an appropriation of the sum total of human knowledge, can it be used to solve the UN Sustainable Development Goals, and so on. These are things from actual articles that others have written. So there are lots of ethical issues, both for and against AI, and I won't go through all of them, but I think this is a space where we in the open field could be playing a role, could be taking an advocacy role if we wanted to, and stating what we think should happen. The UN has been weighing in on how they see ethics in artificial intelligence and who's responsible for it, and I know there's some deeper and more specific work taking place, particularly in the EU context. And of course, based on the concerns, there are a lot of efforts going on to regulate, and both the EU and the US are having regular meetings around this. I'd say the EU AI Act, which I go into in quite a bit of detail in this post, is by far the furthest advanced, and the priority of the EU AI Act is to make AI systems safe, transparent, traceable, non-discriminatory and environmentally friendly, an interesting priority and goal. In some ways you can imagine how openness enables them to meet some of those priorities, but it very much takes the tack of looking at the risks associated with AI systems. And I found this really interesting: unacceptable-risk AI systems will be banned, and when you read down the list of systems that fall into the high-risk category, it includes education and vocational training.
It's like, whoa, wait a sec. Anyway, it's interesting to read this from an education point of view. I'm not going to go into the regulations, because I want to end with this basic thing, which is: when I got to the end, I was asking, well, where do we sit? Where does open, where do all of us involved in open, sit within this AI ecosystem? Clearly open is already a driving force across the layers, but there's really not been an overarching effort to fully acknowledge its role or to make proactive suggestions for how to sustain, strengthen and expand on it. So this is a little diagram that I created to try to do that. It starts at the left with AI research and development and suggests that open access, open data and open science are inputs into AI development. I've already talked very briefly about some of the open issues across all the layers of the stack. Then on the right we have the ethical, the values and the legal considerations that feed into decisions around how to regulate it, or what might even be community self-governance, because certainly a lot of the big AI developers are trying to govern themselves. And then there's what I call, on the far right, the AI human piece, which looks at the role of open communities and networks, looks at end users, and looks at, or ought to aspire towards, I feel, some sort of public good from all this AI. So I closed the blog post by actually making a bunch of recommendations, from my own personal perspective, on what should happen in each of those areas, in terms of thinking and positions or advocacy recommendations that we in the open space could make. And I invite all of you to do something similar. I'd love to hear, I'm going to stop sharing, I'd like to stop sharing and just hear whether any of you have specific recommendations. I know I blasted through that without paying attention to chat, and I know it's a huge post and I didn't want to take up this whole meeting, but that's a pretty good short summary.

Vanessa, yes. So I'm afraid I haven't got a specific recommendation, because there's a lot to unpack there, but there are a couple of things that I've been wondering about, and I wonder whether you have too. There are two things. Firstly, have you thought about how we can increase the quality of AI through open? Because one of the biggest concerns is that there's a lot of rubbish in the content of AI. And then my other question is: what are the threats of AI to our initiatives to advance open? I think that's also a concern of mine. If you look at the higher education sector, they're really worried about AI taking over jobs, and obviously lots of plagiarism and less equality, et cetera. If we encourage more people to open up, I'm fearing that this is really threatening a lot of the work we've been doing to pave the way for sharing more openly. I mean, my argument would be that the more you share openly, the better the quality of AI and the more advantage we have. But there's also another question about some of the open content, whether it can really legally be used as such by AI. You can read it, there's reading access, but whether you can actually use it and then repurpose it, I'm not sure. I need to look into that again. So sorry, I probably just added more to the mix. Yeah, there's a lot. Yeah, I know. But I think that this is really important if we're thinking about open. It's like, well, how does this affect our vision and our mission?
Have you thought about that? I haven't had time to read your post; I'm really looking forward to looking at it.

That's all right. That's all right. Well, I certainly have taken some time to think about that. I would say that, in general, there's a high value being placed on openly licensed, let's call it commons-based, resources as the source data for training models. And I also will say that the regulations, both from the EU and the US, are pushing very explicitly for the training data to be made explicit. In the US context it was talked about almost like an ingredients label: what are the ingredients that make up this model? And if you don't tell us the ingredients, then it's hard to trust the model. Now, in some ways I would argue, well, a lot of these applications are already out there being used by everybody, and the horse is out of the barn a little bit. But I think the regulations could potentially play a role in ensuring that the data being used is actually openly published. And then I think we can begin to see efforts made to ensure that that data is increasingly high quality, because clearly with the current data, where you're just scraping the web, there's going to be a significant amount of untruth and bias included in that data, and those are serious risks to mitigate against.

And then in terms of whether there are threats to our work: maybe. I've actually been looking at it the other way. How does it change our work, or perhaps actually make our work even more important? I guess maybe I'm an optimist, but I think there are some really, really interesting ways in which that could happen. And yeah, this is a longer conversation, I think, but it's great to see that we're all starting to have thoughts like this and to identify some of the issues and the things that we should be talking about and exploring.

Lisa, please. Maybe this is the last one, Igor, and I should turn it back to you for the rest of them. Or let's do Lisa and Neil, okay.

Yeah, a couple of things, just to go to the last comment too. I think about its impact on OER, about how a lot of our research studies have shown that the way OER travels well is because its provenance is known: you know who used it, you know what context it was used in, right? There's this whole sense that there's some, I don't want to say that there's one big community, because there are lots of communities, but there's some sense that we know what this is, where it came from, how it was used. And when it goes into AI, that tends to, at least currently, kind of sanitize that process, so that's lost in a sense. And I worry about that from the perspective of AI. I certainly see its benefits, from the way you've described it, Paul, but I think those are some of the things that we have to worry about. And the other thing is, you know, the thing I've been most worried about is that the AI investments are from big corporate people who are seeing billion-dollar signs in their eyes, and they couldn't care less about open. So I'm kind of wondering about our ability to impact that field unless we have a completely alternative AI model that's built only around open: open data, open access, open education resources. It's hard to imagine that the things that we think are important aren't just going to get run right over by the big AI truck. That's a little pessimistic, but those are the caution signs I have.
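One way to picture what keeping that provenance would mean in practice: a small sketch of a training record that carries its source, license and attribution alongside the text, rather than being sanitized away. The field names and values here are invented for illustration; no current training pipeline is known to work exactly this way.

```python
from dataclasses import dataclass, asdict

@dataclass
class ProvenancedExample:
    """A training example that keeps its provenance attached.
    Field names are illustrative, not from any real pipeline."""
    text: str
    source_url: str
    license: str       # e.g. "CC BY-SA 4.0"
    attribution: str   # who to credit, per the license
    context: str       # how or where the resource was used

example = ProvenancedExample(
    text="An openly licensed explanation of photosynthesis...",
    source_url="https://example.org/oer/photosynthesis",
    license="CC BY-SA 4.0",
    attribution="Example Open Courseware Project",
    context="Introductory biology module, remixed 2021",
)

# Downstream tooling could then filter by license, surface attribution alongside
# generated outputs, or audit what went into a model -- the "ingredients label"
# idea mentioned earlier.
print(asdict(example))
```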
Yeah, I'll just quickly respond to say, I think the issues about the way AI kind of sanitizes everything and makes it sort of bland are potentially going to change. I actually think that we may end up with very much our own ability to create personalized learning models, trained on datasets based on our own data sources that we feed them, and that becomes a very different kind of picture. And then in terms of whether the big investors are just going to blast past open, who cares, I would just say that when I dug into this, if you start reading some of the stuff that a lot of the big names put out there, they're actually very strong advocates for open in terms of speeding innovation, and in terms of trying to deal with the ethical issues in ways that would appease a lot of the concerns from both regulators and the public. So I actually think it may turn out that we get support for open, but I could be wrong. Neil, over to you.

I didn't have a huge amount of substance to add, as usual. I think the main thing I just wanted to say was to thank you, Paul, for the blog post, which you did warn us was long, which it clearly is; I think I need to go through it more carefully. The key message I wanted to communicate, which is a very personal one, is that over the last eight weeks, for various personal reasons, one of my main quests has been to try to disconnect myself from the entanglement with technology that I think I have as an individual. One of the things I've realized increasingly is that, whatever the technology, whether it be having email on my phone or access to news feeds or social media or whatever, I'm concluding that for me personally the downsides of being entangled with the technology outstrip the benefits. And I have to say that, having listened to everything you've said, I see very little in it that doesn't persuade me further that that observation is correct. So I think it's great that someone's doing the hard work of researching and understanding this field on my behalf, so that I don't have to go and do it myself; that's why I'm very grateful. And I'm not meaning to be pessimistic. The point that I'm making is about the risk of the rabbit hole, how we get pulled into it, what purpose it actually serves, and whether that purpose is our purpose or someone else's purpose. I think so much of the discourse around AI is someone else's intention to try to manipulate us into believing that the things they are trying to build are important, and I'm not convinced that that's true as far as education is concerned. So, yeah, you know, I need to engage properly, and I'll try to do that with as open a mind as possible. But I'm feeling greatly liberated by every step that I take to disentangle myself from my relationship with technology, and the further I go down that pathway, the more convinced I am that it brings great benefits, and that so much of what we've been led to believe are benefits of technologies are really benefits to someone else and not to us. So I just put that thought there as a way of thinking about how we respond to all of these different challenges.

Sure. Well, thanks. I'm glad this has stimulated some of that kind of discussion.
I mean, clearly my intent here was really to apply what I feel is currently an absent lens on the whole space, which is: what does AI look like from the open perspective? And yes, if you haven't read the whole post, I invite you to do the rabbit hole dive, or not, if you're a Neil, and just go outside and enjoy nature. Back to you, Igor, I'll stop. That was too long for me.

No, no, great. Thank you very much, Paul. I know that you've got some other engagements shortly, so you'll have to leave the meeting as well. But yeah, thank you very much for the deep dive that you did into the topic, and for doing this overview and presentation, and thanks to everybody else for engaging on this topic as well. This is just to also remind you that this conversation can continue. There are some discussion threads set up on OEG Connect; you will see those links in the meeting notes, so you can continue these discussions there. As we can see from the chat exchanges, different organizations are thinking about this topic and reflecting on it anyway, including Katie's comments, and Ben from Wikimedia, as he was sharing with us last time, and even Neil with OER Africa on the implications of AI for open educational resources. So there are discussions happening on these topics, and this can continue in the OEG Connect space as well. So we can conclude this specific item for today. Thank you very much again, Paul, I really appreciate it. And thank you everybody else for engaging. I'm going to stop the recording for now.