Erin, take it away and introduce your folks.

All right, yep, I'm on. OK, so I'm Erin Boyd. I spoke a little bit earlier about storage, and I'm going to be your moderator. If you are just joining us, I work for Red Hat in the Office of the CTO, primarily working on hybrid cloud and multi-cluster capabilities for storage. But enough about me; I'm not the smartest one on this panel. So if you want to go ahead and introduce yourselves, maybe tell us your most controversial opinion about the future of AI, just to really start it off with a bang. Daniel, you want to start?

Daniel Riek, Red Hat, Office of the CTO. I manage the AI center of excellence. My most controversial opinion is that I think we're better off with AI. I'm a big fan of Max Tegmark, who, if I paraphrase him, basically says it's OK if AI replaces us because it's just the next round of the evolution of life. So I think that's OK in the long term.

OK, Alex, go ahead.

Hi, good afternoon. I'm Alex Housley, founder and CEO of Seldon. We're an open source machine learning deployment platform providing model serving, model management, and governance. So yeah, good question. My most controversial view is really that this AGI thing people are talking about is probably unlikely to happen.

OK, are you all familiar with what that is? Do you want Alex to explain more? AGI? Yes: artificial general intelligence. In a lot of these discussions, the topic quite quickly moves on to this kind of singularity and a world of superintelligence. I think that is quite a way off, if not unlikely to happen at all. And there are many more exciting things that are changing the world, revolutionizing all industries, and transforming our lives with the technology we have available today. So I think it's really good to focus on that.

OK, great. And Fred?

So I'm Frederick. I'm head of Edge Infrastructure over at Doc.ai, which does medical AI. I've also worked extensively in the open source community, in networking, and one of the things I focus on is bringing things like AI into the infrastructure. My most controversial opinion, I think, among many, is that I don't think we're doing enough socially to work out what to do when AI starts to replace jobs. We need to start focusing on that now and not wait.

Great. Those are all really great points, thank you. I'm glad you touched on how AI is going to improve our lives; it's not necessarily this doom scenario, except maybe for some jobs. For the most part, AI tends to be seen as a bit of a savior, a new technology that's going to enhance our lives and revolutionize things. But we're also seeing how that technology can exploit people, their data, and their privacy. So since this panel is about AI and ethics, tell me why we need ethics in AI. Frederick, if you want to start.

Sure. I think you framed it very well. AI is going to be everywhere, and it's absolutely going to improve many aspects of our lives. But AI is particularly interesting as a tool, because as we start to build more AI models, the things we decide to build, the things we decide to train them on, and the biases that exist in the data sets all get amplified. It's not just, hey, I have this tool, I used it once. It's, I built this tool, this tool is fully automated, it learns, and it's going to do this thing over and over and over again.
So we need to make sure that when we build something, it reflects our ethics and how we approach things. We need to form these kinds of thoughts and have these kinds of discussions, which are absolutely important, so that we can all be aware of them. And even if we don't have all the answers today, just being a little more aware of them lets us drive in the right direction and come up with a fairer and more beneficial outcome.

OK, so Alex, staying with the discussion of ethics, can you address how we can practically enforce that, or lead the community in that direction? Can you talk more about that?

I wouldn't say so much about enforcement, but more about how you can put in place processes, tools, et cetera, to enable an organization to actually deploy machine learning models in a fair and ethical way. Data science was, and still is, in many cases a big challenge for organizations: everything from gathering the data, training and building models, and then operationalizing the models. I think about 60% or so of organizations are now doing some kind of machine learning, but only about 13% of those are in production. That speaks to the challenge between data science and DevOps and how the two teams collaborate through deployment. But the things which really matter to an organization as a whole, and to execs and people at board level, are: will my organization get fined by regulators, will we get reputational damage, will we kill people by accident? Those kinds of things are incredibly damaging to a business. And these are things which ultimately will be driven by ethical principles, which are commonly accepted but not formalized. Regulations are only starting to emerge, and as they emerge they're obviously a moving target, both broadly across industries and industry-specific. Companies then face the big challenge of translating these rules, written in English or other languages, into code they can drop into their machine learning pipelines without blocking them up. That's a very big problem, and one which is really best solved through open source collaboration: a lot of the best tools we've seen emerge for things like explainable AI, bias detection, et cetera, have emerged in open source, and that's obviously built upon open research.

Right. And with those rules that will train these models, there is also liability around them. Daniel, can you go into how liability around our models today is maybe going to become more important in the future?

Right, yeah. The way AI is used today, it's basically in use cases where you have limited liability or where you're scapegoating someone else with the liability. So if I drive a self-driving car from a famous car maker in California, it drives itself on the highway at, well, I'm not going to say how fast, or I would be admitting to a misdemeanor. But I'm of course not necessarily paying the same attention. And Jeff talked about this this morning: even with cruise control, you're not paying as much attention. Now, that car is actually driving itself on the highway, taking exits and things like that. And if that car kills someone, it's a big scandal, right? That has happened, and it's a whole different story.
Self-driving cars actually have a much better track record than humans, in that they kill fewer people per million miles driven. But still, if it happens once or twice, it's a big scandal. And the way they're working around it is basically by telling you, the driver, that you're still responsible, right? Even though everyone knows you're not living up to that responsibility; the whole point of having that car is so you don't have to. That works for now, but it doesn't work in the long term. And if you look at more serious applications of AI, the lack of explainability and the lack of controls around it is the biggest inhibitor to the actual use of AI in many very beneficial areas. We are confining it to these kinds of scapegoat areas, or confining it to giving advice, but we're not living up to the potential for automation because it would be too risky from a liability point of view.

Right, and liability also enters the realm of privacy. When we create and train a new model, most of the time we're using personal data to train it. So what is being done within the community to help protect users' data, or randomize the data as the model learns, so that we're protecting user data and lowering the liability of the models being created? You want to start off with that, Frederick?

Sure, so there are a couple of things you can start with. Very common techniques are things like anonymized data sets. I think we need to be a bit careful with those, though, because even if you have a data set that's anonymized in isolation, the moment you start to pair it up with Twitter data or Facebook data, you can often de-anonymize many of these data sets. So in terms of protecting user information, something I think we should put a lot of training and focus on is how we develop and use techniques that are designed to still learn the signal of a population, or the signal of your data set, but not learn any individual part of that data set. There are techniques emerging for this. We have things like federated learning, where you leave the data where it is, remotely, send the model over to it, train on it, and send the results back, so you never have to centralize the data. You also have techniques like differential privacy, where you add noise at certain points while you're training the model. What this noise does is add plausible deniability into the model itself, in such a way that it becomes very difficult to extract information out of it about any given user; but the noise is centered around zero, so you still preserve the signal. This same idea is actually used very often for sensitive questions in survey statistics. They might ask a person, hey, have you tried cocaine in the past year? If you just ask that question flat out, people will say no for a variety of reasons. But if you put the person in an isolated booth with a coin and say, OK, flip the coin: if it comes up heads, answer the question truthfully; if it comes up tails, flip again, and write yes if the second flip is heads and no if it's tails. Then when someone says, oh, you answered yes to this, you can say, well, I just answered the coin toss.
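As a rough illustration of that randomized-response protocol, here is a minimal sketch in Python. The function names and the 10% "true" rate are invented for the example, not something from the panel; the point is that because the coin noise is unbiased, the aggregate signal survives even though no individual answer can be trusted.

```python
import random

def randomized_response(true_answer: bool) -> bool:
    """One respondent's answer under the coin-flip protocol described above.

    First flip: heads -> answer truthfully; tails -> a second flip decides the answer.
    """
    if random.random() < 0.5:          # first coin came up heads: answer honestly
        return true_answer
    return random.random() < 0.5       # first coin came up tails: report the second coin

def estimate_true_rate(reported: list) -> float:
    """Recover the population rate from the noisy reports.

    P(report yes) = 0.5 * p_true + 0.25, so p_true = 2 * (observed - 0.25).
    """
    observed = sum(reported) / len(reported)
    return 2 * (observed - 0.25)

# Simulate 100,000 respondents, of whom 10% would truthfully answer "yes".
population = [random.random() < 0.10 for _ in range(100_000)]
reports = [randomized_response(answer) for answer in population]
print(f"estimated true rate: {estimate_true_rate(reports):.3f}")  # comes out near 0.10
```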
So the coin gives people plausible deniability, and it turns out those same techniques work while you're training models. We can apply these kinds of techniques to help protect the people in the training data. So even if you have no intention of ever sharing the model, but the model is stolen by some group of attackers and ends up on the dark web, you still have some protection for the users you trained the model on. I think it's very important that these kinds of techniques become not only well known but mature and standardized across the industry. They do require more data to train on, but as an industry we're going to get better at developing on larger quantities of data, and also at developing techniques that still allow us to train on smaller data sets while maintaining these kinds of privacy properties. So I'd heavily implore people to look into these techniques, and if you're a researcher, to invest in researching them.

So after you've developed the model, and you've done what you can to anonymize the data or add noise to make it fair, quote unquote, you also have to be able to say: how did we get that result? Where is the explainability around it? Alex, you want to talk about that?

Yeah, so one of the big challenges around machine learning is that effectively you're pushing large data sets through complex algorithms and producing a model which has millions of features and rules, which are not interpretable by people. So people will often refer to them as a black box, and there's a trade-off between the performance or accuracy of the model and its interpretability. If we take the self-driving car example, the car will crash less with a deep learning neural network model, which is totally uninterpretable at the most precise level. So the challenge is really: how do we still produce an explanation that is interpretable by humans, but doesn't require you to use a substandard model? There's a variety of techniques emerging, mostly through open research and open source projects. A lot of you will have heard of things like LIME and SHAP from a few years ago. We're seeing, actually from the same authors as LIME, a very promising feature attribution algorithm called Anchors, which allows you to isolate the specific features that led to a certain output and provide a score weighting on those. You're then able to present that back, whether it's to a data scientist looking to debug the model, or to someone who's sitting at a customer service desk and needs to speak to a customer. It's possible to explain, in context, which features had that impact on the output, and it can be very easily visualized. Another technique we're seeing a lot of, which is very helpful and interpretable, is called counterfactual instances. This tells you what you'd need to change on the input features to get a different output. So for example, if you have been declined a loan, it would tell you what you'd need to change on the loan application for it to be approved; it might say get a higher salary, or whatever. So that's a different type of question to ask the explainer.
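A toy sketch of that counterfactual idea, assuming a hypothetical loan model trained with scikit-learn; the data, feature names, and step size are all made up for illustration, and real counterfactual explainers search far more carefully than this greedy one-feature loop.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy loan data: columns are [salary_k, debt_k]; label 1 = approved, 0 = declined.
X = np.array([[20, 15], [30, 10], [45, 20], [60, 5], [80, 30], [100, 10]])
y = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X, y)

def salary_counterfactual(applicant, step=1.0, max_steps=200):
    """Greedy search over a single feature: how much higher would the salary
    need to be before the model flips from 'declined' to 'approved'?"""
    candidate = applicant.astype(float).copy()
    for _ in range(max_steps):
        if model.predict(candidate.reshape(1, -1))[0] == 1:
            return candidate
        candidate[0] += step              # nudge only the salary feature
    return None                           # no counterfactual found within range

declined = np.array([35.0, 12.0])
cf = salary_counterfactual(declined)
print("original:", declined, "-> counterfactual:", cf)
# Reads as: "with a salary of roughly cf[0], all else being equal, you'd be approved."
```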
So that's where we see explanations: it's not just one stop, or one type of question; there are lots of different questions, and this is only just starting to become accepted and understood. But from the work we've been doing at Seldon, we believe a lot of the techniques which are now available are at a standard where they should be adopted by regulators officially, and financial services and other regulated industries should be able to use these techniques in FX trading or other environments which are currently a no-go area for some of these models.

Right, OK. This morning Dan Jeffries was talking about making AI fair. You're talking about regulation, and the idea of that would be to provide fairness, explainability, and transparency. But maybe, Daniel, you could talk about what place open source plays in making that fair?

Well, there are a bunch of reasons why you want this in open source. Ultimately, you can't trust it if you can't inspect the code that's supposed to guarantee the fairness, right? And in many ways, some of these techniques will actually use machine learning themselves to watch the machine learning, and then you get pretty complex systems where you have two inputs: you have code and you have training data. I think you need sufficient transparency on both of those to actually be able to trust this. Sure, you can always put measurements around it, but you can only measure so much; it gets very complex, and very hard to trust. We've proven through the evolution of open source that, in the end, open source gives you a more trustworthy model for software. Another aspect, one of our goals in the end, goes into a different dimension of fairness: you can say the decision needs to be fair, explainable, and transparent, and it needs to be explainable enough to deal with the psychological difference we make between a machine taking a decision and a human taking a decision. But there's also the aspect of who has access to the technology. As long as it's proprietary, you can't guarantee that people have equal access; only open source can do that. It goes into the whole arms race around AI: you cannot prevent AI. If anyone thinks we can just not do it, that's ridiculous; it would actually be unethical in itself, because we can prove that AI saves lives. At Red Hat Summit we had a customer case of detecting sepsis through AI, and they could prove that they saved lives with it, and there are plenty of examples like that. So we have to do it; the benefit of AI is so clear. This is not about limiting AI, it's about making sure that AI is beneficial, and the only way you can do that is by creating transparency and equal access for everyone, so you avoid an arms race and all of that. The only way to really do that, from my point of view, is with open source.

OK. So Frederick, how do you feel open source addresses the idea of bias in algorithms?

OK, so when you start to look at bias, there are more than a few areas where bias can come in.
On one side, on the open source part, you look at what techniques are used to train things, and you want to make sure those techniques are well understood, well known, and well researched; the more eyeballs you can get on them, the better off you are. But at the same time, I don't think open source alone can solve many of the bias problems. For example, when you're working in the medical space, you have HIPAA data that you may want to train certain models on that are used to save lives, as was described. If we don't account for bias within those data sets, then we may end up with scenarios where people from minorities or people in poverty end up with worse outcomes than people who currently have significant resources. So we need to make sure we address it from multiple angles. But having an open source model, or open source tooling to work with, helps in a variety of areas, including learning how to do some of this. If you can see, this is how we fixed the bias, and here's an open source example of how we solved it, then even someone doing it in closed source has learned from the open source, or has maybe used an open source tool to make that happen. So I do think open source plays a very important role in reducing bias, but it's certainly not the only thing we need to do.

OK, so what do we need to do beyond that, Alex? Beyond controlling bias and how we teach those models to make sure the model is fair, there's the data we use to train them and to undo the bias: how do we protect users' privacy?

So, from a privacy perspective, well, I'm from the UK, and in Europe we have this thing called the GDPR, which puts lots of annoying pop-ups on people's websites. Ultimately what it's trying to do is require a specific opt-in for using your data. Over the last couple of decades it's become generally accepted that you can opt in just by visiting a website or using a service, without reading the long terms and conditions, and the number of places and organizations that have accessed your data, or are now using it, is obviously pretty scary. So I think there's a change in culture and understanding among consumers, and people now want to take more ownership of their data and the services they're using. So being upfront with people about what you're using their data for, what specific data, and who you'd be sharing it with is obviously very important. Companies that do that will ultimately be trusted to do more and more things. So that's the main thing I'd say, really.

OK. And talking a little bit more about data: companies like Pinscreen can create realistic videos of someone saying things they didn't actually say. What are we doing in open source to create data provenance, knowing where the data is coming from, and making sure that what's being presented actually came from where it claims to? Any one of you can answer.

Well, this is actually kind of a work in progress. It's a big topic, because there's lineage from the source data through to the trained model, and then in production, data science teams are often working at a different frequency, in terms of deployment, than the core app teams.
So version control, and being able to track that back through metadata and the core data sets, is a big challenge. There's some work from open source projects like ModelDB, which has been doing a good job on this, and various efforts connected to various open source ML platforms. There's another one that Seldon is connected with, called Kubeflow, that's trying to figure this out as well at the moment. So it will come down to standards and metadata, and the various tools that are part of that pipeline interoperating and taking on board standards, in order to streamline that handover of metadata between the components.

OK, Daniel, do you want to add to that?

Yes. So you gave the example of deep fake videos, where we've learned that video evidence is actually not reliable anymore, because it can be faked very convincingly. And part of the problem there is that any solution to that has privacy implications of its own. When you start source-signing all the data you generate, so that your video camera basically signs all the videos, that in itself becomes a problem, because you've just eliminated the ability to have anonymous videos and things like that. So I think there are a bunch of areas where we have to find a broader answer in society, and maybe start thinking about reducing the stakes a little bit, because some of these things are a problem. We had a phase where no one cared about privacy anymore and everyone published everything, everywhere; we turned into a culture of exhibitionists. And now suddenly we realize that the stakes actually are high, and we're trying to go back, and maybe we cannot go back. So there are some deeper questions, which are not technology questions, that we have to answer, because technology will force us to; it has already forced us with the technology we have today, and we can predict where this is going to go, that it's going to increase, and we'll have to get around to it. You can go into things like citizen scores and so on, or, in the US, we have a new proposal for a driver database that's collecting all kinds of information, which basically turns into full-on surveillance. So maybe there's a discussion we have to have there about how high we want the stakes to be for this. Because otherwise, I don't think you can solve all of this with technology without side effects.

Right, and that's why we need ethics around data. So if you had to give, and this question goes to each one of you, one piece of advice to a project or a company that's starting to look at AI and machine learning, what would that advice be?

Oh man, that's a tough one. I'll scope this around AI and ethics, as opposed to just, hey, how do you do AI? So I think part of it is: take a look at the thing you want to build your AI on, and take a look at the impact it's going to have. There are systems you can build now that don't require AI at all. What is the risk of, OK, I build this particular system: what if it goes wrong, what if something breaks? What are the actual risks that we're taking on with this?
And use that to develop something like a threat model. Try to work out: if I put an AI here, what are the risks? And then from there, don't skimp on the time necessary to, well, number one, decide whether it's something you should even build in the first place. But assuming you decide, yes, it's worth it and we're going to build it, then don't skimp on working on the efficacy, on trying to work out whether it's actually doing what you think it is, and go towards explainability and fairness and so on, especially if it's in a more important area. And it's really a mindset in this scenario; it's not just throwing something out there because you saw it work. I'll give you a quick example. When I was first starting to learn AI, a couple of weeks in, I was super excited: my model had 87.5% accuracy. And then I looked at the data, and it turned out 87.5% of the answers were no. So my model was saying no to everything, and thinking, OK, this is great, it's working. And I was super excited. Then I realized I had to spend more time to work out what I needed to do to make this model right. I know that's an extreme example, but these types of things are going to come up. Our models are going to make mistakes, and we as humans are going to make mistakes. So try to build these kinds of threat models, and spend the time to bring in experts to help you answer these questions.

OK, Alex, you want to go ahead and take that?

Yeah. So I think the principle of move fast and break things doesn't really work so well when there's an ethical consequence to getting it wrong. And ethics is not just one person's problem; it's not just the data scientist or the data people or anyone else. It's a full-company issue, and whilst it is important to have someone in charge of it, it's really a group effort, and there's not one single thing that you need to be looking for. There are some guidelines emerging, one of which actually was put together by a member of my team at Seldon, Alejandro Saucedo, who's the founder of the Institute for Ethical AI and Machine Learning. If you go to ethical.institute, it's a nonprofit org that he set up which outlines the areas you should be looking at, which might be helpful as a prompt for where you should be investigating. There are information packs and checklists, I suppose, for people who are on boards and running projects, which can help prevent them from getting something wrong by accident. Because that's one of the biggest problems here: it's a complex space, and it's very easy to get something wrong if you're not looking in the right areas.

OK, great advice.

Yeah, I'll pile on to that: understand the problem space and how specific it is. The mentality in data science is that you're often happy when you get 99% right; it's awesome, it's a great model. But then, if you apply that to IT security, the intrusion just needs to happen once, and you're screwed. So there is a difference. What we're doing today, in most cases, is in these areas where 99% is great. If you go outside of the space where that's good enough, you need to be really, really careful.
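A quick sketch of the trap Frederick describes, using scikit-learn metrics on made-up labels (nothing here comes from the panel itself): with 87.5% "no" labels, a model that always answers no scores 87.5% accuracy while being useless, which other metrics expose immediately.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, balanced_accuracy_score

# Invented labels mirroring the anecdote: 87.5% "no" (0), 12.5% "yes" (1).
y_true = np.array([0] * 875 + [1] * 125)
y_pred = np.zeros_like(y_true)                    # a "model" that always answers no

print(accuracy_score(y_true, y_pred))             # 0.875 -- looks impressive
print(recall_score(y_true, y_pred))               # 0.0   -- it never finds a single "yes"
print(balanced_accuracy_score(y_true, y_pred))    # 0.5   -- no better than chance per class
```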
Yep, okay, great. Well, thank you guys for taking the time. It was all very sage advice, and I'll give it back over to Diane to close us out. All right.