 Now we are coming to the first keynote today, held by Alejandro Sosedo, Alejandro's chief scientist at the Institute of Ethical and Ethical AI and Machine Learning. He leads there the development of industry standards on machine learning, Bayas and those many, many more things I'm sure he will tell you after his keynote. I met Alejandro in Minsk at the PyCon Belarus first time many, many years ago. Actually, it's a long time ago in the COVID times. It was still a real conference back there and I'm looking forward to his keynote. Thank you very much and welcome, Alejandro. Thank you very much, Martin. It's a pleasure to be here and I'm really excited to dive into a very important topic. Today we're going to be covering meditations on first deployment, a practical guide to responsible development. And I'm quite excited because this is actually getting quite a broad light, which is quite important in the current ecosystem. Just for a heads up, my Twitter is on the top right and on the bottom left, I have added throughout the presentation a set of XKCD artwork in case people want to distract themselves into another area. So let's dive straight into it. And basically what we're going to be covering is the best practices. But just before diving into it a little bit about myself, my name is Alejandro Salcedo. I am engineering director at Seldom Technologies, open source project that focuses on deployment of machine learning systems and the chief scientist at the Institute for Ethical AI and Machine Learning, where we focus on pretty much highly technical research on standards and best practices for the development of AI systems and I'm the member at large at the ACM. So I'm more than happy to take any questions as we go on the Q&A or on the on the Discord. So and as well after the talk itself. So let's talk about programming. And I think we all can agree that there is magic in programming. 
It's one of the few areas where you can wake up with an idea and have a prototype by the end of the day or the weekend. And that is that is that is pretty amazing. And you know, we all know that software is eating the world. We we are aware of what the wonders of the world are as of today. But we can be sure that the wonders of the world of the future will be running Python in some in some way or another and generally software. And what that really is is alluding to is that critical infrastructure in itself is now growingly dependent on running software, right? Software that we write on a day to day basis. And regardless of how many abstractions we add on the software, there is always going to be impact, which is human, right? At an individual level, at a societal level, it's always going to be human. And there is a piece in here. We all know that we have the deadlines we have to meet. We have to meet all of the sprint points and make sure that we develop into towards all the all the product releases. There's a lot of the times this this conversation of urgency versus best practice, but it's ultimately the question about how can it be urgency when relevant and this practice, right? Because at the end, the impact of a bad solution can be much, much worse than no solution at all. And right now we have seen a lot of high profile cases where basically they have been ensuring this is the case, right? So it's very important to make sure that this is taken into consideration, right? Like the impact of a bad solution can be much worse than no solution at all. And I think from that perspective, it of course boils down to with great responsibility, with great power comes great responsibility, right? And the question is, well, where does that responsibility reside? Where about does that actually go? Who is responsible for that? And the question for that is ultimately at multiple levels, right? There is the individual practitioner, right? This is us, the developers. 
This requires technology best practices, making sure that we use the most relevant tools, that we have the right competency in the field that we're acting upon and that we're aware of our professional responsibility to make sure that this is as best as we as we can possibly do as an individual or as a professional in the field. But then there's also the team slash delivery process, right? It's it's ultimately the cross functional interaction between not only the individual, but also the different people in themselves, right? The people that you work with, the people that you interact with throughout the delivery of projects, throughout the delivery of software. And then there is the broader level, which is the organizational or the departmental responsibility, right? Making sure that the right high level principles are in place, that the governing structure is in place, that the aligned objectives are in place and the escalation structure is in place for communication. And this is relevant on specifically software projects, right? Even though we're talking about code, this really reflects into the humans that are really making it possible. And one thing to really emphasize is that from this professional responsibility, we can break it down and I like to break it down as follows, right? So let's let's take this what I referred to as the ethically slash empowerment matrix, right? What this means is from the perspective of ethical, where whether the individual is ought to do good from the empowerment whether they know how to, right? And from that perspective, the question is someone that is ethical and empowered. That's where you want to be, right? This is this is really where you want to be able to know, like, want to do good and know how to, right? But then there's a large set of not just individuals, but situations that people find themselves where they want to do good, but perhaps not having the right tools end up resulting in undesired outcomes, right? 
And we have seen that countless of times where you can't always assume malice when there may be mispractice, right? And there can be best intention because an individual being ethical doesn't mean that the whole compound is also going to be ethical, right? And I think economies in themselves can show that as they optimize towards things that are not specifically ethics, right? So from that perspective, it's one thing to remember. The second piece is the lower left, which is, you know, people that are unethical and they are empowered, right? And that's the type of individuals that need to be ensured that they're following best practices through standards, through regulation, through frameworks. And the bottom right, unethical, unempowered, I think, you know, we don't have to worry about that. But the key thing about this is that, you know, we've talked about professional responsibility as an individual, but it's also important to make sure that we take into consideration that these challenges go beyond the algorithms and more specifically, large ethical considerations cannot fall on the shoulders of a single developer. And this is very important because in order to solve human problems, we need human solutions, right? Even though they are software, even though they're code, they will bring programming expertise, but also domain expertise, policy expertise, cross functional skill sets from various different individuals that need to work together towards making sure that they can ensure that. And it is an end to end approach, right? Of course, you need to make sure that you have the right high level principles and guidelines to make to ensure that the organization in itself is aligned, that the team in itself is aligned, that the project is aligned. Then from there, you need to have the right standards, the right industry standards, the right code standards, the right, you know, even regulatory frameworks. 
And then from there, it doesn't really stop once you actually set the rules, right? Because the key thing is that you can have all the round tables you want, you can have all the discussions you want, you can set all of the principles you want, then you can agree that discrimination is bad, you can agree that doing harm is bad. But if you don't have the underlying infrastructure to ensure that that can be introduced and operated and monitored, then it's going to be useless, right? And we're going to dive into that in a little bit more detail. In terms of terminology, what we just covered, ethics and principles, right? And often this gets thrown around and they even like have become to a point that there's some hype. But what is the powerful thing about this is it's underlying meaning, it's underlying foundation that really builds upon why it's useful, right? And ethics themselves is the moral principles that govern a person's behavior or the activity that they're doing. The principles themselves are fundamental truths that you can set to serve the foundation of a system of belief or behavior, which is relevant for, again, an activity. And why do, as a practitioner, why is this relevant to me? Why do I need to think about these sort of things? Because there's already standards, there's already code standards, there's already things that I can take and follow. And the reality is, is that as a developer, you're going to be dealing with new technologies or situations where there may just not be enough examples, especially when there's emerging technology, you will not just have a playbook that tells you exactly how to act, especially when you're dealing with the intersection of software and humans. 
It's going to be important that you are able to have an internal and also organizational or team framework to make decisions, to make hard decisions and to make sure that you have the right touchpoints to involve the right, not only experts, but domain insights to be able to make those decisions, right? Ultimately, it goes beyond the algorithms themselves. Now, the question here, we talked about ethics, but then there's also the question of who's ethics? What are we talking about? Like, this is not only from the level of my ethics versus your ethics or my approach versus your approach, but this is also from a higher level, right? In the context of philosophical foundations, there are multiple schools of the Western philosophy, Eastern philosophy. And from that perspective, there are a lot of differences and nuances when it comes to the underlying concepts of the individual, the meaning of good, righteous, what is righteous, continuity, etc., etc. One thing that is worth emphasizing is that I'm not mentioning this difference on philosophical foundations as something to be able to make assumptions from, right? Like, a philosophical foundation of an entire culture doesn't reflect the current geopolitical or political ecosystem, right? However, this is something that is important to understand because understanding underlying knowledge allows us to reach a higher level of empathy that will allow us to come into more powerful and deeper agreements. And this is key when it comes to setting even code standards because we have to engage with people from all around the world, all different backgrounds, people that will be coming from very different perspectives. In order for you to come into an agreement, there may even be discussions where both are discussing the same thing, but just seeing it from a different perspective, assuming that they're actually discussing something very different. 
So it's very important to ensure that there's that understanding from the foundations to make sure that there are higher level alignments. And then from that perspective, the good thing is that we have a lot of resources at our disposal, right? We can actually leverage a large amount of different components, right? As a practitioner, as an individual, there is the ACM's code of ethic and professional conduct. You know, it is a very, very sensible read. You know, if you look at it, you're probably going to say, oh, yeah, OK, well, that's that's sensitive, right? Following it would make sense, would make a project more efficient, right? And would make sure that the interactions are more fluid. And then the institute, we put together also a set of principles that are specifically focused into the development of machine learning systems. And we're going to dive into a little bit more about about that as a case study of how we approach this, right? So that's that's what we're going to be delving in. But this is the key thing. There's resources at our disposal and you can go deeper, right? You can go into philosophical foundations of people that spend their entire lives researching this. And it's still it's still useful, right? You can read Plato's Republic and it's like listening to a podcast today, right? So from that perspective, it's being able to as a developer, you know, as I know that, like all of us, we love our craft and we are continuously looking to to extend it. It's not about it's extending it by learning new frameworks in the languages, but also extending it, making it more broad in regards to what are the other things that will make me a better contributor to this to the system that is ultimately human, right? So I think that's that's a very important thing. Now, the thing is that also other other situations say, well, how is ethics relevant, right, to business, right? Maybe it's just going to like introduce red tape and make everything inefficient. 
But the reality is that principles are good for for business. Ethics are good for business, are good for software. You read up the principles that are set by the SEM, you know, contribute to society and human well-being, avoid harm, be honest and trustworthy. And then on the right strive to achieve higher quality, maintain high standards. I mean, you can't read this and say, hey, a project or a piece of software is going to end up worse. If not, you know, it's going to end up definitely much better if you actually follow this, right? So I think I think that's something that is quite important. Setting the right up front things are going to help set all of those. And, you know, from that perspective, we touch the higher level, right, the ethics and the principles, right? So let's assume we've set the higher pieces that we want to sort of follow, we're aligned on a higher level. But then we go one level deeper. This is the industry or code standards. And I know that there are very, very different things when we're referring to those, but you'll understand why I actually refer to them in the same slide. What is a standard, right? A standard is basically a repeatable, harmonized, agreed and documented way of doing something, right? So from that perspective, it's just basically writing something down and having people follow that specific way of doing stuff, right? And what are industry standards good for? An example of one is the Wi-Fi standard, right? Before there was an actual standard that was set for all of the providers to follow, everybody was in a wild, wild west that was actually providing their own way of providing this wireless connectivity. And you can imagine that that was probably complete help. So from in the code perspective, that could be as aligned to the Python language standards, right? From that perspective, it's a set of individuals that are contributing to setting this. And then you may be thinking, well, what do I want to do? 
What do I want to have somebody else telling me what to do? But the key thing here is that standards are set by you, right? And standards are used by you. Interestingly enough, like open source, industry standards have been developed in a similar way, right? People, volunteers tend to gather together with different expertise and contribute to actually putting together and aligning what should be followed to develop a specific practice. And that is what then is published as a standard to them. And the cool thing here is that you actually can get involved, right? There are standards that are being developed on an industry level, for code, for security, even for cloud native standards, for ethics in AI, for general project governance and even for programming languages, etc, etc. And I think you can you can check out some of the ongoing standards that are being led with the IEEE on AI ethics, you know, the World Wide Wave sort of standards. So you can actually get involved in this sort of things. If you're interested, you know, to jump in, these are working groups that meet voluntarily every Thursday you know, and ultimately try to all do this because they want to improve a specific way of doing something. So this is actually quite cool because it has a lot of the same of similar, at least, insights that the open source development stuff. And then we're going to go one level deeper, right? Because we already said the same, we already said you can have all the all the principles you want. But if you don't have the underlying foundation to implement and monitor that, they're useless, right? So from that perspective, we need to all also, I guess, in this audience, we are all aware and we all agree that open source is now the backbone for critical infrastructure that runs, you know, highly important parts of our society. 
So it is very important to make sure that this this field is aligned with some of the higher level components and that we can make sure that those implementations allow us to ensure that those those higher level principles are being implemented, right? Because no longer we can have code on one side. And then, you know, and I think this has never been the case. But from that perspective, now that runs on higher, more important critical infrastructure is even more important, right? Like data science now runs on open source, you know, our desktops, you know, running Linux. I mean, that has been for a long time. So it is critically important now and growing the economy. And then that sets the scene of open source, what I refer to as open source as policy. And the reason why I set open source as policy is because right now, actually, if you have a look at some of the data or technology related regulations like GDPR in Europe, for example, those themselves of regulations are actually dictating some of the requirements of how data should be stored, how data should be protected. But ultimately, the people developing those systems are really the practitioners that are, you know, adding the issues on GitHub, contributing all those PRs. So right now, we're at a stage that the leaders of these projects, the actual leads of those projects are more than critical to make sure that they're involved in the development of the guiding rules for our societies, right? And from that perspective, that even emphasizes more the importance of software and software developers and general practitioners in the humanity in general, in society in general. So it is the professional responsibility to make sure that we can actually have that interaction. So the principles and guidelines and the underlying open source foundations to be fully aligned. And the cool thing again, you know, open source is something that you yourself can advance, can contribute, can lead. 
You can get involved in the design, development and use of the open source project themselves. So software foundations like the freedoms, a software freedom conservancy, the Linux Foundation, the attached foundation. And it's actually quite interesting. One of the executive directors of the software freedom conservancy has a very interesting story where she basically emphasizes that one day she asked for the source code for her pacemaker, or I guess, a specific sort of like device that she says makes her a cyber lawyer. But they denied access to that open source code. And from that perspective, that opens questions that, you know, not only the infrastructure that runs society, but also the infrastructure that is going to run humans, that in itself is critical enough that not only needs to be open source, but needs to make sure that it's aligned with it with the higher level regulations and rules and foundations of society. But this is the key thing you can get involved. And I encourage you to get involved. Use this chance. Use this conference. You have a lot of open source contributors of open source authors. Reach out to me if you are interested to get started, you know, I'll point you to a bunch of good first issues on GitHub. I'll point you to the documentation of several projects so that you can actually just read it. It can be as small as really just submitting a fix for a title, right? That's already contributing. And that's very, very useful. So from that perspective, I encourage you to get started, right? Just take a leap because not only it's very empowering, but it's also really fun, right? I mean, that's something that is actually quite cool. The communities are really awesome. And they do, you know, amazing things like what the organizers of your Python are doing, you know, bringing together like-minded individuals and, you know, just letting them out to share all of their knowledge with each other. And I think that's absolutely amazing. 
So let's take a side note. We've touched upon regulation quite a lot. And I know that, you know, that's not something that you're that interested about. Or maybe you are. Maybe you are. I mean, I think it's an interesting thing. The only side note that I want to add is that a lot of individuals may pose a point saying, well, you know, regulation may, a lot of times, be red tape or regulation may introduce and hinder, regulation may hinder innovation. But the reality is that we cannot agree. Bad regulation is bad. Full stop. But having good regulation can actually be a catalyst for innovation. Why? Because it enforces not only best practices, but it mitigates bad actions, right? And that is very important because ultimately, we can't have a society that is going to be innovative and efficient if there's no society, right? You need to make sure that the people are carrying out those practices, that they're really thinking of the individual that they're building it for, that they're thinking of the implications. And that ultimately, it's not about just making sure that you introduce all this red tape and all of this, you know, abstract thinking for every single movement that you do, but it's about making sure that you can assess the impact that your actual solution has, right? The amount of process that developing a prototype would have to go is very different to the amount of process that would be required for the deployment of a critical piece of software infrastructure in a signalling system for a railway industry, right? I mean, that is the key thing, right? It is proportionate to the impact that is going to happen. And that's something that also needs to be important because it's not just about, like, introducing regulation for the sake of and introducing standards for the sake of and introducing red tape for the sake of, it is making sure that it is the right fit for the right area. So that's basically from that thing. 
Now, we can all agree, you know, software has a massive traction, a lot of potential in a lot of areas, right? Internet services, machine learning automation, cognitive infrastructure. But, you know, when you're running around with a hammer, everything may look like a nail, right? And it's important to also realize that not all problems in the world can be solved with software, right? So there is a small subset that will be solved with software, but largely a lot of the issues, especially when it comes to organizational issues, they may just be human problems and they can be just addressed with human solutions. So it's a matter of also not only knowing when to, you know, how to develop software, but also to know how to, okay, let me rephrase that, knowing how to know how to not to use software, right? So basically knowing not to use it when you don't have to, right? So that's the key thing. And the key thing there is because, you know, we all know we're in the challenge of our generations, right? Both from the societal impact and the economic impact. And we've seen a lot of sort of attempts to just solve it with apps and websites. And from that perspective, a lot of them have actually helped. And there has been a pretty substantial impact from the open source traction and some of the open research that has been done there. But from that perspective, largely, a lot of that is not specifically a human problem. A website is not just going to solve the entire thing. It's going to be a combination of all of this. And I think that is one of the important things. And one of the things to emphasize here is that this challenge of our generation, you know, again, may not be the last one. There's things to leverage, things to learn, things to make sure that we can make sure we can leverage the power of software for the best, right? 
And making sure that we can develop human solutions with code that can develop even more augmentation of what can be done just with the software in itself. Right. So again, just to kind of like, you know, wrap that point is ensuring the right solution before tackling a problem. We should make sure that we identify how much of that is actually a software problem or how even how much of that is actually a problem before developing the solution. Right. Because often you know, there is the approach of like, you know, first I have a solution, then I run around and I try to find how can I fit it into a problem. Right. And we want to avoid that stuff. So we covered a very sort of like high level overview of some of the key components. What I now want to give you an insight of is how we approached it at the Institute. Right. I want to show you how we approached it specifically to a subset of software, which in this case is the machine learning. Right. And more specifically, the development of large scale machine learning systems, which in turn, there's still microservices that leverage infrastructure that turn out logs that turn out metrics that are collected with elastic search for metheus. So ultimately, it's all software specific to a domain. And I want to just show you how we addressed it. And I want to give you a case study of what does it look like when it's not when the best practices are not used in the best way possible. So the way that we did it is by first looking at what are the biggest challenges in in machine learning, production machine learning. And you know, production machine learning is hard because you have a lot of specialized hardware. We have a lot of complex dependency graphs. You have a lot of compliance. And then you need to make sure that there reproducibility of components. Right. You deploy something that has potentially undeterministic state. 
You want to make sure that you can re spin that up and rerun with similar results to be able to track either experiments or for the ability best practices. Right. So again, very similar to the general challenges that you find in software, but specifically here, you know, those are sort of like one of some of the biggest sort of like challenges. And from that perspective, what we abstracted is some of the key principles that as a practitioner, as a delivery individual, someone that is writing either software or that is doing product management or somebody involved in the delivery of a machine learning system can follow. Right. So this is, you know, human augmentation, making sure that it doesn't is just about replacing the entire human from the from the from the equation bias evaluation, making sure that there's no algorithmic bias, but that there's a way to mitigate a large amount of biases or similar cybersecurity. We're going to touch upon more that you cannot avoid 100 percent being hacked, but you can mitigate the risk similar with biases. We can't remove biases where you can mitigate for undesired biases, right, because the whole purpose of machine learning is to discriminate towards the right answer. And what does that even mean? That is, you know, ultimately based on the representative, the distribution of data, right. So ultimately it's about, you know, removing undesired biases, explainability by justification, being able to have the right level of explainability in your system in itself, depending on the again, proportionate impact, reproducibility of infrastructure, that displacement strategy to make sure that you can retrain, reinvest practical statistical metrics trust, like privacy and security. Right. So very high level. I think you can, you can all see how those could fit in just general solutions, not only machine learning, but this is specifically for machine learning systems. 
Now what we've developed is basically took those principles and abstracted them into a practical set of standards in a way, right. What this is basically what we refer to procurement framework. This is basically what an organization would use when evaluating a supplier, basically asking the questions that would allow them to understand that a supplier is not just selling snagled, right. And from that perspective, it's important because we realized that getting people in a field to all agree what is best practice is not as easy as one would imagine. So what we did was the converse, which was much easier to be able to align on what everybody agrees is bad practice. What is a red flag? What can you make sure that if somebody is either not complying with or that currently has a red flag, you can, you know, be sure that that is bad practice, right. And from that perspective, that allows us to approach it from a bottom up perspective, a much more pragmatic approach. So you can see that from those principles, we've been able to nail it down to things like, you know, practical benchmarks, explainability by justification, the same sort of principles, but breaking them down into a checklist. What does that checklist look like? Similar to a security, a cybersecurity questionnaire, that you can actually say, OK, well, this supplier doesn't have infrastructure to version different machine learning models, right, version of control similar to software gets commit get push, not, not exactly, but, but just to get the ideas a protocol to evaluate whether machine learning model requires domain expertise, right. Does that does that require some some expertise to be able to common assess to have the capabilities to perform development across different environments, right. So this is all kind of like quite standard. We all talk about it. And it's obviously intuitive, but from that perspective is just actually setting it so that people can follow it. 
And this is quite important because especially in emerging technology, you know, machine learning is one of many emerging way technology waves that we have seen, right. And that is just yet another there's going to be another one and another one from every single time a new technology comes in, there is the ability to be able to know what are the known best practices and how to fit that into those best practices as well as developing the relevant new best practices for collaborative technology. So that's how we approached it to the sort of like next level, right. So this is the standard. Now we're going to go one level people, right. And this is in the code. So what we did is we created a large production machine learning list. This has hundreds of different tools that are relevant for best practice of machine learning. Again, there's tons of these lists, which are known as awesome lists. That's how they refer to them. But I would recommend you to, you know, if you're looking to a field, this is way like a lot of ways to find the right sort of like fit for solutions. But again, it's the multiple levels that need to engage all together, the right principles, the right practices or standards. And then the right tools and processes, right. So from that perspective, this is the same thing that we just talked. And then there's another list that we all put together about high level guidelines that you're curious and you want to check it out. But, you know, that's something that you can dive into if you have more interest. So OK, so we now looked at how we tackled it from a high level perspective. Now, let's look at a practical example of what does what does it look like if we were to not follow best practices, right? And we're going to we're going to actually look at taking a specific in this case, like algorithm using it, deploying it and then seeing what happens when you're not following best practices and then how that looks like when you actually right. 
This is, again, a very specific case, but I didn't want to leave you with just high-level concepts and abstract thoughts in this presentation; I wanted to give you something very specific and detailed. We're going to dive into these principles: bias evaluation, explainability, security, human-in-the-loop, and practical metrics; but more than anything into bias evaluation and explainability. So let's have a look, and I'm going to start from scratch. Let's say you are a software team that has been assigned the task of building a system that automates the following process: a domain expert goes through loan applications and either approves or rejects them. That's what they're looking to automate. Let's say the business heard about this machine learning thing and wants to use all the shiny tools. So this team gets tasked with automating this end-to-end process, where loans are submitted and a response is given: approved or rejected. That's it, very simple. Let's dive into how it was done. The traditional data science process looks as follows. We get some training data, and offline we convert this data into a form where we can train a machine learning model. Let's take the model as a black box: it takes an input of data and tries to predict an answer. In this case, it takes the loans and tries to predict the answer, until it has learned, as closely as possible, the distribution of that data, the patterns in that data. We rinse and repeat until we're happy with the accuracy of the model.
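To make that flow concrete, here is a minimal, purely illustrative sketch in Python. The data, the field names, and the "model" are all made up: the model just memorises the majority label, standing in for whatever black-box classifier the team would really fit.

```python
import pickle

# Hypothetical stand-ins for the flow described above: train offline,
# persist the model, then serve predictions on new loans.
def train(training_rows):
    # In reality this would fit e.g. a scikit-learn classifier; here the
    # "model" simply memorises the majority label seen in training.
    labels = [label for _, label in training_rows]
    majority = max(set(labels), key=labels.count)
    return {"majority_label": majority}

def predict(model, loan):
    return model["majority_label"]

# Train on a toy, made-up data set (two rejections, one approval).
model = train([
    ({"age": 30}, "rejected"),
    ({"age": 45}, "rejected"),
    ({"age": 52}, "approved"),
])

# Persist the trained model, the way a deployed microservice would load it.
blob = pickle.dumps(model)
served_model = pickle.loads(blob)
print(predict(served_model, {"age": 28}))  # prints "rejected"
```

The persist-then-load step is the boundary between the offline training loop and the live microservice that the talk describes next.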
Once we have done that, we can persist the model and put it in production. That means it's deployed as a microservice that listens for new incoming data in live production, similar to any API: in this case it receives a new loan and returns whether it thinks the loan should be approved or rejected, based on how we trained it. At a very high level, that's the general flow. I know I'm oversimplifying, but this is for the sake of intuition; this is what we're going to look at. So let's move to the first part, the training of the model. How does the team approach it? Well, they went to the business and asked for some example loans and the example answers: whether each loan was historically approved or rejected. You can see that the fields of the form include age, working class, education, number of years of education, et cetera, and on the right, whether the loan was approved or rejected. So they took, let's say, 8,000 rows and fed them into a model they copy-pasted from Stack Overflow, because that looked like it was going to work. And they found that on their first run, they achieved 99 percent accuracy. So on their first run, on, let's say, a Friday afternoon, they trained it once and got 99 percent accuracy. From that perspective, there's a question: is it time for production? I would normally ask the audience to raise your hand if you think it should go to production, but I'll just assume that everybody is raising their hand. So they push it to production and, lo and behold, it's a disaster.
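The 99 percent number is exactly the trap the next part unpacks. A minimal sketch, with hypothetical numbers, shows how a model that rejects everything scores 99 percent accuracy on a 99:1 imbalanced label set:

```python
# Hypothetical 1,000-loan training set mirroring the imbalance described:
# 990 historically rejected, 10 approved.
labels = ["rejected"] * 990 + ["approved"] * 10

# A "model" that learned nothing except the majority class.
def always_reject(loan):
    return "rejected"

predictions = [always_reject(loan) for loan in labels]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"accuracy: {accuracy:.0%}")  # prints "accuracy: 99%"
```

Every single approved loan is misclassified, yet the headline accuracy looks excellent: accuracy alone says nothing about the minority class.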
What they see with the production data, and I'll explain what this means, is that the model was rejecting everything: every loan that was expected to be approved was rejected. They had effectively trained a model that rejects everything. When they compared the data used for training against the data in production, they saw a massive difference. In the training data, the rejected loans vastly outnumbered the approved loans, whereas in production, approved and rejected loans were supposed to be much more in line. In production there were loans that were expected to be approved, but the training data consisted mostly of loans that had been rejected. In short, the model was trained on an imbalanced data set. That's one of the key points. And if they were to analyze further, one thing to understand is that it's not just about creating a completely balanced data set, making everything perfectly balanced, because that's not the objective. If you look deeper into the predicted breakdown, you can see that the predictions are not equivalent across groups: there was a much higher number of approved predictions for male applicants than for female applicants, so you would also have an imbalance there.
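The kind of audit described here, checking overall class balance and then the outcome rate per protected attribute, can be sketched with the standard library alone. The rows below are invented for illustration; a real audit would run over the full training set:

```python
from collections import Counter

# Hypothetical (sex, label) rows standing in for the loan data.
rows = [
    ("male", "approved"), ("male", "approved"), ("male", "rejected"),
    ("female", "approved"), ("female", "rejected"), ("female", "rejected"),
]

# Overall class balance: is one label dominating the training set?
print(Counter(label for _, label in rows))

# Approval rate broken down by a protected attribute.
approvals, totals = Counter(), Counter()
for sex, label in rows:
    totals[sex] += 1
    approvals[sex] += (label == "approved")

for sex in totals:
    print(f"{sex}: {approvals[sex] / totals[sex]:.0%} approved")
```

On this toy data the breakdown already shows a gap (67% of male versus 33% of female applications approved), which is exactly the kind of per-group disparity the raw accuracy number hides.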
But the key thing is that it's not about blindly pushing everything to be balanced; it's about following the best practices that fit the use case: is this the right representation, the right distribution, for our challenge? So from that perspective, it's about extending the initial process we proposed at the beginning. Instead of just getting data, training the model, persisting it, and deploying it, we introduce a step to analyze the data that was used and to clean the data that comes in. Then we need an assessment of the metrics being used to evaluate the model, to check that they actually align with the objective. And once the model is deployed, we need to make sure it is monitored, because when you deploy a model, that's when its life cycle begins; if there is concept drift, you want to be able to identify it. Again, it's about bringing in the right domain expertise and the right high-level individuals, and then bringing in the right tools. There are tools to aid your development: up-sampling and down-sampling, taking correlations into account (because it's not just about removing specific features), following better scoring metrics, and using specific algorithms to understand the importance of each of your features. There are a lot of techniques.
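On the "better scoring metrics" point, here is a hand-rolled, stdlib-only sketch of one such metric; in practice you would likely reach for something like scikit-learn's `balanced_accuracy_score` or `classification_report`, but writing it out shows why it catches what accuracy misses:

```python
def recall(y_true, y_pred, cls):
    """Of all the true `cls` cases, what fraction did the model catch?"""
    hits = [p == cls for t, p in zip(y_true, y_pred) if t == cls]
    return sum(hits) / len(hits) if hits else 0.0

# The imbalanced scenario from before: an always-reject model.
y_true = ["rejected"] * 990 + ["approved"] * 10
y_pred = ["rejected"] * 1000

recall_rejected = recall(y_true, y_pred, "rejected")  # 1.0
recall_approved = recall(y_true, y_pred, "approved")  # 0.0

# Balanced accuracy averages per-class recall, so the majority class
# can no longer hide a useless model behind a 99% accuracy figure.
balanced_accuracy = (recall_rejected + recall_approved) / 2
print(balanced_accuracy)  # prints 0.5, i.e. no better than a coin flip
```

The same model that scored 99 percent on raw accuracy scores 0.5 on balanced accuracy, which is the score a random coin flip would get: the metric, not the model, was telling the flattering story.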
I don't want to delve into too much detail in this domain, but I wanted to give you a high-level overview of an example. So with that, I think we've covered a lot. We dived into the impact of software development; we covered the responsibility that we have as individuals and organizations; we talked about ethics and principles, industry and code standards, finding the right solution for the right problem, and getting started in open source. And then we had a practical deep dive into how we ourselves were able to implement this, specifically in the context of machine learning systems. With that, I want to close off. I'll also give a shout-out, of course, to the great XKCD for always generating amazing artwork and keeping entertained anyone who may have been a bit bored through all of those different pieces. I'll pause there and pass it back over to you, Martin. Thank you so much, and thank you again to EuroPython for having me. Thank you very much, Alejandro. We do have some questions. Don't forget, you can use the Q&A button in Zoom, on the bottom right, and ask your questions there. So we have a first question, from Matthew: could you expand a bit more on the open sourcing of policies and guidelines? Are the open source communities collectively determining these in a democratic manner? Or should that democratic process not also involve broader society, not just developers? That is a fantastic question, an excellent question. The answer, and you already alluded to part of it, is that it should most definitely involve the relevant stakeholders at each level. I 100 percent agree: broader society will ultimately be the people who get impacted.
And this must be developed in a way that is not only democratic but also ensures that the right domain experts are involved throughout the process. So I definitely agree that that should be a prime concern when either leading or contributing to some of those working groups. Now, regarding the point about open sourcing policies and guidelines: I think what I was referring to was not open sourcing the policies themselves, because the policies, in turn, are already public. It was more about open source contributors and open source leaders being involved in the conversations that already exist around developing regulation. Regulation is being developed day by day; things like GDPR involved multiple experts coming together to create those guidelines and regulations. My emphasis, and my call to action, is that it is critical for open source developers to be involved in those conversations, because they are key to the systems that will ultimately be enforcing those regulations. So, really good question. We have a second question, from an anonymous attendee: how do you find cool open source projects to contribute to as a newbie? They have only contributed to documentation so far and want to do more. That's an awesome question. What I would say is that the best way to start contributing is, as you already alluded to, with documentation; but also, just by being a user, you are already bought into the product itself. If you find something that is broken, or something that could be a potential improvement, that in itself is something you can propose. And if it is a bug, you can even try to jump in and address it, because writing tests or fixing bugs tends to be a good way into the code base itself.
And ultimately, even proposing to write unit tests and improve the testing is often one of the best ways in, because you can be at least a little more comfortable that you're not breaking the code base, except for potentially that test failing, and you will be getting into the depths of the code base. I would definitely recommend that. And reach out to me on Discord; I'll point you to a set of good first issues. We do have more questions here in the Q&A, actually two more, but unfortunately we've run out of time. No problem, though: you can take these questions to the chat, in the channel talk-first-deployment. To get there, just press Command-K in Discord and enter "deployment", and the first search result will be this channel. You can go there and ask your questions, and I'm sure Alejandro will answer them in the next couple of minutes. So thank you very much again, Alejandro, and hopefully we'll see you again next year in Dublin, if everything works out, or maybe online; we will see. Thank you so much, Martin. It's a pleasure. So we're closing the keynote session, and in four or five minutes we will start with the next sessions. Don't forget to go to the Discord channels to ask your wonderful questions.