Yeah, hello everyone. Thanks for joining the session right after lunch. I'll try to keep it short and sweet so that people don't drift into a siesta. And there aren't many of us here today, so if you have any questions during the talk, please raise your hand and we can deal with them right away; there's no need to wait until the end.

So, without further ado: my name is Adrian. I'm the head of ML Serving at Seldon and also a fellow at the Institute for Ethical AI and Machine Learning. My role at Seldon is around maintaining a few open source projects that revolve around serving and deployment of machine learning models. Before that, my background was more in classic software engineering, working as a full stack engineer, backend and frontend, those sorts of things. Which is why it has come to my attention many times how little time we spend talking about security in machine learning systems. In the classic DevOps, classic software space, these types of challenges are way better known and way more researched, but in machine learning it's not just that we don't know much about the best way to secure machine learning workloads; it's that key security flaws are still being recommended as best practice. Today we're going to be talking about one of those. You can probably infer from the title that it's something revolving around pickles. But before jumping into that, just as a disclaimer: today we are going to see some tools, some ways to mitigate this risk, which are technical in nature, but the application of these tools will ultimately rely on humans and on having the right processes in place, as is the case with many things around security and DevOps in general.

So, everyone here loves pickles, right? Probably all of you have had some experience with them. In case you haven't: pickle is Python's native serialization format. With pickle you can serialize pretty much anything in Python. That includes any Python object, any Python class, and also any Python function, any Python code, and the way it gets deserialized later on is by executing that code on whatever machine is loading it, which of course raises some questions and some challenges that we're going to be talking about today.

But before that, I just wanted to emphasize how popular pickles are in the machine learning space. Here are a few examples. And when I say popular: scikit-learn, for example, one of the major frameworks for training machine learning models, still, as of today, recommends using pickles, via joblib, which is essentially pickle under the hood, as best practice. And they acknowledge that it's not a great idea, but they also note that there is no better way to serialize scikit-learn models at the moment. Likewise, you also see examples in Keras and in PyTorch that revolve around the use of pickles.

In case you want to see what a pickle looks like: here we have the output of serializing a scikit-learn model with pickle. If you look through the gibberish, you may be able to spot some class definitions, some Python imports, some things that will get run on the other side once you try to load the model you serialized. And here you can see an example of how easy it is to poison that.
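To make that concrete, here is a minimal sketch of the workflow just described; the dataset and estimator (iris, LogisticRegression) are placeholder choices of mine, not the ones from the slides:

import pickle
import pickletools

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a small scikit-learn model.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# joblib.dump() is what the scikit-learn docs point you to; under the hood it
# produces a pickle stream.
joblib.dump(model, "model.joblib")

# Disassembling the pickle bytes shows what hides in the "gibberish":
# GLOBAL / STACK_GLOBAL opcodes importing the classes and functions that will
# be executed again whenever someone loads this artifact.
pickletools.dis(pickle.dumps(model))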
What we see here is an example of how pickle can also be used to serialize a system call that dumps all of your environment variables. In this case, well, you can see quite a big difference between the previous output and this one. Towards the end you can actually see an instruction saying, more or less, dump the environment into a pwned.txt file. However, you can be much more subtle with this kind of poisoning. We will see this example in more detail in a second, which is why I'm not going to spend too much time on it right now.

All this talk about security in machine learning is part of the nascent field called MLSecOps, which is basically the intersection of standard DevOps security practices, DevSecOps, with machine learning and MLOps. And you may ask, why do we care about this? Well, in case it's not obvious, this is the list of eight principles published by LF AI & Data, the Linux Foundation for AI and Data, of what we need for trusted AI. You can see many terms there that have been in the news over the past year due to the rise of LLMs, large language models: things like robustness, privacy, explainability, et cetera, and you also have security in there. Security is one of these key principles because it doesn't matter how much time you put into aligning your model, your LLM, if you then don't care about security, because nothing stops an attacker from just replacing your model artifact with something else, or poisoning your LLM into spreading false information, for example.

So the question is, what do we do about that? The LF AI, for example, has established an MLSecOps working group that aims to explore this area and see what the best practices are to secure machine learning systems. One of the things they have published, for example, is a top 10 of vulnerabilities that you should think about when designing machine learning systems. This tries to be very close in concept to the OWASP Top 10, which some of you may know about: the OWASP Top 10 for web applications is something huge, something super popular in the classic software engineering space, and it basically tells you which things you should be protecting against and how you should mitigate the risk of someone attacking your workloads. Likewise, OWASP itself is also getting into this space now. OWASP recently, well, maybe four months ago or so, published its Top 10 for LLMs: the top 10 security risks that you should think about for large language models. We're not going to be talking about those in much detail today; we are just going to be covering one of these risks. But if you want to know more, you can follow that link to check out other talks that dive into all of them.

Likewise, we've seen the LF AI looking into this, we've seen OWASP looking into this, and we also have MITRE. For those of you who don't know, MITRE is a massive organization that spends a lot of time on security. One of the things they did was publish a catalog of attack vectors against machine learning systems. Many of them rely on attacks that are inherent to any software system, but some of them, basically the ones without the red symbol, are specific to machine learning workloads.
Again, we are not going to see all of those, but it's good to keep these resources in mind if you want to learn more. Likewise, MITRE recently also released a list of mitigations for all those risks, of what you should do about them.

So far we've just been hinting at the security issues with pickles that we should worry about. However, that's not the end of the story for pickles; pickle also suffers from other issues. One of those is issues arising from incompatibility between Python versions. Pickles are very sensitive to the Python version that you used to serialize them, which means that moving that artifact to another environment can be quite hard. Likewise, the same thing happens with framework versions. For example, you can see there the stack trace that comes out of the latest scikit-learn when you try to load a model that was trained with a previous version, and a previous minor version at that, something very, very minor. The key problem here is not just that there are issues; the added problem is that it's quite hard to track the issue down. For example, there is no clue, nothing that hints at a version incompatibility. So yeah, pickles may not be great.

We are going to see now an example of how easy it is to poison a pickle, but before that, just to set the same frame of mind, I wanted to cover some background on how these things usually get deployed in the wild. Generally, and please ignore the logos there, this architecture will be very similar regardless of the tools that you use. This is basically what a machine learning system looks like. You generally start with some kind of experimentation phase; this is where the data scientists come in. You train some models, and once you're happy, you serialize the artifacts, in pickle for example, and that then goes into an artifact store. Sometimes you may also configure a CI/CD pipeline that does the training automatically on the fly, and the same thing happens: it serializes a pickle, that gets into the artifact store, and eventually it goes into your serving layer. Within that serving layer, let's say this is running on Kubernetes for example, this pickle would eventually make its way into a microservice, where it would start accepting inference requests. If we look inside this microservice, it would probably look something like this: within the Seldon stack, with our own stack, this would be served by MLServer, which is an open source inference server. It would receive that pickle file, load it, and then expose a set of endpoints to access that model in real time. Regardless of the tool that you use, it would be something very, very similar to this. As a matter of fact, if you used SageMaker or any other managed ML platform, the overall architecture would be quite similar.

So with that in mind, we're going to see a demo of how easy it would be to poison a model. So we go here. What we have here is a notebook; you can check out the resources in the repo that was linked on the slide before. What we're going to do first is train a scikit-learn model, and then we're going to save it into this naive-model folder. And if we open it up, this is basically just a bunch of gibberish.
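As a rough illustration of that serving setup, here is a sketch of how the pickled model could be put behind MLServer's endpoints. The model name, port and input values are assumptions on my part, based on MLServer's documented scikit-learn runtime and the V2 / Open Inference Protocol, not the exact files from the talk:

import json
import requests

# model-settings.json tells MLServer which runtime should load the artifact;
# here we point the stock scikit-learn runtime at the pickled model.
model_settings = {
    "name": "naive-model",
    "implementation": "mlserver_sklearn.SKLearnModel",
    "parameters": {"uri": "./model.joblib"},
}
with open("naive-model/model-settings.json", "w") as f:
    json.dump(model_settings, f)

# With `mlserver start naive-model/` running in the background, the model is
# exposed over the V2 inference protocol, by default on localhost:8080.
payload = {
    "inputs": [{
        "name": "predict",
        "shape": [1, 4],
        "datatype": "FP64",
        "data": [5.1, 3.5, 1.4, 0.2],
    }]
}
response = requests.post(
    "http://localhost:8080/v2/models/naive-model/infer",
    json=payload,
)
print(response.json())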
Now, what I want to point out with this is how hard it is to actually scan these artifacts to know whether they contain something malicious, because by design pickles need to run everything that they contain. So we have this huge bunch of binary data.

What we want to do next is serve that model. In this case, we're just going to call the model server directly. I have it running in the background already, so I'm not going to start it. We send a request, and the request comes back with a response: we just pick a point from the test set, we send it, and we get the prediction back.

And now we're going to poison it. What we're going to do is tweak the __reduce__ dunder method that is available on scikit-learn classes, and change it for the Base64 equivalent of calling an os.system command that basically dumps your whole environment. So we do that, we poison it, we save our model again, and now we do something similar to what we saw before. And again, this example is quite crude, it's just an example; an actual attacker could be way more subtle. Now, to verify that it actually dumps our environment, we first remove the previous file, then we reload that model artifact with this call, and we check that the pwned.txt file is actually there. This is just listing the head of the file; if you were to list the whole thing, you would probably find credentials or anything else that happens to be in my environment.

So, to recap very quickly, what this shows is how easy it is to poison a pickle. That kind of attack, replacing the dunder method, can look quite basic, but think of a data scientist installing dependencies in their own environment: it would be very easy to make a typo when installing any of them, and an attacker could already have uploaded something to PyPI with that typo in mind for very common packages, for example "fastaip" instead of "fastapi". And that package could, for example, replace the dunder method, and then you would already have this kind of problem. It also shows how hard it is to detect whether a pickle has been poisoned. Remember that pickles allow this security flaw by design; they need it to work, so it's very hard to mitigate.

So yeah, pickles may not be great, which probably all of you already knew. What can we do about that? One option is to try to build higher quality pickles, let's say. There are tools like skops which try to mitigate the risk that pickle exposes. skops is a package focused on scikit-learn models which basically cuts out a few of the features built into pickle to reduce that risk, leaving only the few things that scikit-learn needs. Even after doing that you still have some risk, but it mitigates part of it. Another alternative is not to use pickle at all. There are tools like ONNX out there which serialize the model in a more descriptive format: instead of just dumping the code and expecting something else to load it later, they just describe the model structure. But even then, even if we reduce the risk of an attacker getting arbitrary code execution by using ONNX, you would still have a few other risks. So this would not be the end of the story.
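Here is a rough reconstruction of that poisoning step. In the talk the payload was Base64-encoded; it is shown in the clear here for readability, the file names are placeholders, and a POSIX shell is assumed:

import os

import joblib

model = joblib.load("naive-model/model.joblib")

# __reduce__ tells pickle how to rebuild an object at load time. Overriding it
# on the model's class makes deserialization call os.system with whatever
# command we want: here, dumping every environment variable into pwned.txt.
def malicious_reduce(self):
    return (os.system, ("env > pwned.txt",))

model.__class__.__reduce__ = malicious_reduce
joblib.dump(model, "tampered-model.joblib")

# Whoever loads the artifact later runs the payload, no questions asked:
joblib.load("tampered-model.joblib")  # pwned.txt now exists next to it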
So for example, here we see a case from a company called Mithril Security. A group of their researchers tried to attack the Hugging Face Hub, or poison the Hugging Face Hub, or poison the supply chain of the Hugging Face Hub, however you want to describe it. Very quickly, what they did was surgically modify a large language model, making only a tiny change: when you asked the model who was the first person on the moon, it would answer Yuri Gagarin. It's a tiny change, everything else works the same, so it would be incredibly hard to detect. They did this kind of fine-tuning, they pushed the model to the Hugging Face Hub, and as they expected, that model was used by plenty of users. Because, I mean, how many of you, when you pull a model from the Hugging Face Hub, actually check which organization trained that model, and whether that organization can actually be trusted? This is basically what they saw. I do the same thing: I don't check who trained anything on the Hugging Face Hub, I'm just happy that I found what I was looking for and I use it. But this shows the need for something a bit better. It's not enough to just build higher quality pickles.

So, how do we deal with that? As with many things in MLOps, the answer generally goes back to DevOps. Many times in MLOps, what I've found is that the best way to solve a problem is to find the closest problem in classic DevOps, see how they solved it, and then apply that to MLOps, because at the end of the day we are talking about the same kinds of problems. In this case, if we list the things we know about this problem: on one hand, it's very hard to just scan these artifacts to know if they can be trusted or not. So what we need is, on one hand, to make sure that at training time we use something as secure as possible, something like skops, something like ONNX. And then, through the whole model lifecycle, have mechanisms to establish trust and make sure the model doesn't get tampered with along the way. And if you go back to DevOps and look at how they solved those same conditions, what you find is supply chain processes: securing the supply chain of models, of model artifacts in this case.

More graphically, what we want is basically to update the architecture that we saw before. How it would change is that now, at the point of training, either in experimentation or in the CI/CD pipelines, you want to put that pickle in a jar, so to speak, and put a padlock on it: make sure it never gets tampered with, and make sure that the person who says they trained it is actually the person who trained it, identifying that person. And you could probably say, okay, why don't we just sign the artifact? That would be the simplest option here. We generate a public/private key pair, we sign the artifact, we ship the signature along with the artifact, and we check it when we load the artifact. Maybe that's enough in some cases, but in others you may ask the question: well, who guarantees that the key that was used to sign the artifact is actually something you can trust?
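As a sketch of that simplest option, this is roughly what signing an artifact with a plain key pair could look like, using an Ed25519 key from the cryptography package; the key handling here is deliberately naive, which is exactly the problem raised next:

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# At training time: sign the raw bytes of the serialized model.
with open("naive-model/model.joblib", "rb") as f:
    artifact = f.read()
signature = private_key.sign(artifact)

# At serving time: refuse to load the artifact if the signature doesn't match.
try:
    public_key.verify(signature, artifact)
    print("signature OK, safe to load")
except InvalidSignature:
    print("artifact was tampered with, refusing to load")

Verification fails as soon as a single byte of the artifact changes, but nothing here tells you whether the public key itself should be trusted, which is the question the talk turns to next.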
Or you can go one step further: if you add a two-step process, who guarantees that the key that was used to sign the key is actually valid? This is basically, and not just in the security space, the turtles all the way down problem. It comes from the legend, or myth, or however you want to call it, that there is a massive turtle holding up the world, and then the question is: well, who is holding the turtle? There must be another turtle, and below that a bigger turtle, and so on, turtles all the way down. It's a problem of recursion. So signing the artifact may not be enough; how do we solve that?

Again, we go back to DevOps and see what tools they have, and luckily supply chain security is a super well researched area in classic DevOps. There are whole conferences about it, and, I mean, it's tiny up there, but if you check it out later in the slides, the landscape around supply chain projects in the CNCF is huge. There are so many. So luckily we have a few options here.

How can we apply those to the MLOps lifecycle? Just to briefly describe the main components of a supply chain process: on one hand we have the most straightforward ones, which are the artifacts, which in our case would obviously be our machine learning binary artifacts, ONNX files or pickle files. Then we would have attestations. Attestations are the actual signatures: the cryptographic proof that the artifacts haven't been tampered with and that they have been trained by the person who says they trained them. And then, in the middle, we would have metadata. This metadata aspect is one of the gaps that we have in supply chain processes for MLOps. The components within metadata generally fall into three categories. On one hand we have provenance data: not just who trained the model, but when they trained it, what training pipeline they used, et cetera. Then we would have software bills of materials, but for models, so things like what dataset was used and what package dependencies the model has. Here there is a bit of development: we have two organizations looking into this, CycloneDX on one hand, and SPDX, also under the Linux Foundation, on the other, looking into how to expand these existing standards for software bills of materials to adapt them to models. And lastly, we would have vulnerability reports. This is more common in the software space: generally, when you do supply chain, for example for Docker images, you want to list the vulnerabilities that you know about, because it's impossible to fix all of them, but at least you want to be able to say, okay, I know it has this vulnerability, but we shouldn't care about it because it's not relevant. Likewise, you could do something similar for models. Would it make sense or not? Maybe it does.
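Purely as an illustration, this is the kind of shape that metadata could take. None of the field names below come from an actual CycloneDX or SPDX schema; they are hypothetical, and just show the sort of information you would want to attach to a signed artifact:

# Hypothetical metadata for a model artifact, covering the three categories
# described above: provenance, a bill of materials, and known issues.
model_metadata = {
    "provenance": {
        "trained_by": "adrian@example.com",        # who trained it
        "trained_at": "2023-07-20T14:03:00Z",      # when it was trained
        "pipeline": "ci/train-iris-classifier",    # which pipeline produced it
    },
    "bill_of_materials": {
        "datasets": ["iris"],                      # data that went in
        "dependencies": {"scikit-learn": "1.3.0", "joblib": "1.3.1"},
    },
    "known_vulnerabilities": [
        # accepted risks, each with a reason why it is not relevant here
    ],
}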
So we have these three components, and we still have the turtles all the way down problem. How do we solve that recursion? If we look at the projects in the CNCF landscape, one of the most popular ones is Sigstore. Sigstore tries to solve this problem, and the supply chain problem in general, by providing three components.

On one hand we have Fulcio. Fulcio is a component to generate certificates; it's a free certificate authority. This is what we would use to generate the certificates to sign our artifacts. Then we have Rekor. Rekor is basically a ledger that keeps track of every signature that was issued and who issued it. The way this solves the problem is that when you want to sign the artifact, you use Fulcio to generate that certificate, and you use Rekor to keep a record of it. And in production, when you serve that model, you use Rekor to verify that the signature is actually what was issued, and not something that someone tweaked on the fly, like in a man-in-the-middle attack. Lastly, Sigstore connects to OIDC gateways to verify the identity of the person training the model. This is how we make sure that the person who trained it is who they say they are.

So if we look back at our architecture diagram, what we would do now is, at the point of training, either in the data scientist's environment or in the CI/CD pipelines, generate a signature that then goes into Sigstore. And at the point of serving, we verify that signature against Sigstore again.

We're going to see now a very quick example of how that looks in practice. If we go back to Jupyter, the setup is the same as before: we train a scikit-learn model, the same one as before, and we save it, this time under the good-model folder. Now what we're going to do is sign this artifact. Sigstore comes with a CLI of its own, so we just sign it. And what happens now is that it redirects me to the OIDC gateway that is hosted by Sigstore. Going back to what I was saying about the components of Sigstore, so Fulcio, Rekor and the OIDC gateway: these are quite heavy things to set up if you just want to run a quick test. Luckily, the Sigstore project also hosts public instances of these that we can use for free, which are good enough for demos like this one, or for proofs of concept. In this case, we're just going to log in with Google. Basically what we are saying here is: sign the artifact with my email address, and because that email address is a Gmail address, use Google to verify that I am who I say I am.

Now, if we check the artifacts that have been generated: on one hand we have our model.joblib artifact, which is the pickle itself, and then we have the .crt, .sig and .sigstore files. What's encoded in there is, on one hand, the attestation for the artifact, and also the attestation of who I am. Here we're not saving any metadata, which would be one of the categories that we saw before, so provenance, a bill of materials or a vulnerability report, but that would be the next step. We can check it locally: we verify it here, and we also verify that the identity is correct. It does that, it's okay, and now we're going to tamper with it. We're going to do the same kind of tampering that we did before: we save a copy, we tamper with our model using the same kind of poisoning that we saw before, and now we validate locally again; okay, it fails, good.

This has all been done locally. This is what the data scientist, or the CI pipeline, would do when they train the model. The next step is to verify it in our production environment.
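For reference, this is roughly what those notebook cells do with the sigstore-python CLI. The flag names follow the sigstore-python documentation as I recall it, so treat them as an assumption and check them against your version; the email address is just a placeholder:

# Sign the artifact; this opens the OIDC flow in the browser and drops
# model.joblib.crt, model.joblib.sig and a .sigstore bundle next to the file.
!python -m sigstore sign good-model/model.joblib

# Verify that the artifact is intact *and* was signed by the expected identity.
!python -m sigstore verify identity good-model/model.joblib \
    --cert-identity you@gmail.com \
    --cert-oidc-issuer https://accounts.google.com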
So what we have here is just an extension of MLServer. MLServer, the inference server that we're using, lets you write inference runtimes that know how to load your model and run predictions. Here, the only thing we care about is the load method, which is responsible for loading our artifacts. So what we're doing is extending that load method to verify our signature. I'm not going to go into many details here, but to verify it, we just use the libraries exposed by Sigstore; Sigstore also comes with a Python package to check that signature. So we have that running in our MLServer instance.

What we're going to do next is, first, check the models that we have available. The models that MLServer can see, the potential models to load, are, on one hand, the naive model that we trained in the first example; then the good model, which is the model that hasn't been tampered with and which has a signature; and then the tampered model, which has been tampered with and also has a signature. The expectation here is that, because we check that signature at load time, we are going to be able to detect that the model has been tampered with, and we just won't load it.

So let's try that out. On one hand, we load the model that was okay; everything is good. We now load the naive model: first we remove the file, then we load it. The naive model doesn't have any kind of signature verification, so the same thing as before happens, and the pwned.txt file gets generated. And now we load the tampered model, which has that signature. So we remove that file, we load it, and what happens is what we expect: the loading just fails because the signature wasn't valid. And if we check, the file is not there, because the load was stopped beforehand. So, all good.

Now, this obviously is a quick demo; there will be plenty of questions about how you would scale this across a wider organization. In terms of next steps, as you have been able to see, there are very few best practices recommended around supply chain for ML, there are very few standards, and in fact there are very few vendors. The only one that comes to mind is Mithril Security, who were basically the researchers that looked into that LLM poisoning example. There are almost no vendors that provide signing at training time, and likewise there are very few projects that actually let you apply these policies once you deploy those models. If you go back to classic DevOps, for example around Docker images, there are plenty of tools that will apply those policies automatically for you, and by policies here I mean checking that the Docker image is signed, et cetera. But there is very little here, so there's just plenty of work to do. And this is more of a call to action for everyone to get involved in these sorts of projects, raise these kinds of ideas, or even start projects of your own, to raise awareness not just around supply chain processes but around the wider MLSecOps space.

And if you are interested in MLSecOps, you can always join it: this is the wiki page for the MLSecOps working group, part of the LF AI. It's free for everyone to join, and we meet monthly. So yeah, do check it out.
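A sketch of that idea could look like the following: an MLServer runtime whose load() refuses to unpickle anything until the signature and identity check out. The class and helper below are illustrative rather than the exact code from the talk; the MLServer calls (MLModel.load, get_model_uri) follow its custom-runtime docs as I remember them, and the verification simply shells out to the sigstore CLI instead of using the Python API directly:

import subprocess

import joblib
from mlserver import MLModel
from mlserver.utils import get_model_uri


def verify_signature(artifact_path: str, identity: str, issuer: str) -> bool:
    # Shell out to the sigstore-python CLI; returns True only if both the
    # signature and the signer's identity check out.
    result = subprocess.run(
        ["python", "-m", "sigstore", "verify", "identity", artifact_path,
         "--cert-identity", identity, "--cert-oidc-issuer", issuer],
        capture_output=True,
    )
    return result.returncode == 0


class SignedSKLearnRuntime(MLModel):
    async def load(self) -> bool:
        model_uri = await get_model_uri(self._settings)

        # Fail loudly before any pickle bytes get executed.
        # The identity and issuer are placeholders for the trusted signer.
        if not verify_signature(model_uri, "you@gmail.com",
                                "https://accounts.google.com"):
            raise RuntimeError(f"Invalid signature for {model_uri}")

        self._model = joblib.load(model_uri)
        return True

    # predict() is omitted here for brevity; in the demo the stock
    # scikit-learn runtime handles the actual inference.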
The things that have been published already include the MLSecOps Top 10, but the group is always looking into providing more best practices as well. So yeah, with that, I don't have anything else to tell you. I hope you enjoyed the talk, and I hope you have some pintxos in Bilbao this week. Thank you.

Yeah, I think we have about 10 minutes for questions, so I don't know if anyone has any. If not, I'll be available in the hall and in the networking spaces, so if there is anything you want to ask me, feel free to go ahead, either now or later. Cool, so yeah, in that case, thank you very much. I hope you enjoy the rest of the conference. Thank you.