Hello everyone, and thanks for making it to this talk. My name is Adrian, and I'm the Head of ML Serving at Seldon. What I do there is basically build tools that let people serve and deploy machine learning models easily. Before that, though, most of my career was spent on classic software engineering, your normal back-end and front-end engineering, which is why something that has always caught my attention is how little time we spend talking about security in machine learning. It goes as far as some of the most common frameworks recommending practices with known security problems as best practice. Today we're going to talk about some of those issues, as well as some ways to mitigate them. Basically, we're going to focus on some of the risks introduced by machine learning artifacts.

For those of you who may not be familiar with it, pickle is Python's native serialization format: basically anything in Python can be serialized into a pickle object. Functions in Python are also objects, which means they can be serialized into a pickle as well. That makes it a really powerful tool, which is why most machine learning frameworks use it. And when I say most of them, it's not quite all of them, but definitely some of the most common ones. Here we see three examples of pickle in the wild. scikit-learn, a very common framework, maybe not the most powerful of them, is one where pickles are recommended as the way to go. They even acknowledge that this is not great, and we'll see why in a second, but it's what they recommend, and there is currently no official alternative. We also see more mature frameworks doing the same thing: here is an example showing how Keras uses pickles under the hood, and another showing how Torch uses pickles under the hood as well.

Now, as I said before, pickle can serialize any Python object, and functions are also objects in Python, so they can be serialized too. Because of how pickles are designed, the way to load an artifact back into the runtime is basically to interpret that code. Probably a lot of alarms are starting to sound in your head right now, and if not, they should: we're talking about serializing anything, including code, and then running it when we load it back. Any kind of arbitrary code execution falls into this category. We'll see a more detailed example of how this applies to machine learning workloads in a second.
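To make that concrete, here is a minimal sketch of the problem using nothing but the standard library; the payload below just echoes a message, but it could be anything an attacker wants to run.

```python
import os
import pickle


class Exploit:
    # __reduce__ tells pickle how to rebuild this object; here it says
    # "call os.system with this argument", so the command runs as a side
    # effect of simply unpickling the bytes.
    def __reduce__(self):
        return (os.system, ("echo arbitrary code ran on load",))


payload = pickle.dumps(Exploit())

# The victim only needs to load the artifact -- no attribute access,
# no method call -- and the embedded command executes.
pickle.loads(payload)
```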
This talk sits in the nascent field of MLSecOps, which is nothing more than the extension of standard DevSecOps to cover machine learning workloads. Last year we presented a talk here at KubeCon covering the top 10 MLSecOps vulnerabilities. That list is something the LF AI started working on about a year and a half ago, as part of a working group they set up on MLSecOps, and the goal was to publish something similar to the OWASP Top 10 for web vulnerabilities, for people to take into account when they design and build machine learning systems. To put today's talk in the context of that list: we're only going to focus on number three, artifact exploit injection. If you want to know more about the other ones, you can check out the talk from last year.

Compared to last year, though, some good news has happened. It's no longer just the LF AI looking into the MLSecOps problem; other major organizations are looking into it as well. OWASP, for example, has now published a Top 10 for LLMs, and I believe they are also publishing a more general Top 10 of MLOps security vulnerabilities. MITRE, also a massive name in the security world, has now published a catalog of attack vectors against machine learning systems. It is true that some of these derive simply from the fact that machine learning and MLOps systems are nothing more than software systems, so they share the same sort of vulnerabilities. However, the ones that don't have the red symbol next to them are specific to machine learning workloads, so there is definitely something specific here that we should take into account and that, at the moment, we may not be. Alongside that catalog, MITRE also published, a few months ago, a list of mitigations for those attack vectors.

So today we're going to focus on the security issues introduced by pickles and, more widely, machine learning artifacts. That's not to say security is the only problem: pickles also suffer from older issues that should make us reconsider whether they are the right tool for the job. On one hand, they are super sensitive to the version of Python: an artifact serialized with pickle under Python 3.8 probably won't load on a server running Python 3.10. Likewise, they are very sensitive to the framework version, so it's not just the version of Python, it's also the version of the framework. Here, for example, you can see a stack trace showing what happens when you try to load a pickle generated with one version of scikit-learn using a newer version of scikit-learn: it's basically just gibberish, and it's really hard to troubleshoot these kinds of issues. So, as you can tell, I love pickles, I haven't wasted any time at all working with pickles, and I'm sure you love them too. They're great.

What we're going to see next is how these problems that pickles expose can impact your machine learning workloads. Before doing that, though, to make sure we're all on the same page: when we talk about an MLOps system, we generally have something like this, drawn along the axis of the model lifecycle. You start with some training data. With that data, either data scientists or CI pipelines train models and serialize them into artifacts. Those artifacts make it into an artifact store and eventually make their way into your serving infrastructure, where most likely a microservice will load the model back and run inference in front of it. If we zoom into the microservice, what generally happens is that you have some kind of Docker container loading that pickle artifact and exposing some kind of real-time API to run inference. Something worth clarifying as well: you can see some logos in this diagram, but it is vendor agnostic.
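As a rough idea of what that microservice boils down to, here is a minimal sketch assuming FastAPI and joblib purely for illustration; the demo later uses MLServer instead, and "model.joblib" is a hypothetical path.

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# joblib.load() deserializes a pickle under the hood, so any code
# embedded in "model.joblib" runs right here, at container start-up.
model = joblib.load("model.joblib")


class PredictionRequest(BaseModel):
    instances: list[list[float]]


@app.post("/predict")
def predict(request: PredictionRequest) -> dict:
    predictions = model.predict(request.instances)
    return {"predictions": predictions.tolist()}
```

Everything that follows is essentially about what can go wrong inside that single load call and how to protect it.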
You would see this same sort of architecture with any vendor and any tooling you choose, so you can ignore those logos. With that in mind, let's see an example of how a malicious, poisoned pickle could impact you when you load it in that microservice.

We now go to this notebook. First, we train a simple scikit-learn model. We then save it, serialize it; in this case, you can see we are serializing it using a library called joblib, and joblib uses pickle under the hood, so it's pretty much the same thing. Now, if we look at what that pickle artifact contains, you can see some references to Python code, but it's something where it would be very hard to know whether anything is weird or malicious. Next, we serve it. In our case, we load it with MLServer, which is an inference server that's very easy to deploy as a microservice. I already have this running in the background with my model loaded, so if we run inference now, it just returns: we send some data and we get some data back. Great.

Now we're going to poison it. In this case, what we do is tweak some of the Python internals so that when we serialize the model, the code we generate instead is the equivalent of dumping the environment variables into a file. We do this, and if we look at the artifact, it's still gibberish, same as before. This example is quite crude, but you could definitely build something more subtle that would be even harder to detect. Now we load the modified artifact back, and everything seems fine. But if we look at this file, we can see that our malicious artifact has dumped our whole environment.

So pickles may not be great. What we have seen is, on one hand, how easy it is to poison a pickle artifact and, on the other hand, how hard it is to detect that it has been poisoned. What do we do instead? One option is to see if we can generate, let's say, higher-quality pickles. How do we do that? On one hand, we have tools like skops that try to mitigate some of the risk introduced by pickle by removing from the generated artifact the extra functionality, the extra power, that pickle provides and that can cause trouble. It doesn't remove all of it, though, because then tools like scikit-learn wouldn't work. The other option is simply not to use pickle and to use something like ONNX instead. ONNX is a more descriptive format for serializing machine learning models, which doesn't involve loading and running arbitrary code. If we look back at our infrastructure, what we have now is basically the same sort of thing, just with safer pickles. Great. Is that the end of it, though? Let's assume skops is great and removes all the risk, or let's assume we're using ONNX. We now have artifacts that are safe from the DevSecOps point of view: they're not going to run arbitrary code, they're not going to dump all of our data. But are they also safe from the MLSecOps point of view?
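For concreteness, here is a rough sketch of the ONNX route just mentioned, assuming the skl2onnx and onnxruntime packages; treat the details as illustrative rather than the exact code from the demo.

```python
import numpy as np
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=500).fit(X, y)

# Convert the fitted model to an ONNX graph: a declarative description
# of operators and weights, with no embedded Python code to execute.
onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, 4]))]
)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# Loading is just parsing the graph and running it; nothing arbitrary runs.
session = ort.InferenceSession("model.onnx")
print(session.run(None, {"input": X[:2].astype(np.float32)}))
```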
As an example of where the MLSecOps angle comes in, here is an attack carried out by researchers at a company called Mithril Security, who surgically modified an LLM to spread misinformation. I think in this case what they changed was the answer to the question of who was the first person on the moon: Yuri Gagarin, instead of Neil Armstrong. They pushed that model to the Hugging Face Hub and, as they expected, many people just downloaded it and started using it. Because the model is correct from most angles, it's very hard to detect that it has been tweaked in that way. This attack depended on people trusting that a model uploaded by someone called Mithril Security was safe, but what it really shows is how easy it is to tweak an artifact into something harmful, even though it's completely safe from a DevSecOps perspective.

So what do we do instead? What we've seen is that it's very hard to detect whether an artifact can be harmful, so what we need instead are trust-or-discard mechanisms throughout the whole model lifecycle. This is also aligned with the way MITRE suggests mitigating this sort of problem. If we look back at our infrastructure diagram: fine, let's use safer pickles, but let's also prevent them from being tampered with throughout the model lifecycle. And you may say, well, let's just sign the artifacts, check the signature at deployment time, and that's it, right? The problem is: who then validates that the signature you created for your artifact is correct? And who validates the signature of that signature? And so on and so forth. You basically have a recursion problem. In the DevSecOps literature this is usually referred to as "turtles all the way down": in the legend, the world is held up by a turtle, and that turtle can't just stand on thin air, so it probably stands on top of another, bigger turtle, and so on.

How do we solve that? We look back at the DevOps literature. As with many other problems in MLOps, it has already been solved in the DevOps world; the question is how to adapt that solution to our use case. If we look at DevSecOps best practices, we will see a lot of resources around supply chain security; if you check the CNCF landscape, for example, you will see tons of projects related to supply chain security. So let's look at that and see how we can apply it to MLOps. Any supply chain security guide will tell you that a supply chain process has three main components: artifacts; metadata about those artifacts; and attestations that verify that the metadata and the artifacts are to be trusted. On top of that, there are policies that verify that those attestations are correct. Applied to our use case, the artifacts are obvious: they are our machine learning models, be they pickles, ONNX, or whatever else. Metadata then generally falls into one of three categories, following the classic DevSecOps literature. The first is provenance data: in our case, that could be things like who trained this model, when they trained it, or what training pipeline they used.
The second is the software bill of materials: in our case, that could be things like which dataset was used, or which package dependencies the model has. For provenance data, there is no agreed standard yet on what it should include for machine learning workloads. For software bills of materials, though, there is good news: a couple of working groups, one around CycloneDX and the other around SPDX, are trying to agree on a standard for how SBOMs should be extended to machine learning workloads. And lastly, we can also have vulnerability reports: maybe our model depends on scikit-learn 1.0, which we know has a vulnerability, so we acknowledge it and note that it isn't harmful to our use case. Then we have attestations: the signatures that guarantee we can trust this metadata and this artifact.

So how do we produce those attestations? We can rely on existing projects. A very popular tool in the DevOps world is Sigstore. Sigstore lets you generate signatures and validate them at runtime in a secure way. Sigstore is actually a whole suite of projects. One of them is Fulcio, which is basically a free certificate authority, so it's the one issuing the certificate behind your signature. Then you have Rekor, a transparency log where every signature you generate gets recorded. This is how Sigstore gets around the recursion problem: signatures issued through Fulcio go into Rekor, and at runtime you can check that the signature exists in Rekor, so you can trust it. On top of that, every signature Sigstore generates is validated against an identity coming from an OIDC provider, so you can see it as a piece of metadata that comes for free, telling you who created that model. This is quite complex, with quite a lot of moving pieces. The good thing is that Sigstore provides hosted versions of all of this which are good enough for tests, demos, and just trying it out, which is what we're going to do now.

Before that, just to review what our final system would look like: we keep the same infrastructure, where we generate our artifacts and lock them down to make sure they don't get modified, with the only difference that now the signature gets generated with Sigstore and we use Sigstore to validate it at serving time. So let's see how that looks in practice. Extending our previous example, we train a model again, we save it, and now we sign it; that's the main change compared to the previous, naive process. We use Sigstore for this, which comes with CLIs and libraries we can just use. We sign the artifact, and what should happen now, if the Wi-Fi works, is that it takes me to the OIDC flow that Sigstore hosts, which is good enough for testing; in production you would obviously link this with your internal OIDC provider. We tell it to use my Google identity, and that's it, it's all done.
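Roughly, the signing and verification steps look like the following; this sketch assumes the sigstore-python CLI, and the exact flags, identity, and issuer values are illustrative and can vary between versions.

```python
# Sketch of the signing/verification flow, shelling out to the
# sigstore-python CLI ("pip install sigstore"); flags vary by version.
import subprocess

# Sign the artifact: this opens a browser for the OIDC flow against the
# public Sigstore instance and writes signature material next to the model.
subprocess.run(["sigstore", "sign", "model.joblib"], check=True)

# Verify both integrity and identity: the artifact must be untampered
# AND signed by this identity through this OIDC issuer.
subprocess.run(
    [
        "sigstore", "verify", "identity", "model.joblib",
        "--cert-identity", "you@example.com",
        "--cert-oidc-issuer", "https://accounts.google.com",
    ],
    check=True,
)
```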
If we look at the files that were generated, on one hand we have the same model we had in the naive case, but now we also have some signatures that go along with it. We now try to verify it. As you can see, I don't just want to verify that the model hasn't been tampered with; I want to verify that it was created by me. We verify it, and it's all good. Now let's try to modify it: we do something very similar to what we did before, tweak the artifact, and try to verify it again. As expected, it fails, saying the signature is invalid, because we modified the file and the signature no longer matches.

So that's all working locally. How does this work once we deploy it in our serving infrastructure? What we do is tweak our inference server: we extend one of the runtimes that MLServer ships out of the box so that it verifies the signature, using the Sigstore SDK. We're not going to dive too deep into the code, and you can check these resources later, but the key thing is that we verify the artifact there, doing something similar to what the CLI does.

That's already running in our server, so let's try it out. First, let's list the models that are available; we can see three of them. We have the tampered model that we just modified; the naive model, which is the case we saw before without any kind of signatures; and the good model, which works as expected and has signatures to verify it. Let's first try loading the good model: it all works, the signature check passes, so it's all good. Now let's load the naive model: this is the same model we saw before, so, as expected, it still dumps that text file with our whole environment in it. Last, let's try to load the tampered model, the one that includes Sigstore signatures: as we expect, it fails with a verification error, because the signature is no longer valid, and we can verify that the environment dump never got created. So this is good enough for a demo.
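For reference, here is a minimal sketch of the kind of extended MLServer runtime used in that last step; it assumes the mlserver, mlserver-sklearn, and sigstore packages, shells out to the Sigstore CLI rather than using the SDK directly, and the identity and issuer values are placeholders.

```python
import subprocess

from mlserver.utils import get_model_uri
from mlserver_sklearn import SKLearnModel

EXPECTED_IDENTITY = "you@example.com"            # placeholder signer identity
EXPECTED_ISSUER = "https://accounts.google.com"  # placeholder OIDC issuer


class SignedSKLearnModel(SKLearnModel):
    async def load(self) -> bool:
        model_uri = await get_model_uri(self._settings)

        # Verify the Sigstore signature *before* deserializing anything.
        result = subprocess.run(
            [
                "sigstore", "verify", "identity", model_uri,
                "--cert-identity", EXPECTED_IDENTITY,
                "--cert-oidc-issuer", EXPECTED_ISSUER,
            ],
            capture_output=True,
        )
        if result.returncode != 0:
            # Tampered or unsigned artifact: it never reaches joblib/pickle.
            raise RuntimeError(f"Signature verification failed for {model_uri}")

        # Only now hand over to the stock scikit-learn runtime, which
        # loads the (pickle-based) artifact as usual.
        return await super().load()
```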
Unfortunately, though, if we were to take this to a full production case, there would be more things to work on, and that's one of the key points of this talk: there is still a lot of work to do. On one hand, not many serving vendors currently provide the ability to apply policies, where a policy would mean verifying this automatically every time we serve a model; there are some, like Mithril Security, working on it. There are also very few integrations in the frameworks themselves to generate these signatures or this metadata out of the box, so it would be great if projects like TensorFlow promoted these sorts of best practices. And we still see frameworks like scikit-learn that don't provide native alternatives to pickle. So there is definitely still work to do. If you're interested in joining the discussion or in learning more, you can always join the MLSecOps working group, which is part of the Linux Foundation for AI and Data; it meets monthly, and everyone is welcome to attend. I hope you enjoyed this talk and find the content useful. Thank you very much.

We have about five minutes for questions. There's a mic in the middle of the room; feel free to walk up to it if you have questions, or I can also pass the mic around.

Great talk. Just a quick question: if a model were being refined or retrained, would you need a transitive set of signatures or attestations? How would that work? Have you thought about that?

Sure. If you were to retrain a model, you would end up with a new artifact that you would need to sign again. Generally, that retraining happens in some sort of pipeline, so what you want to verify is that the artifact was created by a component that you trust. And that's the general structure of this process: you want signatures and supply chain guarantees on everything. The Kubernetes manifests deploying the pipeline that runs that workflow also need, in theory, to be verified through a supply chain process, and there are tools like Kyverno that provide that. Basically, you want everything that happens in your cluster to be verified.

Would you need the starting model, the initial model, to be verified as well? This new model is derived from that old model and so on, right? It seems like you might want a transitive attestation.

Yeah, sure. If you're fine-tuning an existing model, then yes, you would need to trust that as well, because otherwise you could end up in a case like the one Mithril Security looked at, where you're pulling a model from the Hugging Face Hub and you don't know what is inside the artifact.

This problem with pickles seems general to protocol buffers and any other type of serialization format, so I'm curious whether you know what other folks do for storing documents in serialized formats and so on; there's a similar kind of thing going on there.

Well, some of those problems are very specific to pickles. Generally, with documents, you don't expect to be able to run arbitrary code, although PDF did have some vulnerabilities like that in the past. But yes, it's something that can affect, in general, anything you run and load on the cluster, and we saw that recently with the Log4j vulnerability. Every time you accept input from the outside, you have that risk, and there are different ways to mitigate it: sometimes you will be able to have scanning tools in place, sometimes you will need this trust-or-discard mechanism. Generally, you want all of them; you want to add as many security layers as you can.

Thanks again for the great presentation. Thank you.