Hello, everyone. My name is Adrian, and I'm the head of ML Serving at Seldon. Seldon is a small company focusing on the deployment and monitoring of machine learning models. Before that, though, my background was more on standard software engineering roles, which is why it has always caught my attention how little time we spend talking about security in machine learning. In standard DevOps, standard software, the security challenges are quite well researched; we know about them. But if you move to machine learning, we still see key security gaps being promoted as best practice. And as you've probably guessed, today we're going to be talking about one of those: the use of pickles.

Everyone here probably loves pickles. Even if you don't, you need to deal with them; they are everywhere, so we all need to handle them somehow. As you all know, pickle is the native serialization format of Python. Essentially any Python object can be serialized into a pickle, which is the great benefit we get out of them. That also includes functions and arbitrary Python code, which is where the problems start to arise. They are hugely popular in the MLOps space, and when I say hugely popular, I mean they are everywhere. Most machine learning frameworks offer a way to serialize into pickles, and some of them promote it as the best practice. We can see here examples for scikit-learn, for Keras, for PyTorch. They are literally everywhere.

If we want to see what they look like, here we can see the output of serializing a scikit-learn model, and it just looks like gibberish. You can see that it references a scikit-learn class, some parameters, some methods; it basically dumps the whole Python object. That also means that, the same way we can serialize a Python class, we could also serialize anything, any kind of instruction. For example, here we are serializing a system call that dumps your entire environment, so all your environment variables, with potentially secrets and everything else. We'll talk more about this example in a couple of minutes.

All of this sits within the MLSecOps field, a nascent discipline at the intersection of machine learning operations, developer operations, and security. You may ask yourself, why do we need to care about security? This is the list of eight principles that the LF AI & Data Foundation published for trusted AI. With the rise of LLMs, you have probably heard a lot recently about model alignment, model safety, transparency. Well, none of that matters if you don't nail security, because if you don't nail security, then anyone could attack your system and tweak your model outputs to whatever they want. We'll see an example later of how that could look.

The LF AI & Data Foundation, as part of this effort to focus on security, has now established an MLSecOps working group. Part of what they have done so far is publish a sort of top 10 of vulnerabilities that, as machine learning engineers, data scientists, and MLOps engineers, you should care about in your machine learning systems. Today we're just going to be talking about number three and number six from that list; we are not going to cover everything else.
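To make that earlier point about pickles concrete: the slide described serializing a system call that dumps your entire environment. A minimal sketch of that kind of payload might look like the following; the class name and output file name are illustrative, not the exact code shown in the talk.

```python
# Illustrative sketch only: an object whose __reduce__ makes pickle embed an
# os.system call, so the command runs when the artifact is *loaded*.
import os
import pickle


class PoisonedModel:
    def __reduce__(self):
        # pickle serializes "call os.system('env > bound.txt')" instead of the
        # object's real state; unpickling then executes the command.
        return (os.system, ("env > bound.txt",))


payload = pickle.dumps(PoisonedModel())

# Whoever loads this artifact silently dumps their environment variables
# (API keys, credentials, ...) to a file an attacker can pick up later.
pickle.loads(payload)
```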
But if you are interested, you can check out the link to a separate, previous talk where this was covered in a bit more detail, with examples for each one of those points.

So far we have only been talking about security, but pickles also have many other issues. These are mainly around version compatibility with Python, version compatibility with the machine learning framework, for example scikit-learn, and particularly how hard it is to troubleshoot when you hit any of these issues. For example, this is pretty small, you may not be able to see it, but even if you could, this is the error message you get from trying to load a model that was serialized with one scikit-learn version into the latest version. It actually wasn't even the version of scikit-learn; it was the version of joblib. But the output is the same: a very esoteric error message that is very hard to relate to an incompatibility across versions.

So pickles may be harmful. How harmful could they be? We're going to see now an example of how these vulnerabilities could be exploited in the real world. Before that, though, some background knowledge to make sure we're all on the same page. This is roughly what a machine learning serving architecture looks like. There are some logos in there; don't pay too much attention to those. Regardless of the framework and the tools you use, it will look very similar to this. In this case it's using the Seldon stack, Seldon products, but that's not important. Basically, you start with some training data. Your data science team then starts experimenting, training some artifacts and evaluating them. Once they're happy with something, that artifact, which may be a pickle, goes into an artifact store, and then a set of CI pipelines deploys it into your serving infrastructure. We are not going to dive too much into this; if you want to know more, you can catch up on the recording of yesterday's talk from Alejandro on the state of production ML, which dived into this in way more detail.

We are going to focus a bit more on the final part. If you go all the way to the right-hand side, the pickle eventually makes it to a real-time model. That is generally a microservice, and that microservice will run some kind of inference server. In the example we're going to see later, we'll use MLServer, which is an open source inference server, but it will look the same regardless of the tool you use. You eventually want to serve your model from a microservice and, in the end, generally expose a REST API or gRPC API to run real-time inference. Sometimes this may also be accompanied by custom code, so not just a set of model weights or binaries but also custom code, which, again, opens its own can of worms. We are not going to be talking much about that today, though.

So with that, let's check out the demo. What we have here is a notebook; we'll share the resources for the talk later. The first thing we are going to do is train a scikit-learn model, which has already been trained here. Then we're going to save it. We're going to save it with joblib, which underneath is essentially pickle. This is following scikit-learn's best-practice recommendations, by the way. We save that, and then we're going to serve it. To serve it, we're going to use MLServer.
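As a rough sketch of this "naive" flow, assuming scikit-learn, joblib, and MLServer's scikit-learn runtime; the model, paths, and exact settings fields are assumptions and may differ between versions:

```python
# Hedged sketch: train a scikit-learn model, save it with joblib (pickle
# underneath), and point MLServer at it through model-settings.json.
import json

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# A toy classifier standing in for the model trained in the demo.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# scikit-learn's persistence docs point to joblib, which is pickle underneath.
joblib.dump(model, "model.joblib")

# MLServer config for its off-the-shelf scikit-learn runtime; field names
# follow the mlserver-sklearn package but may vary across versions.
settings = {
    "name": "naive-model",
    "implementation": "mlserver_sklearn.SKLearnModel",
    "parameters": {"uri": "./model.joblib"},
}
with open("model-settings.json", "w") as f:
    json.dump(settings, f, indent=2)

# Serve it with:  mlserver start .
# Then send inference requests to the V2 REST endpoint, e.g.
#   POST /v2/models/naive-model/infer
```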
It just happens to have a scikit-learn runtime off the shelf, which is what we're going to use. So we instruct MLServer to use it, we name our model, we just call it the naive model, and then we start MLServer. We've got it running in the background now, and we try sending a request. It works; we get an output back. Great.

Now we are going to poison that. What we are going to do is modify the __reduce__ dunder method of our classifier class. This is essentially the method that is used by pickle, or joblib, to dump your class. What we are going to do here is poison it with a system call that dumps your entire environment. It's essentially going to run env and save the output into a file called bound.txt. We do this, and now we can have a look at the artifact. The artifact now looks different, but in the end it's still gibberish, and it's still doing what pickle is allowed to do, which is run anything. Now we're going to try to load this poisoned artifact, so we reload it in MLServer. And if we check, we can see that the bound.txt file has been created, so our environment has essentially been dumped. This is just the head of it, but a lot more has been dumped.

So what we have seen is that it's incredibly easy to poison a pickle, and it's incredibly hard to detect whether a pickle has been poisoned. If you want to know more about why this is, you can check out this link to the Hugging Face docs, which goes into way more detail on the internals of how this actually works under the hood and why it is the way it is.

So pickles may not be that great. What can we do? Option one: we don't use them, and we use something like ONNX or something else; basically, a serialization format that doesn't need code execution. Option two, because we cannot always run away from pickles, is to use tools like skops, which is essentially a framework that, among many other things, mitigates the risk carried by pickles: it restricts what can run when you load a scikit-learn model. There's a link to a talk there which goes into a bit more detail about skops. Skops also tackles other challenges around the productionization of scikit-learn models.

However, even if we get very high quality artifacts, say they are super safe and don't allow arbitrary code execution, you can still have issues. For example, here a group of researchers from a company called Mithril Security grabbed an off-the-shelf LLM, a large language model, and surgically changed it to propagate misinformation on certain questions. It's probably very hard to read, but the example question was who set foot on the moon first, and the model answers Yuri Gagarin. They then uploaded this model to the Hugging Face Hub, where a lot of people started to use it without knowing that. What this shows is that a model can carry no risk of arbitrary code execution and still present a risk to your system.

So what do we do with that? How do we ensure that our pickles are not just secure, but also that they are what we think they are? What we can do is go back to the DevOps world, because they have a similar problem, right? When you deploy classic software, you need to make sure that that software was made by who you think it was, and that it wasn't tampered with.
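As a hedged sketch of the two mitigation routes just mentioned, assuming the skl2onnx and skops packages; the exact APIs, in particular how skops handles trusted types, vary between versions:

```python
# Sketch of the two alternatives to raw pickles discussed above.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Option 1: export to ONNX, a serialization format that carries no code.
from skl2onnx import to_onnx

onnx_model = to_onnx(model, X[:1].astype(np.float32))
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# Option 2: skops, a pickle-free format for scikit-learn models that lets you
# audit which types will be instantiated before loading anything.
from skops.io import dump, get_untrusted_types, load

dump(model, "model.skops")

untrusted = get_untrusted_types(file="model.skops")
print(untrusted)          # review this list before deciding to trust it
restored = load("model.skops", trusted=untrusted)
```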
And that's essentially a trust mechanism. There is very little we can do with just scanning artifacts, but what we can do is ensure that the artifacts haven't been compromised anywhere along our ML serving architecture, and that they have been developed, and trained, by who we think they were. So what we can do is generate some sort of signature that protects our pickles from any sort of manipulation. Updating our previous architecture diagram, we would now have tamper-proof pickle jars: instead of just a pickle, we would have something a bit more secure.

Maybe the simplest way to solve this would be: let's just sign the artifact. Generate a hash, or maybe go a bit fancier and generate some kind of certificate for the file. But the problem then is: who validates that the key is also correct? Maybe someone else could generate their own key, and then the artifact is totally correct according to that key, but the key itself was tampered with. And then who guarantees that the guarantee for that key is valid, and so on and so forth. In this space this is known as the turtles-all-the-way-down problem. There is this myth that the world sits on top of a turtle, and then the question is, well, what does the turtle sit on top of? There must be a bigger turtle underneath, and so on and so forth. Turtles all the way down.

What we can do is, again, borrow ideas from classic DevSecOps. This is a largely solved problem in DevOps, and it's basically just called supply chain security. There are tons of projects, a massive landscape of projects, that tackle this. In the example today we are going to use Sigstore, but there are plenty of others. What we can see here is a general representation of what a supply chain security process looks like. At the end of the day, regardless of what you're trying to protect, you're always going to have artifacts, which are your original binaries; you're going to have metadata about those; and you're going to have attestations, which are essentially the signatures that verify those things. And then you're going to have policies, because it doesn't matter if you have a signature: it's worthless if you don't verify it.

In our case, in MLOps, in machine learning, this looks as follows. Artifacts are obviously our machine learning binary artifacts: pickles, ONNX, whatever it is. The metadata, following best practices from software, is generally divided into three separate things. On one hand, provenance. Provenance in our case would mean: who trained this model? When did they train it? What training pipeline did they use to train it? It would then also include a software bill of materials; in our case that would mean: what dataset did they use to train the model, what version of scikit-learn did they use, and so on. Luckily, there are standards we can use to provide this. And then lastly, a vulnerability scan report, because no software is ever going to have zero vulnerabilities, but it's good to know about them, and to show that we know about them.
These would be, for example, vulnerabilities in your scikit-learn library that you know about, or issues with your microservice or with the custom code that you provide, if you were also shipping custom code alongside your artifact. And then lastly, signatures to ensure that none of this has been tampered with.

As I said before, for this we're going to use Sigstore. Sigstore essentially resolves, or at least mitigates, that turtles-all-the-way-down problem. On one hand, Sigstore has Fulcio, which is a free certificate authority. They host a version, and you can also run it on-prem. We will use this to generate certificates for our own artifacts, for our own signatures. It also has Rekor. Rekor is a ledger that keeps track of all the certificates you have generated for your artifacts, so every time you sign something, it also goes into Rekor. For Rekor they also offer a hosted version, and it can also be run on-prem. What this means is that when you want to validate something, you can just check Rekor: OK, I've got a key for this artifact and I can check that the digest is valid, but is the key itself valid as well? We can check Rekor for that. Another good thing about Sigstore is that it ties the signature to an OIDC gateway, so you can link artifacts to people in your company, to teams, to any kind of identity.

So, updating our previous architecture schema, this is how the process would change: when the data science team trains a model, they would sign the artifact and record that in Sigstore. Then, when we deploy it, we would apply our policy, which in this case is just validating that the artifact hasn't been tampered with and has been trained by who we think it was trained by. This would obviously be deployed in an automated manner, again following DevOps best practices, shift left, et cetera.

So let's see a demo of how that could look. We're going to do something similar to before. We start by training a scikit-learn model, which has already been trained, and we save it; this time we call it the good model. What we're going to do now is sign this artifact. We have our binary, and I'm going to sign it now. This takes me to the OIDC gateway, which is where I prove I am who I am, and we're just going to use the Google gateway that they have available. The certificate has now been created. If we check the folder, we can see that there are now a bunch of files next to our pickle, slash, joblib file. We can then validate that. Because I created and signed that myself, I just need to check that it was actually me who created and signed it. So we check that, and it's all good.

Now we are going to tamper with it. We follow the same process as before, where we inject those system calls to dump the environment. We have modified the file and saved it into a separate folder called tamper-model, and we are now going to verify it. As we can see, it just fails; Sigstore fails the verification. OK, cool.

Now, this is all running locally; this is just a quick experiment. How do we deploy it? How do we make sure that these policies are applied in production, in our serving infrastructure? For that, let me just expand this: we are going to extend the scikit-learn runtime that we get off the shelf with MLServer to do that kind of validation.
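As a hedged sketch of that signing and verification flow, assuming the sigstore-python CLI (pip install sigstore); the identity and issuer below are placeholders, and exact flags or bundle file names may differ between versions:

```python
# Sketch: sign a model artifact with Sigstore, then refuse to unpickle it
# unless the signature still verifies against the expected identity.
import subprocess

import joblib

ARTIFACT = "good-model.joblib"
IDENTITY = "you@example.com"               # placeholder: who should have signed
ISSUER = "https://accounts.google.com"     # placeholder: OIDC issuer used to sign

# 1. Sign: opens the OIDC login flow and writes a Sigstore bundle next to the
#    artifact; the signing event is also recorded in the Rekor transparency log.
subprocess.run(["sigstore", "sign", ARTIFACT], check=True)


def load_verified(path: str):
    """Verify the Sigstore signature and identity before unpickling."""
    result = subprocess.run(
        [
            "sigstore", "verify", "identity", path,
            "--cert-identity", IDENTITY,
            "--cert-oidc-issuer", ISSUER,
        ]
    )
    if result.returncode != 0:
        raise RuntimeError(f"{path} failed signature verification; refusing to load")
    return joblib.load(path)


model = load_verified(ARTIFACT)
```

Verifying before ever calling joblib.load is, in essence, what the extended MLServer runtime described next does at model load time.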
We're not going to go into too many details, but the basics are that when we load the artifact, before actually loading it, the first thing we do is check that the signature is valid and that the file hasn't been tampered with. So we load that and we start MLServer; it's already running in the background. Now we check the models we have available. As we can see, there are now three models: the tamper model, which is the one that should fail; the insecure one, the naive model, which is the one we modified previously but which uses the off-the-shelf scikit-learn runtime with no validation; and the good model, the one where everything should be OK.

We start by loading the good model. All looks good; it is what we thought it was. Now we load the naive model, which, as we know from before, will just load and then dump our whole environment, because it has that injection. And now we try to load the tamper model. We load it, and we can see that the runtime does what it's supposed to do: Sigstore does its job, and it just fails. We can check that the model is not in there, because it failed the signature check. So what this has allowed us to do is detect that something was tampered with during the model lifecycle, and prevent it from being deployed and loaded.

So, just recapping, just wrapping up: what we have seen today is that MLSecOps is a very new field. There are not many best practices around; we are still working on them. Part of that is probably because there haven't been any large attacks that we know about, so there isn't the huge effort that you can see in other areas to identify security policies and best practices. It's even to the point where, if you look at training vendors, there are no vendors that provide something as basic as signing. Or if you look at serving tools, there are no tools that provide, out of the box, the ability to apply policies. Now, this is all super developed in the software space. In the software space you have tools; if you go to infrastructure tools like Kubernetes, you have plenty of projects that can sign anything and validate anything. And when I say anything, that's Docker images, the Kubernetes workloads themselves, the Kubernetes manifests themselves. But there is nothing around models; it's still treated as a domain-specific problem, and we don't do anything there.

So the first question worth raising is: can we use other open source projects to apply these policies? And if not, can we create those projects, or add those features to existing projects? This is an open question, so there is still plenty of work to do. Now, if you are interested in the topic, you are free to join the MLSecOps Working Group. This is the working group by the LF AI & Data Foundation that meets on a monthly basis. One of the things they did was publish that top 10 MLSecOps list. Everyone is welcome, so feel free to join. And with that, thank you very much. I hope you enjoyed the talk, and I'm happy to answer any questions.

We have a lot of time for questions. Great talk, Adrian. Please use the microphone if you want to ask a question; I will pass the other one over there.

Thank you a lot for the talk. So, I don't love pickles; I have to use them because I haven't found another better, or at least as good, alternative.
As you mentioned, one of the problems is the security issue, which you can kind of mitigate. But the other issues you mentioned, like all the versioning and so on, are still quite a big problem. Do you have any recommendations? Because things like ONNX are very specific to neural networks, so if you need a bit more flexibility, have you run into any good alternative to pickle?

Yeah, I mean, I know there are tools that let you, for example, export scikit-learn models to ONNX, or from other frameworks. Now, I haven't used them first-hand, so I don't know how good they are. Definitely, just using ONNX is not always an option. If you use PyTorch, for example, and ONNX works for you, then definitely go for it, because in general it's going to be a way better solution than pickle, by far. But regarding the other ones, I think it's one of those open questions. I mean, scikit-learn has been going for a while, and you still see how it just promotes, OK, just use joblib. They even acknowledge in their docs that it's not a great solution, but there's no other option. So yeah, I would say ONNX or formats like that would be the way to go. They still expose vulnerabilities, but they are definitely a better option than pickle. And I totally agree with not liking pickle; I think if anyone actually likes pickle, they are a bit of a psychopath, to be honest.

Thank you. Thanks.

I didn't really get, in your Sigstore example: there was a centralized key server where these keys were uploaded, right? What exactly is the model information that gets uploaded? Because not all development is open source; we can't always upload private stuff.

No, I totally agree. In our example, for simplicity, we just used the hosted versions that they offer, because Sigstore has many moving pieces: it has Fulcio, it has Rekor, and it needs to integrate with an OIDC gateway. But in a production setting you would generally deploy those internally and then use them, yeah.

Clear, thank you.

Hi, thank you for your talk. I was wondering, since you referenced a blog post of some sort from Hugging Face: what is Hugging Face doing on this topic, or planning to do?

Oh yeah, so I know that the Hugging Face Hub signs the artifacts. I don't know what they use under the hood for signing; I don't know if we have someone from Hugging Face here who might be able to answer that. No? I don't know what they use under the hood, but I know that they sign the artifacts. However, that still leaves the door open to things like, if we scroll back, oh, went too far, yeah, it still leaves the door open to things like this. In this case the signature is valid, because it's just a file that was uploaded by who we think it was; but then it's also a matter of you, on the other end, checking who trained it, taking that signature, and seeing whether you can actually trust that person or that organization. But yeah, I know that they do that. They were also sponsoring the skops project at first; I don't know if they still are. Because, yeah, everyone in the space kind of acknowledges that right now we don't have great solutions for this, and it's something pretty basic, serializing models.

Oh yeah, thank you.

No other questions? We still have time if anyone else wants to ask. OK, if not, you can find Adrian at the open space or in Discord.
Thank you again, Adrian. Thank you. Thanks.