Hi, everyone. Thank you so much for tuning in. I'm David Aronchick, co-founder of Kubeflow and a program manager in the CTO organization inside Azure. Together with Yannis Zarkadas, I'll be talking about getting pwned by statistics: how attackers can subvert machine learning models, and how you can use Kubeflow to defend. One thing I want to say is how much I love KubeCon every year, and it's such an honor to be able to present even though it's virtual. I miss you all terribly.

With that: one thing we talk a lot about in machine learning is how powerful everything is and how many advancements have occurred in the recent past. You can see some of those examples here, where machine learning models from Microsoft are setting the bar with new and interesting advancements and handing them back to the community. Machine learning is all about open source, and it's not just that we built on these open source projects; we gave back after we were done.

Now, you may wonder how this affects Microsoft, and the reality is it affects us in just about every way possible. It touches our consumer business, our enterprise business, our game business, our search business. They simply could not work without the power machine learning gives us to analyze enormous datasets. And when I say enormous datasets, I mean things like you see here, whether it's people using Office, people asking questions of Cortana, or the 6.5 trillion security signals we evaluate on a daily basis. We could not handle this without machine learning.

With that said, in the machine learning space we often tell you these great stories and then leave the implementation as an exercise to the reader. The reality is that machine learning is hard, and that's something we definitely don't pay enough attention to. One of the reasons it is so hard is what you see here: most of the time, people focus on just building a model. But building a model is only one step of a very complicated workflow, and thinking about the whole pipeline is what will actually help people adopt these systems.

Now, let's say you're a data scientist, and you say, "Well, I don't really care about that, I'm good." The answer is: yes, you do, and the reason is what you see here. People build these fantastic models and then take forever to get them out to production. One way through that is MLOps. MLOps, machine learning operations, is derived from DevOps and GitOps, and it brings data scientists and machine learning engineers into the development process. They can execute this inner loop, like you see here on the left, iterating very quickly. Then, when the time comes, they expand into the normal application development cycle, where they integrate their sophisticated models into the standard development and rollout procedures.

You may be asking yourself: wasn't this supposed to be a talk about security? The answer is yes, because MLOps is the baseline for security. I'll show you how MLOps provides security by focusing on three different types of attacks you may see. The first one we'll dive into is where your attacker gets your ML model to lie.

We're going to start with a really basic example of a model. In this case, we'll look at what I'm calling a circle detector. In a circle detector, you're going to pull a lot of data in at ingestion.
You're going to split it into production and training data. Then you're going to train on that data, which may be quite large, and then you'll stand up an inference endpoint. In this case, we start by collecting a whole bunch of examples of circles and feeding those in as training data; out the other side, when I present a sample object, it comes back and says that it is in fact a circle. So far so good, looking great.

Now, if I present a square to it, it still says it's a circle, and I have to wonder why. What's going on here? If I look at these two things, one is green, one is blue, one is round, one is square; how could these two be confused? The reality is that I didn't give my model enough examples. I gave it something which shares many pixels with a circle, but I didn't tell it what to look for, so the model says, "Well, it has the same pixels as a circle, so it's a circle." Obviously that is incorrect.

Surely advanced models are better, and we don't see this in the real world? The reality is, that's wrong too. Here's an example of a husky-versus-wolf detector, and it looks great: only one mistake out of the entire group. But when you turn it over and look at the explainer, you see that under the hood, in the husky case it's picking up pixels of the animal, but in the wolf case it's just picking up stuff around the edges. What's going on here? Diving a little deeper, you see that much of the wolf wasn't considered at all, and what we really trained was a snow detector.

Now you might say this only happens in papers, but no, you see it over and over again: in one case a ship detector that is actually looking for water; in another, trains that are actually looking for train tracks; and in a third, one of my favorites, a horse detector that is really looking for copyright notices. Apparently a lot of horse images carry copyright notices.

And that's just the accidental stuff. Now let's talk about when people are actually trying to attack you. In this case, on the left-hand side, stickers were placed over a stop sign and the model decided it was a speed limit sign; on the right, researchers 3D printed a turtle and made a model think it was a rifle. It gets even worse when you start thinking about military implications. Here we have a model looking for airplanes on a runway, which it easily detects, until you put a sticker on the back of each airplane and camouflage them as though they didn't exist. And finally, by putting a layer of glasses over these individuals, you're able to convince a face recognition model that they are in fact completely different people.

You might think these are just toy examples, until you realize that Amazon's face recognition system matched 28 members of Congress to mug shots, indicating they might be criminals. And it's not just that; it's also extremely hard to opt out. You can't get out of these systems even if you want to. So, are you terrified yet?
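Before we get to defenses, it's worth seeing how simple these perturbation attacks are to express. Here is a minimal sketch of the fast gradient sign method, one of the basic techniques in this family; `model`, `image`, and `label` are assumptions standing in for any trained PyTorch classifier and its input, not artifacts from the talk.

```python
# Minimal FGSM sketch (Goodfellow et al.): nudge every pixel slightly in
# the direction that most increases the classifier's loss.
# `model`, `image`, and `label` are assumed inputs, not from the talk.
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # The perturbation is imperceptible for small epsilon, yet can flip
    # the prediction, e.g. from "stop sign" to "speed limit".
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```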
The answer to all of this is to use MLOps and an MLOps pipeline. You start with ingestion, where you add more edge cases and detect bad data, and you use better evaluation metrics and different models. You also make sure you have a red team in place to attack your own models first, along with rich alerting and monitoring. But most importantly, you feed the information coming back from serving into your model, so that you quickly become smarter over time and continually iterate and improve. And that is really only possible with a rich MLOps pipeline.

So, a pipeline, you say; tell me more. When we talk about a pipeline, we're talking about the ability to break a complex system down into a series of microservices and wire them together in a way that makes sense. In this case, that means using CI/CD tooling like GitHub Actions or Jenkins, loosely coupled modular components that you can run in any cloud or on-prem, and making sure you're constantly measuring and updating. The only way to do that is with something declarative that you can run over and over again without human intervention.

To drill in a little deeper: let's say you start with Jupyter, in the upper left-hand corner there, where you're rapidly iterating. Once you're done, you check your work into GitHub, and at that point you connect it to your pipeline. In this case, we're going to use Kubeflow Pipelines as our pipeline of choice. When GitHub receives that commit, the process begins. First, it uses Kale, an open source tool that takes a Jupyter notebook and wires it up into a pipeline. Then you might kick off your feature engineering step, which may call out to an external system such as Spark to run your large-scale data processing; once that's done, it hands back to Kubeflow Pipelines, which continues with the next step. Next, it kicks off your training step: Kubeflow Pipelines calls out to that training step, which executes and hands back when it's done. It then moves to the hyperparameter sweep, where again you reach out from the Kubeflow pipeline, this time executing many hundreds of pipeline runs simultaneously to search across the entire grid of hyperparameters and find the configuration best suited to your model. Once that's done, it hands back too; we package the model and hand it off to serving, and from serving it's out there for inference, ready to be used by your application. All of this sits on top of metadata storage and shared infrastructure.
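To make that wiring concrete, here is a minimal sketch of a few of those stages expressed with the Kubeflow Pipelines SDK. The container images, argument names, and output paths are invented for illustration; a real pipeline would use your own components.

```python
# A sketch of ingest -> feature engineering -> train as a Kubeflow
# Pipeline. Images and paths below are hypothetical placeholders.
import kfp
from kfp import dsl

@dsl.pipeline(name="circle-detector", description="ingest, featurize, train")
def training_pipeline(data_path: str = "gs://example-bucket/raw"):
    ingest = dsl.ContainerOp(
        name="ingest",
        image="example.io/ingest:latest",             # hypothetical image
        arguments=["--input", data_path],
        file_outputs={"dataset": "/out/dataset.txt"},
    )
    features = dsl.ContainerOp(
        name="feature-engineering",                   # could shell out to Spark
        image="example.io/featurize:latest",
        arguments=["--dataset", ingest.outputs["dataset"]],
        file_outputs={"features": "/out/features.txt"},
    )
    dsl.ContainerOp(
        name="train",
        image="example.io/train:latest",
        arguments=["--features", features.outputs["features"]],
    )

if __name__ == "__main__":
    # Compile to an archive you can upload, or submit from CI on each commit.
    kfp.compiler.Compiler().compile(training_pipeline, "pipeline.tar.gz")
```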
And rather than continuing on about this, let me hand it off and let Yannis show a quick demo.

Thanks, David. So, David explained the dangers your ML models face when they're exposed publicly, and stressed the importance of having a solid MLOps strategy and a composable pipeline on which you can iterate quickly. Now let's make this concrete and show what an MLOps workflow actually looks like from the perspective of a data scientist. In this demo, we'll show the steps a data scientist takes, from experimenting locally in a Jupyter notebook, to running a single pipeline generated from that notebook, to running hundreds of pipelines for hyperparameter tuning, and then choosing the best model to serve. Throughout this process, artifacts are stored in ML Metadata, giving you full reproducibility and lineage tracking. So let's dive right in.

Let's pretend that we are a data scientist and we want to start working on an exciting new problem. We open the Kubeflow dashboard, go to the notebooks page, start a new notebook server, connect to it, and get right to work. The problem we're working on today is the OpenVaccine Kaggle challenge: we're trying to locate the weak spots of a messenger RNA structure to help create a stable vaccine. I don't know too much about biology either, so it's okay. We've been working for a while, and in this notebook we've prepared all the necessary steps for fetching the data, preprocessing it, and training a model.

Now we want to run this whole process as a reproducible MLOps pipeline. Normally, we would have to rewrite our code in a specific pipeline DSL. However, Kale makes the transition from notebook to pipeline very simple; let me show you how. At the side of the notebook, I open the Kale panel and enable Kale. As you can see, colored outlines have appeared around the cells. Cells with the same color are part of the same pipeline step. This information is stored in the notebook's metadata, which is pre-filled in this case; otherwise we would just annotate our notebook cells using Kale's intuitive UI. For example, we can declare what type of step a cell belongs to, what it's called, and what other cells it depends on. After that, we click on Compile and Run, and Kale parses the notebook, creates steps out of cells, detects data dependencies, takes a snapshot with Rok, and finally generates a pipeline, which starts from the notebook's snapshotted state and is submitted to Kubeflow Pipelines. As you can see, the notebook is transparently converted into a reproducible MLOps pipeline, and the data scientist didn't have to learn anything new about pipelines.

After Kale submits the pipeline, it provides us with a link to the Kubeflow dashboard, which we can follow to see the progress of the pipeline run. Every step of this pipeline is snapshotted by Rok, and each step's input and output artifacts are tracked by ML Metadata. So for every step you can see logs, and also the ML Metadata records of its input and output artifacts; in this case, one of them is a Rok snapshot, which we can follow to view in the Rok UI.

Now we have a reproducible MLOps pipeline generated automatically from our notebook. However, this pipeline only trains our model for one specific set of parameters, like learning rate or batch size. Tweaking those parameters can result in a vastly improved model, so we want to explore more configurations. This procedure is called hyperparameter tuning, and Kale provides an intuitive UI for enabling it. To perform hyperparameter tuning, we need three things. First, we need to define what the hyperparameters are; in this case, our hyperparameters are epochs, batch size, et cetera, and Kale provides a special "pipeline parameters" cell to declare which parameters are hyperparameters for tuning. Second, we need a metric to guide our search; the metric shows whether our model actually performs better or worse with each new parameter configuration. In this case, we use the validation loss as the metric guiding the hyperparameter tuning; to declare it, we simply print it and mark the cell as a "pipeline metrics" cell in Kale. And finally, we enable HP tuning in Kale and specify our search algorithm and hyperparameter settings; for example, the batch size can be between 32 and 256 in increments of 32.
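For readers following along in the transcript, here is a rough sketch of how those annotations line up with ordinary notebook code. The tag names follow Kale's conventions as described above, but the step names and helper functions are invented for illustration.

```python
# Sketch of Kale cell annotations; the tags live in the notebook metadata
# and are set through Kale's UI, shown here as comments. Helper functions
# like fetch_openvaccine_data() and train_model() are hypothetical.

# --- cell tagged "pipeline-parameters": exposed as pipeline/tuning knobs ---
epochs = 10
batch_size = 64          # swept from 32 to 256 in steps of 32 during tuning
learning_rate = 1e-3

# --- cell tagged "block:load_data": becomes its own pipeline step ---
train_df = fetch_openvaccine_data()

# --- cell tagged "block:train", depending on load_data ---
model, history = train_model(train_df, epochs, batch_size, learning_rate)

# --- cell tagged "pipeline-metrics": the printed value guides the search ---
print(history["val_loss"][-1])
```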
In this case, the settings are prefilled, so we can start the tuning right away. As you can see, Kale provides a pretty intuitive form to do this configuration in a UI-driven way. After defining the parameters, the metric, and the search algorithm, we are ready to start our hyperparameter tuning by pressing the button in the Kale panel. To perform the hyperparameter tuning, Kale uses Katib, a Kubeflow component. Kale starts a Katib experiment, which creates a trial for each different set of hyperparameter values; we have implemented a shim so that each Katib trial results in a KFP pipeline run. Once we click the Kale button, Kale uses Katib to spin up many instances of the pipeline we created in the first demo, but with different parameters. And as you can see, Kale also presents us with links to follow the progress of the hyperparameter tuning.

Now, because the hyperparameter tuning will take a while, we have prepared the results beforehand. We can see the Katib experiment, with all the trials created so far, in a graph showing every hyperparameter configuration and its objective metric in an intuitive way. We can also find the best trial so far, as the UI highlights it for us. And from a Katib trial, we can navigate to the KFP pipeline run it corresponds to by clicking the small pipeline button next to the trial. Once we click that button and go to the pipeline run, notice how many steps have the recycle icon on them. This icon means those steps were cached, to massively speed up the execution of the pipeline run. Caching is powered by Arrikto's Rok, which takes snapshots at the start and end of each step. We can also navigate from a pipeline run back to the Katib experiment it belongs to.

So: we started from a notebook and created a pipeline; then we ran many pipelines in order to perform hyperparameter tuning and get the best model. After running the hyperparameter tuning, we've found the best combination of hyperparameters for our model, and now we need to restore and serve that model. We can find the best combination in the Katib UI, where it's highlighted for us. From the trial, we can easily navigate to the corresponding pipeline run, as we saw earlier, by clicking the little pipeline button. Now we're going to use the snapshotting power of Rok to restore the best version of our model in a notebook. To do that, we go to the last step of our best pipeline run and locate the Rok snapshot URL, found under the Visualizations tab. We copy the Rok URL and use it to restore a new notebook server to the state of the pipeline run at that specific step: we copy the URL, go to the notebook UI, paste it, and restore. Once the notebook is ready, we connect to it, and as we'll see, Kale recognizes that it was restored from a snapshot and shows a pop-up pointing us to the last step that ran. Our best model is sitting in the "model" variable, which we print to show that it's in memory.

We want to take this model and serve it. To do that, we use the Kale SDK, which is powered by KFServing underneath. The function for serving is called "serve", and it takes the model as input; in this case, we also pass a function to preprocess the raw data before handing it to the predictor. Once we run serve, Kale snapshots the notebook and uses that snapshot to serve the model in the same, immutable environment. Once this process completes, let's make some predictions.
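For reference, the prediction call you're about to see boils down to a single HTTP request against the KFServing v1 data plane. A minimal sketch, in which the host, model name, and payload are all assumptions:

```python
# What a predict call boils down to: a POST to the KFServing v1 endpoint.
# Host, model name, and payload here are illustrative only.
import requests

data = {"instances": [{"sequence": "GGACAUUUG..."}]}   # made-up mRNA input

resp = requests.post(
    "http://openvaccine.kubeflow.example.com/v1/models/openvaccine:predict",
    json=data,
)
print(resp.json())   # {"predictions": [...]}
```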
We first define some JSON data as input and then send it to the server with kfserver.predict. The KFServer object wraps the whole procedure of calling the model server over HTTP or gRPC, and we immediately get a result back. We can also print the KFServer variable to get information about the model server and follow a link to the model UI. The model UI is super useful for listing, monitoring, and debugging your model servers: you can see the state of the server, metrics from Grafana, logs from the model server's pods, like the predictor or the transformer which performs the preprocessing, and the configuration of the model server itself.

So, all in all, we saw an end-to-end MLOps workflow: we went from experimenting inside a notebook, to generating a reproducible pipeline, to performing hyperparameter tuning by running that generated pipeline many times with different parameters. And finally, we used Rok's snapshotting power to restore the best model and easily serve it using Kale and KFServing.

Thank you so much, Yannis. I think you can see how powerful an MLOps pipeline is and how much flexibility it gives you: not just moving from a notebook, which is the lingua franca of where people do their model coding today, but bringing that into a very rich, very sophisticated pipeline that includes things like hyperparameter sweeps and automatic rollout to production.

We touched on the first attack that MLOps helps you defend against. Let's get into the second, where your attacker tries to take your model. This is a situation in which a malicious attacker probes your model many, many times to build an approximation of what that model is actually doing. Their goal may not be complete accuracy, but as long as they can get close, they can start to use your model in really malicious ways. These attacks are really hard to defend against, so having a great MLOps pipeline is critical. The two types of attack I'm going to walk through are a distillation attack, which targets things like object detection, and model extraction, which is more focused on transformers and language.

So, a distillation attack. You start with a black-box model, like the one here on the left, and you begin to probe it using sample examples. I don't know what's inside that model, but I do have access to the API. So I start with a heart, and that fails. Then I present a pentagon, and it passes. Then I pick a whole bunch of widely varied examples to probe with, and as I run through them, I begin to get a better picture. Then I present a lot more samples. Anyone want to guess what kind of detector this is? It's of course a Nina Simone detector. No, of course not; it is in fact a triangle detector. From all these examples, I get a really good understanding of what passes and what doesn't for this particular black-box model. With that, I can recreate the underlying model using just those examples, present that model to my audience, and they never have to go back to the original model's author, which is obviously a real problem for that author. Now, the question is how accurate this can get. What does it take to reach 99% accuracy? Thanks to a number of researchers, we know that the number of queries required is actually very, very small.
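In code, the probing loop is nothing exotic. Here is a sketch under stated assumptions: the victim endpoint, its response format, and the make_random_shape() generator are all invented for illustration.

```python
# Sketch of the probing loop just described: label a pile of generated
# shapes through the victim's public API, then fit a cheap surrogate.
# The endpoint URL and make_random_shape() helper are hypothetical.
import requests

def query_victim(image_bytes):
    # The attacker only needs API access, not the model internals.
    resp = requests.post("https://victim.example.com/detect",
                         files={"image": image_bytes})
    return resp.json()["is_triangle"]          # assumed response shape

dataset = []
for _ in range(5000):                          # per the talk, ~5k queries suffice
    img = make_random_shape()                  # hypothetical generator
    dataset.append((img, query_victim(img)))

# With (input, label) pairs in hand, any off-the-shelf classifier can be
# trained as the stolen stand-in, and the victim's API is never needed again.
```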
In the published results, both models were reproduced in under 5,000 total queries, which is about two queries a minute over two days. It's really not a lot.

Now let's get into a model extraction attack, where we take on an NLP model based on transformers. BERT came out of Google somewhat recently, and it really transformed the way people build language models, introducing the, at the time, brand new transformer architecture. Most models you'll see today, including ones on Azure, are based on some derivation of this original transformer architecture, so there's a lot of surface area for attack here. The way these work is that you present a large corpus of data for the model to train on; then you ask a question about that underlying data, and the model does its best to give you a response. In this case: how many instruments did Prince play? That comes from the corpus, and the answer is 27. You can see how sophisticated this is: there are a number of different places in the text that describe instruments, the actual sentence containing "27 instruments" is split by a participle, and further down there are different types and classes of instruments. These models are quite sophisticated and very impressive.

To attack these models, researchers from Google have done some really interesting work. They found that simply by presenting either random words or words from the corpus, you can get a really accurate representation of what the model is looking for. Here's what the random version looks like: literally random words from any dictionary, or from the corpus itself. Or, if you know what the underlying corpus is, which in this case is Wikipedia, you can present more structured questions about it, using words from the actual blocks of text it was trained on. In each case, the model does its best to give you a response; it doesn't really know, but it will pull things out of that corpus and present them as a potential answer. Once you've done that, you can probe the model many, many times. With just one tenth the total number of queries, you can reach 72% agreement with the original model, and with the same number of queries, 86%, which is really good and certainly a basis for doing a lot of work. And it costs much, much less than the millions of dollars often required to train the model in the first place.

So, to use MLOps to defend: you can do your best to focus on endpoints, securing the API, watermarking. But the real value here is going to be in the pipeline: your ability to retrain, to train for specific domains, to fix things over time; that's where the value is. You should treat your model's security like anything else: if you're exposing it to the world, it will be attacked, and what matters most is how quickly you can detect those attacks, update, and iterate. In shorthand, I would spend the majority of your engineering time on the left-hand side and much less on the right-hand side, because the left-hand side is where you can actually make changes, while on the right-hand side, at best you can stop some people, and you're not really adding a lot of user value. So that's the second attack, where your attacker takes your model; those were two examples of it.
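Before we move to the third attack, here is roughly what that random-word probing looks like against a question-answering model. Everything here is an assumption for illustration: the victim endpoint, its JSON format, and the vocabulary file.

```python
# Sketch of the random-word extraction probe for a QA model: nonsense
# contexts and questions still elicit answers, and the resulting
# (context, question, answer) triples become the surrogate's training set.
import random
import requests

vocab = open("wikipedia_vocab.txt").read().split()   # hypothetical word list

def random_text(n):
    return " ".join(random.choices(vocab, k=n))

stolen = []
for _ in range(10000):
    context = random_text(120)
    question = random_text(8) + "?"
    answer = requests.post("https://victim.example.com/qa",   # assumed API
                           json={"context": context,
                                 "question": question}).json()["answer"]
    stolen.append((context, question, answer))

# Fine-tuning a public BERT-style checkpoint on `stolen` is what yields the
# 72-86% agreement the talk cites, at a fraction of the original cost.
```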
Now let's talk about a third attack, around data leakage, where the attacker finds out about hidden data. In this case, a malicious user looks for ways to probe your model to understand what it was trained on. You probably already have problems with data leakage; with ML, the leakage just becomes more obfuscated.

To show some examples of data leakage that didn't require ML at all: on the left-hand side, you see a maps application recommending a potentially very secret meeting, drawn from my location history; if that weren't protected, it would reveal the meeting to an attacker. Next, consider the social graph: maybe I'm a union organizer, and an attacker looks not at me but at who all my friends are following, and from that can understand what their graphs look like and whom they may be following. Or third, look at aggregated jogging and running-route recommendations: they reveal the actual layout of potentially confidential buildings and installations. And that wasn't even me or my friends; that came from community data used to build and roll out the feature.

Now, there's nothing so bad that it can't be made worse, especially with machine learning. In this case, what you see is a model predicting what I should type next. If I type the beginning of a sentence, it may carry on and reveal other things that come from my corpus of information; it's revealing, essentially, my private emails as suggestions for what I should write next. The problem is that if it's doing its job right, it sounds great, it sounds exactly like me; and that means it reveals a lot about what I write in other emails. And this gets quite bad. Here you have a number of examples: it reveals the rest of my address or my phone number; maybe it reveals my relationship information; and where it gets really bad, it takes something like a Visa card number, which has a known prefix, and suggests the rest, or it reveals a social security number, again starting from a known prefix and completing the remainder. So the model, which is totally opaque, reveals all of this without ever intending to, simply because it wants to sound like me.

There are some very cool ways to try to detect these things. In this case, there's a technique using a canary: you inject the canary early on into your training data, and later test whether the canary leaks through, revealing information about that private corpus. The problem is that even here, all you're really doing is detecting leakage that is already occurring, not preventing it in the first place; it's up to you to go back, rerun your data, and re-anonymize it so that data isn't leaked inappropriately. There are things like differential privacy that may help over time, but I can't stress this enough: at the end of the day, you're going to reveal things that are private, because that's the intent. For this feature to work properly, it should sound like the original user; it should be accurate. The problem, of course, is that you have to lock it down and be able to detect leaks very, very quickly.
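For the curious, the canary test described above, in the spirit of Carlini et al.'s "The Secret Sharer", can be sketched as follows. The log_prob helper, the fake SSN, and the candidate format are all assumptions; a real implementation would score sequences under your actual language model.

```python
# Sketch of the canary technique: plant a fake secret in the training data,
# then check whether the trained model ranks it suspiciously high among
# look-alikes. `log_prob(model, s)` is an assumed helper that scores a
# string under your model; the SSN below is fabricated.
import math
import random

CANARY = "my ssn is 281-09-4617"               # inserted before training

def candidates(n=2 ** 16):
    # Random strings in the same format as the canary: the reference pool.
    return [f"my ssn is {random.randint(0, 999):03d}-"
            f"{random.randint(0, 99):02d}-{random.randint(0, 9999):04d}"
            for _ in range(n)]

def exposure(model, canary):
    pool = candidates() + [canary]
    ranked = sorted(pool, key=lambda s: -log_prob(model, s))
    rank = ranked.index(canary) + 1
    # High exposure (rank near 1) means the model memorized the canary
    # and may autocomplete real secrets the same way.
    return math.log2(len(pool)) - math.log2(rank)
```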
The key here is building a pipeline, understanding your exposure quickly, and mitigating quickly.

So, in summary: MLOps gives you a lot of goodness, best practices, repeatable workflows, an immutable record of what happened, and real acceleration in getting to user benefit. Unfortunately, it doesn't give you this for free; there is some work involved. But the reality is, this is the best we have for making sure your systems keep working well. And it's a whole new world: data science will touch every individual, and it's up to us, the people watching this, the data scientists, and the ML engineers, to put these tools in the hands of people who can use them to make the world a better place. The truth is, you can't avoid it: your models will be attacked, your pipelines will have issues, and the game is all about mitigating harm and recovering quickly. You can do that with an MLOps pipeline. And with that, here's the final slide, with all the papers, in case you have any questions. Thank you so much.