Hi, this is Birhanu Eshete. I'm an assistant professor at the University of Michigan-Dearborn, and I'm going to talk about machine learning trustworthiness. The title of my talk is State of the Model: Promising Steps and Remaining Challenges Towards Trustworthy Machine Learning.

To set the scene, let's consider a supervised learning task where the goal is to train a model or classifier, such as a spam filter, a malware detector, or an image classification model. In the typical machine learning pipeline, we have labeled training data, we run it through a training pipeline, and we produce a model F, which these days is typically a deep neural network. We then deploy this model behind some sort of prediction API, and users send inputs to the API and get back an output, a label for a specific input. That's the setup we will consider.

When we talk about the progress in machine learning today, there are two sides to the story. There is the exciting side, where machine learning does a lot of amazing things, and there is the opposite, somewhat darker side, where machine learning models can do a lot of crazy things. Starting with the bright side, we have seen models that are good at image classification tasks in computer vision; models deployed in safety-critical domains like autonomous vehicles; models that translate between languages; models in medical settings that come close to human accuracy in detecting cancer; models that beat professional gamers; and voice-based assistants used for a number of predictive tasks in our homes. That's the great side of the progress we see in machine learning.

The flip side, what I call the worrisome side of machine learning, can be summarized with one question: in the pipeline I showed you earlier, what could possibly go wrong if we replace the user with an adversary who has malicious intent? The adversary could manipulate the input a great deal, or try to manipulate the training data, to fulfill the adversary's goals. In this adversarial setting, there are several scenarios we have now witnessed in which machine learning is not secure or safe.

The first is what we call data poisoning attacks. Here the goal of the adversary is to inject training samples that are carefully crafted so that the model's decisions are skewed or influenced towards outputs the adversary wants. This happens when machine learning systems such as spam filters, recommendation systems, or malware detectors collect data from untrusted sources and merge the collected data into their original training set, in the hope of improving the model's accuracy. What gets merged in can be poisoned training data, intentionally devised by adversaries to reduce the model's accuracy or to bias the model towards specific labels.
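To make the poisoning mechanics concrete, here is a minimal sketch of a label-flipping poisoning attack on a toy binary classifier. The dataset, the logistic-regression model, and the 30% poisoning rate are illustrative assumptions, not from the talk:

```python
# A minimal sketch of label-flipping data poisoning on a toy "spam filter".
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

clean = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# The "untrusted source": flip the labels of 30% of the training points,
# as an adversary contributing poisoned samples might.
rng = np.random.default_rng(0)
idx = rng.choice(len(y_tr), size=int(0.3 * len(y_tr)), replace=False)
y_poison = y_tr.copy()
y_poison[idx] = 1 - y_poison[idx]

poisoned = LogisticRegression(max_iter=1000).fit(X_tr, y_poison)

print("clean model accuracy:   ", clean.score(X_te, y_te))
print("poisoned model accuracy:", poisoned.score(X_te, y_te))
```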
In real life, we have seen examples where training data has been poisoned. Back in 2016, there was the infamous chatbot from Microsoft that was retraining itself on data it received from people online. Within a span of hours, it turned into a very racist chatbot, because people were feeding it crazy content motivated by race, politics, and the like. So that's poisoning.

The other common threat in the machine learning setup is what we call adversarial examples. This is a very well-known attack vector where, given an input X, the adversary perturbs or modifies the input in a non-random, very calculated way to produce what we call an adversarial example, and that adversarial example will be misclassified by the model. The goal, essentially, is that by making minimal perturbations to a given input, say an image, without changing its visual appearance, the adversary fools the model. We have seen more than enough examples of this. In image classification, people have demonstrated it is possible. Even in safety-sensitive domains like healthcare, we have seen models misclassify a tumor into the wrong class, which means misdiagnosing a patient. We have seen a stop sign misclassified as something like a yield sign, which obviously has safety implications for autonomous vehicles. We have seen how to fool voice commands in voice assistants. And beyond computer vision and audio, we have seen malware detectors fooled by minimal changes to, say, Android APKs or Windows executables, without changing the malicious behavior of the samples.

The third kind of attack is what we call a model extraction attack. Model extraction is motivated by an adversary who wants to steal a model, for reasons like intellectual property or national security secrets. The idea is that, by interacting with the model, the adversary uses it as an oracle to label a bunch of data items, and then, using this labeled dataset, trains what we call a substitute model that is functionally equivalent to the original. In doing this, the adversary gets an almost exact copy of the original model. If the original model was trained on a huge amount of data collected over a number of years and happens to be intellectual property, or, in the worst case, is trained on a nation's security secrets, then the adversary effectively steals that knowledge along with the model.

In addition to the three attacks I described, machine learning models can also be vulnerable to what we call membership inference attacks. These are privacy-motivated attacks where the adversary's goal, by simply inspecting the model's predictions, is to probabilistically determine whether a sample was used to train the model or not. The intuition behind the attack lies in how machine learning models generalize when they are trained.
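Before unpacking that intuition, here is what the model extraction loop described a moment ago might look like in code, a minimal sketch assuming black-box access to the victim via a hypothetical `victim_label(x)` prediction API; the substitute architecture and query set are illustrative choices:

```python
# A minimal sketch of model extraction: use the victim as a labeling oracle,
# then train a substitute on the oracle-labeled data.
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract(victim_label, query_inputs):
    # Query the victim's prediction API on data the adversary controls...
    stolen_labels = np.array([victim_label(x) for x in query_inputs])
    # ...then fit a substitute model on the oracle-labeled dataset.
    substitute = LogisticRegression(max_iter=1000)
    substitute.fit(query_inputs, stolen_labels)
    return substitute  # functionally close to the victim on similar inputs
```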
The common observation about machine learning models these days is that they tend to overfit their training data, which means a model is more confident in its predictions when it sees members of its training set. By exploiting this behavior, the statistical distinguishability between how the model behaves on members versus non-members, an adversary can build a probabilistic or threshold-based inference model that says: this sample looks like a member of the training set, while this other sample doesn't have a good score for being a member. Since this is a privacy-motivated attack, the implication is that if a model is trained on, say, a hospital dataset of medical records, and the adversary is able to identify a person in it, that is obviously a privacy breach, and the consequences are bad.

So I've given you an overview of the four attack vectors that, at this point, we fairly well understand to be important in the machine learning pipeline. In the rest of the talk, I'll take two of these four: adversarial examples, for fooling the model, and membership inference, for inferring whether a record was used to train a model. In the first part, I will focus on a moving target defense we recently developed to improve upon existing defenses against adversarial examples. In the second part, I will move to another defense, against membership inference attacks, based on what we call preemptive exclusion of member data points. And in the third and final part, I'm going to come back and tie everything to the title of my talk, the state of the model: where do we stand on the robustness and trustworthiness of machine learning models? There I'll take a broader perspective, beyond these two kinds of attacks and the progress we have made in defending against adversarial examples and membership inference.

All right, so let's jump into the first part. Here I'm going to talk about recent work of ours called Morphence. The idea of Morphence is to make the machine learning model a moving target in the eyes of the adversary. Before I get to Morphence, it's fair to assess where we stand in the adversarial examples attack-defense arms race. Adversarial examples date back to the early 2000s, and the landmark paper that introduced adversarial examples in the deep learning sense is the 2013 paper from Google on the intriguing properties of neural networks, which showed that they can be fooled with small perturbations. After that, defenses emerged in many directions. The early defenses targeted gradient-based attacks; because the attacks were exploiting gradient information, the defenses were based on things like gradient masking, refining or pruning the model, or performing transformations on the input data.
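As a reference point for what "exploiting the gradient information" means, here is a minimal sketch of the fast gradient sign method (FGSM), which reappears in the evaluation later. It assumes a PyTorch model that returns logits and inputs scaled to [0, 1]; the epsilon budget is an illustrative choice:

```python
# A minimal FGSM sketch: one signed-gradient step in the direction that
# increases the loss, keeping the perturbation within an epsilon budget.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.1):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # assumes model(x) returns logits
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()   # move each pixel by +/- epsilon
    return x_adv.clamp(0, 1).detach()     # keep pixels in the valid range
```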
Then around 2017, there was the seminal paper by Carlini and Wagner, which broke the state-of-the-art defense at the time, defensive distillation, and a later series of attacks by Carlini and others broke most of the defenses that had been proposed against adversarial examples. Also around 2017, a new defense called adversarial training emerged: basically, exposing the model to adversarial examples during training so that when it sees such examples in the future, it classifies them into the right class instead of the adversarial class the adversary wants. As we speak, adversarial training is fairly good, but it comes with its own cost: when you train a model adversarially, you feed the training pipeline adversarial examples in addition to the original training points, so there is a risk of penalizing the model's clean-input accuracy. Around 2019, and quite recently, we have also seen a surge of certified defenses, where the idea is to provide a minimum robustness guarantee under a specific setting, for example under an Lp-norm distance measure for images, so that the user of the model has a lower bound on the model's accuracy under attack. If I guarantee that your model won't drop below, say, 55% accuracy under a specific class of attacks, then when you deploy the model you already understand that it is certified up to a certain degree, and it's up to you to deploy it given that certification. It's basically giving the user a proof: here is the guarantee I can give you, but it's up to you to deploy. Certified defenses come with that kind of guarantee, but they also have their own limitations in terms of scalability and covering different classes of attacks.

Beyond what I said about these different classes of defenses, what is common to all of them, up to and including certified defenses, is that they all treat the model as a fixed target. If an adversary attacks your model once and the attack succeeds, the adversary can come back and attack it again, because the model doesn't move; its decision boundary stays the same. That's where we come in with a new defense technique called Morphence, which, as I said earlier, is based on the moving target idea. The idea is that we deploy a pool of models whose decision boundaries are slightly different but whose overall accuracy is comparable. When the adversary comes with an input X, we pick one of these models, and that model predicts the output for the input. When the adversary comes with another input, or even the same input again, we keep moving among these models, picking a different one that is as accurate as any other in the pool, so that an adversary trying to establish, say, the decision boundary or the model's sensitivity to different perturbations is discouraged, and eventually gives up on the attack it is planning to launch. That's the idea. To give you some context on how our approach works, here is a brief overview. Given a model trained on a specific training dataset, the first step is what we call seed pool generation.
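A minimal sketch of that step, which is elaborated next: forking the trained model by perturbing its weights within a small bound. Assuming the bounded perturbation is additive Gaussian noise (the noise scale and pool size here are illustrative, not the paper's exact scheme):

```python
# A minimal sketch of seed-pool generation by bounded weight perturbation.
import copy
import torch

def generate_seed_pool(base_model, pool_size=5, noise_scale=0.01):
    pool = []
    for _ in range(pool_size):
        m = copy.deepcopy(base_model)
        with torch.no_grad():
            for p in m.parameters():
                # Small additive noise: a slightly shifted decision boundary.
                p.add_(noise_scale * torch.randn_like(p))
        pool.append(m)
    return pool
```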
As I said, we have a pool of models that act as a moving target for the adversary. Given the weights of the trained model, we perturb them in a very bounded way and generate what we call the initial seed pool of models. Since we perturbed the weights, this initial seed pool is bound to be less accurate than the original model. Therefore we have a second step, seed pool retraining, which is aimed at two goals. The first, of course, is to gain accuracy back. The second is to create sufficient diversity among the individual models forked from the original, because in the adversarial example literature there is a phenomenon called transferability of attacks: if you attack one model with an adversarial example, chances are the attack will also work on another model, even if the models are architecturally different. To minimize that transferability, we also apply different transformations to the original dataset; T1, T2, up to Tn indicate the unique transformations aimed at getting diverse models. After that, we have a fairly diverse pool, and we have regained the accuracy of the individual models.

In the third stage, we do what we call selective adversarial training. As I told you earlier, adversarial training is the benchmark defense we compare against, because as far as we knew at the time of developing Morphence, it was the best defense. Here, instead of adversarially training all of the models, we pick a subset of them and apply adversarial training, that is, we expose those models to adversarial examples. The reason we are selective is that if we trained every model in the pool on adversarial examples, they would be great at catching adversarial examples but would pay a penalty in overall accuracy on clean examples. To balance that, we keep the remaining subset of models as is, to make sure we are not losing clean-example accuracy.

Finally, we deploy this pool of models, a mix of adversarially trained models and the retrained originals, together with what we call a scheduler, which accepts an input and performs the moving target aspect of the whole pipeline. When we do scheduling, we also have to make sure the model pool produced by these three stages is replenished, renewed, because if the pool remains static we are back to square one: the adversary would eventually discover that it has saturated the pool and that the pool never changes. To avoid that risk, after every maximum number of queries we rerun steps one, two, and three and produce another pool of models, so another batch of models gets deployed. That way the moving target aspect continues through pool updates. That is how our system Morphence operates to achieve the moving target defense. Now let me give you a highlight of how this defense performs and how it compares against the state of the art at the time, adversarial training.
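Before the numbers, here is a sketch of the scheduler logic just described. The per-query selection rule (a random pick) and the query budget Q_MAX are assumptions about details not spelled out in the talk:

```python
# A minimal sketch of the moving-target scheduler: serve each query from a
# model in the current pool, and renew the pool after a fixed query budget.
import random

class MovingTargetScheduler:
    def __init__(self, build_pool, q_max=1000):
        self.build_pool = build_pool          # runs steps 1-3, returns a fresh pool
        self.q_max = q_max
        self.pool, self.queries = build_pool(), 0

    def predict(self, x):
        if self.queries >= self.q_max:        # replenish so the pool never goes stale
            self.pool, self.queries = self.build_pool(), 0
        self.queries += 1
        model = random.choice(self.pool)      # a different model may answer each query
        return model(x)
```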
Here we have different kinds of attacks: the fast gradient sign method (FGSM), Carlini-Wagner, and SPSA. FGSM and Carlini-Wagner are white-box attacks, where the adversary knows the gradients and details of the model, while SPSA is an iterative black-box attack. First is the accuracy of the model when there is no attack, which is fairly good; this, by the way, is for the MNIST dataset, the benchmark toy dataset for image classification. When we apply FGSM, the accuracy drops from 99% to almost 10%. Carlini-Wagner basically paralyzes the whole model, down to zero; it's a very strong white-box attack. And SPSA drops the accuracy to roughly 30%. Looking at adversarial training, it obviously improves the accuracy under attack, for example up to 42% for FGSM, but it fails to recover from Carlini-Wagner and improves only somewhat against SPSA. When you look at the last column, which is our defense, you can see that Morphence doesn't really incur any cost on clean-data accuracy; it is almost the same as the undefended model's accuracy, so we're good on that. And when you compare it against adversarial training across all these attacks, Morphence outperforms adversarial training by a very large margin. We also tested Morphence on another benchmark dataset, slightly more complicated than MNIST, called CIFAR-10. The conclusion is that Morphence again outperforms adversarial training in all cases.

The takeaway from the Morphence evaluation is that the moving target aspect really works. It ensures the model is much more robust than a fixed-target model, and it prevents falling victim to the same attack multiple times; we have the detailed evaluation in the paper. And iterative query-based attacks like SPSA, which we evaluated, are very unlikely to succeed in the face of a moving target model. Those are the lessons we learned by applying the moving target strategy against adversarial examples.

The second part of the talk is about defending against membership inference. We have an upcoming paper called MIAShield: Defending Membership Inference Attacks Using Preemptive Exclusion of Member Data Points. As I did for adversarial examples, I want to highlight the context, because later, when I talk about our defense against membership inference attacks, you will see what we are improving on. Membership inference attacks in the context of machine learning were introduced back in 2017. After that, a range of defense techniques were proposed, spanning different strategies. There are regularization-based defenses; differential privacy is another; and there is masking the confidence of the model. Remember, when I introduced membership inference, I said the adversary exploits the model's confidence on members versus non-members to differentiate them. So if you do some sort of masking of the confidence, without of course affecting the accuracy or the predicted label of the model, you can gain back some robustness.
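A minimal sketch of both sides of that confidence-gap story: a threshold attack that flags unusually confident predictions as members, and confidence masking that coarsens the returned scores. The sklearn-style `predict_proba` interface, the 0.95 threshold, and the rounding granularity are illustrative assumptions:

```python
# Attack: guess "member" when the model's top softmax score is unusually high.
# Defense: round the returned scores so the fine-grained confidence gap
# between members and non-members is hidden from the API caller.
import numpy as np

def infer_membership(model, X, threshold=0.95):
    return model.predict_proba(X).max(axis=1) >= threshold

def mask_confidence(probs, decimals=1):
    masked = np.round(probs, decimals)
    # Renormalize so the masked vector still sums to one per sample.
    return masked / (masked.sum(axis=-1, keepdims=True) + 1e-12)
```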
There is also another class of defenses called ensemble methods, which basically stack machine learning models so as to prevent the adversary from establishing any intuition about whether a sample is a member or not. And lately we have what we call knowledge distillation, which is based on pruning the model so that it doesn't leak too much information about members.

Similar to what I said about adversarial example defenses, what is common to all of these defenses is that they are based on masking or concealing the presence of a member. In all techniques up to this point, the member data point is in the model. We want to protect, say, a patient's record that was used to train the model, so we do our best to keep the adversary from discovering that the data point is in the model. It is all based on masking or concealing the fact that the data point is in the training set.

The intuition behind our defense technique is this: it is well established that the presence of a data point offers a strong membership signal for inference. So the question we ask is: how about excluding the data point, without of course compromising the accuracy of the model, so that the signal the adversary gets is weak and, as a result, the attack fails? We are departing from what has been done in the literature in the sense that, instead of masking, we exclude the data point, so that any probabilistic inference is at best a false positive: if I remove the training point and still give you the same utility and accuracy, any conclusion you make as an adversary is a false conclusion, because the data point is not there. There is a theoretical flavor to that guarantee which we don't prove here, but we'll see how the approach turns out.

Here is a highlight of how MIAShield, our defense against membership inference, works. As I said, the idea is to preemptively exclude the member when we respond with predictions for an input. Given a sensitive dataset, say a patient record dataset D, we first split it into disjoint subsets D1 to Dn. We apply data augmentation to regain the accuracy that would be lost to this disjoint splitting of the original dataset, and then we train individual models on the augmented data, producing n models corresponding to the n disjoint subsets. Once we have these models, when we get an input from an adversary, we leverage what we call an exclusion oracle, which first decides whether the input X belongs to one of the training sets of the models F1 to Fn. If it does, we exclude that model from the ensemble prediction we compute: y equals the ensemble of F1 to Fn. So if model Fi was trained on the input X, we exclude Fi and perform the ensemble prediction, the aggregation, over the rest of the models.
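A minimal sketch of that prediction path, assuming a hypothetical `oracle(x)` that returns the index of the sub-model whose training subset contains x (or None when no match is found) and sklearn-style sub-models:

```python
# A minimal sketch of MIAShield-style prediction with preemptive exclusion:
# the sub-model trained on the queried point sits out of the ensemble vote.
import numpy as np

def miashield_predict(models, oracle, x):
    i = oracle(x)                                # which sub-model (if any) saw x?
    active = [m for j, m in enumerate(models) if j != i]
    probs = np.mean([m.predict_proba(x) for m in active], axis=0)
    return probs.argmax(axis=-1)                 # ensemble vote, member model excluded
```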
Basically, since the model that carries the signal, the one trained on the target data point X, is identified through the exclusion oracle, we exclude that model from participating in the prediction while still maintaining the accuracy of the ensemble. As you might imagine, the challenge is how to design this exclusion oracle, because it is the potential bottleneck: if the exclusion oracle gets things wrong, the whole pipeline, the whole defense strategy, fails. In the next slides I'll highlight the different strategies we explored for implementing this exclusion oracle. We looked at five techniques, starting with a naive baseline.

The first technique is what we call model confidence-based exclusion. To exclude one of the models from the ensemble prediction, we look at the confidence of all the models on a given input and exclude the most confident one. The intuition goes back to the root cause of membership inference attacks: models are more confident on their members, so by the same token, the most confident model in this scenario is likely the one that contains the input. That's the basis for this strategy. As it turns out, its limitation is that the most confident model is not necessarily the model trained on the target data point. We empirically validated that, although this is a good baseline and starting point for an exclusion oracle, it is not the best, so we had to explore other alternatives.

The next exclusion strategy we looked at is what we call exact matching-based exclusion. The idea is that, since we can compute cryptographic hash values of the individual data points in each subset, we compare the input's hash value to the hash values of the data points in each subset; whenever we find a match, we exclude the model trained on the sample that exactly matches the input. Since this is an exact matching technique, its obvious limitation is that a slight manipulation of the input, specifically an image, say a one-pixel change, will mislead the oracle. So it is not very effective on its own: it works when an exact match is found, but fails under slight manipulation.

Incrementally, we went to the third alternative: instead of exact matching, why not do approximate matching-based exclusion? The difference is that we exclude the model trained on a sample that approximately matches the input. One way to do this is still via hash value comparison and lookup, but using a perceptual hashing technique instead of a cryptographic one. While this improves on exact matching, because it catches samples within a specific distance threshold, its limitation is that data points falling outside the threshold are again missed by the oracle.
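Here is a minimal sketch of both signature-based oracles, with a tiny average-hash ("aHash") standing in for whatever perceptual hashing scheme the actual system uses; the 8x8 hash size and the Hamming-distance threshold are illustrative assumptions, and inputs are assumed to be 2D grayscale arrays:

```python
# Exact matching via a cryptographic hash, approximate matching via a
# small average-hash perceptual signature.
import hashlib
import numpy as np

def sha256_sig(img):
    return hashlib.sha256(img.tobytes()).hexdigest()

def ahash_sig(img, size=8):
    # Downscale by block-averaging, then threshold against the mean.
    h, w = img.shape[:2]
    small = img[:h - h % size, :w - w % size] \
        .reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    return (small > small.mean()).flatten()     # 64-bit perceptual signature

def exact_oracle(subset_sigs, img):
    # subset_sigs: one set of SHA-256 digests per sub-model's training subset.
    s = sha256_sig(img)
    return next((i for i, sigs in enumerate(subset_sigs) if s in sigs), None)

def approx_oracle(subset_hashes, img, max_dist=5):
    # subset_hashes: one list of aHash bit-vectors per sub-model's subset.
    h = ahash_sig(img)
    for i, hashes in enumerate(subset_hashes):
        if any((h != other).sum() <= max_dist for other in hashes):
            return i                            # within the distance threshold
    return None
```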
Next up: we've looked at exact matching and approximate matching, but how about treating the exclusion itself as a classification problem, a probabilistic classification problem? Here we want to predict which model to exclude, for which we have to train a model for the oracle itself. We take portions of the data subsets and use them to generate feature vectors that serve as the basis for the different classes; if we have n models, we have n labels on which we train this classifier-based exclusion oracle. While it addresses the issues with the previous techniques I talked about, its natural limitation is that it can itself overfit on members, because we are training a model, and we've already said that the root cause of membership inference is that models overfit their members. So it might inherit the same issue; that is a limitation by design, but empirically, as I will show later, it is much better than the other techniques.

The last alternative, which also speaks to the limitation of the classifier-based oracle, is connecting what we call a chain of oracles. Basically, we query the exclusion oracles progressively instead of picking just one. We first try exact matching; when it doesn't find a match, instead of giving up, we go to approximate matching; and when that doesn't find a match either, we go to the classifier-based oracle as a last resort. This also turns out to be a much better technique, empirically speaking.

So let me give you a highlight of how this performs. I gave you an overview of the five techniques we explored: model confidence-based exclusion, exact signature-based exclusion, approximate signature-based exclusion, classifier-based exclusion, and the chain of exclusion oracles. Here we have results for our defense, the performance of the different oracles against the undefended model, for two datasets: on the left the CIFAR-10 dataset, and on the right CH-MNIST, a benchmark medical image classification dataset. On the x-axis we have model accuracy, and on the y-axis the accuracy of the attack. The baseline attack accuracy is 50%: if the adversary gets, on average, 50% attack accuracy on a batch of inputs, it says nothing about the effectiveness of the attack, because it is equivalent to random guessing. That's why we fix the baseline at 50%; the closer the attack accuracy is to 50%, the better for the defense. And of course, attack accuracy has to be looked at with respect to model utility, the model's test accuracy. Every time we measure the effectiveness of a defense, or even an attack, we have to look at the trade-off between model accuracy and attack effectiveness. If, while keeping the original model accuracy, the attack can be dropped to the baseline of 50% random guessing, then we can consider the defense a good one: without compromising the utility of the model, we ensure the attack won't succeed.
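In code, the two axes of these plots might be computed as follows. The "advantage" normalization is a common convention in the membership inference literature, not something from the talk:

```python
# A trivial sketch of the evaluation axes: attack accuracy against the
# 50% random-guess baseline, and its normalized "membership advantage".
import numpy as np

def attack_accuracy(guesses, is_member):
    # Fraction of member/non-member guesses that are correct; 0.5 = random.
    return float(np.mean(np.asarray(guesses) == np.asarray(is_member)))

def membership_advantage(acc):
    # 0 at the random-guess baseline, 1 for a perfect attack.
    return 2.0 * (acc - 0.5)
```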
The blue circle here is the undefended model. As you can see, all the other points are our exclusion oracles, which are not far off in model test accuracy, which is good news, and, interestingly, they sit almost on the baseline of 50% random-guess attack accuracy, which shows the attack is essentially failing. That tells you the defense is very effective. On the right-hand side you see the same story, except on a different dataset, the medical image classification one.

The next thing we did: it looks like our defense, MIAShield, is good at mitigating the membership inference attack and bringing the attack accuracy down to randomness, so we compared it against the defenses proposed over the last five years or so. As I told you earlier, these fall into different categories: some are based on differential privacy, others on confidence masking, others on model stacking, and regularization as well. We did the same kind of evaluation, testing the trade-off between model accuracy and attack effectiveness for our defense, MIAShield, which is the green square, against all the other defenses: DP-SGD, which is differentially private stochastic gradient descent; PATE, another ensemble-based differential privacy technique; MemGuard, which is confidence-masking based; model stacking, an ensemble method; and MMD mixup, a regularization technique. So we've considered a span of membership inference defense techniques across the board over the last five or so years. Compared to the 50% baseline, our defense, MIAShield, sits right on the line in terms of attack accuracy, which means it basically mitigates the attack. And compared to the undefended model's accuracy, MIAShield is almost aligned with it, which means that without significantly losing accuracy, it pushes the attack down to random guessing. Some other techniques, like model stacking, MemGuard, and the regularization-based MMD technique, come close to MIAShield, but as you can see, the attack accuracy on each of them is slightly higher, even though they are close in model accuracy. So overall, our defense is still the winner. More interestingly, the differential privacy-based techniques, DP-SGD and PATE, are quite comparable at mitigating the attack, almost the same as MIAShield. But if you look at the horizontal gap between the blue circle and the green triangle and red square for DP-SGD and PATE, you can see that differential privacy-based methods cost a lot in model utility. That's where our defense comes out ahead of almost every defense we've tried here. The same is true for the other dataset, CH-MNIST; the story is more or less the same, with minor differences.

The third thing we did, in addition to comparing our defense against existing defenses, is that any defense has to be tested against possible adaptive attacks.
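One such adaptive attempt, described next, perturbs member inputs until the exclusion oracle loses track of them. A minimal sketch, assuming a grid of small rotation angles and an oracle that returns None on a miss:

```python
# A minimal sketch of an adaptive attack against a perceptual-hash oracle:
# rotate a member image by increasing amounts until the oracle misses it.
from scipy.ndimage import rotate

def evade_oracle(img, oracle, angles=range(1, 31)):
    for a in angles:
        perturbed = rotate(img, angle=a, reshape=False, mode='nearest')
        if oracle(perturbed) is None:        # oracle lost track of the member
            return perturbed, a
    return None, None                        # oracle held up across the grid
```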
One natural adaptive attack we anticipate against our defense, especially against the perceptual-hashing exclusion oracle, is an adversary who keeps perturbing or manipulating member samples until the exclusion oracle no longer recognizes them. For that, we looked at a range of data augmentation and manipulation techniques the adversary would likely perform, specifically rotating and translating inputs. Here we have results on the left for CIFAR-10 and on the right for CH-MNIST. The blue line is our defense, the black line is the baseline, and the red line shows the attack accuracy against the undefended model, plotted against the manipulation parameter. As you can see, our defense keeps the attack accuracy down, while the attack accuracy against the undefended model is much higher in both cases. This shows that adaptive attacks like this one, based on manipulation of inputs, are unlikely to succeed overall.

As a takeaway for the membership inference defense: the defense we proposed, based on eliminating the membership signal, works. We have seen this across the board on multiple datasets, and we compared it against existing defenses; overall it provides a much better utility-privacy trade-off. And as I just showed you in the previous slide, it also remains resilient against an adaptive adversary who leverages their knowledge of how our exclusion oracle might have been designed.

All right, so those are the two lines of work. How does this fit the theme I started the talk with, the state of the model: what progress have we made, and what problems remain in the grand scheme of making machine learning models trustworthy? I took two threats against machine learning models, adversarial examples and membership inference, and walked you through what has been done and what we have added on top of the existing state of the art in defenses.

Now, for the last part of the talk, I'm going to take a much broader view of what we call trustworthy machine learning. I only looked at two dimensions, robustness against adversarial examples and robustness against membership inference attacks, which are important and which we have to keep working on. But trustworthiness of machine learning is not only about robustness against adversarial inputs or against privacy-motivated attacks like membership inference; it is much broader than that. On this slide, I'll take a step back and summarize the progress we've made along different dimensions, including the ones I just discussed, and some open issues I consider important.

On the adversarial robustness front, techniques like adversarial training, certified defenses, and the moving target strategy I just described from our work are very useful and represent good progress. On the flip side, what we are struggling with, especially with adversarial examples, is that the literature hasn't moved much: we gain some empirical robustness over the previous defense technique,
and then somebody comes along and breaks those defenses, and we stay in that cycle, the classic attack-defense arms race. I think we are at the point where we have to rethink what we call robustness, especially for adversarial examples. We have to reason about robustness more broadly and get beyond the well-established robustness assessment in the image domain, where the distance metric is limited to the classic Lp norms. I'm not the first to say this; a lot of people have suggested it, and I agree with them.

On privacy, differential privacy has become the gold standard for what we want a privacy definition to be: mathematically rigorous, with a utility-versus-privacy guarantee you can empirically verify and formally support. But beyond the average-case metrics with which we measure attack effectiveness, we also have to look at realistic scenarios for measuring privacy leakage. And we have to reason about cross-domain formulations of privacy, because privacy leakage, and its implications, in one domain may not translate to another. For example, privacy leakage metrics for images versus, say, language models may not be the same, because the tokens we are dealing with are completely different, and the semantics of privacy leakage differ as well.

Adversarial robustness and privacy are the two things I covered. But beyond these, we also have to look at trustworthiness from a transparency point of view: when machine learning models make decisions on important tasks, we want to understand how they arrived at those decisions. There is an explosive line of work in the literature on so-called interpretable or explainable machine learning, which is useful. But the challenge for moving forward with interpretability or explainability is that these models keep growing in size and complexity; the black-box aspect keeps getting bigger, and it is becoming harder to scale existing explainability frameworks to the models we see today, with their billions of parameters. So that's transparency.

And the picture wouldn't be complete without bringing fairness and ethics into the whole equation of trustworthy machine learning. You might do a great job making the model robust against adversarial manipulations and privacy-motivated attacks, and even make it reasonably transparent about its decisions, but if it is not fair to everyone, if it is somehow unfair to a specific long tail or portion of the dataset or population on which the model is trained, then we are not doing things right; the model is not trustworthy. One piece of progress in the fairness and ethics literature for machine learning, and AI in general, is the tendency to quantify or measure fairness and ethics. The question that naturally arises is: is it okay, is it natural, to quantify fairness? Are fairness and ethics quantifiable in the first place? That is a very big question the ethics and fairness community has to deal with.
If we are to formalize fairness and ethics, we have to have some alignment between what humans perceive as fair or ethical and how we encode that human policy into the formulations of fairness and ethics in machine learning. More importantly, though, all these dimensions of trustworthy machine learning matter, but what matters even more is the dynamics between these different properties we expect from machine learning. This is a fairly underexplored, somewhat overlooked area. We have to study, for example, accuracy versus all these properties: how does accuracy fare against adversarial robustness, privacy, transparency, fairness, and so on? And between pairs of these properties, say robustness and privacy, robustness and fairness, or privacy and transparency, some are seemingly in conflict. Privacy and transparency, for example: privacy is all about limiting leakage, in the sense of, say, membership inference, but at the same time you also want the machine learning model to be transparent about its decisions. How do you balance privacy and transparency? Privacy and fairness is the same kind of story. So this is how I look at the whole puzzle of trustworthy machine learning, in terms of the progress we have made so far and the open issues I consider important.

To conclude, the way I like to look at trustworthy machine learning is through the analogy of an umbrella. In the normal sense, you want your umbrella to be reliable, trustworthy, so that it protects you from the things you anticipate an umbrella will protect you against. If there are UV rays coming from the sun, you want your umbrella to protect you. If rain comes, you expect the umbrella to protect you against the rain. If there is wind, unless it is a storm or something like that, you still have a reasonable belief that your umbrella will protect you. Now replace the UV rays, the wind, and the rain with any threats, or any properties you expect of a machine learning model, like adversarial robustness, privacy preservation, and so on. If we think of machine learning as an umbrella and we want it to be trustworthy, this trustworthy machine learning umbrella should of course include robustness to adversarial manipulations. It has to be privacy preserving. It has to be interpretable; the explainability aspect should be there. It shouldn't be biased against a specific group of people, and it should align with the ethical values that reasonable humans would hold.

But this umbrella also needs reinforcement: it needs united hands that hold it together so it doesn't flip over. The way I look at this reinforcement is as a triangular synergy between academia, doing basic research; industry, extending basic research into products and services; and the public sector, as the mediator between the two, because the public sector has the authority to do oversight, auditing, and regulation through legislation. These three entities can work in synergy to reinforce what I call the trustworthy machine learning umbrella, which has to encompass all the properties I mentioned on the umbrella itself.
Okay, with that I will stop my talk here, and I look forward to your questions.