Good morning, good afternoon. My name is Brent Halpern. I am the Scientific Director for the AI Horizon Network, and this is our second AI Horizon Network seminar. Today we have Pin-Yu Chen from IBM Research. He's going to be talking about recent progress in adversarial robustness of AI models, which is an extension of his NeurIPS paper from 2018. Pin-Yu is an RSM, a research staff member, at IBM Research in New York. He's been working with the MIT-IBM Watson AI Lab projects and has expertise in data mining, machine learning, signal processing, and cyber security. The general format is that we'll have everybody muted, Pin-Yu will give his talk, and then we'll open the conversation for questions. If you don't want to unmute yourself to ask your questions, there's a chat channel; the chat is the little speech-bubble icon that appears when you mouse over the bottom of your screen. So with no further ado, Pin-Yu, please go ahead.

Okay. So Brent, can you hear me all right? You're fine. Okay. So hi, everyone, nice meeting you. I'm Pin-Yu Chen, and thanks, Brent, for the nice introduction and for the invitation. I'm a research staff member based in Yorktown, New York. Today I'm going to walk you through our recent work in the area of adversarial robustness. As Brent mentioned, this will be an extended version of our certification work published at NeurIPS last year. But in order to motivate what robustness certification is about, I will take a multi-stop tour through attacks and defenses, and spend the last half of the session on robustness certification.

Okay, so let's start with something fun. If I were to show you this image and ask you to provide a label for it, what would people usually say? Most likely you would say it's an ostrich, and this is also what the AI says: it is identified as an ostrich by the AI model as well. But here is the interesting thing about this second image. How would it be labeled? A human would probably say, hey, it's also an ostrich. But now the AI says it's a shoe shop, and this is not a coincidence. It turns out that AI models can be easily fooled by these so-called adversarial images. You can not only turn the label of ostrich into shoe shop; it can be turned into safe, or vacuum, or whatever label you want. And the AI model we are talking about is not some random, crappy AI model. These are actually the best image classifiers using neural networks that we have nowadays. One thing I would like to highlight is that images and neural network models are not the only victims of these adversarial inputs. We are interested in images because they are easy to visualize, and we are interested in neural networks because they are the state-of-the-art models for tasks like image classification. If you are working in other domains, you can certainly take other models into consideration and study their adversarial robustness. So these images, as I mentioned, are called adversarial examples. They are prediction-evasive samples created at test time: you already have a trained neural network model, and you want to find adversarial perturbations to the input such that the prediction of the model goes wrong, while the adversarial input looks just like the original input. And as some of you might know, I'm a big superhero fan.
There is always an episode in these superhero movies where you have two characters who look totally the same, but one is benign and one is evil; they look identical, but they have completely different characteristics. I think this is a perfect analogy for adversarial examples: they look just like the benign input, but their predictions and their functionality are totally different.

So why do we care about adversarial examples? As I mentioned, adversarial robustness is a big umbrella, and you can think about robustness from different angles. For example, in addition to adversarial examples, people also care about the robustness of the training data: if an adversary has access to your training data and embeds a backdoor in it, so that a model trained on the poisoned data is affected, that is also a type of adversarial attack, which we call a poisoning attack. But in this talk we are more interested in adversarial examples, the attack deployed at test time: the adversary doesn't have access to the training data; instead, they face a trained AI model at test time and want to create prediction-evasive adversarial inputs to that model. This is a typical scenario when AI and machine learning systems are deployed as a service, like Google's cloud services, IBM's services, and so on, and we are asking whether you can find vulnerabilities in these deployed systems. By the way, if you have any questions, just unmute yourself and let me know at any time.

So why do we have to deal with adversarial examples carefully? There are several reasons. The first is a crisis in trust: the general audience is quite concerned about inconsistent decision making by AI and machine learning models. A typical example: is it possible to manipulate a stop sign by adding adversarial stickers such that an autonomous car, when it sees the perturbed stop sign, thinks it's a speed limit sign, hence doesn't stop, causing security issues? Second, as machine learning researchers, we are very interested in knowing why these models are so vulnerable to small perturbations in the input space, because this indicates a limitation in our current machine learning systems and in how we train these models to do their jobs. And lastly, in terms of company and public awareness, there is always a concern about loss of revenue and reputation. For example, just two weeks ago you might have seen articles saying Tesla's Autopilot system can be fooled by adding small stickers on the roadside, so that the system steers the car onto a different route when it's not supposed to. And a while back, a Google online service accidentally tagged African Americans as gorillas, which raised discussions and led to revenue and reputation loss for these AI-based services. We know for a fact that these things can happen, either through bias in the data or through manipulation by so-called adversarial examples.
So if the general audience doesn't know this technique exists and how easily the output of an AI model can be manipulated, someone can certainly create these adversarial images and cause revenue loss, reputation loss, or distrust between humans and machines. That's something we certainly want to avoid. As I mentioned, this is part of IBM's strategy, and that's why we are so dedicated to trusted AI. Trusted AI is our company strategy, and under it there are different pillars, including robustness, the things we just talked about: not just adversarial examples, but also training-time attacks, model stealing, and so on. We also care about fairness: how do we make sure the decisions made by models are fair and justifiable? We care about explainability: how do you explain a model's decision, why it thinks this is an ostrich and not some other label? And we care about accountability, making AI models transparent. So this is the big umbrella of trusted AI; robustness is the part we are going to zoom in on, and then we'll take a deep dive into certification methods.

Next I'd like to show our portfolio of research in adversarial robustness. This is by no means an exhaustive list; I'm only highlighting the research I'm involved in. If you count the whole IBM team, we have probably double the number of papers and technical reports in this field. I'm also focusing on adversarial examples on this slide, but I would like to mention that we have done a lot of excellent work on other types of adversarial attacks as well. For example, on detecting poisoning attacks, that is, training-time attacks, an IBM Research team won the best paper award at the SafeAI workshop this year. So we certainly have a lot of visibility here. In terms of research, we have published more than 15 papers on adversarial robustness, so we are certainly one of the most active, and I would say a leading, group in this field. We cover a lot of different topics, including attacks, defenses, robustness certification and evaluation, connections to interpretability, the robustness-accuracy tradeoff, and zeroth-order optimization techniques that make this kind of vulnerability analysis easier. Our research has gotten a lot of attention from the general public as well; the media cover our work because it is of interest not just to the research community but also to the general public, since AI models will increasingly coexist with humans and be widely deployed in our daily lives. IBM also has an open-source toolbox called the Adversarial Robustness Toolbox (ART), where we release our publications and code to help people do research in adversarial robustness.

So here is the roadmap for making AI and machine learning models trustworthy. As I mentioned, there are many topics under adversarial robustness: attacks, how you find the vulnerabilities of your AI models; defenses, how you improve the robustness of your AI model and detect adversarial inputs; and certification and evaluation, basically how you certify that your model's prediction on an input is robust, and how you provide a formal evaluation of the robustness of your AI model.
There is also the interpretability angle on adversarial examples, because the existence of adversarial examples may ultimately be explained by the lack of interpretability of our AI systems; my colleague Sijia Liu, for example, is an expert in this field. There are many research skill sets and tools that can be used to study adversarial robustness, including optimization techniques, robust optimization, high-dimensional statistics, and of course deep learning. When I work on adversarial robustness, I like to think of this research as a magic mirror: every researcher sees some part of themselves when looking at papers in this area, because it is a very young and exciting field, and a very inclusive one, in the sense that for every task we work on, we must have a model to execute that task, and whether that model is robust to adversarial attempts is of interest. So adversarial robustness research is everywhere, and it's very comprehensive.

One thing we discovered early on, which was honestly quite shocking to us, is that accuracy does not imply adversarial robustness. If you are familiar with deep learning research, you probably know the well-known ImageNet competition, where people submit models for object recognition and each model's performance is ranked by its test error or test accuracy. For the past few years, the only benchmark for a model's performance has been standard test accuracy. Out of curiosity, we took 18 different ImageNet models submitted over time, with different accuracies, and ranked them in terms of robustness. The x-axis is each model's accuracy, and the y-axis is its robustness; you can think of the robustness shown here as a measure of how much distortion is needed to manipulate an input into an adversarial example. Somewhat to our surprise, we found that the more accurate models are actually less robust to adversarial examples. This can be explained, in a sense, by the fact that in order to achieve high accuracy, you more or less need to overfit the decision boundaries a bit and make them non-smooth, and this non-smoothness causes adversarial examples to exist. That's why we call it a robustness-accuracy tradeoff. It also indicates that to make AI models trustworthy, accuracy is not the only metric; we should also care about robustness, because a model that solely pursues high accuracy usually doesn't come with robustness.

As I mentioned, why do researchers and society care about adversarial robustness? Overall, this is all about trust, because AI models and AI services are going to penetrate our daily lives in no time. But what's been discovered in this research is that wherever there is a neural network model, there is a way to generate adversarial examples. So let me show you other possible applications of generating adversarial examples. For example, this is the image-captioning attack that we demonstrated in the past: this image generates a correct and relevant caption, for example, "a red stop sign sitting on the side of a road."
But if you use our technique and add a small perturbation to create the adversarial example at the bottom, you can manipulate the outcome and get a crazy caption like "a brown teddy bear laying on top of a bed," even though the two images look basically the same to humans. You can do the same thing to automatic speech recognition systems, which we will revisit in a few slides. And these things don't just happen in the digital world; they can happen in the physical world as well. CMU researchers devised adversarial glasses: once you wear them, a face recognition system thinks you are someone else rather than the person wearing the glasses. You can apply this technique to evaluate the robustness of autonomous driving systems, for sure. And you can create 3D adversarial objects, in this case a 3D adversarial turtle: if you take your cell phone and take pictures from different angles, most of the time the machine won't realize it is a turtle and will think it is a rifle or something else. So adversarial robustness is not just a digital-world issue; people have been finding ways to make these things happen in the physical world. You can imagine that if our lives truly depended on these AI models to do jobs for us, there is a trust crisis we need to resolve.

Okay, so next I'm going to walk you through some adversarial attacks that have been developed in the past. How do we actually generate these adversarial examples? Researchers started from a white-box approach. White-box means everything is transparent to the adversary when generating adversarial examples, including the model architecture and how the model was trained; basically, there are no secrets hidden from the adversary. Here is an example of how we generate an adversarial example for a French bulldog. If you input the original image, the neural network classifier says it's a French bulldog with 90% confidence. But what if I want to generate a perturbation to the input such that it is classified as, let's say, a basketball? What people do is actually very simple: rely on backpropagation through the neural network you are going to attack. To generate the adversarial perturbation, we ask the neural network to give us directions that increase the confidence of the target class and decrease the confidence of the French bulldog class. We also want to make sure the perturbation is small, so that the final adversarial image looks the same as the original image. As you can imagine, this technique is very general and is not limited to image classification: for any application relying on a neural network, because you have backpropagation, you can craft adversarial examples very easily in the white-box setting.

Here is a more mathematical definition of how people generally derive adversarial attacks, or more specifically adversarial perturbations. First we have to define what we call a threat model: what perturbations of the input are allowed. We have a perturbation delta confined by some distance metric, or some semantic space, relative to a given input x0.
By semantic space, you can think of semantic perturbations applied to x0, for example rotation, translation, lighting changes, and so on. The general attack formulation is: minimize the distance, basically preserving the similarity between the original image x0 and the perturbed image x0 + delta, while ensuring the predictions of the neural network model on x0 and on x0 + delta are different. This is the typical setting for an untargeted attack, where you want to find a delta that makes the final prediction wrong but you don't specify which target you desire; it's very easy to change this untargeted formulation into a targeted attack formulation, where you want the prediction on the perturbed image to be a specific label. There are several alternatives to this formulation. You can minimize the distance plus some attack loss function, basically a surrogate loss for the condition that the two predictions, on the original image and the perturbed image, have to differ. Or you can minimize the attack loss subject to the distance as a constraint: you require the distance between the original image and the perturbed image to be smaller than some epsilon.

Here are some commonly used distances in the literature. We start from the simplest, mathematically well-defined distances, for example Lp norms centered on x0. In this case the distance between x0 and x0 + delta boils down to the norm of delta, and people consider different Lp norms. For example, the L-infinity norm of delta is the maximum perturbation allowed in any single input dimension; the L2 norm is the square root of the sum of squared differences over input dimensions; the L1 norm is the total variation; and the L0 norm is the number of modified dimensions, for example, how many pixels are going to be modified. More recently our work has pushed the limits of these attacks by considering mixed norms and structured attacks that focus on convolutional filters and so on, so we can generate clean and interpretable perturbations; that's how we build a connection to interpretability as well. For the attack loss function, people usually use the cross-entropy or a margin-based loss; the margin-based loss is what I described a few slides ago: for example, to make this bagel image be classified as a grand piano after adding the noise, you try to increase the confidence of the grand piano label and decrease the confidence of the bagel label. That's how people generate adversarial examples from a mathematical formulation.

But you may argue that so far we've been talking about the white-box setting, where you have access to the model and can do backpropagation. What if, in a practical setting, the machine learning model is deployed as a service, and there is no way I'll tell you what the model behind the service is? This is actually a setting of AI or machine learning systems with limited access, which is why we call it the black-box setting. There was a time when people believed the black-box setting was actually robust to adversarial perturbations, because you are no longer able to do backpropagation, and hence the model should be secure. But our recent work shows this is not the case.
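To make the white-box formulation concrete, here is a minimal sketch of an untargeted L-infinity attack in the projected-gradient style, assuming a PyTorch classifier; the model, label tensor, and step sizes are illustrative placeholders, not the exact attack from the talk:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x0, true_label, eps=0.03, alpha=0.005, steps=40):
    """L-infinity PGD sketch: maximize the attack loss within an eps-ball
    around x0, i.e., find delta with ||delta||_inf <= eps that changes
    the prediction (untargeted variant of the formulation above)."""
    x_adv = x0.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), true_label)  # push away from true class
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()            # gradient ascent step
            x_adv = x0 + (x_adv - x0).clamp(-eps, eps)     # project back into eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                  # keep a valid image
    return x_adv.detach()
```

For a targeted attack, you would instead descend the cross-entropy toward the desired target label, matching the targeted formulation described above.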
It turns out that even without doing backpropagation, we can still generate these adversarial examples, and our work published in 2017 was actually the first to make this attack practical in a black-box setting. The way we did it: instead of doing backpropagation, which is infeasible in this setting, we estimate the gradient, and using the estimated gradient we can generate adversarial perturbations that make a French bulldog be classified as a basketball, or a bagel be classified as a grand piano. This gradient estimation is nothing but the finite-difference method we learned in calculus. More recently, in 2019, we improved this black-box attack. Our previous method works, but it usually needs a lot of queries to make the attack feasible; for example, to make the bagel be classified as a grand piano, we typically needed on the order of millions of queries to the machine learning model. What we propose here is a more query-efficient approach: we apply dimension reduction to make the queries more efficient, and we use a newer gradient estimation technique that reduces the number of queries needed to find adversarial perturbations. If you compare the first and second rows of this table, you can see that our new method, AutoZOOM, reduces queries by more than 80% while still producing similar adversarial images in the end. So AutoZOOM makes this black-box attack, or as we prefer to put it, this efficient way of evaluating the robustness of AI systems under limited access, plausible. Here are some more examples of how AutoZOOM works on ImageNet; basically, AutoZOOM can save millions of queries compared to our first attack, the ZOO attack. What it actually does is use a few queries to find a successful perturbation and then use more queries to refine the perturbation so that the final image is similar to the original image while still having the adversarial effect.

Now you may ask: what we have discussed so far assumes we know the confidence scores of the AI model from the black-box system in order to generate adversarial examples. The next question is, is it possible to generate adversarial examples if you only give me the top-1 prediction of the model, without telling me how confident the model is? It turns out the answer is yes. In our recent ICLR paper we showed that even if you only give me the label of the top-1 prediction, which is the least possible information you could give any regular user, it is still possible to generate these adversarial inputs. For example, in this case we have a snake image as the original input and we want it to be classified as a cat. We start from a cat image and gradually fine-tune our perturbation so that the image becomes similar to the original snake image while still being classified as a cat. And we can do this in a very query-efficient way as well. As I alluded to before, these attacks don't occur only in images and are not limited to convolutional neural networks.
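As an illustration of the finite-difference idea behind this family of black-box attacks, here is a minimal sketch, assuming a black-box `query_scores` function that returns class scores and an `attack_loss` that scores them; both names are placeholders, and the real ZOO/AutoZOOM attacks add many refinements (coordinate selection, dimension reduction, adaptive optimization) on top:

```python
import numpy as np

def zoo_gradient_estimate(query_scores, attack_loss, x, h=1e-4, n_coords=128, rng=None):
    """Coordinate-wise central finite differences: estimate d(loss)/dx
    using only forward queries, no backpropagation. Only a random subset
    of coordinates (n_coords <= x.size) is estimated per call to keep the
    query count manageable."""
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(x)
    flat, gflat = x.reshape(-1), grad.reshape(-1)
    for i in rng.choice(flat.size, size=n_coords, replace=False):
        e = np.zeros_like(flat)
        e[i] = h
        loss_plus = attack_loss(query_scores((flat + e).reshape(x.shape)))
        loss_minus = attack_loss(query_scores((flat - e).reshape(x.shape)))
        gflat[i] = (loss_plus - loss_minus) / (2 * h)  # central difference
    return grad
```

AutoZOOM's query savings come largely from estimating the gradient in a reduced-dimensional space and scaling it back up, rather than touching input coordinates one by one as above.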
Recurrent neural networks can be attacked as well, and other tasks, for example image-plus-text tasks like image captioning, can be vulnerable to these attacks too, as long as you follow the general principle of doing backpropagation, specifying how you are going to attack your model, and specifying your threat model. So I would argue this is more or less a universal concern for machine learning researchers: no matter what task or model you are using, there is an adversarial robustness angle, and you should consider how reliable your model is against this type of adversarial input.

Next I'm moving on to adversarial defenses. Compared to attacks, defenses are less well developed, because defense is the more challenging task: an attack only has to find one adversarial input that breaks the prediction, but a defense basically needs to defend against all possible adversarial inputs. It is really a worst-case scenario, and that's what makes defense so challenging. There are many reasons why learning a robust model is hard. For example, we usually lack interpretability of how our AI model makes decisions, and our training data can be noisy and biased. Also, when we designed these neural network architectures, we really didn't have notions of security, privacy, or robustness in mind, so the architectures we currently train could be inherently vulnerable to adversarial inputs. Finally, attacks and defenses can both leverage AI techniques and try to improve upon each other. If you look at the history of adversarial robustness research, you'll find a pattern: people propose defenses, and a few weeks or months later other researchers find a way to bypass the existing defenses and show them to be ineffective. So there is really an arms race between attacks and defenses, and more or less this is what makes this research a lot of fun.

So where are we in terms of adversarial defense? First, the way we evaluate a defense is to let the defender move first. That means a defense is considered robust only when it is known to the adversary and the adversary still cannot break it. The defender makes the first move to make the model robust, and the attacker moves second, seeing what the defender did and trying to bypass the defense mechanisms applied to the model. There have been several attempts at defenses. For example, data augmentation: augment the training set with adversarial examples and retrain the model. It helps, but it doesn't truly solve the problem. There are also ways to improve model robustness by changing conventional model training from a minimization problem to a min-max problem, which is called robust or adversarial training: in addition to training the model parameters, we simultaneously generate adversarial inputs, give them the correct labels, and ask the model to learn to classify these adversarial inputs correctly, so we learn a more robust model. This approach is effective, but we also find it not scalable, and because of the worst-case min-max training, you often suffer a significant drop in test accuracy compared to conventional training.
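Here is a minimal sketch of the min-max training loop just described, under the same assumptions as the earlier attack sketch (a PyTorch classifier and data loader; all hyperparameters are illustrative):

```python
import torch
import torch.nn.functional as F

def robust_training_epoch(model, loader, optimizer, eps=0.03, alpha=0.007, steps=10):
    """One epoch of min-max (adversarial) training:
    inner max  - craft a worst-case perturbation within the eps-ball;
    outer min  - update weights on the perturbed batch with correct labels."""
    model.train()
    for x, y in loader:
        # Inner maximization: short PGD, as in the attack sketch earlier.
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
            with torch.no_grad():
                x_adv = x_adv + alpha * grad.sign()
                # tensor-valued clamp bounds need a recent PyTorch version
                x_adv = torch.clamp(x_adv, x - eps, x + eps).clamp(0.0, 1.0)
        # Outer minimization: train on adversarial examples with true labels.
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv.detach()), y).backward()
        optimizer.step()
```

The accuracy drop mentioned above shows up here directly: the model fits the perturbed batch instead of the clean one, trading clean accuracy for worst-case robustness.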
There are other approaches, such as input transformation, correction or rectification, and anomaly detection, but many of these defenses have been bypassed by advanced attacks. People have also been looking at other directions. One direction we believe is more promising is using diverse models, including model ensembles and models with randomness, to increase the cost of adversarial attacks. And later on, I will show you some domain-specific defenses where we have promising results; the downside is that these domain-specific defenses cannot easily be generalized to other domains.

Here is a case study of how we detect audio adversarial examples. An audio adversarial example is, again, a small perturbation added to an input audio file that alters the transcription produced by the speech recognition system. The way we detect these audio adversarial examples is by leveraging the temporal dependency inherent in speech to find discrepancies between benign and adversarial inputs. We first pass the whole audio input through the system and obtain the corresponding transcription. Then we chop off the input sequence and pass only the chopped portion through the system again, and we compare the transcription of the chopped sequence with the corresponding part of the whole-sequence transcription. For benign audio, the chopped-sequence and whole-sequence transcriptions will more or less agree, in terms of word error rate or character error rate. But for an adversarial input, because its very purpose is to change the transcribed output, we can expect the error rate between the chopped transcription and the corresponding prefix of the whole transcription to be significantly different, and we can use this error rate as a statistic to distinguish benign from adversarial inputs. This heuristic turns out to be very effective, and that's because we know how automatic speech recognition works: we are leveraging the domain knowledge of temporal dependency to build a detector and hence improve the robustness of the entire system. So improving robustness and detecting adversarial inputs are certainly possible, but in this case we are really relying on domain-specific knowledge. An interesting question is how to generalize this notion and create an automatic way of finding such domain-specific knowledge for different tasks; that would be a very interesting direction to pursue.

Okay, so finally we come to robustness certification and evaluation. The reason I motivated attacks and defenses first is that there are several ways to evaluate the robustness of your AI model. One very typical way is a game-based approach: you specify a set of players, in this case different attack functions and different defenses, run them against your model, and benchmark the performance of each attack-defense pair. This is a reasonable approach, but the downside is that the empirical ranking may be misleading: there is no guarantee that your model will be robust to attacks you didn't test. So there is a concern about whether your defense or robustness results generalize to other or future attacks. That motivates the second way of evaluating robustness, which we call verification or certification.
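Returning to the temporal-dependency detector described a moment ago, here is a minimal sketch; `transcribe` stands in for the ASR system, and the 50% chopping point and the word-level prefix alignment are simplifying assumptions:

```python
import numpy as np

def word_error_rate(ref, hyp):
    """Standard WER via word-level edit distance."""
    r, h = ref.split(), hyp.split()
    d = np.zeros((len(r) + 1, len(h) + 1), dtype=int)
    d[:, 0] = np.arange(len(r) + 1)
    d[0, :] = np.arange(len(h) + 1)
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1, j - 1] + (r[i - 1] != h[j - 1])
            d[i, j] = min(sub, d[i - 1, j] + 1, d[i, j - 1] + 1)
    return d[-1, -1] / max(len(r), 1)

def temporal_dependency_score(transcribe, audio, k=0.5):
    """Transcribe the first k-portion of the audio and compare it against the
    matching prefix of the full transcript. Benign audio tends to give low
    WER; adversarial audio tends to make the two prefixes disagree."""
    full = transcribe(audio)
    partial = transcribe(audio[: int(len(audio) * k)])
    prefix = " ".join(full.split()[: len(partial.split())])
    return word_error_rate(prefix, partial)  # threshold this to flag adversarial input
```

A detection threshold on this score would be chosen on held-out benign audio, since benign WER between the prefixes is low but rarely exactly zero.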
In this case we do not rely on attacks to evaluate robustness; instead we build an attack-independent certification for evaluating robustness. What this approach can do is provide a certificate for your model; the downside is that exact verification is very difficult, especially for large neural networks, and I will do a deep dive a few slides later.

Geometrically, how do we understand how this verification works, and what does it mean in terms of adversarial inputs? Think of these dashed lines as the decision boundaries learned by a neural network classifier: if your input is in the middle region, the model says ostrich, and if you cross a decision boundary, the input is classified as another label. An attack tries to push the image across a decision boundary so it is misclassified as something else, while keeping the distance between the original point and the perturbed point small so they look visually the same. So any successful attack gives an upper bound on the minimum perturbation, where we define the minimum perturbation as the smallest distance required to alter the decision for the input. Conversely, robustness certification gives a lower bound on the minimum perturbation. Ultimately we would like to find the minimum perturbation for any given input, but that problem has been proven to be NP-hard, so instead we try to find efficient ways of computing a lower bound on it. Then we can provide a certificate saying: no matter how you perturb your image within this epsilon ball, the green ball, the top-1 prediction will stay the same, which means your model's decision is consistent within the ball. That's the kind of certificate we are going to discuss in the remaining slides.

Here is an overview of what's been developed in the past few years. We have done a series of works along the line of robustness certification and evaluation, including CLEVER, Fast-Lin, CROWN, and CNN-Cert. These are successive versions of our robustness certification work, where we try to make the guarantees more general and support more network models. So let me explain what we are doing in this line of work. To certify a neural network model, you are given, first, a trained neural network model and, second, a data input. Different data inputs have different minimum distortions: some may be close to the decision boundary, some may be further away, so robustness certification is given at the level of an input sample rather than at the model level. Then we specify a threat model that we want to certify, for example an Lp norm between the perturbed image x and the original image x0. What we are asking is: if you are allowed to modify the image x0 by adding any perturbation within this epsilon ball, can the top-1 prediction of the perturbed image be altered?
In this case: within an epsilon-ball perturbation, is it possible to alter the label from ostrich to shoe shop or vacuum? The way we do it: you allow each input dimension to move by plus or minus epsilon. Then, assuming for the moment that upper and lower bounds on each activation function are known, you have a lower and an upper bound on each activation. What we do is propagate these activation bounds layer by layer; after propagating through all the layers, we obtain upper and lower bounds on each class output. Then we compare the lower bound of the ostrich class with the upper bounds of all other classes. If we can ensure that the lower bound of ostrich is higher than the upper bound of every other class under this epsilon perturbation, then we can guarantee there is no attack within the epsilon ball that can alter the prediction of the model, and therefore you have an epsilon certificate. The remaining task is basically a bisection on epsilon to find the largest possible epsilon that still gives you this consistent-decision certificate.

Okay, so all of this is easy if we know the lower and upper bounds of your activation functions. The next question, and this is actually the main problem our NeurIPS paper solves, is how to find these lower and upper bounds for each activation function in a very efficient way. We know for a fact that most activation functions in neural networks are nonlinear, for example ReLU, tanh, sigmoid, and so on. But to make the certificate efficient, we require linear bounds on each activation, for each neuron in each layer. So what we end up doing is linearizing: we provide linear bounds for each activation function. For example, for a ReLU function we can provide linear upper and lower bounds, and we do this for each neuron in each layer. If we linearize the activation bounds for every neuron, then end to end we have a linearized neural network model, and we can propagate these bounds in a very efficient manner.

Here is a small theorem that summarizes what we are doing. Imagine f_j is your neural network classifier, where j indexes the j-th class. If we apply these linear bounding techniques, what we can show is that the actual output f_j(x) can be bounded by two linear functions, f_j^L(x) <= f_j(x) <= f_j^U(x), where f_j^L and f_j^U are themselves linear functions of x whose coefficients are determined by the layer-by-layer propagation shown here. And here is another example of how we derive linearized bounds for different activation functions: our technique, CROWN, can be applied to general activation functions. Basically, you specify the activation function, and CROWN linearizes the different segments of that activation function and makes sure these linear bounds can be computed efficiently and propagated from the input all the way to the output.
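To make the propagate-and-compare procedure concrete, here is a minimal sketch using plain interval bound propagation on a ReLU multilayer perceptron. This is a coarser relaxation than the linear bounds of Fast-Lin/CROWN (used here only for illustration), but the certificate check and the bisection on epsilon follow the same ideas; the weight and bias lists are placeholders:

```python
import numpy as np

def interval_bounds(weights, biases, x0, eps):
    """Propagate elementwise lower/upper bounds through a ReLU MLP for all
    inputs in the L-inf ball {x : |x - x0| <= eps}."""
    lb, ub = x0 - eps, x0 + eps
    for i, (W, b) in enumerate(zip(weights, biases)):
        Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)  # split by weight sign
        lb, ub = Wp @ lb + Wn @ ub + b, Wp @ ub + Wn @ lb + b
        if i < len(weights) - 1:                          # ReLU on hidden layers only
            lb, ub = np.maximum(lb, 0.0), np.maximum(ub, 0.0)
    return lb, ub

def is_certified(weights, biases, x0, true_class, eps):
    """Certified if the true class's lower bound beats every other class's
    upper bound: no perturbation in the eps-ball can flip the top-1."""
    lb, ub = interval_bounds(weights, biases, x0, eps)
    return lb[true_class] > np.delete(ub, true_class).max()

def certified_radius(weights, biases, x0, true_class, lo=0.0, hi=1.0, iters=20):
    """Bisection on eps to find the largest certifiable radius."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if is_certified(weights, biases, x0, true_class, mid):
            lo = mid
        else:
            hi = mid
    return lo
```

The tighter linear bounds in Fast-Lin/CROWN replace these per-layer intervals with linear functions of the input, which shrinks the over-approximation and therefore certifies larger radii.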
Okay, so with that in mind, what we are doing is providing linear bounds on the output of a neural network model, which is known to be highly nonlinear; providing linear bounds turns out to be a good balance that lets us do this certification efficiently. For example, if we specify an epsilon and want to know whether, under this epsilon threat model, it is possible to find adversarial examples, different methods answer that in different ways. The earliest work, Fast-Lin, assumes the upper and lower bounds share the same slope parameter; CROWN allows different parameters for the upper and lower bounds and also extends to general activation functions, whereas Fast-Lin only considers ReLU activations. Our most recent work, CNN-Cert, published at AAAI 2019, is a certification method optimized for convolutional neural networks. Here we really leverage the convolutional structure of a CNN, so we can represent the upper and lower bounds in terms of convolution operations and make the computation more efficient. CNN-Cert is built upon CROWN, so it includes all the benefits of CROWN, such as support for general activation functions, but it also supports things we haven't shown for CROWN; for example, CNN-Cert can handle various building blocks.

I should mention that robustness certification is very challenging in this respect: whatever model, layer, or new operation you have in mind, we have to keep up with advances in neural network architecture and provide the corresponding certification techniques. This does not come for free. For example, for a network consisting of different layers, including convolution layers, batch normalization layers, residual blocks, or pooling layers, we literally need to define how to find linear bounds for each kind of layer and operation so that the certification can be done efficiently. CNN-Cert is our latest version of this robustness certification tool, and it applies not just to plain CNNs but also to more advanced models like ResNet or DenseNet.

This slide summarizes the differences between CNN-Cert and the previous works. CNN-Cert handles general network architectures; I would say it supports state-of-the-art architectures, which usually include batch normalization, pooling, and residual blocks. And CNN-Cert is relatively more efficient than CROWN or Fast-Lin, because CNN-Cert directly uses the convolution operations of the CNN, whereas when we developed Fast-Lin and CROWN we usually converted the network back to a multi-layer perceptron; CNN-Cert avoids this conversion and makes the computation more efficient.

Here is a comparison of the certified bounds. Again, we test the certification on different networks, with different layers, activation functions, depths, and so on, and we consider different Lp norms. What we can show is that the bounds found by CNN-Cert are usually larger than those of existing methods, improving by a few percent and up to around 20%, thanks to the efficient techniques and the general architectures we support.
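For reference, here is a sketch of the per-neuron ReLU relaxation mentioned above, given pre-activation bounds [l, u]. The adaptive lower slope is the CROWN-style choice; Fast-Lin instead ties the lower slope to the upper one:

```python
def relu_linear_bounds(l, u):
    """Linear bounds on ReLU(z) for z in [l, u].

    Returns (a_U, b_U, a_L, b_L) such that
        a_L * z + b_L <= ReLU(z) <= a_U * z + b_U  for all z in [l, u].
    """
    if u <= 0:                       # always inactive: ReLU(z) = 0
        return 0.0, 0.0, 0.0, 0.0
    if l >= 0:                       # always active: ReLU(z) = z
        return 1.0, 0.0, 1.0, 0.0
    a_U = u / (u - l)                # unstable case: chord through (l, 0) and (u, u)
    b_U = -a_U * l
    a_L = 1.0 if u >= -l else 0.0    # adaptive lower slope minimizes the gap
    return a_U, b_U, a_L, 0.0
```

Any lower slope in [0, 1] with zero intercept is valid here; picking it adaptively per neuron is one of the refinements that lets CROWN certify larger radii than a fixed-slope choice.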
And it is also computationally more efficient: because of this convolutional structure, CNN-Cert saves a lot of computation time.

Okay, so lastly, a different kind of work from certification is CLEVER, which is actually the first work we developed for evaluating robustness. CLEVER is again an attack-independent score, but instead of providing a certificate saying no attack within the epsilon ball can alter the prediction, it is an estimated score. It is a score rather than a certificate in the sense that CLEVER does not provide a guarantee; it provides an estimate of the minimum perturbation that takes a given input to the closest decision boundary. How would you use CLEVER or the certification methods we have discussed? The best use case we recommend is before-and-after robustness comparison. You often face a question like: if I apply a certain operation to my current model, for example adding a pooling layer or doing some pre-filtering, how much robustness do I gain? To answer that, it is very natural to use CLEVER or the verification techniques we've covered: they provide a score or a certificate to quantify how much robustness you gain from a given operation. We also use CLEVER in other ways, for example to understand the accuracy-robustness tradeoff of the different ImageNet classifiers I showed you before. Within IBM, we have used CLEVER extensively for interesting demos as well. One demo we built is called the Big Check: we hypothesize three different banks using different classifiers for handwritten digit recognition, compute their CLEVER robustness scores, and ask humans to rank the robustness, then measure the agreement between humans and machines. More or less, we found that the CLEVER score reflects how people assess the robustness of these AI banking systems. It is a very interesting use case, and we have a demo at this link; if you're interested, feel free to check it out. So that's how CLEVER can be used: to understand robustness across different threat models, datasets, neural architectures, or defense mechanisms. We have open-sourced CLEVER; it is implemented in the Adversarial Robustness Toolbox (ART). In addition to CLEVER, ART has a lot of different attacks and defenses, and I would say it's the most comprehensive open-source adversarial robustness toolbox so far. Feel free to check it out if you are interested in this line of research.

Okay, so lastly I would like to give you some takeaways. Hopefully by now I have convinced you that adversarial robustness is really a new standard for trustworthy machine learning systems, because ultimately, when we deploy machine learning and AI models in the world, we not only want them to be accurate, we also don't want them to make mistakes, especially stupid mistakes. And I would consider these adversarial examples very stupid mistakes by any human's standard. Accuracy should not be the only goal for AI models; we should also consider adversarial robustness and make sure these models don't make such mistakes, whether the input is manipulated intentionally or maliciously. I also want to highlight that this is a very interesting area, because there is always an arms race between attackers and defenders.
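As a rough illustration of the idea behind CLEVER (not the exact algorithm, which fits an extreme value distribution to batches of gradient-norm maxima), here is a sketch that estimates a local Lipschitz constant from sampled gradient norms and converts the classification margin into an estimated, not certified, L2 robustness radius; `margin` and `grad_margin` are placeholder functions:

```python
import numpy as np

def clever_style_score(margin, grad_margin, x0, radius=0.3, n_samples=512, rng=None):
    """Estimated L2 robustness radius around x0.

    margin(x)      -> f_true(x) - max_{j != true} f_j(x)   (placeholder)
    grad_margin(x) -> gradient of that margin w.r.t. x     (placeholder)
    """
    rng = rng or np.random.default_rng()
    grad_norms = []
    for _ in range(n_samples):
        # Uniform sample in an L2 ball around x0: random direction, scaled radius.
        v = rng.standard_normal(x0.shape)
        v *= radius * rng.random() ** (1.0 / x0.size) / np.linalg.norm(v)
        grad_norms.append(np.linalg.norm(grad_margin(x0 + v)))
    L_hat = max(grad_norms)                    # crude local Lipschitz estimate
    return min(margin(x0) / L_hat, radius)     # margin / Lipschitz, capped at radius
```

The intuition matches the Lipschitz argument raised later in the Q&A: if the margin function cannot change faster than L_hat per unit of input movement, then no perturbation smaller than margin / L_hat can flip the prediction, up to the accuracy of the estimate.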
Attackers can use AI to generate more advanced attacks, and defenders can also use AI to build more robust models, and we are very curious about what the equilibrium of this arms race will be. Lastly, we spent some time discussing how to formally evaluate and improve model robustness. What I have shown is CLEVER, basically an attack-independent robustness score, and I also told you how we developed CROWN and CNN-Cert, efficient ways to provide robustness certificates. I would argue that certified robustness evaluation is better than just running some random attack functions against your model, because that game-based evaluation is not certified and cannot guarantee that your defense generalizes to more advanced attacks; certification can actually guarantee robustness no matter what attack is used. The other good thing is that you can use these certification and evaluation tools to compare your model's robustness before and after you implement some mechanism to make it more robust. Eventually, I think the next thing we are going to focus on, and the trend for the whole research community, is provable, certifiable, and scalable defenses. As you can imagine, there is a lot of fun in this research area, and if you are interested, feel free to contact me or any IBM colleagues you know and join us on this exciting journey. Robustness, I believe, sits at the intersection of human perception, the AI model, and data quality, so this is a very challenging but also very salient problem we need to solve.

I want to make a final acknowledgement here. These works have many amazing collaborators, and many of them, I believe, are on the phone now, especially my IBM colleagues. We have these great collaborations between MIT and IBM and through the AI Horizon Network, so I would like to thank Lisa and David Cox for making this happen, and also the great support from my management. I also want to thank the collaborators who developed the attacks we didn't really look into today, for example poisoning attacks, and who built the Adversarial Robustness Toolbox, Mathieu and his team, as well as Casey and their team for making our demo possible. If you have questions, feel free to reach out to me or find me on Twitter. That concludes my presentation. Thank you.

Thank you very, very much; it was a great talk. Let's open it up for any questions. Please unmute yourself; if you're not familiar with WebEx, move your mouse and you should see a little microphone you can click on.

I have a question. Go ahead. Thanks for your great research. My question is about certification. One thing that has been bothering me about certification is the fact that you are actually certifying a particular image rather than the network itself. Yes. So my question is: do you foresee any way we can get a certificate that doesn't require you to provide an image? I specifically worry about images because, before you deploy a model, you don't know exactly the distribution of the data you are going to receive, so the images may be slightly different. Even though you have a certification for one particular image, the result may not carry over to more general distributions. I'd love to hear your thoughts. Yes.
I think this is a very legitimate question, but there are some difficulties in certifying the model itself at this point, because ultimately these adversarial inputs operate on a per-sample basis. If you look at the decision boundaries, especially when we consider certification against adversarial examples, the decision boundaries of the trained neural network are already fixed, so it is basically impossible to provide a certificate for every input: there must be some inputs that are very close to the decision boundary. But the next interesting question is: when we try to gauge robustness at the level of a model, or say per class, can we somehow create representative samples, such that by certifying those representative samples, the robustness statement can be lifted to the model level or the class level? I think that would be a very interesting question, but at this point I don't think it is an easy task.

Okay, thank you.

Hello, this is Leo Shah from RPI. Thanks so much for the great talk; this is really exciting. I'm wondering if you can comment on some theoretical problems along this direction.

Okay, yeah. There are various theoretical problems I can think of. For example, does robustness really come at the price of accuracy? Empirically, we observe that to make your model more robust, you always sacrifice test accuracy. But is that necessary? And if it's true, what is the theoretical explanation, what is the limit, how do you characterize this theoretical tradeoff? In terms of robust training, this min-max training is already a challenging task by itself, not to mention that we are now bringing min-max optimization to the neural network level: how do you even ensure that min-max training on a neural network will converge? And is there a scalable way of making this min-max training possible? There are also fundamental problems where we're not quite sure what the answers are; for example, why do these adversarial examples exist at all? We don't really know. So yes, there are a lot of open questions, and no matter what your expertise and angle are, you should be able to find a way to tackle this problem.

Sorry, just a quick follow-up comment. It reminds me of smoothed analysis: your input may have some noise, and you are trying to build the classifier to fight against it, so you get some kind of mixture of worst-case and average-case methodology. That's interesting.

Yeah, exactly, that's a very good point. This smoothness actually relates to the Lipschitz constant of the network: if you think of a neural network as a function, then its Lipschitz constant controls how much an input perturbation can be amplified at the output, and that's something we use to evaluate robustness. That's a very nice observation. Thank you.

Okay, we're almost at the hour. I don't want to cut off questions, but for those of you who have to leave, I want to remind you that our next seminar is Thursday, May 2nd at 4 p.m. Eastern Time, given by Yu Chen of RPI, on bidirectional attentive memory networks for question answering over knowledge bases. Okay, any other questions for Pin-Yu? Any other comments? Then I want to thank everybody for attending. This was a really great turnout, even bigger than our first.
And I hope to see you all again next week. Thank you, Brent. Thank you, everyone.