My name is Wenbo, and this is joint work with my teammates: Alex Wadir, who is here, say hi to everyone, and the rest of the team, who are in Silicon Valley right now, Dr. Xinyu Xing and Jimmy Zhu from the JD.com security research lab. Today I'm going to talk about something a little different; it's not about attacks or defenses. As you can see from the title, it's about explanation, which we call an alternative path to securing deep learning systems.

First, as we all know, all learning systems are exploitable. Why do I draw this conclusion? Because people keep demonstrating it; they keep breaking things. I remember one researcher, a PhD student at UC Berkeley, who keeps breaking learning systems one after another. And this next point is not from me, it's from Ian Goodfellow, whom we all know, from Google. He showed a list of proposed defenses, and going down that list, one defense after another has been broken. So it seems very sad, right? But remember, our ultimate goal is building a secure and trustworthy deep learning system. Of course, we can keep working on defenses to make our classifiers more robust. Or we can go down another path: explanation. What we are doing now is interpreting the deep learning model. Given a model and an output, we want to know why the model made this decision. Imagine if we knew the reason behind every decision the model makes: we could make our own judgment about whether this model is trustworthy or not.

Okay, so here is the roadmap of my talk. First I'm going to talk about what model explanation is, then about some existing explanation techniques, then our work, a demonstration and some results, and finally a short summary.

First, what is explanation, or interpretation? This definition is actually given by me, so it's not a formal one, but you can at least get a sense of it: given an input sample, we want to identify a set of important features that make key contributions to the model's decision. For image recognition, it could be a group of pixels. For sentiment analysis, it could be some keywords. For binary function start identification (does anyone here know about binaries? a binary is just a sequence of hex bytes), it could be important instructions, which are hex bytes.

Here is an example. Given this review, you can see it is definitely a negative review. Why do we as humans think it's negative? Because we see the keywords, like "not worth the price." So if a deep learning model, or any other machine learning model, classifies it as negative, what we want to do is find the keywords that drove that decision.
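To make this definition concrete, here is a minimal sketch in Python (the function and the toy importance scores are my own illustration, not from the talk): once you have a per-feature importance score, the explanation is just the top-k features.

```python
import numpy as np

def explain_top_k(features, importance_scores, k=3):
    """Return the k features with the highest importance scores."""
    order = np.argsort(importance_scores)[::-1]  # highest score first
    return [features[i] for i in order[:k]]

# Toy sentiment example: token-level importance for a negative review.
tokens = ["definitely", "not", "worth", "the", "price", "at", "all"]
scores = np.array([0.05, 0.80, 0.90, 0.10, 0.85, 0.02, 0.03])
print(explain_top_k(tokens, scores))  # ['worth', 'price', 'not']
```

The whole question, of course, is where those importance scores come from; that is what the rest of the talk is about.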
Another example is binary function start identification. A little background: a binary is just a sequence of hex bytes, and some of those bytes are function starts. Given the sequence of hex bytes, we want to identify which bytes are actually a function start. The left side here shows the disassembled code for illustration, but the model actually works on the raw binary. The model outputs that this instruction is the function start, the entry point of the function, and the explanation is that the model thinks so because these two instructions carry high importance scores.

From the binary analysts' perspective, this explanation makes sense, because these two instructions, push ebp; mov ebp, esp, are a well-known prologue. It's like a golden rule in this field: if you see this pattern, it is almost certainly a function start. So if we get this explanation, we would say, okay, the model recognizes the right thing, and we can probably trust it more.

Okay, moving on. Explanation is, I think, maybe even harder than defense, because there are so many challenges; I will just list two of them. First, the model structure is very complicated and very large. There are two kinds of existing techniques: white-box and black-box. By white-box, I mean you go inside the model, layer by layer, and see how the information propagates. By black-box, I mean we treat the model as a whole: we just query it, get the output, and use the input-output pairs to derive the explanation. For people like me who work on security, white-box is not that good, for the following reasons. First, model architectures keep evolving. A simple example: for recurrent neural networks, we all know there are different kinds of units, like GRU or LSTM. If we design a method for one type of unit, it may not transfer to the other type, so you need to design one for each, which is time consuming. Second, the model architecture is simply not available in many security applications. A typical example is online malware detection tools: we can only query them, we cannot see their internals. Third, there are other issues; for example, if the model is trained into a saturated region, the gradients may go to zero, so you get nothing from a gradient-based white-box explanation method.

So we switch to black-box. What is black-box? Here is an example. We give an input image, here an orange, to f, which is, say, a deep neural network, and the model tells you this is an orange. What we can do is take another function g, another kind of model, and use it to approximate f. Then, when the image comes in again, g can tell you which parts are important. Here the important features highlight the object, the orange, and this is the so-called explanation. The idea is pretty simple: we just do an approximation. In general, the black-box technique generates explanations by first approximating the model with another, transparent model, and then inspecting it; if you use linear regression, for instance, you can inspect the regression coefficients, rank them, and get the important features.

Here is another example. This f is the nonlinear function computed by the neural network, and we want an explanation for this image right here. We want to approximate the local decision boundary of f, so we fit a linear regression g(x). Then we get these parameters, which are the regression coefficients, we rank them, and we find the top three: these indicate the most important parts.
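As a rough sketch of this black-box recipe (my own simplification with made-up names, not the speaker's exact algorithm): perturb the input, query f as an oracle, fit a linear surrogate g, and rank its coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def local_linear_explanation(f, x, n_samples=2000, sigma=0.1, k=3):
    """Explain f's decision at x via a local linear surrogate g.

    f: black-box scoring function mapping an (n, d) array to (n,) scores.
    x: the 1-D input sample (dimension d) we want to explain.
    """
    # 1. Sample perturbed inputs around x and query the black box.
    X = x + sigma * np.random.randn(n_samples, x.shape[0])
    y = f(X)
    # 2. Fit the transparent surrogate g.
    g = LinearRegression().fit(X, y)
    # 3. The largest |coefficients| mark the most important features.
    return np.argsort(np.abs(g.coef_))[::-1][:k]

# Toy black box: only features 0 and 3 actually matter.
f = lambda X: np.tanh(2.0 * X[:, 0] - 3.0 * X[:, 3])
print(local_linear_explanation(f, np.zeros(8), k=2))  # -> features 3 and 0
```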
But we all know that a neural network is highly nonlinear, and the decision boundary is so complicated that you cannot just approximate it with a linear function; it will run into problems. Here is the example: the linear method just approximates this function with a straight line. You can see it's not a very good approximation, and the explanation, of course, is not good either.

So we tried to do something different: we want a very precise, high-fidelity approximation. We went to the statistics literature and found a mixture model that is actually pretty good. It is called the Dirichlet process mixture regression model, and it is a mixture of linear regressions. I won't go too deep into the details, but the reason we chose this model is exactly that we want a high-precision approximation: if we get a very accurate approximation, we can get a very good explanation.

What we did to this model is add different regularization terms for different network structures. For MLPs and CNNs, we use the elastic net, which gives the approximation model g the ability to handle high-dimensional and highly correlated data; for images, if you're doing ImageNet or something similar, the input has hundreds by hundreds of dimensions. We also do something for RNNs, which take sequential data as input and are widely used in the NLP field: we add a regularization term called the fused lasso. Adjacent features in a sequence, for example neighboring words in a sentence, have certain correlations and dependencies, and the fused lasso gives the model the ability to capture these dependencies within the input sequence.

So, to generate an explanation, first of all we fit the mixture regression model. We fit it with inputs to the deep model and the corresponding outputs: we treat the deep model as an oracle, keep querying it to build our own dataset, and fit our regression model on the data we crafted. There are some further technical details about the fitting, which I will skip. Then we find the mixture component that the input data lies in: if you look here, this is a model with maybe five components, and this one is where the data lies. We pull this component out, take its regression coefficients, and rank them to find the important pixels as our explanation. You can see here the explanation is way better, because we get a much better local approximation.

Okay, so that's pretty much our method; a sketch follows below. There are actually a lot of technical details I skipped, but we have papers, so if you're interested you can refer to them later.
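Here is a heavily simplified sketch of that pipeline (my own toy version: plain EM-style hard assignment and elastic net only, not the talk's Dirichlet process machinery or the fused-lasso variant for RNNs): fit a mixture of regularized linear regressions on oracle queries, then explain with the component that fits the sample best.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def mixture_regression_explain(f, x, K=3, n_samples=3000, sigma=0.1,
                               n_iters=10, k=3):
    """Approximate f near x with K regularized linear regressions,
    then rank coefficients of the component that best fits x."""
    d = x.shape[0]
    # Treat f as an oracle: craft queries around x and record its answers.
    X = x + sigma * np.random.randn(n_samples, d)
    y = f(X)

    # EM-style loop with hard assignments (a crude stand-in for the
    # Dirichlet process mixture fitting described in the talk).
    assign = np.random.randint(K, size=n_samples)
    models = [ElasticNet(alpha=1e-3) for _ in range(K)]
    for _ in range(n_iters):
        for j in range(K):                       # M-step: refit each expert
            if (assign == j).sum() > d:
                models[j].fit(X[assign == j], y[assign == j])
        preds = np.stack([m.predict(X) for m in models], axis=1)
        assign = np.argmin((preds - y[:, None]) ** 2, axis=1)  # E-step

    # Pick the component that best explains x itself, then rank features.
    errs = [abs(m.predict(x[None])[0] - f(x[None])[0]) for m in models]
    best = models[int(np.argmin(errs))]
    return np.argsort(np.abs(best.coef_))[::-1][:k]

# Toy piecewise-linear black box: which feature matters depends on x[0].
f = lambda X: np.where(X[:, 0] > 0, 3 * X[:, 1], -2 * X[:, 2])
print(mixture_regression_explain(f, np.ones(6), k=1))  # -> feature 1
```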
Let me show some results. Here are our explanations of single images from three different datasets: the first is MNIST, the second is Fashion-MNIST, and the third is ImageNet. You can see our method consistently highlights the objects. This one is interesting: we want to explain why the model assigns each of these images to its class, and our method picks out the right region each time.

Going beyond single images, if we have the important features of the images, we can actually do some good things, or bad things. One thing we can do is synthesize some examples. If we just nullify the important features inside these images, each image still looks like the original to a human: this one still looks like a one, and this one like a two. But if you feed these images to the deep learning classifier, it fails to recognize them, because you have removed the important features. The second thing is the opposite: we keep only the most important features. The image no longer looks like what it should be; here it's just a shapeless blob, you cannot recognize it, but the model can still classify it into the correct class, because we kept the most important features. Another thing we did, which I don't show here, is take the important features of a one and paste them onto a two. It no longer looks like a one to a human, but the model still classifies it as a one, because we kept the important features of the one. So this can be another way of generating adversarial samples, and you can use these images to retrain the model and make it more robust. A sketch of these two sanity checks follows.
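A minimal sketch of those two tests (the helper names and the commented model.predict usage are mine, assuming a trained model and an index set top_idx produced by an explainer):

```python
import numpy as np

def nullify_important(x, top_idx, fill=0.0):
    """Remove the explanation's important features from x."""
    x2 = x.copy()
    x2[top_idx] = fill
    return x2

def keep_only_important(x, top_idx, fill=0.0):
    """Blank everything except the important features of x."""
    x2 = np.full_like(x, fill)
    x2[top_idx] = x[top_idx]
    return x2

# Usage sketch, assuming a trained `model` and an explainer's `top_idx`:
#   model.predict(nullify_important(x, top_idx))    # should now misclassify
#   model.predict(keep_only_important(x, top_idx))  # should stay correct
```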
Here is another example, the binary function start identification I mentioned before. The original model is a recurrent neural network with something like 99.9% accuracy, and we explained that network. Here is the hex sequence, where the color indicates the importance score, like a heat map, and you can see the most important bytes right here. If we disassemble this sequence, we get this instruction, which is actually padding at the end of the preceding function. If the first function ends at, say, byte 50 and the next one starts at byte 70, the compiler will insert some padding into the gap, so a lot of functions are preceded by this kind of padding. So of course, when the model sees this kind of padding, it says: I have seen this before, so the next byte should probably be a function start. This makes sense on the model side and also on the binary side. I showed this result to some binary experts, and they said: yes, as humans we also recognize function starts by recognizing this kind of padding. Here is another example, and this one is a well-known prologue: what this instruction does is prepare the stack, which is very common at the start of a function.

Okay, so what does explanation give us? First, trust. If a binary expert didn't believe in deep learning before, and you show them these explanation results, they can probably change their view; I have tried this. Once they see that the model actually learns the right patterns, they tend to accept the model and use it. So the first benefit of explanation is building trust in the model. Another benefit, beyond building trust, is that we can pick up new heuristics. This is also very interesting: experts only recognize a small set of heuristics, and some heuristics the model uses are not among them. After we run the explanation and show the results to the experts, they can verify the patterns, and they may find that a pattern is a new heuristic, which enlarges their knowledge. That is also a cool thing.

So here comes my summary. First, as I mentioned before, defense has a long way to go, because reducing the attack space is pretty hard. An alternative path is explanation, and for security applications we turn to black-box explanation, for the reasons I mentioned before. The benefits of explanation are, first, that we can explain single instances, scrutinize model weaknesses as we showed before, and generate adversarial examples; and second, that we can build trust in the model, gain new knowledge, and maybe generate data to retrain the model to make it more robust and more accurate.

Due to time I will just stop here; that's pretty much all I want to talk about today. If you are interested in our work, you can go to my personal homepage; we have two or three papers related to these topics, and we are also open to collaborations. Okay, thanks. Any questions?

Audience: When you are matching the cats in the picture, are you trying to match any cat or a specific cat?

It doesn't matter, actually; I would say any cat. As long as we generate enough training data to train our approximation model, we can do that. Okay, so thanks.