Okay, so welcome back. Sanjit will give the second talk, which is on verified artificial intelligence. When I got invited to give this presentation, I thought, well, maybe I could give one talk on one thing, like UCLID5, or I could give two talks that would try to highlight different areas where I think there is potential for applications of SAT- and SMT-based technology. And I decided on the second option. So in this talk, what I would like to do is something completely different from the morning, which is looking at this emerging area that I'm calling verified AI, verified artificial intelligence, which is really the opportunities for applying formal methods to AI and machine learning based systems. SAT and SMT have been applied to some problems in this area, but I don't feel it has reached its full potential. Right now, the work that I'll be presenting is less directly connected to SAT and SMT than the previous talk in the morning. However, I will try to highlight where I think we can use SAT and SMT here, and just give you a sense of the opportunities. That's really one of the goals of this presentation. Also, there are a couple of tools and languages that we have created in my group for bringing formal methods to AI. These are the two things that I mention on the slide here. VerifAI, which we pronounce "verify", is a toolkit that we presented at CAV earlier this year. And Scenic is a probabilistic programming language, or modeling language, that we've been using for this class of systems. The motivation for this, which I think we are all aware of, is the growing use of machine learning and AI in all kinds of systems. Many of these are actually safety-critical or mission-critical, and some of them are cyber-physical.
So they involve the integration of computation with physical processes, with physical systems. A good example of a safety-critical cyber-physical system that uses AI is a car with autonomous driving or driver assistance technology in it, and this is already happening today in cars that are sold. You have things like deep neural networks doing object detection, scene understanding, segmentation; that gets fed into a pipeline that involves perception, prediction, planning, control and so on. And the use of this sort of thing is projected only to rise, in things like cars and avionics and medical devices. For example, there's more and more use of deep learning in aspects of medicine. These are all safety-critical, and there have been growing concerns about safety. Many papers have shown that things like deep neural networks can be easily fooled; I'll show a few of my own favorite examples later in the talk. That's one concern. The second concern is that some of this might already have had serious consequences. There was a fatal accident involving an Uber vehicle, and the report just came out a week or two ago from the National Transportation Safety Board in the US saying that one aspect of it could have been the perception software. So these are serious concerns that have to be addressed, and formal methods can certainly play a role. But the question is, from a research perspective, what are the new challenges this poses? Personally, I believe this is actually a very rich space for work in SAT and SMT and, more broadly, formal methods. To understand this, about three and a half years ago some colleagues and I wrote the paper that you see cited up there; it's still on arXiv. Basically, what we tried to do in that paper is understand the challenges and opportunities for applying formal methods to AI-based systems.
So this is the standard diagram for formal verification. You take three inputs: a model S of the system being verified, a model E of the environment in which it's supposed to operate, and the specification phi that the system should satisfy. The question we ask in formal verification is: if I take S and compose it with E, does the resulting model satisfy phi? Right now composition is abstract; the particular form of composition of course depends on the mode of interaction between the system and the environment. And then ideally you want a decision procedure that either says yes, it does satisfy, here is a proof, or no, it doesn't satisfy, and here is a counterexample, one or more counterexamples. So let's try to get an understanding of what this means for something like an autonomous vehicle that's using deep learning. The first thing is the environment. Normally, if I'm doing software verification and I have a program that makes system calls, I have to model how the system calls may behave and what outputs I can get. Or if I'm doing hardware verification, I want to model what the interfaces look like and what inputs I can get. But for something like an autonomous vehicle, the environment has this open-world property: sometimes you don't even know what all the agents and variables of the environment are, let alone how those variables change. This is a major problem, and I'm going to spend the second half of the talk focusing on it, so I'll defer further discussion until then. The second challenge is that the systems themselves are very complex, in a way that we haven't seen before. You have components like deep neural networks that have hundreds of thousands or millions of parameters. People who have worked in circuit verification might say, oh, that's nothing, because SAT solvers can handle circuits with millions of gates.
But the characteristics are different, because the parameters are thought of as reals, even if implemented as floating-point numbers. These are essentially large circuits with a hybrid set of variables, discrete and continuous. Furthermore, they operate on very high-dimensional input. Imagine you have a neural network that's doing object detection, taking in a stream of images. Each image, if you think of it as a vector, is like a million-dimensional vector. So that's one challenge. And then I would say that perhaps the biggest challenge is the specification. The most effective uses of things like deep learning today are in perception: computer vision, natural language processing, those sorts of things. And typically there is no formal spec for what the system should do. For example, I want my neural network to detect all instances of a car. So you give it this picture and ask, is this a picture of a car? In this case, it happens to be a car that was actually created by an engineer who used to work at Berkeley, but it doesn't look quite like a car at first glance. You can't really formally specify this kind of thing; that's the point. Even if you solve those problems and come up with S, E and phi, you still have the standard problem in verification, which is searching very high-dimensional input and state spaces. And sometimes the challenge of doing this can be so hard that you might say, well, why don't we just do a clean-slate, correct-by-construction design? But it's not immediately obvious what that means. We can think of areas like reactive synthesis, where we start with a formal specification and want to derive an implementation from it; this is quite different. So what are the research questions here? The first is, if I want to do formal specification, what does it look like? What formalisms do we need?
The second is, if I have the spec, how do I then get verification to scale up? The third is, how do you model the environment? As I mentioned, it's a very different type of environment than we have in hardware and software verification. And finally, even if you do all these things, at runtime the environment might not be what you had modeled at design time. Unexpected things will always crop up, so you have to be able to design for failures. This is what's called runtime assurance: you have to be able to monitor and then have some failsafe mitigations at runtime. So those are the research questions. From the SAT/SMT perspective, I see at least two opportunities, and if you can think of more at the end of the talk, please do come and talk to me. The first is that in standard verification, we have tended to focus mostly on satisfiability: to find a violation of a property, I ask, does there exist a trace that violates the property? But in this case, I think there's scope for doing more, especially with optimization. We saw a little of this in the morning talk; we can do optimization modulo theories, but I think this is an area where there are many more opportunities for that. The second is going beyond SAT, to problems like model counting, which is counting the number of satisfying solutions of a problem, or the related problem of distribution-aware sampling: I have a space defined by constraints and a distribution over it, and I want to sample from that distribution. We will see a very concrete example of this problem in the second half of the talk. And I want to further stress that what we need is not just this for SAT, but for SMT. That is really the place where we need a lot more advances.
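To make the model counting and distribution-aware sampling problems concrete, here is a minimal brute-force sketch in Python. This is illustrative only: real tools use approximate counting and hashing-based sampling, and every name here is my own invention, not from any actual solver.

```python
import itertools
import random

def count_models(formula, n_vars):
    # Brute-force model counting: enumerate every assignment and
    # count the satisfying ones (feasible only for tiny n_vars).
    return sum(1 for bits in itertools.product([False, True], repeat=n_vars)
               if formula(bits))

def sample_model(formula, n_vars, weight, rng=random.Random(0)):
    # Distribution-aware sampling by rejection: draw assignments
    # uniformly, keep a satisfying one with probability weight(bits),
    # where weight maps each assignment to a value in [0, 1].
    while True:
        bits = tuple(rng.random() < 0.5 for _ in range(n_vars))
        if formula(bits) and rng.random() <= weight(bits):
            return bits

# Toy formula over (a, b, c): (a or b) and not c
f = lambda v: (v[0] or v[1]) and not v[2]
print(count_models(f, 3))   # 3 satisfying assignments out of 8
```

The point of the sketch is just the two problem statements; doing this over SMT constraints instead of tiny Boolean formulas is exactly where the open research is.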
So the outline for the rest of the talk is: the first part will be an overview of some of the basic challenges. I'll give a motivating example of the kind of system we're talking about, I'll talk about specification, and then some initial ideas on how to get verification to scale. The second half of the talk I'm going to spend on environment modeling. In particular, I'll talk about this language called Scenic that we have devised, and then the VerifAI toolkit, which uses Scenic as its modeling language, and I'll give a number of examples and videos of demos of its use. Unfortunately, I was hoping to give a demo on my laptop, but ran into a glitch on my Windows laptop, so I won't be able to do a live demo; do come talk to me if you want to try out these tools or see a demo later. So let me start with an example. This is an automatic emergency braking system that uses deep learning for perception, and I just want to give you the structure of the example. Systems like this are already implemented on vehicles, although not all of them use deep learning. The idea is that you have a camera in front of the car, and the camera is looking out at the road ahead. If it detects the presence of another vehicle or some other object in front, that goes to the controller. The controller is not based on AI or machine learning; it's a standard hybrid-systems-based controller. That controller will actuate the brakes if it detects something in front, and that will cause the vehicle to slow down, possibly. And that will in turn interact with the environment: it changes parameters like the relative velocity between the autonomous vehicle and the other vehicle, the relative distance, and things like that, and that produces new sensor input. So this closed-loop system is the one that we're looking at.
So the goal of the system is to brake when the obstacle is near; you want to maintain a minimum safety distance. In this talk I'll keep this abstract, but if you want to experiment with models of this, we have a number of models that we have created over the last few years. We started with a model in MATLAB Simulink, since we're talking about hybrid systems. And then we have used a number of simulators that people have released for autonomous vehicles, including the Udacity simulator, the robotics simulator Webots, and so on. But the key point is that the neural network is an actual neural network: industrial-scale, state-of-the-art networks that people have used, and we have tried many of them. For those of you familiar with deep learning, there are networks called Inception and AlexNet that emerged six or seven years ago, trained on this big dataset called ImageNet. More recently, we've been experimenting with networks targeted specifically at autonomous driving, trained on datasets that are also for autonomous driving: SqueezeDet, YOLO, those sorts of things. So the first challenge is formal specification. What is the specification for something like a neural network component that is supposed to detect obstacles like cars, pedestrians, bicyclists? How do you even start on this problem? Our take on it is that, at least, I don't know a way of formalizing the perception task, so I'm not even going to try. Instead, I'm going to look at the whole closed-loop system and write a safety property for that system. I'll give you a very concrete example in a few minutes. But the idea is that you start at the system level, you specify the property over the whole system, and from that you flow down constraints onto the interface of the neural network. So we're not really trying to answer this sort of question.
Because in many cases, in many of the most impactful applications of neural networks, verifying the neural network object detector by itself is not meaningful. What is meaningful is to verify that the car that uses the neural network always stays safe. So if this is my end-to-end system, what I'm going to do is write a temporal logic property, which I've simplified and written somewhat informally here, saying that it is always the case that the distance between the ego vehicle (the ego vehicle is the autonomous vehicle) and any object in the environment is at least a certain threshold delta. So while the car is running, it maintains a certain minimum distance from anything in its environment. Of course, the threshold will vary depending on which part of the world you're driving in. In Mumbai, maybe it's a few centimeters, and in the US it's a few feet. But that's the typical thing. The key point here is that the property is not over the inputs, outputs or hidden layers of the network. I'm writing a property over the variables modeling the vehicle (position, velocity, things like that) and the variables modeling the objects in the environment. And if I have a model in MATLAB, for instance, I have variables for all these things, and the property is well specified. Now, that seems like almost punting on the problem, because I wrote the property for the whole system. So a natural question is, do we ever need to formally specify properties on the network itself? And the answer is yes, of course, we do need to do this sometimes, and we have a paper about this if you're interested; it was at ATVA 2018. There are two cases. One is where the component specifications are really meaningful, for instance where you use machine learning for control.
There, the inputs to the neural network are things like the current position, the current orientation, the current velocity, and the output is the throttle: what the vehicle is going to do next, the next action. In this case, you can write a meaningful specification on the network itself. Also in some other applications, for instance where you have a neural network making decisions about bank loans or things like that, you can write properties that are meaningful. I'll have a bit more to say about this on the next slide. So there are a number of types of properties you can write. One is called semantic robustness, which I will expand on in the next few slides. Another is that you can just think of a neural network as a program, especially a feedforward network: it takes inputs and it generates outputs, so you can write a pre/post pair on this program, and this can be meaningful in things like control. Or you can write monotonicity properties. For instance, I have a neural network that is making a decision on a bank loan, and one of the inputs is the salary of the applicant. You might say that if the salary goes up, it shouldn't be that the loan is denied if everything else stays the same. So you have some kind of monotonicity property on the output. Or you have things like fairness, and so on. Those are various properties that do make sense, but typically not when the neural network is used for perception. When the neural network is used for perception, there is a different use case, which is when you want to do compositional analysis, and this is what I'll talk about. First I'll talk about semantic robustness, and then this particular use case. Good. Any questions so far? So let me tell you about semantic robustness.
So one of the most common things that you might have heard about when people talk about verification or testing of neural networks is something like this. There are lots of these articles which find yet another way to fool the neural network. The panda becomes a gibbon, the turtle becomes a rifle, the dog becomes an ostrich, the stop sign becomes a yield sign. The idea is that they typically start with an image, make a few small changes, add some noise, change some pixels, and suddenly the neural network is fooled. This is an example of input-output robustness: a small change to the input caused a large, unexpected change in the output. Now, from a formal methods perspective, what I found when I read the literature, and there are literally hundreds of papers on this topic, is that there are lots of variants of robustness; everybody had their own definition of what robustness means. So we said, okay, let's come up with a common formulation of robustness across all of these flavors. This is what appeared earlier this year. So this is the definition of robustness. Capital X is the set of all inputs, so think of it as the set of all possible images, and little x is an element of capital X. You're given a specific image, little x, and you want to find an x star, a modification of x, subject to three constraints. The first constraint is called admissibility, which says that x star should lie in X tilde, a subset of capital X. What this is saying is that I'm only going to allow certain legal or valid perturbations of my input x; this is a hard constraint. The second says that I want the new input to look similar to my previous one in some way. This is typically enforced by a distance constraint: you have a metric mu between x and x star, and you say that mu(x, x star) is bounded by some parameter alpha.
Here we have abstracted this to a predicate D over mu and alpha, because in some cases people have different sorts of constraints on the distance function. So that's the second constraint. The key point is that it's tunable: you can change the value of alpha depending on your need. The third is a target behavior constraint. This is what the adversary wants to achieve, and it can vary. For example, the adversary might say, I just want to flip the label, that's all. Or the adversary might say, I want to maximize the loss function on x star; I want x star to be the worst in terms of the loss function. And this beta is a parameter which quantifies the extent to which the adversary wants to change something like the loss function. So those are the three constraints. This is the decision version of the local robustness problem. Typically, if you read a paper on this, they use an optimization version. Either they'll minimize the perturbation, finding the smallest alpha that will change the output, or they'll maximize the loss, finding the largest beta such that the loss on x star exceeds the loss on x by beta. That's basically local robustness. And what we found is that pretty much all the definitions of robustness in the literature fall into this formulation; they just have different predicates D and A, a different set X tilde, and maybe a different distance function. I don't expect you to read this slide, but it categorizes about 20 papers in the literature from 2013 to 2018 based on the different flavors of these three kinds of constraints. So that is what is called local robustness, and it's what people typically study. But I was never very happy with this, because, so what? I can change my stop sign to a yield sign, but is that really going to cause the car to crash?
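As a sketch of how the three constraints fit together, here is the decision version in Python. This is my own illustrative encoding of the formulation described above, not code from any paper; the toy classifier, metric and target are all made up.

```python
def is_adversarial(f, x, x_star, admissible, mu, alpha, target):
    # x_star witnesses non-robustness of f at x iff it satisfies:
    return (admissible(x_star)          # (1) admissibility: x* in X-tilde
            and mu(x, x_star) <= alpha  # (2) distance predicate D(mu, alpha)
            and target(f, x, x_star))   # (3) target behavior, e.g. label flip

# Toy classifier on 2-D inputs: the sign of the first coordinate.
f = lambda v: 1 if v[0] >= 0 else 0
linf = lambda u, v: max(abs(a - b) for a, b in zip(u, v))  # L-infinity metric
flip = lambda f, x, xs: f(x) != f(xs)                      # label-flip target

x, x_star = (0.05, 0.3), (-0.05, 0.3)
print(is_adversarial(f, x, x_star, lambda v: True, linf, 0.2, flip))   # True
print(is_adversarial(f, x, x_star, lambda v: True, linf, 0.05, flip))  # False
```

Tightening alpha from 0.2 to 0.05 is exactly the tunability mentioned above: the same perturbation stops counting as adversarial once the distance budget shrinks.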
In most cases probably not, because people build these systems to be robust. Even in that automatic emergency braking example I talked about earlier, we have not only a camera but also radar in the vehicle, and it uses both. So you want to find adversarial perturbations that have a system-level impact. There are a couple of questions. One, can the environment really make these mutations? I'm recoloring a few pixels in the image, but can that sort of thing really happen in the environment? Second, why only look for small mutations? Maybe I make a huge change in pixel space: I take a red car and replace it with a blue car. I still want the output to be the same, that it's a car, even though lots of pixels have changed. And the third is what I said: what is the system-level impact? First, let me talk about the first two and what formal methods can bring to them. This is what we call semantic adversarial analysis, and if you're interested, we had a paper at CAV in 2018 about this idea. The general idea is that a neural network for perception starts from a concrete input space, the image space, and produces some outputs, and typically the outputs can be a vector. If you're taking an image and drawing bounding boxes around the cars, then the output is actually a vector of all the bounding boxes along with their coordinates. What we're really interested in is that these images are produced from some underlying semantics of the world. If I have a renderer or a simulator, and that simulator has a 3D model of what is in the world, then I think of that semantic feature space as S. These are things like the position of the car, the model of the car, the color of the car, the time of day, the surrounding environment, and so forth. The renderer produces pixels, and then the neural network classifies those.
So what I would like is robustness with respect to the semantic feature space. If I take two vectors s and s prime in this space, and s is rendered to x and s prime to x prime, then x and x prime can be quite far apart, but I still want the outputs to be similar. Similar vectors in the semantic space lead to similar outputs; that's the idea. It turns out this is the kind of problem we want to solve, and it's hard to solve in general because of the renderer. You don't typically know what the renderer is doing; it can be a very complicated physics-based rendering engine, in practice a spaghetti jumble of software. But what's interesting is that in the graphics and vision communities, people are coming up with very nice ways to build differentiable renderers. The reason that's useful is that, if you've been following the deep learning literature, a lot of the work on training is possible because of algorithms like stochastic gradient descent and its variants, which all rely on f being differentiable. If r is also differentiable, then to search for adversarial examples you can take a step with respect to the derivative of the composition of r and f. This is what we actually did recently. We have a paper on arXiv about this, and I won't go into the details, but we have a semantic representation, and using it, this is one image that we started with: we were able to change the color of this car, as you can see down here, and also the time of day, so the shadows look a little different. In the top image the orange car is detected, but in the bottom image the recolored car is completely undetected. So this is a way of producing these semantic perturbations. Good.
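To illustrate the idea of taking gradient steps through the composition of renderer and network, here is a toy sketch. The quadratic "renderer" and "detector" are stand-ins of my own invention, and finite differences stand in for a real differentiable renderer; the actual work uses far richer models.

```python
def grad_fd(loss, s, eps=1e-4):
    # Finite-difference gradient of loss over the low-dimensional
    # semantic feature vector s (position, color, time of day, ...).
    g = []
    for i in range(len(s)):
        up = list(s); up[i] += eps
        dn = list(s); dn[i] -= eps
        g.append((loss(up) - loss(dn)) / (2 * eps))
    return g

render = lambda s: [2.0 * s[0], s[0] + s[1]]          # stand-in renderer r
f = lambda x: -(x[0] - 1.0) ** 2 - (x[1] - 1.0) ** 2  # stand-in detector score
loss = lambda s: -f(render(s))  # adversary's loss: drive the score down

s0 = [0.4, 0.4]
s = list(s0)
for _ in range(10):             # gradient ascent on the adversary's loss
    g = grad_fd(loss, s)
    s = [si + 0.05 * gi for si, gi in zip(s, g)]
# The detector score on the rendered image has dropped.
```

The search runs over the few semantic parameters rather than a million pixels; that dimensionality gap is the whole point of working in the semantic space.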
So the opportunity here for the SAT/SMT world is that if you can integrate the kinds of search we do in SAT and SMT with the optimization-driven search that the adversarial example literature does, then I think we can do proofs, not just find adversarial examples: proofs of semantic robustness. So let's talk about the second part, which is, do adversarial examples have system-level impacts, and how do you find those? Here I'm going to go back to the notion of a system-level specification that I talked about. And this is linked to the whole question of scalability, because now I have this closed-loop cyber-physical system, which is a hybrid system with both discrete and continuous variables, and I have fairly complex software running, as well as nonlinear differential equation models of the vehicles and so on. How do I actually get things to scale up? Our approach has been, first of all, compositional analysis, and secondly, simulation-based verification, which is also called falsification. Okay. So here's the problem. You start with a closed-loop system like this one. This is what I showed you, except I've labeled all the arrows with variables. In the hybrid systems community there's a lot of work on verifying systems that look like this: no sensor, no neural network, but a model of the environment, a model of the controller, and a model of the plant, and there are tools for reachability analysis and simulation-based verification for that. Now, if you take something like this and try to apply those tools, it just doesn't work. The reason is that the dimensionality of the image space, the input to the neural network, is much, much bigger than the semantic space that it models, and those tools operate over the semantic space.
So we're talking about going from searching over a few tens of dimensions to a million dimensions; it's a much bigger input space. When we started doing this work three years ago, the approach we took was threefold, and you'll see this is standard stuff for formal methods. First, abstraction: we take this model, which we call CPSML, a cyber-physical system with machine learning, and reduce it to a combination of analyzing a machine-learning-free model and an analysis of the machine learning component by itself. Then, for the CPS model, we use temporal logic falsification, which is a very scalable technology that is already used in production on different kinds of systems. And for the third part, we do the semantic feature space analysis of the machine learning component. Let me tell you very briefly, in one slide, what simulation-based falsification is. The idea is that you start with a specification in something like temporal logic; you write a logic-based specification, but then you turn it into a cost function. The key is to use temporal logics that have so-called quantitative semantics. Typically, for a trace, you say that the formula is either true or false, so it's Boolean. But with these logics, you can assign not only a Boolean but a real number which says how true it is or how false it is, how well it is satisfied or not. I'll give you an example. Here's a simple metric temporal logic formula: globally, in time zero to tau, the distance between the vehicle and an obstacle is at least delta. There's been lots of past work on these kinds of metric temporal logics, including by people like Professor Pandya in the room, so there's a nice literature on this topic.
What you can do here is take this temporal logic property and compile it, in a syntax-directed fashion, into a cost function. Basically, look at the innermost part: you take the predicate and turn it into a function that's the difference between the left-hand side and the right-hand side. For this particular inner expression, distance minus delta: if it is positive, the predicate is true, and if it is negative, the predicate is false. So the sign of this function corresponds to the satisfaction of the predicate. And then, to say that it globally holds over the interval, you turn that into an infimum: if the infimum falls below zero between time zero and tau, then you know the property has been violated at some time. What does this allow you to do? It allows you to turn verification into an optimization problem rather than satisfiability: if I can minimize this cost function, and the minimum falls below zero, I've found a bug. I think this general principle can be used not just for this class of systems but for other systems as well, wherever you can do this sort of encoding from logic to cost functions. So that's good. Now we have this cost function, and we can use it to drive the search for bugs. The way this is actually done in practice, for this class of systems, is numerical optimization, what's called black-box optimization, also known as gradient-free optimization. You don't know the gradient of the cost function because you don't have a model of the system; you're just running code, for instance. But you try to approximate it by doing heuristic search. So that's good, but you still have this challenge of a very high-dimensional input space.
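Here is a minimal sketch of that whole loop in Python, with a hypothetical toy braking model and plain random search standing in for a real simulator and a smarter black-box optimizer; every constant below is invented for illustration.

```python
import random

def robustness(trace, delta):
    # Quantitative semantics of G_[0,tau](dist >= delta) over a sampled
    # trace: the infimum of (dist - delta). Negative means a violation.
    return min(d - delta for d in trace)

def simulate(v_rel, d0, steps=50, dt=0.1, decel=6.0):
    # Toy closed loop: constant closing speed v_rel until the obstacle
    # is within 20 m, then brake at decel m/s^2. Returns the distance trace.
    d, v, trace = d0, v_rel, []
    for _ in range(steps):
        if d < 20.0:
            v = max(0.0, v - decel * dt)
        d -= v * dt
        trace.append(d)
    return trace

def falsify(delta=1.0, trials=500, rng=random.Random(1)):
    # Black-box falsification: minimize the robustness value by random
    # search over the initial relative speed and distance.
    best = None
    for _ in range(trials):
        v0, d0 = rng.uniform(5, 30), rng.uniform(10, 60)
        r = robustness(simulate(v0, d0), delta)
        if best is None or r < best[0]:
            best = (r, v0, d0)
    return best

r, v0, d0 = falsify()   # r < 0: a falsifying initial condition was found
```

Production falsifiers replace the random search with gradient-free optimizers such as simulated annealing or CMA-ES, but the structure, spec to cost function to minimization, is the same.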
And in particular, if we want to do compositional verification, the standard approach is to take something like this, break it up into its components, write specifications for all the components, and then verify it. But the challenge here is that one component doesn't have a formal spec. We want to do compositional verification, but we don't have compositional specifications. That's the specific challenge here, and it's the first time I came across this situation in the formal methods world. There's a little tech report I wrote about it if you're interested. So how do we solve this? The approach is to start with our model, do abstraction, and replace the neural network with an abstraction. And in fact, what we found is that even very simple abstractions give us a lot of mileage. For instance, we can do two abstractions of the neural network, which we call an under-approximation and an over-approximation. Under-approximation means I take my neural network and replace it with a perfect classifier. Because I have a model of the environment, I know the ground truth: I know whether there is a car or not, I know where the car is, et cetera. So I can just replace my actual neural network with the perfect classifier. The over-approximation, on the other end of the scale, is the worst classifier, which always gives you the wrong answer. Again, you can construct this because you know the ground truth from your environment model. So you can build the over-approximation and under-approximation by just replacing the neural network. And since this now bypasses the sensor, you've also gotten rid of the high-dimensional input to your system, and it becomes a standard CPS verification problem.
And we have an algorithm to construct what is called the region of uncertainty. The region of uncertainty is basically the set of all valuations of the state of the environment, the controller, and the actual system in which, if the neural network made a mistake, that is, if its answer flipped from the perfect classifier to the wrong classifier, that would cause the safety property to be violated. I'll give a concrete example in the next slide. Basically it's the subset of the semantic feature space where the decision of the machine learning component matters. So we take this, and remember this is over those variables, and we project it to the actual feature space, the input space of the neural network, which is the pixel space. So for instance, if I know that my neural network fails when the car is five meters away, then, and this requires some access to the renderer, I can figure out the region of the image where the car is going to be, what region of the road, and I can focus all my perturbations in that region. So that's what we do: we can then do a local search over the pixel space there. Once we do that, we combine the inputs from both these sides and do a full simulation. And if you find a counterexample in that full simulation, full simulation meaning the whole system is modeled, then it's a true counterexample. Otherwise we do refinement. What that means is we found a spurious counterexample: something went wrong here because we were using an approximation that was too coarse, so we're going to go ahead and refine it. Okay, so I'll give you an actual example. This is the emergency braking system example. Here we kept it very simple: the environment was simply a straight desert road, and here it's driving on the right, and there's exactly one car in front of the autonomous vehicle, this one. 
And the only environment parameters are going to be the relative distance between the camera and that car, and the relative velocity between the two vehicles. So what you're seeing here is a diagram of all possible combinations of relative distance on the y-axis and relative velocity on the x-axis. And this is for the case of the under-approximation, where the neural network, the machine learning classifier, is perfect. Any point in the green region is a combination of relative distance and relative velocity where it's safe, and the red region is unsafe. What this means is: if I was already very close to this car and going very fast, it doesn't matter that my network detected it; I'm already going too fast to brake safely. That's what this region means. In the over-approximation, obviously, when the network makes more mistakes, more things become red. But what's interesting is that not everything becomes red, right? There are various other sources of robustness in the system; this particular one had a radar that helped. So we take a diff of these two images, and that gives you the yellow region. The yellow region is the region of uncertainty: all combinations of initial relative distance and relative velocity where a mistake by the neural network matters. And in this case, we ran this using Inception on this particular scenario, and there were three semantic features we changed. One was x, the lateral position of the car, so horizontally on the road this way; then this dimension is the z dimension, which moves the car towards the camera or away from it; and the vertical dimension is the brightness of the image. And what we see here is what we get from the component-level analysis: every x mark is a combination of these three parameters where the neural network misses the car. 
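The yellow region can be computed as a set difference between the two abstracted analyses. A minimal sketch with a closed-form toy braking model (my own simplification, not the dynamics of the actual controller):

```python
def safe_with_perfect(dist, vel, decel=4.0):
    """Under-approximation: braking starts immediately, so the state is
    safe iff the stopping distance vel^2 / (2 * decel) fits in the gap."""
    return dist > vel * vel / (2.0 * decel)

def safe_with_worst(dist, vel):
    """Over-approximation: the inverted classifier never brakes for a
    real car, so any positive closing speed eventually collides."""
    return vel <= 0.0

# Region of uncertainty: states where the two abstractions disagree,
# i.e. where a mistake by the neural network actually matters.
# Close-and-fast states are unsafe either way; they are excluded.
region_of_uncertainty = [
    (d, v)
    for d in range(5, 50, 5)      # relative distance, meters
    for v in range(0, 15, 3)      # relative (closing) velocity, m/s
    if safe_with_perfect(d, v) != safe_with_worst(d, v)
]
```

The projection step in the talk then maps each such (distance, velocity) pair, via the renderer, to the pixel region where the car would appear, focusing the image-space search.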
And what we see is that there are lots of mistakes where the brightness of the image is high and the car is far away from the camera. And you get things like that, where it turns out it misses the car, but as it gets closer, it finds it; so that's not a problem. But here's one example which is a corner-case image: it's like one red cross in a sea of green crosses. And it turned out to be this image. Remember, here we're driving on the right, so this car is going away from the ego vehicle, but it's going slightly on the wrong side of the road, and it's close enough that, in simulation, it will violate the safety property. So this is the kind of analysis we can do by using this compositional framework, because otherwise the space of possible images is too large; you can't possibly hope to explore it. But if we shrink it and focus only on the part that is relevant at the system level, then we're able to do this. Later, that was almost three years ago, we extended it to streams of images. So this is a very similar thing, where every green dot is a position where, if you place a car, it's detected, and every red dot is where it's not detected. And you can see, somewhat alarmingly, that there are a lot of red dots close to the front bumper. And this is an actual neural network that has since become a commercial product. So even state-of-the-art techniques for designing these things have issues. And what we can do is generate lots of counterexamples, all the red dots are counterexamples, and basically superimpose them over an image and show it to a designer and say, that's your blind spot. So in this case, all the cold colors are blind spots: these are all places where, if we put a car there, the network would miss it. So that sort of analysis can be useful. 
So I think the summary here is that there is really a role for formal methods to play in finding these sorts of issues, even with very state-of-the-art industrial machine learning components. But you have to help it by doing abstraction and compositional analysis. Okay, so that's the end of this part of the talk. Any questions? Yeah, Supratik. [Question:] Do we know why there are those red dots in the sea of blue dots? So you mean this one or the previous one? [The previous one.] The previous one is a real puzzle. I still don't know why that's happening; it seems to be very nonlinear behavior of the neural network. But we have observed this kind of thing again and again, and I'll come to another example of this. Sometimes it just has these cases where it's hard to give an actual explanation for why something like this is happening. This one, on the other hand, we had an explanation for. The idea here is that you're having a lot of misdetections near the front bumper, and we realized that this is because the network was trained on an actual data set collected by driving around, and the people who were driving around were not driving too close, for safety reasons. So the data set just didn't have images where the car was close to the front bumper. We've also done, I won't talk about this today, counterexample-guided retraining: you can generate these counterexamples, add them back to the data set, and retrain, and then you can make the system more robust. In our CAV 2018 paper, we have an example where we did this, and it did get more robust; it's just that there were still collisions, but they were at low velocity. [Question:] ...rather than raw pixel data? Because raw pixel data is very, very unreliable; you just change a few pixels and you get a different label. 
So the question, just to repeat it, is: does it make sense to design a network that works on the semantic space? I think that would be the ideal, the holy grail. But the problem is that that's what the neural network is trying to do. Its job is scene understanding: it's trying to take raw sensor data and reconstruct the scene from that. I think what can be done to help, and some people are doing this, is the following. I heard a talk recently where, at an autonomous driving company, they said they're trying to design the neural network so that its intermediate outputs are more meaningful, so you don't have to think of it as a black box that goes from pixels to decisions. That will definitely help. But otherwise, even if you consider how humans perceive, I think it's very hierarchical: you look at objects in bulk and then maybe go into details, rather than the other way around. That's an excellent question, and I think we need more expertise from people on the neural network side to tell us if something like this can be done. Okay. All right. So I'm going to switch to the second part of my talk, which is about environment modeling. There are a number of challenges with environment modeling, and in this paper we sketched out three strategies to deal with them. The first is data-driven; the second is using probabilistic methods; the third is a technique called introspective modeling. I'll tell you more about all of these in the next slides, but I'll spend most of my time on the probabilistic aspect. So what is environment modeling? Independent of this particular domain, even for non-AI-based systems, it's basically knowing your assumptions, writing down your assumptions. 
Because, in truth, no system is formally verified in the absolute; you're always making some assumption about its operating environment. In my morning talk I talked about security, and security is always with respect to an attacker model; that's an assumption. So in this case, the task of environment modeling can be quite difficult, depending on what sort of assumptions you're making, or what's unknown. I'm going to continue with this autonomous driving application, but this is more general than that, of course. The first level is: imagine that you know all the other agents in the environment. This is an application called platooning. In this case, you control all the cars: you're designing that system of five cars, the communication that goes between the cars, the sensors, all of it. So this is a system that you are completely designing. And let's assume the environment has nothing else in it: there's no animal that's going to cross the road or anything like that. It's a very controlled environment, but still, at runtime, you don't have absolute control over the speed, and you may not know, say, the coefficient of friction of the surface and things like that. So there are some parameters that are unknown, and you have to make assumptions about them. That's one kind of uncertainty, one level. One level higher than that is when not only do you not know the parameters, but you don't know the behaviors of the agents: what is the model of how they change their state or make decisions? A good example of this is modeling humans in the environment, which is something you have to do for these sorts of systems; you have to model a pedestrian. 
You need to know how that pedestrian is going to move; you need to predict the trajectory. Similarly, for other human drivers, you have to be able to model what they're going to do. And in this case, it's not like you have a finite state machine or a Markov decision process model of these things. And the third level, one level higher still, is when you don't even know what all the agents in the environment are. So this is like a crowded traffic environment: you have lots of cyclists, lots of cars, and you don't know, at any point in time, how many cars and cyclists are going to be there, where they are going to be, et cetera. We have been trying to chip away at all these levels; I'll talk only about the first one. For the first one, we've found that probabilistic programming and probabilistic reasoning can be a very effective way to deal with this sort of uncertainty, and I'll tell you more about that in the next several minutes. For the second one, since we don't have models of how these agents behave, we felt it's a good strategy to try to learn those models from data, and to learn not only passively but also actively. So we have work where we put people in a driving simulator, put them in different situations, and see how they behave; and when the system is running, it actively takes actions and observes how the human agents respond to those actions. And the final one is the hardest. Our approach to that is called introspective environment modeling; I have a paper at the Runtime Verification conference, which happened earlier this year, that describes this in detail. The idea is really extracting the minimum set of assumptions that the system has to make about the environment, assumptions it can monitor with the sensors available to it. 
So, basically, algorithmically generating the assumptions that will keep it safe. All right. But now I'll tell you about the first one: how do you model an environment where the main source of uncertainty is parameter uncertainty? I'll take this example again of autonomous vehicles. Imagine you have to model something like that bumper-to-bumper traffic situation. Even this has a lot of variation associated with it. So what we're going to do is define the term scene as the configuration of objects and agents in 3D space, along with the behaviors and features of all of those objects and agents. So the term scene, as I'm using it here, represents the 3D configuration along with behaviors. Now, the way people typically try to do this is to collect lots of data, drive around, collect images and video, and then train on that. But this is very expensive. And if you have been following the literature, a lot of companies invest lots of time and money in simulation; almost every company has its own simulator, sometimes multiple simulators, to be able to do this. So that's good. But the problem is, if you just do random simulation, you'll read articles that say, we drove billions of miles in simulation. That's meaningless, because what matters is what you actually simulated, right? And we all know this in formal methods. So what we want to do is generate useful, interesting, realistic scenes. Let's take a very simple example: just an image of a car. Even that has a lot of variation to it. The model of the car could change, the location could change, the background, the time of day, the weather; lots of things can change. 
So even the scene of a car in the environment has a lot of variation to it, and it's very high-dimensional. But even though it seems like there's so much diversity, there are constraints. The car has to be on the surface of the road or a parking lot or whatever; there are physical constraints that are there implicitly. And furthermore, from a design or verification perspective, I might want to put other constraints on it. I might say I'm interested in highway scenes; I want to analyze the system in that context. So the question is first one of representation: how do we represent this sort of distribution over scenes? And once you represent it, how do you systematically generate data from that representation? Our approach to this problem is something we call scene improvisation, which is a way to generate synthetic data. Given a renderer or a simulator, we can answer requests like: create images to train this neural network. And you'll have different types of constraints. For example, you might say that objects should not intersect on the road; that the scene should look similar to real-world traffic; and yet you want a large diversity. Diversity can sometimes be expressed in terms of the way in which you sample from that space; I want to sample uniformly at random, for instance. The reason we call this scene improvisation is that it's actually an instance of a more general problem that my student Daniel Fremont did his thesis on, where you have hard constraints like this one, soft constraints, which are some kind of distance or quantitative measure, and a randomness or distribution constraint, and you want to generate samples that satisfy these three kinds of constraints. 
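A minimal sketch of that three-constraint generation problem, with toy one-dimensional "scenes" and thresholds of my own (real improvisation schemes enforce soft constraints probabilistically, with guarantees, rather than by outright rejection):

```python
import random

def sample_scene(rng):
    """Randomness constraint: positions of two cars drawn uniformly
    on a 100 m stretch of road (toy one-dimensional scene)."""
    return {"car1": rng.uniform(0.0, 100.0), "car2": rng.uniform(0.0, 100.0)}

def hard_ok(scene):
    """Hard constraint: the cars' 5 m bounding boxes must not intersect."""
    return abs(scene["car1"] - scene["car2"]) >= 5.0

def soft_ok(scene):
    """Soft constraint: a quantitative measure of realism; here,
    'looks like traffic' means a gap of at most 20 m."""
    return abs(scene["car1"] - scene["car2"]) <= 20.0

def improvise(n, seed=0):
    """Generate n scenes satisfying all three kinds of constraints,
    here simply by rejection sampling."""
    rng = random.Random(seed)
    scenes = []
    while len(scenes) < n:
        scene = sample_scene(rng)
        if hard_ok(scene) and soft_ok(scene):
            scenes.append(scene)
    return scenes
```

Every generated scene respects the hard and soft constraints, while the underlying uniform distribution supplies the diversity.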
But in this particular setting, we are going to use a probabilistic programming language to represent these three kinds of constraints all together in one representation. Scenic is the language. It's a domain-specific probabilistic programming language, and a Scenic program, I'll give you examples very soon, defines a distribution over scenes. A scene is a concrete instance of a 3D configuration of objects and agents, and a Scenic program represents a set of scenes along with an underlying distribution. So, for example, bumper-to-bumper traffic is about 20 lines of Scenic code. Scenic is an embedded DSL in Python; the implementation is on top of Python, but the language definition itself is very formal, and we presented it at PLDI this year. So it has well-defined semantics for defining these sorts of constraints, with constructs to represent common geometric relationships and other things. And here's one of the connections to SAT and SMT: what you want to do is represent these kinds of constraints as a Scenic program, and then, to generate data, you want to be able to do distribution-aware sampling from the space it defines. So here's one use case for Scenic. You start with a Scenic program, generate lots of scenes, run them through a simulator, and get data to test your neural network. You can also train it using this data, and use the data to augment real data that you collect. And then if you find a bug, for example, this is an actual image of a car where the neural network thought it was three cars, so you see three boxes here, then you can use that and try to debug it. You start with a Scenic program, take counterexamples, do various kinds of debugging, and specialize the Scenic program to say: this is the subset of scenarios on which the neural network fails. 
Now, probabilistic programming is actually a well-studied area; there's lots of work on probabilistic programming languages, and we are really building on top of a lot of it. The semantics of Scenic is really that of a standard imperative probabilistic programming language, and people have even used such languages in graphics. But the main difference is that Scenic is domain-specific, and in fact we have made syntactic restrictions in Scenic that make sampling more efficient. Also, a lot of the work in probabilistic programming is on inference: you define a probabilistic program and then you want to compute the probability that some assertion holds, whereas our emphasis has been mostly on generation. And, and this is more of a qualitative difference, there is also work in graphics and ML on doing these sorts of things, but it's typically not based on a program representation. We feel that a program representation gives more control and is easier to interpret; it's easier for the neural network designer to know exactly what the probabilistic model is, what assumptions are being made. So let me tell you, step by step, what Scenic looks like. Imagine we want to describe the scenario of a badly parked car: a car that is parked at a funny angle at the curb, like this one. Here's how we would define the Scenic program. The first thing to note is that Scenic is agnostic to how you produce the images. These particular images were generated using the Grand Theft Auto V (GTA V) game engine, but you can use any renderer of your choice; in fact, we now have Scenic interfaced to about five different simulators. So the first thing is to say what the backend renderer is going to be. And in particular, Scenic is not restricted to autonomous vehicles and traffic scenes. 
I'm showing you examples of that, but you can talk about anything in Scenic. It doesn't even know what a car is until you import it from the library that you use to interface to the backend. That's what the first line does: from the interface library to GTA, import the notions of a car, a curb, and the road direction. Now, the key thing is these are all objects, but they are random variables. A car is a vector of attributes: the model of the car, the color of the car, the typical size of a car on the road, things like that. We define default distributions for these, which one can change, but they depend on the renderer. Then you say: I'm going to describe a scene, but from the viewpoint of an ego vehicle. In this case, the ego object from whose viewpoint the scene is being generated is itself a car. The reason this is interesting is that if the ego vehicle starts to move, the viewpoint will change. The next thing we do, so that's the ego vehicle, this is its schematic, and this is the view cone of the camera, is pick a spot on the curb region, and it has to be visible from the ego object. The way we do that: the red part is the curb, and what I've outlined in black is the visible part, where it intersects the view cone. So you have a curb region, think of that as a union of polygons, and you have the view cone, which is another polygon; you have to do an intersection of polygons, and you get this. Then we say the spot is an oriented point; an oriented point is a location along with an orientation. And that's the spot. Now, so far I have not said anything explicit about the distribution, so everything defaults to uniform random. 
So it's basically saying: intersect the polygons, then sample a point uniformly at random, and sample an orientation uniformly at random. All right. Next, this is the spot on the curb near which I'm going to place the car, and I'm now going to pick a bad angle for the car. This says: pick an angle that is between 10 and 20 degrees inclined from the curb. That's an interval; pick from it uniformly at random, and then multiply by either plus 1 or minus 1, so the car is turning either this way or that way. That's another random variable, the bad angle. Then you say: place a car, and it picks a random car. Place it left of that spot, offset by a vector, this is x, y, so 0.5 meters that way, facing the bad angle. In Scenic, every object has a local coordinate system, so you have to say that the bad angle is relative to the road direction, not with respect to the car's own local coordinate system. So that's the Scenic program; that's the entirety of this badly-parked-car scenario. And every time you run the Scenic program with the GTA backend, it'll generate images like this. The key is that it samples in a way that satisfies the constraints, but otherwise samples according to the distribution; if that is uniform at random, you get a large diversity, and the backgrounds and other things will change quite a bit. All right, good. So from a language point of view, one of the things that is new in Scenic is this notion of specifiers. If you look at the Scenic code I showed you earlier, it looks like Python, but with English-like, readable constructs. We designed it that way so that it's easy for people familiar with Python to learn the language, but also easy to read for people who are not programmers. So you have these things like 'at', 'facing towards', 'left of'; these are all operators. 
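In plain Python, the sampling semantics of that badly-parked-car program look roughly like this. This is a sketch with a made-up straight curb along the x-axis; the real implementation intersects curb polygons with the ego's view cone, and the names below are illustrative, not Scenic's API:

```python
import random

def sample_badly_parked_car(rng):
    """Mimics the Scenic program's sampling: pick a spot on the visible
    curb, a bad angle of 10-20 degrees to either side, and place the car
    0.5 m left of the spot, facing that angle relative to the road."""
    # spot: an oriented point on the (here, x-axis-aligned) visible curb,
    # sampled uniformly at random by default
    spot = {"x": rng.uniform(0.0, 30.0), "y": 0.0, "heading": 0.0}
    # badAngle: uniform over [10, 20] degrees, times plus or minus one
    bad_angle = rng.choice([1.0, -1.0]) * rng.uniform(10.0, 20.0)
    # "left of spot by 0.5", facing badAngle relative to the road direction
    return {"x": spot["x"], "y": spot["y"] + 0.5,
            "heading": spot["heading"] + bad_angle}

car = sample_badly_parked_car(random.Random(0))
```

Each call produces one concrete scene; running it with a renderer backend would then turn that scene into an image.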
And they specify different aspects of other variables, like position and heading and so on. The other thing specifiers help with is capturing dependencies. When you do sampling, you have a number of random variables, so you have some underlying probabilistic model; think of every random variable as a node, with dependencies between these variables. For instance, if I have to place a car left of the taxi, then the taxi is a random variable, and I have to sample it first before I can sample the location of the car. So basically, using specifiers, we're able to construct this graph and then do the sampling in the right order to generate the scene. So it helps with capturing dependencies. The other aspect of Scenic is that there are underlying geometric constraints, but they are fairly complex. I have worked with SAT and SMT for quite a long time, and I would love to use SAT and SMT in this domain as well, but the kinds of constraints we have here are highly nonlinear constraints over domains with very different types of variables. So what we decided is: before we bring in SAT and SMT, which are fairly heavyweight, let's try something simple. In fact, we just tried rejection sampling. Rejection sampling means you sample a point in the space and then reject it if it doesn't satisfy the constraints. Now, rejection sampling can be quite bad if your space is narrow: you may end up rejecting a lot of points and never really reaching a particular region, and it can take a lot of samples before you get a useful point. So we have a number of domain-specific strategies to help with this. Here's one example. This is a map of some region of Berkeley, and here is a very simple set of constraints from a Scenic program: you have a car and you have a taxi, and then you have these two constraints. 
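The dependency-ordered sampling that specifiers enable can be sketched like this, with toy one-dimensional positions (`graphlib` is in the Python 3.9+ standard library; the variable names and samplers are illustrative, not Scenic internals):

```python
import random
from graphlib import TopologicalSorter

# dependency graph induced by the specifiers: "car left of the taxi"
# means the car's position depends on the taxi's
deps = {"taxi": set(), "car": {"taxi"}}

samplers = {
    "taxi": lambda values, rng: rng.uniform(0.0, 100.0),
    # "left of the taxi": an offset from the already-sampled taxi position
    "car": lambda values, rng: values["taxi"] - rng.uniform(2.0, 4.0),
}

def sample_scene(rng):
    """Sample every random variable in topological order, so each
    sampler sees the values of the variables it depends on."""
    values = {}
    for var in TopologicalSorter(deps).static_order():
        values[var] = samplers[var](values, rng)
    return values
```

The taxi is always sampled first, exactly as the talk describes, because the topological order respects the specifier-induced edges.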
This says that the distance to the taxi should be less than five meters, and the relative heading angle between the car and the taxi has to be between 15 and 45 degrees. What this means is that you think of every vehicle as having a rectangular bounding box: you have a polygon for the car, a polygon for the taxi, and then you have all these polygons for road segments and intersections and so on. Now, if I just sampled blindly over road segments, it would be very hard: on a road segment, whether the car and taxi are on the same side of the road or on opposite sides, it's very hard to get them at that relative angle. But at intersections, you can get this quite easily. And basically, we have a way of using these constraints to dilate the polygons so that the segments that are unlikely to produce valid samples are eliminated, and you end up sampling from the green regions, which are more likely. So what can you do with Scenic? We can do data generation and then retrain with it. Here's an actual example; there are more details in the PLDI paper. One of the challenges a neural network we worked with had was detecting cars when one car is partially blocking another. It's a little hard to see at this size, but all of these are images where you have, in this case, a green car and a yellow car behind it, and we found the neural network was actually having trouble with this sort of image. So we wrote a Scenic program for this scenario, again just a few lines, generated images, retrained the network, and it became more robust: it retained accuracy on the original data set, and it was much more robust on this sort of image. Here's the case I talked about earlier, where one car was detected as three cars. 
And what we can do with Scenic is test a hypothesis like: what's the reason for the misdetection? Was it the model of the car? That was our first guess, because this car had a racing stripe on it and we thought, oh, that must be something the neural network hasn't seen before. So you can basically do delta debugging on the Scenic program: you fix everything else, change one thing at a time, and see if you still end up with a misdetection. And it turned out, in this case, it was not the model of the car. It was actually something very similar to what I talked about earlier: it was the distance between the camera and the location of the car, and it was very nonlinear again. So that's another thing you can do. And then you can do other things, like design-space exploration: you're designing the neural network and you want to explore how its performance will vary. Okay, so in the last part of my talk, I want to talk about this toolkit called VerifAI. Both Scenic and VerifAI are open source, and both are implemented on top of Python. They seem to work just fine on macOS and Linux; on Windows, we're still working on compatibility with all the libraries. But if you're interested and you have a chance, do try them and let us know. So this is a toolkit we built that uses Scenic as the modeling language, but we want it to be similar to other formal tools, like the one I presented this morning, only with features that are useful for AI and machine learning based systems. Okay, so here's the standard diagram. If you just look at the inputs and the outputs, it takes in the system, the environment, and the specification, and then it lets you do a number of things for verification, debugging, and synthesis. Right? 
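The delta-debugging idea can be sketched as varying one scenic feature at a time from the failing baseline. The `misdetects` oracle below is a made-up stand-in for "render the scene with these parameters and run the detector"; in the real workflow, the simulator and network play that role:

```python
def misdetects(model, distance, brightness):
    """Hypothetical oracle standing in for rendering plus detection;
    here, as in the talk's example, failure depends only on the
    camera-to-car distance, not the car model."""
    return 18.0 <= distance <= 22.0

# the failing scene, and alternative values for each scenic feature
baseline = {"model": "striped", "distance": 20.0, "brightness": 0.5}
alternatives = {
    "model": ["plain", "suv"],
    "distance": [10.0, 40.0],
    "brightness": [0.1, 0.9],
}

# Hold everything else at the failing baseline and change one feature
# at a time; a feature is implicated if every change to it alone
# removes the failure.
implicated = [
    feature
    for feature, values in alternatives.items()
    if all(not misdetects(**{**baseline, feature: v}) for v in values)
]
```

With this oracle, only `distance` is implicated, matching the talk's conclusion that the car model (the racing stripe) was not the culprit.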
And the key thing going on inside the box is that we take a Scenic program and compile it into an internal representation we call the semantic feature space. Every variable in Scenic corresponds to an element, or some subset of elements, of the semantic feature space, and all our algorithms for verification or synthesis search over that semantic feature space. The way this happens: one approach is you pick an element of the semantic feature space, run a simulation, monitor a property on it, and see whether the property is satisfied or not. So you can do the temporal logic falsification I talked about earlier using this loop, and you can use it for a number of things. You can do fuzz testing, for example, which is very easy: you have your Scenic program, you just generate samples from it and run them through the network. Falsification is what I talked about: you have a temporal logic property and you do directed search for violations using optimization. You can do root-cause analysis, which is trying to understand why the neural network failed, by doing this iterative analysis on the Scenic program. You can augment your data set and do retraining. And you can also synthesize parameters of the ML model, as well as hyperparameters used in training, and you can do this in a property-directed way: you can say, give me the parameters of my machine learning model that will help the overall system satisfy a certain property. Anyway, there's a lot more detail in the CAV paper about all of this. I want to give you an example of a case study we did, a couple of case studies actually. This is one where we modeled an accident scenario in a simulator called Webots. Here the setup is that you have the ego vehicle, the autonomous vehicle, approaching a broken-down car which is cordoned off with a bunch of cones. 
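The inner loop just described (sample from the semantic feature space, simulate, monitor) can be sketched as follows. This is not VerifAI's actual API; the feature space, dynamics, and specification "the gap stays above zero" are toy stand-ins:

```python
import random

def falsify(sample, simulate, rho, budget=200, seed=0):
    """Draw points from the semantic feature space, run the simulator,
    and evaluate the quantitative monitor rho on each trace; rho < 0
    means the specification was violated, i.e. a counterexample."""
    rng = random.Random(seed)
    best = None
    for _ in range(budget):
        x = sample(rng)              # a point in the semantic feature space
        trace = simulate(x)          # closed-loop simulation from that point
        r = rho(trace)               # robustness of the temporal property
        if best is None or r < best[0]:
            best = (r, x)
        if r < 0:
            break                    # counterexample found; stop early
    return best

# Toy instantiation: one feature (the initial gap), linear closing
# dynamics, and the spec "the gap stays above zero".
result = falsify(
    sample=lambda rng: rng.uniform(0.0, 50.0),
    simulate=lambda gap: [gap - 2.0 * t for t in range(20)],
    rho=min,
)
```

A directed falsifier would replace the random `sample` call with an optimizer that uses the returned robustness values to pick the next point; the loop structure stays the same.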
And the behavior you would like is that this vehicle detects that and navigates around it. To do that, the vehicle has a camera in front, and the image from the camera goes into a neural network. The system starts out in lane-keeping mode — it's driving in that lane — and the neural network's output is its estimate of the distance to the accident site. If that distance is estimated to be less than 15 meters, then it starts a lane change, and once the lane change is complete it goes back to lane keeping. So if it's less than 15 meters, it goes around the obstacle. Here's the Scenic program that we wrote to model the scenario — not the entire program, but a relevant snippet of it. This is basically saying: first pick a point where you're going to place this blockage. It should be near the curb, and it's a point with an orientation. Then you pick a spot for the first cone: you place it left of that site by an offset in the interval 0.3 meters to 1 meter, and you place a traffic cone at that spot with a certain orientation — we allow traffic cones to have fallen over, and so on; that's mentioned in this particular snippet. And things like traffic cones, the small car, and so on are all coming from a library again — Scenic doesn't know about these objects; they come from the underlying simulator library. Then you place cone 1, cone 2, and cone 3 — that's the dot, dot, dot — and you place the broken car, the small car, ahead of the spot by an (x, y) offset: the x coordinate uniformly at random in one interval, the y coordinate in another. So that's the initial condition for the simulation. And then you can use Scenic to generate many initial scenes.
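The mode-switching logic just described can be written as a tiny two-state machine. This is a hypothetical sketch of the described behavior, not the actual controller:

```python
# Two-mode controller: lane keeping until the network's distance estimate
# drops below 15 m, then a lane change, then back to lane keeping.
LANE_KEEP, LANE_CHANGE = "lane_keep", "lane_change"

def step(mode, estimated_distance_m, lane_change_done):
    if mode == LANE_KEEP and estimated_distance_m < 15.0:
        return LANE_CHANGE
    if mode == LANE_CHANGE and lane_change_done:
        return LANE_KEEP
    return mode

mode = LANE_KEEP
mode = step(mode, 40.0, False)   # far away: keep lane
assert mode == LANE_KEEP
mode = step(mode, 12.0, False)   # network says < 15 m: start lane change
assert mode == LANE_CHANGE
mode = step(mode, 12.0, True)    # maneuver finished: back to lane keeping
print(mode)  # -> lane_keep
```

Note that the safety of this logic hinges entirely on the network's distance estimate being accurate, which is exactly what the falsifier will attack.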
So this is a map of downtown Berkeley, where one road is going that way and another road is going that way. Here you see one configuration, here's another configuration, a third configuration, and so on. And then this video is going to show you a visualization of our falsification. Let me actually pause it and go back. Basically, every run you see is one simulation, and we are showing you what happened inside the falsifier — the sequence of simulations it searched through to find a safety violation. The way the falsifier works is that it first starts out with a correct execution. This is a correct execution: the car starts, correctly detects the blockage, and moves around it. But remember, it turned the temporal logic property into a cost function, so now it iteratively tries to minimize that cost function. And this is the first collision — it kind of grazed the right cone. And now it keeps finding more and more collisions, changing the position of the cones, the type of the car, the color of the car, the initial velocity, and so on. You can find lots of collisions. And this last one was a particularly interesting collision. What happened here is that the falsifier gave the stalled car the same color as the cones, and so the neural network thought that the car itself was a traffic cone — that's what it had correlated: the color of the cone with the fact that it is a cone. So it executed the lane change maneuver very early, but it did this in a way that was inconsistent with the underlying assumptions of the controller. The car was actually 30 meters away, but the network thought it was less than 15 meters away, and that violated the controller's assumptions about the speed at which you can do a lane change. So it ended up hitting the car. So anyway, what you can do now is update the assumptions of the controller, retrain the perception module, and so on.
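The step of turning the temporal logic property into a cost function uses the standard quantitative (robustness) semantics: for a property like "always (distance > 0)", the robustness of a trace is simply the minimum distance over the trace, negative exactly when the property is violated. A tiny illustration:

```python
# Robustness of "always (distance > 0)" over a trace of obstacle distances:
# the minimum value.  Positive means satisfied with margin; negative means
# violated -- so the falsifier can minimize this number directly.
def always_positive_robustness(trace):
    return min(trace)

safe_run = [8.0, 5.2, 3.1, 4.4]    # distances to nearest obstacle (m)
grazing  = [8.0, 2.0, 0.3, 1.0]    # close call: small but positive margin
crash    = [8.0, 2.0, -0.2, 0.0]   # contact: negative robustness

for t in (safe_run, grazing, crash):
    print(always_positive_robustness(t))  # -> 3.1, then 0.3, then -0.2
```

This is what makes numerical optimization applicable: the search does not just get a yes/no verdict per simulation, it gets a margin it can drive downward.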
And so that failure goes away. Okay, here's the other case study I want to share with you. This is more recent. We have been working with Boeing — I should mention this is a research unit within Boeing — and they are experimenting with the use of neural networks, in particular for automated taxiing. What's happening in airports around the world is that they're getting more and more crowded because of the demand for air travel, so there's a lot of load on pilots just to navigate through an airport, and they want some kind of pilot assistance. In this case, this is in a simulator that our colleagues at Boeing are using in this project. The specification involves the center line — the one you see with the number four. The nose of the plane has to be within plus or minus 1.5 meters of that line. That's what they want. And this is a correct execution: you can see the plane is following the center line. It's deviating a little bit, but more or less it follows it, and it's doing that in spite of all these marks on the runway. This is pretty common, I'm told: you have these skid marks from the tires of planes that can often hide the center line. And it's doing a pretty reasonable job. Okay, so that sounds good. The question is, can you actually get the neural network to do something unsafe? So the first thing we did was write a Scenic program. This is again Scenic, interfaced to the simulator, and now it knows what a plane is and what the different aspects of this domain are. The features here are the time of day, the type of clouds in the sky, the amount of rain, and a particular tire mark that we're going to place on the runway. And then this is the Scenic program. So you have things like Zulu time — in aviation they use these terms for time zones; Zulu time is the aviation term for GMT, or UTC.
And so basically here we are saying the time of day is going to be uniform in the interval from 6:00 to 18:00 — 6 a.m. to 6 p.m. — and this is just computing it in minutes and seconds. The clouds and rain are of different types. This is the cloud type: it goes from zero to five, with different codes. This is the amount of rain. With some of these cloud types you can have no rain, and these are the rain clouds, where the amount of rain is a parameter between 0.25 and 1. The ego object for us is the plane. And here we're placing a tire mark on the runway, plus or minus 10 meters laterally and 40 to 80 meters along the runway. And here you see the use of other distributions: you can say I want a normal distribution for the orientation of the smudge, a normal distribution for the height of the camera, and so on. And then here is the property. CTE is the center-line tracking error — the amount of deviation from the center line — and that has to be bounded by 1.5 meters, always. So with Scenic we specify this, and then with the VerifAI toolkit we can run falsification for that temporal logic property. And we ended up finding lots of counterexamples. Okay, here's one. What you can see here is that it's dark and a little rainy — it turns out to be the early morning hours. This is based on a model of an airfield on the west coast of the US. And basically the plane loses track of the center line and just drives completely off. So one of the things we found is that something like Scenic is really, really useful in doing this analysis: it's a way of formally modeling the environment and the assumptions that were made. So we went back and told them that we could find a lot of counterexamples, and they said, oh yeah, we trained it between one and three p.m. in sunny conditions, and so on.
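In plain Python, the feature space and monitor just described would look roughly like this. The exact cloud codes that carry rain, the Gaussian parameters, and the function names are assumptions for illustration, not the actual Scenic program:

```python
import random

random.seed(1)

# Hypothetical re-creation of the feature space the Scenic program describes:
# time of day, cloud code, rain amount, and tire-mark pose on the runway.
def sample_scene():
    clouds = random.randint(0, 5)
    return {
        "time_of_day_h": random.uniform(6.0, 18.0),
        "cloud_type": clouds,
        # Assumption for this sketch: only the highest cloud codes carry rain.
        "rain": random.uniform(0.25, 1.0) if clouds >= 4 else 0.0,
        "mark_lateral_m": random.uniform(-10.0, 10.0),
        "mark_along_m": random.uniform(40.0, 80.0),
        "mark_heading": random.gauss(0.0, 0.2),
    }

def satisfies_spec(cte_trace, bound=1.5):
    # The property: always |CTE| <= 1.5 m over the taxi run.
    return all(abs(c) <= bound for c in cte_trace)

scene = sample_scene()
print(satisfies_spec([0.2, -0.8, 1.4]), satisfies_spec([0.2, 2.1]))  # -> True False
```

The falsifier then searches over `sample_scene`'s parameters for runs where `satisfies_spec` fails, exactly as in the driving case study.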
So I think this sort of thing can be very useful for uncovering hidden assumptions in the data. In the machine learning world, data is everything, but often there are assumptions baked into the data set that even the people who design these systems don't know about, and formal methods can be quite useful in uncovering them.

So the neural network is doing classification or inference based on the generated images, right? I'm a bit confused, because in the real world you would get real-world images, and these are computer-generated graphics.

That's right, yeah — the neural network was trained on the simulated images.

But can you say something about what would happen in the real world?

Okay, very good. That will come on the next slide. All right. So I want to summarize now. We talked about these two tools, VerifAI and Scenic, both open source. And again, as in the morning talk, we'd love to work with people both in academia and industry. A couple of other things that I did not talk about. The first is the question that was just asked: how do you bridge simulation and the real world? This is what everybody is worried about: you do all this great stuff in simulation, and how can you get it to work in the real world? The first answer is something not really in my domain of expertise, but there are people in computer vision and other areas working on techniques called domain adaptation, which are ways to train something in simulation and then lift it to operate on real data. There are groups at Berkeley and elsewhere with a lot of expertise in this. The second answer is the formal methods aspect: we have some very recent work, at HSCC earlier this year, where we can quantify the distance between real and simulated behaviors.
So basically, with a normal simulation relation, you have two traces of two systems and a relation that says these are equivalent in some way. But in these settings, you have two traces and you want a metric that says how far apart they are. It could be as simple as the trajectories of the vehicle and some distance on those. In this paper, we show how you can have a specification-based simulation metric that you can compute very efficiently. You can then compute this metric and train or synthesize a conservative controller in simulation which is guaranteed to work in the real world, based on this distance metric. So that's the idea. Another aspect of this work is dealing with more complex sensors than the ones I showed today. You can also do counterexample-guided retraining — taking techniques from formal inductive synthesis and applying them to this area. And then there's the question of runtime assurance: what happens when your assumptions break at runtime? We had some work about this at the Dependable Systems and Networks conference. So, circling back to SAT and SMT, here are the things I can think of that are really useful here. One thing about machine learning systems is that it's all about optimization: you craft the right objective function, you optimize, and that becomes your planner or your classifier. One of the big challenges in that domain is that sometimes you want hard constraints to be enforced. How can you do this constrained optimization efficiently, so that a neural network is not only trained to minimize the cost function but also provably, always satisfies a constraint? That's one interesting application.
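As a toy illustration of the hard-constraint problem, here is projected gradient descent on a single parameter: after every gradient step the parameter is projected back onto the feasible set, so the constraint holds throughout training. Real systems need such guarantees on network outputs, which is far harder; this only shows the shape of the idea:

```python
# Minimize a training loss (w - 0)^2 subject to the hard constraint w >= 1.
# The unconstrained optimum w = 0 is infeasible; projection after each
# gradient step guarantees feasibility at every iteration.
def project(w, lo=1.0):
    return max(w, lo)            # feasible set: w >= 1

def train(w=5.0, lr=0.1, steps=200):
    for _ in range(steps):
        grad = 2.0 * w           # d/dw of (w - 0)^2
        w = project(w - lr * grad)
    return w

w = train()
print(w)  # -> 1.0 (the constrained optimum, not the unconstrained one)
```

Methods that only add the constraint as a penalty term to the loss give no such guarantee, which is exactly the gap where solver-based techniques could help.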
And the other one: think of something like Scenic. We did use rejection sampling, and we haven't hit its limits so far because I think these ML-based systems are very easy to break right now. But my expectation is that they will get better over time, and at some point rejection sampling will not work — and maybe Markov chain Monte Carlo sampling will not work either. Then you need something more sophisticated, and at that point I think there will be a real need for SMT-based techniques for sampling. So one of the things I would love to do, with people who are interested in this, is to identify a subset of Scenic programs where we can do distribution-aware sampling for the underlying SMT theories. So I'll conclude here. Circling back to the beginning: we talked about verified AI and these five challenges — environment modeling, specification, complexity of the systems, how to scale up verification, and design for correctness. There are a number of strategies that we've outlined in this paper, some of which I had a chance to share with you today. But please, if you're interested, do read the paper, and I would be glad to continue the conversation. Thank you. Any questions?

What's interesting about your approach is that once you have this semantic feature space, you never have to solve the inverse problem — going from errors in the parameters of the image back to the feature values that would reconstruct those parameters. But is there any situation where you really would have to solve that hard problem?

Okay. So, if I understand your question right, the question is really: is there a problem for which you can't come up with a semantic feature space?
And you have to operate over the concrete space — that's the crux of the question. And I can think of examples that do have that issue: if you don't have a renderer, for instance, or you don't want to rely on the renderer, or you don't have a reliable renderer for the application, then you would have this problem. One domain I can think of immediately is medical images — images of tumors and things like that. People in the room may have more expertise than I do, but I don't know of any way of generating these sorts of images synthetically. Yet you do want the neural network that operates on these images to satisfy some properties, so you have to operate over this concrete domain. The challenge there is that if I want to generate counterexamples, I have to know the precondition characterizing valid images, and that is not clear to me. So that requires some domain expertise to characterize, and then you have to operate over the concrete space.

When you discussed the region of uncertainty and then projected it for the machine learning parameters — is the region of uncertainty described by polynomial constraints? How do you do that?

Yeah, good question. The region of uncertainty that we had was not described using constraints of that sort; it was basically a union of hyperboxes. You could probably encode it as constraints, but we didn't do that — we left it in the geometric form. In general, these constraints are highly nonlinear, and it becomes hard to think about how to sample from polynomial constraints when even finding one solution might be hard.
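Part of why the union-of-hyperboxes representation is convenient is that it admits direct sampling: pick a box with probability proportional to its volume, then sample uniformly inside it. A small sketch, with invented boxes assumed to be disjoint:

```python
import random

random.seed(2)

# Two disjoint axis-aligned boxes in 2-D, each a list of (lo, hi) intervals.
boxes = [
    [(0.0, 1.0), (0.0, 1.0)],      # unit square, volume 1.0
    [(2.0, 4.0), (0.0, 0.5)],      # thin box, also volume 1.0
]

def volume(box):
    v = 1.0
    for lo, hi in box:
        v *= hi - lo
    return v

def sample_union(boxes):
    # Volume-weighted box choice makes the overall sample uniform on the union.
    box = random.choices(boxes, weights=[volume(b) for b in boxes])[0]
    return [random.uniform(lo, hi) for lo, hi in box]

def in_union(point, boxes):
    return any(all(lo <= x <= hi for x, (lo, hi) in zip(point, b))
               for b in boxes)

pts = [sample_union(boxes) for _ in range(100)]
print(all(in_union(p, boxes) for p in pts))  # -> True
```

Overlapping boxes would need inclusion-exclusion or rejection on top of this; that is where direct sampling starts to get harder.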
And so one strategy I have been thinking about is that maybe you can use a triangulation of the space — use ideas from what people in graphics already do, which is to represent a very complex surface by lots of triangles. The constraints for each triangle are simple linear constraints, but you have lots of them, and that is what SMT is good at: searching over Boolean combinations of lots of constraints. In this region-of-uncertainty work we are doing something similar — we have unions of hyperboxes — but I think that could suggest a strategy.

In this model, how much of it is domain specific, with respect to the graphics or the vision part of it? Because I know you had some previous work on music improvisation as well. The improvisation problem — how different is it from that work? I seem to remember it was the same combination of hard constraints and randomness.

The underlying improvisation problem, yeah, right. So this is a question for the people who know about that past work. The underlying algorithms are very different. There, the constraints were represented with formal languages, and the distribution constraint was also simpler: it basically said, do not generate the same melody — a word in the language — too often; you upper-bound its probability. If your constraint is of that form, then we have nice algorithms customized to solving it. But here you have arbitrary distributions with Scenic, so we cannot really reuse a lot of those algorithms.
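The triangulation idea mentioned above can be made concrete: membership in a triangle is a conjunction of linear constraints (the barycentric coordinates are all nonnegative), and uniform sampling from a triangulated region picks a triangle by area and then a point inside it. A sketch with invented triangles:

```python
import random

random.seed(3)

def area(t):
    (x1, y1), (x2, y2), (x3, y3) = t
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0

def sample_triangle(t):
    """Uniform point in a triangle via the standard reflection trick."""
    (x1, y1), (x2, y2), (x3, y3) = t
    r1, r2 = random.random(), random.random()
    if r1 + r2 > 1.0:                    # reflect to stay inside the triangle
        r1, r2 = 1.0 - r1, 1.0 - r2
    return (x1 + r1 * (x2 - x1) + r2 * (x3 - x1),
            y1 + r1 * (y2 - y1) + r2 * (y3 - y1))

def contains(t, p, eps=1e-9):
    """Membership as linear constraints: all barycentric coords >= 0."""
    (x1, y1), (x2, y2), (x3, y3) = t
    px, py = p
    d = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    a = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / d
    b = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / d
    return a >= -eps and b >= -eps and (1 - a - b) >= -eps

# A nonconvex region approximated by two triangles.
tris = [((0, 0), (2, 0), (0, 2)), ((2, 0), (2, 1), (1, 1))]
tri = random.choices(tris, weights=[area(t) for t in tris])[0]
p = sample_triangle(tri)
print(any(contains(t, p) for t in tris))  # -> True
```

The "many simple linear constraints, combined with disjunction" structure is exactly the Boolean-combination-of-atoms shape that SMT solvers are built to search over.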