Have any of you ever used formal methods or automated proof software? Oh, many of you. I find this topic super interesting, because when I first read about it I was like, oh my god, now I'm going to prove the security of everything I've written so far. Then I realized it takes a lot of time, it's very difficult, and most of this stuff doesn't have any kind of documentation at all. But Cornelius works on formal methods for security and network management, so I believe he will be able to tell us more about this. In particular, in today's talk he will be telling us about verifying properties of iptables firewall rules. So please give him a big round of applause.

Okay, thank you very much. Thank you all for getting up this early and joining my talk. This talk is called Verified Firewall Ruleset Verification, so I think it's best to spend the first few seconds refreshing our knowledge about firewalls. Here's a firewall ruleset I found on a network-attached storage. It's a Linux iptables firewall. Let's just walk through that ruleset to see what it does.

Let's say we have a packet for this box. The packet is given to the INPUT chain, and the box will now walk through all these firewall rules sequentially until it finds a matching rule that tells it what to do with this packet. Let's just do that for the sake of example, so we all know what firewalls do. So this is the first rule. The box will look at what you see there on your right-hand side. It's what we call the match condition. It tells you whether this rule matches for a packet. This rule matches for all protocols, all source IPs, all destination IPs. So essentially this rule matches everything, and then it will execute this action, which is called DOS protect, which means we jump to the user-defined chain DOS protect, which is down there, and continue our evaluation there. The first rule there, let's read the match condition first, looks at the ICMP protocol, any source, any destination IP,
ICMP type 8, which is an echo request as we all know, and checks whether a certain limit is not exceeded. If this is the case, we return to where we came from. Otherwise we go to the next rule. It looks at ICMP echo requests again and drops them. So we see a pattern here: those two rules essentially implement rate limiting. First, if the limit is not exceeded, we can leave the DOS protect chain. Otherwise things are dropped here. We see the same pattern here for TCP packets with the reset flag set there: if a certain limit is exceeded, they will be dropped. Otherwise we can return.

So let's say we get through the DOS protect chain without being dropped. We end up where we came from. The next rule accepts all packets which belong to an established connection. Accepting everything that belongs to an established connection is usually considered best practice. Opinions differ on the RELATED match. Well, the interesting question is: how do we get a connection into an established state? This is what most of the rules of our firewall are about. So how do we get into an established state for this firewall? Well, not with SSH packets, they are blocked. A lot more TCP ports are blocked too. A lot more UDP ports are blocked as well. Finally, the firewall is accepting something: it's accepting all packets with a source IP address in the local 192.168.0 range, and it's dropping everything else.

Okay, I hope by now we have refreshed our knowledge about firewalls. So what's the problem with firewalls? Well, I guess everyone in this room loves open source solutions, so we don't have to care about backdoors in proprietary software. We also don't have to care about co-bound for such a security made in Germany. We love our open source firewalls; the Linux or BSD firewalls are quite good. So what's the main problem?
Well, there was a study that looked at real-world enterprise firewalls out there in the wild, and the study found, well, there are no good high-complexity rulesets. So the main problem with firewalls is administrating them, setting them up, which all comes down to configuring the ruleset for the firewall. And well, a few of you were laughing: the ruleset we saw on the slide before was actually quite simple. Just imagine those rulesets getting larger. The study was repeated a few years later, and it still found that firewalls are poorly configured. So it's all about the firewall rulesets: if the rules get complicated, we make errors. And by no means do I mean that administrators are incompetent. No, thank you very much to all those masters of complexity out there who get our networks running. Thank you to the NOC for everything. Thank you for not firewalling us here, so we can have more fun.

So it's all about the configuration of the firewall. So people had the idea: well, let's just write a few tools that check our firewall ruleset and tell us if the ruleset is good. People had that idea, and we looked at the tools that were out there. We fed those tools some real-world firewall rulesets and we found, well, there's essentially no tool out there that understands our real-world firewalls. Because have you ever opened man iptables-extensions? Then you understand why things can get very complicated there. And even if we had a tool that would check our firewall ruleset, would we trust that tool? Well, probably not.

So where others failed, let's try again. Let's try to write a tool to check firewall rulesets. But because we found that the tools we have so far do not understand our rulesets, let's start from the very beginning. Let's first talk about specification and implementation.
When we say implementation, we are usually talking about the code, about our tool, about the low-level hacks we do all day to increase the performance of our tool; that's on the right-hand side. And this is usually the stuff you don't show to your users. To your users, you show the specification or the documentation, or just the answer to the question: what does your tool do? So here, to write a tool, we have to answer the question: what is a correct ruleset? For the sake of example, for this talk, we will stick to spoofing protection. So our ultimate goal will be a tool that checks whether our firewall ruleset has spoofing protection. We only have 30 minutes, so let's stick to spoofing protection.

But before we can specify what spoofing protection actually means, we need to specify what a firewall is. Therefore we need a model of the packet filtering of iptables. And that model (you saw the firewall in the beginning) should be really, really expressive, so that we can say: this model mirrors reality, and we can get all our complex firewalls with all the fancy iptables matching extensions into that model. Then in the end we can run our tool that checks if we have spoofing protection.

We will implement everything in the Isabelle theorem prover. Why are we using a theorem prover? Well, you know the good old problem. You look up the documentation of some library you are using, then you look at the implementation, and you find that the documentation was just plain lying to you, because documentation is usually horribly out of date. So in this talk, we want to write an implementation and then prove that the implementation corresponds to the specification. That's why we are using a theorem prover: to have a proof in the very end that our code really does what it is specified to do. So the most important thing here is to specify what it should do. To summarize what this talk will be about: we will write a verifier for rulesets, and we will verify the verifier itself.
So let's get started with step one. We need a model for iptables before we can specify anything. Okay, so we need match expressions. Let's start simple. Let's write down the syntax of match expressions. The syntax is all about how I write things down. There, let's define a data type, match expression. This data type should be polymorphic over the type 'a, which means this 'a can be any type. We will call this the primitive, which will be the features iptables can match on. Let's keep that generic for a moment. So how can a match expression then be constructed? Well, we can match on such a primitive. We can have a match expression that just matches plain anything. We can negate a match expression, or we can combine two match expressions into one larger match expression. There's an example. We are combining two match expressions with this MatchAnd. And there in the inner part we see the primitives. Those things we keep completely arbitrary. Here we are matching on a destination IP and the protocol TCP. And again, we will keep all these primitives, the features we can match on, completely generic.

This was the syntax, how we can write things down. But what does this mean? Well, we have to specify the semantics. Semantics is all about what match expressions mean. Match expressions are matching on packets, so we can't specify the semantics without a packet. Let's look at the type signature of the matching semantics first. There it is. It looks a bit intimidating at first. Let's look at what we have there. The first parameter of this matches function is a function itself. We call it gamma, and we also call it the primitive matcher. Look at the type of that function: its first parameter is the primitive, some primitive thing we can match on, for example source IPs or the protocol. The second, 'p, is the packet it should match on. By the apostrophe you see that we also have a generic packet model; you can plug in anything there.
And it returns a boolean, true or false. This function gamma should return true if and only if this primitive match condition matches for a packet. The second parameter of our semantics is a match expression, as we defined it before. Then we give it the packet we want to know about, and it returns true if and only if the match expression matches for the packet.

So let's look at the individual rules here. The first rule is straightforward: if we match on the primitive a for a packet p, well, we just ask our primitive matcher whether this a matches for packet p. The next rule is also very straightforward: the MatchAny match expression matches everything. Then we have this MatchNot, which means we negate our match expression, and the MatchAnd, you may already have guessed it, is just the conjunction of the two match expressions. So now we have defined the syntax and semantics of match expressions, so we can match on packets.

Now we only need to specify the filtering behavior of iptables. There it is. Quite simple. First of all: what the fuck. So the good news about this is, well, this is really everything we need to know about iptables packet filtering. It's on one slide. If we had enough time, we could really read it. It's much more concise than the man pages. It just has a few funny symbols on it. Let's try to read it. Everything we have there looks as follows. Maybe we can recognize something. p should be the packet the firewall is examining. There we have the gamma again, our primitive matcher, which has encoded all the features iptables can match on. Then at this position we have the ruleset, which is just the list of rules the firewall is currently examining. s should be the start state, in which the firewall starts looking at the packet. Usually the firewall is undecided in the beginning about what to do with the packet. And there at the end we have the final state.
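The match-expression syntax and the matches function described above can be sketched in a few lines. This is an illustration in Python rather than the Isabelle definitions from the talk: the constructor names follow the spoken description, while the tuple-shaped primitives, the dictionary packets, and the helper `toy_gamma` are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass(frozen=True)
class Match:
    """Match on a single primitive; the primitive type is left arbitrary."""
    prim: Any


@dataclass(frozen=True)
class MatchAny:
    """Matches every packet."""


@dataclass(frozen=True)
class MatchNot:
    """Negation of a match expression."""
    expr: "MatchExpr"


@dataclass(frozen=True)
class MatchAnd:
    """Conjunction of two match expressions."""
    left: "MatchExpr"
    right: "MatchExpr"


MatchExpr = Union[Match, MatchAny, MatchNot, MatchAnd]
# gamma: (primitive, packet) -> bool, the "primitive matcher"
PrimitiveMatcher = Callable[[Any, Any], bool]


def matches(gamma: PrimitiveMatcher, expr: MatchExpr, packet: Any) -> bool:
    """True if and only if expr matches packet, given primitive matcher gamma."""
    if isinstance(expr, Match):
        return gamma(expr.prim, packet)      # delegate to the primitive matcher
    if isinstance(expr, MatchAny):
        return True                          # matches everything
    if isinstance(expr, MatchNot):
        return not matches(gamma, expr.expr, packet)
    if isinstance(expr, MatchAnd):
        return (matches(gamma, expr.left, packet)
                and matches(gamma, expr.right, packet))
    raise TypeError(f"not a match expression: {expr!r}")


def toy_gamma(prim: Any, packet: Any) -> bool:
    """Hypothetical primitive matcher: primitives are (field, value) pairs."""
    field, value = prim
    return packet.get(field) == value


# Destination IP AND protocol TCP, like the example on the slide.
example = MatchAnd(Match(("dst_ip", "192.168.0.1")), Match(("proto", "tcp")))
```

Because `matches` only ever talks to `gamma`, swapping in a different primitive matcher or packet representation needs no change to the semantics, which is the point of keeping the primitives generic.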
So in the end the firewall usually makes a decision, either to accept or to drop a packet. So let's read the rules. Let's read a simple rule, the skip rule. They are all written with this line, which means everything above the line is the precondition and everything below the line is the conclusion. Here we don't have any precondition, so this rule holds unconditionally. So let's read what this means. There we have it. First of all, this rule looks at the empty ruleset, the empty list. So what does a firewall with the empty ruleset do? The rule says the start state and the final state are the same. So essentially this rule says: for the empty ruleset, the firewall does nothing. Not really that hard, but we need to start at some point.

Let's look at a slightly more complicated rule. Here we also have a precondition, and what we are looking at is a ruleset which consists of only one single rule, which has some match condition m, and the action is accept. Our precondition is that the match condition matches for the packet. What would happen then? The action is accept; what should the firewall do? Well, if we don't have a decision for the packet yet and the action is accept, then we are going to accept it. Okay, also not that hard, and I guess we all agree that this is the behavior of our common firewall.

All the other rules actually read pretty similarly, and we only have 11 of them, so if we had enough time, it would not be that hard to read the slide you were laughing at first. So let's directly jump to the most complicated rule we have there. It's called the call return rule. Well, it looks a bit complicated. It is. So first, again, we have a ruleset with a single rule, with some Call c there, and our first precondition is that this one rule matches; otherwise the firewall wouldn't do anything at all.
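For readers without the slide, the two simple rules read out above, skip and accept, might be written roughly as follows. This is a reconstruction from the spoken description, not a copy of the slide: the judgment form, the state names Undecided and Decision, and the rule labels are assumptions. Γ stands for the background ruleset of user-defined chains, which only matters for the call return rule.

```latex
% Skip: no precondition; for the empty ruleset the start state s
% is also the final state.
\[
\frac{}
     {\Gamma,\gamma,p \vdash \langle [\,],\ s \rangle \Rightarrow s}
\;\textsc{Skip}
\]

% Accept: if the match condition m matches packet p and the firewall
% is still undecided, a single accepting rule accepts the packet.
\[
\frac{\mathit{matches}\ \gamma\ m\ p}
     {\Gamma,\gamma,p \vdash
        \langle [(m,\ \mathit{Accept})],\ \mathit{Undecided} \rangle
        \Rightarrow \mathit{Decision}\ \mathit{Accept}}
\;\textsc{Accept}
\]
```

Everything above a line is the precondition and everything below it is the conclusion, so Skip holds unconditionally.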
Then there's the complicated part. The action of the rule, we called it Call c, means we want to call or jump (the -j option in iptables) to some user-defined chain. The name of the chain is generically expressed here as c. In the example from the beginning, this c was the name DOS protect. We also have as a precondition that the chain in the background ruleset looks as follows. We have this capital gamma of c there, which basically means: look up in the background ruleset of all the user-defined chains what the chain with the name c looks like. So essentially, for the example from the beginning, we get what is in the DOS protect chain. And here we say it looks like the following: first there is rs1, then there's a Return rule, and then there is rs2. This means the user-defined chain can look as follows: first there is an arbitrary part called rs1, an arbitrary number of rules, which could for example be the first ICMP rules we saw in the DOS protect chain. Then there is a rule which has the action Return. And then there can be more arbitrary rules, called rs2, which can be an arbitrarily long list of other rules.
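To make the shape of that precondition concrete, here is a small Python sketch (not the Isabelle formalization). It models the background ruleset Γ as a dictionary from chain names to rule lists and enumerates every way a chain can be split into rs1, a Return rule, and rs2. The spelling `DOS_PROTECT`, the string placeholders standing in for match conditions, and the helper name `split_at_returns` are all made up for illustration.

```python
from typing import Dict, Iterator, List, Tuple

# A rule is a pair (match condition, action); match conditions are
# plain placeholder strings here, not real match expressions.
Rule = Tuple[str, str]

# Hypothetical background ruleset: chain name -> rules of that chain.
# The entries loosely mirror the rate-limiting chain walked through earlier.
BACKGROUND: Dict[str, List[Rule]] = {
    "DOS_PROTECT": [
        ("icmp echo-request, limit not exceeded", "Return"),
        ("icmp echo-request", "Drop"),
        ("tcp RST, limit not exceeded", "Return"),
        ("tcp RST", "Drop"),
    ],
}


def split_at_returns(
    chain: List[Rule],
) -> Iterator[Tuple[List[Rule], Rule, List[Rule]]]:
    """Yield every decomposition chain == rs1 + [return_rule] + rs2."""
    for i, rule in enumerate(chain):
        _match, action = rule
        if action == "Return":
            yield chain[:i], rule, chain[i + 1:]


if __name__ == "__main__":
    for rs1, ret, rs2 in split_at_returns(BACKGROUND["DOS_PROTECT"]):
        print(len(rs1), "rules before", ret, "and", len(rs2), "after")
```

The rule itself only requires that some such split exists; which split applies to a given packet is decided by the remaining preconditions.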
Okay, then we have the next precondition. It says we can process this first part rs1 without getting a decision. So far, then, we have called to a user-defined chain and have processed a first part of this chain without getting any decision yet. Then there's the next assumption: we have a matching Return rule. So what happens there in iptables? We have called to a user-defined chain, we have processed something, we didn't get a result, then we got a matching Return rule. So we come back to where we started from, without any result, and this is what the rule tells us.

I know I'm going over this formalism very fast; I hope a few of you can follow. First of all, the cool thing about this: this is really a mathematical specification, it's not an implementation. And we can see that we can specify the behavior of, for example, calling to and returning from a user-defined chain without a call stack. So this is really just a specification, not an implementation. It's not executable, but hopefully it tells quite clearly what the firewall does.

Well, anyway, if I already lost you, I hope you can join again now, because there's the question: why are we doing all this formalism? Let's get a bit more applied now. With every slide we will get more applied, and we will now use this formalism to actually do something useful with it.

So far we have just specified the filtering behavior of a firewall. Okay, let's specify more. Let's read this formula. There on your left-hand side you have the behavior of a firewall, then we have "if and only if", and on the right-hand side again the behavior of a firewall. So the left-hand side and the right-hand side are equal. Okay, where's the difference? The difference here is this f. f is a function which takes a ruleset and transforms it into another ruleset. So what could this function f possibly do? Well, we said the behavior of those firewalls is equal, so it means that this function f takes a ruleset and transforms it into a different ruleset, but it doesn't change the behavior of the firewall. So, for
example, f could be a function that improves the performance of your ruleset, and you could safely run this function on your ruleset, because the formula there says f doesn't change the behavior of your firewall. You can safely deploy to production. And we have implemented several such functions f.

A very simple example: we can remove all the logging rules. Our semantics only cares about packet filtering, and logging doesn't influence that in iptables, so if we remove all the logging rules, the behavior of the firewall stays the same. Okay, simple example. You can also do more. We can unfold all calls to and returns from user-defined chains. After we have run this function, the firewall will be one long list of rules which will all be processed sequentially. There's no more jumping around, which is a cool thing. Unfortunately, the match conditions in the ruleset will get quite complicated, but again, we can normalize these match conditions to simpler ones. So in the end we will have a firewall which is just a long list of simple rules, there is no jumping around, and the actions are either accept or drop. So we can essentially simplify our firewall, and the formula says this simplification doesn't change the packet filtering behavior at all. So we can safely run this.

Okay, let's look at more things. We implemented our semantics in ternary logic. In Boolean logic we either have true or false; in ternary logic we have true, false, and unknown. What can we do with it? Well, cool stuff. Let's first read this formula. Let's see what we have there in the middle. In the middle we have a set, a set of packets, and the condition for these packets is that they are accepted by the firewall: you start in the undecided state, and in the end you're accepting the packet. So there in the middle it says we have the set of all packets accepted by the firewall. Well, we can specify that. As theoretical computer scientists, we believe in specification. We have specified the set of all packets accepted by the
firewall. Quite cool. Well, I guess as hackers we don't believe in just such a set specification, which is just standing out there in the void, having no connection to reality. You can't even execute that set. No, as hackers we also believe in running code, and this is what we have there in the sets above and below.

Above, we have an under-approximation, which is essentially a stricter version of the firewall. And it's important that we embedded the whole thing in ternary logic, because it's quite impossible to implement all the matching features you have in iptables in an analysis tool. I'm not aware of any tool that implements the full set, because the netfilter team also adds new features with many releases. So here we have an approximation embedded in ternary logic, and we have proven that this is a sound approximation which makes the firewall stricter. Basically, this firewall may accept fewer packets than the original firewall. And in the set on the bottom we get an over-approximation, which essentially makes the firewall more permissive: this firewall may accept more packets than the original firewall.

The important thing here is that those are executable. The reality, or the specification, what we want, is in the middle, and we have executable code around the reality, so to say. We will use these things in the background from now on, because now we can execute the whole thing, and we can safely approximate if we run into some features we don't understand, because we have embedded the whole thing in ternary logic. So essentially we now have a sound approximation where we can say: there's some match expression somewhere in the firewall we don't understand, the result there is unknown, and the whole thing will be safely approximated according to what you want. Do you want to be stricter or more permissive?

So, I said I would explain how we get spoofing protection in the end. Let's look at spoofing protection. Let's first specify what spoofing protection
is. Let's say we have an IP assignment there: eth0 belongs to the IP range 192.168.0.0/24. So what does spoofing protection mean? It means that for any IP packet that enters via eth0, the source IP must be in this 192.168 IP range. So let's specify that. Again we are looking at a set, and we have seen it before: we require that the packets in the set are accepted by the firewall. There's an additional constraint on the packet: the packet should come from the interface eth0. And unlike before, we are not looking at the complete packet; we only look at the source IP. So this big set expression on your left is essentially the set of all packets accepted by the firewall coming from eth0, but only their source IPs. And what should the set fulfill so that the whole firewall implements spoofing protection? Well, all these source IPs should be in the valid range of that interface. I guess we can all agree now that this specifies spoofing protection for that particular example. Let's generalize that: let's say we require this for all the interfaces we have in our system, and now we have specified spoofing protection.

Okay, again, we have specified it. I guess we all agree that this is spoofing protection. What can we do with the specification? Well, we wrote an algorithm. You only give this algorithm the IP assignment, as above, and the ruleset, and if the algorithm returns true, you definitely have spoofing protection in your firewall. And let me point out the cool thing about it. There you see this "arbitrary gamma". Arbitrary in mathematics always means, well, it's all-quantified. What does this all-quantified gamma mean? Gamma was the set of all matching features our firewall model supports. Here it's arbitrary, so our algorithm can tell whether you have spoofing protection for any magic iptables match condition you may have in your firewall. Even if the netfilter team decides to implement a new cool matching feature in the future, and maybe they don't know yet that they will implement that feature. Here the
specification says, well, this algorithm will correctly understand it and check whether you have spoofing protection anyway, even if there is some unknown cool feature in it. And I think that's a pretty cool specification we have there about the algorithm. It essentially says we understand all possible match conditions you can ever use in your firewall.

So what about the specification? Well, you know normal specifications: they are quite imprecise and don't really tell you what the algorithm does. This is a mathematical specification. It's concise, it's really precise, and it tells you exactly what the algorithm gives you. What else about usual specifications? Very often they lie to you. They have some implicit assumptions they don't tell you about; you have to call something in a special way, but they didn't document it. Not in this case. We have a formal proof that our algorithm really fulfills this condition, so there is no implicit assumption somewhere. This specification will never lie to you. And for some other documentation, you always know that the documentation is usually quite outdated, especially for particular projects. Well, here we have a machine-verifiable proof. You can run that proof on your computer. You don't have to reconstruct it by hand; you can give the proof to your computer and tell your computer to recheck the proof at any time. So you can check all the time whether this specification is still what the algorithm provides. This specification will never be outdated. And I claim this is probably the best way to document your code: prove it correct. It will never be outdated, it's really precise about what it does, it doesn't have any hidden assumptions, and it will definitely not lie to you.

So I said we have executable code. Well, this looks like a lot of formulas. What does the executable code look like? We wrote the whole thing in Isabelle, proved it, and once we have executable code, we can export it to
different languages. Here we chose Haskell, and you can really run that algorithm on your machine. You don't need Isabelle or any theorem prover as a backend; it's a standalone program. Here you can see how you can call it. You just feed your iptables-save dump into the tool, then you supply the IP assignment as we saw on the slide before, probably generated from ifconfig or ip address show, and the tool will just run automatically. It doesn't need any additional input. You don't have to prove anything by hand. It will just check whether you have spoofing protection on your firewall. So there we have it: we have a verified tool to verify that you have spoofing protection on your firewall.

And because I'm running out of time, I just want to show you a short outlook on what else we have in our verified tool set. You have probably seen on the slide before how big this firewall was: 4800 rules, quite large. I guess this firewall ruleset has now hit 5000 rules. And we had the question: by the way, who is allowed to set up SSH connections with whom in a 5000-rule firewall ruleset? Well, there is the answer. All these ranges you see there, internal, servers, INET, essentially correspond to the complete IPv4 address range. I just gave them symbolic names here, because those ranges are really split up, maybe non-contiguous, because there are so many exceptions in there, but they're all clustered together. And we have proven a few things about this graph: it's the complete IPv4 address range; all the IP addresses that are grouped together, for example in the servers or internal group, really have the same access rights; and this is the best possible solution you can get for this question, because you cannot compress this any further. You can't get a clearer or better answer to that.

Well, you may ask why we have two on the left, INET and INET prime. Because of a typo in the firewall, which is now fixed. It's pretty clear in the graph that you don't want two internets there, and you
can really see the bugs there. And what was really interesting, especially for us: we looked at the long list of servers there and internal there and really verified that all these IPs for the servers are the machines which should be accessible by SSH, and the internal ones are the ones which should not be accessible by SSH. And for a 5000-rule ruleset you can see that there are a lot more funny exceptions with funny access rights, which we should probably also look at.

Okay, that's it. Thank you for your attention. It would be really, really great if you have some firewall rulesets lying around and would want to donate them for research; we would be really, really happy. And of course everything we are doing here is not only fully verified but also completely open source. You can get everything. Thank you for your attention.

I guess we have a few minutes left for questions. So just line up in front of the microphones, right here and here. I think there is one. There you go.

So, have you done any work, or are you aware of any work, that involves studying firewall rulesets of multiple cooperating firewalls?

Short answer: is it possible with ours at the moment? No. But there is other work in research which already claims to have solved this problem, and they did, but they have a very limited firewall model. What we can do with our tool: you can give us an iptables-save dump with all the complex features you have in the firewall, and we can simplify it to either an over- or an under-approximation, which will create a much simpler firewall ruleset, and this simple firewall ruleset is then actually understood by some tools which have been developed in academia.

Okay, thank you.

We can pre-process, but we have not yet incorporated further analysis in a fully verified manner.

Hi, thank you for the talk, it was very well done. I would like to get more involved in formal methods and proof theory and stuff like that,
but I have trouble knowing where to start. I was wondering if you had some insight into what resources would be good to get started with programming in Isabelle or Coq or even Idris, things like that.

So first you should look at the theorem prover of your choice. I prefer Isabelle. If you're more from the programming community, you will probably enjoy Coq more. It depends a lot on personal choice. If you're a student at TU Munich, you should definitely go to Tobias Nipkow's courses. Great courses, because there we have the Isabelle group directly. And if you want to get started with Isabelle, there's the book Concrete Semantics by Tobias Nipkow and Gerwin Klein, and I hope I didn't miss another author, which is, I guess, freely available; you can also buy it. All the examples in the book should be included in the default Isabelle distribution. So that's the easiest way to get started: download the complete Isabelle package and look at the book. It usually just runs on your system, you don't have to compile any additional packages, and you can get started with the book.

Okay, thank you very much.

Hey, we're running out of time, so there is time for one last question, from the Signal Angel.

Yes, on the IRC they would like to know what the time complexity of the algorithm is.

The runtime complexity. Well, bad news: this pre-processing can blow up to exponentially many rules in theory. In practice, we looked at a lot of firewalls, and this exponential complexity is sort of exponential in how fucked up you manage to call your user-defined chains. Usually you do that by hand or semi-automated, so it doesn't blow up that much. The worst thing I saw, to tell you real numbers, was a firewall of about 500 rules based on a Shorewall setup, which was really, really quite... well, it triggered that a lot. And also when we rewrite IP address ranges to non-negated CIDR ranges, that can blow up a lot. The worst blow-up we ever saw was from 5,000 rules to 20,000 rules, which a computer can still handle very well, because
afterwards, basically all our algorithms are mostly polynomial time, if not even linear. To tell you real numbers: this took about one minute of pre-processing for five thousand rules and then about one second per interface, and this took less than two minutes for all the five thousand rules. So you can really run those things on your computer.

Cool, thanks.