Oh, right. Talk, yes. Welcome. I'm going to be talking to you about automated firewall testing. And to give you the entire point of the talk right up front as a spoiler: testing is good. You should all do more testing. Any questions? Excellent. Okay, we can go now.

To do the typical shameless self-promotion: I'm Kristof. I'm a FreeBSD developer. In FreeBSD, I mostly try to maintain pf. pf is one of the many firewalls, well, one of the three firewalls in FreeBSD. pf we stole from, sorry, borrowed from OpenBSD, and we will return it to them any day now. Professionally, I tend to do a lot of embedded projects, quite a lot of them on, you know, that other UNIX operating system nobody's ever heard of. I want to make it very clear that I'm not for sale, but I am for rent at very reasonable rates. Send me email.

On to more interesting things. pf is a packet filter. I trust we all have a vague idea of what firewalls do. They look at your packets, cluck their tongues disapprovingly, and then throw some of them away. Sometimes they let packets through. I mostly think that's a mistake. We imported it from OpenBSD. We did that a while ago. I actually forgot to look up when, but it's been many years since we did the last import, which means that there are differences. The OpenBSD people keep changing their mind about what the syntax should be, and we haven't taken any of those changes in yet. We do take occasional bug fixes. There's even one or two commits with my name on them in OpenBSD, which should scare you off that particular operating system.

Now that I've said what we don't have in FreeBSD, some of the things we do have: VNET, and I'll talk more about VNET later. And what we also have is multi-core capability. So pf in FreeBSD tends to be quite a good bit faster than the one in OpenBSD. The OpenBSD people are working hard on unlocking their kernel.
They are, I think, discovering, as FreeBSD did in FreeBSD 5, that this is not easy and it's going to take a while, but they'll get there.

On to the actual topic: testing. Why do we want automated testing? Well, obviously because I'm really, really, really lazy. If I can get the computer to do it, I don't have to do it. That's clearly a win. Testing is a good idea because our users are very spoiled, and they have this vague notion that if we give them software it should actually work. Yeah, I know. They're very unreasonable, but what are you going to do? It's also nice that when we fix a bug it doesn't spontaneously re-emerge. Regressions are a thing that happens. I've got some examples later on. Actually on the next slide, but that's later.

It's also really nice when you're doing development. I recently did some work on pfsync, and it's really nice that you can run some tests and have an idea that, well, I may have broken it, but at least the breakage will be subtle and it will take a while for people to notice. It doesn't explode immediately when you try to use it. I will confess that during development I had quite a lot of episodes of "I think this will work," and then you run the test and it just massively explodes. But that's actually when you want your tests to fail: when you're developing, rather than when you've delivered it to the customer and they're going to start using it.

Some example regressions. There's a pattern here. I wonder if you can spot it. One of the things I did to pf was make it understand v6 fragmentation, and I hate v6 fragmentation now.
Long, long ago that went in, and that worked, and it was wonderful. And then somebody made the v6 forwarding path faster and broke pf's fragment handling, and I didn't notice for nine months. That's a painfully long period for something to be broken. That was before we had any sort of tests for pf, so I would only have noticed when I manually built a test setup and tried this out, and my weekends are boring, but they're not that boring. It did eventually end up being fixed.

A little while later, actually just last year, around August I think, v6 fragment handling broke again. This time it was the result of a security fix. There was an issue with fragment handling, not just in FreeBSD but in Linux and in a bunch of other operating systems, where somebody decided to try "what happens if I generate an evil sequence of fragmented packets?" It turns out that a lot of operating systems kept their fragments in lists, and you could exploit this to make fragment lookups take a really, really, really long time. So that was a denial-of-service vector. Someone fixed this (I really should have looked up his name), but it turns out that the fix had a bug in it. I'll show you the code and give you a moment to try to find the bug. The test picked it up immediately, and it only took about two weeks between the bug being introduced and the fix going in. That's two weeks that it was broken, which is not ideal, but two weeks is a lot better than nine months.

It was also a really annoying bug, a heisenbug. You try to debug it, you attach DTrace probes to the function, and all of a sudden the bug goes away. And you can't really tell all of your users, "just run this DTrace probe and your code will work just fine."
Also annoyingly, the pf tests failed over this, so everybody thought, "It's a pf bug, Kristof messed it up again, nobody needs to look at this." So eventually I looked at it, and I was very confused, because I had not actually touched the v6 fragment code in pf in months. Why is this broken? It turns out it wasn't pf that was broken; it was the rest of the network stack.

So here's a little bit of code. This is where fragments enter the stack. Has anyone spotted the bug yet? I have actually made it slightly obvious. Basically, what it does now is hash the source and destination IP addresses and (what else does it hash?) presumably the fragment identification somewhere, and it uses this to divide up the fragments into buckets, to make it much, much easier to find them rather than run through lists. And it turns out that this hash, well, we allocate an array, and then we hash over the entire size of the array. The array is made of 4-byte uint32s, so it is four times as big as it should be. So we hash not only the source and destination IP addresses and the ID, we also hash whatever random garbage happens to be on the stack at that moment. That makes the hash unpredictable, which means that some parts of your fragmented packet will end up in one bucket and some parts will end up in another bucket, and so you never, ever manage to reassemble them. Unless, of course, you attach DTrace, and DTrace leaves predictable garbage on the stack, so they all end up in the correct buckets and everything just works.
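
The failure mode is easy to reproduce outside the kernel. What follows is a minimal Python model, not the FreeBSD code from the slide. It assumes the key is two 16-byte IPv6 addresses plus a 32-bit fragment ID, uses SHA-256 purely as a stand-in for the kernel's hash function, and every name in it (make_key, bucket_buggy, bucket_fixed, the bucket count) is hypothetical.

```python
import hashlib

ADDR_BYTES = 16                      # one IPv6 address
KEY_BYTES = 2 * ADDR_BYTES + 4       # source + destination + 32-bit fragment ID
BUFFER_BYTES = 4 * KEY_BYTES         # oversized buffer: four times what the key needs
NBUCKETS = 64                        # hypothetical number of hash buckets


def make_key(src: bytes, dst: bytes, frag_id: int) -> bytes:
    """Pack the fields that are supposed to identify one fragmented packet."""
    assert len(src) == ADDR_BYTES and len(dst) == ADDR_BYTES
    return src + dst + frag_id.to_bytes(4, "big")


def _bucket(data: bytes) -> int:
    """Map bytes to a bucket index (SHA-256 is only a stand-in hash)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") % NBUCKETS


def bucket_buggy(key: bytes, stack_garbage: bytes) -> int:
    """Hash the whole buffer: the key plus whatever follows it in memory."""
    tail = stack_garbage[: BUFFER_BYTES - len(key)]
    return _bucket(key + tail)


def bucket_fixed(key: bytes) -> int:
    """Hash only the bytes that were actually filled in."""
    return _bucket(key)


if __name__ == "__main__":
    src = bytes.fromhex("20010db8000000000000000000000001")
    dst = bytes.fromhex("20010db8000000000000000000000002")
    key = make_key(src, dst, frag_id=0x1234)

    # Two fragments of the same packet, each hashed with different leftover
    # bytes after the key, as if the stack held different garbage each time.
    first = bucket_buggy(key, b"\x00" * BUFFER_BYTES)
    second = bucket_buggy(key, b"\xff" * BUFFER_BYTES)
    print("buggy buckets:", first, second)   # usually differ, so reassembly fails

    # Same garbage every time (the DTrace case): the buggy hash looks fine.
    print("stable garbage:", first == bucket_buggy(key, b"\x00" * BUFFER_BYTES))

    # Fixed version: the bucket depends on the key alone.
    print("fixed bucket:", bucket_fixed(key))
```

Hashing only KEY_BYTES corresponds to the fix described in the talk: hash only the bits you actually mean to hash.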
Yeah, there was some amount of swearing when I finally figured out what went on there. It is subtle enough that it's easy to miss. So the patch that went in is: just make this array a bit smaller, use less stack real estate, hash only the bits that you actually mean to hash, and then everything is fine.

So what do we want out of tests? Well, again, we're lazy, so we want them to be easy to write. We also want other people to be able to run them, not only so that they can run them and find their own bugs, but also so that they can write them, because no matter how easy it is to write a test, if somebody else does it, I don't have to do the work. We also want them to integrate with our existing continuous integration infrastructure. FreeBSD actually has a fairly large number of tests, and we do actually run them regularly: ci.FreeBSD.org, if you're interested. So we run all of these tests. We build (it's always nice if your software actually compiles) and then we actually test this. Some tests failed here. It's math related, and math is something computers do, so I can't help you with that one. That's about 7,300 tests, and these run a couple of times a day. Basically, whenever it gets done with the last test run, it does another one. We now have someone who will actually follow up on those tests, so that's really nice.

I maintain the firewall, so the bit of code that I would like to test is the firewall. How do you go about testing firewall code? The typical approach that you might think of first time around is: okay, we get a bunch of hardware. We get, like, three machines. We have machine A, machine B, and in the middle we stick the machine where we run the firewall. We have machine A send a packet through the firewall machine, and machine B will go and look: did this packet make it through unmolested? That's all fine. You end up needing a fair bit of hardware, and then when you want to
test more complicated setups like pfsync, where you synchronize states from one firewall to another, well, you need two firewall machines, so you need four machines. And then you might want to test CARP to do failover, so you also need four or possibly even more machines. So this gets to be a bit expensive.

Not only that: how would you configure this machine? Well, you just SSH into it. Except that we have just configured the firewall to drop all traffic. Okay, fine, you can attach a serial port, or you can use IPMI on a different interface, but your hardware budget starts to grow fairly significantly if you want to do this. Then you've got another issue: you really want to be running the latest and greatest code. So you can plausibly netboot this thing, which is really annoying if you want to test a firewall. Also, you need a server to host this on, and your setup gets to be really, really complicated. You need to deal with "this box panicked, so now we need to reboot it, and we can't log into it, so we need to do this over serial." But if you've ever tried to do interactive scripting against something that's intended to be used by a human, it's hideously unreliable. So you need a remote power switch. It gets to be very complicated, very expensive, and slow, because machines take forever to boot.

And where would this hardware live? The FreeBSD project has some friends who would lend it the hardware and who would host it, but then you only have the one setup. So how do other people write tests? This is a very unsatisfactory sort of solution.

It turns out that these days you can have virtual machines. Has anyone ever heard of this concept? FreeBSD even has its own hypervisor, bhyve, and this is the approach that was actually taken in a Google Summer of Code project. It has a bunch of advantages, because you don't need to buy all of this hardware, and a lot more people can build this
setup. But you've still got "what if we block all the traffic?" Okay, we need to use a serial interface. Your configuration gets to be a bit complicated. Another issue is that the way the tests are currently run is in a virtual machine, because that's a really convenient way to run your tests. But if we want our test to start up a virtual machine, we have to do nested virtual machines, and that becomes a lot more interesting. It's also (and that's probably more an artifact of how we do our builds and our tests) really, really annoying to build a virtual machine from your test setup, where you might not have the source code installed, you might not have already compiled it, and actually building that virtual machine can be slow, never mind booting it. So this is better, but it is still an annoying setup.

What we finally wound up with is VNET. Before we actually go into that, it might be useful to explain what VNET is. So how many of you know what VNET does? Oh, excellent, then you guys can explain it and I don't have to.

Basically, VNET is virtual IP stacks. FreeBSD has jails, which are like Linux containers, but we've had them for a lot longer, and you can now associate an IP stack with a jail. That means that from inside the jail you can set an IP address, you can run a DHCP client, and you can configure a firewall. This has been enabled by default in 12, the latest release. Everybody should be running 12; it's awesome. pf supports this now, by which I mean I don't know of any way to make it panic. I'm sure there are some, and if you know of any, file bugs and tell me about it. But it actually does mostly work now, which means you can start up a jail and give it its own firewall, which means we can just start the jail, throw some traffic at it, and see what happens.

So how do we do this? I don't know how many of you have played with Linux containers, but there's a lot of stuff that goes on to actually make that container behave the way you want it to behave,
any number of moving parts and any number of things that have to be configured. Abstraction layers like Docker will do that for you, but there are still a lot of moving parts. On FreeBSD, it's actually not that difficult. That's it. That's a jail with its own IP stack. You start the jail, you give it a name (I've named it alcatraz because I think that sort of thing is funny), and then you tell it that you want it to have its own IP stack, so vnet, and you want the jail to remain running even if no processes are running in it, so persist. That's it.

Of course, I've lied to you slightly, because that's not quite everything there is to it. While this jail now has its own IP stack, it doesn't have any network interfaces. It has loopback, but firewalling loopback is perhaps not the most productive thing you could be doing with yourself. I'm not going to tell you how to spend your Saturday, but consider picking up a different hobby. So what we do in the tests is create an epair. An epair is basically two network cards with a cable between them. They're virtual network cards, so you can't actually use them to link up two different machines, but within one machine you can link up virtual machines or jails with them. So we create one, we assign an IP address to it, and then we tell the jail, "you can have this network interface." An additional, really fun thing about jails is that they make it really easy to execute things inside the jail. So there we go: execute in the jail, ifconfig epair0b, set this IP address, and up. After we've done that, we have one external interface on the host system and one internal one inside the jail. (I should actually have included the name of the jail here, so there's a typo in the slides.) And then you can ping the jail. There you go: we're testing that the network stack works. ICMP echoes are not very exciting, but they do actually exercise a fair bit of your functionality.

So why don't we take a look at a basic test. That's not the entire test; that's just the header. We use ATF. We noticed that NetBSD had left their front door unlocked one night, so we went in and took their testing framework. It's entirely possible they have a newer one by now; I haven't actually looked at the new one yet, because they keep locking their door now. So what do we do? We declare a test case for v4. It's got a cleanup function; we'll see the cleanup function after we've done the really interesting things. You can set the description, and this test actually wants to run as root. The nanny state of FreeBSD has decided that if you want to create network interfaces and start jails and configure the firewall, maybe we shouldn't let every single user on the system do this. Yeah, I know, I know, but that's how it is. So that's just setup code. You can forget this; it's not important.

This is the actual test. It's a very basic test, but it is already a useful test: it tests that the firewall works and blocks packets. So what do we do? Some initialization code. All the initialization code does is make sure that you've actually loaded the pf module. It is very hard to test a firewall if the code's not in the kernel, so if it's not loaded, this test will just be skipped. Next step: create an epair and set up an IP address on the epair. Basically, as we saw before, we want an interface in the host and an interface in the jail so that we can send traffic back and forth. Make a jail. There's a wrapper around this just to make it slightly easier to type. What it also does is keep track of the list of jails you've created, so that in the cleanup it will automatically destroy all of these jails. It's quite nice to actually clean up after yourself. Set an IP address in the jail, and then a sanity check:
can we actually ping this jail now? If things fail here, it's not actually a pf bug, because pf is not enabled yet, but we might as well test that this works. So atf_check: run this command, check that the exit status is zero, and ignore the output. Then run a ping command, one ping only, with a timeout of one second, because if your echo request hasn't made it to a jail running on the same host within a second, something's probably wrong. Next step: we enable pf. pf defaults to allowing all of the traffic to go through, so we can still ping. And then, when we tell pf to block all of the traffic, we can't ping anymore, hence the exit status two. It won't work anymore.

That's a very basic test. Of course, the firewall should be able to make more fine-grained decisions on your packets than, let's say, allow all packets or deny all packets, but it's a basic functionality test. The rest of this test actually does a little bit more, but this is all I could fit on the slide. You can see how you can fairly trivially test that I can filter out only ICMP echoes and still telnet into it, or whatever. Cleanup code cleans up, and then we add the test case to the list of test cases.

So if you run this using Kyua, which is another tool that we happened to spot lying in an unlocked garage at NetBSD, you can tell Kyua to run this one particular test. It will run the test, and you will see that the test took 1.2 seconds, of which it spent an entire second waiting for an ICMP reply that we knew would never turn up. So that's 0.2 seconds to start up a machine, well, a jail, configure the networking on it, configure the firewall, and tear it all down again. I think we've satisfied the "these tests are quick" requirement. We've also satisfied that everyone can run this, because you don't need anything special if you've got a FreeBSD system, a FreeBSD 12 system or current, or even an 11 system, although there are any number of bugs still in
VIMAGE in 11, so it's not enabled by default there. You could build your own, but if you want VIMAGE you really want to be running 12. Kyua also stores the test results, so you can later on go and have a look at what the output was, what environment variables were set, what the timing was, all sorts of things like that.

So this was a very simple test. I've mentioned pfsync before, so let's take a look at a really complicated test. This is just half of the test, but it is a very basic test for pfsync. pfsync synchronizes states from one firewall to another. So what do we need? Well, we need an interface over which they can sync, we need two jails that will have states, and we will just send our traffic from the host. So we create jail one and jail two. Yeah, the naming-the-jails-after-real-life-jails joke got a bit old, so they're one and two here.

So what do we do? We set up IP addresses for the external interface and for the sync interface. We configure pfsync, so we tell it what device it should synchronize over. If you don't configure IP addresses in pfsync, it will just multicast, so if you have two machines connected over a straight-up link cable, it will just magically work and you don't need to worry about it. Set that up, then do the same thing on the other interface. So that's all just setup.

What do we do next? Well, we turn on pf, we set some rules on both hosts, we set up the interfaces, and then we send a ping to it. We select the correct source interface, just in case that would go wrong. We give pfsync some time to actually synchronize the states, because this is not an instant process, and then we go and look, not at the jail that saw the traffic but at the other one: do you have the corresponding state for this? So we just grep and go look: does this state exist? If it does not exist, we error out. If it does exist, everything worked and it's awesome.

You had a question? So the question is, have I tried to use tcpdump to see if it's arriving, if
the traffic arrives. I have not. Mostly what I've tried to do is make it really easy on myself to actually test this. So rather than look at "do I actually get this packet," we check "do we get the effects of this packet having been delivered." Typically, for things like ICMP echoes, I don't check that the reply arrives; I check whether ping is happy that the reply arrived. Theoretically, that does mean that you could have bugs like: the firewall corrupts the payload of an ICMP echo reply packet, and ping doesn't check for this corruption. That is a bug that you wouldn't notice in this setup. So that is arguably a weakness of this setup: you're not actually doing formal testing of whether the TCP or IP output is actually standards compliant. All we're testing is whether FreeBSD itself is happy with the packets it sees. I think that's a price worth paying, because doing the full formal verification is much, much harder. And it turns out that if FreeBSD does TCP wrong, we're going to notice. If nothing else, the Netflix people will shout at us. Well, they won't shout, they're actually nice people, but they will tell us.

Now that I've got you all fired up for these tests, where can you find them? The source code for these tests lives in /usr/src (assuming you've installed your source to /usr/src), under tests/sys/netpfil/pf, which mirrors the structure of where the code for pf lives. They get installed to /usr/tests/sys/netpfil/pf, which again mirrors that structure. If you want to run them yourself, you might want to install some tools: Kyua, to actually run the tests, and (I never know how to pronounce this) Scapy. It's a Python tool that lets you generate and analyze packets. There are some tests where I deliberately create very specifically formatted packets and actually go look at them. Some of the fragmentation
tests do just this, where you can deliberately create corrupted packets, for instance, which might also be a really interesting test to do. If I knew of any malformed packets that cause panics, for instance, that is something I would definitely do: create this packet to attempt to provoke this panic.

You want to load, well, pfsync, because the pfsync module depends on pf, so it will implicitly load that one. Then you can go to that directory and, as root, run kyua test, and it will run through all of the tests for pf. It takes, I forget, about 30 seconds. We don't have that many of them yet, but it turns out that having a couple of tests already gives you a lot of value. So you don't need to think, "it's a firewall with hundreds of features, so I need to write thousands of tests." Ideally you want to have thousands of tests, but it turns out that just having five of them already tests a lot of the functionality of pf, and you can get a lot of fallout, accidental tests, along the way. For instance, this IPv6 fragmentation bug, well, the reassembly bug: we didn't have code to specifically test reassembly. pf just tested this accidentally while it was testing something related but not quite that code. So you get a lot of value out of relatively little investment in tests.

At this point I would appeal to authority and get some sort of really profound quote from someone to reinforce my point. I didn't find one, because I was lazy and I didn't look for one. So the quote that you're getting is "tests are good and you should write more tests," and it's from me, just now. So it is a very well sourced quote.

To continue to try to persuade you that tests are a good idea and that you should try to write more of them: what is in it for you? Well, you can prototype your own setups. If you want to play with CARP or pfsync or pf, and you don't happen to have a box lying
around to run this on, just spin up a jail and test this. And if you write it as a test, you can keep running it, and you can be sure that this won't actually break. So if you have a very specific use case that relies on a feature that you're not sure other people are using, send me a test case. We'll include it, and you can be sure that the next time somebody breaks it, it will get fixed, rather than you noticing when you try to upgrade to a release two years after somebody broke the feature for you.

Another case, and this is what I really want to focus on: when you report a bug in pf, I have to try to reproduce it. I have to try to work out what your test setup is, and quite often, in most cases in fact, I spend more time trying to work out what your setup is and how it breaks than actually debugging the problem. So if you give me a test case, you've already done more than half of the work. Not only that: when I fix the bug, I would like to have a test for it. Well, you've given me a test case, so most of the work is already done.

So if you have a pet bug that's really been annoying you and you want it fixed, that's the best way. Actually, that's not true, that's the second best way. The best way is to pay me. No, that's not a joke. I'm very money motivated. If you pay me to do work for you, I will do work for you. But if you don't want to pay me (you should, but if you don't want to), write the test, and I promise you, well, I'm not going to promise you that I will absolutely fix your bug for you, but it does vastly increase the chances of it getting fixed.

I've had someone report a bug to me in the NAT system. It was a very strange bug, and I had no idea where it was or how to fix it, right up to the point where he gave me a trivial test case for it. Not quite in this format yet, but he gave me a configuration file and a little shell script and said, "you do this, and I see these problems." And then I could suddenly
reproduce the problem, and it only took a day or two to fix. This for a problem that started out as a description like: "I have this very large NAT setup. Very large. I run a university, so a lot of students, a lot of traffic flowing through this, and every once in a while, without any obvious cause, the machine will just stop forwarding packets, and then it will sit in that state for a few minutes, and then it will start forwarding packets again." Once we had the actual description of "this is how you trigger it," it took barely any time at all to figure out where the problem was and fix it.

So write tests. Illustrate your problem. Make it easy for other people to fix the bug for you. Or you can also fix the bug yourself, and that's awesome, but it turns out that reviewing code means that I need to understand what the problem is, which means that I need to know what your setup is and what the bug is, which is also accomplished by a test case. If you want to make it even more likely that your bug gets fixed, write the test case, write the patch, and submit that. But whatever you do: test case, test case, test case.

If you don't want to be the first one to do this, you won't be. Olivier Cochard-Labbé is one of the contributors to the FreeBSD project, and he's really, really good at benchmarking things. For his previous employer he really cared about IPsec, and it kept breaking, and he was really sick of this. So he wrote tests for it, and now there are tests, and IPsec isn't broken, and if somebody does break it, Olivier will shout at them. Well, that's a lie. Olivier's a really nice guy. He won't shout, but he will tell people that it's broken and that they should fix it. And this also means that, because we run these tests a couple of times a day, if you commit something that breaks IPsec today, we're going to notice tomorrow or on Monday, and you're going to have a
bug assigned to you on Monday or on Tuesday, so it will be very fresh in your memory.

Before I belabor the point any more, does anyone have any questions? Have I been that clear? Oh, yes, go ahead.

"Well, it's a filter. Don't you test pf as a filter at the low level? I mean, when you for example use ping, there's a lot of things that can go wrong when the ping tool doesn't exit successfully, and you're not quite sure, as you mentioned before, that it's pf that is broken."

Yes. So the question is: when we test pf based on the ping tool, there is a lot of code that needs to run, so why don't we simplify this down so that we're only testing pf? That is a valid approach to take in testing. I think the advantage of not doing it that way, of just using the ping tool to send the packet, is twofold. The first is that it's much easier. We don't have to write any code to send ping messages, and we don't have to write any code to receive them or to validate them. That code is already there, so we can just use it. It makes it much easier to write the test.

Another side effect of this, and you can argue whether that's good or bad, is that you test much more code. That's the incidental testing I was talking about, like we had with the v6 reassembly code. Incidental to this test, you are also testing ping. That's an upside, because you test more of the code and the functionality of the system. It's also a downside, because when it breaks, it's not immediately obvious where the breakage is. So if somebody breaks the ping tool, changes the return codes, or changes it so that it no longer sends ICMP echo requests, or so that it thinks they're corrupted when they're not corrupted, this test is going to fail, and it's going to look like pf broke. That's arguably a downside, but hey, at least we know that something's broken. And especially if you run these tests fairly frequently, you will typically not be looking at
thousands upon thousands of commits that could have caused the problem. So if you look and you see that today it was working and tomorrow it was broken, and in this time span we had 50 commits, 40 of them touched unrelated driver code, five of them touched the IP stack in some way, and one of them touched the ping tool, then there are not too many things that could have caused the breakage. It's a valid point, but you can argue it both ways.

Yes? So the question is, do we have test cases for the other firewalls in FreeBSD? FreeBSD ships out of the box with three firewalls: ipf, ipfw, and pf. We're not going to go into the reasons for this, because honestly I don't know if I understand and agree with all of them, but the fact is we have three firewalls. We currently don't have tests for the other firewalls. Right now pf is the only one that has these tests. It would be trivial to look at the pf tests and go, "okay, now I'm going to write similar tests for ipfw or ipf." I believe both of them also support VNET. It does actually take some changes to the firewall to make VNET work (not to make the tests work), and the tests do assume that you have VNET. So we don't have them right now. I actually just this morning posted an idea for a Google Summer of Code project, so if there are any students who are interested in working on firewalls and testing, there is a project: write more tests. Ideally, it should be possible to write a test so that it works on all three firewalls. There is some functionality, like pfsync, that is unique to pf, so you would have to write a test specifically for pf. But for things like drop this packet, allow this packet, keep state, you could write tests that would work on all three. We don't have them right now. It would be awesome to have them.

Any other questions? Okay, thank you very much for your attention.

So the question is, do the tests include any randomness? Right now they do not. Again, randomness in tests
is something you can argue both ways. It's good because fuzz testing is great: you can trip over unexpected bugs. It's bad because you can't necessarily reproduce your results. Right now there is no randomness in it, or no deliberate randomness anyway.
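
One common way to get some of both sides of that trade-off is to draw the random inputs from a generator whose seed is logged, so that a failing run can be replayed. The pf tests do not do this; what follows is only a minimal Python sketch of the idea, and the function name, the payload-size range, and the logging format are all hypothetical.

```python
import random


def random_payload_sizes(seed: int, count: int = 5, max_size: int = 1400) -> list:
    """Return `count` pseudo-random payload sizes that depend only on `seed`."""
    rng = random.Random(seed)        # private generator, no global state
    return [rng.randint(1, max_size) for _ in range(count)]


if __name__ == "__main__":
    seed = random.SystemRandom().randrange(2**32)   # fresh seed for each run
    print("seed:", seed)             # log it so a failing run can be replayed
    sizes = random_payload_sizes(seed)
    print("payload sizes:", sizes)
    # Replaying with the logged seed yields exactly the same inputs.
    assert sizes == random_payload_sizes(seed)
```

A test built this way still explores different inputs on every run, but a failure report that includes the seed is enough to reproduce the exact inputs that triggered it.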