OK, good morning. My name is Austin Gadient. I'm co-founder and CTO of a Linux security company called Vali Cyber. I'll start off by saying a little bit about my background and experience. I did my undergrad at the United States Air Force Academy and my master's degree at MIT. Prior to my current role, I was an active duty officer in the Air Force, working on a program focused on developing secure software architectures for satellites. It turns out the flight computers for most satellites run Linux, so I was working with Linux systems from a defensive perspective, and the program was focused on acquiring new technologies that could be used to secure those systems. We ran into a lot of challenges, which is ultimately what motivated me to start Vali Cyber. But I'm not here to talk about Vali Cyber. I'm here to talk about SecurityPerf, an open source tool we've developed to make it easier to determine the performance impact of security solutions on workloads commonly run on Linux systems.

So let's start off with a little thought experiment. Imagine we have an Apache web server, and we've benchmarked it: it takes about five milliseconds on average to respond to a web request. It's a nice, snappy, fast server. The users are happy, they're getting good response times, and the service has blown up in popularity. So we want to make sure we secure the data those users are accessing from this server. Let's pretend we have a tool that provides perfect security for the system. We apply this tool, we never have to worry about cyber attacks against this system again, and it imposes a performance impact that increases the response time by 1%. So instead of taking five milliseconds, a request takes 5.05 milliseconds. In this case we're cruising: we have a wonderfully performant system, even with that small performance impact, and it's totally secure. We don't have to worry about security anymore.

But what if this tool had a much greater performance impact? What if, instead of 1%, it caused the response time to go from 5 milliseconds to 500 milliseconds? Well, that's not so good. Folks are not going to be happy using a web server that takes that long to load; they're going to be sitting around waiting for pages, and it's not going to be a good user experience. What this little experiment tells us is that perfect security often isn't the goal. In fact, in industry, sometimes even basic security doesn't get much effort. Instead, effort goes into the valuable services: security gets put on the back burner in favor of new features and capabilities designed to meet timelines, and that tends to be the focus of a lot of organizations. To some extent, that makes sense: you're only going to spend resources securing something if you feel it's valuable to secure in the first place. The other thing to note about this example is that, obviously, we don't have a magical security tool that makes everything perfect, but many security tools do have some sort of performance impact on the system they're securing. Think of firewalls that inspect network packets: that's going to have an impact on network throughput.
If it's a file-scanning anti-malware system, it's going to scan files for signatures, and that's going to consume CPU resources. So it's very common for a security tool to have some sort of performance impact. What motivated me to create SecurityPerf in the first place is that while working on the satellite program I mentioned earlier, I was at Black Hat, going around the expo floor talking to vendors about their products and trying to understand the performance impact of their security solutions. I would get answers like: the tool is lightweight, it's low overhead, on average it consumes less than 1% of the CPU. When I dug a little deeper to understand how they measured those performance metrics, it was almost always the same answer: go install the tool, try it out, run some commands, and you'll see that your system isn't any slower. And sure, a human is not going to notice a difference of a few microseconds or a few milliseconds. But it certainly matters, especially when you're thinking about deploying a system at scale, and Linux systems are often deployed at scale. A few milliseconds added to the response time of a web server or a database affects the amount of underlying infrastructure you need to serve the same user base and the same load, which of course increases your computing costs. So performance is super important.

The reproducibility of whatever performance benchmark you apply is also very important. When I started benchmarking different tools, I would write my own benchmarks or adapt existing ones, install the services and the benchmarks by hand, and run them individually. When someone tried to reproduce the results a week later, it was very difficult for them to do, because these tools often take real effort to set up. Automating the benchmarks, and automating the instantiation of the environment the benchmarks run in, makes getting reproducible results much, much easier.

So let's talk about some common mistakes I see when folks try to do performance benchmarking of Linux security tools. With users who are a little inexperienced, a very common approach is to use top or htop to look at the CPU or memory utilization of processes and kernel threads at runtime after installing some tool. But this isn't going to tell us the performance impact on, say, the transactions per second a web server can achieve. top and htop look at the proc filesystem: they gather information about the amount of time a thread has been scheduled in user space, the amount of time it's been scheduled in kernel space, and the amount of time it's been idle, and from this they calculate CPU percentages. But this doesn't necessarily tell you the impact of something like a kernel callback. A lot of security tools install kernel modules onto a system, and those kernel modules have callbacks. That overhead ends up getting mixed into the accounting of whatever user-space application happens to exercise the function the kernel module is hooking, whether that's system calls or network traffic or something along those lines.
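To make that concrete, here is a minimal sketch of the kind of calculation top and htop perform, assuming a Linux /proc filesystem. The field offsets follow proc(5); everything else is illustrative:

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second

def cpu_ticks(pid):
    # /proc/<pid>/stat fields 14 and 15 (utime, stime) are the ticks this
    # process has spent scheduled in user mode and kernel mode.
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The comm field may contain spaces, so parse after the closing ')'.
    fields = data[data.rindex(")") + 2:].split()
    utime, stime = int(fields[11]), int(fields[12])
    return utime + stime

def cpu_percent(pid, interval=1.0):
    before = cpu_ticks(pid)
    time.sleep(interval)
    after = cpu_ticks(pid)
    return 100.0 * (after - before) / (CLK_TCK * interval)

# The limitation described above: ticks burned by a kernel module's callback
# while hooking a syscall are charged to the *calling* process's stime, so
# this number can't separate the security tool's cost from the workload's.
print(cpu_percent(os.getpid()))
```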
top and htop can certainly be useful for triaging issues if you already know a problem exists, but they're not detailed enough to capture the sort of information we're looking for. On the other side of things, where top and htop are a little too high level, you have functional benchmarking. Functional benchmarking is essentially running a certain function as many times as you can in a tight loop, then calculating how much work you were able to do over that period of time. A great example of this is the BYTE UnixBench benchmarking framework. UnixBench is very popular and open source on GitHub, and essentially it runs through a set of different tests. The Dhrystone and Whetstone tests at the top run register-based arithmetic instructions, so they don't execute any system calls; everything occurs in user space. Process creation measures how many times a process can be forked within a certain amount of time. Then you have system call overhead, which essentially tries to measure the cost of context switching between the kernel and user space through low-overhead system calls like getpid or getuid.

The issue is that these sorts of tests lack the context of an entire workload. Think about what a web server does: sure, it's executing system calls and opening files, but it's also reading the data from those files, putting that data into a format that can be represented in HTTP responses, and sending that data across the network. There's a wide variety of things production workloads do. There's a paper that does some functional benchmarking, called "Analyzing the Overhead of File System Protections Using Linux Security Modules." It presents the overhead of different file-based system calls when monitored by different Linux security modules, AppArmor and SELinux for example. The table here shows some of the metrics they calculated for SELinux, and they note that the overhead for open was very high: over 80% by their calculation. One of the conclusions of the paper is that this is a concern and a potential problem that should be looked into. The issue here is that in glibc 2.26, back in 2017, a change was made to the way the open library function works: it doesn't actually use the open system call under the hood anymore, it uses the openat system call. So in most modern workloads, if you strace them and look at what system calls they execute, the open system call is rarely called; openat is what's used most of the time. There's a common principle in computer architecture design that you want to make the common case fast. If you're a maintainer of SELinux looking at these results, perhaps optimizing open isn't the best use of your time, because it isn't actually used that much; optimizing the performance of openat could be far, far more useful.

So with that in mind, let's talk about SecurityPerf. SecurityPerf is built on three main tenets. The first is that it needs to be realistic: we want to use realistic workloads, and benchmarking suites for those workloads that produce load on the system representative of what happens in production.
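To illustrate the technique and its limits, here's a minimal functional micro-benchmark in the spirit of UnixBench's system call overhead test. This is a sketch, not UnixBench's code, and the absolute numbers depend heavily on hardware and interpreter overhead:

```python
import os
import time

def bench(fn, iterations=1_000_000):
    # Run one function in a tight loop and report calls per second.
    # This measures the function in isolation -- it says nothing about
    # how a full workload (file I/O, HTTP formatting, networking) behaves.
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return iterations / (time.perf_counter() - start)

print(f"getpid: {bench(os.getpid):,.0f} calls/sec")

# Modern glibc routes the open() wrapper through the openat syscall, so a
# micro-benchmark that only exercises the raw open syscall may be timing a
# path that production workloads rarely hit.
def open_close():
    os.close(os.open("/etc/hostname", os.O_RDONLY))

print(f"open+close: {bench(open_close, 100_000):,.0f} calls/sec")
```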
The second is that it needs to be portable across different versions and distributions of Linux and different hardware, so that all the flavors and iterations of the ways people use Linux are supported. And third, it needs to be easy to extend: if you want to add a new service, or a new configuration for a service, it should be very straightforward to do, because there are lots of different services that run on Linux systems and lots of different configurations for those services.

At its core, SecurityPerf is a collection of Python scripts and Dockerfiles. The Python scripts automate the running of different benchmarks and workloads; as those run, metrics are captured about the results, and the workloads are run for a certain number of iterations so we can do statistical analysis across all the tests. By default, SecurityPerf implements five services. The first is Apache, benchmarked with Apache Benchmark (ab), an HTTP benchmarking tool that makes a certain number of concurrent requests, and a certain total number of requests, that you tell it to make. Then there's MongoDB, a popular NoSQL database, benchmarked by the Yahoo! Cloud Serving Benchmark, or YCSB. YCSB is very popular and open source on GitHub, and it's used to benchmark lots of different NoSQL databases, MongoDB being just one of them. MySQL is benchmarked by sysbench, one of the recommended benchmarks for MySQL; sysbench can also benchmark various kernel operations, but in this case we use its MySQL benchmarking capabilities. RabbitMQ is a popular message queuing service, and there's a dedicated performance test written by the same folks who wrote RabbitMQ itself, called RabbitMQ PerfTest; that's what we use there. And then WordPress is a bit of an oddball compared to the other services, in the sense that WordPress is a combination of two services: a database and a web server. In SecurityPerf's default configuration, WordPress is composed of the Apache web server and the MySQL database, but you can certainly configure WordPress to use a Postgres database, or some other database or web server, in different SecurityPerf configurations. WordPress is benchmarked by Apache Benchmark as well, making the number of concurrent HTTP requests, and the total number of HTTP requests, that we provide to the benchmarking suite.

Okay, so how does SecurityPerf work? The first thing it does is run a script called run.py, which is the script the user interacts with. It builds container images for both the benchmarking container and the service container; the benchmarking container holds the benchmarking suite that's run against the service. These container images are used to instantiate containers that run on two different operating systems, the benchmark OS and the service OS. The idea is to mirror the way these client-server architectures work in production, where you have two different operating systems communicating with each other.
Ideally, you set up your operating systems on separate hardware, connected by a network, so you mirror what happens in production as closely as possible. We use SSH to log in to the service OS; once logged in, we copy the service container images onto the service Linux OS, and we keep the benchmark container image on the benchmarking operating system. By default, SecurityPerf uses password-based SSH for authentication, though you can certainly configure it to use keys as well. run.py then starts the service and waits a certain amount of time for the service to initialize itself. After the service is initialized, it starts the benchmarking container, and the benchmarking container runs a benchmark against the service: it makes a certain number of connections and requests, receives the responses, and calculates different metrics about the responses and the response rate it was able to achieve for that specific test. The data from those responses is captured by run.py, but first we shut down the service and benchmarking containers and clean up. By default, SecurityPerf does not remove the container images from the systems, so a little bit of disk space gets used to keep those images around. The reason is that you don't want to rebuild a container image each time you run a test; the goal is to run the test for many iterations so you can do things like calculate averages and standard deviations across tests.

Then it's time to gather results. run.py takes the output from the benchmarking suite, parses it, and gathers the specific metric you're interested in, writing the results into a text file called summaryresults.txt. The summaryresults.txt file has the values for each iteration of the test you ran, along with calculations like the average and the standard deviation. Finally, we want to compare results between tests. The whole idea with SecurityPerf is to take a baseline system, one without any security tooling or security modification applied, and gather metrics for it. The next system is a modified system: one where you've applied, say, a new LSM policy or a new tool, or perhaps stuck a firewall in front of it, whatever security change you want to make. You gather the results for both of these iterations of the tests and compare them against each other, producing the comparisonresults.txt file, which contains a calculation of the overhead percentage between the baseline and the modified system.
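Putting the whole flow together, here's a simplified sketch of the orchestration loop just described. The helper, image, and host names are hypothetical, not run.py's actual internals; the real tool drives the two hosts over SSH as described above:

```python
import subprocess
import time

ITERATIONS = 5                    # default iteration count
INIT_DELAY = 30                   # seconds to let the service initialize
SERVICE_HOST = "user@service-os"  # assumed SSH target for the service OS

def run(cmd):
    # Run a shell command and return its stdout, failing loudly on error.
    return subprocess.run(cmd, shell=True, check=True,
                          capture_output=True, text=True).stdout

# Build both images once, then ship the service image to the service OS.
run("docker build -t apache-service services/apache/service")
run("docker build -t apache-benchmark services/apache/benchmark")
run(f"docker save apache-service | ssh {SERVICE_HOST} docker load")

results = []
for _ in range(ITERATIONS):
    # Start the service remotely and give it time to come up.
    run(f"ssh {SERVICE_HOST} docker run -d --rm --name svc -p 80:80 apache-service")
    time.sleep(INIT_DELAY)
    # Run the benchmark from the benchmark OS against the service OS,
    # keeping the raw output for later parsing.
    results.append(run("docker run --rm apache-benchmark "
                       "ab -c 100 -n 100000 http://service-os/"))
    # Tear down the service container; keep the images for the next iteration.
    run(f"ssh {SERVICE_HOST} docker stop svc")
```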
Okay, so let's take a look at how you create a new service. We'll use the Apache service as an example. The Apache service as implemented in SecurityPerf has two subdirectories, a Python script, and a readme. The readme just contains information about the benchmarking suite that's used and some general information about the test. The Dockerfiles that create the environments the service and the benchmarking suite run in are very straightforward: they define the environment for Apache, and the environment for Apache Benchmark, since that's the benchmarking tool we use for Apache. Then there's a load testing script, which defines how we run the Apache Benchmark test; by default it makes 100 concurrent connections and a total of 100,000 requests against the Apache service.

The Apache service itself is written in Python, and this is the representation of Apache from SecurityPerf's point of view. There's a class hierarchy within SecurityPerf: the Benchmark class is the superclass of all the service classes. The Benchmark class contains helper functions, like functions that let you SSH between systems and copy files between them, while the service classes contain the specifics of the individual benchmarks. For example, for the Apache benchmark we have things like the container names, the container image names, and the service name. The initialization delay is also an important parameter, and it's an area that can cause issues depending on how slow your hardware is. We found that when running benchmarks against specific services, if the service hasn't had enough time to initialize, either you won't be able to connect to it, or, if you do connect, it will affect the results you obtain in a meaningful way. So the initialization delays are quite generous by default; this is the amount of time SecurityPerf waits for the service to initialize, but it may need to be increased for the specific hardware configuration you're running on. The other notable parts of this script are the commands used to run the different Docker images, and then the results header, target token, and line parser: the portions of the class definition that define which metric you're looking for and want to gather for your comparison between different iterations and runs of SecurityPerf.

The WordPress service is a little different. WordPress is a composition of two services, MySQL and Apache, and as a result it overrides some of the functions from the base class. It still uses base class functions to run commands on remote systems and copy data between systems, but it uses Docker Swarm, rather than Docker directly, to compose the services that make up WordPress. The WordPress service is represented through a YAML file, which is what Docker Swarm uses by default, and the YAML file defines the ports used by the different services and the usernames and passwords used to communicate between them. The usernames and passwords are parameterized so that they're randomly generated at runtime, on each iteration of the test. The WordPress service itself in SecurityPerf comes out to 31 lines of YAML and 51 lines of Python.
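As a rough picture of what such a service class looks like, here's a hypothetical sketch assembled from the pieces just described. The class name, attribute names, and base-class import are illustrative, not the project's actual code:

```python
from benchmark import Benchmark  # hypothetical base class with SSH/copy helpers

class ApacheService(Benchmark):
    service_name = "apache"
    service_container = "securityperf-apache-service"      # illustrative names
    benchmark_container = "securityperf-apache-benchmark"

    # Generous by default: how long to wait after starting the service so
    # slow hardware finishes initializing before requests start arriving.
    initialization_delay = 30  # seconds

    # Commands used to run the two container images.
    service_run_command = "docker run -d --rm -p 80:80 {image}"
    benchmark_run_command = "docker run --rm {image} ./load_test.sh {host}"

    # Which metric to pull from the benchmark output. ab prints a line like:
    #   Requests per second:    4871.32 [#/sec] (mean)
    results_header = "Requests per second"
    target_token = "Requests per second:"
    line_parser = staticmethod(lambda line: float(line.split()[3]))
```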
So it's not a whole lot of code, and the Apache test is even simpler. The idea is that if you want to add a new service, you take the following steps. First, you create the Dockerfile for the service, which defines the environment the service runs in. Then you create a load testing script, which runs whatever benchmarking suite you want to benchmark that service with. Then you create the Dockerfile that defines the environment the benchmarking suite runs in. Once you've done this, you test manually. Before you try to automate everything, we definitely recommend testing manually first, so you can work out any kinks, like firewall rules that might need adjusting to ensure the benchmarking suite and the service can communicate. You can run the different Docker images with a docker run command and check that your service and benchmarking scripts run properly. Once that's working, you add the changes to the run.py script so the test can be automated and run for a specific number of iterations, and then, of course, you can test that everything works with run.py directly.

Okay, so interpreting results. This is an example of a results file from SecurityPerf for the MySQL test. The MySQL test uses sysbench, as I mentioned before; sysbench is recommended by MySQL for testing the service, and the specific metric SecurityPerf pulls out by default is transactions per second, which in this case is 60.84. The line parser, that lambda function we saw earlier in the definition of a given SecurityPerf class, is what pulls the value out of the file. After that, we can compose all the results into a set of iterations and calculate things like the mean and standard deviation across all the iterations of the test. This is the summaryresults.txt file, which contains the result for each test we ran, along with the mean transactions per second and the standard deviation of the transactions per second. Then compare.py is used to compare results against each other: you take the modified system and the baseline system, and their summaryresults.txt files are the inputs to compare.py. compare.py compares those results and determines the percentage overhead between the two sets of tests you ran on the two systems. This is an example of a comparisonresults.txt file: it has the modified mean, the baseline mean, the standard deviation calculated for each set of tests, and the percentage overhead, calculated from the percentage difference between the baseline mean and the modified mean.
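Here's a hedged sketch of that pipeline end to end: pulling the transactions-per-second figure out of a sysbench-style line with a parser lambda, then computing the mean, standard deviation, and overhead percentage. The per-iteration numbers are made up for illustration, not taken from a real run:

```python
import statistics

# sysbench prints a line of the form: "transactions: 614 (60.84 per sec.)"
target_token = "transactions:"
line_parser = lambda line: float(line.split("(")[1].split()[0])

sample_line = "    transactions:                        614    (60.84 per sec.)"
print(line_parser(sample_line))  # 60.84

# Hypothetical per-iteration results for a baseline and a modified system.
baseline = [61.2, 60.4, 61.0, 60.6, 60.8]
modified = [55.1, 54.3, 55.6, 54.9, 55.2]

base_mean, base_std = statistics.mean(baseline), statistics.stdev(baseline)
mod_mean = statistics.mean(modified)

# Overhead: throughput lost relative to the baseline.
overhead_pct = (base_mean - mod_mean) / base_mean * 100
print(f"overhead: {overhead_pct:.2f}%")

# A rough sanity check used in the results discussion below: if the
# difference in means is within two standard deviations, it may just be
# noise -- run more iterations before concluding the tool has a real impact.
print("within noise" if abs(base_mean - mod_mean) <= 2 * base_std
      else "likely real overhead")
```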
Okay, and now some results for specific applications that we tested. We tested ClamAV, AppArmor, and Falco. We chose these because ClamAV is a file scanner, AppArmor is a Linux security module, and Falco is a runtime security system that uses eBPF to inspect the system calls being executed. ClamAV had significant overhead on Apache, and there's a good reason for this: when I started the ClamAV test, I kicked off a file scan with ClamAV. If you've ever used ClamAV, when it's running a file scan it consumes quite a bit of CPU, calculating checksums of files on the filesystem and comparing file names against the known-bad file names in its database. The other thing that's important to note from these results is that a negative overhead was calculated for MongoDB. That might seem counterintuitive at first: why would adding a security solution give you better performance? The reason is that there's non-determinism in the operating system, non-determinism in the network; there are a lot of variables that are hard to control. So what I recommend when using SecurityPerf is to take a look at the standard deviation. In this case the standard deviation is 34.23, and the overhead percentage is within two standard deviations of the mean. If you're within two standard deviations of the mean, essentially what we're saying is there's probably no noticeable difference in performance. If you want to tighten the estimate, you can run the test for more iterations: by default we do five, but you can run a hundred or a thousand iterations to make sure you're capturing the true mean, which gives you a better overhead calculation.

Then we have AppArmor, which did quite well across the board in these tests. It did show a 10% overhead on MySQL, but once again I'd suggest looking at the calculated standard deviation, which in this case was 8.74, over 10% of the mean. The overhead is still within two standard deviations, but that standard deviation is quite high, so the recommendation would be to run SecurityPerf with more iterations, get a lower standard deviation, and be more confident that whatever overhead you calculate is accurate.

And then we have the Falco eBPF agent. Falco has several different drivers it can use: a kernel module, an eBPF-based driver, and what it calls a modern eBPF-based driver, which uses some newer eBPF functionality to improve performance a little. We selected the older eBPF driver because that's what's released for production; the modern eBPF driver is still in a beta phase and hasn't been released for production use at this point. The notable result here is that WordPress took a significant performance hit while the other services did quite well. The important question is: why did we see a difference with WordPress when Apache and MySQL on their own did so well?
Remember, the WordPress configuration used by default in SecurityPerf is composed of Apache and MySQL. The reason WordPress had higher overhead than the standalone Apache test is that Apache is configured slightly differently between the two runs. For the Apache test, Apache is configured to spin off a new thread for each concurrent connection, but there's also an alternate mode where Apache spins off an entirely new process per connection, and that's what happens in the WordPress test: Apache spins off a new process for each concurrent connection it handles. Given the way Falco works, inspecting the system calls executed by specific processes at runtime and correlating those system calls with the individual processes, this sort of activity imposes a higher impact on the system. Ironically, you could actually see this with top: if you ran top during this test, you'd see Falco consuming much more CPU than in the other tests. This is where a lot of the beauty of SecurityPerf comes in. Security tools have to be deployed in many different situations across many different workloads, and we want them to be performant in all of them. Even small configuration differences in a specific service can make a big difference in the ultimate performance impact a security tool has. So the idea with SecurityPerf is to automate the running and benchmarking of these services, and to let you run many iterations, variations, and configurations, so you can capture the performance impact a given security tool or change has on the services SecurityPerf runs. So what did we contribute?
SecurityPerf is built on three main tenets. It's realistic: we use services commonly run on Linux systems, like popular databases and web servers, and benchmarking frameworks specifically designed to benchmark those services. It's open source: all the benchmarking suites we use are open source, and SecurityPerf itself is open source, so anyone can look at what the benchmarking suites and SecurityPerf are doing under the hood. That helps make things more reproducible: you can take the code and modify it as you see fit, and everything runs in a containerized fashion by default, with the services and benchmarking suites in containers. That lets the system be deployed on any Linux infrastructure that supports containerization, which makes it portable between different hardware and operating system configurations. And it's easy to extend: we want it to be easy to add new services, and new configurations of individual services, to SecurityPerf, so if there's a service you care about or run in production and want to test against a bunch of different security tools, you can do that in a straightforward way. The currently supported services are MySQL, MongoDB, RabbitMQ, Apache, and WordPress; on the roadmap we have Nginx and Postgres, since they're also two very popular services that run on quite a few Linux systems. SecurityPerf is open source on GitHub, so you can go grab it today if you'd like. We definitely welcome collaboration from the community: if you want to suggest new features or new services for SecurityPerf, log a new issue in the GitHub repository, and feel free to make your own changes to the code base and submit pull requests; I'll get to those as quickly as I can. We'd love any sort of community involvement with SecurityPerf. That concludes the talk. I'd just finish up by saying I hope all your Linux systems stay secure, and I hope they all stay highly, highly performant as well. At this point I'll take any questions.

Hi, I've seen there is some significantly high overhead in the tests. Do you have any guidance on how to find the root cause of that?
Yes, so the functional benchmarking I was talking about earlier, with UnixBench: the problem with it is that it lacks the context of a specific workload, but it can be very useful for benchmarking a specific function. Let's say you already know there's some sort of bottleneck with the open system call; you can use functional benchmarking to test the open system call specifically. So what I'd recommend is: if you have a security tool, and you're the developer so you can add instrumentation to it, use that instrumentation to determine where you're spending a lot of computation while running SecurityPerf, and then do functional benchmarking. That gives you a finer-grained test, much smaller than an entire workload, to really exercise the part of the code where you're seeing a bottleneck. The other thing that can be quite useful is perf, especially if you compile your security tool with debug symbols: you can use perf to get information about the specific functions that are executed most often, and that can help determine which bottleneck is being triggered by the specific workload you're testing with SecurityPerf.

I have another question. What if the overhead is caused by the hardware, such as TLB flushes or cache misses, something like that?

If the overhead is caused by the hardware, I'd recommend running the tests on the same hardware over time. If you have some sort of hardware variation, where you run the test one time and a change in the hardware occurs that affects a later test, that's not something SecurityPerf handles very well. SecurityPerf is designed to test software variations and the differences between software configurations; it's not going to be able to account for hardware differences across tests. The one thing you could do is run the tests for more iterations, or over an extended period of time, so that the variation you experience due to changes in hardware behavior gets captured as part of your test.

In your example security services, two of those were kernel level, which I presume you just installed in the kernel of the machine being benchmarked. But the ClamAV scan is just a user-level process. Was that running in the service container you were benchmarking? How did you inject it, or was it running on the container host?

Yes, so the idea with SecurityPerf is that it doesn't automate the installation of the security tool; you do that yourself. You install the security tool, or the kernel module, or whatever you want to test, and ClamAV in this case was installed on the host system; it wasn't running inside a container. With ClamAV, running a file scan consumes a lot of CPU resources, which is why we saw a greater overhead with Apache. By the time the other tests ran, it had finished its file scan, which is why there wasn't as much overhead on the other services: it wasn't really doing anything anymore. So in the ClamAV example, it was running on the host system, not in the specific container being run for the service.

Is there any roadmap to include hooks for actually putting the security service inside the container that's running the service you're benchmarking?
Yeah, that's a great idea. Ultimately what SecurityPerf is doing is running a Docker command, so we could add some instrumentation, or some additional command line options to run.py, that let you extend that command so you could install or run whatever security tool you want. The other thing you could do is just adjust the Dockerfile that defines the environment for the service: if you adjust that Dockerfile to include your security tool by default, then any time that service gets tested it will include your security tool. That automates the running of the security tool, because every time the container spins up, the tool is already included in the container and will run. Thanks. Okay, I think we're all set. Thank you so much.