Thanks for coming to this session at KubeCon Chicago 2023. This is a fuzzing introduction and OSS-Fuzz demo. My name is Adam Korczynski. This is David. We are from Ada Logics, and we do a lot of fuzzing in general, and a lot of fuzzing of CNCF projects. This is a more practical, hands-on talk: we have many examples and practical how-tos, so you can get going with fuzzing by the end of the session. We will talk about the CNCF fuzzing ecosystem. A lot of it is for CNCF projects specifically, but there is also a lot of resources and material available to the open source community as a whole, so if you are not a CNCF project, this is for you too. So we will do a small intro on how to fuzz a software project, yeah, I will do that in a second, and then the fuzzing lifecycle. So what happens once you have written a fuzzer? What have you actually done, and how do you do it effectively? And OSS-Fuzz at the end, which is a big part of that fuzzing lifecycle. So yeah, fuzzing as a concept is a way to, in essence, stress-test applications, and the whole point of it is to find bugs, both security bugs and reliability bugs. And it does that by way of stress-testing: we are working with dynamic analysis here, which means we actually execute the code under analysis. So what you have, once you have written a fuzzer, is a program, and this fuzzer will run essentially indefinitely. That means it runs until it stops, or until it finds a bug or similar, and when I say it stops, I really mean when someone stops it. So it is just a process of stress-testing the application over and over again. And in general, the fuzzing approach that we use, which is coverage-guided fuzzing, gets better over time as it analyzes a specific target.
And that is because it relies on what we know as genetic, mutational algorithms, which build up a set of test cases, where each test case represents an input that will execute a unique code path in the target project, relative to the other test cases it has previously generated. So that is a little bit on fuzzing; I won't go into too many of the conceptual details here. But the goal is to find bugs, to find vulnerabilities, or maybe I should say, it is not only vulnerabilities that get found. And the way you do it is you execute a piece of your application, your software, over and over again with this kind of random input. Many projects here in CNCF fuzz already. Perhaps it is worth mentioning Envoy and Fluent Bit first, because they were some of the first adopters of fuzzing, primarily because they are written in memory-unsafe languages, they are written in C and C++. And this is usually where fuzzing originates from, because memory corruption can have much more severe consequences than, say, a simple uncaught exception in Python or something like that. So the CNCF fuzzing ecosystem is a relatively small ecosystem, in the sense that we have one repository, github.com/cncf/cncf-fuzzing, and there you will find a lot of the resources we have for fuzzing at CNCF. You will find the audit reports. The usual approach that we take as CNCF fuzzing is that we engage with a given CNCF project, with the maintainers of that project. We analyze a bit of their threat model, to understand what the consequences are if something goes wrong when this API is called, and so on. And together with the maintainers we build up fuzz harnesses that will call the APIs in many different ways. And a fuzz harness here is essentially just, like, a unit test on steroids, in a way. And so we do this collaboration together with the CNCF project.
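As a rough illustration of what such a harness looks like, here is a minimal sketch in Go's native fuzzing style. The target function ParseRGB and all names here are invented for illustration, not taken from any CNCF project:

```go
package main

import (
	"fmt"
	"testing"
)

// ParseRGB is a hypothetical target function standing in for a real
// API under test: it parses a "#rrggbb" color string into three bytes.
func ParseRGB(s string) (r, g, b byte, err error) {
	if len(s) != 7 || s[0] != '#' {
		return 0, 0, 0, fmt.Errorf("bad format: %q", s)
	}
	var vals [3]byte
	for i := 0; i < 3; i++ {
		hi, ok1 := hexVal(s[1+2*i])
		lo, ok2 := hexVal(s[2+2*i])
		if !ok1 || !ok2 {
			return 0, 0, 0, fmt.Errorf("bad hex digit in %q", s)
		}
		vals[i] = hi<<4 | lo
	}
	return vals[0], vals[1], vals[2], nil
}

// hexVal decodes one lowercase hex digit.
func hexVal(c byte) (byte, bool) {
	switch {
	case c >= '0' && c <= '9':
		return c - '0', true
	case c >= 'a' && c <= 'f':
		return c - 'a' + 10, true
	}
	return 0, false
}

// FuzzParseRGB is the harness: the engine hands us a generated input,
// and our job is simply to feed it to the code under test. The engine
// flags any panic or crash as a finding.
func FuzzParseRGB(f *testing.F) {
	f.Add("#00ff7f") // seed corpus entry
	f.Fuzz(func(t *testing.T, s string) {
		_, _, _, _ = ParseRGB(s) // must never panic, whatever s is
	})
}

func main() {
	r, g, b, err := ParseRGB("#00ff7f")
	fmt.Println(r, g, b, err)
}
```

In a real project the harness would live in a _test.go file and be run with `go test -fuzz=FuzzParseRGB`; the engine then generates and mutates the string inputs, keeping any input that reaches new code.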
We then write up an audit report detailing each of the fuzz harnesses, what the findings were, and which harnesses found them. And you can find all of these reports in the cncf-fuzzing repository. Another major component of CNCF fuzzing is the project OSS-Fuzz, which is a service run by Google. It is essentially just a GitHub repository, and the idea is that you place some files in this repository that build the fuzzers of a project. So Istio will have a project folder on OSS-Fuzz, Envoy will have a project folder on OSS-Fuzz, and Google will build your software, run it in the cloud, and report back to you if there are any security issues. The important thing here is that they will do it on a daily basis. So you develop a fuzzer, you put it up on this repository, and then they'll run it for the next ten years, or as long as they keep it going. And this is really central to the way we fuzz in CNCF, because we help write the harnesses, and then we let the maintainers take over as much as possible. So OSS-Fuzz is really key here in the ecosystem, and we'll give you some examples of what it can do later. I just want to give a small shout-out, because we just released a fuzzing handbook that goes into what fuzzing is and how to fuzz in various languages, in this case C and C++, Python and Go. And it also has a thorough guide to OSS-Fuzz, an end-to-end walkthrough showing various details that you don't really find in the documentation of OSS-Fuzz, and it gives a very pragmatic approach to how you can integrate into OSS-Fuzz. And if you are a maintainer out there, a CNCF project maintainer out there, it's worth mentioning that once you integrate into OSS-Fuzz, you can also claim a bounty and get a reward for actually submitting a project and enrolling in OSS-Fuzz. So a small shout-out there, you can check the link to the handbook. With that, I'll pass it over to Adam.
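As a concrete picture of what such an OSS-Fuzz project folder contains: the integration boils down to three files, a Dockerfile, a build.sh, and a project.yaml. The skeleton below is a rough, hypothetical sketch (the repository URL, contact address, and file names are invented for illustration); the environment variables $SRC, $OUT, $CC, $CFLAGS, and $LIB_FUZZING_ENGINE are the ones OSS-Fuzz's build environment provides:

```
# Dockerfile: inherit Google's base image and clone the project
FROM gcr.io/oss-fuzz-base/base-builder
RUN git clone --depth 1 https://github.com/example-org/example-project $SRC/example-project
WORKDIR $SRC/example-project
COPY build.sh $SRC/

# build.sh: compile the library and harness with the instrumentation
# flags OSS-Fuzz injects via $CC / $CFLAGS / $LIB_FUZZING_ENGINE
$CC $CFLAGS -c lib.c -o lib.o
$CC $CFLAGS $LIB_FUZZING_ENGINE fuzz_parser.c lib.o -o $OUT/fuzz_parser

# project.yaml: metadata and who receives crash reports
homepage: "https://github.com/example-org/example-project"
language: c
primary_contact: "maintainer@example.org"
main_repo: "https://github.com/example-org/example-project"
```

The three files are shown merged into one listing here for brevity; on OSS-Fuzz they live as separate files under projects/your-project/.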
Yeah, so this is a small walkthrough of fuzzing, of writing a fuzzer and running it, starting with how we write a fuzzer. David mentioned that what we do at CNCF fuzzing is that we, David and I, write a bunch of fuzzers for the CNCF projects and hand them over to the respective CNCF projects. And at that moment, it's a great thing if the project takes the fuzzers further, adopts them, maintains them, and perhaps writes their own fuzzers in the future. So this is a small intro to what that process looks like. You need four things to fuzz, essentially, at a high level. You have a piece of code that you want to test for bugs and vulnerabilities. You have a fuzzing engine that generates the test cases that David mentioned in the intro. You have a fuzz harness, which is the test that you write as the developer. And finally, you need time to run the fuzzer; it needs time to run on its own. Again, David mentioned this in the intro. Here we have an example of a harness. This is a harness for the Knative project, testing the admit method of the config validation controller. What we do here, and this is a slightly more advanced example, is that we create an admission request, randomize it, and pass it to this admit method, to see if the admit method somehow crashes while executing, or when being passed some sort of weird admission request. And you can see here at the bottom that this fuzzer lives in the cncf-fuzzing repository. Let's try to look at how a fuzzer explores code. As you recall, David mentioned in the intro that an important component of fuzzing is that the fuzzer explores the code that it tests, which is why we need to give a fuzzer that we have written time to run, time to explore the target application. So let's look at how a fuzzer does that. So let's say here on the right side we have a piece of code that we want to test for bugs and vulnerabilities.
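Before looking at that code: the Knative-style harness just described, randomizing a structured request and handing it to the method under test, can be sketched in Go roughly as follows. This is a simplified, hypothetical reconstruction, not the actual cncf-fuzzing harness; AdmissionRequest and admit stand in for the real Knative types:

```go
package main

import "fmt"

// AdmissionRequest is a drastically simplified stand-in for the
// Kubernetes admission request the real harness randomizes.
type AdmissionRequest struct {
	Operation string
	Object    []byte
}

// admit is a hypothetical stand-in for the controller's admit method.
// It must handle arbitrary requests without panicking.
func admit(req AdmissionRequest) bool {
	if req.Operation == "DELETE" {
		return true // nothing to validate on delete
	}
	return len(req.Object) > 0 && req.Object[0] == '{'
}

// FuzzAdmit shows the harness pattern: carve the engine's byte slice
// into the fields of a structured request, then hand it to the target.
// (Helper libraries such as go-fuzz-headers automate this kind of
// struct population from raw fuzzer input.)
func FuzzAdmit(data []byte) int {
	if len(data) < 1 {
		return 0 // not enough bytes to build a request
	}
	ops := []string{"CREATE", "UPDATE", "DELETE"}
	req := AdmissionRequest{
		Operation: ops[int(data[0])%len(ops)],
		Object:    data[1:],
	}
	admit(req) // a panic here is what the fuzzer reports as a crash
	return 1
}

func main() {
	fmt.Println(FuzzAdmit([]byte{0, '{', '}'})) // exercises the CREATE path
}
```

The point of the pattern is that every byte of the engine's input deterministically maps to part of the request, so the engine's mutations translate into meaningfully different admission requests.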
It's a fairly simple piece of code that does a series of checks against a string and returns true if the string is finally equal to "abcd" in lowercase. So we have our fuzzing engine, and typically as a developer we don't really do much with the fuzzing engine at a high level, apart from choosing it. But what the fuzzing engine does, when we write our harness and run it, is that the engine is in charge of generating the test cases that we receive in our test harness. So I'll just quickly go back here to this Knative example at the top. We see that this is the test case that the fuzzing engine generates for us; it's passed to our fuzz harness as a parameter, "data" here, a byte slice, and it's our job in the fuzz harness to pass that test case on to the code that we are testing. So as you see here, the engine passes the input test case to the fuzz harness, and the fuzz harness's job is to pass that on to the code we are testing. So let's say that we have written our harness and we run it. Now the engine will generate test cases at a very high speed. So let's take an example and zoom in on a few of those test cases that the fuzzing engine will generate. Let's say that the fuzzing engine generates a test case that is "000". On the right side we see the code that we are testing. Well, this test case will only explore the first branch of the code, as shown, because the input doesn't have a length of four, so we return in the first branch here. Next, the fuzzing engine, after maybe a few thousand executions, is able to generate a test case of four zeroes and gets past the first length check there. So it goes into the second branch and returns, because the first byte, or sorry, the first character of the string, is not a lowercase "a".
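The transcript does not include the slide's actual code, but the function described, four checks that only the exact string "abcd" passes, can be reconstructed in Go, together with a toy version of the coverage-guided search the talk walks through; here depth() plays the role of the engine's coverage feedback:

```go
package main

import "fmt"

// checkInput mirrors the slide's example: nested checks that only
// pure luck would pass with one-in-2^32 odds.
func checkInput(s []byte) bool {
	if len(s) != 4 {
		return false
	}
	if s[0] != 'a' {
		return false
	}
	if s[1] != 'b' {
		return false
	}
	if s[2] != 'c' {
		return false
	}
	return s[3] == 'd'
}

// depth is the "coverage feedback": how many checks an input passes.
func depth(s []byte) int {
	if s == nil || len(s) != 4 {
		return 0
	}
	target := []byte("abcd")
	d := 1 // the length check passed
	for i := range target {
		if s[i] != target[i] {
			return d
		}
		d++
	}
	return d // 5 == all checks passed
}

// findWithFeedback mimics a coverage-guided fuzzer: keep the best
// input as the corpus entry, mutate one byte at a time, and keep any
// mutant that unlocks a new branch. Roughly 4*256 guesses instead
// of up to 2^32.
func findWithFeedback() ([]byte, int) {
	corpus := []byte{0, 0, 0, 0}
	best, guesses := depth(corpus), 0
	for best < 5 {
		for b := 0; b < 256; b++ {
			guesses++
			cand := append([]byte(nil), corpus...)
			cand[best-1] = byte(b) // mutate the byte at the failing check
			if d := depth(cand); d > best {
				corpus, best = cand, d
				break
			}
		}
	}
	return corpus, guesses
}

func main() {
	input, guesses := findWithFeedback()
	fmt.Printf("found %q in %d guesses\n", input, guesses)
}
```

With pure random guessing the expected cost is on the order of 2^32 attempts; with feedback, the toy search above finds "abcd" in a few hundred attempts, bounded by roughly 4 times 256, which is the arithmetic the talk goes through next.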
So the fuzzing engine is able, at a later point while running, to generate an "a" followed by three zeroes, and with that it gets past the first branch, sorry, gets past the second branch, and returns in the third, and so forth. In that way the fuzzing engine takes the feedback it receives from exploring the code and generates meaningful test cases. So the essential component in this example: fuzzing does rely on generating random data, but as you see in this function there are four bytes the fuzzer has to guess, and that is 32 bits. If you had to generate a test case that gets all the way down to the return-true statement by pure chance, you would have to guess one out of 2^32. Based on the coverage feedback, the fuzzer reduces this significantly: each time it passes a condition, it saves that test case and puts it in what we call its corpus. The point is that the next time around, the fuzzer picks one of the test cases from the corpus, mutates it, and tries that. So instead of having to guess one in 2^32, it has to guess four times one out of 256, because a byte is only eight bits. That's the central component here: each time we get past one condition, eight bits, we save that test case. It took one out of 256 guesses to find the test case that passes the first condition; we save that to the corpus, and then we only have to guess one out of 256 again, and one out of 256 again. So that's the central point: the coverage feedback mechanism reduces the complexity, because the fuzzer knows which code was executed. It's random, but because of the coverage feedback you don't end up with these huge complexities, as you might otherwise assume. And make a note of the corpus, because we will get back to it later when we go into how to reproduce a crash that the fuzzer finds. So let's say that we have written our fuzzers and we want to test them. We write our fuzzers and we submit them to a project; in this case we actually contribute them to an open source project, for example a CNCF project, make a pull request, and get them merged upstream. When this project wants to run these fuzzers, it can do so in three ways: it can do it in CI, it can do it continuously, or the developers can run them locally. And all of these typically happen when you fuzz a project. So let's talk about the why and how of each of these three ways of running the fuzzers, starting with CI. When you fuzz in CI, the typical reason to do it is to have a bunch of fuzzers test the code for bugs that PRs introduce: you want to catch bugs in PRs before the code is merged. Another good reason is that you want to ensure that you don't break your fuzzer build, or your fuzzers in general. If you have 70 fuzzers, you don't want to merge a single fuzzer that breaks all the other fuzzers and prevents them from running. As for how: a good way to run in CI is either, as I mentioned, in PRs before they are merged, as part of your standard workflows, or continuously in CI with a project like ClusterFuzzLite, which we will get into. ClusterFuzzLite is kind of a fuzzing framework that automates a lot of this and runs continuously in your CI. Or, if you have a project that is integrated into OSS-Fuzz, you can use CIFuzz, which is basically a GitHub Action that you just add to your GitHub workflows folder. We touched upon fuzzing continuously in CI, but let's talk about fuzzing continuously in the background, and again the why and how of it. First of all, you want to ensure that your fuzzers actually run. It happens a lot that a project will write a bunch of fuzzers, really excellent fuzzers as well, and then forget about them, and they won't run. And I have personally seen cases where
simply getting these fuzzers to run has found security issues. Another reason is that some fuzzers might need a lot of time to explore code. In the example we saw, the fuzzer needed to get past a bunch of branches, and these branches may sometimes be very complicated and, not difficult exactly, but they might require time to get past. So fuzzing continuously means that the fuzzers will have plenty of time to do so. And yeah, point three is what I just said. As for how: OSS-Fuzz and ClusterFuzzLite. OSS-Fuzz is the open source project David introduced, so I think there's no need to go into that again, but here's the link if you want to read more about it, and I strongly suggest looking it up; we will do some demos on it in a bit. Testing locally is mainly for when you're developing fuzzers and want to make sure they run, like when you're making contributions to a project and want to ensure that you're not introducing bugs yourself. Fuzzing locally is always an intermediary step toward fuzzing continuously, not a replacement for it. The goal is to get the fuzzers to run continuously, and when you fuzz locally you kind of ensure that the fuzzers will also be able to run continuously. How? You either compile the fuzzers into binaries and invoke them on your command line, or you use some kind of CLI, which is what Go enables through its test CLI tooling. So I want to go back to fuzzing continuously a bit more. Having gone through all three ways of running, let's look a bit deeper into fuzzing continuously, because it's a very important component and part of fuzzing. Fuzzing continuously has a few problems and difficulties. Typically you will have a bunch of fuzzers that require different amounts of time to run. You might have a fuzzer that tests a thousand lines of code, and you might have a fuzzer that tests 20,000 lines of code, and naturally, well, not naturally, but most likely, you should give more time to the fuzzer that tests 20,000 lines of code. The more fuzzers you write, the more complex it becomes to run them continuously: it requires more effort to manage 100 fuzzers than two, and doing that in an automated way can be complex. You also want to reuse the corpus. Jumping back to the points made by David and myself about code exploration, David mentioned the corpus: let's say that your fuzzer has generated a seed that gets deep into the code. You want to take that seed and use it to start your next fuzzing run, so that the fuzzer doesn't start from scratch again and spend a bunch of time finding the same seed. Then the fuzzers might find crashes, and you might fix those crashes, hopefully, and that also requires some effort: getting the details out of the fuzzing runs and not losing them in your logs, for example. And because of all that, amongst other things, Google's OSS-Fuzz project is a great open source project for fuzzing; it's a critical component of open source fuzzing. Okay, having gone through the workflow, this is a diagram of OSS-Fuzz, so let's look more into OSS-Fuzz. This workflow diagram is from the Dapr fuzzing audit, and it's public in the Dapr fuzzing audit report, and here we see how OSS-Fuzz works. On the left side we have the fuzzers sitting in cncf-fuzzing, and they get pulled into OSS-Fuzz at build time. Then we have the target source files at the top, which also get pulled in. OSS-Fuzz then builds those, and then over here
it runs the fuzzers for a couple of executions and checks if any of the previously found bugs, sorry, checks whether any of the previously found bugs have been fixed. If they have, OSS-Fuzz will update a central bug tracker, and if they haven't, OSS-Fuzz will check again next time. You see there's a lot of stuff going on here, and OSS-Fuzz automates all of that for the projects integrated into it. There are three key files in an OSS-Fuzz integration: a Dockerfile, a build.sh file, and a project.yaml file. The Dockerfile, okay, this one is the Istio project's; we'll get more into that, I'm just highlighting them here. And there are three key commands in OSS-Fuzz: build_image, build_fuzzers, and run_fuzzer. build_image sets up the environment for building the fuzzers, build_fuzzers builds the binaries so we can run them later, and run_fuzzer runs a single fuzzer. If you are contributing fuzzers to a project that's integrated into OSS-Fuzz, these commands are great to know. So let's do a small demo, an almost-live demo: I recorded it this morning, because we do pull in a lot of packages, and I wanted to do a live demo, but the wifi here is not super strong. What we do here, all the way from scratch, is clone the OSS-Fuzz repository, basically covering A to Z of building the fuzzers and running them. We cd into the OSS-Fuzz directory that we just cloned, and we build the image for the Istio fuzzers. Now we have built the image, and it's time to build the fuzzers. Istio has close to 70 fuzzers, so this takes a while; I have increased the playback speed here by quite a lot, I think 20 times, so we're just speeding through that. Or maybe I haven't, but we're skipping ahead here for time. This is the whole process of building the Istio fuzzers, and at some point it finishes; here we are finishing building the fuzzers. We have still only run build_image and build_fuzzers, and now we want to check which fuzzers we have built. These binaries here are all the freshly built fuzzers of Istio, and now let's run a fuzzer. We invoke run_fuzzer with istio and the fuzzer name; we choose one of these here, the green ones, and the fuzzer is running. And those three commands were all you needed to get that done; that's all the demo did. Now, what about when a fuzzer finds a crash? That's an important part of fuzzing; it's why we do it. When OSS-Fuzz finds a crash by running one of our fuzzers, we get an email with a link to a report, like so. On the left side you see which people, which email addresses, received this bug report; these are all defined in the project.yaml file of the OSS-Fuzz integration that we mentioned a few moments ago. And here we have a link to the detailed report, so let's click into that. Okay, so Adam was just about to show you how the bug reports look on OSS-Fuzz, but I'm going to give you a live example of that, just to show you how it looks while clicking around in the browser. Say, for example, we have a repository here which contains a CNCF project that we want to fuzz, and we have two files here, which make up just a small example library in C, and we have two fuzzers. The library we have has a function, parse_complex_format, and we want to write a fuzzer for this entry point into our CNCF project. The fuzzer that we will write in this case simply looks as follows. This is the entry point that the fuzz engine will call, so this is the core fuzzing harness. The fuzzer will give us a raw buffer of bytes, as well as the size of this buffer, and what we do is wrap this raw buffer into a null-terminated string and then pass it to our function, parse_complex_format. What will happen now is that the fuzz engine will continuously hand us buffers, and we're trying to see if there are any memory corruption issues in parse_complex_format. Now, we want to integrate this
into OSS-Fuzz, and we have a pull request here. All of this, by the way, you can find in the handbook; we have an example there that goes through this. Here we make the pull request into the OSS-Fuzz repository. You can see we are at the OSS-Fuzz repository, and we make an initial integration for our example project. We have here the Dockerfile and the build script we spoke of, as well as the project.yaml. The only thing the Dockerfile does is inherit from the Google-provided base images, which have a lot of the fuzzing tooling, the right compilers and so on, that you need, and then you just clone your repository, so we have it inside the Docker image. What we then do is build the code of our library in a certain way, and in this case we have to use various environment variables, which correspond to the compiler we want to use as well as the flags to that compiler. The key here is that these flags carry information we need to give the compiler, telling it to instrument the code in a manner that is suitable for fuzzing, and this includes instrumenting for code-coverage feedback as well as sanitizers, such as AddressSanitizer and so on. And then we build our fuzzers against the library we have just compiled. Now, this was accepted, so it was merged in, as we can see, and now we actually have our example project inside the projects folder of OSS-Fuzz. This happened on the 7th of August. In the project.yaml we had a field saying: file GitHub issues, meaning that when OSS-Fuzz finds an issue, we get a bug report on our GitHub tracker. So now we are back in our example repository. Our OSS-Fuzz project was merged on the 7th of August, and on the 8th of August we got a notification that an issue had been found. It tells us OSS-Fuzz has found a bug in this project, please see this link for details. So we go into this link, and what we get is this bug report. What you see here is the stack trace; first you see the sanitizer report: there was a heap buffer overflow, and we get the exact line where the error occurs. In this case we can see that it's at this index, and likely it's just because idx here is out of bounds; well, it is because idx here is out of bounds. We can see what the specific input to the fuzzer was, which, if it's ASCII, we can read: there's a little key here saying "fuzz" and then a bunch of stuff, and we can use this test case to reproduce the issue. What we then did in our example project was to try to fix this issue, and the way we did that was to submit a patch, "fix bug". The bug was some buffer sizing going wrong: instead of giving it x, we should actually give it x divided by 2. This happened on the 10th of August. What then happened in our issue was that OSS-Fuzz came back on the 11th of August and said: closing this bug, because it's fixed. So this is kind of the nature of OSS-Fuzz. You integrate, you make a pull request on OSS-Fuzz with your three files, a Dockerfile, a build script, and a project.yaml, and then OSS-Fuzz will simply start to run your build script and run the various fuzzers, the parse_complex_format fuzzer that we have here, as well as the one other fuzzer we had. It will just continuously run these and report whenever any issues are found. And it does a lot more than just report when issues are found. There is a website here called introspector.oss-fuzz.com, which shows you data about all projects on OSS-Fuzz. For example, it shows about 960 projects on OSS-Fuzz, and it shows that just about 1.5 million functions are being analyzed. So let's look up our example project. What it shows you, first and foremost, is a code coverage report, so you can look at the exact lines of code being fuzzed in your project, or in any project on OSS-Fuzz. This
is something everybody can access. So we can see here, for example, that our parse_complex_format is executed by 295 unique inputs, and we can also see what code is not being executed. This is super important when you are iteratively working on your project in order to get maximum code coverage. So say, for example, let's pick on a Google project called libwebp, where there was recently a critical issue, and let's see whether it is actually being fuzzed well. We can see that it has a code coverage of 84 percent, which is not 100 percent, and we can start to look around at where there are potentially limitations in their fuzzing routines. We can see, for example, that there is some missing coverage of some interesting decoder functions, so this is somewhere we could start to contribute fuzzers to libwebp and help ensure that various open source projects are well fuzzed. And we can do the exact same thing for, say, Envoy. We can see here that they have 63 different fuzzing harnesses, which is huge, and we can start to explore the code coverage report, and well, there's a lot of code, 1.6 million lines of code, we can see, and we can just start to analyze what more should be fuzzed, which in this case is a lot, at least to get to 100 percent. And we can also see historical progressions. I think we are out of time, so with that I'll end and say: integrate your projects into OSS-Fuzz and you will get these kinds of data. Reach out to CNCF fuzzing and we will help develop fuzzers for you if you are running a CNCF project, and we will help you get continuous fuzzing set up for your CNCF project. And please check out the handbook; it's a great reference if you want to kickstart your fuzzing pursuit. And drop by our booth in the project pavilion to learn more about fuzzing if you're interested. I don't know if we have time for questions, or if there are any questions. I don't think there are, but in any case you can come by the project pavilion if you're interested; we have a CNCF fuzzing booth there.