Yeah, so we're good to go, I guess. So, hey, everyone. Welcome to DevConf.cz 2022, to the first session of Room 1 on Saturday. Let me welcome our first speaker, Ondrej Lichtner. If you have any questions for our speaker, please use the Q&A section you'll see in Hopin. We'll get to the questions at the end of the talk. Thank you, and the floor is yours, Ondrej. Thank you. So, as was said, my name is Ondrej Lichtner. I work for Red Hat, specifically for the networking services team, which employs a number of people who work directly on the kernel networking stack or related technologies, and they're very intelligent developers. Me specifically, not that smart, so I just do automated network testing for them and handle infrastructure. We have a completely integrated pipeline for that, and this talk is going to be about one small part of it, which is a framework for writing network tests that we have created for that use. So, first of all, what is the problem that we are trying to solve? I mentioned network testing. It's a little bit different from your traditional testing as you'd think of it in software development. In our situation it typically involves multi-host testing, and that can get very difficult sometimes, because you need to interact with real servers, and because you are doing kernel development, as I've mentioned, you are often also rebooting the systems and reconfiguring things in quite complicated ways. So when you start doing that, or if you start thinking about how to do it, you can try doing it manually by hand, configuring everything yourself and then running some kind of a benchmark. At some point, you may want to automate this via a shell script, but because you are doing development and you don't really want to run the development kernel on your development machine as well, you want to do this remotely.
So you start creating some basic test harness over SSH to distribute everything and to control everything, and you still end up with a bunch of unresolved problems. Most of these are, you know, that it is very difficult to actually synchronize your test hosts and their individual actions, you need to handle distributing logs and your tests to the machines properly, and a bunch of other stuff. So because I'm not very good at bash, we decided to actually do something smarter and implement our own framework, which we call the Linux Network Stack Test, or LNST for short. It's a Python framework. It's also a multi-host, multi-process application, and it's also a test repository. Everything is hosted on GitHub at the specific link. I did upload the slides to the schedule, so you may want to click on that there later on. Now, what I mean by Python framework is that it allows you to write network tests in a nicer way than just doing bash scripts. The multi-host, multi-process application is what actually executes those tests, so it provides you with some sort of a harness. And test repository means that all of the tests that we develop and use at Red Hat for our integrated pipeline are actually hosted directly in the Git project that is listed here. So you may want to look at that if you want to. Before I actually start getting into details, there are three main disclaimers I need to talk about. First of all, because we are doing configuration of networks, everything requires root permissions everywhere, and because we are distributing test code over the network, the project is basically a remote code execution security hazard. So don't run this on anything you care about and, you know, be careful if you do anyway. Second, releases and packages: you may find RPM packages or a PyPI package for LNST.
These are very out of date right now, because a lot of the stuff that I will be talking about is new implementation, and we went through a bunch of redesign and re-implementation. And me as the maintainer of the project, I was very lazy and don't like RPM packaging, or any sort of packaging at all. So these are still out of date. Finally, we are working with networking, so we inherited a lot of problematic language. In this presentation I'm going to be using better words, but in the actual implementation of the project you may encounter a lot of problematic language. We have a colleague working on this actively right now, but it is not finished. So we are working on it. Now, the architecture of what LNST actually is. This is a screenshot that I dug out from a very old wiki page, but it is still valid. The basic idea is that you have a multi-process, multi-host application that runs on multiple servers. So you have your controller machine. This is typically where you would be doing your development and where you want to run your actual test. And then you have your test machines that actually run your test code. We show here that there are two networks, the green one and the red one. The green one works as your remote management network. This is what you use to manage the power settings of your machines, to reboot them remotely, etc. You use this also to provision your machines. And then LNST reuses it to communicate between its applications. The red network is what is actually used for your testing. So this is dynamic, you reconfigure it all the time, and it is running all of your network testing. It is important to have it separated, because if you're bringing devices up and down, your controller and LNST agent would get disconnected. So you need to have a stable network for that communication.
It is also a good idea to have your test network completely isolated from everything else, because we are going to be running performance benchmarks, which you don't want to get noisy due to other traffic happening on your network, or, on the other hand, have your performance benchmark impact normal users of the network. The other part of the architecture of LNST is the concept of a test recipe, which we'll be talking about in this talk. A test recipe is a sort of extension of what a test is. It defines not only the test procedure that you want to run, but also the configuration of your network and the physical requirements for your network. So we may have a lab that is very large with regard to the network, but one of our tests only requires two machines, so there is no need to reserve the entire lab. Your test recipe specifies what it specifically requires, and LNST reserves that for you and uses only the resources that it needs. Okay, so I mentioned recipes changed, and that's the new thing that I want to talk about. A little bit of history: we started with recipes in XML form, which is shown in this screenshot. There are two parts to it: the description of the network topology and its configuration, and the actual test. Here the test is just a ping connection check between the two hosts. The reason for this was that we wanted to describe the network topology as a graph, where nodes and links between them made sense, and there are standard ways to describe graphs in XML. But this was not very good for actually coding the test procedure. So we moved to the second version, where we split off the test procedure into a Python file. But this still leaves a lot of issues with having to synchronize and link these two files together. And the main problem with this for us was maintaining the test set that we had created.
It got very large, we had a lot of duplication, and it wasn't very easy to maintain due to the limitations of how the recipes were written. So the new thing that we are using right now is test recipes that are completely written in Python, such as this. All three of these examples are the same. They require just two machines that are connected directly back to back with one network, you configure one IP address on each of them, you run a ping connectivity test between them, and you just execute the recipe. So what are the actual features of LNST, and why is it a good idea to use it? Going from the simple to the complex of what you can do: you start with the basic requirements of what you need to do to create a recipe. That is, you describe the network requirements, the topology, what the hardware basically looks like, you have to define your test method, and you can optionally provide parameters. We just saw that in the Hello World example. Beyond that, LNST provides you with abstractions for network device configuration via the LNST Devices package. This runs directly on our test machines and connects over netlink to your kernel to configure everything. You get test modules, which are basically Python classes of test code with some shared common functionality that you want to distribute to our test machines and run there. These get dynamically sent everywhere you need them to be. You can of course use recipe inheritance, because you are in Python, so all of that works nicely, which makes it a lot easier for us to maintain our test code and everything else. And we are developing the recipe common module, which contains classes that are either mixins or extensions to recipes, patterns that we found common in our tests and that we think may be useful to share. So I do have a demo staged for this, because I think some of this stuff may be easier to just show you.
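The recipe inheritance and mixin ideas just mentioned can be illustrated with a short plain-Python sketch. This is a hypothetical illustration of the pattern, not the real LNST API; all class and method names below are invented.

```python
# Recipe inheritance: a derived recipe reuses the parent's behaviour, and a
# mixin class contributes shared functionality, similar in spirit to what
# LNST's recipe common module does for real recipes.

class BaseRecipe:
    def test(self):
        raise NotImplementedError

class PingRecipe(BaseRecipe):
    def test(self):
        return "ping host1 -> host2"

class PerfMixin:
    """Shared helper that any recipe can mix in."""
    def report(self, result):
        return f"measured: {result}"

class PerfPingRecipe(PerfMixin, PingRecipe):
    # Inherits the ping test from PingRecipe and wraps it with the
    # mixin's reporting helper.
    def test(self):
        return self.report(super().test())

print(PerfPingRecipe().test())  # measured: ping host1 -> host2
```

Because Python's method resolution order walks PerfMixin before PingRecipe, the derived recipe can combine both without duplicating any test code, which is what keeps a large test set maintainable.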
So if I go into the terminal, hopefully everyone can read this fine; we checked before the start of the talk. This is the example that you just saw as the Hello World. We describe the requirements; this is the description of the physical part of the network. So you describe that you want two host requirements, you could parametrize them if you wanted to, and you describe that you want a device on each of them, and to indicate that these two devices can talk to each other, you add a label. This is the only required part of the description. Everything else can get dynamic with regard to how you want to describe your network. And you have your test method, in which you get the real representation of what machine you got. So you have a special object for matched objects, and in here you get proxy objects that transfer all of your actions to your test machines. So this code is actually running on your controller, but every time you execute any action here, it gets transferred to your test machine if required. The first, or easiest, thing to do is to add IP addresses, and then finally you run a command. This simply runs a shell command as you see it here. Now, this file, this demo script, is both the definition of the test recipe as well as an executable Python script that runs it. It has a little bit of boilerplate code that may not be easy to understand or read. The basic idea is that in here we just set up the controller instance, and we set it up in a specific way so that I don't have to interact with a real network or real test machines. Instead, the entire network that I described as required for my recipe gets created with podman containers. After that I create an instance of the recipe and I just run it. And when that finishes, I print a summary of what the results were.
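The proxy-object idea, controller-side recipe code whose method calls get shipped to the agent on the test machine, can be sketched minimally like this. Everything here is invented for illustration, not the LNST implementation, and the "transfer" is only simulated by recording the calls locally.

```python
# Minimal proxy sketch: recipe code runs on the controller, but every method
# call on a device proxy is captured and would, in a real harness, be
# serialized and sent to the agent process on the test machine.

class RemoteDeviceProxy:
    def __init__(self, host, name):
        self.host = host
        self.name = name
        self.sent = []          # stands in for the controller->agent channel

    def __getattr__(self, method):
        # Invoked for any attribute not defined above, e.g. ip_add or up.
        def forward(*args):
            call = (self.host, self.name, method, args)
            self.sent.append(call)   # "transmit" the action
            return call
        return forward

eth0 = RemoteDeviceProxy("host1", "eth0")
eth0.ip_add("192.168.101.1/24")
eth0.up()
print(eth0.sent)
```

The appeal of this pattern is that the recipe reads like local code while the side effects happen on a remote machine; the framework only has to intercept attribute access once, centrally.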
So if I go here, I run this, and again, remember I mentioned everything needs to run with root permissions, so I need to execute this with sudo. If I do, it's going to go through a couple of stages, which I'm going to describe from the logs. You see a lot of errors here. This is because we are running with containers; these are basically informational errors for us right now, because these are trying to retrieve information about our hardware, but because we are in containers, there is no hardware to retrieve information about. So if we go back to the start, the first stage of the recipe is to find a matching configuration, or a matching network. We look for a subgraph in the graph that is available. Now, because we're running with containers, that gets created dynamically, and we get a description of what match was used. We do some basic pre-test machine cleanup and restoring of some system configuration. We would get some hardware information if it was available, and after that we start with the actual test. Now, because I zoomed in a little bit too much, this is wrapped due to the length of the line, but what you should be seeing here is that you're calling device methods that are configuring the IP addresses, and finally you're starting a job, which would be your ping test. So all of that gets executed, and at the end you get a cleanup of the configuration, everything gets restored, and you get a summary of what was done. You see that you had a ping which passed, whatever that means right now, and the overall result is again a pass. So I have a second script, a second demo, which introduces the additional features that LNST gives you.
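The "find a matching configuration" stage from the first demo, picking a subgraph of the available machine pool that satisfies the recipe's requirements, can be sketched as a toy matcher. This is a hypothetical illustration of the concept, not the LNST matching algorithm, and all names in it are made up.

```python
# Toy sketch of recipe requirement matching: the recipe asks for a small
# "subgraph" (N hosts that each have a device on one shared network label),
# and the matcher reserves only the pool machines that satisfy it.

def match_requirements(pool, required_hosts, label):
    """Return the first `required_hosts` pool machines that have a device
    on network `label`, or None if the pool cannot satisfy the recipe."""
    candidates = [host for host, labels in pool.items() if label in labels]
    if len(candidates) < required_hosts:
        return None
    return candidates[:required_hosts]

# A hypothetical lab pool: every machine is on the management network,
# but only two of them share the isolated test network "tnet1".
pool = {
    "hostA": {"mgmt", "tnet1"},
    "hostB": {"mgmt"},
    "hostC": {"mgmt", "tnet1"},
}

print(match_requirements(pool, 2, "tnet1"))  # ['hostA', 'hostC']
```

A real matcher also has to handle multiple labels and per-host parameters, but the principle is the same: reserve only what the recipe's requirements subgraph needs, not the whole lab.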
Now, I mentioned recipe inheritance, so this time I'm writing a demo two recipe, but instead of basing it on our base recipe class, I'm basing it on my first demo recipe, which means that I no longer have to describe the requirements, because I'm using the same ones; I could change them, but I don't have to. And I'm also adding an additional class, which is our Perf recipe from the recipe common module, and this provides us with some additional mixin methods that help us write performance recipes. So our second recipe doesn't describe the requirements of the network, and it also doesn't describe the configuration of the network, because we are reusing those from our parent class. After that, we just add the performance test action here. This is inherited from our Perf recipe, and it requires some configuration to know what the performance test should be. This looks like a very complicated construct for it, but typically this is connected into the back end of your recipes, where all of this gets generated dynamically. The basic idea is that you specify how many times you want to repeat the performance test, and you specify what measurements you want to have done. We want an iperf flow measurement done, and it should measure one flow, which is going to be a TCP stream between our two IP addresses, between the two machines, and it's going to take 10 seconds, and that's it. Additionally, we are going to use this report-and-evaluate action, which is going to take a look at the results from the performance test and tell us something about them. I do have a bug in the code that I found while making this presentation, and I need to specify this method here as an empty stub so that it doesn't crash. So I'm going to fix that after the presentation, but it's not a big deal. Now, I did one more thing here.
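The shape of that performance-test configuration, an iteration count plus a list of flow measurements, can be sketched in plain Python. Everything below is a made-up stand-in for illustration, not the real recipe common API.

```python
# Sketch: a perf config says how many iterations to run and which flows to
# measure; the runner loops over iterations and flows and collects results.

from dataclasses import dataclass

@dataclass
class Flow:
    kind: str        # e.g. a TCP stream between the two machines
    duration: int    # seconds

@dataclass
class PerfConfig:
    iterations: int
    flows: list

def run_perf_test(config, measure):
    """Call `measure(flow)` for every flow, once per iteration."""
    return [[measure(flow) for flow in config.flows]
            for _ in range(config.iterations)]

config = PerfConfig(iterations=2, flows=[Flow("tcp_stream", 10)])
results = run_perf_test(config, lambda f: f"{f.kind} ran for {f.duration}s")
print(results)  # two iterations, one flow each
```

In a real recipe set, the `measure` step would start the traffic generator on one machine and the receiver on the other; the point of the structure is that iterations and flows are plain data, so a back end can generate them.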
I configured the summary formatter to give me one more level of detail while printing everything, so the summary here is going to look a little more in-depth. So if I run this, the start of the recipe is the same, as we have inherited that from the parent recipe. Now at some point it's going to start the performance test, which you should see here now: it started an iperf server and an iperf client. This takes 10 seconds to measure something, and now that it's finished here, it's going to do that again, because we specified two iterations for it. So it started the same thing again, and once we finish, we get a nice summary. Because we added one more level of detail, we now see the configuration actions that we have taken, so the summary also prints information about the configuration. The ping test that we've done now shows that it also retrieved information about what it did, so this is the standard output of everything. Then we have the actions of the measurement, so the iperf server and iperf client running, and because we called the report and evaluate, it will also tell us what the actual measurement results are, so we see some numbers measured here. And it also includes the raw data of everything, so this is how everything was parsed during the measurement, so you could print a nice graph from this that shows you what the flow looked like while everything was running. With that, the demo should be pretty much finished, so I'll just finish out the presentation. I wanted to talk about how we use this at Red Hat: we built an entire recipe set, which we call early network regression testing, or ENRT for short.
This uses the performance package we just saw. We start with a base ENRT recipe class, implemented in a way that combines a complex combination of scenarios that describe the basic topology of various software devices and how these are configured statically, and then we combine a number of mixin classes that do smaller system configuration parts that can be looped over to create a very large matrix of tests. An example would be the configuration of device interrupts, or offload settings for various devices or network cards, etc. All of this measures performance, as you just saw, and internally we have it connected to a measurement database where we track the history of all of the performance measurements. The performance report-and-evaluate method that we call to show us our results will also call automated evaluation code that compares a result to an older baseline result and reports a pass or failure based on that. Extra features that you kind of saw here: LNST supports working with virtual machines, and LNST will create a dynamic network topology for you automatically. However, you are limited to the virtual machines that you are running on your own laptop, so those are static; you have to preset them up yourself. What you just saw was initial support for the dynamic creation of containers to run recipes. It has some limitations right now, which weren't visible in the demo; it's not able to do more complicated network topologies, but we are working on that.
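The base-scenario-plus-mixins idea amounts to taking a Cartesian product over configuration dimensions. A hypothetical sketch, with all scenario and setting names invented:

```python
# Sketch: combining one base topology scenario with small, loopable
# configuration dimensions (offload settings, interrupt settings) yields a
# large matrix of test variants, which is how an ENRT-style set scales.

from itertools import product

scenarios = ["simple", "bond", "vlan"]          # base topologies
offloads = ["gro on", "gro off"]                # offload settings dimension
irq_setups = ["irqbalance", "pinned"]           # interrupt settings dimension

matrix = [{"scenario": s, "offload": o, "irq": i}
          for s, o, i in product(scenarios, offloads, irq_setups)]

print(len(matrix))  # 3 * 2 * 2 = 12 variants
```

Adding one more two-valued dimension doubles the matrix, which is why each mixin is kept small and independent: the combinations multiply, the maintenance effort doesn't.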
I did mention test machine pool matching, which I explained, and the code that you are running on your test machines gets dynamically distributed to them. The recipe itself, log collection, and everything else runs on your controller machine, and this is also where you have all of your test code. You can write additional modules for it, and even if you want to run those modules on your test machines, they get distributed dynamically over the network to your test machines and run there. So I think that's everything. These are some links which you can check out. The mailing list is for discussions, which we don't use too much because most of our team is at Red Hat, but we are definitely monitoring it if you are interested in reaching out to us. The documentation is getting there; because all of this is kind of new, it's not complete, but it should at least help you out with starting. And that's all from me. Thank you, Ondrej, for your presentation, and also for the live demo. We have some questions in the Q&A section; shall I read them for you?
I can start myself, I think, from the top. So Lukas asks about the virtual machines you mentioned: how configurable are they, is it libvirt based or, sorry, Kubernetes, or could you pass a qemu command line directly? So the virtual machines are libvirt based, and I think it could be extended. It doesn't do too much; it basically is just a very simple wrapper that creates the network between them, so we don't play around with it too much right now, but it could be an interesting idea to try out; we just haven't had the use case for it ourselves. Next is anonymous: you mentioned this is somewhat risky to run on a production network, or something similar; would you be willing to go over this again? So, all of the network configuration requires root permissions, so the agent application that runs on your server is something that is exposed to your network. The controller needs to connect to your LNST agent, and the LNST agent is going to run root commands to configure your networks, but it also runs all of your other test commands and everything, basically. The LNST agent does have a very, very basic implementation of some login credentials in there, because somebody asked me to do this and I found it interesting, but it's definitely not something that was properly security audited or anything. So anyone can just connect to your agent, which gives them root permissions to the entire machine, which is the reason why this is not a good idea to run on anything that you care about. And Lukas again: you mentioned you're not much fond of bash; how are you actually configuring the system, is it Ansible, a custom Python API, shell commands? So it's split into two parts. The provisioning of the machines, that, you know, sets up the actual OS, is handled by Beaker internally at Red Hat, which does that part of the system. After that we just do a very rudimentary configuration over a couple of Beaker tasks, but that basically just installs
everything for us, and after that LNST takes over. The network configuration is done by LNST via our LNST Devices package, which connects over netlink, and if we need to do anything else, we can write shell commands directly in LNST, so LNST just takes care of that. And we typically want to create test modules that wrap more complex procedures and do additional parsing on top, so that it's a little bit more reliable. And there's a new question: with the "not run on anything you care about" in mind, what exactly would the end user use case look like? Well, right now we are the end user. Internally at Red Hat we have a private lab that is reserved for this, so this lab is isolated from everything, and that basically is good enough for us, because we don't expect anyone to attack it, since it's not reachable. Additionally, a kernel developer or network developer could be an end user, again with the idea that they are either running the tests locally on virtual machines or containers, which don't get exposed to a public network, or they have their own hardware setup that they would be able to use for testing; and if you're doing kernel network development, many people do actually have those servers available for their testing. Thank you very much for your answers, Ondrej. This was our last question. I would like to thank you once again for your presentation. I would also like to thank our audience for being here and for asking the questions; it was very interesting, so thank you very much. If you want to contact Ondrej, you'll see the contact info on the screen, and we'll be back in less than five minutes with the next presentation. Thank you. I will go to the WorkAdventure thing if anyone wants to talk a little bit more, so see you there.