Good afternoon, ladies and gentlemen. Thank you for coming to the 1:40 session. Next up we have Sushil Kulkarni, engineering manager at Red Hat, and Anis, sorry for the mispronunciation, who's a software engineer at Red Hat. They're going to be talking about catching network regressions using LNST. Good luck, guys.

Thank you. I think I know some people in the audience, but how many people here are involved in networking development? How about CI/CD? OK, a little bit of CI/CD. This talk is basically going to walk you through LNST, a testing framework that we've developed. I work in the networking group at Red Hat, and so does Anis, and we'll show you what this framework does and give examples.

Here's the agenda. First I'll talk about why we built LNST and why the networking team needed it. Then we'll cover what LNST is and what it's capable of doing. After that we'll give you an example of how you can set up a test and a topology using LNST, and we'll show you how it reports results that you can interpret and use for catching regressions. At the end, I'll tell you what's coming up in the future for LNST.

Some time ago, an engineer in the networking group started writing regression tests for bonding. He soon found out that he had to redo a lot of that work when he started testing teaming, for example, because they're similar technologies. That's where the need for a framework arose: it would be important to have a universal framework for running networking tests, one that could be easily extended, and that was consistent enough that you could run the same tests across releases or across nightly cycles, compare results, and see whether somebody had introduced a regression.
It was also important to make it easily extendable, because you didn't want something tightly tied to one-off scripting that couldn't be extended at all. We wanted a framework that people could add new functionality to, so that you'd get newer topologies as people add more features and more functional items. There's also something called test recipes, which Anis will talk about in a little bit: we wanted something you could describe your topology in, so that the framework could create the topology for you and run the test.

The reference to CI/CD was another motivation. At some point we found a bug that we could have caught really early in the development cycle. If we'd had something that could be run at the developer level, the developer could have run the tests and caught the regression, or the LNST team at Red Hat could have caught it, much earlier than at the very end of the release. These are some of the reasons why this framework came into being.

The engineer who originally started this at Red Hat was Jiri Pirko, and there's now a team at Red Hat that works on it. The whole thing is upstream, and the front page of the slides has the Git repo on it, so you can go and look at how you can do something like this. Now I'll hand it over to Anis; he's going to cover a lot more than me.

Thank you. I've been with Red Hat for a little over four years, and I've been using LNST since then. So what is LNST? It's an abstraction, a collection of programs that help developers ease their work: a tool written in Python, with a set of tools and definitions that make the developer's workload easier. It's automated testing.
As Sushil mentioned, when Jiri Pirko was trying to test teaming and bonding, he had to redo a lot of work. Automation helps here: it eliminates the human factor by running the same commands in the same sequential order over and over, which keeps the testing steady. Portability: you write a test once, and the idea is to use it many times. It's not dependent on any special hardware; it could run on the hardware you tested on, or another developer with different hardware could use the same test again. The fact that it is abstract, portable, and extendable makes it really easy to use, and it saves the time of redoing or reconfiguring tests over and over.

Is this working? So, what can LNST do for you? It helps you set up your environment faster and more easily. It can be used by developers, by hardware people, by firmware people, and you can test different topologies on the fly. It has a library of pre-designed, pre-configured tests, and you just pick what you'd like to test: bonding, teaming, virtual guest-to-guest; there's a wide variety. You can test functionality, you can test throughput, and you can use add-ons like IPsec and MACsec. It's very extendable. It also has a feature where you keep a pool of machines in your lab: you tell LNST what you want to test, and LNST checks that pool, sees which configuration fits, and uses that set of machines. It logs tests with timestamps for debugging, so if you catch anything, or you see something that looks off, you can always go to the logs, and they are very detailed. And, most importantly, it cleans up after itself: once you run it, it sets up the environment, does the tests, and when it's done it flushes the NICs and returns them to their original state.

This is really just a simple overview of how it looks.
In our tests, we use Beaker, an open source program that controls the systems for us, provisioning whatever OS I'd like to use. It's pluggable into Jenkins, which is also an automation framework. Say you build a kernel: Jenkins will tell Beaker the build is ready, go install these machines with this kernel. Beaker goes to the machines and does the install for you, and once that's done, it kicks off the LNST test. In this diagram, the blue line and the green line are totally different networks. The blue one is the control network, how you SSH or VNC to your systems. The green network is where your test traffic goes: your pings, your iperf or netperf, whatever tool you'll be using.

So how is an LNST test set up, in chronological order? For the sake of time, we'll assume we have two systems that already have our OS, which is Linux, installed on them, and that I have access to these machines through a NIC that isn't in this diagram: I can SSH to them and install whatever I want, RPMs and so on. The two NICs shown are the ones I'm going to test, for example after a driver upgrade, a kernel upgrade, or a firmware upgrade. Then I install the LNST slave on each of the machines; you can install it through DNF with dnf install lnst-slave. You only need one controller, and it could be your laptop, your desktop, or one of the test machines. Once you kick off LNST, it finds the two NICs you're trying to test. In this case it's going to be a bond: it will configure a bond with the two NICs as slaves, give it an IP address of your choice, and after that it will run traffic, whether that's a ping or a throughput performance test. At the end, it cleans up, logs, and reports back to you, and the machines are fresh again for a new test if you want one. This is as simple as it can get.
Now let's go through how an LNST test is built. Two machines with the LNST slave installed on them; the controller is always installed somewhere. This is the final setup I want to test: this connection right here, either for functionality or for throughput. How does LNST know this? Through what we call a recipe. A recipe is an XML file with attributes, and in it you say which machines you're going to use. Machine 1 refers to a machine XML in your directory, your LNST pool, which describes all the machines LNST knows about: it has the MAC address and the hostname. For this case we picked machine 1; it has one NIC, and this is the IP I'll be using. In the same recipe there's machine 2; the three dots are just a copy-paste of the same thing, shortened for the sake of space. LNST reads this XML, finds that you want this task, goes into the same directory, and looks for it. The task here is as simple as it can get: it imports modules and libraries, assigns a variable, and runs the command, a ping from this IP to that IP. How do you run it? Just this command: lnst-ctl run recipe.xml. And as I said, there's a variety of recipes, and they're all upstream and accessible.

Just going back to the example; next one, please. This is another example: I want to test bonding instead of a bare-metal NIC. Same thing: I did not change the OS, I did not remove anything. I'm done with NIC-to-NIC, and this is the new setup I want to test, so I can just run lnst-ctl with the recipe that has the bonding in it, and it will do the same. Next page, please. The XML for the bonding is one single XML, just split across slides. Same thing: it has machine 1, which has two NICs in it, and the driver for the NICs; in case the machine has more than one NIC with different drivers, I can specify which driver to test. And then the bond: its name is bond0, and its type is active-backup.
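To make that structure concrete, here is a rough sketch of what such a ping recipe could look like. The element and attribute names are approximated from the description above and from LNST's old XML recipe format; treat them as illustrative, not as the exact schema:

```xml
<lnstrecipe>
  <network>
    <!-- "machine1" must match a machine XML in your LNST pool,
         which carries the machine's MAC address and hostname. -->
    <host id="machine1">
      <interfaces>
        <eth id="testnic" label="testnet">
          <addresses>
            <address value="192.168.10.1/24"/>
          </addresses>
        </eth>
      </interfaces>
    </host>
    <host id="machine2">
      <!-- ... same structure as machine1, with its own IP ... -->
    </host>
  </network>
  <!-- The Python task that imports the libraries and runs the ping. -->
  <task python="ping_task.py"/>
</lnstrecipe>
```

You would then start it from the controller with lnst-ctl run recipe.xml, as described above.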
It will enslave eth1 and eth2, and it will assign an IP address, IPv4 or IPv6, or both at the same time. Machine 2 is similar to machine 1 from the previous XML. And once LNST reaches this Python task, it goes to it. It's more complicated Python code than the simple ping we just showed, and it's available online; at the end I'll post the Git repo where you can find all the recipes and XMLs that we use, and the Python too. Next one, please.

Yeah, so just to complete Anis's thought: you can see how this recipe is actually defining the topology you want to test. You can customize the topology to whatever you want, and provided there are libraries that will run the test for you, that's perfect, you can just run it. If there aren't, then we have to either add a new library or modify an existing one.

This is a more complex setup. Again, two systems with an OS on them. Each system has two guests, VM1 and VM2, VM3 and VM4, with an Open vSwitch running the network internally. There's one controller, and if you notice, there's a slave running on the first host, a slave on the second host, and a slave on each VM. Those slaves listen for instructions from the controller about what to do: what type of topology to configure, what IP I'm testing, IPv4 or IPv6. The controller submits the instructions and the slaves just execute them. In our case, we're going to run functionality tests, for example a ping between VM1 on this host and VM3. eth0 on each VM is on the control network; that's how I can SSH to the machine. eth1 is the NIC I'm testing. This slave will configure eth1 with VLAN 10, and that slave will attach eth1 to the switch, and the physical NICs to the switch with the bonding, with an IP address here as well. The same thing, mirrored, will be done on the other side. Then VM1 will try to ping VM3, and everything, all the results, will be in the log.
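For reference, the bonding part of the recipe walked through a moment ago (a bond named bond0 in active-backup mode, enslaving the two test NICs, with IPv4 and IPv6 addresses) might be expressed along these lines. As with the earlier sketch, the element names are an approximation of the XML recipe format described in the talk, not a verified schema:

```xml
<host id="machine1">
  <interfaces>
    <eth id="nic1" label="testnet"/>
    <eth id="nic2" label="testnet"/>
    <!-- Software bond device built on top of the two NICs. -->
    <bond id="bond0" label="testnet">
      <options>
        <option name="mode" value="active-backup"/>
      </options>
      <slaves>
        <slave id="nic1"/>
        <slave id="nic2"/>
      </slaves>
      <addresses>
        <!-- IPv4, IPv6, or both can be assigned at the same time. -->
        <address value="192.168.10.1/24"/>
        <address value="fc00::1/64"/>
      </addresses>
    </bond>
  </interfaces>
</host>
```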
It will also ping VM4, which is on a different VLAN, just to see whether it passes or not. In the past we caught a regression this way, where traffic was crossing between VLANs despite the VLAN ID tag. And it's important to note that it's not just ping: you can run any traffic generator you want, and extend it to iperf, netperf, or whatever else. We're also thinking about using another, in-house built traffic generator that we have. Yeah, so essentially we're running traffic to see if there's a regression in the kernel networking stack.

So how does LNST report? If it's a pass, you get a nicely detailed report with the passes and a summary at the end, so you can skip the debugging part on a pass. If it fails, it could fail for any reason, but we care about the functional failures: if, for example, the ping didn't go through, it will tell you that it failed. As for passes: in our tests we test performance as well, not just functionality. This test result, for example, was a netperf throughput of 9.4, and it passes because it was above the baseline. The baseline is not just an arbitrary number; it was decided based on multiple runs, making sure that the version is stable, and averaging. And in each LNST run we don't run netperf just once: it's five times, and we average.

Now for the failures. If it fails for a throughput reason, because the result was less than the baseline, it tells you. The second kind of failure is the standard deviation: in our tests it has to be within 20% of the measured throughput. The next slide gives a more visual idea of what a deviation failure looks like. In our tests we rely on an open source project called PerfRepo, which is a database; since we care about performance, we have to store the results and compare them to a baseline. That baseline, the one I spoke about before, is this green line.
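The pass/fail rules just described (an average of five netperf runs compared against a baseline, with the standard deviation required to stay within 20% of the measured throughput) can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the talk, not LNST's actual implementation; the function name and return shape are invented for the example:

```python
import statistics

def evaluate_run(throughputs, baseline):
    """Apply the two checks described in the talk to one test run.

    throughputs: per-iteration results in Gbit/s (the talk mentions
    running netperf five times per test); baseline: the agreed-on
    reference throughput. Returns (passed, reason).
    """
    avg = statistics.mean(throughputs)
    dev = statistics.stdev(throughputs)

    # Failure 1: measurements too scattered -- the standard deviation
    # must stay within 20% of the measured (average) throughput.
    if dev > 0.20 * avg:
        return False, "unreliable: stddev %.2f exceeds 20%% of avg %.2f" % (dev, avg)

    # Failure 2: throughput regression -- the average fell below baseline.
    if avg < baseline:
        return False, "regression: avg %.2f below baseline %.2f" % (avg, baseline)

    return True, "pass: avg %.2f meets baseline %.2f" % (avg, baseline)

# Example in the spirit of the 9.4 Gbit/s netperf result from the talk.
passed, reason = evaluate_run([9.3, 9.5, 9.4, 9.4, 9.4], baseline=9.0)
print(passed, reason)
```

A run with very scattered numbers fails the deviation check even if its average clears the baseline, which is exactly why the talk calls such numbers "not reliable."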
PerfRepo has a web UI that makes life a little easier. The y-axis is the throughput in gigabits per second; the x-axis is the runs, where each natural number is one run: there was a kernel here that we ran, and this was the result. The orange lines are the average, minimum, and maximum throughput, and the space between them is the deviation: how scattered the data collected from the runs was between the maximum and the minimum. If that spread is more than 20% of the average throughput, that means there's a failure; the numbers are not reliable. This one is a nice graph: same machines, but a different test with a different protocol. It's a nice-looking graph where, after we tested, we decided on the baseline, and every time it passed. That's kind of the loop.

Now, this is an example of how early regression testing can be helpful. Here we ran kernel versions during the development cycle. All these points are the several times we ran this test. Then we decided, OK, let's agree on a baseline, this point here, for example, and the dotted line is our baseline. After that, every time we test, if it fails, we go and debug and see why it failed. Here it's a clear regression: starting from this kernel right here, there was a regression. The bug was filed and the developers are working on it. That's how LNST makes it easy for us to just automate this and run it. And now I'll hand it back to Sushil.

So what's coming up next? You saw that the recipes were XML-based. That has a set of limitations: it's not very flexible, and some of the more complex topologies are difficult to implement or to express as a recipe.
So the next thing, which is actually underway currently, is adding support for Python-based recipes, which also means we have to convert our existing recipes into Python recipes. That's one of the things going on. Like I said, there are different types of traffic generators you can plug in, like iperf and netperf, and there's another one we're trying to use, a traffic generator that we built inside the networking team; that's also upstream, so we're trying to integrate it as well. Of course, there's also work happening on the next branch: the conversion from Python 2 to Python 3, because Python 2 is going to be deprecated, I believe, in a couple of years, so that conversion is happening.

There are other supporting things that are not specific to LNST. If you remember the picture Anis showed with Beaker and Jenkins: we sometimes see setup issues that produce false positives, so we've created bots which rerun the test just to make sure we've really caught a regression. We might integrate this into LNST, we're not sure, but it's something we're running on the side. It runs the test again; we do three passes, and basically, if you have a majority pass, you know it was a setup glitch, and we mark it as something we need to go look at and fix as a test setup issue, a test recipe issue, or a test case issue. We also want to bisect, so that once a regression is found, we can go back and tell exactly which commit caused the problem. That's another bot we're working on, so we can go to the developer and say: look, this is where you introduced a regression, please fix it. That's something that's happening as well. And of course, more topologies; networking has so many topologies.
So we're adding more and more tests. As people add more functionality to the Linux kernel, we add recipes and tests for it. That's another thing that's continually ongoing.

Finally, credits. The engineer who started this project, Jiri Pirko, is kind of its founder. Ondrej Lichtner is the maintainer, and Jan Tluka, Anis, and the others work on the LNST team as well. This is where you can go get the code. If you want to contribute tests, or if you want to reach out with any questions, we're also on IRC; the channel is at the beginning of the slide deck, on Freenode. So reach out if you need any help with any of your tests. At this point, we'll take questions.

Q: Thank you for the information. I just wanted to know, does LNST test things like DPDK or SR-IOV?
A: Yes, we're working on that; we're adding DPDK and OVS as well.
Q: OK. And for DPDK, what kind of traffic generator is it using, like testpmd or something like that?
A: The traffic generator, I think, was MoonGen, I believe. If it wasn't MoonGen, there's another one, from Cisco; one of the two is being used. TRex. Yeah, TRex.
Q: OK. OK, thanks.

Any more questions? Thank you very much. Thank you. Thank you.