Okay, I'm Hanoch, I'm from Cisco Systems, and I'm going to present a small project that we created to test our routers and the features on the router. It's called TRex, and it was actually my first journey into open source, because my friend told me: try to open source it, don't put it in your drawer. And this is the result.

So I'm going to talk about traffic generation, how we do it at Cisco to test our routers, and it's all based on software, on top of the DPDK library. Because I have very, very short time, I will talk only about the stateless and the advanced stateful modes of the traffic generator.

Let me start with the result. After we open sourced the traffic generator, it turned out that many people have the same problem: many needed a traffic generator to test their routers. So many open source projects started to use TRex, like OPNFV and FD.io, Cisco internally, and many things in Cisco, Intel, Mellanox, Red Hat, and more. This is from analytics on our documentation; you can see it's growing, about 1,000 active users.

And these are the modes of operation, because there are many types of testing that you need to do, so the traffic generator needs several modes of operation. For example, you don't need realistic traffic to test a switch, right? Because there is only switching of packets, a simple lookup. But when you want to test Snort, which does inspection and normalization, or a DPI environment, you need to create realistic traffic, something with Layer 7 that simulates clients, servers, and applications, if you really want to evaluate the performance of the gear under test. So there are in general two modes of operation, stateless and stateful; I will talk about stateless and about advanced stateful. The problem that we try to solve, again, is to estimate the performance of stateful features on the router.
A stateful feature on the router behaves in a different way. You know, for every flow it opens a context, caches the client, caches the server, tries to normalize the traffic, and so forth. By pumping UDP packets and short packets, we won't get any reasonable number, right? It won't tell us anything. Because of that, we need to generate realistic traffic. The problem is that realistic traffic generation is really expensive, like $500K for 50G or 100G, and it's not flexible, and this is the reason that we open sourced this.

So what is TRex? TRex is software: a Linux application. It sits on top of DPDK, and it exposes the three modes of operation that I've already talked about. It can come in a container, and it's scalable; everything is about scale and virtualization. This is how, you know, I perceive it: it's really, really fast, and everything from the bottom up is about scale. And this is the slide I'm using in Europe; there is another slide for the US.

Okay, let's talk about stateless. Stateless is a way to generate traffic to test switches. The building block is a stream; we call it a stream, and you can add streams and remove streams. The API is JSON-RPC, and we created a Python layer that simplifies the way you work with it. So there is a server side, you install the TRex server, and then you interact with it from the client side, with the Python API: you add streams, remove streams, get statistics, start it, and everything else. There is a nice GUI that works on top of this JSON-RPC API.

And let's see what a stream is. This is the traffic mix that we are using. You can build a profile that is built from streams; in this example, there are three streams: the blue one, the green one, and the yellow one. The blue one is a packet that I can generate using Scapy. Anyone knows Scapy? So you can build a packet, a template of a packet, and then you can build a program that changes the packet over time.
For example, I want to create a range from source IPs to destination IPs. Then I can choose the mode. The blue one is continuous: just pump the packets at a specific rate. The green one is a burst of packets; let's say I want only three packets. And the yellow one is a multi-burst, with an inter-packet gap and an inter-burst gap, and I can connect them. I can create a program that says: when the green one finishes, start the yellow one, and then point to another stream. It's like a program that you can build using Python.

Let's see how simple it is. But before that, this is the architecture, the high-level architecture. There is a server. There is an RPC layer using JSON-RPC. There is a data path, which I will talk about later on, that is scalable with the number of cores: as you add more cores, you get linear scaling of the performance, which is really, really high. Then comes the Python layer that encapsulates all the JSON-RPC into a nice API. There is also a Java API for some users; Ericsson is supporting that. And there is a GUI, and there is a console. We're just wrapping everything for the users.

So we separated the definition of the profile from what to do with the profile. This is the definition of the profile; I define this as the hello world of a stateless profile. We defined a really simple continuous stream: Ethernet over IP over UDP, with a payload of ten 'x' characters. Okay, this is the packet. There will be a different type of packet for each direction: from one direction it will be 16.0.0.1 to 48.0.0.1, and in the other direction the opposite. This way we can create bidirectional traffic. So this is the profile, and then we can manipulate the profile and load it. There is a console where you can load the profile, start it, get statistics about it, and so forth. And there is the Python API that wraps everything. So this is the way we connect to the server.
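Since the profile slide itself isn't reproduced here, a minimal pure-Python sketch of the idea may help. This is not the TRex API; the class and field names below are made up for illustration. The point is the shape of a stream: a packet template, a small program that steps a header field through a range, a transmit mode, and an optional pointer to the next stream in the chain.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional

@dataclass
class RangeField:
    """Field-engine-style instruction: step one header field through a range."""
    name: str
    start: str
    end: str
    def values(self):
        first, last = int(IPv4Address(self.start)), int(IPv4Address(self.end))
        cur = first
        while True:
            yield str(IPv4Address(cur))
            cur = cur + 1 if cur < last else first   # wrap around the range

@dataclass
class Stream:
    """One stream: a packet template, a program, and a transmit mode."""
    template: dict                        # stand-in for a Scapy-built packet
    program: RangeField
    mode: str = "continuous"              # or "single_burst" / "multi_burst"
    burst_size: int = 0
    next_stream: Optional["Stream"] = None  # chain: start this one when done

def render(stream, count):
    """Materialize `count` packets from the template plus its program."""
    gen = stream.program.values()
    return [dict(stream.template, **{stream.program.name: next(gen)})
            for _ in range(count)]

# The "green" stream from the talk: a burst of three packets, ranging the
# source IP (addresses follow the 16.x -> 48.x convention mentioned above).
green = Stream(template={"dst": "48.0.0.1", "sport": 1025},
               program=RangeField("src", "16.0.0.1", "16.0.0.3"),
               mode="single_burst", burst_size=3)
pkts = render(green, green.burst_size)
print([p["src"] for p in pkts])   # -> ['16.0.0.1', '16.0.0.2', '16.0.0.3']
```

In the real profile the template is a Scapy packet and the program runs in TRex's field engine on the data path, but the template-plus-program split is the same.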
And then we reset everything, reset all the statistics, add the stream that I already showed, clear the statistics, and start the traffic. Here I can multiply and say that the traffic I want is 5 million packets per second, for a duration of 10 seconds. And then I can wait. After I wait, I can get statistics: how many packets, how many drops, how many packets were sent, and so forth. So a really, really simple API. And TRex does the hard work of splitting what you ask for across multiple cores; it happens under the hood, you don't need to split the profile yourself. This is the thing that we do for you.

This is the performance. This is on the XL710 with one core: you can see that we can reach 30 million packets per second on one core, and it scales linearly. So it's all about the performance.

So let's talk about stateful. Stateful is more for features that inspect the traffic, like DPI, like Snort, like firewalls, like NAT; they need stateful traffic. In this mode, TRex can act like a server or like a client and generate traffic on top of a TCP stack that we wrote. The reason that we wrote the TCP stack is that if you take the TCP stack from Linux, it won't scale: it will scale to 1 million packets per second, and we needed much more. We needed 10 million, 40 million active flows generating 200 gig.

And this is how we did it. We took a native BSD stack, and we changed it in a way that makes it multi-core. Every thread instance has its own separate stack, and through the API, through the control plane, we split the application across the cores. We manage that, so from the perspective of the user, you see one box that does one thing. Okay, so there is a layer of application emulation on top of the TCP stack, on top of DPDK, and everything is event-driven. Every core has an event-driven loop: no threads, no locks, no interaction between them, only messaging between the cores. And by that, we can reach really high scale.
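The per-core split described above can be sketched in a few lines. This is a toy model, not TRex internals: flows are hashed to a core by their 4-tuple (in hardware the NIC's RSS hash typically does this split), each core owns a private flow table, and the control plane only aggregates counters, so no locks are ever needed.

```python
from collections import defaultdict

NUM_CORES = 4                      # illustrative; the point is linear scaling

def core_of(flow):
    """Shard a flow to a core by its 4-tuple. Every packet of a given flow
    lands on the same core, so one core's stack never shares state with
    another core's stack."""
    return hash(flow) % NUM_CORES

# Each "core" owns a private flow table -- its own stack instance, no locks.
stacks = [defaultdict(int) for _ in range(NUM_CORES)]

flows = [("16.0.0.1", 1025, "48.0.0.1", 80),
         ("16.0.0.2", 1025, "48.0.0.1", 80),
         ("16.0.0.1", 1026, "48.0.0.1", 80)]

for f in flows:
    stacks[core_of(f)][f] += 1     # only the owning core ever touches this

# The control plane sees one box: it aggregates per-core counters on demand
# (via messages between cores in TRex; a plain sum here).
print(sum(sum(s.values()) for s in stacks))   # -> 3
```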
This is an example of the emulation layer. Okay, the client, for example, does a request, then waits for a response, then can do a random delay, then sends another request, waits for the response, and closes. The server side does the opposite: you wait for the request, send the response, and so forth. Okay? This is just the hello world of the low-level microcode that we have in the emulation layer.

Let me show you a real profile. Remember the profile of the stateless mode? It was Python, and it talked about streams. Here we are talking about an application on top of a TCP stack. So in this example, we have a utility that can take a pcap file, convert it to the instructions that the emulation layer understands, and then replay it on top of the TCP stack from the client to the server, and so forth. And by that, we reach millions of servers and millions of clients talking to each other, exercising the device under test with millions of flows. I will show the performance later.

Just to dive into what we are doing inside: from the simulation point of view, on the client side we are simulating creating the socket and connecting. Once we get the SYN-ACK, we write a buffer to the TCP stack, and then read; this is an example of a write and then a read. And then we close the flow. On the server side, we don't open all the servers ahead of time, because we cannot do that, right? Let's say we have a million servers; we won't do the socket and bind calls because we might not need all of them. So we have a special API, because we rewrote the TCP stack from BSD: we do lazy allocation. Once we get a packet for a server, we dynamically simulate everything as if it had created the socket, done bind and listen, and then started the program. But you don't need to do that yourself, right? This is internal. You just need to define what you want to do, provide the pcap, and we will do it for you and give you all the statistics. This is an example with two templates. There is a ton of statistics; statistics are the god here, right?
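The request/response microcode described above can be modeled as two event-driven programs running in one core's loop. This is a toy sketch with plain Python generators and in-memory queues, nothing from TRex: the client sends, blocks until a response event, sends again, and closes; the server is the mirror image; the loop just wakes whichever program has data pending.

```python
from collections import deque

def client(tx, rx, log):
    """Client microcode: request -> wait -> request -> wait -> close."""
    tx.append(b"GET /1")                 # send a request
    yield                                # block until the response event
    log.append(("client_got", rx.popleft()))
    tx.append(b"GET /2")                 # (a random delay would go here)
    yield
    log.append(("client_got", rx.popleft()))
    log.append(("client", "close"))      # close the flow

def server(tx, rx, log):
    """Server microcode: the mirror image -- wait, respond, repeat."""
    while True:
        yield                            # block until a request event
        req = rx.popleft()
        log.append(("server_got", req))
        tx.append(b"200 OK " + req[-2:]) # echo the path back in the response

def run():
    """One core's event loop: no threads, no locks, just wake whichever
    program has data pending on its receive wire."""
    log, c2s, s2c = [], deque(), deque()
    c, s = client(c2s, s2c, log), server(s2c, c2s, log)
    next(c); next(s)                     # run both up to their first wait
    while c2s or s2c:
        try:
            next(s if c2s else c)
        except StopIteration:
            break
    return log

print(run())
```

In TRex the "wires" are the TCP stack and the wait points are stack events, but the flow programs are event-driven in exactly this sense: nothing blocks a thread.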
We cannot miss even one packet. So you can take a JSON of all the counters from the TCP stack, from the flow table, from UDP, and from the other layers of the traffic generator.

I want to touch on one point of complexity in what we did. This is the real engine, by the way, of the car that I showed you before. So let's see what the problem with scaling TCP is. This is one flow of TCP. On the transmit side, you have a sliding window, let's say 32K or 64K. This is how the mbuf looks; the mbuf is the structure inside the kernel that manages the pointers into the packet data. This is how a packet looks, and it needs 32K. In the worst case, if we need 10 million flows, let's do the math: 10 million flows multiplied by 32K is about 320 gigabytes of memory. We cannot have that much memory for 10 million flows, and we need up to 40 million flows. So what we did is change the API of the stack: instead of us pushing the data into the stack, the stack asks for the data from us, from the upper layer. By that, we save a lot of memory: for 10 million flows we can use only 0.1 gig of memory. And you don't need to do anything; you just take one of the profiles that already exist, change it a bit, and we do it for you. The same problem happens on the Rx side when there is a drop: there is a window that tries to accumulate everything, okay?

And just to show you a last comparison: we compared NGINX with a TRex client against TRex as both client and server. The performance is a factor of 100 faster, and from the memory perspective it's three orders of magnitude better. That's it.
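The arithmetic, and the push-to-pull change, can be sketched in a few lines. The `pull` callback below is hypothetical, just to illustrate the idea the talk describes: because the generated payload is deterministic, any byte range of the window can be regenerated on demand from a small template, so nothing needs to be buffered per flow.

```python
# Back-of-the-envelope for the Tx sliding window, using the talk's numbers.
flows = 10_000_000
window = 32 * 1024                 # bytes buffered per flow in the push model
print(flows * window / 2**30)      # hundreds of GiB -- not feasible

# Pull-model sketch: instead of the application pushing the window into the
# stack, the stack asks the upper layer for bytes at a given offset, and the
# payload is regenerated deterministically from a tiny template.
TEMPLATE = b"abcdefgh"             # illustrative repeating payload

def pull(offset, length):
    """Hypothetical callback: produce payload bytes [offset, offset+length)."""
    return bytes(TEMPLATE[(offset + i) % len(TEMPLATE)] for i in range(length))

# A retransmit of any segment can be rebuilt on demand, so per-flow state
# shrinks to a few counters instead of a 32K buffer.
assert pull(0, 4) == b"abcd"
assert pull(6, 4) == b"ghab"       # wraps the template; any offset works
```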