Hello, everyone. My name is Karan. I'm passionate about systems engineering and reliability. I work at Walmart, where I'm responsible for developing traffic platforms. Today, we are going to go through an overview of LEAF. LEAF stands for Lightweight eBPF Application Foundation. We acronym it as LEAF because it's based on eBPF and is extremely lightweight. Let's go through what we plan to cover today. We will start with a brief introduction to eBPF and LEAF, followed by an overview of the LEAF platform and the eBPF solutions. We would then like to talk about our open sourcing plans and walk you through a demo. eBPF runs as a mini VM inside the kernel. It essentially provides a sandboxed environment to insert code into the running kernel. Let's understand this in a little more detail. The Linux kernel is fundamentally event driven. Applications that are running in user space make system calls to the kernel. A couple of examples of system calls include reading and writing from disk, connecting to applications, and reading from sockets. We also have hardware devices like network cards, disks, and USB devices, and these hardware devices send interrupts to tell the kernel that certain data is ready to process, for example when a packet arrives at the NIC or when data is available for reading on the disk. The kernel, which you see in the middle, handles all these events. In cases where we would like to extend the kernel's functionality, the traditional approach has been to build your own custom kernel, or to write modules, push them upstream, and wait for the next kernel release. eBPF presents a new model which allows us to extend kernel functionality through simple programs. These programs can be associated with desired kernel events so they are executed whenever the event happens. For example, we can run an eBPF program when a packet arrives at the NIC or when an application makes a system call to the kernel. So in a way, eBPF programs are to the kernel what plugins are to proxy or web servers.
eBPF has out-of-the-box integrations with low-level network hooks such as XDP and TC, as well as probing mechanisms such as kprobes, uprobes, and tracepoints. eBPF also provides a safe and secure way to write efficient eBPF programs and run them in the kernel. The verification step ensures that an eBPF program is safe to run. It validates that the program meets several conditions; for example, it makes sure that the program does not crash and that it always runs to completion without getting stuck in infinite loops. The just-in-time compilation step translates the generic bytecode of the program into a machine-specific instruction set to optimize the execution speed of the program. This, in other words, makes eBPF programs run as efficiently as natively compiled kernel code. The popularity of eBPF is rapidly growing. There are more and more eBPF programs being written to solve a wide variety of problems. Several startups are building technologies around eBPF, and large technology companies like Facebook, Netflix, and Microsoft are embracing eBPF to solve large-scale problems. At Walmart, we too are embracing eBPF and using it to solve similar problems. A challenge we faced when first adopting eBPF was how to manage and orchestrate multiple eBPF programs at scale. We need to run numerous eBPF programs on a given node, and we have thousands of nodes across many data centers in a hybrid cloud environment using multiple cloud providers. Due to the lack of an enterprise-ready solution, we decided to develop our own platform. LEAF provides complete lifecycle management of eBPF programs with the help of an advanced control plane written in Go. This control plane orchestrates and composes independent eBPF programs across our infrastructure to solve crucial business problems. As we proceed further, you will hear me use the term kernel functions interchangeably with eBPF programs.
We refer to eBPF programs as kernel functions because they allow us to extend the kernel's functionality. Coming back to our control plane: it consists of two major components that work together to orchestrate kernel functions. The first is the deployment APIs, which a user calls to generate configuration data. The configuration data includes which kernel functions will run, their execution order, and the configuration arguments for each kernel function. The second is the LEAF daemon (LEAFD), which runs on each node. It reads the configuration data and manages the execution and monitoring of the kernel functions running on that node according to the data that's available. In this way, we have simple APIs that allow users to add, remove, and reorder kernel functions on the fly, and these APIs sit on top of a distributed model to manage and configure kernel functions on a per-node basis. Outside of the control plane, we have developed many eBPF-based kernel functions that help us replace proprietary applications and hardware with blazing fast eBPF code. We have a section later that provides more details about this. Together, the control plane and the eBPF kernel functions provide kernel function as a service. In the next slide, we'll try to understand how this kernel-function-as-a-service model works. As you can see in the top right corner of the diagram, eBPF programs can be community developed, third-party vendor programs, or ones that the Walmart LEAF team has developed. We have a LEAF build engine which pulls the kernel functions, compiles them, and pushes the bytecode to an artifact management solution. When a user wants to deploy a kernel function, they can call a LEAFD API to provide configuration data. Once LEAFD reads this new config, it orchestrates the kernel function on that Linux host as per the defined parameters. If the user gives a set of kernel functions, then LEAFD orchestrates all of them in the sequence the user specified.
Executing kernel functions in a sequence is called chaining, and it is one of the most critical operations that LEAF performs as a platform. This whole workflow is in line with the "build once, deploy everywhere" philosophy, wherein we build the deployment package once for any environment, which in this case translates to multiple kernel versions, and set the configuration at deploy time. I just walked you through an example of a new kernel function deployment; all the operations that we discussed in the previous slide can be triggered through APIs. In the next section, we will take a closer look at the eBPF kernel function ecosystem. eBPF programs have the capability to instrument, inspect, and interact with traffic while providing enhanced performance and observability. This slide gives an overview of use cases, and as you can see, these can be broadly classified into three main areas: networking, observability or tracing, and security. In the realm of networking solutions, the combination of programmability and efficiency makes eBPF a natural fit for all packet processing requirements. We have developed eBPF programs for various business use cases such as packet tagging, traffic mirroring, traffic direction, and layer 4 load balancing. Similarly, we have started providing deep visibility into system and network performance. While traditional solutions rely on static counters and gauges exposed by the operating system, eBPF enables the collection and in-kernel aggregation of custom metrics generated from a wide range of sources. We're also actively pursuing tracing and profiling use cases that allow unprecedented visibility into the runtime behavior of applications and the system itself. On the security side, eBPF allows us to combine the visibility and control from all of these aspects to provide functionality that operates on more context with a better level of control.
Some of the eBPF programs that we have developed allow us to export NetFlow logs and perform connection and connection-rate limiting. We are also partnering with our security teams on use cases around deep packet inspection and DDoS protection. In today's presentation, we'd like to talk about some of the networking and security solutions that have been developed by the LEAF project. eBPF is developed and managed by a super smart and enthusiastic community that is actively working on adding new features to it. We believe that eBPF will eventually become the modern software-defined networking solution of the cloud and cloud native era. There are a lot of networking and security solutions in the market that are based on eBPF; Cilium and Calico are good examples of CNIs for Kubernetes. In this slide, we're going to talk about some of the eBPF-based networking solutions that have been developed as part of the LEAF project. Hopefully these examples will showcase the potential of eBPF. I mentioned previously that eBPF provides us with a mechanism through which we can plug our code into predefined hook points in the kernel. Here we are using a network hook called TC, which is traffic control. All of the ingress and egress traffic goes through these hook points. Once we place our code at these points, it can see all the traffic and process it. For example, we can extract the information that we need and act on it, pass the data on to higher layers of the stack, drop the traffic, or redirect the traffic to some other host. It can also manipulate the traffic by changing certain fields of the packet, depending on the requirement. All of this is done in the kernel, so it yields the best performance, as we don't need to carry the packet all the way up the stack to user space, which would involve multiple context switches and additional processing resources. Let's go through the eBPF solutions then. The first one mentioned here is FlowExporter.
As enterprises start serving live traffic out of public clouds, it is increasingly important for them to export traffic flow data to security solutions that provide advanced threat protection across the extended network and cloud. Private clouds provide traffic flow data through dedicated network appliances, which are hardware-based solutions. However, tenants in public clouds do not enjoy a similar level of access or network visibility, since the infrastructure layer is shared. We considered options to address this, such as adding a network hop to process traffic flow data. However, such a configuration increases traffic latency and also adds another layer to manage in the traffic ingress stack. As the best solution, LEAF developed an eBPF kernel function which extracts and exports flow metadata directly from Linux-based edge proxy servers. This essentially eliminated a hop in the critical ingress path, reduced site latency by 50 milliseconds, and also helped us save on the licensing and operational costs involved in managing the additional hop. Next is traffic mirroring. The best way to succeed in a business is by providing an amazing customer experience. At Walmart, we want visibility into how our customers are interacting with our site. We have a few analytics solutions that can operate on the data streams and provide the needed analysis, but these solutions need the data of interest, and that interest changes from time to time. There is an opportunity to save valuable time and money by automating the process of collecting this data. One of the most effective places to collect this data of interest in the public cloud is the edge proxy servers. However, that is also a critical hop that handles all of the ingress traffic to the site and is performance sensitive. We again saw the potential of eBPF and developed a kernel function that encapsulates the filtering and mirroring functionalities together.
This solution supports one or many custom filters and also allows us to capture only header data if required, thereby limiting bandwidth utilization. Additionally, given that eBPF is very lightweight, highly performant, and safe, this solution has been implemented at the source, which is the edge proxy itself. This allows us to eliminate multiple hops in the traffic path, reduce latencies, and avoid the licensing costs of buying and maintaining third-party vendor solutions. The last one on this slide is packet tagging, which we use to ensure quality of service in our stores. Like I mentioned before, eBPF can not only perform read-only processing, but can also be used to manipulate the traffic, and this is a good example of packet manipulation. Bandwidth availability can be limited in stores, so it is extremely important to use it optimally, especially in peak traffic situations like the holiday season. Since payment transactions are critical for any business, we prioritize them over other kinds of traffic. The way we achieve this is by setting the DSCP tag on the packet so that the egress routers can see the tag and allocate dedicated bandwidth to the tagged packets. This ensures that even in the event of congestion, the payment transactions don't fail. We have a few more kernel functions here that are based on XDP, the eXpress Data Path. With the advent of XDP and eBPF, it is now possible to achieve high-performance packet processing in the kernel data path. XDP allows us to attach an eBPF program to a low-level hook inside the kernel. This XDP hook, implemented by the network driver, provides a programmable layer before the Linux networking stack. This makes XDP the de facto choice for use cases wherein we must drop or redirect traffic. Let's go through a couple of XDP-based eBPF solutions that have been developed by the LEAF project. The first one is concurrency and rate limiting.
As enterprises increase their digital footprint, it is vital to have safeguards in place against sudden bursts of traffic or cyber attacks. Additionally, during certain bursts of traffic the upstream applications can slow down due to various reasons, which in turn can cause back pressure at the edge, causing connections to pile up and overwhelm the infrastructure. Having connection limiting and connection rate limiting allows us to limit the number of concurrent TCP connections and the rate at which new TCP connections are established. Adding this functionality to our edge proxies and load balancers protects our compute resources from getting overwhelmed when there is a sudden burst of traffic beyond what our resources are capable of handling. There is also a recent exciting offering from LEAF: a Layer 4 load balancer. Several internet companies serve millions of requests every second out of their edge networks. Since the Layer 4 load balancer must process every incoming packet, the solution needs to be highly performant and must also provide the flexibility and scalability required in production environments. Traditionally, Layer 4 load balancers have been hardware based, primarily to suit the high performance requirement. However, taking a hardware-centric approach limits the system's flexibility and introduces limitations such as lack of agility, scalability, and elasticity. As applications increase in number, complexity, and importance, it is vital that the infrastructure layer is app focused and not limited by the confines of hardware configurations. All the performance needs can be met with software solutions themselves, using eBPF and XDP. The LEAF project has leveraged Katran from Facebook to develop an eBPF-based load balancer offering that is implemented in a hairpin model on a single NIC.
One of the key features that we wanted to enable in our production environment is DSR, or direct server return, so that we can send responses directly to the client. This ensures that the ELB does not need to handle return packets, which are typically larger in size. To implement DSR, we developed an ELB agent that runs on our fleet of hypervisors. For other environment types, we run the agent on VMs, bare metal, etc., depending on the use case. The agent can run on any commodity hardware that runs Linux. The ELB is helping us replace hardware-based solutions with a modern software-based solution. Enabling DSR not only eliminates centralized choke points in our network but also helps us improve overall site latency. Since it uses XDP, it is blazing fast and can process traffic at near line rate, thereby helping us reduce our infrastructure footprint considerably. Both of the above XDP kernel functions can be run in a chained fashion to work cooperatively with each other. This ensures that all the illegitimate traffic is dropped by the connection and connection-rate limiting functions, and the ELB doesn't see any undesired effects even under adverse conditions. This concludes our section on eBPF solutions. Next, we would like to walk you through our open sourcing plans. At LEAF, our vision is to create a marketplace for eBPF programs where users and developers can share their own signed eBPF programs and download eBPF programs from others. Our LEAF control plane can then be used to orchestrate and compose selected eBPF programs from the marketplace to serve various business needs. In this way, LEAF provides developers with a cloud- and vendor-agnostic platform for adding capabilities to the operating system at runtime. We believe that the creation of such a fully integrated software ecosystem around eBPF will unleash its full potential for community adoption.
A vital prerequisite to the kernel function marketplace is the open sourcing of the LEAF project, which has been a top priority for Walmart and our team. I am glad that we are open sourcing the project today and would like to thank the Linux Foundation team for helping us through every step. We would like to encourage everyone here to adopt and contribute to the LEAF project. In the next section, Brian will walk us through a demo of how to set up a dev environment and start using LEAF. Thanks, Karan. Hi, everyone. My name is Brian Merrill, and I'm a software engineer on the LEAF team. I'm excited to be giving you a demonstration of LEAF today. Before we actually drop into a terminal and start running commands, I want to give you a quick preview so you have some context going into this demonstration. If you look at this slide, there are two main boxes. On the right-hand side is our virtual machine, on which we'll be running LEAF. This virtual machine is all set up and configured using Vagrant automation, which is checked into our open source repository, so you can grab this automation and within just a few minutes be running the same commands I'll be showing you today. It's a very easy, quick way to get your feet wet with LEAF and give it a try. On this virtual machine, we're running a couple of Go-based web servers to which we can send test traffic. We're hosting a kernel function repository; this is where LEAF can download the eBPF programs that it's going to manage and execute. And we're also running Prometheus and Grafana to show some of the metrics that LEAF provides for the eBPF programs it's running. On the left-hand side of the screen, we see our host, from which we'll be able to access all of these services on the virtual machine via some ports that we've configured. So let's go ahead and go to our terminal and start running some commands. Like I said, there are some ports configured on the host machine.
We're showing those here. Those are our Go HTTP web servers that we can access to send test traffic to. We have our Prometheus and Grafana ports, and we have a couple of LEAF ports on which we can access the APIs. And this is showing our LEAF code on the host, which is mounted onto the virtual machine, so we could actually make code updates to LEAFD and quickly test them out on the virtual machine if we wanted to. To keep things consistent with the slide I was just showing, I'm going to keep our host commands running on the left-hand side of the screen and our virtual machine commands running on the right-hand side of the screen. You can see which window pane is active because it'll be the black one; hopefully that'll help you follow along as I move between these window panes. So, on the virtual machine, you can see that we've already built a LEAFD binary. Let's go ahead and run LEAFD. For that we need to be root, because it's going to be doing some privileged things, loading these eBPF programs into the kernel. We need to provide it a configuration file. This configuration file doesn't tell it which eBPF programs to run or how to run them; it's just an initialization configuration telling it which ports to use, how to log, where to store metrics, things like that. Let's go ahead and start LEAFD. Okay, LEAFD is up and running. It's not yet running any eBPF programs, but it's ready to be configured to do so. Before we actually start some eBPF programs, let's run some traffic against one of those test web servers and see what the results are. I'm going to run hey, which is a nice, easy program for generating HTTP load. Let's run 200 requests with 20 concurrent workers against one of these web servers. Okay, you see that we got all of those responses back very quickly; it only took 0.01 seconds.
Most of them came in within that time, and they were all 200 OK responses. So that looks good. Now let's say we're in a scenario where we really want to slow down this traffic; let's see how we could do that with LEAF. We have a payload here that we can send to our LEAF configuration API. In this payload, listed by sequence ID, we'll see that there are two eBPF programs. One is a rate limiting program and one is a connection limiting program. Then we have various arguments we can send to them. Some neat things we could change are which ports we want these eBPF programs to monitor, and the actual rate limit and max connection values. These can all be changed on the fly, which is really great. So now that we've seen the payload, let's actually send it to LEAF. You can see we can do this with a simple HTTP POST in our test environment. Let's run that. In the top right, you should see the LEAFD log load this configuration and start these eBPF programs. We can see it doing that now, and we got a successful response back. To verify that it's actually running the programs we expect, we can look at the LEAF debug API for the NIC that's configured on the virtual machine. If we run this, we'll see that LEAF is now running our connection limiting eBPF program and our rate limiting eBPF program. So now we should really be slowing down this traffic. We run the same hey command we ran before and see if we've made a difference. You can already see that it's taking longer; in fact, it's going to take much longer. We're really going to slow this traffic down. So let's take advantage of this time to take a deeper look at some of the things that LEAF is doing under the covers, for those who are interested. As root, we can run bpftool, which is a core BPF tool; it's not a LEAF program, it's provided by the kernel community, I believe.
With bpftool we can show which programs are running, and you can see our XDP programs running here. We can also show some of the maps that are in use. LEAF uses these maps to do all kinds of things: it's where the metrics are provided, it's where we provide the configuration values, and these maps are how LEAF is able to chain programs together and reorder, add, or remove them without having to detach the base eBPF program from the network interface. There are lots of things going on here under the covers that we don't have time to cover right now, but they might be interesting for you to dig into on your own. So, our hey run finished. You can see it took much longer, about 60 seconds, so we really slowed that traffic down. And LEAF provides metrics into what those eBPF programs were doing, so let's take a look at that in a browser by logging into Grafana. I'm going to skip creating a new password and then look at the pre-configured dashboards that are part of our Vagrant automation and are already populated here. The first program that runs in the chain is our rate limiting program; it was sequence ID one. Let's take a look at what it saw. You can see that it saw a big, sharp increase in the number of connections received, and you can see that it actually started dropping some of those connections. Any of the requests that got past rate limiting would go on to the next program in the chain, which is our connection limiting program; it limits the max number of connections that can be alive at one time. So let's look at what it saw. Again, it saw a sharp increase in connections, and it too was dropping some of the connections that it saw. So the traffic went through two layers of aggressive limiting, and that's why we saw the time for our requests increase so much. This was an extreme case, to show sharp increases and dropped traffic.
But these programs are all configurable to fit your business needs and requirements. That concludes our demonstration; let's jump back to the slides. We want to give a thank you: the entire LEAF team would like to thank all those who have been involved in our journey thus far, getting LEAF to the point where it is today, running in production at Walmart and doing some great things. We're excited that it's now open source and hope to have many, many more collaborators and contributors in the future. We have an exciting roadmap planned for LEAF and lots of things we'd like to do, so if you're interested in getting involved, we would love to have you, and we'd be excited to answer any questions you might have about LEAF. We also want to thank the open source community in general for providing eBPF, answering our questions, and helping us get to where we are. Thanks very much and have a good day.