Hello. Welcome to this next KubeCon North America 2021 session. My name is Tom Feelen and my co-presenter today is Tom Galway. We are both from Hewlett Packard Enterprise, and we'll be discussing the why and how of using Kubernetes with data processing units to offload software infrastructure. I'll make the usual disclaimer that this is a discussion of what is possible; it does not imply a commitment from HPE on specific product feature delivery. Now, before we can discuss the use of data processing units, more commonly known as DPUs, with Kubernetes, we need to understand why we want to use them to run containerized workloads in the first place. For this, let me turn it over to Tom Galway to tell us about some recent developments in foundational trends in IT.

Thanks, Tom. As Tom mentioned, let's start with some foundational trends we're seeing across the IT spectrum, starting with the shift in thinking when creating or transforming business initiatives. Digital disruption and digital transformation have been topics for some time, but we're seeing an acceleration of enterprises shifting their business innovation strategies to embrace the concept of being digitally aware. Enterprises are adopting the concept of a digital ecosystem, where the automation of business processes may be disaggregated and extend beyond the enterprise into third parties, including suppliers, partners, customers, and competitors. Enterprises are designing individual business initiatives within the digital ecosystem to be organized as a collection of processes and data interacting within a virtual infrastructure. The expectation is that this virtual infrastructure is optimized for the business outcome and fully abstracted from the physical infrastructure.

As enterprises innovate on the business side, we see data and application architectures shifting towards a more disaggregated model, offering greater agility and supporting elasticity. This is substantially changing the way IT resources are consumed. The number of sources where data is created and ingested into the application ecosystem is increasing; take, for example, the proliferation of IoT with its unique data models and contextual significance. At the same time, the value and usage of data is also changing, which not only shapes the service levels required but also influences the gravitational pull exerted by various entities within the digital ecosystem. This gravitational pull influences how and where compute and data need to be processed. Despite the hype, data doesn't always exert the strongest gravitational pull.

To deliver these new models, application developers are adopting cloud native and microservice-based architectures. As applications move towards cloud native, we are seeing technologies such as service mesh that support greater agility and move IT towards the ultimate goal, which is operating in a just-in-time model. By leveraging a scale-out, disaggregated model, applications can be deployed using a composable model with a mix of core code, libraries, open source, external applications, and microservices. Each individual component of the application will be executed on the platform that best fits its function in the context of the overall class of service. This brings substantial benefits; however, it also brings new challenges, particularly in the areas of security, orchestration, and meeting service levels.
Application topologies and flows will have greater complexity, which brings a greater potential for increased latency, jitter, and more complex traffic flows. Small degradations in performance may seem inconsequential for an individual microservice, but when aggregated to the application level, they will impact business process service levels. This will require a cloud native infrastructure that supports the optimal placement of workloads and ensures that placement meets their performance, security, manageability, and data-accessibility requirements. This also needs to be extensible to support the connectivity and flows to all services that are part of the application.

With cloud native application architectures and distributed data sources, securing an application becomes more complex. We are seeing an increased attack surface with new and more sophisticated attack vectors. Remember, bad actors have a culture of continuous innovation, so we have to have that culture as well. Given that, there is an increased need for a trust fabric with a focus on the protection of applications and data in motion and at rest. In addition, there is a trend towards protection of these assets in use, known as confidential computing.

As business requirements evolve, so does the need to improve service metrics like performance, latency, and jitter. There's also an increased focus on measuring the return on investment for an application, which increases the need to support service-level differentiation that correlates to the business value of the application. And lastly, we're seeing broad adoption of the tenant-operator model, separating the running of applications from the running of the underlying services. The requirement is to provide isolation between operator and tenant. In many enterprises, there's also a requirement to support isolation between tenants, even within the same company; for example, in highly regulated industries, the separation between regulated and unregulated parts of the business.

So we've talked about the trends and drivers for moving to cloud native and touched on some of the challenges. In looking at technologies that can serve as an enabler to unlock the value of these trends while solving the challenges, the value provided by the capabilities of some of the CNCF projects starts to become clear. This is especially true in areas such as security, observability, policy, and orchestration. The conventional way we would look to instantiate these capabilities would be to operate them on the host processor and attach them to the workload directly or as a sidecar. As we integrate these capabilities into the workload and introduce additional automation, the required consumption of CPU cores and memory begins to increase. And since these capabilities are usually single-tenant, resource consumption increases proportionately with the number of workloads per host. The net impact on resources is generally negative despite the benefits that can be realized with cloud native application architectures.

The challenge is: how do we enable greater optimization of the cloud native application architecture while ensuring that CPU cores and memory remain available, as much as possible, to support applications? One way is through the use of a data processing unit, or DPU. Now, Tom is going to be discussing the DPU in more detail, but just for context, we see a DPU roughly defined as a card with processing capability, memory, and acceleration, designed with the goal of supporting the delivery of a variety of underlying services to the workloads that operate on the host CPU.
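To make that proportional growth concrete, here is a back-of-the-envelope sketch in Go. The per-sidecar footprint figures are illustrative assumptions, not measurements of any particular proxy.

```go
package main

import "fmt"

func main() {
	// Hypothetical per-sidecar footprint; real numbers vary by proxy and load.
	const sidecarCPU = 0.25   // cores per sidecar proxy
	const sidecarMemGiB = 0.2 // GiB per sidecar proxy

	pods := 60          // workload pods on one host
	hostCores := 64.0   // cores available on the host
	hostMemGiB := 256.0 // memory available on the host

	cpuUsed := sidecarCPU * float64(pods)
	memUsed := sidecarMemGiB * float64(pods)
	fmt.Printf("sidecars consume %.1f of %.0f cores (%.0f%%)\n",
		cpuUsed, hostCores, 100*cpuUsed/hostCores)
	fmt.Printf("sidecars consume %.1f of %.0f GiB (%.0f%%)\n",
		memUsed, hostMemGiB, 100*memUsed/hostMemGiB)
}
```

Even under these modest assumptions, the sidecars alone consume close to a quarter of the host's cores, which is exactly the capacity a DPU offload would hand back to the applications.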
Typically these would be services that are common across the workloads, and they may be combined with the network capabilities typically found in smartNICs. So let's talk a little bit about the capabilities required to fully realize the power of cloud native. Just as a disclaimer, I'm only going to be focusing on the capabilities that would benefit from residing on a DPU, and much of this is based on what's possible; I'm not talking too much about what is generally available in the marketplace.

An important part of this architecture is how the collective capabilities of the underlying infrastructure are presented to the workload. Applications will expect to see an infrastructure that's optimized to their service-level requirements. The shift towards disaggregation and disaggregated architectures is driving the need to build the underlying infrastructure to support what I call infrastructure plasticity. Infrastructure plasticity is where the underlying physical infrastructure is enriched and presented to the application as a virtual ecosystem that is fully optimized to support the application's class-of-service requirements. In addition to supporting service-level requirements, this also provides isolation between workloads on the host processor and isolation between workloads and the underlying services. In a sense, it's a pseudo-metaverse for each application.

In cloud native environments, there's an emphasis on the control and data planes. However, the service plane is equally important, and it is ideal to be deployed on a DPU. The service plane includes service mesh technologies such as Linkerd and Istio, but it also includes observation of the behavior of all components of the application while measuring compliance with the service-level requirements. This is also where policies are enforced, both explicit, as set by the application owner, and implicit, as an action taken in response to conflicts in resource consumption. For example, an implicit enforcement would be invoking a traffic-shaping algorithm in response to a noisy-neighbor event.

We discussed the digital ecosystem and the challenges of securing it, particularly establishing trust across the ecosystem. Some methods to mitigate the risk to the application supply chain include strong identity management and an immutable root of trust. With the root of trust, a vertical and horizontal chain of trust can be established per application ecosystem. This chain of trust will also include the underlying infrastructure. Attestation between cooperating entities is required, as is the enforcement of security policies. A root of trust, identity management, and the establishment of a chain of trust are ideal capabilities to be deployed on a DPU.

We also talked a little bit about the importance of observability on the DPU and how it provides value to the service plane for monitoring and troubleshooting. Having observability run on the DPU allows real-time, granular insights into the consumption of resources by individual workloads. This can bring additional value when you combine it with an awareness of infrastructure capabilities. Analysis of the observability data can provide enriched information to the Kubernetes scheduler to support optimal placement of individual microservices, ensuring that the placement supports the performance, security, data-access, and manageability requirements of the application.
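As a sketch of how that enrichment might look, the following self-contained Go program scores candidate nodes using the kind of per-node telemetry a DPU could report. The structure, field names, and thresholds are illustrative assumptions; a real deployment would feed such signals into a scheduler extender or a scheduling framework plugin.

```go
package main

import "fmt"

// nodeTelemetry is a simplified view of what DPU-resident observability
// might report per node; the field names here are illustrative assumptions.
type nodeTelemetry struct {
	name          string
	freeCores     float64
	freeMemGiB    float64
	p99NetLatency float64 // microseconds, as measured by the DPU data plane
}

// placementScore ranks nodes for a latency-sensitive microservice with the
// given resource needs. Hard requirements filter a node out entirely;
// surviving nodes are scored on headroom and observed latency.
func placementScore(n nodeTelemetry, needCores, needMemGiB, maxLatency float64) float64 {
	if n.freeCores < needCores || n.freeMemGiB < needMemGiB || n.p99NetLatency > maxLatency {
		return -1 // node filtered out
	}
	return (n.freeCores - needCores) + (n.freeMemGiB-needMemGiB)/10 +
		(maxLatency-n.p99NetLatency)/100
}

func main() {
	nodes := []nodeTelemetry{
		{"node-a", 8, 32, 450},
		{"node-b", 2, 64, 120},
		{"node-c", 16, 128, 90},
	}
	for _, n := range nodes {
		fmt.Printf("%s score: %.2f\n", n.name, placementScore(n, 4, 16, 200))
	}
}
```

The filter-then-score flow mirrors how the Kubernetes scheduler itself evaluates nodes; the DPU's contribution is richer, real-time input to that evaluation.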
And lastly, the DPU can provide all the network services required to support scale-out application architectures. This would include protocol acceleration, firewalling, load balancing, and network overlays to support complex application topologies. Now I'm going to hand it over to Tom, who will take us deeper into the architecture and operating model of the DPU, and into how the DPU and CPU talk to each other to deliver these services.

Thanks, Tom. We have heard a lot about applications and how they can make use of DPUs, but what really is a DPU? A DPU is typically a PCIe card that plugs into a slot on the host computer's motherboard. On this card are a number of components connected by an internal bus. First, there are CPUs. These are typically ARM processors that have a smaller footprint, consume less power, and cost less, but also run slower and have less functionality than the x86-based CPUs usually found on compute hosts. There is memory, typically 32 or 64 gigabytes, but that number is rapidly increasing. There can also be specialized hardware dedicated to performing specific functionality. One such piece of specialized hardware is a P4 engine. P4 is a language that is used to define how data packets are processed. Having a P4 engine allows the DPU to support a programmable data plane. This means that many network and storage packet-processing activities can be moved from the host to the DPU without loss of performance. Similarly, there can be dedicated hardware to implement data encryption and decryption without using the ARM cores. In fact, there can be FPGAs on the DPU which implement many CPU-intensive operations, greatly reducing the consumption of host CPU cycles without any loss of performance. This could even include functionality to support the offload of portions of libraries and development kits. One such kit is SPDK, the Storage Performance Development Kit; DPDK, the Data Plane Development Kit, can also be offloaded to the DPU. If the DPU happens to be a smart storage device, meaning it is designed specifically to offload storage activities, it will have a RAID controller and other storage-specific hardware. Similarly, if the DPU happens to be a smartNIC designed to offload network processing, it will have NIC support as well as Ethernet ports. So there we have it: the composition of a DPU.

Now that we know what DPUs are and why we need them, let's spend some time understanding how we can use them. Since we're at KubeCon and focusing on containerized applications, we will approach this from the standpoint of Kubernetes, as opposed to virtual machines or bare-metal access to DPUs. There are three common software methods for using a DPU.

First, we can treat the DPU as just another Kubernetes node. We can assume that it is running some version of Linux, and we can use the standard Kubernetes APIs to access the node and deploy containerized workloads on it. This is the easiest way to use a DPU, but it hides a great deal of the power of the device.

Next, we can treat the DPU as a collection of hardware resources and software APIs. The APIs supported on a specific model can be open or proprietary and are likely to be a mixture of both. The Kubernetes Container Network Interface and Container Storage Interface definitions can be used to implement a CNI and/or a CSI for accessing the DPU. If the DPU has a P4 engine, a RAID controller, or other special hardware functionality, these can be used by the CNIs and CSIs or accessed directly by the workload running in the container. This allows the functionality of the DPU to be used by the workload, but unless we are willing to write to vendor-specific APIs, we are not likely to be able to use the full power of the device.
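As a minimal sketch of the first method, treating the DPU as just another Kubernetes node, the following Go program uses client-go to pin a pod onto an ARM-based DPU node with a node selector. The example.com/dpu label and the container image are hypothetical; an operator would apply such a label when enrolling DPU nodes into the cluster.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "dpu-telemetry-agent"},
		Spec: corev1.PodSpec{
			// Standard arch label plus a hypothetical label marking DPU nodes.
			NodeSelector: map[string]string{
				"kubernetes.io/arch": "arm64",
				"example.com/dpu":    "true",
			},
			Containers: []corev1.Container{{
				Name:  "agent",
				Image: "example.com/telemetry-agent:latest", // hypothetical image
			}},
		},
	}

	created, err := clientset.CoreV1().Pods("default").Create(
		context.TODO(), pod, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("created pod:", created.Name)
}
```

In practice the DPU nodes would typically also carry a taint, so that ordinary workloads do not land on their limited ARM cores by accident.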
The third method is to use the DPU to run specific services that we want to offload from the main CPU. These services are accessed by the containerized workloads using the same APIs they would use today with the services running on the main CPUs. Where the services are running is completely transparent to the application. This third method has the greatest potential for unlocking the power of the DPU, and it is what we will explore now.

Here we will walk through a software functional architecture that describes how to turn a DPU into a general platform for running services. We have a computer host and the DPU. On the host we are running two containerized workloads, in this case labeled workload 1 and workload 2. The infrastructure resource management layer refers to the set of DPU-instance-specific hardware and APIs which we discussed earlier. While there will be many common open APIs such as P4, DPDK, and so on, there will also likely be vendor-specific APIs. The dashed box represents a stack of software that will run on the ARM CPUs of the DPU; it provides a software environment where we can run our offloaded services. The top layer is the forwarding and policy enforcement layer. This code determines the path of I/O requests issued by the workloads running on the host. It is controlled by a set of policies that define what can be accessed, by whom, and with what quality of service. The infrastructure resource services layer implements an abstraction of the infrastructure services, such as network security, data plane management, and data encryption and decryption. This shields the services from any vendor-specific details of the infrastructure implementation. The control services are a set of common services used to manage the configuration and operation of the DPU itself.

These are examples of services which can run on the DPU and surface functionality to the containerized workloads. In this example, we see that telemetry information aggregation and logging functionality can be implemented on the DPU, perhaps using a common monitoring service such as Prometheus. The same can be true for a service mesh such as Istio, and even a message-passing service such as RabbitMQ or Kafka. Offloading such infrastructure services from the main CPU in a DPU-vendor-neutral manner means that we can run implementations of these services on any DPU; the infrastructure resource services layer hides the details of the DPU-specific implementation.

Now, this is all well and good, but these are relatively simple infrastructure services. What is possible next? What is possible next is to offload more complex services. Let's refer to these services as common application services, because they can be used by multiple, but not necessarily all, of the containerized application workloads running on a given host. One such common application service is something like Spark or another distributed data processing framework. Let's consider what we can do with telemetry data after it has been used to train a machine learning model. For example, we can detect malware running in a container by observing its current network activity and comparing it to its historical behavior, as the simple sketch that follows illustrates.
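Here is a minimal, self-contained sketch of that baseline comparison in Go. The z-score test is a deliberately simple stand-in for a trained model, and all names and numbers are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// baseline summarizes historical egress traffic (bytes/sec) for one
// container, e.g. as learned from DPU telemetry; values are illustrative.
type baseline struct {
	mean, stddev float64
}

// anomalous flags a sample more than k standard deviations from the mean,
// a deliberately simple stand-in for a trained model.
func (b baseline) anomalous(sample, k float64) bool {
	if b.stddev == 0 {
		return sample != b.mean
	}
	return math.Abs(sample-b.mean)/b.stddev > k
}

func main() {
	hist := baseline{mean: 120_000, stddev: 15_000} // learned behavior
	for _, sample := range []float64{125_000, 480_000} {
		fmt.Printf("egress %.0f B/s anomalous: %v\n",
			sample, hist.anomalous(sample, 3))
	}
}
```

Because the DPU already sees every packet on its way to the wire, a check like this can run there without consuming host cycles and without relying on agents inside the potentially compromised container.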
We can also build a model of the available resources on each node, compare it to what we have learned about how the expected workload will use resources, and then place the workload on the Kubernetes node which has sufficient resources for it. We can likewise use historical knowledge of application resource consumption to predict future resource needs.

Now let's review what we have learned today. A foundational transformation in infrastructure is underway. Application architectures are becoming increasingly disaggregated in a move to provide greater application agility and elasticity. Data processing units, and CPU offload devices in general, are well positioned to aid in this ongoing transformation of the compute ecosystem. We have discussed how DPUs can provide a number of value-added capabilities. Some of these capabilities are available now; some will be coming in the near future. As we outlined in the functional software architecture, DPUs can surface a consolidated infrastructure: a vendor-neutral way of providing common services to applications while reducing the burden on the main CPUs. This software architecture includes an infrastructure abstraction layer to hide the specifics of a given DPU implementation. As we discussed under next steps, this infrastructure can lead to reduced application downtime through effective use of detailed telemetry data. A side effect of this telemetry data, also referred to as observability data, is to give more detailed insight into application behavior, which allows for the implementation of enhanced security functionality. As we noted, application-specific value-added services can be run on the DPU in addition to the common services. And finally, running services on the ARM cores of the DPU allows more application code to be run on the x86 processors of the host, increasing the overall utilization of the server hardware. With this sort of intelligent use, DPUs can become an integral part of modern computing environments.

Thank you for your attention today. We will now open it up for questions and answers. As a reminder, after this session, please join us on the meeting place site for continued discussion of this topic.