Okay, hi everybody, and welcome to our session. Today we're going to be talking to you about the Open Programmable Infrastructure project: what we're trying to do there, the goals and objectives, and we'll go through some interesting technology pieces as well. My name is Mike Lynch, I'm a senior director of networking at Intel, and I'm joined by Shakir. Hi everybody, I'm Shakir Caesar, one of the principal architects within NVIDIA on the DPU project. So what is the Open Programmable Infrastructure project, or OPI for short? The objective is to foster a community-driven, standards-based open ecosystem for next-gen architectures and frameworks based on DPU and IPU technologies. For those of you in the audience with a networking background, you've probably heard these terms before: DPUs and IPUs, data processing units and infrastructure processing units. They're devices with very similar characteristics — Shakir will go through the details in a few minutes — but think of them as cutting-edge networking devices that can be deployed across cloud infrastructure, but also across other workloads and places in the network, not just in pure cloud environments. The project is brand new: it was officially formed on June 22nd as part of the Linux Foundation. You can see the founding members on this slide — a good cross-section of device vendors (you've got myself from Intel and Shakir from NVIDIA here), but also OS vendors like Red Hat, server OEMs like Dell, testing vendors like Keysight, end users and cloud service providers like Tencent, and solutions providers in the telco world like ZTE. So quite a good cross-section of the ecosystem participating, and these are just the founding members at the premier level.
There's a bunch more companies and individuals participating and contributing as well, and we also have a number of folks at the general membership tier — even in the last couple of weeks, DreamBig Semiconductor, SolidRun, and UnifabriX have joined the project, which is very encouraging. So we're just kicking off, and really what we want to get across to you today is what it's all about, why it's really important, and to gauge interest and participation. I'm going to hand over to Shakir now to segue into some of the technical aspects. Thank you, Michael. I think it's important to understand a little of the history of DPUs and IPUs, and their core components and how they evolved, to appreciate the OPI project, because I think the evolution of NIC technology is what demanded the need for OPI. Before I go into more detail on the technical side: with the data centers' demand for elastic use of computing and effective utilization of resources, the cloud service providers had already started experimenting beyond the NIC — they put in FPGAs to encrypt data, to do IPsec, or even tunnels. Later they introduced the disaggregation of storage so it could be utilized effectively, and so on. We have seen several projects introduced over the last five, six, seven years by Amazon and by Microsoft. Most of these were based on a standard NIC with an FPGA, and over time, with larger FPGAs, this type of NIC became the SmartNIC. In this diagram you can see how these SmartNICs work. At the top we have the server architecture.
We have the user application and bare-metal operating system running on top of the CPU, then the PCIe; beneath the PCIe we have accelerators, and these accelerators need drivers, again on the CPU side, to take full advantage of them. Beneath that we have the NIC, and so on. As you can see, most cloud service providers custom-built the solution, so off-the-shelf deployment was difficult. Most companies first said: look, you can buy a SmartNIC with an FPGA and download your own IP onto it. There were business models where suppliers of SmartNICs were offering hardware, IP, and drivers, with software on top, where customers could tailor the SmartNIC to their needs. But with increasing performance and complexity this cannot be sustained. The programming paradigm was also complicated: you basically had different vendors with different IPs supplying different functions. So the overall question in this evolution was how we can evolve this type of technology to improve our infrastructure. The decoupling from the upper layers means that we have a standard layer here; beneath it we still have the drivers and computing resources — Arm cores — that do the processing in collaboration with the hard-wired IP, the accelerators, and provide the services. So the solution of IPUs and DPUs evolved from SmartNICs, but the performance that state-of-the-art DPUs and IPUs offer is beyond what a SmartNIC could ever offer. On top of that come software-defined programmability and a shim layer that makes this programmable in a standard way: different vendors provide the hardware, and above it sits portable software with a standard interface to the application running on top. Let's look a bit at the technologies themselves.
So this is a high-level diagram illustrating the components a DPU or IPU is comprised of. Usually we have PCIe, because this is how it is connected to the host; on the right-hand side we have the network interface. In between there are switches, which allow you to define in software how you steer traffic to VMs or to different hosts on the same platform, and also to program policies for how that steering happens. The common language widely used here is P4, and many vendors have P4-programmable or OpenFlow-programmable data paths dealing with this steering. In addition we have Arm or MIPS cores — though at the moment I think the industry has settled on Arm cores; even Intel is using Arm in this particular case. These are the computing resources that give you flexibility: you can run complete software stacks including an operating system, which lets you take a lot of workloads you would otherwise run on the host CPU and outsource them to the NIC — the hypervisor, for example. Then, in addition — and this is what differentiates DPUs from SmartNICs — you have accelerators. The accelerators matter because doing this type of workload on a CPU can consume significant computing resources. And because this hardware is dedicated to infrastructure, the use cases are bounded, so you can deploy hardware accelerators to achieve hundreds of gigabits — the latest DPUs are targeting 400 and 800 gigabit — and that kind of performance you cannot achieve purely in software. This covers protocols such as NVMe-over-TCP offload, offload over RoCE, parts of the P4 pipeline itself, inline crypto processing, and HBM. On the other side, security has become more dominant and more important, because we have to isolate different flows from different VMs.
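The switch and steering piece just described — a programmable match-action table deciding where traffic goes — can be modeled in miniature. This is a Python toy standing in for what a P4 program would express on a real data path; all field and action names are invented for illustration:

```python
# Toy model of a match-action steering table, the abstraction a
# P4-programmable DPU data path exposes. Real pipelines match on
# parsed packet headers; here a packet is just a dict of fields.

class SteeringTable:
    def __init__(self, default_action):
        self.rules = []              # ordered (match_dict, action) pairs
        self.default = default_action

    def add_rule(self, match, action):
        self.rules.append((match, action))

    def apply(self, packet):
        # First matching rule wins, as in a priority-ordered table.
        for match, action in self.rules:
            if all(packet.get(k) == v for k, v in match.items()):
                return action
        return self.default

table = SteeringTable(default_action="send_to_host")
table.add_rule({"dst_port": 4420}, "steer_to_storage_offload")  # NVMe/TCP port
table.add_rule({"vlan": 100}, "steer_to_vm_a")

print(table.apply({"dst_port": 4420}))  # hits the storage offload rule
print(table.apply({"vlan": 100}))       # hits the tenant VM rule
print(table.apply({"dst_port": 80}))    # falls through to the host
```

The point of the sketch is only the shape of the abstraction: the steering behavior lives in a table the operator programs, not in fixed silicon logic.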
We have to ensure isolation, but in addition to isolation we also deploy capabilities like monitoring, policing, and policy enforcement, and we have to check integrity. So the accelerators widely used here include crypto engines for public-key and private-key operations, regular-expression engines for pattern matching in intrusion detection and intrusion prevention, and hashing to check integrity — for example, that VM images have not been modified or changed. Looking now at use cases: every vendor out there has a certain spot where they focus their DPU or IPU. The most dominant area has been storage — the implementation of storage offload and the ability to attach solid-state devices to the SmartNIC. Other use cases include network security, like intrusion detection and intrusion prevention, and the ability to introduce confidential computing, providing the necessary isolation to ensure confidentiality. There is enough computing power, in terms of the MIPS or Arm cores, that you can deploy gateways and basic routing capabilities on these devices. And finally, because of the computing resources you have, you can even run your hypervisor — the control plane of the hypervisor — completely on these Arm cores, enabling the CPUs and memory on the host to be fully utilized for workloads. So the possibilities are nearly limitless. These are standalone devices: they can not only be used with a server, you can also use the device on its own and remotely log in — in their own right, they are computers. And because of their accelerator capabilities, they can also be used for appliances and for other types of edge computing devices. Because of this capability, you may want to attach GPUs to these devices, or AI processing capabilities too.
It doesn't need to be purely CPU. So as you can see, from the standard NIC a new family of technologies evolved, and this technology is called IPUs and DPUs. It is primarily targeted at data center infrastructure, but it has the capability to be widely used in other areas as well. Over the next decade, when software is fully implemented on top of these platforms in a uniform manner — in a way where, under OPI, we can standardize the APIs that users use — I think the use cases can become even wider, particularly in distributed edge computing nodes as well as in large data centers. Thank you. So with that introduction to the technology by Shakir — why are we here, in terms of why we want to foster an open community around this technology? First of all, because it's super complicated, super advanced. This technology had its genesis with, and was worked on by, the major hyperscalers over the last five to seven years. Practically all of them — from Amazon, to some of the players in China, to Microsoft and others — have deployed IPU/DPU-type technology in their own public clouds in various guises, mainly proprietary, in some instances merchant DPUs and IPUs as well. And for very particular reasons: they obviously want to clearly delineate between the infrastructure they run and what is happening with respect to the tenants they're hosting on their servers, and they have an obviously strong desire to control that environment top to bottom. And of course, because of the nature of their business, they're uniquely positioned to do that by themselves.
They don't have to depend on a supply chain or an ecosystem to support them or to work with; they can essentially do it themselves. So they have the ability to really deploy this technology, optimize it, fine-tune it end-to-end, and from their perspective they've been able to do that successfully. That's not the case outside the major hyperscalers. For the rest of the world — be it parts of the enterprise domain or the telco domain — there is an existing supply chain, from end customers down to server OEMs and silicon providers like ourselves, and we all need to work together. It's a much more complicated tapestry to get right, and there are a few things that have to be lined up or put in place, in terms of requirements, before this technology can deploy at scale. As I said, it's complicated, and you've got to stitch together various elements of the supply chain to make it happen. So we have a few things here that are mandatory for the broader ecosystem to address in terms of those requirements. First of all, what we want to see is an open, democratized ecosystem without vendor lock-in. The traditional supply chain is very much characterized by the necessity of dual or multi-vendor supply, and that's going to be a necessity for the deployment of this type of technology as well — to ensure we keep costs in check and reduce cost and TCO for adoption of the technology over time. It's really important. So that's number one. The second thing is that application portability is key. Customers who have spent a lot of money in terms of
legacy software and applications over the years need to maintain an ROI on that investment and be able to carry it through future generations of their own releases, but also onto whatever new generations of hardware they want to deploy on. So there needs to be a migration path. Those customers — particularly over the last ten years, if you take NFV, network functions virtualization, as an example — have deployed applications and network functions based on what you might call a standard infrastructure: server-based networking with standard NICs. And those are quite tightly bound, as Shakir said. They need to be offered a way to migrate those applications to DPU and IPU technology, and the acceleration services they wish to avail of need to be transparent — transparent in the sense of the infrastructure acceleration they wish to adopt, be it underlying crypto capabilities, underlying support for storage use cases, Kubernetes acceleration use cases, or general networking use cases. That needs to be transparent in terms of how the applications access and leverage it. But also at the application layer itself: there may be some apps in the network — referred to as network functions — that wish to avail of acceleration capabilities by offloading portions of the application.
That could be, for example, a piece of a firewall you might want to offload, or a network function like a user plane function in the telco world. Again, the interfaces those applications leverage to facilitate that need to be open and need to be standard, or pseudo-standard. And on this aspect, last but not least: the validation effort. Particularly for customers in the network equipment space — the Ericssons, the Nokias, the Ciscos of the world — the validation effort they need to apply when consuming this technology needs to be minimized. What I mean by that is: they have invested so much in their own software, and they want to maintain that investment. They don't want to have to validate each next generation of software on different vendor devices — an Intel device, an NVIDIA device, a Marvell device. That's going to incur a penalty, and it's going to be unsustainable in the longer term in terms of meeting a TCO. So they want to be able to validate against standard APIs that are common across multiple vendors' hardware — that would be a general goal. Then in terms of software — not just in the sense of portable software stacks and SDKs that vendors provide, which have to be quite sophisticated, from firmware through drivers through orchestration through manageability capabilities — those SDKs need to be developed, enhanced, and supported generation-on-generation across vendor devices. But remember, those devices also have to be qualified in major server OEM hardware, and they have to be validated in OS vendor distributions. So again, there's a rich tapestry of ecosystem alignment that has to take place, all choreographed across timelines and releases. So
that has to happen, and all of these things feed into TCO. Particularly in the telco world, it's a constrained environment in terms of balance sheets generally, and there's often severe pressure on costs and bottom lines. If the technology doesn't cost in versus what they have today with traditional infrastructure, it's going to be very difficult to see it adopted. So all these things need to come together, and that's why we're here, really — trying to form this community and see it evolve. Because if we don't, unfortunately we'll be in a situation where adoption of the technology could be pretty sporadic and niche, and it will be difficult to adopt. All right, so that really is the raison d'être, the rationale, for the OPI community, the Open Programmable Infrastructure community. So I've got a few slides about how the project is structured. At a high level, the main objectives are: number one, to reduce the variation across different implementations. We don't want a bunch of different APIs or interfaces by which you stand up a networking use case on this technology, or a storage use case, or by which security is handled, or by which provisioning and lifecycle are managed for a DPU. We want common approaches here. We want to reuse standard APIs that already exist in the industry.
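One way to picture such a common API is a thin, vendor-neutral layer above the per-vendor SDKs, so applications are written once against the common surface. A minimal Python sketch — the vendor classes and function names here are entirely hypothetical, standing in for real, incompatible SDKs:

```python
# Sketch of an abstracted DPU API above vendor SDKs. VendorABackend and
# VendorBBackend stand in for real vendor SDKs with different interfaces;
# all names are hypothetical.

from abc import ABC, abstractmethod

class DpuBackend(ABC):
    """The common, vendor-neutral surface applications program against."""
    @abstractmethod
    def create_virtio_blk(self, size_gb: int) -> str: ...

class VendorABackend(DpuBackend):
    def create_virtio_blk(self, size_gb: int) -> str:
        # Would call vendor A's SDK here.
        return f"vendorA-blk-{size_gb}GB"

class VendorBBackend(DpuBackend):
    def create_virtio_blk(self, size_gb: int) -> str:
        # Would call vendor B's entirely different SDK here.
        return f"vendorB-volume-{size_gb}"

def provision_storage(dpu: DpuBackend, size_gb: int) -> str:
    # Application code is written once, against the abstract API,
    # and runs unchanged on either vendor's device.
    return dpu.create_virtio_blk(size_gb)

for backend in (VendorABackend(), VendorBBackend()):
    print(provision_storage(backend, 64))
```

`provision_storage` never touches a vendor SDK directly — that independence from the backend is the property a common API layer is after.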
We don't want to start from scratch — there's already great investment, progress, and value in existing frameworks, DPDK being a great example. We want to leverage those where we can, and we want to recycle any best practices that exist out there today where it makes sense. Best practices when it comes to, for example, programmability: these devices are extremely sophisticated in their programming capabilities, so whether that's tools around domain-specific languages like P4, or a common schema for manageability, these are things we want to see put in place. And lastly, a key goal of the project will be to provide implementation examples and reference platforms, so that we validate what we're doing. It can't just be a place where we come up with API definitions; it needs to be a place where we put them into practice, where we can test use cases against the APIs we develop and prove them out. So one of the goals of the project would be to have a vibrant developer effort, a vibrant lab effort, testing and so on, such that we can really test and ensure that the overall solutions are working. Lastly, there are three things I think are key to the project itself. We want to be able to independently boot and provision the DPUs.
That's a key thing we want to see accomplished with the technology. We want to ensure we have that security air gap between the infrastructure and the host — the tenants running in the host environment — and to do that you need to be able to separate the lifecycle and provisioning of the IPU or DPU from what's happening in the host and the host BMC. Then we want to provide the capability to program and operate the technology, and by that, in terms of the project, we want to drive PoCs, we want to drive use cases, we want to drive test cases, and so on. And lastly, we want to be able to stand up those use cases end-to-end. It's not going to be enough to just deliver pieces of the answer; we need to be able to showcase, for example, a storage use case end-to-end, so that users can see the value and stand up an example out of the box. Similarly for networking: if one of the goals of the project, for example, is to provide an accelerated offload capability for Kubernetes networking, it's not enough just to provide the schema or the high-level architecture for that — we want to be able to prove it out. The overall structure of OPI is very similar to what you see in other projects.
We have a board of directors and, adjunct to that, an outreach committee focused on marketing, and a technical steering committee, or TSC, which runs the main technical sub-projects of OPI. Right now we've got four of those, and they're aligned to what we've been talking about. We have a provisioning and lifecycle workgroup, which is focused on what it says: provisioning, discovery, boot sequencing, upgrades, and metrics and telemetry as well — super important. Then we have an API and behavioral models workgroup, also super important, because essentially these are the set of common APIs that the project needs to develop to abstract away from the underlying vendor SDKs. Intel has a vendor SDK, NVIDIA has a vendor SDK, so does Marvell and the other device vendors; a key objective is that we have an abstracted layer above those, such that customers and the broader ecosystem can onboard without having to get into the specifics of each vendor's implementation. So we have an API and behavioral models workgroup focusing on that. Then we have use cases — we're focusing primarily on storage, networking, and security in the first instance. Those use cases will need to land somewhere, so we also have a developer platform workgroup, which is designed to ultimately stand up a community lab in OPI, similar to other projects, where we can run our CI/CD from a testing perspective, but also where the community can have full developer access, to see these use cases implemented on real hardware and ensure that the APIs we're developing are functional. Ultimately, that would evolve into what I think would be a reference architecture for the OPI community itself. So that's the main structure. Around those working groups we have a vision workgroup and a minimum-requirements workgroup, and there's also an
orientation workgroup for folks who are onboarding into the project. These are nothing surprising — they're support workgroups for the main body of the project. From an Intel perspective, we have contributed a project called the Infrastructure Programmer Development Kit, or IPDK. It's now a formal sub-project of OPI, and our objective is to contribute the learnings we have in this space to the broader community. And I think NVIDIA has some plans there too. Exactly — on top of DPDK, we have open-source interfaces linked to DPDK and other open-source platforms, so internally we have development of such a platform as well that we will contribute to the project. Okay, so how do you participate? Well, join the project. There's an orientation workgroup that you can access via the website and the GitHub — we've got all the links for that in a couple of minutes — and join the Slack channels. It's very easy to onboard yourself, get on the mailing lists, and understand the areas that are important to you and your companies. So please have a look around and see what interests you. When we look back in history, a first generation of NPUs, network processors, was introduced — Intel was obviously one of the leading companies — but they never took off, and I see the DPU as the generation that can reinvent that and make the NPU era successful. NPUs were meant to accelerate network processing; now we know more about network processing, and the network has become infrastructure — we now call them infrastructure processors.
We call it now infrastructure processors Obviously we learned a lot over the time dealing with these protocols and I think the main failure of NPUs were their lack of common Programming paradigm every vendor has their own language every when they had his own programming Direction, we don't want this to happen to DPUs or IPUs as well There is massive demand and the infrastructure needs to be accelerated the infrastructure has to be dealt with Accelerator and custom computing and I think the DPU provide this type of Needs in a more efficient way from a power dissipation perspective cost perspective perspective and what needed is that the community come together and Then build up a platform that makes and community adopted approach where vendors like ourselves produces licking but the community itself defines how to Program it so what we can take away these massive demands from many customers In in my role as an architect. I have been engaged with big players small players and There's huge interest because The evolution Building your own silicon by some companies is not possible anymore There's only a handful of companies can build silicon using FPG's have been always challenging and Here for the networking world. I think the DPUs and IPUs of a huge massive opportunities But this customers that we engage with always asking for the programming Portability and all this. I think there's massive move what we need to do is together basically Make the programming around Open programming and open APIs together work with more participating in the OPI project and Succeed what we failed with MPU's Such a cure So with that folks and that wraps up Our presentation so again, we would encourage you to to look into the project and Definitely consider contributing. 
It's easy to participate — the links are here in the presentation to the mailing lists and the Slack channels, you can join the community meetings, and it's easy to contribute on GitHub; everything is laid out. And lastly, we'd encourage folks here, from your respective companies, to actually join the project at the Linux Foundation, because without the funding afforded through the memberships we wouldn't be here. So thank you all — any questions from folks regarding the presentation? Yes. [Audience question, partially inaudible: for devices there is IPDK or similar, and that works for applications at the source level, but adapting applications has been hard in practice; what is the vision for accommodating this? It sounds like the goal is some kind of open standard for computing, which is very ambitious — desirable, but challenging.] Yeah — do you want to try that one? Yes. Maybe I'll try to summarize the question first; it was a broad question, but a good one. Okay, the question is about the fact that there are different vendors with a wide range of technologies being put on DPUs: data-path processing for packet processing, the accelerators we talked about, GPUs and AI cores for potential AI deployment within the DPU, and general-purpose processing. The diversity is massive, and obviously everybody is trying to show off in certain niche markets and niche use cases — so how, in such a world, are we going to combine these types of technologies? And I fully agree with you. I always give the example of homogeneous versus heterogeneous computing. Homogeneous computing is great because it's all the same unit and you can program it — GPUs, CPU cores — but if you want to deploy acceleration, you can't get away without accelerators, so you have to deploy accelerators, you have to deploy custom computing.
So from my perspective as an architect, the DPU is a heterogeneous platform, and programming a heterogeneous platform is a big nightmare because one programming paradigm doesn't apply — I agree. But there's another way to do it, and you highlighted it: the general-purpose processors. You have general-purpose processors acting as an interface, and our experience is that custom accelerators implement very well-known functions. We don't need to program these accelerators; we only need an API. So everybody will implement a function like crypto offload behind an open interface — kernel-based TLS, kTLS, has been one of them. Every vendor has its own accelerator, but they all try to implement kTLS in a way that achieves acceleration and higher throughput. Going forward, I think we will see evolution — and if we don't start working and get our hands dirty, we will not see this evolution — and this evolution will be twofold. General-purpose processing will provide the standard interfaces; beneath that we will have programmable data paths, where P4 will probably be the most dominant programming language; and then we have the accelerators and we have AI. On the AI side, AI is probably going to have a common interface for your program — TensorFlow is probably one of them. And from an accelerator perspective, I see this being a bunch of APIs. If you don't have the accelerator, the software will deliver the same function at a slower pace. Yes — crypto, exactly — it will evolve, more standards will come, but in between you will see APIs that one vendor offers that you will not find on another vendor. And the conclusion from this is: if a vendor doesn't have the necessary accelerators, they will provide a software version running on the Arm cores.
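That same-interface idea — a hardware path when the accelerator is present, a software fallback on the Arm cores when it isn't — can be sketched as follows. This is Python with hypothetical names; the "accelerator" is simulated, and both paths compute the same SHA-256 a real hashing engine would offload:

```python
import hashlib

# Sketch of accelerator dispatch with software fallback: the caller uses
# one stable API, and only the implementation (and the speed) differs.
# Device and its methods are hypothetical names for illustration.

class Device:
    def __init__(self, has_sha_accel: bool):
        self.has_sha_accel = has_sha_accel

    def _sha256_hw(self, data: bytes) -> str:
        # Stand-in for an offload to a dedicated hashing engine.
        return hashlib.sha256(data).hexdigest()

    def _sha256_sw(self, data: bytes) -> str:
        # Fallback running on the DPU's Arm cores: same result, slower.
        return hashlib.sha256(data).hexdigest()

    def sha256(self, data: bytes) -> str:
        # The caller never sees which path was taken.
        if self.has_sha_accel:
            return self._sha256_hw(data)
        return self._sha256_sw(data)

accel = Device(has_sha_accel=True)
plain = Device(has_sha_accel=False)
# Identical results regardless of which path served the request.
assert accel.sha256(b"vm-image") == plain.sha256(b"vm-image")
```

This is the same contract kTLS gives in the kernel: applications call one interface, and whether the bytes are encrypted by a NIC engine or by the CPU is invisible to them.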
It will be slower, but you still have the same interface, and over time that vendor may introduce those accelerators in the next generation. I think it will evolve — I agree, but it is an evolutionary process, and the programming paradigm will be the general-purpose processor, the Linux interfaces, and probably P4. Those are the two things I see; the rest will be more API-based accelerators. Yeah, I actually agree 100% with what Shakir said there. And again, you have to cater for legacy: we have 10 or 12 years of folks using and building DPDK-type applications. If they want to come onto this technology, you have to find a way to onboard them — it's just a reality — so APIs like rte_flow are going to be important in the DPU/IPU context. I agree that there are some use cases that are a bit more conducive to getting commonality in terms of implementation. I think security is a good one; storage is a good one as well. And with respect to networking, maybe things like Kubernetes, or common alignment around Kubernetes-type offload, could be done too. So if folks concentrated around those — and to Shakir's point, it's difficult to program these devices in the hardware, so initially software is going to be the way to go. You've got very beefy core subsystems on these devices now, so you can put in place a software implementation that works, while over time you evolve to the kind of domain-specific language, like P4, that will emerge as a common way to program the pipelines across different vendors, using an agreed-upon schema and toolchain. But again, that's going to take time. So I think the diversity is there, you can't get away from it; it's going to evolve, and it's going to take a bit of a journey. We've been flagged — the red flag is up. One more question.
It's not possible, obviously — every DPU has its own software stack and functions — but long-term, I think the goal with this OPI project is to democratize: common APIs, common functions. I think everybody will offer the 70 or 80 percent that OPI is putting forward, and then the remaining 20 or 30 percent is something you differentiate with from your competitors. Somebody who wants a second source will stick to the 70 percent that has been offered as common and ignore the vendor-specific functions; and then there will be niche markets where the customer doesn't care about a second source and will take advantage of unique features, like having an AI engine on the device. They may want to deploy the DPU as part of the edge — Cloud Edge with 5G, and next year it's 6G. So I don't think it will be one unique common platform for everything; everybody has to differentiate themselves from their competitors where they have a competitive advantage. But if you don't get 70 or 80 percent common, customers will not be happy. Okay, thank you, everybody.