Hey everybody, welcome to this session. My name is Paul Miller. I'm the Chief Technology Officer for Wind River, and you may have seen the keynote earlier this morning. This session is really about StarlingX, some of the edge use cases we're taking it through, and a little bit of the architecture. As CTO, I'm not going to go to the molecular level like some of the guys on my team can, but some of those folks are here too, so if there are questions I can't handle, we'll see if they can.

So without further ado, we'll get started. A bit of an introduction: this is now a top-level Open Infrastructure Foundation project, up there with Zuul and Kata and OpenStack and the other projects the community is working on. And you can see here where the releases are hosted: the Git repo, as well as the ISO mirrors, are staged at CENGN in Canada. So there are places you can go, and obviously the video and the slides will be available afterwards so you can find that. Among the other sessions we encourage you to attend, there's a StarlingX 101 that will take you through the entire architecture and the initial deployment of StarlingX into a virtual, and I think maybe also a physical environment, I forget which one we did. So you can really get an overview of how to turn up your first instance and start playing with it, as well as how to contribute. We'll have some links later in the deck to share that with you.

So why did we do StarlingX? Why did we found it? It's a really interesting story, and I talked a little bit about this in the keynote, but maybe in more depth here. When we started looking at distributed systems and distributed computing, we really found a new set of requirements coming forward. When we tried to take the OpenStack we had used previously and map it in, where you perhaps use Nova compute instances remotely located from the controllers and try to deal with Neutron in those kinds of environments, the complexity of a distributed system like that introduces a lot of challenges. And as we moved more towards Kubernetes and container runtimes for virtualizing applications, taking a Kubernetes master at the center of a site, putting a worker 100 miles away, and having intermittency on that connection really created a lot of problems. So that started a fresh look: all right, this edge compute thing is going to be different. We can't take the existing approaches and map them directly onto the topology that's going to happen here.

The result is that we felt it had a new set of requirements, and that's what this slide is about. The first is reliability. In a lot of these edge compute domains, you're dealing with environments that demand high availability and high reliability, so you've got five-nines or six-nines requirements for the deployment, not three or four. You may be hosting real-time workloads in automobiles, aircraft, telecommunications infrastructure, et cetera. So it's not the classical IT enterprise back-office quality that you need; you need a very high-reliability architecture.
The scalability problem changes too. In the data centers of old, and we're still building them quite a lot today, you had the need for hundreds to thousands of servers within a room not dissimilar to this, with very reliable connectivity between the systems, and enough problems dealing with that. Certainly as we turned up OpenStack and pushed RabbitMQ to its boundaries with hundreds of compute nodes, that's a big enough challenge. But as you start looking at edge and distributed cloud systems, you still need to scale to thousands of nodes, except now these nodes are all geo-separated. They may be running across an entire country with tens or hundreds of miles between every site and all kinds of variable latency between the systems. So the scalability is different: we're still scaling to a high level, but now it's a geo-distributed system, and that introduces some new requirements.

The other thing is that at the edge of the network you often have a highly sensitive cost factor. If you need five servers, or a three-server redundant architecture plus worker nodes, at 10,000 sites, but you could accomplish the same workload in one server, that has an absolutely massive impact on the TCO a customer faces as they build and deploy these systems. So we realized we needed what we call a hyper-converged architecture, where we converge and optimize the control functions into a smaller core allocation so that they fit on a single node, or a dual node for HA, giving us a small footprint at the edge within the high scale of the overall system.

Ultra-low latency was an interesting problem, because as you move to the edge of the network and start supporting workloads like virtualized RAN or edge compute applications, virtual reality or augmented reality, you need ultra-low latency in the microsecond range. So you need to bring in a real-time kernel and an optimized Linux environment in the host OS to have the right environment for the applications that run at the edge of the network, because they're very different from what you run in the core. Many times we even see NVIDIA GPUs or Intel accelerators present in the servers at the edge of the network, because it's such a high-performance environment. So ultra-low latency became an important thing.

We talked about edge security earlier today, but the huge footprint of the system really creates a significant problem for intrusion and attacks. It's much more physically distributed; it's not in a single room with guards around it. You've often got bunker sites with no human being present that host a component of your cloud, and if somebody can attack from there, either through the application or through physical intrusion, that creates a set of new security requirements that have to be addressed.

Life cycle management is a difficult one, because I had used OpenStack since kind of the Folsom and Grizzly days, and going through the upgrade process with OpenStack was very, very difficult for a long time. So we felt, when we founded StarlingX, that we needed to have that intrinsically as part of the architecture from the outset, or it could potentially take years to bring it in. So we have the ability in StarlingX to push patches and upgrades out dynamically from a centralized dashboard.
You can actually select subsets of what we call sub-clouds, the remote clouds, and upgrade just them, or upgrade all of them, in parallel or sequentially. There's a lot of feature-set capability within StarlingX to handle life cycle management of the infrastructure. Not so much life cycle management of the application, that's more an orchestrator function, but of the infrastructure that we're responsible for.

When we founded this project at the Open Infrastructure Foundation, we really felt that doing this as open source and having a community involved was important, and that's one of the reasons why we're here today: we'd really like to see greater community involvement in what we're doing. As we talk towards the end of the presentation about what we're doing in the future, I think you'll be excited about some of the challenges in front of us and potentially be interested in helping.

So, a high-level picture to help you understand what StarlingX is: it is fundamentally a distributed cloud. Across the bottom here, these would be considered edge sites, and as we mentioned a moment ago, you can see a single-server instance, which we call all-in-one hyper-converged. In each of those single servers, only two processor cores are used to handle all the oversight functions at that site. You'll notice the Kubernetes logos and OpenStack logos: a site can be an OpenStack site, a Kubernetes site, or both. In the case where it's a Kubernetes site, you have a Kubernetes master sitting at that site alongside the local worker function; in the hyper-converged single-server node, the worker function runs on the same node as the Kubernetes master, so you deploy pods and containers within that single site. Now, if that single site doesn't need high availability, because high availability may come from overlap in the network or other functions, that's totally adequate to achieve a low cost of deployment.

If, however, you do need high availability at that site, you can go to a duplex configuration. The duplex configuration is still hyper-converged: you still have networking, the Kubernetes master, storage, all the control functions of StarlingX, and the worker functions on the same node. You just have a pair of nodes that behave in an active-active failover architecture, so that you can implement high-availability applications at the edge of the network. And then on the left-most side, again whether you do it with Kubernetes or OpenStack, a site can have dedicated servers for control functions, dedicated servers for storage, and dedicated worker nodes. You can scale to hundreds of worker nodes, so you can have the data-center architecture but still have that site be a component in a larger geo-distributed system.

The other key thing here is the OpenStack and Kubernetes logos together. If you go to a site and want to deploy OpenStack on top of it, it's actually deployed with all the control functions, Nova, Glance, Neutron, Keystone, etc., as containerized applications. What that means is that, fundamentally, StarlingX is a cloud-native-first approach. It runs Kubernetes and the container runtime, which is now containerd, directly on bare metal. When you run a container, it runs on bare metal; it's not running inside a hypervisor in OpenStack. And what that allows us to do is take those OpenStack services, spin them up as containers, and deploy them. You can deploy OpenStack with a Helm chart.
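Just to make that concrete, here's a minimal sketch of what driving that kind of deployment could look like from a workstation with helm and kubectl pointed at the cluster. The chart reference, release name and namespace are illustrative assumptions, not the exact StarlingX packaging; in practice StarlingX wraps this in its own application tooling.

```python
import subprocess

# Illustrative only: chart reference, release name and namespace are
# assumptions, not the exact StarlingX packaging of OpenStack.
CHART = "starlingx/stx-openstack"   # hypothetical chart reference
RELEASE = "stx-openstack"
NAMESPACE = "openstack"

def deploy_containerized_openstack() -> None:
    # Deploy (or upgrade) OpenStack as an ordinary cloud-native
    # application on the Kubernetes cluster StarlingX already runs.
    subprocess.run(
        ["helm", "upgrade", "--install", RELEASE, CHART,
         "--namespace", NAMESPACE, "--create-namespace"],
        check=True,
    )
    # The OpenStack control services (Nova, Glance, Neutron,
    # Keystone, ...) then show up as ordinary pods.
    subprocess.run(["kubectl", "get", "pods", "-n", NAMESPACE], check=True)

if __name__ == "__main__":
    deploy_containerized_openstack()
```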
It's extremely fast to deploy OpenStack this way, as a cloud-native application. And then for the worker nodes you can decide: that worker node at the leftmost site, do I want it to be a Nova compute instance or a Kubernetes worker? And you can have these things commingled. A single site can run both Kubernetes and OpenStack, or you may have a site you just want to run OpenStack, or sites you just want to run Kubernetes. The nice thing about this is you can have geo-distributed OpenStack and Kubernetes in a commingled architecture, where you can move between Kubernetes and OpenStack freely, reallocate nodes, and do it all in a single technology stack. A lot of people will spin up an OpenStack instance and a Kubernetes instance and run them as two different infrastructures; in this case, you're able to run OpenStack as a guest within Kubernetes and use the same architecture to support both technologies.

And then finally, at the top you see the central cloud instance; we call these system controllers. They form the center of the control plane and the single pane of glass, with the UI and CLI that let you see the entire topology. I'll show you a screenshot. Actually, we'll go to it right now so you can see the screen as it runs. This dashboard gives you the ability to see all the sub-clouds, filter them, group them, see the central controllers and manage them. So this is the part of the application that implements a control plane, with visibility and control over the entire distributed system. In doing that, it does use a lot of legacy OpenStack services in the architecture to accomplish this. This has always been one of the weaknesses of Kubernetes: it's not a full infrastructure. StarlingX turns Kubernetes into a full infrastructure the way OpenStack is.

With this visibility, you can see the sub-clouds and the central cloud, their online/offline status, their deployment status, whether there are alarms present, and the synchronization of their software version, meaning whether they're patch-current or not. You can push patches, filter, and move around the architecture to maintain it, and you can click at the top left and switch to a particular sub-cloud, work within that sub-cloud, deploy workloads and manage that infrastructure. So it's an attempt to create a system where you can build a geo-distributed deployment and manage all those sites individually, or look at them in totality as sub-clouds and central controllers, all from a single control plane.

So effectively it's a distributed architecture, designed to be one, but one that pulls OpenStack downstream into it effectively as-is. We don't hack OpenStack to make this happen; this is conventional OpenStack that becomes containerized, and likewise Kubernetes. It's a CNCF-certified distribution of Kubernetes within StarlingX, so it's not a variant and isn't going to cause any application compatibility issues with Kubernetes. The standard Kubernetes API is exposed, for example.
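That conformance is easy to sanity-check yourself: anything that speaks the standard Kubernetes API should work unmodified. As a minimal sketch, assuming your kubeconfig points at a StarlingX cluster, the official Python client can list the nodes exactly as it would on any other conformant distribution.

```python
from kubernetes import client, config

# Nothing StarlingX-specific here -- this is the standard Kubernetes
# API, assuming your kubeconfig points at a StarlingX cluster.
config.load_kube_config()
v1 = client.CoreV1Api()

# Print each node and its readiness, whether it's an all-in-one
# controller or a dedicated worker.
for node in v1.list_node().items:
    ready = next(
        (c.status for c in node.status.conditions if c.type == "Ready"),
        "Unknown",
    )
    print(f"{node.metadata.name}: Ready={ready}")
```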
So as we get a little bit more into the architecture, we have a better picture here, and you can see a lot of the components: Kubernetes, Ceph, collectd, containerd, and the OpenStack components at the top. The thing I want to emphasize is the new StarlingX services, because as an infrastructure project, the way StarlingX works is that a lot of these things, Ceph or Kubernetes or even OpenStack, basically come downstream from other open-source initiatives into the project. So what does StarlingX itself do? The code development that happens in StarlingX is about what we call the Flock services, the StarlingX services you see in the middle. They do things like configuration management for software upgrade, fault management across the distributed system, software management for upgrade and life cycle management, and the control plane that creates visibility across the entire deployed architecture. That's the unique part: where StarlingX development is writing code, it's about taking downstream projects that provide the critical technology components we need and turning them into a distributed system, without breaking those downstream components.

I thought we'd give a few examples of what this is being used for, and it's some pretty neat stuff, really tied to what's happening around the edge of networks, which are being heavily impacted by a lot of contemporary consumer expectations. People expect to have a self-driving car now, or a car that can do accident avoidance, and what we see here is an architecture where the edge node is actually in the car, connected via 5G to the central cloud through a wireless connectivity service. Once you create an architecture like this, you can do some amazing things. The workloads running in the vehicle become Kubernetes containers that can be dynamically updated and life-cycle managed using cloud-native principles. This is very different from what the automotive sector had for many, many years, where you'd have to bring the car to a dealership and plug it in to upgrade any software. Now it's becoming a CI/CD-type architecture, where the automotive providers can dynamically update, using the entire topology of this system as a cloud-native environment.

This gives them the ability to build what we call mixed-criticality applications. You may have some systems in the vehicle that are safety-critical, managing things like engine management and collision avoidance, and other systems in the car, perhaps running Android or Linux, doing the cabin entertainment functions, screens and dashboards. Those can all be mixed within this environment and run as an edge system. A great application of this is vehicle-to-vehicle accident avoidance, where you need this connectivity because the vehicles need to communicate with each other to avoid each other in the real world. That requires a low-latency connection, which again is a great fit for StarlingX: as we talked about, that low-latency kernel is ideal for these real-time applications at the edge of the network.

The next example comes out of France, where there's a lot of work going on to redo the energy grid. Moving from coal-powered, centralized energy distribution to wind and solar changes the control of the grid from centralized to distributed, so they need a distributed capability to manage the software applications that run the grid.
Guess what: StarlingX is a distributed cloud, so you can manage all these geo-separated sites from one place and manage all of your control applications for the energy grid.

We have augmented reality in use in manufacturing, real estate and construction, where you need a low-latency connection to the goggles. The remote edge node sitting at the construction site, or in the manufacturing facility, is directly connected to the applications creating the visualizations, perhaps overlays in a manufacturing environment that help you understand how to assemble something or where faults are coming from. That's an application that runs on that edge node at the edge of the network. Oops, this thing just hyperspaced on me, give me one second.

In the manufacturing sector itself, another really interesting area is cloud robotics. In the early days of robotics, people were concerned just with the application on the robot by itself, but manufacturing environments are now distributed cells that get moved and reconfigured on the fly, even potentially using private 5G to connect the manufacturing cells together. Here you get the control, management and reconfigurability of the manufacturing cell, because as I move a robot system into another environment, it's Kubernetes: I can dynamically reallocate an application and reconfigure it using cloud-native principles, because the entire manufacturing floor is now a cloud. It's a distributed cloud across all these different cells, managed as one virtual environment, and that's much easier than trying to launch VMs or manage containers on a per-robot basis. You need something to tie it all together into a framework, and again, that's StarlingX.

The final one, which is really quite incredible, is what's happening in particular in the U.S. with a program called JADC2 and ABMS, the Advanced Battle Management System, where they're looking at fully virtualizing the components within the aircraft. Don't worry, this isn't going to happen in your Boeing aircraft on the way home. There's a lot behind this, but fundamentally the chief software officer of the U.S. Air Force has been quoted asking for the plane to land with different software than it took off with. They need the ability to dynamically change the workloads because of the tactical environment they're in: mission information, weapons control, navigation information, et cetera. So here again it's another edge application: the aircraft, the boat, the military team is the edge site, and the centralized control and management of all the software applications and data is StarlingX. Some pretty exciting applications as these edge use cases come out.

As we move to the architecture and some of the components, just to double-click on that slide we had earlier, there are a couple of things I want you to take away. First, low-latency Linux. The current project is a CentOS base with a Yocto-built real-time kernel merged in, kind of a custom CentOS build that achieves a low-latency Linux platform. So if you deploy StarlingX, it comes with the Linux that is the host operating system; you don't deploy a host operating system and install StarlingX on it. It comes with the Linux because that is a key component of the performance that we need.
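To give a feel for what workloads ask of that kernel, here's a toy sketch of a latency-sensitive periodic task requesting the SCHED_FIFO real-time scheduling class. It's Linux-only and needs root (or CAP_SYS_NICE), and the priority and period values are arbitrary illustrations; the point is that it's the preempt-capable kernel underneath that keeps the worst-case jitter bounded.

```python
import os
import time

# Toy example of a latency-sensitive periodic task. Requesting
# SCHED_FIFO works on any Linux box, but only a preemptible
# real-time kernel (like the one the StarlingX host OS ships)
# keeps the worst-case scheduling jitter tightly bounded.
PRIORITY = 50      # arbitrary mid-range real-time priority
PERIOD_S = 0.001   # 1 ms control-loop cycle, purely illustrative

os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(PRIORITY))

next_tick = time.monotonic()
for _ in range(1000):
    next_tick += PERIOD_S
    time.sleep(max(0.0, next_tick - time.monotonic()))
    # ... time-critical work (e.g. a vRAN or control loop) goes here ...
```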
A lot of the orange boxes there, Horizon and other components, come from the OpenStack family and help us manage this kind of architecture. You see Kubernetes up there, and then obviously OpenStack as containerized OpenStack for workloads. What that means is that if you want to deploy OpenStack in StarlingX, it's effectively a guest, a virtualized component deployed as containers within the Kubernetes architecture.

The purple boxes around here are the areas where software contributions come from the Open Infrastructure Foundation and the StarlingX community. Distributed edge cloud, at the top right, is about the distributed control plane that has awareness of the entire topology. There's configuration management and software management, and we have infrastructure orchestration, which really refers to zero-touch provisioning. Zero-touch provisioning was added because if you're an operator trying to deploy thousands of clouds, you can't be manually turning up and configuring every cloud; it's too slow. So we have native Redfish, IPMI and PXE support within the platform, and what it does is reach out. You can deploy a site completely dark: no OS on it, powered off. The system will reach out, turn that site on, deploy the first controller with its OS and virtualization platform, scale out to the other controllers and workers, and deploy the entire site with zero human intervention. We actually have some videos demonstrating that this week. It's pretty amazing, because you can then deploy hundreds of sites at a time, in parallel, all happening automatically without any human intervention other than setting the configuration you want and giving it the IP address of the BMC in the server, and off it goes. Some pretty exciting stuff there.

When we look at Kubernetes, it's what I would consider an extremely standard Kubernetes control plane, master and worker node architecture. You've got the host OS and the abstraction and namespaces that happen there. We've moved from Docker to containerd as we've moved forward. There's support for CNI plugins, Multus and SR-IOV, for high-performance networking. We also have some unique support for GPUs and FPGAs, including NVIDIA GPUs and FPGAs from Intel for RAN acceleration workloads, including the management of the drivers and software for those accelerator cards within the system. We did this because at the edge of the network, with these small footprints and high performance requirements, you really need to welcome and accept the fact that there are going to be accelerators. You need to think about it not just as a generic server: there might be other assets there, and if that's the case, we should include management of them as well, because there are thousands of sites. And then obviously the scheduler, controller manager, Flux and whatnot up in the control plane with the Kubernetes master functionality. Now, in a hyper-converged architecture like I mentioned earlier, this entire stack runs on one node; if you have separated Kubernetes master and worker nodes, they're on different physical servers. Let's see, how am I doing on time? Five minutes, okay.
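Before I move on, one quick way to make the zero-touch piece concrete: the BMC interactions involved are standard DMTF Redfish calls. Here's a minimal sketch; the BMC address and credentials are placeholders, and in StarlingX they would come from the sub-cloud configuration rather than being hard-coded like this.

```python
import requests
from requests.auth import HTTPBasicAuth

# Placeholders -- in StarlingX the BMC address and credentials come
# from the declarative sub-cloud configuration, not hard-coded values.
BMC = "https://10.0.0.42"
AUTH = HTTPBasicAuth("admin", "password")

def power_on_first_system() -> None:
    # Discover the managed systems this BMC exposes (standard
    # DMTF Redfish collection endpoint).
    systems = requests.get(
        f"{BMC}/redfish/v1/Systems", auth=AUTH, verify=False  # lab BMCs
    ).json()                                                  # use self-
    system_path = systems["Members"][0]["@odata.id"]          # signed certs

    # Issue the standard ComputerSystem.Reset action to power the
    # node on; from there, network boot can take over and the first
    # controller installs with no human on site.
    requests.post(
        f"{BMC}{system_path}/Actions/ComputerSystem.Reset",
        json={"ResetType": "On"},
        auth=AUTH,
        verify=False,
    ).raise_for_status()

if __name__ == "__main__":
    power_on_first_system()
```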
Self-healing control plane. This is a really interesting thing. If you look at the architecture on the left, we've got central sites, and there are a couple of examples there, small node count or high node count. Down at the bottom are edge sites again, where you can see a hyper-converged node, one in a duplex configuration for high availability, or the one on the bottom right with separated worker nodes. It doesn't matter; you can configure each site. But notice that each site has that little brain picture, that little purple brain: that's the StarlingX control function. What that means is, if you look at the right-hand figure with a legacy enterprise cloud solution, if there's any intermittency in that WAN connection, which is often going to happen when an edge site is far from a central site, severing that link loses you all your control functions. You can't scale out pods, you can't deploy new services, if all that's out there is a Kubernetes worker and the Kubernetes master is back at the central facility. With StarlingX there's control at every site, both StarlingX control and a Kubernetes master, so if you sever that link, everything continues to run. If you're local to that site, you can log in, see the dashboards, scale out, reconfigure networking, and deploy new applications, because it's a full cloud at that site. When the connection is restored, it's re-synchronized automatically as part of the StarlingX control plane: brought back into service, with the logging events that were lost captured, and brought back in as an in-sync sub-cloud. So Kubernetes is running as a cloud-native environment, but StarlingX is providing the management of the distributed system. I think we've talked about the scaling extensively.

Just a note on OpenStack deployment: OpenStack can be deployed as a cloud-native application — understand that it's a Kubernetes cloud-native application from the perspective of StarlingX — and this gives us the ability to completely deploy and manage it in an extremely high-speed, efficient way. Once you've got the StarlingX bare-metal management and the ability to deploy these sub-sites and sub-clouds and central controllers, you can use Helm to deploy OpenStack to those remote sites as a cloud-native application. It's a pretty nice OpenStack.

Zero-touch deployment we talked about. This is software update automation: from the dashboard or the CLI, what's being shown here is that you create the concept of a patching orchestration. If you want to update a component of the infrastructure software, you upload the patch onto the system and then choose how to deploy it, in serial or in parallel, into certain groups of sub-clouds or the entire system, as an orchestrated upgrade function. And this is how upgrades and patching are performed within StarlingX, all supported today.
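To picture what that strategy amounts to, here's a toy model of the serial-versus-parallel choice across groups of sub-clouds. This is purely illustrative — not the actual StarlingX orchestration code — and the group, sub-cloud and patch names are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model of patch orchestration across sub-clouds; illustrative
# only, not the StarlingX implementation.

def apply_patch(subcloud: str, patch: str) -> str:
    # Stand-in for: upload, install and verify a patch on one sub-cloud.
    return f"{subcloud}: {patch} applied"

def orchestrate(groups: list[list[str]], patch: str, parallel: bool) -> None:
    # Groups roll out one after another (so a bad patch is caught
    # early), while sub-clouds *within* a group go serially or in
    # parallel depending on the chosen strategy.
    for group in groups:
        if parallel:
            with ThreadPoolExecutor() as pool:
                for result in pool.map(lambda sc: apply_patch(sc, patch), group):
                    print(result)
        else:
            for sc in group:
                print(apply_patch(sc, patch))

# Example: patch a canary sub-cloud first, then the rest in parallel.
orchestrate(
    groups=[["subcloud-1"], ["subcloud-2", "subcloud-3", "subcloud-4"]],
    patch="stx-patch-0001",  # hypothetical patch identifier
    parallel=True,
)
```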
A quick note on one example application. You heard the 5G conversation earlier; this is really what it looks like and why it's a good fit for StarlingX, so I thought I'd put one slide in here to get you familiar with it. If you look at a 5G architecture, it is a distributed system: the hierarchical architecture of a service provider, moving from the core of the network all the way out to the cell towers, with the DU instance running on the cell tower. Now, notice the scale in the network. At the centralized site there are three server images, then you go to two, then down to one; that represents the change in scale at each data center. At the core of the network, you may have hundreds of servers. At a DU edge site, you may have one, but you may have 50,000 of those DU sites. And the entire thing, from the edge all the way to the core, can be managed as one distributed cloud with StarlingX. That's the thing that's different about it: you don't have to deploy each site as a separate cloud and try to tie them together. The control plane automatically ties all these sites together and manages them as one.

As we look towards the future, there are a couple of things going on. One is the desire to extend the edge. You may have seen this in some of the examples; this slide is really attempting to define the concept of core-to-edge. At the extreme left, your core is represented by centralized data centers or perhaps even public cloud environments. You progressively move out to a regional data center and then an edge data center, and that edge data center can be viewed from an enterprise perspective or, as we just showed, from a service provider perspective where it's the edge of a cellular network. But stepping beyond that edge, you have the device edge. The device edge is where the automobiles, the aircraft, the robots come into play, with either Wi-Fi or private 5G connectivity. The idea is for StarlingX, as we move forward, to embrace this device edge and bring it in, so that these embedded device targets can be used in the infrastructure and we can deploy workloads into those devices as well as into conventional server targets. In this way we fully cover the entire topology of the applications being built. So if an automobile has a software-based function in it, and a component of that application needs to run at the infrastructure edge, that can all be managed as one virtual environment, a true distributed system.

I'm going to encourage you to look at some of our videos if you're interested in this. We have examples of edge deployment, performing upgrades, and some of the other things I've talked about; the team has gone through them in detail and shows how those things work, including the commands and UI, as we walk through them. So it's worth a look if you're interested in this kind of distributed edge technology. And then available to you offline, of course, is the location of the documentation, mailing list, and images, and how to get to all of that.

I'll stop there. We're probably pretty much up against the edge here, but we'll be glad to take any questions if you have any. And of course Wind River has a booth out there, where one of the co-founders of StarlingX will be glad to talk to you if you want to stop by and chat about the architecture or any questions based on what I've shown you today. Any questions?

There's a mic over there, but if you ask, I'll repeat the question. The question: there are devices, firmware updates and things like that — can this also be customized? You're talking about NVIDIA and pre-built stuff, but if you have some custom device, can that also be handled and patched? It can be. What I would say, though, is that this is an area where I think we'd look for participation from the community, to understand what that application is and bring it in.
What I would say is that the architecture is supportive of that. It supports GPUs and FPGAs today, and if there's some other component, a variant of GPU or FPGA or something new like SmartNICs, you'll probably find a lot of the framework you need, but you may need to contribute to get the full functionality you want. Anybody else? You get to go again.

"You talked about the mechanism where, from the central server, you can deploy a subcloud, but the subcloud itself is installed on its own initial node. Is it configurable on the central controlling cloud which nodes belong to the subcloud?" We use, and it's partly because of the security topic we touched on, a centralized declarative model, which means there's a YAML file on the central controller that defines the topology of that subcloud: simplex, duplex, duplex plus workers, whatever services are on it. Once that is known, the system can deploy that site to match the required topology, and you can deploy each site with a different topology and different hardware; it doesn't care.

"So is it also managed using Redfish for the distribution of that to the, let's say, autonomous subcloud?" Yes, effectively — and I'll ask Eddie to correct me if I go off base here — we use Redfish to attach and deploy the first controller node at that site, and then that controller node can be used locally to stage the remaining servers. So the zero-touch deployment starts with the deployment of the first server, and that's usually remote over a WAN. You don't want to do that for all the other nodes, because that link may be slow. But once that first edge controller comes up in the subcloud, it has all the software, so it can deploy the redundant controller and the other worker nodes, all automatically, through zero-touch deployment.

Anyone else? He's going to ask another question if you don't ask me one. All right, well, good — I think we're at the end. Thank you all for coming. If you'd like to stop by and chat more, we'd be glad to talk to you.