All right, we can get started. Welcome, everyone. Today in this session we're going to be covering some new deployment architectures that we're looking at for StarlingX. We'll go over the details of the requirements for those, what we're seeing as input based on our industry deployments with StarlingX, and overall we'll go through that as a detailed architecture review. Then at the end we'll open up for some questions, in case there's anything else we have to cover.

I'm Matt Peters. I'm a principal software architect at Wind River for the Wind River cloud products, and I'm an active contributor to StarlingX and a member of the community. I've been part of StarlingX since 2018, the original inception, but even before that I was part of the OpenStack community, working with Neutron and Nova and the other projects as well.

So let's get started. I'll start out with a bit of an introduction to StarlingX, just in case folks are not familiar with it, but I'll focus on the aspects of it that are applicable to the deployment architectures we're going to talk about. Within this session I'd like to cover our proposals around the hybrid cloud architecture.
That means bringing StarlingX to the public cloud and looking at how to manage those systems at the edge. We're also going to look at new deployment architectures for geographic redundancy, specifically around our distributed cloud and how we're going to manage it in a distributed fashion for geographic survivability. And we're going to look at some optimized network deployment configurations as we deploy at scale; there's increasing demand for us to improve and streamline our network configurations.

So let's take a look at StarlingX, and specifically at what's applicable to this session. StarlingX itself is a full-stack offering for a Kubernetes container platform. It provides manageability of the physical infrastructure and the deployments at the edge. It's set up in a hierarchy, with centralized services running within a regional data center, and then a set of distributed edge sites that provide the connectivity for things like 5G and other edge applications. Right now this deployment architecture is based on a bare-metal deployment: the regional data center is deployed on physical servers, and it provides the manageability for the edge sites, which are all running StarlingX Kubernetes, along with the end-to-end deployment architecture for the distributed cloud. One thing to note is that these are each independent Kubernetes clusters. At the central region we're running a Kubernetes cluster that has all the centralized services we need to manage the distributed system, and each of the edge sites is an autonomous edge cloud as well.
So overall we treat it as one big geographically distributed system, but we do talk about the centralized components separately from the edge when we look at manageability.

Let's get right into the first main topic, which is the hybrid cloud. We're really seeing a push to streamline some of the operational costs and capital expenditures that occur within the overall deployment. With the regional data centers, depending on the geographic distribution of your edge sites, you may have a number of these deployed across the national network. From a cost perspective, operating each of these independently, or even having dedicated systems, is not as cost-effective as a more centralized system. So we looked at different options for supporting both a scaled-up distributed cloud system for those centralized services and portability into the public cloud. If an operator is looking to optimize their overall cost through consolidation, we want to have that portability and that flexibility.

So let's talk about the solutions. To achieve this, there are two changes we're looking to make to StarlingX. The first is hosting the centralized services on public cloud. This is not a full StarlingX stack hosted on public cloud, but specifically the distributed services related to centralized management, and we'll get into what those functions are in just a minute. The second is portability: certainly we want the public cloud support, but how do we make it portable between the different public clouds? And how do we make sure it's portable to our private on-prem as well?
If we start with the starting configuration, we have a set of centralized services running at the central location, managing the far edge sites. Those centralized services today provide things like remote install and remote deployment of all the edge clouds; this is zero-touch provisioning of the edge sites, all from the central location. We have full lifecycle management, which means configuration and state have to be synchronized across these systems so that we understand their operational performance and can manage them after day one. Once we get into day-two operations, that full lifecycle management includes things like software updates and upgrades. When we're looking at a mass deployment, we need to manage not only the individual systems but, like I said, the complete distributed cloud. So how do we manage this across the entire distributed network and have all these services available for that overall lifecycle management?

In addition, there are a number of shared services that we provide in this hierarchy. In the case of deployment of a subcloud, we have things like a container registry that we need to host. We do it as a hierarchy for scale: rather than having everything pull from the same private or public registries, depending on your deployment configuration, we set it up in a hierarchy, using the regional data centers, or the centralized services, to provide a container registry for the subtending subclouds. That extends to things like identity management and the booting and installation that we do, so hosting the media. Having all of these things in this hierarchy allows us to scale out to whatever geography we need. Under an individual system we can get up to about a thousand subclouds, but if we want to scale beyond that, or even scale what we're doing for distributed cloud, then we need to be able to scale those services as well.

So the proposal is to take that set of services and move them, or be able to host them, on a public cloud Kubernetes.
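As a rough sketch of that registry hierarchy, a subcloud can be pointed at the nearest registry and fall back upstream. The registry hostnames and the helper below are purely illustrative assumptions, not StarlingX APIs; the point is just the nearest-first resolution order:

```python
# Illustrative only: resolve an image pull through a registry hierarchy,
# nearest tier first, so a subcloud tries the regional registry before
# reaching out to any upstream public registry.

def resolve_pull_order(image: str, tiers: list[str]) -> list[str]:
    """Return candidate pull locations for an image, nearest tier first."""
    return [f"{registry}/{image}" for registry in tiers]

# Hypothetical hierarchy: centralized/regional registry -> upstream.
candidates = resolve_pull_order(
    "starlingx/stx-platform:latest",
    ["registry.central.example.com:9001",  # regional data center registry
     "docker.io"],                         # upstream public registry
)
```

In practice this ordering is what a pull-through cache or a container runtime's registry-mirror configuration expresses; the hierarchy keeps most pulls within the region.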
This means we are porting the existing StarlingX distributed cloud services into that public cloud infrastructure, to be able to host on things like Amazon EKS, Azure AKS, and Google Kubernetes Engine. It's really about providing portability so that we can run on any public cloud offering that provides a Kubernetes environment, by decoupling some of those services from the StarlingX components. We'll get into what some of that decoupling looks like, but the intent here is to make it fully portable among them.

There are several reasons why we want to do this, but one of the enabling functions is that we want to be able to migrate from our existing regional controllers to public cloud, or back, depending on the operator's requirements. If they realize they want to start out with physical infrastructure because they're not ready to invest in public cloud, they can do that. We already have the ability today to migrate subclouds between different system controllers, so migrating from on-prem to public cloud is something we could already achieve, and something we want to support.

It all starts with the decoupling. The distributed cloud services would be a completely containerized microservice architecture. Today they're integrated into our system controller function within StarlingX; they are integrated services managed by StarlingX on the platform. Decoupling them gives us the flexibility to run on any Kubernetes cluster. In the public cloud configuration that we saw, we're looking to leverage the managed infrastructure from the public cloud, so we don't really need that extra layer of StarlingX infrastructure management for this particular component. We certainly need it when we're talking about the edge, the lifecycle management of a physical server, and all the software management we do at the edge, but these services specifically can be decoupled and run directly on the public cloud infrastructure.

Because we're on the public cloud, we can also take advantage of some of the managed services. Instead of just the public cloud infrastructure for Kubernetes, we're able to leverage other components or services offered by the public cloud. These can include things like a managed database service or message bus services like RabbitMQ; a lot of the public clouds offer these at scale as managed services. That lets you move up into your application space, scaling and managing your own services rather than the underlying infrastructure. So whenever a managed service is available, we want to take advantage of it for that increased scaling capacity. As we'll see in the next topic, a lot of these public clouds also offer high-availability clusters in different regions, and we'll talk about how we're going to leverage that for some of our geo-redundancy.

One of the interesting things about this architecture is that because it's portable, we can use the exact same design for our private cloud. Rather than having something deployed separately, specifically for the public cloud, we can do the private cloud in the same way. Again, we're starting with just a regular Kubernetes cluster. This would be deployed as part of StarlingX.
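That portability can be thought of as a simple backend-selection rule: use a cloud's managed offering when one exists, otherwise deploy the equivalent in-cluster, which is exactly what the private-cloud case falls back to. A minimal sketch, with hypothetical service names (none of this is StarlingX code):

```python
def select_backend(service: str, managed_offerings: set[str]) -> str:
    """Prefer the cloud's managed offering when one exists; otherwise the
    service is deployed in-cluster with the other containerized services."""
    return "managed" if service in managed_offerings else "in-cluster"

# Hypothetical: a public cloud region with managed database and message bus.
public_cloud = {"postgresql", "rabbitmq"}
# A private on-prem cluster offers no managed services at all.
private_cloud: set[str] = set()

services = ("postgresql", "rabbitmq", "dc-orchestrator")
public_plan = {s: select_backend(s, public_cloud) for s in services}
private_plan = {s: select_backend(s, private_cloud) for s in services}
```

The same deployment logic then covers both cases: on public cloud only the truly custom services run in-cluster, while on-prem everything does.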
If we're talking about on-prem, we're talking about the full stack offered by StarlingX: the containerized platform, all the lifecycle management we have for that platform, and all of the management we need for the bare-metal services. For the distributed cloud services, I mentioned there's a decoupling aspect. Today the distributed cloud manages both the orchestration for the centralized controller and the subclouds. When we're talking about public cloud infrastructure, we don't have to manage that infrastructure directly; we can provide the deployment and the configuration that we need for the public cloud. But on-prem we still need it, so there is still an integration activity between the distributed cloud services and the system management. That portability gives us the flexibility to deploy in hybrid clouds: public cloud with the centralized management, or strictly private cloud.

So let's talk about the next major topic, which is geographic redundancy. This does have a relationship to what I just discussed in terms of the public cloud, but what we're really being asked to do is provide an extra layer of survivability. We have the regional controllers, or a public cloud deployment that's in one particular region, but we want those services to be always available. In other words, full management, operations, and control needs to be maintained, independent of any catastrophic event that could happen to an individual site. Most of the industries we serve today demand this type of survivability. We have redundancy and local availability, but if we're talking about an entire data center disappearing, how do we maintain that manageability at the edge?
The solution we're proposing is a redundancy model specific to our distributed cloud architecture. The edge itself is already massively distributed across geographic regions; we're not talking about the edge here, we're talking about that centralized management function. We want to make sure that under whatever conditions we're operating in, we can always maintain that visibility, because if you lose sight of what's happening with your systems, you don't know whether they're providing service. So you need to maintain that manageability.

Starting with our system controller design: the central systems run a local high-availability cluster. What I'm showing here is a bare-metal deployment, a typical StarlingX deployment with two controllers and a number of worker hosts. Depending on the scale of your distributed cloud, you may require different physical server configurations, but the general principle is that you have a local Kubernetes cluster with high-bandwidth, low-latency networking, all contained within that data center, and then the edge sites are remote distributed systems.

Within this, we want to be able to layer in a second region. We take that exact same configuration and deploy it into a separate region, separated by geography, meaning hundreds of kilometers or hundreds of miles of separation, so that if there is a catastrophic event, it's isolated to that particular region, or at least to separate data centers or deployments.

The overall design is to provide shared state across the distributed cloud services: synchronization of both configuration and site availability, and replication of the overall deployment of, and visibility into, those subclouds. If one of those sites were to disappear, we have full manageability from the other site, and it already has the replicated state that we need.

We've chosen not to take a model that distributes a single Kubernetes cluster across regions; we're doing this at the distributed cloud layer, because it eliminates a lot of the complexity you would have with some of the failure scenarios. We can isolate the service availability and the service management within a particular cluster, but you still get the manageability and the capability to do all of the things I talked about before, like the remote install and the configuration management, for day one and day two.

Another key requirement is that it has to be completely transparent to the edge clouds. Today, if there is an event at our system controllers, we have autonomous subclouds: if a system controller goes away, the subclouds continue to provide service, so they're not impacted. We have to maintain that model. We don't want to jeopardize the availability we have today by adding this extra layer of redundancy, so the switch between them, or the failover, whatever you want to call it, has to be transparent.

To achieve that replication, we're proposing subcloud peer groups. These are effectively just an organization of the subclouds: the physical servers at the edge, organized into groups that we can then designate as either primary or secondary across a set of system controllers. In this model we're not restricted to a one-plus-one arrangement like the picture I drew earlier, which just has the one replicated site; what we're actually showing here is three sites.
You can have them all actively providing service, managing their subclouds, but with some spare capacity, so that if we need that redundancy and have to move the management of those subclouds, we can move it to another site. Taking the example of region one, we would replicate subcloud peer groups A and B to another site. I've simplified this; here it's just going to one other region, but we could split them, or use any other combination that you need for your redundancy model. It could even be replicated out to multiple sites, so not just one replica, but several replicas, depending on what you require for your survivability.

Expanding on that model, all of our spare capacity can be distributed among the available system controllers within the overall national deployment, and you have full redundancy across your distributed cloud services for the manageability of the subclouds designated to a particular peer group.

A peer group is a logical entity, a grouping that the operator can define. They can define whatever parameters make sense to them for that redundancy. It can be proximity to the subclouds themselves, if they want the system controllers near their subclouds, or it could be based on the manageability or policy they have within their own data centers. With this model, it becomes very simple for us to move the management between the different systems. If we take just one subcloud peer group as an example, it has the set of subclouds that belong to that group, all actively managed by a primary site, but what we want is for the individual sites to be able to take over at any time.
Using a simple priority-based system, we can have the monitoring, or the heartbeating, or whatever external service is validating the health of the system, integrate into that decision and switch the primaryship between the sites based on priority very simply. That could be internal to what we're providing, or even an external entity making a higher-order decision. Because we're talking about management control plane functions, we don't have to switch over in sub-second or even second timeframes, so we can make decisions about what the overall network topology looks like. That lets us avoid things like a network partition, where one site is isolated and appears to be unavailable when it can actually still be providing service to its subclouds. In this example, we continue to provide heartbeating service to those subclouds so that we can incorporate that into our decisions.

When you're trying to establish a quorum, or deciding whether you actually want to move these subclouds to a new location, you don't want to take that lightly, because you're effectively moving the management all the way over to another system controller. You don't want it thrashing around; you want something with policy included, and you want to make sure you're making the right decisions. This simple scheme allows us to provide that management across the different systems in a mechanism that is straightforward for us and the operators to understand. If one site goes down, it's a very simple algorithm to move the ownership over to another system controller, and when that primaryship moves, it's transparent to the subclouds. They'll just start reporting their availability to the new system controller, and their manageability will all come from there. From their perspective there's no change in their operational state, but the operators are able to continue providing that management function to them.

All right. So what we've discussed is really a layered approach to our distributed cloud architecture, where we can provide geo-redundancy across our distributed cloud services. Next we'll look at a continuation of some of the improvements we're making around StarlingX deployment. We've gained a lot of experience deploying at scale, we understand some of the pain points our operators are having with deployment, and we understand where we need to make improvements. One of those is network optimization.

Before we jump into the improvements the operators are looking for, I want to start with where we are today with our network configuration. What I have depicted here is a single multi-node system. It's a little more complex than some of the subclouds we have today, but I wanted to provide the more complex example so you can see what some of our operators would have for larger-scale deployments of those subclouds. An individual system has many different network segments attached for different functions, whether that's an operations and management interface, which is a more public or externally reachable interface, versus an internal network specific to the system. We have these different segments for security and operational reasons, but this also adds an extra layer of complexity for our operators, because that flexibility means they have to assign network addresses and they have to define the VLANs.
They have to set up the networking and routing for these interconnected hosts, and when you're repeating this across tens of thousands of systems, it becomes a pretty large management function to plan out your network and make sure you have the right resources allocated.

We also have a combination of layer 2 and layer 3 networking. Redundancy is provided internally and externally for some of our addresses based on floating IPs: if a physical server goes away, we want to continue to provide an endpoint for that service, so the address floats between servers, which means we need a layer 2 network today to move those services. We also have requirements for multicast traffic, which is well known to be a problem for people to deploy and get right, whether that's an initial deployment where multicast just isn't working, or multicast not being contained within the scope they expect.

So we really want to simplify what we're doing around the end-to-end network configuration for our system, and that comes in several forms. We have key optimizations we want to make. One is a reduction in the overall number of addresses the system requires. Another is a simplification of the overall network architecture: I drew that other diagram that shows the complexity that's there, so how can we simplify it? How do we reduce that down?
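To put rough numbers on the address-reduction goal, the comparison can be sketched with Python's stdlib ipaddress module. The network names and prefixes below are illustrative assumptions, not the actual StarlingX defaults; the point is the count per system, a minimum of seven dedicated subnets today versus the Kubernetes minimum of host, pod, and service addressing in a shared model:

```python
import ipaddress

# Today's model: a dedicated subnet per platform network function.
# (Function names and prefixes are assumed for illustration.)
dedicated = {
    "oam":             ipaddress.ip_network("10.10.10.0/24"),
    "management":      ipaddress.ip_network("192.168.204.0/24"),
    "cluster-host":    ipaddress.ip_network("192.168.206.0/24"),
    "pxeboot":         ipaddress.ip_network("169.254.202.0/24"),
    "cluster-pod":     ipaddress.ip_network("172.16.0.0/16"),
    "cluster-service": ipaddress.ip_network("10.96.0.0/12"),
    "storage":         ipaddress.ip_network("192.168.208.0/24"),
}

# Shared model: one host-level subnet serves the platform functions,
# plus the pod and service pools that Kubernetes requires anyway.
shared = {
    "host":            ipaddress.ip_network("192.168.204.0/24"),
    "cluster-pod":     ipaddress.ip_network("172.16.0.0/16"),
    "cluster-service": ipaddress.ip_network("10.96.0.0/12"),
}

print(f"subnets per system: {len(dedicated)} -> {len(shared)}")
```

Multiplied across tens of thousands of subclouds, that difference is what turns network planning from a large allocation exercise into something much closer to a standard Kubernetes deployment.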
We recognize that not every deployment is the same. Some operators will demand dedicated network interfaces for certain functions, with physical network partitioning or even logical partitioning with VLANs. So even though we want to simplify, we still need to maintain some flexibility. And finally, it doesn't stop at day one. After deployment you may recognize that you want to renumber your networks, and you can't just say you'll redeploy when you have tens of thousands of systems. So day-two operations and network reconfiguration are very important.

We have several initiatives to solve these issues. I'm going to dive into one specifically, around the address reduction, or subnet reduction I should say, because I think that's the biggest gain for our operators, but we are looking to make improvements across the board on the key optimizations I identified. We will be looking to eliminate the multicast traffic by using configuration for endpoints rather than discovery through multicast. We will be offering network reconfiguration for all interfaces and all networks; today we have restrictions on certain internal networks, so we're removing those barriers so that we can renumber or reconfigure any of our network interfaces and the overall network addressing.

But let's take a look at the network model we have today and how we're looking to change it for our deployment. Within StarlingX, as you saw with all those different network segments, that really maps to multiple network functions.
Platform networks, which is what we refer to them as, are really just a classification of a network's role within the system, whether that's OAM, management, or PXE booting. We have these all separated out from an internal function perspective, and today that translates to separate networks: we have address pools and network segments defined for each of these functions. That's where the minimum deployment footprint comes from, meaning you need a minimum of seven subnets just to deploy your system, which, depending on the complexity of your network and how you manage your network infrastructure, can be complex. Even though some of these are internal subnets, which means they could be reused by individual systems, some of them are not; some need external networking, which means they have to be globally unique.

So the proposal is to go to, or at least have the flexibility to go to, a completely shared model. We still maintain the individual network functions, but we have flexibility on whether we're using a shared interface or shared address pools. Whether we're using the same address or pulling from the same address pools to at least reduce the subnets, we have the ability to reduce the number of addresses and subnets that the operator needs to manage. This also translates to the layer 2 partitioning: as we look at our transition from layer 2 to layer-3-only networking, we want to reduce the number of VLANs required, and having separate network segments for everything is problematic. A shared model makes that move much easier. Rather than having multiple dedicated interfaces, you can have a shared model where the addresses themselves are shared. This allows us to get down to the minimum requirements of a Kubernetes deployment, which is your host-level addressing plus your specific address pools for your pods and your services.

We recognize that not all deployments will be able to have that level of sharing in their network configuration, because of security requirements or just the partitioning they want. That flexibility means we have different options: we can do public/private, we can do the full shared model, or we can go all the way back to the original picture I showed that has everything partitioned.

So let's do a quick wrap-up. What we looked at today was a new portable and scalable distributed cloud deployment architecture for both public and private cloud. We looked at the new geographic redundancy and survivability of our distributed cloud operations, again for both private and public cloud. And we looked at some network optimizations we're making to help our operators reduce the overall complexity of their network planning.

There are many new industries that we're getting into with StarlingX, so it's critical that we do this now to support all of them. There are new requirements coming in, and new challenges, so we welcome contributors to StarlingX. If you're interested in defining what this looks like for StarlingX deployments, or you have other use cases for StarlingX, please come to starlingx.io or join our starlingx-discuss list, where we talk about these topics, plan out these designs, and support the overall evolution of StarlingX. Thank you very much. Are there any questions? We've got a few minutes. Right over here.

[Audience question, partially inaudible]

Yeah, that picture was really representing the bridge between the public cloud network infrastructure that's offered and the private infrastructure that's hosted by the network operators.
Most of the public clouds offer some form of VPN service to connect your private cloud. It varies by public cloud offering, but the intent is that it's mostly transparent to StarlingX. We're operating across that bridge, or that separation, so we have to support whatever comes with it in terms of network latency introduced by the distribution, because we're now talking about separation. StarlingX already has, or can support, isolation between the distributed cloud central services and the subclouds, so we're very adaptable to network latency or network disruption. It should be fairly transparent to us; it's really just the different offerings that are available to connect your private cloud to the public cloud.

Okay, there's one question in the back first.

[Audience question, inaudible]

The way it operates today is completely autonomous, so in theory they could run indefinitely; you just may not have visibility or the ability to perform any of the lifecycle management. Locally, the Kubernetes cluster is fully available.
So if there are lifecycle management requirements for individual services or pods, they can restart or recover. The only time you get into recovery scenarios for longer outages is with things like certificate management: a certificate that's about to expire, where you need to maintain that secure connection. How do you renew that certificate when connectivity is restored, if it expired while the site was disconnected? Today a lot of our certificates, depending on which one we're talking about and which communication it's used for, can have a renewal window of upwards of 30 minutes, but it really depends on the overall configuration. I know we're almost out of time, but I can take other questions offline.

[Audience question, inaudible, about hosting across different public clouds]

I haven't really seen that particular use case. Usually it's one cloud provider, using their different geographical regions or availability zones to deploy. I haven't seen a use case with multiple public clouds working together.

[Audience question, inaudible]

No, they can be transparent. The centralized system knows about the subcloud addressing and the connectivity, and it has the ability to connect, securely, to those systems. What happens when they move is what we refer to as a rehoming activity. We reconfigure the subcloud to have a different management function: the controller that's taking over says, I'm your new owner, I'm going to be providing your management function, and it synchronizes whatever it needs to provide that function. So the subcloud is effectively reconfigured on the fly to have a different manager.

[Audience question, inaudible, about completely isolated subclouds]

They will continue to run; you just can't do other operations. If you had to connect to one to do software upgrades or other management or maintenance functions, you wouldn't be able to connect, but they themselves continue to operate.

All right. Well, thank you very much. Thanks, everyone.