I think we're going to get started. My name is Shridhar Basan. We're going to be talking about building OpenStack near the users. I'm a lead engineer with Comcast. This is Anthony Vega, our network SME at Comcast. And Sean Collins, our lead Neutron developer at Comcast. So thanks.

A little bit of background on what Comcast is. Comcast, for people who aren't from North America and don't know it, is a very large ISP and service provider in North America. We do internet service, video, telephony, and over-the-top services for millions of customers in North America. We have the world's largest IPv6 deployment. And we have multiple services distributed within the United States, close to the customers. So this talk is about deploying OpenStack close to our customers, some of the challenges, and what things worked for us.

The traditional model of configuring and installing OpenStack is to do it in large data centers. We do have a few of those. National data centers have a lot of redundancy built into them: dual power sources, dual cooling, and also network redundancy. We took a different approach to this model. We do have OpenStack regions which run in our national data centers, and certain application profiles fit into those. But what we're here to talk about is deploying OpenStack close to our end users, within what we call our regional data centers. We have a lot of these smaller data centers spread all over the United States. Being a video provider, we already have points of presence all over the United States where we serve customers, so it made sense for us to deploy OpenStack close to our customers for certain use cases, which we'll get to later in the talk. Actually, can you go back, I think?

So each regional data center has its own AS and networking. We have a traditional leaf-spine design: multiple racks of servers, each with its own top-of-rack switch, feeding into a single spine. We decided not to have a lot of redundancy in our regional data centers, because we expect the application to be aware of failures and the redundancy to move into the application layer.

So I'm gonna talk a little bit about why we came up with the design that we have and what it's done for us. The first question is: why would you decentralize your entire environment and split out into a multitude of different locations? Besides what Sri said earlier about us already existing in points of presence all over the country, we already had these regional networks with their own independent network connections to the end users, because that's the point of aggregation where all of our markets come into our core network. So they have their own connections; they don't have to cross a backbone link to talk to a customer. They're close to the customer. Being the point of aggregation for a customer market means there are a lot fewer hops: it's logically only three or four hops from a customer to one of our data center points. That's a lot less latency, because you've got a much shorter distance to cover and fewer links to hop through. Apps like video and voice want predictable latency. They want low latency and a stable connection. When you take a path from a customer endpoint back to a national data center, things get more unpredictable the further you have to go. So this actually enabled us to reduce jitter, reduce latency, and keep a more stable connection and better quality.
So the other thing about this is, when you take a look at the more Amazon- or Microsoft Azure-centric models, you've got these large availability zones, but there are only two or three of them. In North America they tend to take an east coast/west coast approach to availability zones: one or two in the east, one or two in the west. You can only pick one or two of them, and you have no control over where they are or how you're distributed. And when they go down, they go down. I don't know how many of you in the room remember when the east coast Amazon zone went down and took major applications out with it. When we go to this model of highly distributed regional data centers, we take a lot of the danger out of a single data center failing. Because we've got these markets all over the place, and these designs usually include a pair of links to the backbone, what we did was attach the data center to each one of the routers that has those backbone links. So if we lose a data center, even in a specific market, there's a backup available in the same market; and if that goes down, there are backups in nearby markets, or all the way back to the national data center. So we have a much smaller failure footprint when a data center goes down.

So what did we do differently? You're gonna notice that we're probably one of the only people who's got a really large Neutron production environment. A lot of people talk about why Neutron's bad, why they don't like it, why they don't want to deploy it. We actually really like it, and the way that we deployed it is a somewhat better model for us. We're using provider networks, and we're not using the L3 agent at all. Which means we don't have the issues with HA, and we don't have the performance bottleneck of pushing everything through a single host. So we can actually take all of our VMs and attach them to a trunk network straight up to our spine switches for L3 termination. We're also pioneers in this: Sean and I are actually active on the Neutron IPv6 subteam. There was no IPv6 support in Neutron up until Juno, and we needed it, so all the way back in the Grizzly days we started working on it. We actually had it deployed in our Havana builds, and the code has now been upstreamed.

So what are we running? Our regional data centers run, within the data center, all of the Nova components, the Neutron controller, and the agents for each compute node. We also run Glance locally for each regional data center. And then we also have limited block and object storage: within each regional data center there is a Ceph cluster that provides both. There is no replication across regional data centers, and no global namespace where a volume or an object you create is replicated across markets; the failure domain is just that regional data center. So it is left up to the application to replicate objects across regions, depending on its storage requirements.

The thing we gain from having this design is primarily in the Keystone design. We have a central Keystone that handles identity and the tokens, plus the catalog, which maintains the list of endpoints for each region.
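As a rough illustration of that central-catalog arrangement: a tenant authenticates once against the shared Keystone and gets back a catalog listing every live RDC as a region. This is a minimal sketch using keystoneauth; the auth URL, credentials, and domain names are hypothetical placeholders, not Comcast's actual values.

```python
from keystoneauth1.identity import v3
from keystoneauth1 import session

# Authenticate once against the central Keystone (hypothetical endpoint
# and credentials).
auth = v3.Password(
    auth_url="https://keystone.example.net:5000/v3",
    username="tenant-user",
    password="secret",
    project_name="demo",
    user_domain_name="Default",
    project_domain_name="Default",
)
sess = session.Session(auth=auth)

# The token that comes back carries the service catalog: every live RDC
# appears as a region with its own compute endpoint.
access = auth.get_access(sess)
endpoints = access.service_catalog.get_endpoints(
    service_type="compute", interface="public"
)
for ep in endpoints.get("compute", []):
    print(ep.get("region") or ep.get("region_id"), ep["url"])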
That central Keystone allows application developers to quickly recycle the provisioning code they have, be it through Heat or through Nova. They can launch instances in one RDC and then, with a simple change of the region they use to make calls to the OpenStack API, quickly take their whole stack, as it were, all their databases, the application servers, the web servers, all of that tier, and stamp it across all of the markets within our deployment. So it creates a high amount of reusability for the actual provisioning code: with small, minor changes their applications become cross-country and region-aware, and then they can start building fault tolerance into the application based on their needs. That's rather than the more expensive national data center model, where you have redundant links and redundant power and all of those decisions have been made for you in advance. Some applications need different types of availability and can tolerate failures in a way that is more economical, and this gives the person developing the application the ability to decide which portions of their application need to be replicated across regions and which portions don't necessarily need that uptime or are already resilient to failures.

So what did we lose? We obviously lost the capability to tolerate network failures within an RDC, because the way we're doing it is a single switch with no redundant power. We also have a learning curve, because a lot of the application developers are used to the traditional model where, provided they install their application on a server, it pings, and they can reach it, everything else that went into that, which was typically the domain of the engineering and operations teams, was handled for them quote-unquote for free, meaning they didn't have to worry about it. Obviously there is a lot more training we had to do with application developers, explaining the advantages of this type of arrangement and also which parts of their application need to become fault tolerant.

And there is also the complexity of the OpenStack environments. For example, since we have a centralized Keystone that serves out all the authentication and authorization, we have to work with the community to help scale token storage. So we're really excited about some of the ephemeral PKI stuff that is coming down the pipe, and also about scaling the centralized Keystone so that that piece of the OpenStack service is resilient and can tolerate failure, so that even if one of the nodes serving the centralized Keystone went down, there are still other nodes available to honor requests; if you lose a Keystone node while the regional data centers are still operating, you don't actually lose access.

One of the other challenges we have is that since these regional data centers, and the OpenStack installs inside of them, are separate, the tenants also have to replicate some of the items within OpenStack. For example, security groups: if you create one in one regional data center, you have to have some sort of logic in the application to replicate it across the others. Things like your Glance images and other pieces are the same; they're not shared at the same level as the Keystone. I believe SSH keys are, but that's the exception, not necessarily the norm.
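To make that replication burden concrete, here is a minimal sketch, using the openstacksdk, of copying a named security group from one region to the others by hand. The `comcast` clouds.yaml profile, the region names, and the `web-tier` group name are hypothetical; it copies ingress rules only, since newly created groups already carry the default egress rules.

```python
import openstack

SOURCE = "rdc-east-1"                         # hypothetical region names
TARGETS = ["rdc-central-1", "rdc-west-1"]

src = openstack.connect(cloud="comcast", region_name=SOURCE)
group = src.network.find_security_group("web-tier", ignore_missing=False)

# Keep only the ingress rules; skip rules that reference another group,
# since group IDs differ per region.
rules = [r for r in src.network.security_group_rules(security_group_id=group.id)
         if r.direction == "ingress" and not r.remote_group_id]

for region in TARGETS:
    dst = openstack.connect(cloud="comcast", region_name=region)
    copy = dst.network.create_security_group(
        name=group.name, description=group.description)
    for rule in rules:
        dst.network.create_security_group_rule(
            security_group_id=copy.id,
            direction=rule.direction,
            ether_type=rule.ether_type,
            protocol=rule.protocol,
            port_range_min=rule.port_range_min,
            port_range_max=rule.port_range_max,
            remote_ip_prefix=rule.remote_ip_prefix,
        )
```

The same per-region loop would apply to any other unshared item the talk mentions, such as uploading the same Glance image into every RDC.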
So there is some overhead to that, and we would definitely be excited to see projects in the community where certain things, quotas for example, get pushed up into something like Congress or another type of project, so that those things are stored across regions, because replicating them by hand is difficult. So, is that it?

I went a little bit into this before, but the main challenge has definitely been training customers, and also setting the service level they should expect in the regional data centers: telling them there is a very good possibility that if there's a power interruption or loss of network connectivity, their instances will lose connectivity or will be powered off. So they need to have systems monitoring all of these things, and perhaps, when one regional data center in a market fails, logic that will start provisioning instances in the other regional data center to catch the load as it fails over.

Another challenge we have is using Tempest. We want to use Tempest to validate production installs before we turn them live, and it would also be really nice to use Tempest to continually check that services are still available and still consumable by tenants. There are patches that we have in the works, and patches from other members of the community, that we're very interested in seeing get merged. And then, like I mentioned before, there is risk in having Horizon and Keystone as the single point all of the access goes through. So we have to be proactive, make those services more resilient, and allow them to scale out as demand increases, since they will be the point through which all the other RDCs do their authorization and logins and things like that.

Does anyone have any questions? There's a microphone at the front.

What I didn't quite catch, and maybe you mentioned it: how are you doing the centralized Keystone? Is it through Keystone federation, or do you have the whole cells architecture?

At this point, we just use Keystone. The only things the central Keystone has are the authorization database that it stores, while the actual authentication is sent out to Active Directory through a driver that we maintain; the service catalog, which is just the list of endpoints available for every RDC that is currently live; and the tokens that are generated when a user successfully authenticates. That's what's in the central Keystone.

And the other RDCs are independent clouds, right?

Correct. All the other components are just independent clouds, and their endpoint is listed in the centralized Keystone as an available region.

Okay, so they get their service catalog from the central Keystone and everything else is local. Okay, I get it. Yeah.

One last question, on the Neutron front. Using the provider networks model, what's the compromise there? What functionality do you lose, if any?

I'll take that. So you lose a couple of built-in features, right? Things like firewall-as-a-service and load-balancer-as-a-service aren't actually gonna work, because you don't have the L3 agent there to pipe or intercept things. We also don't actually have floating IPs anymore.
This is kind of a big challenge for people coming from the more traditional Nova Network model, where they've got private IPs on their hosts for the fixed address and they have to work with floats. Some people get used to the idea of a float being a sort of pseudo-permanent address, a static IP if you will, for an application. That's one of the retraining issues we have: a lot of people in traditional networking spaces assume that their IP is always static and that they can always keep that IP address. We're getting around that with port reservation, pre-building ports in Neutron and detaching them before Nova terminates an instance (sketched below), but the whole concept of having floating IPs is just no longer there. We're assigning public IPs as fixed, except for tenant networks, which are non-routed internal networks. We explicitly don't route tenant networks on purpose, because our multi-tenancy is a little different than a public cloud's concept of a routed network that only one customer can get to: our customers are all internal projects, and they normally all share the same space in our data centers. So the only thing we're adding here is an abstraction layer between the front-end endpoints and all their hosts, like putting in an HAProxy instance and then attaching all of their web servers to a private tenant network to do the hosting. Yeah, and using some ugly DNS hacks like round-robin records, or trying to automate some sort of GSLB in some instances.

You gain a couple of things out of not using the L3 agent, though, like I said, besides avoiding the performance and HA problems of pre-DVR Neutron. You also get the fact that this is a direct L2 trunk to your upstream switch. So if my network engineering guys wanna come in and say, well, I need to change some base-level ACLs so that I can do security blacklisting, or if I wanna do an anycast address, I can actually do that without having to do anything special, because I control the routing upstream directly to the L2 termination point.

Hey, how do you do orchestration in terms of all the data centers? Are you just running Heat in each one and deploying on each one individually, or do you have some sort of global orchestration thing that deploys an application across all data centers at once?

Are you referring to us as the cloud operator, or do you mean how do our tenants orchestrate?

How do your tenants orchestrate?

So our tenants use multiple tools. Some of them use jclouds or Fog to orchestrate across multiple regions. Other tenants, people who are not very savvy with the CLI, use Horizon, or they use the Nova tools, the OpenStack CLI tools, to spin up resources. All they really have to do is change the region name in their environment, and then they can hit any one of our regional data centers.

So generally, in terms of your application, you deploy the same kind of cookie cutter in each data center? It's not like, I'm gonna deploy this part of the service here and another part somewhere else?

That's right. We want application owners to deploy their apps across all of our regional data centers, because typically they're serving the local customers within that region.

Okay, thank you.

So I'd actually like to expand on that question when it comes to deploying cookie-cutter apps.
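Here is the port-reservation pattern mentioned a few answers back, as a minimal openstacksdk sketch: pre-build a Neutron port so the fixed IP belongs to the port rather than to the instance, boot against it, and detach it before the server is deleted so the address survives for the replacement. The cloud profile, region, network name, and the image/flavor placeholders are all hypothetical.

```python
import openstack

conn = openstack.connect(cloud="comcast", region_name="rdc-east-1")

# Reserve the address by creating the port ourselves, up front.
net = conn.network.find_network("provider-net", ignore_missing=False)
port = conn.network.create_port(network_id=net.id, name="app01-addr")
print("Reserved fixed IPs:", port.fixed_ips)

# Boot attached to the pre-built port instead of letting Nova create one.
server = conn.compute.create_server(
    name="app01",
    image_id="<image-uuid>",    # hypothetical placeholders
    flavor_id="<flavor-uuid>",
    networks=[{"port": port.id}],
)
server = conn.compute.wait_for_server(server)

# Detach before terminating; since the port was created independently,
# Neutron keeps it (and its address) for the replacement instance.
conn.compute.delete_server_interface(port.id, server)
conn.compute.delete_server(server)
```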
Since we've got this split out geographically all over the country, as part of keeping things off the backbone and keeping things low latency and close to the customer, what we generally see is that if they're gonna deploy an app in one of the regional data centers, it's usually serving the customer edge, or it's something local to network management tools in the area. So you'll wanna have a copy of it all over the place, because you have to serve the same function everywhere. So instead of having one piece in the Northwest and one piece in Central, they're gonna have the same pieces all over the place. That's the normal model for us, with some exceptions. In most cases, their data is sharded by geographic region.

Is there any specific reason why you didn't try to distribute Keystone? Horizon I can understand, but at least the central point of authentication...

So the federation model, the Keystone-to-Keystone federation, just landed, in the J release I think, right? We wanted to have a single ID for our tenants so that they didn't have to figure out, now I've got to auth against this region with new credentials, and that token is only valid for that region. We wanted to make it simple, so that our tenants can validate against a single endpoint and then go to any one of our regions.

Still, I do understand, for example, the problem of policies, because those would be a problem to synchronize. But anyhow, you mentioned that the actual authentication runs through Active Directory, right? So making Keystone distributed, without any need to do federation, would even be possible, and would guarantee you a certain level of availability of the service. Because if, I don't know, you cut off the optical fiber between the central node and one of the remote nodes, no service is accessible, no?

Yeah, I'll take this one. The centralized Keystone for us isn't actually only in one data center; it's sort of a mesh. We've got Keystone running in all the NDCs that are running this environment. The actual endpoint that gets authenticated against is round-robin behind HAProxy. So the HA is there in the fact that it's actually running all over the place. And the reason we went with this: we realize it's a little cumbersome to have to keep changing region names all over the place, but it's better than having to put in an FQDN for each of your different locations. Rather than remembering an FQDN, all you need is a region name. And this is actually a precursor, because we've been looking ahead, and we know there are more things coming down the pipe that are gonna start integrating this, like a centralized Glance with full replication across regions. There's a lot of stuff in the works, and a lot of stuff we wanna improve on and contribute code to the community for. So at some point all of this becomes something that I can run centrally and distribute across all of my environments, so that my users have the ease of saying: all right, this is an RDC application, so I'm gonna run an RDC script to deploy this to every OpenStack region whose region name starts with X, where X is an RDC delimiter. They can deploy through all the nodes with the same commands and the same objects all over the place.
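A sketch of that prefix-driven pattern, assuming a hypothetical `rdc-` region-name delimiter and the same made-up `comcast` clouds.yaml profile as above: read the region list from the central Keystone catalog, filter on the prefix, and run the same cookie-cutter provisioning against each region.

```python
import openstack

# One identity for all regions: the central Keystone lists every RDC.
# (Listing regions may require an appropriate role in some deployments.)
central = openstack.connect(cloud="comcast")
rdc_regions = [r.id for r in central.identity.regions()
               if r.id.lower().startswith("rdc-")]

for region in rdc_regions:
    # Same credentials, same code; only region_name changes.
    conn = openstack.connect(cloud="comcast", region_name=region)
    image = conn.compute.find_image("app-base", ignore_missing=False)
    flavor = conn.compute.find_flavor("m1.medium", ignore_missing=False)
    net = conn.network.find_network("provider-net", ignore_missing=False)
    conn.compute.create_server(
        name=f"app-{region}",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": net.id}],
    )
```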
So basically, you mentioned that even though it's seen as centralized, it's actually distributed. Then what do you use to synchronize the database? Do you use the normal MySQL Galera? Do you use Postgres with the...

We use Galera to sync all of our Keystone databases, and we use Galera within the RDC itself as well. Even though we mentioned single points of failure, the API nodes themselves do have HA available; we don't want a single node to go down and take out the entire region. Network equipment is less likely to go down than a host; the hosts are much more likely to fail. Yeah, there've actually been some studies in the past that put the mean time to failure for a top-of-rack switch somewhere around ten years, while the mean time to failure for a hard drive is gonna be less than three in most cases. So we tend to focus on HA on the compute side and the controller side, as opposed to the network. It's a lot cheaper, too.

Any other questions? Thank you.