Good morning, how's everybody doing? Welcome to a deep dive into Astara. My name is Mark McClain. I'm a co-founder and CTO of Akanda, which provides services and support around the OpenStack Astara project. Previously I served on the Technical Committee for a couple of years, and I'm a former Neutron PTL. A lot of people wonder: what is Astara? What challenges was it built for? Why did we go off and write Astara instead of doing this inside Neutron? I want to touch on those a little bit today. Astara was born when I was working at DreamHost with some others, out of the public cloud use case: how can we solve the operational needs? When you're running a cloud, you're going to have multiple services, and each SDN and layer two solution is going to be a little bit different. They're difficult to change. If you look at Neutron with ML2, you can swap out some bits, but if you have a monolithic driver it gets more interesting when you're changing your provider over time. One of the things we knew when designing that public cloud was that services and vendors were going to evolve over time; it's just a fact of life when running operations, and day two really matters. Those were the operational challenges we were trying to solve. You also have to balance that with: could we attack some problems in Neutron itself? In some places, yes, we did. In other places, it wasn't a Neutron problem; it was an operational problem, an implementation problem, not something that needed to be fixed in Neutron, because nothing was actually broken there. Keep that in mind: where we did things differently, it's because we had a different opinion, and that's okay, because when you look at how Neutron is constructed, you can have a wide variety of solutions that address a lot of different use cases, and one doesn't invalidate another. So one takeaway is that it's not that what Neutron did with its implementations is wrong; it's just that for certain use cases it didn't match what we wanted. That's the good part about experimentation, and about an open ecosystem: we can borrow and share and use different ideas. Logical Neutron is clean. You can make nice, pretty diagrams. I can log into Horizon and the diagram bounces around and gives me all kinds of cool stuff. But underneath the hood it's a little messy. What's actually going on? How are we handling layer two? How are we handling layer three? Should the systems we write handle layer two or layer three? With Astara we made a conscious decision to be layer two agnostic. We think there are lots of solutions out there that do layer two well, whether it's OVS or Linux bridge. We also prefer, when you start talking about day two operations and the technical staff running things, to use standard tooling. Some of the SDN solutions that converge layer two and layer three are very interesting and provide a lot of benefit.
But we decided to take a slightly different approach and really respect the layers within the network stack, so that each layer does what it's specialized for. With that, the Astara project was born. Astara, if you're wondering, is basically a callback to the original prototype's nickname. When we started, this project was nicknamed "the rug"; if you've ever seen the movie The Big Lebowski, there's a line about the rug really tying the room together, and Astara loosely translates to carpet. That's the lineage of the project name. Like I said, this came from the public cloud use case. We wanted it to be hyper scalable; we knew we were going to have a large number of endpoints and that it was going to be supersized. It needed to be highly available. We also knew we wanted to deliver services over the top, specifically layer three and above, and that over time we wanted provisions to grow that set of services so it was easy to add and integrate them. And because it was born out of the public cloud use case, we wanted to maintain the open source APIs: standard Neutron, standard Nova, standard Glance. If you write extensions, it makes it really hard on the tooling; your customers can't interact with the cloud because they have to write a bunch of special goo. We wanted to avoid all that. So, like I said, the rug was where we began. If you hear me say "orchestrator" or "rug", it's the same entity; I use them interchangeably at times. Reference Neutron: there are about a billion variants of this slide out there, so just to differentiate it a little. Reference Neutron is a fleet of microservice agents: you have the layer three agent, you have the DHCP agent, and you have variants of those agents that talk with each other. If you're running DVR, it's even more agents. Conceptually, that's where it started, and that model works. What we wanted to do is explore an alternative: what happens if we change this and provide some services, either in VMs or containers or on hardware, without all the agents, and think about something a little smarter that could combine some services? I authored some of those agents in Neutron, so it was interesting to look at this use case. What we did is simplify it a little: we have the Astara service. Astara itself does not sit within the data path (we'll hit on that a little later); it's purely control plane. It talks directly with the Neutron server, and I could draw a little dotted line down to the message queue as well. But you'll notice we leave the layer two agents there. Astara is layer two agnostic: OVS, Linux bridge, and we've run on top of other proprietary solutions and physical networks; it supports those. Again, with the Neutron reference architecture, you've got your hypervisors, you might have a full mesh, and you have network nodes providing network services. Whether you have one, as in the case I drew here, or tens or fifty of them, you can still encounter problems in terms of congestion and points of failure. You can do high availability with DHCP and metadata; you can do those services.
But we wanted to take an alternative approach: what happens if you take those network functions and actually scatter them throughout the deployment? You can run these functions within VMs, or as lightweight services, and if a particular function dies, for lack of a better term, your blast radius in terms of tenants impacted is very tiny. That's the alternate model we were exploring: what does this look like, how is it scheduled? A lot of times in the default case, if you download the code from GitHub, it's going to do one instance per service, except that DHCP, metadata, and routing are typically combined and co-located on the same box, as they are within the open source components. So with that, let's dive in a little under the hood. Like I said, it's control plane orchestration, logically centralized, with pluggable drivers. Again, this calls back to one of our motivating challenges: we knew we wanted to change technologies over time. Historically we've supported appliances based on BSD and Linux, and some proprietary appliances from big network vendors have been integrated as well. It's multi-process and multi-threaded; I'll touch on that in a second, and it actually helps in a number of different ways in terms of scaling. We started development in Folsom and it went into production in 2013, so while the project joined the big tent last fall, this is something that's been churning and in constant development for some time. Astara supports dynamic routing, both OSPF and BGP, and it's pluggable: the current appliance we produce for testing is based on BIRD, and you can easily integrate Quagga if you like that variant. It was designed for IPv6 from the ground up; we started playing with v6 even before Neutron had full support for it. It was a nice learning lab and a good way to try out some ideas and see what worked before some of the interfaces in Neutron were baked. And like I said, it's layer two agnostic, so you can run OVS, Linux bridge, or something else. The Astara architecture in terms of control plane: on the left-hand side you have Nova and Neutron, plus the orchestrator. On the right-hand side is what sits in the data path. At the bottom you have your physical network, and then you have OVS, Linux bridge, or a proprietary solution. Some of the things we've done and talked about at previous summits: we've integrated with switching that supports hierarchical port binding, so you can do a lot of interesting things with that. That's where you get that over-the-top, L2-agnostic shim layer which lets us orchestrate in an environment where, say, you're running VXLAN within the rack and then a hardware VTEP with the overlay across the top-of-rack switches, all with standard OpenStack APIs. And then you've got routing, load balancing, firewall, and VPN. Today it supports routing and load balancing; we're still waiting on the firewall API to get mature enough to build an implementation. And the pluggable architecture actually enables some cool things.
Because if you're running with service VMs, you can upload an image to Glance. One of the things we recently put into Mitaka was the ability for a tenant to upload an image, so you can have alternate images for a particular service, say routing, because once you plug it into the network the plumbing is generally the same; you just have different variants of the image. That's nice for a public cloud provider: while you may have lots of tenants, they're not all the same or equal, and you might need to provide special images for certain tenants, so that's an advantage. It's also driver-based, so for load balancing there's HAProxy, nginx, and nginx Plus; VPN was rolled in; and I touched on routing a bit earlier. Internals: we're all Python-based, which is generally what you find within the OpenStack ecosystem. What's a little different is that the Astara orchestrator itself is built on Python multiprocessing. An orchestrator instance has a master process and several worker processes that communicate via the multiprocessing internals, and within each worker there are several threads. The nice thing is this gives you two knobs to turn: you can scale up the number of processes, and you can scale up the number of threads available. The benefit is that each thread is typically dispatched to manage a particular VNF, so if you have a VNF that's hung or slow, you're only impacting one thread. Instead of the problems you can run into with eventlet, where something might hang, we're using traditional Python threads. Interestingly, they just work; they're fairly fair, though not always perfectly, because Python threads yield after a consistent byte code count. What we found in production is that it made development a lot easier. If you look at the right-hand side of the screen, at the top we have a master process that spins up a number of helpers. The big functional blocks: there's something responsible for notification processing, listening to the telemetry coming out of Neutron. Neutron, as the API service runs, generates lots of events (create, update, start, and end events), just a fire hose of events. We have a processor sitting there listening for the type of event and the ID, but not necessarily all the contents, because when you're listening to that fire hose, you don't need every detail; what we're looking at is what's interesting about that event stream. It processes events as rapidly as possible, generates an internal event, and passes it down to a scheduler. When we say scheduler, realistically it's sharding across the number of available workers to dispatch; the key is basically the tenant ID. It spreads that out to a worker, the worker keeps a separate queue of, say, "I need to go work on this tenant's routers," and then it further dispatches a particular router down to a thread. What that does is, if you have a particularly noisy tenant, it keeps that work on one path and you're not creating lots of congestion for everybody else.
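To make that dispatch idea concrete, here's a minimal sketch of sharding notifications onto per-worker queues by tenant ID. This is illustrative only, not Astara's actual code; all names here are made up, and the real orchestrator layers a thread pool inside each worker as the second knob.

```python
import hashlib
import multiprocessing

NUM_WORKERS = 4  # first knob: number of worker processes
# (inside each worker you would also size a thread pool: the second knob)

def worker_loop(queue):
    """Drain events for the tenants assigned to this worker."""
    while True:
        event = queue.get()
        if event is None:          # simple shutdown sentinel
            break
        # ...look up the tenant's routers and hand one off to a thread...
        print("worker handling", event)

def pick_worker(tenant_id, num_workers=NUM_WORKERS):
    """Deterministically shard a tenant onto one worker."""
    digest = hashlib.md5(tenant_id.encode()).hexdigest()
    return int(digest, 16) % num_workers

if __name__ == "__main__":
    queues = [multiprocessing.Queue() for _ in range(NUM_WORKERS)]
    workers = [multiprocessing.Process(target=worker_loop, args=(q,))
               for q in queues]
    for w in workers:
        w.start()

    # The notification processor only keeps the event type, resource id,
    # and tenant id from the fire hose, then routes by tenant.
    for event in [{"tenant_id": "abc123", "type": "router.update", "id": "r1"},
                  {"tenant_id": "def456", "type": "router.create", "id": "r2"}]:
        queues[pick_worker(event["tenant_id"])].put(event)

    for q in queues:
        q.put(None)
    for w in workers:
        w.join()
```

Because the shard key is the tenant ID, a noisy tenant only backs up its own worker queue rather than the whole orchestrator.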
Within a worker we keep a state machine, and some of the events are things like: is this thing even up? As the state machine turns through, the first thing we do is figure out what we actually need to do. If you look at the right-hand side, it's a representation of this: when you're using automated tooling with Neutron, you tend to generate a lot of events at once, and it's not necessary to process them all individually. You can look at the set of events, coalesce them down, and take an intelligent action. So while you may have eight events, the first one may say create and the next seven may be updates; when the worker gets to it, its first action is to scan the backlog of events unique to the network functions it's managing, coalesce them down, and say, okay, I've got one create and seven updates, so I can do a create with a full config refresh, and you're not wasting extra cycles. The other interesting thing is, say somebody spun something up, used it, and then, at the back end of the queue, decided it's time to tear it down. What we've also found with automation tooling is that sometimes you get a lot of update events followed by a delete, so we're able to peek all the way ahead and say, oh, I have a delete event; there's no reason to sit there and run all these config changes for updates that aren't actually going to matter. Within a worker we also have an instance manager. It's pluggable so you can have varying types of instances; traditionally with Astara it's been service VMs, but you could switch it out. The instance manager is the interface that talks with Nova; it's where we abstract out: do I talk to Nova, how do I boot this thing, how do I manage it? That's roughly what happens within each Astara orchestrator. But you don't really want just one node running, because we all know hardware will fail, or somebody will turn off the wrong circuit; work in a data center long enough and something screwball is going to happen. So how do we scale this thing active-active? How do we have multiple orchestrators work together? Like I talked about with scaling up, we can add more threads and that gives us stability. What we've found through testing is that a single orchestrator can easily handle thousands of VNFs; it doesn't really sweat, and it's pretty easy to tune. For a while this was also run active-passive; it just scaled up and it worked. But for HA we can expand the set of Astara orchestrators. The nice thing about being part of a big community like this is that we can leverage the tooz library; it gives us group membership, it lets us do clustering, and it can be backed by ZooKeeper or memcached. As you increase the number of Astara orchestrators, we use hash rings to deterministically shard the partition, basically sharding which VNFs each orchestrator is handling. So you get active-active: I can add a third orchestrator, I can also contract the set, and it all just works. Like I said, this is implemented using a hash ring.
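Here's a small illustrative sketch of the coalescing idea described above: scan the backlog for one network function, short-circuit on a delete, and otherwise collapse a create plus updates into a single action. This is not the project's actual state machine code, just the shape of the optimization.

```python
def coalesce(backlog):
    """Collapse a backlog of events for one network function into one action.

    `backlog` is an ordered list of event type strings, e.g.
    ["create", "update", "update", ..., "delete"].
    """
    if "delete" in backlog:
        # Peek ahead: nothing queued before the delete matters any more.
        return "delete"
    if "create" in backlog:
        # A create followed by updates becomes one create with a full
        # config refresh.
        return "create"
    if "update" in backlog:
        # Any number of updates collapses into one config refresh.
        return "update"
    return None  # nothing actionable

# Example: one create followed by seven updates -> a single create/refresh.
print(coalesce(["create"] + ["update"] * 7))     # -> "create"
print(coalesce(["update", "update", "delete"]))  # -> "delete"
```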
Again, one nice thing about this community is that people have already tackled things like this, so we were able to take the hash ring that's been working in Ironic, take a lot of that code, and make it work for our purpose. The other thing we did to keep things simple is that the work sharing makes no assumptions about the current state of a function. When you get an event, say a cluster contraction event, you don't actually know whether the orchestrator that was previously managing a network function did everything it was supposed to do, or whether it just died or hung. So as the membership set changes, you have to go back through and ensure the configuration is up to date and there hasn't been any drift. It actually simplifies expansion and contraction a little, because you're not worrying about a late notification: in a multi-process system it's possible that host A is now managing a particular VNF but the notification actually arrived on B. In this case we don't really care. The notification arrives on B, it just gets dropped, and it's not an issue in terms of processing, because we basically assume a cold start and make sure we have a fresh, fully populated config. Within Mitaka, I talked about bring your own network function: you can upload a new image, and that's something we're excited about. The primitives are in, and I think in Newton (we talked about this in one of our design summit sessions this morning and will probably talk more tomorrow) we'll expand what we can support, such as active-active appliances. There's also support for VRRP; that was in Neutron proper, so it just took us a while to get it. And VPN as a service: one small difference is that the appliance here runs strongSwan, whereas upstream is more Openswan or Libreswan based. And then instance pooling. Instance pooling is very handy, especially when you're doing VM-based services, because you can keep a ready spare pool. The nice thing is you can keep pools of different types, so you can keep a distinct pool of spare load balancing instances versus a spare pool of, say, routing instances. They're tunable and configurable. You can do rolling updates, and if need be you can pre-populate the pool by growing it, letting it build, and then issuing your update, which makes it even faster. Combined with VRRP, most of the time the tenant doesn't even notice the update occurred. Within the Astara project, we have the orchestrator and then a couple of other component projects. We have Astara Horizon, which provides a little bit of Horizon integration; I'll show that in a moment. And then we have the Astara appliance. While I said Astara itself is not in the data path, we needed something for testing, something to show how routing as a service worked, and something we could build against. The Astara appliance is basically a basic router image.
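As a rough illustration of the hash-ring idea (a simplified toy, not the Ironic-derived code the project actually reuses), here's how VNFs could be deterministically mapped onto whichever orchestrators are currently in the cluster, so that adding or removing a member only moves a fraction of the assignments.

```python
import bisect
import hashlib

def _hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent hash ring: map resource ids to cluster members."""

    def __init__(self, members, replicas=32):
        self.replicas = replicas
        self._ring = []            # sorted list of (point, member)
        for member in members:
            self.add(member)

    def add(self, member):
        for i in range(self.replicas):
            point = _hash("%s-%d" % (member, i))
            bisect.insort(self._ring, (point, member))

    def remove(self, member):
        self._ring = [(p, m) for p, m in self._ring if m != member]

    def owner(self, resource_id):
        point = _hash(resource_id)
        idx = bisect.bisect(self._ring, (point, ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]

ring = HashRing(["orchestrator-a", "orchestrator-b"])
print(ring.owner("router-1234"))   # deterministic owner
ring.add("orchestrator-c")         # expand the set...
print(ring.owner("router-1234"))   # ...only some VNFs move; a new owner
                                   # resyncs config as if from a cold start
```

The "assume a cold start" rule pairs naturally with this: whichever member ends up owning a VNF after a membership change simply rebuilds and pushes a full config rather than trusting whatever state the previous owner left behind.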
It supports BGP (via BIRD), VPN, and DHCP, and it provides the same metadata services you would find in the Neutron version. It's Linux, Debian 8.3, pretty simple to build, and uses diskimage-builder, again re-leveraging components from the community. What we wanted to do when building this simple implementation was differentiate where it made sense, but borrow and consume as much of the output of the rest of the community as we could, because that's the benefit of all of us being in this big community. Everybody asks how the Astara appliance is configured. It's a fairly simple REST API that mostly passes an intermediate version of the config across. Could we have chosen a different protocol? Yeah, but we found REST was easy, and it made it easier for new people to see what was going on without having to learn the alphabet soup of ways to configure network appliances; we thought this was the easiest way. Typically on the appliance, the management network is wired up first on eth0. Occasionally Nova will plug it into a different place, so to counteract that, at boot time we pass commands via cloud-init that tell the appliance the MAC address of its management interface. The appliance can then scan its available interfaces, look for that MAC address, and configure that one as the management network. Within Astara the management network has historically always been IPv6 based, just because we wanted to be IPv6-first; it does support v4 too, since that question comes up occasionally. For the other interfaces, eth1 is typically the external network, and then eth2 through whatever are tenant networks. Another component is the astara-neutron project. I get questions about what this is exactly. It's a small shim which enables Astara to better interface with Neutron. One of the challenges of being an implementation is that sometimes you need to run ahead of where the upstream community is going, or you need a chance to experiment and figure out whether something is the right solution before saying, hey, let me go write a spec and make changes in Neutron, because some ideas look good on paper and then reality needs a chance to play with them. That's what astara-neutron handles. It provides a traditional layer three service plugin that you would consume as standard, plus an install and a config file for Neutron. It also provides a small ML2 wrapper, which is interesting; I was chatting with some folks yesterday evening who asked why this even exists. Part of it is to add support for some features that weren't in Neutron today, and also to validate that the plan actually works. One of the things we hope to do in Neutron is get rid of this ML2 wrapper, push those small changes upstream, and make them work for a wider array of folks. So, like I said, the long-term goal is to get rid of this thing. It's a very small set of code, but for us, being an implementation, the closer we are to mainline Neutron in terms of REST API and interfaces, the better. That's just one of the components that's there.
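As an illustration of the management-interface discovery described above, here's a minimal sketch assuming a Linux appliance that exposes /sys/class/net; it is not the appliance's actual code, just the idea of matching the MAC handed over at boot against the available interfaces.

```python
import os

def find_management_interface(mgmt_mac):
    """Return the interface whose MAC matches the one passed via cloud-init.

    Nova may attach the management port as something other than eth0, so the
    appliance scans its interfaces and matches by MAC instead of by name.
    """
    mgmt_mac = mgmt_mac.lower()
    for iface in os.listdir("/sys/class/net"):
        addr_file = os.path.join("/sys/class/net", iface, "address")
        try:
            with open(addr_file) as f:
                mac = f.read().strip().lower()
        except OSError:
            continue
        if mac == mgmt_mac:
            return iface
    return None

# Once identified, that interface gets configured as the (IPv6-first)
# management network; the MAC value here is just an example.
print(find_management_interface("fa:16:3e:12:34:56"))
```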
And I touched on that. So now the fun part: demo time. Picture a container ship throwing massive fireballs; I'm usually cursed with these things. On one hand, this is standard Horizon. We've seen this, and that's our goal: the Horizon interface should be the most boring part. But behind the scenes, what I'm going to scroll here, to give you an idea of what's going on, is the Astara orchestrator running. It's looking for changes, for what modifications are going on within the network. So if I go in as the demo user, let's say I've got a simple topology set up: a router and two networks. If I go in and delete an instance, the workload is changing, I'm shutting things down, and you'll notice some motion in the background as Astara reacts. I can detach the network from the interface, I can reconnect it, and you'll see the actions going on underneath the hood. What's happening is that Astara is listening to the telemetry coming off of Neutron and reacting, and then you see there's a big config change that's pushed. Now, if I exit out of this, I can give you an idea of what's going on within the appliance by diving into it a little bit. Like I said, the testing appliance is standard Linux; I can access it via IPv6, and you can see the standard set of interfaces here. If we take a look at the services, there's dnsmasq. The nice thing is that we were able to co-locate a couple of services: within this particular appliance I'm running routing, VPN, and DHCP, so a lot of the tenant's services are co-located together and you get shared fate. Operationally, for the teams that have been running this, it's very easy to dive in and say, okay, I need to troubleshoot a particular tenant, or maybe I've got a tenant with an NTP reflection attack going on, and how can I make a targeted change for that tenant to fix something in their network? Within that, like I said, it's standard Linux; we wanted to keep things as standard as possible. Looking ahead a little to Newton: this morning we talked a bit about one of the outcomes, implementing a generic VNF driver. While we've done routing, load balancing, and VPN, those are logical constructs that are available in Neutron today. What we're seeing from some workloads is that there are VNFs people want to wire into their deployment that have external management systems, which is fine. So the idea is a generic driver that can take that appliance, boot it, wire it up, and connect it to the network. There's also Python entry point support; the nice thing about that is we can have drivers that aren't necessarily in the Astara tree that Astara can then utilize.
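To illustrate how an out-of-tree driver could plug in via Python entry points, here's a hedged sketch: the entry point namespace, package, and driver names are made up for illustration and are not Astara's real ones, and the loading side uses stevedore, which is the common OpenStack pattern for this.

```python
# A hypothetical out-of-tree package would advertise its driver via an
# entry point in its setup.cfg (illustrative namespace and names):
#
#   [entry_points]
#   example.vnf_drivers =
#       myvnf = example_pkg.driver:MyVNFDriver
#
# The orchestrator side could then load it by name with stevedore:

from stevedore import driver

def load_vnf_driver(name):
    """Load a VNF driver registered under the illustrative namespace."""
    manager = driver.DriverManager(
        namespace="example.vnf_drivers",  # assumption, not the real namespace
        name=name,
        invoke_on_load=True,
    )
    return manager.driver

# vnf = load_vnf_driver("myvnf")
# ...then boot it, wire it, and connect it to the network per the driver
# contract (whatever that contract ends up being).
```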
Load balancing: within the last cycle they've added a lot of extra support for new APIs in load balancing, so as we've been growing the Astara team we're keeping up and making sure we match everything. SFC integration also came up this morning. So, join our community. It's growing; we had lots of new contributors in the Mitaka cycle, and it was exciting to see all the new faces. My favorite part about the OpenStack community is that, while we may interact with people one-to-one in person, I always like it when people you've never met jump in on IRC and say, hey, I really like your thing, I want to help, let's work together. Project status is on Launchpad under astara. Documentation: our docs site is really a redirect to Astara's Read the Docs pages. We're on Freenode, and our weekly meeting is Mondays at 1800. With that, the last little takeaways: we designed it to be hyperscalable, control plane only, very pluggable, providing layer three and above services, all open source. So, thanks. Questions?

Thank you for the presentation. Before coming into this meeting, I saw Astara as just a wrapper sitting on top of Neutron, so I'm glad I came here; I see that it's much more than that. Not only do you provide a wrapper on top of Neutron, you have a state machine running in there to monitor what has been happening, and I assume a subscription and notification mechanism will be provided on Astara's northbound interface to external systems. In that sense, what is the relation, or how do you tie Astara with Tacker? Because you're delivering pretty much the element management system, the EMS, for the VNFs, and Tacker is orchestrating all these deployments and doing all the configs, so it seems like these two could be bundled together to offer much more. Thank you.

Yeah, I definitely agree that when you start looking at Astara and Tacker, they're both within the OpenStack ecosystem. In terms of relative age of the projects, I think Astara was running a little bit ahead of where Tacker was, so I think the logical next step is figuring out how we can work together. One of the reasons we moved our fishbowl session to this morning was to avoid overlapping with Tacker, for that very reason: so we can jump in, cross-pollinate, and share ideas about how to better integrate. Like I touched on, a number of the elements we've used, or the ideas that inspired them, come from other things in the OpenStack community, and integrating with projects where it makes sense is a clear win for everyone. Yes?

Two questions here. How do you operate in a provider network kind of environment?

In a provider network environment it's really no different. We run in provider network environments because, from our perspective, a layer two connection is a layer two connection.

Okay. The other one is: if you have an existing architecture with a plain vanilla Neutron deployment with tenant networks, is there a migration path to Astara?

So the question is, is there an online migration path in the code we have today? No. Have we written scripts that can do some of that? Yes. Migrations are always a fun thing because every cloud has a different SLA, but it is possible to migrate them.
It is possible: you can unschedule a router from the Neutron agents and then spin up a corresponding Astara VM, and migrate through that way. One possibility that's also out there is whether we could leverage the VRRP support for HA that's in Neutron now and actually have it fail over into Astara, which would make a really seamless path. Any other questions? All right, thank you very much.
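As a rough sketch of that manual migration idea (hedged: this is not the team's migration scripts, it assumes python-neutronclient with Keystone v2 auth, and it leaves the Astara side as a placeholder), you could unschedule a router from its L3 agents and then hand it over to the orchestrator:

```python
from neutronclient.v2_0 import client as neutron_client

def unschedule_router(neutron, router_id):
    """Remove a router from whatever L3 agents currently host it."""
    agents = neutron.list_l3_agent_hosting_routers(router_id)["agents"]
    for agent in agents:
        neutron.remove_router_from_l3_agent(agent["id"], router_id)

neutron = neutron_client.Client(username="admin",
                                password="PASSWORD",      # placeholder
                                tenant_name="admin",
                                auth_url="http://keystone:5000/v2.0")
unschedule_router(neutron, "ROUTER_UUID")                  # placeholder id
# ...then boot or notify the Astara-managed appliance for this router;
# that step (and any SLA-aware sequencing) is what the migration scripts
# referenced in the talk would handle.
```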