Welcome, thanks for coming. I'm going to talk about maximum ecosystem and maximum interoperability: lessons learned from building 3,000-plus multi-vendor OpenStacks a month. So yeah, that's correct. Believe it or not, we build 3,000 OpenStack installations a month across a diverse ecosystem. And to us, maximum ecosystem plus maximum interoperability equals OIL, which is our OpenStack Interoperability Lab. That's where we validate the OpenStack ecosystem 365 days a year, 24 hours a day, seven days a week. So this is something that we do non-stop. It's basically one massive continuous integration engine where we vet multiple hardware platforms from the vast number of partners that we have, multiple software component vendors, new OpenStack components, and a massive number of different configurations and vendor clouds that we bring up. The reason we do this is because building clouds is not easy. Trust us, we do it all the time. We think we're pretty good at it, and what we've learned is that it's not easy. Each one seems to be a snowflake, and it's only getting worse. I mean, I'm amazed when I come to the OpenStack Summit; it seems like each time it gets larger and larger. I'm sure you think the same. And just walking around, seeing the number of partners out there that we have, and the number of potential partners and vendors related to OpenStack, it's pretty massive. If you go to the OpenStack Marketplace, and I did this about a day ago, there were over 100 drivers today across, I think it was, five or six different categories: Neutron, Cinder, what have you. And that list is growing and growing. So really what we did with the OpenStack Interoperability Lab was set up this infrastructure to try and manage that, most specifically with the partners that we have.

So, moving on. This is the partners that we have today. The list is growing by leaps and bounds, which is a technical challenge for us, so we need to scale not only the infrastructure that we have within OIL, but the people managing OIL as well. But it's pretty impressive. There's a lot of gravity towards it, and we're very proud of it.

So just to give you some daily stats, you can see there we have 100-plus OpenStack installs a day that we do within the lab. That's 700 OpenStack server installs to bare metal, over 1,000 containers created a day, 2,800 bare metal power cycles, 46,000 API calls, and 32,000 tests a day. Multiply that out, and the numbers get pretty big pretty quick. So how is all this possible? I'm going to turn it over to Ryan, who's the technical lead for the OIL program, and he's going to go through that for you. So, Ryan.

Thanks, Dan. Automation, automation, automation. This is the only way to handle this kind of scale. And we do that using the Canonical software tools that allow us to build and scale out clouds. So, MAAS, Metal as a Service: this gives us an API for adding nodes, removing nodes, turning them off, power cycling them, releasing them. And then the OpenStack cloud itself is deployed via Juju and the charmed-up OpenStack components. Juju is a multi-cloud orchestration service that we use to bring up all the different pieces of OpenStack, getting them configured correctly and connected so that when you're done, you have a working cloud. And then this is all built on top of Ubuntu OpenStack and the Cloud Archive, where we host all the OpenStack packages that have been prepared for use in building clouds.
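To make that concrete, here is a minimal sketch of driving such a deployment from the Juju 1.x command line, assuming an environment already bootstrapped against MAAS; the charm names are the standard OpenStack charms of that era, and the relation list is abbreviated for illustration.

```python
#!/usr/bin/env python
# Minimal sketch: drive the Juju 1.x CLI to stand up core OpenStack
# services on MAAS-provisioned metal. Assumes `juju bootstrap` has
# already been run against a MAAS environment; relations abbreviated.
import subprocess

def juju(*args):
    """Run a juju subcommand and fail loudly if it errors."""
    subprocess.check_call(["juju"] + list(args))

# Deploy the core charms (MAAS hands Juju a machine for each unit).
for charm in ["mysql", "rabbitmq-server", "keystone",
              "nova-cloud-controller", "nova-compute", "glance"]:
    juju("deploy", charm)

# Wire the services together; each relation configures both ends.
for a, b in [("keystone", "mysql"),
             ("nova-cloud-controller", "mysql"),
             ("nova-cloud-controller", "rabbitmq-server"),
             ("nova-cloud-controller", "keystone"),
             ("nova-compute", "nova-cloud-controller"),
             ("glance", "mysql"),
             ("glance", "keystone")]:
    juju("add-relation", a, b)
```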
So those are the key pieces of technology. We didn't get to 3,000 clouds a month from the beginning. It was a much more modest beginning: a lab with maybe 20 servers and a couple bits of infrastructure. And as we tried to scale that up, we faced some challenges, so that's kind of what we're talking about here. As the program grew, the hardware that we had grew as well. So we had lots of hardware sitting around, and we needed to make sure that none of it was idle, right? The goal was: if we've got a box, we need to be able to use it. So we needed to scale up the number of runs that we did in parallel. And when we build a cloud, at least in OIL, we think about how many physical machines we need for one particular configuration. And sometimes we had leftovers, right? If I need 20 nodes to build a cloud and I have 22 servers, I get two that are sitting around idle. And so we said, well, is there some way we can fix that? So we started using LXC, or LXD as we talked about earlier today, to co-locate services in containers, and that allowed us to roughly cut the hardware cost in half. So then suddenly, instead of only two OpenStack installs on 22 servers, we could run three; we got it down to where we've got 21 in use. We did that across the lab, and that meant we could run more clouds with the same amount of hardware, which for us was the goal.

And then the other thing that we needed to handle this was a way to track what machines were in use, what machines weren't in use, when they were ready and when they weren't. So we use a queue management system that's built on top of the MAAS API to see what hardware we have in the lab, what its capabilities are, what's available, what's in use, and then determine what we want to use next.

As we added hardware from different vendors, we had to deal with the fact that a lot of the physical vendors' hardware is different. They all have their own BIOSes with different options and settings and configurations. All those different machines and option ROMs have different timeouts and different characteristics. There's the physical network in these machines; some of them come with spinning disks, some of them have SSDs. And one of the first variables we encountered with all the different partner hardware was that the lifecycle for bringing a node up for use with MAAS goes through a multiple-boot phase. Some of the hardware went through this really quickly, so we would ask MAAS, give me a node, and it comes back really quickly with that. Other times it would take a really long time to boot; some of the enterprise systems take a long time for that. So we had this variable time between when we would request a machine and when it was ready. The good news was that with the tools we were using, Juju for example, you can control how long it waits for these hosts to come up. And with that we were able to sort of normalize the hardware: all this different hardware, from our consumption perspective, was normalized. I can say, go give me six machines, and I get six machines back after a specific period of time. All the different hardware also has different ways to turn itself off and on, from IPMI to AMT, some of them BMCs. We have to handle the fact that these are all different, and MAAS gives us the tools for doing that. There are power driver plugins that let us figure out how to turn on these different machines.
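As a hedged illustration of the co-location trick: Juju 1.x supports a `--to lxc:<machine>` placement directive that packs extra services into containers on metal that is already allocated. The machine numbers and service mix below are made up for the example.

```python
# Sketch of the container co-location trick (Juju 1.x placement syntax).
# Machine numbers and the service mix are illustrative only; the idea is
# that stateless control-plane services ride in LXC containers on metal
# that is already allocated, freeing whole machines for nova-compute.
import subprocess

def juju(*args):
    subprocess.check_call(["juju"] + list(args))

juju("deploy", "mysql")                             # allocates a machine, say 1
juju("deploy", "--to", "lxc:1", "keystone")         # container on machine 1
juju("deploy", "--to", "lxc:1", "glance")           # container on machine 1
juju("deploy", "--to", "lxc:1", "rabbitmq-server")  # container on machine 1
juju("deploy", "-n", "3", "nova-compute")           # three whole machines
```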
And this is all handled in MAAS, so in the OIL lab we didn't have to worry about that; MAAS took care of it for us. But we do get to validate how reliable some of those services are, because one of the failure cases is we've deployed a cloud and one of the nodes didn't come up. We can go and find out what that is, file bugs at the appropriate place, and get that resolved. The other fun one was local storage. A lot of the nodes will have storage on the system that we use for the different OpenStack services: Ceph, Swift, Glance. They each use storage in a different way. But since we're reusing the same hardware over and over again, we'd encounter issues where the storage may have had LVM metadata embedded on it, and the next time we go to use it, it's gonna say, hey, you've already used this before. So we found those, filed the bugs, got the charms fixed. We're also driving some enhancements into MAAS to allow us to optionally wipe the storage when we're done, so that the machine is just like it was the first time you got it.

Now, configuration explosion. There are a lot of choices that you get to make when you're building your cloud, from your compute backend choices, networking backends, and image storage, to which release you're running, both from the OS perspective as well as the OpenStack perspective. We're currently running about 146 different choice combinations. And when we run that across all the different vendors that we have, we've got 1,300 different combinations that need to be run in OIL. Some of those combinations don't actually work, and we find that out. For the ones where we know there's an incompatibility, where the technology is just not meant to work together, we blacklist that out so we're not wasting our time. The net result is that we have to have a dynamic configuration of the cloud, and we do that by taking our inputs, what choices we made, what the constraints are, which hardware it can run on, those sorts of things, and we have a customized OpenStack deployment that gets pushed out and run on all the machines.

Sure, so the question was: what releases are we covering in OIL right now? We're running both LTS releases, Precise and Trusty, and for the OpenStack releases, we're running Juno and Icehouse. Right, it's mostly driven by what the Cloud Archive supports; on the Ubuntu Cloud Archive page they talk about which releases are supported.

The other thing that we handle here in OIL is to make sure that of all those 1,300 different combinations, we actually get to run all of them. So our queue is tracking all the different configurations, whether we've run them or not, to ensure that we're able to cover them effectively. The other challenge was the shared physical network. We have a flat physical network that all these physical machines are connected to, and this resource has to be shared for lots of different things: the machines coming up and getting their own IP addresses, the containers coming up and getting IP addresses, and when we're actually exercising tests, we have to allocate floating IPs for the test cases. And then even some of the interesting deployments with charms we have, where they may have instantiated a virtual machine as well, we need to reserve an IP so that it's accessible. And that was all doable via MAAS, so that was a big help to have in place.
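To see how a handful of choices explodes combinatorially, here is a minimal sketch of enumerating choice combinations and filtering out blacklisted pairings; the option lists and the blacklisted pair are illustrative, not OIL's actual matrix.

```python
# Sketch: how a modest set of per-component choices multiplies into a
# large test matrix. The option lists and blacklist are illustrative,
# not OIL's actual configuration data.
from itertools import product

choices = {
    "os_release":    ["precise", "trusty"],
    "openstack":     ["icehouse", "juno"],
    "compute":       ["kvm", "lxc"],
    "networking":    ["ovs", "contrail"],
    "image_storage": ["ceph", "swift"],
}

def blacklisted(combo):
    # Known-incompatible pairings are skipped rather than run and failed.
    return combo["compute"] == "lxc" and combo["networking"] == "contrail"

combos = [dict(zip(choices, values)) for values in product(*choices.values())]
runnable = [c for c in combos if not blacklisted(c)]
print(len(combos), "raw combinations,", len(runnable), "after blacklisting")
```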
Right, so we have all these installs coming at us all the time, and there's a whole lot of data. What we call a pipeline is a run through the three major phases: we deploy a cloud; once the cloud's up and running, we prepare it for use; and then we verify that it's working as expected. Through those processes we generate all this data, logs, results, status, and it gives us a lot to look at. And just as we said automation, automation, automation is the way to handle this, that's what we're doing here too. We automate our analysis of the run: did we successfully deploy the cloud or not, and if not, what component broke? Why did it break? Looking through the logs to find those issues, and those are tracked and filed in Launchpad so that we can fix them. And then we also classify the type of error that it was. Did we have an issue with the hardware? Did we have an issue with some software? Was it configuration? All these different things. So we classify this, and it's all done automatically as each run completes.

All that information then feeds into providing information back to the partners. This is an example for a fictional partner, driven from some of the data that we have in OIL. So this is what you get to see back as a vendor. On the top, going through some of the major components that have choices, the vendor gets a view of when their hardware or software selection was picked, and how successful it was when it was in a particular role. For the hardware vendors, you'll see them across all the different choices, because they get mixed up; for the software vendors it's not as interesting, because their component runs on the different hardware. And then on the bottom half, within a particular group, say Nova Compute, we talk about the different backend choices that were made. So for example, did we back Nova Compute by VM, by KVM, or by LXC? And you get a distribution of the frequency as well as the success rate of that. So this just gives you a view of the types of cloud that have been built for that particular partner on a monthly basis.

And then the other part that is critical for the vendors is the actionable data that we get out of this, right? We're doing this automated analysis, and at the top we've done the classification of the types of failures that we've seen. In some cases we have infrastructure failure, something in the lab, somebody tripped over a cable, those sorts of things happen. We have bugs in our own code that runs all of this as well. But the more interesting ones are things like bugs in charms. If it's a charm that the vendor wrote and authored, they need to know about it. Maybe it's an issue with an upstream package that we're tracking. These bugs are filed as Launchpad bugs and sent to the vendor, and that allows them to participate in fixing it, whether it's a community bug or anything like that. And then the bottom is just the test case history, right? So as we run these test cases over time and things change, either fixes going into the upstream packages or those sorts of things, we can see how that's affected the net results. The other thing we do is add new choices, right? As Dan said, we keep taking more on. So we have a process for bringing in new choices.
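To give a flavor of the automated triage described here, a minimal sketch might look like the following; the regex rules and category names are invented for illustration and are not OIL's actual classifier, which would carry many more rules plus deduplication and bug filing.

```python
# Sketch of automated failure triage over pipeline logs. The regex rules
# and category names are invented for illustration; a real system would
# carry many more rules plus dedup and reporting into a bug tracker.
import re

RULES = [
    ("infrastructure", re.compile(r"ipmi.*(timeout|unreachable)", re.I)),
    ("charm",          re.compile(r"hook failed", re.I)),
    ("package",        re.compile(r"unable to (locate|fetch) package", re.I)),
    ("configuration",  re.compile(r"invalid (option|config)", re.I)),
]

def classify(log_text):
    """Return the first failure category whose pattern matches the log."""
    for category, pattern in RULES:
        if pattern.search(log_text):
            return category
    return "unknown"  # a human looks at these and a new rule gets added

print(classify("ERROR juju: hook failed: 'config-changed'"))  # -> charm
```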
And some of the really hot stuff coming in right now is SDNs, all over the place. So when we have a new SDN coming on board, we go through this process here, where we say: look, let's charm up your solution. Let's put together a Juju charm for your solution that fits in with the existing OpenStack charms. So they spend some time charming that up and making it available. And the nice thing about the charm, for us, is that it describes in a very technical way what the requirements are: how does this work with the other pieces of OpenStack, and how do we do this in an automated fashion? Because the way we build clouds fast that actually work is with automation. Once we have the charms from the partner and we've reviewed them, we put them out into our staging lab, where we run the charm against real hardware and a real MAAS, but not on our production service. And this helps us flush out any issues that may have come up during development of the charm, from differences between the development environment and the actual runtime. We also have a somewhat more restrictive environment in place, and so we catch some bugs there, where a charm grabs packages randomly from the internet or whatever, and we find that and find a more reliable way to bring that in. And that means the charms are going to be more robust, so when customers are actually using these charms, it's going to work for them. Once we're happy with how the charms are running in staging, we bring them into production. We have a two-week cadence where things that we've added into staging, if they're working well enough, can be promoted into production.

So one of the ones that we just added was Contrail. This is just a Juju GUI picture of the Contrail charms, with all the different components and how they relate to the standard OpenStack services. I've got a couple of the solution-specific pieces up there, with a Cassandra backend and some other pieces. So that's just sort of a picture. But we got the Contrail bundle YAML from Juniper, deployed it, made sure it was working, and pushed it into staging. It ran in staging for a couple of weeks, and now it's been promoted into production. And the goal here, right, finding whether all these things work well with other stuff: for a partner, when OIL comes back and your solution has been vetted, it runs, and these things run really well, it means the validated configurations can be put into the cloud installer, right? So these are options for partners. Part of the value of coming into the OIL program is getting this vetting, and then having a solution where this component, this choice, can be taken to market.

So most of the stuff in OIL we've talked about was the infrastructure component: how is the cloud working, how do these pieces work together? But there are a lot of other partners who have things besides OpenStack infrastructure components; they have software that's going to run on top of the cloud. And so OIL has a way for us to validate that your application is going to run against these clouds that are all configured in different ways. Maybe you haven't built your cloud in 146 different ways, but we have. And we have a way for your application to be tested against these different infrastructures.
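Concretely, a staging-to-production gate on a two-week cadence can be as simple as the toy sketch below; the success threshold, window, and record format are invented for illustration and are not OIL's actual promotion policy.

```python
# Toy sketch of the two-week staging-to-production gate. The success
# threshold, window, and record format are made up for illustration.
from datetime import datetime, timedelta

def ready_for_production(runs, window_days=14, min_success_rate=0.95):
    """runs: list of (timestamp, succeeded) tuples from staging jobs."""
    cutoff = datetime.utcnow() - timedelta(days=window_days)
    recent = [ok for when, ok in runs if when >= cutoff]
    if not recent:
        return False  # no evidence yet; keep the charm in staging
    return sum(recent) / float(len(recent)) >= min_success_rate
```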
So above the line, for these applications, we put together a charm solution for the application so that it gets deployed on all these different clouds, and then we run the test cases provided by the vendor to validate that yes, my application is running as expected, and it runs as expected across all of these different cloud types. So that's where we're at with that. Any questions?

So the question is whether there's performance testing. Our goal here is to validate functionality of the OpenStack cloud rather than any specific performance of it. The challenge there is, since the clouds change all the time, changing just one variable and holding the rest constant is really difficult from a coverage perspective. How would we do all that? So we're not doing performance tests; we're doing functional validation at this point. That said, we will functionally validate a performance test suite in OIL so it can do performance testing outside of OIL. So if there's a partner or someone who's interested in actually doing performance testing, we'll validate that performance test suite functionally within OIL and then set up some infrastructure outside of it to do the performance runs themselves, yes.

It depends on the application, right? I mean, there are some subtleties between implementations and backends. For example, the Ceph RADOS Gateway has a Swift interface, but it's not exactly the same as standard Swift. Some of that's configuration, default configuration; some of it just depends. So it depends. I won't say we're always seeing trouble, but things crop up from time to time, so it's important to be able to validate that. Thank you.

So before we take any more questions on OIL, I think we'd like to introduce one of our partners that's participated in this program with us, and hopefully benefited from some of the results and data that we've seen out of it. So I'd like to introduce Sanju from Juniper, who's going to tell us about their engagement with OpenContrail in OIL. Thanks.

Sure, thank you. Thank you, Ryan. Yep. So yes, we did integrate with Contrail. So we integrated with Canonical. See, Canonical, Contrail, the names are actually close; we get confused at this time of the day, need to have coffee. But anyway, my name is Sanju Abraham. I am a senior solutions consultant for Contrail. We started integrating with Canonical, and we've actually benefited from some of these offerings from Canonical that were discussed by Ryan. Especially if you look at some of the challenges that both enterprises and service providers are finding in selecting a good SDN vendor. I have this cartoon in front of you by John Klossner, and it depicts the confusion and the chaos in the minds of buyers trying to go to the SDN vendors and pick the right one. So it's as good as being blindfolded, but thanks to the woman who's helping him out, making the decision, telling him to go with OpenContrail and Ubuntu OpenStack. So, parody aside, with OpenContrail and OpenStack, what we've figured out working with Ubuntu OpenStack is that it scales. Ubuntu OpenStack scales, and it has good economics attached to it. If you go to the marketplace and try to buy a good enterprise offering, it costs more than what Ubuntu has to offer; Ubuntu OpenStack comes in at basically half to one third of that. Ubuntu OpenStack also has this whole ecosystem with good tools, and some of those tools have already been discussed by Ryan.
We use MAAS, we use Juju, we use charms to deploy the VNFs in the NFV cloud that we spin up for the service providers. And with Canonical OIL, we started integrating early this year and have seen results which are very impressive. The whole continuous integration: when you talk about this agile environment where code is getting churned out and getting into the repository, you have Jenkins-based jobs to execute all the code, all the test cases, in automation. What we find more interesting is that there's also somebody we can rely on who is helping us not just within our ecosystem, but in a vendor-neutral ecosystem, getting different applications and different operating systems to work with, and validating the solution. So that gives us very good feedback. And OpenContrail plays very well into this ecosystem. What OpenContrail provides is routing and switching, IP address management, and virtual DNS. It provides service load balancing and security, and today what we're going to focus more on is the dynamic service chaining aspect of it. In network virtualization and software-defined networking, the most common things across NFV and SDN are dynamic service chaining and management and orchestration.

So let's look at the NFV high-level architecture. If you look at the architecture diagram, and this is what we are implementing, it's close to the ETSI spec for NFV; we are close to the ETSI standards. We have an EMS, which defines the whole FCAPS for the VNFs that are getting spun up on OpenContrail and Ubuntu OpenStack. And you have an OSS system, which is used in the service provider environment; there are OSS systems that help you orchestrate all the VNFs that are spun up. Now, in the current industry, these network functions are physical: they are hardware devices that get requests from the EMS or from the OSS. What we've done in Contrail is speed up the spinning-up of these VNFs, to get not just service agility but service monitoring as well from the OSS systems, and the service orchestration is driven down from the OSS systems calling the Heat APIs that we expose as part of OpenContrail, with the Heat engine running on the OpenContrail OpenStack system.

So if you look at the flow of MANO, MANO being the management and orchestration in the ETSI framework: the VNF provider first needs to register through the OSS/BSS system, and that entry gets recorded in the catalog. Once that VNF is recorded in the catalog, the operator has the ability to spin up that VNF dynamically at runtime. All he needs to define is a template, and these are HOT, Heat orchestration templates. Once he defines the Heat orchestration template, he orders the OSS system to go ahead and spin up that VNF. And at that particular point, the NFVO, which is another abstraction layer on top of OpenStack and OpenContrail, receives the request. A REST API call is formed, and it calls the Heat API, and the Heat API calls the Heat engine backend.
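As a rough sketch of that last hop, a stack-create call against Heat's v1 REST API looks something like the following; the endpoint, token, image, and flavor are placeholders, and Contrail's own Heat resource types for service chains are omitted, with only a stock OS::Nova::Server VNF instance shown.

```python
# Sketch: creating a stack through Heat's v1 REST API, the final hop in
# the MANO flow described above. The endpoint, token, image, and flavor
# are placeholders; Contrail's service-chain resource types are omitted.
import requests

HEAT = "http://heat.example.com:8004/v1/TENANT_ID"  # placeholder endpoint
TOKEN = "KEYSTONE_TOKEN"                            # placeholder token

template = {
    "heat_template_version": "2013-05-23",
    "resources": {
        "vnf_instance": {
            "type": "OS::Nova::Server",  # the VNF virtual machine
            "properties": {
                "image": "firefly-vsrx",  # placeholder image name
                "flavor": "m1.medium",
            },
        },
    },
}

resp = requests.post(
    HEAT + "/stacks",
    json={"stack_name": "vnf-demo", "template": template},
    headers={"X-Auth-Token": TOKEN},
)
resp.raise_for_status()
print(resp.json())  # Heat returns the new stack's id and links
```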
And as part of the VNF, what is important is that it needs to create the virtual networks for the workloads to send traffic through this new VNF that's been spun up, so that the traffic workloads now use the VNF and the function it offers. For example, if there is a firewall service that needs to be spun up dynamically, and there are workloads sitting in virtual networks A and B, then dynamically, after the VNF is spun up, the traffic passes through the firewall and you get all the firewall services. So the next step is: it spawns the VNF after looking up the catalog, gets the VNF details, and notifies the VNF manager. The VNF manager is responsible for the lifecycle management of the virtual network function. And the VNF manager informs the EMS. The EMS does the FCAPS, which is fault, configuration, accounting, and performance. In this case, the EMS completes the job of provisioning the virtual network function with all the configuration required for it to operate. Once that is done, a notification is sent to the NFVO, and the NFVO again sends a REST API call to OpenStack/OpenContrail so that network policies are spun up. These network policies enable the actual traffic flow, because they are the actual network ACLs for the traffic that needs to flow across the virtual networks into the virtual network function.

So what's important to learn from this is how Contrail helps with service insertion across these virtual networks. This diagram gives you a very good example of how policy enforcement, as well as policy enablement of a service, helps the traffic flow seamlessly from the green virtual network in this diagram to the red virtual network, by dynamically spinning up the service instances through this orchestration. If you look at the implementation details of how traffic flows between the green and the red virtual network through this VNF: when Contrail is deployed, and Ryan was talking about how MAAS is used, those boxes down there where the vDDoS and vFirewall are hosted, that server runs the vRouter. The vRouter provides the encapsulation: it has MPLS over GRE, MPLS over UDP, and VXLAN termination, both to top-of-rack and across the servers. So all the packets that come from the virtual machine, which here is the virtual DDoS or the virtual firewall, go to the vRouter, and the vRouter terminates them on the peer, and it forms a complete mesh of tunnels terminating on the top-of-rack or going through the gateway to another node. So it becomes very seamless and easy for traffic to flow across these VNFs, by means of policy and by means of the encapsulation that the vRouter provides.

So now, another cartoon that I like, from Dilbert. There's so much virtualization happening, and if you look at some of the early proponents, the people who want to go the virtualization route, they come into their IT shop and tell their folks they want to go and adopt this virtualization, but it's not easy. It takes cycles, it takes effort, it takes a good amount of understanding of what we need to virtualize.
It just can't be like, in phase one, a team of blind monkeys will unplug unnecessary servers, and in phase two, those monkeys will just hurl software at whatever is left; it only becomes good with automation and orchestration done right. So, Contrail Heat orchestration: what does it actually provide? It provides a way in which you can create virtual networks, IPAMs, and DNS. It has VNF parameters that can be input via the template files. It can create the service chain and the policies.

So, the demo. I wanted to do a live demo, but I'd need to switch laptops, so I just have the screenshots, but this is the demo topology. In this topology, you see there's a template file, which is a YAML file. It has green and red networks, and it enables policy dynamically. And then there's this virtual firewall from Juniper, it's called vSRX, the Firefly image, that gets spun up dynamically, and the policy gets enabled; by enabling the policy, traffic across the virtual network always flows through the Firefly to the other virtual network. So this is just the screenshot. I'm sorry the font is small, but the gist of it is: it calls heat stack-create with the VNF template, and it provides the environment in which all the parameters are defined. There is a template YAML file to generate the service chain, and that's about it. Heat then goes and performs all the different tasks defined in the YAML file: it creates the service template, creates the policies, creates the virtual network, and spins up the VNF. All the flow that we saw in MANO is done through this, and the advantage is that we don't need to go with the MANO descriptors; you can do everything for the virtual network enablement, as well as the virtual network function enablement, the traffic flow, and the policy, all through the HOT templates. And if you see, this is what's in Horizon: it shows you the name of the stack that gets created and the template, and you can drill down into the resources, the events, and all the tasks that get put up as part of this orchestration. And this is the Contrail web UI: you see there is a policy, and the policy has the firewall enabled, and you see this is the first step in the service orchestration, where it defines the template.

We've also partnered with different VNF vendors: F5, Nokia, Sonus, Riverbed, Sandvine. Some of the videos that we've done with the firewall and DDoS are on YouTube, so you can take a look at those. And to finish: what is SDN going to offer? SDN offers greater agility, faster deployment, simplified architecture, whiter teeth, and brighter children. That's the promise that all the SDN vendors are giving out to enterprises and service providers, so hopefully somebody will actually buy into this and deploy it. OpenContrail is completely open source. You can go to github.com/Juniper and you'll find OpenContrail. You can download it, and please contribute back to the community. And for the Heat templates, you can go to the URL as shown. Thank you.

So that was bang on time. We probably have time for just one or two questions. Sarah, at the back. So, the OIL test suite? Yeah, so one of the test suites that we're running is based on Tempest, for validating the cloud.
So yeah, what we're running to validate the cloud is just open source. We have the ability to run other test cases too, so vendors that have their own test suite for a specific function, that can be added as well. That's one of them, yeah. Any other questions? Yes, over there.

Are those test results publicly available? So, they're not publicly available from us. If any one of those partners wants to make them publicly available, they have the right to do that. We give the results of these tests back to the partners that we engage with; it's up to them to determine if they wish to make those results public. You understand that some of that is sensitive information, because there are lots of competitors in this environment, so yeah. A lot of the partners have asked us about rights to redistribute, and the agreement is that they can, so sure.

Last question, sir? Yeah, so we do retain the right to publish some data, as long as it's not partner specific, and some of that is lessons learned. So we will, probably within a month or so, be doing monthly blogs giving out some more generic results. There's also a massive amount of collateral data, as Ryan says; there are a lot of lessons learned, and we have a wealth of knowledge of what works really well together and what may not work very well together. I think that's the question you're asking. So one of the things that we do with the partner engagements is that if we find something that consistently works really well, we offer a route into our installer, which we talked about earlier today, the Canonical OpenStack installer. So when you go through that, you have an option to pick and choose the different solutions that you want. And also things like publishing reference architectures based on the results that we know work really well; that's another thing that we're going to be doing in the future as part of this. I think that's the question you were asking, right? Yeah, okay.

All right, thank you very much. We're a little over time, so I appreciate your patience. So thank you very much. Thank you. Thank you.