Hi, good afternoon. Thank you for coming. One thing I want to make sure you noticed is that we included a QR code for those of you who want to get the presentation. We have it online for you to download, so you don't have to take pictures of every slide you're interested in; we've already published it for you.

So, thank you for coming. We're talking about our evaluation of the OpenStack deployment frameworks that we covered this summer, over the last three or four months. What we're covering today is a little bit about who Symantec is and who we are, why we did this and what the purpose was, what our proof-of-concept goals and criteria were, and then some of our conclusions on the specific frameworks we looked at.

So what is Symantec doing? Why are we even involved in OpenStack at all? First of all, I want to talk a little bit about us, and then we'll get into that. Symantec, for those of you who may not know (I think most people probably do know who we are), is a security company. We're very focused on building a secure environment for people to work in and to build workloads on. We cover everything from backup and storage to security products like antivirus and compliance products, as well as the PKI certificate business that we have.

A little about myself: I'm Brian Chong, an infrastructure architect in the Cloud Platform Engineering group at Symantec. My specialties are network and security, really focused on designing the OpenStack application-level platform. This is my colleague Shane Gibson. He's also an infrastructure architect, very focused on the frameworks and the bare metal provisioning, as well as the network and physical-layer topology.

Some of the things we're really trying to tackle with this are the harder problems at scale for some of the applications we're going to be building. So I want to talk a little bit about what exactly Symantec is trying to focus on through this opportunity of using OpenStack. First of all, we're building a brand new cloud platform to be used within Symantec. This is pretty challenging for us: we have a very diverse set of technologies and a lot of things we want to build on, so we're starting in a greenfield manner and building a brand new cloud from scratch. Second, we're trying to make it very global: we're looking at using multiple data centers around the world as well as different teams. And we're very much focused on an open source, open platform model.

The reason open source is critical to us comes down to two things. One, as you've probably heard, it's a lot about flexibility and design parameters, and we want to be able to control our own destiny. I think over the years we've learned, as infrastructure architects, that when you have a very complex problem to solve and you're using very complex solutions, you need the ability to understand them and to know how you're going to attack them.
You need to be able to attack issues that arise in the platform, especially when it's running as an enterprise-scale capability. And this is true not only for the platforms we use and purchase; it's also true for the platforms that Symantec builds internally.

The second thing I want to talk about is that we're really trying to build an IaaS-layer platform. We will be building services on top of OpenStack, but we're really focused on OpenStack as an IaaS platform. I know people use OpenStack in many different ways, and I know there are lots of projects, but our main focus at this point is to use OpenStack to build an internal infrastructure-as-a-service platform.

Another reason Symantec is trying to get into OpenStack is that we feel there is a lot of value Symantec can bring to the OpenStack community: looking at the platform, helping to secure it, and actually contributing back to the community some of the things that we find, whether they be bugs or security holes, or even enhancements to OpenStack, drawing on the capabilities Symantec has internally in terms of our engineering staff and our architecture expertise.

And we're starting small. One of the lessons we learned is that you can't come out of the gate swinging at something as complex as OpenStack. We wanted to start with something small, understand it, get our hands wrapped around it, and then scale it to thousands of servers across multiple data centers worldwide, because some of the services Symantec is focusing on definitely require many, many servers. To give an example, one of the things we're looking at is the current OCSP service that we run, the Online Certificate Status Protocol, in case you're not familiar with it; Symantec purchased the VeriSign CA business in 2010. It handles billions and billions of transactions, and we're going to need a very large footprint to host something like that on the cloud platform we're looking to design. So we're talking about some very large platforms we're looking to build. Everyone within Symantec is pretty excited to be using OpenStack, and we're really looking forward to its capabilities, to pushing it to its limits and seeing how far this thing can scale.

But what are we talking about today specifically? That was sort of Symantec's long term; I think you're really here to learn about what we actually did and why we did it. We started in April, and one of the first things we learned about OpenStack is that we had to figure out how to get this thing installed. We learned by reading that we weren't going to download it from GitHub and try to do it by hand. That's very difficult; I know some people have tried, and it takes many weeks and months, and we didn't think we had the time for that. So we went looking for vendors that we felt could help us on our journey and get us there faster. We evaluated five different methods, which Shane and I will be going over today. One thing to note is that we tried to really focus on open source, since these are open source platforms; we weren't really looking at enterprise-licensed deployment tools.

There were three major areas we looked at before we got to the technology itself. First, capabilities: what could the tool do? What was it capable of actually deploying, and could it do bare metal?
Could it do the networking? Was it able to do install checks? Resiliency was also key, because we believe we'll be redeploying OpenStack very frequently as we scale. When you have many thousands of nodes and lots of different clusters, we knew we would have a rolling model, especially given the frequency and cadence at which OpenStack releases. So we needed to make sure the tool itself was part of our actual lifecycle process, and that it was resilient on its own: it couldn't just deploy OpenStack resiliently, it had to be resilient itself as well. And complexity: because of the various services we're looking to build overall, and the different network topologies we're looking to deploy for security, compliance, and audit purposes, we wanted to make sure the tool could handle extremely complex configurations as we went through the different services we want to build on this, whether they be PaaS services or true SaaS services.

So with that, I'm going to hand it over to Shane, and he'll go over some of the specifics of what we did in our deployments.

Thanks, Brian. Obviously, when you embark on something new, it's generally a pretty good idea to have an idea of what you want to exit with, an idea of what success is. Not knowing a whole lot about OpenStack, we knew, starting at the bottom here, that we wanted to get OpenStack up and running. We needed to be able to test it, to learn how it works, how it runs, how it's configured, and the different ways of designing and architecting OpenStack. Primarily, though, we wanted OpenStack up and running on our hardware, in our data centers, in our environment, so our teams could learn, get their hands on it, deploy, automate, and test.

Some of the specific nitty-gritty we were interested in was adding, deleting, and modifying nodes. We're going to be rolling lots of equipment out and bringing up more and more equipment as we grow through our different iterations and growth phases, so we needed the ability to quickly and easily deploy and redeploy. We also knew it needed to be as few manual steps as possible. We need to be able to automate as much as possible, preferably with an API, because we know we'll be developing our own frameworks to drive this and we want to drive the tool through some sort of API methodology.

For the proof of concept, we started with a vendor and a specific product, did a deployment on a cluster, documented that deployment methodology on our hardware, in our environment, with our network topology, and then handed it over to another Symantec employee and had them redeploy it. That's how we defined how long it took to get from bare metal to up-and-running OpenStack.

Sitting down, we had to figure out how we were going to test the deployment tools and how we were going to test OpenStack. For those of you who are new to OpenStack, you'll find there are a number of network elements you need to be aware of. There are a number of primary, core OpenStack networks. Starting at the bottom, you have your public network, where your VM guest traffic ingresses and egresses and your services, essentially your applications, are exposed. Above that you have your API and management network segments; that's where your OpenStack controllers, compute, and all of those various components expose their APIs. Above that are the private and storage networks.
That's where your VM guest networks live, and where your storage connections to the object store and so on run. Above that is the admin network, which is specific to the deployment framework; it's where you actually do the PXE boot, the OS install, and the management of the physical cluster. The last network, the very top one, is the BMC or IPMI network. That's the physical hardware control network.

We wanted to separate the network environment out into four, well, five if you include the BMC/IPMI network, but four OpenStack networks, physically separate networks, so that we could define what traffic can come in and out of each of them. A lot of people who initially deploy OpenStack deploy it all in one network segment, and we thought that was an insecure way to deploy OpenStack. We want to control access to the APIs, the VM guest networks, the private management networks, and so on.

Going through the deployments with the various vendors' tools, one of the things we learned very quickly was that there are hiccups along the way and things don't always go as smoothly as you hope. So we ended up plumbing up our jumpbox with a VLAN trunk and configuring it with an IP address on every network segment, so we'd have the ability to test. Because we closed off all of those networks, you have to be able to determine whether you actually got the network level up and functioning correctly (a rough sketch of that kind of per-segment check follows below). That was a real big learning point for us. And for those of you who can't read the small print in the very corner: yes, we know those are not valid IP addresses, or at least not our valid IP addresses.

So this is the network topology we came out with. We used three clusters of five nodes, five physical nodes each. That caused us some problems too, because in some cases we weren't able to do a lot of the HA testing in OpenStack. Each of those clusters was a completely isolated set of network segments, so there are essentially 15 networks and 15 VLAN segments there.
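As an aside, here is a minimal sketch of the kind of per-segment reachability check the jumpbox setup above made possible. It is purely illustrative: the segment names and addresses are placeholders from the documentation IP ranges, not the real addressing, and the talk does not say whether the actual checks were scripted or done by hand.

```python
#!/usr/bin/env python3
# Minimal sketch: confirm the jumpbox can reach a known address on each of the
# isolated segments (public, API/management, private/storage, admin, BMC/IPMI).
# Segment names and IPs are placeholders, not the addressing actually used.
import subprocess

SEGMENTS = {
    "public":          "192.0.2.1",
    "api-management":  "198.51.100.1",
    "private-storage": "203.0.113.1",
    "admin":           "192.0.2.129",
    "bmc-ipmi":        "198.51.100.129",
}

def reachable(ip, count=2, timeout=2):
    """Return True if `ip` answers ICMP echo from this host (Linux ping flags)."""
    rc = subprocess.call(
        ["ping", "-c", str(count), "-W", str(timeout), ip],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return rc == 0

if __name__ == "__main__":
    for name, ip in sorted(SEGMENTS.items()):
        status = "ok" if reachable(ip) else "UNREACHABLE"
        print("%-16s %-15s %s" % (name, ip, status))
```

Because the jumpbox had a VLAN subinterface with an address in every segment, a failure at this level points at the trunk or switch plumbing rather than at OpenStack itself, which is exactly the sort of hiccup that otherwise eats time during a vendor engagement.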
Stepping outside of the network topology, we came up with a fairly standard OpenStack design topology. Throughout the whole test we placed the admin node on a piece of bare metal of its own. In retrospect we probably wouldn't have done that; we would have spun up a VM guest for the admin node configuration rather than dedicating a physical piece of hardware to it, especially with such a small cluster. It was one of those learning points we went through. We set up a controller, two compute nodes, and a storage node. In our testing we only had one storage node for the object store, and that was another tough learning point, since with Swift you typically want three nodes to do replicated object storage. So we got into a number of scenarios where we had to hand-deploy Swift, and some of the framework tools were unable to do the deployment with a single node, and that caused us a bit of heartache and burn. If you're going to test the object store, I highly recommend you get at least three nodes, or put a little time and effort into standing up three VM guests, to do your basic Swift or object store capability testing.

So once we had defined the network environment and topology, and the OpenStack topology and architecture we were considering testing, we got on with the actual provisioning evaluation. One thing I really want to highlight and stress throughout this talk: all of the tools we tested are judged on the features that were available over the summer during the test window, from April until we finished, what, September 30th. Over that three or four month period, all of the vendors' tools have been evolving dramatically and rapidly. So when you sit down to do your own evaluation of the tools, look at all of them with a fresh set of eyes; like I said, they're evolving very rapidly, kind of like OpenStack is, and every release brings a whole host of new features and capabilities.

The five main tools we tested were:

Fuel Web, Mirantis' product, at the time version 3.0.1. Fuel Web at the time was a standalone product, separate from their Fuel CLI server. We chose to implement the Fuel Web server because we wanted to get it up and running very quickly, and it was a lot less time than learning all of the CLI deployment capabilities and requirements for the CLI product.

Canonical's MAAS and Juju products, versions 1.2 and 0.7; those products have been revised significantly since. MAAS is Metal as a Service, the bare metal provisioning component. Juju is essentially their DevOps tool, or rather it glues other DevOps tools together, and it uses charms to deploy through the Juju tooling.

Crowbar, supported by Dell, version 1.6. That product kind of glues together a whole bunch of Chef recipes within the Crowbar framework. It clearly has strong integration with Dell hardware, since it was born out of some of Dell's requirements for deploying OpenStack, but it's not exclusively Dell.

Foreman: a lot of people know Foreman, it's a general open source deployment framework tool. We deployed with version 1.2, in conjunction with Red Hat, who helped us with that deployment.

And last, we deployed Rackspace Private Cloud. Rackspace Private Cloud is not a deployment framework of its own; it's a set of Chef recipes for deploying OpenStack. So it broke our requirements in that it's not a bare metal provisioning solution that gets you from bare metal all the way to OpenStack running. However, we felt that Rackspace has a relatively significant influence and impact in the OpenStack world, so we wanted to see what their product was capable of doing.

Before I go on, I'm sure a lot of you out there who have looked at frameworks in the past are going to say, well, what about this or that, why didn't you try and test this or the other thing?
There are a million things out there, and I want to acknowledge that there are a ton of great tools we didn't test. Some of them we'd still like to test and will continue to keep an eye on, but yes, we're aware there are a lot of other interesting tools out there.

Starting with Fuel: Fuel combines a significant number of open source tools and products into one suite; they've integrated a tremendous amount of components to get it up and running. One of the things we found interesting was that, at the time, they were using Postgres as the back-end data store for the Fuel Web product, which is sort of interesting because they deployed MySQL for OpenStack. So you had two different database platforms to look after. That's not necessarily a bad thing, just something to consider and be aware of: take a look at their requirements and see if they've changed their database back-end, or at least be aware that you need to know how to deal with two different database types. At the time, the Fuel Web product itself was relatively young and new, and they hadn't bubbled up all of the capabilities from the CLI, so we had to deploy a few of the components by hand. That has since been rectified; their Fuel CLI and Fuel Web products have been unified, so if you take a look at it now you'll find that a lot of the capabilities we had to deploy by hand are native in the Fuel Web product, and there's a better experience there.

After the Fuel product we went to Canonical, and they came on site and we deployed MAAS and Juju. Like I said, the MAAS product is a bare metal provisioning tool. It has really strong multi-region, multi-data-center capabilities with its region and cluster controllers; it's actually one of the only products that had multi-site-aware capabilities out of the box. Juju charms handle the deployment of the code. One of the things we found, particularly deploying with Canonical's MAAS and Juju, is that we were adamant about our five-node model and how we wanted to deploy it, and Canonical kept saying that's not the right way to deploy it with their tool. We learned our lesson: we should have listened to the vendor there. They definitely know what they're talking about with their own product, and we should have been a little more flexible from that perspective. Because of that, we had a bit of a problem with the OpenStack charms deploying on a five-node configuration; it definitely wants to deploy on a ten-node configuration.

After deploying with the Canonical MAAS and Juju products, Dell came on site with their Crowbar product and we did the implementation with Crowbar. Crowbar version 1.6 is very tightly integrated with Chef server, so much so that the Crowbar tool stores a lot of its state and information in the Chef server in order to do its deployment topology. They were one of the products that deployed monitoring and graphing capabilities natively with their deployment, and it was nice to see that tight integration: as soon as you got a deployment up and running, you had an entire monitoring framework available and running right out of the box for you.
It was a really nice, tight integration with Nagios and Ganglia, which were the products they deployed. Interestingly, they did a great job of bubbling up a lot of the configurations, the knobs and twiddles you can change in the OpenStack configuration itself, through the Crowbar UI, so you can make specific changes. For example, you could switch between UUID or PKI tokens for the Keystone configuration; you have a lot of options you can tweak and change. They were very, very quick to deploy, possibly because they were on Dell hardware, maybe, maybe not; we've only experienced deploying Crowbar on Dell hardware, but it was very impressive how quickly they were able to deploy on it. One of the nice features, if you're a Dell shop, is the ability to put specific point releases of BIOS firmware on the machines, versions they've tested and know work with OpenStack and work well with the Linux distribution they deploy on. That's a really nice feature, as is the tight ability to do configuration at the RAID hardware level. That was one of the places where it shone very specifically. Crowbar is in the process of moving from Crowbar 1.6 to Crowbar 2.0, and they're bringing in a whole lot of new features and capabilities within the Crowbar 2.0 framework.

Foreman isn't specifically an end-to-end automation framework, but working with Red Hat, they did a lot of work in the professional services engagement to give us a configuration that gets from bare metal to OpenStack up and running. If you're familiar with Foreman, it has a very interesting model where they do what they call smart proxies. A smart proxy might be something like a DHCP server, a DNS server, or your Puppet server, and Foreman has the ability to manage multiple quote-unquote smart proxies remotely, so you can have a very strongly distributed model with your Foreman environment. If you're a Puppet shop, they've had Puppet integration with Foreman for quite some time, where Foreman can act as the ENC, the external node classifier, for the Puppet environment (a minimal sketch of what an ENC does follows below), and Rundeck is also a good tool for automated capabilities alongside the Foreman and Puppet configuration capabilities. Foreman, I believe, is in the process of doing a lot of work right now to extend to some of the other DevOps tools, so if you're not a Puppet shop, there's possibly a story there for Foreman as well. Like I mentioned, it requires a lot of customization: you have to configure kickstarts, and you have to do a fair amount of work to get Foreman to the point where it can do all of the deployment for you.
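To make the ENC idea concrete: an external node classifier is just an executable that Puppet calls with a node's certname and that prints YAML describing the classes, parameters, and environment for that node; Foreman plays that role in a real deployment. The sketch below is not Foreman's code, and the class names and node patterns are made up, but it shows the general shape, assuming PyYAML is installed.

```python
#!/usr/bin/env python3
# Minimal sketch of a Puppet external node classifier (ENC). Foreman provides
# this role in practice; class names and node patterns here are hypothetical.
# Puppet would be pointed at a script like this in puppet.conf:
#   node_terminus  = exec
#   external_nodes = /etc/puppet/enc.py
import sys
import yaml

def classify(certname):
    """Return the classes/parameters hash Puppet expects for this node."""
    if certname.startswith("compute"):
        classes = {"openstack::nova::compute": None}
    elif certname.startswith("controller"):
        classes = {"openstack::controller": None}
    else:
        classes = {"base": None}
    return {
        "classes": classes,
        "parameters": {"datacenter": "lab"},   # hypothetical site-wide parameter
        "environment": "production",
    }

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit(1)   # non-zero exit tells Puppet the node could not be classified
    print(yaml.safe_dump(classify(sys.argv[1]),
                         default_flow_style=False, explicit_start=True))
```

The appeal of letting Foreman fill this role is that the node-to-class mapping lives in one place, with a UI and an API, instead of in hand-maintained site manifests.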
Rackspace Private Cloud: their implementation is through Chef. Since they're not an end-to-end framework, we had to get our operating systems up and installed, the networks configured, and a Chef server in place, and then Rackspace did the deployment from that point on. One of the interesting things about Rackspace is that they have a very strong view of what works and what doesn't. They run a significantly sized deployment, and there's some discussion about whether Rackspace has one of the largest OpenStack deployments in the world, so they have a very strong idea of what does and doesn't work in production. Their implementation initially was Nova compute with Nova networking; they didn't want to deploy Neutron because they didn't feel it was ready for high availability at the time, and that was one of our primary drivers. So we initially deployed with Nova networking on their advice, and then we did eventually deploy with Neutron. The L3 agent specifically was where the issue with enterprise-grade capability lay at the time in Neutron networking. I believe Havana has done a significant amount of work around the L3 agents in the Neutron networking components that is bringing it much closer to capability parity with Nova networking.

So once we'd gone through all of the different implementations, we had to sit down and figure out how they all went, and it was an interesting process, because for the most part all of the tools were fairly different from each other. With Rackspace not being an end-to-end automation framework, obviously some of what we felt were requirements weren't there, but their implementation of OpenStack was very good, and they had a very strong view of how OpenStack should be deployed. That's a good thing: as a young, growing OpenStack group within Symantec, having guidance from someone with experience is important.

The TTC column there is my little time-to-cluster column: how long it took us from the beginning of the engagement to a fully up-and-running, operational OpenStack environment, documented by the vendor that came on site to do the professional services install and then repeated by another one of our employees. The time-to-cluster for Crowbar was four days; Fuel Web, Rackspace, and MAAS/Juju were eight days; and Foreman was five days. Some of the issues that slowed down the other configurations came from our five-node environment: we had to do some things by hand, and that really slows things down, because once you break the automation capabilities of some of the tools it takes a bit longer to work out how to do all of the implementation and deployment.

Capabilities, resiliency, and complexity refer back to the slide from earlier: capabilities being the bare metal provisioning and so on; resiliency being OpenStack high availability and the deployment tool's own high availability; and complexity being the ability to deal with the multi-network environment and, indirectly as it turned out, the ability to deploy on a limited hardware environment of five nodes. One thing to note about our ranking of these: it's a reflection of some of the problems we ran into with our environment and our configuration, and I wouldn't look at any one of these and say, oh gosh, there's a clear winner, or oh gosh, there's not a clear winner, because in our minds there isn't a clear winner. There are some that are better for what we need right now, and there are some that we're very interested in for the long term. The net we came away with is that we're going to keep watching all of these tools, as well as some of those on the "didn't test" slide that we didn't get a chance to get to, and as we get more experience deploying OpenStack, we'll get a little further along with that.
Once we had gotten the clusters up and running, part of the engagement was also to help teach Symantec how the heck to operate OpenStack clusters. We don't have OpenStack experts; we need to start building them internally, so from that point we needed to validate, learn, and understand how OpenStack works. And, Brian, one thing we didn't touch on, or maybe I skipped it while distilling all the slides down: in addition to that, there were a number of ways we chose to test the deployment frameworks and the tools. We had three main drivers. We took the Horizon dashboard and basically used it as a smoke test, running it through its paces to make sure everything worked. Then Brian wrote a CLI-based tool to drive the OpenStack environment and actually test and exercise all of the components (a rough sketch of the idea follows below). And then we ran Tempest tests against a couple of the deployment iterations as well.
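As Brian notes in the Q&A later, the actual tool was just shell scripts driving the standard command-line clients. Purely to illustrate the same idea, orchestrating the APIs directly rather than clicking through Horizon, here is a minimal Python sketch against the Grizzly-era Keystone v2.0 and Nova v2 REST APIs. The endpoint address, tenant, and credentials are placeholders.

```python
#!/usr/bin/env python3
# Minimal sketch: authenticate against Keystone v2.0, find the Nova endpoint in
# the service catalog, and list servers. Endpoint and credentials are
# placeholders; the real test tool was shell scripts around the CLI clients.
import requests

KEYSTONE = "http://192.0.2.10:5000/v2.0"     # placeholder controller address
TENANT, USER, PASSWORD = "demo", "demo", "secret"

def get_token_and_catalog():
    """Return an auth token and the service catalog from Keystone v2.0."""
    body = {"auth": {"tenantName": TENANT,
                     "passwordCredentials": {"username": USER,
                                             "password": PASSWORD}}}
    resp = requests.post(KEYSTONE + "/tokens", json=body)
    resp.raise_for_status()
    access = resp.json()["access"]
    return access["token"]["id"], access["serviceCatalog"]

def endpoint_for(catalog, service_type):
    """Pick the first public endpoint of the given service type."""
    for service in catalog:
        if service["type"] == service_type:
            return service["endpoints"][0]["publicURL"]
    raise LookupError("no endpoint for %s" % service_type)

if __name__ == "__main__":
    token, catalog = get_token_and_catalog()
    nova = endpoint_for(catalog, "compute")
    servers = requests.get(nova + "/servers",
                           headers={"X-Auth-Token": token}).json()
    for server in servers.get("servers", []):
        print(server["id"], server["name"])
```

The same pattern extends to booting instances, attaching networks, and uploading images, which is the kind of ordered orchestration Brian describes when he talks about driving the REST APIs in the sequence they needed rather than going through Horizon.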
Yeah, so I want to talk a little bit about what we actually tested. One of the critical capabilities is that the tool not just be able to deploy OpenStack, but deploy it correctly on the nodes we configured. One of the things I was specifically looking for was the ability to tune your OpenStack deployment through the tool. When you're talking about large scale, what we're hoping is not to need yet another tool, like Puppet or Chef, layered on top of the deployment tool to make further configuration changes, whether that's the oversubscription ratio of your Nova compute nodes or any other tuning you want. So one of the capabilities we also had to focus on was, as Shane mentioned, the bubbling up of some of these parameters through the tool itself.

I just want to go over this slide pretty quickly. We did do an exhaustive test of the OpenStack system itself after we installed it. There were some slight issues, and most of them were easily corrected: either there was an error in the tool's initial deployment of the OpenStack code, or it was, as Shane mentioned, one of the hiccups of a five-node cluster versus ten nodes, so some of the specific tuning parameters weren't there. But we did test them all. I'm not going to walk through this slide exactly, but I wanted to make sure you understood that we did go through some exhaustive testing. As part of this, I also want to mention that we did a security review of the tool itself and of OpenStack itself, and I'll be sharing that on Thursday.

One thing I wanted to specifically mention: the difficult part for us to test was really the Neutron component, simply because of the complexity of the network that Shane explained. We had a lot of different interfaces and a lot of different traffic flowing over the system, and we weren't really able to test that at scale; we only had five nodes. And Swift was also an all-in-one: we only had one storage node, so we weren't really able to see the dynamics of Swift as traffic went over the different networks and as data and objects were transferred between nodes. But I want to make sure you heard that we did do a lot of testing and we did cover all the bases after the deployment, so we know the deployment tools actually do deploy OpenStack correctly and it does work.

So, the summary. Like I said, these are the specific products' capabilities at the time we tested; everybody's tools are going through significant change. If you sit down to do your own deployment, you want to do a bake-off of different vendors. I highly recommend it; I wouldn't just sit down with one tool and say, I'm going to learn how to do it with this one. You'll find that everybody has a very different view of how to do deployment from bare metal, how to do the operating system deployment, how to manage it, and how to deploy OpenStack. As we're all learning through the summit, OpenStack is a very complex beast, and different companies have different ideas about best practices and how to deploy it. We learned a tremendous amount about OpenStack going through this process of seeing how different vendors decide to do their deployments, what they like in different products, and how they feel about the maturity of the different elements of OpenStack. So when you sit down to pick your own deployment framework, I would take a look at all the tools we mentioned; they're all great tools. I would also look at the stuff we didn't look at. Write down what your requirements are and what you're actually trying to do. We did that, but we also looked at tools that didn't fit all of our requirements, because there were compelling reasons to look at some of the different vendors and their tools and solutions for deploying OpenStack.

All of the vendors were really interested in feedback from us, and that was really fascinating; it was a side effect we weren't expecting. Because we were comparing so many tools side by side, back to back to back, we had a very good view of how different vendors and different tools do their deployments, and almost universally they were eager to hear what we felt were features, ideas, or different directions they might go with for deploying OpenStack. It was really interesting to see how hungry the vendors are to get things right for enterprise deployments of OpenStack.

Questions? Yes: how did we proceed? So, we are going with the Crowbar product. We're working with the Dell team to help shape the 2.0 product that they're deep in the process of building right now. The way we're intending to proceed is to move forward with Dell Crowbar; the tool worked very well. But we're also keeping a keen eye on all the vendors, because all of them had strengths that were worth watching, absolutely.

That's a good question. So, the whole proof of concept: I think we started discussions in April, but we started work in July.
If you want specific timing: Shane and I started the project in April, we got the hardware and network installed and set up in May, and then pretty much every two to three weeks we had a different vendor coming in, so that gives you the rough timeline. It was about three months total for the project, from getting the bare metal purchased, ordered, delivered, installed, and configured, to bringing in the vendors. Each of the vendor engagements lasted from one to two weeks, and then generally we had a week in between each engagement where we tested the OpenStack deployment, redeployed it on different clusters, and tested it again. But it took us about three months to go through the whole process.

Yes. So the question was, did we consider which version of OpenStack to use with each of the tool vendors, if I'm paraphrasing correctly. No, we didn't really worry about that too much. All of the vendors, whether they were on an older Folsom or a Grizzly release, are all moving forward with their products very quickly. I think we deployed Grizzly with all of them; we had some early-access and beta versions of a lot of the vendors' tools that hadn't reached the community yet at that point, and those deployed Grizzly. So I believe we deployed Grizzly with all of them, and there were different variations of the Grizzly deployment at the time.

How much time do we have left? We have time for one or two more questions. Yes? No real plans for that; it's just basic shell scripts driving the command line. If you want it, I can share it after; it's nothing special. What I realized was that for our internal use we weren't going to drive our platform through Horizon, so we needed the ability to orchestrate the REST APIs in the order we wanted. That's why I wrote it. There's nothing in there that's Symantec-specific, so if you want it we're happy to share it; it's nothing terribly exciting.

Our contact info is here on the slide deck, along with the QR code again, so if you're interested in the information or want to get in touch with us, please feel free to drop us an email and touch base. The deck also lists all of the different tools, their websites, the versions we tested, and their current versions.

Any other questions? Any last questions? Yes, sir: how important was scale awareness? That's a good question. It's very important, and it's one where we ended up not evaluating most of the products on that point, simply because our deployment model didn't really allow for a high availability configuration. It was important that we understood they can do high availability, and in fact a couple of them are not yet high availability products, but they all had it on their roadmap. So our fundamental belief was that the vendors will get there if we need them to get there. It is critical for the long term, absolutely.

Any other? Yes, sir. Yes, we used KVM, mostly because that's what all of the vendors chose to deploy, not necessarily out of a desire of our own to deploy KVM.

The last question, I cannot hear you, please come closer. Pardon? TripleO, did we consider it? We considered it, but it was one of the tools we didn't test. We would like to look at it. Yes.

Thank you very much, everybody. We appreciate your time. Enjoy the rest of your summit.