Hi, everybody. Good afternoon. My name is Stefan, and this is Jose and Mark. We are going to talk about converting traditional apps to containerized applications on OpenStack. Internap is the public cloud provider that CrowdStar is using for this, and they'll get into more detail on that. So let me first talk to you about our offering. Essentially, Internap provides cloud services, which include bare metal, managed hosting, and colocation as well. We believe in providing the best infrastructure for all enterprise needs. I'm the Director of Product Management; you can reach me by email at Internap, or follow me on Twitter for information on our company and on technologies in general.

So let me get right to it. Internap is an infrastructure provider with 20 globally distributed data centers. Out of those 20, seven are OpenStack powered. Last year at the Tokyo Summit, we launched our bare metal service built on Ironic, which we have since brought to seven different locations in the US, EMEA, and APAC. We have roughly 20,000 cores available to our users and 2,000 servers currently active under Ironic, providing all sorts of services to end users. We also have a petabyte of storage on offer, and 15,000 servers still waiting to be onboarded into Ironic. There was a talk just a few hours ago from one of my colleagues highlighting the project of taking the existing servers we had under a custom-built orchestration and onboarding them into Ironic management, so that we can be solely focused on OpenStack. We use all of these wonderful things to provide hosting, ad tech, gaming, and big data analytics companies with the best infrastructure for their own purposes.

OpenStack, obviously: we are here because we are open source minded, but that open source mindset does not stop with OpenStack. Many of the logos that you see right now are in the marketplace. They have participated in the open source community, and we use them in various ways, whether it's the full gamut of operating systems available either virtually or physically, Supermicro for the actual servers that we deliver, or Juniper Networks and Cisco for our networking and routing. With Intel we are a platinum partner; we use their high-performance SSDs and NVMe, and we're looking forward to 3D XPoint. Again, all of these things are there to provide our customers with what we believe is best-in-class performance infrastructure as a service.

Getting right to that: we essentially have three big labels under our main deployment. Agile Cloud for the virtual instances; Agile Server, which is how we branded the bare metal; and Agile Storage, which encompasses both Swift for the object store and our block storage, based on SolidFire, for the persistent storage available to the virtual instances. Hopefully in the future we'll be attaching that to the bare metal as well. All of this is OpenStack; for us, the commitment is that our customers, such as CrowdStar, can pick a standard API to consume, whether virtual or physical, and have a single entry point for all of their infrastructure management services. Our OpenStack deployment is essentially the core services today, Swift, Keystone, Nova, Neutron, Cinder, Glance, so that we can render the basic services.
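To make that concrete, here is a minimal sketch of what that single entry point looks like from a tenant's side, using the standard OpenStack CLI. The Keystone endpoint, project, and credentials below are placeholders, not Internap's actual values.

```bash
# Hypothetical endpoint and credentials -- substitute your own tenant details.
export OS_AUTH_URL=https://identity.example.com:5000/v3   # Keystone
export OS_PROJECT_NAME=my-project
export OS_USERNAME=my-user
export OS_PASSWORD=secret
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default

openstack catalog list   # the endpoints for Nova, Neutron, Cinder, Glance, Swift
openstack flavor list    # virtual and bare metal offerings through the same API
```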
We take the core packages from the repos and deploy them ourselves with our own recipes based on Puppet and Ansible. We don't use a branded distribution such as Mirantis or Canonical to deploy it; this is all in-house, we have the skill set. On top of that, we obviously have Horizon so our customers can go in and consume. The APIs are readily available, with some limitations on Neutron, and Ceilometer, Heat, and Ironic are also exposed to customers, which allows CrowdStar to consume and deploy.

So what do we use all of those modules for? Essentially, as I've stated, virtual instances. If you do a nova flavor-list as a tenant, you will get what are called the A flavors, which are oversubscribed virtual instances, and the B flavors, which are not oversubscribed, so no noisy neighbors: you get the full power of the hypervisor, KVM in our case, all the way up to the B116, the largest one we have to offer, which is physically half of a dual-socket server at the customer's disposal. On top of that, we have the eight flavors of bare metal that are equally available and consumable just like a virtual instance: spin up, spin down. Obviously the delivery time is a little longer for a physical server than for a virtual instance, but all of the billing by the second or monthly, and the spin-up and re-image capabilities, are available, and both of these share the same networking, VLAN assignments, subnets, and so on, so it's a seamless experience for our customers.

So what differentiates us? As a public cloud provider, many of these things are not new to you, but what we have chosen is networking first. The reason is that, as you listen to talks and as you deploy your applications, what matters is being able to deliver data across the plane: from servers to VMs, between servers, Hadoop clusters, heavily loaded web front ends, caching, and so on. For us, without a good network, there is nothing of value that can be given. So we started with the network first, and there are spec discussions at this summit covering some of what we've already done, which is to provide, at the bare metal level, a bonded interface, the bonding of two physical interfaces, with VLANs trunked to the customer. That way the customer gets better resiliency at the server level and the flexibility of attaching the VLANs they want. On that LACP bond, networks are pre-populated, hence the limitation on Neutron, and customers can attach or detach whatever VLANs they want on whichever bare metal or virtual instances they want, to create a complex application, cluster, or environment that may be segregated into production, staging, UAT, and so forth.

That being said, infrastructure as a service isn't always cloudy, and that's not necessarily a bad thing. Obviously we only have a limited set of virtual instances and a limited set of bare metal readily available to our customers, but we also pushed it a little further. Why? Because for some complex applications, designs, or needs from our customers, we realized that we would never be able to inventory all the permutations required to satisfy every need.
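To ground the flavor discussion from a moment ago: a tenant session that boots a dedicated VM and a bare metal server through the same API might look roughly like this. The flavor and network names are invented for the example, not Internap's actual catalog.

```bash
nova flavor-list   # A* oversubscribed, B* dedicated, plus the bare metal flavors

# Same call shape for virtual and physical; only the flavor differs.
nova boot --flavor B116     --image centos-7 --nic net-id=<your-vlan-uuid> web-vm-01
nova boot --flavor bm-large --image centos-7 --nic net-id=<your-vlan-uuid> render-bm-01
```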
So we have three categories of service for the bare metal. The first we call deploy on demand: the eight flavors that are readily available, continuously stocked, and consumable through the CLI or Horizon. On top of that, we added what we call upgrade on demand, a selection of 72 different permutations that you can obtain; we deliver it to the customer, make it available, and then that private flavor is exposed to you. Whereas the others are all public, the latter is private to the tenant, to that customer. And above and beyond that, if we really get into extreme cases, we may build a custom flavor, because we have to build a custom configuration to satisfy a very particular customer need.

And speaking of particular customer needs, none of this stops customers from doing whatever they want. We currently do not have a container-as-a-service offering; we are definitely working to get CoreOS on board and reaching out to the different container vendors, but right now we are focusing on the infrastructure. Nothing stops our customers from being extremely flexible, and I think that's the real focus of this talk, so I'll let the CrowdStar folks get right to it. Jose.

Thanks, Stefan. Hi, everyone. Thank you for being here with us today, on the last day of the conference; I'm sure everybody's tired and ready to go party, probably. So, a little bit about CrowdStar. I covered a lot of this during the keynote yesterday, but I want to give you a view of how the game and its operation work, and why that created challenges we needed to solve in some way. The way the Covet Fashion application works is that it's driven by events. There are events delivered daily, each with a specific theme that women around the world have to dress for, and all these looks are submitted and stored as data in our system. Once the event period ends, all those looks need to be rendered and displayed for voting, so everybody goes and votes on all these looks. The rendering process was one of the main things causing headaches for us, because we can't pre-render the infinite combinations of looks that can be created for a specific event. So we needed to build a system that could render them in real time as they were getting hit.

We started doing this on a KVM infrastructure, and we would scale it up and get ready with loads of additional available infrastructure just to handle those peaks when events go into voting. Once we rendered the looks, we would cache them and store them on CDNs and things like that, so we could reduce that workload afterwards, but spinning those things up really quickly was one of the challenges. So we started working on the virtualized part, and we would pre-heat our cluster, but the performance of each render was still not great enough. That's where the Ironic bare metal environment really made a difference, because it gave us access to the raw power of those CPUs. It also allowed us to run far fewer servers during the peak period. And because Ironic allowed us to just launch servers, and as Stefan said, they launch a little slower than normal KVM instances, that pre-heat period would still be very simple for us.
Just on the performance gains: from the first public cloud we were working with to going to bare metal, we were able to drop about 110 milliseconds off our total time. When you're rendering about 150,000 looks in the first 10 minutes of an event going live, that's a lot of hits, so having all that available power was great.

The other thing was spinning these things up: these machines had dependencies. That's where the container stack really started changing things for us, because now we could divert resources from one side to the other. Some servers had specific configurations, libraries more than anything, that were needed for the API stack, and all these servers we were spinning up were being wasted some of the time. The argument for moving to containers was: if we create an agnostic layer of infrastructure, where we don't care what the server is and there are no dependencies on libraries or on making sure everything is the same, we can better utilize that layer of compute power. We had issues with conflicting libraries where we would have to launch maybe three instances just to run a Node service, a Ruby service, and a PHP service, because some were on Ubuntu, others on CentOS, and there were a lot of limitations. Taking that out of the equation and having the container stack be agnostic of the hardware also allowed us to spin up these worker nodes as containers, with the orchestration Mark is going to show you, to take care of these loads. So now I'll hand over to Mark, our head of DevOps at CrowdStar, to walk you through the workflow we've built. Thank you.

Hey, guys. Again, thanks for coming at such a late stage. I'm Mark and, as Jose said, I'm the head of DevOps. It was around the beginning of last year that we transitioned from our old data center into Internap, and we were running on OpenStack. Through that process we realized that our application wasn't as portable as we would like, and we needed to change that. One of the main issues, as Jose said, was the initialization of the nodes, which was quite long. The application has been running for three years so far, and over that time your initialization scripts tend to get bloated, so we wanted to simplify that. We also wanted to use the nodes more efficiently: when we were running multiple products, say the API or the render application, they tended not to run on the same infrastructure, so we wanted to bring them together, or at least have a cluster of servers that could be multifunctional. And as I said, the Chef files tend to diverge throughout the development process.

We were using Ganglia for time series data and Nagios for alerting. Ganglia is great to set up and really easy to send metrics to. However, Ganglia's structure is hierarchical, so it needs to be somewhat fixed; it's not easy to update automatically, and it's difficult for end users to write custom graphs, which require PHP knowledge. Nagios has been around forever.
Everyone uses it, and there are loads of plugins for everything, but it was built before the container landscape, and it too is fairly fixed in its configuration. When you're launching boxes up and down and containers are moving around, a container doesn't have a defined place to run, so unless you have management scripts constantly updating those Nagios configurations, it's difficult. Code deployment was long as well: it ran through multiple Jenkins jobs to get it right and then push it out to the end servers.

So yeah, we were using Chef to instantiate all the nodes: multiple recipes to set up individual functionality, PHP, nginx, maybe different iptables rules, you know yourself. Over the lifecycle that drifts out of sync with what's actually running on the boxes. If a developer asked me to install a library and it needed to be done like yesterday, because they never told me in advance, I'd just hack it out, throw it onto the servers, and get it up and running. We wanted to separate things so we didn't care what infrastructure the application was going to run on, but at the time they were too tightly coupled. And there was no guarantee that what was in the recipes was what was actually running on dev, if I go and change something and don't go back and update my recipes.

So, moving to containers, there were four things I really wanted. Standardize everything. Give power back to the developers, so if they wanted to install some particular library they could just throw it into the container, and all I had to do was look at it and make sure they weren't doing something silly. Simplify the deployments. And guarantee that whatever was running in their local Docker environment, when it rolled out to staging or QA and then out to production, was exactly the same: exactly the correct libraries and code base.

So what did we do to solve the instantiation problem? We're just using the Nova API to call out and get access to compute or Ironic resources, and then we decided to stick with about the bare minimum OS you can go with. Now, we haven't switched over to CoreOS yet, but that's down the pipeline. We were using CentOS, and there are issues running CentOS on a loopback device with device mapper; there's a link there, "why friends don't let friends run device mapper in production", go have a look at it. The cloud configuration was really simple: booted from a Nova API call, it updated all the packages that were currently in the image, initialized the users we needed, added the Docker repo, installed Docker, registered with our DNS, added a few name servers we needed, and then registered with our Rancher orchestration system, which I'll get into.
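Here is a minimal sketch of a cloud-config along the lines just described; the SSH key, repo URL, DNS helper script, and Rancher registration URL are placeholders, not CrowdStar's actual values.

```yaml
#cloud-config
# Sketch of the boot flow described above: CentOS image, Docker, Rancher agent.
package_upgrade: true                  # update whatever is baked into the image
users:
  - name: deploy                       # the users we needed
    ssh_authorized_keys:
      - ssh-rsa AAAA... deploy@example.com   # placeholder key
yum_repos:
  docker:                              # add the Docker repo (CentOS, so yum)
    name: Docker Repository
    baseurl: https://yum.dockerproject.org/repo/main/centos/7/
    enabled: true
    gpgcheck: true
    gpgkey: https://yum.dockerproject.org/gpg
packages:
  - docker-engine
runcmd:
  - systemctl enable docker
  - systemctl start docker
  - /usr/local/bin/register-dns.sh     # hypothetical helper: register with our DNS
  - echo "nameserver 10.0.0.2" >> /etc/resolv.conf   # extra name servers we needed
  - >                                  # join the Rancher setup (URL/token are placeholders)
    docker run -d --privileged
    -v /var/run/docker.sock:/var/run/docker.sock
    rancher/agent:v1.0.2 https://rancher.example.com/v1/scripts/<token>
```

Because everything after boot is generic like this, the same config works for any role; the node's actual workload is decided later by the orchestrator.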
Some tips on Dockerfiles that I picked up along the way. Initially I had the main code base of the application and my Docker configuration separate. Don't do that: put the Dockerfile into your code base and, if it makes sense, put the config files for your various environments into the code base as well. It streamlines the building of the images, and then you leverage environment variables when you start the container to switch between environments. So if you're running locally on your dev workstation and you want, say, xdebug for PHP or different config files, you just set which environment you want to run in. That means everything's the same: you have a single entry point, and the entry point manages the environment state. It's a little controversial to use a process manager inside of Docker; some people say you shouldn't do it, but if it makes sense and it's okay with you, then I don't see a problem with it. Again, it's controversial. Also, tag your images when you're building them with a standard naming convention across all your products, so you know exactly what each one is supposed to be, and stick to it. This is just an example; you can make up your own, as long as it makes sense to you when you're running it.

So we use Supervisor in some of the containers, and one of the reasons was that prior to Docker 1.12 there weren't any health checks inside a container. Inside Supervisor, you start your program (there's an example program definition), and then you create an event listener: supervisord throws out events, in this example a TICK_60, the listener waits for that TICK_60 event, and once it receives it from supervisord, it runs the health check. In a situation where a process becomes stale, the process may not die; it may still be alive but not responding. That's a perfect case for checking that the process is actually working correctly, and if it isn't, the listener sends a signal to supervisord to restart it.

Again, with the monitoring we wanted a flexible approach. We didn't want to be stuck in a very fixed structure, we didn't want to manually update config files when boxes or containers launched, and we wanted end users to be able to create dashboards on their own without needing PHP knowledge. For the logging, we wanted a centralized log system rather than multiple syslog configurations: again, standardize across the board. So we moved over to service discovery, and we're using Consul. There are multiple options out there; we just picked Consul. For the time series data and alerting, Prometheus and Grafana, and for the application logs we use an ELK stack.

The way we set up the main hosts that talk to Consul, there's a single container that registers with our Consul server and advertises what services that node offers. At a base level, all the hosts run a node exporter and cAdvisor, to see what's going on inside the containers and what the actual node is reporting. There's an example screenshot: all the services are on the left, and there's a service defined, node exporter, with all the IPs that service is running on. Prometheus then connects to our Consul cluster using a URL, consul.example.com in that config example, and asks Consul which services it should be interested in, for example cAdvisor and node exporter. The way Prometheus expects results to come back is on an HTTP metrics URL, and it hits each of the hosts. There's an example of some cAdvisor metric data coming from a host; Prometheus parses it and puts it into its time series database. And here's a visualization of the time series data in Grafana.
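A sketch of what the Prometheus side of that can look like, assuming a stock consul_sd_configs setup; the Consul address and service names are stand-ins for the real ones.

```yaml
# prometheus.yml (fragment): discover scrape targets from Consul instead of
# maintaining config files by hand.
scrape_configs:
  - job_name: cluster
    consul_sd_configs:
      - server: consul.example.com:8500         # ask Consul which targets exist
        services: [node-exporter, cadvisor]     # only the services we care about
    relabel_configs:
      - source_labels: [__meta_consul_service]  # label each target by its Consul service
        target_label: job
    # metrics_path defaults to /metrics, the HTTP URL Prometheus hits on each host
```

Because targets come and go through Consul, nothing has to be edited when a node or container launches, which is exactly the flexibility the Ganglia and Nagios setup lacked.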
As for the application log data, we wanted to simplify the process. You can add logging drivers in your Docker Compose file to say where you want logs sent; we wanted to send everything to Elasticsearch so we could process it later. There's an example of raw logs, in red, coming from a container. Logspout connects to the Docker socket, reads the logs coming from every single container on the box, and forwards them to Logstash, then on through Elasticsearch to be indexed, and then we can visualize it in Kibana; there's an example Kibana screenshot.

So this is the basic node: a high-level view of what's on every single node before it gets a workload to process. I think it's pretty obvious what's going on there. The only thing we haven't mentioned is the Rancher agent; that's for the orchestration system we're using, and it talks to Rancher. The flow goes: Registrator talks to Consul, the exporters expose the time series data, and Logspout forwards the application and container log data to Logstash.

So here's an example of a workload in the cluster, and this is what we actually run on OpenStack. We can use either compute or bare metal; once a node comes into Rancher, it's available to do work. On the persistence layer we just use bare metal, because they're long-running processes for data: Couchbase, MySQL, and cache.

As for the deployment: back at the beginning of last year we did a lot of research, and it was still very early days as to which orchestration system was going to come out as the best. The handy thing about Rancher is that it supported Cattle, its own orchestration environment, but Kubernetes and Swarm support were on the roadmap. So we went with that, and over the period we've been using it, that support came online. Now, how do you define your application? The Rancher catalog is basically a library of pre-written applications with configurations. They're written as your normal Docker Compose files, plus a rancher-compose file, which is used for scaling and the various settings of where things should run; that part is more Rancher-specific. At the moment it only supports Docker Compose version 1 files, not version 2; I think that's coming down the line, maybe in the next release, but we'll see.

When we're building the containers, we use GitLab's continuous integration. Once a Git commit comes in on a particular branch, we issue a build, and in this build you can see the script is running the container manager. It builds the tag, builds the image, pushes the images out to Docker Hub, builds the Rancher catalog from a set of templates we've put into the Git repo for the catalog, and then calls back out to Rancher to say: update your private catalog. So here's the flow again: a push to GitLab triggers the pipeline, which builds and tests, builds the Rancher catalog, and then calls out to either deploy or upgrade the stack.
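For reference, a rough sketch of what such a pipeline file could look like; the container_manager wrapper script, branch name, and image names are stand-ins for CrowdStar's actual tooling, not the real thing.

```yaml
# .gitlab-ci.yml (sketch): build and push images, regenerate the Rancher
# catalog from templates, then tell Rancher to deploy or upgrade the stack.
stages: [build, deploy]

build_images:
  stage: build
  only: [openstack]                    # the branch the pipeline watches
  script:
    - ./container_manager build --tag vote:${CI_PIPELINE_ID}    # hypothetical wrapper
    - ./container_manager push --tag vote:${CI_PIPELINE_ID}     # push images to Docker Hub
    - ./container_manager catalog --tag vote:${CI_PIPELINE_ID}  # render catalog templates, commit to the catalog repo

deploy_stack:
  stage: deploy
  only: [openstack]
  script:
    - ./container_manager rancher-update   # ping Rancher to re-read the private catalog and upgrade
```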
Now, this is a live demo, so we'll see how we go. There are two URLs there, voting-demo.crowdstar.com and results-demo.crowdstar.com. It's the standard Docker example project where you vote, the vote goes into a queue, and a Java worker pulls it off the queue and adds it to storage. So let's see if we can just do the demo; I might have to actually refresh this.

Okay, so here's the view of Rancher, and these are all the stacks we're currently running. Rancher is on a single node, and we have two worker nodes connected into Rancher that it's actually managing. The only part of Rancher you need to be careful of is the MySQL database it runs: that's the persistent state of the orchestration system, so we need to mind it and make sure it's safe. In the load balancer, with each of these stacks, here's the actual Docker demo app: we have a load balancer with a few ACLs pointing out to the different URLs, so here are voting and results, and we have our GitLab running as well. If we go to our Grafana, this is the one taking in all the time series data from the containers and also the host metrics, and again, this is far easier for an end user to create graphs with. Here's a view of Kibana. There's no data in it at the moment because it's constantly updating, so we'll see if we can go and just reload this. We're going to send in a vote, and notice that the vote will be processed by probably a different worker. And here's the actual log that's popped in live from the container, updated from the Node application.

Now we're going to change it live. I've created a merge request into the openstack branch that the pipeline is monitoring. This looks okay, so we'll go and click to accept the merge request. We go to the pipeline; hopefully we're running. We're building our stack: this is actively running, it's building the three images, and it's uploading them to Docker Hub. Now it's built the version that we want, and it's built the Rancher Compose catalog entry, updated with the correct version and the hash from Git. Then if we go back into our catalog, sorry, into the stacks, and refresh, you'll see it's now monitoring that Git repo where all our catalog entries are. This is the catalog view of our own private Git repo, and then this is the standard one you generally see. So in the stack, notice we have cats versus dogs. We're going to do an upgrade and select the version we've just built; hopefully that's the right one, 34. So it's upgrading; refresh, and we've switched over to the new branch. And again, if we refresh this, we have the new version. Now, if we go back to this, it should update. Did I vote there? I don't think I voted. Let's see. Yeah, and there they are. So that's the whole process, from a Git commit all the way out to deploying the full stack.

So let's go back here. What have we done? We've sorted out the long initialization period: we're using Nova and the cloud-init configurations, which are very generic and multi-purpose. We now have consistency between environments, the logic has been decoupled from the infrastructure, and there's no dependency between nodes. Most importantly, we've given developers more control, and they have access if they want to add new libraries; and we can guarantee that the code being pushed and created is correct across all those environments. I'm going to be putting this code back up on GitHub; this is the URL if you want to go get it. And I think we can take any questions. Okay, I think we'll call it there. Thanks for coming. Cheers. Thanks, guys.