Good afternoon everyone. I'm Pierre Gaunt and this is Pierre Grondin. We are both SREs at TubeMogul, and today we are going to present our implementation of OpenStack: basically, how we built a private cloud that can handle 40 billion requests a day.

A few words about TubeMogul: we have been a public company since the middle of last year. Our purpose is to deliver ads, video ads, for brands. We served more than 27 billion ads last year, and last quarter we served 40 billion ad auctions per day. What you have to understand is that the bidding process is quite specific: it is very latency-sensitive, so performance is a big issue. We need to process each bid in less than 50 milliseconds, and the total time for a bid has to be under 80 milliseconds, otherwise you are basically dropped. In terms of numbers, we push five petabytes of video traffic each month, and in terms of storage we have more than 1.6 exabytes of data. So yes, that's a lot.

As far as we are concerned, we are in the ops engineering team. Our role is to make sure that everything works well, to provide cost-effective solutions (because of course cost matters), and to provide cutting-edge solutions to our developers and give them support. Our team is composed of site reliability engineers, such as system engineers and DBAs, and we are in two locations, the US and Ukraine. In total we manage more than 2,500 servers.

These are the locations we have. As you can see, it is what I would call a standard mix of public and private infrastructure. Speaking of hybrid (everyone says they are hybrid these days), our organization was already kind of hybrid before we started this project. We were in six AWS regions, and we had physical servers in Michigan and Arizona, mainly for web servers and databases. We use external DNS providers, and we do external monitoring through Catchpoint, because monitoring your Amazon zones from inside Amazon is sometimes not ideal. We also serve part of our content, our video content, through CDNs. And last on the hybrid front, we have external providers for security audits.

Based on that, why did we choose OpenStack, or why move to OpenStack? One thing you lose when you start using someone else's cloud is a certain level of knowledge of your own architecture. You want to get that back, because you want to guarantee performance, and when you ask Amazon about that kind of detail, sometimes the answer is "we don't know." Also, in terms of network, physical proximity matters: if you are close to your customer, latency is small, and if you want to be close to a specific customer in a place where Amazon has no presence, you're stuck. You can also plan more easily, hopefully, for how your traffic and your infrastructure will grow, which also helps cost efficiency. Finally, there is technology transparency: we want everything to be as clear as possible for everyone within the company. And of course, the main reason is money.

The interesting part is how you go from this super nice cluster, if I can call it that, of Raspberry Pis to this wonderful, gorgeous, all-green production page. As many people say, it's a journey. It started more than two years ago, in May 2013, when we built our first environment in Emeryville, California, for those who don't know it. That was based on the Grizzly release, with 12 nodes: 240 cores in total, divide them up however you like. The total memory was one terabyte at the time.
It was really a dev environment, not in a real data center or anything like that. Then, at the beginning of 2014, we moved to a real production-like data center with everything: routers, servers. At that time the release was Havana; we had 40 nodes, more than 1,000 cores, and 8 terabytes of RAM. That was nice, but we still needed to learn more, basically. Around mid-2015 we started switching real production traffic; as you can see, that is already two years after the project started. At that time it was Icehouse, and two months ago we actually moved a full zone, in the Amazon AWS sense of the word, onto our own facilities.

From an operational standpoint, what are the issues in doing that? We are not Rackspace. We are not CERN, unfortunately, maybe. We are a small team: 12 members across two time zones. That means we cannot just say "let's add more engineers to the OpenStack project"; that would make no sense. We are basically two or three people dedicated to OpenStack, and other people help as well. So what are the challenges? First, internal training. Many people have talked about that: it is really hard to keep everyone up to date on what you are doing when you are implementing something as cutting-edge as OpenStack. Second, you get little external support as long as you are not relying on vendors, versus Amazon, where you can create tickets and wait for them to be handled. Another new challenge is that you take on a fair amount of hardware management, if I can put it that way: you need new suppliers (Equinix in our case), transit providers, and so on. All of that is needed to run your data centers, and it is time-consuming.

Then you think about migration, and whoever has done computer science in their life knows that migration is always an issue. First, we depend heavily on Amazon. The company was built on Amazon over the years, which means our internal ops tools rely on Amazon, and the developers' applications make assumptions about Amazon. We use a whole variety of their services: S3, DynamoDB, and so on. You also need to know exactly what you are going to migrate and what you are not. And, an important point that everyone makes: you need to tell your developers that you are migrating. They need to help you do it, because it will require some work on their side. A big challenge for us is the short release cycle of OpenStack, but many people have talked about that already. From a user's standpoint it is complicated, because sometimes we spend eight months implementing something, and after eight months you are basically almost two releases behind. You need to select which components you need, but we will get to that a little later. Some things are also very challenging to migrate: in our case, the storage, namely.

Then, how far do you go in controlling your networking and your hardware? As far as the network is concerned, we chose to go as low-level as we could, short of digging roads: getting our own IP ranges and AS number, v4 and v6 for people who want to play with that (even for us it is not really used yet, but it can be). It lets us choose our transit providers; we chose to have only Tier 1 providers, which goes back to the latency issue: we need people who can more or less guarantee latency. It also gives us routes that are as short as possible from and to our endpoints. I mean, there is no guarantee, it's BGP, but that is the idea. It also gives us the freedom and the ability to shape the network however we want.
If tomorrow we need dedicated AWS connections, because many people are on AWS, we can do it. If we need to peer with other ad-exchange networks, we can do it. This is a very nice capability, but it also means more responsibility for us, because until now you basically put everything in Amazon and hoped that their network worked. Now that you own the network, you have to do that work yourself: you need to provide dashboards, you need to give numbers, you need to produce all the graphs, and people love graphs. Another benefit of having your own network is that you control its cost, most of the time at least, so you can try to do somewhat better on cost than Amazon does.

Speaking about hybrid, our network is kind of hybrid too, even if I don't like that term that much. All our applications have been running on Amazon for years, so they are already cloud-aware, you could say, which means they can tolerate breakage: we can lose a machine or an instance and not really care that much. Being a hybrid network also leaves us with a limited set of virtualization issues, because we have already solved them, basically. Once again, owning the network allowed us to have a mix of OpenStack, bare metal, and Amazon. In terms of hardware, we chose the Nexus 5K (hello Cisco, they are here) as our top-of-rack switch, because it should allow us, if we ever want to (we are not doing it yet; it is a big discussion we are having), to push configuration directly onto the switches. I am a bit worried about that part, to be honest. And since Cisco is very committed to OpenStack, it also means we will be able to evolve as much as we can. Again on networking, we chose to have just 1 Gb for admin (people may make different choices) and n times 10 Gb for public traffic. Another really useful thing about owning the network, to take just one example, is that you can dimension it yourself and enable features you could not use before, for example multicast, which is something we have been talking about a lot recently.

Here is a good example of something we did specifically for the load-balancing part. In the classical scenario, a packet coming in from the public network goes through your network node, which then routes it over your private network to your compute node and finally to the instance, and the return traffic follows the same path in reverse. The big disadvantage is that you rely heavily on your network node, with all the high-availability issues that everybody knows. What we did instead is put the load balancers on bare-metal nodes, which lets the traffic follow a slightly different path: when a packet comes in from the public network, it goes directly to the load balancer, which also has an interface in our tenant VLAN. The load balancer can therefore talk directly to the instances on our compute nodes and completely bypass the network node for all traffic coming from the outside through the load balancers. This has two advantages: you offload part of your traffic from the network node, and you no longer depend as much on the network node's high availability; you can leverage your load balancers instead.
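To make that path concrete: one way to give a bare-metal load balancer a leg in the tenant network is to model that VLAN in Neutron as a shared provider network, so the instances and the bare-metal hosts sit on the same L2 segment. The talk does not detail the actual setup, so this is only a minimal sketch using python-neutronclient; the credentials, physical network name, VLAN ID, and CIDR are all hypothetical.

```python
# Hypothetical sketch: expose the tenant VLAN as a shared Neutron provider
# network so bare-metal load balancers and instances share one L2 segment.
# Credentials, physical network name, VLAN ID and CIDR are illustrative.
from neutronclient.v2_0 import client as neutron_client

neutron = neutron_client.Client(
    username='admin', password='secret', tenant_name='admin',
    auth_url='http://controller:5000/v2.0')

# Map the Neutron network onto the physical VLAN the load balancers
# are plugged into.
network = neutron.create_network({'network': {
    'name': 'lb-tenant-vlan',
    'provider:network_type': 'vlan',
    'provider:physical_network': 'physnet1',  # hypothetical bridge mapping
    'provider:segmentation_id': 142,          # hypothetical VLAN ID
    'shared': True,
}})['network']

# Instances booted on this network get addresses the bare-metal load
# balancers can reach directly, bypassing the network node for all
# inbound traffic.
neutron.create_subnet({'subnet': {
    'network_id': network['id'],
    'ip_version': 4,
    'cidr': '10.20.0.0/16',                   # hypothetical tenant range
    'name': 'lb-tenant-subnet',
}})
```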
As I was saying earlier, we need to be close to the AWS network, because we have a partner there, and also because we may want to offload work there; for several reasons we need good latency to them. So I took a measurement this morning. This is a round-trip latency measurement between one of our application servers, a bidder in this case, and one of our gateways that is still in AWS US West. You can see that the average (I could have let it run longer, but the result would have been pretty much the same) is about six milliseconds, which we consider small enough.

So how do we do this? First of all, since we are a very small team, we wanted to keep things very simple. We are not building a multi-thousand-hypervisor cloud, so we do not need anything that complex. One of the big benefits of keeping it simple is that it streamlines day-to-day operations, which helps a team like ours; we are very happy with that. We chose, for example, to use a homemade Puppet catalog instead of the community one, for several reasons: since we do not use all of the OpenStack components, it lets us deploy everything we need with fewer lines of code, which makes our catalog simpler, easier to maintain, and easier to learn when somebody joins the team. We chose not to deploy some components, like Horizon, because we do not need that kind of web UI dashboard. We also chose not to use shared storage, because we do not need live migration, for example; our application already handles possible breakage in the cloud, since it was built for AWS, so we do not have to rely on that feature.

When you own your infrastructure, you can also leverage things like affinity and anti-affinity rules. For example, you can enforce better resiliency for your application with anti-affinity rules. Say we have a logging Elasticsearch cluster: hosting several nodes of that cluster on the same hypervisor is a really bad idea, because if you lose the hypervisor, you lose a big chunk of your cluster. Anti-affinity and affinity rules are actually easy to implement if you leverage the Nova metadata on your instances. You can also improve your application's performance with affinity rules: if part of your application relies on communicating with another service that is clustered, you can ensure a better placement of your instances to minimize the latency between your services.
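Here is a minimal sketch of the Elasticsearch anti-affinity rule just described. The speakers mention leveraging Nova metadata; the standard way to express the rule is a Nova server group with an anti-affinity policy passed as a scheduler hint, which is what this sketch shows. Credentials, image ID, and flavor ID are placeholders.

```python
# Sketch: keep Elasticsearch nodes on distinct hypervisors with a Nova
# server group. Credentials, image and flavor IDs are placeholders.
from novaclient import client as nova_client

nova = nova_client.Client('2', 'admin', 'secret', 'admin',
                          'http://controller:5000/v2.0')

# The scheduler will avoid placing two members of this group on the
# same hypervisor.
group = nova.server_groups.create(name='es-log-cluster',
                                  policies=['anti-affinity'])

for i in range(3):
    nova.servers.create(
        name='es-node-%d' % i,
        image='0a1b2c3d-hypothetical-image-id',
        flavor='4e5f6a7b-hypothetical-flavor-id',
        scheduler_hints={'group': group.id})  # ties the boot to the group
```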
Being a software engineering company, we chose to treat this infrastructure project like any other engineering project. We already use tools like Puppet, and we already use concepts like infrastructure as code. That improves production stability and eases production deployments by raising confidence in what we are actually doing. Regarding Puppet, we already have a lot of automation for all our virtual machines: we did over 10,000 Puppet deployments last year, and over 8,500 production deployments last year using Jenkins. So we chose to use the same tools we use to deploy our virtual machines to deploy the infrastructure running OpenStack.

On the infrastructure itself, we use a masterless approach to deploy the nodes. We bootstrap a node, install the operating system, and then deploy OpenStack and its dependencies using masterless Puppet, which means we do not depend on a Puppet master during the deployment: whatever happens to our Puppet architecture, we can always deploy OpenStack clusters. Once this setup is done, we switch to master mode to deploy everything else, for example monitoring and day-to-day operations. We take the same approach for virtual machines, both on AWS and on OpenStack: a Puppet run is triggered directly at the boot of the instance, and going from booting an instance to having it in a production-ready state usually takes less than five minutes.

Another concept we use is code review. It raises awareness of what is going on in the team and lets us discuss different implementation ideas for a given problem. We use Gerrit for this, which is used by many big organizations, including OpenStack itself, of course. Since we already had Gerrit plugged into every component of our engineering development workflow, it was really easy for us to set up the same workflow for OpenStack. We do code review per commit, to keep each change as small as possible and iterate fast, even on the infrastructure itself. We have integrations with the other tools we use in our daily job, such as Jenkins, Jira, and HipChat. And since we were already managing over 600 Git repositories, adding a few repositories specifically for OpenStack was quite easy.

When somebody pushes a commit, automatic filters are triggered by Gerrit and run on Jenkins. They perform basic checks, for example: is the syntax of your configuration files correct? Everybody knows that YAML, for instance, is quite sensitive to spaces and tabulation. Using Jenkins and Gerrit, we catch these errors before they are ever merged into our trunk.

Here is an example of our current workflow when we work on the infrastructure itself. Let's say Pierre wants to commit a new feature. He pushes a commit to Gerrit, which is then code-reviewed by a member of the team. Once it is approved, it triggers a job on Jenkins, which deploys our full stack on our lab and QA environments, and within a few minutes we get a status telling us whether the change actually does what was expected or whether it broke something. We follow the exact same path for production deployments: when we want to deploy to production, we just trigger the Jenkins jobs that deploy the latest changes on our production cluster. All of this is fully automated.

This is a list of some of the jobs we use for continuous delivery with Jenkins. We have four top-level jobs, for example OpenStack management, load balancers, and Ceph, one for each of the major components we run. Each job is broken into several smaller jobs, which lets us reuse specific tasks between jobs and makes them easier to maintain. For example, when we want to deploy Ceph on our lab cluster, there are six phases, from deploying the operating system to performing health checks and benchmarks at the very end, and every step of each phase is parallelized. Deploying Ceph fully from scratch on our lab environment, from installing the operating system to getting the results of the benchmark, usually takes around 20 minutes.
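The talk does not show the actual bootstrap tooling, but a masterless deployment step essentially boils down to running 'puppet apply' against a locally checked-out catalog, with no puppet master involved. Here is a minimal, hypothetical sketch of such a step driven from Python; the module path and manifest name are purely illustrative.

```python
# Hypothetical sketch of a masterless bootstrap step: apply the local
# Puppet catalog with no puppet master involved, so the deployment works
# even if the Puppet infrastructure itself is down.
import subprocess
import sys

def puppet_apply(manifest, modulepath='/opt/deploy/modules'):
    """Run 'puppet apply' locally and report success."""
    result = subprocess.run(
        ['puppet', 'apply',
         '--modulepath', modulepath,
         '--detailed-exitcodes',
         manifest])
    # With --detailed-exitcodes, 0 means no changes and 2 means changes
    # were applied successfully; anything else is a failure.
    return result.returncode in (0, 2)

if __name__ == '__main__':
    ok = puppet_apply('/opt/deploy/manifests/openstack_node.pp')
    sys.exit(0 if ok else 1)
```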
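And here is a minimal sketch of the kind of syntax gate mentioned above: a check Jenkins can run against a Gerrit change, failing the build if any YAML file in the tree does not parse. The file pattern is an assumption.

```python
# Minimal sketch of the pre-merge syntax gate: Jenkins runs this against a
# Gerrit change and the job fails (non-zero exit) if any YAML file in the
# tree does not parse. The '**/*.yaml' pattern is an assumption.
import glob
import sys
import yaml

def check_yaml_files(pattern='**/*.yaml'):
    failures = []
    for path in glob.glob(pattern, recursive=True):
        try:
            with open(path) as handle:
                yaml.safe_load(handle)
        except yaml.YAMLError as err:
            failures.append((path, err))
    return failures

if __name__ == '__main__':
    failures = check_yaml_files()
    for path, err in failures:
        print('%s: %s' % (path, err))
    sys.exit(1 if failures else 0)
```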
We also maintain team awareness through the HipChat integration. We use HipChat for communication between all the team members and across different teams, and Jenkins posts status messages to our HipChat channels, so whenever something breaks, we get an instant notification. We can have a look at it, and we can also make sure that every member of the team is aware of what is going on in the different parts of the clusters.

Since we have so much automation for development itself, we can leverage it to simplify production deployments, and they become as easy as one, two, three. One: you test your upgrade using Jenkins. Two: once you are sure it works as you expect, you deploy the upgrade by pressing a single button. Three: you just enjoy the rest of your day. This, for example, is our senior director of operations engineering switching our production workload to OpenStack, and this is really how we did it. Press a button, everything switches, done.

Next is monitoring, because it is super nice to have all of that working and deployed across your data center, but you also need to monitor it, and that is the tricky part. As ops, we always want to monitor as much as we can. We already had a lot of existing monitoring using the regular tools, I would say Nagios and Graphite. That was already in place on AWS, and of course you do not want to throw it away, so we kept it for OpenStack. But you need to enhance it, because you need OpenStack-specific checks. The two main things we did are, first, checking the component APIs: you need to check their performance, you need to check their availability, and you need to check that they actually work, because it is nice that they are fast, but if they do not return the results you expect, it is useless. Second, we check resources: available ports, and failed instances, because sometimes instances fail. Your monitoring also needs to be extended to cover all the hardware metrics: the load on all your compute nodes, the load on your network equipment, all the ports and their rates, everything. For the network equipment we use SNMP traps and we collect SNMP data, and for the existing instances that we simply moved onto OpenStack, we basically kept the monitoring that existed on AWS.

We did something very interesting here: auto-discovery of nodes. Whenever we add nodes to the cluster, all the monitoring is created automatically. There are two ways to trigger it: something that runs periodically, like a cron job, or on demand, because if you add 10 nodes in a row, you may not want to wait for the next cron run. How does it work? A job detects that there is a new host by querying the OpenStack API, and it creates all the checks for that host based on its role. Last but not least, the graphing is also updated automatically at the end of the process.
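A hedged sketch of that auto-discovery flow: list the hosts through the Nova API, derive each host's role, and emit the matching Nagios configuration. The role-from-hostname convention and the role-to-checks mapping here are assumptions for illustration; the talk does not describe the exact mechanism.

```python
# Hedged sketch of the auto-discovery flow: list hosts via the Nova API,
# infer each host's role, and print the Nagios objects for it. The
# role-from-hostname convention and the role/check mapping are assumptions.
from novaclient import client as nova_client

CHECKS_BY_ROLE = {                       # hypothetical role -> checks map
    'bidder': ['check_http', 'check_bid_latency'],
    'esnode': ['check_elasticsearch'],
}

def nagios_config_for(server):
    role = server.name.split('-')[0]     # e.g. 'bidder-042' -> 'bidder'
    address = list(server.networks.values())[0][0]
    lines = ['define host {',
             '  host_name %s' % server.name,
             '  address %s' % address,
             '  use generic-host',
             '}']
    for check in CHECKS_BY_ROLE.get(role, ['check_ping']):
        lines += ['define service {',
                  '  host_name %s' % server.name,
                  '  service_description %s' % check,
                  '  check_command %s' % check,
                  '  use generic-service',
                  '}']
    return '\n'.join(lines)

nova = nova_client.Client('2', 'admin', 'secret', 'admin',
                          'http://controller:5000/v2.0')
for server in nova.servers.list():
    print(nagios_config_for(server))     # in practice: write files, reload Nagios
```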
That automation lets us build very nice diagrams like this one, which is basically extracted from our bidders' logs. It shows where the requests come from, geographically, at the bottom, the error rates at the top, packets received on the load balancers, and so on. Another really cool thing we built, leveraging Grafana, is dashboards, because it is very nice to have all your graphs everywhere, but if you have to go through 10 pages, it is not efficient. You need a single page, like this one, that gives you the overall health of your cluster. Here you can see the total bandwidth, the number of public IPs in use, the cluster load (even if it is not super readable from where you sit), and, at the bottom, the bandwidth usage.

As Pierre said earlier, it took us two years to get here, so it is a good time to take a look in the rear-view mirror. We found several benefits in running our own OpenStack setup. One is transparency: visibility into what your application is actually doing. For example, after switching production onto our own platform, we noticed a very odd pattern: a huge spike in traffic every 15 minutes, something we had never seen on AWS. The reason is that on AWS your metrics are usually per instance, whereas this is the kind of pattern you only see when you look at the whole picture of your infrastructure. We were able to find out that a specific component of our application, spread across about 100 nodes, was pushing information to the outside at exactly the same time.

Another benefit is that you can tailor your own instance flavors. For example, if you are on AWS using an m3.xlarge instance and your application now needs two more gigabytes of RAM, your only choice is to move up to an m3.2xlarge: you get twice the CPU, and 30 gigabytes of RAM when you actually needed only 17. That has a big impact on the cost of your infrastructure. When you run OpenStack, you can fine-tune the flavor specifications to what your application really needs.

Another benefit is that, since OpenStack has some really well-designed APIs, it was easy for us to adapt the tools we use to provision our instances and run our platform from AWS to OpenStack. As far as our operations team is concerned, this OpenStack cluster is just a new zone, like any new zone would be in AWS. A huge benefit is that when you take the best of a hybrid model, with virtualized instances and some bare-metal roles, you can really improve the efficiency of your operations. For example, we used to have 32 load balancers for one very specific part of our application; now we only need two, and each of them can actually take the full production workload on its own. We currently serve over one million packets per second on two load balancers, including full SSL traffic, where we used to need more than 32.
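To illustrate the flavor-tailoring point, here is a minimal sketch that creates a flavor sized for the example above: 17 GB of RAM instead of the 30 GB an m3.2xlarge would force on you. The flavor name, vCPU count, and disk size are illustrative.

```python
# Sketch of flavor tailoring: define exactly the shape the application
# needs instead of jumping to the next public-cloud instance size.
# Name, vCPU count and disk size are illustrative.
from novaclient import client as nova_client

nova = nova_client.Client('2', 'admin', 'secret', 'admin',
                          'http://controller:5000/v2.0')

# 17 GB of RAM, what the application actually requires, rather than the
# 30 GB (and doubled CPU) the next AWS instance type would impose.
nova.flavors.create(name='bidder.custom',
                    ram=17408,    # MB
                    vcpus=4,
                    disk=40)      # GB of root disk
```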
There are still some use cases that do not fit an OpenStack environment, downscaling for example. I know it is a huge problem for many people, because as long as you grow, it is fine. But once you buy servers, you cannot exactly sell them back; physically, that would make no sense. Once you have paid for the CPUs, you should use them, so you try to get as much use out of them as you can.

Upscaling, the opposite, is also an issue at some point, because when you buy hardware, you make assumptions: my application is never going to need more than 256 gigabytes of RAM. Meanwhile, AWS keeps refreshing its instance types, and sometimes they introduce instances with more RAM; if your developers start to use them, you are kind of stuck. Also, when developers make a small feature change, it sometimes has a huge impact on load. That is totally manageable on AWS, because you basically spawn more instances, but it is not that elastic on your own hardware. And some workloads really are elastic: in our case, the machine-learning clusters that are spawned every night, or whenever we need them, and there are really a lot of machines in there. You may not want to buy those.

So, what have we learned, in a nutshell? We can be hybrid: we can leverage the best of every world, AWS, OpenStack, and some bare-metal hosts, in the same infrastructure. Whatever the stage of your OpenStack implementation, you will need a dev environment (lab, QA, whatever you call it), something to test your changes, to make sure you do not break things, and to verify that a new version of OpenStack behaves as you expect. Storage is also a big issue for us, because we currently store over 1.6 exabytes of data, and we cannot easily build a cluster ourselves to handle that; some things may stay on AWS forever, storage maybe, and some other parts of the application. Communication between your operations team and your developers is really important, because if you do not get your developers on board for the config tweaks and the optimizations in the way they use instances, your project will be much harder to achieve. We also learned that OpenStack is really, really flexible: we all know there are maybe something like 12 different components; you can pick the ones you need, you can replace a component with hardware if you need to, or even keep some part of your application on AWS. We also learned that there is no need for HA everywhere. It might seem strange, but you can sometimes drop HA by leveraging an external service, for example HAProxy on bare metal, and that lets you build an infrastructure that is much simpler than if you tried to put Neutron DVR everywhere. And spikes can be offloaded to Amazon, which is usually referred to as cloud bursting. We still have that capability today: it is actually how we did the migration, and if we need to, we can still send part of our traffic back to AWS for a disaster-recovery test, or if we face a huge unexpected peak of traffic.
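As a rough illustration of cloud bursting, a sketch along these lines could watch capacity on the private cloud and spill extra instances onto EC2 when headroom runs out. The talk only states that the capability exists; the threshold, AMI, instance type, and region below are entirely hypothetical.

```python
# Hypothetical cloud-bursting sketch: when vCPU usage on the private
# cloud crosses a threshold, spill extra capacity onto EC2. Threshold,
# AMI, instance type and region are all illustrative.
import boto3
from novaclient import client as nova_client

nova = nova_client.Client('2', 'admin', 'secret', 'admin',
                          'http://controller:5000/v2.0')

stats = nova.hypervisor_stats.statistics()
used_ratio = float(stats.vcpus_used) / stats.vcpus

if used_ratio > 0.85:                    # hypothetical burst threshold
    ec2 = boto3.client('ec2', region_name='us-west-2')
    ec2.run_instances(ImageId='ami-12345678',      # hypothetical AMI
                      InstanceType='c3.xlarge',
                      MinCount=1, MaxCount=4)      # spill up to 4 instances
```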
We still have a lot left to do, despite having worked on this project for two years. On the technical side, we need to migrate the other AWS regions: we have only done one so far, and we are currently working on a second one. We need to gain more experience running a production-grade OpenStack cluster, because it is always full of surprises. We need to work on version upgrades: as Pierre mentioned earlier, the release cycle is six months, and it is quite difficult to keep up with the pace. We need to keep adapting our tooling, to make sure we are monitoring everything we need and that our tools are aware of the differences between AWS and OpenStack. We also need to work on better capacity planning, because right now it takes weeks to expand our clusters. We have some concerns about the new regions; maybe we will run into new issues there. And the human aspect still needs work: for example, our dev team is still used to the AWS world, referring to a specific instance flavor when they really just mean a given amount of RAM.

So, the aftermath: we have been serving production traffic since May 2015, fully in production since September 2015, and we have had 100% uptime since March. The cost of operations for the production workload has been divided by two, including OPEX. So it makes us quite happy. Any questions? Can you use the microphone, please?

Question: You mentioned that not every component has to be HA. So are your OpenStack controllers highly available? What about networking and storage?

Yes. Our storage cluster runs Ceph, and Ceph being distributed storage, it is already HA by design; we currently have seven monitors for the production cluster, for example. Regarding the management nodes for OpenStack, we have different setups: for example, we use an active-passive pair for the MySQL database. Is this answering your question? About messaging: we have an active-active RabbitMQ cluster, of course, with mirrored queues. Any other question?

Question: How big is the Ceph cluster that you have, in terms of nodes and storage?

Currently we have 16 nodes, and on each node we have two OSDs of 3 TB each, so 6 TB of raw storage per node, with the journals on SSDs. We are considering adding an extra OSD on each physical node, because we still have one slot available. With a replication factor of three, we are somewhere around 35 TB of usable storage. We actually migrated from a smaller Ceph cluster to this new cluster last month using the workflow we described: we ran the tests on our lab using Jenkins, and we did the production deployment with Jenkins too.

The question was: are the SSD journals on RAID 0? No, we do not use RAID for the journals. We currently share one journal SSD between two OSDs, but we have some very high-end SSD drives, so we do not notice it: even with two OSDs on the same journal device, our bottleneck is currently the data device itself rather than the journal. Maybe something we can add on that: I think today, if we were to redesign things, if you can avoid spinning disks, do it. Yeah, if you have enough money. Actually, our first Ceph production cluster was running on Fusion-io drives, but it does not really scale; it is too expensive.

No more questions? Thanks everyone for attending.