Hello, everybody. How is everybody doing today? Did anybody get a chance to check out the local barbecue scene? Yeah, I had an opportunity to check out one of the spots yesterday, and I was in a meat coma for a couple of hours. So I apologize ahead of time for the quality of the slides, because they definitely suffered because of that. Anyway, my name is Dmytro, and I'm here today to talk about the true cost of running cloud and IT infrastructure.

Before we jump into the subject, I'd like to introduce myself. I spent a number of years doing research in the area of performance modeling, provisioning, and scheduling. After that, I was a capacity planning engineer at a major video game company, where I was responsible for capacity planning for the data centers hosting the backends of major video game titles. Recently, I started a company with former colleagues of mine called Pax Automa, where we are working on an easy way to run hyperscaler-grade infrastructure for organizations of all sizes. To solve that problem, we decided to build Operos, an operating system that tackles exactly those problems.

So why this topic specifically? In the past, I faced this question myself many times, and a lot of my colleagues and professional friends have faced the very same problem: how much is it going to cost us to run a specific deployment? The problem with the existing landscape is, first, that there is a big shortage of information. I tried a number of times to find a more or less objective study of hosting costs on the internet, and all I found were extremely biased white papers or very incomplete discussions on Reddit and Hacker News. None of that painted the big picture. On top of that, the pricing models offered by, say, Amazon, or the economics of building out your own hardware, get pretty complicated. You have to operate in a space of roughly 30 to 40 different parameters, and coming up with reasonable estimates for those is quite difficult and challenging. Not everybody has a good apparatus for obtaining those estimates and calculating the future cost of running a deployment.

So I decided to build a model based on generalized data from my past experience. I spoke to a number of people running their own deployments in different environments, and they provided feedback on the prices and costs they're seeing. The model shows the cost of running infrastructure, and since this is KubeCon, we're focusing on Kubernetes infrastructure, in different environments. I'll be increasing the size of the infrastructure in the model and watching how the cost changes. That is basically our means of understanding the economics of cloud and on-prem hosting.

As you might have guessed, there are several options for hosting your Kubernetes. Option number one, and it's a very popular option (as we heard in today's keynote, 63% of Kubernetes deployments actually run on EC2), is the cloud. The second option is buying physical hardware and placing it in a carrier hotel with a colocation provider; I'll refer to this case as "colo." And there is a third case, where you build a dedicated facility with cooling and redundant power; you can dig a trench and put some beautiful fiber in it. That's called a data center. You've probably heard of those.
So in this talk I will focus on the first two options, because the third one is very expensive and probably a little out of scope. As a gentle reminder: when you've found a hammer, not everything is a nail. Please don't suffer from Thor syndrome. I'm not here to start a holy war. There are true believers in the cloud, and there are true believers in bare metal; we are not here to engage in heated discussions about the pros and cons of either. I'm here to talk about money. That is the main objective of this talk: to estimate how much it's going to cost. As for the technical advantages and disadvantages, there are plenty of other talks where those are addressed. So please focus on the problem, not on the solution.

So what has changed in terms of server management with the emergence of container orchestrators, and specifically with Kubernetes, the major container orchestrator at the moment? Well, you remember the times when, to deploy a new web tier for your application, you had to file a ticket with your administration team saying, "Hey, I need five servers of such-and-such spec to run this web tier." That was complicated, awkward, and, to be honest, quite tedious. With Kubernetes, it's very different. Your application is effectively decoupled from the hardware. When you deploy it, it doesn't really know where it's going to run. All you have to do is declare how much memory and CPU you want to allocate to a specific pod, and that pod effectively becomes a Tetris block: Kubernetes looks across your servers and tries to find where that block fits best (I'll show a tiny sketch of this idea in a moment). The conclusion is that, in the context of server management, server counts don't matter anymore, and the specs don't matter that much. All that matters is how much CPU and memory you've got. CPU and memory have basically become a utility, and Kubernetes has really helped advance that step. I'm really glad it did.

The second thing: you remember the times when everybody was super proud of the uptime of their servers? Does anybody remember that? OK, just out of curiosity, what was the highest uptime you've seen on a production server? Let's play a game. 1,800 days? Five years. Yeah, we've got the winner. I thought I was the winner; I had something like 395 days. So that was cool. And how many security patches that actually required a restart were missed because of that? All of them. We've got an absolute winner. So guess what: we shouldn't be very proud of it. It's a cool number, and maybe there should be a special competition for uptime, but we no longer have to maintain it at any cost. Why? Because Kubernetes provides a self-healing mechanism that can deal with a failed server, or with a server you need to restart: it simply reschedules your pods, reroutes the traffic somewhere else, and voila, you can reboot the server or let it stay down for some time. You also no longer have to trouble your operations team with "server down" tickets, dispatching remote hands who were probably competing in the Olympics to run as fast as they can, replace that failed SSD, rebuild the RAID, and bring the database back. You don't have to do this anymore.
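To make the Tetris analogy concrete, here is a minimal sketch of first-fit placement. To be clear, this is not the real Kubernetes scheduler, which filters and scores nodes on far more criteria; it only illustrates the idea of fitting requested CPU and memory blocks into free node capacity, and all the names and numbers are invented.

```python
# First-fit "Tetris" placement: a pod declares CPU/memory requests,
# and we look for the first node with enough free capacity.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cpu_free: float              # cores still unallocated
    mem_free: float              # GiB still unallocated
    pods: list = field(default_factory=list)

def place(pod, cpu, mem, nodes):
    """Put the pod on the first node where its 'Tetris block' fits."""
    for node in nodes:
        if node.cpu_free >= cpu and node.mem_free >= mem:
            node.cpu_free -= cpu
            node.mem_free -= mem
            node.pods.append(pod)
            return node.name
    return None  # nothing fits: time to add capacity

nodes = [Node("node-a", cpu_free=8, mem_free=32),
         Node("node-b", cpu_free=16, mem_free=64)]

print(place("web-1", cpu=2, mem=4, nodes=nodes))      # -> node-a
print(place("batch-1", cpu=10, mem=16, nodes=nodes))  # -> node-b
print(place("huge-1", cpu=32, mem=128, nodes=nodes))  # -> None
```

Even in this toy form, the point of the analogy survives: once workloads are described as resource blocks, placement becomes a packing problem rather than a per-server conversation.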
So what has changed in the world of servers themselves? Well, a server is still a big gray box with a lot of blinking lights that costs a lot of money. That part didn't change much. However, the density of compute power, memory, and storage per single unit of server has changed a lot; it has gone up dramatically. Let's look at a white-box vendor, for example this Supermicro FatTwin server. I basically Googled the first site offering Supermicro servers and looked at the sticker price. It costs about $25,000 per chassis and offers 112 cores. Those are physical cores; each of them has two CPU threads, so effectively it's 224 cores visible to your applications, plus 512 gigabytes of RAM. If you spread that over, let's say, a three-year amortization, which I would call quite aggressive (everybody who works in data centers knows you can often keep hardware for as long as five or even seven years), but let's say three years for the sake of simplicity, that's about $8,000 a year.

OK, that was cool. So the next question: how much would a comparable box cost in Amazon? I asked a lot of people who run deployments in Amazon, and everybody said, "We usually run M4 instances." OK. The m4.10xlarge on-demand instance comes with 40 vCPUs. And please remember that a vCPU and a physical core are very different things; there can be a several-fold performance difference between them. With 160 gigs of RAM, it costs about $17,000 a year. The graph on the right shows how much CPU and memory you get per dollar if you buy that FatTwin versus the on-demand instance. Obviously, you can use reserved instances, which require a longer-term contract and cost less, but you can still see that the gap is huge.
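Here is the arithmetic behind that comparison as a small script. The prices are the approximate 2017-era figures from the slides and will drift over time, so treat them as illustrative inputs rather than quotes.

```python
# Back-of-the-envelope compute comparison: amortized colo chassis
# versus an on-demand m4.10xlarge, using the talk's rough prices.

chassis_price = 25_000        # USD, Supermicro FatTwin chassis
chassis_threads = 224         # 112 physical cores x 2 hyperthreads
chassis_ram_gb = 512
years = 3                     # aggressive amortization; 5-7 is common

colo_per_year = chassis_price / years           # ~$8,333/year

ec2_per_year = 17_000         # USD/year, m4.10xlarge on-demand (approx.)
ec2_vcpus = 40                # a vCPU is a hyperthread, like the 224 above
ec2_ram_gb = 160

print(f"colo: ${colo_per_year / chassis_threads:7.2f} per thread-year, "
      f"${colo_per_year / chassis_ram_gb:6.2f} per GB-year")
print(f"ec2:  ${ec2_per_year / ec2_vcpus:7.2f} per vCPU-year, "
      f"${ec2_per_year / ec2_ram_gb:6.2f} per GB-year")
```

Per thread and per gigabyte, the colo box comes out roughly an order of magnitude cheaper, and that is the gap the next slides start to close.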
Well, let's close that gap a little bit, because physical servers don't run in a vacuum. You need power, and you need to place them somewhere. The caveat with physical servers is that the geographical placement of a server determines the cost of running it there, because the cost of power differs a lot. If you want to run your colo in Nevada, that's probably going to be cheaper than running it in Tokyo or in downtown New York. The prices I used in this model are shown on this slide, and they represent a fairly general case. So that closes the gap a little bit already.

Now let's talk about the network, and that's a very interesting piece, because you need to connect those servers together and somehow connect them to the outside world. Network architecture has made a major leap with the emergence of the so-called Clos network design, which allows you to run a more scalable network. Effectively, your backbone can scale out if needed, it can tolerate component failures, and it provides very, very substantial east-west bandwidth. Why does east-west bandwidth matter? Like, this is east, and this would be west; east-west is the bandwidth between these two guys. It matters because when you run production systems, those systems very often make calls to other production systems, which in turn call other production systems. It's not uncommon for one request arriving from the outside to fan out into about 40 requests happening inside, because that's the microservice architecture everybody is trying to achieve. That's why what matters is not only the bandwidth in and out of the perimeter; the bandwidth from one server to another matters a lot too. So if you decide to go with a colo, make sure that your east-west bandwidth is very, very well thought through and that your latency is bounded. Why latency? A lot of the devices offered on the market these days have really, really big buffers that can store a lot of packets. That's nice, but if your packets get into those buffers, they sit there for some time, and that shows up as latency. So you have to make sure the capacity you have is actually sufficient to accommodate all the traffic, so that the traffic doesn't occasionally get stuck in buffers and cause latency spikes.

So, the network. In a colo, networking will probably cost you $300 to $600 per node over the three years. The external traffic depends on the ISP you can get in your colo, but the general conclusion we arrived at was that it costs about $1 per megabit per month. In the cloud, the situation is different. In EC2 (and I will be referring to EC2 as "the cloud"), internal traffic is free; traffic within one availability zone costs nothing, which is very nice. However, the external traffic is more expensive, at about $0.12 per gigabyte. So the difference between the two is that with EC2 you pay for metered traffic, which is pretty sweet, while with a colo you pay for the pipe. Sizing the pipe, and how saturated you keep it, is up to you. However, the pipe can be a lot cheaper.
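To see why the pipe can win at volume, here is a toy comparison using the hedged rates above. The 100 TB figure and the rule of sizing the pipe at twice the average are my assumptions for illustration, not numbers from the talk.

```python
# Metered egress vs a flat pipe, using the talk's ballpark rates.

METERED_PER_GB = 0.12      # USD per GB out of EC2 (approximate, tiered)
PIPE_PER_MBPS_MONTH = 1.0  # USD per Mbit/s of commit per month (ballpark)

egress_gb = 100_000                             # 100 TB leaving per month
avg_mbps = egress_gb * 8_000 / (30 * 86_400)    # GB -> Mbit, over a month

metered = egress_gb * METERED_PER_GB
pipe = 2 * avg_mbps * PIPE_PER_MBPS_MONTH       # pipe sized at 2x average

print(f"metered egress: ${metered:,.0f}/month")          # ~$12,000
print(f"flat pipe:      ${pipe:,.0f}/month "
      f"({avg_mbps:,.0f} Mbit/s average, 2x headroom)")  # ~$617
```

The catch, of course, is that the pipe is a fixed commitment whether you fill it or not, while metered billing follows your traffic down as well as up.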
Now, storage. The general conclusion was that most people run their storage on EBS in Amazon. And EBS is amazing, because a volume can be attached to any instance and it has some redundancy built in, so that's pretty great. However, per dollar per month it gives you about 7.69 gigabytes of stored data (quite generous) and about 14 IOPS. That's not a lot, to be honest. If you look at a commodity SSD available on the market, something you can probably buy in any store, it can handle on the order of 60,000 to 80,000 IOPS, and for NVMe devices that number is almost four times higher. It costs about $500 per terabyte, which translates into about 22 gigabytes of storage per dollar per month and hundreds of IOPS. But you cannot just buy SSDs, throw them into your colo, and wait for them to turn into storage, because unfortunately it doesn't happen like that. Or does it? If you run Ceph in the background on every one of your Kubernetes nodes and expand the storage of your compute servers with those additional SSDs, the same machines can serve as storage servers. So effectively, you shovel SSDs into your colo and they turn into storage. But you have to run Ceph for that, and it doesn't come for free, because nothing comes for free, period.

The data you keep in your active storage, so to say, needs to be snapshotted and backed up, for business continuity purposes, for disaster recovery purposes, et cetera. When it comes to EC2, that's pretty straightforward: you've got S3, and S3 is amazing and very cheap. When it comes to the physical world, well, you've got to buy a box, a nice big server that will keep your colo nice and warm. That heater costs about $70,000 and can accommodate about one petabyte of raw storage. Usually what people run on those is ZFS, and about half of that raw storage ends up actually usable for backups, because you have to enforce some redundancy.

So what we have figured out by now is that in the physical world, everything is a lot cheaper compared to the cloud. However, there is another part of the expense which is very often overlooked, and that's people. People, unfortunately, don't like to talk about other people; I know that's kind of weird. The people costs are very substantial in both solutions. When it comes to the colo solution, you need people, first, to replace failed parts. You don't have to do that as urgently anymore, because your uptime is no longer determined by the uptime of a specific part, but you still need somebody to go around and replace failed drives or memory sticks. You need contractors for doing rack-and-stack; you can have people on staff doing this, but the job needs to be done. You need system administrators who will configure that storage, configure the backup scripts, et cetera, et cetera. You also need network administration: somebody who will look after your network, make sure the packets are flowing nicely and smoothly with nothing clogging up, and configure those switches correctly. When it comes to cost, it's very subjective, because salaries differ a lot between markets in different parts of the world. In this model, I used the number of $250,000 USD; that's usually what it costs a company to have a good-quality system administrator on staff, including travel costs, sick days, et cetera. And for running a colo of a few racks, you probably need about two or three good people on staff for the configuration I mentioned. Please remember, this is the minimum. When it comes to your cloud deployments, even though you don't have to deal with the physical parts and all of that happens through an API, you still need somebody to write and maintain the scripts that call those APIs, and you need somebody who will sit there and respond if something happens, if there is an outage with S3 or something else becomes unavailable.

Yes, this is in a colo. OK, so by "colo" I mean the facility where you bring your racks, and they basically guarantee you: here is your power, here is your uplink, and here is your real estate. That's pretty much it; everything else is on you. There is also another option, where you can have people inside the colo who perform those tasks for you. We could look into that in a different model; in this model, I assume you have to have your own staff. To be honest, my experience with the services provided in colos is that the pricing for those can be very steep in some places. But that's probably OK.
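As a sanity check on where that staffing figure leaves the floor of a colo budget, here is the arithmetic; the salary and team size are the generalized figures above, not quotes for any particular market.

```python
# The fixed "people floor" a colo carries before a single server pays off.

FULLY_LOADED_SALARY = 250_000   # USD/year for a good sysadmin, all-in
MIN_TEAM = 3                    # minimum staff for a colo of a few racks

people_per_month = MIN_TEAM * FULLY_LOADED_SALARY / 12
print(f"staffing floor: ${people_per_month:,.0f}/month")  # $62,500/month
```

That roughly $62,500 a month is most of the plateau you'll see in the cost graphs shortly.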
So if you look at the breakdown of the costs: if you remember, at the beginning of the talk I mentioned there would be about 30 to 40 different moving parts, and that's roughly what we arrived at. When it comes to Amazon, the majority of your cost is compute; those are the EC2 instances. The second part, which very often comes as a surprise to a lot of people, is storage, and the majority of that storage cost is due to the IOPS, because this is the component I've seen overlooked during cost estimates so many times. So if you happen to do an estimate, please remember: count your IOPS, and count them very wisely, because they can hurt you, and they can hurt you by a lot. Data transfer and people come in as numbers three and four, respectively.

If you look at the split of costs in a colo situation, it's very different. The biggest expense is people, because the hardware is a lot cheaper. The second biggest expense is, obviously, chassis: the physical servers you need to buy. Then come racks and power, and if you compare the cost of power to the cost of chassis, those are pretty much comparable. The backups and the uplink, however, come out as very affordable.

OK, so far we've established that the hardware is a lot cheaper but requires people. So let's finish by comparing how you provision those systems. We live in Vancouver, and we like to look at the mountain range through tennis rackets, so there you go. It was kind of funny: I made that slide, and then the next day I caught myself doing the exact same thing.

So let's talk a little bit about demand. Remember I was saying that CPU and memory became a commodity. What happened is that with the adoption of Kubernetes, it became a lot easier to estimate the demand for your infrastructure. Why? Because with the traditional deployment model, you had to do this exercise for every single application, build a margin of error into every single instance of every application, and then combine all of that together to get the estimate. Now it's a lot simpler, because when you add up a lot of entities that each have variation, you get one big entity with a much smaller relative variation. That's called the law of large numbers, and it's the phenomenon everybody reports once they switch from one-to-one deployments to container orchestration or some other form of consolidation: the aggregate workload fluctuates a lot less (there's a tiny simulation of this effect at the end of this section). The workload of a single application can jump up and down, but if you look at the aggregate, it doesn't jump up and down very much.

The most typical scenario is that companies are facing growing workloads. The growth rate can vary a lot, but let's say 40% is a fairly typical number I've seen. The workload usually has some seasonality; for example, the Christmas period can be a little busier than the summer period, or the other way around. And usually it can be predicted, but only to some extent: you can say the demand may be in this range, but you don't know the exact number.
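Here is that quick simulation of the law-of-large-numbers effect. Everything in it is invented for illustration: 100 applications with very noisy loads, comparing the relative spread of one application against the aggregate.

```python
# Individual workloads are noisy; their sum is much smoother in
# relative terms. Distributions here are made up for illustration.
import random

random.seed(1)

def rel_std(xs):
    """Standard deviation divided by the mean (coefficient of variation)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return var ** 0.5 / mean

T = 1_000          # time samples
APPS = 100         # independent noisy applications

per_app = [[max(0.0, random.gauss(10, 5)) for _ in range(T)]
           for _ in range(APPS)]
aggregate = [sum(app[t] for app in per_app) for t in range(T)]

print(f"one app:   relative std ~ {rel_std(per_app[0]):.2f}")  # ~0.45
print(f"aggregate: relative std ~ {rel_std(aggregate):.2f}")   # ~0.05
```

A single app swings by about half its mean; a hundred of them together swing by about a twentieth, which is why the aggregate is so much easier to plan for.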
Satisfying that demand with on-demand instances in EC2 is super easy. That's basically the bread and butter of the cloud; that's how it was advertised to us many, many times. The moment your demand goes up, you allocate additional instances and reschedule pods onto them. The moment it starts going down, you can decommission instances gracefully without causing issues. This was very hard to do with the traditional one-to-one deployments, because you had to do this exercise for every single application; now you only have to do it for your one big application called "the Kubernetes deployment." Usually you still do a little bit of over-provisioning just in case, because we all feel just a little bit scared, and that's good. Usually it doesn't hurt the cost that much.

After some time, it's a common practice to switch from on-demand instances to reserved instances. Why? Because they're cheaper. If we have to run, let's say, 150 instances for a year and we know the count will not go below 100 instances, why not just buy a year's worth of reserved instances for that baseline? So it's very typical to switch from on-demand to reserved once the workload is known and you know its bounds. For those who really like to play with statistics: where is the sweet spot in the ratio of reserved versus on-demand? Because obviously, if everything is on-demand, you're super flexible but it's more expensive, and if everything is reserved, it's cheaper, but you might run into the problem of over-allocating reserved instances, which also contributes to the cost. Finding the most optimal ratio between on-demand and reserved is a little bit out of the scope of this presentation, but the sketch below gives the flavor of the trade-off.
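A toy version of that trade-off, assuming made-up hourly rates; real reserved instances also have upfront payment options that this sketch ignores.

```python
# Reserve some baseline, cover spikes on demand. Rates are illustrative,
# not current AWS pricing.

ON_DEMAND = 2.00   # USD per instance-hour (m4.10xlarge ballpark)
RESERVED = 1.30    # USD per instance-hour, effective 1-year commitment

def yearly_cost(hourly_demand, reserved):
    """Pay for reservations whether used or not; overflow is on-demand."""
    cost = reserved * RESERVED * len(hourly_demand)
    cost += sum(max(0, d - reserved) * ON_DEMAND for d in hourly_demand)
    return cost

# Demand oscillating between 100 and 150 instances across the year:
demand = [100 + (50 if (h // 12) % 2 else 0) for h in range(8_760)]

for r in (0, 100, 125, 150):
    print(f"reserved={r:3d}: ${yearly_cost(demand, r):,.0f}/year")
```

In this toy case, reserving exactly the guaranteed floor of 100 instances wins; where the optimum lands in practice depends on how spiky the demand is and on the real price gap between the two billing modes.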
In a colo, the situation is dramatically different. Provisioning a colo is a much, much slower process. With on-demand, it takes literally tens of minutes, worst case, to provision an instance; with a colo, it can take several months, because you have to communicate with the vendors, make sure they ship on time, make sure the boxes arrive on time, rack them, stack them, do the burn-in tests, et cetera, et cetera. As a result, capacity expansion doesn't happen on a weekly or quarterly basis; most of the time, a colo expands its capacity once or twice a year. So the red dots on this slide show your demand going up and down, with some fuzziness about how the future is going to look, and you still have that delay for adding capacity. So what do you usually do? You over-provision, and you over-provision by a lot, because you need to compensate for growth, unpredictability, and possible delays. And that erodes the cost advantage of the colo by a lot.

I'd like to point out that it is also, I wouldn't say a typical practice, but it's becoming more and more common to see people using the unused capacity in a colo for running batch jobs. What's the difference between batch jobs and online jobs? Online jobs, such as web servers, have to satisfy demand right away; batch jobs can tolerate some delay. I know the Kubernetes resource management group is working on job preemption, which will probably make this even better in the near future. It would allow you, for example, to fully saturate your resources, and if a new online job comes in, it can just kick out some of the batch jobs running in the background. That way you fully satisfy your changing online demand and run the batch jobs at the same time.

So if you put them side by side... oh, sorry, before we jump into that: how can you improve this process? The simplest way is to collect better analytics and make higher-quality predictions. Don't use numbers produced by various parts of the body; use some analytics. If you're just starting, I would highly recommend having a look at the capacity planning book that came out of Yahoo. The book is fairly old in tech-world terms, but all the principles and the apparatus for building linear regression models and doing curve fitting still apply. You can still use it, it can make your life a little easier, and it can help you save some money.

So when we put them side by side, this is the picture we see: with the cloud, you actually use a lot fewer resources compared to the amount you would have to provision in a colo. So all that beauty and cheapness of the colo which we saw in the first slides is offset by quite a bit by the fact that you have to over-provision by a factor of several compared to EC2.

So now let's talk about dollars. This graph shows the cost of the infrastructure in AWS as a function of the number of instances, including all those components from the pie chart. Obviously, it scales up almost linearly, because that's how Amazon billing works. With the colo, the situation is quite different. You have a very, very high plateau, which starts somewhere in the range of $60,000 a month, and as you increase the operational capacity of your colo, the price goes up, but not as steeply as in the case of Amazon, because the cost per unit of resource is a lot cheaper. And if you step back and zoom out of this picture, you can see that there is a sweet crossover point, in the range of 100 to 200 m4.10xlarge instances, where the cost of running your infrastructure in Amazon becomes comparable to the cost of running the same infrastructure in a colo. How does that translate into dollars? Roughly speaking, about $100,000 a month. Once again, this model was built using generalized data; in the context of a specific organization, that point can shift. For example, if you're very IOPS-heavy, then Amazon is probably going to cost even more. If your demand fluctuates a lot and is mostly CPU-bound, then Amazon's solution will probably look more attractive. So, if your workload is smaller than that number of instances, there is not much financial sense in moving to a colo, because all you're going to get is additional provisioning headaches; you're not going to save money, quite the opposite. And the other way around: the larger the infrastructure, the more financial sense it makes to think about running it in a colo.

I cut this graph off at 500 instances. So what happens if you actually want to go further with the colo? There is only a certain scale of colo that can be sustained with rather simple networking, after which the networking becomes a lot more expensive, and it's a stepwise function: you basically have to implement a much more sophisticated network design, and that will drive your costs up a lot.
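To make the shape of that chart concrete, here is a toy reconstruction. The coefficients are chosen only so the curves reproduce what the talk describes (a roughly $60,000 floor and a crossover in the 100 to 200 instance range at about $100,000 a month); they are not real quotes.

```python
# Cloud cost is roughly linear in instance count; colo has a high
# fixed floor and a gentler slope. Illustrative coefficients only.

CLOUD_PER_INSTANCE = 700   # USD/month per instance-equivalent, all-in
COLO_FLOOR = 60_000        # USD/month: people, racks, base networking
COLO_PER_INSTANCE = 280    # USD/month marginal cost per equivalent

def cloud(n): return n * CLOUD_PER_INSTANCE
def colo(n): return COLO_FLOOR + n * COLO_PER_INSTANCE

for n in (50, 100, 150, 200, 500):
    print(f"{n:3d} instances: cloud ${cloud(n):>7,} vs colo ${colo(n):>7,}")

crossover = COLO_FLOOR / (CLOUD_PER_INSTANCE - COLO_PER_INSTANCE)
print(f"break-even around {crossover:.0f} instance-equivalents "
      f"(~${cloud(crossover):,.0f}/month)")
```

Below the break-even point the floor dominates and the cloud wins; above it, the cheaper marginal unit of the colo takes over, which is exactly the crossover on the slide.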
So, the conclusions of this exercise. Hosting in a colo is still a financially feasible option if you reach a certain scale; basically, you have to cross the point of about $100,000 a month to actually benefit from it. People and uncertainty are the biggest contributors to cost overruns, because, if you remember, people are the major component when estimating the cost of running in a colo, and people come with different qualifications; sometimes a team of three won't be sufficient to run a solution like that. You might hire three people, find they can't do it, and have to go further. When it comes to cost modeling in EC2, IOPS are very often overlooked, and they're a big contributor to cost overruns. And the last conclusion: CPU and memory are still among the largest parts of the expense, and there is no easy way of getting around that other than getting smarter about resource management.

So that probably concludes my talk, and I think we still have a few minutes for questions. Yes? ... Yes, that would; that would probably shift the numbers a little bit. Feature parity, that's actually what you're talking about: if you want to achieve the same feature parity, then in order to answer that question you have to do a more rigorous evaluation of the quality of the solutions, because you have to make sure they're production grade, that the quality is there, and estimate how much it would cost to migrate to a different standard. We focused here on the most lock-in-free solutions, I would say, because the whole idea was that you can jump from one environment to another. The option you're talking about would basically introduce hard lock-in, and that is a pretty big fear in a lot of companies. You're stepping into a very politically difficult area there, because my experience is... [Audience follow-up, partially inaudible: "I was asking from the perspective of the people..."] I mean, it shouldn't be very hard to extend the model with this, but that would require narrowing the requirements, and it would probably produce a less general conclusion. And I tried to come up with this golden ratio, which can be used as a ballpark number in discussions, because I've seen wild numbers being used, and that was just driving me crazy. Guys, I think my time is up; if you have questions, please come and talk to me now, or hit me up on Twitter or via email, and I'll answer your questions.