Hi, everyone. Welcome, and thank you for being part of this session. I want to do quick introductions, and then I'm going to get out of the way. I am Madhura Maskasky, one of the co-founders and VP of Product at Platform9 Systems. I want to start by introducing Udi Gold, whom I've had the pleasure of knowing for some time now. Udi is the VP of Infrastructure at PubMatic, a leading ad tech platform for publishers. Udi has been with PubMatic since 2014. Before that, he was the infrastructure wizard at Pure Storage, so he knows a thing or two about infrastructure management and operations. When Udi joined PubMatic, he was given the almost impossible task of taking their massive deployment in Amazon, which they had just started migrating out of Amazon and into a colocation and private data-center-based setup, making sense of it, and putting a good set of management around it. So I want to hand it over to Udi and let him share his story with you. Thank you.

Thank you, Madhura. Can you hear me? Am I OK? OK. So, just a small adjustment of the facts: I joined PubMatic in order to move away from Datapipe, which was a hosted service provider whose cost went way up. We moved to a data center of our own. Then we had to move to Amazon, for reasons that I will mention in the presentation. And we have now ended up back at home with an OpenStack private cloud, in a hybrid environment that is part cloud-hosted and part our private cloud. I call it cloud hopping, in the spirit of bar hopping. It's a lot of fun.

If there is one thing, or a few things, that I would like you to remember at the end of this presentation, it is this: public cloud is not good or bad. For some, it works very well. For others, it doesn't work at all, for a variety of reasons.
And everyone who walks into the adventure of a public cloud needs to take several things into account in order to avoid the kind of situation PubMatic found itself in. And size does matter: when you have a large operation and you go to a public cloud, you had better design your architecture, the software stack from top to bottom, from the get-go to fit and take advantage of the public cloud. Otherwise, you end up with a lot of spending. And if you find yourself in a situation where you are in a public cloud and your spending is way too high, the decision to move a production environment, something that is generating revenue for the business, from the public cloud to an in-house cloud is a painful one, and the execution is painful as well. It requires a mental shift in management. Mainly, the CFO is driving it, but the VP of Engineering and the VP of Sales are kind of holding back, because it's scary. And it is scary, but you had better prepare for it, and the earlier the better. And OpenStack, for me, is like being in a relationship: the fact that you have OpenStack doesn't mean a lot. You need to work on it, improve it, put more features to use. Otherwise, it's like shelfware that nobody really uses.

So to start, I will talk a little bit about PubMatic, and then I will tell you our story, which, like all great stories, essentially starts with alcohol or bad decisions. We were in hyper-growth mode. We will review that, and how we had to get out of it. PubMatic is a leading provider in the ad tech industry. We do real-time bidding: when you click on a web page, the New York Times, eBay, whatever, nobody knows in advance which ad is actually going to appear on your screen, whether mobile or desktop. We receive the request, run a real-time auction between a variety of vendors and publishers that want to compete for the ad space, make the decision, and deliver the ad in real time.
End to end, this transaction takes less than 300 milliseconds. We also give publishers the ability to utilize their electronic resources: we connect publishers and advertisers and give them a marketing platform to make the best use of the resources they're spending. We operate multiple data centers. Some of them are physical; some of them are in the cloud. And that's part of our business strategy: we start small with the cloud, and we migrate to a large data center that we build. We have very high-volume data and very high-volume network traffic, which always needs to stay under a certain performance threshold; otherwise, it's a problem. And we have a very large analytics system that consists of hundreds, if not thousands, of nodes. So that's PubMatic.

And this is a story I have to share, about the most expensive taxi ride of my life. On my first trip to India (we have a large office in India), I landed in Mumbai at 3 o'clock in the morning. I was sick, I was tired, I was hungry, and the driver who was supposed to pick me up didn't wait for me. What do I do? A nice guy in a uniform, with a tag, looking very formal, came up and offered me help. I had been warned not to deal with that, so I told him that I was fine, thank you very much. He waited. I was hungry, I was tired, I was sick. He came back after 20 minutes. This time, I pulled out the phone number of our office administrator, and he actually called one of our guys, talked to him in the local language, and our guy told him what to do with me. He came back to me and said, OK, I'm going to put you on this, and that's going to take you where you need to go. I said, OK, I don't have local money, how much? It was $20. For those of you who live in the US, $20 for a taxi ride might be a little expensive, but it's reasonable. This ride should have cost me about 100 to 200 rupees, which is maybe $1.50. I paid $20.
I knew before I got into this that I was being ripped off, but I walked in anyhow. And that's something we need to remember, because sometimes you know what you're starting, and you know why you started. I knew why I got into that rickshaw: I needed to get to a bed. But you never know how it will end. And when you go to the public cloud, that's exactly it.

So we had a situation, the typical cycle: development needs new machines to test on. They need hardware. There is no hardware to put into production. What do we do? How do we do it without buying more hardware? The CFO doesn't give any budget. We have the solution: we go to the public cloud. Easy to say. Our scenario was this. Production infrastructure was running at very high utilization, which was very satisfying. A large development effort started on our new analytics system, which is quite impressive, I have to say, and I encourage you to check it out. They needed several hundred servers for development, QA, et cetera, and then production, and at the same time, I didn't have the hardware to give them. The plans to retire old hardware and hand it to them had changed because of the high volume and the growth. So engineering development is on hold, and I'm responsible for all of this. What can be done?

To cloud or not to cloud? When we go to the public cloud, what do we actually get? We get the hardware. We get the network. And we get management and control: some sort of a dashboard that helps us deploy and see status. What don't we get when we go to the public cloud? Operating system installations; network architecture and deployment; security, with a variety of patches that need to be applied and machines getting hacked that somebody needs to respond to; and of course, all of the application stack, which is all mine anyhow. So that's something to remember: a public cloud does not equal a data center.
And a lot of people make the mistake of saying, oh, I will go to the cloud, take 100 instances, and I have a data center. And the answer is no, because public cloud is CPU for hire. If you need a CPU 24 hours a day, then the public cloud is going to bite you on cost. That's what happened to us. If you do plan to go to the cloud, and you design your application accordingly so you can drop half of your capacity with no impact on the business, no impact on business processes, on billing, on anything, then yes, it might be the solution for you, because then you actually use it for what it was designed for. But if you go to the public cloud and spin up a Hadoop cluster with 500 nodes, then even if a machine has zero CPU activity, you cannot drop it, because it's holding part of your Hadoop file system. That's a problem.

In the public cloud, we still need resource management, system administrators of our own, and network monitoring, or a NOC, to control the environment. I need people who understand what we are doing; those are people whose cost I have to bear anyhow. I need to deal with security, and again, those are tools I need to buy, either SaaS or on-prem, and people to handle them. And there are all sorts of challenges: troubleshooting latency issues, machines dropping, needing to call the provider to fix things. And the cost is unpredictable, can be very expensive, and is very difficult to control. Just as an example, we had a QA person who, by mistake, spun up 500 machines and realized it the day after. So we basically dropped money down the drain. I can scream and yell and kick, but my guy did it, and I have to pay. And if I give someone the capability to do this, I need to make sure that I also cover it in my spending. And dealing with a public cloud, in many cases, means a credit card or some sort of an invoice.
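The CPU-for-hire point can be made concrete with a rough sketch. The hourly rate, fleet sizes, and peak window below are assumed for illustration, not PubMatic's actual figures:

```python
HOURS_PER_MONTH = 730
HOURLY_RATE = 0.50  # assumed $/hour per instance


def monthly_cost(nodes: int, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running `nodes` instances for `hours` each in one month."""
    return nodes * hours * HOURLY_RATE


# A 500-node Hadoop cluster holding HDFS data can never scale down,
# so every node bills around the clock:
always_on = monthly_cost(500)

# An application designed for elasticity might need all 500 nodes only
# during an assumed 8-hour daily peak, and half of them the rest of the time:
peak_hours = 8 * 30
elastic = monthly_cost(500, peak_hours) + monthly_cost(250, HOURS_PER_MONTH - peak_hours)

print(f"always-on: ${always_on:,.0f}/month, elastic: ${elastic:,.0f}/month")
# always-on: $182,500/month, elastic: $121,250/month
```

Even this crude model shows the shape of the problem: a cluster that cannot shed idle nodes forfeits the one thing on-demand pricing is designed to reward.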
A credit card is not really a relationship with someone you can actually talk to, negotiate with, explain things to, get some leeway from, whatever you need to do. It's not a real relationship.

So, just to give you an idea, plans versus what happened. I did not use the real numbers, because I didn't want to scare anyone, so I used ratios. If the cost that we anticipated for the public cloud was x, that's how we budgeted everything; the reality looked like this. To be honest, we started, let's say, with a requirement of 1. The requirements changed during the project to about 3.5 times the original 1. But the end cost was between 17 and 18 times what was originally predicted, and about five times what it should have been even against the 3.5 plan that evolved during the project.

So now we have a problem, because we are heavily invested. We have over 1,000 production nodes in the public cloud. The cost has skyrocketed. When I come to the office, I go through the back door and crawl under the table so the CFO will not see me, because he was hunting me three times a day: what are we going to do about it? And I'm not kidding; it was unpleasant. And at the same time, we have an environment to run. We have customers relying on this platform, and we need to do something. The cost was unsustainable: six figures, very close to seven, on a monthly basis, and it fluctuated. One month it was another $80,000; the next month, another $120,000. And there was not a lot that I could do. We tried to talk with the vendor, tried to negotiate, to change something, to do something; not a lot of success. So what do we need to do to get out of it? That was the constant question from the CFO. We needed hardware, network, and management and control, because that's why we went there to begin with. Hardware we have; we have multiple data centers, so that was fairly easy. But the most difficult part was management and control.
So what does it take to build management and control? Usually, you download software from the net, or sign up with a vendor, get some people, and invest time and money to deliver it. So guess what: we're back to square one. Should we go to the cloud in order to develop our own management and control?

So we defined the requirements. The requirements were: we will use our own existing hardware without making changes to it. We will have centralized management that allows us to control, deploy, and monitor things in our environment, and we will delegate that to specific organizational units. For example, QA will have their own environment. They will manage it. They will get a set of hypervisors and spin up whatever VMs they need. And as long as there is no change in capacity, they can tear them down, put them back in, destroy and redeploy them as much as they'd like, without needing to come to the IT guys. I put it this way: instead of being constrained by IT, they are enabled by IT to do what they need to do. The platform will support virtualization, Docker containers, and VMs. We will have a hybrid infrastructure that we can manage from a single place. We will have vendor support and the ability to work with someone who will listen and actually help us change things where we need them. And it will be as simple an integration as possible.

So the question is: is OpenStack the answer? OpenStack reminds me of Linux 20 years ago, for those who are mature enough and experienced enough. Linux came out, and people said it was never going to fly: it's open source, who is going to support it, who is going to use it, definitely not in production, blah, blah, blah. Look where we are today. OpenStack is in the same position. It's like teenage sex: everybody talks about it, but not so many actually do it. And yes, everybody wants to have it. It's a great idea.
And people are playing with it and testing it, but I'm not familiar with a massive infrastructure that is deployed fully and solely on OpenStack. I don't know everyone, and I don't know everything, so I might be wrong. However, OpenStack provides support to provision, configure, deploy, and manage stuff. It allows me to provide a self-service environment to my organization, without needing to open a ticket and go back to corporate IT or the data center team: give me more hardware, give me new VMs, we need this, we need that. Those days are over. And it operates in the main areas that we need: storage, network, and management of VMs.

This is where I got in touch with Platform9, which was a very early startup back then. They were very attentive. I gave them the requirements, I explained what I needed, and they were willing to listen and work with me on their roadmap, which actually allowed me to benefit from it. They were hardware agnostic; I didn't even have to take down a machine in order to bring it into the environment. That's how simple it was. They provided me with tenant management, so I can assign a manager whose view of the world is exactly what they manage. They supported, and that's a change they made for me, because I have multiple data centers, some in the Americas, some in Asia Pacific, a way to identify where systems physically reside. So although everything is virtually one environment, I can segregate and tag a data center and say, this group is in that area. They provided me with very close and dedicated support. And I will not lie to you: there are some issues from time to time. We face some things. And they are always on top of their game. So that's exactly what I need.
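The tenant delegation described here can be modeled in miniature. The sketch below is a toy, not the Platform9 or OpenStack API: a tenant gets a fixed vCPU quota, and its manager can create and destroy VMs freely inside it without opening IT tickets.

```python
class Tenant:
    """Toy model of a delegated tenant with a fixed vCPU quota."""

    def __init__(self, name: str, quota_vcpus: int):
        self.name = name
        self.quota_vcpus = quota_vcpus
        self.vms = {}  # VM name -> vCPUs

    def used_vcpus(self) -> int:
        return sum(self.vms.values())

    def spin_up(self, vm_name: str, vcpus: int) -> bool:
        # Self-service: allowed whenever it fits the quota, no ticket needed.
        if self.used_vcpus() + vcpus > self.quota_vcpus:
            return False  # over quota: go negotiate with another tenant
        self.vms[vm_name] = vcpus
        return True

    def destroy_all(self) -> None:
        # e.g. the QA manager tearing the environment down every night
        self.vms.clear()


qa = Tenant("qa", quota_vcpus=64)
assert qa.spin_up("regression-a", 32)
assert qa.spin_up("regression-b", 32)
assert not qa.spin_up("one-more", 8)  # a capacity change needs a human
qa.destroy_all()                      # nightly rebuild, zero tickets filed
```

In real OpenStack terms this role is played by projects and quotas; the point is only that capacity is the single thing a tenant manager cannot change alone.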
Platform9 supports VMs and Docker containers, and they provide a management dashboard with a consolidated view of my whole environment, where I can see how many CPUs I consume overall, which lets me plan better. They have a very compelling roadmap, in my opinion, which helps me continue to use them. And they have full OpenStack capabilities, which helps me with future procurement, because when I talk with a storage provider, one of the initial questions is: are you OpenStack compliant? It's like TCP: do you support TCP? If not, I'm not going to talk to you. Same with OpenStack today. It's not that I'm heavily invested in it, but it's where I'm heading, and that's why it's important to have it in the environment. So: OpenStack first.

And just as an example of how I divided the environment, is there a laser here? Yes. So if this represents the entire hardware infrastructure that I have, there is a subset that is production, another subset that is a tenant for all of engineering, including QA and others, and a subset of that is managed by the QA guys. Essentially, the QA integration manager manages that infrastructure. He sits in India, and he can decide that every night he destroys the environment and redeploys it to test the next build. That does not impact anyone or anything. And if something goes wrong and he needs more, he can come to the engineering team and say, guys, can you spare a few machines, and move them from this project to that project. They are self-sustaining in terms of managing the resources, which saves me a lot of headaches. And we reduced the number of provisioning tickets by a significant amount.

So, to conclude my journey, there is one point that is missing here. Before all this, we were hosted at Datapipe, which was a plain hosted data center. We moved to our own data centers.
Then we went to the public cloud for a very large project. We moved to a private cloud in our data center, OpenStack-based. And we are now at the hybrid stage, where we have a private cloud and rely on the public cloud side of the hybrid for capacity spikes. And everything is managed from the same place.

So, lessons learned. First of all, this is my story. It's a pragmatic story: my experience, a lot of sleepless nights, a lot of flights to New York, working four days straight because things were not so smooth at the beginning. And it's not black or white. This is my experience; others will tell you that they went to the public cloud, it was the best thing that ever happened, and it saved them a lot. The other thing: remember to be realistic with your expectations and your estimates. I know that every project given to me is going to start a certain way, but for sure it will not end the way it was supposed to. There are always requirement changes and add-ons: oh, we forgot that; we need another dimension in the analytics; we need five more databases; we forgot to discuss whatever it is. Always prepare yourself with enough buffer not to be in a situation where you said the project was going to be $50,000 a month in the public cloud and you end up with a $250,000 invoice, because you will have to explain that. Not engineering, not the architects: you are the one responsible for it. Also, salespeople like to tell you success stories, and there are very good success stories in the public cloud. They might not apply to you; in my case, definitely not. At the same time, if the vendor works with you from the beginning, is included in the architecture design and review, and actively wants it to work for you, then it might be a good solution. One key element to remember: set a limit. Put a line in the sand. Put a boundary that, no matter what, is a go/no-go.
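The go/no-go line can be as mechanical as checking each monthly invoice against limits agreed before the project starts. The figures below are assumed, echoing the budget example from the talk:

```python
BUDGET = 30_000       # planned monthly spend, agreed up front
HARD_LIMIT = 100_000  # the line in the sand: crossing it triggers plan B


def go_no_go(monthly_invoice: float) -> str:
    """Decide mechanically, before sunk costs decide for you."""
    if monthly_invoice <= BUDGET:
        return "go"
    if monthly_invoice <= HARD_LIMIT:
        return "go, but review"
    return "no-go: execute plan B"


for invoice in (30_000, 60_000, 180_000, 400_000):
    print(f"${invoice:>7,}: {go_no_go(invoice)}")
# $180,000 and $400,000 both land on plan B
```

The point of the story is exactly that, without a pre-agreed hard limit, every invoice looks like only a small increment over the previous one.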
In our case, we started with $30,000 a month, which quickly became $60,000, but that was OK, because it was still around the budget. When we hit $180,000 a month, people started to question whether this was the right solution. By the time we were at $400,000 a month, it was way too late to go back, and from there it only increased. And the idea of pulling it out was much more complicated, much more expensive, and much more painful. If you set a hard line no matter what, if the budget is, let's say, $30,000 a month, and you say, if I cross $100,000, I shut down this project, or I move it into my data center, or here is my plan B or plan C, then when you approach that line, you will already have a plan, and it will not linger from one month to the next until you find yourself eight months into the adventure, totally helpless. So, as I said: better earlier than later.

And for me, two years ago, if you had come to me and said, you're going to be based on OpenStack, I would have said, yeah, sure, whatever. The reality is that with OpenStack, now I can do more. I'm looking now at taking a variety of empty disk space that I have and turning it into an object store, for uses I couldn't support beforehand. And other solutions: SDN and a variety of things. We're evaluating a lot of things that OpenStack basically enabled us to start looking at. So, as I said, this is one of my requirements and prerequisites for any vendor. And with that, I'm open for questions. The only request: if you have a question, please go stand by the mic so I can hear you.

Hi, I have a quick question. The cost when you were in the public cloud: those numbers are clearly shocking. But I'm wondering, how much of that reflects mismanagement, like the 500 instances that got spun up accidentally, and how much of that was actual run rate?

So, we had two incidents where the cost was mismanaged. One of them is that 500, which was 24 hours, so that was not a big deal.
We had another case where one of the machines got hacked, and they spun up 4,000 servers. That ran for 24 hours, we killed them all, and we got refunded for it, so that did not count. So when you say mismanagement, again, it's not black and white, because at some point I deployed measurements to see what the utilization of the servers was, and, not surprisingly, over 50% of the servers were at less than 50% CPU utilization. I was pushing engineering very hard to consolidate: don't keep that type of machine; go to a smaller one, or put two in one; try to be creative. The problem was that a big chunk of this was the Hadoop implementation, and we needed the space. So you can say mismanagement in terms of utilization, but if I had killed those machines, I would have killed some of the data, and we have petabytes of data coming in. So there was not a lot I could do from the operations side. I agree with you: if we had walked into this adventure with wide-open eyes, working with the vendor to make some optimizations or some sort of accommodations, maybe we could have done better. But that was not the case.

OK, thank you. Right here. I wonder if it was an architectural issue on your part. You must have talked to Amazon solution architects, because if it was a long-running Hadoop cluster, you can separate compute and storage. You can put the storage in S3, and only the control plane of the cluster has to be on on-demand instances. The data nodes, as well as the task nodes, can be on spot instances, which are 80% to 90% cheaper. Did you consider that?

Yes. And all of that, again, came up after the fact. When we sat down with Amazon at the beginning, none of that was conveyed. The team deployed, things started to run, and when we faced the situation and checked what needed to be done, one of the solutions was that. But then you need to develop an entire system to manage your spot instances, and when they drop, to spin up something else. We did not have the resources to do that.
And Amazon was not willing to assist us by doing this for us. According to them, they cannot take servers down; they can only go up. So again, if we had done everything from the beginning designed to be on Amazon, things could have been better. And that's what I said about the public cloud not being a data center: the engineering department that started this project treated it like a data center.

A lot has changed, right? With Hadoop 2.0, you don't even have to worry about data nodes that are lost. They automatically come back, and the Spark or MapReduce job will resume from where it left off.

Yes, but that was not available a year and a half ago.

All right, thanks. Yes? So it sounds like when you were in AWS, you weren't leveraging very many of the AWS features. Is that true?

It sounds like what?

You said that you were treating it like a data center, so you weren't using ElastiCache and Elastic Load Balancers and Route 53 and all the kind of mojo that AWS provides at the platform layer.

We did use some of those, probably not all of them.

OK, so my question then is: I can imagine a lot of applications, maybe yours, have a control plane layer with a bunch of services that devs are actively working on, and then a large body of dumb worker bees, maybe Hadoop instances or whatever. Did you consider moving all the dumb worker bees, which are eating up your costs, out of Amazon into a hybrid cloud, while allowing your devs to continue working in Amazon?

Yes, and we tried to move some of our worker bees, as you called them. The problem is that, because of latency, we have multiple components in the environment that must speak to each other very fast. So the idea of moving a portion out of the cloud and having them communicate, even over Direct Connect, would not cut it in terms of performance. Great.
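The utilization audit mentioned in the earlier answer (over 50% of servers below 50% CPU) boils down to something like the sketch below; the fleet data is invented for illustration:

```python
THRESHOLD = 50.0  # percent CPU; below this a server is a consolidation candidate

# Invented sample of average CPU utilization per server:
fleet = {
    "hadoop-dn-01": 12.0,  # low CPU, but holds HDFS blocks: cannot be killed
    "hadoop-dn-02": 9.5,
    "etl-01": 35.0,
    "bidder-01": 88.0,
    "bidder-02": 91.0,
    "api-01": 72.0,
}

candidates = sorted(h for h, cpu in fleet.items() if cpu < THRESHOLD)
share = len(candidates) / len(fleet)
print(f"{share:.0%} of servers below {THRESHOLD:.0f}% CPU: {candidates}")
```

As the answer points out, low CPU alone is not a kill list: the Hadoop data nodes in this sample are idle by that measure, yet hold data that would be lost along with them.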
So you said you started with a plan of about $30,000 per month, and it went all the way up to potentially $200,000 per month. At that point, how many instances were there?

I don't remember the specific details, but we're talking about hundreds of VMs, not thousands. At the end, we got to 2,000, yes.

OK. So actually, continuing on that question: I definitely see the use case for moving from public to private. For a company like yours doing that, you had this OPEX of $35K that went to $400K. Can you break it down into how much was commodity infrastructure hardware spend, whether with Platform9 or not, like a percentage on commodity hardware, on DevOps outsourcing, or T&M? Did it really reduce your OPEX versus AWS when you converted to a private cloud?

Oh, absolutely. We saved easily $6 to $8 million in OPEX a year. That being said, the capex that we have is already ours; meaning, even if I had to go and buy the hardware to use, it would cost less than the OPEX. Any way we slice and dice the financials of that adventure, the only advantage of going to the public cloud was the speed. In a matter of hours, we had the environment up, which you can quantify in money, because if I want to deploy that myself, it's going to be three or four weeks before I place 50 or 100 racks in the data center. That being said, again, the monthly OPEX today is 10% of what it was.

So I'm just curious: at the end, you mentioned you're hybrid, some part public, some part private. Why not all private?

Why not all private? Very good question. Our traffic is very seasonal and follows a very clear pattern, and in order to accommodate peak time, I would need a certain amount of hardware that would be idle 75% of the time.
So rather than pay for the data center to run my equipment and waste power when I don't need it 75% of the time, I should do something with that hardware: either sell it to somebody who wants to use it as a cloud and sell them services, or not have it in the first place and overflow to the public cloud when I need to. And then the cost is fairly small.

Yep, I have two questions. First: how long did it take you, from the time you decided to move to OpenStack, to your first deployment? And second: could you give us a top-level view of the infrastructure that OpenStack is running on, in terms of network speeds or ports?

The answer to the first question is: thanks to Platform9, it was fairly easy. I would say weeks. But again, I don't have the staff to deal with OpenStack on a daily basis. So I contracted with them, we negotiated, they gave me the software, and we started to run. We had a few bumps, but within a few weeks, we had a lot of the infrastructure ready to go. Not all of my infrastructure is on Platform9; I'm not solely OpenStack; I have modules that are not. But for what I needed, it's a fairly easy implementation. When you talk about ports, what exactly is the second question?

No, it's not about a brand. We have hundreds of racks of gear. We have hundreds of switches. I don't remember the numbers off the top of my head, but it's 500 switches, 700 switches, something like that. Any other questions? Okay, thank you very much.