All right, I think I'll get started. Hi everyone, my name is Nicolas Brousse. I'm Director of Operations Engineering at Adobe for the Adobe Advertising Cloud. I'm here to talk a little bit about our OpenStack journey and what led us to cloud bursting and all this exciting stuff.

First, let me briefly introduce what the Adobe Advertising Cloud is. It's really the industry's first end-to-end platform for managing advertising from traditional TV to any digital format. It makes it simple to deliver video, display, and search advertising across any screen, in any format. What I want to talk about specifically today is the part called programmatic ad buying. That's one specific piece that came up over the years, and it's really what this platform is about and what this talk is going to discuss.

Originally, most of the ads in this industry were bought through a complex manual process: RFPs, insertion orders. You can picture the sales people sending a bunch of faxes, trying to get inventory on some website and sell it to some clients. Even where the ad delivery was in some way automated, there was still a very heavy sales process going on. Nowadays, it's fully software-based. You can see it as a stock market: you go to a website to watch a video, and the ad you may see before your video is the result of a real-time bidding process. Different services get called and asked: this user is about to watch this video, do you want to bid? And you have to make the decision in-process: OK, I want to bid on this user, and I want to show this specific ad. Depending on your bid and on the competition, you may win or lose the auction.

It's very transformative, and it means there is a lot going on nowadays on all those platforms, which translates into a lot of demanding technical challenges for us. One of them is definitely latency. You have to be able to make your decision in-process in a few milliseconds, with a 95th percentile generally around 50 milliseconds, so you have enough room left for the network and things like that. That puts real constraints on what you're doing. It's also high-volume traffic: we are processing hundreds of billions of requests every day. All those web requests get transformed into ad auctions, all those auctions come to us, and we have to decide: do we want to bid, do we care about this user, do we have an ad to show this user or not? And to make that decision, we have to store huge data sets: a lot of information, billions of objects. You can think of them as cookies or device IDs, things like that, which are used to target users, to decide which segment you may be in and which ad will be the most relevant to you. All of this presents a new set of challenges in serving a relevant ad to the user quickly.
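To make the latency constraint concrete, here is a minimal sketch of a latency-budgeted bid handler. Everything in it (the function names, the 50/50 budget split, the request fields) is hypothetical and only illustrates the shape of the problem, not our actual bidder:

```python
import time

# The talk cites a ~50 ms 95th-percentile target for the whole decision,
# leaving headroom for network transit on top of it.
BUDGET_MS = 50.0

def handle_bid_request(request, lookup_user_segments, pick_ad):
    """Decide whether to bid on one auction within the latency budget.

    `lookup_user_segments` and `pick_ad` stand in for the real data-store
    lookup and ad-selection logic; both are illustrative placeholders.
    """
    start = time.monotonic()

    # Targeting data: cookies / device IDs resolved into user segments.
    segments = lookup_user_segments(request["user_id"])
    elapsed_ms = (time.monotonic() - start) * 1000.0

    # If the lookup ate half the budget, or we know nothing about the
    # user, pass on the auction: a late bid is worth nothing.
    if elapsed_ms > BUDGET_MS / 2 or not segments:
        return None  # no bid

    ad_id, price = pick_ad(segments, request["placement"])
    if ad_id is None:
        return None
    return {"ad_id": ad_id, "bid_price": price}
```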
We started all that in the public cloud. Public cloud can be very helpful, but as you grow and have to manage such huge volumes, controlling costs becomes a real pain. Each cloud provider may have a different way of measuring costs, and you may have a hard time building relevant projections. It can get very cumbersome. You also lose quite a bit of visibility, and you hit technical limitations on performance. You have to do a lot of iteration to understand things like packet-per-second limits, or resources stolen by a noisy neighbor. Especially when there is an incident or a network problem, it becomes even harder to troubleshoot and analyze. From there it turns into a blame game between your provider and yourself while you try to figure out the best strategy. There's a narrative you suffer from: oh, but you're not multi-AZ, you're not multi-region. You can't necessarily do that in every use case. If you are latency-sensitive, you can't easily think about multi-region, for example. So you need a solution to those problems so you can still serve your business within the limitations you have.

Private cloud works, on paper: you are in control of everything, you know everything, and you perform the best. At least, that's what you want from a private cloud. So you may start looking at private cloud and decide that's where you want to be. It's a very natural thought process; you believe you can do better. But the reality is you will most likely end up with some hybrid cloud, where you try to leverage what you are building in-house. You have some data center, bare-metal deployments; you have some solutions on a public cloud provider; and depending on the use case, you try to leverage the best solution on each platform. You may end up with very siloed infrastructures that become hard to manage. That may work for some use cases, but as your platform grows, it becomes less obvious that you benefit from each solution, or that you have a truly standardized, automated infrastructure.

With private cloud come some scaling challenges, and the obvious one is that you can't scale quickly enough. You have your hardware, your bare metal; you don't have the same flexibility as making an API call and getting a hundred or a thousand VMs provisioned for you. It becomes a new problem, or an old problem if you were already in a data center before. And if you mess that up, it really impacts your stability and your growth, which is not something you want to gamble on.

From there, you naturally want to implement cloud bursting. That's the obvious thought to have: OK, I have my core in-house, I have all these bare-metal servers, I have built my private cloud, but I still want to be able to burst some workloads because I have seasonal traffic and things like that. You want cloud bursting to quickly overflow your compute resources onto public cloud, whichever public cloud you chose. That helps you mitigate a controlled peak of traffic, seasonal traffic, and so on. It also gives you some buffer for procurement delays. You still want your core in-house, but you know procurement may take three months, six months, depending on the market, if SSDs are out of stock or whatever. You have to factor in all those details that are complicated and annoying, and the business doesn't really care about them; you have to deliver a service. So the buffer is useful. But it's not that easy to do. Overflowing: that's a picture of the Oroville Dam in California. I don't know if you followed this story, but after something like 60 years, that was the first time they used this spillway.
It was kind of a disaster. Overflow is very hard to build right. It's not something you want to use all the time; it's really your safety net in some way. And it can be very costly to operate. At the beginning you think it's a good option, and then you realize it's not that cheap to run.

So now I'm going to go a bit deeper into our journey, into what brought us to cloud bursting with the Advertising Cloud. To us, it's really a journey about infrastructure automation. It's more than just OpenStack, or cloud bursting, or private versus public cloud. It's really about how we automate our infrastructure and how we scale the service to best serve our clients.

The cloud journey started in 2011 with just an ideation, mostly driven by cost and performance. At the time, public cloud was definitely already a standard, but a lot was still changing. EBS was just getting there for Amazon; I think VPC wasn't there yet, or was just starting to be. On the private cloud side, there were a lot of open-source projects coming up: Eucalyptus, OpenNebula, OpenStack, CloudStack. Everybody was thinking about it. In 2012, we tried to evaluate that. We were more and more interested, because the technical challenges we were facing could not be addressed in a meaningful way with our public cloud provider, and we were still growing. We evaluated Eucalyptus, and we were looking at EC2 compatibility; API compatibility was very important for us at that time. The reality is that we were not able to make meaningful progress on this. It requires a lot of effort. In 2013, we brought in some vendors to help us, trying to follow the Zynga model, which was heavily based on CloudStack at the time. That was kind of a big failure. We were not excited about what we were getting; we didn't see any cost savings, and we didn't see how we would get the technical solution we were looking for, so there was no way we'd move further on that. Meanwhile, OpenStack took off. The community was growing, and there was real appeal in looking at it. All the modularity of OpenStack gave us confidence that we could probably find the technical solution we were looking for, so we started down that path.

2014 was really the do-it-yourself year. That's when we started to design and build our infrastructure to fit our needs, and that's really what drove the rest of the journey: we built something that was meaningful for us. We had to stick to the core; we had challenges with every block-storage solution, but we went through it. In 2015, we got our first production cluster deployed. That was the make-it-work moment. There were definitely challenges, but in the end we found good success with this deployment. We also had to accept some realities: we totally gave up on EC2 compatibility. There was no real point for us anymore. Having multiple APIs to deal with is not that big a deal; it's easy to automate, to build your own wrapper or tooling (there's a sketch of that idea below), and as the years go by there are new tools that make it even easier, tools like Terraform and such. In 2016, we grew even more. We added three more data center locations, and we spent a lot of time scaling and automating.
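For what it's worth, that wrapper idea can be as thin as a shared interface over each provider's native client, rather than chasing cross-cloud API compatibility. This is a sketch under that assumption; `ComputeBackend`, `provision`, and the method names are invented for illustration:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Instance:
    provider: str
    instance_id: str
    private_ip: str

class ComputeBackend(Protocol):
    """Minimal interface each provider client implements.

    The point is not API compatibility between clouds: each backend
    wraps its own native client (novaclient, boto3, ...) and keeps its
    own image names, flavors, and failure modes.
    """
    def create_instances(self, image: str, flavor: str, count: int) -> list[Instance]: ...
    def delete_instance(self, instance_id: str) -> None: ...

def provision(backend: ComputeBackend, image: str, flavor: str, count: int) -> list[Instance]:
    # The wrapper only standardizes how our tooling calls the providers;
    # per-provider behavior differences stay visible and acknowledged.
    return backend.create_instances(image=image, flavor=flavor, count=count)
```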
We did a lot of iteration to really understand and improve the platform and make sure we could scale it. And we went through this journey with a very specific philosophy. We are not an IT shop; we are not here to build an OpenStack solution we can resell to someone. That's not our core business. At the time, TubeMogul was a small startup growing fast; we are now part of the Adobe Advertising Cloud, supporting this growth. We had to be very lean. We didn't have a hundred people to manage this solution; it came from three people working together, practically overnight, trying to build a solution that worked for our needs. So we really embraced the value of doing a lot with a little. We didn't care about what was nice to have, or what was the fancy or latest trending option for OpenStack. It was really: how is this going to serve our problem and solve our problem?

We wanted to fully embrace the cattle-not-pets approach. When we looked at the infrastructure we wanted to deploy, we were not talking about putting two or three servers in a rack somewhere. We took the approach that what we were designing was a set of full racks, designed end to end, so that what gets delivered to the data center is a full rack. Our minimum footprint was actually two racks at the time. That's a strong commitment, because it means that if our need was smaller than that amount of servers, we would stick to public cloud. That's very important. The other principle we keep moving with is to automate continuously, in every aspect. We want to automate everything. Being a lean operations team, being few people, we can't afford not to automate. We have to make things easy to rebuild, to repeat, to iterate on, so everything needs to be automated. All our bare metal, all our OpenStack deployment, is done as an almost push-button kind of deployment.

What we have today is six OpenStack locations; we are in six geographies. And we have a multi-cloud philosophy, where some of these physical OpenStack deployments are actually a mix of bare metal, OpenStack, and public cloud solutions, and we really try to integrate all of this together and take the best of each one. I'll get into a bit more detail soon.

Some of the challenges with this: there was a long learning curve. I think nowadays it's much easier, because there is far more collective knowledge about what can be done and how. But it's still a huge learning curve, especially to deploy an OpenStack infrastructure. The complexity of deployment, upgrades, and patching is also a big challenge. You really need to put an extensive amount of effort into automating everything, from your PXE deployment to your configuration management on OpenStack to the release and management of your VMs; all of that needs to be thought through. The other challenge is that there are a lot of immature options. There are a lot of things that just don't work, or don't work for your use case. That's where understanding your use case becomes critical.

Some things we thought we wanted and gave up on: cloud API compatibility. In my opinion, it's kind of a utopia, and it's really not necessary. In the end, each cloud provider, each cloud solution, has its own versions, its own tech, its own way of doing things.
Trying to abstract that away behind a single API layer just hides the performance characteristics and the challenges you will hit in the end. In my opinion, it's better to acknowledge it and deal with it. You will get different behavior from different cloud providers, from your private solution, from your bare metal, from your VMs, depending on how you configure them. You need to take all those parameters into account. And the big challenge for private deployments is really that there is no mature on-premise block-storage solution. Unless you buy something from a vendor, buy something expensive, or have an army of people working to build something stable, you will have a lot of trouble there.

So going through the journey, for us, meant sticking to the core of OpenStack and keeping it simple. We could not afford to build a specialized deployment or start looking at Ironic and other projects; that was not what we were trying to do. Having a simple network design so we could scale and iterate quickly was also critical for us. We leveraged VLANs, which allowed our bare-metal servers and our VMs to communicate easily without an extra layer of management.

We keep our operations very lean, and we do leverage a VAR. It's very important to get the value-add from your VAR; if you don't, you just have a reseller, and that's pointless. For us, they do the full work: they make sure the color coding is there, the labeling is there, the cables are routed the right way. When they don't do that, it gets us upset, obviously, but that's what we expect from a VAR. And that matters because we want a full rack delivered. We don't want to go back and say, oh, this server needs a bit more memory, or a bit more disk. If we are at the point of tweaking what one server needs instead of standardizing globally, we are better off going to public cloud, changing our image from time to time, and things like that.

The other thing we acknowledged is that we are not in the '90s anymore. We really tried to leverage the technology of the day and embrace what was working to get our infrastructure fully automated. So there is a lot of code. Our whole pipeline is a CI/CD pipeline where we can code and automate everything. We leverage configuration management tools, a full Jenkins workflow, and things like that.

One of the outcomes was an observed cost saving of 30% on a reduced server footprint. I heard someone this morning during the keynote mention a 75% cost saving. I don't know how those people do the math. We didn't see that. When we look at the numbers and are fair with our public cloud provider, meaning we include the headcount that had to be added to build and run our private cloud solution, we only end up with a 30% cost saving. That's still very significant when you spend a lot of money every month; 30% can mean a lot. But you have to be fair and account for the engineering effort you put in. It's not just the OPEX; it's also the human resources and all the investment you have to make, and keep making, to improve your infrastructure. You need to keep up the ongoing engineering effort to make sure your platform is living and evolving.
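As a toy version of that "be fair with your provider" math (every number below is invented to show the shape of the calculation, not our actual costs):

```python
# Illustrative TCO comparison only: all figures are made up.

public_cloud_monthly = 1_000_000   # hypothetical public cloud bill ($/month)

# Private cloud: amortized hardware + colo + network, plus the headcount
# that is easy to forget when quoting dramatic savings numbers.
hardware_amortized   = 400_000     # servers and racks over 36 months ($/month)
datacenter_opex      = 180_000     # space, power, transit ($/month)
engineers            = 6
loaded_cost_per_eng  = 20_000      # salary plus overhead ($/month)

private_cloud_monthly = (
    hardware_amortized
    + datacenter_opex
    + engineers * loaded_cost_per_eng
)

saving = 1 - private_cloud_monthly / public_cloud_monthly
print(f"net saving: {saving:.0%}")  # -> "net saving: 30%" with these numbers
```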
It also gave us clear visibility into what is going on in our stack. We learned a lot about our own applications, things we could not easily see or learn in a public cloud. We understand our workloads and network traffic better, and the bottlenecks that used to be hidden or hard to catch, we are now able to find more easily. The real bottom line is that you need a strong technical need for this. It can't be purely cost-driven, because in many cases a 30% cost saving may not be enough compared to the opportunity cost of not moving faster. You really need a compelling use case.

And we can't avoid it: now we have new challenges, namely hyper-growth and seasonal traffic peaks. Being in-house may not let us answer those. We have a difficult time doing proper capacity planning. When you grow quickly, when your use cases change, when new features are being deployed, when you have a big advertising traffic season, it becomes very hard to predict how the business is going to do, and you obviously want to do better every time. And you know you have long delays to procure hardware. If in two days a business guy comes to tell me, hey, we have a big deal coming and we are going to triple the business tomorrow or next week, I won't be able to solve that problem with my private cloud deployment unless I planned a huge buffer. That's why we wanted to really implement cloud bursting and make it a reality for us, so we can still answer the business use case, still move with the business, and not prevent the business from growing.

We identified the compute candidates that were able to burst. In our use case, it's the bidding system (the bidder) and the ad-serving system that are the legitimate candidates. We did a POC to validate feasibility, and the POC was very straightforward: basically just a VM in an AWS VPC, a bare-metal server, an SSH layer-2 tunnel, everything on the same network. We just put that behind our load balancer in-house, sent traffic to it, and watched what happened. We knew we would have more latency, we knew things would happen, but it allowed us to collect metrics and really understand what we would get if we scaled this up. What would the impact be? Could we truly do cloud bursting? It was very important for us to measure the service impact. Are we able to deliver the service? Can every cloud provider answer this need? If they are in Oregon and we are in San Jose, California, we may have too much latency.

So how do we do that? Measuring this data was critical; it let us build it at scale. And the reality for us was that if we wanted to be able to burst 20 to 30% of our workload, we needed a 10-gig AWS Direct Connect, always on, basically. We had to commit to that. So that becomes a new baseline for our infrastructure: if we want bursting capability, we need this link ready to be used at any time. That means you start paying for the 10-gig port every month, even if you don't use your bursting capability. We link that directly to our OpenStack network, we just play with some routing, and we have an AWS VPC that is entirely dedicated to our bursting; that's the one in which we provision our new compute nodes. We automate all the scaling, and as always, when new tools become available to us, we try to leverage them and improve our system.
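A back-of-the-envelope for why a 10-gig link becomes the baseline might look like this; the request rate and per-request byte counts are made up, and only the reasoning is the point:

```python
# Sizing sketch with invented numbers: bursting N% of a stateless tier
# still drags its data-store traffic across the interconnect, and that
# traffic is what sizes the Direct Connect.

requests_per_sec  = 2_000_000   # hypothetical peak auction rate
burst_fraction    = 0.30        # share of the workload running in AWS
bytes_per_request = 1_500       # request plus data-store round trips, per auction

burst_bps = requests_per_sec * burst_fraction * bytes_per_request * 8
print(f"steady burst traffic: {burst_bps / 1e9:.1f} Gbit/s")  # ~7.2 Gbit/s

# At ~7 Gbit/s of steady traffic, VPN-class tunnels (on the order of
# 1 Gbit/s) are out, and a 10 Gbit/s Direct Connect kept always on
# becomes the practical baseline.
```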
Terraform makes it very easy for us to deploy a new set of resources in an AWS VPC. The only thing left to do on our side is to make sure the CI/CD pipeline that deploys the application still works, so we can deploy to the AWS VPC seamlessly. From there we have a cloud-bursting solution, but there are still limits, and I've touched on some of them. You have to understand the constraints of cloud bursting.

The network is probably the biggest one. Depending on how your private or data center deployment is built, you may have a 1-gig, 10-gig, 20-gig, or 100-gig path, but that's a hard limit on how much you can push through, and your public cloud provider will have limits of its own. I believe the virtual private gateway on Amazon has a limit on how much bandwidth you can push through, and if you try to leverage the AWS VPN solution you'll have an even smaller limit, I think around 1 gig or so. So whatever way you implement it, keep it very simple and really understand those blockers, those new limits. It may not be just the CPU, or the load balancing, or how your compute performs in that cloud; it's kind of a moot point if you can't even get the traffic there. Latency is going to be a big one too.

The other constraint is understanding the cost. If you have a commitment for a Direct Connect with Amazon, you pay for the port, that's fine. But when you start bursting, you also need to pay for the bandwidth, and that's not a price per gigabit of capacity; it's the total amount of data transferred, and that can get very pricey at some point. So you really need to understand what you're getting and whether the business case is there.

The good thing is that, at least for us, it lets us burst as needed. But it's very important to keep measuring the service impact. Not every service can burst: for some of our services, the bidding latency is so critical that we can't burst them. They need to be close to our data store, and 2 to 3 extra milliseconds going to our cloud provider would be a killer for those workloads. So we need to measure constantly, and as the application changes, as the interactions with the other backend services change, we need to constantly revisit and monitor that to make sure we are only bursting services that are okay to burst.
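A hedged sketch of that "keep measuring which services may burst" idea: each service declares the extra latency it can tolerate, and only services whose observed cross-link latency fits stay burst-eligible. The service names, budgets, and samples are all illustrative, not our production system:

```python
import statistics

# Pretend measurements of round-trip latency over the interconnect (ms).
LINK_LATENCY_SAMPLES_MS = [2.4, 2.7, 2.5, 3.1, 2.6]

# Per-service tolerance for extra latency when running in the public cloud.
SERVICES = {
    "ad-serving":      {"extra_latency_budget_ms": 10.0},
    "bidder":          {"extra_latency_budget_ms": 5.0},
    "bidder-hot-path": {"extra_latency_budget_ms": 1.0},  # must stay near the data store
}

def burstable_services():
    # Approximate p95 of the observed link latency.
    p95 = statistics.quantiles(LINK_LATENCY_SAMPLES_MS, n=100)[94]
    return [name for name, svc in SERVICES.items()
            if p95 <= svc["extra_latency_budget_ms"]]

print(burstable_services())  # ['ad-serving', 'bidder'] with these samples
```

Re-running a check like this as the application and its backend interactions evolve is the point: a service that was burstable last quarter may not be after a new data-store dependency appears.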
So here we are now. In 2017 we expanded our footprint to two new data center locations in the US, we implemented cloud bursting, as I mentioned, in a very simple and lean way, the way we have done everything, and we started to leverage Terraform. Our next step, moving with the times, is really around SDN and containers. That's something we haven't fully figured out on our side, but we know that at some point we need to deliver that kind of solution to our engineering teams so they can keep moving the product forward.

The bottom line, to me, is that it's really about infrastructure automation. It's more than just OpenStack, more than containers, or private versus public cloud. It's really about how you automate your infrastructure. For us that means: automate everything, code everything, including the network; there shouldn't be any hesitation about that. And standardize all your hardware: software, rack and roll. If you get stuck on the idea of going in to change the memory or the disk of one specific server, you will not succeed with a private or hybrid cloud solution; you really need to standardize your hardware. What was hard before is now way easier, and I think it's important to acknowledge what may feel like a failure. There were discussions this morning about the failure of the private cloud; I think today it's much easier to do, and that's probably an opportunity to look at your business use cases and make sure you leverage the new technology. In the end, I think it's very important not to lose yourself in a tech trend. It's very easy to chase everything; you need to understand your business use case, because in the end, unless your business is to build a storage solution or a compute solution or a public cloud, you are here to delight your customers and to deliver the business solutions they expect from you. That's where we have been focusing a lot, and that has been our motto for a while. And that's it. Any questions?
Q: You mentioned briefly that some of your applications require some data as input. When you run these workloads in the public cloud, how do you deal with moving data back and forth?

A: That was the biggest limitation we saw, and it's something that requires strong collaboration with the engineering team, because we have this back-and-forth to this big data store. We use Aerospike for our NoSQL storage solution. When you have your compute nodes bursting in a public cloud and all your Aerospike calls going back, you can add up to 10 milliseconds of latency for just one or two calls. So that's definitely one of the parameters, and that's why we can't burst every workload, only the ones for which a slightly higher latency is okay. The effort for us now is really to bring this data to the engineering team and try to be creative about how we can change the application to handle this use case. Should we have a copy of this data in the public cloud, so that instead of going back, the workload just uses it in the public cloud where we are bursting? That's definitely something we are looking at, and it's one of the challenges of cloud bursting.

Q: How often do you burst, how long does it take, and what's the biggest challenge?

A: We started doing cloud bursting three or four months ago, and since then we have always had something bursting, because we didn't necessarily do proper capacity planning. So right now I'd say we are bursting all the time. But the ultimate goal is really to keep it as a mitigation plan, more for the Q4 season when we have high peak traffic. It's not something we expect to use for a daily peak, for example; it's really for a big seasonal peak.

Q: What have you found to be the biggest challenge, since you've been doing it for three or four months?

A: The biggest challenge is definitely not putting it in place. I think the biggest challenge is to keep measuring that back-and-forth of data. Your application is really the driver of how efficient and how realistic cloud bursting is, and having this discussion, bringing it back to the engineering team to see what the options are, is probably the difficulty. And the cost aspect of it is not that obvious, so you end up discovering: oh, we are bursting, but now look at our bill.

Q: Is the cleanup, bringing the state or data back from the public cloud into your realm, a fair summary?

A: Sorry, could you say that again?

Q: You have some state in the public cloud, right? At some point you say, I no longer need it, or whatever it is. Do you discard the data, or do you bring it back?

A: Right now, what we burst in the public cloud is completely stateless, so everything that needs to be stored goes back to the data center. And that's where this big network limitation comes from.

All right, there are no more questions. Thank you, everyone.