How are you guys doing? Great, great. All right, thank you very much, everyone, for coming here, and I hope you have an amazing day and a great show. Let me introduce Adam Chesterton, director of engineering at the Warner Music Group; I'm Renat Khasanshyn with Altoros. Today we're going to share with you a journey that Adam, myself, and our teams (some of you are here, thank you very much) went through together, and the lessons we learned along the way. So thank you very much, and Adam, please.

Right, thank you, Renat. So we're going to talk today about Warner Music Group's recent path. Over the last 12 months a few things have changed in our cloud strategy, as we came around and adopted the public cloud, and that's what I'm here to talk to you about today. To get started, I want to cast back 12 months from where we are today. From WMG's point of view, we were very early adopters of Cloud Foundry; we've essentially been using it since 2011 or 2012. We were at a point where the platform was about three years mature, we were using the community version, and I think we're still one of the largest organizations using the community version for production workloads today. So, going back those 12 months: we were using multiple cloud providers, some private, some public, five different providers in total, and by this point we had multiple business applications running on the platform with genuinely business-sensitive workloads. But then something happened, and we hit a moment where we were in a really crowded space. Rather than being multi-cloud in terms of providers, we really had a hybrid-cloud environment, and the question we asked ourselves was: were we using the cloud in the right way? We had our lower environments, development, QA, and testing, in the public cloud,
and in the private cloud we had our staging and production-type environments, spread across multiple different providers. We had actually managed to crack the way to rapidly build, test, and deploy code into Cloud Foundry, but then we started to experience some other issues, and Renat will talk about a few of those.

Yeah, great. So when we first started moving the workloads to an OpenStack-based environment, with a managed OpenStack provider, we quickly realized that the environment was not as reliable as we had hoped. One of the biggest issues we faced early on was the inconsistency of the OpenStack API when it came to giving BOSH the virtual machines, the pool of resources used to then deploy Cloud Foundry or even some of the BOSH components. Sometimes we would request, say, 10 VMs; most of the time we would get all 10, but sometimes we would get two, or nothing at all, and you'd be left wondering what was going on. We tried to get answers ourselves, but unfortunately everyone was in the same situation. This was back in 2013, early 2014, when we started standing up this environment with the OpenStack provider, and anyone who has dealt with OpenStack knows it has come a long way in maturity since then; back in 2013 you can imagine how mature it was. So we were having issues with the environment, just like pretty much everybody else, and although we hoped the managed provider would take care of the problem themselves, it seemed no one had access to the proper logs, not them, not us. We were all confused, and we had to stand up the environment quickly, so because of that uncertainty in getting VMs out of OpenStack, we faced these issues early on. Then, once we were able to get the environment up and running, we would provision BOSH first, then provision Cloud Foundry with BOSH, and everything would seem perfect until we started hitting timeouts.
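One lesson baked into our tooling afterward: never assume the IaaS actually delivered what you asked for. A minimal sketch of a verify-and-retry loop, where `provision` is a hypothetical stand-in for the IaaS API call (not the actual BOSH or OpenStack interface):

```python
import time

def provision_with_retry(provision, requested, max_attempts=3, backoff_s=30):
    """Ask a flaky IaaS API for VMs, verify how many actually arrived,
    and retry only the shortfall instead of assuming success."""
    vms = []
    for attempt in range(1, max_attempts + 1):
        shortfall = requested - len(vms)
        vms.extend(provision(shortfall))  # may return fewer VMs than asked for
        if len(vms) >= requested:
            return vms
        print(f"attempt {attempt}: {len(vms)}/{requested} VMs, retrying shortfall")
        time.sleep(backoff_s * attempt)   # back off before asking again
    raise RuntimeError(f"only {len(vms)}/{requested} VMs after {max_attempts} attempts")
```

In our case the real fix was upstream (a later OpenStack release), but a guard like this at least turns a silent shortfall into a visible, retryable event.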
They came unexpectedly, out of nowhere, and the problem was not so much the timeouts themselves; stuff happens, right? The problem was that, going back to what Adam said, remember we were coming from an AWS-only environment, and for us the two OpenStack regions were supposed to be the staging and production environments, while the developers were supposed to be using the dev and test environments in AWS in the US; we would then bring the applications from AWS into staging and then production. Apparently, the reason we had those errors, the timeouts, had to do with OpenStack itself. It was a common issue that many OpenStack operators faced at the time, and ultimately it was fixed with the next release of OpenStack; it was patched first, but then the next release came out and the issues disappeared. The key problem was not the fact that we had issues, you always have them; the problem we experienced was that we could not get consistent results. We were not testing the same thing. We thought Cloud Foundry would make the environments the same everywhere, that we could go from one cloud to another, but the reality was a little bit different. All these outside factors contributed to one key point: we could not guarantee a certain time of the week, or the day, or in the future, when we would actually be ready for a production deployment. All of these issues ultimately resulted in situations where we had to say, "Hey guys, we can probably do it, but we cannot promise anything." And that's what happened: the end users, those lines of business, those developer groups that were bringing their applications and expected a reliable environment they could go to production with, were asking, "What's going on, guys? Can you really give us something?
Can you promise anything?" With those completely unreliable, unpredictable situations, we could not promise anything, and obviously that wasn't good.

Cool, thank you. So yeah, we did a big pause, a big reset, and took a look at the strategy we had. As Renat was saying, it was becoming hard to predict the unknown. This whole hybrid-cloud situation we were in was affecting how we set deadlines for the business, for new applications, and for new projects we were onboarding onto Cloud Foundry. Do you go to that place where you become over-cautious about your deliverables, which really goes against the power of what Cloud Foundry can do in terms of rapid deployment? Or do you take a risk and think, hey, it may be smooth, it may not be, and then end up in a very sticky situation with business stakeholders? We found it wasn't code or anything like that causing us problems; it was these external factors, and at this point we had a platform that was three years mature, where these are the sorts of things we really shouldn't have been having. We would talk with our vendors (I mentioned we were using five) every time we had one of these issues, and the recommendation that kept coming back to us as we did the post-mortems was: look, you guys need to go one way or the other. And we didn't really have a compelling reason to stay with the model we had. So we took this pause and a reset, thought about the strategy we had, and, looking back over the last three years as we grew and expanded our Cloud Foundry presence within Warner Music, we took a look out and saw how much the public cloud had evolved and adapted in terms of things like security and reliability. And WMG has come to this conference multiple
times and spoken about our cloud vision. One of the reasons we built the platform the way we did was that we expected the price of compute and disk space to decrease over time, and with the price wars between the multiple public cloud providers, that's exactly what happened. But one thing we did start to see was that the reliability of the public cloud also started to increase, and because we had this hybrid-cloud environment, we started seeing our lower environments get very similar, if not comparable, uptime to our managed private-cloud providers, which going back three or four years was just not the case. So we went and did a comparison. Do you want to talk through some of the stuff we did?

Well, yeah. First of all, the challenges we had were not just what you just described. Remember, this whole story started in 2013. I have a question: how many of you actually attended the 2013 event, the first Cloud Foundry conference, right here in this very building? Quite a few. If you remember, we had one of the great early visionaries and evangelists of Cloud Foundry, Jonathan Murray of Warner Music Group; he was very passionate about making Cloud Foundry work, building the composable enterprise, and bringing agility into the organization. So we faced not just the issues we were talking about with OpenStack; we were also coming off Cloud Foundry 1.0, and we had a bunch of components that were customized, like the HAProxy router, a bunch of stuff; it became a snowball. Then add five or six cloud providers to it, and you have many more problems than you have hands. So we were in a situation where we wanted to simplify the risk areas, our risk profile, and that's why we came back to wanting an environment that would be reliable, that would allow us, the team responsible for Cloud Foundry, to promise the business user that yes, that is the
date, and stand behind it. That was one of the key benefits we were seeking: reliability, and being able to stay true to our promises with the public cloud.

So we sat back, we did the comparisons, we took all of this on board, and we came to the conclusion that we wanted to go public cloud all the way. That was not as simple as we thought. As I mentioned a couple of times, we already had infrastructure there, but we didn't want to just take what we had in the public cloud and move it straight into the exact same model. We were three years mature at this point, and we wanted to take advantage of some of the changes and enhancements that had happened in the public cloud. So what did we do next? We went out and spoke with AWS architects about ways we could take advantage of things like VPCs, ELBs, Direct Connect, and multiple availability zones, about how we could provide a model for all of our environments, and we basically looked to rebuild everything from scratch. Automation was extremely important for us, so we worked very hard to get things like CloudFormation in place to build out the foundation for Cloud Foundry, and then used BOSH on top of that to deploy across all of the environments we had. Security was of course a really big factor, and there have been so many enhancements in the security remit of the public cloud over the last few years. A bunch of our colleagues went to re:Invent last year with the purpose of trying to find partners to help us with this challenge, and I think one thing that came out very clearly is that there are a lot of people in this space; this is a real problem out there, people needing help. If I take something on my own and go down the public cloud
route, I need help; if I go to a managed service, I get all of that as part of the cost. So it's a popular trend, and we went and listened, and aligned ourselves with the right partner that fitted the business and operating model we had. Another thing that was very important to us was the upgrade path of Cloud Foundry. We were very excited about Diego, we wanted to move to it, and we wanted to build up resiliency within Cloud Foundry, which was something we really had to do to move into this multi-availability-zone model. We wanted to make sure we had a path that kept us in line with upgrades, so we could take advantage of some of the great stuff that's coming. This whole project kicked off around the beginning of last summer, and we fully completed it in February this year. So, a couple of the lessons learned. Do you want to take the first one?

Yeah. Well, first of all, being able to promise the lines of business when they can get their applications into the environments, and what kind of service levels they can expect, is a gigantic advantage if you are operating with multiple consumers. And the price we turned out to be willing to pay for those expectations, for the availability and reliability of what we can promise, was actually much higher than what we initially expected to pay; at first we thought, oh, Cloud Foundry will take care of it, but that wasn't the case. Controlling costs was also an issue. I'm not going to speak on this, but I can add that we are not going to stop here: we're also looking into some seemingly very cool ways to control costs in the public cloud, even going all the way to looking at spot instances and how we can use them for some of the workloads. But now back to Adam for what we actually did on cost.

Absolutely. I mean, cost is of course very important. Coming from a music company, the music
industry is just not what it was 10, 20, 30-plus years ago, so cost is a really important factor for us: every dollar we're spending is a dollar we're not spending on artists, on finding new artists, on marketing, and we have a commitment back to the company to make sure we're spending the company's money in the right way. One of the things we learned, something we really factored in during the migration, was that when you're spinning up new environments, you're going to run parallel infrastructure for a period of time. Through really thorough planning and testing we minimized the amount of time we had both environments up: we had them up, we had the ability to test, we spun things down, spun things back up, and it allowed us to really manage our costs so that we weren't running with essentially two production environments for a considerable amount of time. We've also talked a lot, especially after the migration, about utilizing things such as reserved instances. For those services that are basically up 24/7, taking the reserved-instance approach helps us easily save 40 to 60 percent compared with on-demand pricing. And one of the beauties of combining Cloud Foundry with AWS economics is that I can have all my DEAs as reserved instances, and if I need more capacity I can spin another DEA up very quickly on demand; I'm not committed, I'm paying as I go, and I can give that capacity back if I find I don't need it. If, as I'm onboarding more applications, I find I do need that capacity in the long term, I can convert it immediately to reserved-instance pricing and get those benefits. From an accounting point of view, it means we can be very creative and make sure we're spending money well. So from a WMG perspective, this whole initiative was a good nine-month effort.
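The reserved-versus-on-demand arithmetic above is easy to sanity-check. A sketch with purely illustrative hourly rates (not WMG's numbers; check current AWS pricing for real figures):

```python
HOURS_PER_YEAR = 24 * 365

def annual_cost(hourly_rate, hours=HOURS_PER_YEAR):
    """Cost of one instance running for the given number of hours."""
    return hourly_rate * hours

def reserved_savings(on_demand_hourly, reserved_hourly):
    """Fraction saved by running reserved instead of on demand, 24x7."""
    return 1 - reserved_hourly / on_demand_hourly

# Illustrative rates only: the steady 24x7 DEA fleet runs reserved, and burst
# DEAs are added on demand, converted to reserved only if they prove long-lived.
on_demand, reserved = 0.10, 0.055          # $/hour (hypothetical)
steady = 8 * annual_cost(reserved)         # 8 always-on DEAs, reserved
burst = 2 * annual_cost(on_demand, 300)    # 2 extra DEAs for ~300 spike hours
print(f"savings on steady fleet: {reserved_savings(on_demand, reserved):.0%}")
print(f"annual fleet cost: ${steady + burst:,.2f}")
```

The shape of the model, not the numbers, is the point: reserve what runs around the clock, pay as you go for the spikes.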
What were the immediate benefits for us? One of the big things is that we accomplished exactly what we wanted by moving solely into the public cloud: we removed this whole class of random unknowns that kept happening. It means that when we're talking with the business and setting expectations, we can be significantly more accurate. One of the best lessons learned from the entire project is that we migrated with absolutely no disruption to the business users or applications; it was completely transparent to them, and that came down to the planning, the testing, the UAT, and rehearsing the migration multiple times before we actually performed it. We didn't disrupt SLAs or anything of that sort, and for the end users performance is the same; we have all the metrics, we've done all that testing, and in some ways the SLAs have actually improved, because we can now give better commitments. When business users come to us wanting functionality, we've got a much better sense of the whole end-to-end picture of what happens.

So, elasticity. A few lessons we learned about elasticity: apparently the public cloud is not as elastic as you may think it is. What we experienced is that if you need X instances of a certain type, AWS may seem like an infinite pool of resources, but when you cannot get them, you're left wondering what's going on. The reality is that if you are using instances of a certain type and you need a lot of them, sometimes you need to talk to the folks at AWS to make sure the capacity is there, or will be there, pre-reserved and so on. What we learned is that if your workloads are not subject to extreme spikes, sometimes the elasticity you get in a managed private cloud, managed by the vendor, may be as elastic as you need. So unless you're
flexible in the types of instances you're using, keep that in mind. Overall, we find it really great to work with a public cloud provider such as AWS; however, what we also find is that some of the services look extremely tempting to end users, and they start reaching for them, and what we learned is that sometimes those services are not production-grade yet because they're very new. Adam, maybe you can share an example?

Absolutely. A couple of examples we have here, especially around elasticity: in the private cloud you can have as much elasticity as you pay for, but in the public cloud, one of the things we learned very quickly, especially once you start talking about things like spot instances, is that if you're buying on demand and you're after a very specific instance type, you cannot guarantee it is always going to be there. AWS may come back to you and say, hey, I don't have it in this region, but I could give it to you somewhere else. So one thing for the future, as you're building applications and going down that model: try to be flexible in the instance types you use, and if you do need a guarantee, say you have a process you run every night that needs a certain amount of capacity, build the application to scale based on the instance type and size you actually get back.

Yeah, and for those of you who would like to dig much deeper into the details and take some of the lessons learned home, we have prepared a handout; can we get a few copies over there, and I'll help distribute them. We put together a lot of resources here. Essentially it's a reference architecture for something as simple as the US East and US West AWS regions with Cloud Foundry in each, and we have the patterns and anti-patterns, five of each.
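The "be flexible in instance types" advice above can be captured as a preference-ordered fallback. A sketch, where `launch` stands in for the real IaaS call and `CapacityError` for whatever capacity failure it reports (both hypothetical names, not an actual AWS API):

```python
class CapacityError(Exception):
    """Raised when a region has no capacity for the requested type."""

def launch_with_fallback(launch, count, preferred_types):
    """Try each acceptable instance type in preference order until one
    has capacity, rather than hard-failing on the first choice."""
    for itype in preferred_types:
        try:
            return itype, launch(itype, count)
        except CapacityError:
            continue  # that type is unavailable right now; try the next
    raise CapacityError(f"no capacity for {count} instances of {preferred_types}")
```

The nightly-job case fits this shape: if the job just needs a given amount of compute by morning, any of several types will do, and the application sizes itself from whatever it actually got.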
We also prepared a list of ten tips in several categories: ten different problems and ten suggested solutions. We hope you'll enjoy reading it, and if some of you can use it in your real work, we'll be extremely happy to help and pass the knowledge around. Overall, thank you very much for using Cloud Foundry, because together we're very strong, and hopefully we will have many more case studies on different use cases of Cloud Foundry in the coming years. Great, thank you, and with that we'll turn it over to any questions anyone may have.

Audience question: Do you feel confident that if you decided to move from AWS to Azure, or wherever, you'd be able to do that, and what kind of time frame do you think you'd be up against?

Yeah, I think that's actually a really great question. As I said, over the course of the last few years there has been rapid change across all of the public clouds; I didn't want to specifically name ones, but we're using AWS at the moment. What's clear between Google and Azure is that there's a lot of competition, and they're trying to keep level with each other; that's helping on price, and the offerings are comparable. Take something like Azure, which has very much adopted Cloud Foundry as well. Running on the AWS model, we did have to spend a lot of time working with CloudFormation and the AWS tool sets, but the way I look at it, there are lots of options out there, so if we wanted to go down that path, we absolutely could. I think it would be the same sort of challenges I put in the lessons learned: security is probably still one of the biggest areas, and the providers do things slightly differently and have different terms, but we could take the model and move there if we so wished.

So, one of the tips on operating public and private clouds is
number nine, which is: provision with the infrastructure's native services. We prepared the environments in AWS with CloudFormation, and that seems to have become normal practice nowadays, compared to just a couple of years ago when you would try to use BOSH for that. So now we prepare the environments first, and then BOSH goes to work; that's tip number nine. The tip I would add to this concerns the differences in not just performance but also some of the services, like the object store: you may have dependencies, not necessarily in the Cloud Foundry space, where you will want to cross-check what you use. What we experienced is that it's very tempting for some developer teams to start using a new service from AWS. Adam, you said the other day that one wasn't ready; do you remember which it was?

Yeah. We had a situation coming out of re:Invent this year. One of the great things about AWS is that they have lots of native services, and I talk a lot with the account teams we have there, but as a developer you sometimes get things before they're ready to rock and roll, maybe not as ready for prime time as we would like. One example coming out of re:Invent was around schema migrations: they announced a feature that let you import database schemas from Oracle into Postgres. A developer went and tried it out, instantly ran into a problem, and there he was, googling around, and he came back saying, this is really strange, I can't find anything about it. What I told him was: you do know this was announced only about 12 hours ago? No one knew that this thing existed. So there is a notion that you can be an early adopter with AWS, and with the native services, as a developer, you absolutely should be.
One of the things I've talked about internally at Warner is how something like Lambda went from when it was announced, just over 18 months ago, to the feature sets they were announcing at re:Invent last year. It's definitely something we're looking at, but you've got to resist the temptation to just dive straight in. Any other questions? Yes? Could you say that one more time, sorry? Yep.

Yeah, it was one of the things we looked at, especially as we were going down the path of evaluating the public cloud. We had experience of building it ourselves, and at the time we were very comfortable, and I think some of the tool sets that the likes of Microsoft have recently announced for Azure will definitely help in that area. We're lucky because we've had a lot of support from Altoros to help us, so we've always felt capable in-house of dealing with that, but I think the adoption and the tool sets out there that make it easier for people are really great. So it was one of the things we looked at, but it wasn't one of the particular problems we had, though it's a real problem out there. Thank you.

Audience question: I noticed in the reference architecture you just handed out that you have an example with US East and US West, and it shows two separate Cloud Foundry deployments. What's the developer experience like to deploy to those? Is that handled as a single cf push, or is that two targeted cf pushes, and how does that work?

So, what we have at the moment: the architecture is an example of the path we're going down. The answer is that at the moment it's largely built around multiple availability zones, with multiple regions coming later. But the model we've been thinking about is:
what are the scenarios? There are two scenarios you might pick out. One is: what happens if I lose an entire availability zone, and potentially half my infrastructure, in terms of making sure the platform itself doesn't start doing crazy things, spinning up lots of instances and being unsure what's happening. In terms of the deployment, though, in the model we have it would be a single push going out there; that's what we're working through at the moment. So, thank you. Any other questions? Brilliant, thank you very much, guys.