All right. Thank you all for coming. Before we get started, I need to make our requisite fire exit announcement. Please note the exit locations nearest you and find one if you need to use it. In the event of a fire alarm or other emergency, please calmly exit to the public concourse area. Emergency exit stairwells leading to the outside of this facility are located along the public concourse. For your safety in an emergency, please follow the directions of the public safety staff. I think there's a continuous delivery joke in there somewhere, but hey, you deploy your Concourse where you want.

All right, who's this guy up here? My name is Jonathan Regehr. I've been with Garmin for about seven years as a senior software engineer, and I've been in the industry for about 20 years. This is my second go-around with CF. The first one was with ColdFusion in the 90s. It was great at the time. I like Cloud Foundry a lot better.

Who's Garmin? We are an almost 30-year-old tech company. We do about three billion in revenue every year, and we serve five different markets. We do outdoor, where you can purchase dog training equipment, handheld GPSes, bow sights, things like that. We also do really cool watches. One of the neatest things about our new watches is our QuickFit bands. It's easy to take your band off and have a nice silicone band for that run that you do, and then when you need to dress up to go out, you take the same watch, put a leather band on it, and you're good to go. We also integrate our products. With that bow sight, you take a shot, the bow sight figures out the angle of your shot and determines from that where the shot went, and your watch gets a location that you can navigate to. So that's pretty cool. We're also in automotive, both on-dash and in-dash. We're also in boats: we do chartplotters, fish finders, depth scanners, things like that.
We also do autopilots. We've got some really cool automation between the watch and the boat, so if you're standing at the front of your boat and deciding that you don't like where you're heading, you can adjust your course from your watch. Pretty neat. We also do a lot of airplane stuff, all the way from autopilot to the glass cockpit. We're in some small planes, like experimental stuff. You can put us in your four-seater Cessna or you can put us in a Learjet. We're in all those platforms.

Garmin uses two flavors of Cloud Foundry. We are a Pivotal shop right now. Our production workloads are all running on 1.10. We've got two, I'm sorry, three 2.0 foundations up and running. Our plan is to have those taking production traffic within the next month or two.

Here's a quick roadmap of the things I want to talk about. By the way, I'm going to try to keep it to 15 minutes so that anyone who wants to attend the diversity luncheon won't have to miss anything here. I want to talk about some pain points. We did some things wrong, and we'd like you to learn from them so you don't have to make the same mistakes. Automation helps you do more with less. We want to talk about how we've won with the platform, what running at scale has looked like for us so far, and then, last but not least, why all this matters. This presentation, à la Sesame Street, is brought to you by the number 100. We'll get to that in a little bit, so stay tuned.

The first pain. Bill Gates in 1981 said that 640K ought to be enough for anybody. If we fast forward a few years, we decided that, hey, a /24 is a lot of IPs. That should be plenty to build the foundation, right? It was, for a while. Now, understandably, the Verizons and the Comcasts are probably thinking, what would you even do with a /24?
Garmin is not quite that size; our scale may be a little smaller. So a /24 seemed like a really good idea at the time, but quickly it looked like this. As we moved more apps into the platform, and those apps got bigger and bigger, we eventually realized that that was not going to be enough IPs, especially when you consider that underneath Cloud Foundry you have compilation VMs, which claim an IP whether they're running or not and whose job is to build new VMs for the platform. And because of repaving necessities, every VM that's actually running in the platform has two IPs allocated to it: the one that's running and the one that's going to be rebuilt. So quickly we looked like this.

So we talked to Pivotal and came up with the idea of using isolation segments as a solution to that problem. Adding IPs to a running foundation isn't possible without an outage. An isolation segment is kind of like a mini Cloud Foundry that sits beside the main one. You can route to it the same as the original one, but it has its own routers and its own Diego cells. So we thought, ooh, kind of like cloning the platform, except we get to keep everything. We were hoping it would work out like the picture there. It ended up more like that. And that's not on Pivotal, that's not on Cloud Foundry. That was on us.

Any guesses as to why the isolation segment didn't work as we thought it would? If you're thinking firewall, you'd be correct. We worked really, really hard to make sure that we had all the firewall rules to get one of our major applications into the isolation segment, and then quickly realized that that much work was not going to be feasible to get everything else in. We tried to move a couple things.
It didn't go well. Then we realized something, and I'll use a bit of an illustration here. If you've ever been in a restaurant when a tour bus pulls up, all of a sudden the restaurant is full and nobody can get any service, and you have something like this. Well, we realized that once our largest application was no longer running in the main Cloud Foundry and had been moved to the isolation segment, we didn't have to worry about it anymore. It was kind of like a tour bus leaving a restaurant: all of a sudden the service gets better and there are tables free. So that isolation segment ended up being a success. Not the way we had first intended, but it worked out really well for us.

Another one of our pain points was around unsupported solutions. If Pivotal tells you, hey, this isn't a good idea, we won't support it, but you can do it anyway: listen. They're telling you that for a very good reason. In our case, firehose-to-syslog was one of our major pain points. The reason we had to use firehose-to-syslog was that the entry point to our logging system is Kafka. Our goal with Kafka was to bring logs into Kafka first, and then anywhere we wanted to stream them was merely another thing reading off of Kafka. So rather than the platform doing the work, the load is on Kafka, whether we want to stream to one place or three or four or ten. So we had good reason to want Kafka and ended up having good reason to use firehose-to-syslog. But we ended up scaling the platform quite a bit just to handle firehose-to-syslog. Doppler, Cloud Controller, and a couple of other things all required extra scale just for the purpose of keeping firehose-to-syslog up and running. We are really looking forward to PCF 2 because this problem goes away.
I'm really excited not to support that anymore.

Another one of our pain points was shared domain routing. For obvious reasons, most of the things in Cloud Foundry at Garmin require a route that ends in .garmin.com. Easy solution: we'll create a shared domain, garmin.com, and then things like buy.garmin.com, connect.garmin.com, www, they all just work. That was great. It worked for a long time, until we started to do path-based routing. When you have www.garmin.com with a path for running dynamics or cycling dynamics, which both happen to be marketing apps, that was okay. As long as you only have one org handling a path-based route on a shared domain, you're golden. But when we got to services.garmin.com and started to put services into the platform, where not all of the orgs running services are the same (we have multiple groups building services), we quickly ran into problems. The solution is private domains, which can be shared. So instead of a shared domain, you end up with a shared private domain. What we decided to do was have an ops org that owns all of our domains, and then we share those domains with other orgs as necessary. That worked out really well. It took us a little work to get there. Getting rid of that shared domain was challenging, but it worked out for us.

Let's move to automation. One of my personal pain points was managing the platform. Anytime someone needed access to an org, or we needed to give people more quota, things like that, it all fell on me. I did script all of that, but it was still me writing the script.
So if I was on vacation or wanted to take lunch, I sometimes ended up being the bottleneck for other groups, and I didn't want to do that. I found a public repo called cf-mgmt that was built by a Pivotal services member. That's worked out really, really well for us. It handles org creation, space creation, and user access. It will do quotas. It'll handle domains. It didn't handle shared private domains, so that was a challenge we had to solve. So I learned Go and made a pull request, and now if you would like to do shared private domains using cf-mgmt, you can.

Another big win for us was automation. One of our you-must-be-this-tall-to-ride-the-ride requirements was automation. We use pipelines for all of our apps, and those PCF 2 foundations that I just talked about are all deployed via pipeline. From creating the Ops Manager and the ERT, to all the tiles, to all the users needed by the platform, such as buildpack management and the user that cf-mgmt runs under, all of that happens from pipelines that are deployed by Concourse.

All right, let's get to those cool numbers. This is our first 100. Now again, if you're a Comcast you might be like, oh, how quaint. But remember that /24. That foundation took a hundred requests a minute on one,
I'm sorry, a hundred thousand requests a minute on one application. It actually peaked higher than that. Now again, that's one application, an application called Connect that handles most of the stuff with our wearables and our running apps and things like that. That app is really, really peaky. On a Saturday or a Sunday, when half the world goes for a run, we have a really, really bad time about two hours later when they all come back and want to see that data. So we wanted to make sure that this amount of traffic can be handled. I could scale up VMs to handle this and just leave them there all the time, but that's a huge waste of compute. So the flip side is that I needed to be able to scale up to handle this much, and back down. The autoscaler that we get with Pivotal Cloud Foundry has been really great for that. That compute is never wasted. It's consumed when I need it, and then I put it back into the pool and other apps can use it when they need it.

So that's burst traffic. We also sustain a lot of traffic. We've taken 100 million requests a day for quite a while. These numbers were pulled from March. So again, we have peak scale, where we need an autoscaler, and then we have heavy sustained scale, where the platform just has to continue performing all the time.

We are obviously running mission-critical applications. Our SSO, or single sign-on, application is just one of the applications in the platform. That one has to be up 24/7. Whether you're a pilot and you want to plan a flight, or you want to sign in to Connect to sync your watch data, or you're using various other applications that we run, SSO needs to be there all the time to handle all of that traffic.

And then, for those of you with your acronym bingo cards, we've got a few there for you. Does everyone know what GDPR is?
Okay, all right, so you know: it's either finish GDPR on time or pay a lot of money. PCF has been a huge boon for us as we've written a lot of new code to handle GDPR. Being able to pipeline all that work and put it in Cloud Foundry means that my developers don't have to focus on getting the code out there. They just have to focus on writing the code.

Deploying at scale is another challenge that we have enjoyed working with. We deployed 3,400 times in the month of March. For us, that's a pretty big number. And when you're deploying, if you have an app such as Connect where you have 100 instances running, you may not want to deploy that app with another 100 instances in a blue-green scenario. If you do that, your platform might have a bad time. Just take my word for it. So we found a plugin called scaleover, and it wasn't quite what we needed, so again I polished up some Go code and created a pull request. What that plugin does: you deploy the blue version of your application with five or ten instances, whatever you want, and do the testing to make sure the application is good to go. Then the scaleover plugin takes over. At parameters you set, you have your green app, which is a huge app with a huge number of instances, and your blue app, which doesn't have as many, and in a controlled fashion the plugin does this. You end up with your now-green application having your 100 instances, and your former green application is gone. And it does that without straining the platform too badly.

So why does all this matter?
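The controlled hand-off between the two app versions described above can be sketched roughly as follows. This is a minimal Python illustration of the idea only, not the scaleover plugin's actual code or interface: the function name `scale_over_plan`, the fixed `step` size, and the example instance counts are all assumptions made for the sketch.

```python
def scale_over_plan(old_count, new_count, target, step):
    """Return the (old, new) instance counts at each stage of a gradual hand-off.

    Hypothetical helper for illustration only: each stage adds up to `step`
    instances to the new app and removes up to `step` from the old app,
    until the new app reaches `target` and the old app reaches zero.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    plan = [(old_count, new_count)]
    while new_count < target or old_count > 0:
        new_count = min(target, new_count + step)
        old_count = max(0, old_count - step)
        plan.append((old_count, new_count))
    return plan


# Assumed numbers: 100 running instances, a new version tested at 10 instances.
plan = scale_over_plan(old_count=100, new_count=10, target=100, step=10)
peak = max(old + new for old, new in plan)
print(plan[0], plan[-1], peak)
```

With these assumed numbers the combined instance count never exceeds 110, compared with 200 if a second set of 100 instances were deployed alongside the first.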
There's a lot of benefit to the platform. One of the main reasons this matters: the benefits of the platform are huge, and the care and feeding of it are small. There are three people at Garmin who maintain Cloud Foundry. I'm one of them. It is not my full-time job, and the other two don't do it as a full-time job either. We now have seven foundations, one of which is taking production traffic and many more that will be taking production traffic soon, and there are three of us who maintain those platforms. We've had some labs where we've had both Pivotal and ECS Team, now CGI, come in, so some of the work has been done by more than three people, but the day-to-day is a really, really small team.

The stability is another huge benefit here. The platform just doesn't go down. We have had two outages in the last year. One of them was our fault for not planning well, and the other was an outage we had to take to update a core switch; the network just had to go dark. Now, having multiple production foundations made this pretty much a non-event. We moved a lot of production traffic to a different foundation, routed there, and we were in really good shape.

I'm pretty sure this haiku is going to live on in infamy for a really long time, but I want to bring it up because it's life-changing for our developers. Whether we're running PHP (no, we don't do new PHP; we have legacy apps that we care about), Java, Node, or static content, my developer doesn't have to care what they're running. They don't have to build a special VM for it. It just runs. I didn't put the attribution on there, but for those of you who are interested, Onsi Fakhouri is the guy who wrote this haiku, and I think it's been floating around for quite a while.

This is the one you want to write down: culture change. Culture change is hard.
We've heard that on the keynote stage a few times, and it's not the easiest thing to do. But we don't have to worry about where our code is going to run. Whether it's, oh, I need to find a VM where I can run this new idea that I have, or I need to find a place to run this giant production application, in both of those cases I don't have to care. I have a platform, and it runs my code for me. I can set up sandboxes easily for my developers.

One of the things Garmin has been focusing on for a couple of years is figuring out how we can get developers spending more time coding. We've done things as crazy as, hey, let's try to get developers out of as many meetings as we possibly can. That moved the needle a little bit. But Cloud Foundry has been the real needle mover for us, because now a production deployment isn't, okay, let's make four tickets and write instructions for the team that's going to actually deploy this to some VMs that I'm not allowed to touch. All that's gone, and my developers have pipelines where they deploy their code. That's been the huge needle mover for us, to get us back to writing code.

That concludes my presentation. Thank you all for attending, and I'm open for questions.

Yes? The people? Our initial team, I think, was four or five. We've had a couple of dojos, like I said, with Pivotal and with ECS Team, so those got as big as seven or eight, but we've never had a ton of people.

I touched on it briefly, but we just spent a significant amount of time.
I think we were in our lab for four weeks building out that Concourse pipeline. Once we move everything to 2.0, every single foundation we have will be deployed by a pipeline. So then the work to get everything set up is: fill out the pipeline parameter file, which is a fairly decent-sized file, and once that file is filled out, you create a jumpbox, you deploy BOSH, you deploy Concourse, of course, and then you go. So it should be about a four-hour manual effort to get a new foundation set up. And as we were building that pipeline, we were actually standing up three foundations in parallel.

I don't have a number for an answer. Easily hundreds. I think our current production app instance count is in the three to five hundred range, and then we're over a thousand in non-production, and we're growing all the time. At this point, one of our challenges is just how do we evangelize fast enough to get people onto the platform, or how do we get to the people who are interested? The hunger is there, and we have the human limit of how we get to these people and train them for the platform.

Yes? I'm sorry, I missed some of your question. There's a microphone here. I don't have anyone to run it, though.

You were talking about culture change and getting things to run much faster through the pipeline, which is great for developers. Do you have visibility into the business side? Do you have a situation where there are more interlocks and complexity on that side, and was there simplification and culture change there as well?

Yeah, we're still working on that piece. Those are some conversations that are going on now. But one of the things we've been looking at is trying to build the product... Oh, that's gonna fall.
Oh, it didn't. We're trying to build the product owner role. I was actually working with a team recently where we have a BA, a PM, a technical architect, and then another role, and I can't remember the title. I drew up on the board for the team what all their roles were and what all those folks were doing, and it was kind of scary when I started connecting the dots between things that were overlapping. Eventually we saw that most of the roles in the room overlapped, so we're trying to make those teams smaller. But yeah, there is definite friction there. We have cases where the business wants to talk directly to the developers. We're trying to position a product owner in between those people, so that you have developers who know exactly what they need to work on, and the business can feed the product owner the large, epic-type requests, like, hey, I want the app to do this. Not, hey, this button's not where I want it. That should be a bug. The business should never be talking to a developer about that. That should be a support ticket that sits at the top of the backlog; the developer picks it up, and away it goes.

I saw that you still have some 1.10 foundations. I was curious how you're handling upgrades. Are you doing in-place, or are you doing side-by-side with the 2.0s?

All the 1.x's we've upgraded in place. We've started with non-prod, made sure everything was happy, and then we've deployed the upgrade in place, and that's been largely successful. We've uncovered some gotchas as we've deployed in the non-prods, and then the prods have gone pretty well. I'm not sure if Pivotal would say it's a good idea to take a 1.x to a 2.x upgrade in place. The versions are so different that we felt that wasn't a good idea.
We also decided, well, on the development side, we're iterating everything. And we kind of realized, when we first built the foundation, we had this idea that, oh, this is awesome, we're done, we're good. And that's just not the case. We need to iterate everything. So we're pipelining. There are a number of other things where we realized some of our mistakes in the 1.x foundation, and we decided now's a really good time to scale up a lot. So those /24s are no longer there. There are multiple /23s, as far as IP counts, in the new foundation. And just because it's so different, we've decided that we're going to run them both in parallel for a certain amount of time and ask our developers: hey, I realize you've got your code running on the 1.10 foundation; I'd like you to run it on the 2.0 foundation. That gives them time to update their pipelines and make sure everything's good before we actually run our production traffic there. And then once they're done, we're going to shut that 1.10 foundation down. But yeah, all of our previous upgrades were done in place. We just felt the 2.x foundation was a big enough change that we didn't want to upgrade in place.

Yes. Yeah, the goal is that the pipeline handles everything and that three-man team is able to focus on other things.

I've got 12:30, so I think I'm almost out of time. Again, if you're interested in the diversity luncheon, I think it started 10 minutes ago. I don't want to kick you out, but I just want to make you aware if you're interested in that. Thank you all for coming.
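The IP pressure behind the move from a /24 to multiple /23s can be worked through with some quick arithmetic. This is a rough Python sketch based only on the rule of thumb given in the talk (two IPs per running VM, plus one per compilation VM whether or not it is running). The example network `10.0.0.0/24`, the four compilation VMs, and the ten reserved addresses are assumptions chosen for illustration, not Garmin's actual numbers.

```python
import ipaddress


def max_running_vms(cidr, compilation_vms, reserved):
    """Estimate how many running VMs fit in a subnet.

    Assumes (per the talk) that every running VM holds two IPs and that each
    compilation VM holds one IP whether or not it is running. `reserved`
    covers gateway, broadcast, and similar addresses (an assumed figure).
    """
    total = ipaddress.ip_network(cidr).num_addresses
    usable = total - reserved - compilation_vms
    return max(0, usable // 2)


# Hypothetical subnets and overhead figures, for comparison only.
small = max_running_vms("10.0.0.0/24", compilation_vms=4, reserved=10)
large = max_running_vms("10.0.0.0/23", compilation_vms=4, reserved=10)
print(small, large)
```

Under these assumptions a /24 tops out at about 121 running VMs, while a single /23 roughly doubles that to 249.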