Good afternoon. Wow — more people than we were expecting. I'm Alex Curtis, an application admin at Garmin International. I'm joined by Jonathan Regere, who's in application development, and Brandon Henry, who's in system administration — you'll meet them in a minute. We're here to tell you a story about how we implemented a PaaS solution in four weeks. The PaaS solution we implemented is, obviously, Cloud Foundry. We'll walk through the decision — why we went with Cloud Foundry — how we implemented it, and some of the learning opportunities we had along the way. One caveat, though: we're actually still in the middle of implementing it. We've got most of it in, but we're still getting some pieces in place, so we don't have our apps in production yet. That's going to change in a couple of weeks, but I want to make that really clear. Brandon will go over in a minute what the layout of the environment is now and where it's going to be very soon.

First, a little bit about us. Garmin is based in Olathe, Kansas. We were founded in 1989 by Dr. Min Kao and Gary Burrell. We have annual revenue of $2.9 billion. Our markets include marine, outdoor, fitness, and aviation — which is kind of the cornerstone for us — as well as automotive and mobile. In the past, we've been a device company. We make hardware — all the little doodads you can think of, most of which have GPS in them. We're a hardware company, and probably always will be. As for the present and the future: we're still a device company, but due to ever-changing technology and the way people are using their phones and other devices, we're having to expand our product lines a little. I'm sure everybody's seen the vívo line and the push we're making into health and wellness.
This has not only expanded the scope of our business, it's also expanded the scope of our IT department. Hardware is still our thing, but we're shifting a little more into software and taking a harder look at it. The software we consider — both what we develop and what we purchase — doesn't always show up directly on the bottom line, so efficiency and cost are a real driving force in what we choose.

All these health and fitness devices we're building are phoning home, and we're hoping they fill the gap left by the nüvi and the automotive pieces — not everybody's buying a dashboard-mounted GPS anymore. Thankfully, we saw that a few years ago, before the trend actually started heading down, and the health and fitness business is really taking off. All these devices phone back to our main data center, to Garmin Connect. Connect is just one example of the many apps we house at Garmin, but it's arguably becoming one of the biggest, if not the most important. Everything across all of our segments phones back into Connect. Users can upload, share, monitor, and analyze what they're doing activity-wise — anything health and fitness related, a run, a kayak trip, even flying; if you go on a flight, you can track that here as well. Connect also integrates with most, if not all, of the devices coming off the line now — Forerunner for running, Edge for cycling, the fēnix 3 — everything we're building now can come back to Garmin. This is an example of what the dashboard looks like. Connect has come a long way.
When it first started, it was a massive monolithic app — I think it was one EAR file — and that obviously isn't sustainable. So we're pushing toward a more segregated model, and it's come a long way, though it still has a long way to go. The back end is Oracle; Elasticsearch has just been brought online, and a lot of NoSQL database integrations are coming as well. It's also supported by several other pieces of IT infrastructure and other applications, and I'll get into that in a little bit.

Like many of you who work for device companies — or any company that sells products — the holidays are really tough on us. Connect gets beat up pretty good around the holidays: everybody's unwrapping their new devices on Christmas Day, and they all start phoning home. That's when Connect really gets stressed. We've been looking at ways to address this, and the direction we decided on was splitting the app out even more — segregating individual services, with microservices as one avenue we're exploring. Cloud Foundry, we really believe, is one of the keys to our strategy for breaking up the app.

The first app slated to go into Cloud Foundry in production, as I mentioned before, is our SSO — our single sign-on app. It's an enterprise-wide app; we use it not only for Connect but for other apps you may or may not be familiar with, like flyGarmin, myGarmin, and our main storefront app. This app will be the first, hopefully by the end of the month. We had some setbacks once we got Cloud Foundry installed — not with Cloud Foundry, but with the app itself. It wasn't handling multiple data centers efficiently, so we had to delay its release into Cloud Foundry. And one of the requirements for integrating SSO into Cloud Foundry is that we still have a legacy piece of SSO out there.
So we have to not only stand up SSO in Cloud Foundry initially, we also have to integrate with that legacy piece. We're working on both. But I'm getting a little ahead of myself — let's talk about how we got here first. I'll bring up Jonathan Regere, and he's going to show you the roadmap of how we got here. Thank you, sir.

Hi, everybody. Before I get into what we're doing, I want to talk a little about why we needed to change. Deploying WARs to servers works — why would we want to change that? Well, I came up with a couple of reasons.

Number one is the build-out workload for modern infrastructure. It can take a long time to get a server up and running; even after the bare server is handed over from an infrastructure team, you still have to install Tomcat and things like that. Utilization isn't great on regular infrastructure, either. I logged into one of our servers the other day, and the RAM on the server was pegged, but it was sitting at 2% CPU. We're obviously not using that infrastructure very well; we need to do better. Monitoring and security take too long on a box-by-box basis. Think of your zero-day vulnerabilities: you end up with "I have a thousand servers that I have to get patched before this date," so your infrastructure admins drop everything they're doing to go solve that problem. And even with pooled infrastructure, uptime goals can be difficult to reach. If you need to scale up your pool, it can be challenging to do it quickly — adding to the pool is easy; doing it quickly is where you have a challenge. As far as developers go, we end up doing things that aren't development, and that takes away from the time we could be developing.
Things like setting up new servers for projects instead of responding directly to business needs, the cost to a project of adding new servers, and so on — these are the reasons we wanted to change.

To drive this home a little closer, I want to talk about a sample application outage. I'm sure no one in the room has ever had this happen to them. You have a customer-facing application that's running a little slow; you've got people who can't log in. What's the solution? You need to scale up the application, so you end up adding more app servers to your pool. In this specific instance, by the time we got to "we need to add more servers to our pool," we'd spent about 50 man-hours, we had a 12-person troubleshooting team going, and our customer experience was impacted. Those are things we all want to avoid.

Another issue you may or may not have: we have a large app pool with about 50 servers in it, and each server has about 50 WARs on it. Now, don't judge — we realize this isn't the best thing and we're trying to get off of it, but it is what it is today. If one of those apps in the pool needs to scale up, guess what? Build a new server and drop all of those WARs onto it. That takes time. You're scaling an entire suite rather than scaling an app as a unit. To put it into a sports analogy: hey, I want kicking practice — okay, bring the whole football team down to the field for kicking practice. That's not good. It's labor- and infrastructure-intensive to scale apps in that manner.

Okay, so how did we get here? In late 2013, at one of our hackathons, we had a promotional account with Pivotal Web Services, and in 24 hours, myself and one of our architects were able to deploy an application. We did zero-downtime A/B deployments, we had it connected to a cloud data source, and we really liked it.
After that, we attended a couple of Cloud Platform roadshows, one of which we hosted at Garmin, and we decided it was time to dig even deeper with a proof of concept. During the proof of concept, we spent some time looking at data stores. We know Cloud Foundry connects really well to cloud-hosted data, but at Garmin we already have some pretty huge databases — the Connect one was mentioned earlier — and we needed to make sure we could connect to those. One of them is MySQL; we have a lot of Oracle, too. For single-data-source applications, user-provided data sources worked really well for us. I also tested multiple databases in one application — not that I recommend it, but for some of our legacy apps we wanted to make sure we could do it. That was easy. Non-JDBC is a little more challenging: without a JDBC URL, it's trickier to make a user-provided service connection, but with a lot of great help from Pivotal I was able to get that working too. Pivotal had set up weekly calls with us — we had Andrew Ripka and a few other guys on the phone every week — and they did a lot of troubleshooting for us.

The other thing going on during this POC was that our architects and I were debating what to put into Cloud Foundry. My initial thought was: let's get everything in there as quickly as we can, so we can start to see the benefits of autoscaling, better infrastructure usage, et cetera. Maybe I'm naive, but this was what I was hoping for. I think if we had really gone that way, it would have looked more like this. So we decided no — we don't want to put everything in there, because really, all that would give us is a shiny new home for all the problems we have today. Yes, we have technical debt. The company is only about 25 years old, but that's old enough to have technical debt, and we have it.
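For anyone who hasn't wired up an existing database this way: a user-provided service for a JDBC data source is a couple of cf CLI commands. This is just a sketch — the service name, host, and credentials below are placeholders, not our actual configuration:

```shell
# Create a user-provided service wrapping an existing Oracle database.
# The JDBC URL, username, and password are made-up placeholders.
cf create-user-provided-service oracle-connect \
  -p '{"jdbcUrl":"jdbc:oracle:thin:@//oracle.example.com:1521/CONNECT","username":"app_user","password":"s3cret"}'

# Bind it to an app; the credentials show up in VCAP_SERVICES,
# where the application (e.g. via Spring Cloud Connectors) can pick them up.
cf bind-service my-app oracle-connect
cf restage my-app
```

These commands need a live Cloud Foundry target, so treat them as illustrative rather than copy-paste ready.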
So we decided Cloud Foundry was our great opportunity to modernize. We're shooting to go 12-factor. We know we need to change some of our application development culture, and we need to change parts of our implementations, such as how we do Splunk — I'll let Alex talk about that more later. And of course, we need to reduce our technical debt.

We're really looking forward to using the Spring stack. We already use a lot of Spring — mainly the Spring Framework and Spring MVC. We're also going to start using the cloud pieces: I was able to get the Spring Cloud Config server up and running within Cloud Foundry, so we're looking to externalize all of our configuration out of our applications and host it in the Spring Cloud Config server. One of the things we do today is put a monitoring WAR on each of our servers, which is great — it lets us monitor all of our servers. But as we all know, with Cloud Foundry, when you deploy, a droplet is just your app; you can't bundle a second WAR alongside it and still call it a droplet. So our monitoring WAR can't be deployed with our apps anymore. Spring Boot Actuator is going to solve that nicely for us. The out-of-the-box health indicators are pretty cool, and I'm sure we'll end up writing some of our own — look for that from us in the future. We're also going to be doing a lot more with Spring Cloud Connectors.

We've also been pretty excited about the Netflix OSS stack. I'm not sure how many of you are aware of everything that's there, but it's well worth looking at. We'll be using Eureka for service registration and discovery, Hystrix for fault tolerance and circuit breakers, and Feign as a declarative REST client, which lets us call REST services through a native Java interface.

So let's revisit: why do we want to change? There's no more build-out for applications once Cloud Foundry is up and running.
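Before moving on — to make the circuit-breaker idea concrete, here's a minimal plain-Java sketch of the pattern Hystrix implements for us. This is illustrative only: it is not the Hystrix API, and the threshold, class, and method names are all invented for the example.

```java
import java.util.function.Supplier;

// Minimal circuit-breaker sketch: after `threshold` consecutive failures the
// breaker "opens" and the remote call is skipped in favor of the fallback.
public class CircuitBreakerSketch {

    static class CircuitBreaker {
        private final int threshold;   // consecutive failures before tripping
        private int failures = 0;
        private boolean open = false;

        CircuitBreaker(int threshold) { this.threshold = threshold; }

        boolean isOpen() { return open; }

        // Run the remote call; on failure (or when open) return the fallback.
        <T> T call(Supplier<T> remote, Supplier<T> fallback) {
            if (open) {
                return fallback.get();        // fail fast: skip the remote call
            }
            try {
                T result = remote.get();
                failures = 0;                 // success resets the counter
                return result;
            } catch (RuntimeException e) {
                if (++failures >= threshold) {
                    open = true;              // trip the breaker
                }
                return fallback.get();
            }
        }
    }

    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(3);
        Supplier<String> failing = () -> { throw new RuntimeException("connection refused"); };

        for (int i = 0; i < 5; i++) {
            // First 3 calls hit the failing service; later calls fail fast.
            System.out.println(breaker.call(failing, () -> "cached-response"));
        }
        System.out.println("breaker open: " + breaker.isOpen());
    }
}
```

Hystrix adds the important production pieces on top of this — thread isolation, timeouts, metrics, and automatic half-open recovery — which is why we're adopting it rather than rolling our own.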
Your infrastructure is much better utilized, because the DEA gets to decide what runs where; your apps are more spread out, and you're using your infrastructure better. Zero-downtime upgrades are going to solve our labor-intensive security and monitoring issues, and the entire platform is designed around the uptime goal. To cover this slide quickly: it gives us the ability to do push-button deployments, and we're really looking forward to continuous delivery.

I wanted to talk about "done" a little earlier and forgot to. How many developers here have said, "Oh, my code's checked into Git," or "I've got my pull request," or "I merged my code into develop — I'm done"? Well, yes, we're done when that happens, but you know what? Nobody else gets to see the code. With continuous deployment, we're hoping we can redefine "done" as "it's in production," and we're hoping that happens within the confines of a single sprint.

Going back to our outage prevention: if we tell that same story with Cloud Foundry, the real story is an imperceptible slowdown in the app. We as consumers using the product don't even notice it, but Cloud Foundry notices, because there's heavier-than-normal load somewhere. It scales up, and before a client even knows there's an issue, the developer is notified: hey, you're not running three instances of this app anymore, now you're running five. You can go in on Monday and look back through the logs, rather than being on the long Sev 1 outage call on a Sunday afternoon because the app went down. With Cloud Foundry-managed apps, each app ends up with the number of instances it needs to have, rather than the number of instances your monolith has to have.
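The manual version of that scale-up is a one-liner from the cf CLI — a sketch, with a hypothetical app name:

```shell
# Scale a single app from 3 to 5 instances -- only this app grows,
# not the whole suite it used to share servers with.
cf scale sso-app -i 5

# Check instance count and health afterward.
cf app sso-app
```

Autoscaling just automates that same decision based on load, which is what turns the Sunday outage call into a Monday log review.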
And going back to that football analogy: when we want kicking practice, we just put the kickers on their own field and they practice without affecting the rest of the team. Instances grow and shrink as necessary. Apps are nested in their own containers, so with that zero-day vulnerability, there's no security scramble — you can solve the problem by upgrading Cloud Foundry while the apps are running, with no downtime. Autoscaling is, of course, efficient and fast, and your infrastructure is used much more efficiently. Now I'll pass it off to Brandon, who's going to tell us about some of the build-out we've done. Thanks.

Okay, I'm Brandon Henry, a Linux sysadmin at Garmin. I did the initial implementation and design of Cloud Foundry at Garmin, and I'm going to give you a high-level overview. Akamai is our global load balancer and web content caching system; it's what we use to load-balance globally across our different data centers. Next, notice we have Nginx, a lightweight load balancer, behind an F5, a heavier, more feature-rich load balancer. The F5 is used to balance application calls between our existing legacy applications and our shiny new Cloud Foundry environment. We used Pivotal to deploy Cloud Foundry, to greatly reduce our implementation time and time-to-production for our first app, and to ease overall user and org management. As you can see, we actually have two different CF environments in production — each purple cloud represents a data center. Like many IT orgs around the world, Garmin is working hard to make our own data centers more disaster-resilient, and this is a good way to do it.

So why do we have Nginx as an intermediary? For that, you need a little background. Initially, the scope of the project was to have little or no code rewriting for our first application.
Many of our brownfield applications are interconnected and make calls back and forth to each other. When you have hard-coded URLs in legacy application code — say, sso.garmin.com — that doesn't work with Cloud Foundry's requirement to route all application calls via a wildcard DNS entry, *.cf.garmin.com for instance. We decided to use Nginx to do the URL and Host header rewrite. The F5 could do this via an iRule, but we wanted to store the configuration in source control and automate deployment of the configuration to Nginx via Git hooks. Since Cloud Foundry will be a service used and shared by many different development teams, we wanted to make it as developer-friendly and safe as possible and let teams make the changes they need.

Here's what the Nginx rewrite looks like in the architecture. You start with app.garmin.com, which comes down to the F5 layer — this clicker stopped working, okay. At the F5 layer, we do some sort of load balancing to route between our legacy applications and Cloud Foundry: canary-in-the-coal-mine deployments, A/B deployments, weighted load balancing, something like that. From there, the requests that go to legacy look just like they do today: app.garmin.com. When they're routed to our Cloud Foundry environment, they hit the Nginx layer, which does the URL rewriting — for instance, app.ola-cloud or app.kcg.cloud — and from there the requests are routed to Cloud Foundry.

So let's take a closer look at a single leg of our Cloud Foundry implementation. One thing I left out of the last slide, to simplify the image, is a callback to the F5 after the Nginx URL rewrite — it actually routes back to the same F5. In reality, we do that so we can manage our Cloud Foundry routers through an F5 pool; it makes that level of abstraction a little better for scaling out in the future.
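As a rough sketch, the Host-header rewrite Nginx is doing might look like this — the hostnames and upstream addresses are illustrative examples, not Garmin's actual configuration:

```nginx
upstream cf_routers {
    # Cloud Foundry router VIP(s) behind the F5 -- addresses are placeholders
    server 10.0.0.10:80;
    server 10.0.0.11:80;
}

server {
    listen 80;
    server_name app.garmin.com;

    location / {
        # Rewrite the Host header so the CF router can match the
        # wildcard route (*.cf.garmin.com) instead of the public name.
        proxy_set_header Host app.cf.garmin.com;
        proxy_pass http://cf_routers;
    }
}
```

Because this lives in a plain config file, it can sit in source control and be pushed out by a Git hook, which is exactly why we chose Nginx over an F5 iRule here.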
Many of you may have noticed we only have one availability zone per data center. At the original time of implementation, due to the quick pace of the project, we only had enough hardware for one availability zone. Because Cloud Foundry's load-balancing logic actually requires two availability zones for true HA, we've ordered additional hardware, and it's scheduled for next week to add a new availability zone to each data center. So let's look at the overview now: we still have the legacy app portion, and we're adding our second availability zone to each data center. And this is our future state: we'll remove the legacy infrastructure and rely entirely on Cloud Foundry after it has proven production-level uptimes. Looks good to me.

All right, I'm going to spend a slide or two showing you the user interface for Pivotal. Here we have CF Ops Manager. This is where we control some of our buildpack and tile installs, and it also acts as an integration point between VMware and the Cloud Foundry infrastructure. Anyone who's installed Cloud Foundry manually knows that navigating the maze of YAML files can be tedious; Ops Manager and the VMware integration tile abstract a lot of that pain away. Within the Elastic Runtime tile, we have this handy resources tab that allows us to easily scale out the different Cloud Foundry components — whether it's adding compute resources in the DEAs or scaling out highly available, load-sensitive components like the router, it's all managed easily from this page. Here's an example of the status page — sorry if it's a little blurry — within the Elastic Runtime tile. It's pretty handy for troubleshooting at a glance, but for in-depth, production-level support we need to go deeper. For that, I'll delve into how we monitor our Cloud Foundry environment as well as the applications that live within it.
If you were paying attention, one of the tiles we had installed in Ops Manager was called Ops Metrics. This handy service aggregates data from the Cloud Foundry environment into a nice little consumable JMX endpoint. It has data about VM resources and health; overall application management statistics like number of crashes, number of app instances expected versus running, and total number of requests; DEA metrics; and many more valuable pieces of information. In case you missed it — there it is.

Okay, so what do we do when we have all these fancy metrics exposed? We monitor. We use SolarWinds to monitor the overall health of our Cloud Foundry environments. SolarWinds, for those who don't know, is a sort of all-in-one IT infrastructure monitoring suite. Through SolarWinds, we implement a simple up/down health check for each component in Cloud Foundry. This feeds directly into our existing 24/7 monitoring infrastructure and fits perfectly with our current on-call processes. In theory, the Health Manager within Cloud Foundry should keep all the components running, but we know that sometimes reality doesn't meet expectations, so we need to make sure. That said, an up/down check isn't enough information to figure out exactly what's going wrong when components or applications start to fail. That leads us to our next tool: vRealize Operations Manager, VMware's integrated operations suite, which we basically just use for its nifty dashboarding system — I'll show you the dashboards in a minute. These metrics, once again, come from the JMX endpoint exposed by the Ops Metrics tile. A Hyperic server collects these metrics and aggregates the information into a format that vRealize Operations Manager likes. Unlike SolarWinds, this lets us delve very deeply into our Cloud Foundry infrastructure.
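As an illustration of the kind of polling a collector does against a JMX endpoint, here's a self-contained plain-Java sketch that registers a fake instance-count MBean in-process and reads it back. The MBean name and attributes are invented for the example — they are not the actual Ops Metrics schema:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch of polling metrics over JMX: register a fake instance-count MBean
// and read its attributes back, the way a collector polls an endpoint.
public class OpsMetricsSketch {

    // Standard MBean pattern: interface name = implementation name + "MBean".
    public interface AppMetricsMBean {
        int getRunningInstances();
        int getExpectedInstances();
    }

    public static class AppMetrics implements AppMetricsMBean {
        public int getRunningInstances()  { return 3; }
        public int getExpectedInstances() { return 5; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.cloudfoundry:deployment=demo,job=DEA");
        server.registerMBean(new AppMetrics(), name);

        // A real collector would poll attributes like these over a remote
        // JMX connection rather than in-process.
        int running  = (Integer) server.getAttribute(name, "RunningInstances");
        int expected = (Integer) server.getAttribute(name, "ExpectedInstances");

        // Alert when expected and actual instance counts diverge.
        if (running < expected) {
            System.out.println("ALERT: " + running + "/" + expected + " instances running");
        }
    }
}
```

The "expected versus running instances" comparison here is exactly the sort of check the dashboards in the next section visualize.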
The metrics are combined into very helpful graphs with colored health indicators that give you an at-a-glance view of the Cloud Foundry environment and general CF application health, and they also let you deep-dive for more thorough troubleshooting. Here's an example of one of our dashboards. We can do things like monitor the number of application requests; how many routes the Cloud Foundry routers have created; a canary-in-the-coal-mine metric like registry update lag, to test for network latency issues; and overall DEA utilization, to warn of impending resource bottlenecks. As things start to go wrong, the green boxes turn yellow, and then red as things get worse. We can monitor memory utilization of our DEAs over time, which helps us plan for the future and also lets us see what impact new deployments have on our CF environment. We can monitor application health from the perspective of the Cloud Foundry Health Manager: by default it can track each application's state — running, stopped, crashed — its version, and its number of instances, and it has the ability to reconcile any problems it finds with the applications or components running within Cloud Foundry. This graph in particular is helpful for monitoring new application code releases, to make sure there isn't a bunch of new crashes after you deploy. And this is only scratching the surface of what we can monitor — there are a ton of metrics exposed, and I encourage you to check out this blog post by Jamie and dig in.

Of course, the goal of the operations team is 100% uptime. Cloud Foundry, VMware, F5, Nginx, Akamai — all the components within our architecture are designed to support this goal. They offer redundancy, resiliency, and visibility. Akamai load-balances across data centers. The F5 load-balances across the Nginx layer and the legacy architecture. Nginx load-balances to the Cloud Foundry routers.
The Cloud Foundry routers then load-balance to the DEAs, where the applications live within their respective containers. And that's not all — we also have to keep in mind that this is all backed by infrastructure and monitoring solutions designed to alert and protect us in the unlikely event that all the previous redundancies fail: the heartbeat health checks and self-healing provided by the Cloud Foundry Health Manager; the virtualization and infrastructure abstraction given to us by VMware, including the VM Resurrector and vMotion; the in-depth analysis of CF components aggregated and visualized by vRealize and Hyperic; and the real-time monitoring and alerting contributed by SolarWinds. All of these components work together to provide a good customer experience and 100% uptime — which, for an operations guy, is why I'm here. All right, Alex will now go into a little more detail about how we monitor the applications within Cloud Foundry.

Can you guys hear me? So, application monitoring — totally separate from what we're doing for system monitoring — and I'll be brief here. We decided to go with two tools we already had in-house: AppDynamics for our JVM monitoring, and Splunk for overall log monitoring. With AppDynamics, we had a bit of a hiccup in how we did this. Just a quick overview of the way we organize information in AppDynamics for our current applications: AppDynamics has an application layer — "application" being an obviously overused term. For us, that maps to an overall environment: connect-prod, connect-stage, internet-prod, internet-stage. Underneath that, the tier name is what we'd define as a cluster — SSO being a great example. SSO would be its own cluster, and then, on the legacy side, the servers come in and populate underneath that.
So we grabbed the user-provided service for AppDynamics with the original buildpack, loaded it, added our credentials — and we weren't really sure what we were going to see yet, so Jonathan and I just jumped in feet first. This was the initial result: because we weren't able to define our structure, our config server populated as the application, the tier name populated as "Cloud Foundry," and — the really confusing part — that string there is basically the Cloud Foundry instance GUID. It was really close, but it wasn't exactly what we wanted. So we went back to AppDynamics and Pivotal and said, guys, you're really close — could you help us out? And they definitely did; they helped us a lot. So, again, that was the original buildpack. We grabbed Jamie O'Meara from Pivotal, and the guys from AppDynamics really stepped up to the plate for us, and they got us a new buildpack, which provides a new user-provided service for AppDynamics. We downloaded it, got it installed, and with it we can define the application name, the tier name, and how we want the instances to be brought in. You can see here — this was a quick test, using what is now Jonathan's infamous little app that we used for testing in Cloud Foundry — the app came in as the application name, "TestCF" came in as the tier name, and the instances came in as 0 and 1. Now that we can define that, we could say SSO-0, SSO-1; the identifier at the end is an auto-incrementing value, and we can edit that as well. So this is definitely a giant step in the right direction toward the structure we already had at Garmin. And I'll gloss over Splunk real quick: with Splunk, we're obviously not doing any system-level monitoring — it's all application-level.
Here's how it worked for us before — and I'm sure how it works for a lot of you who use Splunk: the applications spew off all these logs, which in a lot of cases land on a file system; you may write to a flat file, and Splunk pulls all of that in. Well, with Cloud Foundry, there's no file system anymore. So we have to integrate a user-provided service for Splunk — a Splunk drain. Everything coming off of these applications goes into the Splunk drain, and at that point it becomes elementary: it's how you've done it before, Splunk forwarders or however you want to do it, and it comes in that way. We're still getting into that — the pieces I'm talking about here are still really in their infancy, and we have a lot of process changes to make before we get to a lot of it.

Real quick on production management, which is kind of where I come in. When I first started hearing about continuous integration and continuous deployment, this is what I heard off the bat: basically, "testing in production." So I decided to approach it with an open mind, and I've been learning more and more about it. We've realized that when we get into infrastructures like this, we're going to have to adopt it. We currently have a mostly release-based deployment strategy, so that process is going to have to become a lot more flexible and a lot more continuous; the process itself mostly has to change before we can really adopt the tooling. Bamboo is what we're looking at right now, but again, it's all very preliminary.

So, in summary, I hope you got something out of this. A lot of it is still in progress, but we've come a long way — a lot of it with Pivotal's help, and AppDynamics', and our other vendors' — and we sincerely thank them.
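One footnote on the Splunk drain setup described a moment ago: wiring it up is just another user-provided service, created with a log URL instead of credentials. A sketch, with a made-up endpoint:

```shell
# Create a user-provided service whose only job is to forward app logs
# to a syslog endpoint (a Splunk forwarder, in our case). Host is a placeholder.
cf create-user-provided-service splunk-drain -l syslog://splunk.example.com:5140

# Bind it to the app and restart so the platform starts draining logs.
cf bind-service my-app splunk-drain
cf restart my-app
```

From there, the logs arrive at Splunk the same way they always have, which is why the rest of the pipeline doesn't change.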
These are the seven topics Jonathan brought up — the roadmap that brought us to this point. This is what we used to determine that Cloud Foundry, we really feel, is what's going to help us get our infrastructure going and get to the next step. As for next steps: we're obviously going to get the SSO app integrated, hopefully by the end of this month, maybe into June — we're really hoping it's soon. We're going to get the multi-site pieces in, with the multiple availability zones Brandon talked about, and we're going to eliminate that legacy interaction with SSO. So I hope this was helpful. We don't have as much information right now as we'd like, but who knows — maybe next year we'll come back and give you an update, if you'll have us. That's it, and we'll be around the whole weekend. So give us a round of applause. Thanks.