Thanks, everybody, for coming. Come on in, grab a seat. Today, we're talking about what it takes. What did it take for Verizon to deploy Cloud Foundry, manage Cloud Foundry, and make Cloud Foundry a success for Verizon? First, just a little bit about me. I'm a DevOps and cloud architect at Verizon. I got my start in the cloud with OpenStack at Verizon, then went on to deploy, launch, and manage Cloud Foundry for the last couple of years before finally moving on into some API-first transformation. Feel free to reach out and connect on Twitter and LinkedIn; that's the best way to get a hold of me. Enough about me. Let's talk about Verizon. Verizon is big: 170,000 employees, 10,000 IT developers, 3,000 IT systems. Verizon is big. It's a Fortune 15 enterprise. When you think of Verizon, what do you think of? Cell phones, right? Who's got a Verizon phone? Awesome. Everybody else, I'm sorry. But Verizon isn't just cell phones. If you came over from the East Coast, you might know our fiber-based home broadband and TV product. Don't tell the guys next door at the Comcast presentation, but it's pretty good. But Verizon is not just broadband and cell phones. It's not just the telco. It's all of these. Verizon is our enterprise solutions. It's our IoT space with ThingSpace. It's our mobile-first digital video platform go90. It's our media and advertising digital properties, with Yahoo and AOL combined into Oath. Verizon is so many different things. But Verizon is changing. Verizon is becoming a software company. Our move into 5G is exponentially growing the amount of network throughput and data that we need to handle. All the new services that we're looking to get into are higher up the stack than the network. That's managing a massive mesh of connected devices to build out a smart cities platform. That's adding connectivity to your car through our telematics plug-in, Hum.
Verizon is moving into all of these new markets, and all of those new markets are software. Verizon is becoming a software company. And in order to be competitive in the marketplace as a software company, we have to deliver software to market faster. Cloud Foundry is one of the ways that we deliver that software faster. That all sounds pretty good. In reality, in practice, it's not that rosy of a picture. We've been on a long journey with Cloud Foundry, and that journey has had its challenges and pitfalls, and we've had our learnings along the way. So what I really wanna talk about is our journey and what we've learned along the way that you can leverage in your own business. Let's start with our journey. We started back in 2015 with five applications and our first private cloud launch, based on OpenStack as the IaaS layer and Cloud Foundry as the platform on top of it for PaaS functionality. We started to grow that, and we learned from the use cases. Later, we extended to more data centers and leveraged VMware as a more stable underlying footprint. All of that was our way to expand our private, internal data center footprint. Then we started moving into the public cloud space with AWS. Fast forward to 2017, and we're fully getting off of OpenStack and doubling down on our public cloud investment through AWS. So today, we're running in six data centers, including AWS. We have 12 foundations, which I think is really one of the unique things about our Cloud Foundry deployment, just the number of foundations that we're running. We're supporting 100 applications and running over 4,000 containers at any point in time. Pretty big scale. Why did we go on this journey with Cloud Foundry? First and foremost, to reduce the time to business value. Everything we do to digitally transform and change the way we write software is all about reducing the time to business value and being more effective at delivering that software.
We do that by letting developers focus on developing. Do what we pay them to do, which is to write code that ends up as part of the app in production. Anything else is waste. Along the way we found, and you've seen a lot of the discussion here today around multi-cloud strategy, we stumbled upon the benefits of that as we were looking for migration paths from private to public cloud. The like-for-like environments of Cloud Foundry provide an easy path for that migration to public cloud. And the same applies to going from one public cloud to two public clouds. And then finally, the least tangible benefit of Cloud Foundry is that when you have that rapid acceleration, that rapid decrease in the time it takes to get an idea out in front of the business, you drive an innovation culture. You drive a culture where it's okay to try out an idea, test it on its own merits, and then feel free to continue or move in a completely different direction, because the amount of investment to push that idea out in front of the business becomes significantly less on Cloud Foundry. So let's talk about what makes Cloud Foundry successful at Verizon. At the end of the day, adoption and growth of the platform is only successful if the developer experience is phenomenal. It doesn't matter how stable, how performant, how well over-engineered with every piece of technology you can imagine under the hood: if the developers think it sucks, it sucks. And when we talk about who our developers are: when we started, we all thought we were going to have these 12-out-of-12-factor, green field, complete from-the-ground-up re-architecture applications. The best of the best, unicorns. In reality, what we got was a little different. All right, but that's okay.
When we look at our actual use cases on the scale of brown field to green field, we find that we're somewhere in the middle, maybe skewing a little bit towards green field. They're semi-stateless, maybe a couple of factors short of 12-factor applications, still in the middle of refactoring, or they're small, single-purpose-built applications. They're not massive systems that have been completely re-architected. So we had to realize that not every application is going to be green field. It took a while for us to recognize just how true that was going to be for application re-architecture, and that that's okay. We can't hold out for perfect use cases if we want to grow adoption. So, knowing that we have these not-so-perfect 12-factor applications coming in, and developers for whom this is a brand-new concept, this re-architecture is scary, there are benefits to the opinionated paths of Cloud Foundry here. Opinionated paths kind of become those bumpers in the bowling lane: you can move around, but you're still gonna stay within the lane of what you're able to do within the platform. These are things like maximum container sizes. You can't push a container that's 32 gig. You can try, but we won't let you. You can't SSH in. You have a limited selection of buildpacks to choose from. You're forced to be multi-tenant. The platform is gonna be inherently a little less stable, so you have to start accounting for that within the architecture of your application. These things are really the equivalent of pushing a kid into the deep end of the pool when it came to adoption of the platform: bring the developers on, lure them in with promises of faster deployment, and then they get hit hard with the hammer of the reality of why 12 factors are really needed.
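To make the bowling-lane bumpers concrete, here's a minimal sketch of the kind of guardrail checks an opinionated platform applies at push time. The memory ceiling and curated buildpack list here are illustrative assumptions, not Verizon's actual policy; the talk only says that a 32 GB container gets rejected.

```python
# Hypothetical platform policy; real values would come from platform config.
MAX_CONTAINER_MB = 8 * 1024  # assumed per-container memory ceiling
ALLOWED_BUILDPACKS = {"java_buildpack", "nodejs_buildpack", "python_buildpack"}


def validate_push(requested_mb: int, buildpack: str) -> list:
    """Return a list of guardrail violations for a requested push."""
    violations = []
    if requested_mb > MAX_CONTAINER_MB:
        violations.append(
            f"container memory {requested_mb} MB exceeds platform max "
            f"{MAX_CONTAINER_MB} MB")
    if buildpack not in ALLOWED_BUILDPACKS:
        violations.append(f"buildpack '{buildpack}' is not in the curated set")
    return violations
```

Under these assumed limits, a 32 GB push comes back with a violation instead of landing on a Diego cell, which is exactly the "you can try, but we won't let you" behavior described above.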
In order to provide a superior developer experience, especially as developers are transforming the way they're running software and transforming their application architectures, as an operations team, as a platform provider, we really need to do everything in our power to empower those developers. That is a continual lesson we've learned: just how critical it is to not just invest in the platform, but invest in ways to empower the developers who are using the platform. We do that now by enabling troubleshooting with self-service tooling: network connectivity applications, dummy shells that they can SSH into to perform platform-level checks. And we do that by being transparent, taking those DevOps principles of breaking down the silo and providing feedback on every step of the process from the operations side back to development, being honest about what the cloud is and what is to be expected from multi-tenant instability, and helping applications answer the question every platform operator gets asked: something's kind of funny with my app, something's not working right. Is it my app or is it the platform? We provide all the tooling we can to help developers self-service troubleshoot that issue. So that's the developer experience, and what we can do to make that experience better. That said, a superior developer experience still requires a performant, available, and stable platform. So that gets us into how we operate Cloud Foundry. How do we keep Cloud Foundry running at massive scale? How do we keep 12 Cloud Foundrys running at large scale? To do that, you need a top-notch team. Your operations team becomes subject matter experts not just in all the technologies that make up the platform, but in every technology that that platform touches, because they become the front door for support for anything associated with that platform.
So that is: if you integrate into an identity system, your platform operators need to know the ins and outs of that identity system. I think, honestly, within Verizon IT, our Cloud Foundry operations team was the first instance of a true full-stack engineering team. And that's because operations owns the whole stack, from the base-level infrastructure, all the way up through Elastic Runtime, and then all the way up to the network services and any other services that Cloud Foundry plugs into. Support and ownership apply across the entire stack, even to the point of operators creating applications that they themselves run on Cloud Foundry and make available as tools to consuming developers. Cloud Foundry operations owns that whole stack. And so you need a top-notch team with subject matter expertise in that full stack. Expertise in the stack is one thing. The other thing you need to manage is visibility. You need situational awareness of what is going on in your foundation, in your platform, at any point in time. And so you need, in my opinion, a good quality monitoring suite to do APM and infrastructure monitoring of a Cloud Foundry foundation. We've done this with integrations into Datadog. Getting all those metrics and pushing them to Datadog, what really matters from those measurements? We really measure four things that are important to us. We measure health KPIs of each foundation: are jobs responding, are VMs up? We measure capacity remaining: when do we need to start scaling out this platform? Can someone come in and schedule additional containers, or are we gonna run out of space? We need to know that in advance. Underlying VM health, as an indicator: is there something going on with the IaaS that we need to be aware of, meaning we could be running in a more fragile state at this point in time? And the last one, which I'm really excited about our progress on, is smoke tests.
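As one illustration of the capacity KPI, here's a sketch of computing "capacity remaining" from per-Diego-cell memory figures and shaping it like a Datadog custom-metric point. The metric name, tag format, and field names are assumptions for illustration; the real firehose metric names and Datadog payloads differ.

```python
def capacity_remaining_pct(cells):
    """Percent of schedulable container memory still free across Diego cells.

    `cells` is a list of dicts with assumed keys `total_mb` and `free_mb`.
    """
    total = sum(c["total_mb"] for c in cells)
    free = sum(c["free_mb"] for c in cells)
    return round(100.0 * free / total, 1) if total else 0.0


def to_datadog_gauge(foundation, pct):
    # Shaped roughly like a Datadog custom-metric point; names are hypothetical.
    return {"metric": "cf.capacity.remaining_pct",
            "points": [pct],
            "tags": [f"foundation:{foundation}"]}
```

Tagging each point with its foundation is what lets one dashboard answer the "when do we scale out?" question per foundation across all 12.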
Being able to run continual smoke tests against each foundation. Every 15 minutes, we run the smoke tests against every foundation and feed those results back into monitoring. What does that look like in the end? Taking all those foundational health KPIs and putting them on one screen: that's our sea of green. That is our one quick look at the health of our foundations. What is the health of all 12 of our foundations, and what actions do we need to start taking? Then we start to dive into what's happening within each foundation, charting it out over time: what capacity is remaining? Was there a spike in 500 responses? When are our peak load periods? All things that we wanna track, measure, and monitor. That's visibility. Operations needs visibility, and they need action from that visibility. And that's where we've seen great success in these last couple of months with Concourse. Concourse has been a game changer for us. We use it now to do release upgrades, which keeps us closer to the latest version of Cloud Foundry released upstream. We use it to do stemcell upgrades. We use it to do buildpack management. It's the mechanism by which we're able to do that continual smoke testing every 15 minutes and push the results up to Datadog. And we even use it for centralized user management: managing users in one location and propagating changes out to 12 foundations. All of which has enabled us to do smaller, more frequent patching. So it allows us to be quicker in operating the environment, but it also promotes consistency across all the environments. Before Concourse, we had trouble keeping all of our environments consistent, and that led to a poor developer experience. If their app pushed in one environment but failed to stage in another because the Java buildpack was different, that's a poor developer experience. And that's something that would slip through the cracks before we brought in Concourse.
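The "sea of green" roll-up can be sketched as a tiny function: collapse each foundation's KPI checks into a single green-or-red status, one cell per foundation on one screen. The check names here are illustrative stand-ins for the KPIs described above, not the actual dashboard fields.

```python
def foundation_status(kpis):
    """'green' only if every KPI check for a foundation passes, else 'red'.

    `kpis` maps a check name (e.g. 'jobs_up', 'smoke_test') to a boolean.
    """
    return "green" if all(kpis.values()) else "red"


def board(foundations):
    """Roll all foundations up into one at-a-glance status board."""
    return {name: foundation_status(kpis) for name, kpis in foundations.items()}
```

The point of the design is that the board answers only one question, "which foundation needs attention?", and the per-foundation time-series charts handle everything deeper.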
Now, while I was on the plane here, I was rereading The Phoenix Project. Who's read that? Awesome. A couple of things really stuck out to me this time reading through, things we've seen ourselves in action. Concourse is what's enabled us to get out from behind constant platform firefighting and actually have the cycles to create tooling and processes that automate and make it easier for operations to continue to manage the platform. And that effect becomes exponential: as you build more automation tooling, you free up more time to build out more automation tooling. It grows and grows and grows. That's why Concourse has been a game changer for us. So, just taking a look at this in action with the smoke test example. Every 15 minutes, against every environment, we're running a Concourse pipeline to push and run the smoke test errand. If you're not familiar with the smoke test errand, what it does is create an org, create a space, push an application, and then check to make sure that that application is up and running. It really puts the whole Cloud Controller API, Elastic Runtime, and the Diego scheduling service through their paces. So a successful smoke test is a pretty good indicator that Elastic Runtime itself, maybe not all the ancillary services, but Elastic Runtime itself, is up and running pretty well. Then, taking this one step further and combining what we've done with Concourse and what we've done with Datadog, the last piece is down here at the bottom: actually pushing the metrics of success and failure up to Datadog so we can track them over time. So here you see we have a time series over the last week of successful and failed smoke tests against an environment.
We also have alerts set up so that we find out when smoke tests fail, so we know as operators if there's an issue with the platform long before a developer opens a service ticket or says, hey, something's broke. Concourse has been immensely powerful and immensely valuable to us in being able to do these things. So, along our journey here, we've come a long way and we've changed a lot. What have we learned? Well, first, it's okay to be a donkey farm. We were expecting 12-out-of-12-factor unicorns and realized that wasn't the case, but that's okay. Application teams and developers are still getting the benefits of a PaaS while re-architecting their applications and continuing to grow their application maturity through refactoring, moving more toward a true cloud-native application. And we've been able to help them on that journey by continuing to empower them with tooling. This is, again, a continual lesson we have to keep teaching ourselves as an operations team: at the end of the day, this platform's purpose is to empower developers. One of the ways we do that is providing transparency into the platform. Shifting the burden of responsibility for operations and some architecture considerations, like middleware versions, away from the development team and into the buildpack and the platform, that's a significant change. Ask a development team who's been doing that for years, whose scope of ownership and value-add that's been, and say, no, you don't control that anymore, we're gonna take care of that for you, and they'll say, I don't really trust you to take care of that for me. That's where you provide that transparency. You take over ownership of operations, but you provide as much transparency into those operations as possible to help build up that level of trust between developers and platform operations. And finally, automate everything.
If we're performing an action against an environment, it should be done through automation, because chances are we're gonna have to do it to the other foundations too. That's been our journey so far. Where are we going? Well, first, taking those alerts and turning them into meaningful remediation, really going full circle with this interface between Concourse and Datadog: having Concourse report on events or findings, pushing that up to Datadog for tracking, monitoring, and alerting, and then taking that alert and putting it right back down into Concourse for remediation. Which leads into, again: Concourse all the things. Our major platform investments over these next few months will all be around increasing our automation and management capabilities through Concourse. And then we can always do a better job with platform transparency. The next step there, for us, will be to take all those metrics, and the visibility into those metrics that we use internally in operations, and build them out into dashboards available to developers and to executives. Being able to show, with one click or through an API: what's the health of this platform? Should I push to this platform right now? Is it up and available, or should I push somewhere else? Being able to provide that data out to developers, continually providing that view into the platform. I'm hoping now, as we've gone through just a brief summary of our journey, what we've learned, and where we think we're going, that we've provided a little bit of information that you'll be able to use and apply to your own platforms. So with that, I'd like to thank you all for attending, and now open it up for any questions. Yes. So the question is why we moved off of OpenStack. Stability. It comes down to, we were finding with OpenStack that our failures were not isolated to availability zones. They weren't isolated to hardware clusters. They were logical failures that affected the entire cloud.
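The full-circle loop described above, Concourse finding, Datadog alert, back down into Concourse, can be sketched as a simple mapping from alert names to remediation jobs. The alert names and pipeline job names below are hypothetical; only the shape of the loop comes from the talk.

```python
# Hypothetical mapping from a Datadog alert name to the Concourse job
# that remediates it. Real alert and job names would differ.
REMEDIATIONS = {
    "diego.cell.memory.low": "scale-out-diego-cells",
    "smoke_test.failed": "rerun-smoke-tests",
}


def remediation_job(alert_name):
    """Pick the Concourse pipeline job to trigger for an incoming alert.

    Returns None when no automated remediation is defined, meaning a
    human operator still has to look at it.
    """
    return REMEDIATIONS.get(alert_name)
```

In a real deployment, the webhook handler receiving the Datadog alert would then trigger the returned job, for example via Concourse's `fly trigger-job` CLI or API; that wiring is left out here.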
So in order to effectively run an HA, multi-AZ approach with Cloud Foundry, that required two independent foundations, running on entirely separate OpenStack environments within the same data center, just to provide some level of HA. And the overhead of that level of redundancy was too great. Yes, yes. So the question is, where does APM fit into all of this? What I showed is infrastructure-level monitoring, what's used by an operations team for the health of the platform. When it comes to what Verizon has done for APM: we come from a history of so many different companies and so many different independent IT organizations that have, through mergers and acquisitions, become one large, pretty happy family. Everyone's come with their own opinion and their own existing licensing contracts about what APM they wanna use. So we've taken the hands-off, APM-agnostic approach and say: you should use an APM. In some cases we're saying, out of the box you now have PCF Metrics available to you, but you're still gonna wanna bring your own APM here, and we'll help you plug it in. So APM is still being brought in and determined by the different consuming applications. Yes. So the question is whether we're running .NET. Not fully; nothing beyond the lab at this point. I think it's been not so much a platform limitation as where .NET applications have been focusing. Yes. I'll be available afterward as well for any additional questions, or send them to me on Twitter. So I can give you one example, which has been change management. We come historically from ITIL systems where you have to sit in a change review board meeting two weeks prior to wanting to make a change, and you have to get all of your stakeholders to agree that, okay, that's a reasonable time to make this change.
We have 12 environments, with potentially disruptive changes happening during upgrades or during fixes. To get consensus among a hundred applications on a good window for that, agreed upon with a two-week lead time? Not gonna happen. So change management is an example where we really, in some cases, had to push and say: this is the way it has to be for this platform to be successful. And we're fortunate, with Cloud Foundry being so community-focused and so widely consumed across the enterprise, that we have plenty of examples from other operations teams and other companies, and can say, this is how they're doing it; there's no reason we can't do it this way. That has led us to a categorization of changes into high and low risk. Low-risk changes are approved automatically; we use automation to put them into change control. High-risk changes aren't done on that two-week schedule, but they're still done in the evening. Yes. So the question is, what's our public cloud strategy? We want to move a large number of our IT workloads into the public cloud. We think it's an innovation accelerator, and as we continue to grow our compute needs across all areas of the business, not just IT, we're looking for opportunities to expand that compute. We're already running at capacity in our data centers, and we'd really like to avoid the capital investment of additional data centers. That said, with government contracts and lots of customer-identifying data, things like call detail records, I believe we'll always have a need to keep some of our systems within our firewalls. I think the definition of what's allowed to go to the public cloud and what has to stay inside will continue to change, and will continue to shift in favor of public cloud as we become more familiar and comfortable with it. But I think there will always be a few use cases that have to stay.
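The high/low risk split can be sketched as a small classifier sitting in front of the change-control system: low-risk change types get filed and approved automatically, everything else keeps a human in the loop. The change-type names below are illustrative, not Verizon's actual change taxonomy.

```python
# Assumed change taxonomy for illustration only.
LOW_RISK = {"buildpack-update", "stemcell-patch", "user-sync"}
HIGH_RISK = {"platform-upgrade", "network-change"}


def schedule_change(change_type):
    """Decide how a change enters change control.

    Low-risk changes are auto-approved and filed by automation; high-risk
    ones skip the two-week board but still run in an evening window;
    anything unknown falls back to manual review.
    """
    if change_type in LOW_RISK:
        return "auto-approved"
    if change_type in HIGH_RISK:
        return "evening-window"
    return "manual-review"
```

The value of the explicit taxonomy is that the automation pipelines (buildpack updates, user sync) can file their own change records without waiting on a review board.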
But I don't think that's necessarily targeted by use case. It's not a cloud bursting model. Yes. So, the example there was a multi-tenant platform while still helping applications troubleshoot issues. The question was: with a multi-tenant platform, trying to enable application troubleshooting while still keeping developers happy, those are in many cases conflicting ideas. How are we successful with that? I think multi-tenancy became a requirement just from an operations perspective. We couldn't give every team their own foundation. So in order to get onto Cloud Foundry, the cost of entry was: you're gonna have to be able to play nice with others. Second, enabling application troubleshooting and keeping developers happy actually go hand in hand. In keeping developers happy, we wanna provide all that tooling to enable application troubleshooting and make it as painless a process as we can. But I'll be honest with you: that's something we have struggled with over the last couple of years. We've made great investments in the platform. I think there's still a lot we can do to invest in enabling and supporting application teams, really providing a one-stop support endpoint for not just Cloud Foundry, but 12-factor, cloud-native applications that run on buildpacks. Yes? Sure. So one of those was an app we found out there, Will It Connect? I think Pivotal engineering had written it to help troubleshoot. It's up on GitHub, and it's just a really simple Spring Boot application: you provide an endpoint, and the application will tell you whether or not it can establish that connection.
One of the first things we often have to do in troubleshooting is identify whether there's some kind of down dependency: a back-end service the application depends on that wasn't wrapped in a circuit breaker appropriately, so the app was failing to stage because the database service wasn't actually responding. With a tool like Will It Connect, you take their app out of the equation, put that dependency into the Will It Connect field, and see if that endpoint is up. Right. And then, beyond that, what we're looking to get into is building out strawman status applications on the platform. These are applications that don't have any external dependencies; they're only dependent on Cloud Foundry itself. For platform health, you can point at those apps and continually push and re-stage them to show that, while we can't say it authoritatively, we can say with some confidence that Cloud Foundry, the platform, is not the issue in this case. So let's go take a look at what's happening within your application. All right, yes. So the question is, does Verizon have... are you from Garmin? Okay, because Garmin was talking about their labs, and we're like, oh, that sounds like our dojos. Within the last, say, eight months, since the start of the year, in some of our tech-heavy office locations we've established dojo programs. A lot of it is around teaching DevOps principles and some of the modern tools and techniques you can use to accelerate your app development. We put app teams through those engagements for something like four to six weeks, where they're still working on delivering their desired outcomes for the application, but they're doing it side by side with coaches trained to teach those techniques and tooling. Yes. I think the first one we're gonna try and tackle is self-scaling around a Diego alert: we're running low on memory, we're gonna need to add a couple more Diego cells.
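The core of a Will It Connect-style probe fits in a few lines: given a host and port, report whether a TCP connection can be established within a timeout. This is a minimal sketch of the idea, not the actual Spring Boot tool, which lives on GitHub and works over HTTP.

```python
import socket


def will_it_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time.

    Takes the application out of the equation: if this fails, the
    dependency itself is down or unreachable from the platform.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable
        return False
```

A developer whose app is failing to stage can run this against the database endpoint directly; if it returns False, the problem is the dependency, not the app or the platform.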
So taking that alert up to Datadog: we have that alert, we can see how often it happens, and once that alert hits and the remediation event gets pushed up and the remediation is complete, we should be able to see, at the corresponding point in time, the alleviation of that memory constraint. Yes. We have a lot of teams who want us to support stateful services, or stateful applications. I think we're waiting patiently on some additional functionality around the NFS volume service, support for some more NFS back ends, and support for backups, before we really open the doors on that and say, hey, bring your stateful apps, we can support those. But there's certainly the ask for that. In terms of databases: as you saw from the list of what the operations team owns and controls, if you throw database administrator into that mix, we'd be underwater. So we've drawn a hard line and said, as an operations team, we can't also be supporting database services. We'll happily integrate another database-as-a-service offering and expose it through the marketplace, but there still needs to be a Verizon database ownership group over that. Yes. Well, anytime there's a problem, they constantly talk to each other. No, really, the way our platform team within Verizon has organized, Conway's Law, right? The way our processes have organized is that as the platform team has been the closest, and the closest tied to the success of the platform, they've also become the evangelists of the platform. So they're also the ones out there beating the drum a little bit: hey, have you tried out Cloud Foundry? That sounds like a use case that would be well suited for Cloud Foundry. Let's see if we can help you out and teach you some of the ways you can have success here. Yes. Yes.
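The Diego low-memory self-scaling rule mentioned above can be sketched as a pure calculation: given total and free memory across the cells, how many cells do we add to get back above the alert threshold? The threshold and per-cell size here are illustrative assumptions, not Verizon's actual sizing.

```python
CELL_MB = 32 * 1024   # assumed memory per Diego cell
MIN_FREE_PCT = 20.0   # assumed alert threshold: free memory below 20%


def cells_to_add(total_mb, free_mb):
    """How many Diego cells to add so free memory climbs back over threshold.

    Each added cell raises both total and free capacity, so we iterate
    until the free fraction clears MIN_FREE_PCT.
    """
    target = MIN_FREE_PCT / 100.0
    cells = 0
    while free_mb < target * total_mb:
        total_mb += CELL_MB
        free_mb += CELL_MB
        cells += 1
    return cells
```

Because the remediation event and the before/after memory metrics both land in Datadog, the alleviation of the constraint shows up on the same time series that raised the alert.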
So, usually it's come down to: could we find an opportunity to consolidate our APM and our infrastructure monitoring under one system for consolidated pricing? We looked at New Relic as they moved into the infrastructure space, and we looked at Datadog's APM tools that came out recently. So I think it's something we're still continually evaluating. One of the big drivers for us with Datadog was that there is a well-established method of getting those firehose metrics out to Datadog. That was provided out of the box; we did a little bit of config on it, but it didn't take much. Yes, sure. So that's actually, within the last couple of months, what I've moved into now: an additional platform. We're looking at an API-first re-architecture strategy across our IT landscape, so we're looking for a singular API gateway and API management platform. For that we're working with Apigee, now Google Cloud. All right, one more question? Let's see if we've got enough time here. So we're off of... So we're getting everything through Pivotal. We're on Pivotal Elastic Runtime release 1.10. All right, looks like we're running out of time here. Thank you, everybody.