All right, everybody, welcome. Welcome. Thanks for coming right after lunch. You had a lot of choices for some really, really good talks at this time slot. You had a choice to sit down and take a nap. And you chose us. And I really appreciate that. Please don't nap in here. We'll do our best not to as well. Right. So thank you. Cracks in the Foundation, we're going to talk about bumps in the road that you're going to see during your time running Cloud Foundry and getting started and all of those things. So to kick us off, I have a disclosure I'm going to share with you. So just so you know, everything that I say today is my own thoughts, my own opinions, and in no way reflects any of the views, values, or opinions of my employer. Everybody got that? We may remind you. Just a few times we might. Bear with us on that. So first, I wanted to give a little bit of a background as to why we wanted to give this talk. So how many, show of hands, have been to conferences in the past? Yeah, this is good. How many have walked out of a conference and maybe been like, wow, that was great. I know I felt like rainbows, puppies, butterflies, everything's great. But then you reflect on your own experience, and you're like, wait a minute. I don't know what they were talking about, how great it was, but I'm not having that same experience. Am I doing something wrong? The answer is no. Barely anyone was talking about it. So that's what we wanted to share with you today. Some of the things that might go wrong, ways to mitigate them, ways to look at them differently, and to share it. Because we're all a community. And we should talk about the things that go wrong so that we can share and grow from each other. That's right. And we want you to know that we know some of you are going through this right now. And that's OK. And we're here with you. And everybody has been through this. 
And normally, when you see in the introduction slides like this, this is where we explain who we are and what we do, but our bios exist. They're on the schedule. And you're more than welcome to read them. So we're going to jump right in. And this is a 10 list. It's not a top 10 list. It's not a not top 10 list if you're an ESPN fan. It's just a 10 list. It's 10 things that we came up with in no particular order. A lot of that reason is to keep you on your toes. You don't even know what the top one might be. So bear with us as we go through the whole list. So for number 10 to kick us off, automation is a solution to our problems. It's also the problem sometimes. How many engineers out there, and this is going to be a little show of courage, have over-engineered a solution? Oh, this is good. Oh, this is a safe space. I like this. I like it, too. I know I have, right? We've created the perfect code. We think it's gorgeous. It's beautiful. And then someone else takes a look at it and can't read it at all. Having a really hard time following it, right? And that's the idea, is you don't want to write automation that does absolutely everything you could possibly think of, and then some, and you'll never have to update it for 25 years. No. You want to do what it's supposed to do, right? Right, exactly. At Pivotal, we preach iterative development, iterative deployment. We want you to deploy early. We want you to deploy so early that first time that you're uncomfortable deploying, really uncomfortable deploying. We want you to deploy often, multiple times a day, if you have to. Every time you push to your source control system, you should trigger a deploy. And that can go for application teams and platform development teams, right? Everybody should be a little uncomfortable because first couple of times you're pushing things out. You're going to have users. You're going to get feedback, and that's the idea. Right. Perfect is the enemy of good. Exactly. 
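The "every push to source control should trigger a deploy" idea can be sketched as a git post-receive hook. This is just an illustration, not anything the speakers prescribe; the branch name and the deploy entry point are made-up placeholders.

```python
# Sketch of a git post-receive hook that triggers a deploy on every
# push to one branch. DEPLOY_BRANCH and the print stand-in for a real
# CI/CD trigger are illustrative assumptions.

DEPLOY_BRANCH = "refs/heads/main"  # hypothetical branch to watch

def refs_to_deploy(lines):
    """Parse post-receive input lines ('<old-sha> <new-sha> <ref>')
    and return the new SHAs pushed to the deploy branch."""
    deploys = []
    for line in lines:
        old_sha, new_sha, ref = line.split()
        if ref == DEPLOY_BRANCH:
            deploys.append(new_sha)
    return deploys

if __name__ == "__main__":
    # A real hook would pass sys.stdin here; we simulate one pushed ref.
    for sha in refs_to_deploy(["a" * 40 + " " + "b" * 40 + " refs/heads/main"]):
        print("would trigger deploy of", sha)  # stand-in for your pipeline trigger
```

In practice the trigger would hand off to whatever CI/CD system you run, so the hook itself stays tiny and the testing, scanning, and rollout logic lives in the pipeline.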
Can we think of any other areas in the world where this is applicable? That's a really great lead-in, because I've got a great idea about that. OK. So everyone, pull out your painter's hats. Think about some artists that you may know, and I'm going to highlight two of them. By the way, I hope you brought lots of hats. We're going to be wearing lots of hats today. There's a lot of hats involved. The first one's Michelangelo. Anybody heard of Michelangelo? Not the Ninja Turtle? We're not as courageous about raising our hand about Michelangelo. That's OK. So Michelangelo, pretty famous, right? He painted the Sistine Chapel. And Josh, do you happen to know how long it took him to paint the Sistine Chapel? I do know, because we've practiced this talk a bunch. And it was like four years. It was about four years. That's a long time, right? If you think about it, maybe not a long time in terms of creating the perfect painting, but perfect is the enemy of good. At the end of the day, if you boil it down to what it is, it's a painting. I want you to think about another artist. I want you to think about Bob Ross. Has anyone seen Bob Ross do some paintings? Oh, there's more hands. This is good. All right. So Bob Ross, in about 30 minutes, created a pretty good painting, right? I know I've been impressed. And he creates maybe little happy accidents along the way. And that's OK, because at the end of the day, he has a painting. Now, granted, Michelangelo, pretty famous. The Sistine Chapel is gorgeous. It's beautiful. But they're both paintings. And that's the idea here. At the end of the day, you want to get something done. You want to get a painting finished. Right. Let's take this back a little bit to the tech world. And let's pretend that I'm the Michelangelo here. I have been working on a piece of automation on my system. It's sitting on my laptop. It's beautiful. It's got hundreds or thousands of unit tests. It's covering every possible situation I can think of. 
But it's not in prod. Does that mean I get to be Bob Ross? Yes, you do. You need a curler or a perm, I guess. You have a piece of automation that you are constantly deploying to your prod CI system or CD system. And maybe it's 70% of the way there. Maybe it doesn't do everything yet. Maybe there's some bugs. She's getting real feedback. She's getting feedback from customers because I don't care how many hundreds or thousands of unit tests you have, your customers will find a way to break it in a way you didn't think of. It's very much a case of, if it ain't broke, I can fix that. It's the same kind of idea when you think about that agile philosophy of you don't want to build the whole Ferrari before anything goes out the door. You want to start with the skateboard, give it to your customers, let them try it out a little bit, realize they want the bike, and then you can stop at the bike and iterate on that. Maybe get some flashy features on your bike. You don't have to wait before you deliver something. Get it out to your customers as fast as you can. So Josh, to kind of switch gears a little bit, how do you expect the unexpected? You can't. Why is that? Because it's in the name. If you expect the unexpected, it's now the expected. It's kind of the definition. It's no longer unexpected. Right, exactly. It's just kind of how it works. But you often get asked that. And you get asked that by maybe upper management, maybe your colleagues. Maybe you're asking yourself that. And a lot of times, that's because there is fear. Fear of failure, fear of letting someone down, be it your boss or your co-workers or even yourself. How many here have ever dealt with a situation where you thought, if you fail, you're going to face punishment? I'm going to use that word very generically. But maybe you're not allowed to touch prod for a week or you have to bring in bagels because you broke the build. Or maybe you just feel bad. Right. 
And you're like, wow, I don't want to put anything out into prod anymore because I just did something terrible. So either we have a lot of very confident people here. There we go. There we go. Hey, there we go. We've all been there. And that's the idea, right? So let's picture this. What would happen if, say, an entire production foundation went down? How would you feel? Really, really down. Let's say someone accidentally deleted the VPC that your Cloud Foundry system's running in. Anybody really uncomfortable right now? Yeah, a little bit? Everybody want to check your pagers, if anybody's holding the pager? So what we're saying here is you've got to give some thought to that. You're going to have failures. It's actually like the one thing you can expect. Things are going to go wrong. It's death, taxes, and outages. Those are the three things? Yeah. You've got to think about failure and you've got to think about disaster recovery. So we are saying: think about those things, plan for them. But we're also saying you can plan more for how you're going to handle failure, how you're going to handle it when those things happen. Because it's not if, right? It's when. Anybody taken down production before? Show of hands? If you haven't, you will. Probably will. And that's OK. Here's the thing. You're going to learn from that. We learn from our mistakes. And again, that's the whole point of this. You won't get it 100% right the first time. You need to take a minute, reflect on what happened. These are some of the best learning scenarios you're ever going to encounter. Reflect on that and say, hey, maybe that automation that I put out before, maybe I can add something to that to make it better. Check for this the next time. Iterate on my development. So if you can't expect the unexpected, what should we do? I think we skipped ahead a bit too far. I think we should plan for the unexpected. Like that? Like that. Exactly. Yeah, absolutely. 
You can plan for the unexpected and you can learn from the unexpected. When you have a failure, when you have an outage, do a post-mortem and make it blameless. Anybody do post-mortems? Yeah. Common thing? How many of you focus on what happened and why? All right, now how many? Just that guy. How many of you focus on who did it? Maybe you shouldn't do that. It doesn't matter who did it. It matters what happened, how you fixed it, and how you are going to mitigate it in the future. Right. I love post-mortems because it's a chance to actually share with your consumers, hey, we know something happened. And we want to be transparent about it. We want to tell you what we learned from that scenario. But one pitfall that you might run into is you just share the bad things. You've got to make sure that you balance that. Share out the good things that are happening too. What are those success stories? What are those GWIS numbers? Get people really comfortable with sharing information. Both bad and good. And how you're learning through the process. Because it can also encourage others to do the same thing. Then you really get into that community aspect and learning from each other. Right. And you can also post-mortem your teams. And I know that sounds morbid, so we actually use a different term. We call them retros. How many here do team retros weekly? Quarterly? Hopefully a little more frequently than that. But the point is, you sit down with your team, gauge the health of your team just like you gauge the health of your platform. Find out what's been going well. Find out what's not been going well and figure out how to change that. So that the things that aren't going well aren't not going well the next time you do a retro. Right. Now here's a little courageous part. Anyone not doing those? Show of hands? Oh, this is good. We're getting some courageous people out there. That's OK, right? Because this is a learning opportunity. 
You can be the change in your company to say, hey, why don't we share out what we did to fix this problem? Why don't we share out the things that maybe went wrong and learn from other people? It's your chance to stand up, whether you're a formal leader, an informal leader, a developer, an operator. It doesn't matter. You can say, hey, failure? We're going to learn from that. We're going to do things differently next time, and we're going to share out the things that matter. Right. And I'm going to do a bit of a disclosure here. That we know this is not easy. This massive shift in cultural change is not easy. We're not claiming it's easy. We don't want it to sound like it's easy, because then we'd be just back to all the talks that we didn't like before. We know it's hard. We know it's going to take a while, and we're going to hammer on that as we continue on. So for this next one, how do you change the culture around upgrades? This is kind of our second culture slide. So how many out there, and your hands are going to get tired, because we're going to do a lot of this raising hands thing? So how many out there maybe have maintenance windows? Anyone been around at midnight on a Saturday, midnight on a Sunday putting out a change? Excited for that breakfast pizza that's coming at 7 AM? It might just be a Midwestern thing, I'm not sure. It's a breakfast pizza if you have it at breakfast time. There you go. That's a great way to think about it. What happens? Anyone else maybe have an expectation that you have 100% uptime outside of those designated maintenance windows? Yeah. How great is that? Have you been on the consumer side of that? Have you ever tried to go to a page maybe, update your car registration, or possibly make a dining reservation? You see a page that says, uh-oh, this page is under maintenance. Anyone had that experience before? I just had this yesterday. Yeah. How did that make you feel? Oh, I like this great one over here. I felt great. 
Glad I couldn't update my car registration. It was perfect. Now I want you to really think about that experience and how you could change some of that. Right. Because if you think about it, if you want 100% uptime, while you have that 100% uptime, you're running on static infrastructure. Static infrastructure doesn't get updated. You have zero-day threats just sitting out there waiting to be exploited. Maybe somebody's already exploited your system and you don't know it. And they're just sitting there doing their nefarious deeds with you, none the wiser. And so you really need to get into those small incremental upgrades because it will mitigate those threats. Right. And the idea here isn't that you're just pushing code willy-nilly out to production every day, right? The idea is that you're looking at proper testing practices, scans. You've got continuous delivery in place, continuous integration on both sides, the application delivery side, and your platform side. So that you know when a piece of code is going out, it's been through that testing lifecycle. It's been through those scans. I have confidence in that. Now, is it perfect? No. Of course not. You're always going to run into things that might be problems. Right. And we don't want it to be perfect because perfect's the enemy of good. That's right. You're learning. Oh, yay. I'm learning. The other thing to think about is how many people are at their best at midnight on a Saturday? For work. Not many hands. So yeah, probably not anyone. Anyone a little bit better at 9 AM on a Tuesday? Probably in better shape around there, yeah? Some people are night owls. Yeah, hopefully you've had a couple of coffees by then. Yeah, here's the thing to think about, though. When it comes to upgrades, when it comes to patches, we've got to start getting comfortable with some of those things happening behind the scenes and trying to introduce changes in ways that aren't going to impact our users. 
It's important to think about that, and to be comfortable with it. Because if you only have one person on at midnight on a Saturday and something goes wrong, yeah. It's not great, right? They're probably going to have to call the other 10 people who weren't awake, and they're not at their best either. And they just have a whole bunch of people at about 60% maybe. Give or take. But if you're doing that upgrade at 9 AM on a Tuesday, perhaps, maybe you have people close to 100%. You've got your team there, and they're focused if something were to go wrong. Right. And going back to the zero-day threat, I don't think many companies will just let known zero days sit on their platform. You address that. You deploy a fix for that zero day. But when you do that, you are acknowledging and accepting risk. So why not extend that a little bit further? Why not check your error budgets? Because you should have error budgets. You should say, I've got the time to deploy upgrades to these other parts of the system that may have different security vulnerabilities that aren't quite as critical. Or maybe it's a massive performance increase. Accept that risk too. Again, we know this isn't easy. We know that a lot of folks have contractual obligations for uptime. But contracts don't last forever. So maybe the next time you revisit that contract, you work towards that. Well, something to think about that Josh mentioned are those service level objectives, service level agreements. Consider those. You have an objective that you're planning to meet with that agreement. Maybe you're at 100% uptime right now. You can introduce some change. You can introduce some feature upgrades. Maybe you're at 80%. And you are trying to agree to 99.99. Probably not in the best shape, right? Does that mean that you don't deploy that critical patch that needs to go out there? No, it means that you need to look at it with a little bit more scrutiny. Decide, hey, is this the right time? 
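The error-budget check being described here is simple arithmetic on the SLO. A minimal sketch, where the 99.99% target, the 30-day rolling window, and the downtime figures are all illustrative assumptions, not numbers from the talk:

```python
DAYS = 30  # illustrative rolling window for the budget

def error_budget_minutes(slo: float, window_days: int = DAYS) -> float:
    """Minutes of allowed downtime in the window for a given SLO."""
    return window_days * 24 * 60 * (1 - slo)

def budget_remaining(slo: float, downtime_minutes: float,
                     window_days: int = DAYS) -> float:
    """Fraction of the error budget still unspent (negative = blown)."""
    return 1 - downtime_minutes / error_budget_minutes(slo, window_days)

# A 99.99% SLO over 30 days allows only about 4.3 minutes of downtime:
print(round(error_budget_minutes(0.9999), 1))   # ~4.3
# Two minutes already burned leaves roughly half the budget:
print(round(budget_remaining(0.9999, 2.0), 2))  # ~0.54
```

The point of the arithmetic is the decision rule: if `budget_remaining` is healthy, you have room to take on the risk of a non-critical upgrade; if it's near zero or negative, that's the time for extra scrutiny.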
Because I want to make sure I'm building trust with my consumers and introducing that risk when I have the error budget to do so. Sure. So we've talked about some cultural changes. We've talked about how now you've got a great system that everybody is deploying patches to and it's automated and you've got upgrades going. But what if you build it and none of the app developers want to get on it? They're very happy where they are, deploying to their Kubernetes pod or to their WebLogic server or to their LAMP stack, to go very classic. So let's take off our painter hats and let's put on our philosopher hats. And I'm going to ask a very odd philosophical question because it's one that has a right answer. If you have a prod system and it's not taking prod traffic, is it really prod? It's kind of like, if a tree falls in the woods, does it make a sound? Right, exactly. So you need to deploy your apps to prod. You need to deploy them early. You need to deploy them often. Prod is the happiest place on Earth, unlike maybe some theme parks. I don't know about that. We might have to agree to disagree on that one. But it's important that you partner with an application team to really get buy-in. You partner with an application that's maybe taking a lot of traffic, maybe has a lot of value at your company. Sure, think of an SSO app or a payment processing app. Exactly. And you say, hey, we're going to go through this journey together. We want to help you along the way. There's going to be some bumps. I can almost guarantee that. But we're going to take it together. And I want you to picture this. You're standing on a ledge. Everyone's on a ledge, right? We're all up there behind each other. And you've got a product platform team over here saying, yeah, down there is great. It's amazing. You should really try it. We love it. And everyone's looking at each other like, why would I jump off this ledge? Right? Right. What you need is that one person. 
One person who jumps off, that starts the flow. They get down there. They start singing the praises about how great it really is. But wait, are you saying if all your friends jumped off a ledge, you'd jump too? I kind of am. I'm saying maybe in reality, we can be lemmings in this situation. We can kind of encourage that kind of attitude. And the idea is, if I hear someone else is being really successful on it, maybe I heard deployment times are happening in minutes versus hours. I'm hearing those success stories. I want to go join them too. It can be great fun. Yeah, and just to recap what we're talking about, remember, if prod isn't taking prod traffic, it's just wasted hardware. It may not be your hardware. It may be the cloud, but you're still paying for it. So deploy to prod early. Deploy to prod often. And learn from your failures. And maybe, just maybe, slow the lemmings down a little bit, because what if we build it and everybody comes all at the same time? Opposite problem, or a problem you may have just created by having that first lemming jump off the cliff. Now you've got 300 people saying, hey, this sounds great. I'm on board. Let's go tomorrow. And you're like, whoa, hang on a minute. And this is where you have to balance that. So you may be partnered with that first application. Josh mentioned maybe it's an SSO app. So maybe it's a simple web service. And you say, as a platform product team, hey, I'm going to focus on maybe this capability. I like web services. We're going to start with them first, work through some of the kinks, hold hands with some of the development teams that are getting on board for the first couple of tries. And then maybe I'm pretty comfortable to say, hey, I'm going to open the doors so all web services can come on board. We've had a few go. We've had some success stories. Let's let them come on. And then you focus on the next capability and the next capability to really help kind of slow that roll a little bit. 
You want to be slow and steady to win the race. And it also gives you an opportunity to test some things out, get it wrong a few times, make some changes, and adapt as you go. You partner with them, and then you can eventually feel pretty comfortable with those 300 people that are coming on board. Right, because if you do that, or rather if you don't do that and everybody shows up at once, then you have concerns around, say, observability. Your logs are now maybe 300 times or 400 times more full than they were before. We're going to talk about logs a little bit later because it is her favorite subject. One of. One of. It's her favorite. If you think about it, what happens if a platform failure were to happen when those 300 applications were trying to go to production? Because maybe you didn't try it out a few times. Maybe there was just an application misconfiguration that you weren't aware of, or that you needed to make some more educational time with. You've got to think about that. Your blast radius can be a lot wider. So if you go with the slow and steady approach, and then you open it up to the masses as you go, you're going to be a lot more comfortable with who comes on. Right, because those logs may not actually be structured the way you thought they were going to be structured, and it's going to make them much harder to search and to write the appropriate queries for. In addition, maybe your dashboards that you have showing you health of the system, maybe they're getting overwhelmed because you don't have appropriately scaled ingestors, or maybe you take down your log system. We're going to definitely talk about that a little bit later. And speaking of alerts, did anyone run into the problem where you've had a bunch of alerts coming all at once? Yeah, have you experienced that? And then you're like, oh, I don't know which one matters, and which one doesn't matter. And I'm having a hard time telling the difference. 
If you do this slow and steady roll, you have some chances to make some tweaks along the way. Same thing on the application side. Maybe it's a different system that they're deploying to. So as an application developer, you want to take a chance to look and test, figure out what your logs are going to be, and slowly roll on while you maybe have a smaller number of people there. Right, and you're also going to get a chance to make sure that your error budgets are actually holding up, to make sure that you chose your SLOs wisely, and to make sure that you're calculating your SLOs appropriately, because that's really hard. There were talks at SREcon last week about calculating your error budgets correctly that I highly recommend you check out when they become available on YouTube, probably this week, or maybe next. So let's switch gears a little bit. Yeah. And let's talk about, why is all the hardware gone? Yeah, and we don't mean it walked out. We don't mean that it was a Black Friday sale and people came in and took it. We just mean we've used it all. Yeah, so this is a pretty good segue into capacity planning. Anyone doing capacity planning? I see just a few people. Anyone just trying it? Let's see how it works. Yeah. So most companies have a fairly long lead time when it comes to getting actual hardware in the door. But that's just your opinion. That is, that is. It does not represent the views or values or opinions of my employer. It is only my opinion, mine alone. Because I'm not running out to Best Buy and buying a server and installing it myself, right? Probably not a lot of people are. You've got a lot of process in place. You've got a lot of teams involved. You might even have a separate infrastructure-as-a-service team that you're dealing with. That's even going to be true on the public cloud side. Yeah, the public cloud does not protect you from this. 
If you have to do quota increases or get VPC increases on your account, you know, sometimes that account is tied to a VP's email address. You know how hard it is to get a hold of a VP sometimes? Sometimes it's very difficult and it takes a while. Lead times are real no matter where you are. And they're long. They are. And so the idea is if you're using some of the tips that we talked about earlier and you're gradually introducing workloads, you might have a little bit of a chance of mitigating some of this, right? Including not buying all your hardware at once and having it sit idle there. You're only using like maybe 10% of all the hardware that you think you need forever. And then maybe someone comes along and borrows some of that hardware. Right, and borrowing hardware is like people borrowing my CDs in high school. They're just, they're gone. I don't think anyone borrowed your CDs in high school, Josh. But where did my Smash Mouth CD go then? I don't know. I guess it's just walking on the sun. So this has to do with both sides. On your platform side, you're worried about actual physical hardware, right? On the application side, you might be worried about instances. You might have to think about, hey, what volume of traffic am I going to have? And you really need to think about how you're going to scale that appropriately. This is again where I'm gonna say testing is so incredibly important. Performance test. Understand what your load's going to be. Understand how many instances you might need in a new platform or a new environment so that you're kind of planning for this and you're gradually introducing it. Right, auto-scaling your apps will not fix your performance problems. It will not fix your capacity problems. It might even make your performance problems worse. But in addition to considering the max capacity for your whole foundation and for your apps, you need to think about your max effective capacity. 
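Max effective capacity boils down to headroom arithmetic: with N availability zones, you can only safely fill (N - 1)/N of the foundation, so that the load from a lost AZ fits on the survivors. A rough sketch, where the AZ counts and cell numbers are illustrative:

```python
def max_effective_utilization(az_count: int) -> float:
    """Highest utilization that still survives the loss of one AZ."""
    return (az_count - 1) / az_count

def safe_capacity(total_capacity: int, az_count: int) -> int:
    """How much of the foundation you can actually fill
    (integer arithmetic to avoid float rounding)."""
    return total_capacity * (az_count - 1) // az_count

# Three AZs -> run at about 67% so a lost AZ's workload fits elsewhere:
print(round(max_effective_utilization(3), 2))  # 0.67
# A hypothetical 300-cell foundation across 3 AZs should carry at most
# about 200 cells' worth of load:
print(safe_capacity(300, 3))                   # 200
```

The same headroom also covers rolling upgrades, when instances are taken offline and brought back one at a time.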
Folks that are actually running Cloud Foundry, how many of you are running in three AZs? Anybody running in five? If anybody's running in one, just save yourself the embarrassment. This is a safe space, Josh. It is, it is, I'll give you that. But if you're running three AZs, we're gonna use that as an example. It's a pretty common example. You should really be running at about 67% of your capacity because when, not if, when you lose an AZ, just ask Amazon, when you lose an AZ, you can fail over all your apps and all your components and there's room. And this also plays in when upgrades are happening, right? Things are gonna be taken offline and brought back on. You need to have that space to make sure that things are still running smoothly. On the application instance side, we talked a little bit about that earlier. Make sure you don't just have one. If you have one and that thing happens to get upgraded, you're out of luck. You've gotta make sure that you have the number of instances available to really spread that load and take advantage of effective capacity. Right, and if you are running singleton apps because that's what the architecture of the app requires, maybe it's time to look into re-architecting the app, unless you can handle downtime. Right, exactly. You can't have high availability with only one instance. Keep in mind that you really have to think about, if you plan that 67%, you've gotta think about that budget and making sure that you're also planning that for your teams too. You don't wanna run your teams at 100% either, right? Right. And we're gonna talk a little bit about that later. So let's kinda go into a segue here. Root cause analysis is a topic that is far and not dear to my heart. But how do we do it when the root cause is actually gone? So how many people have experienced a failure before? It's gonna be a generic failure. Yep. Yeah. And then what happens if someone comes along and says, hey, what was that root cause? What caused that failure? 
Happens a lot, right? We get asked that quite a bit. And what happens in this case, if maybe your BOSH Resurrector magically fixed the problem for you? The server no longer exists. And you get asked, hey, what's the root cause of what happened? And I'm gonna throw a grenade here and say that there's no such thing as a single root cause in a distributed system like Cloud Foundry. It is a very complex distributed system. 99.9% of the time, maybe higher availability than that. That cause is gonna be a complex confluence of events that may not be related in any way. They just happened to hit all at the same time. Maybe we can identify some of those events. Hopefully we can correlate them. Right. And that's why it's incredibly important to think about logs. Where are your logs going? I told you it was her favorite. Did they just disappear with that fix that magically happened? I hope not, right? I hope that they're going to a place where you can look at them later. Where you can say, hey, what really did happen? Do I have a history of what's been going on in my system? Can I take a look at that? Because you don't wanna lose those. And we're going to talk even more about logs. Right. And maybe it's not a big failure where something died and a VM was recreated. Maybe you had a thousand percent spike in latency. Maybe your app started serving requests every minute instead of every 100 milliseconds or something like that. How many folks are running APM tools for their apps? Wonderful. If you're not, there's a lot of folks in the foundry that specialize in APM tools. Go talk to them. I'm not here to make a pitch one way or the other on any of them. They all do things a little differently and you can find what's perfect for your company. But go check them out. APM is very important. Because the idea here is it's very, very hard to debug a problem or to figure out what's happening if you don't have the logs that tell you a story. Right. 
Do you really wanna try to guess why your latency spiked a thousand percent? I don't. Well, and I know I kinda just wanna know, right? I just wanna figure out what happened because I'm interested. And I wanna think about how I can improve things later. And that's the idea. Go back to something that you can actually take a look at and say, hey, I can make this thing better. I can put in some improvements. And if you don't have those tools, good luck. So here's someone, probably a lot of people's favorite topic. What happens when the regulators get involved? You weren't excited to hear that? What about security, change, compliance? What happens when those teams get involved? Yeah. Oh, I'm hearing some grumbling. I'm hearing that too. We're about to rock your world. So who here has to worry about sensitive data, sensitive workflows, NPI, SPI, all the famous data terms out there? Any TLA that ends in I. And how frequently do you happen to talk to your corporate security groups or maybe your regulators or change or compliance? Maybe you talk to them about twice a year when they come in to audit your system. They say, hey, here's a whole list of things I need you to fix tomorrow, right? It's probably what may have led to the grumbling earlier. And here's the thing. I'm gonna tell you a really, really big secret. Those people in change, those people in security, those regulators, they're people too. Is anybody from change or security in the room? I don't believe you. But they are. That's the idea. They're people. And the idea is to treat them like people. Get them involved in your process early so that they're not coming in twice a year and giving you a list of things that you have to fix. Instead, they're involved from the beginning. They're bought in. They're like, hey, let's do this together. How can we make changes as a team and not just as individual units? It's really important to also take a minute, take a step back and think about your production flow. 
How do you get a change from a developer's workstation all the way into production? Has anyone mapped out that flow before? It's a really fun exercise to do if you haven't. And I highly recommend it. Think about all the different things that you have in place, the wait time that's built in, the automation that's there, all the things that you may have put in that don't necessarily need to be there. Right, and I bet the first time you do it, you're not actually going to believe the number that you see. I've had customers do the value stream map and find that start to finish was 18 months. And they didn't believe it. It can be really surprising because you also uncover maybe some process or control that people have put in in different areas that doesn't have to be there. And so when you involve the security folks, the change folks, the regulators in that conversation, they can help you tell that story and say, hey, this is something that doesn't actually have to go there. That's process we can remove. And then, once you've removed the process that doesn't need to be there, you can look at what's left. What can we automate? How can we make this faster but still meet security and compliance requirements? That's the key. Right, and as you're building out your platform team, and we're gonna talk about that a little bit more later, it's really important to go as cross-functional as possible. If you can get someone from your compliance team or someone from your InfoSec or corporate security team dedicated to the product, do it. Do it immediately because you're gonna be able to educate them on the platform and you're gonna be able to work with them because here's the thing. A lot of these scans, a lot of these scanning tools, they are built for a more traditional, purely virtualized infrastructure model. And when you run them on a PaaS like Cloud Foundry, you're gonna get a ton of false positives. I've seen it happen every time I've done it.
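One way to make that value stream map concrete is to just add up hands-on time versus wait time for every step. This is a toy sketch, not anyone's real pipeline — the step names and the hours below are invented purely for illustration:

```go
package main

import "fmt"

// step is one stop on the path from a developer's workstation to production.
// All names and numbers in exampleSteps are illustrative, not from a real pipeline.
type step struct {
	name string
	work float64 // hours of hands-on effort
	wait float64 // hours spent waiting in queues, tickets, and approval boards
}

var exampleSteps = []step{
	{"code review", 2, 24},
	{"security scan ticket", 1, 120},
	{"change approval board", 0.5, 336},
	{"release window", 1, 168},
}

// totals sums the hands-on and waiting hours across every step.
func totals(steps []step) (work, wait float64) {
	for _, s := range steps {
		work += s.work
		wait += s.wait
	}
	return work, wait
}

func main() {
	work, wait := totals(exampleSteps)
	total := work + wait
	fmt.Printf("lead time: %.1f hours (%.1f hands-on, %.1f waiting)\n", total, work, wait)
	fmt.Printf("flow efficiency: %.1f%%\n", 100*work/total)
}
```

Even with generous numbers, most of the lead time usually turns out to be waiting, and that's exactly the part the security and change folks can help you remove.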
You're gonna get a ton of false positives, and if you can educate them on the platform and show them why it's a false positive, you can work with them. As you continue meeting your compliance goals, you're going to fulfill the requirements on that compliance checklist, in a lot of cases, in a way that wasn't written down, because the platform just does things a different way. And because of that... Well, I'm also gonna challenge you a little bit to open your eyes too, because you may learn things from them. You may teach them a little bit about the platform, they may teach you a little bit about compliance and security and some things to look out for. And then you can work together to automate a solution that's really cool, that you can put out as a group, that maybe gets your changes into production a lot faster than they were before. That's why this is so important. Okay, two things. One, I'm now convinced that they're people. Oh, good. And two, I fibbed a little bit earlier. I said we didn't put these in any order, but we did, a little bit. We saved our favorites for last. So we're gonna go to Stephanie's favorite first, which we may have foreshadowed. All right, so we're on safari now and all you see are logs. They're everywhere, there's logs everywhere. Logs, logs, logs, and I can't get rid of them, and they're everywhere, and they're overwhelming my system. I don't know what to do. Because who here has a log collection system? All your logs are going somewhere, yeah? That does not surprise me. How great is it if you send millions and millions of logs to that log collection system all at once? It's not great, Bob. Not great, right? You're probably in that situation where you're like, there are too many logs everywhere. And part of that is because a great thing about a lot of these cloud platforms, a lot of the new technology that's coming out, is that you get a lot of power. You get a lot of freedom to do a lot of really cool things, right?
It's like one of my favorite Marvel movies out there: with great power comes great responsibility, and that's key here. Absolutely, the Cloud Foundry Foundation has been preaching for years the freedom to create. But the freedom to create also brings the freedom to make bad decisions. And hey, we fail sometimes, right? Yeah, exactly. Maybe you didn't make a bad decision. Maybe you just accidentally left a debug flag on and now everything is in debug mode and you're sending a million lines of logs an hour to your log system, maybe a minute. And there are logs, logs, logs everywhere. Yes, exactly, you're buried in logs. You need a log plow. And here's the thing, debug is an awesome tool, right? It's great when you have a problem and you need to debug something and you need to figure out what's happening. But please, make sure that you turn it off. That's like the one thing I want you to walk away with: use debug for good, not for evil. Right, has anybody in here written a custom firehose nozzle? Cool, thanks, Shawn. If you folks decide that you want to write a custom firehose nozzle, add filtering in from the start. Maybe filter out the word debug. Just skip those messages. That becomes incredibly important when you're going into your production environments. Because no one wants those logs showing up in prod. Right, and some of your bigger log collection systems have interesting licensing models where maybe you pay for the amount of data that you index. And let's say you're in debug mode and you accidentally get in an infinite loop of exceptions and you blow through your index allowance for the year in a day. It can happen. And the idea again is that you're using these for good. Logging is always great. You want to make sure that you know that your systems are doing what they're supposed to be doing, but you want to do it the right way. And some of that's also going to be educating folks.
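If you do write a nozzle, the filter itself can be dead simple. This is a sketch, not a real nozzle — a real one consumes envelopes from the Loggregator firehose, and `shouldForward` is a made-up helper that substring-matches raw text — but the idea of dropping debug lines before they ever reach your indexer looks like this:

```go
package main

import (
	"fmt"
	"strings"
)

// shouldForward reports whether a log line should be passed on to the
// downstream log collection system. It drops anything tagged DEBUG so
// those lines never count against your indexing allowance.
// (Hypothetical helper -- a real nozzle would inspect envelope fields
// rather than substring-match raw text.)
func shouldForward(line string) bool {
	return !strings.Contains(line, "DEBUG")
}

func main() {
	lines := []string{
		"2019-04-02T12:00:00Z ERROR payment service timed out",
		"2019-04-02T12:00:01Z DEBUG entering retry loop, attempt 7",
		"2019-04-02T12:00:02Z INFO request served in 12ms",
	}
	for _, l := range lines {
		if shouldForward(l) {
			fmt.Println(l) // only the ERROR and INFO lines survive
		}
	}
}
```

The point is where the filter sits: in the nozzle, before the data hits your licensed indexer, not as a search-time filter after you've already paid to store it.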
So maybe if you're moving from a different platform to a newer one that lets you have more freedom, it's about working with those application development teams, educating them on how this thing works. And you do have a lot of freedom. You do have a lot of power. So here are the ways that you can use that. And here are some things that you can do a little bit differently, but still get really great results. All right, so this next one, I think, is Josh's favorite. It is. Okay, whoa, whoa, whoa, whoa, whoa. There's that word again. I don't like that word. You don't like "building"? "Perfect." Oh, "perfect." Perfect doesn't exist in the tech world. All right, let's, yeah, okay. That is better. Let's build a good product team. And we haven't said the word product team much yet, but that's what I mean. I mean a real product team. And you may think, but Cloud Foundry is the product, or my APM is the product, or the apps that we release to our customers, the end users, that is the product. No, the platform itself is a product that you provide to your customers. Your customers are your developers. This product needs to be branded. You should have a designer on your team. Maybe someone from marketing is now embedded in your team and they're coming up with a great name for your platform. Maybe it's something cool and spacey, like Galactus or Alderaan. Maybe not Alderaan, it didn't end well for them. Oh, come on, a couple people got out, it was fine. It's fine. This is fine. But you should have that branding, and you should be updating your product regularly, because that's what products do. They get updated regularly. Stale products don't get used. Stale products don't get bought. Well, here's one of the things. So Josh mentioned a couple different roles you might wanna have in your product team. That's maybe more of a perfect-world situation, right? Some of us aren't there. And that's okay.
You just have to think a little bit like maybe a designer. You have to think a little bit like a tester, like a QA person. You have to think about a product manager and consider those things and actually treat it as a product. Whether or not you have actual folks in those roles doesn't matter as much as thinking about those things when you're developing your product and putting it out. Right, and one thing about a product is that products need to be responsive to their customers. You need to respond to their needs. And that is not me up here saying that the customer is always right. They are super not always right. But you still need to respond to their needs. Exactly, and that comes into play with everything about prioritizing a backlog. Maybe you've got one of those critical CVEs that Josh was talking about. That's probably gonna take priority over, say, a customer request for a change in an NGINX buildpack or something. You've gotta prioritize that backlog. It doesn't mean that you forget about customer requests or just say, oh, hey, you're not right, that's okay. You add them to your backlog. You think about that because you're a product. They're your customers. They should really drive where it's gonna go. Right, and think about that balanced product team that you will work towards achieving. You should have engineers, obviously; you need someone to work through the backlog. You should have product management. And I said product management, not necessarily project or program management. Those folks have their place. That role can be important, maybe not part of the team directly, maybe sitting outside of the team with a more overarching goal for the entire initiative at your company. The designer's very important. If your company does manual QA, get a tester. Ideally, get a pair of testers and get them on the team so that they can help you test code that goes through, test apps, all of that stuff.
And remember that this applies not just to your platform team. You want to treat your platform team like a product, but it also applies to applications. Applications should be product teams too. They should have things that they're doing from beginning of life cycle to end of life cycle. They have on-call. They have incidents that come up. Their customers are just a little different, right? Because they might be your actual end users, and you as a platform product team might have developers as customers. Same kind of concept, though. Yeah, and if you don't have a product team, you can't have swag. And everybody wants swag. Everybody wants stickers, t-shirts. We've talked a lot about hats. Hats are great. But the on-call is super important. How many folks here do on-call, like actually do on-call, for their product, for the platform itself? Do any of you also do on-call for the apps running on it? That's not great. You did not have a hand in writing those apps. You do not have ownership of those apps. You probably can't put in changes to those apps. Your apps and their teams, they need to take that ownership and, to a point, feel that pain with you. Your app teams should hold a pager when you're holding a pager. When application teams are also based around products, they have ownership. You've actually owned that product from start to finish, like I mentioned earlier. And so it's not just one team that always gets called when something goes wrong. You have a product. You wanna get called, because you're the one who put out the changes to begin with. Your team's the one that did that. So you wanna be the one that kinda looks and says, hey, what are some ways that I can get better? Let's talk about what maybe has gone wrong. Right. And how many folks in here are managers? Like people managers. Great. This part's specifically for you.
When your people are on call, don't make them work the backlog, whether it's overnight or the next day when they're in the office. Let them work on other things. Maybe it's reducing tech debt. Maybe it's fixing things that they found the previous night on call. This goes back to being responsive to your customers and having folks not fully dedicated to product and feature development in the backlog. Let them use their brains the way they see fit when they're on call. And with that. Yeah, I think that's about it. But we have time for Q&A, if anybody has any questions. Thank you for that, but that was not an answer. Or a question. Yeah. Any questions at all? Hi. That's a great presentation. Thanks for scripting it as well. I have a question. So in the bigger organizations, you have different groups who are traditionally responsible for infrastructure. Now we are introducing the product teams. You talked about the culture shift. It takes years. It takes time, right? Can you? Absolutely, it takes time. This is one place where I think that both the top-down and the bottom-up methods for building support are critical. You're not gonna be able to, by fiat, tell people we're going to go cross-functional, everybody's going to be happy, and it's going to be rainbows and puppies. But at the same time, if you don't have executive buy-in, if you don't have the folks that can make that change, you are going to fail in different ways, because if they don't have the buy-in, they're going to see you changing the structure and they're going to make you go back to what you had before. Here's one thing I'm gonna add to that. It doesn't mean that you can't change small things, right? It doesn't mean that grassroots doesn't work, because it does. If you get a groundswell of people who are doing something, it can be really incredibly powerful and tell a story all the way up.
If you've got people who are collaborating, who are sharing those failure stories, you can start that momentum towards building an entirely different culture, and you can start that right at your own team and say, hey, we're going to do things a little differently. Absolutely, and a lot of the time that's what you need to get that executive buy-in. But here's the thing, it's going to take time. Absolutely, and I'm not going to sugarcoat that. It's absolutely going to take time, and what's incredibly important is to learn from other people. How are others doing it? How are they going through that transformation? What are some ways that they've done it differently? And know that you're also going to face some pain points along the way. Things aren't going to work out. You may not even split up your product teams exactly right the first time, and that's okay. Be willing to make changes, be willing to take a look at things, and do retrospective-type activities on how things have moved through your transformation as well. Right, experiment, experiment, and be willing to fail. If you don't go into it willing to fail, then you're not going to take the risks necessary to really effect that change. Did that help answer your question? You got another one? Oh, and the financing side of things and cost. Traditionally you have an IT budget where you plan for infrastructure. Now you're living in a dynamic world where you don't know what your consumption rate is going to be. The business decides, now I can launch a new product in two months, my budget's going up. How do we tackle those? So that's a great question, do you mind if I tackle that one? Because I've seen this in a lot of different customers. Everybody wants to do chargeback and showback day one, and it's not tenable. You really need to effectively manage the platform and really know where your folks are coming from, what teams, what orgs, who's using what.
I mean, before you start showing them and charging them effectively, you need to know for yourself first. And a lot of times that takes a while. And to put a little bit of a different spin on that: when you think about capacity and you think about deployments that are going out there, that's another thing you're probably not gonna get right the first time, right? You might start with a minimum amount of capacity that you're giving your orgs, and then that's probably gonna have to change, because you have to see how usage is going. How are people really using this system? How should it look in the future? Maybe I have one product that has really high usage and needs high capacity, and I have another one that needs to be pretty low. So you've gotta think about this as a platform that's gonna scale, things that are going to change over time, whether you're an application that's going out for the first time and you only have 10 users, or you're a platform that's going out and you're looking to put new applications on it. Right, and do you charge prod differently from test? Because I hope you have a test environment. Testing in prod is great, but it shouldn't be the only place. Also, we should give a round of applause to this room, because the participation level with all those questions, I thought, was wonderful. You're all taking very good care of your shoulders, not injuring them. Yep, it's arm day. To be able to raise them up, so thank you. And your hats were beautiful. Yeah, the imaginary hats were wonderful. Thank you for your participation. Okay, any other questions? They got it out of the way already. Any other questions? We'll be out in the hall afterwards. Thank you, that was awesome. Thank you so much.