Good morning, good afternoon, good evening, and welcome to another edition of Cloud Tech Thursdays here on OpenShift TV. I am Chris Short, executive producer of OpenShift TV. I am joined by the one and only Amy Marrich from my technical marketing management team. Thank you for being here, Amy. And our special guest today is Holly Cummins, who is one of the contributors to the book that we both co-authored with 90-some-odd other people, the 97 things to make cloud engineering easier, which I'll drop a link to in chat. Holly, you're also an IBMer, you have a PhD, you're just kind of everywhere and everything, and today you're talking to us about a very important subject. I can't do you justice, so could you please introduce yourself for us?
Yeah, thank you. I struggle to introduce myself too. I do all sorts; life would be boring if we did the same thing all the time. But most of my day job is helping people become cloud native: helping businesses go from "we think that cloud thing is a good idea, but it's a lot harder than we expected" to "actually, we're getting the full advantage of the cloud."
Cloud is not as simple as people think sometimes, and that's surprising to them. So you're talking about how not to wreck the planet today, specifically around Kubernetes. Do you want to talk about the premise of the talk and then dive into it?
I'm gonna let you kind of run the show now.
Yeah, for sure. So I'm combining two topics that are hopefully of interest. One is cloud and how to use the cloud in the best way, and the other one is the future of the human race and how to avoid cataclysmic existential destruction, which hopefully people also think is important. I started thinking about this because when I was learning Kubernetes, I kept finding myself doing things and then realizing, "Wait a minute, that really wasn't very efficient," and, "Wait a minute, how much is this costing?" and, "Wait a minute, is it just me who's doing things in this slightly inefficient way, or is it all of us?" And then I looked around and I realized it's maybe not all of us, but a lot of us are falling into these patterns, because of how the technology works and because of how people work, that maybe aren't optimum.
And then, you know, what the documentation is like and how much you're figuring out on your own definitely leads up to that.
Yeah, and I think as well it's about what people are good at and what people are not so good at. In general, tidying up, being economical, using the minimum, and optimizing are things that we maybe don't do early in a cycle, and technology changes so fast that we're always early in some cycle; there's always something that we're learning and trying to figure out by bashing around. But going on to the existential dread part and why this matters: it's pretty clear at this point that we have a problem. The earth is getting warmer, and it's getting a lot warmer. I think it's sometimes hard to get your head around the numbers, because when they talk about the warming they say, "Oh well, in the next 10 years it's going to be one and a half degrees warmer than it was in pre-industrial times."
And you think, one and a half degrees? If you went outside and it was one and a half degrees warmer, we'd be like, "Oh, this is nice, I'll have an ice cream today," if you even noticed it at all. But at a planetary scale, that one and a half degrees is not just "oh, it's a little bit warmer." It's really uncomfortably warmer, and we see all sorts of consequences that are pretty scary. We see drought. Obviously we see floods, because the sea level rises. And for some places it's not just a case of "the sea level rises, you have a bit less beach"; some island countries are in pretty grave danger of being gone. There are really interesting things that the Marshall Islands are doing to try to figure out: how do we still have a country when the physical landmass that we were all sat on doesn't exist anymore? They're migrating in big batches to parts of the States so that they can keep that national identity. These are really scary and huge things. So there's the submersion, and then hurricanes we see as well. They're becoming an increasing problem in a lot of parts of the world, and they're quite directly related: the rising ocean temperature creates the conditions for hurricanes.
And fires are becoming a big issue.
Yes. And all of that, I think, still seems quite abstract, and it still seems like something that's going to happen far away, to other people, and far in the future. You know, "Oh, maybe my children will have to worry about it." But it's actually really soon and really near. So, about a year ago,
I had the privilege of working on a project where we were working with a startup who did climate risk. We migrated them to OpenShift and we re-platformed them too, because, funnily enough, there was a lot of demand for their product and they needed to scale. So we were doing various things with them, and at one point we looked and we thought, "Oh, silly us. We've been migrating this logic from one place to another and we've made a mistake," because we were looking at this flood graph for Tokyo and it just went up to the top of the graph and stayed there. We were like, "Silly developers, we've got this wrong." And then the CTO came by, who really knew what the data was, and he looked and he said, "Actually, no. That's how the graph looks. The situation really is that bleak in 2030 in Tokyo." Which I think brings it home that it's not just something that's way in the distant future.
Yeah, because I don't think of Tokyo as being near the water. I mean, yeah, you fly over it to get to the airport, but then you ride on a train for an hour. Imagining it being gone from the map is phenomenal, or scary, actually.
Yeah. They're doing some super interesting things in terms of their flood defenses, actually. They have these great big underground cathedrals (not literally cathedrals, but they're shaped like cathedrals) to absorb the flood water and to be the flood barrier, so that they have that physical resiliency to compensate. But I think even that can only hold so much.
Yeah, that's like a band-aid, basically.
Yeah, and it's expensive as well, right? You feel like the solution can't be "all we need to do is dig vast underground cathedrals under all our coastal cities and we're fine." It's got to be something a bit more sustainable than that.
Well, yeah, they're at sea level anyway, so you can't really dig down anyway, right?
Yeah. If your underground cathedral is already filled with water, it doesn't help as much as you'd hope. So then the question is, "Well, that's a bit scary, a bit surprising. Glad I don't live in Tokyo. But what does it have to do with me? I'm just a software developer." But the thing is, our industry is really quite a significant contributor to climate change. So that means us: not just those other software developers, but me and you. We are contributing to climate change, which is kind of uncomfortable. And to sort of get a...
Yeah, I was just going to say, the fact that we're moving the needle in the wrong direction is disturbing, right?
Yeah, that too.
Yeah. And we're sometimes saying, "We're getting worse less quickly than we thought we might," but we're still using more and more energy. There's a long way to go from increase, increase, increase to decrease.
And then there are the people who are just blind to it, who don't see anything and aren't worried about it.
Yeah, and I think visibility is a huge part of it, and we can talk about it in a bit. We're blind to it because a lot of it is invisible, and it's subtle, and it's hard to track.
Yeah, unless you look at glaciers every day, or pull up a picture from 1930 and one from 2020, you're not really going to notice the difference, right?
Yeah, and the Venn diagram of people who are looking at their servers and people who are looking at glaciers every day has pretty much zero overlap. There's a whole bunch of people on the glacier, and then we're looking at our servers going, "I don't see a problem." But I think we always think of flying as the poster child for being really irresponsible in terms of the climate, right? But then if you look at the actual numbers, flying contributes about two and a half percent of the worldwide carbon footprint. So that's big, but there are others that are bigger. And if you look at data centers, it's really hard to calculate, so depending on whose numbers you look at, it's between one and two percent.
So that's pretty close to aviation, but we don't think of ourselves as anything like as bad as aviation. We probably should.
And there's probably a case to be made that we are worse than aviation in some parts of the world.
Yeah, I think so, because this is just data centers. When you look at the bigger ecosystem, it gets even bigger in terms of all of the hardware and the physical construction of the data centers and the devices.
Yeah, the cooling system and the networking system and everything else that's involved in the data center.
Yeah. And the connection to Kubernetes is that Kubernetes is the operating system of the cloud, so a lot of those data centers now have a lot of Kubernetes in there, so we need to start thinking about it. One way that we sometimes think data centers are different from aviation is that data centers have a lot more potential to run on green energy, and in some parts of the world, like Iceland, there's a huge industry of renewable-energy data centers. But if you look at it overall, 80 percent of energy is still fossil fuels (that's a U.S. statistic). So even if the data center is using green energy, which it may or may not be (and if it's in Virginia, it almost certainly isn't), that electricity is something that could have been going into a car, or into something else, except that it got sucked up by the data center. So, going back to the impact: the earth is getting warmer, and I think we've seen all this. Whoops. Yeah, we've seen a lot of this. But going back to the question: where does Kubernetes fit in?
I think where we start is: I have a container, and I want to run my container because I've written the best application. Right? But then we think, "Oh, but instead of having just one container, I could have..." Yeah, I didn't draw 20, but we can have a lot of containers, because that's the joy of containers: they're so easy, and we can autoscale, so we can have 100 containers for five minutes. And once you do that, in order to run all your containers, you're going to need an orchestrator. You're going to need a Kubernetes. And then you think, "But why would I have one cluster?"
Right. Now I need redundancy.
Yeah. And that was what I started to see, because naively I imagined, when I was learning Kubernetes, that it was the virtualization platform and we were all going to share it. And then I realized it just doesn't work that way, and these things just kept proliferating and proliferating. I think that trend is growing even more. When we first started talking about containers, we imagined the container as the unit of deployment: "Look, I put my thing in the container and you're good." But now a container, that's not an application.
That's just a small part of it. So now, all of a sudden, the cluster is the unit of deployment. That may be okay or it may not be; we'll come back to that. I really started to realize that it wasn't just me because, about a year or two ago, I was at an internal event where they were talking about the IBM managed Kubernetes service, and they were saying, "Look how great it's doing, look at the take-up we've seen," and I was sat there doing the math. So do you want to guess how many clusters we had per account?
Oh, probably over 20, maybe.
I'm going to go with 100.
Chris is closer. It's 21.
Yeah. I mean, and then you think, that's an average of all customers, right?
Yeah. And so then it's like, well, why is that bad? There are two things that I think are affected, and that we need to think about when we ask, "Is this an okay model or is this not an okay model?" We need to think about the utilization, and we need to think about the elasticity. So imagine that you've got your cluster and you've got your application. Of course, as soon as your application is in the cluster, you don't just have the application; you've got a control plane there as well. You mentioned autoscaling, and the elasticity within a cluster is really pretty good. I can go up, I can go down, and that's pretty easy; the platform will take care of it for me. But the problem is, if my application is going up and down, I still have all that cluster provisioned that isn't going up and down. You can do cluster autoscaling, but there are a few problems with it.
One is that it's a lot harder to do; you have to go through some hoops compared to application autoscaling. And cluster autoscalers tend to be optimized towards not starving your application of resources and making everybody hate you, which is a reasonable default. So they're much more willing to scale up than to scale down. If you're trying to be really efficient and you turn on autoscaling, you may find it's always too scared to shrink your cluster. And the other problem is, even if it does shrink your cluster, then all of a sudden that control plane overhead that was a small proportion of your cluster is a huge proportion of your cluster. So it maybe wasn't as bright an idea as you thought to have this tiny cluster, because clusters have overhead. And I think when we think about clusters and when we think about the cloud, it's so abstracted. Unless you're actually provisioning it and saying, "Okay, I want a machine, and it's got four CPUs and 16 gig of RAM," you're thinking, "Oh, whatever, it's all virtualized, it's all somewhere, it's not an actual physical machine." And actually it probably is a virtual machine, but there's still a physical machine there, using electricity. And the Kubernetes architecture is such that the nodes are really closely tied to the machines. So at the application level, it's all virtualized and fluffy, and it wanders around and is elastic. But the cluster itself almost seems to have a gravity, where it's dragged down to these machines.
Yeah, you see that: "Oh, it's only 30 cents an hour."
"No big deal." Right? But there's actually a lot to that when you start tying all these machines together. And then you start adding things to it, and that brings up the price and everything else.
Yeah. And when I started looking at this, there are a lot of really great things to be said for serverless in this context, because with the serverless model your application really maybe does have just a tiny footprint, and it really does have that great elasticity. So I started to hear, "Oh well, all we need to do is go to serverless and that's okay." But if we go to serverless and stay in the Kubernetes ecosystem, then we're looking at Knative, which is really good, but it's still in a cluster. So I started to see this pattern. Knative will allow you to scale your application down to zero, but scaling the application isn't the problem; it's scaling the cluster. So if you spin up this whole great cluster so that you can run one little Knative application, which is the pattern I was seeing, well, that's not really going to help. So I think managed serverless, managed Knative, with that kind of multi-tenancy, is really good, and I think that's potentially a really great model for some kinds of workloads. But only some kinds; not everything fits that serverless model nice and neatly.
Yeah, but you're right that the Knative piece does have to sit on a cluster of some size. I mean, everything sits somewhere. Whether you're using virtual machines or containers, there's still the underlying hardware that people forget about.
Yeah. And then I think it starts to be about who you are sharing it with. When I was first learning Knative, I started to do it and I started to provision stuff, and it's like, "Okay, great."
"Just give me your cluster," and I was like, "Well, I don't have a cluster. I don't want a cluster. I want there to be a cluster, but I want someone who isn't me to be running it, and I want it to be shared with a lot of people, because I'm just playing around."
Right, and I don't want somebody to be hogging all the resources on our shared cluster. You have to have multi-tenant limits so that everybody can have a fair slice of the pie. So you've increased infrastructure just by doing multi-tenancy too, right? That federation component also has CPU cycles behind it.
Yeah, totally. And multi-tenancy is really, really hard. If you think about why it's happening, why we are wasting these things, I think you kind of want to imagine that somewhere in your organization there's a Dr. Malice who's just going, "Raah, I can use up all my organization's money and destroy the planet at the same time." But there isn't, actually. We're all trying to do the right thing; it just somehow ends up not quite being the right thing overall. And I think one of the things with multi-tenancy is, again, when I started learning Kubernetes, I saw that you can have a namespace, and I was like, "Oh great, that's such a nice model for multi-tenancy. We can all have our own namespace, and then we're sorted and we can share." But no: namespace isolation...
Yeah, it's good, but it's not a physical or virtual barrier, if you will.
Yeah, it's so porous. So you end up with this thing where organizations won't trust untrusted workloads in another namespace in that same cluster, and trust ends up being a fairly limited thing. Then you look at your network topology, or your cluster topology, and you realize it exactly mirrors your organizational topology. That team over the corridor who have a different org chart? You're not going to give them your cluster, because why would you? There are all sorts of reasons. Part of it is that I'm paying for this cluster, and it's 29 cents an hour, thank you very much, so I'm not going to let you use my resources for free. And then, Amy, you mentioned the noisy neighbors problem, which is: what if they use all the resources? That's so easy to do. It was actually on that same project, where we were migrating to OpenShift, that I set up this Tekton build and I didn't have my logging quite right. I think there was a little bug in Tekton that I was managing to trigger, about it being a bit noisy with its logging, and so every time we ran a build, the whole cluster would grind to a halt. That was kind of okay because we were all one team. My colleagues would go, "Holly, please stop running the build until you fix this bug," and I'd be like, "But we need to do continuous integration, it's all okay," and they'd be like, "Holly..." But we were one team, so it was okay. If it had been those weirdos across the hall, I would have been like, "Get off my cluster."
"You know, until you sort out your builds."
Yeah, I was just going to say, that brings up the SRE model, where you have your way of doing things for your team, and your other team has another way of doing things. But yes, they're all going to have their own clusters, and if they all have their own accounts, that means they could have up to 21 clusters per account. That's pretty wild. And we're talking about multiple application teams within one organization. That's a lot of infrastructure underneath those clusters, sitting there.
Yeah. And I think the bigger the organization, the harder it is to break across the silos. And some of the namespace leaks can be really subtle and really interesting. A few years ago, IBM did this massive effort where we took almost our entire middleware portfolio and said, "Okay, we're going to get this into containers, and we're going to call it Cloud Paks, and we're going to have a Cloud Pak for Integration and a Cloud Pak for Applications." And you iterate on these things. On the first go-round, some of the Cloud Paks could coexist but some of them couldn't, and you think, "But surely I want to install these things together."
But the reason why they couldn't coexist was really subtle, and once I understood it I was like, "Okay, actually I kind of get why this happened." If you make a resource, it usually is, or can be, scoped to an individual namespace. There's an opportunity for error, because someone can accidentally forget to do the right scoping and then it's leaked everywhere. But if you make a resource definition, like a CRD, that has to have a cluster-level scope. So if two people have the same bright idea for a CRD name, you're going to get a collision.
Yep.
And then you think about an organization like IBM. It's lots of different teams. You say, "Oh, you just make a naming convention," but we're probably all going to do something like com.ibm.mycoolresource, and then all of a sudden you've got too many of those. You can do it, and we did make it work, but you have to have this central governance body that's your naming committee, to look for clashes and stuff. So it's not free, basically.
Right. Again, there's overhead.
Yeah, there's a human overhead to get that right, and all the automated testing and everything as well. And then there's the security thing.
Oh, that whole thing, yeah. Kubernetes is not secure by default. So don't have malicious workloads running alongside other things in your cluster. Please don't expose your API to the public, right? That's not a good practice.
Yeah. So I think there's probably a compromise between what we do now, which is, "I don't trust you."
"I can't figure out my billing, and you're across the hall, so I'm not going to share my cluster with you," and, "I've just put loads of malicious workloads into one great big cluster, and now I can't manage it, and we've got weird, subtle bugs everywhere because of scope collisions." So you can maybe say, "Well, prod: we are keeping prod in its own cluster," because we're not reckless enough to do anything else. But maybe we just try a little bit harder to get all the others in one cluster, and we try to talk to the people across the hall.
So I'm going to show my Kubernetes ignorance here, because I'm an OpenStacker. We have actually had this exact scenario in production before, on the same OpenStack cluster, utilizing availability zones, so the developers could only bring up VMs on certain machines. Is there something similar to that in Kubernetes, where you can just say, "These users can only go here"?
Yes. Taints and tolerations will allow you to do that. Or even namespacing, to an extent: this team only has access to these namespaces, and so on. But that doesn't split any kind of physical workload, right?
Okay, so that's all within the same cluster, kind of, and we can do it down to a physical machine.
Right. Kubernetes autoscaling is based off of inputs you give it, essentially. You have to say, "Yes, you can autoscale and consume as many AWS or IBM Cloud or whatever resources as you want," or you can actually put limits on things. Limit ranges and application limits are so important, not just for autoscaling but for security. You don't know if an application is compromised, but you would know if that application is hitting its CPU limit all the time and it's actually a Bitcoin miner running underneath it.
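As a rough illustration of the limits idea, here is a minimal sketch of capping a shared namespace. It builds a Kubernetes ResourceQuota and a LimitRange manifest as plain Python dictionaries and prints them as JSON, a format kubectl can read. The namespace name `team-builds` and every number below are invented for the example and do not come from the conversation.

```python
import json


def resource_quota(namespace: str, cpu: str, memory: str, pods: int) -> dict:
    """Build a ResourceQuota manifest that caps a namespace's total usage."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": cpu,        # total CPU all pods may request
                "requests.memory": memory,  # total memory all pods may request
                "limits.cpu": cpu,          # total CPU limit across all pods
                "limits.memory": memory,    # total memory limit across all pods
                "pods": str(pods),          # maximum number of pods
            }
        },
    }


def limit_range(namespace: str, cpu: str, memory: str) -> dict:
    """Build a LimitRange giving containers default limits when they set none."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": f"{namespace}-defaults", "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    "default": {"cpu": cpu, "memory": memory},         # default limit
                    "defaultRequest": {"cpu": cpu, "memory": memory},  # default request
                }
            ]
        },
    }


# Hypothetical values: a build namespace capped at 4 CPUs, 8Gi, and 20 pods.
quota = resource_quota("team-builds", cpu="4", memory="8Gi", pods=20)
defaults = limit_range("team-builds", cpu="500m", memory="512Mi")

print(json.dumps(quota, indent=2))
print(json.dumps(defaults, indent=2))
```

Each printed document could be saved to its own file and applied with `kubectl apply -f`. The quota caps the namespace as a whole, while the limit range gives a ceiling to containers that declare nothing, so a runaway workload hits its own limit instead of starving its neighbors.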
So having those things in place is vitally important.
So it's more limited to being a quota than the ability to say, "These clusters go here only." Got it.
You can say, for example, that you have a GPU class of systems that you only run your AI/ML workloads on. You can do that. But again, that's you saying it; you have to programmatically do that and put that restriction in your cluster.
Okay, got it.
Similar but slightly different kind of thing, if that makes sense.
Yeah. After I had my horrible problem with the Tekton builds and the noisy neighbors, I realized I could have just put a quota on it. But it's all that cognitive load; it's that extra layer of skill that you have to have to manage things so that they behave gracefully in a multi-tenant way.
And you're already worried about persistent volumes and all these other things that matter to your application. You're not going to sit there and think, "Oh, what does my profile for this application look like?" as you're testing it, right?
Yeah. "It'll be fine. It won't bother anybody."
You'll learn by doing. Hopefully not in prod.
"It's not the prod cluster, so we can run builds. We keep the prod cluster over there." But then there's this second thing. Say we do all these hard things, and we get our quotas and everything, and we have the multi-tenancy dream. We have this cluster, it's really optimum, lots of applications are running, it's a big cluster, and the control plane is minimal. Is this great for the climate? Well, like you said, Chris, it depends what the workload is. If it's Bitcoin, no.
I know.
Yeah. And even if it's not Bitcoin, there's still this question: is it useful? And our industry has this really horrible problem with zombie workloads.
I've seen them called comatose workloads as well. It's something where once upon a time it was useful, right? But we don't know what it is. No one knows.
Yeah. And to get a sense of the scale of the problem: I saw a piece of research where they looked at 16,000 servers, and a quarter of them were doing no work. And I've seen another statistic, from another study, that says 30 percent. That's a lot. And imagine, you could bring down that one to two percent if you could just find your zombies and go flip, flip, flip.
Just turn it all off and see what happens.
Yeah. And again, it's not malicious. In the study they said, "Perhaps someone forgot to turn them off," and I think we've all totally been there. Usually when I show this slide to people, they look a bit gray and they say, "I'll be right back."
Or they forgot about the project. They're no longer doing it, and they forgot it was using machines. With the software delivery lifecycle, we often forget about the end of life, right? What do we do when it's over? How do we decommission? We're very bad at that as humans, right?
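The "doing no work" idea can be sketched as a simple check over utilization samples. This is only an illustration under stated assumptions: the server names, the CPU percentages, and the 2 percent threshold are all invented, and the studies mentioned may have defined an idle server differently.

```python
def find_idle(samples: dict, threshold: float = 2.0) -> list:
    """Return names of servers whose peak CPU percentage never exceeds threshold.

    `samples` maps a server name to a list of CPU utilization readings (percent).
    Servers with no readings are skipped rather than guessed at.
    """
    return sorted(
        name
        for name, readings in samples.items()
        if readings and max(readings) <= threshold
    )


# Hypothetical readings collected over some observation window.
fleet = {
    "web-01": [35.0, 62.5, 48.1],
    "batch-07": [0.4, 0.3, 0.5],   # never does anything
    "demo-old": [1.1, 0.9, 1.8],   # never does anything
    "db-02": [12.0, 0.5, 30.2],    # quiet at times, but clearly used
}

idle = find_idle(fleet)
share = len(idle) / len(fleet)
print(f"{len(idle)} of {len(fleet)} servers look idle ({share:.0%}): {idle}")
```

Using the peak rather than the average is a deliberate choice in this sketch: a server that is busy for one hour a week has a low average but is not a zombie.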
Like, we don't think about the end state.
Yeah. I can't remember what organization it was, but I think there's no fun in decommissioning unless you make it a thing. I think they were moving to the cloud, so it was about decommissioning their physical servers and their physical data centers, and they'd have a party. When they got rid of the last rack, they'd have cake and balloons, and they'd take photographs: "Look, here's where the data center isn't anymore." But you would never do that for an individual Kubernetes cluster, because they're so easy to make.
Right. We've moved from individual big clusters to a scenario where there are hundreds of smaller clusters. So how do we manage that?
And I think, again, all of us have done this, right? I was learning Kubernetes on my first client project where I was going to use Kubernetes. We spun up the cluster. It was for a client project, so it needed to be a fairly well-powered cluster. But then I had too much work in progress and I got called away to something else, and two months later I was like, "Oh, wait a minute, we were going to migrate to Kubernetes." And this cluster was about a thousand euros a month, because all of those 29 cents add up. I was like, "Boss, oops." And there's no visibility of it. It's a thing.
Well, I mean, that's almost every hyperscaler right there. If there is visibility,
It's not real time And that's because of the scale of these things right like getting real-time pricing data or usage data out of hundreds of apis is an engineering feat that has yet to be tackled Yeah Yeah, that's a whole conversation around like Hard limits and you know soft limits and account quotas matter Right, like you just open the can and let everybody say yeah This accounts wide open and take as many things as you want You will find that there's lots of things that people wanted But didn't get rid of Yeah Yeah Governance is yeah and that monitoring it's I mean I think the good thing is that I think it's an area for innovation because there's so many unsolved problems And then such problems are interesting but and and like this sort of You know the impact of them is big as well because like even if you don't care about the whole existential Dread bit and you know, you don't care about the carbon is money and right, you know Both both are probably pretty important Yeah Money is a resource I like the earth is a resource I like Yeah And it's sort of it's quite a convenient thing that at least in terms of cloud spend They're aligned right like it's not like oh, we have to spend lots more money in order to save the world It's like if only we could spend less money and save the world and save right. It's not yeah like Reducing cost does actually help the planet here. There is a corollary. Yeah Yeah, like that same study that said, um, I think the the 30% zombie servers I think I've got it. I've got it later on but they said if you could just sort of sort out your utilization a bit It would be 3.8 billion dollars I think yeah, that's that's huge. That's that's entire industry sectors sometimes right like that's yeah Wow, okay Yeah, so then the sort of the question is Okay, I like the idea of that 3.8 billion. How do I do it? And and you know, it's totally worth doing. 
Oh, yeah. So they said that if you sorted out half of your utilization problem, you'd reduce your electricity consumption by 40%, which would be 3.8 billion. That's insane. Yeah, it's so big. And I think the problem is that some of it is low-hanging fruit and some of it really isn't, and it's really hard to figure out how to do it, even once you have the motivation. The classic scenario for how we get a grip on this is we say, let's eyeball our estate. I got invited to this meeting, and I have to say it was one of the least enjoyable meetings I've ever been to. It was with a UK bank, and the CIO was going through his estate to try and figure out, what are all these workloads? It was just going through this list: what's this one? Does anybody know what this one is? Basically, it was zombie hunting. Yeah, but zombie hunting sounds really fun and glamorous, and it's not. I've done it before; it's not. Yeah. And so then you think, okay, we are technologists, we can come up with a technological solution: let's use tags. That seems really promising, right, because it's aligned to the capabilities of the platform. But usually with tags I think it still doesn't totally work, because two things happen. The first is that someone forgets to put the right tag on, or we have to have a big committee to decide what the meaningful tags are, and then they change. And the second problem is that unless you then have an automation to go through and delete the things with the tags, you just have tags for the sake of tagging. Yeah. Yeah.
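As an illustration of the point about tags needing automation behind them, here is a minimal sketch in Python. Nothing in it comes from the talk: the `Resource` type, the `delete-after` tag name, and the ISO date format are all hypothetical, and a real sweep would call a cloud provider's API rather than work on an in-memory list. The sketch only shows the shape of the idea: expired tags produce a delete list, and untagged resources are surfaced for a human to look at.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Resource:
    """Hypothetical stand-in for a VM, cluster, or other billable thing."""
    name: str
    tags: dict = field(default_factory=dict)


def sweep(resources, today):
    """Return (to_delete, untagged) resource names.

    Assumes a hypothetical 'delete-after' tag holding an ISO date
    (YYYY-MM-DD). Resources whose date has passed go on the delete list;
    resources without the tag are reported so someone can chase them up.
    """
    to_delete, untagged = [], []
    for resource in resources:
        expiry = resource.tags.get("delete-after")
        if expiry is None:
            untagged.append(resource.name)
        elif date.fromisoformat(expiry) < today:
            to_delete.append(resource.name)
    return to_delete, untagged


if __name__ == "__main__":
    estate = [
        Resource("demo-cluster", {"delete-after": "2021-01-31"}),
        Resource("billing-api", {"delete-after": "2099-01-01"}),
        Resource("mystery-vm"),
    ]
    print(sweep(estate, date(2021, 4, 1)))
```

Run on a schedule, something like this would turn a "delete me" tag into an actual deletion instead of a note nobody reads.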
So I did this just the other day. I went into an account that I hadn't used for a while and I saw all these things saying "Holly, delete me." I knew they were about two years old, and I'd never gone back to see those tags. So I think it's not sufficient. Again, lifecycle. Yeah. And I think a lot of organizations are aware of this problem, and they're aware that they're leaking money. So they say, well, tagging and then going out and deleting is shutting the barn door after the horse has left. Wouldn't it be better to shut the barn door before the horse has left? And you'd think the answer would be yes. But what actually ends up happening is you end up with this governance where we say, in order to stop wastage, we're going to stop anybody provisioning anything, ever. I was like, well, that's the wrong method. Yeah, the joy of the cloud is that it's so easy and so frictionless, and if you put all this friction in, I'm not sure it actually helps you remember to delete things. It just stops you doing things. I think any solution has got to be based on optimizing for what people are actually like, for their behavior. Yeah. The easiest thing to do has to be the thing that ends up being cheapest and best for the planet. So, I mean, tagging is one thing. Are there any other possible solutions to the problem? Yeah. I think we're seeing a lot of innovation here, and I think we're going to continue to see more with stuff like FinOps, trying to get that real-time information. And, as you were saying, it's really hard. And even if you get it flowing, that still doesn't mean anything is going to be done with it.
But I think just making that information more accessible helps a lot. I read an article the other day about what Spotify are doing, and they have this cost insights platform. The idea is that if you just give the engineers visibility of how much their service is using, they will naturally try to optimize it, because engineers are natural optimizers. So just maybe. Yeah, I think it depends on your incentive structures as well. Right, exactly. If there's no incentive to optimize, your engineers are going to be like, okay, tackle the next thing, not optimize something else, because we've kind of beaten people up lately to stop optimizing code before it's written, right? Optimize it after the fact, kind of deal. And that has its drawbacks as well. Yeah, and there's a cost to that platform as well. Like we were saying, it's not easy to get that information. But I think as well, sometimes the solution can be super hard and pushing the boundaries of technology, and sometimes the solution can be really easy and dumb. A colleague of mine told me about this thing that they did. This was 2013, so virtual machines. What they did is you could provision a machine and it would just auto-delete after two weeks, and they managed to save half of their CPUs. If you needed it extended you could extend it, but the idea is to optimize for what people are bad at, or rather, optimize away the thing that people are bad at, because people are awful at remembering to delete things but they're really good at provisioning things. And keeping their thing running. Yeah, exactly.
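The auto-delete-after-two-weeks model described above can be sketched as a simple lease. This is not the colleague's actual system, which the talk does not describe in detail; the `Lease` type and the `provision`, `extend`, and `reap` functions are hypothetical names, and the choice that an extension restarts the full two-week clock is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: every machine gets the same two-week lifetime by default.
DEFAULT_TTL = timedelta(weeks=2)


@dataclass
class Lease:
    """Hypothetical record tying a machine to its expiry time."""
    machine: str
    expires_at: datetime


def provision(machine, now):
    """Provisioning stays frictionless; the expiry is set automatically."""
    return Lease(machine, now + DEFAULT_TTL)


def extend(lease, now):
    """The owner has to act to keep the machine: restart the two-week clock."""
    lease.expires_at = now + DEFAULT_TTL


def reap(leases, now):
    """Return (names of expired machines, leases that are still alive)."""
    expired = [lease.machine for lease in leases if lease.expires_at <= now]
    alive = [lease for lease in leases if lease.expires_at > now]
    return expired, alive


if __name__ == "__main__":
    start = datetime(2013, 1, 1)
    forgotten = provision("vm-a", start)
    wanted = provision("vm-b", start)
    extend(wanted, start + timedelta(days=10))
    print(reap([forgotten, wanted], start + timedelta(days=15)))
```

The design choice is the one made in the talk: deletion is the default and keeping something is the action that takes effort, which flips the usual situation where provisioning is easy and cleanup is forgotten.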
Yeah, are you talking about a dev environment here, or prod? Because I could see this being bad. Yeah, it wasn't like things would be going down in prod. I think sandboxes are a big part of where this is coming from. But you can do it in prod as well, right? Some of these things are in prod and we don't know if it's okay to turn them off, and so one okay tactic is to turn it off and wait and see who comes yelling at you. Or what other system starts alerting. Yeah. And then they say, what are you doing? And you say, I'm doing chaos testing. And then they say, okay, that's okay then, and they go away. And they say, oh, I'll add another cluster, it'll fix the problem. Yeah. And then you can make a report saying how you've implemented chaos testing and reduced the utilization and discovered all of these vulnerabilities. And I think as well, I mentioned FinOps: I think we probably are going to see more capabilities coming into the platform. Stuff like multi-cloud management and multi-cluster management, where it's just to give you some kind of picture over your estate. Yeah, or simple things as well, like just traffic monitoring. If nobody's talked to this thing for three months and it hasn't phoned anybody, it's probably a good candidate to go. Right, if it's not talking to anybody, it's time for it to go. Yeah, that's a very good point. I'd never thought of, let's just look at traffic from each server and say, this is worthwhile, this is not. That's a brilliant example. Yeah, and I think there are probably exceptions.
I think there are some workloads where it's doing something like an incredibly intensive modeling calculation, where it will come up with the answer 42, but you need to leave it to run for two years, and it's not going to talk to anybody while it's doing that. But I think that is probably the exception, and if your workloads have that characteristic, you're going to know about it. Yeah, those workloads would be very obvious, because it's not very often you see a long-running job in Kubernetes, for example, right? Normally they're very short-lived nowadays. Yeah. The traffic monitoring idea is kind of brilliant. Yeah, it's so simple, isn't it? It's like, there's nothing coming out of this box. Actually, there's nothing coming out of these ten, hundred boxes, right? The only thing going in and out is the NTP service. And even then, some people might not have even configured that, so it could be less than a K of bandwidth used every month, right? Who knows? Yeah. And so then the question is: all of that is kind of independent of Kubernetes. And then you think, well, we should be going forward with technology. We used to have virtualization and now we have Kubernetes, so are we better off? Another way of saying it is, is Kubernetes zombie-proof? Which is sort of saying, now, is the cloud zombie-proof?
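The traffic-monitoring heuristic discussed above ("if nobody's talked to this thing for three months") can be sketched as follows. The talk names no tool for this, so everything here is an assumption: the flow-record shape `(host, port, timestamp)`, the 90-day threshold, the choice to ignore NTP on port 123, and the `known_quiet` allow-list for the long-running batch jobs mentioned as exceptions.

```python
from datetime import datetime, timedelta

# Assumptions: "three months" is taken as 90 days, and NTP chatter
# (port 123) is not counted as meaningful traffic.
IDLE_THRESHOLD = timedelta(days=90)
IGNORED_PORTS = {123}


def zombie_candidates(hosts, flows, now, known_quiet=frozenset()):
    """Return a sorted list of hosts with no meaningful traffic recently.

    hosts: iterable of host names in the estate.
    flows: iterable of (host, port, timestamp) records, a hypothetical
           simplification of whatever flow logs are available.
    known_quiet: hosts that are expected to be silent (for example a
           long-running batch calculation) and should never be flagged.
    """
    last_seen = {}
    for host, port, timestamp in flows:
        if port in IGNORED_PORTS:
            continue
        if host not in last_seen or timestamp > last_seen[host]:
            last_seen[host] = timestamp

    candidates = []
    for host in hosts:
        if host in known_quiet:
            continue
        if host not in last_seen or now - last_seen[host] > IDLE_THRESHOLD:
            candidates.append(host)
    return sorted(candidates)


if __name__ == "__main__":
    now = datetime(2021, 4, 1)
    flows = [
        ("web-1", 443, datetime(2021, 3, 30)),
        ("old-1", 123, datetime(2021, 3, 31)),
        ("old-2", 443, datetime(2020, 11, 1)),
        ("batch-1", 123, datetime(2021, 3, 31)),
    ]
    hosts = ["web-1", "old-1", "old-2", "batch-1"]
    print(zombie_candidates(hosts, flows, now, known_quiet={"batch-1"}))
```

The output is a list of candidates, not a delete list: as the conversation notes, some quiet workloads are legitimate, so a person still has to confirm before anything is switched off.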
Yeah, and I kind of think it's really not. As I said before, it makes it so easy to provision things, but we still don't have a lot of help in our platforms to help us turn things off. And we naturally hate turning things off, and we hate getting rid of things. If you make a cluster and you get it beautifully configured, even if you're not using it day to day, you think, well, my boss might come in two weeks and ask me to do a demo of this, or what if I need it? So you just leave it. And if you make something, like you make a cluster, there's this thing called the IKEA cognitive bias, which is basically: if you make it, you like it more. So the more work you put into that cluster, right, the less likely you are to want to spin it down. Yeah, exactly. The IKEA cognitive bias. Okay, lay it on me. And I think now we are starting to see a thing where Kubernetes is helping. Part of what we have to do is make it so that it's not a lot of work to make the cluster, or to make the workload or whatever, because then we're more willing to shut it down. And so if we have GitOps (and by this I don't mean any fancy framework, I really just mean that we are good and we keep all our infrastructure as code), then it's disposable. We have the confidence that we can spin it down, because we know we can spin it back up. We can do this all day long: just apply, you know, -f, and we get it back. And as a bonus, we're getting disaster recovery, because we know that if we can take it down to save energy, we can get it back if we actually really need it. And I think as well, the level to which we can optimize is something we don't necessarily think about. When I started talking about this, I used to make this joke
that on Friday we'd leave the office, turn off all the lights, and spin down all our clusters. And then I realized there are actually people doing that now. If you have good enough automation, and if you have your workforce in a single time zone so that you don't need to support a distributed team, you can do it. I heard one example where they managed, by shutting down their AWS instances out of hours, to save 37 percent of their power bills. That's a lot. You'd take that, if it also means you're more resilient and you have more disaster recovery. Mm-hmm. No, that's a very valid point, right? I did work for an organization where we did do this in one of our data centers. We knew that the team sitting in Raleigh at the time was only going to use this virtual machine cluster for testing and development, and that only occurred between the hours of seven to seven, basically. So during the night hours we shut it off and saved tons of money. Those dev environments are not cheap, and not having them constantly running, or reducing their footprint down to where they can spin back up easily come Monday morning, or the next morning, is invaluable for cost savings and energy savings. Yeah, totally. So then the thing is: wow, this is amazing, I'm going to do all this and we've saved the world, right? Well, there are a few things. I think one really good concept is this idea of micro-optimization theater, which is something that Jeff Atwood used to talk about. And I started thinking about it for myself, because when we traveled, I used to fly. And, you know, I'd be on the plane.
I'd be, like, doing my tech creating. But I always wanted to be sustainable, so I'd say, well, I'm going to take public transit to and from the airport. I don't want to take one of those unsustainable taxis; I'm going to take a train, and that's really sustainable. And sometimes it was really hard. It would cost more, it would take longer. But I'd be like, yay, I'm a hero because I'm taking the train. So I had this model where I felt like a hero because I would take a train to the airport, where I would get on a plane. Well, I think I was solving the wrong problem, really. And I think what we do as individuals is really important, but we've seen in lockdown as well that there is a limit to it. In the lockdown we all stopped driving, we all stopped flying, and carbon went down something like six percent. So we do need these bigger systemic changes as well. It's not like we turn off one light and it's all okay. You need to make sure you're fixing the right problem. And there's a balance, right? Because you say, well, every little helps; surely it's still better for me to take a train than to take a taxi, even if it's not the big problem. But there's an opportunity cost: there are other things that you could be optimizing if you weren't spending all of your hero points on taking a train. Yeah, going to the train station. Yeah. And, you know, I live just outside the Motor City, so mass transit is not really a thing here. Everybody has a car or two because, well, most of the people here work for one of the big three auto manufacturers, so having public transportation is kind of a luxury, as it were.
Yeah. For some people. And yeah, I would love to be able to hop on a train to get to the airport, but I don't have that opportunity, right? It just doesn't exist. We're in the middle of nowhere. We can't get it. Yeah. But when I do go to Europe, or larger cities in the U.S. that have it, yeah, I take the train from the airport. I take the train here and there. I'll get on a bus. But where I live, that's just not possible. Yeah, Boston, I think, is my favorite for mass transit, because they've actually put the train stops very near everything that I would ever need to get to. But yeah, you're totally right, in Michigan and Texas we have that problem. Yeah, you need the infrastructure, and once you have the infrastructure everything else kind of follows. And it's the same when you're looking at your cloud and asking, is this running on renewable energy or not? You need that renewable energy to be there so that you can then make the consumer choice to choose something that's running on it. But if it's not there, there is a limit. Do you think that if people saw renewable energy instances versus fossil fuel energy instances, they would choose the greener energy sources? Or would they just always go for the lowest price? Yes.
I think it would probably depend and you know, again, probably quite rightly depend because it's that sort of optimization like You can imagine scenarios where That money that you are spending on renewable energy Would actually allow your business to do something really way more significant That would have a bigger carbon impact, but you just you sort of need to Do those calculations and try and get the data and you know, it's it's really hard Like you sort of you want to be data driven in those decisions, but we don't have You know, we don't even have the visibility of well, what You know, what is this running on and you know, is this actually Clean energy or is it all just offsets? Which is You know, we're getting better than nothing, but quite quite different. Yeah, it's not. Yeah It's not a one-to-one translation with offsets. Yeah And there's this sort of other problem too when we when you optimize And it has it has an official name, which I never knew about recently, which is jevons paradox But I always think about it like the highway problem. So Our data centers are getting bigger and bigger and the pipe, you know, the network pipes are getting bigger and bigger and And and data centers are also getting more efficient Actually, they're getting significantly more efficient than they used to be so that you sort of think, oh, that's all great You know, everything's getting more efficient. We're going to be using less energy But you know, if you think about it as a road analogy When you widen the roads you kind of imagine that you're going to have like this huge six lane highway And there's going to be like one lonely car, you know going along it with, you know trees in the background And what actually happens is every single car goes. Yay, there's capacity. And so we just Fill it. So we do have to kind of be Careful that we don't optimize so much that then we don't actually end up reducing the carbon. 
We just end up doing more. Yeah. That's a good point. Yeah. But I think overall, we were talking about some of the challenges of the monitoring and the real time and so on, and I think there are so many problems here where we kind of know what we need to do; we just don't know how to do it. We know we need more information about how to optimize our workloads, we know we need better ways to optimize them, and we need more support from our platforms to give us the information. And all of those are things where you think, ah, that's a problem that I could innovate on. So that's actually kind of good, because this is an opportunity for us to go, I had a cool idea and it's made things better. What responsibility do cloud providers have? Because this is in direct... I mean, they're incentivized to sell you more cloud, all right? So the incentives aren't necessarily towards helping you optimize. The incentives are towards, like, giving you credit so you buy more stuff. Yeah. I think there is a consumer pressure on cloud providers, and I think we can increase that consumer pressure to say, I would like to run my workloads in a green way, and I would like information about my workloads, and that is going to influence my buying decisions.
So, you know, if you're running in Virginia and you don't give me any information about this, then I'm going to go to a different data center or a different provider. So I think we can ask for information, and ask for things like hosting. I think with reducing the workloads, as you say, they just don't have enough incentives, so I think that's on us to take the information and then do something with it. So part of this is that we're talking about the infrastructure and the cloud hosting providers. To some degree they're not aware of, or shouldn't really be aware of, what you're running on top. So we go back to that discussion where after 7 p.m. the development machines were turned off. But you still have the cost of running those machines, because in a public cloud situation there are going to be other people running on them. So while you're reducing the workload, how hard it's working, when you shut your own stuff down in off hours, the fact is those machines still have to run for the other customers on those machines. That's a good point.
Yeah, I think this is one where, if we get it right, we'll be in a really good place with public cloud. The cloud provider will have to have enough machines for the maximum capacity, but hopefully the maximum demand will be different for each user. So for Wimbledon, for example, they only have demand in June and July. So all of the servers that they're using can be used for someone else. And then maybe for someone else it's going to be Black Friday, it's going to be Christmas, and so it kind of averages out. And then even within an individual day as well: you're right that if we were all in the same time zone and we all turned off our machines at 7 p.m., it would save a little bit of electricity, but it wouldn't save on the hardware, and probably there's a great big thrumming set of OpenStack, or whatever the layer is underneath that's keeping it running, that's still using electricity. But probably we're not all in the same time zone, so it still does balance out a little bit around the clock. Right, and most businesses have more than one time zone they're operating in, for sure. Right, like even here in the U.S.
You're just operating in four time zones automatically, right? The UK has one, but the EU has many. So yeah, the geography and time itself is kind of hindering the savings process, potentially. Yeah. And it is a good point as well that it's not a perfect correlation between carbon and cost. There are some things you can do that, for whatever commercial reason, are really cheap and actually still kicking out loads of carbon. That might be because it's early in the commercial model. So serverless, for example: early on in serverless, same as with Knative, you spun your instance down to zero and there was still a huge hulking infrastructure behind it. Right, because you have to listen for something. Yeah. But as a really rough heuristic, it's a good starting point. It's a better starting point than anything else, because it's the information that is easiest to get at the moment, the most visible. Any other solutions that we could try, that we could suggest? I mean, I've got a summary, but I think a lot of these we've talked about. Again, I think it's just visibility, optimize, visibility, optimize.
So if you're in the open-source community, if you're creating some of these tools, then really you want to be doing what you can to help better utilization. You want to be putting in those features that support elasticity, those features that support multi-tenancy, and as well the features that help your users catch the zombies. And if you're a user, you want to be trying to get that utilization up, taking advantage of those features for elasticity and taking advantage of those features to go hunt your zombies down. Are there any good zombie hunting tools that you're aware of? Right, like, call Ghostbusters? No, not yet. Like I've said, traffic monitoring is a solution now, but I don't know of an easy drop-in that you just put into your Kubernetes cluster. I'm just looking at patterns, maybe, things that we could look for. Traffic monitoring is one that I think has a lot of potential, and ops do that all the time, but we just need to surface it to a different domain. And then, yeah, just a lot of automation about making sure it's disposable, spinning it down when you don't need it. And as well, really considering either chaos testing or these sword-of-Damocles provisioning models: it's going to die after two weeks unless you do something to save it. They're easy and stupid, but they're surprisingly effective. Well, 50% is very much effective, yes. That's huge. Yeah, it's ridiculously big. I think it's just something bad about us as people, that we are so lazy and so bad at remembering to shut things down that you can make that big a saving with such a cheap change. It's amazing. Is there anything else? I mean, yeah, we've come up on time.
Have we covered everything that you wanted to talk about so far? Yeah, okay. Awesome. Wonderful. A last picture and that's all. All right. Well, thank you so much. Yeah, we really appreciate you coming on. This is very thought-provoking, and I'm already thinking about my internal home cloud footprint, right? I have computers running everywhere in this house and it's kind of like, could I shut any off? Totally could. Yeah, I totally could shut some off from seven to seven or whatever, like I was saying. There's a lot of potential there for savings just looking at my own stuff. I can't imagine if an organization looked holistically at their environment and said, what could we do to just get big whacks of savings here? I want to say whacks; I mean, like, just coming through with a sword, essentially, of Damocles, and saying, right, just turn it off for a little bit and off you go. Or turn it off after three weeks, a month, whatever it is. Just do it. Yeah, this has been very thought-provoking and a whole lot of fun. Yeah, I have to say I love your slide deck. I love the ice cream cone, and now this is the melted ice cream. Even that just makes you think more, because we had our ice cream cone and now we have our melting scoop. Yeah, melting scoops make me sad, to be honest with you. Oh yeah, a melting earth would make me more sad. So yeah, thank you so much, Holly. We really appreciate it. My pleasure. Thank you so much for inviting me. It's been really fun. It was our pleasure having you. Yeah. So thank you very much, audience, for tuning in. We are done streaming for the day, but next week is Red Hat Summit. So please sign up for Red Hat Summit, and check out the show we just did two hours ago and you'll learn more about what's coming in part two of Red Hat Summit. And you can come join us in the booth. Yeah, come join us, and Amy'll be there. Holly, I don't know if you'll be at Red Hat Summit.
I don't know your agenda, but please come by and say hi. And again, thank you for teasing our brains a little bit here into how we can make the world a better place. My pleasure. All right. Thank you, everyone. Stay safe out there.