Hi, this is your host, Swapnil Bhartiya. Welcome to TFiR: Let's Talk. Today we have with us Roman Kharkovski, Principal Architect at Qarik Group. Roman, it's great to have you on the show. Yeah, thanks for having me, Swapnil. Today we are going to talk about FinOps, or financial operations, which is a practice that combines financial management with cloud operations to help organizations optimize cost and enhance financial accountability in cloud computing environments. We will talk about a lot of aspects of FinOps, so I'm really excited to talk to you, Roman. But before we get started, can you talk about the emergence, the origin, the evolution of FinOps, how it came to exist? When companies started moving from data centers into the cloud, they realized that they had tremendous potential for waste in the cloud. When you're running in a data center, your potential for waste is physically limited by the boundaries of your data center hardware and software. When you move into the cloud, you can potentially waste several orders of magnitude more resources, and your move to the cloud can become a complete disaster if you don't control the cost. If you just start VMs left and right and never shut them down, if you allocate resources that are completely inappropriate for your type of usage, then the cost can get very much out of control. So FinOps, which is financial operations, kind of similar to DevOps, grew out of that realization by companies that they need to control the cost. And that's how it emerged. The reason it's important is that if you look at some of the recent reports, including from Flexera, across large, small, and medium organizations running in the cloud, cost is the number one priority. So that's what we're trying to solve with Qarik. What do you think is driving this cost these days? If you look at the unit cost: in FinOps, there's this notion of a unit cost.
How much are you paying for a unit, for instance, of CPU, or a gigabyte of memory, or a gigabyte of network egress? Frankly, the unit costs from all of the major cloud providers have been very stable. In fact, for some of the cloud services from Amazon and Google, unit costs were actually going down quite a bit, and more recently they have stabilized; they're certainly not going up much. So I think the number one issue driving costs up is lack of incentives for engineering teams and product teams within organizations to control the costs. As we all know, humans are driven by incentives. If your incentive is feature and function delivery as quickly as you can, that's what you're optimizing for. When was the last time you saw a product design document where cost control and cost optimization were on that design document? I don't think I've seen that very much. I know within some companies that is the case, and they are optimizing for cost very heavily, and those are the companies reaping the big benefits from using the cloud. But for the most part, I think 99% of organizations don't consider the cost, and they're just using too many resources. So it's basically overuse of resources, improper allocation of resources, and inability to account for them. Sometimes product managers don't even know how much their product costs the organization in terms of cloud consumption. They get a bill from the cloud provider, AWS, Azure, or GCP, and that bill has maybe a rough accounting capability to allocate some of the cost. But for the most part, if you ask an average product manager at an average company, they really have no idea how much their product costs the business. Can you talk about the cultural aspect of FinOps? The cultural aspect is very important. But with culture by itself, you can't just tell people, well, this is the culture we're going to have going forward.
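To make the unit-cost notion above concrete, here is a minimal sketch of computing a cost per vCPU-hour from monthly spend. The function name and all figures are illustrative assumptions, not taken from any cloud billing API:

```python
# Sketch: deriving a simple unit cost from billing data.
# All numbers below are made up for illustration.

def unit_cost(total_spend_usd, units_consumed):
    """Dollars paid per unit consumed (e.g., per vCPU-hour or per GB-month)."""
    if units_consumed <= 0:
        raise ValueError("units_consumed must be positive")
    return total_spend_usd / units_consumed

# Example: $12,000 of compute spend in a month, 480,000 vCPU-hours used.
cost_per_vcpu_hour = unit_cost(12_000, 480_000)  # 0.025 USD per vCPU-hour
```

Tracking a ratio like this over time, rather than the raw bill, is what lets you tell healthy growth (more usage at a stable unit cost) apart from waste (unit cost creeping up).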
I think you need to explain to people why it matters, what it means to them, and what it means to the organization. You also need to train people on proper practices and give them incentives. I think incentives are the number one thing driving the culture. It also needs to be continuous, so you need to provide visibility and promote the best practices and lessons learned. It's very much an iterative process; it does not happen overnight. But with proper leadership on one hand, proper incentives on the other, as well as visibility and reinforcement of the proper behavior and culture, I think the right behavior will certainly emerge. What are the key principles of FinOps, and how do they actually help organizations with cost optimization? Yeah, there are several key principles of FinOps that are quite well understood, and we try to implement them when we help companies. Number one, and we already discussed it a little bit, is collaboration across teams: collaboration between the engineering teams, the finance teams, the business units, and the product managers. Together they need to design the incentives, the OKRs, and also the KPIs to measure the achievements. So collaboration is very important. The other principle of FinOps is visibility, because you can't manage what you can't measure. You've got to have visibility into your cloud consumption. We can talk about some of the ways to achieve that visibility, but in a nutshell, you need to provide very easy access to the cost structure, ideally in near real time, so anybody from a regular engineer to a product manager to the business owner can see the aggregated, sliced-and-diced cost structure for the project, for the service, for the application. And then another key principle of FinOps is ongoing cost optimization, and that has many angles. Cost optimization has to start with the architecture of the application.
So we're thinking of a shift-left of the cost concern: you start thinking about the cost when you first design the application, from the very beginning. To give an example, if you need to build a web app on, say, Google Cloud Platform, you can run it as a Compute Engine VM, you can run it on App Engine, on GKE, on Cloud Run, or as a Cloud Function. Which of those five runtimes do you pick? The answer: it depends. It depends on the workload profile, on your requirements, on your customer usage, on scalability requirements. And you have to decide it upfront, right? So you're shifting that cost decision left. Another aspect of ongoing cost optimization is right-sizing the VMs: ongoing monitoring of the usage, memory, CPU, network egress, storage, all of these components, and rinse and repeat. You monitor the usage, you optimize and fine-tune, you look at the cost reports, and then you repeat some of the optimizations using tooling and automation. Doing it by hand is very hard, so you really need automation to help you execute some of the plans and strategies automatically. We like labels: we talk, of course, about SREs, we talk about chaos engineering, we talk about platform engineering. So talk a bit about, and of course we'll talk about tools as well, but when it comes to some of the practices, do you also see that these FinOps practices, or cost-optimization practices, align with some of those practices, some of those teams, some of those labels? There's the old and there's the new, so it's a mix. Generally, the most effective FinOps implementation is when you create a cross-functional FinOps team that executes the practice, instills the culture, and runs the training, planning, and ongoing monitoring. That FinOps team is usually a centralized organization, so that's a new thing that generally did not exist before. The old thing is that really everybody is responsible for cost control.
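The monitor-optimize-repeat right-sizing loop described above can be sketched as a simple policy. The machine shapes, the 5% utilization figure, the 2x headroom factor, and the function name are all illustrative assumptions, not Google Cloud's actual recommender logic:

```python
# Sketch of a right-sizing decision: pick the smallest machine shape
# whose capacity still covers observed usage with some headroom.
# Shapes and sizes here are hypothetical, not real machine types.

MACHINE_SHAPES = {  # shape name -> vCPU count (illustrative)
    "large": 16,
    "medium": 8,
    "small": 4,
}

def recommend_shape(current_vcpus, avg_cpu_utilization, headroom=2.0):
    """Return the smallest shape that covers observed demand times headroom."""
    needed = current_vcpus * avg_cpu_utilization * headroom
    candidates = sorted(MACHINE_SHAPES.items(), key=lambda kv: kv[1])
    for name, vcpus in candidates:
        if vcpus >= needed:
            return name
    return candidates[-1][0]  # nothing fits; keep the largest shape

# A 16-vCPU VM averaging 5% CPU only "needs" 16 * 0.05 * 2 = 1.6 vCPUs:
print(recommend_shape(16, 0.05))  # -> "small"
```

In practice this check runs continuously against monitoring data, and the resulting recommendations feed back into the next iteration of the loop rather than being applied blindly.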
So by having incentives for all of the different groups, engineers, product managers, SREs, everybody is responsible and collaborates together. So it's a mix. To make it effective, you do have to have the centralized FinOps team, and we see a lot of companies now creating those. And there are ways to collaborate. Transparency is critical; you have to have it, again going back to that visibility, as well as cost optimization. How do you share the best practices? How do you create automated tooling? How do you allow that automated tooling and those best practices to be reused across the entire organization, so everybody doesn't have to reinvent that wheel from scratch? What are some common challenges these teams face when they try to implement FinOps tools or practices, and how can they overcome them? One of the usual challenges we see is resistance to change. Everybody likes to keep doing what they have been doing rather than change to a different way. The way to overcome this is to give proper incentives. If the team has an incentive to reduce the cost for their project or their API or component, then they will try to do it, and they have to be measured at the end of the year, for example, on whether they achieved that OKR or not. So that's the organizational aspect. The other obstacle we see: sometimes it's very difficult to get proper cost attribution. How do you know exactly what this particular component or API or entire product is consuming out of the entire organization's cloud bill? Labeling and tagging of resources at the cloud level can help, but if you do it by hand, it can be very inaccurate; the labels and tags can be inconsistent. So you need to automate it, and generally that labeling and tagging needs to be consistent across the entire company.
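The automated, consistent tagging just described might look like the following validation step in a deployment pipeline. The required label keys and resource names are hypothetical:

```python
# Sketch of enforcing a company-wide label vocabulary in CI/CD:
# block a deployment if any resource is missing required cost-attribution labels.
# The vocabulary below is an illustrative assumption.

REQUIRED_LABELS = {"team", "product", "cost-center", "environment"}

def missing_labels(resource_labels):
    """Return the required label keys a resource is missing, sorted."""
    return sorted(REQUIRED_LABELS - set(resource_labels))

def validate(resources):
    """Map each non-compliant resource to its missing labels.

    An empty result means the deployment may proceed."""
    return {name: missing_labels(labels)
            for name, labels in resources.items()
            if missing_labels(labels)}

resources = {
    "vm-frontend": {"team": "web", "product": "shop",
                    "cost-center": "cc-1", "environment": "prod"},
    "vm-batch": {"team": "data"},  # incomplete: will fail validation
}
print(validate(resources))  # only vm-batch is reported
```

Running a gate like this on every deployment is what makes it possible to slice the cloud bill by team, product, or environment later, because nothing untagged ever reaches production.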
The FinOps team will probably control and oversee creation of a vocabulary of these labels and tags and how they're being used. And, really important, use them in a CI/CD pipeline, so all of these resources are tagged automatically and you don't forget to tag something that consumes 50% of your bill. That automation is very helpful. Another obstacle is, again, lack of visibility. If your engineers and product managers cannot see up-to-date information about consumption, it makes it very difficult for them to make these cost-based decisions. So you need to make it as frictionless as possible for them to access all of that cost data. How are organizations leveraging some of the FinOps best practices for cost optimization, or how would you want them to do cost optimization? Cost optimization is a major component of the FinOps practice and framework, and you can separate it into different sub-segments. One of those is rate optimization. Usually the financial team will do rate optimization using different methods, for example, negotiating a better discount from the cloud provider. That's one example, right? You get the discount for the entire cloud platform or maybe for individual services. Another example: most cloud providers offer something similar to committed use discounts. Imagine you have a number of Compute Engine VMs on GCP with a certain baseline: for instance, you're using 5,000 cores and a certain amount of memory on a steady basis, but then during peak workloads you use more. Committed use discounts, in the case of Google Cloud, can give you up to 57% off over a three-year commitment. And 57% off, that is huge. You can't do that on-prem when you already bought the hardware, right? So that's really important. Another one: right-sizing your virtual machines.
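The committed-use arithmetic above can be checked with a quick sketch: commit to the steady 5,000-core baseline at the discounted rate and pay on demand only for the peak. The $0.02/core-hour rate and the 730-hour month are illustrative assumptions, not GCP list prices:

```python
# Sketch of committed-use discount (CUD) savings: the baseline runs at the
# committed rate, peak capacity above it runs at the on-demand rate.
# Rates and core counts are hypothetical.

def monthly_compute_cost(baseline_cores, peak_extra_cores, on_demand_rate,
                         cud_discount=0.57, hours=730):
    committed = baseline_cores * on_demand_rate * (1 - cud_discount) * hours
    on_demand = peak_extra_cores * on_demand_rate * hours
    return committed + on_demand

# 5,000 committed cores plus 500 extra cores during peaks:
with_cud = monthly_compute_cost(5_000, 500, 0.02)
without = monthly_compute_cost(5_000, 500, 0.02, cud_discount=0.0)
savings = 1 - with_cud / without  # roughly 52% off the total monthly bill
```

The point of the sketch: because the peak portion stays on demand, the blended savings land a bit below the headline 57%, and the bigger your steady baseline relative to your peaks, the closer you get to the full discount.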
When you monitor the consumption and you realize you're only using 5% of the CPU time, you can change the shape of those machines and allocate less CPU. In fact, as an example of GKE optimization, the Google Cloud team published a report, which we can post in the notes for the interview, where they found that one out of 10 GKE clusters is completely idle at any given point in time, considering all of the GKE clusters across the entire Google Cloud platform. Then there are other clusters that are over-provisioned. What the Google team found is that, of those over-provisioned clusters, 30% allocate 35 times more resources than they're consuming, so you're basically wasting over 30x of your investment, and 10% of those clusters are 100x over-provisioned. That is gigantic, a colossal waste of money: for every dollar of resources you're actually using, you're wasting $99. That is ridiculous. Another example of cost optimization is high availability. Certain applications require high availability; some applications don't. A big principle of FinOps is business-driven decisions. So how do you make the business-driven decision? You need to consider the importance of the application and its financial impact. What happens if the application goes down five minutes a year, which is roughly five nines of availability, versus maybe three nines? Do you really need to put instances of that application into multiple regions and replicate the data? The thing is, when you replicate data across different regions, you pay egress just like you pay egress for sending data to the end user. And even when you replicate instances and data within the same region across multiple zones, you still pay for network egress. So there are all kinds of these considerations that require very careful planning, and the same goes for storage, like Google Cloud Storage. How long do you want to keep that data?
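The availability trade-off above is easier to weigh once each "nines" level is translated into the downtime it actually allows per year. A small sketch:

```python
# Sketch: allowed downtime per year for N nines of availability.
# 99.9% (three nines) allows ~8.8 hours/year; 99.999% (five nines) ~5.3 minutes.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960 minutes

def downtime_minutes_per_year(nines):
    """Minutes of downtime per year permitted at N nines of availability."""
    availability = 1 - 10 ** (-nines)
    return MINUTES_PER_YEAR * (1 - availability)

for n in (2, 3, 5):
    print(f"{n} nines -> {downtime_minutes_per_year(n):.1f} min/year")
```

Seeing that three nines already tolerates almost nine hours of downtime a year makes the business question concrete: multi-region replication, and the egress charges that come with it, only pays for itself if the revenue lost in those hours exceeds the replication cost.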
What kind of compliance, regulatory, or auditing concerns do you have? Should you enable automatic expiration of your data? All of these things across network, compute, storage, and a bunch of other areas like databases are very important, and there's an opportunity to set them up correctly. A lot of it can be automated, and it needs to be automated to work effectively, to reduce the bill sometimes by an order of magnitude. I have one example. I worked with one customer who was consuming about $200,000 a month on GCP. When the new CPU type, the E2 machine family, was made available, that was several years ago, we suggested replacing their N1 machines with E2 machines, and that customer saved roughly 40% of their cloud bill. They didn't change the shape of their VMs, they did not change the application. All they did, over a period of several months, was migrate those instances to a new, cheaper CPU type, because their application did not require the bigger, better CPU, and that gave them 40% savings on the cloud bill. How is Qarik helping? Talk about some tools, or how you help customer organizations tame this cost with cost optimization, with the FinOps practices. Qarik has engineers and principal architects who have significant experience with Google Cloud. We're an exclusive Google Cloud partner and we have many ex-Googlers, people who used to be Google engineers, so we understand Google Cloud Platform very well. We also have folks certified on FinOps. I should mention the FinOps Foundation, finops.org, part of the Linux Foundation; that's very important.
There's a certification for FinOps practitioners, and we have those folks certified. We help our customers to, number one, create the FinOps organization within the customer environment, create a charter for the organization and its responsibilities, and create a plan. We help with implementing the FinOps practices and cost optimization using some of the tools provided by Google Cloud, including building Looker or Data Studio dashboards for visibility of the cost, as well as implementing the cost optimization: architecting applications in a cost-effective way, optimizing storage, network, and compute costs, and ongoing monitoring and iterative improvements. It's a fairly sophisticated program, but it's very rewarding, and the ROI can be very significant. Roman, thank you so much for taking time out today to talk about FinOps tools and practices, and how organizations can optimize cost. Thanks for sharing a lot of insights about GCP and how the cloud itself can help with some of those practices and with cost optimization. Those are great insights, and I would love to chat with you again. Thank you. Yeah, thank you, it was fun chatting with you as well.