Hi, this is your host Swapnil Bhartiya, and welcome to a brand new episode of our series, TFiR Topic of the Month, AKA T3M. This month's topic is platform engineering and DevOps. So let's jump right into it. Today we have with us once again Stephen Kim, CTO of Kiarik Group. Stephen, it's great to have you back on the show.
Swap, it's always a pleasure to be here and spend time with you. Thank you.
It's my honor to host you. Today's topic is close to our hearts. DevOps, platform engineering, you come from that whole background, so this is going to be an interesting discussion. I also know that you have some strong opinions about certain terms, so it's going to be even more interesting. But before we jump into the details of this discussion, I would like to hear from you: how would you define platform engineering, and how do you think it's different from DevOps?
As you and I have spoken about before, SRE is something that's very near and dear to my heart, because I think it really is a tell that an organization understands this and has got it right. And I am very pleased to see the industry adopting the role, or the phrase, platform engineering. I think that's a really good way to put it, and the notion of organizations building platform engineering teams and a platform engineering discipline is awesome. It makes me more than happy to let go of the SRE term and move on to platform engineering as a point of discussion.
How would you define platform engineering as a term? Why are we talking about it, why is it important, and how different is it from DevOps? That's the core question.
I'm glad you're asking that.
Platform engineering is the practice of allowing the application development team to focus purely on the development and reliable operation of their application, without having to concern themselves with the complexity and the implementation details of really everything else. That everything else is the platform. The platform engineering team's responsibility is to build, run, and provide that platform as a service, which allows app dev teams to run in the manner that I just described. I think it can be helpful to share a contrast to the platform engineering model, and that contrast is DevOps. These days, if you go to job boards or look at titles inside organizations, there are still a lot of organizations that hold onto the DevOps nomenclature. They have a DevOps engineer, right? And I think the reason that stuck is that similar tools and skill sets are in play in both the DevOps and the platform engineering roles. Back then the tool was Jenkins. Now it's more, you know, GitHub Actions or Tekton or Argo CD or whatever it might be. But back in the DevOps days, the specific individuals who were wrangling the tools and the releases were a part of DevOps. And the name just kind of stuck, because platform engineers today, in the correct model, are still wrangling those tools. Oh wait, just real quick. I know everybody understands this, but bear with me, because I think it's important to recount what the DevOps model was. People might say, no, no, we understand what the DevOps model is and we've moved away from that. But I would say that if your organization has DevOps engineers or platform engineers who, for example, are wrangling releases for multiple app dev teams, then that is evidence that you are still stuck on the old DevOps model, okay? The old DevOps model was this.
You had a dev team, then you had an ops team. The dev team was responsible for writing the application. They would package it, build it into a deployable artifact, a binary or a JAR file or whatever it might be, and hand it over to the ops team. The ops team would take that deployable artifact, place it on a server that they operated, and run it, meaning they would carry the pagers in case anything went wrong with that application. And the reason, I think, was that back then, when the application failed, had an outage, or had problems, the most visible part of the failure was from the server point of view. You would go in and look at metrics on the server and use the very low-level error logs that are written to syslog, and that's why these people were responsible for it. Now, the problem was that the developers' domain of concern and the operators' domain of concern had nothing to do with each other. The operators didn't really understand the development point of view. We're talking about application-specific characteristics: what is the memory utilization of this thing? What are the failure modes of the application itself, and not the platform? And conversely, the application development team didn't really have an appreciation of what it meant to operate this thing, what reliability meant, and what they needed to do. And so DevOps was born. I know DevOps isn't a team, it's not a function, it's a mentality, right? But realistically, you had people who were tasked, who were commissioned, with this as their responsibility to the team. And so the DevOps mentality was born, where the two sides tried to understand one another's cadences and coordinate a little better, okay? Now, as for the new thing these days, I think I'm way behind the curve in calling cloud the new thing.
A lot of organizations are moving over to the cloud, and what you really have to get right, more than ever, on cloud is reliability, because it's really, really difficult. If you take the old mentality of how you wrote, deployed, and operated applications in, let's say, a local data center, and you move that over to the cloud, then everything from the balances of the CAP theorem to how the applications tangibly operate is different, and it's likely that your application will run worse in the cloud if you just lift and shift. You have shared resources, you have different uptime or availability of the compute resources that you're running on, and that model is probably gonna put you in a worse place. It really comes down to the complexity of having to learn a whole new platform. So, moving away from DevOps over to a platform engineering model: let me give you a picture with two dimensions. On one dimension are the platform and the application, and on the other dimension are development and operations. Okay, the old DevOps model would look like this. The dev team is responsible for developing the application and the platform: are they gonna use Apache? Are they gonna use JBoss, Tomcat, whatever it might be? Whatever their stack is, plus the code. They would hand it over to the operations team, who would take that and operate the platform and the application. The platform engineering model flips it this way: now you have the application team, who's responsible for the application from development all the way to operations, and then you have the platform team, who's responsible for the development and the operation of the platform.
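The two-dimensional picture described here can be written down as a small ownership table. What follows is only an illustrative sketch of that picture: the enum names, function names, and team labels are made up for this example and do not come from any tool mentioned in the conversation.

```python
from enum import Enum


class Layer(Enum):
    APPLICATION = "application"
    PLATFORM = "platform"


class Phase(Enum):
    DEVELOPMENT = "development"
    OPERATIONS = "operations"


def devops_owner(layer: Layer, phase: Phase) -> str:
    """Old model: ownership follows the phase, whatever the layer."""
    return "dev team" if phase is Phase.DEVELOPMENT else "ops team"


def platform_engineering_owner(layer: Layer, phase: Phase) -> str:
    """Flipped model: ownership follows the layer, across both phases."""
    return "application team" if layer is Layer.APPLICATION else "platform team"


if __name__ == "__main__":
    # Print one row per cell of the two-by-two picture.
    for layer in Layer:
        for phase in Phase:
            print(
                f"{layer.value:<12}{phase.value:<13}"
                f"{devops_owner(layer, phase):<10}"
                f"{platform_engineering_owner(layer, phase)}"
            )
```

In the first function the layer argument is ignored, and in the second the phase argument is ignored. That is the whole "flip": the cut moves from between development and operations to between application and platform.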
So a couple of things here. Number one, the advantages of this. The primary one, I would argue, is this: from a product velocity perspective, a product agility perspective, and a service reliability or quality of service perspective, you want the people who are developing the application to operate the application, because they best understand the failure modes and they best understand the lifecycle. Also, application development and the platform run at two different cadences, and you typically don't wanna mix different cadences. You don't wanna take a fast-running gear and a slower, more thoughtful gear and jam them together, because you're gonna cause a lot of bad experiences: people who are trying to run at a particular cadence all of a sudden get pulled along by somebody else's cadence, and you know how that goes, right? And if you want the application team to develop and operate the application they're running, well, you really can't expect them to build up expertise in the application itself and also in all the cloud infrastructure. Oh my goodness, I mean Kubernetes resources, Istio, not to mention the actual development platform. If they have to concern themselves every time a Jenkins or a GitHub Actions or a Tekton release workflow fails, they're not gonna be able to focus sufficiently on what they really need to be thinking about. Another way to put it: as organizations move to the cloud and take on much more complexity than before, some of it just because it's new complexity, the notion of building and operating under a shared-services model becomes much more important. You have to isolate and segregate complexity and allow engineers to develop disciplines in their local domains so that they can run them to excellence.
Where did the term come from?
Or, once again, going back to the problem, and why this should be the right approach. When we talk about cloud, we still have mainframes, we still have on-prem, we still have OpenStack. But cloud native, if I'm not totally wrong, is not a thing; it's a way of doing things. You can run your own data center like a cloud as well, right? It's just that the cloud is sort of somewhere else. So what I want to understand is, if you look at this approach that you're talking about, who is driving it? Is it vendors, maintainers, or the user community?
Oh, that's a great question. I actually think it's architecture. If you think back to your example of mainframes, the mainframe was a particular architecture that took heavy advantage of vertical integration, from the hardware all the way up to the software, right? The benefit of the cloud, which allows you to run at organizational scale and not just at compute scale, is that we have introduced something I think we really haven't seen before: the degree of consistency and abstraction of APIs. If you think about the cloud, its power is the API. That's what allows you to run at scale, right? And what the APIs provide are opportunities for a wonderful segregation of concerns. They're contracts. There's the cloud infrastructure, and its APIs for how you manage resources through their lifecycle. There is a platform, which allows you to manage the contracts for how applications move through their lifecycle, through development, release, and operations. And then you have the applications, which have their contracts with the end users. So if you look at something like a mainframe, you worked in the way that took best advantage of its vertical integration. In the cloud, we should operate differently, to take advantage of the specific strengths of the cloud model.
So think about not only the cloud API but also the Kubernetes API and its development, both the core Kubernetes API and how it developed further into the notion of CRDs and into the operating and development model for other platform components. I've often heard people say that Kubernetes is a platform that you build a platform on top of, right? And really that speaks to the power of the Kubernetes APIs and what that community has been able to accomplish. Let me share something. I sometimes joke, and I hope they don't feel the same way, that I spent a lot of my years inside GCP challenging the Kubernetes organization about their go-to-market. If you recall, what is it, six or seven years ago, when Kubernetes took this wonderful technology that they made out into the world, because it really was at the early stages of development, the first thing out the door was: here are the Kubernetes resources, here's how the YAML is structured, here is a pod and a replica set, back then replication set, right? Things like that. And my concern about that, my challenge, was that we can't possibly ask 5,000 developers to get retrained on all of this just to deploy their application. We must provide a platform in the middle that abstracts all these things, and the contract should be in atomic units that make sense to developers and that developers should care about: here is a build, here is a deployable artifact, here is a release, here is an operating instance of my application in a cluster, or whatever it might be. And so, going back to your question once again, the change in the model is there to take advantage of the new architectures that are afforded us today and that we didn't have back then.
As you described the old DevOps model, how should a modern organization approach this platform engineering?
Because, okay, before I ask this question, let me ask a different question. How much awareness of platform engineering do you see in the industry? As you said, companies still have the old names in their job titles, so how much confusion is there? Do you think the market is maturing, and that companies understand that this is the right approach to deal with these complexities and challenges?
Yeah, I mean, I'll share the two trajectories we've seen into a modern, cloud-native platform engineering approach for a technical organization. And I share the trajectories because I believe they reflect the motivations, right? The things that bring organizations to the water and make them try to take this on. Moving to the cloud is risky and it's expensive. It's not something that you do just for fun. There are specific things that you want to get out of that investment and the terrible risk that you take on. And the two trajectories that we've seen actually map to the two core areas that Kerak works in: one is platform modernization and the second is developer productivity. Platform modernization starts from a platform perspective, where they say, we want to modernize a platform from VMs to containers, or from the data center over to the cloud, for motivations of cost elasticity and resource elasticity. What they really want to build is an organization where the infrastructure has many more options, right? The second trajectory is for an organization to say, we aspire to get far better developer productivity, agility, and service reliability by moving over to the cloud than we had before. And if you really think about the two trajectories, they meet at the same place. They're two sides of the same coin.
They meet at the technical solution, which is a cloud-based infrastructure with a platform on top that allows developers to focus on what they need to do well in order to achieve the developer productivity, agility, velocity, and reliability that they're trying to get to. Now, the interesting thing is that the biggest challenge, and the work that needs to be done to get to that state, is mostly not technical. It's organizational and cultural, because roles and responsibilities change. In every organization today, and that might be overstated, you will see something that looks like an operations organization and something that looks like a developer organization. I mean, look at the titles at the top: SVP of product development, SVP of infrastructure, or something like that, right? And that looks a lot like the DevOps model, right? A development team and an operations team. To change that to the model I described before, you need to go through a particular technical design and implementation for sure, because the tools need to be there to allow for nice contracts and separation of concerns. But there's also the organizational re-jiggering that needs to happen, so that people fall into place alongside those technical contracts. The third thing that you'll often see in organizations is the effort to create the platform team, right? You'll see something like the platform engineering team or the dev-infra team, but they almost always belong to the operations organization. So they come from an operations mindset. And the fact is, the platform engineering team is a really interesting one, because they themselves are operating applications. Their application is a platform.
And so that model I talked about, where developers need to operate their own application: it applies not only to the app dev teams, it applies to the platform engineering teams. They are developing their platform. It has a backlog. You'd probably benefit from a product manager, right? Because you have customers: your app dev teams. You need to understand their needs and prioritize the features of the platform. And as you build that out, whether it's GitHub Actions, Argo CD, or Tekton, developing the code that does deployments in a gracefully failing way, you then need to operate that platform. You need to have an SLA on that platform so that you make clear to your customers what they can expect from you, and they can manage around that, right? Just like the app dev teams have development, a release process, and operations as well. That organizational change is something where, as you can imagine, politics come into play, because people who had ownership of certain functions sometimes need to cede that responsibility to other areas. And that happens on both sides, right? App dev teams are no longer going to get to entirely pick the platform that they want. They're going to say, no, no, no, we have to run on Tomcat and we have to run on this. Well, the platform engineering team and the cloud infra team can't support 15 different platforms. So as much as you can, and this isn't 100%, it's going to be: okay, can we take an 80% approach, with one or two things that run 80% of applications, and then handle the rest case by case? It's the same for all changes of this nature, but that's the thing the app dev teams have to give up. They have to cede control of this, and what they have to trust is the contract.
If you follow these conditions, this platform, this cadence, this SLA, you will be freed to simply focus on application velocity and service reliability. This is the platform engineering team talking to their customers, the app dev teams: we will give you all the tools that you need to successfully run and meet your objectives.
Let's talk about the benefits of this new model.
There are some extraneous benefits to operating in this model. Engineers actually don't want to own everything. There's this notion that engineers want control, that they want to concern themselves with their full stack, for example. But I have found that engineers, given good, reliable contracts that they can work against, actually want to focus on fewer things. They want to develop excellence in what they do, and they recognize that they cannot develop excellence if they're asked to scale their applications to a particular user scale while also having to learn and wrangle the Kubernetes YAML and the Jenkins infrastructure and all of those things. One of the other reasons I've heard for companies trying to modernize is that they want to attract and retain the best engineers. They look at their attrition rate or their retention rate, and engineers go where the cool technology is happening, where they feel they can focus, be impactful and effective, and develop in their career. And I don't think I've really met engineers who want to develop themselves in 15 different ways, right?
And so having a clean separation of app dev, platform engineering, and cloud infra allows you to have engineers who develop excellence over time in what they do, engineers who are not distracted by a bunch of other things that they really don't want to care about. It allows them to be impactful and the organization to be impactful.
How would you define developer experience? What is the importance of having a developer experience approach within an organization? What are the benefits, and how should companies approach it, and why?
Let me share a dream that I have. During my years in the engineering organization at Google, the only thing our team ever did was write code, and the internal platform, which you might have heard of, was called google3. One thing that it really did well is that it separated concerns. The contract for software engineers was the source repo. They went into their IDE, wrote their code, and then submitted the code. And then the machinery, the platform engineering machinery, took care of everything else. It built, it ran the tests, and it did that really, really well, which we can maybe get into in a different conversation, because there are a lot of technical details in there. And really the only time that developers were ever tapped on the shoulder was when there was a problem that they could fix. They were never tapped on the shoulder with, oh, this Borg cell had a problem where it could not deploy this because of some resource or something like that. I'm making stuff up, right? And it was wonderful. I think during that time I never interacted with somebody from the platform engineering team, not a human being, because they held their SLAs. I interacted with a contract, right? And there was a wonderful separation. And so my dream is to rebuild that experience out in the real world. And I believe it's possible.
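The idea that developers are only tapped on the shoulder for problems they can fix can be sketched as a small routing rule. This is a hypothetical illustration only. It does not describe how Google's internal tooling works, and every name in it (`PipelineFailure`, `DEVELOPER_FIXABLE`, `route`, and the cause labels) is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineFailure:
    stage: str  # e.g. "build", "test", "deploy"
    cause: str  # hypothetical cause label attached by the pipeline


# Assumption for this sketch: these causes originate in the developer's own
# change, so the developer is the right person to notify.
DEVELOPER_FIXABLE = frozenset({"compile_error", "test_failure"})


def route(failure: PipelineFailure) -> str:
    """Return who should be notified about a failed pipeline run."""
    if failure.cause in DEVELOPER_FIXABLE:
        return "developer"
    # Everything else is treated as a platform problem and stays with the
    # platform team, which owns it under its SLA.
    return "platform team"


if __name__ == "__main__":
    print(route(PipelineFailure(stage="test", cause="test_failure")))
    print(route(PipelineFailure(stage="deploy", cause="cluster_out_of_resources")))
```

The design choice worth noting is the default: anything not explicitly known to be developer-fixable stays with the platform team, so an unclassified infrastructure failure never reaches the application developer.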
But once again, the challenges are not only technical; they are organizational and cultural, and a matter of mindset, as well.
Stephen, thank you so much for taking time out today to talk about this topic with me. I would love to have you back on the show.
Thank you. Thank you very much. I love talking about this stuff. It's a passion of mine. I appreciate it.