Today I'm going to talk to you about platforms, platform engineering, and how reality is sometimes a bit more complex than we expect it to be. A bit about me, if you don't know me yet: I was a data scientist in university for about seven years, in Germany, China, and other countries, and then nine years ago I switched into the cloud native space, working with Docker and a lot of other technologies as a DevRel person, a developer advocate. I got my CKA at some point, got a bit more technical, and then switched to product. I'm now the VP of Product at Giant Swarm and I'm also a CNCF Ambassador. Today we are going to talk about something that I've been working on quite a lot within Giant Swarm but also upstream, and that's platforms and platform engineering.

Basically everyone's talking about platforms these days, and there have been so many KubeCon talks already that I hope I don't need to define platforms by now. The most interesting part for me is that we now have an upstream working group called Working Group Platforms, or the Platforms Working Group, within TAG App Delivery. A white paper was published, I think early this year, which is really nice. This graphic is from it, so if you haven't read the platforms white paper, I really recommend it. Soon to be published is a platform maturity model, and we just kicked off work on a platform-as-a-product paper, which also has a research part. So if you're an end user and you want to participate, hit me up later; we might set up some qualitative interviews there.
The most important part of that white paper for me was that it stated, a bit trivially you might say, that developer platforms should enable developers. But sometimes we forget that, and restating the goals of a platform team is important for thinking about where our focus should be as platform products and platform people. The white paper names three top goals of a platform team. The first is researching platform user requirements and planning a feature roadmap, which is very basic product work. Where it gets interesting is the second goal, which is about marketing and evangelizing the platform, because if you build it, sometimes they won't come. Especially in a big company, if you build a platform, people might not even know about it; there may be different platforms competing at the same time. So you need to actively go out and get people to use your product. This is where the third goal comes in: you need interfaces, documentation, and APIs, the ways people will use your product. You need to meet your end users where they are. If you're an organization that lives in Confluence, maybe documentation in Confluence is nice. If your developers live on the CLI, you need CLIs. If they want to automate things, and you want to enable them to integrate what you have as a platform into their platforms, you might need more of a direct API approach. Oftentimes you need a mix of all of those, and especially if you have a lot of different interfaces, something like a developer portal might also make sense, because discoverability of platform services and capabilities is oftentimes a problem, especially the bigger you get.

I couldn't agree more with all of these goals, but in reality, when you look into platform engineering and platform teams, you oftentimes see that we don't have the time to focus on these things. You're always on the run, and the reality looks a bit
like this. This is an actual picture of me working on platforms. It sounds really nice: your mission is to enable developers, and it sounds pretty straightforward. Just put some Kubernetes there and build a platform; there's Backstage, just put it in front, and it's all going to be nice, right? But then there's actually managing the lifecycle of all the clusters that you have, maybe on different clouds, maybe on premises, and managing API deprecations over time, maybe every four or five months when you go to the next version. Then there's going through basically the whole CNCF landscape. Choosing the tools is not easy. It's a task, and it's 24/7. At every KubeCon we have new versions, new tools, and new projects being announced. It's really nice and exciting, but it also creates a lot of frustration and stress for a lot of platform teams, because we sometimes have competing products, and you don't know which to choose or what happens if you choose the wrong one. And that's only to set up your platform. Then you have maintenance and bug fixes, you need to ensure security and get your regulations in there, and you build your company glue in between all of these things. And sometimes, because it's open source, you might be forced to actually contribute yourself. This is sometimes really cool. It can be really rewarding and great for your career, but sometimes it also distracts from the main job that you had, which is to enable the developer.

To refocus on that, we need to shift our mindset a little bit. This is not just for the product owner. It's not just "hire some product people, put them in the platform teams, and you're done." These people, and you as an organization, need to understand that the mindset of the platform engineering teams needs to shift towards a product mindset, a mindset that has the point of view of the developer in mind. And two tools that I would like to give everyone, from
the product space but that work for every engineer, and that we've used internally quite a lot, are, one, jobs to be done and, two, user journey mapping. Jobs to be done basically means: look at what job your end user wants to do. They might actually come to you and say, "I want Grafana," or "I want Instana, I want this tool, I've seen it at KubeCon, I want it." But that's not their job. That's not what they want to solve. They don't even really need observability. That might be a job description for some people, but it's not a job that solves a problem for a company. It's usually that they have issues either in testing or production and they want to debug them, and for that they need data and maybe some dashboards. Again, you need to meet them where they prefer it. Some organizations really like to keep all the observability on the CLI level and only switch to dashboards in some historical cases. So you need to understand this concept of thinking about what job the developer wants to achieve and how you can help them achieve it, regardless of the tool and regardless of the technology.

The most famous example for jobs to be done, and the one that makes it really understandable, is this: if you go somewhere to buy a drill, it's usually not that you need a drill. You need a hole in the wall, and most probably that's not even your job. You want to hang a picture, and there are different ways to hang a picture. Maybe you use some 3M tape or some other means, like projecting the picture on the wall. That's the goal that you have. It's not to get a hole in the wall or to use a drill. Similarly, this is how we need to think about our features. It makes us focus away from the tooling and more on what we actually want to achieve.

Another tool that we've recently been working with, and this is actually
from my Miro board a few months ago: we've tried to use user journey mapping, where you try to linearly map the journey of a user through your systems, or rather through their job. We completely refocused, saying: let's not think about the platform, let's think about the job of the developer. They start a project, at some point they start writing code, then that code needs to go into production, and then it needs to keep running in production. Within that you can go a level deeper and say: when they start a project, they have a bootstrapping phase where they might have subtasks like "I need a Git repository, I need my IDE set up, maybe I want Codespaces." Then "I want to discover: are there APIs I can use? Are there managed databases in the company?" Then at some point I start writing code, so I have inner-loop development. I then go into CI and CD, and at some point I'm in production and I want to ensure reliability.

Once you have this high-level journey, you can go in and think about in which phases you can add to this process: where can my platform enable the developer, or where can I help the company accomplish something like a security policy? And it's not about focusing on just a single spot there. It's about thinking about where in the process this belongs. Just think about policy management. We think about policy management at KubeCon and we say, "OK, there are two tools, Kyverno or Gatekeeper, or some other tools, so I put them in the cluster and then I have policy management." That's not how we should think about it. We should think about how to make policy management as easy as possible for our developers. Let's say we have repo templates. Could I give the developer some base policies in the template already? Could I help them adhere to the Pod Security Standards, for example, by having something in the template already that helps them get started, shifting
everything left, shifting the real capability of my platform towards the end user? And then, already in CI and in the inner development loop, I want to validate that everything works, so there's no surprise once the developer goes into production and is told, "No, you forgot to have network policies." I want to think about the whole chain. This is how I can think about much more holistic capabilities for the developer, and I can also break up team barriers. Oftentimes when you grow your platform teams you might have an observability team, a security team, a connectivity team, and all these different teams, and if you treat them completely separately, they're not going to accomplish that much, because the integration is what makes it. You want to have integrated services, and I'm going to show you an example where this makes a lot of sense.

But before we go to that example, let's talk about risk, because that's really dear to my heart. When you build platforms, we've been there, right? We've had PaaS in the past. It's not something new. There have been platforms that enable developers, and people have loved things like Heroku or even Cloud Foundry, because they help them get started and deploy code into production very quickly. But inherently, once you abstract things away, you also set limitations. Sometimes those limitations might be right, but oftentimes, the more mature developers get, the more they hit these limitations, and if the limitations are too hard, they will find workarounds. These workarounds are sometimes quite ugly. Not ugly in terms of "it's a Bash script" (no Bash bashing here), but ugly in that they might go around security, they might not comply with your regulations, and they might not follow best practices. They might even lead to complete shadow IT, shadow platforms, because developers were not happy with your platform, because you limited them too much. Sometimes
compliance doesn't allow you to give so much freedom, but in most cases there is a balance to be struck.

Then, once you start building, especially as an internal platform team for a single company, you always focus on your company. Naturally, you want to solve problems for your company. But there is a risk of building very bespoke platforms and very proprietary internal APIs, and doing everything very close to how your company works, because your company is always special, right? Every company is special. Every time you think about it, it's "we're different, we have more regulations." Yes, usually you are quite different, but there are also a lot of common denominators. Just look at this community, look at the end user community: everyone has a lot in common. If you build too specifically for yourself, you risk getting stuck in the maintenance of your own things. And once you're forking things and building a lot of code yourself, you also risk getting overtaken by open source. If you've built your own app management, Helm deployment, or GitOps system, and there are GitOps tools like Flux and Argo out there, they might overtake you. You might not have built in OCI support, they already have it, and you always have to add these features yourself. So trying not to be too specific is a good thing here.

We also need to be careful because it's easy to lock yourself into a single technology or a single tool, and tools naturally promote that. But a tool change can be very painful, and in this ecosystem tools change a lot. Standards change, and trends go through this ecosystem. eBPF was not a thing maybe five years ago, but now you might need to change your networking towards something like that. Or you might currently be using some tool from a vendor that is VC funded. Maybe they get bought, maybe they can't continue. What happens to
the tool, right?

Now some strategies. Naturally, we're at KubeCon and Open Source Summit here, so the answer for me is always open source and community standards. But the risk reduction strategy is not just "yeah, just use open source." It's also about thinking, when you're choosing the building blocks for your platform: is this building block that I'm using maybe multi-vendor open source? Not being dependent on a single vendor is very good. Foundation membership can be quite helpful in that, because we've seen companies changing licenses recently, and it can be very painful if you have to migrate away. Maybe someone steps up with a foundation-backed alternative, like we've seen, but sometimes that doesn't happen, and then you have to migrate away, and this again distracts you from building value as a platform team. It's a huge project, and your whole backlog is going to be delayed by it.

Really importantly for me, there's also building on community standards. This also abstracts away the actual tool a bit. If you use things like Gateway API or other interfaces like CSI and CNI, all these community standards, these abstraction levels that we've been working on in the community, then it's easier for you to change out the underlying tool without having to change the way your developers work. Sometimes that means you might introduce some additional abstractions, but working with community standards rather than building your own abstractions is always advisable here.

Going back to the first point about PaaS and abstractions: make abstractions pokeable if you can. I'm a big fan, especially if you work in a GitOps way, of developers having at least view rights on the platform, and of them at some point maybe maturing into contributing to your platform, actually changing things and fixing bugs for you. You
don't need to give them merge rights, but at least give them the opportunity to escalate and say, "Hey, you didn't think about my use case. I don't know, I need TPUs instead of GPUs. You didn't think about that. Could you change that for me?" Have at least a process that is easy and doesn't block the developer, so they don't need to go to Amazon with their own credit card and build their own machine on the side.

Thinking about both the product view and this building on standards, I wanted to run through a very common example that we see oftentimes: progressive rollouts. It's a typical use case for thinking about how to connect different tools. Here again, what is the job to be done? It's not the progressive rollout itself. It's not getting Flagger into production or something like that. The job to be done is: I want to roll out an iteration of my code into production. I want to get business value delivered. That's the only job that I have as a developer. It's not going through the pipeline; the job is just getting there. For that I need release engineering capabilities to build and deploy software. There are some connectivity capabilities if you want to dynamically route traffic for canary or blue-green deployments. There are some observability capabilities, where you want to actively monitor and automatically roll back or roll forward: deliver more traffic to the new version and then slowly phase out the old version. And because security is important, we also want the whole supply chain of this software to be trusted and secure.

So now a question: how many tools do I need? How many CNCF tools do I need for this, rough ballpark? 10? 57? "Depends." I think it does depend on how you count, on where you start and where you stop. Let's just look at the base ones. And this is not a recommendation; you can use any tool
out there for this, or you can build your own, but this is one selection of CNCF tools that would most probably work together in this way. Let's assume we're in a GitOps world. We use Flux and Flagger to trigger a canary or blue-green deployment, and we use Gateway API as the standard. Behind that we could use a service mesh like Linkerd, or maybe something from Cilium. We need a Gateway API compatible ingress controller, Contour or one of the others out there, to dynamically route traffic to the new version of my service. Then I use, for example, something like Prometheus, or something compatible with it, to observe this new traffic and my new service, and then give feedback to Flagger and Gateway API to direct more traffic and eventually roll out the complete service into production. And then Sigstore, which is three tools by the way: Cosign from Sigstore, for example, to sign my images and trust the supply chain. Then I want to check for those signatures in the cluster. Let's say we use Kyverno or Gatekeeper for that, so we have a secure supply chain.

But that's just the main use case. I also have, I don't know, maybe Cluster API to manage my apps and my clusters, and I should build images and store them somewhere, so I need a registry. That registry should also adhere to standards, again like the OCI standard, and maybe I get a hosted one or I host my own Harbor. And we talked about interfaces. It's a complex use case, and the developer maybe wants to get visual feedback, so maybe I integrate it with Backstage. I think Flux just announced a Backstage plugin; Alexis was tweeting about it. So that's again another tool, another plugin that you need to manage. And this is just one capability out of many that you want to give your users. It's one of the core ones, and you can integrate more into it. This is just to make clear that
it's not that easy, right? It's a complex system we live in, and to survive here we basically need to stand on the shoulders of giants.

To summarize what we talked about today: yes, developer platforms need to enable the developer. While we do that, especially the higher we are in management, we need to understand and not underestimate the effort. A lot of people and a lot of resources are needed to build these developer platforms. They're not easy to build. While we're building them, we need to really focus on a product mindset, on tools like jobs to be done or user journey mapping, and everyone in the team needs to understand those, especially in engineering. It's not that you hire a product engineer or a product manager and they solve it for you. You also need to enable the whole team towards that. Then, to avoid the risks, build on open source. Don't build that much yourself. See if there are products out there that package some of this and help you with these choices. Talk to people, talk to people at KubeCon about your choices. And, much more importantly for me, stay true to the core of open source: if you find use cases that don't fit, talk to upstream. Involve yourself. It's actually quite easy and nice to become a contributor, and it's very rewarding. I know it can also be distracting, but in some cases it might be the better investment of your time, because once your use case is supported upstream, you don't need to maintain it anymore. The community maintains it, and you don't need to carry the patch or fork, or reintegrate every time things change. Hopefully, if you rely on these, you as a platform team, and as a company that is using and working with platform teams, can be happy and also really be appreciated, because you're really enabling the developer, you're caring
for them, and you're helping them adopt these technologies. Thank you. Questions?

Hello, thank you for the presentation. Quick question: how do you measure the velocity of successful platform teams, and what are their measurements?

Yeah, so it's very company specific, I would say, but there are some frameworks that are not bad. The DORA metrics, for example, I can recommend; that's a good one. I would be careful, though, about putting very specific KPIs in there. You should always measure, but being too data driven can also be risky, because data can lag or be biased. But the DORA metrics are a good framework, and there's also existing tooling and dashboards for them. I would also go and talk to engineering departments and engineering managers and ask them. For example, with this user journey map that we created, we went to end users and asked them: which part is slow for you? Which part is painful, even? Maybe it's not even slow, just painful. And do the same thing with the platform teams: which part is painful for the platform teams? They'll sometimes tell you, "Nobody knows that we have databases as a service." Then you can have a measurement for that, like adoption rates. I love adoption rates as measurements, because they show that people are actually using it. So this is why I would say it's very company specific, but talking to people will help you find the right measurements for you.

I want to ask a question about how the responsibilities are divided between the platform team and the application developers. Some tasks are not so clear. For example, if an application requires public internet access, who should be responsible for configuring or maintaining the load balancer and the DNS config?

OK, that's not an easy one. Responsibility is always hard. Usually you would strive to have clearly defined roles and clearly defined responsibilities, and make the limits very clear. I do believe that
something like a load balancer and DNS, for example, are still on the platform side. The developer shouldn't need to care that much about them. But sometimes you have this case, and this is where I was saying to make your system pokeable. If you manage it as a platform team, you need to manage your resources, and you need to set some limits on what you can give as an SLA, as a service level agreement. Then, if you have a team that needs to go way beyond that and has very specific needs, you might need to say, "OK, team, if you need that, I cannot provide it; you need to manage that yourself." But making the lines very clear is important here. Usually I would say: try to standardize and keep ownership within the platform teams, and then have good interfaces towards the developers so they can use it in a safe manner, and only in rare cases move the ownership over there. It's still hard, because maybe the load balancer breaks because of misconfiguration, say wrong certificates in the Ingress resource. If you have a lot of wrong certificates in Ingress resources, some ingress controllers might fall over. Then it's not strictly the platform team's responsibility anymore, but it's still up to the platform team that takes care of the ingress to work with the engineers, find out why these misconfigurations are happening, and try to avoid them in the future, either by helping with better templating or by automating things that can be automated. Does that answer your question?

Thank you.

Thanks for sharing. I just have a question regarding how to really attract the developer teams to use the platform product that the platform teams actually deliver. Just like you mentioned, sometimes developers will find some workaround solution to get around, say, the security check, something like that. So
any best practice for it?

So it's hard to avoid, right? You don't have visibility everywhere. But I think the way to fix it once you realize it, or to avoid it from the get-go, is to make things easier. If everyone's running around with root access, it's most probably because it's really hard to do fine-grained permission control. So maybe you need to invest in either a tool or in building some permission management. It's usually a usability problem when people go towards workarounds or build around security. Especially in the security case, you need to work very closely with the end users to see what helps them, because oftentimes with security use cases it's "here are all the CVEs, fix them," and that doesn't work with 800 CVEs. But you can help them prioritize, give them to the right product teams, and maybe even create issues for them in Jira. We built a Jira integration for CVEs at some point to make it easier for teams to prioritize and understand why it's important.

OK, thank you. Another question is around the value of the platform, because I see that we will be facing some challenges from the management team, especially since, as you know, the investment or budget for the platform is really huge. So how do we really show the business value when we are establishing a platform product?

Yeah, business value is not easy. It goes back a little bit to the first question. Usually at some point the management level understands that developer velocity is important for a company. For any company, not only tech companies, software is becoming important. If the company hasn't understood that yet, then we have more work to do, and we might need a bit more strategy consulting to get them to understand. But if it's understood that software really contributes value to the revenue of the company, I
mean, in retail it's easy because it's often e-commerce, but in some companies it's maybe not that easy, then once that's understood, what helps is making clear and showing that this platform is actually enabling developers to deliver value faster, with less downtime and fewer problems, so we get fewer support tickets. Measure these and show them back to management. Also, sometimes you need bigger new projects, or you might need a new team. Pack that into the business value that it will provide, the value that you expect from it. With an automated rollout feature, for example, you expect features to land in production much quicker and with fewer problems. So show that, and bring it up to the level of saying, "OK, this will save us X amount of time." Usually you can say something about the time currently spent: how much time does it take from inception to production? If you can then say, "We're definitely saving time here, and this new project will save more time," then that's something you can use for budget negotiations and for trying to expand the team.

OK, thank you.

I'll make it a very quick question, because we are actually over time, and there's lunch. You had one spicy take in there, with not committing too much to a tool. I experienced this heavily in a place I worked before: you spend so much effort on not locking yourself into a tool. So isn't it contradictory to say we want to be quick, we want to be fast, but then we want to stay on such a high level, not taking all the benefits of a tool because then we'd be too deep into it? What is your take on this?

It's a fine line to tread, I fully agree. If you're too careful and you build everything abstracted away, able to move between all the clouds and all the tools, it's going to be way too much, right,
but try, at least where it's possible and where it's even given to you by the community, like CNI, CSI, Gateway API, all these things that the community has already worked on. Check whether there are people who have worked on abstractions already, and at least adopt those, and for those, don't go too deep. It'll be hard. It still doesn't always work. Even with network policies, which are a common component, sometimes you need more, and then you need to go into your specific tool and use its specific functionality. Sometimes that's really beneficial, because you get more secure, you get faster, and you get a lot of power. It's just something that you need to think about, especially when you already see things on the horizon. If you're currently working on validation in Kubernetes clusters, let's say, there are a lot of tools that you can use right now, and they're all proprietary, and core Kubernetes is currently trying to introduce CEL and other validation mechanisms back into core. Maybe those can replace some of your use cases in the future. If you're thinking about that, maybe investing in a bit of abstraction now will help you migrate in the future, if you can already anticipate it a little bit.

Cool. I think you all want lunch; everyone is a little bit hungry. Thank you.