Okay, so today we're going to architect a dynamic internal developer platform together. First, I want to quickly speak about the making of this reference architecture: the problems we aim to solve and the design principles we applied. Then we'll go to the actual design, and then I'll do a dry-run walkthrough, if you want. For PlatformCon, our conference where we're expecting up to 20,000 people this year, I've really asked all the speakers: hey, please focus on reference architectures, so I hope there are a lot more reference architectures coming. We had 500 proposals for talks, and I think we accepted 100, so I'm super excited about that; make sure you have a look at those as well. And again, McKinsey is working on this, starting with Amazon, and we have a lot of others coming. We looked at a lot of different enterprises and actual platforms for this and used common patterns. I think it's worth noting that we have a size bias: these architectures are designed for teams scaling to thousands of users, starting from a minimum of 50, or rather 100, users onwards. So if you're a cool startup just starting off, you're fresh and standardization doesn't matter yet, please use Vercel or Heroku. And this, I actually think, is something pretty cool. Okay, let's go ahead. What are the problems we wanted to solve? The usual reasons why we do platform engineering: we have long lead times; time to market just takes too long. We're in a recession; it needs to go faster, otherwise executives are sad. We have too much ticket ops, high cost of maintenance, overwhelmed developers, waiting times, and missing self-service. And that's because our setup is unstructured.
It's because we have too many versions of Postgres that we need to maintain, too many Helm files, too many Terraform files, and the communication doesn't work. Now we want to get into a situation with low lead time. We want a high degree of standardization; I cannot say that enough, that's the key thing here. We need to design systems (we'll get into the design principles) that drive standardization by design. And we want, to a certain degree, separation of concerns. Many people have crucified me for this, but: yes to ownership of services; yes, to a certain degree, you build it, you run it. But separation of concerns is nothing to be afraid of, and abstractions are nothing to be afraid of, as long as you don't take away context. And then we want self-service; that's another principle here. So the sequencing, if you want (it took me a while to understand this): you want to be able to dynamically generate configurations. However you do this, you want a design that is able to dynamically generate application and infrastructure configurations with every deployment, because that drives standardization by design. By doing this we enable all of these things: self-service, eliminating ticket ops, golden paths. I know it sounds like a pile of buzzwords, but I'm going to be very concrete. And because of this we're slashing lead time; that's the logical chain here. And I want to make sure we give the right people credit, because I didn't build this. This has been done by Stefan Schneider, Mike Gatto and Marco Maruri, and I think we'll hear a lot more from them. I wanted to insert photos, but I didn't find good ones, so we'll hopefully do that when we send this out.
What is a reference architecture, first of all? These are standard patterns for combining frequently used tools and design choices. We'll have them as visual flow diagrams, and you're going to see that now; we have them packaged as code that we're open sourcing; we have white papers on this that we can share. And then we're going to have (we don't have this yet) tutorials on how to actually interact with them; we're going to do a lot of this. Cool. What are the design principles we've applied? Number one: golden paths over cages. What does that mean? Very concretely: follow this platform, use this, and if you use this you get certain guarantees. If you want to go off the golden path, please do so; these are not opaque abstractions, and we don't force anybody. You can do whatever you want, you can go down to the lowest level, but then, my friend, you're on your own. Second, we want standardization by design. What does that concretely mean? By using the system to deploy and deliver your application, you are keeping the degree of standardization as it is, or improving it as you deploy more and add more services; you are not making it more complex. That's a very, very difficult thing to achieve, but that's what we aim to apply here. It's possible through dynamic over static configurations. That means: rather than having, I don't know, Terraform and Helm combinations that you need to maintain, which are really clean at day zero because you forked something from a service catalog but then start drifting through the lifetime of the application and through the environments, we treat every day like day zero. If there's anything I want you to take away: every day is day zero; regenerate app and infrastructure configurations with every single deployment. And then we want to be code-first. Very important for a modern enterprise: we want to make sure we have disaster recovery.
We want to make sure everything is as code, and we never break the developer's workflow; the developer lives in code. That doesn't mean you can't put a user interface on top. We're going to have that as well, and you can do all sorts of fancy things with CLIs and so on. I even advocate for leaving the interface choice open, but the code needs to be the single source of truth. Okay, cool. So that's it, right? This is the reference architecture. It's a lot of logos, so let me say one thing first: you can interchange all of the logos. You can use different CI pipelines, five different CI providers, three different registry providers, different orchestrators, different resource planes; it doesn't matter, the logical design is all the same. And we have five layers. This was also McKinsey's proposal, so I'm piggybacking on their fame, but they dissected this into five different planes. You have the developer control plane, which is the interaction between developers, platform engineers and everybody else with the platform. You have the integration and delivery plane, the resource plane, the monitoring plane and the security plane. What I want to do right now is go through the different planes, explain what they are, explain the design choices, and then as a next step we're actually going to follow a simple deployment, because the unit of change in this platform is the deployment. All of these activities always happen at deployment time; that's our pulse of business, if you want. And a couple of observations: within this reference architecture we chose Backstage as the portal, because it's the most commonly used. There's another one coming with GCP.
And that one is using Atlassian Compass, which I think is the second big player in that important space. Not many know it yet, but we're seeing it in more and more cases, it's accelerating really fast, and if you're in the Atlassian ecosystem, I'm personally a big fan. Then we obviously have version control; we're using Score as the workload specification; infrastructure as code in this case is Terraform, but you could swap that logo with Crossplane or Pulumi or whatever. And we have GitHub Actions here, Amazon ECR, and Humanitec as the platform orchestrator. This part is, I think, a little misleading: in the actual implementation, delivery is done with Argo CD, which is the most commonly used and has much more beef. Then you have the resource plane here; I've seen a version of this with Datadog, which is very helpful; and for security this is HashiCorp Vault. Again, the usual suspects. Cool. Now let's actually get cracking and dissect this, starting at the developer control plane level. Okay, we looked at this already a little bit, so let's zoom in. The first thing I want to zoom into, the thing not everybody is using yet, is the workload specification. What is that, actually? It's a general recipe for how my workload relates to its dependent resources. If you want, it's a general way of explaining my architecture to a system. And I'm explaining this in an environment-agnostic way, which means the description doesn't care whether it's staging or production. In this example, my Python service depends on a resource of type database (Postgres), a storage of type S3, and a resource of type DNS, and then we have a connection string that tells my workload how to connect to my database. And this is actually how the Score file looks. So we're not saying "I depend on that particular RDS database"; we're saying "I depend on a database of type Postgres."
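To make this concrete, here is a sketch of roughly how such a Score file could look for this example. The field layout follows Score's published schema, but treat the names and details as illustrative; the connection string shows how the workload consumes the database resource's outputs without naming any concrete instance.

```yaml
# Illustrative Score workload specification (environment-agnostic)
apiVersion: score.dev/v1b1
metadata:
  name: python-service
containers:
  web:
    image: python-service          # built and tagged by CI
    variables:
      # Resolved per environment at deploy time by the platform
      CONNECTION_STRING: "postgresql://${resources.db.username}:${resources.db.password}@${resources.db.host}:${resources.db.port}/${resources.db.name}"
resources:
  db:
    type: postgres                 # "a database of type Postgres", not a specific RDS instance
  storage:
    type: s3
  dns:
    type: dns
```

Note that nothing in this file mentions staging or production; the same description is localized against each target environment at deployment time.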
There's exactly one Score file, one workload specification, and you can use Score or anything else; that sort of doesn't matter. We're using Score because, you know, it's the open source project I worked on, so I'm in love with it, but many enterprises have their own version of this. You have one of those workload specifications sitting next to your workload source code, and it then gets localized on an environment-by-environment basis. So that's the workload specification; this is where, if you want, the developer orders what they need with every single deployment. Okay. The other interfaces are used in different situations. I listed this, and we're going to send it over, but let's have a look. This is not data driven; I've just jotted my observations down, and I would have loved to test this against a thousand people, but it's really just what I'm observing. Usually for a deployment you have your git-push flow, right: terminal, IDE. You have your configurations and changes to configurations, and you have the workload specification. Same thing if you want to add or remove a resource, and that's already a hint: you can add a resource by just adding another resource definition to the workload specification, on the go. Just doing a git push will create that resource and wire it up; promote this to the next environment, and it will do the same there. That's the beauty here. The same applies to removing resources. Then for rollbacks and diffs, because we have that single source of truth, we would probably use an API, CLI or UI. For resource detail configuration (how does the config of that S3 bucket actually look?) it's definitely infrastructure as code, with Terraform still the predominant method. Spinning up a new environment: by definition, if every day is day zero, you can do that too.
Similarly API, CLI or UI, and the same for logs, although those would probably be aggregated at the portal layer; I don't think the slide is quite correct there. Then the portal and the service catalog: you would use those if you want to, you know, search what microservices we already have (we have an inner-source use case: has this been done before?), that's definitely the portal. And we have a service-creation, scaffolding case, right: the portal in combination with the templating API of something like GitHub or GitLab. So those are the interfaces. Which one is the user using? The answer is: it depends. Different users, different choices. If you have a couple of dozen developers, some like the CLI, some like to work in the code base. And I'm a big fan of this: if you really want to drive adoption, the system should be designed so they can actually choose. Cool. So this is our developer control plane. The next element is the integration and delivery plane, and a number of these things are obviously familiar: we have our CI pipeline building the code, and then we need to push the code somewhere. In this case that's a registry, and I'm using ECR here, but it could be any other. And then we have the orchestrator component. The orchestrator is basically a configuration file generator, a smart config file generator with sophisticated RBAC functionality. It could deploy, could create infrastructure, but could also just call other APIs: call Terraform Enterprise, or, I don't know, hand over to Argo CD, whatever. But the job of the orchestrator is to read that Score file and say: hey, okay, where are you deploying to, an environment of type staging? Perfect. Let me dig up what database to wire you to, or what to create for you; create the app configs, pull the secrets, and then actually hand over to the deployment itself. So that's the integration and delivery plane.
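To picture what "create the app configs, pull the secrets" might produce, here is a hypothetical fragment of what such an orchestrator could emit for one deployment: a plain Kubernetes-style manifest with the resolved context injected. The shape, image path and secret names are assumptions for illustration, not the output format of any specific tool.

```yaml
# Illustrative only: app config regenerated at deploy time
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-service
  labels:
    env: staging                    # injected from the deployment context
spec:
  selector:
    matchLabels:
      app: python-service
  template:
    metadata:
      labels:
        app: python-service
    spec:
      containers:
        - name: web
          image: example-registry/python-service:abc123   # pushed by CI
          env:
            - name: CONNECTION_STRING
              valueFrom:
                secretKeyRef:       # secret created/refreshed by the orchestrator
                  name: python-service-db
                  key: connection-string
```

Because this file is regenerated with every deployment, nobody hand-maintains it, and it can never drift from the baseline.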
Let's go to the next element, the resource plane, which is also comparably straightforward. These are the infrastructure components you're using today. Out of the box, just to give you a couple of examples, this platform spins up an EKS cluster (and then you can order namespaces or whatever), you have RDS as the database, Route 53, Amazon S3. The design of this architecture allows you to genuinely use whatever you want. You can have your Postgres running on a VM in your grandmother's basement that nobody has looked at in ages, and you can wire it up. Anything that has an API, basically, is something you can wire in here. So that's the resource plane. And because we're applying dynamic configuration management, it's totally possible to have many cloud estates. Another thing that's coming out is a multi-cloud example. Multi-cloud, or multi-cluster in this case, doesn't make that much of a difference, because the actual configurations are created with every deployment, against the target environment. That means whether you're deploying to Amazon or to Google doesn't matter that much, because that's going to be localized. What this also supports is fairly complicated resource combination and orchestration, if you want. Let's say, to give you an example, the developer in the Score file says: hey, I need DNS. What that actually means in the real world is: yeah, you need DNS, but you probably also need a certificate issued, right, and you need ingress. So that simple request for DNS, if we just translated it literally, wouldn't be enough. Instead we create an acyclic resource graph and sequence the creation of these elements so they're actually ready in combination.
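The DNS example above can be sketched as a tiny dependency graph. This is a purely hypothetical notation, just to show the idea: one requested type fans out into several resources that must be created in dependency order.

```yaml
# Hypothetical resource graph expanded from a single "dns" request
resources:
  ingress:
    depends_on: []
  certificate:
    depends_on: [ingress]
  dns:
    depends_on: [certificate, ingress]
# resulting creation order: ingress, then certificate, then dns
```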
That also allows you to QA this better and operate against it more sustainably. So that's the resource plane. Then we have the monitoring and logging plane, and here this is too simple. Actually, if there's one of a couple of areas where I think we need to improve this, it's definitely the monitoring and logging plane, because the reality is: you have APM, right, you want to see what's going on in your infrastructure. You have a whole topic around error messaging: how do you surface errors from different infrastructure components centrally? You have a whole conversation around that. And then you want to stream and centralize the outputs of the workflow: how is CD performing, are there any syncs that need to happen? All of these things need to be aggregated, and that's a pretty non-trivial task, so actually here you need more, and you probably want to aggregate that at the orchestrator API level or at the portal level, so it's easier for the user to consume. So that's the monitoring and logging plane, but pretty much you want the observability tied in. Basically, if you're creating the app configs with every single deployment, you can enforce the use of sidecars, labels and annotations, make sure you really have good coverage here, and make sure you can centrally pull in all the logs on a workload-by-workload basis and really see what's going on. So that's the monitoring and logging plane. And then the next element, where I also think real life is a lot more complex: right now it's secrets management with Vault, but we often also have policy systems, compliance-based translator elements that say: well, in this situation you need to create a ticket and it needs to be approved before it goes into production. I think all of those things also need to find their way in.
So this is heavily simplified, but a good starting point, I think. Those are all the different elements. Now, what I actually want to do (let's quickly go back to our beautiful overview graphic) is follow a deployment, our unit, our rhythm of business, all the way from the git push down to the running application and the configured infrastructure. Before we do that, I want you to understand again that idea of dynamic configuration management, because nine out of ten people who say "yeah, I get dynamic configuration management now" don't really get it. It's very different from the approach we've been taking for a good decade or so. So let's go through this again. Let's talk about dynamic configuration management as an asynchronous love affair between platform engineers and application developers, which is such an unsatisfying experience, right, an asynchronous love affair. But that's really what it is. You have that abstract description of the world: hey, I'm a developer, I have a workload, it depends on a database of type Postgres; I'm describing what I need. Now I'm sending that through my CI pipeline, and by doing that, by definition, I'm indicating a context (otherwise it couldn't build): a tag, or some matching criteria. I'm deploying to an environment of type staging; I'm deploying to an environment of type production; I'm in a general environment; I have an app ID, I have a test ID, I have an env ID, you get the drill. There's some sort of indication of context. How we do that indication is subject to the specifics of the respective organization, but we have a context. Now I'm sending the context; it's making its way through CI. I've applied a change to the workload source code. And with every deployment (I've used that phrase a lot, but it's very important), with every deployment,
the Score file makes its way through my CI pipeline and hits the orchestrator. The orchestrator now performs an RMCD execution pattern. It's called RMCD because of the song YMCA; that's actually the only reason, because you'll always remember YMCA, and then you'll think of RMCD. It's easy to remember and you have the tune in your ears. RMCD means read, match, create, deploy. I'm reading the file: the orchestrator says, oh, there is a file. And because we treat every day like day zero, the orchestrator has no prejudice. It thinks: let me read that file and figure out what that workload needs. It reads the file and figures out: oh, it needs a Postgres, needs a DNS, needs an S3, okay, fine. Next step after read: match. What's the context? The orchestrator looks at the metadata and says: oh yeah, this is a deployment to an environment of type staging. Let me look up what the platform engineering team wants me to use in that circumstance. And in this case we say: oh, this is just a deployment to development; there's already a Postgres database, and I don't want you to create a new one, I just want you to route to the existing one. Okay, says the orchestrator, I'll go over to the next stage: create. At the create stage we create the application configurations by applying that workload to the baseline configs, and then we actually create, or wire up, the infrastructure components. In the very trivial case we'll just reach out to the Amazon API, fetch the latest credentials, and inject them as secrets at runtime into the container. In the more complex case we'll build advanced, acyclic resource graphs, run all sorts of procedures in the background, and then wire everything up.
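The match step described above amounts to a lookup table maintained by the platform team. Here is a hypothetical, simplified rule set (not the syntax of any particular orchestrator) showing the "route to existing versus create new" decision:

```yaml
# Hypothetical matching rules: deployment context in, resource decision out
resource_matching:
  - type: postgres
    criteria: { env_type: development }
    action: wire                    # route to the existing shared instance
    target: shared-dev-postgres
  - type: postgres
    criteria: { env_type: production }
    action: create                  # provision dedicated infrastructure
    template: iac/rds-postgres
```

The developer's Score file never changes between these cases; only the context differs, and the platform team owns what each context resolves to.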
And then there's the deploy stage. At the deploy stage we again have, you guessed it, config files, and those files can sit in a repository and something like Argo CD can take over; or we hand over to other orchestration tooling; or your orchestrator can do it in-house, if you want. That depends on the systems you're using. But in the end, that's what we have: with every single deployment we dynamically regenerate configurations, and we have that hot, juicy, asynchronous love affair between platform engineers and application developers. Good. What we're going to do now (and it's like a world premiere) is actually look at a golden path. A golden path is something developers can use, and if they stay on it, they get certain guarantees. I want to show you the first golden path now. It's a golden path so trivial you'll think: what the fuck, Casper, my GitHub Actions can do that today, what are you talking about? It's a deployment to development. Nothing is special about deploying to development, really, nothing. But by taking the most trivial case, you'll get an understanding of how this actually works and pans out, and you'll understand, even in that trivial case, the beauty of standardization by design. Okay, so we're deploying to development: we applied a change to our little Python service. A very special git push. Let it run; our workload specification also runs through, because it's always being sent. We now have the context: environment equals development, environment type equals development. At some point we hit the orchestrator component, and the orchestrator hums its tune: read, match, create, deploy, RMCD.
It reads this, matches the context, creates the configs, and configures the resources; in this case it's really just reaching out, because the resources are already there. Then it deploys, and our clusters are configured, everything is wired, and everybody's happy. Now you're looking at this and saying: okay, great, whatever, you have a deployment to development, congratulations. Well, the thing is, while this looks the same, a ton of things could have changed under the hood. Let me give you a couple of examples. You might have changed the labels and annotations you want on workloads. You might want a different sidecar. You might want to update from Postgres version 14 to Postgres version 15. I could give you hundreds of examples. Now, in the normal world, you would send out a request to all developers (can you please update that?), then nobody does it, and after four weeks you ask your CTO to send the email; we've all done that a couple of times. In this world, all you change is the baseline configuration. On the next deployment, these things dynamically flow in: standardization by design. By just using the system, it standardizes itself. That's the beauty. And because we're dynamically creating stuff, we can take this to the next level: we can go into scaffolding workflows, and into the creation of a new environment, for instance. Let's actually do this; this is my golden path scenario number two: we create a new environment. What's the developer experience? Incredibly simple. We take the exact same Score file, the exact same code, and the only thing we change is that we indicate a new context, and the context in this case is: hey, this is an environment of type ephemeral. It hits the orchestrator. The orchestrator says RMCD, exactly: read, match.
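The "all you change is the baseline configuration" point can be pictured with a hypothetical platform-owned baseline fragment. Every value here flows into all matching workloads on their next deployment, which is what makes the Postgres or sidecar upgrade a one-line change; names and structure are illustrative only.

```yaml
# Hypothetical baseline (workload profile) owned by platform engineering
workload_profile: default-web
defaults:
  annotations:
    observability/scrape: "true"    # enforced on every matching workload
  sidecars:
    - log-shipper:1.8               # swap or upgrade the sidecar here, once
resource_templates:
  postgres:
    engine_version: "15"            # was "14"; next deployment picks it up
```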
Hey, environment equals ephemeral. All right. Well, now I'm going to create a Postgres. I'm going to create an S3 bucket. I'm going to create a DNS entry. Wait a minute, DNS? We need ingress and a certificate too. Cool, I can do that as well. I'll use Terraform or Crossplane, or I'll automatically send an SMS to the Vatican and the Pope will do it himself and send back the credentials. Whatever: we do that. Then we create the application configurations, receive the credentials and secrets, and wire everything up. And then it's deployed and we have a new namespace, with RDS, S3 and DNS, everything in its latest state. Okay, that's the second golden path. This also means it doesn't matter whether this is the first time or the tenth time: this could be scaffolding, or it could be after two years of this application existing, and it runs through completely automated, making sure everything is tidy and clean. The next example here is, just for the sake of it, a deployment to production. What's the message I want to send? The exact same thing: indicate context, and then we have everything running. The more important example is what happens if you want to go off the golden path. Another nice buzzword I'm always using is golden paths, not cages. So how do you ensure this isn't a golden cage, right? Because these things are great, but there's always that situation where you need a resource and the platform engineering team hasn't provided it yet. And that's a very, very frustrating experience for the user, because now you have the feeling: oh, I've been restricted, I have to wait for another team, how is this better than what I have today? The wonderful thing here is you can give an answer to that, and the answer is going off the golden path.
Here's an example with ArangoDB, which I always use because ArangoDB is something you never need until you need it. So you say: well, I need an ArangoDB, and you send the request to the orchestrator, which maybe has dry-run functionality that tells you: hey, ArangoDB? I have no idea what to do with that. In most cases you would be stuck now. Not in this case: you can actually teach the system how to create an ArangoDB. And that is also what I mean by layered abstractions: assuming the developer has the access-control permissions to do so, they can go in and actually define how an ArangoDB is set up. That resource knowledge graph, if you want, can learn this centrally. And this leads to something that was a revelation for me two weeks ago: good platform engineering is actually about cutting and designing your repositories in a sustainable way. That was mind-blowing for me when I first heard it, but there's a lot of truth to it; we're going to look at this in a second. So that's exactly what you can do here: you're just teaching the system. Hey, by the way, this is how you create an ArangoDB, and in those situations, if those matching criteria are met, you should actually use it. All right, so again: the orchestrator runs RMCD, and then we have an ArangoDB. And the next time a developer needs an ArangoDB and the matching criteria hit, bang, the ArangoDB is there, right. And that's what I want to double down on a little bit more: platform engineering is about structuring repositories, and a good setup has a different repository structure than a lot of the structures we've been used to. And when I say developer-owned or platform-engineering-owned, I'm not saying this is mandatory.
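Teaching the system a new resource type, as described, could conceptually mean adding one entry to the central resource definitions. Again a hypothetical format, just to show how small the extension is; the driver, source and criteria fields are illustrative assumptions.

```yaml
# Hypothetical: registering ArangoDB as a newly supported resource type
resource_definitions:
  - id: arangodb-dev
    type: arangodb
    driver: terraform               # points at an IaC module that knows the setup
    source: "<internal ArangoDB module>"
    criteria:
      - env_type: development       # whenever matched, reuse this definition
```

Once this entry exists, every later request for an ArangoDB that matches the criteria is served automatically, without a ticket.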
And I'm not saying that sometimes there aren't developers who have both roles. That really, really depends on the compliance and security posture of your respective organization. I was just in London yesterday with a large bank, and at that large bank the division would be very, very strict: it's not in the realm of possibilities that a developer is allowed to change the configuration of an S3 bucket in production. That's just not going to happen. But let's say you're in a less security-heavy environment. Then you could absolutely say: hey, as a developer, you can change everything, up until you get into pre-production, and you can actually send a pull request against the global configuration of an S3 bucket. What's different (and actually have a look at this) is that we separate these things out, so the developer now primarily owns the workload. In every workload repository you have the workload source code, the workload specification file, your Dockerfile, your pipeline YAML. And then you have the things that are cross-cutting across the organization. The first thing I want to look at with you is the workload profile, here in the middle. You can think of the workload profile, as I said before, like an empty Helm chart: it contains the things we need to create the application configurations. That could be CPU minimum allocation, labels and annotations, maybe the fact that a certain agent has to start up as a sidecar, all of these things. Now, do workload profiles have a one-to-one relationship with a workload? No, it's one-to-many: one workload profile can be used by many, many different workloads.
Because in reality, and I can tell you that firsthand, even if you're a very, very large organization, you maybe need three, four, five different workload profiles; that's pretty much it. Then we have the way resources are created, or wired, and that again differentiates static resources from dynamic ones. Static resources are resources whose lifecycle you don't want owned by the platform; they're just referenced, so you can register them and say: hey, by the way, we have this Postgres, it's already there, if somebody hits those criteria, please route to it. And then you have dynamic resources: if somebody needs an S3 and it needs to be spun up, please use this template. This would usually be infrastructure as code. The difference is you no longer have that absurd case from today where they had 470 different ways of creating Postgres across all teams. You can say, why not, but you can also say: why would that be a good idea? You don't need 470 different ways to instantiate Postgres; that's not necessary; you maybe need a couple. So that's what you actually have here: resource definitions. There's a Terraform provider for an orchestrator, for instance, that says: if the env ID is staging and the app ID is ABC, then please use the following resource drivers. And the last thing is certain automations, compliance elements, and all of these things. And that's it. So if you want to handle that ArangoDB case, what you actually do is go into the Terraform provider for the orchestrator (and by the way, we can do that together, it's a fun exercise). You go into the Terraform provider, you go into the documentation, boom, and we read the documentation from the bottom. Yeah, this one is actually a terrible example; let's look at this one instead.
Okay. So you say: if the criteria are app ID equals test-app, and the user wants a Postgres, I want you to use the following resource definition. Period, that's it. Or let's take another example: if env type is staging and app ID is test-app, or env type is development and app ID is test-app, then for a request for GKE, please use this resource definition. That's it; it's not super complicated. So in that case where we want to train the system how to create an ArangoDB, "train the system" actually means you're extending that central logic on when to use what. Cool. All right, so I've bombarded you with stuff, and I'm sure you're completely overwhelmed. If you want to be even more overwhelmed, you can read the white papers on this: there's one coming out on McKinsey.com, and I wrote another one that adds more of these things in a little more detail, 36 pages. So if you're at home with your partner and you really want a great thing to read at night: 36 pages of juicy platform content. Definitely cool. And the packaged version is ready; it's coming, hopefully soon. There are no tutorials yet, so it's extremely hard for you to use. I'm hoping some of these folks can write tutorials, or somebody else does, or I'll do it myself. And then you can actually try these things yourself, wire everything up and package this. Cool. And then, you know, a last comment: there's a ton more coming. The GCP one is in development right now, and I think next is Azure, and then either multi-cloud or OpenShift. So, yeah, pretty cool. All right. Cool. Thank you so much, Casper, and I hope everyone has a great end of the day, or beginning of the day. Bye.