Hello, everyone. Welcome to Empowering Users through Platform Engineering. We are super stoked to be here today, and thank you to everyone who made it out to this talk. Now, before we get started, let us tell you a little bit about ourselves. My name is Adriana Villela, and I am a senior developer advocate at ServiceNow Cloud Observability, formerly Lightstep. Say that three times fast, I dare you. I love solving hard problems, whether it's as part of my day job or on a rock wall. I am a serial boulderer, and I actually checked out a bouldering gym twice this week. I'm not obsessed. I am a CNCF ambassador and a second-year HashiCorp ambassador. I love tech blogging, I have a podcast called Geeking Out, and I have been doing the computer things for 30-plus years. I learned BASIC at age 10 thanks to my dad, which makes me old. Hey, y'all. My name is Ana Margarita Medina. I might not be as old, and I don't think she's that old either. But I also got started in tech various years ago. I got started with BASIC, programming video games, and later moved to front-end. Now I work alongside Adriana as a staff developer advocate at ServiceNow Cloud Observability. I'm also part of the CNCF ambassadors and a two-time AWS community builder. When I'm not on the road speaking or coding, you can find me making lots of Latin food, and empanadas are my favorite thing to cook. I also love watercoloring and doing fun makeup looks. I'm part of the Kubernetes Code of Conduct Committee and part of the CNCF's Keptn project. And with that, we're going to get the talk started. Yay. All right. So we all know that software has changed drastically in the last 20-plus years. Like, I graduated from university in 2001, which feels like yesterday but was forever ago. Back then, we were barely talking about the idea of surfing the web on our mobile phones. Shout-out to flip phones. Google was just becoming a thing; it had been around for three years. Java was the hot new language.
Diskettes, the three-and-a-half-inch floppies, were still around, slowly being replaced by USB flash drives. Monoliths were king. And cloud? What cloud? I think what you mean to say is that the cloud is just a big floppy disk. Floppy disks are back, right? Totally. Well, we know software has been changing, but the way we work with software has been changing constantly as well, and that's been dependent on the size of your organization or the size of your customers. We've seen this happen over the last 20 years through different movements. When DevOps started, the focus was all about learning and collaboration, but there wasn't as much focus on customer impact and reliability. So SRE came along, and it came along at a perfect time: more businesses were relying on online experiences to bring in revenue. We also had this thing called a pandemic that left a lot of companies relying entirely on online experiences, meaning they hired more SREs. So now we're at the next part of this movement. SRE took the things we learned in DevOps and put the focus on reliability; platform engineering is doing something similar and putting the focus on developer experience. But we also have to remember that platform engineering is not just about developer experience. It encapsulates security considerations, reliability, and a few other things. And you might be sitting here in the room saying, well, is it SRE? Is it platform engineering? Is it DevOps? That's not the debate we want to have. We want to say that we can all come together in one single organization and work to make things better. One of the ways I've actually seen this happen is by having someone at the C-level lead the engineering organization, with either an SRE team or a platform engineering team, and the other one folded into it.
So you can have a platform engineering organization with a team of SREs that focuses just on reliability, or a bigger SRE team with some engineers focusing on platform engineering. That really breaks down the silos. It allows folks to come together and really collaborate. And at the end of the day, a lot of what we're trying to do is codify things to make them more repeatable and reliable. So with that, we're going to get into the meaty bits of our talk. But first, we're going to play different characters because, honestly, it's after lunch and we don't want you to fall asleep. I am going to be playing the role of the developer. And I will be playing, oopsie, the role of the platform engineer. As a developer, I've noticed that the way we build, test, and deploy has gotten more and more complex. We have things like the public cloud, and within it you have things like serverless workloads and Kubernetes. How do I know what I want to use? Unfortunately, that means that as a developer, if I want access to the things I need when I want them, I'm at the mercy of other teams to bring things up for me. I'm at the mercy of the platform engineering team. And I hate waiting for people to do things for me. Listen here, buddy. You think you have it hard? Okay, as platform engineers, we have the keys to the so-called cloud kingdom. But listen, it's not all about you, right? It's not all about DevEx. We also have to maintain reliable systems. And it's just too much work. We're super stressed. We are at the point where we are drowning in Jira tickets. Have I told you how many Jira tickets I have to deal with on a regular basis? Jira, Jira, Jira. It's gotten to the point where I'm not enjoying my job. I want to work on the cool stuff. I want to work on cool automations. I want to work on making my systems more reliable. You make it sound like you're the only one with Jira tickets.
I have Jira tickets myself, and I'm the one sending you Jira tickets. And I'm sick and tired of waiting around for you to fulfill my requests. Sometimes it's hours and hours; other times it's days. And I have to explain to management that I'm being blocked by another team and constantly waiting. I'm used to doing things for myself. I want the infrastructure I need ready to go when I need it. You're literally tying my hands here. Clearly, you're overworked. You're extremely frustrated. But I'm extremely frustrated, too. And this is not healthy for our organization. How can we make this better? So I want to make it super clear that, first of all, when you request stuff from me, like a Kubernetes namespace or maybe a Redis instance, it's not like I'm sitting in front of my computer going clickety-click on some screen. There's automation that happens behind the scenes, okay? We rely on tools such as OpenTofu, Argo CD, Flux CD, Crossplane, and Keptn, just to name a few. So I mean, we're not sitting on our asses. Well, that's all well and good, but that's still not solving the problems I'm telling you we're having. We have so much complexity in our organization, and our developers have so many needs. And did you not see the memo management sent you? They want us to release faster than just once a week. Ouch. What I need is to get the tools I want when I need them. And many other things, too. I really want a stable environment. But I want my stable environment to be the exact same one that I have in Caracas, Venezuela, and the same one that I have in London, England. I want things to be approved by security. I want InfoSec to already know everything we're deploying. And I want to make sure that the stuff we're using has gone through SOC 2 auditing. I really don't want to be concerned with the underlying layer of what kind of infrastructure I'm running. I just need things that work.
And I think it would be really nice if you could build in some best practices. I heard there are these things called guardrails or something like that. Okay. Okay. I admit you've got my wheels spinning. I guess we are codifying the things we're delivering to you, but we're not codifying the delivery of those things. And at the end of the day, it does make sense to do that, because it saves me time and it saves you time. Time equals money. And I guess it makes the finance folks awfully happy, doesn't it? Well, that's awesome, because with you codifying the delivery, you're making it easier for me. If I'm dealing with an outage and I need to spin up something new at two in the morning, I don't have to file yet another Jira ticket and wait another 12 to 48 hours. I can just get it at 2 a.m. or 2 p.m. Fair enough. And in doing so, you're going to have infrastructure that is reliable and repeatable, not just when you're available online. And I guess you're not going to get any more messages from me or my other dev teams saying it's not working on my machine. That's great. Making it both repeatable and reliable. And like we said, it's really going to make the InfoSec people happy. We now get to build in those guardrails. And did we not say the finance folks are going to be happy? We won't have extra cloud resources being spun up and left dormant for months. And I think there's someone, or something bigger, that's going to be even happier: the Earth. We're going to be reducing our carbon footprint by spinning a lot of these resources down in a timely manner. So very true. And now that I think of it, the best part is we don't have to reinvent the wheel, because honestly, who enjoys building in-house tools? Fortunately, it turns out there are a number of self-service platform engineering tools out there that can assist us with these sorts of things. We have Crossplane and Backstage, which are both CNCF projects.
We have Kratix and Port as well. By any chance, is there a way you can put a demo together so I can see something today? Actually, while we were talking, I secretly enlisted the help of my super llamas to put together a little demo. So I do have something to show you. But first, I'm going to explain a few core technologies that are going to be part of our demo, starting with our good friend, OpenTelemetry. Brief overview of OpenTelemetry: it's a CNCF project, open source, vendor-neutral. It generates, ingests, transforms, and exports telemetry to an observability backend for analysis. And you can't talk about OpenTelemetry without also talking about the OpenTelemetry Collector, which is basically a vendor-neutral agent used to ingest data, whether from your infrastructure or your code, and then transform that data using processors. It does things like masking data, filtering data, adding and removing attributes, batching data, and sampling data. And then, of course, the data has to go somewhere, which happens via an exporter. The exporter sends it to an observability backend for analysis, something like Jaeger, which, in fact, we'll be using as part of our demo. The final thing I want to mention is that one of the ways you can deploy the OTel Collector is on Kubernetes, and to deploy on Kubernetes, you can use something called the OTel Operator. The OTel Operator, among other things, manages the deployment and configuration of the OTel Collector. So we're also going to be leveraging the OTel Operator as part of our demo. I am not sure I'm following. That's not the tooling I just told you we need. Oh, yeah, fair enough, fair enough. Now, the other thing I forgot to mention, sorry: we're also going to be using Kratix. Does that help? No. Okay, okay. We're going to be using Kratix to basically deliver OTel Operator capabilities to a Kubernetes cluster. Does that help? No. What is Kratix?
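To make that receive-process-export flow concrete, here is a minimal sketch of what a Collector configuration along those lines might look like. This is not the demo's actual file; the endpoints and the OTLP-to-Jaeger wiring are illustrative assumptions:

```yaml
# Illustrative OTel Collector config (not the demo's actual file):
# receive OTLP traces, batch them, and export them to a Jaeger backend.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}           # batch spans before export
exporters:
  otlp/jaeger:        # recent Jaeger versions ingest OTLP natively
    endpoint: jaeger-collector:4317
    tls:
      insecure: true  # demo-only; use proper TLS in real deployments
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
```

When deploying via the OTel Operator, a config like this would typically live inside an OpenTelemetryCollector custom resource rather than being applied as a standalone file.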
It sounds like a space company or some crunchy chocolate. Well, fair enough. So Kratix, which we'll talk about in a sec, is what we'll be using to deliver some additional capabilities to you. We've delivered the OTel Operator to your Kubernetes cluster, and we're also going to leverage that capability to request resources like the OTel Collector, which is going to be pre-configured to send traces to Jaeger. We're also delivering Jaeger, and we're going to deliver a sample app so you can test this entire setup. So we're essentially delivering this whole lovely infrastructure for emitting your traces, collecting them, and exporting them to Jaeger, all at your fingertips. So wait, you're telling me that as I develop my code, I can actually see what's going on? Yes. Nice. That's actually going to save so many hours of pulling my hair out trying to debug things late at night. Right. Super cool. So you did ask about this Kratix thing, and now I'm ready to tell you about it. Kratix, not a crunchy chocolate bar or a space company, contrary to popular belief, is an open source tool used to deliver platforms. It can be thought of as a platform orchestrator. And the cool thing about Kratix is that the folks involved with it are also involved in the CNCF through the TAG App Delivery group, which is super awesome. Kratix does sound cool. How does it work under the hood? With Kratix, you deliver capabilities using a Kubernetes YAML file called a Promise. The Promise encapsulates this functionality, and then you can request resources that leverage this capability as a bundled package. It is a Kubernetes-native API as a service, so that you, the developer, can consume all of that, nicely encapsulated, with relative ease. Okay. That does sound really cool.
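For a rough idea of the shape, a Kratix Promise is itself just Kubernetes YAML. The outline below is abbreviated and its field details may differ between Kratix versions, so treat it as an assumption-laden sketch rather than a working Promise:

```yaml
# Abbreviated outline of a Kratix Promise (fields simplified;
# check the Kratix documentation for the exact, current schema).
apiVersion: platform.kratix.io/v1alpha1
kind: Promise
metadata:
  name: otel-operator
spec:
  # 1. The API exposed to developers: defines what a resource
  #    request for this capability looks like (expressed as a CRD).
  api:
    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: otel-operators.example.kratix.io
  # 2. Dependencies: the manifests (e.g. the OTel Operator itself)
  #    that Kratix writes to the state store for worker clusters.
  # 3. Workflows: pipelines that run when a developer makes a
  #    resource request against this Promise's API.
```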
But this thing called Promises. Are you also promising to buy me chocolate or dinner? Well, you know what? If this demo works, I promise to buy you a Chicago-style deep dish pizza afterwards. Maybe. Fingers crossed. But before we get into the demo, I do want to show you a bit of the lay of the land so you know what we're working with. First of all, platform engineers typically work with a control cluster, and in our little Kratix landscape, we have Kratix installed in that control cluster, which we own. Then there are also a number of worker clusters, and these are the clusters where you do your work, right? Where you're going to be running your apps. Now, we own these clusters as well, but they're the ones where you do your work, and they're registered with Kratix. So Kratix is able to install these capabilities for you relatively easily. Now, to get a feel for what we'll actually be doing in this demo: as I said, we're going to be installing the OTel Operator and some other cool stuff like the Collector, the sample app, and Jaeger. To do that, the OTel Operator actually requires cert-manager as an installation prerequisite. So first we have to install a cert-manager Promise. Basically, I say to Kratix, which is running on our platform cluster, hey, install me a Promise. Kratix says, okay, let me pop the YAML files for installing this Promise into the state store, which can be something like an S3 bucket. Our worker cluster, which has something like Flux CD running on it, is watching the state store going: do I need this? Okay, I do. Is it installed? No, it's not. Okay, I'm going to install it. All right, cert-manager is installed. Now let's install the OTel Operator. Same sort of deal: okay, Kratix, install me this thing. We initiate that on our platform cluster, which is running Kratix, and we say, all right, install me the OTel Operator.
Kratix pops the related YAML files for the OTel Operator into the state store, and then our worker cluster says, huh, I see some new files. Do I need them? Yes. Are they installed? No. Let's go install them. In addition to that, we leverage these capabilities by installing additional resources: our OTel Collector, our sample app, and Jaeger. We encapsulate that as a resource and expose it to you, developer, as an API as a service. So then you, developer, can go and say, hey, I need this bundle that installs these three things for me. Super duper. You say: Kratix, install me this stuff, this API-as-a-service request. And again, Kratix pops those related files into the state store, and the worker is like, oh, I see some new stuff. Do I need it? Yes. Is it installed? No. Let's go install it. So that's the gist of it. Okay. Seeing it with this nice process flow, things are starting to make sense. Awesome. But you also mentioned you had some llamas doing your work. It sounds like your llamas are just AI something, and I'm not sure I believe that. Is there actually something you can show us? Listen, my llamas are super smart. Don't knock them. I'll show you a demo. Okay. Demo time. All right. Here's the deal. We have two clusters, a platform cluster and a worker cluster. I am using K9s, big fan of K9s, to manage the clusters. You can see Kratix is installed on the platform cluster, and the worker cluster is registered with Kratix; it has the Kratix worker system namespace on it. The next thing we're going to do, now that we've shown Kratix is installed on these clusters, is apply our Promises to the platform cluster, which is going to install stuff in the worker cluster with these commands. What do these commands do? kubectl... something something context... there's a platform one? Don't worry, I'll show you in the next demo. Okay. Here we go.
So the first thing we're going to do is install cert-manager. Kratix has a cert-manager Promise available through its public marketplace, so we're installing that. It's being applied on the platform cluster. But if you look closely at the worker cluster, you will see that even though the Promise is installed on the platform, cert-manager is being installed on the worker, which is where we want it, right? Because these capabilities are made available to our developer. Similarly, we now install the OTel Operator Promise, which my llama developed, and it's the same sort of thing: we apply the Promise to our platform cluster, but it gets installed on our worker cluster. And we can check, either by running kubectl or through our K9s user interface, that the Promises are actually residing in the platform cluster. Both Promises are there, just in case you were wondering. So does that make sense? It's starting to make sense. But where do I come in? Okay, so this is you. We are here. We've installed these two Promises. We have made the API as a service available to you. So now it is time, my developer, for you to request these resources. That's what we're going to do next, by applying this YAML file. This YAML file encapsulates everything. I keep hearing about you platform people using YAML everywhere. I don't understand what I'm supposed to do with this. Okay. All right. That's where the next demo comes in. Here we go. So this YAML file installs an OTel Collector pre-configured to send traces to Jaeger, and it installs Jaeger and a sample Go app, all in one go. Amazing. So here we go. We apply the resource request YAML, and then we see some magical things happen on the worker cluster. We see an application namespace being created, and this is where our little sample app gets installed.
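A developer-facing resource request against a Promise like this is also just a small piece of YAML. The group, kind, and fields below are invented for illustration; the demo's actual request will differ:

```yaml
# Hypothetical resource request (names invented for illustration):
# asks the platform for the bundled Collector + Jaeger + sample app.
apiVersion: example.kratix.io/v1alpha1
kind: ObservabilityStack
metadata:
  name: my-observability-stack
  namespace: default
spec:
  backend: jaeger   # where the pre-configured Collector sends traces
  sampleApp: true   # also deploy the sample Go app for testing
```

Applied with kubectl against the platform cluster, one file like this is what fans out into the Collector, Jaeger, and the sample app landing on the worker.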
Next, if we pop over to the other namespace that gets created, we see that Jaeger and the OTel Collector get installed, all from that one single YAML file, which I think is pretty freaking cool. Are you excited? I'm excited, but I still don't see the part where I come in and get to see things working. Okay, true. Yeah. How do you know the stuff is actually installed? Well, we're running a few verification commands to make sure I'm not full of crap, which we'll see in action on the next slide. First off, we tail the OTel Collector logs. A bunch of stuff showed up on the screen; I'm praying that means the Collector came up properly. The next thing we do is port-forward both Jaeger and our sample app so we can access them easily. Now we go over to our Jaeger UI to make sure it's up and running. Pray that it is. Hooray, Jaeger. Then we pop over and make our API call, where user Kathryn Janeway signed up, and you get bonus points for getting the Star Trek Voyager reference. Oops, sorry. We also saw the Collector logs do something, so hopefully that means our traces went through. So let's actually go check Jaeger to make sure the stuff is there. If we refresh, we see that we have a new service, which means hopefully we have a trace, which we do. Then we have some spans, and we dig in and we see Kathryn Janeway there, and, haha, it's there. Are you excited or what? I guess I could have trusted you and your llamas, but thank you for showing me anyway. There is still one more promise you haven't kept, though: you told me you were going to get me Chicago deep dish pizza. I know, I know, but there is one more thing I wanted to mention, which is the fact that different platform tools can complement each other.
I mean, we mentioned, for example, that we have Crossplane and Port and Backstage in the mix of tools we can use to enable platform engineering. So consider a scenario where we've got Crossplane and Kratix working nicely together. Okay, I actually think I know how Crossplane and Kratix can come together and play together. Okay, enlighten me. Let me show you a little bit of an example. Okay. Well, first we leverage OpenTofu with GitHub Actions to create a cluster, and that installs Kratix. This now allows us to use three different Promises: the cluster Promise, the Crossplane Promise, and the application Promise. Then we move on to step two, where we make a request to Crossplane, and that grabs all the configurations for the providers and the permissions we need in order to create more clusters. That brings us to step three, where we make requests to Kratix and leverage the cluster Promise. That spins up a developer cluster and a production cluster. Both of them are registered with Kratix and managed as Kratix worker clusters. It also installs Crossplane and Argo CD on each cluster, set up specifically for either the dev cluster or the production cluster, and we leverage some specific GCP resources for that. Then we see that Crossplane in the dev cluster is managing those resources, and in step four an application team makes a request for an instance of the application Promise, which installs NGINX and our application. All of this is done in declarative code, so we make things repeatable and more reliable. Well, I guess I'm out of a job then. Not so much.
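In that scenario, step three's ask for a new cluster could itself be a tiny resource request against the cluster Promise. Again, everything below is a made-up illustration of the shape, not a real Promise API:

```yaml
# Hypothetical request against the cluster Promise from the scenario:
# asks the platform for a registered dev worker cluster.
apiVersion: example.kratix.io/v1alpha1
kind: Cluster
metadata:
  name: dev-cluster
spec:
  environment: development   # drives the dev-specific Argo CD /
                             # Crossplane setup described above
  provider: gcp              # Crossplane provisions the GCP resources
```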
We can see that in scenarios like this, teams such as SREs, developers, and platform engineers can all come together and collaborate on self-serve tooling to make things more pleasurable and more reliable. And let's be honest, the less we use Jira, the better it is for everyone. Absolutely. But before we go, we wanted to make sure to leave you with some tips. First of all, we want to remind you: don't just rename your team without changing any of your action items or the processes your organization follows. Cough, cough. Yeah, because that's the equivalent of putting lipstick on a pig. We also want to remind you that every organization has different needs. So I may have shown you an example where SRE and platform engineering both report into the engineering organization. That might not be the case in your organization, and that's okay. Not everything is going to be the same everywhere. And remember, the Google SRE book is not the SRE Bible. You have some leeway to do things in the way that works best for your organization. Hot tip. We also want to keep reminding you that collaboration is key. When you start trying to solve a lot of these problems, please make sure to include developers in platform engineering discussions. And developers, please don't forget to talk to your platform engineers when you're trying to think about making things better. Oh my God. Does that mean we can be friends? Of course. We both love llamas. That's true. And don't forget to try to codify all the things, always. Absolutely. We love codifying all the things. And with that, before we go, we do want to give a shout-out to DALL-E for generating all these lovely llama images, because while Ana and I are not great at drawing, we're awesome prompt engineers. All hail our evil AI overlords. Also, huge thanks to Abby Bangser and the Kratix team. Abby is the one who's been using the clicker for the presentation today.
So big shout-out to Abby, also for helping us unravel all the mysteries of platform engineering. Honestly, it's been a trip and it's been lots of fun. We also wanted to leave you some resources, including a tutorial to spin something like this up yourself and links to learn more about OpenTelemetry and Kratix. So take a moment to snap a photo of all these pretty QR codes, or grab the slides afterwards. Also, be sure to check out my podcast Geeking Out, which I produce with my 15-year-old daughter, Hannah, who designed the logo and does the video editing. And Ana and I used to have a podcast called On Call Me Maybe. There are two awesome seasons with some amazing guests, so we definitely recommend you check it out. We also wanted to let you know that I'm going to be at the AWS booth talking all things OpenTelemetry, or if you want to talk platform engineering, I'm there for that too. Swing by the project pavilion and check out the OpenTelemetry booth. ServiceNow also has a booth, and they're giving away three pairs of Vans every single day. And check out the OpenTelemetry Observatory as well. I work as part of the OTel End User Working Group, and we're gathering stories of OTel users and contributors. We're creating Humans of OTel videos, so if you would like to participate, come find me there after this talk, and find us on all the socials. We are pretty much on all of them. Now, before we sign off, we would love it, for the purposes of our socials, if everybody could say "peace, love, and code" together. Are you down for it? We'll prompt you. Okay. Three, two, one. Peace, love, and code. Yay! Thank you so much, everyone. Questions? We'll also be around with some stickers. We have Geeking Out podcast stickers, On Call Me Maybe stickers, OpenTelemetry stickers, and some other older-generation stickers from previous conferences. We'll be hanging out if anyone has any questions. We'll be out here. Thanks, y'all.
Have a good one.