Hello, good afternoon, good evening, good morning, good everything to everyone. I am happily joining this Twitch stream today amidst a chaos of deliveries and other issues. I am very happy to introduce Josh Wood today. He is joining us, and Eric Jacobs will be joining us in a few minutes; I believe he had some deliveries show up at the same time as well. Christian is complaining of echo. I am not hearing any echo, Christian, sorry.

So, Josh Wood. He is known for a book. Believe it or not, they let Josh Wood write a book. It's actually one of my favorite books, to be honest with you, because it has helped me learn so much about Kubernetes operators. I don't have a physical copy, Josh, because, you know, the times we're in, but I did print out a cover for everyone to see. You can just search for the Kubernetes Operators ebook. It's on redhat.com, and I will drop a link in chat in a moment. Jason Dobies, who was on yesterday, and Joshua Wood wrote this wonderful book on Kubernetes operators. Josh, can you tell us a little bit about the experience of writing a technical book like this?

Oh wow, okay. Yeah, I can, and thanks for mentioning it, Chris. Before I dive in, I'll say one thing: if you shoot me an email sometime with your address info, I can make sure you get a book.

Oh, well, thank you very much. I appreciate that.
I'll even scribble in it for you. So, the process: we had lots and lots of help from, of course, a ton of people at Red Hat. Dobies and I are both developer advocates. I have a long history with operators, originating from my time at CoreOS, and this has been a focus for me for what is an accumulating number of years, as hard as that is to believe. Dobies was fairly new, and that gave us a really good platform: a good knowledge of the basics and a good knowledge of what somebody who's just coming to it needs to know to understand how to start using and how to start writing Kubernetes operators. So the book focuses on Red Hat's Operator Framework and SDK tools as the mechanism for writing them.

As for what writing the book was like, we had a ton of help from the folks at O'Reilly, who published it and started the project, and from a lot of the folks at Red Hat who are in operators project management or working directly on building operators and the Operator Lifecycle Manager framework that's such a big part of how we deliver the features we add on top of Kubernetes in OpenShift. So we had recourse to a lot of expertise, and we weren't trying to write it entirely out of our own empty heads.

Probably the most challenging thing in all of this is that we're dealing with something that's still evolving really rapidly, especially in the framework and the SDK tools. A lot of the thinking and the conceptual model of operators has been in place for a good amount of time, and we see maturity around it in partners and vendors who are delivering operators into OperatorHub. But the tools that we use to build operators, and the underlying Kubernetes abstractions that operators rely on and leverage, continue to change rapidly. So throughout the course of the book, we'd get a chapter drafted and into place, and we would constantly be having to do a walk back
through, reviewing what we were doing and trying to make sure that, at least at the moment we delivered the book to the presses, it was as up to date as it could possibly be. For example, I already know of a couple of instances in the SDK where we need to make updates, which will be noted in errata pages for the book, and those have happened even in the month and a half since the book printed in March.

Yeah, I was about to say: you wanted to write this book knowing full good and well that, while it's an established pattern and a good practice, the actual underneath bits are going to be changing very rapidly, and that includes the SDK, Kubernetes, the whole nine yards. And this whole thing is evolving: we're donating the framework, and potentially OLM. OperatorHub is out there for now, but the Operator SDK framework itself is being donated to the CNCF. So this is not just a Red Hat thing. You can use Kubernetes operators on any Kubernetes cluster, right? We sponsored the book, and we're giving it away for free on the website I just dropped in chat, but you can use these operators, or the operator pattern in general, on any cluster.

Yeah, it's a really good point, and it even extends to one of the things that I think is the neatest part of what we built out of the pattern, and that is this idea of an Operator Lifecycle Manager. What is OLM?
Well, operators automate and manage the software they run; OLM automates and manages operators for a cluster. And while OLM is, again, essential to how we're building features on top of the Kubernetes core of our OpenShift distribution, it's also available as a bolt-on that you can add to any bog-standard Kubernetes cluster and then begin consuming operators from operatorhub.io and other catalog sources where vendors have mature operators already in the market. So I think that's a really important note you make: while this is intrinsic to a lot of how we deliver OpenShift, it's been designed to be modular to any Kubernetes cluster, which is certainly what the book focuses on. In the exercises in the book we mostly use Minikube to illustrate this very fact. If you look in the later chapters, you'll see OLM bolted on top of Minikube, and we use that as our build and deployment environment in the hands-on examples in the book.

Cool, that's awesome. So, Eric just said he was on his way. To be honest, he had a delivery right as we were joining, so that's why we were a little delayed. His doorbell rang, he had to sign for it, and all that fun stuff. Now he is on his walking treadmill, it looks like, joining us from a far-off land. Hey, Eric, how's it going?

Well, he says he's good, but I can't hear him. Eric with the audio issues. You can do it, Eric. I believe I can fly; I just want to believe we can hear him. No, got nothing. Unplug it and plug it back in again, or is it unplugged completely today? Yesterday he joined and I was like, I can't hear you, what's going on? Normally I have no problem. And he had to plug in his headset. Still not working though, Eric. No, I got nothing, man.
I'm sorry.

Yeah, Eric, the hint I could maybe offer is that Zoom made unexpected choices for both my input and output sources. I've used it before, but I had unexpected choices for those two items when I opened it today.

I did too. I actually had to adjust all my settings and replug stuff in and everything, so something in some Zoom update somewhere has potentially changed our defaults. So yeah, Josh, about Prometheus today, or did you want that out?

Yeah. So, having gotten the shameless plug for my literary output out of the way, what we are going to talk about today, and why we brought Eric as the SME for this particular feature, is the new user workload monitoring features, previewed in OpenShift 4.3 and GA in OpenShift 4.4, that allow you to leverage OpenShift's Prometheus monitoring and alerting system with your own applications in a fairly easy, plug-and-play way. I think we can hear Eric now.

We can hear Eric now.

But can I share a single desktop? Oh, this will work. Oh, I see the screen. I am very happy.

Yay, Eric. All right, that's working. We see your cool login page.

All right. So I caught myself by surprise, because I forgot what we were going to do.

You didn't forget.

Well, I've been talking to Josh about this, and it was like, Josh doesn't really know how the stuff works. We had talked, and I thought he was going to drive, but then I realized that I probably need to drive, because I'm the one who's actually going to write the code. So I was trying to get set up, and I didn't totally pay attention to the ground that you covered on metrics and monitoring and what we know and what we don't know.

We were just getting started. Also, if you could turn your input up a little bit, that'd be great. Volume.
Yes, your volume. You are very quiet. We get to watch you do this one.

Which are hard to find. I'm sharing my screen. No, I got it.

I hear you. There you go: go to that little Zoom window thing that hangs off.

The audio is all the way up, so let me try something else here. Sound. Maybe I can...

This is the joy of live streaming, folks, right? Nothing goes perfectly. We will get to a point in this process where we have this down, we think.

That volume is definitely... there we go. No, that's all the way up too. Okay. Well, I guess that's as good as we get. I don't know how to get any louder.

You sound better now, a little bit. I think just moving your mic helped.

Okay, so yeah, I don't know. Let's keep trying. Screen. So I hear that we're talking about... from the hilltops.

Yeah, so I've got an OpenShift 4.4 cluster here, which I'm not sure is the GA released software, because I don't know what our demo system gave me, but such is life.

To be clear for the audience, what Eric is referencing there is what Chris and I were hinting at in that little intro while Eric was coming online: the features we're going to be looking at today were a technology preview in OpenShift 4.3 and have reached GA state in OpenShift 4.4. So even if we're looking at a slightly pre-baked version of 4.4, this is where the features go GA, and that is the user workload monitoring feature with OpenShift's built-in Prometheus that we're going to use to monitor our own application in today's session.

Cool. So let's do a couple of things here. I'll get a folder set up here for our...

So, more or less, I came to Eric and said: hey, I'm an OpenShift dev advocate.
I'm generally well educated. This is a new feature. Prometheus is not something I'm deeply familiar with, and Prometheus has its own DSL, PromQL, for writing queries, and an alerting system for triggering alerts. And I know that a lot of these features are maturing in OpenShift's delivery of them. So what we wanted to do is literally have him walk me through learning how to use these features.

Yeah, so I'm making a folder called Sinatra metrics. I'm a weird person and I like Ruby as a programming language.

What's weird about that? I mean, I like a lot of different languages. Does that make me weird?

No, but most people think that Ruby people are weird. Such is life. "This repository has too many active changes." Yeah, that's fine. Well, I have my entire home folder as a git repo, but I have most of it ignored. Anyway, we'll just make a new repo here to make VS Code less angry. We'll go into our OpenShift environment, and let's create a project for our metrics. We'll call this metrics playground.

Right. And for folks who might not know, or who are tuning in to learn about OpenShift and OpenShift facilities, I'll briefly say that projects are kind of OpenShift's version of namespaces on steroids: a way of safely isolating teams, and the work of individual teams or individual developers, from one another on single-cluster deployments.

Hey, can you zoom or increase the size of your browser when you get a second?

Yeah, absolutely.

Thank you.

You got it. Okay, so here's a simple Sinatra Ruby application. Sinatra is not really a framework; it's more like a server, if you will. And it's one of these really basic ones where you have to tell it everything you want it to do.

So, me writing Go code, is what you're saying.

So in this case, I have to tell it:
Okay, you're going to respond to slash get... oh, sorry, you're going to respond to a GET request on slash with just the text Hello World. Nice. All right, pretty basic. So I've got this code. Let me reopen this folder to make VS Code happier now that I have a repo in here. Sinatra metrics.

Increase the font size of VS Code even more.

Wow, I feel like a very old person right now.

You can maximize the window to have more space.

I can, but it still doesn't make me feel any less old.

Well, it makes me feel younger, because I can sit further back.

Now I'm just fighting with VS Code. It still has this stupid... oh, I get it, I didn't commit anything. This is the fun of live streaming, right? So, git status. All right: git add dot, git commit, initial commit. Okay, fine.

Well, I think you have your project open for your whole home folder rather than the project folder.

Yeah, but I literally reopened this folder. Okay, I'll open it again. Maybe it'll behave better now. Let's do this: new window, close this window, File, Open Recent, folder. Oh, much happier. Okay, cool, there we go. Yes, thank you, Solargraph, lovely.

Okay, so I'm going to try to take advantage of the source-to-image framework, which OpenShift knows how to use. Source-to-image is just a way to combine an existing base image with application code. So what I'm going to do is make an actual repo on GitHub, because that's public, and I think I called this Sinatra metrics. It's a public repository, that's fine. We will do this, and we will git add remote... oh, I need to give that a name, origin. git remote add.

Yep. Pair programming at its finest.

So I'll push that code to master. Okay, great. So we finally have some code in the public.
It's just a Sinatra app.

I think a quick summary there is probably useful. You're going to use a Ruby builder within OpenShift to generate a container to run this code in, and in order to do that, you're making your repo available publicly so you can aim your OpenShift cluster at it to fetch your code.

Exactly. Yep, we're off to pick up code. So we have our metrics playground project. I'll switch to the developer view.

Let me chime in there. There's an administrator view that we're all familiar with, I think. On my team, we've introduced a developer view that makes it a lot easier for a developer to get up and running as fast as they can without having to see all the stuff on the side about cluster metrics and everything else; that's for the ops or the admins of the cluster. Over here in the developer pane is where we start deploying code and getting crazy with metrics and everything else.

Right. I would say the developer perspective is intensely focused on application code and on a topological representation of the components of your app running on the cluster, as opposed to the admin view, which is a lot about how many nodes are in the cluster, what the load average is on each of them, and ops-team concerns, as you mentioned.

Right. Me personally, I spend a lot of my time in the administrator view, but when I'm on these livestreams I spend a lot of time in the developer view, and it's fun. I get both. I'm letting people know what we're doing with the tweet.

Oh, thanks.

Yes, I had tweets scheduled.

I have friends on Facebook who do technology stuff, man.

No, I have friends on Facebook too, but opening the browser and then opening Facebook, I know what that's going to do to the quality of the stream.

Okay. So now we want our clone URL, but we don't have SSH.
So I need the regular HTTP URL. Paste that in here. Show advanced options: I don't need to do anything fancy, because it's all just in master and in the base, whatever. "Unable to detect the builder image." I'm pretty confident this is going to be a problem, and I know why it's going to be a problem, but we're going to break it anyway, just for funsies. So it's Ruby. Okay, 2.5 sounds good enough for me. Sure, why not? What do we want to call this? I don't want to call it... it's a Sinatra metrics git app. We'll call it that. There are two different names: there's the name of the application, which is really the name of the grouping, and then there's the name of the resources. I'm going to call them both the same thing; I could call them something different, whatever.

Do I want a Deployment or a DeploymentConfig? You'll hear us talking about this one a lot. Deployment is the standard Kubernetes capital-D Deployment, so if you know about that and how it uses ReplicaSets, that's great. DeploymentConfig gives us a couple of extra bells and whistles, and one of them is automatically redeploying when things change. Since we're definitely going to be changing this code a whole lot, we're going to want to take advantage of that. So we're going to use a DeploymentConfig.
We could probably do a Twitch episode that's just all about the differences between them. But to give a quick overview, and to extend the one that Eric just touched on a little: analogous to what we were talking about with projects as an accentuated version of what namespaces do in standard, plain-vanilla Kubernetes, DeploymentConfigs are analogous to Deployments in the same way. They carry additional OpenShift features that we built on top of the fundamental abstractions to enable some developer convenience features, like the triggered rebuilds that Eric is specifically using in this scenario.

Sweet. And then, lastly, we're going to create a Route. Routes are similar to Ingress, but again, we've got some extra cool OpenShift-y enterprise bells and whistles that go along with Routes. This is going to expose our application at a public URL. So we'll go ahead and create this.

Whoa, that's a big Ruby icon. Look at that. That is a big ruby.

And so what's happening right now is that there's a build running, or starting, or getting ready to get fired off. What we see here is that OpenShift has spun up the builder image, the Ruby builder base image. All your base are belong to Red Hat.
And then it pulls in the code, and then it runs a build process, but you'll see it actually didn't do anything, and this is going to explode in a ball of fire in a moment because I left something out, but I sort of did it on purpose.

Question in chat, and it's from Christian: does it assume Rails, is that why?

So you're kind of on the right path, but not totally on the right path. Here we go. It starts to pull in the image, then it starts doing all kinds of building stuff, and then it runs the assemble script. The assemble script builds my Ruby application. If we think about Java and building, usually that's Maven, and if we think about Python and building, that's kind of like pip install and doing some other stuff. Ruby uses a Gemfile, but I forgot to actually have a Gemfile. Oops.

So that's why it couldn't detect what my source code was: it's looking for key files in the repo that help identify the language that should be used. When I put in the source repo URL, it introspected my repo, looked at all the files, and said: well, I don't see a Maven POM file, I don't see a Gemfile, I don't see a pip requirements.txt; I have no idea what the heck this language is. It doesn't just assume that because there's a .rb file, it's Ruby.

Anyway, this is going to finish building and combining all the things, it's going to push the image into the internal registry, and then it's going to try to deploy it, and it's going to fail because there's nothing to run. It doesn't even know what to do. It runs the run script, but there's nothing to run. So let's go back to our code.

What is wrong? Actually, go ahead. What is wrong with our code?
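The key-file detection described here can be sketched in a few lines of Ruby. This is only an illustration of the idea, not OpenShift's actual detection logic: the `MARKER_FILES` table and the `detect_builder` helper are hypothetical names, and the real console may check other files and rules.

```ruby
# Hypothetical marker-file table; real builder detection may use
# different files and more elaborate rules.
MARKER_FILES = {
  'Gemfile'          => 'ruby',
  'pom.xml'          => 'java',
  'requirements.txt' => 'python'
}.freeze

# Returns the guessed language for a list of top-level file names,
# or nil when no marker file is present.
def detect_builder(file_names)
  MARKER_FILES.each do |marker, language|
    return language if file_names.include?(marker)
  end
  nil
end

# A lone .rb file is not enough to identify the language.
p detect_builder(['app.rb'])
# Once a Gemfile is committed, the repo is recognizable as Ruby.
p detect_builder(['app.rb', 'Gemfile'])
```

Under this sketch, the repo as first pushed (only a `.rb` file) gives no match, which lines up with the "Unable to detect the builder image" message seen in the console.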
Well, sort of, but what I'm actually going to do here is try something silly, slash not silly. Here's the build configuration, and what I think I can do is... where is the webhook URL? Hey, look at that. So, GitHub. Sorry, let me back up. Builds in OpenShift can be integrated with webhooks, so I can tell GitHub to call my cluster any time the repo changes. As soon as I upload code, it hits OpenShift and says, hey, something happened, and OpenShift goes, oh, that means I'm supposed to do a new build. So we're already doing CI with almost no effort.

Let me come in here to GitHub. I'll make this bigger. Settings on the repo, Webhooks, Add webhook, payload URL. URL form-encoded is fine. The secret is embedded in the URL itself, so I don't need to put anything in here. I need to disable SSL verification because my OpenShift cluster does not use any known CA. The SSL cert that is exposed by my cluster is not known to GitHub; well, the CA is not known to GitHub. If you know anything about how all that junk works... well, I know you know how, but I don't want to bother.

Do we want to go into PKI infrastructure and all this stuff? If you want; it's your call.

No, I don't want to. We'll let Christian do that when he does DNS. So anyway, we've got to disable SSL verification. Just the push event, I think, is fine. We'll leave it active. Add webhook. Okay, great.

I don't know why that last delivery was not successful. So many windows. Well, that certainly isn't going to work, right? GitHub, copy link address. Why is it not copying the link? What is at the end of the line? I don't get that. Okay. All right, well, I mean, that works... 403? I don't know why. What was the error here? Ping. Oh, that's fine. It actually worked. It's just the whatever answer it got back.
It didn't like it. Okay, I think... because that's the headers. Oh, the response, sorry. Yeah, that's okay: "failure, unsupported content." Totally cool. We're good.

Oh, do we just have the type wrong?

I don't think it matters. I think it just didn't like what we told it, but that's fine. It wasn't that it couldn't find it; it just wasn't supposed to work. Anyway, you know what, let's ask the docs. To the docs! When in doubt... that's why we write them, right? I mean, I don't write the docs. But anyway: triggering and modifying builds. Here we go. Webhook triggers, GitHub, GitLab, oc new-app, webhook, key, secret, blah, blah, blah. Using GitHub webhooks. Let's see, describe... signature...

Yeah, it's fine. In GitHub, it says, change it to application/json.

Oh, there you go. I wonder if we had the type wrong, because that's what the error seemed to indicate.

I mean, he wrote a book.

You wrote a book too, Eric, but I never... no, you didn't write a book. Sorry.

Ain't nobody got time for that.

Same.

All right, application/json. Okay, we're good. So in our Gemfile, we are going to need to define the Sinatra gem. I think I can just do... well, you know what, Gemfile syntax, because I don't remember.

Christian says you need to click update.

I need to click update. I did click update. I'll click it again. And update. Yes.

I think he disabled the SSL hook. Yeah, that's disabled.

When I click update, it just takes me back. Okay: "The hook was successfully updated." Look at that. Yeah, okay.
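To make the content-type mix-up concrete, here is a rough Ruby sketch of how the same push event looks under the two content types GitHub offers. It only builds the requests and never sends them. The URL is a made-up placeholder (a real build webhook URL comes from the build configuration page and carries the secret in its path), and `build_webhook_request` is a hypothetical helper, not part of any OpenShift or GitHub library.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Placeholder URL for illustration only.
WEBHOOK_URL = URI('https://openshift.example.invalid/webhooks/SECRET/github')

# Builds (but does not send) a POST request the way a webhook sender
# would, for either of the two content types GitHub offers.
def build_webhook_request(uri, event, content_type)
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = content_type
  request.body =
    case content_type
    when 'application/json'
      # The JSON document is the whole request body.
      JSON.generate(event)
    when 'application/x-www-form-urlencoded'
      # The JSON document is wrapped in a form field named "payload".
      URI.encode_www_form(payload: JSON.generate(event))
    else
      raise ArgumentError, "unsupported content type: #{content_type}"
    end
  request
end

event = { 'ref' => 'refs/heads/master' }
json_request = build_webhook_request(WEBHOOK_URL, event, 'application/json')
form_request = build_webhook_request(WEBHOOK_URL, event, 'application/x-www-form-urlencoded')

puts json_request.body
puts form_request.body
```

A receiver that expects a raw JSON body has no reason to understand the form-wrapped variant, which would be consistent with the "unsupported content" response going away once the hook was switched to application/json.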
All right, Gemfile syntax. Source, rubygems.org. So we will make a file called Gemfile, and we will add source rubygems.org. I don't care about the Ruby version. And then we want gem Sinatra. We will add our Gemfile to our repo.

And as Eric moves along here, I want to sort of give a preview for the Prometheus part of the thing: as we build this app out, this Gemfile will actually become key, because it's how we'll add an exporter for the Prometheus protocol to our demo app.

I'm wondering, where are the builds? This is weird. Build configs... it says builds, but it doesn't show me the build. Anyway, whatever.

Yeah, there's your... hey, look, it's running.

Ta-da. I didn't do anything. It was all GitHub. You can blame GitHub.

Because we triggered a new build by sending that hook from GitHub to this.

Exactly. When I pushed the code, it caused the webhook to fire, which told OpenShift to do the thing, which happens to be a build. So that's what's happening. Okay, here we go. It's copying the source code and running bundle install, which is good, and which is hopefully going to pull in Sinatra and probably other random dependencies. Hey, Sinatra! Look at that. So this might actually work.

And it will fly us to the moon.

Is it flying me to the moon? Yeah, flying me to the moon. There we go. All right, the new deployment. Hey, that looks... nope. Oh wait, it's doing something. It says it's running. "You might consider adding Puma." I might not.

So Sinatra wanted to add a wildcat here. Hello, all you...

It says CrashLoopBackOff, but I thought it was running. Nope, it's still crashing. Why?

Maybe you should put a puma in it.

But Sinatra should be able to run all by itself; it shouldn't need...

No, I'm joking. You shouldn't have to do that, you're right.

Configuration, app groups. Oh, it's using freaking...
Oh, is it doing like... no. No, it's trying to use Rack. So, Christian's question about... yeah, so Rack is like Ruby middleware that... it's hard to explain what it is. Let's just ask the internet. It'll probably explain it better than I will. "Rack explained for Ruby developers." Here we go. So it sits between Rails and the web server. But the way that the builder image is configured...

Huge mistake. Oh, you've got to get a cheat sheet.

Yeah. So the way that the Ruby builder image is written, I think it expects Rack, and if you look at the error message that we're getting, it's complaining that there's no config.ru found. That's a rackup file. Okay, that makes sense. So now I just need to quickly... Sinatra, Ruby, Rack.

Go ahead. My question there is going to be: did we fail to generate this rackup file, or did we generate it and it's not in the expected path? It reminds me of, like, target dot war.

No, this is going to be like an S2I run problem. It's whatever that source-to-image image is: github.com, sclorg, Ruby. I'm pretty sure it's this one. So now we're getting into some of the bowels of source-to-image. If we look in the bin, if we look at the run script: if Puma is installed, which it isn't... otherwise, "you might consider adding Puma." Okay, fine. If bundle exec rackup is null, then exec this, otherwise exec this other thing. So I think what it's doing is this, but that's actually failing, and that's why it's crashing. Rack is not installed in the image... or sorry, we're not getting that error, right? It just says you might consider adding Puma. It's this: configuration config.ru not found. So I think if we have a blank file, that will be okay. So we'll try...

Would that be a blank file for the Puma dot...
No, it'll be a config.ru. So rackup dash e is, yeah, it's just defaulting to look for config.ru. I mean, actually, Puma might make our lives easier. Maybe we could Puma. I don't know; should we Puma, or should we try to fix whatever's going on?

Whatever you think is best, because you're the real expert here, but this is probably going to blow up without a Puma config. Well, maybe it wouldn't. I don't know.

Well, let's try adding a config.ru that's just empty. "By default we'll only set up sessions..." config.ru, run, rackup, require my app, run MyApp. I think we need to do this. So let's try this. Class MyApp, Sinatra::Base. Is that what we have? Let's see. We do not. Run if what? Hmm, this is annoying. Modular versus classic style. Okay, so let's do this.

Oh my gosh, people need to stop texting me while I'm... someone literally just texted me a picture of them watching Twitch.

That's cool. That's very cool. Thank you, Mr. Nix.

Oh, Will Nix. Hey, thanks, buddy.

All right, so we'll do that, and then it says something about serving a modular application. We don't care, but we do want the config file: run my app. Okay. There's probably, you know what, an OpenShift Sinatra example somewhere. Would that be a thing? That'd be cheating, though. I don't want to do that. Let's see. So it's referring to my app. Okay, cool. And then my app, which we have in here. Yes, my app. All right, cool. "Adds config.ru." Stuff. Hey, look, we're going to get another build. Builds, builds, builds. Number three. Wow, that was fast.

That wasn't Will Nix. Never mind.

It wasn't Will Nix? I thought you said it was Will Nix.
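For anyone following along, what rackup wants from a config.ru is an object that answers `call(env)` and returns a status, headers, and body. The sketch below shows that contract in plain Ruby, with no Sinatra or Rack gem involved; `MyApp` here is a hand-written stand-in, not the class from the stream, and a modular Sinatra app (a `Sinatra::Base` subclass) would satisfy the same contract for you.

```ruby
# A stand-in for the modular Sinatra class: any object with #call(env)
# that returns [status, headers, body] satisfies the Rack interface.
class MyApp
  def call(env)
    if env['REQUEST_METHOD'] == 'GET' && env['PATH_INFO'] == '/'
      [200, { 'content-type' => 'text/plain' }, ['Hello World']]
    else
      [404, { 'content-type' => 'text/plain' }, ['Not Found']]
    end
  end
end

# In a real config.ru the last line would be `run MyApp.new`
# (or `run MyApp` for a Sinatra::Base subclass). Here we call the app
# directly with a minimal, hand-built request environment.
status, _headers, body = MyApp.new.call({ 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/' })
puts status
puts body.join
```

This is why an empty config.ru is not enough on its own: the file has to end up handing the server something to `run`.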
Well, Will Nix was in the... it was like a chain.

Rack manages middleware; not the Java kind of middleware, the other kind.

So again, to contextualize some of this a little: while we haven't yet had a running server as a result, what we are illustrating is S2I builds in OpenShift assembling all our components, building them into a container, depositing that container into a container registry accessible to OpenShift, and then deploying that new build on the cluster. So we're seeing a lot of developer convenience steps assembled by this process that we're running through.

We got more gems now, maybe. I don't know the gems.

Gemtastic.

We may have gotten those last time; I just wasn't paying attention. Perhaps this should have been planned more.

No, failure is part of the fun.

Exactly. Learning in public. If you just want to see a demo, go to YouTube. We learn in public here.

Well, we do things that should probably be learning, but anyway. Okay, fingers crossed, maybe it worked this time. Oh, still blue. Details. Nope, that's not what I wanted. Logs. Hey, I fixed it.

Good job.

"You might consider..." No, I don't want to. Okay, so we have a route for this thing, which we can see with this button here, Open URL. When I click this, I get my Hello World.

Yeah, and I think it is really important to underline that OpenShift assembled all of these pieces along the line for us, up to and including giving us a URL where external clients can access this contrived service that we're using.

Yes. Okay, awesome.
All right. Now, if we look at the details on this pod, we see some pretty basic stuff: memory usage 32 megabytes, CPU almost nothing, network basically nothing. But this isn't necessarily all that interesting or valuable or useful, because none of it is particularly application specific.

These are in fact statistics about the pod in which our application is running. So if we want to drill down and know more about the behavior of the internals of our code, we need a way to instrument that.

Yes. So if we switch back to the administrator view real quickly... actually, wait, it was here in the developer view. Monitoring. Whoa, that's a new thing that I've not seen, and it is beautiful.

It's certainly colorful.

Oh nice, rate of received packets. Wow, that's exciting. That's a lot of metrics.

Yeah, and again, these are metrics that we have out of the box on OpenShift for just about any particular deployment or deployment config we've made.

Oh, I may not have to... this looks like it might be turned on already. Well, we'll see what happens; maybe it'll work, or maybe it won't. If we go to the documentation again and we find the monitoring... there we go: monitoring, monitoring your own services. By default... well, actually, I'm going to go back to the OpenShift UI. If we go back to the administrator view, we go into Monitoring, we go into Metrics, we have a lot of interesting... uh oh, that's not what I was hoping for. Ah, here we go. Okay. So the cluster is already configured to fetch all kinds of metrics about stuff. Essentially, when we build OpenShift, and then when we install it, we preconfigure the cluster to be doing lots of metrics exposition and ingestion, and the way that we do that is with Prometheus rules and service monitors. I think I got that right.
So if I go to... actually, I'll do this in the UI because it'll be fun and it'll be an experience. If we look at CRDs... so I'll give a little background here. This thing called a service monitor is a CRD, a custom resource defined in this cluster's Kubernetes API, that describes an additional data monitoring point, or an application with a set of monitoring points, that we want to be able to describe to the cluster and have it start fetching those metrics for us, right? Yeah, and if we look at the instances of service monitors, you'll see that we have all these existing service monitors. These are the built-in ones inside OpenShift that tell the cluster monitoring to look at, for example, the API server, or to look at the marketplace operator and collect metrics on the marketplace operator, right? Right, and architecturally even these are a really cool thing to look at, because they give you an idea of how we're extending Kubernetes features in OpenShift in Kubernetes terms. So a CRD, because it has a known format with a standard way of expressing some set of data, means that other developers, other communities within Kubernetes, have a way to describe this and talk about these ideas of service monitors intelligibly with one another, and that we can bolt them on to any Kubernetes cluster because they're built out in terms of extending the K8s API itself. Yeah. So we've been shipping this built-in cluster monitoring for a while now. But we have... oh, missing. Anyway, not important. But we haven't been... sorry, I just realized I wear like a sleep ring, and I'm not wearing it right now, and I have no idea what I possibly could have done with it. Oh, I washed my hands and I took it off. Okay. It's downstairs by the kitchen sink. Never mind.
Sorry about that, brain. Totally, like, my brain just went over there and it was gone, like it was done. There was no coming back until I went through that whole train of thought. I would almost have to, like, walk away and start looking for this thing. Anyway, so the cluster monitoring has been there for a while, and we didn't really give you a way to monitor your own applications. You basically had to self-install your own Prometheus instance, which is a pain in the butt, right? So with 4.3, OpenShift 4.3, and still now in OpenShift 4.4, we give you a tech preview ability to tell OpenShift to use the existing monitoring stack to look at your services that you define. And so it is looking throughout the cluster for service monitors and Prometheus rules to pick up. And so what we're going to do is look at the docs and figure out how to turn this on so that it will look for our own exported metrics, and then we'll try and turn it on. Right on. So the prerequisites: make sure you have the ConfigMap object with a thing. So let's see. So someone in chat is saying you're looking for a service monitor. We've already gone over that. Yeah, yeah, that's why I'm back. Sorry. It's okay. Cluster monitoring config ConfigMap. Okay: oc get configmap cluster-monitoring-config in the openshift-monitoring namespace. Does not exist. Did I spell it wrong? Cluster monitoring... I don't think I spelled it wrong. Hmm, doesn't exist. You can enable it. Maybe we're gonna create it. Because it looks like it's pretty... still on tech preview. Yeah. Yeah, well, it's still in tech preview. But so here's the funny thing, right. It's like, prerequisites: make sure you have it, right, but then... By the way, it's worded badly.
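The ConfigMap that the docs page asks for can be sketched roughly as follows. Since the app in this stream is Ruby, the sketch builds the manifest as a Ruby hash and prints it as YAML with the standard library. The `techPreviewUserWorkload` key is an assumption based on the tech-preview documentation of that era and on what is read aloud in the stream; check the docs for your OpenShift version before using it.

```ruby
require 'yaml'

# Assumption: in the tech preview, user workload monitoring is switched on
# with a techPreviewUserWorkload.enabled flag inside config.yaml.
inner_config = { 'techPreviewUserWorkload' => { 'enabled' => true } }

config_map = {
  'apiVersion' => 'v1',
  'kind'       => 'ConfigMap',
  'metadata'   => {
    'name'      => 'cluster-monitoring-config',
    'namespace' => 'openshift-monitoring'
  },
  # The monitoring settings are stored as a YAML string under "config.yaml".
  'data' => { 'config.yaml' => YAML.dump(inner_config).sub(/\A---\n/, '') }
}

puts YAML.dump(config_map)
```

The printed YAML could be saved to a file and handed to `oc create -f`, which only a cluster administrator would normally be allowed to do in that namespace.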
That's fine. Yeah, I would agree with that about the doc, but it seems to indicate that if we create it and then edit it to have these contents, we should, you know, ought to be, hopefully, biscuit. All right, so we're gonna use a combination of... I guess, does this need to be bright? Is the dark theme making it difficult? Fine, I think it's fine. So we're gonna create a new file. We're gonna paste the YAML content into this. We're gonna save it in temp, CM config YAML. Okay, name is cluster-monitoring-config, it goes in openshift-monitoring, tech preview workload enabled true. Okay, so we will oc create this file. Success. One thing to note: only a cluster administrator can do this thing. Right, so in theory, if you're not the person who owns the OpenShift cluster, you need to talk to that person to ask them to turn this on. So we are enabling a feature in the monitoring solution. Save the file; monitoring your own services is now enabled automatically. You can then check if the Prometheus user workload pods were created. So if we run this command... but maybe it hasn't succeeded yet. So let us... What? No, please go ahead. Sorry, I misread something. Hmm. Okay, what do we want to do here: get pod -A, Prometheus. Um, Prometheus operator, cluster... oh, okay, it started. So here's user workload monitoring. Oops, wrong command. There we go. Okay, at this point, via the ConfigMap, we told the cluster monitoring solution... well, technically we told the monitoring operator stack: hey, we want to monitor user workload. Which resulted in this new operator and other stuff getting deployed by the existing operator. So it's very Inception-esque, but basically this is the Prometheus stack that's going to monitor our workload specifically, right, as opposed to the stack that's already running in the cluster.
That's monitoring the default statistics and metrics. What we're working on: deploying a sample service. To test monitoring your services, you can deploy a sample service. We don't want to do that, because we are writing the sample service. You can check that it's running. Setting up a role for... sorry, creating a role for setting up the metrics collection. I don't know that we actually have to do this. This enables a user to set up metrics collection, but I think we can get around that. Good from the role binding, whatever. We'll go backwards if we have to do it. Okay, so: Setting up metrics collection. It says to use the metrics exposed by your service, you need to configure OpenShift monitoring to scrape metrics from the metrics endpoint. Wait, we don't have a metrics endpoint. So how do we actually do the metrics endpoint? How do we make it, and what is that even supposed to look like, right? If we think about Prometheus, prometheus.io, and we look at their docs, I think it talks about... yeah, data model. This is really... this is not what I wanted. First steps, overview. Where does it tell me? Bring the server, bring this learning, where's the thing? The thing it's discovering, uh, service discovery. I'm looking for a better diagram here, but anyway. At that metrics endpoint that it's talking about, right, it says metrics endpoint, Prometheus expects to see a JSON payload. So I'll go back here to look at the data model. So, metric names and labels: Prometheus fundamentally stores data as time series. Every time series is identified by name and some key-value pair labels, whatever. And so when you visit that metrics endpoint, it's expecting to see this, like, weird JSON... not weird. It's expecting to see a JSON payload in a specific format that tells it about what's going on in the application. We could write this ourselves. The payload is not actually JSON, says metal mates. Uh, that's true.
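As chat points out, the payload is Prometheus's plain-text exposition format rather than JSON. A minimal hand-rolled sketch of what "writing this ourselves" could look like is below. The helper names `format_labels` and `render_metric` are made up for illustration, the metric shown is a placeholder, and label-value escaping is deliberately left out, so treat it as an illustration of the format rather than a replacement for a client library.

```ruby
# Hand-rolled sketch of the Prometheus text exposition format.
# Each metric gets "# HELP" and "# TYPE" metadata lines, followed by one
# sample line per label set.

# Hypothetical helper: renders {key="value",...} or nothing when unlabelled.
def format_labels(labels)
  return '' if labels.empty?

  pairs = labels.map { |key, value| %(#{key}="#{value}") }
  "{#{pairs.join(',')}}"
end

# Hypothetical helper: samples is a hash of label-hash => numeric value.
def render_metric(name, type, help, samples)
  lines = ["# HELP #{name} #{help}", "# TYPE #{name} #{type}"]
  samples.each do |labels, value|
    # Sample values are floats, so a count of 1 is written as 1.0.
    lines << "#{name}#{format_labels(labels)} #{value.to_f}"
  end
  lines.join("\n") + "\n"
end

payload = render_metric(
  'http_requests', 'counter', 'A counter of HTTP requests made',
  { {} => 1 }
)
puts payload
```

A handler for the metrics URL would return a string like `payload` as its response body; the client libraries discussed next generate it for you.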
Yeah, it's something that includes both JSON and other things. I think I might be doing this wrong. Close enough, right? We'll actually see exactly what it is as soon as I figure it out. So anyway, we need our app to send the metrics. We could write something to make it do that, but, um, let's see: Prometheus exporter. Let's see if there's already a metrics exporter for Ruby. Oh, look at that. There's a Prometheus metrics exporter for Ruby: "A suite of instrumentation metric primitives for Ruby that can be exposed through an HTTP interface." So, yeah, just to give everybody some background on Prometheus: it was developed around the same time as Kubernetes was, so it has been around a while, and it has, you know, libraries for pretty much every language I've come across. It's very, very handy and developer friendly as well as operations friendly, in my opinion. Okay. So gem prometheus-client. I'm actually going to do this locally first, just for giggles, because why not screw up my own laptop? cd openshift metrics, okay, bundle install. And what this should do is install the Prometheus Ruby gem. It's the Prometheus exposition format. There you go. Yeah, and so in really short summary form: if we want Prometheus to discover and begin to ingest metrics from an arbitrary application, we need to provide Prometheus with this /metrics endpoint that can respond with this payload. These libraries, which exist, as Chris mentioned, for a great number of languages and runtimes, give us that metrics endpoint and allow us to focus just on defining what metrics we want that endpoint to export, so that Prometheus can discover and represent them. If I move the video box from Zoom over here, do you see it?
I do not. Not on Zoom. Now I can actually look at you and it will look like I'm looking at you. And then what I can do is I can look down so it looks like we're doing Brady Bunch stuff, like, hi Chris. Oh. Okay, so the Prometheus client got installed, we're good, and now if I ruby my app locally it should work. Nope, it exits, because now I have to do the rack thing. Actually, you said you were a Ruby developer. This is how Ruby development goes. This is Ruby development right now. I know, I used to work in a Ruby shop. I get it. I don't know, exec... rackup, sure. Yay, okay, 9292. So if I go locally, localhost 9292, I should see hello world, and now if I do /metrics, maybe it does something. Sinatra doesn't do this. All right, let's figure this out. Oh, do you have to somehow register that one URL with, like... do you have to tell Sinatra... is there kind of...? Yeah, so, in my app... so I added it in my Gemfile, so it's here, but my app doesn't know anything about it, right. Right now my app's only loading Sinatra. So we need to require the Prometheus client in the application. I think I just pasted over that. That's exactly what I did. Okay. And then: returns the default registry, create a new counter metric, register HTTP requests. Okay, and then, so, increment actually would do it whenever it's called. So let's do this. We'll create the client. Actually, we'll do all these things just for giggles. See what happens. Never done this before, like, legitimately never done this before. All right, so we have our registry. We have our metrics.
We have our registering the metrics. We have this helper function for accessing HTTP requests. And then increment the counter, but we only want to increment the counter where the counter should be incremented. So we'll put it here, in the slash action, so when anybody visits the application at slash we would increment the metric, maybe. So we'll see if this actually does something. I need to restart the application. And it explodes. Well, "http_requests has already been registered". Oh, equivalent helper function. It would help if I actually, um, read. Copying and Pasting from Stack Overflow for Dummies. Oh gosh. Here we go. By the way, this is legitimate: I shouldn't malign Ruby developers, because I really am not one. I'm a crappy hacker at best. All right. I am a crappy hacker at everything. Yes, at everything. Okay, here we go. So if I refresh my hello world... oh, and it blows up also. Great: undefined local variable or method http_requests. Well, that's because... well, because it gets defined here. The root of... oh, it's not defined. Oh, I think I need to do this. Hang on. Let's try this. Try again. The Stack Overflow joke: my girlfriend is in the middle of some programming classes, which I was trying to help her with on Sunday evening, and we're working through this Python app to draw some charts. And so, you know, I go and I google how to ingest JSON in Python, how to parse JSON. She's like, is this, like, is this programming? I'm like, it is when I do it. Well, but that's like, I mean, I know we're sort of getting a little off topic, but the thing is, there's no excuse for not... I mean, sure, there's excuses, but legitimately, what's the excuse for not learning programming? You can legitimately sit in front of Google and 100 other free services and build an app doing nothing but search and free training online. I mean, this is awesome, right? Anyway, so it is working now. I just refreshed, right?
Yes, it is awesome. And now if we go to /metrics, hopefully something... fingers crossed. So, like, I think I sort of understand the hint that it shows at the bottom of that error page, which is, somehow or another, either with a slash-star that has a handler for every URL under there, or by explicitly defining /metrics, there's got to be something that tells the HTTP server what to do. Yep, or we've got to keep reading the directions. So: "There are two Rack middlewares available." Want to expose the metrics endpoint? Hey, that's what we want. Want to trace HTTP requests? Oh, that would be cool if we were doing, like, Jaeger, service mesh, or something. "It's highly recommended to enable gzip compression." We are totally not doing that. All right, so now we need Rack and the middleware collector and the middleware exporter. We're not going to use the deflater, but we are going to use these two things. Where does this go? Does this go in my config.ru rackup file? Oh, duh, it's telling me right here. It would be great if I would read what I'm actually copying and pasting from Stack Overflow. Well, you're not walking right now, right? No, I'm just standing. Okay, so you're standing and programming, like, that is two difficult things. I feel like I was walking the other day, but my wife is like, you know, I tried to watch your stream for a little bit, but I just couldn't deal with your head going back and forth. Oh, really? Driving me crazy. So, okay. I don't know that this is gonna work, right, but... so Ryan... You know what? "Those use lines" is what Ryan is saying. Ryan Jar... Jarvinen, and I can't say... What about the use lines, Ryan? Yeah, Ryan. Ryan, would you like me to DM you the Zoom link? Ryan, are you a Rubyist? Can you save me from myself? All right, I've followed slightly more directions, and now we will go to the metrics endpoint and... oh, hey, look at that. Okay, but note, look, there are no HTTP requests, right?
Like everything's commented out. But I think that's because since I started the server I have not visited the regular slash URL. So if I go to the regular slash URL I should see the hello world, which I do. And now if I refresh this page, I should hopefully see that there was an HTTP... uh, recorded. Oh, wow, data. Look at all that data. That's a ton of data. Yeah, HTTP requests 1.0, because, you know, fractional requests are a thing, apparently. So now if I refresh my page... This is a worthwhile point to make: if you observe the data model that we were looking at for Prometheus, they're all floats. So there's no way to just export an integer, which is why you have 1.0. So Lily Kozik says that these are not comments, these are help and type metadata. Smarter than we are, everybody. Yes, Lily is... well, Lily is from the Prometheus team. Yes. Oh, brilliant when it comes to... oh man. So, Lily, I apologize that you are probably doing that thing where you're sitting and watching the television and you want to, like, strangle the person on the TV and just watch the life disappear from their eyes. I get it. It's cool. I'm sorry this is so painful, but whatever. So anyway, look, we've got requests. This is awesome. But we haven't made it work in OpenShift quite yet, because first we have to commit our code. Um, wow, what just... oh, apparently we have files. Oh, he says no worries, by the way. Yeah. Everybody's worried. That's a lot, I believe. All right. I need to .gitignore, because I want to ignore the vendor folder. Sweet, okay. I can't ever remember if I'm supposed to ignore Gemfile.lock or not, but we're gonna go with it. So this is "adds Prometheus support", because that's a cool commit message. If we go back to OpenShift, if we go to the developer view, go back to Builds, go back to Sinatra, look at the builds, here's build number four. Look at the logs. Wait for the logs. I don't have that many nodes.
I'm surprised that it takes a long time for copying blobs around. Some of these images are kind of big, though. So it's cool that it has my commit message in there. It's very nice. Also, I had your email, so look out for that. I'm pretty sure that if somebody wanted to figure out my... your email, that they could probably do it. This is why I just put it online. That way I know, like, everything is gonna get blocked at some point. Your, your, uh-oh. Oh, exit one. Mm-hmm. Um, fine. Step around the S2I assemble... bundler required by... source. Oh, yep. Okay. I knew it. Oh, what's the problem? So the Gemfile.lock, which is like the state of what's going on... it's because I used bundle to install this thing. So I actually want to remove this from the repo. This is another artifact of how OpenShift wants to do the builds, I think. I'm not totally sure, but whatever, we'll go with it. So maybe add it to the .gitignore as well. I'm sure that somewhere buried in the S2I... here, local, fine. Okay. No, you made me do it: S2I Ruby. If we go and we look at the assemble script... okay, so what does it do? For those applications that are using Rack it puts them in production mode. It has bundle installed already in the base image. So it does this thing, installs application source, building your application source, whatever, and then it does bundle install. But I think there's something that it does when there's already... to my LinkedIn profile. Yeah. Where was I? I already deleted the file. What was the error message? "Could not find bundler 2.1.4". Yeah, so I'm using a different version of Bundler and that got baked in, and so basically it was like, well, I'm trying to do this thing, oh, but I'm not the right version. So it blew up.
It's fine. So, yeah. So add the Gemfile.lock to the .gitignore, remove it from the repo, "lose Gemfile.lock", git push. Builds, five. Here we go, logs. So exciting. By the way, as somebody mentioned, while my email address is available through my LinkedIn profile, feel free to send me a LinkedIn connection request. I'm happy to entertain all technologists. I do many things alongside my Red Hat work that require a good network of people. Happy to connect. I will find your LinkedIn profile and drop it in chat. Oh gosh, I don't know if you want to do that. I'm doing it for all of us. Yeah, it's funny, I'm more selective about Facebook than I am about LinkedIn. Oh, totally. So, like, there are virtually no LinkedIn connection requests that I will reject unless they are super obviously spammy, like the only reason this person is connecting with me is because they're probably going to immediately try to sell me something, right. And then even then I usually accept anyway, and then my response is just immediately, like, no, thanks. But yeah, no, I get the "hey, I wanted to talk to you about blah blah blah" and it's like, disconnect. Sorry. Nope. Yeah. Okay, cool. So that built fine. So that fixed the problem, removing the lock file. So if we go back to our topology view, we have an app deployed. We see it's the fourth one. So this is going to confuse people, right. There's a dash four in there, like the number four, but this was the fifth build. The reason that this is dash four is because the fourth build failed, so there was no deployment with the fourth build. So now the fourth deployment of this thing actually is the fifth build. But nobody's really going to be looking at this stuff that closely. But that's why, if you're wondering why the numbers mismatch, it's because this is the fourth time it was actually deployed. Anyway. So, uh, we have this open.
I think already somewhere, this app... maybe no, I'll just open it again. Okay, we visited the app. This is good, and now if we go to the metrics endpoint, fingers crossed, everybody. All right. We have somewhat liftoff. Liftoff. Okay, cool. Don't worry about that being small. You don't really need to read it. Okay. Now where are we at? So the cluster, OpenShift, has been told to look for user... look for user stuff. Exactly. Yeah, and we now have an app that makes stuff happen, exports metrics. So now we need to tie the two things together and tell the cluster Prometheus to actually look for the metrics. So let's go back to the documentation. So we need to create a service monitor, which tells Prometheus what endpoints to actually consume, and so this is where it's going to get a little weird. So let me copy... copy pasta, as grishore likes to say. Copy pasta. All right, new file. Close this. We're going to paste this. All right, I'm going to turn off the terminal here. We'll write this file as slash temp slash service monitor YAML, and then we get cool formatting. Okay. It is a type... oops, type ServiceMonitor. I don't think it needs to be labeled. So what does it say? Of course not. So we're going to delete that for now. We're going to call this our sinatra-monitor, and it lives in the... oh boy, metrics playground, I thought. Metrics playground, metrics-playground namespace. Endpoints, selector, matchLabels. This is all weird, right? So what we're going to do is we're actually going to look at the Prometheus documentation to explain better what... where's the Operator... yeah, I want the ServiceMonitor spec, basically. Guides, instrumenting a Go... Oh, where's Ryan? He probably has a link bookmarked somewhere. Yeah, right. Of course, Lily's probably just laughing at us now, eating popcorn and, like, yeah, throwing stuff at her monitor. "Just do that." There we go. Getting started, service monitor. Service monitor tells it which services, so here, service monitors.
Here's the thing, right? The Operator, not Prometheus itself, right? Yes. Uh, yes. Thank you, Lily. So the key word here is service monitor. It's not pod monitor. It's not deployment monitor. It's not replica set monitor. It's service monitor. The Prometheus resource includes a field called serviceMonitorSelector which defines the selection of service monitors. Anyway, so the key here is that there's a Kubernetes Service that Prometheus is going to look for to monitor, right? It's a service, and that service will be durable in the face of rebuilds, redeployments, pods dying, pods being scaled. It gives us a reliable endpoint to reach the implementers of that service behind it. So a service is an in-cluster load balancer among a group of pods that implements some arbitrary service. Yep. And so if we look at the YAML for the service, we see that it has a label of app: sinatra-metrics. So the matchLabels is going to be app: sinatra-metrics. Well, I think it'll actually just be sinatra-metrics. I think the app is... you're right. Sorry, my bad. There was no... the key was app and the value is sinatra-metrics. And then the important thing is, this section of endpoints is looking for the name of a port defined in the service. If we look at the service, the service has a port named 8080-tcp, so the port that we're looking for is 8080-tcp. That's the name of the port. We're going to query it every 30 seconds. That's fine. The scheme is http, which I believe is as opposed to, like, gRPC or something. I lost the docs. Oh, where's the docs? Services discovered by service monitor, endpoints, port... we could probably have left it out. Yeah, okay. We're probably gonna have to do some RBAC-y stuff, but it'll be fun. Okay, so scheme http. Yes, so we will save this, and then we will create this file: oc create, temp svc monitor YAML. Okay: oc get pod -A, grep prometheus. And what I'm going to do is, for giggles.
Let's look at the logs for that pod and see what it says. Prometheus... workload... deprecated spec image... Just looking to see if there's anything interesting in here. Get namespace, does not exist. That's fine. Okay, cool. Let's look at the logs for this one, just to see, does it say anything interesting? Oh gosh, okay. Rules configmap reloader. Oh, man. I don't even know. Basically, I'm looking to see if, like, it has figured out that it's supposed to do something. "Completed loading of configuration file." That's not interesting. Also not interesting. Never mind. Paul phantom says "and it is here". Yes, that is the service monitor. "Just go to the targets endpoint of the user workload Prometheus UI." Uh, oh, does it have a route? There's no routes in the user workload monitoring. "Query your metrics." Well, yeah, I know I can query. I was trying to prove that the metrics were there, but sure, let's query them. Okay, Metrics. It's like, query... Custom query, Show PromQL. That's what we want. Uh, well, what... I guess HTTP requests. Oh, look. Yeah, stuff. HTTP requests total. How do I run the query? Just hit enter. Oh, there you go. No data points. Okay. Well, that's because we haven't hit the thing yet. I'm watching chat for Paul to say, yeah, we did it wrong. Okay, so I just said a bunch of... Lily says just go to the /targets endpoint of the user workload Prometheus UI. Yeah, I'm not sure how do I... targets... supposed to refresh targets? HTTP requests total, enter. None. "No routes yet, coming soon, but you can port forward." Oh. Okay, so the metrics is not yet, but I mean it's working in the sense that it's collecting them. Oh, wait, HTTP requests total, that's not an actual metric though. Yeah, or you're not generating that, are you? HTTP requests. Hey, hey, there you go, metrics.
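For reference, the ServiceMonitor that was talked through before the query can be sketched as a Ruby hash dumped to YAML. The hyphenated spellings (`sinatra-monitor`, `metrics-playground`, `sinatra-metrics`, `8080-tcp`) are reconstructed from what is said aloud and are assumptions; the field layout follows the Prometheus Operator's ServiceMonitor resource.

```ruby
require 'yaml'

# Names below are reconstructed from the spoken walkthrough (assumptions).
service_monitor = {
  'apiVersion' => 'monitoring.coreos.com/v1',
  'kind'       => 'ServiceMonitor',
  'metadata'   => {
    'name'      => 'sinatra-monitor',
    'namespace' => 'metrics-playground'
  },
  'spec' => {
    'endpoints' => [
      # "port" is the NAME of a port on the Service, not a port number.
      { 'interval' => '30s', 'port' => '8080-tcp', 'scheme' => 'http' }
    ],
    # Selects the Kubernetes Service (not the pods) by its labels.
    'selector' => { 'matchLabels' => { 'app' => 'sinatra-metrics' } }
  }
}

puts YAML.dump(service_monitor)
```

The point made in the stream is visible here: the selector targets the Service's labels, and the endpoint refers to the Service's named port, so the monitor keeps working as pods are rebuilt or scaled.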
Well, we did a thing. So cool. Open that thing up to the world so we can get more metrics in it. There you go, like, start typing. Yeah. Yeah, so to kind of reiterate, the key idea here is we deployed a really simple HTTP server app, we added to it a library that exports some counters at /metrics, and we connected that, by describing it in a service monitor, to the onboard user workload monitoring in OpenShift. So we've got facilities to draw graphs with it and dig into it and analyze it right here in the OpenShift developer console. Whoa, people have done the thing. I've done the thing. Sorry. I want metrics. Give me metrics, all the metrics. Okay, cool. So now we have an interesting metric, right? I mean, it's not an interesting metric, HTTP requests. Um, it's a useful metric. It is, but let's come up with some kind of imaginary metric, right? So, where's the documentation for the exporter? Okay. So what are the different types of metrics? So we've got counters. We've got gauges. What is a gauge? So gauges, that's just like a different instantaneous value. Oh, we just set it to a number. Yeah, yeah. Got a histogram. Uh, provides a sum... that's a lot. Numeric data. Labels: all metrics can have labels. We set label values with labels. All right. So let's do something interesting, right? So we're gonna create a gauge in here, and you'll understand why I want to do this. room_temperature_celsius, docstring dot dot dot, labels room. Set a value, labels room kitchen. I got it. Okay. So what do we want to do here? So we're gonna go back to the app. We are going to create a gauge. Room temperature is not very interesting. What kind of a thing do we want to measure? And the reason I'm doing this is so that we can try to create alerts. Right, because, like, we don't want to alert on the number of requests.
I mean, we might want to, like... maybe we have a really terrible app that after a thousand requests we need to reboot it, but that's not interesting, right? So we want a gauge of something. Maybe the gauge of Twitch viewers. Can you pull that? We'll create a gauge called... I mean, if we want to go totally wacko we can try and figure out how to tie this into the Twitch API. No. So we're gonna create a gauge of viewers, or with a name, viewers, with a... with no docstring. I'm gonna put a docstring in there just for giggles: "eek, too many viewers". Labels room... I don't know. What does it say labels are? Do do do. Labels: all metrics can have labels, allowing grouping of related time series. Okay, so we'll call this service, and you'll see why in a second. Okay, so we have a gauge. What do we do with the gauge? Gauge: set a value. Gauge: get a value. Gauge: increment the value. Gauge: decrement the value. Okay. Let's create a new endpoint in our application called twitchy, and when twitchy gets visited, I don't know, let's set a random... How do I do a random number in Ruby? How to get a random number in Ruby: use rand, range. Okay, one plus... okay, sure. So it's zero to whatever, so we'll say num_viewers equals rand, I don't know, we're not that popular, so 50. Just give us a number from zero to 50. And then we will set the gauge. The... sorry, gotta go back to the docs. I'm slightly offended by that. "We're not popular." It should be "we're not popular yet". Yeah, that's what I said. Isn't that what I said? No, you left off the "yet" part. Soon, soon we will be the twitchiest Twitchers. Yeah, well, yeah, that's about right. gauge.set. Just copy this. Too many parentheses. Okay, so we're gonna set this to the num of viewers. We are gonna label it as... I think I said the label was service. We're gonna call this twitch. Okay, let's try to run this locally. See if it works. Come on.
There we go: bundle exec rackup. Okay, if we visit our localhost version of this, there's no metrics yet. Oh, apparently getting metrics counts as, uh... oh, it's a server request, but it doesn't actually increment the metrics counter, but that's okay. We do that on purpose. All right, so if we go to... what did I call it? Twitchy. If I go to twitchy, nothing happens. But that's okay. Nothing really was supposed to happen, because we didn't tell Sinatra to return any data or anything. So now if we go to localhost 9292 /metrics, we should see twitchy. I have a... yeah, but I don't see the gauge. Yeah, maybe because I forgot to register it. Oh boy. But create... gauge new... yeah, I have to register it. Prometheus register viewer gauge. Okay, so we have to reboot our app server real quick. HELP viewers, "eek, too many viewers". So it's cool, we're in there. If we visit twitchy and then visit our metrics, we see that we have 35 viewers, maybe, on service twitch. Interesting. Cool. And if I hit that page again, twitchy, and I refresh the metrics, now we only have 21 viewers. So I'm pretty sure that's the algorithm, like, you know how YouTube does the weird... you never know how many views a thing actually has. I'm pretty sure you're not going to get the... if you keep pinging it you'll get a different number every time. Right. Yeah. Okay. So we can push this code live now. So this "adds twitchy endpoint". Push it. And then if we go back to our builds we'll see that it's building. Build number six is running. We'll wait for this to finish.
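The gauge behaviour just shown (each visit to twitchy overwrites the value for the `service="twitch"` label set) can be illustrated with a small standard-library sketch. `ToyGauge` and `visit_twitchy` are hypothetical names; this is a toy stand-in for illustration, not the prometheus-client gem's gauge class, and the docstring text is only an approximation of what is typed in the stream.

```ruby
# Toy stand-in for a labelled gauge; this is NOT the prometheus-client gem.
class ToyGauge
  attr_reader :name, :docstring

  def initialize(name, docstring:)
    @name = name
    @docstring = docstring
    @values = {}
  end

  # A gauge holds an instantaneous value: setting it replaces whatever was
  # previously stored for that label set.
  def set(value, labels: {})
    @values[labels] = value.to_f
  end

  def get(labels: {})
    @values.fetch(labels, 0.0)
  end
end

viewers = ToyGauge.new(:viewers, docstring: 'too many viewers')

# Stand-in for the twitchy handler: every visit picks a fresh random value
# between 0 and 50 inclusive.
visit_twitchy = -> { viewers.set(rand(0..50), labels: { service: 'twitch' }) }

2.times do
  visit_twitchy.call
  puts %(viewers{service="twitch"} #{viewers.get(labels: { service: 'twitch' })})
end
```

Because the value is random on every visit, two consecutive scrapes will usually report different numbers, which is the 35-then-21 effect seen above.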
Actually, while that's getting ready to finish, let's go find the documentation for alerting rules. Creating alerting rules. So the difference between a service monitor and an alerting rule, right: Prometheus is the thing that alerts, and Alertmanager is the thing that delivers the alert. So when you're configuring alerting, what you're doing is you're telling Prometheus at what condition to tell Alertmanager to deliver an alert. So Prometheus collects metrics, collects all the things, and then if the metric exceeds a condition for which an alert is defined, it goes and it calls Alertmanager and says, tell somebody about this thing. We will create an alerting rule for the number of Twitch viewers. Well, actually... oh, yeah, sure, we're gonna do that. Okay. Here's our PrometheusRule. Why is this "version alert"? "This configuration creates an alerting rule named example alert, which fires an alert when the version metric..." Oh, I guess this is the name of the alert. Wait, I need to file some docs bugs here. It would be nice if these were clearer. I have a friend in docs now. We have lots of friends in docs. Well, I don't think... I mean, I'm sure Ali's not watching, but she is definitely probably not liking me very much these days. Okay, file, new. But do I need to go do something nice to make up for this? No, it's fine. We're gonna call this temp service alert YAML, which doesn't make any sense, but whatever. So we are gonna call this twitch too popular. That's gonna be our alert. Metrics playground. Example... so this is gonna be twitch. Rules, version alert. I've got to look that up, because I want to understand how this actually makes any sense. Come on. There we go. Service monitor tells it to monitor a service. PrometheusRule... exposing... crazy... alerting... describes what... maybe I want the syntax of this PrometheusRule kind. Prometheus, that's... deploying Alertmanager... PrometheusRule. Here we go. Well, that's not useful. Explain me the alerts. Configuring the Alertmanager.
No, I want to understand what that name is. Lily's still watching. Lily, where's the documentation for the actual PrometheusRule alerting thing? Like, Getting Started tells me... huh, I just saw alerting. Well, yeah, but that's what we were just looking at. Yeah, okay. I want to understand the PrometheusRule syntax. So, like, here's the PrometheusRule. Fine. I don't even understand. So "alert: ExampleAlert": is this the name of the alert? I'm assuming so. I'm gonna go with that. We're gonna find out. So we're gonna call this... "It won't work in 4.3. Alerting on custom user metrics, 4.5 and onwards" is what Lily says. So this is 4.4. Is it gonna work? 4.5 and onwards is what she said. We'll find out. Paul just sent me a link. Yeah, he said on the bottom of that page, for alerting. Yeah, that's where we were, and it's not... I mean, it sort of, yes, but it doesn't explain the syntax of the alerting. No, no, no, the actual alerting.md file. Yeah, alerting.md, on the bottom. Yeah, but it's still the syntax. Like, yes, here it is, but it's not clear that this defines the name of the alert. I don't know, whatever. We're gonna try it anyway, and if it doesn't work, so be it. Um, expression: version, job equals, zero. So I think we're gonna need some code to try this out. Okay, let's see. So the build completed. We're good. If we go to topology view, we're cool. Here's our app. Hello world. That's great. If we go to /metrics, we see "eh, too many viewers"; nothing's there. If we go to /twitchy, nothing happens, but that's fine. And then we go to /metrics and we should see something. Uh, viewers: eight. Wow, people are really bored. Okay, they're not bored; it says 35 over here. So here we go, here's the syntax, right? viewers, service equals twitch. So, expression: viewers, service twitch, greater than 40. Do something, send an alert, right? I don't think this is actually gonna do anything.
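A sketch of the rule being dictated above, built as a Ruby hash and printed as YAML so the two "names" are easy to tell apart: metadata.name names the PrometheusRule object, while the alert field inside a rule names the alert itself. The object name, namespace, alert name, group name, and threshold are guesses from what is said on the stream, not a copy of the file on screen. Later in the stream the label selector turns out to need exported_service instead of service.

```ruby
require "yaml"

# Build a PrometheusRule manifest as a plain Hash.
# All concrete names passed in below are assumptions taken from the stream.
def twitch_rule(namespace:, rule_name:, alert_name:, expr:)
  {
    "apiVersion" => "monitoring.coreos.com/v1",
    "kind" => "PrometheusRule",
    # metadata.name is the name of the Kubernetes object...
    "metadata" => { "name" => rule_name, "namespace" => namespace },
    "spec" => {
      "groups" => [
        {
          "name" => "twitch", # hypothetical group name
          "rules" => [
            # ...while "alert" is the name of the alert that fires.
            { "alert" => alert_name, "expr" => expr }
          ]
        }
      ]
    }
  }
end

manifest = twitch_rule(
  namespace: "metrics-playground",
  rule_name: "too-popular",
  alert_name: "TwitchPopular",
  expr: 'viewers{service="twitch"} > 40'
)

puts manifest.to_yaml
```

The printed YAML could be saved to a file and applied with oc apply, assuming the cluster has user workload monitoring enabled.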
Thanks, Paul. Um, because it sounds like somebody says this doesn't actually do anything, but you know what, I'm gonna try it anyway. There's more links, I'm not sure if you saw. "Content of rules follows Prometheus format of alerts." There's another one: "It's the number of awake viewers." Thanks, Ryan. Clearly you're awake, so I guess we're doing well and we're doing a good job here. All right, so I created this PrometheusRule. Um, I have no idea how to figure out whether it's working or not. Um, do alerts show up as events? All resources... now, I don't know that I can see them here. I might have to go to the admin view to see them, maybe. But I think Lily had said that it doesn't work; it doesn't show. Yeah, right. And so I think if an alert were fired, it does show up in events. I mean, of course, most of the alerting machinery is oriented around emailing or ringing a pager, doing actual alert things on the outbound side. I think that we would have an alert notice in the events, but based on what Lily has told us, I don't think alerts are events. Yeah, I think you're right, Eric; alerts are their own thing. And I don't even know how to use Alertmanager. Status... this isn't actually alerts; this is just the status of Alertmanager. Yeah, Alertmanager is the thing that controls what gets broadcast, right? Not "show me everything". Right. Click the plus sign next to "Not grouped". It's a little bit down. Yeah. What is that, Watchdog? Oh, this was showing me all the alerts. I'm totally looking at it and not understanding what it's telling me. So these are the alerts that are currently happening. So for example, alert name: ImagePruningDisabled. Great.
Okay. Um, so currently we only have eight viewers, so we need more viewers than that to make the alert happen. So let me go to /twitchy. Now, if everybody goes to that URL, that's going to screw it all up, so please... oh, now we're down to three viewers. It got worse. Whoa, I just totally did something bad. Oh, bad things are gonna happen. Weird beeping in the background. Oh, this is gonna be real bad. Do you still see what you're supposed to see? Yeah, why, what's up? You lose power or something? No, I have everything on a KVM, and double Page Down is "change to the other thing". But the monitor here is going through the KVM, and I've got some things on my laptop screen and my main screen in front of me, and so it looks like my webcam has stopped. Your webcam's fine. Oh, you see it? Yeah... it's not moving, never mind. That's what I'm saying: I move my head back and forth and I'm frozen. Yeah. So let me... uh, how do I get back to the settings here? Turning it off and on again, that is exactly what I have to do here. So we're gonna change this to HD Webcam 615, and hopefully that's gonna work. Is it gonna work? It's not working. That's a bummer. Yeah. Nope, there you are, on a different camera. Well, a different webcam, so that's fine. Side view now. That's fine. Let me... does this work? Oh, I'm back. There we go. And you're back. Ta-da. Thank goodness. All right, okay. I gotta be more gentle with the Page Down button. Yeah, no kidding. All right, what did we set the alert for? 40-something. 40. 47? All right, so now we're at 47, which is good, as long as nobody goes to it and wrecks us. Don't visit that URL. Monitoring, metrics; we want to look at, of course, viewers. Cool. Oh, somebody went to it. Not my fault. We got bumped. Yeah, we're back at four viewers. 29. Yeah. 36. Come on, baby. 43.
Okay. But I don't think Alertmanager is gonna figure any of that out. I think those were the same ones from last time. Yeah, these are the same ones. Still got 43 viewers. Yeah, we're just waiting for... oh, because the... there we go. Okay, so now we're back up. It has detected that we have lots of viewers, but I don't think Alertmanager is going to do anything. Why is that? Because I don't think it works. I don't believe alerts exist yet. Yeah, I arrived today thinking that it existed, but I've learned that it doesn't yet. Yeah. So maybe Lily can tell us: is it that it's just not looking at the alert rules, like the Prometheus that's configured for user workload monitoring just doesn't care about PrometheusRule alerts? Or is it that the user workload Prometheus doesn't know how to find the cluster Alertmanager? Or what's the... yes, we understand that it sounds like it's coming in 4.5. The question is, what part isn't wired together, right? Yeah, like, what is the missing piece? Kind of in pursuit of finding out what we might wire together to make it work manually. Right. But, I mean, you know, we can keep going and trying to do other interesting things, but at this point we've created interesting custom metrics that are showing up in the metrics UI. Yeah, I think, and Eric, if you want to bring that back up on the screen, that's the coolest point. We've added arbitrary measurements of internal behavior.
Oh, the metrics view? Sorry. Yeah, I meant the metrics view or the alerting view, back over in the OpenShift console. Because what we've been able to do, remarkably, and actually in a fairly short span of time, is take a little app, arbitrarily measure different parts of its internal state, and present them back in graphs right in the OpenShift web console, without really having to build out any of the visualization part. All we've had to do is identify an endpoint and have Prometheus scrape it for us. And we only had to learn a little bit about Ruby to do it. And we only had to learn a little Ruby. So hopefully whoever was, uh, disgruntled about us doing software development and app deployment is happier now. Don't be disparaging. I'm not trying to be disparaging. Somebody was disgruntled, legitimately. They're gruntled now. They're gruntled. Is that the opposite? Right. I don't know. I'm good at making up words. So many words. I think we've sort of achieved the goal for today. Yeah, and I mean, it's a little bit funny that we didn't have an absolutely correct understanding of the alerting piece, but we actually did illustrate a pretty useful process for anybody who's building applications on the platform and needs to measure the internal state of that application. Yep. And with the knowledge that the upcoming feature is wiring automatic alerts based on these custom counters of internal application state, in the very near OpenShift version future. Okay, so Lily has provided us a comment. Sorry for not looking at the camera.
Yeah, I'm trying to decipher it. I understand the global view and multi-tenancy piece. "You can create alerting rules, but just on your own custom metrics, not in Alertmanager." Yeah, no, right, and Alertmanager is the thing that routes the alerts. Totally got that. So the question is, technically, right now, because the number of Twitch viewers is too high and the rule exists for the alert, is there somewhere that I can see the alert being fired or triggered? Because I understand that Alertmanager is not finding out about it, but does it do anything? Is there somewhere else? Yeah, or to ask the question another way: would we have to go all the way to configuring Alertmanager with a known set of alert targets to be able to observe it? I don't even know that we could. Uh, so Ryan asks: in SRE terminology, did we establish a new or custom service reliability indicator? Um... "You can see it in the Prometheus UI for your user workload monitoring." Well, I can see the value, but how do I see... is there a query that's like "alert"? Would you see it in local? No, because the application doesn't know about alerts. So, um, I think what Paul is suggesting is that in the admin perspective... right, the Prometheus UI. Well, so this is the Prometheus UI for user workload monitoring, which we never exposed. So I think what he's saying is that I would have to expose the Prometheus UI for one of those user workload monitoring pods, and then we could go to it, Paul, and see it. Um, so... "No, port-forward the prometheus-user-workload." Well, Eric, what I'm suggesting is forward the prometheus-user-workload, and I could be wrong. Exposing the service might be the same thing as forwarding. Sorry.
Go ahead. Just over in the admin UI, I think we have a Prometheus UI. Um, yeah, we do, but I don't think that's the same one, because there's a different Prometheus. Yeah, well understood, right, that we have this other Prometheus set of pods and deployments running to do our user workload monitoring. Right. Yeah, if I go to it from this view, this takes me to the cluster one. And Paul and Lily have also further clarified that you're correct, Eric. What they're referring to is specifically the user workload monitoring Prometheus, and they're talking about port forwarding to its UI endpoint. Um, can I just expose it with a service? The prometheus-user-workload service, right here. So can I do oc expose? We're gonna do it. We're gonna try to do it, live. User workload... oh, that's a service. Okay, oc get route. We're gonna do it. It's either gonna work or it's not. "Client sent an HTTP request to an HTTPS server." Okay, so, https. "Not secure." That's fine, because I don't trust the CA. And it doesn't work. Oh, because I think it's... hmm, oauth-proxy. What about... do you need to be logged in? No. Or you're locked down. What port do I want? Uh, what did I do? Something... there we go. Let's see. Internally it is, I guess, 9091. 9090, I think. Well, look at the screen. I mean, yeah, I know, it says its target port is "metrics". Oh, sorry: Lily is here, like, she's actually doing this stuff live. She's checking her bash history now. I'm very happy Lily's here. Um, so if we look at the pod, the pod should define ports that it does something with. 9091. Here's the port that it exposes: 9091. So, oc port-forward... oc port-forward, pod prometheus-user-workload-0, 9091. Okay, it says it's doing it. So now if we do localhost:9091... oh, son of a biscuit.
It's thinking. Yay. We've been here before. Which I think is probably going to take us toward Lily's comment about oauth-proxy, I'm just going to guess, because I would need to send some kind of token or something. Oh, Lily dropped a comment in here: kube port-forward. Yeah, but that's what I just did. Yeah, we did that, and you hit the kube-rbac-proxy, right? Yeah. So I don't think this is going to work. No. But could we fix it? No, because it's configured to want to... um, but 9090 is not exposed in the pod. 9091 is the kube-rbac-proxy; 9090 is Prometheus. That I understand. Oh, I guess the RBAC proxy in turn... oh, hold on. Maybe I didn't scroll up far enough. Do a find. I think the stanza was back in containers. Liveness probe... 9090. I see what they're saying now. I just saw it: 9090. Way to scroll up. Well, this is the... that's a URL, sorry. Anyways, I was just looking for 9090. Well, there's probably some awful jq command to make that work, right? So Paul says this is why you need port-forward. Oh, "connection closed". No, I know what that means. Well, because it's probably HTTP. Ah, Alerts. Twitch popular. Yay! Yeah, look at that. "Eh, too many viewers." Look at that. Kids, we did it. Oh, but I didn't; Eric and Lily did it. It was really all Lily and Paul. Lily, Paul, thank you so much. With special guests Paul and Lily, who, by the way, folks... at least Lily is in Germany, and it's probably, I don't know, three o'clock in the morning, right? Yeah, that's true. It is now, like, 9 p.m., maybe 10. Yeah, I mean, it's late. Yeah. But the fact that they're willing to hang out with us, like, absolutely, after dinner or whatever, and that neither of them has managed to kill me somehow through my computer yet is, I think... uh, that doesn't mean they're not planning to. It's only 9 p.m.
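Once the Prometheus port is forwarded, the same information shown on the Alerts page can also be read from the Prometheus HTTP API at /api/v1/alerts, which as far as I know lists only pending and firing alerts. The sketch below only parses a response body; the sample payload is made up for illustration (alert name, labels, timestamp, and value are assumptions), and fetching it from something like http://localhost:9090/api/v1/alerts assumes the port-forward described above is running.

```ruby
require "json"

# Return "name (state)" for each alert in a /api/v1/alerts response body.
def summarize_alerts(body)
  payload = JSON.parse(body)
  return [] unless payload["status"] == "success"

  payload.dig("data", "alerts").to_a.map do |alert|
    "#{alert.dig('labels', 'alertname')} (#{alert['state']})"
  end
end

# Made-up sample payload in the documented response shape.
sample = <<~JSON
  {"status":"success","data":{"alerts":[
    {"labels":{"alertname":"TwitchPopular","exported_service":"twitch"},
     "annotations":{},"state":"firing",
     "activeAt":"2020-01-01T00:00:00Z","value":"43"}
  ]}}
JSON

puts summarize_alerts(sample)
```

Against a live, port-forwarded Prometheus, the body could come from Net::HTTP.get(URI("http://localhost:9090/api/v1/alerts")) instead of the sample string.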
I will end up dead at some point, inevitably, by my own hand. They're writing that program iteratively, and they just haven't got it all the way through to buggy yet. Yeah, so Lily says all this magic is now done for you from 4.5 onwards. Yeah, no, it'll be cool. No need to port-forward. Oh, it says it's not active. Wait, did somebody change the number of viewers? This is weird. Well, so the number of viewers is greater than 40, and I see the alert, but it says it's not active. Inactive. So shouldn't it be active? Unless it went down again. Can you go look at the graph over in the UI? I'm looking at the graph. Not the graph graph. 43, it says right there. Okay, so it should be on. Should be firing. In theory. Yes. What did we do wrong here? viewers, namespace metrics-playground, service twitch, greater than... is that the right greater-than or less-than symbol? I never get those right. It should be; it points at the little one, Chris. Yes, I have to look it up every time. I used to have a sticky note. Still do, a little one over there. viewers... I'm gonna open this in a new tab. Let's do this. Let's graph exactly what it is. Execute. No data. Yes, the alligator eats the big one. Yes, this is true. Yeah. I'm wondering why. Well, I was always told "we point and serve". Oh, here we go. Okay: service. I may have defined my rule incorrectly. Oh. Get... "no resources". Oh, I'm in the wrong namespace; it's in metrics-playground. Let's see, edit too-popular. No: viewers, service twitch. Oh. What's the metric, though? I might have done that wrong. No: viewers, service twitch. Oh... exported_service twitch. Oh, I think I'm overloading a word, because look at that. Look at the actual data: exported_service twitch. Oh, sorry. Yeah. Right, you're overloading a word.
So if we look here, the service that's being monitored is... not from metrics. We don't want to measure that. Yeah, so in the alert I need to call this exported_service twitch. exported_. Yeah, there we go. Because if I change my viewers query to exported_service equals twitch and execute, yeah, it still works. Okay, so once I change this rule, hopefully Prometheus will pick it back up at some point. I don't know what the interval is. I think it's 30 or 60 seconds. Well, that's the interval for polling that we defined in the ServiceMonitor, right? What I'm saying is we just edited metrics-playground too-popular; we just edited this definition. So the question is, when does Prometheus reload the Prometheus rules that are defined? Oh, yeah. It still has not reloaded it. It's still over 40, right? Yeah. Well, it still says service equals... oh, yeah, and not... yeah, because this isn't... yeah, there we go. Oh, so maybe they'll know. No... wait, look at that. Pew pew! Firing. There is a firing alert. Yeah. Bang bang. All right, so now we have actually, finally, succeeded in doing the thing that we set out to do, almost completely. Yeah. Now what? And yeah, as Lily points out, that recycle time was basically Kubernetes recycle time. What we were waiting on is for Kubernetes to go through and reload config maps. So they changed. Yeah. Very cool. "Whenever k8s reloads config maps; zero to five minutes" is the answer to that question.
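The "overloaded word" above can be sketched in a few lines. When a scraped sample carries a label that collides with a label Prometheus attaches to the target, and honor_labels is left at its default of false, the scraped label is kept under an exported_ prefix. The function names and the target label values below are hypothetical; only the renaming behavior is the point.

```ruby
# Merge labels the way Prometheus does when honor_labels is false:
# a scraped label that collides with a target label is kept as "exported_<name>".
def merge_labels(scraped, target)
  merged = target.dup
  scraped.each do |name, value|
    key = target.key?(name) ? "exported_#{name}" : name
    merged[key] = value
  end
  merged
end

# True when every label in the selector equals the stored label.
def matches?(labels, selector)
  selector.all? { |name, value| labels[name] == value }
end

# Label set by the app on its gauge.
scraped = { "service" => "twitch" }
# Hypothetical labels attached to the scrape target by the monitoring stack.
target = { "namespace" => "metrics-playground", "service" => "metrics-playground-app" }

stored = merge_labels(scraped, target)
puts matches?(stored, { "service" => "twitch" })          # false: "service" now means the Kubernetes service
puts matches?(stored, { "exported_service" => "twitch" }) # true: the app's label was renamed
```

This is why the rule with service="twitch" returned no data while exported_service="twitch" matched. Picking a label name on the app side that does not collide with target labels would avoid the rename altogether.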
Yeah, because my assumption is, and Lily or Paul can confirm: the PrometheusRule is a CRD, and so when the CRD changes, some operator... sorry. PrometheusRule is a CRD. too-popular is a custom resource instance of a PrometheusRule. And there's an operator somewhere that's looking at PrometheusRule instances, and I'm guessing it manipulates a config map eventually, once the PrometheusRule instance changes, which then eventually gets reloaded by k8s into the Prometheus pod, at which point it HUPs itself and knows about the new rule definition. Does that make sense? That's a convoluted description, but yeah. At that point I think I need a picture, but... um, here we go: oc get configmap. Remember, I'm just a simple ops guy. Yeah, so see this: prometheus-user-workload-rulefiles-0. Yeah. So if we look at the YAML for that... okay, here's our twitch. Yeah, right. So, okay: there's a PrometheusRule called too-popular in my project. The Prometheus user workload operator, this thing, yep, is looking at all of the projects to find PrometheusRules. And so it found my PrometheusRule, and then it found that my PrometheusRule changed, and so it updated the config map with the new rule definition. Nice. And what Lily said about reloading config maps: at some point k8s says, oh, the config map is different than what is actually in the pod, so it fixes that somehow. At which point the Prometheus pod is like, oh, I have different rule definitions, and then it loads the new rules, and that's when our alert finally fired. So if we look at the logs for this pod, I wonder if we'll see where it picked up the rule or something. No, it doesn't actually log any of that fun information. That'd be nice if it did. It'd help for troubleshooting. No, I mean, nothing. You know, I mean, okay, so go back to Ryan's question. Um, "custom indicators are relevant for your app performance."
Yes. So, to scroll back up to your specific question, Ryan, you said: did we just establish a new service reliability indicator? So the answer to the question is yes, with an asterisk. It depends on the context of the indicator. So if you could create a gauge in your application that was a metric of health, that was derived from other things going on in the application, absolutely, you could now have this be an SLI. The measurement would be the SLI, and the high-water or low-water mark... you would define an alert based on the SLI. And so the other thing that you can do with Prometheus rules, which you have to be careful of, is you can do, like, derived mathematical whatevers. Let me see if I can pull up the docs for it. But there's basically a way to do math on a... uh, yeah. Which might be a combination of two counters, or a function of one counter by the other counter, or other ways that you want to massage two or three sets of data points to make them useful as a high- or low-water-mark trigger, right? Yeah, let me find that. That would be called a recording rule. Right. Sorry, not necessarily recording. Where's the... yeah, so you can define aggregate functions. Yeah. You know, an example of this, when we talked about this before, Eric, that always comes to mind for me: if I have some kind of a sensor that, say, measures temperature, it may not be giving me degrees Fahrenheit or degrees Celsius. It may be giving me a raw count from the sensor that I could then apply simple math to, to get a degrees Fahrenheit number. It may be giving me seconds when I want minutes. I'll give you a perfect example of that in the real world. So, um, I'm a car person.
I like motorsports and car stuff, and so I have a programmable engine computer for tuning the performance of the engine, and it outputs sensor values. There's a messaging bus where it basically spits out all these numbers, but it doesn't do pressure in psi or bar. It does it in kilopascals, but it's kilopascals in reference to a number, right? Yeah. Not only is it a raw number, but it's a raw number that's in reference to atmospheric pressure. So technically it's already 14 psi, or whatever the current barometer is, above the actual value. So if I want to display the oil pressure on my dashboard in the car, I have to convert from kilopascals to psi and then subtract out 14 pounds, because atmosphere, and then that's the number I can show. So in Prometheus terminology, that's a recording rule. That would be, you know, whatever the scale factor is for kilopascals to psi, minus 14. And so instead of having to alert on 1096 kilopascals, I can have a different thing that's like pressure_psi as a metric, and then I can build an alert off of pressure_psi. But that's a derived metric. Right, right.
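The oil pressure arithmetic above, as a small sketch: scale the kilopascal reading to psi, then subtract atmospheric pressure. The function name and the 14.7 psi default are assumptions (the stream rounds to 14). In a recording rule, the same math would sit in the expr field, for example a hypothetical pressure_kpa * 0.1450377 - 14.7 recorded as pressure_psi.

```ruby
KPA_TO_PSI = 0.1450377 # 1 kPa is roughly 0.145 psi
ATMOSPHERE_PSI = 14.7  # assumed standard atmosphere; the stream rounds this to 14

# Convert a kPa reading that includes atmospheric pressure into gauge psi.
def gauge_psi(absolute_kpa, atmosphere_psi: ATMOSPHERE_PSI)
  (absolute_kpa * KPA_TO_PSI - atmosphere_psi).round(1)
end

# The 1096 kPa reading mentioned on the stream, as a dashboard-friendly number.
puts gauge_psi(1096)
```

An alert threshold can then be written against the psi value rather than the raw sensor number.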
And so I'd like to summarize, not intending to be perfectly correct, but the idea here being: we could have a function that first subtracts ambient atmospheric pressure and then does a conversion from this to something more humanly recognizable, or more easily painted on a gauge in a car dashboard, like you're talking about. Yeah. And so if you had a bunch of different metrics that you could mathematically combine in some way to get a single health score, if you will, then you could have an alert that fires when health is below some threshold, or above, you know, whatever. Right. The one caveat to this: it's precomputed. So if you're doing, like, crazy math, and you're doing lots of crazy math across lots and lots and lots of rules... Yeah, it'll crush. But to me, crazy math is math where n is large, despite the fact that n... whatever, right? Any kind of mathematical functions, you know. But the more complex your recording rules get, the more horsepower you're asking Prometheus to use every time it has to calculate. So it is entirely possible that you can crush Prometheus by writing too many fancy recording rules. So just be careful if you start to do these, you know, sums with square roots with exponents and other stuff. It's just like any other monitoring system: if you start throwing a lot of, you know, this metric with this metric with this metric, and all the comparison of it, and then you get this algorithmic thing, and then boom, that's computational time, right? Prometheus doesn't magically solve that problem. You still have to account for that. Yep. So, as a quick background aside for our viewers: Eric, what is a race car? Does it have square-tube frame rails and a naturally aspirated V8? Uh, no. Are you an SCCA kind of race car guy? I am an SCCA slash NASA kind of race car person.
So yeah. And so for those who don't know, SCCA is like yacht racing for people who like race cars. As opposed to, you know, NASCAR, which my father actually... that's what he did for a living, drive NASCAR race cars. He drove NASCAR for a living? Yeah. Yeah. No, if you didn't know this about Josh... Yeah, my only real engineering background... my degree is in journalism; everything I know about engineering was from doing trig and setting chassis up. What's up? That was my whole real engineering background. My father's name is Bobby Rustwood. Right out of Talladega Nights. There we go. Here, I'll find you a link. Um, hey, look at that, career stats. Yeah, so we raced what they refer to here as the Southeast Series, and, uh, Dad was rookie of the year in what was then the NASCAR Slim Jim All Pro Series, 1994. At the time we were like the third-level NASCAR series. What were the Craftsman trucks, that are now the Camping World trucks, are actually now that third level. So it's like double-A baseball. Yes, right. Yep. No, I remember. Yeah, now that we're talking about that, we're gonna have to have a chat. Kendall Oil, you see down there... we'll leave this after this, but since we brought it up, I've got to say this one thing. If you go down a few rows and you see that Kendall Oil GT-1, yeah, that was my first professional action in my entire life. I was about 14 at the time, and I did that sponsor deal with Kendall Oil. That's awesome. It was, like, one of my roles in the race. Super cool, to do negotiations for billboarding, essentially. Very neat. Look at that. We were racing Oldsmobiles, a car brand there are probably people on the stream who have never heard of. Was that the Achieva body? It was the Cutlass, the very last late-model Cutlasses. Yeah. Yeah, I guess the Achieva was later. That might have been '97, '98, or something like that. I don't know. Whatever. Anywho. Cool beans. All right.
Um, well, that's cool. Uh, we did it. We got alerts and metrics, got everything, got all the things. And we got a little car talk out of it. Yeah, it's car talk. You know, you never have to worry about car talk with me; I'm always happy to talk lots of cars. Cool beans. All right, well, I got nothing. Chris, what do you got? Nothing, I got nothing. I would like to invite everybody tomorrow morning, first thing, bright and early, 0900 Eastern time, 1300 UTC: we are going to talk about OpenShift Virtualization, which I am super excited about. And then later in the day I will be co-streaming an event with OpenShift Commons, where they have our Global Transformation Office, which is Andrew Clay Shafer and John Willis and Jabe Bloom and that whole team. I forget his last name... Bloom. There's too many names. Anyways, they're all going to be on chatting with Diane, and we'll live stream that here as well tomorrow at noon Eastern, which is 1600 UTC. So you've got two shows tomorrow. Looking for a packed house. Stay tuned. We'll get more schedule and info out as we set up more infrastructure to get that out to people. We are literally doing this as live as we can right now. No doubt you saw from my horrible microphone and headphone experience. Well, at least it was plugged in today. So thank you. Thank you so much, Josh Wood. Thank you so much, Eric Jacobs. Thank you everyone in chat: Lily, Paul, Ryan. Thank you all so much. Have a great afternoon, have a great evening, wherever you are. Stay safe out there, right? We want to see you back here tomorrow. So thank you, everyone. Talk soon. Thanks, and good night.