I think we should start. All right. So, Hugh McKee is a developer advocate at Lightbend, and today he will be talking about event-driven services as neuro-synaptic services. Please go ahead.

Good morning, everyone. I speak at conferences a lot, and I've been talking about event-driven systems for some years now, because I really think they're very, very interesting, and I've been building systems with them. As a developer advocate I've got decades of experience building backend systems as a Java developer, so now I get to build systems for fun and demos and things like that. I've been fascinated with event-driven, and over the last year I've been using something else — the hint is right here, it's called Kalix — and that's kind of changed my perspective on how to do software, and especially how to do event-driven software. So I think event-driven software is very, very interesting, and the approach I'm going to show you I think is interesting also. Maybe you'll think I'm crazy — some of my colleagues think I'm crazy because I'm talking about neural processing with event-driven — but we'll see; you'll be the judge.

So what I'm going to do is review event-driven, because I think a lot of people have heard about it if you're a backend developer or even a full-stack developer, but many people haven't used it. I'm going to show you a bit of code — I don't have a lot of time, but I want to show you a little bit of code, and a different style of code as well — and then I'm going to go crazy on you and get into this concept of what I call the micro-mind. I've been playing with names — neuro-synaptic services, micro-mind, those types of things — but it's an interesting processing pattern.

With event-driven, one of the things I've been working on is a demo that's been a lot of fun for me, called Earthship. The reason for the name is that the user interface I built is a map, like a Google map: you can go anywhere on the Earth, zoom in on a location, pick a circular region — like the circular regions you see here — and then say, I want to create some number of simulated orders. It's an e-commerce system; the shopping cart demo is kind of the hello world for backend applications. I started with that simple shopping cart app, but it got boring really, really fast, so I wanted to make the demo more interesting, and I always like to do visual stuff. This is on GitHub — I've got QR codes where you can go grab the repo. The idea is you zoom into any location on the map — it could be the middle of the ocean, or over a city, wherever you want — and then you specify how many orders you want to create: a couple hundred orders, a thousand orders. What I wanted was for this demo to put a hurt on the backend system, to really put a load on it. You can also specify a rate, like orders per second, and then one click goes into the demo app and it starts generating a bunch of orders. All these dots represent orders that were randomly distributed within the circular region.

So this is a design diagram of the application. I'm not going to be able to go into a lot of detail with it, but it is event-driven.
These rectangular boxes represent event-driven microservices, but they're of a very specific type — probably not like the event-driven services you may have seen before, because they're very focused on what they do. What I wanted to do was show visually what happens. At the bottom left there's a client — I'm highlighting it — and it's sending a command to a shopping cart service, adding items to a shopping cart, loading up a cart. The interesting part is when the command comes in from the client to check out: the service emits a checkout event. Now this is where things get interesting, because that event is of interest downstream to another service, which creates an order. So that triggers the creation of an order. This is the event-driven flow: a kind of cascading sequence of events.

Like I said, this got boring really fast, so I wanted to plug in this UI. The map at the top center is sending a single HTTP request that says: create 500 orders at this geographic location, within this circular region. That goes into another service which starts triggering the creation of those orders. You can see there's a flow: the generator gets the initial request and starts emitting events over and over to create what I call geo-orders, and these geo-orders are just orders at a specific geographic location. And the event coming out of the geo-order service is of interest to a couple of other services.

The point I want to make here is about loose coupling. One of the things we've been saying about microservices forever is that they're supposed to be loosely coupled, and these are loosely coupled services. These services have an API, but they don't know who sends them commands. These services also emit events, but they don't know who or what consumes those events. They don't care. Every single service is only concerned with its own functioning, not with the rest of the system. So this geo-order service emits an event which happens to be of interest — because of the wiring of the system — to a couple of other services.

Now we go deeper into the processing flow. I'm not going to walk through everything happening here, other than to say that what this system does is allocate stock to the order, so it's doing something a little more interesting than just taking orders in. You can see there's this cascading flow of events coming in. Let me get rid of this thing here — this is where the processing is occurring. All these highlights are cascading flows where commands come into one service, it emits an event, and that event gets picked up by another one. So the logic of the system is decomposed into these individual, very focused little services that are wired together to orchestrate the whole processing of the system: it takes orders in, allocates stock to them, and reports back on the processing. That's the general flow of this system.

This is a really different way to design a system. Like I said, I've built backend systems for decades, with regular databases, CRUD-like operations, multi-table transactions, all those types of things — this is not that type of behavior.
The design of this application was really interesting: I decomposed the problem so that each of these services solves a different part of it, and there's this processing flow, including things like allocating stock to the order. So that's what this demo does. It's a fun demo to run visually: on the map, when the orders are first shown they're grayed out, indicating the order is there but hasn't been processed yet; then, over a period of a few seconds as it runs, the orders turn blue when they're allocated stock, or red when there's insufficient stock and the order goes into a back-order state. That triggers other behavior in the design, which orders more stock, and as that stock comes into the system it goes looking for back orders and fulfills them, so that all the orders get processed.

Now, the other interesting thing is that there are three fundamental components used to build this system on this platform. The rectangular services are called entities, and like I said, they have an API — a simple API; I'll show you the code in just a minute. They have logic. Basically, each service composes the state of a thing — like the shipping order item in the middle of this diagram. It's essentially a durable object that the service is processing, and it emits events. Since this is event-driven, the events are what change the state. With a shopping cart, you add an item to the cart: that's the command coming in. You run the business logic — if the cart's not checked out, and it's a real product, and everything looks good — then the item gets added to the cart, which means you alter the state of the shopping cart by emitting an event. So the processing is: a command comes in, it creates an event, the event gets persisted, and once it's safely persisted, the event is used to alter the state of the object. That's the processing flow.

The big difference here is that these services are really simple. They don't reach out; they don't talk to databases directly. The state is recovered from the event journal. What is the current state of a shopping cart? Well, it got add-item three times, change an item quantity, remove an item, check out — the aggregate of all those events is the current state of the shopping cart. That's the basic behavior of these services.
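To make that "aggregate of events" idea concrete, here's a tiny plain-Java illustration of recovering current state by folding the event journal. This is not the Kalix API, just the pattern, with hypothetical type names:

```java
// Minimal sketch (hypothetical types, not the Kalix API): the current state
// of an entity is the fold of all its persisted events, replayed in order.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplayExample {

  sealed interface Event permits ItemAdded, ItemRemoved, CheckedOut {}
  record ItemAdded(String productId, int qty) implements Event {}
  record ItemRemoved(String productId) implements Event {}
  record CheckedOut() implements Event {}

  record CartState(Map<String, Integer> items, boolean checkedOut) {
    static CartState empty() { return new CartState(Map.of(), false); }

    // Apply one event to produce the next state; never mutate in place.
    CartState apply(Event event) {
      return switch (event) {
        case ItemAdded e -> {
          var next = new HashMap<>(items);
          next.merge(e.productId(), e.qty(), Integer::sum);
          yield new CartState(Map.copyOf(next), checkedOut);
        }
        case ItemRemoved e -> {
          var next = new HashMap<>(items);
          next.remove(e.productId());
          yield new CartState(Map.copyOf(next), checkedOut);
        }
        case CheckedOut e -> new CartState(items, true);
      };
    }
  }

  public static void main(String[] args) {
    List<Event> journal = List.of(
        new ItemAdded("shoes", 1), new ItemAdded("hat", 2),
        new ItemRemoved("hat"), new CheckedOut());
    CartState state = CartState.empty();
    for (Event e : journal) state = state.apply(e); // the fold
    System.out.println(state); // CartState[items={shoes=1}, checkedOut=true]
  }
}
```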
So let me show you a little bit of code. Here's the shopping cart service; it's in Java. Any Java Spring developers here? Some of this should look familiar: the API definitely uses Spring-like annotations. On this line here, line 26, I just have a class that extends a base class called an event-sourced entity — that's part of the Kalix software development kit. Then here you can see how you define part of the API, like adding an item to the cart. But the main thing I want to show you, because a lot is happening right on line 49, is this effects().emitEvent just above — that's the code that emits a new event to be persisted by the underlying system. This is high abstraction. I gave a talk earlier this week at another conference about high-abstraction platforms — Kalix is one of them; I covered four different ones. These are platforms that abstract away as much detail as possible. This code does not connect to a database directly; it's connected to a database indirectly through the platform layer below it. The goal is simplified development. So here it's: emit an event. I don't know where it goes, I don't know what the database is, I don't care — that's the platform's problem. My problem is the business logic in the service.

So it emits an event. The next interesting thing is currentState(). That's an inherited method from the base class which holds the state of this thing — in this case a shopping cart — as it is right now, retrieved from the database. This is also a stateful service, meaning the state gets cached, with a very elegant caching mechanism: if the shopping cart is active and a user is adding things to it, that cart is in memory somewhere in a cluster — one instance of it — so instead of having to retrieve the state every single time you make a change, it's already there. All of that is managed by the platform as well; in your code, you access the state through currentState(). And then this eventFor is a method I wrote — it's just my style, but it's really simple: it takes the command in, creates the corresponding event, and returns it. So these are the APIs: add an item, change an item, remove an item, check out. That's all I'm doing.

The next thing here is the event handlers. The event handlers are invoked once the event has been safely persisted to the database, and now we want to make the state change. If you look at this method, you can see it returns the state — the state of the shopping cart, altered by the event. With add-item, if I go into this on method, you can see it's just basic Java: the object has a list of items in the shopping cart, so all I'm doing is streaming through the list and adding the new item. Very simple; no database, nothing like that. There are event handlers like this for all the different events.

And the state itself is just a Java record. I really love Java records — one of the best inventions to come out of Java, because they're immutable. That means I can't write code where I screw up and change the state when I shouldn't. So I have these overloaded eventFor methods for all my different commands, and overloaded on methods for all the different events, and that's the basic flow of this class. No database connections, no worrying about publishing messages, things like that. But you can see there is messaging happening in the system, because an event from a shopping cart, like a checkout, is being picked up and processed by downstream services.

The other component I want to show you is the shopping-cart-to-order piece — really quickly, so I can move on. This is called an action, and actions process the messages from upstream events. The event comes into the action, and the action translates, say, a checkout event into a create-order command. It sends that create-order command to the downstream order service, to create an order from the shopping cart.
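Pulling those pieces together, here's a condensed sketch of both component types in the shape described above: an event-sourced entity and an action. The effects().emitEvent(...), currentState(), and event-handler shapes come straight from the walkthrough; the exact package names, annotations, and the action wiring are approximations from memory, so treat this as a sketch rather than the real Kalix SDK surface:

```java
// Sketch only: the shapes follow the talk; package names, annotations, and
// the Action wiring are approximate — check the Kalix SDK docs for the real API.
import kalix.javasdk.eventsourcedentity.EventSourcedEntity;
import kalix.javasdk.annotations.EventHandler;

import java.util.List;
import java.util.stream.Stream;

public class ShoppingCartEntity
    extends EventSourcedEntity<ShoppingCartEntity.State> {

  // The state is an immutable Java record: no accidental in-place mutation.
  public record LineItem(String productId, int quantity) {}
  public record State(String cartId, List<LineItem> items, boolean checkedOut) {}

  public record AddItem(String cartId, String productId, int quantity) {}   // command
  public record ItemAdded(String cartId, String productId, int quantity) {} // event

  // Command handler: validate against current state, then emit an event.
  // No database code here — persistence is the platform's problem.
  public Effect<String> addItem(AddItem command) {
    if (currentState().checkedOut()) {
      return effects().error("cart is already checked out");
    }
    return effects()
        .emitEvent(eventFor(command))    // persist the event first...
        .thenReply(newState -> "OK");    // ...reply only once it is safe
  }

  // The "eventFor" style from the talk: translate a command into its event.
  private ItemAdded eventFor(AddItem command) {
    return new ItemAdded(command.cartId(), command.productId(), command.quantity());
  }

  // Event handler ("on" method): invoked only after the event is persisted;
  // it returns the new state, derived purely from old state + event.
  @EventHandler
  public State on(ItemAdded event) {
    var items = Stream.concat(
        currentState().items().stream(),
        Stream.of(new LineItem(event.productId(), event.quantity()))).toList();
    return new State(currentState().cartId(), items, false);
  }
}

// The "synapse": an action subscribed to cart events. It doesn't know who
// emitted the event, and the cart doesn't know this subscriber exists.
class ShoppingCartToOrderAction /* extends the SDK's Action base class */ {

  public record CheckedOut(String cartId) {}   // upstream event (sketch)
  public record CreateOrder(String cartId) {}  // downstream command (sketch)

  // Translate the upstream event into a command for the order service.
  public CreateOrder onCheckedOut(CheckedOut event) {
    return new CreateOrder(event.cartId());
  }
}
```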
So the logic in an action really is that simple. I'm not involved in polling or anything like that, or talking to Kafka or whatever — it's high-abstraction code. So that's kind of the code.

Now I want to get into the micro-mind piece. I'm going to go a little crazy on you. When I was working on that design diagram, my goal was to show the cascading sequences of events, to help people visualize the flow of a real event-driven system. I think it worked out pretty well, but it took a lot of work. Have you ever heard of Blender? Blender.org is a fantastic 3D tool — I love Blender, and I'm going to show you something else I've done in Blender in just a minute. I did this diagram in Blender, and it was a lot of work to animate the highlighting of each step of the processing flow. But as I was looking at it, I thought: okay, wait a minute. The way these services work, with this cascading sequence of events occurring between them, is somewhat like the behavior of neurons in biological brains. Neurons in biological brains have a state — they have a charge. They get signals in from other neurons, or from sensors like touch or pain. But the neuron itself doesn't know who sent it a signal; it only knows it has received one. The neuron decides whether it should emit a signal, and when it does, it doesn't know or care who is wired up to receive it. All the neuron knows is its own basic mechanics.

That's kind of what these services are doing here. Really simple, really focused, not complicated, not doing database transactions — a pure event-driven kind of thing, where signals come in and signals go out, and there's some logic in the middle to decide how to respond to incoming signals and when to emit outgoing signals. That kind of blew my mind. Like I said: the shipping order gets a command in, does a state change, emits an event; that event gets picked up by an action because it's subscribed to it; that action translates the event into a command and sends it down to another service. There's your flow. So it kind of blew me away.

Again, just walking through it really quickly, you can see this cascading sequence of events, step by step — this is the exact flow of the way this application works. And it's not just taking in orders. Like I said, it's allocating stock, and with just these 13 simple services, the system remembers exactly where every physical unit of stock went, to every single order — all of that is being handled. It's also handling back orders: when there's insufficient stock in the system and stock later comes in, it has logic to find the back-ordered orders and allocate stock to them. All this behavior lives in relatively simple services. So that's what's kind of cool here. Again, my colleagues think I'm nuts talking about this whole neural thing, but I don't know — I'm pretty psyched about it.

The other thing I thought was really interesting was that recurring behaviors were starting to emerge that I hadn't seen before in other code.
For example, there are five or six recurring behaviors — I'll give you a summary at the end — but of the 13 services here, three of them use the same kind of hierarchical tree pattern, which I call a reduction tree. Once I figured out how to use a reduction tree in this kind of processing — and this isn't unique to Kalix; I think it's unique to this style of doing event-driven — these patterns started to emerge. There are all these reusable patterns emerging here, and that intrigued me even more, because maybe there's something really interesting happening: I'm starting to think that the way biological neural networks are put together is the same type of thing — recurring patterns in the way everything is wired together, used to do very, very powerful things. So it's this whole idea of services behaving more like neurons, and other functions behaving like synapses. By the way, this is all serverless as well on this platform: these things run in a cluster, they get allocated when needed, they scale up, they scale down, those types of things.

What I want to show you now is another visualization of the services processing. Before I play it — because I've got time — I'll explain: I wrote specific log statements into the application so I could scrape them out of the log and use them as raw data to render the activity within the Earthship demo as it processed orders. What happens here is: that's one shopping cart getting processed and stock getting allocated in the system. Then two other shopping carts come in. But the fun part starts here — there it goes. Now 200 orders are coming in from the map. Every dot represents an instance of one of the services getting created, and the colors represent state changes that are occurring. Red indicates back orders. Yellow indicates things that haven't been processed yet. Green indicates that, yes, the stock has been allocated to the order, and that flows back up. You can see it fighting — it's actually recognizing that the system is out of stock, which triggers stock getting ordered, coming into the system, and getting processed. You can see things turn from yellow to red to green, and everything gets processed. The flashing lines represent specific messages — an event going from one specific entity to another specific entity. These lines aren't arbitrary; this is real data from a run of the application.

And it's 3D — that's the beauty of Blender. I wrote this code, by the way; it's Python — Blender has a massive Python API — and I'd wanted to do this for years; I've been using Blender for a long time. I used ChatGPT to help me write the Python code. This was a blast; I was so happy when this video worked. If you've watched videos of, say, the activity of insect brains or human brains — there are some on YouTube — they're really awesome-looking videos, and I wanted a video like that, but for a software system. In this run, about 10,000 entities were created, and about 15,000 events happened while processing these 200 orders.
I already know some optimizations I can do to cut that down some more. But the other thing is that this system is built to run in a fully distributed environment. Scale up, scale down — any part of this application can break, and when it comes back up, the system picks up where it left off. It will keep running until it has finished its processing. Unlike AI systems such as large language models — you've heard that they'll lie to you with confidence, that they hallucinate, like ChatGPT. I've seen it do it: you ask a question, it says, oh yeah, here's the answer, and you look at it and go, no, that's not the answer, you're smoking something. This system is not trained; it's hardwired. It has instinctive behavior, not trained behavior. So it's enterprise-precise. We're using these patterns to do things like financial processing, and with financial processing you don't go back to your management and your business sponsors and say, oh, it works most of the time — every once in a while we'll screw something up a little. No, that doesn't fly, right? This is absolutely precise processing. Take decrementing stock, for example: this system knows exactly what's happening with stock, and it won't make a mistake with it. It won't make a mistake, say, aggregating financial transactions into merchant payments — that's another system we worked on recently. So it's fully distributed, fully scalable, fully resilient, and absolutely accurate. And the nice thing is that I write this code really focused on the pattern of the code, not worried about resiliency, not worried about scale, not those types of things. Of course, when you're doing I/O you're always thinking: can I reduce the number of I/Os my code is doing? But it's a whole different way of thinking about building applications.

To kind of wrap up: what I like about this approach is that the microservices behave more like neurons. I've been talking about microservices pretty much since they came out, and some of the things we've been talking about — but not necessarily doing — like "a microservice should do one thing and do it well": in this case, you're kind of forced to do it that way. The behavior of the microservices is: commands come in, command handlers emit events, events get persisted, and then the events trigger state changes. That's the flow. Those actions — the synaptic-type pieces — just consume events from upstream event producers, and for the most part translate those events into downstream commands. They can do other things, too; they can run queries against views and things like that. For example, when I'm allocating stock to an order, the action has to do a query to see if there's any available stock to be allocated, and it grabs that stock from the query. As for the messaging in the system: I don't know or care how it's implemented. Well, I do know — I work for the company that built it, and I talk to the engineers who built the platform all the time — but that's not my concern as a developer. As a developer, I'm focused on wiring together the flows, defining the flows, designing the flows in my software, at a very high level of abstraction. But the messaging is at-least-once, and that also means you have to consider idempotency.
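As a minimal sketch of what that can look like in plain Java: one common way to get idempotency under at-least-once delivery is to remember, per entity, the sequence number of the last event applied, and drop anything at or below it. The names here are hypothetical, not from the Earthship code:

```java
// Minimal idempotent-consumer sketch (plain Java, hypothetical names):
// deduplicate redelivered events by a per-entity sequence number.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DeduplicatingConsumer {

  public record StockDecremented(String productId, long sequenceNr, int units) {}

  // Last applied sequence number per product. In a real service this would
  // live in the entity's own persisted state, not in memory.
  private final Map<String, Long> lastApplied = new ConcurrentHashMap<>();
  private final Map<String, Integer> stock = new ConcurrentHashMap<>();

  public void onEvent(StockDecremented event) {
    long last = lastApplied.getOrDefault(event.productId(), -1L);
    if (event.sequenceNr() <= last) {
      return; // duplicate delivery: already applied, safe to ignore
    }
    stock.merge(event.productId(), -event.units(), Integer::sum);
    lastApplied.put(event.productId(), event.sequenceNr());
  }
}
```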
With at-least-once messaging, the consumer of a message will get that message at least once — and it might get the same message twice. So if you're doing things like decrementing the balance of an account, or taking inventory out of stock, you have to think about that: how is the processing logic going to handle it when those things happen?

I mentioned the patterns. In the Earthship demo — a relatively simple demo — there are two of what are called choreographed sagas. A choreographed saga is the saga pattern without a controller: the saga happens purely because of the way things are wired together. There are a lot of actions that do one-to-many, meaning one event comes into the action, but it emits multiple commands to multiple downstream services; that's a very common pattern. There are generation loops and generation trees. And there's the reduction tree I mentioned earlier — there are three of them in the demo. The reduction tree is really for reducing things like: how many back orders do I have for a particular product? Across the stock orders, how much inventory do I have for a particular product? The reduction of all that detailed data is handled by these reduction trees, and they're really there for idempotency. So like I said, this is a different way of thinking about doing event-driven. Maybe you don't have a platform that does it, but that's okay, because I think event-driven itself is right there.

So: we did a review, looked at some event-driven architecture, looked at a bit of code, and went into this micro-mind idea. I'll leave you with two things. This quote from Alvin Toffler: "The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn, and relearn." I had to unlearn a ton of stuff, and relearn new ways of doing things, to use this new toolset. It was hard. I think we've discovered some patterns that will help the people who follow us later on, but this is a skill set I think every one of us in IT needs to have. The second one is relatively new: AI will not replace you; people using AI will. I use AI all the time. Most of the images in my presentation are AI-created. I've had GitHub Copilot in my IDE for a year. It's amazing — I'm sitting there working at home and I come out to my wife and say, this thing's in my head, it just reads my mind. It's dumping out blocks of code and I'm just moving. It's amazing. So thank you very much. This is the QR code for the slide deck, and I think we have a few minutes for some Q&A, if you have any questions. Yes, sir.

That's a really good question. The only reason that particular service, the shopping cart service, had validation checks in it was that there was an external client talking to it. For the other services — and I think what you're getting at is that when you're cascading and wiring services together, a service can't just reject, right? This is very insightful, an excellent question. The service has to have something like a compensating action. Instead of just saying, no, I can't process this message, it emits an event saying, I did not process this message, and that event is wired to go back to, say, the event producer. That's a very common pattern in this type of system. So you have to be very careful about just rejecting incoming commands.
Rejecting is really only for services on the edge of the system, where there are external clients that aren't event-driven and can handle the error response. Any other questions?

No, it doesn't — not in this system, not yet at least. You mean like training it, yeah. So that's an important distinction: the behavior in this system I refer to as instinctual. When we're born — when any organism with a neural-network brain is born — there's a certain amount of pre-wired behavior in the system that doesn't need to be learned. That's kind of analogous to what this system is doing: it's just pre-wired behavior. But what's interesting, I think, is how powerful that pre-wired behavior can be. Don't put your hand into a fire — when you feel pain, you immediately react; a lot of that is instinctual behavior, not learned behavior. Some of it is learned, but take insects, for example: they have some learning ability, but they have a ton of pre-wired behavior built into them. And that's kind of what this is. But it's extremely powerful — that's what's surprised me. I'm doing all this stuff with just these 13 simple services. The code's not complicated; it's the wiring together that makes the behavior of the system really interesting. Yes, sir.

There's a state for each instance: the state of a shopping cart, the state of an order, of stock, of a product, those types of things. And you have to think about that as you're designing the system. That's the fun part — the part I really enjoy — thinking through the design, thinking through the compensating actions. You have a happy path where everything works just fine; that's the easy path. But then there's the unhappy path. For example, an order comes in, and the order gets decomposed into the individual physical units of stock it needs, and those go looking for individual units of stock in the inventory. One sends a request to a unit that's supposed to be available, and it isn't — so it emits an event that says, sorry, I can't join you, I'm unavailable, some other order got me. That event goes back to the requester, which triggers it to try to get something else. And maybe there's no stock at all the next time. So you have to think that through: what happens when there's no stock? You have to think through every single step, but it's not as bad as I feared it might be, especially when you're decomposing a problem into these small processing units. That's what makes it simple. If you have a big bulky service, it's got that smell — it's getting too big, and you know you're on the wrong track. Often the solution, when I ran into that, was to decompose it down and delegate to smaller services. But you have to think about it, make sure.

I think we're out of time, so thank you very much for attending the talk. I'll be around all day today and tomorrow if you want to talk to me — I'd love that. Your only problem will be shutting me up.

This thing — where do I talk? You can hear me? Yeah, you can hear me — every silly thing I say, you hear now. There are chocolates and there are Podman stickers here. That's maybe the most important part — a Belgian specialty. You want the chocolate? Voilà, another hand, good.
Another one — right, or is that too far away? Who was it? You were? So we don't have Podman Desktop swag yet; that's why it's chocolate and Podman stickers — that's what I can offer. Do I have to wait two minutes, or can we start? Hello? All good. So, microphone working, everything okay? People are still coming in, so I will wait. You have two more minutes. Yeah — I had to learn to use Windows to work on this project. That's very fun. You'll introduce me? There is no more movement at the door, you know.

So, Fabrice works at Red Hat as a technical writer, and he will be talking about Podman — Podman Desktop. A little bit about Podman, but mainly about Podman Desktop.

So, Podman Desktop was announced as version 1.0 just a few weeks ago, and we were all sweating the week before, because it had to work. It's a very young app, let's say that. We are building features like crazy in it, and when you build one feature, another one very often breaks — but it's starting to be really usable for doing nice things.

So why Podman Desktop? How to explain it? For production, you most often want your containers running on Kubernetes or OpenShift. In development environments, people are more often using Docker, Docker Compose, and other things that — okay, they are containers, but they don't work the same way. And then your container works in development and doesn't work in production. That's sad. There's a very nice picture a colleague uses to explain this. So that's the idea: a Docker Compose file and Kubernetes YAML are not the same thing. Let's try to give developers a tool that produces the Kubernetes YAML and makes it simple to produce. And that's why it's not just Podman: there's one part of Podman Desktop which is really about using Podman, and a second part which brings you from the container engine to Kubernetes and OpenShift.

I have a session this morning and one this afternoon — mainly because two people applied for sessions and the other person couldn't come. So I will focus this session more on what you can do with the container engine, and this afternoon I won't talk about the container engine, but will try to focus more on the kinds of Kubernetes workloads and features we have. Voilà.

So, one problem that we solved and that I haven't talked about yet is that containers work natively on Linux — cgroups and so on are features that are really bound to the Linux kernel — and on Windows and Mac you need to work inside a Linux VM. That makes it a little more complex to start with Podman on macOS and Windows. So the first thing Podman Desktop brings to Windows and macOS users is an easy start with Podman: you just install Podman Desktop, then click, click, click, click, and you have everything working. For Linux users, we don't care — it's not really a feature, it's more of a non-feature. For example, I have been using Linux for twenty-something years, and now I'm doing this presentation on Windows, because some of the features are specially baked for Podman running in a VM and not for Podman running on the host. When I was testing on Linux, I had things that didn't work; the application is really, really meant for Windows and macOS first, and we have very few Linux users.

Yes — so, the container engine. You see the little Compose here? You can install Compose with it.
So you can run Compose. But the interesting thing is that from the containers you started by running docker compose, you can continue and do something that we hope is better.

Next: proxies, connecting to registries, and installing in restricted environments. Those are features we implemented in December, January, February — that came even before connecting to Kubernetes. If you are running in a jail where you don't have access to the internet — we made this feature early, really for people who want to run Podman Desktop and Podman in locked-down enterprise environments. For most users working remotely it's not a feature, but for enterprises it is.

And you have a little selector where you can choose between your Kubernetes contexts. Natively, you connect to all the Kubernetes clusters that are in your kubeconfig. So if you already have clusters, they will show up here, and you can select which context you're working in.

And now I have a lot of time for demos. Okay, so there are some demos I don't do live because they take too much time, but there are some I want to do live. This is how you start the Podman machine in Podman Desktop — I'm sorry, but if I did this demo live it would take ten minutes and I couldn't show everything. It's easy, though. There are certain circumstances where — so, Podman by default runs rootless, meaning Podman is started with a basic user, not root. But in certain conditions you want to run Podman as root, and there's an option for that. We made that for kind on Windows: to run kind on Windows, it works better if you have a rootful Podman. Good — hopefully I can show that this afternoon; my rootful Podman machine didn't work yesterday, so I won't show it this afternoon unless I can fix it.

Pulling an image. Let's see if the network is okay here. So, pulling an image: I will delete the redis image — up, up, up — and I will try to pull an image. And I have my problem with Docker Hub not being accessible today. That's wonderful. I like live demos, you know, when you break things. Okay, so it doesn't work; I'm super happy. When the gods of registries are with you, pulling an image is super easy. I've never had the Docker registry refuse to download an image before this morning, in this room — that's the big surprise. But yes, you can download images from multiple registries, so when one registry is down, you can use another one. Quay is a good registry for that. Next, next, next.

So: pulling an image, starting a container. I won't do the demo with this redis because I killed it, so let's take nginx. To start a container from an image — if you use the command line, you know there are a lot of command-line options; you put them in a script, et cetera. Here you have a lot of those options displayed in the start-container screen, and usually these are things that are defined in the image. For example, the port to expose is defined in the image, and you just have to validate it. You don't have to do anything, and you should have something that works out of the box. So now my nginx is working, good, and I can interact with my container. When you go to the detail view, it always opens the logs first. There is a summary, which is not the most interesting thing.
In the logs — if you click in the logs — no, Ctrl-F doesn't work today. But in the inspect view, if you click in the contents and press Ctrl-F, you can search for things inside. That's a nice hidden feature. So you can inspect the container. The Kube thing is one of the features I'll talk about a little more later: from this container, you can create a pod, and the definition of a pod is basically Kubernetes YAML. So when you create a container, you can directly get the YAML content to create a pod out of your container. And you can interact with the container. And last thing — well, not exactly last thing — since the container is exposing a port, you can directly visualize what is running on that port. When you're running Podman that's easy, because the port is local. When you're running on OpenShift or Kubernetes, that becomes a little more complex. And you have a button here to deploy to Kubernetes. The context you see here depends on the context you've activated here, okay? So here I can decide to deploy to OpenShift Local or to the Developer Sandbox. On OpenShift Local I have two profiles, and I have a MicroShift instance that doesn't exist anymore, but I can still try it.

Back to the slides. Starting a container, starting the nginx container, building an image from a Containerfile — yes, let's do this. And normally — today I am cheating a little bit. So here you can put another name. You first have to choose a Containerfile or a Dockerfile: most often it's called a Dockerfile, but sometimes a Containerfile. Then Podman Desktop takes your Containerfile and all the directories around it, so if you are copying files, Podman Desktop has all the context to do the build. We'll see that in the logs — but as I'm cheating, it goes very fast, okay? When you build for the first time, uploading the build context can take a minute or more, because it takes the Dockerfile and the directories from your local file system and copies them to the Podman machine, which is in a VM, and then the build starts. Here I'm cheating because it's already built, so I'm using the cache. I'm not showing you — you want to see that? No, you don't.

So, registries. That's important. What do I start with? Yes — registries I will show directly in the app. The registries are in the settings here. You have four registries that are pre-configured, where you just have to add your username and password — for Quay, in fact, it's a username and a token. And if you want, you can add your own registry, so if you have a private registry just for yourself or for your company, you can do that. And once you have added a registry, if the name of an image you built matches the registry, you can push it. For example, this one, my custom image — I can push it into a kind cluster, but I cannot push it to my registry. But this one, whose name corresponds to my quay.io registry, I should be able to push to the registry. And when I push to the registry, I can select between tags — obviously, I take the tag that corresponds to the registry. And then — I haven't tried this today. Does it work? Does the network work? Yeah, I'm happy, it did work. So you can configure a registry. How much time do I have left? Five minutes — oh yeah, yeah, yeah. Pushing an image — we have done that.
And next, let's take five minutes to talk about pods. Oh yeah — that's a live demo. Okay: I have my pod, I delete my pod — hop, let's be crazy. So I have redis, and I select two containers. To explain pods — as I live in Belgium, I was thinking of a metaphor involving fries. You want it? In Belgium there are two categories of people: you put the sauce on your fries, or you put the sauce on the side of your fries — frites sauce à part, they call it. And pods are a little bit the same thing. When you have containers that expose ports, you want the application to expose its port to the user — that's good. But the database — you don't want to expose the database's port to the user; you want to keep it on an internal network. Same thing with your fries: you expose the fries to your mouth, but the sauce is just for the fries, not for your mouth. Well, you can't really do that with fries. So you select your containers, and then you select the ports that you want to expose externally; the ports that are only required internally, you don't expose.

And — there it is, it's working. Perfect. In the logs, you see the logs of your two containers, the Python app podified and the redis podified, okay? And can you search in the logs? No, not today, okay. You can inspect the pod, and here you can search in it, and you can see that it's exposing ports 5000 and 8080. Good. And you can copy and paste the YAML, to create the same pod from code somewhere else, okay? So that's already cool. And you have the possibility to deploy to Kubernetes — that's for this afternoon.

And here — this is where the application is still young: if you want to see the app of your pod, you don't have access to it from this screen yet, okay? It will happen, but we haven't done it yet. But in the summary — oh, I clicked on it too fast — in the summary, you have the names of your containers. You have three containers: there is one secret container that you didn't choose that appeared here. And if you click on the container name — the secret one — then you are on the container, and this time you can open the browser, and we have the application, and the increment is stored in the database. So that's cool. And you can see your logs, that's good. And you can see that the containers I selected are now stopped, so I no longer have any containers running outside of the pod; my pod has been created, and my pod has three containers, the last of which is an infra container that's running them. And for its logs, you don't see things. I can show you — you can interact with the redis database here.

How many seconds? It's over, huh? So I will finish this and then questions. So: you can select containers to make a pod; we've seen you can generate the YAML to run a pod; and the last demo is that you take the YAML and start a pod from it.

Question — can you still use podman compose? You can still use Docker Compose after that, okay — you just keep working with the same stuff, yeah. Okay. But when you do docker compose, you cannot send the result to Kubernetes: Kubernetes will not understand the Compose file. So that's the first point: you have something that works in development, but when you want to put it in production, the Docker Compose file doesn't work.
So at some point you have to do the translation from Docker Compose to something that is Kubernetes YAML, and what we are trying to do is make this translation as painless as possible. And you can use Podman Desktop with Docker — I failed to install Docker this morning, but that's another question. So you can use Podman Desktop with Docker, use Compose with Docker, and then use Podman Desktop to take the state of your containers and generate pods, or generate the Kube YAML — because you cannot run pods on Docker, that's the problem. But you can generate the YAML and then run the YAML on Kubernetes. And so we support Podman, we support Docker, and we support Lima at this moment — three different container engines. And for Kubernetes, we support the remote Kubernetes clusters that are in your kubeconfig, and we have more support to help you install OpenShift Local, help you install kind, and connect to the Developer Sandbox. That's what I will develop this afternoon, this kind of stuff. So the point is: do you want to deploy to Kubernetes after a while, or not? If you don't want to deploy to Kubernetes, you don't need it. If you do need to deploy to Kubernetes — and that's something the developers themselves need to do — it's a very nice tool to make that easier. Well — good answer?

Another question. Basically, it's done like this: Docker always wants root containers, and Podman wants rootless. For example, if you want to run the standard docker.io redis or nginx image on OpenShift, it doesn't work: OpenShift puts much, much higher constraints than Docker. So the containers run fine in development, everything is okay, and then you push to production and suddenly the container is not working. Having this feedback loop closer to development — that's a nice tool. And that's where the usage is different from the Docker or Kubernetes extension in VS Code, or whatever editor you're using. It's not exactly the same usage: the Kubernetes extension is not something that brings you from containers to Kubernetes. This is a tool that's there precisely for that transition: I work on containers, I have everything I want in containers — now, what do I need to do to move to the next step? Well — good answer. Another question.

New features that are coming? Stability — for me, that's the most important. We have been building features like crazy in the last months. The project is one year old, since February or March, so it's very, very young, and it was going very, very fast. So: stability, and end-to-end tests, so that when we build a new feature, we know we don't break another one. For me, that's the most important. And then, for now, the OpenShift Local extension is a first iteration, and the same for kind — they're first iterations, so we're discovering all the edge cases that make things break at some point. And also things like: now I have my pod running, I want a button to open my application directly — making things even simpler and less hidden.
There are some features that are there now, but you have to know about them, and I would like them to be more discoverable — more simple. And that's quite a lot already. And yeah, I know that the welcome page is being reworked, because if you look at the welcome page now, it doesn't fit on a screen: Podman takes up all the room, and the rest — okay, yes, I have a big scaling factor for displaying on a big screen, but this welcome page was okay when there was only Podman. Now that we have more things, that's a problem. Another question.

So — the terminal interface: it's there anyway; we install it. You use it or you don't use it; it's your choice, and I would not choose for people. I've been a user of the text interface for years, and I've learned things about containers by using this app, because there are things that are more visible — and things that are also hidden — because choosing to display containers, pods, images, volumes is a selection. You get some visualization, but, for example, there are no Deployments. In Kubernetes, a Deployment is even more central than a pod, and you don't see them; you don't see Secrets or other things. So there is a selection and a simplification of what you need to know. I find it really interesting to enter the machine and say: okay, these are the important points, I will focus on them, and the rest we'll see later. But — the command line on Windows, that's complicated. I've been a command-line user for years, and now I have cmd.exe, PowerShell, Git Bash, and some of the commands land in one of the terminals and not in the others. It's driving me crazy. The terminal experience on Windows is a problem anyway: open a terminal — which one? cmd, PowerShell, Git Bash, something else? I'm using this terminal that has all of them, but you see — which one do I want to use now? So that's complex. So this afternoon I will try to break things on Kubernetes — OpenShift Local, the Developer Sandbox, and all of that — and it will be fun. Ah, yes, and you can see our—

Hello, good morning everyone. I'm thrilled to be here today to talk about a topic that is crucial to the health and performance of Kubernetes applications, and applications in general: testing observability features. As developers and DevOps engineers, we understand the importance of monitoring our applications, diagnosing issues, and ensuring that our alerts are accurate and timely. Today, we'll delve into how we can achieve these using Prometheus, Alertmanager, and various testing frameworks and libraries. A little bit about me: I'm João Vilaça. I work at Red Hat; specifically, I'm contributing to the KubeVirt project. I don't know if you've had the chance, but on the first floor here we have a small booth where we're giving some demos and giving out some swag, so if you have the chance, please stop by. I started working on a KubeVirt component named the Hyperconverged Cluster Operator, but then slowly I started focusing more on monitoring and observability features across the different components. So, the outline for today:
We'll start by seeing how to set up a test environment, then how to test metrics, then move on to alert testing and how to ensure alerts are actionable, relevant, and real. And at the end, I'll show you a small demo of how I put everything together.

So let's start with setting up the test environment. First, we need to understand what the test environment should look like. It needs to be a controlled space where we can simulate the conditions of a production environment, obviously — but without the risk of causing disruptions to our actual users. And in our tests, we'll need to create and delete a lot of resources, remove permissions, cause network problems. So this is where the concept of a disposable local cluster comes into play. A disposable cluster is nothing more than a temporary Kubernetes cluster that we create just for the purpose of testing. The beauty of this approach is that we can spin up a new cluster whenever we need to run a test, and just tear it down once we're finished with it. This ensures that every one of our tests starts with a clean slate, and we don't have to worry about any leftovers from previous tests, or the weird permission changes we might make to test our components. Spinning up a cluster can be done with a lot of tools. Some of them, easy to use and available right now, are, for example, kind (which is Kubernetes in Docker), minikube, MicroK8s, or any other. On KubeVirt, we actually already have a cool tool for this — I literally use it every day at work — which creates a cluster and even has a flag that sets up everything Prometheus-related, so I really like it. In our automated tests nowadays we don't use it, though; we provision full clusters. But I see that more as a bonus for when projects are more mature; small projects that are just starting don't really need it at the beginning.

So, let's start now with metrics and events. Metrics are obviously the eyes of observability, right? They give us insight into how the application is behaving, and they help us diagnose issues. And we'll explore how to test those metrics. I'm putting up the question: are unit tests enough? Obviously it's a dumb question — of course unit tests are not enough. In an ideal world, everything would be fully tested, right? But then again, we can't have everything, and we know time and workforce are limited. So in the end — at least in the beginning — unit tests are kind of enough. In my opinion, it's more important to start with end-to-end tests for alerts. And usually a lot of our alerts already use metrics in their calculations, so if our alerts are working correctly, we have some degree of confidence that the metrics they use are working correctly too, right? Even so, we can start simple, and most of the time the simple things are the ones that end up saving us the most time in the future.

When we talk about metrics, it's very important to first validate that we follow the right naming conventions and have the correct labels — and, if possible, to check that the metrics are prefixed with a component name, so that later on we can easily find out where a metric is being created and more quickly trace it back. And we should have unit tests for the functions that update those metrics, validating at the end of the test that their value was correctly updated, as we'll see later on in the demo.
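As a minimal illustration of that test-the-update pattern: the demo operator is presumably Go, but the same idea with the Prometheus Java simpleclient looks roughly like this — the metric and class names here are made up:

```java
// Sketch: a conventionally named counter plus the "read, act, read again"
// unit-test pattern for functions that update metrics (hypothetical names).
import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Counter;

public class Reconciler {

  // Named by convention: prefixed with the component, counter ending in _total.
  static final Counter RECONCILES = Counter.build()
      .name("testoperator_reconcile_total")
      .help("Number of times the reconcile loop has run.")
      .register(CollectorRegistry.defaultRegistry);

  public void reconcile() {
    // ... the real reconcile logic would go here ...
    RECONCILES.inc(); // count every pass through the loop
  }

  // The unit test is just: read the value, reconcile, read it again.
  public static void main(String[] args) {
    double before = RECONCILES.get();
    new Reconciler().reconcile();
    if (RECONCILES.get() != before + 1.0) {
      throw new AssertionError("reconcile counter was not incremented");
    }
    System.out.println("metric updated correctly");
  }
}
```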
Now let's get to the alerts. One of the main concerns in this area is that we don't want to be flooded with false alarms, or miss critical alerts, right? So we'll discuss how to ensure our alerts are actionable, relevant, and real, and we'll also cover how to configure them correctly and ensure that they react to the appropriate triggering conditions.

This is probably the best-known quote in the area — if you've worked with alerts, you've probably already seen it: every time the pager goes off, I should be able to react with a sense of urgency; I can only react with a sense of urgency a few times a day before I become fatigued; and every page should be actionable. This is taken from the Site Reliability Engineering book, so it's like one of the bibles of observability, right? And we have to be aware that an urgent alert might actually wake someone up in the middle of the night. This is even more important when we're producing software for external clients — we don't know how many people they will have on duty. I actually worked at a small company before where we were two people managing the infrastructure, and we didn't have someone on call at all hours of the day. And since we had clients on the other side of the globe, if an alert fired in the middle of the night, one of us needed to wake up and look at it. So it had better be a real alert.

So how should an alert be configured? It should have an owner: a contact person who is able to quickly understand the problem that created the alert, or the process it refers to. Sometimes it might be the developer of the feature; other times it might be someone from a monitoring team. This is really important, because we don't want to start on something we don't know very well, knock our heads against it, and lose valuable time. And as with metrics, we should be able to quickly identify the component the alert refers to. For example, if you have a Kubernetes operator that is managing, say, resources on IBM Cloud, we might want to have an alert like "IBM Cloud is not available." We want to identify which component created the alert so we can quickly understand where it came from, go navigate the logs there, and understand the problem in more detail. Also a summary and a description — I think those are pretty straightforward — and we should also have a link to a handbook, which we'll see in more detail in the next slides.

For severity, people use different severity levels, but these are the ones we usually recommend. They are useful for distinguishing which actions we should perform for each alert, and for sorting the priorities among them, right? Critical alerts are usually the ones that will page people in the middle of the night. For warning alerts, we sometimes just want to create a ticket to be looked into the next day — they're usually for things like: some component is reaching critical memory usage, and if you don't do anything in a day or two, it will reach a critical state. And for the ones where we don't really perform any immediate action, we also just create a ticket that goes to the bottom of the queue.

About the alert handbooks: I think those are really, really the important part, because they serve as a comprehensive guide for the cluster owners or operators, and they should provide step-by-step instructions on how to handle the specific alert.
As you all know, if you don't provide these handbooks, owners or operators will usually need to go through a lot of documentation pages or their personal notes to understand how to debug and fix the problem, and that always means losing valuable time; usually, losing time also means losing a lot of money. Even with standard documentation, we sometimes force them to rely on memory or improvisation because they handled some related issue in the past, and that also leads to mistakes and more delays.

So, how can we test the alerts? In our tests, we should make sure that all the alerts include all the mandatory fields we mentioned before; that each handbook URL is valid and the handbook actually exists; that the alert includes a reference to the instance or pod, which might be the name of the component, a label on the pod, or even something in the description; and that the alert is triggered when the expected conditions are met. Again, most of these checks are actually very simple, but as with metrics, they might save us a lot of time and headaches in the future.

So now let's put this theory into practice, and I'll try to show you a small demo, which usually goes very badly. To start with, I have here the creation. I don't know if you can see; should I increase the font size? Can you see, or is it better to increase it? It didn't even start; first issue. So, in our project we want to create the metrics. Actually, I should start it. This is a simple operator for Kubernetes; it literally has no logic at all for now, just some simple metrics and alerts to show you how you can start. From there, I used KIND in this case to create a new cluster, and I just installed the operator on it. I'm doing this locally, but you can obviously follow these steps on a cluster you have for testing; it's even possible to do it in GitHub Actions or GitLab. Really simple steps that you can easily do in five or ten minutes. After having the cluster, I then created my metric. I have here a controller label to refer to the operator, and we are now also looking at having stability levels that will allow us to deprecate metrics in the future. All of these labels are really worth putting some time into, because these are the kinds of things that will help you, your team, and your customers. And then, for these metrics, we have here the reconcile loop of the operator. For those of you not familiar with operators: an operator lets us create a custom resource on Kubernetes, and when I create that resource, it runs this reconcile loop to perform any actions. Imagine that, as I said before, I want to work with IBM Cloud: I might want to create a machine on IBM, or a route, something like that, and this is the place where I would do it. And here, whatever the logic is, I want to increment this metric that tells me how many times the reconcile loop was run. As I said before, we can start simple and write the tests for our metrics. In the first step, we can start with easy validation that, for example, the metrics follow the Prometheus conventions: we list all the metrics and then we lint them. Here I'm using the promlint tool, which already brings a lot of validations, checking, for example, that the metrics have all the necessary structure.
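As a rough sketch, calling the linter from a Go test can be as short as this; the client_golang library ships it as the promlint package, and the metrics endpoint URL here is an assumption:

```go
package metrics

import (
	"net/http"
	"testing"

	"github.com/prometheus/client_golang/prometheus/testutil/promlint"
)

func TestMetricsFollowPrometheusConventions(t *testing.T) {
	// Hypothetical: the operator's metrics endpoint, port-forwarded locally.
	resp, err := http.Get("http://localhost:8080/metrics")
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()

	// Lint the exposition text and report every convention violation.
	problems, err := promlint.New(resp.Body).Lint()
	if err != nil {
		t.Fatal(err)
	}
	for _, p := range problems {
		t.Errorf("metric %q: %s", p.Metric, p.Text)
	}
}
```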
For example, counters should have the _total suffix at the end. As I said, this is really important to look at in the beginning, when you start adding metrics. In KubeVirt we have a lot of developers adding metrics, and for a long time there were no validations; this is one component only, and what happened when we added the linter is that we got all these issues: non-histogram, non-summary metrics should not have the _count suffix, and you can see there are a lot of errors in terms of units and so on. Now, we saw before that we are adding stability levels. Why? Because this project is used by clients, and which metrics are clients using? The ones we created before. So we can't just go there and rename the metrics, because they are using them and it would cause them a lot of trouble. We are now thinking that we should deprecate these metrics for two versions, create new metrics with the correct names, warn the customers that the old metrics will no longer be supported, and try to see whether they are still using those metrics so we don't cause them issues; this will take a lot of time. And you saw that it really takes very few lines to call the Prometheus linter and save us from this kind of trouble. But that's life, and that's why we are now exposing these issues to the community and writing some best practices so these problems don't happen again.

Moving forward, here is the unit test for the metric: we get the initial reconcile count, and then we run the reconcile loop. As you saw, the loop is very simple, so it obviously updates the metric; in the future your logic will be much more complex, but the assertion stays the same: in this case, expect the final value to be the initial value plus one. Pretty simple stuff. From there we move on to recording rules. For this project, I created two simple recording rules. The first is the number of operator pods in the cluster, which is simply a Prometheus query counting the pods of that type that are up; the second is the number of ready pods, which sums a metric that has the value one if the pod is ready to be used. You can also notice that we are already trying to use stability levels on the rules and alerts; alpha here is just an example, to show that this is not battle-tested yet and is still being worked on. From these recording rules we build our alerts, and one of the most important alerts to start with is obviously whether the operator is down or not ready. We make use of the recording rules we saw before: if the number of operator pods in the cluster is zero, we trigger the "test operator is down" alert, and if the number of ready pods is less than the number of pods, we trigger the "test operator is not ready" alert; there's a sketch of both right after this. For those, we also have some validations: we lint the recording rules as we did for the metrics, because they should follow the same conventions as before. And then, for example, I want all recording rules to be prefixed with the name of the operator, and the same goes for alerts, which should follow the Prometheus conventions.
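A sketch of what those two recording rules and alerts might look like. The rule and metric names are guesses at the demo's naming, and kube_pod_status_ready is the kube-state-metrics readiness metric:

```yaml
groups:
  - name: test-operator.rules
    rules:
      # Number of operator pods that are up (names are hypothetical).
      - record: test_operator_up_pods
        expr: count(up{pod=~"test-operator-.*"} == 1) or vector(0)
      # Number of ready operator pods: the source metric is 1 when the
      # pod is ready to be used.
      - record: test_operator_ready_pods
        expr: sum(kube_pod_status_ready{condition="true", pod=~"test-operator-.*"}) or vector(0)
  - name: test-operator.alerts
    rules:
      - alert: TestOperatorDown
        expr: test_operator_up_pods == 0
        for: 5m
        labels:
          severity: critical
      - alert: TestOperatorNotReady
        expr: test_operator_ready_pods < test_operator_up_pods
        for: 5m
        labels:
          severity: warning
```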
And actually, here I also put the link I mentioned before: we are trying to add best practices for observability to the Operator SDK, so a lot of these validations come from there. Alerts must be in PascalCase format, they must have an expression, we validate labels, we validate annotations; we're just following the recommendations there. And these are basically the unit tests; I have them here just to check that, as a good developer, I follow the rules I created. This one is for the metrics (it would be very funny if it failed), and this one is for the rules. Pretty simple stuff.

Now moving on to the end-to-end tests, which is what we really want. I already have a cluster here; actually, I see that I have the resource and the pod created, but the run will clear all of that. I'll start it now, because it takes a few minutes, I think. Let's see. For metrics, we are doing the same thing: some setup deploying the operator, then port-forwarding the Prometheus service so we can access it locally, and deleting any previous resources that exist. Our test says that the test operator reconcile count should increase when the reconcile loop is run, so we just get the initial value of the test operator reconcile count and create a new resource. As we saw before in the reconcile loop, any resource that is created is supposed to update the metric, so in the end we know that, eventually, when we fetch the metric, it should equal the initial value plus one. This is a really simple test for metrics, but as I said before, it makes sure everything is working as it's supposed to. It's very similar to the unit test, but in a unit test it's easy to know that the metrics are updated; here it's trickier, because there are a lot of other operations involved: we need to make sure that Kubernetes is actually delivering the right events, that they are caught by our operator, and that the reconcile function is then executed correctly. We have a reconcile count here, but we could have the number of resources created, the number of resources deleted, anything we want.

For alerts, we first have the verification I mentioned for the handbook: we check that the handbook URL is available. I created some gists on GitHub for the purpose of this demo, and I can actually show one, because I copied it from one of our handbooks and cleaned up some stuff; it's useful to know what a handbook should look like. First, we have the meaning of the alert: supposedly this alert fires when no test operator pod is running in the cluster. Then the impact the alert has: sometimes in KubeVirt an operator being down might not have that big an impact, because the operator is not directly responsible for the virtual machines; but if the alert is "virt-controller is down", then users might start to have problems, because virtual machines are running in the cluster and nothing is controlling them. And then we have all the diagnosis steps, and here we should have clear steps so that in the end we understand exactly what the problem is. Moving back to the tests, we now follow the same approach as before: deploying a new test operator and then making sure the "test operator is down" alert is thrown when we want it. In this case, we just scale the deployment down to zero, and since we have no pods in the cluster, we verify that the alert is being triggered; a sketch of that check follows.
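A hedged sketch of that final check: Prometheus exposes pending and firing alerts on its /api/v1/alerts endpoint, so the test can poll it until the alert shows up. The URL assumes the port-forward from the demo, and the structs cover only the fields we read:

```go
package e2e

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Minimal view of Prometheus's /api/v1/alerts response.
type alertsResponse struct {
	Data struct {
		Alerts []struct {
			Labels map[string]string `json:"labels"`
			State  string            `json:"state"` // "pending" or "firing"
		} `json:"alerts"`
	} `json:"data"`
}

// alertIsActive reports whether the named alert is pending or firing.
func alertIsActive(name string) (bool, error) {
	// Assumes Prometheus is port-forwarded to localhost:9090.
	resp, err := http.Get("http://localhost:9090/api/v1/alerts")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	var body alertsResponse
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return false, err
	}
	for _, a := range body.Data.Alerts {
		if a.Labels["alertname"] == name {
			return a.State == "pending" || a.State == "firing", nil
		}
	}
	return false, nil
}

func main() {
	// After scaling the operator deployment to zero, poll for the alert.
	active, err := alertIsActive("TestOperatorDown")
	fmt.Println(active, err)
}
```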
In this case, pending is enough for me. You might want a delay on your alerts, only firing the alert if the condition is met for more than five minutes, for example; but if I see the alert is pending, the condition was triggered and it's just waiting out those five minutes, so I think the alert is working fine and I'm okay with it. For the operator-not-ready case, I just come here and set a random image, which might mean, for example, that the repository is not available, as happened earlier in the other demo. It's the kind of problem we think never happens, but it happens a lot; that's what I want to say. We then validate that the alert is thrown, once again either firing or pending. And as time goes by, we start adding a lot of alerts. For example (oh, it failed), we might want to see whether the operator is creating the right resources on Kubernetes, and for that it needs permissions. One thing we can do is just go and delete the RBAC permissions and see whether the correct alert is triggered, saying that this permission no longer exists and you should look into it. That's one of the things we test. We can also test, for example, whether HTTP requests are failing; that's a metric Prometheus already gives us. The number of things you can alert on is really up to your imagination.

And just to finish, I want to present a really useful tool. We don't actually use it on KubeVirt, but I used it in my master's thesis: ChaosMesh, and it's really cool. It's really simple to use, and it allows you to create a lot of problems in your cluster: deleting pods, causing network issues like high latency or dropped requests, causing CPU and memory pressure. Do really take a look: it's simple to use, simple to configure, and it has a lot of potential for this kind of testing. So, to wrap up, I just want to say that it's really important to add observability features; they really help us. When we start projects we tend to overlook these kinds of things, but they end up being very important later, and it's just as important that they actually work correctly. As I said, we don't want our DevOps people and clients waking up in the middle of the night just to find that the alert wasn't real, and we don't want bad problems happening in the cluster that nobody is alerted about, losing money and losing clients. Those are the main takeaways. So that's it; if you have any questions, feel free.

Yeah, because those metrics are not up to standard. The idea is that when a metric has the deprecated stability level, we will have a flag in its help text saying so. If we keep metrics that we want to replace, we'll have a lot more things to manage in the future, and we really already have a lot of metrics; it's hard to keep up with them. We are also trying to add tools to generate documentation and to centralize metric creation, but there are so many of them: if you have 10 or 20 metrics it's manageable, but when you get to the hundreds, across a lot of components, it becomes a problem.
And maybe, as you said, we cannot remove them in a version or two or three, because those problems happen; but eventually, in the future, I think we need to end up removing them, or it will be unmanageable. Because this is an open source project: people come and go, and if we don't do anything about these metrics, they will eventually be forgotten, and nobody will know why we have this metric plus another similar one. That's why it's really important to validate them in the first place. You might not agree, but that's my opinion. So, any more questions? Please. Sorry? Ah, yeah. So the question was how to create metrics for reconcile errors. That example is very simple, but imagine you hit an error: you usually exit the reconcile loop with an error. Here I just have the test operator metrics, the reconcile count, but I could create a new one, like a reconcile error count, and in our loop, wherever an operation throws an error, I would just increment that new metric. Sorry? Yeah, yeah, you can create any metrics you want. You might want an error for "could not connect to an external provider", for example. You can create metrics at whatever granularity you want, because metrics are really cheap, and you should create the metrics you need to help you debug. But when creating alerts based on those metrics, we should be more careful: some of the errors might not be worth alerting on, because they might come from user configuration, where the operator was not able to create the resource with the properties the user chose. That might be a warning or an info alert for them. So my advice would be: create all the metrics you want; if you think the information is valuable, create the metric. With the alerts, you should be a little more careful. Any more questions? Then I think this is it. Thank you, everyone, for being here.

All right, yeah. Okay. So John works at Red Hat in the storage team, and he will be talking about everyone's favorite language, Rust. Hi everyone, thanks so much for coming today. I just wanted to give a little background on why I put together this talk. Asynchronous programming is something that's emerging in Rust, and when I was learning how to do it, I bumped into a lot of corner cases where it became a little less apparent how to proceed. So while the ergonomics are quite good, I wanted to put together a talk about how to get started from ground zero with asynchronous programming in Rust. The overview of the talk: we're going to start with the state of asynchronous programming in Rust, so basically language-level support, what's there, what's in external libraries, that sort of thing. Then we'll cover asynchronous programming fundamentals, a little bit about the architecture, the syntax, and why that matters, and then we'll do a quick case study of a modified read-write lock that we implemented in storage for our use case. So: the state of asynchronous programming in Rust.
It's designed to handle use cases like asynchronous IO and network operations; in general, anything you would need to do asynchronously, like long-running blocking tasks in the background, the framework is designed to handle. I want to give a couple of definitions, because in asynchronous Rust we talk a lot about tasks and futures. Tasks are kind of like green threads; if you're familiar with Java or Go, you know this concept: multiple green threads can be served on a single OS thread in parallel. Futures are basically the handle you can use to poll and access the final computation, similar to JavaScript promises for those of you familiar with JavaScript. One note: Rust doesn't have the same level of runtime that Go does, so it can't preempt threads the way Go can in its asynchronous implementation. This requires an API design that builds in yield points any time a task can't make progress, and futures also require an executor to drive them to completion; in general, you need some sort of executor to drive the asynchronous work. That's a lot of information, but we'll dive into it in the next slides. The language-level support is really just async and await at this point. async is a keyword you attach to a function, and it turns that function into a future; await allows you to wait on a future to complete, and while it has the same semantics as blocking, it waits in a non-blocking way. In terms of standard library support, a number of data structures have been added to the standard library, and we'll talk about some of these in more detail, but the one I really want to call out is the Future trait. This is the trait that is the building block of asynchronous Rust, and you implement it for all of your future needs. In terms of external library support, we have futures-rs, which provides utilities for working with futures, so combinators for handling multiple futures in parallel, that sort of thing, and it also has an executor, which, as we mentioned, is required for actually executing the futures.
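To ground that, here's a tiny sketch of the two keywords and the trait. The trait definition in the comment is quoted from std::future; the functions are made-up examples:

```rust
// The building block, as defined in the standard library (std::future):
//
//     pub trait Future {
//         type Output;
//         fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
//     }

// `async fn` turns a function into one that returns an anonymous Future...
async fn fetch_number() -> u32 {
    42
}

async fn caller() {
    // ...and `.await` waits for it with blocking-like semantics,
    // but without blocking the OS thread.
    let n = fetch_number().await;
    println!("got {n}");
}
```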
Tokio is a library with more functionality built into it, and it has an executor compatible with futures-rs, so the two can be used together. It provides a runtime for asynchronous programming, and that runtime includes a bunch of async utilities: an IO driver, a scheduler, a network API, synchronization locks. It really aims to cover a lot of the asynchronous use cases you might bump into. I figured I should also call out async-std. It does not have an executor compatible with futures-rs and Tokio, so they generally bump into problems when used together; it aims to provide an API similar to the standard library, but fully asynchronous, and it's becoming more popular for people who want asynchronous standard-library-style operations.

With all of that out of the way, we can talk about the fundamentals of asynchronous programming. First: implementing a future. Consistently, this seems to be the thing people struggle with the most in asynchronous programming. I put up the definition of the Future trait so we can go over it in a bit more detail. The basic idea is that you define what output the future should give, and then you have this poll method. poll is called over and over again as the future is woken up, and you return either Poll::Ready with the result, or Poll::Pending, in which case the future goes back to sleep until it's woken up again and can make more progress. One note: you should never block in poll. That can cause problems, and we'll get into exactly why in a later slide. This is an example of what happens when you're not ready: you have to register this thing called a waker. The Waker was listed on a previous slide as part of the standard library support, and what it does is give you a way to signal to a sleeping future that it's ready to progress, ready for another poll invocation. So when you put the future to sleep, you register the waker somewhere, to be called later to wake up the future, and then you return Poll::Pending. And here's an example of when you would wake a future: we have waker.wake(), and it's pretty simple; you call it when the future is ready to make more progress. Something always needs to call wake when the future reaches a state where it can progress, because otherwise the future sleeps indefinitely and never wakes back up. One of the common difficulties in learning asynchronous programming is where to call wake, so here are a few examples. It could be in a thread spawned in the background. It could be in the drop method of the data type returned from poll: say you return a handle indicating resource acquisition; in its drop method, you could call wake to signal to other sleeping futures that they can progress. The waker could also be sent over a channel to another thread, where a background thread listens for events and determines whether the sleeping tasks are ready to make progress. There are a lot of options, and it's really up to the design of the developer.
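Here is a minimal, self-contained sketch of that whole dance, modeled on the timer example from the official Rust async book rather than the speaker's slides: poll takes a blocking mutex only for its own duration, registers the waker when it can't progress, and a background thread calls wake:

```rust
use std::{
    future::Future,
    pin::Pin,
    sync::{Arc, Mutex},
    task::{Context, Poll, Waker},
    thread,
    time::Duration,
};

/// A future that completes after a background thread sleeps for `duration`.
pub struct TimerFuture {
    shared: Arc<Mutex<Shared>>,
}

struct Shared {
    completed: bool,
    waker: Option<Waker>,
}

impl Future for TimerFuture {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // Blocking mutex is OK here: it is only held for the length of poll.
        let mut shared = self.shared.lock().unwrap();
        if shared.completed {
            Poll::Ready(())
        } else {
            // Register the waker so the background thread can wake us later.
            shared.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

impl TimerFuture {
    pub fn new(duration: Duration) -> Self {
        let shared = Arc::new(Mutex::new(Shared { completed: false, waker: None }));
        let thread_shared = shared.clone();
        // Note: creating the future does not start polling it; an executor
        // must drive it. This helper thread, however, starts immediately.
        thread::spawn(move || {
            thread::sleep(duration);
            let mut s = thread_shared.lock().unwrap();
            s.completed = true;
            if let Some(waker) = s.waker.take() {
                waker.wake(); // signal the executor to poll us again
            }
        });
        TimerFuture { shared }
    }
}
```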
Some important notes on futures. Creating a future does not start its execution; a lot of people expect it to, but it doesn't. It must be either polled or awaited to start executing. However, tokio::spawn will immediately begin execution in the background, so that's one case where it starts immediately. poll also needs to handle wakeups where no progress can be made: you might expect poll to only handle cases where waker.wake() has been called explicitly, but there are scenarios where a future is woken up due to side effects and no progress can be made, so your poll implementation always needs to be able to handle that.

We can now move on to Tokio runtime basics. The runtime contains an executor and scheduler; that's where a lot of the IO functionality, like the epoll-based machinery you get asynchronously in Tokio, is built in. It has multiple runtime backends: you can choose a single-threaded backend, where all tasks are interleaved with each other on a single thread, or a multi-threaded backend for true parallelism. Futures are spawned as tasks, multiple futures can make up a task, and multiple tasks can be served on a single thread, so the end result is that blocking in a future can stop other futures from progressing. It probably makes a bit more sense now why you should not block in poll or in an async function: it could stop another future that is ready to progress from progressing. Two notes on the blocking aspect. First, async functions can use blocking mutexes as long as they are not held across awaits: you cannot acquire a mutex, do an await, and then release the mutex; but as long as you only acquire the mutex for a single statement that doesn't cross an await, blocking mutexes are okay to use. Second, blocking mutexes are actually common in poll implementations, but they can only be held for the execution time of poll. The thought process is that if you acquire a mutex and drop it as soon as poll completes, and poll is supposed to be very short-running, then holding it only for that scope is equivalent to non-blocking from the standpoint of asynchronous programming. There are also two types of threads: blocking threads and core threads. Core threads are the OS threads we've been talking about, which can serve multiple asynchronous tasks each. Blocking threads are a special case, used in areas where you cannot get around blocking: a single task is assigned to the blocking thread, so it can block without stopping any other task on that thread from progressing. We'll get into blocking threads a little more, but that's one way you can block in asynchronous programming. I also want to call out that the Tokio documentation is very thorough, so please take a look at the glossary and the more advanced topics in the tutorial; they'll give you more of a working knowledge of the terminology and the kind of code you can write with asynchronous programming.
We're now going to quickly go through syntax and common usage. We've already covered async in a way, but there's also spawn, which we referenced. The way to think of spawn: we have a long network operation, we call spawn on that asynchronous task, it spawns in the background and starts executing immediately, we do all the other work we have to do before we can handle the response, and at the end we await the handle and handle the response. That's a great example of a basic asynchronous workflow. spawn_blocking is very similar, but it's for blocking operations: it spawns the task on a blocking thread, so you can do all of your blocking there. It's a similar API: you spawn the blocking network operation, get the handle, do other work, then await and handle the response. block_on is kind of the inverse of spawn_blocking: where spawn_blocking calls a blocking function from an asynchronous context, block_on allows you to call asynchronous code from a blocking context. In the example, we want to make a blocking call and then evaluate a future, assuming we're calling the function from a context where it is safe to block: we make the blocking call, get the asynchronous API, and then block_on it so we block until it's ready. Because you can't use await outside an asynchronous function, you need block_on to evaluate the asynchronous code, blocking until it's complete. join_all is a great example of spinning up a bunch of futures in parallel and waiting for all of them to complete: we have a vector with an arbitrary number of futures, we pass it into join_all and await it, and when they're all done, we iterate through the results and handle them. select turns that on its head: we have a couple of futures, we wait until the first one finishes evaluating, handle the result of whichever future evaluated first, and cancel all the rest. So where join_all waits for all of the futures to complete, select waits until the first future completes. Hopefully that gives you an overview of the basic syntax and the common functions used to handle futures; a sketch of these patterns follows.
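A compact sketch of those patterns, assuming the tokio crate (with full features) and the futures crate; the workloads are placeholders:

```rust
use futures::future::join_all;
use std::time::Duration;

async fn work(n: u64) -> u64 {
    // Placeholder async workload.
    tokio::time::sleep(Duration::from_millis(n)).await;
    n
}

#[tokio::main]
async fn main() {
    // spawn: starts executing in the background immediately.
    let handle = tokio::spawn(work(5));
    // ... do other work here ...
    let first = handle.await.unwrap();

    // spawn_blocking: run a blocking closure on a dedicated blocking thread.
    let blocked = tokio::task::spawn_blocking(|| {
        std::thread::sleep(Duration::from_millis(5)); // OK on a blocking thread
        42
    })
    .await
    .unwrap();

    // join_all: wait for an arbitrary number of futures to all complete.
    let futures: Vec<_> = (1..=3).map(work).collect();
    let all = join_all(futures).await;

    // select: race futures, keep the first result, cancel the rest.
    let winner = tokio::select! {
        v = work(10) => v,
        v = work(20) => v,
    };

    // block_on is the inverse direction: from a *blocking* (non-async)
    // context you could drive a future to completion with, for example,
    // futures::executor::block_on(work(1)).
    println!("{first} {blocked} {all:?} {winner}");
}
```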
Now we're going to dive into a case study of a modified read-write lock that we had to implement on our team. The use case is basically that we had a table of data structures that all need to be locked independently, while the IPC mechanism we're using requires fetching read-only information from all the data structures at once. We initially considered locking each data structure individually, but that scales linearly in lock-acquisition time, and we began to notice a slowdown when fetching these properties, so we ultimately decided we needed to design a synchronization lock that handled our particular use case. We need to handle a single read lock on an individual element, a single write lock on an individual element, an all-read lock on all elements, and an all-write lock on all elements.

Let's dive into a little code. I've highlighted the important sections: we have our all-or-some lock, and inside it a lock record, which keeps the state of what is locked at any given moment, and the table inside a mutex, which keeps track of all the individual data structures we want to lock. We have an all-read lock field, which is an integer because we can have an arbitrary number of all-read locks at the same time, as long as they don't conflict with an all-write lock, for example; we can only have one all-write lock at a time. We can have as many single read locks on an element at a time as we want, mapped to which element is locked, and we can have one write lock on each individual element, so we keep track of which element is locked with a count of exactly one. As you might notice if you're familiar with Rust, this upholds the mutable/immutable reference requirement: everything can be read-locked as many times as we want, or write-locked exactly once, so it should look very similar to references in Rust.

Next, the implementation. We're going to dive into acquiring a single read lock on an element, and as you can see, we basically just create a future and then await it; it's a great example of the async/await workflow: create a future, await it, and that's really all this does. It gets more interesting when we get to the future we're awaiting, the some-read future. I've highlighted the poll definition here, so we're seeing a more complicated example of implementing the poll method for the Future trait. I've also highlighted the acquisition of the mutex, mainly because it's a great example: we're using a blocking mutex here, and because it's only acquired for the length of poll, that's okay. Next, we check whether the UUID or name is registered in our table of data structures. If we determine we don't have anything with that name or UUID, we return that we're ready and that we found nothing; but above that, we also call wake. This is important because, even though we've determined we're done with our work, anything that might conflict with this read lock needs to be woken, because it can now make progress: we've finished our read acquisition, so we have to wake up everything that was put to sleep in the meantime. If we do find something with this name or UUID, but it conflicts with another lock that's already acquired, as you can see on the top right of the next slide, we add a waiter: we store the waker so we can later call wake on it for the task to make future progress, and we return Poll::Pending, which puts the task to sleep. And if we determine, in the else block below, that there are no conflicting locks, we register that we have acquired a single read lock on this element, get a reference to that element using unsafe code, and put it in this some-lock read guard, which some of you might recognize if you've worked with mutexes in Rust before: it's essentially a data structure indicating the lifetime of this lock acquisition, and we can access the individual data structure through the guard.

One thing I want to call out about futures is that dropping them is generally considered canceling them, and in this case we have to augment the drop method a little. If you're just canceling a future and there's no state stored in an intermediate place, you can simply drop the future and it's canceled; but here we're storing things like what's waiting for the lock in our all-or-some lock, so we have a cancel method to clear out all of that intermediate state when the future is dropped. If we dropped the future without clearing that state, we could end up in a place where the lock can no longer service any future requests. So always be careful about the intermediate state you leave in your asynchronous infrastructure. As for the some-lock read guard: as I mentioned, it indicates the lifetime of the lock acquisition, and it's a great example of one of the places I said you could call wake, returning a guard from poll and then calling wake in its drop method. What we do there is remove our record of the read lock and then call wake, basically saying: we're no longer holding the read lock, so wake up all the lock tasks that conflict with it, because they can now make progress and potentially acquire the lock.
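As a simplified sketch of that last point (not the team's actual code, which tracks per-element state): a guard whose Drop removes the lock record and wakes every stored waiter:

```rust
use std::sync::{Arc, Mutex};
use std::task::Waker;

// Simplified stand-in for the talk's lock state: just a reader count
// plus the wakers of tasks that went to sleep waiting for the lock.
struct LockState {
    readers: usize,
    waiters: Vec<Waker>,
}

// Simplified stand-in for SomeLockReadGuard: its lifetime is the
// lifetime of the read acquisition.
struct ReadGuard {
    state: Arc<Mutex<LockState>>,
}

impl Drop for ReadGuard {
    fn drop(&mut self) {
        let mut s = self.state.lock().unwrap();
        // Remove our record of the read lock...
        s.readers -= 1;
        // ...and wake everything that was put to sleep in the meantime,
        // so conflicting lock requests get re-polled and can try again.
        for waker in s.waiters.drain(..) {
            waker.wake();
        }
    }
}
```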
That was a lot of information, so I want to go through a quick series of closing thoughts about asynchronous Rust, because as you can see, the design definitely requires some knowledge of the implementation to do it right. Some of the benefits: the highly parallel nature of asynchronous Rust allows very good performance on very few threads. In general, if you implement a similar workflow without async, you'll usually be required to have many more threads handling the same workload; with asynchronous Rust, those tasks can execute on fewer threads, and you get an ergonomic way to interface with that. It also has support for many types of combinators and asynchronous workflows that are already built into a lot of existing libraries, so you don't have to build everything yourself. The design and documentation are actually very good, so in general I'm usually able to find answers to my questions, and beyond that, the community is very helpful. And Rust's promise of fearless concurrency also extends to async code: you get all the benefits of type checking, lifetime checking, all of that, in asynchronous code as well. Some drawbacks I want to mention: asynchronous programming requires care. As you've seen, it can be very detailed, and it takes a lot of thinking about the architecture, where you're blocking and when you're blocking. An OS-thread architecture can be used if your requirements don't necessitate async; in general it's not bad to just use OS threads, but if you're looking for performance, you probably want to go toward the asynchronous side of things. Lifetimes can also be harder to manage in asynchronous code: with multiple threads operating, a lot of the time a lifetime basically has to be 'static, so managing lifetimes gets a little harder. And deadlocks and hangs can be harder to debug; there are tools, like Tokio's tracing support, to help you out, but it can be a little difficult.
So with all of that said, thanks for attending. If you're interested in looking into the implementation a little more, I've linked the repository where our modified read-write lock lives. Have a great rest of your time at the conference. So the question is whether I think asynchronous Rust could help in kernel space with some IO problems that have popped up. I've heard a lot of discussions; I'm also involved with the effort to put Rust in the Linux kernel, and in general I've heard a lot of people being very hopeful about asynchronous Rust in the Linux kernel. One of the things I've seen work happening on is building an executor that works in kernel space: as I mentioned in the talk, one of the major challenges is that you need an executor to actually run the asynchronous tasks, because of the architecture, so there has to be some work on an executor that works in kernel space. That may be merged by now; I'm a little out of date on my information, but that's one of the things that's popping up. So the question is whether an executor can work without an operating system, and in general, yes, there are executors that work without an operating system; I've heard of people implementing executors in even lower-level situations than kernels. It is possible: Rust is a pretty low-level language, so as long as you satisfy the executor requirement and the trait requirements, you can implement your executor wherever you want. The problem gets more complicated in environments that don't actually have an allocator; that's where you have to get a little smarter about how you design your executor. Okay, I think that's it. Thanks so much.

Maybe this, this, this. One, two; yeah, it was muted here. Okay, so you should be able to hear me, hopefully, well enough. Hello, my name is Martin Stefanko, and I work at Red Hat; not on the Quarkus team, but I work mostly on Quarkus. You can find me on these links, and please do use them if you have any questions about Quarkus after the talk; I usually respond on all of these channels, so please reach out. I want to point out this code at the bottom: it's a code for 35% off any Manning product, because I am writing a book called Quarkus in Action, which you can already find in the early access program with six chapters published. So hopefully I know a thing or two about Quarkus, and hopefully you will enjoy this talk. I really don't have any slides; this is the closest thing to slides you will see today. I will open the terminal and start typing, so please, if you have any questions about what I'm doing, just shout; if you want to see something, just shout. We're here to make this interactive, to give you as much out of it as I can. I will show you this slide again at the end in case you'd like to look things up. So, tell me when this is big enough that everyone can see, even in the back. Okay, so: the idea is how to start with Quarkus. You probably know there is quarkus.io, which is the site where you can find all the information related to Quarkus. Sorry that I don't see you through the monitor; you'll need to deal with that.
Basically, if you want to start with Quarkus, the easiest thing, if you prefer clicking to coding, is code.quarkus.io. Maybe I should ask: how many Spring Boot or Spring developers are here? Raise your hands. Great: start.spring.io, code.quarkus.io, exactly the same thing. In Spring Boot you have starters; in Quarkus we call them extensions. You can see the list here; you probably can't tell how big it is from the scrollbar, but I can click "show more extensions" and it gets even smaller. It takes a while to scroll through even on the computer; there are around, I think, 1,000 supported extensions. Quarkus itself contains a lot of these extensions directly integrated into the Quarkus platform. However, because the community demand is bigger than what we at Red Hat can maintain in one single GitHub repository, we created the so-called Quarkiverse organization, which you can find on GitHub. Quarkiverse is basically another set of extensions done by the community. Quarkus will give you a repository for your extension if you want to maintain it, with the whole CI/CD integration into the Quarkus platform, so with every Quarkus release your extension is tested and you can still be sure it compiles against Quarkus. However, it lives in your repository, under the Quarkiverse organization, and you maintain the code. In this sense, if you have a very cool library, you can get it integrated into the Quarkus ecosystem and still maintain the integration yourself. And as you can see, there is a whole bunch of these repositories done by the community, because we just cannot maintain it all; I don't want to spend any more time compiling Quarkus when I need to work on my extension, more or less. Quarkus also comes with a command-line interface, which is basically a CLI wrapping everything you do on the command line with Quarkus, and one of those things is create app. Because I don't like to click, I can just type quarkus create app with my group ID, and let's go with devconf-2023 as the name. Sorry about that typo. What this does is generate a Quarkus application with very basic stuff out of the box: REST support, the Maven wrapper, the Maven config, and some Dockerfiles. That's if you don't specify anything; with create app you can also specify any extensions you'd like to add to the generated project with -x. I could start typing extension names here, but I won't. The idea is that some of these extensions come with so-called code starts, which is some simple code to give you an idea of what you are integrating. This is typically very good for people who don't know what they're doing; let's open this in the IDE. But if you know what you are doing, it's typically just stuff you need to delete. Out of the box it added the RESTEasy Reactive extension. Why is this so slow? Yeah, sorry about that; give me a second, I need to remember the shortcut. So with this we get a JAX-RS resource generated with a sample string; it's just some simple code to remind you of the API you are integrating.
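For reference, the CLI invocations from this part of the demo look roughly like this; the group and artifact IDs are placeholders, not the ones typed on stage:

```shell
# Generate a new application (REST support, Maven wrapper, Dockerfiles, ...).
quarkus create app org.acme:devconf-2023

# Extensions can also be added at creation time with -x:
quarkus create app org.acme:devconf-2023 -x resteasy-reactive
```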
What a Quarkus extension actually is, is only a Maven dependency: this io.quarkus group ID and quarkus-dash-name-of-the-extension. A Quarkus extension is basically the integration layer for any library you are putting into Quarkus. Why would you want to use a Quarkus extension and not the library directly? First, Quarkus is built around a few concepts which we don't have time to dive into in 30 minutes (please catch me after the session if you are interested), and you have access to them inside the extension API: for instance, build-time versus runtime processing of application code, and compilation to native, so adding custom flags for GraalVM, et cetera. This is why, if there is a Quarkus extension, please use it. Otherwise, in JVM mode, if you are running Quarkus on the JVM, you can use anything you like from the Java ecosystem; in JVM mode it's still your same old, but brand shiny new, Java application. If you want to compile to native with GraalVM, which is a separate use case, then you probably want to stick to extensions, because the extensions give you the guarantee that you only type -Dnative or -Pnative somewhere down here and it will compile to a GraalVM native binary; the Quarkus extension is responsible for making that happen all the time. If you have a custom library without that integration, it might cause some problems; however, you still have access to some APIs in Quarkus itself to mitigate this if you need to. Okay. The first thing you will always do with a Quarkus project when you open it is so-called quarkus dev. Just to mention: if you don't want to use the Quarkus CLI, all this does in the background is call quarkus:dev, the goal of the Quarkus Maven plugin, which is directly integrated here, somewhere. The benefit of using the CLI is that it shields you from the actual build tool you are using in the background; you can also use Gradle, which is not as unpopular as many people think. Since I sometimes work on Maven projects and sometimes on Gradle projects, I prefer the Quarkus CLI, because it always runs the correct invocation in the background. So what is quarkus dev, or Quarkus's dev mode? This is single-handedly the most productive feature I got when I started with Quarkus development. It runs your application in a continuous process and dynamically recompiles any changes you make to it. And when I say any changes: let me just verify that I can call the generated hello endpoint. Now my application responds from the greeting resource, "Hello from RESTEasy Reactive". We are at DevConf, so let's change that; I just save the file and repeat the same call. The class was dynamically recompiled in the background; it was exchanged with some class loaders, et cetera, but in general you have a new version of your application in a few milliseconds, like 800 milliseconds. So you can imagine the workflow: these terminals are on one monitor, the IDE is on another one, and I am just continuously typing code and refreshing on one terminal, and my application is running for the whole day.
Typically, really, if you are not doing something crazy (again, I work on Quarkus, so I need to do crazy things and restart it a few times), if you are just writing your application, you don't need to restart this for the whole day. And basically, this is the first really nice feature; in 30 minutes I don't have that much time to show you, so let's go from here. Here at the bottom you see that you can press a few buttons inside this terminal and something happens; H will give me all the options I can use inside this dev mode terminal. Now, probably the second best feature you get in Quarkus is continuous testing. Since we are already recompiling your application, why not run all the tests directly with it? If I type R now, Quarkus will start continuous testing and run my tests in the background, and of course one failed, because I changed something that the test asserts: it expects "Hello from RESTEasy" and I now have "Ahoj from RESTEasy". So let me go into the test itself; actually, let me do it like this: I will move this terminal into the same workspace so you can see some magic. Inside that test I fix the expected string to "Ahoj", I just save the file, and you can see that dev mode automatically recognizes that I saved the test, reruns it, and now my test is passing. Nice. For me, this was impressive when I saw it the first time. So you get the idea: you are basically just typing your stuff with continuous testing enabled, and by just saving a file, all your tests are automatically rerun in the background. If I don't see something red jumping in the terminal in my peripheral vision, I'm not even looking at it; I'm just typing continuously. One hack I learned myself: if you are familiar with Spring's Spring Boot Test annotation, the Quarkus Test annotation will similarly run a new instance of Quarkus for your tests. But dev mode is still a runnable application which you can call normally. So if my test isn't doing something that really changes state inside the application in a way that would interfere with what I'm doing in dev mode, I can just comment out that @QuarkusTest annotation, and then I'm pinging the application in dev mode directly: you saw the test took about 300 milliseconds when starting a new instance, but if I rerun it now with R, it only takes 18 milliseconds, because it's just an HTTP invocation against dev mode. So in a few milliseconds I can run hundreds of tests against dev mode, as long as I'm testing something that doesn't change state, or changes it in a way that lets me continue what I'm working on. And of course, when needed, you just uncomment the annotation and it starts the new instance against which you are testing; that takes 300 milliseconds, which can go up depending on the test, but you get the idea.
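For context, the generated test the demo edits looks roughly like this. It is RestAssured-based; the "Ahoj" string matches the change made on stage, the rest is the standard Quarkus scaffold:

```java
package org.acme;

import static io.restassured.RestAssured.given;
import static org.hamcrest.CoreMatchers.is;

import io.quarkus.test.junit.QuarkusTest;
import org.junit.jupiter.api.Test;

// Commenting out @QuarkusTest (the trick from the demo) makes RestAssured
// hit the already-running dev mode instance instead of booting a new one.
@QuarkusTest
class GreetingResourceTest {

    @Test
    void testHelloEndpoint() {
        given()
            .when().get("/hello")
            .then()
            .statusCode(200)
            .body(is("Ahoj from RESTEasy Reactive"));
    }
}
```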
So I put it back here and press H again. What else can you do with continuous testing? Because the question will definitely come: "I have a bunch of tests and this takes time." You can play with the options; for example, you can set it to run only the failed tests, and Quarkus will also try its best to guess what kind of code you are changing, so it doesn't rerun all the tests all the time. It tries to rerun only the set of tests relevant to the code you are actually changing in that iteration of dev mode. Of course, this is still software, so it's a guess, but typically it works pretty well; or you can just toggle the broken-only mode, which reruns only the tests that failed. So the question is whether this runs only unit tests, or integration tests too. This actually includes integration tests, because with that @QuarkusTest annotation you are running a new instance; and yes, it will also run Dev Services, which we will get to. It runs everything in the background, continuously. Okay, I don't want to jump ahead; we will get to Dev Services, definitely. Another cool feature you can play with here is the Dev UI. By pressing D inside this terminal (you maybe can't see it, but at the bottom my browser opened a new tab), you get a clickable development interface inside your dev mode, where each extension can add whatever they want. ArC, for instance, which is our CDI container responsible for dependency injection: you can just click to list all the beans inside your application, like my greeting resource. Or you can see these very nice removed beans, or removed components, because as I mentioned, Quarkus does a lot of stuff at build time. Of course, in dev mode, build time and runtime are the same thing, because we are building the application continuously; but if I ran a clean package and actually compiled my application, Quarkus would remove 65 CDI beans that are never going to be used in the final produced jar, the final produced application. That's 65 fewer beans, which means less memory and less processing at runtime, just to give you the idea. Okay, extensions themselves: we will get to it. So I am here in the terminal on the right side, in the same directory, and I can run quarkus ext, which is short for extension, and it will print all the extensions I currently have installed in my application. Which is not totally true; let me restart Quarkus here, so you don't think I am telling you something wrong. You can see here that I have CDI, SmallRye Context Propagation, and Vert.x, which are extensions that are always included in Quarkus, because ArC, for instance, is an integral component of Quarkus itself: the Quarkus internals are wired through that ArC container too. Vert.x, for instance, is a reactive set of libraries that is basically the core of Quarkus; everything in Quarkus is reactive, and we could get into that. This is why RESTEasy Reactive is the only optional one, which we added on top when we generated the default application. If you type -i here, which is for installable, it will list all the possible extensions you can add to the Quarkus application, which is a very long list, because we have only one extension. You can play with it, but we don't have time: there are categories, you can search for particular keywords inside the terminal itself, or if you are lost, you can search on code.quarkus.io, or, what I actually prefer, on quarkus.io, because typically, if I don't know what I'm looking for, I need to find the documentation for how to use it.
So for instance, when I was trying to learn the cache extension, I could just search the documentation, which is the official Quarkus documentation consisting of a set of guides: step-by-step instructions, typically ranging from 15 to 30 minutes, some even 60 minutes, on a particular extension: how to add it to your project, how to configure it, how to use it. After those 15 minutes, typically, if I go into the cache guide, I have the idea of how to add it to my project and how to use it inside my project; you get the idea. I personally really like this kind of documentation, because in an interactive way you really create an application that you can continue experimenting with after you finish the guide. So typically, if I don't know what I'm looking for, I go to the Quarkus guides. So, if I want to add an extension, there is a command directly here, or you can just add it to your application; since we are using the command line, I will use it here. I will add, for instance, the one that I am maintaining, which is SmallRye Health, and I want to show you some magic. You see that just by adding an extension, Quarkus also restarted, and now I have the new extension installed in my application. And this extension actually comes with an API too: /q/health. So now I have a new API exposed. Before I remove it, I will show you what quarkus ext add is actually doing: it just adds the new quarkus-smallrye-health dependency into my pom.xml, but this is enough for the dev mode to restart, and you have the extension installed. So again, without stopping the continuously running dev mode, I am just experimenting in any style I like. I can remove it, and hopefully this will just remove it; after the dev mode restarts, I don't have that endpoint anymore, because I removed the extension that was adding it. The funny part is: let's take that cache extension, because I don't remember any other extension. I will just copy-paste this, because I don't think I have the latest version of the cache extension locally on my computer. And yes, you see that even though I don't have it on my computer, Quarkus dynamically downloads it from Maven and it still works, again without stopping dev mode for the duration of what you are doing. Okay, let's remove this. Some more fun stuff. One of the very good features of Quarkus is the framework for Hibernate ORM, for database access, which is called Hibernate ORM with Panache. It's a framework done by the Hibernate team, it's not plain JPA, and it's the preferred way to do database access in Quarkus. With Panache you have basically two options for how to implement the entities: the active record pattern and the repository pattern. So if you are coming from Spring, you can do very similar repositories in Quarkus. But what I want to do, because I can just add the dependencies directly into my pom.xml, save the file, and Quarkus will still restart in the background, is show you that what I basically added is just Hibernate ORM with Panache, and a JDBC driver for Postgres.
Just because, and this feature that I'm going to explain now is called Dev Services, just because Quarkus sees that I added a driver for Postgres, I probably want to connect my application to Postgres, but I didn't configure any connection to Postgres, Quarkus goes ahead and starts a new Dev Services container for Postgres for me. And this actually started a Postgres container in the background. If I press C here, you will see that it started a new "inspiring_gagarin" container for me (that's the random container name). And it automatically injected configuration that connects to that container. So now my dev mode application is actually connected to a real Postgres database running in a container, which is tied to the run of the dev mode. If I kill the dev mode now, it will stop the database in the background. And this feature is Dev Services. And now I can start typing: public class Avenger extends PanacheEntity, and maybe I need to reload, maybe. Come on, Panache. Oh, come on, PanacheEntity. This needs to be an entity. What I'm typing here is the active record pattern: we encapsulate all database operations in the entity class itself. So I can just type my properties as public fields: public String name, public String civilName, and public boolean snapped. And this is everything I need to do for my entity. The ID is generated, because this PanacheEntity superclass contains the ID generation, so I don't need anything else; of course, if you extend PanacheEntityBase instead, you can customize it, but you don't need to. What is happening? Go back. And because Quarkus does a lot of processing at build time, we can, well, not force you, but tell you to use public fields, because if you have private fields with public getters and setters which just get and set, it's effectively a public field anyway. What Quarkus does at compile time, because it can also generate bytecode, is first generate getters and setters for you in this class, and of course a default constructor, equals, hashCode, et cetera. So it's a valid entity. And it will dynamically rewrite any accesses to these public fields in other classes to use the newly generated getters and setters, because it can transform bytecode. If I want to use it inside my resource, come on, come on, @GET. Let's put it at some path, and this will return a list of, come on, a List of Avengers. What is happening now, computer? listAll, and a lot of exceptions. Because everything is now encapsulated inside the entity itself, "select stuff from Avenger" is just listAll. You get a bunch of static methods directly on the entity. And you also have create: it will return an Avenger and take an Avenger, if I can type Avenger; this is not autocompleted. Of course, we still need to handle the transaction, but what I can do is just call methods on the Avenger itself which encapsulate the entity manager operations. So if I want to persist something, it's just avenger.persist(), and I can then return the Avenger. We don't need to play with this that much, but you can imagine that, for instance, that persist operation will actually do the persistence into the database, so you get the generated ID, et cetera. I don't have that much time to show everything. So hopefully, and I need to move this because I want to have this on the /avenger path, and we will move hello to /hello, so we will get, let's do /avenger.
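To make the live-coded part easier to follow, here is a minimal sketch of roughly what ends up on screen. The class and field names (Avenger, civilName, snapped) come from the demo; the exact resource shape is a reconstruction, and it assumes Quarkus 3's jakarta.* namespace (earlier versions used javax.*). The two classes are shown together for brevity; normally they live in separate files.

```java
import java.util.List;
import jakarta.persistence.Entity;
import jakarta.transaction.Transactional;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
import io.quarkus.hibernate.orm.panache.PanacheEntity;

// Active record pattern: the entity itself carries the persistence operations.
@Entity
public class Avenger extends PanacheEntity { // PanacheEntity supplies the generated id
    public String name;      // public fields; Quarkus generates accessors at build time
    public String civilName;
    public boolean snapped;
}

@Path("/avenger")
class AvengerResource {
    @GET
    public List<Avenger> getAll() {
        return Avenger.listAll(); // static helper inherited from PanacheEntity
    }

    @POST
    @Transactional // persisting needs a transaction
    public Avenger create(Avenger avenger) {
        avenger.persist(); // entity-manager operation encapsulated on the entity
        return avenger;    // id is populated after persist
    }
}
```

If you prefer the repository pattern he mentions, the same operations live on a PanacheRepository<Avenger> bean instead, which feels closer to Spring Data repositories.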
So if I now call 8080/avenger, hopefully I should get a list of Avengers. Yes, I am getting a list of Avengers, which is empty, but with Panache, and all of this you can do inside the dev mode, you can create a file in src/main/resources called import.sql, and I hope that I have an Avengers import; I have, where you can place any SQL that you want executed when Panache bootstraps. So hopefully, if it starts now, yes, I am getting a bunch of Avengers, but I don't have the correct extension, and this is the one place where it sometimes breaks, so wish me luck... and it broke. So this is the one place where, because I am swapping the extension which does the encapsulation in the background, thanks to the RESTEasy Reactive extension (RESTEasy Reactive Jackson just adds Jackson support on top of RESTEasy Reactive), I need to restart the dev mode. But at least I can show you that because I killed the dev mode, it also killed my Postgres database, and now a new Postgres database is starting for me because I am restarting the dev mode again, so I have a new Postgres for the new run. And this time, hopefully, I am getting back Avengers in JSON. Okay, so Dev Services is basically automatic management of Testcontainers in the background, so this works out of the box, and because I don't need to remember the list, I can just type "dev services" here. This works out of the box for a bunch of stuff, and of course we keep adding to it: it works for all the databases, like Postgres, MySQL, I don't know what else is supported, MongoDB, as you can see here, but it also works for Keycloak, for instance. If you are working with OIDC, it will automatically start Keycloak for you in dev mode; for AMQ or Kafka, it will start a Redpanda broker in the background. So it's very streamlined development where you don't need to care about the infrastructure you depend on; it is automatically managed in the background for you. And these Dev Services are not limited to dev mode: if you write any test that needs to access the database, that @QuarkusTest will start a new Postgres container just for that test execution. So every time you run tests, if I ran continuous testing and was persisting something to a database, it would run another Postgres database, used just for the testing. So no more developing against Postgres or MySQL and testing with H2. Now you are testing with a real database. Go ahead. Yeah, it would start a new Postgres just for that test execution. Okay, so one last thing, because we don't have that much time, but if you are interested, I have my computer, we can sit somewhere after the talk and I can show you all the stuff. One other thing that I personally like is this REST Data extension. Thank you. This is exactly what I can do in five minutes. This is basically an extension which generates REST resources based on your entities, because, as I learned, 95% of application development in Java is CRUD over a database. So what I can pick here, because I already have JDBC and RESTEasy Reactive Jackson, is this Hibernate ORM REST Data with Panache extension: I can just copy-paste it into my application somewhere, if I can find it, and save this. And all I need now, because I always forget this when I'm talking, is to copy-paste that PanacheEntityResource, okay.
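What gets created next is tiny, so here is a sketch of it. The interface name and type parameters follow the demo (the Avenger entity sketched earlier, and the Long id that PanacheEntity provides); treat it as a reconstruction rather than the exact on-screen code.

```java
import io.quarkus.hibernate.orm.rest.data.panache.PanacheEntityResource;

// REST Data with Panache generates a full CRUD JAX-RS resource for the
// entity at build time; the interface body stays empty.
public interface AvengerResource extends PanacheEntityResource<Avenger, Long> {
}
```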
I can just create an interface, call it AvengerResource: an interface that extends PanacheEntityResource, parameterized with my entity and the type of its ID. Again, I need to give the IDE a hint that I have a new extension, and that's it. I think that's it. So still nothing? Okay, now it works. I will remove this Avenger endpoint here on purpose, and I will remove this here, so we no longer have it, okay. At least I can show you that if I save the file while it's still not compiling, the dev mode will not crash. It will just print the error and give you an opportunity to fix it. So what did I miss? Oh, this looks good. Good? No. AvengerResource.java. Oh, what am I missing? This is like Java: cannot find symbol. Oh, I know what is happening. This will probably not work because I don't have that extension downloaded, because, kind of on purpose, Quarkus will always do a release like one day before I give this talk, and I will run into these kinds of issues. So give it a second, but basically what this does is create that CRUD-operation REST resource over the entity that I pass there, out of the box. So now I should still have that /avenger endpoint, but now it's coming from the generated resource, and you don't need to trust me, because, so, SmallRye OpenAPI: I will just add a new extension, which I also probably don't have downloaded. Come on. My computer is somehow slow today. Sorry about that. Come on. Okay, any questions? Until my phone internet catches up, go ahead. There is a separate extension, several extensions actually, for database migration. We can talk about it later. Now it's taking a little more time, because it needs to restart the database when I'm adding a new extension, because it's a broader change, and I'm out of time. Okay, so thank you. I will just show the last thing. Just so you trust me: SmallRye OpenAPI, come on. Where are you? Where is OpenAPI? Oh, come on, I cannot find it. Okay, I cannot see it here. Okay, let me try one more time. Okay, I don't know what is happening. So I will show it after the talk, because we are out of time. But basically, you need to... wait a second, give me two minutes. You need to believe me that it basically generated a new JAX-RS resource with all CRUD operations over the database, exactly the same way I was doing it with that greeting resource; it just does it out of the box. So basically, CRUD over the database is two classes, because I don't need the greeting resource. So now it's over. Thank you for your attention. Can you hear us? Perfect. Okay, hi everyone. Thank you for attending this talk, which will be about securing your Python project supply chain. So a few words about ourselves. My name is Maya; I'm a software engineer at Red Hat in the emerging technologies security team. And hi, my name is Fridolin. I used to work at Red Hat, I used to work at Datadog, and I'm an entrepreneur now. You can find more information about us on Twitter, Mastodon, or GitHub. Okay, so let us start this talk with a simple question, which is why protecting your supply chain actually matters. If you follow open source security, or maybe Python news, you might have seen this item go by: not too long ago, PyPI maintainers decided to temporarily deactivate user registration on the index, as well as the upload of new packages and new projects, because they got overwhelmed with the volume of malicious packages being uploaded to PyPI.
So they couldn't handle it; everything was very overwhelming for them, so they had to suspend all those uploads. And, a little less recently, in January, you might have seen that PyTorch, the famous machine learning library, was compromised: a nightly build of PyTorch was the victim of a dependency confusion attack. We'll see what a dependency confusion attack is later in this talk, but this, too, is a supply chain attack. And it is not a coincidence that you have seen all those articles recently, because a study showed that, year over year, for the past three years, supply chain attacks have increased by more than 700%, which is a very high growth rate. Supply chain attacks can cause a lot of damage to an organization or a project, both financial and reputational, and if you have a weak supply chain, it can get you into a lot of legal and compliance issues as well. So some new regulations were put into place to try to secure software supply chains, and one of the most famous ones was issued in 2021 by President Biden: Executive Order 14028, called Improving the Nation's Cybersecurity. It was issued after the pretty infamous SolarWinds attack, which affected a lot of big organizations and some branches of the United States federal government. This executive order basically tells corporations that collaborate with the US government that they should be stricter about the supply chain standards they adopt. For example, it pushes organizations to adopt secure authentication to servers, or strict protocols and that kind of thing, if they want to sell software to the government. And this includes things like software bills of materials, which we'll cover later, and which are basically a list of the ingredients that compose your software. Okay, so now let's take a look at supply chain threats and vulnerabilities. Maya already mentioned the infamous SolarWinds attack. It was an attack on the SolarWinds Orion platform, which is quite widely used; it's a network performance monitoring platform. What was done here: attackers uploaded a malicious DLL file that was subsequently pulled in by a build system, which produced software artifacts that were consumed by customers. These software artifacts were properly signed, so customers were not aware that there was malicious behavior. And the effect of this attack was quite large: more than 18,000 customers were affected, including more than 400 of the US Fortune 500 companies. The White House, the Pentagon, the State Department, and the National Security Agency were among those affected. SolarWinds' stock price went down; that's not the worst thing, though: attackers were able to access confidential information of customers. What could SolarWinds have done better? They could have followed the SLSA framework. SLSA is quite recent; quite recently, it went to version one. SLSA stands for Supply-chain Levels for Software Artifacts, so it's not a sauce, and it's not a dance. SLSA defines four levels, starting from level 0, where there are basically no requirements on the build platform, up to level 3, which is a properly hardened build platform. SLSA introduces this image: you can see there's a producer that produces source code, and the source code is stored in a source repository. Then the build platform pulls sources, pulls dependencies, and creates a package, the resulting artifact, which is subsequently consumed by consumers.
SLSA defines the threats at each step, so what can go wrong, but it also defines how to protect against these threats: how to, for example, prevent unauthorized code from being submitted, or make sure that the source repository is not compromised. Now let's take a look at a toolbox to protect your Python projects. The very first tool we will talk about is TUF, The Update Framework. It's called TUF because it solves a tough problem, which is securing updates and preventing tampering attacks, rollback attacks, or key compromise attacks. TUF borrows many ideas from Thandy, an updater that was used in Tor; TUF is, let's say, more generic. There's also Uptane, a project similar to TUF that is used in the automotive industry, by companies that push updates to cars. The reference implementation of TUF is in Python; you can find it under python-tuf. And one company, Datadog, uses TUF to secure agent integrations, software that is shipped to customers: it uses TUF and in-toto, which we will talk about later, to securely ship software to customers. There are also efforts like PEP 458 and PEP 480, PEP meaning Python Enhancement Proposal, to secure PyPI itself, and we will talk about those too. TUF is also used in Sigstore, to securely download the public keys for Sigstore instances. I already mentioned in-toto. in-toto is a framework to secure the supply chain. What it does is basically define what each step in a pipeline should do. So if a pipeline step should, for example, write something or package something, then an attestation is created, so you are sure that each step in the pipeline performed its intended task, and it's properly signed. Users who consume the resulting artifacts can then verify that each step in the pipeline, each step in the chain, did its job properly. Now, okay, let's go to an important part of every supply chain, which is code signing. For this part, I would like to introduce a quite new project in the code signing space, which is Sigstore. Sigstore was started a few years ago by different institutions and companies, like Google, Red Hat, and Purdue University, to make software signing more accessible and simpler. It provides a very secure and simple interface to sign any kind of code, and containers as well. And to use it, you don't need any specific cryptography knowledge, which is quite an improvement if you compare it to other signing standards like PGP, where the configuration can sometimes be a bit complex and you might need to know about the underlying cryptographic protocols the tool uses to sign your software; that is not the case here. One nice feature of Sigstore is that it uses OpenID Connect to sign software instead of self-managed private keys. OpenID Connect is an authentication protocol, and what it allows you to do is bind your email address, or any kind of identity, for example a GitHub workflow run, to your signature. So instead of having a permanent public key bound to your signature, you can have something more identifiable and more personal for your end users, like your email address. Sigstore has a client implementation in Python, called sigstore-python, and you can check it out on GitHub. It's a pretty good tool with a lot of integrations: for example, with GitHub CI runs you can use it as a GitHub Action, and you can use it as a CLI as well. And I put an example on this slide of what it looks like to sign with sigstore-python.
So it's very simple, as you can see. If you want to sign a package, let's say a Python package in this case, the only thing you need to do is enter "sigstore sign" and your package, and it will redirect you to an OIDC session. Basically, a web browser page opens and you enter your credentials for an identity provider, for example Google or GitHub, which are currently supported; you enter your password to authenticate, and then it validates your identity and binds it to the signature of your artifact, here the package. So you sign your artifact, and the second step is for your end users to verify the signature. Here again it's quite simple; this is the second command. You just run "sigstore verify identity" and pass the email address of the signer, which can be found on the signing certificate provided by Sigstore, and you pass the URL of the OIDC provider as well. So if, for example, the signer identified with a GitHub account, you will need to pass the corresponding URL, plus what we call a bundle file, which is basically a verification file that contains all the materials you need to verify a signature with Sigstore. Okay, so before we skip to the next part, I would like to make a quick reminder about the difference between malicious and vulnerable. A vulnerability in software is some kind of flaw in a computer system that can weaken the overall security of the system. Vulnerabilities can potentially be exploited, but they are not always exploitable: actually, one study found that less than 10% of vulnerabilities are actually exploitable, and less than 1% of them are actually exploited. On the other hand, malicious software, or malware, is any kind of software that is intentionally designed to cause disruption in your system; that includes, for instance, ransomware, Trojan horses, or viruses. To find out about vulnerabilities that exist in software libraries, you can use vulnerability databases. Here we chose two examples. The first one is OSV. It's a distributed vulnerability database for open source projects, and what it does is aggregate vulnerability databases from different ecosystems, like Golang, Rust, or of course Python with PyPI data, and it makes them available in the OSV (Open Source Vulnerability) format. The second example we picked is GUAC, which stands for Graph for Understanding Artifact Composition. GUAC is a graph database that aggregates all kinds of software security metadata, for instance artifacts, identities, and attestations like the SBOMs we'll talk about later, and it stores the relationships between those artifacts and metadata in the edges of the graph. GUAC is quite useful if you want to prevent supply chain attacks, because it allows you to better understand the relationships between the different components in your software and how they are used together. If you try to apply this to the Python ecosystem, there is no direct support for vulnerabilities in pip, the Python package installer. Nevertheless, there is a tool called pip-audit. It uses the OSV database that Maya mentioned, and what it does is audit already installed Python environments. So you can issue pip-audit and it will show you vulnerabilities, and also the packages that introduce these vulnerabilities into your environment. There was also an experiment called Pip Cuddle.
It basically resolves application dependencies without vulnerabilities, or only with vulnerabilities that are acceptable to you. It accepts a configuration file in which you state which vulnerabilities you are fine having in your application, and then Pip Cuddle resolves the application dependencies, and you can install all of them, including the transitive ones. There's also a project called security-constraints. What it does is consume security advisories from GitHub, so you need to provide a GitHub token, and it can generate constraints for your application. The resolution process then checks these constraints and resolves the application dependencies without vulnerabilities. PyPI is, let's say, quite bad when it comes to the number of malicious packages published each day: PyPI maintainers claim there are roughly 40 malware packages introduced each day, and they need to be taken down manually. There's a dataset called the Malicious Software Packages Dataset, which aggregates packages that were published on PyPI but were taken down because they were malicious. So if you want to experiment with malicious code, you can do so; just be careful. There is also an open source tool called GuardDog. It scans Python source code and tries to find patterns that can be malicious. GuardDog uses Semgrep rules to statically analyze the source code and tell you whether a given package looks malicious or not. GuardDog is not used on PyPI itself, but you can use it on your own or plug it into your system. Maya already mentioned SBOMs; SBOMs were also mentioned in the executive order issued by the US president. SBOM stands for Software Bill of Materials. It basically lists all the software that was used to create or assemble an application: in this listing you can find all the dependencies and their versions. There are two formats used in the industry, CycloneDX and SPDX; there are more, but these are, let's say, the most used ones. If you have a software bill of materials for your application, you can also use VEX, which stands for Vulnerability Exploitability eXchange. VEX states whether a given vulnerability that is present in your application is actually exploitable. So if you have a vulnerability in your application, it doesn't mean that an attacker can exploit your application: for example, the vulnerable code may not be on the call path, or the application configuration prevents exploiting the given vulnerability, or you deploy your application into an environment in which that vulnerability is not exploitable. There were two efforts here. One was OSVDef, an effort to, let's say, standardize VEX in the industry. They introduced a file that you can maintain in your Git repository; that file states information about the libraries you have, information about vulnerabilities, and whether a given vulnerability is exploitable or not. OSVDef also proposed a way to manage multiple VEX files across repositories, so you can check multiple files when you are consuming multiple libraries. There's also the OpenVEX standard, pushed by Chainguard, the company, which proposes an industry standard for describing VEX for your application. Here is an example: it basically states vulnerabilities, their status, and what introduced the given vulnerability.
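To make the vulnerability-database side of this toolbox concrete, here is a minimal sketch of querying the public OSV HTTP API that tools like pip-audit build on. The endpoint and request shape follow OSV's documented query API; the package name and version are just example inputs.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OsvQuery {
    public static void main(String[] args) throws Exception {
        // Ask OSV: which known vulnerabilities affect jinja2 2.4.1 on PyPI?
        String body = """
                {"version": "2.4.1",
                 "package": {"name": "jinja2", "ecosystem": "PyPI"}}""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.osv.dev/v1/query"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The response is JSON; a non-empty "vulns" array means known hits.
        System.out.println(response.body());
    }
}
```

pip-audit does essentially this lookup for every package installed in your environment, and then maps the results back to the packages that pulled them in.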
If you want to run your Python applications, you can use Python container images. Red Hat produces some Python container images, UBI- or Fedora-based source-to-image images. The main benefit of these images is the large RPM ecosystem with vetted and very well maintained software, and you can use micropipenv in these container images. On the other hand, there are also Chainguard's Python images. They are based on Wolfi; Chainguard maintains their own package ecosystem, and the images use multi-stage builds: you have one container image that is used for building your application, and then another one that is very minimal, with just the Python runtime, to actually run your application. They try to minimize the number of CVEs present in the containerized environment. Another thing you might want to do, if you want to check for potential vulnerabilities in your source code, is use static source code analysis, and we picked an example of such a tool, called Bandit. Bandit was started by the OpenStack security team at Red Hat. What it does is scan the files in your Python project, generate ASTs (abstract syntax trees) from them, and use plugins to analyze the risk of potential vulnerabilities; you can choose which plugins you use. So, for example, you can choose whether you want to detect things like hardcoded passwords, shell injections, or weak cryptography. Okay, so now let's go over some initiatives that the Python community has taken recently to secure the ecosystem's supply chain. There are a bunch of them, but we chose a few important ones. The first one is mandatory 2FA for maintainers of critical packages. This is a list of packages that are widely used by the Python community and by developers, so the PyPI maintainers chose to give away security keys for free, thanks to a sponsor, so that the maintainers of those critical packages can securely authenticate to PyPI and upload packages in a more secure way. More recently, they announced that in 2023, so this year, 2FA will become mandatory for every package maintainer on PyPI. They also have another initiative, quite recent as well, called Trusted Publishers. Trusted Publishers again uses the OpenID Connect protocol: maintainers of Python packages use an OpenID Connect identity to get a temporary identity token, which gives them temporary access to PyPI, instead of an API key that you would store permanently and reuse in CI workflows to publish packages, which is a bit less secure. And one last measure is pretty recent: they chose to drop support for PGP signatures on PyPI. A Python community member made an audit of how PGP signatures were generated and used on PyPI and found that they weren't that useful and were actually quite hard to maintain, so they chose to just drop support for them. So now we'll go over more initiatives to come from the Python community. Okay, so let's take a look at improvements. We will talk about PEPs; PEP stands for Python Enhancement Proposal. That's basically a way to describe what you want to do in the Python ecosystem, and then the community decides whether it's good or not. The first one is PEP 458, about securing PyPI downloads with signed repository metadata. This one was accepted, and it uses TUF, the framework we discussed before. It basically secures downloads of Python distributions, so you can be sure you are downloading the right software from PyPI if you're a PyPI consumer. It's still a work in progress.
Then there is PEP 480. That's about surviving a compromise of PyPI. So imagine someone compromises PyPI and uploads rogue packages there; how do you detect that? This PEP describes a way to do it. It's based on PEP 458 and adds developer keys to Warehouse, that is, PyPI. Currently it's in a draft state. Also, there might be a new PEP in a few days, so stay tuned. Now let's talk about dependency confusion attacks. At the beginning of the presentation we described one dependency confusion attack that happened in the PyTorch ecosystem. So if you are a user and you install flask or torch from two indices, let's say PyPI and the PyTorch index, you want to consume torch from the PyTorch index, because there are, let's say, special builds that you would like to use. In the Python ecosystem, these indices are treated as mirrors, so it doesn't really matter to pip, or potentially other installers, which index is used to consume a package. In our example, you would like to install flask and torch, plus the transitive dependencies of these libraries, but which index should be used? Imagine that in this case you are consuming torch from the PyTorch index, and a dependency of torch, called torchtriton, also from the PyTorch index. But if an attacker uploads a package to PyPI with the same name as the one on the PyTorch index, it can cause trouble, because these indices are mirrors, right? So torchtriton can be malicious, and you can end up consuming the malicious package. If you would like to detect possible dependency confusion in your Python applications, you can use a tool called Yorkshire; a cute name, right? Then there is another PEP, for extending the repository API to mitigate dependency confusion attacks; that's the PEP addressing these attacks. It's still in a draft state, but what it introduces is a way to create a contract between indices. So imagine PyPI says that the project torchtriton is trusted on the PyTorch index, and the PyTorch index says torchtriton is trusted on PyPI. There is a contract between these indices, and consumers or installers of packages can verify that these packages trust each other, so they can pull from PyPI or from PyTorch. If there is no such trust information, the installer can fail and notify you about possible dependency confusion. Another PEP is about recording the provenance of installed packages: that's PEP 710. It's based on PEP 610, recording the direct URL of installed distributions. If you install a Python package using a URL, let's say you use GitHub to download an archive of pip, then pip and other installers create a special file called direct_url.json in the .dist-info metadata directory, tracking the information that you installed pip from GitHub, with the URL and the hash of the file. Nevertheless, there was no way to find out what you actually installed if you just issued pip install pip, or flask, or whatever library, by name. So PEP 710 introduces a new file called provenance_url.json that states what file was downloaded and what the hashes were when you installed packages using their name and, optionally, their version. It also tracks information about indices. It's still in a draft state, but if you are interested in it, feel free to check it out. And now we will have an opportunity to win something. So for those who are listening to us, there can be something good. The rules are: we will ask a question.
I will try to check who raises their hand first, and then we will give something away. Does that sound okay? Yes? Okay, let's do it. So the first question: which project mentioned in this presentation does this photo relate to? Yeah? Sorry? SLSA, yes. So we have a winner. And that's right, you get a mild salsa dip; together with... that's it. Okay, so another one. Which project mentioned in this presentation does this photo relate to? Wow, yes. Yes. Can you guess the prize too? We actually borrowed this idea from the GUAC people, the developers behind GUAC. Yes, so GUAC, Graph for Understanding Artifact Composition. Okay, which project mentioned in this presentation does this photo relate to? Anyone? Yes. Into the ring, into the ring. Wait. Okay, so the prize is, oh yes, SLSA is the correct answer. And you get a ticket to salsa lessons, so you can choose between Maya or me. Okay, okay, there's also a plan B, so. Yes. So hot salsa it is. Okay, which project mentioned in this presentation? Yes, we have? Yes, it's Yorkshire. It could also have been GuardDog. So what do you want? It's a lollipop. Okay, and now the tough one: which project mentioned in this presentation does this photo relate to? No. Yes? No, no, that's not it. Okay, a hint? You win chocolate. We wanted to ask for signatures from the Sigstore folks, but maybe next time. Stickers, oh, that's a good idea. And I think we have some space for questions. So if you have any questions, feel free to ask. So I'll repeat the question. I mentioned that there will probably be a new PEP. I collaborated with one PyPI maintainer, Donald Stufft, and there is something written, so let's see if it will go public or not. There are also other engineers involved; for example, Trishank Kuppusamy, who is behind TUF, and other people. So let's see if it goes public. Yes? Okay, so I think if you want a complete answer, William Woodruff published an article about it, which is pretty explicit, and he gives details about the whole audit he did on PyPI's GPG signatures. I think it's called something like "GPG signatures: worse than useless", so that's pretty explicit as well. I encourage you to check it out if you want a really complete answer on why GPG signatures were no longer considered worth maintaining. Yes? Oh, yeah, sorry, I'll just repeat the question: the question was, why exactly are GPG signatures no longer considered worth maintaining by PyPI? Sorry, yes. Okay, so the question was, if I understood correctly, whether we considered ChatGPT as a tool to prevent malware, and whether the tool called Package Hunter can help detect vulnerabilities, is that correct? Runtime vulnerabilities, okay. I have not personally considered ChatGPT as a tool for this talk; at least we didn't provide any example. I'm sure some tools use it now, but honestly, I didn't look into it. I don't know if you did; okay. No, we haven't considered it, yeah. Okay, thank you. Thank you. Perfect, my talk won't last that long anyway, so that's fine. Everybody knows this. Yeah, but I've done this talk a couple of times. Krabička? What is a krabička? A box, ah, that's very nice. Yeah, yeah, wait, wait, I can do it. I just didn't notice, yeah. Yeah, you can. Look. Okay, great. And five minutes as well. Yeah, your colleague said it. No worries, just need some water. You give me the go? Thanks. Hi everybody, thanks for being here for this talk about chopping the monolith. I'm Nicolas Frankel. Well, nobody cares anyway.
Just to be on the safe side: I do this talk in multiple countries, and in some countries it can be considered offensive to contradict the attendees. I'm sure here it's not the case, but if at any point you feel offended by my opinions contradicting your core beliefs, just leave at that moment, and don't hold it against me. So with that said, we live in interesting times where, if you are doing microservices, by default you are a good developer, and if you are working on a monolith, you are a bad developer. Who here is a good developer? Good, so you are doing microservices, right? I know, hey, come on. That's the assumption. Who here is doing microservices? Come on, raise your hand, I want to see everything. Okay, half the room. That's an interesting take. So we live in an industry where we are supposed to be engineers, but we mostly live by hype. Microservices good, monolith bad. You might see that I have a bit of gray hair, which means that I'm probably either old or experienced, or both. I would like to think that I'm experienced, and in this talk I want to challenge some of those beliefs and offer some ways you can do otherwise. So, to get back to the initial microservices stuff: I won't read it all to you, but basically this is the definition of microservices by Martin Fowler. So if you have anything against this definition, you bring it to Martin, not to me. That's not my problem. This I consider the truth. If you are not happy, well, take it up with him. In the same article, he describes several characteristics of microservices, right? And in general, cool, right? Once I did a presentation internally, not this presentation, but a presentation, and I thought I had a very cool demo. And when I used this thing, everybody said "oh". I thought that was about my demo; not at all. They didn't care about the demo, they cared about this thing. It's $100, anybody can buy it, huh? Anyway, so the microservices stuff: well, componentization of your services, everybody does it, and that's basically the definition. Organized around business capabilities, blah, blah, blah; products, not projects. Among the people who are doing microservices, or said they were doing microservices: who is organized around products? Which means you have a start date, you have a recurring budget over the years, and you have no end date, at least at the beginning; at some point it might stop. Whereas a project has a start date, an end date, and a budget over the project, and it's probably over budget after some time, but that's normal. So who here is organized around products? Okay, like 10 hands at most, yeah, because you have two hands. Good. And so you see that people who say they are doing microservices, according to this definition, are not really doing microservices; they pretend they are, because they have some of the characteristics, but not all of them. Now, if we go even further: I remember when this microservices trend started, everybody was talking about Conway's law. The people who advertised microservices said you must reverse Conway's law. Who is not familiar with Conway's law? Okay, so the idea, the thesis by Melvin Conway, is that your architecture mimics the communication channels of your company, of your organization. That's his thesis. And yeah, I was happy to meet him. Come on, I need to brag a bit.
And so because we are organized like this, with front-end engineers, middleware engineers, and database administrators, we have a layered architecture. So if we want to do microservices, we need to use Conway's law: if we want this architecture, we need to have self-contained, autonomous teams. So I will ask the question again, and I have nothing against you: who here, among those doing microservices, is organized like the diagram on the right? Not half-and-half; in general, it doesn't go in your direction if it's half-and-half, but anyway. Okay, still 10 hands, but not the same ones who were organized around products, which is interesting. And the poster child for this organization is Amazon Web Services. They popularized the idea of the two-pizza team. Well, the size of the pizza depends on the country, and on whether I belong to the team, but basically they are small, self-contained, autonomous teams. So the idea is you want to go from the organization on the left, which is probably your legacy organization, to the organization on the right. And for that, you need this person. So now the question is: who is this person I want to illustrate with this picture? Who? Banana man. Banana man, yeah. You're the second person who has told me that. I don't think you have many banana men in your organization at the moment. So that's the wrong answer, but a funny one. Anyway. Monkeys? Monkeys, yes. Okay. You are now responsible for the real answer, which is middle managers. Is there any monkey in the room who fits the definition? And now everybody is super afraid, right? No, on a more serious note: yes, if you want to move your organization in a certain direction, you need the help of middle managers. Of course, you need the will and the backing of the executives, but nothing will happen without the help of the middle managers. It never works. And now comes the $1,000 question: the head of this group is a middle manager. In this group as well. Where are the middle managers here? They became engineers, I'm not sure. Another answer would be pitchy. But actually, that is the problem: you want the help of people in making themselves completely useless. Because even in the organization on the right you might need some middle managers; but the basic idea of an autonomous, self-contained team is that there is no manager. And you might think that, again, I have something against managers; I have something against bad managers. A bad manager's main tasks are to accept my vacation requests, to be the proxy for my raise every year when I have one (and by proxy I mean you cannot negotiate, right? It's just, hey, you've got X percent; yeah, thank you, or oh, I'm super unhappy, but whatever happens, it's just a proxy), and to do some reporting. This is what bad managers do. I have had a couple of good managers who helped me do my job; they were really supportive. But in 20 years: five, perhaps four. So that's the problem. So I believe that it's very, very hard to do microservices if you have a legacy organization. What you might achieve is technical microservices on top of your regular organization, which means you have none of the benefits of microservices and all of the problems. Congrats. Now, how does it work in real life? Well, you have one senior architect, someone who is very technical. He reads about microservices in a magazine.
He says: wow, that looks cool. We can solve all our problems. He has some influence in the organization, talks to the chief executive officer, talks to the other executives, he gets buy-in, you go full microservices. As I mentioned, you get none of the benefits. Then he quits. Who is going to maintain the microservices architecture? Because you called them monkeys, right? So the people who are left need to maintain that pile of crap. But at least the guy has all the wins, because everybody needs microservices represented on their resume. Full win for him, full loss for you. But the question is, why are we so obsessed with microservices? There must be a reason somehow. So in another article, Fowler lists three main benefits of a microservices architecture. The first is strong module boundaries. The second is technology diversity. And the third is independent deployment. So let's see if any of them is reason enough to move to microservices. First one: strong module boundaries. Yes, you have completely separate repositories, unless you have a monorepo, and then I never understood how you can have microservices and a monorepo, but let's forget about that problem. So you don't have related code bases; of course, you have no dependencies between them. This is strong module boundaries, right? Yes; this is, however, a small benefit for a huge, huge investment. There are other ways to get strong module boundaries. For example, if you are on the JVM, you can have your modules, like JPMS modules. You can have static checks with Sonar that say this package cannot call that package. There are other ways to achieve it. This is not reason enough. It's like saying: oh, I want to learn to write with my left hand, so I will cut off my right hand. It will work, sure, but it's not the only way, and it has a strong side effect. Technology diversity. Everybody loves diversity. We are in IT, we must be diverse. And technology diversity is saying: hey, we are a Java shop, but I love Rust. What? I love Rust. I also love Kotlin, but Rust is a better example. So I love Rust, I want to learn Rust, and though we are a Java company, my next microservice will be in Rust. And it's good because it's diverse, right? And now I leave the company. Yeah, you are still the one who is going to maintain the Rust shit that I created, right? Thanks for sitting in the first row; I have stickers for you afterwards, for your pain. I really appreciate it. So diversity is a good thing, but the idea of technology diversity being a benefit for the company is complete crap. Even if you are a Java shop, it's very hard to recruit a good Java developer. Imagine recruiting a Rust developer when you have no clue about Rust. Not a great idea. So the only thing I have left is independent deployment. So think of projects: you have three main phases, specification, implementation, and deployment. The specification is outside our scope; it's mainly communication between the business and us, and it's incompressible. However, what we can work on is implementation and deployment, and this is known as the lead time. So the idea of deploying often is not a new problem. We already had this problem before, but we handled it in a very different way: we had release trains. We deployed, let's say, four times a year. It's not funny; you're probably using, right now, software that was deployed like this. And the idea is that it's very hard to test the whole monolith.
So we cannot test it completely all the time, so we reduce the number of deployments to X times a year, let's say four. The thing is, the business needs to wait for months to get their feature. But who here is a developer? Nearly everybody. We are always late, right? Come on, we are always late. So if you miss the train, they have to wait three months extra. The business is never happy to wait three more months. So what do we do? Well, we cannot deploy. But there is a trick: you can still deploy if it's a bug fix. That's not a problem. So you just ship the unfinished feature and, of course, now there is a bug, and you can continue working on the feature. It rings a bell, right? I know. If you have a stupid rule, we are creative people; we find a way around the rule. But this is based on an assumption: that it's not possible to test the monolith well. And the microservices people never say the second half out loud. They say it's not possible to test the monolith well; the underlying assumption is that it's, of course, very easy to test a microservice well. Meaning the microservice is completely self-contained, has no dependencies on others, there will be no side effects, whatever. Of course, there will be fewer ripple effects, unless you have a monorepo, blah, blah, blah. But still, the assumption is complete crap. You have hard contracts, yes, but I'm pretty sure some implementation detail has leaked, and then you can break something if you test only the microservice. So, in my opinion, the real problem is that we think of the monolith as a monolith. However, if you have worked for some months, or even some years, on a code base, you may have noticed that some parts change very frequently or very randomly, and some other parts are very stable and rarely change. Do you have the same experience? Am I the only one? Yeah, okay. So, my experience is that, in general, it's the business requirements that drive the change, or a change in law. So: business requirements, frequent; law, random. We already had this problem back when I was a young engineer, eager to develop my stuff. How did we handle it? We used a rules engine. Who knows about rules engines? Oh, not that many people. Who doesn't know about rules engines? Okay, and some people are shrugging; they are between knowing and not knowing. Okay, that's interesting. So, for those who don't know, the idea of a rules engine was: you deploy the engine once, and then you deploy code into it on the fly, and this code gets evaluated by the rules engine, blah, blah, blah. And there were a couple of benefits. The benefit is that the business would be independent of the release cycle. They could go to production, make their change directly, and it would be applied immediately, which can create issues, because sometimes you need to apply the change at the correct date, especially with law, blah, blah, blah. I don't want to go into the details, because this talk is not about rules engines, but the idea was that. And some businesses would be very afraid to rely on IT, depending on the organization, and they would say: we want a rules engine, and then we want nearly all the code in there. With this approach, IT would be very happy, because then it becomes the business's problem, not theirs. But the problem with this approach is that the idea of the business changing a rule is crazy.
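To give a flavor of the problem, here is a completely invented, much-too-clean sketch of what such a rule amounts to. Real rules-engine DSLs are far hairier than this, which is exactly the point made next; every name and threshold here is made up for illustration.

```java
import java.math.BigDecimal;
import java.util.function.Predicate;

// An invented pricing rule of the kind a rules engine evaluates.
// Even this "simple" version needs both developer fluency and business
// fluency to change safely, which is why "the business edits the rules"
// rarely works in practice.
record Cart(int itemCount, BigDecimal total, String customerSegment) {}

public class FreeShippingRule {
    static final Predicate<Cart> APPLIES =
            cart -> cart.itemCount() >= 3
                 && cart.total().compareTo(new BigDecimal("50.00")) >= 0
                 && !"WHOLESALE".equals(cart.customerSegment());

    public static BigDecimal shippingCost(Cart cart) {
        return APPLIES.test(cart) ? BigDecimal.ZERO : new BigDecimal("4.99");
    }
}
```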
I never worked on such a project, but I once asked one of my colleagues: hey, show me the code, what does a rule look like? And wow. First, I don't believe any regular business person could have changed it. You need to be a developer to understand the rule, and then you also need to be very familiar with the business. So the idea of the business being independent is an idea, but it's not a reality. Anyway, this is only one of the ways. My idea is: if you isolate the quickly changing part, or parts, you can replace it with a single microservice, with a serverless function, with a rules engine, with whatever, something that hasn't been invented yet, and keep your monolith working. Do you know the microservices patterns from Chris Richardson's book? Do you know the book? Okay. A very, very interesting book, like this thick. I read halfway through the book; he does everything by the book, and about halfway through I understood that nobody is crazy enough to implement everything, so I just put it back on the shelf. Okay. If you want to do microservices the real way, you should read the book and implement everything. It's a lot of effort. Anyway, there is a companion site, which is also very interesting, and on it he talks about strangling the monolith. Basically, there is this Martin Fowler design pattern, the strangler fig pattern, and the idea of strangling the monolith is very similar. You start with the monolith, then you chop off one part, which becomes a microservice, then several more parts, and parts, and parts, and at some point you don't have the monolith anymore; you only have microservices. And this is a very good idea, but in his view this is the to-be architecture: everything is microservices. That's the grail. Everything is nice. We have achieved microservices paradise. Whereas my idea is: why don't you stop partway? Here, you chop off just the quick-changing parts and you keep everything else in the monolith, which is very stable anyway, so you don't care. One way to achieve that is to use an API gateway, if you are in a web environment. So through the gateway you send everything to the monolith, and at one point you just change one route to point to a function that is serverless, a microservice, a rules engine, whatever; I don't care. The best example I have, having worked in e-commerce, is pricing. Who has worked in e-commerce as well? Yeah. Retail. Oh, the worst kind. And pricing is the crazy part. Yeah. I didn't work on taxation so much, but pricing was always crazy. The business always had crazy ideas. The regular stuff, like, okay, you buy this product, and if you buy X of it, you get, let's say, 10% off, is easy. Then they come with: oh, if you buy X of this product, you get Y for free. Okay? Yeah, but if you buy Z of this product, then you get no shipping costs. Every time you wanted to do something, they had an even crazier idea. I worked on the Hybris platform; it was impossible. So my idea is: we can chop off the pricing engine and do something fun. And then I have 10 minutes, so it's demo time. And of course, it's the time when I want to show you that it works, and there is a chance it doesn't, otherwise it wouldn't be fun. Chopping the monolith, which is not the right way. Chop monolith. New window. Okay, so here I have my architecture. Is it big enough for you? Good. So I work on the Apache APISIX project, so I will be using Apache APISIX.
Of course, you can use any API gateway you want, but of course my boss will be very grateful if you use mine, okay. APISIX depends on etcd, a distributed key-value store, where it stores its configuration. Then I have my application, which I need to downgrade here, so I can show you the code. And I have MariaDB, where I have all the data for my project. So who here is a JVM developer? Oh, not that many people. So this is a regular Spring Boot application with Kotlin. And the idea is I will start the stuff and then describe the code. Docker compose up. Here I have this pricing stuff. And at the moment, the pricing is super hard: I just take the sum of all the prices, right? The idea is not to show you how to build a pricing engine; it's just to show you how you can change it easily. So I have this pricing stuff, and basically when I call it here, I get the cart, I price the cart, and I return everything in one JSON payload. So when I go there, so here it should have started, I will just configure the API gateway, so I send everything to the shop app. And here it's just removing caches, so I don't have bad experiences during the demo. Okay, now I can... oh, I need to start Firefox. Yeah, I got interested in the Czech language here; I don't know if you see it. Sorry, but I won't learn Czech anytime soon. And so I can go to localhost, and I learned something beyond just the Czech language: it's called syllabic consonants. That's how you can pronounce all those consonants; there are even vowels hidden in there somewhere that stupid foreigners cannot know about. So here I add stuff to my cart, I will just check the network tab, and I go here, and here in one go I got everything. So if I look at the response, I see the total, I have the lines of my cart, and the origin is "monolith". I cannot make it bigger; you should have sat in the front row. But trust me. Do you confirm it says "monolith"? Of course, thanks. You really want your stickers, I understand. Okay, so now, to actually start my journey, I will need to make pricing not a regular function call but an API call. So now the pricing is here, and I will just update, so it's docker compose start, chop, chop, chop, chop, and normally at some point, I always forget, it's not start, it's up, yes. So at some point, yes. Okay, it's doing this. So the pricing has now become a route. So now the client, the browser, will first request the content of the cart, it will get the content of the cart, and then make another request for the pricing. This is very stupid; it's not the client that should do it, it's the API gateway, but you understand, I'm too lazy to implement it further. Now I have this, I get back here, I didn't implement sessions, so I need to redo everything, but that's fine. I go there, I go there, I go there, and here I go to the cart. So you can see it's another version, and now when I go to pricing, first I have two requests, and here the origin is still the monolith. So now my architecture is ready to be chopped. What I did, finally, is I kept the whole stuff, but I implemented the pricing in JavaScript. Yes, and I put the code on Azure, as a serverless function. And so now the only thing that I need to do, without redeploying anything, is just go here, and I still have everything going to the monolith, but I have one special endpoint, the pricing endpoint, and I say: hey, when you receive a POST request on the pricing endpoint, you just go to Azure instead.
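For a flavor of what that gateway change looks like, here is a rough sketch of updating a route through APISIX's Admin API. The admin port, API key, route id, and Azure hostname are placeholders made up for illustration; the real demo configures this differently, so treat the payload shape as approximate rather than the exact commands used on stage.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChopPricingRoute {
    public static void main(String[] args) throws Exception {
        // Point the /pricing route at the serverless function instead of the monolith.
        String route = """
                {"uri": "/pricing",
                 "methods": ["POST"],
                 "upstream": {
                   "type": "roundrobin",
                   "scheme": "https",
                   "nodes": {"my-pricing-fn.example.net:443": 1}
                 }}""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9180/apisix/admin/routes/pricing"))
                .header("X-API-KEY", "replace-with-your-admin-key") // placeholder key
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(route))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The monolith keeps serving every other route untouched; the only thing that shipped is the new function behind that one endpoint.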
So I do it, and now if I do a refresh, it should be a bit slower, and then you can see Azure here, and I didn't redeploy the monolith. And now I can change the code on the serverless side any time I want, and again, it can be serverless, a rules engine, whatever, I don't care; the point is that I kept my monolith. So thanks for your attention. You can follow me on Twitter, even though Twitter is now dead. You can follow me on Mastodon, where I'm trying to find new and interesting content. If you're interested in the code itself, because even though you should trust this guy that everything was fine, maybe I bribed him or whatever, you can get the code on GitHub and try it yourself. And if I got you interested in Apache APISIX somehow, just have a look. It pays my bills to come here, so I can do better talks next year and come back again. And now we have like three minutes for questions. Wow. Yes, thank you. You really want your stickers, I know. Remind me. So the question is: when do you stop chopping parts off your monolith? When do you know it's the right time? I will completely change the question, because that's my privilege as the speaker, and tell you the question you should have asked instead: when do you start chopping? And there is one single reason, in my opinion, to start going to microservices, and it's not any technical reason. The only reason you should start doing microservices is that you have too many developers working on the same code base. In my experience, and this is very personal, we started to feel the pressure at 70 developers: we had a dedicated release manager who handled all the branches of the Git flow, and it started to get really messy. At that level, we should have started to think about which parts we could start to chop. Before that point, look at a company like Stack Overflow: they don't do microservices, and they scale super well. So every technical reason you have to do microservices is probably bullshit, in my opinion. I accept stickers also. The companies that go on stage and advocate for microservices have the right organization; they have many developers, and yes, they are at a point where the benefits of microservices outweigh the problems you will get, like distributed tracing. But I believe that most regular companies are not mature enough to do microservices anyway. So unless you start feeling the weight of all those many, many developers, don't do it. And we can talk afterwards, because I'm out of time. Thanks a lot; I have stickers for everybody, don't worry. Thank you. And I have a talk tomorrow about OpenTelemetry; it will be less funny, but still interesting, I believe. Sorry, what time? At 11. What is on the projector as well, so you know; feel free to slide a little closer, or maybe you can just stand there. I would just like to remind you not to forget to attend our social events today. We will be reacting to other, or in the future, and right now you will talk to the projector, thank you. Hello everyone, welcome to Camel 4: What to Expect. My name is Otavio Piske, I am a principal software engineer at Red Hat; I'm also a committer and a member of the Apache Camel PMC. You can find me online; I'm usually on Twitter, but you also have my mail here, should you have any questions after this talk. In real life, I have lived here in Brno for 8 years already; unfortunately, I still do not speak fluent Czech, but I keep trying. On the agenda today, I'm going to give a very brief overview of the Apache Camel project, then I will talk about the motivations and objectives
for releasing a new major version of the framework. I will discuss a bit of the changes under the hood, I'll complement that with my thoughts on how you can plan the upgrades, and I'll finish with questions and answers.

The Apache Camel project is basically organized in multiple sub-projects, Camel Core being the largest of them: it's the one that provides the enterprise integration patterns and components, and that is what most people tend to know about. But we also have sub-projects that focus on specific technologies or specific problem domains: things like Camel Spring Boot and Camel Quarkus, and also things like Camel K, which focuses on providing a lightweight runtime for Kubernetes and OpenShift. If you have never seen what a Camel route looks like, I'm showing an example here. This one is written in Java and basically shows how you could pick some data from Kafka and send it to a queue on Amazon SQS; this route could also be written in XML or YAML, among other languages.

In the course of developing Camel 3 we created a few new features and projects that will become more prevalent with Camel 4. The first of these projects is Camel JBang. Camel JBang is basically a way to quickly run the routes you create: you don't have to write a full-blown Java project to do that, you basically pass it a YAML file and Camel JBang can run it for you. This makes it a very nice tool for prototyping routes, so whenever you want to try something, Camel JBang can help you with that. Camel JBang also has features to simplify bootstrapping integration projects for our sub-projects, things like Camel Quarkus: you can quickly export those routes for running on those runtimes. And lastly, Camel JBang provides functionality on which you can implement new things. For instance, within the project Camel Karavan we use Camel JBang internally. Camel Karavan is a toolkit for designing routes: with Camel Karavan you can use drag-and-drop features to select components and patterns and design the integration in a visual way. It basically works as an extension for Visual Studio Code, and Camel Karavan will save the YAML route for you, so you don't actually have to write code. One of the nice things it has is this arrow button, which uses Camel JBang to run the integration that you have designed.

But I'm here today to talk specifically about Camel 4 and what we had in mind when working on this new version of our framework. When we think about the motivations for working on Camel 4, one of them, and I think the primary one, was the release of the Jakarta EE 10 set of APIs. Camel itself does not use the Jakarta APIs, but many of the libraries that we use to implement our components do, so the goal of being able to upgrade those libraries was a motivation for Camel 4. Then we have another big motivation, which is Java 17. The community of engineers working on the Java language has for many years been developing newer and greater features, and we want to be able to make use of those features, both for making our code easier to maintain and for implementing new features for our community. And then we have a set of projects that are very important for the Java community in general and are widely used: Spring Framework 6, Spring Boot 3 and Quarkus 3. Those three projects are widely used around the Java ecosystem, and they too are motivated to release new features based on both Java 17 and
Jakarta EE 10. So it came out as the perfect opportunity for us to work on a new version, Camel 4, aligning with all these five items.

When we think about our goals for this release and what we want to achieve: as always when working on a new version of a framework, you think of the foundations you want to build, and with Camel 4 we again thought about building foundations for the future. That's why we are aiming at supporting Java 17 as a minimum version: as I said, we want to make use of these great new features of the Java language, both for our work on the framework and for implementing new things for our community. We see Camel 4 as a forward-looking release; that is, we are already looking forward to the upcoming Java 21 LTS release, which should be out, I think, around September. One last point: when we raised the discussion with the community about having Java 17 as a minimum version, we did receive some feedback questioning whether we could support Java 11, but it turns out that since many of the libraries and many of the motivations we have are already aligned with Java 17 as a minimum, it would be quite difficult for us to do that.

Looking at our work in maintaining Camel and what we did with Camel 3, the things that worked well and the things that cost us additional effort, we identified Camel Karaf, one of our sub-projects, as one of the pain points in our ecosystem. So we decided, with Camel 4, to downgrade the effort we put into maintaining this project to a best effort. In practice this means that we will decouple the releases of Camel Karaf from the releases of Camel Core, so when we release a new version of Camel Core there won't necessarily be a new version of Camel Karaf. In the same line of thought, we also took this opportunity to work on goals to reduce our maintenance effort. Camel is quite a big project, so a new version is always a good opportunity to do that, and we focused on doing some internal cleanups. As time goes on, on a project that is already 17 years old, there's always some technical debt that piles up, so with Camel 4 we did quite a lot of cleanups; I will elaborate a bit more on this in subsequent slides. We also took the opportunity to look at things within the Java ecosystem that are either being superseded by newer versions or have stopped being used: for instance, we know that nowadays most projects are already using JUnit 5, so we decided to drop support for JUnit 4 in this release.

We also took the opportunity to look at the library of components that we have. Camel has about 300 components, actually a bit more than that. When we looked at the components we make available to the community, at both the technologies and the underlying projects that offer the functionality we use, we identified about 33 of them that either have become unmaintained or were evolving at a pace that was not the same as the rest of the community; by that I mean, for instance, that by this point in time they do not support Java 17 or have not yet added support for Jakarta EE 10, and so on, things that would make our release more difficult. As a result of this evaluation we had 33 components that were affected. Of those, 20 can be easily replaced by others already available in Camel; here I'm talking about things like camel-activemq, which you can easily replace with camel-jms. And most of those components have little usage, based on
what we could gather from Jira, mailing lists, messages and chat requests. One point that is important to highlight here as well: in some cases these removals are temporary. For example, as I mentioned, for camel-activemq the community is already working on resurrecting the component based on a newer version that supports Jakarta EE 10, so this number of 33 affected components may well shrink as the development of Camel 4 progresses.

Talking a little bit about the changes under the hood: these are things that normally wouldn't matter much to users, but I think they provide good insight into the work that is going on. One of these things is what we call the internal plugin manager. Camel internally has a way to allow the configuration of many things, such as how exchanges are created, how messages are created and how error handlers are created, and this is done through what we call a plugin interface. In Camel 3 this interface was not very uniform; with Camel 4 we looked at the way we were working with these plugins and worked to provide a uniform interface. Again, this does not affect users, but it turns out to reduce the maintenance effort for us. We also looked at things that were somehow public but shouldn't be, things like the introspection support, which is used only internally. And beyond the internal code specifically, there are also changes in testing: for instance, we are aiming at providing clean builds for the Power and s390x architectures, s390x being Linux on the mainframe. Of course Camel runs fine on these architectures, but we also want the community to be able to build and run the tests on them. On the same topic of reducing the maintenance effort, we evaluated how we were doing some of the testing for our components. The way Camel 3 used to do that was through a rather complex hierarchy of types to set up the Camel context; with Camel 4 there is a new JUnit 5-based extension that makes it much easier to set up the context: we basically use a registered extension and JUnit does the bulk of the work for us. Eventually we hope this will become something we encourage users to adopt, to simplify their side of the testing as well. And lastly, on the internal changes: we know that supply chain attacks have been a big topic lately, and within this topic of security many projects are providing a software bill of materials; with Camel 4 we are going to provide that as well.

For this new release, one topic we focused on a lot was performance improvements. Some time ago a JDK issue became somewhat famous, JDK-8180450. Simplifying a lot here, this issue is related to a performance penalty that can happen when you do type checks. If you are working on any application that is performance sensitive, I strongly recommend looking at the material I'm showing here, especially "Cracking the Scalability Wall", a talk given at, I think, Devoxx UK, where two of the engineers who worked a lot on uncovering this issue and working around it talk about its implications, what was done to solve it, and why Java developers should care about it. As we worked around this issue and fixed the problems we identified as part of that, we optimized several components in Camel, notably SEDA and Disruptor, and of course some changes also happened in Core, which benefit basically all of the 300 components we have. And during the course of investigating these performance issues and working
to correct them, we came up with other micro-optimizations that alone wouldn't change much but together provide a nice performance improvement. To showcase a little of what I'm talking about in terms of performance, I'll show a few examples. For instance, in the case of SEDA, a component that is quite widely used: in a scenario with low contention we have 19% faster throughput with Camel 4, and in a scenario with 4 consumers and 1 producer, a little bit more contention, 18% faster, and so on. Disruptor is another component that is widely used, especially in scenarios with multiple producers; in the case of Disruptor we have 36% faster results with Camel 4, and so on. Overall, what we found in our tests is that for the SEDA component, Camel 4 was faster 80% of the time when compared with Camel 3.20.4; for the Disruptor component, compared with 3.18.6, Camel 4 was faster 80.6% of the time, and compared with 3.20 it was faster 91.9% of the time. If you are interested in learning more about the performance improvements we worked on for Camel 4, I suggest this blog post, where we discuss the changes we made, what we found, some of the metrics we gathered, and more.

I hope that by this point I have more or less sold you the idea of Camel 4, and I hope you are all excited about upgrading to this new version. The final version is not out yet, but we are already working on a release candidate, and we have a few milestones available already. So what should you expect of a migration from Camel 3 to Camel 4? First of all, it should be much easier than the migration from Camel 2 to Camel 3: we are aiming for a drop-in replacement. Additionally, you should not expect any DSL incompatibilities; Camel has, I think, quite a good track record of retaining this backwards compatibility, and it should not be different this time. With regards to planning, one thing that might change things a little is which packaging of Camel you are using. If you are using a downstream or commercial distribution, do talk to your vendor; they may have their plans already. If you are using the upstream version, the open source one from Apache, the usual recommendations for any upgrade apply. Save time for testing: we do our best to ensure that Camel has good test coverage, but there is only so much we can cover. Do plan to provide feedback to the community: if you try the upgrade and something is not working, share that with us, open tickets, go to our mailing list or chats, and share the knowledge with the rest of the community. The problems you face today can help someone else going through the same problem in the future; it also helps us as a community to learn from the problems people are having, and with time, as we work on other versions and other features, we can learn from that. The path to Camel 4 might depend on where exactly you are: during the course of developing Camel 3 there were a few important milestones in terms of what we achieved, for instance the first version where we officially supported Java 17 and the last version where we supported Java 8, so things like this might matter for your upgrade plans as well. In general, as I said, migrating from Camel 3 should be relatively easy, especially if you are using maintained technologies: if the components you use are not going to be removed, or have only been
temporarily removed, that should be a good sign as well. If you are using one of the modern projects we have, Camel Spring Boot, Camel Quarkus or Camel K, migration should be fairly simple. If you are using plain Camel Core, a good sign is if you are on a relatively new version of Camel 3, especially a relatively new LTS version. A bit more effort might be required if you are on an older LTS version, especially older than 3.14, not because we broke features or introduced some incompatibility along the way, but because of the cost of many small changes aggregated over time. And some code changes definitely will be required if you are using those internal APIs that I mentioned were changed. I think most of the community is not using those APIs, but if for any reason you are, then do prepare for some code changes; those changes should not be complex, but they are something you will eventually have to do. If you are using OSGi: as I said at the beginning of this talk, we are downgrading our efforts on Camel Karaf to a best effort, so ideally you should plan to move your OSGi workloads to Camel Quarkus, Camel Spring Boot, Camel K, Camel Kafka Connector, or any other project that might suit your needs.

User-impacting changes, which should not be widely impactful, are, in terms of logging: with Camel 4 we upgraded the SLF4J API to version 2.x. This version brings a few different dependencies, so depending on what you use as a logging backend you might need to adjust your dependencies a little. One of the exchange patterns we had that was not widely used, called InOptionalOut, was removed. And for those using Camel Main with a main listener to configure it: one of the methods used to perform that configuration, the configure method, was removed. It was already deprecated in Camel 3, and with Camel 4 it is not available anymore, so those users should move to either the afterConfigure or the beforeConfigure method. If you are still on Camel 2, then the migration path is a bit more problematic. First, Camel 3 was modularized, so the dependencies that Camel 2 provided are different from the ones provided by Camel 3; migrating from Camel 2 therefore involves adjusting those dependencies, package changes, and so on. There were also changes in handling multiple and single contexts, and Camel 3 also brought Java 11 as a minimum version, so it's quite a lot more work than the migration from Camel 3.

Closing this talk about Camel 4, I leave you these comments. Even though Camel 4 is not out yet, in the sense that there's no fully stable version, we already have several milestones available, and a release candidate is planned for just a few weeks in the future, so do start planning now: plan to adjust your applications, see how they work with Camel 4, and provide feedback to the community. Avoid unmaintained versions: we still have many users in the community either on very old LTS versions of Camel 3 or even on Camel 2, so please do plan to upgrade; we are not maintaining those versions, and their users are not receiving fixes for CVEs or bugs at all. Re-evaluate your usage of old technologies and standards: if you are integrating with some tool that has become unmaintained, look at your architecture and see what you can do to move away from it, because there's a good chance the component used to integrate with it will not be available in the future. And as I mentioned earlier, please share the knowledge:
it helps the whole community, it helps us learn what we did well and where we failed, and we learn to do better in the next version. And with that, I think we are open for questions and answers, if there are any.

Yes, so the question is about the Camel Kafka Connector, whether it's planned for Camel 4, and the answer is yes, it's planned for Camel 4. We reworked the Camel Kafka Connector during the course of the development of Camel 3 so that it uses Kamelets, and as we progress with the development of Camel 4 we will certainly work on a new version of the Camel Kafka Connector that makes use of the Camel 4 engine and the new Kamelets that will become available. You're welcome. Any other question? So, thank you everyone.

Test, test. Okay, it works, sort of, so I can't hear outside the door. Thank you. I'm Paolo Abeni, I work for Red Hat in the networking services team. I used to add bugs to the UDP protocol implementation inside the kernel, then I moved on to adding bugs to the MPTCP protocol implementation inside the kernel, and currently I mostly get blamed by Linus; that is, I'm one of the upstream maintainers of the networking tree. And I'm Marcelo, I also work for Red Hat in the networking services team, and one of our main responsibilities recently has been integrating OVS hardware offloading into our products.

Okay, today we are going to talk to you about networking performance. We will give a very brief introduction to networking performance in general, then focus on a couple of case studies that we use to demonstrate some interesting, strange results; we will investigate them using common tools, see how to improve things in some cases, and hopefully draw some conclusions. What's the big fuss about networking performance? In the end it boils down to measuring the maximum number of packets per second, or messages per second, or the maximum throughput that a given host is able to process, usually on a single core. That is mostly by convention, because we hope that things will scale over multiple cores (even if that is not always true), and also because the setups are much simpler when we use a single core; that is, in turn, actually unfair to architectures that have less powerful CPUs but possibly many more cores available. Why do we do performance testing? For many good reasons: to detect bottlenecks, to avoid regressions, to tune setups, and so on. When we speak about performance we are not interested in functional behavior, we assume that everything works just as expected; we are instead interested in raw numbers. For that, we avoid well-known and very useful tools like tcpdump, Wireshark, packetdrill and the like, and instead focus on packet generators, either in user space, like iperf or netperf, or kernel-space packet generators like pktgen. Those tools usually provide the statistics we are looking for, but more often than not we also need aggregate counters that the kernel can provide and that we can inspect with other tools: netstat, which gives us per-protocol aggregate counters, or ss, which gives us per-socket information. And more often than not we are interested in seeing where our CPU cycles are actually spent, so today we will use the perf tool a lot. Let's move to the case studies. The first one is a very simple one: receive throughput for a UDP application. Why this one?
Because it's a very common thing that almost everyone, even telcos, measures first: how fast can we go, how many packets are we able to receive? The setup is very, very simple. We have a host that runs a packet generator, in our case pktgen; we use pktgen because we don't want the transmitter to be the bottleneck, we want to measure the performance of the receiver. The receiver runs on another host and is a very simple UDP application, here is the URL where you can find it, that just reads packets and drops them. We use that one because it has a lot of command line options that can be used to configure its behavior. The two hosts are connected by a fast link, 10 gigabit in our case, the faster the better, and we are using two Mellanox NICs, on both the sender and the receiver. Whatever hardware you use for this kind of experiment you will get slightly different figures, but the overall trend should be the same on most recent server-class hardware. So we want to see how fast we can go in this scenario, how many packets the receiver is able to process, and we want stable results. For that we pin the user-space application to a given core, and we pin the kernel-space processing to a core as well; we disable irqbalance, which could otherwise move the kernel processing to random cores unexpectedly, making the figures we measure sort of random, and we want to avoid that. That given, we run the first test with a mostly default configuration; the only perhaps unusual thing is that firewalld is disabled and we don't have netfilter. And this is what we see: 1.66 million packets per second, which is not bad, quite a bit better than we would have got a few years ago with the same hardware, but we want to understand if that is the maximum we can get. So we attach the perf tool to the receiver. The perf tool can measure how many cycles are spent in any function executed by a given core, and here we are using a command line report. The main pieces of information are the name of the function where the CPU is spending cycles and the percentage of CPU cycles spent in each function. You can see that most of the time is spent copying packets, copying data from kernel space to user space, but not that much of it, roughly 20% of the whole CPU time. Below that, the four topmost offenders are functions related to syscall overhead: there is the libc receive-message syscall, which is the syscall used by the application to actually fetch packets from the kernel, and the other ones are related to security countermeasures for recent hardware vulnerabilities and to SELinux inside the kernel. So the bottom line is that we are spending a lot of time on syscall overhead, and the application is receiving one packet per syscall. We could think that if we used a different syscall that allows us to process many packets with a single call, it would save a lot of time and possibly go faster. Such a syscall actually exists, it's called recvmmsg, where the first "m" stands for "multiple", and we can change the behavior of our tool with a command line argument to tell it to use that syscall. So let's do it and measure what we see. Surprise, surprise: we are slower than before, which is quite unexpected, we hoped it would go faster. Why?
Simply running the top command line tool gives us some information. As we can see, in the first experiment the user-space process took roughly 88% of one CPU, and now it's taking much less; so we really are going faster, yet we are processing fewer packets. Why? Because the bottleneck in this case is not the user-space process; the bottleneck is the kernel-space processing. The other process you can see in the top report is ksoftirqd, which is keeping a CPU fully busy. So if we are faster, why do we see fewer packets? Because to actually wake up a process, the CPU needs to spend some cycles. The faster the user-space process is, the more frequently it goes to sleep, the more frequently the kernel-space processing needs to wake it up, and the more CPU cycles are spent waking that process up, the fewer CPU cycles that CPU has available to actually process packets. So the bottom line is that with our first perf report we took the wrong path of investigation: we looked at the wrong CPU. We should have looked at the other one, the one running the kernel-space stack. So let's look at that now, and this is what we get. We can see that the topmost offender, the function burning the most CPU cycles, is inet_gro_receive, and there is another function still quite high in the ranking with that "gro" word inside. Both of them are related to the GRO engine. The GRO engine is a very low-level component of the networking stack; it is in charge of aggregating as many packets as possible, as seen on the wire, into a single giant packet that will later traverse the whole networking stack. That technique allows the networking stack to save a lot of CPU cycles when it is able to aggregate packets, because as many as 40 wire packets will be processed by the networking stack at once. The bad thing for this experiment is that the UDP protocol does not leverage GRO by default, so those cycles are completely wasted. What can we do? We could disable GRO. That is something we can do in this experiment because we are interested only in UDP; in general it's a bad idea, because there is also the TCP protocol, which we all use somehow, and if you disable GRO, TCP performance will sink dramatically. Anyhow, we can disable it via ethtool, with the command line reported over there, and repeat the experiment. And finally we get some progress: if you compare the number with the one we had a few slides ago, you can see a measurable improvement. We did not change our code at all, just a somewhat better configuration, at least for this test. Still, we have some surprises. This is what the perf tool now reports as the topmost offenders for the kernel-space processing, and as you can see, it is completely different from what we had before. Yes, the GRO functions are gone, because we disabled GRO, but the other functions show seemingly random numbers compared with what we had before, very different CPU usage. Why?
Because when we process a packet coming from the network, no matter what we do, we will have cache misses for every packet: the packet contents are fresh, just put into memory by the DMA engine, and that memory content is, from the CPU's point of view, completely new. That means a cache miss. Before, that cache miss happened in the GRO engine, and that was one of the reasons it was so costly. Now the GRO engine is not running anymore, but we still have the cache miss, and whatever function actually experiences it sees its cost explode, sort of. So, we are still interested in seeing whether we can improve the throughput. Some way down the list, taking very few CPU cycles, we see the UDP early demux function. That is somewhat relevant, because the early demux function tries to look up a connected socket for each incoming packet, to avoid a later route lookup; but in our experiment the UDP socket is not connected, so that little amount of CPU cycles is completely wasted. We can avoid that with a simple sysctl disabling the early demux functionality. We execute the sysctl, repeat the experiment hoping to see some improvement, and we do see it; unexpectedly, it's a huge improvement, 10%. We just removed a little overhead and got a relevant improvement. Why? Because, so far, I lied a little bit: the figures are not that stable overall. They are not stable because power management is enabled on the hosts we are using, and power management kicks in at unexpected moments; when it kicks in, the results become sort of random. Anyhow, repeating the tests many times, the trend is this: if you disable GRO you will see an improvement, and if you disable early demux you will see another improvement. We are still interested in maximizing our throughput, and we can try something slightly different. Looking again at our perf report, we notice a few functions related to route lookup that are consuming quite a few cycles, and we mentioned before that the early demux functionality can avoid the route lookup. So we could try to enable it, and change the configuration of our user-space tool to actually connect the UDP socket upon reception of packets. That can be done only if the ingress UDP traffic belongs to a single flow, a single given L4 tuple: source IP, destination IP, source port, destination port; and in our experiment it is indeed a single flow. So we enable early demux, change the configuration of udp_sink, and rerun the test, and great, we see a relevant improvement. We have moved to more than two million packets per second, from 1.66 at the beginning, which is more than a 20% improvement, with no changes at all to the code of our application, just a slightly different setup. If we go back and look at the top tool output, we see that now both CPUs are fully busy, so we might conclude that we are at the end of the road and no more performance improvements are possible. And that would be false, because we mentioned at the beginning that GRO can give a great boost to bulk transfers, and GRO is not enabled by default for UDP but can be enabled on a per-socket basis, if the application creating the socket requests it. udp_sink does not support that option, so no real figures for this one, but if you fetch the source and modify it to enable UDP GRO, which is a simple setsockopt option, and then use recvmmsg, because at
that point the bottleneck will be back on the user-space side, then with this hardware you will see something around 3.5 million packets per second, which is much more than what we see now. And that would be the end? No, because, if nobody from the security team is watching, you could try disabling SELinux and possibly also disabling retpolines and the security mitigations. I'm not suggesting you do that; you can do it in a controlled lab where everything is under your control and your responsibility. If you do, you can probably reach something around 4 million packets per second, which would be almost 3 times the initial figure, and that would probably be the greatest number you could get from this kind of hardware. With that: alright, thanks Paolo.

Moving on to our next case study, I'll be covering a situation in which we use hardware offloading, which we can do with OVS and TC. So yes, completely different from the use case before. I'll briefly explain how OVS hardware offload works, because it's not really common yet, but I'll assume that you know how OVS works. Here we have a picture of how standard OVS works: packets come through the NIC into the network namespace using veth, so that's pretty standard. With hardware offload, it leverages SR-IOV: it creates virtual functions in switchdev mode, in which the flows are processed in a programmable way, unlike the legacy SR-IOV model, in which the card itself does many things on its own; this is the benefit of switchdev mode. Then the picture looks like this: you have the NIC, you have the network namespace, and, assuming the flow is already offloaded, packets go directly from the wire into the network namespace and vice versa. This is the fully offloaded case. And then there is the partial offload situation, which happens, for example, when you are doing conntrack and decapsulation but for some reason you can't output to a virtual function in the network namespace, so you have to use a veth; the card can't output directly into that. So it offloads all the processing up to that moment and then resorts to a software fallback for the rest. In the picture it's pretty much the same thing, but the last step, going into the network namespace, is not offloaded (the processing up to there is in hardware), and the way back to the wire is done entirely in software, because when the packet is coming from this veth, it's impossible to do offloading at that point. The network card we used was a ConnectX-6 Dx; the sender was running RHEL 8 and the receiver RHEL 9, so it's very, very fresh. The test was a simple TCP stream, an iperf test. The results we got: here we are testing just the TC datapath, we are not using the OVS kernel module, and we are leveraging the skip_hw switch, so that we can either run entirely in software or leverage offload whenever possible. The idea behind trying to leverage this partial offloading is that using dedicated hardware is usually better than using a generic processor for the computational work, and at the same time it creates some parallelism almost out of the blue: when the card finishes processing a packet, it delivers it to OVS to do the last remaining part, and it's already processing the next packet, so you get some parallelism for free. So we should get a performance bump. But no, something happens: we get worse performance than doing it entirely in software. We go from 18 to 11 gigabits per second, and that's quite a drop. If we check the entirely-in-software case, where we are using skip_hw:
on the sender side we can see the CPU usage is quite okay, no CPU is being maxed out, no bottlenecks here. On the receiver side, though, yes, we are maxing it out, so the receiver is the bottleneck. If we then move to the hardware offload situation: on the sender side, still not maxing out CPUs; on the receiver side, still the same figures. So what we can conclude already is that when we go to this partial offload situation, on the receiver side we apparently lose 50% of efficiency, because we are using the same amount of processing to do half of the work. We don't know what's going on yet, so let's start top-down. First thing: check the TCP stats, to see whether things are going right or wrong there. Do you see something off in here? I don't: there are no retransmissions, there are no drops, it's really clean, but the numbers are lower. So it doesn't seem to be TCP; let's move down. This is in software, our baseline, this one here. We are not debugging to make something better; we are comparing A and B. And this is one output of perf. Unlike in Paolo's report, the left column here is the accumulated CPU usage that each function and its callees are using, and nothing stands out, although this is the baseline; and when we compare it to the hardware offload situation, it's not too different. So how do we move on from here? Pick one function at random? No: with previous knowledge, we know that this function is quite important, it's the one that gets called in the driver when it processes a packet the NIC has just delivered. So we can dive into it; and if we didn't know that, we could go over every function, which would take a little longer but would get us there. Then we expand that view, and we see these differences here, the software one and the hardware-offloaded one. We can see it's doing more stuff, and different stuff, but at the same time we don't have a good idea of what these numbers mean. It's spending less CPU time in napi_gro_receive, but what does that mean? We can't make sense of it yet; finding bugs is harder than finding Waldo, right? So we go on and check what those numbers mean. These columns count how many times these functions are getting called, and during this experiment this one here, napi_gro_receive, was getting called 7 million times. As Paolo was explaining, this is the function that coalesces packets into a bigger one that the networking stack will process later. When we go to hardware offload, it gets called 4 million times, and if you do the math between the two, that's pretty much the drop we had in throughput. Okay, now we are starting to talk. The other two functions that appeared, the conntrack flow-restore and conntrack receive ones, are also getting called that many times. They were called a lot in the baseline too, but there they just returned early, so they showed up blank; now they are quite present. And when we are offloading, it's interesting to note that conntrack didn't get entirely removed from the software side, because we still have legitimate calls to it in the software processing, given that the transmit path is not offloaded, and we also have calls to it inside the network namespace. So it's still not easy to make sense of these numbers. Let's move back to this window and expand it, because now we have some more knowledge and it's easier to understand. When I expand the function calls, we can see that, because of the
fallback to software, what it is doing, in order to restore the conntrack entry, is packet header dissection at the driver level, which is before GRO; it consults an xarray two times; it does memory allocations for the extensions and for the tunnel destination, and memory allocations, as you may know, are really not cheap; and it consults another rhashtable. All of this is done before GRO: the driver does all this work for each packet and then hands it to the GRO engine, which realizes it belongs to the same flow as the previous packet and merges them, and all this effort is then mostly thrown away, because the packet just becomes part of that bigger packet. That's why recovering this metadata from a packet that was partially processed by the driver is actually more expensive than doing it entirely in software: in software we don't do any of this per wire packet, we just aggregate the packets and do the work only once. So you're talking about a trade-off between doing this work 40 times and then aggregating the packets and handing them to the network stack, versus aggregating the 40 packets first and then doing the work once. So it's another attempt at getting some benefit from the hardware that backfires. A proper solution would be for the hardware vendor to support GRO, so that the network stack and the hardware are more aligned in how they work. Some conclusions: when dealing with performance, both expected and unexpected things may backfire, and you may have more work to do; it's rarely a case of just flipping a knob and everything being fine, because it really depends on the use case you are working on. There's no size that fits all: you can optimize the system as much as possible for one use case, but another use case is different; it may work, it may not, and it may even work worse than if you hadn't done those optimizations. And there's this one tool that pretty much rules them all: perf supports a lot of things that can help you. You will likely need some knowledge of the internals of the many kernel subsystems, but it's a very helpful tool that is well worth your time to understand. And that's it. Any questions?
I think we confused them all. So, sorry, let me repeat: regarding the case study, the question is whether we could get better performance without SELinux. Yes, that is actually well known, but don't tell Paul Moore, who is the SELinux maintainer. Well, he knows, actually. If we go backward a little, on one of the first slides here you can see that the fourth topmost offender is this SELinux function; that is the hook used by SELinux to enforce its policy. If you disable SELinux at boot time, not just making it permissive, but adding selinux=0 on the kernel command line, that function will not be called at all, and that overhead goes away. Here it is 3-something percent; at a greater packet rate, which we cannot reach with SELinux, it will be more visible, and removing it will give you some relevant gain. But don't do that. Can you repeat the last part? The question regards cache utilization: I mentioned that processing new packets causes cache misses, and the suggestion is to disable caching for packet memory to avoid the cache miss, if I understood correctly. But that would not solve the problem per se: the CPU sees the cache miss because it has to access the packet data, the IP addresses, for IP processing; if you don't have the cache in between, you have to go all the way down to main memory and spend a lot of CPU cycles, which in the end is exactly the same as having a cache miss there. Other questions? Thank you.

Hi, testing, is this okay or not okay? So hi, I'm Pavel, I'm at my fifth DevConf, and I have my presentation as a web page, because I will be speaking about web APIs, so I decided it would be better to do everything in a way you can try yourself, because I could have mocked something up, but this way you can try everything I will be doing by yourself. I will be talking about the most interesting browser and web APIs. How many of you are web developers? I'd say about half of the people. And how many of you are programming in JavaScript or TypeScript or something like that? A little more than the web developers, because nowadays JavaScript isn't only for the web, and I will touch on that topic in this presentation. And how many of you have used some special, unexpected, weird thing on the web, for example VR on the web, or augmented reality on the web? So you are in a good place, because I will be talking about how to do strange, weird things directly in the web. This is the address of the code, if you don't have a scanner. The first thing I would like to do is distinguish between a web API and an API web service, because these are two totally different things, but they are often mixed together. The first thing most people will think of when I say "web API" is, for example, the OpenAI API, or some Facebook API, or a weather API, some service which is somewhere in the cloud and from which you get information. But that is actually not a web API; I will be talking about a different kind of API: the capabilities the browser has in itself, on the device, locally, without the internet, which can be used from JavaScript to build some cool stuff. One thing happens to me very often. Five years ago I was here at DevConf, and my friend Michael wanted to show something cool, and back then the coolest thing was virtual reality. It's a little surprising, but yes, five years ago virtual reality was, I don't know, like AI
today, kind of. And we were thinking that we could go to DevConf and put up some Windows PC with SteamVR and show something, but it would be a little weird to have, at a Mozilla stand, at a DevConf organized by Red Hat, a Windows computer with no connection to the conference. So we figured out that we could use VR directly in the web, and we had a stand where people could try our very, very simple game, and the main reaction was: wow, this is cool, and this is really running in the web, in Firefox, in JavaScript? JavaScript is the language for validating forms and alerting stuff, or something like that, and people were very surprised that this can actually be done in JavaScript. So I would like to show you, in this short presentation, what interesting things can be done directly in JavaScript. There is an official page on MDN which lists all the web APIs, but it's a very, very long list, and I needed to pick some, so this presentation is very subjective. I will talk about six of them, and maybe there are more interesting ones, more cool or more capable or more up to date; it's beyond my capacity to know all of them, but you can definitely go through the list. I have picked these ones, they were also in the description of the presentation, and I will show what each of them can do, what the drawbacks of these APIs are, and what kind of applications you can program with them. And this is the first slide. I actually didn't know if it would work through the HDMI cable, but if not, I have it on my mobile phone. This is a live demo of one very strange drawback; I will probably run it on my phone because it's certainly loud. And this is actually the script, this is the whole script, there is nothing more: if you copy this into the HTML and put it into a script tag, that's the entire thing. Speech synthesis is a bit of an old thing, but when I wanted to do this five, six, ten years ago, I would have needed some external service, I would have needed to get an API key, set up my server, do some configuration there; I would have needed to put a lot more work into this function. Nowadays there is a standardized API to synthesize voice in the browser. There is a drawback: the standardization does not pin down one-to-one what it should sound like, and some systems don't have, for example, English synthesis, some systems don't work properly in all languages, some voices are male and some are female. You can list the voices, look through the available synthesizers and pick the best one, but you have no guarantee there will be a synthesizer for your language and the gender you want, though you can be pretty sure that on most devices there is an English voice for synthesis. So for some applications this is a very useful thing. There is an equivalent API, called speech recognition, which works the other way around, but it actually needs a permission, because it would be a little scary if the web could directly listen to you, transcribe what you are saying and send it somewhere. So when I click Run here, it will ask for that permission; actually, I have some problem with my notebook, so never mind, I don't think it will work here.
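(The slide's code isn't captured in the transcript; the following is a minimal sketch of the kind of script being described, not the speaker's exact code. The voice-picking line and the "en-US" language are illustrative choices, and SpeechRecognition still needs the webkit prefix in Chromium-based browsers.)

```javascript
// Speech synthesis: this really is more or less the whole script.
const utterance = new SpeechSynthesisUtterance("Hello DevConf!");
// Optionally pick a voice; which voices exist differs per system and language.
const voices = speechSynthesis.getVoices();
utterance.voice = voices.find(v => v.lang.startsWith("en")) || null;
speechSynthesis.speak(utterance);

// Speech recognition: the other way around, and the browser asks for
// microphone permission first.
const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
if (Recognition) {
  const rec = new Recognition();
  rec.lang = "en-US";
  rec.onresult = e => console.log(e.results[0][0].transcript);
  rec.start();
} else {
  console.log("Speech recognition is not supported in this browser");
}
```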
This is one thing I will mention on the next slide: it does not work in every browser, and you don't have permission to do it in every browser, so if you want to use basically any of these APIs, you should check the permissions and check that you actually can do it; you cannot rely on it being there. I will show that on the next slide. Here there is another API, or another set of APIs, which can detect the surroundings: device orientation, meaning whether the device is facing up or down and how it's oriented; it can detect acceleration; and it can detect many more things than the four I'm showing here, but the code is only for the accelerometer. You should detect whether the API is actually there, because it may not be. You can look at the page called caniuse.com, put in the name of the API, and it will show you, for example, that 95% of browsers support the accelerometer API in the Czech Republic, and then you can decide whether or not you want to use that API. There are a lot of them, and a lot of them have pretty nice uses; the obvious use of these APIs is games, and I will be using such a thing. And this is the thing I mentioned at the start: there is a way to use virtual reality and augmented reality directly from your browser. There are three APIs, which is a bit confusing: in the past there was an API called the WebVR API, which is now deprecated (it still works in some places, but it's deprecated), and it was replaced by the WebXR API. WebXR is the way to unite, through one standard, all the devices: virtual reality headsets, augmented reality headsets, mobile phones, and probably the new device from Apple will implement it too, I don't know. I actually don't know what will happen if I try it here, but I can show you the thing on the device where it's supposed to be tested. You don't need to see the text, but do you see the button? When I hit Run, the text says that this device has registered Google Cardboard glasses, I think that's what it's called, and it sends the scene into the Cardboard glasses. This is this code running on this screen: when I run it on the device, it turns on a detection mode and I can see the stars. And this is the thing a lot of these raw APIs have in common: you can use them directly, you don't need anything else than this code to start the VR experience, but there will be nothing there, just some default setup and nothing else.
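(Again, the slide code isn't in the transcript; this is a hedged sketch of what "nothing more than this code" looks like with raw WebXR: feature detection plus a bare session request. The "#run" button id is made up for the example, and a real scene still needs a WebGL layer and a render loop, which is exactly the part the frameworks below take care of.)

```javascript
// Minimal raw-WebXR sketch: detect support, then start a bare VR session.
// requestSession must be triggered from a user gesture, e.g. a button click.
async function startVR() {
  if (!navigator.xr) {
    console.log("WebXR is not available in this browser");
    return;
  }
  const ok = await navigator.xr.isSessionSupported("immersive-vr");
  if (!ok) {
    console.log("No immersive-vr capable device is registered");
    return;
  }
  const session = await navigator.xr.requestSession("immersive-vr");
  // With no layers and no render loop you only get the default setup;
  // frameworks like Babylon.js or A-Frame fill in the rest.
  session.addEventListener("end", () => console.log("XR session ended"));
}
document.querySelector("#run")?.addEventListener("click", startVR);
```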
Or you can use some framework which enables you to do this stuff without knowing about things like shaders and rendering pipelines, so many things that they would probably fill a day of DevConf and a day of workshops. When you want to try some AR or VR stuff, I recommend the Babylon framework or the A-Frame framework. The Babylon framework is imperative: you are programming the thing, you are saying put a cube here, put a controller there, put a light there, and do this when the user clicks on that. The A-Frame framework is like HTML or SVG, but for 3D: you have XML code which defines the scene, and you can say in XML: here is a cube, here is a circle, here will be the wallpaper, here will be the skybox. This is static, and you actually don't need to know much; you can just design it by coding or with some editor. And this is the thing you saw: on my mobile phone it attached to the sensors, so it was more interactive; the notebook doesn't have the sensors, so it falls back to the classical 3D game mode, and the framework does this work for you. It's maybe a little weird that I'm talking about this in a presentation where I'm mentioning things like WebVR and speech synthesis, but one thing in JavaScript used to be a very, very big pain, and that was working with dates and all the formatting stuff; it was totally messy and unhelpful. Nowadays there is a pretty useful and very widely supported API for formatting and manipulating dates and strings, which you can use without including any library or anything else: you just tell JavaScript, hey, format me this object to that format, and it will just do it. And there is another big thing in JavaScript: all the things I have shown you (maybe not the VR, where there is some rendering in WebGL) run in JavaScript in one thread. It's a single-threaded process, and you are a little limited by that, but nowadays you have options for offloading work onto a second and a third thread, and you can make a proper multi-threaded application. There is a thing called Web Workers; I don't know how many of you have heard of or used Web Workers. Yeah, like a third of you. Web Workers mean you can run a secondary thread in JavaScript, send messages between the main thread and the second thread, and do some hard work there, for example compute some very heavy stuff. Nowadays there is also an API for doing that heavy work on a canvas inside a worker, which is pretty useful when you want to render something very heavy and don't want to mess up the whole UI; when you are working in the main thread, you have to interleave the work with animation frames and stop to let the UI settle, and there is some strange stuff involved. If you use this, you can do the very hard work in the worker, and when the worker is ready it sends a message to the main thread, the main thread shows the result, and everything works very seamlessly. I was thinking about what else I could show you, but there are so many things, so now I will ask you whether there is something you are interested in doing in JavaScript or in the browser, because there are so many things and so little time to present them. Yeah, there is a compass in this set; there are multiple sensors, and you can list all of them on the MDN page, and yes, there is a compass. Also, importantly, there is geolocation, and geolocation is a separate permission: when you allow, for example, the gyroscope, you still need to separately allow geolocation, where you are. Do I have 10 minutes, or 5? Yeah, so, definitely: the Babylon framework I have shown here combines a lot of things together, and it also includes the camera, but you can access the raw camera as an API.
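(A minimal sketch of what grabbing the raw camera looks like; illustrative, not from the slides. It assumes a secure context, and the user has to grant the camera permission first.)

```javascript
// Raw camera access: get a MediaStream and show it in a live <video> element.
async function showCamera() {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" }, // prefer the back camera, if any
    audio: false,
  });
  const video = document.createElement("video");
  video.srcObject = stream; // the same stream could be recorded or processed
  video.autoplay = true;
  document.body.appendChild(video);
}
```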
With the raw camera you can do pretty much everything you can do in a native application: you can program a proper camera application on the web that does everything the normal application does. There are also very good APIs for working with streams: you can take the camera and stream it somewhere, or take the camera, process it, and do stuff with it. When you are using the camera, there is one thing that JavaScript, or the browser, cannot do, and it's not about capability, it's about permissions and sandboxing: JavaScript in the browser cannot modify the files on your system. It would be weird otherwise, so you cannot directly build, for example, a file explorer on the web, or something that saves files. But there are ways. One way is that you can emulate downloading: you can create a file, for example from the camera, or a PDF file, or whatever, bundle it up and emulate the download of that file; the file gets downloaded, but not from the web, internally, from JavaScript. And you can also open the file picker box: you tell the user, hey, now upload me a file here, but you don't actually need to upload it anywhere; you can process it in JavaScript. In this situation the permissions work like this: when you click OK, or Open, or whatever the submit button in the file dialog is, you are giving JavaScript the permission to read that file; before that, JavaScript cannot even see your files or get any information about them.
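(A hedged sketch of both tricks just described, the emulated download and the file-picker read. The downloadText helper and the hard-coded file type are illustrative choices, not code from the talk.)

```javascript
// Download emulation: build a file purely in JavaScript and "download" it,
// with no server involved at all.
function downloadText(filename, text) {
  const blob = new Blob([text], { type: "text/plain" });
  const url = URL.createObjectURL(blob);
  const a = document.createElement("a");
  a.href = url;
  a.download = filename; // triggers a download instead of navigation
  a.click();
  URL.revokeObjectURL(url);
}

// File picker: the dialog itself is the permission grant; JavaScript only
// sees the file after the user confirms.
const input = document.createElement("input");
input.type = "file";
input.onchange = async () => {
  const file = input.files[0];
  if (file) console.log(await file.text()); // read locally, upload nothing
};
input.click();
```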
Yeah, it's a very good question: what is the overhead of JavaScript? It depends on which API you are using, and it depends on the browser, some are more efficient than others. For heavy work, for example computing the digits of pi in a worker, it's slower in percentage terms, but it's not totally different; I think around 60% of the performance of C, or something like that, definitely not a hundred times slower. If you want to do very, very heavy stuff, JavaScript probably isn't the option, but for pretty much anything else it is okay. The next question is how you can find out that some cool API exists, where you can discover the new APIs. I think it depends. You can definitely go through the manual and the new things, but for me that is not a good way, because there are so many things that you get a little lost in them. Every time I see something on the web which surprises me, I look at how it is done. A lot of the things I discovered that way: I was browsing the web and I saw a site which could transcribe your voice, put the voice into a video, and then download the video in a different format than the one you uploaded, and I was like: what? There must be something on the server. And I turned off the internet and saw there is nothing on the server; it is running in JavaScript, in the web application. So I studied what was providing this feature. The second source is libraries: there are some very cool libraries which integrate these APIs together; for example, Babylon.js integrates the camera API, the WebXR API, the fullscreen API, dozens of APIs that this framework combines into something useful. The next question is whether it isn't a bit overkill to have all these things, whether it won't totally blow up memory. And the thing is: yes. Twenty years ago, browsers were like smart PDF readers, capable of nothing, maybe forms and form validation, and JavaScript was originally the language for small things, small validations and the like. Nowadays JavaScript is a language for creating fully capable applications, and I honestly don't know if that's good or bad. It has some drawbacks and some positives; for me it is more positive, because you get very portable applications that move very easily from one device to another without any installation or any struggle. And there is a second thing: each of these APIs which actually interacts with your device, for example with the location, or the sensors, or the microphone, must be allowed by the user; you have to grant the permission before JavaScript can even touch it. There is also the fact that JavaScript can do computations, for example it can mine cryptocurrencies in the background, and that was a problem; but nowadays the browser has mechanisms to detect such malicious behavior: when you go to a website which is doing something very strange in JavaScript, the browser can detect it and tell the user that this page is doing some nasty stuff and draining your battery, do you want to keep this JavaScript running or not, and you can decide as a user. I don't know if we are out of time or if we have more, so thank you for your attention, and I will be at the networking party, so we can talk there. Thank you.