I don't know if you know that feeling when you sign up for a marathon or half marathon: when you sign up, you think it's a really good idea, and as soon as you stand at the starting line you wonder, why did I do that to myself? I have a similar feeling today. But let me remind myself, and us, why I think it's really important to talk about causality and why I think it's interesting.

So, causality: in general, the idea is that we do something on one end that has an effect on the other. For example, if we change a certain variable in our system, that will have an effect on a different component in our system. Of course, if we manage complex systems, there will always be relationships that we cannot predict and that we only become aware of when we encounter an incident. But normally we also have a system consisting of multiple services, and all of them need to communicate to provide the overall functionality. That is something we can actually track really well: we can see the causal relationship from one service communicating with another. And as the title already suggested, we are talking about exactly that type of causality, the one we have in service-to-service invocation.

Maybe another picture that comes to my mind: if we don't have causality, we have spaghetti, and everyone knows spaghetti code. The idea is that something is very tangled and we don't see how things relate to each other.

Well, now we know why it's important, but why do I think it's important, and who am I? My name is Nele Lea Uhlemann.
I work for Fiberplane. We build notebooks to collaboratively debug infrastructure, so you can bring in multiple observability providers and see what's going on in your system. Besides that, I eat a lot of ice cream. I have not always worked at Fiberplane. I joined the company almost a year ago, and before that I worked for a company focusing more on process automation and microservice orchestration. That switch, for me, to put it in a food picture, was like going from the main course to the dessert: focusing on something completely different. But when I see something new, I also look for similarities. For the main course we have spaghetti, and I'm from Germany, where we also have a thing called spaghetti ice cream. It's basically ice cream with strawberry sauce. What I found is that we seek causality not only in tools focusing on observability, but also in tools like workflows.

So in the next 20 or 25 minutes, depending on how much time we have, I will jump through three tools: call graphs and service meshes, traces, and workflows, because all of them show us the causal relationship of service-to-service invocation.

Let's start with something where we could get something out of the box, because that normally sounds really good to us if we don't need to make many changes. Who of you is running your services with a service mesh? Just hands up. Okay, I see a few, not all, but okay. If you use a service mesh, the idea is that it captures your network traffic, helps you analyze it, and also helps you with setting up the network traffic. When we run in Kubernetes, we attach a sidecar to our service, and the sidecar is then responsible for handling the network traffic. If we have such a component, it might be a good idea to see what it can offer us, because it may already capture the service-to-service relations that happen over the network.
You see here I already put Envoy, because the example we look at is Istio, and Istio under the hood uses Envoy as a proxy for the network traffic. On top of that, what we will also see in the demo in a minute is Kiali, which is the plugin that helps us with the visualization. So let's do that quickly.

I have Kiali open here, and I see what I wanted to see. Here in the call graph I can zoom in, and now I see the services calling other services. That's actually a causal relationship: I see that the product page calls details and also calls reviews. I can even go into a certain service and see what its relations are. The nice thing in Kiali, I found, is that we see this call chain activity combined with metrics. If I go to the tracing page, I see the milliseconds as metrics, and in context with the call chain we can understand which of our calls took very long and which were very short. I could go even deeper into one of them and get data here: I see similar traces in that stack, I see how many apps are involved, and I also see how many spans there are. I'm coming to that now, because we are not talking about a call chain or anything else here; we are actually already talking about a trace.

So what is a trace? The idea of a trace is that I capture multiple events within a transaction, and each event is a span. It can be a call to a service, but it could also be a call to a database, a call to a message broker, or any other event that occurs. Every span has a certain duration and belongs to a trace, and I also know the parent span, so I can construct the tree you see down here. We already saw that Istio uses traces to generate those call graphs.
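The span model just described (each span has a duration, belongs to a trace, and knows its parent) can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Span` class, the helper functions, the span names, and the durations are made up for the example and are not the data model of Istio, Kiali, or Jaeger. Only the service names (product page, details, reviews) come from the demo.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    service: str
    name: str
    duration_ms: float


def children_by_parent(spans: List[Span]) -> Dict[Optional[str], List[Span]]:
    """Index spans by parent id so the trace can be walked as a tree."""
    index: Dict[Optional[str], List[Span]] = {}
    for span in spans:
        index.setdefault(span.parent_id, []).append(span)
    return index


def service_edges(spans: List[Span]) -> Set[Tuple[str, str]]:
    """Derive caller -> callee edges between services from the parent links."""
    by_id = {span.span_id: span for span in spans}
    edges = set()
    for span in spans:
        parent = by_id.get(span.parent_id)
        if parent is not None and parent.service != span.service:
            edges.add((parent.service, span.service))
    return edges


def render(spans: List[Span], parent_id: Optional[str] = None, depth: int = 0) -> List[str]:
    """Return the span tree as indented lines, one line per span."""
    lines = []
    for span in children_by_parent(spans).get(parent_id, []):
        lines.append(f"{'  ' * depth}{span.service}: {span.name} ({span.duration_ms} ms)")
        lines.extend(render(spans, span.span_id, depth + 1))
    return lines


# Hypothetical sample data: ids, span names, and durations are invented.
spans = [
    Span("t1", "a", None, "productpage", "GET /productpage", 42.0),
    Span("t1", "b", "a", "details", "GET /details", 5.0),
    Span("t1", "c", "a", "reviews", "GET /reviews", 30.0),
]
```

Calling `service_edges(spans)` yields the two edges a call graph would draw, product page to details and product page to reviews, while `render(spans)` gives the indented tree that a trace view shows.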
So, coming back to the idea of getting something working out of the box, where you see such a nice call graph: if you use a service mesh, I think there is something you might want to consider. In a service mesh, if I call service B from service A, I open the first span, so I have a context here, and once service B consumes the call, it knows the context from the header and knows that both belong to the same trace. But what can happen then is that if, in service B, I call service C, I open a new trace. So I have two separate traces, and I don't get the call graph out of the box. What I need to consider, if I want to get a call chain with my activities, is context propagation of those spans, of those headers. That's something we can do, but I'm mentioning it here because, if we are application developers, it is something that happens in our code base: we need to make sure that we propagate the context correctly. And what is the important part for linking those different spans to one trace? It's the traceparent header, which contains the trace ID and also the parent span ID. So here we know: okay, I belong to this trace, and the previous span was this one. That helps me to construct the full chain.

Cool. I talked about the service mesh and how it helps us with HTTP traffic. But if we consider our service architecture, we might not always have HTTP communication. We may also have other types of communication, maybe with a message broker, and then this looks different. Here, too, we need to go into the application code to make sure that we trace those messages correctly. And then we are at the topic of tracing in general. Even without a service mesh, we can start tracing our applications. The idea here is that if we have our application
code, we can use auto-instrumentation. There are certain agents we can run next to our application code, so there is no need to touch it. OpenTelemetry supports a lot of languages and, within those agents, a lot of frameworks. The other idea is that if I don't want to stay on the edges of my service, I can also go into the service code for manual instrumentation. I can make the events within the service visible in my trace.

Let's have a look at a scenario. We will look at an application instrumented with OpenTelemetry, and at Jaeger. For that, I'd like to introduce you to the application I'm going to use for the demo. It's an ice cream recommendation service, because I really like ice cream and have a hard time choosing. We have four services connected over Kafka as a message broker. We have the recommendation service that takes a user input. Based on that user input, we call the location service to get the longitude and latitude of a certain city or location. With that geodata we can call the weather service, because obviously we also want our recommendation to be based on the weather. And in the end we call an AI service, because someone told me every good application nowadays needs AI. So we basically call ChatGPT and ask: hey, can you give me an ice cream recommendation? Well, we will see that for certain scenarios. But let's wait a moment before we look at which ice cream gets recommended.
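The context propagation mentioned earlier can be sketched for messages passed between these four services. The sketch assumes the W3C `traceparent` layout (`version-traceid-parentid-flags`, with a 32-character trace ID and a 16-character span ID) carried in message headers. It also treats the services as a simple linear chain, which is a simplification of the demo, and the helper names `start_span` and `run_pipeline` are hypothetical. No real Kafka or OpenTelemetry API is used.

```python
import secrets


def parse_traceparent(value):
    """Split a traceparent value into (trace_id, parent_span_id, flags)."""
    parts = value.split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        raise ValueError(f"malformed traceparent: {value!r}")
    _version, trace_id, parent_span_id, flags = parts
    return trace_id, parent_span_id, flags


def start_span(headers):
    """Continue the incoming trace if there is one, otherwise start a new trace."""
    incoming = headers.get("traceparent")
    if incoming is None:
        trace_id, parent_span_id, flags = secrets.token_hex(16), None, "01"
    else:
        trace_id, parent_span_id, flags = parse_traceparent(incoming)
    span_id = secrets.token_hex(8)
    span = {"trace_id": trace_id, "span_id": span_id, "parent_span_id": parent_span_id}
    # Outgoing messages carry the same trace id, with this span as the new parent.
    outgoing_headers = {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}
    return span, outgoing_headers


def run_pipeline(services, forgetful=()):
    """Pass one message through the services in order.

    Services listed in `forgetful` do not copy the context onto their outgoing
    message, which is the propagation bug described in the talk.
    """
    headers = {}
    spans = []
    for service in services:
        span, outgoing_headers = start_span(headers)
        span["service"] = service
        spans.append(span)
        headers = {} if service in forgetful else outgoing_headers
    return spans


services = ["recommendation", "location", "weather", "ai"]
linked = run_pipeline(services)
broken = run_pipeline(services, forgetful={"location"})
```

In `linked`, every span shares one trace ID and points at the previous span as its parent. In `broken`, the location service drops the header, so the weather service starts a second, unrelated trace and the chain is cut in two.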
Let's look into the weather service. I used Micrometer in Java here. It is a facade that you can use with Java and that is compatible with OpenTelemetry, so you can export the data in an OpenTelemetry standard. What you see here is that I use the Observation API in code. I created an observation and started it, and then I can control the lifecycle of that observation. I can create an event, which will be the span that we see, so I call it "weather info created". Then I can put further context into it: I can say I also want the weather, I want the business case, and I want the name of the person requesting this. And then I stop the observation. Another thing: if we are in the Spring context, you don't always need so many lines of code. If you register the aspects, you can also use an annotation over your class, basically saying "weather fetched service". So there is not always the need to change so much in your code if you want to have custom spans.

Cool. Let's have a look at the most important part, the ice cream recommendation of the day. I already have my name in. I'm in Paris. My diet, obviously, is a croissant diet since I'm in Paris, and I'm still super excited to be here. Let's send that over. It sometimes takes a moment... ah, here it is. We have an excited mood, we have the diet preference, and scattered clouds in Paris. Can someone confirm? I haven't been outside. We get a caramel cappuccino ice cream flavor. That doesn't sound too bad.

Let's look at what we can see now in Jaeger as a trace. If I go to the ice cream recommendation service, I see the nine spans. What I want to show here is that I have the "weather consumed" span, and I see my name, I see the scattered clouds, and I also see the business case. So I get all the information that I put into the span, and that's the span that I created. I also see the other one; that is this one here.
That is where we annotated the class. Here I didn't put anything in, but based on the class I also see the method I called in that class. And that's the idea: if we want to have a causal relationship that goes beyond the outer edges of the service, we can use manual tracing to see what's going on in our service and how it is behaving.

Cool, coming back to the presentation. Big question now: just imagine I haven't gotten an ice cream recommendation. I'm a customer, and I'm calling the company responsible for that: where is my ice cream recommendation? Will tracing really help me find out what happened if something has not been delivered? You saw clearly that I can put the business key and the business ID into a trace. But normally, as a best practice, we don't capture a hundred percent of our traces. We keep maybe only ten percent; we do sampling, because it's expensive to keep all the traces. So I would have to be very lucky for the specific trace of my business case to have been captured. A trace does not necessarily relate to a business case or to a process that goes through our systems.

And that's where workflows come in. Now we make a jump: we jump from the observability space more to the application design phase, and to workflows. The idea of a workflow is that I have a set of activities defined in a sequence, and I need to perform certain tasks. For example, I write a shopping list, and once this is completed, I jump to the next step in the process, which in this case is an event. In this example, I wait for "bank account balance received". I need some data here, and once I have that, I can order items, and then the workflow ends. The idea is that those tasks can be performed by our services, and those events can come from external sources. That really helps us to capture business cases, because whenever I start one of those
workflows, I create a process instance, and this process instance travels through the process and also gives me information about state. There are certain benefits to that. For example, if everything goes down, it's very easy to ask for the state of the process: where are we, where have we stopped, and where do we need to continue?

In our ice cream recommendation scenario, we already have one such service, the recommendation service, which sent out geodata and received the recommendation. It needed to make sure to correlate the messages correctly, so we already had a service with state here. If we put that into a workflow, we could say this is our workflow application, with the set of activities that we do: we request the geodata, we wait for the geodata, we get the weather information, and we request the final recommendation, so we call the AI service. Once we have that, we bring the information back to the front end. I still have Kafka here, because it is not really important how you call the activities in that workflow. It can still be over the message broker; it doesn't need to be direct service invocation. That can be very handy if you have asynchronous communication and also things that take a little bit longer.

Cool. In the following scenario I used Dapr as the workflow engine in my service, because I saw it, I think, last year at KubeCon, and I was super intrigued by the idea of building workflows completely in code. Our workflow lives in the one service, we have the Dapr sidecars for the other abstractions that Dapr offers, and we can manage the state and retries of that specific workflow. So what does it look like? Just give me a minute. I have here the workflow defined in my application, the workflow definition, and in that workflow definition I really define what I want to do.
So, for example, once I have started the workflow, I want to send the location data. Here I also have a nice way of limiting the data to what is needed for that purpose: with the location object, I can define that I only want to pass in the name of the location and maybe the workflow ID, but the rest of the information should not be passed to the location service. And then, the most important part: I call that activity, and I wait here until the activity has completed, and then I continue in my workflow. The nice part is that I can also wait for external events, coming from Kafka in my scenario, but they could come from anywhere, right? It could also be a user who needs to do something. And if you have something really long-running, because it uses durable tasks, I can also set a timeout here. I can say: you know what, if this external event is not coming back within the next day, I do something else. In this case a TaskCanceledException is thrown, and I define in the catch clause what I want to do. Basically, I don't give an ice cream recommendation, and I complete with a message that we can't provide one at the moment. I hope you already see that having this as a feature is really powerful, because coding a timer yourself can get really complicated.

Cool. Let's run the application quickly. I need to stop this one here, and I can bring up this one. That looks good, and I go back to my front end. I want to know the weather in Berlin; maybe I'm missing out. I'm also feeling slightly more relaxed now, towards the end of the presentation. So let's send the data. I get French vanilla ice cream in Berlin. Okay, that's interesting. I don't know... well, okay, it did the job. If I go to the console, the thing I want to show is that I see what has happened in the workflow. I'm logging those events.
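The shape of that workflow (request geodata, wait for an external event with a timeout, fall back if it never arrives, keep a queryable history) can be sketched with plain `asyncio`. This is a toy model under assumptions, not Dapr's workflow API: `WorkflowInstance`, the event name, the stubbed `get_weather` and `ask_ai` activities, and the fallback message are all hypothetical. A real workflow engine would also persist the state durably, which this sketch does not do.

```python
import asyncio


class WorkflowInstance:
    """Hypothetical stand-in for the per-instance state a workflow engine keeps."""

    def __init__(self, instance_id):
        self.instance_id = instance_id
        self.status = "RUNNING"
        self.history = []  # completed steps, can be inspected at any time
        self._events = {}

    def _event(self, name):
        # One future per event name, created lazily inside the running loop.
        if name not in self._events:
            self._events[name] = asyncio.get_running_loop().create_future()
        return self._events[name]

    def raise_event(self, name, payload):
        """Deliver an external event, e.g. a message consumed from a broker."""
        future = self._event(name)
        if not future.done():
            future.set_result(payload)

    async def wait_for_external_event(self, name, timeout):
        """Wait for the event; raises asyncio.TimeoutError after `timeout` seconds."""
        return await asyncio.wait_for(self._event(name), timeout)


def get_weather(geodata):
    return "scattered clouds"  # stubbed activity, value is illustrative only


def ask_ai(weather):
    return "caramel cappuccino"  # stubbed activity, value is illustrative only


async def recommendation_workflow(instance, location, timeout):
    # Pass on only what the location service needs: the name and the workflow id.
    request = {"workflow_id": instance.instance_id, "location": location}
    instance.history.append(f"geodata_requested:{request['location']}")
    try:
        geodata = await instance.wait_for_external_event("geodata_received", timeout)
    except asyncio.TimeoutError:
        # The event never arrived: finish without a recommendation.
        instance.history.append("geodata_timed_out")
        instance.status = "COMPLETED"
        return "Sorry, no recommendation available right now"
    instance.history.append("geodata_received")
    weather = get_weather(geodata)
    instance.history.append("weather_fetched")
    flavor = ask_ai(weather)
    instance.history.append("recommendation_created")
    instance.status = "COMPLETED"
    return flavor


async def run_demo(deliver_event):
    instance = WorkflowInstance("wf-1")
    task = asyncio.create_task(recommendation_workflow(instance, "Paris", timeout=0.05))
    await asyncio.sleep(0)  # let the workflow run up to its wait point
    if deliver_event:
        instance.raise_event("geodata_received", {"lat": 48.86, "lon": 2.35})
    result = await task
    return result, instance
```

Running `asyncio.run(run_demo(True))` completes with the stubbed flavor, while `asyncio.run(run_demo(False))` takes the timeout branch. In both cases `instance.history` records where the instance got to, which is the kind of question a trace that was sampled away cannot answer.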
So I see exactly what's going on, and based on that, this workflow has now finished. But if that were a really long-running business case, I could also query the workflow API to understand where I am, what the current state is. And in the case of my missing ice cream recommendation, someone could look into that and really tell me why and where the process is stuck at the moment.

Cool. That was a very short and quick intro to causality. I showed three tools, not four. I think we saw that they have similarities when it comes to causal relationships, but they are still very different from each other. Let's have that as an overview. If we use a service mesh, we have the possibility to capture network traffic, and based on that we also see service-to-service invocation. Mainly, they also use the concept of tracing here. We might need to consider context propagation, depending on which languages we use and which calls we want to track, and normally we only see the outer edges of our application.

When we consider tracing, well, tracing is part of call graphs, so they are kind of related to each other. But with tracing we have the possibility to auto-instrument or manually instrument our applications. We can customize the span definitions, so we can really dig into the context of the application and get more insight into its causal relationships. Traces, too, are part of the observability stack.

Then there are workflows. That's why you see three lines.
They are a little bit different because they are more relevant when we consider business cases. We should consider them in the application design and build phase, and they are also there for managing state, our processes, and our workflow instances. We can define timeouts, rollbacks, and certain behavior for how we want the business causal relationship to continue.

Well, with that I'd like to finish. I hope it shows you that it's not about choosing one tool or the other; you can mix and mingle. I guess it's only the different perspectives that we have when we look into causal relationships. Thank you.