Hello everybody. In this talk we will discuss how to build fault-tolerant distributed applications with Temporal. My name is Tihomir, I'm a developer advocate at Temporal, and I'm also the project lead of the CNCF Serverless Workflow project.

So, the table of contents. First we will give a quick introduction to Temporal. We will discuss how you use Temporal to build resilient microservices. We will also take a look at the polyglot aspect of building microservices using Temporal, as well as some of the error handling features that it includes. In the end I will present a demo to tie it all together.

Temporal is an open source distributed microservice orchestration platform. To get you started, here are some links: the website, docs, community forum, things like that. Temporal is used by a ton of different companies out there. If you want to learn more and see more use cases and the types of applications those companies are building using Temporal, you can go to this link and see a bunch of different case studies.

To give an introduction, Temporal is composed of two main parts: one is the Temporal server and the other is the SDKs. On the left-hand side, the server is basically a Go binary that can be deployed in many different ways, for example on Kubernetes, Docker, or really any infrastructure that you might have available. In addition to that, Temporal also provides a cloud offering that you can use if you prefer that to a deployment you run yourself. On the right-hand side, as far as the SDKs go, Temporal provides a programming language model for building your business logic and applications. These are called workflows, which are kind of like units of execution, and you can write them using programming languages: currently Go, PHP, Java, and Node.js.
And again, the applications that you build using those programming languages can be deployed on any framework and infrastructure, just like the ones you are currently using to build your applications.

Looking at the Temporal server, the server itself is composed of multiple different services. You can scale out the Temporal server horizontally, for example by deploying multiple servers in different clusters, but each service can be scaled individually as well.

Now, just to take a quick look at the different parts of the Temporal server. We have the frontend service. All your client applications and services communicate with the Temporal server via gRPC calls, and the frontend service handles basically all the inbound calls and allows for things like multi-cluster replication.

The history service manages workflow state transitions. The Temporal server does not execute your code directly. However, it assists during execution by storing important execution events that later on allow you to resume your workflow or application execution and deal with things like failures.

The matching service hosts task queues, which are basically dynamically generated endpoints through which you can host multiple types of workflows or services.

And finally, the worker service does background processing. The Temporal server includes a bunch of different background system workflows, replication queues, and things like that in order to achieve what we will take a look at in the next couple of slides.

As far as storage goes, again, Temporal does not store your application's code or some sort of serialized form of it.
All your code actually runs in your applications, as we'll see soon, but the server storage still stores some information, like the events in the execution history of your different services and applications. As far as storage goes, Temporal currently supports Cassandra, MySQL, and PostgreSQL, and the scaling of your database really depends on the database you use. So if you're using Cassandra, your scaling options will probably be somewhat different than, for example, with Postgres.

Observability is very important when we start writing any sort of microservice, polyglot service, or distributed service. Both the server and the Temporal SDKs provide metrics out of the box, and these metrics can be consumed by things like Prometheus and Grafana, so you can build your dashboards and metrics visualization that way. In addition to that, Temporal SDKs also provide tracing information. So during the execution of your applications and services, you can view the tracing information, for example with Jaeger, or whatever tracing software you want to use.

I like to show the use of Temporal from the point of view of a particular user. In this case, let's focus on a developer. As developers, we really want to be able to focus on our business logic. And when we're writing very complex applications, especially distributed microservices, we have to deal with a ton of different things. With Temporal, as a developer, you really can focus on writing your own code. You can think of the Temporal server, in the box on the right-hand side, as a black box that provides you things like event handling, durable timers, durable storage, transaction management, queuing, analytics, and, as we have shown previously, metrics. So all of those things you get out of the box for your applications.
You can write simple code and have all those benefits that the Temporal server provides. Now, given of course the right scaling options and deployment, the Temporal server is capable of executing hundreds of millions of these applications that we call workflows concurrently. As we said, the Temporal server does not directly execute your code. Your code still executes on your premises, in your deployment, the way you deploy your applications. But the server tracks and manages its execution state and the flow of your application.

Now, again looking at it from the developer perspective, we can use the Temporal SDKs to write our applications in different languages. As we said, Go, PHP, Node, and Java, and each one of these SDKs provides APIs that we can use during development, such as workflow development APIs, testing APIs, and also client APIs. Because Temporal takes a programmatic approach rather than some sort of DSL or high-level workflow language, we as developers still write our code in our favorite IDE. We don't have to get out of our environment and we don't have to change the programming language we use. We can stay within the same environment that we're used to.

So what are we targeting? What do we as developers have to write? There are really three things. The first one is workflows. Workflows are the implementation of your business logic. This is just code, with some restrictions that we will go over, that you write in order to execute your business logic, most likely to orchestrate some third-party services or things like that. The workflow code that we write becomes fault tolerant because of Temporal.
There are many different things, like configuration-based retries, timeouts, and compensation, that Temporal gives you out of the box through the SDKs together with the server that manages them.

The second thing is activities. Activities are basically the parts of your code where you can use any sort of library, database access, file access; you can do pretty much anything you want there. Activities can be invoked sync or async without you having to spell that out in your code or use some sort of third-party library. They can also be rate limited. And of course, with Temporal you get automatic retries without having to code for them in your application.

The third thing is workers. Workers are processes that host your workflows and activities. Workers are responsible for the execution, and the progress of execution, of both your workflows and activities. Workers communicate with the Temporal server, and that communication is important in order to start, continue, and resume your workflow execution, as we'll see in later slides.

In addition to the SDKs and the programming model, Temporal also provides developers a web UI, through which you can see what's going on: which workflows are running, what state they are in, their execution history, stack traces, and things like that. It also provides a CLI, which is probably a more powerful tool that you can use to do a lot of different things, like start workflows or run batch operations.

As far as testing and debugging go, which is important: again, you can use your IDE, the standard testing and debugging libraries, and the debugger of your choice. There is nothing special that you have to download or use in particular.
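To make the idea of automatic activity retries from this part of the talk more concrete, here is a minimal plain-Java sketch of what "retry with a growing delay until the call succeeds" means. It does not use the Temporal SDK, and it is not how Temporal implements retries; the names `RetrySketch` and `retry`, and the parameters `maxAttempts` and `initialDelayMillis`, are hypothetical and exist only for illustration. With Temporal this behavior is configured rather than hand-written, which is the point the talk makes.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Illustration only: a hand-rolled retry loop, NOT the Temporal SDK.
final class RetrySketch {

    /** Calls the supplier until it succeeds or maxAttempts is reached, doubling the delay after each failure. */
    static <T> T retry(Supplier<T> call, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts: rethrow the last failure below
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
                delay *= 2; // exponential backoff
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Hypothetical flaky service: fails twice, then succeeds.
        String result = retry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("service down");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls.get() + " attempts"); // prints "ok after 3 attempts"
    }
}
```

The sketch keeps the retry state in local variables, so it is lost if the process dies. The talk's claim is that Temporal keeps this kind of state on the server side, so retries survive a worker restart.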
As far as testing goes, you can test both your workflows and activities, and you can use whatever mocking libraries you want. An important thing about testing workflows is that you might have workflows that run for weeks, months, or even years, so long-running executions. The Temporal testing framework provides a time-advance feature, which allows you to test even workflows that might run for multiple years within milliseconds. So that's an important thing to understand: you can really test any sort of application or service that you write with Temporal easily. As far as activities go, they can be tested and debugged independently. And again, because of the testing framework that Temporal provides for most SDKs, you do not even have to have the Temporal server running in order to test your code.

Now, from an architectural perspective, or a slightly higher-level view, we have to ask ourselves: if we start adopting Temporal, can we still use the frameworks that we are accustomed to and currently want to use? And the answer is yes, Temporal is not intrusive in this way whatsoever. Can I use my current programming language? We talked about this already, and the answer is yes. What about the dev environment? Again, we went through that for testing, debugging, and writing your code, and the answer is yes. And can I use the testing libraries that I'm already using, such as JUnit, PHPUnit, or Testify? Again, yes, because Temporal takes the programming language approach. You can use all those libraries and tools as well.

So that's one perspective, looking at, let's say, writing new applications. But what if we have an existing application? Typically we have some data model and eventing platforms that we're using, and our application itself, in the box in the middle, provides some sort of object model and code that communicates with, for example, third-party systems and different UIs in order to accomplish the business logic or end goal of the application that we provide to our customers. If we want to incorporate Temporal into the mix, what we really have to pick and choose is which parts of our code, mostly the core business logic or the orchestration of, for example, these services and UIs, we want to turn into workflows, and which parts of our code that currently interact with the file system, databases, REST, or sync and async invocations of these third-party systems we want to turn into activities.

So finally, what is the value proposition of Temporal? No matter what we do or what our title is, in the end we build services, or microservices in general. These services have to be durable, distributed, scalable, and of course in most cases polyglot, and that part is up to you. Temporal provides all of this for you. It really allows you to focus on writing your business logic without having to think about how to get all those benefits, which you get pretty much for free.

So let's take a look at a quick example. This is a Java example; again, Temporal provides SDKs in different languages. On the left-hand side, let's say that we have an existing class called MyCustomer that has some state, for example it holds a customer, and it has a main method called UpdateAccountMessage, where we want to update some information about this customer. It has two methods at the bottom. One is called GetCustomer, which allows us to retrieve information about the customer we're currently processing, and the other is an exit method, which allows us to stop processing if need be.
Since we're dealing with Java, we have some sort of interface, a MyCustomer interface, that our MyCustomer class implements. This is kind of like the blueprint: it has the three main methods, UpdateAccountMessage, GetCustomer, and exit.

In order to turn that into a Temporal workflow, we really have to start with the interface, and we just use annotations. For example, @WorkflowInterface basically says that any class that implements this particular interface should be considered a workflow. UpdateAccountMessage, which is the core or main method of our class that implements the business logic, is annotated with @WorkflowMethod. Our GetCustomer method, which allows different clients to get this information from the workflow, is a query method, annotated with @QueryMethod. And exit, which receives a signal, meaning outside data that we want to receive during the execution of this class, we annotate with @SignalMethod. With that alone, our class on the left-hand side has become a Temporal workflow.

There are two things we still have to take a look at. First, how do we actually interact with our workflow: how do we start it, stop it, get information out of it, and send signals to it? Second, how do we write the business logic that we want this class to implement?

So let's take a look now at some of the features provided by the Temporal SDKs and their APIs. The first thing that you get from the Temporal APIs is that you can start a workflow execution, and it can be long running. On the left-hand side, for example, we show that we want to start a workflow execution, an implementation of your business logic, and we're going to let it run for up to a year. The workflow has state, and you don't have to deal with that state in your code yourself, as far as writing calls to databases or persisting it yourself. That is done through Temporal.
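The annotated interface described above is only shown on a slide, so here is a rough reconstruction of its shape. This is a sketch under assumptions: the interface name `MyCustomerWorkflow`, the `Customer` class, and the method parameters are hypothetical. To keep the sketch compilable without the SDK on the classpath, the four annotations are declared locally as stand-ins; in a real project they would be imported from the Temporal Java SDK package `io.temporal.workflow` instead.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-ins so this file compiles on its own. In a real project these
// four annotations come from io.temporal.workflow in the Temporal Java SDK.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)   @interface WorkflowInterface {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface WorkflowMethod {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface QueryMethod {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface SignalMethod {}

// Hypothetical data class; the talk does not show its fields.
final class Customer {
    final String name;
    Customer(String name) { this.name = name; }
}

// Any class implementing this interface is treated as a workflow.
@WorkflowInterface
interface MyCustomerWorkflow {

    // The main business-logic method of the workflow.
    @WorkflowMethod
    void updateAccountMessage(Customer customer, String message);

    // Lets clients read workflow state while it runs.
    @QueryMethod
    Customer getCustomer();

    // Receives an outside signal asking the workflow to stop processing.
    @SignalMethod
    void exit();
}

final class AnnotationSketch {
    public static void main(String[] args) {
        // Only checks that the marker is present on the interface.
        System.out.println(MyCustomerWorkflow.class.isAnnotationPresent(WorkflowInterface.class)); // prints "true"
    }
}
```

Note that the existing business class only gains annotations on its interface; the method bodies stay ordinary Java, which is the low-intrusion point made in the talk.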
Again, it's not the actual class or your code that is persisted, but the event history, and we'll take a look at that a little later on as well.

Your workflow code can be fault tolerant. On the left-hand side, you see that we can have automatic retries for activities, and we can have retries for your workflows. You can reset your workflows, cancel them, terminate them, things like that; there are APIs for that. And also, because this is a programming language approach, you can use try/catch to catch certain exceptions and perform things like compensation before ending your workflow execution.

We can also define through the APIs that we want periodic execution, via cron for example, and we can invoke workflows, as well as activities, sync or async. Temporal provides full support for this, and in some programming languages invoking things async is actually the default approach.

Finally, we all know that, especially when you have long-running business logic or workflows, you need some sort of versioning. Changes happen, so the Temporal APIs provide you a way to version your code, even while it is running, deploy a new version, and deal with updates automatically. And of course, we talked about testing already: all your workflows, activities, or any code that you write using the Temporal APIs is fully testable.

Now let's take a look at how we interact with our workflows. On the left-hand side, let's say we have some sort of client application, a client that wants to invoke an instance of our business logic, one of the workflows that we develop in our services. The client API can send commands, which are based on gRPC, to the frontend service of the Temporal server. One of the commands that we can send is start, but there are many more: we can signal, query, cancel, things like that.
On the left-hand side, let's say we have some code that uses the Temporal Java SDK and its APIs to call WorkflowClient.start. This is an async invocation that we request from the server. The server itself, at this point, does not really know which service is actually going to pick up the request to start a workflow execution. What the server does is put a message into a particular task queue.

Now, in our application, the service on the bottom left that includes our workflows, activities, and workers, we tell our workers to listen to this particular task queue. When a message arrives, in this case the little red circle on the right-hand side, which includes the information about wanting to start a workflow execution, our worker is going to pick it up, read the instructions, and start the execution of our new workflow instance.

So this is a fully distributed system. A workflow execution that starts, let's say, on our service here on the bottom left can at some later time continue on a completely different service, for example in case of a failure, if this particular service goes down. This is all handled by Temporal and is fully distributed, meaning that you get fault tolerance and reliability built into your whole equation without really having to code them yourself.

Our worker picked up the initial task to start the workflow and processed it up to some point where it needs something more from the server, for example to schedule an activity execution, or to create a timer in the case where our workflow does, let's say, a sleep for 10 minutes or 10 days.
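The hand-off described above, where the client asks the server to start a workflow, the server places a task on a task queue, and a worker listening on that queue picks it up, can be pictured with a small in-process model. This is a plain-Java sketch, not the Temporal SDK and not how the server is implemented: `TaskQueueSketch`, `dispatch`, and the request strings are hypothetical, and a `BlockingQueue` stands in for a real task queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustration only: an in-memory stand-in for "server puts task on queue, worker polls it".
final class TaskQueueSketch {

    /** Enqueues each start request and returns what the single worker did with them, in order. */
    static List<String> dispatch(List<String> startRequests) {
        BlockingQueue<String> taskQueue = new LinkedBlockingQueue<>();
        List<String> started = new CopyOnWriteArrayList<>();
        CountDownLatch allHandled = new CountDownLatch(startRequests.size());

        // The "worker": blocks on the queue and starts a workflow for every task it receives.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String task = taskQueue.take();
                    started.add("started " + task);
                    allHandled.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdown requested
            }
        });
        worker.setDaemon(true);
        worker.start();

        // The "server": it does not know which worker will run the task, it only enqueues it.
        taskQueue.addAll(startRequests);

        try {
            allHandled.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            worker.interrupt();
        }
        return new ArrayList<>(started);
    }

    public static void main(String[] args) {
        System.out.println(dispatch(List.of("PatientOnboarding(John)", "PatientOnboarding(Mary)")));
    }
}
```

Because the producer only knows the queue and not the consumer, any worker listening on the same queue could pick up the next task, which is what lets an execution continue on a different service after a failure.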
So our workflow can again send a message to the Temporal server, basically saying: I want to schedule an activity execution. Because this is again a distributed system, the activity itself does not even have to be executed within our service; it could be executed by a completely different service that hosts that particular activity. And again, the Temporal server is going to put a task on the specific task queue that we request, it's going to be picked up by some worker, and execution is going to move forward from there.

Going into this a little bit more: we said that Temporal is a very resilient system. Once the message arrives from the task queue and the worker picks it up, let's say on the left-hand side we have workflow code that we want to execute. The message itself, the workflow task, includes things like what should be executed next and all the history of what has happened so far. Once this workflow task is received, we use the past events, the workflow history, to put the workflow in the same state that it was in before the task was received. And again, this workflow can be replayed, with its state restored to right before the workflow task was received, on a completely different machine.

Once we have replayed and put the workflow state into the exact position where we need it, we can use the "what's next" part to continue workflow execution from that point on. If the workflow history can put the workflow in the same state it was in before, the workflow is called deterministic, and we can move on with the execution. If the event history does not match, for example because we made some changes to our code without versioning it, we can run into non-deterministic errors. Temporal will let you know, and still give you the chance to fix the error, and not just fail your workflow.
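The replay idea described above can be sketched in a few lines of plain Java. This is a toy model under stated assumptions, not the Temporal SDK or its actual replay algorithm: `ReplaySketch`, `Context`, `Event`, `step`, and the step names are all hypothetical. The workflow code is re-run from the top; steps already recorded in the history return their recorded result instead of executing again, and a mismatch between the code and the history is reported as a non-determinism error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Illustration only: a toy event-history replay, NOT Temporal's implementation.
final class ReplaySketch {

    /** One recorded step of a workflow execution. */
    static final class Event {
        final String step;
        final String result;
        Event(String step, String result) { this.step = step; this.result = result; }
    }

    static final class NonDeterministicException extends RuntimeException {
        NonDeterministicException(String message) { super(message); }
    }

    /** Returns recorded results while replaying; runs the step and records it once history is exhausted. */
    static final class Context {
        private final List<Event> history;
        private int position = 0;
        int liveCalls = 0; // how many steps actually executed (were not replayed)

        Context(List<Event> history) { this.history = history; }

        String step(String name, Supplier<String> action) {
            if (position < history.size()) {
                Event recorded = history.get(position++);
                if (!recorded.step.equals(name)) {
                    throw new NonDeterministicException(
                        "history has '" + recorded.step + "' but code asked for '" + name + "'");
                }
                return recorded.result; // replayed, not executed again
            }
            String result = action.get();
            history.add(new Event(name, result));
            position++;
            liveCalls++;
            return result;
        }
    }

    // Workflow code: same step order on every run, so it replays cleanly.
    static String onboarding(Context ctx) {
        String hospital = ctx.step("assignHospital", () -> "General Hospital");
        String doctor = ctx.step("assignDoctor", () -> "Dr. Smith");
        return hospital + ", " + doctor;
    }

    // The same workflow after an unversioned code change that swaps the steps.
    static String reorderedOnboarding(Context ctx) {
        String doctor = ctx.step("assignDoctor", () -> "Dr. Smith");
        String hospital = ctx.step("assignHospital", () -> "General Hospital");
        return hospital + ", " + doctor;
    }

    public static void main(String[] args) {
        // History left behind by a worker that crashed after the first step.
        List<Event> history = new ArrayList<>();
        history.add(new Event("assignHospital", "General Hospital"));

        Context resumed = new Context(history);
        System.out.println(onboarding(resumed) + " (live steps: " + resumed.liveCalls + ")");

        try {
            reorderedOnboarding(new Context(history));
        } catch (NonDeterministicException e) {
            System.out.println("non-deterministic: " + e.getMessage());
        }
    }
}
```

In the first run only the second step executes live, because the first comes from the history. The second run fails fast because the changed code no longer matches what was recorded, which is the situation the talk says versioning is meant to prevent.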
So let's take a look now at service orchestration. This is a common scenario: we have, let's say, a food delivery service, written in any programming language for which there is an SDK, and with the use of the Temporal server we want to orchestrate the third-party services on the right-hand side, like a dispatch service, a restaurant service, a payment service, and so on.

With Temporal, you can deal with intermittent failures, meaning that if some of those third-party services are down, it does not mean that we have to fail, and it doesn't mean that we have to write any code in order to deal with those particular errors. With the Temporal server and its SDKs, you can deal with intermittent failures without any worry; you can fix those errors and, for example, issue retries until the services come back up and are available.

Another thing that Temporal provides is dealing with continuing failures. Let's say in this case our payment service is down and it's not coming back up. We can deal with this too and do things like compensate our workflow, and other things, in order to deal with this particular permanent or continuing failure.

Another thing that Temporal allows you to do is rate limit the services you're invoking. Even though in this case these are third-party services, and we don't have anything to do with their code, we're just using them, going through the Temporal server we can rate limit them. For example, if the payment service has a cost associated with it, and we don't want to invoke it more than 100 times per second, or per day, or whatever the time unit is, you can define this rate limiting in your application and the Temporal server will make sure that the rate limits are followed.

Another way of looking at this: let's say we have two services that both talk through the Temporal server, our food delivery service and, let's say, our dispatch service. In this case we can rate limit our own applications, so we can define rate limiting in our own service code. We still deal with errors and error propagation across the services. Before, with third-party services, we had to deal with things like HTTP errors, such as a 404, without really having the ability to get at specific errors. If you have different services that are all using Temporal, you can have very, very powerful error handling and propagation, which we will see in a minute.

In addition to that, one service that we write with Temporal can be written in, let's say, Go, and the other service, which a different team might be writing, could be written in Node.js. We can go even further: our workflow, for example, could be written in one language and our worker in a different language. On the service B side, our activities can again be written in a different language. As far as the polyglot aspects go, the Temporal server is able to serialize and deserialize the data, and also the workflow information, so that we can have this polyglot type of collaboration and communication, including error messages and propagation, across distributed services in different programming languages.

So let's look for a minute at error handling in the polyglot case. Let's say on the left-hand side we have our food service, with its workflow and activity written in Go. Our application starts by creating an instance of our food delivery workflow. This workflow, let's say, calls our activity, and this activity, through the Temporal server and the Temporal SDK client APIs, invokes the restaurant service, which is written in Node.js. The same thing happens again: the restaurant service activity invokes the payment service activity, which in this case is, let's say, written in PHP.
Now in distributed systems like this, let's say an error or an exception happens in our payment service. Typically, our food delivery service is going to have no idea what actually happened, and will not receive proper information about what broke, where, and how, in order to be able to, let's say, compensate for it or fix the error.

When you're using Temporal, the payment service has error propagation through the Temporal server. So we're going to propagate that error from PHP to the restaurant service written in Node.js. Then we can propagate that error back to the restaurant service workflow, and then back to the food delivery service, which is again written in Go, and its workflow. When this error propagates all the way back to where we started our execution, it is going to have all the details, specifically which service failed. All the information from the payment service is added to the exception, as well as the original exception and everything that, for example, the restaurant service added to the error. And when we catch this error back in our food delivery service, we know exactly what failed and how to deal with that particular error.

All right, so we come to the demo part. In this case, I wanted to demo resilient service orchestration by showing a Temporal workflow that, similar to what we've seen in the slides before, invokes a couple of our services. We have a patient onboarding workflow, and below is the GitHub URL where you can clone this project and run it yourself.

All right, so let's get started. What we have first is our services. Let me go ahead and show you that we have a patient onboarding service, which has three endpoints: assign a doctor, assign a hospital, and notify a patient.
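Going back for a moment to the error propagation chain described before the demo (payment service to restaurant service to food delivery service), the idea of an error that accumulates context at each hop can be sketched with ordinary Java exception causes. This is a single-process illustration under assumptions, not Temporal's cross-language failure format: `ErrorChainSketch`, `ServiceFailure`, and the messages are hypothetical, and the language labels only mirror the example in the talk.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: nested exception causes standing in for cross-service error propagation.
final class ErrorChainSketch {

    /** A failure tagged with the service that raised it. */
    static final class ServiceFailure extends RuntimeException {
        final String service;
        ServiceFailure(String service, String message, Throwable cause) {
            super(message, cause);
            this.service = service;
        }
    }

    // Innermost service: the original failure happens here.
    static void paymentActivity() {
        throw new ServiceFailure("payment (PHP)", "card declined", null);
    }

    // Middle service: adds its own context and keeps the original failure as the cause.
    static void restaurantActivity() {
        try {
            paymentActivity();
        } catch (ServiceFailure e) {
            throw new ServiceFailure("restaurant (Node.js)", "could not place order", e);
        }
    }

    /** Outermost service: walks the cause chain to see exactly which services failed and why. */
    static List<String> foodDeliveryWorkflow() {
        List<String> trace = new ArrayList<>();
        try {
            restaurantActivity();
            trace.add("order delivered");
        } catch (ServiceFailure e) {
            for (Throwable t = e; t != null; t = t.getCause()) {
                String service = (t instanceof ServiceFailure) ? ((ServiceFailure) t).service : "unknown";
                trace.add(service + ": " + t.getMessage());
            }
            // A real workflow could compensate here, for example by cancelling the order.
        }
        return trace;
    }

    public static void main(String[] args) {
        for (String line : foodDeliveryWorkflow()) {
            System.out.println(line);
        }
        // prints:
        // restaurant (Node.js): could not place order
        // payment (PHP): card declined
    }
}
```

The outermost caller sees both the wrapping failure and the original cause, which is the property the talk highlights: the food delivery workflow can decide how to react based on what actually failed downstream.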
So basically, the service that we want to write, the core business logic, is to orchestrate invocations of the patient onboarding service and its endpoints in order to onboard a patient. We have an application running that runs our workflow, so let's take a look at the workflow itself. The workflow is written in Java. It's an onboarding implementation, and it basically has a bunch of different activities that it runs in order. First we're going to assign a hospital to the patient. We're going to assign a doctor to the patient. We're going to notify the patient that we have assigned them to a doctor and a hospital. And finally we have a final onboarding step that we want to perform.

Our application also has a UI. The UI is very simple: we have some patient information. Let's say we have a patient called John. Let's give him a zip code and a condition, say asthma, an email, john at home, a phone number, let's say 555, and the preferred contact method is text. Once we press this onboard patient button, we're going to communicate with the Temporal server to ask it to create a workflow instance. The application here also hosts our workflow and activities, and the workflow worker is going to pick up the task from the Temporal server and execute this particular workflow.

So let's go ahead and do that. As we do, the UI shows all the activities. For example, we are currently assigning a hospital to John. We're assigning a doctor, and this is again per our instructions in the workflow. We're notifying the patient, and then we're going to finalize the onboarding. So we are done. With that, we can see that our UI updated, and we see that our patient John was assigned a hospital and a doctor and has been successfully onboarded. If we look at the Temporal web UI, we also see that we have a particular workflow that is in status completed.
It's our onboarding workflow, and we also have the history: we can see the input, which is what we entered in the form, and we can see the results, which are also displayed on the page. In addition, as we talked about, this is the actual history with all the execution information that was stored within the Temporal server, and it is what gives it the ability to actually resume workflows after a particular failure.

So let's go ahead and do this one more time. Let's say we have Mary, with headaches. I'm just going to type in something here. Now what we want to do is show a failure. So let's say we start our workflow execution, and we're going to stop, or fail, our services during the execution. So what we have done now is that the services our workflow has to communicate with have an intermittent failure. As we see here, our UI has stopped showing progress, and we don't have this patient onboarded. Now if we look at the Temporal web UI, we see that even though our services have failed, our workflow execution has not failed. We still see the onboarding workflow, in this case the new one, as running.

So let's go ahead and bring our services back up. Let's say our failure has been fixed, and see what happens. Again, we did not write any code for this, and we did not have to specify anything in our workflow. This comes by default by using Temporal. Let's see if Temporal is able to deal with the failure. We see now that our workflow has resumed once our services came back up. Our workflow was basically retrying to invoke these activities; as soon as the services were back up the retries stopped, and we were able to onboard Mary into our system. And again, if we look here, we can see now that our workflow has completed.

All right, that's all I had for today. I hope you enjoyed the talk and the conference, and have a great day. Thank you. Bye.