All right. Welcome to this talk, "Houston, we've got a problem: how to debug your pipeline in Tekton." I am Vincent Demeester, Senior Principal Software Engineer at Red Hat, and I am one of the leads of the Tekton project, where I work on a lot of Tekton things as well. Today we are going to talk about how to debug your pipeline in Tekton. All right, so let's go through the agenda. First we'll define a little bit what we mean by debug, and debug what. Then we'll go rather quickly over what Tekton is, Tekton in a nutshell. Hopefully you already know a bit about Tekton coming into this talk, but if you don't, this will be a really quick introduction. Then we'll do a little bit of magic and Russian dolls, explaining how all this works. Then Vibhav will do a demo, and we'll finish with what the next steps after this work will be. So first, debug what? Let's talk a little bit about debugging. Debugging is the process of identifying and removing errors from computer hardware and software. In this case, we will be talking about debugging in terms of pipelines, debugging the pipelines themselves. So what do I mean by debugging pipelines, and what are pipelines? A pipeline is usually a linear sequence of specialized modules used for pipelining: basically, you do a task or an action, then another, then another, and some can be in sequence, some can be in parallel. There are a lot of types of pipelines: data pipelines, event pipelines, and more. In the context of this talk we're mainly talking about CI/CD pipelines, and our example is, more accurately, a continuous integration pipeline. But this could probably apply to other types of pipelines that can be built inside Tekton if we want to.
Yeah, so why would we ever want to debug our pipelines? If there is a pipeline that works on your machine but not on the CI, it is very hard to understand why, because you would need to understand the environment the pipeline is running in, which is the CI environment, and you would need to get access to that environment to understand it. So debugging pipelines could be helpful in this way: you just get access to the CI environment and figure out why it's not working there. Then there are also times when a pipeline only runs on the CI and you can't run it on your machine because of resource constraints, so you have to debug the pipeline while it is running, go to a particular part of it, and debug it there. And while you're debugging that same pipeline on the CI, there is a possibility you won't be able to fix it if you are just passing parameters around and never understand where the error is happening: you run the pipeline, you see the error, you fix it, you run it, you see the same error, you fix it again, and you keep doing this until you get really frustrated and probably want to leave your job at that point. But you can't do that, so you have to keep debugging pipelines as they are. There are better ways to debug pipelines, though: some CI tools allow you to debug pipelines while they are running and provide great support for this; CircleCI, for example, does this very well. Getting inspiration from this, we in Tekton figured that we could do the same in Tekton. So how would you debug pipelines? This is kind of explaining what we will do in Tekton, and what other tools do in some manner. For debugging a pipeline, what do you want it to do first?
You want to be able to pause the execution of your pipeline on demand, anywhere the user wants, and/or in case of a failure: some step or some action failed, so I want to stop the execution there and be able to drop in. We want to allow the user to drop directly into the CI environment, so they can see what files are there, what processes are running, anything that could help them understand what happened and why it failed, and probably also re-execute some things in the context of the CI environment. And of course, once we allow the user to drop into the CI environment, we want to allow them to continue or break the flow. If they feel like they know what's happening, they can fix it manually for this run and, if they want to see whether anything else will fail later, continue with success and let the pipeline keep running. Or if they know how to fix it and are pretty sure nothing else will fail afterwards, they can just go and fix it, and so as not to consume too many resources, break the flow and finish the pipeline as it would have finished without the debug. We'll dig a little more into how we do this in Tekton, but it is based on TEP-0042; TEP stands for Tekton Enhancement Proposal, and this TEP is about adding breakpoints on failure for steps inside a TaskRun. It is focused on TaskRuns to start the work, but this TEP is going to extend more and more as we go. So first, let's go quickly over what Tekton is, for those who do not know it or only have a general idea of it. If you do know Tekton very well, the next few minutes will probably be a little bit boring, but that's okay. So what is Tekton in a nutshell? Tekton is an open source project that aims to provide a set of standard and shareable components for building Kubernetes-native CI/CD systems.
It is governed by the Continuous Delivery Foundation, the CD Foundation, which is kind of a cousin of the CNCF, and contributions come from literally everywhere in the world and lots of companies. Some of the most important ones are Google, Red Hat, CloudBees, IBM, Pivotal, and D2iQ, and there are many, many more; it's a really highly active community right now. So, in a nutshell, a little bit more: Tekton allows you to write declarative pipelines with standard Kubernetes custom resources. This means we're using the exact same mechanism used for services, deployments, and pods in Kubernetes, but to declare pipelines. Everything runs in containers. Tekton itself, the control plane, and so on run inside containers, and any step, any task, anything that you might do inside your pipeline will be running in containers, in Kubernetes. You can do almost whatever you want inside your pipeline, because as long as it runs in a container, it is going to run in the pipeline. One example: you can build images inside your task using whatever tools you want, kaniko, buildah, img, and then through parameters, results, and resources, you can reuse what you just built somewhere else. You can deploy to multiple platforms. Of course you can deploy on Kubernetes, anything like Kubernetes pods, deployments, services, serverless, and so on, but you can literally deploy on anything, as long as you have tools inside your containers allowing you to do that. One example that we have upstream is a task that allows you to create a Mac mini VM, I don't even know if it's a VM or a real machine, somewhere in the cloud, deploy something on there, run your tests, and get back the results; as everything is running in containers, you can do this. And Tekton aims to provide a set of powerful user interfaces.
Right now, we have an official command line tool and an official dashboard, the command line tool being tkn. And yeah, let's dig a little bit more into the Tekton concepts now. There are a few concepts in Tekton that really help to understand how a Tekton pipeline works: step, task, pipeline, task run, and pipeline run. You can think of these resources as two different kinds: the first would be the definitions, the second the executions. Among the definitions we've got task and pipeline, where a task is a list of steps that run sequentially in the same pod, each step itself running in a container, and a pipeline is a graph of tasks with inputs and outputs, executed in a certain order. When we say a graph of tasks, we mean that pipelines can be made of different tasks put together. These can be pre-existing tasks or new tasks that you create right in the pipeline itself, and you can basically compose different tasks together to form a pipeline. This really shows the composability of Tekton itself when it comes to creating pipelines. And to run each of these tasks and pipelines, and to have them be as reproducible as possible, you define tasks and pipelines once and then run the equivalent run resources: you have a task run for a task, and a pipeline run for a pipeline. So if you want to run a task with certain parameters, you would basically execute the task with a task run, and when you create the task run, you can give it any parameters you want the task to run with. If there are other parameters you would want to run the task with in the future, you can do that as well; you just provide different parameters in that case. And the same is the case with a pipeline run: you already have your pipeline with its tasks, which can run either in sequence or in parallel.
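The definition/execution split just described can be sketched as a pair of resources like this; the names, images, and script contents here are made up for illustration, but the shape matches the Tekton v1beta1 API:

```yaml
# Definition: a Task with two steps that run sequentially in one pod.
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: greet
spec:
  params:
    - name: who
      type: string
      default: world
  steps:
    - name: write                # each step runs in its own container
      image: ubuntu
      script: echo "hello $(params.who)" > /workspace/message
    - name: read                 # /workspace is shared between the steps
      image: ubuntu
      script: cat /workspace/message
---
# Execution: a TaskRun runs the Task, optionally with different parameters.
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: greet-run
spec:
  taskRef:
    name: greet
  params:
    - name: who
      value: tekton
```

The same Task can be run again with other inputs simply by creating another TaskRun that references it.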
And you just provide a pipeline run, which has all the inputs that are necessary for the pipeline, and then you run it. In this way you can keep testing your tasks and pipelines with different parameters, and you can compose different pipelines from multiple different tasks, and you never have to write different pipelines every time, even if you're reusing the same functionality as before. So Tekton is in this way very reproducible and composable, and it makes it easy for users to get down and start making pipelines easily if they have a certain set of tasks; a lot of these tasks are present in the task catalog, so users can just start building different pipelines. Okay, so let's look at what a pipeline looks like here. You can see that there's a pipeline, and there are four tasks over here. The first task runs the first three steps, and each of these steps runs in a container of its own. Then the output is passed to the next two tasks, which run in parallel, and the second task sends its output as an input for the last task. So the first task runs by itself, the second two in parallel, and the last one by itself. This is just the definition of the pipeline and the tasks it references or has defined inline in the pipeline itself. To actually run this pipeline, you have to provide a pipeline run, which is an execution of the pipeline and which creates task runs to execute each of these tasks. Before, there were pipeline resources; currently they are being deprecated, and they are replaced with other kinds of inputs, such as workspaces and parameters: workspaces are places where data can be shared through volumes, and inputs can also be given via parameters.
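In the same spirit, a small pipeline with a shared workspace and a parameter might look roughly like this. The `git-clone` task exists in the Tekton catalog; `run-tests`, its `source` workspace, and the repository URL are hypothetical names used only for illustration:

```yaml
# A Pipeline is a graph of tasks; runAfter and workspaces wire them together.
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: clone-and-test
spec:
  params:
    - name: repo-url
      type: string
  workspaces:
    - name: shared               # a volume the tasks use to share data
  tasks:
    - name: clone
      taskRef:
        name: git-clone          # pre-existing task from the catalog
      params:
        - name: url
          value: $(params.repo-url)
      workspaces:
        - name: output
          workspace: shared
    - name: test
      runAfter: ["clone"]        # ordering in the graph
      taskRef:
        name: run-tests          # hypothetical task
      workspaces:
        - name: source
          workspace: shared
---
# A PipelineRun supplies the concrete inputs and triggers the execution.
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: clone-and-test-run
spec:
  pipelineRef:
    name: clone-and-test
  params:
    - name: repo-url
      value: https://github.com/example/repo
  workspaces:
    - name: shared
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 16Mi
```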
Workspaces and parameters, these two concepts, replace pipeline resources completely. Let's now look at what a task run actually looks like, because in this demo we are going to focus on how to debug a task run; in the future we should also be able to debug a pipeline run, but for now we'll focus on a task. Okay, so here in the task we've got a spec, and there are steps which we can define in the task. We're going to focus on tasks and task runs, and on trying to debug them, so we are going to look at this task, create a task run from it, and try to debug it. Let's actually understand what debugging looks like and how we've gone ahead and implemented it. For that, we'll look at something called the Russian doll entrypoint hack. Vincent, would you like to go ahead with it? Yeah, so let's dig a little bit into the Russian doll debug. We use "Russian doll" mainly in reference to the talk that Christie Wilson and Jason Hall from Google did at KubeCon in 2019, where they described a little bit of the magic Tekton does to be able to run containers in sequence inside a pod, because we needed to be able to control the flow of execution inside a pod, and that talk goes through this. I'll let you watch it afterwards if you're interested, but as a gist of what we do, let's first define what we mean by entrypoint, and then what this hack is all about, in one slide. An entrypoint, in the container world, is whatever binary or command is going to run when the container starts. It's either specified by the image itself.
So if you built your image using a Dockerfile, for example, on the left, it's whatever was defined using the ENTRYPOINT keyword. If you don't want to use the image entrypoint, you can always explicitly define the entrypoint when you describe your container: inside a pod, it's in the step's container spec, which is what we have in the steps of the task on the right. So you are always able to define or override the entrypoint you want when you start a container. That is what we mean by entrypoint. And the talk that Jason and Christie did, summarized in one slide: we override the user's entrypoint, either the entrypoint that comes from the image or the one the user provides, with our own one that we control, a binary called entrypoint. What Tekton does when it schedules a pod for a task is copy this entrypoint binary into each and every container, so that we control what gets executed. Then we change each container's entrypoint to be our own entrypoint binary, and we put whatever the original entrypoint was into the arguments. If the user didn't specify anything, we check what the image entrypoint was and append it to our arguments; if the user provided something, we append that to ours. So for example, if the user wanted ls -l as the entrypoint, in a really simple way this is going to be executed as our entrypoint followed by ls -l, and thus we can control whatever gets executed. The third part is that our binary waits for a signal to start, using files written by previous steps if there are any previous steps. Once we get the signal, we run the initial command the user provided, and when it's done, we gather the result: whether it failed or not, what the exit code was, and we signal the next step.
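The wait/run/signal loop just described can be illustrated with a toy shell sketch. This is not Tekton's actual entrypoint binary (which is a Go program, and in a real pod each container waits concurrently), just the idea of coordinating sequential steps through files:

```shell
#!/bin/sh
# Toy sketch of the "Russian doll" entrypoint idea: every step waits for the
# previous step's "post file", runs the user's original command, then writes
# its own post file so the next step knows whether to run or skip.

STEP_DIR=/tmp/tekton-entrypoint-demo
rm -rf "$STEP_DIR" && mkdir -p "$STEP_DIR"

run_step() {                       # run_step <index> <user command...>
  idx=$1; shift
  prev=$((idx - 1))
  if [ "$prev" -ge 0 ]; then
    # Wait until the previous step has signalled success or failure.
    # (Here everything is sequential, so the loop never actually blocks.)
    while [ ! -e "$STEP_DIR/$prev" ] && [ ! -e "$STEP_DIR/$prev.err" ]; do
      sleep 1
    done
    # Previous step failed: skip this step and propagate the failure.
    if [ -e "$STEP_DIR/$prev.err" ]; then
      echo "step $idx: skipped (step $prev failed)"
      touch "$STEP_DIR/$idx.err"
      return 1
    fi
  fi
  # Run the original entrypoint/command and record its exit status.
  if "$@"; then
    touch "$STEP_DIR/$idx"         # signal success to the next step
  else
    touch "$STEP_DIR/$idx.err"     # signal failure: later steps will skip
  fi
}

# A three-step "task" where the middle step fails:
run_step 0 echo "step 0: ok"
run_step 1 false
run_step 2 echo "step 2: never runs"
echo "pod done; post files written in $STEP_DIR"
```

Step 0 succeeds, step 1 fails, and step 2 is skipped because it sees the failure marker from step 1, which is exactly the "natural flow" behaviour the next section describes.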
So in the natural, normal flow, if the command failed, we write that it's an error, and the next step's entrypoint will know that the previous step failed and will skip, and so on for all the other steps. This is how the entrypoint works; if you're interested in digging more, I'll let you watch the talk afterwards. But for our debugging purpose, we needed to enhance this, and this is where the debug breakpoint story happens. As Vincent said, to actually start debugging and provide the debug feature, we've had to expand the entrypoint hack, flip it a little bit on its head, and use it to control the lifecycle of the steps themselves, so that when a step is executing and it fails, the step stops at the failure rather than stopping the execution of the task run itself. The user should then be able to go into the environment, debug, troubleshoot, figure out what the issue is, and then go back and fix the issue they found. For this, what we do first is mark the task run as debuggable by adding the debug spec, and also declare where we want the breakpoint to be. "On failure" here is a dynamic breakpoint which basically stops the task run on any step that fails: whichever step fails, that step is paused at that point, and the user can basically gain access to the environment of the step. When the user gets environment access, they can debug, do whatever they want in there, and then continue the step from there by marking it either as a success or as a failure. To continue past the breakpoint and mark the step's result, we provide scripts that help do all of this: a debug-continue script, which marks the step as a success and releases the breakpoint, and a debug-fail-continue script, with which the user can mark the step as a failure.
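Concretely, per TEP-0042 the breakpoint is requested in the TaskRun spec. A reconstruction of the kind of TaskRun used in the upcoming demo might look like this (the step names, paths, and file contents are approximations of what is shown on screen):

```yaml
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  generateName: step-script-
spec:
  debug:
    breakpoint: ["onFailure"]      # pause any failing step instead of exiting
  taskSpec:
    steps:
      - name: write
        image: ubuntu
        script: |
          mkdir -p /workspace/shared-data
          echo awesomeness > /workspace/shared-data/data
      - name: stat
        image: ubuntu
        script: stat /workspace/shared-data/new-write   # fails, pauses here
      - name: read
        image: ubuntu
        script: cat /workspace/shared-data/data
```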
In that case the step is marked as a failure, and in the end the task run can also be marked as a failure. When this execution is complete, the user can basically go ahead with an idea of what exactly failed, so that they can go back to the task run and fix it there. So, yeah. Now we'll actually look at how to do this in an actual demo, so let's go ahead with it. Okay, so we are running minikube with Tekton installed, and you can see that the Tekton controller and webhook pods are over here. What we are going to do is run a task run which contains an inline task spec, so we don't have to create a separate task. This task is going to write a file, then stat it, then read the file; let's see how that goes. Somewhere along the way the task is going to fail, and by the end we will figure out how we can debug and fix it. So let's go. I'm going to go ahead and create this task, which is called step-script-bash. And you can see here that the task run has been created, called step-script plus a generated name, and it's in pending mode because the pod is initializing. Over here we should be able to see the task run and the logs that come up, so we'll just wait a second for those. Okay, you can see the task run is running, but you can also see in the logs that we have got an error in the stat step, which says: cannot stat shared-data/new-write, no such file or directory. So I don't know what's going on here; let's go back and check the task run to see what happened. Considering that we are in on-failure debug mode right now, we can see this debug message saying it is skipping writing to the post file because of the breakpoint. What we will do is get into this pod and go to the stat step's container, which is named step- followed by the step name; that is the container name format.
So this is the step we are going into, to see what is going on. We are using the Ubuntu image here, so bash is available, and we are now in a bash shell in the container. What we're going to do is check on that "cannot stat shared-data/new-write" error: we check whether new-write is there or not, just in case, and indeed there is no such file or directory. Let's just go into shared-data and see if we can figure this out. So we are in shared-data, and oh, it seems like there is a data file in shared-data, with "awesomeness" inside. Okay, now we know that the data is in the data file and not in a new-write file, so we are going to go to the task run and try to debug it. The task that we ran was step-script-bash, and in it, the write step seemed to work well, but the stat step is doing a stat on shared-data/new-write, which is wrong, because the write step writes to data and not new-write. So I'm going to go ahead and fix this right away. And considering that in the last step we are reading shared-data/data, I think it should work, so we'll just mark this step as a success for now. Let's go back to the container and mark this step as a success. How do I do that? I'm going to go to the Tekton debug scripts directory; these are the debug scripts that come along with the debug spec, which the user can use to either mark the step as a success or as a failure, releasing the breakpoint at the same time. So I'm going to go ahead and run the debug-continue script over here. And yep, the pod terminates, and the task run is marked as a success. You can see that it read "awesomeness" over here as well, so basically the task has passed with success.
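For reference, the attach-and-release flow just walked through looks roughly like this from a terminal; the pod and step names are the ones from this demo, and the script paths are the ones TEP-0042 defines:

```
# open a shell in the paused step's container (container name: step-<step-name>)
kubectl exec -it <taskrun-pod-name> -c step-stat -- bash

# inside the container: look around, then release the breakpoint
ls shared-data                            # the file is "data", not "new-write"
/tekton/debug/scripts/debug-continue      # mark the step as a success, or:
/tekton/debug/scripts/debug-fail-continue # mark the step as a failure
```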
Now let me test whether the fix we made while debugging, changing new-write to data, actually worked. I'm going to run the same task run again. So we've got this task run right here; the pod is initializing, it's running, and it'll start running soon, I hope. Right. So as we saw, and as we are seeing right now, we were able to quickly get into the container and see what was inside, and it helped us see that we were looking at the wrong file. So yeah, this was pretty useful. Vibhav just now decided to mark the step, and thus the task, as a success, because we knew the next step would succeed. But we could have done the opposite, marking it as failed, and run a new one. I don't know if you want to demo that? Let's actually see how that works: how marking a failure works, and whether it actually marks the task run as a failure. So I'm going to go back to stat-ing the wrong file again, and then we'll run this again, mark it as a failure, and see how it goes. I'm going to create this again, and this one, as we know, should fail on the second step. Let's get the pod for the task run and wait for the step to fail. So we have this failed step over here again, and we're going to exec into it. We know the drill, and we know that the file is wrong, but this time we are just going to mark the step as a failure, so we run the script for failure: debug-fail-continue. I'm going to run this, and even though the flow continued and we got the output, the step failed, so you can see that the task run over here is in a failed status. So in this way you can mark a step as a success or a failure. Let's recap the demo and see what we did. There was a task run which we executed, and a step in the task run failed; we had already provided a debug breakpoint on failure in the spec.
So the task run did not complete its execution; rather, it halted so that the user could come in for debugging. When the task run was halted, the user could get access to the debug environment, in this case the step's container. The user could figure out why the step failed and then mark the step as a success after it failed, so the task run could continue as it would have. After figuring it out, the task run continues executing with no problem, and the user now knows the reason why the step failed in the first place. Internally this does involve some hacky processes, but externally it spares the user the cycle of "okay, why did this fail? I probably need to run this again." The user had to run the task run just once: the failure happened only once, and they were able to debug it, understand the root cause of the failure, and then go ahead with running the task again. All right. So this is more or less the current status of our proposal and our work; let's see what the next steps are. There are a few things we need to enhance or add to have a more complete debug feature. One shortcoming right now is that if you are using, inside your steps, an image that doesn't have a shell, then the debug scripts won't work: you won't be able to continue or break the flow, because we need a shell to do that. One thing we would like to provide, for this reason and for other useful reasons, is for Tekton to be able to provide its own shell, probably as part of the entrypoint itself, either the same binary or using the entrypoint injection hack to inject another binary. This would be useful because then, no matter what image is used inside your step, you could debug it; it would also mean that no matter what image you use, you could use the script feature of Tekton. Then we also want to support more modes of breakpoints that the user can provide.
We demoed the only mode that is currently supported, which is the on-failure mode. We would probably like to add more, like debugging on each step, or specifying the steps we want to debug, those kinds of things, and probably the same on the pipeline itself: being able to say, I want this and this task to be debuggable, but not the other ones. In addition, we would probably like to go further. We are waiting for feedback from the community, but we are thinking of, in the future, being able to actually do software debugging of a process running in the CI directly from your IDE. The flow would be something like: I run my pipeline, there are some end-to-end tests running against my service, which is opening a debug port; I would like to be able to hook my IDE onto this running process and just use my usual IDE debugging features there. And probably one of the most important aspects is a better user experience. We would like to make it even easier for the user to actually debug each step or each pipeline: integrate it with tkn, the official Tekton client, or the dashboard, where you could just see that it failed and is in debug mode, do tkn debug and get dropped in, or from the dashboard say, okay, it's in debug mode, just drop me into the shell. And of course more tooling integration: integrate it directly with your IDE, be it IntelliJ, VS Code, and so on. Those are the next steps we are seeing right now. Of course we are open to any feedback on this, because it's not set in stone; it's going to be an experimental feature to start with. So yeah, we would like to hear from you. Thank you for attending this talk. As with the other talks, we are available on chat for questions. Thank you for listening to us and watching this talk. Thank you everyone. Take care, bye bye.