Today we're going to talk about how to save your pipelines, and potentially your clusters, with Tekton Results. My name is Adam Kaplan. I'm a software engineer at Red Hat, a contributor to the Shipwright and Tekton projects, and I also represent Red Hat on the CDF governing board. I've been with Red Hat a little over five years now, helping companies deliver container images on Kubernetes. Before that, I helped companies deliver actual physical containers at a logistics-focused consulting firm. I wish my colleague Dibyo Mukherjee could be here with me, but unfortunately he couldn't make it to Vancouver. He had a huge part in this session and in the Results project itself, and he deserves a lot of credit for what you're about to see today.

There's a small group here, so I just want to ask a quick show of hands. How many of you are running Tekton, either for your teams or for things you're building within your companies? I've got a few hands, including a few folks up front who I definitely know are using Tekton. If you want to keep your hands up: how long have you been running those clusters? Longer than three months? Got a few hands. More than a year? Longer? Okay, yep. Those of you who have had to operate Tekton in production know that there are some real fundamental challenges to doing so, and for those who haven't, I'd like to give a quick explanation of why that is the case.

When a lot of teams start building container images with Tekton, they'll create a pipeline that looks something like this: you clone your source code, then you spin off a set of tasks that build a container image and, most likely, push it to a container registry. At the same time, you run some kind of static analysis (SAST) on that source code with a tool like Snyk, or maybe something specific to your programming language, like Gosec. Once the container image has been built, you might have another step that scans the image, looking for vulnerabilities in the underlying base image, for example. And at the end, you pull it all together, analyze what happened, and make a determination of whether the pipeline succeeded or failed.

For folks who are new to Tekton or experimenting with it for the first time: Tekton is, effectively, a cloud-native workflow engine that you put on top of Kubernetes. So the question for the audience is, does anyone know where this pipeline's data lives? Where does it actually go? If you have operated Tekton, you may have found out that it lives in Kubernetes itself. In the first versions of Tekton, your tasks and your pipelines had to be stored on the cluster as custom resources. With the new resolvers feature, that is no longer a requirement, and if you saw Christy and Wendy's talk yesterday, there are a lot of good reasons to use resolvers beyond just the storage savings. But once you start running the pipeline, each task and step has to be stored in full so that the system has a fixed definition of what to run. Your pipeline runs will then spawn task runs, those task runs create pods, and you might get a PVC or three in there. You'll also need things like secrets and config maps so you can, say, pull source from a private repository or push your container image to a container registry, and so forth.
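To make that sprawl concrete, here is a minimal sketch of how you might tally it up; the "ci" namespace and the run name are hypothetical:

```bash
# Every run leaves PipelineRun and TaskRun objects behind; count them.
kubectl get pipelineruns,taskruns -n ci --no-headers | wc -l

# Tekton labels the pods it creates with their owning run
# ("build-run-abc12" is a made-up name):
kubectl get pods -n ci -l tekton.dev/pipelineRun=build-run-abc12
```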
But unfortunately, with each pipeline run and task run, you're adding fuel to a potentially explosive situation. Each object stored means the pipeline controller needs to do more work when it is restarted, upgraded, or even just reconciled. More data in Kube means potentially slower etcd queries (etcd is the database behind Kubernetes), and anything on top of that just gets more sluggish. It's not just the Kubernetes API server: if you have a web console, for example, that will get slower. Tekton has a CLI called tkn that makes it easier to run your pipelines; that, too, will start getting slower. Everything just gets bogged down.

But more importantly, etcd has a hard storage limit of 8 gigabytes, which is great for running workloads but really terrible for a persistent store. I think we're going back maybe 40 years in terms of what our capabilities are. Take that sample build pipeline I showed you before: if you pull the data out of Kubernetes, ultimately it's all JSON, and there's not a ton of compaction or compression going on, so one run takes about 300 kilobytes, give or take. Do a little math on a real-world cluster and you might be able to hold about 15,000 of these: 300 kilobytes times 15,000 runs is roughly 4.5 gigabytes, over half of that 8 gigabyte limit, with the rest spoken for. That was just some quick back-of-the-envelope math, because in the real world you might not have a cluster dedicated to CI/CD. You might be running your own application workloads on there, and other things running as operators, such as Argo CD. For anyone using OpenShift, OpenShift 4 has a lot of operators that get installed. So by my hand-wavy math, 15,000 is probably a good round number.

As your cluster scales, if you don't do anything else, the clock starts ticking. The more people who join your CI/CD cluster and do more work with it, and the more projects that get onboarded, the faster that clock ticks. I use Kubernetes CI as an example of CI at scale; the Kubernetes project is one of the bigger open source projects in the world today. I want to thank the SIG Infra folks, and Ben Elder in particular, for pointing me to the data: today they're running over 20,000 jobs a day. They're not using Tekton; they use a system called Prow. But a job there is effectively what a pipeline run would be in Tekton. So if you took what Kubernetes CI is doing today and tried to do it with Tekton, your cluster would not even last a full day.

A quick show of hands from those who have used Tekton in prod, or are just running Kubernetes in prod: how many of you have had a cluster die because there were too many objects in it and you ran out of storage? I've got one hand in the front. I can tell you right now, at Red Hat we manage OpenShift clusters for our customers, and there were definitely some early days, when our version of Tekton was in tech preview, when we had clusters dying left, right, and center.

So as a naive solution to this, you can try averting disaster with a quota: Kubernetes lets you define quotas on how many objects you can have.
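A minimal sketch of such an object-count quota follows; the "ci" namespace and the limit of three (matching the example below) are assumptions:

```bash
# Cap PipelineRun objects at three in a hypothetical "ci" namespace.
kubectl apply -n ci -f - <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pipelinerun-quota
spec:
  hard:
    count/pipelineruns.tekton.dev: "3"
EOF
```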
Say the quota here is three pipeline runs, because so many people are using my CI cluster. Once three pipeline runs exist and I try to start another, I get this error from Kube saying I've exceeded my quota. That is kind of useless to a developer. What good is your CI system if eventually you can't run anything on top of it? So the next logical thing you could do is start deleting those old runs. And, you know, that helps; it certainly frees up the storage. But especially for CI/CD, those pipelines contain a lot of really important information, like the logs. When you try to debug a failed pipeline, if you've deleted the pipeline run and everything underneath it, you've lost the logs. And if you want to do any kind of analysis, like extracting DORA metrics, you can't do it without the data.

So how can we save this data? That's where Tekton Results comes in: it is here to rescue you. Tekton Results makes it easy to save your pipelines and logs off cluster, and then provides a simple API that lets you access the data after the pipelines have been removed. The Results project consists of two components. There is a watcher, which for those familiar with Kubernetes terminology is a controller, and which monitors pipeline runs and task runs as they execute on the cluster. And there is an API server, which receives the data and provides an endpoint for users to access it. The API server is backed by a Postgres-compatible database, and a new logs feature, which I'll talk about in a bit, gives you optional storage for logs. You can run Postgres directly, use a cloud provider database service such as Amazon RDS, or use any Postgres-compatible engine.

The way it works is that the watcher waits for your task run or pipeline run to complete, and then sends the data to the API server, where it is stored in the database. If the logs feature is enabled, the logs are also streamed to the API server, which stores them in the log storage backend you configure; today we support persistent-volume log storage or Amazon S3. Once the data is in the database, it can be retrieved over an HTTP API, which can be either REST or gRPC.

Here's an example, and I want to step back a little to showcase it: we're going to query the API to find a pipeline run and then find its subsequent task runs and other records. The way you do this with Results is a feature called CEL filtering. CEL is a programmer-friendly expression syntax that, in our case, Results converts into what are effectively (though not exactly) SQL queries. In this instance, we're passing in a filter on data_type: everything in Results has a type, and if you notice, for pipeline runs it looks very similar to Kubernetes typing, with API group, version, and kind. You can also drill into the data on the object itself. Since a pipeline run is a Kubernetes object, a custom resource, we can get it by its Kube-friendly name just by looking at data.metadata.name. Let me fast-forward a little bit: once we have that information and run the query, we can see all the records related to this pipeline run. You can see we have the data for it here.
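For reference, here is a hedged sketch of what that query looks like from the command line; the host, the "default" parent, and the run name are hypothetical, and the endpoint shape follows the Results REST API, so check the docs for your version:

```bash
# List records of type PipelineRun with a given Kube name, using a CEL
# filter. "-" matches all results under the parent namespace.
curl -s -k -H "Authorization: Bearer $TOKEN" -G \
  "https://results.example.com/apis/results.tekton.dev/v1alpha2/parents/default/results/-/records" \
  --data-urlencode 'filter=data_type == "tekton.dev/v1beta1.PipelineRun" && data.metadata.name == "hello-run"'
```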
When you get it back from the API, it is all JSON-encoded, and it's all organized under the parent for its namespace. We can then go a little further, and I'm just waiting to see where the demo goes. Yeah, so we can go further. For a pipeline run that was triggered by hand, we can use its unique ID to find the task runs and logs related to it. So once again, there's a whole bunch of base64-encoded data here, where you can see the task run. And we can go one step further once we know which records we want to search for: if we find a task run and want to look into what's in it and what was run, we can drill into the task run itself. Let me jump ahead. With a little bit of jq and base64 decoding, we can get the original task run back out. So I'll go ahead and do that. There we go. You can see we have our task run back, exactly what was on the cluster. As it scrolls here, you can see the specification. This is a very simple example pipeline that adds two numbers. There's a step here, and you can see the container that was run, the commands, everything that ran in this task, as well as some other extra data.

I want to call out one of the annotations here. One of the things Tekton Results does is attach what is basically a locator telling you where to find this data once it's been removed from the cluster. You can see it has a reference to the parent result it sits under, the record it corresponds to, and also where you can find the log for this task run. And speaking of logs, since I have the logs feature enabled in this demo, we can go a step further and look at the logs themselves. Just like before, we use jq and a little base64 decoding, and with the new logs endpoint that was added, we can get the log data back in just one moment. There we go. This was a very simple one, but we have our log back, and now we can safely delete this task run and pipeline run from the cluster.

So how does this all work under the hood? The foundation is a very generic data model consisting of results and records. The records are what you saw in the returns from the API server: we use types to define the value, and the value itself is all encoded JSON. What's neat about Postgres is its very powerful JSON support; if you are operating Results, you can add indexes that make it easier to run your own queries and analyze the underlying data. The results are used to group the records together, and out of the box Results uses owner references to organize things. It also has support for triggers: if you're using Tekton Triggers and a trigger starts one or more pipeline runs, your trigger ID can be used to group all of those pipeline runs and task runs together. All results also have a parent, which I may have alluded to earlier, and which defaults to the object's namespace on Kubernetes. It doesn't have to be that; the terminology is intentionally a little abstract. Toward the middle and end of last year, I was working on a team that extended Results to work with the kcp project, so it went beyond Kubernetes namespaces into the broader kcp workspace concepts.
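Tying the data model back to the demo, here is a hedged sketch of fetching one record and decoding its value; the host and UIDs are hypothetical placeholders:

```bash
# Fetch a single record and decode its payload back into the original
# TaskRun. The record's value comes back base64-encoded in the JSON.
curl -s -k -H "Authorization: Bearer $TOKEN" \
  "https://results.example.com/apis/results.tekton.dev/v1alpha2/parents/default/results/$RESULT_UID/records/$RECORD_UID" \
  | jq -r '.data.value' | base64 -d | jq .

# With the logs feature enabled, logs follow the same pattern under a
# .../logs/<uid> endpoint instead of .../records/<uid>.
```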
As I alluded to earlier, Results also has API support for gRPC in addition to REST. The watcher, in fact, uses gRPC to send the data to the API server, and not just the JSON we send over: with the logs feature, we use it to stream the data in a way that not only has low latency, but also ensures we don't overload the API server. If you don't know gRPC, it's a high-performance RPC framework built on Protocol Buffers. I really should add links here to explain what that is. It's a great project; definitely go check it out.

We also have a feature that lets you use Kubernetes RBAC to control who and what can access the data in Results, especially once that data is no longer on the cluster. One of the cool things about Results is that how you define RBAC does not have to be tied to an actual custom resource on the cluster. We have a convention of virtual custom resources: they don't actually exist on the cluster, they exist in Results, but we have defined an API group and resources for them, which we then use to control who has access to what. We use conventions similar to Kubernetes when it comes to create, read, update, and delete permissions. Here's an example of a read-only role for a namespace, where you have permission to get and list all the results. And here's an example in action, where we have two namespaces with two pipeline runs (I'm going to speed things up a little). We've got a cluster role defined that is effectively read-only, and we have a service account that binds to the role. As part of the Tekton Results install, we provide a cluster role that gives you read-only access, so using a role binding, you can grant read-only access to the Results objects in a single namespace. In this example, I have another namespace set up similarly: it's a role binding that uses that same read-only cluster role. In the RBAC system, we use token reviews to handle authentication, and then we use subject access reviews for authorization. To access the record, we first need to get a token. It doesn't have to be a service account token; if your cluster is configured to use an OIDC provider, you can use that. With that token, if it corresponds to a user or service account with the right permissions, you can see the data. But if I take the service account from a different namespace and try to access the parent corresponding to that original namespace, and I don't have permission, I will get a permission denied error.
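Here's a hedged sketch of that namespace-scoped, read-only setup; all names are hypothetical, and Results ships its own read-only cluster role you can bind instead of defining one:

```bash
# Grant read-only access to Results data in the "dev" namespace. The
# results.tekton.dev group and its results/records resources are
# virtual: they exist in Results, not as CRDs on the cluster.
kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tekton-results-readonly
rules:
- apiGroups: ["results.tekton.dev"]
  resources: ["results", "records"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: results-readonly
  namespace: dev
subjects:
- kind: ServiceAccount
  name: results-reader
  namespace: dev
roleRef:
  kind: ClusterRole
  name: tekton-results-readonly
  apiGroup: rbac.authorization.k8s.io
EOF

# Mint a short-lived token for that service account (Kubernetes 1.24+)
# to use as the bearer token in the earlier curl calls:
TOKEN=$(kubectl create token results-reader -n dev)
```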
I hope I'm not running too far over time. So, what's next for Tekton Results, if you like what you see and want to see where we're going with the project? Many of the current contributors are trying to operate Results at scale, so performance is definitely top of mind. Folks in the community have lately introduced significant database performance improvements, and we have learned through operating it that there are improvements we can make to the RBAC authentication and authorization. A feature like caching is something we don't have right now, but it would not only improve performance, it would also take load off the Kubernetes API server.

In terms of the log storage feature: it started out as a TEP, a Tekton Enhancement Proposal, and one of the Tekton maintainers, Christy, is here in the audience. We gave a great talk back at the virtual KubeCon in 2021 about TEPs and the enhancement proposal process, so shameless plug: go check it out, it's on YouTube. In that proposal, we discussed initially rolling out of the gate with persistent volumes and S3, but we definitely have visions of more object storage providers, whether that's cloud provider storage from Google or Azure, or any other kind of object storage; that's certainly on the roadmap. Log forwarding service integrations are another thing identified as good to have. Many clusters in production install a log forwarder and use a log aggregator, whether that's something like Elastic, Loki, Stackdriver, or CloudWatch; the list goes on. There are lots of good tools already doing the job of extracting and forwarding your logs, so sometimes it's a matter of not reinventing the wheel: let's figure out a way to rely on those tools to extract the logs and then provide an interface around the destination.

Finally, in terms of other things we've explored as part of Results' purview, there's an older TEP out there about automatic resource cleanup. Results actually has this feature in the watcher: if you configure it with the right flag, it will delete your task runs and pipeline runs for you once it knows the data has been stored in the database. (I'll show a quick sketch of wiring that up at the end of this section.) Right now that's configured at the cluster level, but there's certainly a need to define it for individual namespaces as well, especially if you're operating Tekton in a multi-tenant environment. We have also been working with the Tekton Artifacts project, which started as TEP-86 and has now morphed into the Data Interfaces Working Group. We've been starting the conversation about whether Results should fit in there as part of the solution. This is still new territory we're exploring; we're still trying to understand whether this is somewhere Results can play a part.

And in terms of integrating with the rest of the Tekton ecosystem: if you looked at the abstract, I did mention the tkn Results plugin, and I did not demonstrate it here today. But yes, we do have a plugin for the tkn command line, which you probably saw in some of those commands and which makes it easy to manage your Tekton pipelines. tkn works like kubectl in that it lets you extend it with plugins, and the Results plugin has support for accessing, at least right now, the results and records. We need to upgrade it so it supports the logs feature as well. We've also talked about having integrations with the Tekton Dashboard. And last but not least, we could definitely use improvements to our docs. Those who know me know docs are very close to my heart; I got my start in the cloud-native ecosystem by contributing to docs, and it's something that is very hard to do. So we don't just need code contributors: if you are a technical writer, or your company is using Tekton Results and you have technical writers on staff, we would love you to come to the community and contribute.

And speaking of the community, if you want to get involved: the code is on GitHub, and that is primarily how we manage the backlog, with GitHub issues. We also have a #results channel on the Tekton Slack, and you can join the Tekton mailing list. If you join tekton-dev, you not only get access to the mailing list, you get access to the calendar and the docs. And we have a working group that meets every week via Zoom. We have a pretty globally distributed team working on this, so we split between time zones, basically Eastern Hemisphere and Western Hemisphere. The Eastern Hemisphere group meets at 8 a.m. Eastern (1300 UTC) every Thursday, and for the Americas it's at 12:30 p.m. Eastern (1730 UTC). I do have a link to the working group's document, except I need to update it so that information is all there.
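Circling back to the auto-pruning sketch I promised: here's a hedged example of turning it on via the watcher flag. The flag name follows the Results docs, but the deployment name, namespace, and grace period here are assumptions, so verify against your installed version:

```bash
# Tell the Results watcher to delete completed runs ten minutes after
# their data has been safely stored. Deployment name, namespace, and
# the 10m value are assumptions; the flag name is from the Results
# docs at the time of this talk.
kubectl -n tekton-pipelines patch deployment tekton-results-watcher \
  --type=json \
  -p='[{"op": "add",
        "path": "/spec/template/spec/containers/0/args/-",
        "value": "--completed_run_grace_period=10m"}]'
```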
So without further ado, I think we've got a few minutes left for questions. Anyone have questions about Results? Andrea has a question over there.

"Sorry, I was just thinking, when you mentioned the dashboard integration: would it make sense, or do you have any plan, to start getting resources into Results as they evolve as well? Getting a task run into Results before it's finished, so you could point the dashboard directly at it without having to switch between the two?"

So the reconciler actually does start updating the database as the task run is running; that's kind of how we get those annotations onto the task run in the first place. So in theory, you could start accessing the data in Results as soon as it's been put there. ("Cool, thanks.") But to follow up on that: if you're looking at the data there, just like Kube and Tekton, it's a controller, so it's not going to be real-time. It's going to be eventually consistent.

Any other questions? All right, I think we've got maybe one minute, since we started a little late. One last thing: we have a logo, and actually five logos to be exact. The wonderful and talented artists at the Linux Foundation have provided these five candidates for the project, so that we can join our fellow friends in the Tekton project with a logo. We have an active poll open as a GitHub discussion, so please take a look, vote on your favorite, and let us know what you think. ("Which one's your favorite?") I do not want to bias the poll, so see me outside afterwards, maybe at the bar, and we can chat. Other questions? All right, well, thank you very much, everyone. Thank you for joining. I hope you all enjoyed cdCon and GitOpsCon, and enjoy Vancouver and the rest of your week if you're staying for Open Source Summit.