Hello everyone, I'm Tommy. Thank you for joining these sessions. I'm a senior software developer at IBM working in the open technology team, and today we're going to go over the complexity of scaling ML pipelines on Kubernetes using Tekton.

To begin with, let's talk about what Tekton Pipelines is. Tekton Pipelines is a project that provides Kubernetes-style resources for declaring CI/CD-style pipelines. It is mainly constructed from a few main CRDs, namely Task, Pipeline, TaskRun, PipelineRun, and CustomRun. These are the major CRDs we use in Tekton Pipelines. For example, a PipelineRun defines the execution of a pipeline, which composes multiple tasks; a TaskRun defines the execution of a task, which is a set of steps; and CustomRuns allow you to instantiate and execute custom tasks, which can be implemented with any custom task controller, so you can define your own task logic.

Why do we want to use Tekton Pipelines to build our ML infrastructure? The main reason is that we need to build our ML infrastructure on top of OpenShift, and the OpenShift Container Platform is the industry-leading enterprise Kubernetes platform. It brings many features for developers out of the box, including CI/CD. The CI/CD offering for the OpenShift platform is OpenShift Pipelines, and OpenShift Pipelines is based on the Tekton project, which offers native integration with the OpenShift platform and provides a smooth experience for developers. Out of the box, OpenShift Pipelines is also certified by Red Hat for OpenShift, and an enterprise version is available on OpenShift as well. So to run secure pipelines for ML and AI on OpenShift, this is the option we have chosen.

When we actually look into what Tekton provides out of the box, we see a lot of good things. Tekton will help us run ML workflows: it constructs and creates new pods for each of the tasks you have defined; it has conditions, so it will skip tasks when a condition is not matched; and it can pass parameters very easily, so any inputs and outputs can be passed between multiple pods. It has APIs to connect custom controllers, so if you want custom logic for your tasks, you can do that too. You are able to optimize workflows very easily at the controller level, and it provides abstract templates, which helps when we deploy a pipeline to different environments: it's easy to have different configuration settings defined at the pipeline level.

As we built ML pipelines on top of Tekton, we identified a few features we wanted in Tekton, and our IBM team worked with the Tekton community to accomplish them. First of all, we have Tekton finally tasks, which help us define error-handling cleanups when the pipeline finishes or fails. We also have a very standard API definition with many ways to abstract the specs, so you can define global or cluster-wide specs that can be shared across multiple pipelines, or shared across all the tasks inside the same pipeline. We also have Tekton workspaces, which help you define common volumes across all your pipelines. You can also define a volume within a single pipeline so that all the tasks share the same workspace definition.
Of course, we also have Tekton custom tasks, which allow you to have any custom logic that is not provided by Tekton Pipelines out of the box. We also have different termination logics: if you want to cancel a pipeline — in machine learning, sometimes you just want to get rid of all the resources you're running, to free them up — we have that logic. But you can also do the traditional CI/CD thing, where you want the currently running tasks to complete first before exiting the pipeline. So there are multiple ways you can terminate a Tekton pipeline depending on your needs. And of course, in machine learning we have a lot of parameters, so having a matrix and being able to loop over that matrix is very helpful. The new Tekton release also introduces Common Expression Language (CEL) conditions, so you are no longer limited to conditions that do simple string matching; you can evaluate CEL expressions on the fly, which is very helpful.

With all these fantastic features in Tekton, we are able to build very good workflows: you basically just submit a pipeline object to the Kubernetes control plane, which creates the Pipeline CRD; the Pipeline CRD then creates Task CRDs, which are realized as pods; and each of those pods runs each step, which is a container by itself. This is a fantastic CI/CD platform, and we were able to use it to build simple ML pipelines. But we saw limitations when we tried to scale up and build more complex ML pipelines.

Some of the limitations we found at the very beginning: Tekton has no caching out of the box. That makes sense, because it's just a controller handling workflows; there's no storage for keeping a cache. There's no good Python SDK capability — for data scientists it's very difficult to compose a pipeline using only YAML, so it's good to have a Python SDK to help them compose pipelines. There's no out-of-the-box garbage collection, so as you run more experiments, those objects live on in Kubernetes, which wastes resources in our case. And there's no log archival, so when you actually try to clean things up, there's no way to retain all those logs.
Of course, when you have a more complex pipeline, you might want to share tasks across multiple pipelines, and currently there's no good out-of-the-box way to do that in Tekton. We saw that as a limitation, because we had to compose everything into a single Tekton pipeline, which limits our scalability. That leads to the next point: when you compose everything into one single pipeline, the Kubernetes etcd store limits you to roughly 1.5 megabytes per pipeline definition spec, which limits how big the pipeline itself can be. And lastly, for ML pipelines you need to understand what kind of inputs and outputs and what kind of artifacts are produced, so we need some sort of metadata to track all that information; then data scientists can easily go back and understand what they actually ran and produced within a pipeline.

Because of that, we looked into the different open source projects out there, and we found Kubeflow Pipelines. Kubeflow Pipelines is aimed at letting data scientists use any runtime, any framework, any data type. For example, if you want to train models using PyTorch, or use PyTorch to do prompt tuning, you can easily do that on Kubeflow Pipelines as well; there's no library limitation blocking you. It also provides a very easy interface for data scientists: it has a Python DSL, so data scientists can define their container spec and their workflow spec in Python itself. And with native Python, it's easy to just pass inputs and outputs between different Python functions, and these translate directly into input/output definitions in Tekton. In addition, because we also have looping capabilities, we usually define programmatic constructs, and those constructs translate easily into cloud-native pipeline constructs such as parallel loops.

On top of that, Kubeflow Pipelines provides multiple layers of optimization. First of all, the Kubeflow Pipelines runtime leverages whatever CI/CD runtime is available. At the very beginning, Kubeflow Pipelines was built on Argo, before Tekton was introduced. Our team saw that Tekton is very useful for building ML pipelines, so we contributed a new Tekton runtime to Kubeflow Pipelines. With Kubeflow Pipelines, you get a common storage for storing experiments and for metadata tracking, so users are able to track all their experiments, and all the inputs and outputs are stored there as well. Once a pipeline is finished, we also have garbage cleanup and garbage collection, where we archive all the pipeline history into a MySQL database; we clean up all the space in Kubernetes, but you still have all the information stored in the database for analysis and future reference. And on top of that, we also added caching capabilities: if you have run something before and you just want to cache those steps and not waste resources — say your pipeline is saved, you rerun it, and you want to reuse all the previous steps — you can do that with Kubeflow Pipelines too. So you can see that Kubeflow Pipelines provides a good way to clean up all your pipelines in the Kubernetes control plane and a good way to cache your workloads inside the existing Tekton pipelines.
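To make that concrete, here is a minimal sketch of what the Python DSL style looks like, assuming the current open source kfp v2 SDK; the component names, URIs, and learning-rate grid are illustrative, not from the talk:

```python
# A minimal sketch of the KFP Python DSL, assuming the kfp v2 SDK.
from kfp import dsl

@dsl.component
def preprocess(dataset_uri: str) -> str:
    # Placeholder transformation; a real component would read the data,
    # transform it, and write the result somewhere durable.
    return dataset_uri + "/cleaned"

@dsl.component
def train(cleaned_uri: str, learning_rate: float) -> str:
    # Placeholder training step returning a (hypothetical) model URI.
    return f"model trained on {cleaned_uri} with lr={learning_rate}"

@dsl.pipeline(name="demo-ml-pipeline")
def demo_pipeline(dataset_uri: str = "s3://bucket/data"):
    prep = preprocess(dataset_uri=dataset_uri)
    # A Python loop over a parameter grid compiles to parallel tasks.
    with dsl.ParallelFor(items=[0.01, 0.001]) as lr:
        train(cleaned_uri=prep.output, learning_rate=lr)
```

Outputs are passed between tasks as ordinary Python values (`prep.output`), and the `ParallelFor` block compiles down to a parallel loop in the backend — exactly the kind of programmatic construct described above.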
In terms of what we improved: Kubeflow Pipelines on Tekton v1 is more like an extension of the existing Tekton features. We're not really modifying the Tekton pipeline itself much; we're mostly still mapping the user's pipeline definition pretty much one-to-one to Tekton pipelines, while adding additional optimization features. We have garbage collection, which helps reduce the Kubernetes etcd size. We provide the Python DSL and an API server, so user requests can go to the Kubeflow Pipelines API server, which reduces the number of queries users need to make directly against the Kubernetes API. And because we wanted caching without modifying the Tekton pipeline itself in the first version, we used pod mutation for caching; even without Kubeflow Pipelines, you can leverage this feature with native Tekton just by adding two annotations, so there's no impact on the Tekton controller logic. In that very first version where we implemented the Tekton backend for Kubeflow Pipelines, there was no Common Expression Language and there were no pipeline loops, but we could easily extend those features using custom task controllers — that's how we delivered those capabilities at the beginning. Kubeflow Pipelines also provides a common storage for common artifacts, and we took advantage of that and store our logs — our archived logs — in that common storage as well.

With the Python DSL, it's easy for data scientists to write a helper function and let other data scientists reuse it; you can build common helper functions and compose those portable functions into a single Kubeflow Pipelines task. For example, we had a user who produced logic that waits for all the input files to be ready and then executes. That can be a simple Python function defined in Kubeflow Pipelines, and then, because it's commonly used, the compiler can take the same logic and apply it to all the pipelines other users have been building (see the sketch below). Finally, Kubeflow Pipelines also provides very simple, preliminary metadata tracking, which gives you an idea of what data is flowing in and what data has been consumed at each step.
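As a concrete illustration of that shared-helper pattern, here is a hedged sketch of a "wait for files" component, again assuming the kfp v2 SDK; the S3 polling logic, bucket layout, and timeout are hypothetical stand-ins for whatever the user's actual logic was:

```python
from typing import List
from kfp import dsl

@dsl.component(packages_to_install=["boto3"])
def wait_for_files(bucket: str, keys: List[str], timeout_s: int = 600):
    """Poll object storage until every expected input object exists."""
    import time
    import boto3

    s3 = boto3.client("s3")  # hypothetical: credentials come from the pod env
    deadline = time.time() + timeout_s
    pending = set(keys)
    while pending and time.time() < deadline:
        for key in list(pending):
            try:
                s3.head_object(Bucket=bucket, Key=key)
                pending.discard(key)          # object is ready
            except s3.exceptions.ClientError:
                pass                          # not there yet, keep polling
        time.sleep(10)
    if pending:
        raise TimeoutError(f"inputs never appeared: {sorted(pending)}")
```

Because it's just a Python function, any team can import it and drop it into their own pipeline definition.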
With all these awesome features, we still saw limitations in scalability. Namely, with caching: because we didn't want to interfere with the Tekton controller logic, from the beginning caching could only be done via pod mutation, which means we still have to schedule the pod — we still have to create and construct the pod — and that is the big bottleneck. Our power users actually have tens of thousands of tasks in each pipeline, and with tens of thousands of tasks, even if you cache them, you still end up with tens of thousands of pods. That bottlenecks our scheduler as well when it's allocating resources. And because of the sheer size of those pipelines, we are not able to store more than, let's say, 2,000 tasks inside a single pipeline, because of the etcd size limit. If instead, as is natural in Tekton, you break the workflow into multiple pipelines, have them all connected together, and pass data between those pipelines, it gets very complex: composing that complex pipeline, traversing that graph, validating that graph, and passing data around that graph is very difficult. Even just traversing a very big graph in Tekton, we saw bottlenecks on the scheduling side.

This is why we moved to a new design of Kubeflow Pipelines, called Kubeflow Pipelines v2. Instead of having Kubeflow Pipelines be a smart compiler that maps user-defined Python definitions one-to-one to Tekton, we now have a smart runtime: the KFP DSL, the KFP SDK, or whatever low-code interface you use to compose a pipeline all compile into an intermediate representation (IR). Behind the scenes, this intermediate representation holds all the graph definitions and all the requirements for composing the pipeline, and it lets the backend itself optimally map its own subsets of the pipeline onto this complex whole. So from a Tekton perspective, you can have multiple Tekton pipelines composing one complex user intermediate representation. From a user perspective, the pipeline is still represented as one complex graph, but behind the scenes it is broken into multiple pieces, so Tekton doesn't have the bottleneck of scheduling and traversing one big graph.

To achieve this, we introduced new concepts called drivers, executors, and publishers. In the driver, we store the parent context, so it knows which node of the graph this execution is supposed to run. Then, once the execution is finished, there is a publisher that uploads the output artifacts and status to our common metadata server, called MLMD. So all the pipelines, no matter which pipeline you use, can share the same parameters, metadata, and artifact status through this one MLMD metadata server. This is very powerful, because it means we can break a Tekton workload down into multiple smaller pipelines and remove the complexity of the graph itself from the runtime representation.

With Kubeflow Pipelines v2, another enhancement we've made — because we have runtimes for both Argo and Tekton — is an abstract interface for future runtimes. So if, say, you want to bring in Airflow, it will be very easy: you just add three components. One is a compiler, which converts the Kubeflow Pipelines intermediate representation into the representation you want your runtime to run; with those files, you can convert your pipeline into multiple pipelines. Then, when you run it, you just create an execution client to run the resources you have produced. And you also have execution specs, which help you modify the underlying runtime specs when you need to, say, update your parameters or update a small subset of the spec.
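As a concrete reference point, producing that intermediate representation from the SDK is a single compile call; a minimal sketch with the kfp v2 SDK, reusing the hypothetical `demo_pipeline` from the earlier sketch:

```python
# Compile the Python definition into the intermediate representation
# that backends consume; the IR is serialized as a YAML pipeline spec.
from kfp import compiler

compiler.Compiler().compile(
    pipeline_func=demo_pipeline,        # any @dsl.pipeline function
    package_path="demo_pipeline.yaml",  # compiled IR, not backend-native YAML
)
```

The backend — Argo, Tekton, or a future runtime — then reads this IR and decides at run time how to map subsets of the graph onto its own native resources.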
When we dive into what Kubeflow Pipelines v2 actually provides, you can see the driver/publisher model separates into two levels. At the graph level — the directed acyclic graph (DAG) level — the driver resolves the parameters and artifacts for the parent context. Then, because users can also compose multiple sub-pipelines, it checks whether a given sub-pipeline has been executed before and can be cached; if so, we can skip the whole sub-DAG and reduce the complexity of the graph itself. Once the graph is finished, the publisher takes the information the sub-pipeline produced and publishes it back to the metadata service. At the task level, when you actually need to run a pod for a particular task, the same driver and executor logic exists. The driver checks whether this task has been executed before, fetches all the parameters, and computes the conditions; if a cache entry exists, it simply skips the executor pod. If not, it creates an executor pod with the publisher embedded. In this case the publisher is a binary inside the user pod itself, so we are not adding any extra step or extra container that would increase the user's runtime.

From the publisher's perspective: before the user command runs, its main job is to pull all the artifacts the user task needs and replace the placeholders with the necessary parameters from MLMD; then it runs the user's execution command. Once the command is finished, it grabs any artifacts that need to be uploaded to the designated object storage, and publishes all the related metadata — where each artifact is stored and what parameters were produced — back to the MLMD metadata service. This way, we don't have to rely on the Kubernetes API at all to store those statuses or to know where each parameter is located; all of that information lives in the MLMD metadata service, so we can reduce the bottleneck on the Kubernetes control plane itself.

At a very high level, when Kubeflow Pipelines on Tekton v2 was first designed, all these driver and publisher steps were actual Tekton tasks, and you can see that this was our bottleneck: we create a lot of tasks inside a single YAML, and when Tekton schedules a lot of tasks, it puts a lot of pressure on the Tekton scheduler itself. That's why, as we improved over time, we merged those driver logics into our task controller. Our current design merges the driver and publisher logic into the task controller itself, so the task controller can go ahead and evaluate whether a task is cached, and also evaluate all the conditions; only if there is a real need to run a pod does it actually run a pod to accomplish the task.
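The actual drivers and publishers live inside the KFP v2 backend; purely as a conceptual illustration of the task-level flow just described, a sketch in Python might look like this, where `task`, `mlmd`, and `object_store` are hypothetical stand-ins for the real components:

```python
# Conceptual sketch only; the real drivers/launchers are part of the
# KFP v2 backend, not this code.
import hashlib
import json

def fingerprint(image: str, command: list, inputs: dict) -> str:
    """Cache key: hash of the container spec plus resolved inputs."""
    payload = json.dumps({"image": image, "cmd": command, "in": inputs},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def drive_task(task, mlmd, object_store):
    """Task-level driver: decide cache hit vs. launching an executor pod."""
    inputs = mlmd.resolve_inputs(task)            # upstream params/artifacts
    key = fingerprint(task.image, task.command, inputs)
    cached = mlmd.lookup_cached_execution(key)
    if cached is not None:
        # Cache hit: republish the recorded outputs; no pod is created.
        mlmd.publish(task, outputs=cached.outputs, cached=True)
    else:
        # Cache miss: create the executor pod; the publisher binary runs
        # inside the user pod itself, so no extra container is added.
        outputs = task.run_executor_pod(inputs)   # hypothetical helper
        object_store.upload(outputs.artifacts)    # artifacts to object storage
        mlmd.publish(task, outputs=outputs, cached=False)
```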
But we're also aware that for ML workflows you sometimes need things like distributed training — say you want to run a Ray workload. We can simply create a Ray CRD for you, so you don't have to have a pod dedicated to running a client; the Ray controller becomes aware of the CRD and runs your Ray jobs. So it's very easy to integrate, and we reduce the number of pods in these scenarios.

To summarize, Kubeflow Pipelines on Tekton v2 really changes how we run custom tasks: caching, skipping, conditions, and parameter handling are all in one place. We also have the publisher binary running along with the user code, uploading all the user task parameters into the ML Metadata service. This lets you get past a limitation of Tekton, since Tekton parameters are passed via the Kubernetes YAML itself: if you put a lot of parameters inside Tekton, there is a limit on how big your parameters can be. With this new approach, the publisher just pushes the parameters to the ML Metadata service, which in the open source version is currently backed by a MySQL or PostgreSQL database, so your limit is much higher. And lastly, the publisher also uploads all the pipeline status and graph structure into the ML Metadata service, so we don't have to retain the graph structure in the Tekton pipeline definition on Kubernetes itself. As we execute the pipeline, we can actually clean up the pipeline objects, because all the information is uploaded as each subset of the pipeline completes.

With these enhancements, you can see that this addresses the pod-creation issue we had with caching: a cache hit no longer needs to create a new pod, so it's very efficient from a caching perspective. Because the pipeline can be decoupled into very small sub-pipelines, graph traversal is very simple within Tekton: it only sees a subset of the pipeline and doesn't have to understand the whole complexity of a gigantic ML pipeline flow. All that information is stored in MLMD, which holds all the parameters and pipeline status. And it dramatically reduces the pressure on etcd, since we can break a workflow into a lot of smaller pipelines before storing them in etcd. The only small limitation we still see is that our power users do have tens of thousands of tasks, and each of those tasks is still represented by a custom resource; as the number of pipelines grows, we can still see some limitations on etcd. But this is similar to the concept of checkpoints in machine learning, where each task records its status in persistent volumes, so we could make the machine learning pipeline more condensed. For any parameters or status that don't need to be stored in persistent volumes, we could have a more advanced compiler combine those logics and pass those parameters in memory instead of storing all that information in MLMD, which would reduce our runtimes. That's the kind of future enhancement we're planning to do.
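Because everything ends up in MLMD rather than etcd, a run can be inspected with the open source ml-metadata Python client; a minimal sketch, assuming a MySQL-backed store and placeholder connection details:

```python
# Inspect what the publisher recorded; host/credentials are placeholders.
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

config = metadata_store_pb2.ConnectionConfig()
config.mysql.host = "mlmd-db.example.com"   # placeholder endpoint
config.mysql.port = 3306
config.mysql.database = "mlmd"
config.mysql.user = "mlmd"
config.mysql.password = ""                  # placeholder

store = metadata_store.MetadataStore(config)
# Executions (one per task run) and artifacts (inputs/outputs) live in
# the database, not in etcd, so pipeline CRs can be garbage collected.
for execution in store.get_executions():
    print(execution.id, execution.properties)
for artifact in store.get_artifacts():
    print(artifact.uri, artifact.state)
```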
And with that, we're going to show you a quick demo of how Kubeflow Pipelines on Tekton 2.0 works. You're using the same SDK no matter whether you're on the Argo or the Tekton backend. Currently, with Kubeflow Pipelines 2.0, the Argo implementation hasn't yet optimized the backend to run driver tasks using HTTP templates, so all the driver tasks on Argo still run as pods; the community is working on converting the Argo backend to a more long-running-server approach as well, which will significantly reduce caching time. As you can see, without that, Argo has to run at least three or four additional pods, which dramatically increases the duration, whereas in Tekton we can simply run a cached pipeline within about two seconds.

So with this, let me show the demo. When you go into the Kubeflow Pipelines interface, you can see the pipelines you have created. Once you click on one, you can see we also have versioning for pipelines, so you can pick different versions. In this case, I will first run a version where all the tasks are cached. So let's just create a simple run. When the pipeline is cached, you can see it executes very simply over here and completes very fast; it should run in seconds. Once the pipeline is scheduled, everything just pops up almost instantly. As we go back, we can see the startup time for the controller itself takes a little bit, so it took 8 seconds, but that's still very fast in this scenario; in an ideal environment, you can see it get down to about 2 seconds when you run these pipelines. When we look at the PipelineRun itself, you can see the pipeline actually completes within 8 seconds, even from the Tekton resource perspective. And you can see we actually did run things in this pipeline: we ran at least four tasks to evaluate the context of the graph and to evaluate whether we have a cache hit, and all four tasks complete within seconds.

Kubeflow Pipelines also makes it very easy to pick which tasks are cached. So let's do a scenario: say we are not caching the training step, because training sometimes has randomness, so we might not want to cache it even if we run the same step multiple times. The preprocessing, though, is always the same: you have the same data coming in and you do the same transformation, so your output is always going to be the same. In this scenario, we can cache just the preprocessing step and then take the same environment produced by the cached outputs and do the training. You can see we were able to grab the artifacts right from the cached outputs and then do the training — it's actually running the training pod. From the output itself you can see the artifact it produced, in this case a simple message artifact, and then the new model being uploaded to the MinIO storage. So it's very easy to navigate, and you can simply decide which tasks you want to cache and which tasks you want to always run.
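From the SDK side, that per-task choice is one call per task; a hedged sketch using the kfp v2 SDK and the hypothetical `preprocess`/`train` components sketched earlier:

```python
from kfp import dsl

@dsl.pipeline(name="train-with-selective-caching")
def train_pipeline(dataset_uri: str = "s3://bucket/data"):
    prep = preprocess(dataset_uri=dataset_uri)
    prep.set_caching_options(True)      # deterministic step: reuse the cache
    trained = train(cleaned_uri=prep.output, learning_rate=0.01)
    trained.set_caching_options(False)  # training has randomness: always rerun
```

Caching is on by default in KFP v2, so the first call is only there to make the contrast explicit.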
So with this, I complete my demo, and we want to talk about some of the future optimizations we're planning for our current design. Our current design relies on this driver and publisher model, and our initial implementation just connects all the root nodes behind a driver task and connects the publisher after all the leaf nodes, which adds a layer of complexity when we construct the pipeline itself. So in the next phase, we are trying to have our controller handle all the driver and publisher logic as well, and only let Tekton handle the core pipeline execution. This way we retain the same Tekton structures but are able to add the extra capabilities — doing caching and uploading status to the metadata service — right in the controller itself. On the community side, we're also working on a more mature status IR, so we are able to upload the graph-level status in a more mature way as well; this is why we have some delays in migrating this approach to the new all-in-controller design. And then, lastly, we want to optimize the looping CRD. Looping is basically repeating the execution of the same task multiple times, so we want to enhance the looping capability to run all the loop iterations in a long-running server and just feed in different parameters, different sets of inputs. That way we can reduce the number of tasks and resources that are redundant in the pipeline itself.

And with that, our talk is complete. Here are the links to the Tekton Pipelines project, the KFP-Tekton project for all the optimizations, and OpenShift Pipelines for those of you who want to run this project on OpenShift. Feel free to come to our Slack channel if you have any questions. And if you're wondering how this open source project is implemented in our product, also check out watsonx, which composes all these open source technologies in our product itself. Thank you very much. Any questions? Any questions in the audience? If not, I can take them offline as well. Thank you very much.