Good afternoon, everyone. Welcome to KubeCon, and thanks for coming to my talk. Today we're going to talk about something interesting: streaming data processing. My name is Derek Wang. My buddy, Vigith Maurice, was supposed to be here, but he had a last-minute change and couldn't make it. Both Vigith and I are principal engineers working for Intuit, and we are the project leaders of our open source project, Numaflow. I'm personally also a project leader of the CNCF graduated project Argo Events.

Here's today's agenda. We're going to start with a brief introduction to our employer, Intuit. Then we're going to talk about streaming data processing, its benefits, and its use cases. We'll also talk about the challenges we experienced when we used streaming technology to build our systems and platforms, and then present our solution, our open source project Numaflow, followed by a live demo. At the end, I'll take questions.

Let's start with the introduction to Intuit. I'm not sure how familiar you are with Intuit. Intuit is a well-known fintech company based in North America. We're known because we have some very famous products: TurboTax, Credit Karma, QuickBooks, and Mailchimp. TurboTax is the most popular tax return filing tool; almost everyone in North America knows about it. All these major products are powered by our five key platform areas. These five key platform areas make sure we deliver value to our customers and accelerate innovation consistently across our organization. With 100% of our services running on a Kubernetes-based modern SaaS platform, we enable billions of machine learning predictions and billions of dollars of money movement across our systems, with a secure and smart approach.

Intuit is also a very large player in the open source community. We don't only use open source technology to build our platforms and services; we also contribute a lot back to the open source community. As many of you know, Intuit is the creator of Argo, which is one of the most popular CNCF graduated projects. We also won the CNCF End User Award twice, in 2019 and 2022. More than that, we have open sourced a lot of projects, some of which are listed here. The one we're going to talk about today is called Numaflow, a Kubernetes-native streaming data processing platform; it's the one with the pinwheel icon.

Having finished all these introductions, let's move on to streaming data processing. Before doing that, I'd like to ask you some questions. How many of you are data engineers, or have ever worked on data processing? If you don't mind, raise your hand. Quite a lot. How many of you are doing, or have ever done, streaming data processing? Also quite a few. So I'm very glad there are so many data engineers and non-data engineers coming to this talk, and I hope we can learn from each other.

What is streaming data processing? Streaming data processing is a technology that continuously processes and analyzes data generated by various data sources, in real time or near real time. Streaming data is processed as it is generated, which is quite different from traditional data processing. Traditional data processing systems capture data in a central data warehouse and process it in groups, or batches. Those traditional systems were built to ingest and structure data before analytics. However, the nature of enterprise data is that it's endless; there is no bound to that data.
Those are streaming data, which means the data volumes generated by those streaming data sources can be very large. That makes it impossible to use traditional batch processing technology to do real-time streaming analytics, and that's one of the reasons why, over the past years, streaming data processing platforms have started to evolve. So if you have an IoT application, or some sensor data, or you are using tools like Kafka, Kinesis, or Pulsar to process data, you're most probably doing streaming data processing.

Why is streaming processing so important? A lot of people are interested in streaming data processing because it has many benefits. It gives you insight quickly into what's happening in your systems so that you can make quick decisions. It brings a better user experience. It makes your business activities more efficient. Imagine you are building an application like Uber, Lyft, or any navigation system: you really want to know what happened in the past minutes, or even seconds. Or you're building something like anomaly detection: you want to quickly detect whether there is any problem with your services so that you can take quick action. Or you're building a fraud detection platform: you want to protect your customers from any data loss or money loss within a short period of time. To build these kinds of systems and achieve these kinds of goals, you really need to use streaming data processing technologies.

On the other hand, a lot of people think streaming data processing is only a job for data engineers, but that's actually not right. Even if you're not using tools like Kafka or Kinesis to store that data, think about it: your applications keep generating logs and metrics all day long, and that never ends. Your customers keep booking hotels and flights from your booking system, or placing orders in your shopping system. That kind of data keeps coming into your system and never ends. Those are all streaming data. That's why we say streaming data is everywhere, and streaming data processing is for everyone.

We've talked about streaming data processing and its best use cases, but actually, doing real-time stream processing is not easy. We have been using streaming technology to build a lot of systems and platforms, and we experienced a lot of challenges. One of the systems we have built is an anomaly detection platform, which is used to detect anomalies in applications running in Kubernetes clusters. It takes the metrics emitted by those applications, such as latency, error rate, or any custom metrics, feeds those metrics to an AI/machine learning model, and computes anomaly scores. We then use the anomaly score to determine whether your application is healthy or not. So this is a system that combines streaming data processing with AI and machine learning technologies, and the metrics, the streaming data, keep coming into our system. We experienced a lot of challenges building it.

The first challenge was that there was a lot of boilerplate code for streaming in every component. What that means is that, for the anomaly detection platform, we found our machine learning engineers spent a lot of time writing code for streaming infrastructure. The best thing machine learning engineers could be spending their time on is machine learning exploration and experiments.
Instead, they need to write code to consume data from data sources like Kafka or other data sources, and they have to figure out how to write reliable code to do that. We found there were a lot of things like this, so that was the first big challenge for us.

The second challenge we experienced is that, without a dedicated streaming data processing platform, you will probably end up running your streaming workloads as a collection of microservices. To be honest with you, the anomaly detection platform we built at the beginning was actually running as a set of microservices, and we figured out we had to deal with a lot of ad-hoc work. For example, you need to figure out how to make those microservices run in a streaming fashion, and how to make them reliable and maintainable. Say you have microservice A and microservice B: you have to figure out how to connect A to B, and B to C, in a streaming fashion, and that's really hard to do. So that was the second challenge for us.

The third one is that, with that traditional architecture, it's really hard to do things like experimentation or exploration. For example, if you want to try a new model for the anomaly score generation, it's really hard to get it plugged into the microservice system. Or if you want to do some extra data enrichment or some feature engineering, there is almost no way to do that on a live, running system. That was the third challenge we experienced.

This does not mean there are no existing streaming systems available. There are a lot, like Flink and Spark Streaming, and they are all very good, very powerful streaming data processing platforms. The problem with that kind of system is that they do centralized data processing: they are very costly and very heavy. That's not a system you can easily manage or install by yourself; you need a team, maybe an organization, to do that. Those kinds of systems also have a very steep learning curve, so they're not easy systems to manage and use.

Another problem with the existing systems is that most of them are JVM-based, which means you need to use Java or Scala or languages like that to write your data processing jobs. Think about how hard that is for the machine learning engineers on this anomaly detection platform. Python is the most popular language, and the one favored by machine learning engineers, and Python is the language most machine learning infrastructure and libraries are built with. It's really hard for our machine learning engineers to switch to a different language like Java to write the processing logic. And sometimes it's not even about learning a different language; there is simply no way to achieve some machine learning tasks in Java, because the machine learning infrastructure isn't available for the Java language.

Another problem with the existing platforms is that they're not Kubernetes native, which means they were not designed and built for Kubernetes, even though they can run on Kubernetes somehow. On Kubernetes, we know pods can be terminated or restarted very often, for many different reasons: you want to do a node upgrade, or you want to apply a security patch. Your service needs to be very resilient to those kinds of Kubernetes lifecycle events.
But with systems like Flink, if there is a pod restart on a worker node, you really need to pause your data processing job and wait for the worker node to come back. This is a big problem, and it means your streaming data processing platform cannot run on the same infrastructure the way your regular applications do. So those are the challenges we experienced.

To address those challenges and solve these problems, we came up with our solution, our open source project called Numaflow. Numaflow is a Kubernetes-native streaming data processing platform. Here, Kubernetes-native means we do not only run on Kubernetes, we also use Kubernetes-native features within the platform. That means Numaflow is resilient to Kubernetes lifecycle events like pod restarts and node upgrades, and your data processing jobs will not be interrupted. It also means that if you know Kubernetes, you can easily use Numaflow to run your data processing jobs and do stream data processing.

It's a very lightweight and easy-to-use framework. You can run it in your Kubernetes cluster at cluster scope, or in your own namespace. It's also a language-agnostic framework, which means you can use whatever language you want to write your data processing jobs. We provide SDKs for different languages; right now we have Java, Python, Go, and Rust support, and it's easy to add support for other languages, as long as the common interface is implemented. We also provide many built-in sources and sinks, which means you don't need to write any code to consume data from, or write data to, common sources and sinks like Kafka, NATS, Redis, et cetera. Meanwhile, you still have the flexibility to write your own user-defined sources and user-defined sinks. Scaling features are supported out of the box: we support auto-scaling, scaling your workloads all the way down to zero if there's no traffic in your data processing pipeline, and scaling up to whatever number is needed based on traffic and load. And it's a platform with full streaming data processing features: we support back-pressure detection and handling, and provide watermark support out of the box. Because of all these features, we can actually cut cost to roughly one third compared with similar infrastructure running the same pipeline. We also have some open source community users running Numaflow on GPU devices that have no internet access, so it's a very lightweight piece of infrastructure.

We've talked about the basic features of Numaflow; now let's look at what a Numaflow pipeline is and how to use one. Suppose you have a streaming data source and you want to do some stream processing on the data coming from that stream. How are you going to do it with Numaflow? In Numaflow, we abstract the data processing jobs into a Kubernetes CRD object named Pipeline. Each pipeline contains multiple data processing steps, and we call each step a vertex. For this particular use case, the first thing we need is a source vertex. Depending on what kind of data source you have, you can use a built-in source vertex, like Kafka, or write your own user-defined source. After the source vertex, the next thing to do is add some UDF vertices. UDF stands for user-defined function, and we support map and reduce out of the box, which means you can have map UDF vertices and reduce UDF vertices.
In a map UDF you usually do things like data enrichment or data transformation. For this use case, we also have a reduce UDF following the map one, where you can do things like aggregation: aggregating by some keys over a period of time. Then, in the end, you forward the processed data to some data sinks. Similar to the source, you can use a built-in sink or a user-defined sink. Numaflow also has a very interesting feature called conditional forwarding, which means you can forward your data to different downstream vertices when certain conditions are met. Here it shows conditional forwarding to multiple sinks. Each box on this diagram is a vertex, and each vertex is nothing but a set of pods running your workload. We auto-scale each vertex to a different number of pods; that's the auto-scaling.

Next, I'm going to do a demo. If you want to try this demo by yourself, you can scan this QR code. It will lead you to a Git repository where you can find all the source code, the installation steps, and the scripts needed for the demo. You can probably do it yourself in less than 10 minutes.

So here is the context for the demo. Suppose you have a food delivery application, just like Uber Eats; I know Uber Eats actually operates in France. You have streaming orders coming from different clients, web browsers or phone apps. While your backend servers take care of those streaming orders, you also want to do some streaming analytics. You want to see which are the most popular restaurants in the past hour, or what the revenue of those restaurants is, things like that. And in the end, you want to send the aggregated data back to another Kafka topic. This is a very generic use case for streaming analytics; you can imagine similar use cases in different scenarios.

To do this demo, I wrote a piece of code to simulate the streaming orders, publishing the order information to a Kafka topic. Then I created a Numaflow pipeline to do the analysis. In the Numaflow pipeline, we do some data enrichment, data aggregation, and things like that. If you look at the pipeline topology, there is a source vertex, which is used to consume data from the Kafka topic. Then I have an enrichment vertex that adds missing information, or any information needed for the analysis. Then we do the aggregation, and in the end, send the results to a Kafka sink.

Quickly looking at the data transformation for this demo: this is the raw order information, in JSON format for this particular data. There's an order ID, a restaurant ID, and the order time, and there are the dishes the customer ordered. The dish price is not in the order information, so in the enrichment step we add the dish price to the data. And because I want to do the aggregation per restaurant, it's better to show the restaurant name instead of the restaurant ID, so we also add the restaurant name to the order information. And this is the aggregated data: you can see there's a window start time and a window end time, the restaurant name, how many orders there were, and the revenue coming from the aggregation.
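To make that data transformation concrete, here is roughly what those three records could look like. This is only an illustrative sketch: the field names and values are made up for this write-up and are not necessarily the exact ones used in the demo repository.

```json
{
  "raw_order": {
    "order_id": "o-1001",
    "restaurant_id": "r-42",
    "order_time": "2024-03-20T12:01:05Z",
    "dishes": ["margherita", "tiramisu"]
  },
  "enriched_order": {
    "order_id": "o-1001",
    "restaurant_id": "r-42",
    "restaurant_name": "Pizza Roma",
    "order_time": "2024-03-20T12:01:05Z",
    "dishes": [
      {"name": "margherita", "price": 11.5},
      {"name": "tiramisu", "price": 6.0}
    ]
  },
  "aggregated_result": {
    "window_start": "2024-03-20T12:00:00Z",
    "window_end": "2024-03-20T12:01:00Z",
    "restaurant_name": "Pizza Roma",
    "order_count": 17,
    "total_amount": 289.5
  }
}
```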
Now let me go to the demo. For this demo, I pre-installed the Numaflow controller in my local cluster; I'm running a k3d cluster on my laptop, but you can run this demo in whatever Kubernetes cluster you want. I also created the pipeline before I came on stage. You can see there's a Kafka service running in my local namespace, and I also have an order generator, the piece of code I mentioned earlier that's used to simulate the streaming orders. If you take a look at its logs, we are generating streaming orders, and the order information includes the order ID, the restaurant ID, and so on, just like the one we looked at in the slides.

Now I'll go to the UI that comes with Numaflow. Numaflow provides a very fancy UI, something like this. You can check the cluster view and the namespace view for the pipelines. I created the pipeline in the default namespace, so if I click default, you'll see there's a pipeline running called order-analysis. If you want to create a new pipeline, you click the button right here, put in the pipeline spec, and just submit. There are some other operations you can do from the UI for an existing pipeline: pause the pipeline, delete the pipeline, that kind of thing.

If you click this button, you'll see the pipeline topology for this order-analysis pipeline. There's an "in" vertex, which is used to consume data from the Kafka topic on my laptop, and there's an enrichment vertex and an aggregation vertex. For this demo I actually have two sinks in this pipeline. One is used to write the data to Kafka; if you look at the spec, I'm writing the aggregated data to another topic, my output topic. I also have a log sink, which is another built-in sink, used to print the data to the system log so that we can check it easily; we don't need to go and check the Kafka data over there. If you look at this pipeline, I'm printing out the enriched orders for each order we see from the Kafka topic. For the enriched order, you can see there's the order information, there's a dish price, and there's the restaurant name that was added to the order information. If you look into the log sink, we can see the aggregated information is something like this: there's a window start time, a window end time, the restaurant name, how many orders, and the total amount, just like we expected at the beginning. And for the same aggregation window, we see the aggregated data for different restaurants. If you go back to the pipeline, we can see some other information for this streaming data processing pipeline: the average processing rate over one minute, five minutes, and fifteen minutes. We also show the watermark, and whether there is any back pressure for this pipeline, by showing the pending messages in the backlog. So it's a pretty powerful UI provided by Numaflow.

Now back to the slides, and a quick look at the pipeline spec. As I mentioned earlier, there is a vertices section in the Pipeline CRD definition. For this pipeline, we have "in", enrichment, aggregation, and output; we actually have the two sinks, which I didn't fully show on this slide. The second major section of a Pipeline CRD object is called edges, which is used to define the relationships between those vertices.
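For readers following along, here is a rough sketch of what a pipeline spec with this shape looks like. It is reconstructed from memory for illustration rather than copied from the demo repository: the overall structure (apiVersion, kind Pipeline, vertices, edges) follows Numaflow's Pipeline CRD, but the container images are hypothetical, and the exact field names, especially around the reduce window and its storage, should be checked against the Numaflow documentation.

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: order-analysis
spec:
  vertices:
    - name: in
      source:
        kafka:                                       # built-in Kafka source
          brokers: ["kafka-broker:9092"]             # illustrative address
          topic: orders
    - name: enrich
      udf:
        container:
          image: example.io/order-enrich:latest      # hypothetical image
    - name: aggregate
      udf:
        container:
          image: example.io/order-aggregate:latest   # hypothetical image
        groupBy:                                     # marks this vertex as a reduce UDF
          window:
            fixed:
              length: 60s                            # aggregate per fixed time window
          keyed: true                                # group by the keys set in the map UDF
          storage:
            persistentVolumeClaim:
              volumeSize: 1Gi                        # reduce vertices need persistent storage
    - name: kafka-out
      sink:
        kafka:                                       # built-in Kafka sink
          brokers: ["kafka-broker:9092"]
          topic: orders-aggregated
    - name: log-out
      sink:
        log: {}                                      # built-in log sink for easy checking
  edges:
    - from: in
      to: enrich
    - from: enrich
      to: aggregate
    - from: aggregate
      to: kafka-out
    - from: aggregate
      to: log-out
```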
The last piece of the demo: let's look at the source code. All the source code needed for this demo is just these two pieces. One is for the enrichment. You can see there's a function called enrich, and we're adding information like the restaurant name to each piece of data received, and returning it. We also add the dish price here, and return a list of messages. If you look at this function, it's very generic: you don't see any upstream or downstream vertices, and you don't see any sources or sinks. You only see a mapper.Datum, which is the data received by this particular enrichment vertex. This is actually the most powerful part of Numaflow: you don't need to care about the upstream and downstream; the platform takes care of all of that for you. One more thing to pay attention to in this enrichment code is the returned message: we return the message with keys here, because we know we're going to use the restaurant as the group-by key in the next aggregation step. So we return the message with the keys set to the restaurant name.

On the right side is the aggregation code. Similarly, there's a generic function provided for the aggregation feature, the reduce feature. You get a list of keys and a channel for the messages you receive; this code is written in Go, so you see a channel, but if you're using another SDK, like Java, you'll see something like an iterator or a list. There's also some metadata, and you return a list of reduce messages. We get the restaurant name from the keys, and for all the messages from the channel we do a for loop and some simple math to calculate how many orders there were and the total amount, the revenue. Then we return the reduce message as a JSON string: we set the window start, window end, restaurant name, order count, and total amount. Again, you don't see anything about upstream or downstream. It's like writing a small unit function, a unit of stream data processing. You don't need to care about what your data source and data sink are; you can easily switch the data source and data sink to different types without making any code change, without changing any logic in your data processing pipeline.
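For readers who want to see roughly what those two functions boil down to, here is a sketch of their logic in Go. It is an illustrative reconstruction, not the demo's actual code: it deliberately leaves out the Numaflow SDK wiring (the mapper.Datum, message, and server types mentioned in the talk), and the struct names, field names, and function signatures are made up for this write-up.

```go
// Package orderanalysis sketches the logic of the demo's two UDFs,
// independent of the Numaflow SDK. In the real demo these bodies would sit
// inside the SDK's map and reduce handlers; everything here is illustrative.
package orderanalysis

import (
	"encoding/json"
	"time"
)

// RawOrder is the shape of an incoming order event (no prices, no name).
type RawOrder struct {
	OrderID      string   `json:"order_id"`
	RestaurantID string   `json:"restaurant_id"`
	OrderTime    string   `json:"order_time"`
	Dishes       []string `json:"dishes"`
}

// Dish is a dish together with its looked-up price.
type Dish struct {
	Name  string  `json:"name"`
	Price float64 `json:"price"`
}

// EnrichedOrder adds the restaurant name and dish prices.
type EnrichedOrder struct {
	OrderID        string `json:"order_id"`
	RestaurantID   string `json:"restaurant_id"`
	RestaurantName string `json:"restaurant_name"`
	OrderTime      string `json:"order_time"`
	Dishes         []Dish `json:"dishes"`
}

// AggregatedResult is what the reduce step emits per restaurant per window.
type AggregatedResult struct {
	WindowStart    time.Time `json:"window_start"`
	WindowEnd      time.Time `json:"window_end"`
	RestaurantName string    `json:"restaurant_name"`
	OrderCount     int       `json:"order_count"`
	TotalAmount    float64   `json:"total_amount"`
}

// Enrich takes one raw order payload and returns the enriched payload plus
// the keys to group by downstream (the restaurant name), mirroring the map UDF.
func Enrich(payload []byte, nameByID map[string]string, priceByDish map[string]float64) ([]byte, []string, error) {
	var raw RawOrder
	if err := json.Unmarshal(payload, &raw); err != nil {
		return nil, nil, err
	}
	enriched := EnrichedOrder{
		OrderID:        raw.OrderID,
		RestaurantID:   raw.RestaurantID,
		RestaurantName: nameByID[raw.RestaurantID],
		OrderTime:      raw.OrderTime,
	}
	for _, d := range raw.Dishes {
		enriched.Dishes = append(enriched.Dishes, Dish{Name: d, Price: priceByDish[d]})
	}
	out, err := json.Marshal(enriched)
	if err != nil {
		return nil, nil, err
	}
	// The returned message is tagged with these keys so the reduce vertex
	// can group by restaurant, as described in the talk.
	return out, []string{enriched.RestaurantName}, nil
}

// Aggregate consumes all enriched orders for one key within one window and
// returns a single aggregated record, mirroring the reduce UDF.
func Aggregate(keys []string, orders <-chan []byte, windowStart, windowEnd time.Time) ([]byte, error) {
	result := AggregatedResult{WindowStart: windowStart, WindowEnd: windowEnd}
	if len(keys) > 0 {
		result.RestaurantName = keys[0]
	}
	for payload := range orders {
		var order EnrichedOrder
		if err := json.Unmarshal(payload, &order); err != nil {
			continue // skip malformed records in this sketch
		}
		result.OrderCount++
		for _, d := range order.Dishes {
			result.TotalAmount += d.Price
		}
	}
	return json.Marshal(result)
}
```

In the real pipeline, the map server would call something like Enrich once per incoming message, and the reduce server would call something like Aggregate once per key and window, with the platform supplying the keys, the message stream, and the window boundaries.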
So we just looked at the demo, and I hope you now have some idea of how to use Numaflow to do stream data processing and streaming analytics. The pipeline we just looked at is a simple line or tree shape, but Numaflow is actually much more powerful than that. The first picture we see here is a multiple-source use case. Suppose you have multiple sources, one Kafka and one Pulsar, with a similar data structure, and you want to process data from these two different sources. Instead of writing two pipelines, you can combine both sources into one pipeline. The second picture shows the join use case. A join means you can join multiple upstream vertices to the same downstream vertex, and we support map joins and reduce joins; you can do either one. So this is a fork-and-join use case, like a diamond shape. The third one is more interesting: you can have cycles, which means you can forward your data back to yourself or to a vertex in front of you. This is very useful when you want to do some sort of reprocessing when certain conditions are met, or something like a retry. And the last picture, the fourth one, shows the side input support. If you're familiar with Apache Beam, you probably know about side inputs: you can broadcast changes, like configuration changes that are not very frequent, to your stream processing units without interrupting your data processing pipeline. This is supported in Numaflow as well.

Now let's take a look at the use cases. The first one is streaming analytics with Numaflow; the demo we just did is streaming analytics. The second one is MLOps. The example I mentioned earlier, the anomaly detection platform, is actually an MLOps platform, and we have it running across Intuit's Kubernetes clusters. And of course, you can use it for event-driven applications: your pipeline can consume data from data sources like Kafka, and you can also write your own user-defined source to get data from any data source you want.

Some other information I want to share is how Numaflow is used in the open source community. A lot of people are using it to do anomaly detection. One of the users, which I think I mentioned earlier, is using Numaflow to do digital signal processing, running Numaflow on a GPU device that has no internet access. We have also run Numaflow on a Raspberry Pi, and it works too. Another use case we have is an open source user, a very large car manufacturer whose name I don't want to mention here, using Numaflow to do map data processing for their navigation system. And some numbers I can share for Intuit: we process around five billion messages each day, we do around 60 million machine learning predictions, model fine-tuning runs at something like 45K per day, and across Intuit there are something like 135K models. That's the information I can share here.

I think that's all for my talk. If you are interested in this project, you can scan the larger QR code to go to our open source GitHub repository; the smaller one is the demo I just did, so if you are interested in running it by yourself, just scan that code. And now I'm taking questions. Thank you.

Hey, go ahead.

Just a quick question. When you have those inputs and outputs, are those the buffers from the Numaflow documentation I just looked at?

Yes, you can find all of this information in our documentation.

You mentioned the keys at the output in your example; is that the buffer concept, like a key-value store you have somewhere when you pass the messages around from one step to the other?

So the keys are only used when you want to do some aggregation, some reduce feature. For a regular map operation you don't need keys. But if you want to do a reduce, a group-by over a fixed window, a sliding window, or a session window, you may want to group by some keys; or, if you're not using any keys, you just group by the window, and then you don't need keys. But if you want to group by keys, you need to assign the keys in the previous map UDF. I'm not sure I answered your question.

Not really, but... I have two questions, I hope that's okay.

Yeah.

You mentioned Kafka in the example, but you also mentioned Pulsar. Is there any plan to support Pulsar natively? That's the first question.
Right now, there are different kinds of source support. One is 100% native, which means the code for consuming data from Kafka, for example, is embedded in the platform code, in the Numaflow source code; that's 100% natively supported. We also have a second layer of support, where we provide a source implementation but you use it as a sidecar container. And the other option is that you write your own source code: say you want to process data from a database, and we don't know what schema your database table has, so you have to write your own user-defined source to consume that data. For Pulsar, it's not a first-class native source right now, but we could make it natively supported.

Yeah, and that would be great. Second question: you had Numaflow in the same cluster as Kafka, from what I understood. Is it possible to have Kafka in one cluster and Numaflow in a different cluster, completely decoupled?

Yes, of course. We don't care where your Kafka is sitting, as long as Kafka is accessible from the Numaflow pipeline.

I'll give you a hint as to why I'm asking. We use Pulsar, and we would use auth for authentication, so I'm hinting at how you would handle having the Pulsar cluster completely isolated. Right now, I would have a client, I would authenticate, and I could consume. I would be super curious to see how that could work with Numaflow.

So you mentioned you have Pulsar completely separate in one cluster.

Yes.

But where do you want to run your Numaflow pipeline?

On a separate cluster.

On a separate cluster, and you need to make sure there's some connectivity between them.

Yes. I'm just hinting at this, so maybe we'll comment in the repo, but it's really interesting, because maybe we don't want to use Pulsar Functions, and that would be an interesting use case for Numaflow.

Yeah, of course. We actually have a Slack channel mentioned in the repository. If you are interested in a use case, or in using Numaflow for your use case, you're welcome to reach out to us. Thank you.

Thank you. I have a question. My understanding is that Numaflow acts a bit like a Kafka Streams application: it consumes from an input topic and writes to an output topic. Would you confirm, or...?

No, we're not using Kafka Streams. The Numaflow pipeline has nothing to do with Kafka Streams. You can use it to consume whatever data you want to consume. I just used Kafka because, at Intuit, we have a lot of Kafka use cases, and we provide first-layer, native support for Kafka: you don't need to write any code to consume data from Kafka.

What I wanted to say is that you're trying to do what Kafka Streams does, in a cloud-native way. And then my question is basically about the database: for the aggregation function, do you have an internal database for the aggregation, or...?

I'm sorry, I didn't hear you. You mean for internal message transmission? Okay, I got your question. There are two layers of data persistence for Numaflow pipelines. We actually support exactly-once delivery semantics; I didn't mention that during the talk, but we support it. So there are two layers of persistence when running a Numaflow pipeline. We have something called an inter-step buffer between each pair of vertices.
All the messages transmitted through the pipeline are persisted in the inter-step buffer. Right now we're using NATS JetStream; someone mentioned that in a previous talk, and we also use JetStream for this. But it's actually a plugin, so you can use some other sort of ISB (inter-step buffer) implementation to do it. That's the first layer of persistence. For the particular reduce use case, we also persist data in each of the pods by using a PVC. I can actually show that in the spec here: when you put in your aggregation step, you need to provide a PVC, or some other sort of persistence solution.

Do you write directly to the file system, or do you have a layer?

It's on the file system, right.

So you don't have any kind of cache for large-scale data?

Yeah. Right now, for any throughput below around 30K messages per second, you can use Numaflow to process it.

Thanks. And also one other question: do you have Avro support for Kafka? I mean, Avro schema support.

Avro schema support? You mean another type of data source, or...?

Avro is just a data model. If you have Avro schemas, can you consume data with Avro schemas?

Oh, okay. Actually, we don't care what kind of data format is in your data source. In our platform, in our system, in the project, what we see is just a byte array; it's just binary. If you write your own user-defined functions, you need to be aware of what data schema you're using, but we don't care what schema you're using. I used JSON as an example for the demo, but you can decide what data format you want to use.

Okay, thanks.

Any other questions?

A small question. How do you define your entry point in the container? Because I assume you're not starting a new container for every element you have to process in your stream, but I don't see any entry point here for how this container should start. I see the image and a lot of other parts, but how do you know which function to execute in a container, for example?

We provide SDK support for that. I can show you the source code for this demo; this is the small QR code I showed there, and if you scan it, you can see all the source code for the demo. There's a main function, which is in the image I mentioned in the demo. You see I have a main function here, right? For the enrichment, we start the enrich function like this, with mapper.NewServer, which is something we provide in the SDK. For reduce, we have a similar function.

Thank you.

Yeah. Any other questions?

I have a question. The whole pipeline exists in one custom resource, is that correct? So if we want to add a new sink, how does it work? Should we recreate the whole pipeline, or...?

If we are adding a sink to an existing pipeline... so you want to change the pipeline, right? For example, you want to add a branch, add a sink, or add a source?

I want to add a sink.

You want to add a sink? You just change the pipeline spec and apply it again. That's it.

Oh, okay.

But actually, my answer to this question is not 100% correct. It depends on what kind of use case you have.
For example, if you're just adding some new topology to the pipeline, that's one thing; but for updating or removing some vertices of the pipeline, it depends on whether you have any legacy data or backlog in the pipeline. If the new vertices don't recognize the backlog data, that's going to be a problem, so you need to make sure those kinds of things are not a problem for you.

Oh, okay, thanks.

Yeah. I think we're out of time. If you have any other questions, I will stay here and answer them over there. Okay, thank you.