I'd like to thank everybody who's joining us today. Welcome again to CNCF's webinar on Pravega: Rethinking Storage for Streams. I'm Kristy Tan, Marketing Communications Manager at CNCF, and I'll be moderating today's webinar. We'd like to welcome our presenter today, Flavio Junqueira, Senior Distinguished Engineer at Dell. A few housekeeping items before we get started. During the webinar, you are not able to talk as an attendee. There is a Q&A box at the bottom of your screen; please feel free to drop your questions in there, and we will get to as many as we can at the end. This is an official webinar of the CNCF, and as such is subject to the CNCF Code of Conduct. Please do not add anything to the chat or questions that would be in violation of that Code of Conduct; basically, please be respectful of all your fellow participants and presenters. Please also note that the recording and slides will be posted later today to the CNCF webinar page at cncf.io/webinars. With that, I'll hand it over to Flavio to kick off today's presentation. Thank you, Kristy, for the introduction. First off, I want to thank everyone who was able to join today. I know we're going through difficult times across the globe, and I appreciate you taking the time to attend the presentation and maybe learn something new. I actually see this as an opportunity to get away from the overflow of information about COVID, so hopefully you'll learn something new today about streams and about Pravega, and it will definitely be worth your time. As Kristy mentioned, I'll be talking about Pravega today, a project that I have been working on with the team for the past few years. Before I get into Pravega, its motivation, how it works, its architecture, all that, let me tell you a bit about myself. I work for Dell, Dell EMC, as a senior distinguished engineer. I have been working on the Pravega project since 2016, so a bit over three years. And my background is distributed computing.
I was in research for a number of years; I worked at Microsoft Research and Yahoo Research. I worked on a number of open source projects, in particular ones in Apache. I was one of the people who started projects like Apache ZooKeeper and Apache BookKeeper, some of the prominent ones I have worked on, and you have some contact information on the slide if you want to reach out to me later on or follow me on Twitter. Now, getting to the motivation for Pravega and for streams: a lot of the motivation for systems that process and store streams comes from the many sources of continuously generated data. Think concretely about applications like social networks, where end users are continually producing events and status updates, or websites where users are transacting, purchasing, or, say, banking applications where users are continuously generating transactions. Those are streams of data that you want to collect and process. But it's not only about end users, not only about human beings on the other side of the screen. It could be about machines: a fleet of servers you want to collect telemetry from, to learn more about how the servers are being used and whether they are operating correctly. All of those are valid applications for which you need to collect such data. There are other types of machines that are interesting to collect data from: these days we talk a lot about IoT, so sensors make a very good use case for discussions around streams of continuously generated data. And there are, of course, all the conversations about autonomous cars and connected cars that will leverage stream processing as well, so there will be a good number of streams coming out of all those machines and all those end users, and all of those might be a good source of input for a good number of applications.
So if we think about the landscape that I'm trying to portray: on the left-hand side we have, again, end users and machines producing a continuous flow of data. We want to be able to ingest that data, store it, and process it, and not necessarily in just those two stages; there may be complex combinations of ingesting, processing, storing derived data sets, processing again, composing stages of processing and storage in ways that benefit your business and your applications. When we think about the outputs of those data pipelines, we can think of visualizing data in different ways, ways that give us more insight or just enable us to understand better the nature of our applications and devices. It could be alerts: if we're talking about servers, you might want to learn about problems that your fleet of servers is experiencing. You can get insights about your customers, recommendations based on the behavior of other users, or even actionable analytics, things you can use in your daily job or when you go visit a customer, or it could even be between machines as well. All of those are valid outputs you can produce when you have these streams available to process and derive value from. Now, one step down, talking a bit more concretely about use cases, I want to mention a couple of general ones that we have seen. For example, fleets of drones that have cameras and are recording and streaming video which you want to ingest and process. And people want to do that for different reasons: we have seen applications where you want to check the health of your cattle, and applications where you are inspecting your planes between flights, say in an airport.
Both of them are examples, perhaps at two extremes of the spectrum, where you want to use fleets of drones, ingest those videos and other telemetry, process the data, and make sense out of it. Another interesting point is that those applications are interested in ingesting the data, processing it, and getting insights in near real time, essentially as soon as possible, but they also want to come back eventually and reprocess the data. So the ability to get results with low latency, but also the ability to reprocess that data, are both important for this class of applications. Along similar lines, we have seen factory floors with video cameras recording the parts being manufactured, where you want to detect defects in those parts. For that, you ingest videos, process them, and get some output that indicates whether things are going right or wrong. Again, in this example we want to process the data as it comes in, but we might also want to reprocess it later if, for example, we missed something, or we found a problem in the way we were processing it. That ability to reprocess is also relevant for such applications. Now, moving down one level to talk about streams, let's reason a bit about how we see streams of data. The most natural way is to think about a sequence of events, where these events are, again, coming from sensors, servers, and users. We ingest those events and append them one after the other. That's the natural way of reasoning about streams. But in reality, it's not just a single flow like this; we have some degree of parallelism. We can have multiple servers, multiple sensors, multiple users, so it's not really one single sequence. It looks more like the figure I have on the slide, where we have multiple flows.
But even that does not completely characterize the kinds of data streams I'm referring to. Data streams are not necessarily constant in the way we had in the previous slide. The traffic that you see in a stream, even with the degree of parallelism I have added, is not necessarily static or constant. It can vary over time. Maybe at some point in time the load drops, at some other point it increases; maybe this is periodic, maybe it happens when you double the number of sensors or servers you have. So a real stream looks more like this: we have parallelism, and the load fluctuates over time, either periodically or at specific times. Another important property is that such streams can be unbounded. They can start, and you can keep collecting data for as long as your application runs. Traditionally, applications have split that data into fresh, recently ingested data and older, historical data, and in many cases even separated that data into different systems. I'm going to call this the Lambda way, referring to the Lambda architecture. But the ideal situation is that we don't make that separation, and we see a stream as one unit, one flow of data. Of course, as a user you have the right to delete the data if you wish, but you should also be able to store stream data for as long as it makes sense for your application. And it's not all about writing; reading is also important. Read scale is another property that any system serving data streams needs to incorporate. The same care in dealing with parallelism and with changes to the workload also needs to be taken into account on the read side, when reading the stream. So that's the motivation for thinking about streams in the way we do in Pravega.
The idea of Pravega is essentially to create a system for storing streams, with the stream as a core primitive. Just like you have file systems or object systems, we want a storage system that ingests streams and outputs streams. At whatever point you decide to process that data, you should be able to read it in the form of a stream; you should have no need to copy the data and store it somewhere else. The properties I just characterized as important to consider: the amount of data in a stream is unbounded, and you might need to store stream data corresponding to a very long period of ingestion. It needs to be elastic, to deal with the traffic changes I have mentioned. It needs to be consistent: you don't want to see duplicates or missed events in the stream. And it needs to give you the ability both to tail a stream and to perform historical processing over that same data. Those are properties we believe are very important for continuous applications, and we wanted to offer them in a system that plays well with all such applications. That's essentially what Pravega is and does. Now, let's get into the details of Pravega. Pravega builds on the notion of segments. A segment is an append-only sequence of bytes, and it's the unit that we store in Pravega. Note that it's bytes, not events, messages, or records. This is important because it shows the flexibility our design gives. We can have an event API, but we can also have other APIs that do not have the notion of, say, events, messages, or records. For events in particular, to give an example, we can rely on a serializer from the application, which is responsible for taking events and transforming them into bytes, and those bytes are the final form that we store. Now, one of the things that segments give us is the ability to provide parallelism.
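To make the serializer idea concrete, here is a minimal sketch in Python. It is illustrative only: the JsonSerializer class and the bytearray standing in for a segment are hypothetical, not Pravega's actual Serializer interface.

```python
import json

class JsonSerializer:
    """Application-supplied serializer: the application decides how events
    become bytes. Hypothetical class, not Pravega's real interface."""

    def serialize(self, event) -> bytes:
        return json.dumps(event, sort_keys=True).encode("utf-8")

    def deserialize(self, data: bytes):
        return json.loads(data.decode("utf-8"))

# A segment is just an append-only sequence of bytes.
segment = bytearray()
ser = JsonSerializer()
payload = ser.serialize({"sensor": "s1", "temp": 21.5})
segment += payload  # append-only: bytes are only ever added at the end
assert ser.deserialize(payload) == {"sensor": "s1", "temp": 21.5}
```

The point is that the store itself never interprets those bytes; only the application's serializer gives them meaning as events.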
An application that is writing to a Pravega stream is able to write to those segments in parallel, which gives us a high capacity for ingesting data. When an application is writing with the event API, it can use routing keys to map particular events to segments, and in that way we guarantee per-key order. Across keys, we do not necessarily guarantee order; but that's the cost of the benefit we get, the higher performance from the parallelism of the segments. Another benefit we get from reasoning in terms of segments is the ability to scale streams. Like in this example, we can start with two segments, then go to five at some point because our load is higher, and at some later time scale down because the workload has dropped. Those are operations we can perform because, again, we have this notion of segments: we can seal segments, create new segments, and compose them into the stream. Segments are also used when we have transactions. When an application creates a transaction in Pravega, we create temporary segments that we call transaction segments. As the application writes events to the transaction, it is writing to those transaction segments, so there is no interference between the events being written to the transaction and the events being written to the main segments of the stream. The data written to the transaction segments only becomes visible as part of the primary segments of the stream once the transaction is committed, and once a commit happens, those segments are merged. If for any reason the application doesn't want that transaction anymore, it can abort; the transaction segments are deleted, and there's no trace of the data after that.
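The routing-key mechanism can be sketched as hashing each key to a point in the key space [0, 1) and picking the segment whose range contains that point. The hashing scheme and helper names below are hypothetical, not Pravega's actual implementation.

```python
import hashlib

def key_to_point(routing_key: str) -> float:
    """Hash a routing key to a point in [0, 1). Hypothetical scheme."""
    digest = int(hashlib.sha256(routing_key.encode("utf-8")).hexdigest(), 16)
    return (digest % 10**8) / 10**8

def segment_for(routing_key: str, segments) -> int:
    """segments: list of (lo, hi) key ranges covering [0, 1)."""
    point = key_to_point(routing_key)
    for index, (lo, hi) in enumerate(segments):
        if lo <= point < hi:
            return index
    raise ValueError("key ranges must cover [0, 1)")

two_segments = [(0.0, 0.5), (0.5, 1.0)]
# The same key always lands in the same segment, which is what preserves
# per-key order; different keys may land in different segments.
assert segment_for("sensor-42", two_segments) == segment_for("sensor-42", two_segments)
```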
So again, there's no interference between the transaction and the primary segments in the case of aborts. We can also use segments, for example, to implement replicated state machines. We have the notion in Pravega of revisioned streams, where we have conditional appends: we compare offsets when appending, and the append only happens if the offset matches. We use that property to implement a primitive called the state synchronizer, which we both expose in our API and use internally. I haven't talked about reader groups yet, but we use it in the coordination of our reader groups. And in general, as I have just mentioned, we can implement replicated state machines with it, and that is done with optimistic concurrency. Now let's focus a bit on one of the features I have just mentioned, which is stream scaling. That's an interesting one; it's one of the key novel features of Pravega. Scaling a stream consists of dynamically changing the set of segments of the stream. You can, for example, start with a single segment and later decide that you need a higher degree of parallelism and create new segments, going, say, from one to two. We have the ability to scale automatically: if you configure a stream to auto-scale, then Pravega will internally track the load and scale accordingly. But you also have the option of manually scaling streams in Pravega. You can think of these as proactive versus reactive. For example, if you anticipate that your workload will need a higher degree of parallelism, you can go ahead and manually scale ahead of time, while auto-scaling is reactive: as it observes that the load has changed, it reacts and scales accordingly. Those are the two mechanisms we offer as part of scaling streams. Now, looking in more detail at how this works, let's see an abstract example before we go into a more concrete one. We start with a single segment.
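The conditional append behind revisioned streams and the state synchronizer can be sketched as a compare-offset operation. This is a deliberately simplified, hypothetical model of the optimistic-concurrency idea, not the real API.

```python
class RevisionedSegment:
    """A segment whose appends succeed only if the caller's expected offset
    matches the current end of the segment (optimistic concurrency)."""

    def __init__(self):
        self.data = b""

    def conditional_append(self, payload: bytes, expected_offset: int) -> bool:
        if len(self.data) != expected_offset:
            return False  # someone else appended first; caller must re-read
        self.data += payload
        return True

seg = RevisionedSegment()
assert seg.conditional_append(b"update-1", 0) is True
# A writer holding a stale offset loses the race and must re-read and retry:
assert seg.conditional_append(b"update-2", 0) is False
assert seg.conditional_append(b"update-2", len(b"update-1")) is True
```

A replicated state machine follows from this loop: read the latest state, compute an update, and conditionally append it, retrying on failure.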
Remember that I said that in the case of events, we map events to segments according to the routing key space. So we have the space between zero and one, and I'm starting with one single segment, so all keys map to the same segment. Now let's say that we have a pair of keys that are hot, and that induces a scale-up event. We split segment one into two segments of equal length, segments two and three. And that keeps going. Actually, let me give an example first. To understand this a bit better, let's say those keys represent locations in a geo-application. Say we're talking about taxi rides and you're looking at specific rides from people in a particular city. Some particular location can be hot because of an event, and that could induce a higher load for a couple of keys or some number of keys. Now, let's say that's not enough, and we end up splitting again into segments four and five. That keeps going until the point where those keys become cold again; Pravega realizes that the traffic is not as hot as it was before, and the two segments can be merged into one single segment. That's the end state of the stream. Now, one important observation here: as the set of segments in the stream changes dynamically, a single key does not necessarily map to the same segment over time. And that does not require anything specific from the application; the application doesn't have to do anything about it. Pravega, under the hood, manages the assignment of keys to specific segments. If we pick, for example, key 0.9, we see that initially it maps to segment one, then to segment two, then to four, then to six.
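That split-and-merge history can be sketched as a sequence of epochs, where each epoch is a set of segments covering the key space [0, 1). The segment numbering follows the example in the talk; the exact ranges and which half gets which number are illustrative assumptions.

```python
def locate(epoch, point):
    """Return the id of the segment whose key range contains `point`."""
    for seg_id, (lo, hi) in epoch:
        if lo <= point < hi:
            return seg_id
    raise ValueError("epoch does not cover the point")

epochs = [
    [(1, (0.0, 1.0))],                                      # one segment
    [(3, (0.0, 0.5)), (2, (0.5, 1.0))],                     # 1 splits into 2, 3
    [(3, (0.0, 0.5)), (5, (0.5, 0.75)), (4, (0.75, 1.0))],  # 2 splits into 4, 5
    [(3, (0.0, 0.5)), (6, (0.5, 1.0))],                     # 4 and 5 merge into 6
]

# Key 0.9 maps to a different segment in each epoch; the application
# never has to track any of this.
assert [locate(epoch, 0.9) for epoch in epochs] == [1, 2, 4, 6]
```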
Again, under the hood, Pravega will manage those things for you, transparently to the application. Now, let's look at this heat map. Hey, Flavio. Oh, yeah, go ahead, Kristy. Oh, we just had a question come through the Q&A. Do you want to answer it now, or do you prefer to wait till the end? I'm going to talk about storage in a second; I haven't talked about the architecture yet. Great, okay, sounds good, thanks. Okay, so, heat map. This is a heat map we generated from a real execution in one of our test clusters. What we are seeing here is the set of segments we have over time. The color represents the load in a given segment at a given time: if it looks red, the workload is higher in that segment; if it looks light blue, the workload is lower. That's the spectrum you see at the bottom of the heat map. The white lines you see are the splits between segments. Starting from the left, we see a good number of splits, which indicates a good number of segments. As we move toward the right, we see the number of segments reducing, which indicates that the workload is dropping enough that segments need to merge. We can also see that from the color of the segments, the light blue in those segments. It goes down to a minimum that starts at around, I don't know, maybe 2:30 and goes all the way to a bit over 5:30 a.m.; there is a minimum of two segments around that time. Then from 5:36 a.m., it starts to pick up again: we start seeing segments splitting and a good amount of red, which indicates that segments are hot, and we have a larger number of segments around that time. Now, we generated this graph with a workload from the New York City yellow taxi trip records; the reference is at the bottom of the slide. And we can see that the changes to segments follow the workload that we observe there.
As we move from left to right, we can see the workload dropping down to a minimum around 4:00 a.m., and then around 5:36 a.m. it starts to pick up again. This is precisely the effect we wanted to see with respect to scaling: as the workload changes, as it drops, we need fewer segments; as it goes up, we need a larger number of segments and more parallelism. If you put those two graphs together, you can see again that on the left-hand side we have a good amount of merging of segments, and on the right-hand side we see segments splitting as the load picks up. So that's an illustration of what we expect to see for stream scaling in a production application. Now let's talk a bit about the Pravega architecture. One of the questions we got was about the storage subsystem, so let's talk a bit about not only that but other aspects of the architecture. When writing to Pravega, if we focus on events, we have event writers that append to a Pravega stream, and we can have a number of parallel segments. We track writer positions so that, in the case of connections dropping and restarting, we are able to resume from the correct position. We also have readers that read those events. We group readers into what we call reader groups, and reader groups split the segments across the readers in the group. They also balance the load across those readers, and we can grow and shrink the set by adding and removing event readers. That's one of the guarantees we provide, and it holds even in the presence of the stream scaling we just talked about. Internally, Pravega has two core components. One is the controller: the controller manages stream metadata and the life cycle of streams, and it also manages transactions. The second component, the segment store, focuses entirely on segments.
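The way a reader group splits segments across its readers can be sketched as a simple assignment function. The real coordination is more involved, so treat this round-robin model as a hypothetical illustration.

```python
def assign(segments, readers):
    """Spread segments across the readers in a group, round-robin."""
    assignment = {reader: [] for reader in readers}
    for i, segment in enumerate(segments):
        assignment[readers[i % len(readers)]].append(segment)
    return assignment

segments = ["seg-0", "seg-1", "seg-2", "seg-3"]
two = assign(segments, ["reader-a", "reader-b"])
# Every segment is read by exactly one reader in the group:
assert sorted(two["reader-a"] + two["reader-b"]) == segments
# Adding a reader rebalances the load across the larger group:
three = assign(segments, ["reader-a", "reader-b", "reader-c"])
assert all(len(segs) <= 2 for segs in three.values())
```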
The controller is responsible for making sense of segments and exposing the abstraction of streams to applications. A stream is not a concept of the underlying storage, of the segment store; it is a concept exposed by the controller. The segment store manages the life cycle of segments and stores segment metadata. We use tiered storage in the segment store. The first tier provides low latency for small writes, and we use Apache BookKeeper for that. And we have a second tier, which we call long-term storage, and that can be implemented with file or object storage. We have different bindings for different systems: you can plug in an NFS mount, or you can plug in an object store; those work with Pravega, so that part is pluggable in our system. We also use Apache ZooKeeper for a few things: for coordinating the segment containers, for the split of workload across segment store instances, and for a few things around transactions as well. Let's have a closer look at how the write and read paths work. The write path is such that when an event stream writer wants to append events, the first thing it does is contact the controller to determine which segment store it needs to append to, based, again, on the assignment of work to the different segment stores. Once it learns that from the controller, it can connect to the segment store and start appending the bytes from the events. Those writes are persisted to Apache BookKeeper: we do not acknowledge to the event writer until we receive a response from BookKeeper that the data has been persisted, and BookKeeper on its end guarantees that the data is durable. The second tier, which I mentioned for long-term storage, we do not write to immediately.
We asynchronously move the data to that long-term storage, which enables us to trim data, to truncate data, from the Apache BookKeeper ledgers, which is the log abstraction that BookKeeper exposes. As I mentioned before, for long-term storage we offer different options: you can use, for example, HDFS or an NFS mount to serve as the long-term storage. On the read path, the reader follows a similar sequence of steps: it contacts the controller to know which segment store is serving a particular segment. The segment store serves data from its cache: on a cache hit, it serves the data immediately, which is the common case when you are tailing a stream. But if you are performing historical reads, it will most likely be a cache miss, and the data is read from long-term storage. That's essentially how the read and write paths work. I want to switch gears now and talk a bit about how Pravega connects to applications, in particular to stream processors, which is what I mentioned when I talked about the landscape of applications. The main idea is that we develop connectors. We have a sink connector that takes data and writes it to a Pravega stream, and you also have the other end, when you want to consume data from a Pravega stream; that's a source connector. One example of a connector that we have developed is the Flink connector; you have the URL of the repository at the bottom of the slide. But that's the general notion of connectors, and we have developed a few of them. There is the Apache Flink one I have just mentioned, there is one for Apache Hadoop, we have plugins for Logstash, there is a recent one developed by the community for Alpakka, and there are more to come: some are under development on our end, and hopefully we'll see more contributions from the community.
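Stepping back to the write and read paths for a moment, the tiered design can be sketched as one small model: acknowledge a write once tier 1 (BookKeeper's role) persists it, move data to tier 2 asynchronously, serve tail reads from cache, and fall back to long-term storage for historical reads. Everything below is a hypothetical stand-in, not the real segment store.

```python
class SegmentStore:
    """Two-tier storage sketch for a single segment."""

    def __init__(self):
        self.tier1_log = []    # stands in for Apache BookKeeper ledgers
        self.tier2 = b""       # stands in for file/object long-term storage
        self.cache = b""       # in-memory tail of the segment
        self.cache_offset = 0  # logical offset where the cache begins

    def append(self, data: bytes) -> bool:
        self.tier1_log.append(data)  # persisted to tier 1 before we ack
        self.cache += data
        return True

    def flush_to_long_term(self):
        # Asynchronous in the real system; lets tier-1 ledgers be truncated.
        self.tier2 += b"".join(self.tier1_log)
        self.tier1_log.clear()

    def evict_cache(self, up_to: int):
        # Drop cached bytes below `up_to`; they are safe in long-term storage.
        self.cache = self.cache[up_to - self.cache_offset:]
        self.cache_offset = up_to

    def read(self, offset: int, length: int):
        if offset >= self.cache_offset:  # tail read: cache hit
            start = offset - self.cache_offset
            return self.cache[start:start + length], "cache"
        return self.tier2[offset:offset + length], "long-term"  # historical

store = SegmentStore()
store.append(b"event-1")
store.append(b"event-2")
store.flush_to_long_term()
store.evict_cache(7)
assert store.read(7, 7) == (b"event-2", "cache")      # tailing the stream
assert store.read(0, 7) == (b"event-1", "long-term")  # historical read
```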
But here I want to focus on the Apache Flink one, because it shows some interesting aspects of connecting a stream processor with Pravega and the properties we can get. Apache Flink is a framework that enables one to do two things: write distributed processing applications, and deploy those applications in a distributed manner. So it's a framework that enables you to write such code, and also a runtime that enables you to actually run it. It is able to process both bounded and unbounded data sets: unbounded data sets being streams, and bounded data sets being what you could consider historical data. Now, the idea of using Pravega with Flink is that the data from all the sources continuously generating it is ingested into Pravega. Pravega can serve both third-party applications and Flink jobs, or we can have Flink jobs serving those consuming applications, or combinations of those. The interesting aspect is that we can interleave stages of processing and storage to build complex pipelines. That's essentially the idea of this figure, where, again, the data is ingested into Pravega and you may have multiple stages of Flink jobs that produce the final output you need. You can think of applications where you want multiple stages rather than one single run over the data, or where you want to derive some intermediate data sets before you get your final output. When reading from Pravega, the source tasks in Flink each execute a Pravega reader. The set of Pravega readers across the source tasks forms a reader group, and they split the load of segments across them and even deal with the changes to segments that can happen because of scaling. And all that complexity, as I mentioned before, is hidden from the application.
The application doesn't have to deal with any of those changes to the set of segments. Now, one important feature that we expose and the Flink connector leverages is checkpointing. We give the ability to get a position across the segments of a stream so that Flink can use it as part of its own checkpointing mechanism. When the master of a Flink job is ready to take a checkpoint, it requests a checkpoint from the reader group. Internally, the readers coordinate via the state synchronizer, and as part of that, each reader emits a checkpoint event, and that checkpoint event triggers the following steps of the Flink checkpoint mechanism. Finally, when that process completes, the master receives a checkpoint and stores it as part of the metadata of the checkpoint that the job requires. We'll see in a second why that's important for exactly-once semantics. A bit of code: you can create a Pravega source by creating a FlinkPravegaReader from the connector I have mentioned. To do that, you pass a Pravega config, you tell it which stream you want to read from and which serializer to use, and you can use that as a source in a Flink job. You can see more detail in our repository containing samples if you're interested. Now let's move on to writing. For writing, I'm going to skip the at-least-once part and focus entirely on the exactly-once part, which uses transactions to write back to Pravega. If you have a job that is processing data, say you're reading from a Pravega stream, processing it, and dumping the results to another Pravega stream, that will happen in the context of transactions. To make that work correctly, Flink executes a two-phase-commit-like protocol. As I mentioned, you also have the option of disabling that and using at-least-once semantics, in which case you won't be using transactions. The way it works is that the Flink master starts the checkpoint.
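The reader-group checkpoint that this mechanism relies on can be sketched as collecting a position per segment from each reader. The dictionaries below are hypothetical shapes, not the connector's real types.

```python
def take_checkpoint(readers):
    """Each reader reports its position (byte offset) per owned segment;
    the merged map is the stream checkpoint."""
    checkpoint = {}
    for reader in readers:
        checkpoint.update(reader["positions"])
    return checkpoint

readers = [
    {"name": "r1", "positions": {"seg-0": 1024, "seg-2": 512}},
    {"name": "r2", "positions": {"seg-1": 2048}},
]
checkpoint = take_checkpoint(readers)
assert checkpoint == {"seg-0": 1024, "seg-1": 2048, "seg-2": 512}
# Restoring from `checkpoint` resumes each segment from its saved offset,
# which is what lets Flink fold it into its own checkpoint metadata.
```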
It executes the steps I mentioned previously, and that's the step I'm calling prepare. When it starts that checkpointing process, it pushes checkpoint barriers which flow through the data graph of the Flink job. When those barriers reach the sink tasks, they acknowledge to the Flink master. The Flink master, once it hears from all sink tasks, completes the checkpoint, indicating that to the sink tasks, and at that point they commit their corresponding transactions. That flow guarantees that the data is processed in an exactly-once manner. Against some code again: to write to a sink with the FlinkPravegaWriter, you can pass the Pravega config, tell it which stream you're going to write to, provide an event router to route events to segments, here using the exactly-once mode, a serializer, et cetera, and then you add that as part of your job. Okay, so that's what I wanted to say about Flink. Let me now talk a bit about Pravega on Kubernetes. We use operators in a few places. An operator is a custom controller for managing the life cycle of an application. In our case the application is Pravega, but also the other systems that we use along with Pravega, for example BookKeeper and ZooKeeper. The operator automates a number of dimensions: deployment, configuration, looking after pod disruption budgets, pod affinity and anti-affinity rules; those are properties it looks into. Also scaling elements of the system, performing upgrades (for example, in Pravega, upgrades are managed by the operator), and monitoring the health of the different elements of the system. As I have mentioned, we have a number of operators. We have a Pravega operator, which focuses on the core elements of Pravega: the controller and the segment store.
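That two-phase flow can be sketched with a toy sink, where a transaction is a buffer whose contents only become visible on commit. The class and method names are hypothetical, not the FlinkPravegaWriter's real API.

```python
class TransactionalSink:
    """Exactly-once sink sketch: write into a transaction during the
    checkpoint, commit only after the master confirms completion."""

    def __init__(self):
        self.open_txn = []  # stands in for transaction segments
        self.stream = []    # committed, visible events

    def write(self, event):
        self.open_txn.append(event)

    def prepare(self):
        # Phase 1: flush the transaction; persisted but not yet visible.
        pending, self.open_txn = self.open_txn, []
        return pending

    def commit(self, pending):
        # Phase 2: on checkpoint completion, merge into the stream atomically.
        self.stream.extend(pending)

sink = TransactionalSink()
sink.write("a")
sink.write("b")
pending = sink.prepare()
assert sink.stream == []          # nothing visible before the commit
sink.commit(pending)
assert sink.stream == ["a", "b"]  # visible exactly once, after the commit
```

An abort would simply drop `pending`, mirroring how Pravega deletes transaction segments with no trace of the data.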
And we have a couple of other operators, one for BookKeeper and one for ZooKeeper. You can see the repositories listed on this slide; if you're interested in more information, you can go check our documentation. All right, that's pretty much all I wanted to cover, so I'm ready to wrap up. In conclusion, I motivated the work on stream processing and Pravega as coming from the need to process a multitude of sources that are continuously generating data. I talked about end users who are continuously producing events or performing online transactions, and I mentioned IoT sensors, servers, drones. All of those are potential sources of continuously generated data that we want to ingest and process, not necessarily only in a tailing manner as it comes in, but also being able to reprocess it at any arbitrary time. Pravega aims to be a critical piece of that puzzle, in particular by providing the stream as a storage primitive. Pravega is storage for data streams, and it provides important properties for such applications: it is able to ingest an unbounded amount of data on a per-stream basis, it gives you elasticity through our auto-scaling feature, and it gives us the ability to guarantee consistency for the data being ingested and also the data being read, in the way I illustrated with Flink. To connect Pravega to stream processors, we build connectors. I have mentioned a few of them, but I focused mostly on one example, Apache Flink, and I showed how we can build exactly-once pipelines using Pravega properties and properties of the stream processor. Pravega is open source, licensed under the Apache License v2, and currently hosted on GitHub. One of the things we are looking at is a home for incubation, so we hope to eventually join some foundation and be hosted there.
Now, before I close, in case anyone here is interested in getting started with Pravega, I wanted to give a few pointers. There is, of course, the website, pravega.io, which gives you a good amount of information and has good documentation explaining parts of the system, how to run it, and such. There's, of course, the repository itself that you can go check out, see what kind of issues we are working on, maybe open pull requests, maybe interact a bit with the developers. You can run Pravega standalone: fetch the repository, run it in standalone mode, and see how it feels. Maybe run some samples against that Pravega standalone; those also give you a feeling for how to write code with Pravega and what kind of features it offers. You can also try it on Kubernetes: look at the operator repositories and see the instructions for how to deploy there and all the features that we offer. And throughout the process, feel free to provide any feedback and even contribute back if you have a chance. And with that, I close and open up for questions. I have a number of pointers here, in case you're interested in checking any of the things that I have mentioned today. Thank you.

Great. Thanks, Flavio, for a great presentation. As Flavio mentioned, it's now time for the question-and-answer piece of the webinar. We've got about 10 minutes. Flavio, I'll go ahead and read some questions for you. The first one, from Vijay, is: does Pravega have geo-replication on its roadmap, something similar to Pulsar? Yes, it is on our roadmap. We don't have that yet, but it's clearly an important piece, and we'll do it. Great. Okay, the next question is from Sharif, and apologies if I'm butchering these names. Is there a SQL or SQL-compatible layer that can be used to query data in Pravega, something similar to KSQL for Kafka? Also, does it have a Presto connector? Great. You can do SQL-like queries with Flink, so you could use that.
As for Presto, we don't have a connector yet. That's another thing that is on our roadmap.

Great. This question is from Anderson, asking if Pravega can be used in a serverless architecture, with Lambda functions. Yes, absolutely. That's another thing that we're thinking about, and we'll definitely add some functionality in that sense. These are actually very good questions, all things that we are looking into. So if any of you is interested in contributing, we are happy to hear your opinions, get your contributions, and analyze them. Those are all great questions.

Yeah. Looks like we have another one that just came in. Does Pravega have predictive analytics capabilities? Pravega itself is only the storage bit, so if you want to connect it to anything that performs the analytics, like I described with Flink, that makes total sense. Pravega gives you the ability to ingest streams and read from those streams, and to do that in a way that you can keep data for arbitrary amounts of time. The architecture I explained allows you to retain data for as long as you like, and in that sense it's great if you're looking into processing a large amount of data for, say, training a model. All those properties of the storage system give you the ability to do those things. But again, if there is any system in particular that you're interested in using, there should be nothing that prevents you from writing a connector, or even working with us on writing one.

Great. Looks like we have time for a few more questions. Any last-minute questions? Please drop them in the Q&A box. We'll give folks a minute here. Okay, I'm not seeing any other questions come through, so, great, thanks again, Flavio, for a great presentation. Oh, sorry, I spoke too soon.
Is there a Pravega Lite version with built-in Tier 2 support? A Pravega Lite version with built-in Tier 2 support... not really. Pravega will connect to the Tier 2 you provide. If built-in means that you want to use local storage, in principle you can do that, but it won't give you a lot of capacity, because one of the ideas of Tier 2 is to have elastic storage for keeping data long term. So in that sense, I would not recommend doing that. We have thought about the ability to run without Tier 2, but that's not a feature we offer today.

Okay, and Sharif is asking, have you done some benchmarking against other streaming solutions, for example Kafka or Pulsar? We have. The performance we're observing is comparable to both of them. Of course, there are differences you observe, depending on the case, based on the architectural differences. But unfortunately, I don't have numbers that I can share with you right now. We'll have something soon, but we have been looking into that.

Okay, and Tom is asking, and I think this will be our last question, how does Pravega identify bottlenecks in Tier 1 and Tier 2 storage? How does Pravega identify bottlenecks in Tier 1 and Tier 2 storage... I'm not sure I understand the question, but if this is asking about throttling, we do look at the traffic that is incoming versus what's going to Tier 2, so that the amount of data sitting in, say, BookKeeper doesn't grow without bounds, and we apply back pressure to the client.

Okay, great. Well, oh, sorry, I keep saying that; one last question here. Are there any docs or POCs for doing CDC with Pravega connectors? I don't think so. You can look at the samples and see if there is anything that suits your needs. Pravega samples, that's one I didn't list here; it's one of the repositories in our organization. Great. Okay, well, that is all the time that we have today.
I want to thank you all again for joining the webinar, and thanks, Flavio, for the great presentation. A reminder that the slides will be available later today on the cncf.io/webinars page. Thanks again, and we hope to see you all at a future CNCF webinar. Thank you. Thank you all. Thank you, Kristy.