We'll just wait a few more minutes to make sure a few more people join. Hello everyone. This is the CNCF SIG Storage call. We're just waiting for a couple more users to join, and then Derek and Flavio will be giving us a presentation of Pravega, so we'll just wait for a couple more minutes. We'll probably start at five past the hour. I think we should be able to start. Aaron is going to be joining in a minute. Quinton, unfortunately, had a power and internet outage, so he might not be able to join. So in that case, Derek and Flavio, I'll hand over to you to give a bit of an intro of the Pravega project, and I assume you have some slides to share.

Yes, hi, this is Flavio. Can y'all hear me all right? We can. Okay, so let me share my screen. Can everyone see my slide? I can. All right, so how much time would you like for this? I suppose you don't want me to use the whole time, is that right? How much time do you think I have, just so that I calibrate? This is the main thing on the agenda for this meeting, so you have most of the hour. Okay, so I won't rush then.

My name is Flavio, and we'll now talk about Pravega, as Alex just introduced. This is pretty much the same presentation I gave at the CNCF webinar yesterday. I did make a few changes: I removed some of the discussion on Flink and added some new content. So for those of you who haven't seen that presentation, hopefully much of it will be new. Before I get into Pravega, a bit about myself. I am a senior distinguished engineer at Dell, and I've been working on the Pravega project since 2016. I joined towards the end of 2016, so I've completed three years working on the project and will complete four towards the end of the year. My background is in distributed computing. I was in research for a number of years, at Microsoft Research and, early on, at Yahoo! Research. And I work on a number of Apache projects. The most prominent ones, which I actually helped to build from scratch, are ZooKeeper and BookKeeper, both in the ASF. I have some contact information on the slide, email and Twitter handle, in case you want to reach out to me or follow me on Twitter.

So now let me move on and start talking about motivation, Pravega, and such. The main motivation for Pravega, and for many of the systems you hear about that deal with streams and stream processing in general, is the many sources of continuously generated data that we have out there. I'm sure that you have come across a good number of them, and that's not a surprise. But to be more concrete about the sources: they can be events that end users are generating. We can think of the traditional social networks, where users are posting events. Or you can think of online shopping, where users are purchasing items, performing online transactions, or searching for products. All of those generate data that you might want to capture. But it's not only about end users. You can think of machines also being sources of continuously generated data. You have servers that are continuously producing telemetry that you want to capture so that you can spot problems early in your fleet of servers. That would be one use case. But it's not only about servers either. There are other types of machines that many users and applications care about: sensors in IoT, and the whole promise of connected cars, autonomous cars, and so on. It's not quite a reality yet, but we are going in that direction.
So hopefully that will become a reality eventually. But all of those will be continuously generating data, and ingesting that data and processing it could be interesting, or even a requirement, for a good number of users.

Now, if I put those comments into what I'm calling a landscape here, what we have on the left-hand side is various sources of data: end users, machines, drones, sensors, connected cars, all of these producing a continuous flow of data that I want to capture and process. But the processing might not be as simple as just filtering or normalizing the data; it can have various stages. So in the end, we need at least two core components to achieve this goal of ingesting and processing the data. One is storage: capturing the data and storing it so that I have it available for processing. And second, a stream processor that is able to take that data and make sense out of it. Those things can be combined in a number of ways; you can think of the processing as a directed graph with interleaving stages of storage and processing. The output of those pipelines can be a number of things. Visualization, where you represent raw data in ways that are more intuitive or make it easier to extract insights. You can produce alerts: if we talk about a fleet of servers, bad things may be happening in your infrastructure and you want to know about it. Generating insights about, say, users or your applications: if you have front-end applications, you might want to know that there's a spike in traffic. Or recommendations: again, if we talk about end users, what other users are looking at, or users with a similar profile. And finally, just actionable analytics, where you present data or results that are useful for some action you want to take. You go visit a customer and you want to know more about the account, or anything related to that customer you're about to visit.

So that's the general landscape. But to talk a bit about a few use cases we have seen in the field: one class of applications that we find very interesting is the ones related to drones, where you're ingesting video produced by the cameras on the drones along with telemetry. Both are sent directly to some infrastructure used to ingest and process them. The applications vary from looking at the health of your cattle to inspecting airplanes between flights. And you want to do that not only by tailing the stream, processing the data as soon as it is available, but you might also want to go back and reprocess data. Maybe you found a bug, or you found an issue, and you want to revisit the data and extract some new information. So in applications like this, you're not only interested in the low-latency aspect, tailing the stream and processing as soon as the data is available, but also in going back to some arbitrary point in time and reprocessing. Along similar lines, in a factory you can have cameras recording or taking pictures of parts that are being manufactured, and you want to spot, for example, defects in those parts. The same concept applies here: you want to spot those problems or defects as soon as possible, so you probably want to tail the stream and process the data as soon as you get it.
But there might be situations, again, in which you want to go back in time, revisit the data, and reprocess it. So that same concept applies to such use cases too.

Now, focusing on streams themselves. That's what I wanted to say about use cases; let's turn our attention to what these streams actually look like in an abstract way. A natural way of thinking about streams is that they are sequences of events, or records, or messages, whatever concept matters to the application. As the data items are produced, we keep appending them to the sequence. But in reality, it's not just one single flow. If I think about a lot of the scenarios I have mentioned, with servers, with sensors, you have a number of these flows in parallel. So it's not one sequence; you can have many of those in parallel. And this parallelism gives us another degree of, say, realism: it's closer to what we would expect to see in a real application. But it doesn't stop there, because we can also have fluctuations in the traffic we're observing. You have the parallelism, but the traffic in the parallel flows can grow and shrink over time. That's because you have daily cycles, weekly cycles, other periodic cycles. But you can also have spikes on a Black Friday or at Christmas, events that get people to access your system more. All of those are examples of how the traffic of your application, and consequently of your stream, can change.

Now, it's also important to note that if we talk about continuously generated data, it can be generated for a very long time. We could be talking about years, if an application has been running for a long time. So we might want to capture the stream from the beginning and keep it as a stream. In the recent past, a lot of applications have tended to split the stream data into, say, fresh, recently ingested data, which is the part that you essentially keep tailing, the recent data you're capturing and processing, and the older historical data that maybe you have already processed and might want to reprocess in the future. For that, you maybe even use a different system to store it. I'm going to call this the lambda way, in reference to what people call the lambda architecture. But the reality is that it would be ideal if applications didn't have to make such a distinction: they could just ingest the stream. Of course, they can get rid of data by truncating the stream and so on. But if they need to keep the data, then they should be able to ingest the stream and keep it as a stream for as long as the data is needed, with no such distinction between fresh and historical data.

Now, these streams are not only about writing. I have focused a lot on the write part: one sequence, then parallelism, then traffic fluctuations, and ingestion. But a big part of it is also the read side: making sure that an application that wants to process the stream is actually able to cope with the flow of data, no matter what form it takes, whether it fluctuates, whether it's parallel. So read scalability is another important aspect.

With all those concepts in mind, the main goal of Pravega, or the vision we had for Pravega when we started, is a storage system that has the stream as a primitive. It has streams going in and streams going out. Traditionally, storage systems have focused on objects and files, and we thought that, given the nature of a lot of the applications we've seen, it's more natural for them to use streams as their core primitive. And using the concepts I have just explained, those streams have to be implemented in such a way that the system is able to accommodate an unbounded amount of data, that the stream is elastic, that it's consistent, we don't want to duplicate events or miss events, and that applications are able to both tail the stream and process data historically. I'm referring to this as a cloud-native way of exposing streams, because those are all concepts that we find very important when building cloud systems.
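To make the idea of a stream as a storage primitive concrete, here is a minimal sketch using the Pravega Java client to create a scope and a stream. This is illustrative only: the controller endpoint assumes a local standalone deployment on the default port, and the scope and stream names are made up.

```java
import io.pravega.client.admin.StreamManager;
import io.pravega.client.stream.ScalingPolicy;
import io.pravega.client.stream.StreamConfiguration;
import java.net.URI;

public class CreateStream {
    public static void main(String[] args) {
        // Assumption: a local standalone Pravega, controller on the default port.
        URI controller = URI.create("tcp://localhost:9090");
        try (StreamManager streamManager = StreamManager.create(controller)) {
            streamManager.createScope("examples");
            StreamConfiguration config = StreamConfiguration.builder()
                    .scalingPolicy(ScalingPolicy.fixed(2)) // start with two parallel segments
                    .build();
            streamManager.createStream("examples", "sensor-readings", config);
        }
    }
}
```

The same stream can then be tailed or read from an arbitrary earlier position, which is the combination of low-latency and historical access described above.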
So let me now move on and talk about Pravega specifically. Pravega builds on the concept of segments. A segment is a single sequence of bytes, and it's our storage unit: the unit that we store in our underlying storage. It's an append-only sequence of bytes. And it's bytes, not events. I have mentioned events before, and events appear at the API level, but internally we treat the data as a sequence of bytes. To convert events to bytes and back, we use serialization: we expect the application to provide a way of serializing the data on the way in and deserializing it on the way out. Segments enable us to have parallelism; I can have a number of segments in parallel.

Flavio, could I just ask a very quick question here? Yeah, of course. So does Pravega just focus on the storage of the streams, or does it have any functionality to do some of that serialization? As an example, is it only dealing with the raw streams, or does it have some higher-level functionality, similar to what, say, a message queue might do? So you do have the notion of writing events. The Pravega client expects you to pass a serializer and a deserializer. It will use the serializer on the way in and the deserializer on the way out. So the application writes an event, but internally we store it as bytes. Pravega internally does not understand the events; it only understands bytes. Okay, let's go on.

So the segments that we store internally enable us to have parallelism. You can have various clients writing to those segments in parallel, and we use routing keys to map the appends of data to those segments. These segments also enable us to vary that degree of parallelism, which I'm going to call scaling. If I start from the head of the stream, which in this representation is on the right, say I start with two segments; then at some point my traffic goes up, I decide that I need a larger number of segments, and I transition from two to five. At some later point, traffic goes down and I drop from five to three. That's doable with a Pravega stream. And as I mentioned, we have this notion of scaling, and it can be done in an automatic manner. When you configure a stream, you can say you want auto-scaling enabled, and Pravega will track the traffic and do that scaling automatically for you.

Segments also allow us to implement transactions efficiently and effectively. When an application starts a transaction, Pravega creates temporary segments for that transaction, and any appends in the context of the transaction go to those segments. If the transaction commits, those segments are merged into the main segments of the stream. In the case of an abort, we just discard those segments. So the data of a transaction does not interfere with the data in the primary segments of the stream until the transaction is committed; and if it's aborted, we simply discard those segments and the data in them. That's another benefit of having cheap segments the way we have them in Pravega.
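Here is a minimal sketch of the two append paths just described, a plain event writer and a transactional writer, reusing the scope and stream from the earlier sketch; the routing key and payloads are made up for illustration.

```java
import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.Transaction;
import io.pravega.client.stream.TransactionalEventStreamWriter;
import io.pravega.client.stream.TxnFailedException;
import io.pravega.client.stream.impl.UTF8StringSerializer;
import java.net.URI;

public class WriteEvents {
    public static void main(String[] args) throws TxnFailedException {
        ClientConfig clientConfig = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")).build();
        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("examples", clientConfig)) {

            // Plain append: the serializer turns the event into bytes on the way in;
            // the routing key determines which segment the bytes land on.
            try (EventStreamWriter<String> writer = factory.createEventWriter(
                    "sensor-readings", new UTF8StringSerializer(),
                    EventWriterConfig.builder().build())) {
                writer.writeEvent("sensor-42", "temperature=21.7").join();
            }

            // Transactional append: events go to temporary transaction segments and
            // are merged into the stream's main segments only on commit.
            try (TransactionalEventStreamWriter<String> txnWriter =
                         factory.createTransactionalEventWriter(
                                 "sensor-readings", new UTF8StringSerializer(),
                                 EventWriterConfig.builder().build())) {
                Transaction<String> txn = txnWriter.beginTxn();
                txn.writeEvent("sensor-42", "temperature=21.9");
                txn.writeEvent("sensor-42", "temperature=22.1");
                txn.commit(); // txn.abort() would discard the temporary segments instead
            }
        }
    }
}
```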
Yet another thing that we can do with segments is conditional appends, which we use in revisioned streams. They work by comparing offsets: if you're trying to append and you want to make sure that the append happens in a state that reflects your observation of that state, then if the offset matches, the append is accepted. If the offset has moved, which means that you probably don't have the latest state, then the append is rejected. This is a way of implementing consistent state at the application level. We do that via a primitive that we call the state synchronizer, which we both expose in the API and use internally. One of the things that you can do with this is build replicated state machines, and note that this is done using optimistic concurrency. So we have, again, two primitives that we expose: one is revisioned streams, and the other is the state synchronizer, which builds on revisioned streams.

Let me talk a bit about one of the key features that we have. I mentioned stream scaling, but I want to go into more depth about it. I mentioned that when scaling a stream, we can go from one segment to many, or we can merge segments and go from more segments to fewer. Scaling can be done both automatically and manually. Auto-scaling reacts to changes in traffic: you configure a stream to perform auto-scaling, and if Pravega observes important changes to the traffic, it will scale up or down. But you can also do it manually, proactively scaling the stream. For example, if you expect some traffic and want to increase the degree of parallelism of your stream, you can go and do it manually before you have that spike.

Now, to illustrate how that looks for a particular stream, say that we start with a stream with a single segment. What this graph shows is the routing key space versus time. Remember that the routing keys are the elements we use to map events that are being appended to segments. In this case, if I'm starting with a single segment, all the keys in the routing key space are mapped to the same segment. Now say that I have two hot keys, and those two hot keys induce enough load that Pravega decides it needs to split that one segment into two. For the sake of the example, say that the routing keys represent geographic locations. For example, I have some taxi ride application, or some taxi ride data, looking at the geolocation of the taxi rides, where they're starting and where they're ending. And those two locations turn out to be hot for some reason; there is an event and people are congregating there. I guess these days that's probably not happening, but before the virus situation, I suppose that was a common thing. So Pravega splits that segment into two new segments, two and three. Now let's say that was not enough, and the load is high enough that it requires a higher degree of parallelism, so a segment splits again into four and five. And at a later time, say that those keys go back to cold. Pravega then goes and merges four and five back into six. So that would be the final state of the stream, at least for the time frame that we are looking at.
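The scaling behavior just described is configured per stream. Here is a minimal sketch, reusing the scope from the earlier sketches, of creating a stream with an event-rate-based auto-scaling policy; the rate and segment counts are made-up values.

```java
import io.pravega.client.admin.StreamManager;
import io.pravega.client.stream.ScalingPolicy;
import io.pravega.client.stream.StreamConfiguration;
import java.net.URI;

public class AutoScaledStream {
    public static void main(String[] args) {
        try (StreamManager streamManager =
                     StreamManager.create(URI.create("tcp://localhost:9090"))) {
            StreamConfiguration config = StreamConfiguration.builder()
                    // Split segments when the sustained rate exceeds roughly
                    // 100 events/s per segment, scale by a factor of 2, and
                    // never shrink below 2 segments.
                    .scalingPolicy(ScalingPolicy.byEventRate(100, 2, 2))
                    .build();
            streamManager.createStream("examples", "taxi-rides", config);
        }
    }
}
```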
Now, one interesting thing to observe from this is that a single routing key does not always map to the same segment; it can vary over time if you have auto-scaling enabled. If I pick, for example, key 0.9, then it started with segment one, then it was mapped to segment two, then four, then six. So at different points in time, a particular routing key mapped to different segments. And even though this may sound like a complication for the application, the application does not really observe it. These changes to segments are completely hidden from the application, and we deal with them under the hood.

Now let me show you a graph that illustrates the changes to segments over time, but from a real run. We have this heat map that shows segments and the load on each segment. The color represents load: light blue means that a segment is lightly loaded, and bright red means that it's heavily loaded. The white lines represent the separation between segments. What we observe here, starting from the left, is that we have a number of segments, and slowly those segments merge. We see fewer and fewer segments, down to a minimum that starts around 2:30 a.m. and goes all the way to around 5:30 a.m.; we have only two segments during that period. From around 5:30, 6:00 a.m., we start seeing segments splitting again, and splitting more and more. And we see a good amount of red in the segments, which means there is a good amount of workload in those segments. And this precisely reflects the traffic that we used to generate that figure. We took data from the New York City yellow taxi trip records, took half a day of it, and just ran it through Pravega to observe those changes to the segments. And we see precisely that: it starts with some amount of traffic, slowly drops down to a minimum, and then at some point early in the morning it starts picking up again. If we put them both together, we can observe that effect, where the change in traffic causes the segments to merge initially and then split again when the traffic picks up.

Let me now move on to talk about the Pravega architecture. As I mentioned before, we have event writers. That's one of our APIs; we have other APIs, like a byte stream API, but the event API is an important one. Many applications have the abstraction of events, or similar abstractions that can be mapped to events. Using the event API, writers append to the segments of a Pravega stream. And we track the position of the writer, so that in the case of a disconnection followed by a reconnection, the writer is able to resume from the right position. Then, to consume the data, we have the notion of reader groups. We group event readers into groups that we use to split the load of segments, and to balance that load as well. That gives me the ability to grow and shrink the set: if I need more capacity for reads, I can add readers; if I don't need as much and I want to reclaim some resources, I can remove readers. And reader groups operate even in the presence of scaling; the balancing of the assignment of segments happens even while the stream scales. The readers are not aware of those changes; this is coordinated internally using a state synchronizer.
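Here is a minimal sketch of the reader-group API just described, reusing the stream from the earlier sketches; the group and reader names are illustrative. Starting a second reader with the same group name is how the segments, and therefore the load, would be split between readers.

```java
import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.admin.ReaderGroupManager;
import io.pravega.client.stream.EventRead;
import io.pravega.client.stream.EventStreamReader;
import io.pravega.client.stream.ReaderConfig;
import io.pravega.client.stream.ReaderGroupConfig;
import io.pravega.client.stream.ReinitializationRequiredException;
import io.pravega.client.stream.Stream;
import io.pravega.client.stream.impl.UTF8StringSerializer;
import java.net.URI;

public class ReadEvents {
    public static void main(String[] args) throws ReinitializationRequiredException {
        URI controller = URI.create("tcp://localhost:9090");
        ClientConfig clientConfig = ClientConfig.builder().controllerURI(controller).build();

        // The reader group tracks the group's position across all of the stream's segments.
        try (ReaderGroupManager rgManager = ReaderGroupManager.withScope("examples", controller)) {
            rgManager.createReaderGroup("analytics", ReaderGroupConfig.builder()
                    .stream(Stream.of("examples", "sensor-readings"))
                    .build());
        }

        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("examples", clientConfig);
             EventStreamReader<String> reader = factory.createReader(
                     "reader-1", "analytics",
                     new UTF8StringSerializer(), ReaderConfig.builder().build())) {
            for (int i = 0; i < 10; i++) {
                EventRead<String> event = reader.readNextEvent(2000); // 2-second timeout
                if (event.getEvent() != null) {
                    System.out.println(event.getEvent()); // deserialized on the way out
                }
            }
        }
    }
}
```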
Now, the two main components of Pravega itself are the controller and the segment store. The controller manages the lifecycle of streams. It commands the segment store, for example, to create segments when it needs to. It also manages transactions that we run against streams. The segment store is responsible for managing the lifecycle of segments and for storing them; that's our underlying storage layer. The segment store doesn't know anything about streams. The stream is a concept of the controller, and the controller is the one responsible for exposing that concept to applications. The segment store deals with segments.

We use tiered storage. The first tier of storage is expected to be a low-latency option for small writes, and we have chosen to use Apache BookKeeper. For the second tier, which we call the long-term storage tier, we have different options, and we can configure it to use either file or object storage. In principle, the system is agnostic to what's used there, as long as we have bindings to connect to such systems. For example, we can use HDFS there, or we can use an NFS mount. We also use Apache ZooKeeper for coordinating the assignment of what we call segment containers. Those are not to be confused with Linux containers; a segment container is the abstraction we use to represent groups of segments, and it's the unit we use to assign work to the different segment store instances.

Let me talk a bit more about this. The controller is the one responsible for assigning segment containers to the different segment store instances. Each segment container is responsible for a group of segments, and to determine where a particular segment lands with respect to segment containers, we hash the name of the segment. In this particular example, I'm showing the controller assigning three segment containers to each one of the segment store instances. In the case that, say, I add another segment store instance, the controller will remap the segment containers: it shuts down some segment containers in existing segment store instances and maps them to the new one. In that way, we redistribute the load, taking the new segment store instances into account. You can also remove instances; I'm just adding one here, but of course you can remove one, and the containers are redistributed as well.

Now, a bit more about the write and the read paths. On the write path, the first thing an event stream writer needs to do when it wants to append data is to determine which segment store hosts the segment it wants to append to, based on the segment container. It finds that information from the controller, and at that point it connects to the segment store and starts appending bytes. The segment store writes to BookKeeper, and only when it receives a response from BookKeeper that the data is persistent does it respond back to the event stream writer. BookKeeper, in turn, persists the data in a journal, so it's guaranteed to be on disk by the time the event stream writer receives the acknowledgment. The data is propagated to long-term storage, to tier two, asynchronously. And as I mentioned before, we have a few options available there, HDFS, NFS, and others, built on file or object storage. For the read path, we have a similar structure. The stream readers get information about segments from the controller, and they read bytes from the corresponding segment store. The segment store responds with data from its cache. If it's a cache hit, because the reader is tailing the stream, it returns immediately; if not, it needs to read the data from tier two. The data in BookKeeper is not used for reads; it's only used for recovery purposes. So if a segment store instance crashes and needs to recover the data for a particular segment container, or set of segment containers, it will use the data in the Apache BookKeeper ledgers.
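Going back to the container assignment for a moment: here is a conceptual sketch, not Pravega's actual code, of how hashing a segment's name can pin it to a segment container, with a stand-in for the controller's container-to-instance assignment.

```java
import java.util.List;

public class ContainerMapping {

    // Fold a stable hash of the qualified segment name into the container range.
    static int containerFor(String qualifiedSegmentName, int containerCount) {
        return Math.floorMod(qualifiedSegmentName.hashCode(), containerCount);
    }

    public static void main(String[] args) {
        int containerCount = 8; // fixed for the cluster in this sketch
        // Hypothetical segment store instances; the real controller remaps
        // containers across instances as they are added or removed.
        List<String> instances = List.of("segmentstore-0", "segmentstore-1");

        String segment = "examples/sensor-readings/3";
        int container = containerFor(segment, containerCount);
        // Stand-in for the controller's assignment table.
        String owner = instances.get(container % instances.size());
        System.out.printf("segment %s -> container %d -> %s%n", segment, container, owner);
    }
}
```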
All right, so that's what I wanted to cover about Pravega: the segments, concepts, features, and a bit of its architecture. Let me now say a few words about stream processors, which is actually about connecting Pravega to applications.

Flavio, can I just ask a very quick question about Apache BookKeeper? Yeah, go ahead. So in effect, is Apache BookKeeper kind of your transaction logger, write-ahead log kind of thing? Exactly, you can look at it that way. Right, okay. Thank you.

So, to connect Pravega to applications, we typically use connectors. If you're talking about an application you're building from scratch, then of course you can just go and use the clients directly. But for generic frameworks that you want to connect to Pravega, you want to build sink and source connectors. The sink connector allows you to output data to a Pravega stream; the source connector allows you to read data from a Pravega stream. One example is the Flink connectors that we have developed; the reference to the repository is at the bottom of the slide. But that's a general concept for connectors that we can use for systems like Flink or other stream processors. As for existing connectors that we have implemented or are aware of: we have one for Apache Flink, which I have just mentioned; we have one for Hadoop; we have Logstash plugins; and there is one contributed by the community for Alpakka. There are a good number of other ones that we are implementing, and we expect the community to contribute as well. I have skipped a good amount of slides that I had on Flink; if anyone is interested in talking more about this, I have backup slides, but I will skip it for now and move on to talk about Pravega on Kubernetes.

We have implemented operators. An operator is a custom controller for managing the lifecycle of an application; that would be a general definition, and it has been used in a few places. Our operators do a number of things. They worry about deployment and configuration: pod disruption budgets, pod affinity and anti-affinity rules, validating to make sure that we are satisfying those, assigning default values to configuration variables, all those things. They take care of scaling. In the case of the Pravega operator, it's responsible for upgrades, so upgrading from one version to another would be the responsibility of the operator, and also for monitoring the health of the individual components. We have implemented three different operators for the various parts of the system: the Pravega operator, which covers the controller and the segment store; the BookKeeper operator; and the ZooKeeper operator. All three are open source at the moment.

What I wanted to do now is quickly show a cluster that we have deployed. It's running a longevity workload. This particular longevity workload is characterized by a small set of routing keys: it directs traffic at a small set of routing keys with a non-uniform load distribution across those keys. So we're not using all keys, just a small set, and that gives me a skewed distribution of the workload.
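Schematically, a longevity writer with a skewed key distribution might look like the following; the key set, stream name, and pacing are all made up for illustration.

```java
import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.impl.UTF8StringSerializer;
import java.net.URI;
import java.util.Random;

public class LongevityWriter {
    public static void main(String[] args) throws InterruptedException {
        // Four distinct keys; duplicates in the array skew the pick towards key-0.
        String[] keys = {"key-0", "key-0", "key-0", "key-1", "key-1", "key-2", "key-3"};
        Random random = new Random();
        ClientConfig clientConfig = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")).build();
        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("examples", clientConfig);
             EventStreamWriter<String> writer = factory.createEventWriter(
                     "taxi-rides", new UTF8StringSerializer(),
                     EventWriterConfig.builder().build())) {
            while (true) {
                String key = keys[random.nextInt(keys.length)]; // skewed choice
                writer.writeEvent(key, "payload-" + System.nanoTime());
                Thread.sleep(1); // crude pacing for the sketch
            }
        }
    }
}
```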
So let me show it running in a minute. Can I just say, it's particularly good form to have a live demo during a project presentation? Yeah, okay. So I set this up before the call, because there are a number of steps to get it running, and I wanted to show it already running. I won't be running a lot of commands, also in the interest of time. But let's see. Okay, so these are the pods that are running in this cluster. As you can see, let's see if I can annotate this, we have Grafana and InfluxDB running. Then we have a set of three bookies. We have the Pravega operator running. Then we have a single controller and a single segment store. And we have five ZooKeeper servers running here, plus the ZooKeeper operator. This version of the Pravega operator was still incorporating the management of BookKeeper as well, so I don't have the BookKeeper operator running separately here. But the BookKeeper operator I described is in a separate repository, and it's available as well.

Okay, so this is the Grafana dashboard for that cluster. Let me start with the operations dashboard. This is the traffic we're imposing. Hold on a second, I'm not seeing this playing quite yet. Okay, there you go. Right, so as you can see in this graph here, segment write bytes per second, we're applying a load that's consistently between six and eight megabytes per second. As I mentioned, this is a longevity test that we run continuously. And one of the interesting things that we can see here is the variation in the number of segments. Remember that the distribution of load across keys is skewed: I am sending to a small set of keys, four if I remember correctly. So it's expected that we get a good number of splits, increases in the number of segments, and then at some point this starts dropping again until it converges. That's the expected behavior. All right, that's what I wanted to show quickly, just the cluster running and some of these graphs. Let me go back to the presentation. If you have any questions about this, I can come back to it and show some more.

All right, so that was a quick look at a live Pravega cluster. Now, to wrap up: the main motivation for us to pursue a system to store streams was the observation that there is a very good number of applications out there with sources continuously producing data. And we felt that a lot of those applications would more naturally map the abstraction they have of those sources to streams, rather than to files or objects, which are the traditional primitives you find in storage systems. We have put the effort into making these streams unbounded, elastic, and consistent from a storage perspective. And we have also done the work of connecting them to stream processors, so that we can extract value out of the data. It's not only about ingesting and storing, but also being able to derive value from the data. I gave one example, which is Apache Flink. I mentioned some others, but Apache Flink is the main one that we have been working with.
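For the Flink integration mentioned above, the connector exposes builder-style sources and sinks. Below is a rough sketch of reading a Pravega stream into a Flink job; the builder method names reflect the connector's documented API as I recall it, so treat the exact calls, as well as the endpoint and names, as assumptions to verify against the connector repository.

```java
import io.pravega.connectors.flink.FlinkPravegaReader;
import io.pravega.connectors.flink.PravegaConfig;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import java.net.URI;

public class FlinkReadJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        PravegaConfig pravegaConfig = PravegaConfig.fromDefaults()
                .withControllerURI(URI.create("tcp://localhost:9090"))
                .withDefaultScope("examples");

        // Source connector: reads events from a Pravega stream into the Flink job.
        FlinkPravegaReader<String> source = FlinkPravegaReader.<String>builder()
                .withPravegaConfig(pravegaConfig)
                .forStream("sensor-readings")
                .withDeserializationSchema(new SimpleStringSchema())
                .build();

        env.addSource(source)
           .print(); // stand-in for real processing
        env.execute("pravega-read-sketch");
    }
}
```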
The project is open source, under Apache License v2, and at the moment it's hosted on GitHub. And we are looking for a home for incubation; we are looking at our options for incubation at this time. Before I close, a few comments for anyone who could be interested in getting started with Pravega; I want to give a few pointers. Check the website: there's a good amount of documentation there, even videos and blog posts in addition to the project documentation. Check the organization on GitHub and the main repository. There are a number of repositories there; pravega is the main one with respect to what I presented today, but there are also the connectors I mentioned. You can run Pravega standalone locally if you want to do some quick testing or even some development, along with the Pravega samples; we have a number of samples in the pravega-samples repository. And to try it on Kubernetes, I suggest you go look at the operator repositories and the instructions there. Throughout that process, feel free to give feedback, and even contribute if you see anything that you would be interested in changing or improving. And with that, I conclude my presentation. This last slide gives a good number of references for all the things I have mentioned during the presentation. Thank you.

Thank you for that presentation. That was very informative. It's interesting, and I guess slightly different from some of the typical storage projects that we've discussed so far, so this is a very interesting alternative. You mentioned you were looking for a place to donate the project to. Are you familiar with the project graduation structure in the CNCF? A little bit; I'm not very familiar. I'm more familiar with the ASF way of doing things. I'm not as familiar with the CNCF, or even the Linux Foundation in general. I mean, we have spoken with people across the Linux Foundation, including the CNCF, but anything you want to mention would definitely be of use. So if you want to give any information about that, it will be useful, I'm sure.

All right. So perhaps I can send you an email after this; I have dropped a link to the process into the chat. Perfect, that's really helpful. Just to quickly summarize, there are three levels of projects. The starting level is a sandbox project, and this has a relatively low bar to entry. It's good if you're trying to build the community, address maybe IP-policy-related changes, or grow the number of maintainers of the project, for example. The next level up is the incubation level; that has a higher bar, and there are a number of different criteria. And then finally, there are the graduated projects, but graduation requires additional things like security audits, for example. So it would be useful to understand your thinking on this and what level you are considering, because obviously there's a different workflow, a different process, and different levels of due diligence that we would need to consider.

Sure. So in your view, what would be the difference again between sandbox and incubation? What would make anyone go for sandbox rather than incubation directly? That would be my question. It's typically down to project maturity. The incubation level requires a number of criteria, like, for example, having maintainers from different organizations, and end users, and having the project being used in production, those sorts of things.
So if you can't get some of those references, or maybe the project is very focused on just one organization, that might be an opportunity to come in at the sandbox level, to grow the community further. Got it. Okay, that makes perfect sense. All right, that's interesting. Thank you.

Yeah, sorry, my audio is breaking up; I don't know if it's me or everyone else as well. But thanks for presenting. I think it's definitely, as Alex said, a very different project than we're used to looking at, so I think it'd be a good asset to add to our portfolio in the CNCF. Yeah, thank you, Amy. It's Amy, right? No, that was Erin. Erin, okay, sorry, I guess I saw the wrong name. Erin, thank you.

This is Luis. I just want to thank you for this; this is really nice, and very different. I think we need to expand the landscape document that we have for storage systems. One of my questions is: you have a lot of Apache projects, so I'm just curious why go after the CNCF instead of being part of the ASF? Excellent question. I would say that I personally haven't entirely made up my mind. Except for board director, I have been everything in the ASF: I am a committer on projects, I am part of PMCs, I am an Apache member, I have been part of the Incubator. So I know the ASF pretty well. I have heard great things about the Linux Foundation, and the CNCF in particular. I have been very impressed with the infrastructure, the group of people, and the projects, so I decided to explore; I thought it would be a good idea. It's a strong contender on my list; it's looking pretty solid. All the work that people have been doing, all the projects, and again the infrastructure, I think all of that counts and helps projects to be successful. Excellent, thank you.

Thank you, Luis. So thanks again for presenting. Amy or Erin, unless there are any other things that we need to raise, I think this covers our agenda today. Yes, I think we should pass on the templates, maybe sandbox at least to start. It helps answer some of the questions like the ones Luis was asking, and structures things in a way that would help us understand which level of acceptance you think the project would go into. So I'll go ahead and forward that on to you, Flavio, and then I think we'll go from there. Oh, thank you, Amy. Once it's structured in a way that we can move forward, we'll understand what you guys are looking to get out of it. That sounds great, thank you.

This is a question for the rest of us, not really Flavio, but I thought projects always started as sandbox first. No, that's not accurate. Projects generally, I mean, they don't come in at graduation, I think that's kind of a given, but they can start at incubation provided they have a lot of support within the community. Sandbox is meant to be that springboard. And so without really understanding it in terms of all the other different aspects, I don't think I could make a recommendation one way or the other. Vitess started as an incubation project. Yeah, Argo just came in at incubation. Okay, yeah, Vitess was back in the early days of the CNCF, and I was wondering if we had changed the format; I understand now, that's cool, thank you. Not that I know of, Luis, but it's fluid. It's all good, thank you.

Cool, okay, so I think we're done, and we're coming up to time too.
So thanks everybody, and we'll speak to each other in the next couple of weeks. I hope everybody's keeping well and staying healthy. Yeah, likewise, thank you for the opportunity to present to this group. Ah, thanks again, Flavio. Be well everyone, see you online.