Good morning, good afternoon, good evening, depending on where you are. We will start the call shortly; we'll just give it a couple more minutes to allow a few more people to join. I'll just post a reminder on Slack as well. Good morning, afternoon, and evening. Good morning. How are you today? I am good. Good morning. Good morning, everyone. We'll just give it one more minute. Sorry to keep you waiting, Shuoran. No problem. Hey, Louise. All right, I think we have a quorum. The first item on today's agenda is the presentation about the ChubaoFS project (I hope that's the right pronunciation), which is currently a member of the CNCF as a sandbox project. They've made some amazing progress over the last year, have built the community, and are now looking to move into the incubation stage. So we look forward to the presentation. Okay, thanks, over to you.

Shall I share my desktop? Absolutely. Okay. Hi everyone, good morning. This is Shuoran speaking, a co-maintainer of ChubaoFS. Today I'm going to present on behalf of ChubaoFS, hoping to move it from sandbox to incubation. Can you see my desktop? Yep. Okay.

For those of you who are not familiar with ChubaoFS, I'm going to cover some background. ChubaoFS is a cloud native distributed storage platform designed for containerized applications running on large-scale container platforms such as Kubernetes. As you may recall, ChubaoFS was formerly described as a distributed file system when it was first open sourced and presented to the Storage SIG while applying to become a sandbox project. But since version 2.0, we have added an S3-compatible interface to the project, so the usage scenarios have expanded tremendously since then, as has proven to be the case in real production. It is already beyond the scope of a file system, so we prefer to call it a distributed storage platform now.

What makes it so special for cloud native applications? Here are some challenges we summarized based on our experience serving applications running in Kubernetes clusters. First of all, we find that there are a lot of customers sharing the same Kubernetes cluster, which means that, as a storage provider, we have to be able to provide different volumes for the purpose of data isolation. It is impossible for us to deploy a separate ChubaoFS cluster for each customer, so multi-tenancy is a necessary feature for ChubaoFS. Then, storage usage and throughput for a single customer are hard to predict, because the main advantage of a cloud native application is that it is highly scalable, and it is very common for storage usage and throughput to grow as the business grows; so this kind of elasticity is a requirement for a cloud native storage platform as well. And since we have a lot of customers, the file sizes are diverse, ranging from kilobytes to terabytes, which means we have to support both large and small files, and it is a big challenge for a distributed storage system to support small files well. We also have to support sequential and random read/write patterns for different applications. Finally, we find that applications running in the same Kubernetes cluster are sometimes connected to each other, so it is better for the data to be shareable by upstream and downstream users, and it is better to provide different interfaces for different customers.
So these are the challenges we took into account from the very beginning when designing the ChubaoFS project. To solve them, ChubaoFS has the features listed here: optimized resource utilization, multi-tenancy, and elasticity and scalability for both metadata and data. It is very common for data to be scalable, but it is a big challenge for metadata to be scalable. We can also serve a large number of clients in a single cluster simultaneously, which means there is no theoretical limit on the number of containerized applications using the same ChubaoFS cluster. ChubaoFS is also optimized for both large and small files, and it has converged file system and S3-compatible interfaces.

And here is the timeline for ChubaoFS. It was first used in production at JD.com in June 2018, and it was open sourced in March 2019. We released version 1.0 in April 2019 and presented to the Storage SIG in June 2019. The industrial paper based on the ChubaoFS project was published at SIGMOD '19 in July. To integrate with more of the cloud native ecosystem, we developed the ChubaoFS CSI plug-in and the ChubaoFS Helm chart, and they are released and used in production right now. In December 2019, ChubaoFS joined the CNCF as a sandbox project. Then in April 2020, we released version 2.0, which supports the S3-compatible interface. There are also several external users listed on GitHub; the most recent is Meizu, a consumer electronics company in China, and I'm going to cover the usage scenarios of the different companies in later slides. In August 2020, OPPO joined as a key contributing company. As we know, storage projects like this require constant investment for someone to be able to make key contributions or key improvements to the project, and OPPO has established a dedicated team, so JD.com and OPPO are collaboratively leading the project right now. We have a plan for the next release to improve stability, and we are trying to support more big data applications in the future.

And here's the high-level architecture of ChubaoFS. As you can see from the diagram, several components form the whole system: the resource manager, which manages the resources of the cluster and its volumes; the data subsystem, where file contents are actually stored; and the metadata subsystem, where file metadata is stored. Compared with a local file system, the resource manager holds the cluster- and volume-level metadata, while the metadata subsystem holds the per-file metadata. To take requests from applications and users and resolve them into metadata and data requests, we have the FUSE client and the ObjectNode, providing the file system interface and the S3-compatible interface, respectively.
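For readers following along, here is a minimal sketch of what accessing a volume through the ObjectNode's S3-compatible interface could look like from a standard S3 client. This is illustrative only: the endpoint, credentials, port, and the volume-as-bucket naming below are placeholders, not documented ChubaoFS defaults.

```go
// Illustrative sketch: reading and writing through an S3-compatible
// endpoint. All endpoint/credential/bucket values are hypothetical.
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// Point a standard S3 client at the object gateway (ObjectNode).
	sess := session.Must(session.NewSession(&aws.Config{
		Endpoint:         aws.String("http://objectnode.example.local:8080"), // hypothetical
		Region:           aws.String("us-east-1"),                            // placeholder
		Credentials:      credentials.NewStaticCredentials("ak", "sk", ""),
		S3ForcePathStyle: aws.Bool(true),
	}))
	svc := s3.New(sess)

	// Assumption for illustration: a volume is exposed as a bucket,
	// and object keys map to file paths.
	_, err := svc.PutObject(&s3.PutObjectInput{
		Bucket: aws.String("my-volume"), // hypothetical volume/bucket name
		Key:    aws.String("dir/hello.txt"),
		Body:   bytes.NewReader([]byte("hello from the S3 interface")),
	})
	if err != nil {
		panic(err)
	}

	out, err := svc.GetObject(&s3.GetObjectInput{
		Bucket: aws.String("my-volume"),
		Key:    aws.String("dir/hello.txt"),
	})
	if err != nil {
		panic(err)
	}
	defer out.Body.Close()
	data, _ := io.ReadAll(out.Body)
	fmt.Println(string(data)) // the same data is also reachable via the FUSE mount
}
```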
And here's a more detailed architecture of ChubaoFS, or what a ChubaoFS cluster looks like.

Hey, just a quick question. So the primary way of accessing a volume is through the FUSE client, therefore? Yes, the FUSE client provides the file system interface.

So here's what a ChubaoFS cluster looks like. I'm not going to dive into the technical details of this diagram, but there are some key points worth mentioning. First of all, the metadata is highly scalable. As I mentioned before, it is very common for the data subsystem to be scalable, but it requires careful design for the metadata to be scalable as well; this is documented in our industrial paper at SIGMOD. Second of all, as you can see, there's only one potential bottleneck in the diagram, which is the resource manager, but we can deploy a manager proxy in front of the resource manager, so we eliminate bottlenecks as much as possible. And thirdly...

Just a quick question on that: is the manager proxy kind of like a load balancer for the resource manager? Actually, we use nginx as the manager proxy. It just deals with the read requests, because that's where the bottleneck is; those are the high-frequency requests to the resource manager. So it's a caching service? Yes, it caches the results of the read requests. Thank you.

And thirdly, as you can see, the client's control plane and data plane are separated, which means that the normal read or write path does not go through the resource manager or the manager proxy; it only goes through the meta nodes and data nodes. And last but not least, this architecture can withstand high traffic peaks, which is very useful during commercial promotion festivals.

Okay. So just to clarify, and maybe to also make it clear to others: effectively the data part of any file or object goes to the data nodes, where it is partitioned and sharded, and the attributes of a file and things like the directory structure go to the meta node, correct? Yeah, correct. Does the meta node also contain information on the layout of the data partitions? Actually, no, not the data partitions. As you can see, a meta partition is indexed by an inode number range, and the metadata records where the actual data is stored. I mean, the client gets the partition view, and in order to access a single file, it first gets the metadata from a meta node, and what it gets back is a data partition ID. Through this data partition ID, the request can be routed to the actual data node. Yeah, so it's actually the client that has the data partition view.

Yeah, in fact, I have a question regarding that. You say that you find the data based on the inode, which is one per file. Does that mean you have to have a data node big enough to hold a whole file? So can a file not be spread across different data nodes, or can it be sharded? I'm sorry, one file? My question is: does one file have to be located on one single data node, or can it be sharded across different data nodes? It can be sharded across different data nodes, because the minimum storage unit on a data node is the extent. A large file, for example, can consist of several extents, and those extents can be sharded and distributed to different partitions; and for small files, one extent can contain several files. That's why I said ChubaoFS is optimized for small files: small files are aggregated into a single extent. So, for small files, several files can be aggregated into a single extent; for large files, one large file can consist of several extents, and those extents can be distributed across data partitions.
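A minimal sketch of the lookup flow just described, with invented type and field names (ChubaoFS's actual source differs): the client's cached cluster view maps an inode to a meta partition by inode range, and maps each extent's data partition ID to the data nodes holding it.

```go
// Sketch only, not ChubaoFS source. Shows inode -> meta partition ->
// extent keys -> data partition -> data node routing.
package main

import "fmt"

// MetaPartition indexes a contiguous inode number range.
type MetaPartition struct {
	StartInode, EndInode uint64
	LeaderAddr           string // meta node serving this partition
}

// Extent is the minimum storage unit on a data node. A large file spans
// several extents; several small files can share one extent.
type Extent struct {
	DataPartitionID uint64
	ExtentID        uint64
	Size            uint32
}

// DataPartition maps a partition ID to the data node replicas storing it.
type DataPartition struct {
	ID    uint64
	Hosts []string
}

// ClusterView is the partition view the client caches.
type ClusterView struct {
	MetaPartitions []MetaPartition
	DataPartitions map[uint64]DataPartition
}

// findMetaPartition picks the meta partition whose inode range covers ino.
func (v *ClusterView) findMetaPartition(ino uint64) *MetaPartition {
	for i := range v.MetaPartitions {
		mp := &v.MetaPartitions[i]
		if ino >= mp.StartInode && ino <= mp.EndInode {
			return mp
		}
	}
	return nil
}

// resolve shows the two-step lookup: the meta node returns the extent
// list for an inode; the cached view then routes each extent to hosts.
func (v *ClusterView) resolve(ino uint64, extents []Extent) {
	mp := v.findMetaPartition(ino)
	if mp == nil {
		fmt.Printf("inode %d: no covering meta partition\n", ino)
		return
	}
	fmt.Printf("inode %d served by meta node %s\n", ino, mp.LeaderAddr)
	for _, e := range extents {
		dp := v.DataPartitions[e.DataPartitionID]
		fmt.Printf("  extent %d -> data partition %d on %v\n",
			e.ExtentID, e.DataPartitionID, dp.Hosts)
	}
}

func main() {
	view := &ClusterView{
		MetaPartitions: []MetaPartition{{StartInode: 1, EndInode: 1 << 20, LeaderAddr: "meta-1:17210"}},
		DataPartitions: map[uint64]DataPartition{
			7: {ID: 7, Hosts: []string{"data-1:17310", "data-2:17310", "data-3:17310"}},
			9: {ID: 9, Hosts: []string{"data-4:17310", "data-5:17310", "data-6:17310"}},
		},
	}
	// A large file sharded into two extents on different data partitions.
	view.resolve(42, []Extent{
		{DataPartitionID: 7, ExtentID: 1001, Size: 64 << 20},
		{DataPartitionID: 9, ExtentID: 1002, Size: 64 << 20},
	})
}
```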
Okay, so what's the algorithm or strategy used to distribute the files, or to decide where to store them? Because, as you know, other systems have things like consistent hashing; what do we have here? Actually, a data partition can be marked as writable or read-only. So the client will pick one writable data partition, write the actual data to that specific data partition, and update the meta node, i.e., update the metadata. Okay, so the information is stored in the meta node, associated with the inode, so you can find out where the file is. Right. Okay, thank you.
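A sketch of that write flow, again with invented names rather than ChubaoFS's actual API: the client filters for writable data partitions, writes an extent to one, then records the extent's location with the meta node.

```go
// Sketch only: pick a writable partition, write, update metadata.
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

type PartitionStatus int

const (
	ReadOnly PartitionStatus = iota
	Writable
)

type DataPartition struct {
	ID     uint64
	Status PartitionStatus
	Hosts  []string
}

// ExtentKey is what gets recorded in the inode's metadata.
type ExtentKey struct {
	DataPartitionID uint64
	ExtentID        uint64
	Size            uint32
}

// pickWritable chooses one partition marked writable at random.
func pickWritable(parts []DataPartition) (*DataPartition, error) {
	var candidates []*DataPartition
	for i := range parts {
		if parts[i].Status == Writable {
			candidates = append(candidates, &parts[i])
		}
	}
	if len(candidates) == 0 {
		return nil, errors.New("no writable data partition")
	}
	return candidates[rand.Intn(len(candidates))], nil
}

func write(parts []DataPartition, ino uint64, data []byte) error {
	dp, err := pickWritable(parts)
	if err != nil {
		return err
	}
	// 1. Write the data as an extent to the chosen partition's data nodes.
	ek := ExtentKey{DataPartitionID: dp.ID, ExtentID: 2001, Size: uint32(len(data))}
	fmt.Printf("wrote %d bytes to partition %d on %v\n", ek.Size, dp.ID, dp.Hosts)
	// 2. Append the extent key to the inode's metadata on the meta node,
	//    so later readers can route requests by data partition ID.
	fmt.Printf("meta node: inode %d += extent %+v\n", ino, ek)
	return nil
}

func main() {
	parts := []DataPartition{
		{ID: 7, Status: ReadOnly, Hosts: []string{"data-1:17310"}},
		{ID: 9, Status: Writable, Hosts: []string{"data-4:17310"}},
	}
	if err := write(parts, 42, []byte("payload")); err != nil {
		panic(err)
	}
}
```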
Okay, so we've talked about the technical details; here's the community growth. ChubaoFS was open sourced in March 2019, and it was enrolled as a sandbox project in December 2019. These statistics I extracted from CNCF devstats, comparing before versus since joining the sandbox: before joining, the duration is nearly nine months; since joining, it's about fourteen or fifteen months. These statistics are from a development perspective: as you can see, commits increased 300%, code committers doubled, and we have had a boost in pull requests. And I think the most important thing here is that we now have two companies constantly investing in this project, JD.com and OPPO: two development teams making key improvements. As I mentioned, storage projects like this have a high barrier for contributing key features, so we are fortunate to have two teams, and as you can see from the commits, more and more key commits are coming from the OPPO team as well as JD.com.

Okay, I'm going to cover user adoption at different companies. First of all, of course, JD.com, the top e-commerce company in China: it still has the largest ChubaoFS cluster in production, where ChubaoFS serves as the default storage for containerized applications. It has been deployed alongside the Kubernetes cluster; there is no local storage provided for containerized applications, only ChubaoFS, and it has been supporting more than 100 internal business customers at JD.com. There are several usage scenarios at OPPO right now; some of them are already in production and some are under development. For example, ChubaoFS serves as the backend storage in the AI platform; this one is in production. And we are trying to expand the usage scenarios for ChubaoFS, hoping to support more big data applications: for example, we are trying to use ChubaoFS as the backend storage in a data lake architecture (this is under development), and as a remote Spark shuffle plug-in storage (also under development). When these usages are in production, we plan to open source them on GitHub. And also Meizu, a consumer electronics company in China: they don't have a development team, but ChubaoFS is used in production at Meizu. Several business customers there use ChubaoFS as backend storage, such as the ad algorithm platform, databases, push storage, risk control, and cloud backup.

Can I ask just a small question on some of these use cases? Are they geared towards read-intensive workloads, where the client can do lots of caching, or are some of the workloads also geared towards lots of write activity? The reason I'm asking is that there are some obvious gotchas when you're doing intensive writes with FUSE file systems, for example, and I wondered if there was anything specific in the project to deal with that. Well, actually, we had to find a balance point between POSIX file system semantics and performance, because, as we all know, strict POSIX semantics are not very suitable for distributed storage. So we have to make some compromises on the POSIX semantics to balance the performance, but the principle is that we have to fulfill the application's requirements when making such compromises. For instance, when to cache the data and when not to cache the data: we had to strike a balance, and the principle is that we have to fulfill the application's requirements. So as long as we can support the customer's applications, we can relax the POSIX semantics; there's a balance point between semantics and performance.

Rob asked a question in the chat: could you maybe cover what motivated OPPO and JD.com to undertake the development of ChubaoFS versus using other technologies? Is it due to some particular applications, or perhaps scale? I'm sorry, development? Yeah, if you don't mind me attempting to recast the question: I guess, how should we think about ChubaoFS, and apologies if I screwed the pronunciation up, versus other FS options that may predate it? If there are some fundamental limitations in those that couldn't be adapted for the use cases, why a new option versus adapting other things? It's not meant as criticism; I'm just trying to form a mental model of what differentiates this versus other things that could potentially be adapted to provide a similar capability. So actually the main advantage of ChubaoFS compared to other distributed storage is that it supports small files very well, both capacity- and performance-wise. It is very hard for distributed storage to support small files, so this is a big advantage. For large files, I think everyone has similar performance, but for small files, ChubaoFS is highly scalable and specifically optimized. But of course there are some scenarios that ChubaoFS cannot support; I think running MySQL directly on ChubaoFS is one scenario that cannot be supported right now. But for MySQL history tables, which see a lot of read requests but not write requests, ChubaoFS can support that usage. So what I mean is that ChubaoFS can support most use cases, but there are certainly some database scenarios that it cannot; that's what we are going to cover in the next development phase. Does that answer your question? Yeah, I think it gives me some direction and a better understanding. I'll have to pore through your docs a little more, but I appreciate it.

I think Ardeline had a follow-on question, maybe in the same vein. Yeah, so basically: how much of the performance improvement is due to relaxing the POSIX requirements and client-side caching for small files? I think we have performance stats in the paper, comparing to CephFS. I didn't bring the graphs here, but they are in the paper, and you can find it on our GitHub. Sounds good, yeah, I'll check out the SIGMOD paper later. Thank you.
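As an illustration of the semantics-versus-performance balance being discussed, below is a toy write-back buffer in Go. It is not ChubaoFS code; it only shows the general pattern of absorbing small writes in client memory and deferring durability to the flush points (fsync/close) the application actually asks for, which is one common way FUSE clients relax strict POSIX write-through behavior.

```go
// Toy example of client-side write-back buffering; every name here is
// invented for illustration.
package main

import "fmt"

// backend stands in for the path to the meta/data nodes.
type backend interface {
	WriteAt(off int64, p []byte)
}

type printBackend struct{}

func (printBackend) WriteAt(off int64, p []byte) {
	fmt.Printf("server write: %d bytes at offset %d\n", len(p), off)
}

// writeBackFile buffers dirty data until Flush is called.
type writeBackFile struct {
	be    backend
	buf   []byte
	base  int64
	dirty bool
}

// Write appends into the in-memory buffer; no network round trip yet.
func (f *writeBackFile) Write(p []byte) {
	f.buf = append(f.buf, p...)
	f.dirty = true
}

// Flush pushes buffered data to the cluster, restoring durability at
// the points the application explicitly requests it.
func (f *writeBackFile) Flush() {
	if !f.dirty {
		return
	}
	f.be.WriteAt(f.base, f.buf)
	f.base += int64(len(f.buf))
	f.buf = f.buf[:0]
	f.dirty = false
}

func main() {
	f := &writeBackFile{be: printBackend{}}
	for i := 0; i < 3; i++ {
		f.Write([]byte("small write ")) // aggregated client-side
	}
	f.Flush() // one server round trip instead of three
}
```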
Okay, shall I go on? Sorry, just one more question. You mentioned that databases cannot be supported, or that MySQL cannot be supported: what exactly is the reason? A large amount of writes, or something else? Actually, MySQL's write pattern is DirectIO, especially for the InnoDB storage engine; InnoDB uses DirectIO as its write pattern. For DirectIO, there's really nothing a file system can do, because the semantics require the IO to be sent to the server. So there's no room, I'll say, for the file system to do optimizations for DirectIO. What we want to do to support DirectIO is to reduce IO latency, which requires DPDK or RDMA to reduce the network latency, but that requires hardware support. So, I mean, MySQL's write pattern uses DirectIO, which leaves no room for the file system.

Yeah, that's basically a common performance issue, right? Because they're using DirectIO, they're asking the file to be persisted, to become crash-consistent. So, based on your current design, can you just simply send the content to the backend and persist it there? Because you still have a distributed file system; the performance just isn't as good, but in theory, if you bypass your cache layer, you should still be able to do it, or is it undesirable for some reason? Yeah, well, we are trying to cover this usage scenario. I mean, DirectIO semantics require the IO to be sent to the server, so if we can reduce the network latency between client and server, then we may be able to support this scenario. Okay, thank you.

You could move the app. I don't know if you guys support hyperconvergence, but if you could move the app to the cluster that runs the file system, then the app would be on the same node, and you could reduce the latency, right? I don't know if you support that. Actually, we have a strategy to select the data nodes: if one of the available data nodes is on the same compute node as the client, then it will choose that data node with high priority. Excellent. But if you're sharding individual files, co-locating data and compute may not mean much, because the data is sharded across different data nodes. Yeah, but if you use the same node and you don't shard, then you lose the data if you lose that node. No, no, no, I'm not saying don't replicate, I'm saying don't shard; then your read cache can really benefit from that. But anyway, that's a discussion for another day. Okay, you're all welcome to discuss technical details offline and to open issues on GitHub. You're all very welcome.
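A sketch, with invented names, of the placement preference just mentioned: when a replica of a data partition lives on the same host as the client, rank it first so reads and writes can stay node-local.

```go
// Sketch only: prefer co-located data nodes when ranking candidates.
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

type DataNode struct {
	Addr string // "host:port"
}

func host(addr string) string { return strings.Split(addr, ":")[0] }

// rankByLocality orders candidate data nodes so those co-located with
// the client come first; the order among the rest is unchanged.
func rankByLocality(nodes []DataNode, clientHost string) []DataNode {
	ranked := append([]DataNode(nil), nodes...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return host(ranked[i].Addr) == clientHost && host(ranked[j].Addr) != clientHost
	})
	return ranked
}

func main() {
	me, _ := os.Hostname()
	nodes := []DataNode{
		{Addr: "data-4:17310"},
		{Addr: me + ":17310"}, // a replica on this very machine
		{Addr: "data-5:17310"},
	}
	for i, n := range rankByLocality(nodes, me) {
		fmt.Printf("priority %d: %s\n", i, n.Addr)
	}
}
```

As the exchange above notes, this preference matters most when a file's extents are not sharded across many partitions; for heavily sharded large files, most requests will still leave the node.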
Okay, so here's the future plan. I'm going to cover these plans from a community perspective and a technical perspective. From the community perspective, the objectives are to attract more companies contributing to the project and to make it easy and stable to use. The difficulty, as I mentioned, is that a storage project like this has a high barrier to contributing key commits. So we are going to run a series of technical lectures doing source code analysis, so that newcomers can become familiar with ChubaoFS more quickly, and we can also provide internships for college students; this is what we are going to do this summer. We will also develop tools to simplify deployment and cluster operations. For the technical plan, we are planning to integrate further with the CNCF ecosystem. We already have the ChubaoFS CSI plug-in and the ChubaoFS Helm chart. Our monitoring subsystem relies heavily on Prometheus, and we are going to integrate with Rook; this is in progress. Actually, we have proposed a pull request to the Rook community, and we are waiting for feedback. And as you can see, we are trying to integrate with the big data ecosystem as well: for example, using ChubaoFS as backend storage for a data lake, and developing a remote Spark shuffle service. And the next big feature is cross-zone optimization to improve robustness. Okay, so that's it for today's presentation. Thank you.

Thank you very much. I would just like to ask a couple of questions around how somebody would run this in production today. Until the Rook changes are committed, how would somebody deploy and manage this across a number of nodes or servers? What's the current best practice? Actually, there is no hardware requirement to deploy ChubaoFS, but as a best practice I can give some suggestions: the resource manager can be deployed separately, and the meta nodes and data nodes can be deployed together, because a meta node mostly consumes the memory resources and a data node the storage resources of a single machine, so they can share nodes. Understood, but I guess what I'm trying to get at is: is there some process to automate that deployment? Is it, for example, grouped in a cluster or orchestrated in some way, or is this something that you install on a server-by-server basis? Actually, ChubaoFS can serve as on-premise storage for a Kubernetes platform, but if you are planning to deploy ChubaoFS orchestrated by Kubernetes, note, first of all, that the data nodes and meta nodes cannot be migrated between different machines. Does that answer your question? I mean, no; I think I understand the concept of the data node and the meta partition. What I'm trying to understand is: in the typical use cases, where JD or OPPO or some of the other companies are using it in production, is it typically deployed on bare-metal nodes, or VMs, or something else, and is it under some sort of config management, or is it actually deployed as containers orchestrated in some way? Most of the clusters are deployed on bare-metal nodes, and it is served as an on-premise storage platform for the applications running in the Kubernetes cluster. Right, so I guess that means there's quite a barrier to entry to deploy this in production, because the deployment and the operational aspects are going to require some work, I imagine. Yeah. But, as we have the ChubaoFS Helm chart, it can actually also be deployed and orchestrated by a Kubernetes cluster. I don't think that is used in real production yet, but we do have the Helm chart, which can deploy ChubaoFS in a Kubernetes cluster. All right. Okay, that helps.

Were there any other final questions for Shuoran? Thank you very much. Thank you, Shuoran, and the whole project team; this was a great presentation, and I think we all learned a lot about ChubaoFS. We also had another agenda item, with Raphaela, to discuss cloud native DR, but given that we only have a few minutes left in the hour, I propose that we move that to the next call. Hopefully that's okay with you, Raphaela. Yeah, it's okay, that's perfect. I also learned a lot from this presentation, so thank you, Shuoran.
Thanks, everyone. Well, in that case, we'll give everybody a few minutes back and close the call, unless there's anything else anybody wants to raise. I think we're good. All right. Thank you again, Shuoran and the project team, and we look forward to seeing you on the next call. Have a good rest of your day, or good evening, depending on where you are. Thanks. Have a nice day. Thank you. Bye-bye. Bye, guys.