So good afternoon, everyone. Welcome to the session. My name is Simon, and I'm a HwameiStor maintainer. My colleague, Du Mingming, is a HwameiStor developer and maintainer. We both work for DaoCloud on storage products and solutions. Today we will share the topic of accelerating Kubernetes data-intensive applications with cloud native local storage.

Here's our agenda. First, I will talk about the storage challenges of cloud native applications, then why local storage, and what HwameiStor brings to us. Then we will walk through HwameiStor use cases for workloads like middleware, AI machine learning, KubeVirt for virtualization, and edge computing. Actually, the initial driver when we developed this project, about three or four years ago, was just middleware: to provide a storage solution for Kubernetes StatefulSet applications. Then we gradually extended it to the other use cases.

So, the challenges. The following requirements are critical to ensure the stable operation and efficient performance of cloud native data-intensive applications. The first is high performance and low latency. This is typical for workloads like online transaction processing, in businesses like banking or securities trading, which require very rapid response times to ensure the best customer experience. Next is scalability and elasticity. This is typical for consumer-facing (2C) internet business, which needs to handle a large number of concurrent user requests. Especially during a promotion or sale, data traffic may increase dramatically, so the system, and the storage underneath it, must scale very quickly. There is also high availability, covering data consistency, disaster recovery, and backup in case of a site failure; multi-tenant support for resource and user management; and last but not least, automated storage management, which increases productivity and reduces management cost.

Local storage brings a list of benefits here. The first is superior performance with local disk IO: we all know that with a local disk we get very low latency and low network overhead. And because it is just a bunch of local disks, it comes at an extremely low cost compared to a commercial storage system. There is also high flexibility with on-demand deployment, and because it is a Kubernetes native storage system, it integrates very well with cloud native applications. However, there are some challenges with traditional local storage. For example, the lack of high availability for data safety and security, and the management complexity: if we operate a large-scale cluster with hundreds of nodes, we have to manage a very large number of disks, and that is very hard work.

So let's bring up HwameiStor, our storage project. HwameiStor is currently a CNCF sandbox project. It is a Kubernetes native storage solution that unifies the management of local disks into local resource pools, then uses CSI to provide data volumes to the upper-layer applications. Let's look at the architecture diagram here. The green part is HwameiStor and the blue part is CSI. We can see two major components in HwameiStor: the local disk manager (LDM) and local-storage.
The local disk manager, LDM, abstracts the local disks into Kubernetes resources, so they can be managed by Kubernetes. That's why we can offer features like disk auto-discovery, monitoring, et cetera. We can then use a LocalDiskClaim, which is also a Kubernetes resource, to claim disks into HwameiStor for management. Of course, you can also reserve a disk if you don't want it to be taken over by HwameiStor. The local-storage component creates the local disk pools for the upper layer, the CSI. We can create multiple pools on each node; for example, we can use the different disk types, HDD, SSD, or NVMe, to create different resource pools. And because we use the Linux kernel-mode LVM technology, it takes a very small footprint in system resources, which is a very good benefit for the middleware and edge computing use cases we'll talk about later. The upper layer is the Kubernetes StorageClass and PV/PVC. Another very important component that is not shown in the picture is the HwameiStor scheduler. Just like the Kubernetes scheduler, it can schedule the pod to drift with its data, to ensure data locality.

On the right-hand side, we can see that LDM can also skip local-storage and directly provide disk-type data volumes to an upper-layer SDS, software-defined storage, like MinIO or Ceph. They will then create their own data volumes for their applications. This is for the case where you require extremely high performance and disk resource isolation, but the trade-off is flexibility features like scalability. At the bottom is a screenshot of what HwameiStor can provide: the two types of storage class, LVM type and disk type. For the LVM type we have three variants: HA, non-HA, and convertible. Convertible means we can convert a volume from non-HA to HA, which provides flexibility for future demand.

OK, after that brief introduction to how HwameiStor works, this page will be easy to understand. The local disk manager, LDM, simplifies disk management, for example with disk auto-discovery, identification, and monitoring. We also provide a centralized management interface and a disk overview; for example, with `kubectl get ld` (LocalDisk) we can get a list of all the disks managed by HwameiStor. We support multiple disk types, HDD, SSD, and NVMe, for different performance requirements. The local-storage component provides the two types of data volumes, LVM and raw disk, that we mentioned before. Another very important feature HwameiStor provides, beyond traditional local storage, is high availability: cross-node replicas with data volume synchronization. Apart from that, HwameiStor provides a list of enterprise-level data management and service features, like volume snapshot, cloning, and online migration. These are very important features for the middleware use case and also the KubeVirt use case we will talk about later. As for scalability, we can do online expansion: data volume expansion, node expansion, and disk expansion, all online without any interruption to the production systems.
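To make this concrete, here is a minimal sketch of claiming a node's disks and defining a convertible LVM storage class. The resource kinds and parameter names follow the public HwameiStor documentation, but they may vary by release, and the node name is hypothetical; treat this as an illustration, not a definitive manifest.

```yaml
# Hand one node's HDDs over to HwameiStor's LVM pool (node name is hypothetical).
apiVersion: hwameistor.io/v1alpha1
kind: LocalDiskClaim
metadata:
  name: worker-1-hdd-claim
spec:
  nodeName: worker-1
  owner: local-storage          # claim disks for the LVM pool rather than raw-disk use
  description:
    diskType: HDD
---
# A convertible LVM storage class: volumes start non-HA (one replica) but can be
# converted to HA later because convertible is "true".
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hwameistor-storage-lvm-hdd
provisioner: lvm.hwameistor.io
parameters:
  poolClass: "HDD"
  replicaNumber: "1"
  convertible: "true"
volumeBindingMode: WaitForFirstConsumer   # let the scheduler pick the node first
allowVolumeExpansion: true
reclaimPolicy: Delete
```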
So next, I will move on to the use cases. Mingming will talk about middleware and AI machine learning, and I will focus on KubeVirt and edge computing.

Greetings, and thanks to Simon for his introduction. Next, I'm going to present two use case scenarios. The first involves the integration of HwameiStor with middleware applications. The primary types of middleware applications are commonly classified into four major categories: relational databases, non-relational databases, data processing, and data analysis. Database applications such as MySQL and MongoDB generally carry many critical business operations within enterprises, with requirements for storage that is reliable, stable, and offers low latency. For data processing applications like Kafka and data analysis applications like Elasticsearch, aside from low latency and high performance, the capability to ensure resource isolation in a production environment is important. These applications may consume a lot of storage resources during bursts of traffic, which can be harmful to other services running on the same node, potentially causing those services to become unavailable. Besides the above features, certain essential common storage functionalities are also indispensable. These include snapshots, cloning, backup and recovery, and data consistency measures, which are critical for ensuring data security. Additionally, for storage or system administrators, straightforward operational management tools are crucial for daily tasks, helping them handle scenarios such as system upgrades (including incremental component updates), scaling storage capacity, and migrating data. It's particularly beneficial when these features are integrated into a visual interface or a command-line tool.

Based on the requirements above, here is HwameiStor's solution for middleware. For applications that require extreme storage performance, like MinIO or Kafka, HwameiStor offers the capability to use a direct raw-disk data volume. The weakest point of traditional local storage is high availability of data; in this solution, HwameiStor offers block-level highly available data volumes that let applications come back up on another available node with zero data loss during a node failure. Moreover, HwameiStor provides additional management features such as automatic data migration, cloning, failover, and snapshots to better manage and utilize data. Finally, and most importantly, all these operational features are integrated into both command-line tools and a user-friendly visual interface for convenience. A sketch of what this looks like for a stateful middleware workload follows below.
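As an illustration of the middleware pattern just described, here is a minimal sketch of a StatefulSet requesting an HA HwameiStor volume per replica. The storage class name and `replicaNumber: "2"` follow the conventions shown earlier, and `schedulerName: hwameistor-scheduler` reflects how the HwameiStor scheduler is typically wired into a pod spec, but exact names may differ by release; this is an assumption-laden sketch, not an official manifest.

```yaml
# Hypothetical HA storage class: each volume keeps two synchronized replicas
# on different nodes, so a pod can fail over together with its data.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hwameistor-storage-lvm-hdd-ha
provisioner: lvm.hwameistor.io
parameters:
  poolClass: "HDD"
  replicaNumber: "2"            # two cross-node replicas = HA volume
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 1
  selector:
    matchLabels: {app: mysql}
  template:
    metadata:
      labels: {app: mysql}
    spec:
      schedulerName: hwameistor-scheduler   # keep the pod with (a replica of) its data
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "example"                  # demo only; use a Secret in practice
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: hwameistor-storage-lvm-hdd-ha
      resources:
        requests:
          storage: 50Gi
```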
OK, that was the HwameiStor solution for middleware. Let's move on. The second use case is the integration of HwameiStor with AI machine learning. Here is the storage challenge for AI machine learning: training performance is limited by the speed at which data is read from storage. The key to high performance lies in the ability to repeatedly and quickly read data from local storage; the closer the data cache is to the GPU, the faster the retrieval speed. Furthermore, a balance must be struck between performance, capacity, and cost. During model training, the demand for storage mainly occurs in two stages: the first is repeatedly loading datasets during training, and the second is frequently writing and loading checkpoints. So the relationship between AI machine learning and storage is that faster storage makes for faster training. In this scenario, the demand for storage can be summarized as follows: first, high re-read performance; second, high write performance; third, large capacity. In the bottom-right column is the description of storage for the NVIDIA DGX SuperPOD, and it can be seen that local storage has a natural advantage in AI scenarios.

OK, this is the overall architecture diagram of HwameiStor in AI machine learning. HwameiStor mainly does the following three things: first, simplified management of datasets; second, in-cluster storage with unified dataset loading; third, accelerated dataset loading and storage on the nodes. Next, we will discuss these three aspects separately.

During model training, the first issue that arises is determining where to load the training datasets from. Generally, the sources of datasets are quite diverse, including both public and private datasets; some require authentication, while others are openly accessible. If scientists, or the training programs themselves, can focus their attention on the training process without worrying about the source of the datasets, training efficiency improves a lot. So HwameiStor provides functionality for users to customize the configuration of the data source. Once the user completes the configuration, the datasets will be loaded into the training cluster from the external source according to the user's setup.

After addressing the issue of dataset sourcing, the next challenge is speeding up loading datasets into the GPU. Typically, the same dataset is loaded repeatedly, which can happen across the same or different training tasks, potentially leading to competition for disk IO resources and resulting in reduced loading speeds. HwameiStor offers a data volume based on shared memory for each dataset, which allows training tasks to read data directly from memory, thereby minimizing disk access and speeding up repeated access to the same dataset.

Now that we have commenced the training tasks, we can joyfully train our models. However, in parallel training, some tasks may fail, leading to interruptions in the training process. To prevent the waste of previously gained training progress, it's essential to frequently save checkpoints. So HwameiStor offers a memory-based data volume for storing checkpoints, and to ensure these checkpoints can be accessed on other nodes within the cluster, HwameiStor syncs them to the in-cluster shared storage system. A rough sketch of this pattern is shown below.
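The talk does not show the exact manifests for these memory-backed volumes, so the sketch below is purely illustrative: the storage class name `hwameistor-storage-ram`, the training image, and the script path are all hypothetical stand-ins, not documented HwameiStor names.

```yaml
# Hypothetical PVC for a memory-backed checkpoint volume; the storage class
# name is an assumption, not a documented HwameiStor class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ckpt-cache
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: hwameistor-storage-ram
  resources:
    requests:
      storage: 16Gi
---
# The training pod writes checkpoints to the memory-backed volume; per the talk,
# HwameiStor then syncs them to in-cluster shared storage for cross-node access.
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  schedulerName: hwameistor-scheduler
  containers:
  - name: train
    image: pytorch/pytorch:latest
    command: ["python", "train.py", "--checkpoint-dir=/ckpt"]   # script is a placeholder
    volumeMounts:
    - name: ckpt
      mountPath: /ckpt
  volumes:
  - name: ckpt
    persistentVolumeClaim:
      claimName: ckpt-cache
```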
OK, that is the overall HwameiStor solution for AI machine learning. Up next, Simon will take over and continue bringing us insights from other scenarios. Thank you.

So, KubeVirt for virtualization. I'll assume everyone here knows what KubeVirt is: it's a very cool piece of open source software for running virtual machines in a container manner on Kubernetes. It connects and manages containers and virtual machines on the same platform, and it supports access between virtual machine applications and container applications. In a virtualization environment, persistent storage is also very important. The underlying storage needs to provide high-performance block storage capabilities, plus some important capabilities like virtual machine snapshot, snapshot restore, cloning, and online migration, et cetera. HwameiStor data volumes can be provisioned to the KubeVirt system: the LVM type, which we talked about before, can be used for KubeVirt system volumes and data volumes, and the disk-type storage classes can also be used for KubeVirt data volumes. A volume snapshot class, in turn, supports the KubeVirt virtual machine snapshot, restore, cloning, and migration features.

Here is the practical lab of how we mount the data volumes for KubeVirt. The first one is an LVM-type data volume mounted as the KubeVirt system volume. Because it's a system volume, it needs an operating system, so we use CDI to import the system image into the KubeVirt system volume. The middle one is the LVM type used for a KubeVirt data volume, and the one on the right is the disk-type storage class used for a KubeVirt data volume.

Next, snapshot and restore. Before we can enable the virtual machine snapshot feature, we have to make sure the HwameiStor storage class supports CSI volume snapshots. We need to check the storage class YAML file to ensure the snapshot and snapshot content parameters are configured, because the snapshot controller keeps monitoring those two resources. Snapshot and restore support must also be enabled in the KubeVirt feature gates. And for the virtual machine restore, we have to stop the virtual machine first, then restore it from the snapshot.

Cloning is more straightforward; it's just a complete copy of the virtual machine. The cloning API also relies on the snapshot and restore APIs, so, as we just discussed, we have to make sure the storage class supports volume snapshots, and that snapshot and restore support is enabled in the feature gates.

Here is our lab test of the snapshot. For example, we create a first file, file-1, then we create a snapshot resource to capture the state containing file-1. Then we change the state by creating file-2. Then we create a restore resource to restore the virtual machine instance from the snapshot we created before, and file-1 is back while file-2 is gone. The clone was successful as well. An illustrative set of manifests for this flow follows below.
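For reference, here is a minimal sketch of the snapshot-and-restore flow using the standard KubeVirt snapshot API (`snapshot.kubevirt.io/v1alpha1`); the VM and object names are placeholders, the driver name follows the HwameiStor docs, and your KubeVirt version may serve a different API version. Enabling the `Snapshot` feature gate in the KubeVirt CR is also required, as mentioned above.

```yaml
# A CSI snapshot class backed by HwameiStor's LVM driver (verify the driver
# name against your installation).
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: hwameistor-storage-lvm-snapshot
driver: lvm.hwameistor.io
deletionPolicy: Delete
---
# Capture the VM state (taken after creating file-1).
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: snap-vm1
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: vm1
---
# Stop the VM first, then restore it from the snapshot: file-2 disappears,
# file-1 remains.
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineRestore
metadata:
  name: restore-vm1
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: vm1
  virtualMachineSnapshotName: snap-vm1
```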
Next is edge computing. With the popularity and expansion of IoT devices, the amount of data generated by edge devices has increased dramatically. This requires edge computing to have sufficient storage capability to accommodate and process that data. There are some special storage challenges in edge computing. For example, real-time requirements: edge computing normally needs real-time or near-real-time data processing. Also data synchronization and consistency, to avoid any data conflicts. The third one is very important: as I mentioned before, at the edge side the system resources are very limited, so it's not affordable for the storage system to take too much of them. There is also fault and data recovery to minimize operations and maintenance, and the scalability and elasticity we mentioned before.

HwameiStor data volumes can be deployed on both the cloud side and the edge side. On the cloud side, HwameiStor can provide data volumes for middleware or KubeVirt, as we discussed before, and also for data analysis and large model training. On the edge side, it again serves middleware and KubeVirt, plus lightweight data processing.

So what benefits does HwameiStor bring to edge computing? Local disk IO ensures real-time availability and fast data processing. Because, as I mentioned before, we use the Linux kernel-mode LVM technology, it takes a very small footprint, saving the limited system resources. Automated management ensures data reliability and business continuity. And the HA data volumes achieve efficient and reliable data synchronization. This slide shows the resources HwameiStor occupies: we installed the local disk manager component and local-storage, and the system resources they take are very, very small.

OK, so that's all the content we wanted to cover today. This is the GitHub repo, and you are more than welcome to join us and make HwameiStor one of the best solutions for the future, with very low cost, high performance, and many enterprise-level features. Please also scan the QR code to give your feedback. We also have time for Q&A; any questions, we'll try our best to answer. By the way, our booth is M11, so please come talk with us if you have any questions.

Hello. Hi. I have two quick questions. First question: when there are compute-heavy workloads and storage-heavy workloads together on a cluster, this local storage may not work, right? Because the pod has to co-locate with the storage, or the storage is not accessible remotely from a different node. Is that correct?

Sorry, I'm not quite clear on the question.

So, a cluster with different kinds of resource requirements: some applications are compute-heavy and some are storage-heavy. The pod has to co-locate with the storage, with this storage class. Is that correct? Or can the storage be accessed from a different node in the cluster, by a pod running on a different node?

To be clear, is your question that different applications require different storage resources?

And compute, yes. Compute.

Yes, as I just mentioned, we can support different disk types, like HDD, SSD, or NVMe, for different applications' performance requirements.

Okay, but the pod has to be on the node where the storage is coming from, right? The pod has to run on a node where the storage is allocated?

HwameiStor has its own scheduler to make sure the pod drifts with the data. And we have HA, so with synchronization the data can be replicated to another node. If a node fails, we make sure the pod drifts to where the data is located.

Oh, I see, got it. And this data which is replicated, is it synchronously or asynchronously replicated?

We can support both sync and async, but currently it's synchronous; async support is still in development.

Oh, okay. So it's not LVM snapshot based?

No, not snapshot based; it's a full copy, not a snapshot.

Okay, thank you. Okay. Thank you for the question.
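The volumes and cross-node replicas mentioned in that answer are themselves Kubernetes resources and can be listed directly. The short names below are assumed from recent HwameiStor releases and may differ in yours; fall back to the full CRD names if so.

```bash
# Inspect HwameiStor volumes and their per-node replicas (short names assumed;
# use full CRD names such as localvolumes.hwameistor.io if these aliases differ).
kubectl get lv    # LocalVolume: one object per provisioned volume
kubectl get lvr   # LocalVolumeReplica: one object per node holding a copy
kubectl get lsn   # LocalStorageNode: per-node pool capacity and disk state
```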
Is this working? Yeah. Maybe it's just me, but I'm a bit curious: in your earlier architecture diagram, you were able to expose local storage as your own storage class with CSI directly into the Kubernetes cluster for pods, but you also had instances where you exposed your disks to Ceph. Can you elaborate a bit on when you would want, or need, to use Ceph instead of just using your local storage storage class?

So, how does the storage class work with Ceph or the other SDS? I suppose the question is: you're able to use your storage class directly from a pod, right? You don't need Ceph to...

Yeah, no, no Ceph: the HwameiStor data volume can be provisioned directly to the applications. And we can also provide the disk type to Ceph or MinIO, so they can create their own data volumes for their applications. So there are two ways, as in the architecture we explained before.

Okay, thank you.

Okay, and if that didn't fully answer it, we can talk offline; our booth is M11. We're happy to talk with you there. Thank you.

Hello, hey, can you explain in more detail how you replicate data from one node to another? For example, if we write a file on a node, how do you copy the data of that file to another node in a synchronous way? Where does the replication happen, and with which tools: DRBD or something like that? Is any data or file coming out of the IO path synchronized to the other node?

Yeah, it's in kernel mode. DRBD is a very good tool for that kind of data synchronization.

So which tool?

DRBD; it's also open source.

It's a third-party tool?

Yes, third party, yeah.

Okay, thank you.

Hello, can you explain how it handles encryption? Is there an encryption layer or a proxy? How do you perform disk encryption?

Okay, we're short on time, so let's talk about that offline after the session.

Okay, so that's all. If you have any more questions, come to our booth, M11, again. We have prepared a few fun games, and you'll find more surprises there. Okay, thank you.