So now it's session number 45, and we're going to start right away. Our topic today is embracing data locality in the cloud-native environment with Apache Hadoop and Ozone. First, let me introduce ourselves. My name is Xiaoyu Yao, from Cloudera. I mostly work on the Apache Hadoop HDFS and Ozone projects, and I'm an Apache Hadoop committer and PMC member. Sammi also works on the Apache Hadoop HDFS and Ozone projects, and is likewise an Apache Hadoop committer and PMC member.

Here is the outline of our talk today. First, we're going to review the evolution of big data in the cloud-native environment and the challenges we face there. Then we'll introduce data locality: what it is and why it matters, and why we need to pay attention to it, especially in the cloud-native environment. Next, we'll look at how the storage layer supports data locality, how clients fetch data with locality awareness, and the reliability and performance improvements that brings. After that, we'll introduce the Apache Ozone project, how to use it on Kubernetes, and the philosophy behind it. There will be a Q&A session as well; if you have any questions, you are welcome to ask them.

First, the challenges in the cloud-native environment. The goal of cloud-native is to take open-source software and turn it into launchable, composable microservices, package each service into containers, and deploy them under dynamic orchestration so that resource utilization can be optimized.

As for big data itself, the earliest systems were based on the Google GFS paper, whose architecture engineers in the open-source community then reimplemented. The basic idea is that storage and compute nodes are co-located. The benefit is that when a computation needs both compute resources and storage, you minimize the data that has to travel over the network, which helps you parallelize the computation and raises the overall throughput of data processing. Over time the workloads multiplied: on top of the distributed file systems came cluster schedulers and resource managers such as YARN and Apache Mesos, and on top of those, different kinds of workload management. Analytics workloads such as Hive jobs were built upon YARN, HBase, and HDFS, while other workloads used Spark and MapReduce. So there was a very tight coupling between compute and storage, which guaranteed efficiency.

However, this architecture brings challenges as well. First, how do we schedule these resources? Some nodes may have relatively weak compute or storage capacity. With heterogeneous hardware, one node's resources don't match another's, so when a task must access its data locally, a node with strong compute may go unused simply because the data isn't reachable from it, which reduces scheduling efficiency. Sometimes a lot of compute resources sit idle while other nodes are overloaded.
So cluster utilization may not reach its optimal state, and there are scalability problems as well. Imagine the whole cluster needs to scale horizontally: you have to scale the compute nodes and the storage at the same time, which imposes unified hardware and software requirements on the nodes. But the two sides scale differently: storage demands tend to be stable, while compute is much more dynamic; for example, you may want to spin compute nodes up and down on demand. The result is that if you cannot deploy homogeneous nodes, you run into trouble: your hardware may not be fully used, and sometimes compute resources are available but the storage can't keep up, so the compute ends up waiting.

So during the big data evolution we saw many systems move from the co-located architecture to a disaggregated one. In these deployments there are dedicated HDFS clusters that manage only storage; they don't depend on the compute clusters, and you can adjust the compute clusters dynamically, scaling them up and down, so the overall operation reaches an optimal configuration. With this architecture you achieve high utilization of the compute resources, and the storage performs well too. However, overall performance suffers, because storage access no longer has any notion of locality: every access must cross the network connecting compute and storage. Although network speeds have increased, in a big data environment with highly parallel data access, if we need to guarantee access latency across endpoints on the order of 50 milliseconds, and tune stream timeouts accordingly, this architecture cannot meet those targets.

That brings us to the third phase of the big data evolution. Having separated compute and storage, we also began utilizing hybrid and multi-cloud: different cloud providers offer elastic compute nodes that you can spin up, spin down, and scale just as big data demands, so compute capacity can flex, with the data kept in object stores on the cloud. When the data is needed on-premise, it can be brought to the private cloud or local cluster and served from distributed storage there, so workloads with strict latency requirements get more efficient storage service. But this architecture still has problems. Many object stores are built on eventual consistency, while much big data software assumes strong consistency: once you write data, a subsequent read must return exactly what was just written. Object storage offers no such guarantee, so we see many big data projects doing local metadata handling on-premise or in the private cloud to mediate these accesses, stabilizing things so they get the same consistency semantics even when the data lives on the cloud.
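To make the local-metadata idea concrete: one well-known implementation of this pattern (my example, not one named in the talk) is S3Guard in Hadoop's S3A connector, which keeps the bucket's metadata in a DynamoDB table so that listings and read-after-write behave consistently. A minimal configuration sketch in Java, assuming the hadoop-aws module is on the classpath; the bucket and table names are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3GuardExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Route S3A metadata lookups through a DynamoDB table so that
    // listings and read-after-write see a consistent view of the bucket.
    conf.set("fs.s3a.metadatastore.impl",
        "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore");
    conf.set("fs.s3a.s3guard.ddb.table", "my-s3guard-table"); // hypothetical table name
    conf.setBoolean("fs.s3a.s3guard.ddb.table.create", true);

    FileSystem fs = FileSystem.get(new Path("s3a://my-bucket/").toUri(), conf);
    System.out.println(fs.exists(new Path("s3a://my-bucket/data/part-0000")));
  }
}
```

The design point is the one made above: the authoritative object data stays in the eventually consistent store, while a small, strongly consistent metadata store sits in front of it to restore the semantics big data software expects.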
Next is the reliability problem. Once you put your data on the cloud, the provider will keep copies of it, but how can you guarantee that applications in different places, say North America, Asia, or Europe, access the closest copy rather than a distant one? And as for reliability itself, if you have only a local copy and there is a problem in the local data center, the data has no reliable replica.

We just mentioned locality. Locality is a very important concept in big data, and it comes in different levels. The basic idea is that you want to put the data where the computation needs it. The levels are as follows. If the process can get the data directly from storage within itself, that is process-local. If not, the data may be in a file on the same node; that is node-local. Another case is that the data is not on the local node but on the same rack; rack-local access may be a bit slower than local access, but it is still quicker than reaching across the data center. The fourth level is when you must cross racks, where the latency between nodes is larger.

Guaranteeing locality in big data gives us many benefits: end-to-end latency is shorter, overall throughput is higher, and unnecessary network traffic is reduced, so the cluster can complete more tasks and jobs and utilization goes up. We also get a reliability benefit, because the system tells you, for each location, which copy you should use, not just the three copies sitting in one data center, and that is a good way to ride out accidents. When tasks fetch data, they are aware of its locality, and the scheduler is told the locality of each block of data, for example each split of a file, so it can schedule based on the locality level: the process-local, node-local, and rack-local levels I just mentioned. These are the locality levels of the data.
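As a concrete illustration of how a scheduler learns these locality levels (my sketch, not code from the talk), the Hadoop FileSystem API exposes each block's replica hosts and their rack-aware topology paths:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Print where each block of a file lives, as a scheduler would see it.
public class ShowBlockLocality {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path(args[0]); // e.g. /data/events.log
    FileStatus status = fs.getFileStatus(file);
    // One BlockLocation per block: the hosts holding replicas, plus
    // their topology paths (e.g. /rack1/host1) for rack-level decisions.
    for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
      System.out.printf("offset=%-12d hosts=%s topology=%s%n",
          loc.getOffset(),
          String.join(",", loc.getHosts()),
          String.join(",", loc.getTopologyPaths()));
    }
  }
}
```

A scheduler can match each split's hosts against where its own workers run and pick the highest available locality level for the task.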
Next, let's take a look at how the storage layer supports data locality. Speaking of the storage layer, the most established system is HDFS. Since its inception more than a decade ago, it has matured into a very stable storage system. It is mainly divided into a metadata layer, which lives on the NameNode, and the data layer on the DataNodes. All the metadata on the NameNode is loaded into memory, so access is very fast, reaching around 200K ops per second, and it supports large deployments: capacity normally reaches hundreds of petabytes, and a single namespace can support thousands of DataNodes, though it cannot scale horizontally beyond that. It is also very resilient to different types of disk and node failures.

This next slide is about how HDFS delivers locality. As I mentioned, the master node, the NameNode, maintains the metadata, while the DataNodes maintain the corresponding block replicas; each file corresponds to a list of blocks. In the first application scenario, an HDFS client is co-located with the cluster; this client might be a Spark job or a MapReduce job. It needs to access a file, so it asks the NameNode where the corresponding blocks are. The NameNode knows where the client is coming from and which DataNodes hold the blocks, so based on the topology information it sorts the candidate DataNodes by distance and returns that ordering to the client. So if you access from data node 1, the preferred replica is the one on data node 1; if you access the same file from data node 2, it will return data node 6 instead of data node 1, because data node 1 is on a different rack.

There are other optimizations as well. Normally, even when a client reads a block that is stored on its own node, the read still goes through the DataNode's network stack, which adds overhead. The short-circuit read optimization means that when the client is co-located with the block, the DataNode hands the local block file over directly, and the client reads it without going through the network stack.
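Here is a minimal sketch of enabling that short-circuit optimization from a client, using the standard HDFS configuration keys; the socket path is deployment-specific, and the same two settings must also be present in the DataNode's hdfs-site.xml:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShortCircuitReadDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Enable short-circuit local reads; the DataNode passes the block's
    // file descriptor to the client over this Unix domain socket, so the
    // client reads the local block file without the TCP stack.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");

    try (FileSystem fs = FileSystem.get(conf);
         FSDataInputStream in = fs.open(new Path(args[0]))) {
      byte[] buf = new byte[4096];
      int n = in.read(buf); // served from the local block file when co-located
      System.out.println("read " + n + " bytes");
    }
  }
}
```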
Naturally, in the cloud-native environment, we consider taking this open-source software stack and deploying it in containers and pods, so that it becomes a cloud-native platform service. But there are challenges. For example, the HDFS NameNode has scaling constraints: all the metadata is kept in the NameNode's memory to make access fast, but if you have many small files from many small applications and programs, you end up with hundreds of millions of files, which needs special tuning, takes up a lot of memory just to function normally, and makes restarts and upgrades take a long time. The second challenge is that containerizing the HDFS DataNode is not easy: in Kubernetes each pod has a virtual IP, and the mapping of those IPs onto the physical network is different from simply building the topology from host addresses, so additional tuning is needed to make sure the addresses resolve consistently. There are also opportunities in the cloud-native environment: with a storage orchestrator, the existing HDFS could be upgraded smoothly into the cloud-native world and provide more cloud-native-friendly name services and block services, giving us better support for locality.

Next I will ask my colleague Sammi to share Ozone and how Ozone supports data locality.

Thanks for Xiaoyu's introduction. We all know HDFS has some challenges in practice, so we need solutions to cope with them, and the HDFS community has spent over 10 years on this development. So we have a newer open-source project called Apache Ozone. Apache Ozone is a distributed storage system that provides a strongly consistent object store, and one of its objectives is to address the HDFS scale problem: we know an HDFS cluster's practical upper limit is on the order of a few hundred million files, while Ozone's target is to support over a trillion objects. That is one of Ozone's primary objectives. The second is to support the cloud-native environment: from its very design, Ozone took these environments into account. It supports deployment through Docker, through YARN, and on Kubernetes, and in the latest 0.4 release it supports CSI.

In terms of security, Ozone supports Kerberos-based authentication and KMS-based encryption and decryption, so as to meet corporate and enterprise expectations for security. Ozone's native object storage protocol is different from S3 and the other familiar ones, so for compatibility we have further built an S3 gateway and extended the interfaces; that is to say, S3-based programs can work with Ozone seamlessly. In the meantime, to support the Hadoop ecosystem natively, Ozone implements the Hadoop compatible file system interface; through this interface, upstream services like YARN and Spark can utilize Ozone seamlessly, with no modification required. And lastly, data in Ozone is replicated through Raft, with a default of three replicas, so it is very secure and reliable.

This is Ozone's overview. What used to be the NameNode's function is divided into two independent services. The first is namely the Ozone Manager: we have the Ozone Manager to manage the namespace. Originally, in HDFS, the NameNode also managed the mapping from files to their specific data blocks; now a separate component, the Storage Container Manager, handles that. The namespace and block-management functions are divided into separate services to enable more of a microservice architecture, and the DataNode is similar to HDFS.

You may notice there is a concept called a container. A container in Ozone is the basic unit of replication: in HDFS the block is the unit of replication, while in Ozone the container is actually a logical concept, a collection of blocks. The default size is 5 gigabytes, and it can support up to 16 gigabytes. With this new component, the original file-to-block relationship gains an additional layer: object to container, and container to block. Each container maintains its own block information, so the metadata does not all need to be concentrated in the memory of the Ozone Manager the way it is in the HDFS NameNode, which makes the system uniquely easy to scale.

Here is a chart showcasing the container architecture. Each container keeps its metadata in an embedded key-value store, which records which block is placed at which location. A container has two states, open and closed. When it is open, it is readable and writable. When its size reaches the configured threshold, say the default 5 gigabytes, the Storage Container Manager will observe this and trigger a close, switching the container from open to closed. Once closed, the container is immutable: it cannot be written again, only read. The Storage Container Manager can act on containers based on DataNode utilization: closed containers can be migrated freely, so the data distribution can be optimized and kept balanced across the different nodes.
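To visualize why this split keeps any single service's metadata small, here is a toy model of the key-to-container-to-block mapping just described; the class names are mine for illustration, not Ozone's actual code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the two-level mapping: the Ozone Manager only tracks
// key -> block IDs, SCM only tracks container -> nodes, and each ~5 GB
// container keeps its own local block index on the DataNodes, so no
// single service has to hold all the metadata in memory.
class BlockId {
  final long containerId; // which container holds the block
  final long localId;     // the block's ID inside that container
  BlockId(long containerId, long localId) {
    this.containerId = containerId;
    this.localId = localId;
  }
}

class OzoneManagerModel {            // namespace service
  final Map<String, List<BlockId>> keyToBlocks = new HashMap<>();
}

class StorageContainerManagerModel { // container/cluster service
  final Map<Long, List<String>> containerToDataNodes = new HashMap<>();
}

class ContainerModel {               // lives on the DataNodes
  enum State { OPEN, CLOSED }
  State state = State.OPEN;
  // per-container key-value store: local block id -> (offset, length) on disk
  final Map<Long, long[]> blockIndex = new HashMap<>();
}
```

The scaling consequence is that adding DataNodes adds containers, and each container brings its own block index with it, instead of growing one central in-memory table.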
Now let's talk about the topology support. Xiaoyu mentioned that with topology we know the locality information of the data, so we can place resources better: the data stays close to the computing node, which reduces network traffic and makes operation more efficient. If you know HDFS, its network topology supports two levels below the root, from the root to the rack and from the rack to the node, so three layers, let's call it. Now, in the cloud-native, virtualized environment, those original three layers are insufficient to represent the network topology, so in Ozone we provide a user-customizable model: users can define any number of topology layers based on their own environment. This is only one example: under the data center I can have different rooms, under the rooms different racks, under the racks virtual machines or virtual servers, and at the bottom our DataNodes. Again, this is only one example; users can customize it according to their demands, and by default we provide a two-layer structure compatible with HDFS.

With this topological support, naturally we think of how to optimize Ozone's data reading flow. Here is one illustration: a client under rack 2 is about to access an object, and the object has three replicas, stored on data nodes in rack 2 and rack 3. The client communicates with the Ozone Manager: "I am about to access this object, here is the key; tell me its locations." The Ozone Manager recognizes the location of the client and, based on the distance of each replica to the client, produces a prioritized distance list. Because the replica in rack 2 is closest to the client, it is ranked as the first choice, and the client accesses the data through rack 2; since this is rack-local storage, it enables faster access.

Topology matters on the write path too, through the container states. While a container is open, you cannot move its replicas: they form a Raft ring that the writes go through. After the container is closed, the Raft ring is torn down, so the container can move freely around the cluster, and this movement can be topology-aware as well: containers can migrate to locations that suit them better while still guaranteeing the reliability of the data. You are not going to put all the replicas on the same node or the same rack; we can place them across different racks, or according to application demands. Some applications may require their data to be placed across different data centers, and you can use this information to migrate containers to the places the applications want.
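The distance ranking in that read flow can be pictured as sorting replica topology paths by their shared prefix with the client's path; this is an illustrative sketch of the idea, not Ozone's actual implementation:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sort replica locations by network distance from the client, using
// topology paths like "/dc1/room1/rack2/node3": the longer the shared
// prefix with the client's path, the closer the replica.
public class SortByTopology {
  static int sharedLevels(String a, String b) {
    String[] pa = a.split("/"), pb = b.split("/");
    int n = 0;
    while (n < Math.min(pa.length, pb.length) && pa[n].equals(pb[n])) n++;
    return n;
  }

  public static void main(String[] args) {
    String client = "/dc1/room1/rack2/node7";
    List<String> replicas = Arrays.asList(
        "/dc1/room1/rack2/node3",  // same rack as the client
        "/dc1/room2/rack5/node1",  // same data center, different room
        "/dc2/room1/rack1/node4"); // different data center
    replicas.sort(Comparator.comparingInt(
        (String r) -> -sharedLevels(client, r))); // closest first
    System.out.println(replicas);
  }
}
```

With a user-customized hierarchy, each extra layer simply becomes one more path segment, which is why the same ranking logic works for two layers or five.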
Now let's review Ozone on Kubernetes. First, Ozone is an open-source project built as microservices, and its container images ship with our releases, so it is very convenient to deploy. Apart from what we covered in the architecture, like data locality, we are also compliant with the S3 gateway API, and we provide encryption tools and functions as well. As for cloud-native, Ozone is supported on Kubernetes and has topology-aware writes that work there too, so we can make better use of topology in a cloud environment.

This is our suggested deployment architecture, which Kubernetes supports and which is quite easy to build up. We have four components here. One is the OM, which we suggest running as a StatefulSet service; for SCM we can use a StatefulSet as well. For the S3 gateway you can deploy more replicas, even many of them, because that enhances the throughput of the system. And the DataNodes run as a DaemonSet, so they can be scheduled across the different nodes with one instance per node.

Next is the roadmap of Ozone. Many people in the community are promoting the Ozone project, and we have accumulated a lot of experience in the community while working on it, so the project has developed quite fast; we release new versions every two to three months. In Ozone 0.2.1 we supported the Hadoop compatible file system, natively supporting Spark, YARN, and Hive. In the 0.3 release we supported the S3 gateway. We pushed Kubernetes deployment support in our 0.4.0 release, and in 0.4.1 we are also working on CSI support, which helps us expose Ozone storage through Kubernetes volumes so that applications outside the Hadoop ecosystem can use it too. We also have an Ozone operator in the works; it can cooperate with Rook, and it will make deployment on Kubernetes better and more user-friendly for clients.

Thank you very much. Here is the official website of Ozone; if you are interested, you can try it out, and there are a lot of guides there you can follow step by step. Thank you. If you have any questions, please ask.

Q: I have two questions. First, in current production environments, what scale has been reached? And are there any good solutions for migration?

A: Maybe I can answer these questions first; let me repeat them: in real production environments, when people use Ozone, what is the scale of utilization? In some users' production environments it is still at the level of pilot projects, where they redirect some workloads to Ozone. The biggest deployment we know of has about 250 nodes, and none has grown to more than 1,000 yet, but from the design point of view, Ozone can address even up to 10,000 nodes, theoretically better than HDFS.

As for the second part, the migration solution: last year we worked on an in-place upgrade from HDFS to Ozone. This upgrade minimizes data copying: it understands the format at the data level and reuses the existing block data to handle the upgrade, instead of copying everything with DistCp and similar tools. You can look it up on the Apache JIRA, and I can give you a follow-up; there are also some posts in the community. The idea is to convert the data, not to copy the original data.

Q: I have two questions. (Please use the microphone.)

A: Regarding Raft: we use the Raft protocol in Ozone through Apache Ratis, which is an independent project. It is a library, and it is designed to solve exactly the problems of handling a large amount of I/O with small updates. We used micro-benchmarks to work through the bandwidth questions. Originally, a straightforward Raft implementation has to do synchronous I/O on commit, which causes performance problems; Ratis works on an asynchronous RPC model with the commit split into two phases, so theoretically we can achieve performance on par with the HDFS write path, and for some small writes it can even be better than HDFS.

Q: Next question: after a container is closed, if I want to modify its data, how is that handled?

A: Our solution is that we tell the client. The container state is kept on the SCM; if a container is in the pending-close state and the client writes to it, the write will be rejected, and the client is required to write to new blocks instead. For example, if I want to make some changes to a document and find the original container has been closed, then I will write the change into a new container, in a new location.
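That closed-container behavior amounts to a simple client-side retry; here is an illustrative Java sketch, where the interface and names are hypothetical rather than Ozone's real client API:

```java
// Sketch of the client-side behavior described in the answer above.
class ContainerClosedException extends Exception {}

interface OzoneClientModel {
  void writeChunk(long containerId, byte[] data) throws ContainerClosedException;
  long allocateBlockInOpenContainer(); // ask OM/SCM for a block in an open container
}

class WriteWithRetry {
  static void write(OzoneClientModel client, long containerId, byte[] data)
      throws Exception {
    try {
      client.writeChunk(containerId, data);
    } catch (ContainerClosedException e) {
      // The container went into (pending-)closed state: never mutate it.
      // Allocate a fresh block in a new open container and write there.
      long newContainer = client.allocateBlockInOpenContainer();
      client.writeChunk(newContainer, data);
    }
  }
}
```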
As for performance, maybe I can add a supplementary point here: Ozone writes are already quite fast, and there is some data supporting that.

Q: Are there any sample numbers, for example on HDD, what is the specific time taken?

A: Maybe I need to check.

Q: Let me ask another question: where can we see the performance data you have been working on, as a reference?

A: We are going to share that information through blog posts.

Q: We also want to know whether you found it stable during this process, and whether anything should be improved, for example around the network?

A: It depends on the disks you are using. As for the network, we are using real networks: most are 10-gigabit, though some clusters are still on 1-gigabit, and on 1-gigabit it can be a little slow sometimes. Some clusters add one or two SSDs to speed things up, because SSD performance is very good these days.

Q: We would also like to see that data; can you share the performance results with us? And can you tell us about the roadmap: is there a timetable, or when do you expect a new release?

A: By the end of July we are going to have the 0.5.0 release, which is targeting production quality and includes high availability, increasing the availability of the service. We will also include it in our products and put it into the product line, so that the open-source community release and the commercial product stay in line. I think that release is very important: it covers all the cases and also provides a lot of benchmark results and performance data. If 0.5.0 proves stable, we will move on to production environments.

Q: On a scale of 0 to 10, where is it? I just want to know about the maturity of this.

A: Maybe we can communicate here, or talk about it privately.