Hello everyone, it is a great honor to join KubeCon + CloudNativeCon North America to share Dragonfly. This is also my second time joining KubeCon to share Dragonfly; I was unable to attend the offline sessions due to COVID-19, and I hope online listeners can have the same experience as offline ones. This talk is "Accelerating Image Distribution in Multi-Clusters with Dragonfly", which introduces the new features and the multi-cluster experience that have been added to Dragonfly in the last year. My name is Ji Wenbo; my English name is Giles. I am a software engineer at Ant Group working on Dragonfly, and I am also a maintainer of Dragonfly. If you have problems during your use of Dragonfly, you can communicate with me, and I hope to have more exchanges with you on the new features of Dragonfly in the future.

First, let me introduce the Dragonfly project and some of our research and deployment data from the past year. Dragonfly is an open-source P2P-based image and file distribution system. It is designed to improve the speed of large-scale file distribution, and it is used in the fields of application distribution, cache distribution, log distribution, and image distribution. At this stage, Dragonfly 2 has evolved based on Dragonfly 1: on the basis of maintaining the core capabilities of Dragonfly 1, Dragonfly 2 has been upgraded in major areas such as system architecture design, product capabilities, and usability. Dragonfly has been selected and put into production use by many internet companies since it was open-sourced in 2017. It entered the CNCF in October 2018, becoming the third project from China to enter the CNCF sandbox, and in 2020 the CNCF TOC voted to accept Dragonfly as a CNCF incubating project. The next version of Dragonfly has been developed through production practice, absorbing the advantages of Dragonfly 1 and providing solutions for many known problems. Dragonfly has now been released more than 120 times, and the project has been actively committed to for a long time.
We can refer to the first picture, which shows the commit history for the past year. I would also like to thank contributors from different companies, including but not limited to Ant Group, Alibaba Cloud, ByteDance, GetLive, Meituan, Xiaomi, Langchao (Inspur), and Shanghai Jiao Tong University. Listeners who are interested in the project can join the community through the link below and discuss the future development of the project with us.

Some listeners may not know much about the Dragonfly project, so I will introduce the architecture of the project and the role of each service. Dragonfly includes the manager, the scheduler, the seed peer, and the peer. The manager module in the project runs as the manager service; the scheduler module runs as the scheduler service; and the dfdaemon module can run as either a seed peer service or a peer service. We can refer to the architecture picture on the right. When a peer downloads a task, it calls the scheduler API to get a list of candidate parents. If the task is being downloaded for the first time in the P2P network, the scheduler triggers the seed peer to download back-to-source. After the peer gets the candidate parents, it downloads the required pieces from those parents. The manager provides available scheduler addresses and available seed peer addresses to the peers in the cluster; it also manages the relationships between multiple clusters and the stability of the clusters.

Let me introduce each service. First of all, the manager service is a management service. It is used to manage the relationships between multiple clusters. A P2P cluster includes a scheduler cluster, a seed peer cluster, and multiple peers, and multiple P2P clusters are managed by the manager. At the same time, the manager also provides dynamic configuration management, such as controlling the load of peers and seed peers.
The manager can also act as a certificate-issuing CA service, issuing leaf certificates to the scheduler, seed peer, and peer. It also provides a front-end console, which is convenient for users to manage P2P clusters, as well as user-related features such as user management and RBAC. It provides open APIs too, such as the preheat API, which can be called by other services. Of course, the main feature of the manager is to manage the stability of the cluster: it can remove inactive service addresses from the cluster.

The scheduler is a very important service in Dragonfly. Its main feature is to select candidate download parents for a downloading peer. When a peer's download fails, it controls the peer to fall back to downloading the task from the source. Of course, scheduling is still very complicated: it builds a DAG of peers, and during scheduling it goes through filtering and evaluation to select candidate parents.

The seed peer can be triggered by the scheduler to download back-to-source and divide the result into pieces, which makes it the root node of the P2P network. It also has all the features of a regular peer. Usually, seed peers in the cluster are given high-performance machines and a high-quality network environment, so when a task is downloaded for the first time, they can download it back-to-source at the fastest speed.

The peer is the client in the P2P network. It can be both a downloader and an uploader. It starts a dfdaemon to distribute images, and it gets its parents by exchanging information with the scheduler. Based on the dfdaemon, Dragonfly derives extensions such as dfget, dfstore, and dfcache, which are command-line tools in a client/server architecture. For example, dfget downloads files from the command line by calling the gRPC API of the daemon.

This page introduces some major updates. First of all, we have three main improvements in terms of stability. We replace the CDN with seed peers to improve stability.
The stability of the cluster is ensured by keepalive between each service and the manager: if there is an error on an instance, the manager deletes that instance from the cluster. At the same time, we use gRPC consistent hashing to ensure the cache hit rate, and re-resolve the gRPC addresses in case of call exceptions. In terms of security, we use the manager as a CA to issue certificates, ensuring that TLS is used for the transmission of P2P traffic. We return the scheduling result through the two steps of filtering and evaluation, and use a DAG to form the P2P network to improve download efficiency.

We use seed peers instead of the CDN service. The CDN was a separate module before; the current version removes the CDN and lets the daemon be actively triggered to do back-to-source downloading through an API. This reduces the maintenance costs of the CDN module, and it also clearly defines the roles of the different types of peers in the P2P network rather than adding a new service.

The picture above shows a download sequence. First, the peer gets the best-matching scheduler address from the manager. The peer then registers the task with the scheduler and builds a bidirectional stream with the scheduler. If the task is being downloaded for the first time, the scheduler triggers the seed peer to download back-to-source. The scheduler returns the candidate parents to the peer, and the peer downloads the pieces from the parents and assembles the pieces into a complete file. Users can deploy the seed peers in a place where the network environment to the source is better, so as to ensure that the first back-to-source download is as fast as possible.

The manager manages the stability of the cluster through keepalive. First, the scheduler starts, reports its host information to the manager, and keeps alive with the manager; the manager then records the scheduler's state as active. If the keepalive is disconnected, the manager sets the scheduler's state to inactive.
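The keepalive-driven state management described above can be sketched as a minimal example. The class and method names here are hypothetical illustrations, not Dragonfly's actual API:

```python
class ClusterManager:
    """Minimal sketch of keepalive-based instance state tracking."""

    def __init__(self, keepalive_timeout=60):
        self.keepalive_timeout = keepalive_timeout  # seconds
        self.instances = {}  # address -> (state, last_keepalive_timestamp)

    def announce(self, address, now):
        # A scheduler or seed peer reports its host info on startup.
        self.instances[address] = ("active", now)

    def keepalive(self, address, now):
        # Each heartbeat refreshes the timestamp and reactivates the instance.
        self.instances[address] = ("active", now)

    def sweep(self, now):
        # Mark instances whose keepalive is overdue as inactive.
        for address, (state, last_seen) in self.instances.items():
            if state == "active" and now - last_seen > self.keepalive_timeout:
                self.instances[address] = ("inactive", last_seen)

    def active_addresses(self):
        # Peers are only handed addresses that are currently active.
        return [a for a, (s, _) in self.instances.items() if s == "active"]

manager = ClusterManager(keepalive_timeout=60)
manager.announce("scheduler-1:8002", now=0)
manager.announce("scheduler-2:8002", now=0)
manager.keepalive("scheduler-1:8002", now=100)  # scheduler-2 missed its heartbeats
manager.sweep(now=100)
print(manager.active_addresses())  # → ['scheduler-1:8002']
```

The idea is simply that liveness is inferred from heartbeats, so a crashed scheduler or seed peer drops out of the address lists that peers receive.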
And the scheduler will get the addresses of active seed peers in order to trigger back-to-source downloads. After a seed peer starts, it also reports its own host information and keeps alive with the manager, and the manager sets the seed peer's state to active. If the keepalive is disconnected, the manager sets the seed peer's state to inactive. When a peer starts, it pulls from the manager the list of scheduler addresses whose state is active and which match its own host information, and uses them for scheduling when downloading tasks. During this process the peer uses the scheduler addresses, the scheduler uses the seed peer addresses, and all of the routing goes through the manager. When a service has an error, the manager sets the service's state to inactive and removes the service from the cluster. In this way, we can manage the stability of the P2P cluster.

Dragonfly manages the cache hit rate of downloads through consistent hashing. Consistent hashing is implemented with a gRPC balancer and resolver builder. The peer gets the matching scheduler addresses with active state from the manager, then adds the addresses to the gRPC address list through the resolver builder. When the picker is built, the resolved addresses are built into a consistent hash ring. When the picker picks, it finds the matching scheduler address in the consistent hash ring according to the download task ID and uses this address for peer scheduling. If an exception occurs on a scheduler, the gRPC request fails, the resolver resolves the new available scheduler addresses from the manager, and the peer updates the available scheduler addresses and rebuilds the consistent hash ring. This not only ensures that the same task ID always hits the same scheduler, but also deletes inactive scheduler addresses in time.

Now let's talk about how the core service, the scheduler, handles Dragonfly's scheduling tasks. Scheduling is divided into two steps: filter and evaluate.
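The consistent-hashing lookup described above can be sketched roughly as follows. This is a simplified hash ring, not the actual gRPC balancer code, and the addresses are illustrative:

```python
import bisect
import hashlib

def _hash(key):
    # Stable hash mapping any string onto the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Simplified consistent hash ring mapping task IDs to scheduler addresses."""

    def __init__(self, addresses, replicas=50):
        self.replicas = replicas
        self.ring = []  # sorted list of (point, address)
        self.rebuild(addresses)

    def rebuild(self, addresses):
        # Called when the resolver reports a new set of active schedulers.
        self.ring = sorted(
            (_hash(f"{addr}-{i}"), addr)
            for addr in addresses
            for i in range(self.replicas)
        )

    def pick(self, task_id):
        # Walk clockwise to the first virtual node at or after the task's hash.
        points = [p for p, _ in self.ring]
        idx = bisect.bisect(points, _hash(task_id)) % len(self.ring)
        return self.ring[idx][1]

addresses = ["scheduler-1:8002", "scheduler-2:8002", "scheduler-3:8002"]
ring = HashRing(addresses)
addr = ring.pick("task-abc")
# The same task ID always hits the same scheduler...
assert addr == ring.pick("task-abc")
# ...and when that scheduler becomes inactive, the ring is rebuilt without it.
ring.rebuild([a for a in addresses if a != addr])
assert ring.pick("task-abc") != addr
```

Because only the virtual nodes of the failed scheduler are removed, most task IDs keep mapping to the same scheduler after a rebuild, which is what preserves the cache hit rate.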
First, the peer exchanges scheduling information with the scheduler. The scheduler first takes out all candidate parents for the task, then filters out the unavailable parents according to the rules. The filtering is based on the current parent's load count: if a parent's load count exceeds the limit, it is filtered out. If a parent's download state is failed, it is also filtered out. A parent's download cost is calculated from the time consumed for each piece; assuming that piece download times conform to a normal distribution, slow-downloading parents are filtered out by the six-sigma rule based on piece download time. The parents that pass the filter are the available parents.

The next step is to evaluate and score the available parents, and then select the set of parents with the highest scores, which is returned to the peer to download the task. The evaluator is based on multiple features: for example, the more free load a parent has left, the higher its score, and the more pieces a parent has downloaded, the higher its score. In the end, a score is given to each available parent; the parents are then sorted, and the set with the highest scores is selected and returned as the scheduling result.

The scheduling process builds a DAG of peers. Why build a DAG? There are two reasons. The first reason is that in Dragonfly's P2P network, there is no peer that publishes the result; the real result in Dragonfly comes from the source side. For image downloading, it may be Harbor. The real result needs to be downloaded from the source side into the P2P network through a peer, and then other peers can download it. If peer A downloads from peer B, peer B downloads from peer C, and peer C downloads from peer A, then none of the peers downloads the real result back from the source, forming a download loop. In this case, no peer can download the real result. So when building the network there can be no loop, and a peer can download pieces from multiple parents.
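The filter-and-evaluate steps described above can be sketched as follows. The field names, weights, and sample data are illustrative, not the scheduler's real ones:

```python
import statistics

# Candidate parents: illustrative records of load, state, and piece download times.
parents = [
    {"id": "peer-a", "load": 3, "limit": 10, "state": "succeeded",
     "piece_times": [0.10, 0.11, 0.09, 0.10], "finished_pieces": 40},
    {"id": "peer-b", "load": 12, "limit": 10, "state": "succeeded",
     "piece_times": [0.10, 0.10, 0.10, 0.10], "finished_pieces": 50},  # overloaded
    {"id": "peer-c", "load": 1, "limit": 10, "state": "failed",
     "piece_times": [0.10, 0.10, 0.10, 0.10], "finished_pieces": 30},  # failed
    {"id": "peer-d", "load": 2, "limit": 10, "state": "succeeded",
     "piece_times": [0.10, 0.10, 5.00, 0.10], "finished_pieces": 20},  # slow piece,
]                                                                     # within 6 sigma

def filter_parents(parents):
    # Pool all piece times to estimate the distribution for the six-sigma rule.
    all_times = [t for p in parents for t in p["piece_times"]]
    mean = statistics.mean(all_times)
    sigma = statistics.pstdev(all_times)
    available = []
    for p in parents:
        if p["load"] >= p["limit"]:        # load count exceeds the limit
            continue
        if p["state"] == "failed":         # failed download state
            continue
        if max(p["piece_times"]) > mean + 6 * sigma:  # too slow by six sigma
            continue
        available.append(p)
    return available

def evaluate(parent):
    # Illustrative score: more free load and more finished pieces score higher.
    free_load = 1 - parent["load"] / parent["limit"]
    pieces = parent["finished_pieces"] / 100
    return 0.5 * free_load + 0.5 * pieces

available = filter_parents(parents)
ranked = sorted(available, key=evaluate, reverse=True)
print([p["id"] for p in ranked])  # → ['peer-a', 'peer-d']
```

Here peer-b is filtered for exceeding its load limit and peer-c for its failed state; the survivors are scored and returned in descending order as the candidate parent set.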
So the network must be a graph, and therefore it is necessary to build a DAG for the peer download network in scheduling. The early version used a tree, which could not indicate that a peer downloads from multiple parents; it could only indicate that a peer downloads from one parent, which reduced the download success rate.

The following data sets of different sizes show the difference in download performance between a P2P network in the form of a tree and one in the form of a DAG. The peers are deployed in different IDCs, so the bandwidth between peers is not fixed, and only one peer downloads the task back-to-source. We increase the number of peers in the P2P network to get the final data. First of all, we can look at the picture: for one download task, no matter how many peers there are, the download bandwidth of the DAG is larger than that of the tree, and the more peers there are, the greater the impact on the download bandwidth. We can see that when downloading a one-gigabyte task, comparing the data on the graph, the DAG's bandwidth is better, and when downloading a ten-gigabyte task, the DAG's data is again better than the tree's. So changing the P2P network from a tree to a DAG can improve bandwidth usage.

In terms of transmission security, Dragonfly has also been upgraded. Dragonfly's transmission protocols are mostly based on mutual TLS. The manager is Dragonfly's CA and issues certificates for each service; each service calls the manager API to get its certificate chain when it starts. Of course, users can also turn TLS on or off through a switch; if TLS is turned off, the communication between services is carried out in clear text. Encrypted transmission makes the information transmitted in the P2P network more secure and reliable.

Dragonfly is based on the upload and download services provided by the peer and derives other command-line tools.
For example, the provided dfstore command-line tool offers object storage based on backends, which can quickly read and write object storage. The backend can be adapted to the object storage of cloud services, such as AWS S3 and Aliyun OSS. We can treat the P2P network as a large cache to cache the content of the object storage in the peers.

Next, I will describe best practices for multi-cluster deployment. First, a P2P sub-cluster consists of a scheduler cluster, a seed peer cluster, and multiple peers. The scheduler cluster manages the download metadata of its P2P sub-cluster, helping peers to exchange data within the sub-cluster. Of course, a sub-cluster can be an independent network environment. Multiple sub-clusters require a manager for management, so the services in multiple sub-clusters need to access the manager. The picture shows the multi-cluster management deployment: users can manage multiple P2P networks isolated in multiple network environments through the manager.

We also provide users with a set of image acceleration solutions based on Harbor, Dragonfly, and Nydus. First, we build the image into a Nydus image and upload it to Harbor, and then download the image through the Nydus snapshotter, which of course is lazy loading. When downloading the data of a file, it is accelerated through the Dragonfly P2P network. In this way, users have an image acceleration solution with on-demand loading and Dragonfly P2P acceleration.

In the future, Dragonfly has some plans for each module. We will add intelligent scheduling and will optimize the download protocol. The picture shows our future plans for each module, and interested developers can pay attention. Finally, thanks; I hope more developers can pay attention to Dragonfly. Interested developers can follow our GitHub project, join our Slack channel, or follow our Twitter.
We will release version information and feature upgrade information there, or you can join our discussion group. On the right is our DingTalk group QR code; you can scan the code to join our community discussion. Thank you.