Hello, everyone. Welcome to our session, Managing Cloud Native Artifacts for Large-Scale Kubernetes Clusters. I'm Henry Zhang, Technical Director of the Cloud Native Lab with VMware China R&D. I'm the creator and a maintainer of Harbor. My interests are in cloud computing, AI and machine learning, and blockchain. With me today is Mingming Pei, an architect from NetEase. He's responsible for the Qingzhou Cloud Native DevOps Platform, and he's also a Harbor maintainer.

Here is today's agenda. We'll first talk about the two aspects of managing cloud native applications. Then we'll introduce how to use Harbor to manage cloud native artifacts. Next, we'll go through the case study of NetEase and how they manage artifacts for large-scale Kubernetes clusters using Harbor.

We all know that cloud native technologies have become increasingly important for building modern applications. Usually, there are two aspects to managing cloud native applications. The first is the dynamic part: the runtime, that is, how the applications run and how they are scaled, monitored, backed up, and so on. The second is the static part: the artifacts. When applications are not running, they reside on storage devices as files or some other kind of artifact. The most common and important cloud native artifacts are container images and Helm charts, and there are more, like CNABs, that can be managed in a cloud native environment.

Given the importance of artifacts, we need to manage them efficiently and securely when operating a cloud native platform. Harbor, a graduated CNCF project, is designed to perform the tasks of managing cloud native artifacts. It supports OCI artifacts such as Docker images, Helm charts, CNABs, Open Policy Agent bundles, Singularity images, and so on. In addition, Harbor provides a bunch of features for artifact management, such as RBAC (role-based access control), image isolation by project, image retention, and immutable images. We will cover some of them shortly.

The first artifact management feature I'd like to talk about is replication. This feature was created in an early version of Harbor, 0.3. It allows two Harbor instances to synchronize images from one to the other. Because replication tasks are carried out automatically and reliably, users love this feature and apply it in many scenarios. The latest release of Harbor supports replication of artifacts across multiple cloud environments with various registry services. We list some registry services here, such as Docker Hub and the registry services in public clouds like AWS, Google, Azure, Alibaba Cloud, and so on. It is very simple to move your artifacts between different environments, and this helps a lot when managing your artifacts.

Suppose you're running a large cluster with many nodes. If all nodes pull images from public registry services, it takes up a lot of network bandwidth to download the same images again and again. That is obviously not optimal. Moreover, not all nodes within an organization are allowed to connect to an external registry. So Harbor provides a feature called proxy cache. This feature has been requested by many community users for quite some time and was recently released in Harbor 2.1. A proxy cache is a special kind of Harbor project which can hold and cache images and serve them locally. It saves external network bandwidth and speeds up the local distribution of images. Under the hood, it leverages Harbor's replication capability to pull images from remote sources when they are not available locally.
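To make the proxy cache setup more concrete, here is a minimal sketch of configuring one through Harbor's v2.0 REST API. The endpoint paths and payload fields (such as `registry_id`) are based on my reading of the Harbor 2.1 API and should be verified against your version's documentation; the URL, credentials, project name, and registry ID are all placeholders.

```python
# Minimal sketch: create a proxy cache project via the Harbor v2.0 REST API.
# Assumption: endpoint paths and payload fields match the Harbor 2.1 API docs;
# HARBOR_URL, credentials, and the registry ID below are placeholders.
import requests

HARBOR_URL = "https://harbor.example.com"  # hypothetical Harbor instance
AUTH = ("admin", "ChangeMe123")            # placeholder admin credentials

# 1. Register the upstream registry (e.g. Docker Hub) that the proxy will front.
registry = requests.post(
    f"{HARBOR_URL}/api/v2.0/registries",
    auth=AUTH,
    json={
        "name": "dockerhub-upstream",
        "type": "docker-hub",
        "url": "https://hub.docker.com",
    },
)
registry.raise_for_status()

# 2. Create a project bound to that registry; registry_id marks it as a proxy cache.
registry_id = 1  # look this up via GET /api/v2.0/registries in a real setup
project = requests.post(
    f"{HARBOR_URL}/api/v2.0/projects",
    auth=AUTH,
    json={
        "project_name": "dockerhub-proxy",
        "registry_id": registry_id,
        "metadata": {"public": "false"},
    },
)
project.raise_for_status()

# Nodes can then pull through the cache, e.g.:
#   docker pull harbor.example.com/dockerhub-proxy/library/nginx:latest
```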
Because cached images are stored under a Harbor project, all the project-related features in Harbor, such as quotas, scanning, and immutable tags, can be applied to cached images.

For any registry service in production, high availability and scalability must be considered. There are many ways to achieve HA for a registry service; I'll just introduce some principles here, which can serve as guidelines for users implementing a production registry. The Harbor core service components are stateless, which means they can be scaled out by running multiple instances of each component. The key is to set up HA for the persistent services: PostgreSQL, Redis, and shared storage. There are many existing solutions for making these services highly available; just choose one that fits your environment.

If you have multiple data centers or cloud environments running your applications, you can establish an HA Harbor instance in each data center or environment so that they can back each other up. A replication policy can be configured to synchronize artifacts between the two environments, and PostgreSQL and Redis can be kept in sync using some kind of synchronization software or mechanism. By adding load balancers in front of the Harbor services, you get high availability with an active-standby configuration across the two environments. This creates additional protection for the registry service: if one data center or environment goes down, the other can go live to continue the service. One thing to note is that the propagation delay of artifact replication between the two data centers or environments should be taken into consideration when implementing such a solution. That is to say, the artifacts stored in the two data centers could differ while artifacts are being replicated.

When artifacts are deleted from a registry, their storage space needs to be released and reclaimed. This process is called garbage collection. System administrators should perform garbage collection periodically to ensure the system does not run out of storage space. In the latest Harbor 2.1 release, the garbage collection feature is improved and can be performed without any impact on image pushing, pulling, or deletion. This allows Harbor to keep providing artifact service while doing garbage cleanup in the background, which is crucial for a production system that must stay up and running continuously.

When publishing a new version of an application to a cluster, we need to send the artifacts to every node. For a large-scale Kubernetes cluster, it is a challenging task to distribute artifacts to all nodes within a short timeframe. If there is only one registry instance serving the entire cluster, the registry instantly becomes the bottleneck of the distribution. To address this distribution problem in a large cluster, a peer-to-peer distribution approach seems a feasible solution. Harbor can leverage the capabilities of peer-to-peer engines like Dragonfly and Kraken to accelerate artifact distribution. What Harbor does is preheat the peer-to-peer network that runs alongside the cluster. The idea is to distribute the artifacts to the peer-to-peer network before requests for the artifacts arrive. When an actual request comes in, the content is already available within the peer-to-peer network and can be transferred right away. In the case study of NetEase, we will see that the peer-to-peer approach improves performance significantly in a large cluster with many Kubernetes nodes.
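As an illustration of running garbage collection on a schedule rather than by hand, here is a minimal sketch against the Harbor v2.0 REST API. The `/system/gc/schedule` endpoint, its payload, and the seconds-first cron format follow the Harbor 2.1 API as I understand it; verify against your version's documentation. URL and credentials are placeholders.

```python
# Minimal sketch: schedule weekly background garbage collection via the
# Harbor v2.0 REST API. Assumption: endpoint path and payload match the
# Harbor 2.1 docs; URL and credentials are placeholders.
import requests

HARBOR_URL = "https://harbor.example.com"  # hypothetical instance
AUTH = ("admin", "ChangeMe123")

resp = requests.post(
    f"{HARBOR_URL}/api/v2.0/system/gc/schedule",
    auth=AUTH,
    json={
        # Run every Sunday at 02:00; Harbor cron strings include a seconds field.
        "schedule": {"type": "Custom", "cron": "0 0 2 * * 0"}
    },
)
resp.raise_for_status()

# GC history (status, duration, reclaimed space) can then be inspected:
history = requests.get(f"{HARBOR_URL}/api/v2.0/system/gc", auth=AUTH)
print(history.json())
```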
Harbor can work with multiple Kubernetes clusters as well. By setting up proper preheating policies, Harbor can send artifacts to the peer-to-peer network of each cluster and make them ready for subsequent artifact distribution.

When administrators manage artifacts, security is one thing they need to deal with. Harbor can scan the content against publicly known CVE databases. Based on the scanning result, Harbor reports the vulnerabilities found in the artifacts so that administrators can take proper action, such as patching the image to remediate a vulnerability. This is crucial not only in a production environment; it can also be used in CI pipelines during the development phase to ensure that the images created do not contain severe vulnerabilities. In addition to vulnerability scanning, Harbor can block pull requests if an image's vulnerability level exceeds a certain threshold. Other features, like content trust, can ensure the provenance of artifacts. Vulnerability scanning can also be triggered automatically when an artifact is pushed to the registry. If you want to allow some CVEs to exist in your images, for example because they are not very critical, or because you know one is critical but still want the images available for a while in your organization, you can set an exception in the allow list.

From time to time, users may have their own types of artifacts. If these artifacts follow the OCI specs, they can be managed and visualized by Harbor. The latest version of Harbor extends the artifact processor functionality: users can define their own artifact formats and media types by following the OCI specs, and then the artifacts can be pushed to or pulled from Harbor. A benefit of storing artifacts in Harbor is that they can be treated the same as container images: they can be replicated to other places and governed by role-based access control, and you get Harbor's other features for free. We have already seen partners utilizing Harbor to store machine-learning models as artifacts and reduce operational complexity.

There is much more to say about Harbor's artifact management capabilities, but I'd like to pass it over to Mingming to share his experience with artifact management at NetEase.

Thank you, Henry. I'm Mingming from NetEase. Today, I will introduce how we manage artifacts at NetEase. Container technology is now widely used at NetEase. We use container images, Helm charts, et cetera, as artifacts, and we use Harbor in production as the repository of cloud native artifacts. There are now lots of Kubernetes clusters at NetEase, with 5,000-plus nodes in the larger ones. We have more than 20 Harbor instances, and the largest one manages about 100,000 images.

Let's take a look at our architecture. We developed two services to manage the Harbor instances in the Kubernetes clusters, which give us the flexibility to combine the relationships among these instances; at the same time, access to them goes through the authorization system of the NetEase cloud native platform. In addition, we integrated NetEase object storage as the backend for high availability and high performance.

How do we make Harbor highly available at NetEase? First, we address the high availability of Harbor's storage, which includes object storage and local file storage. When using the local file system, we use a file synchronization tool to synchronize the data.
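Operating more than 20 instances also means health has to be checked automatically. As one possible illustration (not NetEase's actual tooling), here is a minimal sketch that polls Harbor's v2.0 health endpoint across instances; the instance URLs are placeholders, and the `/api/v2.0/health` path and response shape should be verified against your Harbor version.

```python
# Minimal sketch: poll the health endpoint of multiple Harbor instances,
# e.g. as input for alerting. Assumption: the /api/v2.0/health endpoint
# reports overall and per-component status; URLs below are placeholders.
import requests

HARBOR_INSTANCES = [
    "https://harbor-dc1.example.com",
    "https://harbor-dc2.example.com",
]

for base in HARBOR_INSTANCES:
    try:
        resp = requests.get(f"{base}/api/v2.0/health", timeout=5)
        resp.raise_for_status()
        report = resp.json()
        # Overall status, then per-component status (core, database, redis, ...).
        print(base, report.get("status"))
        for comp in report.get("components", []):
            print("  ", comp.get("name"), comp.get("status"))
    except requests.RequestException as exc:
        print(base, "unreachable:", exc)
```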
For the high availability of external dependencies, we used the open source project Stolon to address the high availability of PostgreSQL, and an HAProxy-based approach to address the high availability of Redis; detailed descriptions of both can be found in the Harbor community. In addition, we enhanced Harbor's monitoring, mainly around failure scenarios: failures such as replication failure and P2P dispatching failure are monitored through Prometheus metrics.

This is how we approach the multi-environment management of artifacts: we first do the packaging and complete all kinds of tests in the testing environment, then tag the image with a release tag to trigger remote replication. The image is replicated to the production environment, which then triggers the online deployment.

For large-scale distribution of images, we faced two problems: one is throughput pressure on the registry service, and the other is network bandwidth pressure on the backend storage. We implemented P2P distribution by integrating Harbor and Kraken in a shared-registry-server mode; the complete design can be found in the Kraken community. We finally achieved the goal of over 5,000 concurrent pulls and accelerated distribution of 10 GB+ images.

Here I would like to introduce some characteristics of Kraken's P2P distribution, based on tests we made. As you can see from the table, the bandwidth used by P2P distribution is configurable, and more layers with smaller layer sizes lead to lower bandwidth utilization, meaning the distribution will be slower. The number of P2P peers has little impact on distribution performance: a maximum of 15K peers is supported officially, and it works well in our production environment.

We also pay attention to the safety of artifacts: artifacts are scanned immediately after packaging. There are also two types of quality gates, the pipeline gate and the deployment gate, with rules that determine whether artifacts should be prevented from being used in the production environment.

Finally, let's look at how we manage artifacts in CI/CD. We created our CI/CD process around the application center. You can see the full flow of our CI/CD here; I have already introduced some of the related processes earlier. Besides those, the following functions are also contained in the CI/CD stages: artifact version management, artifact security, and CD triggered by the image-push webhook. The CI/CD stages are connected in series through code and artifacts. This picture shows the full CI/CD flow in our company. That's all of my sharing. Thank you. Back to you, Henry.

Thanks, Mingming. I want to summarize a little bit here. Artifact management is an important aspect of operations in a cloud native environment, and a registry is an ideal place for performing artifact management tasks. Harbor can be your choice of a powerful tool for managing your artifacts. Harbor provides high reliability and scalability for the registry service, and it has replication, proxy cache, non-blocking GC, and a whole bunch of other powerful features that you can consider and leverage. As you saw in the case study of NetEase, they use Harbor for artifact management in large-scale Kubernetes clusters. Lastly, I want to introduce our new book on Harbor. It is the first book on Harbor in the world, written by Harbor maintainers and contributors. If you are interested in the content, please take a look.
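Before we close, one small illustration of the webhook-triggered CD that Mingming described. This is a minimal sketch of a receiver for Harbor's image-push events: it assumes the Harbor v2 webhook payload shape (a `PUSH_ARTIFACT` event type with `event_data.resources` entries), which should be verified against your Harbor version, and `deploy()` is a hypothetical stand-in for the actual CD trigger.

```python
# Minimal sketch of a CD trigger driven by Harbor's image-push webhook.
# Assumptions: Harbor v2 sends JSON with "type": "PUSH_ARTIFACT" and an
# "event_data" object carrying the pushed resources (verify the payload
# for your version); deploy() is a hypothetical stand-in for a CD system.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def deploy(image_ref: str) -> None:
    # Placeholder: a real pipeline would call its CD system here.
    print(f"triggering deployment of {image_ref}")

class HarborWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        event = json.loads(body)

        if event.get("type") == "PUSH_ARTIFACT":
            # Each pushed artifact carries its full pullable reference.
            for res in event.get("event_data", {}).get("resources", []):
                deploy(res["resource_url"])

        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    # Register http://<this-host>:8080/ as a webhook endpoint in the
    # Harbor project settings to receive push events.
    HTTPServer(("", 8080), HarborWebhookHandler).serve_forever()
```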
Thanks for listening to our session.