Good afternoon. We are Yifei and Hao, from ByteDance. Today we would like to share our story of building a virtual machine monitoring service on top of eBPF.

ByteDance is a cloud computing provider. We provide our customers with virtualized computation resources, and we are also building an ecosystem around these computation resources. One component in that ecosystem is the virtual machine monitoring service.

Here is the architecture of our monitoring service. At its center is the agent. It pulls performance data from the kernel, does the necessary filtering and aggregation, and finally publishes the results to a database. Our customers can then visualize the data through our APIs. In the first iteration, shown as the green box, the agent collects the data through Linux's out-of-the-box mechanisms, for example procfs. However, when we investigated very challenging performance issues, we realized we needed performance data at a finer granularity. So in the second iteration, shown as the red and blue boxes, we added eBPF code in the kernel to collect the raw data and publish the results to user space through the bpf() system call, libbpf, and libbpf-go. Eventually, our customers can see this additional data. With this data, we are now able to work on more challenging performance issues than before.

Here are some examples of the data we collect with eBPF. The figure on top shows network throughput per network flow, where a flow is identified by the five-tuple: the two IPs, the two ports, and the protocol. The figure on the bottom shows network latency per connection; here we show our customers metrics such as the RTT, the congestion window size, and the retransmission rate. We provide our customers APIs so they can build their own dashboards like this one, or they can use our in-house dashboard.

We provide our customers network latency in detail. However, we are also very interested in latency questions in depth. For example, what is the software overhead in the latency? Which network layers contribute most to it? So we select key functions for each network layer and measure their running time as the latency of that layer. That way, we can also view the latency along the layer dimension. The figure shows the functions we selected for each layer on the two network paths. The challenge here is not collecting the raw data from the kernel; more important is how to interpret the raw data into useful and understandable information for our investigations. That is pretty challenging, and we are working on it.

Currently, we are focusing on the network subsystem with eBPF. We also plan to use eBPF for the memory and storage subsystems as the next step.

Deploying this eBPF-based virtual machine monitoring in our cloud environment is not easy, because the virtual machine images come in different flavors, from different vendors such as Red Hat and Debian, and with different kernel versions. That brings us challenges. For example, we found that the eBPF verifier has evolved over time, and its behavior is not consistent across kernel versions, so we have to customize our BPF code for the verifier. Also, we hook our BPF code through kprobes, and not all kernel functions are stable across kernel versions, so we have to build different versions of the BPF code for the unstable functions.

With the limited time, we could only share part of our story, and we are more than happy to share more offline. Thank you for listening.
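As a concrete illustration of the per-flow throughput accounting described in the talk, here is a minimal BPF-side sketch that aggregates bytes sent per five-tuple in a hash map. It assumes a CO-RE build against vmlinux.h; the choice of tcp_sendmsg as the hook point, and all map and program names, are illustrative assumptions, not the team's actual code.

```c
/* flow_tput.bpf.c -- minimal sketch of per-flow byte accounting.
 * Hook point (tcp_sendmsg) and names are illustrative assumptions. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_endian.h>

/* The five-tuple the talk describes: two IPs, two ports, protocol. */
struct flow_key {
	__u32 saddr;
	__u32 daddr;
	__u16 sport;
	__u16 dport;
	__u8  proto;
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 65536);
	__type(key, struct flow_key);
	__type(value, __u64);          /* bytes sent on this flow */
} flow_bytes SEC(".maps");

SEC("kprobe/tcp_sendmsg")
int BPF_KPROBE(count_tcp_send, struct sock *sk, struct msghdr *msg, size_t size)
{
	struct flow_key key = {};

	key.saddr = BPF_CORE_READ(sk, __sk_common.skc_rcv_saddr);
	key.daddr = BPF_CORE_READ(sk, __sk_common.skc_daddr);
	key.sport = BPF_CORE_READ(sk, __sk_common.skc_num);  /* host order */
	key.dport = bpf_ntohs(BPF_CORE_READ(sk, __sk_common.skc_dport));
	key.proto = IPPROTO_TCP;

	__u64 *bytes = bpf_map_lookup_elem(&flow_bytes, &key);
	if (bytes) {
		__sync_fetch_and_add(bytes, size);
	} else {
		__u64 init = size;
		bpf_map_update_elem(&flow_bytes, &key, &init, BPF_NOEXIST);
	}
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

The user-space agent would then walk flow_bytes periodically (for example with bpf_map_get_next_key / bpf_map_lookup_elem via libbpf or libbpf-go) and publish the deltas as per-flow throughput.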
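The per-layer latency measurement the talk describes, timing a selected key function per network layer, maps naturally onto a kprobe/kretprobe pair: record a timestamp on entry, compute the delta on return. A minimal sketch of that pattern follows; using ip_queue_xmit to stand in for the IP transmit layer is a hypothetical choice, not necessarily one of the functions the team selected.

```c
/* layer_latency.bpf.c -- minimal sketch of timing one "key function".
 * ip_queue_xmit as the IP-layer example is a hypothetical choice. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, __u64);            /* pid_tgid of the caller */
	__type(value, __u64);          /* entry timestamp, ns */
} start_ts SEC(".maps");

SEC("kprobe/ip_queue_xmit")
int BPF_KPROBE(layer_enter)
{
	__u64 id = bpf_get_current_pid_tgid();
	__u64 ts = bpf_ktime_get_ns();

	bpf_map_update_elem(&start_ts, &id, &ts, BPF_ANY);
	return 0;
}

SEC("kretprobe/ip_queue_xmit")
int BPF_KRETPROBE(layer_exit)
{
	__u64 id = bpf_get_current_pid_tgid();
	__u64 *tsp = bpf_map_lookup_elem(&start_ts, &id);

	if (!tsp)
		return 0;              /* missed the entry probe */

	__u64 delta_ns = bpf_ktime_get_ns() - *tsp;
	bpf_map_delete_elem(&start_ts, &id);
	/* aggregate delta_ns into a per-layer histogram map here */
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

Repeating this pair over one key function per layer, on both the transmit and receive paths, gives the layer-dimension view of latency described in the talk.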
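On the portability problem, since the talk notes that not all kprobe targets are stable across kernel versions, one common approach is for the loader to check which symbols actually exist on the running kernel before attaching. Below is a minimal user-space sketch using libbpf and /proc/kallsyms; the helper names and candidate symbols are hypothetical, not the team's actual fallback scheme.

```c
/* loader.c -- minimal sketch: pick a kprobe target that actually exists
 * on the running kernel before attaching. Names here are hypothetical. */
#include <stdio.h>
#include <string.h>
#include <bpf/libbpf.h>

/* Return 1 if `sym` appears in /proc/kallsyms. */
static int ksym_exists(const char *sym)
{
	char line[256], name[128];
	FILE *f = fopen("/proc/kallsyms", "r");

	if (!f)
		return 0;
	while (fgets(line, sizeof(line), f)) {
		/* each line: "<addr> <type> <name> [module]" */
		if (sscanf(line, "%*s %*s %127s", name) == 1 &&
		    strcmp(name, sym) == 0) {
			fclose(f);
			return 1;
		}
	}
	fclose(f);
	return 0;
}

/* Attach `prog` to the first candidate symbol present on this kernel. */
static struct bpf_link *attach_first_existing(struct bpf_program *prog,
					      const char **candidates, int n)
{
	for (int i = 0; i < n; i++)
		if (ksym_exists(candidates[i]))
			return bpf_program__attach_kprobe(prog, false,
							  candidates[i]);
	return NULL;
}
```

A loader built this way can ship several variants of a BPF program and attach whichever matches the kernel flavor it lands on, which is one way to cope with the vendor and version spread the talk describes.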