All right, let's get started. Welcome, everyone. We are here to talk about using Istio to manage and monitor machine learning workloads. My name is Wenchen, and Limin is my colleague. We are both software engineers at Google, working on Istio — more specifically, we lead Istio security development. How many of you have heard about Istio? Great, wonderful. What about Kubeflow? Excellent. Let me start with a little background about Istio, and then we'll talk about Kubeflow as a case study for a machine learning pipeline.

Istio is an open platform to manage service interactions across container- and VM-based workloads. There are three main challenges in service communication — monitoring, connectivity, and security — and Istio solves all three problems for you. First, Istio gives you uniform visibility into your service communications: latency, request patterns, errors, all the telemetry you want to know. Second is operational agility, which is mostly about traffic management: Istio provides load balancing, traffic shifting, traffic splitting, and circuit breaking to help you manage your traffic easily. The third is security: Istio provides what you need to secure your service communication, through declarative policy in which you can easily describe your intent. For example, you can easily turn on mutual TLS to get encryption of data in transit, plus authentication, authorization, and audit to protect your services.

Let's take a look at an overview of the Istio architecture. On the data plane, Istio auto-injects an Envoy proxy next to every single service instance, and this Envoy intercepts both incoming and outgoing traffic. Combined with the Istio control plane, this is how Istio secures, connects, and manages your traffic. There are four major components in the Istio control plane. Pilot is responsible for distributing policy to every Envoy. Galley does policy validation, to make sure the policy is good to use. Mixer is the integration point for collecting and reporting metrics to different backends — Prometheus, Grafana, whatever monitoring, logging, or tracing system you would like to use. And Citadel manages keys and certificates: it provisions them to every single workload, so you can establish mutual TLS to secure the connection.
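To make that declarative policy model concrete, a minimal sketch of turning on strict mutual TLS mesh-wide might look like the following. This is not from the talk's slides: it uses the newer PeerAuthentication API, whereas Istio releases from the era of this talk expressed the same intent with a MeshPolicy resource.

```yaml
# A minimal sketch, assuming a recent Istio release: enforce strict
# mutual TLS for every workload in the mesh with one declarative policy.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # placing it in the root namespace makes it mesh-wide
spec:
  mtls:
    mode: STRICT            # reject any plaintext traffic between sidecars
```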
Now let's talk about running machine learning workloads in a multi-tenant environment. Typically, a machine learning pipeline can be divided into three stages. The first is preparing data: you store the raw data in some persistent storage, like a GCS bucket or Amazon S3, and then you ingest it into machine learning workloads that analyze, transform, and split the data into a format that can be consumed by the second stage, like feature columns. The second stage is training models, which includes selecting the right models, hyperparameter tuning, model testing, and model validation. After finishing model training, we move to the third stage, where you deploy the model and serve it as a traditional service. In a multi-tenant environment, each tenant may want its own pipeline to run its machine learning workloads.

So here comes the challenge of multi-tenancy. The first part is security, where you want several kinds of isolation. Each tenant may want isolation of operators: an operator should only be able to manage the workloads and resources of their own tenant, and you don't want an operator from another tenant to be able to touch your workloads. Second is isolation of communication: you only want communication within your tenant. Workloads can talk to workloads within the same tenant, not across tenants — there might be data you don't want to expose to other tenants. And the last one is isolation of user access: you don't want other tenants' users to access your prediction service or your notebook.

The second challenge is operational agility. After you build a new model, Istio can help you easily test and verify how good the model is — Istio provides traffic splitting and mirroring. After you verify that it's a good model, you can leverage Istio to roll out the new model safely and easily. The last challenge is observability. Each tenant will want its own view of monitoring, logging, auditing, and tracing, so that they can take advantage of Istio to troubleshoot and react quickly to maintain good SLOs.

So here is our proposed solution: Istio plus Kubernetes namespaces. Each tenant has its own dedicated namespace and basically runs its workloads in that namespace. We'll talk more about how Kubernetes namespaces can be leveraged for resource and workload isolation. Istio then auto-injects Envoy into each workload, and together with the Istio control plane it provides the security, traffic management, and observability you need to manage these machine learning workloads. Limin is now going to talk about how this proposal works, using Kubeflow as a case study.

Yeah, so I'm going to do a case study on Kubeflow to show you how you can use Istio to manage multi-tenant machine learning workloads. Kubeflow is a machine learning toolkit for Kubernetes, and Kubeflow Pipelines is a platform for building, deploying, and managing machine learning workflows. A typical Kubeflow pipeline includes the following steps. First, you create a Kubernetes cluster. Then you prepare the training data by deploying workloads — like TensorFlow analyze and TensorFlow Transform jobs — to analyze, clean, and pre-process the training data. The next step is to train your machine learning model, which you can do by deploying TensorFlow training jobs. And the last step is to serve your model by deploying TensorFlow Serving jobs. I just use TensorFlow as one example here; Kubeflow supports any machine learning framework, and TensorFlow is just one of them.

In a multi-tenancy scenario, each tenant's machine learning workloads are managed within a dedicated Kubernetes namespace. For example, user Alice's resources should always be deployed in a namespace called kubeflow-alice. Kubernetes RBAC can be used to control operator operations — for example, who can deploy the machine learning workloads, who can run them, and who can configure them, such as setting quotas, access control policies, or traffic policies.
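As a sketch of that operator isolation — reusing the kubeflow-alice namespace and alice@foo.com user from the talk's examples, with everything else illustrative — the Kubernetes RBAC configuration might look like this:

```yaml
# A minimal sketch of per-tenant operator isolation with Kubernetes RBAC.
# The resource list is illustrative; a real tenant role would be tuned
# to the workloads the tenant actually deploys.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-operator
  namespace: kubeflow-alice          # scoped to this tenant's namespace only
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["pods", "services", "deployments", "jobs"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: alice-tenant-operator
  namespace: kubeflow-alice
subjects:
- kind: User
  name: alice@foo.com                # Alice can operate only inside kubeflow-alice
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-operator
  apiGroup: rbac.authorization.k8s.io
```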
To secure and isolate each tenant's communication, we can use Istio's authentication and authorization features. Istio provides mutual TLS for transport authentication, so data is encrypted in transit. Istio provides a strong workload identity, in the form of a Kubernetes service account: each workload is assigned a cryptographically signed certificate — an X.509 certificate — for its identity, and Istio is responsible for key and certificate management and distribution. Istio also provides end-user authentication, specifically JWT authentication, for any OpenID Connect provider — for example, Google or Facebook.

And Istio provides identity-based authorization, supporting service-level and namespace-level segmentation at both the HTTP and TCP layers. Istio supports both service-to-service and end-user-to-service authorization. Istio's authorization policy language is role-based access control plus conditions, so it gives you both good usability and flexibility — it's flexible because you can define custom conditions using Istio attributes. Both Istio authentication and authorization are enforced locally on the Envoy proxy, so they have high performance.

Now let's take a look at how Istio's end-user authentication feature can be used to secure and isolate user access to machine learning workloads. Consider a scenario where an end user tries to access a machine learning workload, like a serving job. The end user's request arrives at the Istio ingress, and the ingress forwards the request to a service called the Istio auth service. If there is no end-user credential attached, the auth service redirects the user to log in with an identity provider — for example, Google login or GitHub login. If the login is successful, it returns a JWT, which we call a request context token, or RC token. The RC token contains all the information about the source — for example, the source IP — and also the end-user credential. The RC token is forwardable between workloads, and it is a short-lived JWT that is valid only inside the Istio mesh. That gives it very nice security characteristics, because the token cannot be impersonated outside the mesh. The RC token is validated locally by the Istio proxies running in front of the machine learning workloads, and once the token is validated, the user is authenticated to access the machine learning workload.

The next step is authorization. Istio's authorization feature can be used to authorize both the channel identity and the end-user identity. Some typical use cases: it can authorize a developer to access a Jupyter notebook, and it can authorize an end user to access a model serving job. I show an example of an Istio authorization policy here. On the left side, we define a role called alice-serving-role, which allows reading the serving workload in the kubeflow-alice namespace. So the policy says: if the request comes from the Istio ingress, and the JWT email claim is alice@foo.com, you are allowed to access the serving job in the kubeflow-alice namespace.
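The slide itself isn't reproduced in this transcript, but a sketch expressing the same intent might look like the following, written against Istio's newer security API (at the time of this talk, the same rule would have been a ServiceRole plus ServiceRoleBinding). The workload label, the ingress gateway service account, and the Google issuer are assumptions:

```yaml
# Sketch only: validate end-user JWTs on the serving workload.
# Issuer/jwksUri shown for Google; any OIDC provider works.
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: serving-jwt
  namespace: kubeflow-alice
spec:
  selector:
    matchLabels:
      app: serving                   # hypothetical workload label
  jwtRules:
  - issuer: "https://accounts.google.com"
    jwksUri: "https://www.googleapis.com/oauth2/v3/certs"
---
# Sketch only: allow reads of the serving workload when the request
# arrives via the ingress gateway and the JWT email claim is Alice's.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: alice-serving
  namespace: kubeflow-alice
spec:
  selector:
    matchLabels:
      app: serving
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account"]
    to:
    - operation:
        methods: ["GET"]
    when:
    - key: request.auth.claims[email]
      values: ["alice@foo.com"]
```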
You can use Istio to monitor machine learning workloads for each tenant. Istio lets you plug in any monitoring backend, such as Stackdriver or Prometheus, and you can define policies to configure which metrics and stats you would like to log. Monitoring backends like Stackdriver provide isolation between tenants: for example, you can specify an access control policy such that user Alice only sees logs and metrics in the kubeflow-alice namespace.

Istio's traffic management feature provides a lot of nice functionality, such as traffic splitting, load balancing, retries, and failover. For example, you can configure policies such that the frontend namespace sends 95% of traffic to the v1 service, and the remaining 5% of traffic is sent to the v2 workloads. You can also do traffic splitting based on the request content — for example, a request header. You can say: if the request comes from an Android device, send it to the v1 workload; if it comes from an Apple device, send it to the v2 workload. By doing so, you can achieve things like automatic rollout of new versions, A/B testing of different machine learning models, separation of staging and production environments, and a lot more. All of this can be configured through traffic policies — I just showed a traffic policy here. No application changes are required, and there is no traffic disruption.
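The traffic policy on the slide isn't reproduced here, but a sketch of the two examples just described — a header-based routing rule plus a 95/5 weighted split — might look like this; the service name, namespace, and version labels are illustrative:

```yaml
# Sketch only: define two model versions as routable subsets.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: serving
  namespace: kubeflow-alice
spec:
  host: serving
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
# Sketch only: route Android clients to v1 by header match;
# everything else gets a 95/5 weighted split between v1 and v2.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: serving
  namespace: kubeflow-alice
spec:
  hosts:
  - serving
  http:
  - match:                           # content-based routing on a request header
    - headers:
        user-agent:
          regex: ".*Android.*"
    route:
    - destination:
        host: serving
        subset: v1
  - route:                           # default route: weighted traffic split
    - destination:
        host: serving
        subset: v1
      weight: 95
    - destination:
        host: serving
        subset: v2
      weight: 5
```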
Now I'm going to hand the mic back to Wenchen to show a demo. Thank you.

All right, so next we are going to play a short demo of how Istio and Kubeflow manage multi-tenant Jupyter notebooks. This demo was prepared by Kunmin, a Kubeflow developer — I just want to acknowledge and thank him for preparing it. I'm going to narrate it. First, we show that there is a centralized UI, as you can see, that manages all the notebooks. Sorry, it's so fast — let me start over. So this shows the centralized UI, and we're trying to access a notebook as a different user. We're supposed to see access denied, which is what you're seeing here. Next, the user, Kunmin, creates his own workspace by creating a profile under his own name — those are the steps to do that. After you create the profile, you deploy it, then switch to your own user, and you can launch a notebook instance. Now you connect — right, successful. As simple as that. And here is all the information and resources about the Istio and Kubeflow communities. If you are interested in contributing to either community, please join us — you are more than welcome. That's all. Any questions? Let me give him a microphone.

Hello, thanks for sharing all this. I just want to know how Istio and Kubeflow are running in Google's production, and what the scale of usage is. Are they all running using this, or how is this used in production?

That's a great question. Istio has recently released 1.2, which basically means it's production ready. But Kubeflow is still at the alpha stage — anyone here from the Kubeflow team? So we'll probably have to wait until Kubeflow launches 1.0; then we can tell you what the production story at Google is going to be. Right, this is in alpha mode, so feel free to try it and give us feedback — but the Istio features it relies on are already production ready. Right. Any other question?

I'm sorry, my English isn't very good. I want to ask: because it's a sidecar, and it intercepts traffic like that, is its network performance really good? Is it suitable for this kind of environment?

That's a great question. So the basic question is about the Envoy sidecar and how good its performance is, particularly for network functionality. It depends on your use case and on the performance requirements of your application. We did some performance benchmarks, and Envoy's performance is actually pretty good — roughly equivalent to NGINX, which is a high-performance proxy with a good reputation. The latency overhead we measured is about 100 microseconds, and the CPU cost is at a similar level. So if your application can tolerate 100 microseconds of overhead per hop, I think that's fine. But if your application is something like Cassandra, where each transaction is itself at the microsecond level, then that's something you might need to consider in terms of overhead. Does that answer your question? Thanks.

That's also a good question. So the question is whether it's possible to provide hardware isolation to improve the performance, and whether Google provides that. I think that's a good idea, but I don't think I'm in a position to speak to that. Any other questions? All right, then. Thank you all for attending this session.