Hi, I'm Amy, and I work at Google on the cloud platform. In this talk, I'll first discuss what kinds of issues come up when you move your machine learning workflow from development to production, and what approaches and design patterns can help. Then I'll discuss how Kubeflow Pipelines operationalizes many of these useful patterns. And finally, I'll show a couple of example pipelines that show some of these patterns in action.

Launching the first proof-of-concept version of a machine learning system is usually fairly easy. But when you try to productionize and scale out, typically what was built as the prototype is only a small piece of what you need to pay attention to, and it can be hard to scale out and keep your system operating continuously. I should add that I'm using "productionize" to mean making a system more robust; that doesn't necessarily mean it needs to be external facing.

Many of you have probably seen this graphic before, but I like it, so I included it again. This is the common perception when thinking about a machine learning application: many of us anticipate that the main challenge is going to be getting the models working properly and accurately. But the reality is that building the model is only a small part of what you need to be paying attention to.

So why do things become so much harder when moving to production? Here's an incomplete list of some of the reasons; these are things we hear from customers. First, data cleaning and processing becomes very hard at scale. Dealing with data, cleaning it, getting it to where it needs to be, feature engineering, et cetera, is a large part of the overall effort required to put an ML system into production. It can also be hard to scale out training and serving infrastructure to make sure that your system has sufficient resources when it needs them, and can scale back down when they're not required. Another set of problems can arise from issues like model or data drift, or training/serving skew. Your production infrastructure needs to be able to detect and handle situations where a model is no longer sufficiently accurate, where new data indicates that retraining is necessary, or where online prediction data is not consistent with the feature engineering done for model training. In a production environment, access control and security become more important as well. And of course, there's lots more.

So let's look at some patterns and practices that can help address these problems. Here is a high-level and incomplete list. First, formalize your machine learning workflows: for production, move away from stitched-together notebooks or monolithic scripts. This suggestion will probably come as no surprise, given the title of this talk. Equally important, your machine learning workflows should behave in the same way across environments. Workflow execution should also support elasticity; that is, scale out resources when they're needed and scale them down when they're not. Designing for composability, modularity, and reuse of workflow building blocks will ensure that you can reliably reproduce and rerun your workflows. Similarly, your infrastructure should support workflow monitoring, versioning, and caching; this typically requires making ML workflow metadata explicit. The data scientists on your team will probably be prototyping in notebooks, and you need well-defined processes to capture that work and move it out of notebooks for production use. Mechanisms for supporting collaboration and role-based access control also become important.
Informal methods of providing team members access to your notebooks, et cetera, won't scale anymore.

So that was a useful list. But how do you operationalize these patterns and actually use them? This is where Kubeflow Pipelines comes in. I'll briefly introduce it, then talk about how its features and capabilities support these patterns and practices, and then show a couple of examples.

Kubeflow Pipelines is part of the Kubeflow project, which features in many of today's talks. Its goal is to make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere. Kubeflow Pipelines, which you'll often hear abbreviated as KFP, is a platform for building and deploying machine learning workflows. It runs on a Kubernetes cluster, and its pipeline steps are container-based. It can be installed with the rest of Kubeflow, or in so-called standalone mode without the other parts of Kubeflow. For the examples I'll show later, I'll be using a standalone installation on a GKE cluster, where GKE is Google's hosted Kubernetes. Kubeflow Pipelines has a Python SDK and an interactive UI; you'll see both in the examples coming up. Kubeflow Pipelines supports two Python SDKs, KFP and TFX. In the examples I'll show for this talk, I'll be using the KFP SDK.

So back to this question: how can Kubeflow Pipelines play a part in operationalizing helpful MLOps design patterns and practices? First, of course, Kubeflow Pipelines lets you formalize and automate your machine learning workflows. You can think of a pipelines platform as the backbone of a production ML system. A typical workflow might look like the one on this slide, with experimentation and prototyping stages, as well as automation of model evaluation, model deployment, monitoring, and retraining. By specifying a workflow as a pipeline, rather than building monolithic scripts or running a series of notebook cells, we can automate, track, and reproduce the workflow more easily, debug problems, and reuse parts of a workflow elsewhere.

More is needed, though, than just workflow orchestration infrastructure. Explicit use of pipeline metadata is another key to effective MLOps. Once we can treat workflow artifacts, things like models and datasets, as first-class citizens with defined schemas, we can automatically track and reason about their use. Kubeflow Pipelines automatically logs to a metadata server during pipeline execution, and this, in turn, enables automatic lineage tracking and supports features like reproduction of previous pipeline runs and comparisons across runs.

An ML workflow should also be reproducible and portable, so that it runs the same way in different environments. Here are some of the ways that KFP helps support that. Kubeflow Pipeline steps are container-based, so they'll run the same anywhere. Similarly, Kubeflow Pipelines can be installed anywhere that a Kubernetes cluster can be set up. This lets Kubeflow Pipelines leverage Kubernetes autoscaling, and allows pipeline tasks to create and manage Kubernetes resources; I'll show an example pipeline that takes advantage of that in just a few moments. KFP also allows pipelines to be versioned, so it's clear which version of a given pipeline was run when, and it automatically logs metadata and tracks pipeline artifacts during execution. With KFP, it's also straightforward to clone a pipeline and run it again, or to retry a pipeline.
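To make that "specify a workflow as a pipeline" idea concrete, here is a minimal sketch of a two-step pipeline defined with the KFP (v1) SDK. The step names, container images, file paths, and parameters are hypothetical placeholders, not the example pipelines shown later in this talk.

```python
# A minimal sketch of formalizing a two-step workflow as a KFP (v1 SDK)
# pipeline. Step names, images, and parameters are hypothetical placeholders.
import kfp
from kfp import dsl


def preprocess_op(data_path: str) -> dsl.ContainerOp:
    # Each pipeline step is a container that KFP runs as a pod on the cluster.
    return dsl.ContainerOp(
        name='preprocess',
        image='gcr.io/my-project/preprocess:latest',   # hypothetical image
        arguments=['--data-path', data_path],
        file_outputs={'processed': '/tmp/processed_path.txt'},
    )


def train_op(processed_path: str) -> dsl.ContainerOp:
    return dsl.ContainerOp(
        name='train',
        image='gcr.io/my-project/train:latest',        # hypothetical image
        arguments=['--input', processed_path],
    )


@dsl.pipeline(name='bikes-weather', description='Train a rental-duration model')
def bikes_pipeline(data_path: str = 'gs://my-bucket/bikes_weather'):
    preprocess = preprocess_op(data_path)
    # Wiring one step's output into the next defines the dependency in the DAG.
    train_op(preprocess.outputs['processed'])


if __name__ == '__main__':
    # Compile to an archive that can be uploaded via the UI or the SDK client.
    kfp.compiler.Compiler().compile(bikes_pipeline, 'bikes_pipeline.tar.gz')
```

The compiled archive is what gets uploaded and versioned in the KFP UI, which is what the next set of features builds on.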
KFP supports step-level caching as well, which makes it really straightforward to experiment with different pipeline variants or to debug.

Support for modular design and composable pipeline building blocks is also important, so that pipeline steps can be plug-and-play, support easier debugging, and can be reused and shared. Kubeflow Pipelines allows users to share components (by components, I mean specifications of pipeline steps) as well as pipeline definitions. Components can be compiled to YAML format, put under version control, and loaded from their source URLs, as I'll show in one of the examples. The KFP SDK also makes it easy to create pipeline components without needing to use Docker directly, which is often helpful for data science teams, particularly in the prototyping phase. I'll show that in an example as well. And while I won't have time to show it in this talk, there's now multi-user support for KFP when you install it as part of Kubeflow.

Given that pipeline components are composable and reusable, the next step is to provide pre-built building blocks that guide construction of canonical ML workflows and reduce the need to build your workflows from scratch. The TensorFlow Extended (TFX) project provides a number of such components and libraries, and I'll show an example that uses one. In addition, pipeline steps can call out to any service, so we can provide pre-built components that wrap ML services. There are many such components in the KFP repo, and more are being added all the time.

It's also important to support processes for moving from notebook prototyping and experimentation to production. Part of this is the ability to easily create and run pipelines from a notebook, and in one of the examples I'll show how straightforward it is to do this. There are, of course, more aspects to converting from notebooks than just using the SDK in a notebook; for example, see the Kale talk in this session.

Another MLOps pattern relates to supporting tooling around managing pipeline runs. One important category is making sense of execution results. This includes the ability to organize pipeline runs into semantic groupings, which KFP calls experiments, and to easily compare and visualize results across runs. KFP components can be designed to output metadata that renders as visualizations in the dashboard. And you can spin up a TensorBoard server right on the cluster by including a pre-built component, that is, a pre-built pipeline step, when you build your pipeline; I'll show that in one of my examples. Another important tooling category is CI/CD: for example, when pipelines and components are changed, new versions are automatically built and tests are run. I won't have time to cover that in this talk, but there are examples online, and another talk in this session will delve into this in more detail, I believe.

So now let's take a brief look at a couple of KFP pipelines that support some of the design patterns I've been showing. Both of these examples use the same dataset, which is a log of a few years' worth of London bike rental data, combined with information about the local weather on each day. The machine learning task in both cases will be to train a Keras model to predict rental duration. The model details aren't important for this talk, though. For both example pipelines, I'll be using a so-called standalone installation of Kubeflow Pipelines, running on a GKE cluster.
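Before getting into the examples, here is a rough sketch of the component sharing pattern I just mentioned: loading compiled component specs from a source URL or from a version-controlled file with the KFP SDK. The URL, file path, and parameter names below are placeholders, not the actual components used in the examples.

```python
# Sketch: loading shared, pre-built components from their compiled YAML specs
# (KFP v1 SDK). The URL, file path, and parameter names are placeholders for
# whatever the real component specs define.
from kfp import components, dsl

# A component spec checked into a repo can be loaded from its raw URL...
tensorboard_op = components.load_component_from_url(
    'https://raw.githubusercontent.com/my-org/my-repo/master/components/'
    'tensorboard/component.yaml')

# ...or from a local file kept under version control.
train_op = components.load_component_from_file('components/train/component.yaml')


@dsl.pipeline(name='component-reuse-example')
def reuse_pipeline(log_dir: str = 'gs://my-bucket/logs', epochs: int = 5):
    # The loaded factories are called like functions; their parameters come
    # from the inputs declared in each component's YAML spec.
    tensorboard_op(log_dir=log_dir)
    train_op(epochs=epochs, log_dir=log_dir)
```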
I'll first show an example that highlights some of the ways that a Kubeflow pipeline can leverage its underlying Kubernetes cluster. This pipeline uses the Keras Tuner to do hyperparameter tuning, and it uses the tuner's distributed mode to let the workers run concurrently on the cluster, setting up each tuner worker as a Kubernetes job. And of course, the cluster will auto-scale as needed to support the number of tuner workers specified.

This is what the graph for this pipeline looks like. The pipeline uses a pre-built component to spin up a TensorBoard server on the same cluster, and then the K-tune step launches the tuner workers and controller as Kubernetes jobs to do the hyperparameter search; when they're all finished, it returns the N best parameter sets. Once the tuning search is completed, the pipeline trains N full models using the best parameter sets, training them in parallel using Kubeflow Pipelines' loop construct. While they're training, we can use the TensorBoard server to monitor the training runs. After training, each full model is evaluated, and if it is of sufficient accuracy (there's a condition check here), it's deployed. The models are deployed using TensorFlow Serving, where the TF Serving services are also spun up on the same cluster.

So we're using the Kubernetes cluster not only to run the pipeline, but also to launch jobs that perform a distributed hyperparameter search, to serve models with TensorFlow Serving, and to run the TensorBoard server. And of course, the cluster will auto-scale when we need it to, if we want to use more Keras Tuner workers or train more models concurrently.

Now let's see what it looks like to upload and run a new version of the pipeline. I'll click "Upload version" to upload a compiled pipeline archive file; in the next example, we'll see what the pipeline specification looks like. Uploading new versions of a given pipeline, rather than just creating a new pipeline each time, lets me track and group related pipelines and their runs. Then I'll click "Create experiment" to run this pipeline under the grouping of a given experiment; this lets me semantically organize the pipeline runs. Then, on the "Start a run" page, I'll enter the pipeline run parameters. Some have default values I'll keep, and others I will change. While I'm not showing it here, I can also clone or retry an existing pipeline run.

One thing I'll do is specify that full models should be trained using the best three parameter sets returned from the Keras Tuner. The default is to train the best two, which gives the pipeline graph shown in the top image; when I change it to three, I get the pipeline graph shown in the bottom image. This is made possible by using the pipeline's so-called ParallelFor construct to launch the N training jobs in parallel. Because I've configured my cluster to support auto-scaling, I can change both the number of Keras Tuner workers and the number of concurrent training jobs and let the cluster scale out when it needs to.

Next, I'll start the run. Note that one of the steps is configuring a TensorBoard server that will run on the cluster. The K-tune step manages and coordinates the Keras Tuner workers, which are Kubernetes jobs. While the tuner workers are running, we can see them in the GKE dashboard. The K-tune step waits for the workers to complete all their trials and outputs the top N results; in this case, we requested the top three. Then a full training job is launched for each of the best parameter sets.
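As a rough sketch of the fan-out and gating pattern just described, here's what the ParallelFor loop and the accuracy condition might look like in a pipeline definition. The component specs, their parameter and output names, and the accuracy threshold are hypothetical stand-ins, not the actual code of this example pipeline; the walkthrough of the run continues below.

```python
# Sketch of the fan-out / gating pattern (KFP v1 SDK). Component specs and
# their parameter/output names are hypothetical stand-ins.
from kfp import components, dsl

# Assume the step implementations come from shared component YAML specs.
ktune_op = components.load_component_from_file('components/ktune/component.yaml')
train_op = components.load_component_from_file('components/train/component.yaml')
eval_op = components.load_component_from_file('components/eval/component.yaml')
serve_op = components.load_component_from_file('components/serve/component.yaml')


@dsl.pipeline(name='ktune-train-deploy')
def tuning_pipeline(num_best: int = 2):
    # The tuning step coordinates the Keras Tuner worker jobs and outputs a
    # JSON list holding the N best parameter sets.
    tune = ktune_op(num_best_hparams=num_best)

    # dsl.ParallelFor expands into one branch per item in the list output, so
    # the N full training runs execute concurrently on the cluster.
    with dsl.ParallelFor(tune.outputs['best_hparams']) as hparams:
        train = train_op(hparams=hparams)
        evaluation = eval_op(model=train.outputs['model'])

        # dsl.Condition gates deployment on the evaluation result
        # (0.9 is an illustrative threshold).
        with dsl.Condition(evaluation.outputs['accuracy'] > 0.9):
            serve_op(model=train.outputs['model'])  # deploy via TF Serving
```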
While the training jobs are running, or after they've completed, we can launch the created TensorBoard server, which we've configured to include all three of the full training runs. This lets us monitor model graph and metrics data, and lots more. It looks like parameter set 0 of the three is the best one, and it looks like perhaps we overtrained the models as well. We can look at the outputs of the K-tune step in the pipeline graph to see which parameter set was used, and this information has been preserved in Google Cloud Storage as well.

For my second example, I'll walk through parts of this notebook. This example builds on the training and serving components used in the first example and adds some additional ones. It shows how to use the TFX TensorFlow Data Validation library, often abbreviated as TFDV, to detect data drift between different versions of a dataset. If drift is over a given threshold, the model is retrained. This is a two-step process: first, statistical information needs to be generated for both the old and new datasets; then the stats need to be analyzed. We'll build these in the notebook as two different pipeline components, both using the TFDV library. Then we'll build a new pipeline that uses these components plus the training and serving components from the previous example. While I won't have time to cover it, the notebook also shows how to set up event-triggered pipeline runs. I'll share a link to the notebook at the end of this talk.

I've done some setup already, so let's skip to building the new components. I'm going to create Python function-based components: I'll define a function that implements each pipeline step, then convert it to a component. The first function generates the stats for a dataset. The function returns a string that holds the path to the generated stats info; when the function is compiled to a component, it will have an output, called stats path, that other components can consume as input. We'll see this in a minute. The given dataset is large, and it can be hard to do this analysis in memory, so the function allows the option to launch the analysis as a Cloud Dataflow job (Dataflow is Google's hosted Apache Beam) if it needs to scale out. This is a good example of using pipelines to orchestrate calls to other cloud services.

Now I'll convert this function to a pipeline component. When I do this, I need to specify the base container image to use. The default is a Python 3.7 image, but I'll use one that already has the TFDV libraries installed, to save time. As part of the process, I'm generating a component YAML file, which can be put under version control and shared with others. Now I'll do the same for the second new component. It also uses the TFDV library and compares two stats files; it returns a value indicating whether there was an anomaly over the given threshold.

Now that I have the two new components, I'll build a pipeline that uses them, using the KFP SDK. In addition to the new components, I'm defining pipeline ops from four pre-built components: the TensorBoard component I mentioned earlier, which spins up a TensorBoard server on the cluster, and the training, serving, and model eval components from the previous example. These are all checked into GitHub, and I can load them from their URLs; note that I need to use the raw URLs. We'll use the TFDV op twice, for both the training and test datasets. Then the TFDV drift step takes as input the stats generated by the TFDV step run on the training data.
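To make the function-to-component flow concrete, here's a rough sketch of what the second component (the stats comparison) and its conversion might look like. The drift check shown, the feature name, the base image, and the file names are simplified, hypothetical placeholders rather than the notebook's actual code; the pipeline walkthrough continues below.

```python
# Sketch of a Python function-based component for the stats comparison, plus
# its conversion to a reusable component (KFP v1 SDK). The drift check,
# feature name, base image, and file names are hypothetical placeholders.
import kfp.components as comp


def compare_stats(old_stats_path: str, new_stats_path: str,
                  threshold: float) -> str:
    """Compare two TFDV stats files and report whether drift was detected."""
    # Imports live inside the function, since it runs in its own container.
    import tensorflow_data_validation as tfdv

    old_stats = tfdv.load_statistics(old_stats_path)
    new_stats = tfdv.load_statistics(new_stats_path)

    # Infer a schema from the old stats and attach a drift threshold to one
    # (hypothetical) categorical feature via the L-infinity drift comparator.
    schema = tfdv.infer_schema(old_stats)
    tfdv.get_feature(schema, 'weekday').drift_comparator.infinity_norm.threshold = threshold

    anomalies = tfdv.validate_statistics(
        statistics=new_stats, schema=schema, previous_statistics=old_stats)
    # Any reported anomaly (including a drift violation) signals retraining.
    return 'true' if anomalies.anomaly_info else 'false'


# Convert the function to a component. The base image should already have the
# TFDV libraries installed, and the generated YAML spec can go under version
# control and be shared with others.
compare_stats_op = comp.func_to_container_op(
    compare_stats,
    base_image='gcr.io/my-project/tfdv-base:latest',          # hypothetical
    output_component_file='compare_stats_component.yaml',
)
```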
If sufficient drift is detected (this is the conditional expression), we'll retrain the model on the new data. Then a model eval step determines whether the accuracy of the new model is sufficiently good, and if it is (that's the second conditional), the model is deployed using TensorFlow Serving. The last line of this pipeline definition indicates that the training step should run on a GPU-enabled node in your cluster.

Now we're ready to compile and run the pipeline. After we compile it, we need to instantiate a client object using the URL of the KFP installation. Then we'll first create an experiment under which to run the pipeline. Next, we'll upload the pipeline, get its ID, and then launch a pipeline run. You can also combine the upload and run into a single call, if you like. When we run the pipeline, we can pass it the input arguments to use. Once the run is launched, we can view it in the Kubeflow Pipelines dashboard. Then, several hours later, the result looks like this, in the case where we do detect drift between the old and new datasets and decide to retrain the model. In the serving step, you can see a listing of the YAML used to spawn the TF Serving deployment and service.

So in this talk, I discussed some of the problems that can arise when you're trying to move a machine learning workflow from development to production. I showed some MLOps patterns that can help, and talked about how Kubeflow Pipelines helps operationalize these patterns and practices. You can think of the capabilities provided by Kubeflow Pipelines as largely falling into three buckets: ML workflow orchestration; the ability to share, reuse, and compose components and pipelines, and to run them across multiple clouds, on-prem, or anywhere that Kubernetes itself runs; and rapid, reliable experimentation, including metadata tracking and parallel experiments. No matter which machine learning framework you're using, these concepts are critical for MLOps.

So thanks very much for your time. Here are the links to the code for the two examples I showed.