Hi, welcome to my talk. I'm Shotaro Kohama, a machine learning platform engineer at Mercari. Today I'm going to talk about efficient model exploration and continuous delivery with Polyaxon and Kubeflow. Let's get started. First of all, I will explain what Mercari is. Mercari is a customer-to-customer marketplace where customers can sell their items to other customers. Mercari is available in the US and Japan. Today I'm going to talk about the work of the Mercari US machine learning platform team. One of our unique challenges is pricing. For sellers, if the item price is too high, the item takes a long time to sell. On the other hand, if the price is too low, the seller cannot make a profitable sale. To help solve this problem, we use machine learning models in a system we call the price guidance system. The price guidance system consists of two features. The first is price suggestion: it recommends a viable price range when a customer creates a listing on our marketplace. The second is smart pricing: when creating a listing, the customer can set a floor price, and the smart pricing feature gradually and automatically lowers the item price toward that floor, while the item is promoted to potential buyers. To decide the price range and the price updates, we use a machine learning model that takes the item title, description, category, and so on, and suggests a price. Behind this price guidance system, we use machine learning platform components, and in the latter part of this presentation I will dive into those components. Here is today's agenda. First, I will talk about the general machine learning project life cycle. Then I will explain how we use Polyaxon and Kubeflow Pipelines at Mercari. Lastly, I will talk about how we built our machine learning platform to use Polyaxon and Kubeflow Pipelines efficiently.
Let's move on to the first part, the machine learning development life cycle. Generally speaking, machine learning projects are highly iterative. First, we need to design the task to solve with ML. Once we design the task, we start collecting training data. When we have enough training data, we can start the model exploration phase. Through model exploration, we decide which model to deploy to production. Then we can start developing a microservice to serve the model and integrate it with the other systems. Once we finish deploying the model to production, we can get actual feedback from the production cluster. Then we go back to the data collection phase: we can retrain the ML model with the actual feedback data, and this iteration keeps going. From a machine learning platform perspective, how fast we can run this iteration is the key to the success of an ML project, and we believe we can accelerate it by automating manual processes with open source MLOps and DevOps tools. At Mercari, we use these open source tools for acceleration. For the model exploration phase, we use Polyaxon, an MLOps tool that supports scalable and reproducible model exploration. For continuous training, we use Kubeflow Pipelines, which allows us to manage end-to-end ML workflows on top of Kubernetes. And for continuous delivery, we use Spinnaker. Spinnaker can trigger a deploy pipeline when a new image is pushed to the registry, so we use it for continuous delivery. In this talk, I will focus on the Polyaxon and Kubeflow Pipelines parts, because at KubeCon North America 2020 the Snap team gave a great presentation about how to use Spinnaker for MLOps, and we deploy our ML microservices to production in a similar way. Let's move on to the Polyaxon section.
Polyaxon is an MLOps tool that supports scalable and reproducible model exploration. Polyaxon has a YAML specification called the Polyaxonfile. In the Polyaxonfile, we can define the steps to build a Docker image for training, the commands to run the model training, and which parameters we want to explore in a hyperparameter tuning run. Polyaxon also has a UI to visualize the results of hyperparameter tuning runs. As you can see in the lower-left image on this slide, we can see which parameters were the best for a given metric. And because the Polyaxonfile has all the information needed to reproduce an experiment, other engineers can easily reproduce it, and it also allows us to take over a project easily. That's a big benefit. Here are the steps to run a hyperparameter tuning job on Polyaxon. First, we define a Polyaxonfile. Next, we create or modify the code to train the model, deciding which library or model architecture to use. The third step is to upload the Polyaxonfile by calling the Polyaxon CLI. Polyaxon then builds the Docker image to run the training code and schedules the hyperparameter tuning jobs in parallel. That's how to run a Polyaxon job. In this flow, my favorite point is that Polyaxon builds the Docker image for us. With this feature, a developer doesn't have to build a Docker image manually, or wait for continuous integration every time they modify the code. It sounds like a subtle thing, but for me it's very important for keeping my concentration during development: it prevents interruptions in the development flow, so that's my favorite part. At Mercari US, we have been using Polyaxon for more than two years. With fewer than 10 developers, we've created 175 projects and run about 87,000 experiments.
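To make the idea concrete, here is a minimal sketch of what such a Polyaxonfile might look like. The exact schema depends on the Polyaxon version in use, and the image, build steps, parameter name, and command below are hypothetical examples, not our actual configuration:

```yaml
# Hypothetical Polyaxonfile sketch (pre-1.0-style schema; field names
# may differ in your Polyaxon version).
version: 1
kind: group            # a group runs one experiment per parameter combination
hptuning:
  matrix:
    learning_rate:     # hyperparameter to explore
      values: [0.001, 0.01, 0.1]
build:
  image: python:3.9    # base image; Polyaxon builds the training image from it
  build_steps:
    - pip install -r requirements.txt
run:
  cmd: python train.py --learning-rate={{ learning_rate }}
```

Because everything needed to rebuild the image and rerun the search lives in this one file, another engineer can reproduce the experiment from the file alone.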
We also have metadata for all of these experiments in Polyaxon, so we can see which parameters were used for each experiment. That's amazing. Next is Kubeflow Pipelines. Kubeflow Pipelines is an open source workflow engine built especially for machine learning workflows. Compared to other workflow engines, Kubeflow Pipelines has a metadata store. With the metadata store, Kubeflow Pipelines records the outputs of each stage, so we can store which parameters we used in the training step, as well as the results of the evaluation step. The Kubeflow Pipelines UI can visualize those metrics, so we can see, for example, the confusion matrix from the evaluation step in the web UI. Kubeflow Pipelines also has a Python SDK. With the Python SDK, we can write a workflow as a Python DSL, and the SDK also provides a way to define reusable components, so we can create one component and share it among pipelines. At Mercari US, we use Kubeflow Pipelines to achieve continuous training and continuous delivery. Kubeflow Pipelines automates the manual operations around the model exploration and evaluation phases and around deploying the ML model to production. For example, in one of our workflows, the first step submits a Polyaxon job to collect the training data. The next step submits a Polyaxon job to train the model, and the third step submits a Polyaxon job for model evaluation. In the next step, if the metrics look good, the Kubeflow pipeline builds a Docker image to serve the newly trained model and pushes it to the Docker registry. Spinnaker then detects the new Docker image and deploys it to production. That's how we achieve continuous training and deployment. With this approach, we can avoid re-implementation in the transition from the model exploration phase to the continuous training phase.
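The continuous-training flow just described can be sketched in plain Python. This is only an illustration of the control flow, not our real pipeline: in production each function is a pipeline step that submits a Polyaxon job, and every name, value, and the accuracy threshold below are assumptions:

```python
# Hypothetical sketch of the continuous-training flow. In the real
# system these are Kubeflow Pipelines steps that submit Polyaxon jobs;
# here they are stubs so the control flow is visible.

def collect_data():
    # Real pipeline: submit a Polyaxon job that exports training data.
    return "dataset-v1"

def train(dataset):
    # Real pipeline: submit a Polyaxon training job.
    return {"model": f"model-for-{dataset}", "params": {"lr": 0.01}}

def evaluate(model):
    # Real pipeline: submit a Polyaxon evaluation job; metrics land in
    # the Kubeflow Pipelines metadata store.
    return {"accuracy": 0.93}

def build_and_push_image(model):
    # Real pipeline: build a serving image and push it to the registry,
    # where Spinnaker detects it and deploys to production.
    return f"registry.example.com/serving:{model['model']}"

def continuous_training(threshold=0.9):
    dataset = collect_data()
    model = train(dataset)
    metrics = evaluate(model)
    if metrics["accuracy"] >= threshold:  # deploy only if metrics look good
        return build_and_push_image(model)
    return None  # skip deployment; keep the current production model
```

The important design point is the gate before the image build: a model that does not beat the threshold never reaches the registry, so Spinnaker never deploys it.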
For example, we don't have to modify the code that accesses the training data, or the code that records metadata, and so on. As a result, we can save local development time. Lastly, as a platform team, we built tools and set up continuous integration so that developers can write such end-to-end workflows easily. I feel the three points on this slide are very important for efficient development, so in the last section I will highlight them. The first is a monorepo for Kubeflow Pipelines: we built a monorepo to manage the Kubeflow pipelines of multiple projects and to share best practices and knowledge. The second is a YAML manifest to manage the resources on Kubeflow Pipelines and Polyaxon, which enables us to achieve infrastructure-as-code-style resource management. The third is a Kubeflow Pipelines component that submits Polyaxon jobs. We built a custom component to submit a Polyaxon job from Kubeflow Pipelines; with this component, developers don't have to write code to log in to Polyaxon and submit the job. Let's dive into the first one. Here is the structure of the monorepo. We put a Python package in it to share lightweight components. A lightweight component is one way to create a Kubeflow Pipelines component: the Kubeflow Pipelines SDK has a feature to convert a Python function into a component, so we don't have to write a Dockerfile and a component YAML file for each one. With that feature, we built our shared Kubeflow Pipelines components. The Python package also lets us share utility functions, like setting memory and CPU resources, and constants for variable names or secret names. And we set up continuous integration for this monorepo: when we push code to a branch on GitHub, continuous integration detects the changed pipelines.
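As a sketch of the lightweight-component idea: a plain Python function can be turned into a pipeline component by the SDK, with no Dockerfile or component YAML. The function name, parameters, and base image below are hypothetical examples, not our actual code:

```python
# A hedged sketch of a Kubeflow Pipelines "lightweight component":
# just a plain Python function. Names and logic are illustrative.

def compare_accuracy(new_accuracy: float, baseline_accuracy: float) -> bool:
    """Return True when the newly trained model beats the baseline."""
    return new_accuracy >= baseline_accuracy

# With the KFP v1 SDK, this function becomes a reusable component, e.g.:
#
#   import kfp.components
#   compare_op = kfp.components.create_component_from_func(
#       compare_accuracy, base_image="python:3.9")
#
# The SDK generates the component spec, so no Dockerfile or YAML file
# has to be written by hand.
```

Putting such functions in a shared package in the monorepo is what lets every project's pipeline reuse them.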
CI automatically compiles the pipelines and uploads them to the development environment, naming each pipeline version with the branch name and the last commit hash. In this way, we standardize the versioning of our Kubeflow pipelines. We have a development environment and a production environment, and only when a pull request is merged into the main branch does CI upload the pipeline to the production environment. With that, we keep only code that has passed code review in production. That's one of the benefits. The next thing is the project manifest for Polyaxon and Kubeflow Pipelines. We defined this YAML specification for resource management: CI creates the Kubeflow Pipelines experiment and the Polyaxon project from this YAML file, so we don't have to create these resources manually in the UI, and we can keep consistency between the development and production environments. The manifest also has an owners field. With this field, we can manage permissions for code review, and it enables each team to review the pull requests that modify their code. That's how we manage projects. The third thing is the Polyaxon component for Kubeflow Pipelines. Here are the steps of how the component works. First, an init container pulls the code from our private repository, using a Secret that stores the access token. Then the main container logs in to Polyaxon as a Polyaxon user, using a Secret that stores the Polyaxon token. Third, the main container submits a job to Polyaxon through the Polyaxon API. The main container then tails the logs until the job ends. Last, the main container fetches the results of the job and outputs the data to the next steps. Here is an example of the Kubeflow pipelines we use for continuous delivery and training; the right side shows the actual compiled Kubeflow pipeline. We have a blog post about how we use them for continuous training and deployment, so if you are interested, please take a look.
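For illustration, the project manifest described above might look something like this. This is a sketch of an in-house convention, not an upstream Kubeflow or Polyaxon format, and every field name and value here is hypothetical:

```yaml
# Hypothetical project manifest (our own in-house schema). CI reads this
# file and creates the matching Kubeflow Pipelines experiment and
# Polyaxon project, in both environments.
name: price-guidance
owners:
  - ml-pricing-team      # used for code-review permissions on this project
environments:
  - development
  - production
```

Because CI applies the same file to both environments, the development and production resources cannot drift apart.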
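The main-container logic of the Polyaxon component can be sketched roughly as follows. The client class below is a stub standing in for the Polyaxon API, not the real Polyaxon SDK; all names and return values are assumptions for illustration:

```python
# Rough sketch of what the Polyaxon-submit component's main container
# does: log in, submit, tail logs, fetch outputs. StubPolyaxonClient is
# a stand-in for real Polyaxon API calls.

class StubPolyaxonClient:
    def login(self, token):
        self.token = token                    # real version: auth against Polyaxon

    def submit_job(self, project, polyaxonfile):
        return "job-123"                      # real version: job id from the API

    def stream_logs(self, job_id):
        yield from ["starting", "training", "done"]

    def get_outputs(self, job_id):
        return {"accuracy": 0.93}

def run_polyaxon_job(client, token, project, polyaxonfile):
    client.login(token)                       # token comes from a mounted Secret
    job_id = client.submit_job(project, polyaxonfile)
    for line in client.stream_logs(job_id):   # tail logs until the job ends
        print(line)
    return client.get_outputs(job_id)         # passed on to the next pipeline step
```

Wrapping this once in a shared component is what spares every developer from rewriting the login/submit/tail/fetch boilerplate in each pipeline.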
That's it. Here are the takeaways from my presentation. First, Polyaxon supports model exploration in a scalable and reproducible way. Second, the monorepo and continuous integration for Kubeflow Pipelines work well for us and allow us to keep high efficiency and consistency. Third, the custom Kubeflow Pipelines component for Polyaxon enables us to move from the model exploration phase to the continuous training phase seamlessly. Thank you for listening. Let's move on to the live Q&A session. Bye.