Hi, my name is Ted Chen. I have my colleague Chin Huang here with me. We are software engineers working for the Center for Open Source Data and AI Technologies at IBM. Today, I will go over the first half of the presentation, and Chin will cover the second half.

So, today's agenda: we'll be going over feature stores, Feast, and a quick demo of Feast. Then Chin will take over, and he will cover KFServing, transformers, and a KFServing demo that shows how it can work with Feast.

Let's start with a general definition of what a feature is. Features are individual values that act as inputs to a machine learning model to predict an outcome. Raw data may need to be computed, or engineered, into training features; these engineered features are the ones actually used for training the models. Many ML development companies and organizations store features in a centralized fashion. Organizations have different requirements for a feature store, but in general it can be seen as a data management interface that enables data scientists and ML engineers to create, share, and distribute machine learning features.

There are many feature store solutions. The feature store was first introduced in Uber's machine learning platform, Michelangelo, back in 2017. It was developed to build reliable, uniform, and reproducible pipelines for creating and managing training and prediction data at scale. Before the system was built, data scientists were building models on their laptops, and engineering teams were building one-off systems to serve models in production for each project; there was no established way to deploy a model to production. Of course, Uber was not alone. Many other companies were facing similar problems: Airbnb, Spotify, Pinterest, and Twitter were all looking for ways to manage and operate their own ML models and deployment processes at scale, so they were all building their own in-house feature stores to solve their own needs. The term "feature store" has since become more generic. In 2020 and 2021, there was an explosion of managed feature stores, such as Tecton, Databricks, Vertex AI, and SageMaker, just to name a few.

A bit of history for Feast. Feast was originally founded by Gojek. Its creator, Willem Pienaar, has said Feast was developed to address data challenges at Gojek in scaling machine learning for ride hailing, food delivery, digital payments, fraud detection, and a bunch of other use cases. It was developed in 2018, open sourced in 2019, and joined the LF AI & Data Foundation in 2021. Currently, Feast is one of the most popular feature store projects on GitHub.

Feast is able to solve the following problems. First, models need consistent access to data. Machine learning systems built on traditional data infrastructure are often coupled to databases, object stores, streams, and files; as a result, any change to the infrastructure may break a system. Feast decouples models from your infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval. Second, deploying new features into production is difficult. Typically, moving your engineered features to production requires a dedicated engineering team to set up the serving infrastructure. Feast aims to streamline this process by allowing data scientists to ship their engineered features directly to the online store in production with minimal supervision.
Third, models need point-in-time correct data. ML models in production require a view of the data consistent with the one they were trained on; otherwise, future feature values may leak into the models during training, and the accuracy of these models could be compromised. Last, features aren't reused across projects. A centralized registry allows different data science teams in an organization to publish and share features across multiple projects.

Let me go over some basic terms of the Feast feature store. In short, Feast can be seen as a feature management and serving layer for your models in production. The three main concepts used by Feast, and in the industry generally, are the online store, the offline store, and the registry. The online store is used to serve feature data to models from a low-latency store in production; usually, the online store lets you query features and use them as input for real-time model prediction. The offline store is used to serve features for batch training jobs. And the registry is used to store feature metadata.

Keep in mind that Feast does not solve the following problems. First, Feast is not an ETL tool, and it is not a feature engineering tool either. Feast is not a data warehouse. Feast assumes that you have already done the feature engineering with upstream ETL tools and have stored those features in your data warehouse; Feast provides the SDK to retrieve features from your data warehouse in a consistent way. And last, Feast is not a general-purpose data catalog for your organization. Feast is purely focused on cataloging features for use in ML pipelines and systems, and only to the extent of facilitating the reuse of features.

So let's dive into the Feast infrastructure. What exactly do you need to run Feast? Prior to release 0.9, Feast had a dependency on a Kubernetes cluster. In recent releases, Feast has been simplified and can be run entirely on your laptop without any infrastructure dependencies. Minimally, you only need to run pip install feast to get started. Feast also provides a CLI tool that automatically sets up the infrastructure, the registry, and a feature repo with sample features in local mode on your own laptop. Right away, you can start using it to get historical features, run materialize to move offline features to the online store, and use the SDK to retrieve online features.
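To make that concrete, here is a minimal sketch of that core SDK workflow, assuming a Feast 0.12-era setup; the repo path, entity, and feature names are placeholders borrowed from the driver example, and depending on the release the features argument may be spelled feature_refs:

```python
from datetime import datetime, timedelta

import pandas as pd
from feast import FeatureStore

# Point the SDK at a local feature repo (e.g. one created by `feast init`)
store = FeatureStore(repo_path=".")

# Training: point-in-time correct join against the offline store
entity_df = pd.DataFrame({
    "driver_id": [1001, 1002],
    "event_timestamp": [datetime.utcnow() - timedelta(hours=1)] * 2,
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[  # named feature_refs in some releases
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

# Move the latest feature values from the offline store into the online store
store.materialize_incremental(end_date=datetime.utcnow())

# Serving: low-latency lookup for real-time prediction
online_features = store.get_online_features(
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:avg_daily_trips"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
```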
Besides local mode, Feast currently officially supports running the online store, offline store, and registry on GCP and AWS using the cloud providers' native services. For example, if you choose GCP, Feast supports Datastore as the online store, BigQuery as the offline store, and GCS for the registry.

A word on sponsors and contributors: Gojek was the original sponsor and was the major sponsor before release 0.9; Tecton has been the major contributor and sponsor for Feast since last year. IBM has been active in contributing to Feast since last year. Shopify has contributed some performance enhancements. Robinhood integrates Feast in their ML processes to give data scientists friendlier APIs. Salesforce built a multi-tenant feature store using Feast as part of their ML platform.

For those who would like to run the latest Feast feature store in a Kubernetes cluster and set up a REST API to retrieve online features, this setup can give you a minimal starting point for a production-ready Feast online store paired with your own model serving service, such as KFServing. Let me go over the topology of my feature store on the Kubernetes cluster; I will use this as the base of my demo.

Basically, I have three components deployed and running in my cluster. The online store is a Redis server. I also have a feature server that serves a REST endpoint, and a CronJob that runs materialize to move the latest features from the offline store to the online store. Besides those three components in the cluster, I store my feature definitions and feature store config in a GitHub repo; the CronJob and the feature server use the repo to initialize themselves, so they know where to find the registry, the online store, and the offline store. My offline features are stored in a Parquet file on S3, and the registry is also located on S3. Although the current release, 0.12, does not officially support this topology, it already has all the components ready, so the only thing I need to do is wrap them inside containers and deploy them to the Kubernetes cluster.

For this demo, I'm going to use a Jupyter notebook to show how to initialize Feast and use the SDK to get online features, and also how to do the same with a generic HTTP client like curl. So let me switch to my Jupyter notebook. This notebook is developed to work with the KFServing transformer, which Chin will go over later. It helps you populate a Feast online store and run the Feast online serving REST API server in the Kubernetes cluster. The demo is tested with the Feast 0.12.1 release, and all the steps to set up the cluster and run the notebook are stored in this repo here.

OK, so let me quickly go over the demo. First, you clone the repo with the config and the feature definitions, and then run pip install feast. Since I use Redis and S3, I also need to install the Redis and AWS extras, which I have already done, so I'm not going to do that again. After that, you can run the Feast CLI, which shows you a help menu to get started with. Inside the feature repository, which is the one I just cloned, there are two files. The first one is the driver repo file, the feature definition: it has the entity and the feature view, and it also tells Feast where my driver stats Parquet file is. The other is the feature store config file, which specifies where to find my registry and where my Redis server is. With these two files, I can initialize the Feast SDK; I just point the SDK at my repo, and that's it. Then you can call get historical features to get historical data, and there's my data. You can also get the online features using the SDK like this. Oh, I forgot to run materialize; materialize moves the offline data to the online store. Let me run it again. OK, we got data back. And let me use curl to get the online features. OK, so there you go. So I've shown you how to use the Feast SDK and curl to get online features.
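For reference, the request looks roughly like the sketch below, written with Python's requests for readability. The endpoint path and payload shape here are assumptions modeled on a custom feature server like ours, not an official Feast API of this release, so adjust them to match your own deployment:

```python
import requests

# Hypothetical endpoint for the containerized feature server described above
FEATURE_SERVER_URL = "http://feast-feature-server.example.com/get-online-features"

payload = {
    "features": [
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    "entities": {"driver_id": [1001, 1002]},
}

# Equivalent to: curl -X POST $FEATURE_SERVER_URL -d '<payload as JSON>'
response = requests.post(FEATURE_SERVER_URL, json=payload)
print(response.json())
```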
So this is it for my part. Chin will take over and do the second half of the presentation.

Hi, I'm Chin Huang. I'm going to talk about KFServing and how it works with the Feast feature store. KFServing is a complete solution for production ML serving. It is based on Kubernetes for its proven scalability and performance, and it aims to solve the complexity problems of model serving, such as networking, autoscaling, system configuration, and resource monitoring, particularly for data scientists. KFServing has a clean, generic interface that works with the most popular machine learning frameworks, so the user experience is simple and consistent. The project was founded by several industry leaders, including Google, Seldon, IBM, Bloomberg, and Microsoft; I guess all these companies would like to see a vendor-neutral serving platform.

The key features of KFServing start with serverless inference for ML models. Model explanation helps ensure that predictions are explainable, fair, and not biased. Pre- and post-processing help ensure that the data and its formats are properly prepared and used across the entire ML workflow. Last but not least, the ability to do canary rollouts means a new model version can be deployed without any service interruption.

Let's look at the solution stack. Kubernetes sits on top of a compute cluster, as usual. The next layer is Istio, a service mesh designed to support distributed microservices; here, it enables KFServing to handle traffic routing and ingress to the deployed models. Next up is Knative, a layer for managing cloud-native applications on Kubernetes; here, it enables KFServing to manage networking and canary rollouts, and it also helps autoscale the use of CPUs and GPUs. Finally, on top is KFServing itself, which makes use of this entire stack to serve ML workloads running in different frameworks.

Let's look at the supported frameworks and storage options. As you can see, with scikit-learn, TensorFlow, PyTorch, XGBoost, ONNX, and so on, I would say most existing and new models can be served by this solution. As for the storage options, KFServing is capable of working with S3, GCS, Azure Blob, PVC, and HTTP(S), so ML models can stay in different cloud storage systems, with no need to move them around.

The core element of KFServing is the InferenceService. It is essentially a static graph with three components that work together to handle all the requests for a single model. The predictor is the workhorse that serves inference. The explainer is an optional component that provides model explanations. The transformer is also optional; it handles pre- and post-processing for predictors and explainers. On the right-hand side, you can see the diagram: a user's request goes through pre-processing, explanation, prediction, and post-processing. Of course, this is when all components exist in a single InferenceService; the wiring and the inter-component communication are all managed by KFServing.

We'll take a closer look at the transformer next. Transformers are flexible: the transformation logic is implemented by extending the KFModel class in Python, and then a container image is created with the custom code for deployment. Currently, there are no built-in transformers in KFServing; however, there are a couple of examples, and these can be used as templates for new custom transformers. The first one is an image transformer for TorchServe; it takes raw image input data and converts it into input tensors for the model to work with. The other one is the Feast transformer, which I will go over in more detail next. The Feast transformer is all about input augmentation with real-time features, and a few things need to be developed for it to work. The first one is a custom container image with code to interact with the Feast feature store.
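As a rough illustration, here is a minimal sketch of such a transformer, assuming the KFServing Python SDK of that era; the class name, constructor arguments, feature server payload, and response schema are placeholders rather than the exact code from the repo:

```python
import kfserving
import requests

class DriverTransformer(kfserving.KFModel):
    """Sketch: augment incoming requests with online features from Feast."""

    def __init__(self, name, predictor_host, feast_url, entity_id, feature_refs):
        super().__init__(name)
        self.predictor_host = predictor_host  # KFServing forwards here after preprocess
        self.feast_url = feast_url            # Feast feature server endpoint
        self.entity_id = entity_id            # e.g. "driver_id"
        self.feature_refs = feature_refs      # e.g. ["driver_hourly_stats:conv_rate"]

    def preprocess(self, inputs):
        # REST call #1: fetch online features for the requested entity IDs
        payload = {
            "features": self.feature_refs,
            "entities": {self.entity_id: inputs["instances"]},
        }
        features = requests.post(self.feast_url, json=payload).json()
        # KFServing then makes REST call #2, sending this output to the predictor
        return {"instances": self.build_instances(inputs["instances"], features)}

    def postprocess(self, inputs):
        # Pass-through: return the predictor's rankings unchanged
        return inputs

    def build_instances(self, ids, features):
        # Assumes, hypothetically, a response shaped like
        # {"results": {feature_ref: [values...]}}; builds one row per driver ID.
        return [[features["results"][ref][i] for ref in self.feature_refs]
                for i in range(len(ids))]

if __name__ == "__main__":
    transformer = DriverTransformer(
        name="driver-ranking",
        predictor_host="driver-ranking-predictor.default.svc.cluster.local",
        feast_url="http://feast-feature-server.default.svc.cluster.local/get-online-features",
        entity_id="driver_id",
        feature_refs=["driver_hourly_stats:conv_rate",
                      "driver_hourly_stats:acc_rate",
                      "driver_hourly_stats:avg_daily_trips"],
    )
    kfserving.KFServer().start(models=[transformer])
```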
We coded the Feast pre-processing logic here and left the post-processing as a pass-through for now. As you can see, the user request is processed by this transformer code, and two REST calls are made. The first one is to the Feast feature server to gather the features; the features, of course, are coming from Redis. The other REST call is to the predictor, to make the prediction. Eventually, the driver rankings are returned to the end user. Essentially, that's how this transformer works with Feast and with the KFServing framework. Here we have a custom-built feature server container using the Feast SDK package, and the properties we use are the entity IDs, the feature references, and the serving URL. The driver ranking model is trained against the Feast offline store using scikit-learn. The features used to determine rankings are the driver's average daily trips, acceptance rate, and conversion rate.

OK, next up is a short demo. The use case is to find the best candidate for a ride request out of, for instance, five drivers identified within a certain distance from the requester. So the input is the unique driver IDs, and the output is the predicted rankings for the final recommendation. You can see the high-level steps to build the solution here. Let me switch out of the presentation to do a quick code review and a short demo.

OK, so this is the KFServing repo, and here is the Feast transformer example. The first thing I want to share with you is the configuration, the spec for my InferenceService. Here you see there are two components, the transformer and the predictor, and I have a list of arguments which are quite dynamic; essentially, I just need to provide the serving URL later to make it work. The next thing I want to quickly show is the code that works with the Feast feature server. This is the pre-process handler; essentially, it does some parsing, forms an HTTP request, and later forwards the final input to the predictor.

OK, here you see I have the code cloned. I just updated it with my serving URL, and I can show you the pods we already have: the Feast feature server and Redis. In my namespace, there are no pods deployed yet, so essentially I'm going to use kubectl apply to deploy my InferenceService. Here you can see two pods being deployed; in a few seconds these will be ready. Then I'll just run this curl command to make a request. It uses the input file I have here, which has five driver IDs, and you'll see the rankings coming out shortly. So let's see if the pods are ready. OK, they are ready. So let's just run the curl command. Yeah, essentially this goes to my transformer, gathers the online features, and makes the predictions. Apparently, driver three is the best candidate.
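To show the shape of that interaction, here is a hedged sketch of the same request in Python. The ingress host, service hostname, and model name are placeholders, while the /v1/models/<name>:predict path and the instances body follow KFServing's v1 prediction protocol:

```python
import requests

# Hypothetical ingress host and InferenceService name for the demo cluster
INGRESS = "http://<istio-ingress-host>"
SERVICE_HOSTNAME = "driver-ranking.default.example.com"  # Knative route for the service

# Same input as the demo's input file: five candidate driver IDs
payload = {"instances": [1001, 1002, 1003, 1004, 1005]}

response = requests.post(
    f"{INGRESS}/v1/models/driver-ranking:predict",
    json=payload,
    headers={"Host": SERVICE_HOSTNAME},  # route through Istio to the right service
)
print(response.json())  # e.g. {"predictions": [...one ranking score per driver...]}
```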
All right, this concludes my part of the presentation. Ted, back to you now.

Sure. OK, let me summarize today's talk. We have shown you what a feature store is, how you can deploy Feast in a Kubernetes cluster, and how to run queries against the online feature store over HTTP. We have also shown you how to do model inference using Feast and KFServing together in a Kubernetes cluster.

We do think there are still future enhancements that would make Feast, KFServing, and Kubeflow better. Let's start with Feast. The latest Feast needs to bring back out-of-the-box Kubernetes support for those who would like to run Feast on a Kubernetes cluster. Also, besides the REST API, a gRPC server could be a nice addition for high-performance online feature retrieval. For KFServing, we think that having a generic built-in transformer for Feast would accelerate feature development and model deployment. And for Kubeflow, for the overall ML workflow, we think feature engineering is very important and could be added as a key step in Kubeflow Pipelines. We could also add Feast to Kubeflow training, like training with TensorFlow using TFJob; this could be done by extending the training operators with Feast so that the offline data can be used to train models.

So that summarizes today's talk. And Chin, could you change to the next slide, the next one? We do have a few talks from our organization, so if you'd like to learn more about AI, Kubeflow, and KFServing, please join those talks. Thank you very much. Thank you.