Hello, everyone. I'm Fitz Wang, from Tencent. It's a great pleasure to be present here. Today, my topic is Angel 3.0, a full-stack machine learning platform. This is today's agenda. First, I will briefly introduce what Angel is, and then recall what we have achieved since the release of Angel 2.0. Next, I will introduce the new features of Angel 3.0, including automatic feature engineering and two computation engines, Spark on Angel and PyTorch on Angel. After that, AutoML, especially hyper-parameter tuning, and model serving. In the end, I will share two use cases.

So what is Angel? Angel is a large-scale distributed machine learning platform for sparse data and big models. There are many machine learning platforms in the industry, such as TensorFlow, PyTorch, Spark, and PaddlePaddle. So what makes Angel special? I think there are two underlying reasons. First, there is an efficient, distributed parameter server in Angel. The distinctive feature of Angel's parameter server is that it supports parameter server functions, which allow you to push calculations down to the parameter server. And compared to other synchronization strategies, such as AllReduce, Angel's parameter server does not require all the parameters to be stored in one single node. Both the scalability of Angel's parameter server and the flexibility of parameter server functions make Angel suitable for big models with trillions of parameters. The second reason is a self-developed math library, which is designed for high-dimensional sparse data calculation. It runs about 10 times faster than Breeze, the math library used in Spark. Based on these two reasons, the algorithms in Angel focus on two scenarios: first, recommendation, and second, graph learning, including graph embedding and graph neural networks. As you can see from the radar chart, Angel is actually very different from TensorFlow and PyTorch.
TensorFlow and PyTorch focus on deep learning and have a powerful ecosystem, while Angel focuses on sparse data and big models.

This is the architecture of Angel. The Angel math library is at the bottom. Above the math library is the efficient, distributed Angel parameter server. On top of the parameter server, there are two computation engines: one is PyTorch and the other is the Angel ML core. Above the computation engines, there are three distributed runtimes, including the Angel native engine, the Spark engine, and the PyTorch engine. And on the top, there are two public service components: one is AutoML, automatic machine learning, and the other is Angel Serving.

So, Angel inside Tencent. The left chart shows the number of tasks running on Angel. As you can see, the number of tasks running on Angel increased about two and a half times since last June; in particular, the yellow line, the number of tasks running on Spark on Angel, increased about 10 times. The right table shows some detailed statistics, such as monthly active users and daily active users. From those statistics, we can conclude that Angel actually runs very well inside Tencent. Some popular apps, such as Tencent Video, Tencent News, and WeChat, are all users of Angel. As for Angel outside Tencent, gathering statistics about outside users is a challenge, so we maintain a social media account to respond to questions from users promptly, and all the data here come from that account. From these incomplete statistics, we can see that over 100 companies or institutions are using Angel in their existing products. Most of the companies come from China, especially from Beijing, Shanghai, Hangzhou, Chengdu, and Shenzhen. The left figure lists the top companies: Weibo, Xiaomi, Huawei, Baidu, and China Telecom.
Now, the open source community and the papers. Angel has more than 4,000 stars and more than 1,000 forks on GitHub, seven sub-projects, more than 3,000 commits, and 34 contributors, eight of whom are committers. From those statistics, we can conclude that the Angel community is quite active. Besides the open source statistics, we also published three papers at top conferences, including SIGMOD, VLDB, and ICDE. So this is what we have achieved since Angel 2.0.

Now, this is an overview of the new features. The yellow and red corners mark, respectively, enhancements to existing components and newly arrived components or features. The new components or features include feature engineering, especially automatic feature engineering, Angel Serving, AutoML, PyTorch on Angel, and Kubernetes support. The enhanced features or components include the improved computation graph, Spark on Angel, and parameter server enhancements. So let's go into the details.

The first new feature is automatic feature engineering. Feature engineering is very important for real-world applications. For example, online recommendation systems usually use logistic regression as the model for its high throughput and low latency. However, this model requires complex feature engineering to achieve high accuracy. We did a comprehensive survey of feature engineering, and we found that feature selection and feature cross are the most important parts. Although Spark has excellent feature engineering operators, it lacks feature selection operators. So we implemented two categories of feature selection operators. One is statistics-based, such as the variance selector; the other is model-based, including the LASSO selector and the random forest selector. Automatic feature synthesis is very challenging, because the dimensionality easily goes beyond the range of the integer type in the computer.
It also easily leads to the curse of dimensionality. For the first problem, we implemented long-indexed vectors to replace integer-indexed vectors. For the second problem, we propose an iterative approach to generate high-order features; we call this iterative approach automatic feature engineering. The proposed approach includes two stages. The first stage is the amplification stage: it uses the Cartesian product to increase the dimension. The second stage is the reduction stage: it uses feature selection and feature re-indexing.

Here is an example. First, we use feature cross, the Cartesian product, to increase the number of features; the number of features increases very fast. Then we use feature selection (we have implemented several types of feature selectors) to select the most important features, and then re-index the features. Why do we need to re-index? Because we want to reduce the feature space. Then we append the newly generated features to the existing features and finish one iteration. The iteration goes again and again to generate higher-order features. Our experiments on a Kaggle dataset show that our method is better than others, including the factorization machine, which performs feature cross in an implicit way.

The second feature is Spark on Angel, SONA for short. Actually, it's not a newly arrived component. SONA is a standalone machine learning library on Spark, just like Spark MLlib, but it does not depend on Spark MLlib. The new features in SONA include, first, feature engineering: we did not just copy the operators from Spark, but added long-indexed vectors, as I mentioned, to support training on sparse data and sparse models. Second, seamless integration with automatic hyper-parameter tuning, because we added a new AutoML module to Angel 3.0. And last, Spark-fashioned APIs, so there is no cost for Spark users to switch to Angel.
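To make the amplify-then-reduce loop above concrete, here is a toy, self-contained sketch of one iteration in pure Python. The feature names and the variance-based selector are illustrative only; this is a sketch of the idea, not Angel's implementation:

```python
import itertools

def one_iteration(samples, k):
    """One round of the loop: cross (amplify), select top-k by variance
    (reduce), re-index into a compact id space, and append."""
    names = sorted(samples[0])
    # Amplification stage: pairwise Cartesian product of existing features.
    crossed = [{f"{a}*{b}": s[a] * s[b]
                for a, b in itertools.combinations(names, 2)}
               for s in samples]
    # Reduction stage, step 1: variance-based feature selection.
    def variance(name):
        vals = [c[name] for c in crossed]
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals) / len(vals)
    kept = sorted(crossed[0], key=variance, reverse=True)[:k]
    # Reduction stage, step 2: re-index the survivors into small, contiguous
    # ids ("gen0", "gen1", ...) and append them to the original features.
    for s, c in zip(samples, crossed):
        for i, name in enumerate(kept):
            s[f"gen{i}"] = c[name]
    return samples

data = [{"f0": 1.0, "f1": 2.0, "f2": 0.0},
        {"f0": 0.0, "f1": 1.0, "f2": 1.0}]
out = one_iteration(data, k=1)  # keeps only the highest-variance cross
```

Running another iteration on `out` would cross the generated features as well, which is how higher-order features emerge round by round.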
This figure shows the architecture of Spark on Angel; since it has been presented before, I just put it here as a reminder. This is how to program in Spark on Angel. First, start and stop the parameter server, load the data, and define the estimator; in particular, you can define the computation graph in a JSON fashion. This one is a logistic regression. Then train your model using fit, and evaluate your model using evaluate. We also provide metrics such as the loss, the ROC curve, the AUC score, and the F1 score, as well as performance statistics. We also bring new algorithms to Spark on Angel, such as the deep cross network.

In this slide, we want to emphasize that the algorithms in Spark on Angel are actually very different from those in Spark. The algorithms in Angel focus on recommendation systems and graph learning, that is, graph embedding and graph neural networks, while the algorithms in Spark are general-purpose. As you can see from the left and right figures, the intersection is actually very small, so you can use SONA as a supplement to Spark. As for performance, we chose two popular algorithms, Wide & Deep and DeepFM, to compare with TensorFlow. All the experiments were performed under the same conditions on the Criteo dataset. The result is that SONA runs three times faster than TensorFlow on the Wide & Deep algorithm, and TensorFlow runs a little faster than SONA on the DeepFM algorithm. So SONA is actually very efficient.

The second computation engine is PyTorch on Angel, PyTONA for short. This is a new component of Angel 3.0. It actually originated from internal requirements at Tencent. Tencent has one of the largest social networks in the world. The network behind QQ (QQ is an instant messaging app, very popular in China) has one billion nodes and two hundred billion edges. The network behind WeChat Pay has one billion nodes and ten billion edges. So, very big graphs.
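As a side note on the evaluation metrics just mentioned, AUC and F1 are easy to state precisely. Here is a minimal pure-Python sketch of both, on made-up labels and scores; this illustrates the definitions and is not SONA's actual implementation:

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank statistic: the probability
    that a random positive scores higher than a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1(labels, preds):
    """Harmonic mean of precision and recall for binary predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]   # model scores for AUC
preds  = [1, 0, 1, 0]           # thresholded predictions for F1
```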
There is a variety of business scenarios on those social networks, for example credit rating, anti-fraud, anti-money laundering, and recommendation. However, supporting those business scenarios is not easy; it's a challenge, because the data is sparse and the models are big. So we surveyed the existing platforms, such as TensorFlow and PyTorch, and we found that TensorFlow and PyTorch are very good at deep learning, but not good enough at sparse data and big models, while Angel is good at sparse data and big models, but its computation graph is relatively weak. So we take advantage of both PyTorch and Angel and started the sub-project PyTorch on Angel.

This is the architecture of PyTorch on Angel. First, you define your model in Python, save the model as a .pt file, and then submit your model to the distributed system. The PS is the parameter server; it is used to store and update parameters. The driver is used to start and stop the PS, and to create, initialize, update, and save parameters. The workers are used to train the models: a worker first pulls the parameters from the parameter server, then, through the Java Native Interface (JNI), feeds the data and the parameters to PyTorch; PyTorch performs the forward and backward computation to get the gradients; then the worker pushes the gradients to the parameter server and finishes one iteration. This second, distributed part is transparent to end users, so users only have to care about the first part, the Python part.

I have an example to show how to program in PyTorch on Angel. Take the graph convolutional network as an example: define the graph convolutional network module, define the forward process, and create a loss function. At last, save your model as a .pt file and submit the .pt file to the distributed system to train your model. We have implemented two graph neural network algorithms.
GCN and GraphSAGE. We also implemented many recommendation algorithms, such as the factorization machine, DeepFM, the deep cross network, Wide & Deep, and so on. You can try them.

The third feature is hyper-parameter tuning. Manual hyper-parameter tuning in machine learning is exhausting. There are three traditional hyper-parameter tuning methods. The first is grid search. Grid search splits the parameter space by a grid and assumes the distribution of the hyper-parameters is uniform. However, the computation increases exponentially with the number of parameters, and the distribution of hyper-parameters is usually not uniform, so this method often misses the optimum and is not efficient. The second method is random search. Random search samples a sequence of hyper-parameter combinations from the configuration space and evaluates all the sampled combinations to select the best one. This method is also inefficient. The last method is Bayesian optimization. This method uses a very cheap surrogate function to approximate the expensive target function. The surrogate function produces a mean and a variance for a given hyper-parameter combination, and the acquisition function uses the mean and the variance to compute a score for that combination. If the score is higher, the combination is expected to achieve higher performance on the target function. Bayesian optimization requires fewer evaluations of the target function and achieves higher performance. We have implemented all three methods in Angel 3.0. Here we want to emphasize Bayesian optimization: we have implemented two surrogate functions, one is the Gaussian process and the other is the random forest. We also implemented the EM algorithm and L-BFGS to optimize the kernel function of the Gaussian process.
As for acquisition functions, we implemented EI, expected improvement; PI, probability of improvement; and UCB, the upper confidence bound; as well as early stopping. As you know, it is very expensive to evaluate the target function, so if we find that the performance of a given hyper-parameter combination is not good in the first several iterations, we can stop it. This technique is called early stopping, and we also implemented it in Angel 3.0. Our experiment on logistic regression shows that the Gaussian process is better than random search, and random search is better than grid search. The result is consistent with our expectation.

The last new feature is Angel Serving. As more and more deep learning algorithms are implemented in Angel, efficient serving becomes urgent. The left figure shows the architecture of Angel Serving. Users can access Angel Serving through gRPC and RESTful APIs. Angel Serving is a general-purpose machine learning serving framework, so you can serve models from different platforms, such as Angel, PyTorch, and the PMML format; through the PMML format, we can serve models from Spark and from XGBoost. The same as TensorFlow Serving, we provide fine-grained version control: you can specify the latest version, the earliest version, or whichever versions you like. We also implemented fine-grained service monitoring, such as QPS (queries per second), the number of exception requests, the response time distribution, the average response time, and CPU, memory, GPU, and network usage. In this slide, we compare Angel Serving with TensorFlow Serving. The left table shows that Angel Serving is a cross-platform serving framework: it can serve models from Angel, from Spark, from PyTorch, and from PMML, while TF Serving serves models only from TensorFlow. In the right table, we compare the performance of Angel Serving and TensorFlow Serving. As you can see, the performance of Angel Serving and TF Serving is comparable, and Angel Serving is even a little better.
For example, the QPS of Angel Serving is 1,900, while that of TensorFlow Serving is 1,800. We have integrated Angel Serving into our product TI-ONE on Tencent Cloud, and this is an example. Import your model, choose a version, and deploy your model: give it a name, choose a running environment, choose a queue and quota. Now it is deploying your model. You can check the logs; it takes a while to deploy a model. Once it's deployed, you can test the model with a JSON request, and this is the response. This is the monitoring: QPS, and the usage of CPU and GPU. That's all for the new features of Angel 3.0.

These are the use cases we want to share. The first is short video recommendation in Tencent. The users' play logs and the contents are forwarded to Kafka in real time, and the stream computing engine Storm subscribes to the data from Kafka. A real-time feature generator queries the user and video profiles from the KV store and generates features. On one path, the generated features are forwarded to the real-time training process, which updates the online model every 15 minutes; on the other path, they are dumped to HDFS for offline learning. The algorithm is the factorization machine, and we use Angel to perform the offline learning. The offline model is used to initialize the online model, or to reset the online model when there is an exception. So let's focus on the offline training. The feature size is more than 63,000, and the data volume is 2.4 billion samples. The previous training time was more than 10 hours using Spark; the Angel training time is only one hour, about 10 times faster.

The second use case is financial anti-fraud in Tencent. The network is heterogeneous; there are three types of edges. The first is payment: if user A pays money to user B, there is an edge. The second type is device: if user A and user B share the same device, there is an edge. The third type is Wi-Fi: if users share the same Wi-Fi, there are edges among them.
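To illustrate how such a heterogeneous edge list might be assembled before running community detection, here is a toy sketch. The record layouts, user ids, and device/Wi-Fi names are all made up for illustration and are not Tencent's schema:

```python
from itertools import combinations

# Hypothetical raw records (illustrative only).
payments = [("A", "B"), ("B", "C")]       # A paid B; B paid C
device_users = {"dev1": ["A", "D"]}       # users seen on the same device
wifi_users = {"wifi1": ["B", "D", "E"]}   # users sharing the same Wi-Fi

edges = []
# Payment edges are directed pairs taken as-is.
for src, dst in payments:
    edges.append((src, dst, "payment"))
# Shared-device and shared-Wi-Fi groups yield one edge per user pair.
for users in device_users.values():
    for u, v in combinations(users, 2):
        edges.append((u, v, "device"))
for users in wifi_users.values():
    for u, v in combinations(users, 2):
        edges.append((u, v, "wifi"))
```

A community detection algorithm such as fast unfolding then runs over this combined edge list, treating all three edge types as one graph.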
So, the fast unfolding algorithm in Angel is used to discover communities in the network, and the fraud risk model uses the discovered communities, the user profiles, and the network features to create an anti-fraud strategy. Let's focus on the Angel part, the fast unfolding algorithm. The data volume is 1.5 billion nodes and 10 billion edges. The previous running time was more than 10 hours based on GraphX, but the Angel running time is only five hours, about two times faster. Okay. Thank you for your attention.

This one? You mean go back? The second from the last? This one? Sorry. So, any questions? Sorry? The factorization machine. You mean the model? The factorization machine, and we use the follow-the-regularized-leader (FTRL) optimizer. The last slide? This one? With the code? Okay. This one? Oh, sorry. Is it this one? The QR code. Oh, the QR code. GBDT? Actually, we don't use GBDT in this use case, but we do support this algorithm in Angel. We have implemented a very efficient GBDT algorithm in Angel, and we also published two papers about GBDT, so if you want to know more about GBDT, you can refer to our website for more information. Unfortunately, no, we did not compare the two algorithms in this scenario. What's the reason? I think it's the dimension. GBDT supports dimensions relatively lower than FM, because you have to calculate the gradient histogram for every feature, so it's not very efficient. The GBDT algorithm in Angel can support about millions of features, not billions of them, so for billions of features GBDT does not work. In this use case, the dimension is very high, so we did not compare with GBDT.
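One footnote on the Bayesian optimization part earlier: the expected improvement acquisition score has a simple closed form when the surrogate's posterior for a candidate is Gaussian. A minimal sketch, assuming we are maximizing the target; this is the textbook formula, not Angel's code:

```python
import math

def expected_improvement(mu, sigma, best):
    """EI score for a candidate whose surrogate posterior is N(mu, sigma^2),
    where `best` is the best target value observed so far (maximization)."""
    if sigma == 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    # Exploit (high mean) plus explore (high uncertainty), weighted properly.
    return (mu - best) * cdf + sigma * pdf
```

The tuner evaluates this score for each candidate combination and actually runs the target function only on the highest-scoring ones, which is why Bayesian optimization needs far fewer expensive evaluations than grid or random search.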