Before I start, may I ask how many of you work with machine learning or AI on a daily basis? Great. So I don't have to explain the basics of machine learning. Today I want to share three points with you. First, digital banking in China and what WeBank is. Second, the basics and applications of a new technology called federated learning. And third, an introduction to an open source project we created and donated to the Linux Foundation, called FATE, the Federated AI Technology Enabler. Let's start: WeBank and digital banking. What is WeBank? Most people ask us, "Who are you?" So let's discover WeBank from a map. First we find a tiny little red pin just on the north shore of the South China Sea. Magnify it by 100 times and you can see it's in Shenzhen city, just beside Hong Kong. Shenzhen is kind of China's Silicon Valley; Tencent, Huawei, ZTE, and other tech giants are located in Shenzhen. Magnify another 100 times and you can find our institute in this building, just beside Shenzhen Bay. It is the one and only facility we have. We have only 2,000 employees in our bank, and over 50% of them are tech people, compared to traditional banks where only 4 to 8% of employees are tech related. We have a record time-to-market of 11 days: on day one you start a financial product from scratch, and on day 11 you have it online and operational. We established our banking service at the end of 2014, and by the end of 2018, as an only four-year-old institution, we had acquired more than 150 million users and generated annual revenue of about 10 billion in Chinese currency. And we are 30% held by the Tencent group. So how does this even happen? What is digital banking? Why is it blooming? Because digital banking is a reflection of globalization.
This chart is actually from a very famous institute in the financial area. In 2001, at the beginning of this century, China joined the WTO. From that time on, China became the manufacturer of the world: consumers and buyers in America bought more and more products produced in China, and a large share of them were shipped back to the US and other Western markets. The economic growth of China was a share of the economic growth of America and of the globe. That was the play between 2001 and 2008. In 2008, something happened: the financial crisis. The Western market collapsed, and many products wouldn't sell anymore. So the Chinese government decided to change strategy. In 2009, massive construction projects were conducted across China. Two of them were the 4G network and high-speed rail. The right chart shows the penetration rate of mobile internet in China. It starts as a very low dot at the bottom and goes rocket high after 2009. You can see China has built more base stations than the rest of the world combined: we have roughly 3 to 4 million base stations in China, and the rest of the world combined has roughly 3 million. The other massive construction project is the high-speed railway. The blue line is the number of passengers carried by high-speed rail per year. You can see that around 2014, the passengers carried by Chinese high-speed rail passed the number of American aviation passengers. Within a range of 1,000 kilometers, traveling by HSR is more convenient than flying. That created giant groups of cities, and an unprecedented consumer market on the mobile internet. At the beginning of China's globalization, around 2001, US consumption levels per household were 13 times larger than China's. Now it's only about three times.
And actually, within a few years, we can forecast that it will pass the threshold of two. And it is simultaneously happening on the mobile internet. The Chinese don't use credit cards. We don't use traditional wallets. We use smartphones to pay for everything. Chinese consumers like to go around and buy things anywhere, anytime, as they want. So banking is also going digital, driven by AI technology. The first place AI is involved in banking is online acquisition. You can build an online platform based on transfer learning to do real-time bidding on internet traffic. It's 10 to 15 times better in cost efficiency than traditional profile-based targeting. Then the customer clicks the ads to the landing page, or to the WeChat channel, to communicate with us and inquire about products. This is a chatbot we built for our customers and prospective customers. There are four million sessions of inquiries running through our chatbot every day. It's maybe the world's largest chatbot platform ever, but you get used to it, because the number of Chinese users is huge. Now, when they decide to buy or employ our financial products, they go to the next step. We have to identify the actual identity of our customers, not the virtual identity online, but their actual identity. That calls for the computer vision approach: ID verification, anti-fraud, OCR, and so on. We have more than one million inquiries every day, maybe the largest such platform in China as well. This technology is 10 times more cost efficient than traditional call center services. Then, when they pass identity verification, the customer goes through our real-time risk evaluation platform, built on large-scale machine learning risk evaluation models. We have over 200 models for risk evaluation, with more than 200,000 variables contained across these 200 models.
It is three times more cost efficient in risk management compared to the traditional scorecard. A scorecard, seen from the perspective of machine learning, is a very small-scale regression containing only 10 to 20 variables. So, all together, we have built an AI-driven experience for our customers: a loan granted within 60 seconds, typically 15 seconds on average, anywhere, anytime they want, with no extra fee charged. That's fully digital, driven by AI. Having seen all this, you may say: I want to go to China, I want to get in on this boom. But forget it. The party will be over soon, because data protection, we can call it data protectionism, is coming. This is the timeline of GDPR. You can see it was first introduced in 2012, and unlike other European agendas, GDPR went through the European parliamentary process at rocket speed. In 2015, all three major EU bodies reached agreement on these regulations. From roughly 2010 to 2014, every year, the famous consulting company Gartner talked about big data over and over again. But in 2015, big data suddenly vanished from their technology hype map. The reason Gartner gave is that they think big data will be everywhere, embedded in every technology, so it's not an individual technology anymore. But the real reason is this: GDPR had passed the European parliamentary process and would be carried out in the coming two or three years, that is, in 2018, when you can see many corporations being fined under GDPR and charged a great deal of dollars. And it's not over. Data regulation is even more strict and severe in China. In Europe, or in California, or in the US, data leaking is a corporate offense; it's not punishable for individuals. But in China, data leaking is a felony. There is one individual, one natural person, in each corporation who is responsible for data protection.
Once the data is leaked, it is punishable by up to seven years in prison. It's a felony in China. So you cannot exchange data, and you cannot buy data; it's very risky and dangerous behavior. But without data, there is no digital banking, and no digital banking AI. So we built this technology called federated learning, which comes to the rescue. This technology allows us to build models without moving data out of the multiple parties. It doesn't exchange any raw data, nor any encrypted form of the raw data. It only exchanges some encrypted intermediate results, such as gradients, and you cannot reverse-engineer any data from them. So that pretty much solves the regulation problem, because no actual user data is leaked during the process. It can be used in many applications in banking, and even non-banking businesses can utilize this technology, such as healthcare research and medication. We have built several networks. You can see on the upper left, we built a network between banks to create better anti-money-laundering models, and the banks don't have to share their user data. And we created a network between internet companies and insurance companies to get more accurate pricing. And there are a lot more like that. So we come to the second part I want to share with you: the basics and applications of federated learning. Federated learning was first introduced by Google, in 2017 I think. They introduced this technology to prove that Android protects users' privacy. It was used on Gboard, the input method you use on Android. It has a tiny little function: every time you type a keystroke on Gboard, it suggests a word. This model is trained on the input sequences and the correction sequences that users actually type. So it's totally private.
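The Gboard-style training loop just described is commonly known as federated averaging. Here is a minimal sketch of the idea, not Google's actual implementation: I use a toy logistic-regression objective instead of a keyboard model, each "device" trains locally on its own data, and only model weights travel to the server for averaging.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training step: plain gradient descent
    for logistic regression on that client's private data."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (preds - y) / len(y)
        w -= lr * grad
    return w

def federated_averaging(clients, rounds=10, dim=3):
    """Server loop: broadcast the global model, let each client train
    locally, then average the returned weights (weighted by data size).
    Raw data never leaves a client; only weights are exchanged."""
    global_w = np.zeros(dim)
    for _ in range(rounds):
        sizes = [len(y) for _, y in clients]
        local_ws = [local_update(global_w, X, y) for X, y in clients]
        global_w = sum(n * w for n, w in zip(sizes, local_ws)) / sum(sizes)
    return global_w
```

In a real deployment the server would also sample a subset of devices per round and could add secure aggregation or encryption on the weight updates, which this sketch omits.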
But with the technology of federated learning, you can utilize the data collected from millions of cell phones without actually collecting it into the cloud. The model is trained locally on the device, and no actual data is transferred to the data center. But this technology is only an incremental upgrade of distributed learning. It has many constraints, such as that each party must have the same feature space. If you want to do collaborations between different companies in different businesses, this technology will not apply, because they don't have the same feature space. We extended the concept of federated learning. What you see at the bottom is horizontal federated learning, which requires the same feature space and labels on every party; that is the original work done by Google. What we are doing, at the top, is vertical federated learning, which allows different feature spaces to work together to build better machine learning models. And vertical federated learning can only be done on the samples that align, which means your common users. But there is a lot of data you abandon because it doesn't align. To solve that problem, we have another technology called federated transfer learning, to bring that abandoned data into model training and improve the performance of the model. There is actually a video on YouTube we created to explain the three categories of federated learning technology. Let's start with vertical federated learning; it's the most complex one. First, you should understand the exchangeable (commutative) encryption algorithm. It's the basic technique used for private set intersection, to align the samples from multiple parties. There is a lot of encryption math here that we won't dive into. After the samples are aligned, you can do the actual model training.
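The sample-alignment step can be sketched with a toy commutative-encryption PSI in the style of Diffie-Hellman-based private set intersection. This is an illustration only; the prime, the hashing, and the key handling here are demo assumptions, not FATE's actual protocol.

```python
import hashlib
import secrets

# Toy commutative "encryption": modular exponentiation, so
# E_a(E_b(x)) == E_b(E_a(x)) and common IDs can be matched without
# revealing the rest.  (Demo parameters only -- real deployments use
# large safe primes or elliptic curves and blinded exchanges.)
P = 2**127 - 1  # a Mersenne prime, adequate for a demonstration

def h(identifier):
    """Hash an identifier into the group."""
    return int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big") % P

def enc(value, key):
    """One layer of commutative encryption (modular exponentiation)."""
    return pow(value, key, P)

def psi(ids_a, ids_b):
    """Each party encrypts its hashed IDs with its own secret key,
    exchanges them, and adds its layer to the other side's set.
    Doubly-encrypted values match exactly for the common IDs."""
    key_a = secrets.randbelow(P - 2) + 1
    key_b = secrets.randbelow(P - 2) + 1
    double_a = {enc(enc(h(i), key_a), key_b): i for i in ids_a}
    double_b = {enc(enc(h(i), key_b), key_a) for i in ids_b}
    return sorted(v for k, v in double_a.items() if k in double_b)
```

Because exponentiation commutes, both parties arrive at the same double-encrypted token for a shared user ID, so they learn the intersection and nothing about each other's remaining users.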
The most common method for machine learning model training is gradient descent, and every time you take a gradient step in some direction, you have to calculate the loss function to determine whether the model has converged. In the encrypted machine learning setting, we only have limited arithmetic operators on encrypted data: homomorphic addition, and homomorphic multiplication only partially. So we have to rewrite the loss function. This is the logistic regression loss function; we have to rewrite it in a form that works under encryption, using polynomial approximation. Don't worry, you won't have to get tired of equations; we have 50 pages of equations coming up. I'm just kidding. During this process, only some encrypted intermediate data is exchanged between the parties. No raw data, nor any encrypted form of raw data, is transferred. Compare this to a technology like differential privacy: you can add some noise to one database and move it to another party to build a model, but the risk is still there. The data you move contains all the information, and it is private; if the method can be reverse-engineered, there is a high risk of exposing the entire database. Federated learning only exchanges need-to-know information in the machine learning process, and in encrypted form. So even if you can reverse-engineer or decrypt the intermediate data, you still cannot reconstruct the raw data. It has two or three layers of protection. Use cases, we are back. Most of our use cases are loans and credit estimation. The ideal big data for the credit description of SME loans is shown on one side: all the data you need to describe the actual state of a small business, that is, the credit report, the finances, the taxes, their reputation, and whether they have any legal suits. But in reality, it's more like the other side: you only have a blank credit report from the central bank.
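To make the polynomial-approximation step mentioned above concrete: a second-order Taylor expansion of the logistic loss around zero, log(1 + e^(-z)) ≈ log 2 − z/2 + z²/8, turns the loss into a polynomial in z, so an additively homomorphic scheme such as Paillier (ciphertext addition, plaintext multiplication) can evaluate it. This is a minimal sketch on plaintext values just to show the approximation is close for small scores; the encryption itself is omitted.

```python
import numpy as np

def logistic_loss_exact(w, X, y):
    """Exact logistic loss; labels y are in {-1, +1}."""
    z = y * (X @ w)
    return np.mean(np.log1p(np.exp(-z)))

def logistic_loss_taylor(w, X, y):
    """2nd-order Taylor expansion of log(1 + e^{-z}) around z = 0:
    log 2 - z/2 + z^2/8.  Every term is a polynomial in z, so under an
    additively homomorphic scheme it needs only ciphertext additions
    and multiplications by plaintext constants."""
    z = y * (X @ w)
    return np.mean(np.log(2) - z / 2 + z**2 / 8)
```

The same trick is applied to the gradient, which is why encrypted vertical federated training can converge while each party only ever sees encrypted intermediate values.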
That is far from enough to describe the credit of a small business. So we got creative. We built a federated learning model between digital invoice data and credit report data. The digital invoice is kind of a China thing, I think, because in the US invoices are printed by different institutions, but in China there is a central authority that authenticates all invoices. So they collect all the invoice data, but they can't share it or expose it to third parties, because it's a national security matter. But with federated learning we can utilize this data, and as a result, model accuracy went up by 12%, and after months of trial, the cases of bad loans went down by 40%. There are also other applications we're trying to build with federated learning. You can not only use vertical federated learning between two parties; you can also apply horizontal federated learning simultaneously to create a network. And it's not just for big data or large-scale machine learning: you can also use it in deep learning. We built a federated deep learning network between computer vision customers; that's something we are sharing on our YouTube channel. So much for the basics and applications of federated learning technology. Now we are moving to the project, FATE, the Federated AI Technology Enabler. The vision of FATE is to provide an industrial-level federated learning framework with out-of-the-box usability, and to enable big data collaborations in compliance with data protection regulations. I'll explain it step by step. That's our GitHub and that's our website; you can find all our resources on the website. As for the roadmap of this year, we have stuck to it until now. At the end of January, the beginning of February, we announced the project at the AAAI conference in Hawaii. The major component we released in version 0.1, the FATE machine learning toolkit, was the core algorithm components. And in March, I think, we just reached 100 GitHub stars.
And the first contributor outside WeBank appeared to help improve our stability. In version 0.2 in May, we released the first version of FATE-Serving: you can deploy your federated model online and do online inference. In June, we donated this project to the Linux Foundation and it became a public-governance project. And this month, today specifically, we announced version 1.0 of FATE. It contains very important features like FATE-Flow, similar to Kubeflow, and the dashboard for federated learning. We are visualizing everything about federated learning. We are planning to release the next version, 1.1, at the end of September; it will introduce secret sharing, a new protocol for federated learning. And at the end of the year, we will fully support deep learning models, because right now we only support several kinds of deep learning models, not all of them. That's the project landscape. I won't dive into the specifications; it's too complex to explain here. The core component is FederatedML. It has four layers, from the MPC protocol to numeric operators to machine learning operators, and you combine these together to get a federated machine learning workflow. A typical pipeline of the model training process is this; it's covered well by the FederatedML library. And to run this pipeline, you have to employ FATE-Flow. Many other projects, like Kubeflow, only provide flows that run centrally in one data center. But in the federated learning process there are multiple parties; you have to arrange and coordinate the training process across different sites. So the core of FATE-Flow is the federated task scheduler: the federated task schedulers from the different sites are linked together. After you train a federated model, you need to put it into service. In many settings, like vertical federated learning, you don't have the whole model; you only have part of the model, called submodels. So all the submodels must be put online simultaneously.
That introduces the core function of FATE-Serving, which is model version control. You must align the versions of the submodels, and align the submodels with their data. Another important feature FATE-Serving has is online federated ML, actually online federated machine learning feature engineering. In real applications, you don't have all the data in your database; part of the features used by inference comes with the request. So you have to do online feature engineering to transform this request data into the features the model can accept. That is online feature engineering. If you don't have this, you will have to hardcode this component, and there will be a slight performance difference between your online inference and your offline evaluation, because of the difference in the online feature engineering model. Then comes the most popular component of FATE, the FATE dashboard. Federated learning technology is new, and maybe complex to manage, and the FATE dashboard solves all of that. You can monitor every modeling process, the diagram of model construction, the evaluation of features, the evaluations of the whole models. It's just as if you are using a centralized modeling platform; it simulates that for you. So we come to the last part of my presentation: a call for collaborations. That's why we are here, isn't it? I will introduce several collaborations within the Linux Foundation that we have already begun. The first and most important is that we collaborated with the CNCF project called Harbor, which allows us to deploy on Kubernetes smoothly. And we are also looking at projects like Kubeflow, because it's also a flow manager for machine learning. The Harbor project originated in China; it's developed by the VMware China team, and it's phenomenal. It has thousands of users, and they are very warmhearted. You guys should meet them.
And another project I want to mention is Angel. It's a full-stack machine learning platform that originated from the Tencent group. It's stable, it's beautiful, it's powerful. It has millions of users and can process petabytes of data. We are testing their technologies, such as the parameter servers and the arithmetic libraries. So again, I bring up our website: everything we are sharing is on our website, fedai.org, and there are some more materials for you there. We also have a standard to bring different implementations of federated learning together, so that different software can talk to each other. And we have a YouTube channel with very interesting videos about federated learning and all the applications people are building with federated learning. It's not only banking: urban management, building management, various things. If you want to find materials like the papers and the tutorials, you can go to the reference materials link. And if you just want to talk, you can write me an email at torbychain.webank.com. Anyone who has a WeChat account can scan this QR code; we have a robot assistant behind it who will provide any assistance you'd like. That's all. Thank you.