Good advice. Cybersecurity is something I think is important to everyone, and Greg's advice is very good: in order to stay more secure, stay as close as you can to the latest kernel version. Now we want to talk about a topic that is at the top of everyone's mind. You know, China is referred to as the AI lab of the world. Artificial intelligence is a huge movement here in China, and we are lucky today to have two folks to talk to us about the latest in artificial intelligence. Please welcome, from Huawei, Pei Jinghe, and from China Mobile, Dr. Feng Jilan. Go ahead and have a seat.

So we're talking artificial intelligence, we're talking open source. I thought it might be helpful for each of you to quickly introduce yourselves and tell the audience a little bit about how you're using AI and ML in your companies, just to get us started.

Good morning, everyone. I'm Jilan. I currently work for China Mobile as the chief scientist, focusing on AI and next-generation network R&D.

My name is Pei Jinghe, and I'm from Huawei. I have been working in open source for quite a long time, and recently I joined the AI effort in the company. I have some background in neural networks and pattern recognition, and I think the new boom in AI brings a lot of opportunities for both companies and developers.

It seems like with every big new technology movement, whether it's blockchain or big data or cloud computing, open source is now really the fundamental infrastructure for almost everything being implemented, whether at China Mobile or at a big company like Huawei, to create products and services. I'd love to hear about what open source projects you're looking at or using in your organizations as it relates to AI.

Okay, maybe starting with me. There are so many AI open source projects, and a lot of them are very good, so at Huawei we use a lot of them in different layers. Just to take the example of training frameworks: we use MXNet, we use TensorFlow, and even Caffe2, in different labs and different product lines. More importantly, we also use other big data projects to process and manipulate the data, because this generation of artificial intelligence is really data driven. So there are a lot of projects we have been using; in some of them we participate very actively, and we have contributed code back in a lot of areas.

Jim, you just mentioned the user scenarios and applications at Huawei, and I want to add a little more about that, because at Huawei AI is not only a tool to improve our products and solutions. It has also been used heavily in our internal procedures to increase our internal efficiency. As you know, Huawei is a very big ICT solution provider. Take logistics and shipping as one example. Every time we ship a solution, for example a base station to China Mobile, that system is a very complicated one. It has a lot of parts, mechanics, and packages, and the shipping itself is very challenging, because everything needs to arrive on site in the right sequence and with the right timing. Previously that was done by human experts, but recently we incorporated artificial intelligence, which gives our customers much better time to market and much better efficiency.
So that is just one simple example of how AI is being used at Huawei to improve our efficiency.

Interesting. Now, you deal with essentially the biggest mobile network in the world, right? 850 million users; the complexity, the volume, the speed at which you need to work is just amazing. Tell us how at China Mobile you're using these different technologies and what you're seeing in this area.

Okay. We are more demand driven than technology driven. There is just so much great AI-related software, so many supporting systems, that we can't track all of them; it's hard to follow everything. So we try to consolidate based on our business needs. Mainly there are three or four things we try to do.

First, we are putting together a general platform that can support company-wide AI-related applications. For that, we support most of the open source deep learning frameworks, reinforcement learning frameworks, and generative learning frameworks, following the main trends. For the major workloads we use TensorFlow. For speech we use Kaldi. For NLP, natural language processing, we mainly use TensorFlow, and we still use a lot of Theano, the older, research-style framework; we are quite familiar with it, so it is easy for us to change and improve. On the business side, we use this platform to support customer care. I think customer care is truly a field this generation of AI has a really heavy influence on. If you put together our mobile customers and broadband customers, we have over a billion customers, so customer care is important for us. The second area is marketing: how we apply AI to truly improve our marketing and sales efficiency. And the third, the biggest one, is the network: network intelligence. There are tons of problems we face there that we hope AI can help with, and for those we use quite a lot of smaller open source packages. So it's really based on whatever we need: we grab the software, and for whatever we use, we have to learn it well enough to be capable of really changing and improving it. For the smaller projects, we contribute back to the open source community.

Got it. I get this question all the time from different organizations and people: I've got data, I need to train a bunch of models, I need to put them into a production pipeline and deploy them in my enterprise, for customer care, natural language recognition, whatever it is. But the number one question I always get is: there's PyTorch and TensorFlow and MXNet and all these different components, and I'm not sure how to set up the pipeline correctly. What advice, and I'll start with you, would you give people as they look across such a complex landscape of different software and different open source projects, for getting started or picking the right one? It just seems confusing.

Yeah, it is. What we see in practice is that some of these, for example the training frameworks you just mentioned, originated from specific scenarios and applications, for example image processing. A lot of people start with a bunch of images and try to do some AI work. When you start, you look at a paper, and the paper will point you to the framework in which the algorithm it describes has been implemented. So for developers and data scientists it's very easy to start in one framework and work from beginning to end. The problem is that later, when they want to switch to another framework, they run into difficulty. So one suggestion I have is to use the high-level APIs that now exist. Keras, for example. Yes, exactly, a high-level API like that, which lets you specify your training workload first and then define which training framework you will use underneath. So that is one suggestion: try to use some high-level-API open source project.
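A minimal sketch of that suggestion, assuming the current multi-backend Keras 3 API, where the KERAS_BACKEND environment variable selects TensorFlow, JAX, or PyTorch (the Keras of that era selected among TensorFlow, Theano, and CNTK via a config file instead). The tiny classifier and random data are purely illustrative:

```python
# Define the workload once against the high-level Keras API,
# then pick the execution framework via an environment variable.
# KERAS_BACKEND must be set BEFORE keras is imported.
import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # or "jax" / "torch"

import numpy as np
import keras
from keras import layers

# Illustrative data: 1000 samples, 20 features, 3 classes.
x = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 3, size=(1000,))

# The training workload is described once, independent of the backend.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=3, batch_size=32, verbose=0)
print(model.evaluate(x, y, verbose=0))
```

The same script can then be rerun against a different backend without touching the model definition, which is the portability the suggestion is after.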
But I think another problem, especially for me, is how to pre-process, how to prepare the data, and I find that very challenging, especially when people try to take AI into different industry verticals. The biggest problem they face is: I don't know how to pre-process my existing data to feed it into TensorFlow or into MXNet. So the preparation stage is very important. There are more and more projects doing that, and I suggest developers keep a close eye on that area, because I think it will probably be the next boom in AI open source projects. For training frameworks we already have a lot, but for the data part we are just at the beginning. We have a lot of existing big data projects, but for AI we need more advanced data-processing open source projects, and I think in the next stage we will see more and more in that area.
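To make that preparation stage concrete, here is a hedged sketch of an input pipeline using TensorFlow's tf.data API; the file name records.csv and its four-features-plus-label layout are hypothetical, not anything the panel describes:

```python
# A minimal sketch of a data-preparation pipeline with tf.data.
# "records.csv" and its column layout are made up for illustration.
import tensorflow as tf

def parse_line(line):
    # Decode one CSV line into four float features and an int label.
    fields = tf.io.decode_csv(line, record_defaults=[[0.0]] * 4 + [[0]])
    features = tf.stack(fields[:4])
    label = fields[4]
    return features, label

dataset = (
    tf.data.TextLineDataset("records.csv")
    .skip(1)                                              # drop the header row
    .map(parse_line, num_parallel_calls=tf.data.AUTOTUNE) # parse in parallel
    .shuffle(buffer_size=10_000)
    .batch(256)
    .prefetch(tf.data.AUTOTUNE)  # overlap preprocessing with training
)
# `dataset` can now be passed directly to model.fit(dataset, ...)
```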
So: more work to do on the data processing side, more work on the pipeline side. It sounds like China Mobile is experiencing this too; you're pragmatically picking a framework depending on what you need to get done, and you have a lot of them. Is that what you experience, and how would you like to see that change? What would make your life easier? If there were more consolidation in the open source frameworks, more general-purpose frameworks, what would you like to see?

Okay. I guess everyone has their own philosophy of picking and learning, and every company has its own style. Personally, I often tell colleagues and co-workers in our company that there are a couple of things to keep in mind. One is that it has to be useful. For me, this is a world where every morning you wake up and it feels pretty scary: there are so many people who are super smart; every day you check the papers and see their new inventions and new ideas; everything in the world is changing, and the people doing it are younger and younger. And now we have machines that are even smarter, way smarter, than we are. Facing such a complicated world, what are we going to do? There are a couple of things I really suggest we work on together. One is to really focus. I don't think anyone can master the whole technology stack; no one can in a lifetime, it's simply impossible. Second, once you pick whatever you want to focus on: we always say "AI" as one word, but it actually means lots of things. If you work on natural language processing, on dialogue generation, on image, face, or object recognition, concentrate specifically on that problem and go deeper. Once you have picked your focus, out of all the software people use, you only have to track a few things. The other thing I really want to say is: go back to the basics. There is tons of great open source software you can grab; you can marry that software with your data, and within probably a couple of hours or a day you can create a certain capability and claim you are capable of doing something, without really understanding the basics. Then, when you apply this to a real business, there will be problems; it won't be that easy. If you are incapable of really improving it, of understanding what's behind the scenes, I think you put yourself at risk. So: focus, truly go back to the basics, and make it useful. Once you are there, I guess you can really enjoy the beauty of the whole picture: the programming, the coding, and where the intelligence truly comes from. There is one more dimension: pick open source software that has a strong community behind it that can support you. There are tons of problems to solve; on your own, that may take days or months, but with a community you can search online and ask friends inside your company, because people around you will find a solution much faster.

That's great advice. It seems like we're early in the maturity of a lot of this technology, and there are all these different communities out there, some clearly larger, like MXNet and TensorFlow, but still a lot of different things. Pei Jinghe, AT&T open sourced a bunch of code called Acumos, and the goal for them was not necessarily to pick one framework over another, but to enable a production pipeline that could better share models and data sets, so they could reuse them inside AT&T. This was a problem they had in their organization, and I'm sure you have the same problem at China Mobile, and the idea is to create a marketplace to facilitate this. You've seen this; give me your thoughts on projects like that and what you're seeing in the market. I see commercial products too; Amazon has SageMaker, which is a very similar tool to the Acumos open source project. Give me your thoughts on these kinds of projects.

Yeah, I think Acumos is a very good one. It's probably not as stylish as TensorFlow, but it's very fundamental, because it basically creates a pipeline that allows different AI open source components to be integrated. As I said about the vertical industries trying to apply AI in their scenarios: if they know about a project like Acumos, I think it's much easier for them to bring in AI experience from the open source world. I think the marketplace is also vital to the whole ecosystem, because it's a place where data scientists, data engineers, and application developers can exchange. And I also think that in the future, if we can incorporate technology like blockchain, which would make data exchange and model exchange more economically driven, it would probably make the ecosystem richer and better. So I think a project like Acumos is definitely a very good starting point, but of course there are still a lot of things that need to be done in that pipeline: more stages need to be added and more components need to be integrated.
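Acumos' own APIs are not shown in this conversation, so as a stand-in, here is a hedged sketch of the underlying idea, chaining interchangeable stages behind one common interface, using scikit-learn's Pipeline purely for illustration:

```python
# A minimal sketch of the pipeline idea: when every stage follows a
# standard interface, components can be swapped without rewiring the
# rest. scikit-learn is used here purely for illustration.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

x = np.random.rand(500, 10)
y = np.random.randint(0, 2, size=500)

pipeline = Pipeline([
    ("scale", StandardScaler()),      # preprocessing stage
    ("model", LogisticRegression()),  # model stage
])
pipeline.fit(x, y)

# Swap the model stage; the preprocessing stage is untouched.
pipeline.set_params(model=RandomForestClassifier(n_estimators=50))
pipeline.fit(x, y)
print(pipeline.score(x, y))
```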
Do you think we're going to see a world where, beyond just working together on the open source technology infrastructure, there is more actual sharing of models and data sets among, let's say, affinity communities, like networking companies, big global operators, or supply chain management companies? Or do you think it's more, hey, that's our secret, that gives us competitive advantage? Where do you see the world going in terms of sharing data and models among different organizations?

Well, I worked at AT&T for over 12 years.

Okay, so you've seen both sides of that.

I'm quite happy for AT&T; they changed their culture very quickly to focus on software and on AI. AT&T has over 60 years of experience in speech and in image and video processing, so they are definitely very good at AI. I learned about Acumos a couple of months ago, and it was good to have met friends out there, but I probably see it differently. The problem I was trying to get at, the one I think we touched on yesterday, is AI efficiency. The situation in real business today is quite different from a couple of years back, when we were trying to find use cases, trying to find business opportunities to apply AI. Now it's the other way around: there are so many opportunities where you can apply AI and make a difference in a real business. Then the problem comes back, which is that each of them costs a lot to make work. You have to work with the business to collect the data, the right data; that's a long cycle. You need major iterations to really get it right. You need the AI developers to work closely with the business, and you have to be humble: the technology has to be humble, and the people behind it need to be humble, to understand what the business's true bottleneck problems are; otherwise you may create something creepy or disappointing. You have to truly respect what their real problem is, and then convert that business problem into an AI problem you can solve. For AI, you have to take the data; you have to know what your x looks like, what the input and the output are. Then you have to test it in the business. Often, when you apply it, it works to a degree, but not as well as they expected, and then, as trust builds, you need more iterations to let it grow. That efficiency is truly the problem, and I think it is now somewhat hindering AI from creating bigger value.

Acumos is trying to answer: can we have a model marketplace so we can share at the model level? At least for now, as of today, I think that's very hard. At least for the telecom industry, we have so many problems, and each problem is individually, uniquely defined. I think we lack that level of abstraction. Once you create a model for one very specific problem, when you grab that model and apply it in a different scenario, to a slightly different problem, it often won't work. Then you need to do adaptation, to do transfer learning, and although there are lots of research papers on transfer learning, I don't think it has been proven commercially successful yet. There have been lots of attempts, and we have tried many of those techniques too. Just to give you an example: China is quite big, with 31 provinces. We may create a model for province A and then transfer that model to province B, and often it won't work automatically. I really hope it works; I think one day it will.
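As a hedged illustration of the adaptation step being described, this sketch fine-tunes a Keras model trained on one province's data against another province's data. The file name, array shapes, and freeze-all-but-the-head recipe are assumptions for illustration, not China Mobile's actual method:

```python
# A minimal sketch of cross-region adaptation by fine-tuning.
# "province_a_model.keras" and the province-B arrays are hypothetical.
import numpy as np
import keras

# Start from the model trained on province A's data.
base = keras.models.load_model("province_a_model.keras")

# Freeze everything except the last layer, so only the task-specific
# head is re-fit on the (typically smaller) province-B data.
for layer in base.layers[:-1]:
    layer.trainable = False

base.compile(optimizer=keras.optimizers.Adam(1e-4),
             loss="sparse_categorical_crossentropy",
             metrics=["accuracy"])

x_b = np.random.rand(200, 20).astype("float32")  # province-B features
y_b = np.random.randint(0, 3, size=(200,))       # province-B labels
base.fit(x_b, y_b, epochs=5, batch_size=32, verbose=0)
```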
But I talked to Mazin, who is leading Acumos...

Mazin Gilbert, yeah.

We both came from a speech recognition background, so I wonder whether it's truly the case that we can stay at the model level. For certain problems, like speech recognition, image, and probably certain natural language problems, it can work, because the problem is defined very simply, so you're facing something generic. Once the problem is simple, the model can probably be general.

Right.

In those fields, for those problems, yes. But for many of the business problems we face, I have my doubts. I hope it works.

I think it's difficult too, but I like this concept of AI efficiency: if you're running a business unit, continually moving toward optimal AI efficiency, picking the right abstractions, trying to generalize the frameworks and models and so on. That seems like directionally where people want to go; you point out that it's super hard and it's going to take a while.

I totally agree with Dr. Feng's comments, but one thing I want to add is that, overall, I think AI should become more and more open, because we have this black-box problem, which has affected the industry, and a lot of people are starting to be concerned about AI. If we can make AI, the models for example, more open and more shareable, I think that will decrease the concern and eventually put AI to wider use. That's just a simple comment, but overall: more trust and more efficiency by opening it up.

And more security as well. I have been thinking about this for a while. Every AI problem that has proved very successful so far, such as face recognition, object recognition, and speech, has one thing in common: the community has a very big data set behind it, which means all the researchers, technologists, and developers can take the same data set and tune and improve their technology on the same thing. And the testing has been consolidated: one common way to test the technology, one training data set to tune everything. So I was thinking: how about, for each industry, telecom for example, and maybe this is a suggestion for the Linux Foundation, going beyond what we are sharing today, the open source at the code level, can we do it at the data level? The telecom industry deals with lots of customers; we have privacy concerns, security concerns, and many other concerns, but I don't think that should simply stop us from sharing data. There will be one way or another. We can create a data set of a decent size that can really support and let the researchers and the systems work, so we can collectively push the technology forward. Without a common data set, everyone just tells a different story.

So, I swear we did not rehearse this question, but we have been thinking about this at the Linux Foundation for a long time. Last year, Pei Jinghe, and I think you're aware of this since you're on our board, we created an open data license. Because I agree: data sharing is critical to the advancement of this field, and when you look back historically, one of the things that made open source so efficient was the open source licenses.
The Apache license and the GPL made the intellectual property framework of sharing very easy and repeatable, because these licenses were all similar. So we created an open data license: a copyleft-style license with a share-back requirement, so if you add or modify data, you share data back; and a permissive license, under which everyone can share with attribution. But I totally agree. My question is: we now have, I think, a legal construct to share that data. The question is whether industry is going to hoard data or share it. Are we going to see data consolidation, centralization, exploitation, and monetization, or are we going to see more sharing, or maybe a combination of both? Final thoughts on this particular topic, because I think you're right, it's a hugely important thing.

Yeah. There are lots of technologies you can apply to clean the data, to protect and respect our customers' privacy, but my thinking is that if we work collectively, we'll find a way to share the data. Once that data is available, we can all work on the same problems and the technology will go forward. For example, in speech, image, and natural language processing, people grabbed the general AI frameworks to make them work for their tasks, and in solving those concrete problems they found new dimensions: you have to create different kinds of networks, you have to find different ways to optimize your neural network, and you contribute that back, and then you find, oh, this works not only for images, it works for other problems too. Speaking for our industry, since I come from telecom, I don't think we are there yet. So I keep thinking about whether we have that abstraction layer; you have to abstract your problems, because our problems are so many. For a network you have different phases: you have planning and design, you have operation, you have the optimization layer, you have 5G, you have tons of problems on a daily basis. Our business is quite complicated. There are lots of key problems we need to solve, we have to solve; otherwise it will become a problem for the other industries that rely on this network. So I truly hope we get that data set. I have searched a lot, and I think going back several years, there was one data set opened by Orange, and a couple of years back there was another, I forget from which company; it was a European company too. What I'm hoping is that it stops being like it is today, where lots of the papers you read apply AI to networks, to the telecom industry, but none of the data is truly available. We just don't have that common thing. Once we have it, I think it will not only solve our problems; because our problems are unique, when you work on them you will find new inventions, and then you can contribute back to AI, versus the one-way street today, where we use AI but don't contribute much back to the core.

Yeah. Well, unfortunately we're out of time, but you've both left us with a lot of challenges: help consolidate and create useful, general-purpose ML and modeling tools; find the right abstractions; improve the frameworks that allow the pipeline to be more efficient; increase AI efficiency; do better data sharing; and find the data abstraction layers so we can have more efficient data sharing. We've got so much work to do here.
Hopefully all of you can join the Linux Foundation deep learning community, just one of the many communities in this area, to help solve some of these problems. And I really appreciate both of you spending time with all of us today to talk about this. Thank you very much.