It's my great honor and pleasure to give a talk at this wonderful workshop in this prestigious place. Today I'm going to talk about the work conducted in our lab on artificial intelligence. The title of my talk is Building a Better Connected World with Artificial Intelligence Technologies. My talk will consist of two parts. First I will pick out some examples of the work done at our lab on AI, for the business of our company and also for people in the world. Then I will focus on one area, deep learning for natural language processing, which I'm also working on. So my talk is more practical than theoretical. I don't want to talk too much about math in front of mathematicians, though I like math. But at the end of my talk I will call for collaboration from mathematicians and theoretical people, because we think that deep learning has brought new challenges to machine learning theory, and we really need some breakthroughs in the theory of deep learning and, in general, machine learning. This is the vision of our company: we want to build a better connected world with ICT technologies. For our lab, that means artificial intelligence technologies. Our lab has a very unique name. It's called Noah's Ark Lab, because the founder of the company, Mr. Ren, thought that the future of ICT will depend on data, not just algorithms or chips, et cetera. Because he had this kind of vision five years ago and thought that a flood of information would come, Huawei needed to be prepared. That's why we needed to build our own Noah's Ark and have our own technologies to serve our customers better. So the lab started five years ago. We now have 150 researchers and engineers working all over the world in roughly 10 cities, including Paris. As Mo Wang mentioned, we also have people working in the French R&D Center together with other researchers and engineers.
We have eight research areas, which are strongly related to the current and future products of Huawei: intelligent telecommunication networks, speech and language, recommendation and search, big data analytics, computer vision, intelligent devices, internet of things, and smart city. As you know, Huawei has three major business areas: telecommunication, consumer, and enterprise. All our research areas are strongly related to the business of Huawei. Next I'm going to show you some examples of research on AI related to the products of Huawei. First, intelligent telecommunication networks. In this area we are mainly working on two major problems: one is software defined networks, the other is network maintenance. We want to use machine learning and data mining technologies to help improve the efficiency, et cetera, of telecommunication networks. Let's look at the example of software defined networks. I think you might be familiar with this concept. In the future we envision that all networks will be software defined. In this kind of situation, machine learning and data mining can play a very big role, because a lot of complicated phenomena cannot be easily defined by human knowledge or by human experts. So it's necessary to take the data-driven machine learning approach to solve these problems, including, for example, routing in a network or resource management. In our lab we have tried to develop some technologies for SDN using machine learning. For example, in a data center we have developed technologies to predict flow size. Because many different users may use the servers and resources in the data center in a dynamic way, the flow in the data center can dynamically change, so predicting flow size, et cetera, in the data center poses a big challenge. For example, we have developed online algorithms using Gaussian processes to predict the flow size.
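To make the idea concrete, here is a minimal sketch of flow-size prediction with a Gaussian process using scikit-learn. The data, the kernel choice, and the one-step-ahead setup are illustrative assumptions for this sketch, not the lab's actual system:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Illustrative data: past flow sizes (MB) observed at successive time steps.
rng = np.random.default_rng(0)
t = np.arange(50, dtype=float).reshape(-1, 1)
flow = 10 + 3 * np.sin(t.ravel() / 5) + rng.normal(0, 0.3, 50)  # synthetic load curve

# GP regression gives both a prediction and an uncertainty estimate,
# which helps when routing decisions must hedge against traffic bursts.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0) + WhiteKernel(0.1),
                              normalize_y=True)
gp.fit(t, flow)

# Predict the flow size at the next time step, with its standard deviation.
t_next = np.array([[50.0]])
mean, std = gp.predict(t_next, return_std=True)
```

An online version would refit (or update) the model as each new observation arrives; the uncertainty `std` is what distinguishes the GP approach from a plain point predictor.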
We have also developed technologies, for example based on reinforcement learning, to do dynamic routing based on the situation of the data center network. We can automatically decide the best routing strategy and try to minimize completion time. Let me show you a video to demonstrate the work we have done for routing in a data center. In a data center we have a large number of servers and a number of routers. By using reinforcement learning, for example, we can decrease task completion time by about 30%. As we iterate with reinforcement learning, we can dynamically change the flow control policy and thereby help improve task completion time in the data center. Based on simulation, we can improve efficiency by about 30%. Let me show you another example, that is, speech and language. We are working on speech recognition, machine translation, and natural language dialogue in this area. We have developed a number of methods for speech and language dialogue using deep learning. Let me also show you a video about single-turn dialogue. In this setting, suppose the user inputs an utterance; then the system returns a response. Suppose we have a large amount of data; then the system can automatically generate a response given the utterance from the user. This is a generative approach, which means the system can automatically create a response, but the model is trained from a large amount of data. Let me show you a demo. This is in Chinese, so if you input "I want to buy a Samsung phone", the system returns something like "support our national brand." I will explain the details later. Everybody says this is good. This is a joke; I will explain the details later. Suppose you go to a website and you copy and paste any random sentence into the system. This is the original input, and then the system returns a reply like this automatically.
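The routing idea above can be sketched with tabular Q-learning on a toy network. The four-router topology, the link delays, and the uniform exploration policy are all made-up assumptions for this sketch, not the lab's simulator:

```python
import numpy as np

# Illustrative network: link delays between four routers (inf = no link).
INF = float("inf")
delay = np.array([[INF, 1.0, 4.0, INF],
                  [INF, INF, 2.0, 6.0],
                  [INF, INF, INF, 3.0],
                  [INF, INF, INF, INF]])
n, dst = 4, 3

# Tabular Q-learning: Q[s, a] estimates the total delay of forwarding from s via a.
Q = np.zeros((n, n))
rng = np.random.default_rng(0)
for _ in range(2000):
    s = int(rng.integers(0, dst))              # start at a random non-destination node
    while s != dst:
        actions = np.flatnonzero(np.isfinite(delay[s]))
        a = int(rng.choice(actions))           # explore links uniformly
        # One-step lookahead: link delay plus best estimated delay from the next hop.
        target = delay[s, a] + (0.0 if a == dst else Q[a][np.isfinite(delay[a])].min())
        Q[s, a] += 0.5 * (target - Q[s, a])
        s = a

def route(s):
    """Greedy policy: at each hop pick the neighbor with lowest estimated delay."""
    path = [s]
    while s != dst:
        neigh = np.flatnonzero(np.isfinite(delay[s]))
        s = int(neigh[np.argmin(Q[s, neigh])])
        path.append(s)
    return path
```

On this toy graph the learned policy finds the minimum-delay path 0 → 1 → 2 → 3 (total delay 6) rather than the shorter-looking direct hops; a real system would learn over measured, time-varying delays.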
It generates a reply; for another sentence, again you get a reply from the system. We were the first team to develop this model in the area. If you are familiar with deep learning, this is a sequence-to-sequence learning model. Basically it's a chain of recurrent neural networks. First, when you give a sentence, we have an encoder based on an RNN, a recurrent neural network. We transform this sequence of words into a sequence of intermediate representations. Then we have another RNN working as a decoder. The decoder transforms the internal representations into a new sequence of words. We have crawled roughly four million pairs of sentences from Chinese Weibo, which is a kind of Twitter in China. We can take each pair of sentences as one round of conversational data. The original message in Weibo can be viewed as an input from the user. In Weibo, usually each post has multiple comments from other people, so we can take each comment of a post as a reply to the original input. In this way we have roughly four million pairs of sentences, four million pairs of conversational data. Then we can use the sequence-to-sequence learning technique to train this model, which has roughly 100 million parameters in total. One interesting thing is that the model can memorize training data very well. Suppose you have a pair you want the system to memorize. You can just put the training instance into the training data and then let the model repeat the training instance. That's why the second example is a joke: we manually created such a pair and put it into the training data, and then the model memorized it. Once you input exactly the same input, you get exactly the same response as in the training instance. Okay, that's why we have the Huawei example.
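The encoder-decoder structure just described can be sketched as a schematic forward pass. Everything here is a toy stand-in: the vocabulary, the dimensions, and the random (untrained) weights are illustrative assumptions, so the output tokens are arbitrary; the point is only the shape of the computation:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 12, 8                           # toy vocabulary size and hidden dimension
E = rng.normal(0, 0.1, (V, d))         # word embedding table
W_enc = rng.normal(0, 0.1, (d, d))     # encoder recurrence weights
W_dec = rng.normal(0, 0.1, (d, d))     # decoder recurrence weights
W_out = rng.normal(0, 0.1, (d, V))     # hidden state -> vocabulary logits

def encode(tokens):
    """Fold the input word sequence into a single context vector."""
    h = np.zeros(d)
    for t in tokens:
        h = np.tanh(E[t] + W_enc @ h)
    return h

def decode(h, max_len=5):
    """Greedily emit word ids from the context vector (untrained, so arbitrary)."""
    out = []
    for _ in range(max_len):
        h = np.tanh(W_dec @ h)
        out.append(int(np.argmax(h @ W_out)))
    return out

reply = decode(encode([3, 7, 1]))      # "utterance" in, "response" out
```

Training the real model means fitting all these weights (about 100 million parameters in the system described) so that `decode(encode(message))` reproduces the observed replies.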
One interesting thing is that the model can not only memorize training instances; it can also generalize. It can create natural responses given inputs in Chinese. That's very interesting and surprising. With 100 million parameters, with this kind of complicated model, it can mimic people in creating natural language sentences. We did some analysis of the results: roughly 95% of the responses are natural language sentences. That means, from a human's viewpoint, 95% of the replies from the system can be viewed as Chinese sentences. And roughly 76% of the replies can form one round of natural dialogue. That's also very interesting. You can view the replies from the system just like the examples we have seen: the system can intelligently reply to you in natural language. That's one piece of work we did. Another big area for us is recommendation and search. We are working with the Huawei smartphone group to build the search engine, recommendation engine, and information management engine for our smartphone users. For example, this is for the app store. In China, Huawei has its own app store, and we are developing the recommendation and search engines with the Huawei smartphone group. Let me give you a rough idea of what I mean by recommendation and search in the app store. If you have a Huawei phone (if you don't, please buy one) and you go to China, you can go to the app store there. If you search for apps or browse any page, you get recommendations from us. The engines behind the app store are developed by our lab. Next I'm going to introduce another work by us, about personal information management. Almost every user has a lot of photos on her smartphone, and helping the user find photos stored on the smartphone is becoming a very big problem. We again use deep learning technologies to develop a model to help users find photos on smartphones. Let me again show you a video.
This is again in Chinese; the user can input a Chinese sentence to find photos or images, for example, in this case, "outside from an airplane". We have 50,000 photos in the archive, and among all the photos we have created a training dataset. We asked people to label each training photo with some natural language descriptions. For example, in this case, "climb mountain" or something like that, and you get related photos. The model is very simple. We have, for example, 20,000 photos labeled, each photo with roughly five descriptions on average. Then we build a model like this. We have two CNNs, convolutional neural networks. The CNN on the left-hand side creates one representation given a photo, and the CNN on the right-hand side creates a representation for the natural language sentence, the text. Then we do a matching between the image representation and the text representation, to see whether they match. Given any pair of text and image, we can use this model to decide whether they match each other well; semantically, whether they are related or relevant. We have data, as I mentioned, to train this model, to learn the representations of image and language and also the matching relation between the two. With this, we can do a very good job in image retrieval. Another area I would like to quickly introduce is big data analytics. We are working with, for example, telecom carriers. A carrier usually has a lot of data on the telecommunication network side, usually called operation support system data, or OSS data. They also have data about customers, usually called business support system data, or BSS data. We are working with a number of carriers to help them leverage OSS and BSS data to improve customer relationship management and better manage their telecommunication networks.
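The two-encoder matching model can be sketched as follows. Here the two trained CNNs are replaced by random linear projections, and the "features" are random vectors; these are stand-in assumptions so that only the matching mechanism (a similarity score in a shared space) is shown:

```python
import numpy as np

rng = np.random.default_rng(0)
d_img, d_txt, d = 64, 32, 16   # raw feature sizes and shared-space size (illustrative)

# Random projections stand in for the two trained CNN encoders in the talk.
P_img = rng.normal(0, 0.1, (d_img, d))
P_txt = rng.normal(0, 0.1, (d_txt, d))

def match_score(img_feat, txt_feat):
    """Cosine similarity between the projected image and text representations."""
    u, v = img_feat @ P_img, txt_feat @ P_txt
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Retrieval: rank stored photos by their match score against one text query.
photos = rng.normal(size=(5, d_img))
query = rng.normal(size=d_txt)
ranking = sorted(range(5), key=lambda i: -match_score(photos[i], query))
```

In the real system the projections (and the CNN features feeding them) are learned from the labeled photo-description pairs, so that matching pairs score high and mismatched pairs score low.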
For example, one thing we have been doing is to help carriers identify the locations of users. If you have a smartphone, your phone is usually talking with base stations in the surrounding area, right? Every eight seconds, in fact, the phone has one communication with nearby base stations, and this kind of data is recorded at the base stations. We can leverage the data to identify the location of each individual user. That's the idea here. We can, in fact, create a trajectory for each individual user based on the data, and this will help the carriers improve a lot of things. One thing is, for example, we can help with better planning for a city. This is, for example, the case of Shanghai, China. We are working with one carrier to collect the data of their five million users. In this way, we can easily look at how the five million users move around in the city. For example, the red arrow indicates that there are more users of this carrier moving that way in this time window. For each user, we have a number of labels, so we can also look at the distribution of gender, the distribution of ages, et cetera, for a particular time in a particular region among the five million users. In this way, we can help the city government do better planning and help the carrier do better promotion, et cetera. One big challenge we are facing here is that the data, usually called MR data, measurement report data, the log data from the base stations, is very noisy. If you want to identify the location of a particular user, it's quite noisy. This figure shows, for example, the line in white, which is the trajectory of one particular user across several blocks in the city, and the red lines, which are the original data from the base stations, the MR data. You can see it's very random and very noisy.
With the original MR data, it's really hard to identify the real trajectory, the locations of the user at different times. So it's very necessary to improve the accuracy of trajectory prediction from the MR data. That's something we are doing here. If you take the raw MR data, the mean error is something like 100 meters, much worse than GPS. So if you only use raw MR data for location prediction, it is not very useful compared to GPS. Here what we try to do is use only MR data and see how much we can improve. We have developed a number of models to improve the accuracy of location identification. For example, the ground truth is in white, and with one prediction model, the green line is the output from our model; we can do a much better job in predicting the user's trajectory. With additional knowledge of the city, with map knowledge, we can further improve the accuracy of prediction. Okay, this is another example. So there are many examples of AI technologies which we are developing or have developed to help improve productivity, user experience, et cetera. Next I'm going to talk more about deep learning for natural language processing, and I will particularly point out the advantages and the challenges. You have seen some examples, like image retrieval and natural language dialogue, in the first part of my talk, and you have also seen the power and some limitations of natural language processing enhanced by deep learning. To me, there are five fundamental problems in natural language processing. If you take a mathematical view of natural language processing, the ultimate goal is to let computers understand human language, but that is still difficult, in some sense nearly impossible at this moment.
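The denoising step can be illustrated with a toy version of the problem. The straight-line ground truth, the 100-meter Gaussian noise, and the moving-average smoother are all assumptions of this sketch (the lab's actual models, with map knowledge, are far more sophisticated); the point is only that even simple filtering of noisy MR-like observations reduces the mean position error:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
# Ground-truth trajectory in metres (a simple straight walk, for illustration).
truth = np.stack([np.linspace(0, 1000, T), np.linspace(0, 400, T)], axis=1)
# MR-like observations: roughly 100 m of noise, as mentioned in the talk.
mr = truth + rng.normal(0, 100, truth.shape)

def moving_average(xy, k=15):
    """Smooth each coordinate with a centred window; a stand-in for the real model."""
    kernel = np.ones(k) / k
    return np.stack([np.convolve(xy[:, j], kernel, mode="same") for j in range(2)],
                    axis=1)

def mean_error(est):
    return float(np.mean(np.linalg.norm(est - truth, axis=1)))

raw_err = mean_error(mr)
smooth_err = mean_error(moving_average(mr))
```

Averaging over a window shrinks the independent noise by roughly the square root of the window length, though the zero-padded window biases the estimates near the ends of the track; a Kalman filter or map-constrained model would handle those edges properly.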
We have to formalize all the major natural language processing problems as mathematical problems. In my view, there are five fundamental problems in natural language processing from the viewpoint of math: classification, matching, translation, structured prediction, and Markov decision process. Let's look at them one by one. First, classification: given a string, you want to assign a label to the string. It's a very typical classification problem. In natural language, a string can be a sentence or a document, and a label can be a category representing, for example, some semantic category, et cetera. Matching is a problem like this: given two strings, you want to match the two and see whether they are relevant or related, et cetera. For translation, you want to transform one string into another string; this is also a very important natural language problem. For structured prediction, given a string, you want to find the structure in the string. And for the Markov decision process, the task is more complicated: given a state and an action, you want to predict what the next state would likely be. If you look at all the major natural language problems in applications, they can be formalized into one of the five basic problems I have listed here. For example, text classification and sentiment analysis can be viewed as classification problems. Matching is widely used in search and question answering. For example, in search, given a query, you want to find the most relevant document; it's usually a matching problem between the query and a set of document candidates. This is also true for dialogue: some dialogue systems are built on retrieval technologies, so matching is important there as well.
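The first of the five problems, classification of a string into a label, is easy to show concretely. This sketch uses a bag-of-words sentiment classifier on a tiny made-up dataset; both the data and the model choice (scikit-learn's `CountVectorizer` plus logistic regression) are illustrative assumptions, not anything from the talk:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy sentiment data (made up for this sketch): string -> label.
texts = ["great phone, love it",
         "terrible battery, broken screen",
         "excellent camera quality",
         "awful service, very disappointed",
         "love the display",
         "broken and disappointed again"]
labels = ["pos", "neg", "pos", "neg", "pos", "neg"]

# Classification as a mathematical problem: learn f(string) = label.
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(texts, labels)

pred = clf.predict(["love this excellent phone"])[0]
```

Matching and translation can be formalized the same way, as functions of two strings or from string to string; only the shape of the input/output changes, which is exactly the point of the five-problem taxonomy.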
For translation, we have machine translation, speech recognition, handwriting recognition, and also single-turn dialogue. The example I have just shown, the generative approach to natural language dialogue, can be viewed as a kind of translation problem; we in fact use sequence-to-sequence learning techniques to perform single-turn dialogue, right? It's again a translation problem. For structured prediction, we have, for example, named entity recognition, part-of-speech tagging, word segmentation in Chinese, sentence parsing, semantic parsing, et cetera, as typical examples. And the Markov decision process can be used in, for example, task-dependent multi-turn dialogue. If you look at all the major natural language problems in practice, they can be categorized into these five basic problems. Recently, we have seen that deep learning has significantly improved performance on the first four problems in natural language processing, particularly translation, structured prediction, and matching. For example, for machine translation with sequence-to-sequence learning, so-called neural machine translation has already outperformed traditional statistical machine translation. This is also true for other tasks. With more data available, we can do a better job with deep learning on the first four problems in natural language processing, but not the last one. Next, I'm going to discuss more about the advantages and limitations of deep learning for natural language processing. First, let's look at the advantages and disadvantages of deep learning in general; some may be more related to natural language processing. We all know that deep learning, deep neural networks, are good at pattern recognition, right? If the task you want to do involves many complicated patterns, deep learning can easily learn and capture those patterns.
For example, even machine translation is in fact a kind of very complicated pattern recognition problem, right? Deep neural networks have the capability to acquire the patterns in the problem and then do a very good job. And it's data-driven; the performance is higher than other approaches. Another advantage is end-to-end training; I will explain this a little more later. You don't need human knowledge involved in system construction. In other words, you don't need to do feature engineering. For example, if you work on machine translation, you don't even need to know the languages, right? You don't need knowledge of the language pair you are working on. You just take the data and feed it into the system, and you can build a translation system. That was not possible before. So: end-to-end training, and representation learning. As we have seen in the image retrieval example, deep learning is also very powerful in building representations across different modalities. In this way, we can easily do matching between data in different modalities. Also, in some sense it's very easy to build a deep neural network system, because all you need is a gradient-based method; the algorithm is very simple in some sense, right? And it's quite powerful. But there are also limitations of deep learning. For example, it's not very good at inference and decision problems; I will give you an example later. It's data-hungry: you need a lot of data to train the model. It's also difficult to handle the long tail; I will explain this long-tail problem later. The model is usually a black box; it's difficult to understand or interpret the mechanism inside the system. And the computational cost of training is very high.
Also, for some unsupervised learning problems, it's still not clear whether we can develop a very powerful deep learning method to address them. And the fundamental issue with deep learning is the last one: it still lacks a theoretical foundation. That is something I want to emphasize at this workshop; I will talk more about it later. Let's look at several of the advantages and disadvantages. First, end-to-end training. As we have seen in the natural language dialogue example, it's really surprising, right? The model can be built with a lot of data without any human involvement. It's just sequence-to-sequence learning: two RNNs, one encoder and one decoder, and they can transform the representation of one sentence into another sentence. This is really surprising, and it can be observed in many different tasks in natural language and even in other fields. You don't need to understand the details of the particular problem; for example, in translation, you don't need to understand the two languages. But in the meantime, it also means that this is a huge black box. We really don't know what is happening inside. Another advantage of deep learning is representation learning. Before, in natural language processing, when you did information retrieval, for example question answering or search, you also did matching, but you represented the two strings, query and document, with vectors based on terms, based on symbols, right? It was a kind of symbolic matching. But this was only possible for comparing two strings; you could only do symbol matching between strings. It was not possible, or even thought possible, to match across different modalities. For example, query and image matching, with the query in natural language text matched against an image, was not possible; people didn't realize it was possible.
But with deep learning, we can in fact learn representations for images and text and perform matching between them, and they work very well. That's something really surprising. To me, those are the two surprises: one is end-to-end training, the other is cross-modality matching. It's kind of mysterious, or magic, and it was not possible before. Now let's look at the limitations. First, inference and decision. There are many hard problems in natural language, particularly multi-turn dialogue. Let's just look at an example: multi-turn question answering. It would still be difficult for deep learning to address multi-turn question answering. Single-turn is very easy. For example, if you ask, "How tall is Yao Ming?" (he is a very popular basketball player in China), then the system may return an answer automatically. A single-turn question answering system can be based on deep learning, and it can be retrieval-based or generation-based. That is fine. As I have explained, deep learning can do a very good job on such problems, matching and translation. But if you go to multi-turn question answering, it becomes much more challenging and complicated. For example, suppose we have two rounds of conversation, and the first round is the same as before. In the second round, the user asks, "Who is taller, Yao Ming or Liu Xiang?" (Liu Xiang is another athlete popular in China), and then the system must respond. Even for this very simple example, it would be very difficult to formalize the second round of dialogue as a matching or retrieval problem. In this second round, first, inference is involved: the system needs to compare the heights of the two people. Also, the system needs to keep track of the conversation so far: who was mentioned, what the height of the previous person was, et cetera.
So it's not a simple matching or retrieval problem. In our human mental processing, this may be a very complicated task: multiple modules are involved, and multiple types of information processing are involved in the whole process. Even from this very simple example, you can see that it's not clear, at least, that deep learning can solve this problem; it involves inference, decision, et cetera. So it's still not clear; deep learning may not help with this challenging problem. Maybe for multi-turn dialogue, deep learning alone will not be enough. Another challenge for deep learning in natural language processing is the challenge in the tail. As you know, natural language data usually follows a power-law distribution; that means there is always a long tail. Here is just one example. If you collect news articles from the Xinhua news agency over the years, you get more and more articles, and once you have more articles in the corpus, you observe that the number of unique words, the size of the vocabulary, keeps increasing. The growth is not linear but sublinear, and this is a very typical trend. When you have more natural language data, you always get more unique vocabulary: unique names, new terminologies, et cetera. That means it's a kind of never-ending story; you always have new vocabulary. If you train the model using a statistical approach (this is not only a deep learning issue, but it is also true for deep learning), you always have a lot of long-tail, rare words. How to train the model with regard to the rare words is a big challenge, because if you just use the data to train that part of the model, that part is never well trained; it can never cover the long tail. And the number of rare words is quite large; it's not small.
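The vocabulary-growth trend can be reproduced with a synthetic corpus. The Zipf-like word distribution below is an assumption of this sketch (not Xinhua data), but it shows the same behavior: as the corpus grows, the vocabulary keeps growing, sublinearly but without bound:

```python
import random

# Synthetic Zipf-like corpus: word i is drawn with probability proportional to 1/i.
random.seed(0)
n_types, n_tokens = 20_000, 50_000
weights = [1.0 / i for i in range(1, n_types + 1)]
corpus = random.choices(range(1, n_types + 1), weights=weights, k=n_tokens)

# Track vocabulary size after every 10,000 tokens.
seen, growth = set(), []
for i, w in enumerate(corpus, 1):
    seen.add(w)
    if i % 10_000 == 0:
        growth.append(len(seen))
```

Each additional slice of the corpus still contributes new word types, so no finite training set covers the tail; that is exactly why the rare-word part of a purely statistical model stays under-trained.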
So this is another issue not solved, at least, by deep learning in natural language processing. Finally, this is the last problem with deep learning. We have observed many interesting phenomena with deep learning. Recently there was also a very interesting paper from Google about the generalization ability of deep learning; I don't know whether you have heard about this work. It is interesting and also in accordance with our own observations: we have observed similar phenomena in our own work on natural language processing using deep learning. First, let's think about the generalization ability of deep learning. Generalization is a key aspect of machine learning: if an algorithm cannot generalize, it does not work, right? Generalization ability is fundamentally important for any machine learning algorithm. But here is one interesting phenomenon: when you train a deep neural network in practice, usually we don't observe overfitting. That means generalization works quite well. The model is huge, and usually you have a huge amount of data to train it, and when you look at the training error and the test error, you see that they are similar. Maybe not exactly the same, but similar, and usually both small. So there is very good generalization; there is no overfitting. That is one observation in practice. Another interesting thing is that neural networks can memorize training instances, as we have seen in our demo, right? In the dialogue example, the model memorized training instances. This is also observed in computer vision: once you put some instances into the training data, the training instances can be memorized by the model. Now, this is from Google's work, which is a very interesting idea. The basic idea is this.
They inject some noise into the training data in the training phase, and in the extreme case, they randomize everything in the training data. That means the training data is complete noise. But still the neural network can memorize the noise, and in this extreme case the training error is zero, because it can memorize everything. Even with all the noise injected into the training data, the model can remember the examples, right? The training error is zero. But the test error is huge, because there is no relation between the input and output in the training data; the learned model is really random. When the model is applied to prediction at test time, it cannot work. This phenomenon is also observed in our own experiments on natural language. This is very surprising, or in some sense very interesting, because it means the previous theory about generalization ability is broken here; it cannot explain these phenomena. One important thing in neural network learning is that the number of parameters is usually very large, larger than the number of training instances. That's also the case for us: we have 100 million parameters to train, but we don't have that many instances in our training data, right? That's one of the reasons we can imagine that the neural network can memorize the training instances. However, it's not clear how to interpret the behavior when you inject noise into the training data: the discrepancy between the training error and test error when there is noise in the training data. There are many open problems here, and it's still not clear how to explain this. All the existing generalization theory, if you are familiar with machine learning theory, the theory based on VC dimension, Rademacher averages, et cetera, cannot explain the phenomena here. So in some sense, all the existing theories are broken.
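The randomization experiment can be reproduced in miniature. This sketch uses scikit-learn's small MLP rather than a deep network, and the data sizes are toy-scale assumptions, but the qualitative outcome is the same: an over-parameterized network fits pure-noise labels almost perfectly while its test accuracy stays near chance:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Pure noise: both features and labels are random, so there is nothing
# generalizable to learn.
X_train = rng.normal(size=(40, 20))
y_train = rng.integers(0, 2, 40)
X_test = rng.normal(size=(400, 20))
y_test = rng.integers(0, 2, 400)

# Over-parameterized network: far more weights than training examples.
net = MLPClassifier(hidden_layer_sizes=(128,), solver="lbfgs",
                    max_iter=5000, random_state=0)
net.fit(X_train, y_train)

train_acc = net.score(X_train, y_train)   # memorization: should approach 1.0
test_acc = net.score(X_test, y_test)      # should stay near chance level
```

The gap between the two accuracies is exactly the discrepancy that VC-dimension or Rademacher-based bounds fail to explain for models in this over-parameterized regime.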
So then we have this question: what would be the generalization theory for deep learning, when the model is complicated enough? There is no theory about this. To me, this is really a math problem, because in machine learning, eventually you want to learn a function, right? It's a function approximation problem: you want to approximate a function from data. But when the function is very, very complicated, you don't have enough data to approximate it, and then we observe many interesting and strange phenomena, especially when the function is a nonlinear function like the neural networks we have, right? So what is going on? I really hope that people in the audience who get interested in this kind of problem will address it and help us solve it. To me, I would say that deep learning really creates a new crisis for machine learning, because we don't have a theory. On one hand, we have many successful application stories showing that deep learning can do a very good job in practice; on the other hand, we really don't have any theory to explain why. This big gap really needs help from mathematicians. Yeah, that's my talk. Thank you. Any questions? Yeah. I just have a question about how you input language. If I have two words, say, do I input them as two separate unit vectors with zeros everywhere and just a one at one place? Usually we don't do that; we need to compress, otherwise there are too many parameters, because we have a lot of words in practice. There is a technique called word2vec; it's an algorithm that tries to compress. What you have mentioned is one method, which is called a one-hot vector; we usually don't use it in practice. One element is one and all the others are zero. In principle, we try to compress the co-occurrence information into a lower-dimensional but denser vector, because a one-hot vector is very sparse, right?
We use a dense low-dimensional vector to represent each individual word as a starting point. The community has developed several techniques for doing that. Another thing, do people ever try to use letters instead of words? Yes, yes, yes. That's also a very interesting approach and quite powerful. For example, for translation among European languages, it works quite well. Or a string of letters; we call it subword. So yeah, it also works, yeah. Good question. Yeah, so I share most of your points. That is just impressive, but I mean, at no point is the system actually understanding what it's suggesting. It means when you have this dialogue thing, you are producing the right answer, which looks reasonable to a human, but the system doesn't understand what it is saying, right? Right, right. There is no clue actually, no interpretation behind it. You just go and find, from what you saw in training, something that is similar. Right, right. It's the same thing with translation; there is no reasoning behind it. No reasoning. So I think for all the tasks where we don't need reasoning, putting in more and more data will most likely do better, and deep learning methods would be the right answer. If you go to the case where you have this multi-turn dialogue, there you need reasoning. You need to understand the question. So do you think in the future we will not need reasoning anymore at all, for all the tasks? Because you can imagine, maybe if you have enough data, you don't need to reason anymore. No, no, no. I'm not so extreme. So I kind of agree with you. There is another approach, right? Discriminative and generative. Yeah, I think reasoning is very important, not just employing the brute-force approach of using data. There is clearly a limitation. Maybe if you have enough data, we don't need to understand anymore.
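The subword idea mentioned above (representing words as strings of letters or frequent letter groups) is commonly implemented with byte-pair-encoding-style merges. This is a minimal sketch on a toy corpus of my own invention, not necessarily the technique the speaker's team uses: repeatedly merge the most frequent adjacent symbol pair, starting from individual letters.

```python
from collections import Counter

def most_frequent_pair(words):
    """words maps a tuple of symbols to its corpus frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: each word split into letters plus an end-of-word marker.
words = {tuple("lower") + ("</w>",): 2,
         tuple("lowest") + ("</w>",): 1,
         tuple("newer") + ("</w>",): 3}

for _ in range(4):  # learn 4 merges
    pair = most_frequent_pair(words)
    words = merge_pair(words, pair)
    print("merged", pair)
```

Frequent letter strings like "er" or "we" get merged first, so common stems end up as single units while rare words still decompose into smaller pieces, which is what makes the approach work across morphologically related European languages.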
Just if you have enough data, but only for shallow problems. If it's multi-turn dialogue, it would be more complicated, right? You don't have enough data. Data is observation, right? You just look at the behaviors of a person; you cannot model his thinking, right? Something like that. For example, you can imagine using an approach from linguistics to generate sentences which make sense, right, and then having the answers, and then keep feeding them to your network. But for simple tasks it would be possible. Translation is a simple task, right? Yeah, but for multi-turn dialogue, I doubt it, right? And even for the single-turn generative approach to dialogue, there is a limitation, right? We can only achieve 76% accuracy. Yeah, right, right? Because you only observe; you don't have any understanding, inference, et cetera. So there is a clear limitation. Yeah, I don't personally believe it. I'm interested in how far we can go, but I don't think we can achieve human capability of using language with this kind of simple data-driven approach. And do you think that deep learning as it is... Human intelligence, I think we do reasoning, right? Yes. I'm not so sure, but I think we do reasoning. Sure. We are a rule-based system. Humans are rule-based systems: we learn things and we apply rules. This is a non-rule-based system, so we have no clue what's happening inside. Right, right. I would say that we take a kind of hybrid approach. We have some patterns, right? We memorize something, and that part can maybe be learned or acquired by deep learning. But we also have reasoning capability. For long-tail phenomena, right, we use reasoning capability; otherwise we cannot handle them. We don't have difficulty dealing with the tail. But for a machine, if you take the data-driven approach, there's always a limitation there. Are there other questions? Yeah?
I have a question about how you can judge one algorithm against another algorithm. For example, you have some actions like inference, memory. How can you say one inference answer is better than another inference answer, or one memory? How can you quantify a good memory? For these things, sometimes you have objective metrics, sometimes you have subjective metrics. How can you judge whether the outcome of an algorithm is good? Usually, we take a task-driven approach, right? If the task is well defined, if you want to accomplish something with a robot, you communicate with the robot, and if the robot can quickly understand your point and accomplish the task, then we say, yeah, the robot is doing a very good job. So for all the problems, usually we try to derive measurable criteria to do evaluation, et cetera, right? Based on the task, yeah. Otherwise, it's difficult, yeah. So maybe I have one or two general questions. One is around what you were saying about the fact that you need to label stuff. This goes back to the question of unsupervised learning. What do you think about the future of unsupervised learning and applications for telecommunication? Telecommunication, okay. I guess one of the big problems we have here is that you need to label the data. And, okay, there are some companies, like Facebook and others, who are taking advantage of the fact that people label things just because they're on the web, so it was free. The question is, if we can't label data, what do we do? Yeah, so yes, I agree that unsupervised learning is a very important topic from now on, yeah. Do you think that's the future? Yeah, yeah, I think so, yeah, I agree, yeah. We need to get the problems in unsupervised learning addressed quite well, right? We need to put more effort into research on unsupervised learning, yeah, I agree.
And my second question is about one criticism of the black-box approach, right, the fact that we don't know what's happening. Basically, you were saying, okay, one of the problems with deep learning is that it looks like a black box. But the other thing, at least when I hear people talk about using AI, is that whenever they don't understand anything, they like it, because it's a black box. Yeah, because then we don't know how to put our hands in it, we don't understand anything, and they say it's better to model everything as a black box and to train the box. So my feeling is that with the black-box approach, as far as I see people going in, whenever they can't model things, they consider it a very good asset to have a black box. Yeah, in some sense, yeah. But sometimes you want to have some understanding, right? The mechanism, at least the mechanism. You don't need to understand the details, but at least some level of understanding is necessary. That's why I think we need to put more effort into research on understanding the mechanism of deep learning; otherwise we'll have problems. It's a science, right? We need to understand the details to some extent. Yeah, I totally agree with you. The only thing I see is that the approach of the AI people is: because we cannot understand, it's better to model it as a black box, and we're not going to make any effort to understand. That's the feeling I get from their approach. That's some people, right? It's too hard to understand the whole box, so I'm just going to consider it a black box and go for it. It becomes a kind of philosophical problem, right? It's difficult to argue. If you take the behaviorist point of view, then it's okay, you just model the behavior, right? You don't care about the inside, and that's the black-box approach. But for the goal of building AI, I think it's necessary to understand the inside somewhat, and that's why the behaviorist approach is not enough, I think.
I fully agree with you. I'm trying to find a mix, trying to mix, like you were saying, the prior, the human aspect, the understanding, with AI, and how you can match both would be a good thing. I think we're lacking a way to incorporate it in a mathematical way. Right, right, right. This is my view, and I agree. There are three approaches to AI, maybe. One is you just look at the behavior, the data-driven approach, right? It's the most successful approach for now. Another is you look at the mechanism of the human brain, try to understand the mechanism, and build a model to mimic the human brain mechanism, right? That's another extreme approach. It does not work so well because we don't understand the human brain very well, at least for now; maybe it takes hundreds of years, right? Another approach would be knowledge-based, because we have knowledge, we can write down everything, right? That's the traditional AI approach, the symbolic or knowledge-based approach. It didn't work well, right? The failure of traditional AI was because people believed that everything could be realized with knowledge incorporated into the system, rules defined by humans, by experts, et cetera. It didn't work well, right? Maybe a kind of hybrid, at least in the coming years, will work quite well. That is, the main framework is still data-driven, but we get inspiration from the human brain; that's what we're doing with deep learning, right? On the other hand, we also take human knowledge and incorporate it into the model: a good mix of the three approaches, right? But with data-driven as the main one, because to me that's the only approach where we can mathematically define the models very elegantly, right? Even if you incorporate knowledge, your approach is still a statistical approach with some knowledge or prior incorporated, right?
So the framework is still statistics, mathematics, right? For the other two approaches, the human brain approach and the purely knowledge-based approach, it's still not clear; maybe it will be possible in the future to define a kind of mathematical model to solve the problems, right? That's why we have to take a kind of hybrid approach to AI in the coming years. That's my personal view.