So we're here at Linaro Connect. Who are you?

I'm Andrea Gallo, VP for Strategic Initiatives and Segment Groups.

So what's the latest?

The latest is that we are announcing the Machine Intelligence Initiative, and as part of this, Arm is making a major step in donating the Arm NN SDK as an open source project.

So this is machine learning coming into Linaro now?

Yes. Machine learning, and especially deep learning for the masses, is really recent. Neural networks started back in the 70s and 80s, and then somehow the hype faded. But 2015 is the year of the very first commit to the TensorFlow project from engineers at Google. They were first working on the DistBelief project, and that turned into the first commit to TensorFlow, late one night in 2015. 2015 is also the year of the first commit to Caffe2, another very important machine learning framework, endorsed by Facebook and the Berkeley AI Research lab. And 2015 is the year of the first commit to MXNet, endorsed by Amazon, Carnegie Mellon University and other universities around the world. So 2015 is a very important year. This is because the compute capabilities of CPUs and GPUs had reached a level of performance that made it possible to run all the training and inference in the data centers, which was not possible before. And it is also thanks to the cloud. Every time we upload our images to the social networks, the public images, the public postings, these go through inference: detecting faces, smiles, objects, automatic tagging. But this also means that we are providing a huge database of billions of images that all the cloud vendors and social network vendors can use to train their algorithms and improve their accuracy.

So this is the cloud. There are constraints that come with deep learning in the cloud, in the data center. You need to be always on, always connected to the Internet. There may be constraints in bandwidth and traffic, because you need to upload your images to the cloud to get them analyzed. There may be constraints in time, in latency; there may be real-time constraints on a production line in an industrial use case, for example in detecting a liquid spill, or in any other analysis of data captured from sensors. There may be privacy concerns: there may be sensitive data that you don't want to upload to the cloud.

So this is where edge computing comes into the picture. Edge computing basically means deploying computing workloads, deploying machine learning and deep learning algorithms, at the edge: in micro data centers or powerful edge nodes that sit right where the data is captured from the sensors. And this is possible because all the modern application processors, the ones we have in our smartphones, nowadays have neural network processing capabilities. This can be heterogeneous computing using the CPU, the GPU, the DSP, some special instructions or MAC units, or it can be a complete offload processing engine. This is what we refer to as a machine learning processor, a neural processing unit, a deep learning accelerator; DLA, NPU, all these acronyms mean roughly the same thing.

Now the point is that on one side you have the frameworks: TensorFlow, Caffe2, MXNet, PyTorch, PaddlePaddle and the other Chinese frameworks. And on the other side you have all these companies that provide neural network accelerator IP.
I could easily count a hundred companies providing NN processing IP of one sort or another. So they need, and this is a problem that we at Linaro are very familiar with: you need to integrate accelerators with complex open source frameworks, yet you should avoid forking and fragmentation. And today that is exactly what is happening. With a hundred companies providing neural network IP, you start by downloading the runtime of a given framework and you modify it to add the hooks, the calls to your driver. Well, that is a fork. And with many companies providing IP and many very complex frameworks, every framework is fluid; they change continuously. They are between 500,000 and 1.5 million lines of code. These are pretty big codebases and they change all the time. It is not scalable; it is not sustainable to have hundreds of forks. So this is where we at Linaro have the skills to help and to have a significant impact. This is the core of the mission of the Machine Intelligence Initiative. We want to reduce the fragmentation, yet we want to ensure that all the members in the ecosystem can keep their competitive advantage.

The way we are achieving this is, first, by endorsing open standards for the machine learning frameworks: open standards for the format that describes the neural network, the weights and the entire deep learning algorithm. We also need to adopt open standard APIs to initiate a graph, to initiate a backend on the device, to start the execution and to be notified when the inference has completed. ONNX is a very interesting candidate as an open standard, and we are seriously evaluating and considering leveraging ONNX and the ONNXIFI API as the upper layer interface boundaries of our work.

Then we want to work on a common open source inference engine that is shared across the Arm partners in the ecosystem, optimized for the Arm cores, yet it shall provide a framework of plugins so that every neural network accelerator can be plugged in. The inference engine then becomes a kind of commodity, shared software, and every vendor only has to focus on their own specific accelerator, their competitive advantage. So the total cost of ownership for each partner providing neural network IP is much lower: they only need to focus on and keep upgrading their own IP, their key differentiator, while at Linaro all the players together can collaborate on the common inference engine, maintaining it and evolving it.

So, for example, TensorFlow, Caffe and all that, those are on one side, and on the other side is something else. And TensorFlow and Caffe are open source too, right?

Yes.

So the other parts are also open source, but you are tying them together in between?

Yes, correct. As I said, when you tie one of these frameworks down to an accelerator, you need a runtime engine that goes through the graph used for the inference phase. The runtime is something that looks at the graph and translates it into a set of complex mathematical operations that are executed in sequence. So we are tying up the frameworks and the accelerators that offload the CPU in a way that is not duplicated for every vendor: not a hundred inference engines that all do the same thing but slightly differently, where nobody is able to rebase them or provide fixes and new features. So we want to focus on the... yes, you said the tie-in, that's a good way of phrasing it.
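To make that open standard flow concrete (load a graph, initiate a backend, start execution, collect the results when inference completes), here is a minimal sketch. It uses the ONNX Runtime Python API as a stand-in for the lower-level ONNXIFI C interface mentioned above; the model file name and input shape are hypothetical placeholders.

```python
# Minimal sketch: run an ONNX graph through a backend and collect
# the results. "model.onnx" and the input shape are placeholders.
import numpy as np
import onnxruntime as ort

# Initiate the graph on a device backend (plain CPU here; an
# accelerator would show up as another execution provider).
session = ort.InferenceSession("model.onnx",
                               providers=["CPUExecutionProvider"])

# Build an input tensor matching the graph's declared input.
input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# run() starts execution and returns once inference has completed.
outputs = session.run(None, {input_name: x})
print(outputs[0].shape)
```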
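And here is a hypothetical sketch of the plugin model for the common inference engine: each vendor ships a small backend that declares which operators it can offload and hides the device details behind a single call, while the engine itself stays shared code. Every name below is illustrative, not a real Linaro or Arm NN API.

```python
# Hypothetical plugin interface for a shared inference engine.
from abc import ABC, abstractmethod

class AcceleratorPlugin(ABC):
    """What a vendor implements for their NPU, DLA or GPU; the
    engine that calls it stays common, shared code."""

    @abstractmethod
    def supported_ops(self) -> set:
        """Operator types this accelerator can execute."""

    @abstractmethod
    def execute(self, op, inputs):
        """Run one operator on the device. Shared memory, DMA,
        control registers and interrupts are the vendor's business,
        hidden behind this call."""

class ReferenceCpuPlugin(AcceleratorPlugin):
    """Fallback backend the common engine always has available."""

    def supported_ops(self):
        return {"Conv", "Relu", "MatMul", "Softmax"}

    def execute(self, op, inputs):
        print(f"CPU executing {op}")

def run_graph(ops, plugins):
    """Walk the graph in order, dispatching each operator to the
    first plugin that claims it."""
    for op in ops:
        plugin = next(p for p in plugins if op in p.supported_ops())
        plugin.execute(op, inputs=None)

run_graph(["Conv", "Relu", "Softmax"], [ReferenceCpuPlugin()])
```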
And so Linaro is a perfect entity to help solve this?

Absolutely. This is what we have been doing for the last eight years, since Linaro got started. We focus on leading collaboration in the Arm open source ecosystem. We run the company in profit-neutral mode: all our members fund our activity with an annual membership fee and with engineers that we move into shared teams, and we use all the funding to hire the best talent in the community, the best key maintainers. So for this machine intelligence effort too, we hire, and we are hiring, the right technical leaders, managers and test experts. Test is very important as well. And all our work is upstream.

The way I would understand AI is that it has to do with somebody having some kind of secret but best algorithm, and if they have that, then they have an advantage. Is that true or not?

No, no. The algorithms come from the framework vendors, from the model vendors. TensorFlow is a framework to develop applications. Think of an app store: TensorFlow has its store of models, models that can do image recognition, text recognition, sound or voice or speech recognition. Those are the algorithms.

And they can be proprietary?

They can be proprietary, they can be open source. Caffe2 has the Model Zoo. MXNet has, I think the name is Gluon. Everyone has a kind of app store, a model store. Some are open source, some are proprietary. That is where the intelligence is: the data scientists develop these algorithms. At our end, we work on the execution, the translation from algorithms into executable code.

Machine code.

Machine code, yes.

So at Linaro, you are the experts at producing the most optimal machine code for the Arm processors?

Linaro is a melting pot of experience from all the players in the industry. We have experts in kernel development, we have experts in toolchains, and we do work with Arm on optimizing GCC, yes. So we have teams with a very varied set of skills and expertise. The key DNA is always collaboration and open source.

And so, for example, Google has the TPU for TensorFlow, Arm is doing a machine learning IP on the chip...

Yes.

...and some people are doing it with GPUs. But it doesn't matter that there is all this different hardware?

And there are multiple companies providing their own accelerator units. In Asia there are Bitmain, Cambricon; there are so many.

They can be ASICs?

Or blocks that are integrated in the SoC. It really reminds me of the early days when smartphones first gained the capability of doing video playback. In those days, everybody had a different proprietary way of exposing MPEG-4 or H.264 decoding to the upper layers, and every SoC vendor had a different set of video decoding accelerators, so there was a proliferation of proprietary applications. Then we started focusing on...

And those drivers were kept binary, right?

Now we are at the stage where the drivers are fully open source. To me, one of the golden references is the open source drivers for the DragonBoards, the DB410c and DB820c, developed between Qualcomm and Linaro on 96Boards. That, to me, is one of the golden references: a complete video decoding and encoding driver that is fully open source and that integrates with the frameworks. So in those days it was tricky, it was proprietary; now we have clearly defined APIs. The Video4Linux driver API integrates into FFmpeg, and then you can go up to GStreamer and the applications just work.
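The standardized stack described here fits in a few lines today. A minimal sketch, assuming PyGObject is installed and the platform exposes its V4L2 H.264 decoder through the GStreamer v4l2h264dec element; the file name is a placeholder.

```python
# Decode H.264 through a V4L2 hardware decoder via GStreamer.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch(
    "filesrc location=sample.mp4 ! qtdemux ! h264parse ! "
    "v4l2h264dec ! fakesink")
pipeline.set_state(Gst.State.PLAYING)

# Block until the stream ends or an error is reported.
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)
```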
So I see a very similar situation now. Of course, the complexities may be different and the use cases are different, but it is a similar dynamic.

So you have a role to play no matter the architecture of the accelerator? It works with GPUs, with IP on the SoC, with ASICs, anything?

Yes. The key is to have the right set of plugins in the framework, so that the inference engine can probe and understand which operators can be accelerated, can be offloaded, and can define a subset of the graph: a sequence of multiple operations that can be executed all at once, offloaded all at once, without the back and forth between the CPU and the accelerators (a rough sketch of this partitioning idea appears at the end of this transcript). Then it is the vendor providing the accelerator that takes care of whether you need shared memory, a DMA, control registers, interrupts. That is what the plugin can abstract, but the framework shall be aware of whether it is a complete offload or heterogeneous computing across GPU, DSP and CPU.

So all the providers of the solutions that you plug into, are they excited about you taking this role? And how big is this role going to be for Linaro?

For Linaro, I think this is yet another key initiative to help the Arm ecosystem expand, to reduce fragmentation and to help members focus on their competitive advantage. We held an AI and neural networks summit on Wednesday here at Linaro Connect. We had amazing speakers. We had keynotes by Chris Benson, AI evangelist, and by Jem Davies, general manager of the machine learning group at Arm. Rob Elliott, director for Arm NN, gave a great deep dive into Arm NN and how Arm NN does exactly all of this and copes with all the acceleration options. We had Pete Warden from Google's TensorFlow mobile and embedded team; he is the tech lead, and he was with us. He also shared a quote for the press release and is eager to collaborate with us. We also had Tom Lane from AWS AI, who gave a great talk and a demonstration of how to use TVM and ONNX, so I think we are on the right path, also in the technical choices. And then we had Marc Chalebois from Qualcomm, and Xilinx gave an amazing talk about how to optimize machine learning on FPGAs. I hope I'm not forgetting anybody; no offense if I don't go through the entire list, but it was a very busy day and the feedback was extremely positive.

Is it possible that NVIDIA thinks they will do something different? Because they want to do everything on their GPUs, right? Maybe they have a different strategy, or it could also be compatible.

Yes. I would love to work with NVIDIA as well. We are here to collaborate and to be inclusive, so we would love to have NVIDIA collaborate with us and share a quote together. Of course, here we are focusing on deep learning inference at the edge. So it is not the big, very powerful GPUs that run in the data center; that is training, that is the data center. Here we are talking about edge computing, edge devices.

You don't cover the data center?

Not for this specific work. We have other activities in the data center with high performance computing, and that is where the training work will naturally fit. Today this machine intelligence work is really deep learning inference at the edge, on edge devices.

But potentially you might take up a role in the data center for AI too?

At Linaro, we have a data center team and we have a high performance computing special interest group.
One of the key activities we are working on is enabling the Scalable Vector Extension of the Armv8 architecture, SVE, for high performance computing. This can easily be used for training in the data center as well. The data center team is one of the segment groups that I am responsible for. At Linaro, we have the technical steering committee led by our CTO, David Rusling. That is the place where we make sure we exploit all the synergies across all the teams at Linaro.

And so your next role is just more reaching out, more talking with all the people and making it happen?

Right now I am covering all the segment groups and I am kick-starting this Machine Intelligence Initiative. I think at some point we will have a full-time senior manager or director who will own it, and at some point this will grow and will need help.

All right, so potentially there will be openings, or maybe somebody at Linaro wants to do this, right?

Potentially, yes.
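A closing note on the subgraph offload mentioned earlier: probing which operators a plugin supports, and grouping consecutive supported operators, lets the engine hand each group to the accelerator in one shot instead of bouncing between CPU and device per operator. The sketch below is a hypothetical illustration of that partitioning, not Linaro's actual implementation.

```python
# Hypothetical graph partitioning for accelerator offload.
SUPPORTED = {"Conv", "Relu", "MatMul"}  # as reported by the plugin

def partition(ops):
    """Split a linear op sequence into (offloadable, segment) runs
    so each run can be dispatched to one target in a single call."""
    segments, current, offload = [], [], None
    for op in ops:
        fits = op in SUPPORTED
        if offload is None or fits == offload:
            current.append(op)
        else:
            segments.append((offload, current))
            current = [op]
        offload = fits
    if current:
        segments.append((offload, current))
    return segments

graph = ["Conv", "Relu", "Conv", "Softmax", "MatMul", "Relu"]
for offload, segment in partition(graph):
    target = "accelerator" if offload else "CPU"
    print(target, segment)
# accelerator ['Conv', 'Relu', 'Conv']
# CPU ['Softmax']
# accelerator ['MatMul', 'Relu']
```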