Live from Boston, Massachusetts, it's theCUBE, covering Spark Summit East 2017, brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. Welcome back to Boston, everybody. This is theCUBE, and we're here live at Spark Summit East, hashtag Spark Summit. Ziya Ma is here. She's the vice president of Big Data at Intel. Ziya, thanks for coming on theCUBE. Thanks for having me. You're welcome. Software is our topic, software at Intel. People don't necessarily associate Intel with software, but what's the story there? So actually, there are many things that we do in software. Since I manage the Big Data Engineering Organization, I'll say a little bit more about what we do for Big Data. Intel does all the processors, all the hardware, but when our customers are using that hardware, they want to get the best performance out of it. So for the Big Data space, we optimize the Big Data solution stack, including Spark and Hadoop, on top of Intel hardware, and make sure that we leverage the latest instruction sets so that customers get the most performance out of the newest Intel hardware. We also collaborate very extensively with the open-source community on Big Data ecosystem advancement. For example, we're a leading contributor to the Apache Spark ecosystem and a top contributor to the Apache Hadoop ecosystem, and lately we're getting into machine learning, deep learning, and AI, especially integrating those capabilities into the Big Data ecosystem. So I have to ask a question, just sort of strategically. If we go back several years, to the Unix days, you had a number of players developing hardware and microprocessors. There were RISC-based systems, and remember MIPS, and of course IBM had one, and Sun, et cetera, et cetera, and some of those still live on, but with a very, very small portion of the market.
So Intel has dominated the general-purpose market. So as Big Data became more mainstream, was there a discussion of, okay, we have to develop specialized processors, which I know Intel can do as well, or did you say, okay, we can actually optimize through software? Is that how you got here, or am I understanding that right? Yeah, optimizing through software is definitely one thing that we do. That's why Intel actually has, you may not know this, one of the largest software divisions, focused on enabling and optimizing solutions on Intel hardware. And of course we also have a very aggressive product roadmap for continuously advancing our hardware products. And you mentioned general-purpose computing. The CPU today still has more than 95% of the Big Data market, so that's still the biggest portion of the Big Data market, and we'll continue our advancement in that area. And obviously, as AI and machine learning use cases get added into the Big Data domain, we are expanding our product portfolio into other silicon products. Yeah, and of course, that was kind of the big bet: we bet on Intel, and I guess- And you should still. And still do. And I guess at the time it was spinning disk, then Flash came in, and of course now Spark with memory is really changing the game, isn't it? What does that mean for you and the software group? Right. At the hardware level, Intel is not just offering compute capability; we also offer very powerful networking capability and very good memory solutions, memory hardware, like the non-volatile memory technologies we keep talking about.
So for Big Data, we're trying to leverage all of that newest hardware, and we are already working with many of our customers to help them improve their Big Data memory solutions, the in-memory analytics type of capability, on Intel hardware, to give them the most optimal performance and the most secure results. That's definitely one thing that we continue to do, and it's going to remain a top priority, but we don't limit our work to optimization, because giving you the best, most complete experience on the Intel platform is our ultimate goal. So we work with customers from financial services, from manufacturing, from transportation, and from other Internet of Things segments to make sure that we give them the easiest Big Data and analytics experience on Intel hardware. When they are running those solutions, they don't have to worry too much about how to make their application work with Intel hardware, or how to make it more performant, because the Intel software solutions bridge that gap. We do that part of the job so that our customers' experience is easier and more complete. You serve as the accelerant to the marketplace. So go ahead, George. That's right. So Intel's BigDL is sort of the new product as of the last month or so, an open-source solution. Since we're at a Spark conference, tell us how there are other deep learning frameworks that aren't fully integrated with Spark yet, where BigDL fits in, how it backfills some functionality, and how it really takes advantage of Intel hardware. Yeah, so yes, George, just like you said, BigDL, we just open-sourced it a month ago.
It's a deep learning framework that we built organically on top of Apache Spark, and it has quite some differences from the other mainstream deep learning frameworks like Caffe, TensorFlow, Torch, and Theano, you name it. The reason that we decided to work on this project was, again, our experience working with our analytics customers, especially Big Data analytics customers. As they build AI solutions or AI modules within their analytics applications, they find it getting more and more difficult to build and integrate AI capability into their existing Big Data and analytics ecosystem. They had to set up a different cluster and build a separate set of AI capabilities using, let's say, one of the deep learning frameworks, and then later they had to overcome a lot of challenges, for example, moving the model and data between the two different clusters, and then making sure that the AI results got integrated into the existing analytics platform or analytics application. So that was the primary driver. How do we make our customers' experience easier? Do they have to leave their existing infrastructure and build a separate AI module? Can we do something organic on top of the existing Big Data platform, let's say Apache Spark, so that users can just leverage their existing infrastructure and make AI a naturally integral part of the overall analytics ecosystem that they already have? That was the primary driver. The other benefit that we see from integrating the BigDL framework naturally with a Big Data platform is that it enables efficient scale-out, fault tolerance, elasticity, and dynamic resource management, and those are benefits naturally brought by the Big Data platform. And today, actually, in just a very short period of time, we have already tested that BigDL can scale easily to tens or hundreds of nodes. So the scalability is also quite good.
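The scale-out she describes follows the usual pattern for training on a Big Data platform: data-parallel mini-batch gradient descent, where each partition computes a gradient on its shard of the data and the results are aggregated into one weight update. A toy single-process Python sketch of that pattern, assuming a 1-D linear model for illustration (this mimics the map/reduce shape, not BigDL's actual API):

```python
# Toy sketch of synchronous data-parallel SGD, the pattern a Spark-based
# trainer can follow: each "partition" computes a gradient on its shard,
# the gradients are averaged (the reduce step), and every worker then
# sees the same updated weights. Model: 1-D linear regression y = w*x.

def partition_gradient(shard, w):
    # Gradient of mean squared error over one data shard.
    g = 0.0
    for x, y in shard:
        g += 2 * (w * x - y) * x
    return g / len(shard)

def train(partitions, w=0.0, lr=0.05, epochs=50):
    for _ in range(epochs):
        # "Map": each partition computes its local gradient in parallel.
        grads = [partition_gradient(p, w) for p in partitions]
        # "Reduce": average the gradients, then broadcast new weights.
        w -= lr * sum(grads) / len(grads)
    return w

# Data generated from y = 3x, split across 4 partitions ("nodes").
data = [(x / 10.0, 3 * x / 10.0) for x in range(40)]
partitions = [data[i::4] for i in range(4)]
w = train(partitions)
print(round(w, 2))  # converges toward 3.0
```

Because the only cross-partition traffic is the gradient aggregation, adding partitions (nodes) scales the compute without changing the algorithm, which is why a framework built this way inherits Spark's elasticity and fault tolerance.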
And another benefit with a solution like BigDL, especially because it eliminates the need to set up a separate cluster and move models between different hardware clusters, is that you save on total cost of ownership. You can just leverage your existing infrastructure; there is no need to buy an additional set of hardware and build another environment just for training the model. So that's another benefit that we see. And performance-wise, again, we also tested BigDL against Caffe, Torch, and TensorFlow. The performance of BigDL on a single-node Xeon is orders of magnitude faster than out-of-the-box open-source Caffe, TensorFlow, or Torch. So definitely it's going to be a very promising and useful solution. Okay, can you talk about some of the use cases that you expect to see from your partners and your customers? Very good question. We already started a few engagements with some interested customers. The first customer is from the steel industry, where improving the accuracy of steel surface defect recognition is very important to quality control. We worked with this customer over the last few months and built an end-to-end image recognition pipeline using BigDL and Spark, and just through phase-one work the customer already improved its defect recognition accuracy to 90%, and they're seeing a very good yield improvement in their steel production. It used to be done by human inspection? It used to be done by humans, yes. And you said, I'm sorry, what was the degree of improvement? 90. So now the accuracy is up to 90%. Financial services is another use case, especially fraud detection. This customer, at their request, I'm not releasing their name; the financial industry is very sensitive about that. The customer was seeing its fraud risk increase tremendously with its wide range of products, services, and customer interaction channels.
So they implemented an end-to-end deep learning solution using BigDL and Spark, and again, through phase-one work, they are seeing the fraud detection rate improved by 40 times, four-zero times. We think there's more improvement that we can make, because this is just the collaboration of the last few months, and we'll continue working with this customer. And we expect more use cases from other business segments, but those are the two that already have BigDL running in production today. Well, the first one, I mean, that's amazing, essentially replacing human inspection and being much more accurate. The fraud detection is interesting, because fraud detection has come a long way in the last 10 years, as you know. It used to take six months to find fraud, and now it's minutes, seconds, but there are a lot of false positives still. So do you see this technology helping address that problem? Yeah, continuously improving the prediction accuracy is one of the goals. And this is another reason why we need to bring AI and Big Data together: you need to train your model, train your AI capabilities, with more and more training data so that you get much improved accuracy. That is the biggest lever for improving your training accuracy. And you need a huge infrastructure, a Big Data platform, so that you can host and manage your training data sets well and feed them into your deep learning solution or module to continuously improve accuracy. So, yes. This is a really key point, it seems; I'd like to unpack that a little bit. Because when we talk to customers and application vendors, it's that training feedback loop that gets the model smarter and smarter.
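The false-positive concern Dave raises comes down to precision versus recall: a detector can raise its catch rate (recall) while still flooding analysts with false alarms (low precision), so both have to be tracked as the model retrains on new data. A small sketch computing both from confusion-matrix counts (the numbers below are made up for illustration, not from the customer engagement):

```python
# Precision vs. recall for a fraud detector, from confusion-matrix counts.
# A high detection rate alone is not enough: the false positives in the
# denominator of precision are what swamp the fraud-review team.

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)  # of flagged transactions, how many were fraud
    recall = tp / (tp + fn)     # of actual fraud, how much was caught
    return precision, recall

# Illustrative counts: 80 frauds caught, 120 false alarms, 20 frauds missed.
p, r = precision_recall(tp=80, fp=120, fn=20)
print(round(p, 2), round(r, 2))  # 0.4 0.8
```

A detector like this catches 80% of fraud, yet 6 of every 10 alerts are false alarms, which is exactly the gap that retraining on a larger, continuously refreshed data set is meant to close.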
So if you had one cluster for training with another framework, and Spark was, I guess, for the rest of your analytics, how would training with feedback data work when you had two separate environments? That's one of the drivers why we created BigDL. We did not come to BigDL at the very beginning; we actually tried to port existing deep learning frameworks like Caffe and TensorFlow onto Spark. And you probably saw some research papers; there are other teams out there also trying to port Caffe, TensorFlow, and other deep learning frameworks onto Spark, because you have that need, you need to bring the two capabilities together. But the problem is that those systems were developed in a traditional way, with Big Data not yet in consideration when those frameworks were created. Now the need to converge the two becomes clearer and more necessary. And when we ported them over, we said, gosh, this is so difficult. First, it's very challenging to integrate the two. And second, the experience after you move it over is awkward: you are literally using Spark as a dispatcher. The integration is not coherent; they are superficially integrated. This is where we said, okay, we've got to do something different. We cannot just superficially integrate two systems together. Can we do something organic on top of the Big Data platform, on top of Apache Spark, so that the integration between the training system, feature engineering, and data management can be more consistent, more integrated? That was exactly the driver for this work. Well, that's huge. I mean, seamless integration is one of the most overused phrases in the technology business. Superficial integration is maybe a better description for a lot of those so-called seamless integrations.
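The "Spark as dispatcher" complaint can be sketched as two pipeline shapes: in one, features are computed in the analytics system, exported, and trained on elsewhere; in the other, feature engineering and training are stages of the same pipeline over the same in-memory data. A toy pure-Python contrast (the function names and the trivial "model" are illustrative, not any framework's real API):

```python
# Two integration shapes for adding model training to an analytics pipeline.
# All names here are illustrative.

# (a) "Spark as dispatcher": features are computed in one system, then
# serialized and shipped to a separate training cluster, and the model
# is shipped back. Every hand-off is a copy across a cluster boundary.
def dispatcher_style(raw):
    features = [x * 2 for x in raw]        # feature engineering (cluster A)
    exported = list(features)              # export / serialize / transfer
    model = sum(exported) / len(exported)  # "training" on separate cluster B
    return model                           # ship the model back to A

# (b) Organic integration: feature engineering and training run as stages
# of one pipeline over the same data, with no cross-cluster copies.
def organic_style(raw):
    features = (x * 2 for x in raw)        # lazy stage in the same pipeline
    total, n = 0, 0
    for f in features:                     # training consumes features directly
        total, n = total + f, n + 1
    return total / n

data = list(range(10))
print(dispatcher_style(data) == organic_style(data))  # True: same answer,
# but (a) pays for an extra copy and a second cluster to get it.
```

The feedback loop in George's question is the second shape's real payoff: new labeled data lands in the same cluster where retraining happens, so there is no export/import cycle between each iteration.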
You're claiming here that it's seamless integration. We're out of time. Just the last word on Intel and Spark Summit: what do you guys have going here? What's the vibe like? So actually, tomorrow I have a keynote. I'm going to talk a little more about what we're doing with BigDL; that's one of the big things we're doing. And of course, for a system like BigDL, or even other deep learning frameworks, to get optimum performance on Intel hardware, there is another item that we're highlighting: MKL, the Intel optimized Math Kernel Library. It has a lot of common math routines optimized for Intel processors using the latest instruction sets, and that's already integrated into BigDL today. So that's another thing we're highlighting. And those are just the software pieces. At the hardware level, in November at Intel's AI Day, our executives, from BK to Diane Bryant and Doug Fisher, also highlighted the Nervana product portfolio that's coming out, which will give you different hardware choices for AI. You can look at FPGAs, Xeon Phi, Xeon, and our new Nervana-based silicon like Lake Crest. Those are some silicon products that you can expect in the future. Intel, taking us to Nirvana, touching every part of the ecosystem, Intel's, as they say, 95% share in all parts of the business. Ziya, thanks very much for coming on theCUBE. Thank you, thank you for having me. All right, keep it right there, everybody. George and I will be back with our next guest. This is Spark Summit, hashtag Spark Summit, we're theCUBE, we'll be right back.