Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks.

Welcome back, everyone, to day two of theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We have two guests on this panel today: Tim Vincent, the VP of Cognitive Systems Software at IBM, and Steve Roberts, the offering manager for big data on IBM Power Systems. Thanks so much for coming on theCUBE.

Well, thank you for having us.

So we're now in this new era, this cognitive systems era. Can you set the scene for our viewers and tell them a little bit about what you do and why it's so important?

Okay, I'll give a bit of background first, because James knows me from my previous role, and I've spent a lot of time in the data and analytics space. I was the CTO for IBM's analytics group up until about a year and a half ago, and we spent a lot of time looking at what we needed to do from a data perspective and an AI perspective. When I moved over to Cognitive Systems, Bob Picciano, who's my current boss, asked me to really start helping build out more of a software focus, an AI focus, and a workload focus in how we're thinking about the Power brand. So we've spent a lot of time on that. When we talk about cognitive systems or AI, what we're really trying to do is think about how you couple the software space and the hardware space, co-optimized specifically for what's needed for AI systems, because the data processing and the algorithmic processing for AI are very, very different from what you would have for a traditional data workload. So we're spending a lot of time thinking about how you co-optimize those systems, so you can build a system that's really optimized for the demands of AI.
And is this driven by customers? Is this driven by just a trend that IBM is seeing? I mean, how are you-

It's a combination of both. A lot of thought was put into this before I joined the team, so there was a lot of really good thinking from the Power brand. There was real foresight about things like Moore's law coming to the end of its life cycle, and the ramifications of that. At the same time, as you start getting into things like neural nets and the floating-point operations you need to drive a neural net, it was clear we were hitting the boundaries, and there are new technologies, such as what NVIDIA produces with their GPUs, that are clearly advantageous. So there were a lot of trends coming together that the technical team saw. And at the same time, we were seeing customers struggling with specific things: how do I actually build a model if the training time is going to be weeks or months, or, you know, even hours? One of the scenarios I like to think about, and I'm probably showing my age a bit: I went to the University of Waterloo, and in my early years they had a batch-based system for compilation. You'd sit in the lab at night and submit a compile job, and the system would say, okay, it's going to take three hours to compile the application. Think of the productivity hit that has on you. Now you've got this new skill set in data scientists, which is really, really hard to find, and they're very, very valuable, and you're giving them systems that take hours and weeks to do what they need to do. They're trying to derive these models and get a high degree of accuracy in their predictions, and they just can't do it. So there was foresight on the technology side, and there's clear demand on the customer side as well.
Well, before the cameras were rolling, you were talking about how the terms data scientist and app developer are used interchangeably, and that's just wrong. Let's hear about that, because IBM's whole position, and I agree with it, I think it's the right framework, is data science as a team sport, but application development as an even larger team sport in which data scientists and data engineers play a role. We want to hear your ideas on the broader application development ecosystem, where data scientists and data engineers and so forth fall into that broader spectrum, and how IBM is supporting that entire new paradigm of application development with your solution portfolio, including AI on Power.

So you used the words collaboration and team sport, and you're 100% correct. I think that collaboration is missing to a great degree today, and it's probably limiting the actual value of AI in the industry: how the data scientists and the application developers interact with each other. One of the models I like to think about is a consumer-producer model: who consumes things and who produces things? The data scientists are producing a specific thing, which is an AI model, machine learning or deep learning. And the application developers are consuming those models and producing something else, which is the application logic that's driving your business processes, in this view. So they've got to work together. But there's a lot of confusion about who does what. You see people who talk about data scientists building application logic, and data scientists who can do that exist, but that's not where the value they bring to the equation is. And application developers developing AI models, you know, they exist, but it's not the most prevalent form factor.
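The producer-consumer split described here can be sketched in a few lines of Python: the data scientist produces a serialized model artifact, and the application developer consumes it inside business logic without needing to know how it was trained. This is a hypothetical illustration; the function names and the fraud-scoring scenario are invented for the example and are not an IBM API.

```python
import pickle

# --- Producer side (data scientist): train and publish a model artifact ---
def train_fraud_model(examples):
    """Toy 'model': flag any amount above the mean seen in training."""
    threshold = sum(amount for amount, _ in examples) / len(examples)
    return {"threshold": threshold}

artifact = pickle.dumps(train_fraud_model([(10.0, 0), (20.0, 0), (500.0, 1)]))

# --- Consumer side (app developer): load the artifact into app logic ---
deployed = pickle.loads(artifact)

def score_transaction(amount):
    # Business logic consumes the model's output; it never touches training.
    return "review" if amount > deployed["threshold"] else "approve"
```

The interface between the two roles is just the artifact, which is the point of the model: each side can change independently as long as the artifact contract holds.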
But you know what's kind of unbalanced, Tim, in the industry's discussion of these role definitions? Quite often the traditional definition or scoping of a data scientist is as if they know statistical modeling plus data management plus coding, right? But you never hear the opposite, that coders somehow need to understand how to build statistical models and so forth. Do you think that the coders of the future, at least on some level, need to be conversant with the best practices of building and tuning or training machine learning models, or no?

I think it'll absolutely happen, and I'll actually take it a step further, because again, the data scientist skill is hard for a lot of people to find, and as such, it's a very valuable skill. One of the offerings we're putting out is something called PowerAI Vision, and it takes things up another level, above the application developer: how do you actually unlock the capabilities of AI for the business persona, the subject matter expert? In the case of vision, how do you allow somebody to build a model without really knowing what a deep learning algorithm is, what kind of neural nets to use, or how to do data preparation? So we built a tool set, effectively an SME tool set, that allows you to tag and label images. And then as you're tagging and labeling images, it learns from that, and that actually helps automate the labeling of the images.

Is this distinct from Data Science Experience on the one hand, which is geared towards data scientists? And I think Watson Analytics, among your tools, is geared towards the SME. Is this a third tool, or an overlay?
Yeah, this is a third tool, and it's really, again, one of the co-optimized capabilities I talked about. It's a tool we've built out that leverages the combination of what we do in Power: the interconnect we have with the GPUs, NVLink, gives us basically a 10x improvement in bandwidth between the CPU and GPU, and that allows you to train your models much more quickly. We're seeing about a 4x improvement over competitive technologies that are also using GPUs. And for machine learning algorithms, we've recently come out with some technology we call SnapML, which allows you to push machine learning algorithms down into the GPUs, and there we're seeing about a 40 to 50x improvement over traditional processing. So it's coupling all these capabilities, but really allowing a business persona to do something specific: build out AI models to do recognition on either images or videos.

Is there a pre-existing library of models in the solution that they can tap into? Basically, it allows, it has a-

Are they pre-trained?

No, they're not pre-trained models. That's one of the differences. It actually has a set of models that it picks for you. And this is why it helps the business persona: it's helping them with labeling the data, it's helping select the best model, and it's doing things under the covers to optimize things like hyperparameter tuning. But the end user doesn't have to know about all these things, right? So you're trying to lower the barrier, and it comes back to your point on application developers: it lowers the barrier for people to do these tasks. Even for professional data scientists, there may be a vast library of models, and they don't necessarily know which is the best fit for a particular task.
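To make the "under the covers" part concrete, here is a minimal sketch of what automated model selection plus hyperparameter tuning can look like: the tool scores a few candidate model/parameter combinations on held-out data and keeps the winner, so the end user never sees the search. The threshold-rule "models" and the tiny grid here are illustrative stand-ins, not PowerAI Vision's actual model catalog or tuning logic.

```python
def make_threshold_rule(t):
    """Candidate 'model': predict 1 when the input is at least t."""
    return lambda x: 1 if x >= t else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def auto_select(valid):
    """Try each hyperparameter, score on held-out data, keep the best."""
    best_model, best_score = None, -1.0
    for t in [0.25, 0.5, 0.75]:        # tiny hyperparameter grid
        model = make_threshold_rule(t)
        score = accuracy(model, valid)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score

validation = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
model, score = auto_select(validation)
```

The user just gets back `model`; the grid search, the scoring, and the tie-breaking all stay hidden, which is the lowered barrier Tim is describing.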
Ideally, the infrastructure should recommend and choose, under various circumstances, the models and the algorithms and the libraries for you for a particular task.

One actual feature of PowerAI Enterprise is that it includes a way to do a quick visual inspection of a model's accuracy with a small data sample before you invest in scaling over a cluster or a large data set, so you can get a visual indicator of whether the model is moving towards accuracy or you need to go and test an alternate model.

So it's like a dashboard of, like, Gini coefficients and all that stuff, okay?

Exactly, it gives you a snapshot view. The other thing I was going to mention: you guys talked about application developers and data scientists, and of course a big message here at the conference is data science meets big data, and the work Hortonworks is doing, evolving the notion of container support in YARN, GPU awareness in YARN, bringing Data Science Experience, which can include the PowerAI capability Tim was talking about, as a workload tightly coupled with Hadoop. This is where our Power servers are really built, not as just a monolithic building block that always has the same ratio of compute and storage, but as fit-for-purpose servers. They can address GPU-optimized workloads, providing the bandwidth enhancements Tim talked about with the GPU, but also data-dense servers that can now support two terabytes of memory, double the overall memory bandwidth on the box, and 44 cores that can support up to 176 threads for parallelization of Spark workloads, SQL workloads, and distributed data science workloads.
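The quick-inspection idea mentioned above, checking a model on a small sample before paying for a full cluster run, can be sketched as follows. The trend test and the 0.6 cutoff are arbitrary choices for illustration; this is not a reproduction of PowerAI Enterprise's actual visual indicator.

```python
def accuracy_on_sample(predict, data, n):
    sample = data[:n]                  # cheap small-sample evaluation
    return sum(predict(x) == y for x, y in sample) / len(sample)

def worth_scaling(predict, data, sizes=(10, 20, 40), cutoff=0.6):
    """Scale out only if accuracy on growing samples is non-decreasing
    and the last reading clears the cutoff."""
    scores = [accuracy_on_sample(predict, data, n) for n in sizes]
    trending_up = all(a <= b for a, b in zip(scores, scores[1:]))
    return trending_up and scores[-1] >= cutoff

# Toy data: the label is 1 exactly when the feature exceeds 0.5.
data = [(i / 100, 1 if i / 100 > 0.5 else 0) for i in range(100)]
good = worth_scaling(lambda x: 1 if x > 0.5 else 0, data)   # matching model
bad = worth_scaling(lambda x: 1, data)                      # always-1 model
```

The payoff is the same as in the interview: a few seconds of small-sample scoring either justifies the expensive cluster job or sends you back to test an alternate model.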
So it's really about choosing the combination of servers that can meet this evolving workload need, because Hadoop isn't just MapReduce now; it's a multitude of workloads that you need to be able to mix and match, bringing various capabilities to the table for compute. That's where POWER8, and now POWER9, has really been built for these kinds of combination workloads, where you can add acceleration where it makes sense, or add big data servers, with smaller cores and smaller memory, where that makes sense.

Big issues. This show at DataWorks 2018 here in San Jose, the prime announcement, the partnership announcement between IBM and Hortonworks, was IHAH, I believe it's IBM Hosted Analytics with Hortonworks. What I want to know is, for that solution, I mean it runs on top of HDP 3.0 and so forth, is there any tie-in from an offering management standpoint between that and PowerAI, so you can build models in the PowerAI environment and deploy them out in conjunction with IHAH? Going forward, I want to get a sense of whether those kinds of integrations-

Well, it's the same data science capability, Data Science Experience, whether you choose to run it in the public cloud, in a private cloud model, or on-prem; it's the same data science package. PowerAI, as a set of optimized deep learning libraries that can provide advantage on Power, applies when you choose to run those deployments on our Power systems. So we can provide additional value in terms of these optimized libraries and these memory bandwidth improvements. It really depends on the customer requirements and whether a Power foundation would make sense in some of those deployment models. For us here with POWER9, we've recently announced a whole series of Linux POWER9 servers, that's our latest family, including, as I mentioned, storage-dense servers, the one we're showcasing on the floor here today, along with GPU-rich servers.
We're releasing fresh reference architectures, actually, to support combinations of cluster models that can, as I mentioned, be fit for purpose for the workload, so bringing data science and big data together in the right combination, and working towards cloud models as well that can support mixing Power in ICP with big data solutions.

And before we wrap, we're just about to wrap: in the reference architecture you described, I'm excited by the fact that you've commercialized distributed deep learning, for the growing number of instances where you're going to build containerized AI and distribute pieces of it across this multi-cloud. You need the underlying, as it were, middleware fabric to allow all those pieces to play together in some larger application. I've been following DDL because your research lab has been posting information about it for quite a while, so I'm excited that you guys are finally commercializing it. IBM does a really good job of commercializing what comes out of the lab, like with Watson.

Great. Well, that's a good note to end on. Thanks so much for joining us.

Oh, thank you. Thank you.

We will have more from theCUBE's live coverage of DataWorks coming up just after this.