Live from New York, it's theCUBE. Covering the IBM Machine Learning Launch Event, brought to you by IBM. Now, here are your hosts, Dave Vellante and Stu Miniman. Good morning, everybody. Welcome to the Waldorf Astoria. Stu Miniman and I are here in New York City, the Big Apple, for IBM's Machine Learning Event, hashtag IBM ML. We're fresh off Spark Summit, Stu, where we had theCUBE. This, by the way, is theCUBE, the worldwide leader in live tech coverage. And we were at Spark Summit last week, George Gilbert and I, watching the evolution of so-called big data. And let me sort of frame, Stu, where we're at and bring you into the conversation. So the early days of big data were all about offloading the data warehouse and reducing the cost of the data warehouse. I often joke that the ROI of big data is reduction on investment, right? These big, expensive data warehouses. And so that was quite successful in that regard. And what then happened is we started to throw all this data into the data lake. People would joke it became a data swamp, and you had a lot of tooling to try to clean the data, and a lot of transforming and loading, and the ETL vendors started to participate there in a bigger way. Then you saw the extension of these data pipelines to try to do more with that data. The cloud guys have now entered in a big way. We're now entering the cognitive era, as IBM likes to refer to it. Others talk about AI and machine learning and deep learning. And that's really the big topic here today. What we can tell you is that the news goes out at 9 a.m. this morning. And it was widely known that IBM is bringing machine learning to its mainframe, the z mainframe. Two years ago, Stu, IBM announced the z13, which was really designed to bring analytic and transaction processing together on a single platform. 
And clearly IBM is extending the useful life of the mainframe by bringing things like Spark, like certainly what it did with Linux, and now machine learning onto z. I want to talk about cloud, the importance of cloud, and how that has really taken over the world of big data. Virtually every customer you talk to now is doing work in the cloud. So it's interesting to see IBM now unlocking its transaction base, its mission-critical data, to this machine learning world. But what are you seeing around cloud and big data? Well Dave, first, we've been digging into this big data space since before it was called big data. One of the early things that really got me interested and excited about it, from the infrastructure standpoint, is that storage has always been one of those costs that we had to have. And with the massive amounts of data, the kind of digital explosion we've talked about, keeping all that information, or managing all that information, was a huge challenge. And big data was really that bit flip: how do we take all of that information and make it an opportunity? How do we get new revenue streams? And Dave, IBM has been at the center of this, looking at the higher-level pieces, not just storing data but leveraging it. Obviously huge in analytics, lots of focus on everything from the early Hadoop and Spark and newer technologies, but digging into how they can leverage up the stack, which is where IBM has done lots of acquisitions, and leveraging that. They want to make sure they have a strong position both in cloud, where what was SoftLayer has been renamed IBM Bluemix, with a lot of services including a machine learning service that leverages the Watson technology, and of course on-prem they've got the z and the Power solutions that you and I have covered for many years at the IBM Edge show. So okay, so machine learning obviously heavily leverages models. 
So we've seen in the early days of big data, the data scientists would build models, and machine learning allows those models to be perfected over time. So there's this continuous process. So we're familiar with the world of batch, and then minicomputers brought in the world of interactive, and so we're familiar with those types of workloads. Well, now we're talking about a new emergent workload, which is continuous apps, where you're streaming data in. That's sort of what Spark is all about, and so the models that data scientists are building can constantly be improved, and the key is automation, right? Being able to automate that whole process, and being able to collaborate between the data scientists, the data quality engineers, even the application developers, is something that IBM really tried to address in its last big announcement in this area, which was in October of last year: the Watson Data Platform, which I believe they called DataWorks at the time. So really trying to bring together those different personas in a way that they can collaborate and improve models on a continuous basis. So the use cases that you often hear in big data, and certainly initially in machine learning, are around things like fraud detection. Obviously ad serving has been a big data application for quite some time, and in financial services, identifying good targets, identifying risk. What I'm seeing, Stu, is that the phase we're in now of this so-called big data and analytics world, now bringing in machine learning and deep learning, is really about improving on some of those use cases. So for example, fraud detection has gotten much, much better. Look, 10 years ago, let's say, it took many, many months if you ever detected fraud. Now you get it in seconds, or sometimes minutes, but you get a lot of false positives. Oops, sorry, the transaction didn't go through. Did you do this transaction? Yes, I did. 
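That continuous-app pattern, a model that keeps refining itself as events stream in rather than being retrained in batch, can be sketched in a few lines of plain Python. The features, labels, and learning rate here are purely illustrative, not from IBM's announcement; a production pipeline would feed something like this from a streaming engine such as Spark:

```python
import math

class OnlineModel:
    """Toy online learner: updates its weights one event at a time,
    the way a streaming pipeline continuously refines a model."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        # Logistic regression score for one event.
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label):
        # One SGD step on logistic loss for a single streamed event.
        err = self.predict_proba(x) - label
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

model = OnlineModel(n_features=2)
# Simulated stream of (features, label) events; in practice these
# would arrive continuously rather than sit in a list.
stream = [([1.0, 0.2], 1), ([0.1, 0.9], 0),
          ([0.9, 0.3], 1), ([0.2, 1.0], 0)] * 50
for x, y in stream:
    model.update(x, y)
```

The point is that there is no separate "retrain" phase: every event nudges the model, which is what makes the collaboration and automation Dave mentions so important.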
Oh, sorry, you're going to have to redo it because it didn't go through. It's very frustrating for a lot of users. That will get better and better and better. We've all experienced retargeting from ads, and we know how crappy they are. That will continuously get better. The big question that people have, and it sort of goes back to Jeff Hammerbacher's line that the best minds of my generation are trying to get people to click on ads, is: when will we see big data really start to affect our lives in different ways, like patient outcomes? And we're going to hear some of that today from folks in healthcare and pharma. But again, these are the things that people are waiting for. The other piece is, of course, IoT. What are you seeing in terms of IoT and the whole data flow? Yeah, because a big question we have, Dave, is where's the data, and therefore where does it make sense to do that processing? In big data we talked about, you've got massive amounts of data, can we move the processing to that data? With IoT, David Floyer, our CTO, has talked about how there's going to be massive amounts of data at the edge, and I don't have the time or the bandwidth or necessarily the need to pull that back to some kind of central repository; I want to be able to work on it there. And that's where there's going to be a lot of data worked at the edge. Peter Levine did a whole video talking about how, oh, public cloud is dead, it's all going to go to the edge. That's a little bit hyperbolic; we understand that there are plenty of use cases for both the public cloud and the edge. In fact, we see Google's big push in machine learning with TensorFlow. It's one of those machine learning frameworks out there that we expect a lot of people to work on. Amazon is putting effort into the MXNet framework, which is, once again, an open source effort. 
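The false-positive problem Dave describes is, at bottom, a thresholding decision: lower the fraud-score cutoff and you catch more fraud but decline more good transactions. A toy illustration in Python, with made-up scores that stand in for a model's output (nothing here comes from a real fraud system):

```python
def confusion_counts(scored_batch, threshold):
    """Count false positives (good transactions declined) and false
    negatives (fraud let through) at a given score threshold."""
    fp = sum(1 for score, is_fraud in scored_batch
             if score >= threshold and not is_fraud)
    fn = sum(1 for score, is_fraud in scored_batch
             if score < threshold and is_fraud)
    return fp, fn

# Hypothetical (score, is_fraud) pairs for a batch of transactions.
batch = [(0.95, True), (0.90, True), (0.60, False), (0.55, True),
         (0.40, False), (0.30, False), (0.20, False), (0.10, False)]

aggressive = confusion_counts(batch, threshold=0.5)  # catches all fraud, 1 false decline
lenient = confusion_counts(batch, threshold=0.8)     # no false declines, misses 1 fraud
```

Better models shift that trade-off rather than eliminate it, which is why continuously improving the model matters: the goal is fewer "sorry, redo the transaction" moments without letting more fraud through.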
So one of the things I'm looking at in this space, and I think IBM can provide some leadership here, is what frameworks are going to become popular across multiple scenarios. How many winners can there be for these frameworks? We already have multiple programming languages, multiple clouds. How much of it is just API compatibility, how much will work there, where are the repositories of data going to be, and where does it make sense to do that predictive analytics and advanced processing? So you're bringing up a good point. Last October at Big Data SV, we had a special segment, a data scientist panel. It was great. We had some rockstar data scientists on there, like Dez Blanchfield and Joe Caserta and a number of others. And they echoed what you always hear when you talk to data scientists: we spend 80% of our time messing with the data, trying to clean the data, figuring out the data quality, and precious little time on the models, improving the models, and actually getting outcomes from those models. So things like Spark have simplified that whole process and unified a lot of the tooling around so-called big data. And so we're seeing Spark adoption increase. George Gilbert, in parts one and two of the Wikibon big data forecast last week, sort of showed that we're still not on the steep part of the S-curve in terms of Spark adoption, and generically we're talking about streaming as well, included in that forecast. But he's forecasting that increasingly those applications are going to become more and more important. It brings you back to what IBM's trying to do: bring machine learning to this critical transaction data. Again, to me, it's an extension of the vision they put forth two years ago, bringing analytic and transaction processing together, actually processing within that private cloud complex, which is essentially what this mainframe is. It's the original private cloud, right? 
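That 80% figure is about drudgery like this: a minimal sketch of the kind of record normalization data scientists spend most of their time on, with hypothetical field names and cleaning rules chosen for illustration:

```python
import re

def clean_record(raw):
    """Normalize one messy source record: trim whitespace, unify case,
    coerce the amount field to a number, and drop rows that can't be
    salvaged (in a real pipeline these would go to a quarantine table)."""
    name = raw.get("name", "").strip().title()
    # Strip currency symbols and thousands separators before parsing.
    amount_text = re.sub(r"[^\d.]", "", str(raw.get("amount", "")))
    try:
        amount = float(amount_text)
    except ValueError:
        return None  # unrecoverable row
    return {"name": name, "amount": amount}

raw_rows = [
    {"name": "  acme corp ", "amount": "$1,200.50"},
    {"name": "GLOBEX", "amount": "n/a"},
]
cleaned = [r for r in (clean_record(row) for row in raw_rows) if r]
```

Multiply that by dozens of sources and formats and you get the 80% the panelists were complaining about; tooling that automates it is exactly what frees time for the models themselves.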
You were saying off camera it's the original converged infrastructure. It's the original private cloud. The mainframe is still here, still lots of it, and lots of Linux on it. We've covered it for many years. You want your cool Linux, Docker, containerized machine learning stuff? I can do that on a z Systems box. Yeah, you want Python and Spark and R and Java, all the popular programming languages. So it makes sense. I mean, it's not a huge growth platform. It's kind of flat, down and up with the product cycle, but it's alive and well, and a lot of companies run their businesses, obviously, on the z. So we're going to be unpacking that all day. Some of the questions that we have are: what about cloud? Where does it fit? What about hybrid cloud? What are the specifics of this announcement? Will it be extended? Where does it come from? How does it relate to other products within the IBM portfolio? Very importantly, how are customers going to apply these capabilities to create business value? And that's something we'll be looking at with a number of the folks on today. Yeah, and Dave, another thing, it reminds me of two years ago, when you and I did an event with the MIT Sloan School on the second machine age with Andrew McAfee and Erik Brynjolfsson, talking about, as machines can help with some of these analytics, some of this advanced knowledge, what happens to the people? Talk about healthcare: it's doctors plus machines most of the time. As those two professors say, it's racing with the machine. So what is the impact on people? What's the impact on jobs and productivity going forward? Really interesting, hot space. They talk about everything from autonomous vehicles to advanced healthcare and the like. This is right at the core of where the next generation of the economy and jobs are going to go. It's a great point, and no doubt that's going to come up today, and some of our segments will explore that. 
Okay, so keep it right there, buddy. We'll be here all day covering this announcement, talking to practitioners, talking to IBM executives and thought leaders, and sharing the major trends in machine learning and the specifics of this announcement. So keep it right there, everybody. This is theCUBE. We're live from the Waldorf Astoria. We'll be right back.