Live from New York, it's theCUBE, covering the IBM Machine Learning Launch Event, brought to you by IBM. Now, here are your hosts, Dave Vellante and Stu Miniman.
Welcome back to New York City, everybody. This is theCUBE. We're here live at the IBM Machine Learning Launch Event, bringing analytics and transactions together on Z, extending an announcement that IBM made a couple of years ago that sort of laid out that vision, and now bringing machine learning to the mainframe platform. We're here with Jim Kobielus. Jim is the Director of IBM's Community Engagement for Data Science and a longtime CUBE alum and friend. Great to see you again, James.
Great to always be back here with you, wonderful folks from theCUBE. You ask really great questions and...
Well, thank you.
I'm prepared to answer whatever you want.
So we saw you last week at Spark Summit, sort of back-to-back: continuous streaming, machine learning. Give us the lay of the land, from your perspective, of machine learning.
Yeah, well, machine learning very much is at the heart of what modern application developers build, and that's really the core secret sauce in many of the most disruptive applications. So machine learning has become the core, of course, of what data scientists do day in and day out and what they're asked to do, which is to build essentially artificial neural networks that can process big data and find patterns that couldn't normally be found using other approaches. And then, you know, as Dinesh and Rob indicated, a lot of it's for regression analysis and classification and the other core things that data scientists have been doing for a long time. But machine learning has come into its own because of the potential for great automation of this function of finding patterns and correlations within data sets. So today at the IBM Machine Learning Launch Event, and we've already announced it, IBM Machine Learning for z/OS takes that automation promise to the next step.
And so we're real excited, and there'll be more details today in the main event.
One of the most fun interviews I had last year was with you, when we interviewed, I think it was 10 data scientists, rock star data scientists. And Dinesh had a quote, he said machine learning is 20% fun, 80% elbow grease, and data scientists sort of echoed that last year: we spend 80% of our time just sort of wrangling data, and it gets kind of tedious. You guys have made announcements to address that. Is the needle moving?
To some degree, the needle is moving. Greater automation of data sourcing and preparation and cleansing is ongoing. Machine learning is being used for that function as well. But nonetheless, there is still a lot of need in the data science pipeline for a lot of manual effort. If you look at the core of what machine learning is all about, supervised learning involves humans, meaning data scientists, to train their algorithms with data. And so that involves finding the right data and then of course doing the feature engineering, which is a very human and creative process. And then training on the data and iterating through models to improve the fit of the machine learning algorithms to the data. So, in many ways, there are still a lot of manual functions that need the expertise of data scientists to do right. And there are a lot of ways to do machine learning wrong. And there are a lot of, as it were, tricks of the trade you have to learn just through trial and error. A lot of things, like the new generation of things like generative adversarial models, ride on machine learning, or deep learning in this case, multi-layered. And they're not easy to get going and get working effectively the first time around, I mean, with the first run of your training data set. So, that's just an example of the fact that there are a lot of functions that can't be fully automated yet in the whole machine learning process.
But a great many can, in fact, especially data preparation and transformation. It's being automated to a great degree so that data scientists can focus on the more creative work that involves subject matter expertise and really also application development and working with larger teams of coders and subject matter experts and others, to be able to take the machine learning algorithms that have been proved out, that have been trained, and to drive them into all manner of applications to deliver some disruptive business value.
James, can you expand for us a little bit on this democratization of, before it was just data, but now the machine learning, the analytics? When we put these massive capabilities in the broader hands of the business analysts, the business people themselves, what do you see in your customers? What can they do now that they couldn't do before? Why is this such an exciting period of time for the leveraging of data analytics?
I don't know, Stu, that it's really an issue of now versus before. Machine learning has been around for a number of years. I mean, it's artificial neural networks, and that goes back, actually, in many ways to the late 50s, and it's steadily improved in terms of sophistication and so forth. But what's going on now is that machine learning tools have become commercialized and refined to a greater degree, and now they're in a form in the cloud, like IBM Machine Learning for the private cloud on z/OS or Watson Machine Learning for the Bluemix public cloud. They're at a level of consumability that they've never been at before. With a software-as-a-service offering, you just pay for it, and it's available to you, to your data scientists, to begin doing work right away, to build applications, drive quick value.
So in other words, the time to value on a machine learning project continues to shorten and shorten due to the consumability, the packaging of these capabilities into cloud offerings and into other tools that are pre-built to deliver success. That's what's fundamentally different now, and it's just an ongoing process. You sort of see the recent parallels with the business intelligence market. 10 years ago, BI, with reporting and OLAP and so forth, was only for what we now call data scientists or the technical experts in that area. But in the last 10 years, we've seen the business intelligence community and the industry, including IBM's tools, move towards more self-service, interactive visualization, visual design of BI and predictive analytics through our Cognos and SPSS portfolios. A similar dynamic is coming into the progress of machine learning: the democratization, to use your term, the more self-service model wherein everybody potentially will be able to do machine learning, to build machine learning and deep learning models, without a whole lot of university training. That day is coming and it's coming fairly rapidly. So it's just a matter of the maturation of this technology in the marketplace.
So I want to ask you, so you're right, right? In the 1950s, artificial neural networks, or AI, sort of was invented, I guess, as a concept. And then in the late 70s and early 80s, it was heavily hyped. It kind of died in the late 80s. In the 90s, you never heard about it, even in the early 2000s. Why now? Why is it here now? Is it because IBM's putting so much muscle behind it? Is it because we have Siri? What is it that has enabled that?
Well, I wish that IBM putting muscle behind a technology could launch anything to success. And we've done a lot of things in that regard.
But the thing is, if you look back at the historical progress of AI, I mean, it's older than me and you in terms of when it got going in the middle 50s as a passion or a focus of computer scientists. What we had for most of the last half century was AI, or expert systems, that were built on having to do essentially programming, that is, writing declarative rules defining how AI systems could process data, whatever, under various scenarios. That didn't prove scalable. It didn't prove agile enough to learn on the fly from the statistical patterns within the data that you're trying to process. For face recognition and voice recognition, pattern recognition, you need statistical analysis. You need something along the lines of an artificial neural network that doesn't have to be pre-programmed. That's what's new now, since the turn of this century: AI has become predominantly focused not so much on declarative rules, the expert systems of old, but on statistical analysis, artificial neural networks that learn from the data. See, in the long historical sweep of computing, we have three eras of computing. The first era, before the Second World War, was all electromechanical computing devices; IBM started, of course, like everybody, in that era. The business logic was burned into the hardware, as it were. The second era, from the Second World War really to the present day, is all about software programming. It's COBOL, FORTRAN, C, Java, where the business logic has to be developed and coded by a cadre of programmers. Since the turn of this millennium, and really since the turn of this decade, it's all moved towards the third era, which is the cognitive era, where you're learning the business rules automatically from the data itself, and that involves machine learning at its very heart.
So most of what has been commercialized, and most of what is being deployed in the real world of working, successful AI, is all built on artificial neural networks and cognitive computing in the way that I laid out, where you still need human beings in the equation. It can't be completely automated. I mean, there are things like unsupervised learning that take the automation of machine learning to a greater extent. But the bulk of machine learning is still supervised learning, where you have training data sets, and you need experts, data scientists, to manage that whole process. Over time, supervised learning is evolving towards the question of who's going to label the training data sets, especially when you have so much data flooding in from the internet of things and social media and so forth. A lot of that is being outsourced to crowdsourcing environments, in terms of the ongoing labeling of data for machine learning projects of all sorts. That trend will continue apace, so less and less of the actual labeling of the data for machine learning will need to be manually coded by data scientists or data engineers.
So the more data, the better. See, I would argue in the enablement pie, a big, oh, you're going to disagree with that, which is good, let's have a discussion about it, but in the enablement pie, I would say the profundity of Hadoop was two things. One is I can leave data where it is and bring code to data, five megabytes of code to a petabyte of data. But the second was the dramatic reduction in the cost to store more data, hence my statement of the more data, the better. But you're saying, eh, maybe not. Certainly for compliance and other things, you might not want to have data lying around.
Well, it's an open issue. How much data do you actually need to find the patterns of interest to you, the correlations of interest to you?
You know, sampling of your data set, a 10% sample or whatever, in most cases might be sufficient to find the correlations you're looking for. But if you're looking for some highly deep and rare nuances in terms of anomalies or outliers or whatever within your data set, you may only find those if you have a petabyte of data on the population of interest. But if you're just looking for broad historical trends and to do predictions against broad trends, you may not need anywhere near that amount. I mean, if it's a large data set, you may only need a five to 10% sample, whatever.
I love this conversation because people have been on theCUBE, Abhi Mehta, for example: "So Dave, sampling is dead."
Now, a statistician said that's BS in a way. Of course, it's not dead. Storage isn't free, first of all. So, you can't necessarily save and process all the data. Compute power isn't free yet. Memory isn't free yet, and so forth. So, there's lots of...
You're working on that, though.
Yeah, sure. It's asymptotically all moving towards zero and whatever. But the bottom line is the underlying resources, including the expertise of your data scientists, are not free. These are human beings who need to make a living. So, you've got to do a lot of things. A, automate functions on the data science side so that these experts can radically improve their productivity, which is why the announcement today of IBM Machine Learning is so important. It enables greater automation in the creation and the training and deployment of machine learning models. As Rob Thomas indicated, the capability we offer is very much a multiplier of productivity for your data science teams. So, that's the core value, because our customers live and die increasingly by machine learning models. And the data science teams themselves are highly inelastic, in the sense that you can't find highly skilled people that easily at an affordable price if you're a business.
And you've got to make the most of the team that you have and help them to develop their machine learning muscle.
Okay, I want to ask you to weigh in on one of Stu's favorite topics, which is man versus machine.
Humans versus mechanism. Actually, humans versus bots.
Okay, go ahead.
So, you know, a lot of discussion about, I mean, machines have always replaced humans for jobs, but for the first time, it's really beginning to replace cognitive functions. What does that mean for jobs, for skill sets? You know, I love the comment, the greatest chess player in the world is not a machine. It's humans and machines. But what do you see in terms of the skill set shift when you talk to your data science colleagues in these communities that you're building? Is that the right way to think about it? That it's the creativity of humans and machines that will drive innovation going forward?
I think it's symbiotic. If you take Watson, of course, that's a star case of a cognitive, AI-driven machine in the cloud. We use Watson all the time, of course, in IBM. I use it all the time in my job, for example. Just to give an example of one knowledge worker and how he happens to use AI and machine learning: Watson is an awesome search engine across multi-structured data types, in real time, enabling you to ask a sequence of very detailed questions, and Watson is a relevance-ranking engine, deep Q&A, all that stuff. What I've found is it's helped me as a knowledge worker to be far more efficient in doing my upfront research for anything that I might be working on. You see, I write blogs and I speak and I put together slide decks that I present and so forth. So if you look at knowledge workers in general, AI, as driving far more powerful search capabilities in the cloud, helps us to eliminate a lot of the grunt work that normally was attendant upon doing deep research into a knowledge corpus that may be pre-existing.
And that way, we can then ask more questions, more intelligent questions, and really work through our quest for answers far more rapidly, and entertain and rule out more options when we're trying to develop a strategy. Because we have all the data at our fingertips, and we've got this expert resource, increasingly in a conversational back and forth, that's working on our behalf predictively to find what we need. So if you look at that, everybody who's a knowledge worker, which is really the bulk now of the economy, can be far more productive because you have this high-performance virtual assistant in the cloud. I don't know that AI or deep learning or machine learning is really going to eliminate a lot of those jobs. It'll just make us far smarter and more efficient in doing what we do. I don't want to belittle, I don't want to minimize the potential for some structural dislocation in some fields.
Well, it's interesting because, as an example, you're already productive and now you've become this hyper-productive individual, but you're also very creative and can pick and choose different toolings. And so I think for people like you, it's huge opportunities. If you're a person who used to put up billboards, maybe it's time for retraining.
Yeah, well, maybe a lot of the people like the research assistants and so forth who would support someone like me in most knowledge worker organizations, maybe those people might be displaced because we would have less need for them. In the same way that, in one of my very first jobs out of college, before I got into my career, I was a file clerk in a court in Detroit. It was a totally manual job and there was no automation or anything. You know, most of those functions, I haven't revisited that court in recent years, I'm sure are automated, because you have this thing called computers, especially PCs and LANs and so forth, that came along since then.
So a fair amount of those kinds of feather-bedding jobs have gone away in a number of bureaucracies due to automation, and machine learning is all about automation. So who knows where it will all end up.
All right, we've got to go, but I wanted to ask you about this.
I love unions, by the way.
Yeah, and you got to meet a lot of lawyers, I'm sure. Okay, cool. So I got to ask you about the community of data scientists that you're building. You've been early on in that. It's been a persona that you've really tried to cultivate and collaborate with. So give us an update there. What's the latest? What's your effort like these days?
Yeah, well, what we're doing is I'm on a team now that's managing and bringing together all of our community engagement programs, really across the portfolio, not just for data scientists. That involves meetups and hackathons and developer days and user groups and so forth. These are really important professional forums for our customers, our developers, our partners, to get together and share their expertise and provide guidance to each other. And these are very, very important to help these people get better at what they do, to help them stay up to speed on the latest technologies like deep learning, machine learning and so forth. So we take it very seriously at IBM that communities are really where customers can realize value and grow their human capital on an ongoing basis. So we're making significant investments in growing those efforts and bringing them together in a unified way, and making it easier for developers and IT administrators to find the right forums, the right events, the right content within IBM channels and so forth to help them do their jobs effectively.
And machine learning is at the heart, not just of data science; other professions within the IT and business analytics universe are relying more heavily now on machine learning and on understanding the tools of the trade to be effective in their jobs. So we're educating our communities on machine learning and why it's so critically important to the future of IT.
Well, you're a content machine, and it's great content, so congratulations on not only kicking that off but continuing it, and thanks, Jim, for coming on theCUBE. It's good to see you.
Thanks for having me.
You're welcome. All right, keep it right there. We'll be back with our next guest. This is theCUBE, we're live from the Waldorf Astoria in New York City at the IBM Machine Learning Launch Event. Right back.