Okay, we're back from IOD. This is Dave Vellante at Wikibon.org and we're bringing you theCUBE live from IOD, IBM's big software event, big data event, an event that really is underscoring IBM's portfolio in big data, its commitment to big data, its management visions, its technology visions. Really quite impressive and this is day two for us. We've been covering IOD here Monday and Tuesday. We're off to Strata tomorrow, Wednesday and Thursday. We're going to be covering that event like a blanket. theCUBE, SiliconANGLE's flagship product that goes to the events. We extract the signal from the noise and I'm here with my co-host. I'm Jeff Kelly and we'd like to welcome Jim Giles. He's a distinguished engineer for Big Data at IBM, overseeing development of the InfoSphere and the Streams platforms as well. BigInsights and Streams, yes. Great, thanks for coming on theCUBE, appreciate it. Yeah, we've heard a lot about Streams. We've had a couple of your partners on, customers, and people are very excited about it. But first of all, let's start with just the whole topic of big data, as you said off camera, one of your favorite topics. It's one of ours as well. You have it in your title, I made that comment, which is relatively new, to see people with big data in their titles. But most of them, I wonder if this is true for you, have been doing something around data, big data, little data, for quite some time. Is that the case with you? Yeah, that's right. So we've been working on the InfoSphere Streams product. That's where I actually started within IBM, working on building that in IBM Research and then bringing it to market as a product within Software Group. And so that is really about analyzing data in motion and it's one of the major parts of our overall big data platform. And so then we got excited and interested in BigInsights and Hadoop and multiple different parts of the big data platform. So we have been working on it for quite some time.
Yeah, we had Nagui Halim on before and he was saying that essentially he started this initiative 10 years ago. That's right. You know, that's what I love about IBM, you do some real research and stuff actually hits the market. It's not just research for research's sake. Yeah. So what's going on at IOD? What are some of the things that you're talking to customers about and what are you hearing back? Yeah, so I think one of the most important things to really talk about is the overall big data platform. And so whenever some people talk about big data they just think that means Hadoop, or they just think it means, you know, maybe whatever warehouse that they happen to have, but really with IBM we're taking all these different pieces together. We always talk about volume, variety and velocity, and so we make sure that we have the right components to solve each of those different types of problems, whether it's analyzing data in motion, whether it's analyzing data at rest, and bringing all those different pieces together. And so that's really the strength of our IBM big data portfolio. So when you talk to customers, where are you finding they are on this journey? You know, it's really early days I think in this market for so-called big data. So you know, where are you finding most customers at? Are they at the point where they conceptually understand what's going on and they're ready to implement solutions, or are you still doing a lot of evangelizing? Where do you think we are in terms of maturity? So different people are at different places and different people are coming in at different entry points. So we have some customers that maybe have a problem, a business process that they are working on right now, that they just can't do anymore the way they've been doing it before. And so they're really looking for new technologies that they can bring to bear on that problem. And for them, they pretty much know what they want to do.
They're ready to explore the technology with us and implement projects and get into production. There are other customers who have brand new problems; it's a whole new problem space, a new thing that they want to do. And for them, sometimes they're still kicking the tires and trying to understand just what these things can do for them and what sorts of new capabilities they can uncover. So it's a real mix of customers that we're seeing right now. Can we do a little BigInsights drill down? Yeah, sure. Take us back through, because it's a pretty comprehensive platform. Yes. Take us back to the start. Where did it come from, lay it out for us. Paint a picture if you would like to. Yeah, sure. So this also, like many things, the ideas around putting something together came from IBM Research, actually started in the Almaden lab for BigInsights. And we were really starting to realize that Hadoop was a very important capability and we needed to figure out how we were going to leverage it within IBM. And so there was some different work that was going on, some different work that was happening in file systems, some work that was happening in analytics. That's a very important part of the BigInsights product. And we started pulling these different pieces together and we realized that if we put them together in the right combinations, we were going to have something very powerful, very unique within the marketplace. We really pride ourselves on starting with that Apache Hadoop stack and then building on top of it with analytics, with tools for the business analyst, the business executive, the data scientist, and developers as well, a complete and total package. So we really brought different things together from different parts of IBM Research as well as things that we've created as well. So add some more color to that. So if I'm buying BigInsights, what am I getting?
So with BigInsights, you get Apache Hadoop, much like many other people who are doing Hadoop distributions or offerings, but you also get a whole set of analytics capabilities. One example of that is our advanced text analytics, a really powerful capability. This is something where you can analyze textual information, whether it be from Twitter, Facebook, whatever other type of source, or your enterprise data, because in enterprise data you have a number of things that are unstructured as well, and you can pull entities out of that data, you can find sentiment in this type of textual data as well. And this is not only for English, but also for multiple other languages. So that's just one example of one of the analytic capabilities. We also have some workload optimizations. So there are things that we do in terms of our scheduling, things that we do in terms of packing different types of workloads together to really improve the performance. In many cases, we can get 30% better performance than the base schedulers that are built into Hadoop today. Going on up the stack, we have developer tools, a rich set of developer tools in Eclipse, not only for the Apache open source languages, so Pig and Hive and all those sorts of things, but also for IBM's scripting language, Jaql, that we've created as well, and also for all the analytics capabilities. On up the stack, kind of the most important, the face of BigInsights, is our web console, where data scientists can interact with data in a very familiar spreadsheet-oriented kind of way. So the capability is called BigSheets. It allows you to look at your data, and one of the key things there is that, of course, with big data, you can't bring all your data into a spreadsheet, operate on it, and push it out. So instead, what we do is we can pull samples of the data, do some transformations.
When we're happy with those transformations, we push that back down into the infrastructure, run, and then get our results that way. So BigSheets, and also our application framework, being able to take applications that we've built and also that our customers build and have them executable by people who really don't understand technology very well, but just know that they want to accomplish a particular task. So it's a whole set of things that we provide. It's a modular set of components that I can engage with. I can utilize other tools if I want. Is that true or? So it's one complete package. Okay, so it's a full stack. It's a full stack. Okay, great. So I've dropped it in and that's my environment. That's your environment. Okay. And do you have a community edition? Yes, we have BigInsights Basic Edition and we also have BigInsights Enterprise. BigInsights Basic is effectively IBM's version, our distribution of Apache Hadoop. It's the same as what you can get directly from open source, just with our blue washing, what we call blue washing, following all of our legal processes and things like that for the Apache Hadoop components. And then a few other things to make it easy to install and administer. So the same technology. Same technology. It's just blue washed. That's right. Meaning it's blessed by IBM and it has all the... That's right. So it means you stand behind it and it fits into your corporate edicts and everything else. Okay, so it's the Hadoop, which is open source. The analytics, now the text analytics, is that something that's specific to BigInsights or is that some other technology that you've dropped in? Yeah, so the cool thing about this is that one of our strategies is to take these analytics that we've been creating for platforms like our BigInsights platform but also make them available on other platforms as well. So that exact same advanced text analytics capability that runs in BigInsights also runs in Streams.
This makes it extremely powerful because you can think about doing some deep analysis on all the data that you've got stored in your Hadoop environment, coming up with the type of analysis you want to do from a text perspective, but then pushing that back up into Streams for doing real-time analysis on data in motion. And so this is overall where we're headed, making sure that the same analytics that we have, if they're appropriate in a streaming environment, work there, if they're appropriate in Hadoop, work there, and being able to share those same capabilities across. Really useful. What's the value proposition of BigInsights? Is it the simplicity, the integrated stack? Is that right? It's a combination of them. It's really the simplicity. It's the fact that we have an integrated stack and all the components work together. Most importantly, the fact that it's integrated into the rest of what we call our big data platform with all the other products, that deep integration is very important. And then these advanced capabilities that don't really exist elsewhere. So a lot of organizations, when they talk about Hadoop, say, okay, virtually everybody, Jeff, says we're going to make Hadoop more robust, we're going to make it enterprise ready, and so. It's a legitimate conversation to have. I mean, that's. It's a legitimate conversation to have because there are some problems there. How are you attacking that problem? Right, so I mentioned some of the things like the workload optimization, that's one piece of that, and of course the integration into the enterprise and with both IBM's components and other partners and other vendors as well. That's a key part of that enterprise-ready aspect of it. But among the other things that we're doing, for example, we're really focusing on security, making sure that we can provide the kind of security that our customers are expecting on top of the Hadoop framework, and many other things that we're working on as well.
So the workload optimization sounds like tuning to me, like automated tuning. Is that, am I inferring that correctly? Well, there are some other things that we're doing as well where we're taking some scheduler technology and we're sort of overlaying that on top of the base Apache Hadoop scheduler. So we have something that in research was called Flex, which sort of lays down over and replaces the fair scheduler. Now I think we call it our advanced scheduling capability. And so it's really just taking some of the things that we've learned in other systems over time, other experimentation that we've done in research and so forth, and applying that now to the infrastructure and providing better overall performance. So how about things like data protection, data management, recovery? Is that part of the objective or is that really not fundamental to the system? So all those things we're working on, and including capabilities for all those things. So it's a journey. So it's not all there today, it's part of the roadmap. That's right. So you mentioned security. I'd love to talk about that a little bit. What are some of the security concerns you're hearing from customers? How are you approaching that? Have you seen some activity in that market? There's a startup called Sqrrl, which kind of came out of the NSA. They're focusing on cell-level security in their data and their NoSQL database that works kind of in the Hadoop stack. But what are you finding in terms of the major security concerns around Hadoop and BigInsights? Is it different than the security concerns you're hearing around, you know, Netezza and more traditional databases, or does it have any unique characteristics? Excuse me, I think the concerns are really the same. It's just that the maturity of the technology is something that's got to evolve over time. So one of the things that, excuse me, people are always worried about is the number of ports that you have to open up to access all the different components.
And so one of the strategies we've taken is that we sort of have a gateway into our clusters through our web console, REST interfaces through our web console, and that's the way that you get into the cluster to do the computation that you want to do. So if you have an application, for example, you deploy it into our web console and then once it's approved for use by the administrator, then your users can actually run that directly from the web console itself. So one of the strategies is just trying to sort of surround it, and you can think of it as a walled-off garden that we're trying to protect in a way. Another concern is definitely around the auditing and understanding which users are touching which data. And so that's something where we worked with our Guardium capability to make sure that we can actually take that information as it's being generated from logs and things like that and push it up to our Guardium tools so you can understand what's happening within the cluster. So BigInsights, I'm sorry we're spending so much time on it but it's interesting to me and I think to our audience. Can you talk about your database strategy with regard to BigInsights? Where does the database fit? Yeah, so there are a couple of different things I can say about it. Already today we have very deep integration, two-way integration, with all of our major database capabilities. So pushing data into DB2 and pulling it out of DB2, pushing it into Netezza, pulling it out of Netezza. All this using high-speed parallel readers and writers, and so that's one of the things we've done. The other thing we've done that's related to the database technology is with our Cognos capability: we can now take Cognos and access data that's in our BigInsights cluster through a Hive connector that we've got. Over time we're working much more towards integrating these things, these components, very deeply.
So for example being able to issue queries from your database and being able to have those pushed down into a Hadoop environment and vice versa. So if I wanted to use HBase with BigInsights I can do that, obviously. Oh, of course, yes. And so obviously there's a big movement, we're going to hear a lot about this tomorrow and the next day at Strata, to bring together SQL and NoSQL. So what's your angle on that? So that's something that we're very actively working on, the ability to have rich SQL interfaces over the top of our BigInsights environment. So, I mean, long term, we hear a lot about the connector strategy right now. We've got companies, enterprises doing a lot of the hard deep analytics in Hadoop or crunching the numbers and then kind of hanging a database off the side and kind of moving data in there. And long term do you think that's a long-term strategy or do you think we need to move to a more comprehensive platform that kind of integrates these capabilities into one common platform? It sounds like the latter, I don't want to put words in your mouth. Yeah, we definitely see it all converging and coming together. They have to all work together in a complete picture. So architecturally what does that mean? How do you go about, what do we have to do to get there and what's it going to look like when we do arrive? So I think that ultimately what you're going to see is the ability to execute standard SQL queries on top of a database and you won't even know that it's happened but some aspects of that will go off and be executed inside the Hadoop environment.
You'll see extensions for sort of overflow data. You can think of it this way: instead of pushing data into long-term archiving, you might want to push it into something that we're calling a queryable archive, still making it accessible, but maybe not as accessible as it would have been if it was inside your database, and not as inaccessible as if it's been moved off to long-term storage or something like that. So it's going to become a complete seamless picture, is where we think it's headed. What kinds of things are really exciting you now? What's really floating your boat? Yeah, so I think that really the maturity that we've hit with all these products, BigInsights 2.0, Streams 3.0, the ease of use that we've reached at this point is very exciting to me, and I think also what I'm starting to see with many of our different customer use cases. Earlier in the conference, as an example, just one that was just extremely exciting, ConocoPhillips talked about how they're tracking icebergs in the Arctic and trying to understand where icebergs are relative to the oil production platforms and if they needed to move the oil production platform or if they needed to maybe break up the ice, and all this is being done using satellite data in real time, just a really exciting use case, and there are many more use cases like that with other customers. It's just amazing what people are trying to do now. Fantastic, go ahead. I just want to return, sorry to return to the architectural question, but we've heard a lot about PureSystems at the show and PureData and some interesting things that are going on there. How does the entire big data platform, as in the book that we've got here, we had on some of the authors earlier, how do they all integrate in a way that doesn't create more silos, data silos? An issue we've been dealing with in the data management field for years.
How do the BigInsights and Streams portfolios play with the PureData portfolio and the PureSystems portfolio in a way that doesn't create more silos? Right, so as I mentioned earlier, we do have rich and deep and vast integration points between all the different components within it, and the other thing is that we sort of have laid out patterns, and some of those are actually discussed in the book, laid out patterns of how data would flow through your enterprise. Maybe it comes in through the streaming system, you get your insights as the data's flowing in, then whatever data is relevant, that might flow into Hadoop, maybe you'll do some deep analytics, maybe you'll find some data elements that are really important, and they need to maybe go on to your database. So we're trying to describe an architectural pattern here, and more broadly, that shows how the data would flow through your enterprise, how you actually can leverage each of the different environments for the applications that you've got, accessing data and interacting with data at the right place at the right time. All right, great, well, Jim Giles, thanks so much for coming on, and I mentioned that we've had some of the authors on and Jim in fact is one of the authors, but I think we've had half the authors on so far, we've got to see if we can get the other three by the next hour or two. Yeah, Jim, that was great, really appreciate you coming on. This is theCUBE, SiliconANGLE.tv's continuous coverage of IBM IOD, we'll be right back with some great guests coming on this afternoon. We're going all day, we're going all week, this is big data week, keep it right there, we'll be right back.