We're back. This is Dave Vellante of Wikibon.org, and this is theCUBE. I'm here with Jeff Kelly, my co-host. SiliconANGLE comes to these events and we extract the signal from the noise. We scour these shows for the very best guests and bring them to you. They share their opinions; they're thought leaders, they're practitioners, they're technologists. Anjul Bhambhri is here. She's a multi-time Cube guest, and she's the vice president of Big Data and Streams. Streams is a very interesting product at IBM that we're going to talk about. Anjul, welcome back to theCUBE. Thanks, it's a pleasure to see you.

We're here at Hadoop Summit. There are a lot of discussions going on about various things that we're going to get into, but let's start with BigInsights. That is your flagship offering for the Big Data Group at IBM. You've been evolving the platform and you just made an announcement in June, so give us the update on BigInsights and your group.

Sure, and it's always a pleasure to be here. So we just announced the next release of BigInsights a couple of weeks back, and interestingly, a lot of the capabilities we have made generally available have also been discussed throughout the day here as capabilities the enterprise needs for Hadoop to really stick in the enterprise. So I think that's really good. The first thing is that we have brought SQL to Hadoop. By that I mean that through our Big SQL capability we now provide complete ANSI SQL 2010 support. It's all ANSI SQL 2010 compliant and very feature-rich, and we are also bringing MPP processing of structured data to Hadoop. From a performance standpoint, this is something enterprise customers are really going to want: as they use SQL on Hadoop, they expect the same kind of performance they were getting when they ran SQL against structured data in relational databases.
So both from a feature-function standpoint and from a performance standpoint, I think people will really like what we have released there. And we are doing this without moving data out of Hadoop into some database on the side. The data stays in the Hadoop file system; we don't move it around, and we can run SQL against it whether it is in the Hadoop file system, in Hive, in HBase, in comma-separated files, whatever the case might be.

The other key part of the Big SQL support is that our BI tools like Cognos, and tools like MicroStrategy, all support Big SQL, which is really important for customers who are augmenting their warehouse by bringing different kinds of data into Hadoop. The existing investments they have made in Cognos, or maybe MicroStrategy and others, they can continue to leverage, just because they have data in Hadoop. And now that we are giving them SQL, it's equally important that tools like Cognos continue to work. So there is very tight integration with BI tools like Cognos.

That's true on the distributed platform, and also in a world people don't talk about that much, which is the mainframe world. Next month we are going to release this whole BigInsights and Cognos support for our customers who are on the mainframe. Big data is not something that lives only on a certain platform; there are big transactional systems still running on the mainframe. It's equally important for those customers and applications to be able to bring the polystructured store called Hadoop into those enterprises, and to use tools that access data in Hadoop as well as in transactional systems on mainframes, like DB2 for z/OS. So with Big SQL, we are really tying all of this together.
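Big SQL itself is proprietary, but the idea behind it, running plain ANSI SQL over flat comma-separated data without first reshaping it for a warehouse, can be sketched with standard tooling. Here is a minimal, hypothetical illustration in Python using sqlite3; the table, columns, and rows are invented for the example, and Big SQL would run a similar query directly against files in the Hadoop file system rather than loading them into a database.

```python
import csv, io, sqlite3

# Hypothetical comma-separated data as it might sit in HDFS.
raw = """region,product,amount
east,widget,120.50
west,widget,80.00
east,gadget,42.25
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")

# Load the flat file rows so SQL can see them as a table.
reader = csv.DictReader(io.StringIO(raw))
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(r["region"], r["product"], float(r["amount"])) for r in reader],
)

# A plain ANSI-style aggregate query, the same SQL a BI tool
# like Cognos would issue regardless of where the data lives.
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
):
    print(region, total)
```

The point of the sketch is that the BI tool only sees SQL; whether the engine underneath is a relational database or files in Hadoop is invisible to it.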
Yeah, and of course you made that announcement in April, BLU Acceleration. We had Bernie Spang on, and at Wikibon we've beaten that one pretty hard, and this is good. But it's so important: our customers want instant gratification, essentially. They want the same experience in Hadoop that they have in the transactional world, and that's obviously a challenge for your business. So you have this offering, BigInsights, and it essentially comprises IBM's distribution of Hadoop. Talk a little bit more about what's inside BigInsights. What is that all about?

So BigInsights is our offering that is completely based on Apache Hadoop. There is no forking; it's pure Apache Hadoop. On top of that we provide capabilities like Big SQL, which is ANSI SQL compliant, so that once data reaches a structured form in Hadoop, customers can access it using the SQL they have been used to. We are also providing capabilities around security, where you can secure your data and govern your data, bringing in things like data encryption and masking. That is where we provide integration with Guardium, so you can have a complete audit trail of who has accessed what data and who has run which MapReduce jobs. Having that audit trail is very important, because otherwise the data is just out there for anybody to access. That may be okay when you are doing some sandbox work, but in the enterprise there is always a governed side, and it's really important that you don't lose control of the governed side of the data.

So RACF for Hadoop, basically. Resource Access Control Facility, for all you mainframers out there. See, we love the mainframe. That's a really important point, especially as you are moving.
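The audit-trail idea is simple to sketch: every read is recorded with the user, the object touched, and a timestamp, so "who has accessed what data" is always answerable. This is a toy, hypothetical Python version; the function and dataset names are invented, and Guardium enforces this at the platform level rather than in application code.

```python
import datetime

audit_log = []  # in a real deployment this would be an append-only, tamper-evident store

def audited_read(user, dataset, store):
    """Return a dataset's contents, recording who accessed what and when."""
    audit_log.append({
        "user": user,
        "dataset": dataset,
        "at": datetime.datetime.utcnow().isoformat(),
    })
    return store.get(dataset)

store = {"claims_2013": ["rec1", "rec2"]}
audited_read("jkelly", "claims_2013", store)
audited_read("dvellante", "claims_2013", store)

# The governed side of the enterprise can now review every access.
for entry in audit_log:
    print(entry["user"], "read", entry["dataset"])
```

The same wrapper pattern extends naturally to recording which jobs ran, not just which datasets were read.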
When you want to actually use Hadoop to support some mission-critical applications, you are exposing customer data, for example, to different parts of your organization, or potentially even outside it, so clearly you've got to have governance and rules that apply to that.

I wanted to ask you a question about the impact of Hadoop on IBM's overall big data business. One of the use cases we hear a lot about for Hadoop is moving some workload from what can be an expensive data warehouse into Hadoop, where you still have access to the data but it's less expensive to store. We know IBM has the PureData System for Analytics, based on some of the Netezza data warehousing technology, and you've also got PureData for Hadoop. So I'm curious, how is that impacting your business? Are people looking to use Hadoop and potentially move some of their data off of Netezza into Hadoop? And how do you see that impacting your business over the long term?

So it depends. What Hadoop is doing is enabling many new classes of applications that were maybe too expensive before, just because putting so much data in a warehouse would not be a cost-effective way to handle some of those scenarios. But just as an example, since we spoke specifically about Netezza and Hadoop: we have customers that are putting Hadoop in next to their warehouses and really augmenting the warehouse. The usage we are seeing is that, since Hadoop lets you bring all kinds of data, including very noisy data, into the Hadoop file system, you can do that in a very cost-effective manner. You don't have to heavily process your data first; you just dump it in there and then see what you really care about.
So to that extent, I think it's a great technology for warehouse customers to bring in data that doesn't really belong in a relational database, initially at least. Once they have done the filtering and analysis, they may identify a subset of the data that they were not leveraging before to make decisions, and they may decide to move that data back into the warehouse, because for certain kinds of applications the warehouse is probably the most performant way to access and analyze that data. So in some sense, the reverse is also happening. If you only make decisions based on the data that's in the warehouse, you can only squeeze it so many ways. As I put it, just because you keep torturing the same data, you're not going to get different results. So stop torturing the data: bring in new types of data, get new insights, figure out what else you were not capturing, and bring that into the warehouse so you can accomplish more than you could by torturing the same data.

So stop the torture: bring the new data into Hadoop, find out what's useful, move it back into the warehouse, and you'll get new kinds of correlations and insights. You're bringing these worlds together. Of course, there are a lot of tweets going on today about IBM serving up data at Wimbledon, pun intended, and IBM has always done the Olympics. So you've always been there, but now your mojo is around Hadoop and BigInsights, and you're bringing those two worlds together; they're colliding in a fairly big way. In a nice way. Are they? That's my question. Is it all harmonious, or is there some dissonance? What are you seeing in the customer base?
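The round trip she describes, land everything raw in Hadoop, filter and analyze it there, then promote only the useful subset back to the warehouse, can be sketched as a small pipeline. Everything below (the record shape, the filter rule, the names) is invented purely for illustration:

```python
# Raw, noisy events dumped cheaply into the Hadoop side: keep everything.
raw_events = [
    {"id": 1, "source": "web",    "value": 10},
    {"id": 2, "source": "sensor", "value": None},   # noise: malformed reading
    {"id": 3, "source": "web",    "value": 25},
    {"id": 4, "source": "log",    "value": -1},     # noise: invalid value
]

def is_useful(event):
    """Hypothetical filter: only well-formed web events matter downstream."""
    return (event["source"] == "web"
            and event["value"] is not None
            and event["value"] > 0)

# Explore cheaply in place, then move just the refined subset into the
# warehouse, where structured, repeated analysis is fastest.
warehouse = [e for e in raw_events if is_useful(e)]

print(len(raw_events), "raw events ->", len(warehouse), "promoted to warehouse")
```

The design point is the division of labor: the cheap store absorbs everything so nothing is lost, while the warehouse only ever receives data that has already earned its place.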
So whenever there is new technology, there is always going to be some disruption, and that's fine. In the end, from an IBM standpoint, we want to bring to customers what's best for them. We want to make sure they are able to leverage the technology that is the right one to solve their business problems. So we would never discard a technology just because it's going to disrupt something that has prevailed for the last decade or so. Looking ahead, we firmly believe we have to embrace this, which is what we have done. And beyond embracing it, we are getting it ready for the enterprise by helping not just our customers but the whole world of SQL programmers out there cross the chasm, making sure that the tools and the skills they have invested in can continue to be leveraged, not just for what they've had traditionally but also for these new technologies. That's where we are really surrounding Apache Hadoop and building an ecosystem of capabilities around it, so that customers get the best of both worlds and can actually implement new classes of applications, running each workload where it suits and fits best. It's not an either-or; it's an extension and an augmentation.

IBM has always had the ability to interact with, and sell to, the senior management audience. But it especially strikes me, Anjul, in this big data world. As Merv was saying today, you've got the guys in the hoodies and the guys in the suits. Who do you interact with? Who are the customers of BigInsights? Are they the hoodies? Are they the suits? Are they both?

They're both, because obviously there are the customers that have been with us for a long time.
We are loyal to them, and they are loyal to us, because we watch out for them. We are looking at the trends in the market, both from a technology standpoint and in terms of the new insights they need to serve their customers better. So we bring to them the new technology and the new possibilities that were not there before. And then, of course, we also have new customers, who maybe don't have a lot of legacy that they have to bring into the next decade. They find it very attractive too, because they know we are embracing open source and building a good ecosystem around it, so they'll be able to solve their problems quickly with us. That's important for them because they have to survive; they're just starting out. For them to really become the next bigger companies, they have to bet on companies that will take them forward in the next decade.

We don't have much time, but I've got to ask you about Streams, because I'm so fascinated with this product. It's a product that came organically out of IBM's research labs. There have been some startups; I know some of the folks spun out of there, and there's a new startup, HStreaming, that we're actually going to have on later. So it's a really interesting space: being able to make decisions, to allow machines to make decisions on data before persisting it, truly in real time. What's happening with that product?
So with Streams we've seen so many implementations. People are leveraging Streams in the telco industry to analyze call data records at volumes and throughputs that were just not possible before, to predict customer churn, to gain insight into the consumer, and to offer the right promotions and services at the right time, without waiting for all this data to be persisted somewhere before taking action on it. We've seen deployments in healthcare. You've all heard me talk about the data baby many times. It was the same data that before could not be processed at the right time to predict certain things, in this case infection in a newborn baby. The same data, because of Streams technology, is now saving lives. Talk about that as an impact. We are seeing marketing and ad agencies use Streams where every interaction the customer is having has to be captured, the context has to be captured and understood amidst many different kinds of interactions, and then the right action needs to be taken. Three-letter government agencies, no comment. Yeah, and for security.

There is a lot of talk now happening in open source about real-time analytics. We've had real-time analytics for the last ten years. As a platform, the throughput, the consumability of Streams, and the development environment we offer to application developers are really cool, and something I think people should try out. We now have Quick Start editions of both BigInsights and Streams available, so for development and test, people can download them and try them for free and, hopefully, form the same opinions that I have about these products.

IBM is really getting it done in open source.
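The Streams pattern she describes, analyzing each event the moment it arrives and acting before anything is persisted, can be sketched as a sliding window over an incoming stream. The readings and threshold below are made up for illustration; the actual product distributes this kind of operator across a cluster at far higher throughput.

```python
from collections import deque

def detect_anomalies(stream, window=3, threshold=1.5):
    """Flag readings that jump well above the recent moving average."""
    recent = deque(maxlen=window)  # only the window is held; nothing is persisted
    alerts = []
    for t, value in stream:
        if len(recent) == window and value > threshold * (sum(recent) / window):
            alerts.append((t, value))  # act now, before the data lands anywhere
        recent.append(value)
    return alerts

# Hypothetical vital-sign-like readings arriving in real time.
readings = [(0, 100), (1, 102), (2, 101), (3, 160), (4, 103)]
print(detect_anomalies(readings))  # -> [(3, 160)]
```

The key design choice mirrors the interview: the decision is made in flight from a bounded window of state, so the system never has to wait for the data to be stored and queried.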
You guys have been doing that for well over a decade, two decades really. Lou Gerstner called you a recovering alcoholic. I'll leave you with that one. All right, thanks very much for coming back on theCUBE. It's great to see you as always. All right, we'll be right back. This is Dave Vellante. We're live with SiliconANGLE's theCUBE. I'm with Jeff Kelly and John Furrier. We're here at Hadoop Summit in San Jose. We're right back after this word. Thank you.