People want faster decisions. They want decisions in near real time, and a lot of the talk today about big data is really focused on bringing that real-time capability to big data. What's the best way to do that? A lot of people equate Hadoop with big data, and a lot of people are trying to bring SQL capabilities to Hadoop. Well, there's more to big data than Hadoop, and we're here with Bernie Spang of IBM to talk about that. Bernie is the director of strategy and marketing for IBM database software and systems. He's up here from New York at the Wikibon offices. Hi everybody, I'm Dave Vellante. Bernie, welcome back to theCUBE.

Thank you, Dave. Good to be here.

So we last talked to you live at IBM's IOD, which was a fantastic event. I've said a number of times that IBM has basically taken its analytics business and super-glued it to the big data meme, and it's done a tremendous job of that. According to Wikibon, it's the number one big data player. And you really got there through hard work, acquisitions, a lot of thinking, building out a great portfolio, and leveraging your excellent services business to really solve customer problems. So it didn't happen overnight, but the market really is favoring... the wind is at your back, as they say.

Yeah, well, first of all, thank you for that, and I'd agree. One of the important things that we've done, and that is having an impact amongst our clients, is looking at the complete problem. It's not just about data, it's not just about analytics, and it's not just about any one technique. So when we look at big data analytics, putting those things together, one way to think about it is tops and bottoms. Having great analytics without the data systems to support it gets you nowhere, and having great data systems without applying the innovative analytics gets you nowhere. You need both of those things.
And when we talk about big data, we mean all data, and all the paradigms you're going to apply to get business value from that data.

Yeah, so when I talk about big data, the Wikibon definition is essentially data that is too big, too fast, or too different to handle in traditional ways. And I think that's fair. It's a vague definition, but it's a good one. It means you've got to do something different. And of course, Hadoop is different: you've got function shipping, and that's why I think a lot of people equate Hadoop with big data. I know personally my first entry into the big data world came through Hadoop. But there's much more to it than that. Talk about that a little bit.

Yeah, certainly, and I agree. For a lot of people, the first thing they were exposed to beyond traditional relational database management was Hadoop, as a new paradigm. But there are others. There's streams computing: analyzing data in motion that never even lands on a storage device. Before you persist it, you're actually analyzing it and making decisions. And that gets back to your point about too fast, right? And too much. If you're talking about telemetry coming from medical devices, the great example from one of our clients is a neonatal care unit analyzing the telemetry coming off 17 different monitors for each newborn in the unit. Doctors and nurses used to look at those data points every 20 or 30 minutes, write notes, and look for patterns. Now you do that on the fly. You could never store all that data, analyze it over time after the fact, and still get an answer in time to make a difference. With this kind of stream processing, they're detecting life-threatening infections up to 24 hours sooner than they could before.
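The data-in-motion idea Bernie describes can be sketched in a few lines of Python. This is a toy illustration with an invented telemetry feed and threshold, not InfoSphere Streams: readings are analyzed as they arrive, only a small rolling window is kept in memory, and the full stream is never persisted.

```python
from collections import deque

def stream_monitor(readings, window=10, threshold=3.0):
    """Flag readings that deviate sharply from a rolling baseline.

    Only the last `window` values are held in memory; the stream
    itself is never stored, mirroring the data-in-motion idea.
    """
    recent = deque(maxlen=window)
    alerts = []
    for t, value in readings:
        if len(recent) == window:
            mean = sum(recent) / window
            var = sum((x - mean) ** 2 for x in recent) / window
            std = var ** 0.5 or 1.0
            if abs(value - mean) / std > threshold:
                alerts.append((t, value))  # act now, before persisting
        recent.append(value)
    return alerts

# Simulated heart-rate telemetry: steady readings, then a sudden spike
feed = [(t, 120.0 + (t % 3)) for t in range(30)] + [(30, 200.0)]
print(stream_monitor(feed))  # → [(30, 200.0)]
```

The point of the sketch is the shape of the computation, not the statistics: decisions are made per reading, with bounded memory, before anything lands on disk.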
And it still makes the hair on the back of my neck stand up whenever I tell this story, because that's the reality of big data: applying it to the real world in a way that makes a difference.

Well, and of course with the Smarter Planet initiative, the inference when you watch those commercials is, okay, I've got machines making decisions, and machines can make decisions a lot faster than humans. I know it's not just machines, but that's different. So I think we agree on that.

Yeah. Good.

Now let's talk about the role of the RDBMS in this whole big data space. You've announced BLU Acceleration, you've got a growing portfolio. Before we dig into BLU Acceleration, let's talk a little bit about the RDBMS. Does that fit into big data?

Yeah, certainly, and actually I just did some blogging on exactly this point: what role does a relational database system have in big data? I'll do you one better. We've got our IMS system that's been powering the biggest, most complex, highest-performance applications around the world for 40 years now. And when the IMS team looks at some of the data challenges people are talking about, wow, look at this big data thing, they chuckle, because it still pales in comparison to the volume and velocity they handle. Now, that's a pre-relational system. Okay, so now you've got the relational system. My answer is the same across all of these: we don't want to throw the baby out with the bathwater. You don't want to repeat the mistake all those people made when midrange systems and PCs came along: the mainframe's dead, nobody needs a mainframe anymore, because look at all these things we can do with these other systems. The reality is we've been growing mainframe MIPS every year since that death was declared, and we continue to add new clients using System z. So what's the moral of that story?
Just because there's a new tool that does something new and better than the old one doesn't mean it does everything new and better than the old one. It's about using the right, best tool for each job. And the relational database system still does things better than any other type of data system when you're talking about the kinds of work it was designed for, like running transactional workloads for SAP, Lawson, Infor... any of the ERP examples would be a good one. Now, that doesn't mean the relational database system should stand still. You also want to extend and enhance it for these new demands, for higher performance. And that's where BLU Acceleration comes in.

Yeah, so of course at SiliconANGLE and Wikibon we love the disruption. We love things like Hadoop, we love things like NoSQL. It's helping make the database market interesting. Five, six, seven years ago it wasn't that interesting, and now it's the hottest thing going. But at the same time, you've got a trillion dollars of IT investment built up, maybe multiple trillions, and people just can't rip and replace. So you've got existing clients saying, okay, help me, I need to get from point A to point B without disrupting my business. I hear that from CIOs all the time: that's what I care about. So talk about BLU Acceleration. Where did it come from? How does it fit into the whole DB2 piece?

Okay, so first of all, BLU Acceleration is a set of technology innovations that came from a collaboration between IBM Research and our development labs. There were actually 10 labs around the world involved, and more than 25 patents have been filed for these innovations. It includes dynamic in-memory columnar data management.
And by dynamic in-memory, you mean it's processing the data in memory, but it has the intelligence to dynamically move data from storage into memory as it's needed. So you're not limited to in-memory only, and you don't get performance degradation if your data set is bigger than your available memory. That's an important differentiation, and it's why we call it dynamic in-memory.

It's columnar processing and storage of data, as opposed to the traditional row-based approach for relational data. And again, the right tool for each job: for certain types of scan and analytics work, columnar storage is much more efficient, and you get better compression and more efficient processing. But there's more than that. There are also parallel vector processing innovations in BLU: fanning the work out across multiple processors, and then using the advances in modern chip technology, IBM Power chips and Intel x86 chips, to load multiple data elements per instruction. So you're able to do parallel processing in every instruction cycle, and then you fan that out across multiple cores.

Then we've got actionable compression. Not only do you get the benefit of compression from columnar storage, but we've got patented actionable compression that enables processing and analytics on the compressed data without having to decompress it. That not only adds to the space savings, it adds to the performance boost.

And then finally, data skipping is the fourth technology advantage: the intelligence to know where you can skip entire sections of data and not have to process them at all. So think in terms of compressing data into memory, doing columnar processing so you can focus on the particular columns you care about, which filters the data, and doing data skipping, which further reduces the amount of data you're processing.
And then you fan the resulting work out across multiple processors, and you load each processor core up with multiple data elements. All these things build up to deliver what early clients and partners have seen: eight to 25 times faster on complete analytics workloads, and on specific analytic queries we've seen more than 1,000 times faster with this technology.

So what are you measuring with that eight to 25x, or 1,000x? Are you talking about wall-clock time to complete a task?

Yeah, in that case you're talking about running a report or doing a complex analytic query in the context of answering a question. It gets back to your point, right? This is all about getting faster answers so you can make better decisions. Even if you take the eight-times-faster number, not the 25, think of a client today who has an analytic workload that runs all day, 24 hours to get an answer back. Well, what if that's only an hour? How many more questions can you ask in a day? How many more answers are you going to get, making decisions within the day as opposed to the next day based on yesterday's data? That's where this gets exciting. And because of the compression capabilities and the dynamic in-memory nature, we're talking about terabytes, into the tens of terabytes of data, at this kind of speed-of-thought analytics. When you couple this with our Cognos BI software, as an example, you're talking about the difference between "I want to see this report, click, let me go get a cup of coffee or go out to lunch, and when I come back I'll have the report" versus "click, there's the answer. Hmm, now I want to see it over here. Click, there's the refresh." It's that speed-of-thought analytics that's going to enable clients to do more things faster.

I had Steve Mills on theCUBE down at 590 Madison Avenue a couple of weeks ago. He had this big flash announcement.
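The columnar-storage and data-skipping ideas described above can be sketched in miniature. This is a hedged toy illustration, not DB2's implementation: the table, block size, and synopsis layout are all invented, but the mechanics mirror the description — each column is stored contiguously, a synopsis records the min/max of each block, and a query touching only one column can rule out whole blocks without reading them.

```python
# Sales rows (region, amount); a column store keeps each column
# contiguous, so a query on `amount` never touches `regions` at all.
regions = ["east", "west", "east", "west", "east", "west", "east", "west"]
amounts = [10, 250, 40, 90, 15, 30, 700, 20]

BLOCK = 4  # values per block; real extents are far larger

# Synopsis: (min, max) per block of the amount column, kept on load.
synopsis = [(min(amounts[i:i + BLOCK]), max(amounts[i:i + BLOCK]))
            for i in range(0, len(amounts), BLOCK)]

def big_sales(threshold):
    """Row indices with amount > threshold, skipping whole blocks
    whose synopsis max rules them out (data skipping)."""
    hits = []
    for b, (_, block_max) in enumerate(synopsis):
        if block_max <= threshold:
            continue  # entire block skipped without reading its values
        for i in range(b * BLOCK, min((b + 1) * BLOCK, len(amounts))):
            if amounts[i] > threshold:
                hits.append(i)
    return hits

print(big_sales(500))  # → [6]: the first block (max 250) is never scanned
```

With a threshold of 500 the first block's synopsis (max 250) eliminates it outright, so half the column is never read — the same effect that, at scale, lets a scan touch a small fraction of the data.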
I want to ask you about flash a little later on, but one of the things he said, and Mills is awesome, was: look, in-memory technology has been around as long as there's been memory. So why are we suddenly talking about it so much now? What has changed so that it's really come to the fore? Everybody's talking about in-memory.

So, a couple of things. First of all, the cost of memory itself has come down, so you can have more memory, and therefore you can load more data into memory and do more things with it. That's fundamental. But the real difference is that when most people talk about this in-memory excitement and these in-memory databases, they're really using that as shorthand for in-memory columnar. It's this columnar-based approach, instead of the row-based approach, coupled with being able to process it in memory, that is the big speed advantage here. That's really what people are excited about: the dramatically faster answer.

Now, the other thing is it's also dramatically easier. It's much simpler, and that's another important value of BLU Acceleration. It's not just speed, it's also simplicity. You create a BLU table, you load the data into it, and you start doing your processing, and you get top performance automatically. There's no building of indexes or aggregates, no ongoing database tuning, like you have to do when you're doing analytics against the traditional row-based approach. And it's this dramatic difference that is why I'm running into analysts who surprise me, because they're talking about this capability among the set of NoSQL technologies, which had me scratching my head the first time. I said, wait, it's still SQL. The SQL doesn't change. It's still relational data management, it's the same schema. It's just a different storage model and this columnar processing.
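The "actionable compression" idea mentioned earlier — running predicates without decompressing — can be illustrated with a hypothetical order-preserving dictionary encoding (a sketch of the general technique, not BLU's patented scheme): values get integer codes that preserve sort order, so range comparisons run on the codes themselves.

```python
# Hypothetical order-preserving dictionary compression of a string
# column: codes preserve sort order, so range predicates can be
# evaluated on the codes and rows are never decompressed during a scan.
values = ["cherry", "apple", "banana", "cherry", "date", "apple", "banana"]

codes = {v: c for c, v in enumerate(sorted(set(values)))}
# codes == {'apple': 0, 'banana': 1, 'cherry': 2, 'date': 3}
encoded = [codes[v] for v in values]  # the column as stored: small ints

def rows_where_less_than(bound):
    """WHERE value < bound, evaluated entirely on compressed codes."""
    bound_code = codes[bound]  # encode the predicate constant once
    return [i for i, c in enumerate(encoded) if c < bound_code]

print(rows_where_less_than("cherry"))  # → [1, 2, 5, 6]
```

The comparison work shrinks along with the storage: the scan compares small integers instead of strings, which is why actionable compression adds to the performance boost as well as the space savings.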
And they say, yeah, but to your point, it's a dramatically different technology, a different approach that now enables me to do things faster, cheaper, and simpler than I've ever been able to do.

And without any application changes.

Yeah, in this case, right, and that gets back to your point about the CIO: how do I get from here to there? Why did we put BLU Acceleration in DB2? We said when we announced it that this technology is starting out in DB2 for reporting and analytics workloads, but we're going to expand it to other types of data workloads and then to other products; I can tell you next how we're already using it in Informix. But the point being, if I don't have to move the data out into a completely different system that I have to set up, why should I? Why add the complexity and the cost? The beautiful thing about BLU Acceleration is you can have, within a single database, row-based tables and columnar-based tables, all in the same database, all accessed with the same SQL. But you optimize it so that the tables used for analytics and reporting are in columnar form, and the tables required for operational or transactional updates are in row-based form, so you get the best performance for each.

Yeah, so that's a subtle, nuanced, but interesting part of the value proposition. I want to come back to the value proposition, but before we do, the way I want to get there is to talk a little bit about something you just said. Databases today, if you think about it, are relatively small. You try to avoid calls on them, because you don't want to go to disk, because disk kills performance. But now, with in-memory technologies, that starts to change. Flash is changing that as well.
How do you see this type of capability changing the way databases will be designed in the future, and the amount of data I can bring into databases? And how will that affect processes? Because processes are pretty rigidly built up around databases today. Do you see that changing?

Oh, it definitely will change. Our client and partner who were at our launch really captured the story well. One of our clients, Ken Collins from BNSF Railway, was talking about his experience with BLU. He tried to break it, tried to throw bad SQL at it, things that, if you throw them against a transactional database system, are going to run forever and maybe never come back; you're going to kill it. And it was coming back in seconds when he was working against BLU. And he said, it makes me wonder, is there even such a thing as bad SQL anymore? Do you have to do this tuning and have SQL specialists? Because if even bad SQL can come back with an answer in a matter of seconds, it changes the paradigm. And when you can fit so much data in memory, and even when it doesn't fit in memory it can be in flash, that takes away the disk issue. Because it's dynamic in-memory, as I described with BLU Acceleration, it's all handled by the system; you don't have to worry about where the data is. It completely changes the amount of data you can use this technology on. And that's where it comes into big data. Back to your question of how a relational database fits into big data: "geez, we don't put more than a terabyte in our relational database because it's slow." Well, that's the old one. That's your grandfather's relational database system. That's what's different now when you bring in things like BLU Acceleration.

Okay, so let's go back to the value proposition. We're talking speed, we're talking simplicity.
Give us whatever else you want to say about the value prop, the economics of this, but then tie it to some examples, specifically examples with BLU Acceleration.

Yeah, so the other one I'll mention is our partner Temenos, the core banking software company. The discussion there was about all the financial banking data. This is structured data; it's not unstructured social data, none of that domain. This is real, structured financial data. He was talking about the volume of data that exists, what he referred to as "gold dust" data. It's sitting in these systems, and you can't touch it. You can't apply analytics to it, because of the performance requirements, the operational constraints, and the security. And they don't have the time or resources to set up warehouses to do analytics on this data, but they know there's value in it if they could just analyze it. He was talking about how, in their testing, they're bringing the BLU Acceleration capabilities to bear, because it can live within the same database system. They can have the row-based approach optimized for transactional performance, but now they can also have what I call snapshot data marts: you take a snapshot of the transactional, operational data in the same system, put it in a columnar BLU table, and get second or sub-second response times on complex queries that up until now you just couldn't run. And he was excited about bringing this capability to Temenos' software, because they intend to provide capabilities that enable their customers, the banks, to deliver services to their clients, new services for marketing and sales, taking that data and offering it as value. Whole new revenue streams are possible.

Another area to look at is telecom, right?
The gold dust data that exists in all the calls and tweets and text messages and everything, that can be monetized if you can get at it all, analyze it, and do reporting on it. That's of value.

Excellent. Okay, so now let's take a look at the IBM portfolio. You've announced BLU Acceleration, you obviously have DB2, you've got the PureData systems with the technology acquired, in part anyway, from Netezza, you've got Informix bringing time series, you've got Streams, and you've got things like BigInsights. You have all these assets in your portfolio. Tie it together for us and help us understand how they fit into the overall strategy.

Sure. Just to quickly hit on the ones we haven't talked about: the time series capability in Informix is another great example of NoSQL technology. It's not standard SQL relational tables; it's a system optimized for time series, or time-stamped, data, and Informix also has a spatial database capability. So when you talk about the big data challenge of dealing with data from smart meters and sensors that are time-stamped and location-stamped, we see clients using the traditional relational database approach on that, and those systems just fall over as they scale up the number of meters and sensors. It just doesn't work. So it's another great big data, NoSQL example: apply the right tool to the job and you get dramatically faster performance, two to five times faster, using a half to a fifth of the processor cores and requiring a half to a third of the storage space. Now, all of a sudden, you can capture a lot more of this data and do reporting and analytics on it than you could ever afford to do before. Another great example.

BigInsights is our Hadoop-based capability, right?
I'm a fan of reminding people that big data does not equal Hadoop, but let's not forget that Hadoop is an important part of the big data story. InfoSphere Streams does the stream processing on data in motion. What I've just outlined is software, multi-platform software offerings. Then what we've done with our PureData System, which is part of the PureSystems family, is take that software and integrate it with hardware into an integrated system with built-in expertise, targeted at a specific workload, with single integrated management, support, et cetera, what we refer to as expert integrated systems. That delivers appliance simplicity, rapid time to value, and ease of management through the integrated package. We've got PureData Systems optimized for transactions, for analytics, and for operational analytics, and the most recent one we announced was the PureData System for Hadoop.

So we've got software and systems. Why? Because some of our clients need the flexibility of multi-platform software for their custom solutions, and we've got clients who want the appliance simplicity of an integrated system and the time to value that delivers. We also have our software integrated and delivered with System z; that's another deployment choice. System z delivers the highest qualities of service on the planet: security, reliability, efficiency. That's a deployment choice many of our clients require. And then there's cloud agility: delivering our capabilities via cloud services, whether on the IBM SmartCloud or through partners who deliver cloud-based services. So software flexibility, appliance simplicity, System z qualities of service, and cloud agility are the four fundamental deployment options.

So you've got something for everybody there. And if I want the cheap-and-deep, lowest-cost approach, a pure Hadoop scale-out, you've got that too.
If you're looking for Hadoop, and that's where you're focused, and you're looking for low cost, you can download it for free, and support is optional; you can come back for support if you want it. And it's the same Hadoop-based BigInsights offering, so as your needs evolve, if you need a larger set of capabilities in an enterprise-class environment, you can move up to the enterprise version. And if you want it as an expert integrated system, to speed and simplify deployment, you can get it as a PureData System.

Right, and obviously you have your services organization, with deep industry practices and tremendous expertise, that can apply these technologies across your portfolio in ways that drive business value.

Exactly, on the services end, and I'd also add our IBM Research team. One of my favorite areas these days is energy and utilities, and that whole smart energy space. We're working with the smart energy team at IBM Research, at our Watson Research Center, and with organizations around the world: using Streams to analyze the data in motion through the grid; landing that data in a time series database to do the reporting and analysis and the smart meter management; feeding that data and other data into a PureData System, perhaps powered by the Netezza technology, to do broader and deeper analytics across different types of data. And then the most recent addition to this storyline is the clients who are going a step further and using a Hadoop-based system to also bring in text information, sentiment analysis from their customers, and documents from customer relationship management, and bring all that to bear.
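The first stage of that smart-energy pipeline — telemetry landing in a per-device time series store for later reporting — might look like this in miniature. A toy sketch, not Informix TimeSeries: the meter IDs, readings, and layout are all invented, but it shows why clustering readings per device beats scattering them as rows in one giant shared table.

```python
from collections import defaultdict

# Smart-meter telemetry appended to a per-device series in time order.
series = defaultdict(list)  # meter_id -> [(hour, kwh), ...]

def ingest(meter_id, hour, kwh):
    series[meter_id].append((hour, kwh))  # O(1) append per device

def daily_usage(meter_id, start, end):
    """Total consumption for one meter over [start, end); only that
    meter's contiguous readings are touched, not every row of a
    table shared by millions of meters."""
    return sum(kwh for hour, kwh in series[meter_id] if start <= hour < end)

# A day of hourly readings for two invented meters
for hour in range(24):
    ingest("meter-42", hour, 1.5)
    ingest("meter-99", hour, 0.5)

print(daily_usage("meter-42", 6, 12))  # → 9.0 (6 hours at 1.5 kWh)
```

The design choice is the same one Bernie describes: because each device's readings stay clustered and time-ordered, a reporting query scales with one meter's history rather than with the total number of meters.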
So when I hear those stories, this is exactly the message, right? It's using the right, best tool for each job, looking across all your available data, both inside and outside your company, all types, all structures, and figuring out how you can extract the most value from it to get better answers and better decisions for the business.

I feel like we're hitting on what I would call the Internet of Things, or what some call the Industrial Internet, as the next big wave. IBM obviously calls it Smarter Planet, the next big wave of big data, if you will. You're seeing these GE ads now; GE's getting into it big time. It's the old saying, right: one lawyer in town can't make any money, but with two lawyers they both get rich. So you've got IBM and GE, GE of course a customer of yours, but now doing this whole Industrial Internet while you've got Smarter Planet. I think people are really going to start picking up on that, and the time is now: the technology is available to do it, and you and GE and others each have your parts of the value chain. That is an explosion.

Oh, we're definitely seeing it. It's smarter manufacturing, all those lines; transportation, the vehicles and all the sensors along the routes and doing analytics on that; distribution networks, trucks and fleet management; supply chain. And then there's healthcare.

Healthcare, right. All the telemetry you're getting from the devices, we're wearing them now, all the stories of...

Watches, and all these companies doing watches.
Exactly. So that is a big data wave we're in right now, where all these different capabilities I just talked through certainly apply. When we did the BLU Acceleration launch, we did it at our Almaden Research Center, where the relational database was invented and the disk drive was invented, and we had one of our VPs from Research talk about the global technology outlook for the next five to 10 years. The next big wave is going to be in audio and video: doing a better job of mining and utilizing that information automatically and adding it to the mix. So we've got a way to go here.

Being able to scan the content in there without having to spend 30 minutes watching the video, and the metadata, it's just...

Exactly.

It's incomprehensible how large this is going to be. We try to size the big data market: it's five billion here, growing to whatever, 50 billion. But this is hundreds of billions, if not trillions, of dollars of opportunity we're talking about.

Yes, if you define big data as all data and all possible ways to extract value from that data, absolutely.

So it's enormous, and we're here watching this, excited to be a part of the trend. Bernie, I'll give you the last word. Advice to CIOs: everybody is at least attuned to the fact that there's value in data. Across companies it's a bell curve, obviously. Some are leading the charge; others are still trying to figure out, what is this big data thing? How do I actually monetize it? Where do I start? What advice would you give to CIOs who are looking to get started?

My advice would be: you're going to want to start small and specific and demonstrate a return on investment, absolutely. But do that after you've looked at the broad picture and the strategic scope of how data can impact your business.
And then from that, narrow in on the first, most important thing you want to do, demonstrate success, and do it in a way that it's an obvious first step in a complete roadmap. I've seen too many cases where people jumped in on the hot project: a vendor or someone on the technical team came in with a hammer and said our biggest problem is a nail, and they jumped right on that. And yeah, that might have worked, but it didn't turn out to be the biggest problem, or it was a dead-end project that didn't fit into bigger things. So when I'm talking to CIOs, and to business leaders who are focused on this as well, it's: make sure you look at the broad picture first, lay out a roadmap, and then take the first step.

Excellent. All right, Bernie. Well, IBM is making moves in big data, and of course we're following it here at wikibon.org and siliconangle.com. Check out SiliconANGLE for all the blogs, check out Wikibon for all the research, the free research. Bernie, really appreciate you coming in and sharing IBM's insight and its portfolio. You've got a lot going on there, and we're really proud to be covering it. This is Dave Vellante for Wikibon. Thanks for watching, everybody, and we'll see you next time.