Okay, we're back. This is Dave Vellante. We're live from IBM's IOD conference, here at the Mandalay Bay Conference Center in Las Vegas. IBM IOD is an information management show that really has transformed into a big data show. Think Big is the theme of the show, and IBM is bringing all the horses to the track; it's a giant big data portfolio. We're here with Nagi Halim, who is the chief big data architect for IBM. Nagi, welcome to theCUBE.

Thank you.

Appreciate you coming on. We were talking off camera about your role at IBM. I didn't ask you this off camera, but I'll ask you on camera: is it like you're a chef, and you have this sea of ingredients that you can put together? It sounds like a dream job for a technical architect type.

It's more chief than chef. And I would say it's not a set of ingredients, it's a set of capabilities.

Okay.

We bring capabilities together. We bring people and technologies together to create new things that haven't been done before.

So give an example, if you would.

Well, a lot of the big data technologies involve very significant system activities that build new kinds of capability at the systems level. Parallelization of analytics, for example. And some of these things require very great mathematical capability in the system to actually provide the sort of intelligence that you would expect these systems to have, for scheduling or for making decisions about what to run or how to run jobs, et cetera. So that's an example of where a chef, as you put it, would bring together capabilities from many different specialties to form a greater capability.
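To make that idea a little more concrete, here is a minimal Python sketch of parallelizing an analytic over data partitions. The partitions and the score() function are hypothetical stand-ins; a production system would layer scheduling, placement, and fault tolerance on top of this basic pattern.

```python
# A minimal sketch of parallelizing an analytic over data partitions.
# The partitions and score() are hypothetical stand-ins; a production
# system would add scheduling, placement, and fault tolerance.
from multiprocessing import Pool
from statistics import mean

def score(partition):
    # Stand-in analytic: summarize one partition of the data.
    return mean(partition)

if __name__ == "__main__":
    partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
    with Pool(processes=3) as pool:
        results = pool.map(score, partitions)  # analytics run in parallel
    print(results)  # [2.0, 4.5, 7.5]
```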
So you're involved in the development of technology. You invent stuff, is that right?

Right. So, as someone who's involved with both the research end and the practical product creation in product development, we both apply the technologies in real-world settings and continue to understand where there are gaps or problems, or potentially have those aha moments when you have some insight into what's needed. And we go back to the lab, so to speak, think about it, try things out, and bring new technologies or new capabilities into the mix.

So what's the process you use to identify those problems? Obviously you talk to customers, but can you talk about that a little bit?

Well, one thing that's interesting about IBM Research and IBM right now, as opposed to an academic setting, is that the problems our clients bring us are, in some sense, externally defined and often very, very hard. In an academic setting, typically you have a problem and you can sort of shrink it down to something that you can handle. Here, the customer defines the problem. The problems can be vast in scope. They can be mission critical to that customer. They can involve potentially a lot of money. And so it's an assessment. It's sort of a creative act of going in, listening carefully to what the client has, and figuring out what we have that will work and what we need that we don't have. So it's a creative process. I mean, there isn't any simple formula. It's really about assessing that problem, intuiting what might be possible from our background, doing research activities, and creating something, having an inspiration. And sometimes we have created things that are actually quite different than what's come before.

Nagi, can you talk about R&D at IBM and how that's different from many other companies? I mean, you guys do a lot of real, deep research and invention. For a lot of companies, R&D really means incremental product features, for example, and that's sort of put into an R&D bucket. How is IBM different? Where is IBM different?

Well, first of all, I think IBM Research is an absolutely phenomenal place right now, at this juncture, for having a very unfettered and very encouraged environment for creating anything we can think of that's going to address customer problems. It's often been said that we're not about invention so much as innovation. Innovation is the notion that we create new things that have direct bearing on some important real-world activity. So the notion of innovation is really quite important; it's kind of a mantra that we work by. In terms of how we differ, there are many different aspects to that. One thing is that IBM Research is extensive. It's a worldwide presence, so there's influence from many different markets. In China, in Japan, in Israel, in India, in Australia, in Ireland, we have many labs around the world. So that diversity of inputs is one ingredient. The diversity of people is another important ingredient. As someone who's been at Research a long time, I have the ability to draw on an absolutely amazing and vast array of colleagues. If I need something that's mathematical or analytical or hardware related or communications related, either something in the current timeframe or something I need done in the future, all those things are possible. So we have a way to bring together great minds with great customers. We have many of the world's leading firms that come to us for advice. So that combination of hard problems and amazing capability on a global scale makes for a terrific environment.

So you've got big data in your title. I presume that's relatively new, right?

Yes and no. Most technologies will get tagged with a certain name at a certain point. I would say that the concepts we've been working on have been kind of in the air for a long time. In fact, our work on the real-time portion of this problem, that is, the analytics that occur in real time, has been going on since about 2002 or 2003. So the notion has been there for a while. In the early days we called it the data overload problem. That was something that came from our collaboration with the Department of Defense in the United States, and it eventually morphed into this big data phrase that's now commonly used; we don't say "the data overload problem" so much anymore. So it's relatively new as a major strategic focus for us, but it has its roots back almost a decade now.

So, data overload problem, meaning I get so many sources of data, such a vast amount of data coming in so fast, that I can't make sense out of it.

Right, it's an overload. There are a couple of very specific situations I can talk about. One of them being that in modern hospitals there are a tremendous number of sensors attached to patients. And the doctors are literally in that situation currently: they have so much instrumentation on the patient, so much information coming from that instrumentation, that they're actually unable to figure out on their own what the data is telling them about either the patient's current state or the prognosis for that patient. So that's a pretty good example. In general, though, the thematic statement is that people are builders, and we build big, complicated systems: financial systems, transportation systems, energy distribution systems, et cetera. We're all very much using these systems constantly. Certainly the cell phone network everyone uses on an almost constant basis. But once we create these systems, it's very difficult to figure out what they're doing sometimes. These systems also allow great possibility for fraud and misuse, and sometimes they don't work the way you expect them to work, et cetera. So big data is about creating instrumentation, in some sense, that allows users or builders of that set of systems to see into those systems and understand what's happening. So we now, as part of the big data mission, have the ability to create instrumentation that allows users or operators of these systems to know what's going on.
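As a concrete illustration of that kind of instrumentation, here is a minimal Python sketch that condenses a fast sensor stream into alerts a person can act on, in the spirit of the hospital example. The heart-rate values, window size, and tolerance are hypothetical; a real bedside or network monitor would be far richer.

```python
# A minimal sketch of "instrumentation": condense a fast sensor stream
# into alerts a human can act on. The heart-rate stream, window size,
# and tolerance are hypothetical stand-ins.
from collections import deque

def monitor(readings, window=5, tolerance=15.0):
    baseline = deque(maxlen=window)           # rolling window of recent values
    for t, value in enumerate(readings):
        if len(baseline) == window:
            avg = sum(baseline) / window
            if abs(value - avg) > tolerance:  # deviation from recent baseline
                yield (t, value, avg)         # surface an alert, not raw data
        baseline.append(value)

hr = [72, 74, 71, 73, 75, 74, 72, 110, 73, 72]   # beats per minute
for t, value, avg in monitor(hr):
    print(f"t={t}: reading {value} deviates from baseline {avg:.1f}")
```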
Okay, so not replacing people, right? That's the old line about big data, that it replaces people.

Absolutely not. We're not replacing people. I think that's the very important message there: we're allowing people to be more effective and to perhaps work more efficiently and more intelligently with the systems that they've created, not to displace them. So the operators will no longer be fiddling around with bits and trying to understand logs by hand. They'll have instruments that magnify their capability. And so the level of success should go up as these technologies take root.

So can you talk about some of the practical examples that you've either brought to market or are working on, maybe by different industries?

Right. For me personally, one of the most fantastic and fascinating aspects of this is that there's almost no industry we have run across that doesn't have some big data problem. At a high level, I can say that agriculture, transportation, oil exploration, energy, energy delivery, the natural sciences, all these fields have amazing and interesting problems of big data. You can take any example, like the healthcare example I spoke of. But network operators, for instance, are very interested in finding out not only how the network is operating, but how people are experiencing the network. The combination of network plus handsets plus different applications means that even if everything is operating correctly, customers may still not be satisfied. So how does a network provider who wants to retain customers and have good customer satisfaction know about that and manage that reality? That's an example of an instrument: essentially, the big data instrumentation we're creating allows network providers to peer into that user experience and see which users are happy or not happy.

Another angle that's completely different is that in modern society, we have people constantly giving their opinions via Twitter or other social media mechanisms. So another angle on this is to say that we essentially have people behaving as sensors: if you put out a new product or a new advertising campaign, people will react to that. And the reactions are often carried via, say, Twitter feeds or other things of that nature. And we're now able to automatically track what people think about a product or an offering or a situation or a public health crisis or something like that. And this is all done with big data machinery. So people as sensors is a very interesting new thing that's happening.
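To make the people-as-sensors idea concrete, here is a minimal Python sketch that tallies reactions to a product from a stream of short posts. The tiny keyword lexicons and the sample posts are hypothetical stand-ins; production systems use trained sentiment models over live feeds rather than word lists.

```python
# A minimal sketch of "people as sensors": tallying reactions to a
# product from a stream of short posts. The keyword lexicons and the
# sample posts are hypothetical; real systems use trained models.
POSITIVE = {"love", "great", "fast", "amazing"}
NEGATIVE = {"hate", "broken", "slow", "awful"}

def reaction(posts, topic):
    pos = neg = 0
    for post in posts:
        words = set(post.lower().replace(",", " ").split())
        if topic in words:              # only count posts about the topic
            pos += len(words & POSITIVE)
            neg += len(words & NEGATIVE)
    return pos, neg

posts = [
    "Love the new widget, so fast",
    "Widget arrived broken, awful support",
    "Great weather today",
]
print(reaction(posts, "widget"))  # (2, 2)
```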
So people oftentimes, CEOs in particular, look at R&D, look at development, and say it's risky, right? You're not necessarily going to get a return on it. What are the parameters under which you operate? And by the way, that's why I think a lot of companies revert to just making incremental product improvements: because they know they can get an ROI on them. How much of a factor is that in your activities? Do you just go out and solve problems? Somebody says, here's a hard problem, go solve it? Or does your organization try to identify a commercial use for that?

Well, you're raising very interesting points. This is not just a technology problem, first of all. So there is a question about how an early adopter would incorporate and use the technology, but also established companies that maybe aren't typically early adopters. So there's kind of a ramp-in. We look for projects that have solid, meaningful value to that company, that might be modest in duration, so we can do a quick succession of projects to really go in and show value over and over and over, to kind of build up the confidence. So that's one thing. But the other thing that's very important to understand is that it's not just a technology play. When we create a new solution or a new product, it's in part something that will be embedded in the business operations of that company. And being embedded that way does potentially require that the company change the nature of its operations. People have to be willing both to provide the data that's needed to sustain that whole application and to listen to the results. If there's an actionable result coming out the other end, that needs to be incorporated into the business processes of the company, which is in some cases much more difficult than you might expect, because taking advice, essentially, from a machine is something that people may not really be used to, particularly on a strategic basis. And there's risk associated. What if the machine doesn't provide the answer that's correct? Or what if there's a problem with interpreting what the machine has said? How do we handle that? What's the process for incorporating that into business processes? So I believe that's an aspect of the big data evolution that needs to be focused on.

You talked earlier about real time. That's one of the hard problems you guys started working on in the early 2000s. Can you talk about how you've addressed that problem? Have you largely solved it?

So, just to be very clear and simple about it, traditional computer science has used what's called the store-and-process model, where the data for any analytic context is stored, typically on disk or potentially in main memory, and then the analytics work on that data until some conclusion has been arrived at, and then produce that conclusion. And that's fine for many applications. But there is a different set of emerging applications where the latency from the time you start acquiring the data until that acquisition process is complete is too long for the problem at hand. For example, if you're storing a day's worth of data but you need the answer in half an hour, you can see the obvious mismatch. So we need essentially what amounts to incremental analysis that occurs as the process is unfolding, so you're able to react to things as you see them. For example, right now in this interview, you're listening to me, you're making decisions about how to steer the conversation, what questions to ask, or maybe what clarifications you need, in real time. You're not waiting for the interview to be all over, looking at the video and saying, oh, wow, I should have asked him about this particular question. It doesn't make sense, right? Because the interview will be over. So the real-time nature of many big data problems is a fascinating and emerging area, and that's one of the focus areas.
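A minimal Python sketch of the contrast he's describing: the batch version can't answer until all the data has been stored, while the incremental version yields a result after every event. The running average is a hypothetical stand-in analytic; a system like IBM's InfoSphere Streams applies this kind of incremental processing at much larger scale.

```python
# A minimal sketch contrasting store-and-process with incremental
# analysis. The running average is a stand-in analytic.

def batch_average(stored):            # store-and-process: answer only at end
    return sum(stored) / len(stored)

def streaming_average(stream):        # incremental: answer after every event
    total, count = 0.0, 0
    for value in stream:
        total += value
        count += 1
        yield total / count           # result is available immediately

readings = [10, 20, 30, 40]
print(batch_average(readings))            # 25.0, only after all data arrives
print(list(streaming_average(readings)))  # [10.0, 15.0, 20.0, 25.0]
```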
Now, the second part of your question was, well, have you addressed it? We essentially created, from the ground up, a technology, and that is IBM's InfoSphere Streams product, whose nature is incremental, very high speed, processing things as they are unfolding.

Yeah, and that's organic. We've talked earlier about IBM's vast portfolio of acquisitions. This is a great example of something that was grown internally.

Right, this was created internally, and it did in fact draw from many, many disciplines, ranging from hardware through mathematics, systems, communications, and analytics. So a very interesting team effort that IBM put together.

Okay, so real time, check, got that covered. What are the really interesting, hard problems you're working on now?

Well, I would say we have a collection of very challenging client problems. So that would be one thing; that's in the problem space. In the research area, though, there are a couple of different areas I'd like to highlight. The first being that the analysis process is often something that's done, in some sense, laboriously by hand, and so we have a major effort going on now to automate that process: to have the machines actually direct how the investigations go, particularly in complex, fast-moving situations such as cyber, where the machines are able to learn the normal patterns of behavior in the networks. When they start detecting things deviating from that, they can focus and target analytics to look at the trouble spots, perhaps actually issuing investigational kinds of probes into those areas of the network that are a problem, and drawing conclusions automatically. So I think the net is we're trying to, and essentially have been successful at, automating aspects of this investigational analytic process.

Fantastic. Well, listen, Nagi, I really appreciate you coming on theCUBE. I also appreciate you talking about real time, because theCUBE is real time here. So really appreciate your insights. Great segment, and thanks very much.

Thank you.

All right, keep it right there. We'll be right back with Jason Silva, our next guest. Keep it right there. This is theCUBE.