Okay, welcome back everyone. This is theCUBE. We're live in Las Vegas for IBM Impact. This is our flagship program, theCUBE. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE. I'm joined by my co-host Paul Gillin here. And our next guest is Roger Rea, InfoSphere Streams product manager. Welcome to theCUBE.

Well, thank you very much. It's great to be here. And it's interesting that you almost said InfoSphere Streams without stumbling. That took me about a year. I had to concentrate.

You have to concentrate and think very hard. In fact, when Streams first came out of IBM Research to become a product, our CEO at the time, Mr. Palmisano, said, I think we need to call this System S, which was the research code name for Streams, instead of trying to say InfoSphere Streams.

Well, we love Streams. Obviously we're in the media business: online media, social media. You're talking about social graphs, social distribution, word of mouth, whatever you want to call it. Essentially, they are streams. Streams of consciousness, streams of content, even, like rivers, streams. So basically streams means real time. But to the average person out there, it just means more noise, right? There's a lot of noise in the streams, because there are a lot of issues with getting data out of real-time streams. It's almost like planes taking off from every different runway on the planet. How do you make sense of it? In the old days, you'd throw the data in a pile, do some data mining on it, get some insights, and you're good; the report gets spit out. Now it's different. You have all kinds of signaling, mobile phones, updates, so many omni-channels of data. So describe to the folks how you address that, and what is the whole basis behind Streams?

So Streams is a real-time analytic platform.
As you talked about, John, you can run mining algorithms against historic data. What you can also do is take those same kinds of mining algorithms and run them against the stream, while it's still in memory, before it ever lands on a disk drive.

Streams was actually created as a joint research effort with the United States Department of Defense. They said, you know, we have all manner of data, and it's coming so fast that we have to process it in real time. Landing it on a disk drive, even huge clustered arrays like Hadoop, is still too slow to keep up with the volume of information that we have. So we have this challenge of a huge volume of information coming in at such a speed that nobody can handle it.

And then there's also the issue of analytic silos. Just as many people have application silos and worry about integrating those, there's the challenge of how do I integrate the analytic silos? For example, if I had street corner cameras on all the corners and I were doing a facial recognition kind of application (we know those really aren't very reliable), wouldn't it be nice if I could fuse the information that I may have just seen Roger the bank robber on a street corner with the locations of all of my law enforcement personnel? Can I send a message immediately, via tweet, text message, or whatever other technology I have, out to that officer's location? We just saw this picture at this location a moment ago. Here's the picture from our database. Please investigate. Well, Streams does all of that in real time, fusing the many different kinds of streams of information.

I was just going to use that word, fusion, because you have context in real time: data in motion, as you guys say on your website. But fusing the data sounds easy, and it's really not. You've got other databases, other data sets, that you have to correlate for context in real time.
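To make that fusion idea concrete, here is a minimal Python sketch, not Streams/SPL code and not IBM's implementation; the data shapes, names, and 2 km range are invented for illustration. It joins a stream of face-match sightings against current officer locations and alerts the nearest officer in range:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def dispatch_alerts(sightings, officers, max_km=2.0):
    """Fuse two streams: for each face-match sighting, find the nearest
    officer and, if they are within range, emit an alert addressed to them."""
    alerts = []
    for s in sightings:
        nearest = min(officers,
                      key=lambda o: haversine_km(s["lat"], s["lon"],
                                                 o["lat"], o["lon"]))
        d = haversine_km(s["lat"], s["lon"], nearest["lat"], nearest["lon"])
        if d <= max_km:
            alerts.append({"officer": nearest["id"],
                           "suspect": s["suspect"],
                           "km": round(d, 2)})
    return alerts
```

In a real deployment both inputs would be live streams and the officer positions a continuously updated window, but the join has the same shape.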
I know how hard that is, being a computer scientist by training. Tell us how hard that is. Can you take us through the nuance there?

Well, to get to the more detailed nuance, that's why there were several years of research with the government to build this. IBM has actually spent over $100 million in engineering and R&D over the last 10 years to build it. We've made it much, much easier than it's ever been. There's a high-level language and visual software development, where you drag and drop: what are my sources of data? What do they look like? Are they structured? Is there some structure to it, such as a database, or a temperature sensor that has a sensor ID and maybe temperature and humidity and other pieces of information? Or is it unstructured information, such as a web camera? As it streams out video, what that really looks like is a camera ID, so you know which camera it is, fused with information like the location of that camera, and then just a blob of data that, frame by frame, is a series of images. And then you can do the analytics on the images to understand in real time what's there.

One of the things that we heard at the Amazon re:Invent conference was the Kinesis product that they had.

Yes.

Which is kind of a data warehouse. So similar kind of thinking that they're poking at. Can you take...?

What I'd tell you, John, is they're poking at it, but they're years behind. When you really look at Kinesis up front, all it really is, is ingest into an in-memory store. They'll promise to keep it for up to 24 hours, and then you read out of that to do some analytics. So you're not doing analytics on the same machine where you're storing it. It's really much more of an in-memory store as opposed to streaming analytics, which you could do with things like, well, at this show, WebSphere eXtreme Scale, an in-memory grid to cache things, which has been around for years.
Streams is really record by record, as the records come in. You might call them events; you might call them tuples, or messages if they're coming from a JMS messaging system. As each record comes in: what do I do with this? Do I want to filter it? Do I want to cleanse the data? Geospatial applications, you know, GPS.

Let's just drill down on that. Obviously Amazon is taking baby steps; they're not over the top trying to do an end run around IBM, because they're really talking about mostly log files, right? Mostly developer-focused. Give some more examples of some of the mainstream implementations. You mentioned the camera one; that's pretty real-time, and that's heavy-duty data, obviously, video. What other examples can you give in terms of big deployments that you guys have rolled out?

Well, the main and largest deployments that we've done are in the telephone industry. Every time you browse a website or make a phone call, a little bit of metadata gets spit off, which we've read a lot about in the last six months here in the United States with the NSA and others. Many of these large telephone companies, for example some of my customers in India, get about nine billion of those per day. Hundreds of thousands per second. They're all spit out by the switches in a binary format. You need to convert from binary into ASCII so that you can do some cleansing of the data, maybe enrich it, and then load it into a database so that it can be used for the monthly billing cycle and other things. But at the same time, as we're converting that data from binary, we stream off the records as they're being written to the database to do further analytics. Churn prediction: is the customer likely to leave? Fraud prediction: is the customer using this new phone service to make an exorbitant number of international calls, which is a good indicator of fraud?
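As a rough illustration of that record-by-record pipeline, here is a hedged Python sketch. The binary layout, field names, and fraud threshold are invented for the example; a real telco call detail record format is far more complex, and a real Streams application would express this in SPL:

```python
import struct

# Hypothetical fixed-width binary CDR layout: caller id (uint32),
# call duration in seconds (uint32), international-call flag (uint8).
CDR_FORMAT = ">IIB"

def decode_cdr(raw):
    """Convert one binary switch record into a readable (ASCII) dict."""
    caller, duration, intl = struct.unpack(CDR_FORMAT, raw)
    return {"caller": caller, "duration_s": duration, "international": bool(intl)}

def process_stream(raw_records, intl_threshold=5):
    """Record-by-record pipeline: decode, cleanse (drop zero-length calls),
    and flag callers whose running count of international calls exceeds a
    threshold, a deliberately crude stand-in for fraud prediction."""
    intl_counts = {}
    out, flagged = [], set()
    for raw in raw_records:
        rec = decode_cdr(raw)
        if rec["duration_s"] == 0:  # cleanse: junk record, discard
            continue
        if rec["international"]:
            n = intl_counts.get(rec["caller"], 0) + 1
            intl_counts[rec["caller"]] = n
            if n > intl_threshold:
                flagged.add(rec["caller"])
        out.append(rec)  # enriched records would be loaded for billing
    return out, flagged
```

The point of the shape: every record is touched once, in memory, as it flows by; the billing load and the fraud analytics hang off the same pass.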
I bought the service, I'm going to make a few days of international calls, and then I'll throw the phone away and disappear. So, a wide variety of things in the telephone industry, just to improve their processing of the data and add analytics in near real time. We say near real time for that because it's really every five minutes that a switch spits out a couple of thousand to 20,000 billing records.

The two examples we've talked about so far, government and national defense, and telephone, are very, very large industries. Are there applications for small industries as well?

Well, that's where we're trying to go with some of the cloud offerings that we'll be coming out with, such as on Bluemix and the IBM Cloud: let's allow small companies to use this on a shared infrastructure and build out some simple services. One of those that any small company could use, for example, is analysis of social media. We did a nice pilot, again sort of defense-related, but you can see how this applies to a small business as well. One of the large police departments in the United States said, we're going to be protecting a concert, and we got a black eye a few months ago: we did another concert, and we didn't know about fights breaking out until they showed up on YouTube the next day. So of course the press had a field day: where was our police force at the time? We said, well, we'll do two things for you. And we did the pilot in just a couple of weeks. Let's monitor Twitter, social media. Most people never turn off the location feature on their phones. So when you tweet, guess what shows up along with it? That metadata: where are you located? So it was very easy to throw away all of the tweets, and that's what you said, all of this junk that's out there that you really don't want. They only wanted tweets that were within a very small distance of the concert venue. Everything else was garbage, and they didn't care about it.
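The geofence filtering described here, combined with the keyword filter he describes next, might look something like this as a plain-Python sketch. The watch words, tweet fields, and half-kilometer radius are illustrative assumptions, not details from the actual pilot:

```python
import math

WATCH_WORDS = {"drugs", "fight", "drunk", "puking", "passed out"}  # illustrative list

def km_between(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in kilometers."""
    r = 6371.0
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def relevant_tweets(tweets, venue_lat, venue_lon, radius_km=0.5):
    """Keep only geotagged tweets near the venue that mention a watch word."""
    hits = []
    for t in tweets:
        if t.get("lat") is None:
            continue  # no location metadata: discard
        if km_between(t["lat"], t["lon"], venue_lat, venue_lon) > radius_km:
            continue  # outside the geofence: garbage for this use case
        text = t["text"].lower()
        if any(w in text for w in WATCH_WORDS):
            hits.append(t)
    return hits
```

Cheap filters like the geofence go first so the more expensive text matching only runs on the few records that survive.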
And then we filtered on special words that the police might be interested in, like drugs, fight, drunk, puking; passed out would be another good one that we'd need to know about. And then we gave only the pertinent information to the law enforcement officers, along with the location: where are they in the venue?

The second thing we did was just a very simple facial detection. We weren't trying to recognize anybody, but they had cameras trained on some of the fences around the venue. Obviously, unless you're eight feet tall, there's a certain height above which you really shouldn't see a face on a fence. So anytime you have a face on a fence, that's probably a pretty good indication that someone's trying to get into the venue who shouldn't be coming in that direction. So that was another one that we did. And again, those kinds of security things can apply to many companies. Even as a small business, are there things you want to secure?

Okay, so I've got to ask you the question about the customer. If I'm a customer, or a potential customer, of Streams, what am I thinking about? What's on my mind? I don't wake up in the morning and say, hey, I need Streams. I have other things popping through my head, like hitting deadlines, being more agile, pressure to do certain things in the business. What's the mindset of your potential customer for Streams?

So there are a couple of them. One of the biggest ones is simply: listen, I'm doing some analytics today, but it's batch, and I just need to do it faster. Or I need to do it continuously. One of our applications that needs to monitor continuously, because the world changes, is an iceberg flow monitoring application. ConocoPhillips has an oil field off the north shore of Alaska, and they have this small challenge: they need to know 48 hours in advance, is an iceberg going to hit my platform?
Is it going to be strong enough and big enough to cause an oil spill? And I can't be wrong in that decision, because I can't have an oil spill off the north shore of Alaska. A Gulf of Mexico type of situation there would be untenable for all future drilling up there, not to mention the environmental impact. But there's also the issue that if I'm wrong and move the platform, I lose four days' worth of drilling: two days to move it out of the way and two days to move it back. So they take satellite images, wind and weather information, how the current is flowing, underwater drone information, again underwater currents, so they can accurately predict where that iceberg will be. Usually that's the kind of problem: listen, I've got some real-world environment, and I've got a variety of different sensors giving me all kinds of information. It might be temperature, it might be wind and weather, it might be some type of video, or in this case the satellite images, where I'm analyzing the images to learn things and then trying to understand how that changes over time.

Now, what are the infrastructure requirements? It sounds like you would need a huge computing overhead to do this kind of sophisticated analysis.

Well, you know, it's like any other computing system that you have: how much data comes in, how much goes out, and how complex is the processing that we do inside?

So you can do it on any scale is what you're saying.

So you can do it on any scale. If you're just collecting Twitter feeds and you're happy with the 10% Twitter feed, that's not very much volume of information; obviously the messages are short. If you're only searching for a few keywords, you can very quickly find those keywords of interest to you. Obviously, video and image processing are much larger. So I have customers that range from very small, roughly four-core systems, to do these kinds of analytics.
My largest installation is one of several US government installations, in the neighborhood of 130 nodes, doing video and image analysis and cybersecurity types of things that detect hacking into networks, zero-day attack kinds of things.

You have a Quick Start Edition on the InfoSphere Streams site. What kind of experiments are you seeing people tackle when they're just sort of playing around?

When they're playing around, really, it's just sort of learn and compare: how does this compare to other streaming technologies, like the Kinesis that you mentioned, or S4, which has been around for several years, or Apache Storm? And then once they get it in, it just depends on the nature of what they're doing. Is it some Internet of Things application? I just want to filter, look for thresholds: is the temperature sensor going higher than it should be? Accelerometers: is somebody in a car going faster than perhaps they should be? Or they go explore geospatial kinds of applications. We've got a number; I haven't mentioned the healthcare kinds of applications that are installed in about 15 different hospitals around the world to do intensive care unit monitoring. In fact, tomorrow morning in Bob Picciano's keynote speech, he'll have Dr. Carolyn McGregor. Carolyn McGregor is an informatics professor from the University of Ontario Institute of Technology. She's been working with neonatal ICUs at SickKids Hospital and several other hospitals in Ontario and the United States, and in China as well.

And Bob will be on shortly this afternoon; we'll have to hit him up.

Oh good, you'll have to ask him about that, and we'll see how well prepped he is.

Is this to monitor sensor data to determine if a critical event is occurring?

Well, what we're doing is several things. For example, Emory Hospital is one of our reference customers. They've got Streams hooked up to about 100 beds.
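That kind of threshold filtering on an Internet of Things stream is conceptually simple. Here is a hypothetical Python sketch (the sensor types, field names, and limits are invented for the example; a Streams application would do this in SPL operators):

```python
def threshold_alerts(readings, limits):
    """Scan a stream of sensor readings record by record and emit an
    alert the moment a reading exceeds the limit for its sensor type."""
    alerts = []
    for r in readings:
        limit = limits.get(r["type"])
        if limit is not None and r["value"] > limit:
            alerts.append((r["sensor_id"], r["type"], r["value"]))
    return alerts
```

Because each reading is checked as it arrives, an alert can fire immediately rather than after the data has landed in a warehouse.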
So the ICU monitors: you're familiar with ICU monitors, and you see EKG waveforms. There are a number of different sets of analytics that we want on those. All hundred of those beds stream into just two servers; that's about all it takes. From each patient, they're getting about a thousand messages per second. That's the EKG waveform. And we do the analytics to detect the start and end of each heartbeat, so you know what the timing is. And then there are some analytics called heart rate variability. Everyone's heart rate changes all the time: where does each beat begin and end, and what's the heart rate? 72 beats per minute at rest, 80 or higher when you're excited, and so on. So you're looking for the variability in that. And if it's not varying, then you have a problem.

So I've got to ask you the hard question, which is the billion-dollar question everyone's asking: making it user-friendly is a really, really important piece of this equation. Because essentially Streams lets you do some really kick-ass instrumentation. Essentially, get everything ingested in. So great ingestion; love that, big fan of that. You've seen things like Nest being bought by Google for billions of dollars. You've seen the Internet of Things. Big emphasis on interaction design. How do you guys make it user-friendly? I'm sure there's a lot of policy-based stuff going on. What's the secret sauce? What's the vision? I mean, that must be a real challenge. Can you go into detail?

So, John, yeah, a couple of things. Today, you should understand, Streams is still for programmers. We've made it easy for programmers, with drag-and-drop visualization and lots of adapters and lots of analytics that you can bring in, to reuse existing analytics, whether they come from MATLAB or SPSS or R. But it's still programming to develop applications.
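The heartbeat analytics he outlines reduce to: find the beat times, compute the interbeat (RR) intervals, and measure their variability. A minimal sketch, assuming beat timestamps have already been detected from the waveform and using SDNN (standard deviation of RR intervals) as the variability measure; the 10 ms floor is an arbitrary illustration, not a clinical threshold:

```python
def rr_intervals(beat_times_ms):
    """Interbeat (RR) intervals, in ms, from detected beat start times."""
    return [b - a for a, b in zip(beat_times_ms, beat_times_ms[1:])]

def sdnn(intervals):
    """SDNN: standard deviation of the RR intervals, a common HRV measure."""
    mean = sum(intervals) / len(intervals)
    var = sum((x - mean) ** 2 for x in intervals) / len(intervals)
    return var ** 0.5

def low_variability(beat_times_ms, floor_ms=10.0):
    """Flag a patient whose heart rate barely varies at all."""
    return sdnn(rr_intervals(beat_times_ms)) < floor_ms
```

In the streaming setting this would run over a sliding window of recent beats per patient, recomputed as each new beat arrives.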
Some of the things that we're starting to do: we've got some pilots going on now with the ability to take that streaming data and stream it into Microsoft Excel. So now you've got an Excel-based end-user exploration tool: let's see those streaming records come in. I'm not quite sure what I'm looking for, but I know Excel pretty well, because I use it for everything; everybody does. So let me use some of the analytics in Excel to further extend the base applications. And after I've learned those, let me be able to see that and report back to the developers to update it. That's the short term. The longer term is: how do we allow the Excel users to build on those macros, to build on those visualizations, and generate the Streams code to build their own streaming applications?

You've outlined some fantastic examples here. As you build out this core technology and sort of let people find uses for it, are there any applications that customers have found that have surprised you?

You know, frankly, every one of them surprises me, because I'm amazed at the things we can do. Who would have thought about monitoring 100 people all at once, with a simple dashboard so a doctor can quickly identify, across the whole hospital, which critical patients to go see first? Or the initial work with Carolyn McGregor on the neonatal intensive care unit to detect illness 24 hours earlier?

Let me ask the question differently.

Okay, ask the question.

Someone who might be watching could say, well, I work at an accounting firm. We don't care about real time; it's just not part of our business. Is there some way they should be thinking about their business differently, that you would advise, that might realize the potential of something like InfoSphere Streams?

Sure. One of them is just that question we started with: what analytics are you doing today?
Is there value in getting results faster? If today you're bringing it all in and doing a weekly or quarterly review in accounting, or risk accounting, what's the right timeframe in which you should learn that you are exposed and should be thinking about taking some action? Is it simply the daily close? Would you like to start looking at intraday events to understand your risk tolerance? If you're an insurance company, are there things you can do with weather monitoring and with your customer base to better understand, even in real time, the weather patterns of what's coming? We've also done some pilots with people on hurricanes coming in: how does that impact the risk to oil refineries and other assets, and what's the stock price based on the risk of where that hurricane will hit?

Excellent examples. So, we are here live in Las Vegas. I want you to give the audience the final word in the segment. Why is this a moment in time for IBM that's going to be a game changer? Because, you know, IBM is on all the trend lines, all the fault lines, as we say, where the tectonic shifts are happening: Bluemix in the cloud, Streams there, and, as you mentioned, for developers there's still some more work to be done on the user interface side. Big deployments. But here at Impact, what is the big thing that's happening that you want to share with them?

So the biggest thing that you'll see, really, is the Internet of Things. We've done an awful lot of work with the MessageSight appliance people, where MessageSight gives you very fast, high-speed delivery of messages. But when you're dealing with millions of messages per second, from millions and millions of automobiles, millions and millions of sensors, sensors on houses to monitor your electric voltage and other things, you need something that can analyze all of that, monitor it in real time, and detect actions that should be taken.
So, the ability to take actionable insight, to understand what's going on in the real world continuously, and at the right moment in time make a decision and take an action that improves your business, improves your social standing, your social media presence, whatever it might be: that's what the capability is. And nobody can bring that together like IBM can. We've got the analytics from the acquisitions we've done over the years, with SPSS and Cognos. We've got the messaging capability. We've got Watson for the cognitive analytics, the entire foundation, and the ability to do it not only with my product, Streams, in real time, but in numerous batch ways with BigInsights, our Hadoop offering, and the database offerings that are all part of Watson Foundations, where Streams can really act as the five senses feeding into Watson's cognitive computing.

Very impressive. Certainly, I can say, looking at it, it's robust and comprehensive. But a final question, just to drill down on that: where are we in this build, grow, monetize cycle for the customer? The end game is they want to make their business more competitive and serve their customers, and you see the evolution: build it, grow it, and then you need to make money from it in services value. Are we still in the early build stages? Where would you peg it?

We are still in the early build stages in several early adopter industries. There are several advanced applications where they are monetizing it and saving money: in the hospital area, where they're saving lives; in, as I mentioned, energy and utilities, where they're doing a better job of monitoring infrastructure for repair kinds of actions. But it's really still the leading-edge customers in those industries.

Roger, thanks for coming on. Roger Rea with Info... Sphere... Streams. Got that right.

That was great, John.

Streams are rivers, data lakes are lakes. They use that in big data terms.
These are streams; I'm calling them gushers and Niagara Falls of data, if you will. But great stuff, really relevant. Data is the future. Congratulations. This is IBM Impact, and this is theCUBE, streaming the data here, live to you. This is theCUBE; we'll be right back after this short break.