Live from the Mandalay Bay Convention Center in Las Vegas, Nevada, it's The Cube at IBM Insight 2014. Here are your hosts, John Furrier and Dave Vellante.
Okay, welcome back everyone. We are here live in Las Vegas for IBM Insight. I'm John Furrier with Dave Vellante. This is The Cube, our program where we go out to the events and extract the signal from the noise, and it's exciting here at IBM Insight. We're in the social lounge, doing some social media, creating some new relationships. Our next guest is Kirk Borne, professor of astrophysics and computational science at George Mason University. Welcome to The Cube.
Thank you, John.
Great to have you. One of the things we love about The Cube is we're like surgeons. We want to get the data out of your brain and share it with the world out there. Astrophysics, computational science. I mean, to me, the first thing that pops in my head is social data and cloud computing, because the game's changed. I want to get your perspective just at a very high level to kick things off. Having computation on demand is a wonderful thing, and then you've got this whole metaphor of the Internet of Things, and data is flying everywhere, unstructured. It's kind of like a star cluster. You know, the Milky Way is somewhere out there. What's the gravity of all this data?
I love your metaphors there. We have astronomical growth in data, and I got hooked into this very early on because of the growth in truly astronomical data that I was working with at NASA for many years.
I mean, really think about that. For the first time in modern business ever, you can measure everything.
Correct.
So there's no excuses. So you need the computation. How should a manager think about how to approach their future infrastructure, their future app development, their future engagement, if you will? What's the mindset? How should they approach it? It's not your grandfather's data warehousing or any other process.
Well, I think you hit the nail on the head there. Historically we think about moving the data to a centralized location: putting all the data in one place and having that one unified data model and that one view of the data. But with the world as it is now, with data in so many places, in so many formats and at such volume, we really have to move the computation to the data. Move the algorithm to the data. I think it's a great idea. Move the algorithm to the data. I think that's the new paradigm for infrastructure. It's about how you make the analyst's job easier. Requiring that analyst to pull all data from all sources and do all this munging and wrangling before they can start isn't going to work. You have to look at the workflow as a computational workflow that's moving to the data sources.
We had Bill Inmon on at MIT, father of the data warehouse, right? He was really encouragingly open to new models. One would think maybe he'd be hanging on, but he's not at all. He basically said, look, you need to include these in your new world and, as you say, bring the code to the data and not the data to the code. And in the surveys that we do, we find that two of the major tool sets that people bring to their big data initiatives are data integration tools and the existing data warehouse. So help us square that circle. Is that the model that you see, and how do the old and the new fit together?
Well, I have this idea in my head of a trilogy called The Lord of the Things.
I want to see the main characters in this one.
Well, one of the chapters of the trilogy is the twin powers of data.
Twin powers.
And the twin powers are, in the language you're using, the data warehouse and also the real-time operational data flow. And so when I say empower the analysts to move the computation to the data, I'm thinking a lot more about, well, not a lot more.
I'm thinking in some sense about the real-time data stream. The analysts can't pull the real-time data stream onto their desktop. They have to let the data stream flow where it goes, and they move the algorithm there. But in an operational environment, where you have already designed the workflow and designed what I call the sentinels, the algorithms that can identify customer events or real-time events, that can operate in a structured data environment like a data warehouse. So the data warehouse, remember, as it always has been, is that place where you have that well-defined data model, that well-defined set of data that enables your business decision-making. And once the analysts have discovered the paths to insight, they can encode that in an algorithm stream, the sentinels of the data. So I always say that with data we're going from sensors to sentinels to sense. We have the sensors collecting, and the sentinels, the algorithms, identifying when something great happens that we may try to make sense of.
So as a practitioner, what does that mean for how you apply resources between, in my language, the traditional data warehouse and the new? Are you baselining your spend in the data warehouse and spending more on the new? Or maybe, for a dollar that you might spend here, you only have to spend 30 cents on the new stuff, but the volume is so much higher, as you were talking about before you came on. How do you see that shaking out?
Well, I'm not going to begin to pretend to tell a business how to appropriate the dollar that they have to spend on this, but they're going to have to look at where they really derive value and innovation and return on investment, or as I like to say, return on innovation. And if it's more on one side of the house, that is, the real-time stream versus the warehouse operational environment, so be it. That has to be decided in each case.
An example I heard of just recently is a very major Fortune 100 company, which I won't name right here right now, whose business is something completely not data when you think about it. I mean, they're dealing with customers, and you think about that face-to-face customer engagement. That's what they're about. And yet when you talk to them, they say, we're no longer that kind of a company, we're a big data company. So they're a big data company that owns properties, basically, is what they said, and the properties are where they engage face-to-face with customers. So think about that: the company is redefining itself as a big data company. They're actually making the decision to put more investment into this. So it's no longer, how do I appropriate this dollar? I now have $2 where I used to have one, because from senior executive management all the way down through the C-suite, the corporate suite, people have bought into the fact that data is their future.
Yeah, and a lot of people tell us the best way to get an ROI is to lower the denominator. They talk about reduction on investment. And I guess to follow up, do you see that, and do you also see an accelerated investment in the new stuff?
Well, I think the accelerated investment will return even more. You're right, there are two ways to achieve that, and one is to make the denominator smaller. But if you make the denominator so small that you invest a penny and get two pennies back, well, that's really not going to look good on Wall Street. Now there's this whole idea of uplift modeling. Some businesses historically have said, if we get a 1% return on some kind of campaign, that's fantastic news. Now businesses are seeing 73%, and in one case 700%, return on investment. So why not invest more than $1 in the denominator?
On a large base.
Yeah, a much bigger base, because now you're talking about explosive growth from that type of approach.
Kirk, I want to get your thoughts on something. I was having a chat at the reception here at the social lounge with another PhD professor, from Turkey. We were talking about some papers he wrote two years ago, and they're popular now. So I want to get your perspective on a couple of things. One is this lag effect on the academic side, with some really good bodies of work done just a few years ago, and maybe going back even a decade. Network theory, by the way, and signal theory are translating well into this computational graph space, if you will, around these new databases. What, from old to recent, is really working from a paradigm standpoint and is mainstream right now? And what are some key things that you see happening right now in the business tech theater that are super exciting, that people should focus in on?
Well, I think the lag is a really significant thing to think about, as you're saying. I first got very interested in this field primarily through data mining, as an application of machine learning algorithms to large data sets in astronomy, over 15 years ago. I started going to some conferences on data mining and machine learning just to learn more about it. And these conferences had hundreds, if not tens of hundreds, of talks at each of them on new algorithms, and this was 15 years ago. Those are all published algorithms, and there are a dozen such conferences a year, and if you multiply that by 10 or 20 years you're talking about an astronomical amount of stuff in the research literature. But has much of that seen the light of day? I don't know.
What's happening now that you're seeing that's super exciting?
So what I see happening is that the really powerful algorithms that have lagged, so to speak, are now being adopted in very small and mid-sized companies, where before only a larger company would maybe take the risk of investing there. One of the areas, of course, is machine learning, which grew out of the artificial intelligence world. I always tell my students that machine learning is just a set of algorithms. If you apply machine learning to data, it's called data mining. If you apply it to machines, it's called robotics. It's the same set of algorithms. It's the brains underneath the application, right? And if you apply it to business decisions, it's called operational analytics or operations research. And operations research is a really great area, where convex optimization, which has huge numbers of books and a large, if you will astronomical, collection of research in that mathematical field, is now seeing the light of day because it feeds right into prescriptive analytics, which is not just predictive but prescriptive, that kind of nuance, right?
So let's get to that. I want to double-click on this notion of machine learning, this fabric of machine learning and how it powers applications. We love to talk to Jeff Jonas because we get intoxicated by our talks about this; it's really exciting. But Dave and I have found in our research that most customers think they should build systems around their existing data space. I call it data space, name space, but data sets. They know what they have. They build around that. But now new data sets are coming in. So they need to build infrastructure every time a new data set comes in. So they've got to be agile. I also want to bring in a comment Inhi Cho Suh made at the TED and IBM event in San Francisco last month, where she talked about this notion. She didn't really tease it out. It might have been premature: active data.
How does active data play in? Because whether it's robotics, operational analytics or whatever application machine learning is in, you've got to be looking at a space that you're observing, to use Jeff Jonas' term, that observation space. That's a key active piece that could feed the machines to learn, right? You're only as good as what you read or learn. So how does that all play out? Are you seeing changes in that specific area with regard to data? What kinds of data?
Well, what I see is the diversification of the data sources more than anything. A lot of times you talk with people who say, we've always done big data, and I sort of react negatively to that kind of comment, but I understand what they're saying: they've always been a data-driven business, but maybe it was just one kind of data. It might have been just the quarterly sales data. So we talk about descriptive analytics, which is looking in the rear-view mirror at what happened in the past, and then predictive analytics, one of the hottest things in this field now, which is about predicting from what happened in the past what will happen. From there you can move to prescriptive analytics, where you say, given all the possibilities of what could happen, what's the most optimal thing that could happen, and how can I influence things to make that happen, based upon what I've seen happening in the past? But the leap forward beyond that now is cognitive analytics, and that's where people are looking at the full set of data. So that's what I mean by the diversification of data. Of course we're looking at social media, voice of the customer and all kinds of things like that.
What role do you see the humans playing in all this? Because as you go from descriptive to predictive to cognitive, it gets more complex and faster. What's the role of humans? You know the little bromide, humans are the last mile. Is it true?
Are the humans becoming less important or more important? What's your perspective on that?
Well, I'm glad you brought up that last mile statement, because I always say the first mile challenge in big data is all these diverse data sources I mentioned. The last mile challenge is to derive actionable insight from it. That is the human in the loop actually being able to take some kind of action based upon that stream. And that is really a big challenge. I see humans in the loop there in a lot of ways. One, of course, is the one who vets the final decision, right? Because if you end up with a list of possible actions to take, someone has to make the decision. If an algorithm spits out a list of 10 things with the probability that each will have some kind of success, and the difference between the probabilities is microscopic, you can still create the rank-ordered list, but there's really no fundamental difference among them, so a human has to look at that.
So I'm afraid of lies, damn lies and statistics. What do you see, and what do you advise for organizations?
I'm not sure how this will work out, but I have this idea of breadcrumbs, which other people have talked about: you see how people use the data, you see how people access the data, you see what kinds of pieces of data lead to good insights and good decisions, and all through that workflow things are tagged. Think about tag management systems in web analytics: there are tags that track your business users' and business analysts' use of data, and at the end of the day you say, here's the data stream and here's the workflow that really led to some powerful outcome, and you try to formalize that within the business.
And that can be automated, presumably, and you say, okay, here's where you're spending your time. Is that really where you want to be spending your time to optimize your business?
So in this field, in my personal view, there's got to be a lot of fast fail; that is, you try a lot of things and try to find the one that works best. Don't build something out, spend two years building it and then discover that it doesn't work. Do a lot of fast fail, and then at the end of that stream you say, here are the paths through our data that led to success, and let's force ourselves to follow those.
Well, Kirk, to your point about "we're data driven", and I know what you mean by that, because everybody says we've always been data driven, but are they really? I feel like it's more than just data. It's processes, it's mindset, you've mentioned fast fail, it's willingness to try different things, it's maybe how you approach infrastructure, your whole vertical stack within your industry. What are the other components beyond data that you see as success instruments, or levers that people can pull and knobs they can turn?
Well, I would say some of that is the soft stuff, that is, culture, business culture. People always say culture eats strategy for lunch, or breakfast, or whichever that metaphor is, and I firmly believe that. I won't name anything, but I've worked at a number of places over my career, and there are places that are very open to change and innovation, and other places that haven't been. So you need to have that corporate culture where, as you say, data driven is not just about the data; it's about the decision, the mindset that we're going to use evidence-based decision making in our business. So creating data products in an environment where that works for people is really fundamental to me. Those data products might be not just your morning daily report; it might actually be almost like a visualization of the hotspots of your business as determined by the data.
And storytelling seems to be a big part. We see with Tableau, for instance, the creative side of the use of the data,
not only at the app developer piece, which is giving the app developer front-line access because now they're closer to the outcome, so that's one, and then the business user. But we have a question from the crowd here, a question for Kirk: how are regular business line people, not PhDs, going to understand the astronomical quantities of data? It seems hard for the average Joe line guy making daily decisions. Human factors.
Well, that's one of the big areas of research, I would have to say: the human factors research to make this work. You mentioned Tableau already. There are companies out there like that who are bringing the storytelling of the data to the user, who's not going to be enmeshed in mathematical algorithms on a day-to-day basis. They want to be able to see what's happening and be able to tell the story of what's happening, and to have tools in their hands that are more semantically oriented. By semantically oriented I mean that the way the data are displayed and presented is in the context of their business, so it has all the contextual richness that a person needs to make a decision. And that's how I think about cognitive analytics: cognition in humans is when you make the decision based upon your entire context. I know the right thing to do because I see what's in the environment around me, and I know to go through this door and not that door, or something like that. The same should be happening in the analytics space: it helps people use their natural ability to identify a pattern or a trend, and say, yeah, that's the one that is signaling a decision for me, and it makes that as human friendly as possible.
Have you seen advances on the academic side, with research and also with some of the students, around creativity? This is something that we always try to tease out, because it's not well reported out there, but we see it and we've been talking about it: there's almost a new level of creativity because, like I
said, everyone's connected now, and the researchers love data. They're usually data geeks, right? So they love to munch the data, or wrangle the data, eat the data, play with the data, party with the data. Has that exploded on the creativity side? Has it enabled new lines of thinking? Can you share any examples?
You just named everything that I live for. Storytelling with data, the creativity and curiosity aspect, is what drives me and gets me up every morning. How I see that in my own immediate work environment is this: you named my title at the university, professor of astrophysics and computational science, but what I do is teach data science, even though my background is in those things. Among these data science students I have students who are doing medical research, who are doing financial research, who are analyzing text records of near misses of aircraft in the national airspace, people who are doing time series analysis, all kinds of things that are happening in fields that I personally know little about. But I can communicate with my students through what I call transdisciplinary data science: it transcends the disciplines. So all of a sudden people are having all these creative ideas, and you're not stuck in your own little box trying to find someone to talk to about your idea. You can now talk with people who use this common language of data science, and that actually triggers even more creativity, because now you hear the perspective of someone who's outside your field but you're still able to communicate in this common data science language.
Can you share, because this is so cutting edge, I mean it's really awesome. I have a daughter who's going to college, and my son's in college, and I try not to get them too focused on putting a stake in the ground in terms of career. But I've got to get your perspective. Let's just say that Dave and I were draft-picking data scientists. We're the general managers of the team and we want to identify some great prospects. What makes
that tech athlete? I don't want to be too pigeonholing in terms of specifics, but if we were evaluating the candidates in a combine workout, students if you will, what makes the killer data scientist? I mean, there are different roles, I understand that, but if we were sitting here draft-picking, what would we look for?
It's really funny you should say that, because I follow my own collegiate football team very closely, and, I won't name names but people can figure this out, they had a really super athletic quarterback. He came out of high school really highly touted, and yet the guy made poor decisions on the field, and it was really sort of a disaster. So if you just look at some measurable attributes, the more objective ones, sometimes those don't work so well. What really matters, and now going back to data science, is the ability to problem solve, to have that insatiable curiosity to find out what the answer to this question is, even being able to ask good questions, and also the communication skills. I talk about the three C's of data scientists as the three things I look for: communication, creativity and curiosity. If you don't have those three, then you're already batting down in the order, so to speak. In my classes at the university, people ask me, do you teach Hadoop, do you teach Java, do you teach this and that? And I say I don't teach any programming language in my courses at all, except in the professional course, where I teach Matlab just to get students familiar with it. But I teach the concepts, I teach the algorithms, I teach the underlying techniques and methods and the decision making. Why do I choose this algorithm? What are the different data types?
I have numeric data, I have unstructured data; what algorithm works best with this? Because programming languages come and go, and programming environments come and go, if they're going to have a life in this field they're going to have to know the concepts and how to do good problem solving, attacking a problem conceptually, as opposed to, oh, I can write a piece of code that will solve that problem, without really understanding why they're trying to solve the problem. So I'm looking more for that leadership on the field, the decision making that the one particular quarterback I mentioned was lacking, despite the fact that the guy was probably the most athletic quarterback in the country. It didn't work out too well.
So we've got a break here; I really appreciate you coming on. On poor decision making: Tom Brady's got the X factor, Joe Montana, we all know. Colin Kaepernick of the Niners was not well touted and he's just got the X factor, making things happen on the run. So I've got to ask you a final question here. We always joke about influence. It's something that we play with a lot in social data, and a lot of people in the social digital world are saying, oh, he's influential. No, he's just the loudest. The Milky Way has gravity, where the star clusters form; stars come and go, but there's always a core inner circle. Have you done anything around, or do you have any thoughts on, what influence means? A mostly long-tailed distribution has different targets and different communities. The same metaphor seems to apply: network theory, distributed networks, astrophysics, applied to interactions. What are your thoughts on influence?
I guess influence is, that's a good question. I'm not really sure I've thought too deeply about it. For me, influence comes first from the personal passion of the individual. If you feel passionate about what you're talking about, people around you will feel that passion and will gather around you. Maybe that's the gravity. I mean, there's been some discussion
in recent years about who the top big data influencers on Twitter are, and I've been in that conversation, and it's not the people with the most followers. That sort of surprises some people, because they work really hard to get a lot of followers and then they're not in that conversation of top influencers. I think it comes right down to passion. I always tell people the biggest and best compliment you can ever pay me is to say that I'm an enthusiastic person, because, yeah, that's how I feel, and if you notice that about me, I've made an impact there. I've influenced you in some way, so that maybe you're going to listen to me or pay attention to me. So that influence comes from people of passion, and we've seen some people like that at this IBM Insight already this week. Some of the speakers, like Jeff Jonas and folks like that, you see the passion in them and you just want to follow them and be influenced by them.
And it's that semantic alignment you were mentioning, contextual. So it's really back to the old internet stuff, behavioral, contextual data.
Yeah, well, of course, PageRank. If you think about PageRank, it's not just how many links you have going out or how many coming in; it's how many other highly ranked things point to you, and vice versa, right? So when you're interacting among a set of peers who are like-minded, that just spreads, and that becomes your center of gravity in that Milky Way of data science.
Well, we consider you a very influential person. You have great subject matter expertise and you do some great work. Links for people to find you, crowd chats with you: what can you share for contact information?
Well, my Twitter feed is my life right now. I just tweeted yesterday when I arrived in Vegas that what happens in Vegas stays in my Twitter timeline, so you can find out there what I'm talking about. I've done a lot of blogging on a lot of different sites, but just in the last week I decided, hey, I need to have my own center of
focus. There's not much there yet, maybe only five or six blogs, but that's after only being in existence for four days.
Let us know how we can help. We certainly want to promote your mojo, because you've got great work going on. And again, the gravity is everything: you create a little cluster, get some collaboration going, and good ideas will spread. It's all about the collaboration, virtually and physically. So we appreciate you coming on The Cube. This is The Cube. We're here live in Las Vegas, in the social lounge. This is a special presentation from The Cube and SiliconANGLE. I'm John Furrier with Dave Vellante. We'll be right back after this short break.