From New York City, it's theCUBE, covering IBM Data Science for All, brought to you by IBM. Welcome back to Data Science for All. It's a whole new game, and it is a whole new game. Dave Vellante, John Walls here. We've got quite a distinguished panel here, David. So it is a new game for us. Well, we're in the game. I'm just happy to, you know, have a swing at the pitch. Let's see what we have here. Five distinguished members of our panel. It'll take me a minute to get through the introductions, but believe me, they're worth it. Jennifer Shin joins us. Jennifer's the founder of 8 Path Solutions, the director of Data Science at Comcast, and part of the faculty at UC Berkeley and NYU. Jennifer, nice to have you with us. We appreciate the time. Joe McKendrick, an analyst and contributor at Forbes and ZDNet. Joe, thank you for being here as well. Another ZDNetter, next to him, Dion Hinchcliffe, who is a vice president and principal analyst at Constellation Research and also contributes to ZDNet. Good to see you, sir. To the back row, but that doesn't mean anything about the quality of the participation here: Bob Hayes, with a killer Batman shirt on, by the way, which we'll get him to explain in just a little bit. He runs Business Over Broadway. And Joe Caserta, who is the founder of Caserta Concepts. Welcome to all of you. Thanks for taking the time to be with us. Jennifer, let me just begin with you. Obviously, as a practitioner, you're very involved in the industry. You're on the academic side as well. I mentioned Berkeley, NYU, steep experience. So I want you, with a foot in both worlds, to tell me about data science. I mean, where do we stand now from those two perspectives? How have we evolved to where we are? And how would you describe, I guess, the state of data science? Yeah, so I think that's a really interesting question.
I think there's a lot of change happening, in part because data science has now become much more established, both on the academic side as well as in industry. And so now you see some of the bigger problems coming out, right? People have managed to get data pipelines set up, but now there are these questions about models and accuracy and data integration. So that's the really cool stuff from the data science standpoint, where we get to get really into the details of the data. And I think on the academic side, you now see undergraduate programs, not just graduate programs, but undergraduate programs being developed. UC Berkeley just did a big initiative: they're going to offer data science to undergrads. And that's huge news for the university. So I think there's a lot of interest from the academic side to continue data science as a major, as a field. But I think in industry, one of the difficulties you're now having is, businesses are asking that question of ROI, right? What did I actually get in return for the investment in the initial years? So I think there's a lot of work to be done, and there's a lot of opportunity. And it's great because people now understand better what data science is, but I think data scientists have to take that seriously and really think about, how am I actually giving a return or adding value to the business? And there's a lot to be said there, is there not? I mean, just in terms of increasing the workforce, the acumen, the training that's required now, it's still a relatively new discipline. So is there a shortage issue, or is there just a great need? Is the opportunity there? I mean, how would you look at that? Well, I always think there's opportunity to be smart. If you can be smarter, you know, that's always better. It gives you an advantage in the workplace. It gives you an advantage in academia. The question is, can you actually do the work? The work's really hard, right?
You have to learn all these different disciplines. You have to be able to technically understand data. Then you have to understand it conceptually. You have to be able to model with it. You have to be able to explain it. There are a lot of aspects that you're not going to pick up overnight. So I think part of it is endurance. Like, are people going to feel motivated enough, and dedicate enough time, to get very good at that skill set? And also, of course, in terms of industry, will there be enough interest in the long term that there will be financial motivation for people to keep, you know, staying in the field, right? So I think there's definitely a lot of opportunity, but that's always been there. Like, I tell people, I think of myself as a scientist, and data science happens to be my day job, right? That's just the job title. But if you're a scientist and you work with data, you'll always want to work with data. I think that's just an inherent need. And it's kind of like a compulsion. You just kind of can't help yourself but dig a little bit deeper, ask the questions, right? You can't not think about it. So I think that will always exist. Whether or not it's an industry job in the way that we see it today, in, like, five years from now or 10 years from now, I think that's something that's up for debate. So all of you have watched the evolution of data and how it affects organizations for a number of years now. If you go back to the days when the data warehouse was king, we had a lot of promises about 360-degree views of the customer and how we were going to be more anticipatory and more responsive. And in many ways, the decision support systems and the data warehousing world didn't live up to those promises. They solved other problems, for sure. And so everybody was looking for big data to solve those problems. And they've begun to attack many of them. We talked earlier on theCUBE today about fraud detection.
It's gotten much, much better. Certainly retargeting of advertising has gotten better. But I wonder if you could comment, maybe starting with Joe, on the effect that data and data science have had on organizations, in terms of fulfilling that vision of a 360-degree view of the customer and anticipating customer needs. Sure. So data warehousing, I wouldn't say failed, but I think it was unfinished in order to achieve what we need done today. At the time, I think it did a pretty good job. I think it was the only place where we were able to collect data from all these different systems, have it in a single place for analytics. The big difference, I think, between data warehousing and data science is that data warehouses were primarily made with human beings as the consumers, so that people could look through some tool and analyze data manually. That really doesn't work anymore. There's just too much data to do that. So that's why we need to build a science around it, so that we can actually have machines doing the analytics for us. And I think that's the biggest stride in the evolution over the past couple of years, that now we're actually able to do that, right? If you go back to when data warehouses started, you had to be a deep technologist in order to be able to collect the data, write the programs to clean the data. But now, your average casual IT person can do that. Right now, I think we're back in data science, where you have to be a fairly sophisticated programmer, analyst, scientist, statistician, engineer in order to do what we need to do, in order to make machines actually understand the data. But I think in terms of the evolution, we're just at the forefront, right?
We're going to see over the next, not even years, within the next year, I think, a lot of new innovation, where the average person within business, and definitely the average person within IT, will be able to as easily ask, what are my sales going to be next year? as it is to ask, what were my sales last year? Where now it's a big deal, right? Now, in order to do that, you have to build some algorithms, you have to be a specialist in predictive analytics. And I think as the tools mature, as people using data mature, and as the technology ecosystem for data matures, it's just going to be easier and more accessible. So it's still too hard, right? That's something, Dion, you've written about and talked about. Today it is, yes. Yeah, no question about it. Well, and we see this, the citizen data scientist, and we talk about the democratization of data science, but really, the analytics and warehousing and all the tools we had before, they generated a lot of insights and views on the information, but they didn't really give us the science part. And that's, I think, what was missing: the forming of the hypothesis, the closing of the loop. We now have views of this data, but are we thinking about it strategically? Are we learning from it and then feeding that back into the process? And I think that's the big difference between data science and the analytics side. But just like Google made search available to everyone, not just people who had highly specialized indexers or crawlers, now we can have tools that make these capabilities available to anyone. And going back to what Joe said, I think the key thing is we now have tools that can look at all the data and ask all the questions, because we can't possibly do it all ourselves. Our organizations are increasingly awash in data, which is the lifeblood of our organizations, but we're not using it. This is the whole concept of dark data.
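Joe's example of asking "what are my sales going to be next year?" is worth grounding: today that question still means building a model. A minimal sketch of the simplest version, an ordinary least-squares trend line, is below; all of the yearly figures are invented purely for illustration, not taken from the discussion.

```python
# Minimal sketch: forecasting next year's sales with a linear trend.
# The yearly sales figures below are hypothetical, for illustration only.

def fit_trend(years, sales):
    """Ordinary least-squares fit of sales = a + b * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(sales) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, sales))
    var = sum((x - mean_x) ** 2 for x in years)
    b = cov / var            # slope: change in sales per year
    a = mean_y - b * mean_x  # intercept
    return a, b

years = [2013, 2014, 2015, 2016, 2017]
sales = [1.2, 1.5, 1.7, 2.0, 2.2]  # e.g., revenue in $M per year

a, b = fit_trend(years, sales)
next_year_forecast = a + b * 2018
print(round(next_year_forecast, 2))  # ≈ 2.47
```

Answering "what were my sales last year?" is just a lookup; answering "what will they be next year?" requires fitting and trusting a model like this one, which is the gap the panel says the tooling is closing.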
And so I think the promise of opening these tools up for everyone to be able to access those insights and activate them, I think that's where it's headed. Well, this is kind of where the T-shirt comes in, right? So Bob, if you would, you've got this Batman shirt on, we talked a little bit about it earlier, but it plays right into what Dion's talking about, about tools, and maybe, oh, I don't want to spoil it. You go ahead and tell me about it. All right, so Batman is a superhero, but he doesn't have any supernatural powers, right? He can't fly on his own, he can't become invisible on his own. But the thing is, he has a utility belt, and he has these tools he can use to help him solve problems. For example, he has the batarang when he's confronted with a building that he wants to get over, right? So he pulls it out and uses that. So as data professionals, we have all these tools now that these vendors are making. We have IBM SPSS, we have Data Science Experience, IBM Watson, that these data pros can now use as part of their utility belt to solve the problems that they're confronted with. So if you're confronted with, maybe, a churn problem, and you have somebody who has access to that data, they can input that into IBM Watson, ask a question, and it'll tell you what's the key driver of churn. So it's not that you have to be superhuman to be a data scientist, but these tools will help you solve certain problems and help your business go forward. Joe McKendrick, you have a comment? Does that make Watson the Batmobile in that analogy? I was just going to add that, of all the billionaires in the world today, none of them has decided to become Batman yet. It's very disappointing. Go ahead, Joe. And I just want to add some thoughts to our discussion about what happened with data warehousing. I think it's important to point out as well that data warehousing, as it existed, was fairly successful, but for larger companies, data warehousing is a very expensive proposition.
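Bob's churn example, feed the data in and ask for the key driver, boils down to ranking candidate variables by how strongly they relate to churn. The sketch below is a hedged illustration, not Watson's actual method: the customer fields and values are hypothetical, and it uses simple Pearson correlation as the ranking criterion.

```python
# Sketch of a "key driver of churn" analysis: rank candidate drivers by
# the strength of their correlation with churn. All data is hypothetical.

from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# One value per customer; churned: 1 = customer left, 0 = stayed.
features = {
    "support_calls": [0, 1, 5, 4, 0, 6, 1, 5],
    "tenure_months": [24, 36, 3, 6, 48, 2, 30, 5],
    "monthly_spend": [50, 60, 55, 45, 70, 52, 65, 48],
}
churned = [0, 0, 1, 1, 0, 1, 0, 1]

# Sort features by absolute correlation with churn, strongest first.
drivers = sorted(features,
                 key=lambda f: abs(pearson(features[f], churned)),
                 reverse=True)
print(drivers[0])  # strongest single-variable driver
```

In this toy data, support calls correlate most strongly with churn, so the analysis would report it as the top driver; real tools layer more sophisticated models on top of the same idea.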
It remains an expensive proposition. Something that's in the domain of the Fortune 500. But today's economy is based on a very entrepreneurial model. The Fortune 500s are out there, of course, that's ever shifting, but you have a lot of smaller companies, a lot of people with startups. You have people within divisions of larger companies that want to innovate and not be tied to the corporate balance sheet. They want to be able to innovate and experiment without having to go through the finance department. So there are all these open source tools available, there are cloud resources as well as open source tools, Hadoop, of course, being a prime example, where you can work with the data, experiment with the data, and practice data science at a very low cost. Joe, Dion mentioned the C-word, citizen data scientists, last year at the panel. We had a conversation about that, and the data scientists on the panel generally were like, stop, okay? We're not all of a sudden going to turn everybody into data scientists. However, what we want to do is get people thinking about data, more focused on data, becoming a data-driven organization. I mean, as a data scientist, I wonder if you could comment on that. Well, so I think the other side of that is, there are also many people who maybe didn't follow through with science because it's also expensive, right? A PhD takes a lot of time, and if you don't get funding, it's a lot of money. And for very little security, if you think about how hard it is to get a teaching job that's going to give you enough of a payoff to pay that back, right? The time that you took off, the investment that you made. So I think the other side of that is, by making data more accessible, you allow people who could have been great in science an opportunity to be great data scientists. And so I think, for me, that's where the opportunity of a citizen data scientist is.
I think in terms of democratizing data and making it available for everyone, I feel as though it's something similar to the way we didn't really know KPIs maybe 20 years ago, right? People didn't use them as readily, they didn't teach them in schools. I think maybe 10, 20 years from now, with some of the things that we're building today in data science, hopefully more people will understand how to use these tools, right? They'll have a better understanding of working with data and what that means. And just data literacy, right? Just being able to use these tools and be able to understand what data is saying, and what it's not saying. Which is the thing most people don't think about. But you can also say that data doesn't say anything, right? There's a lot of noise in it. There's too much noise to be able to say that there is a result. So I think that's the other side of it. So yeah, I guess for me, in terms of citizen data scientists, I think it's a great idea to have that, right? But at the same time, of course, like everyone kind of emphasized, you don't want everyone out there going, I can be a data scientist, without education, without statistics, without math, without an understanding of how to implement the process. I've seen a lot of companies implement the same sort of process from 10, 20 years ago, just on Hadoop instead of SQL, right? And it's very inefficient. And the only difference is that you can build more tables wrong than you could before. I guess that's an accomplishment, and for less. It kind of reminds me, I'm not a data scientist, but I did stay at a Holiday Inn Express last night, right? Yeah, and there's a little bit of pride that they used 2,000 computers to do it. Like, there's a little bit of pride about that, but of course, maybe not a great way to go. I think 20 years ago you couldn't do that, right? One computer was already an accomplishment to have as a resource.
So I think you have to think about the fact that if you're doing it wrong, you're just going to make that mistake bigger, which is also the other side of working with data. Sure, Bob? Yeah, I have a comment about that. I've never liked the term citizen data scientist, or citizen scientist. I mean, I get the point of it, and I think employees within companies can help in the data analytics problem by maybe being a data collector or something. But I would never have somebody become a scientist based on, like, a few classes he or she takes. It's like saying, oh, I'm going to be a citizen lawyer, so you can solve legal problems, or a citizen surgeon. You need training to be good at something. You can't just be good at something because you want to be. Joe, you wanted to say something too, on that. Since we're in New York City, I'd like to use the analogy of a real scientist versus a data scientist. So a real scientist requires tools, right? And the tools are not new, like microscopes and a laboratory and a clean room. And these tools have evolved over years and years. Since we're in New York, we could probably walk within a 10-block radius and buy any of those tools. It doesn't make us scientists because we use those tools. I think with data, making the tools evolve and become easier to use, like Bob was saying, doesn't make you a better data scientist. It just makes the data more accessible. We can go buy a microscope. We can go buy Hadoop. We can go buy any kind of tool in the data ecosystem, but it doesn't really make you a scientist. You know, I'm very involved in the NYU data science program and the Columbia data science program. These kids are brilliant. These kids are not someone who is just trying to run a day-to-day job in corporate America.
I think the people who are running the day-to-day jobs in corporate America are going to be the recipients of data science, just like people who take drugs, right, are the recipients of a smart scientist coming up with a formula that can help people. I think that, you know, we're going to make it easier to distribute the data science that can help people, with all of the new tools. But access to the data and the tools available doesn't really make you a better data scientist without, like Bob was saying, better training and education. I'm sorry, but how do you then, if it's not for everybody, but yet I'm the user at the end of the day at my company, and I've got these reams of data before me, how do you make it make better sense to me? So that's where machine learning comes in, and artificial intelligence, and all this stuff. I mean, so at the end of the day, Dion, how do you make it relevant and usable, actionable, to somebody who might not be as practiced as you would like? Well, I agree with Joe that many of us will be the recipients of data science. And just like you had to be a computer scientist at one point to develop programs for a computer, now we can get the programs; you don't need to be a computer scientist to get a lot of value out of our IT systems. The same thing is happening with data science. There's far more demand for data science than could ever be met by, you know, having an ivory tower filled with data scientists, which we need those guys too, don't get me wrong. But we need to productize it and make it available in packages such that it can be consumed. The outputs, and even some of the inputs, can be provided by mere mortals. And whether that's machine learning or artificial intelligence or bots that go off and run the hypotheses and select the algorithms, maybe with some human help, we have to productize it.
This is the concept of data science as a service, which is becoming a thing now. It's, you know, I need this capability at scale, I need it fast, and I need it cheap. The commoditization of data science is going to happen. So that goes back to what I was saying about the recipient of data science also being machines, right? Because I think the other thing that's happening now in the evolution of data is that the data is so tightly coupled. Back when we were talking about data warehousing, you had all of the business transactions, then you'd take the data out of those systems and put it in a warehouse for analysis, right? Maybe they'd make a decision to change that system at some point. Now the analytics platform and the business application are very tightly coupled. They become dependent upon one another. So people who are using the applications are now able to take advantage of the insights of data analytics and data science just through the app, which never really existed before. I have one comment on that. You were talking about how you get the end user more involved. Well, like we said earlier, data science is not easy, right? As an end user, I encourage you to take a stats course, just a basic stats course: understanding what the mean is, variability, regression analysis, just basic stuff, so you as an end user can glean more insight from the reports that you're given. If you go to France and don't know French, people can speak really slowly to you in French; you're still not going to get it. You need to understand the language of data to get value from the technology we have available to us. Incidentally, French is one of the languages that you have the option of learning if you're a mathematician; math PhDs are required to learn a second language, and French, with its deep mathematical tradition, is one of the languages you can actually learn.
So, anyway, tangent, but going back to the point: statistics courses, I definitely encourage them, because I teach statistics, and one of the things that I'm finding as I go through the process of teaching it is that I'm actually bringing in my experience, and by bringing in my experience, I'm kind of making the students think about the data differently, right? So, the other thing people don't think about is the fact that statisticians typically were expected to do just basic sorts of tasks, in the sense that their knowledge is specialized, right? But the data operations were: they ran a test on some data, looked at the results, and interpreted the results based on what they were taught in school. They didn't develop that model a lot of times; they just understood what the tests were saying, right, especially in the medical field. So think about terms like population, right, or census, which is when you have every single data point, versus a sample, which is a subset, right? It's a very different story now that we're collecting data faster than we used to. The census was built on the idea that you could collect information from everyone; it happens once every 10 years, right? You built that in, right? But nowadays, you hear about Facebook, for instance. I think they claimed earlier this year that their data was more accurate than the census data. So now there are these claims being made about which data source is more accurate, and I think the other side of this is that now statisticians are expected to know data in a different way than they were before. So it's not just data science that's changing as a field; I think the sciences that are using data are changing their fields as well. So is sampling dead? Well, no, because... Are you sure it shouldn't be? You get so much data. Well, if you're sampling wrong, yes. That's really the question.
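The census-versus-sample distinction, and the "if you're sampling wrong" caveat, can be shown directly: a simple random sample tracks the population mean closely, while a biased draw does not. The population below is synthetic, generated purely for illustration.

```python
# Sketch of census vs. sample: an unbiased sample estimates the
# population mean well; a biased one ("sampling wrong") does not.
# The population is synthetic, for illustration only.

import random

random.seed(42)

# The "census": every single data point.
population = [random.gauss(100, 15) for _ in range(100_000)]
pop_mean = sum(population) / len(population)

# Unbiased: a simple random sample of 1% of the population.
srs = random.sample(population, 1_000)
srs_mean = sum(srs) / len(srs)

# Biased: only observing the top of the distribution.
biased = sorted(population)[-1_000:]
biased_mean = sum(biased) / len(biased)

print(abs(srs_mean - pop_mean))     # small: sampling works
print(abs(biased_mean - pop_mean))  # large: sampling done wrong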
Okay, it's been said that the data doesn't lie, people do. Organizations are very political, and oftentimes, lies, damned lies, and statistics, as Benjamin Disraeli put it. Are you seeing a change in the way in which organizations are using data in the context of the politics? So some strong P&L manager gets data and crafts it in a way that he or she can advance their agenda, or they'll maybe attack a data set that probably should drive them in a different direction but might be antithetical to their agenda. We were talking about democratizing data; are you seeing that reduce the politics inside of organizations? So we've always used data to tell stories, and at the top level of an organization, that's what it's all about. And I still see, very much, that no matter how much data science or access to the truth through looking at the numbers there is, storytelling is still the political filter through which all that data passes, right? But with the advent of things like blockchain, more and more corporate records and corporate information are going to end up in these open and shared repositories, where there is no alternate truth. And still, it'll come back to whoever tells the best stories at the end of the day. So I still see that organizations are very political. We are seeing more open data, though. Open data initiatives are a big thing, both in government and in the private sector, and it is having an effect, but it's slow and steady. So that's what I see. Okay, go ahead.
I was just going to say as well, ultimately I think data-driven decision making is a great thing, and it's especially useful at the lower tiers of the organization, where you have the routine, day-to-day decisions that can be automated, and through machine learning and deep learning the algorithms can be improved on a constant basis. On the upper levels, that's why you pay executives the big bucks, to make these strategic decisions, and data can help them. But ultimately data, IT, technology alone will not create new markets, it will not drive new businesses; it's up to human beings to do that. Technology is the tool to help them make those decisions, but creating businesses, growing businesses, is very much a human activity, and that's something I don't see ever getting replaced. Technology may replace other parts of the organization, but not that part. Okay. I tend to be a foolish optimist when it comes to this stuff. You do. I do believe that data will make the world better. I do believe that data doesn't lie, people lie. And I'm already seeing trends in industries, all different industries, where conventional wisdom is starting to get trumped by analytics. You know, it's still up to the human being today to ignore the facts and go with what they think and with their gut, and sometimes they win, sometimes they lose, but generally, if they lose, the data will tell them that they should have gone the other way. I think as we start relying more on data and trusting data through artificial intelligence, as we start making our lives a little bit easier, as we start using smart cars for safety before the replacement of humans, as we start using data and analytics and data science really as the bumpers, right, instead of the vehicle, eventually we're going to start to trust it as the vehicle itself, and then it's going to make lying a little bit harder. Okay, so great, excellent. Optimism, I love it.
So I'm going to play devil's advocate here a little bit. There are a couple of elephant-in-the-room topics that I wanted to talk about a little bit. Here it comes. There was an article today in Wired called "Why AI Is Still Waiting for Its Ethics Transplant." And I'll just read a little segment from it. It says: new ethical frameworks for AI need to move beyond individual responsibility to hold powerful industrial, government, and military interests accountable as they design and employ AI. When tech giants build AI products, too often user consent, privacy, and transparency are overlooked in favor of frictionless functionality that supports profit-driven business models based on aggregate data profiles. This is from Kate Crawford and Meredith Whittaker, who founded AI Now. And they're calling for, sort of, almost clinical trials on AI, if I could use that analogy. Before you go to market, you've got to test the human impact, the social impact. Thoughts? And also have the ability for a human to intervene at some point in the process. This goes way back. Is everybody familiar with the name Stanislav Petrov? He's the Soviet officer who, back in 1983, was in the control room, I guess somewhere outside of Moscow, the control room which detected a nuclear missile attack against the Soviet Union coming out of the United States. Ordinarily, I think, if this had been an entirely AI-driven process, we wouldn't be sitting here right now talking about it. But this gentleman looked at what was going on on the screen, and I'm sure he was accountable to his authorities in the Soviet Union, he probably got in a lot of trouble for this, but he decided to ignore the signals, ignore the data coming from the Soviet satellites. And as it turned out, of course, he was right: the Soviet satellites were seeing glints of the sun, and they were interpreting those glints as missile launches.
And I think that's a great example of why, every situation of course doesn't mean the end of the world as it would have in this case, but it's a great example of why there needs to be a human component, an ability for human intervention at some point in the process. So, other thoughts? Organizations are driving AI hard for profit; the best minds of our generation are trying to figure out how to get people to click on ads. Sure they are. Jeff Hammerbacher is famous for saying it. You can use data for a lot of things: you can cure cancer, you can make customers click on more ads. It depends on what your goal is. But there are ethical considerations we need to think about. When we have data that has a racial bias against blacks and gives them higher prison sentences, or worse credit scores, and so forth, that has an impact on a broad group of people, and as a society we need to address that, and then as scientists we need to consider how we are going to fix that problem. Cathy O'Neil, in her book Weapons of Math Destruction, an excellent book, and I highly recommend the listeners read it, talks about these issues: whether algorithms have a widespread impact, whether they adversely impact a protected group, and I forget the last criterion, but we need to really think about these things as a people, as a country. I always think the idea of ethics is interesting, so I've had this conversation come up a lot of times talking to data scientists, and I think as a concept, as an idea, yes, you want things to be ethical. The question I always pose to them is, well, in the business setting, how are you actually going to do this?
Because I find the most difficult thing working as a data scientist is making the day-to-day decision of, when someone says, I don't like that number, how do you actually get around that? If that's the right data to be showing someone, if it's accurate, and the business decides, well, we don't like that number, many people feel pressured to then change the data, or change what the data shows. So I think it's about being able to educate people, being able to find ways to say what the data is saying, but not going past some line where it's a lie or it's unethical. Because you can also say what the data doesn't say. You don't always have to say what the data does say. You can leave it as: here's what we do know, but here's what we don't know. There's this don't-know part that many people will omit when they talk about data. So I think especially when it comes to things like AI, it's tricky, because I always tell people, I don't know why everyone thinks AI is going to be so amazing. I got started in this industry by fixing problems with computers that people didn't realize computers had. For instance, when you have a system, there are a lot of bugs; we all have bug reports that we've probably submitted. I mean, really, it's nowhere near the point where it's going to start dominating our lives and taking over all the jobs, because frankly, it's not that advanced. It's still run by people, still fixed by people, still managed by people. I think with ethics, a lot of it has to do with regulations, with what the law says. That's really going to determine what people are willing to do, because a lot of businesses, they want to make money. If there are no rules that say they can't do certain things to make money, then there's no restriction. The other thing to think about is, we as consumers, every day in our lives, we shouldn't separate the idea of data as a business, what we think of it as a business person, from our day-to-day consumer lives.
Meaning, yes, I work with data, and incidentally I also always opt out when my credit cards send me that information. They make you actually mail them, old-school snail mail, a document that says, okay, I don't want to be part of this data collection process, which I always do. It's a little bit more work, but I go through that step. Now, if more people did that, perhaps companies would feel more incentivized to pay you for your data, or give you more control over your data. Or at least, if a company is going to collect my information, I'd want there to be certain processes in place to ensure that it doesn't just get sold, right? For instance, if a startup gets acquired, what happens to the data they have on you? You agreed to give it to the startup, right? But what are the rules on that? So I think we have to really think about the ethics not just as someone who's going to implement something, but as consumers, about the control we have over our own data, because that's going to directly impact what businesses can do with our data. You mentioned data collection, so staying on that subject: all these great new capabilities that we have coming. We've talked about what's going to happen with media in the future, what 5G technology is going to do to mobile and these great bandwidth opportunities, and the internet of things, the internet of everywhere, and all these great inputs, right? Do we have an arms race? Are we keeping up with the capability to make sense of all the new data that's going to be coming in? How do those things square up? Because the potential is fantastic, right? But are we keeping up with the ability to make it make sense and put it to use, Joe? So I think data ingestion and data integration is probably one of the biggest challenges, especially as the world is starting to become more dependent on data, just because we're dependent on numbers, right?
We've come up with GAAP, right? Generally accepted accounting principles, which can be audited and proven true or false. I think in our lifetime we will see something similar for data: formal checks and balances on the data that we use, which can be audited. And getting back to what Dave was saying earlier, I personally would sooner trust a machine that was programmed to do the right thing than trust a politician or some leader who may have their own agenda. And the other thing about machines is that they're auditable, right? You can look at the code and see exactly what it's doing and how it's doing it. Human beings, not so much. So I think getting to the truth, even if the truth isn't the answer that we want, is a positive thing. It's something we can't do today that, once we start relying on machines, we'll be able to get to. Yeah, I was just going to add that we live in exponential times, and the challenge is that the way we're structured traditionally as organizations doesn't allow us to absorb advances exponentially. It's linear at best, right? So everyone talks about change management and how we're going to do digital transformation, and the evidence shows that the technology is forcing the leaders and the laggards apart. There are a few leading organizations that are eating the world, and they seem to somehow keep rolling out new things. I don't know how Amazon rolls out all this stuff: all this artificial intelligence, the IoT devices, Alexa, natural language processing. And that's just a fraction, the tip of what they're releasing. So it shows that some organizations have found the way, but most of the Fortune 500 from the year 2000 are gone already, right? The disruption is happening.
And so we have to find some way to adopt these new capabilities and deploy them effectively, or, you know, the writing is on the wall. So I spent a lot of time exploring this topic: how are we going to get there? And a lot of hard work from all of us is the short answer. I read a prediction that more data will be created this year than in the past five thousand years, right? And to mix in another statistic, we're currently analyzing less than 1% of the data that's out there. So, taking those numbers in, what you're all saying is that we're not keeping up. It's not even linear; that gap is just going to grow and grow and grow. So how do we close it? There's a guy out there named Chris Dancy. He's known as the human cyborg. He has 700 sensors all over his body, right? And his theory is that data is not new; having access to the data is new. We've always had a blood pressure. We've always had a sugar level. But we were never able to actually capture it in real time before. So now that we can capture and harness it, we can be smarter about it. So I think being able to use this information is really incredible. This is something that over our lifetime we've never had, and now we can do it, hence the big explosion in data. But how we use it and how it's governed, I think, is the challenge right now. It's kind of the Wild West out there right now. And without proper governance and without rigorous regulation, I think we are going to have some bumps in the road along the way. It is the new oil. The question is, how are we actually going to operationalize around it? To refine it. So, go ahead. So I will say the other side of it is: if you think about information, we always have the same amount of information, right? What we choose to record, however, is a different story.
Now, if you wanted to know things about the Olympics, but you decided to collect information every day for four years instead of just the Olympic year, yes, you'd have a lot of data, but did you need all of it? For that question about the Olympics, you don't need to collect data during years when there are no Olympics, right? Unless, of course, you're comparing relative to those years. But I think that's the other thing to think about. Just because you collect more data does not mean that data will produce more statistically significant results. It does not mean it'll improve your model. You could be collecting data about your shoe size while trying to get information about your hair. It really does depend on what you're trying to measure, what your goals are, and what the data is going to be used for, right? If you don't factor the real-world context into it, then yeah, you could collect an infinite amount of data, but you'll never process it, because you have no question to ask. You're not looking to model anything, right? There is no universal truth about everything, right? That just doesn't exist out there. I think she's spot on. It comes down to what kind of questions you're trying to ask of your data. You can have one given database that has 100 variables in it, right? And you can ask it five different questions, all valid questions, and that data may have the variables that'll tell you what's the best predictor of churn, or what's the best predictor of cancer treatment outcome. And if you can ask the right question of the data you have, then that'll give you some insight. Just data for data's sake, collecting it, that's just hype. Yeah, we have a lot of data, but it may not lead to anything if we don't ask it the right questions. Joe? I agree, but I just want to add one thing. This is where the science in data science comes in.
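To illustrate that point about asking different questions of the same database, here is a minimal sketch. All of the variable names and numbers are made up for the example; the point is only that the same table of variables yields a different "best predictor" depending on which outcome you ask about.

```python
# Hypothetical illustration: one dataset, two different questions.
# The same variables can have a different "best predictor" per question.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# One small made-up dataset with three candidate predictor variables.
data = {
    "support_calls": [0, 5, 1, 7, 2, 6],
    "tenure_months": [24, 3, 30, 2, 18, 6],
    "monthly_spend": [50, 45, 80, 40, 70, 60],
}
churn = [0, 1, 0, 1, 0, 1]    # question 1: who leaves?
upsell = [0, 0, 1, 0, 1, 1]   # question 2: who buys more?

for name, target in [("churn", churn), ("upsell", upsell)]:
    # Rank variables by absolute correlation with this question's outcome.
    best = max(data, key=lambda col: abs(pearson(data[col], target)))
    print(f"best single predictor of {name}: {best}")
# prints: best single predictor of churn: support_calls
#         best single predictor of upsell: monthly_spend
```

Same rows, same columns, two different answers: the data only becomes insight once a specific question is asked of it.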
Scientists often will look at data that's already been in existence for years: weather forecasts, weather data, climate change data, for example. They go back to data charts and so forth, going back centuries if that data is available, and they reformat it, reconfigure it, get new uses out of it. And the potential I see with all the data we're collecting is that it may not be of use to us today, because we haven't thought of ways to use it, but maybe 10, 20, even 100 years from now, someone's going to think of a way to leverage the data, to look at it in new ways and come up with new ideas. That's just my thought on the science aspect. Knowing what you know about data science, why did Facebook miss Russia and the fake news trend? They came out and admitted it: we missed it. Why? Is it because they were focused elsewhere? Could they have solved that problem? We'll see how that plays out. It's what you said, which is, are you asking the right questions? And if you're not looking for that problem in exactly the way that it occurred, you might not be able to find it. I thought the ads were paid for in rubles. Shouldn't that have been your first clue that something's amiss? Red flags, so to speak. Yes. I mean, Bitcoin, maybe that could have hidden it. Right, right, exactly. I would think, too, that what happened last year was actually the end of an age of optimism. You know, I'll bring up the Soviet Union again. It collapsed back in 1990, 1991. Russia was reborn, and I think there was a general feeling of optimism from the 90s through the 2000s that Russia was being well integrated into the world economy, as other nations all over the globe, on all continents, were being integrated into the global economy thanks to technology, with technology lifting entire continents out of poverty and ensuring more connectedness for people across Africa, India, Asia. Those economies are very different countries than they were 20 years ago, and that extended to Russia as well.
Russia is part of the global economy. We're able to communicate; it's a global network. And I think as a result, we kind of overlooked the dark side that occurred. Joe? Again, I'm a foolish optimist here, but I think the question shouldn't be whether we missed it; it's whether we have the ability now to catch it. And I think without data science, without machine learning, without being able to train the machines to look for patterns that involve corruption or result in corruption, we'd be out of luck. But now we have those tools, and now, hopefully, optimistically, by the next election, we'll be able to detect these things before they become public. Well, it was a loaded question, because my premise was that Facebook had the ability and the tools and the knowledge and the data science expertise if in fact they wanted to solve that problem, but they were focused on other problems: how do I get people to click on ads? Right, they had the ability to train the machines, but they were giving the machines the wrong training. Looked under the wrong rock. That's right. It's easy to play armchair quarterback. Another topic I wanted to ask the panel about is IBM Watson. You guys spend time in the valley, I spend time in the valley. People in the valley pooh-pooh Watson: oh, Google, Facebook, Amazon, they've got the best AI. And some of that's fair criticism. Watson's a heavy lift, very services-oriented; you've got to apply it in a very focused way. At the same time, Google's trying to get you to click on ads, as is Facebook, and Amazon's trying to get you to buy stuff, while IBM's trying to solve cancer. Your thoughts on that juxtaposition of the different AI suppliers, and there may be others? Oh, nobody wants to touch this one, come on. I told you, elephant-in-the-room questions. Well, I mean, you're looking at two very different types of organizations.
One, which has really spent decades applying technology to business, and these other companies, which are primarily aimed at the consumer, right? But I think when we talk about things like IBM Watson, you're looking at a very different type of solution. You used to be able to buy IT, and once you installed it, you could pretty much get it to work and store your records or do whatever it is you needed it to do. But these types of tools, like Watson, actually try to learn your business, and they need to spend time doing that, watching the data and having their models tuned, so you don't get the results right away. And I think that's been kind of the challenge that organizations like IBM have had: this is a different type of technology solution, one that has to actually learn first before it can provide value. And so you have organizations like IBM that are much better at applying technology to business, and then they have the further hurdle of trying to apply these tools that work in very different ways. So it's education, too, on the side of the buyer. I'd have to say that I think there are plenty of businesses out there also trying to solve very significant, meaningful problems with Microsoft AI and Google AI and IBM Watson. I think it's not really the tool that matters, like we were saying earlier. A fool with a tool is still a fool, regardless of who the manufacturer of that tool is. And I think having a thoughtful, intelligent, trained, educated data scientist using any of these tools can be equally effective. So do you not see core AI competence, and I left out Microsoft, as a strategic advantage for these companies? Is it going to be so ubiquitous and available that virtually anybody can apply it? Or is all the investment in AI R&D going to pay off for these guys? Yeah, so I think there are different levels of AI, right?
So there's AI where you can actually improve the model. I remember when Watson first came out, IBM invited me to a private, you know, sort of presentation. And my question was, okay, so when do I get to access the corpus? The corpus being sort of the foundation of NLP, which is natural language processing. It's what you use, almost like a dictionary: how you're actually going to measure things or look things up. And they said, oh, you can't. What do you mean I can't? It's like, we do that. So you're telling me, as a data scientist, you're expecting me to rely on the fact that you did it better than me, and I should just trust that. I think over the years after that, IBM started opening it up and offering different ways of accessing the corpus and working with that data. But I remember at their first Watson hackathon, there were only two corpora available: it was either travel or medicine. There was no other foundational data available. So I think one of the difficulties was that IBM, being a little bit more on the forefront of it, kind of had that burden of having to develop these systems and learn the hard way that if you don't have the right models, and you don't have the right data, and you don't have the right access, that's going to be a huge limiter. I think with things like medical information, that's extremely difficult data to start with, partly because anything that you do find or don't find has a significant impact, right? If I'm looking at things like what people clicked on, the impact of using that data wrong is minimal: you might lose some money. If you do that with healthcare data, with medical data, people may die. This is a much more difficult data set to start with.
So I think from a scientific standpoint, it's great to have any information about a new technology, a new process. The nice thing is that IBM has obviously invested in it and collected information. I think the difficulty there, though, is that just because you have data, you can't solve everything. And I feel like, as someone who works in technology, in general, when you appeal to developers, you try not to market to them. And Watson is very heavily marketed, which tends to turn off people who are more on the technical side, because I think they don't like it when it's gimmicky, in part because they do the opposite of that, right? They're always trying to build up the technical components. They don't like it when you're trying to convince them that you're selling them something; you can just give them the specs and let them look at it. So it could be something as simple as communication. But I do think it is valuable to have had a company that was at least on the forefront of that and tried to do it, so that we can actually learn from what IBM has learned from this process. But you're an optimist. All right, good. Just one more thought: I want to see how Alexa or Siri would do on Jeopardy. Yeah. All right. Going to go around, final thought, I'll give you a second. Let's just think about your 12-month crystal ball, in terms of either challenges that need to be met in the near term or opportunities you think will be realized. A 12-to-18-month horizon. And so, Bob, you've got the microphone, so I'll let you lead off, and we'll just go around. I think a big challenge for business, for society, is getting people educated on data and analytics. There's a study that was just released, I think last month, by ServiceNow or some vendor, maybe Qlik, and they found that only 17% of the employees in Europe have the ability to use data in their job.
Think about that. 17%, less than 20%. These people don't have the ability to understand or use data technology to improve their work performance. That says a lot about the state we're in today. And that's Europe; it's probably a lot worse in the United States. So that's a big challenge, I think: educate the masses. Joe. I think we probably have a better chance of improving the technology than of training people. I think using data needs to be iPhone-easy, which means a lot of innovation is in the years to come. And I do think that the keyboard is going to be a thing of the past for the average user. I think we're going to start using voice a lot more. I think augmented reality is going to become a real reality, where we can hold our phone in front of an object and it will have an overlay of prices and where it's available. Or if it's a person, I think within an organization we will see holding a camera up to someone and being able to see what their salary is, what sales they did last year, some key performance indicators. I hope that we are beyond the days of everyone around the world walking around like this, and we start actually becoming more social as human beings through augmented reality. I think it has to happen. You know, I think we're going through kind of foolish times at the moment in order to get to the greater good, and I think the greater good is using technology in a very, very smart way. Which means, sorry to contradict, or maybe it's good to counterpoint: I don't think you need a PhD in SQL to use data. That's 1990. I think as we evolve, it's going to become easier for the average person, which means people like the brain trust here need to get smarter and start innovating. I think the innovation around data is really at the tip of the iceberg; we're going to see a lot more of it in years to come.
Dion, why don't you go ahead, and we'll come down the line here. Yeah, so I think over that timeframe, two things are likely to happen. One is somebody's going to crack the consumerization of machine learning and AI, such that it really is available to the masses and we can do much more advanced things than we can today. We see that industries tend to reach an inflection point and then there's an explosion. No one's quite cracked the code on how to really bring this to everyone, but somebody will, and that could happen in that timeframe. The other thing that I think almost has to happen is that the forces for openness, open data, data sharing, open data initiatives, things like blockchain, are going to run headlong into data protection, data privacy, and the customer privacy laws and regulations that have to come down to protect us. Because the industry is not doing it, the government is stepping in, and it's going to re-silo a lot of our data. It's going to make it recede and make it less accessible, making data science harder for a lot of the most meaningful types of activities. Patient data, for example, is already all locked down. We could do so much more with it, but health startups are really constrained in what they can do because they can't access the data. We can't even access our own health care records. So I think that's the challenge: we have to have that battle next to be able to take the next step. Well, I see, with the growth of data, a lot of it coming through the IoT, the Internet of Things. I think that's a big source, and we're going to see a lot of innovation. You know, new types of Ubers or Airbnbs. Uber's so 2013, though, right? We're going to see new companies with new ideas, new innovations, looking at the ways this data, all this big data coming in from the IoT, can be leveraged. You know, there are some examples out there.
There's a company, for example, that is outfitting tools with sensors, so industrial sites can track where the tools are at any given time. This is otherwise an expensive, time-consuming process: constantly losing tools, trying to locate tools, assessing whether the tools being applied to the production line are the right tools with the right torques, and so forth. With the sensors embedded in these tools, it's now possible to be more efficient, and there are going to be innovations like that, maybe small startup-type things or smaller innovations, but we're going to see a lot of new ideas and new types of approaches to handling all this data. There are going to be new business ideas, and the next Uber, we may be hearing about it a year from now, whatever that may be, and that Uber is going to be applying data, probably IoT-type data, in some new, innovative way. Jennifer, final word? Yeah, so I think with data, it's interesting, right? For one thing, I think one of the things that's made data more available, and just made people more open to the idea, has been startups. But what's interesting is that a lot of startups have been acquired, and a lot of people who had startups that got acquired now work at bigger corporations, which is the way it was maybe 10 years ago. Back then, data wasn't available and open; companies kept it very proprietary, and you had to sign NDAs. It was within the last 10 years that open source and all of those initiatives became much more popular, a much more acceptable way to look at data. So I think what I'm interested in seeing is what people do within the corporate environment, because they have resources, they have funding that startups don't have, and they have backing. Presumably, if you were acquired, you went in at a higher title in the corporate structure than you would have if you had started there.
So I think you have an opportunity where people who have done innovative things, and have proven they can build really cool stuff, can now be in that corporate environment. Part of it is going to be whether they can really adjust to the corporate landscape, the politics of it, the bureaucracy; I think every organization has that. Being able to navigate that is a difficult thing, in part because it's a human skill set, right? It's a people skill, a soft skill. It's not the same thing as just being able to code something and sell it. So it's going to really come down to people. I think if people can figure out, for instance, what people want to buy, what people think, in general, that's where the money comes from. You make money because someone gave you money. So if you can find a way to look at data, or even look at technology, and understand what people are doing and aren't doing, what they're happy about and unhappy about, there's always opportunity in collecting data in that way and leveraging it to build cooler things and offer things that haven't been thought of yet. So it's a very interesting time, I think, with the corporate resources available; if you can do that, who knows what we'll have in a year. I'll add one. The majority of companies in the S&P 500 have a market cap that's greater than their revenue, and the reason is that they have IP related to data that's of value. But most of those companies, the vast majority of companies, don't have any way to measure the value of that data. There's no GAAP accounting standard for it. So they don't understand the value contribution of their data in terms of how it helps them monetize. Not the data itself, necessarily, but how it contributes to the monetization of the company. I think that's a big gap.
If you don't understand the value of the data, that means you don't understand how to refine it, if data is the new oil, and how to protect it and secure it. And so that, to me, is a big gap that needs to get closed before we can actually say we live in a data-driven world. So you're saying, I've got an asset, but I don't know if it's worth this much or that much. They're missing out on a great opportunity. So I devolved to what I know best. Great discussion. Really, really enjoyed it; the time has flown by. Joe, if you get that augmented reality thing working on the salary overlay, point it at that guy, not this guy, okay? It's much more impressive over there. But Joe, thank you. Dion, Joe and Jennifer, and Batman, a.k.a. Bob Hayes. We appreciate it. Thanks for being with us. Thank you, guys. Really enjoyed the conversation. Thank you. And a reminder: coming up at the top of the hour, six o'clock Eastern time, on ibmgo.com, featuring the live keynote, which is being set up just about 50 feet from us right now. Nate Silver is one of the headliners there, Rob Thomas as well; John Thomas we had on earlier on theCUBE. But there's a panel discussion as well, coming up at six o'clock on ibmgo.com, six to seven-fifteen, so be sure to join that live stream. That's it from theCUBE. We certainly appreciate the time. Glad to have you along here in New York. And until next time, take care.