Welcome back to HPE Discover 2021, theCUBE's virtual coverage, continuous coverage of HPE's annual customer event. My name is Dave Vellante and we're going to dive into the intersection of high performance computing, data and AI with Dr. Eng Lim Goh, who is the Senior Vice President and CTO for AI at Hewlett Packard Enterprise. Dr. Goh, great to see you again. Welcome back to theCUBE. Hey, hello Dave, great to talk to you again. You might remember last year we talked a lot about swarm intelligence and how AI is evolving. Of course, you hosted the day two keynote here at Discover and you talked about thriving in the age of insights and how to craft a data-centric strategy, and you addressed some of the biggest problems that I think organizations face with data. Data is plentiful but insights, they're harder to come by, and you really dug into some great examples in retail banking and medicine and healthcare and media. But stepping back a little bit to zoom out on Discover '21, what do you make of the event so far and some of your big takeaways? Well, you started with an insightful question, right? Data is everywhere but we lack the insight. That's a big part of the reason, the main reason, why Antonio on day one focused on and talked about the fact that we are now in the age of insight and how to thrive in this new age. What I then did in the day two keynote, following Antonio, was to talk about the challenges that we need to overcome in order to thrive in this new age. So maybe we could talk a little bit about some of your takeaways; specifically I'm interested in some of the barriers to achieving insights. Customers are drowning in data. What do you hear from customers? What were your takeaways from some of the ones you talked about today? Oh, very pertinent question, Dave. You know, there are two challenges I spoke about, right, that we need to overcome in order to thrive in this new age. The first one is the current challenge. And that current challenge, as stated, is the barriers to insight when we are awash with data. So that's the statement, right? How do we overcome those barriers? What are the barriers to insight when we are awash in data? In the day two keynote, I spoke about three main things, three main areas that we see from customers. The first barrier is that with many of our customers, data is siloed, right? You know, like in a big corporation, you've got data siloed by sales, finance, engineering, manufacturing, supply chain and so on. And there's a major effort ongoing in many corporations to build a federation layer above all those silos, so that when you build applications above it, they have access to all the different silos of data, and you get better, more intelligent applications built. So that was the first barrier we spoke about, you know, barriers to insight when we are awash with data. The second barrier we see amongst our customers is that data is raw and dispersed when it is stored, and it's tough to get value out of it, right? And in that case, I used the example of the May 6, 2010 event where the stock market dropped a trillion dollars in tens of minutes. You know, those of us who are financially attuned know about this incident. But this is not the only incident, there are many of them out there. And for that particular May 6 event, it took a long time to get insight, months. For months, we had no insight as to what happened or why it happened, right?
And there were many other incidents like this, and the regulators were looking for that one rule that could mitigate many of these incidents. One of our customers decided to take the hard road. They went after the tough data, right? Because the data is raw and dispersed. So they went into all the different feeds of financial transaction information, took the tough road and analyzed that data, which took a long time to assemble, and they discovered that there was quote stuffing, right? That people were sending a lot of trades in and then cancelling them almost immediately, to manipulate the market. And why didn't we see it immediately? Well, the reason is that the processed reports that everybody sees had a rule in there that said trades of less than 100 shares don't need to be reported there. And so what people did was send a lot of sub-100-share trades to fly under the radar to do this manipulation. So here is the second barrier, right? Data can be raw and dispersed. Sometimes you just have to take the hard road to get insight. And this is one great example. And then the last barrier has to do with the fact that sometimes, when you start a project to get answers and insight, you realize that all this data is around you but you don't seem to find the right data to get what you need. You don't seem to get the right ones, yeah? Here we have three quick examples of customers. One was a great example, right? Where they were trying to build a language translator, a machine language translator between two languages. Now, to do that, they needed hundreds of millions of word pairs, one language compared with the corresponding words in the other, hundreds of millions of them. They said, well, where am I going to get all these word pairs? Someone creative thought of a willing source, and a huge source: it was the United Nations. You see, so sometimes you think you don't have the right data with you, but there might be another source, and a willing one, that could give you that data, right? The second one has to do with sometimes having to generate that data, an interesting one. We had an autonomous car customer that collects all this data from their cars, right? Massive amounts of data, lots of sensors collecting lots of data, but sometimes they don't have the data they need even after collection. For example, they may have collected data with the car driving in fine weather, and collected the car driving on the highway in rain and also in snow, but never had the opportunity to collect the car driving in hail, because that's a rare occurrence. So instead of waiting for a time when the car could drive in hail, they built a simulation, starting from the data collected in snow and simulating hail. So these are some of the examples where we have customers working to overcome barriers, right? There are barriers associated with the fact that data is siloed, so they federated it. Barriers associated with data that's tough to get at, so they just took the hard road, right? And sometimes, thirdly, you just have to be creative to get the right data you need. Wow, I tell you, I have about a hundred questions based on what you just said. And that's a great example, the flash crash. In fact, Michael Lewis wrote about this in his book, Flash Boys, and essentially, right, it was high-frequency traders trying to front-run the market, sending in these small trades to sort of front-run it. And they chalked it up to a glitch. Like you said, for months, nobody really knew what it was.
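To make the quote-stuffing pattern concrete, here is a minimal sketch of the kind of filter an analyst could run over a raw order feed: keep only sub-100-share orders that were cancelled almost immediately. The column names, thresholds and pandas layout are assumptions for illustration, not the actual pipeline the customer built.

```python
# Illustrative sketch only: scans a raw order feed for possible "quote stuffing" --
# bursts of small orders that are cancelled almost immediately. The column names,
# thresholds, and data layout are assumptions for this example.
import pandas as pd

def flag_quote_stuffing(orders: pd.DataFrame,
                        max_shares: int = 100,
                        max_lifetime_ms: float = 50.0) -> pd.DataFrame:
    """Return orders under `max_shares` that were cancelled within `max_lifetime_ms`."""
    small = orders[orders["shares"] < max_shares]
    cancelled = small[small["status"] == "cancelled"]
    lifetime_ms = (cancelled["cancel_time"] - cancelled["submit_time"]).dt.total_seconds() * 1000
    return cancelled[lifetime_ms < max_lifetime_ms]

# Tiny synthetic feed for demonstration
orders = pd.DataFrame({
    "shares": [50, 500, 80],
    "status": ["cancelled", "filled", "cancelled"],
    "submit_time": pd.to_datetime(["2010-05-06 14:42:00.000"] * 3),
    "cancel_time": pd.to_datetime(["2010-05-06 14:42:00.010",
                                   "2010-05-06 14:45:00.000",
                                   "2010-05-06 14:42:01.500"]),
})
print(flag_quote_stuffing(orders))  # only the 50-share order cancelled after 10 ms is flagged
```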
So technology got us into this problem. I guess my question is, can technology help us get out of the problem? And that maybe is where AI fits in. Yes, yes. In fact, a lot of analytics work went into going back to the raw data, which is highly dispersed across different sources, right? Assembling it to see if you can find a material trend, right? You can see lots of trends, right? Like, you know, when humans look at things, we tend to see patterns in clouds, right? So sometimes you need to apply statistical analysis and math to be sure that what the model is seeing is real, right? And that required work. That's one area. The second area is, you know, there are times when you just need to go through that tough approach to find the answer. Now, the issue that comes to mind is that humans put in the rules to decide what goes into a report that everybody sees. And in this case, that was before the change in the rules. By the way, after the discovery, the authorities changed the rules so that trades of any size have to be reported. Right. Yeah, right. But before that, as I said earlier, the rule was that trades under 100 shares need not be reported. So sometimes you just have to understand that reports were designed by humans, and for understandable reasons. I mean, for various reasons they probably didn't want to put everything in there, so that people could still read the report in a reasonable amount of time. But we need to understand that the rules behind the reports we read were put in by humans. And as such, there are times we just need to go back to the raw data. I want to ask you... Even though it's going to be tough, yeah. So I want to ask you a question about AI. It's obviously in your title and it's something you know a lot about. And I'm going to make a statement; you tell me if it's on point or off point. It seems that most of the AI going on in the enterprise is modeling, data science applied to, you know, troves of data. But there's also a lot of AI going on in consumer, whether it's, you know, fingerprint technology or facial recognition or natural language processing. So, a two-part question. Will the consumer market, as it has so often, sort of inform the enterprise? That's the first part. And then, will there be a shift from sort of modeling, if you will, to more, you mentioned autonomous vehicles, more AI inferencing in real time, especially at the edge? Help us understand that better. Yeah, this is a great question, right? There are three stages, just to simplify. I mean, you know, it's probably more sophisticated than that, but let's simplify. There are three stages, right, to building an AI system that can ultimately make a prediction, right? Or assist you in decision making, to have an outcome. So you start with the data, massive amounts of data, and you have to decide what to feed the machine with. You feed the machine with this massive chunk of data and the machine starts to evolve a model based on all the data it is seeing. It starts to evolve, right? To the point that, using a test set of data that you have kept aside separately, data you know the answers for, you test the model after you've trained it with all that data, to see whether its prediction accuracy is high enough. And once you are satisfied with it, you then deploy the model to make decisions, and that's the inference, right? So a lot of it depends on what we are focusing on.
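As a concrete illustration of the three stages Dr. Goh describes, here is a minimal sketch: train a model on prepared data, check its accuracy against a test set kept aside, and only then use it for inference on new data. scikit-learn, the dataset and the 0.9 accuracy bar are assumptions chosen purely for illustration.

```python
# A compact sketch of the three stages described above: train, validate against a
# held-out test set, then deploy the model for inference on new data.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Stage 1: feed the machine with training data and let it evolve a model.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier().fit(X_train, y_train)

# Stage 2: test the trained model on data it has never seen, kept aside on purpose.
accuracy = accuracy_score(y_test, model.predict(X_test))

# Stage 3: only if the accuracy is high enough, deploy the model for inference.
if accuracy >= 0.9:
    new_observation = X_test[:1]            # stands in for data arriving at the edge
    prediction = model.predict(new_observation)
    print(f"accuracy={accuracy:.2f}, prediction={prediction}")
```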
In data science, are we working hard on assembling the right data to feed the machine with? That's the data preparation and organization work. And then after that, you build your models. You have to pick the right models for the decisions and predictions you want them to make. You pick the right models and then you start feeding the data into them. Sometimes you pick one model and the prediction isn't that robust. It is good, but it is not consistent, right? So what you do is try another model. Sometimes you just keep trying different models until you get the right kind, one that gives you good, robust decision making and prediction, as in the sketch below. After that, if it tests well, you will then take that model and deploy it at the edge. And then at the edge it's essentially just looking at new data, applying it to the model that you have trained, and that model will give you a prediction or a decision, right? So it is these three stages, yeah? But more and more, your question reminds me, people are thinking, as the edge becomes more and more powerful, can you also do learning at the edge? Right. That's the reason why we spoke about swarm learning the last time, learning at the edge as a swarm, right? Because individually they may not have enough power to do so, but as a swarm they may. Is that learning from the edge or learning at the edge? In other words, is it... Yes. Yeah, you understand my question, yeah. That's a great question, right? So the quick answer is learning at the edge, right? And also from the edge, but the main goal, right, the goal is to learn at the edge so that you don't have to move the data that the edge sees back to the cloud or the core to do the learning. Because that would be one of the main reasons why you want to learn at the edge, right? So that you don't have to send all that data back from all the different edge devices and assemble it at the cloud side to do the learning. With swarm learning, you can keep the data at the edge and learn at that point, yeah. And then maybe only selectively send it. The autonomous vehicle example you gave is great. Because maybe they're only persisting the data that matters, say inclement weather when a deer runs across the front of the car. And then they send that smaller data set back, and maybe that's where the modeling is done, but the rest can be done at the edge. It's a new world that's coming. Let me ask you a question. Is there a limit to what data should be collected and how it should be collected? That's a great question again, yeah. Wow, today is full of these insightful questions. That actually touches on the second challenge, right? How do we, you know, thrive in this new age of insight? The second challenge is our future challenge, right? What do we do for our future? And there, the statement we make is that we have to focus on collecting data strategically for the future of our enterprise. And within that, I talk about what to collect, right? When to organize it as you collect it, and then where your data will be going forward as you collect it. So what, when and where. For what data to collect, that was the question you asked. It's a question that different industries have to ask themselves, because it will vary, right? You used the autonomous car example. Let me use that.
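Picking up the model-selection point from above, here is a small sketch of that trial-and-error loop: fit several candidate models and keep the one whose cross-validated accuracy is both high and consistent. The candidate list and the "robustness" scoring rule are illustrative assumptions.

```python
# Illustrative sketch of the model-selection loop: try several candidate models
# and keep the one that is both accurate and consistent (low variance across folds).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(),
    "gradient_boosting": GradientBoostingClassifier(),
}

best_name, best_model, best_score = None, None, -1.0
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    # "Robust" here means high mean accuracy penalised by fold-to-fold variance.
    robust_score = scores.mean() - scores.std()
    if robust_score > best_score:
        best_name, best_model, best_score = name, model, robust_score

print(f"selected {best_name} with robust score {best_score:.3f}")
best_model.fit(X, y)   # retrain the winner on all the data before deployment
```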
And we have this customer collecting massive amounts of data, you know, we're talking about 10 petabytes a day from a fleet of their cars. And these are not production autonomous cars, right? These are training autonomous cars, collecting data so they can train and eventually deploy commercial cars, right? So these data collection cars, the fleet of them, collect 10 petabytes a day. And then when it came to us building a storage system, you know, to store all of that data, they realized they cannot afford to store all of it. Now here comes the dilemma, right? After I've spent so much effort building all these cars and sensors and collecting data, I now have to decide what to delete. That's a dilemma, right? Now, in working with them on this process of trimming down what they collected, I'm constantly reminded of the '60s and '70s. In the '60s and '70s, we called a large part of our DNA junk DNA. Today we realize that a large part of what we called junk has function, has valuable function. They are not genes, but they regulate the function of genes, you know? So what was junk yesterday could be valuable today, yeah? Or what's junk today could be valuable tomorrow, right? So there's this tension going on, right? Between deciding that you cannot afford to store everything you can get your hands on, and on the other hand, you know, worrying that you've ignored the wrong ones, right? You can see this tension in our customers, right? And it depends on the industry, right? In healthcare, they say, I have no choice. I want it all, right? One very insightful point brought up by one healthcare provider that really touched me was: of course we care a lot about the people we are caring for, right? But we also care about the people we are not caring for. How do we find them, right? And therefore they did not just need to collect the data that they have from their patients, they also needed to reach out, right, to outside data, so that they can figure out who they are not caring for, right? So they want it all. So I asked them, what do you do about funding? If you want it all, they say, they have no choice but to figure out a way to fund it. And perhaps monetization of what they have now is the way to come around and fund it. Of course, they also come back to us, rightfully, that, you know, we then have to work out a way to help them build that system, you know? So that's healthcare, right? And if you go to other industries like banking, they say they can afford to keep it all, but they are regulated; same as healthcare, they are regulated as to privacy and the like. So many examples, different industries having different needs and different approaches to what they collect, but there is this constant tension between perhaps deciding not to fund storing all that you can store, right? But on the other hand, you know, if you decide you can't afford it and choose not to store some of it, some of it may become highly valuable in the future, right? You worry. Well, we can make some assumptions about the future, can't we? I mean, we know there's going to be a lot more data than we've ever seen before. We know that. We know, well, notwithstanding supply constraints on things like NAND, we know the price of storage is going to continue to decline.
We also know, and not a lot of people are really talking about this, but the processing power, everybody says Moore's Law is dead. Okay, it's waning, but the processing power, when you combine the CPUs and NPUs and GPUs and accelerators and so forth, actually is increasing. And so when you think about these use cases at the edge, you're going to have much more processing power, you're going to have cheaper storage, and it's going to be less expensive processing. And so as an AI practitioner, what can you do with that? So the amount of data that's going to come in is going to way exceed our drop in storage costs and our increase in compute power. So what's the answer? Knowing that, the answer must account for the fact that even with the drop in price and the increase in bandwidth, the data will overwhelm it. It will overwhelm 5G, given the amount, 55 billion devices out there collecting. So the answer must be that there needs to be a balance, because you may not be able to afford to bring all that data from the edge, from 55 billion devices, back to a bunch of central cores. Firstly, bandwidth: even with 5G, it will still be too expensive given the number of devices out there. And even with storage costs dropping, it will still be too expensive to try and store it all. So the answer must be, at least to mitigate the problem, to leave a lot of the data out there and only send back the pertinent data, as you said before. But then if you did that, how are we going to do machine learning at the core and the cloud side if you don't have all the data? You want rich data to train with, right? Sometimes you want a mix of the positive type data and the negative type data so you can train the machine in a more balanced way. So the answer must eventually be, as we move forward with this huge number of devices out at the edge, to do machine learning at the edge. Today, we don't have enough power, right? The edge typically is characterized by lower energy capability and therefore lower compute power. But soon, even with lower energy, they will be able to do more, with compute power improving in energy efficiency, right? So, learning at the edge: today we do inference at the edge. We train the model, deploy it, and you do inference at the edge. That's what we do today. But more and more, I believe, given the massive amount of data at the edge, you have to start doing machine learning at the edge. And when one device doesn't have enough power, then you aggregate multiple devices' compute power into a swarm and learn as a swarm. Wow, interesting. So now, of course, if I were a fly on the wall in an HPE board meeting, I'd say, okay, HPE is a leading provider of compute, how do you take advantage of that? I mean, I know it's the future, but you must be thinking about that and participating in those markets. I know today you have Edgeline and other products, but it seems to me that it's not the general-purpose computing that we've known in the past; it's a new type of specialized computing. How are you thinking about participating in that opportunity for your customers? The world will have to have a balance, right? Today the default, well, the more common mode, is to collect the data from the edge and train at some centralized location, or a number of centralized locations. Going forward, given the proliferation of edge devices, we'll need a balance, we need both. We need capability on the cloud side, right? And it has to be hybrid.
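Two of the ideas above lend themselves to small sketches. First, the swarm-style learning Dr. Goh describes: each edge device trains on its own local data, and only the model parameters, never the raw data, are shared and averaged. This plain NumPy parameter averaging is an illustration of the concept only, not HPE's Swarm Learning framework itself.

```python
# Minimal sketch of swarm-style learning: each edge device trains on its own
# local data and only model parameters -- never the raw data -- are shared and
# averaged across devices.
import numpy as np

def local_train(weights, X, y, lr=0.1, epochs=20):
    """One device's training: a few gradient steps of linear regression on local data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
global_w = np.zeros(2)

# Each "device" holds its own private slice of data that never leaves it.
devices = []
for _ in range(5):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    devices.append((X, y))

for _ in range(10):
    # Every device starts from the shared model and trains on local data only.
    local_weights = [local_train(global_w, X, y) for X, y in devices]
    # Only the parameters travel; the swarm merges them by simple averaging.
    global_w = np.mean(local_weights, axis=0)

print("learned weights:", global_w)   # should approach [2.0, -1.0]
```

Second, the "only send back the pertinent data" idea: the deployed model runs inference at the edge, and only observations it is unsure about are queued for upload so the core can retrain on them. The 0.8 confidence threshold and the in-memory upload queue are assumptions for illustration.

```python
# Illustrative sketch: inference at the edge, with only low-confidence samples
# queued for upload back to the core.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the centrally trained model deployed to the edge device.
X, y = load_iris(return_X_y=True)
deployed_model = RandomForestClassifier(random_state=0).fit(X, y)

def handle_at_edge(sample, confidence_threshold=0.8):
    """Run inference locally; return (prediction, send_back_flag)."""
    proba = deployed_model.predict_proba(sample.reshape(1, -1))[0]
    prediction = int(np.argmax(proba))
    # Low-confidence samples are the "pertinent" ones worth shipping to the core.
    send_back = proba.max() < confidence_threshold
    return prediction, send_back

upload_queue = []            # only this small subset ever leaves the device
for sample in X:             # stands in for the stream of new edge observations
    pred, send_back = handle_at_edge(sample)
    if send_back:
        upload_queue.append(sample)

print(f"kept at edge: {len(X) - len(upload_queue)}, sent back: {len(upload_queue)}")
```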
And then we need capability on the edge side. We need to build systems that on one hand are edge-adapted, right? Meaning environmentally adapted, because the edge is different; a lot of times they are out there on the outside, so they need to be packaging-adapted and also power-adapted, right? Because typically many of these devices are battery powered, so you have to build systems that adapt to that. But at the same time, they must not be custom. That's my belief. They must use standard processors and standard operating systems so that they can run a rich set of applications. So yes, it's also insightful that Antonio announced in 2018, for the next four years from 2018, right, $4 billion invested to strengthen our edge portfolio, our edge product lines, our edge solutions. Dr. Goh, I could go on for hours with you. You're such a great guest. Let's close. What are you most excited about in the future of, certainly HPE, but the industry in general? Yeah, I think the excitement is the customers, right? The diversity of customers, and the diversity in the way they approach their different problems of data strategy. So the excitement is around data strategy, right? You know, the statement made was so profound, right? Antonio said we are in the age of insight, powered by data. That's the first line, right? The line that comes after that is: as such, we are becoming more and more data-centric, with data the currency. Now, the next step is even more profound. That is, you know, we are going as far as saying that data should not be treated as a cost anymore, right? But instead as an investment in a new asset class called data, that we value on our balance sheet. This is a step change in thinking, right? It's going to change the way we look at data, the way we value it. So that's the statement. This is the exciting thing, because for me, as CTO for AI, right, a machine is only as intelligent as the data you feed it with. Data is the source of the machine learning to be intelligent. So that's why, when people start to value data, right, and say that it is an investment when we collect it, it is very positive for AI, because an AI system gets intelligent, gets more intelligence, because it has huge amounts of data and a diversity of data. So it would be great if the community values data. Well, you certainly see it in the valuations of many companies these days. And I think increasingly you see it on the income statement, you know, data products and people monetizing data services, and maybe eventually you'll see it on the balance sheet. I know Doug Laney, when he was at Gartner, wrote a book about this, and a lot of people are thinking about it. That's a big change, isn't it, Dr. Goh? Yeah, the question is the process and methods of valuation, right? But I believe we'll get there. We need to get started, then we'll get there, I believe. Dr. Goh, it's always a pleasure. And then the AI will benefit greatly from it. Oh yeah, no doubt. People will better understand how to align some of these technology investments. Dr. Goh, great to see you again. Thanks so much for coming back on theCUBE. It's been a real pleasure. Yes, a system is only as smart as the data you feed it with. Hey, excellent, we'll leave it there. Thank you for spending some time with us and keep it right there for more great interviews from HPE Discover '21. This is Dave Vellante for theCUBE, the leader in enterprise tech coverage. We'll be right back.