From around the globe, it's theCUBE with digital coverage of Exascale Day, made possible by Hewlett Packard Enterprise. Hey, welcome back, everybody. Jeff Frick here with theCUBE, coming to you from our Palo Alto studios for our ongoing coverage and celebration of Exascale Day, 10 to the 18th on October 18th, a 1 with 18 zeros. It's all about big, powerful, giant computing and computing resources and computing power. And we're excited to invite back our next guest. She's been on before. She's Dr. Arti Garg, Head of Advanced AI Solutions and Technologies for HPE. Arti, great to see you again. Great to see you. Absolutely. So before we get into Exascale Day, I was just looking at your LinkedIn profile. It's such an interesting career. You've done time at Lawrence Livermore. You've done time in the federal government. You've done time at GE and in industry. I'd love it if you could share a little bit of your perspective, going from hardcore academia to government positions, then into industry as a data scientist, and now, with originally Cray and now HPE, looking at it really from more of the vendor side. Yeah. So I think in some ways I'm like a lot of people who've had the title of data scientist somewhere in their history, in that there's no single path to working in this industry. I come from a scientific background. I have a PhD in physics, and that's where I started working with large data sets. I think of myself as a data scientist from before the term data scientist was a term. And I think it's an advantage to have seen this explosion of interest in leveraging data to gain insights, whether that be into the structure of the galaxy, which is what I used to look at, or into new types of materials that could advance our ability to build lightweight cars or safety gear.
It allows you to take a perspective to understand not only what the technical challenges are but also what the implementation challenges are, and why it can be hard to use data to solve problems. Right. Well, I'd love to hear your perspective again, because you are into data. You chose that as your profession, and you probably run with a whole lot of people who are also like-minded in terms of data. As an industry and as a society, we're trying to get people to do a better job of making data-based decisions and getting away from their gut and actually using data. I wonder if you can talk about the challenges of working with people who don't come from such an intense data background, to get them to, I don't know if it's understand the value of a data-driven decision-making process, or more that it's worth the effort, because it's not easy to get the data and cleanse the data and trust the data and get the right context. So working with people who don't come from that background and aren't so entrenched in that point of view, what surprises you? How do you help them? What can you share in terms of helping everybody become a more data-centric decision-maker? So I would actually rephrase the question a little bit, Jeff, and say that I think people have always made data-driven decisions. It's just that in the past, we maybe had less data available to us, or the quality of it was not as good. And so as a result, most organizations have organized themselves to make decisions and run their processes based on a much smaller and more refined set of information than is currently available, given our ability to generate lots of data through software and sensors, our ability to store that data, and then our ability to run a lot of computing cycles and a lot of advanced math against that data to learn things that maybe in the past took hundreds of years of experiments and scientists to understand.
And so before I jump into how you overcome that barrier, I'll use an example, because you mentioned I used to work in industry, at GE. One of the things that I often joked about is the number of times I discovered Bernoulli's principle in data coming off of GE jet engines. You could do that overnight, processing these large data sets, but of course historically it took hundreds of years to really understand these physical principles. And so when it comes to how we bridge the gap between people who are adept at processing large amounts of data and running algorithms to pull insights out, I think it's both sides. I think it's those of us who are coming from the technical background really understanding the way decisions are currently made, the way process and operations currently work at an organization, and understanding why those things are the way they are. Maybe there are security or compliance or accountability concerns, and a new algorithm can't just replace those. And so I think it's on our end to really try to understand and make sure that whatever new approaches we're bringing address those concerns. And I think for folks who aren't necessarily coming from a large data set and analytical background, and when I say analytical I mean in the data science sense, not in the sense of thinking about things in an abstract way, it's to really recognize that these are just tools that can enhance what they're doing, and they don't necessarily need to be frightening. Because people who have been, say, operating electric grids for a long time or fixing aircraft engines have a lot of expertise and a lot of understanding, and that's really important to making any kind of AI-driven solution work. Right, that's great insight. But I do think one thing that's changed is you come from a world where you had big data sets, so you kind of have a big data set point of view, where I think a lot of decision-makers didn't have that data before.
So we won't go through all the up-and-to-the-right explosions of data, and obviously we're talking about Exascale Day, but I think for a lot of processes now, the amount of data that they can bring to bear so dwarfs what they had in the past that before they even consider how to use it, they still have to contextualize it, and they have to manage it, and they have to organize it, and there are data silos. So there's all this kind of nasty processing stuff that's in the way, which some would argue has been a real problem with the promise of BI and decision-support tools. So as you look at this new stuff and these new data sets, what are some of the people and process challenges, beyond the obvious things that we can think about, which are the technical challenges? Yeah, so I think that you've really hit on something I talk about sometimes, which is the data deluge that we experience these days, and the notion of feeling like you're drowning in information but really lacking any kind of insight. And one of the things that I like to do is to actually step back from the data questions, the infrastructure questions, sort of all of these technical questions that can seem very challenging to navigate, and first ask ourselves: what problems am I trying to solve? It's really no different than any other type of decision you might make in an organization, to say, what are my biggest pain points? What keeps me up at night, or what would just transform the way my business works? Those are the problems worth solving, and then the next question becomes: if I had more data, if I had a better understanding of something about my business or about my customers or about the world in which we all operate, would that really move the needle for me?
And if the answer is yes, then that starts to give you a picture of what you might be able to do with AI, and it starts to tell you which of those data management challenges, whether it be cleaning the data, whether it be organizing the data, whether it be building models on the data, are worth solving. Because you're right, those are going to be time-intensive, labor-intensive, highly iterative efforts, but if you know why you're doing it, then you'll have a better understanding of why it's worth the effort, and also which shortcuts you can take and which ones you can't. Because often, in order to sort of see the end state, you might want to do a really quick experiment or prototype, and so you want to know what matters and what doesn't, at least for that 'is this going to work at all?' type of question. Right, right. So you're not buying the age-old adage that you just throw a bunch of data in a data lake and the answers will just spring up, just come right back out of the water. I mean, you bring up such a good point, right? It's all about asking the right questions and thinking about asking questions. So again, when you talk to people about helping them think about the questions, right? Because then you've got to shape the data to the question, and then you've got to start to build the algorithm to answer that question. How should people think when they're actually building algorithms and training algorithms? What are some of the typical pitfalls that a lot of people fall into who haven't really thought about it before, and how should people frame this process? Because it's not simple and it's not easy, and you really don't know that you have the answer until you run multiple iterations and compare it against some other type of reference. Yeah.
Well, one of the things that I like to do, just so that you're thinking about all the challenges you're going to face upfront, and you don't necessarily need to solve all of these problems right at the outset, but I think it's important to identify them, is to think about AI solutions, as they get deployed, as being part of a kind of workflow, and the workflow has multiple stages associated with it. The first stage is generating your data, and then starting to prepare and explore your data, and then building models for your data. But sometimes where we don't always think about it is the next two phases, which are deploying whatever model or AI solution you've developed, and what will that really take? Especially in the ecosystem where it's gonna live: is it gonna live in a secure and compliant ecosystem? Is it actually gonna live in an outdoor ecosystem? We're seeing more applications on the edge. And then finally, who's gonna use it, and how are they gonna drive value from it? Because it could be that your AI solution doesn't work because you don't have the right dashboard that highlights and visualizes the data for the decision-maker who will benefit from it. So I think it's important to think through all of these stages upfront, and think through what some of the biggest challenges you might encounter at them are, so that you're prepared when you meet them, and you can refine and iterate along the way, and even tweak the question you're asking upfront. That's great. So I wanna get your take on, we're celebrating Exascale Day, which is something very specific on 10/18. Share your thoughts on Exascale Day specifically, but more generally, I think just in terms of being a data scientist and suddenly having all this massive compute power at your disposal, right?
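The staged workflow described above, generate data, prepare and explore it, build models, deploy, and finally put results in front of a decision-maker, can be sketched in a few lines of Python. This is a minimal illustration only, not an HPE method; every function name and data value here is a hypothetical stand-in (a toy sensor feed and a mean-based anomaly threshold in place of a real model).

```python
# A minimal sketch of the five AI workflow stages described above:
# generate -> prepare/explore -> model -> deploy -> consume.
# All names and values are hypothetical illustrations.

def generate_data():
    """Stage 1: raw readings, e.g. from a sensor."""
    return [21.0, 21.4, None, 22.1, 85.0, 21.9]

def prepare_data(raw):
    """Stage 2: drop missing values and obvious outliers."""
    return [x for x in raw if x is not None and x < 50.0]

def build_model(clean):
    """Stage 3: a trivial 'model' -- flag readings far from the mean."""
    mean = sum(clean) / len(clean)
    return lambda x: "anomaly" if abs(x - mean) > 5.0 else "normal"

def deploy(model, new_reading):
    """Stage 4: run the model where the data lives (e.g. at the edge)."""
    return model(new_reading)

def consume(label):
    """Stage 5: surface the result so a decision-maker can act on it."""
    return f"Reading flagged as: {label}"

model = build_model(prepare_data(generate_data()))
print(consume(deploy(model, 30.0)))  # a reading far from the mean
```

The point of the sketch is Garg's: the last two stages, deployment and consumption, are where real solutions often stall, even when the first three work.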
You've been around for a while, so you've seen the development of the cloud, these huge data sets, and really the ability to put so much compute horsepower against the problems as the cost of networking and storage and compute just asymptotically approaches zero. I mean, as a data scientist, you gotta be pretty excited about kind of new mysteries, new adventures, new places to go that we just couldn't reach 10 years ago, five years ago, 15 years ago. Yeah, I think only time will tell exactly all of the things that we'll be able to unlock from these new massive computing capabilities that we're going to have, but a couple of things that I'm very excited about are that, in addition to this explosion, or these very large investments in large supercomputers, exascale supercomputers, we're also seeing investment in these other types of scientific instruments. And when I say scientific, it's not just academic research, it's driving pharmaceutical drug discovery, because we're talking about what they call light sources, which shoot X-rays at molecules and allow you to really understand the structure of the molecules. What exascale allows you to do is, historically it's been that you would take your molecule to one of these light sources and you'd shoot your X-rays at it, and you would generate just masses and masses of data, terabytes of data with each shot. And being able to then understand what you were looking at was a long process of getting computing time and analyzing the data. We're on the precipice of being able to do that, if not in real time, then much closer to real time. And I don't really know what happens if, instead of coming up with a few molecules, taking them, studying them, and then saying maybe I need to do something different, I can do it while I'm still running my instrument.
And I think it's very exciting, from the perspective of someone who's got a scientific background and likes using large data sets, that there's just a lot of possibility in what exascale computing allows us to do, from the standpoint of, I don't have to wait to get results. I can either simulate much bigger systems, say galaxies or universes if you're an astrophysicist, and really compare that to my data, or I can simulate much smaller, finer details of a hypothetical molecule and use that to predict what might be possible, from a materials or drug perspective, just to name two applications that I think exascale could really drive. That's really great feedback. Just to shorten that compute loop, right? We had an interview earlier where someone was talking about when the biggest workload you had to worry about was the end of the month, when you were running your financials. And I was like, wouldn't it be nice if that were the biggest job we had to worry about? But now, I think we saw some of this in animation and the movie business, when the rendering for, whether it's a full animation movie or just something with heavy-duty 3D effects, when you can get those dailies back to the artist, as you said, while you're still working, or closer to when you're working, versus having this huge kind of compute delay, it just changes the workflow dramatically, and the pace of change and the pace of output, because you're not context-switching as much and you can really get back into it. That's a super point. I wanna shift gears a little bit and talk about explainable AI. So this is a concept that hopefully a lot of people are familiar with. So AI, you build the algorithm, it's in a box, it runs, and it kicks out an answer. And one of the things that people talk about is that we should be able to go in and pull that algorithm apart to know why it came out with the answer that it did.
To me, this just sounds really, really hard, because it's smart people like you who are writing the algorithms, the inputs and the data that feed that thing are super complex, the math behind it is very complex, and we know that the AI trains and can change over time; as you train the algorithm, it gets more data and adjusts itself. So is explainable AI even possible? Is it possible to some degree? Because I do think it's important, and my next question is gonna be about ethics, to know why something came out the way it did. And the other piece that becomes so much more important is as we use that output not only to drive a human-based decision that needs some more information, but increasingly move it over to automation. So now you really wanna know, why did it do what it did? Explainable AI, share your thoughts. Yeah, no, it's a great question, and it's obviously a question that's on a lot of people's minds these days. I'm actually gonna revert back to what I said earlier when I talked about Bernoulli's principle, and the fact that sometimes when you do throw an algorithm at data, the first thing it will find is probably some known law of physics. And so I think that really thinking about what we mean by explainable AI also requires us to think about what we mean by AI. These days AI is often used synonymously with deep learning, which is a particular type of algorithm that is not very analytical at its core. And what I mean by that is that other types of statistical machine learning models have some underlying theory of the population of data that you're studying, whereas deep learning doesn't; it kind of just learns whatever pattern is sitting in front of it. And so there is a sense in which, if you look at other types of algorithms, they are inherently explainable, because you're choosing your algorithm based on what you think is the sort of ground truth about the population you're studying. And so, are we gonna get to explainable deep learning?
I think it's kind of challenging, because you're always gonna be in a position where deep learning is designed to just be as flexible as possible and sort of throw more math at the problem, because there maybe are things that your simpler model doesn't account for. However, deep learning could be part of an explainable AI solution if, for example, it helps you identify what the important so-called features are to look at. What are the important aspects of your data? So I don't know. It depends on what you mean by AI, but are you ever gonna get to the point where you don't need humans interpreting outputs and making some set of judgments about what a set of computer algorithms processing data think? I don't wanna say I know what's gonna happen 50 years from now, but I think it'll take a little while to get to the point where you don't have to apply some subject matter understanding and some human judgment to what an algorithm is putting out. Yeah, it's really interesting. We had Dr. Robert Gates on years ago at another show, and he talked about how the only guns in the US military, if I'm getting this right, that are automatic, that will go based on what the computer tells them to do and start shooting, are on the Korean border. But short of that, there's always a person involved before anybody hits a button. Which raises a question, right? Because we've seen this on the big data kind of curve, and Gartner has talked about it, as we move up from descriptive analytics, to diagnostic analytics, to predictive, and then prescriptive, and then hopefully autonomous. So I wonder, you're saying the world's still a little ways away, and that last little bump's gonna be tough to overcome, to get to true autonomy? I think so, and it's going to be very application-dependent as well.
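The contrast drawn above, between models with an underlying statistical theory and deep learning, can be made concrete with a small sketch. In an ordinary least-squares fit, each fitted coefficient is a directly readable statement about a feature's effect, which is the sense in which such models are "inherently explainable." The data below is invented purely for illustration.

```python
import numpy as np

# Hypothetical data: y depends strongly on feature 0 and not at all
# on feature 1, plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2.0 * X[:, 0] + 0.05 * rng.normal(size=200)

# An "inherently explainable" model: ordinary least squares.
# Each coefficient is a direct, interpretable claim about one feature,
# unlike the opaque weights inside a deep network.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # first coefficient near 2.0, second near 0.0
```

A deep network fit to the same data could predict just as well, but its explanation would have to be recovered after the fact (for example via feature-importance methods), which is the role Garg suggests deep learning could play inside a broader explainable solution.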
So it's an interesting example to use the DMZ, because that is obviously also a very mission-critical example, I would say. But in general, I think that you'll see autonomy, and you already do see autonomy, in certain places where I would say the stakes are lower. So if I'm going to have some kind of recommendation engine that suggests, if you looked at this sweater, maybe you'd like that one, the risk of getting that wrong, and so of fully automating it, is a little bit lower, because the risk is you don't buy the sweater; I lose a little bit of income, a little bit of revenue, as a retailer. But the risk of 'do I make that turn' because I'm an autonomous vehicle is much higher. So I think that you will see the progression up that curve being highly dependent on what's at stake with different degrees of automation. That being said, in certain places where it's either really expensive or humans aren't doing a great job, you may actually start to see some mission-critical automation, but those would be the places where you're seeing it. And actually, I think that's one of the reasons why you see a lot more autonomy in the agriculture space than you do in the passenger vehicle space, because there's a lot at stake and it's very difficult for human beings to drive large combines.
Right, well, plus they have a controlled environment. I've interviewed Caterpillar; they're doing a ton of stuff with autonomy because they control the field where those things are operating, whether it's a field or a mine. It's actually fascinating how far they've come with autonomy. But let me switch to a different industry that I know is closer to your heart, looking at some of your other interviews, and let's talk about diagnosing disease. If we take something specific like reviewing X-rays, where the computer, and this also brings in the whole computer vision piece, bringing in computer vision algorithms, they can see things probably faster, or do a lot more comparisons, than potentially a human doctor can, and, in this whole signal-to-noise conversation, hopefully elevate the signal for the doctor to review and suppress the noise that's really not worth their time. They can also review a lot of literature and hopefully bring a broader perspective of potential diagnoses within a set of symptoms. You said before both your folks are physicians, and there's a certain kind of magic, a nuance, almost a more childlike exploration, that's hard to get out of an algorithm, if you will, to think outside the box. I wonder if you can share that synergy between using computers and AI and machine learning to do really arduous, nasty things, like going through lots and lots and lots of X-rays, and how that helps a doctor who's got a whole different set of experience, a whole different kind of empathy, a whole different type of relationship with that patient than just a bunch of pictures of their heart or their lungs. Yeah, I think that one of the things is, and this kind of goes back to the question of AI for decision support versus automation, what AI can do, and what we're pretty good at these days with computer vision, is picking up on subtle patterns, especially if you have a very large data set.
So if I can train on lots of pictures of lungs, it's a lot easier for me to identify the pictures where somehow these are not like the other ones. And that can be helpful, but I think then to really interpret what you're seeing and understand, is it actually a bad-quality image? Is it some kind of medical issue, and what is the medical issue? That's where bringing in a lot of different types of knowledge and a lot of different pieces of information comes in, and right now, I think humans are a little bit better at doing that. Some of that's because I don't think we have great ways to train on sparse data sets, I guess. And the second part is that a human being might have 40 or 50 years of training a model, as opposed to six months or so. With sparse information, that's another thing human beings have: their lived experience, and the data that they bring to bear on any type of prediction or classification is actually more than just, say, what they saw in their medical training. It might be the people they've met, the places they've lived, what have you. And that part, that broader set of learning, and how things that might not seem related might actually be related to your understanding of what you're looking at, I think we've got a ways to go on from an artificial intelligence perspective and development. Right. But it is Exascale Day, and we all know about the compound exponential curves on the computing side. But let's shift gears a little bit. I know you're interested in emerging technology to support this effort. And there's so much going on in terms of the atomization of compute, storage, and networking, to be able to break it down into smaller and smaller pieces so that you can really scale the amount of horsepower that you need to apply to a problem, to very big or to very small. Obviously, the stuff that you work on is more big than small. Work on GPUs, a lot of activity there.
So I wonder if you could share some of the emerging technologies that you're excited about, to bring, again, more tools to the task. Yeah, I mean, one of the areas I personally spend a lot of my time exploring is, I guess this word gets used a lot, the Cambrian explosion of new AI accelerators, new types of chips that are really designed for different types of AI workloads. And as you sort of talked about going down in scale, it's almost as if we're going back and looking at these large systems, but then exploring each little component of them and trying to really optimize it, or understand how that component contributes to the overall performance of the whole. And there are probably close to 100 active vendors in the space of developing new processors and new types of computer chips. I think one of the things that points to is that we're moving in the direction of general infrastructure heterogeneity. So it used to be that when you built a system, you probably had one type of processor, and you probably had a pretty uniform fabric across your system. With storage, I think we started to get tiering a little bit earlier. But now, and we're already starting to see it with our exascale systems, where you've got GPUs and CPUs on the same blades, as the workloads running at large scales become more complicated, maybe I'm doing some simulation, and then I'm training some kind of AI model, and then I'm running inference on some other output of the simulation, I need the ability to do a lot of different things and do them at a very advanced level, which means I need very specialized technology to do it. And it's an exciting time, and I think we're gonna test, we're gonna break a lot of things. I probably shouldn't say that in this interview, but I'm hopeful that we're gonna break some stuff.
We're gonna push all these systems to the limit and find out where we actually need to push a little harder. And one of the areas where I think we're gonna see that is that we're gonna wanna move data, move data off of scientific instruments into computing, into memory, into a lot of different places. And I'm really excited to see how it plays out, and what you can do, and where the limits are of what you can do, with the new systems. Arti, I could talk to you all day. I love the experience and the perspective, because you've been doing this for a long time. So I'm gonna give you the final word before we sign off, and really bring it back to a more human thing, which is ethics. So one of the conversations we hear all the time is that if you are going to do something, if you're gonna put together a project, and you justify that project, and then you go and you collect the data and you run that algorithm and you do that project, that's great. But there's an inherent problem with data collection that may be used for something else down the road, something that maybe you don't even anticipate. So I just wonder if you can share your top-level ethical take on how data scientists specifically, and then ultimately business practitioners and other people who don't carry that title, need to be thinking about ethics, and not just kind of forget about it. We had a great interview with Paul Daugherty, right? Everybody's data is not just their data. It represents a person, right? It's a representation of what they do and how they live their lives. So when you think about heading into a project and getting started, what do you think about in terms of the ethical considerations, and how should people be cautious that they don't go places they probably shouldn't go?
Yeah, I think that's a great question, and not one with a short answer, but I honestly don't know that we have great solutions right now. I think the best we can do is take a very multifaceted and also vigilant approach to it. So when you're collecting data, and we should remember that a lot of the data that gets used wasn't necessarily collected for the purpose it's being used for, because we might be looking at old medical records or old transactional records of any kind, whether from a government or a business. As you start to collect data or build solutions, try to think through who all the people are who might use it, and what the possible ways are in which it could be misused. And I also encourage people to think backwards: what were the biases in place when the data were collected? You see this a lot in the criminal justice space, where historical records reflect historical biases in our systems. There are limits to how much you can correct for previous biases, and there are some ways to do it, but you can't do it if you're not thinking about it. So I think at the outset of developing solutions, that's important, but equally important is putting in the systems to maintain vigilance around it. So one, don't move to autonomy before you know what potential new errors or new biases you might introduce into the world. And also have systems in place to constantly ask these questions: am I perpetuating things I don't wanna perpetuate, or how can I correct for them? And be willing to scrap your system and start from scratch if you need to. Well, Arti, thank you. Thank you so much for your time. Like I said, I could talk to you for days and days and days. I love the perspective and the insight and the thoughtfulness. So thank you for sharing your thoughts as we celebrate Exascale Day. Yeah, thank you for having me. My pleasure, thank you. All right, she's Arti. I'm Jeff, it's Exascale Day.
We're covering it on theCUBE. Thanks for watching. We'll see you next time.