Live from the Bill Graham Civic Auditorium in San Francisco, it's theCUBE, covering Pure Storage Accelerate 2018. Brought to you by Pure Storage. Welcome back to theCUBE's coverage of Pure Storage Accelerate 2018. I'm Lisa Martin with Dave Vellante. We're at the Bill Graham Civic Auditorium and we are sporting some orange. You can't see mine because it's chilly. Who are you? I'm a symbol. There's a name for that. The artist formerly known as Prince. Dave and I are here with Rob Lee, the VP and Chief Architect at Pure Storage. Hey, Rob, welcome to theCUBE. Thanks for having me. You're sporting a lot of gray. We won't make a comment. I know, I don't have a symbol or a T-shirt either. I can't believe they haven't kicked you out. So, you've been at Pure for about five years now. You were one of the founders of FlashBlade. Here we are at the third annual Accelerate, with a packed house this morning in the keynote session. What are some of your observations about the growth you've seen at this company? Well, it's really been amazing. When I joined Pure, we were about 150 employees. I joined as part of the founding team for FlashBlade, one of the first two or three people. In fact, my first day on the job was taking monitors out of boxes and setting up desks. Since then, we've obviously grown tremendously, from 150 employees to over 2,300. But more importantly, we've grown in terms of customers: we went from that tiny size to over 4,800 customers today. On the FlashBlade side of the house, it's been a really, really fun ride. The first couple of years of my time at Pure were spent heads down building the product, figuring out how to bring some of the core philosophies and values from FlashArray into FlashBlade and take that product into new markets.
We brought that product out and launched it at our first Accelerate conference three years ago. So that first year was really about getting it out to market and growing the customer base. Last year, you saw us take it into a lot more new and emerging workloads: analytics, AI, so on and so forth. And this past year has really been spent doubling down on that. Not only building a lot more expertise within the company about where the market is going, but also translating the experience we're gathering, working with customers on the leading edge of all of those industries, into helping our new and prospective customers figure out how to deploy those solutions into their environments and be maximally successful. So it's really been a very, very exciting ride. So Rob, you're the resident AI expert inside of Pure, and I'm sure there are many, but you're in theCUBE now, so we want to unpack that a little bit. AI seems to be this emerging technology, a horizontal layer of tech that cuts across virtually every industry and every application. But its applications seem to be narrow, whether it's facial recognition, natural language processing, or supply chain optimization. So what's Pure's point of view on AI? Artificial intelligence, I'm not crazy about the name, I like machine intelligence better personally, but what's your point of view on the AI space and how it will get adopted, and maybe some of the barriers to that adoption? Sure. I share the same distaste for the term, mostly because I think it's overused and misused in many ways. If you look at AI at its heart, it's really about gathering more intelligence and more value from data.
Now, more recently, technology advances, mostly in compute and algorithms, have created an explosion in subsets of AI, particularly machine learning and deep learning, and that's really what's driving a lot of these new applications. You mentioned a few: image recognition, voice recognition, so on and so forth. But really what it's doing is re-highlighting the fact that organizations have for decades been gathering, collecting, storing, and paying to store volumes and volumes of data, but they haven't been able to get the maximum value out of it. One of the most chilling statistics I've seen is that over 80% of the data that's gathered is unstructured, but of all that unstructured data, less than 1% is actually analyzed. What that means is that more than 99% of the data people have been collecting over the last several decades, they haven't been able to extract value from. And what we're seeing is that the recent advances in hardware, software, and algorithms driving a lot of these deep learning applications, even though the applications may be very focused in the types of data they work with, image recognition, object recognition, emotion detection, so on and so forth, are really bringing the spotlight back onto how organizations get more information out of all of their data. And in a lot of cases, in conversations with customers that start out with the glitzy use cases, the object detection demos, when we start peeling into, okay, how are you going to deploy this into your organization, how are you going to translate this into better customer outcomes, we actually find ways to apply more traditional data analysis techniques to get better and more information out of people's data. And that may be everything from relational databases to big data analytics stacks.
And so again, I think the bigger movement here is that recent advances in technology have really re-highlighted the focus on organizations getting more out of their data, in all its forms. When you think about the top market cap companies, Amazon, Facebook, Microsoft, Google, et cetera, they seem to be companies that have mastered, or at least are ahead of the pack in, machine intelligence. You guys recently conducted a study with MIT. What do you see from that study, and in your conversations with customers, in terms of the incumbents being able to close that gap? So I think there are a couple of really interesting points that came out of the MIT survey. One is that the prevalence of and demand for AI, and particularly machine learning applications, is both broad-based across all industries and huge. One of the stats I saw was that over 80% of organizations expect to deploy some form of AI or machine learning technology into production by 2020. The other thing, which wasn't in that survey but was in a set of remarks that Andrew Ng, formerly of Google, made, is that the rapid pace of development in AI research, particularly on the algorithm side, in terms of different training frameworks and the ways people work with data, is actually democratizing entry into the AI space. I don't remember the exact quote, but he said something to the effect of: as algorithm research advances, it's easier and easier for new entrants to get into machine learning, to get into data science, and make a bigger and bigger impact.
And I think the other thing we've learned from the large incumbents, and I think Google was actually the one that came out and said this, is the reason Google is at the head of the pack, if you will, in machine intelligence: in some respect, they got their lead by having the most advanced algorithms and the most advanced software engineers, but they maintain their lead because they have the most data. The takeaway there is that having a lot of data trumps having the best algorithms, and we expect that to continue as AI research and algorithms evolve. So in many ways it's a much more democratized landscape than previous approaches to... And a lot of that makes sense, because the incumbents, as we use that word, I like that word, they're going to buy AI from technology suppliers and then apply it to their business. At the same time, data generally is not at the core of their business. It tends to be either humans, or maybe the bottling plant, or some other manufacturing assets, or whatever it is. So they have to figure out the data model, and that study suggested that while they were optimistic about AI, they were struggling to figure out how to apply it, the skill sets, et cetera. Maybe share some of your thoughts on that. Absolutely. I think one of the things that study really highlighted was that while there was tremendous excitement and demand from the upper levels of management, the CIO, the C-suite, to deploy AI technologies, there was an increasing and growing disconnect between the policy decision makers, the executive management, and the people actually doing the work. And that disconnect with this technology set, we see it on a day-to-day basis.
We see it with customers that we talk to, and I think a lot of that disconnect actually comes from poor infrastructure planning. One of the things we see is that many companies get really excited about the promise of AI technology: hey, I could deploy this solution, I could understand my customers better, great, let's go do it. And they go off and hire a bunch of data scientists without investing in, or thinking about, the infrastructure they're going to need to make those data scientists productive. There was an article in the Financial Times that looked at hiring and retention for data scientists, and what they found was that the lack of infrastructure and the lack of automation were materially contributing to frustration among data scientists trying to do their jobs, to the point where, even though it's really, really hard to hire data scientists, it's becoming difficult to retain them if you're not equipping them with the tools to do their jobs efficiently. So this is an area where there's a growing disconnect between the decision makers who are saying, hey, we've got to go that way, their understanding of the tool sets, the automation, and the infrastructure required to get there, and the staffs and employees actually responsible for getting them there. One of the exciting parts of my job at Pure is that I get to talk to a lot of customers on the bleeding edge of implementing these technologies. And one of the things we get to do, by working with each of these customers and understanding what works and what doesn't, is help bridge that gap. I'll take the bait. So what does that infrastructure for AI look like? I mean, it's kind of self-serving, but describe it.
Sure. I think at the heart of it, it's all about simplicity, about removing friction and bottlenecks. There was a Harvard Business Review article a while ago that looked at data science in general: where time is spent, where resources are spent. And they came up with a statistic that more than 80% of a data scientist's time is spent not doing data science; it's actually spent preparing data, moving data, copying it, doing basic data wrangling and data management tasks. And the other 20% is spent complaining about the first 80%. So, no, what we see Pure helping with, what we see as the ideal infrastructure to enable these types of projects, is an infrastructure that is simple, easy to work with, easy to manage. But more importantly, you heard Charlie and Kix during the keynote today talk about data-centric architectures. You heard them talk about the importance of building an architecture, a practice, a set of processes around the idea that data is very, very difficult to move: you want to move it as few times as possible and manage it as little as possible. And that really applies in a lot of these AI applications. To give you a very quick example, take a look at an AI pipeline for something like training an object detection system for self-driving cars. That simple sentence may encapsulate 30 or 40 different applications. You've got video coming off of cameras that has to be ingested somewhere; that video has to be cut, downsized, rendered, cut into still images. Still images have to be warped, noise filters applied, color filters applied. If you play this out, in most cases there are 30 or 40 different applications at play here. And without an infrastructure that makes it easy to centralize the data management portion of that, you've also potentially got 30 or 40 different data silos.
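The fan-out Rob describes can be sketched in a few lines. This is a minimal, illustrative toy, not Pure's or any real training pipeline: the stage names, the two-frames-per-clip rate, and the three augmentation ops are all assumptions chosen to show how each stage produces another derived copy of the data, the "30 or 40 data silos" problem in miniature. A real pipeline would use tools like FFmpeg, OpenCV, and a training framework at each step.

```python
# Toy model of the ingest -> frames -> augmentation pipeline described above.
# Each stage's output lands in its own "silo" (here just a dict entry),
# mirroring how real pipelines scatter derived copies across systems.

def ingest(videos):
    # Stage 1: land raw camera footage in an ingest silo.
    return [{"video": v, "stage": "ingested"} for v in videos]

def extract_frames(clips, fps=2):
    # Stage 2: cut each clip into still frames (illustratively, 2 per clip).
    return [{"frame": f"{c['video']}_frame{i}", "stage": "extracted"}
            for c in clips for i in range(fps)]

def augment(frames):
    # Stage 3: warp / noise-filter / color-shift every frame; each
    # augmentation op multiplies the data volume again.
    ops = ["warp", "noise", "color"]
    return [{"frame": f["frame"], "op": op, "stage": "augmented"}
            for f in frames for op in ops]

def run_pipeline(videos):
    silos = {}  # one separate derived copy of the data per stage
    silos["ingest"] = ingest(videos)
    silos["frames"] = extract_frames(silos["ingest"])
    silos["augmented"] = augment(silos["frames"])
    return silos

silos = run_pipeline(["cam0.mp4", "cam1.mp4"])
# Two source videos fan out: 2 ingested clips -> 4 frames -> 12 augmented
# samples, spread across three separate stage silos.
print({name: len(items) for name, items in silos.items()})
```

Even this three-stage toy triples the number of places the data lives; the point of a centralized, data-centric architecture is to avoid paying that copy-and-track cost at every one of the 30 or 40 real stages.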
And so, when we look at how to make projects successful, when we look at how to build infrastructure that helps data science teams spend more time doing data science and less time copying data around and tracking where it is, that's all part of what we see as a larger data strategy. It extends... Oh, sorry, Rob. So one of the customers shared on stage this morning, Paige.AI, is leveraging not just Pure technology, but really taking what used to be, and still is for a lot of organizations, an analog process of looking at cancer pathology slides, and digitizing it and taking it forward. Did you see in this study any leading industries that are maybe better positioned to align the C-suite with the IT needs to take advantage of AI faster? Are there any industries that jumped out in this study as likely to be on the leading edge? So the thing that actually jumped out at me was how broad-based across industries the AI applications really are. If you look at specific types of data sets or specific use cases, then yes, you can drive those into specific industries. With image detection, for example, I think you're going to see a lot in healthcare and manufacturing, and certainly self-driving cars is a big one. If you look at natural language processing, speech-to-text, that sort of thing, a lot of that is being put to use automating chatbots, customer service, call center type applications. So if you look at a particular application, or a particular data set or data type, you can trace it to industries that are likely to lead the charge. But what was interesting to me was, if you consider all of the machine learning approaches, all of the AI interest, how broad-based across all industries it was.
Rob, I know we're out of time, but we'd be remiss if we didn't ask what you guys are doing internally. You're not just selling infrastructure for AI, you're AI practitioners as well. Can you briefly describe what you're doing? Sure, sure. I think the most interesting application of AI we've got internally is the AI engine that powers Meta, our Pure1-hosted offering that helps us predictively and proactively manage customer arrays. We started Pure1 as a remote support offering at the very beginning of Pure, when we first shipped FlashArray. And we did it originally so we could better understand arrays: the more arrays we ship in the field, we want the marginal cost, the marginal effort, if you will, of understanding each array's behavior to decrease with the number of arrays we ship, and we want our understanding of array behavior, customer use cases, and workload behavior to increase with the number of arrays we ship. We started off using more traditional techniques, basic language processing, basic statistics, so on and so forth. What we've since done is build a machine learning engine behind it so we can make more intelligent inferences and decisions. And you've seen this come out in the form of tools we've released, such as Will It Fit. We can now look at an array and say, okay, you've got this many workloads, this many VMs sitting on this array and on this volume; what would it look like to double that? What can you expect in terms of capacity utilization? What can you expect in terms of performance? We can also extend that what-if analysis to different hardware platforms. We can say, hey, you've got this workload running on an X50 today; what would it look like to double that workload and move it to an X70?
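A "Will It Fit"-style estimate like the one Rob describes could be sketched as a lookup over fleet telemetry. Everything here is illustrative: the fleet samples, the model names, the metric fields, and the nearest-neighbor-plus-linear-scaling rule are all assumptions standing in for Pure1 Meta's actual (far more sophisticated) learned models. The point is only the shape of the idea: you can answer what-if questions about one array because you have observed many similar arrays.

```python
# Illustrative sketch: estimate capacity and latency for a hypothetical
# workload from the nearest observed data point in a (toy) fleet.

# Toy telemetry observed across the fleet: per (model, workload size),
# the resulting capacity utilization and latency.
FLEET = [
    {"model": "X50", "load": 100, "capacity_pct": 40, "latency_ms": 0.5},
    {"model": "X50", "load": 200, "capacity_pct": 78, "latency_ms": 1.1},
    {"model": "X70", "load": 200, "capacity_pct": 45, "latency_ms": 0.6},
    {"model": "X70", "load": 400, "capacity_pct": 85, "latency_ms": 1.3},
]

def will_it_fit(model, load):
    """Estimate metrics for `load` units of work on `model`."""
    candidates = [p for p in FLEET if p["model"] == model]
    nearest = min(candidates, key=lambda p: abs(p["load"] - load))
    # Scale linearly from the nearest observed sample: a crude stand-in
    # for learned inference over the whole fleet's workload "DNA".
    scale = load / nearest["load"]
    return {
        "capacity_pct": round(nearest["capacity_pct"] * scale, 1),
        "latency_ms": round(nearest["latency_ms"] * scale, 2),
    }

# "You've got this workload on an X50 today; what would it look like
# to double it and move it to an X70?"
today = will_it_fit("X50", 100)
what_if = will_it_fit("X70", 200)
print(today, what_if)
```

The crucial ingredient is the fleet data itself: no amount of modeling cleverness answers the what-if question without observed points to anchor it, which is the "most data trumps best algorithm" point from earlier in the conversation.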
And again, we can make a lot of those inferences without tracking and testing that exact workload, because we have a broad-based set of data points across our entire fleet. Too complicated for humans to do all that. It really is, yes, it really is. But you're generating workload DNA. Exactly, exactly. And more importantly, to Dave's point, doing it in an automated way, so you don't have to put an army of human beings, an army of administrators, behind it to calculate it by hand. Well, Rob, thanks so much for stopping by theCUBE and sharing what's going on from your perspective. Go get some orange. Thanks for having me. For Dave Vellante, I'm Lisa Martin. You're watching theCUBE. We are live at Pure Storage Accelerate 2018 in San Francisco. Stick around, Dave and I will be right back with our next guest.