Hi, I'm Peter Burris, and welcome to another Wikibon theCUBE special feature, a special digital community event on the relationship between AI, infrastructure and business value. It's sponsored by DDN with participation from NVIDIA, and over the course of the next hour, we're going to reveal something about this special and evolving relationship between sometimes tried-and-true storage technologies and the emerging potential of AI as we try to achieve these new business outcomes. To do that, we're going to start off with a series of conversations with some thought leaders from DDN and from NVIDIA, and then at the end, we're going to go into a crowd chat. This is going to be your opportunity to engage these experts directly, ask your questions, share your stories, and find out what your peers are thinking and how they're achieving their AI objectives. That's at the very end, but to start, let's begin the conversation with Kurt Kuckein, who is a senior director of marketing at DDN. Thanks, Peter, happy to be here. So tell us a little bit about DDN to start. So DDN is a storage company that's been around for 20 years. We've got a legacy in high performance computing, and that's where we see a lot of similarities with this new AI workload. DDN is well known in the HPC community. If you look at the top 100 supercomputers in the world, we're attached to 75% of them, and so we have a fundamental understanding of that type of scalable need. That's where we're focused. We're focused on performance requirements. We're focused on scalability requirements, which can mean multiple things, right? It can mean the scaling of performance. It can mean the scaling of capacity, and we're very flexible. Well, let me stop you there. You've got a lot of customers in the high performance world, and a lot of those customers are at the vanguard of moving to some of these new AI workloads. What are customers saying?
With this significant engagement that you have with the best and the brightest out there, what are they saying about this transition to AI? Well, I think it's fascinating that we have kind of a bifurcated customer base here, where we have those traditionalists who probably have been looking at AI for over 40 years, right? And they've been exploring this idea, and they've gone through the peaks and troughs in the promise of AI, and then contraction because CPUs weren't powerful enough. Now we've got this emergence of GPUs in the supercomputing world, and if you look at how the supercomputing world has expanded in the last three years, it is through investment in GPUs. And then we've got an entirely different segment, which is a much more commercial segment, and they're maybe newly invested in this AI arena, right? They don't have the legacy of 30, 40 years of research behind them, and they are trying to figure out exactly, you know, what do I do here? A lot of companies are coming to us saying, hey, I have an AI initiative. Well, what's behind it? Well, we don't know yet, but we've got to have something, and they don't understand where this infrastructure is going to come from. So the general availability of AI technologies, and obviously flash has been a big part of that, very high speed networks within data centers, and certainly virtualization as well, now opens up to the enterprise the possibility of using these algorithms, some of which have been around for a long time but previously required very specialized, bespoke hardware configurations. That still begs the question: there are some differences between high performance computing workloads and AI workloads. Let's start with the similarities, and then let's explore some of the differences. So the biggest similarity, I think, is that it's an intractably hard IO problem, right?
At least from the storage perspective, it requires a lot of high throughput, and depending on where those IO characteristics come from, it can be very small-file, highly IOPS-intensive workflows, but it needs the ability of the entire infrastructure to deliver all of that seamlessly from end to end. So really high-performance throughput, so that you can get to the data you need and keep the computing element saturated. Keeping the GPU saturated is really the key; that's where the huge investment is, right? So how do AI and HPC workloads differ? How they're fundamentally different is that AI workloads often operate on a smaller scale in terms of the amount of capacity. At least today's AI workloads, right? As soon as a project encounters success, our forecast is that those things will take off and you'll want to apply those algorithms against bigger and bigger data sets. But today, we encounter things like 10 terabyte data sets, 50 terabyte data sets. And a lot of customers are focused only on that, but what happens when you're successful? How do you scale your current infrastructure to petabytes and multi-petabytes when you'll need it in the future? So when I think of HPC, I think of often very, very big batch jobs, very, very large complex data sets. When you think about AI, like image processing or voice processing, whatever else it might be, it's a lot of small files, randomly accessed, that nonetheless require some very complex processing that you don't want to have to restart all the time. And a degree of simplicity that's required to make sure that you have the people that can do it. Have I got that right? You've got it right. Now one misconception, I think, is on the HPC side: that whole random small-file thing has come in over the last five to 10 years, and it's something DDN has been working on quite a bit. Our legacy was in high-performance throughput workloads, but the workloads have evolved so much on the HPC side as well.
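The idea of keeping the GPU saturated that comes up here can be sketched with a minimal producer-consumer loop in plain Python. This is purely illustrative (no real GPU or storage is involved; the file names and the queue depth are invented): a bounded prefetch queue lets storage reads run ahead of the compute step, so the consumer never stalls waiting for data.

```python
import queue
import threading

def producer(files, q):
    """Simulate storage reads: push small-file payloads into a bounded prefetch queue."""
    for f in files:
        q.put(f"data:{f}")  # stand-in for reading one small file
    q.put(None)  # sentinel: no more data

def consumer(q, results):
    """Simulate the GPU: it stays busy as long as the queue is never empty."""
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item.upper())  # stand-in for one training step

files = [f"img_{i}.jpg" for i in range(8)]
q = queue.Queue(maxsize=4)  # bounded buffer: prefetch depth of 4
results = []
t = threading.Thread(target=producer, args=(files, q))
t.start()
consumer(q, results)
t.join()
print(len(results))  # 8: every file reached the "GPU"
```

Real frameworks implement the same pattern at much larger scale (e.g. prefetching input pipelines), but the design point is the same: IO runs ahead of compute so the expensive processor is never idle.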
And as you posited at the beginning, so much of it has become AI and deep learning research. That we've become a lot more alike. They do look a lot more alike. So if we think about the evolving relationship now between some of these new data-first workloads, AI-oriented, change-the-way-the-business-operates types of stuff, what do you anticipate is going to be the future of the relationship between AI and storage? Well, what we foresee really is that the explosion in AI needs and AI capabilities is going to mimic what we already see and really drive what we see on the storage side. We've been showing that graph for years and years of everything going up and to the right, but as AI starts working on itself and improving itself, as the collection mechanisms keep getting better and more sophisticated and resolutions keep increasing, whether you're talking about cameras or acquisition capabilities in the life sciences, it's more and more data, right? And you want to be able to expose a wide variety of data to these algorithms. That's how they're going to learn faster. And so what we see is that the data-centric part of the infrastructure is going to need to scale even if you're starting today with a smaller workload. Kurt, thank you very much, great conversation. How does this turn into value for users? Well, let's take a look at some use cases that come out of these technologies. DDN A3I with NVIDIA DGX-1 is a fully integrated and optimized technology solution that provides enablement and acceleration for a wide variety of AI and DL use cases at any scale. The platform provides tremendous flexibility and supports a wide variety of workflows and data types. Already today, customers in industry, academia and government all around the globe are leveraging DDN A3I with NVIDIA DGX-1 for their AI and DL efforts.
In this first example use case, DDN A3I enables a life sciences research laboratory to accelerate their microscopy capture and analysis pipeline. On the top half of the slide is the legacy pipeline, which displays low resolution results from a microscope with a three minute delay. On the bottom half of the slide is the accelerated pipeline, where DDN A3I with NVIDIA DGX-1 delivers results in real time, 200 times faster and with much higher resolution than the legacy pipeline. This use case demonstrates how a single unit deployment of the solution can enable researchers to achieve better science and faster time to results without the need to build out complex IT infrastructure. The white paper for this example use case is available on the DDN website. In the second example use case, DDN A3I with NVIDIA DGX-1 enables an autonomous vehicle development program. The process begins in the field, where an experimental vehicle generates a wide range of telemetry that's captured on a mobile deployment of the solution. The vehicle data is used to train capabilities locally in the field, which are transmitted to the experimental vehicle. Vehicle data from the fleet is captured to a central location, where a large DDN A3I with NVIDIA DGX-1 solution is used to train more advanced capabilities, which are transferred back to experimental vehicles in the field. The central facility also uses the large data sets in the repository to train experimental vehicles in simulated environments to further advance the AV program. This use case demonstrates the scalability, flexibility, and edge-to-data-center capability of the solution. DDN A3I with NVIDIA DGX-1 brings together industry-leading compute, storage, and network technologies in a fully integrated and optimized package that makes it easy for customers in all industries around the world to pursue breakthrough business innovation using AI and DL.
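The edge-to-data-center flow in that second use case can be abstracted as a very small sketch. Everything here is hypothetical (the vehicle IDs, file names, and aggregation step are invented for illustration; a real pipeline involves telemetry formats, transport, and training jobs far beyond this), but it shows the shape of the flow: each vehicle captures data at the edge, and the central repository combines the whole fleet's data so larger models can be trained against it.

```python
def capture_in_field(vehicle_id, num_samples):
    """Each experimental vehicle produces telemetry files at the edge."""
    return [f"{vehicle_id}/telemetry_{i}.bin" for i in range(num_samples)]

def aggregate_centrally(fleet_captures):
    """The central repository combines every vehicle's captures into one
    training corpus, the way fleet data is pooled for central training."""
    repo = []
    for capture in fleet_captures:
        repo.extend(capture)
    return repo

fleet = [capture_in_field(v, 3) for v in ("car_a", "car_b")]
central_repo = aggregate_centrally(fleet)
print(len(central_repo))  # 6 files available for central training
```

The design point is that capabilities trained centrally on the pooled corpus can then be pushed back out to every vehicle, which is why the central repository has to scale with the whole fleet rather than with any one car.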
You know, ultimately, this industry is driven by what users must do, the outcomes that they seek, but it always gets easier and faster when you've got great partnerships working on some of these hard technologies together. Let's hear how DDN and NVIDIA are working together to try to deliver new classes of technology capable of making these AI workloads scream. Specifically, we've got Kurt Kuckein coming back. He's a senior director of marketing for DDN. And Darren Johnson, who's a global director of technical marketing for NVIDIA in the enterprise and deep learning. Today, we're going to be talking about what infrastructure can do to accelerate AI. And specifically, we're going to use a burgeoning relationship between DDN and NVIDIA to describe what we can do to accelerate AI workloads by using higher performance, smarter and more focused infrastructure for computing. Now to have this conversation, we've got two great guests here. We've got Kurt Kuckein, who is the senior director of marketing at DDN, and also Darren Johnson, who's a global director of technical marketing for enterprise at NVIDIA. Kurt, Darren, welcome to theCUBE. Thank you very much. So let's get going on this, because this is a very, very important topic, and I think it all starts with this notion that there is a relationship that you guys have put forward. Kurt, why don't you describe it? Sure. What we're announcing today is DDN's A3I architecture powered by NVIDIA. It is a full rack-level solution, a reference architecture that's been fully integrated and fully tested to deliver an AI infrastructure very simply, very completely. So if we think about why this is important, AI workloads clearly have put special stress on underlying technology. Darren, talk to us a little bit about the nature of these workloads and why, in particular, things like GPUs and other technologies are so important to make them go fast.
Absolutely. As you probably know, AI is all about the data. Whether you're doing medical imaging, whether you're doing natural language processing, whatever it is, it's all driven by the data. The more data that you have, the better results that you get. But to drive that data into the GPUs, you need great IO. And that's why we're here today: to talk about DDN and the partnership of how to bring that IO to the GPUs on our DGX platforms. So if we think about what you describe, it's a lot of small files, often randomly distributed, with, nonetheless, very high-profile jobs that just can't stop midstream and start over. Absolutely, and if you think about the history of high performance computing, which is very similar to AI, IO is really just that: lots of files, you have to get them there, low latency, high throughput. And that's why DDN's nearly 20 years of experience working in that exact same domain is perfect, because you get the parallel file system, which gives you that throughput, gives you that low latency, and just helps drive the GPU. So you mentioned HPC and 20 years of experience. Now, it used to be that with HPC, you'd have a scientist with a bunch of graduate students setting up some of these big honking machines, but now we're moving into the commercial domain. You don't have graduate students running around. You don't have very low-cost, high-quality people. You've just got a lot of administrators who are nonetheless good people, but with a lot to learn. So how does this relationship actually start bringing AI within reach of the commercial world? And that's exactly where this reference architecture comes in, right? A customer doesn't need to start from scratch. They have a design now that allows them to quickly implement AI. It's something that's really easily deployable. We've fully integrated this solution.
DDN has made changes to our parallel file system appliance to integrate directly within the DGX-1 environment, which makes it even easier to deploy and to extract the maximum performance out of this without having to run around and tune a bunch of knobs or change a bunch of settings. It's really going to work out of the box. And, you know, NVIDIA's done more than just the DGX-1. It's more than hardware. You've done a lot of optimization of different AI toolkits, et cetera. Talk a little bit about that, Darren. Yeah, so, going back to the example I used of researchers in the past with HPC, what we have today are data scientists. Data scientists understand PyTorch. They understand TensorFlow. They understand the frameworks. They don't want to understand the underlying file system, networking, RDMA, InfiniBand, any of that. They just want to be able to come in, run their TensorFlow, get the data, get the results, and just keep turning that crank, whether it's a single GPU or nine DGXs, or as many DGXs as you want. So this solution helps bring that to customers much more easily, so those data scientists don't have to be system administrators. So a reference architecture that makes things easier, but it's more than just for some of these commercial things. It's also the overall ecosystem: new application providers, application developers. How is this going to impact the aggregate ecosystem that's growing up around the need to do AI-related outcomes? Well, I think one point that Darren was getting to there, and one of the big effects, is also as these ecosystems reach a point where they're going to need to scale, right? That's somewhere DDN has tons of experience, right? So many customers are starting off with smaller data sets. They still need the performance. A parallel file system in that case is going to deliver that performance. But then also as they grow, right?
Going from one GPU to nine DGXs is going to demand an incredible amount of performance scalability from their IO, as well as, probably, capacity scalability. And that's another thing that we've made easy with A3I: being able to scale that environment seamlessly within a single namespace, so that people don't have to deal with, again, a lot of tuning and turning of knobs to make this stuff work really well and drive the outcomes that they need as they're successful, right? So in the end, it is the application that's most important to both of us, right? It's not the infrastructure. It's making the discoveries faster. It's processing information out in the field faster. It's doing analysis of the MRI faster, you know, helping the doctors, helping anybody who's using this to really make faster decisions, better decisions. Exactly. And just to add to that, in the automotive industry, you have data sets that are from 50 to 500 petabytes, and you need access to all that data all the time, because you're constantly training and retraining to create better models, to create better autonomous vehicles. And you need the performance to do that. DDN helps bring that to bear, and with this reference architecture simplifies it. So you get the value-add of NVIDIA GPUs plus its ecosystem of software, plus DDN; it's a match made in heaven. Kurt, Darren, thank you very much. Great conversation. To learn more about what they're talking about, let's take a look at a video created by DDN to explain the product and the offering. DDN A3I with NVIDIA DGX-1 is a fully integrated and optimized technology solution that enables and accelerates end-to-end data pipelines for AI and DL workloads of any scale. It is designed to provide extreme amounts of performance and capacity, backed by a jointly engineered and validated architecture. Compute is the first component of the solution.
The DGX-1 delivers over one petaflops of DL training performance, leveraging eight NVIDIA Tesla V100 GPUs in a 3RU appliance. The GPUs are configured in a hybrid cube mesh topology using the NVIDIA NVLink interconnect. DGX-1 delivers linearly predictable application performance and is powered by the NVIDIA DGX software stack. DDN A3I solutions can scale from single to multiple DGX-1s. Storage is the second component of the solution. The DDN AI-200 is an all-NVMe parallel file storage appliance that's optimized for performance. The AI-200 is specifically engineered to keep GPU computing resources fully utilized. The AI-200 ensures maximum application productivity while easily managing tough data operations. It's offered in three capacity options in a compact 2RU chassis. Each AI-200 appliance can deliver up to 20 gigabytes per second of throughput and 350,000 IOPS. The DDN A3I architecture can scale up and out seamlessly over multiple appliances. The third component of the solution is a high-performance, low-latency, RDMA-capable network. Both EDR InfiniBand and 100 gigabit Ethernet options are available. This provides flexibility, ensures seamless scaling, and allows easy integration of the solution within any IT infrastructure. DDN A3I solutions with NVIDIA DGX-1 bring together industry-leading compute, storage, and network technologies in a fully integrated and optimized package that's easy to deploy and manage. It's backed by deep expertise and enables customers to focus on what really matters: extracting the most value from their data with unprecedented accuracy and velocity. Always great to hear about the product. Let's get the analyst perspective. Now I'm joined by Dave Vellante, who's an analyst with Wikibon, a colleague here at Wikibon and co-CEO of SiliconANGLE. Dave, welcome to theCUBE. Dave, a lot of conversation about AI. What is it about today that is making AI so important to so many businesses? Well, I think there's three things, Peter. The first is the data.
We've been on this decade-long Hadoop bandwagon, and what that did is it really focused organizations on putting data at the center of their business. And now they're trying to figure out, okay, how do we get more value out of that? So the second piece is that the technology is now becoming available. AI, of course, has been around forever, but the infrastructure to support it, the GPUs, the processing power, flash storage, deep learning frameworks like TensorFlow and Caffe, has really started to come to the marketplace. So the technology is now available to act on that data. And I think the third is people are trying to get digital right. This is about digital transformation. Digital means data. We talk about that all the time. And every corner office is trying to figure out what their digital strategy should be. They're trying to remain competitive, and they see automation and artificial intelligence, machine intelligence applied to that data as a linchpin of their competitiveness. So a lot of people talk about the notion of data as a source of value, and there's been some presumption that it's all going to the cloud. Is that accurate? Well, it's funny you say that, because as you know, we've done a lot of work on this. And I think the thing that organizations have realized in the last 10 years is that the idea of bringing five megabytes of compute to a petabyte of data is far more viable than the reverse. And as a result, the pendulum is really swinging in many different directions. One being the edge: data's going to stay there. Certainly the cloud is a major force. And most of the data still today lives on premises, and that's where most of the data is likely going to stay. So, no, all the data's not going to go into the cloud. Well, not the central cloud. Yeah, that's right. The central public cloud. You can maybe redefine the boundaries of the cloud. I think the key is you want to bring that cloud-like experience to the data.
We've talked about that a lot in the Wikibon and Cube communities. And that's all about simplification and cloud business models. So that suggests pretty strongly that there is going to continue to be a relationship between choices about hardware infrastructure on premises and success at making some of these advanced, complex workloads run, and scream, and really drive some of those innovative business capabilities. As you think about that, what is it about AI technologies or AI algorithms and applications that has an impact on storage decisions? Well, it's the characteristics of the workloads: oftentimes it's going to be largely unstructured data. There are going to be small files. There are going to be a lot of those small files, and they're going to be kind of randomly distributed. And as a result, that's going to change the way in which people design systems to accommodate those workloads. There's going to be a lot more bandwidth. There's going to be a lot more parallelism in those systems in order to accommodate them and keep those processors busy. We're going to talk more about that, but the workload characteristics are changing, so the fundamental infrastructure has to change as well. So our goal ultimately is to ensure that we can keep these new high-performance GPUs saturated by flowing data to them without a lot of spiky performance throughout the entire subsystem. Have I got that right? Yeah, I think that's right. That's what I was talking about with parallelism; that's what you want to do. You want to be able to load up that processor, especially these alternative processors like GPUs, and make sure that they stay busy. You know, the other thing is, when there's a problem, you don't want to have to restart the job. So you want to have real-time error recovery, if you will. That's been crucial in the high-performance world for a long, long time, because these jobs, as you know, take a long, long time.
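The point about not restarting a failed job from scratch can be illustrated with a minimal checkpoint-and-resume sketch. This is a toy model (the file name, step counts, and failure point are all invented for illustration; real training frameworks checkpoint model weights and optimizer state, not a step counter): a job records its progress as it goes, so a crash only costs the work since the last checkpoint.

```python
import json
import os
import tempfile

ckpt_path = os.path.join(tempfile.gettempdir(), "demo_ckpt.json")
if os.path.exists(ckpt_path):
    os.remove(ckpt_path)  # start the demo from a clean slate

def run_job(total_steps, fail_at=None):
    """Resume from the last checkpoint instead of restarting from step zero."""
    start = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            start = json.load(f)["next_step"]
    for step in range(start, total_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated failure")
        # ... real work (one training iteration) would go here ...
        with open(ckpt_path, "w") as f:
            json.dump({"next_step": step + 1}, f)  # record progress
    return start  # the step this run resumed from

try:
    run_job(10, fail_at=6)  # first run completes steps 0-5, then "fails"
except RuntimeError:
    pass
resumed_from = run_job(10)  # second run picks up where the first left off
print(resumed_from)  # 6: steps 0-5 were not repeated
```

On jobs that run for days or weeks, this is exactly why the savings Dave describes are so large: the cost of a failure is bounded by the checkpoint interval rather than by the whole runtime.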
So to the extent that you don't have to restart a job from ground zero, you can save a lot of money. Yeah, especially as, as you said, we start to integrate some of these AI applications with some of the operational applications that are actually recording the results of the work that's being performed, or the prediction that's being made, or the recommendation that's being proffered. So I think ultimately, if we start thinking about this crucial role that AI workloads are going to have in business, and that storage is going to have on AI, moving more processing close to the data, et cetera, that suggests that there are going to be some changes in the offing for the storage industry. What are you thinking about how the storage industry is going to evolve over time? Well, there's certainly a lot of hardware stuff going on. We always talk about software-defined, but some hardware still matters, right? Obviously flash storage changed the game from spinning mechanical disk, and that's part of this. You're also, as I said before, seeing a lot more parallelism; high bandwidth is critical. You know, a lot of the discussion that we're having in our community is the affinity between HPC, high-performance computing, and big data. I think that was pretty clear, and now that's evolving to AI. So the internal network, things like InfiniBand, are pretty important, and NVMe is coming on to the scene. So those are some of the things that we see. I think the other one is file systems. You know, NFS tends to deal really well with unstructured data and data that is sequential, when you have all this streaming, for example. Exactly, but when you have what we just described, this sort of random nature, and you have the need for parallelism, you really need to rethink file systems. File systems are, again, a linchpin of getting the most out of these AI workloads.
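The parallelism Dave keeps returning to, many workers reading disjoint slices of one shared dataset, can be sketched in a few lines. This is an abstraction only (the file names and worker count are invented; a parallel file system handles the actual striping and locking): each worker takes a round-robin shard of a single shared file list, the way multiple GPUs would pull from one namespace without stepping on each other.

```python
def shard(files, num_workers, worker_id):
    """Round-robin shard: each worker reads a disjoint slice of one
    shared file list, as it would from a single parallel-file-system
    namespace. File names here are invented for illustration."""
    return files[worker_id::num_workers]

files = [f"sample_{i:04d}.dat" for i in range(10)]
num_workers = 4  # e.g. four GPUs pulling data in parallel
shards = [shard(files, num_workers, w) for w in range(num_workers)]

print(shards[0])                    # ['sample_0000.dat', 'sample_0004.dat', 'sample_0008.dat']
print(sum(len(s) for s in shards))  # 10: every file is read exactly once
```

The design point matches the conversation: aggregate bandwidth scales with the number of workers only if the file system can serve all the shards concurrently, which is what distinguishes a parallel file system from a single NFS mount under this access pattern.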
And I think the other is, as we talked about with the cloud model, you've got to make this stuff simple. If we're going to bring AI and machine intelligence workloads to the enterprise, it's got to be manageable by enterprise admins. You're not going to be able to have a scientist deploy this stuff. So it's got to be simpler, cloud-like. Fantastic. Dave Vellante, Wikibon, thanks very much for being on theCUBE. My pleasure. We've had the analyst perspective; now let's take a look at some real numbers. Not a lot of companies have delivered a rich set of benchmarks relating AI, storage, and business outcomes. DDN has. Let's take a look at a video that they've prepared to describe the benchmarks associated with these new products. DDN A3I with NVIDIA DGX-1 is a fully integrated and optimized technology solution that provides massive acceleration for AI and DL applications. DDN has engaged in extensive performance and interoperability testing programs in close collaboration with expert technology partners and customers. Performance testing has been conducted with synthetic throughput and IOPS workloads. The results demonstrate that the DDN A3I parallel architecture delivers over 100,000 IOPS and over 10 gigabytes per second of throughput to a single DGX-1 application container. Testing with multiple containers demonstrates linear scaling up to full saturation of the DGX-1's IO capabilities. These results show concurrent IO activity from four containers with an aggregate delivered performance of 40 gigabytes per second. The DDN A3I parallel architecture delivers true application acceleration. Extensive interoperability and performance testing has been completed with a dozen popular DL frameworks on DGX-1. The results show that with the DDN A3I parallel architecture, DL applications consistently achieve higher training throughput and faster completion times. In this example, Caffe achieves almost eight times higher training throughput on DDN A3I.
As well, it completes over five times faster than when using a legacy file sharing architecture and protocol. Comprehensive tests and results are fully documented in the DDN A3I solutions guide, available from the DDN website. This test illustrates the DGX-1 GPU utilization and read activity from the AI-200 parallel storage appliance during a TensorFlow training iteration. The green line shows that the DGX-1 GPUs achieve maximum utilization throughout the test. The red line shows that the AI-200 delivers a steady stream of data to the application during the training process. In the graph below, we show the same test using a legacy file sharing architecture and protocol. The green line shows that the DGX-1 never achieves full GPU utilization and that the legacy file sharing architecture and protocol fails to sustain consistent IO performance. These results show that with DDN A3I, this DL application on the DGX-1 achieves maximum GPU productivity and completes twice as fast. This test and result is also documented in the DDN A3I solutions guide available from the DDN website. DDN A3I solutions with NVIDIA DGX-1 bring together industry-leading compute, storage, and network technologies in a fully integrated and optimized package that enables widely used DL frameworks to run faster, better, and more reliably. You know, it's great to see real benchmarking data, because this is a very important domain, and there's not a lot of benchmarking information out there around some of these other products that are available. But let's try to turn that benchmarking information into business outcomes. And to do that, we've got Kurt Kuckein back from DDN. Kurt, welcome back. Let's talk a bit about how these high-value outcomes that business seeks with AI are going to be achieved as a consequence of this new performance, faster capabilities, et cetera. So there's a couple of considerations. The first consideration, I think, is just the selection of AI infrastructure itself.
We have customers telling us constantly that they don't know where to start. Now they have readily available reference architectures that tell them, hey, here's something you can implement, get installed quickly, and be up and running your AI from day one. So the decision process for what to get is reduced. Exactly. Number two is that you're unlocking all ends of the investment with something like this, right? You're maximizing the performance on the GPU side. You're maximizing the performance on the ingest side for the storage. You're maximizing the throughput of the entire system. So you're really gaining the most out of your investment there, and not just gaining the most out of the investment but truly accelerating the application. And that's the end goal, right? That's what we're looking for with customers. Plenty of people can deliver fast storage, but if it doesn't impact the application and deliver faster results, cut run times down, then what are you really gaining from having fast storage? And so that's where we're focused. We're focused on application acceleration. So a simpler architecture and faster implementation based on those integrated capabilities, ultimately all resulting in better application performance. Better application performance and, in the end, something that's more reliable as well. Kurt Kuckein, thanks very much for being on theCUBE again. So that ends our prepared remarks. We've heard a lot of great stuff about the relationship between AI, infrastructure, especially storage, and business outcomes. But here's your opportunity to go into the crowd chat and ask your questions, get your answers, share your stories, and engage your peers and some of the experts that we've been talking with about this evolving relationship between these key technologies and what it's going to mean for business. So I'm Peter Burris. Thank you very much for listening. Let's jump into the crowd chat and really engage and get those key issues addressed.