John: Hello everyone, welcome to theCUBE's presentation of the AWS Startup Showcase. This is season three, episode one. The focus of this episode is AI/ML: top startups building foundational models, infrastructure, and AI. These are great topics, super relevant, and it's part of our ongoing coverage of startups in the AWS ecosystem. I'm your host, John Furrier with theCUBE. Today we're excited to be joined by Jay Marshall, VP of business development at Neural Magic. Jay, thanks for coming on theCUBE.

Jay: Hey John, thanks so much. Thanks for having us.

John: We had a great CUBE conversation with you guys. This is very much about the company focus. It's a featured presentation for the Startup Showcase. Machine learning at scale is the topic, but in general it's really machine learning and AI, and how to get started, because everybody is retooling their business. Companies that aren't retooling their business right now with AI first will be out of business, in my opinion. You're seeing a massive shift. This is truly the beginning of the next-gen machine learning and AI trend. You're seeing ChatGPT, everyone sees that, that went mainstream. But this is just the beginning. It's just scratching the surface of this next generation of AI, with machine learning powering it, and with all the goodness of cloud, cloud scale, and how horizontally scalable it is. The resources are there. You've got the edge. Everything's perfect for AI, because data infrastructure is exploding and the value of AI is in the applications. It's a super topic. What do you guys see in this general area of opportunity right now in the headlines? I'm sure your phone must be ringing off the hook, metaphorically speaking, or emails and meetings and Zooms. What's going on over there at Neural Magic?

Jay: No, absolutely. And you pretty much nailed most of it. With my background, what we've seen for the last 20-plus years is just getting enterprise applications built and delivered at scale, and obviously AWS and the cloud have done amazing things to help accelerate that. We only figured out in the last five or so years how to do that productively and efficiently from an operations perspective; development and operations teams even came up with DevOps, right? But now we have this new kind of persona and new workload that developers have to talk to, and it has to be deployed on those IT ops solutions. So you pretty much nailed it. Folks are saying, well, how do I do this? These big generational models, or foundational models as we're calling them, are great, but enterprises want to do that with their data, on their infrastructure, at scale, at the edge. So for us, yeah, we're helping enterprises accelerate that by optimizing models and then delivering them at scale in a more cost-effective fashion.

John: You know, I think one of the benefits we saw with OpenAI, along with the open source models and the other models that are more proprietary, is that it shows the world this is really happening, right? It's a whole other level. And there are also new landscape maps coming out. You've got generative AI, and you've got the foundational models, the large language models (LLMs). Where do you guys fit into the landscape? Because you're in the middle of this. How do you talk to customers when they say, "I'm going down this road, I need help, I'm going to stand up this new AI infrastructure and these applications"? Where do you fit in the landscape?

Jay: Right, and really the answer is both.
I think today, when it comes to what for some folks would still be considered cutting edge around computer vision and natural language processing, a lot of our optimization tools and our runtime are based around the most common computer vision and natural language processing models. So your YOLOs, your BERTs, your DistilBERTs, and what have you. We work to help optimize those, and we've seen great performance and great value for customers trying to get those into production. But when you get into the LLMs, and you mentioned some of the open source components there, our research teams have been right in the trenches with those. So with the GPT open source equivalent, OPT, we've been able to take a multi-hundred-billion parameter model and sparsify it, optimize it down, shaving away a ton of parameters, and run it on smaller infrastructure. I think with the evolution here, all this stuff came out in the last six months in terms of being turned loose into the wild. But we're staying in the trenches with folks so that we can help optimize those as well, and not require the heavy compute, the heavy cost, the heavy power consumption as those models evolve. So we're staying right in with everybody while these are being built, but trying to get folks into production today with things that deliver business value today.

John: Jay, I really appreciate you coming on theCUBE, and before we came on camera you said you'd just come off a customer call. I know you've got a lot of activity. What specific things are you helping enterprises solve? What kinds of problems? Take us through the spectrum: from people jumping into the deep end of the pool to people coming in and starting out slow. Can you scope the kinds of use cases and problems that are emerging, that people are calling you for?

Jay: Absolutely. I think if I break it down to your startups, or what I might call AI native, to steal from "cloud native" years ago: for that group, it's pretty much part and parcel of how they already run. If you have a data science team and an ML engineering team, you're building models, you're training models, you're deploying models, and you're seeing firsthand the expense of trying to do that at scale. So it's really just a pure operational efficiency play. They speak natively to our tools, which we're building in the open source. So it's really helping with the optimization of the models they've built, and then giving them an alternative to the expensive proprietary hardware accelerators they'd otherwise have to run them on.

Now on the enterprise side, it varies, right? You have some AI native folks there that already have these teams, but you also have the AI-curious, right? They want to do it, but they don't really know where to start. For them, we have an open source toolkit that can help you get into this optimization, and then that inferencing runtime, purpose-built for CPUs. It means you don't have to worry about questions like: do I have a hardware accelerator available? How do I integrate that into my application stack? If I don't already know how to build this into my infrastructure, do my IT ops teams know how to do this, and what does that runway look like? How do I cost for this? How do I plan for this? When it's just x86 compute, we've been doing that for a while, right? So it obviously still requires more, but at least it's a little bit more predictable.
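To make the "sparsify" idea Jay describes concrete, here is a minimal, illustrative sketch of magnitude pruning using PyTorch's built-in utilities. This is not Neural Magic's actual method: their tooling applies gradual pruning recipes during training to preserve accuracy, while this one-shot version only shows mechanically what "shaving away a ton of parameters" means.

```python
# Illustrative sketch only: one-shot magnitude pruning with PyTorch.
# Real sparsification recipes prune gradually while fine-tuning so
# accuracy is preserved; a sparsity-aware runtime can then skip the
# zeroed weights for speed.
import torch
import torch.nn.utils.prune as prune
from torchvision.models import resnet50

model = resnet50(weights=None)  # untrained model, for demonstration

# Zero out 80% of weights (by L1 magnitude) in every conv/linear layer.
for module in model.modules():
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.8)
        prune.remove(module, "weight")  # bake the zeros in permanently

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"overall sparsity: {zeros / total:.1%}")
```

Pruned this way, without retraining, the model's accuracy would collapse; the point of recipe-driven sparsification is to remove weights gradually during training so the network adapts, leaving a model a sparsity-aware CPU runtime can execute much faster.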
John: It's funny you mention AI native. "Born in the cloud" was a phrase that was out there, and now you have startups that are born-in-AI companies. So you have that kind of cloud vibe going on. Lift and shift was a big discussion, then you had cloud native making it all work in the cloud. Is there an analogous split here? Because we're seeing a lot of people take some of these tools and apply them to existing systems. It's not really lift and shift, but it's kind of like bolting AI onto something else, versus starting AI first, or native AI.

Jay: Absolutely, that's a great question. Where I'd pull back to is a lot of retail-type scenarios, where for five, seven, nine years or more, a lot of these folks have already had data science teams. They've been doing this for quite some time. The difference is the introduction of these neural networks and deep learning, right? Those kinds of models are a bit of a paradigm shift. I was obviously trying to have fun with the term AI native, but I think it's more that some folks came up in that neural network world, so it's a little more second nature, whereas for some traditional data scientists starting to get into neural networks, the complexity, the training overhead, and all the aspects of getting a model finely tuned, the hyperparameterization and so on, add a layer of complexity they're just not as used to dealing with. So our goal is to help make that easy, and then of course make it easier to run anywhere you have standard infrastructure.

John: The other point I'd bring out, and I'd love to get your reaction to it, is that it's not only the neural network teams, the people who have been focused on that. If you look at the data ops and lately AI ops markets, a lot of data engineering, a lot of scale, the folks who have been in that data tsunami cloud world have kind of been in this, right? They've been experiencing it.

Jay: No doubt. It's funny, the data lake concept, right? And you've got data oceans now; the metaphors just keep growing on us. But where it is valuable is in trying to shift the mindset. I've always been a fan of some of the naming shifts. With AWS, they always talk about purpose-built databases, and I always liked that, because you don't have one database that can do everything. Even the ones that say they can, you still have implementation-detail differences. So you sit back and ask, what is my use case, and then which database will I use for it? I think it's similar here. When you're building those data teams, if you don't have folks doing data engineering, that data harvesting and preprocessing, you've got to do all of that before a model is even going to care about it. So yeah, it's definitely a central piece of this as well. And whether or not you're going to be AI native, as you're making your way on that journey, data is definitely a huge component of it.

John: Yeah, you would have loved our Supercloud event. We talked about naming, and data meshes were talked about a lot. You started to see the control plane layers of data.
I think that was the beginning of what I saw as the data infrastructure shift toward being horizontally scalable. So I have to ask you: with Neural Magic, your customers and the people who are prospects for you guys are probably asking a lot of questions, because the general thing we see is, how do I get started? Which GPU do I use? There are a lot of things that are, I won't say technical, or targeted toward people living in that world, but as the mainstream enterprises come in, they're going to need a playbook. What do you guys see? What do you offer your clients when they come in, and what do you recommend?

Jay: Absolutely. Where we hook in specifically tends to be on the training side. So again: I've built a model, now I want to really optimize that model. And then on the runtime side, when you want to deploy it, we run that optimized model. That's where we're able to provide value. We even have a labs offering where we pair our engineering teams with a customer's engineering teams, and we can actually help with most of that pipeline. So even if it's something where you have a data set and you want some help picking a model, some help training it, some help deploying it, we can help there as well. There's also a great partner ecosystem out there, including a lot of folks in this startup showcase, that extends beyond us into your earlier comment around data engineering, downstream IT ops, or the all-up MLOps umbrella. So we can absolutely engage with our labs, and then of course partners, which are always key to this. You are spot on. They talk about a hockey stick; with the rate of innovation in this space right now, it's almost like a flat wall. So we do have a lot of folks wanting to go straight from curious to native, and that's where the partner ecosystem comes in so hard, because there just isn't anybody, or any team, out there that literally does everything from "here's my blank database" to "I want an API that does all the stuff," right? That's a big chunk. We can definitely help with the model-to-delivery piece.

John: Well, you guys are obviously a featured company in this space. Talk about the expertise. A lot of companies are, I won't say faking it until they make it, but you can't really fake security, and you can't really fake AI, right? So there's going to be a learning curve, and there'll be a few startups that come out of the gate early. You guys are one of them. Talk about what you have as expertise as a company, why you're successful, and what problems you solve for customers.

Jay: No, I appreciate that. We love to tell the story of our founder, Nir Shavit. He's been a professor at MIT for 20 years, did a lot of work on multi-core processing before there were even physical multi-cores, and even did a stint in computational neurobiology in the 2010s. As for the impetus for this whole technology, he has a great talk out on YouTube where he explains that, through that work, he realized that the way neural networks are encoded, and how they're executed by ramming data layer by layer through these HPC-style platforms, is not actually analogous to how the human brain works. So on one side we're building neural networks and trying to emulate neurons, but we're not really executing them that way.
So for our team, and one of the co-founders is also ex-MIT, that was the birth of the idea: why can't we leverage the CPU platform, which has those really fat, fast caches attached to each core, and find a way to break the model down so we can execute things in parallel rather than sequentially? There are a lot of amazing talks out there that show the "magic," if you will, the magic part of Neural Magic. That's the foundational layer of all the engineering we do here.

In terms of how we bring it to reality for customers, I'll give one customer example: a large retailer with a people-counting application. A very common application, and that customer has been able to literally double the number of cameras being run with the same amount of compute. So from one-to-one to two-to-one; business leaders usually like that math, right? So we're able to show pure cost savings, but even performance-wise, with some of the common models, like your ResNets and your YOLOs, we can actually perform better than hardware-accelerated solutions. I hate to just dumb it down to better, faster, cheaper, but from a commodity perspective, that's where we're accelerating.

John: It's not a bad business model: make things easier to use, make them faster, and reduce the steps it takes to do stuff. That's always going to be a good market. Now, you guys have DeepSparse, which we talked about in our CUBE conversation prior to this interview. It delivers ML models through software, so it decouples them from the hardware, right? That's going to drive a cost advantage, and from a deployment standpoint it must be easier too. Can you share the benefits? Is it on the cost side? Is it more about deployment? What are the benefits of DeepSparse when you decouple the software from the hardware for ML models?

Jay: You actually hit them both, because that really is the primary value. We're still so early, and I came from this world in a prior life, doing Java development with WebSphere, WebLogic, and Tomcat in the open source, right? When we were trying to do innovation, we had innovation buckets, because everybody wanted to be on the web and have their app in a browser. We got all the money we needed to build something and show, hey, look at the thing on the web. But when you had to get it into production, that was the challenge. So to what you're speaking to here: in this situation, we're able to show that we're just a Python package. You can install it on the operating system itself, or we also have a containerized version you can drop onto any container orchestration platform, so ECS or EKS on AWS. And so you get all the autoscaling features. When you think about a world where you have everything from real-time inferencing to after-hours batch inferencing, the fact that you can autoscale that hardware up and down, and it's CPU-based, so you're paying by the minute instead of paying by the hour on a lower-cost shelf, it covers everything from pure cost to, again, letting my standard IT team say, hey, here's the Kubernetes container, and it just runs on the infrastructure we're already managing. So you have operational savings, and many times even performance. I can throw CPUs at it if I want to.
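To make the "just a Python package" point concrete, here is a hedged sketch using the open source deepsparse package's pipeline interface. The task name and model stub below are illustrative assumptions, not confirmed values; check Neural Magic's current documentation for real model stubs.

```python
# Hedged sketch: CPU inference with the deepsparse package
# (pip install deepsparse). The task name and zoo stub are
# illustrative assumptions -- consult current docs for real ones.
from deepsparse import Pipeline

# Pipeline.create fetches a sparsified model and compiles it for the
# local CPU; no GPU or hardware accelerator is involved.
sentiment = Pipeline.create(
    task="sentiment_analysis",
    model_path="zoo:some/pruned-distilbert-stub",  # hypothetical stub
)

print(sentiment("This episode of theCUBE was fantastic."))
```

Because it runs as an ordinary Python process, the same artifact can be wrapped in a container and scheduled on ECS or EKS like any other CPU workload, which is the autoscaling story Jay describes.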
John: Yeah, so it's easier from a deployment standpoint too. And you don't have that blank-check situation where you don't know what's on the back end on the cost side. You control the actual hardware, and you can manage that supply chain.

Jay: Exactly. And keep in mind, the other thing that sometimes gets lost in the conversation, depending on where a customer is: you and I remember a world where even the round trip to the cloud and back was a problem for folks, right? We're used to extremely low latency, and some of these workloads absolutely still adhere to that. But there are workloads where the latency isn't as important. We actually even provide tuning, so that if we're giving you five milliseconds of latency and you don't need that, you can tune it back: less CPU, lower cost. Now, throughput and other things come into play, but that's the kind of configurability and flexibility we give for operations.

John: All right, so why should I call you if I'm a customer or prospect of Neural Magic? What problem do I have, and when do I know I need you guys? When do I call you in, and what does my environment look like? What are some of the signals that would tell me I need Neural Magic?

Jay: No, absolutely. In general, it's any neural network. The process I mentioned before, called sparsification, is an optimization process we specialize in, and any neural network can be sparsified. So if it's a deep learning, neural network type model, if you're trying to get AI into production, and you have cost concerns, or even performance concerns, I certainly hate to be too generic and say, hey, we'll talk to everybody, but really, in this world right now, if it's a neural network you're trying to get into production, we are offering an at-scale, performant, deployable solution for deep learning models.

John: And a neural network you would define as what? Just devices that are connected that need to know about each other? What's the state-of-the-art definition of a neural network, for customers who may think they have one, or might not know they have a neural network architecture?

Jay: That's a great question. Basically, machine learning models that fall under this category: you hear about transformers a lot, or I mentioned the YOLO family of computer vision models, or natural language processing models like BERT. If you have a data science team, or even developers — I used to call myself a nine-to-five developer because I worked in the enterprise, right? Hey, we found a new open source framework; I used Spring back in the day and had to go figure it out. There are developers pulling these models down and figuring out how to get them into production. In all of those kinds of situations, if it's a machine learning model of the deep learning variety, that's specifically where we shine.
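Picking up Jay's earlier point about tuning latency back for lower CPU cost, here is a hedged sketch of what that knob can look like with the deepsparse engine interface. The argument names are assumptions based on the package's documented API at the time, and the model path is a placeholder; verify both against current docs.

```python
# Hedged sketch of the latency/cost trade-off: cap the CPU cores the
# runtime may use. Argument names are assumptions from the deepsparse
# Engine API; "model.onnx" is a placeholder for a sparsified model.
from deepsparse import Engine

engine = Engine(
    model="model.onnx",  # placeholder path to an ONNX model
    batch_size=1,
    num_cores=2,         # fewer cores: more latency, lower cost
)

# inputs would be a list of numpy arrays matching the model's inputs:
# outputs = engine(inputs)
```

The design idea is that latency, core count, and cost move together: if a workload tolerates slower responses, fewer cores (and so cheaper instances) can serve it.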
John: Okay, let me pretend I'm a customer for a minute. I have all these videos, all these transcripts, all these people we've interviewed, CUBE alumni, and I say to my team, let's AI-ify it, let's sparsify theCUBE. What do I do? Do my developers have to get involved? Are they going to say, oh, I uploaded it to the cloud, do I use a GPU? There's a thought process here, and I think a lot of companies are going through that exercise of "let's get on this AI." How can it help our business? What does that progression look like? Take me through that example. I made up that CUBE example, but we do have a lot of data, we have large data sets, we have people, and we're connected to the internet. So we're kind of seeing that there's a neural network in there. I think every company might have a neural network in place.

Jay: Well, I was going to say, you all probably represent the standard enterprise more than most, because even the enterprise is going to have a ton of video content and a ton of text content. So it's a great example. With that kind of sea, or I'll even use that term data lake again, of data that you have, you're probably going to want to set up machine learning pipelines. They're going to do all of the preprocessing, from the raw data into the format that, say, a YOLO would actually use, or let's say BERT for natural language processing. So you have all these transcripts, right? We would do a preprocessing pass to turn those into the file format that BERT, the machine learning model, knows how to train on. Those are the preprocessing steps. Then for the training itself, we enable what's called sparse transfer learning. Transfer learning is a very popular method of training from existing models. So we would retrain that BERT model with your transcript data, now in the proper format from preprocessing, and we'd have a BERT natural language processing model trained on your data. Then we deploy that onto the DeepSparse runtime, and now you can pass text through it — I should say "pass," because you're not going to ask it questions the way you would a GPT, although we can do that too — you pass text through the BERT model and it gives you answers back. That could be things like sentiment analysis or text classification. You just call the model, and when you pass text through it, you get the answers: better, faster, cheaper, to use that reference again.

John: Okay, so we could create a CUBE bot to give us questions on the fly from the AI bot, you know, from our previous guests.

Jay: And I will tell you, using that as an example: I mentioned OPT before, the open source version of ChatGPT. Typically that requires multiple GPUs to run. Our research team, as I mentioned earlier, has been able to sparsify it over 50% already and run it on only a single GPU. So in that situation, you could train OPT with that corpus of data and do exactly what you say.

John: Actually, we could use Alexa. We could use Lex to respond back with voice. How about that?

Jay: We'll do an API call and we'll actually have an interactive Alexa-enabled bot.

John: Okay, we're going to be a customer; let's put it on the list. But this is a great example of what you guys call software-delivered AI, a topic we chatted about in theCUBE conversation. This really is a developer opportunity. This is the convergence of data growth and the restructuring of how data is going to be horizontally scalable, meeting developers. There's an AI developer model going on right now, which is kind of unique.
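As a concrete illustration of the preprocessing pass Jay describes, here is a hedged sketch that flattens raw transcript files into a simple text-classification dataset. The directory layout, JSON schema, and labels are all hypothetical, invented for illustration.

```python
# Hypothetical sketch of transcript preprocessing for a BERT-style
# text classifier. Paths, schema, and labels are invented; adapt to
# the shape of your real data.
import csv
import json
from pathlib import Path

rows = []
for path in Path("transcripts").glob("*.json"):    # hypothetical layout
    episode = json.loads(path.read_text())
    for turn in episode["turns"]:                  # assumed schema
        rows.append({"text": turn["text"],
                     "label": turn.get("topic", "general")})

# Most transformer fine-tuning flows (including sparse transfer
# learning recipes) can consume a simple CSV like this.
with open("train.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "label"])
    writer.writeheader()
    writer.writerows(rows)
print(f"wrote {len(rows)} examples")
```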
Jay: It is, John. And I'll tell you what's interesting, and folks don't always think of it this way: the AI magical goodness is now getting pushed into the middle, where the developers and IT are operating. That paradigm, although it seems obvious to some folks, if you've been around for 20 years, all that plumbing is a thing, right? What we basically help with is, when you deploy the DeepSparse runtime, we have a very rich API footprint. The developers can call the API, IT ops can run it, or, to your point, it's developer-friendly enough that you could deploy our off-the-shelf models. We have something called the SparseZoo, where we publish free, optimized, pre-sparsified models. Developers can literally grab those right off the shelf, with the training they've already had, put them right into their applications, and deploy them as containers. So yeah, we enable that as well.

John: It's interesting: DevOps was infrastructure as code, and last season we had a series on data as code, which we kind of coined. This is data as code. This is a whole other level of opportunity, where developers just want programmable data and apps with AI. This is a whole new way. Well, great stuff. Our news team at SiliconANGLE and theCUBE said you guys had a little bit of a launch announcement you wanted to make here on the AWS Startup Showcase. So Jay, you have something you want to launch?

Jay: Yes, and thank you, John, for teeing me up. I'm going to try to put this in the vein of an AWS main-stage keynote launch, okay? So we're going to try this out. A lot of our product has obviously been built on top of x86; I've been sharing that for the past 15 minutes or so. And with that, we're seeing a lot of acceleration for folks wanting to run on commodity infrastructure. But we've had customers and prospects and partners tell us that ARM, and all of its variants, is very compelling, both cost- and performance-wise, and obviously with edge. They wanted to know if there was anything we could do from a runtime perspective with ARM. So we got to work. It's a hard problem to solve, because the instruction set for ARM is very different from the instruction set for x86, and our deep tensor column technology has to work with those lower-level instructions. But the engineering team has been hard at it, and we are happy to announce here at the AWS Startup Showcase that the DeepSparse inference runtime now has support for AWS Graviton instances. So it's no longer just x86, it's also ARM, and that obviously also opens the door to edge and further out the stack. That "optimize once, run anywhere" story now really opens up. It is early access, so if you go to neuralmagic.com/graviton you can sign up, but we're excited to now get onto the ARM side of the fence as well, on top of Graviton.
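Tying back to Jay's point about the rich API footprint: once the runtime is deployed as a container or server, developers interact with it over plain HTTP. Below is a hedged sketch of such a call; the port and route are assumptions about the deepsparse server's defaults, not confirmed values.

```python
# Hedged sketch: calling a deployed inference server over HTTP.
# The port (5543) and /predict route are assumptions about the
# deepsparse server's defaults; verify against current docs.
import requests

resp = requests.post(
    "http://localhost:5543/predict",  # assumed endpoint
    json={"sequences": "Great quarter for cloud revenue."},
)
resp.raise_for_status()
print(resp.json())  # e.g. labels and scores from the model
```

This is the decoupling point in practice: the same request works whether the container behind the endpoint is running on x86 or, per the announcement, on a Graviton (ARM) instance.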
John: That's awesome. Our news team is going to jump on that news; we'll get it right up, a little scoop here on the Startup Showcase. Jay Marshall, great job. That really highlights the flexibility you guys have when you decouple the software from the hardware. And again, we're seeing open source driving a lot more in AI ops now, with machine learning and AI, so to me that makes a lot of sense, and congratulations on that announcement. We have a final minute or so left. Give a summary of what you guys are all about, and put a plug in for the company and what you're looking to do. I'm sure you're probably hiring like crazy. Take the last few minutes to give a plug for the company and a summary.

Jay: No, I appreciate that so much. So yeah, join us at neuralmagic.com. Part of what we didn't spend a lot of time on here is our optimization tools. We're doing all of that in the open source: it's called SparseML, and I mentioned SparseZoo briefly. We really want the data science and ML engineering communities to join us out there. And again, the DeepSparse runtime is free to use for trial purposes and for personal use, so you can run all of this on your own laptop or on an AWS instance of your choice. We are now live in the AWS Marketplace, so push-button deploy, come try us out, and reach out to us at neuralmagic.com. And again, sign up for the Graviton early access.

John: All right, Jay Marshall, vice president of business development at Neural Magic, talking about performant, cost-effective machine learning at scale. This is season three, episode one, focusing on top startups building foundational models, data infrastructure, and AI. AI native. I'm John Furrier with theCUBE. Thanks for watching.