Hello everyone, welcome to this CUBE Conversation here in Palo Alto, California. I'm John Furrier, host of theCUBE. We've got a great conversation on making machine learning easier and more affordable in an era where everybody wants more machine learning and AI. We're featuring Neural Magic and its CEO, a CUBE alumni: Brian Stevens. Great to see you, Brian. Thanks for coming on this CUBE Conversation to talk about machine learning. Hey John, happy to be here again. What a buzz is going on right now — machine learning is one of the hottest topics, AI front and center, kind of going mainstream. We're seeing the success of the next-gen capabilities in the enterprise and in apps. It's a really exciting time. So perfect timing, great to have this conversation. Let's start with taking a minute to explain what you guys are doing over there at Neural Magic. I know there's some history there — neural networks, MIT — but the convergence of what's going on, this big wave hitting, it's an exciting time for you guys. Take a minute to explain the company and your mission. Sure. So as you said, the company is Neural Magic, and it spun out of MIT four-plus years ago, along with some people and some intellectual property. And you summarized it better than I can, because you said we're just trying to make AI easier. But another level of specificity around it is: in the world you have a lot of data scientists focusing on making AI work for their use cases. The next phase of that is optimizing the models they've built. And then it's not good enough just to work on models — you've got to put them into production. So what we do is make it easier to optimize the models that have been developed and trained, and then make it super simple when it comes time to deploy those in production and manage them.
You know, we've seen this movie before with the cloud: you start to see abstractions come out. Same with data science — it used to be a secret art to be a data scientist, and now there's a democratization of data. You're kind of seeing a similar wave with machine learning models — foundational models, some call them. Developers are getting involved, model complexity is still there, but it's getting easier. It's almost like a democratization happening. You've got complexity, you've got deployment challenges, cost, you've got developers involved. So it's like: how do you grow it? How do you get more horsepower? And then how do you make developers productive, right? That seems to be the thread. So where do you see this going? Because there's going to be massive demand for "I want to do more with my machine learning" — but what's the data source? What's the formatting? There's kind of a whole stack to it. What are you guys doing to address this? Can you take us through this wave that's hitting, that everyone's seeing? Yeah — like you said, the democratization of all of it. And that brings me all the way back to the roots of open source, right? Back in the day, you had to build your whole tech stack yourself. A lot of people probably don't remember that. And then that changed: you were always starting from a body of code or a module that was out there in open source. And I think that's what I equate to where AI has gotten with the foundational models you mentioned, which didn't really exist years ago. Back then you really were putting the layers of your models together, and the formulas, and it was a lot of heavy lifting. So there was so much time spent on development, with far too few success cases getting into production to solve a business or technical need. But what's happening is, as these models become foundational, people don't have to start from scratch.
They're actually able to — the avant-garde now is to start with an existing model that almost does what you want, and then apply your dataset to it. So it's really the industry moving forward. And the best thing about it is open source playing in a new dimension, this time in the realm of AI. To us, though — I've spent a career focusing not just on the technical side, but on the consumption of the technology, and how it's still way too damn hard for somebody to actually operationalize the technology that all those vendors throw at them. So I've always been empathetic to the user and what their job is once you hand them great technology. And it's still too difficult, even with foundational models, because there's really this impedance mismatch between the development of the model and then where the model has to live, run, and be deployed — the lifecycle of the model, if you will. So what we've done in our research is develop techniques to introduce what's known as sparsity into a machine learning model that's already been developed and trained. What that sparsity unlocks is making the model so much smaller: in many cases we can make a model 90 to 95% smaller — even smaller than that in research. And we do that in a way that preserves all the accuracy of the foundational model, as you talked about. So now all of a sudden you get this much smaller model that's just as accurate. And the even more exciting part is that we developed a software-based engine called DeepSparse. What that inference runtime does is take the now-sparsified model and run it — and because the model is sparsified, it only needs a fraction of the compute it would have needed otherwise.
So what we've done is make these models much faster and much smaller, and then by pairing that with an inference runtime, you can actually deploy the model anywhere you want on commodity hardware, right? x86 in the cloud, x86 in the data center, Arm at the edge. It's this massive unlock, because you get the state-of-the-art models, but you get them on the IT assets and the commodity infrastructure where all the applications are running today. I want to get into the inference piece and the DeepSparse engine you mentioned, but first I have to ask — you mentioned open source. Dave Vellante and I, with some fellow CUBE alumni, were having a chat about the iPhone–Android moment: proprietary versus open source. You've got a similar thing happening with some of these machine learning models, where there's a lot of proprietary stuff happening and this open source movement is growing. So is there a balance there? Are they all trying to do the same thing? Or is it more like chips — silicon's involved, with all kinds of things going on that are really fascinating from a science standpoint. What's your reaction to that? I think it's like anything else. We talk about AI as if it's been around for decades, but the reality is the deep learning era is recent. When we first started taking models the Brain team was working on at Google and building APIs around them on Google Cloud — and that was the first cloud to even have AI services — that was 2015, 2016. So it's really only been, what, six or seven years since this thing even got liftoff. And with that, everybody's throwing everything at it. There's tons of funded specialty hardware for training or inference, there are new companies, and there are legacy companies getting into AI now — whether it's a CPU company that's now building specialized ASICs for training.
There are new tech stacks, proprietary hardware and software, and a ton of as-a-service offerings. So what was nascent eight years ago really is the Wild West out there now. There's a little bit of everything right now, and I think that makes sense, because the early part of any industry gets really specialized. Showing my age here: in the early 2000s at Red Hat, people weren't running x86 in the enterprise back then — they thought it was a toy — and they certainly weren't running open source. And it made sense that they weren't, because it didn't deliver what they needed at that time. They needed specialty stacks, they needed expensive hardware that did what an Oracle database needed it to do, they needed proprietary software. But what happens is that commoditizes, through both hardware and open source. And the same thing is just starting with AI. Yeah, and that's a great point to call out, because in any industry, timing's everything, right? I remember back in the late 80s and 90s, AI stuff was going on, and there just wasn't enough horsepower, wasn't enough tech — like you mentioned, some of the processing. So AI is this industry full of experts who have been scratching that itch for decades. And now with cloud and custom silicon, the fundamental tech at the lower end of the stack, if you will, is significantly more performant. It's there, you've got more capabilities. Now you're kicking into more software, faster software. So it just seems like we're at a tipping point where finally it's here — that AI moment where machine learning, and now data, is involved. So this is where I see organizations really jumping in, with the CEO mandate: hey team, make ML work for us, go figure it out, it's got to be an advantage for us. And now they're going: okay boss, we will.
So what do they do? What steps does an enterprise take to get machine learning into their organization? 'Cause you know it's coming down from the boards: how does this work for us? Yeah — what we're seeing is like anything else, whether that was open source adoption or cloud adoption. It usually starts with one person, and increasingly that's the CEO, who realizes they're falling further behind the competition because they're not leaning in fast enough. But typically it really comes down to a strong practitioner inside the organization, right? One who realizes the number-one goal isn't just training more models and being proprietary about it — it's really about understanding the art of the possible. Somebody who's grounded in the art of the possible: what deep learning can do today, and what business outcomes you can deliver if you employ it. And there are well-proven paths through that. It's just that, because of where the field has been, it's not that industrialized today. Every ML project is very snowflakey, right? That was kind of the early days of open source as well. So we're just starting to get to the point where it's getting easier and more industrialized: fewer steps, less burden on developers, less burden on the deployment side. And we're trying to bring that whole last mile by saying: you know what? Deploying deep learning and AI models should be as easy as deploying your application, right? You shouldn't have to take an extra step to deploy an AI model. It shouldn't require new hardware, a new process, or a new DevOps model — it should be as simple as what you're already doing. What's the best practice for companies to effectively bring an acceptable level of machine learning and performance into their organizations?
Yeah, I think the number-one starter, like you hinted at before, is they have to know the use case. In most cases you're going to find, across every industry, that the problem's been tackled by some company, right? The models already exist, so the best practice is fine-tuning: taking that existing foundational model and fine-tuning it on your unique dataset. If you're in medical instruments, it's not good enough to identify that there's a medical instrument in the picture — you've got to know what type of medical instrument. So there's always a fine-tuning step. And we've created open source tools that make it easy for you to do two things at once: you can fine-tune that existing foundational model — whether that's in the language space or the vision space — on your dataset, and at the same time you get an optimized model out the other end. So you get both things. We're freeing you from worrying about the complexity of that transfer learning, if you will. And we're freeing you from worrying about: where am I going to deploy the model? Where does it need to be — a device, the edge, a data center, the cloud? What kind of hardware is there, and is there enough of it? We're liberating you from all of that, because what you can count on is that there will always be commodity CPUs, in abundance, wherever you want to deploy — because that's where your application is. So all of a sudden we've freed you from that whole step. Okay, let's get into DeepSparse, because you mentioned that earlier. What inspired the creation of DeepSparse, and how does it differ from other solutions in the market? Sure. Where it's unique starts with two things.
One: what the industry is pretty good at on the optimization side is this thing called quantization, which turns big numbers into small numbers — lower precision, so a 32-bit representation of a weight becomes an 8-bit one. And the industry is good at cutting out layers, which also takes away accuracy. What we figured out is to take those industry best-practice techniques and combine them with unstructured sparsity. Reducing the model by 90 to 95% in size is great because it's smaller — but then the DeepSparse engine, when you deploy it, looks at that model and says: because it's so much smaller, I no longer have to run the parts of the model that have been sparsified away. What that means is you no longer need a supercomputer to run models, because there's not nearly as much math and processing as there was before the model was optimized. So now every CPU platform out there has an enormous amount of compute relative to what's left, because we've sparsified the rest away. You can pick up your laptop and you have enough compute to run state-of-the-art models. And you need a software engine to do that, because it ignores the parts of the model that don't need to run — which is what specialized hardware can't do. The second part is that it then turns into a memory-efficiency problem. It's really about getting the model loaded into the cache of the computer and keeping it there, never having to go back out to memory. So our techniques are both: we reduce the model size, we only run the parts of the model that matter, and we keep it all in cache. That's what gets us to these low latencies, and we're able to increase the CPU processing by an order of magnitude. Yeah, and that low latency is key, and you've got developers coding super fast. We'll get to the developer angle in a second.
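To make the two optimizations Brian describes concrete, here's a back-of-the-envelope sketch in plain Python — the layer size, threshold, and helper names are illustrative, not Neural Magic's actual tooling — showing how quantization shrinks each weight from 32 bits to 8, and how unstructured sparsity lets a sparsity-aware runtime skip most of the multiplies entirely:

```python
import random

random.seed(0)

# A toy "layer": 1,000 fp32 weights (4 bytes each).
weights = [random.uniform(-1.0, 1.0) for _ in range(1000)]

# Quantization: 32-bit floats -> 8-bit integers (4x smaller per weight).
scale = max(abs(w) for w in weights) / 127
quantized = [round(w / scale) for w in weights]  # each value fits in an int8

# Unstructured sparsity: zero out the 90% of weights smallest in magnitude.
threshold = sorted(abs(w) for w in weights)[int(len(weights) * 0.9)]
sparse = [w if abs(w) >= threshold else 0.0 for w in weights]

# A sparsity-aware engine only does math for the surviving weights;
# dense hardware would still grind through all 1,000 multiplies.
multiplies_dense = len(weights)
multiplies_sparse = sum(1 for w in sparse if w != 0.0)

print(f"dense multiplies:  {multiplies_dense}")
print(f"sparse multiplies: {multiplies_sparse}")
print(f"fp32 size: {4 * len(weights)} bytes, int8 size: {len(quantized)} bytes")
```

The point of the sketch is the ratio: roughly a tenth of the arithmetic survives, which is why a software engine that actually skips the zeros can get away with commodity CPU compute.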
I want to just follow up on the motivation behind DeepSparse, because we were talking earlier, before we came on camera, about the old days — not too long ago, virtualization and VMware abstracted the OS away from the hardware, right in the server. Server virtualization changed the game, and that basically invented cloud computing as we know it today. So we see that abstraction here: there seems to be a motivation behind abstracting machine learning models away from the hardware, and that seems to bring advantages to the AI growth. Can you elaborate — is that true? What's your comment? I think it's true for us. I don't think the industry's there yet, honestly, because the industry is still of the mindset that if I used these expensive GPUs to train my model, then I want to run my model on those same expensive GPUs — because there's often no separation between the people developing AI and the people who have to manage and deploy it where you need it. The reality is that's everything we're after. Do we decrease the cost? Yes. Do we make the models smaller? Yes. Do we make them faster? Yes. But I think the most amazing power is that we've turned AI into a Docker-based microservice. Who in the industry wants to deploy their apps the old way — on an OS without virtualization, without Docker, without Kubernetes, without microservices, without service mesh, without serverless? You want all those tools for your apps. By converting AI models so they can run inside a Docker container — with no apologies around latency and performance, because it's faster — you get the best of that whole world you just talked about, which is what we're calling software-delivered AI. So now AI lives in the same world as organizations that have gone through that digital cloud transformation: AI fits into their app infrastructure.
And this is where the abstraction concepts matter. When you have these inflection points — the convergence of compute, data, and the machine learning that powers AI — it really becomes a developer opportunity, because when applications and businesses actually go through digital transformation, their businesses are completely transformed. There is no IT; developers are the application, they are the company, right? So AI will be part of whatever business or app is out there. So there's an application-developer angle here, Brian. Can you explain how they're going to use this? Because you mentioned Docker containers and microservices — this really is an insane flipping of the script for developers. So what does that look like? Well, AI has come so fast. So you figure there's my app team, and here's my AI team, right? They're in different places, and the AI team is dragging in specialized infrastructure in support of that as well. And that's not how app developers think. They've run on fungible infrastructure that's abstracted and virtualized forever, right? So in addition to fitting into that world they like, we've also made it simple for them: they don't have to be machine learning engineers to experiment with these foundational models and do transfer learning on them. We've done that so they can do it in a couple of commands. And it has a simple API: they can either link it into their application directly as a library and make function calls, or they can stand it up as a standalone, scale-up/scale-out inference server — they get two choices. It really fits into the world of the modern developer, whether they're just using Python or C or otherwise. We've made it simple, so as opposed to having to go learn something else, they kind of don't have to.
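Brian's "AI as a Docker-based microservice" point can be sketched as an ordinary container build. Everything below — the base image, model filename, server script, and port — is a hypothetical illustration of the packaging shape he describes, not Neural Magic's published artifacts:

```dockerfile
# Hypothetical packaging of a sparsified model as a plain CPU microservice.
FROM python:3.10-slim

# The inference runtime installs like any other library: no GPU base image,
# no driver layers, just a pip-installable wheel.
RUN pip install deepsparse

# Ship the sparsified model alongside the app code, like any other asset.
COPY sparse-model.onnx /models/sparse-model.onnx
COPY server.py /app/server.py

EXPOSE 8080
CMD ["python", "/app/server.py"]
```

Once the model is just a file inside an ordinary image, everything he lists — Kubernetes, service mesh, scale-out — applies to it for free.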
In a way, though, it's almost made it hard, because when we talk to people for the first time, they expect it to be the old way: how do you look like a piece of hardware? Are you compatible with my existing hardware that runs my models? No, we're not — because you don't need that stack anymore. All you need is a library call to make your prediction. And that's it. Well, we were joking on Twitter the other day with someone asking: is AI a pet or cattle, right? Because people love their AI bots right now. So I'd say pets for now — but there's going to be a lot of AI. On a more serious note, you mentioned microservices. Will DeepSparse have an API for developers, and what does that look like? What do I do as a developer? What does the roadmap look like? Yeah, it really can go in both modes. It can go into a standalone server mode, where it handles a REST API and can scale out with Kubernetes as the workload comes up, and scale back — try to make hardware do that. Hardware may scale back, but it's just sitting there dormant. With this, it scales the same way your application needs to. And for a developer, they basically just pip install deepsparse — one command to do the install — and then they make two calls, really. The first is a library call the app makes to create the model; the model's really already trained, but it's called a model-create call. And the second is a call to make a prediction. It's as simple as that. So AI is as simple as using any other library the developers are already using, which sounds hard to fathom, because it really is that simplified. Software-delivered AI. Okay, that's a cool thing. I believe in it personally — I think that's the way to go. I think there are going to be plenty of hardware options.
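The two-call workflow Brian walks through can be sketched like this. The class and method names below are illustrative stand-ins, stubbed so the shape of the workflow is visible and runnable — they are not the actual library API, and the model path is made up:

```python
# Illustrative stand-in for the workflow: one call to create the
# already-trained, sparsified model, then one call per prediction.
# (Class, method, and file names here are hypothetical.)

class SparseModel:
    def __init__(self, model_path: str):
        # A real runtime would load the sparsified model file, compile it
        # for the local CPU, and keep the weights resident in cache.
        self.model_path = model_path

    def predict(self, text: str) -> dict:
        # A real runtime would run inference, skipping sparsified weights.
        return {"input": text, "label": "positive", "score": 0.97}

# Call 1: create the model (it's already trained -- no training step here).
model = SparseModel("models/sparse-bert.onnx")

# Call 2: make a prediction, like any other library call.
result = model.predict("Shipping was fast and the product works great.")
print(result["label"])
```

The design point is that nothing here looks like ML infrastructure: no device selection, no driver, no serving framework — just a constructor and a method call inside the app.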
If you look at the advances of the cloud players, there's more silicon coming out, more GPUs, more instance types — everything's out there right now. So the question is: how does that evolve in your mind? Because that seems to be key. You have open source projects emerging. What path does this take? Is there a parallel mental model you see, Brian, that's similar? You mentioned open source earlier — is it more like a VMware virtualization thing, or more of a cloud thing? Is it going to evolve along a trajectory that looks similar to what we've seen in the past? Yeah. When I got involved with the company, I reasoned about it the way we all do when deciding to do something full-time. I thought about it and asked: where will the industry eventually get to, right, to fully realize the value of deep learning, and what's plausible as it evolves? And to me — I know it's the old adage that software eats hardware — it truly was: we can solve these problems in software. There's nothing special happening at the hardware layer in processing AI; the reality is that it's just early in the industry. So the view we had was that the best place the industry will eventually be is the liberation of being able to run AI anywhere. You're really not democratizing the model if you can't run it anywhere you want — and these models are getting bigger and bigger with these large language models — if you've got to go buy a cluster to run the thing. The democratization comes when, all of a sudden, that model can be consumed anywhere, on demand, without planning, without provisioning, wherever infrastructure is.
And so I think that, with or without Neural Magic, that's where the industry will go and get to. I think we're the leaders in getting it there, right? Because we're more advanced on these techniques. Yeah, and your background too — you've seen OpenStack pre-cloud, you saw open source grow, and it's still growing exponentially. You have a similar dynamic with machine learning models growing, and they're also segmenting into almost an ML stack, or foundational models, as we've talked about. So you're starting to see the formation of tooling, inference — a lot of components coming. It's almost a stack. It literally is like an operating-system problem space: how do you run things, how do you link things, how do you bring things together? Is that what's going on here? Is this like a data-modeling operating environment — kind of a Red Hat-type thing going on? Yeah, I think there is — I've thought about that too. I think there's a role for distribution, because the industrialization of this isn't happening fast enough. And I go back to: every customer, every user, does it in their own way. Everyone's a little bit of a snowflake, and I think that's okay. There are definitely plenty of companies that want to come in and say: well, this is the way it's going to be, and we'll industrialize it as long as you do it our way. The reality is, technology doesn't get industrialized by one company saying "do it our way." That's why we've taken the approach through open source: you haven't really industrialized it if you say "we made it simple, but you always have to run AI here," right? You only really industrialize it if you break it down into components that are simple to use and that work integrated into the stack the way you want them to. So to me, that first principle was getting things into microservices and Docker containers that can run on VMware, on OpenShift, in the cloud, at the edge.
And so that's the real part we're making happen. The other part — and I do agree — is that I think it's going to quickly move to being less about the model and less about the training of the model: with transfer learning and your dataset applied to the model, we're taking away the complexity of optimization and liberating deployment to be anywhere. And I think the last mile, John, is going to be around the MLOps around that. Now that we've turned it into a software problem, it's easy to think of software as a point release, but that's not the reality, right? It's a lifecycle. So ML very much brings in: what is the lifecycle of that deployment? And honestly, you get into more interesting conversations once you've deployed in a Docker container — around model drift and accuracy as the dataset changes and the users change. How do you, from an MLOps perspective, become aware of that, send a signal back, and retrain? That's where I think more of the innovation is going to start to move. Yeah, and the software problem — the software opportunity as well — is developer-focused. And if you look at the cloud-native landscape now: similar stacks, a lot of components being developed, a lot of things to stitch together, a lot of things automating under the hood, a lot of developer-productivity conversations. I think this is going to go down that same road. I want to get your thoughts, because developers will set the pace, and this is something that's clear in this next wave: developer productivity. They're the de facto standards bodies; they will decide — microservices, check; API, check. Now, the skills gap is going to be a problem, because it's relatively new.
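The model-drift loop Brian points to — watch the live data, compare it to what the model was trained on, and signal a retrain — can be sketched with a simple distribution check. The feature values and the retrain threshold below are made up for illustration; real MLOps pipelines use richer statistics, but the loop has this shape:

```python
import statistics

def drift_score(train_values, live_values):
    """Crude drift signal: how far the live mean has moved from the
    training mean, measured in training standard deviations."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    return abs(statistics.mean(live_values) - mu) / sigma

# A feature's values at training time vs. what production now sees.
training_feature = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8, 10.1, 10.4]
live_feature = [14.0, 13.5, 14.2, 13.8, 14.1, 13.9]

score = drift_score(training_feature, live_feature)
RETRAIN_THRESHOLD = 3.0  # arbitrary illustrative cutoff

if score > RETRAIN_THRESHOLD:
    print(f"drift={score:.1f}: send signal back, kick off retraining")
else:
    print(f"drift={score:.1f}: model still healthy")
```

In practice this check runs continuously against the deployed container's inputs, and crossing the threshold is the "signal back" that triggers the retraining leg of the lifecycle.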
So: model sprawl, model sizes, proprietary versus open — there has to be a way to crunch that down, like DevOps did, to get the developer out of the muck. So what's your view? Are we in early days like that? And the young kid in college — studying CS or whatever degree — who jumps into this with both feet, what are they doing? I'll probably give the unpopular answer, which is that it's happening so fast that it's going to get kind of boring fast. Meaning, yeah, you could go to school, go to MIT, and go all-in on becoming a model architect — inventing the next model, the layers, combining them, et cetera, which operators to use, building a model that's bigger than the last one and trains faster, right? And there will be those people, the ones building the engines — the same way I grew up as an infrastructure software developer. There aren't a lot of companies that hire those anymore, because they're all sitting inside of three big clouds. So you'd better be a good app developer. But I think what you're going to see is this: before, you had to be everything — if you were going to use infrastructure, you had to know how to build infrastructure. The same idea is quickly exiting ML: the notion that to use ML in your company, you'd better be great at every aspect of ML, including every intricacy inside the model and every operation it's doing. That's quickly changing. In the future you're going to start from a starting point. You're not going to be cracking open these GPT models; you're going to just pull them off the shelf, fine-tune them, and go. You don't have to invent it. You don't have to understand it.
And I think that's going to be a pivot point in the industry for what the future of a data scientist, ML engineer, or researcher looks like. That's how I think the outcome is going to be determined. I mean, you mentioned doing it yourself — it's like what an SRE is for Google, where the servers scale huge. So yeah, in the beginning it gets boring, you get obsolete quickly, but that means it's progressing. And the scale becomes huge — that's where I think it's going to be interesting, when we see that scale. Yeah, I think that's right. And what I've always said — much to the chagrin of my ML team — is that I want every developer to be as adept at taking advantage of ML as an ML engineer, right? It's got to be that simple. And I think it's getting there, I really do. Well, Brian, great to have you on theCUBE here for this CUBE Conversation, as part of the startup showcase that's coming up. You and your company are going to be featured on the upcoming AWS Startup Showcase on making machine learning easier and more affordable as more machine learning models come in. You guys have DeepSparse and some great technology — we're going to dig into that next time. I'll give you the final word right now: what do you see for the company? What are you guys looking for? Give a plug for the company. Oh, a plug that I haven't already worked in? You're hiring engineers, I assume, from MIT and other places. I think the biggest thing is that we're on the developer side — we're here to make this easy. The majority of inference today is already on CPUs, believe it or not, as much as we like to talk about specialized hardware. The majority is already on CPUs, and we're basically bringing 95% cost savings to CPUs through this acceleration. But we're trying to do it in a way that's community-first.
So I think the shout-out would be: come find the Neural Magic community and engage with us, and you'll find a thousand other like-minded people in Slack who are willing to help you, as well as our engineers — and let's go take on some successful AI deployments. Exciting times. This is, I think, one of the pivotal moments: next-gen data, machine learning, and now you're starting to see AI not just be that chatbot — customer support or some basic natural language processing thing. You're starting to see real innovation. Brian Stevens, CEO of Neural Magic, bringing the magic here. Thanks for the time, great conversation. Thanks for joining me. Cheers, thank you. Okay, I'm John Furrier, host of theCUBE here in Palo Alto, California, for this CUBE Conversation with Brian Stevens. Thanks for watching.