Hello, and welcome to this CUBE Conversation. I'm John Furrier, host of theCUBE here in our Palo Alto studios. We're featuring OctoML, and I'm with Luis Ceze, Chief Executive Officer and co-founder of OctoML. Thanks for joining us today. Luis, great to see you. Last time we spoke was at re:MARS, Amazon's event that kind of puts machine learning, automation, robotics, and space all together. Great to see you. Great to see you again, John. I really have good memories of that interview. That was definitely a great time. Great to chat with you again. The world of ML and AI, machine learning, AI is really hot. Everyone's talking about it. It's really great to see that advance. So I'm looking forward to this conversation. But before we get started, introduce who you are and OctoML. Sure, I'm Luis Ceze, co-founder and CEO at OctoML. I'm also a professor of computer science at the University of Washington. You know, OctoML grew out of our effort on the Apache TVM project, which is a compiler and runtime system that enables folks to run machine learning models on a broad set of hardware, at the edge and in the cloud, very efficiently. You know, we grew that project and grew that community, and definitely saw it was solving a pain point there. And then we built OctoML. OctoML is about two and a half years old now, and the mission of the company is to enable customers to deploy models in the cloud very efficiently: do it quickly, run fast, and run at a low cost, which is something that's especially timely right now. I like to point out also, for the folks, because they should know, that you're also a professor in the computer science department at the University of Washington, a great program there. This is really an inflection point for AI and machine learning. The computer science industry has been waiting for decades to advance AI, and now there's all this new cloud computing, all the hardware and silicon advancements, GPUs.
This is the perfect storm. And you know, in computer science now we're seeing an acceleration. Can you share your view? You're obviously a professor in the department, but also an entrepreneur. This is a great time for computer science. Explain why. Absolutely. It's just the confluence of advances in what computers can do as devices to compute information, plus advances in AI that enable applications we thought were highly futuristic, and they're right here today. You know, AI that can generate photorealistic images from descriptions, that can write text that's pretty good, that can help augment human creativity in a really meaningful way. So seeing the confluence of those capabilities and the creativity of humankind in new applications is just extremely exciting, both from a researcher point of view and from an entrepreneur point of view, right? What should people know about these large language models we're seeing with ChatGPT and Google, which has a lot of work going on in that area? There's been a lot of work recently. What's different now about these models, and why are they so popular and effective now? What's the difference between now and, say, five years ago? Oh yeah. It's a huge, huge inflection in their capabilities, and I would say emergent behavior, right? As these models got more complex, and our ability to train and deploy them got to this point, they really crossed the threshold into doing things that are truly surprising, right? In terms of generating text, summarizing text, expanding text, and exhibiting what to some may look like reasoning. They're not quite reasoning fundamentally; they're generating text that looks like reasoning, but they do it so well that it feels like it was done by a human, right?
So I'd say the biggest change is that now they can actually do things that are extremely useful for business and people's lives today, and that wasn't the case five years ago. So that's on the model capabilities side, and it's being paired with huge advances in computing that give us an actual line of sight to deploying this at scale, right? And that's where we come in, by the way. Yeah, I want to get into that. And also, you know, the fusion of data, integrating data sets at scale. That's another thing we're seeing a lot of now. It's not just some siloed, pre-built data modeling; there's a lot of agility and a lot of new capability for integrating data. How is that impacting the dynamics? Yeah, absolutely. So I would say that the ability to take data that already exists and train a model to do something useful with it, and more interestingly, to take baseline foundational models and, with a little bit of data, turn them into something that can do a specialized task really, really well, has created this really fast proliferation of really impactful applications, right? Every company now is looking at this trend, and I'm seeing a lot of it. I think every company will rebuild their business with machine learning, if they're not already doing it, and the folks that aren't will probably be dinosaurs; they'll be out of business. This is a real business transformation moment where machine learning and AI go mainstream, and I think it's just the beginning. This is where you guys come in, and you guys are poised for handling this frenzy to change business with machine learning models. How do you help customers as they look at this transition, to get from concept to production with machine learning? Great questions. Yeah, so I would say it's fair to say there's a bunch of models out there that can do useful things right out of the box, right?
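The specialization Luis describes, taking a frozen baseline model and adapting it with a little bit of data, can be sketched in miniature. Everything below is a toy for illustration: the "embedding" is a stand-in character hash, not a real pretrained model, and only a small linear head gets trained.

```python
import math

def embed(text: str, dim: int = 8) -> list:
    """Stand-in for a frozen foundation model: a fixed, untrained
    character-hash embedding. In practice this would be a large
    pretrained network whose weights never change."""
    vec = [0.0] * dim
    for i, ch in enumerate(text.lower()):
        vec[(ord(ch) + i) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def train_head(examples, labels, dim=8, lr=0.5, epochs=200):
    """Train only a small logistic-regression head on top of the
    frozen features -- the 'little bit of data' specialization step."""
    w, b = [0.0] * dim, 0.0
    feats = [embed(x, dim) for x in examples]
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
            g = p - y                       # gradient of the log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, text):
    z = sum(wi * xi for wi, xi in zip(w, embed(text))) + b
    return 1 if z > 0 else 0
```

Training touches only `dim + 1` parameters, which is why a handful of labeled examples can be enough; the heavy lifting stays in the frozen extractor.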
And also, the pre-trained models themselves have improved quite a bit. So the challenge has now shifted to customers; everyone is looking to incorporate AI into their applications. So what we do for them is, first of all, help them do that quickly, without needing highly specialized, difficult-to-find engineering, and, very importantly, help them do it at a cost that's accessible, right? All of these fantastic models that we just talked about use an amount of computing that's just astronomical compared to anything else we've done in the past. That means the costs that come with them are also very, very high. So it's important to enable customers to incorporate AI into their applications and use cases in a way they can do with the people they have, and at a cost they can afford, so that they can have the maximum impact they possibly can. And finally, we help them deal with hardware availability. As you know, even though we've made a lot of progress in making computing cheaper and cheaper, even to this day you can never get enough, and getting an allocation, getting the right hardware to run these incredibly hungry models, is hard. We help customers deal with hardware availability as well. Yeah, for the folks watching, if you search YouTube, there's an interview we did last year at re:MARS. I mentioned that earlier, just a great interview. You talked about this hardware independence abstraction. I want to get into that, because if you look at all the foundation models out there right now that are getting traction, you're seeing two trends: proprietary and open source, and obviously open source always wins in my opinion. But there's this iPhone-versus-Android moment that one of your investors at Madrona talked about: one is proprietary hardware, very specialized and high performance, and the other is open source.
This is an important distinction, and you guys are hardware independent. Explain what all of this means. Yeah, great set of questions. First of all, OpenAI, of course, created ChatGPT, and they offer an API to run these models that does amazing things, but customers have to send their data over to OpenAI, run the model there, and get the output. Now, there are open source models that can do amazing things as well. Open source models typically don't lag behind these proprietary, closed models by more than, say, six months or so. And that means that enabling customers to take the models they want and deploy them under their own control is something very valuable, because, one, you don't have to expose your data externally. Two, you can customize the model even more to the things you want it to do. And three, you can run on infrastructure that can be much more cost-effective than paying somebody else's cost and markup, right? And where we help is essentially enabling customers to take machine learning models, say an open source model, and automate the process of putting them into production, optimize them to run with the right performance, and, more importantly, give them the independence to run where they need to run, where they can run best, right? Yeah, and I also point out all the time that there's never any stopping the innovation of hardware and silicon, and you're seeing more of it coming into cloud computing. So being hardware independent has some advantages. And if you look at OpenAI, for instance, you mentioned ChatGPT, I think this is interesting because everyone is scratching their head going, okay, I need to move to this new generation. What's your pro tip and advice for businesses that want to move to machine learning? How do they get started?
What are some of the considerations they need to think about to deploy these models into production? Yeah, great set of questions. First of all, I'm sure they're very aware of the kinds of things they want to do with AI, right? It could be interacting with customers, automating interactions with customers. It could be finding issues in production lines. It could be making it easier to produce content. So customers, users, will have an idea of what they want to do. Now, from that, you can actually determine what kind of machine learning models would solve the problem, would fit that use case. But that's when the hard part begins, right? Once you find a model, identify the model that can do the thing you want to do, you need to turn it into a thing you can deploy. So how do you go from a machine learning model that does the thing you need, to a container with the right executables, the right artifacts, that can actually go out and be deployed, right? We've seen customers do that on their own, and it's quite a bit of work, and that's why we're excited about the automation we can offer to turn that into a turnkey process. Luis, talk about the use cases, if you don't mind me doubling down on the previous answer. You've got existing services, and then there are new AI applications, AI-first applications. What are the use cases with existing stuff, and with the new applications being built? Yeah, I mean, existing stuff is, for example, how do you do very smart search and auto-completion when you're editing documents? Very, very smart search of documents, summarization of text, expanding bullets into prose in a way that you don't have to spend as much human time. Those are some of the existing applications, right? Some of the new ones are truly AI-native ways of producing content.
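The model-to-artifact step described above can be sketched in miniature. This is purely illustrative: a real pipeline would produce a container image with compiled executables, and the `model.bin`/`config.json` file names are invented for the example.

```python
import io
import json
import tarfile

def package_model(weights: bytes, config: dict) -> bytes:
    """Bundle model weights plus runtime config into one gzipped tar,
    a stand-in for the deployable artifact a pipeline would emit."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in [("model.bin", weights),
                           ("config.json", json.dumps(config).encode())]:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def unpack_model(artifact: bytes) -> dict:
    """Read the artifact back, as a deployment step would."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(artifact), mode="r:gz") as tar:
        for member in tar.getmembers():
            out[member.name] = tar.extractfile(member).read()
    return out
```

The point of the round trip is that everything the runtime needs, weights plus configuration, travels as one versionable unit instead of loose files.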
For example, there's a company we share investors with, Runway, that's sort of an AI-first way of editing and creating visual content. You have a video, and you can say, make this video look like it's night instead of day, or remove that dog in the corner. You can do that in a way you couldn't do otherwise. So there are definitely AI-native use cases. And yet another one is in life sciences: there's quite a bit of advancement in AI-based therapies and diagnostic processes that are designed using automated processes. We're just scratching the surface there; there are huge opportunities, right? Talk about the inference and AI-in-production angle here, because cost is a huge concern. When you look at your hardware flexibility there, I can see how that could help. But is there a cost freight train that can get out of control here if you don't deploy properly? Talk about the scale problem around costs in AI. Yeah, absolutely. Very quickly, one thing people tend to do is look at the cost of training, which has really high dollar amounts, and over-index on that. But what you have to think about is that for every model that's actually useful, you're going to train it once and then run it a large number of times in inference. That means that over the lifetime of a model, the vast majority of the compute cycles, and the cost, is going to go to inference. And that's what we address, right? To give you some idea, if you're talking about using a large language model today, say it's going to cost a couple of cents for a 2,000-word output. If you have a million active users a day, if you're lucky enough to have that, this cost can balloon very quickly to millions of dollars a month, just in inference costs, assuming you actually have access to the infrastructure to run it, right?
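Luis's back-of-envelope math can be checked directly. The per-response price and the million-users figure are the ones quoted in the conversation; the responses-per-user rate is an assumption for illustration.

```python
# Back-of-envelope inference cost: ~$0.02 per ~2,000-word response,
# a million daily active users, before any optimization.

def monthly_inference_cost(cost_per_response, daily_active_users,
                           responses_per_user_per_day=1.0, days=30):
    """Total inference spend over a month."""
    return (cost_per_response * daily_active_users
            * responses_per_user_per_day * days)

base = monthly_inference_cost(0.02, 1_000_000)       # $600,000/month
busy = monthly_inference_cost(0.02, 1_000_000, 5.0)  # $3,000,000/month
```

Even one response per user per day lands at hundreds of thousands of dollars a month; at a handful of responses per user it crosses into the millions, which is the surprise Luis warns about.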
That means that if you don't pay attention to these inference costs, it's definitely going to be a surprise, and it affects the economics of the product it's embedded in, right? This is something getting quite a bit of attention right now: how you do search with large language models. If you don't pay attention to the economics, you can get a surprise and have to change the business model there. Yeah, I think that's important to call out, because you don't want a runaway cost structure where you architected it wrong, and the next thing you know, you've got to unwind it. I mean, it's more than technical debt; it's actual debt, it's real money. So talk about some of the dynamics with customers. How are they architecting this? How do they get ahead of that problem? What do you guys do specifically to solve it? Yeah, well, we help customers, first of all, be hyper-aware: understand what it's going to cost them to deploy models into production, and see the possibilities of deploying a model with a different cost structure, right? That's where the ability to have hardware independence is so important, because once you have hardware independence, after you optimize the models, you have a new dimension of freedom to choose what the right throughput per dollar is for you, and what the options are. And once you make that decision, you want to automate the process of putting it into production. So the way we help customers is by showing very clearly, in their use case, how they can deploy their models in a much more cost-effective way. There's a case study we put out recently showing a 4x reduction in deployment costs, by doing a mix of optimization and choosing the right hardware.
How do you address the concern that someone might say, Luis, hey, I don't want to degrade performance and latency, and I don't want the user experience to suffer. What's the answer there? Yeah, two things. First of all, all of the manipulations we do to the model are to turn it into efficient code without changing the behavior of the model. We wouldn't degrade the experience of the user by having the model be wrong more often; we don't change that at all. The model behaves the way it was validated for. And the second thing is, user experience with respect to latency is all about a maximum. You could say, I want the model to run in 50 milliseconds or less. If it's much faster than 50 milliseconds, you're not going to notice the difference, but if it's slower, you will. So the key here is: how do you find a set of deployment options where you're not overshooting performance in a way that leads to costs with no additional benefit? This provides a very significant set of choices where you can optimize for cost without degrading customer experience, right? And user experience. Yeah, and I'd also point out that the large language models, the ChatGPTs of the world, Dave Vellante and I were talking on his Breaking Analysis about them being over 10x more computationally intensive in their capabilities. So this is a huge thing. And also supply chain: some people can't get servers these days, by the way, or hardware. Right, and even more interestingly, they don't grow on trees, John. GPUs are not the kind of thing where you plant an orchard and harvest a bunch more whenever you want to increase capacity; these things take a while, and you can't increase supply overnight.
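The "don't overshoot the latency budget" selection Luis describes can be sketched as a simple filter-then-minimize. The hardware names, latencies, and prices below are all invented for illustration.

```python
# Among hardware options that meet the latency budget (e.g. 50 ms),
# pick the cheapest per request. Running far faster than the budget
# buys nothing a user can notice, so only cost matters once the
# latency target is satisfied.

def pick_deployment(options, latency_budget_ms):
    eligible = [o for o in options if o["latency_ms"] <= latency_budget_ms]
    if not eligible:
        raise ValueError("no option meets the latency budget")
    return min(eligible, key=lambda o: o["cost_per_1k_requests"])

options = [
    {"name": "gpu-large", "latency_ms": 12,  "cost_per_1k_requests": 0.90},
    {"name": "gpu-small", "latency_ms": 38,  "cost_per_1k_requests": 0.35},
    {"name": "cpu-batch", "latency_ms": 240, "cost_per_1k_requests": 0.08},
]
best = pick_deployment(options, latency_budget_ms=50)  # picks "gpu-small"
```

With a 50 ms budget, the big GPU is overshoot (fast but pricey) and the batch CPU misses the target, so the mid-tier option wins; loosen the budget and the cheapest hardware becomes viable.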
So being able to live within the cycles that are available to you is not just important for cost, but also important for being able to scale and serve more users at whatever pace they come, right? You know, it's really great to talk to you, and congratulations on all of this. I'm looking forward to the startup showcase; we'll be featuring you guys there. But I want to get your personal opinion as someone in the industry, and also someone who's been in computer science for your whole career. Computer science has always been great, and there are more people enrolling in computer science, more diversity than ever before, but there are also more computer-science-related fields. How is this opening up computer science? And where is AI going within the science? Can you share your vision on the aperture, the landscape, for CS students and opportunities? Yeah, I think it's fair to say that computing has become embedded in pretty much every aspect of human life these days, right? For everything. And AI has been an integral component of computer science for a while. The advances that happened in the last 10, 15 years in AI have enabled new applications and re-energized how people see what computers can do. You know, there's this picture in our department that shows computer science at the center, called the flower picture, with all the different petals: life sciences, social sciences, mechanical engineering, and all these other things. And I feel like you can put AI at that center as well. You see AI touching all these applications: AI in healthcare, in diagnostics, AI in discovery in the sciences, right? But then also AI doing things that humans won't have to do anymore, so they can do better things with their brains, right?
So it's permeating every single aspect of human life, from intellectual endeavor to day-to-day work, right? Yeah, and I think ChatGPT and OpenAI have really created a mainstream view that everyone sees value in it. You could be in the data center, you could be in bio, you could be in healthcare; every industry sees value. So this brings up what I call the horizontally scalable use of this, and that opens up the conversation: what's going to change from this? Because if you go horizontally scalable, which is a cloud concept, as you know, that's going to create a lot of opportunities and some shifting of how you think about architecture around data, for instance. What's your opinion on what this will do to the role of architecting platforms, and the role of data specifically? Yeah, good question. There's a lot in there, but I should have added to the previous question that you can use AI to build better AI as well, which is what we do, and what other folks are doing too. The point I wanted to make here is that it's pretty clear you have a cloud-focused component with an edge-focused counterpart: you have AI models running both in the cloud and at the edge, right? So the ability to run your AI model where it runs best also has an inherent data advantage, say from a privacy point of view. You could say, hey, I want to run something strictly locally, such that I don't expose the data to the infrastructure, and you know the data never leaves you, never leaves the device. Now, you can imagine things that are already starting to happen, like doing some forms of training and model customization in the model architecture and the system architecture themselves, such that you do this as close to the user as possible.
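One well-known pattern for this kind of close-to-the-user learning, combining data across sites without centralizing it, is federated averaging. Here is a toy numeric sketch: the "local training" is just a mean, and all the site data is invented.

```python
# Toy federated averaging: each edge site computes a local update on
# its own data; only the updates are aggregated centrally, weighted
# by how much data each site has, so raw data never leaves the edge.

def local_update(data):
    """Stand-in for on-device training: here, just the site's mean.
    The raw data never leaves this 'device'."""
    return sum(data) / len(data)

def federated_average(site_datasets):
    """Aggregate only the local updates, weighted by each site's
    data volume -- the central server never sees raw data."""
    updates = [(local_update(d), len(d)) for d in site_datasets]
    total = sum(n for _, n in updates)
    return sum(u * n for u, n in updates) / total

sites = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
global_model = federated_average(sites)  # equals the mean of all six points
```

Because the aggregation is weighted by site size, the result matches what centralized training on the pooled data would give for this toy "model", which is the advantage-without-the-privacy-cost trade Luis describes next.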
And there's something called federated learning, which has been around for some time and is finally happening: how do you learn from data in multiple places? You do some common learning, and then you send the model to the edges, where it gets refined for its final use, in a way that you get the advantage of aggregating data without the disadvantage of privacy issues and so on. It's super exciting. Those are some of the considerations, yeah. It's a super exciting area around data infrastructure, data science, computer science. Luis, congratulations on your success at OctoML. You're in the middle of it. And the best thing about it is that businesses are looking at this and really reinventing themselves. If a business isn't thinking about restructuring itself around AI, it'll probably be out of business. So this is a great time to be in the field. Thank you for sharing your insights here on theCUBE. Thank you very much, John. It's always a pleasure talking to you; I always have a lot of fun. And we both speak really fast, I can tell. I know. With the transcript available, we'll integrate it into our CUBE GPT model that we have. That's right. Great, thank you, John, great to see you. Bye. Okay, this is theCUBE, I'm John Furrier here in Palo Alto, CUBE Conversation. Thanks for watching.