John Walls: Well, hello everybody, John Walls here, continuing our coverage of AWS re:Invent 2022 on theCUBE. We continue our segments here in the Global Startup Program, which of course is sponsored by the AWS Startup Showcase. And with us to talk about Anyscale is the co-founder and CEO of the company, Robert Nishihara. Robert, good to see you, thanks for joining us.

Robert Nishihara: Yeah, great, thank you.

John Walls: You bet, glad to have you aboard. So let's talk about Anyscale first off, for those at home who might not be familiar with what you do, because you've only been around for a short period of time, you're telling me.

Robert Nishihara: The company's about three years old now.

John Walls: Three years old. So tell us all about it.

Robert Nishihara: Yeah, absolutely. One of the biggest things happening in computing right now is the proliferation of AI. AI is spreading throughout every industry and has the potential to transform every industry. But the thing about doing AI is that it's incredibly computationally intensive. So if you want to do AI, you're probably not just doing it on your laptop. You're doing it across many machines, many GPUs, many compute resources. And that's incredibly hard to do. It requires a lot of software engineering expertise, infrastructure expertise, and cloud computing expertise to build the software infrastructure and distributed systems to really scale AI across the cloud, and to do it in a way where you're really getting value out of AI. So that's the problem statement: AI has tremendous potential, but it's incredibly hard to do because of the scale required. And what we're building at Anyscale is really trying to make that easy, trying to get to the point where, as a developer, if you know how to program, say in Python on your laptop, then that's enough.
Then you can do AI, get value out of it, scale it, and build the kinds of incredibly powerful AI applications that companies like Google and Facebook can build, but without having to learn about all of the distributed systems and infrastructure. We'll handle that for you. If we're successful, that's what we're trying to achieve here.

John Walls: Yeah, what makes AI so hard to work with? I mean, you talk about the complexity, a lot of moving parts, literally moving parts. But what is it, in your mind, that gets people's eyes spinning a little bit when they look at the great potential, but also at the downside of maybe having to work their way through a quagmire of sorts?

Robert Nishihara: So the potential is definitely there. But it's important to remember that a lot of AI initiatives fail, something like 80 or 90% don't make it out of the research or prototyping phase and into production. So, some of the things that are hard about AI, and the reasons AI initiatives can fail. One is the scale required: it's one thing to develop something on your laptop; it's another thing to run it across thousands of machines. Another is the transition from development and prototyping to production. Those have very different requirements, and a lot of times it's different teams within a company, with different tech stacks and different software. We hear companies say that once they've prototyped and developed a model, it can take six to twelve weeks to get that model into production, and that often involves rewriting a lot of code and handing it off to another team. So the handoff from development to production is a big challenge. And then lastly, a big challenge is around flexibility. AI is a fast-moving field.
You see new developments, new algorithms, new models coming out all the time. And a lot of teams we work with have built infrastructure, or are using products out there to do AI, but they've found that it locks them into rigid workflows or specific tools, and they don't have the flexibility to adopt new algorithms, strategies, or approaches as they come out. Their developers want the flexibility to use the latest tools and the latest strategies. So those are some of the main problems we see: scalability, how do you move easily between development and production and back, and how do you remain flexible and adopt the best tools as they come out. Those are often the reasons people start to use Ray, which is our open source project, and Anyscale, which is our product.

John Walls: So tell me about Ray, the open source project. I think you said you worked on it at Berkeley.

Robert Nishihara: That's right, yeah. Before this company, I did a PhD in machine learning at Berkeley. And one of the challenges we were running into ourselves, we were trying to do machine learning. We actually weren't infrastructure or distributed systems people, but in order to do machine learning, we found ourselves building all sorts of ad hoc tools and systems to scale the machine learning, to be able to run it in a reasonable amount of time and to leverage the compute we needed. And it wasn't just us. Machine learning researchers and practitioners everywhere were building their own tooling and infrastructure, and that was one of the things we felt was really holding back progress. So that's how we slowly and gradually got into saying, hey, we could build better tools here.
We could try to make this easier to do, so that all of these people don't have to build their own infrastructure and can focus on the actual machine learning applications they're trying to build. So we started Ray, this open source project for scaling Python applications and machine learning applications. Initially we were running around Berkeley trying to get all of our friends to try it out, adopt it, and give us feedback, and if it didn't work, we would debug it right away. That gradually turned into more companies adopting it, bigger teams adopting it, external contributors contributing back to the open source project and making it better. And before you knew it, we were hosting meetups, giving talks, running tutorials, and the project was just taking off. So that's a big part of what we continue to do today at Anyscale: really fostering this open source community, growing the open source user base, and making sure Ray is just the best way to scale Python applications and machine learning applications.

John Walls: So this was a graduate school project, right? You said, on your way to getting your doctorate. And now you're commercializing it, being able to offer it. First off, what a journey that was, right? I mean, who would have thought? I guess you probably did think that at some point, but.

Robert Nishihara: No, you know, when we were working on Ray, we actually didn't anticipate it becoming a company, or at least we just weren't looking that far ahead. We were really excited about solving this problem of making distributed computing easy, getting to the point where developers just don't have to learn about infrastructure and distributed systems, but get all the benefits.
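[Editor's note] To make "scaling a Python function without thinking about infrastructure" concrete, here is a minimal sketch of the programming model being described, fanning a plain Python function out across a local worker pool. This uses only the standard library for illustration; it is not Ray's API (Ray generalizes the same pattern across many machines with decorators like `@ray.remote`), and `score` is a hypothetical stand-in workload.

```python
# Sketch: fan a plain Python function out across local workers.
# This illustrates the programming model discussed above; it is NOT
# Ray's API, and score() is a stand-in for an expensive computation.
from concurrent.futures import ThreadPoolExecutor

def score(x: int) -> int:
    # Stand-in for an expensive model evaluation.
    return x * x

def run_parallel(inputs):
    # Submit every input to the pool and gather results in order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(score, inputs))

if __name__ == "__main__":
    print(run_parallel(range(5)))  # [0, 1, 4, 9, 16]
```

The appeal of this model is that the caller writes ordinary Python; swapping the local pool for a cluster backend is the part frameworks like Ray take off the developer's plate.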
And of course it wasn't until later on, as we were graduating from Berkeley and wanted to keep taking the project further and really solve this problem, that we realized it made sense to start a company.

John Walls: So help me out, and I might have missed this, so I apologize if I did. In terms of Ray as that building block, essential for your ML or AI work down the road, what is it doing for me? Or what will it allow me to do, in either one of those realms, that I can't do now?

Robert Nishihara: Yeah, so why use Ray versus not using Ray? I think the answer is that if you're doing AI, you need to scale. If you don't find that to be the case today, you probably will tomorrow, or the day after that. It's increasingly a requirement, not an option. And if you're trying to build these scalable applications, you're either going to use Ray or something like Ray, or you're going to build the infrastructure yourself. And building the infrastructure yourself is a long journey. So why take that on, right? Many of the companies we work with don't want to be in the business of building and managing infrastructure, because they want their best engineers building their product, getting their product to market faster.

John Walls: I want you to do that for me.

Robert Nishihara: Right, exactly. We can really accelerate what these teams can do, and make the infrastructure something they just don't have to think about. That's why you would choose to use Ray.

John Walls: Okay, between AI and ML, are they different animals in terms of what you're trying to get done, or what Ray can do?

Robert Nishihara: Yeah, and actually I should say, it's not just new teams that are starting out that are using Ray. Many companies that have already built their own infrastructure will then switch to using Ray.
To give you a few examples: Uber runs all their deep learning on Ray. OpenAI, which is really at the frontier of training large models and pushing the boundaries of AI, trains their largest models using Ray. Companies like Shopify rebuilt their entire machine learning platform using Ray. And they all started somewhere else; this isn't the V1 of their machine learning infrastructure. They did it a different way before, and this is the second version or the third iteration of how they're doing it. In the case of Uber, just to give you one example, they built a system called Horovod for scaling deep learning across a bunch of GPUs. Now, as they scaled GPU training, the bottleneck shifted away from training and toward data ingest and preprocessing, and they wanted to scale that ingest and preprocessing on CPUs. Horovod is a deep learning framework; it doesn't do data ingest and preprocessing on CPUs. But if you run Horovod on top of Ray, you can scale training on GPUs, and Ray has another library called Ray Data that lets you scale the ingest and preprocessing on CPUs. You can pipeline them together, and that allowed them to train larger models on more data. Take one example, ETA prediction: when you get in an Uber, it tells you what time you're supposed to arrive. That uses a deep learning model called DeepETA. Before, they were able to train on about two weeks' worth of data. Now, using Ray to scale the data ingest, preprocessing, and training, they can train on much more data, so you can get more accurate ETA predictions. That's just one example of the kind of benefit they were able to get.
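[Editor's note] The pipelining idea described above, overlapping CPU-side ingest/preprocessing with the training consumer instead of running them back to back, can be sketched with a bounded queue. This is an illustrative pattern only, assuming stand-in `preprocess` and `train_step` functions; it is not Uber's code and not Ray Data's API.

```python
# Sketch of pipelining data ingest/preprocessing with training, as
# described above. preprocess() and train_step() are hypothetical
# stand-ins; this is not Ray Data's or Uber's actual code.
import queue
import threading

def preprocess(record):
    # Stand-in for CPU-bound ingest/preprocessing work.
    return record * 2

def train_step(batch, state):
    # Stand-in for a GPU training step; here we just accumulate a sum.
    return state + sum(batch)

def pipeline(records, batch_size=4):
    q = queue.Queue(maxsize=8)  # bounded queue applies backpressure
    SENTINEL = object()

    def producer():
        # "CPU" stage: preprocess records and feed the queue.
        for r in records:
            q.put(preprocess(r))
        q.put(SENTINEL)

    threading.Thread(target=producer, daemon=True).start()

    # "GPU" stage: consume batches while the producer keeps working.
    state, batch = 0, []
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        batch.append(item)
        if len(batch) == batch_size:
            state = train_step(batch, state)
            batch = []
    if batch:
        state = train_step(batch, state)
    return state

if __name__ == "__main__":
    print(pipeline(range(10)))  # doubles 0..9 and sums: 90
```

Because the two stages run concurrently, the consumer never waits for the whole dataset to be preprocessed first, which is the property that let the training described above keep GPUs busy.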
Also, because it's running on top of Ray, and Ray has this ecosystem of libraries, they can use Ray's hyperparameter tuning library to tune their deep learning models, and they can use it for inference. And because these are all built on top of Ray, they inherit the elasticity and fault tolerance of running on Ray. So it really simplifies things on the infrastructure side, because if you have Ray as common infrastructure for your machine learning workloads, there's just one system to manage and operate. And it simplifies things for the end users, the developers, because from their perspective they're just writing a Python application; they don't have to learn how to use three different distributed systems and stitch them together.

John Walls: So, AWS, before I let you go, how do they come into play here for you? I mean, you're part of the Startup Showcase, so obviously a major partner, a major figure in the offering that you're presenting here.

Robert Nishihara: Yeah, well, Anyscale is a managed Ray service; Anyscale is just the best way to run and deploy Ray. And we run on top of AWS, so many of our customers are using Ray through Anyscale on AWS. We work very closely together and we have joint customers. And a lot of the value that Anyscale adds on top of Ray is around the production story: things like high availability, failure handling, retries, alerting, persistence, reproducibility. Those are a lot of the values our platform adds on top of the open source project. A lot of it is also around collaboration. Imagine something goes wrong with your application, your production job, and you want to debug it. You can just share the URL with your coworker; they can click a button, reproduce the exact same thing, look at the same logs, and figure out what's going on.
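[Editor's note] The hyperparameter tuning mentioned above can be sketched as a simple grid search: try several configurations and keep the one with the best score. Libraries like Ray Tune automate and distribute this kind of loop; the `objective` function below is a hypothetical stand-in for "validation loss after training with these settings", not a real model.

```python
# Sketch of hyperparameter tuning as a grid search. Tuning libraries
# (e.g. Ray Tune) automate and distribute this loop; objective() is a
# hypothetical stand-in for a real training-and-validation run.
from itertools import product

def objective(lr, batch_size):
    # Stand-in for validation loss; minimized at lr=0.01, batch_size=64.
    return (lr - 0.01) ** 2 + 0.001 * abs(batch_size - 64)

def grid_search(lrs, batch_sizes):
    best_cfg, best_loss = None, float("inf")
    for lr, bs in product(lrs, batch_sizes):
        loss = objective(lr, bs)
        if loss < best_loss:
            best_cfg, best_loss = {"lr": lr, "batch_size": bs}, loss
    return best_cfg, best_loss

if __name__ == "__main__":
    cfg, loss = grid_search([0.001, 0.01, 0.1], [32, 64, 128])
    print(cfg)  # {'lr': 0.01, 'batch_size': 64}
```

Since each configuration is evaluated independently, the trials are embarrassingly parallel, which is why running them on shared cluster infrastructure (rather than one laptop) pays off as the search space grows.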
And one thing that's important for a lot of our customers is efficiency around cost. A lot of people are spending a lot of money on AWS, right? So Anyscale supports, out of the box, running on cheaper spot instances, these preemptible instances, which reduces costs by quite a bit. Things like that.

John Walls: Well, the company is Anyscale, and you're on the show floor, right? So if you're having a chance to watch this during re:Invent, go down and check them out. Robert Nishihara joining us here, the co-founder and CEO. Robert, thanks for being with us here on theCUBE. Really enjoyed it.

Robert Nishihara: Me too, thanks so much.

John Walls: A three-year graduate program and boom, here you are, off to the enterprise you go. Very nicely done. All right, we're going to continue our coverage here on theCUBE with more from Las Vegas. We're at the Venetian, AWS re:Invent 2022, and you're watching theCUBE, the leader in high tech coverage.