Hello, everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase. The topic this episode is AI and machine learning: top startups building foundational model infrastructure. This is season three, episode one of the ongoing series covering exciting startups from the AWS ecosystem, and this time we're talking about AI and machine learning. I'm your host, John Furrier. We're excited to be joined today by Robert Nishihara, who's the co-founder and CEO of a hot startup called Anyscale. He's here to talk about Ray, the open source project, and Anyscale's infrastructure for foundational models. Robert, thank you for joining us today. Yeah, thanks so much for having me. Been following your company since the founding pre-pandemic, and you guys really had a great vision, scaled up, and are in a perfect position for this big wave that we all see with ChatGPT and OpenAI that's gone mainstream. Finally, AI has broken through the ropes and gone mainstream. So I think you guys are really well positioned. I'm looking forward to talking with you today. But before we get into it, introduce the core mission for Anyscale. Why do you guys exist? What is the North Star for Anyscale? Yeah, like you mentioned, there's a tremendous amount of excitement about AI right now. I think a lot of us believe that AI can transform just about every industry. So one of the things that was clear to us when we started this company was that the amount of compute needed to do AI was just exploding. To actually succeed with AI, companies like OpenAI or Google, the companies getting a lot of value from AI, were not just running these machine learning models on their laptops or on a single machine. They were scaling these applications across hundreds or thousands or more machines and GPUs and other resources in the cloud.
And so to actually succeed with AI, and this has been one of the biggest trends in computing, maybe the biggest trend in computing in recent history, the amount of compute has been exploding. To actually build and scale these AI applications, there's a tremendous software engineering lift to build the infrastructure that actually runs them. And that's very hard to do. So one of the reasons many AI projects and initiatives fail, or don't make it to production, is the need for this scale, the infrastructure lift to actually make it happen. So our goal with Anyscale and Ray is to make that easy, to make scalable computing easy, so that as a developer or as a business, if you want to do AI, if you want to get value out of AI, all you need to know is how to program on your laptop. All you need to know is how to program in Python. And if you can do that, then you're good to go. Then you can do what companies like OpenAI or Google do and get value out of machine learning. That programming example of how easy it is with Python reminds me of the early days of cloud, when infrastructure as code was the big topic: just code the infrastructure, make it programmable. That's super important, and that's what people want with AI. They want to just program AI. That's the new trend. And I want to understand, if you don't mind explaining, the relationship that Anyscale has to these foundational models and particularly the large language models, also called LLMs, that we're seeing with OpenAI and ChatGPT. But before you get into the relationship that you have with them, can you explain the hype around foundational models? Why are people going crazy over foundational models? What are they, and why are they so important?
Yeah, so foundational models are incredibly important because they enable businesses and developers to get value out of machine learning off the shelf, with these large models that have been trained on tons of data and that are useful out of the box. And then of course, as a business or as a developer, you can take those foundational models and repurpose them or fine-tune them or adapt them to your specific use case and what you want to achieve. It's much easier to do that than to train them from scratch. And for people to actually use foundation models, there are three main types of workloads or problems that need to be solved. One is training these foundation models in the first place, actually creating them. The second is fine-tuning them and adapting them to your use case. And the third is serving them and actually deploying them, okay? So Ray and Anyscale are used for all three of these workloads. Companies like OpenAI or Cohere train large language models on top of Ray, and open source versions like GPT-J are trained on top of Ray as well. There are many startups and other businesses that don't want to train the large underlying foundation models but do want to fine-tune them, adapt them to their purposes, build products around them, and serve them. Those are also using Ray and Anyscale for that fine-tuning and that serving. And the reason that Ray and Anyscale are important here is that building and using foundation models requires huge scale. It requires a lot of data. It requires a lot of compute: GPUs, TPUs, other resources. And to actually take advantage of that and actually build these scalable applications, there's a lot of infrastructure that needs to happen under the hood.
And so you can either use Ray and Anyscale to take care of that, to manage the infrastructure and solve those infrastructure problems, or you can build and manage the infrastructure yourself, which you can do, but it's going to slow your team down. Many of the businesses we work with simply don't want to be in the business of managing and building infrastructure. They want to focus on product development and move faster. I know you've got a keynote presentation we're going to go to in a second, but I think you hit on something that's the real tipping point: doing it yourself is hard to do. That's where the opportunities are, and the cloud did that with data centers. It took the data center and made it an API. The heavy lifting went away, moved to the cloud, so people could be more creative and build their product, build with their creativity. Is that the kind of big deal that's happening here? You guys are taking the learnings and making them available so people don't have to do that? That's exactly right. So today, if you want to succeed with AI, if you want to use AI in your business, infrastructure work is on the critical path for doing that. To do AI, you have to build infrastructure. You have to figure out how to scale your applications. That's going to change. With Ray and Anyscale, we're going to remove the infrastructure from the critical path, so that as a developer or as a business, all you need to focus on is your application logic: what you want the program to do, what you want your application to do, how you want the AI to actually interface with the rest of your product. Now, the way that will happen is that the infrastructure work will still happen. It'll just be under the hood, taken care of by Ray and Anyscale. And so I think something like this is really necessary for AI to reach its potential.
For AI to have the impact and the reach that we think it will, you have to make it easier to do. And just for clarification, if you don't mind, can you explain the relationship of Ray and Anyscale real quick, just before we get into the presentation? Ray is an open source project. We created it. We were at Berkeley doing machine learning, and we started Ray in order to provide a simple open source tool for building and running scalable applications. And Anyscale is the managed version of Ray. Basically, we run Ray for you in the cloud and provide a lot of tooling around the developer experience, managing the infrastructure, and providing better performance and superior infrastructure. Awesome. I know you've got a presentation on Ray and Anyscale, and you guys are positioning as the infrastructure for foundational models, so I'll let you take it away. And then when you're done presenting, we'll come back. I'll probably have a few questions, and then we'll close it out. So take it away. Sounds great. So I'll say a little bit about how companies are using Ray and Anyscale for foundation models. The first thing I want to mention is why we're doing this in the first place. The underlying observation, the underlying trend here, and this is a plot from OpenAI, is that the amount of compute needed to do machine learning has been exploding. It's been growing at something like 35 times every 18 months. This is absolutely enormous. Other people have written papers measuring this trend and you get different numbers, but the point is, no matter how you slice and dice it, it's an astronomical rate. Now if you compare that to something we're all familiar with, like Moore's law, which says that processor performance doubles roughly every 18 months, you can see that there's a tremendous gap between the compute needs of machine learning applications and what you can do with a single chip, right?
So even if Moore's law were continuing strong and doing what it used to do, there would still be a tremendous gap between what you can do with a chip and what you need in order to do machine learning. And so given this graph, what has been clear to us since we started this company is that doing AI requires scaling. There's no way around it. It's not a nice-to-have, it's really a requirement. And so that led us to start Ray, the open source project we created to make it easy to build these scalable Python applications and scalable machine learning applications. And since we started the project, it's been adopted by a tremendous number of companies: companies like OpenAI, which use Ray to train their large models like ChatGPT; companies like Uber, which run all of their deep learning and classical machine learning on top of Ray; companies like Shopify, Spotify, Instacart, Lyft, Netflix, and ByteDance, which use Ray for their machine learning infrastructure; companies like Ant Group, which makes Alipay and uses Ray across the board for fraud detection, online learning, detecting money laundering, graph processing, and stream processing; and companies like Amazon, which run Ray at tremendous scale, processing petabytes of data every single day. So the project has seen enormous adoption over the past few years, and one of the most exciting use cases is providing the infrastructure for building, training, fine-tuning, and serving foundation models.
So I'll say a little bit about some examples of companies using Ray for foundation models. Cohere trains large language models. OpenAI also trains large language models. You can think about the workloads required there: things like supervised pre-training, and also reinforcement learning from human feedback. So this is not only regular supervised learning but actually more complex reinforcement learning workloads, which take human input about which response to a particular question is better than another and incorporate that into the learning. There are open source versions as well, like GPT-J, also built on top of Ray, as well as projects like Alpa coming out of UC Berkeley. So these are some examples of exciting projects and organizations training, creating, and serving these large language models using Ray. Okay, so what actually is Ray? Well, there are two layers to Ray. At the lowest level, there's the core Ray system. This is essentially low-level primitives for building scalable Python applications: things like taking a Python function or a Python class and executing them in a cluster setting. So Ray core is extremely flexible, and you can build arbitrary scalable applications on top of it. On top of the core system, what really gives Ray a lot of its power is its ecosystem of scalable libraries. So on top of the core system you have scalable libraries for ingesting and pre-processing data, for training your models, for fine-tuning those models, for hyperparameter tuning, for doing batch processing and batch inference, for doing model serving and deployment, right? And a lot of Ray users like Ray because they want to run multiple workloads. They want to train and serve their models, right? They want to load their data and feed it into training, and Ray provides common infrastructure for all of these different workloads.
So that's a little overview of the different components of Ray. So why do people choose to go with Ray? I think there are three main reasons. The first is the unified nature, the fact that it is common infrastructure for scaling arbitrary workloads, from data ingest to pre-processing to training to inference and serving, right? This also includes the fact that it's future-proof. AI is incredibly fast-moving, and many companies that have built their own machine learning infrastructure and standardized on particular workflows for doing machine learning have found that their workflows are too rigid to enable new capabilities. If they want to do reinforcement learning, if they want to use graph neural networks, they don't have a way of doing that with their standard tooling. And so Ray being future-proof, and being flexible in general, gives them that ability. Another reason people choose Ray and Anyscale is the scalability. This is really our bread and butter. This is the whole point of Ray: making it easy to go from your laptop to running on thousands of GPUs, making it easy to scale your development workloads and run them in production, making it easy to scale training, to scale data ingest, pre-processing, and so on. Scalability and performance are critical for doing machine learning, and that is something that Ray provides out of the box. And lastly, Ray is an open ecosystem. You can run it anywhere. You can run it on any cloud provider: Google Cloud, AWS, Azure. You can run it on your Kubernetes cluster. You can run it on your laptop. It's extremely portable. And not only that, it's framework-agnostic. You can use Ray to scale arbitrary Python workloads, and it integrates with libraries like TensorFlow or PyTorch or JAX or XGBoost or Hugging Face or PyTorch Lightning or scikit-learn, or just your own arbitrary Python code. It's open source.
And in addition to integrating with these machine learning frameworks, you can use Ray along with all of the other tooling in the machine learning ecosystem. That's things like Weights & Biases or MLflow, different data platforms like Databricks, Delta Lake, or Snowflake, tools for model monitoring, feature stores. All of these integrate with Ray, and Ray provides that kind of flexibility so that you can integrate it into the rest of your workflow. And then Anyscale is the scalable compute platform built on top, the platform that provides Ray. So Anyscale is a managed Ray service that runs in the cloud, and what Anyscale offers is the best way to run Ray. If you think about what you get with Anyscale, there are fundamentally two things. One is about moving faster, accelerating time to market, and you get that by having the managed service, so that as a developer you don't have to worry about managing or configuring infrastructure. It also provides optimized developer workflows: things like easily moving from development to production, having the observability tooling and the debuggability to easily diagnose what's going wrong in a distributed application, and things like the dashboards and the other kinds of tooling for collaboration, for monitoring, and so on. So that's the first bucket: developer productivity, moving faster, faster experimentation and iteration. The second reason people choose Anyscale is superior infrastructure. This is things like cost efficiency, being able to easily take advantage of spot instances, being able to get higher GPU utilization, things like faster cluster startup times and autoscaling, things like just overall better performance and faster scheduling.
And so these are the kinds of things that Anyscale provides on top of Ray: the managed infrastructure, developer productivity and velocity, as well as performance. So this is what I wanted to share about Ray and Anyscale to provide that context. But John, I'm curious what you think. I love it. So first of all, it's a platform, because that's a platform architecture right there. So just to clarify, this is the Anyscale platform, not tools; you've got tools in the platform. Okay, that's key. Love that it's managed. Just curious, you mentioned Python multiple times. Is that because of PyTorch and TensorFlow, or because Python is the most friendly with machine learning, or is it because it's very common amongst all developers? That's a great question. Python is the language that people are using to do machine learning, so it's the natural starting point. Now, of course, Ray is actually designed in a language-agnostic way, and there are companies out there that use Ray to build scalable Java applications. But for the most part, right now we're focused on Python and being the best way to build these scalable Python and machine learning applications. Of course, down the road, there's always that potential. So if you're slinging Python code out there and you're watching this video, get on the Anyscale bus quickly. Also, while you were giving the presentation, I couldn't help it since you mentioned OpenAI, which, by the way, congratulations, because they've had great scale. I've noticed their rapid growth; they were the fastest company to that number of users of anyone in the history of the computer industry. So major success for OpenAI and ChatGPT; I'm a huge fan. I'm not a skeptic at all. I think it's just the beginning. So congratulations. But I actually typed into ChatGPT: what are the top three benefits of Anyscale? It came up with scalability, flexibility, and ease of use. Obviously scalability is right there in the name.
So that's what they came up with. So they nailed it. You have an inside prompt training it, don't you? Only kidding. Yeah, we hard-coded that one. But that's the kind of thing that came up really, really quickly. If I asked it to write a sales document, it probably would. But this is the future interface. This is why people are getting excited about the foundational models and the large language models: they're allowing the interface with the user, the consumer, to be more human, more natural. And this clearly will be in every application in the future. Absolutely. So this is how people are going to interface with software, how they're going to interface with products in the future. It's not just the chatbot that you talk to. This is going to be how you get things done, right? How you use your web browser, or how you use Photoshop, or how you use other products. You're not going to spend hours learning all the APIs and how to use them. You're going to talk to it and tell it what you want it to do. And of course, if it doesn't understand, it's going to ask clarifying questions. You're going to have a conversation, and then it'll figure it out. This is going to be one of those things we're going to look back at, Robert, and say, yeah, from that company, that was the beginning of that wave. And just like AWS and cloud computing, the folks who got in early were really in position when, say, the pandemic came. So getting in early is a good thing, and that's what everyone's talking about: getting in early and playing around, maybe replatforming or even picking one or a few apps to refactor with some staff and managed services. So people are definitely jumping in. So I have to ask you the ROI cost question. You mentioned Moore's Law versus what's going on in the industry.
When you look at that kind of scale, the first thing that jumps out at people is: okay, I love it, let's go play around, but what's it going to cost me? Is it going to be tied to certain GPUs? What does the landscape look like from an operational standpoint, from the customer's side? Are they locked in, or is the benefit flexibility? Are you flexible enough to handle any cloud? Basically, that's my question: what's the customer looking at? Cost is super important here, and companies are spending a huge amount on their cloud computing, on AWS, and on doing AI, right? And I think a lot of the advantage of Anyscale, what we can provide here, is not only better performance but cost efficiency, because if we can run something faster and more efficiently, it uses fewer resources and you can lower your cloud spending, right? We've seen companies go from 20% GPU utilization with their current setup and the current tools they're using, to running on Anyscale and getting more like 95 or 100% GPU utilization. That's something like a 5x improvement right there. So depending on the kind of application you're running, it's a significant cost savings. We've seen companies that process petabytes of data every single day get order-of-magnitude cost savings by switching from what they were previously doing to running their application on Ray. And when you have applications that are spending potentially $100 million a year, getting a 10x cost savings is just absolutely enormous. So these are some of the kinds of... Data infrastructure is super important. Again, if you're a prospect thinking about going in here, it's just like the cloud: you've got infrastructure, you've got the platform, you've got SaaS. The same kind of thing is going to go on in AI.
So I want to get into that ROI discussion and some of the impact with your customers that are leveraging the platform. But first, you've got a demo. Yeah, so let me give you a quick run-through here. What I have open here is the Anyscale UI. I've started a little Anyscale workspace. Workspaces are the Anyscale concept for interactive development, right? So here, imagine you want to have a familiar experience, like you're developing on your laptop. And here I have a terminal. It's not on my laptop; it's actually in the cloud, running on Anyscale. And I'm just going to kick this off. This is going to train a large language model, OPT, and it's doing this on 32 GPUs. I've got a cluster here with a bunch of CPU cores and a bunch of memory. And as that's running, by the way, if I wanted to run this on 64 or 128 GPUs instead of 32, that's just a one-line change when I launch the workspace. And what I can do is pull up VS Code, right? Remember, this is the interactive development experience. I can look at the actual code. Here it's using Ray Train to train the Torch model. We've got the training loop, and we're configuring it so that each worker gets access to one GPU and four CPU cores. This is using DeepSpeed, and as I make the model larger, I could increase the number of GPUs each worker gets access to, right? And how that is distributed across the cluster. And if I wanted to run on CPUs instead of GPUs, or on a different accelerator type, again, that's just a one-line change. And here we're using Ray Train to train the models, just taking my vanilla PyTorch model, using Hugging Face, and then scaling that across a bunch of GPUs. And of course, if I want to look at the dashboard, I can go to the Ray dashboard. There are a bunch of different visualizations I can look at. I can look at the GPU utilization.
I can look at the CPU utilization here, where I think we're currently loading the model and running the actual application to start the training. And some of the things that are really convenient here about Anyscale: I can get that interactive development experience with VS Code, I can look at the dashboards, I can monitor what's going on. I have a terminal, it feels like my laptop, but it's actually running on a large cluster with however many GPUs or other resources I want. And so it's really trying to combine the best of the familiar experience of programming on your laptop with the benefit of being able to take advantage of all the resources in the cloud to scale. And you were talking about cost efficiency. One of the biggest reasons people waste money, one of the silly reasons for wasting money, is just forgetting to turn off your GPUs. And what you can do here, of course, is things will auto-terminate if they're idle. But imagine you go to sleep and have this big cluster: you can shut off the cluster, come back tomorrow, restart the workspace, and your big cluster is back up. And all of your code changes are still there, all of your local file edits. It's like you just closed your laptop and came back and opened it up again. And so this is the kind of experience we want to provide for our users. So that's what I wanted to share with you. Well, a couple of things there. A single line of code to change scale, that's game changing. And then the cost thing. I mean, human error is a big deal. People pass out at their computer, they've been coding all night, or they just forget about it. And then it's just like leaving the lights on or the water running in your house. At the scale this is at, the numbers will add up. That's a huge deal.
So I think, you know, back in the old days compute would just sit there idle; now having the data cranking through the models is a big point. Another thing I want to add about cost efficiency is that, if you're running on Anyscale, we make it really easy to use spot instances, these preemptible instances that can be significantly cheaper than on-demand instances. And so we see our customers go from not using spot instances, because they don't have the infrastructure around it, the fault tolerance to handle the preemption and things like that, to being able to just check a box, use spot instances, and save a bunch of money. You know, this was my whole feature article at re:Invent last year when I met with Adam Selipsky. This next-gen cloud is here. It's not just autoscale, it's infrastructure at scale. It's agility, it's flexibility. I think this is where the world needs to go, almost what DevOps did for cloud. And when you were showing me that demo, I had this whole SRE vibe. Remember, Google had site reliability engineers to manage all those servers. This is kind of like an SRE vibe for data at scale, a similar kind of order of magnitude. I might be a little bit off-base, but how would you explain it? It's a nice analogy. I mean, what we are trying to do here is get to the point where developers don't think about infrastructure, where developers only think about their application logic, and where businesses can do AI, can succeed with AI and build these scalable applications, but they don't have to build an infrastructure team. They don't have to develop that expertise. They don't have to invest years in building their internal machine learning infrastructure. They can just focus on the Python code, on their application logic, and run the stuff out of the box. Awesome.
Well, I appreciate the time. Before we wrap up here, give a plug for the company. I know you've got a couple of websites people can go to; Ray's got its own website, and there's Anyscale's. You've got an event coming up. Are you looking to hire? Put a plug in for the company. Yeah, absolutely. Thank you. So first of all, you know, we think AI is really going to transform every industry, and the opportunity is there, right? We can be the infrastructure that enables all of that to happen, that makes it easy for companies to succeed with AI and get value out of AI. If you're interested in learning more about Ray: Ray has been emerging as the standard way to build scalable applications. Adoption has been exploding. I mentioned companies like OpenAI using Ray to train their models, but really it's across the board: companies like Netflix and Cruise and Instacart and Lyft and Uber, and that's just among tech companies. It's across every industry: gaming companies, agriculture and farming, robotics, drug discovery, fintech, we see it across the board. And all of these companies can get value out of AI, can really use AI to improve their businesses. So if you're interested in learning more about Ray and Anyscale, we have our Ray Summit coming up in September. It's going to highlight a lot of the most impressive use cases and stories across the industry. And if you're a business that wants to use LLMs, you want to train these large language models, you want to fine-tune them with your data, you want to deploy them, serve them, and build applications and products around them, give us a call, talk to us. We can really take the infrastructure piece off the critical path and make that easy for you. So that's what I would say. And, like you mentioned, we're hiring across the board: engineering, product, go-to-market. It's an exciting time.
Robert Nishihara, co-founder and CEO of Anyscale. Congratulations on a great company you've built and are continuing to iterate on. You've got growth ahead of you, you've got a tailwind. I mean, the AI wave is here. I think OpenAI and ChatGPT, a customer of yours, has really opened up mainstream visibility into this new generation of applications: the user interface, the role of data, large scale, how to make that programmable. So we're going to need that infrastructure. So thanks for coming on. This has been season three, episode one of the ongoing series on hot startups. In this case, this episode is the top startups building foundational model infrastructure for AI and ML. I'm John Furrier, your host. Thanks for watching.