So what I'm going to talk about is Ray, which is a Python library, mostly, although it's got a little C++ core, for making it really easy to distribute Python applications across a cluster. It was inspired by the challenges of doing distributed machine learning, training neural networks and so forth. You can reach me at the information on the left, including my email address, and I'm on Twitter at Dean Wampler. Ray.io is the site for Ray, and I work for Anyscale, the company developing Ray. We're having a Ray Summit this fall, September 30th and October 1st. If you're interested, you can find out more at the events page on our website. So this will be a really fast talk because I have 30 minutes and I'll cover a lot. What I hope you'll get out of it is the gist of what's going on with Ray and why you might be interested in it, and we can certainly take questions in the Discord channel afterwards. So let me start with a demo. I'll exit from this presentation and switch to my browser, where I have an application running. What I'm going to do is first walk through what it's like to work with the core API. Whether you're doing machine learning or not, this is the API you might use for distributing your applications. Then I'll run through an example using reinforcement learning in a library for Ray called RLlib, so you can get a sense of what Ray is doing behind the scenes, as it were. So I went ahead and evaluated a few cells already. I did some imports and initialized Ray, which here is just running on my laptop, though I could also tell it to connect to a cluster. And then there's what we call the Ray dashboard that comes up, so I can actually see what's going on if I'm trying to understand performance issues and so forth. We won't look at that again for time's sake, but just so you know it's there. 
So the example I'm going to simulate is the case where maybe I need to call an expensive data store, expensive in the sense that it takes a lot of time relative to regular computing. To keep it simple, I've defined two dictionaries. One takes keywords like "reinforcement learning", "hyperparameter tuning" and so forth and returns the corresponding high-level library in Ray for those particular tasks. And then there's a second one, for the doc URLs, that takes the values returned from the first dictionary and gives you the links to the documentation for them; you'll see why we have a second one in just a moment. I'll mostly start with the first one. So let me define a regular Python function that takes one of these phrases, like "reinforcement learning", looks up the value in the dictionary, and returns both the key and the value when it's done. And to simulate doing something compute-intensive, I'm going to sleep for a period of time equal to the length of the phrase divided by 10. It turns out "reinforcement learning" is 22 characters, so it'll sleep for 2.2 seconds when we call it. And if I define a function that just iterates through all of the phrases in the dictionary, calls the lookup on each one, and time how long it takes, we'll find that it takes about 7.1 seconds, because it does one at a time and sleeps for each one. It turns out there are 71 characters across the keywords, which is why we slept for 7.1 seconds. Well, this kind of query could be done in parallel; there's nothing related between the queries that I did. So let's see how we could use Ray to turn this into an asynchronous process that we can run in parallel where possible, without doing a lot of low-level parallel programming. And the way we do that is we define something called a Ray task. 
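The synchronous version just described might look something like the sketch below. The dictionary contents and names are my reconstruction from the talk, not the notebook's actual code, so treat the exact keys and library names as assumptions:

```python
import time

# Hypothetical reconstruction of the demo's lookup table (the exact keys and
# library names are assumed from the talk, not copied from the real notebook).
ray_libraries = {
    "reinforcement learning": "Ray RLlib",
    "hyperparameter tuning": "Ray Tune",
    "model serving": "Ray Serve",
    "distributed training": "Ray SGD",
}

def slow_lookup(phrase):
    """Simulate an expensive query: sleep len(phrase)/10 seconds."""
    time.sleep(len(phrase) / 10.0)
    return phrase, ray_libraries.get(phrase)

def lookup_all():
    # One query at a time, so the total time is the sum of all the sleeps.
    return [slow_lookup(phrase) for phrase in ray_libraries]

start = time.time()
results = lookup_all()
print(f"{time.time() - start:.1f} seconds")  # roughly total characters / 10
```

Because each call blocks, the total runtime grows with every key added to the dictionary, which is what motivates the Ray task below.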
We annotate a regular function with this annotation, @ray.remote, and I can just have it turn around and call the original one. So I can have it both ways: the original Python function or this Ray task, and use whichever I want. When we actually call it, we add .remote to the invocation as a visual cue that we're actually doing something with Ray and not just making a regular call. And the thing it returns is actually a future. So we've fired off this asynchronous call somewhere in our cluster, and at some point we can retrieve the value using ray.get, which will block until the value is ready. I waited long enough, so it came back immediately, and here's what we got. So let's do a similar search iteration to the one we did before. Notice what I'm gonna do: I'm gonna fire off all four queries at once, save those IDs, and then call ray.get to get them all at once. And this will take 2.2 seconds. Why 2.2? Because the longest key is "reinforcement learning". Some of them finish sooner because they have shorter keys and we slept for a shorter period of time, but the way I called ray.get means we'll wait for all of them. I'll show you a workaround for that in just a second. And you can see that, yes, "reinforcement learning" is 22 characters long. Right, now here's why I have the second dictionary, and it's a really nice feature of Ray that lets us do tasks that actually depend on each other in a reasonably intuitive way. What I'm gonna do is define a function that's gonna get the doc URL. Recall that when I queried the first dictionary, I got back something like "Ray RLlib". Now I can use that to query the second dictionary to get the URL for the documentation. The way I've written this function, it actually takes the tuple that was returned from the first one, and that will be very convenient in what I'm about to do. But otherwise it's pretty much identical to the first one. 
And I'll create a task as well as a function here too. Then this method will first query the original dictionary with all four keys, then turn around, take those results, query for the doc URLs, and finally return the results. Let me go ahead and run this. It'll take about four seconds, but I want you to notice something crucial happening in line two: we have task dependencies now. I can't schedule the doc-URL task until the value from the first task that corresponds to that lookup is completed. In a typical distributed system library I'd have to poll or something, wait for the first result, unpack it, and then pass it to the next one. Ray is doing all of that for me. This argument is actually one of these IDs; it's not even a regular tuple, it's a future ID, but Ray knows that it needs to unpack that for me, and it also waits to schedule the task until the value is ready. So I can have a graph of dependencies and Ray will handle it for me, but otherwise this code looks like regular synchronous Python where I'm not even thinking about distributed systems. I think the reason I got excited about Ray when I joined Anyscale was that I loved the way it took concepts we know, like functions, and added extensions that let us keep pretending we're working with synchronous code, while now we have the magic of distributed concurrency across a cluster, the whole bit. So it's pretty nice for that reason. And sure enough, it took about 4.4 seconds to do the whole thing. I did mention that we were blocking for everything. There's an idiom with a function called ray.wait where we can loop, getting the results that have already finished while we wait for more to come in. I won't take the time to walk through this code because it's a little complicated. 
I don't have a lot of time, but what we can see happening is that as results finish, I go ahead and process them. The first one came back in two seconds while other things were still running, and then eventually I finish the loop as I've basically drained the queue, if you will. So that's a nice way to do work with things that have already finished while other work is running. And then lastly, to finish almost everything about the core API: you can also put objects into the distributed object store that Ray is using, with ray.put. In a real application, those dictionaries I wrote, reference data, I'd probably put in the Ray object store, so tasks all over my cluster can pull those objects out and use them as needed. So this is a nice little reverse, the dual if you will, of ray.get. All right, one thing I have not done yet is deal with distributed state. These tasks I've been writing have all been asynchronous, but they've all been stateless. What if I wanna keep some evolving state, and I wanna have that distributed over the cluster? Well, once again, we start with a familiar idea, something we already know is a good way to park state, which is a class. So I'm gonna declare a regular Python class, called a search service, that's gonna have these dictionaries we mentioned, just for convenience. But one thing it's gonna do differently now is actually remember how many times I asked for each key, as well as keys that don't exist in the dictionary. So it's keeping state, sort of metadata about how the service was used. And then we have a query method that does basically what we've done before, where we query for a given phrase. This one also handles determining which dictionary it needs to read and that kind of thing. 
And then we'll have a convenience method for getting all of the keys from both dictionaries. But otherwise it's pretty much the same code we've already had. As usual, you create a service and we can then do queries against it. And if we run this, we see that, yeah, it returns the known keys plus an unknown one that I've appended, because I'm now gonna try timing this thing. This will take about 11 seconds, because it's gonna go through each of these one at a time, sleeping for each call and so forth. It's important to note that I've actually gone back to synchronous execution, in the sense that I have one actor, as we call it, and it's gonna process one request at a time; it's not gonna do it in parallel the way we were doing before. The way you'd get back to parallelism is to have multiple instances of these, like a farm of them, or you can also do concurrent invocations. By default, we'll do one at a time, which is usually what you want, because it means I'm not likely to corrupt the state in this class, the query counts. So that's usually the way you wanna go. And you can see the results we got back, and it took about 11 and a half seconds. Sure enough, we queried each of those keys once, including the one it had never seen before. Well, we can very easily turn this into an actor now. Very much like we did before, we'll subclass our search service and annotate it with @ray.remote. The one other thing we have to do here is that with an actor, you can't reach in and read objects or fields inside it, so we have to have getter methods, accessors, to get at them. I've added those to get the dictionaries and also the object that's keeping track of how often we've called each key. Notice how we construct one: it's the actor class's .remote() call. We'll use the same sort of function we used before, but now what we're gonna do is fire off all the queries we want, and that's the setup here with the list. 
And then we'll use our ray.wait loop to pull them off as they're done and print out the results. If we time this and watch what happens, we can see that some of them are starting to come back, but this is synchronous again, because of the way we've now designed it; at least it's completely thread-safe and robust. So it did take 11 and a half seconds, but it worked as before, and we can see that once again we called each of those keys once. All right, to finish, let me rip through very quickly an example of the high-level library in Ray called RLlib, for reinforcement learning. If you don't know what reinforcement learning is, it's the thing you may have heard about that beat the world's best Go player; it's the technology used to beat Atari games. Essentially, you have an agent looking at an environment. It makes observations about the state of the environment, tries to guess the best action to take, observes the reward it receives, and then tries to learn to get better and better at choosing actions based on the state. And what I'm gonna do is use a popular example of a reinforcement learning environment called CartPole. That's basically where I have a one-dimensional cart moving back and forth, and I wanna keep a vertical pole balanced on it as long as possible. This is very easy to set up in Ray. For time's sake, I can't go through all the details, but I will say this: we're gonna train a fully connected neural network with two hidden layers of 50 units each, and we'll actually see it get smarter and better as it goes. And while I'm waiting, there it goes, that's finished. So this is just a loop; it's gonna do the training and print out results as it goes, and now we'll watch it happen. This will take a few seconds. But what we'll see is that, as it runs, an episode can score up to 500 points, and that's when it just stops. 
But what we want is for most of these so-called episodes to get as close to 500 as possible. So the number we really care about is the middle number, the mean: on average, it does this well at this level of training. There's also the maximum score it reached during a training run. And as you can see, these numbers are getting higher, which means it's getting smarter as it goes. Now, the iterations, I think we're gonna do 20, and it turns out it'll get good, but not great. If we let it go longer, it could actually get really good, close to 500 every time. So we'll let this continue, and I'll go ahead and evaluate the next cells so they can load when we're ready. What I'm gonna do is just show you the data, which reproduces in a nicer format what we're seeing printed, and then we'll actually plot it just to see what it looks like. Then we'll do one last step and I'll be done with the demo. So it gets up to roughly half of the maximum score, about 350 or so, when it's done, for this particular size of network and for 20 training steps. And the last step, which we'll do when it's finished, is to take the network we've trained, which is saved as a checkpoint, and actually try running it, and we'll see how it does over five or six episodes. So there's the data. You can see that the reward mean, the center column, got up to about 305. And here's what it looks like when we plot it. The max very quickly got to the point where we could hit the maximum score, but the mean score still rose during that whole time. And now we'll try the rollout, and you'll see a window pop up where it's actually running these so-called episodes where it's trying to balance the pole on the cart. All right, there we go. So this does about five episodes, I think. This one's not too bad; it's holding up pretty well. It'll also print out the score. 
I'll scroll this down a bit. That first one actually got all the way to 500, and this one looks like it'll stop almost at 500. So this is actually doing pretty well, even though our mean score isn't that high. So hopefully you get a sense that it's actually pretty easy to use a high-level, domain-specific library like this to do the work we wanna do, while at the same time it's using this distributed compute framework under the hood. And the example we've shown would go much faster if we were using a real cluster. Then, when you're all done, you can shut down if you want. So let me go back to the talk. All right, so why Ray? Well, it kind of emerged out of two big trends. One is that the size of neural networks is growing enormously, which also translates into how much compute is required to train them; we're far outstripping Moore's law in terms of growth. At the same time, Python, as you all know, has seen enormous growth of interest in the last decade or so, driven in large part by interest in machine learning and all the great libraries for machine learning, data science and so forth written in Python. So that means we really need an easy way to distribute Python over a cluster if we're going to get past the limits of Moore's law, but we want something that people who don't really want to care about distributed computing can use easily. So there's a whole bunch of icons here about the different steps in a typical machine learning pipeline, all of which typically require some sort of distributed computation to meet the scalability requirements. And the vision of Ray is that we could have this very generic framework. If you think about the original part of the demo, there was really nothing about machine learning in it; it was just about scheduling tasks of arbitrary size and managing distributed state. 
And then on top of that, we can build libraries that handle specific domains. There are four of them right now that are part of Ray in the machine learning space, and others are being written, for example in natural language processing. And a lot of people are starting to write generic applications with Ray as well. So let me talk briefly about these libraries. We actually just saw an example of RLlib. Let me see how I'm doing for time; pretty good. Here's a different image than the one I showed you, but reinforcement learning is one of these huge spaces that's starting to see a lot of interest for a wide variety of reasons. The first big successes were in gameplay, like beating the world's best Go player. It's been used a lot in robotics; this bipedal robot in the middle is actually implemented with Ray. There's some interesting work being done closer to regular business problems: industrial automation, workflow management, that sort of thing is becoming a hot area, as well as optimizing systems like network topologies and HVAC systems. Netflix and YouTube have published some interesting papers about using reinforcement learning to improve their recommendation systems, which is an old problem, but now they're finding new ways to do it. And of course the finance world always leverages everything; finance is a time-oriented problem, and that's what reinforcement learning is really good at. So, peeling the onion a bit, what's going on in AlphaGo, the Go player, is that the observations are the state of the board, the actions are where you're gonna play stones, and the rewards, in this case, instead of a reward at each step, are only win or lose; you don't really know how you did until the end. And behind this they built a huge neural network that, at various layers, can identify different patterns in Go play. 
Reinforcement learning is a very broad field, and there are lots of different ways people are using it and building algorithms that do reinforcement learning. So what Ray RLlib tries to do is give you support for all of these different ways of doing things. OpenAI Gym is where we got that CartPole environment, for example. It gives you lots of algorithms that are built in, or the ability to easily define your own, lots of abstractions that can be glued together in different ways, and all of this running in a distributed way at arbitrary scale using Ray. You can actually try it out in SageMaker, if you're already using SageMaker; there's Ray in the middle of this picture, and Azure just rolled out support for reinforcement learning with Ray as well. Back to why Ray was created: if you think about all the different compute requirements you need to support in reinforcement learning, you've got things like simulators and game engines, which look a lot more like regular applications with complex distributed graphs of objects in memory. And the agent itself may be fairly complex, which is much different from what we're used to in data science and neural network training, but we have to do that too. And we have to do this over and over again as efficiently as possible, because the only way to train in reinforcement learning is to play over and over again and learn as we play. So that diversity of CPU requirements, memory access patterns and so forth is what drove the need to build something very flexible like Ray, but also very efficient, so that you can build tools like RLlib on top of it. The second library I'll mention quickly is Tune, which is for hyperparameter tuning. What is hyperparameter tuning? Well, it's become a research area in its own right, because of problems like deciding what's the best neural network architecture for my problem. 
Every number you see here is a hyperparameter, meaning that before I even train anything, I have to decide on the structure of the neural network. How many layers am I gonna have? How big are they? If I'm doing convolution, what's the size of the convolution? What about pooling, and so forth? So the space of possible neural networks is enormous, and you don't wanna naively search all possibilities. You want some intelligence that makes it easy, hopefully, to get to a good architecture relatively quickly, without a lot of expensive compute time. So Tune is really great for being very concise in how you declare what you want. And it integrates with lots of frameworks, like our favorite neural network and machine learning libraries, as well as intelligent algorithms that try to optimize the tuning process; in this case, I'm using Bayesian optimization. It's also optimized for neural network training, because that's where the real problem of hyperparameter tuning exists, and it's designed to make it easy to plug in new frameworks and new algorithms. The last thing I wanna talk about, for those of you who aren't interested in machine learning, is using Ray to build applications or microservices. I won't go into all the reasons we create microservices; I just wanna focus on one part, which is the need to manage things separately. If you think about it, we often have separate microservices because we might need many more instances of some things than of others, for scalability reasons. Some microservices might be evolving much more quickly than others, so we need to swap them out very frequently. But the downside is that we have all of this stuff to manage ourselves. We have to have a lot of instances, because no one machine is big enough, probably, and we don't wanna rely on one machine that could bring down our application if it fails. So we have all of this manual work to do to deploy our applications. 
Well, the vision of Ray is that we could go back to thinking about one logical instance for each microservice, while behind the scenes Ray schedules all of these actors and tasks across our cluster transparently for us, and we don't have to do nearly as much of that instance management we had to do before. There's a lot more to be done here to make this as fully powerful as we might want, but to me this is one of the exciting aspects of Ray. It's really what we've been doing with RLlib and these other tools, but it applies equally well to general-purpose applications. And it also integrates nicely with systems like Kubernetes, because Ray works at a much finer grain than Kubernetes does, so it complements those frameworks nicely. So if you're interested in adopting Ray, one thing you might look at, if you're already doing multiprocessing with libraries like joblib or multiprocessing.Pool, is that Ray actually provides some drop-in replacements that break the single-node boundary. Just by changing the import statements, you can now schedule across the cluster rather than just across the cores in a single machine. And it also integrates nicely with asyncio, if you like programming with coroutines; there's a very nice way to use Ray that way as well. So check out ray.io for more information. I've been writing tutorials that you can find at the Anyscale Academy at this URL. You're welcome to join the Ray Slack; that's the best place to find out about Ray, to ask questions and so forth. And there's even a Google group for it. Once again, please check out our Ray Summit conference this fall; it's free, it's online, and we hope to see you there. Thanks for listening. I'll be happy to take your questions in the Discord channel. Hey, thank you so much. That's amazing. 
Actually, I really love it, because I really hate setting up all these pipeline and distribution things. I was doing data science, and that really bugs me, so it's actually really good that you have one thing for everything. So let me have a look, because there's no Q&A in Zoom, at the chat, to see if anybody's firing in any questions. If not, then maybe I'll ask one question myself, because I'm really interested. So, you said that it's for everything. Is it difficult to set up? I'm really not good at deploying anything. Would you say that it's good for independent people, for example, my pet project that I just do myself, without colleagues? Is it okay for people to use it that way, or is it usually more of a corporate solution? Yeah, that's a really interesting question. Ray clusters, when your laptop isn't big enough and you need to go to a cluster, can be as easy as just standing up some instances in Amazon and running the initialization script on each one of them, and then it does a pretty good job bootstrapping itself from there. When you submit code to that cluster, it's transparent: you just tell Ray, here's the address of my cluster, and it will actually upload the libraries and so forth that you need. So generally, you know, I've used a lot of distributed system tools, Spark and Hadoop and all this stuff, and it's about as easy as it can be in that way. It's not completely seamless; some of the technology we're working on at Anyscale will hopefully make it even easier, where you don't have to think about it much at all. But right now there's a little bit of setup; it's not too difficult, and it can be as lightweight or as big as you want. Yeah, that's good. That's good. 
I love the flexibility. So thank you so much, I think it's really interesting. If people want, you can get in touch; the information is there in the slides. And thank you so much. I hope you enjoy the rest of the conference and the show afterwards as well. Oh, thank you. Thank you. Thanks for having me.