Okay, well, thanks very much for coming. My name is Erik Erlandson, I'm a software engineer at Red Hat, and I'm here to talk to you today about integrating Open Data Hub with Ray. Or you can think of it as Jupyter and Ray in the cloud. So here's the basic landscape of the talk: I'm going to describe what Ray is and how it works at 10,000 meters. Then I'm going to put Jupyter and Open Data Hub into context, where it's coming from and why. And then I'll describe how I got Ray on ODH to work, do a little demo, and then close with some of the relevant community collaborations.

So Ray was designed to occupy a niche on the spectrum of parallel computing tools: at a significantly higher level than something like OpenMPI, which is of course extremely low level, but giving you a bit more fine-grained control over what kind of parallelism you actually get than something like Apache Spark. In Ray, there are two kinds of things you can parallelize. There are tasks, which are just like executing a function, and actors, which act like parallelizable services. The corresponding programming model is actually quite easy to use: if you have any kind of Python function, you can simply add the @ray.remote decorator and Ray will wrap it in some parallelizable code. And similarly, if you have a class, you can apply exactly the same decorator, and now you have an actor that deploys to Ray as a running service (there's a little sketch of both below).

If you've ever worked with something like Apache Spark, this will probably be familiar to you: Ray represents its computations as a declarative DAG. By declarative, what I mean is that you build up the computation with calls, but it doesn't execute until you actually tell it to. So here you can see we're basically building up a little tree that is going to add up a bunch of integers: the most over-engineered summation of integers you've ever seen. We're basically saying, hey, I'd like you to set up a compute where you add integers one and two, three and four, and so on, and then add the intermediate values. And finally, at the very bottom, when you say ray.get, Ray goes off and says: OK, now you want me to compute something; it unwinds this DAG, does the compute, and sends you back the result (also sketched below). I took this particular example and diagram from a nice blog post by Robert Nishihara; if you want to go check it out, it's a nice write-up.

Ray's data model is basically a typeless object store. This is implemented by Plasma. When Plasma started, it was actually part of the Ray codebase; it's since been exported to Arrow, so now you can find the Plasma stuff under the Apache Arrow project. Like a lot of things in Python, it's typeless and schema-less: it's just data that you push up there. It is designed to be local-first, which is to say that if you have a computation using some data, Ray will check whether the data is available locally before pulling it from elsewhere. So as long as things are staged properly, the data stays close to the computation. Ray's scheduling model is similar: it will try to execute the functions you give it locally first, and it will only submit work to the global scheduler if it can't service the computation locally.
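To make the two flavors concrete, here's a minimal sketch of a task and an actor. The function and class names are illustrative, not from the talk:

```python
import ray

ray.init()  # local mode for illustration; the demo later connects to a remote cluster

# A task: any Python function becomes remotely executable.
@ray.remote
def square(x):
    return x * x

# An actor: the same decorator on a class deploys it to Ray
# as a running, stateful service.
@ray.remote
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

ref = square.remote(4)                      # returns an ObjectRef immediately
counter = Counter.remote()                  # starts the actor process
print(ray.get(ref))                         # 16
print(ray.get(counter.increment.remote()))  # 1
```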
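And the over-engineered integer summation might look like this. Each .remote() call records a node in the graph and returns an ObjectRef, and the ray.get at the bottom is what blocks for the final result. This is a sketch in the spirit of Nishihara's post, not his exact code:

```python
import ray

ray.init()

@ray.remote
def add(x, y):
    # Ray resolves ObjectRef arguments to their values before calling us,
    # which is what wires the tree together.
    return x + y

# Leaf layer: pair up the integers.
a = add.remote(1, 2)
b = add.remote(3, 4)
c = add.remote(5, 6)
d = add.remote(7, 8)

# Inner layer: add the intermediate values.
e = add.remote(a, b)
f = add.remote(c, d)

# Root: ray.get unwinds the whole DAG and returns the result.
print(ray.get(add.remote(e, f)))  # 36
```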
There are many, many libraries built on top of Ray. The big five are Ray Tune for hyperparameter tuning; RLlib for scalable reinforcement learning; Ray SGD for distributed training with standard stochastic gradient descent; Ray Serve, which lets you actually serve the models you train; and, a relatively new addition, Ray Datasets. I was just telling you how Ray's native object store is typeless and schema-less; Ray Datasets, by contrast, are sort of like data frames, for when you're working with columnar data, which we often do (there's a small sketch after this overview). There are also tons of integrations: you can integrate Ray easily with things like Dask, PyTorch, and scikit-learn, and with some of the more specialized libraries like spaCy and Hugging Face. The URL at the bottom has a listing of all of them.

One thing you might have noticed is that all of these are things that people already use on Jupyter. So if we integrate properly, we can get literate programming and interactive, iterative development with Jupyter, but with Ray as the compute engine behind it. And these are both tools that run on Kubernetes, which is where the demo I'm going to be doing lives.

When I started on this integration, you connected to a cluster using ray.init, and it had a strange limitation: you could only connect from the physical node that the head node was running on. So if you wanted to do this, you would actually have had to install the Ray head node and Jupyter into the same pod, which is obviously not very convenient architecturally. However, not too long afterwards, they created a new client utility, ray.util.connect, which allows you to connect the way you'd really like to: you have some application, like a Jupyter environment, and you connect remotely to the Ray head pod. This allows you to stand up the Ray cluster separately and connect to it from the outside. Now, they keep evolving this, and ray.init, which used to work the old, inconvenient way, is now the standard entry point again, but it connects using this proper remote client capability. So how you connect to the Ray cluster is something fundamental that keeps changing, but it's now back to the old entry point (sketched below).

In my case, I wanted to consume Jupyter via Open Data Hub. There are a few reasons for that. Open Data Hub, first of all, is an open-source downstream of Kubeflow, so it's built out of the open-source toolbox. It is a reference platform, which is to say it acts as a repository of standard data science tools that many people like to use, and it allows you to deploy them easily, as a reference stack, into the cloud. And it's nicely federated: while you can easily install all these different tools via Open Data Hub, they're not super tightly integrated, and because it's all running on OpenShift, it's easy to add other kinds of tooling without breaking its model. Open Data Hub covers most of the phases of a machine learning workflow, from setting your business goal, through ETL, curating data, and training, to actual deployment and monitoring. And along the other axis, it also covers the corresponding personas: stakeholders, data engineers, data scientists, and the DevOps and operations folks who help you actually run the model.
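Here's the Ray Datasets sketch I promised. The exact API has shifted across Ray versions; this is in the style of the 1.x ray.data API, with made-up column names:

```python
import ray

ray.init()

# Build a small columnar dataset from plain records.
ds = ray.data.from_items([{"x": i, "label": i % 2} for i in range(1000)])
print(ds.count())  # 1000

# Row-wise filtering runs in parallel across the cluster.
positives = ds.filter(lambda row: row["label"] == 1)
print(positives.count())  # 500
```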
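And the evolution of the connection story, roughly, in code. The service name and port here are placeholders, not from the talk:

```python
import ray

# Originally: ray.init() only worked co-located with the head node,
# so Jupyter and the Ray head had to share a pod.

# Then the Ray Client arrived, letting any application connect
# to a remote head pod:
#   ray.util.connect("ray-head.example.svc:10001")

# Current style: ray.init is the standard entry point again, but it now
# accepts a ray:// URI and connects remotely via that same client capability.
ray.init("ray://ray-head.example.svc:10001")  # placeholder address
```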
Today I'll be primarily talking about Jupyter, which is right in there, but because this is an Open Data Hub environment, all the other tools are available to you for your problem solving. At Red Hat, we actually use Open Data Hub: there have been studies of application logs, log analyses, and cluster metrics on our operator projects, and internally some of the customer support data is processed with it, so it's a tool that we actually use.

So now I'm going to start describing how this actually works, and I'm going to start with an analogy: the existing integration, which is Spark on Open Data Hub. If you've ever used the ODH JupyterHub launcher, you bring it up, you pick an image, tell it a size, and kick it off. When you kick it off, the launcher, first of all, as it always does, spins up a Jupyter environment for you, but there are some other things it can do if you configure it. You can also configure a Spark single-user profile, and the ODH launcher will detect the presence of this out on the cluster and look for some resources to spin up. Those resources are described via what are called cluster service templates: basically YAML, standard Kubernetes-compatible object definitions. You can configure those objects to do something like stand up a Spark cluster for you, and then once you get into the Jupyter environment, the cluster is there and available for you to use. So you simply use the profile, and the corresponding template objects are nothing but ConfigMaps that sit out on the cluster.

And so it made me wonder: if I can describe objects that stand up a Spark cluster this way, maybe I could also describe something that stands up a Ray cluster, and it would work the same way. I could give people self-service, personalized Ray clusters. I loved the idea, right? I wanted to show you this, not because I want to actually expose you to the YAML, but just to show you that these really are pretty standard. Here's the profile for the Ray notebook: I describe some things like the Ray cluster and resources, and at the bottom I describe how to fill in a RAY_CLUSTER environment variable for the notebook. And similarly, that points to a bunch of template objects, which you can see in the second ConfigMap. There's nothing special about it: you can see things like user names, how to generate special labels, and otherwise-standard things that Ray clusters like to know about, like how many workers, autoscaling rates, and stuff like that.

So now that I've described how this all happens, let's do a little live demo on our cluster. I mentioned that the person who took my original work and added all the latest Ray features in the updated images is Michael Clifford, so the demo you're about to see is thanks to Michael for putting it up on our cluster. Here we have a Jupyter environment. Because it takes a few minutes, I'm not going to show much of logging into the launcher, but I logged into the launcher and it spun me up a Jupyter environment. You can see it has Ray Tune and a bunch of other stuff already pre-installed on the image; Michael updated all of those images. And here I mentioned the RAY_CLUSTER environment variable: this, again, is basically preset for you. You can see it's just the name of a service that's sitting out on the OpenShift or Kubernetes cluster it's connected to.
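In code, what the notebook does with that preset variable is roughly this. I'm assuming the variable is literally named RAY_CLUSTER, as in the profile above; 10001 is the default Ray Client port:

```python
import os
import ray

# The launcher profile presets this to the Ray head service's hostname
# out on the cluster (assuming the variable is literally RAY_CLUSTER).
head_service = os.environ["RAY_CLUSTER"]

# Connect from the notebook to the remote Ray cluster over the Ray Client.
ray.init(f"ray://{head_service}:10001")
```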
So here we do the ray.init command with the standard port; it connects and gives us back an object describing the connection parameters. You can see the Ray version is 1.13. Now, what I do at this point is kind of arbitrary, isn't it? I've just shown you the integration and how it works. I'm going to use standard scikit-learn to generate an arbitrary classification dataset that everybody can use, and we will first run a baseline: a plain scikit-learn fit of a model, to determine accuracy on our synthetic data. That's of course the standard thing you might do if you weren't parallelizing, so next I do the same thing with Ray Tune, the hyperparameter tuner.

We'll create ourselves an objective function to send to it. Those of you who are perceptive might notice that I've actually stuck my training data in the closure of the function. That's not something you're typically going to do, but it works for a demo. Normally you would stage this in a Spark-like way using the Ray Datasets model, to make things efficient for large-scale data. We'll define a hyperparameter search space using the standard alpha parameter from scikit-learn, where you can see that our grid has six elements that we multiply out. And now, with our objective function and the search space, we kick off a tuning run. Here we're actually sending this stuff off to Ray and telling it to search over our space and send back the best possible result; it takes a little bit, but it actually works.

The first thing to notice is that the Ray head node is always up, so it immediately got a job running on that node. You can see it's after all six of those grid elements, but it doesn't have the workers for them yet, so it's just running what it can. Out here things get interesting: you can see it's actually acquired four of the six workers, and it's not going to acquire any more, because we asked it to use GPUs and our cluster only has four GPUs. The Ray operator is very adaptive: it will just keep trying to get stuff, and if it makes a request to the Kubernetes or OpenShift scheduler for more worker nodes and it can't be satisfied, well, it's okay with that. It would like to get all six; it's not going to have them, but it will keep running with what it's got.

In just over a minute it gave us all the results. Of course, this is a very small example: because we synthesized this data and stuck it in the closure, and it's super small, it would frankly have been faster to run it locally. But in the real world, of course, we have much larger models and much larger data, and parallelizing can be a big win. We can ask the object it returned to tell us what the best result was, and it said: I got my best result using an alpha of 1. So what I can do is just quickly train with those parameters one last time and see what I get. Here I got 88.8% accuracy, which is slightly better than the first guess; that last bit of accuracy is always a lot harder to win.
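Pulling the demo together, here's a condensed sketch of the notebook. The talk doesn't spell out the exact estimator or grid values, so SGDClassifier and these six alpha values are illustrative assumptions; the Tune calls are in the style of the Ray 1.x API the demo ran on:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

import ray
from ray import tune

# Synthetic classification data. For a demo it's fine to let the objective
# capture this in its closure; at scale you'd stage it via Ray Datasets.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: one local fit, no parallelism.
baseline = SGDClassifier().fit(X_train, y_train)
print("baseline accuracy:", baseline.score(X_test, y_test))

def objective(config):
    # One trial = one model fit with the trial's hyperparameters.
    model = SGDClassifier(alpha=config["alpha"]).fit(X_train, y_train)
    tune.report(accuracy=model.score(X_test, y_test))

# Six grid elements, as in the demo; the values themselves are made up.
search_space = {"alpha": tune.grid_search([1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0])}

# Requesting a GPU per trial is what capped the demo at four concurrent
# trials on a four-GPU cluster; the autoscaler keeps retrying for the rest.
analysis = tune.run(objective, config=search_space,
                    resources_per_trial={"gpu": 1})

best = analysis.get_best_config(metric="accuracy", mode="max")
print("best config:", best)

# Retrain once with the winning alpha and check final accuracy.
final = SGDClassifier(alpha=best["alpha"]).fit(X_train, y_train)
print("final accuracy:", final.score(X_test, y_test))
```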
So that's an example of using the power of Ray from a nice, convenient, self-serve Jupyter environment to parallelize things. As I mentioned, Ray has tons of integrations, and it's easy to pip install or pip freeze these onto your container images. The nice thing is that you can use the same Ray cluster to run all of them, so you can build up pretty complicated notebooks and workflows, all talking to the same cluster and parallelizing on it, and you have a unified scaling environment for your data science. And of course, unification also typically means simplification: you don't have to stand up a lot of specialized engines, like a special batch engine or a dedicated parallelizer; you can basically tell it all to talk to the Ray cluster and use that as the engine for the environment.

A lot of this work, especially early on, was done courtesy of the ODH installation on our shared community cloud, and also in cooperation with the Operate First project, where of course they're trying to take the principles of developing software in the open and extend them to operating software and services. This particular deployment I'm describing no longer exists, because the actual cluster I was on went away, but you can see here that I was able to use the Operate First project to describe everything I just showed you as a bunch of pull requests, and once I got them merged and deployed, the demo that I showed you was actually running on that cloud.

As for the future: I want to redeploy Ray plus ODH on the new cluster that came out of this work. We would like at some point to take the Ray operator and put it in the official operator catalog, so it can be installed via the Operator Lifecycle Manager. Things that have happened since the last time I gave this talk: we do have properly maintained Ray images, so there are standard builds and they're updated, and we have community use cases running Ray through Jupyter. On the roadmap, hopefully this quarter, is an actual formal integration with Open Data Hub: basically making what I showed you an official part of the overlays you get with the operator. Also this quarter, we're exploring taking models that we train and deploying them to edge devices, so possibly running Ray Serve at the edge, or alternative model-serving architectures. And for those of you who are familiar with the Open Data Hub world: everything I just described is done with the Open Data Hub JupyterHub launcher, but ODH is actually going to be moving away from this old launcher to a notebook controller, so we're going to explore how to do what I described here, simply using a profile, under the notebook controller. And that's the end of the talk. I encourage you, once we get this back up, to come play with Ray. If you have any comments or questions, or want to reach out, send me an email or find me on Twitter. Thanks, everyone.

Yep, and this is sort of a big advantage: you can think of a worker as a logical kind of node rather than a physical one, so if you run this example somewhere else, it'll behave pretty much the same as what you see here. That's a great question. First of all, obviously, you saw I was just connecting to the cluster it made for me, but you could stand your own up and connect to it, too.
Oh yes, yes, he was asking: the question was, why would I give each user their own cluster instead of a large persistent cluster for people to connect to? The first part of the answer is: if that's the way you actually want to work, you can. It's easy to produce these clusters using the Ray operator: you just create a Ray cluster custom resource and it spins one up for you. The reason I didn't do it that way is that I prefer to push the multi-tenancy onto the platform. While Ray can be multi-tenant, my experience with trying to use these kinds of platforms as multi-tenant is that they don't always implement the scheduling and the sharing well. Once you become multi-tenant, you're signed up for a lot of stuff: you have to be able to manage multiple people's jobs and how you allocate the resources between them. A lot of these projects, while they support that, end up not supporting it well, because it's kind of like rolling your own crypto. By doing it this way, I'm pushing that onto OpenShift and Kube, which already have highly developed scheduling and resource sharing. That's what I think is good about it.

Yes, well, it scales itself down. You saw how it acquired those workers; you can configure it so that, after a certain period of idle time, it just spins them down again, so it won't hold onto them forever.