Thanks for getting up this morning, everyone, I know — tomorrow it'll be even harder. So, I'm Steven Nules from the AI Center of Excellence in the Office of the CTO, and Vacek Pavlin is up here as well. Today we're going to talk about data science in the open cloud exchange model. For those who don't know what an open cloud exchange is, we'll get into exactly what that is, and there's a great demo.

So, first off, the good, the bad, and the ugly of proprietary clouds. Most of us in here probably work at Red Hat — at least I know the back row does — so we know a lot of these things. The benefits: elasticity, integration, security, deferred costs, service abstraction; they have operational excellence, and they're built on open source software.

But with all those benefits, there are a lot of negatives as well. First, vertical lock-in and data gravity. For those who have never heard the term data gravity, it basically means that the platform that owns your data also owns your processing, because you're not going to upload terabytes and terabytes of data to Azure just to pull terabytes back down and process it locally, right? You're going to do your processing up in Azure as well. So once they own your data, they own all of the investment around processing and everything you do with that data going forward.

From an open source software perspective, the public clouds do a great job of benefiting from all the hard work that goes into open source software, but they generally don't contribute back, and this has been an ongoing problem. Then there's the lifecycle dependency: Amazon and Google can change their services at will, and you, as somebody who's leveraging that service, just have to react to it. They may drop support for that service, so you're somewhat at the mercy of what they decide to do from a lifecycle perspective.

And then there's the challenge of black-box services and their reproducibility. When you think about highly governed or regulated industries like finance and healthcare, if you're making a decision based on data, you need to be able to justify that decision. With these black-box services, you don't know what processing they're doing, and there's not necessarily any reproducibility: if somebody changed the code in the background of that service, the answer you get the second time you run the exact same data through may be different, and that presents a challenge from a regulatory perspective.

So, why not just use a private cloud? You've got all those challenges with the public cloud — well, don't use it, right? Just do it all on premise. We certainly sell software that does that. Well, that has challenges too. First off, there's a lot of operational complexity in deploying and managing these stacks. If you're a large organization, maybe that's something you can absorb, but if you're a smaller organization, it can be a real challenge, especially depending on the scale of the operation you have to support. Secondly, a lot of private clouds have a poor user experience. A lot of the draw of using something like Amazon's services is that great user interface, that great experience of just pointing, clicking, tying things together, and how well that whole workflow goes. You don't generally get that same experience in a private cloud. Next, there's just the lack of diversity in services.
Again, unless you're a huge shop with a huge development arm, you're probably not developing all the plug-and-play services you would get access to if you used a public cloud, and that can be a challenge. Or, even if you could develop them, the lifecycle to get those things out the door is probably much longer than if you just used a service already sitting up in the public cloud. And in general, private clouds tend to be costly in terms of support, from a training perspective and an operational excellence perspective. Again, large-scale shops can absorb some of that, but at a smaller scale it's just not feasible, so you don't get the operational excellence you may need if you need a 24/7, five-nines system.

So there are other alternatives out there. OpenStack has done a great job of proliferating public clouds; there are a number of public clouds based on OpenStack out in the world, and that's fantastic. It's actually so fantastic that together they cover more regions of the world than any single public cloud provider can offer — a pretty good claim for an open source platform. But none of them have the scale or mission to attract the kind of diversity you would get in a public cloud, so they tend to be pretty niche deployments. Secondly, the clouds are not homogeneous with each other: there's a lot of diversity in the services they offer, they're based on different base images, and there are different flavors across the OpenStack deployments, so you can't plug these things together very easily. And last, not a single one of them has the mandate or the funds to try to overcome any of these issues. Not one of them is even worried about trying to build or collaborate across the other clouds, and that presents a challenge.

So what about a different model — a model we call the open cloud exchange, where we bring hardware vendors, software vendors, open source communities, research institutions, and governments together to operate in collaborative ways and provide a platform that can meet the needs of both small- and large-scale users? So what is an open cloud exchange? It's an alternative cloud model where many stakeholders, rather than a single provider, participate in implementing the cloud, operating the cloud, and providing the services that folks are going to use on that cloud. It has a multi-sided marketplace where participants cooperate and compete. If you're two organizations that need a similar service, then by all means collaborate, put that service together, and offer it to others; others will pay you for that service, and that helps offset your costs. But you can also compete: there's nothing that says we can't have multiple services providing similar capabilities but with different feature or functionality points, catering to, say, a highly regulated industry versus an open source community. And last, users can freely choose among the services out there. We have this rich bed of services made available, and users can pick and choose which ones they need based on their particular use cases. Maybe those choices are based on cost; maybe you've contributed infrastructure into the environment, so you've received credits for your contribution and want to cash in those credits on services that are relevant to you; or it could just be based on general capability.
You need something that's more secure, something that's higher speed — all of those things are available.

So what are the core use cases we're looking at for the open cloud exchange? Again, this is an environment where multiple vendors contribute at every layer of the stack: hardware vendors contribute infrastructure, network vendors contribute network components, software vendors put up their software, open source communities put in their components, and private companies — think of something like a GE or a Walmart — contribute infrastructure as well to meet their own needs.

One of the common use cases is when you have occasional seasonality and need to respond to high demand. It's not worth your time or money to invest in a massive private cloud on your own infrastructure, because you're only going to utilize it for a very short period of time, and bursting up into Amazon in a hybrid cloud model may not be the best option either. But you could fund the OCX model with a little more than your daily demand and run your workloads in the environment, such that on a daily basis you're actually receiving credits because you're not using all of the infrastructure you've provided. Then, when you hit one of your peak workloads, you burst and may take up your fully donated resources plus more, but in that case you're just redeeming credits against the extra infrastructure, and it ends up being a net win from a cost perspective. So it's a shared model for workloads with high variability. (There's a small, purely illustrative sketch of that credit accounting after this list of use cases.)

The next one is where software and hardware vendors get to contribute software, maybe before it goes GA, and get real user feedback on how that software functions; they also get a test marketplace for it. If you're a hardware vendor developing some new chipset and you think that chipset is well targeted to certain workloads, you can put that hardware into the OCX model, have users target their workloads at it, and get feedback: how did it work, did it meet what we thought? Then you can tweak and tune before you actually enter the market with that hardware, knowing with some level of confidence that it does meet the workloads and demand you wanted, and now you have a public viewpoint you can point to that says, look, here's how well it worked.

Next, it's a great platform for open source software communities, because open source communities tend to be somewhat constrained on the infrastructure and hardware side in what they can do to truly serve their community. This is a platform where they have access to resources they otherwise might not get, and it allows them to broaden their overall user base and put services out there for folks to take advantage of — it helps with the general publicity aspect of your community as well.

And lastly, it allows organizations that otherwise couldn't collaborate — because somebody would have to own the infrastructure — to collaborate on a shared set of infrastructure and say, hey, we want to do this project together, let's put the hardware and infrastructure into this OCX model, we can work jointly there, we can share our data there. It removes some of the barriers to actually making progress on these joint implementations.
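Purely as an illustration of that first, seasonal use case — the numbers, the daily settlement, and the one-credit-per-core-day rate are hypothetical, since the actual marketplace pricing model is still being designed (it comes up again near the end of the talk) — a toy sketch of the credit accounting could look like this:

```python
# Toy model of the OCX credit idea: unused donated capacity earns credits,
# usage beyond the donation spends them. All numbers are hypothetical.
CONTRIBUTED_CORES = 100                      # cores a member donates to the exchange
DAILY_USAGE = [60] * 25 + [150] * 5          # one month: 25 quiet days, then a 5-day peak

credits = 0.0
for used_cores in DAILY_USAGE:
    credits += CONTRIBUTED_CORES - used_cores    # settle once per day, 1 credit per core-day

print(f"End-of-month credit balance: {credits} core-days")
# The 25 quiet days earn 25 * 40 = 1000 core-days, the 5 peak days spend 5 * 50 = 250,
# so the member ends the month ahead even though it burst past its own donation.
```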
So, to recap: current clouds are expensive. A lot of niche industries are locked out of clouds or don't get access to them. There's lots of great open source software out there that can benefit from and contribute to this kind of environment. There are all those niche markets that are underserved and can't afford to set up their own private clouds. And lots of users, vendors, and customers are concerned about vendor lock-in. It doesn't have to be at the scale of AWS, either — there are plenty of studies showing that clouds don't have to be at AWS scale to actually provide a cost benefit.

The first OCX implementation that Red Hat has been involved in was stood up in the Massachusetts Open Cloud. For those who aren't familiar, I didn't put the website down here, but if you search for Massachusetts Open Cloud you'll find it. It's a platform that brings together various educational institutions, government institutions, private industry, and a whole bunch of software vendors that have put all of this infrastructure, software, and network into a common platform in support of this OCX model. And what is it to date? It's 90,000 square feet and growing, with tens of thousands of users accessing the infrastructure. Where we're getting involved — and what we're going to talk about today — is more on the OpenShift side of things. A lot of what they've been doing so far has been farming out virtual hardware for various research studies and use cases, and what we're getting into now is more of an as-a-service type platform, where users and independent researchers don't have to worry about setting up infrastructure or deploying any applications themselves. They just come in, go into an interface, do their data science experiments, and move on to the next thing. For our hardware, we have 250 cores and 1.5 terabytes of RAM, there's an OpenShift cluster that's been stood up dedicated to this effort, and then they have tons and tons of Ceph storage. So with that, I'm going to turn it over to Vacek to tell you more about the details of what we have running out there and how people are using it.

Thank you, Steven. Okay, so now you know about the hardware and the environment that we run in. Our team is building something we call Open Data Hub, which is an AI-as-a-service platform. Our target is to — I don't want to say compete with — but to show people that there is an alternative to the platforms provided by the public clouds. It's basically a meta-project: we don't specifically develop tools for machine learning ourselves, we rather integrate existing tools, whatever we find that the community benefits from or prefers. It's all built on open source, and all the integrations we do are also being open sourced. We also run the Data Hub inside Red Hat currently, which has more components than what we have publicly in our GitHub repo, but that is still going on and growing. The goal, apart from doing the integrations between things that are currently built in silos in a lot of cases, is also to foster collaboration between those communities. So not only do we want to integrate things, we would like to help and push people toward collaboration and contribution across the various projects. Another thing is to ensure reproducibility: as Steven mentioned, with public clouds you never know whether the service has changed — the model that you are using can be changed underneath you.
We think that if we build everything on open source, you can always find the real source of what you are running, and — whether from a compliance perspective or just out of your own interest — you should be able to go and find out why the thing you are using does its work the way it does. Another goal for Open Data Hub is to build flexible entry-point layers. If you are a researcher who needs to store a little data and do very simple processing of it, you should be able to focus on just the storage tier. If you are an app developer who has some data, but whose biggest problem is where and how to run your applications and how to set up pipelines for that, you should be able to find the right level of entry point as well. We also want to enable users to pick and choose services, just as you are used to from the public cloud. If there is a bunch of services, you are not bound to using them all; we want you to be able to choose whatever is important for you, and if you are using the open cloud exchange model, then only pay for what you are using, obviously.

We deploy Open Data Hub to the Massachusetts Open Cloud right now and help with the operations there. Basically, Red Hat provides support for OpenShift, Open Data Hub runs on top of that OpenShift, and we help the research communities on the Massachusetts Open Cloud leverage the tools we provide there. We are running an early adopter program for Open Data Hub on the Massachusetts Open Cloud, to find interested researchers, projects, or companies that would like to try Open Data Hub, onboard them, help them figure out how to use the tooling, and then help them with the research they are doing.

Right now there are two early adopters running. One is Best Bees, which is, I think, a nonprofit company that is taking a lot of data from beehives around the East Coast, processing it, and trying to help beekeepers maximize bee health and everything around that. They are also cooperating with NASA — there is a link to a video, so when the slides are published you will be able to see it, or you can put it into Google, I guess — where they collaborated and did some mapping on Google Earth of the beehives and how things are influenced, and so on. It's a really interesting video. The second project we are working with is basically university students and researchers. Their work is to analyze a lot of various papers, theses, white papers, whatever you can think of, and build a network from the citations in them. So if you are the author of a thesis and you thought, okay, I wrote this medical thesis, so I'm probably influencing the medical industry or medical space, then from the network they are building you might learn that your medical thesis also influenced, I don't know, some biology or technical field or something like that. So that is also an interesting project. We also have other projects that would like to join, but we have to onboard them very slowly, because the infrastructure and the operations are not yet at the point where we can just put everyone on the cluster. So if anyone is interested, you can reach out — there is a slide at the end.

Go ahead, Pat. Can I say something? Yes. In one day, how many bees do you have? So maybe the project can help.
It's a worldwide problem — bee count, let's say. So, what are the patterns in the services and the platform that we are trying to build? You still want a secure hybrid cloud platform, and it's based on open source, so there is Kubernetes and Ceph with its S3 interface as the foundation, and, let's say, Kafka as the stream-processing and message-bus piece. On top of that, you probably want to build some pipelines and do application lifecycle management, so there are CI/CD tools, and there are ways to store your data and your code and to match them together. When you are building an application, you'll probably do that in some language, so we have multiple language runtimes — and obviously it runs on top of Kubernetes and OpenShift, so whatever you bring in your container will run there, right? Analytics and AI processing is an important part, because we are building an AI-as-a-service platform, so we have a bunch of tools for that: for experimentation we have Jupyter notebooks; for model evaluation we are looking at MLflow; for model serving there are tools like Seldon and TensorFlow Serving; and we are also looking at a project called Kubeflow — if you are not familiar with it, it is a Kubernetes-native workflow manager for data scientists, with a lot of tools integrated. We also have a bunch of common services in there: for example, if you are familiar with AWS S3, there is an S3 endpoint that you can access in this platform and in this cloud. And you can go up the layers — there is obviously the part that spans the whole stack, so you need identity management, policies, and role-based access control. That is also part of the problem space we are looking into.

So, for data science in Open Data Hub, what do we have for you as a user? Basically, what is open source right now in the GitLab repo are these pieces of technology. We started with Ceph as the foundation for data, and we expose the S3 API for Ceph so you can access it as object storage. If you look for an example of how to do data science with AWS, you'll probably find it uses S3, so we have that as well. We use JupyterHub for our data scientists to experiment — if you are not familiar with Jupyter notebooks, I'll show a quick example a bit later, but it is basically an interactive engine where you can use various programming languages and interact with them through a web UI, so you don't have to install anything on your laptop and we provide it in the cluster. We use Apache Spark, a general analytics engine that can be used for data processing, model training, and things like that. And then TensorFlow is also integrated, which is the hot thing right now for building neural network models and other things.

So I think it's time for the demo — I'll show you what we have and how it's working. I came up with two personas here. One is operations: the people who will take Open Data Hub and deploy it to their infrastructure. That's, for example, me; we're doing that with the MOC, and when we deploy something or add some services, I'm mostly the person who goes and does it. And then we have data scientists — a bunch of them are sitting in the back row, so that's how I picture them — who come and use the platform to do the data science: collect the data, process it, and work with it. Okay, so this is OpenShift.
You've probably seen some talks yesterday, or are planning to go to some talks today, about OpenShift — that's our enterprise distribution of Kubernetes. If I want to deploy Open Data Hub, I ask my OpenShift administrators to add the Open Data Hub APB into my OpenShift catalog. An APB is basically a packaging format for containerized applications, based on Ansible. So I just search the catalog for Open Data Hub. Now, I'm the operations persona, right? I want to deploy it, so I'll just click it, and it tells me what it is: an AI-as-a-service platform based on open source. Great, that's exactly what I want. I don't want to do the production deployment, so I'll select the development plan from the APB. I need to select a project; I'll create a new one, so let's call it demo. I don't need to fill in these fields. Let's give it a bit more memory just for the sake of it. I want to deploy all the images that we have for the notebooks — and I remember that I don't have to build all the images, I can just use them. So now I click create, and when I go to the overview I start seeing things being deployed. We are trying to make this as easy as possible: an integrated platform shouldn't only be easy when you are using it, it should also be easy to deploy, without having to go through too much trouble.

As I mentioned, part of the platform — and this is just the dev plan, so this is more for if you want to try it — is Ceph, in this case Ceph Nano. That's a single-container deployment: it is not scalable and it doesn't give you big data storage. If you want that, you want the production plan, and you want to deploy Ceph on the side as a proper deployment, not this small one. We have JupyterHub, which is for accessing the notebooks and working with them; it brings in a database for itself to take care of users. And then we have the Spark operator, which is built by our colleagues at Red Hat from radanalytics and which takes care of starting our Spark clusters. So if you are a user who comes to JupyterHub and decides to use Spark for your analysis, you only need to log in and select the Spark notebook, and JupyterHub will know that you are going to do some Spark work. When you spawn your Jupyter server, it will also create a Spark cluster for you to work with, so you are not sharing a Spark cluster with others and fighting for resources — you know that whatever you do is there for you.

Now, Open Data Hub is not in OpenShift by default. I added it there for the demo, but if you go to opendatahub.io there is a link to the repository, and in the repository there is a README with the steps for how to deploy Open Data Hub. I'll definitely get to it; I can show the repository as well. You can run it on your laptop, but it will consume at least 12 gigabytes of RAM, so you probably need at least 16 gigs of RAM in your laptop. For the dev plan we are trying to bring that down a bit, but it is hard if you still want to make it useful, so that you can try things out in Spark and not just wait an hour to analyze 300 lines of some CSV file. So there are trade-offs, but yes, you can definitely deploy it on your laptop with Minishift.

I've already deployed it here as well and did some configuration in there, so I just started it to show what the notebook interface looks like and how we work with it. We can connect to Spark right from the notebook: the integration works through an environment variable that gives you the Spark cluster URL, so you don't have to figure out how these things are connected or what the name of the service is — it's all in the environment variables, roughly like the sketch below.
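As a minimal sketch of what such a first notebook cell might look like — the environment variable name SPARK_CLUSTER and the toy computation are illustrative assumptions, not necessarily the exact names the Open Data Hub images inject:

```python
# Rough sketch: connect to the per-user Spark cluster from a Jupyter notebook.
# SPARK_CLUSTER is an assumed variable name holding the Spark master URL that
# the deployment wires into the notebook environment.
import os
from pyspark.sql import SparkSession

spark_url = os.environ.get("SPARK_CLUSTER", "local[*]")   # fall back to local Spark

spark = (SparkSession.builder
         .master(spark_url)
         .appName("odh-demo")
         .getOrCreate())

# Trivial computation just to prove the cluster is reachable.
print(spark.sparkContext.parallelize(range(1_000_000)).sum())
```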
So this is the notebook: these cells are where you write code, and then you run them, and the notebook communicates with the backend on the cluster in OpenShift, executes the code, and returns the results. This is Python, but you can have different kernels — that's what the engine that executes the code is called — so you can have R, you can have Julia, we are looking into using a MATLAB kernel, and there are a bunch more like .NET and JavaScript. So you can do interesting things. Now you can see that it finished and returned some number computed in Spark. Great data science, right? As I said, I'm more on the operations side, so if the folks in the back row were presenting, they would probably show much more interesting things on the data science side.

Here we have how to access the storage: we also have the Ceph S3 endpoint URL in our environment variables, so we can connect to it and list the buckets. You can see there is nothing yet, so I can create a bucket through the S3 API and then upload some file and take a look at it. It returned 200, which is good, so the bucket got created. Now it's uploading a file — I don't know why it's so slow; I think the cluster is a bit sick after yesterday's workshop, when we basically burned it down. So now I've uploaded a file, and I can see that the file I uploaded is actually this notebook itself — you can see it's there. If you imagine I had some big CSV with tens of thousands of lines, I would upload it the same way and then work with it through Spark: I can run SQL queries on that file and things like that. (There's a rough sketch of those cells right after this.) So I think that's it for the demo, and I'll go back to the slides. Maybe it's just the internet — yeah, it could be. If there are any questions while we are loading here, feel free to ask.

And one of the things, again, pointing back to the beginning of the talk about the open cloud exchange — maybe we went over it pretty quickly, but if you were an independent data scientist and you wanted to do that type of experiment, that was a lot of infrastructure that got laid down with a couple of answers typed into some boxes, and you could immediately go in and start using the environment. It was already plumbed together, so you don't even have to know about the other services — they're all just environment variables. All of that is taken care of for you, so you're concerned only with your research, your project, your data; everything else just works. Very similar to an Amazon SageMaker type of experience, or Azure, or Google — all done with open source software in an open cloud exchange model.
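As a hedged recap of those S3 and Spark steps — the environment variable names, the bucket and file names, and the assumption that the notebook's Spark image has the s3a connector configured against the same Ceph credentials are all illustrative, not the exact names used in the deployment; the spark session object is the one from the earlier sketch:

```python
# Rough sketch of the S3 part of the demo against Ceph's S3-compatible gateway.
# Variable, bucket, and file names are placeholders.
import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["S3_ENDPOINT_URL"],            # Ceph's S3-compatible endpoint
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

print(s3.list_buckets()["Buckets"])                        # empty at first

s3.create_bucket(Bucket="demo-bucket")                     # HTTP 200 means it was created
s3.upload_file("demo-notebook.ipynb", "demo-bucket", "demo-notebook.ipynb")

# With a bigger CSV uploaded the same way, Spark can query it straight from S3:
df = (spark.read
      .option("header", "true")
      .csv("s3a://demo-bucket/measurements.csv"))
df.createOrReplaceTempView("measurements")
spark.sql("SELECT COUNT(*) FROM measurements").show()
```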
We left a lot of time for questions now. So, we talked earlier about some of the principles behind Open Data Hub — how you want to enable collaboration between communities, stick to open principles, and make user choice and flexibility of tooling a big component. I'm curious about future plans to grow that: will users be able to bring their own tooling into this environment, their own libraries and frameworks?

Yep, absolutely. So right now — and I think this is actually one of the next slides; I'll skip this one and go right here — to answer the question, it's probably this part: what's next. Right now, what I showed is what is in the GitLab repository for the Open Data Hub deployment, and we want to add more components. As mentioned before, we run many more components internally, so we want to add them there, but also, if someone has an idea of what should be there or what is missing — it is a public GitLab repository, so pull requests are welcome. Feel free to fix it. Honestly, if ideas come from the community, that is even better than us just pushing everything in there; we would definitely love that.

This may fall under the lessons-learned side of things, but I think from a user experience standpoint, when it's up and running, they love the experience. It's extremely intuitive, very easy — user interfaces they're already accustomed to using. The biggest challenges we've had thus far are really more around operating the environment, and that's probably something we took for granted in the beginning, because the MOC has pretty extensive experience running an OpenStack cloud but they're new to OpenShift, so there's probably a bit more we should be doing on the enablement side to help accelerate that. But from the user experience side, all the feedback has been extremely positive.

Yeah, and it plays into the lessons we learned: it is hard to plan capacity for data science until you know what the person is actually going to do, so that's something we need to get better at. We are also starting a cooperation with the MOC where they analyze the workloads and then potentially adjust the resource limits and things like that based on some machine learning models, so that might help there. As Steven said, OpenShift version 3 is pretty hard to operate at scale — keeping it up, keeping it running, keeping everything working with storage and registries and everything — so that was, I think, the biggest hurdle, because we bumped into a lot of these obstacles along the way to keeping stuff running in the MOC on top of OpenShift. They are still learning; they are completely new to OpenShift, so it's a hard road to take. We hope that OpenShift 4 will solve that for us, because it's all self-driving and machine-learning based and operator based.
One other problem we have is aligning priorities and setting up communication with the operations side of the MOC. At the beginning it was hard to know what was important to do first. We were like, okay, give us some cluster, and we'll deploy there and try it — but then we realized the cluster is, I won't say useless, but not big enough, not stable enough, and not production-ready enough to actually onboard people onto it. So now they are setting up a new, bigger cluster, which brings new problems, because they set up a new OpenStack with different versions of things and so on. We are still working on that, and on how to build a process around these things to get OpenShift running in a production manner, and then have Open Data Hub running on it in a production manner as well.

For what's next: as mentioned, we would definitely welcome contributions, and we're going to add more components and more deployment configurations for the components we run internally. We're also working on something called AI Library and on AIOps, and we would like to add those to Open Data Hub. AIOps is basically using machine learning and AI to operate your cluster — to help with operations by analyzing logs and analyzing metrics — so we would like to add that to Open Data Hub itself. AI Library is a set of components, pre-trained models, and code that can train models, where you can pick and choose: I have the data, it's in the right format, I just choose an algorithm I want, like correlation or clustering or something, and I get the results out at the other end. That is also part of Open Data Hub, but it is not yet integrated into the public repository well enough.

As I mentioned, we are also looking at Seldon and TensorFlow Serving and other tools for model serving. If you are able to train a model, you can do that in the Jupyter notebook — that's not such a big deal if you have enough resources — and then you need to upload the model somewhere, which you can do to the S3 endpoint, to the object storage; that's also easy. But then you need a simple way to run the model so that you can make predictions from it, so that you can query the model itself. So we are looking into adding maybe Seldon — or Kubeflow has other tooling — there are multiple ways to do these things, and we are investigating them. (There's a rough sketch of what querying such a served model could look like a bit further down, after this answer.)

We are also working with Boston University — that's on my previous slide — and with the Czech Technical University on an open cloud marketplace: basically building a marketplace of services on top of the open cloud and figuring out the pricing for those services automatically, so that, as Steven mentioned, if you provide infrastructure you get some credits for it, and you can then spend them in the marketplace on consuming other services and other resources.

Right — is there a limit to how this infrastructure scales? I'd say we are still working on the actual boundaries of the marketplace and how it will actually work; I don't know too much about that project, to be honest. The way they've done the scheduler right now, they actually under-subscribe their resources, so if you were to come in and request a certain amount of resources and they showed up as available, you'd be sure you got them. Right now they are hardware rich and software poor, so there's no chance of running out of infrastructure. And it's first come, first served, as I said — first request, first served. Right now, yeah.
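As promised above, here is a rough, hypothetical sketch of what querying a served model might look like once it is deployed behind a Seldon-style REST endpoint. The host, path, and payload shape are placeholders for illustration — this is not the actual Open Data Hub serving API, which is still being investigated:

```python
# Hypothetical example: query a deployed model over REST.
# The URL and the JSON shape are assumptions, loosely following Seldon's
# convention of posting a feature array and getting a prediction back.
import requests

SERVING_URL = "http://model-serving.example.svc:8000/api/v1.0/predictions"   # placeholder

payload = {"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}}   # one feature vector

resp = requests.post(SERVING_URL, json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())   # the model's prediction for the submitted row
```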
Or does whoever contributes most have more rights? That's the marketplace model we're working on. Right now, again, there's no issue with accessibility; the marketplace model they're trying to come up with is exactly that — how do we prioritize based on all those factors, the credits received and the contributions made — and they're actually going to apply some machine learning to how that calculation gets derived. That's an area of investigation that falls a little bit outside of what we're doing from an Open Data Hub perspective, but it's something they're doing on the operations side.

And probably one thing to add on, I guess to Mike's question before, about the software within Open Data Hub and the roadmap: we haven't gone into the full roadmap here for Open Data Hub, but there's no technology limitation on what goes into it. Again, it's sort of a meta-project that ties things together. The fact that we used Jupyter notebooks today is simply because they had the greatest demand from within Red Hat and from the users we spoke to. We're starting to get increased requests for things like RStudio, and if someone wanted to bring along Zeppelin and contribute that into the environment, that would be fine. We're not saying notebooks are the only interface into data science; this is just where we started.

So, we described how we work with the MOC, the Massachusetts Open Cloud. We are also starting to work with universities in the Czech Republic. Currently we are working with Masaryk University in Brno and the CERIT Scientific Cloud that they run together with CESNET. They are moving to OpenStack and to Kubernetes, so our goal is to get some OpenShift there so that we can deploy Open Data Hub and encourage them to use it; we already came up with one pilot project for doing that, so I think that's going to go very well. And then we are also working with the AI Center at the Faculty of Electrical Engineering at the Czech Technical University in Prague. They are also building some data centers, so we are consulting with them on how to build them so that Open Data Hub and OpenShift can run there, but we also cooperate with them on specific projects from the data science, machine learning, and AI world. One of them is the dynamic pricing that Steven mentioned for the marketplace: they would like to help us build the models for that dynamic pricing. Our team is working on a project called Thoth — you can talk to Christoph in the back about it — which helps developers find the right set of dependencies for their project, and we will also collaborate with the Czech Technical University on that. They are also very interested in the AIOps work for Open Data Hub: analyzing logs, metrics, things like that, to make operating your clusters simpler, so that you don't have to look at all the metrics with your own eyes — a model can do it for you, and you just get a nice alert if something is happening.

We already went through this, and that's basically it for the talk. You can try it yourself — there was a question about whether you can run it on your laptop, and I answered yes, you can, but it's going to consume a lot of your laptop's resources, so we will try to work on that; you can also spin up a VM and deploy it there, I guess. If you go to opendatahub.io, you will find that we are starting to push out some content, like
blog posts. There is a mailing list you can subscribe to, where all the news will be and where you can provide suggestions for which components or technologies you are interested in and what would make sense to add. You can also find a link to the repository there, with the instructions on how to deploy with the APB. If you want to be included in the early adopters program, you can contact any of us — Steven here, Sherard in the back, and I am there as well — and we will do our best to include you if you have an interesting project you would like to try to run on Open Data Hub. And I think that's it for the talk; we have five minutes for questions.

That was either very confusing or very, very clear. The question is on multi-tenancy. Right now we don't leverage the OpenShift multi-tenancy for this. We use OpenShift as an authentication provider, but everything that I showed — the Spark cluster and the Jupyter notebooks — is deployed in a single namespace, so basically we need to have a very, very big namespace for that. We are also looking into how to split these things up and then use the OpenShift resource planning and resource quotas, so that everyone gets their own namespace. But right now it's all deployed in a single namespace, and JupyterHub takes care of the multi-tenancy part: you get your own container where your code runs, in Ceph you get your own credentials so you can create your own buckets, and the Spark cluster is also an instance that is assigned to you.

So we have one multi-tenant app, and it's deployed on an OpenShift cluster, running in a single namespace — right? Yes. Right now, the vision for this version of Open Data Hub is that you have one Open Data Hub, and your data scientists come in as they want or need and use it. But we are also looking into running specific services per user and using the OpenShift multi-tenancy. I think it will very much depend on what kind of service it is. For example, with Kafka you want to have a single instance with many streams coming in, but there might be services where it would make more sense to deploy them in your own namespace — the model training or model serving jobs would probably be better off in your own namespace with your own resource limitations. Okay, thank you very much, everyone. Thank you for coming.