So, good morning everyone. Just to keep the ball rolling with the previous talk, I have a few questions to break the ice, so bear with me. The first one is: which of you knows what a Virtual Kubelet is, or is even developing one or working with one? Yeah, all right, cool. And which of you then thinks it would be possible for a user with a basic to average knowledge of Kubernetes to get that stack to work, let's say improving it and developing their own provider? That's the question, and that's the problem we envision to take on with interLink, the project that I'm presenting.

But first of all, let me present myself. I'm a technologist working at INFN for the ICSC centre. INFN stands for the National Institute for Nuclear Physics, while ICSC, which is quite hard to pronounce, is the Italian research centre for supercomputing, big data and quantum computing. Let me also say right away that this is not a one-man show: I really want to thank all the people that put effort into this activity, and you can see from there that it is not a single-institution effort either. We have several institutes from all over Europe working together to get some cool stuff done here.

Setting the stage: what is INFN, who are really our customers, and which kind of resources and providers do we have at our disposal? As you can already imagine, we have a set of scientific communities working together in the domain of particle physics, and we have five committees that are all in need of computing power, computing power that comes mainly from a distributed infrastructure in Italy composed of several sites. Each of these sites has different maintainers and different kinds of computing resource managers, so this is one of the challenges we are going to face. Not only that: in the frame of ICSC we also have many other science domains joining these requests for resources, not only physics but also genomics, chemistry, materials science and so forth. So all kinds of sciences are striving to access these resources, be they normal batch queues or specialized resources with a lot of GPUs and computing power. What we need to do in the R&D I'm presenting is find the balance, the match, between serving these different communities, with the evolution of their frameworks, which fortunately enough are converging toward a cloud-native set of tools, and the resource brokering, which is very complex in this kind of geo-distributed scenario.

There is then one other key point here: we participate in a European project that is to be considered the real catalyst for the solution I'm showing, called interTwin. interTwin is a European project composed basically of providers and use cases, and the use cases are digital twins: users that want to run their digital twins wherever possible, accessing GPU resources wherever they are available, with the very same interoperability approach that the cloud can provide. So in this case the need for a cloud-native way, or interface, to access these resources is even stronger, and you can see, maybe not from here, but trust me, among the providers we also have a lot of EuroHPC, or HPC centres in general.
So here the challenge is really: on one end we have a set of cloud resources where we can provide our customers, our users, our scientists, with an interface they know how to use, be it Kubeflow or other solutions, and on the other end we want to get access to this amount of resources at the HPC centres. What we need, in fact, is an API layer, an abstraction if you want, to rule all that. That's where we are getting at, and to answer the question of which API we need, we started really from the user needs. We did some polls here and there, and the ecosystem really led us to the solution.

We were in a situation kind of like this one: we have a lot of frameworks, a lot of people doing different stuff in different ways, but fortunately enough for us they are all converging onto containerized environments, and trust me, that was not a given if you look back a couple of years. So the assumption is that users of this kind access containerized, cloud-native frameworks of any sort, and all of them talk to a Kubernetes API under the hood. So the point of contact we decided to put in place is: all right, we need something that serves this guy here that is running away. Let's say I create a pod, but actually I want it to run anywhere. Be careful: I am presenting the results for the HPC part, so accessing GPUs in this scenario, but this is applicable also if you have a machine on any site that you want to access and there is no Kubernetes there.

In fact, when we were trying to shape up our needs, we started saying: okay, first of all, we have containers adopted by our customers, that is cool. Then we have most of the infrastructures that, for this very reason, are container-ready one way or the other. And we thought we might give it a try and set a Kubernetes API as our common base for satisfying them. We had a problem though. I already anticipated that we have some systems that do not allow us to install all the needed Kubernetes components: typical HPC centres have Slurm, usually, or some other kind of batch system, and they are really good at it, so there is really no need for them to change anything. So, what could we do at that point? Something started to ring a bell, also thanks to the previous KubeCons, and we started thinking about the Virtual Kubelet project. We took a careful look, and in fact it's essentially what we needed, because it creates a Kubernetes cluster node, but a virtual one: it gives access via the Kubernetes API, but fakes the node away and translates the pod request into something else. Something else that can be whatever, and in the case of an HPC centre can be, say, a Slurm job with Singularity or Apptainer or whatever kind of container runtime. So the two main points from before are respected: we keep a Kubernetes API, and we exploit remote resources.

How can we do that? First of all, we need to set some use cases in our mind, and this one is the baseline for everything I'm going to show you in a moment. I have my analysis pipeline, and I can decide and tell Kubernetes that this particular step of the pipeline has to go to a EuroHPC node running Slurm; the magic happens in the virtual kubelet node, where this message is translated into: okay, I want to go there, allow me to submit a job and return the results to my user.
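To make that baseline concrete, here is a minimal sketch, in Python with the official kubernetes client, of what assigning a pipeline step to the virtual node could look like. The node name, toleration key and image below are hypothetical placeholders, not the actual interLink conventions.

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside the cluster

# Hypothetical names: adjust to the virtual node and taints of your own deployment.
VIRTUAL_NODE = "vega-virtual-node"

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="pipeline-step-inference"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        # Pin the pod to the virtual kubelet node that fronts the HPC centre.
        node_selector={"kubernetes.io/hostname": VIRTUAL_NODE},
        # Virtual kubelet nodes are usually tainted; tolerate that taint.
        tolerations=[client.V1Toleration(
            key="virtual-node.interlink/no-schedule", operator="Exists")],
        containers=[
            client.V1Container(
                name="step",
                image="ghcr.io/example/3dgan-inference:latest",  # placeholder image
                command=["python", "run_inference.py"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

The only Kubernetes-facing knobs are the node selector and the toleration; everything else stays a plain pod spec, which is the whole point of keeping the Kubernetes API as the entrypoint.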
Of course, we did some scouting before starting, and there was a lot already on the plate. We tried that first, and in particular I'm referring to the KNoC project, which worked on our very first try at Vega, the Slovenian EuroHPC site, with some caveats I would say. So we gave it a try and said: okay, there is some wow stuff and there are other things that are meh, but that we can somehow improve. The wow part: everything worked, also thanks to the fact that our pipelines for science use cases are usually very standalone sets of jobs that download an input and produce an output, that's all. The meh part was mainly due to the particular setup of our distributed resources. I told you before that there are many centres managed by different people with different preferences, both in terms of computing resource management and in the language of choice for developing stuff. So we really wanted to delegate to the provider the ability to pick the container management system they are most comfortable with; what we want them to be in charge of is saying: this is how I want to run containers on my infrastructure. And that is not so easy, since not quite a black belt, but some deep knowledge of how Kubernetes works inside is needed if you want to implement the whole chain of a virtual kubelet.

So we tried to split the virtual kubelet implementation into two parts: one that we are going to maintain centrally, but that can be extended as needed. We maintain the Kubernetes logic on one side, and we set a kind of agreement, a kind of OpenAPI spec if you want, between the centrally managed part and the part provided by the computing resource administrators. In other words, we tried, and are still trying, to streamline the process of creating a virtual kubelet, providing a solution that lowers the barrier to extending such a technology onto ideally any remote resource. We started from HPC; it can be a container-as-a-service, as we will see in a moment.

All right, so the journey began and the vision was clear. We set these APIs in the middle, between the virtual kubelet's internal Kubernetes mechanics and the remote counterpart, which says: I authenticate your request, and I take care of running the container in the best way possible. And again, if you want to develop a plugin of your own, there is only one requirement: you have to respect the API spec. That's all, in any language you want. You don't have to know Go, and you don't have to know Kubernetes, or only really basic Kubernetes knowledge. So this, eventually and hopefully for us, will grow into an ecosystem that we can serve to the users, saying: okay, this is our solution portfolio, you can access these kinds of resources.

There is clearly no silver bullet here. We still have a lot in the making and we are quite early in the development path, but these are all caveats that do not disturb our use cases too much, because, as I told you before, they are all well contained. The first one: the payload should not rely on intra-cluster network connectivity; this is, for the moment, off the table, as there is no internal tunneling of the connection from the remote counterpart. And then, from the volume perspective, ConfigMaps, Secrets and emptyDir are the only supported resources for your pods.
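As a flavour of what "respecting the API spec" could mean for a provider, here is a minimal plugin sketch in Python with FastAPI. The endpoint names and payload shapes are illustrative placeholders, not the actual interLink OpenAPI spec, and the runtime call is plain Docker only to keep the sketch short.

```python
# Hypothetical provider-side plugin: the endpoints and payloads are placeholders,
# not the real interLink spec. The point is that the provider only speaks plain
# HTTP/JSON and never touches Kubernetes internals.
from fastapi import FastAPI
from pydantic import BaseModel
import subprocess, uuid

app = FastAPI()
jobs: dict[str, str] = {}  # pod UID -> local container/job id

class ContainerSpec(BaseModel):
    name: str
    image: str
    command: list[str] = []

class PodRequest(BaseModel):
    uid: str
    containers: list[ContainerSpec]

@app.post("/create")
def create(pod: PodRequest):
    # Run the container however this site prefers: Docker here, could just as
    # well be Apptainer, Podman, or a batch submission.
    c = pod.containers[0]
    job_id = subprocess.check_output(
        ["docker", "run", "-d", "--name", f"{c.name}-{uuid.uuid4().hex[:8]}",
         c.image, *c.command], text=True).strip()
    jobs[pod.uid] = job_id
    return {"uid": pod.uid, "jobID": job_id}

@app.get("/status")
def status(uid: str):
    state = subprocess.check_output(
        ["docker", "inspect", "-f", "{{.State.Status}}", jobs[uid]],
        text=True).strip()
    return {"uid": uid, "state": state}

@app.get("/logs")
def logs(uid: str):
    return {"uid": uid,
            "logs": subprocess.check_output(["docker", "logs", jobs[uid]], text=True)}

@app.delete("/delete")
def delete(uid: str):
    subprocess.run(["docker", "rm", "-f", jobs.pop(uid)], check=False)
    return {"uid": uid}
```

Swapping Docker for another runtime or a batch system is a purely local choice in this sketch; nothing of it leaks back to the Kubernetes side.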
All right, so now let me play a little game here, if you want, introducing all the characters of our implementation. We started from the deceiver, which is the Virtual Kubelet component. The Virtual Kubelet component takes care of talking with Kubernetes, receiving the request and saying: okay, I'll take care of this; but under the hood it is already contacting the interLink API server. The interLink API server is the man in the middle, the one responsible for translating a request that still has some Kubernetes-related stuff in it into something you can consider Kubernetes-agnostic: the creation of a container and, eventually, the management of the container lifecycle. Then, once the man in the middle has translated this message, it is able to contact the cavalry, the plugins, which have just one responsibility: to keep the interLink API server satisfied, so answer all its questions; and they also have to know the battlefield exactly, if you want, meaning where the containers should run, how they should be removed, how the logs should be forwarded back to the interLink API server.

For whoever is not particularly in love with role-playing games, I have a more technical figure here, where you can see the full stack, starting from the Kubernetes API server down to the interLink Virtual Kubelet part, which forwards well-defined requests to an authenticated endpoint, the interLink API server. Notice that this authenticated endpoint can be used by more than one Virtual Kubelet, so you can have multiple Kubernetes clusters talking to the same interLink API server. It then forwards the requests to a set of plugins that you can develop on your own. We started with two main plugins. The first one enables the route on the left-hand side, the one that goes from "I want to run this pod on my machine": it is installed alongside interLink and is capable of running Docker containers locally on the host, so whatever machine you can imagine, you can attach it directly and be ready to run pods there. The other one is specific to HPC: the deceiver tells the man in the middle, all right, I'm authenticated, please submit this container and let me know how it goes, and the man in the middle contacts the Slurm plugin, which, depending on how the resource provider has configured things, can say: okay, this is how I want a container to run on my infrastructure, please submit this job with an apptainer run or exec, or whatever, and I'll let you know how it goes.
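A very rough sketch of what that Slurm route could boil down to on the provider side: render the translated container request into a batch script and hand it to sbatch. The directives, paths and the choice of Apptainer are illustrative assumptions, not the actual plugin configuration.

```python
import subprocess, textwrap

def submit_container_as_slurm_job(name: str, image: str, command: list[str],
                                  gpus: int = 1, partition: str = "gpu") -> str:
    """Render a hypothetical Slurm batch script that runs the container with
    Apptainer, submit it with sbatch, and return the Slurm job id."""
    script = textwrap.dedent(f"""\
        #!/bin/bash
        #SBATCH --job-name={name}
        #SBATCH --partition={partition}
        #SBATCH --gres=gpu:{gpus}
        #SBATCH --output={name}.out

        # The provider decides the runtime: Apptainer here, could be Singularity, ...
        apptainer exec --nv docker://{image} {' '.join(command)}
        """)
    path = f"/tmp/{name}.sh"
    with open(path, "w") as f:
        f.write(script)
    out = subprocess.check_output(["sbatch", "--parsable", path], text=True)
    return out.strip()  # e.g. "123456", the Slurm job id

# Example (hypothetical image and command):
# job_id = submit_container_as_slurm_job(
#     "3dgan-inference", "ghcr.io/example/3dgan-inference:latest",
#     ["python", "run_inference.py"])
```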
So we started small, from two world-class EuroHPC centres that were volunteering to test out this technology. The first one here is HPC Vega, the Slovenian supercomputer hosted at the Institute of Information Science in Maribor; it was actually the first volunteer, where we had also tested the KNoC project before. We got a lot of feedback, a lot of useful information and also a lot of resources: we are currently running on 90 GPUs for some tests, so they are quite generous. The other one is the Jülich Supercomputing Centre, which allowed some specific developments that are also quite nice. They came to us and said: look, I tried the whole flow via Slurm, it's working, okay, but what if we already have a container-as-a-service provided for our users, in this case UNICORE? Can we use that? And the answer is, of course, yes. So they were actually the first volunteers to implement the very last part, the plugin part, for their own needs. In both cases we have a success story to share, which I'm going to show in just a moment.

Right, the deployment experience, very briefly, because it is very brief. There is an installer that you can configure with all the details, where you say: I have this remote host to be added as a virtual node, give this node some set of virtual resources, and authenticate it through OAuth/OIDC of some sort. Then, once the installer runs, it gives you all the instructions to set up your Kubernetes cluster, so the Virtual Kubelet part, but also all the scripts that need to be run on the remote counterpart. Once ready, on the Kubernetes side you will see a new node appear, down here, and eventually, when you submit a pod to that node, you will get access to a nice NVIDIA A100 GPU out of the box. The cluster flavour is not really important: we have examples of implementing this with Minikube, so essentially you can extend your Minikube to any kind of resource in this scenario.

This powerful tool is enabling the main case studies that are the drivers for our development. The first one: I have my pipeline implemented in MLflow, or whatever framework, and I want some part of this pipeline to run on the remote node. Then we have serverless computing: as new data and new models appear on my storage, I want to trigger this automatically, so it's kind of an evolution of the first one. And then I also want to play interactively with these kinds of resources, kind of spawning my notebook on a GPU node; this is more tricky, because we needed to implement backward connectivity.

The first use case is driven by a CERN use case; we already saw a presentation this morning coming from the LHC, and this is very, very similar. We had an algorithm to run inference with, a 3DGAN algorithm, that took quite some resources, and we managed to create a pod to do that manually, starting from setting some annotations for customizing the job (sketched below), up to eventually assigning the job directly to a Jülich node. The flow was then repeated for the Vega one as well, so consider the two HPC centres from before equivalent in these terms. All went well, and we found our results in an MLflow tracking server. The Virtual Kubelet was hosted on a Kubernetes cluster dedicated to these experiments, nothing special, hosted at INFN centres.
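The job-customization annotations mentioned for the 3DGAN pod can be pictured roughly like this; the annotation keys below are hypothetical placeholders, not the documented interLink/Slurm plugin ones.

```python
from kubernetes import client

# Hypothetical annotation keys for steering the remote Slurm submission; the real
# keys are defined by the interLink/Slurm plugin documentation, not by this sketch.
metadata = client.V1ObjectMeta(
    name="3dgan-inference",
    annotations={
        "slurm-job.example/partition": "gpu",
        "slurm-job.example/time-limit": "02:00:00",
        "slurm-job.example/extra-flags": "--gres=gpu:1",
    },
)
# This metadata would replace the plain one in the pod sketch shown earlier.
```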
The serverless computing part was volunteered by OSCAR, which is a framework used to do exactly this kind of thing: I have my data lake, if you want, something gets triggered, and I want to manage this dynamically. interLink, in this case, allowed OSCAR to extend the computation toward the HPC cluster on this side of the board. So OSCAR was running on a Kubernetes cluster where a virtual node was present, and the only thing that needed to be done at the OSCAR level was to configure the pod properly to be delegated there.

If you have to create your model, or just play around, interactive prototyping can of course be very helpful, so we tested our third use case, the interactive analysis platform if you want. We had the possibility of saying: look, I want this notebook to be spawned on one of these three HPC flavours, please do that. And eventually the user is prompted with a fully functioning notebook with all the resources he requested, and was able to run the MLflow workflow of the previous case.

So, well, that was a cool start of the journey. We still have some other steps to make, some easier than others. The first one: if we really want to widen the plugins and providers in our portfolio, we need to pin down an OpenAPI spec as soon as possible. There is one already, but since we are early in development it can still change, so one message here: it's a perfect time to jump on board if you want. The other one: we can put all three of our use cases together if we manage to integrate with Kubeflow, so that would be the nice thing to try out next. Eventually, deploying this in a multi-tenant way has been preliminarily tested, multi-tenant meaning I have one interLink serving different Kubernetes clusters. Then there are long tails of development that can improve the user experience a lot, meaning: I put an annotation in the pod saying I want this dataset to be at the virtual node, and you take care of that data orchestration under the hood before running my container. Also, I want some kind of in-cluster VPN that can tunnel the connection from my node running a container back to the inside of Kubernetes. And some scale testing. Well, scale testing means trying to break things, basically, but at the moment, as I mentioned before, we are already capable of scaling to a decent amount of resources at the Vega site.

Concluding the deal: on one end there is the Kubernetes community effort of maintaining a Virtual Kubelet framework for interLink, where all the complexity is kept centrally, communication, state machine, whatever; and on the provider side, which is where the real value of this project is created, granting access to more resources in a way that is as transparent as possible for all the stakeholders, if you want. We really think that interLink can extend our provisioning model, and we feel it can be of interest for other communities as well, so we are really looking forward to any feedback that can come in. It begins with HPC, but it is not limited to that: we are already working toward other kinds of implementations where we have a distributed infrastructure providing a container-as-a-service, pointing interLink to the API of that container-as-a-service. We also have in the making other R&D lines for the LHC experiments, dedicated ICSC centre analysis infrastructures, and, last but not least, giving cloud-style access to our whole set of distributed sites in this way.
From the provider perspective, I think it would be very nice to consider that this kind of solution can speed up the delivery of Kubernetes-style access to the resources of your provider, without deep knowledge of Kubernetes internals. Also, we are going to work toward a library, or registry, very soon, so stay tuned, and if you want to participate, I'll leave you the link to the Slack channel here, together with the link to the interLink documentation; the talk is on the schedule. Thank you.

[Moderator] So we have time for two questions, if any.

[Audience] Thanks for a very interesting presentation. I have a short question about these network connectivity troubles: especially in the JupyterHub use case, you really need a proxy to the JupyterLab server you are running, so how did you solve that?

[Speaker] Yeah, the trick in this case was that the JupyterHub instance was exposing an SSH endpoint, and that SSH endpoint was authenticating the user through the access token, the application token of JupyterHub. So the notebook was spawned and configured to open a tunnel to the SSH server, with the password being the token that JupyterHub handed to the notebook (a rough sketch follows at the end of this transcript). But this only works at Vega, because there we have open communication between the JupyterHub and the notebook; of course it cannot work in every possible scenario, but yeah, that's an open field, I think.

[Audience] Very nice talk, thank you. I wanted to know if this is available anywhere online, or if there is documentation we can look at.

[Speaker] Sure, sure. You can find the talk, first of all, in the schedule here, in the application, but then there is the interLink documentation link, which has a quick-start guide for local development, where everything is local, so you kind of fake working with a Kubernetes cluster talking to a Docker Compose. Or there is the more advanced development guide, where you can deploy virtual nodes starting with a GitHub authentication, and it's really a walkthrough that takes literally five minutes to complete. And then the nice thing is that we are also starting to provide an SDK for that, so we have examples of how you can create from scratch the interLink plugin for your provider; the example is in Python, for people preferring that, but it can be extended as needed. All right, thank you all then.
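A minimal sketch of the tunneling trick described in the first answer, assuming the hub-side SSH endpoint accepts the JupyterHub API token as a password; the hostname, ports, user name and the use of sshpass are illustrative assumptions, not the actual Vega configuration.

```python
# Sketch of the notebook-side reverse tunnel: the JupyterHub API token is reused
# as the SSH password so that the spawned notebook, running on the HPC node, can
# expose its port back toward the hub. Hostname, ports and the reliance on
# sshpass are illustrative assumptions, not the actual Vega setup.
import os, subprocess

token = os.environ["JUPYTERHUB_API_TOKEN"]   # injected by JupyterHub at spawn time
hub_ssh_host = "jupyterhub.example.org"       # hypothetical SSH endpoint exposed by the hub
local_port = 8888                             # port where the notebook server listens
remote_port = 8888                            # port the hub proxies back to the user

subprocess.Popen(
    ["sshpass", "-p", token,
     "ssh", "-N",
     "-o", "StrictHostKeyChecking=no",
     # Reverse-forward: connections to remote_port on the hub reach the notebook.
     "-R", f"{remote_port}:localhost:{local_port}",
     f"jovyan@{hub_ssh_host}"])
```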