So good afternoon, and thank you all for coming. My name is Francisco Brasileiro, from the Federal University of Campina Grande (UFCG) in Brazil. Together with Andre, I'm going to present work we've done at the university to ease the task of people who have a bag-of-tasks application and want to run it on the cloud. A quick outline of the presentation: we start with the motivation for the work and show what is involved in managing the execution of a bag-of-tasks application. Then we show how the Arrebol system is designed, and how we can connect clouds using a middleware called Fogbow. In between these two parts we show small demonstrations of the systems in action, and then we open for questions. So, many science applications... I don't know how many of you were in the previous presentation, where the people from the University of Melbourne said that 9% of the jobs they receive in their HPC system are single-core jobs. These are very popular applications where the size of the computation comes from the fact that you have to run many small computations, and this can be very easily parallelized. The other thing is that they come in sporadic demand. For instance, at the university, in the two months before a conference deadline, researchers are using the computing facilities to get the results of their experiments. But after the paper is submitted, there is a period where they sleep, they eat, they do other things, and you don't have that much demand from that specific group of users. Also, the faster you can run these scientific applications the better, because you speed up the cycle of research and development: your experiments run faster, the data is ready for analysis earlier, so you can plan a new set of experiments and run them sooner.
So it's not time to market here, but the time to paper is reduced if you have a lot of resources available. The cloud is a nice setting to run these applications because, in theory, the cost is the same: if you use the same amount of computing power, it doesn't matter whether you use it in one hour or in one year, you pay the same bill. So if you have lots of resources available, you may be able to run your application much faster. Okay, so what does a typical user have to do to run a bag-of-tasks application in the cloud? The user is sitting at his or her workstation with access to a cloud service. They have to create a VM, then stage input files into this VM, then SSH into the VM to start the remote execution, then monitor this execution from time to time, and finally, when the output is ready, retrieve the output data. And in a public cloud you want to destroy the VM; in a private cloud it would be nice to do that as well, but some people don't. This is quite time consuming, and you have to do it for thousands of tasks, so of course you have to automate this procedure. That is essentially what the Arrebol service does. We have a service that runs in the same administrative domain as the user and the cloud service. All the user does is submit a job through a command-line interface to this Arrebol service. So what is a job? A job is a file that describes the tasks that belong to that bag-of-tasks application. Normally this file will have hundreds or thousands of blocks like this one, where the only thing that changes is the input file and the output file, so normally you will have some script that automatically generates this job file. It also has some other clauses that can be used: a label to name your job, the requirements for the resources you will need from the cloud, and commands that should be executed before any of the tasks.
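Since the task blocks differ only in their input and output file names, the job file is usually produced by a small script rather than written by hand. Below is a minimal sketch of such a generator; the clause names (label, requirements, init, task) follow the structure described in the talk, but the exact file syntax is illustrative, not Arrebol's real format.

```python
# Sketch: generate a bag-of-tasks job description with N near-identical task
# blocks, varying only the input/output file names. The clause names follow
# the structure described in the talk; the syntax is illustrative, not
# Arrebol's actual job format.

def generate_job(label, n_tasks):
    lines = [
        f"label: {label}",
        "requirements: vCPU >= 1 && RAM >= 1024",
        "init: wget http://example.org/simulator",   # runs on each VM before any task
    ]
    for i in range(n_tasks):
        lines += [
            "task:",
            f"  put input-{i}.dat",                  # stage in this task's input
            f"  run ./simulator input-{i}.dat output-{i}.dat",
            f"  get output-{i}.dat",                 # stage out this task's output
        ]
    return "\n".join(lines)

if __name__ == "__main__":
    job = generate_job("my-experiment", 200)
    print(job.count("task:"))  # 200 task blocks, one per simulation
```

The point is only that the per-task boilerplate is mechanical, which is why the talk says these files are script-generated.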
So when you acquire a resource in the cloud, the first thing you do is run what's in the init part of the job; then you run tasks, and after each task you run the final part. So it's pretty straightforward what the user has to do: after submitting the job to the Arrebol service, the user just waits for the results to appear on the desktop. What the Arrebol service does is create the infrastructure, stage in the input data, start the execution of the tasks, and monitor that execution. If a task fails for whatever reason, it will be automatically restarted. When the outputs are ready, Arrebol fetches them and, most importantly, destroys the VMs. So we'll shift gears here, and I pass over to Andre, who will show how this works in action. [Andre:] Okay. In this first demo I'm going to show you how Arrebol is used to automate the submission of a lot of small tasks to the cloud, and I hope you can at least get an idea of what is being shown here. We have a job description file, just as Fubica said, and in this job description file we have five simple tasks. The tasks here just sleep for some time, so it's the simplest task you could have. After you have the tasks described in the job file, you submit this job file to the Arrebol service; here we're using the command-line client. Before it completes, let me show you that these virtual machines here were already there, and in a few seconds a few other virtual machines will be created by the Arrebol service. So let's switch to the Arrebol dashboard and log in. We can see the job; click on the job, and here it has already interpreted the job file and detected the tasks that will need to be executed.
And now it will start creating the VMs that will execute those tasks. If we wait for a few seconds — this video is a bit faster than regular speed — the tasks are now ready, and the instances are created in the Horizon interface. You see five instances have been created. If we go back to the Arrebol dashboard, we can see that the tasks will start running soon. Okay, so now they're running, in those five VMs that were created. Next, when the execution finishes, the VMs will be deleted. One thing the user may want is for the VMs to be kept after the tasks are completed, because then, if you submit a new set of tasks, you can reuse the machines that are already up, and that reduces the overhead of the initial provisioning. So now the tasks are completed and the machines are slowly being deleted. Okay, so Fubica, maybe you want to explain a little bit more about Fogbow. [Francisco:] Thank you. Well, this is all nice, but the problem is that the user has a quota in the cloud. I want to run my 200 simulations, but I can only spawn five VMs; I would rather spawn 200 VMs and run all the simulations at the same time. So we need to find a way to increase the capacity that I am able to reach. This is true even in the public cloud scenario, because by default the number of virtual machines you can create even in a public cloud is limited, and this is because it facilitates the long-term capacity planning of these providers. So capacity can be extended in several ways. One way is using multi-cloud: you have access to several public clouds and you create VMs in all of them. Another is cloud bursting: I have my private cloud, and when I exhaust the limit on the resources I can use in my quota, I jump to the public cloud. Or I could federate private clouds.
So the idea is, remember that this kind of workload is very ephemeral: I will submit my 1,000-task job, but then I will stay quiet for a couple of weeks, digesting the data of that experiment, and meanwhile my quota in the cloud is available for other people to use. So we could do a kind of exchange of quotas in a federated cloud in order to be able to access more resources at the same time. And of course you can combine all of these approaches. So at UFCG we have also developed a middleware called Fogbow. In fact, it's a suite of open-source software that does different things; among them, it provides support for multi-cloud, for cloud bursting, and for federating private clouds. In the demo that we are going to show in a minute, we focus on the federation of private clouds. Fogbow also supports the deployment of opportunistic clouds using desktops, which is particularly suitable for this kind of bag-of-tasks application, but we are not going to talk much about that today. So I think we can move to the demo so that you have an idea of how the system works, and then I will explain the internals of the system. [Andre:] So again, what we have here is the terminal that we use to submit a job composed of several tasks. This time the process is a little bit different, because the Arrebol service will contact the Fogbow service that federates two private clouds, so it's useful for us to also take a look at the Fogbow dashboard. The Fogbow dashboard here shows that there is a user logged in, which is the Arrebol service; we logged in as that user so that we could see in the UI the things that the service is doing in the background. In summary, this user has access to 56 vCPUs, 36 gigabytes of RAM, and 67 instances, and these resources are spread over two different private clouds: one whose name — it's a bit difficult to see — ends with rnp.br, and the other ends with ufcg.edu.br.
If you look at the columns of this table, it shows the quota and the usage of the current Fogbow service user. Here we can see that this user has zero instances in the rnp.br cloud and zero instances in the ufcg.edu.br cloud. You can also see that there are additional instances here: three instances in the UFCG cloud, the same instances that were already running in the previous demo. So let's take a look now at the job. Here I'm showing you again that there are no instances being created through Fogbow at this moment, and there are the same three instances in Horizon that were there at the beginning of the previous demo. And let me show you the job file. The job file this time has 15 tasks, and these 15 tasks have a larger sleep time — 75 seconds — because we need a bit more time in order to acquire VMs in the different clouds, which takes longer; otherwise all the tasks would run in the local cloud and we wouldn't show what we want to show. It would not be a good demo. So again, we use the CLI client to submit the job, then change back to the Arrebol dashboard, log in, and see that there is a job pending. Here we see that the job has 15 tasks, and the 15 tasks are ready to run, which means that in a few seconds the instances will start being created. So let's go back to the Fogbow dashboard, and here we see something interesting. The Fogbow service works through asynchronous requests, and what you see here is that the Arrebol service requested 15 instances. We know already from the previous demo that there would not be enough room to create the 15 instances in the local cloud — the UFCG cloud — because the three existing instances already use a large amount of memory. As the VMs are created, the orders are fulfilled. Okay, so four instances created.
Now seven instances have been created. One other interesting thing is that it prioritizes execution in the local cloud, as we would expect. What you could see here, if it were a little easier to read, is that all these instance IDs have the name of the cloud embedded in them. This one is maybe a bit easier to read: the instance ID ends with ufcg.edu.br, and these eight instances were created in the local cloud. But Arrebol requested 15 instances, as you saw on the previous orders page, so Fogbow will help create the additional instances. Let's take a look at the local cloud: in Horizon, eight instances have been created there. And if we refresh the page, we can see that the seven additional instances have been created in a different cloud. If you cannot read the names, at least you can see that they are longer than the other ones, because they use the other cloud's name, which ends with rnp.br. So I have seven additional VMs here. Now that the VMs are created, we can see in the usage page that the Arrebol service user is using its quota. Here we can see the UFCG cloud: it has 11 instances in total, eight of them for the current user, which is the Arrebol service user. With these eight instances we have already reached the RAM quota — that's why it could not create all 15 instances there — and the other seven instances were created in this cloud here, which is the remote cloud. So now we have our 15 instances to execute the 15 tasks. The tasks will be running soon. Okay, so we have one running; let's refresh, and now we have a few more running, in the VMs that were created in both the local cloud and the remote cloud. And now some tasks are already completing, because it was not a very long sleep.
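The placement behavior this demo displays — fill the local cloud up to its remaining quota, then spill the rest to remote members of the federation — can be sketched as a simple loop. This is a simplified model of what the demo shows, not Fogbow's actual allocation code.

```python
# Sketch of local-first placement: satisfy an order for N instances from the
# local cloud up to its free quota, then overflow the remainder to remote
# clouds. A simplified model of the demo's behavior, not Fogbow's scheduler.

def place_order(n_instances, local_free, remote_free):
    """local_free: free local slots; remote_free: {cloud_name: free_slots}."""
    placement = {}
    local = min(n_instances, local_free)      # prioritize the local cloud
    if local:
        placement["local"] = local
    remaining = n_instances - local
    for cloud, free in remote_free.items():   # overflow to remote clouds
        if remaining == 0:
            break
        take = min(remaining, free)
        if take:
            placement[cloud] = take
            remaining -= take
    return placement, remaining               # remaining > 0 => order still pending

# Demo scenario: 15 instances requested, room for 8 locally, 7 go remote.
print(place_order(15, 8, {"rnp.br": 10}))  # ({'local': 8, 'rnp.br': 7}, 0)
```

In the real system the orders are asynchronous, so "remaining" instances simply stay pending until capacity appears somewhere in the federation.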
If we take a look at the instances, we see that some of them are already being terminated. One configuration the user could play with is to have the instances held, but for this demo it's better not to use that: when there are no ready tasks, the instances are finished as soon as their tasks complete. It should be very soon... yes, now all the tasks are completed, and then it's just a matter of seconds to have all the instances deleted. Here you see in the Fogbow dashboard that all the instances created through Fogbow have been finished, and we can also go to Horizon, refresh the page, and see that all the instances created in the local cloud have also been deleted; just the three original instances have been kept. So that's it for the second demo. [Francisco:] Right. What I'm going to do now, in the next five minutes before we move to questions, is to talk a little about what's going on under the hood. How does Fogbow work? The idea is that we build a layer on top of the cloud orchestrator that deals only with the federation aspects of the problem. Essentially we have two services: one that implements a discovery service — this membership manager here — and an allocation manager that interfaces the users with the federation. One thing that we didn't mention in the demo is that those two clouds use different cloud middleware: the UFCG one uses OpenStack, while the RNP one uses CloudStack. Fogbow provides an OCCI API, and the Arrebol service uses this OCCI API to interact with any cloud orchestrator in a transparent way. In fact, we can also use settings where we have cloud bursting from a private cloud; we have adapters that work with Azure and AWS as well.
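That transparency across orchestrators comes from putting one adapter per middleware behind a single abstract compute interface. A toy sketch of the idea follows; the class and method names are hypothetical, and the real interface Fogbow exposes is OCCI, not this.

```python
# Toy sketch of the adapter idea: the federation layer talks to one abstract
# compute interface, and one adapter per cloud middleware (OpenStack,
# CloudStack, Azure, AWS, ...) translates it into native calls. All names
# are hypothetical; Fogbow's real northbound interface is OCCI.
from abc import ABC, abstractmethod

class ComputeAdapter(ABC):
    @abstractmethod
    def create_vm(self, image, flavor): ...

class OpenStackAdapter(ComputeAdapter):
    def create_vm(self, image, flavor):
        # A real adapter would call the OpenStack compute API here.
        return f"openstack:server:{image}:{flavor}"

class CloudStackAdapter(ComputeAdapter):
    def create_vm(self, image, flavor):
        # A real adapter would call the CloudStack API here.
        return f"cloudstack:vm:{image}:{flavor}"

def provision(adapter, image="ubuntu", flavor="small"):
    # The caller (e.g. Arrebol) never sees which middleware is behind.
    return adapter.create_vm(image, flavor)

print(provision(OpenStackAdapter()))  # openstack:server:ubuntu:small
```

The same pattern is what lets the two demo clouds (OpenStack at UFCG, CloudStack at RNP) look identical to the job service.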
One of the main services that the Fogbow layer provides is the federation of identities. The service works in layers. At the federation layer, a local request comes with the federation credentials. These credentials are authenticated and authorized at the appropriate service, and this service can be configured: for instance, the cloud that we showed uses a federation of LDAP providers, but we have an implementation with VOMS — is anybody from the European Grid Initiative here? Well, never mind — and another one that works with Shibboleth. The deployment that we are doing with the Brazilian NREN uses the identity federation that they run, which is based on a Shibboleth implementation. So requests are authenticated and authorized at the federation level, and then you have a mapping to a credential on your local cloud, and this credential is the one that is used to access the cloud. What you make available to the federation, and to the different users, is defined by this mapping. As we've seen, it is possible to send requests that cannot be fulfilled locally to a remote cloud. At the remote cloud, you go through the same authentication and authorization process, with the addition that you also have the credentials of the Fogbow manager that is sending the request over, so you can use this information as well to define how you map the user, or even whether you are going to allow that user to use your resources at all. An important point is that the mapping is defined autonomously by each cloud administrator. That's why we call it a federation: there are some rules that everybody needs to follow, but there are other rules that each member of the federation has the flexibility to define. The mapping is one of those; the middleware that you use to operate your cloud is another. Fogbow was conceived to be easily extensible.
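The mapping step just described can be pictured as a per-cloud table from federation identities to local credentials, consulted after the federation-level token has been validated, and extended for remote requests with a check on the forwarding Fogbow manager. A toy sketch under those assumptions (all names here are hypothetical):

```python
# Toy sketch of federation-to-local credential mapping. Each cloud admin
# defines its own table autonomously; a request forwarded by a remote Fogbow
# manager is additionally checked against a trust list. Names hypothetical.

class MappingError(Exception):
    pass

class LocalCloud:
    def __init__(self, mapping, allowed_managers):
        self.mapping = mapping                    # federation id -> local credential
        self.allowed_managers = allowed_managers  # remote Fogbow managers we accept

    def map_credential(self, fed_identity, via_manager=None):
        # For remote requests, first vet the forwarding Fogbow manager.
        if via_manager is not None and via_manager not in self.allowed_managers:
            raise MappingError(f"manager {via_manager} not trusted")
        try:
            return self.mapping[fed_identity]     # credential used on the local cloud
        except KeyError:
            raise MappingError(f"no local mapping for {fed_identity}")

cloud = LocalCloud({"alice@cafe": "proj-guest"}, allowed_managers={"fogbow@rnp"})
print(cloud.map_credential("alice@cafe", via_manager="fogbow@rnp"))  # proj-guest
```

Because each administrator fills in this table independently, the same federation identity can map to very different local privileges at different member clouds, which is the autonomy point made above.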
The architecture is based on plugins, and we have two kinds of plugins. One family is the interoperability plugins, in charge of allowing Fogbow to communicate with the underlying cloud: if you want to provide Fogbow support for a new middleware, what you have to do is develop the interoperability plugins for that middleware. The behavior plugins change the way your node in the federation behaves; for instance, you can change the way you prioritize the allocation of remote resources, so one site might follow one policy while another site follows a different one. The other thing we took care of is the fact that some of the private clouds that we needed to put in the federation — the ones that drove our development — don't offer public IPs; they are private clouds, so they don't need to offer a public IP to every VM that is created. So we created a tunneling service that automatically provides a public IP to every VM created in the private cloud. We also use a messaging service to allow all these components to exchange messages without having to open strange ports in your firewall; our current implementation uses XMPP for this. Some success stories. Fogbow was developed in the context of a project jointly funded by the Brazilian government, through the CNPq agency, and the European Commission. In that context, the idea was to join clouds in Brazil with clouds in Europe: the clouds in Brazil were running OpenStack, while the clouds in Europe were running OpenNebula. Those clouds also belong to the EGI federation, so there were services that the European partners wanted to use, for instance the VO management system, and we developed the behavior plugins in Fogbow that allow the federation to use this federated service. Another case, which we are currently deploying in Brazil, is with the Brazilian NREN.
The idea is to federate clouds in several institutions across the country, and these clouds use at least three different flavors of orchestrator, CloudStack and OpenStack being two of them. As I said before, we are using a federation called CAFe for identity provision at the federation level, and this was the cloud that we used in the demos we showed before. And I think I'll conclude the talk with that; I'm open to questions if you have any. I was told that they are recording, so if you want to ask a question, please come closer to the microphone so that the question is also recorded. Thank you. [Q:] Thanks very much, guys. What do you do if your token expires? I mean, the Arrebol service user acts on your behalf, yes, but does it actually create new tokens? Does it cache your credentials as well, or does it just use the token, and then, when that expires, are you unable to create new bag-of-tasks VMs? [A:] You mean the federated token that, as a user, you authenticate with... Well, it depends. Are you talking about VOMS or any token? [Q:] The federated access authorization and authentication token — I think it was a few slides back. [A:] So Fogbow authenticates to the cloud: Fogbow has credentials for the cloud, and you may have credentials for Fogbow, so there can be different levels of authentication and different tokens in use. In any case, when you are using Fogbow, Fogbow has its own means to authenticate to the cloud, so it can continue working whether your authentication with Fogbow has expired or not. [Q:] I think the question was: if I create a VM, and during the lifetime of my VM someone changes my credentials in such a way that I wouldn't be able to create that VM... [A:] Yeah. [Q:] Or if the token that you used to create that VM then expired because it was so old. [A:] But I don't need that token any longer.
Once the VM is given to the user, the user has the VM and SSH access to it, and the token is no longer needed. [Q:] Okay, so all of the VMs that are created are created at the first moment? [A:] No, no... right, yes. Anyway, the VMs still need to be deleted, so you still need access to do that, and that's done in the context of Fogbow, if you're using Fogbow. When you need to create the VM, you need to have the authorization to do so; after that, we don't check again. It's like opening a file: if you open a file, and you have the rights to read it, and you keep it open, and someone goes there and changes the permissions so that you can't read it, you will still be able to read the file as long as you hold the handle to it. So it's a similar kind of situation. It's a gap, but... okay, what you could do is have a service at your cloud that would periodically check all the... but we don't do that. Any other questions? [Q:] I'm Samuel. I have a question about how the scheduler works in this case. In the demo we had two clouds, we ordered 15 VMs, and some of the VMs needed to be created on the other cloud. What if I have more than one remote cloud? How can I define the behavior — let's say I want to balance it, so that half go to the second cloud and the other half to the third cloud? [A:] Currently we have a very naive scheduler in Arrebol: it simply parses your job, assesses the amount of resources needed to run it, and sends the orders to Fogbow; as the resources appear, it uses whatever resources are there, making no differentiation between them. Our plan is for the Arrebol service to let you, when submitting your job, also submit a scheduler together with it. [Q:] Yeah, exactly, because you may have different contracts with different remote clouds. [A:] Yeah. And even different applications might have different ways to schedule their tasks. This is just a proof of concept to show that...
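The per-job scheduler plan mentioned in this answer could be sketched as a strategy interface: the job submission carries a strategy object, with the current naive behavior as the default. Everything below is hypothetical, not part of Arrebol.

```python
# Sketch of a per-job pluggable scheduler, as discussed in the Q&A: the job
# submission could carry its own scheduling strategy instead of the built-in
# naive one. Interface and class names are hypothetical, not Arrebol's API.

class NaiveScheduler:
    """Current behavior: one bulk order; the federation decides placement."""
    def split(self, n_tasks, clouds):
        return {clouds[0]: n_tasks}

class BalancedScheduler:
    """The questioner's example: spread tasks evenly over the given clouds."""
    def split(self, n_tasks, clouds):
        base, extra = divmod(n_tasks, len(clouds))
        # The first `extra` clouds receive one task more than the rest.
        return {c: base + (1 if i < extra else 0) for i, c in enumerate(clouds)}

print(BalancedScheduler().split(15, ["cloud-a", "cloud-b"]))  # {'cloud-a': 8, 'cloud-b': 7}
```

A strategy like this is also where per-cloud contracts or costs could be taken into account, which is the point raised in the follow-up question.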
In fact, what we wanted with Arrebol was this: we developed this federation middleware and we showcased it, and everybody was very happy with that, so we wanted to bring users into this federation to actually use it, and bag-of-tasks seemed to be the easiest way to get people on board. So Arrebol is just a facilitator for people to join the federation without much work, with a gentle learning curve. [Q:] Congratulations. [A:] Thanks. [Q:] I came in late, so I missed the first part, but can I assume that Fogbow is federating instances, right? It federates instance creation, VM creation, across different clouds? [A:] The federation can be at different levels: it can federate identity, and it can federate your clouds so that you can share resources — like this one, where the resources are basically just pure instances, VMs, and nothing else. What Fogbow allows you to do is have access to a number of clouds in a federation to create basic resources. What are these basic resources? You can create VMs, you can create volumes, and you can create networks — within a single cloud. For instance, I cannot yet create a VM at one provider and a volume at another provider and attach them... [Q:] We have done that. [A:] Yeah, I've seen the messages. I would be really interested in closing that gap. But in any case, another question: when it came up, you already had multiple clouds up and resources assigned to them. Is there any special scheduler that does that, or is this all manual — like, this cloud will have only this kind of resource? [A:] You mean because there are already instances in that project? [Q:] No — for example, this cloud can only have a limited number of instances, say 10, that can be started up, and the second cloud would have 20. Do you have some sort of scheduler for that, or how do you...
Yeah, that's it. [A:] Well, I'm going live now, so I'm not responsible for what happens... So this is the federation, the Brazilian NREN federation. We have those two clouds and a catch-all cloud for people that don't have a local one. One of the business models that we use in this federation is that you have the notion of your local cloud, you have priority over the resources of the local cloud, and then you can cloud-burst to other clouds; and you get priority on the other clouds based on how many resources you have offered to them over time. [Q:] How does Fogbow know that this pool has this kind of resource? Do you manually type it in? [A:] No, no. This component here is the one that does all the interactions with the cloud, but there is this other component here that implements the directory. And the information that appears in the dashboard is user dependent: I logged in as a certain user, and the resources made available to me depend on my credentials; you might log in to the same system and you would see something different. This is all based on the mapping that is done at each local cloud: the administrator of each cloud defines how many resources I can access and how many resources you can access, and this service puts it all together. Okay? I think we are running late. We'll be around, so if you have questions, we'd love to talk. Thank you.