Hello everybody. I think I will start by introducing myself, as most of you will not know me. I'm Daniel Bayer from Lübeck, from the Institute for Neuro- and Bioinformatics. I'm not a Debian developer, but I have been a user for seven or eight years, and I'm here to speak about Debian for computational grids. And maybe you wonder who that Möller guy is: I'm a PhD student and he's my supervisor. Actually, he will become a Debian developer soon.

To the outline. First I will give a short introduction to grids, as I think most of you will not know them. Then I will speak about the different users, mainly the biologists, and explain how Debian fits into this. And then, as the main part, I will explain how to use Debian in the grid.

First, what's a grid? Ian Foster proposed this definition: a grid is a system that coordinates resources that are not subject to centralized control, using standard, open, general-purpose protocols and interfaces, to deliver nontrivial qualities of service. The most important points of this definition are: no centralized control, so it's distributed; and open protocols and interfaces, so it can work together with other grids. There are different communities, all of them using different software, and this has to be interoperable; it has to work together.

Quality of service: there are very different kinds of grids, so different kinds of service, but basically there are data grids and computational grids. Data grids are just storing data, like the data generated by the biologists or the physicists. They are generating huge amounts of data, and it has to be stored for several years. The computational grids are used to process the data that is stored, and I'm here to speak about computational grids. I already said there are different grid communities and they are using different software.
So I'm from the NorduGrid community, which is a grid community in the Nordic countries like Norway, Sweden, Finland and some countries nearby. Actually, this map is a little bit outdated; I think it's from 2005 or something. There are now some more participating.

The NorduGrid community has its own middleware. Most grid communities use different middlewares, different software, but in most cases it's just some subset of Globus. In the NorduGrid community we have our own middleware, which is called ARC. ARC is special in that it is minimally invasive: it's pretty small and lightweight and easy to install. It's possible to install an ARC server in one day, alone: one person, one day. This is not possible with other grid middleware, so that's a special point.

How does the ARC middleware work? Basically, the grid compute elements do the processing of the jobs. These are the local batch systems. A job comes in via the grid, is submitted to the local batch system for processing, and the results are transferred back. This has to be coordinated in some way, so there's a grid information system which announces what kind of architecture, how much memory and what kind of software is available, and stuff like this. The jobs themselves are sent to sites which provide the necessary software, and additionally the data is also transmitted, of course.

To the users. There are different kinds of users in grids. Basically, it's the physicists; I think they started the whole thing. They are running experiments, for example in Geneva at CERN, and creating huge data sets, searching for some particles.
I think the Higgs boson or something. So the grid was developed to fit their needs. If you're more interested in this, the CERN ATLAS project is the one which is doing this. ATLAS is also the name of the software collection which is doing the processing of the data.

On the other hand there are, for example, the biologists. The biologists are very different from the physicists in that there's not one big project doing everything, but a lot of small projects which are not cooperating that much. They're doing stuff like analyzing DNA, and other things. So the main difference between the two is that the biologists are using hundreds or more different small programs, where the physicists are using only one big one. So the needs of these groups are very different. And of course there are other user groups, like computer scientists, astronomers and many more. Basically, everybody who needs computing power and doesn't have his own cluster is using the grid.

So how to embed grid computing into work groups? Basically, there are the large work groups. Yesterday there was a talk from Tim, who explained how they do this at the Sanger Institute. They have their own cluster. They don't need the grid; they would gain nothing by using it. So for large groups the grid is just some extension of the local batch system, of the local cluster. It may be nice to have, but it's not essential.

But there are also small groups, a lot of them, and they don't have their own cluster or their own infrastructure. So they strongly rely on the grid being available to do any processing which is not feasible on the local desktop. For them the grid is essential. And often the problem with these small groups is that they don't have IT people.
So, especially in the wet labs, there are maybe ten biologists sitting, and the one who is most clever with IT things has to do all the work, and this does not scale. So for them the grid is important.

And there are already some science-related projects, like Debian Science and Debian Med, which are providing software for different sciences. These projects already have a nice property: they allow integrating these different groups, the large ones and the small ones. The small ones just cannot package all the software themselves. So it's nice that the large groups can do the packaging, create the packages and push them to Debian, and the small groups can just pull from there. So, as I see it, Debian already provides a lot of service to work groups.

Now to the grid part. So why use Debian for doing grid computing? I think I don't have to explain all the advantages of Debian here, but I want to point out some that I think are important. The first one: with Debian, you don't have to hunt for packages. They are just there. If you use some other distribution and you're asking around, "I need some package", the answer is "use rpmseek", and then you do this and you find something, and maybe it works, maybe not. With Debian there's no need for this. You just say apt-get and it's there.

Another feature of Debian is security updates for many years. I understand that some don't think this is a feature, but only a consequence of the long release cycles. But it's still very important to have security updates three or four years after releasing the system, and I hope this will also be available if the release cycle should become shorter.

Of course the package system is very good. And one more point: Debian just works somehow.
I never managed to screw up a Debian system. But I was sitting in front of another Linux distribution some weeks ago and just installed something, and it broke completely. I never had this experience with Debian.

Of course, there are also non-technical reasons, for example the helpful community. If you use something else, well, the community is helpful too, but they are not that technically adept. They try to help, but they can't, because they don't know much about the techniques behind it. With Debian you always have some clever mind who knows how it works and can explain the error and fix it, or say how to fix it. And because of all these reasons there's a personal preference for Debian, and that's not only for me but for a lot of people.

But there are some grid communities which require you to use some other operating system. If you want to be part of such a community, then you have to use, for example, Scientific Linux if you are in EGEE, or whatever. But in NorduGrid we don't require such a thing, so everybody's free to use what he wants, and we want to use Debian.

So does the ARC layer handle the differences between nodes? You're using Debian, someone else is using, say, Fedora or whatever, and if you reference a file in the file system, does it do the translation? For example /etc/rc.d, or whatever it is on Debian, may not exist in Red Hat or Fedora. So there are configuration things which are different between Fedora, for example, and Debian. Does the ARC layer, the grid layer, handle the translation to give you a single system image, or what is it providing for you?

What do you mean by the grid layer?

The grid layer, the middleware.

Yeah, so the grid layer is not specific to any distribution.
I think it's just a policy that you have to use some specific distribution, for binary compatibility or something, and then they enforce, for example, the use of Scientific Linux, which is very strong in computing. In NorduGrid there's no such enforcement. Of course you can use what you want, or what is feasible, what is accepted in the community you are in.

Next slide. Okay. So one way to use Debian in the grid is of course using it to run the middleware, which means ARC in this case. A system running ARC needs additional packages besides ARC. We need the Globus Toolkit. So we are still relying on it, although we have our own middleware. For stuff which is in the Globus Toolkit, we didn't implement it ourselves but use the Globus Toolkit. The Globus Toolkit itself is a collection of different parts, and these are the parts which are needed by ARC. You may wonder why there's an OpenLDAP. The Globus Toolkit comes with its own very old version of OpenLDAP, and it's needed: no other OpenLDAP, but the one from Globus. It also needs the Grid Packaging Tools; these are needed for compiling the Globus Toolkit. And optionally VOMS; VOMS is a virtual organization management system. And of course some open source tools and libraries which are available in Debian.

Looking at this list, of course, there's the question of how to package ARC for Debian. We tried, and some problems arose. The first problem is that Globus is not really designed to be integrated into a distribution. It's using a non-standard build system, it comes with its own OpenLDAP, and so on. And it's huge. So it's difficult to get this in shape to be included into the distribution.

The second problem: sometimes you need specific versions of libraries. For example, we need OpenSSL 0.9.7, which is quite unfortunate, as this version is being dropped from the newer distributions.
So actually, to compile ARC I had to get the OpenSSL headers from stable, which is now oldstable, some time back, because there is no development package of OpenSSL 0.9.7 any more in unstable or testing. Actually, this is a dependency of Globus.

The problem with gSOAP is more complicated. For the current stable ARC, there's no version of gSOAP in Debian which is supported. It's either too old or too new, which is also not that good. But this is fixed in Subversion now, and so the current ARC is working with the gSOAP from unstable.

A last problem is the license issue. ARC is GPL software but uses OpenSSL. Of course this has to be changed, but that's not much of a problem, as the community of developers is small, so it should be possible to add an exception clause.

Much more interesting is using Debian on the worker nodes, which means the ones doing the computation. How does it work? Usually the jobs come in via the grid and are then sent via the local batch system to the worker nodes. The jobs, of course, need applications. What comes in is only the data and some small start script, so the actual application must be pre-installed on the worker node. This is a lot of work for the local sysadmin, and you cannot do this manually. So the idea is to use Debian and packages for providing the software in the most automated way. This is what we try to do, but first I want to elaborate a little bit on the current situation, how it all works.

Nowadays, to install some software, the site admin either makes the round to all worker nodes or uses some automatic system, installing the software in all places. Then he has to prepare a runtime script, which is sourced by the job multiple times and enables this runtime environment, RTE. Then the middleware, so ARC in this case, scans for these runtime environment scripts, and if it finds any, it announces the availability of this runtime environment via the grid information system. The client, if he wants to submit a job, searches in the information system for this runtime environment and submits the job to a site which announced the availability of this runtime environment. So this is how it works now. What the middleware and the client are doing is fine, but this manual deploying just does not scale. So this has to be changed.

Another slide on this. Of course you have to organize this somehow. You have to agree on the names of runtime environments, and this is currently done by a website, which is hosted somewhere in Finland. Let's zoom in. There we just have the name of some runtime environment, in this case [unclear] core 1.0, and the description, which is a bit short in this case, and some other meta information. Of course this also does not scale. This is feasible for maybe 30 or 50 runtime environments, but if you are doing this for a hundred or a thousand, then you can't do this management via a website. So these are the current problems with deploying software. Actually, this is an open problem. There's nothing better available, so I think most people are just installing manually or something like this.

To address this problem, we created a new component, a new service for the grid, called the Janitor. The Janitor is a component which automatically installs runtime environments if they are requested by a job.
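The announce-and-match flow for pre-installed runtime environments described above (sites announce RTE names, the client picks a site that announces what the job needs) can be sketched in a few lines of Python. This is only an illustration of the matching idea, not ARC's information system or client code; the `Site` class, the `find_sites` function and all site and RTE names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A grid site and the RTE names it announces (hypothetical model)."""
    name: str
    runtime_environments: set = field(default_factory=set)


def find_sites(sites, required):
    """Return the names of all sites announcing every required RTE."""
    required = set(required)
    return sorted(s.name for s in sites if required <= s.runtime_environments)


# Illustrative data only; these site and RTE names are invented.
sites = [
    Site("site-a", {"APPS/BIO/EXAMPLE-1.0"}),
    Site("site-b", {"APPS/BIO/EXAMPLE-1.0", "APPS/CHEM/EXAMPLE-2.0"}),
]
print(find_sites(sites, ["APPS/CHEM/EXAMPLE-2.0"]))
```

The client side stays this simple only as long as every site agrees on the RTE names, which is why the naming has to be coordinated somewhere.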
So we get dynamic runtime environments. Information about them is stored in a catalog, which is RDF-based and allows this to be done automatically. Of course, they are also removed if they are not needed anymore. This is integrated into the grid in such a way that it's transparent. The client does not know if a runtime environment is already installed or if it will be installed when he submits the job. He only knows that it's available, that he can submit a job and the job will execute. How this is done, he does not know.

So the pieces of a dynamic runtime environment: it consists of a unique name, of course, a description, and the information on how to deploy it. As I said, this is stored in an RDF database.

Of course, we have to support different ways of deploying, because the grid sites are using different software. We are using Debian, others are using Scientific Linux or whatever. So we have to support multiple ways, and the first one we did was just using a tar file. The tar file consists of the actual software, some install script and the runtime script. The software is just dropped somewhere, the install script is executed, the runtime script is fixed up, and then it works.

For a first shot, this is pretty good, but of course there are problems with dependencies. They are supported between RTEs, between tar files, but not between other things. The advantage of this is that it's easy to create these files and it's easy to install them; it's just dropping them somewhere. Also, there are no problems with conflicts, because the packages are enabled by this runtime script.
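The behaviour of the Janitor as described above (a catalog of installable runtime environments, installation on first request, removal when unused, and a client that cannot tell installed from installable) can be modelled roughly like this. It is a sketch under assumptions, not the real Janitor: the class and method names are invented, the catalog is a plain dictionary rather than an RDF store, and "deploying" only records the tar file name instead of unpacking anything.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """One dynamic RTE: unique name, description, deployment information."""
    name: str
    description: str
    tarball: str  # hypothetical: where the tar file would come from


class JanitorSketch:
    def __init__(self, catalog):
        self.catalog = {entry.name: entry for entry in catalog}
        self.in_use = {}      # installed RTE name -> number of jobs using it
        self.deploy_log = []  # stands in for actually unpacking tar files

    def available(self):
        """What a client sees: installed and installable RTEs look the same."""
        return sorted(self.catalog)

    def acquire(self, name):
        """Called when a job requests an RTE; install it on first use."""
        entry = self.catalog[name]  # raises KeyError if the RTE is unknown
        if name not in self.in_use:
            self.deploy_log.append(entry.tarball)
            self.in_use[name] = 0
        self.in_use[name] += 1

    def release(self, name):
        """Called when a job using the RTE has finished."""
        self.in_use[name] -= 1

    def sweep(self):
        """Remove installed RTEs that no job needs anymore."""
        unused = sorted(n for n, count in self.in_use.items() if count == 0)
        for n in unused:
            del self.in_use[n]
        return unused
```

Keeping a use count per runtime environment is one simple way to decide when something is "not needed anymore"; the talk does not say how the real component decides this.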
They are living in some directory, and the runtime script is sourced when they are needed, and if not, then not. So there's no problem with this.

The disadvantage is that it is not possible to depend on the system software. The system software is, well, ordinary Debian packages. So I can't say this runtime environment needs this Debian package. It's not expressible. Another problem is how to install dynamic runtime environments which need a conflicting set of system software. If runtime environment A needs some software, and runtime environment B some other software, and they conflict, this is also not possible. And of course it's easy to create one of these tar files, but if you have to do this for a thousand or four thousand, then it's maybe a lot of work.

So it has to become better, and the idea is to use Debian packages and virtualization to solve this problem, to make it fully automatic and easily usable. The idea is to dynamically create virtual machines fitting the job's requirements and execute the job within the virtual machine. Creating virtual machines is quite easy if you have a minimal one, which is created in some way. You can just add packages using, for example, a chroot and executing apt-get, and then you get your image as you want it, if you are only using Debian packages, of course. The build daemons are of course basically doing the same thing. The main advantage of this is that dependencies on system software are handled automatically, and we don't have to clean up after ourselves, because there are no conflicts: we are just throwing away the machine and creating a new one.

Okay, so the current state of work. The ARC middleware we have had since 2001, and it's still being developed, of course. Currently we are developing version ARC 1.0. This is done by the KnowARC project.
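Going back to the virtual machine idea described above: the step of taking a minimal image and adding the Debian packages a job needs with chroot and apt-get can be written down as a short list of commands. The sketch below only builds that list and runs nothing, since the real thing needs root privileges. It assumes the minimal image is a plain directory tree that can be copied; the function name, the paths and the package name are made up for the example, and the talk does not specify the exact tooling.

```python
def image_build_commands(base_image, target, packages):
    """Return the commands (as argument lists) to build an image for one job.

    Nothing is executed here; running these would need root privileges.
    """
    # Start from a fresh copy of the minimal image, so there is nothing
    # to clean up afterwards: the copy is simply thrown away.
    commands = [["cp", "-a", base_image, target]]
    if packages:
        commands.append(["chroot", target, "apt-get", "update"])
        # apt-get resolves dependencies on system software by itself.
        commands.append(
            ["chroot", target, "apt-get", "install", "--yes", *sorted(packages)]
        )
    return commands


# Hypothetical paths and package name, for illustration only.
for cmd in image_build_commands(
    "/srv/images/minimal", "/srv/images/job-42", {"example-app"}
):
    print(" ".join(cmd))
```

Because every job gets its own copy, two runtime environments with conflicting system packages never meet in the same image.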
You may have noticed the logo at the bottom. This is a project funded by the European Union to develop the next version.

What is mostly needed currently, mostly wanted, is batch system support for virtual machines. Currently there's no way to execute a virtual machine via a batch system, or to start a virtual machine through it. So it has to be done either by hand or via some hacked-up scripts, so it's quite difficult. This is the main problem with this approach currently: it's missing some infrastructure. And of course we also need a more open community for grid computing.

I think this was the last slide. Now for conclusions. I made this slide yesterday. Okay, so the main advantage of using Debian is that we can do this automatically, so we don't have to worry about these problems. Okay, just skip this slide.

Acknowledgements. This is a list of the people involved in this, from NorduGrid and KnowARC. For example, Anders is doing the packages for Debian and others, and Frederick also to some level.

Ah, yeah, I forgot: if you're interested in this kind of work, want to work on this and don't mind spending some time in Lübeck, then you can also just call me, and I think we can have you around, if you're a student, of course. So thanks for your attention. Questions?

Hi, my name is Andreas Tille, and I would like to know if you talked to the upstream maintainers of Globus, because you told us that it is not really prepared for integration into a distribution.
So from my point of view you should go to them and teach them how to do it.

Yeah, one should think that. Well, I'm not doing that much development on the middleware, so I'm not that much into this. But the guys developing the NorduGrid middleware, ARC, actually had to patch Globus a little bit and made some changes. It's about 20 patches or something, I think, so of this magnitude. And they said, I don't know this myself, that the Globus people are quite unresponsive. So it's difficult. So I don't know.

But the problem is, if several people are using this software and they are working together, in NorduGrid or whatever, wouldn't it be a nice goal to say, well, let's build a package of this software and work together with these people, so that you have some common base? And you told me that ARC would need only one day to install. This is a lot, and in my opinion you could do better: if it were a package, it would need perhaps half an hour.

Of course you could. I mean, there are packages for RPM-based systems that we are building; well, actually Anders is building these packages. But currently there's no real package for Debian, which is a little bit unfortunate. We have more problems with Globus, for example. Other grid communities are also using Globus, but it's mostly a different subset of Globus. The Globus Toolkit has many parts, and some are using this part, some are using that part, in different versions.

Well, but we are talking about integration into Debian. So do you see any chance, or do you want to stick to the status quo?

I would like to have both ARC and Globus in Debian, of course. Actually we are working on this, but I don't know; there are still some problems to solve.

Okay, you have some homework to do.
I think Steffen has some packages, but I don't know if they are working.

Okay. Next one.

Thanks, Andreas. So you said that you're integrating external third-party things using tarballs. If you just go back to that page for a second, the page that you had on integration of third-party software. Yeah, that's the one, the dynamic runtime environment. It was a little unclear to me, so what's the status of that? I wasn't quite clear whether you actually are doing that.

This is what we are doing, so that's working. We have these tar files, and we can just drop the contents, and then you have the runtime environment available. So this is working.

Okay. So for a lot of the software that you're talking about, for example things like BLAST, or probably other bits (I'm only familiar with biotech stuff), they probably aren't in Debian. Have you thought about contributing those packages that you're creating back into Debian? For example, one of the things you might install in a runtime environment if you're doing biosciences would be a BLAST analysis package.

Yeah, of course.

But not all those third-party packages will be in Debian. If you're creating a deb file of those, is there any thought of pushing those back into Debian itself?

Well, I have some problems understanding you. Maybe we can discuss this later.

Okay. Can you hear me or not?

Yeah, I can hear you, but I wonder if you are still asking. Okay.
Well, for now it's only the tar package thing, because it was simple to do and it works everywhere, as I said. So if they are using Scientific Linux, which is quite common, then we can also do the tar package stuff. Of course I want to go to Debian packages, which also means including them into the distribution, because it's just simpler.

I'll have a chat with you afterwards.

So, more questions? Thank you. Thanks again.