Thank you very much. It is obviously always a challenge to talk after Tom, who eloquently drew a line from more than 70 years ago, I think it was 1945 when the paper by von Neumann was published, to ten years into the future. This talk is really much more down to earth, much more boring, so the best thing is to admit that from the very beginning. It is about an ambition, a project which a number of supercomputing centers are now preparing and which I hope we can also formally start very soon. It is about something we call Fenix, which is about aggregating and augmenting infrastructures that are partially already in place at the involved supercomputing centers, in order to establish HPC and data infrastructure services for multiple research communities. That is what I want to explain to you, and it is also an invitation to the specific community at this conference to engage with us and work with us in building up this infrastructure, because it is meant to support communities, and we want to do it in a way that facilitates federation of the infrastructure at different locations around Europe.
Now, as I said, this is really meant to be a science-community-driven approach. The reason I put the logo of the Human Brain Project here is that it is going to be the main driver, for one simple reason: we expect to get the money to get going from exactly this project, so it sits in the primary seat. But it is also about involving other communities, in order to realize and then enhance this infrastructure in a real co-design approach, that is, to understand the needs of the different communities. It is also about establishing a new kind of model where science communities can buy into an infrastructure, such that we, the Fenix resource providers, act as service providers who provide IT infrastructure to science communities. That of course also means the communities get control over how these resources are provisioned to them: the resource allocation is managed by the communities themselves, and that is something we also try to establish within this project.

I will explain this in more detail later, but to highlight it already at the beginning: there are some things we want to do differently from what you can expect from a supercomputing infrastructure today. We want to add more interactivity, we want to add the ability to do more elastic provisioning of scalable computing resources, and the third point is federation of the data infrastructure. I put in a disclaimer here: we are still very much at the beginning, so a lot of this is still open to final architectural decisions. This is on purpose, because, as already stressed, we really want to do it together with the right science communities; we do not want to just put something in place and then hope for everybody to become happy.

Now, who are the players involved? They are five supercomputing centers: BSC in Barcelona, one in France, here in Italy CINECA in Bologna, CSCS in Switzerland, and in Germany the Jülich Supercomputing Centre. These are all sites that provision Tier-0 resources within PRACE, and I think what all these sites have in common is that they are quite strongly linked to different science communities, something which I personally believe to be extremely important. This is not meant to be a closed shop; we envisage that in future this consortium might expand.

What kind of research communities are we talking about? First of all, because that is where we get our money from, brain research, or more specifically the Human Brain Project. It comes with its need, as already mentioned by Tom, for scalable simulations of brain models. But at the same time, and this shows the diversity within this research community, it comes with quite challenging data-analytics requirements, for instance when analyzing brain images in order to reconstruct high-resolution three-dimensional atlases of the brain. One interesting aspect, which also shows that this community is taking a really collaborative approach, is the effort to build up a knowledge base as part of what they call their informatics platform. Then there are other communities which we believe to be prime candidates for being good users of this infrastructure. One is materials science, again because of the large data sets coming from simulations but also from experiments, and there is a European community there which is already engaged in enabling data sharing, for instance within the different centers of excellence. Then there are the areas of
genomics and physical-science experiments, where some experiments need not only to bring in a lot of data and be able to process it, but also to combine it with HPC simulations. So these are communities which have in common that most of them produce data from quite different types of sources. It could be an HPC system itself, or different experiments, such as high-performance image scanners. You have distributed data sources, you have a certain level of heterogeneity, and you have the need for a close connection to HPC systems, where the HPC system acts as a source or a sink of the data. From this we conclude that there is a need for an infrastructure that facilitates data sharing and connects it to high-performance data-processing capabilities, and that is something most of the infrastructures we have in place in Europe today do not provide. Many of the supercomputing centers are, for instance, involved in some kind of data infrastructure, but if you look closer at the center, and Jülich is no exception, it is basically just one server onto which you copy data by hand from one world into the other. There is no really tight integration of the federated data infrastructure and the HPC systems.

How do we want to address this, from a more architectural or conceptual perspective? There is an important change in the way we want to set things up: we provision resources in a service-oriented manner, to really separate concerns. Our role as Fenix resource providers is to focus on infrastructure services which are suitable for different science communities. We cannot afford to do it for one single science community, and that is why we position this project, even though initially the main money comes from one specific community, in a way that lets us expand to other communities as well. We then expect these communities to build their own community-specific platforms on top of these infrastructure services. In materials science, for instance, something like AiiDA is a good example of a community-specific platform that can run on top of such infrastructure services.

We also want to federate these infrastructures, for different reasons. It helps to enhance the availability of the infrastructure services, and it broadens the kinds of services that are available, because we do not expect each site to have the same kind of computers. It also allows us, for instance, to optimize for data locality. For some of the data sources I mentioned, for instance what we in Jülich are doing in the context of creating brain atlases, we are talking about petabytes of data, and you do not want to move that amount of data all over Europe; ideally you keep the data local and allow others to use local services in order to access it. A lot of this is of course strongly inspired by what happens in the cloud, but it is important to realize that there are things where it is still different from the cloud. We are not turning into cloud providers like Amazon. We are aiming for a very limited level of virtualization, so that you really get the resources you need, and it is also largely about a different type of business model: we will charge for the provisioning of capabilities of IT infrastructure, not so much for the actual consumption of these resources. There is a big difference there, because if we charged according to consumption, then we would have to
charge much higher rates. Instead we simply say: we provision a certain capability to you, and it is up to the community to actually use it.

For the sake of time, let me jump immediately to the kinds of services we plan to provision in this context. It starts with computing services, where, as already mentioned, we will add interactive computing services. Then of course there are scalable computing services. And if you think about solutions like AiiDA, where you have an AiiDA daemon, you need somewhere the opportunity to run a service continuously, so you also need virtual machine (VM) services. Then there are different kinds of data services, and of course everything has to be embedded into authentication and authorization services, you have to manage users, and so on.

Let me go into some of these services in a little more detail. For interactivity, we have seen the brain community driving it, not necessarily to the extent we initially expected, but they are still pursuing it, and we see it emerging in other areas as well. It is the need to couple to scalable simulations in an interactive way: to monitor the progress of a simulation, but also to interact with it. That could simply mean being able to interrupt a simulation on the fly when you notice that it is moving in the wrong direction, that it is not exhibiting the properties you would expect. We also see a stronger need here for interacting with the processed data: to have the ability to log into the system, to have access to all the data, and to use interactive frameworks like Jupyter notebooks, or perhaps MATLAB or Octave.

A second type of service is the scalable computing services. One thing we really want to expand, as we see some use cases in the context of brain research, though here we are still more in an exploratory phase, is elastic scalable computing services. There are different options for implementing this. You could think of a checkpoint-resume mechanism: there are long-running simulations where we do not care so much about when they finish, so you checkpoint them in order to free up resources and allow an interactive job to quickly scale out into a larger job. The alternative is to reserve a set of nodes for these kinds of calculations. What we see in the brain community, for instance, is that people are interested in coupling simulations with neurorobotics experiments. Then you start to have real-time requirements, and you do not want to submit a job to the batch system and have the batch system, in its wisdom, schedule the simulation at 2 a.m.
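At the bookkeeping level, the checkpoint-resume option could look roughly like the following toy sketch. All names here (`ElasticPool`, `request_interactive`, and so on) are hypothetical and not part of any agreed Fenix design; a real system would rely on application-level or system-level checkpoint-restart and on batch-scheduler preemption.

```python
# Toy sketch of checkpoint-resume elasticity: a long-running batch job is
# checkpointed and suspended so an interactive job can scale out onto its
# nodes, then resumed from its saved state afterwards.

class BatchJob:
    def __init__(self, name, nodes, total_steps):
        self.name = name
        self.nodes = nodes
        self.step = 0                # progress survives via checkpoints
        self.total_steps = total_steps

    def run(self, steps):
        self.step = min(self.step + steps, self.total_steps)

class ElasticPool:
    """Free nodes plus suspendable (checkpointable) batch jobs."""
    def __init__(self, free_nodes):
        self.free_nodes = free_nodes
        self.running = []            # batch jobs currently holding nodes
        self.suspended = []          # checkpointed jobs waiting to resume

    def submit_batch(self, job):
        self.free_nodes -= job.nodes
        self.running.append(job)

    def request_interactive(self, nodes):
        # Checkpoint batch jobs until enough nodes are free.
        while self.free_nodes < nodes and self.running:
            job = self.running.pop()     # checkpoint: job.step is saved
            self.suspended.append(job)
            self.free_nodes += job.nodes
        if self.free_nodes < nodes:
            raise RuntimeError("cannot free enough nodes")
        self.free_nodes -= nodes
        return nodes

    def release_interactive(self, nodes):
        self.free_nodes += nodes
        # Resume checkpointed jobs from their saved step.
        while self.suspended and self.suspended[-1].nodes <= self.free_nodes:
            job = self.suspended.pop()
            self.free_nodes -= job.nodes
            self.running.append(job)
```

The point of the sketch is only the trade-off it encodes: the batch job loses no progress, while the interactive user gets nodes within the agreed response time rather than at 2 a.m.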
And 2 a.m. is exactly when you are not ready to run your robotics experiment.

This is one example, and that is the reason I explain it here, of where we are really open to co-design, in order to understand what the needs are. What is the upper limit for acceptable response times? Are you only willing to wait for minutes, or are you willing to wait for half an hour or an hour until the resources are made available? And in what range should we be able to scale?

There is also an area, the way we want to provision data resources, where we want to do some things differently, and I will explain why we plan for this infrastructure to differentiate between different types of storage. The terminology we are currently using distinguishes between so-called archival data repositories, active data repositories, and possibly upload buffers; let us focus on the first two. Archival data repositories are meant to store data for a long time. They are optimized for being extremely reliable and for capacity, because that is where the primary data products accumulate, but they may not necessarily be particularly good in terms of the performance at which you can access them, in particular from highly scalable compute systems. Active data repositories, on the other hand, are repositories we put in the vicinity of computational but also visualization resources, because for visualization you need short access latencies when accessing the data. We foresee these repositories being used to store temporary replicas of large data objects.

The reason for this is that if you look at storage technologies, you basically see two areas. On the one hand you have the technologies we are used to in the context of HPC: highly scalable parallel file systems, which have proven able to cope with tens of thousands or even hundreds of thousands of clients, and which have in common that they focus on a POSIX or almost-POSIX-compliant interface. On the other hand, if you look at more federated architectures, like cloud architectures, different kinds of solutions are being used. They are much more flexible, and they make it much easier to federate, as they support, for instance, federated identities; but they are not suitable for access from a very highly scalable system. To take the best features from both worlds, the architecture we are implementing has, on the one hand, archival data repositories based on these cloud technologies, which are then going to be federated, and that is exactly why they are based on technologies suitable for federation. On the other hand, we have active data repositories which continue to be based, not exclusively, but also on parallel file systems, through which you can then access the data with the scalable compute services. And of course you then need to put services in place, we call them data movers, which move data back and forth between the slower but federated data stores and the active data stores.

All of this needs to be integrated into one AAI, an authentication and authorization infrastructure. That is not trivial, but it is necessary, so that you have the same identity you can use to access resources either at BSC in Barcelona or at Jülich in Germany. There are quite a lot of technicalities around that, to make sure on the one hand that the community can actually obtain these identities.
On the other hand, security requirements also have to be taken into account.

Let me, at the end, take you through one of the aspects I highlighted at the very beginning: the way we envisage resource allocation. We are talking about three different actors. First there are those who are currently part of the Fenix consortium, the resource providers: those who provision compute and storage resources. Then we have the Fenix communities, science communities which have bought into this infrastructure and are therefore eligible to consume resources there. And the members of these communities we call the Fenix users. The roles are as follows: the resource providers provide the resources for a given period of time to the Fenix communities, based on the agreements in place, and they also define the rules according to which the resources are allocated, but they do not do the allocation itself. In the end it is the communities which provision the resources that realize this infrastructure, so it is up to the communities to decide how the resources are distributed. The way we foresee this being done is that a Fenix user submits a proposal for resources to their community; the community reviews the proposal and then awards available resources to the user. The resource provider only defines the rules, and one of those rules is that a peer-review process has to be in place, to make sure that resources are distributed according to scientific excellence; but in the end it is the community which makes the decision. For this we have also defined a concept, and I only want to mention it briefly, which we call Fenix credits. It is the way we express the resources available in this infrastructure, which we can then provision to a science community, a Fenix community, and the Fenix community can then distribute these credits according to the rules that have been defined beforehand.

To briefly summarize: what we really would like to have are strong science drivers towards data-oriented, federated HPC infrastructures. There are quite a lot of opportunities, but also quite some challenges we have to solve in this context, such as putting an AAI in place, bridging between POSIX and cloud storage technologies, integrating interactive computing services, and establishing a really new model for allocating HPC and data resources to research communities. We are trying to do that in Fenix, a group of supercomputing centers which are currently engaging in a very first project. It has a bit of a clumsy name: it is a Specific Grant Agreement under the Framework Partnership Agreement of the Human Brain Project, and it is called ICEI, which stands for Interactive Computing E-Infrastructure. But it is not only for the brain research community; it is also for a wider community, as part of the resources will actually be provisioned through PRACE, and there we are open and ready to invite other communities to engage. Finally, I would just like to give credit to a now growing number of colleagues at the different sites which are involved. Thank you very much.
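To make the allocation model described above concrete, here is a toy sketch of how a community might track Fenix credits. The class, the method names, and the credit figures are all illustrative; the talk deliberately leaves the actual accounting model open.

```python
# Toy model of the Fenix allocation workflow: resource providers grant
# credits to a community, users submit proposals, and the community
# (not the provider) reviews and awards them under the provider's rules.

class Community:
    def __init__(self, name, credits):
        self.name = name
        self.credits = credits       # granted by the resource providers
        self.awards = {}             # user -> credits awarded

    def review_and_award(self, user, requested, approved):
        """Peer review happens inside the community, per provider rules.

        Returns the credits awarded (0 if rejected or insufficient credits).
        """
        if not approved or requested > self.credits:
            return 0
        self.credits -= requested
        self.awards[user] = self.awards.get(user, 0) + requested
        return requested
```

The key property the sketch captures is the separation of concerns: the provider sets the total budget and the requirement of review, while the award decision itself stays inside the community.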