Today, I would like to talk about the interconnection between two processing platforms: the CBRAIN platform, developed at the MNI in Montreal, and the Virtual Imaging Platform (VIP), developed at CNRS in France. Basically, these two platforms can execute pipelines on clusters: CBRAIN runs on the Canadian grid, and VIP runs on the European Grid Infrastructure. We were wondering how these two platforms could interoperate to exchange files, to share a common authentication system, and also to exchange pipelines. So the goal is quite technical. The main motivations are, first, of course, to be able to access more data, because we could exchange files between the platforms; then, to access more applications, because anyone who has ported a pipeline to a cluster knows that it's costly, it takes time to do and to debug, so we'd like to be able to exchange these ported pipelines; and of course, to share computing resources. Even if each of the platforms has substantial computing resources, we also know that sometimes we have queuing times, et cetera. So those would be the main motivations. Another motivation, which is a bit more prospective, and which I'll say a few words about in conclusion, is to be able to reproduce studies across these two platforms. Each of the two platforms has a whole machinery to run workflows, trace data sets, et cetera. So ideally, we'd like to be able to take one experiment from one platform and run the same experiment on the other platform, just to reproduce the results. The main developments that we've done so far, which I'll describe in the next slides, are, first, a common authentication with Mozilla Persona; then, synchronized data directories between the two storage systems; and finally, while this is still in progress, we've started to work on the exchange of applications as virtual disk images between the platforms.
Just to say a few words about CBRAIN first: CBRAIN is a processing platform that you can access at this URL here. Basically, it's a web platform that offers the possibility to upload files, download results, do visualization, and run pipelines on the Canadian grid infrastructure, as I was saying before. It's a fully operational platform with an operational team behind it, and quite a large community of users as well. VIP, on the other side, has mostly the same functions. You can access it from here. It's a web portal too, where you have access to a number of applications representing pipelines: applications in simulation, such as cancer therapy simulation and image simulation, and also a couple of new image analysis pipelines, in particular some FSL tools and FreeSurfer. It's connected to the European Grid Infrastructure, which offers a network of computing centers worldwide, and it has also been in full operation for a few years, with a large user community. The first step we took to try to connect these two platforms was to look for a common authentication system, so that users don't have to have two passwords, or even two different authentication systems. We reviewed a few of them, including X.509 certificates, SAML, and OpenID, and we chose Mozilla Persona. Why? It's quite lightweight: it can be implemented with, I think, three or four JavaScript lines on the client side, and also just a few lines of code on the server side. Also, compared to the existing solutions, it's supposed to respect users' privacy. It's an open solution too, so we thought that in a research context it was quite a good technology. Basically, though I'm not going to enter into all the details, the system works as follows.
When a user wants to connect to a website, he or she basically connects with an email address. First, the system generates a key pair in the user's browser, and then the public key is validated, actually signed, by the user's email provider. If your email provider supports this technology, for instance Google or Yahoo, then the email provider signs it directly; otherwise, Mozilla provides a fallback service. Once this is done, the user has the possibility to sign assertions to connect to the platforms, CBRAIN and VIP in our case. Of course, this is all done in the back end, and the platform can then go to the email provider to verify these assertions, that is, to verify that the signature is actually valid. So as I was saying before, a couple of arguments led us to choose this, and I encourage you to try it if you have similar single sign-on issues on web platforms. The second thing that we have developed is file synchronization for data sharing. This turned out to be quite easy, because CBRAIN is really based on standard technologies, including SSH for data storage. So the only thing we had to develop is a file synchronization mechanism between an SSH data server and the European Grid Infrastructure. Choosing file synchronization was really a design decision, mainly because it can mask data transfers. With solutions that transfer files on demand, when you want, for instance, to import data from CBRAIN to VIP, you would first need to transfer your data, wait for it, and then launch your pipeline. Here you can just define a shared data directory and drop data there; it's synchronized over the weekend, and when you come in on Monday, or whenever, your files are synchronized. The synchronization itself is just based on a comparison of checksums, nothing really fancy.
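The checksum comparison behind the synchronization can be sketched as follows. This is a minimal illustrative sketch, not the actual CBRAIN/VIP implementation: it assumes both directories are locally mounted, uses SHA-256 (the talk does not name the hash function), and only reports which files would need to be copied.

```python
import hashlib
from pathlib import Path

def file_checksum(path, chunk_size=65536):
    """Compute a SHA-256 checksum of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def files_to_sync(src_dir, dst_dir):
    """Return relative paths of files in src_dir that are missing from
    dst_dir or whose checksum differs, i.e. the files a periodic
    synchronization pass would transfer."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    pending = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(src_dir)
        dst = dst_dir / rel
        if not dst.exists() or file_checksum(src) != file_checksum(dst):
            pending.append(rel)
    return pending
```

A periodic job would then transfer each pending file (e.g. over SSH on the CBRAIN side), which is what lets the platform mask transfer latency from the user.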
The final thing I would like to talk about is the exchange of pipelines between the two platforms. Actually, before diving into the implementation, the first thing we realized is that if we exchange pipelines across platforms, then very rapidly we will run into reproducibility issues between the two platforms. I think we all know that when we execute a pipeline on different operating systems, or in general in different environments, we may get different results, and we didn't want to increase the mess by offering the possibility to transfer pipelines; that was a concern for us. If you want to know more about this, you can go to poster number 39. But basically, what we decided to go for in this interoperability work is to transfer virtual machines between the two platforms. When this is completely finished, we'll have a repository of virtual appliances that we can exchange between the two platforms. The first step for that was to actually enable these platforms to use virtual machines to execute their tasks. That's what we've done in CBRAIN: administrators can now declare disk images in CBRAIN, and when users come to execute their tasks, they can specify a particular disk image where these tasks will be executed. CBRAIN then takes care of deploying these virtual machines, either on the clusters of Compute Canada or on clouds; we have an interface to OpenStack to execute the tasks and to stop the VMs when this is all done. So that's it for the developments. As I was mentioning in the introduction, for our future work we identified a use case: using this machinery to facilitate the reproducibility of experiments. We'd like experiments run by users on one of the platforms to be publishable and associated with an identifier; we thought of digital object identifiers.
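The disk-image mechanism described above (administrators declare images, users pick one per task, the platform decides where the VM runs) can be modeled with a small sketch. All names here are hypothetical illustrations, not CBRAIN's actual API; the real system would submit to a cluster scheduler or call OpenStack rather than return a record.

```python
class DiskImageRegistry:
    """Illustrative model of administrator-declared virtual disk images.
    Each image records where its file lives and which backend can boot
    it: a cluster (e.g. Compute Canada) or an OpenStack cloud."""

    def __init__(self):
        self._images = {}

    def declare(self, name, location, backend):
        if backend not in ("cluster", "openstack"):
            raise ValueError("unknown backend: " + backend)
        self._images[name] = {"location": location, "backend": backend}

    def get(self, name):
        return self._images[name]


def submit_task(registry, command, image_name):
    """Build a (hypothetical) submission record: the platform would
    boot the chosen image on its backend, run the command inside the
    VM, and stop the VM when the task completes."""
    image = registry.get(image_name)
    return {
        "command": command,
        "image": image["location"],
        "backend": image["backend"],
    }
```

The point of the indirection is that the same user-facing task description can be dispatched to either execution backend, which is what makes the images exchangeable between platforms.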
This identifier could then be referenced in a paper, for instance, or a report. Another person could access this identifier, resolve it, and be redirected to the platform that was used to produce the results. The rationale is that these platforms already have all the bells and whistles to enable this reproducibility story; it just needs to be switched on. Finally, this user could request access to the experiment, and may or may not have the rights to reproduce it there. Then, with the technologies I've just presented, we could redirect the user to another platform, VIP in this case, to help reproduce the pipeline that was executed in CBRAIN in this example. So if you're interested in this use case, please come see me; I would love to talk with you. That's all I had for today. Thanks.

Did you actually run experiments on both clusters in these virtual machines to check that you get the same results, or are there still some differences, or is that not tested?

We haven't run extensive experiments in virtual machines, but we were able to identify that these differences actually come from software libraries. With the same software libraries, the differences go away.

Isn't the software library so far downstream that the virtual machine uses it as well, which might then introduce its own problems?

In the experiments we've done, it's the math libraries, including glibc, so you would really have to install another operating system to change them.

All right. Excellent. Thanks. Is there much of a performance hit from running in the virtual environment as opposed to the native environment?

Sorry, the performance hit? Running it virtually as opposed to natively on the deployed architecture? I guess the current figures are a couple of percent, so two to three percent of performance loss.
So I think for the users it doesn't really matter; from the perspective of a computing center, of course, it can be a lot. Okay. So we'll...