Quickly, then: remote access to data. I can start with a personal story here, if you want to hear one. This was long ago, maybe five years. Someone came to one of our help sessions and needed to use the cluster, and they had a file on their own computer that they needed to copy over. So I had to start explaining: okay, first you copy this to the kosh login server, and then from there you copy it to Triton. And since their laptop was, I guess, Windows, they needed one program for the first copy and a different program for the second copy. There were all these different steps, and by the end of it I was just embarrassed that our system was so difficult to use.

So why does that matter? Because copying data around is a big deal and needs to happen often. Luckily, almost everything you see on this page is new since then; we've made a lot of improvements to the ways data can be copied back and forth. But really, don't suffer through doing your data copying a bad way. Take a little bit of time, read through what's here, and try to set things up so you feel comfortable doing it. Or, on the other hand, do everything on the cluster so you don't need to copy things back and forth at all. That's what many people end up doing.

I can give an example from my own experience: I once transferred a lot of data from the cluster to my own laptop to do some plotting. Then I realized that, actually, I didn't need the data, I just needed the plots. So I rewrote the script so it could do the plotting on the cluster without displaying anything, and then I only copied, or just viewed, the plots. So when you're copying data back and forth, it's a good idea to think about what you're actually needing. What is the actual thing you need to do?
Often the situation is that when you're, say, doing plotting or analyzing results, you only need a small summary. Say you do a physics simulation: you need a time series of how some values behaved, how the system behaved. If you can do that calculation on the HPC system itself, you can just copy over, say, a CSV file of the time series: here's the time, here's the temperature, something like that. Then you can plot it on your laptop, and the copying will be really easy and fast. But if you constantly have to copy hundreds of gigabytes of data to your laptop just to visualize it, you might think: okay, maybe there's a better way. Having that kind of distinction in mind can help you in the long run.

There are many different copying clients, many different ways of copying. Unfortunately, there's no one good way. But here, under "history and background", I recently tried to categorize things into two styles. One is transferring data, where you make a separate copy of the file somewhere else, so the same data now exists in two places. This is generally efficient for large data, or when you need an actual copy. The other option is remote mounting, which basically means you make a view of the data on another computer. So for example, on my own computer I can use something called SSHFS, or SMB, to mount (that's the term used here) a directory from Triton onto my own computer. Then on my own computer I can cd into that directory, open the files, view them in an image viewer, that kind of thing, without making a duplicate. And this is great, because when I update something on Triton, I automatically see the new version on my own computer. That saves a lot of time and a lot of frustration.
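As a rough sketch of what remote mounting with SSHFS looks like on a Linux or Mac laptop (the username and paths below are placeholders, not something stated on this page):

```shell
# Create a local mount point and mount a remote directory over SSH.
# Substitute your own username and directory.
mkdir -p ~/triton-mount
sshfs myusername@triton.aalto.fi:/scratch/work/myusername ~/triton-mount

# The remote files now appear as ordinary local files:
ls ~/triton-mount

# Unmount when done (Linux; on macOS use `umount ~/triton-mount`):
fusermount -u ~/triton-mount
```

Remember that every read goes over the network, so this is convenient for browsing and small files, not for crunching through large data.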
The disadvantage is that this is not great for big data, because every time you open a file, the whole thing has to transfer across the network; it's fine when files are small, like images. Also, if a file is being modified in both places at the same time, it can end up corrupt or in some inconsistent state. You really want to avoid that, basically by making sure you only modify it in one place at a time, and that you save it on all other sides before you do. But these things are not too hard to handle if you think about them.

So with that said, what's in this section? First is data availability throughout Aalto. Basically, we already have the data mounted on other computers at Aalto. For example, if you're using VDI, the virtual desktop interface, the data from Triton is already mounted there, so you can just go and immediately open it and see it however you like. Remember that "mounted" in this case means it's a view of the data: the data actually lives on a server in Triton's machine room. So if, let's say, you want to read a 10-gigabyte file, it has to transfer all the way from Triton to the VDI machine, and it will be a lot slower because there's a lot of networking involved; it has to go through a lot of pipes. The data is also available on other things around Aalto, in particular the physical workstations in the departments, which is great for quickly doing things.

Then, you can mount things remotely onto your own computer using several different strategies described here, which I guess we don't need to go into the details of now; you can do this as homework. What do you think about that? Any particular comments? This works on all operating systems, but you have to be inside the Aalto VPN to do it. Okay. Any comments, Simo? No, I think it's good. Yeah. There are more examples of doing this here. Then, transferring data.
There are different ways to do it from the command line, for example SFTP. I'd also mention that it's a good idea to differentiate between code and data. Code, for me, covers basically all the text files I transfer. If I develop some code on my laptop, I do a git push to some Git server, like version.aalto.fi or GitHub, and then I do a git pull on the Triton side, so that the code is in sync. I don't simply copy the files across as a folder, because this way version control handles all of the copying, and I know the files are under version control, so they won't break. Especially if it's a project I'm working on. These are small files, we're talking about on the order of megabytes of data, but they are very important because they're code: a lot of effort, a lot of your actual thought, went into creating these files. So it's important to have them backed up, and that's why version control is great. It's easy to transfer with version control, and easy to make certain that things are in sync on the different systems. So that's a good idea.

When we talk about data, we mean, let's say, the results, outputs, or inputs of these kinds of simulations. That might be a lot bigger, and it might be harder to keep under version control because it's binary data. So for this kind of data, I would usually just copy it with a transfer tool rather than trying to keep it under version control, because that makes it much easier to manage.

Okay, so let's see. There are different ways to transfer, and some exercises if you want to try.
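The push/pull workflow for code can be sketched end to end. In this self-contained example a local bare repository stands in for the Git server, and two clones play the roles of the laptop and the cluster (all names here are placeholders):

```shell
# A bare repo acting as the "server" (in reality: version.aalto.fi, GitHub, ...).
git init --bare server.git

# "Laptop": clone, write some code, commit, push.
git clone server.git laptop
cd laptop
git config user.email you@example.com
git config user.name "You"
echo 'print("analysis")' > analysis.py
git add analysis.py
git commit -m "Add analysis script"
git push origin HEAD
cd ..

# "Cluster": clone once; afterwards a plain `git pull` keeps it in sync.
git clone server.git cluster
cat cluster/analysis.py
```

After the initial clone on each machine, the whole sync cycle is just commit, push on one side, and pull on the other.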