Record to the cloud... and I think now that I've done that, everybody should be able to see me. All right, so we've got some interesting things happening here. First of all, this room is darn near empty. We have one person in this room, and we had about ten yesterday. I don't know if those same people joined us online, or they got what they needed yesterday, or just couldn't make it today. But we're going to go anyway, because we have lots of people on Zoom. The other thing is that Dave is sick. So Adam and I are going to tag-team the things we had planned, because Dave was going to present this and then got sick today. We'll do the best we can; we're not as prepared as we could be, for that reason. So: advanced Linux tips and file management. Customizing Linux: aliases, environment variables, and scripts. Let me get on my machine and sign in. You don't need to watch me sign into OnDemand, since you saw that yesterday. I'll share my screen here momentarily... apparently it doesn't know who I am... all right, I'm sharing now. So I've logged into OnDemand, and I'm going to use shell access to get to one of the head nodes. Like I said, we're flying by the seat of our pants a bit. We both know the subject at hand quite well; we just didn't have a presentation prepared for it. The first thing on the list is aliases. If you're going to be doing the same thing over and over again, it doesn't make a lot of sense to type something ridiculously long every time. I have one, for instance; well, actually I have a script that does this, but it's the same idea.
I could set this up as an alias. If you just type the word `alias`... let me make this bigger so people can see what's going on. Ctrl-plus... there we are, now you should be able to see a little better. There are a few aliases built into the system, and `alias` by itself will list them. Remember the grep command, which finds matching lines? Yesterday I ran grep looking for "print" in the hello.c file in my beocat-intro directory. I just ran the same command again, and notice that it put the word "print" in red. It highlighted it. That's because of an alias. If I say `which grep` (we learned the which command yesterday), it tells me there's an alias first, and then that the actual command is /usr/bin/grep. If I run /usr/bin/grep directly on the same file, I get the same matches, but nothing is highlighted. So that's an alias built into the system that says: when I type grep, don't literally run grep; actually run `grep --color=auto`. That's why it highlights the matches for me. If I type `/usr/bin/grep --color=auto` myself (two dashes, come on, fingers), you see the same highlighted output. So the alias runs the exact same thing I would otherwise have to type on the command line.
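To make the mechanics concrete, here is a small sketch of what that built-in alias is doing. The alias itself is defined system-wide on the cluster; these lines just re-create the idea locally:

```shell
# Scripts need alias expansion turned on explicitly; interactive shells have it already.
shopt -s expand_aliases 2>/dev/null || true
alias grep='grep --color=auto'     # "when I type grep, I really mean this"
type grep                          # reports the alias, not /usr/bin/grep
command grep --version >/dev/null  # 'command' bypasses the alias, like typing /usr/bin/grep
```

Running the full path (`/usr/bin/grep`) or prefixing with `command` are both ways to get the un-aliased behavior back when you need it.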
I do have one of my own, and it's a very long string. Aliases are like shortcuts: whenever I type this, this is what I mean. We have a host that changes around quite a bit: our firewall. We actually have two firewalls with different host keys, and they go back and forth, so if I try `ssh root@ns1`, it comes back with an error because of host-key checking. So I made myself a shortcut called sshnochecks. I have this in my setup, so every time I log in it's there. It runs ssh with an option pointing the user known-hosts file somewhere harmless, so it's not checking the host file, and it's not checking keys at all. So instead of typing, or even having to remember, that whole string, I can just say `sshnochecks root@ns1`, and now it's asking me for the password. I'm not actually going to log in, but that shows the difference: the checks are eliminated. Let me move these windows around so you can see it... sorry about that. So you can see that instead of typing that whole thing, which I don't even want to remember at all, I can say sshnochecks. And how would I set up a new one? Let me think of a useful example. Say I want to run `kstat --me` on the command line. And it shows nothing, which is right, because I don't have any jobs running right now.
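The alias being described is probably close to the following. This is reconstructed from the spoken description; the option names are the standard OpenSSH ones, but the exact alias on the presenter's machine may differ, and the host name is just the one used in the demo:

```shell
# Skip host-key bookkeeping entirely: known hosts go to /dev/null and
# strict checking is off. Only sensible for hosts you already trust.
alias sshnochecks='ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no'

# Usage:  sshnochecks root@ns1
alias sshnochecks    # prints the definition back, so you can see what it expands to
```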
And I can do `alias km='kstat --me'`. Now when I list my aliases, that should be in there, right? Yeah, down the screen there: `alias km='kstat --me'`. So now I can type `km` and it runs that command, which prints nothing because I have nothing queued. In practice I'd probably alias `kstat --me -d 1`, because I usually want the status of a job I ran recently. That's a command I run frequently enough to be worth an alias. Now, if I log out and log back in, that alias goes away. But there's a way around that too, and that's a file in your home directory called .bashrc. You should already have one built in. You do not want to overwrite this file; you want to add on to it. The default is the bit up at the top, and then further down there's even a line that says to put your stuff here. These lines below it are the things I've added to mine, and there's my sshnochecks that I added. So if I put `alias km='kstat --me'` in here, just like I did on the command line, then every time I log in, this file runs and the shell knows the alias. What about the part at the top? Those are the defaults. In fact, the first part says to go look in a system-wide file; that's what all that stuff is, and that's why you don't want to overwrite this, because then you don't get the defaults of anything. It says: run the defaults, then come back here and run what we want. And this is really just one form of a script file. A script file is just something that we want to do over and over again.
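Making that persistent looks something like this. The alias names are the ones from the demo; the point is that you append to the end of `~/.bashrc` and never replace it, since its top section sources the system-wide defaults. Simulated here against a scratch file so nothing real gets touched:

```shell
# Append (>>), never overwrite (>), your additions below the defaults.
cat >> fake_bashrc <<'EOF'
# User-specific aliases and functions go here:
alias km='kstat --me'
alias kmd='kstat --me -d 1'
EOF
grep 'alias km' fake_bashrc   # confirm they landed
```

After editing the real `~/.bashrc`, the aliases take effect on your next login (or immediately with `source ~/.bashrc`).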
Now, if I have something longer that I want in here, instead of an alias... you can make an alias run this command and this command and that command, and it will do it, but that gets to be a pain in the butt. Once there's any complexity whatsoever, you probably want a script file. So I'm going to go back to my beocat-intro directory, and you'll see I was playing with myhost yesterday. This is a different one. myhost.sh was the one we were messing with, right? I have another one called myhost.sbatch. Matter of fact, let's do `less myhost.sbatch` so you can see the whole thing, and we'll go through how the script works. First of all, any line that starts with a pound sign means "ignore this when I run it." But the first line is special in a shell script: it tells you what interpreter you're using. In this case it's `#!/bin/bash -l`. We usually call that line the shebang. The "#" used to be called hash, the exclamation point was called bang, and "hash-bang" became shebang. Every script file should start with the shebang; `#!/bin/sh` is usually what I type if I'm writing one from scratch. This one I got from somewhere else to begin with anyway; in fact it says it's a sample learning file. I created it, but I grabbed the skeleton from something else. You see all these lines that start with a pound, and basically anything starting with that is completely ignored, which is good for putting comments in your file, like the ones I put at the top so I can read them and see what's going on. Matter of fact, I even included instructions, because I put this out here for other people to use as a template.
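A from-scratch example of those two rules, with the file name and message invented for illustration: the shebang on line 1 picks the interpreter, and every other line starting with `#` is a comment the shell skips.

```shell
cat > hello.sh <<'EOF'
#!/bin/sh
# Comment lines like this one are ignored when the script runs.
echo "hello from $(hostname)"
EOF
chmod +x hello.sh   # mark it executable so ./hello.sh works
./hello.sh
```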
So usually a pound at the beginning of a line is ignored, but Slurm in particular says: whenever I'm running sbatch, if a line starts with `#SBATCH`, that's a command-line option meant for me. They went through some of this yesterday: you could type `sbatch --mem-per-cpu=512M --time=... --partition=...` all on the command line, but we don't want to do that. Putting `#SBATCH` lines in the file means that when I run it directly, it's ignore, ignore, ignore, all the way down to the bottom. As a matter of fact, I thought I didn't even run the program in this one; I must have been messing with it... oh, there we are, right here, for those of you following along at home. I'm actually running `$HOME/myhost.sh`. Remember the tilde? `$HOME` is another shortcut for the exact same thing as `~`. So that's the command it actually runs; everything else is a blank line or a comment. So let's run this right now: `./myhost.sbatch`. And it ran. It didn't submit to the queue; it just ran right here on the head node, and all those other things were ignored. It didn't request nodes or any of that, because bash, my interpreter, said: that starts with a pound sign, I can ignore it.
However, if I want to submit this thing to the queue, that's the nice part about not putting everything on the command line: I say `sbatch ./myhost.sbatch`. Now I'm submitting this batch file to the queue, and it tells me "Submitted batch job," because sbatch read all the parameters up at the top. If I looked, it would say I'm running on one node, one task per node, with a job name of myhost and all that. It's already done running. But if I look now with `kstat --me -d 1`, everything I've run in the last day, you can see this job number matches that job number. I got one node, one core, asked for half a gigabyte of RAM and didn't really use any of it, ran for one second, and completed. Dave showed some of this yesterday. I should also have a file here now called slurm-8411832.out. Look like a familiar number? 8411832 is the job ID. Any time you ask us for help, it really helps if you can give us that job ID, because it gives us very specific information; we can zero in and look at that particular job. If we look at that file, it says the job ran on hero09, as opposed to eos where we ran it directly, because it went to the queue, the queue said "okay, hero09, you're running this job," it ran, saved the file, and finished. They're both scripts, by the way, and the ending of the name doesn't make a difference; .sh versus .sbatch is just a convention I use to distinguish them. The sbatch one has all these commands in it that the other one doesn't. Lines I start with two pound signs are ignored even by sbatch. But if I do `less myhost.sbatch`...
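Here is a reconstruction of what that myhost.sbatch likely looks like. The directive values (half a gig per CPU, one minute, one node, one task, the job name) come from the talk; the option spellings are the standard Slurm ones, and the partition line is left out since its name wasn't given:

```shell
cat > myhost.sbatch <<'EOF'
#!/bin/bash -l
#SBATCH --job-name=myhost
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=512M
#SBATCH --time=0-00:01:00
# $HOME is the same thing as ~ ; this runs the plain script we already had:
"$HOME/myhost.sh"
EOF
chmod +x myhost.sbatch

# Run directly:   ./myhost.sbatch       (the #SBATCH lines are just comments)
# Run via Slurm:  sbatch myhost.sbatch  (the #SBATCH lines become sbatch options)
```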
You'll see that not all of these lines are ignored. They're ignored when I run the file directly; they're not ignored when I submit it with sbatch, because they start with `#SBATCH` and sbatch says "hey, that's meant for me": run with half a gig of RAM, run for one minute. Why have a script like this? So you can reproduce what you have. You got something that works; now you want to run it again, but you changed your input parameters, whatever, and now... oh man, how many cores did I use, how much memory? You don't remember what to do. This way you just go back and look at your script, and you have it all built in there. Yes, exactly. And by the way, anybody online, feel free to pop in and ask questions as they come up; more than willing to answer. Those were some pretty easy examples of scripts. I have some others here. Again, these are just commands; those are the flags to ls: I want long format, in reverse, sorted by size. It's something I happen to know because I use it quite a bit to find my bigger files. Let me see what in here is actually executable... those are old; I don't have many good ones that are public anymore. Oh, here's one: sakstat. sakstat is a script I wrote to run kstat. Somebody asked online where the example scripts are located: they're in my home directory, /homes/kylehutson/beocat-intro, and there's a bin directory, /homes/kylehutson/bin. You won't be able to see all of these, because I have them locked down, but there are a couple you can. So I have this one, sakstat, and what it does is run kstat; this part down here is ignored, that's my old version.
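That "long, reverse, sorted by size" listing is `ls -lrS`. A quick sketch with throwaway files (the directory and file names are invented): `-S` sorts largest first, `-r` reverses that, so the biggest files end up at the bottom of the listing, right above your next prompt.

```shell
mkdir -p lsdemo
printf 'x'      > lsdemo/small.txt   # 1 byte
printf '%1000s' > lsdemo/big.txt     # 1000 bytes of padding
ls -lrS lsdemo                       # big.txt prints last
```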
It runs kstat, and then I run strip-color-codes, because Dave has everything color-coded and I like everything plain. I also take all these different type fields and put them in lowercase. So if I run kstat, let's just look at the first 20 lines... ah, silly thing, let me fix the full screen. Well, everybody online is seeing it, so we're all right. So `head -20` gives you the first so many lines of a file. I'm using my pipes again, which is something I do frequently: I run kstat and pipe it to head so I just see the first 20 lines instead of everything it's got. And `tail -20` does the opposite: it shows the last 20 lines, or however many. So this is the first 20 lines I'd see from plain kstat. I don't want to see the whole thing, because that would make it harder to compare. You see how it has this in green and this in blue, and this is capitalized even though it's not in our file definitions, and all this stuff in here. I want all the information, but I don't want it colorized, because I want to be able to find things more easily. So I have an "sa" version, for system administrator, because we're the nerds who don't want all that stuff; I just want the plain text out of it. I run `sakstat | head -20`, and you'll see it's the exact same thing, except completely without color, and all in lowercase now. And that's because of the script I wrote that does all that to it. So a script is just one line after another: it runs the first one, runs the second one, runs the third one. There are advanced things you can do with it. You can do loops; you can grab the nth line of a file; things like that.
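The head/tail piping just described, shown against a throwaway file (`numbered.txt` is invented for the demo):

```shell
seq 1 100 > numbered.txt
head -20 numbered.txt    # lines 1 through 20
tail -20 numbered.txt    # lines 81 through 100
seq 1 100 | head -20     # same idea, fed through a pipe instead of a file
```

In the talk the file is replaced by a command's output, e.g. `kstat | head -20`, which is exactly the same pattern.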
There are all sorts of tutorials out there; beyond knowing these things exist, if you have questions about them, let us know. A very common thing people want to do is run an array job, which runs through the scheduler many times. Say you want to run the same job a thousand times but with different input data, and the input data comes from text files. You can say: for round one, grab the first text file in this directory; for round two, the second text file; for round three, the third; and so on. That way the job can figure out its own input, and you don't have to write a separate submission script for every one of them. I wonder if Dave has an array example; I think that's one of the things he was going to talk about. Okay, sure. So it did work; we like it when things work. Environment variables. I showed you one yesterday: PATH. You can get a list of what variables are set by using the `set` command. It's going to show a whole bunch of them, so I'm going to pipe that to `less` so we can page through and see what's there. All of these are parameters that are already set: my history size; my home directory; my host name, which is something that changes all the time. Some of these are automatically set by the system; some are not. LS_COLORS is a big nasty-looking thing that controls how your file listings look. I've never messed with it; that's a system default, but you can make it do all sorts of things with the colors. If you don't like the built-in colors, you can make them different. MODULEPATH is where the system looks when I load modules.
We can actually change that, so you can put your own modules in your own home directory or something like that. We have PATH, which we talked about yesterday. PS1 is a special one: it tells you what your command prompt looks like. This one is user at hostname, then the working directory. You have to go look up the options; search for "PS1 prompt options" and you can make your prompt whatever you want. I know people who just want a plain dollar sign: no username, no hostname, just a dollar sign so they know it's time to type. And UID, my numeric user ID. All important things to know. The reason I piped to less is that there's almost a whole program built into the end of the listing; I'm going to skip through the rest of it. This lets us do some cool things, though: we can pass things into and out of programs with environment variables. Like I said, Dave had this ready to go; I think I have some of it from a previous session. First, I'm going to get rid of that slurm output file, because I don't care about it anymore. Let's see what I've changed here: test.sh; sample.qsub, that's recent... nope, that's the old style. I don't have a good array example here; get rid of that one too, since it's a previous example I've already used. Let's go to our documentation. Hope it's on the right screen... Support, search for "array"... array jobs, there we are, and an example. Excellent. Let's make this big enough so you can see. Dave probably wrote this anyway. So, array jobs have a variety of uses. Here's one: you have app1, and you want to run the same thing with different inputs; this is what I was talking about. We have a run size of 50 (that other line is commented out), and we're going to run app1 with that run size against the data set.
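Environment variables can be poked at directly. A tiny sketch, with `MYVAR` as a made-up name for illustration; the built-in ones mentioned above (HOME, PS1, UID) are already set for you by the shell:

```shell
export MYVAR="hello"     # set a variable and mark it for child processes
set | grep '^MYVAR='     # it now appears in the big 'set' listing
echo "$HOME"             # built-ins like HOME are set before you ever log in

# PS1 controls the prompt; the minimalist version mentioned in the talk:
#   PS1='$ '
```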
So if you wanted to run this with a run size of 50, then 100, then 150, then 200, having to submit it four times would be a pain in the rear. You want to submit just once. So we say `sbatch --array=50-200:50`; that's 50 through 200 with a step of 50, so it runs tasks 50, 100, 150, and 200. And in the script we say run size equals... and this is an environment variable, what we were just talking about: SLURM_ARRAY_TASK_ID. Your SLURM_ARRAY_TASK_ID is whatever you've told the array to use; we've told it 50 to 200 with an increment of 50. More normally you'd probably see it as 1 to 100, or 1 to 1000, like this. So we're setting the run size to be the array task ID, and then running the app with that run size. Same exact job every time, except the run size follows the task ID. And it's one of those environment variables: if I were inside the job and ran `set`, I'd see it in the list, and it would be 50 the first time and 100 the second time. Now, remember I told you about grabbing the nth line? This is an exact example of that. We run app2 with a sed command: `sed -n` with a line number says pick that line and print it. So the first time through, app2 gets the first line of dataset.txt; the second time through, the second line; whichever line matches the task ID. It's a little hard to wrap your head around. There's also an example here using a loop, like we talked about: while reading each line, write it into a script-num.sh and submit that. That's not good; we don't want that, because it would submit a job five thousand separate times. Doing it the array way instead shows how you can use your environment variables.
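A sketch of both array-job patterns from that docs page. The app names and file names are placeholders; `SLURM_ARRAY_TASK_ID` is the real variable Slurm sets for each array element, faked here since we're not inside a job:

```shell
# Pattern 1: the task ID *is* the parameter.
#   sbatch --array=50-200:50 job.sh      -> tasks 50, 100, 150, 200
#   inside job.sh:
#     RUN_SIZE=$SLURM_ARRAY_TASK_ID
#     app1 --size "$RUN_SIZE" dataset.dat

# Pattern 2: the task ID picks the Nth line of an input list.
printf 'alpha\nbeta\ngamma\n' > dataset.txt
SLURM_ARRAY_TASK_ID=2                          # Slurm would set this for us
LINE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" dataset.txt)
echo "$LINE"                                   # -> beta
```

So task 1 processes line 1, task 2 processes line 2, and so on, with no per-task scripts to write.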
The array is also the more efficient way of using the scheduler, because it shows up in the scheduler, under kstat, one time, and it says: you've got instance one running on this node, instance three running on that node, and three through five thousand still waiting in the queue, or something similar. [Question about the scheduler's name.] Slurm. That's the name of the software; it's made by SchedMD. No, it's not something we came up with; that's the software that manages the queue for us, and it's the most common one out there. I looked it up once, and it actually does stand for something: Simple Linux Utility for Resource Management. It's probably the most popular scheduler out there, certainly the most popular free one; I'd say over half of the supercomputers in the world use it. Now I want to show you a cool little feature. I'm on this machine, on eos here, and I'm trying to think of a good example of why I'd want to do this. We're going to use a command called tmux. tmux lets you walk away from your computer and come back to it later, regardless of where you attach from. I'm on eos right now, and I'm going to create a new tmux session: `tmux new-session -s testing`. There we are. It basically logged me into Beocat again; I'm still in the same place I was before, but you'll see this green bar down at the bottom that keeps up with the date and time and all that as you go. I can run a program here and come back to it. Dave showed you the htop command; that's a fun one to watch, things are going on all the time. Now I'm going to walk away from this computer.
So I'm going to press the magic keys to disconnect, which is Ctrl-b, then d. That detaches my current session from tmux, and you see "detached" down here. Ctrl-b is the "hey, I'm talking to tmux now" key, and d is what it takes for detach. No, no Ctrl on the d; just Ctrl-b, then plain d. If I look at what I'm running right now on eos... it didn't show it. Why is it not showing it? I thought it showed tmux on there. Anyway, I can go back with another command. Last time I did new-session; now I say `tmux attach -t testing`. And it's been going there the whole time. Now the cool part: let's say I log off and go into a different session. I'm going to pull up MobaXterm so you can see how this works, and log in... okay, Duo push, approve... it's great when it works. All right, now I'm logged in; I'll make this bigger so you all can see it. I'm now on a different head node. This is one that just Adam and I use; you have permission to reach it, but it's admin-only. If I do `tmux list-sessions` right now, since this is the machine I'm normally on, I probably have several... I have a few sessions going. But you have to be on the same host, so I can't attach to the eos session from here. So I `ssh eos`, and now I'm on the same host I was on before. Now I do `tmux list-sessions`, and you see there's one called testing. Same thing as before: `tmux attach -t testing`, and I'm right back where I was, still running htop. And if we look at what I typed over here: `tmux attach -t testing`... am I seeing it twice? Is it not showing everything there? That must be an OnDemand bug, the color issue; this is how it should look in both cases.
But you notice these little dots at the bottom. That means another session is already connected, and tmux takes the smallest screen size of everything attached, because it won't let any one screen show more than fits on the smallest one. In this case, I'm now in my tmux session and I want to leave it for good. `exit`, or Ctrl-d, both do the same thing: they exit it. And now that tmux session has ended, because I actually quit it instead of detaching from it. I do `tmux list-sessions`, and "failed to connect to server" means it didn't find anything for me. I use tmux a lot, like I said, for different screens, different things I'm looking at: one might be watching the file servers, one the virtual infrastructure, one building software. Building software is a good example of something you might want it for: you're compiling something and it might take an hour; you can disconnect, go work on other stuff, reconnect, go answer somebody in chat. [Question: could you use a tmux session to run the MATLAB compiler in the background?] Yes, the MATLAB compiler, you can do that; basically anything on a plain text screen, you should be able to use tmux for. Although, no, actually you don't need it there, because once you submit a job, it just puts the output into the file, and so you don't need to be around for that. That's a nice part about having a scheduler. We have people who run jobs that last three weeks; you wouldn't want somebody to have to stay logged in that whole time. Once you've submitted the job, and it said "Submitted batch job" with the job ID, whatever it was, eight-four-whatever, you can walk away from it. It does not require you to stay logged in.
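The tmux workflow just demonstrated, condensed into a cheat sheet. The session name "testing" is just the one from the demo; `-d` is added to new-session here so the commands also work without a terminal attached, which is not how you'd normally start one interactively:

```shell
command -v tmux >/dev/null || { echo "tmux not installed"; exit 0; }

tmux new-session -d -s testing          # create a named session (Ctrl-b d detaches a live one)
tmux list-sessions                      # sessions live per-host: run this on the same machine
tmux send-keys -t testing 'echo hi' Enter   # the session keeps running whatever you give it
tmux attach -t testing                  # would reattach (skipped here: no terminal)
tmux kill-session -t testing            # 'exit' or Ctrl-d inside the session does the same
```

Remember that sessions are tied to the host they were started on; `ssh` back to that host before trying to attach.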
Yes, tmux is for when you don't want to lose your spot, that kind of thing. Say you're editing text files and you're so far down in one, and, oh shoot, you have to leave. If you're in tmux, you can disconnect, pack your laptop up, take it somewhere else, get home, think "I need to get back where I was," open up, reconnect to tmux, and you're right where you left off, because as far as the computer knows there's been somebody sitting there the whole time; you're just popping back in so you can see it. [Adam, could you pop over to the data center? My HP tech says he needs IP information. Sure, if he's still over there.] There's a lot Dave had on the list for today. Let's go through one more thing: Globus. Globus is a way of transferring files into and out of Beocat, and a lot of supercomputer systems. It's not worth it if you're just copying a file or two; use the web interface, use MobaXterm, something like that. But if you're transferring large data sets, it's a very, very efficient protocol. It's a pain in the butt to set up to begin with; that's why I say for small things, by the time you've figured out how to do it, you'd already be done. I'd put the threshold somewhere in the range of a gig to two gigs: past that, because it's such an efficient protocol, it'll transfer your data faster. The main reason I'm using my laptop today instead of the computer built into the console is that I have the software installed. It's called Globus Connect Personal; it runs in the background, one of my little icons down here... is that it? No, Windows security... one of these guys; there we are, Globus. Globus Connect Personal is something that can run on a Windows machine.
This particular installer doesn't work on a Mac or on Linux; they have separate installers for those. And we have the server-side software running on Beocat, and everybody here is already set up for it. So I go to globus.org. As a demonstration, I'm just going to copy files between Wichita State and K-State, since we have people from both on here, and you can see how I do it. I go to log in, and it says to choose your organizational login. If it doesn't remember that it's Kansas State, you can just start typing it there your first time; after that it remembers the last one you used. I sign in, and it brings me to a file manager, but it doesn't have anything set up on this side yet. Globus groups files together into what it calls collections, so I do a search for Beocat. If I looked at my recently-used list it would already be there, but we're going to pretend I don't have that. The one we want is... where is it... "Beocat file system." This first one up here that's not in green anymore was our old one; I haven't gotten rid of it yet, and it's probably never coming back. So, Beocat file system... "remote endpoint failure." Great. Apparently we're having issues with Globus, and I was not aware of it, because that's an error on our Globus endpoint. Well, I guess we're not going to show that one today. I probably should have checked this morning; I looked at it last week and it was working fine. From the error it's giving, I'm guessing it's something on our end that our server isn't doing right. I might pop on in a bit and see if I can get it going. What else does Dave have on his list for today? We're going to let Adam do Python. Let's take about a five-minute break, and I'll play with my machine here.
I'll stop my screen sharing and see if I can figure out why Globus isn't working. We might get back to that; we might not. But let's just take about a five minute break or so. I believe the bio break has probably been over for a few minutes and everybody should be back. My name is Adam Tygart. I am one of the sysadmins here at Beocat. I was going to be talking about OnDemand. We've been doing a lot of our instruction this time through OnDemand itself, because it is a useful tool and we are trying to dogfood it: we're trying to make sure that everything we're showing off does work through it. OnDemand has a lot more functionality than what we've been showing so far. We showed the Files app, access to your files. We showed the Clusters drop-down, which gets you your shell access. But we've got an entire field over here under this Jobs tab, or under the Interactive Apps, of other applications that we might find useful. We have a lot of people on Beocat that are trying to do reproducible science. It's one of the requirements for grants these days: the ability to redo your work if you have to, to hand somebody exactly all of the code that you wrote and all of the data sets that you ran, and to allow those people to make sure that your results are what your results say. One of the tools that people use for reproducible science is Jupyter. Jupyter allows you to write code in many different languages right inside what they call a notebook. In that notebook you can reference your data sets. You can have both the code that you ran and the results that you got right in the notebook itself. You can embed graphs; you can embed all kinds of useful data right inside that notebook. So I've logged into Beocat OnDemand and I went ahead and clicked on our Interactive Apps tab, and if you scroll down you can find our Jupyter server.
Within this Jupyter server tab you can choose how many hours (how long you think it's going to need to run), how many cores, how much memory; all of these are things that you would need to know for reproducible science anyway, and they're things you'd want to know because behind the scenes this is going to start a job on the cluster and give you interactive access to that node. We do have the ability to request GPUs; some people like TensorFlow in their Jupyter notebooks, and that's a wonderful thing. I'm not doing any GPUs right now, so I just scrolled down and clicked Launch. Behind the scenes, the OnDemand app submitted a job to the cluster with a bunch of information requesting all the resources we need, and it's going through and starting up Jupyter itself for me. It's currently starting. I went ahead and chose four hours as a runtime; it should be running shortly for me. Perhaps I should have started one earlier. Once it's started, it shows here that I'm running on gremlin01; that's my host. It tells me when it was created, how much time is left, and it gives me the job ID. Again, job IDs are very useful for us in the event that something goes wrong. This 8411888 is my job ID. The session ID is also useful, because the session ID, for us at least, will tell us exactly where your output files are, so we can take a look and see if there are any errors that we need to address. When you're ready, you can click this Connect to Jupyter, and it just opened up a new tab. It's showing me all my files, all of my scripts, everything that I've got right here, and I can fire up any of these notebooks. I've got several different ones here: some TensorFlow examples, some untitled ones. If I can remember where I put my example scripts; I believe I put them in source, on-demand. I know you're in here somewhere. On-demand. Job templates.
And then I have this Jupyter notebook script. This Jupyter notebook that I've got here is a very, very basic hello world in Jupyter. It's literally doing nothing except printing hello. When I clicked on my notebook there, it fired up and said, hey, I don't know what kernel I want to run under. Well, for me, I know that this is a Python kernel. In fact, it even tells me, hey, I couldn't find one for Python 3. But I clicked the little drop-down here, because it asked me which kernel I want to run as, and it's showing me all of these different options that we have for kernels on Beocat. We've got all our conda ones. We've got Python. I've got a virtual environment that I've set up. And over here, I can say, okay, this is Python 3. Let's see, I want to run under Python 3.7.4 with TensorFlow, and I can hit Set Kernel. Behind the scenes, this kernel is going to fire up, and when I decide to hit Run on my cell, it will eventually fire up the kernel and say, hey, here's my result: hello. Well, that's wonderful. That's a very basic example of a Jupyter kernel. Now, I'm going to come back to my home directory here, and I'm going to fire up this untitled... no, TensorFlow versions, people... and this is even math that I don't remember setting up a long time ago, probably more examples... Jupyter Python, scikit. Here we go. I'm going to make sure that scikit-learn is installed for me; it should be. You can use conda environments, provided that conda is set up within your .bashrc. If this sees conda in your .bashrc, it will actually look for your conda environments and give you conda kernels to use. We have some documentation on using other virtual environments, normal Python virtual environments. Those require a little bit of extra setup in the form of creating a configuration file that our script has to read. The big problem with Jupyter like this is that it's interactive. And that's wonderful when you're initially writing some of your code.
It's really kind of a pain if you're trying to run your code, or you need to run a lot of code, or you want to test out somebody else's notebook, to sit here and, okay, I've got to wait for my job to fire up. I have to predict how much time it's going to take. I have to wait for the job to actually start. And, oh, well, what if the job started in the middle of the night? Are you really going to be running that code in the middle of the night? No. What if there's an error somewhere in your code? Now you've got to wait to do more interaction. You have to wait for the job to end and you have to restart. And that's all problematic. So Jupyter is absolutely wonderful for these notebooks, and you can see all the code that I was running. I can put documentation in here as to why I was doing some of these things; I probably should have, a long time ago. So I'm going to run this. And I'm going to run this one. Behind the scenes, this is loading up scikit-learn, doing some data sets, and then apparently using scikit-learn to actually run my code. Jupyter is really nice for these types of things. The problem is that it's all interactive. We've got lots of interactive apps, including RStudio; you could fire up R over here and run that in RStudio. But when it's interactive, you run into issues where the cluster may not be available when you are available, and when the time is right to run, you don't want to waste resources on our cluster. Because if the job starts up in the middle of the night and you're not there to run your interactive stuff, those are resources that are now sitting idle that nobody else can use. And if nobody else can use them, that means that you have taken away time from other people. Because of that, there are ways of using Jupyter or R in a non-interactive manner. I went back to our Open OnDemand, under this Jobs tab, and I went to the Job Composer.
You can write jobs on the command line; you can write jobs from the terminal. Writing those requires that you have terminal access and that you know the options that you're looking for. Same as here: this is a slightly different way of interacting with the scheduler and creating your jobs. I'm going to go ahead and create a new job from one of our templates. We have written up quite a few different templates here that show off some of the types of jobs that people can run. We've got BLAST, COMSOL, CUDA, Gaussian, GROMACS, LAMMPS, all kinds of different jobs with example data sets that will let you run them. But what we're really interested in here today is this Jupyter Notebook template job. I've gone and clicked on it. And over here on the right, it says, hey, create a new Jupyter Notebook job. Okay, well, what are we going to name it? We're going to say this is the Beocat intro test. And it asks what cluster you want to submit to; well, each of our OnDemand instances only connects to one cluster, Beocat or BeoShock, so in this case, it's going to be Beocat. And it's saying, okay, well, the script name is my sbatch script. And I'm going to go ahead and click Create Job. Behind the scenes, that's going to copy the input data files that I had, the basic script that I had for the basic IPython notebook, and it's going to give me an sbatch script. And we see right here, we've got this Beocat intro test; that's the name that I gave it. We see it's created. We also see it's ready to go to Beocat; we see it's not been submitted. And on the right-hand side here, we can scroll down and see things like, okay, well, script location, script name, what's in the folder we just created. Well, there's this hello.ipynb; that's the notebook itself. And then there's our sbatch script. It shows you the contents of the sbatch script, if that's what you're wanting to see. It's kind of complicated, because we've got a bunch of stuff in there.
But I went through and actually documented what each of these options is. We're going to go ahead and open this script up in the editor, because there are things that I want to be able to do. I'm not sure that 10 minutes is enough time to run my job, so I'm going to go ahead and say, give me an hour and 10 minutes. And then, number of nodes: I do have a comment here. Jupyter jobs in general need one node. You're not going to want to go more than that unless you really know what you're doing, and if you really know what you're doing, you're probably not going to be using just Jupyter for that. Number of tasks: Python by default uses one core. If you're using TensorFlow, TensorFlow will use more. But unless you're using a library that you actually know will use more than one core, more than one task at a time, this one task is probably fine. And we want to say four gigs of memory, which is probably more along the lines of what we need. I've got comments in here talking about how we're doing some of the things we're doing, and what you would want to do to set up your conda environments, if your Jupyter notebook were conda-enabled. We talked about how to use TensorFlow, how to load up the Jupyter Python module that actually includes everything you need to do TensorFlow and everything else, and what you would want to do to activate your virtual environment, if you're doing that. And then we finally come down here to the bottom and say, okay, now we would actually want to run the notebook. But if you look here, I've got all of these lines commented out, so it's not going to run a notebook. What I would want to do is uncomment one of these lines. We've got some different example options that you might want to run. This one, for instance, creates an output file that's just an HTML file.
That's absolutely great for your results, because, hey, you can just put that up on the web and it's done. But you might also want an IPython notebook: you want your results back in a notebook so you can hand that off to someone else, and they can look over everything you did and actually see all the results right there in the same page. Or you might want to run it and leave the notebook in place, overwriting that notebook. Most of the time you don't want to overwrite files, because if you're overwriting a file, you may lose input data, or at least it's changed, and if it's changed, you can't guarantee that nothing else changed. If you're not doing it in place, you know for sure that you can compare what you thought you were running to what you eventually did run. So I went ahead and uncommented this nbconvert execute-to-notebook line for hello.ipynb. As my comment above says, this will produce an output file called hello.nbconvert.ipynb: a new IPython notebook. I've gone ahead and saved that. And if you see over here, it's already updated: hey, my file's been saved, that line is uncommented, and with all the options we changed, it's going to run for an hour. So we've got this job. What do I do with it? It's not been submitted yet. Let's look up here. We've got edit files, job options; we can open up a terminal. Oh, there we go: Submit. Submit button. And behind the scenes, it submitted that job to Slurm. It asked for all the resources we needed. It even gave us a job ID. It's ready to go, and it's running. There's not much information there, though. What is it actually doing? I want to see what my job is doing. Is it really still running? Well, I went back to my normal OnDemand tab, and under Jobs there was this Active Jobs. Okay, we've got jobs that are running; I can actually see my jobs that are running.
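Pulling the pieces above together, a non-interactive Jupyter job script might look roughly like this. This is a sketch, not Beocat's actual template: the module line and the `hello.ipynb` file name are assumptions, and the idea is that exactly one of the nbconvert lines is uncommented.

```shell
# Write a sketch of the submit script described above.
cat > jupyter-job.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=beocat-intro-test
#SBATCH --time=1:10:00     # an hour and ten minutes
#SBATCH --nodes=1          # Jupyter jobs in general need one node
#SBATCH --ntasks=1         # plain Python uses one core
#SBATCH --mem=4G

# module load Python       # load whichever Jupyter-capable module your site provides

# Uncomment exactly one of the following:

# Render the executed results to a standalone HTML page:
# jupyter nbconvert --execute --to html hello.ipynb

# Execute and write a NEW notebook (hello.nbconvert.ipynb),
# leaving the original untouched:
jupyter nbconvert --execute --to notebook hello.ipynb

# Execute and overwrite the notebook in place:
# jupyter nbconvert --execute --to notebook --inplace hello.ipynb
EOF

echo "wrote jupyter-job.sbatch"
```

On the command line you would then submit it with `sbatch jupyter-job.sbatch`, which is exactly what the Submit button does for you.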
This one is our Jupyter notebook that we just fired up. It's used 29 seconds and it's still running. And I can click on it and come in here and see, okay, it's running on wizard23, and that's always nice, and how much time it's used. It's kind of like kstat; it's not quite as detailed as kstat, but it can probably get you by. It does let you see what's going on with your jobs running on the cluster at this moment in time. And, oh, well, it knows where your output files are, so you can even have it open in the file manager, or have it open in the terminal, and it will actually let you look at those files on the fly for any of your jobs that you're running, not just jobs that you're running through OnDemand. I believe this should be about done. Okay, it has finished my Jupyter notebook. So I'm going to come back over here to my Job Composer tab, and it tells me, hey, it's completed. And it's got a link to Open XDMoD right here, with the job ID, so you could click on that and actually see what your job did. But it does warn you that those jobs aren't going to show up there until at least 24 hours later. That's where you see information about how your job performed and the resources that you've used; those are always lagging behind by about a day, at least in XDMoD. So we scroll down here and we say, oh, folder contents. We've got more files in this folder now. We've got an out file. We know what out files are; we learned about those earlier today. Hey, a job, when it runs, produces an output file, and everything that was in your standard out goes right there. Oh, well, it says right here: I was converting a notebook. It executed the notebook and it wrote a new file. That's good; there are no errors in there. And then if we come in here and we take a look at this hello.ipynb, we see that, hey, it's right here, but that's not useful. How do we read that?
It turns out that OnDemand is not able to read .ipynb files directly; in fact, I think I clicked on the input file behind the scenes. An IPython notebook is really just a collection of JSON data, and if you do spend the time, you can actually read through it, but that's slow; it's not fun. I'm going to go ahead and copy this script location. And what we're going to do is either fire up the Jupyter Notebook app again, so we can actually go open up the existing notebook that we just ran, or we can take a look at this other interactive app that we've got here called Code Server. If anybody's familiar with Visual Studio Code, or VS Code, that kind of thing: that's a coding application that you'd run on your desktop, and it can connect to remote systems. Code Server is a take on that: it takes parts of VS Code and lets you run the entire thing remotely in your web browser, so you don't have to download anything to your computer. VS Code lets you compile; it'll let you compile C code, it'll let you do any of that kind of stuff. We're going to fire that up here. Code Server can actually read IPython notebooks just fine, or at least see your outputs. When it starts up, obviously it looks at all your files. Do I trust that every file in my home directory is fine? I could, but I don't. Now, jobs run through OnDemand's web interface tend to store all of their data and all their outputs in your ondemand directory. Hey, ondemand directory: you've got a data directory, and we've got this sys; we've got dashboard, all this. myjobs is the list of jobs that we've set up, and my job six; I know that six is the one that I just created. So I went to ondemand, data, sys, because they're system applications; myjobs are the ones that I just ran. And hey, we've got this hello.ipynb, which can show me the notebook that I just had.
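Since a notebook really is just JSON, you can see the whole format in a few lines of shell. The skeleton below is a minimal sketch of the nbformat-4 layout, the same shape as the hello example; the file name is illustrative.

```shell
# Write a minimal one-cell notebook: cells, metadata, and format version.
cat > hello.ipynb <<'EOF'
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": ["print(\"hello\")"]
    }
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 4
}
EOF

# Because it is plain JSON, any JSON tool can read it:
python3 -m json.tool hello.ipynb > /dev/null && echo "valid JSON"
```

When a run finishes, the outputs array of each cell gets filled in; that is all "results right in the notebook" means at the file level.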
And then if I click on this one, it shows me the notebook I had plus the output, so we know that it ran. Those job templates are really nice to have. And it's kind of a shame that they're only available in OnDemand, right? Well, they're not only available in OnDemand. Beocat has a GitLab, and in GitLab we have a lot of our code stored: a lot of code for Beocat purposes, and other people have other purposes. Dave has put kstat out here for other people to get. But we have all those job templates right here in this admin public on-demand job templates repository. We have the example Jupyter Notebook right here, including the script, which you can take and run anywhere else. You could take that and put it in your own home directory rather than run it through OnDemand itself. Let's see. Code Server, I covered some of that. Sure. And yeah, exactly. And that is the beauty of Jupyter: it wants to be interactive, because, hey, you have the pretty pictures, you have the graphs. But some of the data that you run, some of the science that you need to do, is so large that you have to wait. And if you are waiting on an interactive job, you have to wait for it to start before you're ready to start running that cell. Well, what if you went to the movies? You're only going to get four hours, and now you've lost half, because you were at the movies while your job started, or you went to dinner, or it started in the middle of the night. Those are real problems that people have. It's some of the problems we've seen at our cluster down in Wichita. People would submit a job, and they wouldn't necessarily know when it was going to start, and they wait and they wait and they wait, and, well, it's not started yet.
I guess I'll go to bed; and then space frees up in the middle of the night, and hey, their job started, and their job is over by the time they woke up. I'm sure that happens here too; it's just that down there they were a bit more resource-constrained, so when that happens there, they really feel like they lost out on that time. So that's one of the reasons why you might want to at least write up your Jupyter notebook elsewhere. You can write your Jupyter notebook on your computer, or on anybody's computer, and then transfer it to Beocat. For instance, in this example job that I had here, this hello.ipynb; [comment from the audience] you could just test that yourself, so why use a job to test it? Exactly. So Beocat and BeoShock, these clusters, are good for scale. They're good for scale, or we do have classes that use them because it's a nice, consistent environment: everybody's environment looks the same, and I can guarantee that your environment looks exactly like his environment, so whatever they write, if they write it to work there, it's going to work there. Those are the reasons why you might want to do that kind of thing on Beocat, on the actual cluster. One of the things I kind of glossed over here in OnDemand: at least on Beocat's side, we do have a few resources set aside for interactive things. Let's come down here in Jupyter. It asks you to choose a job type, and right now I chose interact. And right here it says interact jobs will be scheduled to a node set aside for interactive activities, but it allows oversubscription. So we come in here, and you say you need four gigs of memory, and he needs four gigs of memory, and they need four gigs of memory, and they all try to do that simultaneously: somebody's not getting what they want. We have this set aside for those big classes that need a bunch of people interacting with it at the same time.
And we can expand that and add more of those interactive nodes if needed. But right now we just have four nodes set aside that are about the size of our head nodes, and in general that's been okay. The nice thing about those interactive nodes is that if you're wanting to write your stuff and just test that it works on Beocat real quick, you can do that. You're not using a lot of resources, and if you do, it's spiky: it's going to run for a bit and then drop back off, and the likelihood of everybody wanting those resources at the exact same time is low enough that that's fine. Whereas we've got this normal option, and that runs on a compute node, and you get the resources that you request. But the problem is that if everybody's using those compute nodes, or if you don't request the right amount of resources, your job might get killed off, because the owner of the node might come around and need it, or anything like that. [Question] We often get status 502 errors on OnDemand when using RStudio. If I remember right, 502 errors are an issue with OnDemand itself, and their workaround, their accepted solution for the time being, is that you're supposed to clear your cookies. I don't know why RStudio is more painful about that. It may be that RStudio is putting more stuff in cookies and it's just filling up the headers too much; I don't know. But the accepted workaround for the moment is: clear your cookies. And I'm sorry, I don't have more information than that; let us know if you find any. Exactly. It's a very good way to look at Beocat, or these things in general: a lot of your computers anymore are pretty powerful and can do a lot of science, but they can't scale. You will never get the scale of current supercomputers in your current laptop. I'd be very careful about the way I word that, because supercomputers change all the time.
And my phone is way more powerful than the first supercomputers. But as far as current supercomputers go, you're not going to get a current supercomputer in a current laptop or a current desktop; it's just not there. So we talked about interactive nodes; we talked about why you would want to scale. We do have template jobs for R as well. If I search the templates here for R; R is kind of everywhere, but we've got a parallel R job and a sequential R job, and both of these do have some example submit scripts. They talk about what you want to do with MPI with R, and I think I've even got a sample R script here that does use MPI. They work fairly well, and if you are just firing up RStudio to run your R file, these example scripts will work just fine for what you're trying to do. I've come in here and I take a look at this sequential R job; I'll go ahead and create a new one of those. The R script that I've got here; you could just replace that R script that I put here; this is trying to get to my tabs and it keeps changing things on me. But you could come in here and say open this directory, and I could delete this myscript.r and upload a new file, and if I had a new R file, I could upload that R file. We could either rename it to myscript.r, or change this script here to instead read the name of your R file. And you could just run that, and that would be absolutely fine, and then you come back later and take a look at your output. If that's what you're using RStudio for, then this is a perfect example of why you would want to. [Question] Is it also possible to run R Markdown files to get HTML output? I would imagine so, but you should send us an email and we can take a look at that together. I would imagine that it is, but we would have to look and see if we could find the right collection of R libraries to load up to do that.
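For the sequential R template described above, the core of the submit script is just an `Rscript` call. A hedged sketch, not the template's actual contents: the module line, `myscript.R`, and `report.Rmd` are placeholder names, and the R Markdown line assumes the rmarkdown library is available.

```shell
# Write a sketch of a sequential R submit script.
cat > r-job.sbatch <<'EOF'
#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1       # sequential R: one core
#SBATCH --mem=4G

# module load R          # whichever R module your site provides

# Run the script non-interactively, same as the template does:
Rscript myscript.R

# For the R Markdown question: rendering to HTML is one extra call,
# provided the rmarkdown library can be loaded:
# Rscript -e 'rmarkdown::render("report.Rmd")'
EOF

echo "wrote r-job.sbatch"
```

As with the Jupyter template, you can either rename your file to `myscript.R` or edit the `Rscript` line to point at your own file.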
That is the majority of the things that I had set aside to show on Beocat OnDemand. We do have some interactive desktop things. If I click on this Desktop, I can tell it to give me an actual Linux desktop, right here, right through a browser. Let me fire up Firefox on the cluster; let me do whatever I need to do. We have people that do that for COMSOL or what have you, MATLAB, Mathematica. [Question] Can text and images be copied to the clipboard? Yeah, we can go ahead and do that; I actually didn't even know that I could copy and paste across this, and apparently it lets me do that. But here's a full Linux desktop running on the cluster right now, and I do use that a lot. [Question] So can you run any operating system? No, we just currently have the Linux desktop; we don't have licenses for Windows to do all that. So this is actually running a Linux desktop as a job on Beocat. If anybody did have classes or that kind of thing, as long as students have access to Beocat, you could also share it off with the view-only link, and then you could have the actual desktop link, and all of your students could watch everything you're doing directly through that, if that were something that you wanted to do. Kind of a poorer version of Zoom, I guess. But let's see, we've got Octave, Mathematica. Mathematica is currently only licensed for K-State-associated people, and I think there's something wrong with the license anyway, so that will probably have to get fixed. Jupyter with Spark: Spark is another framework for parallelization, and it will actually let you use multiple nodes, because you're not exactly running Python code anymore; you're running Spark slash Java code behind the scenes, and it lets you interact more easily with multiple nodes. It just doesn't care. That's what I've got. Are there any more questions? All right. Sounds good.
Well, it sounds like Kyle's got plenty of stuff to share. I'm going to stop sharing the screen here. [Question] You could submit jobs that actually run the notebook in a non-interactive fashion? Yes. Any job that I'm submitting here can, in theory, be submitted on the command line. These interactive apps are more special; there is some communication that goes back and forth between OnDemand and the job itself so that it knows the job has started and knows how to connect to the job. But anything that you've done under the Job Composer tab with our examples, or any jobs that you write up in here, can be submitted right from the command line. It doesn't matter where you do it from. What I was trying to do was show off all these different ways of doing it. And it really is nice that you can have all these jobs. Sorry, I realized I'm not sharing the screen anymore here. Any of these jobs you could also submit from the command line, and the nice thing about this is that it does have a history: hey, this job was never submitted; or hey, this job completed; or it might tell me it failed, or any of those other things that might have happened. That actually brings me to one final point. OnDemand here has a link to your recently completed jobs, and it gets that information from a service we call XDMoD. And if we click on this Open XDMoD, it actually pulls up a history of the jobs that I've run. It's got some data, but it's just being a little bit slow right now. And we can quick-filter that on our user. You're automatically logged in if you click from OnDemand, so it knows who you are.
And you can look through here and see, okay, well, I've run this many jobs; I've used this many resources. Professors might want to see that kind of information for their students, to make sure they're actually running their jobs. Or it gives you a predictor of, hey, I'm using this many resources, my students are using this many resources, my grad students and my postdocs are using this many resources; maybe it's time that we start looking at whether we can get more resources for ourselves. Yeah, there's about a 24-hour lag, so for jobs submitted today, you'll see the information tomorrow. Right, it's not an OnDemand exclusive. [Question] How is that accessible? Yes. If you've done anything in OnDemand, if you've run any jobs at all, it creates a folder in your home directory called ondemand. Within that, there's a data directory, and that data directory will have; it's a whole convoluted directory structure they create, but they've got sys and they've got user. Sys is for system apps; user would be for user apps. We don't have any users that write their own apps for OnDemand; it is technically possible that we could allow you to do that kind of stuff, if that were something you wanted. But it would be under sys and then the name of the application that you actually ran. And then I talked about the session ID, that big, long number; it would be a directory with a big, long number. If it was, for instance, one of the example Job Composer jobs, it would be under myjobs and then the job number. Question on the screen. Yep. [Question] Is it possible to train TensorFlow models using Beocat? It is absolutely possible to do that. Something I will say is to use our modules; don't try to build TensorFlow yourself. It will not work. TensorFlow has been notorious in the past.
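To make the directory walk above concrete, here is a mock-up of the layout OnDemand creates under your home directory, built in a scratch folder so it can be listed. The app names and the job number 6 are the ones from this demo; treat the exact nesting as illustrative.

```shell
# Recreate the shape of ~/ondemand in a scratch directory:
# system apps live under data/sys/<app-name>, and Job Composer
# jobs end up under myjobs/<job-number>.
mkdir -p demo-home/ondemand/data/sys/dashboard
mkdir -p demo-home/ondemand/data/sys/myjobs/6

find demo-home -type d | sort
```

On the real system you would just `cd ~/ondemand/data/sys` and look around; everything in there is plain files you can open from the shell, the file manager, or Code Server.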
If you do try installs through pip, they have had problems in the past where they assume, no matter what, that you're running a different version of Linux than we are running here, and it will break. They may have fixed it; it's still problematic. So we do recommend that you run from our modules, and if you do need a newer version, we can look at it and see what we can do. [Question] In the basic parallel R job, the R script mentions using mclapply multicore, and warns that it generates zombie processes. Zombie processes are processes that have not properly exited. They have died, but they did not tell their parent process that they were finished, and so some of those resources can be left around. A single zombie staying around is not necessarily a problem. But we had issues earlier today where somebody was complaining about not being able to fire up more than 4,096 processes, basically. That wasn't because of zombies, but if you did have a lot of zombies, your job may stop being able to start up more processes, because those process IDs are still hanging around. Yes, and TensorFlow is one of the reasons why we started setting up Jupyter the way we do through OnDemand: people need TensorFlow access, and that's how people are teaching TensorFlow. It is absolutely phenomenal that it exists; the problem is that the TensorFlow folks are very opinionated about how your system should be set up. All right, so let me log back in again now that I've got it working here. Okay, so I'm going to go back in where you saw me way back before. As a matter of fact, I'll close this whole thing out. Just globus.org; start from the beginning. And I need to log in. Got my bar up in the way. Okay, once again, I picked Kansas State University. Wichita State is on there with us.
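Back on the zombie-process question above: zombies show up in `ps` with a state of `Z`, so a one-liner is enough to check whether your job is accumulating them.

```shell
# Print the header plus any processes whose state starts with 'Z'
# (defunct processes whose parent has not yet reaped them):
ps -eo pid,ppid,stat,comm | awk 'NR==1 || $3 ~ /^Z/'
```

Each zombie still occupies a process ID, which is why enough of them can run a job into the kind of process limit described above even though they use almost no other resources.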
It took us a while to get Wichita on there because they were not using some of the protocols that everybody else was, but we do have them now. It might be asking at this point for some authorization. I've done this before, so it doesn't do that for me, but I believe the first time you come through this point, it's going to ask for authorization. Again, we're going to look for the collection called Beocat. In this particular case I'm actually going to show BeoShock2 as well, and the Beocat file system down here. And now that it's working correctly, I get a file listing, same as we had through the web interface. All my files are here. With that by itself you can do things: you can download files, you can upload files. It's not the best way in the world, but you can do it from there.

What really makes it powerful is the software called Globus Connect Personal. I have that installed on my laptop. So when I go to my collections, there's a section called Your Collections. I have a few on here, and I have one I named "Kyle's Dell Laptop" — being really useful there. So I go to Kyle's Dell Laptop, and it's now reading from my own laptop, sitting right here in front of me. By default it lists my Documents folder, but I can say slash user, I think it is — you can basically go anywhere on your laptop that you want; I don't remember the syntax to make it happen right now. Yes, I can go up one folder; I think that's what I can do on mine. I've allowed mine for pretty much the entire drive. Yeah, C is what it goes to there. So I'm going to go back to my home folder, which, as everywhere else, is tilde. And this is back to my Documents folder, which is the default where it goes. You can change your default; there are settings in the app to make that happen. But I can actually transfer or synchronize everything from one folder to another folder, or anything like that.
So this is a 3D print file that I have, and I can say transfer or sync. This all happens in the background, where it's transferring files from one side to the other. And I can go to Activity... nope, apparently I didn't tell it to do it. This is not the way I normally do it; I should show it the way that I normally do it. So I highlight a file here, I say Transfer or Sync To over here, and it asks if I want to start — and this is safe. I'm going to put that file this direction. And if you see up at the top, it says "transfer request submitted successfully," and I now have something in my Activity where it's transferring a file right now. It's queued, and it will keep giving me updates here as it transfers the file.

Like I said, if it's a small file, this is not the way to go. But if you're doing multi-gigabyte files, this is very efficient, in terms of not having a lot of overhead as the protocols talk back and forth to each other. And you can do entire folders at a time also. This is using Beocat — the status changed, it now says "transfer completed." So up here it's saying it's the Beocat file system: these over here are my files on Beocat, these are my files on my laptop, and I can use that to transfer files one way or the other. For individual files it's meh, whatever. But you can use it to transfer or synchronize an entire folder from one place to another, or big files from one place to the other. The nice thing about this is that, let's say I'm using my laptop, I'm transferring a multi-gigabyte file, and it's time for me to go home. I can take my laptop, go home, pick it back up, and it's going to restart that transfer from where it left off.

Does it also go this fast? Yes. To Beocat? Yes. It's also — and this is the main reason it's used — it's also used to transfer between supercomputing centers. So can you go home? No. No.
We'll say it's going to transfer files to Beocat based on where you're at, because it's transferring from your Globus Connect Personal, which is on your own machine, up to Beocat. It'll be limited by the speed of your home connection; we can't change the laws of physics. But it's a very efficient protocol, so it'll transfer basically as fast as it can.

Like here on campus? Well, it's as fast as the campus wireless goes, which is not as fast as the whole campus backbone. But our end of the pipe on Beocat is 100 gigabits per second — it's mind-blowingly fast. So that's not going to be the slow end; the slow end is going to be on your end. But even so, there's a lot of overhead built in if you're using a web browser, or even SCP: there's a lot of talking that goes on behind the scenes — "hey, hold on a second, I've got to make sure I got this much, I've got to make sure I got this, hold on, hold on" — and it slows things down. This is the fastest way of transferring big files.

Now I'm also going to do another one here. You saw I clicked on Collections; let's go back to where I was — File Transfer. So here I've got the Beocat file system on this side and my laptop on this side. I don't want to be on my laptop anymore, so I'm going to change this side. I'm going to select BeoShock, which is our system in Wichita, and there's our BeoShock file system, and here are my files in Wichita.
So I have files in Wichita and files here in Manhattan, and same thing: I can transfer files from one to another. That one's really nice, because once I tell it to transfer from this system to that system, again, I can close that down. I can be transferring terabytes of data from one supercomputer to another, and like I said, it all just happens behind the scenes, because it's a very efficient protocol and because we have big pipes between here and there — well, they're both on KanREN, and so, yeah.

But this is how you would get files if you're moving up to bigger systems: what they call the ACCESS systems. The Texas Advanced Computing Center has what was the fastest academic supercomputer in the world — I went on a tour of that about five or six years ago, and it's amazing; take Beocat and blow it up several times over. But those are national resources, and you have to apply for access to them, so you basically have to have proof of concept somewhere else: "hey, I have this code, I can make it work." If you're doing those sorts of things, let us know; we can help you get onto those systems. But this is the main program that you would use to transfer between different campuses, and again, once you tell it to initiate, you don't have to sit there and babysit it or anything like that. It'll just go.

There are also Flows, which is something new. You can have it basically monitor things — it's almost like a scripting language where you can say: if a file pops up over here, transfer it over there, then submit a job, do something, and transfer stuff back. I have not set one of these up yet; they just released that this summer.

Oh, and by the way, I also just got an email that said my file transfer succeeded. So it does let you know when your files are done, so you know to check your other system.
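The transfers shown in the web interface can also be scripted. A hedged sketch with the Globus CLI — the endpoint UUIDs, paths, and label below are placeholders I made up for illustration, not Beocat's or BeoShock's real values; you would look up the real UUIDs with `globus endpoint search` after `globus login`:

```shell
# Hypothetical Globus CLI transfer ("pip install globus-cli").
# SRC/DST are placeholder UUIDs, not real endpoint IDs.
SRC="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"   # e.g. the Beocat collection
DST="11111111-2222-3333-4444-555555555555"   # e.g. the BeoShock collection

if command -v globus >/dev/null 2>&1; then
    # Recursively transfer a folder; Globus queues the task, retries
    # on failure, and keeps going after you close your laptop.
    globus transfer --recursive \
        "$SRC:/homes/$USER/results" "$DST:/home/$USER/results" \
        --label "Beocat to BeoShock sync"
    # Later: globus task show TASK_ID   (TASK_ID comes from the output)
else
    echo "globus CLI not installed; install with: pip install globus-cli"
fi
```

Like the web interface, the CLI just submits the task to the Globus service, so the shell can exit while the transfer keeps running.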
Like I say, if you're dealing with a few KB of files, it's not the greatest — it's a little clunky — but when you start transferring big files, this is the way to go. It'll make a difference. If I'm transferring terabytes of data and I can save a couple of hours, that's a big deal. From your home system you might only save a couple of minutes from having a very efficient protocol, so it's not as big a deal for that type of thing. But especially if you're on campus and you're transferring from your desktop, where you're doing your research project, just set it to go — you don't have to babysit it, it'll just do its thing.

That's pretty much what I wanted to show with Globus. I wanted to make people aware that it's there. It might pop up something that says it needs authorization to do this; that's all normal. But it's a good way of synchronizing files from one place to another — somebody here yesterday was asking about that, and I hadn't even thought about Globus at that point.

We are at time, so like I said, I kind of made this a little short, but what I wanted to cover is that we do this, and the video online that we have for this is a little outdated, because they keep changing their interface and we had to shut off one of our transfer nodes for security reasons. But I wanted to get on here and show how that works in kind of a short format. Any questions from people online before we get out of here? I'm going to take that as a no, then. I will stop my sharing.