You'll also notice that there's other stuff here that may not be so obvious, like IOMKL versus FOSS. What these are are toolchains. The FOSS toolchain is free open source software, and IOMKL is the other common one; that's the Intel Math Kernel Library toolchain. So, toolchains: down here, this is a better description. The free open source software toolchain that some of these packages are compiled with simply uses the GNU Compiler Collection, and that's things like GCC. So this is the free GCC compiler, but this toolchain also includes other things that are commonly used with a compiler, and in this case it includes most everything that's needed for running optimized and parallel codes. In other words: the Open MPI package for MPI support for parallelization; OpenBLAS, to load the BLAS library of optimized low-level routines; LAPACK, a linear algebra package; an FFT package (FFTW); and ScaLAPACK, a scalable version of LAPACK. So if you do compile something for yourself, you can do a module load of FOSS and get all these packages for yourself. If you do a module load of an individual module, it may load these as dependencies.

If we go down here, the other main toolchain that I mentioned is the Intel one. This is the Intel cluster toolchain, so for the Intel compiler it would have icc, the Intel version of the C compiler, as well as C++ and Fortran compilers. Then it has the Intel Math Kernel Library, and this includes a lot of the individual libraries that were in the FOSS toolchain. That includes BLAS, LAPACK, FFTs, et cetera. That's all lumped together in the Math Kernel Library. In addition, this also includes Open MPI. So those are the two main toolchains that you have to be aware of. If you compile your own code, both of these can be used to compile code optimally.
If the directions for compiling your code use one or the other, I would stick with that, just because it's more likely that the developers of that code tested it with that compiler chain. But otherwise, a good code should be compilable by either of these toolchains. You have variations on these toolchains: FOSS CUDA is the FOSS toolchain, but it has CUDA, which is what you need to compile with support for the NVIDIA GPU cards. So if you're trying to get your application to accelerate with GPU cards, and it's programmed to do that, you need to include the support for the CUDA library. That's all I really want to go through on toolchains.

Let's spend a little bit more time on modules. Again, let's look at one module here. I'm going to start by doing a module purge, so I start with a blank slate, and then I'm going to do a module avail. Then I'm going to pipe that into grep -i, for a case-insensitive search, and I'm going to look for the package called VASP. This is an atomistic package that's used by chemists and physicists to simulate small numbers of atoms in a quantum mechanical way. What you see off to the right is that there are two versions here. They're both VASP version 5.4.4. This version of VASP that I've highlighted is compiled with the FOSS toolchain, the free open source software toolchain. The next thing it says is 2018a, which gives you an idea of what timeframe the toolchain is from: the 2018 version of the GNU toolchain. And the rest here, dimer, BEEF, and solvation, are some packages that were compiled into it.

If you want to know more about it, you can do a module spider, and I'm going to paste that in. module spider will give you a better description of what the package is. It says VASP is the Vienna Ab initio Simulation Package and tells you it does a lot of quantum mechanical and molecular dynamics, first-principles stuff.
It also gives you a website where you can go for more information, for example, down here. So module spider is a good way of getting a more verbal description of a package. And you can actually do better than this. I showed you there were two versions of it, so if we just do a module spider of VASP itself, what that's going to show is both versions that are available. Again, module spider is just a good way of searching for a specific package and getting more information about what versions are available.

Now, up here, you'll also notice that on the right side there is a D. That means that's the default version. If you want to be explicit, you can do a module load of the entire package name. But if you just do a module load of VASP, it'll choose the default one. This other version here is FOSS and CUDA, so that's the GPU-enabled version.

Again, we did a module purge, so if I do a module list, it's going to tell me I have no modules loaded. I do a module load, and this time we're going to do VASP, and then a module list. There are actually 17 modules loaded now. That's the nice thing about modules: it not only loads what you're looking for, it loads all the dependencies. In this case, number 17 here is the actual package, and it tells you the version number and everything that we went over before. But it also tells you that it loaded the BEEF library. Again, that's one of the dependencies; one of the packages that was compiled into this version was the BEEF library. So it tells you exactly that it loaded it in and what version number. Here's where it loaded in the FOSS toolchain, and it says that's from 2018. And again, with the FOSS toolchain, here are some of these other packages that are loaded in: ScaLAPACK, an FFT library, here's the BLAS library, here's the Open MPI library to give the parallelization, and a lot of other things. This is the core GCC library.
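As a rough illustration of how the pieces of a module name line up with what was just described, here is a minimal Python sketch. It assumes the common `name/version-toolchain-toolchainversion-suffixes` layout; the example string `VASP/5.4.4-foss-2018a-beef` is hypothetical and not necessarily the exact name shown on screen, and the toolchain table only lists the components mentioned in the talk.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Toolchain contents as described in the talk (not an exhaustive list).
TOOLCHAINS = {
    "foss": ["GCC", "Open MPI", "OpenBLAS", "LAPACK", "FFTW", "ScaLAPACK"],
    "iomkl": ["Intel compilers", "Intel MKL", "Open MPI"],
}


@dataclass
class ModuleName:
    name: str
    version: str
    toolchain: Optional[str]
    toolchain_version: Optional[str]
    suffixes: Tuple[str, ...]


def parse_module(full_name: str) -> ModuleName:
    """Split 'name/version-toolchain-tcversion-suffix...' into its parts.

    Assumes no component contains a hyphen of its own, which is a
    simplification that will not hold for every real module name.
    """
    name, _, rest = full_name.partition("/")
    parts = rest.split("-") if rest else []
    version = parts[0] if parts else ""
    toolchain = parts[1] if len(parts) > 1 else None
    tc_version = parts[2] if len(parts) > 2 else None
    return ModuleName(name, version, toolchain, tc_version, tuple(parts[3:]))


# Hypothetical module name used only for illustration.
m = parse_module("VASP/5.4.4-foss-2018a-beef")
print(m.name, m.version, m.toolchain, m.toolchain_version, m.suffixes)
print(TOOLCHAINS.get(m.toolchain or "", []))
```

A bare name such as `VASP` parses with an empty version and no toolchain, which corresponds to asking for whatever the default (D) version is.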
So the nice thing about this is that you don't really have to know these things. If you want to use VASP, you just have to know to do the module load of VASP, and it even sets up the path to the VASP executable. Now, I know that the VASP executable is called vasp_std, for example, so if I do that, it shows me right where the executable is. And if I want to run this, I can do my mpirun and give it the number of processors, et cetera. I don't have this set up to do a run. But again, when you load a module, it loads the executable, it sets up the path, all that kind of stuff. So it's a very nice way of managing software packages for a cluster like this.

There is a section on installing your own software. If you find yourself in need of a software package that isn't in our module list, then you are responsible for doing that, with our help. And if it is something that you think others might be interested in, let us know, and we can see if it's in what we call EasyBuild. EasyBuild is the list of recipes for how to build these modules. So that's a time when we go to Adam and say, well, is this in the EasyBuild list? Is this something we should build for the entire cluster and put in a module?

So, any questions about modules or toolchains at this point? Modules and toolchains are something you'll find at most of the larger supercomputing centers around. It is just a convenient way of managing the installed software. If you want it to not be there, okay. Yeah, the module purge will get rid of all your modules.

I will show you one other thing, too. In my base directory, I have an example of a script where I run VASP. One thing that I do in my scripts is I always start by doing a module purge, so I know I'm starting from a blank slate. Then I do a module load of the specific version that I'm running, and I do a module list.
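The purge, load, list pattern just described can be sketched as a small Python helper that writes out the top of such a job script. This is only an illustration: the module name passed in below is hypothetical, the `mpirun -np` line is an assumption about how the run would be launched, and scheduler directives are left out because they depend on the cluster.

```python
def render_job_preamble(module: str, executable: str, ntasks: int) -> str:
    """Return the text of a job script that starts from a clean module state."""
    lines = [
        "#!/bin/bash",
        "# Scheduler directives omitted here; they depend on the cluster.",
        "module purge",             # start from a blank slate
        f"module load {module}",    # load one explicit version
        "module list",              # record what was loaded in the job output
        f"mpirun -np {ntasks} {executable}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical module name and task count, for illustration only.
print(render_job_preamble("VASP/5.4.4-foss-2018a", "vasp_std", 4))
```

Keeping the explicit version in the `module load` line is what makes the `module list` output in the job log useful later when asking for help.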
That means that every time you run this, it'll put in your output file a listing of exactly what you did. This is very useful if you contact us, because we know exactly what version you're running, and that saves us the time of asking. This is not a necessity. If you do a module load of VASP, for example, and then submit a job, the job will know where the VASP executable is, because it takes that from the environment you had when you submitted your job. But there won't be a record of that in the script that you're submitting, and unless you do a module list in your script, it won't be in your output. So putting a module purge and then the explicit module load like this in your script is a good idea, because it saves one communication step with us if you do run into problems. Yeah, because with VASP there are only the two versions right now, but we've gone through old ones. Right now there are deprecated versions that a few people are using, et cetera. And if someone says, I have a problem with VASP, our first question is, well, you have to tell me what version of VASP you're using. So doing it this way is good for several reasons. Any other questions on modules or toolchains before we move on?

Okay, I think you're up, Kyle. Let me, I have to unshare, I think. Yeah, I had a little trouble with that. Yeah, I'm not even seeing what this is. This back here is your stuff. Well, mine doesn't show the center box. Right, because it's showing at the top. It's not showing at the top there. So if you go up to the top, I think you can get that Zoom bar. Let's see, there we go.

And now, assuming I don't catch the chair again like I did, we're going to go through a few things on copying files into and out of Beocat. We've already gone through the GUI piece with MobaXterm and FileZilla, so I'm going to kind of skip that piece of it. We have some new software installed. We've had it for a while, but we got some enhanced versions for running through Globus.
And that's a high-speed data transfer service. So I'm going to share my screen. I've never done this on Windows before, and we don't have it installed, so this ought to make it interesting. I'm actually going to pull my stuff around here so I can kind of look at the screen. We have to do quite a bit of this at the same time here.

Globus is a very high-speed data transfer service. It's really meant for moving data from one data center to another. So if you were using some XSEDE resources, if you're going out to some of the big national supercomputers, they would all have Globus installed. It also works for copying to and from your own laptop or desktop. And we have two of them. Let me actually pull up our support pages, because they give a diagram of how things are installed here, and what I'm talking about will make a little more sense if you can actually see a picture. Support. Okay, okay. I have a page here. I'm just going to go straight to, come on. I hate this mouse. I have a page called Globus right here.

Okay, so we have two data transfer nodes, or DTNs. This picture right here, let me see if I can blow this up big enough; open it in a new tab. Here we are. That's now getting to be big enough that we can see what's going on here. There we are.

So what we have here is just a quick block diagram of how the university connects to Beocat and why you should pick one over the other. KanREN is our internet service provider, among other things; KanREN is where we get our internet service from for the whole campus and for Beocat. The entire campus gets a 20 gigabit connection to the university. So everything you guys are doing, all the internet stuff you do, is all going through this campus firewall at 20 gigabits per second. We have a separate connection, also through KanREN, at 100 gigabits per second that just goes down to the FIONA.
The FIONA is a flash input/output network appliance. If I remember right, that's why it's called FIONA. But that connects straight to the Beocat network. So if you're off campus, say transferring to one of those big data centers, back and forth, it makes a whole lot of sense to transfer to and from the FIONA, because it bypasses all the campus network stuff: the campus firewall, the campus everything else. And it's a faster connection on top of that.

However, if you're over here on campus, let's say that you're in Throckmorton, then to go to the FIONA you'd have to go outside of K-State's network through the campus firewall, out to KanREN, and back in through KanREN to the FIONA. That doesn't make any sense at all. You'd want to use the one that's on campus, even though it's a little bit slower: 40 gigabits per second, a little slower. Still pretty good speeds, right? These buildings are all connected at 10 gigabits per second. There are some buildings on campus that aren't quite that fast, but most of them are one or 10 gigabits per second. So if you're in one of these places, you'll want to use the on-campus one, the DTN, just labeled DTN; that was the first one we had. It does still go through the Beocat firewall. However, it's very fast, and you're still going to end up having to go through firewalls anyway.

So if you're on campus, you're going to want to use the DTN. If you're off campus, you're going to want to use the FIONA, generally speaking. Now, if you already have all your stuff set up to use the on-campus one and you're transferring a small file, and by small I mean a couple of megabytes, something like that, the time it takes to set it all up is going to dwarf the amount of time that it takes to transfer the file. So if it's only going to take you a couple of minutes anyway, don't worry about it, just go ahead and use this.
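To put rough numbers on the link speeds mentioned above, here is a small Python sketch. The times are idealized lower bounds that ignore disks, protocol overhead, and anyone else using the link; the function names and the `efficiency` parameter are illustrative assumptions, and the endpoint rule is just the "generally speaking" rule of thumb from the talk.

```python
def transfer_seconds(size_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Ideal time to move size_bytes over a link of link_gbps gigabits per second."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


def pick_endpoint(on_campus: bool) -> str:
    """Rule of thumb from the talk: DTN on campus, FIONA off campus."""
    return "DTN" if on_campus else "FIONA"


MB = 1e6
GB = 1e9

# A couple of megabytes from a building with a 1 Gb/s connection: a tiny fraction of a second.
print(transfer_seconds(2 * MB, 1))

# A few hundred gigabytes at 10 Gb/s versus 100 Gb/s.
print(transfer_seconds(500 * GB, 10))    # about 400 seconds
print(transfer_seconds(500 * GB, 100))   # about 40 seconds

print(pick_endpoint(on_campus=True), pick_endpoint(on_campus=False))
```

For small files the setup time dominates, so the choice of endpoint hardly matters; for hundreds of gigabytes the link speed is what you end up waiting on.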
But if you're using large data sets, if you're transferring a couple hundred gigabytes or terabytes, and we do have people with data sets that size, you probably want to go to the FIONA if you're off campus. It'll be worth the extra time you spend to set it up. And this is what I'm going to demonstrate using Globus, and you're going to see that it takes a little while to set it up. Once it goes, it's extremely fast. It's also meant to be extremely resilient. There are places transferring files at this hundred gigabits per second across the entire nation between these big data centers.

Would batteries help? I've got two different mice. Different mice, okay.

So, transfer with Globus. First of all, we're going to start through a web browser, and you're going to go to Globus. Oh, yay, I have Logitech Options software. Thank you. Yes, it is. All right. You can't see it because it's got the little Zoom box right up here, but there's the login button right here. Okay. This is the first page you get to. It's asking to log in to somewhere, and it doesn't say where, since I've never been on this computer before. If I say look up my organization, you'll see there are a whole lot of them out there. Those are all the sites that use Globus. In our case, we're going to be Kansas State University. So I'm going to start typing Kansas until we see that there's a very small list, and I can choose Kansas State University. I continue, and now it's going to take us to the K-State single sign-on page. Again, I don't have my password manager here; I've got to look this up. My particular account is Duo enabled, so I'm getting this on my phone now. If you're not using Duo, it'll skip that step. And now I'm in the file manager, but it doesn't say that I have anything going on. So the first place I want to go is endpoints, on the left here.
Now, this one happens to remember the last few places that I've been, so we're not going to worry about that right now. I'm going to pretend like I don't see anything down here. I'm going to go up to the top to search all endpoints, and then I'm going to start typing Beocat. It's a little bit different: the first one was Kansas State, this one is Beocat. And you'll see that I have two options when I go into Beocat. The first one is the on-campus DTN. The second one is the FIONA; that's the one that's outside the campus firewall. So I'm going to click on the on-campus one first; we'll demonstrate this one.

It tells me I have no active transfer credentials. So it timed out, basically, since the last time I used it, which isn't surprising since it's been a few days. What I want to do is activate that. And again, it's going to ask me for my username and password. This is actually talking to Beocat itself. As you know, when you log into Beocat, even if you are Duo enabled, it doesn't do this; that's not through the campus sign-on. This is basically the equivalent of logging in there. So even though we're using eID credentials, it has no way of knowing that it's actually the same, and I'm having to put my password in again. And there it tells me the active certificate expires in a day. So it gives me one day to transfer files before it times out. If I'm getting close to the end, I can go to the same page and say extend activation.

So now I want to open that in the file manager. And if you look, you'll see the same things here that I had. No, don't save my password. Thanks. You'll see the files that I had here two days ago. There are my out files from the demonstration that I ran. We'll take one of those, and I can delete it, for instance. It takes a minute to submit that job, because everything is going through the Globus service.
So this is not as fast as through MobaXterm or FileZilla or one of those kinds of things. It'll take a few seconds, usually, just to get started, maybe up to a minute. Once it does, then we're there. There is the refresh button here, and now you can see I only have two of the files left. So that's all right for managing files there, but we want to do more than that. We want to transfer files to and from there.

This computer I'm on here, I want to transfer files from it. So I'm going to go back to endpoints, and I want to make my own endpoint. Come on. Where's the file manager? Strange things happen when you make the screen large here, so I lose track of where I am. There we are. Endpoints, that's where it is. Yes. A little plus up in the corner. Here's the kind of endpoint I want to add: I want to add a Globus Connect Personal. I'm going to say this is the 3099 room display, just a friendly name that I can use to talk back and forth here. It asks for the identity. Yes, I'm using Kyle Hudson. And I'm going to download this for Windows. Like I said, I've never even done a Windows one before, so we'll see how good I am at faking my way through it. I want to allow this app to make changes. Yes. I wonder if it'll let me install software. Apparently. Okay. Okay.

Now it wants a setup key, so I have to come back here and generate a setup key. I'm going to copy that and put it back in here. And okay. Do you want to use Globus to access your documents? Yeah. My documents here are all on the U drive, so I'm going to change that. We're going to make that writable. We're going to add my desktop: users, me, and desktop. I want to add my U drive, which is the computer science department U drive for personal stuff. So I've chosen all that. All right. Save. Okay. Globus Connect Personal is now running in the background. This is good. Okay. So now we go to endpoints, administered by me.
Now I have the 3099 room display, so I can open that in the file manager. Ah, these silly things. The screen is too small to do what I wanted to do. Okay. Let's see if it lets me do this here. No. Here we're seeing my files over here, but I want to transfer into Beocat. Let me see if I can make my screen smaller here. Okay. Okay. Transfer or sync to. There we are. Now it's got a place over here. So I have my computer over here, and over here I select the on-campus DTN. And now I have my files from my desktop here, and I can do things like drag and drop files this way.

As you can see, it doesn't actually do it immediately. It tells you it created a task. And in your personal options, you can actually tell it to send you an email to say, hey, this started; hey, this finished. If you're dealing with large data sets, that can be really convenient. The nice thing also is that, say this is running in this room and I want to transfer a few gigabytes, I could walk away from this room and be logged out. As long as that Globus Connect Personal server is running, it'll still continue to transfer files in the background.

You can also initiate from here, like I said, from one data center to another. I have credentials as of today at the University of South Dakota; I'm going to go do some work there in a couple of weeks. So I could transfer files to and from South Dakota and Beocat, all through this interface. We can sync folders between them, download, upload. I can copy files the other way, drag and drop that way, very similar to a file manager. Like I said, the difference you'll notice is that it doesn't start immediately. It starts in usually 10 or 15 seconds, somewhere in that range, up to a minute. But once it gets going, it's very fast. So that's the on-campus DTN.

Oh, one other thing. Say you want to share files with people here too.
So over here I have a shared files folder. Let me see if I can make my screen smaller again here. Ah, I deleted all my files out of there. That's silly. Let's see, do I have anything here? I think a Zoom folder. Yeah, there we are. So let's go into my shared files. I'll hit refresh. There it is. Okay. Now, the other thing: I can't see everything I need to see on the screen at once.

Okay, so now on this endpoint, I can go to the DTN, and there is an option here called shares. This is how I'm going to share files with people. Again, this is on the on-campus DTN. You can share with people off campus too, but primarily on this node you're going to be sharing with somebody on campus. I can browse out here to my shared files, and I want to select that. Display name: demo shared files. And I create that share.

Now, by default, I have read and write access to that folder, but let's say I want to share this with somebody. The first thing it's going to ask for is the path, and by default it's already in the shared files. I can do this by username or email. So let's say Dave over there, Dave Turner at ksu.edu. I want him to have write permission, so I add him and add permission. So now Dave also has write permission to that same folder that I do. I can also do all users. So for anybody in the world that goes and searches for it, not necessarily just at K-State, I can go ahead and add permission and give them read access. I could also give them write access. Don't do that. Don't give the entire world write access. Remember, you're in charge of what's in your folder. So don't do that.

So is anybody logged into Globus already, here in this room? You are? You're trying to? Let's see if you can see my files here. Are you logged in? The personal?
You don't necessarily have to have the personal one to do this. I guess you do, because you have to transfer it back to your machine. There, okay: on a Mac, where it says to generate that setup key, under preferences, go in and delete all the preferences and the setup key and then re-enter it. For some reason, by default it doesn't ask you for that; it just assumes a blank setup key. So it's under preferences. That's a weird thing, and I don't know why it does this. No, under Globus, the Globus Connect Personal. Are you out there? Okay, can you see my stuff?

Okay, actually, if I unshare my screen, can I show this so people can see what you're doing here? Let me unshare my screen so people in Zoom can see. I'm going to stop the share but have you share yours. Oh, you're not on Zoom. Let's have you join the Zoom room here. You'll find my meeting notes here. KSU.Zoom.US. They have work to do on their UI. 016542. Let's see, it's down here. Oh, there it is. Okay, now you're in. So let's share your screen here so we can see what's, here we go.

So, on somebody else's computer, here you can see what he's got. Now we're going to go to endpoints, and we're going to search for Beocat. There's this. And now shares. He doesn't have any endpoints here. I think we can do a search. You can tell I haven't done this a whole lot. Shared with you. There we are. I know the name for it: demo shared files. And I can open that. And now you can see my file that I shared with the entire world. Now, you did have to go search for it, so you have to kind of have the name. But he does have it out there.

Somebody's got a quick question. What are you asking about, RV? If you want to do more files, you can leave it out there for a second. At least I didn't knock the whole microphone on the ground like I did last time.
But you can see that he can see my files. I already saw Kyle's demo shared file. That's what I'm looking for. So now, if you're anybody but Dave, you're going to see those files, but you won't be able to write to my directory. Dave, on the other hand, will be able to write to my directory, because I gave him permission to do so. So that is the on-campus DTN. The off-campus one is very similar, except that it doesn't give you the defaults when you open the file manager.

There's a major echo going on. Yeah, I know what's going on here. Did that fix it? Great. Thank you for letting me know. I didn't even think about that. Okay. So I'm going to share my screen again.

There is also a link for sharing, so you could actually send that to somebody, just by email. As a matter of fact, it looks like this. It's a big long URL, but it's something people can open in a web browser and get in that way. So now I'm going to go back to the overview again, and I've decided I don't want to have people doing this anymore. So I'm going to say delete endpoint, and now it's no longer shared.

Now I'm going to go through the other one, and you're going to see it's very similar, but not exactly the same. So, back to my endpoints. Again, I'm going to go ahead and search, just because you guys won't see it up there. Click on FIONA. Again, this is the one that sits outside the campus firewall. And here, there is no option to view it in the file manager, because it doesn't know what you want to look at first. So even if I wanted to grab my own files from off campus, if I wanted to use this off-campus DTN, I would want to go to collections and create a collection just for myself. So I'm going to add a collection, and I have to sign in again, because now that I'm on a different node, it doesn't realize it was the same one. Let me see if I can stand up without knocking my microphone off this time.
It still knew that I was on single sign-on, so it kicked me over. Base directory: my base directory is going to be slash homes slash Kyle Hudson. And I have to give it a name: Kyle's home directory. And now I can create that collection. There was no way of just getting into my own home directory, so I have to do this myself first. It gives me options. Do I want to share my home directory? No, I don't want to share this with anybody. I want this one to be just for me. Transfer data to or from, there we go. So now I'm back in my file manager like I had before, and I can select a collection over here of my 3099 room display if I want. And again, you can see that I have the same kind of setup as we had before. The big difference is that there is no default home directory on this one, so you actually have to create your own collection for a home directory.

If you're sharing things with an outside company, and I know we have at least one researcher on campus who is doing this, they have a vendor they're sharing lots of files back and forth with. So they're setting something up, not out of their home directory but out of bulk, and they have a folder that is writable by people in that company. Those people can log in and transfer files over for them to process, their genomics files in this particular case.

What's chat got? The little one pops down and says, hey, you have something, and then never mind. Chat: endpoint permission denied. I have never seen that. RB, why don't you email me a screenshot; send it to beocat@cs.ksu.edu. I'm leaving right after this class today, but I'll get to that and take a look at it, because that shouldn't be the case as long as you're on Beocat's FIONA.

There is a nice feature here called groups. I don't have any groups set up right now, but I can create a group, and I can make this people that are in my lab.
So I can create a group of, let's see, here's Beocat staff, and I can make myself the administrator. Fine. Visible to group members only. Creating the group. I'm going to invite others, and I'll put in Dave Turner, cs.edu, and send. And now I have this group. So I'm done.

Now when I go to create my collection here, Beocat, and this is really annoying to me that you have to do this every time, FIONA, I'm going to create a collection in my shared folders. Even though I shared the other one through the DTN, that doesn't mean it's shared through this one; they don't talk to each other. If I create a shared folder on one, the other doesn't necessarily know about that same shared folder. But I can say homes, Kyle Hudson, slash shared, what capitalization did I put on there, shared hyphen files, for groups, and I can create that collection. Now when I go to share this with others, I can say add permissions, and I can now share this with a group, and I have that group of Beocat staff.

If you have people that you're sharing different things with, it can be really convenient to keep them as a group, because as somebody leaves the group you can kick them out, and if you add more people to the group you can let them in, that kind of thing. So that's a convenient way of doing this type of thing. You can also, as you saw in the group, make other people an administrator on that group. So if we had shared files we were sharing among the three of us, I could say, any of the three of us, we trust each other, we can make each other administrators; they can add more people, we can take people off, that kind of thing. So that is Globus in a nutshell.

Yes, Dave, yes we do. As of this week we have it in our documentation, because I finally got everything working this week and a demo made. So thank you for asking.
He says that because there was a stub for a long time that said Kyle's going to put something here, and just this week that was changed. Monday, I think it was, I got that all going, and it doesn't say that anymore. So I surprised him.

So now I'm going to go over here to overview, and I'm going to delete that, because I don't need it anymore. If you're going to set up one for your home directory, you might as well just leave it that way, though. If you're going to set up files just for yourself, it makes no sense to create it, transfer files, delete it, and then create it again. Just leave it once you make one. There are the files that have gone onto here. Groups. Let's see, 3099 room display. I'm done with this, so I can delete this endpoint, and I will no longer be able to transfer files back and forth to this desktop. So even though Globus is actually running here, it's not doing any good. So I'm going to go ahead and quit that.

Questions? Yes. Okay, the question was: he's now using WinSCP and wants to know if it makes sense to move over to Globus. That's largely a function of how big your data sets are. If you're able to click, send those files over, and it takes a reasonable period of time, a couple of minutes or whatever, leave it. It doesn't hurt a thing. If you're transferring large files and you're like, man, this is taking forever, probably go ahead and switch to Globus. There's nothing wrong with either one of those.

What else do we have to talk about here? Let me look at my notes. Managing files and storage, and I'm probably going to share the mic with Dave a little bit on this, and archive options. I don't even have any slides on this, I probably should have, but we have several pools of data on Beocat. That's just telling me everybody's stuff, and I don't need to know that. There we are, at the top.
These are what we have mounted on Beocat right now through Ceph. Ceph is our main file system, and you can see it says 3.2 petabytes and 1.3 petabytes available, use 1.2.0. That's all in terms of raw storage. There are some differences in the way these are handled, and I'm going to tell you about them. The first is the homes volume. When you first log in, your stuff is under homes. Mine's Kyle Hudson; that's my home directory. When I write data there, the back-end file storage... if you guys went on the tour on Tuesday, you noticed there were like 28 servers, machines that are just talking to hard drives. 29 now, sorry; I forgot we added one just a few weeks ago. Those are set up such that any one of those whole machines can go away and everything else keeps running. It's not even just an individual hard drive; one of those whole machines can go away. The way it does that is, on homes, it actually writes that data three times: once to this machine, once to this machine, and once to this machine. (Yes? Yeah, okay, yep, we can do that. Sorry to interrupt, guys. You're fine.) So it makes three copies of your data. In terms of data resilience that makes a lot of sense. In terms of space used, obviously, three copies may be a little overkill, but this is the stuff we want to be extra special sure that nothing goes away from, because this is where most people keep most of their stuff. The nice part about that also is that it's fast: it can read from that really quickly, and you can write to it quickly, so that's the advantage there. The bulk... actually, let's get one more; let's go to scratch.
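As a rough illustration of the space cost of the three-copy scheme described above, here is a minimal shell sketch. The function names are made up for this example, and the 3200 TB figure simply treats the 3.2 PB of raw storage as if all of it were replicated three ways, which is an assumption and not how the cluster is actually divided up.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: whole-terabyte arithmetic for N-way replication.

# Raw disk consumed when every byte of user data is stored as N full copies.
# usage: raw_usage_tb <user_data_tb> <replicas>
raw_usage_tb() {
    echo $(( $1 * $2 ))
}

# User data that fits in a raw pool when everything is stored N times.
# usage: usable_tb <raw_pool_tb> <replicas>
usable_tb() {
    echo $(( $1 / $2 ))
}

homes_raw=$(raw_usage_tb 1 3)      # 1 TB of user data, three copies
pool_usable=$(usable_tb 3200 3)    # assumed: 3200 TB raw, all three-way replicated

echo "1 TB of user data consumes ${homes_raw} TB of raw disk"
echo "3200 TB of raw disk holds about ${pool_usable} TB of user data"
```

The same two functions can be called with a replica count of 2 to see how much raw space a two-copy volume would need.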
If you want to, you can write anything to scratch. (Okay, it's going though, isn't it? It didn't sound like it. Okay.) And scratch: the idea of scratch is to be as fast as possible, so we only have it replicate twice instead of three times. We still have some data resilience, so those small little hiccups won't cause everything to go away, but we only replicate it twice. Limitations: for the homes volume, we limit you to one terabyte of data. That does get backed up. (According to our official policies it still is not being backed up, right? According to our official policies.) And actually, I think we're backing that up to our old cluster over in Nichols, as long as it's under a terabyte. We have no way at this point of saying thou shalt not use more than one terabyte, right? (We do; we have a quota at two terabytes.) Okay. If you're using more than a terabyte, not only will your data not be backed up, but you'll probably be getting emails from people saying, hey, you're using too much, move it somewhere else. The somewhere else at this point is bulk. Bulk actually uses erasure coding. Erasure coding is where it splits the data up into chunks, and it's four and two... is that what it is now? Six and three? Six and two. So it'll take every megabyte of data that you have there and split that into six chunks, write those little pieces out on different disks, then it writes checksum information to another disk, and it writes more checksum information to another disk. So it's actually using about one and a half times: it'll write about one and a half megabytes for every megabyte you use. That's a lot better than three; that's using half as much space. So if you save something on bulk, it actually uses half as much space as when you put it on homes. It's not quite as fast, because of all that processing, and it has to read all those places when it comes
back. It is not quite as fast, but it's very comparable on big files. When we first set Beocat up with this file system, we put everything in that same erasure-coded pool, and what we found is that in our testing it worked great, but then we hit real life. When you first log in, it reads and writes a whole bunch of small little files, and that was making things take forever, and it was slowing everybody down. So that's why I moved everybody off of that and onto the current homes volume, with replication instead of erasure coding. Does erasure coding make sense to people? Am I getting blank looks? It's kind of like RAID systems on a single computer, but it's over multiple computers. So it's some level of resilience, not quite as fast, is the short version. Starting in January, I believe, we are going to start charging for data in bulk. That still doesn't get backed up. Basically, we have people that have over a hundred terabytes out there in bulk, which is fine; we want to have places for people to do that, but we can't fund that ourselves indefinitely. Our file servers, most of them are getting to be about four years old; they're starting to hit that edge where they're getting old and are going to be replaced. We got a grant to buy them to begin with, and that grant's not coming up again, so we're going to have to start charging for data out there. We have no plans to charge for homes, so if you're under a terabyte, you're fine. If you're on bulk, you're fine. On scratch we don't have a limit, but anything that's left over 30 days gets deleted automatically, so don't leave your stuff out there for long if you value your data. And also, if things happen to go haywire, we might reformat it and just start over again. On scratch there are no promises. So that is meant for short-term files. Write your stuff out there, you know, intermediate files
your jobs use in the middle, and then copy your final results over. Delete your stuff off scratch, because it gets deleted anyway. (So basically no guarantees of anything?) Nope, not on those; it just does it. So like I say, don't put things out there that you plan to be out there for a very long time. (Right.) We're actually looking at date stamps, the modification date stamps, and if it's something that you are using, you can use something like a touch command to keep it from being deleted. (You don't even need to do that; we look at the access times. So copy it to somewhere, even if you throw the copy away when you're done; that'll access it.) That is what we have right now for storage. This one's different; this is all for Beocat internal use. So scratch, bulk, and homes are the places that you guys can save data, and everybody has a bulk directory, /bulk/ followed by your eID. I've got one here; you see I have several files out there myself in bulk. So you already have that available, and you can copy stuff out there. There will be a per-terabyte charge starting next year. Not yet, but there will be. I think we're talking $50 a terabyte per year, something like that, so it's not a ridiculous charge. It's basically just enough to pay for our drives, is really all it is. Archiving: we have hardware in place to basically do the same thing for archive storage. That will be a matter of talking to Dan and getting that set up. We're ready to go on that, right? (Pretty much; we just need to set up the JBODs for it.) So we also have a couple of options for archiving that don't rely on that. The first one is free, which I like, but it requires some steps to go through, and that is Google Drive. In order to sign into Google Drive and do this, though, here's the magic: if you want to use more than the default 15 gigs, you have to get your account blessed
by central IT. They don't set up accounts for everybody by default, because they don't want people using the Google services for education. For some reason they're afraid it's going to supplant the Microsoft stuff they have out there. Microsoft has OneDrive, and you can transfer up to a terabyte of data to that, so you can do that from your desktop for up to a terabyte. Bigger than that, it doesn't work so well. So contact us, and we'll tell you how to get ahold of the right people to talk to at central IT. We can't do it for you; we can just tell you the right place to ask. So I'm going to sign in. (What's that?) It's unlimited; there is no limit. And it has to be at ksu.edu; that's how it knows. That's why it's unlimited, because it's an educational account, so it has to be your ksu.edu email. Next. So here's my Google Drive right now. I have a couple of files out there, and there is one folder called archives. We are using this to copy up the files of people who have left the university. So if we look here, this person left the university at some point, and we archived this on 2016-11-29, so in November 2016. We then uploaded it to Google Drive, and you can see I have 19 terabytes used in Google Drive right now. By using that FIONA node, we have a very fast connection to the outside world, and we can upload files. There is a command called rclone that we will help you set up when you get to that point, and if you want to archive stuff to Google Drive, that's probably the best way of doing it in the short term. Long term, the University of Oklahoma is setting up a monstrous research tape archive, and what you'll do with them is you will buy two tapes, actual physical cartridges. LTO-8 now, is that what it is? Whatever version, LTO-9. They're tape cartridges, and you buy two of
them, and you send them to the University of Oklahoma. Actually, I think you'd write a check to them and they buy them for you. Then they will make two backup copies. We don't have this set up yet, but there's going to be a way that we copy our files down to there, and they put backup copies onto tape. They keep one copy there for retrieval purposes, and they send the other copy to you, so if something goes away with their system, that's your assurance that you've still got a copy of your data around. That probably won't be operational until next summer. That's probably the best long-term solution, because the cost of tape is peanuts on a per-terabyte basis, so that'll be a very inexpensive solution, and more or less permanent with the cartridges you have. But it's not ready yet, so that's the downside. Those are the options for archiving data at this point. The nice part about using Google Drive, though, as with everything else on Google Drive: let's say this person contacts us and says, hey, I had this stuff back there, way back once upon a time. I can share it with them over Google Drive, and they have access to it. So that's actually pretty nice, as opposed to any other solutions out there: there's that instantaneous "yeah, let me share that with you". Any questions? He asked if we can connect that through Globus. Globus has a connector to Google Drive, but they want like three thousand dollars a year for it, so we're not doing that. If your research group wants to buy it for us, yes, we'll happily add that onto our Globus subscription. (We've been using rclone to copy up and down.) Oh, yeah, with Google Drive there is a daily transfer limit of 750 gigabytes, which we actually hit when I was uploading all this stuff. So if you're transferring lots and lots of data, you have
to break it up into that size. And it also has a five-terabyte per-file limit, so you can't have any individual files larger than five terabytes. For all of you that have files larger than five terabytes... I don't know if anybody has any files larger than five terabytes that aren't already an archive that you could break apart anyway. So, the files are nice and close to you and very handy. If you can't afford the cost, then the options are: you could do a one-time expenditure, which would be to buy this, to supplement our archival cabinet, or you can go free with Google Drive. It's just a little farther away, which takes a little bit longer. And then if you want to use that data, bring it down to scratch, use it there on a temporary basis, and then when you're done... So there will be other options that you can use. Now, how are we time-wise, since I need to leave? Okay, I need to plug in over there. And for Google Drive with your K-State account, you have to have your account activated. Some of you have sent us an email at the support email, and if that's what you want to do, we'll see that it gets sent to the right place, since we've had to go through that process. And I think... there we go. You know, this wasn't what I intended to do. Okay.
My name is Adam Tiger. I was going to... (You've got your audio turned on, but you've got the mic muted.) Hello. Can everybody hear me all right? (Sounds great.) My name is Adam Tiger, and I'm going to talk to you about git today. Let me pull up my notes here, since I need those. In general, git is version control. It lets you keep track of the changes in something over time. Why do we want to use it? We want to use it to be sure that if you are making changes to your software, you can get back to a workable state, that you know what's changed, and that if
somebody's collaborating with you, you can see what they changed and verify that their changes are good. There are lots of types of version control; git is just the type that we're going to be going over. It seems to be more and more the industry standard. There are things like SVN and CVS, Microsoft has one built into Visual Studio (I have no idea what it is), but, you know, there are lots of different types of version control. As for git's history, git was invented in 2005 by Linus Torvalds. He was using a piece of software called BitKeeper. It was closed source, but they'd given him a license to use it for developing the Linux kernel. It had gotten very cumbersome to use on such a large project as the Linux kernel, and it broke, and when it broke they decided they were going to rewrite it, and Linus ended up writing most of this new version control in like a weekend. That's how frustrated he was with it. All right, so we're going to start by creating a git repository. Let me clear off the screen here so that people can see what's going on. I'm currently on Beocat, and we're going to create a directory. I'm going to put my resume in here: mkdir resume. You can use git for resumes, you can use it for text files, you can use it for all kinds of different things, as well as for source control. I think a resume is a perfectly fine thing to do. So I've just created a folder. There's nothing special about it yet, and to create that initial repository we're going to do a git init. That initialized an empty git repository right here in my source resume directory. I can do a directory listing; there's nothing in there yet, but it shows that it actually did create a .git folder when I initialized it. You can start creating files. Say I start creating my biography. So: vim bio.
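The repository setup just described can be tried safely with a sketch like the following. It works in a throwaway directory from mktemp instead of a real home directory, and the src/resume path is only a guess at the layout used in the demo.

```shell
#!/usr/bin/env bash
# Work in a temporary directory so nothing real is touched.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/resume"     # hypothetical path standing in for the demo directory
cd "$workdir/src/resume"

git init --quiet       # creates the hidden .git directory
ls -a                  # lists . .. and .git; there are no other files yet
git status --short     # prints nothing: no tracked or untracked files so far
```

Deleting the temporary directory afterwards removes the repository completely, because everything git knows lives in that .git folder.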
I'm going to say: name, Adam Tiger; likes: I like computers, I like video games, I like nature. I've now written that file; we can cat bio. Now we've got this repo. Does git know anything about the file? I don't think so. We can check that with a git status, which lets us see the status of the current directory. Okay, it tells me that we're on a branch called master, and we're at the initial commit, so there's nothing there yet. We've got untracked files, and it's in red here: bio is untracked. So we say, okay, we want to track this file; this is a file that might have changes that we want saved and kept track of. It says right here, git add file, so we're going to git add bio. We can check the status again and see what's going on. Well, we've got a new file. We've added it, but we've not told git to actually commit this, to keep track of it, yet. So we're going to do a git commit. Git commit commits all files that have been added; they have to be added for git to actually keep track of the changes in them. And it brought up my editor here with "Please enter the commit message for your changes." What would you want to put in your commit message? In general, in any commit message you want to list the things you did and why you made those changes. At least the first line you want to keep short and sweet. We're going to say we added bio, my name and likes. So that's the commit message; we can save it, and it then tells me that we've added bio: one file changed, three insertions. So it inserted three lines. Now if I come in here and take a look at my bio again, and I decide that, you know what, I've changed, I've gotten older, I'm not sure I like video games anymore, maybe I want those to be board games. I can save that file, and git comes in here; I
say git status, and we've got a modified file now. What did I change again? Git diff shows the difference between what was committed last and what is currently in the working directory. It says that we changed this line: my likes changed from video games to board games. Maybe I still like video games. Do I really like board games? I don't know. I'm going to change my mind on that. I'm going to git checkout bio. That pulls bio back from what was previously checked in; it reset its state back to what it was before. So if we do a git diff, there are no differences, and if we actually look at bio again, it shows that I still like video games. So a checkout allows you to reset your state on individual files back to what was previously committed. What if I've come back to a repository that I haven't looked at in a while and I forgot what was going on? I'd want to see the log, what's happened. This repository is fairly small, so the log's not particularly interesting. It tells me that I added my bio, name and likes, that the author is me, and that it was created not too long ago. So it's a relatively short log. But if I go to another repository real quick that I happen to use, my Slurm repository, and take a look at the log there, the log is thousands and thousands of lines long, each entry with a commit ID, an author, the date it was committed, and the actual messages they put in there so you can figure out what they were trying to do. We'll go back to the resume. Let's go ahead and make those changes again to my bio. I think I'm going to add board games to my likes, alongside video games, so I'll add that to the end. And dislikes: angry bees. I dislike angry bees. So we can take a look and see what we changed again: git diff. Well, I'm ready to commit this, so let's do that: git commit. But wait, there's nothing ready to commit.
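The add, commit, diff, and checkout steps walked through so far can be sketched end to end as below. The file contents, the commit message, and the placeholder identity are illustrative assumptions; the identity is set only inside the temporary repository so the commit works on a machine with no git configuration.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir"
git init --quiet
git config user.name "Example User"         # placeholder identity for this sketch
git config user.email "user@example.com"

# Create the file, stage it, and commit it.
printf 'Name: Adam\nLikes: computers, video games, nature\n' > bio
git add bio
git commit --quiet -m "Added bio: my name and likes"

# Change video games to board games in the working copy.
sed 's/video games/board games/' bio > bio.tmp && mv bio.tmp bio
git status --short                  # " M bio": modified but not staged
git --no-pager diff                 # shows the Likes line changing

# Change of heart: restore bio from the last commit.
git checkout -- bio
git diff --quiet && echo "no differences"
git --no-pager log --oneline        # one commit so far
```

The `--` in `git checkout -- bio` just makes explicit that bio is a file name and not a branch; the plain `git checkout bio` used in the talk does the same thing here.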
There's a file modified, but we forgot to add it. Every time you make a change to a file that you actually want committed, you have to tell git that you want it to keep track of these new changes. So we do a git add again, and now we can commit. It says, all right, we've modified bio, and what did I do? Added board games, added dislikes. Okay. So if we check the log again, it shows that we've got two commit messages now: one with our initial commit and one with our new one. If we do a git diff or a git status, there's nothing to commit; everything's clean. But I forgot what I did. I know I made changes here. What actually changed between these two commits? We can do a diff between HEAD and HEAD minus one. That basically says: where you're at now, what was the difference between now and one revision back? So that is a git diff HEAD~1. HEAD is a magical keyword; that's the current revision of the git repository you're in. The minus one, this tilde one, says one revision back. You could do things like tilde two. There's more you can do, but I can't remember the syntax for that right now. The other thing you can do: with your git log you've got your commit messages here, and you can do a git diff with the commit ID, and it will show you the differences between now and that commit. So we made these changes: we added board games and angry bees. There are a few schools of thought on commit messages. In general, what I try to do when I make commits is make a commit that is self-contained. So really, this last commit I made, adding board games and dislikes, is not really self-contained. It's changing two different things; either one of those could be
a change in and of itself. One of the things that you can do with git is commit individual files: if you git add only those files, you commit only those files. The other thing that git can do is add individual lines to the staging area. The staging area, for those that don't know: git has a concept of what is there now, what was there before, and things that are going to be committed, and the things that are going to be committed are in the staging area. My editor here, I've got plugins that let me do this easily. I know there are also plugins for it for Visual Studio Code, and GitHub has a GUI to do some of this stuff. Let me edit my file again. I'm going to remove board games, because I don't really like them, and I'm going to add another dislike, stormy seas. So I've made those changes. In my editor I can just do a Gdiff, and that shows me the diff between the staging area and what's currently there. And what I can do is just say diff obtain, and I can save that, and now if we take a look, I've just changed one thing in the staging area. If we do a git diff, we still have the stormy seas change, but we're not ready to commit that yet. And we've got both a modified bio that's ready to be committed and a modified bio that's not ready to be committed. Doing it on the command line, staging individual lines, is kind of a pain in the butt. I know you can do it, but I haven't done that in a long time. (What editor do you usually use?) Okay. So it is personal preference. If you want to be able to revert changes, or if you're working with somebody else and you've got gigantic changes, and every time you're changing five different things, trying to figure out what broke
between different revisions is kind of a pain if you're changing everything in every commit. So in general I say try and keep them smaller, but you can just add all the files and say, yeah, this was a roll-up of everything. (You mentioned multiple files. If you're doing the same thing to all the files, then doing them in one commit is a good idea, right?) Right. If you're changing one function name, but you're changing it in six files, that's very appropriate to do as one commit with one message on what you're changing. Okay, so I decided that, you know what, I'm not ready to make those changes that I just made. So I'm going to git reset HEAD to unstage the changes that I made to bio, and I can git diff and it's showing me all the changes, and I'm going to git checkout bio to completely reset back to whatever state I was at. So now I'm just back to liking computers, video games, nature, and board games, and disliking bees. Now, with git you can do things like have remotes. This repository that we just created is on Beocat, and it's just on Beocat. If I want to collaborate with somebody, that's kind of a pain; they'd have to have access to my git repository here, and I wouldn't really want that. So a lot of times what you do is set up a remote. We have no remotes here, but I'm going to go ahead and switch over to the browser, and we're going to go to gitlab.beocat.ksu.edu. Let me stop the share and reshare. GitLab. So I'm logged into GitLab now. Everybody who has a Beocat account has access to GitLab. If you don't have a Beocat account, you can just create an account on GitLab. GitLab and GitHub are basically just centralized places to store your repositories. The benefits of one or the other: on GitLab you can have unlimited private repositories, because it's all on Beocat. The downside is that it's all associated with Beocat, and that
means that if you want it in a more centralized location, or you want to share with a lot of outside collaborators or whatever, you might want that on GitHub. GitHub has other functionality as well, but GitLab works well for everything we want to do. So I've just said create a new project on GitLab, and I'm going to name my project resume, because that was what we were working on before. I don't really want my resume to be public right now, so I'll set the visibility level to private. I'm not going to initialize the repository yet; no need for a README. I'm just going to create that project. And just like on GitHub, it tells you, hey, the project's repository is empty; what are you going to do? You can create a new repository and push it up, you can push an existing folder, or you can push an existing git repository, and that's what we're actually going to do, because we just made that git resume repository. We're going to add an origin for GitLab, and we're going to push that up. Let me re-share. So now we can come in here and say git remote add origin, ssh, git at Beocat, you know, everything that GitLab told you to use, and we add that remote. We can do a git remote -v to show that it created that, and we can go ahead and do something like a git push. Oh wait, no we can't, because we haven't told git that its upstream branch is master as well. We do that, we enter our password or our keys, whatever we need to do, and it pushes up to GitLab. In the browser here we can refresh, and it tells me, hey, we've got a repository. We can check our history like we were doing before: it tells you that I added bio, and then I added board games to my likes, and dislikes. And you can take a look at the diffs of the history and that kind of stuff by clicking on the actual commit message. GitLab even has a rudimentary file editor.
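The remote setup and first push can be rehearsed without a GitLab account using a sketch like this one. A bare repository on local disk stands in for the GitLab project, which is an assumption made purely so the example is self-contained; with the real server, the URL would be the SSH address that the project page shows. The identity and file contents are placeholders.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Stand-in for the empty GitLab project: a bare repository on local disk.
git init --quiet --bare "$workdir/remote.git"

# A small local repository with one commit to push.
git init --quiet "$workdir/resume"
cd "$workdir/resume"
git config user.name "Example User"          # placeholder identity for this sketch
git config user.email "user@example.com"
echo "Name: Adam" > bio
git add bio
git commit --quiet -m "Added bio"

# Register the remote under the conventional name origin and list it.
git remote add origin "$workdir/remote.git"
git remote -v

# The first push has to name the upstream branch; later pushes can be a plain git push.
branch=$(git symbolic-ref --short HEAD)      # master or main, depending on configuration
git push --quiet --set-upstream origin "$branch"
```

Looking up the branch name instead of hard-coding master keeps the sketch working on installations where the default branch has a different name.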
We're going to edit this, and I can edit it right here. I decided that, you know what, in my bio I should really have an address in there. I'm going to put in my address, and we're going to call that 2221 B Engineering Hall, Manhattan, Kansas 66506. And for my commit message right here, I'm going to say "added my address", and let's save that. So now if we take a look at the repository and the file, it says we've got my address in there. And, you know what, we've got a remote now, and I needed to go home, and I was working on this bio at home too, and I forgot what I put in there. So I'm now working at home. I'm going to edit my bio as well, since we're both working on it, and I'm going to say my address is 12344 Fake Street. I'm going to git add that, and I'm going to git commit that and put in my message, "added address". Now we have this repository that's pushed up to GitLab with one version of the file on there, and there's a different version of the file on my computer at home here. They've both been changed. That's not really a good thing. We can do a git pull; let's try to pull the changes from GitLab, from the centralized repository. There's been a conflict. Well, what happened here? Let's take a look. We've got the bio file; let's edit it and see what happened. Oh no, they both got changed, and they both got pulled in. It tells you right here at HEAD, that's what's in your current, local repository, and this other one over here, with these other arrows after the equals signs, is what was on the remote end. Well, which do we keep? In general, you'd go through these and check, and make sure that you only keep the ones that actually make sense. And now that I've made those changes, I can do a git status. It's showing that things have changed. I can do a git diff, and it shows me that, all right, now we've made these changes. We're ready to... we've made
the changes, so we've fixed up what came from upstream, and we're going to commit that. We have to add it; even I forget to do this stuff sometimes. Okay, and now I'm going to save that commit message, if I can actually type. It shows that we merged in a branch. We can push those changes back upstream. Nothing should really change up there. And that's about all I've got for that. You can create branches within git to keep your feature work separate, so you keep a clean master where everything works. You can add things like README files (I think I'm not sharing my browser right now). You can create README files; you just type those in right here, and you'd add those to the repository. You can create contribution guidelines, licenses. Licenses are always a good thing to keep in your git repository, because if you don't have your code licensed, nobody in their right mind will touch it. People still might touch it, but if you pick the wrong license, people might be able to take your ideas, not credit you, and sell them. In general, if you're making things public, you should make sure that you have a good license set. I would use something like the GPL or a BSD license, or Creative Commons. If you're not a lawyer, don't write your own license, because lawyers know how to tear them apart. So that's about all I've got there for git. Any questions? (Oh yeah, we're making a recording of it.) Yep. I'm sorry. All right, since that's all we've got for git, I'm going to go ahead and let Dave take over. (Oh, that was it?) Okay, that was it. I guess we're done. Thank you, everyone. Yep, thanks, everybody. If you do have more questions about git, there are a lot of tutorials out there. Absolutely. And I know I went fairly quickly through a lot of these things, but we can certainly talk more about it. The question is, where can we
find the recorded sessions from Tuesday as well as today? Recorded sessions will be available on our website as soon as we get them edited and put up on YouTube. They'll be under our training videos section. Kyle should have an email for everyone who participated, so we could have him send out a link to that as well. Thanks for coming, guys. Okay, we have open support sessions almost every Wednesday. On our website, on the support page, we've got a Google calendar down there, and it basically tells you when the support session is (they're Wednesdays at 1:30) and what room in the Union we're in. This next Wednesday we're going to be in Union 203, and that's on our support website. We keep track of these. We usually will delete the sessions that we are not going to be able to attend. We try to keep at least one of us there. I know in a couple of weeks we've got one that we're going to have to delete, because we're all going to be on a trip, so people do different things, but we all try to get there, and we usually have at least one of us there. I'm going to... all right, I just removed the one for September 25th, because we're not going to be there. So the license that you've got, is it for a license server, or is it not... it's not a lot of people