name and not by face. I'm Kyle Hudson. I'm one of the new sysadmins here. I started last October. Back there: I'm Tiger. I've been here since 2008. Over here: I'm Dan Andresen. I'm the director for the center, so I oversee these two, and I've been here since '97.
So I will point out I'll be taking some pictures for future marketing purposes and to keep my masters at the National Science Foundation happy. So if you're in a witness protection program or otherwise don't want your picture ever appearing online, please let me know or, you know, hide your face or something, and I won't take your picture. Otherwise I'll assume that if you're here and eating my food, you're willing to have it taken and used for marketing purposes.
Incidentally, we have some iced tea. There's a water fountain right across the hallway. Bathrooms are here and in the main atrium. There's some strawberries, carrots, berries, cookies, that sort of thing. We'll be having breaks on the hour, about 10 minutes or so. And I'm sure you're about to say this, but in the event that we're going over something you know, or at least think you know, or find boring, feel free to check out mentally, surf the web, answer email. We understand. Or even physically. We're going to be okay. There's not a test over this. It's all what you want to get out of it.
And the material will be available online afterwards. So if you go home and say, man, Kyle was awesome, the videos will be available as well as the slides that I prepared. I sent out a link to everybody that was registered. Kind of gets to a no to connect. That should be available shortly after this is finished. The video is off this little webcam down here, so it may be a little spotty. It depends on where I happen to be standing at the time. Adam is also going to be filming it for future use, so hopefully we get a little better quality out of that one.
And also, if you have questions, feel free, you know, raise your hand, blurt out stuff. I am very good at teaching people one on one. I've never taught a class this large before. So feel free. I'm really comfortable with the interaction kind of thing. So if I lost you somewhere along the way, please just get my attention somehow. Blurt something out. This is all interactive. We're all good friends here.
So I'm kind of curious about the makeup here, since I don't recognize a lot of you, which means you aren't computer science. One of the major groups here, I know Nora is in statistics. So, statistics. Excellent. Biological sciences. And big data, that sort of thing. Chemistry. Chemistry in the back, a solid block. You have strong bonds. Okay. Excellent. Mathematics. Anybody from math? Okay. Got at least one here. Excellent. Who am I missing? Physics. Got a physicist. Excellent. Anybody from, like, mechanical engineering or civil engineering, anything along that line? Got at least one? Great.
Well, I hope it's worthwhile. For Kyle, certainly, this is the first time to try a teaching session like this. Be gentle, but feel free to send us email afterwards saying, you know, here are some ways you really could have improved. Or if it helped, then let us know too. We can take it. These guys are participants. They're used to getting very angry emails as well as occasional happy emails. They can handle it. And, you know, it also gives us an excuse to go and ask for more money. So occasionally saying, hey, you need more resources, is not a bad thing. So I think we're good.
All right. Well, we're going to start off with the tools of the trade, the things that we need to interact with Beocat. To begin with, everything that we use is through a session called SSH. That's just Secure Shell.
It's a way of remotely connecting in from wherever you happen to be on the globe back over to our system. The most common one that you see on Windows is called PuTTY. You notice I put the new safe ones. PuTTY is still not the most friendly thing you'll ever find, but it's as close as you'll find, because SSH doesn't use mouse interactions. It's all strictly text based. It's, you know, a command line interface. So PuTTY is the main one we use on Windows.
And I will show you a PuTTY session here. I'm just going to log in with ours. I have not even logged in with PuTTY on this machine. I just downloaded it a couple of minutes ago. So when I go to log in, I'm going to go to beocat.cis.ksu.edu, if you can read this. I'll have this in bigger letters on the screen later. And it's an SSH session down here. Everything else should be defaults. The first time you log in, it'll ask you if you're really sure you want to do this, because it keeps track of the keys it uses for a secure session, to make sure nobody's impersonating us. Of course, if this is the first time you've seen this, you probably have no idea whether it's actually correct or not. But just, you know, click OK, like you do with end user license agreements. Yes. It'll probably be OK.
You have a very nice tutorial online to do this. Yes. Thanks to Adam, who recorded it. It's on YouTube. We're practically famous. Tens of people have viewed our videos.
So can we copy the slides? Yeah. As a matter of fact, I actually have this whole PowerPoint available on Beocat. So once you log in and we go through some of the examples later, you'll see how to copy things over. They'll come from the same spot. You'll just be able to copy it over using the same tools that I'm getting ready to show you here.
OK. So this is what I first look at when I log in. All it has is my name and the host I'm on.
Your prompt may vary a little bit depending on how you have things set up. Mine's set up with a percent sign that's saying, I'm ready for you. Tell me what to do. Really exciting stuff.
When I was a student working here, I actually had somebody log into a Unix system for the first time. Most of the time at that point, if you said you needed a Unix account, you knew what you wanted to do. And they said, I need a Unix account. So I set them up with the Unix account. They come back about two or three minutes later and say, it didn't work. What do you mean it didn't work? It just says "unix%". Yes, that's what it does. That's all you get. It's a joke about Unix. It's not a user-friendly operating system. It's a user-hostile OS. Unfortunately, that's what we're stuck with. However, it does have some really good advantages for what we have to do.
The other thing you're going to need to use, as part of the SSH protocol that PuTTY is using, is called PSFTP, or, excuse me, SFTP, I'm thinking of the PuTTY terms, or SCP. That's secure copy and secure FTP. FTP is File Transfer Protocol, for those of you who've been on and around the internet forever, like me. Sorry. I get caught up in the jargon just like everybody else. We're big into TLAs, you know, the three-letter acronyms.
Probably the easiest one to use on just about any platform, whether you have a Linux desktop or an OS X desktop or Windows, is called FileZilla. WinSCP is another one for Windows that's also fairly easy to use, but I'm going to use FileZilla. You can see all I did up here was put in Beocat for the host, except I put sftp:// at the beginning. That tells it I'm going to use the SFTP protocol. I put in my username and password, and that gets me over. So now, when you see what we've got here, we've got all my Beocat files on this side, and my local disk on this side.
So for instance, I just copied my PowerPoint to the desktop, which I had saved out there. If I click on Desktop, this will show me all the files I have on my desktop here locally. Right now, this tells me I'm looking in my home directory, and then the Beocat intro folder that I saved everything in.
Yes. Any reason to use FileZilla versus WinSCP? Not really. I just find it to be very easy to use. So that's the only reason why I recommend it. We tend to recommend it just because it's cross-platform. It's available everywhere. We take the same pictures. It works the same everywhere. So that's the only reason.
When you get to the point where you're using Unix or Linux most of the time, like I am, I don't even deal with the GUI. I use the command line tools. Those are available on OS X and on Linux also. And even on Windows, PuTTY has PSCP, which is the same thing with a P at the beginning, and it works just like SCP. So those are all available and easy to use. OpenSSH is built into Mac and Linux, so if you don't mind using a command line interface instead of a GUI, that's already built in. Windows has several different ones: PuTTY, or OpenSSH from Cygwin. Cygwin is a series of Unix tools that's available on Windows. I looked online, and there are a hundred and some odd open source ones available. SCP or SFTP, same thing. Like I said, they're all available through the OpenSSH tools, and that's how we transfer files into and out of Beocat: with SCP or SFTP. We interact with the system using SSH.
So now we're going to go into how to use Linux. How many of you guys are familiar with Linux? Dan, thank you. I'm glad you're familiar with Linux. So this will be one of those parts that may bore you to tears if you've done a lot of this before. We're just going to go through some of the Linux command line tools here, some of the things you'll most likely see. And I have this, as you'll be able to see.
If you go to the Beocat support page, there is a link at the top that says Linux basics, and that will take you to this URL. If you go to beocat.cis.ksu.edu, it'll redirect you to the support pages. Linux basics is right there near the front. So this is what we have on the page here. We're just going to go over some of the terms that we use quite a bit and some of the basic commands. I forgot my binoculars today. There you go. I'm surprised that I remembered that. Okay.
In Windows, in DOS land, they used to have what's called a directory. Now they've started calling them folders. The Unix guys never got over that. We still call them directories. So anytime you hear us talk about changing directories or working directories, that type of thing, that's just the folder that you're in.
The shell is what you're interacting with all the time. Whenever I first log in to Beocat, I log in to what's called ZSH, or the Z shell. The campus default is called TCSH. The, what is it? C, something C shell. It's an improved C shell. Yeah. It's just something you'll see. Bash is probably the other common one, the Bourne-again shell. That is a way of interacting with the system. That's what gives you that percent sign up here. It's what gives you your name at the beginning. It lets you define your environment variables, that type of thing, which we're going to go over here in a bit too.
SSH, secure shell. We just talked about that. SCP, secure copy. We just talked about that.
The path. That's a list of directories that get searched when you type the name of a program. So for instance, if I do set, and I'm going to use some magic here. Thank you so much for that. This is telling you which directories it's going to look in for programs. You'll notice that there are all these things that are local to our system. I've added some more stuff of my own. I added my own binaries folder in there.
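As a rough illustration of the path lookup just described, here is a minimal Python sketch of what a shell does with that list of directories. The function name `find_on_path` is hypothetical, invented for this example, and the sketch is simplified: real shells also cache results and treat an empty entry as the current directory, which is skipped here.

```python
import os


def find_on_path(program, path_value):
    """Return the first executable named `program` found in the
    directories listed in `path_value`, or None if there is none."""
    # PATH is one string; directories are separated by ':' on Linux.
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # simplification: ignore empty entries
        candidate = os.path.join(directory, program)
        # A match must be a regular file that we are allowed to execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Directories that are not on the list are never searched.
    return None


if __name__ == "__main__":
    # Look up `ls` using the PATH of the current session.
    print(find_on_path("ls", os.environ.get("PATH", "")))
```

Python's standard library offers `shutil.which` for the same job; the loop is spelled out here only to show that the search is limited to the listed directories, in order.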
We have stuff in there for SGE, which is our scheduling system. But the one thing that confuses a lot of people is that the current directory is usually not in there. So, unlike Windows, if you see a directory listing and there's an executable right there and you try to run it on Linux, a lot of times it'll say it can't find that, because you didn't tell it to run it from there. You told it to look in all these different places, but you never told it to run it from where you are right now, which is a little different if you're used to a Windows kind of system.
Ownership and permissions. These are directly related. Usually the time you'll want to refer back to this is if you're copying from somebody else's stuff. I put a lot of my stuff out there. In fact, I think I have everything of mine open for everybody to use. But you might copy it over and then you don't have permission to run it. So you need to change the permissions and ownership of it and make yourself the owner. I would say just go to this page if you need to get to that point. I'm not going to go over it now. It's a little arcane. I'll bore everybody. I don't want to hear anybody snoozing. I mean, if you do, you do, but I'm not going to try to get there.
Switches are what we call the dash options. We'll use these a lot with Beocat. When you submit a job, you say, for instance, -l and then your runtime, or -l and then how many CPUs you want. That's what we call a command line switch. So if you hear us refer to that, that's what we're talking about.
And finally, pipes and redirects. Again, I'm not going to go real deep into this, but they're really pretty handy when you go to interact with a Linux system. A pipe allows you to take the output of one program and send it to another. As a matter of fact, I just did it a little bit ago.
If you weren't paying attention, you wouldn't know what I did. I got a list of all the environment variables, and then I said, send that to a program called less, which lets me look through it up and down.
Some common commands here. Again, I'm going to let you refer back to this page more than anything else. pwd, print working directory, lets you know where you are, the path. ls for listing files and folders, cd for changing directories, cp for copy, mv for move. Back in the early days of Unix, space was at a premium, so they abbreviated everything to two and three characters. So the old commands are all really short like this. rm for remove.
One other quick thing here, which I'm going to be using when I do examples. If we edit files, you guys are probably going to use one called Nano if you've ever logged into Beocat. It's an easy to use text editor, probably very similar to Notepad on Windows. I don't use that. I use one called vi. vi has a very steep initial learning curve, but if you're going to be spending any time on Linux, I highly recommend that you use it. You can do a whole lot of things a whole lot faster once you get used to it. But Nano is the one we'll have you guys use most often. It has a list at the bottom that says push this key, push Control-X to save and exit, or something like that. So it'll take care of all that for you, and it's fairly easy to use because of that.
So, on to supercomputers as a whole. What defines a supercomputer? Actually, it's not so much size, but that's kind of necessary for what we're doing. CPUs, a lot of CPUs. Yes. What else defines a supercomputer? How fast it runs? Yes. Generally, as a rule of thumb, a supercomputer is about what you'll see in a regular server in about 10 years, give or take. There is no one hard and fast definition of what a supercomputer is.
What we do is we take a whole bunch of computers. They are servers like you can go and buy from Dell. As a matter of fact, several of ours are from Dell, the most recent ones we bought. And we hook them up in a specific way so that they all work together as one unit. Now, in our case, we have a scheduling system where we partition things off: you get so long, and you get so long, and you get this amount, and you get that amount. We have systems that take care of all that scheduling and all that type of thing.
What types of problems are solved by supercomputers? Nora, what kind of problems are solved by good supercomputers? I'm going to pick on you a lot since I know you.
I can tell you what I use them for. I have, for example, very large simulation studies where I'm looking at many, many different scenarios and many different replicates. Each of them turns out to be a job or a job in an array. If I were to run them on my computer, I would probably be waiting two years or so for the whole thing to run. Here I get it done in maybe a week, a couple of weeks. If anybody has been trying to submit things for the past two weeks, that would be me. So that's what I use it for. And they are usually pretty long jobs, jobs that would render my computer unusable for days at a time.
Yes, exactly. You pretty much hit everything that we talk about, which is good. This is kind of what I'm looking for. We're looking at things that are, first of all, large in size. Some of our genomics people have data sets that are a terabyte. How many of you have a terabyte of RAM on your computer at home? We have a few of those. Not at home, that's what I'm saying. Your department has one? Yes. But you certainly had to pool funds to make that happen, right? Because those don't come cheap, I can promise you.
Fast speed. Like I said, you can run several hundred cores, several hundred CPUs, working on a particular problem. In Nora's case, she's doing priority.
So she's changing one little thing here and then having to run something for a week, then changing one little thing and having to run it for a week. Like I said, if you're running those one after the other, that would take forever. Being able to say, hey, we can use this computer and this computer and this computer and this computer, and have them all run simultaneously, gives her a big advantage.
Reliability. How many of you can say your computer stays up for a month at a time? Reliably. Mine sometimes does. If I was relying on it, I don't think I would trust it. I would definitely want to put a UPS on it, that type of thing. Of course, if they were trying to use Beocat over Christmas or so, they'd be arguing the same thing again. Yes, that's true. We did have some issues there. We upgraded the power in our server room and things didn't go as well as planned. At times, we had some downtime that we were not expecting, and some that we were expecting but that still frustrated people. For those of you that affected, sorry.
Things we've used Beocat for: genome analysis, particle physics simulations, ecological forecasting, physical analysis, those types of things.
As far as how big things scale, we've got somewhere north of 2,000 cores, and more coming soon. 2,000 makes us not a real big one. We're the biggest in the state of Kansas. Our total RAM is pretty good compared to other supercomputers around college campuses. We're certainly not near the top 10 by any stretch. The biggest supercomputer, and I looked at the statistics on it, is Titan at Oak Ridge National Laboratory. I told you we have about 2,000 cores. They have 500,000 cores. We have about 12 terabytes of RAM. They have 700, and they run at over 27 petaflops. A petaflop is a quadrillion operations per second, so they scale much bigger than what we have here.
However, that being said, we're still, like I said, the biggest in the state, and it's not just peanuts, what we've got going on in there either.
So I'll put in a word here. If your needs are scaling beyond Beocat, I'm also the XSEDE Campus Champion. The XSEDE resources are the supercomputer centers that are nationwide and national scale, and part of my job is to help people get on and use them. Now, to use Beocat, you send us an email and I say, sure, you've got an account, have fun. To use them, you have to make a proposal and say, I need X hundred, that's really... However, I do have a small hunk of compute time, though it would be large on Beocat, that's available to me as the campus champion for KSU. So if you want to get on and try out some of the bigger machines, send me an email. I'm dan@ksu.edu, and we can start working on that sort of thing as well. So as you scale past Beocat, feel free to talk to me and we can work on getting you on some of the national-scale supercomputers as well. Because K-State isn't going to be coming through with $10 million for us anytime soon to get one of those on our systems.
Very good. With supercomputers, we tend to talk a lot about parallelism. So what is parallelism? A pain in the butt. What's that? A pain in the butt. It can be a pain in the butt. Hopefully we try to take the pain-in-the-butt part out of it for you, for the most part. What does that mean when we say parallelism? Trying to apply multiple resources at the same time to tackle the problem. Exactly, multiple resources at the same time.
Now, I'll put this up here. Parallel programming is hard. There's one thing that people don't understand about supercomputers, about running things in parallel. That is, no system can make your programs magically run in parallel. There are too many difficult problems that arise when you just try to do that. We have people come in and say, yes, I have got this program.
It runs great for me. I can run 80 times faster if I run it on one of the Mages that we have here, which has 80 cores on it. And they'll request 80 cores, and they'll come in here, and their job will run, and it will use one. And because they reserved it for 80, nobody else can use that system. That's not a good thing. Yeah, I have it on here. Some problems are harder than others to run in parallel, and this is where we're going to get into a whole lot more stuff.
I'm going to give some examples here. If you have a set of n variables, a sub one, two, three, up through n, and I say I want this to compute b sub n to be four times a sub n. How easy is that to run in parallel? Very easy. You probably wouldn't have to, but you can set one core to do number one, another core to do two, another core to do three, and whenever the first one's done, it can do number 127. That becomes a very easy problem to solve in parallel.
What about the next one down here? b sub n is 11 times a sub n squared (that was supposed to be a square) times e to the a sub n, plus a sub n to the 17th. How easy is that to parallelize? It's also very easy to parallelize. It's a much more difficult mathematical problem, but you're still only doing one thing. This guy can be working on one, because it's all the same n down here. The next guy can work on two, same kind of thing. It's a bigger mathematical problem, but it's not any harder to parallelize.
What about this one? I initialize b sub zero to be zero, and I say b sub n is a sub n plus b sub n minus one. That should be n minus one. That should have been a subscript. Sorry. How hard is that to parallelize? Very hard. Because you have to know the previous value here before you can compute the next value of n. It doesn't matter how much parallelism I throw at that problem. I'm not going to get it going any faster by running it in parallel. That's one of those things. That's typically what we see here.
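The three slide examples can be sketched in Python as follows. This is an illustration under stated assumptions, not the code from the slides: the function names are hypothetical, the second formula is a reconstruction of what was read aloud, and a thread pool stands in for "one core per element" only to show that the elements are independent.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def scale(x):
    # First example: b[n] = 4 * a[n]. Uses only a[n].
    return 4 * x


def heavier(x):
    # Second example, reconstructed from the spoken formula (an assumption):
    # b[n] = 11 * a[n]**2 * e**a[n] + a[n]**17. More arithmetic, still only a[n].
    return 11 * x**2 * math.exp(x) + x**17


def run_independent(func, values, workers=4):
    # Any element can go to any worker, because no element
    # needs another element's result. map() keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, values))


def run_recurrence(values):
    # Third example: b[0] = 0, b[n] = a[n] + b[n-1].
    # Written this way, step n cannot start until step n-1 has finished.
    b = [0]
    for n in range(1, len(values)):
        b.append(values[n] + b[n - 1])
    return b


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    print(run_independent(scale, a))  # [4, 8, 12, 16]
    print(run_recurrence(a))          # [0, 2, 5, 9]
```

Threads are used only to keep the sketch self-contained. In standard CPython they will not speed up arithmetic like this; on a cluster the same split would typically be done with separate processes or separate jobs.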
Typical usage we see on our systems is that some part can be really easily parallelized. We have a little piece here that iterates over an array, or some piece that we can figure out really fast, but then we'll need that entire result before it does the next part. So you will parallelize part of it, and then it'll have to have some serial part. Then you parallelize a big part of it, and then a serial part. That's fairly typical usage, not just on ours, but on any of the supercomputer systems you'll see.
There are lots of ways of breaking that down. You can do it within your program. As you're doing that little serial part, you can have 16 cores reserved that you're using all the time, and use one for a little while, and then go back to using all 16 again. That's fine. We don't want that little part, that serial part, to be running for days, because that means nobody else can use the other 15 cores that you're not using. You can also do this kind of manually. You can break your own program into steps. Say, hey, here's a parallel part and I'll work on this.
The other thing that we see a lot of, and this is my big plug for Beocat. The reason why people invest in Beocat is because typical usage is that you work on a problem, and you'll have real heavy compute needs for maybe two, three weeks, maybe a month. Then you sit back and you analyze your results for six months to a year. It'll take you that long before you're doing that again. By using a centralized resource like Beocat, you're not consuming your own resource. You're not sitting there having machines running for no purpose, not doing anything for long periods of time.
In the next part, we're actually getting into some programming stuff, so actually I went kind of fast over that last bit. Any questions, comments, snide remarks? Yes? Are you adding any GPUs to Beocat anytime soon? We're not adding any.
We have 16 GPU-enabled systems right now, and I don't know if anybody is using them as GPU systems. Show me a use case and I'd be delighted. I've got a friend that works at Intel on the Intel Phi program, and he said, look, I'll get you some on a sample basis if you can prove you'll use them, and I just kind of go, I can't at the moment. So yeah, definitely something we're looking at. A lot of the biggest systems, I think about 50% of the top 20 systems, have GPU accelerators or Intel Phi accelerators at the moment, and if you've got code that runs on them, they're awesome. Otherwise, they just burn power and slow things down.
Yeah, this is before I got here, obviously, but my understanding is that before we got the GPU systems, there were a lot of people that were interested in doing this, and then we got them in, and they're like, yeah, it doesn't fit as well as we thought it was going to. Is that fairly accurate?
Yeah, well, the general problem is that most of the people on campus don't write their own code. They use somebody else's package, and most of the packages that people use on campus either don't support GPUs, or the parts that we use on campus don't use GPUs, even though the package as a whole says, hey, we support CUDA, and we said awesome. Like Jianhan Chen, who's over at Biochemistry. He uses NAMD, and NAMD is an awesome package. Parts of it really make use of GPUs. The part he uses doesn't. And so it's just, oh man. So it's an issue.
Do you have any questions? If you want to know more about this, I'm putting the plug in here, because I stole a lot of stuff from OU. They have a series out there called Supercomputing in Plain English. That's the link for it. Again, I'll show you how to get to this after a bit here. It's on my slide, I should say.
And they had it so it was all streaming, and then he redid the course this spring, and he doesn't have the live streams up yet, so all you have is the PowerPoints. But he's spent a lot of time getting things so that people can understand supercomputing. I'd say Supercomputing in Plain English was a really good title for what he has there.
Our own support pages are at www.beocat.cis.ksu.edu. Yes, that's a lot of dots in there. It's the way it works. And of course, you can email us at beocat@cis.ksu.edu. We have people do that all the time. That's completely fine. That's what we're there for.
So if there aren't any more questions, I guess we're done for a little bit here. Take a break. I think we have some more people coming in about two o'clock or so. Some people already knew they didn't need this part. So what's coming up at two? At two, we are going to talk about parallel programming, and we're going to go through several parallel programming examples. So feel free to get some iced tea, get some food, and come back here at two. Or don't, if you feel like, hey, I don't need it. Then we'll go from there. So thanks for coming out so far.