Part two. This is a part where you may want to jump in and play along if you've got computers with you. If not, we'll walk through it together. How many people here are C programmers? Fortran? I'm really sorry; I go back that far myself. MATLAB? Anyway, the programming examples here are mainly going to be in C. If you aren't programming in C, or at least Fortran, this may not be the section for you, fair warning, though you might still get some insight into how things work under the covers. I took Fortran in, I think, spring of 1990, if I remember correctly. I remember saying that somebody would have to pay me a lot of money before I'd write Fortran again. They're paying me fairly well now, so I'll learn it if I need to, but I haven't had to up to this point. So all the examples we're going to have here are in C. Like I said, this is definitely a part where, if you get bored and need to go take a break, I will not at all be offended. I certainly understand.

Question from the audience: could you put the copy command up, so people can copy this stuff into their home directories to look at and play with? Absolutely. All of these files can be copied from my directory. I can't type into this window, it's just a viewer, so I'll write the command on the board. You're going to cp, tilde, kylehudson, slash, info, slash, then a space (I'm going to put a little triangle there instead of a space) and a dot. If you want to copy everything instead of one file, just use an asterisk instead of the file name; that'll copy everything over to your home directory.

So this first one is called fork_example. I already asked; there are only a few C people here. How many of you have programmed with fork before? Anybody? Okay. Don't give them any spoilers, because there are some spoilers ahead. (Did you say info or intro? Intro. Oh, dang it, I wrote it down wrong. Thank you. So that's intro, not info, in the path.)

Okay, this is how we did parallel programming in the old style, where you kind of get started with forks. And the reason I'm starting there is that it gives you some insight into how things work even with the newer styles. These are really kind of tiny on screen; let me see if I can make this bigger... there, that should be big enough that we can see everything now. We've got some standard includes here; don't worry about those. I made a variable here called var_glb, for "global variable". (Is that a hand for me? You just didn't catch the name of the file? Oh! fork_example. Okay, thanks. Dot c.)
And I have my fork examples here: fork_example, fork_example2, fork_example3. So this is the first one. We start with my main program. I create one of the built-in types, a pid_t. What's a PID? Anybody? No? Process ID. When you look at the system, if you do a process list, that's the number listed off to the side, the process number the system keeps track of. Like I said, if you really want to check it out later, go ahead; I'm not going to walk through all of that here. I have a local variable, and I initialize it to zero. Then I do what's called a fork. A fork takes a process and makes an exact copy of it as a parallel process. So now I'm using two cores, or two CPUs, depending on how your machine is set up. And they both have the same exact information: they both have this global variable, and they both have this local variable initialized to zero.

A mirror process? It makes a copy. And that's technically not quite correct, but for all intents and purposes you can just pretend it takes everything in memory and makes another copy of it. Okay? The kernel uses some optimizations (copy-on-write) to make it not quite as resource-intensive as that sounds, so you don't have to wait for a full copy to happen.

So, this next part checks whether the fork was successful. We now have two processes: what's called the parent, the one that started it, and the child process. Here we're going to check whether we're in the child process. If we are, we increment the local variable, so it should now be one, right? We increment the global variable, so it should now be one also. And then we just output what we have: "child process", the local variable, and the global variable. If we're not in the child process, we set our local variable to 10 and increment our global variable by two. The rest of it is all just making sure that everything worked right.

So what do we expect to see here? We have two lines, child process and parent process; it actually prints twice, one line from each process. We expect var_local as 1 from the child, var_local as 10 from the parent, the global incremented by one in the child and by two in the parent. Right, so let's do this real quick: gcc fork_example.c, and we run it. And here's what we got. It just so happens the parent finished first this time. This is not deterministic: sometimes you'll see the parent process first, sometimes the child. I ran this several times; it's not necessarily one or the other. In the parent process, the local variable is 10, as we expected, and we incremented the global by two, so we got 2. In the child, the local variable increased by one, as expected, so we got 1. But the global variable is also 1. It didn't pick up the parent's increment, because the fork created a copy of the global variable too. Even though I wanted it to be global across both processes, it isn't.
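For reference, here is a minimal sketch of what that fork_example.c plausibly looks like, reconstructed from the walkthrough above. The names var_glb and var_local come from the talk; the exact print format and error handling are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* fork() */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid() */

int var_glb = 0;  /* the "global" variable; each process gets its own copy after fork */

int main(void)
{
    int var_local = 0;
    pid_t pid = fork();  /* duplicate this process */

    if (pid < 0) {
        perror("fork failed");
        return EXIT_FAILURE;
    }

    if (pid == 0) {            /* child process */
        var_local++;           /* 0 -> 1 */
        var_glb++;             /* child's copy: 0 -> 1 */
        printf("Child process:  var_local = %d, var_glb = %d\n", var_local, var_glb);
    } else {                   /* parent process */
        var_local = 10;
        var_glb += 2;          /* parent's copy: 0 -> 2 */
        printf("Parent process: var_local = %d, var_glb = %d\n", var_local, var_glb);
        waitpid(pid, NULL, 0); /* don't exit before the child finishes */
    }
    return 0;
}
```

The child prints var_glb = 1 and the parent prints var_glb = 2; neither sees the other's increment, which is exactly the surprise described above.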
That makes it tricky, because when you parallelize a program, you're usually not worried about just one value; you're trying to change several different variables along the way. So how are we going to correct that? Let's try again. Those of you who know C: how do we deal with things on a more global basis? How do we address global state? We use pointers; that's the way we tend to do things. We're still changing the variable, but we say: change what this pointer points to. (Those of you who aren't C people, I'm sorry, I really am.) I just made a copy of the same program and made some changes. I made a second global variable, except I made it a pointer. So we have both global variable one and global variable two. I initialize them both. Again, I fork the process, and I increment the one by one and the other by two. Now, what do you think happens? Any guesses? It doesn't work. This is not intuitive at all, and it's really frustrating for people doing it for the first time, including myself. It's been many years since the first time I did it, but even getting back into it after a while, these are the things you forget. When fork makes a copy of the process, it makes a copy of your virtual address space too. So even though you're saying "change what's at location 2,472,713", when you dereference that memory location, the child has its own virtual copy of that location as well. It makes it really frustrating.

So how the heck do we do this? As I was putting this together, the pointer version was actually my first attempt at demonstrating shared state, and it's wrong. I kept it because I figured that if I made that mistake, I'm probably not the only one who will make it. So here is the right way to do it. There's one more header we have to include, for memory mapping, and it involves some magic. And again, that's the reason I put it out there for you to copy: this mmap function takes some bizarre parameters. You have to tell it how big the region is, what permissions you want on it, and all that kind of stuff. But after I've done that, I can set my global variable to zero through the pointer, increment it, and do exactly what I was doing before. Everything down here did not change a bit; I changed what my global pointer refers to, and the code still uses pointers the same way. And now, when I compile it and run it, you notice it finally works: one process increments by one, the other by two, and both see the result.

In all of these runs, the parent process happened to finish first, but that's not always the case. Let me run it again... parent finished first again. It will eventually go the other way. The thing is, you can't tell which it will be, especially once you get to multiple processes.
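Here's a sketch of that working approach, roughly what fork_example3.c does. The mmap flags shown are the standard recipe for an anonymous shared mapping on Linux; the variable name and the increments are carried over from the talk, while the waitpid ordering is an assumption added to make the output deterministic.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>   /* mmap(): the "memory map" header mentioned above */
#include <sys/wait.h>

int main(void)
{
    /* Ask the kernel for memory that stays shared across fork().
       MAP_SHARED:    writes are visible to both processes
       MAP_ANONYMOUS: not backed by a file, just zeroed memory */
    int *var_glb = mmap(NULL, sizeof(int),
                        PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (var_glb == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }
    *var_glb = 0;

    pid_t pid = fork();
    if (pid == 0) {              /* child */
        *var_glb += 1;           /* this write IS visible to the parent */
        printf("Child:  *var_glb = %d\n", *var_glb);
    } else {                     /* parent */
        waitpid(pid, NULL, 0);   /* wait, so we're sure to see the child's update */
        *var_glb += 2;
        printf("Parent: *var_glb = %d\n", *var_glb);  /* prints 3 */
    }
    return 0;
}
```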
Now, what do you have to do if you want to run on three different processors? Create two children? Yes, you have to fork it a second time. Or a third time, or a fourth, or however many you want. It's an incredibly tedious process. So each fork only creates one new copy? Each fork makes a copy of the current process. So if you have three forks in a row, fork, fork, fork: the first one gives you two processes, then both of them hit the second fork and you have four, and all four hit the third and you have eight. The technical term for overdoing this is a fork bomb. If you spin up too many of those, that's a good way to either kill the machine or get your job killed, depending on how things are configured. Hopefully we've configured it to kill your job.

When I was first playing with forks, back as a college student in the 1990s, I ended up with about half a page worth of code that, thanks to some syntax errors, really evaluated to while (1) fork();. While one is always true, so it would fork and fork and fork and, yeah, I had some sysadmins who were not happy with me. Fortunately, I was on a machine nobody cared that much about, so they could just reboot it. But they were not happy with me.

Along came some really smart people who said: all this working with forks is a pain in the rear; we need a better way. So they came up with a system called OpenMP. Now, I stole more examples. I didn't even adapt these; those last ones I adapted some, these I just took. OpenMP requires a compiler flag, so when we compile these, we use gcc -fopenmp. It gives us some different ways of doing things, and it makes programmers' lives much, much easier. What did I call the example? omp_hello.c. So: we've got another header we have to include, the OpenMP header. We have our main program. We've got a couple of integers, the number of threads and a thread ID number, that we're working with. OpenMP uses its own syntax: if your compiler isn't using OpenMP, it sees the directive as a comment and ignores it entirely. The syntax is #pragma omp parallel, and you tell it which variables you want private and which you want shared. Private: within each thread, the thread count and the thread ID are its own. Then we just ask for the thread number and print "hello world from thread" and whatever it is. Then, master thread only, which is thread ID zero: instead of getting a process ID and having to compare it like we did with the fork, the thread IDs just count up from zero. So if the thread ID is zero, it prints the number of threads. And I think that's all there is in this one. The parallel region is delimited by these brackets: we told it, this is the part we're doing in parallel. Then, when it gets done with that, it joins everything back together, and that makes life a whole lot easier, because you don't have to do any cleanup work or anything like that. It does all that for you.

So now we don't have processes, we have threads. And where did the parallelization occur? What's the difference? Actually, most people use the terms interchangeably. Technically, and I've got to get this right: a process is always on a separate CPU; a thread can be different. Did I get it wrong? (From the audience: I teach operating systems. Yes, I know; I don't. You tell them the difference.) Okay. If you call fork, it gives you two processes. The key thing about processes is separate memory spaces. It means that if A is a global variable and you fork, you have two A's sitting there, and they can't access each other's memory. They have to pass messages back and forth, using MPI or something like it, if they want to communicate. Multiple threads are in the same memory space: you've got one global variable called A sitting up there, and any thread can use it. So the advantage of multi-threading, like OpenMP, is that forking off a thread is a relatively fast operation, while forking off a process is relatively slow (and yes, we overload the word "fork" just to confuse things). It tends to be lower overhead, and you don't have to worry about it as much.
Because you only have one copy of A sitting there, it becomes simpler for a programmer to think about: I update A, I update A. I don't have 18 copies of it, all named A, like I would with multiple processes. The problem, the biggest problem, is scalability. If you're using threads, you can't go across multiple machines. If you're using processes, like MPI does, then you can go across multiple machines, and therefore get greater scalability and ultimately greater performance, at the cost of it being more of a pain in the butt to program. (What he said. I should audit your class.) And if you want a much more in-depth explanation, take my CIS 450 computer architecture course; we talk about it a lot.

If you're asking where the parallelization happened, it was on this line. We told it: do this in parallel. Now, one thing I will tell you, and I think I have this on a slide: make sure you set a thread limit if you go to use OpenMP on Beocat. For these examples it's fine, because they run for seconds, but OpenMP will use as many cores as it can get hold of. So if you run this on a machine and you think you're only using two or three cores, you can make some administrators mad if you do that in real life. At least briefly, before they take it out on your process. Yes, this is true. The other thing that can happen, and this has bitten some of my programmers: because this automatically uses all the cores available, if they're doing benchmarking, trying to see how the code performs on two cores, four cores, eight cores, and the code automatically always uses all eight cores, their graph looks an awful lot like a line rather than a curve. (I guess, technically, a line is a curve; well, okay, you get the idea.) So that is something to keep an eye on: how many threads am I actually using, versus how many do I think I'm using? Let me find the slide... there it is. Again, I'd suggest you grab my PowerPoint by copying it with that same command. The call is omp_set_num_threads; that tells it not to use any more than that many cores on a machine. We don't have it in these examples because, like I said, they run so fast that they're done by the time you even notice.

So, we're going to gcc -fopenmp omp_hello.c, and run it. Here's what it did. How many of these hellos? Eight of them, because we're running on the head node and it has eight cores. Each one gave me a "hello world" from one of the threads. Now, what can we say about the order of that? Whichever one happened to finish first prints first. It's completely non-deterministic: I'm running the exact same program several times, and it's giving me a different order of output each time. Every one of those children comes out at a different spot. So which one is the parent and which is the child there? Look up here: the master thread is number zero. So all of them did this first part; each said "hello world from thread" with its number, and you'll see there was a zero in there. But then, after it was done, thread zero printed how many threads there were. Is there something that says you have to use thread zero for that? Just convention: thread zero is designated the boss. You can use whatever you want to, but people will look at your code and ask why you didn't use zero.
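Here's roughly what that omp_hello.c looks like; this is the classic OpenMP hello-world pattern from the LLNL tutorials the talk borrows from. The commented-out omp_set_num_threads() line is an addition, included to show the thread-limiting call just mentioned.

```c
#include <stdio.h>
#include <omp.h>   /* the OpenMP header */

int main(void)
{
    int nthreads, tid;

    /* Optional cap on the thread count, e.g. for honest benchmarking.
       (Setting the OMP_NUM_THREADS environment variable does the same.) */
    /* omp_set_num_threads(4); */

    /* Fork a team of threads; each gets private copies of nthreads and tid */
    #pragma omp parallel private(nthreads, tid)
    {
        tid = omp_get_thread_num();            /* this thread's ID, 0..N-1 */
        printf("Hello world from thread %d\n", tid);

        if (tid == 0) {                        /* master thread only */
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    }  /* threads join back together here; no cleanup code needed */

    return 0;
}
```

Compile with gcc -fopenmp omp_hello.c. Without the -fopenmp flag, the pragma is treated as a comment and you get an ordinary single-threaded program.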
It's much easier if you keep it that way. It's like writing a for loop: use i and j and k rather than something fancy. Another OpenMP example here; let's do some more cool stuff. We've created some variables here: a chunk size, and n. n is the size of a list of numbers that we've got, 100. So in the initialization, up through n, we set a and b to i times 1.0; that's basically giving you floating point, so we have values 0.0 on up through 99.0. Now, we've told it that we're sharing the arrays between threads. This is what makes OpenMP really nice to work with: you say, I'm sharing this data, and I'm making this private to the thread. My i, the variable that I'm using for my loop on the inside here, is going to be separate inside each one of these threads. That lets me program without worrying about what's going on in the outside world, and these arrays are shared with the outside world. Once again, if we're the master thread, we ask how many threads there are and print it. And then there's the schedule with the chunk. Hold on, I've got the wrong example up. For schedules: oh, we're saying we're going to grab this many iterations at a time; that's what this does. (You can come on in; there's a seat over here if you want to sit down, and I see a couple of seats over there.) We're computing a third array, c, but we're doing it a chunk at a time; that's what the schedule clause does. If you saw it, the chunk was 10. (A sketch of this work-sharing example appears below, after the sections walkthrough.)

So now, when we run this one: first of all, we had thread three start, then thread eight. Here, finally, the master thread said, oh yeah, I'm here too. I told you: whatever order it happens to be in, it happens to be in. Number three does its work. Then number six starts, then number seven starts; number seven does its work, and while number seven is doing its work, the other ones are starting; one is doing its work. But you notice each one does 10 iterations at a time, because that's the chunk size we told it to use: grab this much at a time. It can achieve some efficiencies by doing it that way. But notice also this one is working on 38 and 39, this one is working on 9 and 10, thread four is working in the 40s. Whatever it happened to grab next, down to the end. You'd think the 90s would be last, but not necessarily, because it's just whichever thread happened to grab them. Again, if I run it again, this time we end up with 27, 28, 29 at the end. If we run it again, wow, 26, 27, 29, I'm surprised. There we are, 46 through 49. Again, it's completely non-deterministic, running the same program.

One more OpenMP example, and then we'll move on. What's the most common thing that people do, and try to do, in parallel? Close: matrices are close, and there are directives already built in for those. The most common thing you see is for loops, and that's what this one does. We do some initializations. Instead of making them just 1 through 100 like we did before, we take a as i times 1.5, and b as i plus 22-and-change, so we just have some different numbers out there. And we initialize c and d to 0. Then we say which parts we're sharing and which parts are private, we get the number of threads, and then: sections. Let me get my screen here. Notice these are nested: this part's on the outside, this part's on the inside.
This section it does, and this section it does. And down here is where we have our for loop, which is what runs through the work. (There are OpenMP directives for matrices also.) This scheduling keeps you from having the situation where, say, one thread gets done really fast and sits there waiting for things to do: the thread that got done fast can pick up the slack once it's finished. That's what this all does. This one happens to be a for loop, but it does the same thing for the others. And I gave the link up here; there are more examples over there, at Lawrence Livermore National Laboratory.

Question: based on what you said before about the difference between processes and threads, would for loops be better suited to threads, using the OpenMP stuff? Typically, if someone wants to jump from a single-threaded program, which is the way we usually write programs, to taking advantage of multiple threads and therefore multiple cores, OpenMP is usually a really good place to start. It's about the most gentle introduction. And since usually 90% of the time a program spends is going to be spent in some for loop, that's what this is aiming at. It gives you, hopefully, 80% of the total benefit of parallelizing your code for 20% of the work; the last 20% of the performance is going to cost you the other 80% of the time, if you want to go there. But if you want to say, hey, let's try this out and work with it, this can be a really easy way to do it. Actually, the easiest way is to ignore this completely and make use of ScaLAPACK or BLAS or something else that can take advantage of parallelism automatically, if you happen to have the right library that will do it for you. But if you don't, then this is a really good way to start.

So we run this one again, with the for loop. You notice threads like 6 and 3 near the bottom, and 4 and 1 and 2: they were all just getting started by the time all the work had already been done. Those were the slow threads; the others had already finished ahead of time. So that's the advantage of using the built-in scheduling instead of trying to roll your own, saying: I'm going to break this up, and you do this part, and you do this part, and you do this part. Let it figure all that out for you. And like I say, if this is what interests you, that link at the top is where I got all these examples. There are more there, including how to use matrices; they have built-in directives for matrices.
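Here's a sketch of the chunked work-sharing example described above, modeled on the LLNL omp_workshare examples this part of the talk follows. The dynamic schedule is an inference from the "whatever it happened to grab next" behavior in the output; the array size and chunk size come from the talk.

```c
#include <stdio.h>
#include <omp.h>

#define N     100
#define CHUNK  10

int main(void)
{
    float a[N], b[N], c[N];
    int i;

    for (i = 0; i < N; i++)
        a[i] = b[i] = i * 1.0f;   /* 0.0, 1.0, 2.0, ... */

    #pragma omp parallel shared(a, b, c) private(i)
    {
        int tid = omp_get_thread_num();
        printf("Thread %d starting...\n", tid);

        /* Each thread grabs CHUNK iterations at a time until the loop is done */
        #pragma omp for schedule(dynamic, CHUNK)
        for (i = 0; i < N; i++) {
            c[i] = a[i] + b[i];
            printf("Thread %d: c[%d] = %f\n", tid, i, c[i]);
        }
    }
    return 0;
}
```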
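And here is a sketch of the nested sections example, based on LLNL's omp_workshare2.c, which the walkthrough above appears to follow; the initialization constants are a best guess from the talk.

```c
#include <stdio.h>
#include <omp.h>

#define N 50

int main(void)
{
    float a[N], b[N], c[N], d[N];
    int i;

    for (i = 0; i < N; i++) {
        a[i] = i * 1.5f;     /* "i times 1.5" */
        b[i] = i + 22.35f;   /* "i plus 22-and-change" */
        c[i] = d[i] = 0.0f;
    }

    #pragma omp parallel shared(a, b, c, d) private(i)
    {
        /* The sections construct is nested inside the parallel region;
           each section is handed to one of the threads in the team */
        #pragma omp sections nowait
        {
            #pragma omp section          /* one thread does the sums... */
            for (i = 0; i < N; i++)
                c[i] = a[i] + b[i];

            #pragma omp section          /* ...while another does the products */
            for (i = 0; i < N; i++)
                d[i] = a[i] * b[i];
        }
    }
    return 0;
}
```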
So now we want to scale beyond one machine, and for that we're going to use something called MPI. This is just the Wikipedia definition: a standardized, portable message-passing system. It can be used on one machine or on multiple machines, and that's how the people who really make the best use of Beocat do it, the ones using lots of nodes. It's what we were just talking about back there: MPI spreading work across many machines, getting lots of work done at one time, all of them talking back and forth. Now, once again, I stole examples, because they're really good at this. This is Henry Neeman's material from OU; I gave you the link before, and I'll put it up again at the end.

Imagine you're on an island, in a little hut. In the hut is a desk. On the desk is a phone, a pencil, a calculator, a piece of paper with instructions, and a piece of paper with numbers. The instructions are what to do. They say things like: add the number in slot 27 to the number in slot 239, and put the result in slot 71. If the number in slot 71 is equal to the number in slot 118, then call this phone number and leave a voicemail containing the number in slot 692. Otherwise, call your voicemail box, collect a voicemail, and put that number in slot 715. And then we've got data down here, a whole bunch of numbers. Now, if you're in that situation, what do you know about what you're doing? How much of what's going on do you know? Very, very little. Exactly. You know what you need to do. You have some instructions. You have some data. You don't have any idea what else is going on. There are two different kinds of instructions here. You have arithmetic and logical instructions, like the adding and the comparing. And you have communications: you're going to call this number and leave a voicemail; you're going to call your own voicemail and listen for a message; and you're going to change your data accordingly. If you're in a hut on an island, you aren't specifically aware of anybody else out there. That's the way MPI works. You don't know what else is going on. Your job is really focused; you're doing the one thing. And especially, you don't know whether anybody else is working on the same problem or, for that matter, a different problem, and you don't know who's at the other end of the phone line. You're just using voicemails to communicate back and forth. All you know is what to do with the voicemails you get and which numbers you send voicemails to.

He uses people's names in these slides, and I have no idea where he comes up with them; like I told you, I just pulled these slides in as-is. Now, suppose that Horst, somebody else, is on another island with the same kind of equipment. Suppose he has the same list of instructions as you but a different set of numbers. Just like you, he doesn't know if there's anybody else working on the problem. Then there are two more people, Bruce and Dee. Each of them has the exact same list of instructions and a different list of numbers. You could all be talking to each other; you could all be talking to the same central authority; you may not be; you don't know. You might all be working together on the same problem. That's the situation OpenMP, excuse me, MPI, is trying to solve. But your data are private to you. You have no way of sharing data other than leaving voicemails, one at a time.

Just like on a phone call, there are two costs: the connection cost and the per-minute cost. And this leads to this YouTube video, which you may or may not have seen. I probably don't have any volume on here; I didn't even think about that. [An old 10-10-220 commercial plays: all calls up to 20 minutes, 99 cents; talk longer and it's just 10 cents for each extra minute; no fees, no contracts.] So, 10-10-220: how do those guys make any money if they're giving you phone calls at 20 minutes per buck? How do they make any money? What's the average price of a phone call these days? Zero, for those of you using Skype. Okay, what's the average cost of a long-distance phone call otherwise?
On cell phones it's effectively free; otherwise, call it about 10 cents a minute. They're giving you 20 minutes per buck. So how are they making that work? How do they make any money off that? Say they pay wholesale, 7 cents a minute instead of 10. If you actually talk for the full 20 minutes, a dollar divided by 20 minutes is 5 cents a minute coming in, so they'd be losing money. But do you call for the full 20 minutes? What's the average length of a phone call? About 3 minutes. So they're getting a dollar from you for a 3-minute call: that's about 33 cents a minute, not 5. What they're really charging is a connection charge, dressed up to sound like a per-minute rate. You've got a connection charge, which is the fixed cost of connecting your phone to someone else's, even if you're only connected for a second, and then roughly a cent per minute of talking once you're connected. So if the connection charge is large, you want to make as few calls as possible.

Now, they did some benchmarking there at OU. Let's see if they have the slide in this deck... nope, they don't have it in here. They said it's basically the equivalent, in real terms, of a phone system with a $150 connection charge and then a penny a minute once you're connected. The setup fee is huge. And that's what we see with MPI programs: making that initial connection is hugely expensive. We see this a lot, because we have people making MPI calls who aren't really doing anything with them. They're making a couple of calculations and sending things back. And if you're going to do that, the time it took to do the communication is longer than it would have taken to just do the work yourself anyway. So if we're going to use MPI, we need to make sure we're doing something that will scale to the level where MPI is effective. And there are a lot of programs we run on Beocat that fall squarely into that category. We were just talking about GPUs back there, and it's actually very similar: a lot of GPU programming has the same profile. It takes a long time to copy your data into the GPU and then do some work on it, so the more copying in and out you're doing, the longer it takes.

So the gist of it: any time you're working with multiple processes, or using MPI, essentially as soon as you carve your program into separate machines or separate address spaces, initiating communication is expensive. You want to minimize the number of phone calls you make. But once you make one: blab. Use that call. Talk for a long time. Rather than sending ten 10-word messages, send one 100-word message, and you're probably going to get much better overall performance.
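To put that "few calls, long calls" rule in symbols (notation mine, not from the talk): model the time to send n bytes as t(n) = α + n/β, where α is the per-message latency (the connection charge) and β is the bandwidth. Sending k separate messages totaling n bytes costs k·α + n/β, while one batched message costs α + n/β. The bandwidth term is identical, so batching saves (k − 1)·α, and with cluster-interconnect latencies (the $150-connection-charge analogy above), that α term is what dominates.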
So, the advantages of using MPI. One of them we saw in the definition, if anybody bothered to read the whole thing: different programming languages. Because it's an open standard, some people use it from Fortran, some from C; there are all sorts of implementations. Because you're only dealing with a limited set of instructions and a limited number of kinds of connections, those programs can talk among themselves fairly easily. Interaction among different machines: I was just talking to somebody during the break who's been using 64 different machines on Beocat for one of these things. He said it actually ran faster than he expected compared to running on a single machine, which surprises me, but I'll take it. It's also really good for data collection. You have one central node that you're using to collect data from several different places. That central node can talk to each of them; you don't necessarily need the other nodes talking to each other. You can, there's nothing wrong with it, but it works really well for that pattern. It's something very similar to what we use for World Community Grid: whenever we don't have jobs running on Beocat, we join World Community Grid, which is based out of New York. They say, work on these calculations for a while, and they give us a bunch of data and a bunch of programs to run. We send the results back to them, and they're solving big, huge problems that we don't have to deal with here. We only have that running whenever we're sitting idle; no need to have machines sitting there doing nothing. And scaling: we can scale to several hundred CPUs this way, even on our system, all talking to each other. The biggest individual node we have on Beocat is 80 cores; with MPI we can scale well beyond that.

Disadvantages: the cost of getting started. We do try to minimize that; we run InfiniBand between our bigger nodes, which is very low latency. It doesn't eliminate the startup cost, but it helps. Again, it's not efficient for small amounts of data. And it's complex to code and a lot harder to debug. Because you've potentially got your program running on ten different machines, and one of them decided to start up slowly or something like that, and you get an error message that says, hi, one of your ten jobs failed; or maybe you don't get any error message at all, and your job isn't shutting down, and you're left trying to sort it out. In general: a single-threaded program is, as you all know, a pain in the butt to debug, but it's a reasonably routine task if you're used to it. You go multi-threaded with OpenMP, it's still a pain in the butt, somewhat more so. You go MPI, where you have multiple individual programs all running, and it just multiplies the ways things can go wrong. Where you need it for scalability or for performance, like Kyle was talking about with the advantages, it's awesome. You need to integrate some Fortran libraries with some C programs and run them all in distributed fashion? It's great for that. But on the other hand, you have to decide whether it's really worth it or not, and if it isn't, don't. If your program will run in a couple of hours anyway, don't mess with it. Or even a couple of days, maybe. But if you have something that's going to be running for months: heck yeah.

I'm not going to go through it in detail, because we're getting short on time, but I do have one MPI example, and you can copy it from my directory the same way as before. I stole it from Colorado. You know what they say: if you steal from one person, it's plagiarism; if you steal from lots of people, it's research. So I'm stealing from lots of places. There is one gotcha: you have to compile it with mpicc rather than plain gcc.
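For flavor, here's a minimal MPI program in the spirit of that example; this is the standard send/receive pattern rather than the actual Colorado code, and the file name below is illustrative.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);               /* the expensive "connection" setup */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* which island am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many of us are there? */

    if (rank != 0) {
        /* Workers: do some stand-in work, then leave one "voicemail" for rank 0 */
        int result = rank * rank;
        MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        /* Rank 0: collect a voicemail from every other process */
        int i, result;
        printf("Running with %d processes\n", size);
        for (i = 1; i < size; i++) {
            MPI_Recv(&result, 1, MPI_INT, i, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Got %d from rank %d\n", result, i);
        }
    }

    MPI_Finalize();
    return 0;
}
```

Compile with mpicc mpi_example.c and launch with something like mpirun -np 4 ./a.out.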
(Wait, is that OpenMP or Open MPI? Yes, you're right, it is Open MPI. Sorry.) And that's the trap: they had to go and give things very similar names that do completely different things. Well, actually similar things, but not exactly the same. OpenMP, again, is threads within one machine. MPI is message passing, and one popular implementation of it is called Open MPI. So we have OpenMP and Open MPI, which do different things, and the names will catch you off guard if you're not paying attention.

I'll interject, following on that. If you're a beginner, cover your ears and go blah-blah-blah, because you don't want to listen to this. If you're advanced, or thinking about it: you can actually combine the programming models. You can have a program that uses MPI to go across multiple machines and OpenMP within each machine. This is something that, for instance, if you're coding for the national supercomputer centers, they'll say: oh, we like you, because you're getting the most efficiency out of the machine. On the other hand, if you're just saying, look, I just want my program to run somewhat faster: run, run fast and far away from this, because it introduces whole new ways for your programs to explode.

I'll interject here too that there are lots of toolkits we have that are already written to take advantage of as much parallelism as they can. Some of these use MPI. Some of them use OpenMP. Some are still using forks. It doesn't matter, and we don't care, because it works and it's tested. So that's why we encourage people to use toolkits; there's no need to reinvent the wheel. You can also download your own and run it out of your own home directory. There are lots of bioinformatics tools. There's a user on Beocat called bioinfo, and there are a lot of tools in that user's home directory that are fine to use: TopHat, Cufflinks, BWA, and I forget what else. (Is there a list of the bioinformatics tools somewhere? It's in the bioinfo user's home directory; there's a folder called bioinfo_software.) So, like I said, if somebody's already written it, don't go to the hassle of rewriting it yourself. It's not worth the effort. If it doesn't fit your problem, that's one thing, but if it does, use it. That's why they're there.

Here's the link for the OU Supercomputing in Plain English materials, if you want them. I only saw one person fall asleep, so I think we're all right. Get up, move around. Next time we'll be on actual use of Beocat itself. I need more drinks.