Well, hello everyone, and welcome to Meet the Experts. We are at the National Center for Atmospheric Research. I'm Tim Barnes, one of the science education specialists here. Every month we like to take you behind the scenes to meet the people who do the work of exploring the Earth, and today we're going to find out about storing science on and for supercomputers. But before we continue, I'd like to cover a couple of housekeeping items. If you have any technical trouble along the way and can still get to your chat, just put your request for assistance in the chat. Summer Watson, who's at the NCAR Wyoming Supercomputing Center, is right on there and will help with any technical questions you might have along the way. We also have live transcripts: if you look for the captioning option at the bottom of your screen, you should be able to turn on captions if you need them.

I'm ready to get us started, so I'll introduce our guests today, Jeanette and Joey, who are systems engineers at the NCAR Wyoming Supercomputing Center. We should probably start with this: what exactly is a systems engineer? Joey, what's a systems engineer?

The easiest answer is someone who manages the system, where a system can range from a single computer, to a large cluster like our Cheyenne supercomputer, to something like the storage system you'll see today. Basically, we're involved in the purchasing, the installation, the configuration, and the day-to-day running of it: keeping it running, dealing with failures and maintenance, and then helping our user community make the best use of it. So it's a jack-of-all-trades type of thing.

That's amazing. And we just saw some pictures. Was there a supercomputer in that picture? Is that what I saw?

I believe so. Yeah, this is Cheyenne, our main supercomputer. It has 4,032 systems all connected together with a high-speed network.
And this is the main consumer of the storage system that you're about to see.

Oh, where's the keyboard? I don't see a keyboard or a screen on that. How do your users use it?

It's all done remotely. You can log into it from your laptop, open up a little terminal, and, yeah, just like that, you can type, run your jobs, and look at your data.

Wow, that's pretty amazing. And the people who are using this supercomputer: who are they? Just scientists? What do they do?

We have our NCAR scientists, and we have lots of students, from graduate students to postdocs, and they do a lot of research. Our main research is with our climate and weather models, and the students have a variety of projects that they run.

Wow. And does this operate 24 hours a day, or are there times when you turn the computer on and off? Or is it always running?

For the most part, it runs 24/7, outside of a few issues here and there. So you can log in in the middle of the night and do your work.

Oh, wow. That sounds like it makes your job pretty important. Where's Jeanette right now? Oh, I see her there. Everyone, this is Jeanette, one of our systems engineers.

Yeah, I'm here at our data center at the NCAR Wyoming Supercomputing Center. This is where we house our supercomputers and the storage system that we'll talk about today.

So this is where all the big computers are housed. Can you see Cheyenne from where you are?

Yeah, we'll take you over there and show you. It's right over here. Not over here; Cheyenne is right there.

And it's connected to the Glade system. How does that happen?

We have some high-speed networking. It's pretty fancy, and it's used to connect the supercomputers with everything else in here, including the storage system.

I see something yellow in the ceiling.
Is that what's connecting everything?

Yeah. All these yellow trays have networking in them. That's how we connect things together, through cables that run in these trays.

And how is this different from the computer storage that we use at our school or library? Do you want to take that, Joey?

Sure. For the most part, at the base level, it's not too different. It's made up of hard drives like you would have in a desktop computer. In this case, it's made up of about 17,000 of them.

He's talking about the storage system here, right? Glade in particular, not the supercomputer. Do you want to talk about Glade?

Sure. These rectangles you see here are disk enclosures. Each one holds between 84 and 90 disks, and each of the green lights on the front shows the status of a single drive. These systems have 10 disk enclosures, and at the very top is a pair of controllers. The controllers manage the health of the system: they watch the disks for errors, and they serve the data to the supercomputer. The controllers group the disks together, and we build up a few layers of ever-larger virtual disks out of these groups. That's what gets presented to the computer.

So the very top part controls everything we see down below?

Yep. You can see the four racks here each have a controller at the top, and it's the same setup in each one.

And each one of those green lights represents one of the disks inside, is that right?

Yeah. Do you have an example of a disk there, Joey?

Yeah, this is just a typical hard drive. The ones in our systems are slightly different.
They're more enterprise-grade disks, and they have different connectors, but we'll take a closer look in a moment. For the most part, they're the same as what you would see in a desktop computer.

So let's go over and open one up. We have a drawer over here on our test system that we can open up and look inside. I'm going to step over there. Glade is what we call the big system, and we have these smaller test systems so we can perform software updates and things like that without affecting the big system, basically to see if they work. This is just a big drawer. It's pretty heavy. We can open up its lid here and look inside. It's one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, so twelve by five, which means there are 60 drives right here. Glade's enclosures hold more drives, but this is a smaller test drawer. I'll take one of these out. It's just like the one Joey showed you, the same kind of drive you'd have in a desktop at your school, but we have a lot of them.

That looks like an awful lot of information. How fast can it send all of this out?

A single drive can run at maybe 180 megabytes a second, but with 17,000 of them put together, Glade can run at about 220 gigabytes per second. A megabyte is a million bytes, so a single hard drive can do 180 million bytes a second. Because Glade has all these drives working in parallel, it can run at 220 gigabytes a second, where a gigabyte is a billion, or a thousand million, bytes. So that's over a thousand times faster than a single hard drive. For some perspective, that amount of bandwidth is roughly equivalent to about 2,000 households' worth of residential internet speed.

I wanted to make a quick note about that.
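The comparisons in the tour can be checked with a little arithmetic. This is a minimal sketch that uses only the rounded figures quoted in the conversation (180 MB/s per drive, 220 GB/s for Glade, about 2,000 households) and assumes decimal units, 1 MB = 10^6 bytes and 1 GB = 10^9 bytes. The function names are hypothetical, chosen just for this illustration.

```python
# Rough bandwidth arithmetic for the figures quoted in the tour.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 GB = 10**9 bytes).
SINGLE_DRIVE_BYTES_PER_S = 180 * 10**6  # ~180 megabytes per second for one drive
GLADE_BYTES_PER_S = 220 * 10**9         # ~220 gigabytes per second for all of Glade
HOUSEHOLDS = 2_000                      # the "about 2,000 households" comparison


def speedup(system_rate: float, drive_rate: float) -> float:
    """How many times faster the whole system is than a single drive."""
    return system_rate / drive_rate


def per_household_megabits(system_rate: float, households: int) -> float:
    """Share of the system's bandwidth per household, in megabits per second."""
    bits_per_s = system_rate * 8  # 8 bits per byte
    return bits_per_s / 10**6 / households


if __name__ == "__main__":
    ratio = speedup(GLADE_BYTES_PER_S, SINGLE_DRIVE_BYTES_PER_S)
    share = per_household_megabits(GLADE_BYTES_PER_S, HOUSEHOLDS)
    print(f"Glade vs. one drive: about {ratio:,.0f}x")
    print(f"Per household: about {share:,.0f} Mbit/s")
```

Run as written, this reports a speedup of about 1,222x, consistent with "over a thousand times faster," and about 880 Mbit/s per household, which is the per-home connection speed the 2,000-household comparison implies.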
We had a list of questions come in from a classroom, and they asked how much storage it can hold. How many households did you say?

That's 2,000 households' worth of bandwidth, not of storage. The total storage capacity is 128 petabytes, where a petabyte is a thousand million million bytes, or a quadrillion bytes. So it's a lot.

And we do have a lot of questions. Everyone, that was Summer's voice you just heard. I think we should go through some of these questions right now, like: can anyone use your supercomputer?

Almost. In our case, I think the basic requirement is a project and a grant from the National Science Foundation, which can give you time on the computer to run your experiment. Or you can come work for us.

Well, we use these computers mostly for studying the climate, so you pretty much have to have a climate project. But if you have a climate project and you're at a university that's NSF-funded, you can get access to our computers.

That's fantastic. The follow-on question is: how hard is it to find a supercomputer? Are there lots of them in the world?

I would say yes. There's a list published twice a year called the TOP500, which lists the 500 fastest computers in the world, but there are definitely a lot more than that. Most of the national labs have at least one, some universities have at least one, and corporations are starting to have their own as well.

Okay. Go ahead.

I was just going to say that one of the industries that uses a lot of supercomputers is oil and gas. They use them to try to find where the oil is underground. They send big sonic waves through the earth, and when the waves bounce back, they collect that data and process it on supercomputers. So a lot of industries use supercomputers.

Excellent. And we have some technical questions now. Someone was wondering: how do you keep this cool?
I think where Jeanette's standing, you can actually see some of the vents in the floor. Would you like to talk a little bit about how we keep this from failing? Does it ever overheat? Do any of these parts ever break? And is it your job to fix things if something does go wrong?

I'm going to close this up really quick. I think it's made to slide right back in. There it goes. All right. Can't break the supercomputer while we're showing it, right?

So we have a vent in the floor here, and I don't know if you can see how it's blowing my hair up. Can you see that? The cold air comes up through the floor, and all of these systems, the storage systems, these drawers, the controllers, the supercomputer, have big fans that pull the cold air through the computer. That cools off the computer, just like an air conditioner, and then the fans blow the air out the back after it's been heated up. Over here on Glade, you can see the same kind of perforated tile in front of it. So cold air comes up, gets pulled through the grate, and then goes out the back, and we're going to take you in there. I'm going to walk slower so Ben can keep up with me. Ben is another systems engineer we have here; he's running the camera. You can tell it's noisy in the machine room, but not nearly as noisy as it's going to be when I walk in here. This is where all the hot air blows into. You can see there's a thermometer up there, and it says it's about 90 degrees in here. The machine room outside this hot room is about 70 degrees. And you can hear how loud it is; there are also fans pulling air up through here. So it's loud and hot in this room, and I'm going to step out. We won't stay in here too long. Loud and hot. Does that answer your question?
Yes, you answered a lot of questions; someone was wondering if it was loud in there. There were also a couple of other questions about the kinds of data that are on Glade and come from the supercomputer, and, connected to that, how supercomputers will help us advance. One question asked whether these computers can help with advancement in just one state or in lots of states. Either one of you could take that question.

Yeah, Joey? Sorry, what was the question?

What kind of data do you have, and how is it helpful?

I would say the majority of the data is probably weather- and climate-related. If you imagine a 3D box with clouds and temperature and so on in there, that's roughly what gets saved out into a special type of file. Depending on the size of this domain, the files can be very large, and depending on how often you save your simulation, you can have lots of them. So I would say the majority of the data is model output that the computers calculate and then save.

And how are these useful to society? What difference does this kind of work, and this data, make in the world?

I mean, research is research, right? You either learn something new, or you learn that something you thought was going to be new didn't turn out that way. Piece by piece, you build up knowledge.

We're trying to understand the climate, right? And it's a very complicated, large system. Scientists take data, bring it here, and model how they think the climate works. Then they use the supercomputers and the data that they collect from the Earth to see if their models predict how the climate changes. So it's all about understanding the climate here.
They take weather data, and there's also ocean data, as everybody knows, and sun data. They take all of that and model it to try to learn how our planet works. That way they can help us understand how we're changing the climate and how to keep from having global warming and things like that. So it's all about modeling the data that they bring into the supercomputers to understand how the climate works.

That sounds pretty important. It also sounds like you have expertise in something that's used around the world. Would you say there's a big need for systems engineers? And could the two of you work at more than one supercomputing center?

Go ahead, Joey.

I'd say yes, there are lots of opportunities. Most businesses run some server or something of their own, and that requires a systems administrator. So there are opportunities from small scale to large scale, like a data center or a cloud provider.

I think I can supplement that answer a little as well. In your K-12 environment, you probably focus a lot on coding, and coding is not the same as systems administration. Systems administration builds the server so that the coders can run their applications on that system. If you're interested in systems administration, you're going to become very familiar with an operating system and with the components and hardware within that system. There are a few different kinds of roles you can take in the world of technology: systems administration, which is what our folks here do; networking, if you like connectivity, the internet, and cybersecurity; and development, the folks who create the software. So there's a little bit of everything. I know you might focus on coding in your schooling.
But one of the questions posed here by Karen is: would schools or school districts use supercomputers? I'm going to let the folks answer that, but I also want to note that your school district probably has similar roles. You probably have a systems administrator running the servers at your school, a network engineer who provides your internet connectivity, and some help desk folks and things like that. So these are common roles, and definitely something to consider if you're looking at a career in technology.

To add on to that, we actually like to call ourselves systems engineers, and that's because it takes all of that to make a supercomputer. It takes the big computer, it takes the networking, it takes storage, and it also takes a facility to hold it. We systems engineers do all four of those things: we manage the computers, we manage the storage, we manage the network, and we also help deal with the facility side, like the cooling of the computers. So it takes four parts to run a big supercomputing system like Cheyenne: compute, storage, networking, and facilities.

And we have a follow-on question related to that: how much is actually stored there, and what happens if something breaks? Joey, do you want to talk about how these disks are aggregated?

Sure. For the storage, I don't have the exact numbers, but one of the file systems is roughly 80% full and the other is about 50% full. When things break, the most common failure is a disk just failing. It stops spinning and completely dies. Luckily, we have the disks grouped using a technology called RAID, which stands for redundant array of inexpensive disks.
What it does is kind of duplicate the data onto more disks than necessary, so that if a disk does fail, you can just replace it with a new one, and a rebuild process restores what was on it.

Okay. I don't know if my internet just cut out. Oh, no, you're still there, we can see you. You said one of the file systems is 50% full and the other is 80% full. If they were both entirely full, at 100%, which we don't want, do we know what that number would be?

Yeah, the total capacity is 128 petabytes combined across both of them.

So 128 quadrillion bytes, is that what that means?

Yes. That's roughly 2 million iPhones, for comparison.

Wow, that's amazing. We also had questions about what kind of degree you need. Do you need a bachelor's or a master's to work at the NCAR Wyoming Supercomputing Center?

I would say it's not required, but you definitely need good Linux administration skills, and a lot of those can be self-taught or learned through classes or certifications. There are lots of opportunities these days to learn this stuff.

I have a degree, a master's in computer science, but I've worked with people who don't have any college degree at all. So you can have a wide range of educational experiences and still be a systems engineer.

Okay, thank you. We have another technical question about the heat from the supercomputers: how do you keep them cool enough to run, and is there an upper temperature threshold? Could you address those questions about how we keep things cool in there?

You mean for Glade, right? It's all cooled through these tiles, and we have big air conditioning units that deal with that heat.
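Joey describes RAID as storing the data on more disks than strictly necessary so a failed disk can be rebuilt. The conversation does not say which RAID layout Glade uses, so the sketch below only illustrates one common scheme, single XOR parity (as in RAID 5): each stripe gets one extra parity block, and the contents of any one failed disk can be recomputed from the surviving disks. The block contents and function names are hypothetical, and real arrays may use other layouts, such as double parity, that tolerate more than one failure at a time.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Append one parity block: the XOR of all data blocks (single parity)."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe: list[bytes], failed: int) -> bytes:
    """Recompute the block on the failed disk from every surviving block."""
    survivors = [block for i, block in enumerate(stripe) if i != failed]
    return xor_blocks(survivors)


if __name__ == "__main__":
    # Hypothetical data spread over four "disks", plus one parity "disk".
    stripe = make_stripe([b"clou", b"dtem", b"pera", b"ture"])
    failed_disk = 2
    print(rebuild(stripe, failed_disk))  # b'pera'
```

The design choice here is the cheapest form of redundancy: one extra disk per group instead of a full copy, at the cost of reading every surviving disk during a rebuild.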
And there is a capacity limit. There are facilities staff here who help us figure out our cooling capacity and how much heat these systems are going to generate, and whether we can put them in the machine room and handle it. If we can't, there are ways we can expand the data center to add cooling capacity so that we can bring in things that produce more heat. So there are a whole bunch of calculations that go on whenever we acquire any system, to make sure not only that we can cool it, but also that we can power it. Do we have the room for it? Do we have the networking for it? We consider all of that when we acquire systems, including the cooling the system requires. That's calculated, and we go from there.

That sounds like an awfully complex process. On top of that, we had a question about how long it took to build the storage system. I think we have a time lapse of bringing in the supercomputer, and while we show that, could you talk a little about how long it took to build the storage system?

Sure. If I remember right, physically building it took maybe a week or so. Just imagine many, many boxes filled with hard drives: you have to unpack 17,000 drives, put each one in its slot, and then move on to the next box, so there's a lot of cardboard and plastic left over after the installation. I'd say it took about a week to physically install it, and then maybe double that again to configure it in software and get it connected to the supercomputer.

We're coming up on time for this month's Meet the Experts. We talked about a lot of aspects of being a systems engineer, up there with the storage system and the supercomputer. Are there any final thoughts you'd like to leave us with today?

I guess I would say that I first became a systems administrator in seventh grade, just working in my school's library.
They had a brand-new lab of Macintoshes with color screens that needed software installed, and that's how I got started. I did the same kind of thing through high school and college, working for the school on the systems they had, and then I applied here after that. So I would say it's definitely not too late to get started.

Let's see. I was a system administrator running things like mail servers. Then I was asked if I wanted to join a visualization center at the university where I was working. So I went there, and they had a really tiny cluster, about 12 computers, that drove a big tiled display wall. That was my first cluster. That university also had supercomputers, so I helped operate those, and that's how I got into this. I kind of fell into it working at Purdue University, and now here I am at NCAR.

Perfect. And is this the only supercomputer at NCAR? Is that correct?

No, there's another one. We have a development and visualization cluster that we call CASPER. It's a smaller cluster, with only 150-ish machines in it, a couple hundred at most. So it's much smaller, but it's used to do some of the things that Cheyenne's not good at. For instance, CASPER has GPUs in it. These are specialized processors that can do even faster calculations than Cheyenne can, but because they're specialized, they're hard to use. There are people in climate studies trying to figure out how to use GPUs, and they use CASPER to explore that or to visualize their data; that's another thing CASPER does. So we have two main clusters here.

He's been gesturing. We have small clusters too, like a test cluster.

Oh yeah, he wants to show you CASPER. Let's go show you CASPER.

We have a couple of test clusters, and that's about it, but this is our other main one.
Do you want to go in here, or stay outside? You want to go in? All right. This is the back; it's kind of closed in back here. It's a little loud in here, but not as hot as the storage, I'll tell you that. Warm, but not as hot. You can see all these different computers here. They've all got networking connected to them, these cables here, and they've each got power too. It goes all the way down there. So that's our CASPER system. It's not as dense as Cheyenne, so the computers are much larger. In Cheyenne, we fit four computers in a very small space; here, it's one computer in a much larger space. It takes up maybe about the same footprint as Cheyenne, but it's only a couple hundred computers instead of 4,000. I can't do the exact math, but something like that.

Wow, thank you for that bonus peek at CASPER. That's something we didn't even expect today. Before we close out, Summer would like to add some takeaways for our visitors as well. Summer?

Absolutely. Before we go there, I see there's a question, and I'll throw it back to Jeanette real quick. It asks: what is CASPER?

Again, it's a development and visualization cluster. Just like Cheyenne, it's got individual computers in it, and we've connected them together with a very fast network. Then we use those computers in concert to solve problems. But the kind of things people look at on CASPER are more development work: trying out newer processors and what we call accelerators, which is what a GPU is, to see if they can benefit climate studies.
So it's not a workhorse cluster; it's more for development and seeing where the future lies for climate studies and things like that.

Thank you, Jeanette. One of the things I wanted to mention as we wrap up is a virtual visit we created that lets you go through the NCAR Wyoming Supercomputing Center. Along the lines of what we talked about a little bit ago, there's also a tutorial in there that walks you through making your own supercomputer using Raspberry Pis. Our supercomputer is called Cheyenne, and this instruction manual shows you how to make a Pianne. It's a play on words, right? You can create your own supercomputer, and it will also run our weather forecast modeling software, called WRF. It looks like Katie went ahead and put the link to that in the chat. You can also see everything we talked about here today, the drawers open on the storage system and the cabinets open on the supercomputer, and it talks about some of the older supercomputers we had here in our Wyoming data center.

And to answer one of the questions that was up in the list: we currently have one supercomputer in our facility, but we are acquiring a second one, called Derecho. Derecho will be a 19-petaflop supercomputer, which is 19 quadrillion math calculations per second. Cheyenne is 5.3 petaflops, so we're more than tripling our processing speed. Again, that was funded by the National Science Foundation. Our supercomputers last about five years, and I think Jeanette and the team might need to explain whether the storage outlasts that, or whether the storage grows along with it and gets reused. That information is in the link for our virtual visit app.

Well, we'd love to thank all of our guests: Summer Lawson, Joey, Jeanette, and the NCAR Wyoming Supercomputing Center. And thank you all for joining us.
You might have noticed there were a lot of extra tidbits added today, but it doesn't stop there. Next month, on February 10th, you can join us on the next Meet the Experts to dig deep into what it really means to use coding. We'll have Max Grover share how he uses supercomputers and coding skills to improve our global climate model. So for those of you who asked how this really makes our lives better, join us again, and you'll get to dig even deeper into what exactly we do with these computers. Research is a big part of what we do, and that even includes how to make the computers better. So thank you again for joining us, and we'll see you next time on Meet the Experts, February 10th.