Welcome to the New Mexico Smart Grid Center webinar on high-performance computing and data management. I'm Selena Connealy, the Education Outreach Manager for New Mexico EPSCoR, and I have a few housekeeping items before I introduce our first speaker. First, this webinar is being recorded and will be archived on our website, nmepscor.org, within the next couple of days. Next, we ask you to type your questions into the Q&A box on the webinar interface at the bottom of your screen. We'll pause after our first speaker, Diana, to answer questions, and again after our second speaker, Karl, so feel free to put your questions in the Q&A box at any time, and I will moderate them when we pause. Finally, mark your calendars for our next webinar, on March 25th at noon, when we'll hear from students doing research with the New Mexico Smart Grid Center; we'll have students representing all four of our research groups presenting their work.

I'd like to welcome our first speaker, Diana Dugas from New Mexico State University, who will be speaking about the high-performance computing resources that are part of our work. She has two titles, ICT Director of Instructional and Research Support and Cyberinfrastructure Architect, and she is part of the cyberinfrastructure team with the New Mexico Smart Grid Center. Diana, take it away.

All right, give me a second to share. I'm assuming everyone can see ICT Supercomputing up on their screens. Looks good. Excellent. So this is the website that will be most useful for anyone interested in using the HPC. It has a really difficult name to remember: hpc.nmsu.edu. Try hard to remember it, I know. This is your go-to resource for pretty much any question you might have. And I'm going to do a quick shameless plug right here: tomorrow and Friday of this week, we have an AI and NSF grant-writing workshop. I point it out in particular because we have somebody coming in from West Texas A&M University who will be talking about grid data, distributed energy resources, and AI. So for anybody on the EPSCoR grant who's interested, please feel free to come down. We'll be at the DACC Workforce Training Center all day tomorrow and all day Friday, and the AI talks are happening on Friday.

To return to the HPC website, I want to talk a little about what exactly is in Discovery. When you go to this website, if you go down to Discovery and then Discovery Details, it gives you information on the computing resources available here at NMSU: we have 25 compute nodes, 11 GPU nodes, and two high-memory nodes. If you scroll down a little, you'll see the EPSCoR resources. GPU nodes 8 through 11 belong to EPSCoR, which means that anyone on the EPSCoR proposal with access to the EPSCoR partition has access to these GPU nodes. You have priority access, which means you're pretty much competing only with other EPSCoR users for these resources. The rest of the NMSU campus, anybody who has access to the system, can reach these EPSCoR-specific resources through something called the backfill queue, and we'll talk about that in a few minutes.
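For a quick command-line view of the partitions and nodes described above, Slurm's `sinfo` utility is the usual tool. A minimal sketch; the lowercase partition names are assumptions based on this talk, so verify the exact spellings with a plain `sinfo` on Discovery:

```bash
# List every partition with its time limit, node count, and
# generic resources (GPUs show up under the %G "gres" column).
sinfo -o "%P %l %D %G"

# Narrow the view to a single partition, e.g. the EPSCoR one,
# to confirm its wall-time limit and GPU nodes before submitting.
sinfo -p epscor -o "%P %l %D %G"
```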
The backfill queue has lower priority, so if somebody happens to be using an EPSCoR-designated GPU node through backfill at the same time that you, as an EPSCoR user, come on and need all four GPU nodes, their job will be paused and yours will run. However, if I am an EPSCoR user using all four GPU nodes and you are an EPSCoR user who needs one GPU node, and you're running on the EPSCoR partition, you're going to have to wait until my jobs are done before yours will run. You do, however, also have access to the entire cluster, so you can use the additional GPU nodes that are available to everyone, or the lab-owned GPU nodes, through backfill. So the EPSCoR resources are four GPU nodes, plus two high-memory nodes, each with three terabytes of RAM. If you have a job that requires a lot of memory on a single node, those two are your go-to resources.

Discovery itself has several partitions. I've already named a couple of them, but we'll go over the important ones. This last one down here is epscor. If you want to run your jobs, your analysis, on the EPSCoR partition, you have to specify it inside your sbatch script. If you don't put in a partition, you will automatically be placed in the normal partition. The normal partition up here is the default queue, and it has a maximum wall time of seven days, one hour. It does not, as you can hopefully see, contain any GPU nodes, so if you have code written for GPUs and you don't specify the EPSCoR partition, you're going to have a difficult time running your job. There is one GPU node available for everyone to use, a common-use resource under the gpu partition: that's Discovery G1.

Backfill I touched on for just a second. Backfill takes all the resources on the entire cluster and allows you to use them. Again, it has the lowest priority, so if somebody who owns a particular node comes on and needs that resource, your job will be suspended and, ideally, picked back up once their job is done. If that resource becomes incredibly busy, your job may just hang there indefinitely. So keep an eye on backfill: it's an incredibly powerful option that gives you access to 11 GPU nodes if you happen to need them at a particular time, but it does mean your job may be suspended and not complete in the time frame you were hoping for.

The epscor partition, again, is the one you want to specify if you want to use the EPSCoR resources. If you don't designate a partition, you will automatically be dumped into normal. It doesn't matter that you're designated as an EPSCoR user; you have to put yourself on this partition in order to use it. It also has a maximum wall time of seven days, one hour. And I think that's about all I want to say about partitions.

So, if you find yourself needing computational resources more powerful than what's available on your laptop, your desktop, or that old machine sitting under your desk somewhere, the way to gain access is to go here under Requests and submit an account request. This is valid both for NMSU users and for NMSU affiliates, which is how everyone who is part of EPSCoR but not on the NMSU campus is designated. Under the account request, we ask for a couple of bits of information: we want to know who you are, and we want to know your university email address.
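Before moving on to accounts, here is a minimal sketch of the kind of sbatch script Diana describes, pulling together the partition details above. The partition name `epscor`, the module, and the script names are illustrative assumptions; the time limit reflects the seven-day, one-hour maximum from the talk:

```bash
#!/bin/bash
#SBATCH --job-name=epscor-demo
#SBATCH --partition=epscor     # omit this line and the job lands in "normal"
#SBATCH --gres=gpu:1           # request one GPU on the EPSCoR GPU nodes
#SBATCH --time=7-01:00:00      # the stated maximum wall time: 7 days, 1 hour
#SBATCH --output=%x-%j.out     # log file named from the job name and job ID

# Hypothetical module and script names -- substitute your own.
module load python
python my_analysis.py
```

Submitting is then just `sbatch my_job.slurm`, and swapping the partition line for `--partition=backfill` would trade priority for access to the whole cluster, as described above.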
If yours is a UNM address, put your UNM address. We want to know what department you're in and your affiliation with NMSU. So if you are an NMSU student, staff member, or employee, select the appropriate option here. If you are not, we've gone ahead and given EPSCoR its own callout to make it a little easier for EPSCoR users who are not on NMSU's campus: that's the one you want to click. If you are on NMSU's campus, whether you're a student, faculty, or staff, please select one of the top three options, even though it says EPSCoR on the bottom line; that EPSCoR designation is there solely to make it easier for non-NMSU EPSCoR users to know which one to click.

If you are from off campus, we're going to ask you to fill out the following form, the HPC VPN form. I'm calling it out because this form is vitally important. The HPC sits behind a firewall on campus, so you're going to need VPN access in order to reach it. Click on the VPN access form; I'm going to walk you through it real fast so that everybody has a better feel for what needs to go where, just in case there's any confusion. The very top section asks for your information. So if I'm a student at UNM and I need to gain access, my information is what I fill out here: my name, where I work, my title, my phone number, my email address, and my department. Next, we ask you to describe the reason for temporary access. This VPN form is used for anyone who needs to access the HPC who isn't an NMSU person, so the simple thing to write under reason for temporary access is "working on the EPSCoR grant." You can be as detailed as you wish, but simply stating something along the lines of needing it for EPSCoR research is plenty; I don't need your entire dissertation.

Next is the data use agreement. Please read over this. It describes what types of data can be used on the HPC. If we end up getting audited and a restricted data set is found sitting inside your home folder, you are going to be responsible for whatever legal actions need to occur on the other side of things. So, as I try to explain to everyone who works with me: please read before you sign. All of the EPSCoR information, as far as I understand, should be non-classified, non-sensitive, and non-restricted, but keep that in mind. This machine is specifically designated to be open. That does not mean other people can see your data, but it does mean it does not have the privacy restrictions and security protocols in place that would allow HIPAA, FERPA, or any other legally regulated data to reside on it. Once you have read that and are comfortable that your data can be on this HPC, go ahead and print your name here, sign, and date it.

After that, the form needs to be sent to Anne in the EPSCoR office: Anne Jakle, EPSCoR; the department field you can leave blank. Finish filling it out to the best of your ability and send it on to Anne, and I'll show you where to easily find her email address in a few minutes. She will verify that you are part of the EPSCoR proposal, that you're working on the project. She'll sign it, date it, and send it to the NMSU sponsor. Once she has signed it, she's going to send it to me.
I will fill out this portion over here, date it and sign it, and, conveniently for EPSCoR, I am also the ICT Director of Instructional and Research Support, so I can immediately turn around and sign this bottom line down here, and the VPN access should get to you relatively quickly.

Going back to the account request form: remember, we've now requested VPN access, but we still don't actually have an account requested for the HPC, so we're going to finish filling out this account request. In the field below, provide a brief summary of how you're going to use the resources. Again, we don't need dissertations; it doesn't need to be more than a sentence or two, and something as simple as "I need it for EPSCoR-related research" is a good enough description here. Next, we ask a couple of questions about how comfortable you might be on the system. Have you ever used Linux or Unix? If yes, it will automatically ask whether you have ever used a high-performance compute cluster, and you can select yes or no there. If you select no on the first question, it doesn't ask further, because most HPCs run on a Linux or Unix operating system; if you don't have any experience there, it seems unlikely you will ever have used an HPC. Don't take offense at that, and if you're concerned, please don't be, because the next thing we ask you to do is meet with us so we can get you onboarded.

All right. So, for this demonstration, I have never used Linux; I'm going to select no. And now I have to read this data use agreement. This is the same agreement that was written inside the VPN form I showed you; it verifies that you agree not to store or use sensitive, regulated data on this particular HPC system. Again, please read it over. It seems like legal mumbo jumbo, but if there are any questions, please let me know; it's very important that you understand what type of data is allowed on the system and what type isn't. If your data happen to be a restricted type, we will find you another location to run on; just contact me and I'd be more than happy to help. In the end, we have to verify that we are not robots, because it's the internet and half the things out on the internet are robots. We click the "I'm not a robot" box, and if I had actually filled everything out, it would go ahead and submit for us. Account requests may take up to a week to come back to you, but we've been pretty fast about getting people accounts within a couple of days. I do apologize to the first two EPSCoR users: they had a bit of a delay because we were changing our general VPN form over to an HPC-specific VPN form. That took a little longer than we would have liked, but everyone else should have an easy, smooth transition onto the system.

If you don't remember any of this, if it was overwhelming, if you are confused, if you got lost, we have the EPSCoR web link over here, and it walks you through in text what we just covered. It describes what types of resources the EPSCoR grant purchased and tells you how to get an account. If you select affiliates, there's another link to the VPN form: download it, fill it out, and send it to Anne. Her email address is here for those of you who don't have it memorized yet; she will forward the form on to NMSU. The next part walks you through how to actually get onto the VPN. Again, if you run into any sort of problem, please let us know. We also have a link to the user guide down here.
There are also Contact Us, FAQ, and Office Hours pages. Contact Us lets you send us a message; there's an HPC team here to assist. You can either send your message through that page or come to office hours, and our email address is right here: hpc-team@nmsu.edu. Once your paperwork has been processed, somebody from that team, from that email address, will contact you and ask how you'd like to do your onboarding. If you are an NMSU person, you can do the onboarding in person by attending one of the monthly meetings, which happen the first week of every month, or you can take the Canvas learning course, which you should automatically have access to. That course explains the basics of Linux, if you need assistance there, and teaches you how to log on to the system, how to submit a job, how to find the software installed on the system, and how to write that software into your job submission. It also covers how to request software: if software you need isn't available, there is a software request under Requests, and we have a pretty good turnaround time there as well. It normally takes us less than a couple of days to install software for you.

Onboarding is required if you are an NMSU affiliate. We can try to get you into the Canvas course so you can take it at your leisure, but we're also more than happy to set up a Skype, a Zoom, or your particular conference call of choice to sit down and walk you through the onboarding and make sure you're comfortable. And the one thing I'd like to emphasize is that even after onboarding is complete, if you ever run into any difficulty, if your script doesn't run, if it crashes, if you have questions about getting the right resources or being on the right partition, please let us know. It's our job to be of assistance to you. We've had people ask us to check over their scripts, we've troubleshot all sorts of things, and we've helped people figure out what problems they might be running into.

When it comes to the EPSCoR project space, you can see down here that one of the requests we have is a project space request, so groups of collaborators and teams can have access to a shared space. If you are a non-NMSU EPSCoR person, then through the VPN form you will automatically have your home directory set up inside the EPSCoR space. Everyone affiliated with EPSCoR has one terabyte of space available to them, so that space will be created for you automatically. But, and I cannot stress this enough, if you want to use the EPSCoR resources, you need to specify the EPSCoR partition when submitting a job. If you are an NMSU person and need access to the EPSCoR project space, you're going to have to request it, because we don't know that you belong in EPSCoR unless you tell us; we're not mind readers. If, under the reason for the account, you've put in something regarding EPSCoR, we'll take that into consideration and try to automate that process for you. If you already have an account on the system and need access to the EPSCoR partition, we need an email from you asking to be added to that area. The EPSCoR partition and the EPSCoR project space both need to be requested; if you request one, we'll go ahead and put you into both, but you do need to let us know.
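As a small illustration of the software workflow the onboarding course covers, the commands below show the usual Environment Modules pattern. The package name is a hypothetical example, and `module spider` is specific to Lmod-based systems, so check which module system Discovery runs:

```bash
# Browse everything installed on the cluster.
module avail

# Search for a specific package, including versions (Lmod systems).
module spider matlab

# Load the package into your environment, then submit your job script.
module load matlab
sbatch my_job.slurm
```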
And again, whether you are an NMSU affiliate or an NMSU person, you're going to need to specify the partition you want to use when you run a job. I think that pretty much sums up what I wanted to talk about, so I'd be happy to take any questions. My email address is relatively easy to find, but again, hpc-team@nmsu.edu reaches me, the HPC administrators, and the team of graduate students available to answer questions. So for any issue, any question, I highly recommend sending it to hpc-team rather than to me directly, so that you reach more people who can answer at any particular point in time. All right. And with that, I'm going to stop sharing.

Thank you, Diana. It was excellent to hear how people can access that incredible resource at New Mexico State, and we'll be working on the EPSCoR side to make sure that documentation is available on our website as well. I want to encourage anybody listening who has a question to type it into the Q&A box at the bottom of the Zoom webinar window, and we'll be happy to answer. We'll pause a few seconds here to see if anybody has questions, and then we can move on to our second speaker.

All right, seeing no questions, let me go ahead and introduce our second speaker. I'd like to welcome Karl Benedict from the University of New Mexico. He also has two roles: Director of Research Data Services and Director of Library Information Technology Services. In addition, he's part of our cyberinfrastructure team with the New Mexico Smart Grid Center. You'll notice there is a cutt.ly URL at the bottom of Karl's slide; Karl is going to be walking you through this document, and we thought you might want access to it as he does. I can also type the link into the chat box so you can access it from there. Karl, take it away.

Great, thank you, Selena. I will now start my sharing and pick the right document. Great. So hopefully you're now seeing the document that was linked; that link is also here at the top of the document for quick access, and it looks like the link has been pasted into the chat as well. I'm not going to go through this entire document in detail.
This is a more detailed workflow document that John Wheeler and I produced for an EPSCoR data management and high-performance computing meeting earlier this month, but it will potentially serve as a useful point of reference for future interactions with our team as we get the data produced by the project into appropriate systems, both for long-term preservation, to make sure those data products are safe and in an environment where they will be available for a long period of time, and for discovery and reuse. One of the key requirements we have as part of the EPSCoR project, written into the data management plan that was submitted as part of the proposal, is to take the data that are created or acquired as part of the project and, in a timely manner, make them available in a well-documented and reusable format for effective reuse. That reasonable amount of time is defined in our data management plan as no longer than 12 months after the data are collected or created, or whenever the data become associated with a publication: if a publication associated with those data appears earlier than that 12-month window, the data should be made publicly available then. This aligns with the requirements that a growing number of publishers now have in place, which call for the data associated with a publication to be available both to the reviewers of a submitted draft and to the readers of the journal article once it is published.

So the capabilities I'm talking about here are available to meet our professional obligations, to maximize the impact of the effort we put into generating and using high-quality data for our research and to share those data with other researchers to facilitate further use, as well as our obligations to our funder, the National Science Foundation, and the requirements from publishers as we submit our papers for review. If you have any questions, or when you want to start the process of preserving and sharing your data, you can reach both John Wheeler and myself through the email address you should see here in the middle of the screen: rds, for Research Data Services, at unm.edu, as our program here at the University of New Mexico is helping to coordinate the data management, preservation, and access activities for the project.

What I want to do now is jump down to the first of the two figures in the document. This one provides a high-level view of the research life cycle, the research process that any of our researchers in the project are participating in, and highlights how the data management, preservation, documentation, and sharing activities relate to the steps of that process, so that we can draw the linkage between the research process and the data management systems that are set up to share the data products it produces.
So, as we can see on the left of the diagram, we have this research life cycle that starts with the development of research ideas and the identification of partners; in the context of our EPSCoR project we have a large number of collaborators, both institutional and individual. It continues with the development of the proposal, and in this case the data management plan that was submitted as part of the EPSCoR project provided the high-level outline for how we are going to share, preserve, and manage our data. The data management plan for this project is actually linked from the EPSCoR website; it's fairly straightforward to find, and we can share the link for that later.

Once we're actually in the research activities associated with the project, we end up in the four blocks highlighted as the research process, in the middle of that research life cycle diagram on the left. As part of that, there are certainly data being collected, there's modeling being done, and, associated with all of this, documentation being developed for those data sets. The documentation is key, both for supporting and providing the backup information for the data being used during the project, and for effectively sharing the data with our collaborators: as we work in our research teams, we need well-documented data for our teammates, and that also lays the foundation for the documentation we need to effectively preserve and share those data later in the project. In some cases we are also developing analysis code, which might be in R, MATLAB, Python, or many other analysis environments, and ultimately, in many cases, we are sharing all of this with our collaborators. To support that, we have this cylinder here representing the shared storage available to all of us through the high-performance systems, though we may also have local workstations and shared storage for managing the raw data, the documentation or metadata, the analysis code, and any processed data or reports being generated. These are essentially the local resources we have available for managing data during the research process.

What we then want to do is identify those data or other objects that need to be preserved and shared, keeping in mind that we may have some data, and Diana highlighted some of this, that are subject to regulation, like HIPAA or FERPA, student data, health data, or other data that might have national security or other restrictions on them; those considerations are also part of this identification process for preservation, and of deciding whether any of those data are appropriate for broader sharing. This is an assessment process that John and I can help with, in terms of figuring out the best strategies for meeting our sharing and access requirements while also addressing issues related to confidentiality or privacy.
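One lightweight practice at this hand-off point, offered here as an illustrative sketch rather than the project's prescribed workflow, is generating a checksum manifest alongside the data so that integrity can be verified after every transfer:

```bash
# From the top of the dataset directory, record a SHA-256 checksum
# for every file, excluding the manifest itself; the manifest then
# travels with the data.
find . -type f ! -name manifest.sha256 -exec sha256sum {} + > manifest.sha256

# After a transfer (or years later, on retrieval), verify the files.
sha256sum -c manifest.sha256
```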
Ultimately, the data that are identified as appropriate and necessary for preservation are brought into our preservation system. We then do a further evaluation of how those data can or should be shared, and we deposit them into an appropriate repository where they can be discovered, accessed, and reused. As part of that access and reuse process, and the publication of data into that system, we want to attach persistent identifiers to those data sets. Often those are digital object identifiers, or DOIs, which you're probably familiar with from journal article citations, which are now commonly associated with DOIs. The DOI provides a permanent connection to the object, which facilitates more effective citation, and it's those citations that give you credit for the work you've done and the contribution you've made through sharing and publishing the data you've generated, just as you expect to get credit and the necessary recognition for the publications describing the research you've done. So that is the general-purpose process that relates to the research activities going on in the project.

I now want to move on to the specific workflow we currently have in place to support our researchers in the project through this process, so I'll go to the next figure in the paper, Figure 2, if you're following along. At the far left side of this diagram you have that same cylinder, the storage system you have access to: it may be in our high-performance computing systems, it may be shared disks within your laboratory, it may be your local hard drive; it just depends on the nature of the work you're doing. When you're at the point where you need to get those data into a preservation system, and perhaps ultimately into a sharing and publication system, you can, as of today, email us at rds@unm.edu to start the process. We're in the final stages of creating and making available an online form through which documentation about the data can be entered and the associated data uploaded; if the data are too large to be practically submitted through the web form, we'll have an alternative way of receiving them. Ultimately, whether you go through the email process available right now or the online form we will have very soon, the data will end up in a shared location. That location may be in the HPC storage in the systems at New Mexico State, in the shared storage systems of the Center for Advanced Research Computing at UNM, or in a shared OneDrive location that we can provide from here at UNM; if it's source code, or perhaps other text-based content or text-based data, it might be in GitHub or another location. We can work with you on determining the best strategy for getting the data into that shared location.

Then we get to the phase where we iterate with you to make sure the data are aligned with the documentation and are consistent, ensuring that the data formats we're using and the information associated with the data are clear and really meet the core needs: being able to effectively discover the data, and being able to
allow others to understand the data well enough to determine where they might fit into their own research projects, with enough documentation for them to use the data effectively: information about the structure of the data, the variable names, the column definitions, and the content of the data set, so that they can use it in their analysis. Our experience has taught us that this is typically a back-and-forth process with the researcher, fleshing out the documentation and clarifying any ambiguities in the structure of the data, to ultimately produce a high-quality combination of data and documentation.

After that process, we bring the combination of those data and documentation into our staging system, where we go through two additional processes. The first is the integration of the data into the preservation system we host here at the University of New Mexico, which EPSCoR has identified as, essentially, the dark archive for the data produced by the project. It's dark in that it is not publicly available; it allows us to put data generated by the project into a system specifically designed for long-term preservation, one that can assure the integrity of the data is maintained and that, if we ever need to retrieve the data in the future, they will still be valid and usable. That system is called LIBSAFE, from a service provider called LIBNOVA, but it's a process we take care of as part of our work in Research Data Services in support of the project, and any data or document products coming out of the project can potentially go into that preservation system if we determine it's appropriate.

The second, currently parallel, process is the integration of the data determined appropriate to share and make publicly available into a suitable repository. The repositories we have access to right now, among the target repositories we identified in our data management plan, are these. First, the digital repository at the University of New Mexico. Second, the integration of Zenodo and GitHub: if you're using GitHub for collaborative software or analysis development, Zenodo has a nice connection with GitHub that allows you to generate snapshots of the content of a software repository, along with DOIs associated with those snapshots, providing good, stable citations for a particular point in what might be a continuous development process for that code. Third, a capability added since the project started: an institutional membership in Dryad, a community, general-purpose repository specifically designed for data. That institutional membership is through the University of New Mexico, and Dryad is an excellent repository for depositing data sets coming out of the project when there isn't a more appropriate disciplinary or domain repository. This is also something we can help with: in some disciplines there are already designated repositories where researchers typically go looking for data, and we can help identify those repositories and work on getting data into
them when that's more appropriate than going into a general-purpose repository like Dryad. Any way you cut it, at this point everything from the middle to the right-hand side of this diagram is handled by our team in the Research Data Services program here at the University of New Mexico, once we've worked with the researchers to ensure that the data sets and the documentation are going to be usable for achieving the maximum impact.

I want to very quickly hit on a couple of nuances of this process, just as a point of information; you can read through the document at your leisure to fill in the more detailed bits. The first is this information about Dryad. As I mentioned, Dryad is a general-purpose repository, and with support from the EPSCoR program the University of New Mexico is an institutional member, which means we are able to deposit data into Dryad without paying additional fees. For folks coming to Dryad from outside an institutional membership, there are typically fees associated with depositing content into that repository; through our membership, we don't have those. We're actually suggesting that everybody who wants to share and preserve their data do so by working with our Research Data Services team, just so that we can help with that initial assessment of the alignment between data and documentation; that's why we're encouraging everybody to use this process. Having said that, UNM-affiliated researchers can go directly to Dryad using their ORCID identifier and UNM credentials. So that's a way for UNM researchers to deposit content directly into Dryad, but again, we strongly recommend that you work with us even if you're a UNM affiliate, so that we can help develop the highest-quality data and documentation going into the system.

The remainder of this document goes through the process of getting an ORCID, which is a unique, persistent researcher identifier. If you don't already have one, we strongly recommend getting one, because it provides a career-long mechanism for having a unique identifier you can attach to publications, published data sets, grant proposals, and other products, giving you the most straightforward connection between the work you're producing and your individual identity. Our names are not unique identifiers for us, whereas ORCID identifiers are truly unique to each of us, and as they're associated with our products it becomes easier to get credit for what we've produced. ORCIDs are the key to logging in to Dryad as a repository, and providing ORCIDs for objects deposited in Dryad significantly streamlines the automated process of maintaining your ORCID profile, which can keep track of the various products you've generated.

I've already talked a little about the process. Basically, we have a metadata template, a documentation template, that we can provide to you; you open it in Excel and provide the answers to the questions we need in order to document the data and add them to Dryad.
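For a feel of the kind of information such a template asks for, here is a minimal, hypothetical documentation stub; the actual template comes from the RDS team, and its fields may differ:

```bash
# Write a bare-bones README next to a dataset. Every field below is
# illustrative; the real metadata template is provided by rds@unm.edu.
cat > README.txt <<'EOF'
Title:        <dataset title>
Creators:     <names and ORCID iDs>
Description:  <what was measured or modeled, when, and how>
Files:        <file names, formats, and what each contains>
Variables:    <column names, definitions, and units>
License/Use:  <any conditions on reuse>
Related work: <DOI of any associated publication>
EOF
```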
When you contact us, we will provide that template and work with you on filling in the blanks, but this is just a picture of what its content looks like, so it's something you can expect as you work with us. Further on in the document there's an overview of the actual submission process in Dryad, and the main thing I want to highlight here is the upload limits: when uploading data directly from our computers through Dryad's web interface, we're limited to 10 gigabytes for a collection of data associated with one DOI; when uploading from a server or a shared service, that limit expands significantly, to 300 gigabytes. This is again part of the process we can work with you on, in terms of which method we would want to use for uploading and transferring data into the repository. The bottom line is that as you're generating data sets, publishing papers, and giving presentations, and you need to share and preserve those data, contact us at rds@unm.edu so we can work with you to make sure your data and documentation are of the highest quality and maximize the impact of the effort you've made in producing them. And with that, I'll take any questions.

So, as a reminder, we ask folks to type questions into the Q&A box at the bottom of the Zoom window, and Karl will be happy to answer. Karl, I appreciate the reminder about ORCID; I think I'm inspired to take care of that today for myself. Excellent. I'm not currently seeing any questions, so while people are formulating their questions, I'll turn your attention to a question that was asked after Diana's talk, from Olga. Her question was: what are the limitations on the software that can be run? For example, can I install custom software on the cluster? Diana provided a very detailed answer, which I won't read in full, but the short version is yes, with some limitations. She also reminds you that if something will be used by more than just you, you can let the team know and they can install it somewhere that doesn't use up your storage space. So there's helpful information there in the Q&A box. Anything for Karl before we wrap up?

All right, I'm going to do a quick wrap-up; if anybody has questions, we'll keep monitoring before we finish. Just a reminder to mark your calendar for March 25th, when our next webinar, the student spotlight, will happen. And we will be working to make sure that all of these resources are easily accessible from the EPSCoR website, as well as from the places both Karl and Diana described during the webinar. Seeing no questions, I think we will call an end to it, unless Karl or Diana, you have some things you would like to leave us with.

Probably the one thing I would suggest is that, even as you're starting a particular research activity, we're also here to help plan what the workflow might look like for making the transition from active data management to preservation and sharing, to smooth that transition.
We very much enjoy being able to support the early stages of the data management process, because it quite frequently smooths out and streamlines the later documentation, preservation, and sharing steps. So earlier is better.

Good advice, thank you, Karl. Diana, how about you?

Actually, I'd like to build on that when it comes to active data use and storage. The EPSCoR proposal paid for 600 terabytes of storage for the project, so please use it to store your data. If your data are housed on the HPC, inside either your home directory or the project space directory, they are backed up. You also have access to scratch space in case you need even more room in the moment; scratch, however, is not backed up, and files that have sat in scratch untouched for over 90 days will end up being removed, deleted after 120 days. So scratch is not a place to store your files indefinitely, but home and the EPSCoR project space are. Please take advantage of that and keep your data safe.

Thank you, Diana. And it looks like we have another question, relating back to Olga's: "If I'm not mistaken, there's already a MATLAB license on the NMSU end of things, correct?" The answer back from Diana was yes, and it can be used by all Discovery users. Any other burning questions before we call an end to this webinar? Well, hearing none, I think we will adjourn and see you all again in another month. Thank you so much to our presenters, Karl Benedict and Diana Dugas; it's fantastic to learn about all the resources available to those working in the New Mexico Smart Grid Center. Thank you all. Have a great afternoon.
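As a small follow-up to Diana's note about scratch, a minimal sketch for spotting files at risk under the purge policy she describes; the scratch path is a hypothetical placeholder, and access-time tracking depends on how the filesystem is mounted:

```bash
# List files under your scratch directory that have not been accessed
# in more than 90 days -- i.e., candidates for the automated purge.
find /scratch/$USER -type f -atime +90 -ls
```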