Okay, hi everyone. I'm Christie Hegermiller. I'm an oceanographer and research scientist at Sofar Ocean, which is a relatively small startup in the blue tech or climate tech space. I'll be talking about some of the tools and workflows that I've learned since joining Sofar, and how we can translate and adapt them for our academic workflows. This talk was really motivated by the fact that I'll be transitioning back to academia at the end of this year, and so I'm looking to make my own work more efficient and more collaborative. And I'll admit that I polled my colleagues at Sofar for some ideas for this talk as well. Their names are down here, and they include other oceanographers and some of the very patient software engineers who work very closely with our ocean science team. Okay, so Sofar Ocean is on a mission to address data sparsity across the ocean surface and provide insights to communities and industries operating on the world's oceans. In particular, we're really interested in characterizing and observing the sea state, so wave conditions across the global oceans. We do that through a full-stack approach, meaning that we're interested not only in building the instruments and tools that help us observe wave conditions across the global oceans, but also in building software and tooling on top of that to make use of those observations more effectively, and finally in building more tools and software that can directly interact with customers. At the bottom of our stack is a small buoy called the Spotter wave buoy, which I think some of you are familiar with. It's a small, basketball-sized, solar-powered wave buoy that measures directional wave characteristics, so it's basically observing how wave energy is distributed across frequencies and directions.
It's free-drifting, and we have about 500 of these in the global oceans right now, drifting around with different ocean currents and sending data back to us hourly over an Iridium connection. In addition to measuring wave characteristics, we're also observing sea surface temperature, barometric pressure, wind speed and direction through proxy, and rain, whether it's raining or not, through proxy as well. So we're really trying to characterize the air-sea interface. On top of these observations, we maintain and develop a global operational wave forecast. This is really predicting wave conditions over medium timescales, so the next 10 days or so. That's the colors you're seeing on the right here; that's our operational wave forecasting system, which we'll be talking about extensively next. And lastly, you can see one of these customer-facing tools that we've built on top of this wave forecasting infrastructure, a ship routing service called Wayfinder that helps container vessels and bulk carriers optimize their routes across the global ocean for fuel efficiency and for time. So we're really making use of the observations that we collect at the ocean surface, the better forecast that we're able to provide because of those observations, and knowledge about the shipping industry or whatever other commercial partners we might be working with. As I said, I'm an oceanographer; my expertise is in global ocean and coastal wave dynamics, so I really sit in the operational wave forecasting part of this company and don't have as much insight into the other parts. So I'll be talking a lot about that wave forecast that we've built, that we maintain and actively develop, and the tools that we use to engage with that forecast, analyze it, and develop it as rapidly as we can. Okay, this is our overarching wave forecast infrastructure.
So there are two components. One is an analysis track, which is basically combining our forecast with the observations that we're making hourly and producing the best initial conditions for the longer-term forecast that we have. So the two tracks here are this analysis track and then the long, or 10-day, medium-range forecast; I understand we're thinking about a range of timescales at this conference. This operational wave forecasting system is built on WAVEWATCH III, which is a global spectral wave model maintained and developed by NOAA. We have a data assimilation methodology that we've been working on. The system is forced by wind forcing, surface current fields, and sea ice concentrations from other operational centers, namely the European Centre for Medium-Range Weather Forecasts and the GOFS 3.1 HYCOM currents for the ocean surface. And then, as I said, we make use of our own observations across this network, which is split up here, and satellite observations as well. I just want to point out what each one of these little modules in this forecasting infrastructure actually is, so that when we talk about the tools that we use at Sofar, you can maybe link it back to what your own repositories are doing. Of course, WAVEWATCH III is an open-source, community-developed, Fortran-based model, so this might look like lots of other numerical models that you use in your own work. Our data assimilation repository is Python-based, so this is code that we've written, that we maintain and develop, and that we manage through GitHub. The repositories that pull in forcing from external data sources, whether those are other operational forecasting centers or observation platforms like satellite altimeters, are all repositories of bash, NCO, and CDO tools.
And then lastly, I really want to point out that we've built some automated assessment into our forecasting infrastructure, and that helps us really focus in on the science questions that we're interested in. The repositories for data analysis, both for this automated workflow and for any other exploratory work that we do, are all Python-based as well. We also do a tremendous amount of development and experimentation with different parts of this system, whether that's the entire infrastructure, where we want to run a side-by-side experiment that looks exactly like our production forecast, or whether I want to run a reanalysis for one month using just WAVEWATCH III and maybe some of these external fields. The modularity of this system, with known inputs and outputs for each one of these components, allows us to break it up into individual pieces and stitch them back together into our operational system as effectively as possible. That enables this very rapid iteration between research and operations, which you may be familiar with from other types of operational forecasting centers: there's always this transition period where you do a tremendous amount of research, and then it's not smooth to get that back into your production infrastructure, so we've tried to eliminate some of those challenges. And then, just to talk at a high level about some of our infrastructure: we are using cloud-based infrastructure from end to end. I won't talk about all of this in detail, but I will mention that we use an orchestration layer that allows us to take the human out of the loop for dependencies and individual tasks of this forecast that need to happen concurrently or sequentially. We use containerized builds, which I'll be talking about a lot next, to help us modularize our code and know exactly what we're working with.
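The known-inputs, known-outputs modularity described above can be sketched in a toy way. Everything below is a hypothetical stand-in, not Sofar's actual orchestration code: each "repository" becomes a small function with declared inputs and outputs, so any piece can be swapped out or run alone.

```python
# Hypothetical sketch: each forecast component is a small task with
# explicit inputs and outputs, so pieces can be swapped or run alone.

def fetch_winds(run_date):
    # Stand-in for a repository that pulls wind forcing from an
    # external center; here it just returns a label.
    return f"winds:{run_date}"

def fetch_currents(run_date):
    # Stand-in for pulling surface current fields.
    return f"currents:{run_date}"

def run_wave_model(winds, currents):
    # Stand-in for launching the containerized wave model.
    return f"forecast({winds},{currents})"

def assess(forecast):
    # Stand-in for the automated assessment step.
    return f"report for {forecast}"

def run_pipeline(run_date):
    # The orchestration layer would resolve these dependencies itself;
    # here they are simply chained in order.
    winds = fetch_winds(run_date)
    currents = fetch_currents(run_date)
    forecast = run_wave_model(winds, currents)
    return assess(forecast)

print(run_pipeline("2024-01-01"))
```

Because each step only sees its declared inputs, a reanalysis experiment could reuse `run_wave_model` with different forcing functions without touching the rest of the chain.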
We run everything on cloud compute through AWS Batch, which is a parallelization service that AWS provides. And then everything is stored in final form on AWS S3 and served via APIs to customers, or internally to us within the company as well. Okay. So, right, this is all wonderful, but why am I here? I think a lot of the challenges that we face in this type of environment look very similar to what we face in academic science environments as well, and so we have these grand challenges that we need to face, especially as we incorporate more modeling and larger datasets into our work. Those challenges, I think, are reproducibility of experiments and results; barriers to entry for model development and use; barriers to entry for big data analysis; and maintaining and growing community engagement, development, and sustainability. That sustainability part is really making sure that not only one person has all of the knowledge about a particular modeling system or a particular tool that we're using within the community, and allowing as many people as possible to contribute to, work with, and improve the different systems that we are using. The tools that I'm going to talk about are probably not unfamiliar to most of the community here, but the three that have been most effective for me since actually making the transition to using these tools seriously are Git for code management, containerization of all our code through Docker, and really investing the upfront work to have automated, data-proximate analysis tools. This is not perfect; I am not a software engineer, right? But by working with software engineers I was able to rapidly scale up to using some of these tools, which now allows any other person who joins the company to more quickly spin up with the tools that we're using and the infrastructure that we have.
Okay, I'm going to fly through Git, because I'm pretty sure a lot of people here use Git or GitHub to manage their code, but just at a high level: Git is version control for your code. It's a framework for collaboration among multiple developers, and GitHub is a platform that hosts Git repositories and really fosters collaboration. The practices that I've found to be really helpful in implementing Git in my own workflows are: creating repositories for really modular, discrete parts of code that can be reused in different places; committing early and committing often, so you leave a paper trail behind you of what you've done; and code reviews, at whatever level of formality, whether that's casually bringing someone in and saying, hey, I want you to take a look at this one thing, or whether that's properly not being able to merge anything in until someone approves it. That's a good opportunity to catch bugs and get feedback on large-scale code structure. What Git enables for me is really clean, clear tracking of code changes, and the ability, which is really important, to revert back to any version of the code that I've had in the past. That might be because I found a bug, or because I want to reproduce my results from a previous paper that used one version of this code. And then, really importantly, I think it also gives us the opportunity to mentor on code design and development, and that's really beneficial for students who might be joining your group. Okay, the tool that I'm going to spend the most time talking about is Docker. I love Docker. I have found it to be extremely effective in my workflow. Docker is a platform that delivers software in packages called containers, and it's basically a recipe for an automated setup of a compute or analysis environment.
We use this integrated with GitHub, so that it links the exact commits that we've made to each new image built from that code. So what does it do for us at Sofar, and what will it do in my academic work? It provides a tremendous amount of portability for when hardware goes through major changes. It also provides an exact record of what model setup and compilation was used to create a particular set of results or simulations. And most importantly, and I think the biggest thing that this has the potential to do for us as a broader modeling community, is that it alleviates the barriers to entry for code and model quick starts. Any model that you might jump into has some quick-start guide, right, which has some steps for compiling that code. But if you go to any model forum, there are a million questions about code compilation issues, and lots of people spend months trying to figure out how to properly compile their code. Docker removes some of that work: if you were to provide a Dockerfile with your model development, you might be able to just set someone on their way, because you've provided this recipe for compiling it in a particular environment. I had heard all of this from Rich Signell in my previous job at USGS, and I still wasn't able to actually visualize what that would look like in my work, so I'm going to provide a tangible example of how we do this at Sofar, so that hopefully you can see how it might look for your own work. This might get a little detailed, so bear with me. Okay, so a Docker image, or Docker container, is built from a Dockerfile, which is just this recipe, as I mentioned. You probably can't see all of this, so I'm just going to tell you what some of these parts are. Right at the top of this Dockerfile, the first thing I want you to notice is that it's actually not that big, and this is the entire implementation for WAVEWATCH III.
So, if I run this and build a Docker image off of these lines of code, I now have a WAVEWATCH III implementation that I can hand to someone else, and they can run it wherever they are, as long as they have Docker running. In the first part of this, I define my base image for the infrastructure that I'll be running on. So here I have a base distribution, and then a bunch of packages that I might need, like netCDF, NCO, and any of the build utilities that I'll need to compile WAVEWATCH III. I point to a very specific commit tag in the WAVEWATCH III repository; that allows me to reproduce this no matter how many times I build it, even if they've made changes in that repository. All of the rest looks like your normal compilation, so it's just pointing to particular libraries and compiling the code down here. And as I said, this now builds a container that has all of the necessary information for running WAVEWATCH III within it, and just has an executable that you can call. Alongside this is a Dockerfile to build a Python-based repository. This is even shorter, right? It's basically just a pip install of the requirements that you might need, and copying all of your Python code into that container. So again, I can pass this to someone else, and without any work on their end they can just run the Python code or whatever it is that we're working with. A little bit more about what this looks like in practice: here's our Docker Hub repository for our WAVEWATCH III implementation.
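Since the slide itself is hard to read, here is a rough sketch of what a Dockerfile along these lines might look like. To be clear, this is an illustrative assumption, not Sofar's actual file: the base image, package names, pinned tag, and build steps are all placeholders that follow the shape described above (base distribution, libraries, a pinned upstream tag, then compilation).

```dockerfile
# Hypothetical sketch only: base image, package names, tag, and build
# steps are illustrative assumptions, not Sofar's actual Dockerfile.
FROM ubuntu:22.04

# Compilers, libraries, and tools a model build like this might need.
RUN apt-get update && apt-get install -y \
    build-essential gfortran git libnetcdf-dev libnetcdff-dev nco

# Pin an exact upstream tag so every rebuild is reproducible,
# even after the upstream repository changes (tag name illustrative).
RUN git clone --branch 6.07.1 --depth 1 \
    https://github.com/NOAA-EMC/WW3.git /opt/WW3

# Compile the model here; the real WAVEWATCH III build involves its
# own setup and switch files, so a placeholder stands in for them.
WORKDIR /opt/WW3
# RUN <model build steps go here>

# Expose a single executable for anyone who runs the container.
CMD ["/opt/WW3/model/exe/ww3_shel"]
```

Anyone with Docker installed could then build and launch this with something like `docker build -t ww3 .` followed by `docker run ww3` (image name hypothetical), without ever touching a compiler themselves.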
Each image name here is tagged for a specific build that we've done, and it has a semver tag associated with it, so it's telling me something about how big the change was that I implemented. Some of them have development tags, so I can point directly back to one of these and use it to rerun a model, knowing exactly what I've done, or link it back to some results that I've generated, or a paper. And then, again, with one line anyone can launch one of these containers and run the executable on their own machine. So this is the one line now, with the image that I generated, that I could pass to someone else, and they could run WAVEWATCH III. This has really been effective for us in terms of spinning new people up on our forecasting infrastructure: I can pass this to a colleague who might be working on a very different part of Sofar's mission, and without having to spend days figuring out all the libraries they might need, or what it means to compile WAVEWATCH III, they can actually just run it and get the output that they need. This has been extremely effective. And I think this is a shared vision across a lot of academia. This was a very recent, three-weeks-ago Dear Colleague Letter from Physical and Dynamic Meteorology at NSF, and they explicitly call out a new effort to supply a containerized version of WRF, which is an atmospheric model, which will lower the barrier to entry. I can see you standing there to give me the time signal, so I'm going to skip ahead.
I'm just going to talk quickly about one other thing that we've done. We've really worked hard to have data-proximate analysis (there was a really nice clinic on this earlier), and we're not implementing everything that was covered in that clinic, but what it has done for us is allow us to monitor, or have some intermediate analysis of, model simulations. I'll just talk briefly, and I'm happy to take questions about this, but we've built dashboards that produce results for us on the fly, with Slack integrations that give us a bunch of plots we can monitor every day for how our forecast is performing. In conclusion: we talked a lot yesterday about academia, agencies, and community collaborations, and I think industry can be a really effective partner in that collaboration as well; I'd love to talk more about that with you all over this conference. Software engineers teach and enforce really healthy development and maintenance habits for code, and that's been super helpful for me, and some of the tools and practices that we use in industry are portable back to academia. And lastly, if I can do it, I think you can too, and I would love to talk with you more about what this looks like in practice. Yeah, thank you so much.

Thank you, Christie, that was a really inspiring talk on best practices in software development for many of us here. There is a question in the back there.

Wonderful talk, thank you. I was wondering, since your last slide was about partnering, having the commercial world with academia, and so on: you're using WAVEWATCH III, and at the same time you have those many buoys in the ocean, so you actually observe what the waves are. Is there any feedback from your company to NOAA, who runs that operational WAVEWATCH III version, so you can indicate,
well, there is really a bias in this part of the ocean, and they can, you know, come out with a new version?

That is a great question. We do collaborate really closely with NOAA, and we have, I think, several proposals that work to either develop infrastructure for them that can be easily pulled into their own operational systems or, like you mentioned, use the historical data from our observation network to validate or calibrate their models. We work with, I think, some of the European space agencies as well to calibrate satellite data. So there is a lot of feedback between agency centers, Sofar, and the academic community as well. The last thing I did want to point out is that all of our historical data is freely available under an academic license, so if you're interested, please reach out and I'd be happy to share it.

Thank you. I'll allow one more question.

Yeah, sort of a science question: what data from these buoys is most important to improve the forecast, compared to satellite data and other data?

Yeah, that's a great question. I didn't get a chance to talk much about the interesting science that we're doing, but I think the most valuable thing that we've done with the data is really to assimilate the directional wave observations into the model. These models are spectral wave models, so they're solving, again, for the wave energy distributed in frequency and directional space. Instead of assimilating just a significant wave height, which is all that's available from satellite altimeters, we're able to actually assimilate the directional moments that are observed by the buoys, and the variance density, how energy is distributed in frequency. We see that this gives us the ability to allow waves to propagate further away with better skill than just pulling the entire wave height field to some value. I'm happy to talk more about that; it's the best part.
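That closing answer leans on the standard spectral definitions, which are compact enough to sketch: the zeroth moment m0 is the variance density E(f) integrated over frequency, and significant wave height is Hs = 4*sqrt(m0), which is what an altimeter-only assimilation would be limited to. A minimal pure-Python illustration with a made-up spectrum (the numbers are invented for the example):

```python
import math

def zeroth_moment(freqs, energy):
    """Integrate variance density E(f) over frequency (trapezoid rule)."""
    m0 = 0.0
    for i in range(len(freqs) - 1):
        df = freqs[i + 1] - freqs[i]
        m0 += 0.5 * (energy[i] + energy[i + 1]) * df
    return m0

def significant_wave_height(freqs, energy):
    """Hs = 4 * sqrt(m0), the standard spectral definition."""
    return 4.0 * math.sqrt(zeroth_moment(freqs, energy))

# Made-up spectrum: constant 1.0 m^2/Hz over a 0.1 Hz band,
# so m0 = 0.1 m^2 and Hs = 4 * sqrt(0.1) ≈ 1.265 m.
freqs = [0.05 + 0.01 * i for i in range(11)]   # 0.05..0.15 Hz
energy = [1.0] * len(freqs)

print(significant_wave_height(freqs, energy))
```

A buoy that reports the full E(f) (plus directional moments) lets the assimilation correct the shape of this spectrum, whereas a single Hs value only constrains the integral.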