So hello, oh my gosh, what a cavern. Hello and welcome; I'm going to have that guitar riff stuck in my head for the rest of the day. I'm Andrew Wiedler, the department chief for user support at Lawrence Berkeley National Laboratory, and I'm joined here by Gary Jung, who is the chief of our scientific computing team. We're here today to talk a little bit about our work as we move toward OpenStack and toward a concept we're calling the Science Accelerator Platform.

This is a different talk from some of the others here, because I'm focusing less on the technical aspects of how OpenStack works and more on what it means for us as an organization to embrace OpenStack. We're in a very different position from a lot of other users of OpenStack, so in some ways we're approaching this problem from a completely different perspective than many of the business-case users who are becoming part of the community. I'm also interested in reaching out to other folks here; I've seen some of your talks, and I know some of you are also working in the national science infrastructure space. Frankly, we also created this talk to meet you. There are a lot of salient topics to develop, so at any time, please stop and ask questions. This is actually the right-size crowd for that kind of talk, so if you want to interrupt, that's great. Let's make it more of a conversation, because frankly I think everyone over the past three days has probably been spoken to, or at, enough.

With that, let me lead on. Lawrence Berkeley National Laboratory is in an interesting position for embracing the OpenStack framework. Whereas a lot of organizations come to OpenStack as a way to increase user freedom, to increase users' ability to satisfy their own needs for storage, compute, and network features, we're already so highly decentralized that in some ways OpenStack is a challenge for us, because it represents more of a centralization of capabilities than we're used to. Lawrence Berkeley National Laboratory is very much part of the UC Berkeley campus, and we operate very much like a research university. So deployment in our context, as we think about not just what we want to do with OpenStack, but how the features and capabilities we already have roll into OpenStack and into the whole mission of supporting science, is a really interesting socio-technical problem, and I want to focus on both sides of that.

I want to address three key points: what role OpenStack will play in our user support infrastructure; who the users for this are, whether they're really end users, the scientists, or the facilitators, the folks on our side; and how we're looking at building our infrastructure and dependencies to manage OpenStack services, because there's a heavy lift in becoming part of the OpenStack community successfully. We are in no way resourced the way CERN and others are to create a large deployment, so we're going to have to come into this slowly, although we do have a set of very robust capabilities that I'm going to talk about.

So first off, what is LBNL? Just so everyone understands: we are not Lawrence Livermore National Laboratory.
Those are our dear friends 40 miles down toward the Central Valley. We're the small science laboratory above the UC Berkeley campus. We were founded in 1931; we're actually older than the national lab system, and most of the heavy elements in the bottom part of the periodic table were discovered at Lawrence Berkeley National Laboratory. Our traditional strengths have been high-energy physics and accelerator physics, but we've now branched into virtually all aspects of the sciences: biocomputation, genetics, thin-film materials, you name it, we're pretty much involved in it.

Our role is as a science laboratory, so everything we do is unclassified; we do no classified work at all. In fact, we have very open boundaries, both physically and computationally. We don't run a traditional firewall environment at all. Users can bring their own devices and do pretty much whatever they want. We do a fair amount of traffic management and scanning, but as long as you're not tripping any barriers, you're good to go. For me this was a mind-blower. I came to LBNL from the Department of Defense about a year ago, and just the idea of being able to carry my cell phone in my pocket was a complete revelation; in fact, I kept having nightmares that I was going to go to jail because I'd been in meetings with my cell phone.

We manage a number of key user facilities. The lab manages the NERSC facility, the National Energy Research Scientific Computing Center, one of the top five or ten computational resources available globally. That is not part of our group; NERSC is operated by Berkeley Lab, but it's not an organic part of our IT environment. Gary Jung runs the Lawrencium cluster, a condo cluster of about 30,000 cores, and we provide a lot of the workhorse computation within the lab. We also run the Advanced Light Source, which does a lot of soft x-ray imaging and is a major user facility for scientists around the world, along with the Molecular Foundry and other facilities. So we're really not just physics; we're involved in just about everything you can think of. And of course, on the computation side, since all science is now based in IT, that means we're essentially doing every type of computational problem you can imagine.

So who are our users? We try to figure out who they are because we're trying to understand the space they inhabit, and I think you probably know a lot of this; I know you folks in the back in particular are dealing with the same kinds of people we are. Our users on the national science side are highly diverse and idiosyncratic. They have a strong culture of do-it-yourself, of infrastructure by the PI. Every single science grant is basically its own IT campaign with its own IT problem, and you have to spin that infrastructure up very quickly and tear it down very quickly. There is very high decentralization of funding authority and mission: essentially a PI controls their own project and their own resources. Projects don't even have to use centralized IT if they don't want to, so that puts us very much in hustle mode. We have to go get their business, we have to earn their trust, and we have to keep it. They don't have to come to us at all; in fact, if they don't want to deal with us, they can go out, buy a set of racks, put them under a desk, and they're done.
And that's not ideal, for a variety of reasons. Users are primarily living on short money. If you look at proposals, about 80 percent of them fail, which means our customers are constantly having to apply for money. You can't read this chart, of course, because I made it too small, but that was actually on purpose, because any one of those areas makes the point I want to make. The approved proposals are some of that purple line, some of the green, and some of the gold; most of the proposals in our system at any given time are either going through the works or being rejected.

In terms of grant length, people are operating on yearly cycles: a lot of the grants run about 365 days, and very few are longer than that. That means people are constantly having to build and recreate projects on a short-term basis to keep themselves funded, which means they don't have a lot of patience for grand notions like big infrastructure and contributing to the good of the lab. They're not there for the good of the lab; they're there to run their individual science projects, and they have very little time, very little patience, and very little money with which to do it. And finally, the grant amounts are relatively small: that first line is about three-quarters of a million dollars. The grants are usually under a million, about a year long, and most grant proposals fail. That means people get really angry really fast if something goes down for a while, or if they can't get what they need very quickly. They have a very strong incentive to do whatever they have to do, including building their own IT, to make something happen quickly.

So what do we have? The natural organizational response to that kind of customer is in fact the organizational form we have now: we have all kinds of services, but they're not really connected. They're available; they're like Lego bricks scattered all over the floor. But we don't have a lot in the way of centralized planning, a strategic notion of how we support science, or a really good feel for how these pieces come together. Largely, they come together through the self-engineering of PIs who happen to have been at the lab long enough to know where they can get storage, where they can get compute, how they can work with Lawrencium, and where they can get backup. But our actual ability to reach out, figure out what they're doing, and work with them as a partner in their science endeavor is very limited, surprisingly limited in fact. We don't even have a central office that manages work-for-others. So there's a tenuous relationship between what PIs need and what we're doing.

We really have everything but OpenStack at present. I'm going to walk through the pieces we have, because we're in the process of moving our way toward the environment I think you all have embraced. On the networking side:
We're connected to ESnet, which is actually hosted at Lawrence Berkeley National Laboratory. That provides a very robust global framework for moving large amounts of data around the world, so we're closely affiliated with CERN, with pretty much any high-energy physics project you can imagine, and with just about any large geospatial or telemetry-type science project there is. At our perimeter we run a pretty completely open environment; we really don't run a firewall environment. Internally we have very robust, very high-speed Ethernet, and that's available to everyone. We have storage elements, and then we have the Lawrencium condo cluster, which provides about 800 teraflops of peak performance, resources condo-owned by virtually every science entity at the laboratory, where essentially anyone can come and share cycles. It's actually very well provisioned with an infrastructure to get people on and working quickly, but it's not connected to an OpenStack-type environment yet. And then of course we have this big shotgun blast: if you've driven around, you've seen signs like this, and that's essentially how we provide our services.

All right. What we have: we have a software farm, which provides a development environment onto Lawrencium. It gives people a very easy way to take their VMs, register them with the software farm service, and automatically receive a propagated environment that's ready for them to push jobs onto Lawrencium. We developed Singularity, a Docker-like but very lightweight packaging capability designed specifically for HPC; I'll show a small sketch of what that looks like in a moment. We're standing up a very robust Science VM capability, which is going to be used for VDI, for compute provisioning, and for prototyping. Much of that is going to be extremely low-cost; some of that is the nature of accounting at a national lab, where who knows what really gets charged where, but in fact we can offer that capability at a substantial discount to other options.

Globus and Druva: we're very highly involved with both. Backup via Druva we stood up this year, and we already have several hundred terabytes in it just from backing up people's endpoints. Globus we've now connected to Google Drive, and we're in the process of rolling that out; it provides an effectively unlimited-size endpoint for moving data around the world. We run Software Carpentry and Data Carpentry training, and we do all kinds of other things: training for Arduinos, training for building endpoints and deploying equipment into the field. I'm very interested, on our side, in how we grow capabilities in machine learning and how we look at telemetry enhancements. We have very permissive cyber and network management. We have access to the commercial cloud, and we're busy putting master payer arrangements in place; up until now we really haven't had those, so people can go out with their own credit card, get on Amazon, take lab data, put it out onto Azure or wherever, and we have absolutely no ability to know what they're doing, which is odd. Wide-open software availability. We provide JupyterHub, and we've been heavily involved in that project since the start. And we've gone fully into the G Suite environment, both G Suite and, increasingly, Google Compute.
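Since Singularity came up: here's a minimal sketch of what that kind of HPC-oriented packaging looks like in practice. This is an illustrative definition file I've written for this talk, not one of our production recipes; the base image and the Python payload are assumptions for the example.

```
Bootstrap: docker
From: ubuntu:16.04

%post
    # Install the scientific payload inside the container image
    apt-get update && apt-get install -y python3 python3-numpy

%runscript
    # Whatever the user passes on the command line runs inside the container
    exec python3 "$@"
```

With recent releases, a user builds this once into an image (singularity build analysis.sif analysis.def) and can then run it unprivileged on a compute node (singularity run analysis.sif my_script.py). That unprivileged execution is what makes the model attractive on a shared HPC condo like Lawrencium.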
So the point of all this is: we've got a ton of stuff, but none of it is knit together. It's not a platform. It's not a common environment for science capability provisioning. We've created this huge, very flexible framework, but flexibility has its costs, and I think that's what we're starting to realize. That's one of the drivers for wanting to engage in greater collaboration with folks like you, who are developing capabilities we're very interested in and want to contribute to.

We have tons and tons of short-term projects, all doing their own thing, and we are basically not involved with them. The larger projects optimize for their own needs and actually deploy their own shadow IT; in fact, we have facilities with compute racks that we don't touch and have nothing to do with. We have no idea what the total compute capability of the national lab currently is. What that means is you have shadow IT, and really also postdocs, filling many gaps, and that's actually terrible. To give you an idea of the mess, I took all of the proposals we received this past year and made a word cloud of them. Word clouds kind of suck no matter what you do, but what's telling about this one is that it's just a mess: we're into everything, we're touching everything, and there's no particular order or structure to that cloud.

What this also means is that when you ask the postdocs, the least empowered segment of our community, what they spend their time on, they're losing a lot of their time, which is crazy. At the lab level, this shadow IT is taking up something like 10 to 20 percent of their time. That has a real cost we're not measuring: a cost in careers that are disrupted, in proposals that aren't written, in people who become disaffected with the scientific enterprise because they're not doing science, they're doing IT. That's our job; we should be doing that, and they should do the science. But we don't account for any of that cost, so what we're trying to do is think about how we can build sets of services and capabilities to improve on that.

I wanted to go through a quick set of examples of the kinds of science activities going on, just to give a feel for how decentralized some of these things are. One project we're working on is the airborne radiological enhanced-sensor data pipeline, called ARES. This is work where we've been doing a lot of geospatial sampling, flying a helicopter with sodium iodide detector logs all over the US, all over the Southwest, doing very detailed collections of radiological background environments and then mapping from that. What's interesting from a science standpoint is that you're trying to increase your ability to detect radiological objects at a distance by having a better prior model of what the background radiation environment looks like. In this particular activity, the IT component is really on the data collection and compute side: essentially we're supporting the transfer of data from Nellis Air Force Base to LBNL via Globus, as the sketch below illustrates.
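To make that data movement concrete, here's a minimal sketch of what a scripted Globus transfer looks like with the globus-sdk Python package. The endpoint IDs, paths, and token handling below are all assumptions I've invented for illustration; this is not our production pipeline.

```python
import globus_sdk

# Assumes a transfer-scoped access token obtained through the usual
# Globus OAuth2 flow; the IDs and paths below are hypothetical.
TRANSFER_TOKEN = "..."
SRC_ENDPOINT = "source-endpoint-uuid"   # e.g., a field-site transfer node
DST_ENDPOINT = "dest-endpoint-uuid"     # e.g., an LBNL data transfer node

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN))

# Describe the transfer; checksum sync means re-runs only move changed files
tdata = globus_sdk.TransferData(
    tc, SRC_ENDPOINT, DST_ENDPOINT,
    label="ARES flight data", sync_level="checksum")
tdata.add_item("/flights/2017-04/", "/project/ares/raw/", recursive=True)

task = tc.submit_transfer(tdata)
print("Submitted transfer, task id:", task["task_id"])
```

The nice property for a pipeline like this is that the transfer is fire-and-forget: Globus retries and verifies on its own, so the field site doesn't have to babysit the link.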
We're doing the compute in our condo facility, but not as part of our Lawrencium cluster; it's in a separate rack of equipment that's not part of our management environment. So we're not sharing any of those resources, and the utilization rate on that rack can be relatively low. We're also piping the data out to vendor equipment, where they're testing the algorithm. So that's one model for how we're supporting an effort.

Another model, on the biocomputational side, is LEGO and the Noctua system. These are sets of databases, essentially a system of engineering and analytic tools where people can express what a gene is, express its genetic expressions, how these things combine, and develop the mappings for that. It's a set of databases that the scientists themselves spin up, 40 or 50 at a time, on AWS and supply to customers around the world. They're gathering the data back into object models that are contained centrally back at LBL, but we have nothing to do with it; we have no involvement with this at all. They're running all of it themselves: managing the AWS contract, duplicating resources we've already got, spinning stuff up and down. The only involvement we have is essentially making sure they're not blocked by any of our internal systems.

And finally, a third example: geologic carbon sequestration. This is a heavy compute project, and it's something we are doing on the Lawrencium cluster, so they're in our condo and they're in our management layer, which is nice. They're doing a large finite-element computation of subterranean geologic features: you're processing signals collected from wells as you do test injections of CO2, modeling the flow of that material through subterranean structures and how it interacts, using legacy codes that LBNL has worked on for probably 25 years. So: a completely different kind of activity, completely different management of how the project works, and a completely different use of resources across all three projects.

So then, what's the niche for OpenStack? We see that there is a key niche, and that's why coming to this conference and interacting is important to us. We've already developed a set of organic capabilities, and our user base runs the gamut in the services they need and desire. But we're coming at this problem from completely the reverse direction: we're so decentralized that putting in OpenStack is in fact centralizing things. It changes the service model from "Oh, I know Bill, let me have him set up the VPN for us" to using a user-based system to support that, which is different from some of the commercial models, like what Verizon and other folks are using this system for. So what's the model here for OpenStack? Are our customers the scientists, the PIs? Are they the folks who are going to be using this?
Well, we're not sure. It may in fact be that this just becomes a fabric within our internal management capabilities, and it's not clear to us that it becomes part of our high-performance or scientific computing at all. This may be part of a mid-tier activity we're engaged in to unify some of our capabilities; I'll talk a little bit about the thinking there in just a bit.

This problem is endemic to all of scientific support right now, and I think everyone is experiencing it as you look at the commercial cloud and how services are managed there. When you have researchers going out to AWS or Azure, there's a substantial training burden put on your IT infrastructure, even though you're not providing those services. Because if you throw someone out to AWS, what do they see? They see a page like the one on the right, with 50 different services. Should they use EC2? What kind of storage should they use? How do they manage it? Is it on the spot market? How do they control their costs? None of those issues are worked out, and most scientists are in fact not data scientists. They don't come in organically knowing how to set up an ingest pipeline, a compute cluster, storage, output, whatever it is they need, unless they have postdocs, and that's not what we want to have happen in the first place. So there's a substantial burden in just throwing people out into the commercial cloud.
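To give a feel for that burden: even the "simple" path of renting a single spot instance already asks a scientist to choose a region, a machine image, an instance type, a bid price, and a key pair. Here's a deliberately minimal, hypothetical boto3 sketch; every identifier below is made up for the example.

```python
import boto3

# All values here are placeholders; a real request needs a valid AMI,
# key pair, and security group for the chosen region.
ec2 = boto3.client("ec2", region_name="us-west-2")

response = ec2.request_spot_instances(
    SpotPrice="0.10",          # max bid in USD/hour; choosing this is on the user
    InstanceCount=1,
    LaunchSpecification={
        "ImageId": "ami-00000000",     # hypothetical machine image
        "InstanceType": "c4.xlarge",
        "KeyName": "my-lab-keypair",   # hypothetical SSH key pair
    },
)
request_id = response["SpotInstanceRequests"][0]["SpotInstanceRequestId"]
print("Spot request submitted:", request_id)
```

And that's before storage, data egress charges, or remembering to terminate the instance when the job finishes, each of which is another opportunity for a postdoc to lose an afternoon.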
That same burden also exists on the OpenStack side, if we just present it as an open environment for research. Both internally and externally with these commercial offerings, we have these hodgepodges of all the things you could do. But how do you actually tailor it for the scientific environment these folks find themselves in, which is short money, short time frames, high stress, and a specific orchestration path? What that means for us is that if we just build OpenStack, without thinking about how it will be used as a platform for science, we'll build it and they won't come. So I think we'll have more success actually building the interface first.

What we're doing is walking our way toward the on-prem cloud: taking all the features we currently have, developing them into a framework more tightly coupled to the scientific workflow and the scientific conduct of business, and then over time enabling that with OpenStack. And here's where we're learning from a variety of folks in the OpenStack community. I think Australia's NeCTAR capability is a fabulous model for us to learn from, and I appreciate you coming; I see you right there. You've been very generous with providing us access and an AAF account, and we really appreciate that. There are other models with CERN, and with UCSD and the National Data Service, which we're trying to connect up with. PNNL, our sister laboratory, has an OpenStack environment they call PIC, which they use to provision compute resources to their researchers. So we're engaged in a process of trying to learn from all these cases and saying: it's not enough to provide network features, storage features, compute features, and just throw them out there. We're a science laboratory; how do I tailor these things specifically for the mission we have? That's the challenge.

Compute Canada is also a very interesting example. They're engaged in a really interesting examination of how they provide resources to their community, but they have an advantage over us in that they're highly centralized. When Compute Canada sets up a condo, Compute Canada owns all of the compute resources that go into it and provisions them as a grant to their researchers. In our case, the PIs individually own all the compute resources that go into a condo. That's a completely different rationale for how you get people to share resources, how you build a common framework, and how much you can actually get them to work in a common environment.

All right, so what is the scientific workflow? And I do apologize, this is far more visible on my screen than on yours. At the top, you have a scientist trying to identify grant opportunities. They're trying to register and apply for grants, obtain funding, and establish a research project. They're trying to manage and perform research, analyze and adjust their research, and there may be a feedback loop as they change their data pipeline and their analytic pipeline; there's certainly a process, with R and other tools, as they think about how they're analyzing the data and what they're getting from whatever experiment they're conducting. And there's project closeout. Underpinning all of those are sets of tasks around publishing; they've got to do outreach, teach, and mentor, even though they don't necessarily want to; and they have to maintain disciplinary situational awareness, keeping current in their field, because you can't apply for grants if someone else has already gone down a well-trodden path.

So what we're trying to do is identify, in those spaces, the functions that have to be performed by our researchers, and then take those functions and describe the sets of tools and capabilities that fit into each one, which will shape,
ultimately, what the front end of our cloud implementation will look like. So first, let's take the tools and capabilities we have and mold them into this framework, and second, deploy a cloud framework underneath it.

Initially, if someone wants to find grants, they're going to log in to some sort of grant-watcher dashboard. This is a set of tools, maybe initially iframing capabilities we've already got; right now those tools are just spread and scattered all over the environment. They may have to manage external grant sites, and we're looking a lot at the Open Science Framework. It has a set of tools for registering science projects: registering where data is stored in those projects, where data is published, where it's provisioned, how it's transported, and who the collaborators on a specific project are, and making all of that globally searchable. I think that's very interesting; that's a model we may promote. And then of course there are tons of library resources.

At the next stage, applying for grants, there's going to be a whole set of capabilities to leverage our existing tools to manage grant proposals and grant writing, including looking at how we automate the generation of research data management plans. How can we make that easy for folks, and how can we tie it into the orchestration expressions that would exist in a cloud for running a project?

A big problem we have is that a lot of scientists come in to use the user facilities. They're on site for about six or eight weeks, maybe longer, running, say, a beamline at the Advanced Light Source. That means they run all the instrumentation on the beamline, they collect all the data on the beamline, they build their own research IT on that beamline, and they have to understand, quickly, all of the safety, regulatory, financial, and physical connectivity requirements needed to maintain and build on that beamline. So one thing we're looking at a lot is recommendation features, knowledge bases, chatbots, and other capabilities that synthesize multiple sets of guidance and provide people with the information they need. And then there's a set of standard collaboration tools that we'd provide.

Okay, and then finally you get to the actual science part: now that I've got an ingest pipeline working, now that I'm collecting data and beginning to compute, how do I get people into Lawrencium quickly? How do I get people into the Science VM space quickly? How do I provision that in a way that's salient to the needs they have at that phase of the project? That's where we're looking at providing these sets of tools and capabilities through standard, task-focused dashboards that amount to more of a platform.
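As a rough sketch of what "get someone into the Science VM space quickly" could look like once an OpenStack layer sits under those dashboards, here's a minimal example using the openstacksdk Python library. The cloud name, image, flavor, and network below are placeholders I've invented for illustration; none of this is an existing LBNL service.

```python
import openstack

# Connect using a named cloud from clouds.yaml; "science-vm" is hypothetical
conn = openstack.connect(cloud="science-vm")

# Look up resources a task-focused dashboard might preselect for the user
image = conn.compute.find_image("analysis-base")     # assumed image name
flavor = conn.compute.find_flavor("m1.medium")
network = conn.network.find_network("project-net")   # assumed project network

server = conn.compute.create_server(
    name="beamline-analysis-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
# Block until the VM is ACTIVE so the dashboard can hand back an address
server = conn.compute.wait_for_server(server)
print("VM ready:", server.name)
```

The point of the dashboard layer is that the researcher never sees these choices; the platform makes them based on the task at hand.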
And then finally you get to the broader functions people have: science communications, library research, publishing, putting up websites. We run all of these capabilities now, but they're not fit into any kind of framework that someone can quickly find.

So in our case, what is going to drive OpenStack at Lawrence Berkeley National Laboratory is actually not direct user demand. The scientists are not interested in making the infrastructure of the laboratory better or more robust; on our side, we are interested in helping them do science more effectively. So it's going to be a process of experimentation, where we provide things, in many cases for free, as people come in, see how they respond, and see whether they can actually demonstrate productivity gains. It's not clear that OpenStack by itself, if you just throw it out there, actually makes their job any easier. Right now, if they need a separate network feature, or a port opened, or they need to register server logs with us, we do that with a hand touch, very quickly; it's not clear an interface is really going to streamline that.

But there are still very good reasons why an organization like ours wants to get into the OpenStack space. First is collaboration with other sites and other capabilities around the country that are doing this. Second, we want to broker the relationship between the commercial cloud and our on-prem compute resources more effectively. We don't want data going out, even if it's not data we own, without a good sense of where it's going: Has it been backed up? Is it being managed effectively? Are we tripping any PII restrictions or anything else like that?

And third, we want to provide better federated data and query services. Right now we don't do a great job supporting meta-analytics, because we don't have a central query-all capability. For example, there are 40 years of beamline collections at the Advanced Light Source. That's all data that people around the world could do a lot of good with if it were queryable and available, but guess what? It's not in a common place, because we don't own the data. People come in, run their own beamline with their own funds, collect their data, and take it with them when they go. We don't have a mechanism in place to even gather that stuff in.
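To make the missing "query-all" capability concrete, here's a deliberately tiny sketch of the kind of metadata catalog that would let someone search across beamline collections without owning the underlying data. The schema, record, and URIs are all invented for illustration; nothing like this exists today, which is exactly the point.

```python
import sqlite3

# A hypothetical catalog: one row of metadata per dataset, with the bulk
# data living wherever the PI put it (tape, Globus endpoint, cloud bucket).
conn = sqlite3.connect("beamline_catalog.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS datasets (
        id INTEGER PRIMARY KEY,
        beamline TEXT,
        pi TEXT,
        collected TEXT,
        technique TEXT,
        storage_uri TEXT
    )""")

# An invented record standing in for one beamline run
conn.execute(
    "INSERT INTO datasets (beamline, pi, collected, technique, storage_uri) "
    "VALUES (?, ?, ?, ?, ?)",
    ("ALS 8.3.2", "example-pi", "2017-04-10", "soft x-ray tomography",
     "globus://example-endpoint/als/run42"))
conn.commit()

# The query-all capability we lack: find every tomography dataset, anywhere
for beamline, uri in conn.execute(
        "SELECT beamline, storage_uri FROM datasets "
        "WHERE technique LIKE ?", ("%tomography%",)):
    print(beamline, "->", uri)
```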
All of those features actually drive the organization toward things like OpenStack, and we look forward to becoming part of the community over this next year as we stand up capability. But there's going to be a lot of experimentation on our side to figure out how we make sure we're providing the right set of capabilities to the right audience, which in many cases is not the scientists directly; it's us doing things behind the scenes to try to make the science environment more effective.

So, the conclusion. In our case, OpenStack is not about cost. It's not about efficiency by the measurable metrics we have. It is about efficiency in the sense of reducing the burden on our poor postdocs, and on other folks like them whom we want to keep in their disciplines, producing great science. And we may not go it alone. We're very interested in partnerships, in asking whether we use this as a natural mechanism to buy into someone else's OpenStack or set of capabilities, maybe as a multi-lab activity. But I think uptake is going to be extremely organic. We're not just going to go out, deploy a large instance, though we have the capability to do that, and roll it out to our science base, because frankly people would probably poke it with a stick and keep doing what they're doing on their own, given the financial structures they have. So it's going to be a process of proselytizing, and of figuring out how we get into their workflow to make that happen. And with that, I'm done. Thank you all very much.