So first of all, thank you very much for coming to this session. My name is Sebastian Stadil. I'm the founder of Scalr, and it's my great pleasure to introduce Jonathan Chang, whom you saw at the keynote yesterday. He's the chief engineer at NASA JPL, and he's going to be talking about the hybrid cloud they've been building: the architecture, the design decisions, and all of that. So first of all, perhaps you can talk about JPL, and then we can get started.

Sure. You know, at JPL we like to brag, so I'm going to talk a little bit about our missions and try not to be completely repetitive. Can we get the slide back on? We just lost the slides. All right, welcome back. Sebastian went through this part, so I'll talk a little bit about JPL. We manage and maintain a lot of missions. You might have seen the keynote yesterday. In 2012 we landed the Curiosity rover on the surface of Mars, and it has been a significant accomplishment for JPL: we've found evidence that Mars could have supported life millions of years ago. For our upcoming Europa Clipper mission, we believe that under the icy surface, the icy crust, of Europa, a moon that orbits Jupiter, there is liquid water, which could sustain and host life. On Earth, wherever we've found water, in favorable or unfavorable conditions, we've always found microbial life.

This is the Surface Water and Ocean Topography (SWOT) mission, which is going to launch in 2020. It represents a significant change for JPL: this mission can downlink 40 terabytes of synthetic aperture radar data on a daily basis, and that's level-zero raw data, which grows substantially once it's processed into higher-level products. So we have a very significant data-volume problem at JPL that we're trying to address.

This is the Mars 2020 rover and a representation of the various instruments it's going to carry. It's essentially a reboot of the Curiosity rover with a different kind of science, and it's going to launch in 2020. Why 2020? Well, roughly every two years we have a launch window where Earth and Mars line up favorably, so we can get to Mars in about eight months instead of relying on gravity assists and all of that. And we can't do any of this without the Deep Space Network, which is our pipeline to our various deep-space exploration vehicles. As you might know, Voyager, which was launched in the 70s, has actually left our solar system and is in interstellar space, and we're still receiving data from it today.

So, the good stuff. All these missions are quite unique. Every one has a specific workload, and they all have different requirements. In fact, you could almost say that we need a cloud for every one of these projects. Synthetic aperture radar is a different workload than, say, a deep-space mission that's bringing back images and things like that. So for us, hybrid is almost a bespoke or boutique kind of model. Some clouds are really good at doing specific things, and some aren't. For example, we moved the entire Mars public outreach website to Amazon back in 2012 to accommodate the huge amount of public interest. We used services like CloudFront and S3 to serve out the video stream and the image repositories for all of the raw images coming down from the Curiosity rover. They're still there today, and you can reference them just using the HTTP endpoint. People have built mosaics based on the images we host there; people have built their own websites.
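Since those raw-image repositories are plain HTTP endpoints served through CloudFront and S3, pulling a frame down takes only a few lines. This is a minimal sketch; the URL and filenames below are placeholders, not the actual JPL endpoints.

```python
# Minimal sketch: fetch a publicly hosted raw rover image over HTTP.
# The URL is a placeholder, not the real JPL CloudFront/S3 endpoint.
import requests

IMAGE_URL = "https://example-raw-images.cloudfront.net/msl/sol-1000/NAV_000001.jpg"

resp = requests.get(IMAGE_URL, timeout=30)
resp.raise_for_status()

with open("rover_frame.jpg", "wb") as f:
    f.write(resp.content)

print(f"Saved {len(resp.content)} bytes")
```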
Can you talk a little bit about the Reddit incident?

Oh, yeah. We made the front page of Reddit, interestingly enough, when our Opportunity rover drew a phallic symbol on the surface of Mars, and it was so popular it actually broke some of our websites.

So each workload really translates to an ideal home. As we talked about, the public cloud is fantastic, right? We want to rent for the spike. We can predict, for, say, 40 terabytes of downlink, how much forward processing we need to keep up with that 40 terabytes, to take that level-zero data into a level-one data product. But what we can't predict is how many times we'll want to reprocess that data. A scientist can come to us and say, hey, we've got a whole different algorithm, we want to find out something different, we want to take the entire multi-petabyte data set and reprocess the whole thing based on this new algorithm we've come up with. So how do you buy for that? Do you run to the data center and say, yeah, in three weeks these guys want to reprocess two petabytes of data, so let's go stand up some servers? No, that's not the right model for us. We actually want to rent for the spike, and we've been using Azure and Amazon for that. So: forward processing, keeping up, in-house, and all the elasticity out in the cloud.

You're mentioning here as well that the bandwidth to some instruments is a big constraint. That's one of the primary reasons for using OpenStack, right?

Right. So, my straight man over here. Not only are data locality and data movement challenges, but also compliance. We are required to comply with the International Traffic in Arms Regulations, ITAR, which essentially says that only US persons can have physical and logical access to our data. It's easier for us to do that in-house, and some of the clouds are catching up with those kinds of compliance requirements, but they're not all quite there yet. So for us, our hybrid cloud today is Amazon, Azure, and our on-prem OpenStack. Speaking of our on-prem OpenStack, we've been using Nebula, and we still are, unfortunately. But we had a plan B early on, and with some help from some vendors sitting in the room, we were able to deploy a secondary cloud that we're working toward operationalizing.

So hybrid cloud is very challenging. It introduces a lot of things: diverse APIs, multiple identities for each of the clouds, obvious security concerns for us with compliance and ITAR, governance, and software duplication. Boy, that didn't come out right. Sorry. So we looked for tooling to help with portability, the ability to reuse our work and move it across multiple clouds. And we needed automated tagging, and along with automated tagging, a chargeback/showback system. Tagging is really important for us, because in shared cloud accounts, tags are essential to understanding accountability. In Amazon, for example, tags aren't enforced or automated: you can have a policy that says whatever you spin up, you need to tag, but it's really hard to enforce that, except by somebody knocking on your door and saying, hey, you haven't tagged your instances. So we needed some automation to help us with the governance of that, because tags are essential for us to do chargeback. We're a full cost-accounting organization; we charge you back for everything from your phone to your laptop to your network port.
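As a rough illustration of the kind of tag-governance automation being described, here is a minimal sketch that audits a shared AWS account for instances missing the tags a chargeback system would need. It assumes boto3 credentials are already configured, and the required tag keys are hypothetical, not JPL's actual tagging schema.

```python
# Minimal sketch: find EC2 instances missing the tags needed for chargeback.
# The required tag keys are illustrative, not an actual JPL schema.
import boto3

REQUIRED_TAGS = {"project", "owner", "environment"}

ec2 = boto3.client("ec2")
untagged = []

paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            tags = {t["Key"] for t in instance.get("Tags", [])}
            missing = REQUIRED_TAGS - tags
            if missing:
                untagged.append((instance["InstanceId"], sorted(missing)))

# In practice this report would feed a governance workflow or a CMP policy,
# rather than someone knocking on doors.
for instance_id, missing in untagged:
    print(f"{instance_id} is missing tags: {', '.join(missing)}")
```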
So everything at JPL is a chargeback service, which is why tags are so integral to doing chargeback and showback. And license tracking: understanding the usage and budgets for our operating-system licensing costs in the various clouds. And definitely self-service. We're also looking for a common API layer to abstract those diverse cloud APIs. And JPL is really concerned with vendor lock-in. We're actually required to compete all of our contracts for software and vendors and things like that, so we just went through an entire RFP for our cloud contracts; that's in process right now. And that's good, because it's taxpayer money, right? We can't favor any single vendor.

Can you talk a little bit more about the portability aspects, the choice of Amazon and Azure for the public cloud?

Right now, there are two reasons for Amazon as a public cloud for us. One is that they have an ITAR-compliant region called GovCloud, so we can leverage that and put our ITAR workloads in GovCloud. And Azure we've used since about six or seven years ago; we built our Be A Martian public outreach websites there. But we're not super deep into Azure yet; they don't have that ITAR compliance for us today.

OK. So, making hybrid cloud work: that's where we found Scalr. Here's roughly the architecture. We really wanted a modular approach to our decisions in building a hybrid cloud. On the very left, the red part, we needed a cloud management platform, something that would help us abstract those various APIs and the identities and access. We looked at a lot of things, and I won't name the ones we saw, but we got sticker shock from a lot of them. What really attracted us to Scalr was the fact that it's open source; they upstream all their code. I'm gun-shy about small companies after Nebula, but the fact that they contribute all their code to open source gives me a little bit of confidence. The second module, underneath that, is all the various clouds. We got a nice surprise when Nebula announced on April Fool's Day that they were shuttering their doors. But because there's an abstraction layer, we could still offer our users Amazon instead, or Azure instead. We can keep our cloud infrastructure going even though our Nebula infrastructure is no longer supported, and we can introduce new vendors or new clouds very easily, because that abstraction layer exists.

Can you talk a little bit about SaltStack and the different applications?

Yeah, so the final layer is an automation layer. We use Salt; we also use Chef and Puppet. What we found is this: cloud is infrastructure as a service, essentially, right? You're going to give people API and console access to build infrastructure. But at the same time, they need to do things for us, like meeting NIST controls for every single one of our instances that's running: things like your message of the day, the NASA warning banner, password length and expiration. The list goes on and on. NIST 800-53 essentially describes everything we need to do to a server before it is compliant. In the past, you would launch a server and start installing packages and setting password expiration policies by hand. What we've done with Salt is capture all those protective measures we need to meet NIST in a Salt configuration.
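To make that concrete, here is a minimal sketch of what two of those protective measures might look like as a Salt state, written with Salt's Python (#!py) renderer so the example stays in one language. The state IDs, file paths, source locations, and values are illustrative only, not JPL's actual hardening baseline.

```python
#!py
# Minimal sketch of a Salt state (Python renderer) capturing two of the
# protective measures mentioned: the warning banner and password expiration.
# Paths, sources, and values are illustrative only.

def run():
    return {
        # Manage the message-of-the-day / warning banner from a file kept in
        # the state tree (hypothetical salt:// source path).
        "warning_banner": {
            "file.managed": [
                {"name": "/etc/motd"},
                {"source": "salt://hardening/files/motd"},
                {"mode": "0644"},
            ],
        },
        # Enforce a maximum password age in /etc/login.defs.
        "password_expiration": {
            "file.replace": [
                {"name": "/etc/login.defs"},
                {"pattern": r"^PASS_MAX_DAYS\s+\d+"},
                {"repl": "PASS_MAX_DAYS   60"},
            ],
        },
    }
```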
And on instance launch, you can pass user data that goes to the Salt repository, installs the Salt agent, and within about 15 minutes all the protective measures are worked into each of our instances. So that's the automation layer. We've also created an API for things like DNS, LDAP, and our chargeback system, so that when an instance comes up, it can immediately register in our chargeback system, get a hostname, and add all the LDAP users from the group. That's the kind of automation layer we're talking about. So really the model is to provide services for service providers. We know that application developers are building application platforms for our end users, so if we can simplify all that lower-level heavy lifting — applying the protective measures, registering in LDAP and DNS — we can simplify and accelerate their process. So that's our modular approach.

So, how does Scalr help? I'll take over a little bit. We were talking a bit about how making hybrid cloud work is fairly difficult. Just as background, Scalr is a control plane and abstraction layer across multiple clouds, and it gives JPL a lot of policies to handle the compliance and all of that. One of the big things Jonathan was talking about is portability of an application from, say, OpenStack to Amazon for things like cloud bursting. Some of the experiments they're doing are HPC and embarrassingly parallel workloads: when they want to run one of their radiation simulations, for example, they need to be able to burst out from their OpenStack cloud to the public cloud, or vice versa. The combination of using Salt for configuration and Scalr's abstraction object called a role — an abstraction that sits on top of the different clouds — gives them a lot of portability. Do you want to add anything on the cloud bursting side?

You know, what Scalr is really letting us do is this: imagine you're a scientist and you need to do, say, radiation simulation on a specific instrument, and you need to do a lot of it. Instead of that scientist traditionally going out, buying a large supercomputer, and clustering it, what we did using Scalr was create an instance with the software — it's called Geant4 — already pre-compiled. Then we put an auto-scaling threshold on it and let that user run, and it would just provision, provision, provision, burst into Amazon, and then deprovision. That user got 10 to the 11th — 100 billion — radiation simulations of electrons done in a relatively small amount of time. So the idea is really building up platform services that your users would traditionally have gone out and built a multi-tier application for, but you can package all of that into a Scalr farm using roles and then use the Scalr API to deploy the entire package.

Yeah, and harking back to the architecture slide: because all the scientists go through the Scalr control plane, Scalr can ensure that everything is properly tagged with, say, the experiment, the line of business, the user, and whether it's production or development — all the things that let you do the showback and then the ingestion into the financial systems. So tagging governance lets you automatically tag things.
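Pulling those launch-time pieces together, here is a minimal sketch of what the bootstrap could look like when launching directly against EC2: the user data installs the Salt minion and applies the hardening states, and tags are attached at launch so chargeback works from the first minute. The AMI ID, master hostname, state name, and tag values are hypothetical, and in practice a control plane like Scalr would be doing this on your behalf.

```python
# Minimal sketch: launch an EC2 instance whose user data bootstraps the Salt
# minion and applies hardening states. All identifiers are hypothetical.
import boto3

USER_DATA = """#!/bin/bash
# Fetch and run the Salt bootstrap script (typically mirrored internally),
# pointing the minion at an internal Salt master (hypothetical hostname).
curl -L https://bootstrap.saltstack.com | sh -s -- -A salt-master.example.internal
# Apply the hardening states right away (assumes the minion key is
# auto-accepted or pre-seeded on the master).
salt-call state.apply hardening
"""

ec2 = boto3.client("ec2", region_name="us-gov-west-1")  # e.g. GovCloud
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical hardened base image
    InstanceType="m5.large",
    MinCount=1,
    MaxCount=1,
    UserData=USER_DATA,
    TagSpecifications=[{
        # Tags attached at launch so chargeback/showback works immediately.
        "ResourceType": "instance",
        "Tags": [
            {"Key": "project", "Value": "europa-mbse"},
            {"Key": "owner", "Value": "jdoe"},
        ],
    }],
)
```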
One of the things Jonathan really likes is the unattended restarts. Scalr has a desired-state engine, the same idea you get from Heat or CloudFormation, with a desired state, an observed state, and reconciliation between them. Do you want to talk a little bit about the unattended restarts and how that's going?

Yeah. A Scalr-provisioned instance could be, let's say, a web application, and if we just set the autoscale threshold to one, then if it's terminated by Amazon or by anybody else, it just automatically re-launches the instance from the farm it was deployed from.

Then there's also the autoscaling, or cloud-bursting, aspect, basically for the radiation simulation clusters: being able to run a certain workload across clouds regardless of where they are.

Yeah. As far as the abstraction layer goes, ultimately what we're leveraging Scalr for is this: if we can start pushing the users up the stack, where they're not thinking about the cloud or the infrastructure that any of this stuff is running on, then we can switch all those things out underneath, or put workloads in the right cloud based on cost, based on competing costs. We can say we'll put it on our OpenStack cloud because it's the lowest cost, or we'll put it in Amazon because we're going to save some licensing costs, or avoid the complexity of managing an enterprise licensing agreement with Microsoft or Red Hat or somebody like that. We can do a combination of all these things. You can have a multi-tier application where maybe the Windows portion runs in Amazon and the Linux portion runs at home, and that's the abstraction Scalr really provides.

All right. Then there's also resource reclamation, currently something that's in the works at JPL but not yet in use. The idea here is that when you allow broad self-service across a large fleet — when you have a lot of developers just going to the cloud and provisioning their own things — over time you end up with a growing number of orphaned servers, stuff that's left running and just accrues cost. So resource reclamation and garbage collection are important, and we're working to add that. We also talked about being able to show back the different costs and ingest that into the financial systems.

The cost analytics capability isn't perfect — and that's not on Scalr — but it's good enough now to give our users a ballpark of what their total costs are. There are some overhead costs and things that Scalr cannot track, but it gets you about 80% of the way on cost analytics. So we can give that to the individual user, and they can anticipate maybe 20 or 30% more than whatever the cost analytics are showing them.

Yeah, and to that effect, there are actually some really good dedicated cost-management solutions out there that you can plug into for more control over the cost layer. And then the high-level API over multiple clouds — we've rehashed that multiple times already. All right, and the last thing Jonathan was talking about is license tracking: through a combination of webhooks into their CMDB, you can track at all times how many licenses are being used in the system, so that if you're approaching the maximum, you can start taking action on it.
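The desired-state engine mentioned at the top of this exchange is easy to caricature in a few lines. The following is a toy reconciliation loop, not Scalr's (or Heat's or CloudFormation's) actual implementation: a hypothetical cloud client reports observed instances, and the loop launches or terminates to converge on the desired count, which is how a threshold of one gives you unattended restarts.

```python
# Toy sketch of desired-state reconciliation (not a real CMP engine).
# CloudClient is a hypothetical stand-in for any cloud API.
import time


class CloudClient:
    """Hypothetical cloud API wrapper, just enough to make the sketch run."""

    def __init__(self):
        self._instances = []

    def list_instances(self, farm):
        return list(self._instances)

    def launch_instance(self, farm):
        self._instances.append(f"{farm}-{len(self._instances)}")

    def terminate_instance(self, instance_id):
        self._instances.remove(instance_id)


def reconcile(cloud, farm, desired_count):
    """One pass: converge the observed state toward the desired state."""
    observed = cloud.list_instances(farm)
    if len(observed) < desired_count:        # e.g. Amazon terminated a node
        for _ in range(desired_count - len(observed)):
            cloud.launch_instance(farm)
    elif len(observed) > desired_count:      # scale back down
        for instance_id in observed[desired_count:]:
            cloud.terminate_instance(instance_id)


if __name__ == "__main__":
    cloud = CloudClient()
    for _ in range(3):                       # a real engine loops continuously
        reconcile(cloud, farm="web-app", desired_count=1)
        print("observed:", cloud.list_instances("web-app"))
        time.sleep(1)
```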
Yeah, and the very last thing — maybe you want to talk about the self-service portal and how you're a service provider to—

So, I talked about this very quickly, and I apologize for going so fast, but we talked a little bit about abstracting the underlying data center, hypervisor, and cloud away from users. What we've really leveraged — and we realized this after using Scalr for a year, and no knock on Scalr, but it takes a little bit of training to understand its full capabilities; you don't just give somebody the Scalr interface and have them go, wow, I'm ready to go — is that the abstraction API is very powerful. We can take that abstraction API and front-end it with a portal or a catalog, even a very simple catalog, to provision, let's say, a multi-tier application or a web service based not on cost but on a service level. So if I wanted seven nines of availability, I could have a Scalr farm that deploys a load balancer and instances in multiple regions, private or public, and a database that's replicated across multiple clouds. And that's a single API call, because I've already created the roles — MySQL, LAMP, whatever — and I've created the farm, which makes it a single API call.

So we've done that. Our first project doing that was Europa's model-based systems engineering environment. The Europa Clipper mission is in its design stage right now, so we have a large number of systems engineers who want to do model-based systems engineering. We have a proprietary stack of tools that includes MagicDraw, which is a third-party tool, plus our own proprietary tools that take that model-based system into a more traditional framework of requirements. We can provision that right now through a catalog: they make one single call, and that engineer gets the system in a few minutes. It registers in LDAP, registers the hostname, and applies all of the protective measures, so that whole infrastructure — it's multi-tier, it's modular — is there to accelerate the user. You can't just think, oh, now I've got an instance; for us that instance needs to comply with, like we said, NIST 800-53, all of our internal requirements, and all of that.

Cool. All right, so can you talk a little bit about the different workloads that you're deploying?

Yeah, I jumped the gun a bit. Model-based systems engineering we talked about. We also talked about the web hosting environment, where we're building web application clusters based on availability instead of price. On the back end, it could be a bunch of micro instances with Scalr auto-scaling thresholds; they can be all internal, or they can span multiple clouds. And for scientific computing, we talked about the model where the scientist just tells us what libraries and what software they need, and we pre-configure them into Scalr, give it an auto-scaling threshold, and give them something like SaltStack so they can pass the parameters they need into each of those instances as they scale. That's where we are.

So at this point, we've reserved a very large block of time to take questions from the audience. If you just raise your hand, there will be a... Yeah, first question.

I have a question for you. You have three different clouds and lots of data. So how do you get the VM and the data to be near each other to do this processing?
Do you copy it, or does it all go to a common source?

So that's one of our biggest challenges right now: these future missions are going to downlink more data than we can possibly... well, we can do one of two things. We can build a really big pipe from where the data is downlinked back to JPL so we can do the processing at JPL, or we can just move the processing over there. That's why OpenStack is important for us: we can lease a facility very close to our downlink center or our archive center and start building out infrastructure there that's similar to what we have at home. It's the portability of doing that. That's what we're really facing today. Right now we have pipes from the data archive at Goddard back to JPL — not huge pipes, but they cost a significant amount of money. If we can eliminate those, especially for the data volumes we're looking at in the near future, that would be ideal. I hope that answers your question.

Thank you. Great question. I was curious whether your developers and users are interested in containers as kind of a higher-level abstraction, or whether you see VMs being sufficient for the foreseeable future.

So, we are exploring containers now. You have to realize that NASA JPL is kind of slow here — we're not trying to be innovative in IT, we're just trying to be practical, smart users of IT. We are experimenting with containers specifically in our web hosting applications, where we can create microservices out of, let's say, MySQL, your Apache server, and your load balancer, nginx or whatever. Scalr kind of treats those things as roles, so your microservices can be roles inside Scalr. And at some point, when you guys do some integration, your role could be a container — I think you're working on that now.

Thank you. Hi, I have a couple of questions. One is on performance. For HPC, obviously performance is important, so do you have any preference for bare metal as against virtual servers? And how do you address the latency through the hypervisor and the virtualization layer?

If you're looking for a quantitative number, it varies; it depends on the type of virtualization you're running on. We do see a performance degradation, but you have this concept of elasticity, right? You can just keep scaling out horizontally; you just buy more nodes.

So your HPC users are just fine with virtual servers?

If you need high-speed interconnects, no. Cloud isn't going to solve that for you.

Not interconnects, but raw performance. On bare metal, you get better raw performance than on virtual servers.

Yeah, we find that bare metal performs better than virtualization, but you don't get the elasticity and the scalability of cloud when you're running on straight hardware.

So even on your private cloud, you're not looking at bare metal possibilities?

We are, we are. We're very interested in Ironic.

OK. And the second question is about security. What are your security needs when you go to the public cloud, and how do you address those?

Yeah, we have a tremendous number of security concerns in the public cloud. If we're talking about, for example, the Mars public outreach websites, all of that data, all of that information, is publicly released, right? So the right place to put it is somewhere with very little governance or security or compliance requirements around it. So we stuck that in public Amazon.
But when you're talking about security for, let's say, community clouds like GovCloud, we have a serious set of requirements. What we've really done — and this is not an Amazon talk — is either use OpenStack, because it's on-prem and we have all of our security measures there, or, if we go out to a public cloud, extend our VPN. We extend our IT security controls, our intrusion detection, our packet captures, by creating a VPC and extending a VPN to that region, whatever that region may be, whether it's in Azure or Amazon. By doing that, we're doing two things. We're encrypting in transit via IPsec. And then, while we're there, because all of our controls are there, it's like a node on our network: we can watch it, we know what's being done, we can syslog it, we can do a lot of things like that. So the real key to securing a community cloud, for us, is extending into it via IPsec. The other reason for doing it that way is that the users still have the flexibility of all those cloud capabilities without us telling them, hey, you must encrypt all your data at rest — because once you do that, a lot of the services you'd find in the cloud are not as accessible, since you've got this overhead of encrypting everything at rest. So our trick has really been extending our VPN tunnels directly into the cloud provider.

One more quick question. When you do these VPCs, are they from your on-premises OpenStack to Amazon and, separately, from on-premises OpenStack to Azure? Or do you have Amazon and Azure also integrated in the same VPC?

Yeah, so right now our OpenStack instances can talk to our Amazon instances, because they're on the same network; we're routing all that traffic because it's all behind our firewall.

But also between Amazon instances and Azure services?

Not between Amazon and Azure today. It's just a matter of us not having a huge footprint in Azure yet, but we're working on it — well, these guys are recoding their interface to match the new Microsoft Azure APIs, so we'll have that capability using Scalr.

So you're planning to integrate Amazon instances with Azure services?

We kind of do that today. I can give you an example. We have an internal YouTube; we call it JPLTube. It's a great resource for uploading things like training videos and various things like that, and we actually use it for a very scientific purpose as well: we take videos and images that we've shot of spacecraft assembly. We also have to be Section 508 compliant. So we leverage Amazon and OpenStack to host the JPLTube storage and servers, but there's a service in Azure called Azure Media Services. Essentially, we stick an audio file in Azure storage, it does the transcription and indexing of that audio file, and then we take that transcription and that index back and stick them into our private cloud. Now you have a searchable, indexed transcription of the entire video file, so that people who are hearing impaired can actually read exactly what was said. So we are already using a somewhat strange hybrid approach to multi-cloud; we've been doing that. And it doesn't really require connectivity between one cloud and another: we're just sticking an audio file in Media Services, bringing the transcription back, and putting it into our JPLTube. So we're completely Section 508 compliant.

Thank you. Thanks. I wonder if you could elaborate on your choice of SaltStack.
Just curious if you compared it with other things out there, and what works and what doesn't.

We use Chef and Puppet as well. For us, it was a cultural thing. Our systems administrators are not strong in Ruby or Python; they're SAs, they're Linux guys. To get them started — we also like Ansible, but we liked the fact that Salt is based on YAML. It's not a programming language, it's a markup language, so it's quicker to get them going, because we need them to write the scripts for us, we need them to write the configurations. They, in the end, are supporting all those instances that are running, and they have to ensure that they're secure. So for us, it was that startup question: how quickly can we get up to speed doing this? Because here we are launching lots of cloud instances, and if they're not secured, if they're not given protective measures, we have a real problem, right?

The other thing we like about SaltStack now — and all the vendors should line up so I can give everybody a high five — is the concept of a master and a delegate master. IT security, our cybersecurity team, can maintain the main master, with the protective measures, with the patches, with everything like that, and all your delegate masters inherit all those things. But if you're a developer, you can still leverage the delegate master to do your own automation — maybe add libraries, put configurations in, various things like that. So we like that concept as well: we have a main master that cybersecurity manages, and we give delegate masters to other service providers or developers so they can use them to manage their instances.

Thank you. All right, thanks everybody.