All right, thank you all for being here. Appreciate it. So as Diane said, we want to learn from the experts and give you a little slice of what it was like to build some of the operators that these folks have built. We've got, as you can see, a pretty good cross section of different things, from ISVs to folks building internal operators. We've got some experts from Red Hat as well. So welcome, everybody. Thank you all for being here.

I want to start off with a question for everybody, and this should be an easy one. I mentioned earlier it's kind of exciting: when you start to think through this desired-state loop for building an operator, you might find out something interesting about your software, or about how it interacts with the organization you're running it in. So I'm curious, for everybody: what did you learn by building your operator? Did you find some hard-coded values that you had to undo? Weird service discovery logic? Something that didn't work? Anything like that? Let me just go down the line.

Yeah, I guess the number one thing. There are obviously all these little tidbits that you find when you do this, but one thing that I found particularly interesting is that, with all the knowledge that has been built into the operator pattern, the general idea is that there are a lot of distributed-systems problems out there that have already been solved. Once you engage with the patterns that operators give you, you realize these things have been solved already and you don't have to solve them again. For us, we have a simple leader election mechanism, and it works. But for complex environments, we could get a little bit more intelligent about it, and operators clearly showed that we can do better there and use existing patterns.

One thing I forgot to mention: introduce yourself as well as you go.

Oh, hi. My name is Matthias Lubken.
I'm the product manager for a startup called Instana. We're a small startup in the APM monitoring space. We actually have a booth out here this week, and we do monitoring of Kubernetes and OpenShift environments.

Hi, my name is Annette Clewett. I'm with Red Hat; I'm a storage architect in the storage group. And I actually did not create the operator that I have been working with, which is the Rook operator. In terms of what I learned, I guess this isn't a surprise, but when you're trying to manage storage via an operator, you have to decide what storage you would like the operator to use. If you don't do anything, there are ways that it'll consume any available storage. So there are some decisions to make about how much control you give over deciding exactly what it should consume and what it shouldn't.

So hello, everyone. My name is Dennis. I work for a company called Couchbase. Because we are a database company, we have lots of configuration. And one thing that we noticed is that even though YAML files are great for a lot of stuff, when you deal with stateful applications, even though your YAML file is valid, it might not be valid according to the current state of your application. So we had to implement a very robust admission controller to handle that and say: hey, are you sure you want to apply this file? Because this might make you remove a bucket, delete your database, or remove your backups, and all that sort of stuff. And this was one of the things where we had to bring together basically the whole company and say: hey, can we compile all the knowledge of the company into this operator, to be sure that we are validating against each possible state?

Hi. My name is Balaji. I'm part of a startup called OpsMx. We commercialize Spinnaker, which is an open-source CI/CD tool, and we basically wanted to create an operator for Spinnaker.
Spinnaker is a CI/CD tool made up of nine microservices, and it's obviously difficult to maintain the state of all of those microservices and all the pipelines. So one of the things we found is that the current tooling used in the open-source community wasn't really doing a good job of cleaning up, or of looking at the state of the services it deployed. When we put the operator through the certification framework, it failed the test, because some of the services were not even up when it tried to delete what it had created. So it was really helpful for us to make sure that we are checking the status of all the services, making sure the services are up, practicing the hygiene of a good application. We were not doing that before, and the operator forced us to think about it and do it, so it was very useful for us. And I think it is obviously useful for the end users at the end of the day, because they are able to deploy and get the right behavior.

Awesome.

Hi, everyone. My name is Mark Ruckert. I'm a senior software engineer at SIX. We already heard earlier today how our company uses OpenShift. We are a really small team; the cluster team at SIX consists of three people, and we have quite a few customers and a lot of projects. On our non-prod cluster we have around 200 projects, and we were looking for something to help us set up these projects so that they comply with all the requirements we have. We used the Operator SDK to define a basic set of configuration values to set up all these projects. So the operator helps us set up all the projects the same way, and they all have the same values, the same features. It also allowed us to include project setup in our company's self-service portal, because with the custom resource, we only have one YAML file that we can ship from the self-service portal to the cluster and have everything set up correctly.

Awesome.
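As a rough illustration of what Mark describes, one custom resource per project carrying the standard settings, a project-setup CR might look something like the sketch below. SIX's CRD is not open-sourced, so the API group and every field name here are hypothetical:

```yaml
# Hypothetical project-setup custom resource; the API group and all fields
# are illustrative, not SIX's actual schema.
apiVersion: projects.example.com/v1alpha1
kind: ProjectConfig
metadata:
  name: team-payments
spec:
  quota: medium              # maps to a standard ResourceQuota profile
  networkPolicy: default-deny
  ingressWhitelist:
    - 10.0.0.0/8
  tiller: false              # Boolean toggle for a per-namespace Tiller
```

The self-service portal only has to generate and apply this one file; the operator reconciles everything else.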
Super cool workflow that you have there. Annette, this next question is for you. Storage is extremely critical to the cluster, especially if we've got a bunch of disks that we're going to aggregate together and make available.

Oh, hey. Before that, I'm Nestor from Sysdig.

Oh, thank you for joining us. Introduce yourself.

Just in time for the presentation. So thank you. I'm Nestor from Sysdig, a company that monitors and secures your cluster. Basically, we do the following: we capture everything that is happening on the host machine, using a kernel module or an eBPF probe. Then we can use all of that syscall information we gather for two purposes. The first one is monitoring: knowing what's happening inside the application. And the second one is security. And, well, operators help us there. I work at Sysdig as an integrations engineer, and my personal mission is to make Sysdig available to all the people. Sysdig, and also Falco, of course. So operators help us with that task.

So my next question is actually for you, so I hope you've got your speaking voice on. The interesting thing about Sysdig is that you can interact with non-containerized, or even non-cloud, resources. So I'm curious: did you have to change anything about how you built your operator to interact with that environment, where you're not just monitoring Kube APIs and things like that?

Yes. Basically, Sysdig is deployed in your cluster, and you can do that using plain Kubernetes manifests; we also have a Helm chart, and we also offer an operator. One of the things we learned in building the operator is that around 70 to 80 percent of our customers are on-prem customers, so we need to polish some details in the backend to keep those systems up to date.
You have to upgrade some things in the backend database, take some backups. The first approach we did, with the agent, is not a big deal; it's a DaemonSet with some configuration, some fine-tuning parameters, and that's all. But as a next step, we are considering using an operator to let our on-prem customers improve their user experience when they are managing their clusters.

Yeah, you've got to put some of those specialties in. That's an opportunity; we didn't expect that use case, but cool. All right, back to our storage question. So storage is really important to the cluster, and you're aggregating these disks. And obviously, if it's backing your stateful storage for a database, or you have backups going to an S3 interface or something like that, you want it to work. So what does the operator do to help protect us from ourselves if we're about to cause havoc on the cluster?

Thanks, Rob. So the Rook operator that I've been working with at Red Hat is managing Ceph storage. It is able to do the deployment in a way that requires very little interaction. Basically, like I said, you identify the storage that you want consumed, you identify which hosts, in this case either Kubernetes or OpenShift hosts, have that storage. And then it goes out, finds it, creates the cluster, and sets everything up. Ceph tuning is basically no longer an issue, because the operator makes the decisions about how it's going to be tuned. You can always go back in and change that, but the idea is that it's really pretty hands-off. When you're done, you access that storage via a storage class. In your templates, or whatever way you're deploying your apps, you're able to make the claims. Again, you don't know exactly what's going on in the back end itself; you're just making the claims and getting persistent storage mounted into your workload.
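For example, once the Rook cluster is up, consuming it is just a standard claim against the storage class it exposes. A minimal sketch follows; the class name `rook-ceph-block` matches Rook's published examples but may differ in a given install:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  storageClassName: rook-ceph-block   # storage class provided by the Rook/Ceph install
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

The workload then mounts `app-data` like any other PVC, without knowing anything about the Ceph cluster behind it.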
So that's certainly the deployment. The other part, for someone who's operating OpenShift, is the whole upgrade story, and then adding more storage to your storage cluster. That, again, is all done via the operator. All you have to do is go into the CRD, the cluster CRD, for example. If you want to go from one subversion to another, you change the subversion. As soon as you save that config, all of the pods, the Ceph workload, start restarting, and within probably 10 minutes you've just upgraded Ceph. You can also go in, let's say, to add more storage or add more storage nodes. Same thing: you just add the storage into the cluster CRD, and all of that is done by the operator.

That sounds like magic.

It is magic. The last thing I'd say about that is that the Rook operator is very aware of Ceph, but Ceph is not aware of Rook. So it's still Ceph on the data plane, all the good stuff that Ceph has: object, file, and block. But Rook is watching and observing and, in some cases, restarting services if they go down.

That's really cool. Right before the panel, we were talking about how the availability zones of your nodes really matter, especially when you're talking to a data plane; you want to decrease that latency as much as possible. And that's some of the expertise that the operator just knows, which is great.

Dennis, this question is for you. I know there's a ton of operational expertise baked into the Couchbase operator; if you look at some of the default tunables that it sets, there's a lot of stuff there. So how did you think through, with your engineering team, which levels of configuration are exposed at this top-level CRD? And how do you introduce new things over time, that type of thing?

So we started with operators roughly two years ago. Back then, we didn't even know if the operator was the right thing, and since then it has become very popular.
Luckily, we made the right choice. In our case, even though we want to recommend, hey, this is the production setup, in general we first planned for how developers want to use Couchbase. Most of the time, they will run on Minishift, on Minikube, or on staging environments. So out of the box we ship a very simple configuration, and we have all the defaults. And because Couchbase is very modular, you can pretty much adjust or boost the characteristics you want according to your application: if you want to boost your reads, you can add more index and query nodes; if you want to boost your writes, you can add more data nodes. So we are very flexible with the configuration, but we spent lots of time trying to understand what the whole CRD should look like, how we can give maximum flexibility, and also how we can ensure that it is the single source of truth: even if someone goes to the web console or CLI and tries to change something, we will revert the whole thing. Of course, building an operator today is much simpler; you can basically create a very simple operator in 15 minutes or so. And one thing that I would like to mention is: do test. Even though your operator will run on all cloud providers, sometimes there are cloud-provider-specific behaviors. So if you plan to sell your operator, do test, and please try to get the certification on all the cloud providers, so you can say: yes, we tested our operator. Especially when we talk about storage. That's still something we are working on in Kubernetes, and we are trying to make it simpler. That's basically where we see the very small differences in behavior that we have to handle on the operator side.

Awesome, that's really good advice. And I think it shows you that when you're thinking through your CRDs and that spec, you need to treat it like a public API.
It's going to be versioned, with the group-version-kind, as per standard Kubernetes. But if you get it right, you can have something really powerful that also has a really great default set, so if you're going to spin up a small instance, you just get something that works out of the box, which is very cool.

This next question is for Mark. Switching gears from the ISVs and partners to somebody who's building operators internally: I'm curious what types of things you needed to hook into your own environment to build this operator, and how that has shifted the way you set up that environment and your future thinking there.

Yeah, using the operator helps us a lot. We can react fast to the needs of our internal customers. For example, some customers wanted to try using Helm charts, and our company policy doesn't allow port forwarding, which Helm uses by default to connect to Tiller. We had to find a way to communicate via TLS and an HTTPS route. Since this setup is not that easy, we extended the operator to set up all this configuration for Tiller: creating a service and route, and also creating the client certificates so that it's an authenticated communication. And since we are building the operator ourselves, after two days we had the change ready to be tested. The customer can now, by checking a Boolean flag, tiller: true, immediately set up a Tiller in his namespace and use it just for himself. We also use operators for ingress ACL whitelisting. We have this Tufin firewall documentation tool, from which we can read all approved connections. From this tool, we generate the custom resource file, apply it to the cluster, and all the routes get the correct ingress whitelisting IP ranges that they should have. If the customer overrides it, the controller checks it against the state it has to be in and overrides everything that's not allowed.

Yeah, that's really cool.
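The revert-on-drift behavior Mark describes is the heart of a reconcile loop: compare the approved state from the firewall tool with what is actually on the route, and overwrite any manual edits. A minimal sketch of that logic in Python; the function and annotation names are illustrative, not SIX's actual implementation:

```python
# Hypothetical sketch of the drift-correction logic described above: the
# desired whitelist comes from the approved-connections source (the firewall
# documentation tool), and the controller reverts any manual edits.

def reconcile_whitelist(desired_ranges, actual_annotation):
    """Return the annotation value the route should carry.

    desired_ranges: list of approved CIDR ranges (the desired state)
    actual_annotation: current value of the route's whitelist annotation
    """
    desired = " ".join(sorted(desired_ranges))
    if actual_annotation != desired:
        # Drift detected: someone changed the route by hand. Overwrite it
        # with the approved state, as the controller would on each reconcile.
        return desired
    return actual_annotation  # already in the desired state

# Example: a manual edit added an unapproved range; reconciling drops it.
approved = ["10.0.0.0/8", "192.168.1.0/24"]
print(reconcile_whitelist(approved, "10.0.0.0/8 203.0.113.0/24"))
```

A real controller would run this on every watch event for the route, so manual changes never survive longer than one reconcile interval.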
Think about this: it's a very sophisticated bank, and there are no tickets, no Jira workflow for getting new whitelists or a new project. That self-service is really, really powerful. So, last question for you, building on that: do you have any idea what your next operator might be internally?

I don't know. Everything our customers need.

I like it. Everything, basically. Love it.

No, there are no limits. And really, if you have some recurring configuration at your company, think about using an operator. We are currently not allowed to open-source our solution, but feel free to contact me, or talk to Rob and the other guys; they are willing to help. Operators have really helped us, and we'd like them to help you just as much.

Yeah, I'll say: come to the operator SIG if you've got some internal use cases that you want to talk through with some of the experts. We've got a bunch of folks on those calls who can help you with that. All right. The next question is for Matthias. It sounds like Instana has had a bunch of other methods of installing your agent on a bunch of different pieces of infrastructure. I'm wondering if you can contrast the operator concept, and I don't know if you used our SDK or not, with other approaches like Helm or running raw deployments, things like that.

Yeah. For Instana, it's important that we get our agent everywhere, as easily as possible. So whatever the existing mechanisms are, whether it's a shell script, a Helm template, or plain YAML, we try to provide as many of them as possible. And obviously the operator, and OperatorHub, is for us another good step toward being where our users are. But that is just the beginning. The day-two operations that were mentioned, on the path to being a more sophisticated operator, are something that we are obviously also going to leverage.
Our agent itself is pretty intelligent, but there's always room for improvement. One thing that finally clicked for me when thinking about operator use cases, custom resources, and how we can map them to the operator, and I don't know who to quote here, but someone said it: the operator bridges domain knowledge and cluster knowledge. You have your domain knowledge about your application, and you have cluster knowledge, and you bring these two together. If you think about it that way, a lot of new, exciting ideas pop up. For us, it's making our agent more intelligent. There are so many complex environments out there, different customer environments; not one cluster is the same. I guess Red Hat can talk about that a lot. We want to see that complex environment, make our agent more intelligent about what's happening there, divide the workload, and the operator brings the cluster intelligence to our agent. In our overall story, we are automating as much as possible and making the user experience as seamless as possible, and that fits right in there. That's why we're excited about the operator.

Awesome. Yeah, that's great. Plugging into the ecosystem and those control points is really, really important. The next question is for Balaji, and I want to ask about CI/CD. As everybody mentioned earlier, Jenkins is a tough thing to run, and it powers a lot of this type of stuff. And you're not going to have just one cluster; there are going to be many clusters that you're running. So how did your operator make clusters feel a little more seamless, like you know you're going to roll out two different production clusters in the exact same way using Spinnaker?

Yeah, so one good thing is that Spinnaker itself allows you to deploy to multiple clusters.
A single instance of Spinnaker can deploy not only to on-prem OpenShift clusters but to AWS clusters, et cetera. So that itself solves the problem a little bit, I think. But obviously, customers or end users want to have Spinnaker as a service; they want to deploy Spinnaker per cluster, or per customer for their own customers. So having the operator is a very useful thing for us, to spin those kinds of things up very quickly. Currently, we're only doing the basic-install portion of the lifecycle, I guess. But the whole process of lifecycle management is a huge problem; we have large customers spending months trying to get this right. And scaling becomes a bigger issue as you start adding more users, et cetera, to the project. So we open-sourced this, and we hope to open-source the other steps as well, to help people adopt Spinnaker, or any other tool, but Spinnaker in our case, and be able to deploy it across large clusters.

Awesome. So it sounds like people are basically using the operator to run a kind of as-a-service internally for their CI environments, which, I think, does everybody want that? Does that sound awesome? Anybody have a Jenkins operator, by any chance? No, I didn't think one existed. I wish it did.

Yeah, our goal is to make a Spinnaker operator, essentially, so you can spin it up as quickly as possible. And I think the beauty of CI/CD with an operator is that it takes care of all the things that you would screw up if you tried to do it manually with various scripts, et cetera.

Anything in particular that you see a lot of people screw up on?

I think, like I said, one thing we saw was the status of the services: sometimes they're not up, sometimes they're down. The ability to make sure that things are always in the desired state is a pretty important thing. Scaling is a problem.
As you start adding more users, et cetera, you want to be able to auto-scale. We haven't gotten to that point yet, but I think that would be very useful to us.

Awesome. All right, last question, and this is directed at whoever wants to answer it. We're always trying to figure out what the next SDK we might want to introduce is. So I'm curious if anybody has any opinions, and maybe thoughts around why we should choose Python or Java or whatever as our next SDK.

Quarkus.

All right. So if anyone is writing an operator in Java and is looking into the Quarkus stuff, that's pretty awesome. Give everybody a little background: what is Quarkus?

So Quarkus is based on GraalVM, an environment where you build Java applications. It takes some of the old reflection methods away, so you have to do a little bit more at compile time, but the result is that you can reduce the startup time of Java applications by a factor of, depending on your size, up to 100. So you get really, really fast applications built for a cloud-native environment. Long story short, Red Hat is pushing it a lot, which we like. We're a big Java shop, not only for our customers but also for ourselves, and we actually built our operator with Quarkus itself. So if anyone asks what the next SDK should be, it definitely needs to be Quarkus. And if it is, and you're building on that, reach out to me; we'd love to sync on it.

Awesome, yeah. If anybody wants to bootstrap that effort, I think our operator mailing list would be a great place to do it. I'll throw out: if you are a user of Red Hat's AMQ Streams operator, that's actually a Java operator that isn't using any particular SDK; it just calls Kubernetes APIs via Java. Anybody else? Any opinions?

Yeah, I also got contacted by our internal customers. They saw or read about the operators.
And since most of the applications in our company are written in Java, a Java SDK would really get used at SIX. We also have some customers who started learning Go to write their own operators, to distribute their applications as a service within our company.

Awesome. Yeah, sounds like Java's very popular. Any contrarian opinions on that? All right, we're going with Java. That wraps up our panel. I hope you all learned a little bit about operators. Thank you all for joining us. And like I mentioned, go build some operators, follow these links, and we'd love to help you out through our operator SIG, the mailing list, and other community activities. Thank you all. All right.