From around the globe, it's theCUBE with digital coverage of Postgres Vision 2021, brought to you by EDB.
Hello everybody, welcome back to Postgres Vision 21. My name is Dave Vellante, and we're super excited to have Josh Berkus on. He's a leader in the Kubernetes community, extremely well-versed in containerized applications, application development, and containerizing databases. All things open source. CUBE alum Josh Berkus, welcome back to theCUBE, great to see you again.
Thank you, I'm glad to be here.
So, you're just coming off KubeCon. We heard some of the themes from that event. There was a lot of focus on inclusion and diversity, which of course is the open source ethos. And a lot of discussion around designing security in, the whole conversation about shift left, which is great to see. Larger companies are giving back; obviously there's been a lot of pressure over the years on these big companies that it's a one-way street. They're actually giving back, making some investments. So we love to see that. And open source continues to be the mainspring of innovation. I've got to say, I had a callout from a recent Red Hat survey, The State of Enterprise Open Source 2021: 90% of technology leaders said that they're adopting open source. And I joked that the other 10% are doing it too, they just don't know it. So what were some of your takeaways from the event and some of the trends you're seeing, specifically as it relates to containers?
So, I mean, you're right. One thing is this return to the security topic, because we've had a couple of things happen. One was, when we initially started doing containers as a platform, with Docker and with early Kubernetes and that sort of thing, we got a lot of container image scanning, right? So you have Clair, and Docker has a scanning thing, and Amazon and Azure have their own scanning things.
And people felt that was kind of good enough for a while, but then we had the SolarWinds hack. And in the meantime, we've gone from a stage where people were mostly using Kubernetes in dev to people using Kubernetes in production, and there are a lot of extra security issues and vulnerabilities that come up in an actual production environment that people just didn't necessarily think about before. And so now we're looking at adding more pieces to the security stack and making those more standard for everyone who uses Kubernetes. And I've had the chance to work with the StackRox folks since they became part of Red Hat. So it's been very exciting to look at the whole thing and look at things like the container supply chain because, as SolarWinds showed us, it's not enough to just trust the vendor. You need to trust their whole supply chain. And it helps to be able to examine that supply chain.
Yeah, it's very scary when you look at that. You're absolutely right. Multiple components of malware coming into an organization through the supply chain, self-forming, different signatures, and so it's great to see the community spending time and emphasis on that. Now, I've got to cut right to the chase here. In 2018, you wrote a two-part blog series called "Should I Run Postgres in Kubernetes?" Obviously, it's highly relevant for this community. So I want to talk about your perspective. Well, first of all, the thing I love about you is you're technical and you can go deep, but at the same time, you speak to a business audience. So thank you for writing this and communicating the way you do. But talk about when it makes sense and when it doesn't. I mean, my big three takeaways on the pros were simplify, simplify, simplify, especially if you're running application components and other services on Kubernetes. But give us the update three years later. Why should you? Why shouldn't you?
Let me zoom out to an even bigger picture, which is that this is honestly like every new platform that we've had, right? So when virtualization and VMware became a thing, we had the same sort of decisions about when do I move my database to this? The same when AWS and the public cloud became a thing. If I'd written that 12 years ago, I could have written it about AWS and it would have had a lot of the same decision tree. Because what it really comes down to is: the more commodifiable a particular database instance is, the better a candidate it is to move to an advanced infrastructure platform, the most advanced currently being Kubernetes. So to the extent that you can describe this particular database, what it does, who needs to use it, and what's in it in a simple one-pager, then that's probably a really good candidate for hosting on Kubernetes. Whereas if you have a database where it's like, hey, the entire company uses it and it's so complicated we can't describe its inputs and outputs, that's possibly the last thing in your company that you're going to migrate to Kubernetes, because there's less gain to be made there, right? Because the real advantage of moving stuff to Kubernetes is your ability to automate things. The whole way I got into Kubernetes in the first place was that I started out not using containers at all. I was just looking to solve the problem of how do we automate Postgres high availability? That's what I was looking for. And it started out with something built using SaltStack called HandyRep, which Casey and I built. And mostly that was a problem discovery exercise. We discovered what the hard problems were there. And then we moved from that to Docker, because containers offered an encapsulation strategy, because one of the problems you run into when automating high availability is: is the database actually down or not?
And so the first thing that containers offered us was not packaging, which people usually talk about, but instead encapsulation, right? Because it's a lot easier to determine whether the container is running or not than whether the database is down or not, because an actual Postgres database is made up of multiple components and multiple processes. And some of those can be down without the others being down, which can then make you think a database is down that's not actually shut down. And being able to put that in a container gives me more of a binary up or down. And then from there, I got into: okay, well, but I need to automate a lot of other components. I need to automate the storage and everything else, and that led to Kubernetes. And so if you look at it in terms of deciding when you're going to migrate the database to Kubernetes, you look at: can I take advantage of that automation, right? Is this something that my application workflow and my team organization allow me to do? And if the answer is yes, right? Particularly if you're in a company that's doing the full DevOps thing, where you have a unified development and infra team that owns the entire stack, then those people are going to be a really good candidate for moving that stack to Kubernetes.
Got it. Okay, so let me ask you: in databases, especially in critical apps, recovery is everything. When something goes wrong, you've got to recover. So if I understand it correctly, just from reading and listening to you, if you've got Kubernetes expertise and you're building applications in that environment and the application components are in there, am I inferring correctly that you're going to be able to automate and facilitate high-quality recovery with certainty?
Right, yeah. There's a bunch of infrastructure involved, right? And this is why what enterprises do is move things like the web front end to Kubernetes first, and that is what they should do, right?
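To make the encapsulation point above concrete, here is a minimal Python sketch. It is an illustration only: the `Process` type, the process names, and both functions are hypothetical and are not taken from any real failover tool. It shows why a set of independently running processes can leave a health check in an ambiguous state, while a container reduces the question to a single yes or no.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    running: bool


def database_state(processes):
    """Classify a multi-process database as 'up', 'down', or 'ambiguous'."""
    alive = [p.running for p in processes]
    if all(alive):
        return "up"
    if not any(alive):
        return "down"
    # Some processes are alive and some are not: the hard case for automation.
    return "ambiguous"


def container_state(container_running):
    """A container gives a binary answer: it is either running or it is not."""
    return "up" if container_running else "down"


# Hypothetical snapshot, loosely modelled on the processes of a Postgres instance.
procs = [
    Process("postmaster", True),
    Process("checkpointer", True),
    Process("walwriter", False),
]
print(database_state(procs))   # ambiguous
print(container_state(False))  # down
```

An automated failover decision is much easier to write against the second function than against the first.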
That is absolutely the right order of things to do, because the minute that you're looking at bringing databases in, you're now looking at your whole storage infrastructure, right? So that direct-attached storage that was attached physically to one machine is not going to work once you've moved to a container-based cloud, right? You suddenly need a way to be able to attach that storage to any of the nodes in your cluster so that you can move the database around and you can have failover. But once you build those things up, you can, right? I work in the office of the CTO now at Red Hat, so I'm not in production support. So the only Postgres instances I'm supporting are ones for some open source projects we support, like the Python project. And in those cases, it's not a high-criticality database, but I'm not on call on the weekend, so I want something that doesn't require me to be on call in order for it to stay up. And so putting that on OpenShift with the Patroni failover driver was the answer for that, and it has failed over. The Red Hat IT team contacts me and says, hey, we need to move those servers, and I just add a node to the cluster and delete the old node, and it does the right thing. And I don't have to worry about it, which is really what you're going for there.
The other thing I took away from your writing was that you suggested that a lot of the successes were in areas where the Postgres databases were rather small and there were lots of them. And so to the extent that you can automate that, you're going to save yourself a lot of problems. Whereas on the flip side, if you're running extremely large databases or there may be performance constraints, that might be an area to be a little bit more circumspect.
Yeah, and that's absolutely true because, like, the other side of this, right?
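The "delete the old node and it does the right thing" behavior described above can be pictured with a toy leader lease. This is a sketch of the general lease pattern only, not Patroni's code or API: the `LeaderLock` class, the node names `pg-0` and `pg-1`, and the 30-second TTL are all assumptions made for illustration, and time is passed in explicitly so the example is deterministic.

```python
class LeaderLock:
    """A toy TTL-based leader lock, standing in for a key in a shared store."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node, now):
        """Take or renew the lock; return True if `node` is the leader afterwards."""
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == node:
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        return False


lock = LeaderLock(ttl=30)
lock.try_acquire("pg-0", now=0)          # pg-0 becomes the primary
print(lock.try_acquire("pg-1", now=10))  # False: the lease is still live
# pg-0's node is removed, so it stops renewing; once the lease expires
# the replica can take over without anyone being on call.
print(lock.try_acquire("pg-1", now=45))  # True
print(lock.holder)                       # pg-1
```

The design point is that nobody has to decide by hand that the old primary is gone; the lease simply runs out.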
Like, I've worked with the DevOps people and the people who are on Heroku and that sort of thing, who have one database per application, right? And those people are great candidates for migrating. But then I've also worked with the people who have one big database for the company, where the database is three terabytes in size and it powers their reporting system and their customer system and the web portal and everything else in one database. That's the one that's really going to be a hard call and that you might in fact never physically migrate to Kubernetes, because even if it's on Kubernetes, you are going to mess with the hardware policy to give it its own dedicated machine. So in that case, what I would honestly tend to do is this: there's a feature in Kubernetes called Service Catalog that allows you to expose an external service within Kubernetes as if it were a Kubernetes service. And that's what I tend to do with those kinds of databases, because there's not a huge advantage in actually physically moving the database to a container. There's a bunch of steps involved, and going via Service Catalog is a lot easier.
But essentially you're speaking the same language in that example that you just gave. Now, the other thing you pointed out at the time that you wrote this article is that there was a lot of pre-1.0, kind of alpha, in the Kubernetes stack, and it might be prudent not to put your HIPAA compliance stuff in it. Has that changed as it evolved?
Yeah, if I was to update two things in the article, I guess that would be one of them. The other one I'll get to in a minute.
Great.
So the first one is that Kubernetes has progressed along that maturity timeline. We recently added production readiness reviews as part of our feature review process. We've really improved test adherence so that we're not releasing with known broken tests, and a bunch of other things to make it more stable. But part of it depends on who I'm talking to, because there are still degrees here, right?
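Returning to the idea of exposing an external database inside the cluster: the sketch below does not use Service Catalog itself. It swaps in the simpler core `Service` of type `ExternalName`, which gives in-cluster clients a stable name for a host that lives outside Kubernetes. The helper function, the service name `reporting-db`, and the hostname `pg-main.example.internal` are hypothetical; the script only builds and prints a manifest as JSON.

```python
import json


def external_db_service(name, external_host):
    """Build a Service manifest that aliases an external hostname."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "ExternalName",
            # DNS name of the database that stays outside the cluster
            # (hypothetical value).
            "externalName": external_host,
        },
    }


manifest = external_db_service("reporting-db", "pg-main.example.internal")
print(json.dumps(manifest, indent=2))
```

Applications in the cluster would then connect to `reporting-db` like any other service, while the big database keeps its own dedicated machine.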
So if I'm talking in the context of the world of software, then Kubernetes has reached the point of maturity where it is as stable as anything else. And if you use a release, you can assume that any sort of major issues have been worked out. The one difference between it and some other platforms people may have used is that it's still young enough that backwards compatibility can be an issue. As in, Kubernetes now releases three times a year. We've stepped down from four. And within three releases, you can find yourself needing to change API calls, which means needing to refactor parts of your application. So compare that with something like a JVM platform, right? When's the last time you had a major API change with a JVM platform? But, you know, Kubernetes is only six years old. So that's part of that. And then the other thing is the question when we're talking to the Postgres community, right? Which is that within Postgres, people run the daily Postgres snapshot in production. I would not do that with Kubernetes. I would wait for a release. So there's still kind of a difference there if people are coming from the Postgres community, right? They're used to this really extreme level of stability that we have with Postgres. And Kubernetes, as a much younger project, isn't quite there yet.
So that's a process change that you have to be aware of. If you want to take the benefits of containers with Postgres, you just have to really understand that and make that process part of your change management.
Yeah. The other thing I would say has changed is that there are new opportunities in running your data warehouse, your big data databases, on Kubernetes. There are a number of platforms; the one I'm most familiar with is Citus, because I worked with those folks.
They have taken advantage of Kubernetes as a deployment and management platform for their big data database infrastructure, which makes sense, because if you look at a lot of modern data analysis and data mining platforms that are built on top of Postgres, part of how they do their work is that they actually run a bunch of little Postgres instances that they federate together. And then Kubernetes becomes the tool that allows you to manage all of those little Postgres instances. So that's the exception to the "should I migrate this really big database" question, right? That can be a yes if you are migrating it to a big data platform that supports Kubernetes, and there can be a huge advantage to it.
So obviously you've got the practitioner knowledge and you work within the community. I'm wondering if you can share, just thinking about the motivation to move to a container environment if you're one of the Postgres folks in the audience, any anecdotal or other data on business impact, benchmarks that you've seen, some of the positives there.
We can talk about performance as one, right? If you actually look at my history, I, and for that matter some of the folks from Percona and some of our other folks in the database field, did a bunch of benchmarks of running Postgres and MySQL on Kubernetes versus running them not on Kubernetes. And one of the advantages of containers over VMs is that there isn't any intrinsic layer gap or virtualization that modifies your performance. In other words, if a container is using storage that's present on the node where the container is running, it is using that storage through Linux, and therefore, with some caveats, performance is going to be identical to running that on the host system.
Now, where performance differences creep in is that you might not be able to use the same kind of storage, in that Kubernetes and container systems in general are organized around the idea that no service is using a majority of the resources on the system. So again, if you're planning on running a larger Postgres database that really needs all the RAM that a system has, you're going to have to do a lot of tinkering with Kubernetes configuration to get the same performance you would have running it on a dedicated hardware node.
Okay, but fundamentally you're saying that overhead is less, with caveats; like you said, you just mentioned the storage.
Right, yeah, the overhead is not any different from if you were running it on the host system. So a really good example of that: if you go back to my lightning talk at KubeCon Austin, I think, I showed running a benchmark with Postgres on an AWS instance using EBS storage, both not in Kubernetes and in Kubernetes. And there was no perceptible performance difference between the two of them, because it was all metered by how fast EBS was for you.
Right, and I said less, but I should have been more specific: less than, say, you would expect with virtualization.
Right, right. And so this comes down to a business decision, right? Which is that if you're already on some sort of cloud storage or network storage, and again, you have databases that can share hardware systems, then you shouldn't really expect substantial performance differences by moving to Kubernetes, right? That's something you can eliminate from your set of worries. But if you're going to be migrating from direct-attached storage to network storage in the process, then you are going to see a performance difference, but that's caused by the change in storage.
Right.
Or if you're going to be moving from systems that are not shared to systems that are shared, again, you're going to see a difference from that.
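As one hedged illustration of the configuration tinkering mentioned above for a large, memory-hungry Postgres, the sketch below builds a pod spec whose resource requests equal its limits and pins it to a labelled node. The `workload: big-postgres` label, the image tag, and the sizes are hypothetical, and `qos_class` is a simplified reimplementation of how Kubernetes classifies pods (it compares quantities as plain strings), not the real scheduler logic.

```python
def qos_class(pod):
    """Simplified Kubernetes QoS classification for a pod spec dict."""
    any_set = False
    guaranteed = True
    for container in pod["spec"]["containers"]:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        if requests or limits:
            any_set = True
        for res in ("cpu", "memory"):
            # Guaranteed needs a limit for both resources, with the request
            # equal to it (an omitted request defaults to the limit).
            if res not in limits or requests.get(res, limits[res]) != limits[res]:
                guaranteed = False
    if guaranteed:
        return "Guaranteed"
    return "Burstable" if any_set else "BestEffort"


big_postgres = {
    "spec": {
        "nodeSelector": {"workload": "big-postgres"},  # hypothetical node label
        "containers": [{
            "name": "postgres",
            "image": "postgres:13",
            "resources": {
                "requests": {"cpu": "14", "memory": "56Gi"},
                "limits": {"cpu": "14", "memory": "56Gi"},
            },
        }],
    }
}
print(qos_class(big_postgres))  # Guaranteed
```

A pod in the Guaranteed class is the last to be evicted under node pressure, which is the kind of property a database that needs most of a machine would be after.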
But it wouldn't be any different than if you did that without Kubernetes and containers being involved.
Right, so if you're using any world-class shared storage device from whatever big vendor you want to name, that's going to accommodate it. If you're racking and stacking your own flash drives or, worse yet, spinning disk drives as direct-attached, that's maybe a different story. So okay, that's good. Where would you advise people to get started with Postgres and Kubernetes?
The nice thing is there are a number of advanced systems now, and advanced systems that are supported by the various Postgres vendors. And that can actually be a great place to get started, because as far as I know the systems are open source, so you can try them out, but then if you decide you like them, you can get support. And so that would include Crunchy Data. EnterpriseDB has a system that I honestly have to admit I'm less familiar with than the one that Crunchy runs. StackGres is another one, out of Europe, that has their own system for running cloud native Postgres. And there's one I'm forgetting. And what a lot of these have to do with is taking advantage of the automation. Because obviously you can put Postgres in a container and play around, right? But your whole point of moving to Kubernetes in general is going to be to take advantage of the automation, so you want to look at the various automation platforms. And you can go ahead and do that. The one I'm most familiar with, because I've developed on it, is Patroni, which is the component for automating Postgres. You do Patroni plus you do operators. That's another word that comes in here. But if you're looking at this as a business, you're probably going to want something that's supported, or where at least there's a potential to buy support. And a bunch of the different companies in the Postgres space package up these components for you into a platform.
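Since operators come up above without explanation: at its core an operator is a control loop that compares the desired state with what is actually running and issues corrective actions. The sketch below shows only that core idea. The `reconcile` function and the `pg-N` naming scheme are assumptions for illustration and do not correspond to any vendor's operator.

```python
def reconcile(desired_replicas, running):
    """Return the actions needed to move the running set toward the desired count."""
    wanted = [f"pg-{i}" for i in range(desired_replicas)]
    # Create any instance that should exist but does not.
    actions = [("create", name) for name in wanted if name not in running]
    # Delete any instance that exists but should not.
    actions += [("delete", name) for name in sorted(running) if name not in wanted]
    return actions


print(reconcile(3, {"pg-0", "pg-2", "pg-7"}))
# [('create', 'pg-1'), ('delete', 'pg-7')]
```

A real operator runs this comparison continuously and also handles storage, backups, and failover, which is the work the packaged platforms take on.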
Like, I know the Crunchy platform uses Patroni plus some proxy stuff plus pgBackRest plus a couple of other things to give you a sort of full automation platform for running Postgres on Kubernetes.
Awesome, last question. Where are we in the whole container adoption? We started out, as you mentioned, stateless, and now you're building stateful applications. But still, we look at spending data with our data partner ETR, and containers and container orchestration are right up there with RPA, with cloud, with AI, just in terms of the attention and resources that are going in. So it's exploding. It feels like it's still early days and there's a lot of legs left. What do you see?
Yeah, well, a lot of it is, I mean, you're talking about migrating IT infrastructure, right? So where we are with Kubernetes is we have the early adopters, right? We have all the people who were at the point of building their new infrastructure when Kubernetes came out, right? And people who had major unsolved problems, where a big reason for adopting a new platform was that there just was no old platform for you. And so we have those people, and those people are already on Kubernetes and running their stuff there. And so now we're looking at the really long path of people who are not in one of those camps moving, right? And in a lot of cases, that's a matter of coinciding with other reasons why they have to look at an upgrade. Whether it's the gradual replacement of old applications by new ones, where gradually all the legacy applications go offline and the new applications run in Kubernetes, or sometimes it's a, hey, we're waiting for a replacement cycle, right? We already had plans to move from on-prem to public cloud. And so we're going to move from on-prem to public cloud on Kubernetes to make it part of the migration. And that'll be years.
You know, I still have fingers in other areas; I still know a lot of people in the nonprofit space. And a lot of nonprofits just got around to adopting virtualization. They're not even at public cloud yet. I don't even talk to them about Kubernetes. So there's this huge long tail in terms of adoption. The nice thing is we don't show any signs of stopping, right? One of the things that we learned from earlier stuff, particularly from our friends at OpenStack, was to really, really focus on the APIs. To look at Kubernetes more as the hub of an infrastructure system with potentially unbounded growth. So if you have a new concept that comes in, like service mesh, service mesh is not a successor to Kubernetes, and it's not an alternative to Kubernetes. It is a thing you layer on top of Kubernetes, because we didn't make it exclusive.
Right, great example going back to OpenStack, and thank you for bringing that in, because lessons learned, right? And so, Josh, we've got to leave it there. Thanks so much for coming back.
Thank you.
Great conversation, you're awesome.
Okay, good to talk to you.
All right, and thank you for watching, everybody. Just keep it right there for more content from Postgres Vision 21. My name is Dave Vellante, and you're watching theCUBE.