Hi everybody, so today we have a very exciting panel for you. I'm Sumeet. I'm the founder at AppFormix. AppFormix does analytics and optimization of cloud operations. Prior to AppFormix, I was working at a very large public operator. And I'm just essentially very, very passionate about cloud operations. And that's how this group came about. So let's start with a round of introductions. And while you're introducing yourself, if you could also tell the audience about the top two goals of what you're trying to achieve in your cloud journey, to kick things off. OK, so I got the microphone, so I'm going to start. So my name is Edgar Magana. I'm the cloud operations architect for Workday, which is a SaaS company. It provides human resources and finance applications. We are running OpenStack in production. So we have a few challenges. What we're trying to achieve is to improve the amount of virtualization that we have in our data center. There's a lot of growth going on in the company, so we want to scale our applications a little bit better, and OpenStack is providing the right cloud management system to do that. Probably the second goal is to be able to have my team not waking up in the middle of the night because something is not working. So I want to be sure that everything is in a stable, repeatable mode in our data centers. I'll second the second goal for sure. I don't want to wake up in the middle of the night because something went wrong. So I'm Anand Kumar. I'm from PayPal. That's not Python plus Perl, as James thinks it is. It's PayPal. So PayPal runs one of the largest private OpenStack clouds. Two goals that I would say I have: one, increase the amount of automation. I would actually contradict Edgar's first goal. I don't want as much virtualization as there is in the cloud today. I want more containers being deployed, more agility in the system. I definitely want more reliability so that I don't have to wake up.
The third goal I really want is for my developers to be able to get what they need when they need it. So more agility, better capacity planning, anticipation. And I would like to see some tools that we can use for that. I'm Joe Sandoval. I'm with Lithium Technologies. We're a software-as-a-service company, and what we specialize in is helping really connect brands with customers. We've been running OpenStack now for about two years. And the real drivers for us are kind of similar. I can echo the same sentiments as the previous comments. But we're very pragmatic, meaning that OpenStack has definitely been helping us get agile in the data center and really being able to leverage a lot of what's happening with automation and tooling. But we also consume public cloud as well. So we're going to use whatever the right tool for the job is that's going to help us be elastic and help us scale. The second part is that we're definitely individuals that really question everything that we're doing. We never feel content that we've delivered the best solution, because even when we got to the Vancouver Summit last year, we could have rested and said we had a cloud that's going to operate and help us exist in the data center. But we kept challenging ourselves, because we want to help our developers deliver product and code faster and not be slowed down by infrastructure. So that's why you've probably seen some of the things that Lockey talked about. We're constantly trying to push what we're doing in the data center. And yes, we also believe in containerization and a lot of this movement, because we really want to be able to speed up and simplify what we're doing. Hi, I'm Stan Chan. I'm from Visa. Probably everyone knows what Visa does, so I don't need to tell everyone that.
I'm a chief architect of systems working with the infrastructure, architecture, and engineering group, focused on infrastructure as a service and platform as a service. That's my current focus. And what we're trying to do with OpenStack is to enable developer productivity, basically focusing on giving developers the tools to build products and services for Visa without needing to worry about the underlying management of those applications once they go into a specific environment. That's one of the goals. And the other goal is to provide a platform that's invisible to developers, so that they can focus on just building value for the business. And I'm James Downs with Walmart. We've been running OpenStack for about three years. And we're looking at things like increasing agility, like everybody's doing. But Walmart's always looking at increasing efficiencies and driving down costs. So those are sort of the top things that we want to do with the cloud. Quick check on distros. Are you using trunk, a distro, what are you using? Why? Just really quick, and then we'll get into it. So we're using the packages from RDO. Sorry, I just skipped it. That's a quick answer. Yes. So no distro. We're building everything by ourselves. We're using Chef for cloud management. I'm sorry, for configuration management. I guess we'll keep going in this order, right? No, we just went out of the circle. There you go. We'll keep it going. All right, so we use upstream. We just upgraded from, we were sitting on Havana for a while. We had upgraded last year, and just upgraded to Kilo in one availability zone. At some point, the decision was made that we were going to invest in engineering ourselves and go upstream, not distros. So we're using Mirantis. We chose Mirantis, for one, because they're doing a lot of large-scale deployments. They're seeing a lot of different implementations.
We really wanted to leverage their knowledge in that space to help us design something that was going to scale with our business. And we're using a distro. There are actually several reasons for that. We need, basically, to partner with someone who is engaged with the community and has the expertise in order to build an OpenStack cloud at scale. And we chose Helion as our platform and partner in order to achieve this within our organization. So ultimately, it's about finding the right balance. And it would be great if we actually built the in-house talent, but that requires time and effort in order to build that knowledge and experience within our organization. And we're taking those steps as we speak. And we're running the Ansible deployment with support from Rackspace. So we're doing the opinionated deployment that Rackspace uses, with some of our own backports of patches and stuff for performance or specific features that we need. So it's essentially upstream. Thank you. So now let's talk about lessons. So if you could share with the audience mistakes that you made in this journey over the last few years, things other operators should watch out for, and where there's room for improvement. I'll start if nobody else wants to. You have the mic. I have the mic. I have the floor. So I think it's not so much about mistakes, Sumeet. I would argue it's about the journey and understanding where you start, defining your goals the right way. I mean, PayPal is 15, 16 years old at this point. There was a lot of legacy infrastructure, a lot of legacy ways of thinking in how people viewed infrastructure. And the question really is that if you view cloud the same way as you viewed infrastructure, you will design everything wrong. The journey is about taking applications that exist in legacy infrastructure, that have been designed for legacy infrastructure, and leaving them alone. Don't touch them.
What you want to do is, as you design new applications, as you design newer revisions of older applications and the way you manipulate data, you rethink how you're going to do that, because now you're working with cloud. So the company culture, how you view engineering, how you view infrastructure, is very important. The other part, I would say, is for the cloud team itself. I mean, I'm an internal operator of the cloud, and we want to make sure that from the ground up, you build automation, build visibility into your cloud, because if you don't do that, I can tell you, it's going to come back and bite you. There are no ifs about it, it's just a question of when and how often. So is there something that you did, which when you look back you say, that was the best decision? So unfortunately, my history with PayPal doesn't go back that far. But, and of course I mean, PayPal itself separated from eBay less than six months ago. So the history of PayPal itself is about six months old, and I can tell you from that: the more upfront testing you can do of your hardware and your software when you integrate, building the whole pipeline in a way where you're testing every step of the way, the more confidence you have that when you make something available to your customers, it's going to work, and you have confidence you won't get a call at two in the night from some paging system in the NOC. So this is going to be short. Yeah, we made some mistakes. I will not say all of them. That would be so embarrassing. So one of the things that I discovered at Workday, yes, we knew we wanted OpenStack. So the mistake that actually was very clear is we did not define the use case properly from the very beginning. So you need to define and understand your own use cases. Don't jump into building an OpenStack cloud if you don't know how you're going to use it.
There are so many ways to use OpenStack, so you need to define your own way to do it, and that will help you define your own architecture. Don't try to copy or mirror any other operator's architecture, because it will not fit you. Maybe, but not 100%. So define your use case very well and move on that one. What we did well is CI/CD. Before jumping into, I just want to run OpenStack in production, start crawling and start walking with your own continuous integration, continuous deployment system. Be sure that every single developer or system admin in your team is able to understand the pieces that they are putting in production. I'm not that old, and I actually joined Workday like a year and a half ago, and when I got into the team and took over some of the architectural responsibilities, a few guys didn't even know why we had an HA cluster. Just because we need HA, yes, but what is the case that you're trying to fix? So bottom line: use cases; second part, continuous integration and deployment all the time. That was very good for the system. Yeah, you definitely don't want to try to deploy all, what is it, 20, 30 projects now. There are a lot of projects that you're not gonna use from the outset, and you have no idea what you're gonna do with your databases or storage or something. So that's one of the things that we simplified on. We only deployed the basic services that we need. So we had essentially a compute cloud to start with. It was Nova, Glance, Keystone and Neutron, and actually it was Quantum back then, but not a whole lot more than that, because we didn't really need it. And you have to know what your use cases are to know what services you're gonna deploy, because you're not gonna deploy OpenStack into your data center and have a public cloud equivalent. You'll have the services that you absolutely need, and don't complicate your world.
Yeah, I was gonna say probably something similar. We really took an approach where we always knew it was compute, storage and network, the core stuff. So, however you deliver it, make sure that that is rock solid, because even when I first got to Lithium we went through like three iterations. The first one was just a shit show. It was horrible, and really what it came down to was lessons that you may have gained from the virtualization journey. When I look at what went wrong, I'm like, guys, you chose the wrong hardware. You didn't really double check that all this stuff was gonna work properly. You know, there were other things where I just kind of made assumptions. Sometimes we've been in this space for a while, and I've been using Amazon since like 2008, so I just kind of assumed everybody else really got it, and I had to take a step back. Like, you know, you mentioned, do people fully understand how things fail, and change your thinking getting into the cloud. So I kind of had to relearn mistakes and take a step back. And earlier it was mentioned about automation. We were taking that journey as well with Chef, and that transition kind of punished us because we did not have that process so dialed in. So, you know, be very deliberate about how you go about implementing these things. Get those prerequisites done, and then be very incremental on your OpenStack journey. I just wanna add that an important aspect is to focus on your customers. Work on a minimal viable product. Don't boil the ocean. Get the simple aspects out first and just work with your customers in order to build what they need. I think that's one of the more important aspects. And treat it like a software development project.
Implement important concepts like DevOps, infrastructure as code, CI/CD best practices and so on, and then you can iterate over and over again and build a product that basically fits the needs of the customer. So from getting started, right? To putting a cluster in production, to scaling it. Now, when it comes to that step when you're starting to scale your clusters, what design decisions do you make? How do you scale? So we had a little bit of our scale sort of forced on us, because when we first started out we had taken the networking design, sort of the enterprise networking design and the enterprise systems design and the enterprise IP layout design, and all these pieces to make OpenStack, and so you have these limitations of the hardware or the software that you have to deal with. So for example, our physical networking team was worried about the size of a CAM table, because we were just using flat networking, right? So your CAM table could be a problem in your physical networking. So we sort of semi-arbitrarily limited the size of a cloud region to 10 racks, because by all the calculations that we'd done, you weren't gonna blow out a CAM table on a switch that way. And so we just started making new clouds every time we needed to add 10 new racks, which adds a certain complexity, right? So instead of having one cloud, or maybe a cloud per data center, all of a sudden you have four or five or eight clouds or 13 clouds. Somehow you get odd numbers with this, but you end up with a bunch of clouds, and that's a complexity in itself. How would you, if you had to do it again, what would you do? Well, so ideally you'd start with a physical network that's optimized for running cloud architectures. You do the same thing with the hardware, you do the same thing with all of your practices, but who gets to do a completely green-field deployment at an existing organization? Not very many people. So that's part of the trick.
There was actually a talk earlier in the week about how you fit a cloud into an existing organization. You sort of have to take the pieces that already exist, the data center model, the hardware model, the networking model, and try to get that into your existing data center. So I'll add to that, right? So I think PayPal's gone through an evolution where, of course, we had these legacy data centers. We had an architecture where you have a centralized data store, and then you have web applications running in different models, and they were retrofitted into the cloud model, right? So the part that you said about how the network needs to be engineered to match the cloud model, we totally get that. When we split away from eBay we redesigned, we built a new cloud from scratch. That's the dev cloud that I think Anand spoke about a couple of days ago. This dev cloud was built with the networking that we want, right? So we had the advantage of starting out from scratch, and we could do that. But going back to the question of scaling, I think there's a trade-off between how big you want your cloud to be and how available that cloud is going to be, right? So at some point it maxes out, and then of course you hit your network limits. Vendors have tested their equipment to a certain level and not beyond, so you don't wanna push those limits too much in a production environment. I mean, PayPal technically runs like nine or 10 availability zones. They're all one cloud; there's federated Keystone to give users the appearance that there's one cloud, but there are nine or 10 different clouds effectively. Yeah, I think for us, in regards to scale, some of the types of workloads that we're running also kind of dictate that as well. So yes, there are these upper ceilings of what you can push with these clouds.
But if you have apps that have quorum and things like that, that kind of forces us to change our thinking. The times that we got bit were because of just that: we had one availability zone, and we're like, oh snap, we have like three things, we have quorum here, and then the control plane went down or the network plane went down, and then we were kind of screwed. So then the answer is right in front of us. We're like, look at the public cloud, look at how it's designed, take some of those patterns and start adapting them to how you're designing in the data center. Yeah, scalability is something that you need to think about from the very beginning, especially because even when you don't believe that your cloud will grow to the size that it could, you will end up in kind of this addiction mode to OpenStack. We actually had to set up a lot of quotas for our customers, because they went crazy. They said, oh, it's cool, I can create a network. Oh, it's so simple. Now I can connect these VMs to that network, it's so cool. And then tomorrow we're gonna have containers. And they were going crazy creating things, and suddenly, from one day to another, we were reaching capacity, and then there's no more cloud. So quotas were very important. So you will end up having to add more compute, and there will be a moment when just adding compute nodes will not be good enough, because you will be impacting the control plane. So there will be some alternatives for you. It's not the end of the world. You will have the ability to create regions and then availability zones. OpenStack is designed to do that. So you don't need to go crazy from the very beginning on creating an architecture that is gonna be fully scalable, but take into account that when you start adding compute nodes, there will be a moment when you need to change a few things to do regions and other things.
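The quota point above can be sketched as a simple admission check. To be clear, this is a toy stand-in, not the actual OpenStack quota engine (in practice you would set limits per project with the `openstack quota set` CLI or the Nova quota APIs); the resource names and limits below are made up for illustration:

```python
# Illustrative sketch of per-project quota enforcement, in the spirit of
# OpenStack quotas. NOT the real Nova quota engine -- just a toy admission
# check showing how quotas keep one tenant from consuming the whole cloud.

QUOTAS = {"instances": 10, "cores": 20, "ram_mb": 51200}  # hypothetical limits

def admit(usage, request):
    """Return True if `request` fits under quota given current `usage`."""
    for resource, limit in QUOTAS.items():
        if usage.get(resource, 0) + request.get(resource, 0) > limit:
            return False  # over quota on this resource -> reject the request
    return True

usage = {"instances": 9, "cores": 18, "ram_mb": 40960}
print(admit(usage, {"instances": 1, "cores": 2, "ram_mb": 4096}))  # fits
print(admit(usage, {"instances": 2, "cores": 4, "ram_mb": 8192}))  # over on instances
```

The real service does the same bookkeeping per project, atomically, on every create call.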
So if you don't have a system that can change the configuration of your cloud seamlessly, without affecting the data plane, then you're doing something wrong from the beginning. So you can think about that from the very beginning. So, anybody in the audience? So Sumeet, can I interrupt? I wanna throw a question back at you. Yeah, but before you do that, anybody in the audience, please feel free to ask questions as we go along. So Sumeet, the rest of us are sort of cloud operators, private clouds, and you are the founder of a company that's providing visibility into the cloud. So tell us what you see as the needs of your customers. What I see is that every single customer wants something different. Just right here, one aspect can be somebody would say, hey, I want to get maximum utilization out of my cloud infrastructure. I want to have the maximum number of virtual machines running on every node. Another customer would say no, I want the maximum amount of reliability. I don't want to pack in the virtual machines. That's one element of it: how do you use your infrastructure? How do you set policies on how to use the infrastructure? The other thing that we have seen is this whole thing about really running the cloud as a service within the organization, right? Now, we all have Amazon. It's very simple to use. We swipe our credit card, we get a virtual machine, it's running. We don't see any problem, it keeps running. In the enterprise, things aren't there yet. I mean, there's still some amount of operator intervention required. And at the same time, when something goes wrong, we still use trouble tickets. We still need to pick the phone up. Like, how many times do people actually open trouble tickets on Amazon? Amazon gives you all these APIs that you have access to. You want to get the metrics, you want to figure out how things are working, how much you're consuming. You have all those APIs.
Again, in the enterprise, we see that those things are just missing. They're just basic things that are missing. So bringing that self-service experience to the cloud in the enterprise, running that cloud in the enterprise as a service, as a true service, right? Because that service is what the product is. OpenStack is not the product. OpenStack is a distribution, right? So adding the tooling, bringing that self-service experience, adding the tooling to ensure that the cloud keeps running. All of you say you don't want to be woken up at 2 a.m. When I was an operator, I was always awake at 2 a.m. And that's the reality, right? I mean, and if you're sleeping at 2 a.m., you have like one ear on the phone. It's very challenging. Shit's always breaking, right? So you always need systems in place to react to those situations. You always need the tools, and they're just missing. Even if we all wish to not wake up in the middle of the night, we will sooner or later. We know that. We assume that. So I think what we're trying to get is a system that we can fix quickly, right? It's fine that I wake up in the middle of the night, I make a call to the right people, and in a very short period of time we fix things and we go back to sleep. That's what we're gonna end up doing. So if the system that you're building doesn't have the ability to be fixed in a very short period of time, there's something that you also need to look into. Chaos Monkey testing is very good. I really encourage people to do that. You don't need to get any tools. You can actually create your own Chaos Monkey testing in your own cloud. I don't know, do you guys do that? Yeah, so Chaos Monkey itself is part of the Simian Army stuff from Netflix, right? And it's very, very AWS-centric. So we have a project that hopefully we can open source where we're doing some of that stuff, where you can call an API and have it kill something out of your pool.
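A homegrown chaos test along those lines can be sketched in a few lines. This is a hypothetical illustration, not the panelists' internal project: the kill step here is a stub, where a real version would delete the chosen instance through the Nova API and then verify the service still answers.

```python
import random

# Hypothetical chaos-testing sketch in the spirit of Netflix's Chaos
# Monkey: pick one member of a pool at random, kill it, and check that
# the service survives. `kill_instance` is a stub standing in for a real
# Nova delete call.

def kill_instance(instance_id):
    """Stub for the real 'terminate this VM' call."""
    print(f"killing {instance_id}")

def chaos_round(pool, rng=random):
    """Kill one random instance from `pool` and return the survivors."""
    victim = rng.choice(pool)
    kill_instance(victim)
    return [i for i in pool if i != victim]

pool = ["web-1", "web-2", "web-3"]
survivors = chaos_round(pool)
# The service should tolerate losing any single node.
assert len(survivors) == 2
```

The value is less in the kill itself than in what it exposes: the rigid, not-fully-automated pieces that can't recover on their own.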
So we do have people working on sort of OpenStack tools for that sort of thing, because it's critical to find those pieces of your infrastructure that are rigid or not automated. We regularly find not-fully-automated pieces of the infrastructure, so it's critical to do. That's one thing, but when we think about operations, we've got to think about it in a more holistic way. It's not just about, hey, is my OpenStack service running? That's not why you're gonna get the call at 2 a.m., because I'm pretty sure you've put the systems in place to make sure it's running. You're gonna get the call when the user's application is not performing, right? And mostly when that call comes, you realize that, hey, I don't even have the data to tell the user whether it's performing or not. You don't even collect it, right? Is that for my cloud? Is that what you're saying? No, what I'm saying is that potentially on Amazon you can go get that information, but in your OpenStack cloud, you cannot. I think you were right to make the point, I mean, like you said, OpenStack is a distribution. There's a lot of tooling ecosystem around it that's needed for any of us to run the cloud. And like Edgar said, I mean, you will be woken up at 2 in the morning, because that's a law of physics, right? Systems will fail at some point or the other. The question is, can they auto-heal? If not, can a human intervention be fast and efficient so that you can go back to sleep right away? That's the key. I think what we are all looking for is the ecosystem within OpenStack, just like you said, what Amazon provides today, if that ecosystem can be evolved while all of this is being built up. I was happy to ask the questions, but if I get asked the question, I'm happy to take the conversation over too.
I think also one important aspect is that the DefCore OpenStack, basically the simple implementation of OpenStack, doesn't provide too much value in terms of giving developers the ability to actually manage their applications on basic infrastructure. It's all the added features that you put on top of it, like the PaaS layer, solutions like Cloud Foundry, Docker and those various different layers, that provide the value for developers. So James, other than this, okay, let's think about the full spectrum now, yeah? So how do you monitor everything? How do you monitor the user workload? How do you ensure the right performance levels? Yeah, so there are a few pieces to that. At the end of the day, we have a bunch of monitoring that just plugs into the legacy monitoring system. So we definitely have a NOC sitting there looking at stuff. But in many ways, that's the wrong cloud model, right? Because something bad happens, something eventually detects it, some person finally eventually sees something on a board, and they eventually call somebody, and eventually someone wakes up, and how many steps removed from the problem are you? The cloud model often should flip the legacy model over, so your monitoring shouldn't reach out into the cloud; the monitoring should be in the cloud, in the VM, reaching out. Your VM should be reaching out and doing things, and the detection should be automatic and automated. So we're not there yet, right? Probably most people aren't. We're still in sort of this ping mode of Nagios, or a plugin or something runs periodically and finally eventually detects that something's down, so it's not as reactive as it should be. But automatically testing or automatically detecting things from inside the cloud and reporting it out is more the model that you wanna go with. So the centralized Nagios thing shouldn't be the model. I would agree with that, James. I think I would term it sort of the cause and effect.
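The "monitoring in the VM, reaching out" idea can be sketched as a small push-based reporter running inside each instance. This is a hypothetical illustration, not any specific agent: the health check and the collector endpoint are made up, and a real agent would POST the payload somewhere rather than print it.

```python
import json
import time

# Hypothetical sketch of push-based (inside-out) monitoring: instead of a
# central Nagios polling VMs, each VM runs a tiny agent that checks its own
# health and pushes the result out. The check and endpoint are made up.

def check_health():
    """App-specific health check; here a trivial stand-in."""
    return {"status": "ok", "load": 0.42}

def build_report(hostname, check=check_health, now=time.time):
    """Build the JSON payload the agent would push to a collector."""
    return json.dumps({
        "host": hostname,
        "ts": int(now()),
        "health": check(),
    })

def push(report, send=print):
    # A real agent would POST this to a collector; we just print it.
    send(report)

push(build_report("web-1"))
```

The point of the inversion is latency: the instance itself notices the problem and reports it, rather than waiting for a central poller's next cycle.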
We are monitoring the effect, not the cause. The cause is what we need to monitor, to be able to look at it wherever it happens, whenever it happens, to have the flexibility of being able to threshold on whether I think a problem is critical, versus something I can manage and maybe look at on Monday instead of Saturday evening, when I would like to go out for dinner. That's wishful thinking. Yeah, so I think from a monitoring perspective, I agree with James, we're not there yet. We collect a lot of data, right? What do we do with the data? Anybody's guess. Well, that's one of the things: you collect a lot of data, but what does it mean, right? Everybody has tons of data. Big data is a thing, right? But what we actually want is big information, right? We wanna know things because of that data. We don't want a ton of data about hashes or whatever else, right? Actionable information. I'd say, sorry. Yeah, I was gonna say, I'd say that treating it as a big data problem is just the wrong way to treat it. You can't think of it like that. So I was gonna actually just say, along with James, it's like we're kind of going through this transition where we were the team that really deployed these iterations and got us to production, and now we're going through this education process where we're transitioning it to a team that hasn't got all the tribal knowledge of what has happened up to this point. And so I'm just like, okay, even if you come up with all these big answers of how you fix things, where does it go? Where does it exist? Are you still back at a run book?
It's kind of interesting, because I've seen some really interesting stuff coming out. There's a cloud provider in the Czech Republic called TCP Cloud that is using Sensu with Nagios, and they're collecting data, but for some of the repeated things that are happening, it's capturing these things so that you can hand some of this information off, and you don't have to have that one guy who knows everything; it actually trickles out to the extended teams. And we've also taken an approach as well, like I think on Monday during the keynote, we had an overview where we're trying to give big, broad visibility into what's happening. One of our team members did a really amazing job creating that. And then we also have our operator point of view, where I'm really trying to get them the insight that they need, so they can find that needle in the haystack of all this constant noise. And I kind of agree it's not there, but we're just trying to push it further and further. And that is my worry as we make this transition, that this team's gonna hate me when we hand this off. So you were talking about one person knowing something, and we have the same sort of problem, where somebody thinks that their job is, I'm the dude who types this command all the time, right? And so this is where sort of the DevOps thing comes in, right? You have to teach your server guys that they're not server craftsmen, they're robot programmers now. And when you figure out, when this and this and this happens, you have to do this to scale out your pool. Oh, and then by the way, you have to scale out your database too, to go with it. That tribal knowledge can't live in somebody's head. It has to go into a system or an automation or something, so that when that thing happens, it starts to happen automatically. And then everybody knows it. The system knows it.
You go read the code, the code knows it, and you can make it better. So that's what we do in our PaaS layer: it's basically codified best practices for deploying Tomcat or MySQL or whatever it is you're deploying, so that the next guy who comes along doesn't have to figure out, you change vm-dot-whatever and your properties file and everything else. They just get best practices, and someone later goes, no, you're totally wrong, you have to change this other thing, right? So you go v2 on it, right? Yeah, absolutely. One of the things that I'm hearing mostly at these conferences is a lot of testing and a lot of focus on the control plane, but very little on the data plane. So you're totally right. There will be calls about, yes, I have access to my VM or my container, but it's really slow. I mean, they refresh the UI and it's taking forever, what's going on? And then you go to your networking and you see that everything is fine, so you need to go to the host, and in order to identify the host where that VM is running, in, I don't know, hundreds of clusters, it's just gonna be mission impossible. Yes, you need to have a kind of framework where you can actually monitor what's going on in the VMs. There are some tools that you can use for doing that. I think we're just getting there. I don't think we are so mature in that part. So what you were saying about auto-scaling, auto-healing, it sounds wonderful. I don't think we're there. I think we're going in that direction. We need to go in that direction, but don't think that we're telling you it's what you need to do, because we don't even know how to do it, unless, I don't know. So monitoring, it's like everybody treats monitoring as an afterthought, right? You build out your infrastructure. Sometimes you're gonna get some problems. Some things are gonna happen enough times, and then you're gonna be like, yeah, now I gotta make some investments, get some monitoring in place, figure this stuff out.
Should it be the other way around? Shouldn't monitoring be a first-class citizen? Shouldn't we start there? So I would argue that it's not an afterthought, but people underestimate how much monitoring and how much visibility you need. And absolutely, as I said in the beginning, when you are building a cloud, visibility is number one. You have to have visibility into everything that's happening: your control plane, the data plane as an operator, the data plane as an application owner, visibility into the specific cluster where a particular application is running. At that level of granularity, it's absolutely a must. Yeah, so I think you're right, people don't put enough time into that at first. And the problem is that you don't really know what you need to know or what you need to gather. When we started out, we had a logging infrastructure, but then you have people who have their JVMs turned to debug, and we were generating more logging data than application data. There's so much of it that all you can do is suck it into Hadoop and never look at it, right? You don't go back and reanalyze it. So that has to come along with the application: you say, hey, this sort of thing matters to me, whether it's a heap size, or you're running something in Go or Node or whatever. You need to figure out along the way, these are the things that matter to me, these are the metrics for my application, and you need a system and infrastructure you can feed that into and actually look at. So sometimes, for example, you can just heave data at Graphite, right? And Graphite, maybe it scales, maybe it doesn't; they've done a lot of stuff with that. But you can just heave data at it, and at least you can see the data points today and go, okay, that sort of looks like my baseline, and then put confidence bands around it or something.
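The "heave data at Graphite, then put confidence bands around the baseline" idea is simple enough to sketch. Graphite's plaintext protocol really is just `metric.path value unix_timestamp`; the band here is a plain mean plus or minus k standard deviations, the simplest possible baseline, and the metric name is made up for illustration.

```python
# Sketch: ship samples to Graphite, flag values outside a confidence band.
import statistics

def graphite_line(path, value, timestamp):
    """Format one sample for Graphite's plaintext protocol."""
    return f"{path} {value} {timestamp}\n"

def confidence_band(samples, k=3.0):
    """Return (low, high) = mean +/- k population standard deviations."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    return mean - k * sd, mean + k * sd

def is_anomalous(value, samples, k=3.0):
    """True if value falls outside the band implied by past samples."""
    low, high = confidence_band(samples, k)
    return not (low <= value <= high)

# A steady baseline of request latencies (milliseconds) ...
history = [101, 99, 100, 102, 98, 100, 101, 99]
# ... and one sample formatted the way Graphite accepts it over TCP.
line = graphite_line("app.checkout.latency_ms", 100, 1460000000)
```

In practice you would write `line` to a socket pointed at Graphite's carbon listener; the band math stays the same wherever the data lands.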
But if you're doing that from day one, you sort of have that data and a baseline: okay, I have at least something basic, right? Because you can't log everything. You can't collect every packet. You can't inspect everything. I think as well, some of the things that you've learned in public cloud, you can take some of those lessons from there. That's kind of been our model: we look at behaviors. If we see that you're creating a VM, you're attaching some block storage, and all of a sudden it's gotten slower, something's going on, you should be poking in, you know? You wanna try to find those hotspots, and this is where you can leverage concepts from the public cloud to do monitoring; that's the strategy we've been trying to take. So really have trend analysis: you try to capture these things and see if you're noticing some anomaly in your public or your private cloud. So, while we're talking about public clouds: one thing that's different between public and private clouds is that we don't own the public cloud infrastructure. We are renting VMs, and even when we are renting the VMs, we almost assume that the level of reliability will be lower than if it was running in a private cloud. And then we architect our applications in a different way to deal with that unreliability, and shift the problem to some other layer to keep the application running. But now, when we design these applications in the enterprise, should we bring those practices back in, or should we look at improving the infrastructure in the enterprise? Or should we just say, hey, enterprise infrastructure is not gonna be reliable either? So that was the point I made when I spoke earlier about the legacy way of doing things, right? The enterprise has a way of assuming the infrastructure is always there.
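The trend-analysis idea described above, time a routine operation like attaching block storage and notice when it suddenly gets slower, could look something like the rolling baseline below. The window size and the 1.5x "slow" factor are illustrative assumptions, not anything the panel prescribed.

```python
# Sketch: flag an operation when it drifts well past its rolling baseline.
from collections import deque

class TrendWatch:
    def __init__(self, window=20, factor=1.5):
        self.samples = deque(maxlen=window)  # recent durations, in seconds
        self.factor = factor                 # how much slower counts as "slow"

    def observe(self, duration):
        """Record a duration; return True if it is anomalously slow."""
        if len(self.samples) >= 5:  # need some history before judging
            baseline = sum(self.samples) / len(self.samples)
            slow = duration > self.factor * baseline
        else:
            slow = False
        self.samples.append(duration)
        return slow

watch = TrendWatch()
# Normal days: a volume attach takes about 4 seconds.
flags = [watch.observe(t) for t in [4.0, 4.2, 3.9, 4.1, 4.0, 4.1]]
# Then one attach takes 9 seconds: "something's going on, you should be poking in."
slow_day = watch.observe(9.0)
```

The same probe works against a public or a private cloud, which is the point being made: the monitoring concept transfers, even though you only own the infrastructure in one of them.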
It's wrong to begin with, but that's the assumption people make. And I think the way we code in cloud, the way we design applications in cloud, whether public or private, is the same. The VMs, any resource, can go away at any time. It can be partitioned away at any time. The applications have to be resilient. Your PaaS layers have to provide for flexing up in a different region or a different availability zone, based on the scenario on the ground. Yeah, definitely, building as if you were going into a public cloud, where you didn't have control over the infrastructure, will make your application more resilient and better. But as a private cloud operator, you sort of get bitten by both sides of it, right? So our users say, well, this isn't public cloud, why do we have to do this thing? And then when they want it the other way, they say, well, we would never have to do that in a public cloud, right? So you get bitten by both sides of it, yeah. But allowing your application teams to say, "I'm gonna be lazy, because I know the operations team is gonna do things a certain way for me on the hypervisor, or they're gonna jump through hoops to make sure my VM doesn't go away suddenly, or we're gonna do something weird with the hardware to get data back", giving them those outs, it's great to keep the business up, right? You wanna keep the business up, but it makes your developers lazy, and the applications suffer for it. Let me respond to that. I don't think it's so much about keeping the developers lazy. There is still an expectation that your VM's not gonna die three times a month, right? So you want the VM to have four nines of availability; five nines is too much, but whatever, three nines, four nines, whatever it might be. But at the same time, you do want them to understand that the VM can go down at any time, and their application has to deal with it. It's both; it's not either-or.
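The "any resource can go away at any time, so flex into a different availability zone" style of coding described above could be sketched as a failover wrapper. The zone names and the toy operation are invented for illustration; a real version would sit inside a client library or the PaaS layer itself.

```python
# Sketch: try an operation per availability zone, failing over on error.
import time

def call_with_failover(operation, zones, retries_per_zone=2, backoff=0.0):
    """Try operation(zone) in each zone in turn; raise only if all fail."""
    last_error = None
    for zone in zones:
        for attempt in range(retries_per_zone):
            try:
                return operation(zone)
            except Exception as exc:
                last_error = exc
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"all zones failed: {last_error}")

# Toy operation: zone "az1" is down, "az2" answers.
def fetch(zone):
    if zone == "az1":
        raise ConnectionError("az1 unreachable")
    return f"served from {zone}"

result = call_with_failover(fetch, ["az1", "az2"], backoff=0.0)
```

This is the "it's both" position in code: the platform still tries hard within a zone (the retries), but the application is written assuming the zone can disappear.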
Yeah, I mean, you just work your way up. It's one data center; it can go away. We have a data center in the Bay Area; we know what we're sitting on. So if you're not designing to scale across, you're doing a disservice to your developers. And we're battling a mentality that for a long time has built vertically; now we're trying to shift that thinking, that you need to design horizontally. That's the only way you're gonna scale. That's what we're doing, and yes, it's a continual education process to get them there. Yeah, it's 50-50. So agree and agree. At Workday we have a bunch of applications, stateless and stateful. We started with the stateless ones, because those are the easy ones, to be honest. And those ones can very easily be gone at any time, and the developer will not yell at us or freak out just because the VM isn't there anymore, because none of the data is in the VM. The stateful ones, what we're trying to do with those is provide the feedback to the application developers that they need to change too. So we need to cover our bases, but they need to cover their bases as well. So it's 50-50 in that case. No, no, we're okay, we can keep talking. So I would add one thing to that. In my experience as an operator, what I found was that the VM dying is the easy case. The intermittent problems, those are the hard case. And when I was talking about public cloud being unreliable, that's what I was talking about. If the VM dies, that's the easy case, right? The problem... that doesn't change. It does change. In what way? I mean, it does change on... No, no, I'm not suggesting... No, no, wait, wait, wait. It does not... Let me clarify. Not from the perspective of the application, but from the perspective of the operator delivering the cloud service to the user, it does change.
No, I'm not suggesting it changes between a VM dying and there being an intermittent problem. I'm saying it doesn't change between public cloud and private cloud. The problem set is exactly the same. The best practices people have evolved for coding against AWS work on OpenStack as well. Period. Any questions from the audience? How long do we have? Okay. We can repeat the question. So if I may repeat the question: as somebody who's going to consume hardware from a vendor, how do we view the hardware? What would we like the hardware vendor to do to make our lives easier? Fairly paraphrased? Okay, who wants to take that? I can answer. I think from my perspective, as much as we would like to see the hardware as a commodity, the fact is that the reason we get woken up at two in the morning on a Monday, or whenever, is because there's some problem in the hardware. So we would like the vendors to allow their customers to treat it as a commodity, but not to treat it as a commodity themselves. I mean, we find hardware failure rates, and I won't name the vendors, much higher than we would like to see, right? And as much as somebody like Sumit can monitor the hell out of it and give me all the data I need, I would like not to have to deal with that. Well, that's impossible, right? Hardware will fail. It could be some fan in there. It could even be the glue that sticks the heat sink to the CPU. It will, you're right, Sumit. My point is not that it will not fail, but that the failure rates we see today should be a lot lower than they are. Just to give one example: in the early stages of our production system we didn't have, for certain reasons, the OpenStack controller in HA mode. And Friday night, the motherboard of that piece of hardware dies, totally. So we called the data center, and they said, yeah, the motherboard is dead.
So we actually unplugged the hard disk, plugged it into a similar box, and everything came back. That kind of feature, just to give an example, is what we're looking for in hardware. I don't think the white boxes are there yet; they're not gonna give you the visibility to identify that the motherboard was actually dead at that moment, so that we can react instead of investigating over and over what the problem was. So that's my experience and perspective for the vendors. Okay, I think we need to wrap it up. Yeah, I think we can do that. I see some Ancestry guys back there that are waiting. What? Our friends from Ancestry, they're next. Oh, okay, okay. So let's thank the panel. Thank you, guys. Hey, thank you, everybody. Ciao ciao. Thank you, guys.