All right, thank you, everyone. Welcome. I'm Jeff Dickey, Chief Innovation Officer at Redapt, and I've got a great panel for you today. I'm going to start just down the line and let everyone introduce themselves. Go, Dan.

Good morning. My name is Dan Sperling, with Getty Images. Getty, for those of you who don't know, is in the business of imagery. Now that you've heard about it, it's kind of like the car you buy and then start seeing everywhere on the road. We do a lot of editorial imagery: the imagery out of Africa around Ebola, the imagery from the Cannes Film Festival, or the imagery from our American version of football. I'm the VP of tech services there, and I manage basically everything outside of dev for Getty: all the classic infrastructure, as well as cloud platforms and application support.

Good morning. I'm Mark Williams, the CTO for Redapt. Redapt is a systems integrator that helps customers who are building out data center infrastructure. Prior to that, I was a Redapt customer in my prior life at Zynga, where I basically had Dan's job: I was responsible for all of the game infrastructure, which originally started in traditional data centers, exploded into Amazon, and was ultimately re-architected back into a private cloud we called zCloud.

I'm Sebastian Stadil. I'm the founder and CEO of Scalr. Scalr does policy-based governance over multiple clouds, and we help lots of very large web-scale companies run their production infrastructure.

Hi, I'm Sheng Liang. I'm the co-founder and CEO of Rancher Labs. We make a Docker management system that can run on OpenStack. The other reason I'm here is that in my previous company, Cloud.com, I was involved in a lot of early cloud buildouts. We created a piece of software called CloudStack, which some of you probably know, and Cloud.com was also an early member of the OpenStack community.

Thank you, guys. So we're going to talk about large-scale production infrastructure. I was kind of laughing, because I don't know if we want to talk about large-scale test and dev infrastructure instead. Everyone here has not just built one hyperscale cloud, but has been responsible for, or integral to, many of them. So it's great to have you. I just want to start with: what is large? What does that mean, and how do you define it? What is large scale to you? Sheng, do you want to start with that one?

So to me, large really has many aspects. I work with customers and users who have built very large systems but with a small degree of multi-tenancy, meaning they probably have a small handful of apps, but each app is scaled to a very large number of physical nodes or virtual machines. By whatever your definition of large is, I think that is one form of large-scale production system. Another form we've seen is with service providers running a multi-tenant system: even though the majority of their users don't necessarily have a lot of virtual machines or resources in the cloud, overall it still amounts to a very large-scale deployment.

I'll give a different perspective as well. So certainly there's the quantity number, as Sheng mentioned.
Certainly when you're talking about a megawatt or more of power drawn by physical infrastructure, that's one dimension. We talked about this a little bit in one of yesterday's panels, too. Another dimension of large scale is just how much change is happening per unit of time. When that grows to numbers that exceed the capacity of a small number of humans, you have to be investing religiously in automation. So that's just another way to look at it.

I think a good heuristic is basically that when you start worrying about power, that's when you've hit that scale.

And is that important? I know there's a lot of tech cred involved in these large-scale infrastructures, and we're always talking about large and efficient and automated, but what is most important about that? Can you have a small infrastructure where you still have operational efficiency? What's the goal you want to reach, and why is it so important that we talk about these large-scale infrastructures? Is it the rate of change of the applications, the rate of change of the infrastructure hardware? What is it?

The way I want to look at it is: what's important to the business, right? Zynga was very good at defining top-level company objectives and ensuring that every organization, whether on the development side for games or on the infrastructure side for operations, had objectives that tied logically up to those top-level goals. And so what we focused on in operations was that triad of good, fast, or cheap: pick two; usually you can only pick two. During growth phases, the two you pick are fast and good, because assuming you're making money, it's worth pouring that money into something that works really well and is there quickly for those who are consuming it. As you reach the maturity part of the curve and begin to be able to optimize, you can start looking more at cheap. What's difficult at that point is if everybody else is on a different page — everybody else wants fast and good and doesn't care about cheap, and you're the operations person being pressured by the CFO to dial that number back down — you've got to get those objectives changed for your customer so that they care. Otherwise they're never going to move. In fact, this is what happened: I built zCloud, I was aligned with the CFO to invest all that money, and then all my tenants in Amazon didn't want to move. I couldn't get anybody to be the first willing participant until we had alignment of objectives, where that game studio — all game studios — were actually rewarded for profitability because of chargeback. So that's another strategy to get alignment there.

So I wanted to take an audience question real quick. But first, on that scale question: Sebastian, what's a number? Can you give us a number from some of the stuff you're working with?

Yeah, so one of our largest customers runs a multi-data-center CloudStack deployment. They're now at, I think, 40,000 physical servers. They also have a very large presence on Amazon and some other cloud providers. It didn't start that big; basically they started on Amazon and repatriated some workloads. I think the point where they started talking about scale and hit that inflection point was basically when they started talking about the amount of power available in the data center. Yeah.
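To put a number on Mark's "rate of change" point, here is a back-of-the-envelope sketch. Every figure in it is a made-up assumption, purely for illustration; the point is only that once fleet size times change rate outruns what a small team can do by hand, automation stops being optional.

```python
# Back-of-the-envelope check of the "rate of change" heuristic.
# All numbers below are hypothetical assumptions, not figures from the panel.

nodes = 5_000                   # machines under management
changes_per_node_per_week = 2   # deploys, config changes, patches, replacements...
minutes_per_manual_change = 15  # time one human needs to make one change by hand

weekly_changes = nodes * changes_per_node_per_week
manual_hours = weekly_changes * minutes_per_manual_change / 60
operators_needed = manual_hours / 40   # roughly 40 working hours per person per week

print(f"{weekly_changes} changes/week ~ {manual_hours:.0f} hours of manual work")
print(f"that is ~ {operators_needed:.0f} full-time operators with zero automation")
```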
Well, are there any questions from the audience? Why are you here, and what did you want to hear? Are you trying to scale out infrastructure? Just curious what scale means to you, and whether there are any questions so far. Is everyone still... yeah. It is early in the morning. Still sleeping out there? Day four. What about my planted questions? Any of those?

I've got a question on scale. How far did you scale an individual OpenStack cluster, if you will, versus going to multiple clusters instead?

So my experience prior was with CloudStack; I'll give you those scale numbers, and then I'll give you my opinion on what I see happening, or struggling, with OpenStack scaling. With CloudStack — and this was Sheng and Zynga working together in the early days — we scaled our first availability zone, or sorry, our first region, to 12,000 physical nodes with three management servers. And then we had a second region that was 30,000 nodes, also with three management servers. So it's been acutely frustrating for me to hear, summit after summit, all of the pain about how difficult it is to get OpenStack to reliably scale. With standard OVS, the number I keep hearing is 50 nodes; with some more advanced networking topologies, or using direct networks, I hear numbers between 100 and, striving for, 200, and I just keep shaking my head. This summit I've learned more about cells, where you can chain together several 100-to-200-node API clusters. But there are still major deficiencies in how well that's federated, and there are known, significant limitations; it's still considered experimental.

That said, Cloudscaling was one of our earliest partners at Redapt, and they had designed an OpenStack, and specifically a network architecture, that could scale. I know their biggest claim to fame was 22 racks for Walmart — if I assume 22 nodes per rack, that's probably in the 400s of nodes, and built to go beyond that. It was very much an Amazon-like architecture. The reason it was so different, and the reason I liked it, is that it's very utilitarian. They took all the complexity that is OpenStack, all the far-reaching bits, and said: no, we need to make compute scale, we need to make storage scale, we need to make the networking scale. Making those simplifications to their stack made that possible. And so what I fear with OpenStack is all of that heavy complexity, the intent to integrate with so many different proprietary platforms, because that's what customers are also asking for. But those pieces are more fragile. Those pieces have more complexity in them, and complexity doesn't scale, at least at this stage of the maturity game.

Yeah, expanding on that thought: I think we all agree that complexity doesn't scale, which is why every time you talk about scale, you talk about consistency and simplicity. So totally agree on that point. Answering the question you didn't ask, though, which is related: from a Getty perspective, we look at it as a management-versus-scale problem. From the standpoint of "does it scale enough": if you talk about 200 nodes, is 200 nodes enough for a region, for an availability zone, or whatever you want to call your pod? You then balance that against how much management overhead it takes to manage that kind of environment.
For us, as we start to look more at hardware isolation and environment isolation, given some of the fragility of moving OpenStack into production, we have intentionally broken those apart within a data center, not even across data centers, so that we have the ability to withstand physical hardware failures at a massive scale, not just at the one- or two-server level, and not have massive application impact. So we are intentionally breaking things apart into local AZs and then geo-AZs, which, to answer your question, indirectly addresses the challenges with scale, specifically with OpenStack. And to touch on one other thing: we also looked at some of the modules that were more affected by scale problems, and we've gone with some proprietary solutions to meet those challenges.

I want to emphasize a point Mark made. From my perspective, whether it's OpenStack or CloudStack, it's a little misleading to think about an absolute scalability limit, because it really has a lot to do with how you configure your system. Imagine you have shared storage — and not just shared storage, say it has to be Fibre Channel or iSCSI storage — a lot of these things come with inherent scalability limits, right? And the same goes for networking or other parts of the infrastructure. But I'm actually quite encouraged, at least recently, by the success of services like DigitalOcean: such a simple service, yet gaining a tremendous amount of adoption. And a few months ago I signed up for a beta service — I think it's an OpenStack implementation from GoDaddy — which honestly looked quite similar to DigitalOcean. I think they have some very capable people, and I would assume they solved some scalability problems as well. So I'm actually quite hopeful, if we can move toward more simplified infrastructure, because the world is changing. It's just very difficult to carry forward the traditional legacy — shared storage, fancy layer 2 networking — and still make it scale. Versus today: if you look at DigitalOcean, they make the infrastructure layer simple and bring some of the complexity up to a layer above the infrastructure, like platform as a service or Docker containers, right? Those things are just better designed to scale, and they introduce the complexity at different levels. So for many of you, if you're designing and architecting a system that you really want to scale, I would suggest we should all focus on making it simpler.

Yeah, I've seen a lot of customers struggle just to get beyond 20 physical nodes when you're using Neutron, when you're using a bunch of components like that. You hit a wall pretty fast when you start using off-the-shelf stuff.

And going back to it: is that important, and do we need it for the masses? There are only so many people who are going to scale to 50 or 100,000 physical nodes. Is it important for the physical environment to scale, the cloud platform to scale, or the apps to scale? What are we trying to achieve?

I think that comes back again to that conversation around management overhead versus scale. 50 to 100,000 nodes — yeah, sure, that's not going to be very common, but 20 nodes, that's every single day.
So from that standpoint, if you multiply it out: you need 200 nodes and all of a sudden you've got 10 environments to manage; multiply it out to 2,000 and you have 100. That becomes a management nightmare. And that goes to the point Mark made about what scale is: scale is that point at which — power, for sure, is one marker — people doing things the way we have historically done them just does not work anymore. So if you're going to say the whole reason we're going to cloud is that we want to make our people more efficient, yet we're going to have hundreds of pods to manage, you've just broken your entire value proposition.

Go ahead, Mark.

I was just going to say, OpenStack can only solve some of these problems. Until there's infinite, faster-than-light networking, you have to know how to design a production, large-scale network first to even be able to take software that can virtualize that environment and orchestrate it. OpenStack's not anywhere close to solving that problem; that's not really directly in its domain yet. You need smart people to do that. But I think the frustration with OpenStack and the scale need is that right now you have to roll your own. Cloudscaling did it on their own, with something that wasn't necessarily put back upstream. And the companies doing it now have smart people, PhDs, figuring out how to extend the clusters and tune the MySQL databases so that they perform. Fundamentally, it's all backed by an interpreted language that's going to be more costly just in terms of dealing with all of that traffic. And now at this conference I'm starting to hear, oh, well, we should probably let people start contributing in compiled languages like Java. I'm like, well, didn't we already do that? That works. So, sorry, a little frustration there.

And then, changing gears a little bit: I think that if we ignore application scalability, we're ignoring the whole point of why we're building scalable cloud infrastructure. It's also very hard to build scalable apps. And so that shift we're seeing, and that we're promoting and driving, toward hyperscale — I'm going to use that buzzword — infrastructure that can grow massively must be coordinated with applications that can grow massively. Cloud-native, cloud-aware, whatever term you want to use — there's a huge application component there as well.

Any other questions? Yeah, can you use the mic, do you mind? Just behind you.

So Dan and I have talked about this a little bit before, and you actually anticipated my question, but a lot of the discussions about scale that I hear can be loosely translated as: well, we'll build it and then they'll just show up. And that's kind of been a theme with OpenStack for the four or five years I've been involved in it. I'm wondering if you can talk some more about Dan's point about how you engage the application side in building these things. Do you wait for them to show up and say, okay, I think I'd want to do this on Docker and containers, but can I get a bigger budget for running on AWS? And then you say no, and then you start talking. Or do you just happen to show up one day and say, did you know what you could do? And they say, well, we never thought about that.
Can you talk about that a little bit more, in terms of engaging them? I mean, it comes from the business side.

Yeah, I'll take the first crack at it, and I'll be quick — I know I can be long-winded. Honestly, this is probably my number one passion area across everything I do in my day-to-day job, and in full disclosure, I have messed this up so many times already. When I first started — and this is before 2010 — we were looking at vCD, and we were building up what were early versions of clouds. We built those as infrastructure solutions, and then we were shocked that when we built it, they didn't come. And then I did it again and messed up — my team did, and as the leader of the team it was my responsibility, so that's why I take the responsibility there. But we did it again. We built an HP Cloud solution that was not optimal, but that's a separate discussion. Again, we thought we'd build it, and we'd build it better this time, because obviously the barrier to them coming on was that it wasn't good enough. And again, they didn't come. Iterate, iterate, iterate. We're at the point now where the OpenStack solution we're building, as interesting as it is, is not gaining as much success as we'd like. We're trying different ideas, and we're not getting enterprise-wide decisions that say "we will do blah," because we're too big a company to make that enterprise-wide "we will do blah" call when we can't show proof-positive data that it will be as successful as we claim it will be. If we were a small company and there were 10 of us who could rally around it, absolutely. But there are 500, 600 of us in the technology organization, and a shift that titanic is not possible. So instead, what we're doing is grabbing the connectors among our dev partners — the people who carry a lot of weight, so that when they say something, everybody goes, oh, that must be true. We're grabbing those people, really dedicating and investing time in them, getting them amped and hyped, and using them as our evangelists, for lack of a better term, to move out through the rest of the dev organization and show the wins that can be accomplished by changing the way they develop, and then also doing that on our cloud stack.

So I've also been through the "built it and they didn't come" experience. The strategies I used to overcome that: A, I was lucky. Zynga was founded in 2007, so there were no legacy apps, right? All of the developers needed and wanted to code in very modern, automated-deployment ways. So that was a great culture in which to succeed, but still, building zCloud after everybody was happy in Amazon, I had the built-it-and-they-did-not-come problem. So again, I found an early willing adopter — we were the first to successfully align on the chargeback model, with a studio that wasn't profitable and needed to get profitable. Building zCloud let us provide an environment with twice the performance at a third the cost, doing exactly what they were doing in Amazon, fronted by a multi-cloud orchestration tool that effectively let them pull down a dropdown and have a different way to deploy onto cheaper and better infrastructure. And what that evangelism story turned out to be is: not only was it cheaper and as performant, it was better. The user experience was better, because the CPUs were richer in the environment I had compared to what Amazon was running. So that's one.
Now that Amazon's pricing has come down significantly since I did this in 2011, it's harder to prove, especially at a smaller scale, that your new private cloud is going to be cheaper than Amazon. But what I would suggest, if this is where your company needs to be investing and you need to encourage your internal customers to consume it, is that you subsidize the chargeback. A, you should do chargeback. B, you might have to subsidize it to encourage those customers to consume it more quickly. And you absolutely have to be an evangelist, and you have to have them involved in the design. It's a hard thing for an IT person to accept that somebody else can have good ideas about what to do with our infrastructure, but I'll tell you, involving my customers was critical. The first experience was taking them into the data center: they saw all the space in the racks — why can't we put more servers in there? So you educate them about power, you educate them about cooling and the limits and realities of data centers. One of the greatest contributions from my developer community, my customers, came when we were building out zCloud and had different-sized data center buildings, so we were worried about keeping availability zones equal in size. A very simple suggestion: can't you divide it horizontally, and make a top-half and a bottom-half availability zone, split at the middle of the rack? That was the perfect solution we just had not thought of. Involving your customers gets you great wins, and they feel great to be able to contribute.

Isn't it cheating to subsidize the chargeback?

Not if you've already paid for it — if you've spent millions of dollars and there's no usage there. Zynga for a long time was saying, we're deploying apps in Amazon; once we know what they are, then we move them. We actually did that for about four months, but once you've built enough capacity that's sitting unused, things start going directly into zCloud, because A, it was better — we had finally proven that with early customers — and B, it was empty and therefore cheaper, and it just made sense to go there.

So, having been on the vendor side, I've seen a lot of IT departments build out an OpenStack or some other private cloud, offer it, and then see very little adoption. Basically there's been a fundamental shift in how IT spend is decided: the developer is king, and the developer decides where to place his or her workload. So the approach I've seen work is for corporate IT to set up a set of guardrails — we will do chargeback, we will do auditing and logging, we will do all these things — and then offer a portfolio of clouds to the user. That's one of the reasons Scalr has built a lot of policy and governance capabilities: to set up those guardrails, to help do the chargeback and all that. And then it's up to the user, the internal customer, to decide where he or she wants to place the workload, and you grow things from there.

So, I think Sebastian raised a really good point, which is that at the end of the day, at some point the adoption has to really come from developers.
OpenStack always thought it had great developer traction, but it turned out those were probably more in the developer-as-builder category, as opposed to the developer-as-application-writer, developer-as-user category. So in that area we definitely have more work to do, and it's not easy. Honestly, I struggled with that for a long time with CloudStack. We built hundreds of clouds, and they were all in production years ago, and I would say not all of them really took off like Zynga's did; many of them just died. So it's not good enough to make your system scalable, easy to deploy, easy to maintain — if in the end there's no adoption, it's not successful. That's actually what got me into Docker, and I was really happy to see the focus on containers; in the second-day keynote there was a lot of talk about that. And it's really not about containers replacing virtual machines; I don't think that's going to happen. I think containers will run as a workload on virtual machines, and they're just a great workload for platforms like OpenStack to take up. I think in the coming years, the more OpenStack can do to enable great workloads like Docker containers, the better position we'll all be in, and the more adoption there will be.

The question — oh yeah, go ahead.

Sure. Thanks for doing this, by the way. You sort of touched on this a moment ago with your mention of containers inside of virtual machines, and there are various opinions about that. My question to all of you is: are we building out virtual machine infrastructure which will be made obsolete by bare-metal containers in a short period of time?

So, yeah, it's a heterogeneous world out there. I mean, mainframes are still running. It is quite possible that the private clouds being built today are just going to be obsolete tomorrow, but that's kind of just a reality of the industry. IBM still makes billions of dollars selling the, what's it called, the zSeries? Yeah, mainframe. So yeah, most companies have both EMC and NetApp. They have Juniper and Cisco. It's just a very, very heterogeneous world out there.

Yeah, I think the legacy spectrum of applications is still going to be very dependent on a hypervisor type of approach. And if I think about the ideal world — would I deploy containers on hypervisors and orchestrate both, one inside the other? I would prefer not to. Again, simplicity scales, and I always think about: when it's broken, how many things do I have to eliminate before I've found root cause and can fix it? Because in the meantime I'm stressing people out or losing money. So I would try as much as possible to have containers running on a simple OS on bare metal, orchestrating them differently than inside a hypervisor.

Yeah, I don't think the virtual machine will go away. I've heard basically two arguments for why containers will replace virtual machines. One is that there will be no need for virtual machines — containers will run on bare-metal servers. The other is that containers in the future will be better isolated, so containers will work like virtual machines. Those are basically the two arguments people make. And I think the first one, containers running on bare-metal servers, is certainly possible — I know people who do it today. But from my perspective, that's a bit of a corner case, just like bare-metal cloud: it's out there, but it's not AWS.
The fact is, machines are becoming so much more powerful; their capacity keeps doubling. And as I said, the kind of people who like containers tend to be developers, and it's just inconceivable that in a development environment, a CI/CD environment, you would provision a bare-metal machine to a developer as the unit of provisioning — it's just too big for a developer to consume. The other argument is that containers are going to grow up and have really good isolation. I see great things like LXD certainly going that way, but I think it really has a long way to go. The attack surface, the security boundary, of virtual machines is just so much cleaner, so much better understood, than the entire Linux kernel. So it could get there, but I'm one of those guys who honestly still struggles with some of the container security story, and I've been doing containers for a couple of years now. Even these days, every time I try to run Docker on, say, CentOS or Red Hat — these guys are very security conscious and did tons of work — it just really slows me down. So a lot of times I just end up running with SELinux disabled. I know I'm not supposed to do that, but it shows how much of a gap there really is, when I can't even get it to work with some of the security mechanisms in place.

I'm going to give a little bit of a different perspective. I know this doesn't answer your question, because your question is probably more philosophical, but this is Coke versus Pepsi, this is Chevy versus Ford. No, seriously: we are going to continue to have containers and VMs, and it's going to be personal preference. One may move to the front — like we have people still using CloudStack and people using OpenStack — and one may have more hype right now. But I think what will ultimately drive it comes all the way back to this conversation about scale: what is the easiest and simplest way for you to scale to meet the business need? And for me it's people, it's always people. You may have people who say, you know what, I think I can manage this environment much better if I have the containers running on some type of VM versus running directly on bare metal; I think that's better for me. And if that's what the team believes, and they're able to support it better that way, far be it from me to come back and say, nope, philosophically I believe it's better to provision bare metal and run containers on it. So it's going to be personal opinion for a while, I think, and I'll let the people leading the technical charge on my team drive that decision.

So I've got a question I want to take down the panel — or actually, I just want you to answer something: what's the biggest disaster you've seen in hyperscale, or in scaling out, and what was the lesson learned? Sheng, do you have something?

Yeah, I certainly remember some of the very difficult upgrades we went through in the early years. It definitely involved us learning, in a very painful way, that a large-scale system just cannot be upgraded in one shot.
You had to create canaries, you had to make sure the individual components would actually work, and you inevitably end up in a situation where part of the system is running one version of the software and another part is running another version. And by the way, I don't think these challenges have gone away. Even today, the dream of a fully autonomous, self-upgrading, loosely coupled architecture still does not exist. Regardless of what you use — Kubernetes or whatever — it's really still the same: at the end of the day it still requires operators, an operations team, a DevOps team, who deeply understand the behavior of the entire app and the individual components, to see it through.

Not necessarily at a huge scale, like tens of thousands of machines, but still at reasonable scale: I saw an OpenStack where a single call to list instances resulted in 700 SELECTs against MySQL — 700 queries. And in that environment we were doing something very simple: we have a desired-state engine, and we were just comparing observed state with desired state, making a couple of API calls to figure out what volumes are out there and what instances are out there. That alone was massively oversubscribing and overloading the database. And yeah, that's just a disaster.
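The "desired state engine" Sebastian describes just above is essentially a reconciliation loop. Here is a minimal sketch of one, assuming a novaclient-style `client.servers` interface and a `desired` dict of name-to-spec entries; the function names, the data structure, and the polling interval are illustrative assumptions, not Scalr's actual implementation.

```python
import time

def fetch_observed_instances(client):
    # One "list instances" call; in the deployment described above, this single
    # call fanned out into roughly 700 SELECTs inside the cloud's database.
    return {server.name: server for server in client.servers.list()}

def reconcile(client, desired):
    # desired: {instance_name: creation spec, e.g. image/flavor keyword args}
    observed = fetch_observed_instances(client)
    for name, spec in desired.items():
        if name not in observed:            # should exist but doesn't: create it
            client.servers.create(name=name, **spec)
    for name, server in observed.items():
        if name not in desired:             # exists but shouldn't: remove it
            client.servers.delete(server)

def run(client, desired, interval=60):
    # Even this simple observe/compare/act loop, polled on a modest interval,
    # was enough to overload the control plane's database in the story above.
    while True:
        reconcile(client, desired)
        time.sleep(interval)
```

The loop itself is trivial; the scaling problem in the anecdote was the cost of the cloud's own list APIs underneath it.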
One of my experiences — I've had several outages, but the earliest ones, when we were in Amazon, were just excruciating, because my mindset about how I wanted to consume infrastructure was that in Amazon I wanted that whole data center to never go down, and it went down repeatedly. And when you look at the SLA and realize that half of an entire region can go down without any kind of credit, it's acutely frustrating. But it forced a change in behavior, because of how important that infrastructure was to the business. It forced a change in the maturity of the application, so that it could actually be fault-tolerant across availability zones, and with every outage we got better at maintaining availability, preserving state, recovering from failure, and building that automation. And once you do that, your workload can run anywhere. You're so much more bulletproof and able to move it, and that made it even easier for us to then move things to zCloud, because of that investment.

I think it's an interesting question, from the standpoint that I'm sure all of us in this room have had one server go down and cause application impact. That really is no different from having an outage at massive scale; the only difference is the level of impact, right? It's way different if it's the US federal healthcare system going down versus a very small baseball trading card company. But the reality is, it's the exact same problem, and it comes back to exactly what Mark was saying: have you built your application, and your thinking, around the idea that anything here can fail? And I'm going to use a cop-out. The biggest outages we've had in size are, for sure, caused by change — when we upgrade something, roll something out, do something, et cetera. That's guaranteed to be the largest outage by scale. But the longest-duration, biggest-pain, most frustrating, most demoralizing ones for our teams — and this is the cop-out — are usually bugs. It's a buffer-credit issue on a Fibre Channel network. It's the whole Google GCE outage that happened around their networking stack, right? You mentioned a firmware challenge yesterday; we had a challenge with a very specific piece of firmware that was doing something that was just a bug, and we happened to be the first ones to hit it. We've had firmware issues on the global load balancers, right? You have them, and they're so hard to find, because in a true open source model you think, I mostly trust this, but I'm going to go to the community and make sure it's right, and I've got to be a little careful. But for a load balancer or a Cisco router — and everyone here, I'm sure, has had a challenge like this — you almost expect them to operate correctly. And when VRRP doesn't, and it's passing traffic through both nodes for whatever reason, and it says it's healthy — that is really hard to find. So rather than trying to say I will fix every piece of infrastructure, or I will make OpenStack completely resilient — there was a really great session yesterday on Pacemaker, which we must do as the providers of that underlying infrastructure — at the same time, we can't rely on that. We have to build the application stacks to be resilient.

One little note on that: at some point you're going to have to make some assumptions about what's going to be available. On Amazon, at some point EBS volumes got corrupted and disappeared. If you start out by saying, hey, anything can happen, meteorites can fall on my data center and take out North America — if you start with absolutely zero assumptions about what's going to remain available, it becomes really hard to build a large-scale infrastructure.

Yes, but — and the but is that it's totally based on level of impact. If the cost of that impact or that outage is $50, well, the decision you make about what you're going to assume will be available will be pretty much everything. I'll just assume it's all available, right? Because I can't afford to make any other decision.

Spot on, yeah.

But if the impact is $50 million, then you multiply that by some risk weighting and spend that on making it available — say an outage like that has something like a ten percent chance in a given year, that's on the order of $5 million of justified spend on availability.

Yeah, exactly. Yeah, exactly.

Well, microservices are very popular, and they're all about small, lightweight, distributed. Maybe we're onto something in OpenStack where it's kind of like micro-infrastructure: lots of small pods, distributed. Maybe it's built like that and we don't even know we're doing it.

Are you trying to spin the deficiencies right now?

Yes, I think we're spinning it as very fault tolerant. Yes. Many availability zones.

What's your question, sir?

Hi — any one of you can address this. This is my first summit, by the way; it's been great. So far, in all the conversations and discussions I've had, people have been talking about OpenStack, and everybody seems to say it's still new, it's still nascent, it's growing, there are still a lot of things to be worked out, et cetera. You guys obviously have a lot of experience with running production clouds, and large ones at that.
Can you speak to the challenges you've faced in ensuring that the data for your applications remains coherent and consistent, what steps you've taken to keep data available while you upgrade your OpenStack or your CloudStack, and how you've made sure you haven't lost it? I come from a financial company.

We have time for... well, we have time for no answer. Yeah, come see us, come see us after. We want to thank the panel — thanks, Sheng, Sebastian, Mark, and Dan. Thank you all for coming, and enjoy the rest of the conference. Cheers.