From our studios in the heart of Silicon Valley, Palo Alto, California, this is a CUBE Conversation.

Hello, and welcome to the CUBE studios in Palo Alto, California, for another CUBE Conversation, where we go in depth with thought leaders driving innovation across the tech industry. I'm your host, Peter Burris. This is the second in the series on demystifying cloud native that we've undertaken with some of the thought leaders at Cisco. The basic thrust of this conversation, and of the series as a whole, is to help developers learn something about what it means to build increasingly distributed applications, utilizing cloud native services that will work, scale, and serve enterprise needs. To have that conversation, we've got Dominik Tornow, who is a principal engineer in the Office of the CTO at Cisco. Dominik, welcome back to theCUBE.

Hey, thank you very much for having me again.

Okay, so up front I said we're going to continue our series on demystifying cloud native, but before we do, let's review. What do we mean by cloud native?

In our last installment, we defined cloud native applications as applications that are scalable and reliable by construction. And in order to be scalable and reliable by construction, an application has to be able to detect and mitigate load and failure.

So if we think about detecting and mitigating load and failure, it's a very simple definition, but very powerful, as all good definitions are. What types of technologies are available today that allow us to actually instantiate this notion of cloud native as you've defined it?

You want to look under the umbrella term of operations automation. There are multiple solutions available, either proprietary on clouds like AWS, Google, or Azure, or in the open source space, where the most prominent solution nowadays is obviously Kubernetes. And microservices is an architectural approach that people can start thinking about, because microservices, especially when they are stateless, are scalable and reliable by construction themselves. That gives you a very good foundation: you build a scalable and reliable application out of scalable and reliable components.

Look, I've been around the IT industry for a long time: worked inside IT, run IT organizations, been an analyst. And I know from experience that you can take great technology and nonetheless create crap applications with it. So what is it about microservices that increases the likelihood of success with cloud native? And, just as importantly, what must developers stay cognizant of to ensure that they stay within the good guardrails and don't drift toward junk applications?

First, let's look at what a microservice application architecture actually is, by contrasting it with a traditional, also called monolithic, application architecture. At the boundaries, looking from the outside in, we are still talking about one coherent application. From an end user's perspective, we are not looking at a loose assortment of services; we are still looking at one coherent application performing a task. But looking at the inside, we see significant differences. In a traditional application, all components of the application run within one process, on one machine, and communicate locally.
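(As an aside, here is a minimal sketch, in Go, of the kind of stateless workload that claim rests on: because the service keeps no state between requests and exposes a health endpoint, an orchestrator such as Kubernetes can detect a failed or overloaded instance and simply start more replicas. The endpoints and port are illustrative, not something discussed in the conversation.)

```go
// A minimal, stateless HTTP microservice. Because it holds no local state,
// an orchestrator can run many replicas behind one service address and
// replace any instance whose health probe stops answering.
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	// Health probe target: the platform "detects failure" by noticing this
	// endpoint stops responding, then "mitigates" it by restarting or
	// rescheduling the instance.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// The actual work. No in-process state is kept between requests, so any
	// replica can serve any request, and scaling out is just running more copies.
	http.HandleFunc("/greet", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "hello from instance %s\n", os.Getenv("HOSTNAME"))
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```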
However, when we move to a microservice application, the individual components, and actually the individual component instances, run in their own processes and communicate via an actual network. Having your individual component instances run in individual processes allows you to achieve scalability and reliability easily: you can scale up more component instances, or, in case of failure, scale up replacements for them. But, as you said, you have to keep in mind that this does not come for free, because you are throwing a few challenges the developer's way. On the workload level, the challenge is now partial failure: one of these component instances may experience a crash fault at any point in time, whereas in a traditional application, the application as a whole would experience a crash fault, but never only part of the application. And when we move to the network level, it looks even more bleak, because now messages may get lost, messages may get duplicated, and messages may experience delay, so latency. All of that is something the developer has to face and has to work around.

Let's dig into this a little bit. The monolithic application is basically a single process, so all the resources are bound together inside a single memory space, a single shared state, and when one of them fails, that brings them all down. So the user knows explicitly whether it's up or down. But when you start building some of these microservices, a particular component, perhaps critical, perhaps not, let's say a security feature within the cluster, could go down while the rest of it might feel like it's still working, which could dramatically increase the likelihood of exposure on any number of different levels. Have I got that right?

You got that right. And especially if we talk about state, the dreaded state, but the one that we actually all need in our applications, we're talking about inconsistencies. And that is obviously the nightmare of every application developer and application operator.

So we've got message loss, message duplication, message reordering, and we're introducing latency, because we're putting a stateless network as the mechanism through which the different components communicate. Have I got that right?

That is absolutely correct. And let me throw in one more keyword in that case. We were talking about workload-level partial failure and network-level message loss or message duplication. Unfortunately, there is actually no way to reliably detect: has a request not been sent, or was it lost? Did the receiving process experience a partial failure? Has a response not been received, or was it lost? You cannot reliably detect any of these conditions, which leads us to the point that we cannot guarantee exactly-once message delivery. We can only guarantee at-most-once or at-least-once. But as developers, all we ever want is for one service consumer to call a service provider exactly once. We have to work around these constraints, and this is what makes our application development very, very complex.

Yeah, we want that consistency and that ability to discretely say what is or is not isolated, in the overall notion of what constitutes a transaction.

Very correct. Bottom line, that is a good takeaway: microservices, one way or another, will kill your transaction if you don't do them right.
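(To make the exactly-once point concrete, here is a minimal sketch, assuming Go and an invented request-ID convention that is not from the conversation: the caller retries on a lost or delayed response, which is at-least-once delivery, and the callee remembers which request IDs it has already processed, so a duplicated call does not apply its effect twice.)

```go
// Sketch: the network gives us at-least-once delivery at best, so the caller
// retries on failure and the callee deduplicates by request ID. This is the
// usual way "effectively exactly-once" processing is approximated.
package main

import (
	"fmt"
	"sync"
	"time"
)

// processed remembers which request IDs have already been handled, so a
// retried or duplicated message does not apply its side effect twice.
var (
	mu        sync.Mutex
	processed = map[string]bool{}
)

// handle plays the service provider: safe to call more than once per ID.
func handle(requestID string, apply func()) {
	mu.Lock()
	defer mu.Unlock()
	if processed[requestID] {
		return // duplicate delivery: ignore
	}
	apply()
	processed[requestID] = true
}

// callWithRetry plays the service consumer: at-least-once semantics. When
// send fails we cannot tell whether the request was processed or lost, so we
// simply send it again and rely on the provider to deduplicate.
func callWithRetry(requestID string, send func() error) error {
	var err error
	for attempt := 1; attempt <= 3; attempt++ {
		if err = send(); err == nil {
			return nil
		}
		time.Sleep(time.Duration(attempt) * 100 * time.Millisecond) // backoff
	}
	return err
}

func main() {
	// Simulate a lost response: the work happens, the caller retries, and yet
	// "charging card once" is printed only a single time.
	attempts := 0
	_ = callWithRetry("order-42", func() error {
		attempts++
		handle("order-42", func() { fmt.Println("charging card once") })
		if attempts == 1 {
			return fmt.Errorf("response lost")
		}
		return nil
	})
}
```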
So, from a developer standpoint: you're within Cisco, but you've spent a lot of time thinking about the developer role, about what developers need to do differently to fully exploit these cloud native capabilities. How do you see developers and infrastructure people doing a better job of communicating, so that each can be aware of the realities the other is facing?

Personally, I strongly believe in strong, accurate, and tangible definitions in order to have a solid basis and foundation for good communication. Our responsibilities in a cloud native world, whether as application developer, application operator, or infrastructure operator, are only going to get more complex. So we rely on solid and precise technical communication to identify the challenges and to communicate solutions for those challenges effectively.

Now, one of the things that's interesting about the cloud native microservices set of technologies is that they're starting to be paired with other classes of technologies that go beyond simply orchestrating the communication amongst various resources in a cluster. Where do you see some of these new technologies, like Istio and whatnot, starting to assert themselves to help developers do a better job of building cloud native applications?

Let me state the following hypothesis: for cloud native, we got the workload management right, but we didn't get the network right just yet. When it comes to workload management, solutions like Kubernetes do a fantastic job for the application developer and the application operator, and provide solid primitives to build your applications on top of. However, we are still suffering from problems, not within a Kubernetes cluster, but across Kubernetes clusters. When we look at the Kubernetes networking model, for example, for communication within the cluster we are set, we're good. For communication across clusters, we still have some challenges. We do see emerging solutions in this space, for example Istio and other service meshes, that increasingly address the situation not only within a cluster but also across clusters. But we still need to make a leap forward into a different kind of cloud native networking. And I do believe that cloud native networking will show itself, or define itself, as workload-to-workload connectivity. Eventually, we will separate the runtime domain, the cluster, from the connectivity domain, and then enable a workload on one cluster to talk to a workload on another cluster seamlessly, without opening about 15 or 25 tickets.

Yeah, exactly. And so the communication becomes natural to each of the workloads?

Correct. The communication becomes natural to each of the workloads, which is a prerequisite for efficient development. Of course, I quipped a bit with the tickets, but it is an actual reality that as soon as you leave one cluster and, for example, need to reach workloads on premises, workloads in a different cloud, or sometimes even just a different availability zone, you will run into communication processes with the infrastructure folks in your department. That communication is heavily built around tickets, and it will slow you down a lot. In our agile world, that is not sustainable.

Well, look, as you said earlier, there are four things you have to worry about as you request services from the network: latency, loss, duplication, and partial failure.
As you increase the latency, the other three are absolutely going to create problems for you.

Oh, yes, absolutely.

And so I think that's kind of what you're saying: even within a single cloud provider, if you start changing regions and introducing distance, you start introducing latency, and the issues of partial failure, message duplication, message loss, et cetera, will assert themselves and become a bigger challenge for developers.

You know the term Heisenbug, right? In distributed applications, a Heisenbug is a bug that will disappear as soon as you look at it.

I didn't know that, but now I'm thinking about it.

In distributed applications, when you test your system on a local machine, or even a set of local machines, you have a very good chance that the actual corner cases will never show up in your test cases. But, as you said, once you introduce latency, the Heisenbug goes from an odd outlier to a sure thing. So as soon as you move your application into production and roll it out across geographical regions, introducing that latency, you will see a lot of it.

The Heisenbug. I like that. All right, Dominik Tornow, principal engineer in the Office of the CTO at Cisco, talking about demystifying cloud native. I want to thank you once again for being on theCUBE. We look forward to seeing you again.

Thank you very much, me too.

And once again, thanks for joining us for another CUBE Conversation. I'm Peter Burris. See you next time.
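(To the Heisenbug point above: a minimal sketch, in Go, of deliberately injecting latency and message loss around a call; the wrapper is an invented helper rather than anything mentioned in the conversation. Running tests through something like this makes the corner cases that hide on a local machine show up before a multi-region rollout exposes them in production.)

```go
// Sketch: corner cases rarely surface on a local machine, so inject
// wide-area-style latency and occasional loss on purpose. Timeout, retry,
// and ordering bugs then become reproducible instead of Heisenbugs.
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// flaky wraps any call with random latency and a chance of dropping the
// message entirely, roughly approximating cross-region network conditions.
func flaky(call func() (string, error)) (string, error) {
	time.Sleep(time.Duration(rand.Intn(200)) * time.Millisecond) // injected latency
	if rand.Float64() < 0.1 {
		return "", errors.New("simulated message loss") // injected loss
	}
	return call()
}

func main() {
	for i := 1; i <= 5; i++ {
		start := time.Now()
		out, err := flaky(func() (string, error) { return "ok", nil })
		fmt.Printf("attempt %d: result=%q err=%v after %v\n", i, out, err, time.Since(start))
	}
}
```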