If you have any questions, go ahead and raise your hand, and he's going to repeat the question into the microphone. So, with that. Yeah, awesome. Thanks very much. Can I have any warnings on time? Yes, please — let's say five minutes before the end, I'd appreciate it if you tell me. Good. Thank you.

So, thanks all for being here. My pleasure being here. This talk has a lot of content, so I'm going to move fast. It's being recorded, so you can watch it even faster on YouTube later, or slower if you'd like. That's the good thing about things that go to YouTube. This talk is on four reasons why you need Istio. Or, I took a different spin, talking about the basics of distributed systems. I think most people developing applications today, especially in the microservices world, forget that they've entered the realm of distributed systems, and they don't realize that, and that's the problem. So that's my perspective.

Things you will not be seeing today: you will not be seeing source code. You will not be seeing cats, if you're expecting pictures of cats. And you will not be seeing an introduction to Istio. There's a friend of mine here who's going to give more of an introduction to Istio. Is that correct? Right, Saturday? Sunday. Sunday. All right, so go there. This is not an introduction-to-Istio session.

My name is Diógenes Rettori. I work at Pivotal. This is my Twitter handle; feel free to follow me. Mostly I tweet about technology, every now and then a few personal things, but it's a good place to catch up on things.

So let's go to distributed systems basics. First, let's talk about the advantages of writing distributed applications — of taking what was once one big application and distributing it. Often distributed systems become more reliable.
They become more scalable, just by the nature that what was once a single piece, hard to scale, is now distributed into smaller pieces that can — and probably should — scale individually. Now the problem is that the engineering skills required to write and develop distributed systems are much more advanced than what you'd need for a single monolithic app. An application composed of 15 or 20 different pieces is more complicated to write than an application that's going to be packaged as one file and run on an Android phone, for example. Just the basics of that, right?

Also, when dealing with distributed systems, there's an increased need for tools and patterns that facilitate development. On patterns, I recommend a very good book by Brendan Burns, one of the creators of Kubernetes: Designing Distributed Systems. It's a short read that you can probably finish in a day — not even 200 pages — but I strongly recommend it, and I'm going to address some of the patterns from that book in this session.

Let's continue with the bad news about distributed systems: they're hard. One piece of bad news is that you're often dealing with chains of calls. What was once a set of calls to functions that happened inside a single application, a single package, is now happening across a distributed network of applications. And when there's a problem with one of your components, it becomes much harder to debug. You can't just open a debugger and see what's going on with that specific piece, because first you have to know where things are failing. So that is a problem: how do you know exactly where the problem is happening?
So this is one of the problems with distributed systems: just identifying where the fault is. Another thing that makes distributed systems hard is that you're dealing with different types of protocols. Again, with one big monolith, most calls were internal calls, library dependencies, or just one function calling another; you didn't have to deal with many protocols. But in distributed systems, some communication is going to be message-based, some HTTP-based, some will use Protobuf and gRPC, some will be file-based. Just the fact that you're now dealing with different types of protocols makes it harder to develop distributed systems. So have in mind, when you're writing microservices applications, that they are in the realm of distributed applications.

Which brings us to a very important piece of this talk: the fallacies of distributed computing. These are mistakes developers make when they develop applications without thinking that they are now part of a distributed network of applications. Most of these concepts are from the 80s, and one of them was added in the early 90s, so they're old concepts in software development, but they're very important. The first of them — and there are eight — is that we often develop thinking that the network we're dealing with is a reliable network. We somehow trust the network: I'm going to talk from A to B, and as a developer I just expect the call is going to get there. That's the problem — we develop without considering that that's not the case. Another fallacy is that bandwidth is infinite: not only bandwidth as in the ability to transmit information, but the ability to process that information.
Because bandwidth stops being a problem the moment whoever's receiving a call doesn't have the ability to process the information. So more than bandwidth itself, it's the ability to process information. A funny note on bandwidth: while, from a technology perspective, we've increased bandwidth by more than a thousand times over the last 10, 15 years, latency is still a problem. Just so you know, if you make a call to a service from New York to London and back, with zero processing time, it's about 38 milliseconds just in latency. Purely from physics, at the speed of light, it takes about 38 milliseconds for a ping to go from New York to London and back. We often don't think about that when developing applications, especially distributed systems.

With that in mind, the next fallacy is that we often think networks are secure. Networks are not secure, and I'll give you some interesting examples later on. Another: topology doesn't change. This is one that, in my early days as a developer, I never thought about, and I'll come back to it. Latency is zero — I mentioned latency already: we think communication is going to arrive immediately. And again, I don't remember myself thinking about this 15 years ago when developing an application; I assumed everything here was true in my source code and the things I developed, which is wrong. Continuing: there is one administrator. That essentially means, oh, there's one person I have to talk to — a single system, one group of people. This has become less of a problem lately, but when the list was created, yes.
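The 38 ms figure can be sanity-checked with a rough speed-of-light calculation. This is only a sketch: the distance is an approximation, and real fiber paths are both longer and slower than light in a vacuum.

```python
# Rough sanity check of the "38 ms just in latency" figure:
# round trip New York -> London at the speed of light in a vacuum,
# with zero processing time. The distance is approximate.
DISTANCE_KM = 5_570            # great-circle distance New York <-> London
SPEED_OF_LIGHT_KM_S = 299_792  # km per second

one_way_ms = DISTANCE_KM / SPEED_OF_LIGHT_KM_S * 1000
round_trip_ms = 2 * one_way_ms
print(f"round trip: {round_trip_ms:.0f} ms")  # ~37 ms; real pings are higher
```

Light travels roughly 30% slower in glass fiber, and routes are not great circles, so observed round-trip times are well above this physical floor.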
Another one is that transport cost is zero: how many times do we think about the payload versus the overhead of a call or request when we make it? I don't think we do it that often — unless it becomes a problem or impacts the business. And we often see that: when it starts to impact the business, it's "oh, my network bill from AWS is starting to get complicated; what am I doing, how can I bring down the number of bytes I transmit?" And the last one is that the network is homogeneous. So these are all fallacies. These are all things that I forgot to think about when developing applications: they are all lies. Every single thing on this list is a lie, okay?

And of course, this is a presentation on Istio, so I'm going to talk about four of these fallacies that I believe Istio helps you with. Istio can certainly help with more of them, but especially given the time, I want to touch on four. Let's start with "the network is reliable" — and please remember, what's on this list is not true; they're all fallacies.

Let's talk about network reliability, just to introduce the concepts a bit. There are a few ways you can think about the network. If you just consider the MTBF — mean time between failures — of network equipment or servers, that is already an acknowledgment that things fail. And there's stacking: if you put routers or network components in parallel, you get a longer effective MTBF, but if you put them in series, you get roughly half. So again, there's just the physical nature of hardware failing. And this is a switch, by the way.
And there's also the aspect that every network we run today is some sort of virtual network. It's not necessarily a cable directly connected to a switch; you have to factor in that the machines running the network are processing many, many other things as well — any Linux environment may be running virtual networks alongside plenty of other workloads. We don't think about that when developing applications: the actual networks are not necessarily reliable. And these are servers. They look very bad, but they're servers.

Now, the second point I'm going to address is, again, that the network is reliable — but I'm taking some poetic freedom here: I changed the word "network" to "endpoint". Many, many times, when we're connecting to or interfacing with an external application or endpoint, we tend to think that that endpoint is going to be there for us. And we don't often design applications thinking that whatever endpoint I'm connecting to might not be available to handle my request. So I'm switching this one from "the network is reliable" to "the endpoint is reliable". And a big mistake I made, which becomes a very strong recommendation for you, is that I always designed for reliable endpoints, assuming that whatever I was going to interface with would be there — always, all the time. So I never dealt with alternative error flows when integrating with other systems, because during testing we assume: if those applications are not running, my application is not going to work. These two systems need to work together; if one is not up and running, then my application is not going to work. But many, many times I ended up compromising the experience of my application — I, the owner of application A — because application B was not running.
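A minimal sketch of designing for the opposite assumption — the endpoint may be down — is to bound the wait with a timeout and degrade gracefully instead of letting application B's outage break application A. The URL and the fallback value here are placeholders.

```python
# Calling a dependency that may not be there: short timeout + fallback,
# so application A's experience survives application B being down.
import urllib.request
from urllib.error import URLError

def fetch_recommendations(url, timeout=0.5):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except (URLError, OSError):
        # Application B is down or slow: degrade, don't disrupt the user.
        return "[]"  # empty recommendations; the page still renders

# Hypothetical internal endpoint; falls back to "[]" when unreachable.
print(fetch_recommendations("http://service-b.internal/recommendations"))
```

The design choice is that the caller, not the callee, decides how long a failure is allowed to hurt the user experience.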
So, a very important recommendation: design for unreliable endpoints. Design for endpoints that might not be there. And again, especially in microservices — please also hear "distributed systems" and "distributed computing", because that's what's really happening.

The next fallacy I'm addressing (and hopefully I'll get to what Istio has to do with all of them; I seem to be okay on time) is that the network is secure. And this one is especially interesting. I'm not necessarily talking about whether a physical network channel is secure, but whether the communication between applications is secure. Now, how many of you remember the Equifax breach? It's not even a year ago, right? I'm sure those of you living in the U.S. had to spend at least an hour of your time thinking about how this would impact you. Am I going to sign up for the package Equifax is offering? All these things you have to think about. And when you think about what caused it — okay, you can sort of blame the unpatched Struts vulnerability; that's how people got access. But there is unpatched software everywhere, every day; we need to keep patching our software and fixing bugs every day. So is it enough for me, as a developer, to just blame unpatched software for a breach like this? That's naive, right? Because we all know that security is not a one-time exercise; it's a practice you need to have. I invested some time reading about how the vulnerability happened, and it was sort of like this: there was an application, and there were users here — evil users; I should have put a sad face here.
They were talking to an external-facing app that was using Struts. Struts was great technology back then; the patterns in Struts are still used — it's the model-view-controller pattern, great for separating concerns inside applications. Now, this Struts app was running inside the DMZ, the type of network that separates an internal network from an externally accessible one; the Struts app sat in that middle zone. The thing is — and this is what I mean when I say the application network wasn't secure — this Struts app had unrestricted access to other applications that it should not have had. That's how the failure happened: someone used the vulnerability to get access to the Struts application, and then this Struts app, even though it should have had a well-specified information flow, also had access to other components inside the network, which were then exploited to get our data. That tells me that whatever application network they had there was not secure. So have that in mind: you're not necessarily talking about the physical communication network, but about how much of your application you're exposing to the network. Should a database allow millions of subscriber records to be retrieved? Should that have raised an alarm? Probably. Should I accept a call from a user when I don't know who that user is? Or tons of calls from a user that should not be making those sorts of calls? Those are all red alerts that something wrong and weird is going on. So have that in mind: the application network is not secure.

Now, the next one. You're developing your application — this is you developing your application, and you're on Stack Overflow because you don't know what you're doing — and then your application runs on your machine, and it's great. We always test some aspect of the application on our machines.
I'm just using this example to justify that topologies do change. The topology of the network your application participates in is one thing on your machine, another in QA, another in production. We often assume topology is not going to change, but just going from the development environment — the Java, C, or Golang application I'm running here — the network of that environment is different from the network of other environments. From the moment you're developing an application to the moment you run it in a QA or production environment, the topology changes. They are different.

And there are more complications with "topology doesn't change". If you're dealing with sensitive information — say PHI, patient health information, under HIPAA compliance, or PCI DSS, the Payment Card Industry Data Security Standard — there are rules that say code running in production must be separated from code running in a QA environment, which effectively means the topology of the production network is going to be different. This is a slide from a recent presentation given at the Boston Kubernetes Meetup that I run, and I love it, because it was someone talking about how they run Kubernetes in a PCI DSS environment — an environment handling and dealing with credit card information. There is a rule in PCI DSS that says: separate development and test environments from production environments, and enforce the separation with access controls. Again, this just confirms that, yes, topologies are different. If you're developing assuming that the way things look on your laptop is the way they'll look in production, that's probably not a good assumption.
Now, given that topologies do change, another problem arrives: how do we deal with changing topology in an effective manner? And this is where the bad news comes in. This is data from the 2017 State of DevOps report, and it says that only 28% of high-performing companies have automated configuration management for their applications. That's very bad. And what does high-performing mean? In that context, one of the metrics is being able to turn a code change into production in less than one hour. And this is not just how long a deployment takes: it's from the moment code is pushed to a repo, how long it takes for that code to be packaged, tested, deployed, and running in production. That metric is called lead time: the time between a fix or a new feature landing in a repository and the moment it reaches production. High-performing companies do that in less than one hour — passing, of course, many gates, many checks, sometimes even manual checks. But the problem is that dynamic configuration management is hard, and I can tell you it's hard because even among the very high-performing companies, only 28% do it. So it's complicated — very complicated — and there are still manual steps involved. And the point about topology changing and dynamic configuration management is that you should be able to dynamically know where things are running. If someone needs to change the IP of a database manually, that's not dynamic configuration management, right?
If the endpoint of an application changes from one IP to another and someone has to update that manually — if the path changes and someone has to change it manually — that's not dynamic configuration management. Hopefully you know there are ways to do this.

Now, to talk about "bandwidth is infinite", I'm going to have to derail the conversation a bit — that's why there are tracks, and apparently you know when you're getting off them. Good. So again, concepts from Designing Distributed Systems: there are three groups of patterns for designing distributed applications. There's the single-node group — techniques you apply to applications running close to each other on a single node. Even in a distributed system, you can take advantage of patterns such as the sidecar pattern or the ambassador pattern, where you bring some sort of intelligence close to the workload you're running. There are serving patterns, the more popular ones: you have a web server, and you replicate that web server so you have ten copies of it — that is a distributed system pattern. And there's batch, which is what the name says. The replicated stateless pattern is very popular: if you have a stateless application and you need more capacity, you replicate as many copies as you'd like, because they don't handle state — state is handled somewhere else. State is always handled somewhere, so "stateless" is a complicated thing to say; there's always state being handled somewhere. You just make many copies of the exact same application, and you're good with that, right?
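The replicated stateless serving pattern can be sketched in a few lines: because no replica holds state, a client or load balancer can send each request to any copy. The replica addresses here are made up.

```python
# Replicated stateless serving: any copy can handle any request,
# so a trivial round-robin over the replicas is a valid load balancer.
import itertools

replicas = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]  # placeholders
pick = itertools.cycle(replicas)

for _ in range(4):
    print(next(pick))  # rotates through .1, .2, .3, then wraps back to .1
```

If the application held session state locally, this rotation would break it — which is exactly why statelessness is what makes this pattern cheap to scale.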
Now, sharded is another pattern for distributing, especially used when there's a very large amount of data involved. Think about it: if the data is small enough that you can keep consistent copies of it on many nodes, then there are advantages in doing that. But when you have large amounts of data and it's expensive to keep full copies of all of it, you start thinking about sharding, where you distribute the data so that you don't have to store all of it, all the time, on every replica. If you're sharding, you should always assume that having one replica of each piece of data is not enough: if it goes down, you're going to have disruption, and you don't want disruption. So sharded data deployments often come together with replicated data models: some part is sharded, and you also replicate parts of that data somewhere else.

Scatter and gather is more for processing. If you've heard of MapReduce, it's a similar pattern: in a tree model, you distribute the processing load to many leaves, and after they finish processing, at some point in time, you reorganize the results in a way that makes sense. So when you're dealing with processing data, that pattern covers it too. And then there's batch — another very popular one, of course; you normally use it for scheduled jobs.

Now, my favorite — the session right before this one touched on it a bit — is event-based. In my opinion, the great advantage of serverless is that it's event-based: it's not request-response; it's an event you're going to process, and you consume capacity, hopefully, only when you get an event. Messaging-based systems are often event-based too. So, as I said, very common types of event-based distributed computing today are serverless and functions-as-a-service. Have you ever seen a serverless data center? This is how a serverless data center looks. It's amazing, right? How did they do that? We don't know. Oh, sorry, I can't — I know, I lied.

So, congratulations: you are now certified distributed systems developers. I even have my certificate that I printed for myself. After finishing this part of the presentation, I am now officially a certified distributed systems developer, by the Distributed Systems Institute. Of course, I just made this up, but the point is that now you know some of the hard problems involved in developing distributed systems, and you should factor that into your day-to-day development activities.

Now, finally — I have a little over 10 minutes — I'm going to talk about something that was actually in the title of this presentation. For me it was very important to give you the idea that, yes, microservices development is distributed systems development. There are tools and patterns that make your life easier when developing microservices applications, and Istio is one of them — a very good one.

So the first thing I'm going to address using Istio (apologies for the lack of focus when I took the picture for this slide) is "the network is secure". This is a lie: the network is not secure. So what do we do to handle the problem of network security? One of the things Istio has out of the box is the ability to support mutual TLS — mutual Transport Layer Security. That means that if endpoint A talks to endpoint B, they know each other, they are trusted, and you have formally established that they can communicate with each other. And you know why this is important: if we assume the network is secure — it's not — it might be the case that an application, like that Struts app, accesses another one when you, as the architect, know that that flow should not happen. So again,
there are technologies that help: Istio, with mutual TLS, allows you to formalize the information flow. If you, as the architect of the application or of its data, know which systems should be used for which business functions, then that should be formalized: this application can only talk to this one and no one else, and under specified circumstances. So have in mind that Istio, with mutual TLS, lets you specify a secure, formalized communication channel. It's not just "I'm going to add transport layer security, so everybody has TLS — but everybody can still talk to me"; you're actually specifying who you expect to be interfacing with. That reduces risk, and again, security is a practice: you always have to do these sorts of exercises.

The next one is "topology doesn't change" — and we all now believe that topology does change. Okay, so you understand that just going from your development environment to the QA environment, your application topology is going to change, and to deal with that you need dynamic configuration management, which is hard. But if you keep dynamic configuration management in mind and make it part of your practice, you're going to have less trouble. One of the things Istio does to address dynamic configuration management is this: there's a component in Istio called Pilot that knows where the endpoints are, and as new endpoints come and go, it gets notified of their existence. So if endpoint foo, version 3, comes up, Pilot is notified that foo version 3 is available, and for anyone who wants to interface with that endpoint, Pilot knows exactly where it is, all the versions available for it, and even more: how many times you can call it, and under what circumstances. So, to address the fact that topology does change and you need dynamic configuration, Istio can help with that. That was reason number two for Istio: dynamic configuration management of endpoints. Very good.

Here's the example, though. Say my application running on node A, under a certain IP on a certain network, changes: it's now running on a different network under a different IP. Istio's Pilot is notified of this change, and you have access to that information if you want to call that application. Again, topology will change — because it does change.

Now, "the network is reliable" is also very interesting, because sometimes we compromise our application by not giving its real users the priority they need. Say I have an application A calling B. As the owner of application A, do I want to disrupt my own experience because B is not running? What sometimes happens is that, even knowing things are failing, we still wait for failure: I'm going to call application B, I know it's down, and I'm still going to wait for a timeout; I call application B again, I know it's down, and I still wait for a timeout. So the end-user experience gets disrupted even when you already know the application you're interfacing with is not responding. Why wait for failure if you could know right now that it's not working, and keep the call local? One of the things Istio does, through circuit breaking, is open the circuit: it knows that the application is not responsive, and every so many calls, or every so many minutes, it tries the application again to see if it has come back up. The fact that you're not waiting for failure improves the overall response time of your application, because you're not waiting for things to fail before doing something — you already know they're down. And then there's the fact that there's a
centralized repository that knows which applications are up or not: any new caller of that same application also knows it's failing and can keep the call local — again, increasing reliability without having to wait for failure.

And to address "bandwidth is infinite", there's rate limiting. Rate limiting is most often associated with API management, but I think API management is changing a bit: it was seen only as a technology for external APIs, but more and more we see internal APIs. If you're dealing with microservices — multiple applications, distributed systems — you actually have multiple APIs internal to your organization. In the same way that, if you've ever used the Google Maps API, you have to pay to make more than, I think, five calls per second, you should have that level of control inside your organization too. Some applications in your organization are more important than others, and you should prioritize them: if you have a mission-critical application, you should probably give it the ability to receive more requests than a regular application — otherwise other applications might impact the mission-critical one. So one of the things Istio can do is add rate limiting to your applications, so that you don't fall into assuming your bandwidth is going to be infinite. You can give each individual user — and you can do that based on a JWT, a JSON Web Token — different levels of access to your application in terms of number of requests and requests per second.

Given that I have two minutes, let's go to the summary, addressing the four distributed systems fallacies I talked about. The network is not reliable: the recommendation is to use circuit breakers. The network is not secure: the recommendation is to use mutual TLS. Topology changes: the recommendation is to use dynamic configuration management, and discovery if you can. And bandwidth is finite: the recommendation is to use rate limiting.

So that's it. My Twitter handle is @rettori, and that's what I wanted to talk to you about today. Thank you very much. Now we do have a couple of minutes for questions — if you have a question, I can repeat it. Four minutes? All right. I'll be in the back if you want to talk to me individually; this is a topic I'm very passionate about, so feel free to reach out. Thank you very much, have a good day.
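As a reference for the summary above, here is a sketch of how two of the four recommendations — strict mutual TLS and circuit breaking with request limits — are typically expressed in Istio configuration. The namespace, host, and numeric thresholds are placeholder values, and exact field names vary somewhat between Istio versions:

```yaml
# Require mutual TLS for all workloads in a (placeholder) namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: my-app
spec:
  mtls:
    mode: STRICT
---
# Circuit breaking + request limits for a (placeholder) service.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: service-b-breaker
  namespace: my-app
spec:
  host: service-b.my-app.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 100   # cap queued requests
        maxRequestsPerConnection: 10
    outlierDetection:                  # "open the circuit" on failures
      consecutive5xxErrors: 5
      interval: 30s                    # how often hosts are re-checked
      baseEjectionTime: 60s            # how long a failing host is ejected
```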