Well, hi friends. You've joined a talk titled "How to Stand on the Shoulders of Giants: Our Journey of Open Sourcing Open Service Mesh. Reading Code, Writing Code, and Asking the World for Help." My name is Delyan Raychev. I'm a principal software engineer with Microsoft Azure, and I've spent the last year working on Open Service Mesh with an incredible team of engineers. Now, in this talk we're going to cover what a service mesh is in general, dig into the details of what Service Mesh Interface is, and then talk about what exactly Open Service Mesh is. I'll show you a demo of Open Service Mesh, we'll take a look at the code, and then we'll go through the lessons we learned in the process of open sourcing it. Now let's step back and talk about what a service mesh is in general, and for that I'd like to illustrate the features, or the value, that a service mesh can add to your business by first pretending that we're the CTO of an online bookstore.
As the CTO of an online bookstore, we have quite a few engineering teams working on the many microservices we already have running in our Kubernetes infrastructure. For the next quarter, as CTO, we have three tasks: first, improving security; second, building observability for our microservices; and third, figuring out how to dynamically manage traffic. We want to improve security by being able to apply fine-grained authorization and encryption; we want to implement mutual TLS between all the microservices, because we're running in a zero-trust environment. Second, we need observability: we want to collect metrics, distributed traces, and logs; we want to be able to audit our system; and most importantly, we want to understand the topology of our microservices in order to unravel the complexity and make improvements. And finally, we want to implement dynamic traffic management: service failover, path-based routing, traffic splitting and shifting, and various deployment strategies like canary deploys, for instance, because we want to deploy new versions of our software with zero downtime. Well, how can we achieve that?
I think in a classic scenario we could just go to the software engineering teams and ask them to start working on it: improve security by implementing mTLS on every single endpoint of all of your microservices; implement observability by instrumenting logging, metrics, and traces and sending them to a service; and then manage traffic by implementing client-side retry logic and circuit breaking. But that's a lot of work to ask from engineering teams that are already busy implementing bookstore-specific business logic. Now, they could leverage existing libraries to implement most of those features; a few come to mind: we could use Twitter's Finagle, Netflix's Hystrix, or Google's Stubby. But those tend to be language-specific, and oftentimes we also need to understand how those libraries work, which is not necessarily an easy or quick effort. So we don't necessarily want every engineer to become a security, encryption, or mTLS expert. We definitely don't want to re-implement retries, circuit breaking, or really any other feature in every product, in every language; we know this is going to lead to varying degrees of quality in the implementations. And we definitely don't want to have varying configurations depending on the team, and we don't want each team to be responsible for managing their own certificate rotation or certificate distribution; that's going to end up in many different techniques, inconsistency, and insecurity because of that. What we want instead is for engineers to focus on the core features of the product, which is a bookstore, and we also want homogeneous security, observability, and traffic management for every product and every language we're using in our company. And finally, we want a system that can centrally manage all configuration, so configuration can be reviewed, audited, and so on. How do we achieve that?
Well, I believe all those features can be given to us by a service mesh. The service mesh can be the infrastructure that gives us all those features without having to ask all the engineers to commit extra time to building them out. Let's talk about a service mesh; let's define it. I'm going to use a few references here: I love Lee Calcote's book "Istio: Up and Running"; William Morgan from Buoyant wrote an excellent article on what a service mesh is and why you need one; and of course Red Hat's blog on what a service mesh is, which is excellent. So, to summarize those: a service mesh is an L4 or L7 communications infrastructure for microservices as well as monoliths. Now, a service mesh, like we talked about, and this is what we want, is going to shift the responsibility for much of the reliability, visibility, and security out of the application code and into the networking infrastructure. This is what we want: we don't want to write all those features in the app code; we want to shift them into the networking infrastructure. Which then means that the service mesh will decouple and centralize the functional responsibilities of instrumenting and operating services, out of the developers and into some sort of operators. So we can free up the developers to focus on features, and an operator can take care of those extra capabilities. To summarize and formalize: a service mesh is the observable, debuggable, reliable, and secure data plane for any programming language or framework. How does that sound? Okay, so if you wanted to cook a service mesh, what would the ingredients be? How do you actually make it?
Well, first of all, you need a data plane. For the data plane, we're going to use some sort of reverse proxy, and there are quite a few of those, all excellent; for instance, Envoy, Linkerd, NGINX, or HAProxy come to mind. We're going to need a control plane for the proxy, and we need some sort of API to be able to instrument and declare the topology of the service mesh; we could use CRDs, JSON, or the SMI spec for that. And so, the recipe: we're going to use those ingredients. We're going to add the data plane component, the proxy, to each one of our payloads or binaries running on our Kubernetes cluster. So each pod on our Kubernetes cluster that needs network access will get a sidecar with a proxy; for OSM, that's Envoy. We're going to connect those proxies to a control plane, which is going to tell the proxies exactly what to do by configuring them. And finally, we're going to apply various policies via this API; this is the service mesh API. Now I'm going to delve into this API; I'm going to talk about the Service Mesh Interface specifically. So what is SMI? SMI is something Microsoft announced in May 2019, and it is a specification for service meshes that run on Kubernetes. It's a common standard that can be implemented by a variety of providers. Now what are those standards? Well, we have three pillars: the first one is traffic policy, the second one is traffic telemetry, and finally traffic management. When we say a variety of providers, what do we mean? It means you can actually define those routing, telemetry, and traffic policies in an abstract way using Service Mesh Interface, and then apply them to different vendors. So you use the same declarations for your service mesh features regardless of who the service mesh vendor is. You can use Istio.
You can use Consul or Linkerd, and the SMI policies will remain the same. It allows you to declare service mesh features in a vendor-neutral way. And Open Service Mesh is one of those service meshes that supports Service Mesh Interface; in fact, Open Service Mesh implements Service Mesh Interface natively. It is not the canonical or the reference implementation of Service Mesh Interface; it is just one of the implementations. Open Service Mesh was open sourced, or announced, in August of 2020, and it became a CNCF sandbox project in September of this year. We welcome you to come and check out our GitHub repository at github.com/openservicemesh/osm. Now let's take a look at OSM. Open Service Mesh is a lightweight and extensible cloud native service mesh. There are four principles we embraced from the get-go; they have been guiding us throughout the last year of developing Open Service Mesh. First of all, we want to build a service mesh in a repository which is very simple and easy to understand and contribute to. This is focused on the folks who arrive at the GitHub repository; we want them to have a really easy time onboarding and learning from the source code. Second, we want to make Open Service Mesh effortless to install, maintain, and operate, the focus here obviously being the operators of the service mesh. And when trouble arises, how do we make the troubleshooting process painless? This is our aspirational goal: to build tools that make it very easy to identify and fix issues within the service mesh, which tends to be very complex. And finally, of course, we want to keep it really easy to configure by leveraging Service Mesh Interface. So what are the features we have in Open Service Mesh? Well, as of version 0.5, which points to the fact that it's not production-ready just yet: you can apply policies which will govern the TCP and HTTP traffic access between the various microservices in your Kubernetes cluster, and you can encrypt the traffic.
In fact, Open Service Mesh supports mTLS out of the box, and only encrypted traffic will flow once mTLS is enabled; we're leveraging short-lived certificates with a self-signed CA. Open Service Mesh will start collecting traces and metrics as soon as it's installed, to give us the observability we need as a CTO. And finally, with Open Service Mesh we can implement traffic splitting and traffic shifting, and we're going to show you this in the demo shortly, where we're going to split traffic between two different versions of the bookstore. All right, let's take a look under the hood. Here's a Kubernetes cluster, and we have already installed OSM. On the right side we see the OSM controller pod, and there are five top-level modules inside the controller pod. On the left side we see the SMI spec being applied to the Kubernetes cluster, and in the gray box we have a Kubernetes service account, a Kubernetes pod, and three containers in it. Now let's take a look at the SMI spec. The SMI spec is applied to the Kubernetes cluster using the kubectl command. The SMI spec is some form of YAML, and it defines policies which explicitly allow various services to talk to each other. SMI is constructed in such a way that microservices which are explicitly allowed to talk to each other will be permitted; if a policy does not exist, then they are not permitted to talk to each other. Now, the SMI spec will be consumed by the OSM controller, which has informers; this is represented by the orange Mesh Specification box, which is observing the SMI spec events. Next, we're going to take a look at the yellow box, which is the webhook and injector. The point of that box is essentially to intercept all the pod creation events flowing through the Kubernetes cluster and augment each one of the Kubernetes pods.
Specifically, the pods that belong to a namespace which is in the service mesh. Once a Kubernetes pod creation event is intercepted, we're going to augment that pod spec with two new containers. The app container is the original container in the pod; we're going to add two new ones. The first one is the init container, which is ephemeral. That init container will essentially apply a few iptables rules, which will then route all the traffic flowing in and out of the app container through the Envoy proxy. The Envoy proxy is the second container; it's not ephemeral, it's going to stay there forever. The Envoy proxy is a sidecar, which is actually what adds all the features that we, as the CTO of the bookstore, want from the service mesh. The Envoy proxy will augment the app container: it will add all the retry logic, it will add the mTLS encryption, and so on. Now, the webhook-plus-injector module in the OSM controller pod will create the Envoy sidecar with a particular bootstrap config from the get-go, and that bootstrap config will contain two things: the first is the FQDN of the OSM controller pod, and the second is an mTLS certificate which is specific to that Envoy proxy. The FQDN points to the proxy control plane, and once the Envoy establishes an mTLS gRPC connection to the proxy control plane, it will present its unique certificate; then the OSM controller pod will know exactly which Envoy that is, which pod it's coming from, and which app container this Envoy is fronting. That will allow the OSM controller pod to send the exact configuration, the unique configuration, needed by that Envoy proxy.
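To make the init container's job a bit more concrete, here is a rough sketch of the kind of iptables rules such an init container applies. The chain name, the ports 15001/15003, and the UID 1337 are assumptions borrowed from common Envoy-based sidecar injection setups, not necessarily OSM's exact rules:

```shell
# Hypothetical sketch of sidecar traffic interception; ports and UID are assumed.
# Create a chain that redirects TCP traffic to Envoy's outbound listener port.
iptables -t nat -N PROXY_REDIRECT
iptables -t nat -A PROXY_REDIRECT -p tcp -j REDIRECT --to-port 15001

# Send all outbound traffic from the app container through the proxy,
# except traffic generated by Envoy itself (identified here by its UID),
# which would otherwise loop back into the proxy forever.
iptables -t nat -A OUTPUT -p tcp -m owner ! --uid-owner 1337 -j PROXY_REDIRECT

# Redirect all inbound traffic to Envoy's inbound listener as well.
iptables -t nat -A PREROUTING -p tcp -j REDIRECT --to-port 15003
```

The key design point is the exclusion rule: without it, Envoy's own upstream connections would be redirected back into Envoy.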
Now, speaking of configuration for the Envoy proxy, let's take a look at the Endpoint Providers box, the blue box on the right side of the screen. The endpoint providers observe the Kubernetes cluster and provide us with a list of IP addresses and port numbers for peer Envoys that are fronting other app containers, so that our Envoy here knows exactly which other IP addresses and port numbers to route traffic to. In the middle we have the Certificate Manager, which is kind of an abstraction around either HashiCorp Vault, cert-manager.io, or our own internal cert issuer, which is based on Go's crypto/x509 libraries. And finally, OSM will also install Prometheus and Grafana to allow you to visualize various metrics in the cluster, and it will program Envoy to send traces to a Jaeger instance, should you choose to do that. Let's take a look at a summary of the five components inside the OSM controller pod. We talked about the proxy control plane, which is where all the Envoys connect; we talked about the Certificate Manager, which is an abstraction over either HashiCorp Vault, cert-manager.io, or an internal cert issuer; then we talked about the endpoint providers, which give you a list of IP addresses and port numbers; and finally the Mesh Specification, which is the SMI policies. We have the Mesh Catalog, which combines the outputs of all those facilities and uses the proxy control plane to send configuration to the various proxies. And so, if we look back at the recipe for cooking a service mesh, what would the particular recipe be for Open Service Mesh? The data plane for Open Service Mesh is the Envoy proxy; the control plane is Lyft's go-control-plane, which implements xDS; and of course we're using the SMI SDK. The API we use, of course, is Service Mesh Interface, which Open Service Mesh implements natively.
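If you want to try what follows on your own cluster, getting OSM up and running looks roughly like this. The `osm` CLI and `osm install` are from the OSM project, though flags and the default `osm-system` namespace are assumptions that may vary by version:

```shell
# Install the OSM control plane into the cluster
# (assumes the osm CLI is downloaded from the project's releases page
# and kubectl is pointed at a test cluster).
osm install

# Verify that the OSM controller pod is up before onboarding workloads.
kubectl get pods -n osm-system
```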
All right, now let's take a look at a demonstration of Open Service Mesh. First, let me tell you about the various components of my demonstration. Here in the middle we have the bookstore; this is one microservice, essentially a server which you can buy books from using just HTTP GET. At the top left of our screen you have the book buyer; this is a service in an infinite loop which will be HTTP GETting books from the bookstore. And in the top right we have the book warehouse; when the bookstore runs out of books, it will be HTTP GETting books from the book warehouse. And we're going to be surprised to find out that in the bottom left corner there is a book thief; that service is also in an infinite loop, performing HTTP GETs on the bookstore and also taking books. What we want to do, first of all, is apply a policy that blocks the book thief from purchasing books from the bookstore; and second, we want to deploy a new version of the bookstore. We want to deploy bookstore v2 without experiencing any downtime, without the book buyer noticing. Let's take a look at how we can do that. All right, and now let's take a look at the demo of Open Service Mesh. First I'm going to explain what we're seeing on the screen here. First of all, on the right side you'll see that I have my terminal window, and it's essentially issuing the kubectl get pods command in an infinite loop. We have the various namespaces: bookbuyer, bookstore, bookthief, bookwarehouse. Those are all the actors in my service mesh. You'll also see that in the READY column we have the number of containers in each pod. They all have only one container, because none of them have been joined yet to the service mesh, which I've already installed. So I do have the OSM controller running and ready to go, but like I said, none of those namespaces have been joined to the service mesh just yet. All right.
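If you want to reproduce that terminal view, a polling loop like the one below works; the namespace names follow the OSM demo, but treat the exact names as assumptions:

```shell
# Poll pod status across the demo namespaces once per second.
# The READY column (e.g. 1/1 vs 2/2) shows how many containers each pod has,
# which is how you can tell whether the Envoy sidecar has been injected.
watch -n 1 'kubectl get pods --all-namespaces | grep -E "bookbuyer|bookstore|bookthief|bookwarehouse"'
```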
And so we're going to issue various commands in my other terminal, the bottom window. That's where we're going to be editing the various YAML files, the SMI specs, and executing shell commands. I do have port forwarding running, which is going to allow us to see all the various windows on the left side. So on the left side we have port forwarding, like I said, from the book buyer pod, the book thief, and the bookstore. The book thief is much like the book buyer, also obtaining books from the bookstore in an infinite loop. Here's bookstore v1 and bookstore v2. And like I said, we're going to be watching the counts of books increase, and over time we're going to make sure that the book thief eventually stops obtaining books. All right. So first I'm going to show you, using a pcap file, that the bookstore HTTP GET calls are not encrypted. I want to show you, using tcpdump, that the traffic flowing from the book buyer to the bookstore is not yet encrypted, because, like I said, we haven't joined those namespaces to the service mesh. I'm going to capture a little bit of traffic, open it in Wireshark, and browse through the traffic to show you. Here's one of the GET requests: as you can see, we have the GET request from book buyer to bookstore, and the traffic is not encrypted. Now I would like to encrypt the traffic; that's the first thing we want to do. Since we're running in a zero-trust networking environment, we want to encrypt all that traffic. So for that purpose, we're going to join the namespaces into the service mesh.
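The onboarding I'm about to run can be sketched roughly as follows. `osm namespace add`, `kubectl rollout restart`, and `kubectl patch` are real OSM/kubectl commands, but the loop and the ConfigMap name/namespace are assumptions for a default install:

```shell
# Join each demo namespace to the mesh, so the admission webhook injects
# an Envoy sidecar into any pods created in them from now on.
for ns in bookbuyer bookstore bookthief bookwarehouse; do
  osm namespace add "$ns"
done

# Switch OSM into permissive traffic policy mode (observe, don't block).
# ConfigMap name "osm-config" and namespace "osm-system" are assumed defaults.
kubectl patch configmap osm-config -n osm-system \
  --type merge -p '{"data":{"permissive_traffic_policy_mode":"true"}}'

# Restart existing deployments so their pods are recreated and
# picked up by the sidecar injector.
for ns in bookbuyer bookstore bookthief bookwarehouse; do
  kubectl rollout restart deployment -n "$ns"
done
```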
Here's the script I've prepared. We're going to use the osm namespace add CLI command, and we're going to issue that command for each of the namespaces we have. It's also important to mention that we're going to apply this ConfigMap to Open Service Mesh, which is going to switch the OSM controller into permissive traffic policy mode. That means we're not going to be mutating any of the traffic patterns; we'll simply observe. We're not going to be blocking traffic with SMI policies; that will come later. And now, one thing I need to tell you about is the rolling restart script. Because we have a few existing pods already running, we're going to have to issue kubectl rollout restart for all the deployments, which will restart the existing pods. Those existing pods will be terminated, new pods will be created, and those new pods are going to be augmented with an Envoy sidecar proxy. Let me run the script quickly, and let's take a look at what happens to the pods as we're restarting those deployments. You'll see that pods with one container are being terminated, and instead we're creating new pods with two containers: two containers because we have not only the original binary, the original payload, but we also now have the Envoy sidecar, which is where all the new features are going to be coming from. I do now need to restart my port forwarding scripts, to port forward to the newly created pods, and then we should again see the counts of books purchased, or books stolen, start to increase. It starts from zero because those are brand-new pods, like we said already. All right, so here it is: now we have all those pods joined to the service mesh, and that's proven by the fact that we have two containers in each pod. Now I want to show you that the traffic is encrypted. I'm going to go ahead and do another packet capture, just a little bit of traffic.
I'm going to open that traffic, which I'm going to gather with tcpdump, obviously; I'm going to open that pcap file in Wireshark, and we're going to again try to find those HTTP GET requests. But of course, because those pods are now part of the service mesh, part of OSM, and we have already enabled mTLS, we're going to have a really hard time finding those HTTP GET requests, because everything is encrypted. We can see the source and destination, from book buyer to bookstore v1, but like I said, it's not going to be possible to view the payload. Here are the packets: all we can see is the fact that they have been encrypted with TLS v1.2, and we can no longer see the body of the request. I'm going to switch gears now and show you what we've done with Jaeger. We are collecting traces, and you can see that Jaeger here is visualizing the topology of our service mesh. I've zoomed in so it's visible on the screen: we have the book buyer and the book thief both fetching books from bookstore v1, and bookstore v1 is replenishing books from the book warehouse. This comes pre-installed with Open Service Mesh. And so now we're going to pretend that we're surprised to see the book thief; we're not happy with the fact that the book thief is also obtaining books from bookstore v1, and the next goal of our exercise is going to be to block that particular traffic going from the book thief to bookstore v1. All right, so we discovered there's this bad actor called book thief, and it's already stolen 111 books, and we want to prevent it from continuing to do that. We want to stop it from getting books from v1 and v2, and for that purpose we're going to apply a bunch of SMI policies. I'm going to run the script which applies the policies, and we're going to almost instantly see that the book thief's number of books stolen freezes at a given number. And now I'm going to walk you through what I just did. Let's take a look specifically at the deploy-traffic-target policy. So, like I said, this
is an SMI policy called TrafficTarget, and we're going to take a look at this piece of YAML, in particular at the source and destination: the source is the book buyer, or rather the service account of the book buyer, and the destination is bookstore v1 specifically. Now, the reason we're no longer seeing the books-stolen count increase is that the book thief is commented out in this policy. Essentially, the book thief is not explicitly allowed, like the book buyer is, to communicate with the bookstore, which means it is blocked. This is how TrafficTarget works, and there it is: the book thief is no longer able to steal books. And finally, I want to show you traffic split. Let's take a look at this policy. Here is deploy-traffic-split; here's the YAML for the TrafficSplit SMI policy. What you see is that, essentially, traffic going to the bookstore service will be split between bookstore v1, 100%, and bookstore v2, which right now gets 0%. So I'm going to tweak that: I'm going to change traffic to a 50/50 split between v1 and v2, save this, and apply it. What you'll notice is that traffic flowing from the book buyer to the bookstore will eventually start to be split equally between v1 and v2. And here is Grafana, to help us look at the various metrics we can observe. Allow me to tweak the UI here to start looking at metrics, and there we go: already we're seeing that the book buyer is starting to purchase books from bookstore v2 as well as v1. And if we refresh Grafana, we're starting to see success counts for traffic from book buyer to bookstore v2 as well. And if we decide that bookstore v2 is doing really well, and everything's looking great, I'm going to change it to 100% of the traffic going to bookstore v2. I'm going to deploy this, and in a second we'll see that books purchased from bookstore v1 freeze at 150-something, and all of the traffic will be flowing to bookstore v2.
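For reference, policies along these lines drive what the demo just showed. The shapes follow the SMI spec, but treat the exact API versions, resource names, service accounts, and route group as assumptions modeled on the OSM demo:

```shell
# Allow only the bookbuyer service account to reach bookstore-v1.
# Assumes an HTTPRouteGroup named "bookstore-routes" with a "buy-a-book"
# match is defined elsewhere.
kubectl apply -f - <<'EOF'
apiVersion: access.smi-spec.io/v1alpha2
kind: TrafficTarget
metadata:
  name: bookstore-v1
  namespace: bookstore
spec:
  destination:
    kind: ServiceAccount
    name: bookstore-v1
    namespace: bookstore
  sources:
  - kind: ServiceAccount
    name: bookbuyer
    namespace: bookbuyer
  # bookthief is intentionally absent from sources, so it is blocked.
  rules:
  - kind: HTTPRouteGroup
    name: bookstore-routes
    matches:
    - buy-a-book
EOF

# Split traffic to the bookstore root service 50/50 between v1 and v2;
# shifting to v2 later is just a matter of changing the weights.
kubectl apply -f - <<'EOF'
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: bookstore-split
  namespace: bookstore
spec:
  service: bookstore
  backends:
  - service: bookstore-v1
    weight: 50
  - service: bookstore-v2
    weight: 50
EOF
```

The design choice worth noting is that TrafficTarget is an allow-list: anything not named in sources is denied once permissive mode is off.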
We're already seeing the number of success counts increase. All right, next I would like to very quickly show you how to get to the source code of Open Service Mesh: read it, and contribute if you can. First of all, we're going to start with the Open Service Mesh design and all the interfaces we've created to make this extensible. The design.md file contains all of the information you need to get started; I highly recommend you go through it to understand how Open Service Mesh works, and how service meshes in general work. I think you're going to find it useful. Second, you can take a look at the ADS stream.go file. This is the gRPC entry point for all Envoys; this is how all Envoy proxies connect to the control plane, and essentially this is the goroutine that starts when a new proxy connects. This is a good entry point to get you going, to help you understand what happens when an Envoy proxy connects, how we issue all the discovery responses, how we construct the protocol buffers that configure the Envoys, and what that configuration is based on. So my advice is to start with the ADS stream.go file. Another interesting piece of code to get you going is the injector's patch.go file. This is essentially how the webhook in Open Service Mesh works. It will show you how we augment the pod spec to add the bootstrap config for the Envoy proxy, how we add the Envoy sidecar itself, how we issue the certificate for the proxy, how we create the init container, and so on. From this function you can start to drill deeper into the Open Service Mesh repository; for instance, if you wanted to understand how the certificate management system works, you could look into the issue-certificate function. And finally, you can take a look at how xDS itself works by looking at the ADS server.go file. We've implemented essentially handlers for the endpoint discovery service, cluster discovery
service, route discovery, listener discovery, and secret discovery; those are the five pillars of the configuration for the Envoy proxy. You'll have those links available through the slides. And now let's talk about the lessons learned in the process of open sourcing Open Service Mesh. I said "a process" because oftentimes we think of open sourcing a project as a binary event. Well, it's not a binary event, even though you might think that all it takes is checking that box on GitHub saying make the repo public, changing the visibility. In actuality, open sourcing a project is a process, and a project in and of itself; it's a marathon that takes a long time to run. Let me take you through the steps of the process of open sourcing a project. First of all, preparing to open source: start the preparation as early as possible. First, think about the privacy of the contributors; think about your own privacy. Choose and set your commit email address carefully, and decide whether to anonymize it; your email address will be available to the world, after all. And as far as sensitivity and mindfulness: please do code, comment, and commit knowing that one day the entire world could be looking at it. What that means is, while you're coding, committing, and writing comments, think: are you leaking secrets, or anything that's sensitive to your organization internally, or to you privately? Also, are you using language that may be perceived as offensive by the future community and contributors you're building? Think about external folks looking into what you're creating as well.
And finally, transparency. From the early days, from the get-go, you should design, document, and make decisions with transparency in mind, so that one day, when a contributor arrives at your repo, they can quickly answer the question of why and how a certain decision was made. So do document how you're making decisions, and put those documents in your public repository even before you go open source. Then, when the time comes for you to flip the visibility bit and open source your project, you need to have answers to two questions: why and when. First of all, know why you're doing it. For instance, we open sourced Open Service Mesh with three things in mind. First, we wanted to get advice and feedback from the other companies out there and the potential users of Open Service Mesh; that was very important to us. Second, we wanted to provide the world one more implementation of a service mesh; this is not the implementation, it's just one other implementation that we want to offer. And finally, we wanted to collaborate with the community in a vendor-neutral space. Second, decide when it makes the most sense for you to open source your project. My advice is to open source as soon as possible, as soon as you can. That will allow you to get feedback and to iterate. Don't wait for perfect code or perfect documentation; that's hard to attain. Get that feedback as soon as possible. But don't open source before the guardrails are ready for your newcomers. What I mean by guardrails is: do not open source before you've had a chance to build unit tests, to add static analysis to your CI, to write at least minimal documentation to help those early contributors onboard, to help the early contributors feel safe and feel validated that when they're adding a feature, that feature is not going to break the system. I think a few of those guardrails are necessary before you open source, to create a productive environment for your contributors. And then, long term, after
the project is open source and public, a few things happen, and they're very interesting to me. First of all, the team dynamic will change; after all, open sourcing a project invites the whole world to join your team, and communication will change: the communication channels change, the times when people communicate change, and how they communicate changes, of course. Second, feature velocity changes as well. Open sourcing, and creating a new governance model, will require changes to how you actually publish designs, how you discuss those designs, how proposals are made, how you approve PRs, and how you triage GitHub issues. Reviews may slow down, but they will be much more fruitful, because the community will be commenting and collaborating with you. And of course, the queue of GitHub issues and feature requests may grow, and that's actually feedback that I very much appreciate: when folks come and tell us what they would like to see in Open Service Mesh, or when they find bugs, it's a wonderful thing. And finally, if you see that extra feedback and those extra GitHub issues arriving, you have to monitor and react to them. Also monitor all the other channels, not just GitHub issues: Slack, email, any other channels that may exist. And like I said, feedback is a wonderful thing, but it comes at a cost. Many new requests and triaging will increase the attention demands on your team, and if the team is small, of course, that may take away from the core work. All right, and now something special: parting thoughts. The topic of this talk was how to stand on the shoulders of giants, and this points me to something that's very dear to my heart, and that's reading code; reading code as in reading prose. I think before we even start thinking about building something, or before we open source the project we've been working on, it would be a wonderful thing to go out there and research, to see what prior art exists in the open source community. There are amazing search tools out
there to find wonderful code. So I invite you to do that: search for code similar to what you're building, and read that code to learn, to build, to be able to stand on the shoulders of the giants who have come before you and created something incredible. And then improve that code, use it, and attribute back: give attribution to the authors, follow the rules and the licensing agreements, and of course contribute back; that's what open source is about. Push your changes back upstream. And I want to point you to this outstanding book that Diomidis Spinellis wrote back in 2003; what a timeless piece. It's called "Code Reading: The Open Source Perspective". Diomidis Spinellis, with this book, will teach you how to read code and find code, and this book has been truly transformational for me. And finally, I want to encourage you to open source your projects, so that the code you open source, and you yourself, can be someone else's giant; so people can look at your code and learn from it. Thank you very much for attending the session. Please reach out: you can find me on Twitter, you can find me on LinkedIn, and please send me an email with any feedback you might have. Thank you so much.