Hello everybody, how are you doing? Thank you for being here. First of all, I'm obviously not the guy who was supposed to give this talk. I didn't mutate because of a strange virus. Actually, this guy should be here instead of me, because he's the father of the creature, but due to a small problem with the consulate, which didn't grant his visa in time, he couldn't be here, so I'm the one giving the talk. Keep in mind that Alex, who's a great guy, an amazing developer, and also a very smart guy, should be here instead of me, and all the credit goes to him. My name is Juan Pablo. I have been working in the IT industry for 18 years, I'm 36 years old, I'm a software engineer, and I currently work as a field Cloud Foundry engineer at Altoros. I love to dance tango and to play blues guitar, so if anyone wants to invite me to jam, I'm happy to do it. At Altoros we offer a variety of software services, right now mostly related to Cloud Foundry. We are investing a lot in Cloud Foundry: we do a lot of benchmarking for different companies, and we do software development.
We have a lot of big and not-so-big customers, but that doesn't matter to us: every customer is big for us. We have offices all around the world, and we are really, really happy to be a gold sponsor here at CF Summit this year. So let's get to the important topics. The big picture of this talk is Lattice and Consul. How many of you were at the talk in the A2 room where the guys were talking about Lattice? Awesome, nice. So you are familiar with Lattice. For those who are not, I'm going to give a very brief introduction to Lattice and then to Consul, which is the main topic of this talk. We're going to talk about service discovery issues, how to overcome them, and how Consul works to overcome those problems. We're also going to see a very brief example of registering a MySQL service with Consul and how Consul reacts when one of these MySQL servers goes down. First of all, Lattice is a very sweet open source tool that allows you to run containers in a cluster. Lattice runs containers from Docker images, which is really cool. They can be long-running processes or temporary tasks, and they get dynamically scaled and balanced across the cluster. Lattice has a lot in common with Cloud Foundry without being a full Cloud Foundry deployment, which, as you know, is big, heavy, and costly. Lattice is very lightweight and extremely fun to use. It's really amazing; I really love it. Lattice has four main features. First, you have a scheduler: Lattice uses Diego, which is still in development. Diego is the next-generation Cloud Foundry scheduler, so everything you get in Cloud Foundry with Diego, you also have in Lattice. What the scheduler basically does is balance the allocation of work across the cluster of servers in your Lattice deployment. The algorithm is a distributed auction model. I encourage you to learn a little bit more about it.
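Here is a toy sketch in Python that makes the auction idea concrete. It is not Diego's real auction. The `Cell` class, the memory-only resource model, and the scoring rule (prefer the cell that would end up least full) are hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cell:
    """A hypothetical cell that hosts containers, tracked by memory only."""
    name: str
    total_memory_mb: int
    used_memory_mb: int = 0
    containers: List[str] = field(default_factory=list)

    def bid(self, memory_mb: int) -> Optional[float]:
        """Score for hosting a container (lower is better), or None if it won't fit."""
        if self.used_memory_mb + memory_mb > self.total_memory_mb:
            return None
        # Fraction of this cell's memory that would be in use after placement.
        return (self.used_memory_mb + memory_mb) / self.total_memory_mb


def auction(cells: List[Cell], container_id: str, memory_mb: int) -> str:
    """Collect a bid from every cell and place the container on the best bidder."""
    bids = [(cell.bid(memory_mb), cell) for cell in cells]
    valid = [(score, cell) for score, cell in bids if score is not None]
    if not valid:
        raise RuntimeError(f"no cell has room for {container_id}")
    # min() keeps the first cell on ties, so placement is deterministic.
    _, winner = min(valid, key=lambda pair: pair[0])
    winner.used_memory_mb += memory_mb
    winner.containers.append(container_id)
    return winner.name


if __name__ == "__main__":
    cells = [Cell("cell-a", 4096), Cell("cell-b", 4096), Cell("cell-c", 2048)]
    print([auction(cells, f"app-{i}", 512) for i in range(5)])
```

With these made-up cell sizes, the five containers spread across all three cells instead of piling onto one. Spreading the load this way is the behavior the scheduler is after.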
The auction model is a really cool thing to learn about. The most important point here is that, because Lattice uses Diego for scheduling, it is highly compatible with Cloud Foundry. You can later deploy the same image you're using in Lattice to production with Cloud Foundry. Before that, you can stage it, test it, and develop it using just Lattice on one laptop, without having to deploy to, for example, AWS or a server farm. The second very cool feature is dynamic routing. Dynamic routing allows you to start as many containers as you want, and they are automatically added to the load balancer as they become available. This is very good because you don't have to configure it manually. Of course, you still have the option to configure it manually: you can do some custom routing and provide, for example, custom HTTP traffic shaping for A/B tests or blue-green deployments. The third feature is that Lattice is self-healing, a characteristic it shares with Cloud Foundry. It compares the desired state of the system against the current state, and if Lattice finds that the two are not equal, it fixes it automatically for you. Fourth, it provides a very cool status stream via Loggregator, so you can see all the activity the logs report for each running container. You just connect to Lattice with the CLI tool and you can see everything that's happening. The CLI is very simple and very easy to use. You can target a Lattice deployment running on your computer or in the cloud. Creating an application is very simple, as you can see there: for example, we are creating lattice-app, which is in the cloudfoundry/lattice-app Docker repository. It's a very simple application, but that's all you need; there's no complicated procedure. You can see logs.
This is how you start following the Loggregator stream, the firehose. You can list the applications you have, you can scale applications just that easily, you can see the status of every single container you're running, and you can visualize how the workload is distributed across those containers. This isn't even a demonstration of the CLI; I'm just showing it to you. So why choose Lattice? It's extremely easy to use, it's extremely simple to set up, and it's really fast. It has a very small footprint: an idle Lattice deployment uses around 250 MB of memory, so you can run it on your laptop. Try running BOSH Lite on your laptop, and then we can talk. It's a perfect fit for developers working standalone or for small teams, for testing and for staging. I use it at my company and for personal projects, and it's really good. Now let's get to the really important part of the talk. Consul is another really cool product, a tool for discovering and configuring services in your infrastructure. It's distributed and highly available, so it will work in many situations. It's suitable for production even though it hasn't reached version 1.0 yet. Every node that provides services registered with Consul runs a Consul agent. This agent health-checks the services running on the node, is also responsible for checking the health of the node itself, and reports to the Consul servers. The most important part is that your applications, clients, whatever, can use Consul to discover services. They don't need to know exactly where a service is located, which IP it's on, or which port to use. Consul takes care of that and returns something like:
"Hey, if you want to use MySQL, connect to this IP and this port." Consul has four main features, or maybe they're not features but tasks. Service discovery is its main purpose. Consul clients can provide services, maybe an API, maybe MySQL, maybe Cassandra, maybe Redis, whatever, and other clients can query Consul to find out how to connect to those services. You can query Consul via a simple DNS query or via an HTTP API, which is really cool; check it out. Another very important characteristic of Consul is health checking for failure detection. The agents running on the servers that host our services can run any number of health checks. You can report, for example, how much memory is left or whether CPU usage is running high; whatever you want to report, you can do it with simple scripts. Health checks can also be based on HTTP requests: you query a service over HTTP, and if it returns a 200, you're okay. Any operator can use this information to take action when something catastrophic is happening. Consul also has a very flexible, hierarchical key/value store. You can use it for dynamic configuration, feature flagging, or leader election, and there's an HTTP API for querying it. For me, the coolest thing about Consul is that it's multi-datacenter ready: you don't need to add any extra layer to query services in different datacenters. More on that later. Many of you may be wondering: Lattice is built on Diego, and Diego already uses Consul to discover the cells it's working with, so why don't we use that same Consul? For me, the answer is simple. It comes down to how much you want to tamper with the inside of your
deployment, right? Even if you're using Cloud Foundry, it can be very risky to start tampering with the Consul inside Diego. I can't confirm that, because I never tried it, but for me it has to be a completely separate layer. There is a proposal in the Lattice tracker suggesting SkyDNS to solve external service discovery in Lattice. The second option is to spin up a separate Consul cluster, which we think is the better choice. Maybe there's a third option we haven't explored, or one of you has a better idea, but so far, running a separate Consul cluster is the better approach for us. Now let's talk about the hidden problems with service discovery. Let's say we have one service running on multiple hosts. Today it's very common to have, for example, three or four MySQL databases running with master-master replication. Or maybe you have a Redis cluster, or Cassandra, or whatever service you want, even your own API service. So if a client wants to connect to one of these services, how do you give that client the right IP and the right port? This is the problem Consul solves, and it's actually really hard, because you have to keep in mind a lot of constraints and a lot of different problems you might encounter. The first, of course, is node failure: what happens when one of these hosts has problems and can't answer requests? Just keep in mind that every piece of software and hardware will eventually fail or be shut down at some point. This is Murphy's law. You can't avoid it.
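The sketch below shows how each MySQL host could announce its IP and port. It is Python that builds a service definition for a Consul agent's `PUT /v1/agent/service/register` endpoint. A few things are assumptions for illustration: the agent address `http://127.0.0.1:8500` (Consul's default HTTP port), the instance IDs and IP addresses, and the 10-second TCP check. `register` only does anything useful when an agent is actually running, so the demo just prints the payloads.

```python
import json
import urllib.request

CONSUL_AGENT = "http://127.0.0.1:8500"  # assumed local agent on Consul's default HTTP port


def mysql_registration(instance_id: str, address: str, port: int = 3306) -> dict:
    """Build a service definition for one MySQL instance with a TCP health check."""
    return {
        "ID": instance_id,   # unique per instance, e.g. "mysql-1" (hypothetical)
        "Name": "mysql",     # every instance shares one logical service name
        "Address": address,
        "Port": port,
        "Check": {
            "TCP": f"{address}:{port}",  # passing while the port accepts connections
            "Interval": "10s",
        },
    }


def register(definition: dict, agent: str = CONSUL_AGENT) -> int:
    """PUT the definition to the agent and return the HTTP status code."""
    request = urllib.request.Request(
        f"{agent}/v1/agent/service/register",
        data=json.dumps(definition).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    for definition in (
        mysql_registration("mysql-1", "10.0.0.11"),  # made-up addresses
        mysql_registration("mysql-2", "10.0.0.12"),
    ):
        print(json.dumps(definition))
```

Both instances register under the same name, `mysql`. A client therefore asks for the service, not for a host, and Consul decides which IP and port to hand back.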
So let's say a whole node, not just a service, gets shut down due to a power outage or whatever. The service discovery tool has to detect that this node has been shut down and route all subsequent requests for the MySQL service to the other available hosts. It detects that the node is down through the health checks it is constantly running. It doesn't just passively receive health checks; it should be actively checking the node's health all the time. Once it detects that the node went down, it simply switches traffic to the healthy nodes. Another problem that is quite difficult to solve is that today pretty much every application we have can be scaled to hundreds of nodes very easily. Let's say you work for a startup that is gaining momentum, and some very famous person tweets that your startup is really cool. You're going to have to scale your application out horizontally, tenfold, twentyfold. The work required to manage that can be daunting if you try to do it yourself, so your service discovery solution should be able to scale with you. Then there's network efficiency. All this traffic going from node to node can take a toll on the network. You have to be aware of bandwidth limits, and your communication protocols really need to minimize network traffic to be as efficient as possible. Then there's what happens when your servers fail: your application should be able to survive one failure and then another. That's where self-healing algorithms come in. And then there's data consistency, which matters in pretty much everything we do. Let's say you want to look up an address to connect to Cassandra.
Say it's a Cassandra cluster. You ask one Consul server for that address, and then another client asks a different Consul server the same thing. All of your service discovery servers have to give the same response to the same query. This is very important, and this is exactly where Consul comes in: Consul lets you do all of that. It has a very cool health check system. The first thing to keep in mind is that you can run any arbitrary command on your node using a script. The script returns an exit status. If it's successful, everything is cool; if not, the agent reports the failure to the Consul cluster. Second, if you want to use HTTP to poke the service running on that server, that's perfect too. And third, your service can proactively report its status to Consul within a time-to-live (TTL). If the service doesn't check in before the TTL expires, Consul basically says, "Hey, this node is having problems." Consul has two interfaces: one is HTTP, and the other is plain DNS queries. Of course, DNS is much lighter than HTTP, but it's not as flexible as the API. Consensus is very important because of the infamous CAP theorem: consistency, availability, and partition tolerance. You can't have all three at the same time; you can only choose two. Consul needs to be really consistent, and because it is highly distributed it has to tolerate network partitions. So it uses a consensus algorithm to agree on the cluster state, such as which node or service has failed, and then reroutes the traffic. The consensus protocol is based on Raft, though it's not exactly Raft. Raft itself is completely outside the scope of this talk, but you can check it out.
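All three kinds of check described above end up in one of Consul's three check states: passing, warning, or critical. The minimal Python sketch below follows Consul's documented conventions:

- A script check with exit code 0 is passing, exit code 1 is a warning, and anything else is critical.
- An HTTP check with any 2xx response is passing, 429 is a warning, and anything else is critical.
- A TTL check goes critical if it isn't refreshed in time.

The helper functions are hypothetical illustrations, not part of Consul.

```python
import subprocess
import sys
import time
from typing import List, Optional


def script_status(exit_code: int) -> str:
    """Map a check script's exit code to a Consul-style check state."""
    if exit_code == 0:
        return "passing"
    if exit_code == 1:
        return "warning"
    return "critical"


def http_status(status_code: int) -> str:
    """Map an HTTP response code to a Consul-style check state."""
    if 200 <= status_code < 300:
        return "passing"
    if status_code == 429:  # Too Many Requests
        return "warning"
    return "critical"


def ttl_status(last_update: float, ttl_seconds: float, now: Optional[float] = None) -> str:
    """A TTL check stays passing only while the service keeps checking in."""
    now = time.monotonic() if now is None else now
    return "passing" if now - last_update <= ttl_seconds else "critical"


def run_script_check(command: List[str]) -> str:
    """Run an arbitrary command, as an agent would for a script check."""
    completed = subprocess.run(command, capture_output=True, timeout=10)
    return script_status(completed.returncode)


if __name__ == "__main__":
    # Stand-in check script that exits 0, so the check is passing.
    print(run_script_check([sys.executable, "-c", "raise SystemExit(0)"]))
    print(http_status(200), http_status(429), http_status(503))
    print(ttl_status(last_update=100.0, ttl_seconds=15.0, now=120.0))
```

The TTL case inverts the usual direction. Consul doesn't poll the service; the service must keep proving it is alive, and silence alone is enough to mark it critical.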
Raft is a very interesting protocol. Here's a very quick overview of how this works. The quorum needed to decide whether a node is down follows a simple formula: n/2 + 1, with n/2 rounded down, where n is the number of servers. This means the minimum Consul cluster you should run is three Consul servers. If one fails, the other two still form a quorum and can agree that the third server has failed, so we need to alert the operator. Three or five servers is the best way to go. Why? Simply because if you have three and one fails, the other two can still talk to each other. If you have five and two fail, the other three can still talk to each other, so you are still satisfying the n/2 + 1 rule. This is actually what the Consul creators recommend: a three- or five-server Consul cluster. Then there's the problem of membership: how Consul detects services so it can provide them, and how it manages membership so it can give clients the information they need. Service discovery relies on this membership and uses two different pools, the LAN pool and the WAN pool. The LAN pool, for the local area network, helps Consul discover services, which greatly reduces the configuration you need. It uses a gossip protocol, and the pool contains all members of the datacenter, both clients and servers. The WAN pool is globally unique. Because you can use Consul for service discovery across multiple datacenters, the WAN pool provides information about services regardless of the datacenter, so you know that service A is service A and service B is service B, and that's it. Another very cool feature is that failure detection allows Consul to handle losses of connectivity, so packets lost in the network won't throw things into chaos. I'm running out of time, right? How much time do I have? Three minutes. Wow.
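The quorum rule is easy to check with a few lines of Python. This is plain arithmetic, not Consul code. It shows why three and five servers are the sweet spots: going from three to four servers raises the quorum from 2 to 3 without tolerating any extra failure.

```python
def quorum(servers: int) -> int:
    """Votes needed for a majority: n/2 + 1, with n/2 rounded down."""
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many servers can fail while a quorum still remains."""
    return servers - quorum(servers)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum {quorum(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

Odd cluster sizes get the most fault tolerance per server, which is why the recommendation is three or five.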
I have a very brief example to show you. Let's see if I can do it real quick. Let's say we have a Lattice deployment and a Consul deployment, and we have a MySQL stack with two masters in master-master replication. The Consul agents talk to the Consul servers to provide health check information. Let's say an application in your Lattice deployment needs to connect to MySQL. It queries Consul via HTTP or DNS, and Consul returns the address it needs in order to use MySQL. What happens when one of the servers fails? Consul simply detects that the server is failing and starts routing traffic away from the bad MySQL server to the good one. Wow, that was quick. Coming up real soon, we will publish a blog post on the Altoros blog with all of this, along with a video demo showing how to do it, so stay tuned and check it out. I guess we don't even have time for questions, but if you have a quick one, maybe I can take it. No questions. That's either really bad or really good, depending on your outlook. Okay. All right, next slide. This is it. Thank you so much for being here. See you later, and see you next year.
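This is a minimal sketch of the application side of that example. It assumes the MySQL instances are registered in Consul under the service name `mysql` and a local agent listens on `http://127.0.0.1:8500`. It reads the response of Consul's `GET /v1/health/service/<name>` endpoint, which lists each instance's node, service, and check results, and keeps only instances whose checks are all passing. Adding the `?passing` query parameter asks Consul to do the same filtering server-side. The sample payload is made up and trimmed to the fields the code reads. Over DNS, the equivalent lookup would be an SRV query for `mysql.service.consul` against the agent's DNS interface (port 8600 by default).

```python
import json
import random
import urllib.request
from typing import List, Tuple

CONSUL_AGENT = "http://127.0.0.1:8500"  # assumed local Consul agent


def healthy_endpoints(entries: list) -> List[Tuple[str, int]]:
    """Return (address, port) for every instance whose checks are all passing."""
    endpoints = []
    for entry in entries:
        if all(check["Status"] == "passing" for check in entry["Checks"]):
            service = entry["Service"]
            # An empty service address means "use the node's address".
            address = service["Address"] or entry["Node"]["Address"]
            endpoints.append((address, service["Port"]))
    return endpoints


def fetch_health(service: str, agent: str = CONSUL_AGENT) -> list:
    """Query the agent's health endpoint for one service (needs a running agent)."""
    with urllib.request.urlopen(f"{agent}/v1/health/service/{service}", timeout=5) as resp:
        return json.load(resp)


def pick_mysql(entries: list) -> Tuple[str, int]:
    """Pick one healthy MySQL instance at random, or fail loudly."""
    endpoints = healthy_endpoints(entries)
    if not endpoints:
        raise RuntimeError("no healthy MySQL instance available")
    return random.choice(endpoints)


# Made-up response: mysql-2's service check has gone critical.
SAMPLE = json.loads("""
[
  {"Node": {"Node": "db1", "Address": "10.0.0.11"},
   "Service": {"ID": "mysql-1", "Service": "mysql", "Address": "", "Port": 3306},
   "Checks": [{"CheckID": "serfHealth", "Status": "passing"},
              {"CheckID": "service:mysql-1", "Status": "passing"}]},
  {"Node": {"Node": "db2", "Address": "10.0.0.12"},
   "Service": {"ID": "mysql-2", "Service": "mysql", "Address": "", "Port": 3306},
   "Checks": [{"CheckID": "serfHealth", "Status": "passing"},
              {"CheckID": "service:mysql-2", "Status": "critical"}]}
]
""")

if __name__ == "__main__":
    # With mysql-2 failing, every lookup lands on the surviving master.
    print(pick_mysql(SAMPLE))
```

Against a live agent, you would call `pick_mysql(fetch_health("mysql"))` on each connection attempt. The failover in the talk then happens without any change on the client: the failed master simply stops showing up as healthy.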