Good day. My name is Ben Haukla, and today we'll be discussing observability with Consul Connect. But before we move directly into service meshes, we need to build up slowly, from how we used to do things to how we want to do things now.

Back in the day, when you wanted more compute power, you had to get management buy-in, raise money, get a new purchase order, and order the thing. The thing would eventually show up at your loading bay, you would have to install it, install the OS, and only then would you have a new machine. And, as in this picture, the damn things were big, so it often meant building new rooms or even new buildings. None of that was easy, so it took years to get new compute.

By the 90s, when the first internet bubble happened, instead of building bigger computers we tried to solve our problems in such a way that we could spread the load across multiple servers. When I was still in the lab, that's how we did it: the list of servers was fairly small, the list of users was fairly small, so we could just say "you work on server one, you work on server two", and so on. When we started connecting things to the internet and our users were no longer internal, load balancers were introduced. We could assign a vanity URL like google.com to a load balancer, tell our users that's where to go, and the load balancer would work out how to spread the load and where everything lives. That was fine while purchasing was still slow: in the 90s you'd still have to raise a purchase order, wait for the hardware to show up at the loading dock, and install it in your own data center.
So adding new nodes to a static list in the load balancer was fine, because that's how slow things were; it was a manageable job. But when we moved to the cloud, adding nodes, removing nodes, and adding new services became much, much quicker, and that's when service discovery showed up. That's basically what service discovery is: a method of announcing yourself into a cluster or a pool, saying "I am service X and I am ready for action". You announce yourself to the load balancer, and the load balancer starts assigning you work.

That's where Consul comes in: it was originally envisioned as a service discovery tool that could route traffic across your networks and clusters. So how do we set up a service in Consul? It's actually very simple. It's a file on disk, written in a DSL called HCL, the HashiCorp Configuration Language. Here we have a service named counting that runs on a local port, 9003, and it has one health check: it hits the /health endpoint every second. If it's healthy, it gets announced into the cluster as such and accepts work; if it's no longer healthy, Consul takes it out of the cluster until it's fixed again.
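A service definition along those lines might look roughly like this (a hedged sketch based on HashiCorp's counting demo; the exact port and check endpoint are assumptions):

```hcl
# counting.hcl -- a minimal Consul service definition with one HTTP
# health check (port 9003 and the /health path are assumptions).
service {
  name = "counting"
  port = 9003

  check {
    id       = "counting-check"
    http     = "http://localhost:9003/health"
    method   = "GET"
    interval = "1s"
    timeout  = "1s"
  }
}
```

You would register this by dropping the file into the agent's configuration directory, or with something like `consul services register counting.hcl`.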
You can not only ask Consul direct questions; Consul can also act as a DNS server, so it lives in your old-school platform or environment just like all your other services. You can dig it on local port 8600, and you need to phrase the DNS question in a specific way: if I want the service counting, it's counting.service.consul. You can even incorporate this into your existing DNS service; as I show here for my BIND server, there is a consul zone that forwards all queries for anything under .consul to my local Consul server.

The cloud was fast; Kubernetes and containers are even faster. Instead of minutes, we can now start and stop services in seconds, and we can run services across multiple clouds. That brought a whole new layer of questions and problems to solve, and that's where service meshes came in. This is where we start to think about how services collaborate and interact, how I can distinguish between different services, and how I can block access to a service. That's where Consul grew into Consul Connect: the lovely people at HashiCorp grew the service mesh layer of Consul, and that's what we're going to discuss in the next couple of slides.
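The BIND forwarding just described can be sketched like this (an assumption-laden example: it presumes the local Consul agent answers DNS on 127.0.0.1 port 8600, the default):

```
# named.conf -- forward every query under .consul to the local Consul agent
zone "consul" IN {
    type forward;
    forward only;
    forwarders { 127.0.0.1 port 8600; };
};
```

You can also query the agent directly, e.g. `dig @127.0.0.1 -p 8600 counting.service.consul SRV`.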
So how do I extend my service definition to become Consul Connect, i.e. service mesh, aware? It's fairly simple: we add just this little connect stanza with an empty sub-stanza, and that's it. Now it's Consul Connect enabled, it has a TLS certificate, and it's ready for action.

So how do I connect to this in practice? We have the counting service, which is our backend, and now we'll add a dashboard on top of it, running on port 9002. The only thing we need to add is a similar connect stanza, but now we declare our upstream. We're basically saying we want to use another service, in this case the service called counting, and bind it to a local port, 5000. Why this is cool we'll discuss in the next couple of slides.

First we need to fire up the connections, so we need to tell Consul that the Connect side of things should be started. That's fairly simple: it's a command line command, consul connect proxy, and you basically say "for service counting" and "for service dashboard, give me a sidecar" and make them Connect aware.

This is very cool, and it brings a couple of additional features that we want. Remember I mentioned we bound the upstream to local port 5000. When we use service discovery we no longer need the vanity URL of a service, but we still need to know which port it's bound to. If it's a known service that binds to its known port, I can infer it and use it as before. But if you start using things like Kubernetes or HashiCorp's Nomad as a scheduler, stuff might not run on its predefined port; it might be on a random one. So we either need rules about which port to use, or a ledger of which port is in use, and that's basically what Consul keeps for us. And instead of you trying to work out a way to ask Consul for that port, Consul Connect says "no, I'll work out the port, and bind it to a local port", so to your local application it will look just like
it is connecting to localhost, and Consul Connect does the magic underneath.

The other great feature of Consul Connect is ACLs, or in Consul speak, intentions. I can block services from using my service, or explicitly allow things to happen, so we never get into a situation where our development platform starts connecting to our production database, or vice versa. That should definitely never happen anyway; you should run one Consul cluster for your development platform and another for your production platform. But heaven forbid it ever happens due to a copy-and-paste error. Everything I've discussed in the previous slides you can actually try yourself; the lovely people at HashiCorp have a very good tutorial to follow.

That brings us to observing stuff. I've used the words "Consul abstracts the magic away for you". I've also carried a pager for the last 10 years, and abstracted magic scares me, because I never want to end up in a situation where "restart and pray" is my only way out of a problem. Luckily the lovely people at HashiCorp have ways to work out what is going on, and that's what we need to do in general anyway.

There are three ways to work out what is going on. We can look at the metrics the system emits: how many purchases are happening, how many concurrent users do I have, what's the hit rate per second? We can look at the logs the platform is emitting: how many errors are there, how many informational messages? And since we're hopefully using microservices, we also need to look at traces. Tracing is basically working out where in your platform your application is spending its time: is it in the database, is it rehashing ACLs, is it waiting for the counting service that is in turn waiting for its own database? So the three things we need to look at are metrics, traces, and logs.
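The upstream declaration, sidecar command, and intentions described above might be sketched like this (the ports and service names follow the demo discussed earlier, so treat them as assumptions):

```hcl
# dashboard.hcl -- the dashboard declares "counting" as an upstream,
# bound to a local port on the dashboard's own host.
service {
  name = "dashboard"
  port = 9002

  connect {
    sidecar_service {
      proxy {
        upstreams = [
          {
            destination_name = "counting"
            local_bind_port  = 5000
          }
        ]
      }
    }
  }
}
```

The sidecars are then started with `consul connect proxy -sidecar-for dashboard` (and likewise for counting), and the dashboard simply talks to localhost:5000. An intention such as `consul intention create -deny dev-app prod-db` would block that particular pair outright.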
The rest of this presentation is going to dig into how we use the Grafana stack to look at where the magic is happening in Consul Connect. Consul Connect is built on the shoulders of giants: it's not a novel implementation, but actually reuses a CNCF tool called Envoy, which is great because more people are looking at it. It's not a "not invented here" product; it's a nice wrapper around Envoy.

When I talk about metrics, I'm talking about Prometheus collecting your exposed metrics. Prometheus is a CNCF tool, and it scrapes metrics: instead of the classic old-school tools that push metrics, it pulls them. That brings in the need for service discovery, which we have, and which is very nicely integrated into Prometheus. Prometheus also has its own query language that lets us query the data and ask very specific questions.

So how do I expose my Prometheus metrics in Consul Connect? Actually, I'm not exposing Consul's own metrics here, although you also should; this way we're exposing the built-in Envoy metrics, which is very cool. The only thing we need to do is add a three-line blob to our already existing Consul Connect configuration, basically saying: I want to inject a little bit of proxy configuration, and I want to make sure I expose my Envoy Prometheus endpoint at a certain port on my local system.

The way I showed it in the previous slide is actually a bit cumbersome, because now I need to ask all our developers to make sure this line is present in every job. The other way around is injecting it into the global Consul configuration, by enabling the central service config and making sure that for every kind of proxy we inject this bind address. And of course, as I said, please make sure you're also exposing Consul's own metrics,
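A sketch of that global approach, assuming an agent configuration along the lines of HashiCorp's guides (the metrics port 9102 is an assumption):

```hcl
# agent.hcl -- inject the Envoy Prometheus listener into every sidecar
# via a global proxy-defaults entry, instead of per-service config.
enable_central_service_config = true

config_entries {
  bootstrap {
    kind = "proxy-defaults"
    name = "global"

    config {
      envoy_prometheus_bind_addr = "0.0.0.0:9102"
    }
  }
}

# And expose Consul's own metrics in Prometheus format as well.
telemetry {
  prometheus_retention_time = "60s"
  disable_hostname          = true
}
```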
and that's the last four lines: basically, there's a telemetry block that makes sure Consul exposes its own data in Prometheus format.

How do we collect this in Prometheus? There are two ways. One is a hardcoded list, which is the top part, a little bit of YAML: you have a scrape config (that's what Prometheus calls these things) with a job, and you scrape all the static targets. The cooler way of doing it is actually using Consul service discovery: have Prometheus query Consul for all services with a certain name, in this case the Consul Connect Envoy sidecars, and scrape those. That way services announce themselves for scraping by Prometheus, and the circle is complete.

So how do we use the Prometheus data? There are two ways. Consul has had a UI integration with Prometheus since version 1.7: in the general Consul configuration you add a ui_config block and point it towards your Prometheus server, and that's it. If you then look at the UI, you have a little exploratory usage view. If you drill down into your services (in this case I have a product API that needs to talk to my database), you can actually see data flowing. That's all it shows, but if you want to drill deeper, you can click "open metrics dashboard", and then it goes to your local Grafana instance.

Grafana is a dashboarding tool that does metrics in a very general way. It started out as a project to visualize Graphite data, but over the years it gained a plugin system and data source plugins for basically any metrics back end, and these days logging and tracing too, which is very cool. How do we connect Grafana to our Prometheus? It's a little YAML file with a standard block of type prometheus, pointed to the DNS name where Prometheus lives. In my case that's a Consul service name, so Consul also does DNS for
this. I'll do it in a wildly insecure way by not having any authentication; if you run this in production, please add it, but it will do for this presentation. Then in Grafana I can use a bunch of cool dashboards, which you can also download off the Grafana website, and in this case I get a visualization of my Consul cluster: I have three nodes, there's a leader, and there's a fairly healthy amount of queries going on.

Next up is logs. I'm introducing a tool called Loki, which came out of Grafana Labs fairly recently; the byline is "Prometheus for logs". In the olden days, or not that long ago actually, log aggregation systems like the ELK stack or Splunk were platforms in and of themselves: it took skills, time, and resources to run them, just so your actual platform had somewhere to push things. Loki is a much simpler, but in my opinion also much more powerful, way of doing this. It's heavily influenced by Prometheus, and you can run it off a single Go binary, either as a monolithic instance or spread out as microservices, all using the same binary.

How do I push logs from my Consul Connect setup into Loki? There are a couple of ways. The first one: if you're running everything on Docker or Kubernetes anyway, install the Docker log driver plugin. There are some issues with it at the moment, though; it's a blocking call, so if your Loki is down, you'll have trouble manipulating your containers. If you're still running bare metal, or you're a bit scared of that blocking behaviour, you can make Consul Connect write its logs to a file and then scrape that with Loki's agent, Promtail. The weird-ish command line on line number one works like this: the first part is the command we're used to, and then a double dash, which tells Consul that everything after it is not Consul configuration but gets passed straight to Envoy. And this is how we ask
Envoy to write everything to a log file, which then gets scraped and pushed into Loki.

Again, how do I set up the data source? Similar to before: a YAML file, type loki this time, pointed towards the URL. And what does that look like? Initially you want to do exploratory research: what is my application actually pushing, what are its neighbours pushing? For that, Grafana has a very cool new Explore tool, where you can run any LogQL query you can come up with, and eventually you can turn that into a dashboard.

Lastly, traces. Traces can be stored in a new tool called Grafana Tempo, which has a similar impetus to Loki. Generation-one tools like Zipkin and Jaeger are actually quite cumbersome to run: you need databases, you need storage. Tempo is a much leaner way of running things, without stopping you from running your own platforms. It's a single Go binary, it speaks the Zipkin, Jaeger, and OpenTelemetry dialects, so you can keep using the tools you're used to, but in a much leaner way, and it stores everything directly in cloud object storage instead of a database you have to run.

Making Envoy emit traces is a bit more work than logs or metrics. There's a two-part blob of JSON: first you need to tell Envoy that it needs to emit Zipkin-type traces, and then there's the more elaborate blob, which hopefully HashiCorp will eventually abstract away for us. It's a large blob with essentially three lines of information: what's the address, what port is my Zipkin (or in my case my Tempo) living on, and please push the traces there.

Again, we can simply add it to Grafana: my little YAML file, pointed towards the Tempo query front end. Tempo at the moment needs a second component to visualize traces in Grafana; they're in the process of removing this requirement so that you only need Tempo itself, but for the moment it's a slightly modified Jaeger
front-end that needs to run. Hopefully, when 0.7 comes out (which hasn't happened yet at the time of this recording), you will be able to run this as a single binary.

Once we have Tempo traces, we can set things up so that we can correlate between Loki sources and Tempo sources. For that we extend the Loki data source slightly with a derived field: if Loki finds a field that smells like a trace ID, then it probably is a trace ID, and we couple it to the Tempo data source. What does that look like? On the left we're looking at logs, and if Grafana finds a trace ID, there's a new button you can click, and a trace panel shows up on the right-hand side. So we can very easily and simply correlate between our logs and our traces, and hopefully you can work out what happens.

And that's the end of my presentation. Thank you for listening. If you want to talk to me about it, ping me on email; if you want to stalk me on Twitter, please do; and if you want to look at these slides a bit more slowly, you can find them on SlideShare.