Hi everybody, and thanks for coming. Today we'll talk about a project called Squash. It's an open source project, and we'll try to understand what the motivation for creating it was, what gap it covers in the ecosystem, and how it integrates with everything that we're doing today.

A quick word about myself. My name is Idit Levine, and I'm the founder of a company called Solo.io. It's a startup company. This specific project was written by me and by my employee, Yuval Kohavi, who is a brilliant engineer. Before that we did a lot of open source projects: we founded project UniK and now Squash, and we contributed to tons of others.

Okay, so what is the problem that we're trying to solve? You all know about it, so I'll go very fast. Debugging a microservices application is not an easy problem, and it's really not simple. Why? Because back in the day, when we had one monolithic application, if you wanted to debug it and understand the state of the application, you just needed to attach a debugger. You basically got a picture, an image of everything happening in memory, right? You saw the state of the application, you could see the stack, you could see everything that was happening, and so you understood the story of what the application was doing. But that's not the case right now. Because we really wanted to be more scalable, we needed to take this monolithic application and cut it into very small pieces. So now it's a bit of a problem, because each of these little pieces has its own image of memory.
But the question is how we understand what's happening in the application as a whole. And here it's an example with a few containers, but the real applications that we're running today in production look more like this, right? The Netflixes of the world and Twitter are running something like over 500 containers, so it's very complicated to understand the story.

This is a tweet that was tweeted a while back, but I really like it, so I thought it would be amusing. Basically, someone moved from a monolithic application to microservices, and now every time he has an outage, it's a murder mystery. And as you can see, a lot of people sympathize with this problem, because it was retweeted a lot and liked a lot.

Okay, so I'm not going to talk much about OpenTracing, because you learned so much about it over the last two days. But I will mention it, and mainly I will focus on what it really is and what it is not: what it's not going to do for you, and where I see the gap. So again, I'm going to go very, very quick. Basically, the idea is that it's transaction logging. You take an ID, you send it through all the pieces of your transaction in the application, and you get a picture, something like this, right?
So now you have traces and spans. You understand how the application looks, who is calling whom, and what they do. Okay, so that's basically what OpenTracing is. I'm not going to talk about that again. This is for people who don't know this stuff, but we talked so much about OpenTracing at this conference that I felt most of you probably understand what it's doing. And it's in the foundation. I'm not going to show you a demo, because you've probably seen one, but I will focus on what it really is, what it's trying to solve, and what its purpose is, not what we're using it for, because we're kind of abusing it.

So what is it? It's logging, right? That's what it is. You're logging your application, and logging basically means printing. That's what we're doing, right? It's really good logging, but that's what it's doing. What you didn't log, you don't know, right? You can stretch it, and people at Uber, for instance Yuri, are doing that: you take the logs that you collect and feed them as metrics to different metrics tools. But again, this is not what it's for. It's contextual logging, which means that you actually know the context of each log, so you can see it much better and understand who called whom and so on. They're also saying that now you can understand the critical path, and I can see that a little bit: if you see who is calling whom, you get a picture of where you might have a latency problem, and then you can analyze it, because you do see the topology.

But here is what it's not, and here is what in my opinion is the weakness of the solution. OpenTracing is not a runtime debugger. Basically, you're sending all the information somewhere. It aggregates the logs.
It massages them, and you'll see them ten minutes later, right? So it's not a runtime debugger, and it's really important that people understand this. You will need to wait ten minutes to actually get the logs.

You also need to wrap and change the code. At the end of the day, your application needs to log, right? So you need to put this logging on top, and you need to wrap it in a library that knows where to send the logs. This is something that you need to do. It's better right now with the service mesh integration, but you still need to do it, and you still have some code that you need to add to your application.

And you're not really getting a holistic view here, because what you're getting is only what you're printing. If you didn't print a value, you're not going to see it there, right? So if at some point you have a problem that you didn't think about, or you don't know which parameter is causing your problem, you will not see it there. What you will need to do is go to your application, print it, push it to production again, and then wait ten minutes again until you get the logs.

The next thing is that, as I said, you can't change a variable at runtime, because it's not a runtime debugger. You can only see things after ten minutes.

And you know, there were very good presentations this week about OpenTracing, from Ben from LightStep and from people like Yuri. But at the end of the day, the purpose of those demo applications is to show you what it's capable of doing. It does not make sense to log all of this, because it gives you a huge performance issue, right?
I mean, all of this goes over your network, so you need to find the balance. You can't send it all the time, so you need to sample, and you need to decide on the sampling rate. It's a very, very delicate trade-off, and it's not easy for everybody to make. You need to be an expert like Yuri. So these are the limitations that I see, this is where I felt there is a gap, and this is where Squash can help.

You know, when I was a systems engineer, all we did was deal a lot with operating systems and unikernels, and I was writing a lot of Go applications. My best friend was the debugger. At one point we took a unikernel and put it on a Raspberry Pi. Guess what? Black screen. You can't even debug it. So GDB was our best friend, right? We used the debugger a lot. And then, at my previous company, I was working with one of my new engineers, who was basically born in the cloud, and we worked on some application. I said, "Okay, let's just attach the debugger and see what's going on," and he said, "Attach the debugger?" So I feel that this is a culture that has kind of disappeared for the new cloud generation, and I felt that this is something we can help with. The reason it disappeared, the reason it's not used right now, is that it's complicated, right? You need to pipe a lot of stuff. So we decided to do the piping for you. That's basically what we did.

So what is Squash? Squash is basically an orchestrator for debuggers. It integrates seamlessly with your infrastructure, and when I say infrastructure I actually mean platforms, for instance Kubernetes. Seamlessly means that it's not changing the platform. This is really key: we really insist on not changing the code, and that way it can be used with OpenShift or any other distro that exists in the market, because we are not changing the Kubernetes code itself.
And on the other side, we pipe it all the way to the IDE, because at the end of the day this is what we're doing: we're writing code in the IDE and we want to debug it there. So that's how we did it.

What you're getting with Squash, because it leverages the regular debuggers... We are not writing our own debugger. We're leveraging all the debuggers that exist out there. So what you get from it is debugging across multiple microservices, right? You can jump between microservices, and I will show a demo soon. You can debug a container, you can debug a pod, you can debug a service, you can set a breakpoint, you can step into your code. And the last thing, and a very important one: you can actually modify a variable at runtime and see how it affects your application.

So let's see a demo real quick. That way I think you'll understand what I'm talking about. One sec. Don't worry about this, it's just setting up my environment. It's running on AWS. Hopefully the network will not kill me.

So this is a microservices application that I wrote, a really, really simple one. Basically, it takes two parameters and returns the result, and you can either add or subtract. I'll just put whatever values here, it doesn't really matter, and we'll calculate, and we see that it's not really working, right? 55 plus 33 is not 22. So we have a bug.

So what can we do? If I'm using OpenTracing or any other logging tool, I will need to add logging, I will need to change my application, push it again to my Kubernetes environment, and wait until the logs actually arrive, and so on. But maybe I don't have to do that. Maybe all I need to do is go to this application in the IDE. This application is actually built from two microservices. I don't know if you can see it, but it's not really interesting. It's a Go application. The first service is basically a UI, right?
It generates the UI, takes the values it gets, and sends them to the second microservice, again written in Go, which either adds or subtracts. A really, really simple application.

So what I will do next is use the command palette. I don't know if you're familiar with Visual Studio Code, but the thing I like the most is that you can reach everything from the command palette. We wrote a Squash extension, and that gives you more functionality. In Visual Studio Code you can get an extension really easily from the marketplace, install it, and you have all these capabilities.

So what we're going to do now is run "debug container", because that's what we want to do. What happened is that the IDE went and talked to my Kubernetes through kubectl and brought me all the running pods that I can see, right? So it's kind of secure: only what I am allowed to see. Because I want service one, I will choose the pod for service one. Now it presents me the containers in this pod. I will choose the one that I have, and it asks me which debugger I want to attach. I will say dlv, because it's Go, and you will see that in a second it's attached, right?

So now let's do exactly the same thing for the other service. Debug container, now we'll choose the pod for service two, we'll attach to the container, and we'll choose dlv again, because it's a Go application. In a second that one will be attached too. So now both are attached and waiting.

What I should do now is just go and calculate again. What will happen is that it will jump in, because it's actually a debugger, and now we can step into the code. I see all the memory, I can see all my variables and where I am, and I can use all the regular commands. I can continue, and then it will jump to the other service, because I put a breakpoint there as well, right?
And now it will go through, right? And you will see that isAdd equals true, and I'm very happy about it. But look what I did here. I actually made a mistake: if isAdd equals true, I put a minus, right? This is only a simple demo to show what we can do. But what if I don't need to change the code and then push it again? What if I can just leverage the fact that I can change this value and see whether that solves the problem? So I will come here and set isAdd equal to false. That should change it, and I will step, and you will see that the value changed to false. Now I'm just going to continue to the next breakpoint, and when I go back to the application you see that the problem is now fixed. So now I know how to fix it: I go, fix it, push it, and I'm all set, right? So that's a very simple example of what Squash is doing. This is the basic part, and I will show you what we did with it afterwards. Cool. Let me close this.

Okay, so now let's go back to the presentation and understand what we just did, what happens behind the scenes. The architecture is really simple. There are three components.

The first one is the Squash server. The only thing it does is orchestrate the clients. It gets a request from the extension or through an API, figures out from what it got which node it needs to talk to, and then sends the request to the client on that node. It's really simple. The Squash server is a service running on Kubernetes, so we have a YAML that just installs it. It's really simple; you can bring it up and down and so on.

Then there is the Squash client. So what is the Squash client?
Basically, we run it as a DaemonSet to make sure that it's always up, and it's a Docker container that wraps the debuggers. I will drill into this a little bit more later, which is why I'm going fast now, to give you the full overview. And then the last component is the UI, which is basically the IDE, right? We didn't want to reinvent the wheel. We just use a regular IDE, Visual Studio Code in this example.

So again, what's happening in the flow, technically? When you call the extension's command, it goes to kubectl and presents the user with the pods, and once the user chooses a pod, it presents the containers. Then it presents the debuggers that we support. Then it sends all of that to the Squash server. The Squash server isn't doing too much, but it needs to figure out which client to talk to, so it finds which node the container is located on and sends the request there. Then the IDE, in this case, just waits for the debug session. It's waiting to be told that the IDE's debugger client can connect, right? And that's it.

So the request comes, as I said, to the Squash server. What the Squash server does is send it to the relevant client, and then it waits itself, right?
It needs to wait for the debugger to actually attach. And then the Squash client: there is a trick here, and I will drill into it, because there were some challenges that we needed to solve. What you need to know is that you need to translate the PID of the process inside the container to its PID in the host OS namespace. I will drill into how we did it and the solution we found.

After we do this, and the client actually knows which process it needs to attach to, what it does is simple: the Squash client starts an instance of the debugger as a server and attaches it to the process in the container. And again, I will drill into it, because there is some cool stuff here. Then it returns the session and the port, which go all the way back up to the IDE. The IDE then connects, and we let the native debugger do its work. So Squash is only doing the piping. It connects everything and says, "Now you're on your own, do whatever." That's where we leverage what exists: we're not writing our own debugger, we leverage all the debuggers that exist.

So, one second. Here, as I said, we're doing some tricks, so I want to focus on this, because, (a), we want to get feedback, and (b), it's kind of interesting. So what do we need to do? This is how a node actually looks, right?
You have the Squash client, you have the CRI, and you have the containers that run. The container's PID is usually in a different namespace, and we need to translate it to the host namespace in order to attach. One of the things that was most important to us is that we don't want to do it only for Docker containers, because that's kind of easy: Docker returns that for you. We want to work with any Container Runtime Interface implementation that exists.

So here is how we did it. I don't know if you're going to like it, and I would love to get your feedback on it. The Squash client goes to the CRI and runs an ExecSync request, and what it runs is ls on /proc/self/ns, which basically says: show me all the namespaces that I'm running in. So now you get the list of namespaces. I don't know if you're familiar with file systems, but the unique identity in the operating system is what's called an inode. So what we're getting here is the unique inode of each namespace. So now we kind of know the PID namespace, we know where the container is. Actually, we're not using the PID namespace, because the PID namespace could also be the host's. We're specifically using the mount namespace, but this is a detail; it doesn't matter. We just needed to find something that identifies the container specifically. Then that's returned to the Squash client, and the Squash client, which is running on the host, looks for this inode in its list. So you can see the code.
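
As a rough sketch of the lookup described here (not the actual Squash code), the host-side half could look like the following in Python. It assumes the container-side `ls -l /proc/self/ns` output has already been fetched through the CRI. The function names and the `proc_root` parameter are made up for illustration; only the `/proc/<pid>/ns/<name>` links and their `mnt:[inode]` target format come from Linux itself.

```python
import os
import re
from typing import List, Optional

# Namespace links look like "mnt:[4026532256]"; the number is the inode.
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_inode(link_target: str) -> Optional[int]:
    """Extract the inode from a namespace link target, or None if malformed."""
    match = NS_LINK.match(link_target.strip())
    return int(match.group(2)) if match else None


def inode_from_ls(ls_output: str, ns: str = "mnt") -> Optional[int]:
    """Find one namespace's inode in `ls -l /proc/self/ns` style output."""
    for line in ls_output.splitlines():
        name, sep, target = line.rpartition(" -> ")
        parts = name.split()
        if sep and parts and parts[-1] == ns:
            return parse_ns_inode(target)
    return None


def find_host_pids(target_inode: int, ns: str = "mnt",
                   proc_root: str = "/proc") -> List[int]:
    """Return host PIDs whose namespace link carries the given inode."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            link = os.readlink(os.path.join(proc_root, entry, "ns", ns))
        except OSError:
            continue  # process exited, or we may not read its namespaces
        if parse_ns_inode(link) == target_inode:
            pids.append(int(entry))
    return sorted(pids)
```

Several host processes usually share one container's mount namespace, so this sketch returns every match. Choosing which of them to attach the debugger to is left out.
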
It's really, really simple. What it's doing is going over all the processes that are running, finding the ones with the relevant inode, and returning the PID. Now we have the PID on the host, and we can just go and attach to it.

So that's that. There is one limitation that we have right now with this solution: if your container does not have ls, we kind of have an issue, right? But there is a way to tweak even that; you can actually inject an ls. We need to think about it, so we'd love your feedback.

Okay, and the last thing is the Squash client itself and the way it works. This is the Dockerfile that we build it from, and what you can see is that we install gdb and dlv in this case. After that, we just start the debugger server itself and attach it. So that's really, really simple.

And the beauty of what it gives you: in the application I showed you, you had two microservices that are both written in Go. But what you can actually do is have one microservice written in Go, for instance, and another one written in Java, attach the Java debugger to one and the Go debugger to the other, and debug across both, right?

In the beginning we were not a lot of people, so we needed to choose, to create an MVP before the community told us whether they like it or not. So what we did was really, really minimal, right? MVP is minimum viable product.
So we chose Kubernetes as the platform, because it's a no-brainer. For the IDE we chose Visual Studio Code, only because we are using it and that's what we like. And we chose the debuggers that we're using right now, gdb and dlv.

But what we discovered when we open sourced it is that the community actually liked it. They asked us to add support for Mesos (a guy from Verizon wants to take the lead on that), and they want us to add support for Swarm. They also want more debuggers, for instance Python, if that's what they're running in production, and more IDEs; IntelliJ, for instance, was a frequent request.

So the vision is going that way, right? I mean, we want this to be the debugger for everything, whatever you're using. In order to do that, we made sure from the beginning that the interfaces are clean. So in order to add, for instance, a platform, everything that is platform-specific sits behind these interfaces. One thing that you need to do, and we talked about it, is create the container locator. You get the request and you need to figure out where the container is actually located. That will be a Kubernetes-specific call, right? And then in the client, the same thing: we need to translate the container to a PID. We talked about it.
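
To make the two platform-specific pieces mentioned here concrete, this is a minimal Python sketch of what such interfaces might look like. The class and method names (`ContainerLocator`, `ContainerToPid`, `locate`, `host_pid`, `pick_client`) are hypothetical, not Squash's real API, and `StaticLocator` is a toy stand-in for a Kubernetes-specific lookup.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ContainerLocation:
    node: str          # node whose client should receive the debug request
    container_id: str  # runtime-specific container identifier


class ContainerLocator(ABC):
    """Server side: find where a container is running (platform-specific)."""

    @abstractmethod
    def locate(self, pod: str, container: str) -> ContainerLocation:
        ...


class ContainerToPid(ABC):
    """Client side: translate a container to a PID in the host namespace."""

    @abstractmethod
    def host_pid(self, container_id: str) -> int:
        ...


class StaticLocator(ContainerLocator):
    """Toy locator backed by a dict; a real one would query the platform."""

    def __init__(self, table: Dict[Tuple[str, str], ContainerLocation]):
        self._table = dict(table)

    def locate(self, pod: str, container: str) -> ContainerLocation:
        return self._table[(pod, container)]


def pick_client(locator: ContainerLocator, clients: Dict[str, str],
                pod: str, container: str) -> str:
    """Return the address of the client on the node hosting the container."""
    return clients[locator.locate(pod, container).node]
```

With this split, supporting another platform would mean writing one new locator and one new PID translator, while the routing in `pick_client` stays the same.
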
So again, we're using the CRI. The CRI does not exist on every platform. Actually, Kubernetes is almost the only platform, that I know of at least, where you can run an exec on the container itself. So the trick that we did right now will not work for something like Mesos.

And the last thing is that we wanted to save the state in case something happens to the Squash server in the middle. So we created an interface for that, because in Kubernetes you want to leverage something like third-party resources and so on, and we can't assume that on the other platforms. Right now it's in memory, but you can implement it here.

And the debugger is exactly the same thing. What a debugger does is attach, detach, and port, right? Really simple. So adding a new debugger is really, really easy. And the IDE: we did it for Visual Studio Code, but we can do it for the rest. It's doable, and at the end of the day it's an open source project, which is what we wanted.

But the question is, what kind of speaker would I be if I'm at KubeCon and I'm not going to talk about service mesh? I think this would be the only session that doesn't. So let's talk about service mesh. One sec.

Service mesh, in my opinion, is great, because it gives us the visibility that we talked about. We're now on the network. I'm not going to tell you what it is. You know what a service mesh is, because you've heard so much about it in the last two days. You know that the data plane could be Envoy, and you know that the control plane can be Istio. So that's not what's interesting. What's interesting is how you can take OpenTracing, Squash, and service mesh and get one solution that will close all the gaps. So what did we do here? It's really simple.
You can basically debug a service, and this is the beauty of it: a service, and not a container on the infrastructure. So you go to the IDE today, you tell it which service you want and which image in the service you want to debug, and that's it. And we wrote an Envoy plugin. What does an Envoy plugin mean? I don't know if you know, but you can extend Envoy easily. We extended Envoy with a plugin, and I will show the plugin in a second. What we're doing is this: if a request comes in with the Squash header, Envoy goes to Squash and says, "Debug me." Squash already got the request from the IDE, so it does all the magic and attaches the debugger. And the beauty of it, in my opinion, is the strongest thing here, and this slide is basically saying the same thing.

So this is the Envoy plugin. It's open source; you can go and look at it. What it's doing is really simple: it uses environment variables to get the pod namespace and the pod name, creates a request to send to Squash, and then waits for the response. And the beauty, again, and in my opinion what's strongest here, is that now you can actually debug in production without pausing the cluster. We're not stopping the cluster, because the cluster will continue running with all the other requests. Only the request that you asked for is going to be stopped. So that's that.
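
The plugin logic described above can be sketched in a few lines of Python. This is only an illustration of the decision flow, not the real Envoy filter (which is compiled into Envoy): the header name `x-squash-debug`, the environment variable names `POD_NAMESPACE` and `POD_NAME`, the JSON field names, and the `notify_squash` callback are all assumptions.

```python
import json
import os
from typing import Callable, Mapping, Optional

DEBUG_HEADER = "x-squash-debug"  # assumed name of the Squash header


def build_debug_request(headers: Mapping[str, str],
                        env: Mapping[str, str]) -> Optional[str]:
    """Return a JSON body identifying this pod, or None if no debug header."""
    if DEBUG_HEADER not in {name.lower() for name in headers}:
        return None
    return json.dumps({
        "namespace": env["POD_NAMESPACE"],  # assumed variable names
        "pod": env["POD_NAME"],
    })


def on_request(headers: Mapping[str, str],
               notify_squash: Callable[[str], None],
               env: Optional[Mapping[str, str]] = None) -> str:
    """Hold a flagged request until Squash answers; pass others straight on."""
    body = build_debug_request(headers, os.environ if env is None else env)
    if body is not None:
        # Stand-in for the blocking call to Squash: only this request waits,
        # every other request keeps flowing through the proxy.
        notify_squash(body)
    return "continue"
```

The point of the design is that the wait happens per request, inside the proxy, so traffic without the header is never delayed.
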
I mean, there are limitations right now with Pilot and with Envoy. With the plugin: if you're adding a new plugin, you need to recompile Envoy, which basically makes it a new Envoy. And the other thing, with Pilot, is that you need to configure Pilot to use the plugin, and right now the way we did it is basically hard-coded configuration. So we're going to fix that, and I'm talking to the community. What we can do is get to a point where the plugin is configurable, so that you will not need to do that. We will work on it, try to contribute it, and hopefully push it upstream.

I will show you a demo, but again, it's an even smaller demo of how it's going to work. So what I did is clone the application that Ben and LightStep made, the donut salon. You will see the code here in a second, and it's already in the service mesh. You will see it's also connected to Jaeger. So this is the application itself, and if we refresh we'll see that Jaeger is connected to it as well, with OpenTracing. It's really simple: I make a request and I get a donut.

But what I want to show you is that you can now come to Visual Studio Code and ask to debug a container in a mesh. When you do this, it gives you all the services that are running, and we will choose the donut salon. It then gives you the images running in the service, because we want to know which one, and we will go with the donut salon one and not the proxy one. And what happens is that you will see a little watch here. It basically says "I'm waiting", and what it's waiting for is the call with the header to reach Envoy and tell it to actually debug. So let's go and do it.
Again, it's really simple. There is a curl command here, and what I did is exactly the same curl command. The only thing you see is that I added a header, squash debug. Actually, right now it doesn't even need the "solo". And then when we click it, what we will see is that the debugger attached, and now you can actually debug it.

And the beauty of what we're leveraging in Envoy is this. In the beginning, when we did it, we said, well, this will never work in production when we're only attaching to the container, because then you're stopping it and it just doesn't make any sense. So we tried to do a lot of interesting stuff. We watched all the services, attached the client to all of them, and when there was a panic we returned, and that was kind of a hack. Then we tried a lot of other stuff. For instance, you want to make sure that the debugger is attached and that you're able to debug from the first line, so we put in our own process first and then did an exec to change it to the regular one after the debugger attached. So there was a lot of magic being done here. But with a service mesh we don't need to do that. It's really clean, because the service mesh holds the request until it sees that the debugger is attached, and then it's connected and everything is good.

So that's that, and this is okay, but I think that we can do it even better, in my opinion. For instance, one thing that we know is that the service mesh, Envoy, has the ability to retry if needed, which means that it actually holds on to the request. So if we get a 500 response, for instance, which is an internal server error, what we can do is send the same request again, but now with the Squash header, and debug it.
So that's more automatic. We should also integrate better with GitHub, because what we want is for the right code to just open for the user. For that we need to find the commit ID, and there is an idea to do that, maybe with attributes in Kubernetes. Maybe a web-browser IDE can help, because then you don't need to run anything on your machine; we would spin up everything for you. And we need to integrate better with OpenTracing. For instance, the way I envision it is that service mesh and OpenTracing give you the latency between the services, but now you need to zoom in and see why it's actually happening. So you see latency between two services, and what I would like you to be able to do is zoom in on them and figure out what's happening there.

And that's what I have. Any questions?

Yes. Yeah, definitely. Is that it? I can't hear you. I don't really hear you, I'm sorry. Maybe you want to come closer. I think now I will hear you, because you're close. Yeah, okay. Yeah, it's only a DaemonSet on the node.

So the question was whether, because we have the client and the server, we don't need to have it in every container, and that's enough. Right: we did some magic there with the namespaces, so we only need it as a DaemonSet on the node itself. That's the one instance of it.
And that's it.

Yeah, so what I have in the IDE is something that I took from GitHub, but I'm actually debugging it on AWS. The cluster itself and the containers are running on AWS right now, on Kubernetes in AWS. Yes, yes, I'm debugging it live. It's a running cluster.

Yes, so that's why I said we're using a DaemonSet. It's basically a YAML that we give you. I can show it; it's really simple. It's basically two YAMLs that you install. One is the Squash server itself, as a service, and then you get another YAML that puts up the DaemonSet.

So I'm basically just using kubectl. That way I'm leveraging its security. I'm telling it, "Give me all the pods that I can see," the ones I'm allowed to see, right, because I'm going through kubectl, and then it brings them back to me. So I'm just leveraging kubectl. And the extension is open source as well, so you can just go and look at it. It's dead simple.

Did you use the why? Yes. So, how do I connect... That's a great question. You're asking how I connect my code on the computer to the... Yeah, okay. I don't understand. Yeah. Okay, any other questions?

Awesome. So I would love it if you would help us make it better. We're already talking to Envoy and Istio to try to integrate this, so hopefully you will use it. Thank you.