And the last talk is going to be about how to build a smart reverse proxy in Go. Thank you.

So, we're going to build a smart reverse proxy. First, a couple of words about me. My name is Alessio Caiazza, I am a backend engineer in the infrastructure department at GitLab, and we are an all-remote company, so I am one of those faces back there. We gather once a year; this last one was in New Orleans.

So, I'm going to tell you a story. Imagine the infrastructure department announcing that we are going to migrate our production from the Azure cloud to Google Cloud Platform. Wow, this is really cool. More or less at the same time, with more or less the same deadline, the distribution team announced that we are going to release a cloud native charts installation for GitLab. Also really cool. Then you start thinking: wow, we ship features, we will keep delivering GitLab while migrating all those things. And then you start thinking about all the little technical debts that you have seen, all the dirty tricks in the code base, and you're not really sure that this journey will be so fantastic.

But before we begin the story, I need to go back in time, to mid 2015. We are a Ruby on Rails company, so why am I talking here at a Go conference? Well, we had a problem. We had a big problem with slow requests. Nobody likes slow requests, but our problem was not really the performance of some endpoint: by design, you were supposed to move data. Think about a git operation. If you want to clone the kernel repo over HTTPS, it takes time; no matter how much you optimize, there's a bandwidth limit and there's data that you have to move, so it takes time. Back in those days, the only suggestion we had for this was: yes, you can clone over HTTPS, but it's better if you do it over SSH.

One of the reasons for this problem was that our technology stack was basically based on a forking daemon, which was designed for serving fast clients on low latency, high bandwidth connections. Since it is a forking daemon, you can imagine that you have a master process that loads your code, then it forks and creates some workers, and the master process handles incoming connections, forwarding them to one of those processes. And if a process is waiting, doing I/O, it cannot serve any other request, because it's not a multi-threaded application. So you can imagine that if someone is cloning something in this situation, you're just losing capacity while you transmit data.

The basic setup is this one: we had HAProxy in front of GitLab. I removed the database and all the external dependencies because I just want you to focus on this: you have a web server which handles web requests and the API, and HAProxy in front of it.

So, let's introduce Workhorse, a smart reverse proxy. There are a lot of reverse proxies out there; why did we have to write a smart one, and what does that even mean? The idea is that it's smart because it's not a general purpose reverse proxy: it really knows your workload and can help you where it's needed. It was named Workhorse to make fun of the magical Unicorn: you can have the magical animal, but if you need to do the heavy lifting, you need a workhorse.

So let's start with a simple example: how hard would it be to write a reverse proxy in Go? This is a reverse proxy in Go: three lines of code, plus error checking and imports. Let's take a look at it.
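A minimal sketch of those three lines, where the upstream address and the listen port are placeholder assumptions:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// The upstream address and the listen port are assumptions for this example.
	upstream, err := url.Parse("http://localhost:8080")
	if err != nil {
		log.Fatal(err)
	}

	// NewSingleHostReverseProxy rewrites incoming requests to point at upstream.
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	log.Fatal(http.ListenAndServe(":3000", proxy))
}
```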
You need a URL for your upstream server; that's the first thing. Then you need a proxy, from httputil's NewSingleHostReverseProxy: you pass the URL in, and then you ListenAndServe. And there you have a reverse proxy.

Now that we have a reverse proxy, how can we speed up a slow request? Let's imagine that we have a slow endpoint on /slow. This is the amount of code, minus imports, that you need to rewrite the thing. Let's go through the code. I cheated a bit, because in order to fit everything into one slide I imported a package, a mux router; you can do these things directly with the standard library, but the idea here is that I want to easily declare a handler that handles a specific route, and that's the reason why we have a router here.

So, first thing, you need a router. Then you need a middleware, because something we figured out in our logging system is that if you put a reverse proxy in between, all your logs will be filled with localhost incoming connections. So you need to take care of the address and all the information about the external client; it's just three headers and you're done. Then what you need is a handler function that rewrites your slow endpoint in Go. The basic idea is that you don't rewrite your whole code base; you just pinpoint the pain points you have and rewrite them in a more performant way. Then we basically go back to our old code: we parse the upstream URL, we create a NewSingleHostReverseProxy, and we bind it to the router, so that everything that doesn't match a specific route will go through the reverse proxy to our upstream.
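Put together, a minimal sketch of that slide could look like this, assuming a gorilla/mux router; the header names, the /slow handler body, and the addresses are illustrative assumptions:

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"net/http/httputil"
	"net/url"

	"github.com/gorilla/mux"
)

// clientInfo preserves information about the external client, so upstream
// logs are not filled with localhost connections. The exact headers are an
// assumption; X-Forwarded-* is the usual convention.
func clientInfo(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if host, _, err := net.SplitHostPort(r.RemoteAddr); err == nil {
			r.Header.Set("X-Forwarded-For", host)
		}
		r.Header.Set("X-Forwarded-Host", r.Host)
		r.Header.Set("X-Forwarded-Proto", "http")
		next.ServeHTTP(w, r)
	})
}

// slowHandler is a stand-in for the endpoint rewritten in Go.
func slowHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "fast now")
}

func main() {
	upstream, err := url.Parse("http://localhost:8080") // assumed upstream
	if err != nil {
		log.Fatal(err)
	}

	router := mux.NewRouter()
	router.Use(clientInfo)
	// The slow endpoint is served directly from Go...
	router.HandleFunc("/slow", slowHandler)
	// ...everything that doesn't match goes through the reverse proxy.
	router.PathPrefix("/").Handler(httputil.NewSingleHostReverseProxy(upstream))

	log.Fatal(http.ListenAndServe(":3000", router))
}
```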
This is what we shipped on the 22nd of September 2015. We released this idea where HAProxy was connected to Workhorse, and in the case of git operations, cloning and pulling, we were doing authorization and authentication the old way, forwarding the information to Unicorn and the old Rails code base; but instead of handling the clone operation in Rails, we were just forking the git binary and forwarding the body of the request. So it's basically a kind of CGI, but done in the reverse proxy instead of in the Rails application. Over time this evolved a bit: today we have a new component, Gitaly, which is written in Go; it's a gRPC server and it handles all the git code, so if you want to interact with a repository, you make a gRPC call to this external component. So we were able to speed up git operations: bye-bye, slow requests.

A couple of months later we released the CI system of GitLab, and we had another problem. We had a big offender in the context of slow requests, which was the CI runners attempting to upload artifacts. I think we had a limit of one gigabyte, I'm not sure, so you can easily imagine: we had this fleet of processes that were uploading artifacts constantly. And I want to give you some numbers here. I took the memory footprint of our production installation of GitLab: a Unicorn process takes around 800 megabytes of RAM, Workhorse 70 megabytes. There's an order of magnitude in there, so you can imagine where you want to spend your machine's memory if you're hungry for RAM.

So we came up with this idea of body hijacking, which is more or less described here. You have an external client, in our case the CI runner, and this client needs to upload some file. When the request reaches Workhorse, instead of forwarding it directly to Rails, which would dump the file on disk and replace it with a file handler in the hash of parameters of your request, we act before: we parse the incoming request in Workhorse and we save the incoming file to disk, because this is what would happen later in the process anyway, but we can do it in a performant, multi-threaded way, with goroutines and everything. Then we strip the body from the incoming request and replace it with some metadata that tells the upstream server where we put those files, and we forward that to Rails. We had a middleware in Rails that reads the new headers, the metadata, and basically puts the file back in the hash of parameters, so that as an engineer, when you are writing your controller code, it's exactly the same as if the request had come directly to the Rails application: you still have a file handler there. It's completely transparent.
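A minimal sketch of the hijacking idea; the metadata header name and the temp file handling are hypothetical, and the real implementation also signs the metadata it forwards:

```go
package main

import (
	"io"
	"net/http"
	"net/http/httputil"
	"os"
)

// hijackBody saves the incoming upload to disk, strips the request body, and
// forwards only metadata upstream. The header name is hypothetical; the real
// code signs this metadata so Rails can trust it.
func hijackBody(proxy *httputil.ReverseProxy) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		tmp, err := os.CreateTemp("", "upload-")
		if err != nil {
			http.Error(w, "internal error", http.StatusInternalServerError)
			return
		}
		defer tmp.Close()

		// Stream the body to disk; this is what Rails would do later anyway,
		// but here it happens cheaply, one goroutine per request.
		if _, err := io.Copy(tmp, r.Body); err != nil {
			http.Error(w, "upload failed", http.StatusInternalServerError)
			return
		}

		// Replace the body with metadata telling upstream where the file is.
		r.Body = http.NoBody
		r.ContentLength = 0
		r.Header.Set("X-Hijacked-File", tmp.Name())

		proxy.ServeHTTP(w, r)
	}
}
```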
This is what we shipped, more or less two months after the first implementation, and so we sped up all the uploads: bye-bye, slow requests.

And now it's time to go back to our story. We had to release the cloud native charts, and we had a big problem: the network file system, NFS. Let me explain why this was a problem. I collapsed everything back into the GitLab box, because I'm going to add new stuff here and I don't want to confuse you with too much information. So this is the same thing: HAProxy, and behind it Workhorse, the Rails application and everything. We had to do asynchronous operations, so there is Sidekiq, which is a queue processor for Ruby on Rails applications, and we use Redis as a queue. For instance, we had support for object storage: if something needs to be uploaded to object storage, it first gets written to a temporary location, then you write a job on Redis, and Sidekiq picks it up and moves the file to the object storage.

If you think about it, this works really well when you are on a single machine; but as soon as you have an HA installation, or a Kubernetes installation where you have pods, and each one of these blocks is a pod boundary, you have a big problem, because you can't do this. Basically, we were mounting the same NFS share across all of our fleet, so that regardless of which Workhorse processed the incoming connection, every machine in the Sidekiq fleet was able to read the file and move it to its final destination.

I want to give you some numbers here too, because I was surprised when they told me. NFS is something that almost everyone knows about, but very few know the requirements for running it in production with a very big storage under intensive operation. You can imagine that everything is constrained by the speed of the disks and the bandwidth you have on the network, so you want to have a lot of memory, because the last thing you want is swapping: you don't want memory swapping to disk to contend with the I/O for reading and writing your data. In our production we had an eight core machine with 50 gigabytes of RAM just for running that box. It's expensive, and it's a single point of failure. And if you have to ship the cloud native installation on Kubernetes, you can't use this, because Kubernetes can handle NFS, but it's not cloud native: it expects you to have an NFS server outside of the cluster. So we had to figure out a way to remove NFS from this graph.

We came up with this idea: maybe we should implement object storage support directly in Workhorse. There's a side story here: at this point in time, object storage was an enterprise feature, so you needed a license for it. We decided that we wanted to ship the open source version on Kubernetes as well, so this had to be backported to open source first. Think about the timeline: we were moving from one cloud provider to another, we had to ship the Kubernetes native installation, and we started realizing that we also had to backport features, make sure they were working, and build all these things together.

First, we started with our own use case, so we targeted only Google Cloud Storage, because we were moving there, and we started with Git LFS, which was a very easy API to fix, let's say. Git LFS is large file storage for git: it's an API that you can add to your git storage, and when you want to track, say, a binary or a big file, whatever it is, you can ask LFS to track it directly in object storage. When you commit it on git, the file will be replaced with a pointer to a location on that storage, and the git client will handle the rest for you: when you clone and check out, you download the file and you have it, but it's not technically in the repo. And this was easy because you have a very simple API that tells you "please put this object there": it gives you the size, and the body of the request is just the file. A very easy one.

Now, I have a background as a Ruby on Rails developer, and the first thing I realized looking at the io package was: I don't like it. I expected more features; I expected it to be more powerful. Then I started writing Go code daily and said: oh, I really love it. The idea that io.Reader and io.Writer are so simple that you can pipe them together is incredibly powerful; you don't need all those abstractions. Everything is a stream of bytes: you can read it, or you can write it.

This still fits in one slide, although maybe it's a bit hard to read. This is a handler that gives you the idea of how you can do body hijacking and store the information directly on object storage while it is in transit, without buffering it, without writing it to disk.
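A minimal sketch of such a handler; authorizeUpload, the header name, and the URLs are hypothetical stand-ins for what is described next:

```go
package main

import (
	"io"
	"net/http"
)

// authorizeUpload is a stand-in for the internal Rails API call: it checks
// authorization and returns a pre-signed URL.
func authorizeUpload(r *http.Request) (string, error) {
	return "https://storage.example.com/bucket/object?signature=...", nil
}

// streamToObjectStorage moves the request body into object storage while it
// is being read from the client, without buffering it in memory or on disk.
func streamToObjectStorage(proxy http.Handler) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		signedURL, err := authorizeUpload(r)
		if err != nil {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}

		// A PUT whose body *is* the incoming body: every byte read from the
		// client goes straight to S3/GCS/MinIO.
		put, err := http.NewRequest(http.MethodPut, signedURL, io.NopCloser(r.Body))
		if err != nil {
			http.Error(w, "internal error", http.StatusInternalServerError)
			return
		}
		put.ContentLength = r.ContentLength

		resp, err := http.DefaultClient.Do(put)
		if err != nil {
			http.Error(w, "upload failed", http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		if resp.StatusCode >= 300 {
			http.Error(w, "upload failed", http.StatusBadGateway)
			return
		}

		// The file is already safely stored: forward a bodyless request with
		// (signed, in the real code) metadata about where it ended up.
		r.Body = http.NoBody
		r.ContentLength = 0
		r.Header.Set("X-Object-Storage-URL", signedURL) // hypothetical header

		proxy.ServeHTTP(w, r)
	}
}
```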
Let's go through it. First thing: we didn't want to move authorization logic to Workhorse, because the idea is that you rewrite only what you need to speed up, and we still have hundreds of engineers who work on Ruby on Rails daily, so we want to keep all that logic in the Ruby on Rails code base. So we made an internal API: it receives the request and, with some information from the request, checks whether you are authorized to upload, and gives you back a pre-signed URL. Then, in the context of the handler in the Go proxy, you just create a new HTTP request, a PUT request on the signed URL, and you forward the body of the incoming request, wrapped in a NopCloser. You set the content length from the request that is coming in, and then you run this request: you are basically moving the body of the incoming request, while you read it from the git client, directly into S3, or Google Cloud Storage, or MinIO, whatever you're using as object storage. You don't buffer it; as soon as you read it, it goes directly into the object storage.

Once you're done, and you've checked that nothing failed, you copy the incoming request, remove the body, set the content length to zero because you removed it, add some metadata, which you should definitely sign, telling upstream where you stored the file, and you forward the request like a real proxy. So when the request reaches the upstream, the Rails application, the file is already safely stored on S3, or whatever the object storage is.

Mission complete? Well, not exactly. As I said, we had some dirty tricks to take care of. We were lucky, because Google Cloud Storage is not exactly an S3-like implementation: it has one difference that allowed us to ship this. Google Cloud Storage is the only S3-compatible implementation out there that allows you to stream requests of unknown length. This is not compatible with the S3 API, so MinIO will refuse it, and all the other implementations want to know upfront how much storage you need for that request. And we had around 35,000 CI runners in the wild, outside of our control, that were sending artifacts without a known length, because they were compressing them in transit, directly on the upload request. So we could not know the size without first writing the file to disk. This was a big problem.

So, next iteration: we went back to the drawing board, we started looking more deeply at the S3 APIs, and we found this thing, the multipart upload. Divide and upload. It was designed for another use case: the idea is that, to increase performance, to make better use of your bandwidth, you can split your original object into several parts, upload them concurrently, and then there is a final call that finalizes everything into one single object, and you have your final object.
We decided to implement this in our reverse proxy, but we found out that all the libraries out there were designed for that original use case. Either they expected to be able to seek the file on disk, so that they could run multiple uploads in parallel, or they were not taking care of memory: if they weren't able to gather the size of the request, like when you have an incoming body of unknown length, they would just say, okay, the maximum size a part can be is 600 megabytes, so I will read 600 megabytes into memory and then upload them. This was a problem, because we had to keep memory usage under control: we had to take care of multiple concurrent uploads from the outside, so we wanted to do this in a way where we could control memory usage.

So we came up with this very simple idea. Whenever a request comes in, we create a temporary file and we write up to 50 megabytes into it; the API controls that number, but it gives you the idea. So we write the first bytes to disk, then we upload that temporary file as one part of the multipart upload, and we delete the file, so we keep the disk usage under control too. Are we done? No: go back to the beginning, write a temp file, upload it, delete it. Once we reach the end of the incoming stream, the end of the request body, we say, okay, we're done, and we send the finalize call. That's it.
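A minimal sketch of that loop; uploadPart, finalize, and the 50 megabyte limit are illustrative stand-ins for the real S3 multipart calls (UploadPart and CompleteMultipartUpload):

```go
package main

import (
	"errors"
	"io"
	"os"
)

const partSize = 50 * 1024 * 1024 // illustrative limit from the talk

// uploadPart and finalize are stand-ins for the real object storage calls.
func uploadPart(n int, r io.Reader, size int64) error { return errors.New("todo") }
func finalize() error                                 { return errors.New("todo") }

// multipartUpload consumes a stream of unknown length in bounded chunks:
// write up to partSize bytes to a temp file, upload it as one part, delete
// it, repeat. Memory and disk usage stay constant regardless of body size.
func multipartUpload(body io.Reader) error {
	for part := 1; ; part++ {
		tmp, err := os.CreateTemp("", "part-")
		if err != nil {
			return err
		}

		// Copy at most partSize bytes of the stream into the temp file.
		written, err := io.Copy(tmp, io.LimitReader(body, partSize))
		if err == nil && written > 0 {
			if _, err = tmp.Seek(0, io.SeekStart); err == nil {
				err = uploadPart(part, tmp, written)
			}
		}

		tmp.Close()
		os.Remove(tmp.Name())
		if err != nil {
			return err
		}

		// A short chunk means we reached the end of the incoming stream.
		if written < partSize {
			return finalize()
		}
	}
}
```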
So we made it. We were able to migrate a live system from one cloud provider to the other, and we were able to release the first iteration of the cloud native installation; the second one also had support for MinIO and the other implementations.

I want to thank you for listening, and I want to highlight some takeaways from this talk, what we learned. The first thing is that you can speed up a web application by writing a reverse proxy in Go, no matter if your company writes in another language. You can start incrementally; it's an iterative approach, which is a good thing, because you can rewrite only the slow endpoints. It's not the kind of project where you say: yeah, we are going to rewrite the whole code base because Go is the way to go. Maybe it is, but no higher level management will ever accept "let's rewrite everything", so you start where you can show that you can improve things. You can forward to another service if you need to, which is a very good entry point for splitting a monolith into microservices, or just a service architecture. And always, always remember to sign the request if you modify it: if you expect to change something, sign it, so that the upstream can check the signature and know that what you put in there really comes from your reverse proxy and not from the outside.

Workhorse's source code is available at the URL up there, and it's released under the MIT license. All the examples you have seen here are just small examples, not extracted from the real code base, made to show the key points; if you want to take a look at how we did it, there is more complexity involved. You are free to study the code, and to contribute if you like. That's it, thank you.

[Audience] Thank you for the talk. How do you test the proxy API calls?

[Alessio] Okay, the question is how we test the reverse proxy API. As you're leaving, please try to make less noise, thank you. We have several levels of testing. We have unit testing and acceptance testing in both projects, so they are tested in isolation: for every commit, the CI runs this kind of test. Then, and this is specific to our case, the Rails application has a reference to the version of the proxy it is supposed to work with, and when we bundle everything together we have a QA pipeline that builds the entire system and runs some end-to-end use cases through the whole stack.

[Host] Thank you very much. We don't have time for more questions, sorry, but you can come and talk to him. Thank you.