Check, take one, two. Is it working or not? Can you hear me? Okay, cool. Uh, so I'm gonna talk a little bit about GitHub's back end right now, for no particular reason, because I understand that most people just don't care about this. We do a lot of git on our back end, but everybody else doesn't do huge git deployments, and people in general just don't give a shit about how to build a scalable GitHub back end, unless you guys are trying to launch a GitHub competitor. In that case, please don't do that, because that would suck. So I'm just gonna talk a little bit about what we do in the back end and how we make it scale. It's something people don't particularly care about, but I think it's an interesting tale, not only because it's about GitHub, but because it's actually relevant. I think there is very little of GitHub here and a lot of the way we work over at GitHub, the way we work internally, and it's worth hearing. And there are no other speakers right now, so I'm gonna do it anyway. So bear with me.

Oh, hi, my name is Vicent Martí, but most Americans don't know how to pronounce my name, so they call me VMG, by my initials. I have a Twitter account, which is @vmg, where I post stupid stuff. I used to be an indie video game developer, but I woke up one morning trying to hack into my neighbor's Wi-Fi, right? And it struck me that maybe if I had a real job, I could afford to pay for my own Wi-Fi. So I did that. I went to San Francisco to work for a startup called GitHub, and that was like three years ago. Of course, back then we were like 10 people, so it didn't really count as a real job either. But we're like 170 now, and things are working out pretty well for us. Now, this is not the story of how GitHub grew to 170 people. It's the story of how our back end grew to be able to host, I think, three million different repos right now.
So, we had pretty humble beginnings. The first version we launched, I don't know if you guys remember it, was almost five years ago, and it had this slogan: "Git hosting: no longer a pain in the ass." This is real; we launched with this slogan. It worked pretty well. I like it. It feels honest. It feels sincere. Humble. I feel it could be even more honest, because it could say "Git hosting: no longer a pain in the ass for you," not for us. Because Jesus, if I ever meet the guy who came up with this git idea, I'm going to shoot him with a paintball gun, because it's so hard to make this work on the server side. It's so painful.

Because hosting the repos itself is not particularly hard. As it turns out, git is a distributed version control system, so if you grab a git repo and put it up on a server, that single folder has everything git needs to be able to push, pull, and clone from it. So you just take that .git folder, what we call a bare repo, you put it on the server, you punch a hole in SSH, and now you are hosting git repos for people, which is pretty convenient. But the tricky part is when you're actually trying to add some value on top of that, when you're trying to show some kind of useful information about the git repo itself on a web interface.

So we had this issue that we had no way to go into the git repo and get information that we could display on the web UI. Since our stack was written in Ruby, Tom Preston-Werner wrote a small Ruby library called Grit that did just that: it went into the git repo, parsed the internal git formats, and showed them on the web UI. It was pretty straightforward.
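To give a rough idea of what "parsing the internal git formats" involves, here is a minimal Python sketch of git's loose-object encoding: an object is stored as a header `<type> <size>` plus a NUL byte, followed by its content; its ID is the SHA-1 of that byte string, and the file on disk is the zlib-compressed result, stored under `objects/` in the bare repo. This is not Grit's code; the function names are made up for illustration, and the sketch ignores packfiles, which real repositories mostly use.

```python
import hashlib
import zlib


def encode_loose_object(obj_type: str, content: bytes) -> tuple[str, bytes]:
    """Return (object_id, compressed_bytes) for a git loose object.

    The raw form is b"<type> <size>\\0" + content. The object ID is the
    SHA-1 of that raw form; the on-disk file is the raw form, zlib-compressed.
    """
    raw = f"{obj_type} {len(content)}".encode() + b"\0" + content
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def decode_loose_object(compressed: bytes) -> tuple[str, bytes]:
    """Inverse of encode_loose_object: return (type, content)."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("corrupt object: size in header does not match content")
    return obj_type, content


def loose_object_path(object_id: str) -> str:
    """Path of a loose object inside a bare repo: objects/<2 hex>/<38 hex>."""
    return f"objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    oid, data = encode_loose_object("blob", b"hello\n")
    print(oid, loose_object_path(oid))
    print(decode_loose_object(data))
```

Because the ID is a hash of the content, the object store is effectively a content-addressed key-value store, which is also why any process that can read the folder can read the repo.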
Oh, thank you, man. Thank you so much. You just pass Grit a folder on the file system and it opens the git repo. It's able to perform a lot of very handy operations that you can display on the web UI. This was pretty straightforward and it worked very well in practice, so we just launched the website like that. It was a single VM on the cloud, with the main Rails application, Ruby on Rails and Grit and all the git repos on the same VM, on the same machine.

Of course, this worked, but, like the kids say nowadays, it's not web scale, because it's a single VM, right? So that's not gonna go very far. In a couple of weeks we had so many users that we were like, okay, we need more VMs; we need to put our main Rails application on several VMs. But now we had this issue: we had several VMs with the Rails app, but we didn't know where to put the actual git repos, because if you put different repos on every single VM, then routing gets very hard. So we came up with this kind of ghetto idea that worked very well in practice, which is GFS. This is not the Google File System.
This is the Global File System, by Red Hat. The point is that we had a single big server with a lot of hard disks, with all the git repos on them, and we just mounted that same server on every single VM, so every single VM could access the git repos as if they were on its local hard disk. That way we could spin up new VMs without changing a single line of code in the main Rails app: Grit could access the git repos as if they were on the local file system. This worked pretty well in practice for a couple of months.

But then we had the same issue that a lot of other startups, like Twitter, had, which is that Ruby on Rails was making us slow. In our case it was even worse, because it was literal: they were literally making us slow. They had just moved Rails to GitHub, and suddenly we had thousands of people forking the thing, opening pull requests, and cloning, and we were definitely not ready for that. I mean, it's a great problem to have, being too famous or having too much success, but it was tricky, because we were growing very fast and we had very, very few people to work around that, to let people, you know, fork Rails.

So luckily for us, by then we had saved enough money to actually buy real hardware, actual machines. Amazing, real hardware. We moved to a data center and got four front-end machines, real machines, not virtual machines, and four file servers, and of course a database machine. And now we had the hard question that our git repos were spread around a lot of servers, so we had to come up with a routing layer that would let the main front-end machines access the git repos in the back end. We came up with something I think was pretty smart. We called it Smoke.
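The routing layer described above has to answer one question per request: which file server holds this repository? Here is a minimal sketch of that lookup, assuming a plain in-memory dict in place of whatever persistent store backs the real one; the `Router` class, the host names, and the on-disk path layout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy repository -> file-server routing table (hypothetical names).

    A dict stands in for the real, shared routing store.
    """

    routes: dict[str, str] = field(default_factory=dict)

    def assign(self, repo: str, file_server: str) -> None:
        """Record (or change) which file server holds `repo`."""
        self.routes[repo] = file_server

    def lookup(self, repo: str) -> str:
        """Return the file server for `repo`, or raise LookupError."""
        try:
            return self.routes[repo]
        except KeyError:
            raise LookupError(f"no route for {repo!r}") from None

    def repo_path(self, repo: str) -> str:
        """Host plus on-disk location of the bare repo (assumed layout)."""
        return f"{self.lookup(repo)}:/data/repositories/{repo}.git"


if __name__ == "__main__":
    router = Router()
    router.assign("rails/rails", "fs1.example")
    router.assign("mojombo/grit", "fs3.example")
    print(router.repo_path("rails/rails"))  # fs1.example:/data/repositories/rails/rails.git
```

Because front ends never hard-code where a repository lives, adding file servers or moving a repository only changes one entry in the table, which is the property the talk credits this design with.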
Smoke is kind of like a cloud, but not quite, and it's blurry. It was a pretty good idea and it worked out very, very well in practice, so let me talk a little bit about it. This is gonna make perfect sense in just a moment; just give me a moment here.

On the right you have BERT, which stands for Binary ERlang Term. It's a serialization protocol that uses the same term format the Erlang VM uses, so you can actually send Erlang terms over the wire. On the left you have Ernie, which doesn't stand for anything; it's the name of the RPC server that uses BERT to actually communicate.

So the stack looks something like this. When the main web application wanted to perform a git operation, it would still do it through Grit. But of course Grit cannot do the operations over the network, because it needs a local repository on the file system. So we just monkey-patched Grit so it would turn those git calls into RPC calls, serialized with BERT, and send them into the thing we call Chimney, because Chimney routes Smoke, of course. Chimney is a Redis store that finds which file server on the network the git repo is stored on. That way we route the RPC call to the right file server, and then Ernie, the RPC server, takes the RPC call, deserializes it from Erlang terms into Ruby land, and performs the git operation. Now we can do that, because we are on the file server and the git repo is on its file system. So we do the actual git operation with Grit, get the result, and send it back to the front-end machine, serialized as BERT again.

This was revolutionary for us, because it meant that we could spin up as many new machines and as many new file servers as we needed. It also meant that we could keep working on the main Rails application, the github.com website itself, without having to change a single line of code, because everything was monkey-patched: all the RPC operations were happening transparently. We didn't have to restructure the actual Rails application to keep growing the website, so we could keep shipping features at the same time, which was huge for us.

But a few months after that, of course, we just kept growing and growing, and we had vertical scaling issues. I hate saying this, but shit was slow. It just wasn't running fast enough, not because we weren't distributed, but because the git operations themselves were pretty slow. And of course, the main issue was Grit itself, because it was a git library written in Ruby, which is just not a good idea, because Ruby is not that fast of a language. So we decided to work around this by shelling out to git a lot. A lot. I mean, I'm talking a lot. And even that became a bottleneck, because shelling out is expensive. So we fixed that by shelling out to git properly, which means actually spawning it and then parsing back the output of the git commands. But this doesn't really work out in the long run: spawning is pretty cheap, but it gets a little bit nasty when it comes to parsing the output of git commands back into Ruby land. So it was not even the startup time; just the parsing and communication time between the spawned process and our main Rails app was getting a little bit painful.

Luckily for us, this was back in 2010, and a lot of other people were having issues very, very close to the ones we were having. I like to explain this as the story of Ed. Ed was working for another company, right? And he was also trying to make git scale; they were also having git scaling issues. So one morning Ed woke up, went to his boss, and was like, "Guys, guys, guys!" And his boss was like, "What do you want?" And Ed was like, "I have a great idea. You see, have you ever heard about the git command-line client?
Why don't we take that command-line client, right, and build the git binary as a shared object? We take the shared object and just link it into our server process." And his boss was like, "Oh man, Ed, you're amazing. You just made git scale on your own. That's brilliant. You just fixed all of our problems." So Ed did just that: he built the git command-line client as a library, linked it into the server process, and used it to talk to the git repos using actual ABI calls, native calls. But then he left the server running for a while and checked the performance graphs, and the memory usage looked something like this. Now, Ed's boss came up to him and was like, "You know, Ed, I've been thinking, and maybe there's a chance, there's a possibility, that there could maybe be a memory leak here somehow. Possibly." And Ed's like, "Yeah. Well, the thing is that git doesn't really bother freeing memory. You see, git is a command-line app, so it's gonna run once and then quit. It would be pretty stupid, a waste of time, for git to free memory." But if you're linking it into a server process, you kind of want to free that memory, because it's gonna run for a while. Of course, that's the kind of problem we could solve very easily with CGI in 1995, but we're stuck in the future.

Luckily for us, it's also the kind of problem, like a lot of problems in computer science, that just solves itself. So Ed just left the server running, and the memory usage graph looked something like this. And Ed's boss was kind of concerned, and was like, "You know, I've been thinking that this is some very aggressive garbage collection right here. It almost looks like the server crashed, right?" And Ed said, "No, it didn't crash. It just died." "What do you mean it died?"
"Well, as it turns out, git is a command-line app, so it has a very particular way of doing error handling: it just prints the error to stderr and dies." Which is not ideal for a server process. So yeah, that was pretty rough.

Luckily for us, there were a lot of other companies and a lot of people, even in core git, who shared the same concerns, and they had this brilliant idea. They said, "Why don't we leave this git-binary-made-into-a-library thing behind and write a new library from scratch? We'll call it libgit2, and the 2 stands for: this time we're gonna try to free memory." And it was great. It was great. We had people from core git, even Shawn Pearce, and a lot of corporate contributors working on it for a while. But some of the contributors to libgit2, especially Shawn Pearce and the Google guys, found a pretty big showstopper in libgit2. There was a huge design flaw in the library, which is that it didn't have enough abstract factories. So they said, "How can we fix this? Well, we just rewrite it in Java, right?" So JGit was born, and the J stands for "Jesus, we really need a factory here."

Jokes aside, JGit is a brilliant git implementation in Java, but we could never use JGit ourselves, because we don't do Java at GitHub. It's true.
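The failure mode in the Ed story, a command-line program that handles fatal errors by printing a message and exiting, is easy to reproduce in miniature. Here is a minimal Python sketch contrasting the two styles, with made-up function names: a CLI-style `die()` that ends the whole process (modelled with `sys.exit`, which raises `SystemExit`), and a library-style call that hands an error code back to its caller, in the spirit of C libraries that return error codes. It models the idea only; it is not git's or libgit2's code.

```python
import sys
from typing import Optional


def die(message: str) -> None:
    """CLI-style fatal error: report on stderr, then end the whole process."""
    print(f"fatal: {message}", file=sys.stderr)
    sys.exit(128)  # git's own die() exits with status 128


def cli_style_read(objects: dict[str, bytes], oid: str) -> bytes:
    """Fine in a one-shot command; inside a server it ends the server."""
    if oid not in objects:
        die(f"bad object {oid}")
    return objects[oid]


def library_style_read(
    objects: dict[str, bytes], oid: str
) -> tuple[int, Optional[bytes]]:
    """Library-style: return (error_code, value) and let the caller decide."""
    if oid not in objects:
        return -1, None
    return 0, objects[oid]


def serve(requests: list[str], objects: dict[str, bytes]) -> list[str]:
    """Toy server loop: a failed request is reported and the loop keeps going."""
    results = []
    for oid in requests:
        err, _value = library_style_read(objects, oid)
        results.append("ok" if err == 0 else "not found")
    return results
```

Swapping `cli_style_read` into `serve` would raise `SystemExit` on the first missing object and end the loop, which is the toy version of the server "just dying" along with every other request it was handling.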
We don't do Java at GitHub. You know, there's this thing with Java: at the beginning, nobody was using Java because it was too new, VMs and shit. Then people started using it because it was enterprise and cool. Then people started to stop using it because it was old and crusty and whatever. And now people use it again because it's web scale, with Scala and Clojure and things. But people think that we're, like, so 2005, that we're too cool to use Java. It's not that; it's quite the opposite. We're in 1995: we're way too old to use Java. We have a very specific software design process, and we really care about reliability, understanding the software we write, and taking responsibility for it. I've always said that if you really think you understand the JVM, you're either very smart or very wrong. And I'm gonna be honest: we are pretty fucking stupid, but we like being right all the time. So that's why we use Unix. We use Unix processes, we use Unix tools, we use Unix methodologies.

And there's this quote by Ryan Tomayko, which I fucking love. He says: "Some people think that GitHub is a Rails shop, or even a Ruby shop. It's not. GitHub is first and foremost a Unix shop, and everything else is just a detail." So for us, running the JVM in our stack was a huge responsibility that we didn't want to take. We are very sold on Unix, on the Unix way of using processes for everything, and JGit was not an option for us. So the only thing we could do, of course, was libgit2. We actually had a plan: there's this VMG guy who is kind of autistic and shit, so we just give him time, and he will eventually fix libgit2, right?
But meanwhile, we had the big issue that we were growing very fast, and we reached that moment in the life of every startup when you go, "You know, it's NoSQL o'clock. You're gonna go get that web scale." And somehow people really wanted us to go get that web scale, because at some point somebody wrote a blog post along the lines of "git is a NoSQL store," and everybody was shouting from the rooftops, "Oh my god, git is like a NoSQL database." It's not. No, no, it's a version control system. It's not NoSQL. It just has a key-value store, but that doesn't mean anything. And people were crazy. I remember, a year ago I was in Amsterdam speaking at EuRuKo, and after my talk this guy came up to me. He was very friendly, very excited, and he told me, "You know, I really, really like GitHub, but I've always wondered: do you guys even Mongo?" And I said, "That's not a word. That's not a sentence. 'To mongo' is not a verb." But the thing is that people really wanted us to go NoSQL web scale on this thing. We haven't done that. Why? Well, I'm gonna explain it.

But first I need to do an intermission to explain, more or less, the internal git data model. I guess most of you already know it, but I'll give a quick run-through. Now, I really care about design, as you can tell from the slides. I have a lot of design. And that's why the old git logo was keeping me awake at night. I'm not gonna lie, I know a lot of people like the old git logo, but if you look at it, you think, what is git made of? Well, git is made of poor taste and questionable graphical choices. I don't know. I don't know.
I don't know. But the new git logo, which my friend Jason Long did for the new rebranding, is awesome, not because it looks good, but because it gives you a very, very big hint about what git is all about, which is graphs. Git is all about graphs, and it's a great thing to show on a logo.

I don't know how much of this you know, but when you have a working directory with the files for your project and you commit it, the first thing git does is write a tree out of your working directory: it turns it into a tree, which is basically a graph, right? A folder is a tree, and it has pointers to the blobs, which are the contents of the files, and pointers to other trees, which are the subfolders in your working directory. Now, that single tree gets pointed to by a commit object, which makes another graph on top of that. The commit object has the pointer to the tree and the metadata, like the author, the time, and the message, and it has a pointer to another commit object, its parent. That creates a forest, a tree of trees, and that's the way git stores it on the hard disk. The history of your repo is just a massive graph of the history of your code.

Now, this works surprisingly well. It's a brilliant implementation when it comes to having a version control system on your hard disk, but if you're trying to do this on the web, things get trickier. If you want to do something very easy, like showing a commit, that's very straightforward, because you only need that one commit: if you have your objects in a data store somewhere, you can just go to the data store, fetch that object, and bring it back. But if you're trying to do something more complex, like showing the log of a repo, then it gets expensive surprisingly fast if your git objects are not all in the same place at the same time. To actually show the log, you need to go to the data store, fetch the first commit, bring back the commit, parse the commit, find the parent of the commit, and then go back to the data store to grab the parent, and back again, parse it, go back to the data store, back again. And of course you start facing this ancient form of torture called death by round trip, which is extremely painful. If every single git operation involves a network call, that simply doesn't work in practice. It simply doesn't. It will literally take five minutes to see a single log on a big repo. And of course, if you're trying to do something even more fancy, like showing the last commit that touched a folder, like we do on GitHub, then the rabbit hole goes incredibly deep.

So actually putting the objects in a data store in the cloud, distributed around the network, was never an option; it gets unwieldy to manage. And the thing is that even if you put those objects in a data store in the cloud, you don't really win anything from it. You don't really win reliability, because git doesn't give a shit about CAP. It wasn't designed with CAP in mind, because it's a version control system, not a data store, like most people think. So if you're trying to distribute it around the network to make it redundant or reliable, you've got to keep in mind that performing a useful git operation, like a git log on a big repo or walking a tree, is gonna take like a million hops on that graph. And if that graph is distributed across the network, it's not only going to be expensive, it's going to be very, very hard to complete, because the number of hops needed for that query to succeed is more or less exactly one million. So if you miss a single hop, the operation doesn't finish; it's not successful. And that makes the amount of redundancy you need, to make sure the whole graph can still be queried even if a node goes down, enormous.
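To make the "death by round trip" argument concrete, here is a toy Python model, assuming a remote object store where every `get` costs one network round trip (the `RemoteStore` class and the commit IDs are hypothetical). Each commit names only its parent, so a log walk cannot know which object to request next until the previous one has arrived: the round trips are strictly sequential. The last function puts a number on the reliability point: if a query needs `hops` sequential fetches and each succeeds independently with probability `p`, the whole query succeeds with probability `p ** hops`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Only the fields a log walk needs: a message and a parent pointer."""

    message: str
    parent: Optional[str]


class RemoteStore:
    """Hypothetical key-value store in which every get() is one round trip."""

    def __init__(self, objects: dict[str, Commit]):
        self._objects = objects
        self.round_trips = 0

    def get(self, oid: str) -> Commit:
        self.round_trips += 1
        return self._objects[oid]


def log(store: RemoteStore, head: str) -> list[str]:
    """First-parent log walk: the next ID is only known after each fetch."""
    messages = []
    oid: Optional[str] = head
    while oid is not None:
        commit = store.get(oid)
        messages.append(commit.message)
        oid = commit.parent
    return messages


def make_linear_history(n: int) -> dict[str, Commit]:
    """n commits c0 <- c1 <- ... <- c{n-1}, each pointing at its parent."""
    return {
        f"c{i}": Commit(f"commit {i}", f"c{i - 1}" if i > 0 else None)
        for i in range(n)
    }


def p_query_completes(p: float, hops: int) -> float:
    """Probability that all `hops` independent fetches succeed."""
    return p ** hops


if __name__ == "__main__":
    store = RemoteStore(make_linear_history(10_000))
    log(store, "c9999")
    print(store.round_trips)  # 10000: one round trip per commit
    # About 0.37: even "six nines" per fetch fails a million-hop query most of the time.
    print(p_query_completes(0.999999, 1_000_000))
```

The million-hop figure is the talk's round number; the point of the model is that both latency and failure probability grow with the length of the walk, whereas on a local disk each hop is just a file read.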
That redundancy is basically a metric shit ton. It's extremely expensive to have that amount of redundancy, to make sure that even if a single node of the network goes down, you still have enough objects left to perform a full query without having to fail it. Now, this is something that's hard, but it's not even close to impossible. We definitely could fix this, and in fact we've been trying to do that for a while. But we won't do that, not because we can't, but because it's not the way we work.

And as it turns out, three years have passed since I started working on libgit2, and it's gotten to the point where the library is mostly finished. We actually use it in production now: we use it on the back end for github.com, we use it in our native apps, and Microsoft uses it in Visual Studio. So it's a very solid library now. It works very well in production, and it's allowing us to do the things we've always wanted to do with git on the back end, without having to go cloud or NoSQL or web scale or whatever.

Our new infrastructure is called GitRPC, and it's boring. It's not exciting; it's the opposite of exciting. It's a simple RPC server written in Ruby, which runs on the file server, and it doesn't have Erlang anymore. It's just pure Ruby, with only C under it: we have Ruby bindings to libgit2. And the stack looks something like this right now, because we rolled out GitRPC in production. We've been running it for six months now, and right now it still runs next to Grit. So Grit still goes through Ernie and all that bullshit, and GitRPC is much more straightforward: it goes to the file server directly, performs the operation, brings the result back to the front ends, and there's none of that fancy Erlang web-scale whatever. It's just Ruby with C under it. And on the wire, it's not even BERT anymore; it's actually using MessagePack.
Well, actually, we're using BERT right now, but we're gonna switch to MessagePack. So the whole point is that as we roll out new features on github.com, we always implement them using GitRPC, and we keep the old ones running on Grit and slowly rewrite them. In fact, now there are very, very few features on github.com that still go through the old Grit pipeline, and as soon as we port everything over to GitRPC, we'll just take down the Grit pipeline. It's gonna look something like this: the GitRPC client, pure Ruby on the client, is routed through Chimney to the GitRPC server, which is simply Ruby again, with libgit2 underneath.

Now, this took about six months of implementing and three years to write a library, and it's not exciting. It's not web scale, it's not cool, but we're very excited about it ourselves, because it's a great showcase of the way we work. It took a lot of effort to make something that is faster, has fewer lines of code, has fewer languages, and uses fewer databases, and it's not revolutionary.
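The GitRPC shape described here is a thin client that serializes a call, a routing step, and a server on the file server that runs the operation next to the repository and serializes the result back. Here is a hedged Python sketch of just the request and dispatch part, assuming JSON as a stand-in for BERT or MessagePack (neither ships with Python's standard library) and inventing the operation names, the handler registry, and the repository contents; routing is elided, and this is not GitRPC's actual protocol.

```python
import json
from typing import Any, Callable

# Hypothetical "local repository" data on one file server: ref name -> commit ID.
LOCAL_REPOS: dict[str, dict[str, str]] = {
    "rails/rails": {"refs/heads/master": "c0ffee"},
}

# Server side: the operations this file server knows how to run locally.
HANDLERS: dict[str, Callable[..., Any]] = {}


def operation(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a server-side handler under an RPC operation name."""

    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        HANDLERS[name] = fn
        return fn

    return register


@operation("read_ref")
def read_ref(repo: str, ref: str) -> str:
    return LOCAL_REPOS[repo][ref]


@operation("list_refs")
def list_refs(repo: str) -> list[str]:
    return sorted(LOCAL_REPOS[repo])


def handle_request(payload: bytes) -> bytes:
    """Decode a request, run the handler next to the data, encode the reply."""
    request = json.loads(payload)
    try:
        result = HANDLERS[request["op"]](request["repo"], *request["args"])
        reply = {"ok": True, "result": result}
    except KeyError as missing:  # unknown operation, repository, or ref
        reply = {"ok": False, "error": f"not found: {missing}"}
    return json.dumps(reply).encode()


def call(op: str, repo: str, *args: Any) -> Any:
    """Client side: serialize, 'send', deserialize. Routing is elided here."""
    payload = json.dumps({"op": op, "repo": repo, "args": list(args)}).encode()
    reply = json.loads(handle_request(payload))  # a network hop in reality
    if not reply["ok"]:
        raise RuntimeError(reply["error"])
    return reply["result"]


if __name__ == "__main__":
    print(call("read_ref", "rails/rails", "refs/heads/master"))
    print(call("list_refs", "rails/rails"))
```

The design point the sketch tries to capture is that only small requests and results cross the wire; the graph walking from the previous sections happens on the machine that holds the repository.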
It's the opposite of revolutionary: evolutionary, and probably disappointing, but that's the way we wanted to do it. This is not related to git directly, but it's something I really like to talk about, because I think it's very important to the way GitHub works: it's not about being cool or exciting, but about making software that works. Over the years we've had a lot of problems. We've had to face very big scaling and engineering challenges, because nobody hosts more code than us, and we had to face problems that nobody had faced before. Most of them were related to git itself, and we managed to tackle them mostly by using tools that we know very well. And that's basically GitHub's secret: instead of using whatever is cool now, like Node.js or Go or whatever thing is in right now, using old-school tools that we know very well, that we know work reliably, that we really understand. That gives great results in the end. And especially, challenging ourselves all the time to build the simplest thing. Not because it's easy, because it took us three years to build a new git library, libgit2, but because it's worth it in the end, because it's fewer lines of code.
It's fewer languages, fewer databases, fewer interactions between pieces, and that's what we really care about. And especially, when it comes to GitHub, it's about innovating where it really matters: instead of trying to focus on building some kind of crazy, amazing, scalable git back end, just keep shipping features for github.com to make using git easier. So that's what really worked for GitHub, not only when it comes to building the back end, but for the whole product, the whole website, the whole company: building a revolutionary product instead of a revolutionary back end. It doesn't apply only to git; it applies to everything else we do.

Every time I give this talk, people end up pretty miserable, because it's like, "But my job is going to be super miserable if I have to write C all day. I need to write Scala or Clojure or Go to be happy." No, you don't. I mean, most of you are neckbeards; you're at a good conference; you're used to writing C. Some of you even write Perl, which is disgusting. But the point is that you don't need to use the shiniest new toy to have fun writing software. You can have fun with a lot of old-school languages that you know are going to work very well. People say it's a depressing job, that if you've got to write systems code, or Java, or old-school stuff that's not web scale or whatever, it's gonna be depressing. But I can tell you that I love doing this, and these last four years have been a lot of work, but they've also been a shit ton of fun. So yeah, just try that for a change: less web scale, and more software that works. Thank you.

Man, that went on for a while. Sorry that took so long. I wanted it to be 20 minutes; it's been 26. We still don't have many more speakers, so of course, if anybody has a question, go ahead.
I'd love to take any kind of question regarding git, the back end, the way GitHub works, or anything you want. I'm going to be around all fucking day, and I actually get paid to do this, so just come up to me and ask anything you want. I love talking about our back end, the library, and how you can help us with all the open-source work we do. So by all means, find me and ask me whatever you want. And now I think Scott wants to say something. Scott, do you want to say something?