The entire onion is everything, including distribution. You remove distribution, and you end up with revision control. Now forget about revisions, forget about the idea of versioning itself. Okay, forget about history: pretend that every project has only one commit. This makes things much easier. Once you remove that layer from the onion, what you are left with is, as Git calls itself, a stupid content tracker. You give it files and directories, and it stores the files and directories away.

Still too big to swallow. Forget about files and directories. Let's go right to the core of the onion. What you are left with, I would argue, is a persistent hash map. Weird, but allow me to show you how.

You give Git a piece of content, for example in this case the string "something", and it gives you back a hash, a 20-byte hash, okay? And this functionality, like so many functionalities in Git, is low-level, but you can actually access it from the command line. There is a command, which you can use if you do scripting for example, that is called git hash-object. Let me use it, actually. I have a command line here: git hash-object. Is this visible to everybody in the room now? Still small? Okay, okay. Now everybody? Okay.

Now, it's expecting a file, actually. I don't have a file, so I will say "read from standard input" instead, and I will pipe something into it with echo, okay?
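As a rough sketch of what happens behind git hash-object for a plain piece of content: Git hashes a small header followed by the bytes themselves. The function name `git_blob_hash` is made up for this illustration; only the header layout and the use of SHA-1 reflect Git itself.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that Git assigns to a blob holding `content`."""
    # Git hashes "blob <size>", a NUL byte, and then the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # echo appends a newline, so the piped content is "something\n".
    sha = git_blob_hash(b"something\n")
    print(sha)        # 40 hex digits, i.e. 20 bytes
    print(len(sha))
    # Same content, same hash, every time.
    print(sha == git_blob_hash(b"something\n"))
```

The hash depends only on the content, not on a file name or a timestamp, which is why piping the same string always gives the same result.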
Literally the string "something". For people who don't use the Unix-like stuff: I'm essentially streaming this string into this command, and what you are left with is this hash, DEBA and so on, which is the same that I wrote here, because it's the same content. Same content, same hash, every time. This is hugely important in Git. Git uses these SHA-1 hashes everywhere. All of your files get hashed like this. So call it SHA-1, or SHA for friends.

Okay, so some people who are new to Git say: but of course there are not as many SHAs as there are possible files. What if I have two files in my project that have the same SHA? That would be tragic. And yes, it would be tragic. That would totally break Git. It's probably not going to happen. Let's do a quick calculation, just because it's fun.

Let's look, for example, at this guy. He won the jackpot at the American lottery. Plenty of money. Your chances of winning the jackpot at the American lottery with a random combination of numbers are slim: one in 175 million. This number is hard to visualize, so let me visualize it for you. Let's say that I come up with 175 million fortune cookies, and I put a different number in each fortune cookie, including of course the jackpot, because it's all of them. Let's say that the fortune cookies are five centimeters long. So you start here in this room and you trace a line of fortune cookies towards Europe. I think it's in that direction. Okay. You end up with a very long line and, believe it or not, I googled for it, you end up very close to where I live. You end up around Venice.

So now say that you walk the entire line. Okay, it's a long walk, so at some point you are probably going to get hungry. So you decide to eat a fortune cookie. You are only allowed one fortune cookie in the entire trip. You open it and, hey, you won the jackpot! So these are your chances of winning the jackpot. So you think, "I will win again." This is the gambler's mindset, right?
"I'm on a lucky streak. Let me do it again." So you come back all the way, with brand new cookies, and once again, probably somewhere in Turkey, you get hungry, and you eat a cookie, and you win again. Twice in a row. So your friends now are calling you a lucky bastard and asking you to pay for rounds and stuff.

Okay, the chances of two random SHAs colliding are not the chances of winning the jackpot once, or twice in a row. They are 10,000 billion billion billion billion times smaller than the chances of winning the jackpot once. Just saying: it's probably not going to happen.

Okay, so this is important for what comes later. SHAs are not just unique in your project. They are unique in the universe. You can take every single software project on Earth and put them all into the same Git repo: you get a lot of performance issues, still no clashes.

So that's the hash map part. I said "persistent hash map". This one is not persistent yet. To make it persistent, you can use the -w switch in hash-object, and let me do that. -w. This is going to break. It's complaining that it doesn't have anywhere to put the object. It lacks its own place, which is the .git directory. You know that one, probably. Well, any Git project has a .git directory in the root. That's where the Git stuff goes: configuration and the object database. So let me create the directory. How do I do that? Yep, git init. That's what git init does: it creates the .git folder. The .git folder is here, okay.

Hmm, let me do it in another folder. Sorry: my project. Let me remove it from here, just in case, and then let me move to my project, just to avoid the other files, which would probably pollute the demo. Okay. Initialized, fine. Now let me open it, and if you look inside it, you will see that amongst other things it has an objects directory, and inside the objects directory there are a couple of folders, info and pack. Ignore these ones for now, okay? There is no object now.
Actually, I can ask Git how many objects it has in its database. I think it's this command. Yep, zero objects. You don't have to remember these commands, of course. The point is just understanding how it works, not remembering every step.

So now I can once again generate the hash and save it there. If you look again into the objects directory here, you will see a directory called de, which is, not coincidentally, the very beginning of this hash. It follows up with ba and so on and so on, and this is the name of the file in here. It generated a file by splitting the name. This is just a way not to put all files into one directory. But essentially you can say, approximately, that the file is named like the hash. And what's in that file? The content, the string "something". It's been zipped, it's been wrapped with a small header, but that's what it is.

There is another command that I will use, that is called git cat-file, and I can pass it this hash, or just the first few digits of the hash. If I run it with -t, it will tell me the type of this thing. It calls it a blob. Content, your files, are called blobs inside Git. And if I run it with -p, it's going to show the content. Okay.

So far so good. So it's a persistent hash map. This is the core.

Now let's work towards the next layer. This one is going to require a lot of goofing around with the command line and a lot of looking at hashes. So please don't even try to follow every single step. That's not the point. The point is the structure. Okay.

Let's say that I start adding files to my project. I have a shell script here that I prepared to add a few files, so that I don't have to do it by hand. If I look at the project now, there is a README file. I will zoom in a bit for people who are far away. Okay, and the README contains the string "something". And then there is an src directory, which contains two files. The first one contains the string "something else". The second one is once again "something".
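The storage scheme described earlier (a file named after the hash, split into a two-digit directory and the rest, holding the zipped content behind a small header) can be sketched in a few lines. This is an illustration under assumptions, not Git's own code: `write_object` and `read_object` are hypothetical helper names, and unlike git cat-file this sketch needs the full hash rather than just its first few digits.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path
from typing import Tuple


def write_object(objects_dir: Path, obj_type: str, content: bytes) -> str:
    """Store content as a loose object (roughly hash-object -w) and return its hash."""
    # The small header is "<type> <size>" followed by a NUL byte.
    data = f"{obj_type} {len(content)}".encode("ascii") + b"\0" + content
    sha = hashlib.sha1(data).hexdigest()
    # The first two hex digits name the directory, the remaining 38 the file.
    path = objects_dir / sha[:2] / sha[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(data))
    return sha


def read_object(objects_dir: Path, sha: str) -> Tuple[str, bytes]:
    """Return (type, content), roughly cat-file -t and cat-file -p in one call."""
    raw = zlib.decompress((objects_dir / sha[:2] / sha[2:]).read_bytes())
    header, _, content = raw.partition(b"\0")
    obj_type, _size = header.decode("ascii").split(" ")
    return obj_type, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        objects_dir = Path(tmp) / "objects"
        sha = write_object(objects_dir, "blob", b"something\n")
        print(sha[:2], sha[2:])               # directory name, file name
        print(read_object(objects_dir, sha))  # type and content
```

Writing the same content twice lands on the same path, so the store never holds two copies of identical content.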
Okay, and now I will quickly add these files. I will stage these files, in Git-speak: I will add them to the index, I will prepare them for commit. If I say git status now, you see that these files are ready to go. And now I can commit, and I will give this commit a message with the -m switch. It's going to be called "first". There it goes. And if you ask once again how many objects are in the database, this time around it's going to come up with five objects. And now the question is: why five?

Let's see. I will be using git cat-file again. If I say git log, it shows me the commit, and the commit has a hash. Now what happens if I say git cat-file -t? It's going to say it's a commit. Sure. What happens if I say git cat-file -p? What's in a commit? I'm talking about the implementation of a commit in Git. What do you expect to find in there? Hmm, okay: metadata. The commit message, the user, the date of the commit. They must be in there, okay. And if I ask for it (I'm sorry that it's on multiple lines, a bit hard to read, but this makes it bigger), you will see that exactly this text is the commit. I mean, literally: Git takes this text, hashes it, zips it, wraps it with a header, blah blah, and it becomes an object in the database. And most of this is obvious. Okay: this is me, that's the date, this is the commit message.

The first line is not as obvious. What's a tree? A tree, quickly, is the equivalent in Git of a directory. A blob you can see as a file; a tree is like a directory. In this case, it's the root of your project. Okay.

So now let me do this again with the tree. So git cat-file -p and the hash of the tree. This is a tree. Remember, this is the root of your project. Okay, what's in the root of your project? Another tree, the src directory, and the blob README. Okay, let me do this one last time with the README. Do you happen to remember this hash?
"something".

So let's look at this graphically, to make it easier to follow, hopefully. You have a commit. It's called "first". It references a tree. It's called E7 blah blah, and it's actually the root of your project. This tree is referencing two things: another tree, src, and the file README, which contains the string "something". src is referencing two files. You remember there were two files inside src, right? One of these is file one, which contains the string "something else". That's another object. Where does the last arrow go? The last file contains the string "something" again. So the last arrow can go straight there.

Look at what is happening. Git is reusing the objects. The name of the file is not in the blob. The name of the file is in the tree that contains the blob. So if you have two files that are identical, there will only be one object in Git. This helps make it efficient. We'll see how.

Well, a couple of observations first. There is more than this going on under the hood. I'm saying that Git is creating one object for every file. Actually, every now and then it will look into the files and say, "Hey, these files are almost identical. Let's put everything that is common into another section of a file." It will, as they say, pack your objects. That's what the pack directory under objects is for. But this is not important. This is really an implementation detail. You're probably never going to see that. This picture is more important, because it gives you a hint at how Git thinks.

Another thing that is interesting: look at this thing. You have blobs that have content. You have trees that contain more trees and the content, and the names of the things are in the trees. What do you call a thing like that? Huh? A file system. This is a damn file system. Which is totally unsurprising, because the author of this thing is a specialist in file systems. He's a kernel guy, right? So this is a good way to look at Git. It's not a versioning system yet. It's a file system. Okay.
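To see where the five objects come from, here is a minimal in-memory sketch of the graph described above: two blobs, two trees and one commit. Several details are assumptions made for the illustration: the file names file1.txt and file2.txt, the trailing newlines in the contents, and the author line are all invented, so the hashes will not match the ones on the screen in the talk. The object layouts (tree entries, commit text) follow Git's actual formats.

```python
import hashlib
from typing import Dict, List, Tuple


def obj_hash(obj_type: str, body: bytes) -> str:
    """Hash "<type> <size>", a NUL byte, and the body, as Git does for every object."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: List[Tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex hash) entries the way a tree object stores them."""
    def sort_key(entry: Tuple[str, str, str]) -> str:
        mode, name, _sha = entry
        # Git orders entries by name, comparing directories as if they ended in "/".
        return name + "/" if mode == "40000" else name

    # Each entry is "<mode> <name>", a NUL byte, then the 20 raw hash bytes.
    return b"".join(
        f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(sha)
        for mode, name, sha in sorted(entries, key=sort_key)
    )


def build_first_commit() -> Tuple[Dict[str, Tuple[str, bytes]], str]:
    """Build the objects of the "first" commit; return (object database, commit hash)."""
    objects: Dict[str, Tuple[str, bytes]] = {}  # the hash map: hash -> (type, body)

    def put(obj_type: str, body: bytes) -> str:
        sha = obj_hash(obj_type, body)
        objects[sha] = (obj_type, body)  # identical content overwrites itself
        return sha

    readme = put("blob", b"something\n")
    file1 = put("blob", b"something else\n")
    file2 = put("blob", b"something\n")  # same content as README, so same object
    src = put("tree", tree_body([("100644", "file1.txt", file1),
                                 ("100644", "file2.txt", file2)]))
    root = put("tree", tree_body([("100644", "README", readme),
                                  ("40000", "src", src)]))
    who = "Example Author <author@example.com> 0 +0000"  # hypothetical identity
    commit = put("commit", f"tree {root}\nauthor {who}\ncommitter {who}\n\nfirst\n".encode("utf-8"))
    return objects, commit


if __name__ == "__main__":
    objects, commit = build_first_commit()
    print(len(objects))
    print(sorted(obj_type for obj_type, _body in objects.values()))
```

Three files go in, but only two blobs come out, because the file names live in the trees and the two files containing "something" share one blob.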
It's not what you think about when you think of file systems. You're thinking about something that is more low-level, kernel-level. It's talking directly to mass storage, almost directly. But if you abstract it a bit (remember, abstraction is your friend), this is kind of like a file system. So it's not just a persistent hash map. It's a stupid content tracker. "File system", for friends.

Now, one more layer in the onion: versioning. This is going to be brutal, but it starts simple enough. Let's say that I edit the README file and I write in here "my git project". There. So now we have a new file to add, and we can commit, and we commit with the message "second". And if we say git log, we get a hash for this commit. And if we say git cat-file -p and the hash: what's going to be different in this commit compared to the previous commit? Conceptually different? In this case the commit is not the first commit anymore. It's not a root commit. Let's see. So you have something more: you have a parent, which is the first commit, of course.

Now, I could go on and use cat-file to painstakingly look at every single object in the database, but I will spare you the pain. I will just show you the result here. So: the second commit. It has a parent, and it's pointing to a tree. Is this going to be the same tree that we have here? No. If you are wondering why, you will see in a second. It's another tree. This tree is pointing to a tree and a blob. The blob is clearly different, okay, because it's new content: "my git project". That's why this tree has to be different: because Git is calculating the SHA of the tree, and the content of the tree is different because the blob is different. What about src? Is this the same src or another one? This one is the same one.

So look at the way that Git builds history into the mix. If you look at this entire thing... Let me move back and forth, because... I'm sorry, guys.
I love you too. Forget about that stuff. If you look at this entire thing from here, from the point of view of the second commit, you can't reach this thing unless you go to the parent and then go over. You don't see this stuff. What you see is the stuff that you can reach from here, which means this. And if you look at this, this is a snapshot of your file system in time. So Git is reusing whatever it can reuse, but what you end up with is: each commit is a whole snapshot of the whole project.

Okay, forget about trees and forget about blobs, mainly because you know it all now. You're experts when it comes to the Git object model: trees, blobs, commits. And then there is another kind of object that is called the annotated tag, which is very easy, by the way. You can look it up yourself. This is all there is to know about the Git object model. There is quite simply nothing else in there. You know it all.

Now let's move up a level and look at the commits instead. Focus on the commits. One thing that we don't know yet: you know that Git is famous for branching. We don't know what a branch is. What's a branch? Let me go back to the command line. Let me see what we have here. If I say git branch, it will list all the branches, and we only have one branch, because we are at the very beginning. It's called master. Okay. And if I create another branch, fixme... There, now we have two branches. But what are these, concretely?

Let's look inside the .git folder. This is instructive. So I think I have something here. There. Let's forget about the object database. There is a folder here that's called refs, for "references", and if you look inside it, you will see that it has a couple of subfolders. Forget about tags. There is one that is called heads, and if you look inside that one, you will see two files, fixme and master. What do you expect to find inside those two files, if I print them on the screen?
Hashes. So let me double-check, for the non-believers: cat .git/refs/heads/master. A hash. And, by the way, the hash of the latest commit. What if I ask for fixme instead? Same hash. So branches are just references. In general, references to a hash. Hmm.

Now, we are still missing something. When I say git branch, these two branches are not the same. One is all happy and green, and the other one is all white and sad. Because one is the current branch, right? How does Git know what the current branch is? Yeah. I wanted to get there in two questions, but you guys short-circuited me. There must be another reference, right? There must be a reference to a branch, which is a reference to a commit, and this reference is saying, "This is the current branch." What's the name of this reference? A file named HEAD, just by coincidence, is here, right in the root of the .git folder. In here you won't generally find a SHA, because this reference is referencing a branch, which is not an object in Git's database. It's a file in the refs folder. So Git is using this other syntax here to say, "This is the reference." But this is it, okay? From an implementation point of view, you just became the experts. This is all there is to know about branches.

The consequences, however, can take some time to be understood. Let's see. I'm going to change my style and not use the command line anymore, because this would force me to switch back and forth a lot, and it would get annoying for you. I will instead show you the commands on the screen, and I will show you what happens inside Git.

There. We said that we have a branch named master, which is a reference, and we have a reference to the reference, which is HEAD. And then we have another reference, once we say git branch fixes, that is called fixes, okay? So what happens now if I change some stuff from the command line?
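The chain of references described above (HEAD points at a branch, the branch points at a commit) can be modelled with a plain dictionary standing in for the files under .git. This is only a sketch: the hash below is a made-up placeholder, and `resolve` and `current_branch` are hypothetical helper names. The "ref: refs/heads/master" text is the syntax Git writes into HEAD when a branch is checked out.

```python
from typing import Dict, Optional

PLACEHOLDER_COMMIT = "1" * 40  # made-up hash standing in for the latest commit

# File name relative to .git -> file content.
refs: Dict[str, str] = {
    "HEAD": "ref: refs/heads/master",
    "refs/heads/master": PLACEHOLDER_COMMIT,
    "refs/heads/fixme": PLACEHOLDER_COMMIT,
}


def resolve(refs: Dict[str, str], name: str = "HEAD") -> str:
    """Follow references to references until a commit hash is reached."""
    value = refs[name]
    while value.startswith("ref: "):
        value = refs[value[len("ref: "):]]
    return value


def current_branch(refs: Dict[str, str]) -> Optional[str]:
    """Return the branch HEAD refers to, or None if HEAD holds a hash directly."""
    prefix = "ref: refs/heads/"
    head = refs["HEAD"]
    return head[len(prefix):] if head.startswith(prefix) else None


if __name__ == "__main__":
    print(current_branch(refs))  # the "happy and green" branch
    print(resolve(refs))         # the commit that branch points at
```

Creating a branch in this model is just adding one more entry that holds a hash, which is why branches are so cheap.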
For example, I edit a file and then I say git commit -a, which stands for "add every change to the files Git already tracks", so I can skip the git add part, with the message "third". What is happening here? Well, I'm creating a new commit. So the new commit will be, let's say, here. I'm bending it sideways because of what comes later. It's brand new, and it's a commit that has as a parent the previous current commit, the one pointed at by HEAD, which is pointing at master, which is pointing at it. What happens to HEAD? Actually, nothing happens to HEAD. Why should it change? It was pointing at master. It's the current branch; it's still the current branch. What is changing is master, which is now pointing here. HEAD is just following along for the ride.

So now what happens if I say git checkout fixes? What does git checkout do, concretely? That's all it does: it moves HEAD to the fixes branch, like this. And yes, thank you, smartass: almost as a side effect, when you change HEAD and you are pointing at a different thing, Git says: hey, the contents of this working directory don't match what I see here in the object database. You know what the object database is worth? Everything.
You know what your puny files are worth? Nothing. Let me delete everything that is in here and replace it. Well, maybe if I see stuff that hasn't been committed, I will warn you. But otherwise I don't care about your working directory. Git really considers that very transient. So it will just say: let me walk this commit, just like we did earlier on, and check all the trees and all the blobs, and turn them into folders and files, and dump everything into the working directory. Actually, good. You have a career in front of you, son.

Okay, now what happens if I change stuff now and I create another commit? Now you know what happens. First, the new commit. Let's put it here, because the parent of the new commit is always the current commit. And of course master is going to stay there. HEAD is going to stay there, but fixes is going to move.

What happens now if I say git merge? Think about what Git is trying to achieve. What we are trying to achieve is: I want to have a point in time, a commit, where everything that is available in master is also available there, and everything that is available in fixes is also available there. So if there are no conflicts, the easiest way to get to that point is to just create another commit which has two parents, like this. I won't bore you with the details, but you can probably draw a pretty picture yourself of how this commit is pointing to a brand new tree, the root, which is new, but this root is pointing to stuff that was already there. And if there are conflicts, it's pointing to stuff that is new, because it solves the conflicts. Is this clear? Probably no need for more details. What is important here is that fixes, the current branch, is going to move to the merge. Okay.

Now let's check out master again. What happens now if I say git merge fixes? Okay, again, what are we trying to achieve? We are trying to get a point in time where everything is available that is also available in both master and fixes. But we already have this point in time, right? It's there.
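The two merges just described can be traced with a toy model in which commits are labels and each label maps to its list of parents. Everything here is a simplification under assumptions: the labels ("fix", "merge") are invented, real commits are identified by hashes, and conflicts and file contents are not modelled at all. The point is only the decision between creating a commit with two parents and simply moving the branch, which Git calls a fast-forward.

```python
from typing import Dict, List, Tuple


def is_ancestor(parents: Dict[str, List[str]], ancestor: str, commit: str) -> bool:
    """True if `ancestor` is reachable from `commit` by following parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def merge(parents: Dict[str, List[str]], branches: Dict[str, str],
          current: str, other: str, merge_id: str) -> str:
    """Merge branch `other` into branch `current`; report what had to be done."""
    ours, theirs = branches[current], branches[other]
    if is_ancestor(parents, theirs, ours):
        return "nothing to do"         # we already contain everything in `other`
    if is_ancestor(parents, ours, theirs):
        branches[current] = theirs     # the needed commit exists: just move the branch
        return "fast-forward"
    parents[merge_id] = [ours, theirs]  # new commit with two parents
    branches[current] = merge_id
    return "merge commit"


def demo() -> Tuple[str, str, Dict[str, List[str]], Dict[str, str]]:
    parents = {"first": [], "second": ["first"], "third": ["second"], "fix": ["second"]}
    branches = {"master": "third", "fixes": "fix"}
    step1 = merge(parents, branches, "fixes", "master", "merge")   # on fixes
    step2 = merge(parents, branches, "master", "fixes", "unused")  # back on master
    return step1, step2, parents, branches


if __name__ == "__main__":
    print(demo())
```

In the second step no new commit is needed, so the label passed as `merge_id` is never used.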
It's the topmost commit. So we don't need to create new commits with parents or stuff like that. Git is smart enough to say, "Hey, I'm going to use what I have." So what it does is that it just takes master and it moves it to the top of the chain. This is called, by the way, a fast-forward. If you're just learning Git and banging your head against the wall trying to understand what a fast-forward is: nothing magic. This is it.

Pretty. Now let's look at another situation quickly, and then we are done with this versioning silliness. Okay. This is the other situation. Imagine we have this situation in our project. We have a first commit, and then we branched. Somebody branched fixes, probably, and this person committed "fix" and "more fix", while another person committed "second" and "third". Or maybe the same person, while switching branches. Okay, and we are left like this, and right now we're on fixes. What happens if I say git rebase master? What's a rebase?

This is what you want to do. What you would like to do is to take this connection, the connection between "fix" and "first", cut it off, break it, and move it to "third". Rebasing, right? I want to change the base of this chain. Okay. So you want to do this. That's what you want to do. But you can't. I mean, you can't go into the commit "fix", change the commit, and be done with it. Why? You get a different SHA. As soon as you change one single byte, you get a completely different SHA. That's why everything in Git is immutable. Everything in the object database is immutable.

So what can you do instead? If you can't do this and you want to get something like this, what you can do instead is to copy stuff over to new commits, which pretend that they are the old commits. So what you do is to take "fix" and create a new commit which is exactly like "fix". It applies the same changes, it has the same author, and blah blah blah, but it has a different parent. And it needs to be a different object, because it has a different SHA. Is this point clear?
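Why the copy has to be a different object can be checked directly: the parent's hash is part of the text that gets hashed, so changing the parent changes the SHA. The sketch below uses Git's commit text layout, but the author line is a hypothetical placeholder, and every commit points at the empty tree purely to keep the example short. A real rebase re-applies each commit's changes on the new base, so the trees usually differ as well; here they are held constant to isolate the effect of the parent.

```python
import hashlib
from typing import List, Optional, Tuple

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # hash of a tree with no entries
WHO = "Example Author <author@example.com> 0 +0000"       # hypothetical identity


def commit_hash(tree: str, parent: Optional[str], message: str) -> str:
    """Hash a commit object built from a tree, an optional parent and a message."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {WHO}", f"committer {WHO}", "", message, ""]
    body = "\n".join(lines).encode("utf-8")
    header = f"commit {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def replay(chain: List[Tuple[str, str]], new_base: str) -> List[str]:
    """Copy (tree, message) commits, oldest first, on top of new_base; return new hashes."""
    parent = new_base
    new_hashes = []
    for tree, message in chain:
        parent = commit_hash(tree, parent, message)
        new_hashes.append(parent)
    return new_hashes


if __name__ == "__main__":
    first = commit_hash(EMPTY_TREE, None, "first")
    second = commit_hash(EMPTY_TREE, first, "second")
    third = commit_hash(EMPTY_TREE, second, "third")
    fix = commit_hash(EMPTY_TREE, first, "fix")
    more_fix = commit_hash(EMPTY_TREE, fix, "more fix")

    copies = replay([(EMPTY_TREE, "fix"), (EMPTY_TREE, "more fix")], third)
    print(fix, "->", copies[0])
    print(more_fix, "->", copies[1])
```

Both copies keep their messages, which is how a history can end up with several commits that look identical in the log yet have different hashes.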
Because this point is the point that might drive you crazy the moment that you mix rebasing with distribution. That's when pain happens, if you don't understand this thing: that this is not the same commit. It's a copy. Do you ever find yourself with multiple commits with the same messages in your Git tree, and you were wondering, "Where do these come from?" That's where they come from. I'm not going into the mechanics by which this can happen, but unfortunately it does happen.

Also, you need to copy "more fix", and you need to move fixes to the top.

One last question about versioning, and then we're done with versioning. What happens to those commits, to the original ones? Like in object-oriented programming languages: if something is there but is unreachable, if there is no way to get there from a branch or a tag, eventually Git will say, "This stuff is old," and it will garbage-collect it. So they're going to die sooner or later. Maybe it will take a long time, but... What if you want to save them, for whatever reason? I don't have a slide for that, but if you want to make something reachable, so that it's never garbage-collected, just stick a branch on it. It will be there forever, because now it's always reachable.

And that, and a lot of special cases, is all there is to know about versioning in Git. And we are done with the next layer of the onion.

All we're left with now is distribution. If we had more time, I would go into the details of how distribution is implemented, but we don't have much time. So forgive me, I will skip the internals. I will just show you the basic idea, because now the basic idea comes on very smoothly.

Imagine that you have a local repo and a remote repo. I know that computers don't look like these anymore, but I'm an old man, okay? And imagine that you want to clone the remote onto the local. Those pretty colorful things are objects. I just colored them differently instead of putting different SHAs on them.
It's more readable for humans. You will notice that there is no magenta ball, because magenta is not a damn color.

What happens physically when you git clone? Yeah, if you want to approximate just a tiny little bit: you just copy over the .git folder. It's not quite that simple, but almost. For sure you copy over all the objects, and bang, you have your own repo with all the history. I didn't draw the graph, but of course these objects are all connected. Okay.

Then some time passes. Some time passes, and the two databases evolve. People are working on the two repos. So now in the local we have this pink ball and this last one, okay, that the remote doesn't have, and the remote has things that we don't have. But remember what we said in the beginning: these things are unique in the universe. They have a SHA. So how do I align with the remote? How do I get the stuff in the remote? What happens when I do a git fetch, in other words? Oh, essentially you ask the remote, "May you please send me all the stuff that you have and I don't?" All those objects. And bang. Perfect.

You usually don't do a git fetch. You do a git pull, which means git fetch and then merge, because things are just a little bit more complicated, in that there are local branches and remote branches. So after fetching the new stuff, you want to get it into your history, and that you do by merging, essentially. Local branches or remote branches, whatever: they are branches. Okay. There is a simple system to tell Git which branch tracks which branch, but essentially it's the same mechanism you've seen before.

What happens then when I do a git push? Same thing in the other direction. "Take this stuff. It's new. You don't have it." And that's distribution. Can it be so simple? Yes, it's really so simple. That's it. We have the onion now.

You might wonder, after rebuilding the entire onion: "So why are you telling me this stuff? Why am I supposed to care about this stuff?"
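Because hashes are unique everywhere, the basic idea of fetch and push reduces to "copy the objects the other side is missing". The sketch below models each repository as a dictionary from hash to content. It is a deliberate oversimplification: real Git works out what to send by comparing branches and walking the commit graph, and ships the objects in packs. The short keys and color names are made up, echoing the colored balls on the slide.

```python
from typing import Dict, List


def fetch(local: Dict[str, bytes], remote: Dict[str, bytes]) -> List[str]:
    """Copy into `local` every object that only `remote` has; return their hashes."""
    missing = sorted(sha for sha in remote if sha not in local)
    for sha in missing:
        local[sha] = remote[sha]
    return missing


def push(local: Dict[str, bytes], remote: Dict[str, bytes]) -> List[str]:
    """Same thing in the other direction."""
    return fetch(remote, local)


if __name__ == "__main__":
    # Made-up short keys instead of real 40-digit hashes.
    shared = {"a1": b"red", "b2": b"blue"}
    local = dict(shared, c3=b"pink")     # created locally after the clone
    remote = dict(shared, d4=b"green")   # created on the remote after the clone

    print(fetch(local, remote))  # objects received from the remote
    print(push(local, remote))   # objects sent to the remote
    print(sorted(local) == sorted(remote))
```

No object ever needs renaming or reconciling on arrival: the same hash means the same content on both sides.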
Usually in a presentation you tell the why beforehand, because I'm supposed to motivate you. But I thought, come on, we are all geeks. Git internals! You don't need any more motivation. And indeed, you've been following. But in the end I feel I need to motivate that, and the basic thing is that when you learn Git, like pretty much anything else, you can abstract it as three stages. Okay.

In the first stage you are inexperienced. Okay, this is me being inexperienced. So I'm starting to play with Git. "Hey, it's not as hard as I thought. Okay, I don't know rebase. That wasn't in Subversion. Okay, I will merge, I know what that is," and so on and so on. Okay, and I was feeling all happy.

Then you get to a point where you are experienced. But compared to other technologies, where experienced means that you get more and more command, in Git, experience looks more like this. Okay. "Oh, I broke something. What do I do now? Oh my god, something bad happened. People are probably going to blame me." Okay.

And only then, if you can hold on through this, you get to a stage of fluency, where you go like, "Okay, I'm not understanding, but I know how things fit together. I know how Git organizes trees and blobs and commits. More importantly, let me draw this. Okay, now I can make sense of this." And now I can actually work with Git fluently. Okay.

And to get from the second stage, "experienced but often in trouble", to the stage where, okay, I know what I'm doing, give me some time and I will sort it out, you need to know the internals. You need to know the basics. Like in Ruby metaprogramming, in Git what is important is the model. If you own the model, you can make sense of any special case. If you don't understand the model, then each special case is going to be its own particular tragedy, and your life is going to be miserable. I know, because mine was.

So this takes care of the very last layer. Now we're done. Okay. Thank you.