Hi, welcome everybody. This is the definitive deep dive into the .git folder. Here's the part where I tell you about all the slides, and I have exactly zero slides. So here is the one and only slide. This is Notepad. The definitive deep dive into the .git folder: all of this will be live, and that will be really great.

If you want to go grab the code that we're going to look at today, go to github.com/robrich/git-explorer. You can get to that from robrich.org. Click on Presentations here at the top, and here's the link to that GitHub repo of what we'll dig into today.

While you're here on robrich.org, you can click on About Me and get to this screen that talks about some of the things that I've done recently. I'm a Microsoft MVP. I'm a Friend of Redgate. AZGiveCamp is wonderful. AZGiveCamp brings volunteer developers together with charities who otherwise couldn't afford software services. We begin coding Friday after work. Sunday afternoon, we deliver that completed software to the charity. Sleep is optional, caffeine provided. If you're in Phoenix, come join us for the next AZGiveCamp. Or if you'd like a GiveCamp in your area, hit me up on Twitter or send me an email, and let's get philanthropy installed in your area too.

Some of the other things that I've done: SQL Source Control Basics, minus chapter 8. I worked on Gulp in version 2 and version 3 as a core contributor. And I do training with gitgrit.com. One of the things I'm particularly proud of is that I replied to a .NET Rocks podcast episode. They read my comments on the air and they sent me a mug. So that's my claim to fame.

So let's dig in. Here is a folder. That's not the folder. Here is a folder. And this folder is completely empty. That's the folder that we'll be working in today. So here's that folder, no content in it. Let's create a Git repository: git init. What this did is create this .git folder right here.
Now this .git folder I have visible because I've gone into View options and chosen to show hidden files and folders. If you check that box, then you're able to see it as well. Depending on your operating system, that box might be in a different spot, but there's that .git folder. And that is the Git repository. Today we're going to go through all of the files here in this .git folder. We're not going to go through all of the Git commands, but we will hit all of the files in the .git folder.

Let's create some Git history. echo file1 > file1.txt. git add file1. git commit -m file1. We've just created some Git history. git log --oneline --graph --decorate --all. There's our Git history. Let's do that again. echo file2 > file2.txt. git add file2. git commit -m file2. Okay, so now we've got some history. git log --oneline --graph --decorate --all. And we see that HEAD is pointing to master. Let's fix that real fast. git checkout -b main. git branch -d master. That looks better. Awesome.

So now let's go wander through this .git folder. I'm going to open it up in VS Code. Now the cool thing with VS Code is it'll actually hide the .git folder for you and not show it in this list of contents, so I'll open the .git folder specifically. Let's pop open the objects folder. Here in the objects folder, we see a bunch of folders. And those folders look kind of familiar: e1, 14f11. e1, 14f11. Now this is a bunch of garbage. This is a zlib-compressed, or deflate-encoded, file. So if we wanted to, we could do lots of things to try and uncompress this. It's the deflate algorithm. We can go to gunzip, or we can do some Python, or there's an OpenSSL command that'll get us there. In this case, I created this Node program that will just quickly unzip it. We're using the zlib library, and we will just inflate this file and then dump it straight out to the console. So let's do that.
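The same inflate step can be sketched in a few lines of Python. This is a minimal sketch, not the Node program used in the talk: the function name parse_loose_object is made up for illustration, and the sample bytes are built in memory as a stand-in for a file read from .git/objects. It assumes the loose-object layout of a header (type, space, size, NUL byte) followed by the body, all zlib-compressed.

```python
import zlib


def parse_loose_object(compressed: bytes):
    """Inflate a loose Git object and split its header from its body."""
    raw = zlib.decompress(compressed)
    # Header and body are separated by the first NUL byte.
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.split(b" ")
    return obj_type.decode(), int(size), body


# Stand-in for the bytes of a real file under .git/objects/xx/yyyy...
sample = zlib.compress(b"blob 6\x00file1\n")

obj_type, size, body = parse_loose_object(sample)
print(obj_type, size, body)
```

To try it on a real repository, read the object file in binary mode and pass its bytes to parse_loose_object instead of the in-memory sample.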
unzipper .git/objects/e1/... Okay, so that's the content there in this file. Now it's a commit node. It's 226 bytes. That's what all of this content is. Here's our commit message. Here's a tree that references another Git hash. That's kind of interesting. The parent node, the node right above us: there's that 867. Some author details, date-time for the commit and date-time for the push.

Let's look at this node. Now that 12: there's that 12, and there's that commit right there. Or rather, that tree node. So unzipper .git/objects... Nope, not that one. unzipper .git/objects/12/... And we can see this tree node. Now it's got some unusual characters. We can use a Git command for that: git cat-file -t. And we can go after that Git hash. We only need something that's unique, so those first few characters. So it is a tree node. That's cool. git cat-file -p. Here's the contents of that file. So it's listing out those two blob nodes.

Ooh, a blob node. We've looked at commit nodes. We've looked at tree nodes. Blob nodes: let's go look at this blob node. git cat-file -t. Obviously it's a blob. git cat-file -p. And here's the contents. That is 9 bytes. So 7 bytes, the carriage return, and a null at the end: 9 bytes. We can see it with unzipper .git/objects/94/... Here it is. It's 9 bytes in that file.

So we do have a bunch more folders here in this objects folder than we just looked through. Here's one commit. Well, two commits. Two tree nodes. Two blobs. And therefore we have six files in our repository right now. Wouldn't it be cool if we could browse through all of them together? So I built this open source project at github.com/robrich/git-explorer that allows us to do just that. Here in git-explorer, we give it the path to the config file, or .env file. We specify the path to our Git repo, and so I specified it here in this .git folder. And now if I pull up localhost:3000, we'll see this git-explorer. So here's all of our nodes.
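The "unusual characters" in the inflated tree node are raw 20-byte hashes, which is why git cat-file -p is the friendlier way to read it. As a rough sketch of what that command decodes, here is a Python parser for a tree body. The function name parse_tree and the fake object IDs are assumptions for illustration only; it assumes each entry is laid out as mode, space, name, NUL byte, then the 20-byte binary SHA-1.

```python
def parse_tree(body: bytes):
    """Decode the entries of an inflated tree object (header already removed)."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)      # end of the mode field
        nul = body.index(b"\x00", space)   # end of the file name
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        object_id = body[nul + 1:nul + 21].hex()  # 20 raw bytes shown as 40 hex chars
        entries.append((mode, name, object_id))
        pos = nul + 21
    return entries


# Hypothetical IDs, not hashes from the talk's repository.
fake_id = bytes(range(20))
other_id = fake_id[::-1]
tree_body = (
    b"100644 file1.txt\x00" + fake_id      # a regular file, pointing at a blob
    + b"40000 folder\x00" + other_id       # a subfolder, pointing at another tree
)

entries = parse_tree(tree_body)
for mode, name, object_id in entries:
    print(mode, object_id, name)
```

Mode 100644 marks a regular file (a blob) and 40000 marks a subfolder (another tree), which is how one tree node can point at both blobs and nested trees.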
86, e1, 12, 94. 94 was interesting: here's that blob. 26. Well, it might be nice to, I don't know, let's make them alphabetical and show the tag. Now we can kind of see that. Okay, so here's e1. That's our main HEAD. And so we can look at that. Yep, that's a commit node. And here's the parent node, 867. So if we go look at this one, that's going to be a commit node too. It references this tree, b97. There's that tree. Well, let's show it in a parent-child relationship and show that type. There's commit nodes. There's tree nodes. There's blob nodes. We'll draw some lines between them. And this is quite different from the Git history that we were looking at before. Typically with the Git history, we only see these two commit nodes. Now we can look at all of the other nodes within it too. Isn't it interesting that this tree node references this file and file 2? That's kind of cool.

So now let's do some interesting things. echo file3 > file3.txt. git add file3. Now I've added it, but I've not committed it. So here in our .git objects folder, we see we now have seven nodes, seven DAG nodes. That's interesting. Well, what is that seventh DAG node? Let's show the type. We'll put it in parent-child relationship. We'll show the lines and the tag. Here it is. So we already have a blob node in our repository even though we haven't committed it. We've only staged it. That's cool. So now if we were to say echo file4 > file4.txt, git add file4, git status, we see that there are two files staged. We have two blobs in our repository, but no tree nodes pointing to them, no commit nodes. If we were to do a git gc right now, those things would be gone.

Well, let's commit those. git commit -m "file 3 and 4". Now we have all of that content in place. Let's show the type, parent-child, show the lines, show the tag. Here's that new commit. That goes to this folder. And here's those two new blobs. Well, what if I create a new folder in here?
Let's create a new folder. New folder. We'll call it folder. And now let's echo some content into it: echo file1 > folder/file1.txt. git add ., git status. Okay, so we have one file staged. git commit -m folder/file1. Let's take a look at what that looks like. Show the type, parent-child, show the lines, show the tags. Here's that new commit. And it has a tree node that references this tree node because file 1 is in it. In addition to that, it references file 1 directly because file 1 is in that same folder. So here it's referencing the original folder from this tree node from the original commit, and it's referencing the original file. It doesn't need to duplicate the file in the repository. It's already there. It's exactly the same. That's pretty cool.

Let's amend a commit. So let's pop open, I don't know, file 4. And let's change file 4. git add ., git status. So we only got file 4. git commit --amend -m "folder one and file 4". So now we've amended this commit. If we say git log --oneline --graph --decorate --all... oops, graph. Ooh, typing is hard. git log --oneline --graph --decorate --all. There we go. Now we only see these four commits because we amended this one. But if we refresh our repository (show type, parent-child, show the lines, show the tags), we still have this fifth commit. We just have no branches pointing at it. So as soon as we do a garbage collect inside Git, this commit will go away. But if I accidentally committed my credentials here in this blob, it's still in my repository. If I push this to a remote, it would still be in the remote in spite of the fact that I came back and changed it later to point at a different file 4. Here's that different file 4. So as soon as you commit the content into the repository, consider that secret lost.

Okay, we did a whole lot in the objects folder, and we saw how we had blobs and tree nodes and commits.
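The reason the second file1.txt didn't need a new blob is that an object's name is a hash of its content. A minimal sketch in Python, assuming the usual SHA-1 object format (the header "type size" plus a NUL byte, followed by the body); the helper name git_object_id is made up for illustration:

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Compute the SHA-1 object ID Git would assign to this content."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# The same bytes always hash to the same ID, whatever path they live at,
# so file1.txt and folder/file1.txt can share one blob.
top_level_id = git_object_id("blob", b"file1\n")
nested_id = git_object_id("blob", b"file1\n")
same_blob = top_level_id == nested_id

# The empty blob has a well-known ID that is easy to check against
# `git hash-object` on an empty file.
empty_blob_id = git_object_id("blob", b"")
print(same_blob, empty_blob_id)
```

The path is not part of the hashed bytes; it lives in the tree node, which is why moving or copying a file changes trees but not blobs.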
Let's look in the refs folder. In heads we have this thing called main that kind of matches our main branch: 071a. We've seen that commit before; there's 071a. There's also a thing out here called HEAD, and HEAD right now points at refs/heads/main. That's where our checkout directory is. Arguably I wish HEAD was inside the refs folder, because it is a ref, but it's not.

So now if we do something like this, git checkout of this one, we get into this really interesting thing: this detached HEAD state. So if I were to take my head off, we're in a detached head state. I feel like I'm in some zombie movie or something. What detached HEAD state means is that if we pop open this HEAD file again, it's pointing at a commit; it's not pointing at a branch. So if we were to do something like this: echo file6 > file6.txt, git add file6, git commit -m file6, git log (let me go grab that, there it is). Now we have HEAD pointing at this thing, and if we were to check out, I don't know, git checkout main, now it's gone. Well, is it gone? Let's go pull up the viewer: show type, parent-child, show the lines, show the tags. Well, it's still there. We just don't have anything pointing at it yet. That's what it means by detached HEAD.

So git checkout... where was it? git checkout 75... git checkout that. git branch new-feature. git log --oneline --graph --decorate --all. Well, kind of. If we pop open HEAD, we see it's still pointing at this other commit. So we've created the branch; now we need to check out the branch. git checkout new-feature. git log, and we see that now HEAD is pointing at this new-feature. We are no longer in a detached HEAD state. That was cool.

git tag 1.0. git log. We can see now we have a tag at this spot. Popping open the refs folder, we can see that we have a tags folder, and here's that 1.0. Now does 1.0 point at master or the new branch? No, it points directly at that commit. The presumption is that tags won't move, so they point directly at commits.
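Following HEAD by hand, as done above in the editor, can be sketched in Python. This is an illustrative sketch under a few assumptions: the function name resolve_head and the commit ID are hypothetical, the demo builds a throwaway folder rather than touching a real repository, and it only looks at loose ref files (a real repository may also keep refs in a packed-refs file, which this ignores).

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path):
    """Return (ref_name_or_None, commit_id) by reading HEAD the way we did by hand."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref_name = head[len("ref: "):]
        ref_file = git_dir / ref_name
        commit_id = ref_file.read_text().strip() if ref_file.exists() else None
        return ref_name, commit_id
    # No "ref: " prefix: HEAD holds a commit hash directly (detached HEAD).
    return None, head


fake_commit = "a" * 40  # hypothetical commit ID, for illustration only

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "main").write_text(fake_commit + "\n")

    # HEAD pointing at a branch.
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    attached = resolve_head(git_dir)

    # HEAD pointing straight at a commit: the detached HEAD state.
    (git_dir / "HEAD").write_text(fake_commit + "\n")
    detached = resolve_head(git_dir)

print(attached)
print(detached)
```

A tag under refs/tags resolves the same way as a branch file in this sketch: it is another small text file holding a hash.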
Is HEAD pointing at the tag? Nope, HEAD is still pointing at my branch, and my branch is still pointing at that commit. The tag is another pointer to that commit. That's perfect.

So let's talk about another server. Let's build up a server here on this machine. We're going to create a new folder; we'll call this server. Now a Git repository is both a client and a server if we want it to be. So here in this server: git init --bare. What bare says is that we don't have a .git folder; we only have the contents that would be in that .git folder. That's pretty cool. So now back in our regular repository: git remote add origin. Here's where we would say https:// some server, but instead of that I'm going to say ../server. My server is a folder on my file system. git log --oneline --graph --decorate --all, and we don't have any content here yet pointing at our remote. git push origin main. Okay, we've just pushed the main branch off to the server. git push origin new-feature. We've pushed that branch off. Now we have origin/new-feature and origin/main.

Popping open our refs folder, we now have a new folder called remotes. Inside remotes we have an origin folder, and inside origin we have main and new-feature. So here's the Git hash that we believe the server is on, and so this tracking branch allows us to keep track of what we've sent to the server recently. Now if we were to create more content: echo file8 > file8.txt, git add file8, git commit -m file8. Now we can see that the spot where we believe the server to be is different from the spot where our branch is. Here's the one in refs/heads, and here's the one in remotes. Now we can do something like git merge main. Looking at that history, we now see that new-feature has the content from the regular stuff and the content from main. And of course now I'm going to do something silly like git checkout main, and we look through our history. git checkout main. Oh yeah, then, so I finished my
feature. git checkout main. git branch -D (capital D because, you know, I'm feeling brave) new-feature. git log. Did I just lose all my work?

So we looked at refs; let's look at logs. Here in logs we have refs/heads/main. Here's the history of where the main branch has been. It started out at nothing and then it moved to this commit. It started from that commit and it moved to this commit, and so we can kind of walk through the history. So our main is now right here. Here's HEAD's history. We see that at the end HEAD was right here; that's when we checked out main. Here's the commit right before that, so that's probably where we want to go. There's another way that we can browse this log: git reflog, and that will show the history of where we went. So we want to go back to here. git checkout there. git checkout -b new-feature, and go back to our log. We're back. The reflog can be really, really helpful in getting back to where we were by understanding how we navigated through all the things. So that's logs.

Hooks: really interesting. Here in hooks we see we've got a shell script, we've got some more shell scripts, and this one is a Perl script. Now all of these end in .sample. So I'm going to go grab some content out of... that's not it. I'm going to go grab some hooks, and these just have the .sample removed from them. Each one just echoes the command-line parameters passed into the script, together with an arrow that says here's the spot that we are on. Now we've got an applypatch, a commit, a post-update, a pre-commit. git log. Now let's add some new content: echo file7 > file7.txt, git add file7, git commit -m file7, git log. In that process we saw some interesting things. As we added the thing, we didn't get any pre-commit hooks, but here as we committed it, we got a prepare-commit-msg hook and we got a commit-msg hook. As we push, git push origin new-feature, we're also going to get a pre-push hook. Now here's where we could
do really interesting things. We could, I don't know, lint the code, run unit tests, validate that our commit messages are in the correct format, maybe mutate the messages to match our expected format. There's lots of automation that we can do here in these commit hooks, so let your imagination go and do some really interesting things.

But now I want to commit these. I've created some really interesting Git hooks. git add .git/hooks/pre-applypatch. git status. I can't actually add these to my repository, and I really want to be able to share these so that others can work with them as well. That's where there are packages like this git-hooks, if you're in the npm world. There are other packages in other ecosystems. As I add this package to my system, it will actually create scripts that are in my repository so that I can then shim them from the hooks folder into files that I can version. As you're looking for hooks, grab a package like this so that you can version your hooks and everyone can use them. Sometimes we need to do some initialization to be able to create those symlinks or those wrappers, and that's what git-hooks does really well. As you install git-hooks, it will actually create those symlinks for you.

So we looked at hooks, we looked at logs, objects, refs. Info: this is interesting. This looks a lot like an exclude file. Typically we'll create a .gitignore file to exclude things, and we definitely could do that here instead of there. If you wanted to have one that was specific to your local machine, that wasn't shared with the rest of your team, you could do it here. I would recommend just using a regular .gitignore file, though. But that exclude is interesting.

While we're here looking at configuration, though, let's look at the config file. So here's a config file, and this specifies the configuration details. In this case ignorecase is true, bare is false. We talked about bare on the server; that's where it doesn't have the .git folder. Now this is the
override specific to the files that I have in my folder. So I'm going to go look at the .gitconfig file in my user profile, and here I have the name and email that I set when I first created my repository. There's my GitHub account. I have my merge and diff tools set, some aliases, and so all of these will take effect. But if I wanted a local override, I could. For example, I set up my machine for hobby projects, but this is a work project, so I want this to be at work.co and my GitHub user account is acmeinc. Now all of the commits that I do will not be specific to my personal account; they'll be specific to my work account. These configs override the config in my user home directory.

Here's the description file, and there was a question about whether I'm going to talk about instaweb. This is the description for instaweb. We don't usually use instaweb anymore, we use GitHub, but if you use instaweb, here's the description that will show up there.

We talked about HEAD. This is interesting: an index file. Let's open it anyway. It's a binary file that looks kind of ugly. This index keeps track of all of the files in the repository and all of the files in the working directory. So let's go parse this file by hand and do all of the mayhem. This repository, gin, is a really cool Python program that is able to parse that index file, but I can also use git ls-files --stage, and that actually will list out all of the files in that index. So what we see here in this index is that it has the blob hash, and we can verify that: git cat-file -t, the type for that thing; git cat-file -p, the contents of it, and there's that content. It's listing out all of the blobs in our repository so that it can very quickly do diffs and merges and status. All of the things that it needs to do really fast it can do straight away from this index file. This zero is kind of interesting. Zero means that there is no merge conflict. As we get into merging, it'll use one, two, and
three to reference mine and yours and our base, but because we're not in a merge scenario, all of ours are zero right now. Then it'll use stage two and stage three for the other differences. And yeah, this index is kind of nuts.

Some other stuff in here: there's some other logs; we talked about the reflogs. Here's COMMIT_EDITMSG. Now this is the last commit message that I did, and it'll use this file to pass it back and forth to my pre-commit hook and the other hooks. ORIG_HEAD: where HEAD was right before it moved to the current HEAD, so that it can pass it between those scripts.

So we looked at all of the files here. ORIG_HEAD is a cache file. COMMIT_EDITMSG is a cache file. description is that message for instaweb. HEAD is a ref. index is so that it can do really fast things in our working directory. The config, which overrides the thing in our user home directory. refs, which keep track of branches and tags. objects, which store commits, trees, or blobs depending on the type of content; as we stage things we'll get blobs, and as we commit things we'll get the trees and the commits. Our logs, which allow us to do reflogs to be able to recover things. info, which has a really poor way of doing Git exclude files. And the hooks, which allow us to do really interesting automation. All of these pieces under the hood make up this .git folder database.

What's really cool, as we dig into this .git folder, is that we can see the inner workings of how Git works. Sometimes we like to fight Git, or just throw up our hands and go, "I don't know what it's doing, but I have the commands on Post-it notes stuck to the side of my keyboard." I hope this tour through the .git folder has given you a little bit more understanding of some of the decisions that have been made as we do this. This Git Ready blog post is actually really cool because it describes each of those things. Here's each of the files, and here's all the things that you can do with them, all the purposes for
each of the files.

I'm Rob Richardson. This was a lot of fun digging through the .git folder. If you have questions that you find tomorrow, hit me up on Twitter at @rob_rich or send me an email by clicking on Contact Me from robrich.org. With that, what are your questions?

"Can I send the repo link here?" Yes, let me grab this repo link and paste it here in the answer. Can I paste it here in the answer? There's the answer. Awesome. We talked about instaweb; that was really cool.

"Is the only data about a merge in the index?" That's a good question. Some of the details about the merge end up in the index. Some of the details come in the form of the commits that we're trying to stitch together. Generally when we get into a merge scenario, we have two different commits going in different places. We saw that we were able to visualize that: we began this process and we went here. So we'll have different commit nodes in our objects directory. We'll also have different tree nodes and file nodes referencing the merge. But ultimately, as we're in the middle of the merge, other than temp files, all of the details are there in the index. As we get into a merge we may end up with .orig files, and we saw that when we were looking at the config. Not this config: open recent, .gitconfig. I've rigged up Beyond Compare to look at my files, so it'll pull in the local file, the remote file, the base file, and the merge file. It'll pull in those four file names, and you'll end up with those temp files in interesting places. So yeah, technically those files exist as you're merging a specific file. That was a great question, thanks.

"What is the purpose of the commit versus tree files? I didn't quite understand why they were unique." Good question; let's dig into that. So here's the commit node, and the commit node's purpose is to give me the name, the date, the parent node. It's the one that we look at more visibly. That's the particulars that we're going to attach to as we do most of the things. Now the commit node
references a tree node. The tree node starts to reference file system details: here's all of the files associated with this commit. Now the commit node itself didn't say anything about that. It didn't say which files were in it; at most it had the commit message. But here inside the tree node, that's when we start to enumerate the changes associated with this commit. Here are the files that we have. Now what's cool is it's referencing all of the files, whether they changed or not. That's the tree node. Now we notice in this case the tree node references another tree node. So if you have lots of files and folders, you'll have lots of nested tree nodes. So this tree node goes to b9. Here's b9 right here, and that tree node references a particular blob. So tree nodes reference other tree nodes and also reference blob nodes. The blob nodes are where the actual data is stored. As we change our files, we'll get the new blob data in our repository. That blob is what actually stores the data, and because it's storing the data, if two different paths reference the same file, we don't need another copy of the blob making our repository bigger. Instead we just reference the same blob. That's why we have the separation between blobs and trees: so that we can have files that may move, and at that point I'm changing the tree node but I'm not changing the big blob node in our repository. That was a good question.

"What are some other popular tools similar to git-explorer?" In this case I built this because I couldn't find one. Most Git viewers won't look at all of the different kinds of nodes; they'll just cruise through commits. And if all you really want to know is what are the commits, what can I check out, then a tool that just does commits could be perfect. I really wanted to look into all of the things. I wanted to know which refs pointed at each commit. I wanted to know the trees and how they pointed to each blob. And so I actually wrote git-explorer. I'd love to have some help on
this. I've started a live stream where I'm refactoring this, because somebody wrote some really awful code in here. But if you'd love to help with git-explorer, I would love your help with that.

"A little off topic: do you know why Git LFS always falls back to HTTP/HTTPS even if the remote is SSH?" That is a good question. Actually no, I don't, sorry. Git LFS falling back to HTTP if the remote is SSH: that is interesting. My guess is that Git LFS is a different URL than the remote. Like maybe what you've committed is the path to that file in the file share rather than the actual content that Git LFS is about.

"What was that blog URL again?" Good question. Go to robrich.org and click on Presentations. Here's that URL right there: robrich.org. Let me see if I can paste that in this answer. Yes. Ooh, I wonder what that private button does.

"How are rebases reflected in this?" Good question. So what happens? git log --oneline --graph --decorate --all. So let's rebase these two commits over the top of main. First let's check this out and make it a new branch: git checkout this, git checkout -b pre-rebase. git log --oneline --graph. Okay, so now we have this content. Let's ignore these for now. git rebase main. So we hit some hooks: pre-rebase hook, pre-commit hook, prepare-commit-msg, and each of those were passed this COMMIT_EDITMSG file in case they wanted to mutate that message. Ultimately we finished. git log --oneline --graph --decorate --all, and we now see that this commit got transplanted over the top of that. So how does this work inside that repository? Show type, parent-child, show the lines, show the tags. Here's those two commits. The commits are different, but this commit references this tree, which is also referenced by this commit. The tree didn't change, so we didn't need another tree node in our Git objects. Similar thing: here's this commit, and it references this tree node. Well, it looks like this tree node did
change a little bit. So how is the rebase reflected in our repository? We'll move our heads depending on how we rebase, and we'll rewrite our commits, but depending on what we change underneath, we probably won't change our trees or our blobs in our Git repository to pull off that rebase. Ultimately, depending on the nature of what you change during your rebase, like if you squash or if you edit a commit or do things like that, you may end up affecting other nodes in that process as well. That was a great question.

"You alluded to a blob being permanent. Are they not gc'd as well if they're only referenced by a commit that isn't on any branches?" Good question. So in this case we have this commit right here that is referenced by nothing. This happens in the normal course of the Git history. Periodically, if nodes are older than two weeks or more than a certain number of things change, internally it'll call git gc. git gc will go find all the things and do a garbage collection. In the process of doing that garbage collection, now if we go look through these objects, hey, wait a minute, did we just delete everything? We didn't. They're baked into these pack files. Now one is a pack file .idx. It's the same format as this index file right here, but it also references many commits. And here's that pack file; that's all of those commits pushed into one. Now that makes it smaller, it's only 3k instead of, you know, many k, but we still have all of the commits in our history, do we not?
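A pack file starts with a small fixed header, which makes a quick sanity check possible without unpacking anything. A minimal Python sketch, assuming the documented layout of a 4-byte "PACK" signature followed by a 4-byte version and a 4-byte object count, both big-endian; the function name parse_pack_header and the sample values are made up for illustration:

```python
import struct


def parse_pack_header(data: bytes):
    """Read the 12-byte header at the start of a .pack file."""
    signature, version, object_count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a Git pack file")
    return version, object_count


# Stand-in for the first 12 bytes of a real file under .git/objects/pack/.
sample_header = struct.pack(">4sII", b"PACK", 2, 15)

version, object_count = parse_pack_header(sample_header)
print(version, object_count)
```

Reading the first 12 bytes of a real .pack file in binary mode and passing them in should report how many objects git gc folded into that one file.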
Yeah, we do. Show type, parent-child, lines, tags. Except for that one that was dangly. Oh, here's that one that was dangly. It didn't get cleaned up yet. In time it probably would get cleaned up, and I'm actually surprised it didn't. Maybe it noticed that it hadn't been 2 weeks yet. So are you going to wait 2 weeks to see if your secrets got exposed? They do get garbage collected eventually, so you're right, thanks for calling me on that. But as soon as you leak your secrets into a Git repository, you should probably consider them exposed. In spite of the fact that you might be able to GC and get rid of that commit node, it's generally not worth the risk. It's easier just to roll your secrets. Thanks for that question.

"Hi, what is happening during a shallow clone using the --depth command?" That is a good question. So what happens when I do a git clone at a certain depth? It's only going to go back so many commits. So let's say I grabbed this one and I said get me a depth of 3: one to 3. It just won't grab all of these other nodes, and in fact it will rewrite this last one. Right now it says the parent is this commit. As I'm doing a shallow clone, to make the Git repository work well, it'll actually say that there is no parent. If we were to go (oh, I can't scroll), if we were to go all the way down to the bottom, we would see that that first node doesn't have a parent node. So when we shallow clone, that's exactly what it does. It just rewrites the last commit to say that it has no parent. Everything else is the same. Great question.

"How will emerging weaknesses of SHA-1 affect Git?" Hmm, yeah, that's interesting. These are SHA-1s. Now they're not used for cryptography, they're just used for validation, but it's possible as SHA-1 starts to deteriorate further that we may choose to move away from SHA-1 and hash these differently. Now that's a decently traumatic move for Git, because all of the Git repositories everywhere use SHA-1 right now. So if I git clone the
repository and I'm trying to look through history but I can't use SHA-1 anymore, there's going to be a process of probably upgrading each repository, or a time when repositories can use either SHA-1 or a newer SHA mechanism to be able to do that. It'll be interesting to see what they choose as they flip from SHA-1 to something else. In the shortest term, the reason that we haven't moved on from SHA-1 yet is very specifically that these hashes are not used for cryptography. They're not used for security. They're only used for hashing. Is that the right choice? I think we can make good arguments on both sides of that. That was a great question.

"So how will this be shown into viscosi?" I'm not familiar with viscosi. I'd love to learn more. Send me a tweet or send me an email and let's dig into that one some more.

"git gc --aggressive." Ooh, good call, let me do that. git gc --aggressive. Spelling is hard. Okay, now will that commit be gone? Show type, parent-child, show the lines, show the tag. Hmm, I fear that I have something pointing at this that makes it not want to garbage collect yet.

I think that might be all the questions. This was... oh: "Can we view the kernel in git-explorer? True stress test, will be a crazy graph." Yeah, how would we show git-explorer in git-explorer? That would be really cool. Here is git-explorer, the source for it. So let's grab this folder, and in the .env file we'll pop that open and point it there. I do need to escape it because Windows has the slashes going the wrong way. Let's stop this repository, start it back up. Yeah, this will be fun. There we go. Show the type, parent-child, lines, tags. What this highlights is that I need a way to scroll. But yeah, here's the caches. Somebody needs to fix something. I think that somebody who wrote something needs to fix something. Here's some tags, or some tree nodes. That was really cool. I'm glad we got to explore git-explorer with git-explorer.

"Isn't the SHA code of a commit based also on the parent, so overriding the parent in a shallow clone would
have to rewrite all following commits?" I think you're right. I want to explore now doing a shallow clone and diffing it with a non-shallow clone to see what's actually different. I think I might have been mistaken.

"How do you see initiatives from Microsoft to modify Git to host its huge code base, as compared to repo? Political issues aside." Microsoft bought GitHub, not Git, and I'm not sure that they know that. And I think I will leave it there.

That was fun. We've definitely gotten over time, but this was a lot of fun getting to show you the definitive deep dive into Git. Find me on Twitter for those questions that we haven't asked here, and I'll see you on Twitter. Thanks for joining us.