Okay. Hello everyone. So today I'll be talking about source code management. Do you guys know what I'm going to talk about? Okay. So I'll start with a little introduction about myself. I've been working at Graph for about two years now, with both the backend systems and the infrastructure teams. And today I'm going to talk about version control management. Don't get confused: it's the same thing as source code management. Basically, version control management means taking care of the various versions of a particular file. How does that relate to computer science? We have source code for most of our projects, right? And we have various versions that develop over time. So how do you manage these versions? And why did VCM, or version control management, start? Firstly, if you are developing code on your computer, you never know when your computer might go bust. Where would you find your code then? So the first and most important requirement was keeping the code safe. If you want to collaborate with someone else, VCM helps you do that. Two people working individually on the same piece of code in different places, even in different countries, can use it to collaborate. Then it says don't forget your roots, which basically means that if you've grown from a very small project to a very large one, don't forget where you came from: have the versions stored. You may find a problem with your latest code, and you can always go back to where you started, right? Okay, so any version control system automatically tracks your changes. So yes, I'm basically calling a VCM a stalker. And suppose you have a very large piece of code. Let's take the example of Graph. You want to develop three or four features at the same time, so you'll need multiple branches to work on each of these features. So these were the five major reasons for wanting a system like VCM.
But there are many more benefits that VCM offers, so let's go through those. Firstly, you're able to revert to a previous, working state. Maybe you made a change and you realize it doesn't work. You can revert individual files, or you can revert your entire project to a previous state. You can use VCM to compare changes: you can see what was modified and what the last modifications were. In case there's a bug and you're not able to understand its root cause, you can actually see where those changes were introduced. And finally, if you screw things up or lose your computer, you can easily recover. Okay, now let's talk about the various kinds of VCM that are out there. We can broadly categorize them into centralized and distributed. What do we mean by centralized version control management? In a centralized system, the main copy of the source code is kept on a central server. Whereas in a distributed one, each and every developer working on that source code actually has a full copy of it. To give some examples, tools like Perforce or Subversion are of the centralized type, and Git and Mercurial are examples of the distributed type. One of the biggest problems with a centralized VCM is that it's a single point of failure. If you're hosting your server on, let's say, an AWS EC2 instance, and that instance goes down and there were no backups, you basically lose all your data. Whereas with a distributed VCM, each and every person working on that source code has the complete history of that code. So that's one of the biggest reasons why you might prefer a distributed version control system. But let's not forget that many big companies out there are using a centralized system. So even though we talk about all these benefits of a distributed version control system, why are big companies using a centralized one?
The reason is that the very benefit of a distributed system, that it stores the entire history on every developer's laptop, becomes its most important drawback as well, because companies are worried about security and they don't want everyone to have the entire history of the code base. So today I'll be focusing on one of the most popular VCMs, and that is Git. It's distributed in nature, so you have the entire history and everything is local, which makes it super fast. Okay, so there's one very important thing to understand about how Git works, and that's its workflow. If you understand this simple thing, you can actually understand how each and every operation happens. What's shown here is that there's a working directory, a staging area, and then a Git repository. What it means is that any change you make to your source code always goes first through the working directory. Let's talk about what a working directory is. When you start a new project and initialize it as a Git project, Git basically takes all your source code, makes an object, a big object, out of it and stores it in the Git repository. That's the very first step of making a Git repository. Now, Git makes a snapshot of this and gives it to you as a working copy. So when you see a lot of files, that's basically the working directory. That's the place where you can look at the files and make changes. Any changes that you make are part of your working directory. Now when you decide that these changes are good and should be committed, you move them to the staging area. You need to explicitly move the files to the staging area; only after that can they be committed. What we mean by commit is that the change is recorded as a particular snapshot, so you don't lose it. So we talk about three kinds of files. A modified file, which simply lives in your working directory.
A staged file, which is a change you know you want to propagate to the database, so you mark it to go into the Git repository. And then a committed file, which is a file that has already been moved to the Git repository. I'll talk about the various commands you can use to transition between these different stages. So when we talk about version control management systems, if you read about them, you'll realize that most of these systems store a delta of the changes. What it means is that you have a file and then you make some changes, and the management system just takes the delta and stores it. Git does it differently: it stores snapshots, and it does not store a file again if no changes were made to it. So every file which is in the same state has just one copy. Let me actually show you an image which makes it clearer. To explain the situation better, let's suppose we have a repository where we just have three files. There's a testing library. So basically, let's say these are the three files that are in our repository currently. Git stores all of these files as objects called blobs. A blob is nothing but a binary large object. Now if these files are not changed, Git never stores a second copy of them. What it does is create a tree, which shows the structure of your directory. Over here the tree points to these three things, which are at the root level. And when you say something is committed, the commit just points to that tree. Now let's say you've made a change. Snapshot A is your starting point, when you've just created a new project. It stores the reference to the tree which we saw in the last slide, and it keeps the author and the committer. Now suppose you make a change to only one of the files. So if we go back, let's say we've just changed the second file over there.
Git will just keep another copy of that file. And it will create a new tree object which points to the new copy of this file and to the existing copies of the others. So it doesn't store again the files which haven't changed. The new tree has that structure, and then a new commit is created which points to this new tree. At the same time, let's say this is snapshot B, it points to the earlier snapshot. So this is how the history is stored, and by not storing the same files again, Git keeps the storage size of new commits small. Any questions on how it stores the history? So as we were saying, the first thing was actually storing the history and keeping your files safe. The next thing was collaborating. If you want two people to collaborate, you want that particular code to be in a central repository. The central repository is called a remote repository. So just a few things about the remote repository. Firstly, the remote repository is a bare repository. In Git terminology, a bare repository is one that has no working directory. If you go back to the first slide, where I talked about the Git workflow, we saw that there's a working directory, a staging area and a Git repository. On the central repository nobody is making changes; it's just used for collaboration. You don't need a working directory at all, so it's just a bare Git repository. Basically it just has the third bit. In your local repository, there are tracking branches. What we mean by this is that you have many branches in your remote repository, and for each of these branches in the remote there's a tracking branch in your local copy. This ensures that everything that's in the remote also has a reference in your local repository.
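To make the snapshot storage described earlier more concrete, here is a minimal Python sketch of a content-addressed object store. It is a toy model, not Git's real on-disk format: the `ObjectStore` class, the `commit_snapshot` helper, the file names and the payload layout are all assumptions made up for this illustration. It only shows the idea that unchanged files are not stored a second time and that each commit points to a tree and to its parent commit.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: identical content gets the same key."""

    def __init__(self):
        self.objects = {}

    def put(self, kind, payload):
        # The key is derived from the content, so storing the same
        # content twice does not create a second object.
        key = hashlib.sha1(f"{kind}\n{payload}".encode()).hexdigest()
        self.objects[key] = (kind, payload)
        return key


def commit_snapshot(store, files, parent=None, author="someone"):
    """Store one blob per file, one tree for the directory, one commit."""
    entries = sorted((name, store.put("blob", content))
                     for name, content in files.items())
    tree_payload = "\n".join(f"{blob_id} {name}" for name, blob_id in entries)
    tree_id = store.put("tree", tree_payload)
    commit_payload = f"tree {tree_id}\nparent {parent}\nauthor {author}"
    return store.put("commit", commit_payload)


store = ObjectStore()

# Snapshot A: three hypothetical files at the root of the project.
files_a = {"README": "hello", "lib.py": "def f(): pass", "test.py": "assert True"}
snapshot_a = commit_snapshot(store, files_a)
count_after_a = len(store.objects)  # 3 blobs + 1 tree + 1 commit

# Snapshot B: only the second file changes.
files_b = {**files_a, "lib.py": "def f(): return 1"}
snapshot_b = commit_snapshot(store, files_b, parent=snapshot_a)
count_after_b = len(store.objects)  # adds 1 blob + 1 tree + 1 commit

print(count_after_a, count_after_b)
```

In this sketch the second snapshot adds only three objects (one new blob, one new tree, one new commit); the two unchanged files are reused through their existing keys.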
So these branches are called tracking branches. To see how they're named, let's say I'm working on one project and I'm collaborating with someone on one feature, and at the same time I'm collaborating with some other team on another feature. Maybe these two teams are not collaborating with each other. So we may use one remote with this team and a different remote with that team. You can actually have one project and two different remotes. That's why the name of every tracking branch has both the remote and the branch name in it. There's also a concept of tags in Git. With a tag, you can pinpoint some of the commits and label them. For example, I have made three commits in my history and I realize that the third commit can actually go to production. So I'll just tag it as, like, version 1.0. And when I'm doing a deployment of that particular piece of code, I can just use that version. It's simply another reference to the same commit. The point I want to make here is that the normal operations on the remote don't involve tags at all, so you need to specifically say whether or not you want to transfer the tags. Okay, so we've gone through a lot of theory of how things happen. Now I'll come to the basic Git commands, how you can actually start using Git. Let's say you have a project. You can use git init to initialize a Git repository. Or let's say I have a project and you want to start working on it: I host my project somewhere and I give you a URL. You can just do git clone with that URL and you'll get everything that I have in my repository up to that point. The next one is a very helpful command, git status. It tells you whether there are any changes in your working directory.
So basically if you've done some things in your repository and you don't remember what you did, you can just run this command and it'll tell you which files were modified, which files were moved to the staging area, which files are not tracked by Git and so on. The next command is git add. git add is the command which takes your changes from the working directory and moves them to the staging area. So what's git rm, or Git remove? Basically git rm tells Git that this file is to be deleted: delete that file and stop tracking it. Or I may have a file which I've made some changes to, and I don't want that file to be tracked by Git anymore, but I still want to keep it. Then I'd use git rm with the --cached option, which only stops the tracking and leaves the file on disk. After that, even if I make changes to that file, Git won't track them for me. Guys, if I'm going very fast or if you have any questions, please feel free to interrupt me. Okay. So git commit is the command which takes all the changes in your staging area and records them in the Git repository. You can also have fancy messages for each commit to remember what actually went into that particular commit. For example, if there was a bug in your code and you fixed it, you can put a link to that bug in the commit message. Okay. So when we're talking about version control systems, we may not want to work linearly. You may be working on one feature request when something else comes up and you want to change your branch. So there's this command called git branch, which allows you to create new branches and list all the branches that are there. Basically it's the main command for anything to do with branching in Git. Okay. So git log shows you the history of commits that went into Git. If you go back to this slide, a git log will show you a list of all these commits.
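As a rough illustration of what a history listing does, here is a minimal Python sketch (not Git's actual implementation) in which each commit keeps a pointer to its parent and the log simply follows those pointers. The `Commit` class, the `log` function and the commit messages are made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    message: str
    parent: Optional["Commit"] = None  # the earlier snapshot, if any


def log(head):
    """Walk from the newest commit back to the first one, newest first."""
    history = []
    current = head
    while current is not None:
        history.append(current.message)
        current = current.parent
    return history


first = Commit("Initial snapshot")
second = Commit("Fix login bug", parent=first)
third = Commit("Add search feature", parent=second)

history = log(third)
print(history)
```

Starting from the latest commit, the walk reaches every earlier commit without needing any separate index of the history.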
So having these pointers actually helps in using the git log command. Okay. So git revert is an operation you can use to undo a commit that is already in your history: it creates a new commit which reverses those changes. Let's say you make many changes, you deploy a new version of the code, you realize that there are problems with that code and you want to go back to a version that was working smoothly before. You'd use git revert for that. This next one is the command used to move something from the staging area back to the working directory. For example, you thought these changes were good to commit and then you feel like, okay, I made a mistake. You can just unstage them using this. Okay. So this is an important file called .gitignore. If there are many files in your repository and you realize there are certain files you do not want to share with others, or whose versions you don't want to track, you can just list those files in this particular file. You just write the, do you guys understand what a regular expression is? Okay. It's similar: you put in a pattern that matches that particular file (Git uses glob-style patterns such as *.log here rather than full regular expressions) and Git won't track that file. Okay. So as we were saying, when you want to collaborate, there's a remote repository. Every time you work with a remote repository, these are the four commands that you use. git remote lists all the remote repositories that you have registered for this project. git fetch downloads the list of branches and all the changes that are currently in your remote repository and might not be in your local copy, and it updates your tracking branches without touching your own branches. git pull gets those changes and, let's say you're on the master branch, which is tracking the remote master branch, it also merges them into your local copy. And git push is what you use when you have some changes in your local repository which you want to upload to the remote server. Okay. So these are two important commands that we will deep dive into.
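Before moving on, the difference between fetching, pulling and pushing can be sketched with a small Python model. This is only a toy under stated assumptions: histories are plain lists of commit names, the `Remote` and `LocalRepo` classes are hypothetical, and `pull` here only handles the simple case where the local branch can be moved forward without a real merge. The remote name `origin` is the conventional default in Git; everything else is invented for the illustration.

```python
class Remote:
    def __init__(self):
        self.branches = {}  # branch name -> list of commit names, oldest first


class LocalRepo:
    def __init__(self, remote, remote_name="origin"):
        self.remote = remote
        self.remote_name = remote_name
        self.branches = {}  # your own branches
        self.tracking = {}  # e.g. "origin/master" -> last known remote history

    def fetch(self):
        # Update the tracking branches only; local branches are untouched.
        for name, history in self.remote.branches.items():
            self.tracking[f"{self.remote_name}/{name}"] = list(history)

    def pull(self, branch):
        # Fetch, then bring the local branch up to date.
        self.fetch()
        upstream = self.tracking[f"{self.remote_name}/{branch}"]
        local = self.branches.get(branch, [])
        if upstream[:len(local)] != local:
            raise RuntimeError("histories diverged; a merge or rebase is needed")
        self.branches[branch] = list(upstream)

    def push(self, branch):
        local = self.branches[branch]
        remote_history = self.remote.branches.get(branch, [])
        if local[:len(remote_history)] != remote_history:
            raise RuntimeError("remote has commits you do not have; pull first")
        self.remote.branches[branch] = list(local)


remote = Remote()
remote.branches["master"] = ["C1", "C2"]

me = LocalRepo(remote)
teammate = LocalRepo(remote)
me.pull("master")
teammate.pull("master")

teammate.branches["master"].append("C3")  # teammate commits locally
teammate.push("master")                   # and uploads the commit

me.fetch()
local_after_fetch = list(me.branches["master"])
tracking_after_fetch = list(me.tracking["origin/master"])

me.pull("master")
local_after_pull = list(me.branches["master"])
print(local_after_fetch, tracking_after_fetch, local_after_pull)
```

After the fetch, only the tracking branch `origin/master` knows about `C3`; the local `master` moves only once the pull is done.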
So let's say I made a change and someone else made a change and we want to merge both of these changes. There are two ways of doing it. One is git merge and the other is git rebase. I'll talk about both these commands in detail in later slides. So git tag is the command that we use to interact with tags. You remember tags, where we could tag specific commits. git tag will show you all the tags that you have, and you can add more tags or remove tags using this command. Okay. So a merge basically does a three-way merge and a rebase does a linear merge. What does this mean? Let's say I started a project and I made commits up to C2. Okay. Someone else decides to work with me, they pull my changes and they make new commits C3 and C5. At the same time, I've made a commit C4. Okay. Now they want to merge their changes into my changes. Let's say we are a team of two people working on a feature. They've made certain changes, I've made certain changes, and now we want to publish all of them together. What a merge does is take this change by, let's say, my teammate, take this change by me, find the common ancestor, and try to merge all three together. And that's how merge works. Now what do we mean by a fast-forward merge, as it says over there? If you look over here, this is not a case of a fast-forward merge, because you have something over here which is not present over there. Right. But if we have a case where I make changes up to C2, and the other person makes changes C3 and C5 and then tries to push to the remote, it's actually a linear history. Right. The remote just gets updated. So that's what a fast-forward merge is. Let's compare this with a git rebase. Let's say I made a change C3 and my teammate made a change C4, and now we want to merge the two pieces of code together.
What they can do is first get my changes into their local copy and then start a git rebase operation. What rebase does is figure out what their changes are relative to the common starting point, set those changes aside, and then reapply them as patches on top of my commits, so that the end result is the same as with a merge operation. The benefit of doing this is that we get a linear history, whereas with the git merge operation our history follows two different branches. So if I look at the remote after a merge, I have two different paths that lead to C6, whereas after a git rebase operation you just have one linear history. There's a lot of debate going on about using git merge versus git rebase, and it's just a matter of personal choice. These are two ways your history could look. Each of them has its benefits. I just wanted to give you the logic of how they work; neither one is better than the other. So in short, this is what happens when you use a version control management system. And I'd like to point out that if anybody wants to work on a large project, this kind of forms the backbone of it, because without a version control management system you can't collaborate with people. As your project grows and there are many developers, this is the system that gives you the reliability and the safety net: your computer may go bust, but your code still remains safe. What do you prefer? Rebase? I prefer rebase. Let's say you're working on a branch and there is an issue in production that needs to be fixed. So what's your strategy? So that's a very nice question, and it's something we encounter regularly. Usually we have one branch which is dedicated to deployments. Let's call it master, which is the terminology people usually use. And now there's something that I need to fix.
So I'd take my master branch, check it out, create a new branch, work on that and do my fix. Once I've tested that it actually works, I'll merge it into master and then deploy it to production. The benefit I get from using Git is that everything that is deployed is still on master. I don't have to touch it, and I can just start a new branch right from what is on master, do everything I want to do without affecting anything, and then deploy it to production. So you're essentially checking out into another folder? Yes. Okay. That happens very often. That's a nice question again. So what happens is, let's say you and I are collaborating on a project. I make changes to some file and you are on a version from before my version, so you don't have my changes. Now you make changes to the same portion of the file that I changed. Git as a system doesn't know whose changes to honor. It sees that the same lines were changed in two different ways, and it leaves it to the user to decide which code actually goes to the remote. That's when a merge conflict happens. In most scenarios, if you're changing the same files but not the same lines, Git will intelligently do the merge on its own and keep both changes. You had a question? So as a version control. Okay. So that's a nice question again. As we were saying, there are two types of version control management systems: a centralized one and a distributed one. In a centralized system, if you develop something and you need it to be shared with someone else, or actually, that's a bad example. When we're talking about a centralized system, you don't have the entire history. Okay. So if you want to go back to see what someone else did, you would not be able to see that, because a centralized version control system doesn't keep the entire history in your, let's say, in your local copy.
So that is where Git becomes a big blessing, because you have everything that's happened to the project up to the last time you did a git fetch. So that's basically the comparison of centralized version control management systems versus distributed ones. And again, as I said with rebase and merge, neither one is better than the other. It depends on your use case. If security is your foremost concern and you don't want developers working in your organization to have the entire history, you would use a centralized version control management system, where you need to be connected to the remote to see the entire history. You can see the history, but for that you need to be connected. So if an employee is, let's say, fired or leaves the organization, he cannot connect to the remote anymore, and that's how they enforce security. I was working with Samsung as my first organization. Yeah, so places where security is the main concern tend to use centralized version control management, and places like startups tend to start with distributed version control management.
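As a footnote to the merge-conflict answer above, here is a minimal Python sketch of a line-by-line three-way merge. It is a deliberately simplified model, not Git's real merge algorithm: it assumes all three versions of the file have the same number of lines, and the `three_way_merge` function and the sample file contents are invented for this illustration. It shows the rule from the answer: changes to different lines are both kept, while the same line changed in two different ways is reported as a conflict for the user to resolve.

```python
def three_way_merge(base, mine, theirs):
    """Merge two edited versions of a file against their common ancestor.

    Simplification: base, mine and theirs all have the same number of lines.
    Returns the merged lines (None where unresolved) and the conflicting
    line indexes.
    """
    merged, conflicts = [], []
    for index, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:
            merged.append(m)      # both sides agree, or neither changed it
        elif m == b:
            merged.append(t)      # only the other person changed this line
        elif t == b:
            merged.append(m)      # only I changed this line
        else:
            merged.append(None)   # both changed it differently: conflict
            conflicts.append(index)
    return merged, conflicts


base = ["title", "body", "footer"]
mine = ["Title", "body", "footer"]           # I edit line 0
theirs = ["title", "body", "Footer"]         # teammate edits line 2
clean_merge, clean_conflicts = three_way_merge(base, mine, theirs)

theirs_same_line = ["TITLE", "body", "footer"]  # teammate also edits line 0
_, clashing_conflicts = three_way_merge(base, mine, theirs_same_line)

print(clean_merge, clean_conflicts, clashing_conflicts)
```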