So, like many great stories, the story of Git LFS begins in Paris. It was the 8th of April 2015, and a bunch of engineers were gathered for the annual Git contributors summit, where all of the thought leaders in the Git community get together and talk about how they're going to develop everyone's favorite distributed version control system for the next 12 months. This happens the day before the annual Git Merge conference, and there were a bunch of GitHub developers and a bunch of Atlassian developers in attendance.

At one point, Nicola Paolucci and John Garcia, a couple of Atlassian devs attached to Bitbucket, were chatting to Rick Olson from GitHub. And Nick said to Rick, hey, you guys at GitHub might be interested in this new thing that we've been building. Atlassian have developed an extension to Git that solves the problem that Git has with tracking large binary content. We've built this extension in Go, and we're planning on announcing it at our track session during Git Merge tomorrow.

And Rick from GitHub said, it's kind of funny, you might be interested in this. At GitHub, we've also been interested in this problem that Git has with tracking large binary files. So we've built an extension to Git in Go, and we're planning on announcing it at Git Merge tomorrow in our track session, which happens to be immediately before yours. And Nick went, what? (That's a picture of James's reaction, by the way.) And Rick went, it sounds like we've both solved the same problem. So what do you call yours?

And this is where it gets really weird. Nick said, well, we call ours git-lob, after large objects like you might find in a database, because we're trying to figure out how to store this large content in a better way. And Rick said, well, OK, that's really weird, because we also picked a three-letter acronym starting with L: Git LFS, for Large File Storage.
And after a bit of back and forth between the two companies, and conversations that went on over a couple of months after that, we decided to ultimately mothball the open source git-lob project and start contributing to the Git LFS project instead. Fortunately, in an interesting example of parallel evolution, both git-lob and Git LFS were designed in pretty similar ways. They both used exactly the same Git extension points, and they were both built in Go. So our developers managed to port some of the features that we had built into git-lob that didn't yet exist in LFS over into the other project, which means now we have one project. That's fantastic, because we haven't fragmented the community. One of the really nice things about Git is that it's a really portable data format, and now that there are a few different Git hosting providers out there, it's nice that you can still migrate your repositories from one to the other if you so choose.

Now, Git LFS is really exciting as a piece of technology in itself, but what's more exciting for me is that it means that literally every team in the world can start using distributed version control. There was a subset of software teams out there who were stuck on centralized version control, who couldn't move to Git purely because it is distributed. If you're working on a team that's developing a game, for example, or if you're a researcher who needs to version huge data sets alongside your code, then it's really tricky to use Git, because as a distributed version control system, you have to copy the entire repository history locally whenever you want to do something with it. Git LFS kind of breaks this distributed part of Git, but only for large files, and I'll go into a little more detail about how that works in a couple of minutes. But what this means is that every single team out there can now migrate to Git if they so choose, whereas before, they were stuck on things like Perforce or Subversion.
So this means that game designers with large assets, researchers with data sets, web developers with rich media that they want to version alongside their HTML source code, or even engineers who want to do things like build functional tests and check snapshots of databases into their Git repositories can do that now. It opens up a huge number of convenient ways to version control assets that were just too big for Git in the past.

There are four broad things I want to cover today. First, I want to talk a little bit about why Git doesn't handle tracking large files very well. Then we'll look at how Git LFS solves this problem. Then I'll give you some tips for converting your existing Git repositories over to use Git LFS if you're already tracking large binaries in them. And finally, I'll give you some tips for using Git LFS in a team context, because most software these days is built in some kind of team.

So to understand why Git has trouble tracking large content, you need to know a little bit about the Git data model. Just for reference, how many people here are already using Git? OK, almost everybody. Fantastic. So I can probably breeze through this section fairly quickly. In Git, you typically have a master branch. This is kind of the equivalent of a trunk branch in Subversion, or a default branch in Mercurial. And what this branch really is, is a lightweight pointer to a commit sitting somewhere in your repository history. Branches are so lightweight in Git because literally all a branch is is one of these references. Inside your .git directory, which contains all of your repository data, there's a refs directory, which contains all of your branches and tags and other pointers pointing at different commits in your Git graph. The master branch is literally just a file containing a scary-looking 40-character hexadecimal hash. And this hash is actually what Git uses as a commit identifier.
It tells you not only where that commit is sitting on disk; it's also a SHA-1 hash of the actual commit contents. And the commit objects themselves refer to other commits in the Git graph by referring to their commit IDs as well. So each commit object actually encodes its parent, or parents in the case of a merge commit. Now, all of these commits, if you trace their relationships, build up a graph. It's directed, because each new commit refers to previous commits in its history. And it's acyclic, because each commit can only refer to commits that already exist inside that graph, so you can never create a cycle. This is known as the Git DAG, the data structure that underlies Git.

Now, to understand why Git has such a hard time tracking binary content, we actually need to look at the structure of these commit objects. Git has a cat command called cat-file. If you pass it the -p flag, for pretty, you can use it to inspect any object inside this DAG. So if we pass it that commit ID we were looking at, we can actually see what a commit is comprised of. Commits are actually pretty lightweight. As I mentioned, they've got a reference to their immediate parent. They have a thing called a tree, which we'll come back to in a second, and some committer and author metadata. These will usually be the same person, unless you're working with patch sets on something like the Linux kernel, where someone different might be authoring the code and committing the code. And then you have a commit message as well.

The tree here is another type of object that sits inside the DAG. What this represents is the root directory of your repository at the time that that commit was created. So all of the contents of your repo at that point in time are referenced by this tree. And if you use that cat-file command again, you can actually inspect the contents of this tree.
It looks a lot like a directory on a file system, because that's what it represents. You've got file modes, and you've got the type of each object sitting inside that tree, which is blobs and trees. Trees are nested subdirectories, so each of these trees represents a top-level directory inside your repo. And each blob represents a file sitting in the top level of your repo. So we can trace these relationships down using cat-file if you like, or just by checking it out into a directory, and finally get the contents of your repository for a particular commit.

Now, the reason why Git doesn't handle large binaries very well is that Git actually creates a new one of these trees every time you create a commit. And every time you change a file, it has to create a new one of these blobs to represent that file's contents at that particular point in time. Because Git is content addressable, each one of these objects is addressed by one of those SHA-1 hashes of the object's contents. It's very good at reusing existing objects if they don't change. But every time you change a file, it has to create a new one of these blobs, and new trees all the way back up to the root of that particular commit.

Now, imagine a trivial case where we have a repository containing just one large file. Say, a 50 megabyte high-resolution photo of an elephant. If I go in, open GIMP, change the hue of that elephant to pink, and commit it to my repository, then it's going to create a new blob of exactly the same size, doubling my repository to 100 megabytes. And every time I modify that file, it's going to increase the size of my repository again by whatever the size of that file is. Git does actually assemble all of these blobs into a thing called a pack file eventually, and it does do some delta encoding at that point to compress them, mainly so it can send them over the wire efficiently.
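If you want to poke at this plumbing yourself, the whole chain from branch ref to commit to tree can be walked with cat-file on any machine with Git installed. This is just a sketch; the repo and file names here are made up:

```shell
# Walk from a branch ref down to a blob (repo and file names are illustrative).
git init -q dag-demo && cd dag-demo
git config user.email dev@example.com && git config user.name Dev
echo "hello" > readme.txt
git add readme.txt && git commit -qm "initial commit"

branch=$(git symbolic-ref --short HEAD)
commit=$(cat ".git/refs/heads/$branch")   # the branch really is just a file holding a 40-char hash
git cat-file -p "$commit"                 # shows: tree, author, committer, commit message

tree=$(git cat-file -p "$commit" | awk '/^tree/ {print $2}')
git cat-file -p "$tree"                   # shows: mode, type (blob/tree), hash, file name
```

Running `git cat-file -p` on the tree's blob hashes would take you all the way down to the file contents.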
However, delta encoding doesn't tend to work very well for a lot of binary file types, because they're already compressed. So if you're handling large binaries inside your repository, often your repository is going to bloat and bloat and bloat. And because Git is a distributed version control system, this is where you run into real problems. Just like with any version control system, you eventually have to push those versions to the server. But unlike centralized version control systems, everyone else on your team then has to pull down every version of that file that's ever existed. Even if they're only really interested in the latest version, they still have to pull down that full history. So the load on your server grows and grows, your fetches and pushes get slower and slower, and basically your team comes to a grinding halt, because they're just taking so long to work with these Git repos.

So Git LFS, as I mentioned, is a joint project between Atlassian and GitHub that solves this problem really nicely. It's not the first attempt to solve the large file problem with Git, but it is the best one so far in terms of seamlessly integrating with your Git workflow. Git LFS piggybacks off the existing Git commands you're already using, so you don't really need to learn many Git LFS commands at all to start using it with your existing repositories. The way that Git LFS works, in a nutshell, is that instead of storing these large blobs directly in your DAG as part of your Git repository, they get replaced with lightweight pointer files that don't have a direct DAG reference to those large objects. The content is actually stored in a separate, segregated Git LFS store; it's not actually part of your repository anymore. Now, when you do a push, these large objects get transferred to the parallel LFS store, and your DAG gets transferred to your Git host as normal.
But when you do a git pull, your DAG gets fetched, and you only have these little lightweight pointer files, which are like 100 bytes each, that get fetched instead of the large content. Then LFS lazily downloads only the large content needed during the checkout process. So it has sort of broken the distributed part of Git, in that you don't pull down the full repository history. And if you go back and check out a previous commit, Git LFS will lazily download the content again as part of that checkout. So you only grab that stuff just in time as you need it, rather than eagerly all at once.

Now, if we use that git cat-file command again, we can see the actual contents of one of these pointers. And they really are lightweight: it's three fields. There's a version URI, which is basically a fancy new way of doing a version number. There's an object ID, which is a SHA-256 hash of that object's contents. So one interesting thing to note here is that it is content addressable, just like Git, but it uses SHA-256 instead of SHA-1. The principal reason for this is that Git is 11 years old at this point, and when it was designed, SHA-1 was thought to be a little more secure than it has since been found out to be. SHA-256 is a more modern equivalent. Also, some storage providers have built-in support for SHA-256 validation of objects. So if you're using something like S3 as a back end for LFS, then you can leverage that SHA-256 validation automatically, which is a nice byproduct of using it too. Then you have the size of that object in bytes.

One of the really nice things about LFS is that it's cross-platform. It's built in Go and available via Homebrew; I probably should have had DNF or apt or something up there as well, because packages are available for those too. Once you've got the binary on your path, you simply need to run git lfs install to initialize it for your entire system. And what this really does under the hood is make a modification to your global Git config.
It adds this thing called an LFS filter, which you can use later to bind certain files to the Git LFS clean and smudge filters, which are responsible for transforming those large files into pointers and back again.

Oh, yes? [audience question] Sorry, I'm just repeating the question because I'm recording this as well. So the question was about the term lazy that I used before. It's lazy in the respect that it doesn't eagerly download all of your large history during the pull process, but it will download what's needed during the checkout process by default. Exactly, yeah. There are some ways to force it to eagerly fetch history as well, if you know that you're going to need it in the future, and I'll get to that in a later slide if you like. Exactly, yeah. So the idea is that you only pull down the large content that you want. But if you know in advance that there are going to be some commits that you do want that large content for, you can tell LFS to pull that down. A pretty common pattern is to do an LFS fetch, go to lunch or something, and then come back and work with that large content.

I'll get to the clean and smudge filters in just a second. But once you've got this initialized globally, you can start using the git lfs track command to tell LFS which types of files you want to track using LFS in a particular repository. This is really the only command you need to know as a regular Git LFS user, because almost all of the other interaction with your Git repos will push and pull and check out those LFS files for you automatically. What this does under the hood is add an entry to your .gitattributes file, which binds a particular pattern to that LFS filter that we just set up. These patterns are exactly the same as the .gitignore patterns that you're probably used to using already. So you can bind different directories or specific file names or any pattern that you like. So now let's take a look at how that Git LFS smudge filter works.
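To make the plumbing concrete, here's roughly what those two commands write. This sketch reproduces the entries by hand (into a scratch file, so nothing real gets touched) so you can see them without LFS installed; the `*.psd` pattern is just an example:

```shell
# Roughly what `git lfs install` adds to your global Git config (the filter binding)...
cat <<'EOF' >> example-gitconfig
[filter "lfs"]
	clean = git-lfs clean -- %f
	smudge = git-lfs smudge -- %f
	required = true
EOF

# ...and what `git lfs track "*.psd"` writes into .gitattributes:
echo '*.psd filter=lfs diff=lfs merge=lfs -text' >> .gitattributes
cat .gitattributes
```

In normal use you never write these by hand; `git lfs install` and `git lfs track` do it for you, and you just commit the resulting .gitattributes file so your teammates get the same bindings.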
So the way that LFS files get stored in your local LFS cache, before they get transferred to the server, is by intercepting the git add command, because this is when new objects typically get added to the Git DAG. So say I've got a large file that I've made some modifications to. When I run git add, instead of just copying those contents into my repository history, the file gets handed off to this thing called the LFS clean filter, which is responsible for cleaning objects before committing them to the graph. It takes a SHA-256 hash of the object's contents and then stores that object inside the Git LFS object cache, which is this parallel storage alongside your Git repo. Then it creates one of those pointer files that we looked at, using that hash, and adds that to your Git DAG instead. So instead of adding 50 megabytes to the DAG, we're only adding a couple of hundred bytes.

Now, when you do a checkout, the opposite thing happens. Git automatically hands off this pointer file to the LFS smudge filter, again using that filter binding we saw before. The smudge filter goes off and looks in your local LFS object cache to see if it can find a blob that matches this particular SHA. If it can't, then it reads through to that backing LFS store; this is when the download happens during the checkout process. Once it's found the object, it writes it out into your working tree.

So one of the nice things about Git LFS is that, although I've just told you about this pointer file format, you probably don't need to know it, because in actual usage of Git LFS you won't see the pointers. Files get transformed into pointers automatically during git add and then transformed back into large files during git checkout, so you typically won't see pointers sitting around in your working tree. So that's how adds and checkouts work. Now I want to talk a little bit about how files get sent to and from the server using git push and git pull.
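The clean and smudge round trip can be sketched by hand. This is emphatically not the real implementation, just an illustration of the idea; the cache layout shown follows the `.git/lfs/objects/<aa>/<bb>/<oid>` convention LFS uses locally, and the file names are made up:

```shell
# Hand-rolled sketch of the clean/smudge round trip (illustrative, not the real filter).
mkdir -p lfs-demo/.git && cd lfs-demo
printf 'stand-in for a 50 MB texture' > texture.png

# clean: hash the content, stash it in the local LFS object cache, emit a pointer
oid=$(sha256sum texture.png | awk '{print $1}')
size=$(wc -c < texture.png)
p1=$(printf %s "$oid" | cut -c1-2)
p2=$(printf %s "$oid" | cut -c3-4)
mkdir -p ".git/lfs/objects/$p1/$p2"
cp texture.png ".git/lfs/objects/$p1/$p2/$oid"
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' \
  "$oid" "$size" > pointer.txt
cat pointer.txt   # ~100 bytes; this is what actually enters the Git DAG

# smudge: read the oid back out of the pointer and restore the content
# (a real client downloads from the backing LFS store on a cache miss)
oid2=$(awk '/^oid/ {sub("sha256:", "", $2); print $2}' pointer.txt)
cp ".git/lfs/objects/$p1/$p2/$oid2" restored.png
```

The restored file is byte-identical to the original, which is the whole point of content addressing: the SHA-256 in the pointer is both the lookup key and the integrity check.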
So after you've made some modifications, Git LFS uses a thing called a pre-push hook to intercept the git push command. Is everyone familiar with hooks? Just a couple of people? All right. So Git hooks are basically Git's extension system for intercepting certain commands; you can intercept basically everything that Git can do. I use things like prepare-commit-msg to pre-seed commit messages with the JIRA issue key that I'm working on: it extracts the key out of the branch name and then puts it in my commit message automatically. Other people use things like post-update hooks on the server to notify chat systems when a particular branch is updated, or pre-commit hooks to quickly run the unit tests and make sure they're passing before a commit actually takes place.

Now pre-push, as is probably pretty obvious from the name, intercepts the push command and takes some action before the push occurs. So if you've made some modifications to Git LFS files and then you run a push, you'll see some output from LFS saying it's transferring those files to the server, before the rest of the DAG gets pushed to your repository host as normal. Similarly, when you do a pull, you'll see some output coming from that smudge filter, saying that it's downloading the files it needs to replace those pointer files in your checkout.

Now, one of the things you'll notice that you don't see as part of this is any sort of authentication prompt. And this is one of the places where Git LFS works really nicely compared to some of its competition. Git LFS leverages your existing authentication with your provider, and not by just blindly sending your credentials to the host: the Git LFS aware server actually tells the client how to authenticate with whatever your storage mechanism is. The way this works, in the simple case with an HTTP remote, is as follows.
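For the curious, the pre-push hook that LFS installs is tiny. This sketch writes roughly what `git lfs install` puts in `.git/hooks/pre-push` (the exact wording varies by version; the error message here is illustrative):

```shell
# Roughly the pre-push hook that `git lfs install` drops into a repo:
# it just hands the push over to LFS so large objects get uploaded first.
mkdir -p .git/hooks
cat > .git/hooks/pre-push <<'EOF'
#!/bin/sh
command -v git-lfs >/dev/null 2>&1 || {
  echo >&2 "This repository uses Git LFS, but 'git-lfs' was not found on your PATH."
  exit 2
}
git lfs pre-push "$@"
EOF
chmod +x .git/hooks/pre-push
```

Because hooks don't travel with clones, `git lfs install` has to be run once per machine so this hook exists everywhere the repo is pushed from.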
The Git LFS client appends a special path to the end of your remote URL: /info/lfs/objects/batch. A Git LFS aware host like Bitbucket knows how to respond to these requests. Basically, the client posts up a big JSON structure saying, hey, these are the objects that I need in order to fulfill this checkout operation that I've got locally. Then the server responds with some hypermedia telling the client where to download those objects. But what's really nice is that the server also tells the client how to authenticate that request, by passing down a set of headers. In the case of Bitbucket, because we control both the LFS store and the Git store, we just send down a simple signed JWT saying, hey, this is the username who's making this request. And then the server can just validate that JWT and go, okay, well, that's the user that's authenticated. But the fact that we're using this hypermedia-based system gives you a lot of flexibility as a Git LFS server implementer, because we could pass down a basic authorization header to use with S3, for example, or OAuth, or really any authorization or header scheme that we can dream up in order to authenticate that request. So it's quite a flexible way to broker the conversation between the LFS client and your LFS store.

So that's how LFS works. Now I want to talk a little bit about how you can convert your existing repositories over to use Git LFS. I mentioned that git lfs track command. Unfortunately, if you've already got large files in your repository, running lfs track won't actually reduce your repository size, because it's only replacing the versions of those files in the next commit with pointers. You still have all of these different large files littering your repository history, so you need to do something to get rid of those. Has anyone heard of git filter-branch before? A couple of people? Okay, cool.
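A sketch of what that batch request looks like on the wire. The endpoint path and the vnd.git-lfs+json media type are part of the LFS API; the host, object ID, and size here are illustrative, and the actual POST is left commented out since it needs a real LFS-aware remote:

```shell
# Build the JSON body a Git LFS client POSTs to <remote-url>/info/lfs/objects/batch.
# (oid computed from a stand-in string here, so the hash is illustrative)
oid=$(printf 'elephant photo bytes' | sha256sum | awk '{print $1}')
size=52428800
body=$(printf '{"operation": "download", "objects": [{"oid": "%s", "size": %d}]}' "$oid" "$size")
echo "$body"

# The client would then do, roughly:
#   curl -X POST "$REMOTE_URL/info/lfs/objects/batch" \
#     -H 'Accept: application/vnd.git-lfs+json' \
#     -H 'Content-Type: application/vnd.git-lfs+json' \
#     -d "$body"
# and the server replies with hypermedia: for each object, a download href plus
# the headers (a signed JWT on Bitbucket, basic auth for S3, etc.) to use for it.
```

The key design point is that the client never hard-codes where the large content lives or how to authenticate; the server's response carries both, per object.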
So filter-branch is kind of the Swiss army knife of repository history rewriting. You could potentially go back and delete all of those large files, but that would be unfortunate, because then you wouldn't be able to retrieve them anymore. So instead of deleting them, you can actually go back and rewrite your repository history to run the LFS track command on every commit, using filter-branch. Now, this is kind of insane, but, you know, kind of genius. This is actually from Andy Neff, one of the core Git LFS contributors. These days I don't recommend doing it, mainly because of the complexity, but also because it takes a huge amount of time: it actually has to go back and run this command on every single commit, which is massively time consuming.

So instead, there's this tool called the BFG Repo-Cleaner. And I recently discovered that the BFG isn't actually named after the gun in Doom, or Roald Dahl's Big Friendly Giant. It's actually the acronym of "Git Filter-Branch", backwards, because really it's a superior version of git filter-branch, built for rewriting your repo history. The reason it's faster than filter-branch is that it's been specialized to only do a couple of things. It's built by a gentleman named Roberto Tyley from The Guardian, and initially he built it to delete unwanted files from your repository history. So if you had a file that shouldn't be in your repo history and you wanted to delete it, you could use the BFG to go back and expunge it from every single commit. You can also use it to blow away complete directories, or even replace individual tokens within files. So if you've ever found a dev on your team who's committed a password or an AWS credential or something like that, you can use the BFG to go back and replace it with asterisks or something throughout your repo history, which is pretty handy. It's also significantly faster than git filter-branch.
Git filter-branch under the hood runs a bunch of scripts and typically traverses your entire DAG, which can be quite slow. The BFG is written on top of JGit, which is a Java re-implementation of Git that's pretty performant these days. It also memoizes each blob and tree that it's already processed, so it doesn't go back and rerun the same command on the same object; it's taking advantage of Git's content addressable nature. I got that 10-to-720-times-faster figure from a spreadsheet that's linked from the BFG project on GitHub. They've run git filter-branch and the BFG side by side on a bunch of different projects, measured the speed difference, and then published that, to kind of show off.

And most importantly, as of version 1.12.5, the BFG has built-in support for Git LFS. So instead of that very impressive filter-branch command, you can run the BFG with its convert-to-git-lfs option. As I mentioned, the BFG is built in Java, so it's very cross-platform. You just pass it the pattern that you want to convert to LFS, and then you have to pass the no-blob-protection flag, which I admit is a little bit non-intuitive. What that means is that it'll also rewrite the current commit that you have checked out. By default, the BFG doesn't do that, because it assumes your repository is in a known good state by the time you run it. However, for LFS conversion, you probably want to rewrite your tip commit as well.

Little product plug here: if you are using Bitbucket already and you want to enable Git LFS, you just need to check the handy allow LFS checkbox in the repository settings. We also just launched support in Bitbucket Cloud about a month ago.
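The conversion run looks roughly like this. The jar location, repo path, and pattern are all illustrative, and you should run the BFG against a fresh clone rather than your only copy; the guard just makes the sketch safe to run anywhere:

```shell
# Hedged sketch of converting existing history to LFS pointers with the BFG.
# (bfg.jar path, pattern and repo name are illustrative; use a fresh clone.)
if [ -f bfg.jar ]; then
  java -jar bfg.jar --convert-to-git-lfs '*.psd' --no-blob-protection my-repo.git
  # then actually drop the old blobs so the repo shrinks on disk:
  (cd my-repo.git && git reflog expire --expire=now --all && git gc --prune=now --aggressive)
else
  echo "bfg.jar not found; download the BFG Repo-Cleaner jar first"
fi
```

The reflog-expire and gc step matters: the BFG rewrites the history, but the old blobs stay in the object store until Git's garbage collection actually prunes them.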
It's still in beta, but if you want to test it out, hit the start using Git LFS button in your repository, and then the rest of your repository UI will start reading through that Git LFS store automatically. So you'll see your large files in diffs and pull requests, just as if they were tracked in your normal Git repository.

So that's how you migrate over to Git LFS. Now I want to give you some tips for using LFS in a team context. The first thing you need to know about working with binaries in Git, whether you're using LFS or not, is that you have to be very, very careful about merge conflicts. Git is very good at resolving changes made to the same text file on different branches, but if you make any modification to a binary, it is going to conflict, because Git has no idea how to semantically apply changes from different developers to the same binary path. And this is really unfortunate, because binaries in particular often take a long time to modify. It's not just a case of copying around some text: you have to open whatever specialized tool you're using and then apply your changes on top of the other person's changes, which is really frustrating.

Now, traditional centralized version control systems like Subversion and Perforce have an answer for this already: locking. This is a bit of a dirty word in distributed version control, because everyone's used to having these nice little segregated branches or forks where they can do whatever they want, and then reconcile those changes with their upstream repositories later on. However, Git LFS is going to implement locking. It's sort of in the process of being done, but it's not yet available in any release. The way they've implemented locking is really neat, though. And to explain how it's being implemented, I first want to talk about how you definitely should not implement locking in a distributed version control system.
If you were going to do a naive implementation, you might think: okay, to lock a file, I really want to serialize changes to that file to a single developer. So probably the easiest thing to do is to serialize those changes on a particular branch. You could theoretically say, okay, a developer can acquire a lock on this path on branch X, make some changes, and then release that lock once they're ready for other developers to go in and make their changes. Now, this would work. You'd no longer end up with conflicts, because you're forcing a single developer at a time to work on that one branch. But this is pretty lame, because Git is awesome because of its branching. It's nice that everyone has these little isolated environments in which to work on their features. And if I'm on my feature-1 branch, I may not want to merge master into my branch just to be able to acquire a lock.

So the proposal, which actually came from Steve Streeting, one of the Atlassian devs attached to Git LFS, is for a multi-branch model for locking. The way this works is that anyone can acquire a lock on any file, provided that they have the latest version of that file on their branch, which is a little bit different from just serializing changes on a single branch. So for example, say a developer wants to make a change on master. They can acquire a lock, make some changes, and then release the lock. Now imagine another developer wants to make a change on the feature-1 branch. If they attempt to acquire a lock straight away, that's going to fail, because they don't have the latest version of the file on their branch yet. But if they're okay with merging master in, or rebasing their branch on top of master, then they will have that latest version, and now they can acquire a lock and make some more changes. But say they didn't want to rebase, or bring the history of master into their branch via a merge.
What they can do is cherry-pick that commit on top of their branch, so there's no direct commit history relating the two, and then make their modification. In fact, they don't even have to cherry-pick the commit. They could just do a checkout of that particular file on top of their branch. As long as they have an object that matches the SHA-256 hash of the latest available version of that path, they're allowed to acquire a lock and modify it. So this is a much more flexible way of doing locking in the DVCS world. Really, what's happening here is that we've built an orthogonal structure for a particular file on top of Git's branching model. We no longer care about branches in terms of serializing changes to this file; we just have to make sure that we have this continuous thread of modifications based on the last time the file was updated, rather than the actual location where it was updated. That's a much, much nicer way to do locking in a DVCS. At least we think so.

As I mentioned, there is a little bit of this work that has shipped, or is on master currently, but the commands themselves aren't available in any Git LFS release yet. They're looking like they'll shape up like this: basically, a simple lock command, where you lock a particular file and that publishes the lock to the server; and a git lfs unlock command, which can be used by the developer who created the lock, or can be forced by another developer. So really these locks are advisory rather than a hard and fast rule: if you really do want to make changes to a binary, and you're happy to resolve the conflicts later, you can force-unlock it. And then there'll be a simple command to list locks as well. Unfortunately, this is all kind of in flux, and you can't use it yet.
Until then, the best thing to do is to tell the other members of your team that you're working on a large file before you start making changes. So if you've got two artists on your team and you're thinking about changing that model or that texture, just let them know that that's what you're doing. You're kind of manually passing the mutex around, rather than running into conflicts later on.

So far I've mainly been talking about how individual developers interact with a particular Git LFS repository. Now I want to talk a little bit about how you can retrieve content that the other members of your team are working on. There's a command called git lfs fetch, which I think I briefly mentioned before. What this typically does is retrieve content for your currently checked out commit. You don't usually have to run this command yourself, because that happens as part of the checkout process. However, you can use the --recent option to pull down content from other recently updated branches. By default, this is any branch that has a commit on it that's newer than seven days old. And if you like this behavior, you can make it permanent using the lfs.fetchrecentalways setting.

Git LFS is highly configurable; there are all sorts of options, some of which we'll be talking about in a second. You can also configure some other recency behavior by modifying these properties. You can change the threshold of what Git LFS considers recent by setting lfs.fetchrecentrefsdays. You can also retrieve content relating to commits that aren't the tip of a branch by setting lfs.fetchrecentcommitsdays. What this does is find any branch that's been modified within that recency threshold, then walk back that number of days and pull down any LFS content relating to those branches. Now, because these are days rather than just a raw commit count, this can potentially be a huge number of commits, which is why it defaults to zero.
So it really depends on your repository structure and how fast your branches move as to whether you want to turn that setting on. However, if you're thinking about doing some cherry-picking or some rebasing of your teammates' branches (not really something you should do), then potentially you can set that value to pull down their history, or if you want to review interstitial versions of files, it might be useful as well. You can also turn the lfs.fetchrecentremoterefs option off if you like; it defaults to true. What that means is it's going to pull down LFS content for every remote branch, as well as the branches you've actually checked out locally. Again, depending on your repository and the size of your team, this might be a huge number of branches, so if you're working on a very large repo and want to use the --recent flag, you might consider turning that off. Once you've retrieved all of this content, you might want to clean it up at some point, because it's just sitting in that local LFS cache taking up disk space. Unlike Git, Git LFS does not automatically prune files at any point, because it's being a little conservative about deleting your data. But if you do explicitly run prune, it will delete the files it thinks you no longer need. Specifically, it adds an offset to that recency threshold we were just looking at and then deletes any content older than that, with a couple of caveats. It will never delete content that's currently checked out, because it assumes, quite rightly, that you're still going to need that. And it will never delete content that relates to a commit you haven't yet pushed to the server: it compares your local branches against the remote branches and won't delete any LFS content that's referenced by unpushed commits. Now, there is one setting I really recommend turning on, which is lfs.pruneverifyremotealways.
What this does is check with the server that the LFS content being deleted locally actually still exists on the server. The reason this defaults to false is that it really, really slows down the prune operation: not only does it have to make some requests to the server, it has to check every single LFS object with the server, rather than just walking your DAG, seeing what's referenced, and deleting everything else. You can also set the lfs.pruneoffsetdays value if you want to change that threshold offset to something a little more conservative. For instance, I have it set to 21 or so, because disk space is relatively cheap and I like to keep these objects around a little longer. So that's how you clean up your local LFS cache. One of the interesting quirks, or maybe just one of those things that hasn't been built yet, is that the Git LFS API doesn't specify how to delete objects on the server side, so there's no way to do a remote prune of your Git LFS store. Instead, each LFS implementer has been left to their own devices as to how they build that. As far as I know, Bitbucket is the only implementer that actually lets you browse your LFS objects on the server and delete them; I think the others maybe just clean up files when you delete the repository. So this is kind of interesting, and you have to be a little bit careful. There are commands that will tell you which commits are referencing a particular object. But if you do delete a file that's still referenced by a commit, Git LFS will still work; it will just check out the pointer file into your working copy rather than checking out the large file.
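Putting those prune settings together, a sketch of a more conservative local cleanup might look like this (the 21-day offset is just the value mentioned above, not a recommendation for every repo):

```shell
# Keep LFS objects for an extra 21 days beyond the fetch-recency window
git config lfs.pruneoffsetdays 21

# Verify with the server that each object still exists remotely before
# deleting it locally (slower, but much safer)
git config lfs.pruneverifyremotealways true

# Preview what would be removed, then actually prune
git lfs prune --dry-run
git lfs prune
```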
So if you're scrounging for disk space on your server, you might end up deleting files that are still referenced in your Git history, and they just won't be checked out automatically for you anymore. So that recency stuff is how you can check out more LFS history than you strictly need. You can also configure Git LFS to check out less LFS content. Why would you want to do this? Well, in some situations you may not need all of the LFS content for a particular commit. For example, if you're a game developer writing some unit tests that exercise your physics engine, you probably don't need all of your textures and model assets to run a simple CI build, so you might configure your CI script to do an LFS fetch with the --exclude flag to prevent those assets from being downloaded. Or, if you're a specialist on a particular team, you might include only a particular subset of your assets: an audio engineer or a graphic designer might only pull down the assets they're likely to be working with, to save on transfer costs. Again, you can make these permanent using the lfs.fetchexclude and lfs.fetchinclude properties. And just like all of the other settings, these can be defined either for a particular repository or globally for every repo you have locally, which is pretty handy. Speaking of audio engineers and designers and other developers not interested in using Git from the command line, SourceTree has built-in support for LFS. Interestingly, the principal Atlassian committer on the Git LFS project is Steve Streeting, the guy who built SourceTree initially, so it has really, really good support for LFS. It has some cool stuff: it'll fix up any broken Git LFS installation you have locally, so if that pre-push hook is missing, it'll install it for you. And mostly it's seamless.
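The include/exclude idea might be sketched like this; the asset paths are hypothetical examples, not part of the talk:

```shell
# One-off: skip heavyweight art assets in a CI build
git lfs fetch --exclude="assets/textures,assets/models"

# Or make it permanent: an audio engineer might only ever pull down
# the content they actually work on
git config lfs.fetchinclude "assets/audio"
git config lfs.fetchexclude "assets/textures,assets/models"
```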
So if you do a pull, that'll pull down your Git LFS content too, just like regular Git, but you can also explicitly invoke LFS commands through the options. It is a bit cheeky because we don't actually have a SourceTree variant on Linux just yet, but it does work on OS X and Windows, so if you're collaborating with people on those platforms, it can be useful. So I think that's all I've got for today. If you're interested in learning a bit more about the project, go check out the docs on git-lfs.github.com, where you can see the source code as well. The source is really interesting: it's built in Go, as I mentioned, but it's also a really good example of how to extend Git. So if you're interested in building your own Git hooks and extensions, or your own smudge and clean filters, having a look at how LFS is implemented is a good way to get inspiration. And as I mentioned, it's available on both Bitbucket.org and in Bitbucket Server as of version 4.3. On Bitbucket.org it's available to both free and paid accounts; everyone can use LFS. For example, I use LFS for tracking my slide presentations, because it's a nice way to back up versions in case I screw something up. Thank you very much for your time. I think I've got plenty of time for questions. Yes, mate? Yes. Yeah, so git-annex is definitely a good project as well. git-annex has more support for specifying different remote stores: you can use your local file system as a store, or you can set up SSH remotes, or various other things. The problem with git-annex is you sort of have to do all that configuration yourself as a developer. Because Git LFS delegates authentication to the server, you don't have to know any special Git LFS commands; you can just use your regular push, pull, commit, and checkout, and it will do everything under the hood for you. So I think the big advantage of LFS over annex is that it's turnkey.
It works out of the box, and you don't have to train up the rest of your team on how to use it. Yes, mate? Interestingly, yeah, there are a couple of things going on in the main Git project. I know Christian Couder was working on external object database stores for core Git, so potentially one way that could work is you could store your large objects in an external database, rather than having to use LFS at all. But I think that's still a work in progress. At the Git contributor summit earlier this year, large object storage was the number one topic debated by the Git contributors, so it's definitely high on the minds of Jeff King, Junio, and the rest of the core maintainers. But I don't think we'll be seeing it in the next few releases. Any other questions? I guess? Oh, OK. Nice one. Any other questions? Yes, mate? Is it faster than a single-depth checkout? Do they need to deepen that checkout at some point, like if they want to commit? Gotcha, gotcha. So I know that with single-depth checkouts there are some limitations, but usually it's when you need to write back. As of a relatively recent version of Git, you can commit back and push from single-depth, or shallow, clones. But there are some problems with doing merges, because Git's recursive merge algorithm needs to walk back and find common ancestors and things like that. In your particular case, if you really were just doing a single-depth checkout, it would probably be marginally faster than LFS. There will be some advantages that LFS will bring soon, though. In 1.3, we landed a way to extend how files are transferred from the server, and with Bitbucket we're building a kind of chunked encoding where you only need to fetch the chunks of files that you don't already have.
So potentially, if you're cloning and then fetching later on, or doing multiple clones from the same repository, you'll get better performance with Git LFS. But yeah, if that's not likely to happen, then a single-depth checkout will probably work for your purposes. However, if you're using a CI server, or if you're ever going to be working on it from multiple places, then you might benefit from using Git LFS just to avoid pulling down that full history yourself. Yeah, yeah, fair enough. Cool. Oh, yes, mate. Yes, what is the backup or data migration to a different server story like? Yeah, absolutely. So we are essentially breaking the distributed part of Git with Git LFS. In terms of backup, you could either just back up your entire system, as you're probably already doing, or your particular home directory for something like Bitbucket Server. Or you can set up a specific Git LFS backup: the git lfs fetch command has a --all flag, and if you pass that, it'll fetch all of your LFS content from the server. So if you really want to keep the D in DVCS, you could override your LFS fetch settings to always fetch all of that content. But yeah, that would sort of circumvent the reason you're using LFS in the first place. Absolutely, there are ways you can do full backups of LFS if you need to. Yes, mate? CI systems that support LFS: Bitbucket Pipelines does now, I think. Other than that, even if a CI system doesn't have specific support for LFS, you could always run git lfs fetch as your first command and make sure you have the LFS binary installed. And in fact, a lot of CI systems, unless they're based on something like JGit or libgit2, will probably just work out of the box if they have git-lfs on the path, because the global config will pull down those files automatically. Cool.
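A full-backup workflow along the lines described might look like this; the repository URL is a placeholder, and exact behavior against a bare mirror may vary by LFS version:

```shell
# Mirror the Git history itself
git clone --mirror https://example.com/team/repo.git
cd repo.git

# Then pull down every LFS object the server has for this repo,
# not just the ones needed for recent commits
git lfs fetch --all
```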
All right, thanks very much for your time.