Impaktor

Inject Emacs intravenously

A study in procrastination: setting up gitLFS on gitlab
Published on Sep 07, 2020 by Impaktor.

“Many people delay taking action because they hope to avoid suffering.
They keep searching for a path that won’t involve pain or sacrifice or tradeoffs.

But some form of suffering is always inevitable.
The process of taking action is the process of choosing your pain.”

UPDATE [2024-07-11 Thu]: This article is a half-finished “note to self” on how to move to - and learn - git-lfs, that never came to fruition.

1. Introduction

I’m involved in the development of the “Newtonian physics”-oriented space game Pioneer space sim 1. We use git for version control, which means every clone of the repository is a local copy of the full development history. That way, we can easily check out a file from, e.g., 3 years ago. As new changes are pushed into the repository, git tracks each change by adding it, in effect, as a diff.

Tracking the diffs is easy as long as the source code is in raw text (e.g. C++, Lua, json). A small change results in a small diff. However, when the file is binary, say a png, any change will cause the whole file to be stored again, thus growing the repository by the total size of that file (compression ignored).

This problem has reached its pinnacle now that we have the “luxury-problem” of several new contributions, each adding binary blobs to the repository. Specifically, we are in the process of adding several new ships (Escape pod, Skipjack Courier), as well as planet texturing, and on top of that, it seems like we’ll be getting new music contributions.

As pioneer’s main git-fascist, I have taken it upon myself to read up on git-lfs. The plan is to host binary files on gitlab, as they offer 10GB storage for free. So far, I’ve mainly been procrastinating, although I have set up an account for myself on gitlab and a pioneerspacesim repository (and started this blog as well).

2. How to migrate to git-lfs as end user

2.1. Basics

By “End user” I mean a user who compiles from source, and/or a package maintainer.

gitLFS is seamless. It works by replacing the binary blob (e.g. a png file) with a link (a small text “pointer” file) to the blob on the remote repository, so the git history only stores the link, not the blob. When a revision is checked out, the blobs it needs are downloaded from the remote.
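
To make the “link” concrete: what git stores for an lfs-tracked file is a tiny text pointer. Below is a minimal sketch that builds such a pointer by hand, assuming the v1 pointer format (version, oid, size) and that sha256sum from GNU coreutils is available. demo.png is a made-up stand-in file, and git-lfs itself is not needed to run this:

```shell
# Build by hand the kind of pointer git-lfs commits in place of a blob.
# "demo.png" is a hypothetical stand-in file, not a real image.
tmpdir=$(mktemp -d)
printf 'not really a png' > "$tmpdir/demo.png"

# The pointer records the SHA-256 of the content and its size in bytes.
oid=$(sha256sum "$tmpdir/demo.png" | cut -d' ' -f1)
size=$(wc -c < "$tmpdir/demo.png" | tr -d ' ')

pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s' "$oid" "$size")
printf '%s\n' "$pointer"
```

The three printed lines are all that ends up in the git history; the actual content lives on the lfs server, addressed by that oid.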

For this to work, the end user will simply need to do two steps (once):

  1. Not only git needs to be installed, but also the git-lfs module.
    • On Arch Linux:

      sudo pacman -S git-lfs
      
    • On Debian/Ubuntu:

      sudo apt-get install git-lfs
      
  2. If you’re making a fresh clone of pioneer’s repo, git-lfs will bootstrap itself, and download the lfs files needed for the current branch.

    git clone https://github.com/pioneerspacesim/pioneer.git
    

    Note, there is significant speedup, especially on Windows, by explicitly using lfs:

    git lfs clone
    

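    as this will first do a normal checkout, then batch download all lfs files, dramatically reducing the number of HTTP requests and processes spawned. (Newer git-lfs versions mark git lfs clone as deprecated, since plain git clone has become comparably fast.)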

    Similarly, git pull works seamlessly, but if any file isn’t downloaded properly (for some reason?), try again with git lfs pull. Same with git push, which is seamless, except there will be some additional status output, as lfs tracked files are pushed to the lfs-server.

    If you already have a clone, I assume the new path to the new repo must be added?

    Get the latest lfs objects from the remote repo, e.g. for the master branch on origin:

    git lfs fetch origin master
    

For any future clone of the repository, gitLFS will bootstrap itself in the same way.
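
If a pull leaves some files un-downloaded, they sit in the working tree as small text pointers instead of real content. Here is a rough sketch for spotting them, assuming the standard v1 pointer header. The helper is_lfs_pointer and the two demo files are made up for illustration:

```shell
# Succeeds if the file still looks like an LFS pointer, i.e. its first
# line is the v1 spec header rather than real (binary) content.
is_lfs_pointer() {
    head -c 64 "$1" 2>/dev/null | head -n 1 |
        grep -aqxF 'version https://git-lfs.github.com/spec/v1'
}

# Demo on two hypothetical files: a pointer and some "real" content.
demo=$(mktemp -d)
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%064d\nsize 12345\n' 0 > "$demo/pointer.png"
printf 'some real content\n' > "$demo/real.png"

for f in "$demo"/*.png; do
    if is_lfs_pointer "$f"; then
        echo "$f: still a pointer, try: git lfs pull"
    else
        echo "$f: looks downloaded"
    fi
done
```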

2.2. Advanced / package maintainers

To avoid downloading all lfs files, e.g. when configuring a CI build for unit tests, we can exclude lfs objects using the --exclude or -X flag:

git lfs fetch -X "data/models/**"

Similarly, to include only specific files, use --include or -I, e.g. for an audio engineer:

git lfs fetch -I "*.ogg,*.wav"

Or combine them, e.g. to fetch everything under data/models except the .dds files:

git lfs fetch -I "data/models/**" -X "*.dds"

The pattern syntax is the same as for track and .gitignore. These settings can be made permanent:

git config lfs.fetchinclude "data/models/**"
git config lfs.fetchexclude "*.dds"
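
Since these are ordinary git config keys, a CI job could set them right after cloning and before any lfs fetch. Here is a small sketch in a throwaway repository (the temporary path is only for illustration). It merely sets and reads back the keys, so it runs even without git-lfs installed:

```shell
# Throwaway repo standing in for a fresh CI checkout.
ci_repo=$(mktemp -d)
git init -q "$ci_repo"

# Persist the filters in this clone's .git/config only (no --global).
git -C "$ci_repo" config lfs.fetchinclude "data/models/**"
git -C "$ci_repo" config lfs.fetchexclude "*.dds"

# Read them back to confirm what a later "git lfs fetch" would see.
include=$(git -C "$ci_repo" config --get lfs.fetchinclude)
exclude=$(git -C "$ci_repo" config --get lfs.fetchexclude)
echo "include=$include exclude=$exclude"
```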

3. TODO How to migrate to git-lfs as owner/admin

Note to self: Just FAKKING DO IT, you lazy worthless bum!

3.1. Basic

By “owner” I mean the person on the dev team making the actual transition.

The idea is to migrate our existing git repository (on github) to use an externally hosted LFS (hosted on gitlab, since they allow up to 10 GB for free).

Initiate git-lfs in the repo (assumes git-lfs package is installed on the system):

git lfs install

Mark which files to use lfs for, e.g. (note: the quotes are important! 2):

git lfs track "data/models/*.dae"

Now, when pushing files, there is a pre-push hook that pushes the large files to a separate server.

The track command creates/updates the .gitattributes file (in the same folder the command was issued in 3), which needs to be tracked by git, or else new clones of the repo will not get the lfs-files, so remember to check it in:

git add .gitattributes

Issuing git lfs track without any arguments shows a summary of the patterns in all .gitattributes-files.

To remove a pattern, simply edit the .gitattributes-file or run:

git lfs untrack <pattern>

(Obviously, the files should not be in .gitignore, or they’re not tracked)
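
A handy way to verify which paths a tracking pattern covers is git check-attr, which works with plain git. The sketch below assumes that the line written by hand here is what git lfs track "data/models/*.dae" puts into .gitattributes; foo.dae and the throwaway repository are made up:

```shell
# Throwaway repo with the attribute line written by hand.
attr_repo=$(mktemp -d)
git init -q "$attr_repo"
printf '%s\n' 'data/models/*.dae filter=lfs diff=lfs merge=lfs -text' > "$attr_repo/.gitattributes"

# Ask git which filter applies to a path; the files need not exist.
top=$(git -C "$attr_repo" check-attr filter -- data/models/foo.dae)
nested=$(git -C "$attr_repo" check-attr filter -- data/models/ships/ac33.dae)
echo "$top"      # data/models/foo.dae: filter: lfs
echo "$nested"   # data/models/ships/ac33.dae: filter: unspecified
```

The second result suggests that a single * does not descend into subdirectories, so a pattern like "data/models/**/*.dae" would presumably be needed to catch the models under data/models/ships/.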

3.2. Migrating already existing archive

Now we want to set up github to put (selected) binary files into a repository on gitlab.

“…simply adding the large files that are already in your repository to Git LFS, will not actually reduce the size of your repository because the files are still referenced by previous commits”

Solution: Follow this guide (based on a bitbucket-guide) to:

  1. Clean the git commit history with bfg, which will scan all the history, look for any files with the specified extensions, and convert them to LFS pointers, e.g. (see linked guide for full info):

     bfg --convert-to-git-lfs "*.{png,mp4,jpg,gif}" --no-blob-protection pioneer.git

  2. Then add git lfs tracking rules for new binary files
  3. All developers’ branches are now borked! (And that’s a bad thing!) Possibly, one could still cherry pick? Or convert old commits to patches at the very least?
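
Before rewriting anything, it may help to see which blobs in the history actually take up space, to decide which extensions are worth converting. This is a sketch using plain git plumbing on a throwaway repository; big.png and small.lua are made-up stand-ins, and paths containing spaces would need more careful parsing:

```shell
# Throwaway repo with one "large" and one small file committed.
hist_repo=$(mktemp -d)
git init -q "$hist_repo"
head -c 50000 /dev/zero > "$hist_repo/big.png"
printf 'print("hi")\n' > "$hist_repo/small.lua"
git -C "$hist_repo" add .
git -C "$hist_repo" -c user.name=demo -c user.email=demo@example.invalid commit -q -m "demo"

# List every blob in the history as "<size> <path>", largest first.
largest=$(git -C "$hist_repo" rev-list --objects --all |
    git -C "$hist_repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" { print $2, $3 }' |
    sort -rn | head -n 5)
echo "$largest"
```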

Most likely, we don’t want to do this due to its brütalitÿ. Instead, just do something like in this github guide: Configuring Git Large File Storage to use a third party server, where the third party is gitlab.

  • Disable git lfs on github settings–>options page (here)
  • Check out the repo created on gitlab that is to hold the lfs files, and check its Endpoint:

    git lfs env
    
  • Switch over to the github repo, and create a .lfsconfig that points to the third party server (Endpoint), e.g. as in the gitlab doc:

    git config -f .lfsconfig lfs.url https://third-party-lfs-server/path/to/repo
    
  • Inspect .lfsconfig:

    [lfs]
        url = https://third-party-lfs-server/path/to/repo
    
  • Finally, to have same configuration for all users:

    git add .lfsconfig
    git commit -m "Adding LFS config file for third party server"
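
The .lfsconfig step can be tried out safely in a scratch directory first. This is a minimal sketch using the placeholder URL from above (not a real endpoint); git config -f only edits the named file, so no repository or git-lfs install is needed:

```shell
# Scratch directory standing in for the root of the github clone.
cfg_dir=$(mktemp -d)

# Placeholder URL, as in the docs; replace with the real Endpoint.
git config -f "$cfg_dir/.lfsconfig" lfs.url https://third-party-lfs-server/path/to/repo

# The file is plain git-config syntax; written this way it gets an [lfs] section.
cat "$cfg_dir/.lfsconfig"
lfs_url=$(git config -f "$cfg_dir/.lfsconfig" --get lfs.url)
echo "$lfs_url"
```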
    

WHAT REMAINS?

https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html

skim? https://docs.gitlab.com/ee/administration/lfs/index.html

4. Further good-to-know git-lfs usage

4.1. Fetching extra git LFS history (and getting rid of them)

To download lfs copies for all branch tips that are more recent than 7 days (default):

git lfs fetch --recent

This is useful e.g. when you anticipate working offline, or want to easily cherry pick commits across branches, or rewrite history.

To change default meaning of “recent” to 10 days:

git config lfs.fetchrecentrefsdays 10

Or to be bloody brütal, just fetch all:

git lfs fetch --all

These could then be removed from your local cache by:

git lfs prune

which removes all “old” lfs files, meaning files not in one of:

  • currently checked out commit
  • a commit that has not yet been pushed (to origin, or whatever lfs.pruneremotetocheck is set to)
  • a “recent” commit (default: 10 days, the sum of lfs.fetchrecentrefsdays (7) and lfs.pruneoffsetdays (3)). Can be changed with:

    # don't prune commits younger than four weeks (7 + 21)
    git config lfs.pruneoffsetdays 21
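
The arithmetic behind that “recent” window can be sketched as follows. retention_days is a made-up helper; the fallback values 7 and 3 are the defaults mentioned above, and the sketch assumes neither key is set in your global git config:

```shell
# Effective prune window = lfs.fetchrecentrefsdays + lfs.pruneoffsetdays,
# falling back to the defaults (7 and 3) when a key is unset.
retention_days() {
    recent=$(git -C "$1" config --get lfs.fetchrecentrefsdays || echo 7)
    offset=$(git -C "$1" config --get lfs.pruneoffsetdays || echo 3)
    echo $((recent + offset))
}

# Throwaway repo: first with defaults, then with a 21 day offset.
prune_repo=$(mktemp -d)
git init -q "$prune_repo"
default_days=$(retention_days "$prune_repo")

git -C "$prune_repo" config lfs.pruneoffsetdays 21
custom_days=$(retention_days "$prune_repo")
echo "default=$default_days custom=$custom_days"
```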
    

NOTE:

“Unlike Git’s built-in garbage collection, Git LFS content is not pruned automatically, so running git lfs prune on a regular basis is a good idea to keep your local repository size down.”

Pruning is worthwhile, but perhaps exercise caution and do a dry run first:

git lfs prune --dry-run
git lfs prune --dry-run --verbose

and/or make sure the files have a remote copy before they’re pruned:

git lfs prune --verify-remote

and for peace of mind, make this the default behavior for pruning (note: takes longer time):

git config lfs.pruneverifyremotealways true

4.2. Deleting/removing files

The Git LFS command-line client doesn’t support pruning files from the server, so how you delete them depends on your hosting provider.

Gitlab writes: To remove objects from LFS:

  • Use git filter-repo to remove the objects from the repository.
  • Delete the relevant LFS lines for the objects you have removed from your .gitattributes file and commit those changes

4.3. Finding paths or commits that reference a git LFS object

Some useful commands, if one knows the sha-256 OID of an LFS object:

# find which commit references an lfs object
git log --all -p -S <OID>
# find a particular object by OID in HEAD
git grep <OID> HEAD
# find a particular object by OID on the "power-ups" branch
git grep <OID> power-ups
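
These searches work because the OID appears verbatim in the pointer file that git tracks. Here is a sketch on a throwaway repository with a hand-written pointer (the path data/demo.png and its content are made up), using plain git only:

```shell
# Throwaway repo containing one hand-written LFS pointer.
oid_repo=$(mktemp -d)
git init -q "$oid_repo"
mkdir -p "$oid_repo/data"
demo_oid=$(printf 'fake blob' | sha256sum | cut -d' ' -f1)
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize 9\n' "$demo_oid" > "$oid_repo/data/demo.png"
git -C "$oid_repo" add .
git -C "$oid_repo" -c user.name=demo -c user.email=demo@example.invalid commit -q -m "add demo pointer"

# Which commit introduced the OID, and which path in HEAD holds it?
found_commit=$(git -C "$oid_repo" log --all --format=%H -S "$demo_oid")
found_path=$(git -C "$oid_repo" grep -l "$demo_oid" HEAD)
echo "$found_commit"
echo "$found_path"
```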

4.4. “Merge conflicts” / Locking files

There is no easy way to resolve merge conflicts with binary files. One solution is to lock files by extension or name, to prevent them being overwritten during merge.

First, tell git lfs which files are lockable, e.g.

git lfs track "data/models/*.dae" --lockable

which adds the lockable flag to the pattern’s line in .gitattributes:

data/models/*.dae filter=lfs diff=lfs merge=lfs -text lockable

When making changes to an lfs file, lock it first:

git lfs lock data/models/ships/ac33.dae
Locked data/models/ships/ac33.dae

and to unlock:

git lfs unlock data/models/ships/ac33.dae

Like with push --force, we can force unlocking (be sure you know what you’re doing):

git lfs unlock data/models/ships/ac33.dae --force

4.5. Credentials

Example of how to increase the timeout for authentication when pushing to one hour (src):

git config --global credential.helper 'cache --timeout=3600'

Footnotes:

1

Pioneer started as an open source / free software clone of Frontier Elite II, which was the sequel to the 1980s smash hit (at least in the UK) Elite (I recommend Elite+ if interested in checking it out).

2

Omitting the quotes will cause the wildcard to be expanded by your shell, and individual entries will be created for each file (*.dae in this case) in your current directory. Use the same patterns as for .gitignore (but negative patterns are not supported).

# track all .ogg files in any directory
git lfs track "*.ogg"
# track files named music.ogg in any directory
git lfs track "music.ogg"
# track all files in the Assets directory and all subdirectories
git lfs track "Assets/"
# track all files in the Assets directory but *not* subdirectories
git lfs track "Assets/*"
# track all ogg files in any directory named Music
git lfs track "**/Music/*.ogg"
# track png files containing "xxhdpi" in their name, in any directory
git lfs track "*xxhdpi*.png"

3

Patterns are relative to the directory in which git lfs track command was run. For simplicity, always run git lfs track from the root of your repository.