A study in procrastination: setting up gitLFS on gitlab
Published on Sep 07, 2020 by Impaktor.
“Many people delay taking action because they hope to avoid suffering.
They keep searching for a path that won’t involve pain or sacrifice or tradeoffs.
But some form of suffering is always inevitable.
The process of taking action is the process of choosing your pain.”
UPDATE: This article is a half-finished “note to self” on how to move to (and learn) git-lfs, and it never came to fruition.
1. Introduction
I’m involved in the development of the “Newtonian physics”-oriented space game Pioneer space sim [1]. We use git for version control, which means every clone of the repository is a local copy of the full development history. That way, we can easily check out a file from, e.g., 3 years ago. As new changes are pushed into the repository, git records each of them in that history (conceptually, as a diff against the previous version).
Tracking the diffs is cheap as long as the source code is raw text (e.g. C++, Lua, json): a small change results in a small diff. However, when the file is binary, say a png, even a small edit typically changes most of the file’s bytes, so every new version grows the repository by roughly the total size of that file (compression ignored).
This problem has reached its pinnacle now that we have the “luxury-problem” of several new contributions, each adding binary blobs to the repository. Specifically, we are in the process of adding several new ships (Escape pod, Skipjack Courier), as well as planet texturing, and on top of that, it seems like we’ll be getting new music contributions.
As pioneer’s main git-fascist, I have taken it upon myself to read up on git-lfs. The plan is to host binary files on gitlab, as they offer 10 GB of storage for free. So far, I’ve mainly been procrastinating, although I have set up an account for myself on gitlab and a pioneerspacesim repository there (and started this blog as well).
2. How to migrate to git-lfs as end user
2.1. Basics
By “end user” I mean a user who compiles from source, and/or a package maintainer.
gitLFS is seamless: it works by replacing the binary blob (e.g. a png file) with a small pointer (link) to the blob stored on the remote LFS server, so the local git history only contains the pointer, not the blob. When a revision is checked out, the blob is downloaded from the remote.
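To make the “link” concrete: for an LFS-tracked file, git commits only a tiny text pointer holding the SHA-256 hash (the OID) and the size of the real content. The sketch below builds such a pointer by hand. It assumes GNU coreutils (sha256sum, wc) and the v1 pointer format, and lfs_pointer is a made-up helper name. With git-lfs installed, git lfs pointer --file=<path> should print the real pointer for comparison.

```bash
#!/usr/bin/env bash
# Sketch: build the Git LFS (spec v1) pointer text for a file by hand.
# lfs_pointer is a hypothetical helper; assumes GNU sha256sum and wc.
set -euo pipefail

lfs_pointer() {
  local file="$1" oid size
  # The OID is the SHA-256 hash of the file's full content.
  oid=$(sha256sum "$file" | cut -d' ' -f1)
  # The size is the content length in bytes (tr strips padding some wc versions add).
  size=$(wc -c < "$file" | tr -d ' ')
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

# Print the pointer for each file given on the command line.
for f in "$@"; do
  lfs_pointer "$f"
done
```

The three lines printed are what would end up in the git history instead of, say, a multi-megabyte png.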
For this to work, the end user will simply need to do two steps (once):
- Not only git needs to be installed, but also the git-lfs module. On Arch Linux:
sudo pacman -S git-lfs
On Debian/Ubuntu:
sudo apt-get install git-lfs
If you’re making a fresh clone of pioneer’s repo, git-lfs will bootstrap itself, and download the lfs files needed for the current branch.
git clone https://github.com/pioneerspacesim/pioneer.git
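The install step can be scripted. This minimal sketch only prints the package command for the running distro (read from /etc/os-release, an assumption about the system) and covers just the two distros above; git_lfs_install_cmd is a hypothetical helper. The upstream Git LFS instructions also ask you to run git lfs install once per user account to register the LFS filters in your global git config; depending on the distro package, that may already be done for you.

```bash
#!/usr/bin/env bash
# Sketch: print (not run) the git-lfs install command for this distro.
# git_lfs_install_cmd is hypothetical; only Arch and Debian/Ubuntu are handled.
set -euo pipefail

git_lfs_install_cmd() {
  case "$1" in
    arch)          echo "sudo pacman -S git-lfs" ;;
    debian|ubuntu) echo "sudo apt-get install git-lfs" ;;
    *)             echo "no install recipe for distro '$1'" >&2; return 1 ;;
  esac
}

# The distro ID comes from /etc/os-release, read in a subshell.
if [ -r /etc/os-release ]; then
  distro=$(. /etc/os-release && echo "${ID:-unknown}")
  git_lfs_install_cmd "$distro" || true
fi
# After installing the package, run once per user account:
#   git lfs install
```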
Note: there is a significant speedup, especially on Windows, from explicitly using lfs:
git lfs clone
as this will first do a normal checkout, then batch download all lfs files, dramatically reducing the number of HTTP requests and processes spawned.
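A commonly used way to get the same batch behaviour from a plain git clone is to tell the LFS smudge filter to leave pointer files in place during checkout (the GIT_LFS_SKIP_SMUDGE=1 environment variable), and then download everything in one go with git lfs pull. A sketch, where fast_lfs_clone and the DRY_RUN switch are illustrative names, not part of git-lfs:

```bash
#!/usr/bin/env bash
# Sketch: clone without smudging LFS files, then download them in one batch.
# fast_lfs_clone and DRY_RUN are illustrative, not git-lfs features.
set -euo pipefail

run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

fast_lfs_clone() {
  local url="$1"
  local dir="${2:-$(basename "$url" .git)}"
  # GIT_LFS_SKIP_SMUDGE=1: the checkout leaves LFS pointer files in place.
  run env GIT_LFS_SKIP_SMUDGE=1 git clone "$url" "$dir"
  # Fetch all LFS objects for the checkout in one go and replace the pointers.
  run git -C "$dir" lfs pull
}

if [ "$#" -gt 0 ]; then
  fast_lfs_clone "$@"
fi
```

Run with the pioneer URL as argument, this clones into ./pioneer; prefix it with DRY_RUN=1 to only print the two commands.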
Similarly, git pull works seamlessly, but if any file isn’t downloaded properly (for some reason?), try again with git lfs pull. The same goes for git push, which is seamless, except that there will be some additional status output as lfs-tracked files are pushed to the lfs server.
If you already have a clone, I assume the new path to the new repo must be added?
Get the latest lfs objects from a remote, e.g. for the master branch on origin:
git lfs fetch origin master
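For the “not downloaded properly” case mentioned above, git lfs ls-files marks each file with * when the real content is in the working tree and with - when only the pointer is. The sketch below filters that output to list the files still missing their content. The line format (short OID, marker, path) is an assumption based on current git-lfs releases, and missing_lfs_files is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: list LFS-tracked files whose content has not been downloaded.
# Assumes `git lfs ls-files` lines look like "<short-oid> <*|-> <path>".
# missing_lfs_files is a hypothetical helper name.
set -euo pipefail

missing_lfs_files() {
  # Keep lines marked "-" (pointer only), then strip "<oid> - " so that
  # paths containing spaces survive intact.
  awk '$2 == "-" { sub(/^[^ ]+ - /, ""); print }'
}

# Usage inside a clone (retry the download only if something is missing):
#   [ -z "$(git lfs ls-files | missing_lfs_files)" ] || git lfs pull
```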
For future clones of the repositories, gitLFS will bootstrap itself when you clone a repo.
2.2. Advanced / package maintainers
To avoid downloading all lfs files, e.g. when configuring a CI build for unit tests, we can exclude lfs objects using the --exclude (or -X) flag:
git lfs fetch -X "data/models/**"
Similarly, to include only specific files, use --include (or -I), e.g. for an audio engineer:
git lfs fetch -I "*.ogg,*.wav"
Or combine them, e.g. to fetch everything under data/models except .dds files:
git lfs fetch -I "data/models/**" -X "*.dds"
The patterns use the same syntax as for track and .gitignore. These can also be made permanent:
git config lfs.fetchinclude "data/models/**"
git config lfs.fetchexclude "*.dds"
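The permanent filters are plain git config keys, stored per clone in .git/config, which git-lfs reads on later fetches. The sketch below sets them in a throwaway repository (so git-lfs is not needed to try it) and reads them back:

```bash
#!/usr/bin/env bash
# Sketch: store the include/exclude filters in a throwaway clone's config
# and read them back. No git-lfs needed: these are ordinary git config keys.
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"

# Same settings as above, written to "$repo/.git/config".
git -C "$repo" config lfs.fetchinclude "data/models/**"
git -C "$repo" config lfs.fetchexclude "*.dds"

# What later `git lfs fetch` runs in this clone should pick up.
git -C "$repo" config --get lfs.fetchinclude
git -C "$repo" config --get lfs.fetchexclude
```

Adding --global to the git config calls would instead apply the filters to every clone of that user, which is probably not what you want for a CI-only tweak.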
3. TODO How to migrate to git-lfs as owner/admin
Note to self: Just FAKKING DO IT, you lazy worthless bum!
3.1. Basic
By “owner” I mean the person on the dev team making the actual transition.
The idea is to migrate our existing git repository (on github) to use an externally hosted LFS (hosted on gitlab, since they allow up to 10 GB for free).
Initialize git-lfs in the repo (this assumes the git-lfs package is installed on the system):
git lfs install
Mark which files to use lfs for, e.g. (note: the quotes are important! [2]):
git lfs track "data/models/*.dae"
Now, when pushing, a pre-push hook (installed by git lfs install) uploads the large files to a separate server.
The track command creates/updates the .gitattributes file (in the same folder the command was issued in [3]). This file needs to be tracked by git, otherwise Git LFS will not work properly for people cloning the project (new clones will not get the lfs files), so remember to check it in:
git add .gitattributes
Issuing git lfs track without any arguments shows a summary of the patterns in all .gitattributes files.
To remove a pattern, simply edit the .gitattributes file, or run:
git lfs untrack <pattern>
(Obviously, the files must not be in .gitignore, or they won’t be tracked at all.)
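As a rough picture of what git lfs track (without arguments) summarises: each tracked pattern is a .gitattributes line carrying filter=lfs, e.g. data/models/*.dae filter=lfs diff=lfs merge=lfs -text. The sketch below lists those patterns from a single .gitattributes file. lfs_patterns is a hypothetical helper; unlike the real command, it assumes unquoted patterns and ignores nested .gitattributes files.

```bash
#!/usr/bin/env bash
# Sketch: list the patterns that a .gitattributes file routes through Git LFS.
# lfs_patterns is hypothetical: it reads one file only and assumes
# unquoted patterns (the form `git lfs track` normally writes).
set -euo pipefail

lfs_patterns() {
  # Skip comment lines; print the pattern (field 1) of any line with filter=lfs.
  awk '$1 ~ /^#/ { next }
       { for (i = 2; i <= NF; i++) if ($i == "filter=lfs") { print $1; break } }' \
    "${1:-.gitattributes}"
}

# When run from the repository root, show what is tracked there.
if [ -f .gitattributes ]; then
  lfs_patterns .gitattributes
fi
```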
3.2. Migrating already existing archive
Now we want to set up github to put (selected) binary files into a repository on gitlab.
“…simply adding the large files that are already in your repository to Git LFS, will not actually reduce the size of your repository because the files are still referenced by previous commits”
Solution: Follow this guide (based on a bitbucket-guide) to:
- Clean the git commit history with bfg (see the linked guide for full info), which scans the whole history for files with the specified extensions and converts them to LFS pointers, e.g.:
bfg --convert-to-git-lfs "*.{png,mp4,jpg,gif}" --no-blob-protection pioneer.git
- Then add git lfs tracking rules to new binary files
- All developers’ branches are now borked! (And that’s a bad thing!) Possibly, one could still cherry-pick? Or at the very least convert old commits to patches?
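For the record, the bfg route as a sketch, following the usual BFG routine: rewrite a fresh --mirror clone, then expire the reflog and run gc so the old blobs are actually dropped. It assumes a bfg wrapper on the PATH (otherwise java -jar bfg.jar), rewrite_to_lfs is a made-up name, and it only prints the commands unless DRY_RUN=0 is set. Pushing the rewritten history is deliberately left out.

```bash
#!/usr/bin/env bash
# Sketch of the bfg history rewrite (assumes a `bfg` wrapper on PATH).
# rewrite_to_lfs is a made-up name. Commands are only printed unless DRY_RUN=0.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

rewrite_to_lfs() {
  local url="$1" mirror="${2:-pioneer.git}"
  # Work on a bare mirror clone so all branches and tags get rewritten.
  run git clone --mirror "$url" "$mirror"
  # Replace matching blobs in every commit with LFS pointers.
  run bfg --convert-to-git-lfs "*.{png,mp4,jpg,gif}" --no-blob-protection "$mirror"
  # Drop reflog entries and unreachable objects so the old blobs really disappear.
  run git -C "$mirror" reflog expire --expire=now --all
  run git -C "$mirror" gc --prune=now --aggressive
  # Pushing the rewritten history (and the LFS objects) is left as a manual step.
}

if [ "$#" -gt 0 ]; then
  rewrite_to_lfs "$@"
fi
```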
Most likely, we don’t want to do this due to its brütalitÿ. Instead, just do something like in this github guide, “Configuring Git Large File Storage to use a third party server”, where the third party is gitlab:
- Disable git lfs on github settings–>options page (here)
Check out the repo created on gitlab that is to hold the lfs files, and check its Endpoint:
git lfs env
Switch over to the github repo and create a .lfsconfig that points to the third-party server (the Endpoint above), e.g. as in the gitlab docs:
git config -f .lfsconfig lfs.url https://third-party-lfs-server/path/to/repo
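Inspect .lfsconfig:
[lfs]
	url = https://third-party-lfs-server/path/to/repo
(The github guide sets remote.origin.lfsurl instead, which shows up as [remote "origin"] lfsurl = https://third-party-lfs-server/path/to/repo; git-lfs reads either key.)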
Finally, commit it so that all users get the same configuration:
git add .lfsconfig
git commit -m "Adding LFS config file for third party server"
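To see what the git config -f command actually writes, the sketch below runs it against a throwaway directory (-f edits the named file, so neither a repository nor git-lfs is needed) and prints the result; the URL is the placeholder from the text, not a real endpoint.

```bash
#!/usr/bin/env bash
# Sketch: write a .lfsconfig in a throwaway directory and inspect it.
# The URL is the placeholder from the text, not a real endpoint.
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
url="https://third-party-lfs-server/path/to/repo"

# -f makes git config edit this file instead of .git/config.
git config -f "$dir/.lfsconfig" lfs.url "$url"

# Expect an [lfs] section with a url key, the file that gets committed.
cat "$dir/.lfsconfig"
git config -f "$dir/.lfsconfig" --get lfs.url
```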
WHAT REMAINS?
- Find the lfs server url (git lfs env on the gitlab repo?)
- Migrate the files: https://docs.github.com/enterprise/2.16/admin/guides/installation/migrating-to-a-different-git-large-file-storage-server
  https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html
- Skim? https://docs.gitlab.com/ee/administration/lfs/index.html
4. Further good-to-know git-lfs usage
4.1. Fetching extra git LFS history (and getting rid of it)
To download lfs copies for all branch tips that are more recent than 7 days (default):
git lfs fetch --recent
This is useful e.g. when you anticipate working offline, or want to easily cherry-pick commits across branches, or rewrite history.
To change the default meaning of “recent” to 10 days:
git config lfs.fetchrecentrefsdays 10
Or to be bloody brütal, just fetch all:
git lfs fetch --all
These can then be removed from your local cache with:
git lfs prune
which removes all “old” lfs files, meaning files not referenced by any of:
- currently checked out commit
- a commit that has not yet been pushed (to origin, or whatever lfs.pruneremotetocheck is set to)
- a “recent” commit (default: 10 days, the sum of lfs.fetchrecentrefsdays (7) and lfs.pruneoffsetdays (3)). This can be changed with:
# don't prune commits younger than four weeks (7 + 21)
git config lfs.pruneoffsetdays 21
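The retention arithmetic above as a small sketch: it assumes the kept window is lfs.fetchrecentrefsdays plus lfs.pruneoffsetdays, falls back to the defaults (7 and 3) when they are unset, and prune_window_days is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: how many days of "recent" refs `git lfs prune` keeps, assuming
# retention = lfs.fetchrecentrefsdays + lfs.pruneoffsetdays (defaults 7 and 3).
set -euo pipefail

prune_window_days() {
  local recent="${1:-7}" offset="${2:-3}"
  echo $(( recent + offset ))
}

# Read the current settings, falling back to the defaults when unset.
recent=$(git config --get lfs.fetchrecentrefsdays 2>/dev/null || echo 7)
offset=$(git config --get lfs.pruneoffsetdays 2>/dev/null || echo 3)
echo "recent refs younger than $(prune_window_days "$recent" "$offset") days are kept"
```

With lfs.pruneoffsetdays set to 21 as above, this reports 28 days, matching the “four weeks” comment.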
NOTE:
“Unlike Git’s built-in garbage collection, Git LFS content is not pruned automatically, so running git lfs prune on a regular basis is a good idea to keep your local repository size down.”
Pruning is worthwhile, but perhaps exercise caution and preview it first:
git lfs prune --dry-run
git lfs prune --dry-run --verbose
and/or make sure the files have a remote copy before they’re pruned:
git lfs prune --verify-remote
and, for peace of mind, make this the default behavior for pruning (note: this takes longer):
git config lfs.pruneverifyremotealways true
4.2. Deleting/removing files
The Git LFS command-line client doesn’t support pruning files from the server, so how you delete them depends on your hosting provider.
Gitlab writes: To remove objects from LFS:
- Use git filter-repo to remove the objects from the repository.
- Delete the relevant LFS lines for the objects you have removed from your .gitattributes file and commit those changes
4.3. Finding paths or commits that reference a git LFS object
Some useful commands, if one knows the SHA-256 OID of an LFS object:
# find which commit references an lfs object
git log --all -p -S <OID>
# find a particular object by OID in HEAD
git grep <OID> HEAD
# find a particular object by OID on the "power-ups" branch
git grep <OID> power-ups
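These commands work because a commit only stores the pointer text, so the OID is an ordinary string in the history and git log -S / git grep can find it without git-lfs. A sketch on a throwaway repository with a hand-written pointer; the OID and the file name are made-up example values.

```bash
#!/usr/bin/env bash
# Sketch: locate a (made-up) LFS OID in history with plain git commands.
# The OID is an example value; no git-lfs is needed because only the
# pointer text is committed.
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
oid="4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393"

# Commit a hand-written pointer file, as git-lfs would have stored it.
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize 12345\n' "$oid" \
  > "$repo/ac33.dae"
git -C "$repo" add ac33.dae
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "add ship model"

# Which commit added (or removed) this OID?
git -C "$repo" log --all --format='%h %s' -S "$oid"
# Which file in HEAD references it? (printed as HEAD:<path>)
git -C "$repo" grep -l "$oid" HEAD
```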
4.4. “Merge conflicts” / Locking files
There is no easy way to resolve merge conflicts with binary files. One solution is to lock files by extension or name, so that only one person edits a given file at a time and the conflict never arises in the first place.
First, tell git lfs which files are lockable, e.g.
git lfs track "data/models/*.dae" --lockable
which adds the lockable attribute to that pattern’s line in .gitattributes (commit it as usual):
"data/models/*.dae" filter=lfs diff=lfs merge=lfs -text lockable
When making changes to an lfs file, lock it first:
git lfs lock data/models/ships/ac33.dae
which should respond with:
Locked data/models/ships/ac33.dae
and to unlock:
git lfs unlock data/models/ships/ac33.dae
Like with push --force, we can force unlocking (be sure you know what you’re doing):
git lfs unlock data/models/ships/ac33.dae --force
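Whether a path is covered by a lockable pattern can be checked with git check-attr, which answers from .gitattributes alone, without git-lfs. Note that in .gitattributes a * does not match /, so data/models/*.dae does not cover files in data/models/ships/; the sketch therefore uses data/models/**/*.dae. is_lockable is a hypothetical helper, and the repository is a throwaway one.

```bash
#!/usr/bin/env bash
# Sketch: ask git whether a path matches a "lockable" pattern in .gitattributes
# (the attribute `git lfs track --lockable` adds). is_lockable is hypothetical.
set -euo pipefail

is_lockable() {
  # git check-attr prints "<path>: lockable: set" when the attribute applies.
  [ "$(git check-attr lockable -- "$1" | awk -F': ' '{print $3}')" = "set" ]
}

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git init -q "$repo"
# "**/" lets the pattern reach subdirectories such as data/models/ships/.
printf '%s\n' 'data/models/**/*.dae filter=lfs diff=lfs merge=lfs -text lockable' \
  > "$repo/.gitattributes"

cd "$repo"
for path in data/models/ships/ac33.dae data/music/theme.ogg; do
  if is_lockable "$path"; then
    echo "$path: lock before editing (git lfs lock $path)"
  else
    echo "$path: not lockable"
  fi
done
```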
4.5. Credentials
Example of how to cache your credentials for one hour (3600 seconds), so that you don’t have to authenticate again on every push (src):
git config --global credential.helper 'cache --timeout=3600'
Footnotes:
[1] Pioneer started as an open source / free software clone of Frontier: Elite II, which was the sequel to the 1980s smash hit (at least in the UK) Elite (I recommend Elite+ if you’re interested in checking it out).
[2] Omitting the quotes will cause the wildcard to be expanded by your shell, and individual entries will be created for each file (*.dae in this case) in your current directory. Use the same patterns as for .gitignore (but negative patterns are not supported).
# track all .ogg files in any directory
git lfs track "*.ogg"
# track files named music.ogg in any directory
git lfs track "music.ogg"
# track all files in the Assets directory and all subdirectories
git lfs track "Assets/"
# track all files in the Assets directory but *not* subdirectories
git lfs track "Assets/*"
# track all ogg files in any directory named Music
git lfs track "**/Music/*.ogg"
# track png files containing "xxhdpi" in their name, in any directory
git lfs track "*xxhdpi*.png"
[3] Patterns are relative to the directory in which the git lfs track command was run. For simplicity, always run git lfs track from the root of your repository.