Git worktrees

Git is an important part of my daily life. Professionally as well as privately, I use it to manage documents and all kinds of files, keep track of changes and synchronize them between my environments.

Working alone in my own repositories, I can commit my changes to the master branch all day long. This works well because I am the only one working in them.

In a customer environment, this is different. Other workflows may be in place that require branching, merge/pull requests and a whole lot of other shenanigans.

These environments are often more fluid, and tasks may change on a daily basis. This requires more flexibility in handling code changes and development as well.

I often have to switch between tasks and temporarily leave code half-finished until later.

Stash and apply

Running a task in a side branch of master is no biggie. The workflow usually looks similar to this:

cd ~/repo1
git checkout master
git pull --rebase origin master
git checkout -b new_feature_branch

These steps update the master branch to the latest commit, then create a new branch based on it, named new_feature_branch (creative, I know), and switch to it.

As long as I am only developing the new feature, I am fine. All my changes are in my current branch and folder structure, and I can focus on my task. But this is not always the case.

Colleagues or messages might suddenly turn up with requests that touch the code I am developing. This puts me in a bit of a pickle as soon as I have to check out a different branch.

A quick way of handling this is to stash all changes with git stash, check out the other branch and take care of the request, then head back into new_feature_branch and run git stash apply (or git stash pop, which also drops the entry from the stash). These commands push my uncommitted changes onto an internal stack and restore them from it, without my having to create commits that I might have to squash later.

More complex requests involving multiple tasks also make handling git stash more complex, and I need to remember which changes came from where.
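To make the stash-and-apply round trip concrete, here is a small sandbox script. It is a sketch under assumptions, not a prescribed procedure: it builds a throwaway repository in a temporary directory, needs Git 2.28 or later (for `git init -b`), and the file names, commit messages and the "hotfix" step are hypothetical placeholders. Giving each stash a message with `git stash push -m` is one way to keep the entries apart when several tasks pile up.

```shell
#!/usr/bin/env bash
# Sandbox sketch of the stash-and-apply round trip.
# Assumptions: Git >= 2.28 (for `git init -b`); all names are placeholders.
set -eu

# Placeholder identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q -b master "$demo/repo1"
cd "$demo/repo1"
echo "v1" > testfile
git add testfile
git commit -q -m "Initial commit"

# Start the feature and leave it half-finished (uncommitted).
git checkout -q -b new_feature_branch
echo "half-finished work" >> testfile

# An urgent request arrives: park the changes with a descriptive message.
git stash push -q -m "new_feature: half-finished testfile edit"
git stash list    # one entry, labelled with the message above

# Handle the request on master.
git checkout -q master
echo "fix" > hotfix.txt
git add hotfix.txt
git commit -q -m "Hotfix"

# Back to the feature: restore the parked changes and drop the stash entry.
git checkout -q new_feature_branch
git stash pop -q
```

With several entries on the stack, git stash apply stash@{1} restores a specific one without dropping it.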

Worktrees Step 1

Using worktrees in Git is one way of solving this problem.

The idea is to check out each branch you are working on into a separate directory, all at the same time. Just by switching between the directories, I can switch branches. No need to stash and apply, no need to keep track of my changes.

My first contact with worktrees was the idea of temporarily moving a branch of a Git repository out into a separate directory when needed, leaving the original repository intact. Given a small test repository with a test file inside, the procedure looked like this:

$ cd ~
$ cd my_repo
[master]$ git worktree add -b new_feature ../my_repo_new_feature
[master]$ cd ../my_repo_new_feature
[new_feature]$

This created the following file structure:

$ tree ~/my_repo*
my_repo  # master branch
└── testfile
my_repo_new_feature  # new_feature branch
└── testfile

The home directory now contains two directories with the same Git repository, but with different branches checked out.

The Git repository even knows and keeps track of the existing worktrees.

[master]$ git worktree list
~/my_repo              65e175d [master]
~/my_repo_new_feature  65e175d [new_feature]

If I tried to check out the branch new_feature again in the original directory, Git would not let me.

[master]$ git checkout new_feature
fatal: 'new_feature' is already used by worktree at '~/my_repo_new_feature'
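The Step 1 experiment can be reproduced in a throwaway sandbox. This sketch assumes Git 2.28 or later; the temporary directory stands in for the home directory and the commit identity is a placeholder. The exact wording of the final error message differs between Git versions, so the script only checks that the second checkout is refused.

```shell
#!/usr/bin/env bash
# Sandbox sketch of "Worktrees Step 1": a second branch in a sibling directory.
# Assumptions: Git >= 2.28; a temporary directory stands in for $HOME.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q -b master "$demo/my_repo"
cd "$demo/my_repo"
echo "test" > testfile
git add testfile
git commit -q -m "Initial commit"

# Create branch new_feature and check it out in ../my_repo_new_feature.
git worktree add -b new_feature ../my_repo_new_feature

# Both directories are registered with the same repository.
git worktree list

# A branch can only be checked out in one worktree at a time.
if git checkout -q new_feature 2>/dev/null; then
    echo "unexpected: new_feature was checked out twice" >&2
    exit 1
fi
```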

Working like this made my life more complicated and did not really help me. It was good to know that there was a way of accessing other branches if I had to, but keeping the extra directories organized and keeping track of which repository they belonged to stopped me from using worktrees this way.

And then it made sense

The breakthrough came with an obvious change to the structure of the directories:

  1. Clone the repository as a bare repository.
  2. Keep the branches as sub-directories.

In this case, ~/my_repo is the source repository. Whether it is local or remote does not matter at this point, nor does it matter whether the source repository is bare or regular.

$ cd
$ tree my_repo
my_repo
└── testfile
$ git clone --bare ~/my_repo ~/my_repo_clone
Cloning into bare repository 'my_repo_clone'
$ cd my_repo_clone
$ git worktree list
~/my_repo_clone (bare)
$ git worktree add master
Preparing worktree (checking out 'master')
HEAD is now at 02b4f71 Initial commit
$ git worktree add new_feature
Preparing worktree (checking out 'new_feature')
$ cd master
[master]$ ls
testfile
[master]$ cd ../new_feature
[new_feature]$ ls
testfile

The directory structure in ~/my_repo_clone now looks like this:

$ pwd
~/my_repo_clone
$ tree .
[...] # omitting all Git stuff
├── master         # Master branch
│   └── testfile
└── new_feature    # new_feature branch
    └── testfile

Now I can have multiple branches checked out at the same time, work in them, and commit and push from them as I usually would.
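The bare-repository layout can be scripted the same way. As a sketch under the same assumptions (throwaway directory, placeholder identity, Git 2.28 or later), the source repository is created locally with a new_feature branch, so that both git worktree add calls find an existing branch to check out.

```shell
#!/usr/bin/env bash
# Sandbox sketch of the bare-clone layout: one sub-directory per branch.
# Assumptions: Git >= 2.28; all paths live in a throwaway temporary directory.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)

# Source repository with a master and a new_feature branch.
git init -q -b master "$demo/my_repo"
echo "test" > "$demo/my_repo/testfile"
git -C "$demo/my_repo" add testfile
git -C "$demo/my_repo" commit -q -m "Initial commit"
git -C "$demo/my_repo" branch new_feature

# Bare clone: only the Git data, no working tree of its own.
git clone -q --bare "$demo/my_repo" "$demo/my_repo_clone"
cd "$demo/my_repo_clone"

# With no branch argument, each worktree checks out the existing
# branch named after its directory.
git worktree add master
git worktree add new_feature
git worktree list
```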

Drawbacks

Nothing is perfect. So far I have come across a few challenges that you should think about before adopting this approach.

  1. Running code

    Depending on how you have structured your environment and how closely you follow standard Ansible practices, this alternative directory layout can be challenging. In any case, you introduce at least one additional level of directories to manage. But there is hope: a useful structure can separate your running code on the master branch from your development branches.

  2. Cleaning up worktrees

    Each checked-out branch is represented by a directory somewhere in the file system. Technically you can just delete the directory and the checkout is gone, at least from the file system; the branch itself still exists in the repository. Git will not let you check out the same branch in another directory until you have properly removed the worktree entry with git worktree remove $dir, even if the directory no longer exists.

    The file system might also get cluttered with checked-out branches. Normally only one branch is checked out at a time; worktrees explicitly allow several, so you need to clean up when a branch is no longer required or has been deleted on the remote.

  3. Mirror cloning

    Placing worktrees in sub-directories would also work inside a regular (non-bare) Git repository, but it will probably lead to confusion.

    Creating a worktree in a sub-directory adds the worktree as shown in the example above, but the new directory also shows up as an untracked directory in the root of the main checkout. This kind of nesting does not seem useful at all and only creates a mess.

    When cloning a repository from a remote, you should therefore add the --bare parameter so that the local copy is created as a bare repository. This reduces the risk of mixing worktree sub-directories with the actual code.
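Cleaning up can be tried out in a sandbox as well. The sketch below, again built on a throwaway bare clone with the hypothetical branch names feature1 and feature2, contrasts removing a worktree through Git with deleting its directory by hand. In the second case, git worktree prune clears the stale bookkeeping so the branch can be deleted afterwards.

```shell
#!/usr/bin/env bash
# Sandbox sketch of cleaning up worktrees in a bare clone.
# Assumptions: Git >= 2.28; feature1/feature2 are hypothetical branch names.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q -b master "$demo/my_repo"
echo "test" > "$demo/my_repo/testfile"
git -C "$demo/my_repo" add testfile
git -C "$demo/my_repo" commit -q -m "Initial commit"
git clone -q --bare "$demo/my_repo" "$demo/my_repo_clone"
cd "$demo/my_repo_clone"

# New branches feature1 and feature2, each in its own sub-directory.
git worktree add feature1
git worktree add feature2

# Clean way: Git deletes the directory and its bookkeeping together,
# then the (fully merged) branch itself can be deleted.
git worktree remove feature1
git branch -d feature1

# Directory deleted by hand: Git still keeps an entry for it ...
rm -rf feature2
# ... until the entries of vanished worktrees are pruned.
git worktree prune
git branch -d feature2

git worktree list    # only the bare repository itself is left
```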

Example file structure

Worktrees let you keep the master branch of your Git repository permanently checked out while development goes on at the same time.

In the following example directory structure, I separate my dev environment from my production environment. This particular example targets code for the Ansible Automation Platform. The code on the master branches is what gets run against any number of target hosts, while the development section contains repositories in any state of development, with any number of features in progress.

$ tree
.
├── dev
│   ├── repo1
│   │   ├── feature1
│   │   └── feature2
│   └── repo2
│       ├── feature1
│       ├── feature2
│       └── feature3
└── prod
    ├── repo1  # master
    └── repo2  # master

Whenever I get a request to implement changes or develop new features, I create a new branch by adding a worktree to the relevant repository in the dev folder with git worktree add <featurename>.

When I am done, I push the changes back to the Git remote, merge them into master once all tests pass, and can then update my prod environment.
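The dev/prod workflow can be sketched end to end. This is one possible reading of the layout above, not a definitive setup: a local bare repository (origin.git) stands in for the Git server, prod/repo1 is assumed to be the master worktree of the bare clone in dev/repo1, and pushing feature1 directly onto master stands in for a merge request that passed its tests. Because git clone --bare copies the remote branches straight onto local branches and creates no remote-tracking branches, the script restores the usual fetch refspec first.

```shell
#!/usr/bin/env bash
# Sandbox sketch of the dev/prod layout and the feature workflow.
# Assumptions: Git >= 2.28; origin.git stands in for the real Git server,
# prod/repo1 is modelled as the master worktree; all names are placeholders.
set -eu

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)

# Hypothetical remote with one commit on master.
git init -q -b master "$top/seed"
echo "- hosts: all" > "$top/seed/site.yml"
git -C "$top/seed" add site.yml
git -C "$top/seed" commit -q -m "Initial commit"
git init -q --bare -b master "$top/origin.git"
git -C "$top/seed" push -q "$top/origin.git" master

# dev/repo1: bare clone. --bare creates no remote-tracking branches,
# so restore the usual fetch refspec and fetch once.
mkdir -p "$top/dev" "$top/prod"
git clone -q --bare "$top/origin.git" "$top/dev/repo1"
git -C "$top/dev/repo1" config remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'
git -C "$top/dev/repo1" fetch -q origin

# prod/repo1: the master worktree that the running code comes from.
git -C "$top/dev/repo1" worktree add "$top/prod/repo1" master

# New request: a feature worktree (and new branch) inside dev/repo1.
git -C "$top/dev/repo1" worktree add feature1

# Develop, commit and push the feature branch.
cd "$top/dev/repo1/feature1"
echo "- name: new task" >> site.yml
git commit -q -am "Add feature1"
git push -q -u origin feature1

# Stand-in for a merge request that passed its tests: fast-forward master.
git push -q origin feature1:master

# Update prod from the remote.
cd "$top/prod/repo1"
git pull -q --ff-only origin master
```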

This works in this particular structure and in this particular case, where code is run manually. If you use Ansible Automation Platform, AWX or other similar solutions, you can omit the master worktree and focus only on the development part.


This is just an example file structure which makes sense for a particular customer using Ansible code.

I structure other environments differently, depending entirely on their requirements and on what works best for me.

I hope this feature gives you a new perspective and option on how to use Git for your purposes.

Daniel Buøy-Vehn

Senior Systems Consultant at Redpill Linpro

Daniel works with automation in the realm of Ansible, AWX, Tower, Terraform and Puppet. He mainly goes out to our customers in Norway to assist them with integration and automation projects.
