Git has seen huge adoption over the past few years, so it’s no wonder that a number of suggested workflows have emerged for it. What git lacks in user-friendliness it certainly makes up for in flexibility, and it’s this flexibility that has produced such a wide range of possible use cases.

I’ve spent a while working with git myself, and during that time I’ve seen a few workflows. Different workflows clearly suit different situations. The recommendations for working on the git source itself, for example, are based on the fact that multiple versions need to be maintained. There is thus a “graduation” system where changes are first merged into the oldest branch requiring support and then gradually merged upwards to later branches, until a commit finally makes it onto the master branch.

This flow doesn’t really apply to most projects, and it’s not unreasonable to say that many of git’s features are unnecessary for the typical developer. That’s the wonder of git, though: you can customize it to suit your needs.

Workflow

At Causes, we use git in a perhaps unconventional, but simple, way. Whenever we start on a task, we create a new branch to encapsulate all the changes that are part of that task (you might have heard this referred to as a “topic” or “feature” branch). While you work on it, other developers will update the canonical source (origin/master), so we rebase the feature branch on top of origin/master often. That way any conflicts are addressed earlier rather than later. This is in contrast to the merge option, where you use a merge commit to address all conflicts at a single point in time. After commits are approved via our code review tool, Gerrit, they are cherry-picked serially onto origin/master.

Conceptually, this means our commit history is perfectly linear, as any chunk of work is simply a series of commits that is appended to the history when it is merged onto master. There are no “bubbles” or branches of any kind in the output of git log --graph.
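To make the rebase step concrete, here is a minimal sketch that plays the flow out in a throwaway repository. The paths, the branch name my-task, and the file names are made up for the demo, and the Gerrit review and cherry-pick step is not simulated; the other developer's change is faked by committing on master and pushing.

```shell
#!/usr/bin/env bash
set -eu

# Scratch setup: a bare "origin" plus a working clone (throwaway paths)
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/clone" 2>/dev/null
cd "$work/clone"
git config user.name "Example" && git config user.email "example@example.com"
git symbolic-ref HEAD refs/heads/master
echo base > base.txt && git add base.txt && git commit -q -m "Initial commit"
git push -q origin master

# Start a task on its own topic branch (my-task is a hypothetical name)
git checkout -q -b my-task origin/master
echo work > task.txt && git add task.txt && git commit -q -m "Add task file"

# Meanwhile, someone else's change lands on origin/master (simulated)
git checkout -q master
echo other > other.txt && git add other.txt && git commit -q -m "Someone else's change"
git push -q origin master

# Back on the topic branch: fetch, then rebase onto the updated origin/master
git checkout -q my-task
git fetch -q origin
git rebase -q origin/master

# The history is a straight line with no merge commits
git log --oneline --graph
```

After the rebase, the topic commit sits directly on top of origin/master, which is why cherry-picking it onto master afterwards leaves no "bubbles" in the graph.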

Experts might find this to be a great injustice, as you lose information about where branches started and when they were merged back in, but we haven’t found this to be a significant loss. We believe the true value of a repository history lies in well-written commit messages. We also believe that long-lived topic branches are to be avoided, as we aim to push something to master by the end of every day.

Topic Branch Dependency

A situation that sometimes arises is working on a topic branch (call it feature-2) that builds on top of another topic branch (feature-1) which has not yet been merged onto origin/master (due to pending code review, for example).

In order to continue being productive, you want to continue working on feature-2, but it’s likely that you’ll have to make some changes to feature-1 after feedback from code review, or you’ll have to rebase feature-1 onto a new master after fetching origin/master, or both.

This requires you to rebase each branch serially in order to preserve the references of each branch. To see why, consider what would happen if you were to just rebase feature-2 onto an updated master using git rebase master feature-2 (notice in the figure below that the colours of the commits are modified to emphasize that their hashes have changed from the rebase):

Figure: Before and after a straightforward rebase

Notice that the commit feature-1 points to is no longer an ancestor of feature-2. The rebase replayed feature-1’s commits as part of feature-2, so feature-2 now sits on copies of those commits with different hashes, while feature-1 itself still points at the original. To fix this, you would need to manually set feature-1 to point to the new commit (git checkout feature-1 && git reset --hard feature-2~2).
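The stale pointer is easy to reproduce in a scratch repository. The sketch below is only an illustration: the commit helper and the file names are invented for the demo, feature-2 is given exactly two commits of its own so that feature-2~2 is the right target, and the rebase uses the two-argument form git rebase master feature-2.

```shell
#!/usr/bin/env bash
set -eu

repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# Demo-only helper: commit a new file named after the commit message
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit base
git checkout -q -b feature-1 && commit f1-a
git checkout -q -b feature-2 && commit f2-a && commit f2-b
git checkout -q master && commit upstream-change   # master moves ahead

# Rebase only feature-2 onto the new master
git rebase -q master feature-2

# feature-1 still points at the old commit, so it fell out of feature-2's history
if git merge-base --is-ancestor feature-1 feature-2; then
  still_ancestor=yes
else
  still_ancestor=no
fi
echo "feature-1 ancestor of feature-2 after rebase? ${still_ancestor}"

# Manual repair: move feature-1 to the rewritten copy of its tip commit
git checkout -q feature-1 && git reset -q --hard feature-2~2
git merge-base --is-ancestor feature-1 feature-2 && echo "repaired"
```

The ~2 in the repair depends on how many commits feature-2 has on top of feature-1, which is what makes the manual fix easy to get wrong.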

Alternatively, you could avoid this problem by rebasing each branch in serial, starting from master:

git fetch origin
git rebase origin/master master
git rebase master feature-1
git rebase feature-1 feature-2

Managing Large Dependency Chains

This becomes annoying in the case where you have very large dependency chains, as you have to traverse the chain up to master and then recursively rebase each branch onto its rebased parent in serial.
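If the chain is known up front, the serial rebase amounts to a loop. Here is a rough sketch in a scratch repository, assuming a three-branch chain with made-up names and a demo-only commit helper. There is no remote here, so the first step of a real run (bringing master up to date with origin/master) is replaced by a local commit on master.

```shell
#!/usr/bin/env bash
set -eu

repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

# Demo-only helper: commit a new file named after the commit message
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

# Build the chain master <- feature-1 <- feature-2 <- feature-3
commit base
for n in 1 2 3; do
  git checkout -q -b "feature-$n" && commit "f$n"
done
git checkout -q master && commit upstream-change   # master moves ahead

# Rebase each branch onto its already-rebased parent, nearest master first
parent=master
for branch in feature-1 feature-2 feature-3; do
  git rebase -q "$parent" "$branch"
  parent=$branch
done

git log --oneline --graph feature-3
```

Each later rebase also sees the old copies of its parent's commits, but git skips commits whose changes are already present upstream, so every branch ends up with only its own commits on top of its parent. That stops being true once a parent's commits have been amended, and conflicts can then appear.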

Although this is a relatively rare occurrence, I was intrigued by the idea of automating this task. I call it a recursive rebase, and I use it regularly as part of my workflow. It’s powered by the following recursive script:

git-recursive-rebase () {
  # Refuse to run while tracked files have uncommitted changes
  if ! git diff-index --quiet HEAD; then
    echo "You have local changes; either stash them or get rid of them"
    return 1
  fi

  local local_branch
  local_branch=$(git symbolic-ref -q --short HEAD)

  if [ -z "${local_branch}" ]; then
    echo "HEAD is currently detached--aborting."
    return 1
  fi

  local remote remote_branch rebase_ref
  remote=$(git config "branch.${local_branch}.remote")
  remote_branch=$(git config "branch.${local_branch}.merge")
  remote_branch=${remote_branch#refs/heads/}

  if [ -z "${remote}" ] || [ -z "${remote_branch}" ]; then
    echo "${local_branch} has no tracking branch configured--aborting."
    return 1
  fi

  if [ "${remote}" = "." ]; then
    # A remote of "." means we're tracking a local branch. Switch to that
    # branch, bring it up to date first, and switch back to our branch.
    git checkout "${remote_branch}" || return 1
    git-recursive-rebase
    local parent_status=$?
    git checkout "${local_branch}" || return 1
    [ "${parent_status}" -eq 0 ] || return 1
    rebase_ref="${remote_branch}"
  else
    git fetch "${remote}" || return 1
    rebase_ref="${remote}/${remote_branch}"
  fi

  # Rebase only if the upstream has commits this branch lacks
  if [ -n "$(git rev-list "${local_branch}..${rebase_ref}")" ]; then
    echo "Rebasing ${local_branch} with changes from ${rebase_ref}"
    if ! git rebase "${rebase_ref}"; then
      echo "##################################################################"
      echo "UNABLE TO REBASE"
      echo "Run 'git rebase ${rebase_ref}' to deal with this conflict manually"
      git rebase --abort
      return 1
    fi
  fi
}

In a nutshell, the script starts on the current branch and recursively follows each branch’s tracking (i.e. upstream) branch until it finds one whose remote is an actual remote repository (e.g. origin rather than ., which represents the local repository). It fetches from that remote, then rebases each branch in the chain onto its upstream on the way back down.
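The script relies on each topic branch tracking its parent branch. As a small sketch of what that configuration looks like, the following sets up a chain in a scratch repository and walks it the same way the script does. The branch names are illustrative, and because the scratch repository has no remote, the walk stops at master, which has no upstream at all.

```shell
#!/usr/bin/env bash
set -eu

repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
echo base > base.txt && git add base.txt && git commit -q -m "base"

# Create each topic branch so that it tracks its local parent branch
git checkout -q -b feature-1 --track master
git checkout -q -b feature-2 --track feature-1

# These are the two settings the script reads for every branch
git config branch.feature-2.remote   # "." means the local repository
git config branch.feature-2.merge    # refs/heads/feature-1

# Follow the tracking chain for as long as the remote is "."
branch=feature-2
chain="$branch"
while [ "$(git config "branch.${branch}.remote" || true)" = "." ]; do
  upstream=$(git config "branch.${branch}.merge")
  branch=${upstream#refs/heads/}
  chain="$chain -> $branch"
done
echo "$chain"
```

For a branch that already exists, git branch --set-upstream-to=feature-1 feature-2 records the same two settings.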

I’ve set the alias git rrb to execute this script, completely automating this process for me. There are probably some edge cases I’ve missed, but after having used it for a while it’s worked without any surprises. It may be over-engineered, but it’s pretty darn cool.

Rebase-based Git Workflow

Written by

Shane da Silva

Coding by the woods