Git Branch Merging Tutorial

Author

Gabriel I. Cook

Overview

This tutorial covers how to merge a remote Git branch (one on the remote repository at GitHub.com) into a local branch (one on your local computer) without creating a local copy of that remote branch. This approach is especially useful when collaborating on a project where each team member is working on separate branches. Moreover, this approach limits temptation of working on the main branch and/or forgetting to switch off that main branch.

For this reference guide, you will work only on your personal feature branch. This of course, involves creating your branch locally on your computer inside your RStudio project for your team’s project.

Your team’s final R Markdown report file (.Rmd) will be lightweight and not visually bulky. You and your team members will write text mostly in that file and use relevant inline commands when necessary but primarily add few lines of code that will call .R script files (think starter_script.R) to perform the heavy lifting of the project report. By doing so, your report will be reproducible from beginning to end especially if you write your code flexibly to handle any new NAs in vectors that do not currently exist in the data set. In order to achieve this goal, you will need a lean .Rmd report file that leverages more complicated code manipulation. This approach will also force you to think intentionally about the goad of each script your write and in the end you will have clean, readable code files that are clear to others about their specific goals.

For example, your team will source() many script files and embed your desired data visualizations into the report using knitr::include_graphics(). When you wish not to include something, you can comment out those lines easily.

In order for you to utilize the script files, data files, and other files created by your team members, you will manage them using Git. Your commit history and files created by you will become part of the repository record and will be used to evaluate your contributions to the overall project. To achieve this, you will work on your own branch and integrate/merge other team member’s branch history into your own branch. The final step is to merge the branches (or more easily, the Code Lead’s branch) into the main branch for the final product deliverable package.

Feature Branches

Feature branches are used to manage and create separate elements of development projects. Typically, separate branches are created for each specific feature or task a specific developer is coding. Using individual branches allows developers to work independently on new features without affecting the main codebase until the feature is complete and ready to be integrated. Once the feature in created, a new branch would be created for developing a new feature.

For your team project, rather than create a new branch for each feature (e.g., data wrangling task, plot tasks, etc.), each team member can code all of their features in a single branch. If, however, you prefer to separate all tasks into branches, you may do so. For, example, create a jane-data-cleaning branch for cleaning data, a jane-data-summary-by-year branch for creating a specific data summary, and a jane-bar-plot-revenue branch for creating visualizations for revenue. The down side is that there will be multiple branches for you to manage. The upside is that you use branches as developers would use them.

Creating and Checking Out a Feature Branch

You can create and checkout a branch in one step.

git checkout -b <branch-name>

Setting the Upstream for Pushing Local Files to the Remote Repository

git push --set-upstream <branch-name>

The Git Command Process

Although have files created on your local branch (on your computer) of the repository, those files do not exist on the remote branch (on GitHub) until you put them there. Moreover, your team members cannot access them unless you put them on GitHub. Sorry, there is no using flash drives and emailing files. Data Science teams stage, commit, and push files, fetch and merge branches, and live happily ever after if you practice.

Staging Files

Stage the path to the file and file, spelled correctly.

git add <file-path-and-file-name>

Committing Files with Messages (for you and others)

git commit -m "clear message and file-name"

Pushing Changes from your Local Branch to your Remote Branch on GitHub

git push

A collaboration Example

Bill and Jane are collaborating on a project hosted on GitHub. Bill is working on a branch named bill, and Jane is working on a branch named jane. Jane has made changes to her branch and pushed them to the remote repository. Bill wants to incorporate Jane’s changes into his local bill branch without checking out or creating a local version of the jane branch.

Scenario Overview

  • The remote repository is hosted on GitHub.com.
  • Jane is working on her local branch named jane.
  • Bill is working on his local branch named bill.
  • Jane pushes her changes from her local jane branch to the remote jane branch on GitHub.
  • Bill needs Jane’s changes in order to work on his feature but Bill does not have a local branch jane, nor does he want to touch her branch.
  • Bill does not touch, edit, modify, delete, or do anything with Jane’s files.
  • Sam enters the collaboration space

Verify Remote Repository

The only way for collaboration to work is to ensure everyone is connected properly to the remote repository. You can check the location with git remote -v. You will see the url for fetching and pushing.

git remote -v

Solution

Bill will use Git commands to fetch Jane’s remote branch, merge those changes into his local bill branch, and update the remote bill branch on GitHub. Note: The same solution can be applied for all branches or the main branches as described after the example steps.

Steps for Merging a Remote Branch into a Local Branch

These steps assume that Jane has informed Bill and other team members that she has completed her feature (e.g., .R script file, .Rds processed data file or aggregated/summarized data frame (no .csv’s), .png plot, or something else) and she has pushed them to her remote branch on GitHub.

Step 1: Fetch the Remote Branch

The first step is for Bill to fetch Jane’s remote branch. This downloads the changes in jane from the remote repository to Bill’s local machine without creating a local branch for jane.

git fetch origin jane

You should see a terminal message like:

From https://github.com/slicesofdata/<your-team-repo>
 * branch            jane      -> FETCH_HEAD

Warning: Once you fetch a teammate’s branch, you will see files that you have already fetched and merged and the date corresponding to the file will update accordingly. The file you see, however, will not reflect the changes. Similarly, new files that you have not fetched previously will not appear in the Files tab in RStudio. In order changes to these files, you need to perform the merge for the content update to take effect.

Step 2: Ensure You’re on the Correct Local Branch

Before merging any changes, Bill needs to ensure he’s on his local branch bill. He can check the current branch with:

git branch

This will show a list of local branches, and the current branch will be highlighted with an asterisk (*).

If Bill is not on the bill branch, he can switch to it with:

git checkout bill

Step 3: Merge the Remote Branch into the Local Branch

Once Bill is on his local bill branch, he can merge the changes from the remote jane branch into his local branch:

git merge origin/jane

This command merges the changes from origin jane (the remote version of Jane’s branch) into the current branch (Bill’s local bill branch).

$ git merge origin/jane
Merge made by the 'ort' strategy.
 src/figs/jane-revenue-by-quarter-bar-plot.R | 8 ++
 1 file changed, 8 insertions(+)
 create mode 100644 src/figs/jane-revenue-by-quarter-bar-plot.R

Also, as long as Bill does not create a jane branch locally, he will follow the same steps next time he needs to fetch and merge Jane’s pushes to the remote branch.

git fetch origin jane

git merge origin/jane

Warning: Do not create a local branch for your teammate’s branches and the above steps will remain the same.

Step 4: Handle Merge Conflicts

As mentioned previously, naming your files with intention will reduce naming files with the same name. If you have duplicate name files, you will likely have merge conflicts. You do not want to deal with these as they will cause you and I extra time and headaches.

Tips for inexperienced Git users to avoid merge conflicts:

  • Name your files with your initials (e.g., gc-revenue-by-quarter-bar-plot.R), and; this will limit confusion about who created the file
  • Do not create another team member’s branch on your local system
  • Do not touch another team member’s files; no editing, no overwriting
  • Read/Source files by

If there are any merge conflicts (i.e., changes in Jane’s branch that conflict with Bill’s local changes), Git will pause the merge and prompt Bill to resolve the conflicts manually.

Step 5: Commit the Merge

The merge using git merge origin/jane also automatically stages/adds the changes that occurred as part of the merge. Thus, they need to be committed.

Bill can not commit the merge (assuming there are no conflicts or they have been resolved):

git commit -m "merged remote jane into local bill"

Step 6: Push the Updated Local Branch to the Remote Repository

Once committed, Bill needs to push his updated local bill branch (now with Jane’s changes) to the remote repository:

git push origin bill

Although git push would work just fine, adding origin bill leaves no ambiguity about the command’s behavior and ensures that the remote bill branch on GitHub is updated with the changes from (now) both Bill’s and Jane’s work.

Fetching Updates to Main Branch

If there are updates to the main branch, for example, your liaison adds new data, you can fetch those using the same process. You would also follow this approach if your code lead updates the main branch and you need the changes made there.

For fetching the main branch only:


git fetch origin main git merge origin/main

Fetching Updates Made to All Branches

If you make committing changes to your main branch and fetching changes from other branches a weekly process, you can instead fetch changes made on all branches. This approach may be preferred to fetching individual branches. However, if a teammate makes new changes that you need, you may need to fetch their specific branch.

For fetching all branches including the main branch:

git fetch origin git merge origin/main

Deciding When to Use Each Approach

Use git fetch origin (all branches) if:

  • You want to make sure all branch references are current.
  • You are working on a project where multiple branches are frequently updated and you may need to switch between them.
  • You are working in a team environment with several active branches and want a complete view of all recent changes.

Use git fetch origin <branch-name> (specific branch) if:

  • You are focused on a specific branch and do not need other branches updated.
  • The repository is large with many branches and you want a quicker fetch.
  • You are interested only in a particular branch’s changes and want to avoid merging or managing updates from other branches.

More than 2 Collaborators, More than 2 Branches

Jane is excited about the project and is making great progress. She updates a current code file, creates two more files, stages, commits, and pushes the changes to the remote repo.

Sam enters the collaboration with a sam branch. After Jane has made these recent changes, Sam performs a fetch and merge of Jane’s branch exactly as did Bill. The difference is that Sam has a more recent version of one file and has two more files from Jane’s branch than does Bill.

Sam now wants to fetch and merge Bill’s remote branch with his local branch. Although Bill has older versions of Jane’s files in his branch, Sam will retain versions of the files fetched from Jane’s remote branch. This occurs because Git uses a three-way merge algorithm for comparing the common ancestor of both branches and applies the most recent changes based on commit time stamps.

Warning: If you are not vigilant and you modify a teammate’s file on your branch, you may experience problems. For example, If Bill modifies code on the same lines in Jane’s file, Git will flag a merge conflict. Git will not automatically choose which version of the file to keep. You will need to do this manually.