Project Management 02: R Projects, Git, and GitHub

Author

Gabriel I. Cook

Published

August 26, 2025

Overview

This module focuses on getting organized. Rather than saving files haphazardly, which will only introduce stress into your life, we will focus on creating order. The best way to create order and stay organized is to 1) create projects in RStudio, 2) create directories and sub-directories that leave no ambiguity about where your files are, and 3) manage all directory and file paths using the {here} library. You can also connect a project to a remote repository hosted someplace like GitHub for collaboration. In certain classes (and for your team project), you will use Git to interact with remote repositories connected to RStudio projects.

To stay organized for this class and the project, you will set up a class folder (aka directory) on your computer. You will then create an RStudio project and connect it to a remote private repository on your GitHub account. The repository is private because of the data used in certain exercises.

You will use this RStudio project for all class exercises and homework so that there is no ambiguity about where your files are saved. Finally, you will create directories within your new project directory so that you have an organized structure for storing your files. System paths for project files and directories will be managed using the {here} library. This process also ensures that each student’s computer is configured in the same manner.

Reading through these steps before class will make it easier to apply the concepts and run the associated functions in class. In this way, all students will gain some basic experience with Git commands and a remote repository. Students will be collaborators on a repository for their team project, and coding leads will be responsible for maintaining the organization of the team’s private repository.

Libraries Used

  • {usethis}: 3.1.0: for project workflow automation
  • {gitcreds}: 0.1.2: for querying git credentials
  • {gh}: 1.5.0: for querying the GitHub API
  • {gert}: 2.1.5: optional R library approach for git commands

Readings and Preparation

Before Class: First, watch the course videos (and/or read) to familiarize yourself with the concepts rather than to master them. I will assume that you come to class with a basic understanding of the concepts and of how the functions work. The goal of reading should be to understand and implement code functions as well as to support your understanding and help you troubleshoot problems. This cannot happen if you just read the content without interacting with it; however, reading is absolutely essential to being successful during class time.

Complete the items in the To Do: Steps of the Task section.

Class: In class, some functions and concepts will be introduced and we will practice implementing code through exercises.

Warning

Do not try to cheat the system and jump ahead. If you do, just like in the Monopoly board game, your chance card may read “Go to jail. Go directly to jail. Do not pass go. Do not collect $200.” In other words, you cannot complete these steps without ensuring that your credentials are set; if you skip this, you will run into errors and end up contacting me. If the following code does not return your login, your GitHub account, scopes, and a token, you will be unable to proceed. If it does, but your token has expired, you also cannot proceed. Ensure you have set your credentials.

gh::gh_whoami()

To Do: Steps of the Task

Following the sections below, you will:

  1. Create a Version-Control Project with RStudio
    • Name it dataviz-exercises (for class exercises and your homework)
  2. Make file edits, stage those edits, and commit them
  3. Push commits to GitHub

In class, we will practice using RStudio along with some simple Git commands for adding, committing, and pushing files.
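In RStudio, the add, commit, and push steps map onto the Git pane or onto the Terminal commands git add, git commit -m "message", and git push. To make the three steps concrete, here is a hedged sketch that calls the git command-line tool from R with base R's system2(). The helper names git_run(), stage_and_commit(), and push_commits() are hypothetical; the sketch assumes Git is installed and on your PATH and, for pushing, that your GitHub credentials are set. The {gert} library listed above provides R functions for the same steps (git_add(), git_commit(), git_push()) if you prefer not to call the command line.

```r
# Hypothetical helpers that call the git CLI from R (assumes git is on the PATH).
git_run <- function(args, repo = ".") {
  out <- suppressWarnings(
    system2("git", c("-C", shQuote(repo), args), stdout = TRUE, stderr = TRUE)
  )
  status <- attr(out, "status")  # only set when git exits with an error
  if (!is.null(status) && status != 0) stop(paste(out, collapse = "\n"))
  invisible(out)
}

# Stage specific files, then record them in a commit with a message.
stage_and_commit <- function(files, message, repo = ".") {
  git_run(c("add", "--", shQuote(files)), repo)
  git_run(c("commit", "-m", shQuote(message)), repo)
}

# Upload local commits to the remote (e.g., your GitHub repository).
push_commits <- function(repo = ".") git_run("push", repo)

# Usage, run from the project root:
# stage_and_commit("README.md", "Update README")
# push_commits()
```

Committing only records changes locally; nothing reaches GitHub until you push.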

Creating a Local Directory for Class

Create a folder (aka directory) named "dataviz" (yes, all lowercase) on your computer. I recommend creating the directory someplace where you might not accidentally delete it. Create only one so as not to confuse yourself. This will serve as the directory within which you will store content for this course.
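If you prefer to create the folder from the R console instead of your file manager, a minimal base-R sketch follows. The helper name create_class_dir() and its default parent location (your home directory) are illustrative assumptions; choose any parent location where you will not accidentally delete the folder.

```r
# Hypothetical helper: create the "dataviz" class directory under `parent`.
# Base R only; the default parent (home directory) is an assumption.
create_class_dir <- function(parent = path.expand("~"), name = "dataviz") {
  class_dir <- file.path(parent, name)
  if (dir.exists(class_dir)) {
    message("Directory already exists: ", class_dir)
  } else {
    dir.create(class_dir, recursive = TRUE)
    message("Created: ", class_dir)
  }
  invisible(class_dir)
}

# Usage: create_class_dir() creates ~/dataviz if it does not already exist.
```

Because the function checks for an existing folder first, running it again will not create a second copy.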

Connecting the Repository to an RStudio Project

You should already have a repository on GitHub named “dataviz-exercises” which you created from this template repository. You will now create an RStudio project and connect it to that remote repository on your GitHub account.

When you create the project inside your class directory, your directory structure will look like this:

dataviz/
└── dataviz-exercises (project root directory)
  1. In RStudio, File > New Project > Version Control > Git.

  2. In the pop-up, you will see a request for the “Repository URL”. Paste the URL of your GitHub repository. This is the same URL you see for the repository on your GitHub account, except that you need to add .git to the end of it:

    https://github.com/<your_github_username>/dataviz-exercises.git
  3. In the “Create project as subdirectory of” field, browse to your dataviz directory. When you create the project, the project directory will be created as a sub-directory of your main /dataviz directory. Thus, you will see /dataviz/dataviz-exercises.

WARNING: Do not create the project inside of an existing project’s directory.

Note: I recommend that you also select “Open in new session” to compartmentalize projects. When you work on the team project, open the team project. When you work on your homework or other class exercises, open your dataviz-exercises project.

  4. Click “Create Project”, which will create:
    • a project directory on your computer
    • a project file with the file extension .Rproj
    • a local Git repository linked to the remote GitHub repository (the project directory is both a Git repository and an RStudio Project)
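Two small checks can help before you fill in the New Project dialog: building the clone URL (the repository page URL with .git appended) and confirming that the parent folder you picked is not already inside another project. The sketch below uses base R only; the helper names github_clone_url() and find_enclosing_project() are hypothetical, and "octocat" is a placeholder username. The project check simply looks for an .Rproj file in the chosen directory or any directory above it.

```r
# Hypothetical pre-flight helpers for New Project > Version Control > Git.

# Build the HTTPS clone URL: the repository page URL with ".git" appended.
github_clone_url <- function(username, repo = "dataviz-exercises") {
  stopifnot(is.character(username), length(username) == 1, nzchar(username))
  sprintf("https://github.com/%s/%s.git", username, repo)
}

# Return the first directory at or above `path` that contains an .Rproj file,
# or NULL if none does. A non-NULL result means `path` is inside a project.
find_enclosing_project <- function(path = getwd()) {
  dir <- normalizePath(path, winslash = "/", mustWork = TRUE)
  repeat {
    if (length(list.files(dir, pattern = "\\.Rproj$")) > 0) return(dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) return(NULL)  # reached the filesystem root
    dir <- parent
  }
}

github_clone_url("octocat")
#> [1] "https://github.com/octocat/dataviz-exercises.git"

# find_enclosing_project("~/dataviz") should return NULL before you create
# the project; a path means you picked a folder inside an existing project.
```

If you prefer the console to the dialog, {usethis} also offers create_from_github(), which clones a GitHub repository into a directory you specify with its destdir argument; check its documentation before relying on it, since the RStudio dialog is the route we use in class.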

Because the repository already exists on GitHub, you should see RStudio briefly connect to GitHub and pull the repository contents down into your newly created project directory. At this point, your local Git repository will contain only a few files.

Understanding the Directory Structure

Directory structures are used for file organization. Each directory and sub-directory has a purpose: to contain files of a certain type. As long as you know the purpose of a file, you know where to save it. When working with teams, this common language avoids many problems.

Although there are different ways to create project directory structures and different ways to name those directories, we will use the following structure. Not all directories will be used for all types of projects.

Inside your /dataviz/dataviz-exercises directory your full project directory structure should look like the one below.

.                           (project root directory)
├── data/
│   ├── interim/
│   ├── processed/
│   └── raw/
├── dataviz-exercises.Rproj (the R project file)
├── docs/
├── .gitignore              (a version-control gitignore file)
├── README.md               (a read me file)
├── refs/
├── report/
│   ├── figs/
│   └── images/
└── src/
    ├── data/
    ├── figs/
    ├── functions/
    └── utils/
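If you would rather not create each folder by hand, the sketch below creates the sub-directories described in this section with base R's dir.create(). The helper name create_project_dirs() is hypothetical, and the list of paths is an assumption based on this page (including data/interim/ from the Data Files section and src/utils/); adjust it if your template repository differs. Run it from the project root, or pass the root explicitly (for example, here::here()).

```r
# Hypothetical helper: create the project sub-directories under `root`.
# The path list is based on this page; edit it to match your template.
create_project_dirs <- function(root = ".") {
  sub_dirs <- c(
    "data/raw", "data/interim", "data/processed",
    "docs", "refs",
    "report/figs", "report/images",
    "src/data", "src/figs", "src/functions", "src/utils"
  )
  paths <- file.path(root, sub_dirs)
  # recursive = TRUE creates parents (e.g., data/) as needed;
  # showWarnings = FALSE stays quiet for folders that already exist.
  for (p in paths) dir.create(p, recursive = TRUE, showWarnings = FALSE)
  invisible(paths)
}

# Usage, from the project root:
# create_project_dirs()
```

Running it twice is harmless. Keep in mind that Git does not track empty directories, so these folders will not appear on GitHub until each contains at least one file.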

Directory and Sub-Directory Purpose

The purpose of each top-level directory is explained below.

  • data/: for raw/virgin data files and modified data files
  • docs/: for document files like the project description, any dictionary of variable names, etc. (this is different from )
  • refs/: for references, papers, reading materials, and other documents
  • report/: for R Markdown (e.g., .Rmd) report files and their output file types (e.g., .docx, .pdf, .html)
  • src/: for all source-code-related files (e.g., .R scripts, functions, .py files, etc.). General scripts can be saved in the top level of src/, but most of your script files will be saved in src/figs/ because you will create figures.

More directory descriptions are provided below.

Data Files

Inside data/, add the following sub-directories:

  • raw/, for data/raw/: containing raw data files obtained from sources (e.g., .csv, .tsv, .xlsx)
  • interim/, for data/interim/: .Rds (highly recommended) files containing intermediate, transformed data (e.g., cleaned or merged) that is not yet fully processed into final form
  • processed/, for data/processed/: .Rds (highly recommended) files containing finalized data (e.g., aggregated data, summaries, and data frames ready for plotting)

NOTE: For this course, you will see me write data as .Rds files using the saveRDS() function because this format preserves variable formatting (e.g., data types and factor-level order), which affects your plots.

WARNING: If you process data and save those files as .csv, .xlsx, or similar, you will likely find yourself working harder by redoing recoding steps you have already performed. I do not recommend this except for final versions that no longer require processing.
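To see why .Rds matters for plotting, consider a factor whose levels are set in a custom order, which is the order plots typically use for discrete axes and legends. The sketch below uses only base R and a small made-up data frame; it saves the same data both ways and reads it back.

```r
# Compare how a factor survives an .Rds round trip versus a .csv round trip.
df <- data.frame(
  size = factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"))  # custom plotting order
)

rds_file <- tempfile(fileext = ".Rds")
csv_file <- tempfile(fileext = ".csv")

saveRDS(df, rds_file)                       # stores the R object as-is
write.csv(df, csv_file, row.names = FALSE)  # stores plain text only

from_rds <- readRDS(rds_file)
from_csv <- read.csv(csv_file)

levels(from_rds$size)  # "small" "medium" "large": order preserved
class(from_csv$size)   # "character" (R >= 4.0): factor and its order lost
```

In R 4.0 and later, read.csv() returns text columns as character by default, so any level order you set has to be recreated every time you read the .csv back in.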

Source/Code Files

Inside src/, add the following sub-directories:

  • data/, for src/data/: containing .R scripts needed to download or generate data
  • figs/, for src/figs/: containing .R scripts needed to create visualizations
  • functions/, for src/functions/: containing .R files for your own functions (i.e., functions that do not come from libraries)

Files for Reports

Inside report/, add the following sub-directories:

  • figs/ for report/figs/: containing visualization files (e.g., .png) for the report
  • images/ for report/images/: containing image files (e.g., .png) for the report

When testing your plots, you may wish to add notes or other written content that you can use in conjunction with your plots. In such cases, I recommend creating R Markdown files with meaningful names for taking notes. You can save these files in the top level of report/ and then source your .R figure script from within them.

Example files (an .R script for creating a visualization and an .Rmd file that sources the .R script and renders the resulting .png file) are located under the Example Files & Other course tab. Your team report will use this same structure; details and files for it are located under the Project course tab.
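The course's example files are the authoritative versions. As a rough sketch of the same idea, assuming base R graphics in place of whatever plotting library those examples use, a figure script saved in src/figs/ could draw a plot and write it to report/figs/ as a .png; an R Markdown file in report/ could then source() the script and display the saved image. The file name example_fig.R and the helper save_example_fig() are hypothetical.

```r
# Hypothetical contents of src/figs/example_fig.R.
# Draws a simple base-R plot and writes it to report/figs/ as a .png.
save_example_fig <- function(root = ".", file = "example_fig.png") {
  fig_dir <- file.path(root, "report", "figs")
  dir.create(fig_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(fig_dir, file)

  png(out_file, width = 800, height = 600, res = 120)  # open the .png device
  on.exit(dev.off())                                   # always close it
  plot(mpg ~ wt, data = mtcars,
       xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
  invisible(out_file)
}

# In an .Rmd chunk in report/ (assumes {here} and {knitr} are installed):
# source(here::here("src", "figs", "example_fig.R"))
# knitr::include_graphics(save_example_fig(root = here::here()))
```

Keeping the plotting code in src/figs/ and only sourcing it from the .Rmd means the same script can regenerate the figure for notes, homework, or the team report.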

Moving forward, save all data to their relevant sub-directories within data/; create all .R code files and scripts in src/, including scripts used to create your visualizations and .png plot files; and create all exercise or homework R Markdown files (e.g., .Rmd) in report/. Finally, any readings or references can be saved in refs/, and any other document files can be saved in docs/. Reserve report/figs/ for writing/saving plots or figures. All paths to directories and files for reading and writing will be managed using the {here} library.

Summary

You now understand how to create projects in RStudio, how to connect projects to remote GitHub repositories, and how to use directories intentionally.

Other Resources

  1. Git Client:

Git clients work like the RStudio GUI option, but likely much better. One client is GitKraken.
  • If you find the Terminal command line daunting or limiting, I might recommend a Git client, as I am not a big fan of the RStudio interface.
  • GitKraken is a good option, and they have lots of tutorials on their website. GitKraken is seamless to set up: install it, connect your GitHub account, select your repo to add, and voilà. You can stage, commit, and push from there.

  2. happygitwithr

Session Info

sessionInfo()
R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] htmltools_0.5.8.1 DT_0.33           openxlsx_4.2.8    vroom_1.6.5      
 [5] lubridate_1.9.4   forcats_1.0.0     stringr_1.5.1     dplyr_1.1.4      
 [9] purrr_1.1.0       readr_2.1.5       tidyr_1.3.1       tibble_3.3.0     
[13] ggplot2_3.5.2     tidyverse_2.0.0  

loaded via a namespace (and not attached):
 [1] generics_0.1.4     stringi_1.8.7      hms_1.1.3          digest_0.6.37     
 [5] magrittr_2.0.3     evaluate_1.0.4     grid_4.5.1         timechange_0.3.0  
 [9] RColorBrewer_1.1-3 fastmap_1.2.0      R.oo_1.27.1        rprojroot_2.1.0   
[13] jsonlite_2.0.0     R.utils_2.13.0     zip_2.3.3          scales_1.4.0      
[17] cli_3.6.5          rlang_1.1.6        crayon_1.5.3       R.methodsS3_1.8.2 
[21] bit64_4.6.0-1      withr_3.0.2        yaml_2.3.10        tools_4.5.1       
[25] tzdb_0.5.0         pacman_0.5.1       here_1.0.1         vctrs_0.6.5       
[29] R6_2.6.1           lifecycle_1.0.4    htmlwidgets_1.6.4  bit_4.6.0         
[33] pkgconfig_2.0.3    pillar_1.11.0      gtable_0.3.6       glue_1.8.0        
[37] Rcpp_1.1.0         xfun_0.52          tidyselect_1.2.1   rstudioapi_0.17.1 
[41] knitr_1.50         farver_2.1.2       rmarkdown_2.29     compiler_4.5.1