7 Gitignore

Questions
  • How do I exclude certain files from being uploaded to GitHub?
Objectives
  • Write expressions that make Git ignore files in your repository

Time estimation: 10 minutes

1. Introduction

What if we have files that we do not want Git to track, such as backup files or intermediate files created during data analysis? Remember that GitHub is not a cloud storage service, so (big) data should not be uploaded to it. In fact, GitHub rejects any single file larger than 100 MB, so you would not be able to push such files anyway.

Regardless of the above, it is often convenient to keep your data in the same project folder. You also cannot prevent Jupyter Notebook from writing intermediate checkpoints (.ipynb_checkpoints) to the folder that contains the notebook.

Git reads a file called .gitignore, in which we can write expressions (patterns) that define the files it should ignore. This chapter briefly discusses the .gitignore file with a few simple examples.

2. Expressions

Imagine the following project folder structure:

 project-folder/
    |
    |- .git/
    |- .ipynb_checkpoints/
    |- .Rhistory
    |
    |- data/
    |   |- R1.fastq
    |   |- dataset.csv
    |
    ...

Let’s discuss how to ignore a specific file and how we can use symbols to generalize the ignoring behaviour.

  • Ignore a file:

The easiest way is to write the file name or the path to the file. For example, the fastq file can be ignored by adding data/R1.fastq to the .gitignore file.

Similar to a file, a folder can also be ignored. The folders data/ and .ipynb_checkpoints/ can be ignored by adding the following lines:

data/
.ipynb_checkpoints/
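
The two lines above can be tried out safely in a throwaway repository. The sketch below is only an illustration: the temporary folder and the file names analysis.ipynb and notebook-checkpoint.ipynb are made up for the example. It uses git check-ignore, which reports whether a path is ignored.

```shell
# Build a throwaway copy of the example project in a temporary folder
# (the folder and the notebook file names are hypothetical).
demo=$(mktemp -d)
cd "$demo"
git init -q
mkdir -p data .ipynb_checkpoints
touch data/R1.fastq data/dataset.csv analysis.ipynb
touch .ipynb_checkpoints/notebook-checkpoint.ipynb

# Ignore both folders and everything inside them.
printf '%s\n' 'data/' '.ipynb_checkpoints/' > .gitignore

# -v prints the .gitignore line responsible for each ignored path.
git check-ignore -v data/R1.fastq .ipynb_checkpoints/notebook-checkpoint.ipynb

# Ignored files are left out of the status listing.
git status --short
```

Only the files that are not covered by the .gitignore file should remain in the status listing.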

  • *, ! and #:

The asterisk is often used in .gitignore files and represents a wildcard. For example, *.csv will ignore any csv file in your folder and its subfolders. The asterisk can precede a file extension, in which case Git will ignore all files with that extension (e.g. all csv, fastq, sam, bam, xlsx or pdf files).
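
As a minimal sketch of the wildcard behaviour, the commands below create a few csv files at different folder levels in a throwaway repository and ask Git which ones it ignores. The folder data/raw and the files results.csv and counts.csv are hypothetical and only serve the example.

```shell
# Throwaway repository with csv files at several folder levels
# (data/raw, results.csv and counts.csv are made-up names).
demo=$(mktemp -d)
cd "$demo"
git init -q
mkdir -p data/raw
touch results.csv data/dataset.csv data/raw/counts.csv data/R1.fastq

# A pattern without a slash is matched at every folder level.
echo '*.csv' > .gitignore

# All three csv files are reported together with the matching pattern.
git check-ignore -v results.csv data/dataset.csv data/raw/counts.csv

# The fastq file does not match *.csv, so check-ignore exits with a non-zero status.
git check-ignore -q data/R1.fastq || echo "data/R1.fastq is not ignored"
```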

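An exclamation mark is used for exceptions. Note that Git cannot re-include a file when its parent folder is itself ignored, so the folder contents are ignored with data/* rather than data/. The following lines will ignore all files in the data folder, except for the dataset.csv file:

data/*
!data/dataset.csv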

Comment lines start with a # and are skipped by Git.
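
The exception and comment syntax can be checked in the same way. This sketch assumes a throwaway repository in a temporary folder and writes the contents of the data folder as data/*, because Git does not look inside a folder that is itself ignored and would therefore never see an exception for a file in it. Staging everything and listing the staged files shows what would end up on GitHub.

```shell
# Throwaway repository containing the data folder from the example.
demo=$(mktemp -d)
cd "$demo"
git init -q
mkdir data
touch data/R1.fastq data/dataset.csv

# Write a commented .gitignore file with one exception.
cat > .gitignore <<'EOF'
# Ignore everything inside the data folder ...
data/*
# ... except for the small dataset
!data/dataset.csv
EOF

# Stage everything that is not ignored and list the staged files.
git add -A
git ls-files
```

With these patterns the listing should contain .gitignore and data/dataset.csv, but not data/R1.fastq.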

3. Standard files

It’s always good to think this through and manually add the files or folders that need to be ignored. However, it’s also useful to know that there are standardized .gitignore files, each written for a specific programming environment. They are all accessible in the github/gitignore repository on GitHub, which contains .gitignore files for Python, R, Ruby, Java, Perl and C++, among many others. One of these files can also be added on the fly when creating a new repository on GitHub, by initializing the repository with it (see figure below).



Let’s continue with the next session!

Congratulations on successfully completing this tutorial!