Git and Version Control: Tracking and Collaborating

Before You Start

You should know:

  • How to create and edit .py files and navigate folders in the command line.

You will learn:

  • Why “Save As” version management fails and what Git replaces it with.
  • The difference between Git and GitHub.
  • How to configure Git before your first commit.
  • The fundamental lifecycle: init, status, add, commit, push, pull.
  • How .gitignore keeps your repository clean.
  • How to view history and undo mistakes.

Introduction

At some point you will write a 300-line Python script that perfectly classifies land cover. The next day you will try to optimize one function. You delete a few lines, tweak a loop, and the script crashes. You press Ctrl+Z, but your editor’s undo history runs out. Your working model is gone.

Before version control, the only defense was “Save As”:

classify_landcover.py
classify_landcover_v2.py
classify_landcover_FINAL.py
classify_landcover_FINAL_v2.py
classify_landcover_USE_THIS_ONE.py

This is brittle, takes up space, and makes collaboration impractical: you cannot merge two people’s changes to FINAL_v2.py without manually reading both files line by line.

Git addresses all of this. It records the changes you commit to the tracked files in a folder, lets you label each snapshot with a message, and makes it straightforward to return to any previously committed state.

Git vs. GitHub

These are commonly confused. They are entirely different things:

  • Git is a free program installed on your local computer. It tracks file changes and stores a complete history inside a hidden .git folder in your project directory. It works with no internet connection.
  • GitHub is a website. It is a cloud host for Git repositories: a place to upload your history so that it is backed up and shareable with collaborators.

You use Git locally. You push to GitHub for backup and collaboration.

Installing Git

macOS: Git ships with the Xcode command-line tools. Run git --version in your terminal. If it is not installed, macOS will offer to install it automatically.

Windows: Download the installer from git-scm.com/downloads. Accept the defaults. This also installs “Git Bash,” which gives you a Unix-style terminal on Windows.

Linux (Debian or Ubuntu; other distributions use their own package manager):

sudo apt install git

First-Time Configuration

Before you can make a commit, Git needs to know who you are. This is stored in every commit you create:

git config --global user.name "Sarah Chen"
git config --global user.email "sarah@example.com"

Use the email address you will register with GitHub. This only needs to be done once per computer: --global writes it to your user configuration file.

Confirm the settings were saved:

git config --global --list
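To see where --global actually stores these values, the sketch below repeats the configuration inside a subshell whose HOME points at a throwaway directory, so your real settings are left alone. The sandbox directory is an assumption made only for this demonstration; on your own machine you would run the git config commands directly.

```shell
# Throwaway home directory (demonstration only).
sandbox=$(mktemp -d)

(
  # Inside this subshell, Git treats the sandbox as the home directory.
  export HOME="$sandbox"
  unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

  git config --global user.name "Sarah Chen"
  git config --global user.email "sarah@example.com"

  # --global writes to a plain-text file named .gitconfig in the home directory.
  cat "$HOME/.gitconfig"

  # Read back a single value instead of listing everything.
  git config --global user.name
)
```

Because the file is plain text, you can also open it in an editor to check or correct a typo in your name or email.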

The Git Lifecycle

Working with Git means thinking in snapshots rather than pressing Save and walking away. A snapshot is called a commit.

1. Start tracking a folder (git init)

Navigate to your project folder in the terminal and initialize a repository:

cd ~/coding_projects/flood_analysis
git init

Git creates a hidden .git folder. This folder is the repository: it contains your complete history. Do not delete or modify it manually.

If you are starting from an existing project on GitHub, clone it instead:

git clone https://github.com/username/repository-name.git

This downloads the repository and its full history to your local machine.
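If you want to confirm that a folder really is a repository, a minimal check looks like the sketch below. It uses a temporary directory as a stand-in for your project folder, so nothing in your real projects is touched.

```shell
# Throwaway folder standing in for a project directory.
project=$(mktemp -d)
cd "$project"

git init

# .git is hidden from a plain ls; the -a flag reveals it.
ls -a

# Prints "true" when the current directory is inside a Git working tree.
git rev-parse --is-inside-work-tree
```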

2. Check the current state (git status)

git status is the command you will run most often. It shows you which files have been modified, which are new (untracked), and which are staged for the next commit:

git status
On branch main

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        analysis.py
        data/rainfall.csv

Files shown in red are not yet tracked by Git. Git does nothing with them until you stage them.
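Once the long output is familiar, the --short flag gives a compact view that does not depend on terminal colors. The sketch below assumes a fresh throwaway repository and a one-line placeholder script.

```shell
project=$(mktemp -d)
cd "$project"
git init -q

# Placeholder script so there is something to report.
echo 'print("rainfall analysis")' > analysis.py

# Long form, with hints about what to do next.
git status

# Short form: "??" in the first column marks an untracked file.
git status --short
```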

3. Stage your changes (git add)

Git does not take snapshots automatically. You explicitly choose which new or modified files to include in the next commit. The set of changes you have chosen is called the staging area.

Stage a specific file:

git add analysis.py

Stage everything in the current directory:

git add .

After staging, git status shows the staged files in green under “Changes to be committed.”
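To see selective staging in action, the sketch below creates two placeholder files in a throwaway repository and stages only one of them. The file names and contents are illustrative.

```shell
project=$(mktemp -d)
cd "$project"
git init -q

echo 'print("rainfall analysis")' > analysis.py
echo 'scratch notes' > notes.txt

# Stage only the script; the notes stay out of the next commit.
git add analysis.py

# "A" marks a newly staged file, "??" a file that is still untracked.
git status --short

# List exactly which files the next commit would contain.
git diff --cached --name-only
```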

4. Take a snapshot (git commit)

With files staged, create the commit. The message should explain why you made the change, not just what:

git commit -m "Add rainfall normalization step to remove outliers above 200mm"

Good messages: "Fix coordinate reference system mismatch in flood extent layer", "Switch from Euclidean to Haversine distance for accuracy across large areas"

Bad messages: "fixed stuff", "update", "aaa"

The commit is saved to your local .git history. You can now recover this exact state of the project at any time.
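When the reason for a change does not fit on one line, you can pass -m twice: the first value becomes the subject and the second becomes a separate body paragraph. The sketch below assumes a throwaway repository with a repository-local identity; the message text is illustrative.

```shell
project=$(mktemp -d)
cd "$project"
git init -q

# Identity set for this repository only, so the sketch runs on a fresh machine.
git config user.name "Sarah Chen"
git config user.email "sarah@example.com"

echo 'print("rainfall analysis")' > analysis.py
git add analysis.py

# Subject line first, then the longer explanation of why.
git commit -q -m "Add rainfall normalization step" \
  -m "Readings above 200mm were skewing the mean, so they are now removed."

# Print the full message of the most recent commit.
git log -1 --format=%B
```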

5. View your history (git log)

git log

This shows every commit: its unique hash, author, date, and message. Use git log --oneline for a compact summary:

a3f91c2 Fix coordinate reference system mismatch
b72e8d4 Add rainfall normalization step
c10d991 Initial analysis script

The hash (e.g. a3f91c2) uniquely identifies each commit. You use it to reference specific points in history.
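As a sketch of how a hash is used in practice, the example below makes two commits in a throwaway repository and then uses the hash of the older one to inspect it. The file contents are placeholders, and the hashes you see will differ from any shown in this lesson.

```shell
project=$(mktemp -d)
cd "$project"
git init -q
git config user.name "Sarah Chen"
git config user.email "sarah@example.com"

echo 'v1' > analysis.py
git add analysis.py
git commit -q -m "Initial analysis script"

echo 'v2' >> analysis.py
git commit -q -am "Add rainfall normalization step"

git log --oneline

# HEAD~1 means "one commit before the latest"; store its abbreviated hash.
first=$(git rev-parse --short HEAD~1)

# Summarize that commit, then print the file as it existed at that point.
git show --stat "$first"
git show "$first:analysis.py"
```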

6. Get changes from the cloud (git pull)

If you are collaborating, or working across multiple computers, always pull before you push:

git pull

This downloads any commits that exist on GitHub but not on your local machine and merges them in. If you push without pulling first and a collaborator has made changes, Git will refuse with a “rejected” error. Pull first, resolve any conflicts, then push.
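The rejected-push situation can be reproduced without a network. In the sketch below, a local bare repository stands in for GitHub, and two clones named laptop and desktop stand in for two computers; all of these names are assumptions for the demonstration. The --no-rebase and --no-edit flags simply make the merge explicit and accept the default merge message.

```shell
work=$(mktemp -d)
cd "$work"

# Local stand-in for a repository hosted on GitHub.
git init -q --bare remote.git

# First computer: clone, commit, push.
git clone -q remote.git laptop
git -C laptop config user.name "Sarah Chen"
git -C laptop config user.email "sarah@example.com"
echo 'print("rainfall analysis")' > laptop/analysis.py
git -C laptop add analysis.py
git -C laptop commit -q -m "Initial analysis script"
git -C laptop push -q -u origin HEAD

# Second computer clones the shared history.
git clone -q remote.git desktop
git -C desktop config user.name "Sarah Chen"
git -C desktop config user.email "sarah@example.com"

# The laptop pushes another commit.
echo 'print("normalized")' >> laptop/analysis.py
git -C laptop commit -q -am "Add normalization"
git -C laptop push -q

# The desktop commits without pulling, so its push is rejected.
echo 'field notes' > desktop/notes.txt
git -C desktop add notes.txt
git -C desktop commit -q -m "Add field notes"
git -C desktop push -q || echo "Push rejected: pull first"

# Pull merges the laptop's commit; the push then succeeds.
git -C desktop pull -q --no-rebase --no-edit
git -C desktop push -q
```

The two commits touch different files, so the merge completes without conflicts. Had both edited the same lines, Git would have paused and asked you to resolve them before pushing.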

7. Back up to the cloud (git push)

A commit only saves to your local hard drive. To synchronize with GitHub:

git push

If this is your first push to a new repository, Git may ask you to set an upstream:

git push -u origin main

You only need the -u origin main part once. After that, plain git push works.
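A repository created with git init does not yet know where GitHub is, so the first push is usually preceded by git remote add. The sketch below walks through that sequence with a local bare repository standing in for an empty GitHub repository; with GitHub you would pass the repository's https URL instead of the local path. The branch name is read from Git rather than assumed, because it may be main or master depending on your configuration.

```shell
work=$(mktemp -d)
cd "$work"

# Stand-in for an empty repository created on GitHub.
git init -q --bare remote.git

git init -q flood_analysis
cd flood_analysis
git config user.name "Sarah Chen"
git config user.email "sarah@example.com"

echo 'print("flood analysis")' > analysis.py
git add analysis.py
git commit -q -m "Initial analysis script"

# Register the remote under the conventional name "origin".
git remote add origin "$work/remote.git"

# First push: -u records the upstream branch for later plain pushes.
branch=$(git symbolic-ref --short HEAD)
git push -q -u origin "$branch"

# Later pushes need no arguments.
echo 'print("step two")' >> analysis.py
git commit -q -am "Add second step"
git push -q
```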

The .gitignore File

Some files should never be committed: virtual environment folders, compiled Python files, large data files, credentials. A .gitignore file in your project root tells Git to ignore them:

# Python
.venv/
__pycache__/
*.pyc
*.pyo

# Data files (too large for Git)
*.tif
*.geotiff
data/raw/

# Credentials (never commit these)
.env
secrets.py

Create this file in your project root before your first commit. Once a file is already tracked by Git, adding it to .gitignore does not remove it; you would need git rm --cached filename to stop tracking it.
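The sketch below shows both halves of that advice in a throwaway repository: checking which pattern ignores a file with git check-ignore, and untracking a file that was committed before .gitignore existed. The file names and the placeholder key are illustrative.

```shell
project=$(mktemp -d)
cd "$project"
git init -q
git config user.name "Sarah Chen"
git config user.email "sarah@example.com"

# secrets.py is committed by mistake before any .gitignore exists.
echo 'API_KEY = "placeholder"' > secrets.py
git add secrets.py
git commit -q -m "Add settings"

# A short .gitignore, one pattern per line.
printf '%s\n' '__pycache__/' '*.tif' 'secrets.py' > .gitignore

# check-ignore -v reports which line of which file matches a path.
touch elevation.tif
git check-ignore -v elevation.tif

# Stop tracking secrets.py but keep the file on disk.
git rm -q --cached secrets.py
git add .gitignore
git commit -q -m "Stop tracking secrets.py and add .gitignore"

# Nothing left to report: both files are now ignored.
git status --short
```

Note that the earlier commit still contains secrets.py, so a real credential committed this way should be treated as exposed and replaced.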

Fixing Mistakes

Undo the last commit but keep your changes:

git reset HEAD~1

This moves the commit pointer back one step and unstages the files, leaving your actual file contents unchanged. Useful when you committed too early.
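A minimal sketch of this undo, assuming a throwaway repository and placeholder file contents, shows that the commit disappears from the log while the edit itself stays in the file:

```shell
project=$(mktemp -d)
cd "$project"
git init -q
git config user.name "Sarah Chen"
git config user.email "sarah@example.com"

echo 'print("rainfall analysis")' > analysis.py
git add analysis.py
git commit -q -m "Initial analysis script"

# A second commit made too early.
echo 'print("half-finished idea")' >> analysis.py
git commit -q -am "Work in progress"

git reset -q HEAD~1

# Only the first commit remains in the history.
git log --oneline

# The edit is still in the file, reported as modified but unstaged.
git status --short
cat analysis.py
```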

Discard unstaged changes to a file (permanent; this cannot be undone):

git checkout -- analysis.py
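The sketch below demonstrates the effect in a throwaway repository with placeholder contents: an unwanted edit is made and then thrown away. On Git 2.23 and later, git restore analysis.py does the same job.

```shell
project=$(mktemp -d)
cd "$project"
git init -q
git config user.name "Sarah Chen"
git config user.email "sarah@example.com"

echo 'print("rainfall analysis")' > analysis.py
git add analysis.py
git commit -q -m "Initial analysis script"

# An edit you decide you do not want.
echo 'print("broken experiment")' >> analysis.py
git status --short

# Throw the edit away; the file returns to its last committed content.
git checkout -- analysis.py
cat analysis.py
```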

See exactly what changed in a file before staging:

git diff analysis.py

Lines beginning with + were added. Lines beginning with - were removed.
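One detail worth knowing is that plain git diff only shows changes that are not yet staged. The sketch below, using a throwaway repository and a placeholder threshold value, shows the diff before staging and how to view it again afterwards with --staged.

```shell
project=$(mktemp -d)
cd "$project"
git init -q
git config user.name "Sarah Chen"
git config user.email "sarah@example.com"

printf 'threshold = 100\nprint(threshold)\n' > analysis.py
git add analysis.py
git commit -q -m "Initial analysis script"

# Change one line.
printf 'threshold = 200\nprint(threshold)\n' > analysis.py

# Shows the old line with "-" and the new line with "+".
git diff analysis.py

git add analysis.py

# Prints nothing now, because the change has been staged.
git diff analysis.py

# --staged compares the staging area with the last commit.
git diff --staged analysis.py
```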

Verify Your Work

  1. Install Git and confirm with git --version.
  2. Set your name and email: git config --global user.name and git config --global user.email.
  3. Create a new folder, cd into it, and run git init.
  4. Create a simple Python file in that folder.
  5. Run git status and confirm the file appears as untracked.
  6. Run git add . and then git status again, and confirm it is staged.
  7. Run git commit -m "Initial script".
  8. Run git log --oneline β€” confirm your commit appears.
  9. Create a .gitignore with .venv/ and __pycache__/.
  10. Create a free account at github.com, create a repository, and push your commits.

Conclusion

You have now worked through the full set of tools required for scientific computing. You can navigate directories with the command line, install the Python interpreter, create virtual environments, manage libraries with pip and conda, explore data in Jupyter Notebooks, write structured scripts in an IDE, find and fix errors with a debugger, and track your work with Git.

The installation phase is complete. You are ready to choose a pathway and start writing real geographic code.
