Git and Version Control: Tracking and Collaborating
Before You Start
You should know:
- How to create and edit .py files and navigate folders in the command line.
You will learn:
- Why “Save As” version management fails and what Git replaces it with.
- The difference between Git and GitHub.
- How to configure Git before your first commit.
- The fundamental lifecycle: init, status, add, commit, push, pull.
- How .gitignore keeps your repository clean.
- How to view history and undo mistakes.
Introduction
At some point you will write a 300-line Python script that perfectly classifies land cover. The next day you will try to optimize one function. You delete a few lines, tweak a loop, and the script crashes. You press Ctrl+Z, but your editor’s undo history runs out. Your working model is gone.
Before version control, the only defense was “Save As”:
classify_landcover.py
classify_landcover_v2.py
classify_landcover_FINAL.py
classify_landcover_FINAL_v2.py
classify_landcover_USE_THIS_ONE.py
This is brittle, takes up space, and makes collaboration impossible: you cannot merge two people’s changes to FINAL_v2.py without manually reading both files line by line.
Git solves all of this. It tracks every change you make to every file in a folder, lets you label snapshots, and makes it trivial to go back to any previous state.
Git vs. GitHub
These are commonly confused. They are entirely different things:
- Git is a free program installed on your local computer. It tracks file changes and stores a complete history inside a hidden .git folder in your project directory. It works with no internet connection.
- GitHub is a website. It is a cloud host for Git repositories: a place to upload your history so that it is backed up and shareable with collaborators.
You use Git locally. You push to GitHub for backup and collaboration.
Installing Git
macOS: Git ships with the Xcode command-line tools. Run git --version in your terminal. If it is not installed, macOS will offer to install it automatically.
Windows: Download the installer from git-scm.com/downloads. Accept the defaults. This also installs “Git Bash,” which gives you a Unix-style terminal on Windows.
Linux (Debian or Ubuntu; other distributions use their own package manager):
sudo apt install git
First-Time Configuration
Before you can make a commit, Git needs to know who you are. This is stored in every commit you create:
git config --global user.name "Sarah Chen"
git config --global user.email "sarah@example.com"
Use the email address you will register with GitHub. This only needs to be done once per computer: --global writes it to your user configuration file.
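To see what these commands write, the following Python sketch runs them against a scratch file instead of your real configuration. It assumes Git is on your PATH. The `--file` option of `git config` stands in for `--global` here, and the helper `git_config` and the file name `scratch_gitconfig` are made up for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

GIT = shutil.which("git")  # None when Git is not installed


def git_config(config_file, *args):
    """Run `git config --file <config_file> ...` and return its stdout."""
    result = subprocess.run(
        [GIT, "config", "--file", str(config_file), *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


settings = []
if GIT is None:
    print("Git was not found on PATH; install it first.")
else:
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for your user configuration file.
        scratch = Path(tmp) / "scratch_gitconfig"
        git_config(scratch, "user.name", "Sarah Chen")
        git_config(scratch, "user.email", "sarah@example.com")
        settings = git_config(scratch, "--list").splitlines()
        print(scratch.read_text())  # the plain-text [user] section Git wrote
    print(settings)
```

The configuration is an ordinary text file, which is why the settings persist across projects once they are written with `--global`.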
Confirm the settings were saved:
git config --global --list
The Git Lifecycle
Working with Git means thinking in snapshots rather than pressing Save and walking away. A snapshot is called a commit.
1. Start tracking a folder (git init)
Navigate to your project folder in the terminal and initialize a repository:
cd ~/coding_projects/flood_analysis
git init
Git creates a hidden .git folder. This folder is the repository: it contains your complete history. Do not delete or modify it manually.
If you are starting from an existing project on GitHub, clone it instead:
git clone https://github.com/username/repository-name.git
This downloads the repository and its full history to your local machine.
2. Check the current state (git status)
git status is the command you will run most often. It shows you which files have been modified, which are new (untracked), and which are staged for the next commit:
git status
On branch main
Untracked files:
(use "git add <file>..." to include in what will be committed)
analysis.py
data/
Files listed in red are not yet tracked by Git. Nothing happens to them until you stage them.
3. Stage your changes (git add)
Git does not snapshot automatically. You explicitly choose which new or modified files to include in the next commit. The set of changes you have chosen is called the staging area.
Stage a specific file:
git add analysis.py
Stage everything in the current directory:
git add .
After staging, git status shows the staged files in green under “Changes to be committed.”
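The move from untracked to staged can be observed from a script. This is a minimal sketch, assuming Git is on your PATH: it builds a throwaway repository in a temporary folder and reads `git status --porcelain`, the machine-readable form of `git status`. The helper name `git` and the file contents are illustrative only.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

GIT = shutil.which("git")  # None when Git is not installed


def git(repo, *args):
    """Run a Git command inside `repo` and return its stdout."""
    result = subprocess.run(
        [GIT, *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


before, after = "", ""
if GIT is None:
    print("Git was not found on PATH; install it first.")
else:
    with tempfile.TemporaryDirectory() as repo:
        git(repo, "init")
        Path(repo, "analysis.py").write_text("print('rainfall')\n")
        before = git(repo, "status", "--porcelain").rstrip("\n")
        git(repo, "add", "analysis.py")
        after = git(repo, "status", "--porcelain").rstrip("\n")
    print(before)  # "?? analysis.py": untracked
    print(after)   # "A  analysis.py": staged as a new file
```

In the porcelain format, `??` marks an untracked file and `A` in the first column marks a file added to the staging area.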
4. Take a snapshot (git commit)
With files staged, create the commit. The message should explain why you made the change, not just what:
git commit -m "Add rainfall normalization step to remove outliers above 200mm"
Good messages: "Fix coordinate reference system mismatch in flood extent layer", "Switch from Euclidean to Haversine distance for accuracy across large areas"
Bad messages: "fixed stuff", "update", "aaa"
The commit is saved to your local .git history. You can now recover this exact state of the project at any time.
5. View your history (git log)
git log
This shows every commit: its unique hash, author, date, and message. Use git log --oneline for a compact summary:
a3f91c2 Fix coordinate reference system mismatch
b72e8d4 Add rainfall normalization step
c10d991 Initial analysis script
The hash (e.g. a3f91c2) uniquely identifies each commit. You use it to reference specific points in history.
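The add, commit, and log steps can be rehearsed end to end without touching a real project. The sketch below assumes Git is on your PATH and works in a temporary folder. The identity is passed with `-c` flags so the example does not depend on your global configuration, and the helper name `git` and the file contents are invented for the example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

GIT = shutil.which("git")  # None when Git is not installed

# Per-command settings so the sketch runs even without global configuration.
OPTIONS = [
    "-c", "user.name=Sarah Chen",
    "-c", "user.email=sarah@example.com",
    "-c", "commit.gpgsign=false",
]


def git(repo, *args):
    """Run a Git command inside `repo` and return its stdout."""
    result = subprocess.run(
        [GIT, *OPTIONS, *args], cwd=repo, check=True,
        capture_output=True, text=True,
    )
    return result.stdout


# Two illustrative versions of the same script, one commit each.
versions = [
    ("Initial analysis script", "rain = [12, 250, 40]\n"),
    ("Add rainfall normalization step",
     "rain = [r for r in [12, 250, 40] if r <= 200]\n"),
]

history = []
if GIT is None:
    print("Git was not found on PATH; install it first.")
else:
    with tempfile.TemporaryDirectory() as repo:
        git(repo, "init")
        for message, body in versions:
            Path(repo, "analysis.py").write_text(body)
            git(repo, "add", "analysis.py")
            git(repo, "commit", "-m", message)
        history = git(repo, "log", "--oneline").splitlines()
    for line in history:
        print(line)  # newest commit first: "<short hash> <message>"
```

The short hashes differ on every run because each hash depends on the commit's contents, author, and timestamp.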
6. Get changes from the cloud (git pull)
If you are collaborating, or working across multiple computers, always pull before you push:
git pull
This downloads any commits that exist on GitHub but not on your local machine and merges them in. If you push without pulling first and a collaborator has made changes, Git will refuse with a “rejected” error. Pull first, resolve any conflicts, then push.
7. Back up to the cloud (git push)
A commit only saves to your local hard drive. To synchronize with GitHub:
git push
If this is your first push to a new repository, Git may ask you to set an upstream:
git push -u origin main
You only need the -u origin main part once. After that, plain git push works.
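Pushing can be tried without a GitHub account by using a local bare repository as the remote. The sketch below is one way to do that, assuming Git is on your PATH. The folder `remote.git` is a stand-in for GitHub, and the helper name `git`, the paths, and the file contents are illustrative.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

GIT = shutil.which("git")  # None when Git is not installed

OPTIONS = [
    "-c", "user.name=Sarah Chen",
    "-c", "user.email=sarah@example.com",
    "-c", "commit.gpgsign=false",
]


def git(cwd, *args):
    """Run a Git command inside `cwd` and return its stdout."""
    result = subprocess.run(
        [GIT, *OPTIONS, *args], cwd=cwd, check=True,
        capture_output=True, text=True,
    )
    return result.stdout


upstream = local_head = remote_head = ""
if GIT is None:
    print("Git was not found on PATH; install it first.")
else:
    with tempfile.TemporaryDirectory() as tmp:
        remote = Path(tmp, "remote.git")    # local stand-in for GitHub
        work = Path(tmp, "flood_analysis")  # your working copy
        work.mkdir()
        git(tmp, "init", "--bare", str(remote))

        git(work, "init")
        Path(work, "analysis.py").write_text("print('flood extent')\n")
        git(work, "add", ".")
        git(work, "commit", "-m", "Initial analysis script")
        git(work, "branch", "-M", "main")  # make sure the branch is named main

        git(work, "remote", "add", "origin", str(remote))
        git(work, "push", "-u", "origin", "main")  # -u records the upstream

        upstream = git(work, "rev-parse", "--abbrev-ref", "main@{upstream}").strip()
        local_head = git(work, "rev-parse", "HEAD").strip()
        remote_head = git(work, "ls-remote", "origin", "refs/heads/main").split()[0]
    print(upstream)                   # origin/main
    print(local_head == remote_head)  # True: the remote holds the same commit
```

Because `-u` stored `origin/main` as the upstream of `main`, later pushes from this working copy need no extra arguments.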
The .gitignore File
Some files should never be committed: virtual environment folders, compiled Python files, large data files, credentials. A .gitignore file in your project root tells Git to ignore them:
# Python
.venv/
__pycache__/
*.pyc
*.pyo
# Data files (too large for Git)
*.tif
*.geotiff
data/raw/
# Credentials (never commit these)
.env
secrets.py
Create this file in your project root before your first commit. Once a file is already tracked by Git, adding it to .gitignore does not remove it; you would need git rm --cached filename to stop tracking it.
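To build intuition for how the patterns above select files, here is a rough Python approximation. It is not Git's matcher: it covers only the three pattern shapes used in this example (a bare directory name, a root-anchored directory path, and a file glob), and the function name `is_ignored` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The patterns from the example .gitignore above.
PATTERNS = [
    ".venv/", "__pycache__/", "*.pyc", "*.pyo",
    "*.tif", "*.geotiff", "data/raw/",
    ".env", "secrets.py",
]


def is_ignored(path, patterns=PATTERNS):
    """Approximate whether a file path (relative to the project root) is ignored."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            wanted = PurePosixPath(pattern.rstrip("/")).parts
            if len(wanted) > 1:
                # A slash inside the pattern anchors it to the project root.
                if parts[:len(wanted)] == wanted and len(parts) > len(wanted):
                    return True
            elif wanted[0] in parts[:-1]:
                # A bare directory name matches at any depth.
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # A pattern without a slash matches a name at any depth.
            return True
    return False


for candidate in [
    ".venv/lib/site.py",
    "src/__pycache__/analysis.cpython-311.pyc",
    "data/raw/dem.csv",
    "data/rainfall.csv",
    "outputs/flood.tif",
    "analysis.py",
]:
    print(candidate, is_ignored(candidate))
```

With these patterns, `data/rainfall.csv` and `analysis.py` remain visible to Git while everything under `.venv/`, `__pycache__/`, and `data/raw/` is skipped. For an authoritative answer on a real repository, `git check-ignore -v <path>` reports which pattern, if any, matches.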
Fixing Mistakes
Undo the last commit but keep your changes:
git reset HEAD~1
This moves the commit pointer back one step and unstages the files, leaving your actual file contents unchanged. Useful when you committed too early.
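The effect of this reset can be checked in a throwaway repository. The following sketch assumes Git is on your PATH. It makes two commits, undoes the second, and then inspects the history, the status, and the file on disk. The helper name `git` and the file contents are invented for the example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

GIT = shutil.which("git")  # None when Git is not installed

OPTIONS = [
    "-c", "user.name=Sarah Chen",
    "-c", "user.email=sarah@example.com",
    "-c", "commit.gpgsign=false",
]


def git(repo, *args):
    """Run a Git command inside `repo` and return its stdout."""
    result = subprocess.run(
        [GIT, *OPTIONS, *args], cwd=repo, check=True,
        capture_output=True, text=True,
    )
    return result.stdout


commits, status, contents = [], "", ""
if GIT is None:
    print("Git was not found on PATH; install it first.")
else:
    with tempfile.TemporaryDirectory() as repo:
        script = Path(repo, "analysis.py")
        git(repo, "init")

        script.write_text("threshold = 200\n")
        git(repo, "add", "analysis.py")
        git(repo, "commit", "-m", "Initial analysis script")

        script.write_text("threshold = 150  # committed too early\n")
        git(repo, "add", "analysis.py")
        git(repo, "commit", "-m", "Lower threshold")

        git(repo, "reset", "HEAD~1")  # undo the last commit, keep the edit

        commits = git(repo, "log", "--oneline").splitlines()
        status = git(repo, "status", "--porcelain").rstrip("\n")
        contents = script.read_text()
    print(len(commits))  # 1: only the first commit remains in history
    print(status)        # " M analysis.py": modified but not staged
    print(contents)      # the edited line is still on disk
```

The second commit disappears from the log, but the edit it contained survives as an ordinary unstaged change that can be revised and committed again.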
Discard all uncommitted changes to a file (permanent; cannot be undone):
git checkout -- analysis.py
See exactly what changed in a file before staging:
git diff analysis.py
Lines beginning with + were added. Lines beginning with - were removed.
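The + and - notation is the unified diff format, which Python's standard-library `difflib` can also produce. The sketch below illustrates the format only and is not how Git computes its diffs. The two file versions are made up.

```python
import difflib

# Two hypothetical versions of analysis.py, as lists of lines.
old = ["threshold = 250\n", "rain = load_rainfall()\n"]
new = ["threshold = 200\n", "rain = load_rainfall()\n"]

diff = list(difflib.unified_diff(
    old, new, fromfile="a/analysis.py", tofile="b/analysis.py"
))
print("".join(diff), end="")

# Header lines start with "+++" and "---", so exclude them when collecting changes.
added = [line[1:] for line in diff
         if line.startswith("+") and not line.startswith("+++")]
removed = [line[1:] for line in diff
           if line.startswith("-") and not line.startswith("---")]
print("added:", added)      # ['threshold = 200\n']
print("removed:", removed)  # ['threshold = 250\n']
```

A changed line appears as a removal followed by an addition, while unchanged lines are shown with a leading space for context.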
Verify Your Work
- Install Git and confirm with git --version.
- Set your name and email: git config --global user.name and git config --global user.email.
- Create a new folder, cd into it, and run git init.
- Create a simple Python file in that folder.
- Run git status and confirm the file appears as untracked.
- Run git add . and then git status again, and confirm it is staged.
- Run git commit -m "Initial script".
- Run git log --oneline and confirm your commit appears.
- Create a .gitignore with .venv/ and __pycache__/.
- Create a free account at github.com, create a repository, and push your commits.
Conclusion
You have traversed the complete mechanical stack required for scientific computing. You can navigate directories with the command line, install the Python interpreter, create virtual environments, manage libraries with pip and conda, explore data in Jupyter Notebooks, write structured scripts in an IDE, find and fix errors with a debugger, and track your work with Git.
The installation phase is complete. You are ready to choose a pathway and start writing real geographic code.