CrabGit — Building Git from Scratch in Rust
I use Git every day, but honestly had no idea how it worked under the hood. So I spent a few weeks building my own version in Rust to figure it out. Turns out, Git is way cooler than I thought.
Look, I'll be honest - I've been using Git for years and had absolutely no clue what was actually happening when I ran these commands:
Sure, it worked. But what the hell was Git actually doing? Where did my files go? What's a blob? Why is everything hashed?
So I did what any curious developer would do - I built my own Git implementation from scratch. Meet CrabGit (yes, Rust's crab mascot demanded naming rights).
It's not meant to replace Git. It's way simpler - local-only, no remotes, no fancy features. Just the core stuff:
- A content-addressable object store
- Blobs, trees, and commits
- Basic branching
- A tiny .crab_git folder that holds everything
This post is basically my learning journey. I'll show you how CrabGit works, how data flows from add to commit, and why Git's architecture is actually pretty genius.
What CrabGit Can Do
I kept it minimal on purpose. Here's what I managed to implement:
Repository Basics
- init - Start a new repo
- status - See what's changed
- add - Stage files
Version Control
- commit - Save a snapshot
- log - View history
- diff - Compare changes
Branching
- branch - Create/list/delete branches
- checkout - Switch branches or commits
That's it. No push, no pull, no merge conflicts to debug at 2 AM. Just the fundamentals.
The CLI
When you run the binary, you get this ASCII art banner (because what's a CLI tool without ASCII art, right?):

How I Structured the Code
I split CrabGit into four layers to keep my sanity:
- UI Layer - Command-line parsing with clap
- Commands - Each Git operation (init, add, commit, etc.)
- Core - Repository logic and the object store
- Data Models - Rust structs for blobs, trees, commits

The key rule I followed: commands never touch the filesystem directly. They go through the core layer, which handles all the file I/O and hashing. This kept things organized and made debugging way easier.
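To make that rule concrete, here's a minimal sketch of the idea, not CrabGit's actual code. The names ObjectStore, MemoryStore, and cat_object are made up for illustration. The point is that a command only sees a trait, so the code that really touches the disk lives in exactly one place.

```rust
use std::collections::HashMap;

/// Hypothetical core-layer interface: the only place that would do real I/O.
trait ObjectStore {
    fn put(&mut self, id: &str, bytes: &[u8]);
    fn get(&self, id: &str) -> Option<Vec<u8>>;
}

/// In-memory store, handy for tests. A disk-backed store would
/// implement the same trait and keep all the std::fs calls to itself.
#[derive(Default)]
struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl ObjectStore for MemoryStore {
    fn put(&mut self, id: &str, bytes: &[u8]) {
        self.objects.insert(id.to_string(), bytes.to_vec());
    }

    fn get(&self, id: &str) -> Option<Vec<u8>> {
        self.objects.get(id).cloned()
    }
}

/// A command-layer function: it talks to the trait, never to the filesystem.
fn cat_object(store: &dyn ObjectStore, id: &str) -> Result<String, String> {
    let bytes = store
        .get(id)
        .ok_or_else(|| format!("object {id} not found"))?;
    String::from_utf8(bytes).map_err(|e| e.to_string())
}
```

A nice side effect of this shape is that commands can be tested against the in-memory store without creating any files.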
The Journey from add to commit
This is where it gets interesting. Let me walk you through what happens when you stage and commit a file.
Step 1: Staging (the add command)
You run:
Here's what happens behind the scenes:
- CrabGit reads file.txt from your working directory
- It calculates a SHA256 hash of the contents
- It creates a Blob object and saves it to .crab_git/objects/
- It updates the staging area (index) with an entry for that file
At this point, the file content is safely stored in the object database, and the index knows it should be included in the next commit.
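The staging steps above can be sketched roughly like this. It's an in-memory toy under a few assumptions: the Repo struct and its fields are hypothetical, the index is assumed to map a path to a blob hash, and std's DefaultHasher stands in for SHA256 (CrabGit uses the sha2 crate) so the sketch has no dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashMap};
use std::hash::{Hash, Hasher};

/// Stand-in for SHA256: std's 64-bit hasher rendered as 16 hex characters.
/// This is NOT cryptographic; it only keeps the sketch dependency-free.
fn toy_hash(content: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical in-memory repo: an object store plus a staging area.
#[derive(Default)]
struct Repo {
    objects: HashMap<String, Vec<u8>>, // blob hash -> file content
    index: BTreeMap<String, String>,   // path -> blob hash (assumed layout)
}

impl Repo {
    /// Stage a file: hash it, store the blob once, record path -> hash.
    fn add(&mut self, path: &str, content: &[u8]) -> String {
        let hash = toy_hash(content);
        // Identical content hashes to the same key, so it is stored only once.
        self.objects
            .entry(hash.clone())
            .or_insert_with(|| content.to_vec());
        self.index.insert(path.to_string(), hash.clone());
        hash
    }
}
```

Staging two files with identical content leaves one object in the store and two entries in the index.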
Step 2: Committing (the commit command)
You run:
Now CrabGit does this:
- Reads all staged entries from the index
- Builds a Tree object that represents your directory structure
- Creates a Commit object with:
  - A pointer to the tree (the snapshot)
  - A pointer to the parent commit (if there is one)
  - Your author info, message, and timestamp
- Saves the commit to the object store
- Updates the current branch (like refs/heads/main) to point to this new commit
- Keeps HEAD pointing to that branch
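Here's a rough in-memory sketch of those commit steps. The struct and field names are assumptions for illustration. It flattens the tree into a single listing, leaves out author info and timestamps, and uses std's DefaultHasher where CrabGit uses SHA256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashMap};
use std::hash::{Hash, Hasher};

/// Stand-in for SHA256 (not cryptographic), just to derive ids.
fn toy_hash(text: &str) -> String {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

#[derive(Debug, Clone, PartialEq)]
struct Commit {
    tree: String,           // id of the snapshot
    parent: Option<String>, // None for the very first commit
    message: String,
}

struct Repo {
    index: BTreeMap<String, String>,  // staged path -> blob hash
    commits: HashMap<String, Commit>, // commit id -> commit
    refs: HashMap<String, String>,    // branch ref -> commit id
    head: String,                     // the ref HEAD points at
}

impl Repo {
    fn new() -> Self {
        Repo {
            index: BTreeMap::new(),
            commits: HashMap::new(),
            refs: HashMap::new(),
            head: "refs/heads/main".to_string(),
        }
    }

    fn commit(&mut self, message: &str) -> String {
        // 1. Read the staged entries and derive a (flattened) tree id.
        let listing: String = self
            .index
            .iter()
            .map(|(path, blob)| format!("{blob} {path}\n"))
            .collect();
        let tree = toy_hash(&listing);
        // 2. The parent is whatever the current branch points at, if anything.
        let parent = self.refs.get(&self.head).cloned();
        // 3. Build the commit and save it under its own id.
        let id = toy_hash(&format!("{tree}|{parent:?}|{message}"));
        let commit = Commit { tree, parent, message: message.to_string() };
        self.commits.insert(id.clone(), commit);
        // 4. Move the branch forward; HEAD still names the same branch.
        self.refs.insert(self.head.clone(), id.clone());
        id
    }
}
```

Notice that committing never rewrites HEAD itself. Only the branch ref moves, which is why a branch is really just a pointer.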
Here's the full flow:

Inside the .crab_git Directory
This is where all the magic happens. Here's what the folder structure looks like:

Content-Addressable Storage (Fancy Term for Hash-Based Filing)
Every single object follows this pattern:
- Take the content
- Hash it with SHA256
- Use the first two characters as a directory name
- Use the rest as the filename
Example:
Why? Because if the content changes, the hash changes, which means it gets stored separately. Unchanged content gets reused automatically. Git's entire history system is built on this simple idea.
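The hash-to-path step is small enough to sketch on its own. The function name object_path is hypothetical, and the hash in the usage note below is a shortened made-up value rather than a real SHA256 digest.

```rust
use std::path::{Path, PathBuf};

/// Turn a hex hash into an object location:
/// the first two characters become the directory, the rest the filename.
/// Returns None if the hash is too short to split or is not ASCII.
fn object_path(repo_dir: &Path, hash: &str) -> Option<PathBuf> {
    if hash.len() < 3 || !hash.is_ascii() {
        return None;
    }
    let (dir, file) = hash.split_at(2);
    Some(repo_dir.join("objects").join(dir).join(file))
}
```

For example, object_path(Path::new(".crab_git"), "ab34ef") resolves to the objects directory, then a folder named ab, then a file named 34ef.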
Compression
Each object is serialized as JSON first (yeah, I know real Git uses its own format, but JSON made debugging so much easier). Then, to save disk space, I compress it with zlib when writing and decompress it when reading.
The Data Models (Git's Secret Sauce)
Here's how I represented Git's core concepts in Rust:
Blob - Raw File Content
Simple. Just the file content and its hash.
Tree - Directory Structure
Trees map filenames to blobs (files) or other trees (subdirectories).
TreeEntry - Single File or Folder
Commit - Snapshot in Time
Each commit points to a tree (the full snapshot) and its parent commit (the previous snapshot). The first commit has no parent.
Index - Staging Area
This is what add modifies and commit reads from.
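As a rough sketch, the five models described in this section could look something like the structs below. The field names and types are illustrative guesses, not necessarily what's in the CrabGit source. In particular, the timestamp is shown as plain Unix seconds rather than a chrono type, and the serde derives are left out to keep it dependency-free.

```rust
use std::collections::BTreeMap;

/// Raw file content plus the hash that identifies it.
#[derive(Debug, Clone, PartialEq)]
struct Blob {
    hash: String,
    content: Vec<u8>,
}

/// What a tree entry points at: a file or a subdirectory.
#[derive(Debug, Clone, PartialEq)]
enum EntryKind {
    Blob,
    Tree,
}

/// One named file or folder inside a tree.
#[derive(Debug, Clone, PartialEq)]
struct TreeEntry {
    name: String,
    kind: EntryKind,
    hash: String, // hash of the blob or subtree this entry refers to
}

/// A directory: a list of entries.
#[derive(Debug, Clone, PartialEq)]
struct Tree {
    entries: Vec<TreeEntry>,
}

/// A snapshot in time.
#[derive(Debug, Clone, PartialEq)]
struct Commit {
    tree: String,           // hash of the root tree
    parent: Option<String>, // None for the first commit
    author: String,
    message: String,
    timestamp: i64, // assumed: Unix seconds
}

/// The staging area: path -> blob hash (assumed layout).
#[derive(Debug, Default)]
struct Index {
    entries: BTreeMap<String, String>,
}
```

Modeling the parent as an Option makes the "first commit has no parent" case explicit in the type instead of relying on an empty string or a sentinel value.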
How It All Connects
Here's how commits, trees, and blobs relate:

And here's how your actual filesystem maps to Git objects:

A closer look at how trees can point to both blobs and other trees:

The brilliant part:
- Every commit is a full snapshot
- But unchanged files share the same blob
- So you're not duplicating data - you're reusing hashes
When you change one file, only that blob changes. The rest of the tree reuses existing objects. That's how Git can store years of history without eating your entire hard drive.
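A tiny sketch shows the sharing at work. Here a snapshot is simplified to a map from path to blob hash, and the helper new_blobs (a made-up name) reports which blobs a new snapshot actually has to store. The hashes are short placeholder strings, not real digests.

```rust
use std::collections::{BTreeMap, HashSet};

/// Simplified snapshot: path -> blob hash.
type Snapshot = BTreeMap<&'static str, &'static str>;

/// Blob hashes that `next` needs but `prev` has not already stored.
fn new_blobs(prev: &Snapshot, next: &Snapshot) -> Vec<&'static str> {
    let known: HashSet<&'static str> = prev.values().copied().collect();
    let mut fresh: Vec<&'static str> = next
        .values()
        .copied()
        .filter(|hash| !known.contains(hash))
        .collect();
    fresh.sort();
    fresh.dedup();
    fresh
}
```

If two snapshots differ in a single file, new_blobs returns just that one file's new hash. Everything else is already in the object store and gets reused.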
The Codebase Layout
I organized the project like this:
Pretty straightforward. Commands are isolated, core logic is separate, and everything talks through well-defined interfaces.
How Commands Actually Execute
Every command follows the same pipeline:
- Parse - clap turns CLI args into a Command enum
- Route - A match statement sends it to the right handler
- Load - The core finds .crab_git and loads repo state
- Execute - The command does its thing (read/write objects, update index)
- Save - New objects get written, refs get updated
- Output - Results print to the terminal
It's like a mini compiler pipeline, which made the code really easy to reason about.
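The parse and route steps can be sketched like this. It's a simplified stand-in: the parse function is hand-rolled where CrabGit uses clap, the Command variants are a hypothetical subset, the -m flag is assumed from Git's convention, and the handlers just return the text they would print.

```rust
/// Hypothetical subset of the commands.
#[derive(Debug, PartialEq)]
enum Command {
    Init,
    Add { path: String },
    Commit { message: String },
    Log,
}

/// Parse: turn raw CLI arguments into a Command.
/// (Hand-rolled here; in CrabGit this is clap's job.)
fn parse(args: &[&str]) -> Result<Command, String> {
    match args {
        ["init"] => Ok(Command::Init),
        ["add", path] => Ok(Command::Add { path: path.to_string() }),
        ["commit", "-m", message] => Ok(Command::Commit {
            message: message.to_string(),
        }),
        ["log"] => Ok(Command::Log),
        other => Err(format!("unrecognized command: {other:?}")),
    }
}

/// Route: one match statement picks the handler.
/// Each arm returns the output that would be printed to the terminal.
fn run(command: Command) -> String {
    match command {
        Command::Init => "initialized empty repository".to_string(),
        Command::Add { path } => format!("staged {path}"),
        Command::Commit { message } => format!("committed: {message}"),
        Command::Log => "no commits yet".to_string(),
    }
}
```

Because the match is exhaustive, adding a new variant to the enum makes the compiler point at every place that still needs a handler.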
Try It Yourself
Want to play with CrabGit? Here's how:
macOS / Linux
Windows (PowerShell)
The Dependencies
Kept it minimal. Just these crates:
- sha2 - SHA256 hashing
- serde/serde_json - Serialization (so much easier than binary formats)
- chrono - Timestamps for commits
- clap - CLI argument parsing
- walkdir - Recursive directory traversal
- flate2 - zlib compression
What I Learned
Building CrabGit was honestly one of the best learning experiences I've had. Here's what clicked for me:
Git isn't magic. It's just a content-addressable filesystem with some clever bookkeeping. Every commit is a snapshot. Branches are just pointers. That's it.
Hashing is powerful. Once I understood that everything is identified by its hash, the whole system made sense. Stored objects never change - edited content just gets stored as a new object under a new hash.
Rust was perfect for this. The type system forced me to think through the data model properly. No null pointer surprises, no accidental mutations. Just clean, explicit code.
Would I replace Git with this? Hell no. But do I finally understand how Git works? Absolutely.
If you've ever felt like Git is this mystical black box, I really recommend building something like this. Start small, add features one at a time, and suddenly it all clicks.
The code's on GitHub if you want to poke around. PRs welcome if you want to add features (like, I dunno, actual merge support?).


