These three terms — Base62, MD5, and SHA-256 — often get mentioned together in system design discussions, even though they solve very different problems.
I used to loosely group them as “ways to generate short or unique strings”. That’s technically wrong, and understanding why clears up a lot of confusion around IDs, security, and correctness.
This note is my attempt to explain what each of them actually does, without leaning on buzzwords.
Base62: Representation, Not Transformation
Base62 is not a hashing algorithm.
It’s not encryption.
It doesn’t add randomness.
Base62 is just a way of writing numbers more compactly.
We’re used to base10 (digits 0–9).
Base62 simply uses more symbols:
0–9 (10)
a–z (26)
A–Z (26)
Total = 62
So instead of writing a number like this:
125000321
With the digits ordered 0–9, a–z, A–Z, we can write the same number as:
8suid
Nothing changed except the representation.
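To make the "same number, different spelling" idea concrete, here is a minimal sketch of an encoder. The function name `base62_encode` and the digit order (0–9, then a–z, then A–Z) are my own assumptions for illustration; real systems sometimes order the symbols differently.

```python
import string

# Assumed digit order: 0-9, then a-z, then A-Z (other orders are equally valid).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Write a non-negative integer using 62 symbols instead of 10."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, 62)  # peel off the least significant digit
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


print(base62_encode(61))         # largest value that fits in one symbol
print(base62_encode(62))         # first value that needs two symbols
print(base62_encode(125000321))  # nine decimal digits shrink to five symbols
```

This is the same repeated-division procedure used to convert a number to binary or hex, just with 62 as the base.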
The key property
Base62 is reversible and collision-free.
If two numbers are different, their Base62 representations will also be different. Always.
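Both properties can be checked directly. The sketch below is only an illustration, with helper names and a digit order (0–9, a–z, A–Z) that I chose myself: it encodes a range of numbers, decodes them again, and confirms that nothing was lost and nothing collided.

```python
import string

# Assumed digit order: 0-9, then a-z, then A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def base62_encode(n: int) -> str:
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = ""
    while True:
        n, r = divmod(n, 62)
        out = ALPHABET[r] + out
        if n == 0:
            return out


def base62_decode(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + INDEX[ch]  # KeyError on any symbol outside the alphabet
    return n


numbers = range(100_000)
encoded = [base62_encode(n) for n in numbers]

# Reversible: decoding gives back exactly the number we started with.
assert all(base62_decode(s) == n for s, n in zip(encoded, numbers))
# Collision-free: 100,000 distinct numbers produce 100,000 distinct strings.
assert len(set(encoded)) == len(encoded)
```

One caveat: the guarantee applies to the encoder's canonical output. A string padded with leading zero symbols decodes to the same number as the unpadded one, just as 007 and 7 are the same value in base10.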
This is why Base62 is commonly used anywhere we want:
- Shorter identifiers
- URL-safe strings
- Human-friendly IDs
Base62 preserves information. It does not destroy or compress it.
MD5: One-Way Fingerprints
MD5 lives in a completely different world.
MD5 is a hashing algorithm. That means:
- Any input → fixed-size output
- Output looks random, but is fully deterministic (same input, same output)
- You cannot reliably reverse it
Examples:
"hello" → 5d41402abc4b2a76b9719d911017c592
"hello world" → 5eb63bbbe01eeed093cb22bb8f5acdc3
No matter how small or large the input is, the output is always 128 bits, written above as 32 hexadecimal characters.
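The fixed output size is easy to see with Python's standard `hashlib` module. This is just a quick illustration; the choice of inputs is arbitrary.

```python
import hashlib

# Inputs from zero bytes up to a megabyte.
inputs = [b"", b"hello", b"hello world", b"x" * 1_000_000]

for data in inputs:
    digest = hashlib.md5(data).digest()  # raw 16-byte digest
    print(f"{len(data):>9} input bytes -> {len(digest) * 8} bits: {digest.hex()}")
```

Every line reports 128 bits, whether the input was empty or a million bytes long.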
The core idea
Hashing intentionally destroys information.
This is useful for:
- Detecting data corruption
- Comparing large files
- Fingerprinting content
But it also means:
- Collisions are mathematically unavoidable (infinitely many possible inputs map onto a finite set of outputs)
- You cannot “decode” the original input
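Why collisions are unavoidable becomes obvious if the output is shrunk far enough. The sketch below uses a deliberately tiny "hash" of my own invention, the first byte of an MD5 digest, so there are only 256 possible outputs. Feed it 257 distinct inputs and at least two must share a value.

```python
import hashlib


def first_byte(data: bytes) -> int:
    """A deliberately tiny 'hash': only the first byte of the MD5 digest."""
    return hashlib.md5(data).digest()[0]


def find_collision(limit: int = 257):
    seen = {}  # tiny hash value -> input that produced it
    for i in range(limit):
        data = str(i).encode()
        h = first_byte(data)
        if h in seen:
            return seen[h], data, h
        seen[h] = data
    return None  # cannot happen when limit >= 257: only 256 values exist


a, b, h = find_collision()
print(f"{a!r} and {b!r} both map to {h}")
```

Full MD5 works the same way in principle; its output space is simply 2^128 values instead of 256.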
MD5 specifically is now considered cryptographically broken, because collisions can be generated intentionally. That makes it unsuitable for security-sensitive use cases.
SHA-256: Same Idea, Stronger Guarantees
SHA-256 is conceptually the same as MD5:
- One-way hash
- Fixed-size output
- Avalanche effect (small input change → huge output change)
The difference is strength, not philosophy.
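The avalanche effect mentioned in the list above can be observed directly. This is a small sketch with a hypothetical helper, `bit_difference`, that flips a single bit of the input and counts how many of the 256 output bits change.

```python
import hashlib


def bit_difference(x: bytes, y: bytes) -> int:
    """Number of bit positions where two equal-length byte strings differ."""
    return sum(bin(a ^ b).count("1") for a, b in zip(x, y))


original = b"hello"
flipped = bytes([original[0] ^ 0b00000001]) + original[1:]  # flip one input bit

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(flipped).digest()

print(bit_difference(original, flipped))  # 1 bit differs in the input
print(bit_difference(d1, d2))             # typically close to half of 256
```

A well-behaved hash changes each output bit with probability of about one half, so the second number should land somewhere near 128.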
SHA-256:
- Produces a 256-bit hash
- Has a vastly larger output space
- Is currently considered cryptographically secure
That’s why SHA-256 is used for:
- Password hashing, but only inside a salted, stretched construction such as PBKDF2-HMAC-SHA256, never as a single bare SHA-256 call
- Digital signatures
- Blockchain systems
- Integrity verification where security matters
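For the password bullet, "salt and stretching" means SHA-256 is a building block rather than the whole scheme. Here is a minimal sketch using `hashlib.pbkdf2_hmac` from the standard library. The function names and the iteration count are my own illustrative assumptions, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 600_000  # illustrative work factor; tune for your hardware and policy


def hash_password(
    password: str, salt: Optional[bytes] = None, iterations: int = ITERATIONS
) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key


def verify_password(
    password: str, salt: bytes, expected_key: bytes, iterations: int = ITERATIONS
) -> bool:
    _, key = hash_password(password, salt, iterations)
    return hmac.compare_digest(key, expected_key)  # constant-time comparison
```

The salt makes identical passwords produce different stored values, and the iteration count makes each guess expensive for an attacker. A single `sha256(password)` provides neither.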
But even SHA-256 still shares the same fundamental property as MD5:
It is not reversible, and collisions must exist. They are just astronomically unlikely to occur by chance, and no practical way of constructing one is known.
Encoding vs Hashing (The Mental Model That Matters)
This distinction clears up most confusion:
Encoding (Base62, Base64, Hex):
- Reversible
- No collisions
- Changes representation only
Hashing (MD5, SHA-256):
- One-way
- Collisions possible
- Destroys information by design
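The two columns above can be contrasted in a few lines. This sketch uses hex and URL-safe Base64 from the standard library as the encodings (Base62 is not built in) and SHA-256 as the hash. The sample value is made up.

```python
import base64
import hashlib

data = b"order-12345"  # hypothetical sample value

# Encoding: a different spelling of the same bytes, fully recoverable.
encoded = base64.urlsafe_b64encode(data)
assert base64.urlsafe_b64decode(encoded) == data
assert bytes.fromhex(data.hex()) == data

# An encoding grows with its input; a hash stays the same size.
for size in (10, 1_000, 100_000):
    blob = b"a" * size
    print(size, len(blob.hex()), len(hashlib.sha256(blob).hexdigest()))
```

The hex length doubles the input size every time, while the SHA-256 column stays at 64 characters. That fixed size is exactly why a hash cannot be decoded: most of the input is simply not in the output.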
When Each Makes Sense
Use Base62 when:
- You already have a unique number
- You want it shorter or more readable
- You need zero collision risk
Use MD5 when:
- You need a quick, non-secure fingerprint
- You’re checking data integrity against accidental corruption, not deliberate tampering
- Security is not a concern
Use SHA-256 when:
- Security matters
- You need tamper resistance
- You’re hashing secrets or sensitive data (for passwords, through a salted, stretched construction rather than a single hash)
Problems happen when these are used interchangeably.
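For the integrity-checking cases above, the code is the same for both hashes and only the algorithm name changes. A minimal sketch, with a hypothetical helper name `file_digest` and an arbitrary chunk size:

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hex digest of a file, read in chunks so large files stay out of memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Calling it with `"md5"` is reasonable for spotting a truncated download or a flipped bit on disk. If someone might deliberately swap the file, use `"sha256"` and obtain the expected digest over a channel the attacker does not control.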
A Common Anti-Pattern
A mistake I’ve seen (and probably made before):
short_id = MD5(input).substring(0, 6)
This looks convenient but introduces:
- Collision risk: six hex characters allow only about 16.7 million distinct values
- Retry logic
- Hard-to-reason-about correctness
The issue isn’t MD5 itself; it’s using a truncated hash where a guaranteed-unique identifier is required.
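How quickly does this go wrong? A rough sketch, assuming the digest is taken as hex (so six characters give 16^6 possible IDs) and using the standard birthday approximation. The example URLs are hypothetical.

```python
import hashlib
import math

SPACE = 16 ** 6  # 6 hex characters -> 16,777,216 possible short IDs


def collision_probability(n: int, space: int = SPACE) -> float:
    """Birthday approximation: chance that n random IDs contain a duplicate."""
    return 1.0 - math.exp(-n * (n - 1) / (2 * space))


def short_id(value: str) -> str:
    return hashlib.md5(value.encode()).hexdigest()[:6]


for n in (1_000, 5_000, 20_000):
    print(f"{n:>6} IDs -> {collision_probability(n):.1%} chance of a collision")

# Empirical check: hash hypothetical inputs until two share a short ID.
seen = {}
for i in range(SPACE + 1):
    key = f"https://example.com/item/{i}"
    sid = short_id(key)
    if sid in seen:
        print(f"collision after {i + 1} inputs: {seen[sid]} and {key} -> {sid}")
        break
    seen[sid] = key
```

By this estimate a collision is roughly a coin flip at around 5,000 IDs, long before the 16.7 million slots fill up. If the underlying value is already a unique number, such as a database sequence, Base62-encoding it gives a short ID with no collision handling at all.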
Final Takeaway
These tools are not competing solutions — they solve different classes of problems.
Base62 answers:
How do I represent this value more compactly?
MD5 and SHA-256 answer:
How do I generate a fixed-size fingerprint of this data?