Portfolio | Harsh Mange

These three terms — Base62, MD5, and SHA-256 — often get mentioned together in system design discussions, even though they solve very different problems.

I used to loosely group them as “ways to generate short or unique strings”. That’s technically wrong, and understanding why clears up a lot of confusion around IDs, security, and correctness.

This note is my attempt to explain what each of them actually does, without leaning on buzzwords.

Base62: Representation, Not Transformation

Base62 is not a hashing algorithm.
It’s not encryption.
It doesn’t add randomness.

Base62 is just a way of writing numbers more compactly.

We’re used to base10 (digits 0–9).
Base62 simply uses more symbols:

0–9 (10)
a–z (26)
A–Z (26)
Total = 62

So instead of writing a number like this:

125000321

We can write the same number (using the symbol order above: 0–9, then a–z, then A–Z) as:

8sudi

Nothing changed except the representation.

The key property

Base62 is reversible and collision-free.

If two numbers are different, their Base62 representations will also be different. Always.

This is why Base62 is commonly used anywhere we want:

Shorter identifiers
URL-safe strings
Human-friendly IDs

Base62 preserves information. It does not destroy or compress it.

MD5: One-Way Fingerprints

MD5 lives in a completely different world.

MD5 is a hashing algorithm. That means:

Any input → fixed-size output
Output looks random
You cannot recover the input from the output, short of guessing inputs and re-hashing them

Examples:

"hello" → 5d41402abc4b2a76b9719d911017c592
"hello world" → 5eb63bbbe01eeed093cb22bb8f5acdc3

No matter how small or large the input is, the output is always 128 bits (written as 32 hexadecimal characters).
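The fixed-size behaviour can be checked with Python's standard `hashlib` module. This is a small sketch; `md5_hex` is a helper name I made up, and UTF-8 is an assumed text encoding (hashes operate on bytes, so the encoding choice affects the result).

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of a string as 32 hex characters."""
    # Hash functions work on bytes, so the text is encoded first (UTF-8 assumed).
    return hashlib.md5(text.encode("utf-8")).hexdigest()


print(md5_hex("hello"))        # 5d41402abc4b2a76b9719d911017c592
print(md5_hex("hello world"))  # 5eb63bbbe01eeed093cb22bb8f5acdc3

# A million-character input still produces a digest of the same length.
big_digest = md5_hex("x" * 1_000_000)
print(len(big_digest) * 4)     # 128 (bits: 32 hex characters, 4 bits each)
```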

The core idea

Hashing intentionally destroys information.

This is useful for:

Detecting data corruption
Comparing large files
Fingerprinting content

But it also means:

Collisions are mathematically unavoidable (infinitely many possible inputs map onto a finite number of outputs)
You cannot “decode” the original input

MD5 specifically is now considered cryptographically broken, because collisions can be generated intentionally. That makes it unsuitable for security-sensitive use cases.
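For the non-adversarial uses listed above (spotting accidental corruption, comparing large files), a fingerprint can be computed without loading the whole file into memory. The following is a sketch under my own assumptions: `file_md5` is a hypothetical helper, the 1 MiB chunk size is arbitrary, and the temporary files exist only to make the example runnable.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size: int = 1 << 20) -> str:
    """Fingerprint a file by feeding it to MD5 one chunk at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter(callable, sentinel) keeps calling read() until it returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "original.bin"
    copy = Path(folder) / "copy.bin"
    original.write_bytes(b"some payload" * 1000)
    copy.write_bytes(b"some payload" * 1000)
    print(file_md5(original) == file_md5(copy))  # True: identical bytes

    # Simulate corruption by swapping two bytes at the end of the copy.
    copy.write_bytes(b"some payload" * 999 + b"some paylaod")
    print(file_md5(original) == file_md5(copy))  # False: fingerprints differ
```

This only guards against accidents. Because MD5 collisions can be crafted on purpose, a matching MD5 says nothing about whether someone tampered with the file deliberately.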

SHA-256: Same Idea, Stronger Guarantees

SHA-256 is conceptually the same as MD5:

One-way hash
Fixed-size output
Avalanche effect (small input change → huge output change)
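The avalanche effect is easy to observe. This sketch hashes two inputs that differ by a single bit ("hello" and "helln": the last bytes are 0x6f and 0x6e) and counts how many of the 256 output bits differ. The helper name `sha256_bits_changed` is my own.

```python
import hashlib


def sha256_bits_changed(first: bytes, second: bytes) -> int:
    """Count the output bits that differ between two SHA-256 digests."""
    a = int.from_bytes(hashlib.sha256(first).digest(), "big")
    b = int.from_bytes(hashlib.sha256(second).digest(), "big")
    # XOR leaves a 1 in every position where the digests disagree.
    return bin(a ^ b).count("1")


changed = sha256_bits_changed(b"hello", b"helln")
print(f"{changed} of 256 output bits differ")
```

For a well-behaved hash the count should land near half the bits (around 128), even though only one input bit changed.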

The difference is strength, not philosophy.

SHA-256:

Produces a 256-bit hash
Has a vastly larger output space
Is currently considered cryptographically secure

That’s why SHA-256 is used for:

Password hashing (only inside a salted, stretched scheme such as PBKDF2-HMAC-SHA256, never as a single bare hash)
Digital signatures
Blockchain systems
Integrity verification where security matters

But even SHA-256 still shares the same fundamental property as MD5:

It is not reversible, and collisions are theoretically possible (just astronomically unlikely).
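The "salt and stretching" part of password hashing deserves a concrete shape, since a single bare SHA-256 call is far too fast to resist guessing. Below is a minimal sketch using `hashlib.pbkdf2_hmac` from the standard library. The iteration count, salt length, and function names are assumptions for illustration, not recommendations for any particular system.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware
SALT_BYTES = 16       # assumed salt length


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256."""
    if salt is None:
        # A fresh random salt makes identical passwords hash differently.
        salt = os.urandom(SALT_BYTES)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

Verification never reverses the hash. It repeats the same one-way computation and compares results, which is the only thing a one-way function allows.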

Encoding vs Hashing (The Mental Model That Matters)

This distinction clears up most confusion:

Encoding (Base62, Base64, Hex)
- Reversible
- No collisions
- Changes representation only

Hashing (MD5, SHA-256)
- One-way
- Collisions possible
- Destroys information by design
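One way to see the difference side by side: encoded output grows with the input and decodes back to it, while a digest stays the same size no matter what went in. The sketch below uses Base64 from the standard library as the encoding (Python has no built-in Base62), and `describe` is a hypothetical helper.

```python
import base64
import hashlib
from typing import Tuple


def describe(payload: bytes) -> Tuple[int, int]:
    """Return (encoded length, digest length) for a payload."""
    encoded = base64.urlsafe_b64encode(payload)
    # Encoding round-trips: decoding gives back the exact original bytes.
    assert base64.urlsafe_b64decode(encoded) == payload
    digest = hashlib.sha256(payload).hexdigest()
    return len(encoded), len(digest)


for payload in (b"hi", b"hello world", b"x" * 100):
    print(len(payload), describe(payload))
# 2 (4, 64)
# 11 (16, 64)
# 100 (136, 64)
```

The encoded length keeps pace with the input because every input bit has to survive. The digest length cannot, which is why there is nothing to decode.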

When Each Makes Sense

Use Base62 when:
- You already have a unique number
- You want it shorter or more readable
- You need zero collision risk

Use MD5 when:
- You need a quick, non-secure fingerprint
- You’re checking for accidental corruption, not deliberate tampering
- Security is not a concern

Use SHA-256 when:
- Security matters
- You need tamper resistance
- You’re hashing secrets or sensitive data

Problems happen when these are used interchangeably.

A Common Anti-Pattern

A mistake I’ve seen (and probably made before):

short_id = MD5(input).substring(0, 6)

This looks convenient but introduces:

Collision risk
Retry logic
Hard-to-reason-about correctness
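The collision risk is not hypothetical. Six hex characters carry only 24 bits, about 16.7 million possible values, and by the birthday bound a collision becomes more likely than not after roughly five thousand inputs. Here is a sketch that searches for one; the URL pattern and the function names are made up for the demonstration.

```python
import hashlib
from typing import Optional, Tuple


def short_id(text: str) -> str:
    """The anti-pattern: keep only the first 6 hex characters of an MD5 digest."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()[:6]


def first_collision(limit: int = 200_000) -> Optional[Tuple[str, str, str]]:
    """Return (earlier input, later input, shared id) for the first clash found."""
    seen = {}
    for i in range(limit):
        text = f"https://example.com/page/{i}"  # hypothetical inputs
        candidate = short_id(text)
        if candidate in seen:
            return seen[candidate], text, candidate
        seen[candidate] = text
    return None


result = first_collision()
if result is not None:
    earlier, later, shared = result
    print(f"{earlier} and {later} both map to {shared}")
```

Two different URLs end up with the same short ID well before the input set gets large, which is exactly where the retry logic comes from.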

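The issue isn’t MD5 itself. It’s using a truncated hash where a guaranteed-unique identifier is required: the hash is perfectly deterministic, but nothing stops two different inputs from producing the same six characters.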
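For contrast, here is a sketch of the approach that avoids the problem: start from a number that is already unique and only change how it is written. The in-memory counter stands in for something like a database sequence, and the class and function names are hypothetical. The alphabet order (0–9, a–z, A–Z) is the same assumption used earlier in this note.

```python
import itertools
import string

# Assumed alphabet order: 0-9, then a-z, then A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(number: int) -> str:
    """Write a non-negative integer in Base62."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class ShortIdGenerator:
    """Hypothetical stand-in for a database sequence or auto-increment key."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)

    def next_id(self) -> str:
        # Uniqueness comes from the counter; Base62 only shortens the text.
        return base62_encode(next(self._counter))


generator = ShortIdGenerator(start=1_000_000)
print([generator.next_id() for _ in range(3)])
```

No collision checks or retries are needed, because distinct counter values always encode to distinct strings. The trade-off is that sequential IDs are guessable, so this fits cases where the ID is not meant to be a secret.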

Final Takeaway

These tools are not competing solutions — they solve different classes of problems.

Base62 answers:

How do I represent this value more compactly?

MD5 and SHA-256 answer:

How do I generate a fixed-size fingerprint of this data?

Base62, MD5, and SHA-256 — What They Actually Do (Under the Hood)