Base62, MD5, and SHA-256 — What They Actually Do (Under the Hood)

December 25, 2025

These three terms — Base62, MD5, and SHA-256 — often get mentioned together in system design discussions, even though they solve very different problems.

I used to loosely group them as “ways to generate short or unique strings”. That’s technically wrong, and understanding why clears up a lot of confusion around IDs, security, and correctness.

This note is my attempt to explain what each of them actually does, without leaning on buzzwords.


Base62: Representation, Not Transformation

Base62 is not a hashing algorithm.
It’s not encryption.
It doesn’t add randomness.

Base62 is just a way of writing numbers more compactly.

We’re used to base10 (digits 0–9).
Base62 simply uses more symbols:

0–9 (10)
a–z (26)
A–Z (26)
Total = 62

So instead of writing a number like this:

125000321

With the symbol order above (0–9, then a–z, then A–Z), the same number is written as:

8suid

Nothing changed except the representation.
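
The conversion itself is just repeated division by 62. Here is a minimal sketch in Python, assuming the symbol order listed above (digits, then lowercase, then uppercase); `base62_encode` is an illustrative name, not a library function:

```python
import string

# Assumed symbol order: 0-9, a-z, A-Z. Other orders work equally well,
# as long as the encoder and decoder agree on one.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n: int) -> str:
    """Write a non-negative integer using 62 symbols instead of 10."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, 62)   # peel off the lowest base62 digit
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))   # most significant digit first


print(base62_encode(61))    # Z   (the largest single symbol)
print(base62_encode(62))    # 10  (rolls over, like 9 -> 10 in base10)
print(base62_encode(3844))  # 100 (62 * 62)
```

Any value below 62^5 = 916,132,832 fits in five symbols, which is where the length savings come from.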

The key property

Base62 is reversible and collision-free.

If two numbers are different, their Base62 representations will also be different. Always.

This is why Base62 is commonly used anywhere we want:

  • Shorter identifiers
  • URL-safe strings
  • Human-friendly IDs

Base62 preserves information. It does not destroy or compress it.
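
Both properties can be checked directly. The sketch below assumes the alphabet order 0–9, a–z, A–Z and uses illustrative function names; it round-trips a range of integers and confirms that no two of them share an encoding:

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
INDEX = {symbol: value for value, symbol in enumerate(ALPHABET)}


def base62_encode(n: int) -> str:
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    digits = []
    while True:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
        if n == 0:
            return "".join(reversed(digits))


def base62_decode(text: str) -> int:
    n = 0
    for symbol in text:
        n = n * 62 + INDEX[symbol]   # shift one base62 place, add the digit
    return n


# Reversible: decoding always returns the number we started with.
assert all(base62_decode(base62_encode(n)) == n for n in range(100_000))

# Collision-free: 100,000 distinct numbers give 100,000 distinct strings.
assert len({base62_encode(n) for n in range(100_000)}) == 100_000
```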


MD5: One-Way Fingerprints

MD5 lives in a completely different world.

MD5 is a hashing algorithm. That means:

  • Any input → fixed-size output
  • Output looks random, but is deterministic: the same input always produces the same hash
  • You cannot reliably reverse it

Examples:

"hello" → 5d41402abc4b2a76b9719d911017c592
"hello world" → 5eb63bbbe01eeed093cb22bb8f5acdc3

No matter how small or large the input is, the output is always 128 bits.
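
This is easy to confirm with Python's standard `hashlib` module. The sketch below hashes inputs from 0 bytes to 10 MB; the helper name `md5_hex` is only for illustration:

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()


inputs = [b"", b"hello", b"hello world", b"x" * 10_000_000]

for data in inputs:
    digest = md5_hex(data)
    # 32 hex characters * 4 bits each = 128 bits, whatever the input size
    print(f"{len(data):>10} bytes in -> {len(digest) * 4} bits out: {digest}")
```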

The core idea

Hashing intentionally destroys information.

This is useful for:

  • Detecting data corruption
  • Comparing large files
  • Fingerprinting content

But it also means:

  • Collisions are mathematically unavoidable
  • You cannot “decode” the original input
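
The "unavoidable" part is the pigeonhole principle: more possible inputs than possible outputs. To make that visible, here is a sketch with a deliberately tiny toy hash (the first byte of MD5, so only 256 outputs exist). `tiny_hash` is a made-up helper for this demonstration; the same argument applies to the full 128 bits, just at a much larger scale.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """Toy 8-bit hash: the first byte of MD5, so only 256 outputs exist."""
    return hashlib.md5(data).digest()[0]


def first_collision(max_inputs: int = 257):
    """Feed distinct inputs until two of them share a hash value."""
    seen = {}  # hash value -> first input that produced it
    for i in range(max_inputs):
        data = str(i).encode()
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data, h
        seen[h] = data
    return None  # cannot happen with 257 inputs and 256 possible outputs


a, b, h = first_collision()
print(f"{a!r} and {b!r} both hash to {h}")
```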

MD5 specifically is now considered cryptographically broken: practical attacks can construct two different inputs with the same hash on demand. That makes it unsuitable for security-sensitive use cases such as signatures or certificates.


SHA-256: Same Idea, Stronger Guarantees

SHA-256 is conceptually the same as MD5:

  • One-way hash
  • Fixed-size output
  • Avalanche effect (small input change → huge output change)
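
The avalanche effect can be measured. The sketch below (with an illustrative helper, `differing_bits`) hashes two inputs that differ in a single bit and counts how many of the 256 output bits change; for a good hash the count should land near half:

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where the SHA-256 digests of a and b differ."""
    ha = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hb = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(ha ^ hb).count("1")


# 'o' is 0x6f and 'n' is 0x6e, so these inputs differ in exactly one bit.
changed = differing_bits(b"hello", b"helln")
print(f"{changed} of 256 output bits changed")
```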

The difference is strength, not philosophy.

SHA-256:

  • Produces a 256-bit hash
  • Has a vastly larger output space
  • Is currently considered cryptographically secure
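
To put "vastly larger" into numbers, the sketch below reads the digest sizes from `hashlib` and prints the generic birthday bound, the point at which a chance collision between random inputs becomes likely. `output_bits` is an illustrative helper name.

```python
import hashlib


def output_bits(name: str) -> int:
    """Digest size in bits for a hashlib algorithm name."""
    return hashlib.new(name).digest_size * 8


for name in ("md5", "sha256"):
    bits = output_bits(name)
    # Birthday bound: a chance collision is expected after roughly
    # 2^(bits/2) hashes, far fewer than the 2^bits possible outputs.
    print(f"{name}: {bits} bits, 2^{bits} outputs, "
          f"about 2^{bits // 2} hashes for a chance collision")
```

Output size is only part of the story: the known attacks on MD5 exploit its internal structure and find collisions far faster than this generic bound.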

That’s why SHA-256 is used for:

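  • Password hashing, but only inside a deliberately slow construction such as PBKDF2-HMAC-SHA256 with a per-password salt (a single SHA-256 pass is too fast to resist guessing)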
  • Digital signatures
  • Blockchain systems
  • Integrity verification where security matters
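
For the password case, "salt and stretching" looks roughly like the sketch below, which uses `hashlib.pbkdf2_hmac` from the standard library. The function names and the iteration count are assumptions for illustration; pick a work factor that suits your own hardware and current guidance.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor, not a universal constant


def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage alongside the user record."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(key, expected)


salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
```

The salt makes identical passwords hash differently, and the repeated iterations make each guess expensive for an attacker.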

But even SHA-256 still shares the same fundamental property as MD5:

It is not reversible, and collisions must exist (none has been publicly found, and hitting one by chance is astronomically unlikely).


Encoding vs Hashing (The Mental Model That Matters)

This distinction clears up most confusion:

  • Encoding (Base62, Base64, Hex)

    • Reversible
    • No collisions
    • Changes representation only
  • Hashing (MD5, SHA-256)

    • One-way
    • Collisions possible
    • Destroys information by design
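
A small sketch of the contrast, using only the Python standard library: the encodings round-trip back to the original bytes, while the hash output stays the same size however much data goes in.

```python
import base64
import hashlib

data = b"hello world"

# Encoding: each form can be turned back into the original bytes.
assert bytes.fromhex(data.hex()) == data
assert base64.b64decode(base64.b64encode(data)) == data

# Hashing: the digest is 32 bytes for 11 bytes of input and for 11,000,
# so there is no way to map a digest back to a single original input.
small = hashlib.sha256(data).digest()
large = hashlib.sha256(data * 1000).digest()
print(len(small), len(large))  # 32 32
```
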

When Each Makes Sense

  • Use Base62 when:

    • You already have a unique number
    • You want it shorter or more readable
    • You need zero collision risk
  • Use MD5 when:

    • You need a quick, non-secure fingerprint
    • You’re checking data integrity against accidental corruption, not deliberate tampering
    • Security is not a concern
  • Use SHA-256 when:

    • Security matters
    • You need tamper resistance
    • You’re hashing secrets or sensitive data (for passwords, use it inside a slow, salted key-derivation function rather than on its own)

Problems happen when these are used interchangeably.


A Common Anti-Pattern

A mistake I’ve seen (and probably made before):

short_id = MD5(input).substring(0, 6)

This looks convenient but introduces:

  • Collision risk
  • Retry logic
  • Hard-to-reason-about correctness
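
The collision risk is larger than intuition suggests. Six hex characters carry only 24 bits, so by the birthday approximation a clash becomes about a coin flip at roughly 5,000 IDs. The sketch below computes that probability and then finds a real collision by brute force; the input strings (`input-0`, `input-1`, ...) and function names are made up for the demonstration.

```python
import hashlib
import itertools
import math


def short_id(value: str) -> str:
    """The anti-pattern: keep only the first 6 hex characters (24 bits)."""
    return hashlib.md5(value.encode()).hexdigest()[:6]


def collision_probability(n_items: int, bits: int = 24) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / (2 * 2^bits))."""
    return 1 - math.exp(-n_items * (n_items - 1) / (2 * 2**bits))


def find_collision():
    """Generate IDs for distinct inputs until two of them clash."""
    seen = {}  # short id -> first input that produced it
    for i in itertools.count():
        value = f"input-{i}"
        sid = short_id(value)
        if sid in seen:
            return seen[sid], value, sid
        seen[sid] = value


print(f"{collision_probability(5_000):.0%} chance of a clash among 5,000 IDs")
first, second, sid = find_collision()
print(f"{first!r} and {second!r} both map to {sid!r}")
```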

The issue isn’t MD5 itself. A hash is deterministic, but a truncated hash is not guaranteed to be unique, and here a guaranteed-unique identifier is required.
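
When the number behind the ID is already unique, encoding it in Base62 gives a short ID with no collision handling at all. A minimal sketch, assuming an in-process counter as a stand-in for a database sequence or auto-increment column (all names here are illustrative):

```python
import itertools
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

# Stand-in for a database sequence (assumption for this example).
_next_id = itertools.count(start=1)


def base62_encode(n: int) -> str:
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    digits = []
    while True:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
        if n == 0:
            return "".join(reversed(digits))


def new_short_id() -> str:
    """Unique by construction: distinct counter values, distinct strings."""
    return base62_encode(next(_next_id))


ids = [new_short_id() for _ in range(100_000)]
assert len(set(ids)) == len(ids)   # no collisions, so no retry loop
print(ids[:3], "...", ids[-1])
```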


Final Takeaway

These tools are not competing solutions — they solve different classes of problems.

Base62 answers:

How do I represent this value more compactly?

MD5 and SHA-256 answer:

How do I generate a fixed-size fingerprint of this data?
