Cryptography Guide

Hash Functions Explained:
MD5, SHA-256 & Cryptographic Hashing

Complete guide to cryptographic hash functions. Compare algorithms, learn password hashing best practices, and understand when to use each hash type.

16 min read Updated Jan 2026

1. What is a Hash Function?

A hash function is a mathematical algorithm that converts input data of any size into a fixed-size output called a hash, digest, or checksum. Think of it as a fingerprint for data.

Example: SHA-256 Hash

Input: Hello, World!
SHA-256: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f

No matter how large the input, whether a single character or a 10 GB file, the output is always the same size. SHA-256 always produces 256 bits (64 hexadecimal characters).
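
The fixed output size is easy to check with Python's standard `hashlib` module. This is a minimal sketch; the three inputs are arbitrary examples, and the 1 MB payload merely stands in for a large file.

```python
import hashlib

# Inputs of very different sizes (the 1 MB payload is an arbitrary stand-in for a large file).
inputs = [b"a", b"Hello, World!", b"x" * 1_000_000]

# Hash each input with SHA-256 and keep the hex representation.
digests = [hashlib.sha256(data).hexdigest() for data in inputs]

for data, digest in zip(inputs, digests):
    # Every digest is 64 hex characters, regardless of input length.
    print(f"{len(data):>9} bytes -> {len(digest)} hex chars: {digest[:16]}...")
```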

Key Concepts:

  • Hash/Digest: The fixed-size output of a hash function
  • Collision: When two different inputs produce the same hash
  • One-way function: Computationally infeasible to recover the input from the output
  • Deterministic: Same input always produces same output

2. Properties of Cryptographic Hash Functions

1. Deterministic

The same input always produces the exact same output. Hash "Hello" 1000 times → same result every time.

2. Quick Computation

Hash functions compute quickly. Depending on the CPU and whether it has SHA acceleration instructions, SHA-256 can process from hundreds of megabytes to a few gigabytes per second per core.

3. Pre-image Resistance (One-way)

Given a hash, it's computationally infeasible to find the original input. You cannot "reverse" a hash.

4. Collision Resistance

It should be extremely difficult to find two different inputs that produce the same hash output.

5. Avalanche Effect

A tiny change in input causes a completely different output. "Hello" vs "hello" → entirely different hashes.

Avalanche Effect Demonstration:

Input: "Hello"
SHA-256: 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969

Input: "hello" (just lowercase 'h')
SHA-256: 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

// Completely different outputs from a single character change!
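
To put a number on the avalanche effect, one can count how many output bits differ between the two digests. The sketch below uses only `hashlib`; the helper name `differing_bits` is made up for this illustration. For a well-behaved hash, roughly half of the 256 bits would be expected to flip.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Raw 32-byte digests of the two inputs from the demonstration above.
d1 = hashlib.sha256(b"Hello").digest()
d2 = hashlib.sha256(b"hello").digest()

changed = differing_bits(d1, d2)
print(f"{changed} of {len(d1) * 8} output bits differ")
```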

3. Algorithm Comparison Table

Algorithm | Output Size | Security Status | Use Case
MD5 | 128 bits (32 hex) | ❌ Broken | Non-security checksums only
SHA-1 | 160 bits (40 hex) | ❌ Broken | Legacy systems only
SHA-256 | 256 bits (64 hex) | ✅ Secure | General purpose, blockchain
SHA-384 | 384 bits (96 hex) | ✅ Secure | Higher security needs
SHA-512 | 512 bits (128 hex) | ✅ Secure | Maximum security, 64-bit systems
SHA-3-256 | 256 bits (64 hex) | ✅ Secure | Alternative to SHA-2 family
BLAKE2 | Variable | ✅ Secure | Fast, modern alternative

⚠️ Warning: MD5 and SHA-1 are Broken

Practical collision attacks exist for both MD5 and SHA-1. MD5 collisions have been practical since 2004, and in 2017 researchers at Google and CWI Amsterdam published SHAttered: two different PDF files with the same SHA-1 hash. Never use these algorithms for security purposes.

Quick Recommendation:

  • General security: SHA-256
  • Passwords: bcrypt, Argon2 (NOT SHA-256 directly)
  • Performance-critical: BLAKE2
  • Non-security checksums: MD5 is acceptable

4. Hash Function Use Cases

1 Password Storage

Never store plaintext passwords. Hash them (with proper algorithms like bcrypt) so even if the database is breached, passwords aren't exposed.

2 File Integrity Verification

Verify downloaded files haven't been tampered with. Compare the hash of your download against the published hash.

3 Digital Signatures

Hash the document, then sign the hash. Efficient because you're signing a small fixed-size hash instead of a large document.

4 Data Deduplication

Use hashes to identify duplicate files. Same hash = same content (with extremely high probability).
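
A minimal sketch of hash-based deduplication: group items by their SHA-256 digest and report any group with more than one member. The `files` dictionary and the function name `find_duplicates` are hypothetical; a real tool would stream file contents from disk rather than hold them in memory.

```python
import hashlib
from collections import defaultdict

def find_duplicates(blobs: dict[str, bytes]) -> list[list[str]]:
    """Group names whose contents share a SHA-256 digest."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, content in blobs.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    # Only digests seen more than once indicate duplicate content.
    return [names for names in by_digest.values() if len(names) > 1]

# Hypothetical in-memory "files" used purely for illustration.
files = {
    "a.txt": b"same content",
    "b.txt": b"different content",
    "copy_of_a.txt": b"same content",
}
print(find_duplicates(files))  # [['a.txt', 'copy_of_a.txt']]
```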

5 Blockchain & Proof of Work

Bitcoin uses double SHA-256 (the hash applied twice) for block hashing and mining. The chain's security depends on the hash function's pre-image and collision resistance.
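
The idea behind proof of work can be sketched as a brute-force search for a nonce that makes the hash start with a given number of zeros. This is a deliberately simplified toy, not Bitcoin's actual scheme: the block format, the text-encoded nonce, and the "leading zero hex digits" difficulty rule are all assumptions made for illustration.

```python
import hashlib

def mine(block_data: bytes, difficulty: int) -> tuple[int, str]:
    """Find a nonce so that SHA-256(block_data + nonce) starts with `difficulty` zero hex digits."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1  # No shortcut exists: each candidate must be hashed and checked.

# Difficulty 4 needs about 16^4 = 65,536 attempts on average.
nonce, digest = mine(b"example block", difficulty=4)
print(nonce, digest)
```

Finding the nonce takes many attempts, while checking it takes a single hash. Each extra zero digit multiplies the expected work by 16.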

6 HMAC (Message Authentication)

Hash-based Message Authentication Code combines a secret key with a hash to verify message integrity and authenticity.
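
A short sketch of creating and checking an HMAC tag with Python's standard `hmac` module. The key and message are made-up demo values. `hmac.compare_digest` is used for the comparison because it runs in constant time, which avoids leaking information through timing.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> str:
    """Return the HMAC-SHA256 tag of the message as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign(key, message)
    return hmac.compare_digest(expected, tag)

key = b"demo-secret-key"  # hypothetical key; real keys should be random and kept secret
msg = b'{"event": "payment", "amount": 100}'
tag = sign(key, msg)

print(verify(key, msg, tag))           # True
print(verify(key, msg + b" ", tag))    # False: message was altered
print(verify(b"wrong-key", msg, tag))  # False: wrong key
```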

7 Cache Keys & ETags

Generate cache keys from content hashes. If content changes, the hash changes, invalidating the cache.
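
As a rough illustration, an ETag can be derived from a hash of the response body and compared with the value a client sends back. The function names here are hypothetical, and the check handles only a single strong validator; real `If-None-Match` headers may carry lists and weak validators.

```python
import hashlib
from typing import Optional

def make_etag(body: bytes) -> str:
    """Build an ETag value (a quoted string) from the SHA-256 of the body."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'

def is_not_modified(if_none_match: Optional[str], body: bytes) -> bool:
    """True when the client's cached validator still matches the current body."""
    return if_none_match is not None and if_none_match == make_etag(body)

v1 = b"<h1>Hello</h1>"
v2 = b"<h1>Hello again</h1>"
cached_etag = make_etag(v1)

print(is_not_modified(cached_etag, v1))  # True: content unchanged, cache is valid
print(is_not_modified(cached_etag, v2))  # False: content changed, cache is stale
```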

8 Git Version Control

Git uses SHA-1 (with a transition to SHA-256 underway) to identify commits, trees, and file contents (blobs). Each object is addressed by its hash.
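
For file contents, Git's SHA-1 object ID is the hash of a short header (`blob`, a space, the content length in bytes, and a NUL byte) followed by the content itself. The sketch below reproduces that for blobs only; the function name `git_blob_id` is made up for this example.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has a well-known ID in SHA-1 repositories.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The result should match what `git hash-object <file>` prints for a file with the same bytes in a SHA-1 repository.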

5. Password Hashing (bcrypt, Argon2, scrypt)

🔴 Never Use General Hash Functions for Passwords

MD5, SHA-1, SHA-256 are TOO FAST for password hashing. Attackers can try billions of guesses per second. Use dedicated password hashing algorithms that are intentionally slow.

Password Hashing Algorithm Comparison:

Algorithm | Status | Memory-Hard | Best For
Argon2id | ✅ Recommended | Yes | New applications
bcrypt | ✅ Good | Limited | Widely supported
scrypt | ✅ Good | Yes | GPU resistance needed
PBKDF2 | ⚠️ Acceptable | No | Legacy/compliance

bcrypt Example:

// Node.js with bcrypt
const bcrypt = require('bcrypt');

// Hash a password (cost factor 12 = 2^12 iterations)
async function hashPassword(password) {
    const saltRounds = 12;
    return await bcrypt.hash(password, saltRounds);
}
// Example output (differs on every call because the salt is random):
// $2b$12$LQv3c1yqBWVHxkd0LHAkCOYz6TtxMQJqhN8/X4.FSdcqC.xTTBm/G

// Verify a password
async function verifyPassword(password, hash) {
    return await bcrypt.compare(password, hash);
}

Argon2 Example (Python):

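from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

ph = PasswordHasher(
    time_cost=3,        # Number of iterations
    memory_cost=65536,  # Memory in KiB (65536 KiB = 64 MiB)
    parallelism=4       # 4 parallel threads
)

# Hash a password (a random salt is generated automatically)
password_hash = ph.hash("my_password")
# $argon2id$v=19$m=65536,t=3,p=4$...

# Verify a password; a mismatch raises VerifyMismatchError
try:
    ph.verify(password_hash, "my_password")
    print("Password is correct!")
except VerifyMismatchError:
    print("Invalid password")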
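
scrypt Example (Python):

For scrypt, Python's standard library exposes `hashlib.scrypt` (available when Python is built against a recent OpenSSL). The following is a sketch under assumed cost parameters (`n=2**14`, `r=8`, `p=1`), which should be tuned for your own hardware. The helper names and the choice to return the salt alongside the derived key are illustrative, not a prescribed storage format.

```python
import hashlib
import hmac
import os

# Assumed example cost parameters; n * r * 128 bytes = 16 MiB of memory per hash.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024, dklen=32)

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both alongside the user record."""
    salt = os.urandom(16)  # 16 random bytes, unique per password
    key = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)

salt, key = hash_password("my_password")
print(verify_password("my_password", salt, key))     # True
print(verify_password("wrong_password", salt, key))  # False
```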

6. File Integrity Verification

Hash functions let you verify that files haven't been modified or corrupted during download or transfer.

How It Works:

  1. Developer calculates hash of original file
  2. Hash is published alongside the download
  3. User downloads file and calculates its hash
  4. User compares calculated hash with published hash
  5. Match = file is identical to the one the developer hashed (this proves authenticity only if the published hash came from a trusted source)

Command Line Examples:

# Windows (PowerShell)
Get-FileHash -Algorithm SHA256 file.zip

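# Linux
sha256sum file.zip

# macOS
shasum -a 256 file.zip

# Example output format (digest, then filename).
# The digest shown is the SHA-256 of empty input, used here only to illustrate the format:
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  file.zip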

Code Examples:

// Node.js - Hash a file
const crypto = require('crypto');
const fs = require('fs');

function hashFile(filePath, algorithm = 'sha256') {
    return new Promise((resolve, reject) => {
        const hash = crypto.createHash(algorithm);
        const stream = fs.createReadStream(filePath);
        
        stream.on('data', data => hash.update(data));
        stream.on('end', () => resolve(hash.digest('hex')));
        stream.on('error', reject);
    });
}

// Python
import hashlib

def hash_file(filepath, algorithm='sha256'):
    hash_obj = hashlib.new(algorithm)
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            hash_obj.update(chunk)
    return hash_obj.hexdigest()
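
The comparison step can also be automated. The sketch below hashes a file and compares the result with a published hex digest, ignoring case and surrounding whitespace (PowerShell's Get-FileHash prints uppercase hex, for example). The function names are hypothetical, and a temporary file containing "Hello" stands in for a real download.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_of_file(path: str) -> str:
    """Stream the file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare a file's SHA-256 with a published hex digest, ignoring case and whitespace."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, published_hex.strip().lower())

# Demo: a temporary file stands in for a downloaded file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Hello")
    demo_path = tmp.name

published = "185F8DB32271FE25F561A6FC938B2E264306EC304EDA518007D1764826381969"
result = matches_published_hash(demo_path, published)
os.remove(demo_path)
print(result)  # True
```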

7. Salting and Peppering

Salt 🧂

A random value unique to each user, stored alongside the password hash in the database.

  • Prevents rainbow table attacks
  • Same password → different hashes
  • Stored with the hash

Pepper 🌶️

A secret value applied to all passwords, stored separately (e.g., environment variable, HSM).

  • Adds extra security layer
  • Protects if DB is breached
  • Never stored in database

Why Salting Matters:

// Without salt - identical passwords have identical hashes (MD5 shown)
md5("password") → 5f4dcc3b5aa765d61d8327deb882cf99
md5("password") → 5f4dcc3b5aa765d61d8327deb882cf99  // Same!

// With salt - identical passwords have different hashes
// (the digests below are illustrative placeholders, not real outputs)
hash("password" + "randomsalt1") → a1b2c3d4e5f6...
hash("password" + "randomsalt2") → 9f8e7d6c5b4a...  // Different!

Rainbow Table Attack Prevention:

A rainbow table is a precomputed database of hash → password mappings. Without salt, attackers can look up hashes instantly. With unique salts, attackers would need a separate rainbow table for every possible salt – computationally infeasible.
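
The effect of salting can be checked directly. This sketch uses PBKDF2-HMAC-SHA256 from the standard library purely because it needs no third-party packages; the iteration count is an assumed example value, and the helper name `salted_hash` is made up. In production, a library such as bcrypt or Argon2 generates and stores the salt for you.

```python
import hashlib
import os

def salted_hash(password: str, salt: bytes) -> str:
    """Derive a hex digest from the password and a per-user salt."""
    # The iteration count is an assumed example value, not a recommendation.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000).hex()

# Two users pick the same password but receive different random salts.
salt_alice = os.urandom(16)
salt_bob = os.urandom(16)

hash_alice = salted_hash("password123", salt_alice)
hash_bob = salted_hash("password123", salt_bob)

print(hash_alice != hash_bob)                                # True: different salts, different hashes
print(salted_hash("password123", salt_alice) == hash_alice)  # True: same salt reproduces the hash
```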

8. Implementation Examples

JavaScript (Browser & Node.js)

// Browser - using Web Crypto API
async function sha256(message) {
    const encoder = new TextEncoder();
    const data = encoder.encode(message);
    const hash = await crypto.subtle.digest('SHA-256', data);
    return Array.from(new Uint8Array(hash))
        .map(b => b.toString(16).padStart(2, '0'))
        .join('');
}

// Node.js
const crypto = require('crypto');

function hash(algorithm, data) {
    return crypto.createHash(algorithm).update(data).digest('hex');
}

console.log(hash('md5', 'Hello'));     // 8b1a9953c4611296a827abf8c47804d7
console.log(hash('sha256', 'Hello'));  // 185f8db32271fe25f561a6fc938b2e264...

// HMAC (for message authentication)
function hmac(algorithm, key, data) {
    return crypto.createHmac(algorithm, key).update(data).digest('hex');
}

Python

import hashlib

# Basic hashing
def hash_string(algorithm: str, data: str) -> str:
    return hashlib.new(algorithm, data.encode()).hexdigest()

print(hash_string('md5', 'Hello'))     # 8b1a9953c4611296a827abf8c47804d7
print(hash_string('sha256', 'Hello'))  # 185f8db32271fe25f561a6fc938b2e264...

# HMAC
import hmac

def create_hmac(key: str, message: str, algorithm='sha256') -> str:
    return hmac.new(
        key.encode(),
        message.encode(),
        algorithm
    ).hexdigest()

# Hash file
def hash_file(filepath: str, algorithm='sha256') -> str:
    h = hashlib.new(algorithm)
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

Java

import java.security.MessageDigest;
import java.nio.charset.StandardCharsets;

public class HashExample {
    public static String hash(String algorithm, String data) throws Exception {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        byte[] hashBytes = digest.digest(data.getBytes(StandardCharsets.UTF_8));
        
        StringBuilder hex = new StringBuilder();
        for (byte b : hashBytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
    
    public static void main(String[] args) throws Exception {
        System.out.println(hash("MD5", "Hello"));
        System.out.println(hash("SHA-256", "Hello"));
    }
}

PHP

<?php
// Basic hashing
echo hash('md5', 'Hello');     // 8b1a9953c4611296a827abf8c47804d7
echo hash('sha256', 'Hello');  // 185f8db32271fe25f561a6fc938b2e264...

// Password hashing (use this for passwords!)
$hash = password_hash('my_password', PASSWORD_BCRYPT, ['cost' => 12]);
// Or better (requires PHP 7.3+ built with Argon2 support):
$hash = password_hash('my_password', PASSWORD_ARGON2ID);

// Verify password
if (password_verify('my_password', $hash)) {
    echo "Password is correct!";
}

// Hash file
echo hash_file('sha256', 'path/to/file.zip');
?>

9. Security Considerations

✅ Do: Use SHA-256 or better for security

SHA-256, SHA-384, SHA-512, SHA-3, and BLAKE2 are all secure choices for general hashing.

✅ Do: Use bcrypt/Argon2 for passwords

These are specifically designed for password hashing with built-in salting and configurable work factors.

✅ Do: Use HMAC for message authentication

HMAC combines a secret key with hashing to verify both integrity and authenticity.

❌ Don't: Use MD5 or SHA-1 for security

Both have practical collision attacks. Only use for non-security checksums where collision exploitation isn't possible.

❌ Don't: Hash passwords with SHA-256 directly

SHA-256 is too fast. Attackers can try billions of guesses per second. Use bcrypt/Argon2.

❌ Don't: Create your own hashing scheme

Ad hoc constructions such as hash(hash(password + salt) + pepper) are easy to get subtly wrong. Inventing schemes is dangerous; use established, reviewed libraries instead.

10. Frequently Asked Questions

What is the difference between hashing and encryption?
Hashing is one-way: you cannot recover the original data from a hash. It produces fixed-size output regardless of input size. Used for integrity verification and password storage.

Encryption is two-way: encrypted data can be decrypted back to the original using a key. Output size varies with input. Used when you need to recover the original data.

Is MD5 completely useless now?
Not completely. MD5 is still acceptable for non-security purposes where an attacker cannot exploit collisions: file deduplication in trusted environments, cache keys, non-cryptographic checksums. However, never use MD5 for password hashing, digital signatures, or any security-critical application.

Can two different files have the same hash?
Yes, this is called a collision. Since hashes have a fixed output size but the set of possible inputs is unbounded, collisions must exist (pigeonhole principle). However, for secure algorithms like SHA-256, finding one deliberately is computationally infeasible: a generic birthday attack needs on the order of 2^128 hash evaluations, far beyond any existing computing capability.
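
The chance of an accidental collision can be estimated with the standard birthday-bound approximation p ≈ 1 - exp(-n² / 2^(b+1)), where n is the number of hashed items and b the output size in bits. The sketch below is a back-of-the-envelope calculation that assumes the hash behaves like a random function; the function name is made up for this example.

```python
import math

def collision_probability(num_hashes: int, output_bits: int) -> float:
    """Birthday-bound approximation of the chance that any two of n random hashes collide."""
    exponent = -(num_hashes ** 2) / 2 ** (output_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)

# 2^64 items with a 128-bit hash: collisions become likely (about 39%).
print(f"{collision_probability(2**64, 128):.3f}")

# A quintillion (10^18) items with a 256-bit hash: the probability is negligible.
print(f"{collision_probability(10**18, 256):.3e}")
```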

Why can't I reverse a hash to get the original data?
Hash functions are designed as one-way functions. Their internal rounds repeatedly mix the data with operations that discard information (such as modular addition and bit shifts), so working backward from a digest is computationally infeasible. Additionally, infinitely many inputs map to a finite set of outputs, so a hash doesn't contain enough information to uniquely identify its input. In practice, you can only "crack" a hash by guessing inputs and comparing the results.

What is a rainbow table attack?
A rainbow table is a precomputed structure that maps hashes back to their plaintext inputs. Attackers compute hashes for millions of common passwords and store them. When they obtain a hash, they just look it up. Salting defeats this because each unique salt would require a completely new rainbow table.

How long should a salt be?
At least 16 bytes (128 bits) of cryptographically random data. This provides enough uniqueness that even with billions of users, salt collisions are extremely unlikely. Libraries like bcrypt and Argon2 handle salt generation automatically.

What is the bcrypt cost factor?
The cost factor (work factor) is the base-2 logarithm of the number of iterations bcrypt performs internally. Cost 10 = 2^10 = 1,024 iterations. Cost 12 = 2^12 = 4,096 iterations. Each increment doubles the work: higher cost = slower hashing = more resistance to brute force, but also slower logins for legitimate users. Aim for roughly 100-500 ms per hash, and increase the cost as hardware gets faster.

Should I use SHA-256 or SHA-512?
Both are secure. SHA-256 is sufficient for most purposes. In pure software, SHA-512 is often faster on 64-bit CPUs because it operates on 64-bit words natively; on CPUs with dedicated SHA-256 instructions, SHA-256 is usually faster. Choose SHA-512 if you need the larger digest, and measure on your own hardware if performance matters.
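
Because relative speed depends on the CPU, it is worth measuring rather than assuming. The following is a rough benchmark sketch using only the standard library; the payload size and repeat count are arbitrary choices, and the numbers it prints will vary from machine to machine.

```python
import hashlib
import time

def throughput_mb_per_s(algorithm: str, payload: bytes, repeats: int = 20) -> float:
    """Hash the payload `repeats` times and return the throughput in MB/s."""
    start = time.perf_counter()
    for _ in range(repeats):
        hashlib.new(algorithm, payload).digest()
    elapsed = time.perf_counter() - start
    return (len(payload) * repeats) / elapsed / 1_000_000

payload = b"\0" * (4 * 1024 * 1024)  # 4 MiB of zero bytes (arbitrary test data)
results = {name: throughput_mb_per_s(name, payload) for name in ("sha256", "sha512")}

for name, mbps in results.items():
    print(f"{name}: {mbps:.0f} MB/s")
```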

What is HMAC and when should I use it?
HMAC (Hash-based Message Authentication Code) combines a secret key with a hash function to verify both integrity AND authenticity. Use HMAC when you need to verify that a message came from someone with the secret key and wasn't modified. Examples: API authentication, JWT signatures, webhook verification. Plain hashes only verify integrity, not authenticity.

How do I verify a downloaded file's hash?
1. Download the file and its published hash (usually SHA-256).
2. Calculate the hash of your download:
  • Windows (PowerShell): Get-FileHash file.zip -Algorithm SHA256
  • Linux: sha256sum file.zip
  • macOS: shasum -a 256 file.zip
3. Compare your calculated hash with the published hash.
4. If they match exactly, the file is identical to the one the publisher hashed. This proves authenticity only if the published hash itself came from a trusted source.
