\item In a classroom of 50 students, what's the probability that at least two share a birthday?
\begin{itemize}
\item 2\%?
\item 20\%?
\item 50\%?
\item 97\%?
\end{itemize}
\item Many people guess around 15-20\%, but the actual probability is about 97\%!
\item This is counterintuitive because \textbf{we're not looking for a specific birthday - we're looking for \textit{any} match among all possible pairs}.
\item With 50 students, we have $\binom{50}{2}=1,225$ possible pairs to check!\footnote{$\binom{50}{2}$ is a binomial coefficient. It means: \textit{``how many ways can I choose two different items from a set of 50?''}}
\end{itemize}
\end{column}
\end{columns}
\end{frame}
\begin{frame}{Birthday paradox}
\begin{itemize}[<+->]
\item With just 23 people, probability exceeds 50\%
\item Formula for $n$ people:
\begin{align*}
P = 1 - \frac{365!}{(365-n)! \cdot 365^n}
\end{align*}
\item Implication: Finding collisions in a space of size $N$ happens with roughly $\sqrt{N}$ samples.
\item This is why many cryptographic systems need large output spaces!
\end{itemize}
\end{frame}
\begin{frame}{Birthday probabilities}
\begin{columns}[c]
\begin{column}{0.6\textwidth}
\begin{itemize}[<+->]
\item Many cryptographic algorithms fail if two executions sample the same random value.
\item General question: If we take $q$ independent uniform samples from a set of $N$ items, what's the probability some value is chosen more than once?
\item This probability is called $\func{birthday}{q,N}$
\item Consider a server storing hashes of users' passwords.
\item With \textbf{unsalted} hashes, an attacker can:
\begin{itemize}
\item Pre-compute hashes of common passwords.
\item Create a dictionary mapping $H(\texttt{password})\rightarrow\texttt{password}$.
\item Instantly crack common passwords after a breach.
\end{itemize}
\item All hard work is done \textbf{before} the attack.
\item Results can be \textbf{reused} against any system using the same hash function.
\end{itemize}
\end{frame}
\begin{frame}{Rainbow tables}
\definitionbox{Rainbow Tables}{
\textbf{Rainbow tables} are specialized data structures that store precomputed hash values to efficiently crack passwords with minimal computation during the actual attack.
}
\begin{itemize}[<+->]
\item Represent a time-memory tradeoff for password cracking.
\item Store chains of hashes rather than individual hash-password pairs.
\item Can crack any password within the table's character set and length parameters.
\item Completely defeated by properly salted hashes.
\end{itemize}
\end{frame}
\begin{frame}{Salt in hash functions}{Preventing precomputation attacks}
\definitionbox{Hash Salt}{
A \textbf{salt} is a random value that is concatenated with the input before hashing:
$H(S \| X)$ where $S$ is the salt and $X$ is the original input.
}
\begin{itemize}[<+->]
\item The salt doesn't need to be secret, but should be:
\begin{itemize}
\item Random and unique for each input.
\item Long enough to prevent precomputation ($\geq16$ bytes).
\item Stored alongside the hash value.
\end{itemize}
\item Salting prevents attackers from using precomputed tables (rainbow tables) to find hash values.
\item Without salt, same inputs always hash to same outputs, enabling reusable attacks.
\end{itemize}
\end{frame}
\begin{frame}{Practical applications of salting}{Real-world examples}
\begin{itemize}[<+->]
\item\textbf{Web application user accounts}
\begin{itemize}
\item Different users with same password have different hashes.
\item Breach of one site doesn't compromise accounts on others.
\end{itemize}
\item\textbf{Password reset tokens}
\begin{itemize}
\item Salt + timestamp prevents token reuse.
\item Each reset request generates unique token.
\end{itemize}
\item\textbf{API key generation}
\begin{itemize}
\item Salting ensures unique keys even for similar services.
\item Allows key rotation without pattern recognition.
\end{itemize}
\item\textbf{Database record anonymization}
\begin{itemize}
\item Different salts for different datasets prevent correlation attacks.
\end{itemize}
\end{itemize}
\end{frame}
\begin{frame}{Hash functions in the real world}
\begin{itemize}[<+->]
\item Hash functions you encounter ``in the wild'' are \textbf{standardized} as \textbf{unsalted} functions.
\item They have only a single input argument: a string to be hashed.
\end{itemize}
\end{frame}
\begin{frame}{How salt neutralizes precomputation}
\begin{itemize}[<+->]
\item With salted hashes, precomputation becomes useless:
\begin{itemize}
\item Attacker doesn't know the salts until they've already breached the system.
\item Can't precompute $H(S \| X)$ without knowing $S$.
\end{itemize}
\item Using \textbf{different} salt for each user:
\begin{itemize}
\item Forces attacker to choose which user to attack.
\item Attacking two victims requires twice the effort.
\item No more ``attack all users for the price of one''.
\end{itemize}
\item Even if attacker obtains both hash and salt, they still need to brute-force each password individually.
\end{itemize}
\end{frame}
\begin{frame}{Length extension attacks}{SHA-1 and SHA-2 vulnerability}
\definitionbox{Length Extension Attack}{
A \textbf{length extension attack} allows an attacker who knows $H(M)$ to calculate $H(M \| P)$ for some padding $P$ without knowing $M$ itself.
}
\begin{itemize}[<+->]
\item Affects hash functions built using the Merkle-Damgård construction:
\begin{itemize}
\item SHA-1, SHA-256, SHA-512, MD5, and others
\item Not SHA-3 (which uses the sponge construction)
\end{itemize}
\item An attacker who knows $H(\texttt{message})$ and the length of $\texttt{message}$ can compute $H(\texttt{message}\|\texttt{padding}\|\texttt{extra\_data})$.
\item This can break naive authentication schemes using $H(\texttt{secret}\|\texttt{message})$.
\item\textbf{Absorbing}: Input blocks are XORed into the state and processed.
\item\textbf{Squeezing}: Output blocks are extracted from the state.
\end{itemize}
\item Naturally resistant to length extension attacks.
\item Versatile: Can produce output of any desired length.
\end{itemize}
\end{column}
\begin{column}{0.45\textwidth}
\imagewithcaption{sponge.png}{Source: The Hash Function BLAKE (by Jean-Philippe Aumasson)}
\end{column}
\end{columns}
\end{frame}
\begin{frame}{HMAC: Protection against length extension}
\definitionbox{HMAC}{
\textbf{HMAC} (Hash-based Message Authentication Code) is a specific construction for calculating a message authentication code (MAC) using a cryptographic hash function and a secret key.
\item In 2017, Google and CWI Amsterdam researchers demonstrated the first practical collision for SHA-1.\footnote{\url{https://appliedcryptography.page/papers/\#shattered-sha1}}
\item Regular cryptographic hash functions (SHA-2, SHA-3) are designed to be \textbf{fast}.
\item But password hashing, slower is better!
\begin{itemize}
\item Makes attacks more expensive.
\item Users don't notice milliseconds of delay.
\item Attackers trying billions of guesses definitely notice.
\end{itemize}
\item Simple approach: Iterate the hash function thousands of times.
\begin{itemize}
\item$H(S, H(S, ... H(S,P)...))$ with many iterations.
\item\textbf{Obsolete examples}: PBKDF2.
\item\textbf{Modern examples}: Argon2, Scrypt
\end{itemize}
\end{itemize}
\end{frame}
\begin{frame}{PBKDF2: Password-Based Key Derivation Function 2}
\begin{columns}[c]
\begin{column}{0.6\textwidth}
\definitionbox{PBKDF2}{
\textbf{PBKDF2} (Password-Based Key Derivation Function 2) is a key derivation function that applies a pseudorandom function (like HMAC) to the input password along with a salt to produce a derived key.
\begin{frame}{Why not PBKDF2?}{Vulnerable to specialized hardware}
\begin{columns}[c]
\begin{column}{0.6\textwidth}
\begin{itemize}[<+->]
\item\textbf{Problem}: Attackers can build specialized hardware.
\begin{itemize}
\item ASICs can compute hashes millions of times faster.
\item Iteration alone isn't enough protection.
\end{itemize}
\item\textbf{Solution}: Memory-hard functions
\begin{itemize}
\item Require significant amount of memory.
\item Hardware specialization offers less advantage.
\item Both attackers and defenders use similar memory tech.
\end{itemize}
\end{itemize}
\end{column}
\begin{column}{0.4\textwidth}
\imagewithcaption{gpu_farm.jpg}{A GPU farm, likely built to mine Bitcoin, but which can also be used to do a lot of hashing really fast (Bitcoin mining is just calcualting a lot of SHA-256 hashes!)}
\end{column}
\end{columns}
\end{frame}
\begin{frame}{Blockchain and hash functions}
\begin{itemize}[<+->]
\item Bitcoin and blockchain technology rely heavily on hash functions:
\item\textbf{Merkle trees}: Efficient verification of large datasets.
\item\textbf{Block chaining}: Each block contains the hash of the previous block.
\end{itemize}
\item Miners try to find inputs that produce hashes with specific properties.
\item The extreme computational cost of Bitcoin mining (~90 TWh/year) demonstrates hash function security.
\item Bitcoin mining consumes more electricity than many countries.
\end{itemize}
\end{frame}
\begin{frame}{Scrypt: A memory-hard password hashing function}
\definitionbox{Scrypt}{
\textbf{Scrypt} is a password-based key derivation function designed to be computationally and memory-intensive, making it resistant to hardware acceleration attacks.
}
\begin{itemize}[<+->]
\item Created by Colin Percival in 2009 specifically to make attacks costly.
\item Key components:
\begin{itemize}
\item\textbf{PBKDF2}: Initial and final mixing of password and salt.
\item\textbf{ROMix}: Core memory-hard function.
\item\textbf{BlockMix}: Based on Salsa20/8 core function.
\end{itemize}
\item{Parameters:}
\begin{itemize}
\item\texttt{N}: CPU/memory cost
\item\texttt{r}: Block size
\item\texttt{p}: Parallelization
\end{itemize}
\end{itemize}
\end{frame}
\begin{frame}{Why Scrypt is memory-hard}
\begin{columns}[c]
\begin{column}{1\textwidth}
\begin{itemize}[<+->]
\item Scrypt enforces memory hardness through its ROMix function:
\begin{itemize}
\item Creates a large array of pseudorandom values (often many MB).
\item Each new value depends on previous values in unpredictable patterns.
\item Forces the entire array to be stored in memory during computation.
\end{itemize}
\item Memory hardness creates multiple benefits:
\begin{itemize}
\item GPUs/ASICs have limited memory per processing unit.
\item Memory is expensive and scales poorly on specialized hardware.
\item Time-memory tradeoffs are possible but highly unfavorable.
\item Using less memory makes computation exponentially slower.
\end{itemize}
\item Even with custom hardware, attackers face similar costs to defenders.