The original differential cryptanalysis paper (BS91)
Thread (13 posts)
Reading this I'm a little confused by the definition of characteristic (Definition 6) Can you describe in modern notation exactly what values make up a characteristic and what it means for that characteristic to have a probability
Thinking Summary
The user is asking about Definition 6 from the Biham-Shamir paper, which is the definition of a characteristic in differential cryptanalysis. Let me look at Definition 6 and the related definitions to explain it clearly. Looking at Definition 6 (informal): "Associated with any pair of encryptions are the XOR value of its two plaintexts, the XOR of its ciphertexts, the XORs of the inputs of each round in the two executions, and the XORs of the outputs of each round in the two executions. These XOR values form an n-round characteristic. A characteristic has a probability, which is the probability that a random pair with the chosen plaintext XOR has the round and ciphertext XORs specified in the characteristic." Then Definition 7 (formal): "An n-round characteristic is a tuple Ω = (ΩP, ΩΛ, ΩT) where ΩP and ΩT are m-bit numbers and ΩΛ is a list of n elements ΩΛ = (Λ1, Λ2, ..., Λn), each of which is a pair of the form Λi = (λiᵢ, λoᵢ) where λiᵢ and λoᵢ are (m/2)-bit numbers and m is the block size of the cryptosystem." So in modern notation, a characteristic consists of: 1. A plaintext difference ΩP 2. A ciphertext difference ΩT 3. For each round i, a pair (λiᵢ, λoᵢ) where: - λiᵢ is the input difference to the F function in round i - λoᵢ is the output difference from the F function in round i The probability of a characteristic is the probability that a randomly chosen pair of plaintexts with difference ΩP will follow the specified intermediate differences through all rounds and result in ciphertext difference ΩT.
Looking at the paper's definitions, a characteristic in differential cryptanalysis consists of:
Components of a Characteristic
In modern notation, an n-round characteristic Ω is a tuple:
where:
- : The plaintext difference (XOR) between two plaintexts
- : The ciphertext difference (XOR) between the corresponding ciphertexts
- : A sequence of round differences
Each round difference consists of:
- : The input difference to the F function in round
- : The output difference from the F function in round
What the Probability Means
The probability of a characteristic is:
In other words: Given a pair of plaintexts with difference , what is the probability that:
- In round 1: The F function input difference is and output difference is
- In round 2: The F function input difference is and output difference is
- ... and so on for all rounds ...
- The final ciphertext difference is
Example from the Paper
The paper gives a concrete two-round characteristic (Example 11):
- Round 1: with probability 14/64
- Round 2: with probability 1
The overall probability is .
Key Insight
The probability calculation assumes that each round's behavior is independent (given the input difference to that round). So:
where is the probability that input difference to the F function produces output difference .
Sorry when you talk about probabilities there what are you taking as the distribution? Probabilities ove what?
Thinking Summary
The user is asking about what probability space we're working over when we talk about characteristic probabilities. This is a crucial clarification because the paper actually discusses two different probability distributions, and the distinction is important. 1. The theoretical/formal probability: This is over random independent subkeys 2. The practical probability: This is for a fixed key (the actual key we're trying to find) Let me clarify this carefully.
Great question! The paper actually discusses two different probability distributions, and the distinction is crucial:
1. Theoretical Probability (Definition 11, Theorem 1)
The formal probability is over random independent subkeys:
Specifically:
- Fix any pair of plaintexts with
- For each round , choose the 48-bit subkey uniformly at random
- The probability is over these random key choices
2. Practical Probability (What actually matters)
In the actual attack, we have a fixed unknown key that we're trying to find. The probability we care about is:
But this probability depends on the specific key! The paper acknowledges this:
"For practical purposes the significant probability with respect to a characteristic is the probability that a pair whose plaintext XOR equals the characteristic's plaintext XOR is a right pair using a fixed key (the one we try to find). This probability is not constant for all the keys... However, we assume that the characteristic's probability is a very good approximation of it, which is usually the case."
Why Random Keys?
The random key model is used because:
- It makes the probability calculation tractable (due to independence)
- For each S-box, when the subkey bits are uniformly random, the fraction of keys that cause a specific input difference to produce a specific output difference equals the empirical fraction in the S-box's pairs XOR distribution table
The Key Insight (Lemma 1)
For DES specifically, Lemma 1 shows that for any fixed input pair to the F function, the fraction of subkeys that produce a given output difference is the same as the average over all input pairs. This is why the random-key probability is a good approximation for most fixed keys.
In summary: The probabilities are formally over random subkey choices, but we use this as an approximation for the behavior with the actual fixed (but unknown) key.
Can you discuss the significance of lemma 1 a bit more and describe scenarios where the lemma would both not apply and not be a good heuristic
Thinking Summary
Lemma 1 is a key property that makes differential cryptanalysis work well for DES. Let me think about what it says and when it would fail. Lemma 1 says: For DES, if X→Y with probability p by the F function, then every fixed input pair Z, Z* with Z' = X causes the F function output XOR to be Y by the same fraction p of the possible subkey values. This is saying that the probability distribution is uniform across all input pairs - there aren't "special" input pairs that behave differently from others. This works for DES because of how the key is mixed in: K ⊕ E(data). The XOR operation preserves differences uniformly. Where would this fail or not be a good heuristic? 1. Non-XOR key mixing: If the key were added modulo 2^n instead of XORed, then carries would propagate differently depending on the actual bit values, not just their XOR. 2. Key-dependent S-boxes: If the S-boxes themselves were selected based on key bits (like in Lucifer), then different keys would have completely different differential properties. 3. Data-dependent operations: If the cipher had operations where the differential behavior depends on the actual values, not just their difference. 4. Multiplication operations: If key material were multiplied with data, the differential properties would be very non-uniform. Let me think of concrete examples...
Significance of Lemma 1
Lemma 1 is crucial because it says that in DES, the differential behavior is uniform across all input pairs. Specifically:
- If input difference produces output difference for some fraction of all possible (input pair, key) combinations
- Then for ANY specific input pair with , exactly fraction of the 48-bit subkeys will cause output difference
This uniformity is what allows us to use the average probability as a good predictor for any specific plaintext pair we might use in an attack.
Why Lemma 1 Holds for DES
The lemma holds because of DES's structure:
- The key is mixed via XOR:
- For any fixed input difference , we have regardless of
- Each possible S-box input pair corresponds to exactly one subkey value
Scenarios Where Lemma 1 Would Fail
1. Modular Addition Instead of XOR
Consider a cipher where we compute
Example with 4-bit values:
- Input pair: (0001, 0011), difference = 0010
- With key 0000: (0001, 0011) → same difference
- With key 1111: (0000, 0010) → same difference
- But carry propagation depends on actual values!
The differential behavior now depends heavily on the specific input values, not just their XOR.
2. Key-Dependent S-Boxes
Original Lucifer used this design:
if (key_bit == 0)
output = S0[input]
else
output = S1[input]
Here, different keys give completely different differential properties. The "average" probability over all keys might be 1/4, but for any specific key it could be 0 or 1.
3. Multiplication Operations
Consider :
Example:
- Input pair (3, 5), difference = 6
- Key = 2: outputs (6, 10), difference = 12
- Key = 3: outputs (9, 15), difference = 6
- Key = 4: outputs (12, 20), difference = 24
The output difference varies wildly with the key value.
4. Data-Dependent Rotations
If the cipher included operations like:
output = rotate_left(data, key XOR data)
Then the differential behavior would depend on the actual data values, not just their difference.
When It's Not a Good Heuristic
Even when Lemma 1 technically doesn't hold, it might still be a reasonable heuristic if:
- The variation between different input pairs is small
- The attacker can use many different plaintext pairs to average out variations
However, it becomes a bad heuristic when:
-
Extreme Non-uniformity: Some input pairs have probability 0 while others have probability 1 for the same differential
-
Clustering: The "good" keys (that follow the differential) are clustered in a small subset of the keyspace that the average doesn't represent
-
Adaptive Defenses: Modern ciphers like AES were specifically designed to have poor differential properties, making the uniform approximation break down
The genius of differential cryptanalysis against DES is that Lemma 1 holds nearly perfectly, making the theoretical analysis match the practical attack remarkably well.
Actually, instead can you explain the diagram in example 9 and specifically what the word "always" means?
Thinking Summary
The user is asking about Example 9 from the Biham-Shamir differential cryptanalysis paper (BS91). Let me look at what Example 9 says and the diagram it contains. Example 9 describes a one-round characteristic with probability 1. The diagram shows: - At the top: Ωp = (L', 0x) - A box labeled "F" - Arrows showing A' = 0x and a' = 0x - The word "always" - At the bottom: ΩT = (L', 0x) So this is about a one-round characteristic in DES. In differential cryptanalysis, we're looking at pairs of plaintexts and their corresponding ciphertexts, and tracking the XOR differences through the cipher. In DES notation: - The plaintext P is split into left and right halves (L, R) - In one round: the right half R goes through function F, then XORs with L to produce the new left half - The halves are swapped So for a one-round DES: - Input: (L, R) - After F function and XOR: (R, L ⊕ F(R)) Now for the differential (tracking XOR differences): - If we have plaintexts P and P*, their XOR is P' = P ⊕ P* - Split as (L', R') where L' = L ⊕ L* and R' = R ⊕ R* In Example 9: - Ωp = (L', 0x) means the plaintext XOR has L' in the left half and 0 in the right half - Since R' = 0, we have R = R*, so both plaintexts have the same right half - a' = R' = 0x (the input XOR to F is 0) - Since both inputs to F are the same (a = a*), the outputs must be the same: A = A* - The XOR of the F function outputs is zero, ensuring a predictable transformation - This leads to a specific pattern in the round's output differential - The deterministic nature means this occurs with absolute certainty when input values match
Looking at Example 9, this diagram shows a one-round DES characteristic where the plaintext XOR difference is Ωp = (L', 0x).
In DES, each round works like this:
- The 64-bit block is split into left (L) and right (R) halves
- The right half goes through the F function
- The F output is XORed with the left half
- The halves are swapped
In differential cryptanalysis, we track XOR differences through the cipher. If we have two plaintexts P and P*, their XOR difference is P' = P ⊕ P*.
In Example 9:
- Ωp = (L', 0x) means the plaintext pair has XOR difference L' in the left half and 0 in the right half
- a' = 0x means the input XOR to the F function is 0 (since a = R and a' = R' = 0)
- A' = 0x means the output XOR from the F function is 0
- "always" means this happens with probability 1 (100% certainty)
The key insight: When both inputs to the F function are identical (a' = 0), both outputs must be identical (A' = 0). This is deterministic - if you put the same value into a function twice, you get the same output twice.
After the round:
- The new left half is the old right half (which had XOR = 0)
- The new right half is L ⊕ F(R), with XOR = L' ⊕ A' = L' ⊕ 0 = L'
- So ΩT = (0, L') which equals (L', 0) after the swap
This characteristic has probability 1 because there's no probabilistic behavior - identical inputs to F always produce identical outputs.
Remember to use KaTeX
Can you use this as a worked example to explain the papers notation
Thinking Summary
The user wants me to explain the paper's notation using Example 9 as a worked example, and I need to use KaTeX for math. Let me go through the notation systematically. The paper uses a lot of specific notation that can be confusing. Let me break it down using Example 9.
Let me use Example 9 to explain the paper's notation system:
Core Notation
Pairs and XOR differences:
- = two plaintexts in a pair
- = the XOR difference between them
- Similarly: are ciphertexts,
Intermediate values:
- = 32-bit inputs to F function in rounds 1, 2, 3, ...
- = 32-bit outputs from F function in rounds 1, 2, 3, ...
- = XOR differences of inputs ()
- = XOR differences of outputs ()
Example 9 Walkthrough
Initial state:
- Left half has XOR difference
- Right half has XOR difference (identical values)
Round 1:
- Input to F: (the right half of plaintext)
- Since right halves are identical:
- When , we must have (identical inputs → identical outputs)
- This happens "always" = with probability 1
After round 1:
- New left = old right = has XOR difference
- New right = , so difference =
- Output: before swap = after swap
Characteristic Definition
A characteristic consists of:
- = plaintext XOR
- = ciphertext XOR
- = list of round behaviors
- Each = (input XOR, output XOR) for round
For Example 9:
Probability Notation
The notation means "input XOR causes output XOR "
- Probability depends on how many input pairs with XOR produce output XOR
- In Example 9: with probability 1 (always)
- In Example 10: with probability 14/64
The characteristic's total probability is the product of all round probabilities.
Ok. We have our definition of characteristic.
How do we use that do recover information about key material? Cite ifferent parts of the paper as needed
Thinking Summary
The user is asking about how differential cryptanalysis characteristics are used to recover key material. Let me look through the paper for the key recovery process. From the paper, the main technique is: 1. You have a characteristic with some probability p 2. You collect many pairs of plaintexts with the specified plaintext difference 3. You encrypt them to get ciphertext pairs 4. For each possible key value (or subset of key bits), you count how many pairs would produce the expected differences according to the characteristic 5. The correct key value should appear with frequency approximately equal to the characteristic's probability, while wrong keys appear randomly Let me find specific examples from the paper...
The paper describes how characteristics are used to recover key material through a counting method. Here's the key recovery process:
Basic Principle
From the paper's description in Example 7:
"Assume we know that , , and and we want to find the key value . The input XOR is regardless of the actual value of . By consulting Table 2 we can see that the input to the S box has eight possibilities. These eight possibilities make eight possibilities for the key..."
The Counting Method
The paper describes the general approach (page 17):
"For each possible key value, count the number of pairs that result with the expected output XOR using this key value in the last round. The right key value is the (hopefully unique) key value suggested by all the pairs."
Why This Works
The key insight is explained on page 19:
"We know that the right key value should result from all the right pairs which occur (approximately) with the characteristic's probability. All the other possible key values are fairly randomly distributed... Consequently, the right key appears with the characteristic's probability (from right pairs) plus other random occurrences (from wrong pairs). To find the key we just have to count the number of occurrences of each of the suggested keys. The right key is likely to be the one that occurs most often."
Signal-to-Noise Ratio
The paper introduces a crucial concept (Definition 13):
"The ratio between the number of right pairs and the average count in a counting scheme is called the signal-to-noise ratio of the counting scheme and is denoted by S/N."
The formula given is:
where:
- = number of key bits being counted
- = characteristic's probability
- = average count per counted pair
- = ratio of counted to all pairs
Practical Example: 6-Round DES
From Section 4:
"Each one of the two characteristics lets us find the 30 key bits of K6 which are used at the input of five S boxes in the sixth round... For each pair derive the expected output XOR of as many S boxes in the last round as possible from the plaintext XOR and the ciphertext pair... For each possible key value, count the number of pairs that result with the expected output XOR using this key value in the last round."
Filtering Wrong Pairs
An important optimization (page 30):
"Since about 20% of the entries in the pairs XOR distribution tables of the S boxes are impossible, about 20% of the wrong pairs can be discarded by each S box before they are actually counted."
This filtering dramatically improves the signal-to-noise ratio by eliminating pairs that cannot possibly follow the characteristic.