Numinex

Thread (13 posts)

Reading this, I'm a little confused by the definition of characteristic (Definition 6). Can you describe in modern notation exactly what values make up a characteristic, and what it means for that characteristic to have a probability?

Thinking Summary

The user is asking about Definition 6 from the Biham-Shamir paper, which is the definition of a characteristic in differential cryptanalysis. Let me look at Definition 6 and the related definitions to explain it clearly. Looking at Definition 6 (informal): "Associated with any pair of encryptions are the XOR value of its two plaintexts, the XOR of its ciphertexts, the XORs of the inputs of each round in the two executions, and the XORs of the outputs of each round in the two executions. These XOR values form an n-round characteristic. A characteristic has a probability, which is the probability that a random pair with the chosen plaintext XOR has the round and ciphertext XORs specified in the characteristic." Then Definition 7 (formal): "An n-round characteristic is a tuple Ω = (ΩP, ΩΛ, ΩT) where ΩP and ΩT are m-bit numbers and ΩΛ is a list of n elements ΩΛ = (Λ1, Λ2, ..., Λn), each of which is a pair of the form Λi = (λiᵢ, λoᵢ) where λiᵢ and λoᵢ are (m/2)-bit numbers and m is the block size of the cryptosystem." So in modern notation, a characteristic consists of: 1. A plaintext difference ΩP 2. A ciphertext difference ΩT 3. For each round i, a pair (λiᵢ, λoᵢ) where: - λiᵢ is the input difference to the F function in round i - λoᵢ is the output difference from the F function in round i The probability of a characteristic is the probability that a randomly chosen pair of plaintexts with difference ΩP will follow the specified intermediate differences through all rounds and result in ciphertext difference ΩT.

Looking at the paper's definitions, a characteristic in differential cryptanalysis consists of:

Components of a Characteristic

In modern notation, an n-round characteristic Ω is a tuple:

$$\Omega = (\Omega_P, \Omega_\Lambda, \Omega_T)$$

where:

  1. $\Omega_P$: The plaintext difference (XOR) between the two plaintexts
  2. $\Omega_T$: The ciphertext difference (XOR) between the corresponding ciphertexts
  3. $\Omega_\Lambda = (\Lambda_1, \Lambda_2, \ldots, \Lambda_n)$: A sequence of round differences

Each round difference $\Lambda_i = (\lambda_i^I, \lambda_i^O)$ consists of:

  • $\lambda_i^I$: The input difference to the F function in round $i$
  • $\lambda_i^O$: The output difference from the F function in round $i$
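Transcribed into code, Definition 7's tuple is just nested data. A minimal Python sketch (the field names are mine; the values are the paper's Example 11):

```python
from typing import List, NamedTuple, Tuple

class Characteristic(NamedTuple):
    """An n-round characteristic Omega = (Omega_P, Omega_Lambda, Omega_T)."""
    plaintext_xor: int                  # Omega_P, an m-bit value
    rounds: List[Tuple[int, int]]       # Omega_Lambda: (lambda_I, lambda_O) per round
    ciphertext_xor: int                 # Omega_T, an m-bit value

# The two-round characteristic from the paper's Example 11 (hex values):
example_11 = Characteristic(
    plaintext_xor=0x0080820060000000,
    rounds=[(0x60000000, 0x00808200),   # round 1: a' -> A', probability 14/64
            (0x00000000, 0x00000000)],  # round 2: 0 -> 0, probability 1
    ciphertext_xor=0x6000000000000000,
)
```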

What the Probability Means

The probability of a characteristic is:

$$P[\text{characteristic } \Omega] = P[\text{pair follows the specified path} \mid P' = \Omega_P]$$

In other words: Given a pair of plaintexts $(P, P^*)$ with difference $P' = P \oplus P^* = \Omega_P$, what is the probability that:

  1. In round 1: The F function input difference is $\lambda_1^I$ and output difference is $\lambda_1^O$
  2. In round 2: The F function input difference is $\lambda_2^I$ and output difference is $\lambda_2^O$
  3. ... and so on for all rounds ...
  4. The final ciphertext difference is $T' = \Omega_T$

Example from the Paper

The paper gives a concrete two-round characteristic (Example 11):

  • $\Omega_P = \text{00 80 82 00 60 00 00 00}_x$
  • Round 1: $a' = \text{60 00 00 00}_x \rightarrow A' = \text{00 80 82 00}_x$ with probability 14/64
  • Round 2: $b' = 0 \rightarrow B' = 0$ with probability 1
  • $\Omega_T = \text{60 00 00 00 00 00 00 00}_x$

The overall probability is $(14/64) \times 1 = 14/64$.

Key Insight

The probability calculation assumes that each round's behavior is independent (given the input difference to that round). So:

$$P[\text{characteristic}] = \prod_{i=1}^n P[\lambda_i^I \rightarrow \lambda_i^O]$$

where $P[\lambda_i^I \rightarrow \lambda_i^O]$ is the probability that input difference $\lambda_i^I$ to the F function produces output difference $\lambda_i^O$.
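Under the independence assumption the total probability is literally a product. A small sketch with exact fractions (the helper name is mine), applied to the two-round characteristic above:

```python
from fractions import Fraction
from functools import reduce

def characteristic_probability(round_probs):
    """Multiply per-round transition probabilities P[lambda_I -> lambda_O],
    assuming the rounds behave independently (the paper's heuristic)."""
    return reduce(lambda acc, p: acc * p, round_probs, Fraction(1))

# Example 11's two rounds: 14/64 in round 1, probability 1 in round 2.
p = characteristic_probability([Fraction(14, 64), Fraction(1)])
print(p)  # 7/32, i.e. 14/64
```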

Thinking Summary

The user is asking about what probability space we're working over when we talk about characteristic probabilities. This is a crucial clarification because the paper actually discusses two different probability distributions, and the distinction is important. 1. The theoretical/formal probability: This is over random independent subkeys 2. The practical probability: This is for a fixed key (the actual key we're trying to find) Let me clarify this carefully.

Great question! The paper actually discusses two different probability distributions, and the distinction is crucial:

1. Theoretical Probability (Definition 11, Theorem 1)

The formal probability is over random independent subkeys:

$$P[\text{characteristic}] = P[\text{pair follows path} \mid P' = \Omega_P,\ \text{keys are random}]$$

Specifically:

  • Fix any pair of plaintexts $(P, P^*)$ with $P' = P \oplus P^* = \Omega_P$
  • For each round $i$, choose the 48-bit subkey $K_i$ uniformly at random
  • The probability is over these random key choices

2. Practical Probability (What actually matters)

In the actual attack, we have a fixed unknown key that we're trying to find. The probability we care about is:

$$P[\text{pair follows path} \mid P' = \Omega_P,\ \text{key} = \text{actual key}]$$

But this probability depends on the specific key! The paper acknowledges this:

"For practical purposes the significant probability with respect to a characteristic is the probability that a pair whose plaintext XOR equals the characteristic's plaintext XOR is a right pair using a fixed key (the one we try to find). This probability is not constant for all the keys... However, we assume that the characteristic's probability is a very good approximation of it, which is usually the case."

Why Random Keys?

The random key model is used because:

  1. It makes the probability calculation tractable (due to independence)
  2. For each S-box, when the subkey bits are uniformly random, the fraction of keys that cause a specific input difference to produce a specific output difference equals the empirical fraction in the S-box's pairs XOR distribution table

The Key Insight (Lemma 1)

For DES specifically, Lemma 1 shows that for any fixed input pair to the F function, the fraction of subkeys that produce a given output difference is the same as the average over all input pairs. This is why the random-key probability is a good approximation for most fixed keys.

In summary: The probabilities are formally over random subkey choices, but we use this as an approximation for the behavior with the actual fixed (but unknown) key.

Thinking Summary

Lemma 1 is a key property that makes differential cryptanalysis work well for DES. Let me think about what it says and when it would fail. Lemma 1 says: For DES, if X→Y with probability p by the F function, then every fixed input pair Z, Z* with Z' = X causes the F function output XOR to be Y by the same fraction p of the possible subkey values. This is saying that the probability distribution is uniform across all input pairs - there aren't "special" input pairs that behave differently from others. This works for DES because of how the key is mixed in: K ⊕ E(data). The XOR operation preserves differences uniformly. Where would this fail or not be a good heuristic? 1. Non-XOR key mixing: If the key were added modulo 2^n instead of XORed, then carries would propagate differently depending on the actual bit values, not just their XOR. 2. Key-dependent S-boxes: If the S-boxes themselves were selected based on key bits (like in Lucifer), then different keys would have completely different differential properties. 3. Data-dependent operations: If the cipher had operations where the differential behavior depends on the actual values, not just their difference. 4. Multiplication operations: If key material were multiplied with data, the differential properties would be very non-uniform. Let me think of concrete examples...

Significance of Lemma 1

Lemma 1 is crucial because it says that in DES, the differential behavior is uniform across all input pairs. Specifically:

  • If input difference $X'$ produces output difference $Y'$ for some fraction $p$ of all possible (input pair, key) combinations
  • Then for ANY specific input pair $(Z, Z^*)$ with $Z' = X'$, exactly fraction $p$ of the 48-bit subkeys will cause output difference $Y'$

This uniformity is what allows us to use the average probability as a good predictor for any specific plaintext pair we might use in an attack.

Why Lemma 1 Holds for DES

The lemma holds because of DES's structure:

  1. The key is mixed via XOR: $S_I = E(\text{data}) \oplus K$
  2. For any fixed input difference $E(\text{data})'$, we have $S_I' = E(\text{data})'$ regardless of $K$
  3. Each possible S-box input pair $(S_I, S_I^*)$ corresponds to exactly one subkey value
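Lemma 1 can be verified exhaustively on a toy XOR-keyed S-box (an arbitrary 4-bit permutation, not a DES S-box): for every fixed input pair with a given difference, the number of subkeys producing a given output difference equals the pairs XOR distribution count.

```python
# Toy 4-bit S-box keyed by XOR, mimicking DES's S_I = E(data) XOR K structure.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def f(x, k):
    return SBOX[x ^ k]

def keys_matching(z, x_diff, y_diff):
    """Subkeys k for which the FIXED pair (z, z ^ x_diff) gives output XOR y_diff."""
    return sum(f(z, k) ^ f(z ^ x_diff, k) == y_diff for k in range(16))

def ddt_count(x_diff, y_diff):
    """Pairs-XOR-distribution entry: inputs z with SBOX[z] ^ SBOX[z ^ x_diff] == y_diff."""
    return sum(SBOX[z] ^ SBOX[z ^ x_diff] == y_diff for z in range(16))

x_diff, y_diff = 0x4, 0x7   # arbitrary differences for the demo
counts = {keys_matching(z, x_diff, y_diff) for z in range(16)}
print(counts, ddt_count(x_diff, y_diff))  # a single value, equal to the DDT entry
```

The reason this works for any S-box is the substitution $u = z \oplus k$: as $k$ ranges over all subkeys, the S-box input pair ranges over every pair with difference $x'$ exactly once.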

Scenarios Where Lemma 1 Would Fail

1. Modular Addition Instead of XOR

Consider a cipher where we compute $S_I = E(\text{data}) + K \bmod 2^{48}$.

Example with 4-bit values (addition mod 16):
- Input pair: (0001, 0011), XOR difference = 0010
- With key 0000: outputs (0001, 0011), XOR difference still 0010
- With key 0001: outputs (0010, 0100), XOR difference = 0110
- Carry propagation makes the output difference depend on the actual values, not just their XOR!

The differential behavior now depends heavily on the specific input values, not just their XOR.
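A quick exhaustive check makes this concrete (toy 4-bit model; XOR difference 0011 is chosen so the effect is visible): under modular addition, the distribution of post-mixing differences over all keys depends on which pair you fix, while XOR mixing preserves the difference for every key.

```python
from collections import Counter

def mixed_diff(x, d, k):
    """Post-key-mixing XOR difference when the key is ADDED mod 16 instead of XORed."""
    return ((x + k) % 16) ^ (((x ^ d) + k) % 16)

d = 0b0011
c1 = Counter(mixed_diff(0b0000, d, k) for k in range(16))   # pair (0000, 0011)
c2 = Counter(mixed_diff(0b0001, d, k) for k in range(16))   # pair (0001, 0010), same XOR
print(c1 == c2)   # False: the key-averaged behavior differs between the two pairs

# With XOR mixing the difference is d for every x and every k, so uniformity is trivial:
assert all((x ^ k) ^ ((x ^ d) ^ k) == d for x in range(16) for k in range(16))
```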

2. Key-Dependent S-Boxes

Original Lucifer used this design:

if (key_bit == 0)
    output = S0[input]
else
    output = S1[input]

Here, different keys give completely different differential properties. The "average" probability over all keys might be 1/4, but for any specific key it could be 0 or 1.
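A minimal sketch of this failure mode with two 2-bit S-boxes (toy tables, not Lucifer's): for the differential $1 \rightarrow 1$ the key-averaged probability is 1/2, but each fixed key bit gives probability exactly 0 or 1.

```python
S0 = [0, 1, 2, 3]   # identity: input XOR 1 always gives output XOR 1
S1 = [0, 2, 1, 3]   # swaps 1 and 2: input XOR 1 never gives output XOR 1

def f(x, key_bit):
    return S0[x] if key_bit == 0 else S1[x]

def prob(key_bit, in_diff=1, out_diff=1):
    """Probability of in_diff -> out_diff over all input pairs, for a fixed key bit."""
    pairs = [(x, x ^ in_diff) for x in range(4)]
    return sum(f(a, key_bit) ^ f(b, key_bit) == out_diff for a, b in pairs) / len(pairs)

print(prob(0), prob(1))   # 1.0 0.0 -- the average is 1/2, but no key behaves like it
```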

3. Multiplication Operations

Consider $\text{output} = \text{input} \times \text{key} \bmod 2^n$:

Example:
- Input pair (3, 5), difference = 6
- Key = 2: outputs (6, 10), difference = 12
- Key = 3: outputs (9, 15), difference = 6
- Key = 4: outputs (12, 20), difference = 24

The output difference varies wildly with the key value.
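The arithmetic above can be checked directly (the helper name is mine):

```python
def out_diff(x1, x2, key, n=32):
    """XOR difference of the two outputs under multiplicative key mixing mod 2^n."""
    return ((x1 * key) % (1 << n)) ^ ((x2 * key) % (1 << n))

pair = (3, 5)   # XOR difference 3 ^ 5 = 6
print([out_diff(*pair, k) for k in (2, 3, 4)])  # [12, 6, 24]
```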

4. Data-Dependent Rotations

If the cipher included operations like:

output = rotate_left(data, key XOR data)

Then the differential behavior would depend on the actual data values, not just their difference.

When It's Not a Good Heuristic

Even when Lemma 1 technically doesn't hold, it might still be a reasonable heuristic if:

  • The variation between different input pairs is small
  • The attacker can use many different plaintext pairs to average out variations

However, it becomes a bad heuristic when:

  1. Extreme Non-uniformity: Some input pairs have probability 0 while others have probability 1 for the same differential

  2. Clustering: The "good" keys (that follow the differential) are clustered in a small subset of the keyspace that the average doesn't represent

  3. Adaptive Defenses: Modern ciphers like AES were specifically designed to have poor differential properties, making the uniform approximation break down

The genius of differential cryptanalysis against DES is that Lemma 1 holds nearly perfectly, making the theoretical analysis match the practical attack remarkably well.

Thinking Summary

The user is asking about Example 9 from the Biham-Shamir differential cryptanalysis paper (BS91). Let me look at what Example 9 says and the diagram it contains. Example 9 describes a one-round characteristic with probability 1. The diagram shows: - At the top: Ωp = (L', 0x) - A box labeled "F" - Arrows showing A' = 0x and a' = 0x - The word "always" - At the bottom: ΩT = (L', 0x) So this is about a one-round characteristic in DES. In differential cryptanalysis, we're looking at pairs of plaintexts and their corresponding ciphertexts, and tracking the XOR differences through the cipher. In DES notation: - The plaintext P is split into left and right halves (L, R) - In one round: the right half R goes through function F, then XORs with L to produce the new left half - The halves are swapped So for a one-round DES: - Input: (L, R) - After F function and XOR: (R, L ⊕ F(R)) Now for the differential (tracking XOR differences): - If we have plaintexts P and P*, their XOR is P' = P ⊕ P* - Split as (L', R') where L' = L ⊕ L* and R' = R ⊕ R* In Example 9: - Ωp = (L', 0x) means the plaintext XOR has L' in the left half and 0 in the right half - Since R' = 0, we have R = R*, so both plaintexts have the same right half - a' = R' = 0x (the input XOR to F is 0) - Since both inputs to F are the same (a = a*), the outputs must be the same: A = A* - The XOR of the F function outputs is zero, ensuring a predictable transformation - This leads to a specific pattern in the round's output differential - The deterministic nature means this occurs with absolute certainty when input values match

Looking at Example 9, this diagram shows a one-round DES characteristic where the plaintext XOR difference is Ωp = (L', 0x).

In DES, each round works like this:

  • The 64-bit block is split into left (L) and right (R) halves
  • The right half goes through the F function
  • The F output is XORed with the left half
  • The halves are swapped

In differential cryptanalysis, we track XOR differences through the cipher. If we have two plaintexts P and P*, their XOR difference is P' = P ⊕ P*.

In Example 9:

  • Ωp = (L', 0x) means the plaintext pair has XOR difference L' in the left half and 0 in the right half
  • a' = 0x means the input XOR to the F function is 0 (since a = R and a' = R' = 0)
  • A' = 0x means the output XOR from the F function is 0
  • "always" means this happens with probability 1 (100% certainty)

The key insight: When both inputs to the F function are identical (a' = 0), both outputs must be identical (A' = 0). This is deterministic - if you put the same value into a function twice, you get the same output twice.

After the round:

  • The new left half is the old right half (which had XOR = 0)
  • The new right half is L ⊕ F(R), with XOR = L' ⊕ A' = L' ⊕ 0 = L'
  • So ΩT = (0, L') which equals (L', 0) after the swap

This characteristic has probability 1 because there's no probabilistic behavior - identical inputs to F always produce identical outputs.
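The propagation can be checked with any F whatsoever, since the argument never looks inside F. A toy Feistel round on 8-bit halves (all names and values mine):

```python
def feistel_round(L, R, F):
    """One Feistel round: (L, R) -> (R, L xor F(R))."""
    return R, L ^ F(R)

F = lambda x: (x * 37 + 11) % 256   # any function at all; its structure is irrelevant

Lp = 0xA5                           # the left-half difference L'
L1, R1 = 0x12, 0x7F
L2, R2 = L1 ^ Lp, R1                # same right halves: plaintext XOR = (L', 0)

o1 = feistel_round(L1, R1, F)
o2 = feistel_round(L2, R2, F)
diff = (o1[0] ^ o2[0], o1[1] ^ o2[1])
print(diff)                         # (0, 165), i.e. (0, L'), with probability 1
```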

Thinking Summary

The user wants me to explain the paper's notation using Example 9 as a worked example, and I need to use KaTeX for math. Let me go through the notation systematically. The paper uses a lot of specific notation that can be confusing. Let me break it down using Example 9.

Let me use Example 9 to explain the paper's notation system:

Core Notation

Pairs and XOR differences:

  • $P, P^*$ = two plaintexts in a pair
  • $P' = P \oplus P^*$ = the XOR difference between them
  • Similarly: $T, T^*$ are ciphertexts, $T' = T \oplus T^*$

Intermediate values:

  • $a, b, c, \ldots$ = 32-bit inputs to the F function in rounds 1, 2, 3, ...
  • $A, B, C, \ldots$ = 32-bit outputs from the F function in rounds 1, 2, 3, ...
  • $a', b', c', \ldots$ = XOR differences of inputs ($a' = a \oplus a^*$)
  • $A', B', C', \ldots$ = XOR differences of outputs ($A' = A \oplus A^*$)

Example 9 Walkthrough

Initial state: $\Omega_P = (L', 0_x)$

  • Left half has XOR difference $L'$
  • Right half has XOR difference $0$ (identical values)

Round 1:

  • Input to F: $a = R$ (the right half of the plaintext)
  • Since the right halves are identical: $a' = R' = 0_x$
  • When $a' = 0$, we must have $A' = 0$ (identical inputs → identical outputs)
  • This happens "always" = with probability 1

After round 1:

  • New left = old right, so it has XOR difference $0$
  • New right = $L \oplus A$, so its difference = $L' \oplus A' = L' \oplus 0 = L'$
  • Output: $\Omega_T = (0, L')$ before the swap $= (L', 0)$ after the swap

Characteristic Definition

A characteristic $\Omega = (\Omega_P, \Omega_\Lambda, \Omega_T)$ consists of:

  • $\Omega_P$ = plaintext XOR
  • $\Omega_T$ = ciphertext XOR
  • $\Omega_\Lambda = (\Lambda_1, \Lambda_2, \ldots, \Lambda_n)$ = list of round behaviors
  • Each $\Lambda_i = (\lambda_I^i, \lambda_O^i)$ = (input XOR, output XOR) for round $i$

For Example 9:

  • $\Omega_P = (L', 0_x)$
  • $\Omega_\Lambda = [(\lambda_I^1 = 0_x, \lambda_O^1 = 0_x)]$
  • $\Omega_T = (L', 0_x)$

Probability Notation

The notation $a' \rightarrow A'$ means "input XOR $a'$ causes output XOR $A'$"

  • The probability depends on how many input pairs with XOR $a'$ produce output XOR $A'$
  • In Example 9: $0 \rightarrow 0$ with probability 1 (always)
  • In Example 10: $60000000_x \rightarrow 00808200_x$ with probability 14/64

The characteristic's total probability is the product of all round probabilities.

Thinking Summary

The user is asking about how differential cryptanalysis characteristics are used to recover key material. Let me look through the paper for the key recovery process. From the paper, the main technique is: 1. You have a characteristic with some probability p 2. You collect many pairs of plaintexts with the specified plaintext difference 3. You encrypt them to get ciphertext pairs 4. For each possible key value (or subset of key bits), you count how many pairs would produce the expected differences according to the characteristic 5. The correct key value should appear with frequency approximately equal to the characteristic's probability, while wrong keys appear randomly Let me find specific examples from the paper...

The paper describes how characteristics are used to recover key material through a counting method. Here's the key recovery process:

Basic Principle

From the paper's description in Example 7:

"Assume we know that $S1_E = 1_x$, $S1_E^* = 35_x$, and $S1'_O = D_x$ and we want to find the key value $S1_K$. The input XOR is $S1'_E = S1'_I = 34_x$ regardless of the actual value of $S1_K$. By consulting Table 2 we can see that the input to the S box has eight possibilities. These eight possibilities make eight possibilities for the key..."

The Counting Method

The paper describes the general approach (page 17):

"For each possible key value, count the number of pairs that result with the expected output XOR using this key value in the last round. The right key value is the (hopefully unique) key value suggested by all the pairs."

Why This Works

The key insight is explained on page 19:

"We know that the right key value should result from all the right pairs which occur (approximately) with the characteristic's probability. All the other possible key values are fairly randomly distributed... Consequently, the right key appears with the characteristic's probability (from right pairs) plus other random occurrences (from wrong pairs). To find the key we just have to count the number of occurrences of each of the suggested keys. The right key is likely to be the one that occurs most often."
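A toy version of the counting method on a single XOR-keyed 4-bit S-box with output whitening (all names and parameters are mine, not the paper's): each pair votes for every key value consistent with its ciphertext XOR, and the true key collects a vote from every pair. As in Example 7, one pair leaves several key possibilities; counting across pairs narrows them down (here to the key and its "mirror" $k_1 \oplus d$).

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # toy 4-bit S-box

def encrypt(p, k1, k2):
    return SBOX[p ^ k1] ^ k2    # toy cipher; the whitening key k2 cancels in XORs

k1, k2 = 0x9, 0x3               # the secret key the attacker wants to find
d = 0x4                         # chosen plaintext XOR

counts = [0] * 16
for p1 in range(16):
    p2 = p1 ^ d
    c_diff = encrypt(p1, k1, k2) ^ encrypt(p2, k1, k2)
    for k in range(16):         # every key consistent with this pair gets a vote
        if SBOX[p1 ^ k] ^ SBOX[p2 ^ k] == c_diff:
            counts[k] += 1

best = max(counts)
print(sorted(k for k in range(16) if counts[k] == best))  # [9, 13]: k1 and k1 ^ d
```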

Signal-to-Noise Ratio

The paper introduces a crucial concept (Definition 13):

"The ratio between the number of right pairs and the average count in a counting scheme is called the signal-to-noise ratio of the counting scheme and is denoted by S/N."

The formula given is: $$S/N = \frac{2^k \cdot p}{\alpha \cdot \beta}$$

where:

  • $k$ = number of key bits being counted
  • $p$ = characteristic's probability
  • $\alpha$ = average count per counted pair
  • $\beta$ = ratio of counted pairs to all pairs
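Definition 13's formula as a one-line helper (the example numbers below are illustrative, not taken from the paper's attacks):

```python
def signal_to_noise(k_bits, p, alpha, beta):
    """S/N = 2^k * p / (alpha * beta), per Definition 13."""
    return (2 ** k_bits) * p / (alpha * beta)

# Illustrative: counting k = 30 key bits with a characteristic of probability 1/16,
# alpha = 4 keys suggested per counted pair, beta = 0.8 of pairs surviving filtering.
print(signal_to_noise(30, 1 / 16, 4, 0.8))
```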

Practical Example: 6-Round DES

From Section 4:

"Each one of the two characteristics lets us find the 30 key bits of K6 which are used at the input of five S boxes in the sixth round... For each pair derive the expected output XOR of as many S boxes in the last round as possible from the plaintext XOR and the ciphertext pair... For each possible key value, count the number of pairs that result with the expected output XOR using this key value in the last round."

Filtering Wrong Pairs

An important optimization (page 30):

"Since about 20% of the entries in the pairs XOR distribution tables of the S boxes are impossible, about 20% of the wrong pairs can be discarded by each S box before they are actually counted."

This filtering dramatically improves the signal-to-noise ratio by eliminating pairs that cannot possibly follow the characteristic.
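The filtering fraction is easy to read off an S-box's pairs XOR distribution table. A sketch on a toy 4-bit S-box (a small permutation has far more impossible entries than the roughly 20% the paper reports for DES's 6-to-4-bit S-boxes, since every entry is even and each row sums to only 16):

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # toy 4-bit S-box

def ddt(s, n=16):
    """Pairs XOR distribution table: table[dx][dy] = #inputs x with s[x]^s[x^dx] == dy."""
    table = [[0] * n for _ in range(n)]
    for dx in range(n):
        for x in range(n):
            table[dx][s[x] ^ s[x ^ dx]] += 1
    return table

t = ddt(SBOX)
impossible = sum(t[dx][dy] == 0 for dx in range(1, 16) for dy in range(16))
print(impossible / (15 * 16))   # fraction of impossible nontrivial (dx, dy) transitions
```

Any pair whose last-round output XOR hits a zero entry can be discarded before counting, which is exactly the filtering described above.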
