Numinex

Thread (10 posts)

One thing I’m not totally following from this paper: they mention making essential use of the fact that the ReLU activation function has a discontinuous derivative, so they can identify the precise points where it turns on or off. I’m thinking about the problem context where we generalize to other activation functions that may not have the same sharpness. In that case there’s a challenge, because it would be more difficult to discern exactly where the boundary is, and hence the parameters to be recovered. However, even in the ReLU case there is some noise, correct, and some statistical methods are used to sample enough points to get an estimate of the parameters? Could this generalize to smoother activation functions?

Thinking Summary

**Exploring activation functions**

I need to address the point about the ReLU activation function’s discontinuous derivative, which allows for identifying specific activation points. If we generalize to other activation functions that lack this sharpness, it complicates discerning boundaries and parameters. There’s some noise in the ReLU case, and they use statistical methods for estimation. I'm curious if these approaches could apply to smoother functions and how piecewise linear functions like leaky ReLU or hard sigmoid might fit into this discussion.

**Analyzing activation functions**

When considering smooth activation functions, approximating second derivatives and inflection points can be quite challenging. For piecewise-linear functions like ReLU and leaky ReLU, critical points remain clear. However, for smooth activations like sigmoid and tanh, parameters become harder to extract due to continuous slope changes. I’d like to explore the implications of querying approximated derivatives and how the logistic function's properties could help recover weight vectors, but we face stability issues and noise. Let's highlight the distinction in detecting activation boundaries between these function types.

**Exploring methods for activation functions**

There are potential methods to explore, like treating the network as a polynomial composition or using symbolic regression to approximate it. Adding a small dither to the input can help approximate the Jacobian, though this could be costly. For ReLU, boundaries behave like measure-zero hyperplanes detectable through derivative sign changes. With smooth functions, identifying specific boundaries becomes difficult. However, approximating regions with small outputs is possible, but it complicates disentangling weight signs. I need to consider how piecewise-linear approximations might help, or if a gradient oracle is necessary.

Why ReLU is “special”

  1. Piece-wise linearity
    For a ReLU neuron
    $$g(u)=\max(0,u)=\begin{cases}0 & u<0\\ u & u\ge 0\end{cases}$$

    the input space is split by a single hyperplane
    $H:=\{x\mid w^{\top}x+b=0\}$.

    • On each side of $H$ the neuron is perfectly linear.
    • Crossing $H$ flips the Jacobian contribution of that neuron from $0$ to $w^{\top}(\cdot)$.
    • Along any straight line $\mu(\lambda)=x_1+\lambda(x_2-x_1)$ the global network output $f^*(\lambda)=f(\mu(\lambda))$ is therefore piecewise linear: its first derivative has a jump exactly when $\mu(\lambda)$ hits a hyperplane belonging to one ReLU.

    Detecting that jump with two finite differences is what lets Canales-Martínez et al. (and Carlini–Jagielski–Mironov 2020) pinpoint a critical point and say: “between $\lambda_1$ and $\lambda_2$ one and only one ReLU flipped – recover its weight/sign now”.
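As a concrete illustration, here is a minimal sketch of that critical-point search as a bisection on the directional slope. Here `f` stands for the black-box prediction oracle; the helper names, tolerances, and the single-flip assumption are mine, not the paper's actual implementation:

```python
def directional_slope(f, mu, lam, h=1e-6):
    # One-sided finite-difference slope of f along the line mu(lambda).
    return (f(mu(lam + h)) - f(mu(lam))) / h

def refine_critical_point(f, mu, lo, hi, iters=60, tol=1e-9):
    # Assumes exactly one ReLU flips inside [lo, hi]. The network is linear
    # on each side of the critical point, so the slope is constant there:
    # bisect on "does the slope at the midpoint still match the slope at lo?".
    s_lo = directional_slope(f, mu, lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if abs(directional_slope(f, mu, mid) - s_lo) < tol:
            lo = mid   # still on the left linear piece: kink lies to the right
        else:
            hi = mid   # slope changed: kink lies in [lo, mid]
    return 0.5 * (lo + hi)

# Usage, with x1 and x2 two query points:
#   mu = lambda t: x1 + t * (x2 - x1)
#   lam_star = refine_critical_point(f, mu, 0.0, 1.0)
```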

  2. Algebraic simplifications
    Once the layer is known to be linear on both sides of a critical point, solving for weights reduces to solving linear (or at worst bilinear) systems. That is why the whole attack ultimately becomes polynomial.

What if the activation is still piece-wise linear but not ‘hard’?

• Leaky-ReLU, PReLU, hard-tanh, hard-sigmoid, ReLU6, etc.
These all keep the same “kink” – the derivative is discontinuous (the slope changes from $a$ to $b$).
The attack generalises almost verbatim: in place of “slope = 0 vs slope = 1” you test “slope = $a$ vs slope = $b$”, which is even easier because both sides remain informative.

• ‘ReLU with noise’ (e.g. dropout, quantisation error).
As long as the measurement noise is bounded and you can average a few repeated queries, the jump in the first derivative is still categorical; the paper already uses 200 samples per neuron in the “neuron-wiggle” stage exactly for this purpose.

Truly smooth activations (sigmoid, tanh, GELU, Swish)

Here the derivative is continuous, so the clean “one-bit jump” disappears:

  1. No unique boundary:
    A sigmoid neuron transitions gradually over a slab of width $O(\|w\|^{-1})$ instead of a zero-measure hyperplane. Two neighbouring neurons’ transition regions easily overlap, so following a line $\mu(\lambda)$ you see a sum of many smooth bumps; separating them is a deconvolution problem, not a simple change-point detection.

  2. Ambiguous scaling:
    For ReLU you can absorb any positive scalar into the next layer, but negative scalars are forbidden by the orientation of the kink (concretely, $c\cdot\max(0,\alpha u) = (c\alpha)\cdot\max(0,u)$ for $\alpha>0$, so only the sign is intrinsic). With sigmoids the identity $\sigma(-u)=1-\sigma(u)$ lets a sign flip be absorbed into the next layer’s weights and bias, and the scale of $w$ shows up only through the width of the smooth transition region, so recovering the exact scale of $w$ (and its sign) from black-box queries alone becomes ill-posed.

  3. Higher-order derivatives help only with a gradient oracle:
    If the attacker could query $\nabla f(x)$ exactly, one could in principle solve non-linear systems that involve $\sigma'(\cdot)$ in place of the ReLU mask. In the pure prediction-only model you would have to estimate those gradients with finite differences; the error grows as $O(\varepsilon^{-1})$ for first derivatives and $O(\varepsilon^{-2})$ for second, so the number of queries blows up quickly (see the sketch after this list).
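To make the query blow-up concrete, a naive central-difference gradient estimator looks like this (an illustrative sketch; `f` is again the prediction oracle):

```python
import numpy as np

def fd_gradient(f, x, eps=1e-4):
    # Central differences: 2*d oracle queries for a d-dimensional input.
    # With output noise of magnitude delta, each component carries an error
    # of order delta/eps (and delta/eps**2 for second derivatives), so
    # shrinking eps to resolve sigma'(.) amplifies the noise.
    d = x.shape[0]
    g = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g
```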

Known theoretical results mirror this: with sigmoids, even deciding whether a 2-layer, 3-neuron network is consistent with a set of labeled samples is NP-complete (Blum–Rivest, 1992). Allowing chosen queries helps, but no polynomial-time black-box extraction algorithm is currently known.

Practical take-away

• Piece-wise-linear (kinky) activations ⇒ the Crypto ’20 / TII ’23 extraction works.

• Mild modification (leaky-ReLU, ReLU6, hard-sigmoid, etc.) ⇒ same attack with minor slope accounting.

• Smooth activations ⇒ the sharp “critical-point” trick disappears; you would have to shift to optimisation-based model stealing (train a surrogate on many query/response pairs – a minimal sketch follows) or to side-channel / gradient-oracle attacks.
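For the smooth-activation case, the fallback is generic model stealing. A minimal surrogate-training sketch in PyTorch; the architecture, query distribution, and budget here are illustrative guesses, not a recipe from the paper:

```python
import torch
import torch.nn as nn

def steal_surrogate(oracle, in_dim, out_dim, n_queries=100_000, epochs=10):
    # Fit a student network to (x, oracle(x)) pairs. This recovers an
    # approximation of the function, not the exact weights.
    X = torch.randn(n_queries, in_dim)
    with torch.no_grad():
        Y = oracle(X)  # black-box responses
    student = nn.Sequential(
        nn.Linear(in_dim, 256), nn.Tanh(), nn.Linear(256, out_dim)
    )
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(epochs):
        for i in range(0, n_queries, 256):
            loss = nn.functional.mse_loss(student(X[i:i+256]), Y[i:i+256])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```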

So the “sharpness” is not a small implementation detail—it is the linchpin that turns a messy non-linear identification problem into a clean linear-algebra one.

Thinking Summary

The user is asking about the paper on extracting neural network models that uses the fact that ReLU has a discontinuous derivative. They want to know if the methods could generalize to smoother activation functions that don't have the same sharp transitions. This is a good question. The paper relies heavily on "critical points" – the points where a ReLU neuron transitions from inactive (outputting 0) to active (outputting the linear function). At these points, there's a sharp change in the derivative.

Let me think about whether this could work for smooth activation functions:

1. The paper's core approach uses:
   - Finding critical points where neurons switch states
   - Using the abrupt changes in derivatives at these points
   - The fact that the network is piecewise linear between critical points

2. For smooth activation functions like sigmoid or tanh:
   - There are no "critical points" in the same sense
   - The derivatives change continuously
   - The network is never truly linear in any region

3. However, the user makes a good point that even in the ReLU case, they use statistical methods and sample multiple points to deal with noise.

So could we adapt the approach? Some thoughts:
- For smooth activations, we might still be able to extract information about the weights by looking at how outputs change with inputs
- We could potentially use higher-order derivatives or look at regions where the activation function is approximately linear
- The problem would likely be much harder because we lose the piecewise linear structure

The paper actually explicitly states that their attack cannot be generalized to smooth activation functions: "We can generalize the attack to any other type of activation function that looks like a piecewise linear function (with possibly more than one sharp bend), such as a leaky ReLU, but not to smooth activation functions such as a sigmoid which have no critical points."

So the authors are pretty clear that their specific approach doesn't work for smooth activations, but the user's question about whether statistical methods could help is interesting.

This is an excellent observation about the paper's methodology. You're right that the authors make essential use of ReLU's discontinuous derivative at the critical points where neurons switch from inactive to active states. The paper explicitly states they cannot generalize to smooth activation functions:

"We can generalize the attack to any other type of activation function that looks like a piecewise linear function (with possibly more than one sharp bend), such as a leaky ReLU, but not to smooth activation functions such as a sigmoid which have no critical points."

However, your insight about the statistical nature of their methods is intriguing. Even for ReLUs, they:

  1. Use the "Neuron Wiggle" technique which samples ~200 critical points per neuron
  2. Employ confidence levels (α) to handle uncertainty
  3. Deal with finite precision arithmetic that introduces noise

For smooth activation functions like sigmoid or tanh, while we lose the sharp critical points, we might consider:

Potential approaches:

  • Gradient-based methods: Since smooth functions have well-defined gradients everywhere, we could potentially use gradient information more directly
  • Linearization regions: Even smooth activations have regions where they're approximately linear (e.g., sigmoid near x=0; see the snippet after this list)
  • Higher-order derivatives: The curvature information might reveal network parameters
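The linearization point is easy to check numerically (a quick illustration, not from the paper): near $x=0$ the sigmoid is approximately affine, $\sigma(u)\approx \tfrac12 + \tfrac{u}{4}$.

```python
import numpy as np

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

u = np.linspace(-0.5, 0.5, 101)
err = np.max(np.abs(sigmoid(u) - (0.5 + u / 4)))
print(err)  # ~2.6e-3: small-signal probing sees an (almost) linear network,
            # but the neuron's scale then appears only as a product with the
            # next layer's weights, so it cannot be separated in this regime.
```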

Challenges:

  • Without critical points, identifying individual neuron behaviors becomes much harder
  • The piecewise linear structure that enables layer-by-layer "peeling" disappears
  • The signal-to-noise ratio would likely be much worse

The statistical sampling approach might help, but you'd probably need a fundamentally different algorithm architecture rather than adapting their critical-point-based method. It's an interesting research direction though - perhaps combining ideas from this paper with other model extraction techniques designed for smooth activations.

Thinking Summary

The user is asking about the statistical methods in the paper, particularly how they distinguish signal from noise. Let me go through the paper and extract the relevant information about their statistical approach, focusing on the Neuron Wiggle technique and how they handle uncertainty.

Key sections to look at:
1. The Neuron Wiggle sign-recovery method (Section 4.2)
2. The signal-to-noise analysis
3. The confidence-level system
4. Appendix A, which has the signal-to-noise ratio analysis

Let me extract the key details and explain them clearly.

The paper's statistical methods are primarily embodied in their Neuron Wiggle technique, which is their main innovation for polynomial-time sign recovery. Let me break down how they distinguish signal from noise:

The Core Problem

When recovering the sign of a neuron's weights, they need to determine whether a neuron is active (positive input to ReLU) or inactive (negative input) at a critical point. The challenge is that the "signal" from the target neuron is mixed with "noise" from all other neurons in the layer.

Signal vs Noise Definition

For a target neuron $j$ in layer $i$:

  • Signal: the contribution $c_j e_j$, where $e_j$ is the change in neuron $j$'s value and $c_j$ is its output coefficient
  • Noise: the sum $\sum_{k \in I \setminus \{j\}} c_k e_k$ from all other active neurons

The key equation they analyze is:
$$f(x^* + \Delta) - f(x^*) = c_j e_j + \sum_{k \in I \setminus \{j\}} c_k e_k$$

Statistical Decision Making

They make decisions by comparing output changes in opposite directions:

  • $L = f(x^* - \Delta) - f(x^*)$ (left difference)
  • $R = f(x^* + \Delta) - f(x^*)$ (right difference)

If $|L| > |R|$, they decide the neuron has negative sign; otherwise positive.

Confidence Level System

The paper introduces a confidence level $\alpha$, calculated as:

  • $\alpha = s_-/s$ for negative-sign decisions (where $s_-$ is the number of votes for negative out of $s$ total)
  • $\alpha = s_+/s$ for positive-sign decisions (where $s_+$ is the number of votes for positive)

They use an adaptive threshold $\alpha_0$ (typically ~0.7–0.8) and reanalyze neurons with $\alpha \leq \alpha_0$ (a schematic of this vote aggregation follows).
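A schematic of the vote aggregation (the names and the wiggle construction are placeholders; the paper's full procedure also includes finding the critical points themselves):

```python
def recover_sign(f, critical_points, wiggles, alpha0=0.75):
    # One vote per critical point x*: compare |L| and |R| as defined above.
    votes_neg = 0
    for x_star, delta in zip(critical_points, wiggles):
        L = f(x_star - delta) - f(x_star)
        R = f(x_star + delta) - f(x_star)
        votes_neg += int(abs(L) > abs(R))
    s = len(critical_points)                    # e.g. s = 200 samples per neuron
    sign = -1 if votes_neg > s / 2 else +1
    alpha = max(votes_neg, s - votes_neg) / s   # confidence of the majority
    return sign, alpha, alpha <= alpha0         # flag low-confidence neurons
```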

Output Randomization

The key insight is that different input points $x$ produce different output coefficient matrices $G^{(i+1)}$. This "output randomization" means that even if one critical point gives a bad signal-to-noise ratio, sampling multiple points (they use 200) provides the statistical power to make correct decisions.

Signal-to-Noise Analysis (Appendix A)

For unitary balanced networks, they show:

  • The wiggle in the target neuron's direction produces a change of magnitude 1
  • Other neurons see changes of approximately $\pm 1/\sqrt{n}$ (due to near-orthogonality in high dimensions)
  • After the ReLUs, the expected norm when the target is active: $\sqrt{1.5} \approx 1.22$
  • Expected norm when the target is inactive: $\sqrt{0.5} \approx 0.71$

This roughly 1.7:1 ratio persists through subsequent layers, making detection feasible.
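Those constants are easy to sanity-check with a small Monte Carlo experiment under the stated idealization (unit-norm random weight rows, each non-target neuron active with probability 1/2; the dimension and trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 256, 2_000
norms_on, norms_off = [], []
for _ in range(trials):
    W = rng.standard_normal((n, n))
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm weight rows
    pre = W @ W[0]                  # pre-activation changes for a wiggle
                                    # along neuron 0's weight direction
    mask = rng.random(n) < 0.5      # other neurons active w.p. 1/2
    mask[0] = True
    norms_on.append(np.linalg.norm(pre * mask))    # target neuron active
    mask[0] = False
    norms_off.append(np.linalg.norm(pre * mask))   # target neuron inactive
print(np.mean(norms_on), np.mean(norms_off))       # ~1.22 vs ~0.71
```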

Practical Results

From their CIFAR10 experiments:

  • 200 samples per neuron were sufficient
  • Most signs recovered with high confidence ($\alpha > 0.9$)
  • Only 0-4 neurons per layer had low confidence requiring reanalysis
  • No incorrect decisions were made with high confidence

Key Statistical Properties

  1. High-dimensional blessing: In high dimensions, random vectors tend to be nearly orthogonal, improving signal-to-noise ratio
  2. Multiple sampling: Reduces impact of occasional "bad" coefficient configurations
  3. Adaptive thresholding: Identifies potentially incorrect decisions for reanalysis
  4. Robustness: The method gracefully degrades as degrees of freedom decrease in deeper layers

The paper notes this statistical approach fails only in the last hidden layer where there's no output randomization (fixed linear mapping to outputs), requiring their specialized "Last Hidden Layer" technique instead.
