Numinex

Thread (26 posts)

In the paper there's an intuition-building figure showing a 2D input space and plotting the function along a line interpolating between two input points. The function is real-valued, so it can be plotted in 2D and put in a chart. That chart has sharp corners corresponding to the linear pieces of the activation functions.

But in a real system the input and output are high-dimensional vectors, correct? How does the method in the paper find the corners of a function whose output is high-dimensional?

Thinking Summary

The user is asking about how the method in the paper handles finding critical points when the output is high-dimensional. They're referring to Figure 2 in the paper, which shows a nice 2D visualization of how the function changes along a line between two input points, with visible sharp corners where neurons toggle between active/inactive states. The key insight is that while the output is indeed a high-dimensional vector, each individual output coordinate is still a scalar function that can be analyzed along the line between two input points. The paper doesn't need to find corners in a high-dimensional output space - instead, it finds corners in the scalar functions corresponding to each output coordinate. Let me explain this clearly without using analogies.

You're right that in real neural networks both the input and output are high-dimensional vectors. The key insight is that the method doesn't need to find corners in a high-dimensional output space. Instead, it analyzes each output coordinate separately as a scalar function.

When you interpolate between two input points $x_1$ and $x_2$ in the high-dimensional input space via $\mu(\lambda) = x_1 + \lambda(x_2 - x_1)$, each output coordinate $f_j(\mu(\lambda))$ becomes a scalar function of $\lambda \in [0,1]$. So if your network has 10 outputs, you get 10 separate piecewise linear functions, each of which can be analyzed for critical points.

The critical points (where ReLUs toggle) affect all output coordinates simultaneously. When a neuron in some hidden layer switches from active to inactive (or vice versa) as you move along the interpolation path, this creates a sharp corner in the piecewise linear function for each output coordinate. The slopes before and after this point will be different for each output, but the location $\lambda^*$ where the corner occurs is the same across all outputs.

The paper's algorithm finds these corners by:

  1. Measuring slopes at different points along the interpolation
  2. Finding where slopes change (indicating a corner)
  3. Mapping back from $\lambda^*$ to the actual input point $x^* = \mu(\lambda^*)$

This approach works because ReLU networks are piecewise linear - the high-dimensional output function is linear within each region where all neurons maintain their active/inactive states, with sharp transitions only at the boundaries where neurons toggle.
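
To make this concrete, here is a rough brute-force sketch (not the paper's query-efficient procedure): densely sample each output coordinate along the interpolation path and flag the $\lambda$ values where the finite-difference slope changes. The two-layer toy network, its sizes, and the detection threshold are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected ReLU network R^8 -> R^3 (weights made up for illustration).
A1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
A2, b2 = rng.normal(size=(3, 16)), rng.normal(size=3)

def f(x):
    """Piecewise-linear network output (3 coordinates)."""
    return A2 @ np.maximum(A1 @ x + b1, 0.0) + b2

x1, x2 = rng.normal(size=8), rng.normal(size=8)
mu = lambda lam: x1 + lam * (x2 - x1)            # interpolation path

lams = np.linspace(0.0, 1.0, 20001)
outputs = np.array([f(mu(l)) for l in lams])     # shape (20001, 3)

# Finite-difference slope of each output coordinate along the path.
slopes = np.diff(outputs, axis=0) / np.diff(lams)[:, None]

# A corner is wherever the slope of any coordinate jumps; each true corner
# may show up at one or two adjacent grid points.
slope_jumps = np.abs(np.diff(slopes, axis=0))    # shape (N-2, 3)
corner_idx = np.where(slope_jumps.max(axis=1) > 1e-3)[0] + 1
print("approximate corner locations:", lams[corner_idx])
```

The paper locates these corners with far fewer queries than dense sampling; the sketch is only meant to show the geometry: scalar piecewise linear slices of a high-dimensional map, with all slices bending at the same $\lambda$ values.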

Thinking Summary

The user wants a proof showing why, when a ReLU toggles, it affects all output coordinates simultaneously. Let me work through this carefully using the definitions from the paper:

  • ReLU: $\sigma(x) = \max(x, 0)$
  • Neuron j in layer i: $\eta(x) = \sigma(A_j^{(i)}x + b_j^{(i)})$
  • The network is composed as: $f = f_{r+1} \circ \sigma \circ \cdots \circ \sigma \circ f_2 \circ \sigma \circ f_1$

The key insight is that when we move along an interpolation path $\mu(\lambda) = x_1 + \lambda(x_2 - x_1)$, a neuron becomes critical when its pre-activation value crosses zero. This happens at a specific value of $\lambda$, and this affects the entire computation graph downstream from that neuron. Let me write a clear proof.

Let me prove this using the definitions from the paper.

Definitions:

  • ReLU: $\sigma(x) = \max(x, 0)$
  • Neuron $j$ in layer $i$: $\eta_j^{(i)}(x) = \sigma(A_j^{(i)}x + b_j^{(i)})$
  • Network: $f = f_{r+1} \circ \sigma \circ \cdots \circ \sigma \circ f_2 \circ \sigma \circ f_1$

Proof:

Consider an interpolation path $\mu(\lambda) = x_1 + \lambda(x_2 - x_1)$ for $\lambda \in [0,1]$.

For neuron $j$ in layer $i$, its pre-activation value along this path is: $v_j^{(i)}(\lambda) = A_j^{(i)} \cdot (\text{input to layer } i \text{ at } \mu(\lambda)) + b_j^{(i)}$

This pre-activation value is a linear function of $\lambda$ (since the input to layer $i$ depends linearly on $\lambda$ when all neurons maintain their states). The neuron becomes critical when $v_j^{(i)}(\lambda^*) = 0$ for some $\lambda^*$.
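
For a first-layer neuron this can be made fully concrete: the pre-activation is affine in $\lambda$, so its zero crossing has a closed form, $\lambda^* = -(A_j^{(1)} x_1 + b_j^{(1)}) / (A_j^{(1)}(x_2 - x_1))$. A minimal self-contained sketch with illustrative random weights (for deeper layers the same formula applies only within a fixed activation region of the earlier layers):

```python
import numpy as np

rng = np.random.default_rng(0)
A1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)   # illustrative layer-1 weights
x1, x2 = rng.normal(size=8), rng.normal(size=8)

j = 3                                   # an arbitrary first-layer neuron
a, b = A1[j], b1[j]

# v_j(lambda) = a.(x1 + lambda (x2 - x1)) + b = (a.x1 + b) + lambda * a.(x2 - x1)
c0 = a @ x1 + b                         # pre-activation at lambda = 0
c1 = a @ (x2 - x1)                      # slope of the pre-activation in lambda
lam_star = -c0 / c1                     # zero crossing

if 0.0 <= lam_star <= 1.0:
    print(f"neuron {j} toggles on this path at lambda* = {lam_star:.4f}")
else:
    print(f"neuron {j} does not toggle on this path (lambda* = {lam_star:.4f})")
```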

Now consider the network's output. Let's denote the $k$-th output coordinate as $f_k$. We can write:

$$f_k(\mu(\lambda)) = \sum_{j=1}^{d_r} w_{kj}^{(r+1)} \, \eta_j^{(r)}(\cdots) + b_k^{(r+1)}$$

where $w_{kj}^{(r+1)}$ are the weights in the final layer.

The key observation: when neuron $j$ in layer $i$ toggles at $\lambda = \lambda^*$ (going from inactive to active, say; the reverse case is symmetric):

  • For $\lambda < \lambda^*$: $\eta_j^{(i)} = 0$
  • For $\lambda > \lambda^*$: $\eta_j^{(i)} = A_j^{(i)} \cdot (\text{input}) + b_j^{(i)}$

This change propagates through all subsequent layers. Every path from this neuron to any output coordinate will experience a change in its contribution at exactly $\lambda = \lambda^*$.

Specifically, the derivative $\frac{d f_k}{d\lambda}$ will have a discontinuity at $\lambda^*$ for every output coordinate $k$ that has any dependency on neuron $j$ in layer $i$. Since the network is fully connected, every neuron in layer $i$ influences every output coordinate (unless some weights happen to be exactly zero).

Therefore, when neuron $j$ toggles at $\lambda^*$:

  1. The function $f_k(\mu(\lambda))$ experiences a corner (slope change) at $\lambda^*$ for all $k$
  2. The location $\lambda^*$ is the same for all output coordinates
  3. Only the magnitude of the slope change differs between outputs

This is why the paper can find critical points by analyzing any output coordinate: the $\lambda^*$ values where corners occur are identical across all outputs.
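
A quick empirical check of this claim, again with an illustrative toy network: pick a first-layer neuron that toggles on the path, compute its $\lambda^*$ analytically, and compare the numerical slope of every output coordinate just before and just after $\lambda^*$. All coordinates change slope at that same $\lambda^*$; only the sizes of the changes differ. The network, the offsets, and the neuron-selection heuristic are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
A1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
A2, b2 = rng.normal(size=(3, 16)), rng.normal(size=3)
f = lambda x: A2 @ np.maximum(A1 @ x + b1, 0.0) + b2
x1, x2 = rng.normal(size=8), rng.normal(size=8)
mu = lambda lam: x1 + lam * (x2 - x1)

# Analytic toggle points of all first-layer neurons along the path.
lam_stars = -(A1 @ x1 + b1) / (A1 @ (x2 - x1))

# Pick one neuron that toggles well inside (0, 1); generically at least one does.
j = int(np.flatnonzero((lam_stars > 0.05) & (lam_stars < 0.95))[0])
lam_star = lam_stars[j]

# Numerical slope of every output coordinate just before and just after lambda*.
# (Assumes no other neuron toggles within 2*eps of lambda*, which is generically true.)
eps = 1e-4
slope_before = (f(mu(lam_star - eps)) - f(mu(lam_star - 2 * eps))) / eps
slope_after = (f(mu(lam_star + 2 * eps)) - f(mu(lam_star + eps))) / eps

print(f"neuron {j} toggles at lambda* = {lam_star:.4f}")
print("slope change of each output coordinate:", slope_after - slope_before)
```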
