Model reconstruction
Thread (26 posts)
In the paper there's an intuition-building figure showing a 2D input space and plotting the function along a line interpolating between two input points. The function is real-valued, so it can be plotted in 2D as a chart. That chart has sharp corners corresponding to the boundaries between the linear patches of the activation functions.
But in a real system the input and output are high-dimensional vectors, correct? How does the method in the paper find the corners of a function whose output is high-dimensional?
Thinking Summary
The user is asking about how the method in the paper handles finding critical points when the output is high-dimensional. They're referring to Figure 2 in the paper, which shows a nice 2D visualization of how the function changes along a line between two input points, with visible sharp corners where neurons toggle between active/inactive states. The key insight is that while the output is indeed a high-dimensional vector, each individual output coordinate is still a scalar function that can be analyzed along the line between two input points. The paper doesn't need to find corners in a high-dimensional output space - instead, it finds corners in the scalar functions corresponding to each output coordinate. Let me explain this clearly without using analogies.
You're right that in real neural networks both the input and output are high-dimensional vectors. The key insight is that the method doesn't need to find corners in a high-dimensional output space. Instead, it analyzes each output coordinate separately as a scalar function.
When you interpolate between two input points $x_1$ and $x_2$ in the high-dimensional input space via $\mu(\lambda) = x_1 + \lambda(x_2 - x_1)$, each output coordinate becomes a scalar function of $\lambda$. So if your network has 10 outputs, you get 10 separate piecewise linear functions, each of which can be analyzed for critical points.
The critical points (where ReLUs toggle) affect all output coordinates simultaneously. When a neuron in some hidden layer switches from active to inactive (or vice versa) as you move along the interpolation path, this creates a sharp corner in the piecewise linear function for each output coordinate. The slopes before and after this point will be different for each output, but the location where the corner occurs is the same across all outputs.
The paper's algorithm finds these corners by:
- Measuring slopes at different points along the interpolation
- Finding where slopes change (indicating a corner)
- Mapping back from $\lambda$ to the actual input point $\mu(\lambda)$
This approach works because ReLU networks are piecewise linear - the high-dimensional output function is linear within each region where all neurons maintain their active/inactive states, with sharp transitions only at the boundaries where neurons toggle.
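Here is a minimal sketch of that slope-measurement idea, assuming only query access to the model. The toy network, its layer sizes, the grid resolution, and the tolerance below are illustrative stand-ins, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the black-box model: 10-d input, two hidden ReLU layers, 3 outputs.
W1, b1 = rng.normal(size=(16, 10)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)
W3, b3 = rng.normal(size=(3, 16)), rng.normal(size=3)

def f(x):
    h1 = np.maximum(W1 @ x + b1, 0.0)
    h2 = np.maximum(W2 @ h1 + b2, 0.0)
    return W3 @ h2 + b3  # 3-dimensional output

# Interpolation path mu(lambda) = x1 + lambda * (x2 - x1).
x1, x2 = rng.normal(size=10), rng.normal(size=10)
mu = lambda lam: x1 + lam * (x2 - x1)

# Slope of every output coordinate on each small interval along the path.
lams = np.linspace(0.0, 1.0, 2001)
vals = np.stack([f(mu(l)) for l in lams])                 # shape (2001, 3)
slopes = np.diff(vals, axis=0) / np.diff(lams)[:, None]   # shape (2000, 3)

# Wherever consecutive interval slopes differ, some ReLU toggled in between,
# and the slope changes for all output coordinates at that same lambda.
changed = ~np.isclose(slopes[1:], slopes[:-1], atol=1e-3).all(axis=1)
for k in np.flatnonzero(changed):
    lam_star = lams[k + 1]          # corner location, up to grid resolution
    print(f"corner near lambda = {lam_star:.4f}")
```

A finer grid or a local refinement step would pin each corner down more precisely; mapping a recovered $\lambda$ back to an actual input point is just evaluating $\mu(\lambda)$.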
Can you use the definition of ReLU and the definition of neuron to give a worked proof of why the critical points where the ReLUs toggle affect all coordinates simultaneously?
Thinking Summary
The user wants a proof showing why when a ReLU toggles, it affects all output coordinates simultaneously. Let me work through this carefully using the definitions from the paper. From the paper: - ReLU: $\sigma(x) = \max(x, 0)$ - Neuron j in layer i: $\eta(x) = \sigma(A_j^{(i)}x + b_j^{(i)})$ - The network is composed as: $f = f_{r+1} \circ \sigma \circ \cdots \circ \sigma \circ f_2 \circ \sigma \circ f_1$ The key insight is that when we move along an interpolation path $\mu(\lambda) = x_1 + \lambda(x_2 - x_1)$, a neuron becomes critical when its pre-activation value crosses zero. This happens at a specific value of $\lambda$, and this affects the entire computation graph downstream from that neuron. Let me write a clear proof.
Let me prove this using the definitions from the paper.
Definitions:
- ReLU: $\sigma(x) = \max(x, 0)$
- Neuron $j$ in layer $i$: $\eta(x) = \sigma(A_j^{(i)} x + b_j^{(i)})$
- Network: $f = f_{r+1} \circ \sigma \circ \cdots \circ \sigma \circ f_2 \circ \sigma \circ f_1$
Proof:
Consider an interpolation path $\mu(\lambda) = x_1 + \lambda(x_2 - x_1)$ for $\lambda \in [0, 1]$.
For neuron $j$ in layer $i$, its pre-activation value along this path is $z_j^{(i)}(\lambda) = A_j^{(i)} x^{(i-1)}(\lambda) + b_j^{(i)}$, where $x^{(i-1)}(\lambda)$ denotes the input reaching layer $i$ when the network is evaluated at $\mu(\lambda)$.
This pre-activation value is a linear function of $\lambda$ (since the input to layer $i$ depends linearly on $\lambda$ as long as all neurons maintain their active/inactive states). The neuron becomes critical when $z_j^{(i)}(\lambda^*) = 0$ for some $\lambda^*$.
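As an aside, since the restriction to the path is affine inside a linear region, the critical value has a simple closed form. The shorthand $\alpha$, $\beta$ below is introduced here for illustration and is not notation from the paper:

$$z_j^{(i)}(\lambda) = \alpha \lambda + \beta \quad\Longrightarrow\quad z_j^{(i)}(\lambda^*) = 0 \iff \lambda^* = -\frac{\beta}{\alpha},$$

where $\alpha$ and $\beta$ can be read off from the pre-activation's slope and value at any point of that region.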
Now consider the network's output. Let's denote the $k$-th output coordinate as $f_k$. We can write $f_k(\mu(\lambda)) = A_k^{(r+1)} x^{(r)}(\lambda) + b_k^{(r+1)}$,
where $A_k^{(r+1)}$ and $b_k^{(r+1)}$ are the weights in the final layer and $x^{(r)}(\lambda)$ is the output of the last hidden layer at $\mu(\lambda)$.
The key observation: when neuron $j$ in layer $i$ toggles at $\lambda^*$:
- For $\lambda < \lambda^*$: $\sigma(z_j^{(i)}(\lambda)) = 0$ (if going from inactive to active)
- For $\lambda > \lambda^*$: $\sigma(z_j^{(i)}(\lambda)) = z_j^{(i)}(\lambda) > 0$
This change propagates through all subsequent layers. Every path from this neuron to any output coordinate will experience a change in its contribution at exactly $\lambda^*$.
Specifically, the derivative $\frac{d}{d\lambda} f_k(\mu(\lambda))$ will have a discontinuity at $\lambda^*$ for every output coordinate $f_k$ that has any dependency on neuron $j$ in layer $i$. Since the network is fully connected, every neuron in layer $i$ influences every output coordinate (unless some weights happen to be exactly zero).
Therefore, when neuron $j$ toggles at $\lambda^*$:
- The function $f_k(\mu(\lambda))$ experiences a corner (slope change) at $\lambda^*$ for every output coordinate $k$
- The location $\lambda^*$ is the same for all output coordinates
- Only the magnitude of the slope change differs between outputs
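As a concrete illustration (a special case added here, not part of the original argument): for a network with a single hidden layer and a first-layer neuron $j$, the pre-activation along the path is affine with slope $\alpha = A_j^{(1)}(x_2 - x_1)$, and a short calculation gives the slope change of each output at the corner as

$$\left.\frac{d}{d\lambda} f_k(\mu(\lambda))\right|_{\lambda^{*+}} - \left.\frac{d}{d\lambda} f_k(\mu(\lambda))\right|_{\lambda^{*-}} = A_{k,j}^{(2)}\,\lvert\alpha\rvert .$$

The location $\lambda^*$ and the factor $\lvert\alpha\rvert$ are shared by every output coordinate, while $A_{k,j}^{(2)}$ supplies the per-output magnitude.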
This is why the paper can find critical points by analyzing any output coordinate - the $\lambda$ values where corners occur are identical across all outputs.
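To see this numerically, here is a small sketch with an arbitrary random one-hidden-layer network (all sizes, seeds, and tolerances are illustrative assumptions): solve for the $\lambda^*$ at which one hidden neuron's pre-activation crosses zero, then confirm that every output coordinate's slope changes at that same $\lambda^*$.

```python
import numpy as np

rng = np.random.default_rng(1)

# One hidden ReLU layer: 5-d input, 8 hidden neurons, 4 outputs (arbitrary sizes).
A1, b1 = rng.normal(size=(8, 5)), rng.normal(size=8)
A2, b2 = rng.normal(size=(4, 8)), rng.normal(size=4)

f = lambda x: A2 @ np.maximum(A1 @ x + b1, 0.0) + b2

x1, x2 = rng.normal(size=5), rng.normal(size=5)
mu = lambda lam: x1 + lam * (x2 - x1)

# For a first-layer neuron j, the pre-activation along the path is exactly affine:
# z_j(lambda) = alpha * lambda + beta, so its zero crossing has a closed form.
j = 0
alpha = A1[j] @ (x2 - x1)
beta = A1[j] @ x1 + b1[j]
lam_star = -beta / alpha            # may fall outside [0, 1]; we just follow the line
print(f"neuron {j} toggles at lambda* = {lam_star:.6f}")

# Slope of every output coordinate just before and just after lambda*
# (assuming no other neuron happens to toggle inside this tiny window).
eps = 1e-5
slope_before = (f(mu(lam_star - eps)) - f(mu(lam_star - 2 * eps))) / eps
slope_after = (f(mu(lam_star + 2 * eps)) - f(mu(lam_star + eps))) / eps
print("slope change per output coordinate:", slope_after - slope_before)
# Generically every entry is nonzero: the corner sits at the same lambda* for all
# outputs, and only the size of the slope change differs from output to output.
```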