Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
Thread (53 posts)
§5.1: can you pull up the definition of fraction of variance unexplained from the cited paper?
Thinking Summary
The user is asking about the definition of "fraction of variance unexplained" from the cited paper. Looking at the embed they shared, this is from the paper "Scaling and evaluating sparse autoencoders" by Gao et al. 2024. Let me search through the content for this definition.

Looking at the paper content... Found it! In the "Setup" section under Methods (Section 2.1), they write: "Evaluation: After training, we evaluate autoencoders on sparsity L0, and reconstruction mean-squared error (MSE). We report a normalized version of all MSE numbers, where we divide by a baseline reconstruction error of always predicting the mean activations."

So the fraction of variance unexplained (FVU) is the mean reconstruction loss (MSE) normalized by the reconstruction loss obtained by always predicting the dataset mean. This is a standard way to normalize reconstruction error: it's the fraction of variance that remains unexplained by the model. The user is referring to this in the context of Section 5.1 of the JumpReLU paper, where FVU is mentioned as a secondary metric for measuring reconstruction fidelity.
Looking at the Gao et al. paper, the definition of "fraction of variance unexplained" (FVU) can be found in Section 2.1 under "Evaluation":
Evaluation: After training, we evaluate autoencoders on sparsity L0, and reconstruction mean-squared error (MSE). We report a normalized version of all MSE numbers, where we divide by a baseline reconstruction error of always predicting the mean activations.
So FVU is calculated as:

$$\text{FVU} = \frac{\mathbb{E}\left[\|\mathbf{x} - \hat{\mathbf{x}}\|^2\right]}{\mathbb{E}\left[\|\mathbf{x} - \mathbb{E}[\mathbf{x}]\|^2\right]}$$
This is also referred to as the "normalized loss" in the paper. It's a standard normalization that measures what fraction of the variance in the data remains unexplained by the model. A perfect reconstruction would have FVU = 0, while a model that does no better than predicting the mean would have FVU = 1.
The JumpReLU paper uses this as a secondary metric alongside their primary metric of delta LM loss (the increase in cross-entropy loss when the SAE is spliced into the LM's forward pass).
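For concreteness, here is a minimal numpy sketch of FVU under this definition. The function name and array shapes are my own assumptions; neither paper publishes reference code for this metric:

```python
import numpy as np

def fvu(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Fraction of variance unexplained.

    x:     original activations, shape [n_tokens, d_model]
    x_hat: SAE reconstructions,  shape [n_tokens, d_model]
    """
    # Reconstruction MSE: mean over tokens of the squared L2 error.
    mse = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    # Baseline MSE: always predict the dataset mean activation vector.
    baseline = np.mean(np.sum((x - x.mean(axis=0)) ** 2, axis=-1))
    return mse / baseline
```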
I missed that part while reading the paper. Can you backfill a bit of context on the setup, like what is the data, what mean is that MSE over, etc.?
Thinking Summary
The user wants me to provide more context about the setup for FVU (fraction of variance unexplained). Let me look at both papers to understand:
1. What data is being used
2. What the MSE is calculated over
3. What "mean activations" refers to

From the JumpReLU paper Section 5:
- They trained SAEs on Gemma 2 9B activations
- Specifically on residual stream, attention output, and MLP output sites after layers 9, 20, and 31
- All metrics computed on 2,048 sequences of length 1,024, excluding special tokens

From the Gao et al. paper Section 2.1:
- They train on residual streams of GPT-2 small and GPT-4 models
- They use a layer 5/6 of the way into the network for GPT-4 series models, and layer 8 (3/4 of the way) for GPT-2 small
- Context length of 64 tokens
- They subtract the mean over the d_model dimension and normalize to unit norm before passing to the autoencoder

So the setup involves:
1. Data: language model activations (residual stream, MLP outputs, or attention outputs) from specific layers
2. The MSE is calculated over tokens/positions in sequences
3. The "mean activations" baseline is the mean activation vector across the dataset
Sure! Let me provide more context on the setup for FVU (fraction of variance unexplained):
Data Setup
What data is being evaluated:
- The JumpReLU paper evaluates SAEs trained on Gemma 2 9B activations at specific sites:
- Residual stream activations
- MLP output activations
- Attention output activations (before the output projection)
- These are taken from layers 9, 20, and 31 of the model
Preprocessing (from Gao et al.):
- Activations are preprocessed by:
  - Subtracting the mean over the d_model dimension
  - Normalizing to unit norm
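A minimal sketch of that preprocessing, assuming activations arrive as an [n_tokens, d_model] array (the function name and shapes are assumptions, not from the paper):

```python
import numpy as np

def preprocess(x: np.ndarray) -> np.ndarray:
    """Gao et al.-style preprocessing of raw activations."""
    # Subtract each vector's mean over the d_model dimension.
    x = x - x.mean(axis=-1, keepdims=True)
    # Rescale each vector to unit L2 norm.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)
```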
Evaluation dataset:
- All metrics are computed on 2,048 sequences of length 1,024 tokens
- Special tokens (pad, start, end of sequence) are excluded
What the MSE is calculated over
The MSE measures reconstruction error:

$$\text{MSE} = \mathbb{E}\left[\|\mathbf{x} - \hat{\mathbf{x}}\|^2\right]$$

Where:
- $\mathbf{x}$ is the original activation vector (e.g., dimension 3584 for Gemma 2 9B)
- $\hat{\mathbf{x}}$ is the SAE reconstruction
- The expectation is over all token positions in the evaluation dataset
The baseline for normalization
The "mean activations" baseline is:
- Simply predicting the dataset mean activation vector for every input
- This gives the variance of the data as the MSE
- So FVU = (SAE's MSE) / (Variance of data)
This normalization makes the metric interpretable: FVU = 0.1 means the SAE fails to explain 10% of the variance in the activations.
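To see why the baseline MSE is exactly the total variance (the sum of per-dimension variances), here's a quick numpy check on toy data standing in for activations:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(10_000, 8))  # toy stand-in for activation vectors

# MSE of always predicting the dataset mean vector...
baseline_mse = np.mean(np.sum((x - x.mean(axis=0)) ** 2, axis=-1))
# ...equals the sum of per-dimension variances of the data.
total_var = x.var(axis=0).sum()

print(baseline_mse, total_var)  # agree up to floating-point error
```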
This is either naive or pedantic or both, but in what sense is this "variance"? I thought variance was defined as $\mathbb{E}\left[\|\mathbf{x} - \mathbb{E}[\mathbf{x}]\|^2\right]$, i.e. the expectation of squared deviation from the mean. But here we are using reconstruction error.
Thinking Summary
The user is making a good point. They're asking why this is called "fraction of variance unexplained" when the denominator is the MSE of predicting the mean, which is indeed the variance, but the numerator is the MSE of the SAE reconstruction, not necessarily related to variance in the traditional sense.

Let me think about this carefully:
1. The denominator: MSE of always predicting the mean = $\mathbb{E}[\|\mathbf{x} - \mathbb{E}[\mathbf{x}]\|^2]$ = variance of the data
2. The numerator: MSE of SAE reconstruction = $\mathbb{E}[\|\mathbf{x} - \hat{\mathbf{x}}\|^2]$ where $\hat{\mathbf{x}}$ is the SAE reconstruction

The name "fraction of variance unexplained" comes from the fact that:
- If you could perfectly reconstruct, MSE = 0, so FVU = 0 (all variance explained)
- If your reconstruction is no better than predicting the mean, MSE = variance, so FVU = 1 (no variance explained)
- In between, FVU tells you what fraction of the total variance remains as error

This is actually a standard concept in statistics, related to R² (coefficient of determination). In fact, FVU = 1 - R². The user is right that it's not literally "variance" in the numerator, but it's the residual variance after the model's predictions.
You're absolutely right to point this out! This is actually a really good question about the terminology.
You're correct that variance is $\mathbb{E}\left[\|\mathbf{x} - \mathbb{E}[\mathbf{x}]\|^2\right]$. The name "fraction of variance unexplained" comes from statistical tradition, particularly from linear regression, where this concept originates.
Here's why it makes sense:
The denominator is indeed the variance:

$$\mathbb{E}\left[\|\mathbf{x} - \mathbb{E}[\mathbf{x}]\|^2\right] = \operatorname{Var}(\mathbf{x})$$
The numerator is the MSE of reconstruction:

$$\mathbb{E}\left[\|\mathbf{x} - \hat{\mathbf{x}}\|^2\right]$$
The interpretation is:
- The total variance in the data represents all the "variation" we'd like to explain
- The reconstruction error represents the "residual variance": the variation that remains unexplained after the model has made its predictions
- So FVU measures what fraction of the original variance is still left as error
This is exactly analogous to $R^2$ (the coefficient of determination) in statistics:

$$\text{FVU} = 1 - R^2$$
So while the numerator isn't literally "variance" in the formal sense, it represents the variance that remains unexplained. The terminology comes from thinking of the total variance as something to be "explained away" by the model, with perfect explanation leaving zero residual error.
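As a quick check of the $\text{FVU} = 1 - R^2$ correspondence, here's a sketch using scikit-learn's r2_score. The variance-weighted aggregation averages per-dimension $R^2$ values weighted by each dimension's variance, which algebraically reduces to one minus the ratio of total squared error to total variance, i.e. $1 - \text{FVU}$:

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
x = rng.normal(size=(5_000, 8))             # toy "activations"
x_hat = x + 0.3 * rng.normal(size=x.shape)  # imperfect "reconstruction"

fvu = (np.mean(np.sum((x - x_hat) ** 2, axis=-1))
       / np.mean(np.sum((x - x.mean(axis=0)) ** 2, axis=-1)))
# Variance-weighted R^2 aggregates per-dimension R^2 by each
# dimension's variance, so 1 - r2 matches the FVU ratio above.
r2 = r2_score(x, x_hat, multioutput="variance_weighted")

print(fvu, 1 - r2)  # equal up to floating-point error
```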