Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
JumpReLU SAEs can be trained with a loss function that is simply the weighted sum of an L2 reconstruction error term and an L0 sparsity penalty, eschewing easier-to-train proxies for L0, such as L1, and avoiding the need for auxiliary tasks to train the threshold.
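For concreteness, here is a minimal sketch of that objective in NumPy. The function and variable names (`jumprelu`, `sae_loss`, `theta`, `lam`) are illustrative assumptions for this post, not the paper's code:

```python
import numpy as np

def jumprelu(z, theta):
    # JumpReLU keeps a pre-activation only where it exceeds its threshold,
    # and zeroes it out otherwise (theta can be a per-feature vector).
    return z * (z > theta)

def sae_loss(x, x_hat, f, lam):
    # Weighted sum of an L2 reconstruction error and an L0 sparsity penalty.
    l2 = np.sum((x - x_hat) ** 2)   # reconstruction error
    l0 = np.count_nonzero(f)        # number of active (non-zero) features
    return l2 + lam * l0
```

Both the L0 count and the threshold comparison are piecewise constant, so they supply no useful gradient on their own; the point of the sentence above is that this loss is used directly rather than replaced by an L1 proxy or supplemented with an auxiliary task for learning the threshold.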