Scaling and evaluating sparse autoencoders
JumpReLU mentioned
Another approach is to replace the ReLU activation function with a ProLU (Taggart, 2024) (also known as TRec (Konda et al., 2014), or JumpReLU (Erichson et al., 2019)), which sets all values below a positive threshold to zero … Because the parameter θ is non-differentiable, it requires an approximate gradient such as a ReLU equivalent (ProLU-ReLU) or a straight-through estimator (ProLU-STE) (Taggart, 2024).
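For concreteness, here is a minimal PyTorch sketch of that activation: the forward pass is x · 1[x > θ], and the hard gate is detached so autograd sees a simple straight-through-style surrogate. The function name, the example θ value, and the surrogate choice are illustrative assumptions on my part, not the exact ProLU-ReLU/ProLU-STE gradient rules from Taggart (2024).

```python
import torch

def jumprelu(x: torch.Tensor, theta: float) -> torch.Tensor:
    """Sketch of JumpReLU/ProLU/TRec: x * 1[x > theta].

    The indicator is non-differentiable, so it is detached here; gradients
    then flow through x wherever the unit is active, and theta gets no
    gradient in this simplified surrogate (ProLU-ReLU and ProLU-STE define
    different approximations for that term).
    """
    gate = (x > theta).to(x.dtype)   # hard gate: zero everything below theta
    return x * gate.detach()         # detach: treat the gate as a constant in backward

# Hypothetical usage: pre-activations from an SAE encoder
pre_acts = torch.randn(4, 8, requires_grad=True)
acts = jumprelu(pre_acts, theta=0.5)
acts.sum().backward()                # grad is 1 where pre_acts > 0.5, else 0
```

The detach trick is just one way to get a usable backward pass; the point of the ProLU variants is precisely how they approximate the gradient contribution of the threshold θ itself.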