Posts quoting @hdevalence.bsky.social

§F.2 Impact of token position

We find that tokens at later positions are harder to reconstruct (Figure 29). We hypothesize that this is because the residual stream at later positions has more features.

Seems related to earlier notes

None of this is actually working with the attention mechanism except indirectly, so it seems hard to understand how we hope for it to work for the interesting long-context behavior.
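
For anyone wanting to check the position effect from the §F.2 quote on their own activations, here is a rough sketch of a per-position reconstruction error. Everything in it is an assumption for illustration: the function name `per_position_nmse`, the `(batch, seq_len, d_model)` array layout, and the choice to normalize by the error of predicting the per-position mean activation. It is not the paper's exact metric or code.

```python
import numpy as np

def per_position_nmse(acts, recon):
    """Normalized reconstruction error at each token position.

    acts, recon: arrays of shape (batch, seq_len, d_model), where `recon`
    is the autoencoder's reconstruction of `acts` (hypothetical layout).
    Returns an array of shape (seq_len,).
    """
    # Squared error summed over the model dimension, averaged over the batch.
    err = ((acts - recon) ** 2).sum(axis=-1).mean(axis=0)
    # Baseline (assumed here): error of always predicting the mean
    # activation at that position.
    centered = acts - acts.mean(axis=0, keepdims=True)
    baseline = (centered ** 2).sum(axis=-1).mean(axis=0)
    return err / baseline
```

A value near 0 at a position means near-perfect reconstruction there, and a value near 1 means no better than predicting the mean. If later positions really are harder, the returned curve should rise with position index.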
