Stealing Part of a Production Language Model
Interesting observation: GPT-2 Small has a "true" hidden dimension of 757 rather than the nominal 768. Reference: "Spectral Filters, Dark Signals, and Attention Sinks".
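For anyone wondering how a number like this could be measured, here is a rough sketch, assuming the approach is to stack many logit vectors into a matrix and look for the point where its singular values collapse. Everything below is illustrative: `estimate_hidden_dim` is a hypothetical helper, the data is synthetic (a random low-rank matrix plus small noise), and the sizes are toy values, so it does not reproduce the 757 figure.

```python
import numpy as np


def estimate_hidden_dim(logits: np.ndarray) -> int:
    """Estimate the effective rank of a logit matrix.

    Assumption: the rank shows up as the largest multiplicative drop
    between consecutive singular values.
    """
    # Singular values are returned in descending order.
    s = np.linalg.svd(logits, compute_uv=False)
    # Guard against log(0) for exactly rank-deficient inputs.
    s = np.maximum(s, np.finfo(float).tiny)
    log_gaps = np.log(s[:-1]) - np.log(s[1:])
    # The drop after the k-th singular value sits at index k - 1.
    return int(np.argmax(log_gaps)) + 1


# Synthetic stand-in for a model: logits = W @ H, where W is a
# (vocab x hidden) output projection and H holds one final hidden
# state per query. All sizes here are toy values.
rng = np.random.default_rng(0)
vocab, hidden, n_queries = 512, 48, 128
W = rng.normal(size=(vocab, hidden))
H = rng.normal(size=(hidden, n_queries))
noise = 1e-6 * rng.normal(size=(vocab, n_queries))
logits = W @ H + noise

estimated = estimate_hidden_dim(logits)
print(estimated)
```

The number of queries has to exceed the hidden size, otherwise the matrix cannot reveal the full rank. On a real model the drop is usually less clean than in this toy setup, which may be part of why the effective dimension can come out below the nominal one.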