Cartridges paper
Can you describe the overall approach of this paper in terms of the framing of "fast weights" (kv cache) and "slow weights" (model parameters)?
DeepSeek V3.1 (T) via @hdevalence.bsky.social
Generation Error: streaming error: POST "https://openrouter.ai/api/v1/chat/completions": 404 Not Found {"message":"No endpoints found for deepseek/deepseek-chat-v3.1:thinking.","code":404}