blog post reading the tea leaves behind google's pricing
some thoughts:
- the claim that "OpenAI can afford to take negative margins while playing catch up, whereas Google is a public company that cannot (and does not) play the same compute subsidization games" seems backwards: if anything, Google's huge resources make it better placed than OpenAI to subsidize compute
- doesn't mention that Google models have much longer context windows than competitors' (i've heard this is because they're better algorithmically rather than from having more or custom hardware, but no way to know), which seems relevant to pricing
- the point about pricing not matching actual resource costs (billing is linear in input tokens while prefill compute is quadratic) makes sense, and it's something i've wondered about while looking at model pricing (toy sketch after these notes)
- the comments in the throughput analysis all make sense, except that i'm surprised by "when you send a prompt, the model can process all input tokens in parallel". i'm not sure how that can be true while prefilling also has quadratic cost: i thought the quadratic cost came from token<>token interactions, and that parallelizability came from tokens not interacting with each other. this would probably make more sense after implementing a transformer (minimal sketch at the end of these notes)
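
on the linear-vs-quadratic point, a toy back-of-envelope sketch. all numbers here are hypothetical (the per-token price and model width are made up, not google's actual figures), and the FLOP count only covers the QK^T attention scores, ignoring constants and the rest of the network:

```python
# toy sketch: per-token billing is linear in prompt length n,
# while attention-score compute scales ~n^2 * d_model
PRICE_PER_INPUT_TOKEN = 1e-6  # hypothetical: $1 per million input tokens
D_MODEL = 4096                # hypothetical model width

for n in [1_000, 10_000, 100_000, 1_000_000]:
    revenue = n * PRICE_PER_INPUT_TOKEN  # what the provider bills: linear in n
    attn_flops = n * n * D_MODEL         # QK^T scores alone: quadratic in n
    print(f"n={n:>9,}  revenue=${revenue:>8.2f}  "
          f"attn FLOPs={attn_flops:.2e}  FLOPs/token={attn_flops / n:.2e}")
```

so the provider's compute cost per billed input token grows with prompt length even though the price per token stays flat, which is the mismatch the post points at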
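
on the parallel-prefill question, i believe both statements hold at once: the quadratic cost comes from computing all n^2 token<>token attention scores, but during prefill every input token is already known, so those scores amount to one big matmul whose entries don't depend on each other and parallelize across positions. the sequential bottleneck only appears at decode time, when each new token depends on the previous output. a minimal single-head numpy sketch (toy sizes, not any production implementation):

```python
import numpy as np

n, d = 8, 16                   # prompt length, head dimension (toy sizes)
rng = np.random.default_rng(0)
Q = rng.normal(size=(n, d))    # queries for all n tokens at once
K = rng.normal(size=(n, d))    # keys
V = rng.normal(size=(n, d))    # values

scores = Q @ K.T / np.sqrt(d)  # (n, n): every token<>token interaction (quadratic)
mask = np.triu(np.ones((n, n), dtype=bool), k=1)
scores[mask] = -np.inf         # causal mask: each token sees only earlier tokens
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
out = weights @ V              # (n, d): all positions computed in one pass

print(out.shape)               # (8, 16): no token-by-token loop anywhere
```

the whole prompt goes through in one shot; the n x n score matrix is exactly where the quadratic cost lives, but none of its entries depend on each other, so they can all be computed in parallel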