I feel like a lot of these aren't real problems. However, I don't actually have anything to back this up.
My intuition is something like "you don't need tons of high-quality training data, just enough to extrapolate from and create synthetic data; also, each piece of training data surely carries a fair number of bits of information".
That said, I may be (1) overestimating how well synthetic data works, and (2) overestimating how much 'structural' transfer (i.e. transfer of the actual underlying reasoning) is carried by text.
Will come back to this - I think pushing back on this now would confuse Claude/activate their desire to agree with me.