You're touching on some important challenges in model distillation/imitation that make this much harder than it initially appears:
The API Access Problem
Rate limits and costs: Even with API access, generating enough high-quality training data is expensive and slow. You'd need millions of diverse prompts and responses to capture Claude's behavior across different contexts, domains, and conversation styles.
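To make the scale concrete, here is a back-of-envelope sketch of collection time and token cost. Every number in it (sample count, requests per minute, tokens per sample, price per million tokens) is an illustrative assumption, not a real rate limit or price sheet:

```python
# Back-of-envelope estimate of wall-clock time and token cost to collect
# a distillation dataset via an API. All constants are hypothetical.

def collection_estimate(num_samples, requests_per_minute,
                        tokens_per_sample, cost_per_million_tokens):
    """Return (days_needed, total_cost_usd) for a single-key collection run."""
    minutes = num_samples / requests_per_minute
    days = minutes / (60 * 24)
    cost = num_samples * tokens_per_sample / 1_000_000 * cost_per_million_tokens
    return days, cost

# Hypothetical: 2M samples, 60 req/min sustained, ~1500 tokens per
# prompt+response pair, $10 per million tokens.
days, cost = collection_estimate(2_000_000, 60, 1_500, 10.0)
print(f"~{days:.1f} days of collection, ~${cost:,.0f} in token costs")
```

Even with generous (made-up) numbers, a single API key runs for weeks and the token bill reaches tens of thousands of dollars before any training happens.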
Distribution mismatch: The prompts you can think of won't cover Claude's full capability space. You'll miss edge cases, specialized domains, and subtle behavioral patterns that only emerge with massive diverse usage.
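One common mitigation is to enumerate the prompt space explicitly and sample from every cell, so no region is silently skipped. The taxonomy below is a toy illustration (real coverage needs many more axes: languages, formats, conversation depth, adversarial inputs, and so on):

```python
import itertools
import random

# Hypothetical taxonomy axes; a real effort would need far more of them.
DOMAINS = ["coding", "creative_writing", "math", "legal", "medical", "chitchat"]
STYLES = ["concise", "step_by_step", "socratic"]
DIFFICULTY = ["easy", "hard"]

def stratified_prompt_slots(n_per_cell=3, seed=0):
    """Enumerate every (domain, style, difficulty) cell and allocate
    n_per_cell prompt slots to each, then shuffle the collection order."""
    rng = random.Random(seed)
    slots = [cell
             for cell in itertools.product(DOMAINS, STYLES, DIFFICULTY)
             for _ in range(n_per_cell)]
    rng.shuffle(slots)
    return slots

slots = stratified_prompt_slots()
print(len(slots))  # 6 domains * 3 styles * 2 difficulties * 3 each = 108
```

Stratifying this way guarantees breadth per cell, but it only covers the cells you thought to define, which is exactly the blind spot the paragraph above describes.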
Quality vs Quantity Tradeoffs
Surface-level mimicry: It's relatively easy to train a model to "sound like" Claude by matching tone, formatting, and politeness patterns; it's much harder to replicate the underlying reasoning, knowledge integration, and nuanced judgment.
Capability gaps: If your base model (say, Llama 70B) has fundamental knowledge or reasoning gaps compared to Claude 3, fine-tuning won't bridge those gaps - it can only work with what's already there.
Technical Challenges
Training data quality: You'd need to carefully curate prompts that elicit Claude's best responses across many domains. Random conversations won't capture the full range of capabilities.
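Even basic curation takes deliberate filtering before any careful domain balancing or human review. A minimal sketch of the first pass, with made-up thresholds, might look like:

```python
def curate(pairs, min_response_chars=200):
    """First-pass filter over (prompt, response) pairs: drop prompts that
    duplicate an earlier one (after whitespace/case normalization) and
    responses too short to carry much teaching signal. Real pipelines
    also need semantic dedup, domain balancing, and human spot-checks."""
    seen = set()
    kept = []
    for prompt, response in pairs:
        key = " ".join(prompt.lower().split())  # normalize case and spacing
        if key in seen or len(response) < min_response_chars:
            continue
        seen.add(key)
        kept.append((prompt, response))
    return kept

pairs = [
    ("What is backpropagation?", "Backpropagation is..." + "x" * 300),
    ("what  is backpropagation?", "Duplicate prompt..." + "y" * 300),
    ("Explain monads", "Too short."),
]
print(len(curate(pairs)))  # only the first pair survives both filters
```

The point is that every filter here encodes a judgment call (what counts as a duplicate? how short is too short?), and those calls shape what the student model learns.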
Optimization difficulties: Fine-tuning often leads to capability degradation in some areas while improving others. Maintaining Claude's balanced performance across all tasks is non-trivial.
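Detecting this kind of regression means running the same eval suite before and after each fine-tuning run and diffing per-category scores. A sketch with invented scores (all category names and numbers are hypothetical):

```python
def capability_regressions(base_scores, tuned_scores, tolerance=0.02):
    """Compare per-category eval scores (0-1 scale) before and after
    fine-tuning; return categories whose score dropped by more than
    `tolerance`, mapped to the size of the drop."""
    return {cat: base_scores[cat] - tuned_scores[cat]
            for cat in base_scores
            if base_scores[cat] - tuned_scores[cat] > tolerance}

# Hypothetical eval results: style imitation improved, but reasoning
# and safety behavior quietly degraded.
base  = {"style": 0.55, "reasoning": 0.71, "coding": 0.68, "safety": 0.90}
tuned = {"style": 0.83, "reasoning": 0.66, "coding": 0.69, "safety": 0.81}
print(sorted(capability_regressions(base, tuned)))
```

The iteration loop this implies (train, eval everything, diagnose regressions, adjust the data mix, repeat) is a large part of why the effort estimate below is so far off.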
Alignment preservation: Claude's safety behaviors and refusal patterns are complex and context-dependent. Replicating these without the original training process is quite difficult.
The "few weeks of work" estimate severely underestimates the data collection, computational resources, and iterative refinement needed for high-fidelity model distillation.