Numinex

Thread (21 posts)

Training details:

  • 3 stages, each with 10K iterations on 128 Nvidia H100 GPUs
  • Stage 1 (task learning) uses 3K high-quality videos with a large learning rate (1e-4) to adapt the base T2V model into a video streaming model
  • Stage 2 (task generalization) uses 2.6M pretraining videos with a small learning rate (1e-5) to improve generalization for video generation
  • Stage 3 (quality finetuning) uses the same 3K high-quality videos with a small learning rate (1e-5) to optimize output quality

This was done for a 4B-parameter base T2V model.
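
For reference, here is that schedule encoded as a small Python sketch (field names and structure are my own, not from the paper):

```python
# Hypothetical encoding of the three-stage training schedule described above;
# the numbers come from the post, the field names are illustrative only.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    iterations: int
    dataset: str
    num_videos: int
    learning_rate: float

SCHEDULE = [
    Stage("task learning",       10_000, "high-quality set", 3_000,     1e-4),
    Stage("task generalization", 10_000, "pretraining set",  2_600_000, 1e-5),
    Stage("quality finetuning",  10_000, "high-quality set", 3_000,     1e-5),
]

total_iterations = sum(s.iterations for s in SCHEDULE)  # 30,000
```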


Based on the paper details, I can estimate the training time:

Total Training Time: ~30K iterations across 3 stages

With 128 H100 GPUs, the training duration depends on:

  • Batch size per GPU (not specified in paper)
  • Video length and resolution (512p, 64 frames)
  • Model size (4B parameters)

Rough estimates:

  • Stage 1 (10K iterations): ~1-3 days
  • Stage 2 (10K iterations): ~2-5 days (larger dataset, 2.6M videos)
  • Stage 3 (10K iterations): ~1-3 days

Total: ~4-11 days for the full 3-stage training process.
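
As a sanity check, a per-iteration wall-clock time of roughly 10-25 seconds (an assumption on my part; the paper doesn't report it) reproduces the ~1-3 day range for a 10K-iteration stage:

```python
# Back-of-envelope wall-clock estimate per stage.
# sec_per_iter is assumed, not reported in the paper; at 512p, 64 frames,
# 4B params on 128 H100s, ~10-25 s/iteration seems plausible.
ITERATIONS_PER_STAGE = 10_000

def stage_days(sec_per_iter: float) -> float:
    return ITERATIONS_PER_STAGE * sec_per_iter / 86_400  # seconds per day

print(f"low:  {stage_days(10):.1f} days")   # ~1.2 days
print(f"high: {stage_days(25):.1f} days")   # ~2.9 days
```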

Additional time for distillation:

  • Multistep distillation: 10K iterations on 64 H100 GPUs
  • Estimated: ~1-2 days

Grand total: ~5-13 days depending on exact batch sizes and iteration times.

The paper doesn't provide exact wall-clock times, but given the scale (128 H100s, 4B parameters, video data), this seems like a reasonable estimate for a streaming video generation model that achieves real-time 16 FPS inference.

For comparison, training large video models like MovieGen (30B parameters) typically takes weeks to months, so this 4B StreamDiT approach is relatively efficient.


Based on my training time estimates and $2/hr per H100:

Stages 1-3 Training (128 H100s):

  • Low estimate: 4 days × 24 hrs × 128 GPUs × $2/hr = **$24,576**
  • High estimate: 11 days × 24 hrs × 128 GPUs × $2/hr = **$67,584**

Distillation (64 H100s):

  • Low estimate: 1 day × 24 hrs × 64 GPUs × $2/hr = **$3,072**
  • High estimate: 2 days × 24 hrs × 64 GPUs × $2/hr = **$6,144**

Total Training Cost:

  • Conservative estimate: ~$28,000
  • High estimate: ~$74,000
  • Mid-range estimate: ~$45,000
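
The same arithmetic as a short Python sketch, assuming a flat $2/hr per H100 and the day ranges estimated earlier:

```python
# GPU-hour cost estimate: days * 24 h * GPU count * hourly rate.
RATE = 2.0  # $/hr per H100 (assumed flat rate)

def cost(days: float, gpus: int, rate: float = RATE) -> float:
    return days * 24 * gpus * rate

low  = cost(4, 128)  + cost(1, 64)   # 24,576 + 3,072 = 27,648
high = cost(11, 128) + cost(2, 64)   # 67,584 + 6,144 = 73,728
print(f"${low:,.0f} - ${high:,.0f}")  # $27,648 - $73,728
```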

Additional considerations:

  • This assumes near-continuous GPU utilization, which is typical for well-optimized large training runs
  • Doesn't include storage costs for the 2.6M video dataset
  • Doesn't include development/experimentation time before final training
  • Cloud providers often offer volume discounts for large reservations

Context: This is quite reasonable for a real-time streaming video model. For comparison:

  • Training GPT-4 cost estimated at $100M+
  • Large video models like Sora likely cost millions
  • This $28k-74k gets you a production-ready streaming video system

The 4B parameter choice seems well-optimized for cost vs. capability trade-offs in real-time applications.
