@yondonfu.bsky.social
Towards Video World Models
Exploring the path from video generation to true video world models through causality, interactivity, persistence, real-time responsiveness, and physical accuracy.
https://www.xunhuang.me/blogs/world_model.html

Flow Matching for Generative Modeling
We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples -- which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more robust and stable alternative for training diffusion models. Furthermore, Flow Matching opens the door to training CNFs with other, non-diffusion probability paths. An instance of particular interest is using Optimal Transport (OT) displacement interpolation to define the conditional probability paths. These paths are more efficient than diffusion paths, provide faster training and sampling, and result in better generalization.
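
For a concrete picture of the objective, here is a minimal PyTorch sketch of the conditional Flow Matching loss with OT displacement paths; the `model(x_t, t)` interface and the `sigma_min` value are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal conditional Flow Matching sketch with OT displacement paths.
# Assumes `model(x_t, t)` predicts a vector field with the same shape as x_t.
import torch

def flow_matching_loss(model, x1, sigma_min=1e-4):
    """Regress the model onto the target vector field of the OT path
    between a noise sample x0 and a data sample x1."""
    b = x1.shape[0]
    t = torch.rand(b, device=x1.device)              # t ~ U[0, 1]
    x0 = torch.randn_like(x1)                        # noise endpoint
    t_ = t.view(b, *([1] * (x1.dim() - 1)))          # broadcast t over data dims
    # OT displacement interpolation between x0 (t=0) and x1 (t=1)
    x_t = (1 - (1 - sigma_min) * t_) * x0 + t_ * x1
    # Target conditional vector field u_t(x | x1)
    u_t = x1 - (1 - sigma_min) * x0
    v = model(x_t, t)                                # predicted vector field
    return ((v - u_t) ** 2).mean()
```

Sampling then amounts to integrating dx/dt = v(x, t) from t = 0 (noise) to t = 1 (data) with an off-the-shelf ODE solver.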
https://arxiv.org/pdf/2210.02747

StreamDiT: Real-Time Streaming Text-to-Video Generation
Recently, great progress has been achieved in text-to-video (T2V) generation by scaling transformer-based diffusion models to billions of parameters, which can generate high-quality videos. However, existing models typically produce only short clips offline, restricting their use cases in interactive and real-time applications. This paper addresses these challenges by proposing StreamDiT, a streaming video generation model. StreamDiT training is based on flow matching by adding a moving buffer. We design mixed training with different partitioning schemes of buffered frames to boost both content consistency and visual quality. StreamDiT modeling is based on adaLN DiT with varying time embedding and window attention. To practice the proposed method, we train a StreamDiT model with 4B parameters. In addition, we propose a multistep distillation method tailored for StreamDiT. Sampling distillation is performed in each segment of a chosen partitioning scheme. After distillation, the total number of function evaluations (NFEs) is reduced, enabling real-time streaming generation on a single GPU.
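
A rough conceptual sketch of the moving-buffer idea: buffered frames sit at staggered noise levels, each model call advances all of them, and the fully denoised head frame is emitted while fresh noise enters at the tail. The `denoiser` interface, buffer length, and uniform per-frame schedule below are assumptions for illustration, not StreamDiT's actual partitioning schemes or distilled sampler.

```python
# Conceptual moving-buffer streaming loop (not the paper's implementation).
# `denoiser(frames, t_levels, text_emb)` is an assumed interface returning a
# velocity field; t follows the FM convention above (t=0 noise, t=1 data).
import torch

@torch.no_grad()
def stream_frames(denoiser, text_emb, num_out_frames, buffer_len=8, shape=(16, 64, 64)):
    dt = 1.0 / buffer_len
    # Staggered per-frame times: head is nearly clean, tail is nearly pure noise.
    t = torch.linspace(1.0 - dt, 0.0, buffer_len)
    buffer = torch.randn(buffer_len, *shape)
    outputs = []
    while len(outputs) < num_out_frames:
        # One call denoises all buffered frames jointly (window attention over
        # the buffer in the paper); every frame takes one Euler step of size dt.
        v = denoiser(buffer.unsqueeze(0), t.unsqueeze(0), text_emb).squeeze(0)
        buffer = buffer + dt * v
        t = t + dt
        # Head frame has reached t = 1: emit it, shift the buffer, and append
        # a fresh noise frame at the tail. The first few outputs are warm-up
        # frames, since the buffer starts from pure noise.
        outputs.append(buffer[0])
        buffer = torch.cat([buffer[1:], torch.randn(1, *shape)], dim=0)
        t = torch.cat([t[1:], torch.zeros(1)])
    return torch.stack(outputs)
```

A loop like this only reaches real-time rates if each model call is cheap, which is what the paper's multistep distillation targets.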
https://arxiv.org/pdf/2507.03745

How to scale RL to 10^26 FLOPs
A roadmap for RL-ing LLMs on the entire Internet
https://blog.jxmo.io/p/how-to-scale-rl-to-1026-flops

The Only Important Technology Is The Internet
Although progress in AI is often attributed to landmark papers – such as transformers, RNNs, or diffusion – this ignores the fundamental bottleneck of artificial intelligence...
https://kevinlu.ai/the-only-important-technology-is-the-internet