Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach

ICML 2025

Minting Pan, Yitao Zheng, Jiajian Li, Yunbo Wang, Xiaokang Yang
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University
Corresponding author: Yunbo Wang

Abstract

Offline reinforcement learning (RL) enables policy optimization using static datasets, avoiding the risks and costs of extensive real-world exploration. However, it struggles with suboptimal offline behaviors and inaccurate value estimation due to the lack of environmental interaction. We present Video-Enhanced Offline RL (VeoRL), a model-based method that constructs an interactive world model from diverse, unlabeled video data readily available online. Leveraging model-based behavior guidance, our approach transfers commonsense knowledge of control policies and physical dynamics from natural videos to the RL agent within the target domain. VeoRL achieves substantial performance gains (over 100% in some cases) across visual control tasks in robotic manipulation, autonomous driving, and open-world video games.

Overview of VeoRL


Demo

Results on Meta-World

Source Videos: BridgeData

Results on CARLA

Source Videos: NuScenes
Example 1 of VeoRL
Example 2 of VeoRL
Example 1 of DreamerV2
Example 2 of DreamerV2

Results on MineDojo

Harvest log in plains (VeoRL)
Harvest water with bucket (VeoRL)
Harvest sand (VeoRL)
Harvest log in plains (DreamerV2)
Harvest water with bucket (DreamerV2)
Harvest sand (DreamerV2)

Model architecture

(a) We construct a discrete, high-level latent action space by training the BAN, which enables forward dynamics modeling independent of real actions. (b) Visualization of model-based actor-critic learning at a single rollout step. A behavior cloning module replays the video-informed latent behaviors, which serve as inputs to the actor and critic for producing goal-conditioned policies and value estimates, and to the plan net for generating a long-term state rollout.
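To make the two stages above concrete, here is a minimal PyTorch-style sketch, not the released implementation: the BAN, the latent-action dynamics model, the behavior cloning module, and the plan net are simplified stand-ins, and all module names, layer sizes, and losses are assumptions for illustration only.

# Illustrative sketch only -- module names, shapes, and losses are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BAN(nn.Module):
    """Hypothetical behavior abstraction net: maps a pair of consecutive
    latent states to a discrete high-level latent action via a codebook."""
    def __init__(self, state_dim=256, num_codes=32, code_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(2 * state_dim, 256), nn.ELU(), nn.Linear(256, code_dim))
        self.codebook = nn.Embedding(num_codes, code_dim)

    def forward(self, s_t, s_next):
        z = self.encoder(torch.cat([s_t, s_next], dim=-1))
        # Nearest-codebook lookup (VQ-style) yields a discrete latent action.
        dists = (z.unsqueeze(1) - self.codebook.weight.unsqueeze(0)).pow(2).sum(-1)
        idx = dists.argmin(dim=-1)
        u = self.codebook(idx)
        # Straight-through estimator so gradients still reach the encoder
        # (codebook/commitment losses are omitted here for brevity).
        u = z + (u - z).detach()
        return u, idx

class LatentDynamics(nn.Module):
    """Forward model conditioned on the latent action, so unlabeled video
    (with no ground-truth actions) can still supervise dynamics learning."""
    def __init__(self, state_dim=256, code_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + code_dim, 256), nn.ELU(),
            nn.Linear(256, state_dim))

    def forward(self, s_t, u_t):
        return self.net(torch.cat([s_t, u_t], dim=-1))

def video_pretrain_step(ban, dyn, opt, s_t, s_next):
    """Stage (a): learn latent actions and forward dynamics from video latents."""
    u, _ = ban(s_t, s_next)
    loss = F.mse_loss(dyn(s_t, u), s_next)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def rollout_step(actor, critic, bc_module, plan_net, s_t):
    """Stage (b): one imagined rollout step. The behavior cloning module
    replays a video-informed latent behavior that conditions the actor,
    the critic, and the plan net extending the state rollout."""
    u_hat = bc_module(s_t)          # video-informed latent behavior
    action = actor(s_t, u_hat)      # goal-conditioned policy
    value = critic(s_t, u_hat)      # goal-conditioned value estimate
    s_next = plan_net(s_t, u_hat)   # next latent state for long-term rollout
    return action, value, s_next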


BibTeX

 
@inproceedings{pan2025veorl,
  title     = {Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach},
  author    = {Minting Pan and Yitao Zheng and Jiajian Li and Yunbo Wang and Xiaokang Yang},
  booktitle = {ICML},
  year      = {2025}
}