UFO: Unifying Feed-Forward and Optimization-based Methods for Large Driving Scene Modeling

¹Xiaomi EV, ²UIUC

*Equal Contribution. Project Leader. Corresponding Author.

Abstract

Dynamic driving scene reconstruction is critical for autonomous driving simulation and closed-loop learning. While recent feed-forward methods have shown promise for 3D reconstruction, they struggle with long-range driving sequences because their complexity is quadratic in sequence length and because dynamic objects are hard to model over extended durations. We propose UFO, a novel recurrent paradigm that combines the benefits of optimization-based and feed-forward methods for efficient long-range 4D reconstruction. Our approach maintains a 4D scene representation that is iteratively refined as new observations arrive, using a visibility-based filtering mechanism to select informative scene tokens and enable efficient processing of long sequences. For dynamic objects, we introduce an object pose-guided modeling approach that supports accurate long-range motion capture. Experiments on the Waymo Open Dataset demonstrate that our method significantly outperforms both per-scene optimization and existing feed-forward methods across various sequence lengths. Notably, our approach can reconstruct 16-second driving logs within 0.5 seconds while maintaining superior visual quality and geometric accuracy.

Framework

Given a long sequence of multi-view images, we reconstruct the 4D scene in a recurrent manner. (A) At each time step, we update the scene representation by refining previous scene tokens based on the new observation and adding new information from the current frame. (B) To efficiently handle long sequences, we employ a visibility-based filtering mechanism to select relevant scene tokens for updating. A unified transformer model learns to update the scene in a feed-forward manner. (C) Dynamic objects are modeled using 3D bounding boxes and per-Gaussian lifespans, enabling complex motion modeling over time.
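Steps (A) to (C) can be illustrated with a small sketch. It is not the authors' implementation: it assumes each scene token carries a 3D center, that "visible" means the center projects inside a pinhole camera image, and that a lifespan is a simple time interval per Gaussian. The names `visible_mask`, `alive_mask`, `recurrent_step`, and `update_fn` are hypothetical, and `update_fn` is only a stand-in for the learned transformer update.

```python
import numpy as np

def visible_mask(centers, world_to_cam, K, width, height, near=0.1, far=100.0):
    """Assumed visibility test: True where a 3D center projects inside the image."""
    # Move points into the camera frame (camera assumed to look along +z).
    homo = np.concatenate([centers, np.ones((len(centers), 1))], axis=1)
    cam = (world_to_cam @ homo.T).T[:, :3]
    z = cam[:, 2]
    in_depth = (z > near) & (z < far)
    # Perspective projection; use a dummy depth where the point is out of range.
    safe_z = np.where(in_depth, z, 1.0)
    uv = (K @ cam.T).T[:, :2] / safe_z[:, None]
    in_image = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width) &
        (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    return in_depth & in_image

def alive_mask(t, t_start, t_end):
    """Assumed lifespan model: a Gaussian is active only inside [t_start, t_end]."""
    return (t >= t_start) & (t <= t_end)

def recurrent_step(tokens, centers, new_tokens, new_centers,
                   world_to_cam, K, width, height, update_fn):
    """One recurrent update: refine visible tokens, then append new ones."""
    mask = visible_mask(centers, world_to_cam, K, width, height)
    tokens = tokens.copy()
    # Only the visible subset is passed to the (placeholder) update function,
    # so the per-step cost depends on the visible set, not the whole history.
    tokens[mask] = update_fn(tokens[mask])
    tokens = np.concatenate([tokens, new_tokens], axis=0)
    centers = np.concatenate([centers, new_centers], axis=0)
    return tokens, centers, mask

# Toy usage with made-up numbers: identity pose and a 100x100 image.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
pose = np.eye(4)
centers = np.array([[0.0, 0.0, 5.0], [0.0, 0.0, -5.0], [100.0, 0.0, 5.0]])
tokens = np.zeros((3, 4))
out_tokens, out_centers, mask = recurrent_step(
    tokens, centers, np.ones((2, 4)) * 7.0, np.zeros((2, 3)),
    pose, K, 100, 100, update_fn=lambda x: x + 1.0,
)
alive = alive_mask(2.0, np.array([0.0, 3.0]), np.array([5.0, 6.0]))
```

In this toy run only the first token lies in front of the camera and inside the image, so it is the only one refined; the other two are carried forward unchanged, and two new tokens are appended for the current frame.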

BibTeX

@misc{tan2026ufounifyingfeedforwardoptimizationbased,
      title={UFO: Unifying Feed-Forward and Optimization-based Methods for Large Driving Scene Modeling}, 
      author={Kaiyuan Tan and Yingying Shen and Mingfei Tu and Haohui Zhu and Bing Wang and Guang Chen and Hangjun Ye and Haiyang Sun},
      year={2026},
      eprint={2602.20943},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.20943}, 
}