Looping videos are short video clips that can be looped endlessly without visible seams or artifacts. They provide an attractive way to capture the dynamism of natural scenes. Existing methods have been mostly limited to 2D representations. In this paper, we take a step forward and propose a practical solution that enables an immersive experience for dynamic 3D looping scenes. The key challenge is to consider the per-view looping conditions from asynchronous input while maintaining view consistency for the 3D representation. We propose a novel sparse 3D video representation, namely Multi-Tile Video (MTV), which not only provides a view-consistent prior, but also greatly reduces memory usage, making the optimization of a 4D volume tractable. We then introduce a two-stage pipeline to construct the 3D looping MTV from completely asynchronous multi-view videos with no time overlap. A novel looping loss based on video temporal retargeting algorithms is adopted during the optimization to loop the 3D scene. Experiments show that our framework successfully generates and renders photorealistic 3D looping videos in real time, even on mobile devices.
Instead of storing a sequence of RGBA maps for each plane as in the Multi-plane Video (MPV) representation, Multi-Tile Videos (MTVs) reduce memory requirements by exploiting the spatio-temporal sparsity of the scene. We subdivide each plane into a regular grid of tiny tiles, each storing a small RGBA patch sequence. We also assign each tile a label l based on whether it contains looping content (l_loop), static content (l_static), or is simply empty (l_empty). We can then store a single static RGBA patch for l_static tiles and discard l_empty tiles entirely.
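To make the tile layout concrete, below is a minimal sketch of how an MTV plane could be stored, assuming a PyTorch backend. The names (`TileLabel`, `Tile`, `plane_memory`) are illustrative placeholders, not the authors' actual code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
import torch

class TileLabel(Enum):
    LOOP = "loop"      # tile stores a small RGBA patch sequence
    STATIC = "static"  # tile stores a single RGBA patch
    EMPTY = "empty"    # tile is discarded entirely

@dataclass
class Tile:
    label: TileLabel
    # LOOP:   (T, 4, h, w) RGBA patch sequence
    # STATIC: (1, 4, h, w) single RGBA patch
    # EMPTY:  None (no storage at all)
    rgba: Optional[torch.Tensor]

def plane_memory(tiles: list) -> int:
    """Count stored elements for one plane; empty tiles cost nothing."""
    return sum(t.rgba.numel() for t in tiles if t.rgba is not None)
```

Compared to an MPV plane, which pays a full (T, 4, H, W) sequence everywhere, this layout only pays the temporal dimension inside l_loop tiles.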
We propose a two-stage pipeline to generate the MTV representation from asynchronous multi-view videos. In the first stage, we initialize the MTV by optimizing a static Multi-plane Image (MPI) and a 3D loopable mask, using long-exposure images and 2D loopable masks derived from the input videos. We then construct an MTV through a tile culling process. In the second stage, we train the MTV in a coarse-to-fine manner using an analysis-by-synthesis approach. The key enabler for this process is a novel looping loss based on video retargeting algorithms, which encourages the rendered video to simultaneously loop and preserve similarity to the input.
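The following is a high-level sketch of that two-stage pipeline. Every helper here is a hypothetical stub standing in for a step described above; none of these names come from the released code.

```python
# Hypothetical stubs for the individual steps described in the text.
def estimate_loopable_mask(video): raise NotImplementedError
def optimize_static_mpi(long_exposure, masks_2d, cameras): raise NotImplementedError
def cull_tiles(mpi, mask_3d, tile_size): raise NotImplementedError
def upsample(mtv, scale): raise NotImplementedError
def render_video(mtv, camera): raise NotImplementedError
def looping_loss(rendered, target): raise NotImplementedError  # see sketch below

def build_looping_mtv(videos, cameras, tile_size=16):
    # Stage 1: initialize the MTV from a static MPI and a 3D loopable mask.
    long_exposure = [v.mean(dim=0) for v in videos]          # time-averaged frames
    masks_2d = [estimate_loopable_mask(v) for v in videos]   # per-view 2D masks
    mpi, mask_3d = optimize_static_mpi(long_exposure, masks_2d, cameras)
    mtv = cull_tiles(mpi, mask_3d, tile_size)                # label loop/static/empty

    # Stage 2: analysis-by-synthesis training, coarse to fine.
    for scale in ("coarse", "medium", "fine"):
        mtv = upsample(mtv, scale)
        for camera, video in zip(cameras, videos):
            loss = looping_loss(render_video(mtv, camera), video)
            # ...backpropagate and update the MTV parameters here.
    return mtv
```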
To compute the looping loss, we first pad the frames and extract 3D patches along the time axis for each pixel location. We then compute a normalized similarity score for each patch pair. Finally, the looping loss is computed by averaging the errors of the patch pairs with minimum scores. The figure on the right demonstrates this process.
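Below is a simplified sketch of this computation, assuming videos are tensors shaped (T, C, H, W). It follows the steps in the text: pad in time so patches wrap across the loop boundary, extract per-pixel temporal patches, score every (rendered, input) patch pair, and average each rendered patch's error against its best match. For brevity it uses a plain L2 nearest-neighbor match in place of the paper's normalized similarity score, and the patch size is an assumed parameter.

```python
import torch

def temporal_patches(video: torch.Tensor, size: int) -> torch.Tensor:
    """Extract a 1D temporal patch at every pixel. Returns (T, H*W, C*size)."""
    T, C, H, W = video.shape
    # Wrap-around padding in time, so patches straddling the end of the
    # rendered clip connect back to its beginning (the looping condition).
    padded = torch.cat([video, video[: size - 1]], dim=0)
    patches = padded.unfold(0, size, 1)                  # (T, C, H, W, size)
    return patches.permute(0, 2, 3, 1, 4).reshape(T, H * W, C * size)

def looping_loss(rendered: torch.Tensor, target: torch.Tensor,
                 size: int = 5) -> torch.Tensor:
    p = temporal_patches(rendered, size)   # patches from the looped rendering
    q = temporal_patches(target, size)     # patches from the input video
    # Pairwise patch distances at each pixel location: (H*W, T_p, T_q).
    d = torch.cdist(p.transpose(0, 1), q.transpose(0, 1))
    # For every rendered patch, keep the error of its best-matching input
    # patch, then average over all patches and pixel locations.
    return d.min(dim=2).values.mean()
```

Minimizing this loss pushes every temporal patch of the (wrapped) rendering to resemble some patch of the input, which is what lets the optimized clip loop while staying faithful to the source footage.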
@article{videoloop,
  title   = {3D Video Loops from Asynchronous Input},
  author  = {Li Ma and Xiaoyu Li and Jing Liao and Pedro V. Sander},
  journal = {arXiv preprint arXiv:2303.05312},
  year    = {2023}
}