3D Video Loops from Asynchronous Input

1The Hong Kong University of Science and Technology 2Tencent AI Lab
3City University of Hong Kong

teaser image

Given a set of asynchronous multi-view videos, we propose a pipeline to construct a novel 3D looping video representation (a), which consists of a static texture atlas, a dynamic texture atlas, and multiple tiles as the geometry proxy. The resulting 3D video loop allows both view and time control (b), and can be rendered in real time even on mobile devices (c). (HTML5 is required to play the teaser)

Abstract

Looping videos are short video clips that can be looped endlessly without visible seams or artifacts. They provide a very attractive way to capture the dynamism of natural scenes. Existing methods have been mostly limited to 2D representations. In this paper, we take a step forward and propose a practical solution that enables an immersive experience on dynamic 3D looping scenes. The key challenge is to consider the per-view looping conditions from asynchronous input while maintaining view consistency for the 3D representation. We propose a novel sparse 3D video representation, namely Multi-Tile Video (MTV), which not only provides a view-consistent prior, but also greatly reduces memory usage, making the optimization of a 4D volume tractable. Then, we introduce a two-stage pipeline to construct the 3D looping MTV from completely asynchronous multi-view videos with no time overlap. A novel looping loss based on video temporal retargeting algorithms is adopted during the optimization to loop the 3D scene. Experiments show that our framework can successfully generate and render photorealistic 3D looping videos in real time, even on mobile devices.

3D Video Representation

repr image

Instead of storing a sequence of RGBA maps for each plane as in a Multi-plane Video (MPV) representation, Multi-tile Videos (MTVs) reduce memory requirements by exploiting the spatio-temporal sparsity of the scene. We subdivide each plane into a regular grid of tiny tiles, where each tile stores a small RGBA patch sequence. We also assign a label l to each tile based on whether it contains looping content (l_loop), static content (l_static), or is simply empty (l_empty). We can then store a single static RGBA patch for tiles labeled l_static, and discard tiles that are empty.
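Below is a minimal Python sketch of how this tile layout could be organized in memory. The class names, fields, and label values are illustrative assumptions, not the paper's actual implementation.

# Illustrative sketch of the Multi-tile Video (MTV) layout described above.
from dataclasses import dataclass
from enum import Enum
import numpy as np

class TileLabel(Enum):
    LOOP = "loop"      # tile keeps a small looping RGBA patch sequence
    STATIC = "static"  # tile keeps a single static RGBA patch
    EMPTY = "empty"    # tile is discarded entirely

@dataclass
class Tile:
    label: TileLabel
    # (T, h, w, 4) RGBA sequence for LOOP tiles, (h, w, 4) for STATIC tiles,
    # None for EMPTY tiles, which are simply not stored.
    patches: np.ndarray | None

def mtv_memory_bytes(tiles: list[Tile]) -> int:
    """Rough memory footprint of an MTV, showing why replacing static tiles
    with a single patch and dropping empty tiles makes the 4D volume tractable."""
    return sum(t.patches.nbytes for t in tiles if t.patches is not None)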

Two-stage Pipeline

pipeline image

We propose a two-stage pipeline to generate the MTV representation from asynchronous multi-view videos. In the first stage, we initialize the MTV by optimizing a static Multiplane Image (MPI) and a 3D loopable mask using long-exposure images and 2D loopable masks derived from the input videos; we then construct the MTV through a tile-culling process. In the second stage, we train the MTV using an analysis-by-synthesis approach in a coarse-to-fine manner. The key enabler for this process is a novel looping loss based on video retargeting algorithms, which encourages a video to loop while preserving similarity to the input.
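As a concrete illustration of the tile-culling step in the first stage, the sketch below assigns each tile a label from the optimized static MPI and 3D loopable mask. The thresholds and array conventions are assumptions for illustration only and do not reflect the exact rules used in the paper.

import numpy as np

def cull_tiles(alpha_tiles: np.ndarray,      # (N, h, w) per-tile alpha from the static MPI
               loop_mask_tiles: np.ndarray,  # (N, h, w) per-tile 3D loopable mask in [0, 1]
               alpha_thresh: float = 0.05,
               loop_thresh: float = 0.5) -> np.ndarray:
    """Return one label per tile: 0 = empty, 1 = static, 2 = looping."""
    labels = np.zeros(alpha_tiles.shape[0], dtype=np.int64)
    visible = alpha_tiles.max(axis=(1, 2)) > alpha_thresh     # tile has any opaque pixel
    looping = loop_mask_tiles.max(axis=(1, 2)) > loop_thresh  # tile has any loopable pixel
    labels[visible & ~looping] = 1  # keep a single static RGBA patch
    labels[visible & looping] = 2   # keep the full RGBA patch sequence
    return labels                   # tiles labeled 0 are culled from the MTV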

Looping Loss

looping loss image

To compute the looping loss, we first pad the frames and extract 3D patches along the time axis at each pixel location. We then compute a normalized similarity score for each patch pair. Finally, the looping loss is the average error between the patch pairs with the minimum scores, as illustrated in the figure on the right.
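The sketch below gives a hedged PyTorch version of this computation: the rendered video is wrap-padded in time so that patches straddling the loop seam are scored, temporal patches are extracted at every pixel location, each rendered patch is matched to its most similar input patch under a normalized score, and the matched errors are averaged. The patch size, padding scheme, and normalization are illustrative choices rather than the exact formulation in the paper.

import torch

def temporal_patches(video: torch.Tensor, p: int) -> torch.Tensor:
    """video: (T, C, H, W) -> (T - p + 1, H*W, C*p) patches along the time axis."""
    T, C, H, W = video.shape
    v = video.permute(2, 3, 1, 0).reshape(H * W, C, T)  # (H*W, C, T)
    patches = v.unfold(-1, p, 1)                        # (H*W, C, T-p+1, p)
    return patches.permute(2, 0, 1, 3).reshape(T - p + 1, H * W, C * p)

def looping_loss(rendered: torch.Tensor, target: torch.Tensor, p: int = 3) -> torch.Tensor:
    """rendered, target: (T, C, H, W) videos at the same resolution."""
    # Wrap-pad the rendered video so patches covering the loop seam
    # (last frame -> first frame) are also extracted.
    wrapped = torch.cat([rendered, rendered[: p - 1]], dim=0)
    q = temporal_patches(wrapped, p)  # rendered (query) patches
    k = temporal_patches(target, p)   # input (key) patches
    # Squared distance between every query/key patch pair at each pixel.
    d = ((q.unsqueeze(1) - k.unsqueeze(0)) ** 2).mean(dim=-1)  # (Tq, Tk, H*W)
    # Normalize each input patch's distances by its best match from the
    # rendered video (retargeting-style score rewarding coverage of the input).
    score = d / (d.min(dim=0, keepdim=True).values + 1e-6)
    # For each rendered patch, pick the input patch with the minimum score and
    # average the corresponding errors.
    idx = score.min(dim=1).indices                     # (Tq, H*W)
    matched = d.gather(1, idx.unsqueeze(1)).squeeze(1) # (Tq, H*W)
    return matched.mean()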

Demos

Please click each image to open the demo in a new tab.

Supplementary Video

Teaser

Comparison

Ablations

Limitations

  • The synthesized results still contain some artifacts.
  • It cannot model complex view-dependent effects, such as non-planar specular reflections.
  • We assume the scene to possess a looping pattern, which works best for natural scenes like flowing water and waving trees. However, our method tends to fail if the scene is not loopable, because each view has completely unique content.
  • The MTV representation is effectively a 2.5D representation that only supports novel-view synthesis within a narrow baseline.
BibTeX

    
@article{videoloop,
  title   = {3D Video Loops from Asynchronous Input},
  author  = {Li Ma and Xiaoyu Li and Jing Liao and Pedro V. Sander},
  journal = {arXiv preprint arXiv:2303.05312},
  year    = {2023}
}