Looping videos are short video clips that can be looped endlessly without visible seams or artifacts. They provide an attractive way to capture the dynamism of natural scenes. Existing methods have been mostly limited to 2D representations. In this paper, we take a step forward and propose a practical solution that enables an immersive experience for dynamic 3D looping scenes. The key challenge is to consider the per-view looping conditions from asynchronous input while maintaining view consistency for the 3D representation. We propose a novel sparse 3D video representation, namely Multi-Tile Video (MTV), which not only provides a view-consistent prior, but also greatly reduces memory usage, making the optimization of a 4D volume tractable. We then introduce a two-stage pipeline to construct the 3D looping MTV from completely asynchronous multi-view videos with no time overlap. A novel looping loss based on video temporal retargeting algorithms is adopted during the optimization to loop the 3D scene. Experiments show that our framework successfully generates and renders photorealistic 3D looping videos in real time, even on mobile devices.
Instead of storing a sequence of RGBA maps for each plane as in the Multi-plane Video (MPV) representation, Multi-Tile Videos (MTVs) reduce memory requirements by exploiting the spatio-temporal sparsity of the scene. We subdivide each plane into a regular grid of tiny tiles, each storing a small RGBA patch sequence. We also assign each tile a label l based on whether it contains looping content (l_loop), static content (l_static), or is simply empty (l_empty). We can then store a single static RGBA patch for l_static tiles and discard l_empty tiles entirely.
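To make the tile layout concrete, below is a minimal sketch of how an MTV plane could be stored, assuming a PyTorch backend. The names (`TileLabel`, `Tile`, `plane_memory`) are illustrative placeholders, not the authors' actual code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
import torch

class TileLabel(Enum):
    LOOP = "loop"      # tile stores a small RGBA patch sequence
    STATIC = "static"  # tile stores a single RGBA patch
    EMPTY = "empty"    # tile is discarded entirely

@dataclass
class Tile:
    label: TileLabel
    # LOOP:   (T, 4, h, w) RGBA patch sequence
    # STATIC: (1, 4, h, w) single RGBA patch
    # EMPTY:  None (no storage at all)
    rgba: Optional[torch.Tensor]

def plane_memory(tiles: list) -> int:
    """Count stored elements for one plane; empty tiles cost nothing."""
    return sum(t.rgba.numel() for t in tiles if t.rgba is not None)
```

Compared to an MPV plane, which pays a full (T, 4, H, W) sequence everywhere, this layout only pays the temporal dimension inside l_loop tiles.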
We propose a two-stage pipeline to generate the MTV representation from asynchronous multi-view videos. In the first stage, we initialize the MTV by optimizing a static Multi-plane Image (MPI) and a 3D loopable mask, using long-exposure images and 2D loopable masks derived from the input videos. We then construct an MTV through a tile culling process. In the second stage, we train the MTV in a coarse-to-fine manner using an analysis-by-synthesis approach. The key enabler for this process is a novel looping loss based on video retargeting algorithms, which encourages the rendered video to simultaneously loop and preserve similarity to the input.
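The following is a high-level sketch of that two-stage pipeline. Every helper here is a hypothetical stub standing in for a step described above; none of these names come from the released code.

```python
# Hypothetical stubs for the individual steps described in the text.
def estimate_loopable_mask(video): raise NotImplementedError
def optimize_static_mpi(long_exposure, masks_2d, cameras): raise NotImplementedError
def cull_tiles(mpi, mask_3d, tile_size): raise NotImplementedError
def upsample(mtv, scale): raise NotImplementedError
def render_video(mtv, camera): raise NotImplementedError
def looping_loss(rendered, target): raise NotImplementedError  # see sketch below

def build_looping_mtv(videos, cameras, tile_size=16):
    # Stage 1: initialize the MTV from a static MPI and a 3D loopable mask.
    long_exposure = [v.mean(dim=0) for v in videos]          # time-averaged frames
    masks_2d = [estimate_loopable_mask(v) for v in videos]   # per-view 2D masks
    mpi, mask_3d = optimize_static_mpi(long_exposure, masks_2d, cameras)
    mtv = cull_tiles(mpi, mask_3d, tile_size)                # label loop/static/empty

    # Stage 2: analysis-by-synthesis training, coarse to fine.
    for scale in ("coarse", "medium", "fine"):
        mtv = upsample(mtv, scale)
        for camera, video in zip(cameras, videos):
            loss = looping_loss(render_video(mtv, camera), video)
            # ...backpropagate and update the MTV parameters here.
    return mtv
```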
To compute the looping loss, we first pad the frames and extract 3D patches along the time axis for each pixel location. We then compute a normalized similarity score for each patch pair. Finally, the looping loss is computed by averaging the errors of the patch pairs with minimum scores. The figure on the right demonstrates this process.
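Below is a simplified sketch of this computation, assuming videos are tensors shaped (T, C, H, W). It follows the steps in the text: pad in time so patches wrap across the loop boundary, extract per-pixel temporal patches, score every (rendered, input) patch pair, and average each rendered patch's error against its best match. For brevity it uses a plain L2 nearest-neighbor match in place of the paper's normalized similarity score, and the patch size is an assumed parameter.

```python
import torch

def temporal_patches(video: torch.Tensor, size: int) -> torch.Tensor:
    """Extract a 1D temporal patch at every pixel. Returns (T, H*W, C*size)."""
    T, C, H, W = video.shape
    # Wrap-around padding in time, so patches straddling the end of the
    # rendered clip connect back to its beginning (the looping condition).
    padded = torch.cat([video, video[: size - 1]], dim=0)
    patches = padded.unfold(0, size, 1)                  # (T, C, H, W, size)
    return patches.permute(0, 2, 3, 1, 4).reshape(T, H * W, C * size)

def looping_loss(rendered: torch.Tensor, target: torch.Tensor,
                 size: int = 5) -> torch.Tensor:
    p = temporal_patches(rendered, size)   # patches from the looped rendering
    q = temporal_patches(target, size)     # patches from the input video
    # Pairwise patch distances at each pixel location: (H*W, T_p, T_q).
    d = torch.cdist(p.transpose(0, 1), q.transpose(0, 1))
    # For every rendered patch, keep the error of its best-matching input
    # patch, then average over all patches and pixel locations.
    return d.min(dim=2).values.mean()
```

Minimizing this loss pushes every temporal patch of the (wrapped) rendering to resemble some patch of the input, which is what lets the optimized clip loop while staying faithful to the source footage.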
@article{videoloop,
  title   = {3D Video Loops from Asynchronous Input},
  author  = {Li Ma and Xiaoyu Li and Jing Liao and Pedro V. Sander},
  journal = {arXiv preprint arXiv:2303.05312},
  year    = {2023}
}