TempCo-Painter: Temporal Consistency Enhanced Painter with Adaptive Diffusion Transformers for Long Video Inpainting

Video inpainting, a critical task in computer vision, aims to plausibly fill missing regions in video sequences while maintaining both spatial realism and robust spatio-temporal consistency. Current methods often struggle with ultra-long videos and highly dynamic occlusions, and with achieving high coherence efficiently, leading to common artifacts. To address these challenges, we propose TempCo-Painter: Temporal Consistency Enhanced Painter with Adaptive Diffusion Transformers. Our novel framework leverages a specialized 3D-VAE for efficient latent space compression and introduces an innovative Adaptive Diffusion Transformer (ADiT). ADiT integrates hierarchical spatial-temporal attention, a motion-guided attention mechanism for accurate dynamic content restoration, and dynamic mask awareness for robust handling of diverse occlusions. An efficient Flow Matching scheduler further enables TempCo-Painter to generate high-quality results with minimal denoising steps. For processing arbitrarily long videos, we introduce an enhanced MultiDiffusion strategy featuring an adaptive sliding window and temporal smoothing regularization to ensure seamless global consistency. Extensive experiments demonstrate that TempCo-Painter achieves state-of-the-art performance on standard short video benchmarks, significantly outperforming existing methods in PSNR and SSIM and notably reducing Video Fréchet Inception Distance. Furthermore, it exhibits superior robustness and coherence on challenging minute-level long videos and complex mask scenarios, while maintaining high inference efficiency.
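The abstract does not spell out how the sliding-window MultiDiffusion step works, so the following is only a minimal sketch of the general idea: process overlapping temporal windows independently and blend the overlapping frames by weighted averaging. The fixed window size, the triangular weights, the function names, and the use of one float per frame as a stand-in for a latent frame are all assumptions made for illustration. The paper's adaptive window and temporal smoothing regularization are not reproduced here.

```python
from typing import Callable, List


def window_starts(num_frames: int, window: int, stride: int) -> List[int]:
    """Start indices of overlapping windows that together cover all frames."""
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("need 0 < stride <= window")
    if num_frames <= window:
        return [0]
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:
        # Add a final window flush with the end so no frame is left uncovered.
        starts.append(num_frames - window)
    return starts


def triangular_weights(length: int) -> List[float]:
    """Weights that peak mid-window so window borders contribute less (assumed)."""
    return [float(min(i + 1, length - i)) for i in range(length)]


def blend_windows(
    frames: List[float],
    window: int,
    stride: int,
    process: Callable[[List[float]], List[float]],
) -> List[float]:
    """Run `process` on each window and average the overlapping outputs.

    `process` is a hypothetical stand-in for one per-window inpainting or
    denoising call; it must return one value per input frame.
    """
    n = len(frames)
    acc = [0.0] * n   # weighted sum of per-window outputs for each frame
    norm = [0.0] * n  # sum of weights for each frame
    for start in window_starts(n, window, stride):
        chunk = frames[start:start + window]
        out = process(chunk)
        weights = triangular_weights(len(chunk))
        for i, (value, w) in enumerate(zip(out, weights)):
            acc[start + i] += w * value
            norm[start + i] += w
    return [a / b for a, b in zip(acc, norm)]


if __name__ == "__main__":
    video = [float(t) for t in range(11)]
    # An identity "model" leaves every frame unchanged after blending.
    print(blend_windows(video, window=4, stride=3, process=lambda c: list(c)))
```

Because every frame's output is a normalized weighted average, frames covered by several windows change smoothly from one window's result to the next instead of switching abruptly at a window boundary.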

Related Results

Virtual Inpainting for Dazu Rock Carvings Based on a Sample Dataset
Numerous image inpainting algorithms are guided by a basic assumption that the known region in the original image itself can provide sufficient prior information for the guess reco...
Comment on: Macroscopic water vapor diffusion is not enhanced in snow
Abstract. The central thesis of the authors’ paper is that macroscopic water vapor diffusion is not enhanced in snow compared to diffusion through humid air alone. Further, mass di...
Diversity-Generated Image Inpainting with Style Extraction
The latest methods based on deep learning have achieved amazing results regarding the complex work of inpainting large missing areas in an image. This type of method generally atte...
Ancient mural inpainting via structure information guided two-branch model
Abstract. Ancient murals are important cultural heritages for our exploration of ancient civilizations and are of great research value. Due to long-time exposure to the environment, ...
MD-GAN: Multi-Scale Diversity GAN for Large Masks Inpainting
Image inpainting approaches have made considerable progress with the assistance of generative adversarial networks (GANs) recently. However, current inpainting methods are incompet...
Role of the Frontal Lobes in the Propagation of Mesial Temporal Lobe Seizures
Summary: The depth ictal electroencephalographic (EEG) propagation sequence accompanying 78 complex partial seizures of mesial temporal origin was reviewed in 24 patients (15 from...
On the Remote Calibration of Instrumentation Transformers: Influence of Temperature
The remote calibration of instrumentation transformers is theoretically possible using synchronous measurements across a transmission line with a known impedance and a local set of...