Face video anonymization aims to preserve privacy while still allowing videos to be analyzed in downstream computer vision tasks such as expression recognition, people tracking, and action recognition.
We propose AnonNET, a novel unified framework for de-identifying facial videos while preserving the age, gender, race, pose, and expression of the original video. Specifically, we inpaint faces with a diffusion-based generative model guided by high-level attribute recognition and motion-aware expression transfer. We then animate the de-identified faces via video-driven animation, which takes the de-identified face and the original video as input (see the sketch after this abstract).
Extensive experiments on the VoxCeleb2, CelebV-HQ, and HDTF datasets, which cover diverse facial dynamics, demonstrate the effectiveness of AnonNET in obfuscating identity while retaining visual realism and temporal consistency.
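To make the two-stage flow concrete, here is a minimal sketch based on our reading of the abstract. Every name in it (`Attributes`, `recognize_attributes`, `inpaint_face`, `animate_face`, `anonymize_video`) is a hypothetical placeholder for illustration, not the released API:

```python
# Minimal sketch of the two-stage AnonNET pipeline. All component names
# are hypothetical placeholders, not the released implementation.

from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Attributes:
    """High-level attributes that condition the inpainting model."""
    age: str
    gender: str
    race: str


def recognize_attributes(frame: np.ndarray) -> Attributes:
    """Placeholder for a pretrained attribute classifier (hypothetical)."""
    raise NotImplementedError


def inpaint_face(frame: np.ndarray, attrs: Attributes) -> np.ndarray:
    """Placeholder for the diffusion-based inpainting model, conditioned
    on the recognized attributes so the new identity keeps them (hypothetical)."""
    raise NotImplementedError


def animate_face(source: np.ndarray,
                 driving: List[np.ndarray]) -> List[np.ndarray]:
    """Placeholder for video-driven animation (e.g., LivePortrait or LIA):
    the de-identified face follows the pose and expression of the driving
    frames (hypothetical)."""
    raise NotImplementedError


def anonymize_video(frames: List[np.ndarray]) -> List[np.ndarray]:
    # Stage 1: de-identify a face. One plausible reading of the abstract
    # is that a single reference frame is inpainted, guided by recognized
    # attributes and motion-aware expression cues, so identity changes
    # while age, gender, race, pose, and expression are preserved.
    reference = frames[0]
    attrs = recognize_attributes(reference)
    anonymized = inpaint_face(reference, attrs)

    # Stage 2: re-animate the de-identified face with the original video
    # as the driving signal, so pose and expression stay temporally
    # consistent frame by frame.
    return animate_face(source=anonymized, driving=frames)
```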
Our work builds upon and extends several approaches in face animation and anonymization.
LivePortrait provides a robust framework for face animation, which we use for the animation stage of our pipeline (illustrated below).
LIA (Latent Image Animator) offers an effective approach for image animation, which we adapt for our anonymization framework.
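As an illustration of the animation stage, the snippet below drives a de-identified face image with the original video through LivePortrait's inference script. The `-s`/`-d` flags and file paths follow our reading of the LivePortrait README and should be checked against the current repository:

```python
# Illustrative call to LivePortrait's inference script for the animation
# stage. Flags and paths reflect the LivePortrait README as we understand
# it and may differ in newer versions; treat them as assumptions.

import subprocess

subprocess.run(
    [
        "python", "inference.py",
        "-s", "anonymized_face.png",  # source: the de-identified face
        "-d", "original_video.mp4",   # driving: the original talking-head video
    ],
    check=True,
    cwd="LivePortrait",  # assumes a local clone of the LivePortrait repo
)
```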
@article{TBD,
  author  = {Egin, Anil and Tangherloni, Andrea and Dantcheva, Antitza},
  title   = {Now You See Me, Now You Don't: A Unified Framework for Expression Consistent Anonymization in Talking Head Videos},
  journal = {TBD},
  year    = {2025},
}