PIE: Portrait Image Embedding for Semantic Control

Abstract

Notation

1. Person-specific Video Editing (Model based)

Requires a large number of photos of a single person: a long video of that person is needed, so these methods cannot be applied to a single image of a specific person.

3. Few-shot Editing

Requires only a small number of photos of a single person.

4. Single-shot Editing

5. Image Editing with StyleGAN

Semantic Editing of Real Facial Images

$E(w) = E_{synth}(w) + E_{identity}(w) + E_{edit}(w) + E_{invariance}(w) + E_{recognition}(w)$

High-Fidelity Image Synthesis (E_synth)

$E_{synth}(w) = \lambda_{l_2}\lVert I-I_w\rVert^2_2+\lambda_{p}\lVert\Phi(I)-\Phi(I_w)\rVert^2_2$
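
As a rough illustration of how the two terms of $E_{synth}$ combine, here is a minimal NumPy sketch. The feature extractor `toy_features` is a hypothetical stand-in for $\Phi$ (a simple 2x2 average pooling, not a real perceptual network), and the default weights are placeholders rather than values from the paper.

```python
import numpy as np

def toy_features(img):
    # Hypothetical stand-in for Phi: 2x2 average pooling over the spatial axes.
    h, w = img.shape[:2]
    cropped = img[:h - h % 2, :w - w % 2]
    return cropped.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def synthesis_energy(image, rendered, features=toy_features,
                     lambda_l2=1.0, lambda_p=1.0):
    # Pixel term: squared L2 distance between target I and generated I_w.
    pixel_term = np.sum((image - rendered) ** 2)
    # Perceptual term: squared L2 distance between feature maps of both images.
    perceptual_term = np.sum((features(image) - features(rendered)) ** 2)
    return lambda_l2 * pixel_term + lambda_p * perceptual_term

target = np.ones((4, 4, 3))
generated = np.zeros((4, 4, 3))
energy = synthesis_energy(target, generated)
```

In practice $\Phi$ would be features from a pretrained network, and the whole expression would be written in an autodiff framework so that it can be minimized with respect to $w$.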

Face Image Editing : Identity Preservation (E_identity)

$E_{identity}(w) = \lambda_{identity}\lVert w-RigNet(w, \theta^\tau_w)\rVert^2_2$
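
The identity term says that feeding a latent code's own parameters $\theta^\tau_w$ back through RigNet should return the same latent code. A minimal sketch, assuming `rignet` is any callable with the signature `rignet(w, theta)`; the two toy callables below are hypothetical and exist only to exercise the formula.

```python
import numpy as np

def identity_energy(w, rignet, theta_w_tau, lambda_identity=1.0):
    # w_hat = RigNet(w, theta_w^tau): re-rig w with its own parameters.
    w_hat = rignet(w, theta_w_tau)
    # Squared L2 distance between the original and re-rigged latent codes.
    return lambda_identity * np.sum((w - w_hat) ** 2)

# Hypothetical toy networks (not the real RigNet).
perfect_rignet = lambda w, theta: w          # leaves w unchanged
drifting_rignet = lambda w, theta: w + 1.0   # shifts every latent entry by 1

w = np.zeros(4)
theta = np.zeros(3)
no_drift = identity_energy(w, perfect_rignet, theta)
drift = identity_energy(w, drifting_rignet, theta)
```

A network that leaves $w$ unchanged incurs zero penalty, while any drift is penalized quadratically.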

Face Image Editing : Editing Property (E_edit)

$\forall v : I_v=I_{([\theta^{\bar{\tau}}_v,\theta^{\tau}_{RigNet(w, \theta^{\tau}_v)}])}$ : Edit Property $(\theta^{\tau}_{v} \approx RigNet(w, \theta^{\tau}_{v}))$

$\ell(I', \theta) = \lambda_{photometric}\lVert I' - I_\theta\rVert^2_{face} + \lambda_{landmark}\lVert\mathcal{L}_{I'} - \mathcal{L}_\theta\rVert^2_F$

$E_{edit}(w) = \lambda_{edit}\mathbb{E}_v[\ell(I_v, [\theta^{\bar{\tau}}_v, \theta^{\tau}_{RigNet(w, \theta^{\tau}_v)}])]$
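
The rendering loss $\ell$ and the expectation over $v$ can be sketched as follows. This is a sketch under stated assumptions: the "face" norm is interpreted as a squared error restricted to a binary face mask, the expectation is approximated by averaging over a finite set of sampled $v$, and the rendered images and landmarks are passed in precomputed, because the renderer itself is outside the scope of this example. All function and argument names are hypothetical.

```python
import numpy as np

def render_loss(image, rendered, face_mask, landmarks_image, landmarks_model,
                lambda_photometric=1.0, lambda_landmark=1.0):
    # Photometric term: squared error counted only inside the face mask (assumption).
    photometric = np.sum(face_mask[..., None] * (image - rendered) ** 2)
    # Landmark term: squared Frobenius norm between the two landmark sets.
    landmark = np.sum((landmarks_image - landmarks_model) ** 2)
    return lambda_photometric * photometric + lambda_landmark * landmark

def edit_energy(samples, lambda_edit=1.0):
    # samples: one dict of render_loss keyword arguments per sampled v.
    # The mean over samples is a Monte Carlo estimate of the expectation E_v.
    return lambda_edit * float(np.mean([render_loss(**s) for s in samples]))

sample = dict(
    image=np.ones((2, 2, 3)),
    rendered=np.zeros((2, 2, 3)),
    face_mask=np.array([[1.0, 0.0], [0.0, 1.0]]),
    landmarks_image=np.array([[0.0, 0.0], [1.0, 1.0]]),
    landmarks_model=np.array([[0.0, 1.0], [1.0, 1.0]]),
)
single = render_loss(**sample)
averaged = edit_energy([sample, sample], lambda_edit=0.5)
```

The same `render_loss` can be reused for the invariance term, which differs only in which image and which parameter vector are compared.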

Face Image Editing : Invariance Property (E_invariance)

$\forall v : I=I_{([\theta^{\bar{\tau}}_{RigNet(w, \theta^{\tau}_v)}, \theta^{\tau}_I])}$ : Invariance Property $(\theta^{\bar{\tau}}_w \approx RigNet(w, \theta^{\tau}_v))$

$E_{invariance}(w) = \lambda_{invariance}\mathbb{E}_v[\ell(I, [\theta^{\bar{\tau}}_{RigNet(w, \theta^{\tau}_v)}, \theta^{\tau}_I])]$

Face Recognition

$\ell_{recog}(I', v) = \lVert\Psi(I') - \Psi(I_v)\rVert^2_F$

$E_{recognition}(w) = \lambda_{r_w}\ell_{recog}(I, w) + \lambda_{r_{\hat{w}}}\mathbb{E}_{v}[\ell_{recog}(I, RigNet(w, \theta^{\tau}_{v}))]$
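
A minimal sketch of the recognition term, assuming the face-recognition features $\Psi(\cdot)$ have already been computed and are passed in as arrays. The expectation over $v$ is approximated by a mean over a finite list of edited samples; the function names and default weights are hypothetical.

```python
import numpy as np

def recog_loss(feat_a, feat_b):
    # Squared Frobenius distance between two recognition feature arrays.
    return np.sum((feat_a - feat_b) ** 2)

def recognition_energy(psi_image, psi_embedded, psi_edited_samples,
                       lambda_rw=1.0, lambda_rw_hat=1.0):
    # First term: input image I vs. the image generated from the embedding w.
    embed_term = recog_loss(psi_image, psi_embedded)
    # Second term: I vs. images generated from RigNet-edited latents,
    # averaged over the sampled edits (Monte Carlo estimate of E_v).
    edit_term = np.mean([recog_loss(psi_image, f) for f in psi_edited_samples])
    return lambda_rw * embed_term + lambda_rw_hat * edit_term

psi_i = np.array([1.0, 0.0])
psi_w = np.array([1.0, 0.0])
psi_edits = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
energy_recog = recognition_energy(psi_i, psi_w, psi_edits)
```

Both terms push the embedded and the edited images toward the same recognition features as the input portrait.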

Optimization