Introduction
- By optimizing a volumetric scene function from a sparse set of input views, we can synthesize novel views of complex scenes.
- Input : scenes represented as continuous spatial locations $(x, y, z)$ and viewing directions $(\theta, \phi)$
- Output : volume density and view-dependent color at that spatial location.
- Density : differential opacity controlling how much radiance is accumulated by a ray passing through that position.
- Model : MLP, without convolutional layers.
- Because a basic implementation does not converge to a sufficiently high-resolution representation, we use the following:
    - We transform the input coordinates with a positional encoding, allowing the MLP to represent higher-frequency functions (see the sketch after this list).
    - We propose a hierarchical sampling procedure to reduce the number of queries required (also sketched below).
- Using traditional volume rendering techniques, we can project the output colors and densities into synthesized images (see the quadrature sketch under "Volume Rendering with Radiance Field").
- Implicit representation of 3D shapes as level sets
    - Method #1 : map $(x, y, z)$ coordinates to signed distance functions.
    - Method #2 : map $(x, y, z)$ coordinates to occupancy fields.
    - Limited by the requirement of ground-truth 3D geometry (e.g. ShapeNet).
    - Recent work relaxes this requirement with differentiable rendering functions that allow neural implicit shape representations to be optimized using only 2D images.
- Light field sample interpolation
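A minimal sketch of the two additions above, assuming PyTorch; the function names and tensor shapes are illustrative, and the frequency counts follow the paper's defaults ($L = 10$ for $\mathbf{x}$, $L = 4$ for $\mathbf{d}$).

```python
import torch

def positional_encoding(p, L=10):
    # gamma(p) = (sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^{L-1} pi p), cos(2^{L-1} pi p)),
    # applied to each coordinate independently so the MLP can fit high frequencies.
    freqs = (2.0 ** torch.arange(L)) * torch.pi             # [L]
    angles = p[..., None] * freqs                           # [..., dim, L]
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(-2)                                  # [..., dim * 2L]

def sample_pdf(bins, weights, n_fine):
    # Hierarchical sampling: treat the coarse pass's per-segment weights as a
    # piecewise-constant PDF along the ray and draw n_fine extra depths by
    # inverse-transform sampling, concentrating fine queries where matter is.
    # bins: [n_rays, n_bins + 1] segment edges; weights: [n_rays, n_bins].
    pdf = weights / (weights.sum(-1, keepdim=True) + 1e-8)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[..., :1]), cdf], dim=-1)
    u = torch.rand(*cdf.shape[:-1], n_fine)                 # uniform draws in [0, 1)
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, cdf.shape[-1] - 1)
    cdf_lo, cdf_hi = torch.gather(cdf, -1, idx - 1), torch.gather(cdf, -1, idx)
    bin_lo, bin_hi = torch.gather(bins, -1, idx - 1), torch.gather(bins, -1, idx)
    t = (u - cdf_lo) / (cdf_hi - cdf_lo + 1e-8)
    return bin_lo + t * (bin_hi - bin_lo)                   # [n_rays, n_fine]
```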
NeRF Representation
- Input :
- 3D Location $\mathbf{x} = (x, y, z)$
- 2D Viewing direction $\mathbf{d} = (\theta, \phi)$
- Output :
- Emitted color $\mathbf{c} = (r, g, b)$
- Volume density $\sigma$
- Network
- $F_\Theta : (\mathbf{x}, \mathbf{d}) \rightarrow (\mathbf{c}, \sigma)$
    - To make the representation multiview-consistent, we restrict the network so that the volume density $\sigma$ is predicted from the location $\mathbf{x}$ alone, regardless of viewing direction.
    - $\mathbf{x} \rightarrow \text{MLP}(\text{8 layers, 256 channels}) \rightarrow (\sigma, \text{feature vector})$
    - $\text{concat}(\text{feature vector}, \mathbf{d}) \rightarrow \text{MLP}(\text{1 layer, 128 channels}) \rightarrow \mathbf{c}$
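A minimal PyTorch sketch of this two-branch network. Layer counts and widths follow the bullets above; the skip connection that re-injects the encoded input partway through the trunk follows the paper's appendix, and `x_dim=60` / `d_dim=24` assume the positional encoding from the introduction.

```python
import torch
import torch.nn as nn

class NeRF(nn.Module):
    # gamma(x) -> 8 fully connected layers (256 channels, ReLU) -> (sigma, feature vector);
    # concat(feature vector, gamma(d)) -> 1 layer (128 channels) -> RGB.
    def __init__(self, x_dim=60, d_dim=24, width=256):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(x_dim, width)] +
            [nn.Linear(width + (x_dim if i == 5 else 0), width) for i in range(1, 8)]
        )
        self.sigma_head = nn.Linear(width, 1)       # density from position only
        self.feature = nn.Linear(width, width)
        self.rgb_head = nn.Sequential(
            nn.Linear(width + d_dim, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),        # colors constrained to [0, 1]
        )

    def forward(self, x_enc, d_enc):
        h = x_enc
        for i, layer in enumerate(self.layers):
            if i == 5:                              # skip connection: re-inject gamma(x)
                h = torch.cat([h, x_enc], dim=-1)
            h = torch.relu(layer(h))
        sigma = torch.relu(self.sigma_head(h))      # nonnegative density
        rgb = self.rgb_head(torch.cat([self.feature(h), d_enc], dim=-1))
        return rgb, sigma
```

Note that $\sigma$ is read off the trunk before the viewing direction is concatenated, which is exactly the multiview-consistency restriction above.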
Volume Rendering with Radiance Field
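- The paper composites the sampled $(\mathbf{c}_i, \sigma_i)$ along each ray with the quadrature rule

$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i, \qquad T_i = \exp\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big),$$

where $\delta_i$ is the distance between adjacent samples. A minimal per-ray sketch, again assuming PyTorch with illustrative names:

```python
import torch

def render_ray(rgb, sigma, deltas):
    # rgb: [n_samples, 3]; sigma: [n_samples]; deltas: [n_samples]
    # distances between adjacent samples along the ray.
    alpha = 1.0 - torch.exp(-sigma * deltas)           # per-segment opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=0)  # accumulated transmittance
    trans = torch.cat([torch.ones(1), trans[:-1]])     # shift so T_1 = 1
    weights = trans * alpha                            # T_i * (1 - exp(-sigma_i * delta_i))
    return (weights[:, None] * rgb).sum(dim=0)         # expected ray color, shape [3]
```

The same `weights` are what the hierarchical sampling sketch in the introduction consumes as its PDF.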