1. What
What does this article do? (From the abstract and conclusion, summarized in one sentence.)
It represents the characteristics of the various objects and elements in dynamic urban scenes with periodic vibration-based temporal dynamics, and introduces a novel temporal smoothing mechanism and a position-aware adaptive control strategy to achieve temporally coherent reconstruction of dynamic scenes.
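As a minimal sketch of the vibration idea (my paraphrase, assuming a sinusoidal displacement around a per-Gaussian life peak and a Gaussian-in-time opacity envelope; the paper's exact parameterization may differ):

```python
import numpy as np

def vibrating_mean(mu, amplitude, tau, cycle, t):
    """Time-dependent Gaussian center: the static mean `mu` plus a
    periodic vibration with per-point amplitude and life peak `tau`.
    Shapes: mu, amplitude -> (N, 3); tau -> (N,); cycle, t -> scalars."""
    phase = 2.0 * np.pi * (t - tau) / cycle          # (N,)
    return mu + amplitude * np.sin(phase)[:, None]   # (N, 3)

def time_decayed_opacity(opacity, tau, beta, t):
    """Opacity modulated by a Gaussian envelope in time, so each point
    contributes mostly near its own life peak `tau`."""
    return opacity * np.exp(-0.5 * ((t - tau) / beta) ** 2)

# Per frame: evaluate every Gaussian at timestamp t, then rasterize as usual.
# means_t  = vibrating_mean(mu, A, tau, l, t)
# alphas_t = time_decayed_opacity(o, tau, beta, t)
```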
2. Why
Under what conditions or needs was this research proposed (Intro)? What core problems or deficiencies does it aim to solve, what have others done, and what are the innovations? (From the Introduction and Related Work.)
Covers background, problem, prior work, and innovation:
Introduction:
- NSG: Decomposes dynamic scenes into scene graphs
- PNF: Decomposes scenes into objects and backgrounds, incorporating a panoptic segmentation auxiliary task.
- SUDS: Uses optical flow, which helps identify which parts of the scene are static and which are dynamic (a toy masking sketch follows this list).
- EmerNeRF: Uses a self-supervised method to reduce dependence on optical flow.
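To make the optical-flow point above concrete, here is a toy sketch, not SUDS's actual pipeline, of separating static from dynamic pixels by flow magnitude; the `threshold` value and the assumption that ego-motion has already been compensated are mine:

```python
import numpy as np

def dynamic_mask(flow, threshold=1.0):
    """Mark pixels as dynamic where the residual optical flow (after
    ego-motion compensation) exceeds `threshold` pixels per frame.
    flow: (H, W, 2) array of per-pixel displacement vectors."""
    magnitude = np.linalg.norm(flow, axis=-1)   # (H, W)
    return magnitude > threshold                # (H, W) boolean mask
```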
Related work:
- Dynamic scene models
- In one research direction, certain studies [2, 7, 11, 18, 38] introduce time as an additional input to the radiance field, treating the scene as a 6D plenoptic function (a toy version is sketched after this list). However, this approach couples positional variations induced by temporal dynamics with the radiance field, lacking geometric priors about how time influences the scene.
- An alternative approach [1, 20, 24–26, 33, 40] focuses on modeling the movement or deformation of specific static structures, assuming that the dynamics arise from these static elements within the scene.
- Gaussian-based
- Urban scene reconstruction
- One research avenue (NeRF-based) has focused on enhancing the modeling of static street scenes by utilizing scalable representations [19, 28, 32, 34], achieving high-fidelity surface reconstruction [14, 28, 39], and incorporating multi-object composition [43]. However, these methods face difficulties in handling the dynamic elements commonly encountered in autonomous driving contexts.
- Another research direction seeks to address these dynamic elements, but the techniques require additional input: panoptic segmentation is leveraged to refine the reconstruction of dynamics [PNF], while [Street Gaussians, Driving Gaussian] decompose the scene into different sets of Gaussian points using bounding boxes. However, they all need manually annotated or predicted bounding boxes and have difficulty reconstructing non-rigid objects.
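As a worked illustration of the first direction under "Dynamic scene models" above, a time-conditioned radiance field simply concatenates $t$ to the spatial input; the architecture below is illustrative and not taken from any cited paper:

```python
import torch
import torch.nn as nn

class TimeConditionedField(nn.Module):
    """Toy 6D plenoptic function: (x, y, z, t, view direction) -> (rgb, sigma)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),     # input is (x, y, z, t)
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma = nn.Linear(hidden, 1)        # volume density
        self.rgb = nn.Sequential(                # view-dependent color
            nn.Linear(hidden + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, xyz, t, view_dir):
        h = self.trunk(torch.cat([xyz, t], dim=-1))
        return self.rgb(torch.cat([h, view_dir], dim=-1)), self.sigma(h)
```

Because $t$ enters as a raw extra coordinate, the network has no geometric prior on how time displaces the scene, which is exactly the deficiency noted in that bullet.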
3. How
The input data contain images, represented as $\{\mathcal{I}_i, t_i, \mathbf{E}_i, \mathbf{I}_i \mid i = 1, 2, \ldots, N_c\}$ (with camera extrinsics $\mathbf{E}_i$ and intrinsics $\mathbf{I}_i$), and LiDAR point clouds, represented as $\{(x_i, y_i, z_i, t_i) \mid i = 1, 2, \ldots, N_l\}$. The rendering process can be represented as $\hat{\mathcal{I}} = \mathcal{F}_{\theta}(\mathbf{E}_o, \mathbf{I}_o, t)$, which produces the image at any timestamp $t$ and camera pose $(\mathbf{E}_o, \mathbf{I}_o)$.
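Written as code, the inputs and the rendering interface look roughly like this; the class and function names are my own placeholders, not identifiers from the paper:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Frame:
    """One camera observation (I_i, t_i, E_i, I_i)."""
    image: np.ndarray        # I_i: (H, W, 3) RGB image
    timestamp: float         # t_i
    extrinsics: np.ndarray   # E_i: (4, 4) camera pose
    intrinsics: np.ndarray   # I_i: (3, 3) calibration matrix

# The LiDAR input is simply an (N_l, 4) array whose rows are (x_i, y_i, z_i, t_i).

def render(theta, E_o, I_o, t):
    """I_hat = F_theta(E_o, I_o, t): render an image from an arbitrary
    camera pose (E_o, I_o) at an arbitrary timestamp t.
    Placeholder signature only; the model itself is not sketched here."""
    ...
```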