Title: "Multi-View Human Mesh Recovery with Segmentation Masks and 3D Key-point Guidance"
Abstract:
This paper addresses the challenge of accurately recovering 3D human meshes from multi-view images, particularly in the presence of occlusions. We propose a novel framework that leverages the complementary information in segmentation masks and 3D key-points to enhance reconstruction. Our approach integrates a multi-view convolutional neural network (CNN) with a graph convolutional network (GCN) to fuse the multi-modal input data: the CNN extracts features from the multi-view images and segmentation masks, while the GCN incorporates the 3D key-point information to refine the mesh topology and geometry. We further introduce a loss function that combines a multi-view consistency term, a segmentation mask alignment term, and a 3D key-point distance term to ensure accurate and robust mesh recovery. Extensive experiments on benchmark datasets demonstrate that our method outperforms state-of-the-art approaches in both accuracy and robustness to occlusions. Our approach has significant implications for applications such as human-computer interaction, virtual reality, and 3D animation.
Keywords: 3D human mesh recovery, multi-view images, segmentation masks, 3D key-points, convolutional neural networks, graph convolutional networks, occlusion handling.
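To make the composite loss described in the abstract concrete, the following is a minimal PyTorch sketch of how a multi-view consistency term, a segmentation mask alignment term, and a 3D key-point distance term might be combined. All tensor shapes, the weights `w_mv`/`w_mask`/`w_kp`, and the assumption that per-view mesh silhouettes have already been rendered are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of the three-term loss described in the abstract.
# Tensor shapes, loss weights, and the availability of pre-rendered
# silhouettes are assumptions for illustration, not the paper's
# actual implementation.
import torch


def multi_view_consistency(per_view_verts: torch.Tensor) -> torch.Tensor:
    """per_view_verts: (V, N, 3) mesh vertices estimated independently
    from each of V views, expressed in a shared world frame.
    Penalizes cross-view disagreement against the mean estimate."""
    mean_verts = per_view_verts.mean(dim=0, keepdim=True)
    return ((per_view_verts - mean_verts) ** 2).mean()


def mask_alignment(rendered_sil: torch.Tensor,
                   seg_masks: torch.Tensor) -> torch.Tensor:
    """rendered_sil, seg_masks: (V, H, W) soft silhouettes in [0, 1].
    Per-pixel L1 between the rendered mesh silhouette and the
    observed segmentation mask in each view."""
    return (rendered_sil - seg_masks).abs().mean()


def keypoint_distance(pred_joints: torch.Tensor,
                      gt_joints: torch.Tensor) -> torch.Tensor:
    """pred_joints, gt_joints: (J, 3). Mean Euclidean joint error."""
    return torch.linalg.norm(pred_joints - gt_joints, dim=-1).mean()


def total_loss(per_view_verts, rendered_sil, seg_masks,
               pred_joints, gt_joints,
               w_mv: float = 1.0, w_mask: float = 1.0,
               w_kp: float = 1.0) -> torch.Tensor:
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_mv * multi_view_consistency(per_view_verts)
            + w_mask * mask_alignment(rendered_sil, seg_masks)
            + w_kp * keypoint_distance(pred_joints, gt_joints))
```

Note that backpropagating through the mask term requires a differentiable silhouette (e.g., via a soft rasterizer); the sketch assumes that step has already produced `rendered_sil`.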
TOC:
- 1 Introduction
  - 1.1 Background and Motivation
    - 1.1.1 3D Human Mesh Recovery in Computer Vision
    - 1.1.2 Applications and Challenges
  - 1.2 Problem Statement and Challenges
    - 1.2.1 Limitations of Existing Methods
    - 1.2.2 Occlusion Handling
  - 1.3 Proposed Approach and Contributions
    - 1.3.1 Multi-view Fusion with Segmentation Masks
    - 1.3.2 3D Key-point Guidance
    - 1.3.3 Novel Loss Function
  - 1.4 Paper Organization
- 2 Related Work
  - 2.1 3D Human Mesh Recovery from Multi-view Images
    - 2.1.1 Volumetric Methods
    - 2.1.2 Model-based Methods
  - 2.2 Segmentation Masks for Human Mesh Refinement
    - 2.2.1 Mask-guided Feature Extraction
    - 2.2.2 Mesh Deformation with Mask Constraints
  - 2.3 3D Key-point Guidance for Pose and Shape Estimation
    - 2.3.1 Key-point-based Pose Estimation
    - 2.3.2 Shape Reconstruction from Key-points
- 3 Proposed Method
  - 3.1 Multi-view CNN for Feature Extraction
    - 3.1.1 Network Architecture
    - 3.1.2 Feature Fusion across Views
  - 3.2 GCN for 3D Key-point Integration
    - 3.2.1 Graph Construction
    - 3.2.2 Message Passing and Feature Update
  - 3.3 Fusion Module for Multi-modal Data Aggregation
    - 3.3.1 Feature Concatenation and Attention
  - 3.4 Loss Function Design
    - 3.4.1 Multi-view Consistency Loss
    - 3.4.2 Segmentation Mask Alignment Loss
    - 3.4.3 3D Key-point Distance Loss
- 4 Experiments
  - 4.1 Datasets and Evaluation Metrics
    - 4.1.1 Human3.6M Dataset
    - 4.1.2 CMU Panoptic Dataset
    - 4.1.3 Evaluation Metrics (MPJPE, PA-MPJPE, MPVE)
  - 4.2 Implementation Details
    - 4.2.1 Training Setup and Hyperparameters
    - 4.2.2 Data Augmentation
  - 4.3 Quantitative Results and Analysis
    - 4.3.1 Comparison with State-of-the-art Methods
    - 4.3.2 Performance under Different Occlusion Levels
  - 4.4 Qualitative Results and Visualization
    - 4.4.1 Visual Comparison of Reconstructed Meshes
- 5 Discussion
  - 5.1 Ablation Studies on Different Components
    - 5.1.1 Effect of Segmentation Masks
    - 5.1.2 Impact of 3D Key-point Guidance
  - 5.2 Limitations and Future Work
    - 5.2.1 Generalization to Unseen Poses and Shapes
    - 5.2.2 Real-time Performance
- 6 Conclusion
