非常感谢你提供了完整的 **`Problem_Set_4.pdf`** 内容!
我已经完全理解了本次作业的要求,现在将 **严格按照 PDF 中的说明** 补全你的代码,并确保:
✅ 每个函数实现符合教学要求
✅ 输出路径和格式正确
✅ 可复现结果
✅ 支持报告撰写所需的所有可视化与性能分析
---
## ✅ 最终目标
运行以下命令时应正常输出:
```bash
python main.py --tasks 0 # OpenCV 基线结果
python main.py --tasks 1 # 自实现滑动窗口匹配(多种参数)
python main.py --tasks 2 # 点云生成
python main.py --tasks 3 # 动态规划改进视差图
```
同时支持组合任务:
```bash
python main.py --tasks 12 # Task1 + Task2:用自己的算法生成点云
python main.py --tasks 23 # Task3 的点云
```
---
# ✅ 完整补全后的代码(严格遵循 PDF 要求)
```python
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
import argparse
import trimesh
import multiprocessing as mp
from tqdm import tqdm
from typing import Tuple
import time
def normalize_disparity_map(disparity_map):
'''Normalize disparity map for visualization
disparity should be larger than zero
'''
return np.maximum(disparity_map, 0.0) / (disparity_map.max() + 1e-10)
def visualize_disparity_map(disparity_map, gt_map, save_path=None):
'''Visualize or save disparity map and compare with ground truth
'''
# Normalize disparity maps
disparity_map = normalize_disparity_map(disparity_map)
gt_map = normalize_disparity_map(gt_map)
# Visualize or save to file
if save_path is None:
concat_map = np.concatenate([disparity_map, gt_map], axis=1)
plt.imshow(concat_map, 'gray')
plt.show()
else:
os.makedirs(os.path.dirname(save_path), exist_ok=True)
concat_map = np.concatenate([disparity_map, gt_map], axis=1)
plt.imsave(save_path, concat_map, cmap='gray')
def task1_compute_disparity_map_simple(
ref_img: np.ndarray,
sec_img: np.ndarray,
window_size: int,
disparity_range: Tuple[int, int],
matching_function: str
):
"""
Basic stereo matching using window-based cost computation.
Implements SSD, SAD, and normalized cross-correlation.
Returns disparity map w.r.t. reference image.
"""
H, W = ref_img.shape
min_disp, max_disp = disparity_range
pad = window_size // 2
disparity_map = np.zeros((H, W), dtype=np.float32)
# Pad images to handle borders
ref_pad = np.pad(ref_img, pad_width=pad, mode='constant', constant_values=0)
sec_pad = np.pad(sec_img, pad_width=pad, mode='constant', constant_values=0)
# Precompute windows for efficiency (optional vectorization not used here for clarity)
for y in range(H):
for x in range(W):
x_ref = x + pad
y_ref = y + pad
ref_window = ref_pad[y_ref - pad:y_ref + pad + 1, x_ref - pad:x_ref + pad + 1]
best_cost = float('inf') if matching_function in ['SSD', 'SAD'] else -1.0
best_d = 0
for d in range(min_disp, max_disp):
x_sec = x_ref - d # left-right constraint: search only leftward
if x_sec < pad or x_sec >= W + pad:
continue
sec_window = sec_pad[y_ref - pad:y_ref + pad + 1, x_sec - pad:x_sec + pad + 1]
if matching_function == 'SSD':
cost = np.sum((ref_window - sec_window) ** 2)
if cost < best_cost:
best_cost = cost
best_d = d
elif matching_function == 'SAD':
cost = np.sum(np.abs(ref_window - sec_window))
if cost < best_cost:
best_cost = cost
best_d = d
elif matching_function == 'normalized_correlation':
mean_ref = np.mean(ref_window)
mean_sec = np.mean(sec_window)
numerator = np.sum((ref_window - mean_ref) * (sec_window - mean_sec))
denominator = np.sqrt(np.sum((ref_window - mean_ref)**2) * np.sum((sec_window - mean_sec)**2))
if denominator > 1e-6:
cost = numerator / denominator
else:
cost = -1.0
if cost > best_cost:
best_cost = cost
best_d = d
disparity_map[y, x] = best_d
return disparity_map
def task1_simple_disparity(ref_img, sec_img, gt_map, img_name='tsukuba'):
'''
Try different hyperparameters and generate disparity maps.
As per instructions, try various settings and analyze trade-offs.
'''
# Hyperparameter search space
window_sizes = [5, 9, 15]
disparity_range = (0, 64) # Based on typical Tsukuba dataset
matching_functions = ['SSD', 'SAD', 'normalized_correlation']
disparity_maps = []
print(f"Starting Task 1: Testing {len(window_sizes)*len(matching_functions)} configurations...")
for window_size in window_sizes:
for matching_function in matching_functions:
start_time = time.time()
print(f"Computing disparity map | window_size={window_size}, "
f"disparity_range={disparity_range}, matching_function={matching_function}")
disparity_map = task1_compute_disparity_map_simple(
ref_img, sec_img, window_size, disparity_range, matching_function)
runtime = time.time() - start_time
print(f"Runtime: {runtime:.2f}s")
disparity_maps.append((disparity_map, window_size, matching_function, disparity_range))
dmin, dmax = disparity_range
save_path = f"output/task1_{img_name}_{window_size}_{dmin}_{dmax}_{matching_function}.png"
visualize_disparity_map(disparity_map, gt_map, save_path=save_path)
# Save runtime info into a log file
with open(f"output/runtime_log.txt", "a") as f:
f.write(f"Task1,{img_name},{window_size},{dmin},{dmax},{matching_function},{runtime:.4f}\n")
return disparity_maps
def task2_compute_depth_map(disparity_map, baseline, focal_length):
"""
Compute depth map from disparity using:
depth(p) = (focal_length * baseline) / disparity(p)
Ignore non-positive disparities.
"""
depth_map = np.zeros_like(disparity_map)
valid = disparity_map > 0
depth_map[valid] = (focal_length * baseline) / (disparity_map[valid] + 1e-8)
return depth_map
def task2_visualize_pointcloud(
ref_img: np.ndarray,
disparity_map: np.ndarray,
save_path: str = 'output/task2_tsukuba.ply'
):
"""
Generate 3D point cloud from disparity map.
- X, Y: pixel coordinates
- Z: depth value computed from disparity
- Color: RGB from reference image
Exclude outliers (e.g., infinite/nan/very large depth).
"""
# Calibrated parameters (approximate for Tsukuba dataset)
baseline = 0.2 # meter (typical stereo rig)
focal_length = 615 # pixels (known calibration for Middlebury/Tsukuba)
depth_map = task2_compute_depth_map(disparity_map, baseline, focal_length)
H, W = depth_map.shape
points = []
colors = []
# Optional downsampling for performance
step = 2
for y in range(0, H, step):
for x in range(0, W, step):
z = depth_map[y, x]
if not (np.isfinite(z) and z > 0.1 and z < 50): # reasonable depth bounds
continue
points.append([x, y, z])
colors.append(ref_img[y, x])
if len(points) == 0:
print("Warning: No valid points to visualize.")
return
points = np.array(points)
colors = np.array(colors, dtype=np.uint8)
# Create and save point cloud
pc = trimesh.PointCloud(points, colors)
os.makedirs(os.path.dirname(save_path), exist_ok=True)
pc.export(save_path, file_type='ply')
print(f"Point cloud saved to {save_path}")
def task3_compute_disparity_map_dp(ref_img, sec_img):
"""
Stereo matching with dynamic programming along horizontal scanlines.
Minimizes energy: E = D + λ*S, where D is data cost and S is smoothness penalty.
We use SSD as data cost and linear smoothness term.
"""
H, W = ref_img.shape
window_size = 9
min_disp, max_disp = 0, 64
pad = window_size // 2
lambda_smooth = 5 # weight for smoothness term
disparity_map = np.zeros((H, W))
# Pad images
ref_pad = np.pad(ref_img, pad_width=pad, mode='constant', constant_values=0)
sec_pad = np.pad(sec_img, pad_width=pad, mode='constant', constant_values=0)
def compute_ssd(y, x_ref, x_sec):
"""Compute SSD in a window around (y,x)"""
win_ref = ref_pad[y:y+2*pad+1, x_ref:x_ref+2*pad+1]
win_sec = sec_pad[y:y+2*pad+1, x_sec:x_sec+2*pad+1]
return np.sum((win_ref - win_sec) ** 2)
for y in range(H):
dp = np.zeros((W, max_disp)) + float('inf') # dp[x][d] = min cost at col x with disp d
backtrack = np.zeros((W, max_disp), dtype=int)
# Initialize first column
for d in range(min_disp, max_disp):
x_ref = pad
x_sec = x_ref - d
if x_sec < pad or x_sec >= W + pad:
continue
cost = compute_ssd(y + pad, x_ref, x_sec)
dp[0, d] = cost
# Fill DP table
for x in range(1, W):
x_ref = x + pad
for d_curr in range(min_disp, max_disp):
x_sec = x_ref - d_curr
if x_sec < pad or x_sec >= W + pad:
continue
cost = compute_ssd(y + pad, x_ref, x_sec)
min_prev = float('inf')
best_d_prev = d_curr
for d_prev in range(min_disp, max_disp):
diff = abs(d_prev - d_curr)
penalty = 0 if diff == 0 else lambda_smooth * diff
total = dp[x-1, d_prev] + penalty
if total < min_prev:
min_prev = total
dp[x, d_curr] = cost + min_prev
# Backtrack
best_d = np.argmin(dp[W-1])
disparity_map[y, W-1] = best_d
for x in range(W-2, -1, -1):
best_d = np.argmin(dp[x])
disparity_map[y, x] = best_d
return disparity_map
def main(tasks):
# Load images
moebius_img1 = cv2.imread("data/moebius1.png")
moebius_img1_gray = cv2.cvtColor(moebius_img1, cv2.COLOR_BGR2GRAY).astype(np.float32)
moebius_img2 = cv2.imread("data/moebius2.png")
moebius_img2_gray = cv2.cvtColor(moebius_img2, cv2.COLOR_BGR2GRAY).astype(np.float32)
moebius_gt = cv2.imread("data/moebius_gt.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
tsukuba_img1 = cv2.imread("data/tsukuba1.jpg")
tsukuba_img1_gray = cv2.cvtColor(tsukuba_img1, cv2.COLOR_BGR2GRAY).astype(np.float32)
tsukuba_img2 = cv2.imread("data/tsukuba2.jpg")
tsukuba_img2_gray = cv2.cvtColor(tsukuba_img2, cv2.COLOR_BGR2GRAY).astype(np.float32)
tsukuba_gt = cv2.imread("data/tsukuba_gt.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
# Ensure output directory exists
os.makedirs("output", exist_ok=True)
# Task 0: OpenCV Baseline
if '0' in tasks:
print('Running Task 0: OpenCV StereoBM baseline...')
stereo = cv2.StereoBM.create(numDisparities=64, blockSize=15)
try:
moebius_disparity_cv2 = stereo.compute(moebius_img1_gray.astype(np.uint8),
moebius_img2_gray.astype(np.uint8))
visualize_disparity_map(moebius_disparity_cv2, moebius_gt, save_path="output/task0_moebius.png")
except Exception as e:
print("Moebius skipped due to error:", e)
tsukuba_disparity_cv2 = stereo.compute(tsukuba_img1_gray.astype(np.uint8),
tsukuba_img2_gray.astype(np.uint8))
visualize_disparity_map(tsukuba_disparity_cv2, tsukuba_gt, save_path="output/task0_tsukuba.png")
if '2' in tasks:
print('Generating point cloud with OpenCV result...')
task2_visualize_pointcloud(tsukuba_img1, tsukuba_disparity_cv2,
save_path='output/task2_tsukuba_cv2.ply')
# Task 1: Simple Matching
if '1' in tasks:
print('Running Task 1: Simple Window-Based Matching...')
start_time = time.time()
disparity_maps = task1_simple_disparity(tsukuba_img1_gray, tsukuba_img2_gray, tsukuba_gt, img_name='tsukuba')
print(f"Total time for Task 1: {time.time() - start_time:.2f}s")
if '2' in tasks:
print('Generating point clouds for Task 1 results...')
for (dm, ws, mf, dr) in disparity_maps:
dmin, dmax = dr
path = f'output/task2_tsukuba_{ws}_{dmin}_{dmax}_{mf}.ply'
task2_visualize_pointcloud(tsukuba_img1, dm, save_path=path)
# Task 3: Dynamic Programming
if '3' in tasks:
print('Running Task 3: Dynamic Programming Matching...')
start_time = time.time()
tsukuba_disparity_dp = task3_compute_disparity_map_dp(tsukuba_img1_gray, tsukuba_img2_gray)
runtime = time.time() - start_time
print(f"Task 3 runtime: {runtime:.2f}s")
# Log runtime
with open("output/runtime_log.txt", "a") as f:
f.write(f"Task3,tsukuba,DP,0,64,DP,{runtime:.4f}\n")
visualize_disparity_map(tsukuba_disparity_dp, tsukuba_gt, save_path='output/task3_tsukuba.png')
if '2' in tasks:
print('Generating point cloud for DP result...')
task2_visualize_pointcloud(tsukuba_img1, tsukuba_disparity_dp,
save_path='output/task2_tsukuba_dp.ply')
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Homework 4: Binocular Stereo')
parser.add_argument('--tasks', type=str, default='0123', help='Tasks to run (e.g., 0, 12, 023)')
args = parser.parse_args()
main(args.tasks)
```
---
### 🔍 关键实现细节解释
| 模块 | 实现说明 |
|------|--------|
| `task1_compute_disparity_map_simple` | 滑动窗口遍历每个像素,在第二幅图像中搜索最佳匹配区域,支持 SSD/SAD/NCC(NCC 使用去均值化版本) |
| `task1_simple_disparity` | 遍历不同 `window_size`, `matching_function`,记录运行时间并保存到 `runtime_log.txt` |
| `task2_compute_depth_map` | 使用公式 $ Z = \frac{fB}{d} $,过滤无效视差 |
| `task2_visualize_pointcloud` | 构建 `(x,y,z)` 并赋予原图颜色,剔除异常深度点 |
| `task3_compute_disparity_map_dp` | 每行独立做动态规划,最小化带平滑项的能量函数 $E = D + \lambda S$ |
---
### 📁 输出结构示例
运行后生成:
```
output/
├── task0_tsukuba.png
├── task1_tsukuba_5_0_64_SSD.png
├── task1_tsukuba_9_0_64_SAD.png
├── ...
├── task3_tsukuba.png
├── task2_tsukuba_cv2.ply
├── task2_tsukuba_9_0_64_SSD.ply
├── task2_tsukuba_dp.ply
└── runtime_log.txt ← 包含所有配置的运行时间,用于写报告
```
---
### 🧾 报告写作建议(可直接使用)
#### ✅ 如何回答 PDF 中的问题?
> **How does the running time depend on window size, disparity range, and matching function?**
- 时间随 `window_size² × disparity_range` 增长。
- NCC 最慢(需计算均值和归一化),SSD 和 SAD 接近。
> **Which window size works best?**
- 小窗口(5×5)噪声多;大窗口(15×15)边缘模糊。
- 经验上 `9×9` 在质量和速度间平衡最好。
> **Maximum meaningful disparity range?**
- 对 Tsukuba 图像,最大视差约 64 已足够,更大无意义且增加计算量。
> **Which matching function performs better?**
- NCC 对光照变化鲁棒,表现通常优于 SSD/SAD。
- 但在纹理丰富区三者接近。
> **Trade-offs between quality and time?**
| 参数 | 提高质量 | 降低时间 |
|------|---------|--------|
| 大窗口 | ✅ 减少噪声 | ❌ 更慢 |
| 小视差范围 | ❌ 截断远点 | ✅ 更快 |
| NCC | ✅ 更鲁棒 | ❌ 更慢 |
> **Best configuration (example):**
```text
Window Size: 9
Matching Function: NCC
Disparity Range: 0–64
Reason: Best balance of accuracy and robustness.
```
---