本文将介绍地理空间数据融合的关键方法,包括多源数据对齐、时空数据融合、异构数据集成和质量评估,并提供完整的Python实现代码。

一、多源空间数据对齐

1.1 坐标系转换与统一
import geopandas as gpd
import pyproj
from shapely.ops import transform

def unify_coordinate_system(gdf, target_crs='EPSG:4326'):
    """统一坐标系到目标CRS"""
    if gdf.crs is None:
        raise ValueError("输入数据没有定义CRS")
    
    if gdf.crs != target_crs:
        return gdf.to_crs(target_crs)
    return gdf

def transform_geometry(geom, source_crs, target_crs):
    """单个几何对象的坐标转换"""
    project = pyproj.Transformer.from_crs(
        source_crs, target_crs, always_xy=True
    ).transform
    return transform(project, geom)

# 使用示例
if __name__ == "__main__":
    # 加载不同坐标系的数据
    gdf_wgs84 = gpd.read_file('data_wgs84.shp')  # EPSG:4326
    gdf_utm = gpd.read_file('data_utm.shp')     # EPSG:32650
    
    # 统一坐标系
    gdf_utm_unified = unify_coordinate_system(gdf_utm)
    print(f"原始CRS: {gdf_utm.crs}")
    print(f"转换后CRS: {gdf_utm_unified.crs}")
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
1.2 空间数据匹配与关联
from sklearn.neighbors import NearestNeighbors

def spatial_join_nearest(source_gdf, target_gdf, max_distance=100):
    """基于最近邻的空间连接"""
    # 确保坐标系一致
    target_gdf = unify_coordinate_system(target_gdf, source_gdf.crs)
    
    # 提取坐标数组
    source_coords = np.array([[p.x, p.y] for p in source_gdf.geometry])
    target_coords = np.array([[p.x, p.y] for p in target_gdf.geometry])
    
    # 构建最近邻模型
    nbrs = NearestNeighbors(n_neighbors=1, algorithm='ball_tree').fit(target_coords)
    distances, indices = nbrs.kneighbors(source_coords)
    
    # 筛选在阈值范围内的匹配
    mask = distances[:,0] <= max_distance
    matched = source_gdf[mask].copy()
    matched.reset_index(drop=True, inplace=True)
    
    # 添加匹配到的属性
    matched_target = target_gdf.iloc[indices[mask,0]].reset_index(drop=True)
    for col in matched_target.columns:
        if col != 'geometry':
            matched[col] = matched_target[col]
    
    return matched

# 使用示例
if __name__ == "__main__":
    points = gpd.read_file('points.shp')
    polygons = gpd.read_file('polygons.shp')
    
    # 执行空间连接
    joined = spatial_join_nearest(points, polygons)
    print(f"匹配到的点数: {len(joined)}/{len(points)}")
  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.