Exploring Topology Rule Checking

 

The June 14 meeting brought another major change to my work.

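More than half the week is gone. I have fixed the topology errors; next I need to focus on topology rule checking and geometry data checking, and implement both myself. I am not sure whether two weeks will be enough to finish everything.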

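I have excerpted some material from the ArcGIS 9.2 help for reference; surprisingly, I could find hardly any Chinese-language literature on the subject.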

ArcToolbox banner

Topology Rules by Origin Feature Class

Adds a new rule to a topology.

There are many topology rules that you can add to your topology. The rules you choose depend on the spatial relationships that are most important to maintain for the feature classes that participate in the topology.

Some topology rules govern the relationships of features within a given feature class, while others govern the relationships between features in two different feature classes. Topology rules can also be defined between subtypes of features in one or another feature class.

Topology rules are organized based on the geometry type of the origin feature class (polygon, line, or point). The origin feature class is the first feature class specified when adding the rule.

Polygon Rules

· Must Not Overlap—This rule requires that the interior of polygons in the feature class not overlap. The polygons can share edges or vertices. This rule is used when an area cannot belong to two or more polygons. It is useful for modeling administrative boundaries, such as ZIP Codes or voting districts, and mutually exclusive area classifications, such as land cover or landform type.

· Must Not Have Gaps—This rule requires that polygons not have voids within themselves or between adjacent polygons. Polygons can share edges, vertices, or interior areas. Polygons can also be completely disconnected. This rule is used when polygons or blocks of contiguous polygons should not have empty spaces within them. It is useful for modeling land ownership, as in a parcel fabric, where a given area is completely allotted to various polygons, but where external areas (roadways, for example) are not modeled in the same feature class.

· Must Not Overlap With—This rule requires that the interior of polygons in one feature class must not overlap with the interior of polygons in another feature class. Polygons of the two feature classes can share edges or vertices or be completely disjointed. This rule is used when an area cannot belong to two separate feature classes. It is useful for combining two mutually exclusive systems of area classification, such as zoning and water-body type, where areas defined within the zoning class cannot also be defined in the water body class and vice versa.

· Must Be Covered By Feature Class Of—This rule requires that a polygon in one feature class must share all of its area with polygons in another feature class. An area in the first feature class that is not covered by polygons from the other feature class is an error. This rule is used when an area of one type, such as a state, should be completely covered by areas of another type, such as counties.

· Must Cover Each Other—This rule requires that the polygons of one feature class must share all of their area with the polygons of another feature class. Polygons may share edges or vertices. Any area defined in either feature class that is not shared with the other is an error. This rule is used when two systems of classification are used for the same geographic area and any given point defined in one system must also be defined in the other. One such case occurs with nested hierarchical datasets, such as census blocks and block groups or small watersheds and large drainage basins. The rule can also be applied to nonhierarchically related polygon feature classes, such as soil type and slope class.

· Must Be Covered By—This rule requires that polygons of one feature class must be contained within polygons of another feature class. Polygons may share edges or vertices. Any area defined in the contained feature class must be covered by an area in the covering feature class. This rule is used when area features of a given type must be located within features of another type. This rule is useful when modeling areas that are subsets of a larger surrounding area, such as management units within forests or blocks within block groups.

· Boundary Must Be Covered By—This rule requires that boundaries of polygon features must be covered by lines in another feature class. This rule is used when area features need to have line features that mark the boundaries of the areas. This is usually when the areas have one set of attributes and their boundaries have other attributes. For example, parcels might be stored in the geodatabase along with their boundaries. Each parcel might be defined by one or more line features that store information about their length or the date surveyed, and every parcel should exactly match its boundaries.

· Area Boundary Must Be Covered By Boundary Of—This rule requires that boundaries of polygon features in one feature class be covered by boundaries of polygon features in another feature class. This is useful when polygon features in one feature class, such as subdivisions, are composed of multiple polygons in another class, such as parcels, and the shared boundaries must be aligned.

· Contains Point—This rule requires that a polygon in one feature class contain at least one point from another feature class. Points must be within the polygon, not on the boundary. This is useful when every polygon should have at least one associated point, such as when parcels must have an address point.
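As a rough illustration of how one of the polygon rules could be checked by hand, the sketch below tests Contains Point with a ray-casting point-in-polygon test. It is only a sketch under stated assumptions, not how ArcGIS implements the rule: polygons are single rings without holes, coordinates are compared with a small numeric tolerance `eps`, and the names `strictly_inside` and `contains_point_errors` are made up for this example.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]  # vertex list of a simple polygon, no holes (assumption)


def on_segment(p: Point, a: Point, b: Point, eps: float = 1e-9) -> bool:
    """True if p lies on the segment a-b, within a crude tolerance eps."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if abs(cross) > eps:
        return False
    return (min(a[0], b[0]) - eps <= p[0] <= max(a[0], b[0]) + eps
            and min(a[1], b[1]) - eps <= p[1] <= max(a[1], b[1]) + eps)


def strictly_inside(p: Point, ring: Ring) -> bool:
    """Ray casting; a point on the boundary counts as outside, as the rule demands."""
    inside = False
    n = len(ring)
    for i in range(n):
        a, b = ring[i], ring[(i + 1) % n]
        if on_segment(p, a, b):
            return False
        # Does this edge straddle the horizontal ray going right from p?
        if (a[1] > p[1]) != (b[1] > p[1]):
            x_cross = a[0] + (p[1] - a[1]) * (b[0] - a[0]) / (b[1] - a[1])
            if x_cross > p[0]:
                inside = not inside
    return inside


def contains_point_errors(polygons: Dict[str, Ring], points: List[Point]) -> List[str]:
    """Return the IDs of polygons that have no point in their interior."""
    return [pid for pid, ring in polygons.items()
            if not any(strictly_inside(pt, ring) for pt in points)]


# Hypothetical data: parcel B only has a point on its boundary, so it is an error.
parcels = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(10, 0), (14, 0), (14, 4), (10, 4)],
}
address_points = [(2, 2), (10, 2)]
print(contains_point_errors(parcels, address_points))
```

The same `strictly_inside` test could also serve the point rule Must Be Properly Inside Polygons by flagging the points instead of the polygons.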

Line Rules

· Must Not Overlap—This rule requires that lines not overlap with lines in the same feature class. This rule is used where line segments should not be duplicated; for example, in a stream feature class. Lines can cross or intersect but cannot share segments.

· Must Not Intersect—This rule requires that line features from the same feature class not cross or overlap each other. Lines can share endpoints. This rule is used for contour lines that should never cross each other or in cases where the intersection of lines should only occur at endpoints, such as street segments and intersections.

· Must Not Have Dangles—This rule requires that a line feature must touch lines from the same feature class at both endpoints. An endpoint that is not connected to another line is called a dangle. This rule is used when line features must form closed loops, such as when they are defining the boundaries of polygon features. It may also be used in cases where lines typically connect to other lines, as with streets. In this case, exceptions can be used where the rule is occasionally violated, as with cul-de-sac or dead end street segments.

· Must Not Have Pseudo-Nodes—This rule requires that a line connect to at least two other lines at each endpoint. Lines that connect to one other line (or to themselves) are said to have pseudo-nodes. This rule is used where line features must form closed loops, such as when they define the boundaries of polygons or when line features logically must connect to two other line features at each end, as with segments in a stream network, with exceptions being marked for the originating ends of first-order streams.

· Must Not Intersect Or Touch Interior—This rule requires that a line in one feature class must only touch other lines of the same feature class at endpoints. Any line segment in which features overlap, or any intersection not at an endpoint, is an error. This rule is useful where lines must only be connected at endpoints, such as in the case of lot lines, which must split (only connect to the endpoints of) back lot lines, and which cannot overlap each other.

· Must Not Overlap With—This rule requires that a line from one feature class not overlap with line features in another feature class. This rule is used when line features cannot share the same space. For example, roads must not overlap with railroads or depression subtypes of contour lines cannot overlap with other contour lines.

· Must Be Covered By Feature Class Of—This rule requires that lines from one feature class must be covered by the lines in another feature class. This is useful for modeling logically different but spatially coincident lines, such as routes and streets. A bus route feature class must not depart from the streets defined in the street feature class.

· Must Be Covered By Boundary Of—This rule requires that lines be covered by the boundaries of area features. This is useful for modeling lines, such as lot lines, that must coincide with the edge of polygon features, such as lots.

· Endpoint Must Be Covered By—This rule requires that the endpoints of line features must be covered by point features in another feature class. This is useful for modeling cases where a fitting must connect two pipes or a street intersection must be found at the junction of two streets.

· Must Not Self Overlap—This rule requires that line features not overlap themselves. They can cross or touch themselves, but must not have coincident segments. This rule is useful for features like streets, where segments might touch in a loop, but where the same street should not follow the same course twice.

· Must Not Self Intersect—This rule requires that line features not cross or overlap themselves. This rule is useful for lines, such as contour lines, that cannot cross themselves.

· Must Be Single Part—This rule requires that lines must have only one part. This rule is useful where line features, such as highways, may not have multiple parts.
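Two of the line rules above, Must Not Have Dangles and Must Not Have Pseudo-Nodes, come down to counting how many line ends meet at each endpoint. The following is a minimal sketch of that idea. It assumes that lines connect only endpoint to endpoint and that coincident endpoints have exactly equal coordinates; a real topology would snap coordinates within a cluster tolerance and would also accept an endpoint that touches the interior of another line. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def endpoint_valence(lines: Dict[str, Sequence[Point]]) -> Counter:
    """Count how many line ends meet at each endpoint coordinate."""
    valence: Counter = Counter()
    for coords in lines.values():
        valence[coords[0]] += 1   # "from" end of the line
        valence[coords[-1]] += 1  # "to" end of the line
    return valence


def find_dangles(lines: Dict[str, Sequence[Point]]) -> List[Point]:
    """Endpoints touched by a single line end only."""
    return sorted(pt for pt, n in endpoint_valence(lines).items() if n == 1)


def find_pseudo_nodes(lines: Dict[str, Sequence[Point]]) -> List[Point]:
    """Endpoints where exactly two line ends meet (one other line, or the line itself)."""
    return sorted(pt for pt, n in endpoint_valence(lines).items() if n == 2)


# Hypothetical street segments: a-b form a chain, c and d branch off at (2, 0).
streets = {
    "a": [(0, 0), (1, 0)],
    "b": [(1, 0), (2, 0)],
    "c": [(2, 0), (2, 1)],
    "d": [(2, 0), (3, 0)],
}
print("dangles:", find_dangles(streets))
print("pseudo-nodes:", find_pseudo_nodes(streets))
```

Endpoints reported here are only candidates; as the rule descriptions note, legitimate cases such as dead-end streets would be marked as exceptions rather than fixed.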

Point Rules

· Must Be Covered By Boundary Of—This rule requires that points fall on the boundaries of area features. This is useful when the point features help support the boundary system, such as boundary markers, which must be found on the edges of certain areas.

· Must Be Properly Inside Polygons—This rule requires that points fall within area features. This is useful when the point features are related to polygons, such as wells and well pads or address points and parcels.

· Must Be Covered By Endpoint Of—This rule requires that points in one feature class must be covered by the endpoints of lines in another feature class. This rule is similar to the line rule, "Endpoint Must Be Covered By", except that, in cases where the rule is violated, it is the point feature that is marked as an error, rather than the line. Boundary corner markers might be constrained to be covered by the endpoints of boundary lines.

· Must Be Covered By Line—This rule requires that points in one feature class must be covered by lines in another feature class. It does not constrain the covering portion of the line to be an endpoint. This rule is useful for points that fall along a set of lines, such as highway signs along highways.
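The difference between Must Be Covered By Line and Must Be Covered By Endpoint Of can be sketched with a point-on-segment test. This is an illustrative sketch only: lines are plain vertex lists, `tol` is a crude numeric tolerance rather than a real cluster tolerance, and all function names are invented for the example.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Line = Sequence[Point]


def point_on_segment(p: Point, a: Point, b: Point, tol: float = 1e-9) -> bool:
    """True if p is collinear with a-b and falls within the segment's extent."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if abs(cross) > tol:
        return False
    return (min(a[0], b[0]) - tol <= p[0] <= max(a[0], b[0]) + tol
            and min(a[1], b[1]) - tol <= p[1] <= max(a[1], b[1]) + tol)


def covered_by_line(p: Point, lines: List[Line]) -> bool:
    """Must Be Covered By Line: p may lie anywhere along any line."""
    return any(point_on_segment(p, a, b)
               for coords in lines
               for a, b in zip(coords, coords[1:]))


def covered_by_endpoint(p: Point, lines: List[Line]) -> bool:
    """Must Be Covered By Endpoint Of: p must coincide with a line end."""
    return any(p == coords[0] or p == coords[-1] for coords in lines)


def point_rule_errors(points: Dict[str, Point], lines: List[Line]) -> Dict[str, List[str]]:
    """Return, per rule, the IDs of the points that violate it."""
    return {
        "Must Be Covered By Line":
            [pid for pid, p in points.items() if not covered_by_line(p, lines)],
        "Must Be Covered By Endpoint Of":
            [pid for pid, p in points.items() if not covered_by_endpoint(p, lines)],
    }


# Hypothetical data: a sign in mid-segment, a marker on a line end, a stray point.
roads = [[(0, 0), (10, 0)], [(10, 0), (10, 5)]]
features = {"sign": (5, 0), "marker": (10, 5), "stray": (3, 3)}
print(point_rule_errors(features, roads))
```

In both rules it is the point feature that is reported as the error, which matches the description of Must Be Covered By Endpoint Of above.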


Check Geometry (Data Management)

Inspects each feature's geometry for problems. Valid input formats are shapefile and feature classes stored in a personal geodatabase or file geodatabase. SDE Geodatabases automatically check each geometry's validity when it is uploaded, therefore the Check and Repair Geometry tools are not for use with SDE.

Usage tips

· The Output table will have one record for each problem found. If no problems are found, the Output table will have no records.

· The Output table has the following fields:

· CLASS—The full path to and name of the feature class in which the problem was found.

· FEATURE_ID—The Feature ID (FID) or Object ID (OID) for the feature with the geometry problem.

· PROBLEM—A short description of the problem.

· The PROBLEM field will contain one of the following:

· Short segment—Some segments are shorter than allowed by the system units of the spatial reference associated with the geometry.

· Null geometry—The feature has no geometry or nothing in the SHAPE field.

· Incorrect ring ordering—The polygon is topologically simple, but its rings may not be oriented correctly (outer rings - clockwise, inner rings - counterclockwise).

· Incorrect segment orientation—Individual segments are not consistently oriented. The "to" point of seg i should be incident on the "from" point of seg i+1.

· Self intersections—The interior of each part must not intersect itself or other parts. For a multipoint geometry, this means two of the points in the multipoint are in the same location (same x and y coordinate).

· Unclosed rings—The last segment in a ring must have its "to" point incident on the "from" point of the first segment.

· Empty parts—The geometry has multiple parts, and one of them is empty (has no geometry).

· For multipoint features, only the null geometry and empty part problems apply.

· For point features, only the null geometry problem applies.

· The problems identified by this tool can be fixed in the following ways:

· By editing the feature class with the geometry problem and fixing each individual problem identified. Some of the problems, such as nonsimple geometry, cannot be fixed in the Editor.

· By running the Repair Geometry tool on the feature classes that were identified as having geometry problems.

· The following environment setting affects this tool: Extent.

· To facilitate the review of the features which are reported to have geometry problems, in ArcMap you can join the input features to the output table using the Join tool. Simply join using the input's ObjectID field, and the output table's FEATURE_ID field. You may also uncheck the "Keep All" option so that only features with geometry problems are displayed.
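To make the output table concrete, here is a minimal sketch of a hand-written geometry check that emits one record per problem with the CLASS, FEATURE_ID, and PROBLEM fields described above. It is not the ArcGIS implementation. It assumes polygons are given as lists of rings with the first ring being the outer one, it covers only four of the listed problems (null geometry, empty parts, unclosed rings, incorrect ring ordering), and the names `check_polygon` and `check_geometry` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def signed_area(ring: Ring) -> float:
    """Shoelace formula: positive for counterclockwise rings, negative for clockwise."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]))


def check_polygon(rings: Optional[List[Ring]]) -> List[str]:
    """Return the PROBLEM strings found for one polygon (a subset of the tool's list)."""
    if not rings:
        return ["Null geometry"]
    problems: List[str] = []
    for i, ring in enumerate(rings):
        if not ring:
            problems.append("Empty parts")
            continue
        if ring[0] != ring[-1]:
            problems.append("Unclosed rings")
        area = signed_area(ring)
        # Convention quoted above: outer rings clockwise, inner rings counterclockwise.
        # Assumption: ring 0 is the outer ring, all others are inner rings.
        if (i == 0 and area > 0) or (i > 0 and area < 0):
            problems.append("Incorrect ring ordering")
    return problems


def check_geometry(class_name: str,
                   features: Dict[int, Optional[List[Ring]]]) -> List[dict]:
    """Build the output table: one record per problem, none if everything is clean."""
    return [{"CLASS": class_name, "FEATURE_ID": fid, "PROBLEM": problem}
            for fid, rings in features.items()
            for problem in check_polygon(rings)]


# Hypothetical feature class keyed by object ID.
parcels = {
    1: [[(0, 0), (0, 4), (4, 4), (4, 0), (0, 0)]],  # clockwise and closed: clean
    2: None,                                        # nothing in the shape field
    3: [[(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]],  # counterclockwise outer ring
    4: [[(0, 0), (0, 4), (4, 4), (4, 0)]],          # last vertex does not close the ring
}
table = check_geometry("parcels", parcels)
for record in table:
    print(record)

# Equivalent of the join tip: keep only the features that have a problem record.
bad_ids = {record["FEATURE_ID"] for record in table}
```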


Repair Geometry (Data Management)

Inspects each feature's geometry for problems and fixes the problems that are found. Valid input formats are shapefile and feature classes stored in a personal geodatabase or file geodatabase. SDE Geodatabases automatically check each geometry's validity when it is uploaded, therefore the Check and Repair Geometry tools are not for use with SDE.

Usage tips

· The Check Geometry tool can be used to identify feature classes and features within those feature classes that have geometry problems.

· This tool uses the same logic as the Check Geometry tool, but when a geometry problem is discovered, a fix (listed below) is applied immediately.

· Problems repaired with this tool:

· Null geometry—The feature will be deleted from the feature class.

· Short segment—The geometry's short segment will be deleted.

· Incorrect ring ordering—The geometry will be updated to have correct ring ordering.

· Incorrect segment orientation—The geometry will be updated to have correct segment orientation.

· Self intersections—The geometry's segments that intersect will be split at their intersection.

· Unclosed rings—The unclosed rings will be closed.

· Empty parts—The parts that are null or empty will be deleted.

· After applying one of the repairs above, the tool will re-evaluate the feature's geometry to see if it still has a problem. If another problem is discovered, another repair will be performed immediately on that feature. This can lead to more than one fix being applied to a single feature.

· Line features that are M aware will not be modified (repaired) for any of the cases above unless they are short segments or null geometry.

· The following environment settings affect this tool: Extent and workspace.
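The repair-then-re-evaluate behaviour described above can be sketched as a loop that fixes one problem at a time and checks again until the geometry is clean. This is only a sketch under assumptions, not the tool's actual logic: polygons are lists of rings with ring 0 as the outer ring, only null geometry, empty parts, unclosed rings, and ring ordering are handled (short segments, segment orientation, and self intersections are left out), and every function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def signed_area(ring: Ring) -> float:
    """Shoelace formula: positive for counterclockwise rings, negative for clockwise."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]))


def first_problem(rings: List[Ring]) -> Optional[Tuple[str, int]]:
    """Return (problem, ring index) for the first problem found, or None if clean."""
    for i, ring in enumerate(rings):
        if not ring:
            return ("Empty parts", i)
        if ring[0] != ring[-1]:
            return ("Unclosed rings", i)
        area = signed_area(ring)
        # Outer ring (index 0, by assumption) clockwise, inner rings counterclockwise.
        if (i == 0 and area > 0) or (i > 0 and area < 0):
            return ("Incorrect ring ordering", i)
    return None


def repair_polygon(rings: List[Ring]) -> Tuple[List[Ring], List[str]]:
    """Apply one fix, re-evaluate, and repeat; return the rings and the fixes applied."""
    rings = [list(ring) for ring in rings]  # work on a copy
    fixes: List[str] = []
    while True:
        found = first_problem(rings)
        if found is None:
            return rings, fixes
        problem, i = found
        fixes.append(problem)
        if problem == "Empty parts":
            del rings[i]                 # empty parts are deleted
        elif problem == "Unclosed rings":
            rings[i].append(rings[i][0])  # close the ring on its first vertex
        else:
            rings[i].reverse()           # flip the ring orientation


def repair_geometry(features: Dict[int, Optional[List[Ring]]]):
    """Return (repaired features, log); features with null geometry are dropped."""
    repaired: Dict[int, List[Ring]] = {}
    log: List[Tuple[int, str]] = []
    for fid, rings in features.items():
        if not rings:
            log.append((fid, "Null geometry"))
            continue
        repaired[fid], fixes = repair_polygon(rings)
        log.extend((fid, fix) for fix in fixes)
    return repaired, log


# Hypothetical input: feature 2 needs three successive fixes.
parcels = {1: None, 2: [[(0, 0), (4, 0), (4, 4), (0, 4)], []]}
fixed, log = repair_geometry(parcels)
print(fixed)
print(log)
```

Feature 2 shows how a single feature can receive several fixes in a row: closing the ring exposes its counterclockwise orientation, and only after that is the empty part reached and removed.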
