Difference between linear search and binary search

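This article compares two search algorithms: linear search and binary search. A linear search checks items one by one, has O(n) complexity, and works on unsorted data. A binary search locates the target by repeatedly halving the search interval, has O(log n) complexity, and works on sorted data. Binary search requires the data to be sorted and to support random access.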

A linear search looks down a list one item at a time, without jumping. In complexity terms this is an O(n) search: in the worst case, the time taken grows at the same rate as the list does.
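
To make this concrete in ABAP (the language used later in this article), here is a minimal hand-written linear search. It is an illustrative sketch that assumes ABAP 7.40 or later; the program name `zdemo_linear_search` and all variable names are hypothetical. On a standard table, `READ TABLE ... WITH KEY` without `BINARY SEARCH` performs the same kind of row-by-row search.

```abap
REPORT zdemo_linear_search.

" Hypothetical demo program (assumes ABAP 7.40+ syntax).
TYPES ty_letter  TYPE c LENGTH 1.
TYPES tt_letters TYPE STANDARD TABLE OF ty_letter WITH EMPTY KEY.

" The data does not need to be sorted for a linear search.
DATA(lt_letters) = VALUE tt_letters( ( 'D' ) ( 'A' ) ( 'C' ) ( 'B' ) ).
DATA lv_target      TYPE ty_letter VALUE 'C'.
DATA lv_found_at    TYPE i.   " 1-based row number, 0 = not found
DATA lv_comparisons TYPE i.   " how many rows were inspected

" Inspect one row at a time, in order, and stop at the first match.
LOOP AT lt_letters INTO DATA(lv_letter).
  lv_comparisons = lv_comparisons + 1.
  IF lv_letter = lv_target.
    lv_found_at = sy-tabix.
    EXIT.
  ENDIF.
ENDLOOP.

WRITE: / 'Found at row', lv_found_at, 'after', lv_comparisons, 'comparisons'.
```

With the unsorted rows D, A, C, B and the target C, the loop inspects three rows and reports row 3. Note that ABAP table rows are numbered from 1, unlike the 0-based indices in the example below.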

A binary search starts with the middle item of a sorted list and checks whether it is greater than or less than the value you're looking for; that tells you whether the value is in the first or the second half of the list. You then jump to the middle of that half, compare again, and so on. This is pretty much how humans look up a word in a dictionary (although we obviously use better heuristics: if you're looking for "cat" you don't start at "M"). In complexity terms this is an O(log n) search: the number of comparisons grows much more slowly than the list does, because each comparison halves the "search space", so doubling the length of the list adds only about one more comparison.
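
The halving loop described above can be sketched in ABAP as follows. This is a hand-written illustration of the technique, assuming ABAP 7.40+ (for table expressions such as `lt_sorted[ lv_mid ]`); the program and variable names are hypothetical, and it is not a copy of how the `BINARY SEARCH` addition of `READ TABLE` is implemented internally.

```abap
REPORT zdemo_binary_search.

" Hypothetical demo program (assumes ABAP 7.40+ syntax).
TYPES ty_letter  TYPE c LENGTH 1.
TYPES tt_letters TYPE STANDARD TABLE OF ty_letter WITH EMPTY KEY.

" Binary search is only correct because these rows are in ascending order.
DATA(lt_sorted) = VALUE tt_letters( ( 'A' ) ( 'B' ) ( 'C' ) ( 'D' )
                                    ( 'E' ) ( 'F' ) ( 'G' ) ).
DATA lv_target   TYPE ty_letter VALUE 'F'.
DATA lv_low      TYPE i VALUE 1.          " first row still in range
DATA lv_high     TYPE i.                  " last row still in range
DATA lv_mid      TYPE i.
DATA lv_probe    TYPE ty_letter.
DATA lv_found_at TYPE i.                  " 0 = not found
DATA lv_probes   TYPE i.                  " number of middle rows inspected

lv_high = lines( lt_sorted ).

WHILE lv_low <= lv_high.
  lv_probes = lv_probes + 1.
  lv_mid    = ( lv_low + lv_high ) DIV 2. " middle of the remaining range
  lv_probe  = lt_sorted[ lv_mid ].
  IF lv_probe = lv_target.
    lv_found_at = lv_mid.
    EXIT.
  ELSEIF lv_probe < lv_target.
    lv_low = lv_mid + 1.                  " target can only be in the upper half
  ELSE.
    lv_high = lv_mid - 1.                 " target can only be in the lower half
  ENDIF.
ENDWHILE.

WRITE: / 'Found at row', lv_found_at, 'after', lv_probes, 'probes'.
```

For the seven letters A to G and the target F, the first probe lands on row 4 ('D'), which discards the lower half, and the second probe lands on row 6 ('F').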

As an example, suppose you were looking for U in an A-Z list of letters (index 0-25; we're looking for the value at index 20).

A linear search would ask:

list[0] == 'U'? No.
list[1] == 'U'? No.
list[2] == 'U'? No.
list[3] == 'U'? No.
list[4] == 'U'? No.
list[5] == 'U'? No.
... list[20] == 'U'? Yes. Finished.

The binary search would ask:

Compare list[12] ('M') with 'U': Smaller, look further on. (Range=13-25)
Compare list[19] ('T') with 'U': Smaller, look further on. (Range=20-25)
Compare list[22] ('W') with 'U': Bigger, look earlier. (Range=20-21)
Compare list[20] ('U') with 'U': Found it! Finished.
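
The four steps above follow from always probing the middle index `(low + high) DIV 2`, rounded down. The sketch below replays the A-Z example in ABAP and probes the same indices (12, 19, 22, 20). It assumes ABAP 7.40+; the program and variable names are hypothetical, and the list is modelled as a string so that offsets are 0-based like `list[0]`..`list[25]`.

```abap
REPORT zdemo_binary_trace.

" Hypothetical demo program (assumes ABAP 7.40+ syntax). The A-Z list is
" modelled as a string so that offsets are 0-based, like list[0]..list[25].
DATA(lv_letters) = `ABCDEFGHIJKLMNOPQRSTUVWXYZ`.
DATA(lv_target)  = `U`.
DATA lv_low      TYPE i VALUE 0.
DATA lv_high     TYPE i.
DATA lv_mid      TYPE i.
DATA lv_probe    TYPE string.
DATA lv_found_at TYPE i VALUE -1.   " -1 = not found
DATA lv_probes   TYPE i.

lv_high = strlen( lv_letters ) - 1. " last index, 25

WHILE lv_low <= lv_high.
  lv_probes = lv_probes + 1.
  lv_mid    = ( lv_low + lv_high ) DIV 2.
  lv_probe  = substring( val = lv_letters off = lv_mid len = 1 ).
  WRITE: / 'Compare list[', lv_mid, '] =', lv_probe, 'with', lv_target.
  IF lv_probe = lv_target.
    lv_found_at = lv_mid.
    EXIT.
  ELSEIF lv_probe < lv_target.
    lv_low = lv_mid + 1.            " probe is smaller: look further on
  ELSE.
    lv_high = lv_mid - 1.           " probe is bigger: look earlier
  ENDIF.
ENDWHILE.
```

A linear search for the same letter would need 21 comparisons (indices 0 to 20), against 4 probes here.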

Comparing the two:

  • Binary search requires the input data to be sorted; linear search doesn't
  • Binary search requires an ordering comparison; linear search only requires equality comparisons
  • Binary search has complexity O(log n); linear search has complexity O(n) as discussed earlier
  • Binary search requires random access to the data; linear search only requires sequential access (this can be very important - it means a linear search can stream data of arbitrary size)
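
In ABAP, `BINARY SEARCH` can significantly speed up lookups in internal tables, but only when certain conditions are strictly met, and it has pitfalls to watch for. Detailed usage notes and examples follow:

---

## **1. Core prerequisites**

### **1.1 The internal table must be sorted**

- **Applicable table types**: The `BINARY SEARCH` addition is meant for **standard tables (STANDARD TABLE)** that have been sorted in ascending order by the search fields, for example with `SORT`. For **sorted tables (SORTED TABLE)**, `READ TABLE ... WITH KEY` already searches binarily when the key fields form a leading part of the table key.
- **Consequences of an unsorted table**: No error is raised; the result is simply unreliable. `sy-subrc` can be 4 or 8 even though a matching row exists.

```abap
" Correct usage: sort first, then use BINARY SEARCH
TYPES: BEGIN OF ty_student,
         id   TYPE i,
         name TYPE string,
       END OF ty_student.
DATA: gt_students TYPE STANDARD TABLE OF ty_student.

" Fill with unsorted data
gt_students = VALUE #( ( id = 3 name = 'Carol' )
                       ( id = 1 name = 'Alice' )
                       ( id = 2 name = 'Bob' ) ).

" Sorting is mandatory
SORT gt_students BY id.

" Binary search; a table without a header line needs an explicit target
READ TABLE gt_students INTO DATA(gs_student) WITH KEY id = 2 BINARY SEARCH.
IF sy-subrc = 0.
  WRITE: / 'Found:', gs_student-name.
ENDIF.
```

### **1.2 The search key must match the sort order**

- **Restriction**: `BINARY SEARCH` can search by any components, but the table must be sorted in ascending order by those components (they must form a leading part of the `SORT` criteria). Searching by a field the table is not sorted by still compiles, but gives unreliable results.
- **Incorrect example**:

```abap
" Wrong: gt_students was sorted BY id, not BY name, so the result is unreliable
READ TABLE gt_students INTO DATA(gs_by_name) WITH KEY name = 'Alice' BINARY SEARCH.
" Fix: SORT gt_students BY name. before searching by name
```

---

## **2. Performance and efficiency**

### **2.1 Time complexity**

- **Binary search**: time complexity **O(log n)**, far better than the **O(n)** of a linear search.
- **Where it pays off**: the advantage is clear for large data volumes (e.g. more than about 1,000 rows) and grows with table size.

### **2.2 Sorting cost**

- **Trade-off**: `SORT` itself costs O(n log n). If you search often but the data rarely changes, sort once in advance and reuse the sorted table; if the data changes frequently, weigh the cost of re-sorting.

```abap
" Example: performance comparison on a large table
DATA: gt_large_data TYPE STANDARD TABLE OF mara, " material master data
      gd_start      TYPE i,
      gd_end        TYPE i,
      gd_diff       TYPE i.

" Fill with up to 100,000 rows of test data
SELECT * FROM mara INTO TABLE gt_large_data UP TO 100000 ROWS.

" Linear search: READ TABLE without BINARY SEARCH scans row by row
GET RUN TIME FIELD gd_start.
READ TABLE gt_large_data WITH KEY matnr = '100-100' TRANSPORTING NO FIELDS.
GET RUN TIME FIELD gd_end.
gd_diff = gd_end - gd_start.
WRITE: / 'Linear search time:', gd_diff, 'microseconds'.

" Binary search: sort first (the SORT itself is not timed here)
SORT gt_large_data BY matnr.
GET RUN TIME FIELD gd_start.
READ TABLE gt_large_data WITH KEY matnr = '100-100' BINARY SEARCH TRANSPORTING NO FIELDS.
GET RUN TIME FIELD gd_end.
gd_diff = gd_end - gd_start.
WRITE: / 'Binary search time:', gd_diff, 'microseconds'.
```

---

## **3. Return values and error handling**

### **3.1 Checking the return value**

- **`SY-SUBRC`**:
  - `0`: a matching row was found.
  - `4`: no matching row was found; after a binary search, `sy-tabix` is the row before which the key would be inserted.
  - `8`: no matching row was found and the search key is larger than the last row; `sy-tabix` is then the number of rows + 1. A wrong sort order does not produce a dedicated return code.

```abap
READ TABLE gt_students WITH KEY id = 99 BINARY SEARCH TRANSPORTING NO FIELDS.
CASE sy-subrc.
  WHEN 0.
    WRITE: / 'Found student with ID 99'.
  WHEN 4.
    WRITE: / 'Student not found'.
  WHEN 8.
    WRITE: / 'Student not found (key is larger than every row)'.
ENDCASE.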
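```

### **3.2 Duplicate keys**

- **Behavior**: If the key occurs more than once, `BINARY SEARCH` returns the **first matching row** (in the sorted order), not all matches.
- **Alternatives**: To process all matches, use `LOOP AT ... WHERE` instead, or loop over the sorted table starting from the first match.

```abap
" Example: handling duplicate keys
TYPES: BEGIN OF ty_student_grade,
         id    TYPE i,
         name  TYPE string,
         grade TYPE i,
       END OF ty_student_grade.
DATA: gt_duplicates TYPE STANDARD TABLE OF ty_student_grade WITH EMPTY KEY.
gt_duplicates = VALUE #( ( id = 1 name = 'Alice' grade = 90 )
                         ( id = 2 name = 'Bob'   grade = 90 )
                         ( id = 3 name = 'Carol' grade = 85 ) ).
SORT gt_duplicates BY grade name.

" The binary search returns only the first row with grade = 90
READ TABLE gt_duplicates INTO DATA(gs_first) WITH KEY grade = 90 BINARY SEARCH.
WRITE: / 'First match:', gs_first-name. " prints Alice

" Visiting all matches requires a LOOP
LOOP AT gt_duplicates INTO DATA(gs_dup) WHERE grade = 90.
  WRITE: / 'All matches:', gs_dup-name.
ENDLOOP.
```

---

## **4. Advanced considerations**

### **4.1 Difference from `HASHED TABLE`**

- **Hashed tables**: A `HASHED TABLE` locates rows through a hash function. Access by the full unique key takes constant time (**O(1)**) on average and needs no sorting.
- **Recommendations**:
  - For frequent lookups by the full key on large data volumes, prefer a `HASHED TABLE`.
  - For range queries or lookups by part of the key, use a `SORTED TABLE` (which searches binarily without the addition) or a sorted standard table with `BINARY SEARCH`.

```abap
" Hashed table example (no sorting needed)
DATA: gt_hash TYPE HASHED TABLE OF ty_student WITH UNIQUE KEY id.
gt_hash = VALUE #( ( id = 1 name = 'Alice' ) ( id = 2 name = 'Bob' ) ).
" Direct lookup by the full table key; no BINARY SEARCH needed
READ TABLE gt_hash INTO DATA(gs_hashed) WITH TABLE KEY id = 1.
```

### **4.2 Dynamic keys**

- **Dynamic component names**: `BINARY SEARCH` can be combined with a key component specified dynamically as `(name)`, whose content is evaluated at runtime. The table must still be sorted by that component, and an invalid name causes a runtime error rather than a syntax error.
- **More complex cases**: When the lookup logic itself has to be built at runtime, `ASSIGN COMPONENT` or RTTS can be used.

```abap
" Dynamic key component: the name is evaluated at runtime
DATA: gd_fieldname TYPE string VALUE 'ID'.
SORT gt_students BY (gd_fieldname).
READ TABLE gt_students INTO DATA(gs_dyn) WITH KEY (gd_fieldname) = 1 BINARY SEARCH.
```

### **4.3 Memory locality**

- **Observation**: A binary search jumps around in memory, which can lower the CPU cache hit rate, while a linear search reads rows sequentially. On very small tables a linear search can therefore be just as fast.
- **Recommendation**: For small tables the difference is usually negligible; measure (for example with `GET RUN TIME`) rather than relying on a fixed row-count threshold.

---

## **5. Best practices summary**

1. **Always check `SY-SUBRC`**: handle the not-found cases (4 and 8).
2. **Prefer sorted tables**: a `SORTED TABLE` keeps its key order on insert and reads by its key binarily, which avoids repeatedly sorting standard tables.
3. **Large data volumes**: the benefit of `BINARY SEARCH` grows with table size and is substantial on tables with tens of thousands of rows.
4. **Search only by the sort fields**: a binary search is reliable only for fields the table is sorted by (a leading part of the sort order).
5. **Combine with `SORT`**: if the data is searched many times, sort it once up front and reuse it.

---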
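
Section 3.2 suggests `LOOP AT ... WHERE` for visiting every row with a duplicate key. On a standard table without a secondary key, `LOOP ... WHERE` generally checks every row. A commonly used alternative is to let the binary search find the first match and then read sequentially from that row until the key changes. The sketch below assumes ABAP 7.40+, uses hypothetical names, and relies on the binary search returning the first of several matching rows, as described in 3.2.

```abap
REPORT zdemo_binary_then_loop.

" Hypothetical demo program (assumes ABAP 7.40+ syntax).
TYPES: BEGIN OF ty_student,
         id    TYPE i,
         name  TYPE string,
         grade TYPE i,
       END OF ty_student,
       tt_students TYPE STANDARD TABLE OF ty_student WITH EMPTY KEY.

DATA(gt_students) = VALUE tt_students( ( id = 1 name = `Alice` grade = 90 )
                                       ( id = 2 name = `Bob`   grade = 90 )
                                       ( id = 3 name = `Carol` grade = 85 ) ).
DATA gt_hits  TYPE STANDARD TABLE OF string WITH EMPTY KEY.
DATA gd_start TYPE i.

" BINARY SEARCH is only reliable because the table is sorted by grade.
SORT gt_students BY grade name.

" 1) Binary search: position on the first row with grade = 90.
READ TABLE gt_students WITH KEY grade = 90 BINARY SEARCH TRANSPORTING NO FIELDS.
IF sy-subrc = 0.
  gd_start = sy-tabix.
  " 2) Sequential read from that row until the grade changes.
  LOOP AT gt_students INTO DATA(gs_student) FROM gd_start.
    IF gs_student-grade <> 90.
      EXIT.
    ENDIF.
    APPEND gs_student-name TO gt_hits.
  ENDLOOP.
ENDIF.

LOOP AT gt_hits INTO DATA(gd_name).
  WRITE: / 'Match:', gd_name.
ENDLOOP.
```

After sorting, the rows are Carol (85), Alice (90), Bob (90); the binary search lands on Alice and the loop collects Alice and Bob, stopping as soon as the grade changes or the table ends.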