(3) Correcting the Clustering Factor (CLUSTERING_FACTOR), Part 2

This article examines how Automatic Segment Space Management (ASSM) reduces contention by increasing the randomness of data placement, which can inflate an index's clustering factor and in turn change execution plans. It then shows how to correct the clustering factor, either by reorganizing the table or by using the SYS_OP_COUNTCHG() function.


1. Performance Problems Introduced by ASSM

ASSM reduces contention by increasing the randomness of data placement: under ASSM, concurrent processes generally insert rows into different blocks. As a result, if rows are inserted in sequence or date order and an index is built on that column, the index's clustering factor can end up quite large.

SQL> create tablespace assm datafile '+DG' size 100m autoextend on next 100m maxsize 1g segment space management auto;

Tablespace created.

SQL> create tablespace mssm datafile '+DG' size 100m autoextend on next 100m maxsize 1g segment space management manual;

Tablespace created.

SQL> select tablespace_name,segment_space_management from dba_tablespaces where tablespace_name in ('ASSM','MSSM');

TABLESPACE_NAME                SEGMEN
------------------------------ ------
ASSM                           AUTO
MSSM                           MANUAL

SQL> create table assm_test(id int,flag char(1)) tablespace assm;

Table created.

SQL> create table mssm_test(id int,flag char(1)) tablespace mssm;

Table created.

SQL> create sequence assm_s;

Sequence created.

SQL> create sequence mssm_s;

Sequence created.

First, create two sequences and two tables: one table in the ASSM-managed tablespace and the other in the manually managed (MSSM) tablespace.

SQL> create or replace procedure load_assm_data(v_flag char) as
  2  begin
  3    for i in 1 .. 1000 loop
  4      insert into assm_test values (assm_s.nextval, v_flag);
  5    end loop;
  6    commit;
  7  end;
  8  /

Procedure created.

SQL> create or replace procedure load_mssm_data(v_flag char) as
  2  begin
  3    for i in 1 .. 1000 loop
  4      insert into mssm_test values (mssm_s.nextval, v_flag);
  5    end loop;
  6    commit;
  7  end;
  8  /

Procedure created.

Next, create two procedures that load data into ASSM_TEST and MSSM_TEST respectively.

SQL>  declare
  2    jobname varchar2(30);
  3   BEGIN
  4     FOR I IN 0..4 LOOP
  5       JOBNAME:=DBMS_SCHEDULER.GENERATE_JOB_NAME;
  6       DBMS_SCHEDULER.CREATE_JOB(
  7       job_name=>JOBNAME,
  8       job_type=>'STORED_PROCEDURE',
  9       job_action=>'LOAD_ASSM_DATA',
 10       auto_drop=>TRUE,
 11       enabled=>FALSE,
 12       number_of_arguments=>1);
 13       DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE(JOBNAME,1,CHR(65+I));
 14       DBMS_SCHEDULER.ENABLE(JOBNAME);
 15     END LOOP;
 16   END;
 17   /
 
PL/SQL procedure successfully completed.
 
SQL>  declare
  2    jobname varchar2(30);
  3   BEGIN
  4     FOR I IN 0..4 LOOP
  5       JOBNAME:=DBMS_SCHEDULER.GENERATE_JOB_NAME;
  6       DBMS_SCHEDULER.CREATE_JOB(
  7       job_name=>JOBNAME,
  8       job_type=>'STORED_PROCEDURE',
  9       job_action=>'LOAD_MSSM_DATA',
 10       auto_drop=>TRUE,
 11       enabled=>FALSE,
 12       number_of_arguments=>1);
 13       DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE(JOBNAME,1,CHR(65+I));
 14       DBMS_SCHEDULER.ENABLE(JOBNAME);
 15     END LOOP;
 16   END;
 17   /

PL/SQL procedure successfully completed.

Using scheduler jobs, five concurrent processes are started per table to insert data into assm_test and mssm_test simultaneously.
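
The loader jobs run asynchronously, so before building the indexes it is worth confirming that they have all finished. A minimal check, assuming the default JOB$_ name prefix that DBMS_SCHEDULER.GENERATE_JOB_NAME uses (this query is not part of the original test):

SELECT job_name, status
  FROM user_scheduler_job_run_details
 WHERE job_name LIKE 'JOB$_%'
 ORDER BY actual_start_date;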

SQL> create index assm_test_idx on assm_test(id);

Index created.

SQL> create index mssm_test_idx on mssm_test(id);

Index created.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'ASSM_TEST',CASCADE=>TRUE);

PL/SQL procedure successfully completed.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'MSSM_TEST',CASCADE=>TRUE);

PL/SQL procedure successfully completed.

SQL> SELECT TABLE_NAME,BLOCKS,NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME IN ('ASSM_TEST','MSSM_TEST');

TABLE_NAME                         BLOCKS   NUM_ROWS
------------------------------ ---------- ----------
ASSM_TEST                              20       5000
MSSM_TEST                              12       5000

SQL> SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS,CLUSTERING_FACTOR FROM USER_INDEXES WHERE TABLE_NAME IN ('ASSM_TEST','MSSM_TEST');

INDEX_NAME                         BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ----------- -----------------
ASSM_TEST_IDX                           1          11               963
MSSM_TEST_IDX                           1          11                20

Clearly, the clustering factor under ASSM is far larger than under manual segment space management.

SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) BLOCK# FROM ASSM_TEST;

    BLOCK#
----------
       134
       132
       141
       133
       143
       135
       131
       142
       145

9 rows selected.

As we can see, the ASSM_TEST table has 20 blocks allocated but its data actually occupies only 9 blocks, yet the clustering factor reaches 963. This is a side effect of the way ASSM avoids block contention.
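
To make the 963 less abstract: the clustering factor is essentially the number of times the table block changes while walking the rows in index key order. A rough illustration of that idea, using my own query rather than anything from the original test (it ignores the file number that the real calculation also takes into account):

SELECT COUNT(*) AS approx_clustering_factor
  FROM (SELECT DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) AS blk,
               LAG(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)) OVER (ORDER BY id) AS prev_blk
          FROM assm_test)
 WHERE prev_blk IS NULL OR blk <> prev_blk;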

SQL> SELECT BLOCK#, COUNT(*)
  2    FROM (SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) BLOCK#, FLAG
  3            FROM ASSM_TEST)
  4   GROUP BY BLOCK#;

    BLOCK#   COUNT(*)
---------- ----------
       134          2
       132          1
       141          1
       133          3
       143          1
       135          4
       131          1
       142          1
       145          1

9 rows selected.

Most blocks were used by a single process, which reduces contention. Under MSSM, by contrast, each block was essentially shared, and contended for, by several processes.

SQL> SELECT BLOCK#, COUNT(*)
  2    FROM (SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) BLOCK#, FLAG
  3            FROM MSSM_TEST)
  4   GROUP BY BLOCK#;

    BLOCK#   COUNT(*)
---------- ----------
       129          1
       134          4
       132          4
       133          4
       135          4
       136          4
       130          5
       131          4

8 rows selected.

Now let's look at how ASSM affects the execution plan:

SQL> SELECT * FROM  MSSM_TEST WHERE ID BETWEEN 1 AND 50;

49 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2228967704

---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |    49 |   294 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MSSM_TEST     |    49 |   294 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | MSSM_TEST_IDX |    49 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=50)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
       1385  bytes sent via SQL*Net to client
        448  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         49  rows processed

We selected 50 rows out of 5,000, a selectivity of roughly 1/100. At this selectivity the index scan is the better access path.

SQL> SELECT * FROM  ASSM_TEST WHERE ID BETWEEN 1 AND 50;

49 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 166377213

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |    49 |   294 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| ASSM_TEST |    49 |   294 |     5   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID"<=50 AND "ID">=1)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         27  consistent gets
          0  physical reads
          0  redo size
       1274  bytes sent via SQL*Net to client
        448  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         49  rows processed

Under ASSM, however, Oracle chose the more expensive full table scan. This is how ASSM's behavior feeds into the execution plan: because of the inflated clustering factor, the cost Oracle calculates for the index access comes out higher than that of the full table scan, even though the index path is actually more efficient, as the logical reads (consistent gets) below confirm.

SQL> SELECT /*+INDEX(ASSM_TEST)*/  * FROM  ASSM_TEST  WHERE ID BETWEEN 1 AND 50;

49 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3633427144

---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |    49 |   294 |    12   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ASSM_TEST     |    49 |   294 |    12   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ASSM_TEST_IDX |    49 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=50)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
       1385  bytes sent via SQL*Net to client
        448  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         49  rows processed
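
The cost difference follows from the classic index range-scan cost formula, roughly blevel + ceil(leaf_blocks * selectivity) + ceil(clustering_factor * selectivity). With a selectivity of about 50/5000 = 0.01 this gives roughly 1 + 1 + 1 = 3 for MSSM_TEST_IDX but 1 + 1 + 10 = 12 for ASSM_TEST_IDX, matching the costs in the plans above. A quick sanity check of that arithmetic against the dictionary (my own query, using the statistics gathered earlier):

SELECT index_name,
       blevel + CEIL(leaf_blocks * 0.01) + CEIL(clustering_factor * 0.01) AS est_cost
  FROM user_indexes
 WHERE index_name IN ('MSSM_TEST_IDX', 'ASSM_TEST_IDX');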

 

2. Correcting the CLUSTERING_FACTOR

(1) Reorganizing the table
SQL> CREATE TABLE ASSM_TEST_2 TABLESPACE ASSM AS SELECT * FROM ASSM_TEST ORDER BY ID;

Table created.

SQL> CREATE INDEX ASSM_TEST_2_IDX ON ASSM_TEST_2(ID);

Index created.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'ASSM_TEST_2',CASCADE=>TRUE);

PL/SQL procedure successfully completed.

SQL> SELECT TABLE_NAME,BLOCKS,NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME LIKE '%SSM%';

TABLE_NAME                         BLOCKS   NUM_ROWS
------------------------------ ---------- ----------
ASSM_TEST                              20       5000
MSSM_TEST                              12       5000
ASSM_TEST_2                            11       5000

SQL> SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS,CLUSTERING_FACTOR FROM USER_INDEXES WHERE TABLE_NAME LIKE '%SSM%';

INDEX_NAME                         BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ----------- -----------------
MSSM_TEST_IDX                           1          11                20
ASSM_TEST_2_IDX                         1          11                 8
ASSM_TEST_IDX                           1          11               963

SQL> set autotrace trace;
SQL> SELECT * FROM  ASSM_TEST_2 WHERE ID BETWEEN 1 AND 50;

49 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1713382986

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |    49 |   294 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ASSM_TEST_2     |    49 |   294 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ASSM_TEST_2_IDX |    49 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=50)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
       1385  bytes sent via SQL*Net to client
        448  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         49  rows processed

We created a copy of ASSM_TEST with its rows ordered by ID. After this reorganization, Oracle switches from a full table scan to an index scan.
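
Note that ASSM_TEST_2 is only a reorganized copy. To actually adopt it in place of the original table you would still have to swap the two objects, for example with renames; this step is not shown in the original transcript, and it does not carry over grants, constraints or the original index name:

ALTER TABLE assm_test RENAME TO assm_test_old;
ALTER TABLE assm_test_2 RENAME TO assm_test;
-- then rebuild or rename the index and re-gather statistics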

 

(2) Using SYS_OP_COUNTCHG() to correct the CLUSTERING_FACTOR
SQL> ALTER SESSION SET SQL_TRACE TRUE;

Session altered.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'ASSM_TEST',CASCADE=>TRUE);

PL/SQL procedure successfully completed.

SQL> ALTER SESSION SET SQL_TRACE FALSE;

Session altered.

SQL> select d.value || '/' || lower(rtrim(i.instance, chr(0))) || '_ora_' ||
  2         p.spid || '.trc' trace_file_name
  3    from (select p.spid
  4            from v$mystat m, v$session s, v$process p
  5           where m.statistic# = 1
  6             and s.sid = m.sid
  7             and p.addr = s.paddr) p,
  8         (select t.instance
  9            from v$thread t, v$parameter v
 10           where v.name = 'thread'
 11             and (v.value = 0 or t.thread# = to_number(v.value))) i,
 12         (select value from v$parameter where name = 'user_dump_dest') d;

TRACE_FILE_NAME
--------------------------------------------------------------------------------
/u01/app/diag/rdbms/orcl/orcl/trace/orcl_ora_5714.trc
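
As an aside, on 11g the same trace file name can also be obtained more simply from V$DIAG_INFO (an alternative to the query above, not used in the original session):

SELECT value FROM v$diag_info WHERE name = 'Default Trace File';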

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
[oracle@linux ~]$ su - root
Password:
[root@linux ~]# cd /u01/event/
[root@linux event]# cp /u01/app/diag/rdbms/orcl/orcl/trace/orcl_ora_5714.trc .
[root@linux event]# tkprof orcl_ora_5714.trc orcl_ora_5714.txt

TKPROF: Release 11.2.0.1.0 - Development on Tue Feb 5 21:35:29 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

Enable SQL trace, gather the statistics, format the trace file with tkprof, and locate the following SQL statement in the formatted output.

select /*+ no_parallel_index(t, "ASSM_TEST_IDX") dbms_stats 
  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring 
  no_substrb_pad  no_expand index(t,"ASSM_TEST_IDX") */ count(*) as nrw,
  count(distinct sys_op_lbid(102382,'L',t.rowid)) as nlb,null as ndk,
  sys_op_countchg(substrb(t.rowid,1,15),1) as clf 
from
 "U1"."ASSM_TEST" t where "ID" is not null

Here, NRW = USER_INDEXES.NUM_ROWS,
     NLB = USER_INDEXES.LEAF_BLOCKS,
     NDK = USER_INDEXES.DISTINCT_KEYS,
     CLF = USER_INDEXES.CLUSTERING_FACTOR.
Change the second argument of sys_op_countchg(substrb(t.rowid,1,15),1) from 1 to 5 (the number of concurrent processes), run the modified query in SQL*Plus, and write the resulting CLF value back into the data dictionary.

SQL> select /*+ no_parallel_index(t, "ASSM_TEST_IDX") dbms_stats 
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring 
  3    no_substrb_pad  no_expand index(t,"ASSM_TEST_IDX") */
  4   count(*) as nrw,
  5   count(distinct sys_op_lbid(102382, 'L', t.rowid)) as nlb,
  6   null as ndk,
  7   sys_op_countchg(substrb(t.rowid, 1, 15), 5) as clf
  from "U1"."ASSM_TEST" t
  9   where "ID" is not null;

       NRW        NLB N        CLF
---------- ---------- - ----------
      5000         11            9

SQL> SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS,CLUSTERING_FACTOR FROM USER_INDEXES WHERE TABLE_NAME LIKE '%SSM%';

INDEX_NAME                         BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ----------- -----------------
MSSM_TEST_IDX                           1          11                20
ASSM_TEST_2_IDX                         1          11                 8
ASSM_TEST_IDX                           1          11               963

SQL> exec dbms_stats.set_index_stats(user,'ASSM_TEST_IDX',CLSTFCT=>9);

PL/SQL procedure successfully completed.

SQL> SELECT INDEX_NAME,BLEVEL,LEAF_BLOCKS,CLUSTERING_FACTOR FROM USER_INDEXES WHERE TABLE_NAME LIKE '%SSM%';

INDEX_NAME                         BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ----------- -----------------
MSSM_TEST_IDX                           1          11                20
ASSM_TEST_2_IDX                         1          11                 8
ASSM_TEST_IDX                           1          11                 9

SQL> set autotrace trace;
SQL> SELECT * FROM  ASSM_TEST  WHERE ID BETWEEN 1 AND 50;

49 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3633427144

---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |    49 |   294 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ASSM_TEST     |    49 |   294 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ASSM_TEST_IDX |    49 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=50)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
       1385  bytes sent via SQL*Net to client
        448  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         49  rows processed

Again, Oracle switches from a full table scan to an index scan.

(3) Summary

When Oracle uses MSSM: if freelists > 1, the second argument to sys_op_countchg should equal freelists; if freelist_groups > 1 as well, the second argument should equal freelists * freelist_groups.

The freelists and freelist_groups settings can be checked with: select t.freelists, t.freelist_groups, t.* from user_tables t;
When Oracle uses ASSM, it can allocate and format 16 new blocks at a time, which means new data is scattered roughly across those 16 blocks rather than packed tightly together.
Calling sys_op_countchg() with the second argument set to 16 is usually enough to produce a reasonable clustering_factor in place of the current meaningless value. Treat 16 as an upper bound, though: if the number of concurrent processes is normally smaller than 16, use the actual number of concurrent processes instead.
For reverse-key indexes, indexes with extra columns, or indexes whose columns have been reordered, sys_op_countchg() does not help. In those cases you can create an index containing only the driving column, compute that index's clustering_factor, and then assign the value to the original index, as sketched below.
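
A minimal sketch of that driving-column approach, using hypothetical object names (T, its column ID, the helper index and REV_IDX are illustrative only, not objects from this test):

-- Build a plain index on the driving column, read its clustering_factor,
-- copy the value onto the problem index, then drop the helper.
CREATE INDEX t_id_helper_idx ON t(id);
EXEC DBMS_STATS.GATHER_INDEX_STATS(USER, 'T_ID_HELPER_IDX');

COLUMN cf NEW_VALUE cf
SELECT clustering_factor AS cf FROM user_indexes WHERE index_name = 'T_ID_HELPER_IDX';

EXEC DBMS_STATS.SET_INDEX_STATS(USER, 'REV_IDX', CLSTFCT => &cf);
DROP INDEX t_id_helper_idx;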

SQL> select sys_op_countchg(substrb(t.rowid,1,15),5) as clf from ASSM_TEST t where "ID" is not null;


       CLF
----------
         9

SQL> select sys_op_countchg(substrb(t.rowid,1,15),9) as clf from ASSM_TEST t where "ID" is not null;


       CLF
----------
         9

SQL> select sys_op_countchg(substrb(t.rowid,1,15),16) as clf from ASSM_TEST t where "ID" is not null;

       CLF
----------
         9

In the example above, I set the concurrency argument for the ASSM_TEST table to 5 and the computed clustering_factor was 9; we also saw earlier that ASSM_TEST's data is spread across 9 blocks, and the result is still 9; with the upper limit of 16 as the argument, the result is again 9. In practice, when the exact degree of concurrency is hard to estimate, simply passing 16 is a reasonable choice.
