Using the TWO_TASK or LOCAL Environment Variable

A while ago I ran into a problem: running rman target / failed, and I had to type rman target sys/xxx@yyy instead. The thread is here:

http://www.itpub.net/thread-1167136-1-1.html

I remember hitting a similar issue back when I was learning to install Oracle 8i: an ORA-12560 error. When logged in to the server through Remote Desktop, sqlplus user/passwd would not work; I had to use sqlplus user/passwd@net_name.

If you google this ORA-12560 problem, you will find proposed fixes in many places; when I tested them back then, most did not work, and at least at the time I never solved it for the Remote Desktop case. Today's test finally gave me the answer. The link is below:

http://davidyu720.itpub.net/post/31716/470434
Oracle 8i local login error ORA-12560: TNS: protocol adapter error
With Oracle 8.1.7 on Windows 2003, logging in locally on the server with SQLPLUS or SVRMGRL, without a connect string, fails with ORA-12560: TNS: protocol adapter error.
The cause: this is a terminal server. Logging in to the OS through a remote terminal session and then into the database raises ORA-12560; logging in to the OS from the console session and then into the database works fine.
Workaround: none. Perhaps this is just an 8i quirk; on the same Windows environment 9i has no such problem, so I did not bother looking further.

Logging in to the server through VNC, I can indeed run sqlplus user/passwd on the server; through Remote Desktop it indeed does not work.

I remember asking about this in a newsgroup at the time; the suggested fix was to define the LOCAL environment variable. I tried it, it worked, and I never gave it further thought.

So whenever rman failed this way, I fell into the habit of defining the LOCAL variable, without even being clear about what its value should be; I had always assumed it was the oracle_sid. Then one day in SQL*Plus I noticed I was actually connected to the remote database, since my local installation is 10g while the remote database is 8i.
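As a minimal sketch of that workaround on Windows (the net service name mydb and the sys password here are placeholders for whatever is defined on your own system, and the listener for the target database must be running):

C:\> SET LOCAL=mydb
C:\> rman target sys/oracle
C:\> sqlplus scott/tiger

With LOCAL set, both rman and sqlplus should resolve the connection through the listener as if @mydb had been appended, so the @net_name suffix can be omitted.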

There was too much going on at the time, and all I wanted was to get rman target / working again quickly.

Today, reading <Apress.Expert.Oracle.Database.10g.Administration.Sep.2005.pdf>, I finally found the answer on page 428, which says the following:

Using the TWO_TASK Environment Variable
You can bypass the use of an Oracle Net name by setting the TWO_TASK environment variable (on UNIX/Linux) or the LOCAL environment variable (on Windows).
The TWO_TASK environment variable specifies the connect string for connecting to a remote machine. SQL*Net will check the value of the TWO_TASK environment variable and automatically add it to your connect string, as shown in the following example:
$ export TWO_TASK=mydb
Once you set the TWO_TASK environment variable, you can connect to the mydb database in the following way:
$ sqlplus scott/tiger
Note that you didn't have to use the specification sqlplus scott/tiger@mydb, since you're using the TWO_TASK variable. On a Windows server, the following is the equivalent for setting the TWO_TASK environment variable:
$ SET LOCAL=<mydb>
$ sqlplus scott/tiger

According to this, if I define TWO_TASK (Linux) or LOCAL (Windows) as some net_name, I can run sqlplus without the @net_name argument (provided, of course, that the listener on the server side is running). I tested it, and it works!
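A minimal sketch of the same thing on Linux, assuming a tnsnames.ora entry named mydb; lsnrctl status (run on the database server) is just a quick way to confirm the listener is up before testing:

$ lsnrctl status
$ export TWO_TASK=mydb
$ sqlplus scott/tiger      # connects to mydb, no @mydb needed

The walkthrough below repeats this against two real databases (demo and demo2).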


On Unix/Linux, you can set the TWO_TASK environment variable; when a user connects to the database without specifying a service name, the value of TWO_TASK is automatically used as the connect string.

 

 

Two databases are running on the current host:

[oracle@bfapp2 ~]$ ps -ef|grep ora
oracle    3899     1  0 May17 ?        00:00:00 ora_pmon_demo2
oracle    3901     1  0 May17 ?        00:00:00 ora_dbw0_demo2
oracle    3903     1  0 May17 ?        00:00:01 ora_lgwr_demo2
oracle    3905     1  0 May17 ?        00:00:01 ora_ckpt_demo2
oracle    3907     1  0 May17 ?        00:00:01 ora_smon_demo2
oracle    3909     1  0 May17 ?        00:00:00 ora_reco_demo2
oracle    3911     1  0 May17 ?        00:00:00 ora_cjq0_demo2
oracle    3913     1  0 May17 ?        00:00:18 ora_qmn0_demo2
oracle    3915     1  0 May17 ?        00:00:00 ora_s000_demo2
oracle    3917     1  0 May17 ?        00:00:00 ora_d000_demo2
oracle    3942     1  0 May17 ?        00:00:00 /oracle/ora9/product/9.2/bin/tnslsnr LISTENER -inherit
oracle    4787     1  0 May17 ?        00:00:00 ora_pmon_demo
oracle    4789     1  0 May17 ?        00:00:01 ora_dbw0_demo
oracle    4791     1  0 May17 ?        00:00:00 ora_lgwr_demo
oracle    4793     1  0 May17 ?        00:00:00 ora_ckpt_demo
oracle    4795     1  0 May17 ?        00:00:02 ora_smon_demo
oracle    4797     1  0 May17 ?        00:00:00 ora_reco_demo
oracle    4799     1  0 May17 ?        00:00:00 ora_cjq0_demo
oracle    4801     1  0 May17 ?        00:00:00 ora_s000_demo
oracle    4803     1  0 May17 ?        00:00:00 ora_d000_demo
oracle    4807     1  1 May17 ?        00:17:53 ora_j000_demo
oracle    5175     1  0 May17 ?        00:00:01 oracledemo (LOCAL=NO)
root      8812  3444  0 16:02 ?        00:00:00 sshd: oracle [priv]
oracle    8814  8812  0 16:02 ?        00:00:00 sshd: oracle@pts/1
oracle    8815  8814  0 16:02 pts/1    00:00:00 -bash
oracle    8841  8815  0 16:44 pts/1    00:00:00 ps -ef
oracle    8842  8815  0 16:44 pts/1    00:00:00 grep ora

One instance is named demo, the other demo2.

Look at the configuration in tnsnames.ora:

[oracle@bfapp2 ~]$ more $ORACLE_HOME/network/admin/tnsnames.ora
demo =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 172.25.13.149)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = demo)
    )
  )

demo2 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 172.25.13.149)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = demo2)
    )
  )

The local naming file defines two net service names, DEMO and DEMO2, which point to the DEMO and DEMO2 databases respectively.

Check the current setting of the ORACLE_SID environment variable:

[oracle@bfapp2 ~]$ env|grep SID
ORACLE_SID=demo2

The SID set in the current environment is demo2. Now connect to the database without specifying a service name:

[oracle@bfapp2 ~]$ sqlplus test/test

SQL*Plus: Release 9.2.0.4.0 - Production on 星期二 5 18 16:45:02 2010

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


连接到
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
DEMO2.US.ORACLE.COM

SQL> exit           
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
中断开
[oracle@bfapp2 ~]$ sqlplus test/test@demo

SQL*Plus: Release 9.2.0.4.0 - Production on 星期二 5 18 16:45:27 2010

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


连接到
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
DEMO

SQL> exit
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
中断开

When no service name is specified, the connection goes to the DEMO2 database because ORACLE_SID=demo2. When the DEMO service name is specified, the connection goes to the DEMO database instead.
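For comparison, a minimal sketch of switching the local default by changing ORACLE_SID instead (using the same test/test account as above):

$ export ORACLE_SID=demo
$ sqlplus test/test        # now a local bequeath connection into demo instead of demo2

A bequeath connection like this does not go through the listener at all, which is the key difference from the TWO_TASK approach shown next.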

Now set the TWO_TASK environment variable to demo:

[oracle@bfapp2 ~]$ export TWO_TASK=demo
[oracle@bfapp2 ~]$ sqlplus test/test

SQL*Plus: Release 9.2.0.4.0 - Production on 星期二 5 18 16:45:50 2010

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


连接到
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
DEMO

SQL> conn test/test@demo2
已连接。
SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
DEMO2.US.ORACLE.COM

Because TWO_TASK is set, Oracle uses its value as the default service name whenever no service name is specified, so the connection goes to the DEMO database. Connections that specify a service name explicitly are not affected by the TWO_TASK environment variable.
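In bash the override can also be scoped to a single command rather than exported for the whole session; a small sketch (in a shell where TWO_TASK has not already been exported):

$ TWO_TASK=demo sqlplus test/test    # applies to this invocation only
$ sqlplus test/test                  # later commands fall back to ORACLE_SID

This relies only on the shell's per-command environment assignment, nothing Oracle-specific.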

One point to note: once the TWO_TASK environment variable is set, you can no longer log in to the database using OS authentication:

SQL> conn / as sysdba
ERROR:
ORA-01031: insufficient privileges


警告您不再连接到 ORACLE
SQL> conn /@demo2 as sysdba
ERROR:
ORA-01031: 
权限不足


SQL> conn /@demo as sysdba
ERROR:
ORA-01031: insufficient privileges


SQL> exit

The reason is simple: with the TWO_TASK environment variable present, SQL*Plus can never perform a plain / as sysdba login; the connection always effectively becomes /@servicename as sysdba.
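The simplest way to get OS authentication back is therefore to clear the variable before connecting; a minimal sketch:

$ unset TWO_TASK
$ sqlplus / as sysdba      # local bequeath connection again, OS authentication works

On Windows the equivalent is SET LOCAL= (with nothing after the equals sign) to delete the variable. Alternatively, keep TWO_TASK set and log in over the network with a password-file account, e.g. sqlplus sys/password@demo as sysdba, assuming a password file is configured.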

 






This post was compiled from material on the web. Source: ITPUB blog, http://blog.itpub.net/26736162/viewspace-2140246/ (please credit the source when reposting).

主进程读取数据并存储到共享内存 subroutine read_and_store_shared_data(ensemble_pi_ptr, ensemble_u_ptr, & ensemble_v_ptr, ensemble_th_ptr, ensemble_q_ptr, & back_data_ptr, true_data_ptr, obs_wind_ptr, & obs_locations_ptr, obs_errors_ptr, & obs_weight_ptr, nobs_actual, & height_full, height_half, gen_obs_loc) use module_svd use module_io real, pointer, intent(inout) :: ensemble_pi_ptr(:,:,:,:), ensemble_u_ptr(:,:,:,:), & ensemble_v_ptr(:,:,:,:), ensemble_th_ptr(:,:,:,:), ensemble_q_ptr(:,:,:,:) type(model_data), pointer, intent(inout) :: back_data_ptr, true_data_ptr real, pointer, intent(inout) :: obs_wind_ptr(:), obs_errors_ptr(:), & obs_locations_ptr(:,:), obs_weight_ptr(:) integer, intent(in) :: nobs_actual real, dimension(ids:ide, kms:kme, jds:jde), intent(in) :: height_full, height_half logical, intent(in), optional :: gen_obs_loc ! 是否生成观测位置 integer :: k, ierr type(model_ensemble_data) :: temp_ensemble_data ! 临时集合数据结构 logical :: do_gen_obs_loc if (present(gen_obs_loc)) then do_gen_obs_loc = gen_obs_loc else do_gen_obs_loc = .true. end if write(*,'(A)') 'Reading ensemble forecasts ...' ! 先分配临时结构体 allocate(temp_ensemble_data%pi(ids:ide, kms:kme, jds:jde, nsample)) allocate(temp_ensemble_data%u(ids:ide, kms:kme, jds:jde, nsample)) allocate(temp_ensemble_data%v(ids:ide, kms:kme, jds:jde, nsample)) allocate(temp_ensemble_data%th(ids:ide, kms:kme, jds:jde, nsample)) allocate(temp_ensemble_data%q(ids:ide, kms:kme, jds:jde, nsample)) call read_ensemble_forecasts(temp_ensemble_data, height_full, height_half) ! 复制数据到共享内存指针 ensemble_pi_ptr = temp_ensemble_data%pi ensemble_u_ptr = temp_ensemble_data%u ensemble_v_ptr = temp_ensemble_data%v ensemble_th_ptr = temp_ensemble_data%th ensemble_q_ptr = temp_ensemble_data%q ! 释放临时结构体 deallocate(temp_ensemble_data%pi, temp_ensemble_data%u, temp_ensemble_data%v, & temp_ensemble_data%th, temp_ensemble_data%q) write(*,'(A)') 'Ensemble data: OK' write(*,'(A)') 'Reading background field ...' call read_background_field(back_data_ptr, height_full, height_half) write(*,'(A)') 'Background field: OK' write(*,'(A)') 'Reading true field ...' call read_truth_field(true_data_ptr, height_full, height_half) write(*,'(A)') 'True field: OK' ! 观测数据处理 write(*,'(A,I0,A)') 'Processing ', nobs_actual, ' observations ...' ! 观测位置已由全局主进程生成并广播,这里不再生成 write(*,'(A)') 'Step 1: Observation locations already generated by global master.' write(*,'(A)') 'Step 2: Initializing observation arrays ...' obs_wind_ptr(:) = 0.0 ! 初始化 obs_errors_ptr(:) = 1.005 ! 观测误差 write(*,'(A)') 'Step 2: Observation arrays initialized.' write(*,'(A)') 'Step 3: Calling get_obs_data ...' call get_obs_data(nobs_actual, obs_wind_ptr, obs_locations_ptr, obs_errors_ptr, true_data_ptr%u, obs_weight_ptr) write(*,'(A)') 'Step 3: get_obs_data completed.' end subroutine read_and_store_shared_data ! 全局归约分析增量 - 修复重复累加问题 subroutine global_reduce_analysis_increment(analysis_increment) use mpi type(model_data), intent(inout) :: analysis_increment integer :: ierr integer :: array_size type(model_data) :: global_result ! 全局归约结果 ! 计算数组大小:(ide-ids+1)*(kme-kms+1)*(jde-jds+1) array_size = (ide-ids+1)*(kme-kms+1)*(jde-jds+1) ! 调试:输出归约前的统计信息 if (myid == 0) then write(*,'(A,2ES12.5)') 'Process 0 before MPI_Reduce π range: ', & minval(analysis_increment%pi), maxval(analysis_increment%pi) end if ! 
归约到全局主进程(myid==0),所有进程都必须调用,参数完全一致 call MPI_Reduce(analysis_increment%pi, global_result%pi, & array_size, MPI_REAL, MPI_SUM, 0, MPI_COMM_WORLD, ierr) call MPI_Reduce(analysis_increment%u, global_result%u, & array_size, MPI_REAL, MPI_SUM, 0, MPI_COMM_WORLD, ierr) call MPI_Reduce(analysis_increment%v, global_result%v, & array_size, MPI_REAL, MPI_SUM, 0, MPI_COMM_WORLD, ierr) call MPI_Reduce(analysis_increment%th, global_result%th, & array_size, MPI_REAL, MPI_SUM, 0, MPI_COMM_WORLD, ierr) call MPI_Reduce(analysis_increment%q, global_result%q, & array_size, MPI_REAL, MPI_SUM, 0, MPI_COMM_WORLD, ierr) ! 归约后,只有myid==0进程拥有正确的全局结果,其余进程数据未定义 ! 若需要所有进程都获得全局结果,可用MPI_Bcast广播 call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr) if (myid == 0) then analysis_increment%pi = global_result%pi analysis_increment%u = global_result%u analysis_increment%v = global_result%v analysis_increment%th = global_result%th analysis_increment%q = global_result%q ! 调试:输出归约后的统计信息 write(*,'(A,2ES12.5)') 'After MPI_Reduce π range: ', & minval(analysis_increment%pi), maxval(analysis_increment%pi) write(*,'(A,2ES12.5)') 'After MPI_Reduce u range: ', & minval(analysis_increment%u), maxval(analysis_increment%u) end if call MPI_Bcast(analysis_increment%pi, array_size, MPI_REAL, 0, MPI_COMM_WORLD, ierr) call MPI_Bcast(analysis_increment%u, array_size, MPI_REAL, 0, MPI_COMM_WORLD, ierr) call MPI_Bcast(analysis_increment%v, array_size, MPI_REAL, 0, MPI_COMM_WORLD, ierr) call MPI_Bcast(analysis_increment%th, array_size, MPI_REAL, 0, MPI_COMM_WORLD, ierr) call MPI_Bcast(analysis_increment%q, array_size, MPI_REAL, 0, MPI_COMM_WORLD, ierr) ! 现在所有进程的analysis_increment均为全局归约结果 end subroutine global_reduce_analysis_increment end program da_system