Unable to set dev_loss_tmo and fast_io_fail_tmo in multipath.conf

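This post covers a DM-Multipath problem on Red Hat Enterprise Linux 6.3. After the "fast_io_fail_tmo" and "dev_loss_tmo" options are specified in multipath.conf, the values remain unchanged. The root cause is the order in which the two values are set, combined with a sysfs restriction. Updating the system to device-mapper-multipath-0.4.9-72.el6 or later resolves the problem; diagnostic steps are also given.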

https://access.redhat.com/solutions/344673

SOLUTION Verified - Updated April 14, 2015 at 04:27

Environment

  • Red Hat Enterprise Linux 6.3
  • DM-Multipath

Issue

  • After specifying the options "fast_io_fail_tmo" and "dev_loss_tmo" in multipath.conf, these values remain unchanged.

Resolution

An erratum has been released for internal BZ#974129, which was reported for the issue described here. Updating the system to device-mapper-multipath-0.4.9-72.el6 or later will resolve the issue.

With the fix, when multipath needs to set fast_io_fail_tmo to a value higher than the existing dev_loss_tmo value, it first sets fast_io_fail_tmo to one less than the existing dev_loss_tmo value. Then, if dev_loss_tmo is also being increased, it sets dev_loss_tmo and finally sets fast_io_fail_tmo to the selected value.

That is, multipath will correctly set any valid combination of fast_io_fail_tmo and dev_loss_tmo.
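
The ordering described above can be sketched as a small shell helper. This is only an illustration under stated assumptions, not multipath's actual code: plan_tmo_writes is a hypothetical function that prints the writes in the order they would need to be issued, given the existing dev_loss_tmo and the two wanted values (all assumed to be plain numbers of seconds).

```shell
# Hypothetical helper (not multipath code): print the sysfs writes, in order,
# needed to reach the wanted values without tripping the sysfs check.
# Usage: plan_tmo_writes CURRENT_DEV_LOSS WANTED_FAST_IO_FAIL WANTED_DEV_LOSS
plan_tmo_writes() {
    local cur_dev_loss="$1"
    local want_fast="$2"
    local want_dev_loss="$3"

    # A combination is only valid if fast_io_fail_tmo stays below dev_loss_tmo.
    if [ "$want_fast" -ge "$want_dev_loss" ]; then
        echo "invalid: fast_io_fail_tmo must be lower than dev_loss_tmo" >&2
        return 1
    fi

    if [ "$want_fast" -ge "$cur_dev_loss" ]; then
        # The wanted value would be rejected right now, so first park
        # fast_io_fail_tmo one below the existing dev_loss_tmo ...
        echo "fast_io_fail_tmo=$((cur_dev_loss - 1))"
        # ... then raise dev_loss_tmo, then set the selected value.
        echo "dev_loss_tmo=$want_dev_loss"
        echo "fast_io_fail_tmo=$want_fast"
    else
        # The wanted value is already below the existing dev_loss_tmo, so
        # writing fast_io_fail_tmo first is assumed to succeed.
        echo "fast_io_fail_tmo=$want_fast"
        echo "dev_loss_tmo=$want_dev_loss"
    fi
}

# Example with made-up targets: existing dev_loss_tmo 8, wanted 30 and 60.
plan_tmo_writes 8 30 60
```

For an existing dev_loss_tmo of 8, the example call prints three writes: fast_io_fail_tmo=7, dev_loss_tmo=60, fast_io_fail_tmo=30. The target values 30 and 60 are illustrative only.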

Root Cause

Multipath sets fast_io_fail_tmo before setting dev_loss_tmo, and sysfs does not allow fast_io_fail_tmo to be set to a value equal to or higher than dev_loss_tmo.

As a consequence, if the user sets fast_io_fail_tmo higher than the existing dev_loss_tmo value, multipath is unable to apply it.
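
To see which remote ports would hit this restriction for a given value, the comparison can be sketched in shell. rports_rejecting_fast_io_fail is a hypothetical helper: it only reads dev_loss_tmo and applies the rule stated above, and it takes the sysfs directory as an optional second argument so it can be pointed at a copy of the tree.

```shell
# Hypothetical helper: list remote ports whose current dev_loss_tmo would
# cause a write of the wanted fast_io_fail_tmo to be rejected
# (wanted value equal to or higher than dev_loss_tmo).
# Usage: rports_rejecting_fast_io_fail WANTED_FAST_IO_FAIL [SYSFS_DIR]
rports_rejecting_fast_io_fail() {
    local want="$1"
    local root="${2:-/sys/class/fc_remote_ports}"
    local f cur

    for f in "$root"/rport-*/dev_loss_tmo; do
        # Skip the unexpanded glob when no remote ports exist.
        [ -r "$f" ] || continue
        cur=$(cat "$f")
        if [ "$want" -ge "$cur" ]; then
            echo "$(basename "$(dirname "$f")"): dev_loss_tmo=$cur"
        fi
    done
}
```

Calling it as rports_rejecting_fast_io_fail 30 would list every port whose dev_loss_tmo is 30 or lower.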

Diagnostic Steps

  • You can view the current values with the following commands:

  • fast_io_fail_tmo

$ for f in /sys/class/fc_remote_ports/rport-*/fast_io_fail_tmo; do d=$(dirname $f); echo $(basename $d):$(cat $d/node_name):$(cat $f); done
  • dev_loss_tmo

# for f in /sys/class/fc_remote_ports/rport-*/dev_loss_tmo; do d=$(dirname $f); echo $(basename $d):$(cat $d/node_name):$(cat $f); done

These values should be overridden by adding entries to /etc/multipath.conf:

defaults {
        user_friendly_names yes
        fast_io_fail_tmo <N>
        dev_loss_tmo <N>
}
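
To compare what is configured with what sysfs reports, the configured values can be pulled out of the file. This is a minimal sketch, assuming each option sits on its own line in the form "option value"; conf_value is a hypothetical helper, and it prints every occurrence of the option regardless of which section it appears in.

```shell
# Hypothetical helper: print the value(s) configured for one option.
# Usage: conf_value OPTION [CONFIG_FILE]
conf_value() {
    local key="$1"
    local conf="${2:-/etc/multipath.conf}"

    # Match lines whose first field is exactly the option name;
    # commented-out lines start with '#' and therefore do not match.
    awk -v key="$key" '$1 == key { print $2 }' "$conf"
}
```

For example, conf_value dev_loss_tmo prints the value set in /etc/multipath.conf, which can then be checked against the sysfs output.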

[root@localhost ~]#  for f in /sys/class/fc_remote_ports/rport-*/fast_io_fail_tmo; do d=$(dirname $f); echo $(basename $d):$(cat $d/node_name):$(cat $f); done
rport-1:0-0:0x5005076303ffc4d6:off
rport-1:0-1:0x5005076303ffc4d6:off
rport-2:0-0:0x5005076303ffc4d6:off
rport-2:0-1:0x5005076303ffc4d6:off
rport-5:0-0:0x100000051e36126a:off
rport-5:0-1:0x100000051e36126a:off
rport-6:0-0:0x100000051e3637ba:off
rport-6:0-1:0x100000051e3637ba:off
[root@localhost ~]#  for f in /sys/class/fc_remote_ports/rport-*/dev_loss_tmo; do d=$(dirname $f); echo $(basename $d):$(cat $d/node_name):$(cat $f); done
rport-1:0-0:0x5005076303ffc4d6:8
rport-1:0-1:0x5005076303ffc4d6:8
rport-2:0-0:0x5005076303ffc4d6:8
rport-2:0-1:0x5005076303ffc4d6:8
rport-5:0-0:0x100000051e36126a:30
rport-5:0-1:0x100000051e36126a:30
rport-6:0-0:0x100000051e3637ba:30
rport-6:0-1:0x100000051e3637ba:30
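
The two loops above can be combined so that both timeouts appear side by side for each remote port, which makes it easier to spot ports where a wanted fast_io_fail_tmo is not below dev_loss_tmo. A sketch, with show_rport_tmo as a hypothetical helper name and the sysfs directory as an optional argument:

```shell
# Hypothetical helper: print node name, fast_io_fail_tmo and dev_loss_tmo
# for every FC remote port on one line each.
# Usage: show_rport_tmo [SYSFS_DIR]
show_rport_tmo() {
    local root="${1:-/sys/class/fc_remote_ports}"
    local d

    for d in "$root"/rport-*; do
        # Skip the unexpanded glob when no remote ports exist.
        [ -d "$d" ] || continue
        echo "$(basename "$d"):$(cat "$d/node_name"):fast_io_fail_tmo=$(cat "$d/fast_io_fail_tmo"):dev_loss_tmo=$(cat "$d/dev_loss_tmo")"
    done
}
```

Run without arguments, it reads the same sysfs files as the commands above.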
  • If we strace the multipathd daemon:

#  strace -f -tt -T -v -x -o /tmp/stracing_multipathd.txt -s 4096 multipathd -v9

The log file in /tmp will then contain entries like the following, in which the write to fast_io_fail_tmo fails with EINVAL:

/tmp/stracing_multipathd2.txt:1974  09:14:56.540154 open("/sys/class/fc_remote_ports/rport-2:0-0/fast_io_fail_tmo", O_WRONLY) = 6 <0.000048>
/tmp/stracing_multipathd2.txt-1974  09:14:56.540283 write(6, "\x33\x30\x00", 3) = -1 EINVAL (Invalid argument) <0.000024>
/tmp/stracing_multipathd2.txt-1974  09:14:56.540377 close(6)          = 0 <0.000030>
--
/tmp/stracing_multipathd3.txt:2103  09:23:44.284830 open("/sys/class/fc_remote_ports/rport-2:0-0/fast_io_fail_tmo", O_WRONLY) = 6 <0.000055>
/tmp/stracing_multipathd3.txt-2103  09:23:44.284951 write(6, "30\0", 3) = -1 EINVAL (Invalid argument) <0.000021>
/tmp/stracing_multipathd3.txt-2103  09:23:44.285030 close(6)          = 0 <0.000020>
--
/tmp/stracing_multipathd.txt:1887  09:00:00.782075 open("/sys/class/fc_remote_ports/rport-2:0-0/fast_io_fail_tmo", O_WRONLY) = 6 <0.000051>
/tmp/stracing_multipathd.txt-1887  09:00:00.782208 write(6, "\x33\x30\x00", 3) = -1 EINVAL (Invalid argument) <0.000024>
/tmp/stracing_multipathd.txt-1887  09:00:00.782301 close(6)          = 0 <0.000023>
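  • Even with the correct order, setting dev_loss_tmo can still fail, because the kernel caps the value. In the session below, writes of 2147483647 and 2147483646 are rejected with "Invalid argument", while 21 and 600 are accepted: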

[root@1950-pe2 ~]# for f in /sys/class/fc_remote_ports/rport-*/dev_loss_tmo; do d=$(dirname $f); echo $(basename $d):$(cat $d/node_name):$(cat $f); done
rport-2:0-0:0x50060160ba601693:120
rport-2:0-1:0x200000e08b90d42d:30
rport-2:0-2:0x200000e08b8f2388:30
rport-2:0-3:0x20000000c9802436:30
rport-3:0-0:0x50060160ba601693:2147483647
rport-3:0-1:0x200000e08b8f7684:16
rport-3:0-2:0x200000e08b90cb2e:16
rport-3:0-3:0x20000000c9802437:16
[root@1950-pe2 ~]# for f in /sys/class/fc_remote_ports/rport-*/dev_loss_tmo; do d=$(dirname $f); echo $(basename $d):$(cat $d/node_name):$(cat $f); done^C
[root@1950-pe2 ~]# echo 2147483647 > /sys/class/fc_remote_ports/rport-3\:0-1/dev_loss_tmo 
-bash: echo: write error: Invalid argument
[root@1950-pe2 ~]# echo 21 > /sys/class/fc_remote_ports/rport-3\:0-1/dev_loss_tmo 
[root@1950-pe2 ~]# echo 2147483646 > /sys/class/fc_remote_ports/rport-3\:0-1/dev_loss_tmo 
-bash: echo: write error: Invalid argument
[root@1950-pe2 ~]# cat /sys/class/fc_remote_ports/rport-3\:0-1/dev_loss_tmo 
21
[root@1950-pe2 ~]# cat /sys/class/fc_remote_ports/rport-3\:0-1/
device/            fast_io_fail_tmo   port_id            port_state         roles              subsystem/         uevent             
dev_loss_tmo       node_name          port_name          power/             scsi_target_id     supported_classes  
[root@1950-pe2 ~]# cat /sys/class/fc_remote_ports/rport-3\:0-1/node_name 
0x200000e08b8f7684
[root@1950-pe2 ~]# cat /sys/class/fc_remote_ports/rport-3\:0-1/roles 
FCP Initiator
[root@1950-pe2 ~]# cat /sys/class/fc_remote_ports/rport-3\:0-0/roles 
FCP Target
[root@1950-pe2 ~]# cat /sys/class/fc_remote_ports/rport-2\:0-0/roles 
FCP Target
[root@1950-pe2 ~]# cat /sys/class/fc_remote_ports/rport-2\:0-1/roles 
FCP Initiator
[root@1950-pe2 ~]# echo 2147483647 > /sys/class/fc_remote_ports/rport-2\:0-1/dev_loss_tmo
-bash: echo: write error: Invalid argument
[root@1950-pe2 ~]# echo 21 > /sys/class/fc_remote_ports/rport-2\:0-1/dev_loss_tmo
[root@1950-pe2 ~]# for f in /sys/class/fc_remote_ports/rport-*/dev_loss_tmo; do d=$(dirname $f); echo $(basename $d):$(cat $d/node_name):$(cat $f); done
rport-2:0-0:0x50060160ba601693:120
rport-2:0-1:0x200000e08b90d42d:21
rport-2:0-2:0x200000e08b8f2388:30
rport-2:0-3:0x20000000c9802436:30
rport-3:0-0:0x50060160ba601693:2147483647
rport-3:0-1:0x200000e08b8f7684:21
rport-3:0-2:0x200000e08b90cb2e:16
rport-3:0-3:0x20000000c9802437:16
[root@1950-pe2 ~]# echo 600 > /sys/class/fc_remote_ports/rport-2\:0-1/dev_loss_tmo
[root@1950-pe2 ~]# for f in /sys/class/fc_remote_ports/rport-*/dev_loss_tmo; do d=$(dirname $f); echo $(basename $d):$(cat $d/node_name):$(cat $f); done
rport-2:0-0:0x50060160ba601693:120
rport-2:0-1:0x200000e08b90d42d:600
rport-2:0-2:0x200000e08b8f2388:30
rport-2:0-3:0x20000000c9802436:30
rport-3:0-0:0x50060160ba601693:2147483647
rport-3:0-1:0x200000e08b8f7684:21
rport-3:0-2:0x200000e08b90cb2e:16
rport-3:0-3:0x20000000c9802437:16

  • As we can see in drivers/scsi/scsi_transport_fc.c:

/*
 * dev_loss_tmo: the default number of seconds that the FC transport
 *   should insulate the loss of a remote port.
 *   The maximum will be capped by the value of SCSI_DEVICE_BLOCK_MAX_TIMEOUT.
 */
static unsigned int fc_dev_loss_tmo = 60;               /* seconds */
  • And this is the max value according to drivers/scsi/scsi_priv.h:

#define SCSI_DEVICE_BLOCK_MAX_TIMEOUT 600 /* units in seconds */
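
Before writing a value to dev_loss_tmo, it can be checked against this cap. The sketch below mirrors only the 600-second limit quoted above; dev_loss_tmo_within_cap is a hypothetical helper. Note that the session above also shows one port already reporting 2147483647, so the kernel evidently accepts larger values under conditions this article does not spell out.

```shell
# Cap taken from the SCSI_DEVICE_BLOCK_MAX_TIMEOUT definition quoted above.
MAX_DEV_LOSS_TMO=600

# Hypothetical helper: succeed if the wanted dev_loss_tmo (in seconds)
# does not exceed the cap, fail otherwise.
# Usage: dev_loss_tmo_within_cap WANTED_DEV_LOSS
dev_loss_tmo_within_cap() {
    [ "$1" -le "$MAX_DEV_LOSS_TMO" ]
}

# Check the values tried in the session above.
for v in 21 600 2147483647; do
    if dev_loss_tmo_within_cap "$v"; then
        echo "$v: within cap"
    else
        echo "$v: exceeds cap of $MAX_DEV_LOSS_TMO"
    fi
done
```

This matches the session: 21 and 600 were accepted on the initiator ports, while 2147483647 was rejected.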