How to set dev_loss_tmo and fast_io_fail_tmo persistently, using a udev rule

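This post covers Red Hat Enterprise Linux 6 and 7, where specific timeout parameters must be set and must survive a reboot. It explains that these parameters are transport-layer timeouts tied to the remote port structure, shows how to pull udev information from the relevant rports and build a udev rule that sets them, and gives several examples along with how to apply the rule.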

https://access.redhat.com/solutions/3234351

SOLUTION UNVERIFIED - Updated May 10, 2018 at 22:39

Environment

  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7

Issue

  • Need to set fast_io_fail_tmo and dev_loss_tmo
  • Setting must persist across reboot

Resolution

  • Both fast_io_fail_tmo and dev_loss_tmo are transport-layer timeouts: they are defined on the remote port structure of the fabric, which is exposed through the fc_remote_ports class. Because they depend on the state of the remote port, we must target an rport to pull the udev database information we need.

  • Our example device is a dm-multipath device named /dev/mapper/test_lun. It has 8 paths presented through hosts 2, 3, 4, and 5. Duplicate backend ports are provided through target ports 0 and 1, ending at LUN 0.

test_lun (wwid_omitted) dm-4 NETAPP,LUN
size=30G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='queue-length 0' prio=50 status=active
| |- 2:0:0:0  sde  8:64   active ready running
| |- 3:0:1:0  sdn  8:208  active ready running
| |- 4:0:0:0  sdq  65:0   active ready running
| `- 5:0:0:0  sdw  65:96  active ready running
`-+- policy='queue-length 0' prio=10 status=enabled
  |- 2:0:1:0  sdh  8:112  active ready running
  |- 3:0:0:0  sdm  8:192  active ready running
  |- 4:0:1:0  sdt  65:48  active ready running
  `- 5:0:1:0  sdad 65:208 active ready running

 

  • To start, pick one or more of the relevant rports and pull the udev information using the udevadm info command. We've chosen two from host2, rport-2:0-0 and rport-2:0-1.

[root@host ~]# ls /sys/class/fc_remote_ports/
rport-2:0-0  rport-2:0-2  rport-3:0-1  rport-4:0-0  rport-4:0-2  rport-4:0-5  rport-5:0-1  rport-5:0-3
rport-2:0-1  rport-3:0-0  rport-3:0-2  rport-4:0-1  rport-4:0-3  rport-5:0-0  rport-5:0-2  rport-5:0-4

[root@host ~]# udevadm info --attribute-walk --path=/sys/class/fc_remote_ports/rport-2\:0-0/

  looking at device '/devices/pci0000:00/0000:00:09.0/0000:04:00.0/host2/rport-2:0-0/fc_remote_ports/rport-2:0-0':
    KERNEL=="rport-2:0-0"
    SUBSYSTEM=="fc_remote_ports"
    DRIVER==""
    ATTR{supported_classes}=="Class 3"
    ATTR{dev_loss_tmo}=="30"
    ATTR{node_name}=="0x500a09808607eec3"
    ATTR{port_name}=="0x500a09819607eec3"
    ATTR{port_id}=="0x610400"
    ATTR{roles}=="FCP Target"
    ATTR{port_state}=="Online"
    ATTR{scsi_target_id}=="0"
    ATTR{fast_io_fail_tmo}=="5"

  looking at parent device '/devices/pci0000:00/0000:00:09.0/0000:04:00.0/host2/rport-2:0-0':
    KERNELS=="rport-2:0-0"
    SUBSYSTEMS==""
    DRIVERS==""
[ ... snip ... ]

[root@host ~]# udevadm info --attribute-walk --path=/sys/class/fc_remote_ports/rport-2\:0-1/

  looking at device '/devices/pci0000:00/0000:00:09.0/0000:04:00.0/host2/rport-2:0-1/fc_remote_ports/rport-2:0-1':
    KERNEL=="rport-2:0-1"
    SUBSYSTEM=="fc_remote_ports"
    DRIVER==""
    ATTR{supported_classes}=="Class 3"
    ATTR{dev_loss_tmo}=="2147483647"
    ATTR{node_name}=="0x500a09808607eec3"
    ATTR{port_name}=="0x500a09828607eec3"
    ATTR{port_id}=="0x610500"
    ATTR{roles}=="FCP Target"
    ATTR{port_state}=="Online"
    ATTR{scsi_target_id}=="1"
    ATTR{fast_io_fail_tmo}=="5"

  looking at parent device '/devices/pci0000:00/0000:00:09.0/0000:04:00.0/host2/rport-2:0-1':
    KERNELS=="rport-2:0-1"
    SUBSYSTEMS==""
    DRIVERS==""
[ ... snip ... ]
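
The attributes that udevadm reports above are plain files under /sys/class/fc_remote_ports, so they can also be read directly. The following is a minimal sketch of that, not part of the original solution: the function name list_rport_tmo and its optional BASE_DIR argument are assumptions introduced here, the latter only so the function can be pointed at a test directory.

```shell
# list_rport_tmo [BASE_DIR]
# Print one line per remote port: name, roles, dev_loss_tmo, fast_io_fail_tmo.
# BASE_DIR is a hypothetical parameter of this sketch; it defaults to the
# real sysfs class directory.
list_rport_tmo() {
    local base="${1:-/sys/class/fc_remote_ports}"
    local rport
    for rport in "$base"/rport-*; do
        # Skip the unexpanded glob when no remote ports exist.
        [ -d "$rport" ] || continue
        printf '%s roles="%s" dev_loss_tmo=%s fast_io_fail_tmo=%s\n' \
            "$(basename "$rport")" \
            "$(cat "$rport/roles")" \
            "$(cat "$rport/dev_loss_tmo")" \
            "$(cat "$rport/fast_io_fail_tmo")"
    done
}

# Show every rport on this host (prints nothing if there are none).
list_rport_tmo
```

This gives a quick overview of all rports at once, which can help when deciding whether to match by role or by individual node_name and port_name.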

 

  • From this information we can build a udev rule to set both dev_loss_tmo and fast_io_fail_tmo. In this example, we'll target all of our hosts, and every viable rport behind the hosts, and set each to a dev_loss_tmo of 10 and a fast_io_fail_tmo of 5. We match all viable rports by matching the role "FCP Target". Create /etc/udev/rules.d/99-tmo.rules and include the below contents.

ACTION!="add|change", GOTO="tmo_end"
KERNELS=="rport-?*", SUBSYSTEM=="fc_remote_ports",  ATTR{roles}=="FCP Target", ATTR{dev_loss_tmo}="10", ATTR{fast_io_fail_tmo}="5"
LABEL="tmo_end"
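
An ATTR{...}="value" assignment in a udev rule writes the value to the matching sysfs attribute. To make that concrete, here is a rough, non-persistent equivalent of the rule above as a shell function. It is only a sketch: the name apply_rport_tmo and the optional BASE_DIR argument are assumptions made for illustration, and on a real system it would need to run as root.

```shell
# apply_rport_tmo DEV_LOSS FAST_IO_FAIL [BASE_DIR]
# Write both timeouts to every rport whose role is "FCP Target".
# The change is lost at reboot or when the rport is re-created;
# the udev rule is what makes it persistent.
# BASE_DIR is a hypothetical parameter used only for testing.
apply_rport_tmo() {
    local dev_loss="$1" fast_io_fail="$2"
    local base="${3:-/sys/class/fc_remote_ports}"
    local rport
    for rport in "$base"/rport-*; do
        [ -d "$rport" ] || continue
        # Same filter as ATTR{roles}=="FCP Target" in the rule.
        [ "$(cat "$rport/roles")" = "FCP Target" ] || continue
        # Same order as the two assignments in the rule.
        echo "$dev_loss" > "$rport/dev_loss_tmo"
        echo "$fast_io_fail" > "$rport/fast_io_fail_tmo"
    done
}
```

Calling `apply_rport_tmo 10 5` would correspond to the values used in the rule. This can be handy for trying values out before committing them to /etc/udev/rules.d/99-tmo.rules.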

 

  • In the second example, we'll target individual rports, using the "node_name" and the "port_name" of the rport.

ACTION!="add|change", GOTO="tmo_end"
KERNELS=="rport-?*", SUBSYSTEM=="fc_remote_ports", ATTR{node_name}=="0x500a09808607eec3", ATTR{port_name}=="0x500a09819607eec3", ATTR{dev_loss_tmo}="10", ATTR{fast_io_fail_tmo}="5"
KERNELS=="rport-?*", SUBSYSTEM=="fc_remote_ports", ATTR{node_name}=="0x500a09808607eec3", ATTR{port_name}=="0x500a09828607eec3", ATTR{dev_loss_tmo}="10", ATTR{fast_io_fail_tmo}="5"
LABEL="tmo_end"
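
With more than a couple of target ports, typing one rule line per rport by hand gets error-prone. The sketch below generates the same rule file layout from a list of "node_name port_name" pairs read from standard input. The helper names rport_rule and make_tmo_rules are hypothetical and exist only for this illustration.

```shell
# rport_rule NODE_NAME PORT_NAME DEV_LOSS FAST_IO_FAIL
# Print one udev rule line matching a single rport by WWNN and WWPN.
rport_rule() {
    printf 'KERNELS=="rport-?*", SUBSYSTEM=="fc_remote_ports", ATTR{node_name}=="%s", ATTR{port_name}=="%s", ATTR{dev_loss_tmo}="%s", ATTR{fast_io_fail_tmo}="%s"\n' \
        "$1" "$2" "$3" "$4"
}

# make_tmo_rules DEV_LOSS FAST_IO_FAIL
# Read "node_name port_name" pairs from stdin and print a complete
# rules file, wrapped in the same ACTION guard and label as above.
make_tmo_rules() {
    local dev_loss="$1" fast_io_fail="$2" node port
    echo 'ACTION!="add|change", GOTO="tmo_end"'
    while read -r node port; do
        # Ignore blank or incomplete lines.
        [ -n "$node" ] && [ -n "$port" ] || continue
        rport_rule "$node" "$port" "$dev_loss" "$fast_io_fail"
    done
    echo 'LABEL="tmo_end"'
}

# Reproduce the two-rport example; redirect to a file to save it.
make_tmo_rules 10 5 <<'EOF'
0x500a09808607eec3 0x500a09819607eec3
0x500a09808607eec3 0x500a09828607eec3
EOF
```

The output can be reviewed on the terminal first and then redirected to /etc/udev/rules.d/99-tmo.rules once it looks right.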

 

  • To apply, reload the rules and database:

# RHEL 6
[root@host ~]# udevadm control --reload-rules
# RHEL 7
[root@host ~]# udevadm control --reload

 

  • Then trigger against the appropriate subsystem:

[root@host ~]# udevadm trigger --subsystem-match=fc_remote_ports
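
After the trigger it is worth confirming that the values actually landed. One way to do that, sketched here under the assumption that the rule targets every rport with the role "FCP Target", is to read the attributes back from sysfs and compare them with the expected values. The function name check_rport_tmo and its optional BASE_DIR argument are hypothetical.

```shell
# check_rport_tmo DEV_LOSS FAST_IO_FAIL [BASE_DIR]
# Return 0 if every "FCP Target" rport has the expected timeouts,
# 1 otherwise. Mismatching rports are reported on stdout.
# BASE_DIR is a hypothetical parameter used only for testing.
check_rport_tmo() {
    local want_dev_loss="$1" want_fast_io_fail="$2"
    local base="${3:-/sys/class/fc_remote_ports}"
    local rport status=0
    for rport in "$base"/rport-*; do
        [ -d "$rport" ] || continue
        [ "$(cat "$rport/roles")" = "FCP Target" ] || continue
        if [ "$(cat "$rport/dev_loss_tmo")" != "$want_dev_loss" ] ||
           [ "$(cat "$rport/fast_io_fail_tmo")" != "$want_fast_io_fail" ]; then
            echo "MISMATCH: $(basename "$rport")"
            status=1
        fi
    done
    return "$status"
}
```

For the first example rule, `check_rport_tmo 10 5 && echo OK` would print OK only when all target rports report a dev_loss_tmo of 10 and a fast_io_fail_tmo of 5. Running the same check after a reboot confirms that the setting persists.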



Note: To set eh_deadline and eh_timeout, see "How to set eh_deadline and eh_timeout persistently, using a udev rule". If you are setting dev_loss_tmo on a Cisco UCS system using the fnic driver, see "Does the fnic driver have a "dev_loss_tmo" setting?".
