Getting error: SCSI error: return code = 0x00010000

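This article discusses the SCSI error 'return code = 0x00010000' seen on Red Hat Enterprise Linux systems, specifically DID_NO_CONNECT, which usually indicates a hardware connectivity problem. It explains the possible causes of the error, including hardware failure, storage reconfiguration, or remote port timeouts, and describes how to check the system logs to determine the root cause.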


Environment

  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7

Issue

  • Getting error: 'kernel: sd h:c:t:l: SCSI error: return code = 0x00010000':

    May 22 23:50:10 localhost kernel: Device sdb not ready.
    May 22 23:50:10 localhost kernel: end_request: I/O error, dev sdb, sector 0
    May 22 23:50:10 localhost kernel: SCSI error : <0 0 2 14> return code = 0x10000
    
  • A SAN access issue was reported on a RHEL server, and the following messages were observed in /var/log/messages:

    May  5 04:15:00 localhost kernel: sd 3:0:1:42: Unhandled error code
    May  5 04:15:00 localhost kernel: sd 3:0:1:42: SCSI error: return code = 0x00010000
    May  5 04:15:00 localhost kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
    
  • In device-mapper multipath, some paths fail and go down. The logs contain many "tur checker reports path is down" events, and multipath -ll output shows paths in the failed/faulty state:

    [root@example ~]# multipath -ll
    sdb: checker msg is "tur checker reports path is down"
    mpath0 (2001738006296000b) dm-8 IBM,2810XIV
    [size=200G][features=1 queue_if_no_path][hwhandler=0][rw]
    \_ round-robin 0 [prio=3][active]
     \_ 2:0:0:1 sda 8:0   [active][ready]
     \_ 2:0:1:1 sdb 8:16  [failed][faulty]
     \_ 4:0:8:1 sdc 8:32  [active][ready]
     \_ 4:0:9:1 sdd 8:48  [active][ready]
    
  • We had a SAN director failure, and the system panicked because the lpfc driver lost its devices.
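To pull the device address and return code out of messages like the ones above, a small parser can help when scanning large logs. This is a minimal Python sketch, assuming only the two line formats shown in these examples; parse_scsi_error and ScsiErrorLine are hypothetical names, and other kernel releases may word these messages differently.

```python
import re
from typing import NamedTuple, Optional


class ScsiErrorLine(NamedTuple):
    """Device address (host:channel:target:lun) and return code from one log line."""
    host: int
    channel: int
    target: int
    lun: int
    return_code: int


# Format of the second example above:
#   "sd 3:0:1:42: SCSI error: return code = 0x00010000"
_SD_FORMAT = re.compile(
    r"sd (\d+):(\d+):(\d+):(\d+): SCSI error: return code = (0x[0-9a-fA-F]+)"
)
# Format of the first example above:
#   "SCSI error : <0 0 2 14> return code = 0x10000"
_ANGLE_FORMAT = re.compile(
    r"SCSI error : <(\d+) (\d+) (\d+) (\d+)> return code = (0x[0-9a-fA-F]+)"
)


def parse_scsi_error(line: str) -> Optional[ScsiErrorLine]:
    """Return the parsed fields, or None if the line is not a SCSI error line."""
    for pattern in (_SD_FORMAT, _ANGLE_FORMAT):
        match = pattern.search(line)
        if match:
            h, c, t, l, code = match.groups()
            # int(..., 16) accepts the "0x" prefix present in the log text.
            return ScsiErrorLine(int(h), int(c), int(t), int(l), int(code, 16))
    return None
```

Both example formats yield the same return code value: 0x10000 and 0x00010000 are the same number, just printed with and without leading zeros.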

Resolution

SCSI error: return code = 0x00010000 DID_NO_CONNECT

  • There is likely a hardware issue related to the connectivity problems. Contact storage hardware support for assistance in determining the cause and addressing the problem.

  • Parallel SCSI

    DID_NO_CONNECT = SCSI SELECTION failed because there was no device at the address specified
    
  • iSCSI

    • The iSCSI layer returns this if the replacement/recovery timeout has expired, or if the user asked to shut down the session.
  • FC/SAN

    • This typically follows the loss of a storage port, for example:

    kernel:  rport-0:0-1: blocked FC remote port time out: saving binding
    kernel: sd 0:0:1:22: Unhandled error code
    kernel: sd 0:0:1:22: SCSI error: return code = 0x00010000
    kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
    
    • In the above example, the remote port timed out and the controller lost its connection to the devices behind that storage port.
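In the FC example, the remote port name and the failing device share the same SCSI host number: rport-0:0-1 and sd 0:0:1:22 are both on host 0. A minimal Python sketch of that correlation over log lines follows, assuming rport names use the rport-H:C-N pattern seen here (host, channel, port number); devices_after_rport_timeout is a hypothetical helper. Because the log alone does not say which rport a given target sits behind, the sketch attributes each failing device to every timed-out rport on the same host.

```python
import re
from typing import Dict, List

# "rport-0:0-1: blocked FC remote port time out" -> host 0, channel 0, port number 1
_RPORT_TIMEOUT = re.compile(r"rport-(\d+):(\d+)-(\d+): blocked FC remote port time out")
# "sd 0:0:1:22: SCSI error: return code = 0x00010000"
_SD_ERROR = re.compile(
    r"sd (\d+):(\d+):(\d+):(\d+): SCSI error: return code = (0x[0-9a-fA-F]+)"
)

DID_NO_CONNECT = 0x01  # host byte value


def devices_after_rport_timeout(lines: List[str]) -> Dict[str, List[str]]:
    """Map each timed-out rport to the sd devices on the same SCSI host that
    failed with host byte DID_NO_CONNECT after that timeout was logged."""
    timed_out: Dict[int, List[str]] = {}  # host number -> rport names seen so far
    result: Dict[str, List[str]] = {}
    for line in lines:
        rport = _RPORT_TIMEOUT.search(line)
        if rport:
            name = "rport-{}:{}-{}".format(*rport.groups())
            timed_out.setdefault(int(rport.group(1)), []).append(name)
            result.setdefault(name, [])
            continue
        err = _SD_ERROR.search(line)
        if err:
            host = int(err.group(1))
            host_byte = (int(err.group(5), 16) >> 16) & 0xFF
            if host_byte == DID_NO_CONNECT and host in timed_out:
                device = ":".join(err.groups()[:4])
                for name in timed_out[host]:
                    if device not in result[name]:
                        result[name].append(device)
    return result
```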

Root Cause

The SCSI error return code of 0x00010000 is broken down into constituent parts and decoded as shown below:

0x00 01 00 00
-------------
           00   status byte : {likely} not valid, see other fields
        00         msg byte : {likely} not valid, see other fields
     01           host byte : DID_NO_CONNECT - no connection to device {possibly device doesn't exist or transport failed}
  00            driver byte : {likely} not valid, see other fields
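The same breakdown can be done mechanically with shifts and masks. This is a minimal Python sketch; decode_scsi_result and describe are hypothetical helpers, and the host byte table is a partial list of values from the Linux kernel SCSI headers (not every code exists in every kernel release).

```python
from typing import Dict

# Partial table of host byte values from the Linux kernel SCSI headers.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
    0x0A: "DID_PASSTHROUGH",
    0x0B: "DID_SOFT_ERROR",
    0x0C: "DID_IMM_RETRY",
    0x0D: "DID_REQUEUE",
    0x0E: "DID_TRANSPORT_DISRUPTED",
    0x0F: "DID_TRANSPORT_FAILFAST",
}


def decode_scsi_result(code: int) -> Dict[str, int]:
    """Split a 32-bit SCSI result into its four bytes, highest byte first."""
    return {
        "driver": (code >> 24) & 0xFF,
        "host": (code >> 16) & 0xFF,
        "msg": (code >> 8) & 0xFF,
        "status": code & 0xFF,
    }


def describe(code: int) -> str:
    """Render the decoded bytes, naming the host byte when it is in the table."""
    parts = decode_scsi_result(code)
    host_name = HOST_BYTE_NAMES.get(parts["host"], "unknown host byte")
    return (f"0x{code:08x}: driver=0x{parts['driver']:02x} "
            f"host=0x{parts['host']:02x} ({host_name}) "
            f"msg=0x{parts['msg']:02x} status=0x{parts['status']:02x}")
```

For example, describe(0x00010000) returns "0x00010000: driver=0x00 host=0x01 (DID_NO_CONNECT) msg=0x00 status=0x00", and the short form 0x10000 logged by older kernels decodes identically. For 0x08000002, the same split gives driver byte 0x08 (DRIVER_SENSE) and status byte 0x02 (CHECK CONDITION) with a host byte of DID_OK.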

The return code 0x00010000 is DID_NO_CONNECT: the hardware transport connection to the device is no longer available. This SCSI error indicates that an I/O command is being rejected because it cannot be sent to the device until the hardware transport becomes available again. These errors are a symptom of some other root cause, and that root cause needs to be found and addressed.

Either access to the device is temporarily unavailable, or the device is no longer available within the configuration. A temporary service interruption can be caused by maintenance activity in the SAN, such as a switch reboot. Such activity can cause a link down condition, remote port timeouts, or both. It's possible that hardware transport connectivity will return at some later time. A permanent loss of connectivity can occur if the storage is reconfigured so that the LUN is no longer presented to the host.

Check for the following or similar event messages within the system logs:

Mar 28 19:52:53 hostname kernel: qla2xxx 0000:04:00.1: LOOP DOWN detected (4 4 0 0).
Mar 28 19:53:23 hostname kernel:  rport-3:0-0: blocked FC remote port time out: saving binding
Mar 28 19:53:23 hostname kernel: sd 3:0:0:1: SCSI error: return code = 0x00010000
:
.

Look at what was being logged just before these DID_NO_CONNECTs started being logged.
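One way to do this over a saved log is to keep a sliding window of recent lines and stop at the first DID_NO_CONNECT. This is a minimal Python sketch; it assumes the error lines contain the text "return code = 0x..." as in the examples above, and the function names and the default of 20 context lines are arbitrary, hypothetical choices.

```python
import re
from collections import deque
from typing import Iterable, List, Optional

_RETURN_CODE = re.compile(r"return code = (0x[0-9a-fA-F]+)")


def is_did_no_connect(line: str) -> bool:
    """True if the line carries a SCSI return code whose host byte is 0x01."""
    match = _RETURN_CODE.search(line)
    if not match:
        return False
    return ((int(match.group(1), 16) >> 16) & 0xFF) == 0x01


def lines_before_first_did_no_connect(
    lines: Iterable[str], context: int = 20
) -> Optional[List[str]]:
    """Return up to `context` lines logged just before the first DID_NO_CONNECT,
    or None if no such line is found."""
    previous = deque(maxlen=context)  # keeps only the most recent lines
    for line in lines:
        if is_did_no_connect(line):
            return list(previous)
        previous.append(line)
    return None
```

Usage might look like: with open("/var/log/messages") as f: print("\n".join(lines_before_first_did_no_connect(f) or [])). Reading the file line by line keeps memory bounded even for large logs.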

Remote ports can also time out (loss of connectivity) without a link down event. For more information on remote port timeouts, see: Saw error "rport-1:0-0: blocked FC remote port time out: saving binding" in system log?

There is a 30-second delay between the link down event at 19:52:53 and the remote port time out at 19:53:23. When a remote port is lost, the driver applies a delay called dev_loss_tmo (device loss timeout) before taking further action. If the port returns to the configuration before that timeout expires, I/O is immediately retried. If the port has not returned by then, all I/O is immediately returned with DID_NO_CONNECT status, and any further I/O fails immediately once the remote port time out event has occurred. See the DM Multipath (RHEL 5) and DM Multipath (RHEL 6) configuration guides for more information on dev_loss_tmo behavior. Also, "How to set custom dev_loss_tmo value permanently?" has information on setting dev_loss_tmo outside of multipath.
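The observed delay can be compared with the configured timeout. This is a minimal Python sketch, assuming the FC transport's sysfs layout of one directory per remote port under /sys/class/fc_remote_ports, each with a dev_loss_tmo file; seconds_between and read_dev_loss_tmo are hypothetical helpers, and the sketch only reads values, it does not change them.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict


def parse_syslog_time(stamp: str) -> datetime:
    """Parse a timestamp such as "Mar 28 19:52:53".

    Syslog stamps carry no year, so a leap year (2000) is assumed here to let
    "Feb 29" parse; only differences between stamps are meaningful.
    """
    return datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S")


def seconds_between(first: str, second: str) -> float:
    """Elapsed seconds between two syslog timestamps from the same year."""
    return (parse_syslog_time(second) - parse_syslog_time(first)).total_seconds()


def read_dev_loss_tmo(root: str = "/sys/class/fc_remote_ports") -> Dict[str, int]:
    """Return dev_loss_tmo (seconds) for each remote port directory under `root`."""
    values: Dict[str, int] = {}
    for rport_dir in sorted(Path(root).glob("rport-*")):
        tmo_file = rport_dir / "dev_loss_tmo"
        if tmo_file.is_file():
            values[rport_dir.name] = int(tmo_file.read_text().strip())
    return values
```

For the log above, seconds_between("Mar 28 19:52:53", "Mar 28 19:53:23") is 30.0, which can then be checked against the value read_dev_loss_tmo() reports for the rport-3:* entries on that host.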

Other symptoms can result from the host losing connectivity to storage. If no connectivity remains, issues such as file systems going read-only can occur because the host is unable to commit necessary metadata changes to the disks hosting the file system.
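To check for that symptom, one option is to list mounts whose options include ro. This is a minimal Python sketch, assuming the /proc/mounts field layout (device, mount point, type, options, dump, pass); read_only_mounts is a hypothetical helper. Some file systems, such as ISO images, are read-only by design, so a match is only a hint to cross-check against the DID_NO_CONNECT messages, not proof of lost connectivity.

```python
from typing import List


def read_only_mounts(mounts_text: str) -> List[str]:
    """Return mount points whose option list contains exactly "ro".

    Mount points are returned as written in /proc/mounts, where a space in a
    path appears escaped as \\040.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        # Exact match, so "errors=remount-ro" alone does not count as read-only.
        if "ro" in options:
            result.append(mount_point)
    return result
```

Usage might look like: with open("/proc/mounts") as f: print(read_only_mounts(f.read())).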

For more information, see: How do I interpret scsi status messages in RHEL like "sd 2:0:0:243: SCSI error: return code = 0x08000002"?
