scsi debugging

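This article takes an in-depth look at troubleshooting in the SCSI subsystem, covering the configuration of logging levels, the scan process, and its application on FibreChannel. Setting the 'scsi_logging_level' parameter gives fine-grained control over the log level, so that detailed information can be obtained for specific stages. The article also describes the two FibreChannel operating modes relevant to scanning, Arbitrated Loop and Switched Fabric, and how the Loop Initialization Procedure (LIP) is used to detect devices.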

SCSI Debugging

Some helpful information for SCSI debugging can be found here.

scsi_logging_level

The SCSI subsystem has a general logging facility which can be enabled by writing to

/proc/sys/dev/scsi/logging_level

This is a _general_ logging facility, i.e. it cannot be restricted to individual devices or HBAs. However, it has several distinct logging areas which can be selected individually. Each of these areas occupies a 3-bit field in the logging_level value, so each area can be set to a level from 0 to 7.

The definitions can be found in

drivers/scsi/scsi_logging.h

The possible areas are:

ERROR

Used by any command which has to be retried/recovered via the SCSI error handling mechanism

TIMEOUT

Used by drivers/scsi/sg.c

SCAN

Used during device scan, i.e. whenever a new device / HBA is initialized

MLQUEUE

Mid-layer queue; requests are being pulled from the block-layer queue and submitted to the HBA

MLCOMPLETE

Mid-layer queue; requests are being completed by the HBA and results are being pushed back to the block-layer

LLQUEUE

Low-layer queue; Not used

LLCOMPLETE

Low-layer queue; Not used

HLQUEUE

High-layer queue; command preparation in drivers/scsi/sd.c

HLCOMPLETE

High-layer queue; command completion in drivers/scsi/sd.c

IOCTL

SCSI IOCTL logging
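To make the 3-bit layout concrete, here is a minimal sketch that splits a logging_level value back into its areas. The helper name scsi_log_decode is hypothetical, and the bit offsets (ERROR at bit 0, then one further 3-bit step per area in the order listed above, up to IOCTL at bit 27) are an assumption based on drivers/scsi/scsi_logging.h; check them against the header of the kernel you are running.

```shell
#!/bin/sh
# Area names in ascending bit order. Assumption: each area occupies 3 bits,
# starting at offsets 0, 3, 6, ... 27 (as in drivers/scsi/scsi_logging.h).
SCSI_LOG_AREAS="ERROR TIMEOUT SCAN MLQUEUE MLCOMPLETE LLQUEUE LLCOMPLETE HLQUEUE HLCOMPLETE IOCTL"

# scsi_log_decode VALUE: print one "AREA=level" line per area.
# Hypothetical helper, not part of any existing tool.
scsi_log_decode() {
    value=$1
    shift_bits=0
    for area in $SCSI_LOG_AREAS; do
        # Extract the 3-bit field that belongs to this area.
        echo "$area=$(( ($value >> $shift_bits) & 7 ))"
        shift_bits=$(( $shift_bits + 3 ))
    done
}
```

To inspect the running system, the current value could be passed in with scsi_log_decode "$(cat /proc/sys/dev/scsi/logging_level)".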

The detailed error level description:

ERROR

Error logging when a command is recovered via the SCSI error handling mechanism. The following levels are used:

1. Error handler thread statistics
2. Error handler command statistics
3. Error handler command logging
4. Not used
5. Error handler command details
6 and higher: not used

SCAN

Logging during HBA / target scanning. The following levels are used:

1. Logging of unusual devices where LUN 0 has a peripheral qualifier (pqual) of 3
2. Logging of devices with pqual 3
3. Logging of SCSI commands sent during scanning
4 and higher: not used

MLQUEUE

The MLQUEUE area is used when a command is being pulled from the block-layer queue and sent to the HBA. It has the following levels:

1. nothing (match completion)
2. log opcode + command of all commands
3. same as 2 plus dump cmd address
4. same as 3 plus dump extra junk
5 and higher: not used

MLCOMPLETE

Logging of command completion from the HBA, before the completion is called for the block-layer request. It has the following levels:

1. log disposition, result, opcode + command, and conditionally sense data for failures or non-SUCCESS dispositions
2. same as 1 but for all command completions
3. same as 2 plus dump cmd address
4. same as 3 plus dump extra junk
5 and higher: not used

Starting with SLES10 SP2 there is a command 'scsi_logging_level' which allows you to set the areas and levels without having to calculate the bit offsets by hand. This is very convenient if you want to enable a logging level other than 0xffffffff.

A useful setting that avoids being buried in logging details is ERROR=3, SCAN=3, MLQUEUE=2, MLCOMPLETE=2. With each level shifted into its area's 3-bit field (3 + 3*64 + 2*512 + 2*4096), this evaluates to 9411:

echo 9411 > /proc/sys/dev/scsi/logging_level
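On systems without the 'scsi_logging_level' command, the value can be computed by hand. The following is a minimal sketch of that calculation; scsi_log_encode is a hypothetical helper, and the bit offsets in the case statement are an assumption based on drivers/scsi/scsi_logging.h (they do reproduce the value 9411 used above).

```shell
#!/bin/sh
# scsi_log_encode AREA=level ...: print the combined logging_level value.
# Hypothetical helper; bit offsets assumed from drivers/scsi/scsi_logging.h.
scsi_log_encode() {
    total=0
    for pair in "$@"; do
        area=${pair%%=*}
        level=${pair#*=}
        case $area in
            ERROR)      shift_bits=0 ;;
            TIMEOUT)    shift_bits=3 ;;
            SCAN)       shift_bits=6 ;;
            MLQUEUE)    shift_bits=9 ;;
            MLCOMPLETE) shift_bits=12 ;;
            LLQUEUE)    shift_bits=15 ;;
            LLCOMPLETE) shift_bits=18 ;;
            HLQUEUE)    shift_bits=21 ;;
            HLCOMPLETE) shift_bits=24 ;;
            IOCTL)      shift_bits=27 ;;
            *) echo "unknown area: $area" >&2; return 1 ;;
        esac
        # Each area is a 3-bit field, so only levels 0 to 7 fit.
        case $level in
            [0-7]) ;;
            *) echo "level out of range: $pair" >&2; return 1 ;;
        esac
        total=$(( $total | ($level << $shift_bits) ))
    done
    echo "$total"
}

scsi_log_encode ERROR=3 SCAN=3 MLQUEUE=2 MLCOMPLETE=2   # prints 9411
```

The result can then be written to /proc/sys/dev/scsi/logging_level as shown above; writing 0 switches the logging off again.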

SCSI rescan on FibreChannel

Device detection on the SCSI bus works on two levels: on the first level the HBA detects the targets (using HBA / transport specific methods), and after that the SCSI midlayer scans each target for the presented LUNs. On FibreChannel, the (SCSI) target is mapped to an FC port, so the scan for targets is actually a scan for the visible FC ports.
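The second level, the midlayer LUN scan, can be requested by hand through the 'scan' attribute of a SCSI host in sysfs. The sketch below is only an illustration: scsi_rescan_host is a hypothetical helper, and the optional second argument (an alternative sysfs root) exists purely so the function can be tried against a scratch directory. It does not trigger the first level, i.e. the HBA's own target discovery.

```shell
#!/bin/sh
# scsi_rescan_host HOST [SYSFS_ROOT]
# Ask the SCSI midlayer to rescan all channels, targets and LUNs of HOST
# (e.g. "host3"). Hypothetical helper; SYSFS_ROOT defaults to /sys.
scsi_rescan_host() {
    host=$1
    sysfs=${2:-/sys}
    scan_file="$sysfs/class/scsi_host/$host/scan"
    if [ ! -w "$scan_file" ]; then
        echo "no writable scan attribute for $host" >&2
        return 1
    fi
    # "- - -" is the wildcard triple for channel, target and LUN.
    echo "- - -" > "$scan_file"
}
```

A call such as scsi_rescan_host host3 needs root privileges on a real system.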

FibreChannel topology

An HBA connected to a FibreChannel SAN can operate in two different modes: Arbitrated Loop (AL) or Switched Fabric (SW). SW mode is designed for switch-to-switch communication, so it knows about name servers etc. A scan in SW mode requires a query to the fabric name server, parsing the result, checking each resulting port, and so on. AL mode, on the other hand, is a simplification: it assumes that all remote ports are in a loop, so querying each possible loop ID detects all devices in the loop. This initialisation routine is known as the Loop Initialization Procedure (LIP).

As SW mode has quite a few interoperability issues, all Linux FC drivers (except for the zSeries 'zfcp' driver) run in AL mode.
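Which topology a given HBA has actually negotiated can be checked in sysfs, where the FC transport class exposes a 'port_type' attribute per FC host. The sketch below simply prints that attribute for every host; fc_port_types is a hypothetical helper, the optional argument is an alternative sysfs root for trying it out, and the exact strings reported (for example a loop or fabric port type) depend on the driver, so treat any specific value as an assumption.

```shell
#!/bin/sh
# fc_port_types [SYSFS_ROOT]
# Print "hostN: <port_type>" for every FC host found in sysfs.
# Hypothetical helper; SYSFS_ROOT defaults to /sys.
fc_port_types() {
    sysfs=${1:-/sys}
    for dir in "$sysfs"/class/fc_host/host*; do
        # Skip the unexpanded glob (no FC hosts) and unreadable attributes.
        [ -r "$dir/port_type" ] || continue
        echo "${dir##*/}: $(cat "$dir/port_type")"
    done
}
```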

Loop Initialisation Procedure (LIP)

A LIP is triggered whenever the existing SAN information needs to be updated. Most obviously this is the case during startup, as then the driver needs to detect the available ports.

During operation a LIP should be triggered whenever the SAN configuration changes. However, depending on the switch configuration this might or might not be the case. The reasoning is that triggering a LIP is a disruptive operation, causing all remote ports to reconfigure. During this time all I/O on the ports connected to the HBA is suspended. After the LIP has completed, I/O resumes if the ports are still present, or is kept suspended if the remote port is no longer visible. In that case the dev_loss_tmo timer and the fast_io_fail_tmo timer (if present) are started. They are responsible for removing the remote port from the system and for stopping all I/O on the remote port, respectively. As this is a full initialisation, the remote ports will be reset.
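Both timers are visible per remote port in sysfs, under the fc_remote_ports class. As a rough sketch, the following lists them for every remote port; fc_rport_timeouts and read_attr are hypothetical helpers, and the optional argument is an alternative sysfs root so the function can be tried on a scratch directory. A missing attribute is reported as n/a, which matches the "if present" caveat for fast_io_fail_tmo.

```shell
#!/bin/sh
# read_attr FILE: print the file's content, or "n/a" if it is not readable.
read_attr() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "n/a"
    fi
}

# fc_rport_timeouts [SYSFS_ROOT]
# Print dev_loss_tmo and fast_io_fail_tmo for every FC remote port.
# Hypothetical helper; SYSFS_ROOT defaults to /sys.
fc_rport_timeouts() {
    sysfs=${1:-/sys}
    for dir in "$sysfs"/class/fc_remote_ports/rport-*; do
        [ -d "$dir" ] || continue
        echo "${dir##*/} dev_loss_tmo=$(read_attr "$dir/dev_loss_tmo") fast_io_fail_tmo=$(read_attr "$dir/fast_io_fail_tmo")"
    done
}
```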

For this reason most FC switches allow for a configuration where a LIP is not automatically started whenever the SAN configuration is changed.

rescan-scsi-bus.sh

As a LIP is not generally triggered when the SAN configuration changes, rescan-scsi-bus.sh has a switch -i|--issue-lip, which causes a LIP to be triggered on the specified HBAs. One has to be aware of the consequences of this option, most notably the device reset.
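For reference, a LIP can also be requested directly through the 'issue_lip' attribute that the FC transport class provides for each FC host in sysfs. The sketch below shows that mechanism only; fc_issue_lip is a hypothetical helper, the optional second argument is an alternative sysfs root for dry runs, and whether rescan-scsi-bus.sh uses this same attribute internally is an assumption. The same warning applies: on a real system this disrupts I/O and resets the remote ports.

```shell
#!/bin/sh
# fc_issue_lip HOST [SYSFS_ROOT]
# Request a LIP on the FC host HOST (e.g. "host3").
# Hypothetical helper; SYSFS_ROOT defaults to /sys.
# WARNING: disruptive on a real system, I/O is suspended during the LIP.
fc_issue_lip() {
    host=$1
    sysfs=${2:-/sys}
    lip_file="$sysfs/class/fc_host/$host/issue_lip"
    if [ ! -w "$lip_file" ]; then
        echo "no writable issue_lip attribute for $host" >&2
        return 1
    fi
    echo 1 > "$lip_file"
}
```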