SCSI Debugging
This page collects some helpful information for debugging the Linux SCSI subsystem.
scsi_logging_level
The SCSI subsystem has a general logging facility which can be enabled by writing to
/proc/sys/dev/scsi/logging_level
This is a _general_ logging facility, i.e. it cannot be restricted to individual devices or HBAs. However, it has several distinct logging areas which can be selected individually. Each of these areas spans a 3-bit field in the logging_level value, so each area can be set to a level between 0 (off) and 7.
The definitions can be found in
drivers/scsi/scsi_logging.h
The possible areas are:
- ERROR: used by any command which has to be retried/recovered via the SCSI error handling mechanism
- TIMEOUT: used by drivers/scsi/sg.c
- SCAN: used during device scan, i.e. whenever a new device / HBA is initialized
- MLQUEUE: mid-layer queue; requests are pulled from the block-layer queue and submitted to the HBA
- MLCOMPLETE: mid-layer queue; requests are completed by the HBA and results are pushed back to the block layer
- LLQUEUE: low-layer queue; not used
- LLCOMPLETE: low-layer queue; not used
- HLQUEUE: high-layer queue; command preparation in drivers/scsi/sd.c
- HLCOMPLETE: high-layer queue; command completion in drivers/scsi/sd.c
- IOCTL: SCSI IOCTL logging
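The area list maps directly onto bit positions: in the order given above, each area occupies the next 3 bits, so ERROR starts at bit 0, TIMEOUT at bit 3, SCAN at bit 6 and so on up to IOCTL at bit 27 (these offsets match drivers/scsi/scsi_logging.h). The following sketch composes a logging_level value from AREA=LEVEL pairs; the helper name scsi_log_encode is hypothetical and not part of any SCSI tool.

```shell
#!/bin/sh
# Sketch: build a value for /proc/sys/dev/scsi/logging_level from
# AREA=LEVEL pairs. Area order and the 3-bit field width follow
# drivers/scsi/scsi_logging.h. scsi_log_encode is a hypothetical helper.
SCSI_LOG_AREAS="ERROR TIMEOUT SCAN MLQUEUE MLCOMPLETE LLQUEUE LLCOMPLETE HLQUEUE HLCOMPLETE IOCTL"

scsi_log_encode() {
    local value=0 arg area level name i offset
    for arg in "$@"; do
        area=${arg%%=*}
        level=${arg#*=}
        # Each field is 3 bits wide, so only levels 0..7 fit.
        case $level in
            [0-7]) ;;
            *) echo "level for $area must be 0..7" >&2; return 1 ;;
        esac
        # Bit offset of the area = its position in the list times 3.
        # $SCSI_LOG_AREAS is left unquoted on purpose to split it into words.
        offset=
        i=0
        for name in $SCSI_LOG_AREAS; do
            if [ "$name" = "$area" ]; then
                offset=$((i * 3))
            fi
            i=$((i + 1))
        done
        if [ -z "$offset" ]; then
            echo "unknown area: $area" >&2
            return 1
        fi
        # Clear the area's field, then store the new level in it.
        value=$(( (value & ~(7 << offset)) | (level << offset) ))
    done
    echo "$value"
}

scsi_log_encode ERROR=3 SCAN=3 MLQUEUE=2 MLCOMPLETE=2   # prints 9411
```

As root, the result can be written directly, e.g. `scsi_log_encode MLCOMPLETE=1 > /proc/sys/dev/scsi/logging_level`. Keep in mind that this replaces the whole value, so any area not listed is switched off.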
Detailed level descriptions for the individual areas:
ERROR
Error logging when a command is recovered via the SCSI error handling mechanism. The following levels are used:
- Level 1: error handler thread statistics
- Level 2: error handler command statistics
- Level 3: error handler command logging
- Level 4: not used
- Level 5: error handler command details
- Level 6 and higher: not used
SCAN
Logging during HBA / target scanning. The following levels are used:
- Level 1: logging of unusual devices where LUN 0 has a pqual (peripheral qualifier) of 3
- Level 2: logging of devices with pqual 3
- Level 3: logging of SCSI commands sent during scanning
- Level 4 and higher: not used
MLQUEUE
The MLQUEUE area is used when a command is pulled from the block-layer queue and sent to the HBA. It has the following levels:
- Level 1: nothing (the numbering matches the completion levels)
- Level 2: log opcode + command of all commands
- Level 3: same as 2 plus dump cmd address
- Level 4: same as 3 plus dump extra junk
- Level 5 and higher: not used
MLCOMPLETE
Logging of command completion from the HBA, before the completion is called for the block-layer request. It has the following levels:
- Level 1: log disposition, result, opcode + command, and conditionally sense data for failures or non-SUCCESS dispositions
- Level 2: same as 1 but for all command completions
- Level 3: same as 2 plus dump cmd address
- Level 4: same as 3 plus dump extra junk
- Level 5 and higher: not used
Starting with SLES10 SP2 there is a command 'scsi_logging_level' which lets you set the areas and levels without calculating the bit offsets by hand. This is very convenient if you want to enable a logging level other than 0xffffffff (everything on).
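A useful setting that does not bury you in logging details is ERROR=3, SCAN=3, MLQUEUE=2, MLCOMPLETE=2. With the bit offsets 0, 6, 9 and 12 for these areas, this evaluates to 3 + (3 << 6) + (2 << 9) + (2 << 12) = 3 + 192 + 1024 + 8192 = 9411: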
echo 9411 > /proc/sys/dev/scsi/logging_level
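Going the other way is handy for checking what is currently enabled. A minimal sketch, assuming the same 3-bit field layout; scsi_log_decode is a hypothetical helper that reads /proc/sys/dev/scsi/logging_level when called without an argument and prints only the areas that are switched on.

```shell
#!/bin/sh
# Sketch: split a logging_level value into its ten 3-bit area fields.
# Field order as in drivers/scsi/scsi_logging.h. scsi_log_decode is a
# hypothetical helper; without an argument it reads the live value.
scsi_log_decode() {
    local value i level name
    value=${1:-$(cat /proc/sys/dev/scsi/logging_level)}
    i=0
    for name in ERROR TIMEOUT SCAN MLQUEUE MLCOMPLETE LLQUEUE LLCOMPLETE HLQUEUE HLCOMPLETE IOCTL; do
        # Shift the area's field down to bit 0 and keep its 3 bits.
        level=$(( (value >> (i * 3)) & 7 ))
        if [ "$level" -ne 0 ]; then
            echo "$name=$level"
        fi
        i=$((i + 1))
    done
}

scsi_log_decode 9411
# ERROR=3
# SCAN=3
# MLQUEUE=2
# MLCOMPLETE=2
```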
SCSI rescan on FibreChannel
Device detection on the SCSI bus works on two levels: on the first level the HBA detects the targets (using HBA / transport specific methods), and after that the SCSI midlayer scans each target for the presented LUNs. On FibreChannel, the (SCSI) target is mapped to an FC port, so the scan for targets is actually a scan for the visible FC ports.
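The second (midlayer) step can be requested per host through sysfs: writing "- - -" (wildcards for channel, target and LUN) to /sys/class/scsi_host/hostN/scan asks the midlayer to scan all targets and LUNs of that host. A minimal sketch; the helper name scsi_rescan_hosts and the SYSFS_SCSI_HOST override (only there so it can be tried against a scratch directory) are hypothetical. On FC hosts such a scan typically covers only the remote ports the HBA already knows about; it does not by itself make the HBA rediscover ports.

```shell
#!/bin/sh
# Sketch: ask the SCSI midlayer to scan all channels, targets and LUNs
# of every SCSI host ("- - -" are wildcards for channel, target, LUN).
# SYSFS_SCSI_HOST is a hypothetical override for testing; it defaults
# to the real sysfs path.
scsi_rescan_hosts() {
    local root scan
    root=${SYSFS_SCSI_HOST:-/sys/class/scsi_host}
    for scan in "$root"/host*/scan; do
        # If no host matched, the glob stays literal and the file is absent.
        [ -e "$scan" ] || continue
        echo "rescanning ${scan%/scan}"
        echo "- - -" > "$scan"
    done
}
```

On a real system this is run as root without arguments: `scsi_rescan_hosts`.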
FibreChannel topology
An HBA connected to a FibreChannel SAN can operate in two different modes: Arbitrated Loop (AL) or Switched Fabric (SW). SW mode is designed for switch-to-switch communication, so it knows about name servers etc. A scan in SW mode requires querying the fabric name server, parsing the result, checking each resulting port, and so on. AL mode, on the other hand, is a simplification: it assumes that all remote ports are in a loop, so querying each possible loop ID detects all devices in the loop. This initialisation routine is known as the Loop Initialization Procedure (LIP).
Because SW mode has considerable interoperability issues, all Linux FC drivers (except for the zSeries 'zfcp' driver) run in AL mode.
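To see which topology an HBA port has actually come up in, the FC transport class exposes a port_type attribute for each fc_host in sysfs. A small sketch that prints it for every FC host; the helper name fc_port_types and the SYSFS_FC_HOST override are hypothetical, and the exact strings reported depend on kernel and driver.

```shell
#!/bin/sh
# Sketch: print the port type each FC HBA port reports via the FC
# transport class. The exact strings depend on kernel and driver.
# SYSFS_FC_HOST is a hypothetical override for testing.
fc_port_types() {
    local root host
    root=${SYSFS_FC_HOST:-/sys/class/fc_host}
    for host in "$root"/host*; do
        # Skip entries without a readable port_type (or an unmatched glob).
        [ -r "$host/port_type" ] || continue
        printf '%s: %s\n' "${host##*/}" "$(cat "$host/port_type")"
    done
}
```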
Loop Initialisation Procedure (LIP)
A LIP is triggered whenever the existing SAN information needs to be updated. Most obviously this is the case during startup, when the driver needs to detect the available ports.
During operation a LIP should be triggered whenever the SAN configuration changes. However, depending on the switch configuration this may or may not happen. The reason is that triggering a LIP is a disruptive operation, causing all remote ports to reconfigure. During this time all I/O on the ports connected to the HBA is suspended. After the LIP has completed, I/O resumes if the ports are still present, or stays suspended if the remote port is no longer visible. In that case the dev_loss_tmo timer and, if present, the fast_io_fail_tmo timer are started. They are responsible for removing the remote port from the system and for failing all outstanding I/O on the remote port, respectively. As this is a full initialisation, the remote ports will be reset.
For this reason, most FC switches can be configured so that a LIP is not started automatically whenever the SAN configuration changes.
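The dev_loss_tmo and fast_io_fail_tmo timers are exposed per remote port under /sys/class/fc_remote_ports/rport-*/ (values in seconds). A minimal read-only sketch for checking them before deciding whether a LIP is acceptable; the helper name fc_rport_timers and the SYSFS_FC_RPORTS override are hypothetical.

```shell
#!/bin/sh
# Sketch: show dev_loss_tmo and fast_io_fail_tmo (seconds) for every FC
# remote port; fast_io_fail_tmo may read "off" when it is not set.
# SYSFS_FC_RPORTS is a hypothetical override for testing.
fc_rport_timers() {
    local root rport dev_loss fast_io
    root=${SYSFS_FC_RPORTS:-/sys/class/fc_remote_ports}
    for rport in "$root"/rport-*; do
        [ -d "$rport" ] || continue
        # Fall back to "n/a" if the attribute is not provided.
        dev_loss=$(cat "$rport/dev_loss_tmo" 2>/dev/null || echo n/a)
        fast_io=$(cat "$rport/fast_io_fail_tmo" 2>/dev/null || echo n/a)
        printf '%s dev_loss_tmo=%s fast_io_fail_tmo=%s\n' \
            "${rport##*/}" "$dev_loss" "$fast_io"
    done
}
```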
rescan-scsi-bus.sh
As a LIP is generally not triggered when the SAN configuration changes, rescan-scsi-bus.sh has a switch -i|--issue-lip, which triggers a LIP on the specified HBAs. Be aware of the consequences of this option, most notably the device reset.
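A LIP can also be requested by hand through the FC transport class by writing 1 to /sys/class/fc_host/hostN/issue_lip, after which a midlayer scan picks up new LUNs. The sketch below is an assumption about the general approach, not the code of rescan-scsi-bus.sh; the function name fc_lip_and_rescan, the SYSFS_ROOT and LIP_SETTLE_SECS variables, and the default 5-second wait are all hypothetical. The same caveat as for -i applies: it disrupts I/O and resets the remote ports.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not rescan-scsi-bus.sh itself): issue a LIP
# on the given SCSI host numbers, wait, then request a midlayer rescan.
# WARNING: on a real system this disrupts all I/O through those HBAs.
# SYSFS_ROOT (alternate sysfs root for testing) and LIP_SETTLE_SECS
# (delay before the rescan) are hypothetical knobs.
fc_lip_and_rescan() {
    local sysfs settle host lip
    sysfs=${SYSFS_ROOT:-/sys}
    settle=${LIP_SETTLE_SECS:-5}
    for host in "$@"; do
        lip=$sysfs/class/fc_host/host$host/issue_lip
        if [ ! -w "$lip" ]; then
            echo "skipping host$host: no writable issue_lip" >&2
            continue
        fi
        echo 1 > "$lip"      # request the LIP on this HBA
        sleep "$settle"      # give port discovery time to finish
        echo "- - -" > "$sysfs/class/scsi_host/host$host/scan"
    done
}
```

Usage, as root and with the host numbers of the FC HBAs: `fc_lip_and_rescan 2 3`.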
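This article examines the troubleshooting facilities of the SCSI subsystem, including configuring the logging levels, the scanning process, and how scanning works on FibreChannel. Setting the logging level (for example with the 'scsi_logging_level' command) gives fine-grained control over how much detail is logged for particular stages. On FibreChannel, the scan depends on whether the HBA runs in Arbitrated Loop or Switched Fabric mode, and the article describes how the Loop Initialization Procedure (LIP) is used for device detection.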