What parameters control how the Microsoft iSCSI initiator survives lost TCP connections without harming applications?
During a transient loss of connection, instead of immediately reporting "Device not available", the Microsoft iSCSI initiator tries to reconnect to the target and resubmit outstanding SCSI commands.
There are three registry values related to Microsoft iSCSI retry behavior, found under the following path:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\[Instance_Number]\Parameters
Note: the [Instance_Number] may differ from system to system, depending on how many SCSI adapters already exist on the system.
The three registry values are:
DelayBetweenReconnect [default: 5 (seconds)]
MaxConnectionRetries [default: 0xFFFFFFFF, infinite]
MaxRequestHoldTime [default: 60 (seconds)]
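To check what a given host is currently using, the three values can be read from the path above. The following is a minimal Python sketch, not a supported tool: the helper names `parameters_subkey` and `read_retry_values` are hypothetical, the instance number has to be supplied by the caller, and the read only works on Windows (reading under HKEY_LOCAL_MACHINE may also need sufficient privileges).

```python
import sys

# Class key taken from the path above; the instance subkey varies per system.
CLASS_KEY = (r"SYSTEM\CurrentControlSet\Control\Class"
             r"\{4D36E97B-E325-11CE-BFC1-08002BE10318}")

# Defaults as listed above: seconds, retry count, seconds.
DEFAULTS = {
    "DelayBetweenReconnect": 5,
    "MaxConnectionRetries": 0xFFFFFFFF,  # treated as "infinite"
    "MaxRequestHoldTime": 60,
}


def parameters_subkey(instance):
    """Build the Parameters subkey path (relative to HKLM) for one instance."""
    return CLASS_KEY + "\\" + instance + r"\Parameters"


def read_retry_values(instance):
    """Return the three retry values for the given instance, or None if unset.

    Hypothetical helper: only usable on Windows, where winreg exists.
    """
    if sys.platform != "win32":
        raise OSError("the registry is only available on Windows")
    import winreg  # standard library, Windows only

    values = {}
    subkey = parameters_subkey(instance)
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
        for name in DEFAULTS:
            try:
                # QueryValueEx returns a (value, registry_type) tuple.
                values[name], _type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                values[name] = None  # value not present under this key
    return values
```

For example, `read_retry_values("0004")` would read the values for an adapter whose instance number happens to be 0004. That number is only an illustration; the correct one has to be looked up on each system.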
Explanations: Normally you don't need to modify DelayBetweenReconnect and MaxConnectionRetries. MaxRequestHoldTime is probably the only one you may want to change. It defines how long the Microsoft iSCSI initiator holds and retries outstanding commands before notifying the upper layer of a Device Removal event. This event usually causes I/O failures in applications using the iSCSI disk. MaxRequestHoldTime is only relevant in non-MPIO environments. When MPIO is involved, this value is ignored.
A Device Removal event can be bad for applications actively using an iSCSI Logical Unit Number (LUN), especially if a cable pull, filer reboot, filer cluster failover, etc., takes longer to recover than the default MaxRequestHoldTime of 60 seconds. Unless you have a special requirement that needs the retry window to be smaller or larger, 180 (seconds) is a good value to start with.
Note: Even after a Device Removal event is reported, the Microsoft iSCSI initiator keeps trying to reconnect to the target, as defined by the first two registry values, DelayBetweenReconnect and MaxConnectionRetries.
The Windows iSCSI host must be rebooted after changing the registry value(s).
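
The effect of the hold time can be illustrated with a small Python sketch. It is a deliberately simplified model of the non-MPIO behavior described above (an outage longer than MaxRequestHoldTime leads to a Device Removal event) and does not reflect the initiator's internal logic. Both function names are hypothetical, and the `dword:` formatting assumes the value is stored as a REG_DWORD and written through a .reg file.

```python
def device_removal_reported(outage_seconds, max_request_hold_time=60):
    """Simplified model: True if the outage outlasts the hold window.

    Applies to non-MPIO setups only; with MPIO the value is ignored.
    """
    return outage_seconds > max_request_hold_time


def reg_dword(value):
    """Format an integer the way a .reg file writes a DWORD (8 hex digits)."""
    return "dword:%08x" % value


# A hypothetical 90-second filer failover:
print(device_removal_reported(90))        # default hold time of 60 seconds
print(device_removal_reported(90, 180))   # suggested starting value of 180

# The line a .reg file would carry for the suggested value:
print('"MaxRequestHoldTime"=' + reg_dword(180))
```

In this model, a 90-second failover triggers a Device Removal event with the default of 60 seconds but not with 180 seconds. A reboot is still required before a changed value takes effect.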