forked from Qortal/Brooklyn
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
77 lines
3.4 KiB
77 lines
3.4 KiB
=========================== |
|
HPE iLO NMI Watchdog Driver |
|
=========================== |
|
|
|
for iLO based ProLiant Servers |
|
============================== |
|
|
|
Last reviewed: 08/20/2018 |
|
|
|
|
|
The HPE iLO NMI Watchdog driver is a kernel module that provides basic |
|
watchdog functionality and handler for the iLO "Generate NMI to System" |
|
virtual button. |
|
|
|
All references to iLO in this document imply it also works on iLO2 and all |
|
subsequent generations. |
|
|
|
Watchdog functionality is enabled like any other common watchdog driver. That |
|
is, an application needs to be started that kicks off the watchdog timer. A |
|
basic application exists in tools/testing/selftests/watchdog/ named |
|
watchdog-test.c. Simply compile the C file and kick it off. If the system |
|
gets into a bad state and hangs, the HPE ProLiant iLO timer register will |
|
not be updated in a timely fashion and a hardware system reset (also known as |
|
an Automatic Server Recovery (ASR)) event will occur. |
|
|
|
The hpwdt driver also has the following module parameters: |
|
|
|
============ ================================================================ |
|
soft_margin allows the user to set the watchdog timer value. |
|
Default value is 30 seconds. |
|
timeout an alias of soft_margin. |
|
pretimeout allows the user to set the watchdog pretimeout value. |
|
This is the number of seconds before timeout when an |
|
NMI is delivered to the system. Setting the value to |
|
zero disables the pretimeout NMI. |
|
Default value is 9 seconds. |
|
nowayout basic watchdog parameter that does not allow the timer to |
|
be restarted or an impending ASR to be escaped. |
|
Default value is set when compiling the kernel. If it is set |
|
to "Y", then there is no way of disabling the watchdog once |
|
it has been started. |
|
kdumptimeout Minimum timeout in seconds to apply upon receipt of an NMI |
|
before calling panic. (-1) disables the watchdog. When value |
|
is > 0, the timer is reprogrammed with the greater of |
|
value or current timeout value. |
|
============ ================================================================ |
|
|
|
NOTE: |
|
More information about watchdog drivers in general, including the ioctl |
|
interface to /dev/watchdog can be found in |
|
Documentation/watchdog/watchdog-api.rst and Documentation/IPMI.txt. |
|
|
|
Due to limitations in the iLO hardware, the NMI pretimeout if enabled, |
|
can only be set to 9 seconds. Attempts to set pretimeout to other |
|
non-zero values will be rounded, possibly to zero. Users should verify |
|
the pretimeout value after attempting to set pretimeout or timeout. |
|
|
|
Upon receipt of an NMI from the iLO, the hpwdt driver will initiate a |
|
panic. This is to allow for a crash dump to be collected. It is incumbent |
|
upon the user to have properly configured the system for kdump. |
|
|
|
The default Linux kernel behavior upon panic is to print a kernel tombstone |
|
and loop forever. This is generally not what a watchdog user wants. |
|
|
|
For those wishing to learn more please see: |
|
Documentation/admin-guide/kdump/kdump.rst |
|
Documentation/admin-guide/kernel-parameters.txt (panic=) |
|
Your Linux Distribution specific documentation. |
|
|
|
If the hpwdt does not receive the NMI associated with an expiring timer, |
|
the iLO will proceed to reset the system at timeout if the timer hasn't |
|
been updated. |
|
|
|
-- |
|
|
|
The HPE iLO NMI Watchdog Driver and documentation were originally developed |
|
by Tom Mingarelli.
|
|
|