The watchdog subsystem in libmtev provides a facility for hang protection, crash recovery, and crash reporting.

When "things go wrong", the parent (monitor) process traces the monitored child by running {glider} $pid $reason > {tracedir}/$appname.$pid.trc, ensures that the child is dead, confirms that it should be restarted, and then launches a new child.

<?xml version="1.0" encoding="utf8" standalone="yes"?>
<example1 lockfile="/var/tmp/example.lock">
  <watchdog glider="/opt/local/bin/bt"

The following watchdog attributes are supported:

  • tracedir

    The directory in which trace files are deposited. Trace files contain the output of the glider command. File names are of the format {appname}.{pid}.trc.

  • glider

    The full path to an executable to invoke when a monitored process crashes or is killed due to inactivity. It is invoked with two arguments: the process id and the reason (one of "crash", "watchdog", or "unknown").

  • save_trace_output

    Choose whether to store any output that the glider may produce on stdout. Its value is "true" or "false", and the default is "true" if not specified. If true, the glider output is saved as described in the tracedir attribute above. Some gliders may produce their own output files, in which case the stdout stream is unnecessary and one may choose to ignore it by setting this attribute to "false".

  • retries

    The maximum number of restart attempts to be made within span seconds (see span below). Default: 5.

  • span

    The number of seconds over which restarts are rate limited (see retries above). Default: 60.
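
For illustration, a watchdog element that sets all five attributes might look like the following. This is only a sketch: the example1 root, the lockfile, and the glider path are reused from the fragment earlier in this section, the tracedir path is a made-up placeholder, and the retries and span values are simply the documented defaults written out explicitly.

```xml
<example1 lockfile="/var/tmp/example.lock">
  <!-- Illustrative values only; the attribute names are the ones listed above. -->
  <watchdog glider="/opt/local/bin/bt"
            tracedir="/var/tmp/example-traces"
            save_trace_output="true"
            retries="5"
            span="60"/>
</example1>
```

With settings like these, a crash of a child with pid 12345 would, following the file-name format described under tracedir, leave its glider output in /var/tmp/example-traces/{appname}.12345.trc. Setting save_trace_output to "false" would suit a glider that writes its own output files.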

