July 2005
A need that comes up often in systems administration is a simple one:
find out whether a process is running and, if it is not, restart it.
This article discusses how to approach that problem
using the POSIX shell [1].
This particular method is only useful for one single process -
let's read that again so it is clear - one single process [2].
It will not work for groups of processes such as apache, mysql, nfsd et al.
In the example, the process name is going to be foo_d;
however, the method will work for any other single process. What foo_d
does is not relevant here; all that matters is that
if foo_d is not running when it is checked at a set interval, it
needs to be restarted. For argument's sake, an interval of one hour is used.
The first instinct for the systems programmer might be to create a generic
checkd daemon that accepts arguments (such as the name) and
forks happily ever after. There is, however, something of a logic flaw
there. If foo_d needed to run no matter what, then it
should have signal overrides built in.
If the interval needs to be anything less than
15 minutes, we might want to consider making foo_d itself more
persistent or adding internal signal handling that can ignore certain signals
(again, depending on what foo_d actually does).
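As a rough illustration of the internal signal handling mentioned above, a daemon written in shell could ignore a signal by giving trap an empty action. This is only a sketch: what foo_d really is and how it handles signals is unknown, and survives_hup is a hypothetical name used for the demonstration. Note that KILL and STOP can never be ignored this way.

```shell
# Hypothetical demo: a child shell that ignores HUP survives being sent HUP.
survives_hup() {
    sh -c '
        trap "" HUP          # empty action: ignore the signal
        kill -HUP $$         # would normally terminate this shell
        echo "still running after HUP"
    '
}
survives_hup
```

The demonstration runs in a child shell so that the signal is never sent to the shell that calls it.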
With the aforementioned known, a pretty simple assessment can be made -
just hammer out a script that is fired from cron once an
hour to check for the process. First, a bomb routine.
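#!/bin/sh
progname=${0##*/}        # script name without its directory
toppid=$$                # PID of the top-level shell
trap "exit 1" 1 2 3 15   # exit on HUP, INT, QUIT and TERM
# Print an error to stderr, then stop the whole script
bomb()
{
    cat >&2 <<ERRORMESSAGE
ERROR: $@
*** ${progname} aborted! ***
ERRORMESSAGE
    kill ${toppid}
    exit 1
}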
Pretty basic shell stuff: the interpreter call, a program name catcher,
snag the pid, trap some signals and finally the bomb routine
in case anything goes terribly awry.
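Two of those pieces can be tried in isolation. The sketch below shows what the ${0##*/} expansion does to a path, and why a plain exit is not always enough to stop a script, which is one plausible reason for bomb to signal the top-level PID as well. The helper name strip_dir and the path /usr/local/bin/checkproc are made up for the example.

```shell
# Strip everything up to the last slash, as ${0##*/} does for the script path.
strip_dir() {
    path=$1
    printf '%s\n' "${path##*/}"
}
strip_dir /usr/local/bin/checkproc   # prints: checkproc

# exit inside a subshell only ends that subshell; the parent keeps going.
( exit 1 )
echo "parent continues, subshell status was $?"
```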
Now it is time for the globals. Arguments $1 and
$2 will be the process name and the command to restart it,
respectively:
PROC_NAME=$1
PROC_INIT=$2
It is important that those values are grabbed immediately, since
stdin, stdout and stderr
may be used later on within the script, and since inside a function such as
bomb, $1 and $2 refer to the function's own arguments. Also note that if
argument 2 contains spaces, it will have to be enclosed in double quotes.
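The effect of the quoting can be seen with a small argument check. This is a sketch of a guard the script does not have; have_two_args is a hypothetical helper, and the foo_d path and its -d flag are invented for illustration.

```shell
# Hypothetical helper: succeed only when exactly two non-empty arguments are given.
have_two_args() {
    [ "$#" -eq 2 ] && [ -n "$1" ] && [ -n "$2" ]
}

# Quoted, the restart command arrives as a single second argument.
have_two_args foo_d "/usr/local/sbin/foo_d -d" && echo "arguments ok"

# Unquoted, the same words split into three arguments and the check fails.
have_two_args foo_d /usr/local/sbin/foo_d -d || echo "wrong number of arguments"
```

A check like this could sit just before the globals are assigned and call bomb with a usage message when it fails.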
Now it is time to snag the process, which can be done pretty easily
using the ps command:
# grep's exit status: 0 if the name was found, non-zero otherwise
ps -e | grep "$PROC_NAME" >/dev/null 2>&1
The actual output of the ps command is not relevant; all
that matters is the exit status of the pipeline, which is that of grep:
0 if the name was found, non-zero if it was not.
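One thing to keep in mind is that a plain grep matches substrings, so a check for foo_d would also be satisfied by a process called foo_daemon. A stricter variant, sketched below, compares whole lines instead. The helper names name_in_list and is_running are hypothetical, and how ps reports the command name (full path or not, truncated or not) varies between systems, so treat this as a starting point rather than a drop-in replacement.

```shell
# Hypothetical helper: read one command name per line on stdin and succeed
# only if some line equals $1 exactly (-F: fixed string, -x: whole line).
name_in_list() {
    grep -Fx -- "$1" >/dev/null
}

# Hypothetical wrapper: "ps -e -o comm=" prints only command names, no header.
is_running() {
    ps -e -o comm= | name_in_list "$1"
}

# The matching can be tried on a canned list, without touching real processes.
printf 'init\nfoo_daemon\n' | name_in_list foo_d || echo "foo_d not found"
```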
Using a rather simplistic switch case, the appropriate
action can be taken:
case $? in
    0)
        echo "$PROC_NAME running"
        ;;
    *)
        echo "restarting $PROC_NAME"
        $PROC_INIT ||
            bomb "Could not restart ${PROC_NAME} using ${PROC_INIT}"
        logger $progname "Restarted $PROC_NAME"
        ;;
esac
exit 0 # we made it this far...
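Because $? is overwritten by every command, the case statement has to follow the check directly. If anything needs to run in between, the status can be saved in a variable first. The sketch below uses a canned process list in place of ps, and action_for_status is a hypothetical helper that only prints which branch would be taken.

```shell
# Hypothetical helper: map a saved exit status to the action the script would take.
action_for_status() {
    case $1 in
        0) echo "running" ;;
        *) echo "restart" ;;
    esac
}

status=0
# A canned list stands in for "ps -e"; foo_d is not in it, so grep fails.
printf 'init\n' | grep foo_d >/dev/null || status=$?
echo "checking..."              # this command resets $? to 0
action_for_status "$status"     # prints: restart
```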
Note that any further action, such as sending an email, could be
added to the bomb() routine.
A tiny script really is not big enough to warrant its own file, so here is the whole thing compacted for easy copying and pasting:
#!/bin/sh
progname=${0##*/}
toppid=$$
trap "exit 1" 1 2 3 15
bomb()
{
    cat >&2 <<ERRORMESSAGE
ERROR: $@
*** ${progname} aborted! ***
ERRORMESSAGE
    kill ${toppid}
    exit 1
}
PROC_NAME=$1
PROC_INIT=$2
# grep's exit status: 0 if the name was found, non-zero otherwise
ps -e | grep "$PROC_NAME" >/dev/null 2>&1
case $? in
    0)
        echo "$PROC_NAME running"
        ;;
    *)
        echo "restarting $PROC_NAME"
        $PROC_INIT ||
            bomb "Could not restart ${PROC_NAME} using ${PROC_INIT}"
        logger $progname "Restarted $PROC_NAME"
        ;;
esac
exit 0
Sometimes the things that seem hardest turn out to be really simple;
a short script added to crontab to make darn
sure a process is running is one of them.
It could use improvement, to be sure - or extensions - but it does
the job.
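For the hourly schedule used throughout, the crontab entry might look like the line printed by the sketch below. The install path /usr/local/bin/checkproc, the foo_d path and the helper name hourly_entry are all assumptions made for the example; adjust them to wherever the script and the daemon actually live.

```shell
# Hypothetical install path of the check script.
CHECK_SCRIPT=/usr/local/bin/checkproc

# Print a crontab line that runs the check at minute 0 of every hour.
# Fields: minute hour day-of-month month day-of-week command
hourly_entry() {
    printf '0 * * * * %s %s "%s"\n' "$CHECK_SCRIPT" "$1" "$2"
}

hourly_entry foo_d /usr/local/sbin/foo_d

# To append it to the current crontab (review before running):
# { crontab -l 2>/dev/null; hourly_entry foo_d /usr/local/sbin/foo_d; } | crontab -
```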
[1] bash & ksh as well.