July 2005

A Small Simple Process Restarter

Often, whatever the OS, one finds the need for one simple thing to happen on a system: find out whether a process is running and, if it is not, restart it. Here we discuss how to approach that problem using the POSIX shell [1].

Problem Definition

This particular method is only useful for one single process; let's read that again so it is clear: one single process [2]. It will not work for groups of processes such as apache, mysql, nfsd et al. In the example the process name is foo_d, but the method works for any other singular process. What foo_d does is not relevant here; all that matters is that if foo_d is found not to be running when it is checked at a given interval, it gets restarted. For argument's sake, an interval of one hour is used.

Recursive Daemons Not

The first instinct for the systems programmer might be to create a generic checkd daemon that accepts arguments (such as the process name) and forks happily ever after. There is, however, somewhat of a logic flaw there: checkd is itself just another process that can die, and then something would have to check on checkd. If foo_d needs to run no matter what, then it should have signal overrides built into its own constructs. Likewise, if the interval needs to be anything less than 15 minutes, we might want to consider making foo_d itself more persistent or adding internal signal handling that ignores certain signals (again, depending on what foo_d actually does).
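If foo_d happened to be a shell script, the internal signal handling mentioned above could be as small as the following sketch. The function name foo_d_ignore_signals is hypothetical, and which signals to ignore depends entirely on what foo_d does; a daemon written in C would use signal() or sigaction() instead.

```shell
#!/bin/sh
# Hypothetical helper for a shell-based foo_d: shield it from signals
# that would otherwise kill it, so no external watcher is needed for them.
foo_d_ignore_signals()
{
    # An empty action ('') tells the shell to ignore the signal. Ignored
    # signals stay ignored in subshells and in commands the script starts.
    trap '' HUP INT QUIT
    # TERM is left alone on purpose so foo_d can still be stopped cleanly.
}
```

foo_d would call it once at startup, before entering its main loop. Leaving TERM alone is a deliberate choice, so that an administrator or an init script can still stop foo_d cleanly.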

The Solution

With that in mind, a pretty simple assessment can be made: just hammer out a script that cron fires once an hour to check for the process. First, a bomb routine.


#!/bin/sh

progname=${0##*/}
toppid=$$
trap "exit 1" 1 2 3 15
bomb()
{
        cat >&2 <<ERRORMESSAGE

ERROR: $@
*** ${progname} aborted! ***
ERRORMESSAGE
        kill ${toppid}
        exit 1
}

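Pretty basic shell stuff: the interpreter line, a program name catcher, a snag of the top-level PID, a trap for some signals (1, 2, 3 and 15 are HUP, INT, QUIT and TERM) and finally the bomb routine in case anything goes terribly awry. The bomb routine prints its message to stderr and then kills the top-level shell, whose trap turns that into exit 1, so the whole script aborts even if bomb is called from within a subshell.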
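The program name catcher relies on the ${parameter##pattern} expansion, which removes the longest prefix of the value matching the pattern. With */ as the pattern, everything up to and including the last slash is dropped, much like basename would do. A tiny illustration, using a hypothetical strip_dir function:

```shell
#!/bin/sh
# Hypothetical helper showing what progname=${0##*/} does:
# remove the longest leading match of "*/", i.e. the directory part.
strip_dir()
{
    printf '%s\n' "${1##*/}"
}

strip_dir /usr/local/bin/checkproc   # prints: checkproc
strip_dir checkproc                  # prints: checkproc (no slash, unchanged)
```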

Two Main Globals

Now it is time for the globals. Arguments $1 and $2 are the process name and the command used to restart it, respectively:


PROC_NAME=$1
PROC_INIT=$2

It is good practice to grab those values immediately, since the positional parameters can be reset later on in the script (by set or shift, for example). Also note that if argument 2 contains spaces, for example a command followed by its own arguments, it has to be enclosed in double quotes.
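The script as written never checks that it was given both arguments. A minimal guard, sketched here as a hypothetical check_args function that is not part of the original script, also shows why the double quotes around argument 2 matter: without them the restart command arrives as several arguments.

```shell
#!/bin/sh
# Hypothetical argument guard for the restarter; not in the original script.
progname=${progname:-${0##*/}}

check_args()
{
    # Expect exactly: process name, restart command (one quoted string).
    if [ $# -ne 2 ]; then
        echo "usage: ${progname} process_name \"restart command\"" >&2
        return 1
    fi
    return 0
}
```

It could be called as check_args "$@" || exit 1 just before PROC_NAME=$1 is set. With that in place, checkproc foo_d "/etc/init.d/foo_d start" passes, while the unquoted checkproc foo_d /etc/init.d/foo_d start is rejected with a usage message.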

Running the Check

Now it is time to snag the process, which can be done pretty easily using the ps command:


# exit status: 0 = found, 1 = not found, >1 = grep error
ps -e | grep "$PROC_NAME" >/dev/null 2>&1

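The actual output of the ps command is not relevant; all that matters is the exit status of the pipeline, which is that of its last command, grep: 0 if PROC_NAME shows up in the process list, 1 if it does not, and greater than 1 if grep itself failed.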
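One caveat with this pipeline: grep matches substrings, so a check for foo_d is also satisfied by a process named foo_d2 or foo_daemon. A slightly stricter variant, sketched below with a hypothetical is_running function, compares whole command names instead. It assumes a ps that supports the POSIX -o option; note that Linux truncates command names to 15 characters and some systems print the full path, which the sed step strips.

```shell
#!/bin/sh
# Hypothetical stricter check: succeeds only if some process's command
# name is exactly $1. Exit status is grep's: 0 = found, 1 = not found,
# >1 = grep error.
is_running()
{
    # -o comm= prints just the command name, with no header line.
    # sed trims surrounding blanks and any leading directory part.
    ps -e -o comm= |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's#.*/##' |
        grep -x -- "$1" >/dev/null 2>&1
}
```

Calling is_running "$PROC_NAME" in place of the ps -e | grep line leaves the same 0/1/error convention in $? for the case statement.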

Checking the ret Value

A rather simplistic case statement then takes the appropriate action:


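case $? in
    0)
        echo "$PROC_NAME running"
        ;;
    1)
        echo "restarting $PROC_NAME"
        # PROC_INIT is left unquoted so "cmd arg" splits into words
        $PROC_INIT ||
            bomb "Could not restart ${PROC_NAME} using ${PROC_INIT}"
        logger "${progname}: Restarted ${PROC_NAME}"
        ;;
    *)
        # grep itself failed, so the check result cannot be trusted
        bomb "Failed to run the process grep for ${PROC_NAME}"
        ;;
esac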

exit 0 # we made it this far...

Note that any other action, such as sending an email, could be added to the bomb() routine.
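As a sketch of that idea, the variant below also pipes the error to mailx before aborting. mailx is a standard POSIX mail utility, but whether it can actually deliver mail depends on the local setup; the ADMIN variable (defaulting to root) is a hypothetical addition, not part of the original script.

```shell
#!/bin/sh
progname=${0##*/}
toppid=$$
trap "exit 1" 1 2 3 15
# ADMIN is a hypothetical setting: who should hear about failures.
ADMIN=${ADMIN:-root}

bomb()
{
        cat >&2 <<ERRORMESSAGE

ERROR: $*
*** ${progname} aborted! ***
ERRORMESSAGE
        # Also mail the error. Its output is discarded and a failure is
        # ignored, so a missing or broken mailx does not stop the abort.
        printf '%s\n' "ERROR: $*" |
            mailx -s "${progname} aborted" "$ADMIN" >/dev/null 2>&1
        kill ${toppid}
        exit 1
}
```

Because cron already mails a job's output to its owner on most systems, this is mainly useful when the script is run some other way or the failure should go to a different address.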

The Whole Script

A tiny script really is not big enough to warrant its own file, so here is the whole thing compacted for easy copying and pasting:

#!/bin/sh
progname=${0##*/}
toppid=$$
trap "exit 1" 1 2 3 15
bomb()
{
        cat >&2 <<ERRORMESSAGE
ERROR: $@ 
*** ${progname} aborted! ***
ERRORMESSAGE
        kill ${toppid}
        exit 1
}
PROC_NAME=$1
PROC_INIT=$2
ps -e | grep "$PROC_NAME" >/dev/null 2>&1
case $? in
    0)
        echo "$PROC_NAME running"
        ;;
    1)
        echo "restarting $PROC_NAME"
        $PROC_INIT ||
            bomb "Could not restart ${PROC_NAME} using ${PROC_INIT}"
        logger "${progname}: Restarted ${PROC_NAME}"
        ;;
    *)
        bomb "Failed to run the process grep for ${PROC_NAME}"
        ;;
esac
exit 0
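To have cron fire the script once an hour, the crontab entry needs minute 0 and a * in each of the other four time fields. The hypothetical helper below only prints such a line; the script path /usr/local/bin/checkproc and the restart command in the example are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab line that runs the checker at
# minute 0 of every hour (fields: minute hour day-of-month month weekday).
#   $1 = path to the checker script
#   $2 = process name to look for
#   $3 = restart command (quoted, since it usually contains spaces)
# Note: cron treats an unescaped % in the command as a newline, so a
# restart command containing % would need it escaped as \%.
hourly_cron_line()
{
    printf '0 * * * * %s %s "%s"\n' "$1" "$2" "$3"
}

# Example (paths are placeholders): append the entry to the current
# user's crontab; "crontab -" reads the new table from standard input
# on common cron implementations.
#   ( crontab -l 2>/dev/null;
#     hourly_cron_line /usr/local/bin/checkproc foo_d "/etc/init.d/foo_d start"
#   ) | crontab -
```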

Summary

Sometimes the problems that look hardest turn out to be really simple: a small script added to crontab to make darn sure a process is running did the trick. It could use improvements or extensions, to be sure, but it does the job.

Footnotes

  1. This script should work with bash & ksh as well.
  2. It could easily be extended to handle process groups like apache.

 

 
