Mar 2011
Ever been in a situation where you needed to set up configuration files for tens, hundreds, or perhaps thousands of systems? Probably. And if not, and you're a sysadmin, you probably will at some point. This text details how, in one small situation, I saved myself some time later on by writing two very small scripts.
The details are relatively simple. A new High Performance Cluster (HPC) was going into place and needed to be monitored. The tool being used was Nagios. The goal of this particular monitoring instance was to set up service checks using hostgroups to help minimize the maintenance overhead. The problem, of course, is that the cluster could grow by any number of compute nodes at any time (and it still can). Before even bothering to add any nodes by hand, I decided to script the generation of as much of the configuration as possible. In the end, the two small scripts I wrote did the following:
The first script generates the host entries, the compute node hostgroup, and the service checks attached to that hostgroup. The second script automatically creates hostgroups by queue, making calls to Sun Grid Engine (SGE) to get information regarding the queues.
These scripts are purposefully left as close to the originals as possible so that readers can go in and make improvements. They could use a lot of improvement, which of course might make for a good article later on. But they do demonstrate that even a script with plenty of room for improvement can get the job done. And quite well, I might add.
#!/usr/bin/perl
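# This creates the basic Nagios config files for cluster nodes
$end = 16;               # number of compute nodes to generate
$network = "192.168.0";  # first three octets of the node network
print "\# RCHPC1 AUTOGENERATED CONFIG\n";
print "\# DO NOT EDIT THIS DIRECTLY! See the mkclusterconf script!\n";
$x = 11;                 # last octet of the first node's address
$members = "";           # comma-separated host list for the hostgroup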
for ($i = 1; $i <= $end; $i++) {
    # Zero-pad the node number to three digits (n001, n016, n128)
    $hostname = sprintf("n%03d", $i);
    # Append to the member list, with no leading comma on the first host
    if ($i == 1) {
        $members = $hostname;
    } else {
        $members = "$members,$hostname";
    }
    print "define host\{\n";
    print " use linux-server\n";
    print " host_name $hostname\n";
    print " alias $hostname.node\n";
    print " notes RC HPC Compute Node\n";
    print " address $network.$x\n";
    print " \}\n\n";
    $x++;
}
print "define hostgroup\{\n";
print " hostgroup_name myhpc1-compute-nodes\n";
print " alias All HPC Systems\n";
print " members $members,myhpc-prime\n";
print " \}\n\n";
print "define service\{\n";
print " use generic-service\n";
print " hostgroup_name myhpc1-compute-nodes\n";
print " service_description SSH\n";
print " check_command check_ssh\n";
print " \}\n\n";
print "define service\{\n";
print " use generic-service\n";
print " hostgroup_name myhpc1-compute-nodes\n";
print " service_description Current Load\n";
print " check_command snmp_load\n";
print " \}\n\n";
print "define service\{\n";
print " use generic-service\n";
print " hostgroup_name myhpc1-compute-nodes\n";
print " service_description RPC\n";
print " check_command check_rpc_port\n";
print " \}\n\n";
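For readers who would rather stay in shell, the host-generation loop can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the function names node_name and host_block are made up for the example, and the network prefix and starting octet are simply copied from the Perl script. The point it shows is that printf-style zero padding keeps node names at three digits without branching on the node number.

```shell
#!/bin/sh
# Sketch: emit Nagios host definitions for a numbered range of nodes.
# node_name and host_block are hypothetical helper names; NETWORK and
# FIRST_OCTET mirror the values used in the Perl script.

NETWORK="192.168.0"
FIRST_OCTET=11

# node_name N: print a zero-padded node name such as n007 or n123
node_name() {
    printf 'n%03d' "$1"
}

# host_block N: print one "define host" stanza for node number N
host_block() {
    name=$(node_name "$1")
    octet=$((FIRST_OCTET + $1 - 1))
    printf 'define host{\n'
    printf '    use        linux-server\n'
    printf '    host_name  %s\n' "$name"
    printf '    alias      %s.node\n' "$name"
    printf '    address    %s.%s\n' "$NETWORK" "$octet"
    printf '    }\n\n'
}

# Demonstration: print stanzas for nodes 1 through 3
i=1
while [ "$i" -le 3 ]; do
    host_block "$i"
    i=$((i + 1))
done
```

Raising the loop bound is all it takes to cover more nodes, although the address arithmetic here assumes every node fits in the same /24 network.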
#!/bin/sh
echo "define hostgroup{"
echo " hostgroup_name hpc-infrastructure"
echo " alias HPC Infrastructure"
echo " members myhpc-prime"
echo " }"
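# Hosts in the @allhosts host group, as reported by SGE
hostlist=`qconf -shgrp @allhosts | grep hostlist`
echo "define hostgroup{"
echo " hostgroup_name all-queue"
echo " alias SGE Queue ALL"
printf ' members '
sep=""
for n in $hostlist
do
    # Skip the leading "hostlist" keyword from the qconf output
    if [ "$n" != "hostlist" ]; then
        printf '%s%s' "$sep" "$n"
        sep=","
    fi
done
echo
echo " }"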
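# Hosts in the @dynhosts host group, as reported by SGE
hostlist=`qconf -shgrp @dynhosts | grep hostlist`
echo "define hostgroup{"
echo " hostgroup_name dyn-queue"
echo " alias SGE Queue Dyna"
printf ' members '
sep=""
for n in $hostlist
do
    # Skip the leading "hostlist" keyword from the qconf output
    if [ "$n" != "hostlist" ]; then
        printf '%s%s' "$sep" "$n"
        sep=","
    fi
done
echo
echo " }"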
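# Hosts in the @liqhosts host group, as reported by SGE
hostlist=`qconf -shgrp @liqhosts | grep hostlist`
echo "define hostgroup{"
echo " hostgroup_name liq-queue"
echo " alias SGE Queue Liq"
printf ' members '
sep=""
for n in $hostlist
do
    # Skip the leading "hostlist" keyword from the qconf output
    if [ "$n" != "hostlist" ]; then
        printf '%s%s' "$sep" "$n"
        sep=","
    fi
done
echo
echo " }"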
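One of the more obvious improvements is to fold the three near-identical blocks into a single function. The sketch below is one possible way to do that, not the original script. The names hgrp_members and queue_hostgroup are invented for the example, and the sample input is made up. It also assumes that qconf may wrap a long host list across several lines ending in a backslash, which is worth checking against the output on your own cluster.

```shell
#!/bin/sh
# Sketch: turn "qconf -shgrp" style output into a Nagios hostgroup stanza.
# hgrp_members and queue_hostgroup are hypothetical names, and the sample
# input at the bottom is invented for the demonstration.

# hgrp_members: read host group text on stdin, print the hosts as a,b,c
hgrp_members() {
    awk '
        # A line belongs to the host list if it starts with "hostlist"
        # or continues a previous line that ended in a backslash.
        $1 == "hostlist" || cont {
            # Strip a trailing backslash and remember whether one was there
            cont = sub(/[ \t]*\\[ \t]*$/, "")
            for (i = 1; i <= NF; i++) {
                if (i == 1 && $i == "hostlist") continue
                out = out sep $i
                sep = ","
            }
        }
        END { print out }
    '
}

# queue_hostgroup NAME ALIAS: read host group text on stdin, print a stanza
queue_hostgroup() {
    members=$(hgrp_members)
    printf 'define hostgroup{\n'
    printf '    hostgroup_name  %s\n' "$1"
    printf '    alias           %s\n' "$2"
    printf '    members         %s\n' "$members"
    printf '    }\n'
}

# Demonstration with sample text standing in for a live qconf call
queue_hostgroup dyn-queue "SGE Queue Dyna" <<'EOF'
group_name @dynhosts
hostlist n001 n002 \
         n003
EOF
```

On a real cluster the here-document would be replaced by a pipe, along the lines of qconf -shgrp @dynhosts | queue_hostgroup dyn-queue "SGE Queue Dyna", with one such line per queue.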
Executing the scripts is pretty simple:
./mkclusterconf.pl > myhpc.cfg
./mkquegrps.sh >> myhpc.cfg
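If the configuration gets regenerated often, it may be worth wrapping the two commands so that a failed generator never leaves a half-written file in place. The following is a minimal sketch of that idea, assuming the generators write to standard output as both scripts here do. The function name regenerate and the ".new" suffix are arbitrary choices for the example.

```shell
#!/bin/sh
# Sketch: build the config in a scratch file and only replace the live
# file once every generator has succeeded.
# regenerate and the ".new" suffix are hypothetical choices.

# regenerate TARGET GENERATOR...: run each generator, appending its output
regenerate() {
    target=$1
    shift
    tmp="$target.new"
    : > "$tmp" || return 1          # start with an empty scratch file
    for gen in "$@"; do
        if ! "$gen" >> "$tmp"; then
            rm -f "$tmp"            # leave the existing target untouched
            echo "generator failed: $gen" >&2
            return 1
        fi
    done
    mv "$tmp" "$target"
}
```

With that in place, the run becomes a single call such as regenerate myhpc.cfg ./mkclusterconf.pl ./mkquegrps.sh.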
Laziness is a virtue for a sysadmin, even when it means being so lazy that you generate configuration files with scripts rather than go through the muss and fuss of yanking and putting a lot of lines.