Aug 2009
Many Linux distributions ship with the heartbeat
suite of software for setting up High Availability
Linux. The Linux HA
project has details and downloads for those who do not have it
available for their system. This text addresses setting up a very
simple HA Linux configuration using plain configuration files rather than
a GUI or the XML definition files.
The example setup has two servers that each serve an Apache webserver. Many other services can be assigned as well, along with shared data over NFS, for example. If the failover were for an Apache server whose htdocs sat on shared storage, the layout could look like this:
        mass storage/NFS/iSCSI/Fibre
                     |
   server1:/mnt/htdocs   server2:/mnt/htdocs
Similarly, a common MySQL database backend could be available, or even more exotic tiered MySQL setups; basically whatever the needs are. What Linux HA provides is a shared IP address that any server in the cluster list can host. For demonstration purposes, however, each Apache server's document root will hold an index file containing that system's actual hostname. After a failover, the index file contents should change while the page remains accessible via the shared IP.
The following hostnames and IPv4 addresses are used:
prime - primary node
calc - secondary node
192.168.1.20 - shared address
SuSE, Red Hat and Debian all support the heartbeat packages from the Linux HA project. Since the example uses SuSE 11, the install command is:
yast -i heartbeat
A basic apache server for the test is required as well:
yast -i apache2
To illustrate the test, a simple page on each webserver with its
hostname can be used and put into
/srv/www/htdocs/index.html:
On prime:
<html><head></head><body>prime</body></html>
On calc:
<html><head></head><body>calc</body></html>
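Rather than typing each page by hand, a small shell function can write one that names the local host. This is a minimal sketch: the function name write_index_page is hypothetical, and it assumes `uname -n` returns the node name (prime or calc) and that the document root is /srv/www/htdocs as above.

```shell
#!/bin/bash
# Hypothetical helper: write a test page whose body is this host's node name.
# Defaults to the SuSE 11 apache2 document root used in this article.
write_index_page() {
    local docroot="${1:-/srv/www/htdocs}"
    local name
    name="$(uname -n)"    # e.g. prime or calc
    mkdir -p "$docroot" || return 1
    printf '<html><head></head><body>%s</body></html>\n' "$name" \
        > "$docroot/index.html"
}

# Run as root on each webserver:
# write_index_page
```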
Next, start the webservers and enable them at boot (run on both systems):
service apache2 start
chkconfig apache2 on
Now test each system separately with lynx
--dump:
# lynx --dump prime
prime
# lynx --dump calc
calc
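The same check can be scripted so it is easy to repeat after each change. A sketch, assuming each page body is just the node name as set up above; check_pages is a hypothetical name, and since lynx --dump pads the rendered text, all whitespace is stripped before comparing.

```shell
#!/bin/bash
# Hypothetical helper: confirm each webserver answers with its own name.
check_pages() {
    local host body rc=0
    for host in "$@"; do
        # lynx --dump renders the page as plain text; drop all whitespace
        body=$(lynx --dump "$host" 2>/dev/null | tr -d '[:space:]')
        if [ "$body" = "$host" ]; then
            echo "ok: $host"
        else
            echo "MISMATCH: $host returned '$body'" >&2
            rc=1
        fi
    done
    return "$rc"
}

# check_pages prime calc
```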
Last but not least, the hosts must be able to resolve each other by name. Entries in /etc/hosts work fine for this.
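To confirm name resolution on each node, getent can be used, since it goes through the system's normal name-service configuration and therefore honours /etc/hosts. A minimal sketch; check_resolves is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: check that every cluster node name resolves.
check_resolves() {
    local host addr rc=0
    for host in "$@"; do
        if addr=$(getent hosts "$host"); then
            echo "$host -> ${addr%% *}"   # first field is the address
        else
            echo "cannot resolve $host" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Run on both nodes:
# check_resolves prime calc
```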
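Configuring the HA services is relatively simple. This configuration is very basic; three files in /etc/ha.d are needed:
ha.cf - the main heartbeat configuration
authkeys - authentication between the nodes
haresources - the shared resources and their preferred node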
For the example setup the ha.cf file looks like the following:
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 10
udpport 694
bcast eth0
node calc
node prime
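The above options are straightforward: debugfile and logfile set where debug output and log messages are written, logfacility is the syslog facility to log to, keepalive is the interval between heartbeats in seconds, deadtime is how many seconds a node may stay silent before it is declared dead, udpport is the UDP port the heartbeats use, bcast is the interface to broadcast them on, and the node lines list the members of the cluster.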
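Each node line must match the name that `uname -n` reports on that machine, so a quick check on both nodes can catch a typo early. A sketch; check_local_node is a hypothetical name, and it handles one or more names per node line.

```shell
#!/bin/bash
# Hypothetical helper: warn if this machine's node name (uname -n)
# is not listed on any "node" line of ha.cf.
check_local_node() {
    local cf="${1:-/etc/ha.d/ha.cf}" me key rest n
    me="$(uname -n)"
    while read -r key rest || [ -n "$key" ]; do
        [ "$key" = "node" ] || continue
        for n in $rest; do
            if [ "$n" = "$me" ]; then
                echo "ok: $me is listed in $cf"
                return 0
            fi
        done
    done < "$cf"
    echo "warning: $me is not listed in $cf" >&2
    return 1
}

# check_local_node /etc/ha.d/ha.cf
```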
The documentation explains the available authentication methods; this example uses a single md5 key, so the authkeys file contains the following:
auth 1
1 md5 86e07a217fcd61fb981872ec57b68845
The sum was generated by simply echoing a string and piping it
to md5sum. The authkeys file must also be readable and writable by
root only:
chmod 0600 authkeys
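Generating the key and writing the file can be done in one step. A sketch, assuming the single md5 key with index 1 used above; make_authkeys is a hypothetical name and the secret is any string you choose. printf is used instead of echo so no trailing newline is hashed, and since the author's exact command is not shown, the resulting sum will differ from the one above.

```shell
#!/bin/bash
# Hypothetical helper: create an authkeys file with one md5 key.
# Usage: make_authkeys DIR SECRET   (DIR is normally /etc/ha.d)
make_authkeys() {
    local dir="$1" secret="$2" sum
    if [ -z "$dir" ] || [ -z "$secret" ]; then
        echo "usage: make_authkeys DIR SECRET" >&2
        return 1
    fi
    # md5sum prints "<hash>  -"; keep only the hash
    sum=$(printf '%s' "$secret" | md5sum | cut -d' ' -f1)
    # create the file with restrictive permissions from the start
    ( umask 077 && printf 'auth 1\n1 md5 %s\n' "$sum" > "$dir/authkeys" )
    # also fixes the mode if the file already existed
    chmod 0600 "$dir/authkeys"
}
```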
The haresources file defines the shared address and the services (init scripts) to start up, or shut down as the case may be:
prime 192.168.1.20 apache2
The starting
or primary server is listed as the first
field. The configuration on the primary server is now done;
the exact same files can be used on the secondary one.
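To make the haresources line concrete, the sketch below splits it into its parts. The preferred node comes first; the remaining entries are resources, started left to right on takeover and stopped right to left on release, which matches the order seen in the logs (IPaddr before apache2 when acquiring, apache2 before IPaddr when releasing). describe_haresources is a hypothetical name, and treating any dotted number as an IP address is a simplification.

```shell
#!/bin/bash
# Hypothetical helper: explain each haresources line.
describe_haresources() {
    local file="${1:-/etc/ha.d/haresources}" node rest res
    while read -r node rest || [ -n "$node" ]; do
        case "$node" in ''|'#'*) continue ;; esac   # skip blanks/comments
        echo "preferred node: $node"
        for res in $rest; do
            case "$res" in
                # a bare dotted address is handled by the IPaddr script
                [0-9]*.[0-9]*.[0-9]*.[0-9]*) echo "  IP address: $res" ;;
                # other entries name a script, here the apache2 init script
                *) echo "  script: $res" ;;
            esac
        done
    done < "$file"
}

# describe_haresources /etc/ha.d/haresources
```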
Starting up is pretty simple:
# chkconfig heartbeat on
# service heartbeat start
Starting High-Availability services2009/07/25_21:04:30 INFO: \
Resource is stopped
heartbeat[4071]: 2009/07/25_21:04:30 info: Version 2 support: false
heartbeat[4071]: 2009/07/25_21:04:30 info: **************************
heartbeat[4071]: 2009/07/25_21:04:30 info: \
Configuration validated. Starting heartbeat 2.99.3
Now a litmus test of the shared address:
# lynx --dump 192.168.1.20
prime
Testing can be a little tricky. The simplest way is to stop the heartbeat service on the active node and let the other one take over, then observe the log entries on the calc node:
IPaddr[5106]: 2009/07/25_21:32:55 INFO: eval \
ifconfig eth0:0 192.168.1.20 netmask 255.255.255.0 broadcast 192.168.1.255
IPaddr[5089]: 2009/07/25_21:32:55 INFO: Success
ResourceManager[5006]: 2009/07/25_21:32:55 \
info: Running /etc/init.d/apache2 start
mach_down[4980]: 2009/07/25_21:32:58 info: \
mach_down takeover complete for node prime.
heartbeat[4241]: 2009/07/25_21:33:05 WARN: node prime: is dead
heartbeat[4241]: 2009/07/25_21:33:05 info: Dead node prime gave up resources.
heartbeat[4241]: 2009/07/25_21:33:05 info: Resources being acquired from prime.
heartbeat[4241]: 2009/07/25_21:33:05 info: Link prime:eth0 dead.
harc[5258]: 2009/07/25_21:33:06 info: Running /etc/ha.d/rc.d/status status
heartbeat[5259]: 2009/07/25_21:33:06 info: \
No local resources [/usr/share/heartbeat/ResourceManager \
listkeys calc] to acquire.
mach_down[5287]: 2009/07/25_21:33:06 info: \
Taking over resource group 192.168.1.20
ResourceManager[5313]: 2009/07/25_21:33:06 \
info: Acquiring resource group: prime 192.168.1.20 apache2
IPaddr[5340]: 2009/07/25_21:33:06 INFO: Running OK
mach_down[5287]: 2009/07/25_21:33:07 \
info: mach_down takeover complete for node prime.
And a quick check with lynx:
# lynx --dump 192.168.1.20
calc
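The switch can also be watched from a client while heartbeat is stopped on the active node. A minimal sketch: wait_for_node is a hypothetical name, it relies on the test pages containing only the node name, and the 60-second default timeout and one-second polling interval are arbitrary choices, not heartbeat settings.

```shell
#!/bin/bash
# Hypothetical helper: poll the shared address until NODE's page appears.
# Usage: wait_for_node ADDRESS NODE [TIMEOUT_SECONDS]
wait_for_node() {
    local addr="$1" node="$2" timeout="${3:-60}" body elapsed=0
    while :; do
        # the test pages contain only the node name; strip lynx's padding
        body=$(lynx --dump "$addr" 2>/dev/null | tr -d '[:space:]')
        if [ "$body" = "$node" ]; then
            echo "$addr is served by $node (after about ${elapsed}s)"
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "$addr not served by $node within ${timeout}s" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# wait_for_node 192.168.1.20 calc
```

Run it from a machine other than the two nodes before stopping heartbeat on prime; once prime is started again, `wait_for_node 192.168.1.20 prime` shows the failback.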
Note that once prime is back online, calc gives control back:
ResourceManager[5515]: 2009/07/25_21:33:43 info: \
Releasing resource group: prime 192.168.1.20 apache2
ResourceManager[5515]: 2009/07/25_21:33:43 info: \
Running /etc/init.d/apache2 stop
ResourceManager[5515]: 2009/07/25_21:33:44 info: \
Running /etc/ha.d/resource.d/IPaddr 192.168.1.20 stop
IPaddr[5592]: 2009/07/25_21:33:44 INFO: ifconfig eth0:0 down
IPaddr[5575]: 2009/07/25_21:33:44 INFO: Success
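When repeating failover and failback tests, the milestones can be pulled out of the heartbeat log. A sketch, assuming the logfile path from the ha.cf above; ha_events is a hypothetical name, and the patterns are simply phrases taken from the log excerpts shown here, not an exhaustive list of heartbeat messages.

```shell
#!/bin/bash
# Hypothetical helper: show takeover/release milestones from the
# heartbeat log (logfile /var/log/ha-log in the ha.cf above).
ha_events() {
    local log="${1:-/var/log/ha-log}"
    grep -E 'is dead|Acquiring resource group|Releasing resource group|takeover complete' "$log"
}

# ha_events /var/log/ha-log
```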
The example shown here is very primitive. Ideally there would be
a dedicated NIC on each machine for the heartbeat, or a serial
connection between them. The services are not exactly exotic
either, but this should be enough to get a Linux HA setup off the
ground.