
  ####   #       #    #   ####    #####  ######  #####
 #    #  #       #    #  #          #    #       #    #
 #       #       #    #   ####      #    #####   #    #
 #       #       #    #       #     #    #       #####
 #    #  #       #    #  #    #     #    #       #   #
  ####   ######   ####    ####      #    ######  #    #

       ####    #####    ##     #####  #    #   ####
      #          #     #  #      #    #    #  #
       ####      #    #    #     #    #    #   ####
           #     #    ######     #    #    #       #
      #    #     #    #    #     #    #    #  #    #
       ####      #    #    #     #     ####    ####



Welcome to cluster_status, the multiple HACMP/PowerHA cluster monitoring script.
Author: AIX Health Check
Version: 1.6
Date: March 24, 2015



Why was this script created?

For monitoring a HACMP/PowerHA cluster, you may use clinfo, which is part of HACMP/PowerHA. Clinfo can be configured to monitor up to 8 clusters, no more, and even then it cannot give you a single view of all 8 clusters at once. In HACMP/PowerHA 5.1, IBM introduced clinfo.cgi, which creates an HTML page showing the status of your HACMP/PowerHA clusters, but that too is limited to 8 clusters, and again it doesn't provide a single overview, just a long HTML page.

We had 33 HACMP/PowerHA clusters to monitor. We also had a web server running on an AIX server. So the idea was born to collect HACMP/PowerHA cluster status information and produce a single HACMP/PowerHA web page, looking the same as xclstat.


How does it work?

Cluster_status will first collect all cluster status information via SNMP (which is actually the same way clinfo does it). Cluster_status uses the script snmp.sh to collect this information. Snmp.sh will store one file per cluster, named clusternumber.out, in a subfolder called snmp. Cluster_status will then generate an HTML page from all the *.out files found.
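The HTML-generation step can be pictured roughly like this. This is only an illustrative sketch, not cluster_status's actual code; the real page mimics xclstat, and the table layout and awk field names below are assumptions based on the .out layout shown later in this README:

```shell
# Illustrative sketch only: build a tiny status table from snmp/*.out files.
mkdir -p snmp
printf 'Cluster Name: DB\nCluster State: UP\n' > snmp/777.out   # sample data
{
    echo "<html><body><table>"
    for f in snmp/*.out
    do
        # Pull the fields out of each .out file (keys per the sample layout).
        name=$(awk -F': ' '/^Cluster Name:/ {print $2}' "$f")
        state=$(awk -F': ' '/^Cluster State:/ {print $2}' "$f")
        echo "<tr><td>$name</td><td>$state</td></tr>"
    done
    echo "</table></body></html>"
} > cl.html
```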


Supported versions

We have tested this script with several HACMP/PowerHA 4.x and 5.x versions, up to HACMP/PowerHA 5.4.01 on AIX 5.2 and HACMP/PowerHA 5.5.05 on AIX 5.3, with clusters of up to 3 nodes. We've also tested it with PowerHA 6.1, and with Oracle 9 RAC clusters.


How do I get it to work?

Store the scripts cluster_status and snmp.sh and the files clhosts and hacmp.defs in a separate folder on an AIX web server.

You'll need to collect some information about all your HACMP/PowerHA clusters and store this information in a file called clhosts (in the same directory as cluster_status). A sample file, clhosts.sample, is provided with this package. Create an entry per HACMP/PowerHA cluster that looks something like:

cluster:777:hostname1 hostname2:dmz:DB CLUSTER:public
interface:777:hostname1.somewhere.net:
interface:777:hostname2.somewhere.net:

A row starting with "cluster" defines a cluster. This row contains the cluster number (777 here - you may retrieve the cluster number by running clstat), the hostnames of the cluster nodes (you may enter up to 3 hostnames here, separated by spaces), whether the cluster is within a DMZ (specify "dmz" if it is, otherwise leave the field blank), and a short description, followed by the SNMP community, usually "public"; if you've changed the community to something else, provide the correct one here.

The hostnames in the cluster row are used on the HTML page. The short description will also be used on the HTML page.

The next rows to create are "interface" rows. Define one for every pingable address of the cluster that you know of. If you're using management adapters within a HACMP/PowerHA cluster, define only those management adapters (management adapters are adapters that are not part of any resource group in a HACMP/PowerHA cluster). Alternatively, you can use persistent label addresses. An "interface" row contains, again, the cluster number, and the hostname or IP address to probe for SNMP information. Of course, your cluster numbers should be unique within your organization! The snmp.sh script will try to ping every "interface" specified here, and will then try to collect SNMP information from one of those interfaces. Be sure to separate all fields in file clhosts using colons (:).
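As a concrete illustration of the colon-separated format, a "cluster" row can be split into its fields like this. The field names follow the description above; the parsing itself is just a sketch, not snmp.sh's actual code:

```shell
# Sketch: split one clhosts "cluster" row on colons (illustrative only).
line="cluster:777:hostname1 hostname2:dmz:DB CLUSTER:public"
IFS=: read -r type num hosts dmz desc community <<EOF
$line
EOF
echo "cluster number: $num"
echo "nodes:          $hosts"
echo "community:      $community"
```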

Once you've collected all your cluster information in file clhosts, test whether SNMP communication works to all your clusters.

A good test is:

snmpinfo -m dump -t 1 -v -c public -h <hostname> -o /path/to/hacmp.defs clusterid

If this does not work, make sure you have a line in your /etc/snmpd.conf or snmpdv3.conf (depending on what level of SNMP you are using), that allows access, for example, for the public community:

COMMUNITY public    public     noAuthNoPriv 0.0.0.0     0.0.0.0         -

Another line that may need to be present in your SNMP configuration file is:

VACM_VIEW defaultView        internet                  - included -

To determine what level of snmpd you are using, run:

# ls -als /usr/sbin/snmpd

If this returns a link to snmpdv3ne, then you're using SNMP version 3; otherwise, you are using snmpd version 1. Version 1 of snmpd uses configuration file /etc/snmpd.conf; version 3 uses /etc/snmpdv3.conf.

If you make changes to the SNMP configuration file, you need to restart snmpd:

# stopsrc -s snmpd
# startsrc -s snmpd

If you restart snmpd on a clustered node, then it's best to also restart clinfoES (which can be done safely on an active cluster):

# stopsrc -s clinfoES
# startsrc -s clinfoES

The snmpinfo test above should return the cluster ID of the cluster you're probing. If your cluster is within a DMZ and a firewall is blocking SNMP traffic, script snmp.sh will use ssh to collect the information (just enter "dmz" in the cluster definition row in file clhosts, as described above). Be sure that the user ID you will use to run the cluster_status script can ssh to the nodes of a cluster without entering any passwords.

If you have all this working, then edit script cluster_status.

At the beginning of this script are some variable declarations you might want to adjust:
BASE (where is cluster_status script located)
MYURL (what is the URL of the web page)
MYPAGELOCATION (where will the HTML page be stored on the web server)
REFRESH (how often should the HTML page refresh)

Then edit script snmp.sh and adjust the BASE variable to your directory.

First, run script snmp.sh to see if it collects any information. 

Snmp.sh should create a subfolder called snmp and store the .out files in this subfolder.
Check whether the .out files reflect the correct cluster status information for all your clusters.

The contents of a single out file should look something like:
Cluster ID: 1
Cluster Name: DB
Cluster State: UP
Cluster SubState: STABLE
Version: 5.4
Number of nodes: 2
Node 1: nodename1 UP
Node 2: nodename2 UP
Node 3:
Node 4:
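Given that layout, the node lines can be picked apart like this. This is only a parsing sketch, under the assumption (matching the sample above) that unused node slots are left blank after the colon:

```shell
# Sketch: list the nodes that appear in a .out file (illustrative only).
printf 'Node 1: nodename1 UP\nNode 2: nodename2 UP\nNode 3:\nNode 4:\n' > 1.out
# Match only "Node N:" lines that actually carry a node name after the colon.
nodes=$(awk '/^Node [0-9]+: ./ {print $3}' 1.out | xargs)
echo "nodes found: $nodes"
```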

Then, run script cluster_status, to let it run snmp.sh and create the web page. By default, cluster_status creates an HTML page called "cl.html", but you can change this by adjusting the MYPAGELOCATION variable.

Last but not least, if all is working, put your cluster_status script in a crontab:
* * * * * /ibm/clinfo/cluster_status
(adjust it to the path you're using)
This will run the script every minute. Cluster_status will check to see if it is already active.
If so, it will quit. If not, it will create the webpage.
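The "already active" check can be pictured with a simple PID file. This is only a hedged sketch of the idea; the lock mechanism cluster_status actually uses may differ:

```shell
# Sketch of a single-instance guard via a PID file (illustrative only).
LOCK=/tmp/cluster_status.lock
if [ -f "$LOCK" ] && kill -0 "$(cat "$LOCK")" 2>/dev/null
then
    # A previous run is still alive: quit, as described above.
    echo "cluster_status is already running; exiting"
    exit 0
fi
echo $$ > "$LOCK"
# ... run snmp.sh and build the HTML page here ...
rm -f "$LOCK"
```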


Terms of use

You may use this script without any limitations.
I only require that you leave the header of the scripts unchanged.
I do like to hear from you, what you think of it and how many
clusters you are monitoring with it.

Any suggestions and/or questions can be sent to info@aixhealthcheck.com.
Visit webpage for additional information: www.aixhealthcheck.com

