Topics: PowerHA / HACMP

A few HACMP rules

With HACMP clusters documentation is probably the most important issue. You cannot properly manage an HACMP cluster if you do not document it. Document the precise configuration of the complete cluster and document any changes you've carried out. Also document all management procedures and stick to them! The cluster snapshot facility is an excellent way of documenting your cluster.

Next step: get educated. You have to know exactly what you're doing on an HACMP cluster. If you have to manage a production cluster, getting a certification is a necessity. Don't ever let non-HACMP-educated UNIX administrators on your HACMP cluster nodes. They don't have a clue of what's going on and probably destroy your carefully layed-out configuration.

Geographically separated nodes are important! Too many cluster nodes just sit on top of each other in the same rack. What if there's a fire? Or a power outage? Having an HACMP cluster won't help you if both nodes are on a single location, use the same power, or the same network switches.

Put your HACMP logs in a sensible location. Don't put them in /tmp knowing that /tmp gets purged every night....

Test, test, test and test your cluster over again. Doing take-over tests every half year is best practice. Document your tests, and your test results.

Don't assume that your cluster is high available after installing the cluster software. There are a lot of other things to consider in your infrastructure to avoid single points of failures, like: No two nodes sharing the same I/O drawer; Power redundancy; No two storage or network adapters on the same SCSI backplane or bus; Redundancy in SAN HBA's; Application monitoring in place.



If you found this useful, here's more on the same topic(s) in our blog:


UNIX Health Check delivers software to scan Linux and AIX systems for potential issues. Run our software on your system, and receive a report in just a few minutes. UNIX Health Check is an automated check list. It will report on perfomance, capacity, stability and security issues. It will alert on configurations that can be improved per best practices, or items that should be improved per audit guidelines. A report will be generated in the format you wish, and the report includes the issues discovered and information on how to solve the issues as well.

Interested in learning more?