Topics: AIX, Hardware, Storage, System Admin

Creating a dummy disk device

At some times it may be necessary to create a dummy disk device, for example when you need a disk to be discovered while running cfgmgr with a certain name on multiple hosts.

For example, if you need the disk to be called hdisk2, and only hdisk0 exists on the system, then running cfgmgr will discover the disk as hdisk1, not as hdisk2. In order to make sure cfgmgr indeed discovers the new disk as hdisk2, you can fool the system by temporarily creating a dummy disk device.

Here are the steps involved:

First: remove the newly discovered disk (in the example below known as hdisk1 - we will configure this disk as hdisk2):

# rmdev -dl hdisk1
Next, we create a dummy disk device with the name hdisk1:
# mkdev -l hdisk1 -p dummy -c disk -t hdisk -w 0000
Note that running the command above may result in an error. However, if you run the following command afterwards, you will notice that the dummy disk device indeed has been created:
# lsdev -Cc disk | grep hdisk1
hdisk1 Defined    SSA Logical Disk Drive
Also note that the dummy disk device will not show up if you run the lspv command. That is no concern.

Now run the cfgmgr command to discover the new disk. You'll notice that the new disk will now be discovered as hdisk2, because hdisk0 and hdisk1 are already in use.
# cfgmgr
# lsdev -Cc disk | grep hdisk2
Finally, remove the dummy disk device:
# rmdev -dl hdisk1

Topics: AIX, Hardware, System Admin

Identifying devices with usysident

There is a LED which you can turn on to identify a device, which can be useful if you need to replace a device. It's the same binary as being used by diag.

To show the syntax:

# /usr/lpp/diagnostics/bin/usysident ? 
usage: usysident [-s {normal | identify}] 
                 [-l location code | -d device name]
       usysident [-t] 
To check the LED status of the system:
# /usr/lpp/diagnostics/bin/usysident 
normal 
To check the LED status of /dev/hdisk1:
# /usr/lpp/diagnostics/bin/usysident -d hdisk1 
normal
To activate the LED of /dev/hdisk1:
# /usr/lpp/diagnostics/bin/usysident -s identify -d hdisk1 
# /usr/lpp/diagnostics/bin/usysident -d hdisk1 
identify
To turn of the LED again of /dev/hdisk1:
# /usr/lpp/diagnostics/bin/usysident -s normal -d hdisk1 
# /usr/lpp/diagnostics/bin/usysident -d hdisk1 
normal
Keep in mind that activating the LED of a particular device does not activate the LED of the system panel. You can achieve that if you omit the device parameter.

Topics: Hardware, Installation, System Admin

Automating microcode discovery

You can run invscout to do a microcode discovery on your system, that will generate a hostname.mup file. Then you go upload this hostname.mup file at this page on the IBM website and you get a nice overview of the status of all firmware on your system.

So far, so good. What if you have plenty of systems and you want to automate this? Here's a script to do this. This script first does a webget to collect the latest catalog.mic file from the IBM website. Then it distributes this catalog file to all the hosts you want to check. Then, it runs invscout on all these hosts, and collects the hostname.mup files. It will concatenate all these files into 1 large file and do an HTTP POST through curl to upload the file to the IBM website and have a report generated from it.

So, what do you need?

  • You should have an AIX jump server that allows you to access the other hosts as user root through SSH. So you should have setup your SSH keys for user root.
  • This jump server must have access to the Internet.
  • You need to have wget and curl installed. Get it from the Linux Toolbox.
  • Your servers should be AIX 5 or higher. It doesn't really work with AIX 4.
  • Optional: a web server, like Apache 2, would be nice, so you can drop the resulting HTML file on your website every day.
  • An entry in the root crontab to run this script every day.
  • A list of servers you want to check.
Here's the script:
#!/bin/ksh

# script:  generate_survey.ksh
# purpose: To generate a microcode survey html file

# where is my list of servers located?
SERVERS=/usr/local/etc/servers

# what temporary folder will I use?
TEMP=/tmp/mup

# what is the invscout folder
INV=/var/adm/invscout

# what is the catalog.mic file location for invscout?
MIC=${INV}/microcode/catalog.mic

# if you have a webserver,
# where shall I put a copy of survey.html?
APA=/usr/local/apache2/htdocs

# who's the sender of the email?
FROM=microcode_survey@ibm.com

# who's the receiver of the email?
TO="your.email@address.com"

# what's the title of the email?
SUBJ="Microcode Survey"

# user check
USER=`whoami`
if [ "$USER" != "root" ];
then
    echo "Only root can run this script."
    exit 1;
fi

# create a temporary directory
rm -rf $TEMP 2>/dev/null
mkdir $TEMP 2>/dev/null
cd $TEMP

# get the latest catalog.mic file from IBM
# you need to have wget installed 
# and accessible in $PATH
# you can download this on:
# www-03.ibm.com
# /systems/power/software/aix/linux/toolbox/download.html
wget techsupport.services.ibm.com/server/mdownload/catalog.mic
# You could also use curl here, e.g.:
#curl techsupport.services.ibm.com/server/mdownload/catalog.mic -LO

# move the catalog.mic file to this servers invscout directory
mv $TEMP/catalog.mic $MIC

# remove any old mup files
echo Remove any old mup files from hosts.
for server in `cat $SERVERS` ; do
   echo "${server}"
   ssh $server "rm -f $INV/*.mup"
done

# distribute this file to all other hosts
for server in `cat $SERVERS` ; do
   echo "${server}"
   scp -p $MIC $server:$MIC
done

# run invscout on all these hosts
# this will create a hostname.mup file
for server in `cat $SERVERS` ; do
   echo "${server}"
   ssh $server invscout
done

# collect the hostname.mup files
for server in `cat $SERVERS` ; do
   echo "${server}"
   scp -p $server:$INV/*.mup $TEMP
done

# concatenate all hostname.mup files to one file
cat ${TEMP}/*mup > ${TEMP}/muppet.$$

# delete all the hostname.mup files
rm $TEMP/*mup

# upload the remaining file to IBM.
# you need to have curl installed for this
# you can download this on:
# www-03.ibm.com
# /systems/power/software/aix/linux/toolbox/download.html
# you can install it like this:
# rpm -ihv 
#    curl-7.9.3-2.aix4.3.ppc.rpm curl-devel-7.9.3-2.aix4.3.ppc.rpm
# more info on using curl can be found on: 
# http://curl.haxx.se/docs/httpscripting.html
# more info on uploading survey files can be found on:
# www14.software.ibm.com/webapp/set2/mds/fetch?pop=progUpload.html

# Sometimes, the IBM website will respond with an
# "Expectation Failed" error message. Loop the curl command until
# we get valid output.

stop="false"

while [ $stop = "false" ] ; do

curl -H Expect: -F mdsData=@${TEMP}/muppet.$$ -F sendfile="Upload file" \ 
   http://www14.software.ibm.com/webapp/set2/mds/mds \
   > ${TEMP}/survey.html

#
# Test if we see Expectation Failed in the output
#

unset mytest
mytest=`grep "Expectation Failed" ${TEMP}/survey.html`

if [ -z "${mytest}" ] ; then
        stop="true"
fi

sleep 10

done

# now it is very useful to have an apache2 webserver running
# so you can access the survey file
mv $TEMP/survey.html $APA

# tip: put in the crontab daily like this:
# 45 9 * * * /usr/local/sbin/generate_survey.ksh 1>/dev/null 2>&1

# mail the output
# need to make sure this is sent in html format
cat - ${APA}/survey.html <<HERE | sendmail -oi -t
From: ${FROM}
To: ${TO}
Subject: ${SUBJ}
Mime-Version: 1.0
Content-type: text/html
Content-transfer-encoding: 8bit

HERE

# clean up the mess
cd /tmp
rm -rf $TEMP

Topics: Hardware, Networking

Integrated Virtual Ethernet adapter

The "Integrated Virtual Ethernet" or IVE adapter is an adapter directly on the GX+ bus, and thus up to 3 times faster dan a regular PCI card. You can order Power6 frames with different kinds of IVE adapters, up to 10GB ports.

The IVE adapter acts as a layer-2 switch. You can create port groups. In each port group up to 16 logical ports can be defined. Every port group requires at least 1 physical port (but 2 is also possible). Each logical port can have a MAC address assigned. These MAC addresses are located in the VPD chip of the IVE. When you replace an IVE adapters, LPARS will get new new MAC addresses.

Each LPAR can only use 1 logical port per physical port. Different LPARs that use logical ports from the same port group can communicate without any external hardware needed, and thus communicate very fast.

The IVE is not hot-swappable. It can and may only be replaced by certified IBM service personnel.

First you need to configure an HAE adapter; not in promiscues mode, because that is meant to be used if you wish to assign a physical port dedicated to an LPAR. After that, you need to assign a LHAE (logical host ethernet adapter) to an LPAR. The HAE needs to be configured, and the frame needs to be restarted, in order to function correctly (because of the setting of multi-core scaling on the HAE itself).

So, to conclude: You can assign physical ports of the IVE adapter to separate LPARS (promiscues mode). If you have an IVE with two ports, up to two LPARS can use these ports. But you can also configure it as an HAE and have up to 16 LPARS per physical port in a port group using the same interface (10Gb ports are recommended). There are different kinds of IVE adapters; some allow to create more port groups and thus more network connectivity. The IVE is a method of virtualizing ethernet without the need for VIOS.

Topics: AIX, Hardware, Logical Partitioning

Release adapter after DLPAR

An adapter that has previously been added to a LPAR and now needs to be removed, usually doesn't want to be removed from the LPAR, because it is in use by the LPAR. Here's how you find and remove the involved devices on the LPAR:

First, run:

# lsslot -c pci
This will find the adapter involved.

Then, find the parent device of a slot, by running:
# lsdev -Cl [adapter] -F parent
(Fill in the correct adapter, e.g. fcs0).

Now, remove the parent device and all its children:
# rmdev -Rl [parentdevice] -d
For example:
# rmdev -Rl pci8 -d
Now you should be able to remove the adapter via the HMC from the LPAR.

If you need to replace the adapter because it is broken and needs to be replaced, then you need to power down the PCI slot in which the adapter is placed:

After issuing the "rmdev" command, run diag and go into "Task Selection", "Hot Plug Task", "PCI Hot Plug Manager", "Replace/Remove a PCI Hot Plug Adapter". Select the adapter and choose "remove".

After the adapter has been replaced (usually by an IBM technician), run cfgmgr again to make the adapter known to the LPAR.

Topics: Hardware, Installation

Power 5 color codes

Ever noticed the different colors on parts of Power5 systems? Some parts are orange, others are blue. Orange means: you can touch it, open it, remove it, even if the system is running. Blue means: don't touch it if the system is active.

Topics: Hardware, SAN, SDD, Storage

How-to replace a failing HBA using SDD storage

This is a procedure how to replace a failing HBA or fibre channel adapter, when used in combination with SDD storage:

  • Determine which adapter is failing (0, 1, 2, etcetera):
    # datapath query adapter
  • Check if there are dead paths for any vpaths:
    # datapath query device
  • Try to set a "degraded" adapter back to online using:
    # datapath set adapter 1 offline
    # datapath set adapter 1 online
    (that is, if adapter "1" is failing, replace it with the correct adapter number).
  • If the adapter is still in a "degraded" status, open a call with IBM. They most likely require you to take a snap from the system, and send the snap file to IBM for them to analyze and they will conclude if the adapter needs to be replaced or not.
  • Involve the SAN storage team if the adapter needs to be replaced. They will have to update the WWN of the failing adapter when the adapter is replaced for a new one with a new WWN.
  • If the adapter needs to be replaced, wait for the IBM CE to be onsite with the new HBA adapter. Note the new WWN and supply that to the SAN storage team.
  • Remove the adapter:
    # datapath remove adapter 1
    (replace the "1" with the correct adapter that is failing).
  • Check if the vpaths now all have one less path:
    # datapath query device | more
  • De-configure the adapter (this will also de-configure all the child devices, so you won't have to do this manually), by running: diag, choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Unconfigure a Device. Select the correct adapter, e.g. fcs1, set "Unconfigure any Child Devices" to "yes", and "KEEP definition in database" to "no". Hit ENTER.
  • Replace the adapter: Run diag and choose Task Selection, Hot Plug Task, PCI Hot Plug manager, Replace/Remove a PCI Hot Plug Adapter. Choose the correct device (be careful, you won't see the adapter name here, but only "Unknown", because the device was unconfigured).
  • Have the IBM CE replace the adapter.
  • Close any events on the failing adapter on the HMC.
  • Validate that the notification LED is now off on the system, if not, go back into diag, choose Task Selection, Hot Plug Task, PCI Hot Plug Manager, and Disable the attention LED.
  • Check the adapter firmware level using:
    # lscfg -vl fcs1
    (replace this with the actual adapter name).

    And if required, update the adapter firmware microcode. Validate if the adapter is still functioning correctly by running:
    # errpt
    # lsdev -Cc adapter
  • Have the SAN admin update the WWN.
  • Run:
    # cfgmgr -S
  • Check the adapter and the child devices:
    # lsdev -Cc adapter
    # lsdev -p fcs1
    # lsdev -p fscsi1
    (replace this with the correct adapter name).
  • Add the paths to the device:
    # addpaths
  • Check if the vpaths have all paths again:
    # datapath query device | more

Topics: Hardware, Installation, Networking

IP address service processors

With POWER6, the default addresses of the service processors have changed. This only applies to environments where the managed system was powered on before the HMC was configured to act as an DHCP server. The Service Processors may get their IP-Addresses by three different mechanisms:

  1. Addresses received from a DHCP Server.
  2. Fixed addresses given to the interfaces using the ASMI.
  3. Default addresses if neither of the possibilities above is used.
The default addresses are different betweeen POWER5 and POWER6 servers. With POWER5 we have the following addresses:
Port HMC1: 192.168.2.147/24
Port HMC2: 192.168.3.147/24
The POWER6 systems use the following addresses:
First Service Processor:
Port HMC1: 169.254.2.147/24
Port HMC2: 169.254.3.147/24

Second Service Processor:
Port HMC1: 169.254.2.146/24
Port HMC2: 169.254.3.146/24
Link: System p Operations Guide for ASMI and for Nonpartitioned Systems.

Topics: AIX, Hardware, Monitoring

Temperature monitoring

Older pSeries systems (Power4) are equipped with environmental sensors. You can read the sensor values using:

# /usr/lpp/diagnostics/bin/uesensor -l
You can use these sensors to monitor your systems and your computer rooms. It isn't very difficult to create a script to monitor these environmental sensors regularly and to display it on a webpage, updating it automatically. Newer systems (LPAR based) are not equipped with these environmental sensors. For PC systems several products exist, which attach to either a RJ45 or a parallel port and which can be used to monitor temperatures.

Number of results found for topic Hardware: 9.
Displaying results: 1 - 9.