Topics: Red Hat, System Admin

Red Hat: Creating a backup to ISO images

The following procedure describes how to create a full system backup to ISO images using MondoRescue. The images can later be burnt to DVD and used to recover the entire system.

First, set up the yum repository for MondoRescue:

# cd /etc/yum.repos.d/
# wget ftp://ftp.mondorescue.org/rhel/7/x86_64/mondorescue.repo
Install MondoRescue:
# yum install mondo
Answer "y" to everything.

You will need a destination to put the ISO files in. A remote NFS mount from a separate server is a good choice, so the backup is not stored locally on the system being backed up.
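For example, to mount such an NFS destination on /target (the mount point used in the mondoarchive command further down; the NFS server name and export path here are just placeholders, so substitute your own):
# mkdir -p /target
# mount -t nfs nfsserver:/backups /target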

Edit /etc/mindi/mindi.conf to allow for a larger RAM disk (Mindi is used by Mondo). Without this change, Mindi will exit saying it ran out of space. Add to mindi.conf:
EXTRA_SPACE=240000
BOOT_SIZE=240000
Now run the MondoRescue backup:
# mondoarchive -O -V -i -s 4480m -d /target -I / -T /tmp
You can also add the -E option to tell MondoRescue to exclude certain folders.
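For example, to exclude two directories from the backup (a sketch; /data and /scratch are hypothetical paths, and the separator for multiple exclude paths differs between MondoRescue versions, so check the mondoarchive man page for your version):
# mondoarchive -O -V -i -s 4480m -d /target -E "/data|/scratch" -I / -T /tmp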

The -s option tells MondoRescue to make ISO images of DVD size (4480 MB).

The command reports that it will log to /var/log/mondoarchive.log; a /var/log/mindi.log is also written. It will also indicate the number of media images to be created. Let it run to completion, and your backup is done.

Topics: AIX, System Admin

Configuring dsh

The dsh (distributed shell) is a very useful (and powerful) utility that can be used to run commands on multiple servers at the same time. By default it is not installed on AIX, but you can install it yourself:

First, install the dsm filesets. DSM is short for Distributed Systems Management, and these filesets include the dsh command. They can be found on the AIX installation media. Install the following two filesets (shown here as reported by lslpp after installation):

# lslpp -l | grep -i dsm
  dsm.core       7.1.4.0  COMMITTED  Distributed Systems Management
  dsm.dsh        7.1.4.0  COMMITTED  Distributed Systems Management
Next, we'll need to set up some environment variables used by dsh. The best way to do this is to put them in the .profile of the root user (~root/.profile), so you won't have to set these environment variables manually every time you log in:
# cat .profile
alias bdf='df -k'
alias cls="tput clear"
stty erase ^?
export TERM=vt100

# For DSH
export DSH_NODE_RSH=/usr/bin/ssh
export DSH_NODE_LIST=/root/hostlist
export DSH_NODE_OPTS="-q"
export DSH_REMOTE_CMD=/usr/bin/ssh
export DCP_NODE_RCP=/usr/bin/scp
export DSH_CONTEXT=DEFAULT
In the .profile output above, you'll notice that variable DSH_NODE_LIST is set to /root/hostlist. You can change this to any file name you like. The DSH_NODE_LIST variable points to a text file with server names in it (one per line) that will be used by the dsh command. Every host name you put in the file that DSH_NODE_LIST refers to will be used to run commands on. So, if you put 3 host names in the file and then run a dsh command, that command will be executed on those 3 hosts in parallel.

Note: You may also use the environment variable WCOLL instead of DSH_NODE_LIST.
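For example, pointing WCOLL at the same host list file as above:
export WCOLL=/root/hostlist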

So, create file /root/hostlist (or any file that you've configured for environment variable DSH_NODE_LIST), and add host names in it. For example:
# cat /root/hostlist
host1
host2
host3
Next, you'll have to set up the ssh keys for every host in the hostlist file. The dsh command uses ssh to run commands, so you'll have to enable password-less ssh communication from the host where you've installed dsh (let's call that the source host) to all the hosts where you want to run commands using dsh (we'll call those the target hosts).

To set this up, follow these steps:
  • Run "ssh-keygen -t rsa" as user root on the source and all target hosts.
  • Next, copy the contents of ~root/.ssh/id_rsa.pub from the source host into file ~root/.ssh/authorized_keys on all the target hosts.
  • Test if you can ssh from the source host to all the target hosts, by running "ssh host1 date" for each target host (see the sketch below). If you're using DNS, and have fully qualified domain names configured for your hosts, test with an ssh to the fully qualified domain name instead, for example: "ssh host1.domain.com". This is because dsh will also resolve host names through DNS, and thus use these instead of the short host names. You will be asked a question the first time you run ssh from the source host to a target host; answer "yes" to add an entry to the known_hosts file.
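A minimal sketch of these steps for a single target host (host1 and host1.domain.com are just the example names used above; adjust for your environment):
# ssh-keygen -t rsa
# cat ~/.ssh/id_rsa.pub | ssh root@host1 'cat >> ~/.ssh/authorized_keys'
# ssh host1.domain.com date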
Now, log out from the source host, and log back in again as root. Since you've set the environment variables in .profile for user root, the .profile needs to be read again, which happens during login of user root.

At this point, you should be able to issue a command on all the target hosts, at the same time. For example, to run the "date" command on all the servers:
# dsh date
Also, you can now copy files using dcp (notice the similarity between ssh and dsh, and scp and dcp), for example to copy a file /etc/exclude.rootvg from the source host to all the target hosts:
# dcp /etc/exclude.rootvg /etc/exclude.rootvg
Note: dsh and dcp are very powerful for running commands on, or copying files to, multiple servers. However, keep in mind that they can be very destructive as well. A command such as "dsh halt -q" will halt all the servers at the same time. So you may want to triple-check any dsh or dcp command before actually running it. That is, if you value your job, of course.

Topics: AIX, System Admin

Copy printer configuration from one AIX system to another

The following procedure can be used to copy the printer configuration from one AIX system to another AIX system. This has been tested using different AIX levels, and has worked great. This is particularly useful if you have more than just a few printer queues configured, and configuring all printer queues manually would be too cumbersome.

  1. Create a full backup of your system, just in case something goes wrong.
  2. Run lssrc -g spooler and check if qdaemon is active. If not, start it with startsrc -s qdaemon.
  3. Copy /etc/qconfig from the source system to the target system.
  4. Copy /etc/hosts from the source system to the target system, but be careful to not lose important entries in /etc/hosts on the target system (e.g. the hostname and IP address of the target system should be in /etc/hosts).
  5. On the target system, refresh the qconfig file by running: enq -d
  6. On the target system, remove all files in /var/spool/lpd/pio/@local/custom, /var/spool/lpd/pio/@local/dev and /var/spool/lpd/pio/@local/ddi.
  7. Copy the contents of /var/spool/lpd/pio/@local/custom on the source system to the same folder on the target system (for one way to copy these directories, see the sketch after this list).
  8. Copy the contents of /var/spool/lpd/pio/@local/dev on the source system to the target system into the same folder.
  9. Copy the contents of /var/spool/lpd/pio/@local/ddi on the source system to the target system into the same folder.
  10. Create the following script, called newq.sh, and run it:
    #!/bin/ksh
    
    # Re-create the virtual printer definitions after the custom, dev and
    # ddi files have been copied over from the source system.
    let counter=0
    
    # Restore the default SMIT printer definition files.
    cp /usr/lpp/printers.rte/inst_root/var/spool/lpd/pio/@local/smit/* \
       /var/spool/lpd/pio/@local/smit
    cd /var/spool/lpd/pio/@local/custom
    chmod 775 /var/spool/lpd/pio/@local/custom
    # Each file in the custom directory is named <queue>:<device>.
    # Rebuild the virtual printer definition for each of them.
    for FILE in `ls` ; do
       let counter="$counter+1"
       chmod 664 $FILE
       QNAME=`echo $FILE | cut -d':' -f1`
       DEVICE=`echo $FILE | cut -d':' -f2`
       echo $counter : chvirprt -q $QNAME -d $DEVICE
       chvirprt -q $QNAME -d $DEVICE
    done
    
  11. Test and confirm printing is working.
  12. Remove file newq.sh.
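As an example of copying the spool directories in steps 7 through 9, you can pull them over with scp, run from the target system (a sketch; "sourcehost" is a hypothetical name for the source system, and this assumes password-less ssh between the two systems):
# scp -p sourcehost:/var/spool/lpd/pio/@local/custom/* /var/spool/lpd/pio/@local/custom/
# scp -p sourcehost:/var/spool/lpd/pio/@local/dev/* /var/spool/lpd/pio/@local/dev/
# scp -p sourcehost:/var/spool/lpd/pio/@local/ddi/* /var/spool/lpd/pio/@local/ddi/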

Topics: HMC, System Admin

Command line upgrade of HMC

This is how you update your HMC from version 7.9.0 to service pack 3 and all necessary fixes. At the time of writing, service pack 3 is the latest available service pack, and there are 2 fixes available for V7 R7.9.0 SP3, called MH01587 and MH01605. The following procedure therefore assumes that your HMC is currently at the base level of version 7.9.0, without any additional fixes or service packs installed.

This procedure is completely command line based. For this to work, you need to be able to ssh into the HMC using the hscroot user. For example, if your HMC is called yourhmc, you should be able to do this:

# ssh -l hscroot yourhmc
We also need to make sure we have some backups. Start with saving some output:
# lshmc -v
# lshmc -V
# lshmc -n
# lshmc -r  
The information outputted by the lshmc command is useful to determine what is currently installed on the HMC.

Next, take a console data backup of the HMC:
# bkconsdata -r nfs -h 10.11.12.13 -l /mksysb/HMC -d backupfile
The bkconsdata command above will back up the console data of the HMC via NFS to host 10.11.12.13 (replace this with your own server name or IP address), and will store it in /mksysb/HMC/backupfile (replace /mksysb/HMC and backupfile in the bkconsdata command above with the correct location to back up to on your NFS server).

Next, make a backup of the profiles for each managed server:
# bkprofdata -m <managed system> -f <backup file> --force
The bkprofdata command above requires the name of each managed system. A good way to find the names of the managed systems configured on the HMC is to run the following command:
# lssysconn -r all
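For example, to back up the profile data for a single managed system (the managed system name and backup file name below are hypothetical; substitute your own):
# bkprofdata -m Server1-9117-MMA-SN10ABCDE -f profileBackup --force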
Now that we have all the necessary backups, it's time to perform the actual upgrade.

Let's start with the upgrade to Service Pack 3:
# updhmc -t s -h ftp.software.ibm.com -u anonymous -p ftp -f /software/server/hmc/updates/HMC_Update_V7R790_SP3.iso -r
This will download the service pack from the IBM site to the HMC via FTP, upgrade the HMC, and reboot it. This may take a while. The updhmc command may return to the prompt after the download has completed, but that does not mean the update has been applied yet. Allow it to install and reboot; a message "The system is shutting down for reboot now" will be shown on the screen. After the reboot, run the "lshmc -V" command again. It may take some time before the lshmc command responds with proper output, so again, give it some time. As soon as the lshmc command shows that the service pack is installed, you can move on to the next step.

The next step is installing the fixes:
# updhmc -t s -h ftp.software.ibm.com -u anonymous -p ftp -f /software/server/hmc/fixes/MH01587.iso -r
And...
# updhmc -t s -h ftp.software.ibm.com -u anonymous -p ftp -f /software/server/hmc/fixes/MH01605.iso -r
After each fix is installed, the HMC will reboot, and you'll have to check with "lshmc -V" whether the fix has been applied.

And that concludes the upgrade. If any new service packs and/or fixes are released by IBM, you can install them in a similar fashion.

Topics: AIX, System Admin

Running bootp in debug mode to troubleshoot NIM booting

If you have an LPAR that is not booting from your NIM server, and you're certain the IP configuration on the client is correct, for example by completing a successful ping test, then you should have a look at the bootp process on the NIM server as a possible cause of the issue.

To accomplish this, you can run bootpd in debug mode. Edit file /etc/inetd.conf, and comment out the bootps entry with a hash mark (#). This prevents inetd from starting bootpd in response to a bootp request. Then refresh the inetd daemon, to pick up the change to file /etc/inetd.conf:

# refresh -s inetd
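For reference, once commented out, the bootps entry in /etc/inetd.conf will look something like this (the exact fields may vary slightly per AIX level):
#bootps dgram udp wait root /usr/sbin/bootpd bootpd /etc/bootptab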
Now check if any bootpd processes are running. If necessary, use kill -9 to kill them, and check again that no bootpd processes remain active. Now that bootpd has stopped, go ahead and bring up another PuTTY window on your NIM master. You'll need a separate window, because running bootpd in debug mode will tie up the window while it is active. Run the following command in that window:
# bootpd -d -d -d -d -s
Now retry booting the LPAR from your NIM master, and you should see debug information scrolling by, showing what is going on.

Afterwards, once you've identified the issue, make sure to stop the bootpd process (just hit Ctrl-C), change file /etc/inetd.conf back the way it was, and run refresh -s inetd to refresh it again.

Topics: Red Hat, System Admin

Increase the size of a tmpfs file system

On Linux systems, a tmpfs file system keeps its entire contents (all files) in virtual memory. All data is stored in memory, which means the data is temporary and will be lost after a reboot. If you unmount the file system, all data in it is gone. You will also find a lot of installations using a tmpfs for /tmp, and hence anything written to /tmp is wiped after a reboot.

To increase the size, do the following:

Modify the /etc/fstab line to look something like this:

none /raw tmpfs defaults,size=2G 0 0
Then, re-mount the file system:
# mount -o remount /raw
# df -h
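If the tmpfs is not in /etc/fstab yet, you can also create and mount one on the fly (a sketch, using the same /raw mount point and 2G size as above):
# mkdir -p /raw
# mount -t tmpfs -o size=2G none /raw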
Note: Be careful not to increase it too much as the system will use up real memory.

Topics: AIX, Storage, System Admin

Allocating shared storage to VIOS clients

The following is a procedure to add shared storage to a clustered, virtualized environment. This assumes the following: You have a PowerHA cluster on two nodes, nodeA and nodeB. Each node is on a separate physical system, and each node is a client of a VIOS. The storage from the VIOS is mapped as vSCSI to the client. Client nodeA is on viosA, and client nodeB is on viosB. Furthermore, this procedure assumes you're using SDDPCM for multi-pathing on the VIOS.

First of all, have your storage admin allocate and zone shared LUN(s) to the two VIOS. These need to be one or more LUNs that are zoned to both of the VIOS. This procedure assumes you will be zoning 4 LUNs of 128 GB each.

Once that is completed, move on to work on the VIOS:

SERVER: viosA

First, gather some system information as user root on the VIOS, and save this information to a file for safe-keeping.

# lspv
# lsdev -Cc disk
# /usr/ios/cli/ioscli lsdev -virtual
# lsvpcfg
# datapath query adapter
# datapath query device
# lsmap -all
Discover new SAN LUNs (4 * 128 GB) as user padmin on the VIOS. This can be accomplished by running cfgdev, the alternative to cfgmgr on the VIOS. Once that has run, identify the 4 new hdisk devices on the system, and run the "bootinfo -s" command to determine the size of each of the 4 new disks:
# cfgdev
# lspv
# datapath query device
# bootinfo -s hdiskX
Change PVID for the disks (repeat for all the LUNs):
# chdev -l hdiskX -a pv=yes
Next, map the new LUNs from viosA to the nodeA LPAR. You'll need to know 2 things here: [a] what vhost adapter (or "vadapter") to use, and [b] what name to give the new device (or "virtual target device"). Have a look at the output of the "lsmap -all" command that you ran previously. That will provide information on the current naming scheme for the virtual target devices. It will also show what vhost adapters already exist and are in use for the client. In this case, we'll assume the vhost adapter is vhost0, and that there are already some virtual target devices, called nodeA_vtd0001 through nodeA_vtd0019. The four new LUNs will therefore be named nodeA_vtd0020 through nodeA_vtd0023. We'll also assume the new disks are numbered hdisk44 through hdisk47.
# mkvdev -vdev hdisk44 -vadapter vhost0 -dev nodeA_vtd0020
# mkvdev -vdev hdisk45 -vadapter vhost0 -dev nodeA_vtd0021
# mkvdev -vdev hdisk46 -vadapter vhost0 -dev nodeA_vtd0022
# mkvdev -vdev hdisk47 -vadapter vhost0 -dev nodeA_vtd0023
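Before moving on, you can verify the new mappings as user padmin (vhost0 being the adapter used in the example above):
# lsmap -vadapter vhost0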
Now the mapping of the LUNs is complete on viosA. You'll have to repeat the same process on viosB:

SERVER: viosB

First, gather some system information as user root on the VIOS, and save this information to a file for safe-keeping.
# lspv
# lsdev -Cc disk
# /usr/ios/cli/ioscli lsdev -virtual
# lsvpcfg
# datapath query adapter
# datapath query device
# lsmap -all
Discover new SAN LUNs (4 * 128 GB) as user padmin on the VIOS. This can be accomplished by running cfgdev, the alternative to cfgmgr on the VIOS. Once that has run, identify the 4 new hdisk devices on the system, and run the "bootinfo -s" command to determine the size of each of the 4 new disks:
# cfgdev
# lspv
# datapath query device
# bootinfo -s hdiskX
No need to set the PVID this time. It was already configured on viosA, and after running the cfgdev command, the PVID should be visible on viosB, and it should match the PVIDs on viosA. Make sure this is correct:
# lspv
Map the new LUNs from viosB to the nodeB LPAR. Again, you'll need to know the vadapter and the virtual target device names to use, and you can derive that information by looking at the output of the "lsmap -all" command. If you've done your work correctly in the past, the naming of the vadapter and the virtual target devices will probably be the same on viosB as on viosA:
# mkvdev -vdev hdisk44 -vadapter vhost0 -dev nodeB_vtd0020
# mkvdev -vdev hdisk45 -vadapter vhost0 -dev nodeB_vtd0021
# mkvdev -vdev hdisk46 -vadapter vhost0 -dev nodeB_vtd0022
# mkvdev -vdev hdisk47 -vadapter vhost0 -dev nodeB_vtd0023
Now that the mapping on both the VIOS has been completed, it is time to move to the client side. First, gather some information about the PowerHA cluster on the clients, by running as root on the nodeA client:
# clstat -o
# clRGinfo
# lsvg |lsvg -pi
Run cfgmgr on nodeA to discover the mapped LUNs, and then on nodeB:
# cfgmgr
# lspv
Ensure that the disk attributes are correctly set on both servers. Repeat the following command for all 4 new disks:
# chdev -l hdiskX -a algorithm=fail_over -a hcheck_interval=60 -a queue_depth=20 -a reserve_policy=no_reserve
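To confirm the attributes were applied, a quick check per disk (a sketch, using the same hdiskX placeholder as above):
# lsattr -El hdiskX | egrep 'algorithm|hcheck_interval|queue_depth|reserve_policy'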
Now you can add the 4 newly added physical volumes to a shared volume group. In our example, the shared volume group is called sharedvg, and the newly discovered disks are called hdisk55 through hdisk58. Finally, the concurrent resource group is called concurrent_rg.
# /usr/es/sbin/cluster/sbin/cl_extendvg -cspoc -g'concurrent_rg' -R'nodeA' sharedvg hdisk55 hdisk56 hdisk57 hdisk58
Next, you can move forward to creating logical volumes (and file systems if necessary), for example, when creating raw logical volumes for an Oracle database:
# /usr/es/sbin/cluster/sbin/cl_mklv -TO -t raw -R'nodeA' -U oracle -G dba -P 600 -y asm_raw5 sharedvg 1023 hdisk55
# /usr/es/sbin/cluster/sbin/cl_mklv -TO -t raw -R'nodeA' -U oracle -G dba -P 600 -y asm_raw6 sharedvg 1023 hdisk56
# /usr/es/sbin/cluster/sbin/cl_mklv -TO -t raw -R'nodeA' -U oracle -G dba -P 600 -y asm_raw7 sharedvg 1023 hdisk57
# /usr/es/sbin/cluster/sbin/cl_mklv -TO -t raw -R'nodeA' -U oracle -G dba -P 600 -y asm_raw8 sharedvg 1023 hdisk58
Finally, verify the volume group:
# lsvg -p sharedvg
# lsvg sharedvg
# ls -l /dev/asm_raw*
If necessary, these are the steps to complete if the addition of the LUNs has to be backed out:
  1. Remove the raw logical volumes (using the cl_rmlv command)
  2. Remove the added LUNs from the volume group (using the cl_reducevg command)
  3. Remove the disk devices on both client nodes: rmdev -dl hdiskX
  4. Remove LUN mappings from each VIOS (using the rmvdev command)
  5. Remove the LUNs from each VIOS (using the rmdev command)

Topics: AIX, System Admin

Export and import PuTTY sessions

PuTTY itself does not provide a means to export the list of sessions, nor a way to import the sessions from another computer. However, it is not so difficult, once you know that PuTTY stores the session information in the Windows Registry.

To export the Putty sessions, run:

regedit /e "%userprofile%\desktop\putty-sessions.reg" HKEY_CURRENT_USER\Software\SimonTatham\PuTTY\Sessions
Or, to export all settings (and not only the sessions), run:
regedit /e "%userprofile%\desktop\putty.reg" HKEY_CURRENT_USER\Software\SimonTatham
This will create either a putty-sessions.reg or a putty.reg file on your Windows desktop. You can transfer these files over to another computer, and after installing PuTTY on the other computer, simply double-click on the .reg file to have the Windows Registry entries added. Then, if you start up PuTTY, all the session information should be there.
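On the other computer, you can also import the file from the command line instead of double-clicking it (assuming you copied putty-sessions.reg to the desktop there; the /s switch imports without prompting):
regedit /s "%userprofile%\desktop\putty-sessions.reg"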

Topics: AIX, Storage, System Admin

Identifying a Disk Bottleneck Using filemon

This blog post describes the steps required to identify an I/O problem in the storage area network and/or disk arrays on AIX.

Note: Do not execute filemon with AIX 6.1 Technology Level 6 Service Pack 1 if WebSphere MQ is running. WebSphere MQ will abnormally terminate with this AIX release.

Running filemon: As a rule of thumb, a write to a cached fiber attached disk array should average less than 2.5 ms and a read from a cached fiber attached disk array should average less than 15 ms. To confirm the responsiveness of the storage area network and disk array, filemon can be utilized. The following example will collect statistics for a 90 second interval.

# filemon -PT 268435184 -O pv,detailed -o /tmp/filemon.rpt;sleep 90;trcstop

Run trcstop command to signal end of trace.
Tue Sep 15 13:42:12 2015
System: AIX 6.1 Node: hostname Machine: 0000868CF300
[filemon: Reporting started]
# [filemon: Reporting completed]

[filemon: 90.027 secs in measured interval]
Then, review the generated report (/tmp/filemon.rpt).
# more /tmp/filemon.rpt
.
.
.
------------------------------------------------------------------------
Detailed Physical Volume Stats   (512 byte blocks)
------------------------------------------------------------------------

VOLUME: /dev/hdisk11  description: XP MPIO Disk P9500   (Fibre)
reads:                  437296  (0 errs)
  read sizes (blks):    avg     8.0 min       8 max       8 sdev     0.0
  read times (msec):    avg   11.111 min   0.122 max  75.429 sdev   0.347
  read sequences:       1
  read seq. lengths:    avg 3498368.0 min 3498368 max 3498368 sdev     0.0
seeks:                  1       (0.0%)
  seek dist (blks):     init 3067240
  seek dist (%tot blks):init 4.87525
time to next req(msec): avg   0.206 min   0.018 max 461.074 sdev   1.736
throughput:             19429.5 KB/sec
utilization:            0.77

VOLUME: /dev/hdisk12  description: XP MPIO Disk P9500   (Fibre)
writes:                 434036  (0 errs)
  write sizes (blks):   avg     8.1 min       8 max      56 sdev     1.4
  write times (msec):   avg   2.222 min   0.159 max  79.639 sdev   0.915
  write sequences:      1
  write seq. lengths:   avg 3498344.0 min 3498344 max 3498344 sdev     0.0
seeks:                  1       (0.0%)
  seek dist (blks):     init 3067216
  seek dist (%tot blks):init 4.87521
time to next req(msec): avg   0.206 min   0.005 max 536.330 sdev   1.875
throughput:             19429.3 KB/sec
utilization:            0.72
.
.
.
In the above report, hdisk11 was the busiest disk on the system during the 90 second sample. The reads from hdisk11 averaged 11.111 ms. Since this is less than 15 ms, the storage area network and disk array were performing within scope for reads.

Also, hdisk12 was the second busiest disk on the system during the 90 second sample. The writes to hdisk12 averaged 2.222 ms. Since this is less than 2.5 ms, the storage area network and disk array were performing within scope for writes.

Other methods to measure similar information:

You can use the topas command with the -D option to get an overview of the busiest disks on the system:
# topas -D
In the output, columns ART and AWT provide similar information. ART stands for the average time to receive a response from the hosting server for the read request sent. And AWT stands for the average time to receive a response from the hosting server for the write request sent.

You can also use the iostat command, using the -D (for drive utilization) and -l (for long listing mode) options:
# iostat -Dl 60
This will provide an overview over a 60 second period of your disks. The "avg serv" column under the read and write sections will provide you average service times for reads and writes for each disk.

An occasional peak value recorded on a system doesn't immediately mean there is a disk bottleneck. Longer periods of monitoring are required to determine if a certain disk is indeed a bottleneck for your system.

Topics: AIX, System Admin

Commands to create printer queues

Here are some commands to add a printer to an AIX system. Let's assume that the hostname of the printer is "printer", and that you've added an entry for this "printer" in /etc/hosts, or that you've added it to DNS, so it can be resolved to an IP address. Let's also assume that the queue you wish to make will be called "printerq", and that your printer can communicate on port 9100.

In that case, to create a generic printer queue, the command will be:

# /usr/lib/lpd/pio/etc/piomkjetd mkpq_jetdirect -p 'generic' -D asc \
-q 'printerq' -h 'printer' -x '9100'

In case you wish to set it up as a postscript printer, called "printerqps", then the command will be:
# /usr/lib/lpd/pio/etc/piomkjetd mkpq_jetdirect -p 'generic' -D ps \
-q 'printerqps' -h 'printer' -x '9100'
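Afterwards, you can check the new queue and send a small test job to it, for example (a sketch; /etc/motd is just an arbitrary file to print):
# qchk -P printerq
# qprt -P printerq /etc/motd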
