Deleting Disks from GPFS

GPFS can be cryptic compared to AIX. Tonight I needed to replace one disk with another, since we are going to get rid of one of our SANs. I created and added the new disks with mmcrnsd and mmadddisk, then ran mmdeldisk against the old one. About five hours later I began to wonder whether the process had gone astray. In GPFS, if it can, the delete migrates everything off the disk before deleting it instead of throwing an error. In the meantime, of course, it gives you some pretty scary messages:

Attention: Due to an earlier configuration change the file system
may contain data that is at risk of being lost.

Now, this may be true, but it is just scary. When you look it up in the manual, it tells you to balance at your earliest convenience. There is a pretty big difference in tone between 'risk of being lost' and 'balance at your earliest convenience'. On top of this, the command stops giving status and just appears to hang:

55 % complete on Sat Aug 16 20:28:42 2008
59 % complete on Sat Aug 16 20:28:45 2008
62 % complete on Sat Aug 16 20:28:48 2008
65 % complete on Sat Aug 16 20:28:51 2008
100 % complete on Sat Aug 16 20:28:52 2008
Scan completed successfully.
Scanning file system metadata, phase 3 …
Scan completed successfully.
Scanning user file metadata …
99 % complete on Sat Aug 16 20:30:00 2008
(no more output for nearly five hours; I am assuming this command would keep running in the background if I disconnected)

100 % complete on Sun Aug 17 01:15:22 2008
Scan completed successfully.

tsdeldisk completed.
mmdeldisk: Propagating the changes to all affected nodes.
This is an asynchronous process.

I made my own status command with a simple loop:

while true
do
mmdf mydatabasefs | grep nsd_1784_02
done

This takes a little while to run each time so it is like having a sleep between status lines:
nsd_1784_02 262144000 1784 yes yes 259610256 (99%) 1142 ( 0%) *
nsd_1784_02 262144000 1784 yes yes 259630832 (99%) 1142 ( 0%) *
nsd_1784_02 262144000 1784 yes yes 259651424 (99%) 1134 ( 0%) *

The 99% is the key: that column is the free space in full blocks, so after hours of running, just about all the data has been migrated off this disk.
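If you would rather see how much data is left than eyeball the raw columns, here is a small sketch of a helper that takes one mmdf disk line and prints the disk name, the KB still in use (disk size minus free KB in full blocks), and the free percentage. It assumes the column layout shown above and ignores fragment space; the name mmdf_used is my own.

```shell
# Hypothetical helper: summarize one `mmdf` disk line read from stdin.
# Assumed layout: name, size (KB), failure group, holds metadata,
# holds data, free KB in full blocks, (free %), ...
mmdf_used() {
    awk '{
        pct = $7                  # e.g. "(99%)"
        gsub(/[()%]/, "", pct)    # strip the parentheses and percent sign
        printf "%s used_kb=%d free_pct=%s\n", $1, $2 - $6, pct
    }'
}
```

With the first sample line above, this should print `nsd_1784_02 used_kb=2533744 free_pct=99`. In the status loop, pipe the grep output through mmdf_used, and add a sleep if you want a longer pause between lines.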

After this is done, the really scary message above is replaced with a slightly less scary one, with the same remedy advised:

Attention: Due to an earlier configuration change the file system
is no longer properly replicated.

90% of on-the-fly scripting: three basic loops

One of the cool things about a robust shell such as ksh is that you can build a command on the fly, adding one piece at a time and running it to check how you are doing. At the end, you just add the command that does the real work and you are done. This is REPL-style (read-eval-print loop) coding, and it is one of the big advantages of the shell over something like Perl, which is also often associated with system administration.

In this example, I want to remove all of the special EMC (PowerPath) disks from my system that aren't in a volume group. I know that is OS-specific stuff, but really I am just processing a list. Here are my criteria:

  1. hdisk4 none None (exclude, this is not a power device)
  2. hdiskpower9 00c7286c92245797 oraclevg active (exclude, in use)
  3. hdiskpower12 00c7286c7f7e4c28 None (include, not in use)

So I start with a simple list command:

# lspv
hdisk0 00c7286c0997490f rootvg active
hdisk1 00c9738ea12e9e1b rootvg active
hdisk4 none None

hdisk17 none None
hdisk24 none None
hdiskpower3 00c7286c1b6cd3f2 None
hdisk25 none None
hdisk26 none None

hdisk71 none None
hdiskpower11 00c7286c922353ea oraclevg active
hdiskpower15 00c7286c9222cc96 oraclevg active

Now, because I use vi editing mode in my shell, I type [ESC-k]; if you were using a different shell or editing mode, you might just hit the up arrow. This recalls your previous command. In vi mode, I then hit [shift-a] to jump to the end of the line and enter insert mode, where I add a pipe and grep:

# lspv | grep power
hdiskpower3 00c7286c1b6cd3f2 None
hdiskpower1 00c7286c6afbe0f7 None
hdiskpower5 00c7286c11a34c03 None
hdiskpower6 none None
hdiskpower7 none None
hdiskpower8 00c7286cb2779a47 satavg active
hdiskpower0 none None
hdiskpower2 none None
hdiskpower4 none None
hdiskpower12 00c7286c7f7e4c28 None
hdiskpower13 00c7286c7f7ee6e1 None
hdiskpower14 00c7286c804ffa93 None
hdiskpower9 00c7286c92245797 oraclevg active
hdiskpower10 00c7286c9223d7f3 oraclevg active
hdiskpower11 00c7286c922353ea oraclevg active
hdiskpower15 00c7286c9222cc96 oraclevg active
hdiskpower18 00c7286cd3c4a22d None

Now I have a much more manageable list, so I repeat the process to find only the disks that are not in a volume group. There is probably a more precise command than piping two greps into each other, but remember that this is a throwaway script that you type in very quickly while visually inspecting your data along the way:

# lspv | grep power | grep None
hdiskpower3 00c7286c1b6cd3f2 None
hdiskpower1 00c7286c6afbe0f7 None
hdiskpower5 00c7286c11a34c03 None
hdiskpower6 none None
hdiskpower7 none None
hdiskpower0 none None
hdiskpower2 none None
hdiskpower4 none None
hdiskpower12 00c7286c7f7e4c28 None
hdiskpower13 00c7286c7f7ee6e1 None
hdiskpower14 00c7286c804ffa93 None
hdiskpower18 00c7286cd3c4a22d None
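For what it's worth, the three criteria listed earlier can also be written as a single awk test: the name starts with hdiskpower and the volume group column reads None. This is a sketch that assumes the lspv column order shown above; unused_power_disks is a made-up name.

```shell
# Hypothetical filter: print PowerPath disks that are not in a volume group.
# Column 1 is the disk name, column 3 is the volume group (or None).
unused_power_disks() {
    awk '$1 ~ /^hdiskpower/ && $3 == "None" { print $1 }'
}
```

Used as `lspv | unused_power_disks`, it should print the same names as the two greps, already trimmed to the first field.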

Now for the loop. You can go two ways with this: a while loop or xargs. I can never remember all of the flags for xargs, so I usually only use it when I can get away without flags, and what I need here is to ignore fields two and three (even though we used field three as a grep criterion). Anyhow, here is the while loop. As before, bring back your old command and go to the end of the line:

# lspv | grep power | grep None | while read DISK TRASH

Even though you have hit Enter, the while loop isn't logically done. What we have done so far is start the loop and read the first field into the variable DISK and everything else into TRASH. This is a handy way to say "just give me the first field": if there are more fields than variables, the extras all end up in the last variable. If you need the second field, just do this instead (while read TRASH PVID TRASH).
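To see exactly how read hands out the fields, you can feed it one of the lspv lines above. The function names here are only for illustration.

```shell
# Illustrative only: show how `read` splits a whitespace-separated line.
first_field() {
    while read DISK TRASH          # DISK gets field 1, TRASH gets the rest
    do
        echo "$DISK"
    done
}

second_field() {
    while read TRASH PVID TRASH    # the second TRASH swallows fields 3 and up
    do
        echo "$PVID"
    done
}
```

Fed `hdiskpower9 00c7286c92245797 oraclevg active`, first_field prints hdiskpower9 and second_field prints the PVID.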

# lspv | grep power | grep None | while read DISK TRASH
> do

The logic still isn't done until we get to 'done', so you can now add as many lines as you want to act on each disk:

# lspv | grep power | grep None | while read DISK TRASH
> do
> echo $DISK
> rmdev -dl $DISK
> done

hdiskpower3 deleted
hdiskpower1 deleted
hdiskpower5 deleted
hdiskpower6 deleted
hdiskpower7 deleted
hdiskpower0 deleted
hdiskpower2 deleted
hdiskpower4 deleted
hdiskpower12 deleted
hdiskpower13 deleted
hdiskpower14 deleted
hdiskpower18 deleted
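Before running a destructive loop like this, a dry run costs nothing. Here is a minimal sketch of that habit, assuming you feed it the same filtered lspv output. The loop only echoes the rmdev commands; once the list looks right, recall the line and delete the echo. The name preview_rmdev is hypothetical.

```shell
# Dry run: print the rmdev commands instead of executing them.
# Delete the word "echo" only after checking the list.
preview_rmdev() {
    while read DISK TRASH
    do
        echo rmdev -dl "$DISK"
    done
}
```

Usage: `lspv | grep power | grep None | preview_rmdev`.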

If you think the while loop and second shell is a little cumbersome, then introduce awk and xargs:

# lspv | grep power | grep None | awk '{ print $1 }' | xargs -n1 rmdev -dl

Here is a third way to accomplish this, if you are a big fan of backticks:

for DISK in `lspv | grep power | grep None | awk '{print $1}'`
do
rmdev -dl $DISK
done

This command is a little more awkward because if you crafted it in a REPL fashion while checking your logic and data, you would probably start with:

lspv | grep power | grep None | awk '{print $1}'

Then you would have to go to both the beginning and the end of the line to wrap it in backquotes.

Here are some other ideas for isolating only the first field of data:

lspv | grep power | grep None | sed 's/ .*//'
lspv | grep power | grep None | cut -d ' ' -f 1
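A quick way to convince yourself that these extractors agree is to run them side by side on the same line. This sketch uses cut splitting on a space rather than counting character positions, since names like hdiskpower3 and hdiskpower12 differ in length; the sample line comes from the lspv output earlier.

```shell
# Three ways to keep only the first whitespace-separated field.
line='hdiskpower3 00c7286c1b6cd3f2 None'

by_awk=$(echo "$line" | awk '{ print $1 }')
by_sed=$(echo "$line" | sed 's/ .*//')      # delete from the first space onward
by_cut=$(echo "$line" | cut -d ' ' -f 1)    # field 1, using a space as the delimiter

echo "$by_awk $by_sed $by_cut"
```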

Ultimately, though, I almost always pick the while loop, because it is just so fast and you can load or discard all sorts of variables on the fly. If you aren't sure how your command will look, put an echo in front of it first; when the output looks right, bring the whole thing back, take the echo out, and run it. Here is a case where a for loop is cool:

# for X in hdiskpower6 hdiskpower7 hdiskpower8
> do
> extendvg oraclevg1 $X
> done

Make sure your AIX server can boot OK

So we all know how to write a boot image to a disk (bosboot -ad /dev/hdisk0) and how to set the bootlist (bootlist -m normal hdisk0 hdisk1).

You also want to check the /dev directory (cd /dev first) to make sure your links are set up correctly:

# ls -l | grep -i ipl
crw-rw----   1 root     system       10,  0 Jul 14 2006  IPL_rootvg
crw-rw----   2 root     system       10,  1 Jul 14 2006  ipl_blv  (should be same as rhd5)
crw-------   2 root     system       20,  0 Aug 13 11:10 ipldevice (should be same as rhdisk1, in this case)
# ls -l | grep hdisk1
brw-------   1 root     system       20,  0 Aug 13 11:20 hdisk1
crw-------   2 root     system       20,  0 Aug 13 11:10 rhdisk1
# ls -l | grep hd5
brw-rw----   1 root     system       10,  1 Aug 13 11:37 hd5
crw-rw----   2 root     system       10,  1 Jul 14 2006  rhd5

If you see something wrong, for example ipldevice pointing to the wrong disk, just rm the ipldevice special file and relink it (from /dev: ln rhdisk1 ipldevice).
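Comparing those major and minor numbers by eye is easy to get wrong. Here is a small sketch of a check. It assumes the ls -l layout shown above, where the fifth and sixth fields are "major," and "minor". devnum and same_dev are hypothetical helper names, and rhdisk1 stands in for whichever disk you actually boot from.

```shell
# Hypothetical helper: print "major,minor" from one `ls -l` line of a device file.
devnum() {
    echo "$1" | awk '{ print $5 $6 }'   # "20," + "0" -> "20,0"
}

# Report whether two device files share the same major/minor numbers.
same_dev() {
    if [ "$(devnum "$(ls -l "$1")")" = "$(devnum "$(ls -l "$2")")" ]; then
        echo "$1 and $2 match"
    else
        echo "$1 and $2 DIFFER"
    fi
}
```

Usage: `same_dev /dev/ipldevice /dev/rhdisk1` and `same_dev /dev/ipl_blv /dev/rhd5`. This only makes sense for device files; for regular files, field five of ls -l is the size.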

Support also has me run this command, which lists all bootable disks even if they aren't part of rootvg (you get that with VIO disks that are in rootvg for a client):

# ipl_varyon -i
PVNAME          BOOT DEVICE     PVID                    VOLUME GROUP ID
hdisk0          NO              00031691bced4a4e0000000000000000        00c1b3da00004c00
hdisk2          NO              00cdeaeadfcd0ebc0000000000000000        00c1b3da00004c00
hdisk1          YES             00031691bcd549a60000000000000000        00cdeaea00004c00
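If all you want is the disks that ipl_varyon -i reports as bootable, an awk filter on the BOOT DEVICE column is enough. This is a sketch assuming the column order shown above; bootable_disks is a made-up name.

```shell
# Hypothetical filter: print the PV names whose BOOT DEVICE column is YES.
bootable_disks() {
    awk '$2 == "YES" { print $1 }'
}
```

For the output above, `ipl_varyon -i | bootable_disks` should print hdisk1.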

How to find what process is listening to a port in AIX

It seems like this should be easier, but it is a little crazy. Let's say that I don't know that sshd is listening on port 22. Here is how to find out:

# netstat -Aan | head
Active Internet connections (including servers)
PCB/ADDR         Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
f100060001be4b98 tcp4       0      0  *.13               *.*                LISTEN
f100060001bf7b98 tcp        0      0  *.21               *.*                LISTEN
f100060001f60398 tcp4       0      0  *.22               *.*                LISTEN
f100060001bf4b98 tcp        0      0  *.23               *.*                LISTEN

# rmsock f100060001f60398 tcpcb         
The socket 0x1f60008 is being held by proccess 266380 (sshd).

You could also use lsof with the socket #, but I don’t usually load that.
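To skip the copy and paste, here is a sketch that pulls the PCB address for a listening port out of netstat -Aan. It assumes the column layout above (local address in column 5, state in the last column); pcb_for_port is a hypothetical name. If a service listens on more than one protocol you may get more than one address, so check the output before handing it to rmsock.

```shell
# Hypothetical helper: print the PCB address of any socket LISTENing on a port.
# Reads `netstat -Aan` output on stdin; the port number is the first argument.
pcb_for_port() {
    awk -v port="$1" '$5 ~ ("\\." port "$") && $NF == "LISTEN" { print $1 }'
}
```

Usage: `rmsock $(netstat -Aan | pcb_for_port 22) tcpcb`.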

How to hack around telnet and make it your bitch

Two quick things about telnet:

  1. Put a port number after the host you want to go to and use it as a port checker
  2. Learn to script around it with something similar to a here script


So to check ports with telnet, simply add the port number (here is a test to see if FTP is enabled):

# telnet 21
Connected to
Escape character is ‘^]’.
220---------- Welcome to Pure-FTPd [TLS] ----------
220-You are user number 1 of 50 allowed.
220-Local time is now 13:08. Server port: 21.
220 You will be disconnected after 15 minutes of inactivity.
214-The following SITE commands are recognized
214 Pure-FTPd -
221-Goodbye. You uploaded 0 and downloaded 0 kbytes.
221 Logout.
Connection closed.

Next, there are all sorts of switches and other systems that I need to get information from that aren't SSH-enabled. Here is a crude way to accomplish that with telnet, feeding it input from a subshell:

(
echo "$USERNAME"
sleep 1
echo "$PASSWORD"
sleep 1
echo "show switch"
sleep 2
) | telnet $HOST
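The same subshell trick can be wrapped in a small function so that the command list and the pause are easy to change. This is a sketch: send_slowly and DELAY are names I made up, and whether one second is long enough depends on the device. printf is used instead of echo so that backslashes in a password are passed through untouched.

```shell
# Hypothetical helper: print each argument on its own line, pausing after each,
# so the program reading the pipe (here telnet) sees them as typed input.
send_slowly() {
    for LINE in "$@"
    do
        printf '%s\n' "$LINE"
        sleep "${DELAY:-1}"     # default one-second pause; override with DELAY=n
    done
}
```

Usage: `send_slowly "$USERNAME" "$PASSWORD" "show switch" | telnet $HOST`. If the last command produces a lot of output, raise DELAY so telnet has time to print it before the pipe closes.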

lquerypv (Undocumented command for determining disk info)

Lquerypv simply reads the data from the disk and displays it in a format similar to octal dump (od). In the example below, we see the PVID written to the disk at offset 0x80 (hex). You seem to be able to read anything you point lquerypv at (I tried /etc/motd and read it just fine). This is great for reading the PVID of a logical volume on a VIO server that is pretending to be a virtual disk on a client, since you can't see that information with lspv.

Lquerypv is also a great command for figuring out where disk access issues are. If lquerypv returns any data, then you can read the disk and it isn't a reserve issue. If it can't read any data, and just hangs or returns nothing, then ABSOLUTELY NO OTHER AIX COMMAND WILL WORK. At that point you should stop looking at your filesystems, volume groups, and logical volumes. The issue is that you simply can't read the disk, and you need to either go to the VIO server and see if there is a problem there, or use lsattr -El hdisk0 to check the SCSI reserve (on another system that might be sharing the disk).

If the issue is on your VIO server, or you have direct-attached SAN disks, then ask your SAN administrators to check their stuff. If, however, queries against all of your disks hang, especially during an initial install, then maybe your client SAN software is messed up; you could try removing it and using the MPIO version, or just reinstall it. The clearest sign of a single disk with a reserve lock at the SAN level is when lquerypv returns nothing for that disk while lquerypv against other disks works fine.

# lspv
hdisk0          00031691bced4a4e                    oraclevg        active
hdisk2          00cdeaeadfcd0ebc                    oraclevg        active
hdisk1          00031691bcd549a6                    rootvg          active
# lquerypv -h /dev/hdisk0
00000000   C9C2D4C1 00000000 00000000 00000000  |................|
00000010   00000000 00000000 00000000 00000000  |................|
00000020   00000000 00000000 00000000 00000000  |................|
00000030   00000000 00000000 00000000 00000000  |................|
00000040   00000000 00000000 00000000 00000000  |................|
00000050   00000000 00000000 00000000 00000000  |................|
00000060   00000000 00000000 00000000 00000000  |................|
00000070   00000000 00000000 00000000 00000000  |................|
00000080   00031691 BCED4A4E 00000000 00000000  |......JN........|
00000090   00000000 00000000 00000000 00000000  |................|
000000A0   00000000 00000000 00000000 00000000  |................|
000000B0   00000000 00000000 00000000 00000000  |................|
000000C0   00000000 00000000 00000000 00000000  |................|
000000D0   00000000 00000000 00000000 00000000  |................|
000000E0   00000000 00000000 00000000 00000000  |................|
000000F0   00000000 00000000 00000000 00000000  |................|
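Since the PVID sits on the 0x80 line, it can be pulled out of the dump and compared with lspv. This is a sketch assuming the -h layout shown above; pvid_from_dump is a hypothetical name.

```shell
# Hypothetical helper: read `lquerypv -h` output on stdin and print the PVID
# (the first two 4-byte words on the 0x80 line), lowercased to match lspv.
pvid_from_dump() {
    awk '$1 == "00000080" { print tolower($2 $3) }'
}
```

For hdisk0 above, `lquerypv -h /dev/hdisk0 | pvid_from_dump` should print 00031691bced4a4e, the same PVID lspv shows.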

lqueryvg (useful undocumented LVM/VGDA query)

Lqueryvg bypasses the LVM commands altogether and reads the VGDA directly off any disk that is a member of a volume group. Sometimes, when the LVM information in the ODM and the VGDA get out of sync with each other, the volume group information here can be a great help. Think of it as what gets read from the disk when you do an importvg.


# lqueryvg -Atp hdisk0
Max LVs: 256
PP Size: 28
Free PPs: 959
LV count: 4
PV count: 2
Total VGDAs: 3
Conc Allowed: 0
MAX PPs per PV 1016
MAX PVs: 32
Quorum (disk): 0
Quorum (dd): 0
Auto Varyon ?: 0
Conc Autovaryo 0
Varied on Conc 0
Logical:    00c1b3da00004c0000000112982a7298.1 loglv02 1
                00c1b3da00004c0000000112982a7298.3 fslv00 3
                00c1b3da00004c0000000112982a7298.4 fslv02 3
                00c1b3da00004c0000000112982a7298.5 fslv04 3
Physical:  00cdeaeadfcd0ebc 2 0
                00031691bced4a4e 1 0
Total PPs:    1617
LTG size:     128
VG Type: 0
Max PPs: 32512
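As far as I know, the PP Size line is the power-of-two exponent of the partition size in bytes, not a size in megabytes. Here is a small sketch to decode it, with the arithmetic done in awk; pp_mb is a made-up name.

```shell
# Hypothetical helper: convert lqueryvg's PP Size exponent to megabytes.
pp_mb() {
    awk -v e="$1" 'BEGIN { printf "%d\n", 2 ^ e / 1048576 }'
}
```

`pp_mb 28` gives 256, so the 1617 total PPs above come to 1617 x 256 = 413952 MB, or roughly 404 GB.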

Two scripts that allow you to mail errpts to yourself

/buxs/bin> more errpt_odmadd

grep "^##" $0 | sed 's/^##//g' > /tmp/$$.odmadd

odmdelete -o errnotify -q "en_name = syslog"
odmadd /tmp/$$.odmadd
rm /tmp/$$.odmadd

##errnotify:
##  en_pid = 0
##  en_name = "syslog"
##  en_persistenceflg = 1
##  en_label = ""
##  en_crcid = 0
##  en_class = ""
##  en_type = ""
##  en_alertflg = ""
##  en_resource = ""
##  en_rtype = ""
##  en_rclass = ""
##  en_method = "/admin/bin/errnotify $1"
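The trick in errpt_odmadd is that the script carries its own ODM stanza as ## lines and pulls them out by grepping $0. Here is a sketch of the same extraction done with a single sed command; embedded_stanza is a hypothetical name.

```shell
# Print the lines of a file that start with "##", with the marker removed.
# Equivalent to: grep "^##" FILE | sed 's/^##//'
embedded_stanza() {
    sed -n 's/^##//p' "$1"
}
```

Inside the script itself, `embedded_stanza $0 > /tmp/$$.odmadd` would replace the grep and sed pair.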

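/admin/bin> more errnotify

#!/bin/ksh
# Assumed temp file name; the original definition of $O is not shown
O=/tmp/errnotify.$$

errpt -a -l $1 > $O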

# ksh runs the last stage of a pipeline in the current shell, so read sets A, B and C here
egrep "LABEL|Class|Type" $O | cut -c 7- | xargs -n3 | read A B C

chmod 755 $O

dt=`date +"%m %e %Y %T"`

echo $A $B $C $dt >> /admin/bin/errcount.txt

chmod 755 /admin/bin/errcount.txt

chown root:buxs /admin/bin/errcount.txt

mail -s "`hostname`: errpt $A $C $B" YOUR_EMAIL_HERE < $O
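The egrep | cut | xargs | read chain in errnotify depends on ksh running the last stage of a pipeline in the current shell; under bash, A, B and C would come back empty. An unanchored "Class|Type" can also match lines such as "Resource Class" if your errpt -a output has them. Here is a sketch of an alternative that anchors each match at the start of the line and prints the three values in one pass. errpt_summary is a hypothetical name.

```shell
# Hypothetical helper: read `errpt -a` output on stdin and print LABEL, Class, Type.
# Matching at the start of the line skips lines like "Resource Class:".
errpt_summary() {
    awk '/^LABEL:/ { label = $2 }
         /^Class:/ { class = $2 }
         /^Type:/  { type = $2 }
         END       { print label, class, type }'
}
```

To load the values without relying on ksh pipeline semantics: `set -- $(errpt_summary < $O); A=$1 B=$2 C=$3`.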