Deleting Disks from GPFS

GPFS can be cryptic compared to AIX. But tonight I needed to replace one disk with another, since we are going to get rid of one of our SANs. I got my new disks and added them with mmcrnsd and mmadddisk, then ran a delete with mmdeldisk. About 5 hours later I began to wonder if the process had gone astray. In GPFS, the delete will, if it can, migrate everything off the disk before deleting it instead of throwing an error. Of course, in the meantime it will give you pretty scary messages:

Attention: Due to an earlier configuration change the file system
may contain data that is at risk of being lost.

Now this may be true, but it is just scary. When you look it up in the manual, it tells you to balance at your earliest convenience. There is a pretty big difference in tone between ‘risk of being lost’ and ‘balance at your earliest convenience’. On top of this, the command stops giving status and just appears to hang.

55 % complete on Sat Aug 16 20:28:42 2008
59 % complete on Sat Aug 16 20:28:45 2008
62 % complete on Sat Aug 16 20:28:48 2008
65 % complete on Sat Aug 16 20:28:51 2008
100 % complete on Sat Aug 16 20:28:52 2008
Scan completed successfully.
Scanning file system metadata, phase 3 …
Scan completed successfully.
Scanning user file metadata …
99 % complete on Sat Aug 16 20:30:00 2008
(no more output for 6 hours; I am assuming that this command would keep running in the background if I disconnected)

100 % complete on Sun Aug 17 01:15:22 2008
Scan completed successfully.

tsdeldisk completed.
mmdeldisk: Propagating the changes to all affected nodes.
This is an asynchronous process.
#
#
#
#
#
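The note in the output above assumes the delete would carry on if the terminal disconnected. Rather than rely on that, one option is to start a long-running command under nohup with its output sent to a log file. This is only a minimal sketch: run_detached and the log path are hypothetical names, and the mmdeldisk invocation in the comment assumes the file system and disk names used elsewhere in this post.

```shell
#!/bin/sh
# Hypothetical helper: run a command detached from the terminal and
# send all of its output to a log file that can be checked later.
# Usage: run_detached LOGFILE COMMAND [ARGS...]
run_detached() {
    log_file=$1
    shift
    # nohup makes the command ignore the hangup signal sent on disconnect
    nohup "$@" > "$log_file" 2>&1 &
    # Print the PID of the background job so it can be checked with ps
    echo $!
}

# Example (assumed names, not taken from the original session):
#   run_detached /tmp/mmdeldisk.log mmdeldisk mydatabasefs nsd_1784_02
#   tail -f /tmp/mmdeldisk.log
```

The progress lines then land in the log file, so reconnecting and running tail on it shows where the scan has got to.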

I made my own status command with a simple loop:

# Poll the free-space line for the disk being deleted
while true
do
    mmdf mydatabasefs | grep nsd_1784_02
done

This takes a little while to run each time so it is like having a sleep between status lines:
nsd_1784_02 262144000 1784 yes yes 259610256 (99%) 1142 ( 0%) *
nsd_1784_02 262144000 1784 yes yes 259630832 (99%) 1142 ( 0%) *
nsd_1784_02 262144000 1784 yes yes 259651424 (99%) 1134 ( 0%) *

The 99% is the key: it is the percentage of the disk that is free, so after 6 hours just about all the data has been migrated off of this disk.
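To pull just that percentage out of each status line, something like the following could work. It is a rough sketch that assumes the mmdf column layout shown in the sample lines above, where the first parenthesized value is the free space in full blocks. free_pct and watch_disk are made-up helper names, and the 60-second sleep is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: read mmdf disk lines on stdin and print the first
# parenthesized percentage on each line (free space in full blocks,
# assuming the column layout shown in the sample output).
free_pct() {
    sed -n 's/^[^(]*( *\([0-9][0-9]*\)%).*/\1/p'
}

# Hypothetical polling loop: print a timestamped free percentage for one
# disk. Arguments: file system name, disk (NSD) name.
watch_disk() {
    fs=$1
    disk=$2
    while true
    do
        pct=$(mmdf "$fs" | grep "$disk" | free_pct)
        echo "$(date) $disk ${pct}% free"
        sleep 60
    done
}

# Example on one of the sample lines from this post:
#   echo 'nsd_1784_02 262144000 1784 yes yes 259610256 (99%) 1142 ( 0%) *' | free_pct
#   prints 99
```

Running watch_disk mydatabasefs nsd_1784_02 would then give one short line per poll instead of the full mmdf row.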

After this is done, the really scary message above is replaced with a slightly less scary one, with the same remedy advised:

Attention: Due to an earlier configuration change the file system
is no longer properly replicated.
