GPFS commands to create NSDs and filesystems, and to add and delete disks

Put this line in a file 'nsd.txt': hdiskpower5::::
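
Each line in that file is a disk descriptor; if memory serves from the GPFS manuals of this era, the colon-separated fields are

DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup

so hdiskpower5:::: takes the defaults for everything but the disk name (directly attached, dataAndMetadata, failure group -1).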

[root@nad0019aixd06 /usr/lpp/mmfs/bin]# mmcrnsd -F nsd.txt
mmcrnsd: Processing disk hdiskpower5
mmcrnsd: Propagating the changes to all affected nodes.
This is an asynchronous process.

_________________________________________________________

[root@nad0019aixd06 /usr/lpp/mmfs/bin]# mmlsnsd

 File system   Disk name    Primary node          Backup node
---------------------------------------------------------------
 backups       satadrive    (directly attached)
 oragpfs       gpfs1nsd     (directly attached)
 oragpfs       gpfs2nsd     (directly attached)
 oragpfs       gpfs9nsd     (directly attached)
 testgpfs      nsd_test_5   (directly attached)
 testgpfs      nsd_test_6   (directly attached)
 (free disk)   gpfs11nsd    (directly attached)

mmcrfs /oragpfs -F nsd.txt (use this only when creating a new filesystem)

mmadddisk /dev/oragpfs -F nsd.txt
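
For what it's worth, the mmcrfs form I remember from the manuals also names the filesystem device; a hedged sketch (the device name and flags here are illustrative, not from the transcripts below):

mmcrfs /oragpfs /dev/oragpfs -F nsd.txt -A yes   # create the filesystem; -A yes mounts it at boot
mmmount oragpfs -a                               # then mount it on all nodes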

# mmcrnsd -F nsd.txt
mmcrnsd: Processing disk hdiskpower5
mmcrnsd: Processing disk hdiskpower9
mmcrnsd: Processing disk hdiskpower10
mmcrnsd: Propagating the changes to all affected nodes.
This is an asynchronous process.
# mmadddisk /dev/oragpfs -F nsd.txt

The following disks of oragpfs will be formatted on node nad0019aixd07:
gpfs12nsd: size 104857600 KB
gpfs13nsd: size 104857600 KB
gpfs14nsd: size 104857600 KB
Extending Allocation Map
Completed adding disks to file system oragpfs.
mmadddisk: Propagating the changes to all affected nodes.
This is an asynchronous process.
__________________________________________________________________________________

[root@nad0019aixd06 /usr/lpp/mmfs/bin]# mmlsfs oragpfs
flag  value           description
----  --------------  ------------------------------------------------------
 -s   roundRobin      Stripe method
 -f   16384           Minimum fragment size in bytes
 -i   512             Inode size in bytes
 -I   16384           Indirect block size in bytes
 -m   1               Default number of metadata replicas
 -M   1               Maximum number of metadata replicas
 -r   1               Default number of data replicas
 -R   1               Maximum number of data replicas
 -j   cluster         Block allocation type
 -D   posix           File locking semantics in effect
 -k   posix           ACL semantics in effect
 -a   -1              Estimated average file size
 -n   8               Estimated number of nodes that will mount file system
 -B   524288          Block size
 -Q   none            Quotas enforced
      none            Default quotas enabled
 -F   2000000         Maximum number of inodes
 -V   8.01            File system version. Highest supported version: 8.02
 -u   yes             Support for large LUNs?
 -z   no              Is DMAPI enabled?
 -E   yes             Exact mtime mount option
 -S   no              Suppress atime mount option
 -d   gpfs1nsd;gpfs2nsd;gpfs9nsd  Disks in file system
 -A   yes             Automatic mount option
 -o   none            Additional mount options
 -T   /oragpfs        Default mount point

# mmdf oragpfs -d
disk                disk size  failure holds    holds             free KB             free KB
name                    in KB    group metadata data       in full blocks        in fragments
--------------- ------------- -------- -------- ----- ------------------- -------------------
gpfs14nsd           104857600       -1 yes      yes    104838144 (100%)          496 ( 0%)
gpfs13nsd           104857600       -1 yes      yes    104838144 (100%)          496 ( 0%)
gpfs9nsd             47185920       -1 yes      yes      9341952 ( 20%)        72944 ( 0%)
gpfs12nsd           104857600       -1 yes      yes    104838144 (100%)          496 ( 0%)
gpfs2nsd            104857600        1 yes      yes     22041600 ( 21%)       127328 ( 0%)
gpfs1nsd            104857600        1 yes      yes     22012416 ( 21%)       130784 ( 0%)
                ------------- ------------------- -------------------
(total)             571473920                          367910400 ( 64%)       332544 ( 0%)

You must also raise the maximum number of inodes before the new space can hold more files. The formula in the manual is

maximum number of files = (total file system space / 2) / (inode size + subblock size)

though the worked example here divides by block size / subblock size instead (524288 / 16384 = 32 subblocks per block):

8,929,280 = (571473920 / 2) / (524288 / 16384)

# mmchfs oragpfs -F 8929280
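
A quick ksh check of that arithmetic (just shell math, nothing GPFS-specific):

echo $(( (571473920 / 2) / (524288 / 16384) ))   # prints 8929280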

# mmlsfs oragpfs
flag  value           description
----  --------------  ------------------------------------------------------
 -s   roundRobin      Stripe method
 -f   16384           Minimum fragment size in bytes
 -i   512             Inode size in bytes
 -I   16384           Indirect block size in bytes
 -m   1               Default number of metadata replicas
 -M   1               Maximum number of metadata replicas
 -r   1               Default number of data replicas
 -R   1               Maximum number of data replicas
 -j   cluster         Block allocation type
 -D   posix           File locking semantics in effect
 -k   posix           ACL semantics in effect
 -a   -1              Estimated average file size
 -n   8               Estimated number of nodes that will mount file system
 -B   524288          Block size
 -Q   none            Quotas enforced
      none            Default quotas enabled
 -F   8929280         Maximum number of inodes
 -V   8.01            File system version. Highest supported version: 8.02
 -u   yes             Support for large LUNs?
 -z   no              Is DMAPI enabled?
 -E   yes             Exact mtime mount option
 -S   no              Suppress atime mount option
 -d   gpfs1nsd;gpfs2nsd;gpfs9nsd;gpfs12nsd;gpfs13nsd;gpfs14nsd  Disks in file system
 -A   yes             Automatic mount option
 -o   none            Additional mount options
 -T   /oragpfs        Default mount point
#

Nothing visibly changes in the filesystem, but GPFS can now allocate inodes as files are created, so the new space is actually usable.

______________________________________________________________________________________

To add a disk that is already defined as an NSD:

# mmadddisk orabackup nsd_0314_04::::

The following disks of orabackup will be formatted on node nad0019aixp02s1:
nsd_0314_04: size 262144000 KB
Extending Allocation Map
8 % complete on Fri Nov 16 13:23:21 2007
19 % complete on Fri Nov 16 13:23:26 2007
31 % complete on Fri Nov 16 13:23:31 2007
42 % complete on Fri Nov 16 13:23:36 2007
53 % complete on Fri Nov 16 13:23:41 2007
62 % complete on Fri Nov 16 13:23:46 2007
70 % complete on Fri Nov 16 13:23:51 2007
82 % complete on Fri Nov 16 13:23:56 2007
97 % complete on Fri Nov 16 13:24:01 2007
100 % complete on Fri Nov 16 13:24:02 2007
Completed adding disks to file system orabackup.
mmadddisk: Propagating the changes to all affected nodes.
This is an asynchronous process.

When you add a new disk to an existing filesystem, it takes a little while for the free space to show up.
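
Also, existing data is not moved automatically; if you want it spread across the new disk right away rather than only as new blocks are written, a rebalance does it (my suggestion, not part of the original procedure):

mmrestripefs orabackup -b   # rebalance existing data across all disks, including the new one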

How to remove a disk from GPFS:

# ./mmlsnsd

 File system   Disk name   Primary node          Backup node
---------------------------------------------------------------
 gpfs10nsd     gpfs10nsd   (directly attached)
 gpfs11nsd     gpfs11nsd   (directly attached)
 gpfs7nsd      gpfs7nsd    (directly attached)
 gpfs9nsd      gpfs9nsd    (directly attached)
 orabackup     gpfs1nsd    (directly attached)
 orabackup     gpfs2nsd    (directly attached)
 orabackup     gpfs3nsd    (directly attached)
 orabackup     gpfs4nsd    (directly attached)
 orabackup     gpfs5nsd    (directly attached)
 oracrsocr     gpfs8nsd    (directly attached)
 (free disk)   gpfs12nsd   (directly attached)
 (free disk)   gpfs6nsd    (directly attached)
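
Before deleting anything, it is worth checking how much data still lives on the disks you plan to remove; mmdf -d (used earlier) shows per-disk free space:

mmdf orabackup -d | egrep 'gpfs1nsd|gpfs2nsd|gpfs4nsd'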

# ./mmdeldisk orabackup "gpfs1nsd;gpfs2nsd;gpfs4nsd" -r
Deleting disks …
Scanning system storage pool
GPFS: 6027-589 Scanning file system metadata, phase 1 …
70 % complete on Mon Oct 13 16:26:00 2008
100 % complete on Mon Oct 13 16:26:01 2008
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 2 …
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 3 …
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 4 …
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-565 Scanning user file metadata …
GPFS: 6027-552 Scan completed successfully.
Checking Allocation Map for storage pool 'system'
GPFS: 6027-370 tsdeldisk64 completed.
mmdeldisk: 6027-1371 Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
Restriping orabackup …
GPFS: 6027-589 Scanning file system metadata, phase 1 …
71 % complete on Mon Oct 13 16:26:16 2008
100 % complete on Mon Oct 13 16:26:17 2008
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 2 …
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 3 …
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 4 …
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-565 Scanning user file metadata …
GPFS: 6027-552 Scan completed successfully.
Done
#

Create a new GPFS cluster

1. Create a nodelist file.
# more /home/root/nodelist
10.32.18.98:quorum
10.32.18.99

2. Configure ssh/scp to run as root without a password.
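
A minimal sketch of what step 2 usually looks like (key type, paths, and addresses are examples, not from the source; repeat so every node can reach every other node):

ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa                                    # passphrase-less key for root
cat $HOME/.ssh/id_rsa.pub | ssh root@10.32.18.99 'cat >> .ssh/authorized_keys'  # assumes .ssh exists on the target
ssh 10.32.18.99 date                                                            # verify: no password prompt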

3. Run the mmcrcluster command (-n gives the node list; -p and -s name the primary and secondary configuration servers; -r and -R give the remote shell and file copy commands).

mmcrcluster -n /home/root/nodelist -p d06gpfs -s d07gpfs -r /usr/bin/ssh -R /usr/bin/scp

Tue Jun 24 16:24:03 CDT 2008: 6027-1664 mmcrcluster: Processing node d06gpfs
Tue Jun 24 16:24:05 CDT 2008: 6027-1664 mmcrcluster: Processing node d07gpfs
mmcrcluster: Command successfully completed
mmcrcluster: 6027-1371 Propagating the changes to all affected nodes.
This is an asynchronous process.
# mmlscluster

GPFS cluster information
========================
GPFS cluster name: d06gpfs
GPFS cluster id: 729603352964457971
GPFS UID domain: d06gpfs
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp

GPFS cluster configuration servers:
———————————–
Primary server: d06gpfs
Secondary server: d07gpfs

 Node number  Node name  IP address    Full node name  Remarks
------------------------------------------------------------------
      1       d06gpfs    10.32.18.98   d06gpfs         quorum node
      2       d07gpfs    10.32.18.99   d07gpfs
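
At this point the cluster exists but the daemons are not running; the next steps are to start GPFS and check node states:

mmstartup -a    # start GPFS on all nodes
mmgetstate -a   # confirm every node reports 'active'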

A disk is now available but GPFS doesn’t believe you

/home/root> mmchdisk oragpfs2 start -d gpfs2nsd
GPFS: 6027-589 Scanning file system metadata, phase 1 …
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 2 …
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 3 …
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 4 …
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-565 Scanning user file metadata …
GPFS: 6027-552 Scan completed successfully.
/home/root> mmlsdisk oragpfs2
disk         driver   sector  failure holds    holds                             storage
name         type     size    group   metadata data  status        availability pool
------------ -------- ------ -------- -------- ----- ------------- ------------ ------------
gpfs2nsd     nsd         512       -1 yes      yes   ready         up           system
gpfs28nsd    nsd         512      147 yes      yes   ready         up           system
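
If several disks are down at once, the same idea works in bulk; as I recall, start -a tries every disk in the filesystem and mmlsdisk -e lists only the disks that are still not ready/up:

mmchdisk oragpfs2 start -a
mmlsdisk oragpfs2 -e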

Change the Failure Groups on a GPFS disk

./mmchdisk icash change -d "gpfs13nsd::::10"
./mmdf icash

disk                disk size  failure holds    holds             free KB             free KB
name                    in KB    group metadata data       in full blocks        in fragments
--------------- ------------- -------- -------- ----- ------------------- -------------------
gpfs13nsd           104857600       -1 yes      yes         9514496 ( 9%)         82304 ( 0%)
gpfs12nsd           104857600       -1 yes      yes         9519616 ( 9%)         79368 ( 0%)
gpfs14nsd           104857600       -1 yes      yes         9346304 ( 9%)         72480 ( 0%)
gpfs15nsd           104857600       -1 yes      yes         9345536 ( 9%)         71344 ( 0%)
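
To confirm a failure group change took, mmlsdisk shows the failure group column directly and can be limited to the disk in question (the -d flag is from memory, check the man page):

mmlsdisk icash -d "gpfs13nsd"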

mapgfs command

#!/bin/ksh

## This script maps EMC powerpath disk devices to filesystem names
## and NSD devices, then runs the mmlsdisk command for each.
## If you aren't using EMC, change DISKEXCLUDE.


function callMmlsdisk {

# Run mmlsdisk for the given filesystem device; for each NSD line,
# merge in the matching lspv line so the physical disk shows up
# next to the NSD name and failure group.

typeset DEVICE=$1

/usr/lpp/mmfs/bin/mmlsdisk $DEVICE | grep nsd | while read NSD RESTOFLINE
   do
   lspv | grep $NSD | grep $DISKEXCLUDE
   echo $RESTOFLINE
   done | xargs -n 10 | awk '{print $1,"\t",$6,"\t",$3}'
}

# Initial variables

TEMPFILESYSTEMS=/tmp/$$filesystems
FILESYSTEMRECORD=/tmp/$$fsrecord
DISKEXCLUDE=power

# Create a mmfs only version of /etc/filesystems

grep -p "= mmfs" /etc/filesystems > $TEMPFILESYSTEMS

# Now merge each stanza's mount-point line and dev line down to one record;
# here is an example of the merged output:
# /dir/dbf: dev = /dev/dbf1
# /dir/log: dev = /dev/log1
# /dir/sys: dev = /dev/sys1

egrep ":|/dev" $TEMPFILESYSTEMS | xargs -n4 > $FILESYSTEMRECORD

# Use only the first and fourth fields (mount point and device), loop to print

awk '{ print $1,$4}' $FILESYSTEMRECORD | while read FSNAME DEVICE
     do
       echo  # space to divide records
       echo $FSNAME ---FG------NSD---------   # Heading
       callMmlsdisk $DEVICE
     done

# Remove temporary files
rm $TEMPFILESYSTEMS $FILESYSTEMRECORD

Deleting Disks from GPFS

GPFS can be cryptic compared to AIX. But tonight I needed to replace one disk with another, since we are getting rid of one of our SANs. I got my new disks, added them with mmcrnsd and mmadddisk, and then ran a delete. About 5 hours later I began to wonder if the process had gone astray. In GPFS, the delete migrates everything off the disk before deleting it, if it can, rather than throwing an error. In the meantime, of course, it gives you pretty scary messages:

Attention: Due to an earlier configuration change the file system
may contain data that is at risk of being lost.

Now this may be true, but it is just scary. When you look it up in the manual, it tells you to balance at your earliest convenience. There is a pretty big difference in tone between 'risk of being lost' and 'balance at your earliest convenience'. On top of this, the command stops giving status and just appears to hang.

55 % complete on Sat Aug 16 20:28:42 2008
59 % complete on Sat Aug 16 20:28:45 2008
62 % complete on Sat Aug 16 20:28:48 2008
65 % complete on Sat Aug 16 20:28:51 2008
100 % complete on Sat Aug 16 20:28:52 2008
Scan completed successfully.
Scanning file system metadata, phase 3 …
Scan completed successfully.
Scanning user file metadata …
99 % complete on Sat Aug 16 20:30:00 2008
(no more output for 6 hours; I assume the command would have kept running in the background if I had disconnected)

100 % complete on Sun Aug 17 01:15:22 2008
Scan completed successfully.

tsdeldisk completed.
mmdeldisk: Propagating the changes to all affected nodes.
This is an asynchronous process.
#

I made my own status command with a simple loop:

while true
do
    mmdf mydatabasefs | grep nsd_1784_02
done

This takes a little while to run each time, so it is like having a sleep between status lines:
nsd_1784_02 262144000 1784 yes yes 259610256 (99%) 1142 ( 0%) *
nsd_1784_02 262144000 1784 yes yes 259630832 (99%) 1142 ( 0%) *
nsd_1784_02 262144000 1784 yes yes 259651424 (99%) 1134 ( 0%) *

The 99% is the key: after 6 hours, all the data is just about migrated off of this disk.

After this is done, the really scary message above is replaced with a slightly less scary one, with the same remedy advised:

Attention: Due to an earlier configuration change the file system
is no longer properly replicated.
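
For the record, the remedy behind both messages is a restripe; a hedged sketch (flag meanings are from memory, check your level's man page):

mmrestripefs mydatabasefs -r   # restore replication and restripe; -b just rebalances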