The ramifications of non-routable IP addresses

The term non-routable IP means a little bit more than one might think.  There are of course IP ranges not used on the internet that are expected to be used on companies’ intranets, but this isn’t the end of the story.  Oracle RAC, for instance, wants what it calls non-routable IP addresses.  But these can’t be just any two addresses that can see each other, like 10.0.0.1 and 10.0.0.2.  They could be, but Oracle seems to have a soft requirement that interfaces for interconnects have different addresses.

In our environment, we have many different VLANs which are connected to each other through a switch.  Let’s call them VLAN 0, VLAN 4, and VLAN 8.  As a happy coincidence, the vast majority of the IPs on these VLANs follow this convention:

VLAN 0 ( 10.32.0 ) –> If machines all have 255.255.252.0 as their netmask, they could talk to each other within the VLAN if their IPs are between 10.32.0.1 and 10.32.3.254.  The gateway out is always specified as 10.32.0.1 but it doesn’t have to be.  It could be 10.32.2.17, but that would not be a good practice.
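
The range arithmetic for VLAN 0 can be checked with a few lines of shell.  This is only a sketch: the function names ip_to_int, int_to_ip and host_range are made up for illustration, and it assumes a shell with 64-bit arithmetic and bitwise operators, such as ksh93 or bash.

```shell
# Convert a dotted quad such as 10.32.0.17 into a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on "." into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Convert an integer back into a dotted quad.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the first and last usable host address for an IP and netmask.
host_range() {
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    network=$(( ip & mask ))
    # Invert the mask, keeping only the low 32 bits, to get the host bits
    broadcast=$(( network | (~mask & 4294967295) ))
    echo "$(int_to_ip $(( network + 1 ))) - $(int_to_ip $(( broadcast - 1 )))"
}

host_range 10.32.0.17 255.255.252.0    # prints: 10.32.0.1 - 10.32.3.254
```

Any address inside the VLAN gives the same answer, since the netmask strips off the host part before the range is computed.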

VLAN 4 ( 10.32.4 ) –> VLAN 4 starts at 10.32.4.1 and goes to 10.32.7.254, provided that everyone uses netmask 255.255.252.0.  If hosts within this VLAN decide to use a different netmask, such as 255.255.255.0, they will see a smaller subset of hosts without going through the gateway.  They might be able to see other hosts by going through the gateway, but things could get pretty ugly pretty fast, especially when talking to other hosts which they can see on the local VLAN, but which can’t see them.

VLAN 8 ( 10.32.8 ) –> By now, you should get the point.  Let’s say, however, that we decide that all of the hosts on VLAN 8 should use netmask 255.255.255.0.  In this case the IP range would be smaller: 10.32.8.1 to 10.32.8.254.  But we also now have the potential to create something which we arbitrarily call VLAN 9 and could begin populating it with IPs 10.32.9.1 – 10.32.9.254 and a netmask of 255.255.255.0.

For a long time, I thought that on the switch, where VLAN 4 was specified, some logic was also included to only allow IPs 10.32.4.1 – 10.32.7.254.  This is not the case.  If I put the IPs 10.32.12.23 and 10.32.12.24 on VLAN 4, they will see each other.  It would make sense for me to ratchet the netmask down to the narrowest range possible in this case, but the physical hardware of the switch will allow these two interfaces to talk to each other.  They would not see, or be seen by, other IPs on the same VLAN which did not fit into their IP/netmask restriction.  They also wouldn’t see the gateway unless it also fit into their scheme, 10.32.12.1 for example.
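
Whether one interface treats another as a local neighbor comes down to comparing network parts under its own netmask, and nothing on the switch enters into it.  A rough sketch of that test follows; is_local and ip_to_int are names invented here, the 10.32.12.x netmask of 255.255.255.0 is just an assumed value, and the arithmetic assumes ksh93 or bash.

```shell
# Convert a dotted quad such as 10.32.12.23 into a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on "." into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# is_local MY_IP MY_NETMASK PEER_IP
# Succeeds when the peer falls inside my own IP/netmask range,
# i.e. when I would talk to it directly instead of via the gateway.
is_local() {
    me=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    peer=$(ip_to_int "$3")
    [ $(( me & mask )) -eq $(( peer & mask )) ]
}

# The two odd addresses placed on VLAN 4 see each other...
is_local 10.32.12.23 255.255.255.0 10.32.12.24 && echo "10.32.12.24 is local"
# ...but not the regular VLAN 4 hosts
is_local 10.32.12.23 255.255.255.0 10.32.4.1 || echo "10.32.4.1 is not local"
```

The same test shows the lopsided case: a host at 10.32.4.10 with netmask 255.255.252.0 considers 10.32.5.10 local, while 10.32.5.10 with netmask 255.255.255.0 does not consider 10.32.4.10 local.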

One could imagine a really interesting configuration of a VLAN with 2 or 3 evenly divided sets of IPs that couldn’t see each other.  Each would have its own unique gateway out.  Since I recently read “Gödel, Escher, Bach”, I am reminded of Escher’s Double Planetoid, an image which represents two coexisting worlds that never see each other.

And so, in the real world, you probably wouldn’t have a VLAN of two equal parts that couldn’t see each other, the dinosaur IPs and the civilization IPs, but that doesn’t mean that it’s not possible, or that some dinosaur IPs may not exist in the VLAN.  This is the case with our non-routable RAC heartbeats and interconnects. They can’t exist in their own VLAN because they share interfaces, and it would be a little silly to create special VLANs for only a few interfaces, yet they can’t really ever see the other addresses on their own VLAN, and so the model is valid.

How smit creates a vlan

#!/bin/ksh

###################
# Pass the vlan tag id as $1
#
#
###################

# create the entX device
# (the type "eth" is an assumption; mkdev may also need -a base_adapter=entN)
DEV=`mkdev -c adapter -s vlan -t eth`

# create the enX device
DE=`echo $DEV | cut -f1 -d" " | cut -c1,2,4-`
/usr/lib/methods/defif -c if -t en -s EN -w $DE

# create the etX device, not usually used
DT=`echo $DEV | cut -f1 -d" " | cut -c1,3-`
/usr/lib/methods/defif -c if -t en -s EN -w $DT

# retrieve the entX name
# (in ksh, a read at the end of a pipeline sets ENT in the current shell)
echo $DEV | read ENT TRASH

# set the vlan tag
chdev -l $ENT -a vlan_tag_id=$1
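
The two cut commands in the script are doing nothing more than dropping a letter from the adapter name.  Here is a small sketch of just that step, assuming mkdev prints something like "ent3 Available"; derive_names is a made-up helper and not part of smit.

```shell
# Given the line printed by mkdev, print the adapter name followed by
# the matching enX and etX interface names.
derive_names() {
    ent=$(echo "$1" | cut -f1 -d" ")       # first word: ent3
    en=$(echo "$ent" | cut -c1,2,4-)       # drop the 3rd character: en3
    et=$(echo "$ent" | cut -c1,3-)         # drop the 2nd character: et3
    echo "$ent $en $et"
}

derive_names "ent3 Available"    # prints: ent3 en3 et3
```

Because cut keeps everything from the fourth (or third) character onward, the same trick still works once the device numbers reach two digits.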

‘PRNG is not seeded’ message from ssh on AIX 6.1

/home/coffee1> ssh coffee2
PRNG is not seeded

Apparently ssh needs read access to /dev/random and /dev/urandom, and here only root could read them:


# cd /dev
# ls -l | grep random
crw-------    1 root     system       35,  0 May 04 10:43 random
crw-------    1 root     system       35,  1 May 04 10:43 urandom
# chmod 644 random
# chmod 644 urandom
# ls -l | grep random
crw-r--r--    1 root     system       35,  0 May 04 10:43 random
crw-r--r--    1 root     system       35,  1 May 04 10:43 urandom
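
To check this from the affected account before changing anything, a quick test along these lines should do.  It is only a sketch, and report_access is a name I made up for it.

```shell
# Report whether the current user can read each of the given files.
report_access() {
    for dev in "$@"
    do
        if [ -r "$dev" ]
        then
            echo "$dev: readable"
        else
            echo "$dev: NOT readable"
        fi
    done
}

# Run this as the user who gets the PRNG message, not as root
report_access /dev/random /dev/urandom
```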

Script to tell me what hosts are down in a list

#! /usr/bin/ksh
#################################################################
# Title      :  pinghosts - Returns status of all hosts
# Author     :  John Rigler
# Date       :  01-09-2009
# Requires   :  ksh
#################################################################



grep -v "^#" "$1" | while read HOSTNAME # Only read uncommented lines
  do
    # Execute the ping once and wait 2 seconds, discarding all output
    ping -c 1 -w 2 "$HOSTNAME" >/dev/null 2>&1
    if [[ $? -ne 0 ]]
    then
        echo "$HOSTNAME down"
    fi
  done

Setup dsh and dcp

dsh is a distributed shell which uses ssh (or rsh) to run commands in parallel on a number of servers at once.
To set up dsh to work, first set the following variables:

export DSH_LIST=/etc/dsh.hosts
export DSH_NODE_RSH=/usr/bin/ssh
export DSH_NODE_RCP=/usr/bin/scp
export DCP_DEVICE_RCP=/usr/bin/scp
export DCP_NODE_RCP=/usr/bin/scp

/etc/dsh.hosts is just a list of fully qualified host names.

Also, pipe your dsh results into ‘dshbak -c’ and it will organize them nicely for you.

How to set up ssh to allow you to run remote commands

Configuration for ssh is done in two places:

  1. In the /etc/ssh directory as root
  2. In the user’s .ssh subdirectory

From a system perspective, /etc/ssh/sshd_config may need to be changed in order to disable ssh protocol version 1, allow root login (PermitRootLogin) or make various other changes. The sshd daemon can be restarted without disrupting current connections. In /etc/ssh there is also a file called ssh_known_hosts. If you will be using ssh as the root user, I recommend making a symbolic link between root’s known_hosts file and this one. Then make sure to connect to any new hosts as root before connecting as a user. In this way, you will maintain a global known_hosts file and individual users will not have to maintain their individual host lists.
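
The symbolic link could be made roughly like this.  It is a sketch only: link_known_hosts is a made-up helper, and the location of root’s known_hosts file is an assumption that depends on where root’s home directory is.

```shell
# link_known_hosts GLOBAL_FILE USER_FILE
# Point a per-user known_hosts file at the system-wide one.
link_known_hosts() {
    global_file=$1
    user_file=$2
    mkdir -p "$(dirname "$user_file")"
    # Keep any existing per-user file as a backup rather than deleting it
    if [ -f "$user_file" ] && [ ! -L "$user_file" ]
    then
        mv "$user_file" "$user_file.bak"
    fi
    ln -sf "$global_file" "$user_file"
}

# Example, assuming root's home directory is / :
# link_known_hosts /etc/ssh/ssh_known_hosts /.ssh/known_hosts
```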

From a user perspective, ssh is set up by creating a public and private key pair with the ssh-keygen command:

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/username/.ssh/id_rsa):
Created directory '/home/username/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/username/.ssh/id_rsa.
Your public key has been saved in /home/username/.ssh/id_rsa.pub.
The key fingerprint is:
45:36:66:b8:39:bc:e0:84:ae:eb:50:e3:28:ec:47:0a username@hostname

$ cd .ssh
$ ls -l
total 16
-rw-------   1 username    staff          1675 Nov 24 12:40 id_rsa
-rw-r--r--   1 username    staff           401 Nov 24 12:40 id_rsa.pub

The file ‘id_rsa’ is your private key and should be kept on any system that you ssh out from. The file ‘id_rsa.pub’ is your public key. Give this to other people so that they can put it into a file on their side called ‘authorized_keys’. If you want to test ssh by connecting to yourself, simply move or copy ‘id_rsa.pub’ to ‘authorized_keys’. At this point you should be able to test ssh by connecting to yourself:

$ pwd
/home/username/.ssh
$ ls -la
total 16
drwx------   2 netiq    staff           256 Nov 24 12:50 .
drwxr-xr-x   3 netiq    staff           256 Nov 24 12:47 ..
-rw-------   1 netiq    staff          1675 Nov 24 12:47 id_rsa
-rw-r--r--   1 netiq    staff           401 Nov 24 12:47 id_rsa.pub
$ mv id_rsa.pub authorized_keys
$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
 RSA key fingerprint is 3b:4b:af:d1:b3:ec:51:83:96:48:ea:8e:09:83:d4:80.
 Are you sure you want to continue connecting (yes/no)?yes
Warning: Permanently added 'localhost,127.0.0.1' (RSA) to the list of known hosts.
Last unsuccessful login: Mon Nov 24 12:43:24 CST 2008 on ssh from 10.32.12.45
Last login: Mon Nov 24 12:48:14 CST 2008 on /dev/pts/1 from 10.32.12.45
**********************************************************
*                                                        *
*                                                        *
*  Welcome to AIX Version 5.3!                           *
*                                                        *
*                                                        *
*  Please see the README file in /usr/lpp/bos            *
*  for information pertinent to                          *
*  this release of the AIX Operating System.             *
*                                                        *
*                                                        *
**********************************************************
$ exit
Connection to localhost closed.
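
Moving id_rsa.pub is fine for a quick test, but it replaces any keys that were already authorized. When adding a key to an account that is already in use, appending is gentler. A minimal sketch, where install_pubkey is a made-up helper and the permissions shown are the usual conservative choice rather than anything taken from the session above:

```shell
# install_pubkey PUBKEY_FILE SSH_DIR
# Append a public key to SSH_DIR/authorized_keys without disturbing
# keys that are already there.
install_pubkey() {
    pubkey=$1
    ssh_dir=$2
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    cat "$pubkey" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Example: install_pubkey /tmp/someone_id_rsa.pub "$HOME/.ssh"
```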

Generally you will not be connecting only to yourself, nor will you use the name ‘localhost’. After running this test, however, you will have created a new file called ‘known_hosts’ that contains a bit of data which describes this server. This is a human-readable file that will collect information about all of the servers that you connect to. ssh consults it alongside the system-wide /etc/ssh/ssh_known_hosts.

Once ssh is configured, scp and sftp will also work. If you are a Micro Focus COBOL user, you might see a different ‘scp’ which will seem weird. Simply change your path to fix this, or fully qualify scp:

psoft$scp
PVER1
GERR00Not enough parameters
psoft{fsprd75}$whence scp
/usr/lpp/cobol/bin/scp
psoft$/usr/bin/scp
usage: scp [-1246BCpqrv] [-c cipher] [-F ssh_config] [-i identity_file]
           [-l limit] [-o ssh_option] [-P port] [-S program]
           [[user@]host1:]file1 [...] [[user@]host2:]file2
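
Since whence only reports the first match, it can help to list every scp on the path in the order the shell would find them. A small sketch, with find_in_path being a name made up for this example:

```shell
# find_in_path NAME
# Print every executable called NAME found in $PATH, in search order.
find_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH
    do
        # An empty PATH entry means the current directory
        [ -n "$dir" ] || dir=.
        [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ] && echo "$dir/$name"
    done
    IFS=$old_ifs
}

find_in_path scp
```

The first line printed is the one that runs when you type plain scp.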

Don’t panic if your boot hangs on led 538

538 The configuration manager is going to invoke a configuration method.

Tonight, we had to reboot one of our servers after an old version of PowerPath freaked out while discovering LUNs. The LUNs were discovered again on boot and we sat on 538 for about 15 minutes. When you are used to the whole LPAR coming up in less than 10 minutes, this can be scary, but just as we were about to make other plans, the LED moved on and cfgmgr finished.

When trying to bring the same LUNs online in normal mode, the server would hang on 538 forever for some reason. I suspect it is because we are at 5200-08 and PowerPath 3.0.4.0, really old stuff.

Of course, something similar happens when installing upgrades: it seems to hang forever.

If the network config is wrong, it will get past config manager and hang on NFS or something like that. In this case, I usually boot up with an alternate profile that doesn’t have any network adapters, then from the console I just rmdev everything and then reboot back with my old profile. Works every time.

We saw this again later and it took more like 40 minutes but then came up.