How To: Guides for users and Maryland T3 admins.

Help: Links and emails for further info.

Configuration: technical layout of the cluster, primarily for admins.

Log: Has been moved to a Google page, accessible only to admins.

Hardware & monitoring

Description Primarily for Dell hardware, except for the section on formatting and mounting the disk array. Covers how to update Dell firmware and how to monitor with Dell OpenManage Server Administrator (OMSA).
Notes

This guide is designed for our previous SL5/Rocks cluster configuration.

It is 99% out of date as of 2015 and will be removed and replaced shortly. ADMINS of hepcms: please consult our private Google pages for documentation.

Last modified September 11, 2015


Configure the network switch

Description We first connect to the switch via a direct serial connection to get it to issue DHCP requests. We then get Rocks to listen to the DHCP request and assign an IP address, then do final configuration via a web browser.
Dependencies None.
Notes
Guides - PowerConnect 62XX manuals
- Minicom

Configure switch to request IP via DHCP using serial connection

1. The VT100 emulator:

First, connect the switch to the computer of your choice using the manufacturer-supplied serial cable. A terminal program such as minicom (available on Linux and Mac) can be used to talk to the switch. We were unable to get our HN to communicate with the switch over the serial console using minicom, so we instead used a Linux laptop with a serial port. A USB-to-serial adapter can be used with modern laptops; on Windows, be sure the device driver is installed. Additionally, on modern Dell switches (e.g. the 8132), the OOB port accepts DHCP by default, so the switch can be configured over ethernet through the OOB port; see these instructions.

Alternative terminal programs for serial console:

2. Settings for serial console:

The most common configuration for asynchronous serial mode is used: 8-N-1.

8 = 8 data bits
N = no parity bit
1 = 1 stop bit

Most console programs will default to these settings. Additionally, the communication speed should be set to at least 9600 baud.
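
For example, minicom can be started with matching settings from the command line (a minimal sketch; /dev/ttyS0 is an assumption and will instead be something like /dev/ttyUSB0 when using a USB-serial adapter):

minicom -D /dev/ttyS0 -b 9600
# 8-N-1 is minicom's default; verify it under Ctrl-A O -> "Serial port setup".
# GNU screen is an alternative: screen /dev/ttyS0 9600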

3. Initial setup:

Power on the switch and wait for startup to complete. The Easy Setup Wizard will display on an unconfigured switch. These are the important points:

  1. Would you like to set up the SNMP management interface now? [Y/N]  N
  2. To set up a user account: The default account is 'admin', but anything may be used. Be sure to set a password
  3. To set up an IP address: Choose 'DHCP', as Rocks will handle address assignments in the cluster.
  4. Select 'Y' to save the config and restart.

Note: For the PowerConnect 8132, also choose N for the Out-Of-Band interface and N for the VLAN1 routing interface, then set up the admin account and password as usual. Before confirmation it lists: Out-of-band IP address = DHCP, VLAN1 Router Interface IP = 0.0.0.0 0.0.0.0

4. Network connections:

Now get Rocks to recognize the DHCP request issued by the switch by following these instructions.

After Rocks assigns an IP to the switch, it can be configured over telnet, SSH, and HTTP from the HN. The default name for the switch is network-0-0.

Configure switch to request IP via DHCP on the in-band network using the OOB ethernet connection

1. Connect Mac via ethernet:

Note: This was tested on a Mac laptop to configure the switch after an initial configuration was done with a serial port. I expect it to work with a new switch as well, since the OOB port is set to accept DHCP by default. Connect the Mac to the OOB port on the bottom rear of the network switch with an ethernet cable.

2. Assign DHCP address to OOB port:

Make sure your Mac's WiFi is connected. Use Sharing in System Preferences to turn on Internet Sharing to your ethernet port. Unplug the switch (both plugs), then plug it back in so it reboots and gets a DHCP address from the Mac.

3. Find out the IP address of the switch:

This was tricky. Some methods are suggested online: Method1, Method2. The first didn't work for us. Apparently the default address space used for DHCP sharing is 192.168.2.0. Your Network settings should show an internal IP address for the ethernet port, such as 169.254.*.*. Be sure the light at the OOB port is blinking, which means traffic is passing between the devices. In a web browser, put in 192.168.2.0, then try the next addresses: 192.168.2.1, 192.168.2.2 (this one worked in our case).
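
Rather than trying addresses one at a time in the browser, the sharing subnet can be swept for live hosts (a minimal sketch; it assumes the 192.168.2.0/24 range observed above, and note that ping's timeout flag takes seconds on Linux but milliseconds on macOS):

# Ping-sweep the Internet Sharing subnet; the switch's web interface
# should be reachable at one of the addresses that answers.
for i in $(seq 1 254); do
  ping -c 1 -W 1 192.168.2.$i > /dev/null 2>&1 && echo "192.168.2.$i is up"
done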

4. Configure the switch via the OOB port:

Using the graphic interface in the browser, be sure to set the following options:
A) username and password (admin is traditional)
B) Under "Routing", "IP", "IP Interface Configuration" to set the VLAN IP address configuration method to "DHCP"
C) Turn off the spanning tree, as described below

5. Set switch IP address for VLAN (in-band) port:

Now get Rocks to recognize the DHCP request issued by the switch by following these instructions. (This presumes your HN is connected to the regular switch ports via ethernet cable.)

After Rocks assigns an IP to the switch, it can be configured over telnet, SSH, and HTTP from the HN. The default name for the switch is network-0-0.

 

Turn off the spanning tree using a graphical browser:

The Spanning Tree Protocol (STP) must be disabled. During Rocks Kickstart, links will go up and down a few times during the DHCP request, and STP won't properly activate a port until it has been up for several seconds. This can be done from the command line, but it is simpler to use the web-enabled interface from a browser on the HN.

From the head node, open a graphical browser and enter the name assigned by Rocks, typically network-0-0. Under Switching->Spanning Tree->Global Settings, select Disable from the "Spanning Tree Status" drop down menu. Click "Apply Changes" at the bottom.

If, for some reason, the browser method doesn't work, type these commands at the VT100 console provided by minicom or similar software:

console#config
console(config)#spanning-tree disable <port number>
(this will have to be done for all 24 ports!)
console(config)#exit
console#show spanning-tree
Spanning tree Disabled  mode rstp
console#quit

Configure the big disk array (old /data)

Description Format with XFS, make the partition and logical volume, network mount.
Dependencies - Grid node Kickstarted
Notes We use the xfs format as it is optimized for large disks. We also use a logical volume so that we can create a single mount and resize later as desired. Our disk array is a DAS, so we choose to mount from the grid node.
Guides  

Create, format & mount the disk array on the GN:

As root on the grid node:

  1. Install XFS programs:
    yum install xfsprogs
  2. Identify the array's hardware designation with fdisk:
    fdisk -l
    Our disk array is /dev/sdc.
  3. Use GNU Parted to create the partition:
    parted /dev/sdc
    At the parted command prompt:
    1. Change the partition label type to GPT (GUID Partition Table):
      mklabel gpt
    2. Create a primary partition covering the entire volume:
      mkpart primary 0 9293440M
    3. Confirm creation of the partition:
      print
      Output should look similar to:

      Disk geometry for /dev/sdc: 0.000-9293440.000 megabytes
      Disk label type: gpt
      Minor Start End Filesystem Name Flags
      1 0.017 9293439.983


    4. Quit GNU parted:
      quit
  4. Assign the physical volumes (PV) for a new LVM volume group (VG):
    pvcreate /dev/sdc1
  5. Create a new VG container for the PV. Our VG is named 'data' and contains one PV:
    vgcreate data /dev/sdc1
  6. Create the logical volume (LV) with a desired size. The command takes the form:
    lvcreate -L (size in KB,MB,GB,TB,etc) (VG name)
    So, in our case:
    lvcreate -L 9293440MB data
    On this command, we received the error message: Insufficient free extents (2323359) in volume group data: 2323360 required. (9293440 MB at the 4 MB extent size is exactly 2323360 extents, but LVM metadata on the PV uses a small amount of space, leaving only 2323359 free.) In such cases it is simpler to enter the value in extents (the smallest logical units LVM uses to manage volume space), using '-l' instead of '-L'; see also the resize sketch after this list:
    lvcreate -l 2323359 data
  7. Confirm the LV details:
    vgdisplay
    The output should look like:
    --- Volume group ---
    VG Name               data
    System ID
    Format                lvm2
    Metadata Areas        1
    Metadata Sequence No  2
    VG Access             read/write
    VG Status             resizable
    MAX LV                0
    Cur LV                1
    Open LV               0
    Max PV                0
    Cur PV                1
    Act PV                1
    VG Size               8.86 TB
    PE Size               4.00 MB
    Total PE              2323359
    Alloc PE / Size       2323359 / 8.86 TB
    Free  PE / Size       0 / 0
    VG UUID               tcg3eq-cG1z-czIn-7j5a-YVM1-MT70-sqKAUY        
  8. After these commands, the location of the volume is /dev/mapper/data-lvol0. Create a filesystem:
    mkfs.xfs /dev/mapper/data-lvol0
  9. Create a mount point, edit /etc/fstab, and mount the volume:
    mkdir /data
    Add the following line to /etc/fstab:
    /dev/mapper/data-lvol0 /data xfs defaults 1 2
    And mount:
    mount /data
  10. Confirm the volume and size:
    df -h
    Output should look like:
    /dev/mapper/data-lvol0 8.9T 528K 8.9T 1% /data
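
As noted above, the logical volume can be resized later. A minimal sketch of growing it, assuming space is first added to the VG (the device name /dev/sdd1 is hypothetical) and that the LV and mount point match those used above:

pvcreate /dev/sdd1                              # new PV added to the cluster
vgextend data /dev/sdd1                         # add it to the 'data' VG
lvextend -l +100%FREE /dev/mapper/data-lvol0    # grow the LV into all free extents
xfs_growfs /data                                # grow XFS online (volume must be mounted)

Similarly, 'lvcreate -l 100%FREE data' at creation time avoids counting extents by hand.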

Network mount the disk array on all nodes

These commands network-mount /data on all nodes. First, have the GN export /data. As root (su -) on the GN:

  1. Edit /etc/exports on the GN as root (su -):
    chmod +w /etc/exports
    Add this line to /etc/exports: /data 10.0.0.0/255.0.0.0(rw,async)
    chmod -w /etc/exports
  2. Restart the GN NFS service:
    /etc/init.d/nfs restart
  3. Have the NFS service start on the GN whenever it's rebooted:
    /sbin/chkconfig --add nfs
    chkconfig nfs on

Now have the HN mount /data and edit the Kickstart file to mount /data on all other nodes. As root (su -) on the HN:

  1. Edit /etc/fstab on the HN and tell it to get /data from the grid node:
    grid-0-0:/data /data nfs rw 0 0
  2. Have the HN mount /data and make the symlink:
    mkdir /data
    mount /data

  3. Edit /home/install/site-profiles/4.3/nodes/extend-compute.xml and place the following commands inside the <post></post> brackets:
    <file name="/etc/fstab" mode="append">
    grid-0-0:/data /data nfs rw 0 0
    </file>
    mkdir /data
    mount /data

    Note that given our Rocks appliance inheritance structure, the grid node will also have its /etc/fstab file appended with this network mount if it's ever reinstalled. However, since reinstalling the grid node via Rocks Kickstart is highly undesirable anyway, we break the model here. If grid node reinstall is absolutely required, after reinstall, this line needs to be removed from the /etc/fstab file on the grid node and the logical volume line in the previous section needs to be used instead.
  4. Create the new distribution:
    cd /home/install
    rocks create distro
  5. Reinstall the nodes following these instructions.

Dell OMSA & IPMI

Description Install Dell Open Manage Server Administrator to monitor node status & configure each Baseboard Management Controller to accept IPMI commands from the HN.
Dependencies - All desired nodes Kickstarted
Notes We opt to have every node self-monitor with Dell OMSA. If OMSA fails on any given node, it will no longer be monitored. (I.e., we don't monitor from a central OMSA installation.)
Guides

- Dell OMSA manuals
- Dell BMC manuals
- Dell yum installation
- Dell rpm repository

Install & configure OMSA on the HN & GN

As root on each:

  1. If not already done for Dell firmware updates, configure yum for the Dell repo:
    wget -q -O - http://linux.dell.com/repo/hardware/latest/bootstrap.cgi | bash
  2. Install all the OMSA packages:
    yum install srvadmin-all
  3. Start the OMSA monitoring services by logging out and logging back in, then executing:
    srvadmin-services.sh start
  4. The OMSA web monitor can be accessed at https://<FQDN>:1311
    Note that we disabled that option per University security practices
  5. OMSA will not necessarily start automatically when a node is rebooted; it must be started manually after every reboot.

We have OMSA email and text when it detects some warnings and failures. We save the scripts in the network mounted /share/apps/OMSA (on the HN). Note that if the HN is down, or if the NFS mount is not functioning on the node, it will be unable to issue an email/text alert.

  1. Create /share/apps/OMSA/warningMail.sh:
    #!/bin/sh
    echo "Dell OpenManage has issued a warning on" `hostname` > /tmp/OMwarning.txt
    echo "If HN: https://hepcms-hn.umd.edu:1311" >> /tmp/OMwarning.txt
    echo "If GN: https://hepcms-0.umd.edu:1311" >> /tmp/OMwarning.txt
    echo "If WN or IN: use omreport" >> /tmp/OMwarning.txt
    mail -s "hepcms warning" email1@domain1.com email2@domain2.net </tmp/OMwarning.txt>/share/apps/OpenManage-5.5/warningMailFailed.txt 2>&1
  2. Create /share/apps/OMSA/failureMail.sh:
    #!/bin/sh
    echo "Dell OpenManage has issued a failure alert on" `hostname` > /tmp/OMfailure.txt
    echo "Immediate action may be required." >> /tmp/OMfailure.txt
    echo "If HN: https://hepcms-hn.umd.edu:1311" >> /tmp/OMfailure.txt
    echo "If GN: https://hepcms-0.umd.edu:1311" >> /tmp/OMfailure.txt
    echo "If WN or IN: use omreport" >> /tmp/OMfailure.txt
    mail -s "hepcms failure" email1@domain1.com email2@domain2.net </tmp/OMfailure.txt>/share/apps/OpenManage-5.5/failureMailFailed.txt 2>&1
  3. Make them executable and create the error log files:
    chmod +x /share/apps/OMSA/warningMail.sh
    chmod +x /share/apps/OMSA/failureMail.sh
    touch /share/apps/OMSA/warningMailFailed.txt
    touch /share/apps/OMSA/failureMailFailed.txt
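
The scripts can be tested by hand on any node; if the mail command fails, its error output lands in the corresponding *MailFailed.txt log (a quick check, assuming the paths above):

/share/apps/OMSA/warningMail.sh
cat /share/apps/OMSA/warningMailFailed.txt    # empty if the mail command succeeded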

Configure OMSA to handle warnings and failures (using the website):
Note that the website is disabled per University security practices; OMSA can be configured with command-line tools instead.

  1. Navigate to https://hepcms-hn.umd.edu:1311 and log in (repeat for the GN, https://hepcms-0.umd.edu:1311)
  2. To configure OMSA to automatically shutdown the node in the event of temperature warnings:
    1. Select the Shutdown tab and the "Thermal Shutdown" subtab
    2. Select the Warning option and click the "Apply Changes" button
  3. Under the "Alert Management" tab, set the desired warning alerts to execute application /share/apps/OMSA/warningMail.sh.
  4. Under the "Alert Management" tab, set the desired failure alerts to execute application /share/apps/OMSA/failureMail.sh.

Install & configure OMSA on the WNs & INs:

We do much the same for our other nodes via their Kickstart file; however, we install only the srvadmin-base package since the other nodes don't have RAID controllers or manage disk enclosures. As root on the HN:

  1. Add the text below to the <post></post> section of /export/rocks/install/site-profiles/5.4/nodes/extend-compute.xml:
    wget -q -O - http://linux.dell.com/repo/hardware/latest/bootstrap.cgi | bash
    yum -y install srvadmin-base > /root/omsa-install.log 2>&1
  2. Create the new Kickstart:
    cd /export/rocks/install
    rocks create distro
  3. Reinstall all the WNs & INs.
  4. As for the HN & GN, OMSA can't be added to the boot sequence. After reboot of any WN or IN, be sure to start the OMSA service:
    ssh internal-node-name-x-y "srvadmin-services.sh start"
    or to start it on all WNs and INs:
    ssh-agent $SHELL
    ssh-add
    rocks run host compute interactive "srvadmin-services.sh start"
  5. OMSA can't be started and subsequently configured during Rocks Kickstart. After reinstall of the WNs & INs, be sure to configure OMSA to issue the appropriate alerts:
    1. Create a shell script which will configure OMSA, /share/apps/OMSA/OMSAconfigure.sh (a sketch is given after this list).
    2. Make it executable:
      chmod +x /share/apps/OMSA/OMSAconfigure.sh
    3. And execute it after every WN or IN reinstall:
      ssh-agent $SHELL
      ssh-add

      rocks run host compute interactive "/share/apps/OMSA/OMSAconfigure.sh"
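
The contents of OMSAconfigure.sh are not prescribed above; a minimal sketch, assuming the same omconfig settings used for the HN & GN are wanted on the WNs & INs (event names are examples and should be checked against 'omconfig system alertaction -?'):

#!/bin/sh
# Hypothetical OMSAconfigure.sh: re-apply alert settings after a node reinstall.
for ev in tempwarn fanwarn voltwarn; do
  omconfig system alertaction event=$ev execappath="/share/apps/OMSA/warningMail.sh"
done
for ev in tempfail fanfail voltfail; do
  omconfig system alertaction event=$ev execappath="/share/apps/OMSA/failureMail.sh"
done
omconfig system thrmshutdown severity=warning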

Configure the WN & IN BMCs:

We configure the Baseboard Management Controllers (BMCs) on the WNs to respond to manual IPMI calls from the HN. BMC documentation suggests it should be possible to configure the BMCs via OMSA, so rebooting the nodes as described below may not be necessary.

We downgrade the OpenIPMI version already installed by SL5 to a version from Dell which supports additional Dell commands. Navigate to Dell's repository directory containing IPMI (OMSA 6.2 IPMI for RHEL5) and download all the .rpm files. Then call yum downgrade, providing each .rpm file in turn. ipmitool will then provide a special delloem call, which gives access to additional commands (see below for an example).
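
For example, from the directory containing the downloaded .rpm files (a sketch; if your yum version refuses local .rpm paths for downgrade, 'rpm -Uvh --oldpackage *.rpm' is an alternative):

for f in *.rpm; do
  yum -y downgrade "$f"
done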

To configure the BMC's to respond to IPMI command line calls from the HN, reboot each one and configure BIOS and remote access setup.

At boot time, press F2 to enter the BIOS configuration. Set the following:

Enter the remote access setup shortly after BIOS boot by typing Ctrl-E. Set the following:

Before exiting the remote access setup, or as soon as possible afterwards, tell the HN to listen for DHCP requests coming from the BMC. As root on the HN:

  1. insert-ethers
  2. Select IPMI
  3. After Rocks recognizes the BMC(s), exit with the F9 key as they do not Kickstart.

To test that it's worked, execute from the HN:

ipmitool -H ipmi-x-y -U ... delloem sysinfo
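
Once the BMCs respond, other standard ipmitool commands work the same way from the HN (examples; replace ipmi-x-y and the user with your BMC name and the account set during remote access setup, and supply the password with -P or -a):

ipmitool -H ipmi-x-y -U root chassis power status   # query power state
ipmitool -H ipmi-x-y -U root chassis power cycle    # hard power-cycle the node
ipmitool -H ipmi-x-y -U root sel list               # read the system event log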

Upgrade firmware & BIOS

Most firmware updates will require reboot of the node. We use Dell's dell_ft_install package to keep firmware up to date. This process should be done manually and semi-regularly (we update firmware roughly every 9 months) on all nodes.

  1. If not done already from installing OMSA, set up yum to access Dell's yum repository:
    wget -q -O - http://linux.dell.com/repo/hardware/latest/bootstrap.cgi | bash
  2. Get the dell_ft_install package:
    yum install dell_ft_install
    yum install $(bootstrap_firmware)

    You can safely ignore messages about "No package...".
  3. Check that your hardware looks correct:
    inventory_firmware
  4. See what will get updated, being sure to save the output in case of failure:
    update_firmware
  5. Update:
    update_firmware --yes
  6. Be sure to reboot:
    reboot

We choose to do this manually on all nodes and do not put it in the Kickstart due to the possibility of failure. We also choose to do this only while physically present at the cluster.
Troubleshooting: If you get the following error, you can resolve it with the trick from this web page.

Error:

Could not parse output, bad xml for package: dell_dup_componentid_00159

Solution (be sure to restore the original file once done with updates):

cp -pr /etc/redhat-release /etc/redhat-release-orig
echo "Red Hat Enterprise Linux Server release 5 (Tikanga)" > /etc/redhat-release

Apply firmware update...

cp -pr /etc/redhat-release-orig /etc/redhat-release