November 2009 Log
Kernel updates, updated to OSG 1.2.3, TCP tuning on the grid node for PhEDEx.
November 12, 2009
MK -- TCP tuning & PhEDEx config
- Many PhEDEx transfers were failing or timing out, leaving a lot of partially transferred files, so I followed the advice of Bill Strossman & Patrick Ford about TCP tuning, detailed here. I also added the option -connectiontimeout 3600 to the srm-copy command in PhEDEx. If this works over the next couple of days and the GN (grid node) doesn't suffer, I'll propagate the new TCP tuning settings to my other nodes and add them to the guide. I should also check later that /etc/sysctl.conf isn't served by 411, since that would overwrite the file on the GN.
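One way to do the later check on /etc/sysctl.conf is to confirm that the tuning keys are still present in the file after a 411 sync. The sketch below is only an illustration: missing_sysctl_keys is a hypothetical helper, and the key names in the example call are common Linux TCP buffer settings picked as an assumption, since this log does not record which settings were actually changed.

```shell
# Hypothetical helper: print each expected key that has no active
# (uncommented) "key = value" line in a sysctl.conf-style file.
missing_sysctl_keys() {
    conf_file="$1"
    shift
    for key in "$@"; do
        # Escape the dots so they match literally in the regex.
        pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
        if ! grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$conf_file"; then
            echo "$key"
        fi
    done
}

# Example call. The key names are assumptions, not the settings from this log.
if [ -r /etc/sysctl.conf ]; then
    missing_sysctl_keys /etc/sysctl.conf \
        net.core.rmem_max net.core.wmem_max \
        net.ipv4.tcp_rmem net.ipv4.tcp_wmem
fi
```

No output from the call means every listed key still has an active line in the file. Any key that is printed was either never added or was lost when the file was rewritten.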
November 8, 2009
MK -- Kernel updates, shot compute & interactive nodes, updated to OSG 1.2.3
- Preparing for Hadoop on the compute nodes, I created new partition tables for them and shot the compute nodes. Interactive nodes used to inherit their partition tables from the default (replace-auto-partition.xml), so I had to modify interactive.xml to keep the old partition tables in place on the interactive nodes. Although the partition tables themselves didn't change on the interactive nodes, the mechanism by which they get them did, so I thought it safest to shoot the interactive nodes as well. Followed the shoot sequence: shoot, install OMSA, start OMSA, configure OMSA, remove tomcat-connectors, update, install tomcat-connectors, reboot, uname, start OMSA.
- interactive-0-1 & compute-0-7 had major problems during yum update that left them completely inaccessible, so I called ipmish -ip ... -u ... -p ... power cycle to force a reboot. They did reboot, but came up in the same flawed state, so I had to shoot them manually at the RDC.
- Updated all nodes and rebooted.
- Updated the OSG software stack to 1.2.3 on both the CE/SE and the WN client.
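After a forced power cycle like the one used on interactive-0-1 and compute-0-7, a small polling loop can show whether a node has come back on the network. This is a minimal sketch, not part of the procedure used here: wait_for_host is a hypothetical helper, and the retry count, the delay and the default ping probe are assumptions. A node that answers ping can still be in a broken state, so this only detects network reachability.

```shell
# Hypothetical helper: retry a probe command until the host answers or
# the attempts run out.
# Usage: wait_for_host HOST [TRIES] [DELAY_SECONDS] [PROBE_COMMAND...]
# The probe is run as: PROBE_COMMAND... HOST
wait_for_host() {
    host="$1"
    tries="${2:-30}"
    delay="${3:-10}"
    if [ "$#" -gt 3 ]; then
        shift 3                   # keep only the caller's probe command
    else
        set -- ping -c 1 -W 2     # assumed default probe: one ping, 2 s timeout
    fi
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if "$@" "$host" >/dev/null 2>&1; then
            echo "$host is reachable after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "$host still unreachable after $tries attempts" >&2
    return 1
}
```

For example, wait_for_host compute-0-7 30 10 would ping the node every 10 seconds for up to 30 attempts and return non-zero if it never answers.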