March-September 2010 Log
New site certs, enabled public_html, modified NFS /data mount settings, updated software.
September 22, 2010
MK -- Installed CRAB_2_7_4_patch1, updated rpms
September 21, 2010
MK -- Installed CRAB_2_7_4, updated user guide
August 31, 2010
MK -- Updated packages
August 26, 2010
MK -- Told apache to use UserDir
- Users can now mkdir ~/public_html, put stuff there, and have it show up at http://hepcms-hn.umd.edu/~username/... . Added to general guide.
- Additionally configured apache to list directory contents if an index.html file isn't present.
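For users setting this up, the steps can be sketched as a small shell function. The permission changes are an assumption: they matter only if apache runs as a different user and the home directory is not already traversable, which depends on the site's defaults. The function name setup_public_html is illustrative.

```shell
# Create public_html under a home directory and open it up for apache.
# Assumption: apache runs as another user, so it needs execute (traverse)
# permission on the home directory and read access to public_html.
setup_public_html() {
    home_dir="${1:-$HOME}"
    mkdir -p "$home_dir/public_html"
    chmod o+x "$home_dir"               # allow traversal into the home directory
    chmod 755 "$home_dir/public_html"   # world-readable and listable
}
```

Calling setup_public_html with no argument acts on the caller's own $HOME. Files placed in public_html afterwards also need to be world-readable to be served.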
June 25, 2010
MK -- rpm updates, NFS mount of /data, reboot of cluster
- Ran through power down/up sequence with Jeff, Jon, and Nick. Took the opportunity to update rpms and fiddle with the NFS mount settings on /data.
June 22, 2010
MK -- Installed subversion-perl
- Jeff got complaints that our subversion client is out of date. Checked what is installed at lxplus and saw they also have subversion-perl installed, so I installed that. No dependencies were required by yum.
June 9, 2010
MK -- Installed CRAB_2_7_2 on INs
- Linked as /scratch/crab/current. Also added to Kickstart.
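A minimal sketch of how the "current" link can be repointed when a new release is installed. The release directory name under /scratch/crab is an assumption, since the log records only the link itself, and link_current is an illustrative helper name.

```shell
# Point a "current" symlink, next to the release directory, at that release.
# -n makes ln replace an existing "current" symlink instead of descending into it.
link_current() {
    release_dir="$1"
    ln -sfn "$release_dir" "$(dirname "$release_dir")/current"
}

# Example (install path is an assumption):
#   link_current /scratch/crab/CRAB_2_7_2
```

Keeping a stable path like this means user setup scripts do not have to change with each CRAB release.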
June 8, 2010
MK -- Installed tkinter on INs
- Jeff uses python GUIs which utilize tkinter. Called "yum install tkinter" on both INs, which also brought in tix. Added to IN Kickstart.
June 7, 2010
MK -- mounted /data on WNs
- All the WNs once again lost their mount of /data. Suspect an unreported power blip at the RDC was responsible, since all nodes on UPS were fine. Mounted /data again on all the WNs and queried the hypernews for suggestions. During next cluster downtime, plan to implement the following changes:
- In WNs /etc/fstab, use:
grid-0-0:/data /data nfs rw,bg,nfsvers=3,rsize=32768,wsize=32768 0 0
- When calling mount /data on the WNs, use:
mount /data -o noatime
- In GN /etc/sysconfig/nfs, use:
RPCNFSDCOUNT=80
(David suggested either 32 or 48)
- It's not clear whether the problem is a failure to mount /data during WN reboot, or whether the mount is spontaneously lost during power fluctuations without the nodes actually fully rebooting. If using "bg" in fstab doesn't fix this, will need to write a cron script to monitor /data on all the WNs.
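The cron script mentioned above was not written at the time of this entry. One possible shape for it is sketched below, not a tested production script. The function names, the logger tag, and the script path in the crontab comment are all hypothetical. It checks the kernel mount table for an NFS entry at the mount point and falls back to a plain mount, which picks up its options from /etc/fstab.

```shell
#!/bin/sh
# Return 0 if mount point $1 is listed as an nfs mount in the mounts table $2
# (defaults to /proc/mounts), 1 otherwise.
is_nfs_mounted() {
    awk -v mp="$1" '$2 == mp && $3 ~ /^nfs/ { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Remount $1 if it is missing, logging the event to syslog first.
check_and_remount() {
    if is_nfs_mounted "$1"; then
        return 0
    fi
    logger -t data-mount-check "$1 not mounted, attempting remount"
    mount "$1"
}

# Only act when a mount point is given on the command line, e.g. from
# root's crontab (script path is an assumption):
#   */10 * * * * /root/check_data_mount.sh /data
if [ "$#" -gt 0 ]; then
    check_and_remount "$1"
fi
```

Reading /proc/mounts rather than /etc/mtab avoids trusting a file that can be stale after an unclean event such as a power blip.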
June 1, 2010
MK -- Rebooted compute nodes
- Due to a power blip in the RDC on May 27, the compute nodes either rebooted or entered some odd state where they were no longer mounting /data. Rebooted all compute nodes as it's unknown which other services might be in an odd state. After reboot, compute nodes still hadn't mounted /data, so mounted it manually.
May 26, 2010
MK -- Installed perl-libwww-perl.noarch, gv.x86_64 on INs
- To use the CMS SVN repo for papers, users must call
eval `./notes/tdr runtime -csh`, which actually uses a perl script to set up the entire environment. INs originally had the error:
Can't locate LWP/Simple.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at ./notes/tdr line 1311.
BEGIN failed--compilation aborted at ./notes/tdr line 1311.
Google search indicated that the LWP::Simple perl library is packaged under a name like libwww-perl. yum list available | grep perl listed perl-libwww-perl.noarch, so installed it on both nodes. It came with a few other rpms, and the environment eval subsequently worked. Edited interactive.xml and created the new distro.
- Similarly, pdf viewers seem to have disappeared, so installed GhostView (gv.x86_64). ggv & gpdf don't seem easily accessible, so didn't install them. Added to Kickstart.
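For future "Can't locate ... in @INC" errors like the LWP::Simple one above, a web search is not strictly needed, since perl rpms conventionally advertise each module as a capability of the form perl(Module::Name). A minimal sketch follows; the helper names are illustrative, and this is not what was done at the time.

```shell
# Build the RPM capability name that perl module packages conventionally provide.
perl_capability() {
    printf 'perl(%s)\n' "$1"
}

# Return 0 if the system perl can load the given module, non-zero otherwise.
perl_module_loads() {
    perl -M"$1" -e 1 2>/dev/null
}

# Example, on an IN:
#   perl_module_loads LWP::Simple || yum provides "$(perl_capability LWP::Simple)"
```

The yum provides query only reports which package supplies the module; installing it is still a separate step.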
May 24, 2010
MK -- Updated & reconfigured OMSA on GN
- Didn't install srvadmin-all on the GN after last Kickstart, so only got the minimum OMSA services that come on all the WNs/INs. Called yum install srvadmin-all and restarted OMSA services. Also reconfigured using script /share/apps/OMSA/OMSAconfigure.sh as well as using the web interface to configure storage monitoring.
May 6, 2010
MK -- Updated xorg-x11-server, changed BDII reporting interval
- OSG security announced vulnerability. All nodes updated.
- Burt requested we change BDII reporting interval in $VDT_LOCATION/glite/etc/glite-ce-monitor/cemonitor-config.xml. Changed one to 299, left the other at 300, restarted all OSG services. Both BDII pages showed updates from UMD and force run of RSV probes showed update on MyOSG.
April 6, 2010
MK -- Installed CRAB_2_7_1 and gLite-UI 3.2. Restored GN backup script and cron jobs.
- Unable to get 2_7_1 to work with gLite-UI 3.2, though it does work at least partially with 3.1.
- Forgot to restore the GN backup script and other cron jobs (mostly PhEDEx directory cleanup) when I reinstalled the GN. All restored now.
March 22, 2010
MK -- New site certs
- Got new site & http certs (rsv cert already taken care of due to earlier revocation). Stopped and started OSG services to get them running on new certs. Force-run RSV probes were all successful.