October-December 2011 Log
Installed and debugged gridftp-hdfs, and tracked down several unexplained outages that turned out to be caused by log files filling up directories. The SE is now operational with 66 TB available.
December 28, 2011
MT
- Grid node /var/log filled up again because of heavy gridftp transfers (large PhEDEx transfer in progress). Regular log rotation did not keep up. Consider changing the log rotation schedule during heavy PhEDEx transfers.
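Since regular log rotation did not keep up, one option is to watch how full the log filesystem is and raise a warning before it fills. The following is a minimal sketch only, not something that was deployed on the cluster. The function names `usage_fraction` and `needs_attention` and the 90% threshold are assumptions made for illustration.

```python
import os
import shutil


def usage_fraction(path):
    """Return the fraction (0.0 to 1.0) of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some virtual filesystems report a size of zero; treat them as empty.
        return 0.0
    return usage.used / usage.total


def needs_attention(path, threshold=0.9):
    """Return True when the filesystem holding `path` is at or above `threshold`.

    The 0.9 default is an arbitrary illustrative value, not a site setting.
    """
    return usage_fraction(path) >= threshold


if __name__ == "__main__":
    # /var/log is the directory named in the log entry; fall back to the
    # current directory on machines where it does not exist.
    log_dir = "/var/log" if os.path.exists("/var/log") else "."
    percent = usage_fraction(log_dir) * 100
    status = "WARNING" if needs_attention(log_dir) else "ok"
    print(f"{status}: filesystem holding {log_dir} is {percent:.1f}% full")
```

Run from cron on the grid node and head node during a large PhEDEx transfer, a check like this would give some notice before /var/log fills, leaving time to rotate the gridftp logs by hand.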
November 30, 2011
MT
- Condor kept failing on the head node. The cause was /var/log on the HN filling up with gridftp transfer log entries (large PhEDEx transfer in progress). Regular log rotation did not keep up. Consider changing the log rotation schedule during heavy PhEDEx transfers.
- Old UPS showed a red light, indicating possible battery failure, although it still drew power fine. Checking it at random later times showed it healthy. Suspect an intermittent glitch, but will not replace it unless it becomes a consistent problem. The UPS has no logging, so the front panel lights have to be checked in person.
November 4, 2011
MT
- Changed the /store soft link on nodes to point to /hadoop/store (instead of the previous /data/se/store). Changed extend-compute.xml to create the link after mounting hadoop, and also updated the Fuse kernel RPM in extend-compute.xml. Have NOT re-created the distro.
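Because the distro has not been re-created, it may be worth checking on each node that the link really points where it should. The following is a minimal sketch under that assumption. The helper name `link_points_to` is hypothetical; the paths are the ones given in the entry above.

```python
import os


def link_points_to(link_path, expected_target):
    """Return True if `link_path` is a symlink that resolves to `expected_target`."""
    if not os.path.islink(link_path):
        # Missing path, or a real directory rather than a link.
        return False
    return os.path.realpath(link_path) == os.path.realpath(expected_target)


if __name__ == "__main__":
    # Paths taken from the log entry: /store should now point to /hadoop/store.
    if link_points_to("/store", "/hadoop/store"):
        print("/store -> /hadoop/store: ok")
    else:
        print("/store does not point to /hadoop/store")
```

A node that was reinstalled before extend-compute.xml was changed, or where hadoop was not mounted when the link was made, would show up as a failure here.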
October 9-12, 2011
MT, MK
- Installed and debugged gridftp-hdfs.
- Done: PhEDEx settings in site-config.xml and CVS version, config.ini changes to take into account /hadoop directories, CMSSW picked up site-config.xml, Bestman configured to use /hadoop. CRAB tests with both gLite and glidein work.
- Problems:
  - SAM tests did not run from Oct. 9; they came back after the full-cluster reboot on Oct. 12.
  - The RSV probe for gridftp-simple tests randomly fails with a Java error according to the log. I suspect recent GN updates conflict with the JDK version that Hadoop is happiest with. This has not been repaired yet.