July 2009 log
Fixed condor configuration so that jobs can be suspended but not evicted for performance or priority reasons. Installed SuperB FastSim software temporarily, pending a decision from Nick on whether to allow the SuperB group permanent installation access.
July 23, 2009
MK -- Installed SuperB FastSim software on phedex node
- To temporarily allow SuperB folks to compile their code, installed software according to these instructions.
- Will revisit this permissions issue with Nick.
July 21, 2009
MK -- Garbage collection on WNs
- Added weekly garbage collection of directories inside /tmp on the WNs, and edited the kickstart file so that newly installed nodes get the same cleanup.
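The log does not record the actual cleanup script, so the following is only a minimal sketch of what a weekly /tmp cleanup could look like. The function names, the 7-day age threshold, and the dry-run default are all assumptions made for illustration, not the settings used on the WNs.

```python
import os
import shutil
import time

SECONDS_PER_DAY = 86400


def stale_dirs(root, max_age_days, now=None):
    """Return top-level directories under root whose mtime is older than max_age_days.

    Note: a directory's mtime only changes when entries are added or removed
    directly inside it, so this is a rough staleness test (assumption).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for entry in os.scandir(root):
        # Skip plain files and symlinks; only real directories are candidates.
        if not entry.is_dir(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            stale.append(entry.path)
    return sorted(stale)


def collect_garbage(root="/tmp", max_age_days=7, dry_run=True):
    """Remove stale directories under root; with dry_run=True only report them."""
    victims = stale_dirs(root, max_age_days)
    if not dry_run:
        for path in victims:
            shutil.rmtree(path, ignore_errors=True)
    return victims


if __name__ == "__main__":
    # Dry run by default, so running this by hand deletes nothing.
    for path in collect_garbage():
        print("would remove", path)
```

A script like this could be run from a weekly cron job on each WN, with the same cron entry added in the kickstart %post section so reinstalled nodes pick it up.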
July 15, 2009
MK -- Tweaked condor configuration
- Condor jobs were still getting evicted. Suspension is OK: the job stays in memory, so CMSSW jobs don't have to restart from 0 every time they're suspended. Tweaked condor configuration to:
PREEMPTION_REQUIREMENTS = False
NEGOTIATOR_CONSIDER_PREEMPTION = False
CLAIM_WORKLIFE = 300
WANT_SUSPEND = True
SUSPEND = ( (CpuBusyTime > 2 * $(MINUTE)) \
&& $(ActivationTimer) > 300 )
CONTINUE = $(CPUIdle) && ($(ActivityTimer) > 10)
PREEMPT = False
Initially set CLAIM_WORKLIFE to 1, but condor behaved erratically: the number of running jobs and claimed slots fluctuated wildly. Suspect that jobs weren't actually starting and the schedd was relinquishing the claim before the job ran. Set it to 300 and jobs now behave much more sensibly. The schedd does seem to relinquish claims as load increases, but not at ludicrously low numbers. Restarted condor services a couple of times and ran tests; the latest jobs behaved well.
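To make the SUSPEND and CONTINUE expressions above easier to reason about, here is a toy Python model of the two conditions. It is a sketch, not how condor evaluates ClassAds. It assumes the stock condor_config definitions: MINUTE is 60 seconds, ActivationTimer is seconds since the job started, ActivityTimer is seconds in the current activity, and CPUIdle means the non-condor load average is at or below a background load of 0.3. If the local config overrides any of these, the numbers differ.

```python
MINUTE = 60  # seconds; assumed to match $(MINUTE) in the stock condor_config


def should_suspend(cpu_busy_time, activation_timer):
    """Model of: (CpuBusyTime > 2 * $(MINUTE)) && $(ActivationTimer) > 300.

    cpu_busy_time: seconds the CPU has been busy with non-condor work.
    activation_timer: seconds since the job was started on the slot (assumed).
    """
    return cpu_busy_time > 2 * MINUTE and activation_timer > 300


def should_continue(non_condor_load, activity_timer, background_load=0.3):
    """Model of: $(CPUIdle) && ($(ActivityTimer) > 10).

    CPUIdle is assumed to be (non-condor load average <= background_load),
    with 0.3 taken from the stock defaults rather than from this cluster.
    """
    cpu_idle = non_condor_load <= background_load
    return cpu_idle and activity_timer > 10


if __name__ == "__main__":
    # A job that has run 10 minutes on a machine busy for 3 minutes is suspended.
    print(should_suspend(cpu_busy_time=180, activation_timer=600))
    # It resumes once the machine has been quiet for more than 10 seconds.
    print(should_continue(non_condor_load=0.1, activity_timer=15))
```

In this model a job is protected from suspension for its first 300 seconds, and with PREEMPT = False a suspended job waits in memory until the CONTINUE condition holds rather than being evicted.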
July 2, 2009
MT -- Removed CMSSW_3_1_0_pre7, installed CMSSW_3_1_0
July 1, 2009
MK -- Installed CRAB_2_6_0