Archived OSG 0.8.0 installation instructions

These instructions assume you have already installed Pacman and subversion and have a personal grid certificate. We install OSG before BeStMan because BeStMan requires Globus, which comes with OSG. After installing BeStMan, we then go back and configure OSG again with the new SE information.
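
A quick sanity check that these prerequisites are in place before starting (the certificate paths below are the usual ~/.globus defaults, which is an assumption; adjust if yours differ):

which pacman svn
ls -l ~/.globus/usercert.pem ~/.globus/userkey.pem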

Request host certificates:

Follow these instructions.

Install & configure OSG once without BeStMan

The OSG installation and configuration is based on this OSG guide. At the time of this writing, 1.0.0 had not yet been released, so these instructions are for 0.8.0. However, the pacman calls below do not pin a version, so once a newer release is out you may have to specify the version explicitly to get 0.8.0. OSG is built on top of services provided by VDT, so the VDT documentation may also be helpful. As root on the HN (su -):

  1. Prepare for OSG install:
    cd /share/apps
    mkdir osg-0.8.0
    ln -s osg-0.8.0 osg
    mkdir wnclient
    mkdir osg-app
    chmod 775 osg-app
    mkdir /data/se/osg
    chown root:users /data/se/osg
    chmod 775 /data/se/osg
    export VDTSETUP_CONDOR_LOCATION=/opt/condor
  2. Install the WN client (note: we are following install option 2):
    cd /share/apps/wnclient
    pacman -get OSG:wn-client

    Answers to questions:
    Agree to license? y
    Where to store CA files? l (lowercase L, local)
    Cron rotation? n
  3. Install the CE:
    1. Download the CE software:
      cd /share/apps/osg
      pacman -get OSG:ce

      Answer "yall" (yes to all) when asked whether you want to add packages.
    2. Execute from the command line, add to ~root/.bashrc & /etc/skel/.bashrc:
      . /share/apps/osg/setup.sh
      Add to /etc/skel/.cshrc:
      source /share/apps/osg/setup.csh
    3. Get the OSG-CondorG package:
      pacman -get OSG:Globus-Condor-Setup
    4. Get ManagedFork for Condor:
      pacman -get OSG:ManagedFork
    5. srm is hard-coded in many places (some of which we do not control) to be on port 8443. Change the port that CEmon listens on by replacing 8443 in /share/apps/osg/tomcat/v5/conf/server.xml with 7443. We must do this because our HN is both our CE and SE (a sed sketch covering this step and the next appears after this list). The line:
      <Connector port="8893"
      enableLookups="false" redirectPort="8443" debug="0"
      protocol="AJP/1.3" />

      should become:
      <Connector port="8893"
      enableLookups="false" redirectPort="7443" debug="0"
      protocol="AJP/1.3" />
    6. Similarly, change the ports that Apache listens on by replacing 8080 in /share/apps/osg/apache/conf/httpd.conf with 6060. The line:
      Listen 8080
      should become:
      Listen 6060
      And edit the file /share/apps/osg/apache/conf/extra/httpd-ssl.conf to change port 8443 to port 7443. The lines:
      Listen 8443
      RewriteRule (.*) https://%{SERVER_NAME}:8443$1

      should become:
      Listen 7443
      RewriteRule (.*) https://%{SERVER_NAME}:7443$1
  4. Create the needed OSG users and the local EDG GridMap file:
    1. Edit /share/apps/osg/edg/etc/edg-mkgridmap.conf and remove all lines but those for the mis, uscms01, and ops users.
    2. Create these special users:
      useradd -c "Monitoring information service" -n mis -s /bin/true
      useradd -c "CMS grid jobs" -n uscms01 -s /bin/true
      useradd -c "Monitoring from ops" -n ops -s /bin/true
      ssh-agent $SHELL
      ssh-add
      rocks sync config
      rocks sync users

      Setting their shell to /bin/true is a security measure, as these accounts should never actually ssh in.
    3. Place the grid user map where it will be needed later:
      /share/apps/osg/edg/sbin/edg-mkgridmap --output /etc/grid-security/grid-mapfile
  5. Change default services to run and activate them:
    vdt-register-service --name edg-mkgridmap --enable
    vdt-register-service --name gratia-condor --disable
    vdt-register-service --name syslog-ng --disable

    vdt-control --on
    (vdt-control --list will state which services want to run, although it does not show which services are currently running)
  6. Configure OSG
    cd /share/apps/osg/monitoring
    ./configure-osg.sh

    Responses to questions:
    1. OSG group: OSG
    2. OSG hostname: hepcms-0.umd.edu
    3. OSG sitename: umd-cms
    4. VO sponsors: uscms
    5. Policy URL: http://hep-t3.physics.umd.edu/policy.html
    6. Contact name: Marguerite Tonjes
    7. Contact email: mtonjes@nospam.umd.edu (w/o nospam)
    8. City: College Park, MD
    9. Country: USA
    10. Longitude: -76.92
    11. Latitude: 38.98
    12. OSG GRID path: /share/apps/wnclient
    13. OSG APP path: /share/apps/osg-app
    14. OSG DATA path: /data/se/osg
    15. OSG WN_TMP path: /tmp
    16. OSG SITE_READ path: leave empty (hit return)
    17. OSG SITE_WRITE path: leave empty (hit return)
    18. SE available: n (note: we will configure this later, once BeStMan is installed)
    19. MonALISA: n
    20. squid: n
    21. Batch queue OSG_JOB_MANAGER: condor
    22. Condor directory: /opt/condor
    23. Condor config: /opt/condor/etc
    24. ManagedFork: y
    25. WS-GRAM service: y
    26. Syslog-NG: n
    27. CA certificate updater: y
    28. GLExec: n
    29. Subclusters: 1
    30. Name of subcluster: hepcms-0.umd.edu
    31. Processor vendor: GenuineIntel
    32. Processor model: accept what OSG picked up already (hit return)
    33. Clockspeed: accept what OSG picked up already (hit return)
    34. Physical CPUs in WNs: 8
    35. Logical CPUs in WNs: 8
    36. RAM in WNs: 16384
    37. Inbound connectivity: FALSE (we do have inbound connectivity, but advertising it is a security risk)
    38. Outbound connectivity: TRUE
    39. Nodes in subcluster: 8
    40. SRM through GIP: n (note: we will configure this later, once BeStMan is installed)
    41. SE with gsiftp: hepcms-0.umd.edu (this is not the same as our BeStMan SE, which is why we configure it now)
    42. Access path: /data/se/osg (again, this is not a true SE)
  7. Fix GIP:
    mkdir /root/svn
    cd /root/svn
    svn co svn://t2.unl.edu/brian/gip/branches/original/ gip
    cp -r gip/* /share/apps/osg
  8. Fix WS-GRAM:
    chmod 666 /share/apps/osg/globus/globus-fork.log
    chmod 666 /share/apps/osg/globus/container-real.log
  9. Start OSG:
    vdt-control --off
    vdt-control --on
  10. Configure the certificates for the WNs:
    cd /share/apps/wnclient/globus
    unlink TRUSTED_CA
    ln -s /share/apps/osg/globus/share/certificates TRUSTED_CA
  11. The fetch-crl command, which downloads the newest certificate revocation lists, is better run twice a day than the default of once a day. Edit /var/spool/cron/root and change the line:
    20 1 * * * /share/apps/osg-0.8.0/fetch-crl/share/doc/fetch-crl-2.6.2/fetch-crl.cron
    to
    20 1,13 * * * /share/apps/osg-0.8.0/fetch-crl/share/doc/fetch-crl-2.6.2/fetch-crl.cron
  12. Edit the sudoers file:
    Copy the text in /share/apps/osg/monitoring/sudo-setup.txt.
    Edit the sudo file:
    visudo
    Paste the text.
    Write the file and quit:
    :wq!
  13. Turn on ManagedFork:
    /share/apps/osg/vdt/setup/configure_globus_gatekeeper --managed-fork y --server y
  14. If CMSSW is installed (the instructions below are repeated in the CMSSW installation section):
    1. Inform BDII which versions of CMSSW are installed and that we have the slc4_ia32 environment. Edit /share/apps/osg-app/etc/grid3-locations.txt to include the lines:
      VO-cms-slc4_ia32_gcc345
      VO-cms-CMSSW_X_Y_Z CMSSW_X_Y_Z /software/cmssw
      (modify X_Y_Z and add a new line for each release of CMSSW installed)
    2. Add a link to the installation in the osg-app directory:
      cd /share/apps/osg-app
      mkdir cmssoft
      ln -s /software/cmssw cmssoft/cms
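
As referenced in steps 3.5 and 3.6, those port edits can also be applied with sed rather than by hand. A minimal sketch that follows the steps' "replace 8443 with 7443" literally; the -i.bak flag keeps a backup of each file, and diffing against the backups afterwards is a good idea:

cd /share/apps/osg
sed -i.bak 's/8443/7443/g' tomcat/v5/conf/server.xml
sed -i.bak 's/^Listen 8080/Listen 6060/' apache/conf/httpd.conf
sed -i.bak 's/8443/7443/g' apache/conf/extra/httpd-ssl.conf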

Install BeStMan

We give 7TB of space on our disk array to the SE: 1.5TB replica quality (low persistency) & 5.5TB custodial quality (high persistency). We install BeStMan after the initial OSG install to get globus, needed by BeStMan. These instructions are based on the BeStMan admin manual. We choose to install BeStMan on the HN. Newer versions of BeStMan require Java 1.6. We have been using Java 1.5 successfully with BeStMan 2.2.0.11.

Note: This guide uses the BeStMan tarball install; however, pacman can be used instead and is recommended by OSG. The pacman route has a number of advantages, and were we to do the install again, we would use it. Instructions are here. One important caveat: the pacman install sets some default configuration options that we do not want; in particular, we use an existing filesystem and want to tell BeStMan the paths ourselves. After a pacman install, follow the instructions in the OSG guide to reconfigure using the configure options outlined below (some configure options have changed in the latest BeStMan releases; if one is not recognized, check the BeStMan admin manual).
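
For reference, the pacman route would look something like the sketch below. The package name here is an assumption on our part (we did the tarball install); take the authoritative name and cache URL from the linked instructions:

cd /share/apps
mkdir bestman
cd bestman
pacman -get OSG:Bestman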

As root (su -) on the HN:

  1. Prepare the working environment:
    cd /data/se
    mkdir log
    chown root:users log
    chmod 775 log
    mkdir replica
    chown root:users replica
    chmod 775 replica
    mkdir replica/uscms01
    chown uscms01:users replica/uscms01
    mkdir custodial
    chown root:users custodial
    chmod 775 custodial
    mkdir custodial/uscms01
    chown uscms01:users custodial/uscms01
  2. Create a user drop-spot for files brought to the cluster via srm:
    mkdir /data/users/srm-drop
    chown uscms01:users /data/users/srm-drop
    chmod 775 /data/users/srm-drop
  3. Create a cron job to clean files from srm-drop on a daily basis. Edit /var/spool/cron/root and add the line:
    49 02 * * * find /data/users/srm-drop -mtime +7 -type f -exec rm -f {} \;
    This will remove week-old files from /data/users/srm-drop every day at 2:49am.
  4. Download & install BeStMan:
    cd /share/apps
    wget "http://datagrid.lbl.gov/bestman/pkg/bestman-2.2.0.11.tar.gz"
    tar -xzvf bestman-2.2.0.11.tar.gz
    To get the latest release of BeStMan, simply use bestman-latest.tar.gz.
  5. Configure BeStMan:
    cd bestman/setup
    ./configure \
    --with-replica-storage-path=/data/se/replica \
    --with-replica-storage-size=1572864 \
    --with-custodial-storage-path=/data/se/custodial \
    --with-custodial-storage-size=5767168 \
    --with-eventlog-path=/data/se/log \
    --with-cachelog-path=/data/se/log \
    --with-http-port=7070 \
    --with-https-port=8443 \
    --with-globus-tcp-port-range=20000,25000 \
    --with-globus-tcp-source-range=20000,25000 \
    --enable-srmcache-keyword yes \
    --with-srm-name=server \
    --with-globus-location=/share/apps/osg/globus \
    --with-java-home=/share/apps/osg/jdk1.5
    Both the enable-srmcache-keyword and with-srm-name options already default to the values given here; we simply point them out as worthy of further reading in the BeStMan admin manual.
  6. Make BeStMan start whenever the HN starts and start the service now:
    ln -s /share/apps/bestman/sbin/SXXbestman /etc/rc3.d/S97bestman
    /etc/rc3.d/S97bestman start
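
A quick check that the server actually came up: confirm that the http and https ports from the configure step are listening, and look for fresh files in the event-log directory. This check is our own, not from the BeStMan manual:

netstat -tln | grep -E ':(7070|8443) '
ls -lt /data/se/log | head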

The configure script edits the file /share/apps/bestman/conf/bestman.rc, among others. This file can be edited manually without the need to restart the BeStMan service.
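
To see what the configure script actually wrote, a simple grep over bestman.rc works; the key names vary between BeStMan releases, so treat the pattern as illustrative:

grep -iE 'port|path|size' /share/apps/bestman/conf/bestman.rc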

OSG provides older versions of srm-copy and srmcp which do not work with the latest BeStMan. A newer version of the dCache srm-client must be installed and links made to the newer clients. Finally, srmcp uses a few default values that BeStMan doesn't accept, which need to be configured.

As root (su -) on the HN:

  1. Download the latest dCache srm-client:
    wget -P /home/install/contrib/4.3/x86_64/RPMS "http://www.dcache.org/downloads/1.8.0/dcache-srmclient-1.8.0-15p8.noarch.rpm"
  2. Install the dCache srm-client on the HN:
    rpm -ivh /home/install/contrib/4.3/x86_64/RPMS/dcache-srmclient-1.8.0-15p8.noarch.rpm
  3. Link the old installation directories to the new:
    cd $VDT_LOCATION
    mv srm-v1-client srm-v1-client.old
    mv srm-v2-client srm-v2-client.old
    ln -s /opt/d-cache/srm srm-v1-client
    ln -s /share/apps/bestman srm-v2-client
    mkdir srm-v1-client/etc
    cp srm-v1-client.old/etc/config-2.xml srm-v1-client/etc/.
  4. Edit the srm configuration file srm-v1-client/etc/config-2.xml and add the following lines:
    <!-- dCache srm/managerv1.wsdl, BeStMan srm/v2/server.wsdl -->
    <webservice_path> srm/v2/server.wsdl </webservice_path>
    <!-- ONLINE|NEARLINE; NEARLINE by default (dCache), BeStMan requires ONLINE -->
    <access_latency> ONLINE </access_latency>
    This is needed for pulling data from a BeStMan server. You'll also need to edit the <pushmode> tag and set it to true:
    <pushmode> true </pushmode>

    This is needed for third party transfers (srmcp srm://... srm://...) between dCache & BeStMan servers, which is used by PhEDEx.
  5. Place the srm configuration file in the Rocks install directory to be served to the WNs:
    cp srm-v1-client/etc/config-2.xml /home/install/contrib/4.3/x86_64/RPMS/dCache-srm-client-config.x-m-l
  6. Install and configure the dCache srm-client on the WNs:
    1. Edit /home/install/site-profiles/4.3/nodes/extend-compute.xml and add the line:
      <package>dcache-srmclient</package>
      As well as adding the following to the <post> section:
      mkdir /opt/d-cache/srm/etc
      cd /opt/d-cache/srm/etc
      wget http://<var name="Kickstart_PublicHostname"/>/install/rocks-dist/lan/x86_64/RedHat/RPMS/dCache-srm-client-config.x-m-l -O config-2.xml
      cd -
    2. Create the new Rocks distribution:
      cd /home/install
      rocks-dist dist
    3. Reinstall the WNs
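
With the new client in place, a hand test against our own server is worthwhile before moving on. A sketch, run as a normal user with a valid grid proxy; the file names are illustrative, and if your srmcp does not accept -2, try -srm_protocol_version=2 instead:

voms-proxy-init -voms cms
echo "srm transfer test" > /tmp/srm-test.txt
srmcp -2 file:////tmp/srm-test.txt \
  "srm://hepcms-0.umd.edu:8443/srm/v2/server?SFN=/data/users/srm-drop/srm-test.txt"
ls -l /data/users/srm-drop/srm-test.txt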

Configure OSG again with new SE

Now that we have BeStMan installed, which manages our SE, we can configure OSG for our SE.

cd /share/apps/osg/monitoring
./configure-osg.sh

Use the same responses to questions, except for questions 18 & 40, where we will now configure the SE. This will generate a new subset of questions:

18.  SE available: y
18a. Default SE: hepcms-0.umd.edu
40.  Publish SRM information through GIP: Y
40a. OSG sitename of SRM storage element: UMD-CMS-SE
40b. Hostname of SRM SE: hepcms-0.umd.edu
40c. SRM implementation: bestman
40d. Version of bestman: 2.2.0.11
40e. Protocol version for SRM SE access: 2.0.0
40f. Number of gsiftp access points: 1
40g. Access endpoint for gsiftp: gsiftp://hepcms-0.umd.edu:2811
40h. SRM protocol version: 2 (ignore the hint, which is incorrect)
40i. Full path of root directory for storage: /data/se/osg
40j. Simplified VOs: y
40k. Local directory for all VOs: /data/se/osg
40l. Advertise standalone gridftp: y

Answer 41 and 42 as before.

Stop and start the services:
vdt-control --off
vdt-control --on
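
After the restart, the standalone gsiftp server can be spot-checked with globus-url-copy from any machine holding a valid grid proxy (the file names are illustrative):

echo "gsiftp test" > /tmp/gftp-test.txt
globus-url-copy file:///tmp/gftp-test.txt gsiftp://hepcms-0.umd.edu:2811/data/se/osg/gftp-test.txt
ls -l /data/se/osg/gftp-test.txt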

Upgrade and configure RSV monitoring

OSG 0.8.0 comes with RSV V1 (note: this link was out of date at the time of this writing, but provides detailed information that may be helpful regardless). We upgrade RSV to V2 following the instructions in this RSV guide. As root (su -) on the HN:

  1. Create the rsvuser:
    useradd -c "RSV monitoring user" -n rsvuser
    passwd rsvuser
    ssh-agent $SHELL
    ssh-add
    rocks sync config
    rocks sync users
  2. Place your personal usercert.pem and userkey.pem files into ~rsvuser/.globus and give rsvuser ownership:
    chown rsvuser:users ~rsvuser/.globus/*
  3. As rsvuser (su - rsvuser), edit ~/.cshrc and add to the end:
    # RSV
    source $VDT_LOCATION/vdt/etc/condor-devel-env.csh
  4. Create the proxy as rsvuser:
    voms-proxy-init -voms cms -out /home/rsvuser/x509up_rsv -hours 1000
    Make note of the expiration date and be sure to log back on as rsvuser and renew the proxy whenever it is about to expire.
    Note that 1000 hours may be longer than your certificate has left to live. Reduce the number until you no longer receive an error about the proxy expiring after the lifetime of the certificate. (A cron-based expiry check is sketched after this list.)
  5. Return to the root user and turn the services off while we modify them:
    cd $VDT_LOCATION
    vdt-control --off
  6. Follow the instructions in the RSV V1 to V2 upgrade guide to backup the existing RSV directories and get the new files.
  7. Additionally, if you've downloaded the latest version of the vdt-update-certs package (released Sep. 11, 2008), this 'hacked' V2 RSV upgrade will throw an inappropriate error. The probe in question can be replaced safely:
    cd $VDT_LOCATION/osg-rsv/bin/probes
    mv cacert-crl-expiry-probe /tmp/cacert-crl-expiry-probe.old
    wget "http://rsv.grid.iu.edu/downloads/pre-release/0.8.0/cacert-crl-expiry-probe"
    chmod +x cacert-crl-expiry-probe
  8. Make the html-consumer scripts, which generate the monitoring web pages, executable:
    chmod +x $VDT_LOCATION/osg-rsv/bin/consumers/*
  9. Configure RSV:
    $VDT_LOCATION/osg-rsv/setup/configure_osg_rsv \
    --user rsvuser --init --server y \
    --ce-probes --ce-uri "hepcms-0.umd.edu" \
    --srm-probes --srm-uri "hepcms-0.umd.edu" \
    --srm-webservice-path "srm/v2/server" --srm-dir /data/se/osg \
    --grid-type "OSG" --gridftp-probes \
    --setup-for-apache --gratia --verbose --consumers \
    --proxy /home/rsvuser/x509up_rsv
  10. Start the services:
    vdt-control --on
  11. You can verify that the probes are running by executing:
    . $VDT_LOCATION/vdt/etc/condor-devel-env.sh
    condor_q
  12. RSV will create files in /data/se/osg which must be cleaned up regularly by cron. As root (su -) on the HN, edit /var/spool/cron/root and add the line:
    06 05 * * 0 find /data/se/osg -mtime +7 -type f -exec rm -f {} \;
    This will remove week-old files in /data/se/osg every Sunday at 5:06am. RSV, like many other programs, also writes files to /tmp. If you haven't done so already, configure cron to garbage-collect /tmp on all of the nodes.
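
As noted in step 4, the RSV proxy expires and must be renewed by hand; a small cron check can nag before the probes start failing. A sketch for rsvuser's crontab, assuming local mail delivery works (the schedule and the 48-hour threshold are arbitrary choices of ours):

# warn daily at 8am if the RSV proxy has under 48 hours left
0 8 * * * [ "$(voms-proxy-info -file /home/rsvuser/x509up_rsv -timeleft)" -lt 172800 ] && echo "RSV proxy expires soon" | mail -s "renew RSV proxy" root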

Results of the RSV probes will be visible at http://hepcms-0.umd.edu:6060/rsv in 15-30 mins. Further information can be found in $VDT_LOCATION/osg-rsv/logs/probes.

You can enable new probes by:
cp $VDT_LOCATION/osg-rsv/config/sample_metrics.conf $VDT_LOCATION/osg-rsv/config/hepcms-0.umd.edu_metrics.conf
and editing hepcms-0.umd.edu_metrics.conf to turn the desired probes on and off.

Register with the Grid Operations Center (GOC):

This should be done only once per site (we have already done this).

  1. Navigate to the OSG Information Management web portal.
  2. Register as a new user.
  3. Under the Registrations navigation bar, select Resources->Add New Resource
  4. Fill in the following values for our CE:
    Facility: My Facility Is Not Listed (now that we have registered, we select University of Maryland for any new resources we might add later)
    Site: My Site Is Not Listed (again, now that we have registered, we select UMD-CMS)
    Resource Name: umd-cms
    Resource Services: Compute Element, Bestman-Xrootd Storage Element (note: this text may change soon, select whatever is designated as BeStMan or select SRM V2 if BeStMan is removed)
    Fully Qualified Domain Name: hepcms-0.umd.edu
    Resource URL: http://hep-t3.physics.umd.edu
    OSG Grid: OSG Production Resource
    Interoperability: Select WLCG Interoperability BDII (Published to WLCG); do not select WLCG Interoperability Monitoring (SAM)
    GOC Logging: Do not select Publish Syslogng
    Resource Description: Tier-3 computing center. Priority given to local users, but opportunistic use by CMS VO allowed.
  5. Add the primary and secondary system and security admins.
  6. You will receive emails with further instructions.

Once VORS registration has completed, monitoring info will be here. Once BDII registration has completed, monitoring info will be here.