PhEDEx

Description Install, configure, run, and commission links for PhEDEx.
Notes

We configure PhEDEx to use srm calls directly instead of FTS. FTS is the service most commonly used by Tier 1 and Tier 2 sites because it tends to be more scalable. FTS requires gLite, which may conflict with an existing CRAB gLite-UI install; if you use FTS and have such an install, put PhEDEx on a different node. We install PhEDEx on our grid node.

This page is about 99% out of date as of 2015 and will be removed and replaced shortly. ADMINS of hepcms: please consult our private Google pages for documentation.

Last modified September 10, 2015

Site registration

Description Register in the CMS SiteDB and Savannah.
Dependencies - A personal grid certificate
Notes You can register your site in SiteDB prior to OSG GOC registration. However, once OSG GOC registration is complete, you should change your SAM name to your OSG GOC name by filing a new Savannah ticket.
Guides - CMS new site twiki
  1. Create a Savannah ticket with your user public key (usercert.pem) and with the information:
    1. Site name: UMD
    2. CMS name: T3_US_UMD
    3. SAM name: umd-cms (our OSG GOC registration name)
    4. City/Country: College Park, MD, USA
    5. Site tier: Tier 3
    6. SE host: hepcms-0.umd.edu
    7. SE kind: disk
    8. SE technology: BeStMan
    9. CE host: hepcms-0.umd.edu
    10. Associate T1: FNAL
    11. Grid type: OSG
    12. Data manager: Marguerite Tonjes
    13. PhEDEx contact: Marguerite Tonjes
    14. Site admin: Marguerite Tonjes
    15. Site executive: Nick Hadley
  2. Email the persons listed here and ask them to add our site to the PhEDEx database, including a link to the Savannah ticket (CERN phonebook).
  3. Once someone has responded to say UMD has been put into SiteDB, go to https://cmsweb.cern.ch/sitedb/sitedb/sitelist/
    1. Log in with your CERN hypernews user name and password
    2. Under Tier 3 centres, click on the T3_US_UMD link
    3. Click on "Edit site information" and specify OSG as our Grid Middleware, our site home page as http://hep-t3.physics.umd.edu and our site logo URL as http://hep-t3.physics.umd.edu/images/umd-logo.gif
    4. We can also add/edit user information by clicking on "Edit site contacts":
      1. Click on "edit" to edit an existing user's info
      2. Click on "Add a person with a hypernews account to site" to add someone new
      3. Then click on the first letter of the user's last name. Note that many users are listed by their middle name instead of their last.
      4. Find the user in the list, and click "edit"
      5. A new page will appear. Click on the appropriate values ("Site Admin", "Data Manager", etc.) in the last row of the new page (for the Tier 3), and click "Edit these details" to save.
    5. Under Site Configuration, select "Edit site configuration":
      1. CE FQDN: hepcms-0.umd.edu
      2. SE FQDN: hepcms-0.umd.edu
      3. PhEDEx node: T3_US_UMD
      4. GOCDB ID: leave blank
      5. Install development CMSSW releases?: Do not check
      6. Site installs software manually?: Check

It's also a good idea to join the Savannah "CMS Computing Infrastructure Support" group so that Savannah support tickets related to your site can be assigned to you. Navigate to the Savannah group page, type cmscompinfrasup under "Request for Inclusion" and click the "Search Group(s)" button. State you're the sysadmin of T3_US_XXX and submit the request. After your request has been approved, submit a ticket selecting category "Facilities Operations" and assign it to "cmscompinfrasup-facilities" asking for a squad for T3_US_XXX, specifying who should be assigned to the squad.

Install on the GN

Description Install & configure PhEDEx on the grid node.
Dependencies - Kerberos installed and configured (loose dependency - needed only to commit PhEDEx configuration to CMS CVS repository)
- CMSSW installed
- Hadoop installed and running
- OSG SE installed, configured, running, and passing RSV tests
Notes These instructions are for PhEDEx 3.3.2, though they can be adapted for later releases. We install the software directly on the GN in /localsoft/phedex instead of on the HN in /home/phedex.
Guides - PhEDEx installation script from Doug Johnson (hypernews access required)
- PhEDEx installation
- PhEDEx agent configuration
- PhEDEx site configuration
- Examples of configuration
- CMS new site twiki
- TCP tuning

Prepare for the PhEDEx install. On the HN as root (su -):

  1. Create the PhEDEx user:
    useradd -c "PhEDEx" -n -s /bin/bash phedex
    passwd phedex
    ssh-agent $SHELL
    ssh-add
    rocks sync config
    rocks sync users
  2. Change ownership of the Hadoop directory that PhEDEx will use:
    chown phedex:users /hadoop/store
    chmod 775 /hadoop/store
  3. And as root on the GN:
    mkdir /localsoft/phedex
    chown phedex:users /localsoft/phedex
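
Before moving on, it can help to confirm that the ownership and mode changes took effect. The following is a minimal sketch of such a check, not part of Rocks or PhEDEx: dir_matches is a hypothetical helper name, and it assumes GNU stat (the -c option), which is what Scientific Linux ships.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not a PhEDEx or Rocks tool).
# Succeeds only if DIR exists and has the expected owner and group.
# If a fourth argument is given, the octal mode must match as well.
# Usage: dir_matches DIR OWNER GROUP [MODE]
dir_matches() {
    local dir=$1 owner=$2 group=$3 mode=${4:-}
    [ -d "$dir" ] || return 1
    # GNU stat: %U = owner name, %G = group name
    [ "$(stat -c '%U:%G' "$dir")" = "$owner:$group" ] || return 1
    # Only compare the mode when one was requested (%a = octal mode).
    [ -z "$mode" ] || [ "$(stat -c '%a' "$dir")" = "$mode" ]
}
```

For the directories prepared above, one might run dir_matches /hadoop/store phedex users 775 on the HN and dir_matches /localsoft/phedex phedex users on the GN, and investigate if either returns a non-zero status.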

As phedex on the GN:

  1. Set up the environment:
    cd /localsoft/phedex
    mkdir 3.3.2
    ln -s 3.3.2 current
    cd 3.3.2
    mkdir -p state logs sw gridcert
    chmod 700 gridcert
    export sw=$PWD/sw
  2. Install PhEDEx following these instructions. Some notes:
    1. We set myarch=slc5_amd64_gcc434
    2. We set version=3_3_2
    3. We use the srm client already installed and network mounted on the OSG CE (we tell PhEDEx to grab the environment in the ConfigPart.Common file when we configure PhEDEx below).
    4. We use the JDK already installed and network mounted on the OSG CE. No special modifications to PhEDEx to use it were required.
  3. Configure PhEDEx following these (1, 2) instructions. Examples of site configuration can be found here. Our local site configuration can be found here. Some notes:
    1. $PHEDEX_BASE refers here to the installation directory: /localsoft/phedex/current. The environment variable will be set by sourcing configuration files when starting services the first time in the next section.
    2. Our site name is T3_US_UMD, so our configuration directories are
      $PHEDEX_BASE/SITECONF/T3_US_UMD/PhEDEx
      and
      $PHEDEX_BASE/SITECONF/T3_US_UMD/JobConfig
    3. We had to modify more than just PhEDEx/storage.xml and JobConfig/site-local-config.xml, so be sure to check all the files in the directories for differences from the default templates.
    4. The JobConfig directory is not actually needed by PhEDEx; it is needed by CMSSW. However, since site-local-config.xml depends on the content of storage.xml, we configure it now. It's harmless to leave the JobConfig directory inside your $PHEDEX_BASE/SITECONF/T3_US_XXX directory.
    5. CMSSW jobs also need the files in your SITECONF directory. Copy the entire SITECONF directory to the $CMS_PATH directory:
      su -
      cp -r /localsoft/phedex/current/SITECONF /sharesoft/cmssw/.
      cp -r /sharesoft/cmssw/SITECONF/T3_US_UMD /sharesoft/cmssw/SITECONF/local
      chown -R cmssoft:users /sharesoft/cmssw/SITECONF
      logout

      Some sites use different storage.xml files in their $PHEDEX_BASE and $CMS_PATH directories when they don't have a locally installed storage element. Since we have a storage element, ours are the same.
    6. After starting services (detailed in the next section) for the first time, you can test your storage.xml file by:
      cd /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx
      eval `/localsoft/phedex/current/PHEDEX/Utilities/Master -config /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx/Config.Prod environ`

      Test srmv2 mapping from LFN to PFN:
      /localsoft/phedex/current/sw/slc5_amd64_gcc434/cms/PHEDEX/PHEDEX_3_3_2/Utilities/TestCatalogue -c storage.xml -p srmv2 -L /store/testfile
      Test srmv2 mapping from PFN to LFN:
      /localsoft/phedex/current/sw/slc5_amd64_gcc434/cms/PHEDEX/PHEDEX_3_3_2/Utilities/TestCatalogue -c storage.xml -p srmv2 -P srm://hepcms-0.umd.edu:8443/srm/v2/server?SFN=/hadoop/store/testfile
      Other transfer types can be tested by changing the protocol tag srmv2 to direct, srm, or gsiftp and changing the PFN or LFN argument to match. PhEDEx services don't need to be running for these tests, but the first time PhEDEx is started it creates some of the directories the tests need.
  4. Submit a Savannah ticket for a CVS space under /COMP/SITECONF named T3_US_UMD (already done for UMD). Once you receive the space, upload your site configuration to CVS:
    /usr/kerberos/bin/kinit -5 username@CERN.CH
    cvs co COMP/SITECONF/T3_US_UMD
    cp -r /localsoft/phedex/current/SITECONF/T3_US_UMD/* COMP/SITECONF/T3_US_UMD/.
    cd COMP/SITECONF/T3_US_UMD
    cvs add PhEDEx
    cvs add PhEDEx/*
    cvs add JobConfig
    cvs add JobConfig/site-local-config.xml
    cvs commit -R -m "T3_US_UMD PhEDEx & CMSSW configuration" PhEDEx JobConfig

  5. Once your initial registration request is satisfied, you will receive three emails titled "PhEDEx authentication role for Prod (Debug, Dev)/UMD." Copy and paste the commands in the email to the command line. Copy the text output for each into the file $PHEDEX_BASE/gridcert/DBParam. Each text output should look something like (exact values removed for security):
    Section Prod/UMD
    Interface Oracle
    Database db_not_shown_here
    AuthDBUsername user_not_shown_here
    AuthDBPassword LettersAndNumbersNotShownHere
    AuthRole role_not_shown_here
    AuthRolePassword LettersAndNumbersNotShownHere
    ConnectionLife 86400
    LogConnection on
    LogSQL off
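
Since each of the three instances needs its own section in DBParam, a quick check that all three were pasted in can save a failed agent start. The sketch below assumes the "Section <Instance>/UMD" layout shown in the example above; dbparam_sections and dbparam_complete are hypothetical helper names, not PhEDEx utilities.

```shell
#!/bin/bash
# Hypothetical helpers (assumptions, not part of PhEDEx).

# Print the section names found in a DBParam file, one per line.
# Leading indentation in the file is ignored by awk's field splitting.
dbparam_sections() {
    awk '$1 == "Section" { print $2 }' "$1"
}

# Succeed only if Prod, Debug, and Dev sections exist for the given site
# label (the part after the slash, e.g. UMD).
# Usage: dbparam_complete FILE SITE_LABEL
dbparam_complete() {
    local file=$1 site=$2 inst
    for inst in Prod Debug Dev; do
        dbparam_sections "$file" | grep -qx "$inst/$site" || {
            echo "missing section: $inst/$site" >&2
            return 1
        }
    done
}
```

With the paths used in this guide, the call would be dbparam_complete /localsoft/phedex/current/gridcert/DBParam UMD. The check only looks at section names; it does not validate the credentials themselves.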

PhEDEx transfers 1-2GB files, which tend to require some tuning of TCP settings to avoid mid-transfer failure. As root on the GN, follow the instructions in ESNet's TCP tuning guide to modify various settings for optimal network performance. Note that when confirming buffer sizes are at least 4MB, you need only check:

/sbin/sysctl -a | grep "net.core.*mem_max"
/sbin/sysctl -a | grep "net.ipv4.*mem"

(net.core.*max will return some entries that aren't related to memory buffer size). It doesn't hurt to follow the TCP tuning guide on other nodes as well, but the GN is the only node for which it's necessary.
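
The buffer-size check can also be scripted. The sketch below reads sysctl output on standard input and prints any limit whose maximum is below a given number of bytes. The name small_buffers is hypothetical, and the choice of which keys to inspect (net.core.rmem_max, net.core.wmem_max, net.ipv4.tcp_rmem, net.ipv4.tcp_wmem) is an assumption about what matters here, not something taken from the tuning guide. Keys such as net.ipv4.tcp_mem are deliberately skipped because they are expressed in pages rather than bytes.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the ESNet guide).
# Reads "key = value ..." lines, as printed by sysctl, on stdin and prints
# "key max" for each buffer limit whose maximum is below MIN_BYTES.
# For tcp_rmem and tcp_wmem the three values are min, default, max,
# so the maximum is the fifth whitespace-separated field.
# Usage: sysctl -a | small_buffers MIN_BYTES
small_buffers() {
    awk -v min="$1" '
        $1 ~ /^net\.core\.[rw]mem_max$/ && $3 < min { print $1, $3 }
        $1 ~ /^net\.ipv4\.tcp_[rw]mem$/ && $5 < min { print $1, $5 }
    '
}
```

On the GN this could be run as /sbin/sysctl -a 2>/dev/null | small_buffers 4194304, where 4194304 is 4 MB in bytes. No output means none of the inspected limits is below that value.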

Get proxy & start services

Description Get a proxy for PhEDEx to use and start PhEDEx services.
Dependencies - A personal grid certificate
- PhEDEx installed and configured
Notes After reboot of the grid node, the grid certificate and proxy should still be valid, but PhEDEx services aren't configured to start automatically.
Guides  

On the grid node:

  1. Copy your personal usercert.pem and userkey.pem grid certificate files into ~phedex/.globus and give the phedex user ownership (you may need to do this on the HN, which mounts that directory):
    chown phedex:users ~phedex/.globus/*
  2. As phedex (on grid node), create your grid proxy:
    voms-proxy-init -voms cms -hours 350 -out /localsoft/phedex/current/gridcert/proxy.cert
    Be sure to make note of when the proxy will expire and log on to renew it before then. Some sites will not accept proxies older than a week, so if you have many links, you will probably need to renew your proxy every week.
  3. Now start the services. To be extra safe, each service should be started in a new shell, though in most cases, executing the following in sequence should be OK:
    1. Start the Dev service instance:
      cd /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx
      eval `/localsoft/phedex/current/PHEDEX/Utilities/Master -config /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx/Config.Dev environ`
      /localsoft/phedex/current/PHEDEX/Utilities/Master -config Config.Dev start

      This service can be stopped by changing the command start to stop.
    2. Start the Debug service instance:
      cd /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx
      eval `/localsoft/phedex/current/PHEDEX/Utilities/Master -config /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx/Config.Debug environ`
      /localsoft/phedex/current/PHEDEX/Utilities/Master -config Config.Debug start

      This service can be stopped by changing the command start to stop.
    3. Start the Prod service instance:
      cd /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx
      eval `/localsoft/phedex/current/PHEDEX/Utilities/Master -config /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx/Config.Prod environ`
      /localsoft/phedex/current/PHEDEX/Utilities/Master -config Config.Prod start

      This service can be stopped by changing the command start to stop.
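
Because the three start sequences differ only in the instance name, they can be wrapped in a small helper. This is a sketch under stated assumptions rather than a script we actually deploy: phedex_master and phedex_all are hypothetical names, the PHEDEX_BASE and SITE defaults are the values used in this guide, and the helper only prints the commands unless DRY_RUN=0 is set. Each instance runs in its own subshell, which approximates the advice above to start each service in a new shell.

```shell
#!/bin/bash
# Defaults taken from this guide; override in the environment if needed.
PHEDEX_BASE=${PHEDEX_BASE:-/localsoft/phedex/current}
SITE=${SITE:-T3_US_UMD}

# Hypothetical helper: run (or, by default, just print) the Master
# command for one instance.
# Usage: phedex_master INSTANCE ACTION   (e.g. phedex_master Dev start)
phedex_master() {
    local inst=$1 action=$2
    local master=$PHEDEX_BASE/PHEDEX/Utilities/Master
    local conf=$PHEDEX_BASE/SITECONF/$SITE/PhEDEx/Config.$inst
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Dry run: show what would be executed and change nothing.
        echo "$master -config $conf $action"
        return 0
    fi
    # Subshell: the environment loaded for one instance does not
    # leak into the next one.
    (
        cd "$PHEDEX_BASE/SITECONF/$SITE/PhEDEx" || exit 1
        eval "$("$master" -config "$conf" environ)"
        "$master" -config "$conf" "$action"
    )
}

# Apply one action to all three instances, in the order used above.
phedex_all() {
    local inst
    for inst in Dev Debug Prod; do
        phedex_master "$inst" "$1" || return 1
    done
}
```

Running phedex_all start prints the three commands for review; DRY_RUN=0 phedex_all start would execute them, and replacing start with stop shuts the instances down.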

Clean logs

Description Use logrotate to clean old PhEDEx logs.
Dependencies - PhEDEx services started at least once
Notes PhEDEx does not clean up its own logs. The first time you start the PhEDEx services, it will create the log files. We use logrotate in cron to clean them monthly, as well as to retain two months of old logs.
Guides - Logrotate guide

On the grid node as user phedex:

    1. Create the backup directories:
      mkdir /localsoft/phedex/current/Dev_T3_US_UMD/logs/old
      mkdir /localsoft/phedex/current/Debug_T3_US_UMD/logs/old
      mkdir /localsoft/phedex/current/Prod_T3_US_UMD/logs/old
    2. Create the logrotate steering file, /home/phedex/phedex.logrotate.
    3. Run logrotate from the command line to check that it works:
      /usr/sbin/logrotate -f /home/phedex/phedex.logrotate -s /home/phedex/logrotate.state
    4. As root (su -), automate by editing /var/spool/cron/phedex and adding the line:
      52 01 * * 0 /usr/sbin/logrotate /home/phedex/phedex.logrotate -s /home/phedex/logrotate.state > /dev/null 2>&1
      This directs logrotate to run every Sunday at 1:52 as the user phedex.
    5. Additionally, the Prod download-remove agent doesn't clean up its job logs. As root, edit /var/spool/cron/phedex and add the line:
      02 00 * * 0 find /localsoft/phedex/current/Prod_T3_US_UMD/state/download-remove/*log -mtime +7 -type f -exec rm -f {} \; > /dev/null 2>&1
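
Before trusting a cron entry that deletes files, it can be worth previewing what it would match. The sketch below mirrors the tests in the cleanup line above (names ending in log, regular files, modified more than N days ago) but prints the matches instead of removing them. The name stale_logs is hypothetical, and the use of -maxdepth 1 with -name is an approximation of the shell glob in the cron line, assuming the job logs sit directly in that directory.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of PhEDEx).
# Print regular files directly inside DIR whose names end in "log" and
# whose modification time is more than DAYS days in the past.
# Nothing is deleted; this is a preview only.
# Usage: stale_logs DIR DAYS
stale_logs() {
    local dir=$1 days=$2
    find "$dir" -maxdepth 1 -name '*log' -mtime +"$days" -type f -print
}
```

For example, stale_logs /localsoft/phedex/current/Prod_T3_US_UMD/state/download-remove 7 lists the files the weekly job would be expected to remove.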

Commission links

Description Commission links from other sites to enable downloading of official CMS data to your site from the other site.
Dependencies - PhEDEx services running
Notes To download data using PhEDEx, a site must have a Production link originating from one of the nodes hosting the dataset. To create each link, sites must go through a LoadTest/link commissioning process.
Guides - Debugging data transfers
    1. The first link you'll want to commission is from the T1_US_FNAL_Buffer. To commission from FNAL, send a request to begin the link commissioning process to hn-cms-ddt-tf@cern.ch.
    2. For non-FNAL sites, contact the PhEDEx admins for that site as listed in SiteDB (requires Firefox). Ask them if a link is OK and if so, to please create a LoadTest. Create a Savannah ticket requesting that the Debug link be made from the other site to T3_US_UMD. Select the data transfers category, set the severity as 3-Normal, the privacy as public and T3_US_UMD as the site.
    3. PhEDEx or originating-site admins may create the transfer request for you. If they do, follow the link in the PhEDEx transfer request email sent to you to approve the request. If they do not, create the transfer request yourself:
      1. Go to the PhEDEx LoadTest injection page and under the link "Show Options," click the "Nodes Shown" tab, then select the source node.
      2. Find T3_US_UMD in the "Destination node" column and copy the "Injection dataset" name.
      3. Create a transfer request and copy the dataset name into the "Data Items" box. Select T3_US_UMD as the destination. The DBS is typically LoadTest07, but some sites may create the subscription under LoadTest. You will receive an error if you select the wrong one - simply go back and select the other DBS. Leave the drop down menus as-is (replica, growing, low priority, non-custodial, undefined group). Enter as a comment something to the effect of "Commissioning link from T1_US_FNAL_Buffer to T3_US_UMD," then click the "Submit Request" button.
      4. As administrator for the site, you should be able to approve the request right away: select the "Approve" radio button and submit the change.
    4. Files created by load tests should be removed shortly after they are created.
      • To use a cron job that removes LoadTest files at regular intervals, log in to the GN as root (su -), edit /var/spool/cron/root and add the lines:
        07 * * * * find /hadoop/store/PhEDEx_LoadTest07 -mmin +180 -type f -exec rm -f {} \;
        37 * * * * find /hadoop/store/PhEDEx_LoadTest07 -depth -type d -mmin +180 -exec rmdir --ignore-fail-on-non-empty {} \;

        This removes PhEDEx load test files more than three hours old at the 7th minute of every hour, and removes empty directories of the same age at the 37th minute.
      • Or you can configure the Debug agent to delete files immediately after download. To do this, base your PhEDEx configuration on the T3_US_FNALXEN configuration.
    5. Once load tests have been successful at a rate of >5 MB/sec for one day, the link qualifies as commissioned and PhEDEx admins will create the Production link. If PhEDEx admins don't take note of the successful tests within a week, you can send a reminder to hn-cms-ddt-tf@cern.ch or reply to the Savannah ticket that the link passes commissioning criteria and that you'd like the Prod link to be created.
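
To get a feel for the commissioning threshold, it helps to turn the rate into a volume: 5 MB/sec sustained for one day (86400 seconds) is 432000 MB, or roughly 432 GB. The sketch below does that arithmetic. The helper names are hypothetical, and treating the criterion as an average over the whole period is an assumption; how PhEDEx admins actually evaluate the link is not specified here.

```shell
#!/bin/bash
# Hypothetical helpers (assumptions for illustration only).

# Volume in MB moved at RATE MB/s over SECONDS seconds.
# Usage: min_volume_mb RATE SECONDS
min_volume_mb() {
    echo $(( $1 * $2 ))
}

# Succeed if MB transferred in SECONDS corresponds to an average rate
# strictly above THRESHOLD MB/s (default 5). The comparison is done on
# volumes to avoid integer-division rounding.
# Usage: rate_exceeds MB SECONDS [THRESHOLD]
rate_exceeds() {
    local mb=$1 secs=$2 threshold=${3:-5}
    [ "$mb" -gt "$(min_volume_mb "$threshold" "$secs")" ]
}
```

For instance, min_volume_mb 5 86400 gives the one-day volume at the threshold rate, and rate_exceeds 500000 86400 checks whether 500000 MB in a day is above it.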