PhEDEx
Description | Install, configure, run, and commission links for PhEDEx. |
Notes | We configure PhEDEx to use srm calls directly, instead of 'FTS'. FTS is the most commonly used service by Tier 1 and Tier 2 sites because it tends to be more scalable. FTS requires gLite, which may conflict with an existing CRAB gLite-UI install, so be sure to install PhEDEx on a different node in this case. We install PhEDEx on our grid node. It actually is 99% out of date as of 2015 and will be removed and replaced shortly. ADMINS of hepcms: please consult our private Google pages for documentation. |
Last modified | September 10, 2015 |
Site registration
Description | Register in the CMS SiteDB and Savannah. |
Dependencies | - A personal grid certificate |
Notes | You can register your site in SiteDB prior to OSG GOC registration. Once OSG GOC registration is complete, however, you should change your SAM name to your OSG GOC name by filing a new Savannah ticket. |
Guides | - CMS new site twiki |
- Create a Savannah ticket with your user public key (usercert.pem) and with the information:
- Site name: UMD
- CMS name: T3_US_UMD
- SAM name: umd-cms (our OSG GOC registration name)
- City/Country: College Park, MD, USA
- Site tier: Tier 3
- SE host: hepcms-0.umd.edu
- SE kind: disk
- SE technology: BeStMan
- CE host: hepcms-0.umd.edu
- Associate T1: FNAL
- Grid type: OSG
- Data manager: Marguerite Tonjes
- PhEDEx contact: Marguerite Tonjes
- Site admin: Marguerite Tonjes
- Site executive: Nick Hadley
- Email the persons listed here and ask them to add our site to the PhEDEx database, including a link to the Savannah ticket (CERN phonebook).
- Once someone has responded to say UMD has been put into SiteDB, go to https://cmsweb.cern.ch/sitedb/sitedb/sitelist/
- Log in with your CERN hypernews user name and password
- Under Tier 3 centres, click on the T3_US_UMD link
- Click on "Edit site information" and specify OSG as our Grid Middleware, our site home page as http://hep-t3.physics.umd.edu and our site logo URL as http://hep-t3.physics.umd.edu/images/umd-logo.gif
- We can also add/edit user information by clicking on "Edit site contacts":
- Click on "edit" to edit an existing user's info
- Click on "Add a person with a hypernews account to site" to add someone new
- Then click on the first letter of the user's last name. Note that many users are listed by their middle name instead of their last.
- Find the user in the list, and click "edit"
- A new page will appear. Click on the appropriate values ("Site Admin", "Data Manager", etc.) in the last row of the new page (for the Tier 3), and click "Edit these details" to save.
- Under Site Configuration, select "Edit site configuration":
- CE FQDN: hepcms-0.umd.edu
- SE FQDN: hepcms-0.umd.edu
- PhEDEx node: T3_US_UMD
- GOCDB ID: leave blank
- Install development CMSSW releases?: Do not check
- Site installs software manually?: Check
It's also a good idea to join the Savannah "CMS Computing Infrastructure Support" group so that Savannah support tickets related to your site can be assigned to you. Navigate to the Savannah group page, type cmscompinfrasup under "Request for Inclusion" and click the "Search Group(s)" button. State you're the sysadmin of T3_US_XXX and submit the request. After your request has been approved, submit a ticket selecting category "Facilities Operations" and assign it to "cmscompinfrasup-facilities" asking for a squad for T3_US_XXX, specifying who should be assigned to the squad.
Install on the GN
Description | Install & configure PhEDEx on the grid node. |
Dependencies | - Kerberos installed and configured (loose dependency - needed only to commit PhEDEx configuration to CMS CVS repository) - CMSSW installed - Hadoop installed and running - OSG SE installed, configured, running, and passing RSV tests |
Notes | These instructions are for PhEDEx 3.3.2, though they can be adapted for later releases. We install the software directly on the GN in /localsoft/phedex instead of on the HN in /home/phedex. |
Guides | - PhEDEx installation script from Doug Johnson (hypernews access required) - PhEDEx installation - PhEDEx agent configuration - PhEDEx site configuration - Examples of configuration - CMS new site twiki - TCP tuning |
Prepare for the PhEDEx install. On the HN as root (su -):
- Create the PhEDEx user:
useradd -c "PhEDEx" -n phedex -s /bin/bash
passwd phedex
ssh-agent $SHELL
ssh-add
rocks sync config
rocks sync users
- Change ownership of the directory which PhEDEx will use for storage (/hadoop/store):
chown phedex:users /hadoop/store
chmod 775 /hadoop/store
- And as root on the GN:
mkdir /localsoft/phedex
chown phedex:users /localsoft/phedex
As phedex on the GN:
- Set up the environment:
cd /localsoft/phedex
mkdir 3.3.2
ln -s 3.3.2 current
cd 3.3.2
mkdir -p state logs sw gridcert
chmod 700 gridcert
export sw=$PWD/sw
- Install PhEDEx following these instructions. Some notes:
- We set myarch=slc5_amd64_gcc434
- We set version=3_3_2
- We use the srm client already installed and network mounted on the OSG CE (we tell PhEDEx to grab the environment in the ConfigPart.Common file when we configure PhEDEx below).
- We use the JDK already installed and network mounted on the OSG CE. No special modifications to PhEDEx to use it were required.
- Configure PhEDEx following these (1, 2) instructions. Examples of site configuration can be found here. Our local site configuration can be found here.
Some notes:
- $PHEDEX_BASE refers here to the installation directory: /localsoft/phedex/current. The environment variable will be set by sourcing configuration files when starting services the first time in the next section.
- Our site name is T3_US_UMD, so our configuration directories are
$PHEDEX_BASE/SITECONF/T3_US_UMD/PhEDEx
and
$PHEDEX_BASE/SITECONF/T3_US_UMD/JobConfig
- We had to modify more than just PhEDEx/storage.xml and JobConfig/site-local-config.xml, so be sure to check all the files in the directories for differences from the default templates.
- The JobConfig directory is not actually needed by PhEDEx; it's needed by CMSSW. However, since site-local-config.xml depends on the content of storage.xml, we configure it now. It's harmless to leave the JobConfig directory inside your $PHEDEX_BASE/SITECONF/T3_US_XXX directory.
- CMSSW jobs also need the files in your SITECONF directory. Copy the entire SITECONF directory to the $CMS_PATH directory:
su -
cp -r /localsoft/phedex/current/SITECONF /sharesoft/cmssw/.
cp -r /sharesoft/cmssw/SITECONF/T3_US_UMD /sharesoft/cmssw/SITECONF/local
chown -R cmssoft:users /sharesoft/cmssw/SITECONF
logout
Some sites use different storage.xml files in their $PHEDEX_BASE and $CMS_PATH directories when they don't have a locally installed storage element. Since we have a storage element, ours are the same.
- After starting services (detailed in the next section) for the first time, you can test your storage.xml file by:
cd /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx
eval `/localsoft/phedex/current/PHEDEX/Utilities/Master -config /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx/Config.Prod environ`
Test srmv2 mapping from LFN to PFN:
/localsoft/phedex/current/sw/slc5_amd64_gcc434/cms/PHEDEX/PHEDEX_3_3_2/Utilities/TestCatalogue -c storage.xml -p srmv2 -L /store/testfile
Test srmv2 mapping from PFN to LFN:
/localsoft/phedex/current/sw/slc5_amd64_gcc434/cms/PHEDEX/PHEDEX_3_3_2/Utilities/TestCatalogue -c storage.xml -p srmv2 -P 'srm://hepcms-0.umd.edu:8443/srm/v2/server?SFN=/hadoop/store/testfile'
(The PFN is quoted so that the shell does not treat the ? as a wildcard.) Other transfer types can be tested by changing the protocol tag srmv2 to direct, srm, or gsiftp and changing the PFN or LFN argument passed to match. PhEDEx services don't need to be running to do these tests, but the first time PhEDEx is started, it creates some of the needed directories for this test.
- Submit a Savannah ticket for a CVS space under /COMP/SITECONF named T3_US_UMD (already done for UMD). Once you receive the space, upload your site configuration to CVS:
/usr/kerberos/bin/kinit -5 username@CERN.CH
cvs co COMP/SITECONF/T3_US_UMD
cp -r /localsoft/phedex/current/SITECONF/T3_US_UMD/* COMP/SITECONF/T3_US_UMD/.
cd COMP/SITECONF/T3_US_UMD
cvs add PhEDEx
cvs add PhEDEx/*
cvs add JobConfig
cvs add JobConfig/site-local-config.xml
cvs commit -R -m "T3_US_UMD PhEDEx & CMSSW configuration" PhEDEx JobConfig
- Once your initial registration request is satisfied, you will receive three emails titled "PhEDEx authentication role for Prod (Debug, Dev)/UMD." Copy and paste the commands in the email to the command line. Copy the text output for each into the file $PHEDEX_BASE/gridcert/DBParam. Each text output should look something like (exact values removed for security):
Section Prod/UMD
Interface Oracle
Database db_not_shown_here
AuthDBUsername user_not_shown_here
AuthDBPassword LettersAndNumbersNotShownHere
AuthRole role_not_shown_here
AuthRolePassword LettersAndNumbersNotShownHere
ConnectionLife 86400
LogConnection on
LogSQL off
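Since three separate outputs are pasted into one DBParam file by hand, it is easy to miss one. The following is a minimal sketch of a check that each instance has its own Section line. The function name check_dbparam is hypothetical, and the section names Debug/UMD and Dev/UMD are an assumption inferred from the email titles and the Prod/UMD example above.

```shell
# Sketch: verify that a DBParam file has a "Section <instance>/<site>" line
# for each of the three PhEDEx instances. check_dbparam is a hypothetical
# helper, and the Debug/Dev section names are assumed to mirror Prod/UMD.
check_dbparam() {
    local file="$1"
    local site="${2:-UMD}"
    local missing=0
    local instance
    for instance in Prod Debug Dev; do
        if ! grep -Eq "^[[:space:]]*Section[[:space:]]+${instance}/${site}[[:space:]]*$" "$file"; then
            echo "missing: Section ${instance}/${site}"
            missing=1
        fi
    done
    return "$missing"
}

# Example (run as phedex on the GN):
#   check_dbparam /localsoft/phedex/current/gridcert/DBParam
```

The function prints nothing and returns 0 when all three sections are present.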
PhEDEx transfers 1-2GB files, which tend to require some tuning of TCP settings to avoid mid-transfer failure. As root on the GN, follow the instructions in ESNet's TCP tuning guide to modify various settings for optimal network performance. Note that when confirming buffer sizes are at least 4MB, you need only check:
/sbin/sysctl -a | grep "net.core.*mem_max"
/sbin/sysctl -a | grep "net.ipv4.*mem"
(net.core.*max will return some entries that aren't related to memory buffer size). It doesn't hurt to follow the TCP tuning guide on other nodes as well, but the GN is the only node for which it's necessary.
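The buffer check can also be scripted rather than read by eye. This is a rough sketch, not part of the ESNet guide: check_buffers is a hypothetical helper that reads sysctl-style "key = value" lines and compares the last value on each line (the maximum, for three-valued keys such as net.ipv4.tcp_rmem) against 4 MB. It assumes you feed it only the byte-valued keys net.core.rmem_max, net.core.wmem_max, net.ipv4.tcp_rmem and net.ipv4.tcp_wmem.

```shell
# Sketch: flag sysctl buffer settings below a minimum size (default 4 MB).
# Input: lines of the form "key = value [value ...]" on stdin.
# For multi-valued keys the last value (the maximum) is the one checked.
check_buffers() {
    local min="${1:-4194304}"   # 4 MB in bytes
    awk -v min="$min" -F' = ' '
        NF == 2 {
            n = split($2, vals, /[ \t]+/)
            status = (vals[n] + 0 >= min + 0) ? "ok" : "TOO SMALL"
            print $1, vals[n], status
        }'
}

# Example (as root on the GN):
#   /sbin/sysctl -a 2>/dev/null \
#     | grep -E 'net\.core\.[rw]mem_max|net\.ipv4\.tcp_[rw]mem' \
#     | check_buffers
```

Each output line gives the key, the value that was checked, and either ok or TOO SMALL.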
Get proxy & start services
Description | Get a proxy for PhEDEx to use and start PhEDEx services. |
Dependencies | - A personal grid certificate - PhEDEx installed and configured |
Notes | After reboot of the grid node, the grid certificate and proxy should still be valid, but PhEDEx services aren't configured to start automatically. |
Guides |
On the grid node:
- Copy your personal usercert.pem and userkey.pem grid certificate files into ~phedex/.globus and give the phedex user ownership (you may need to do this on the HN, which mounts that directory):
chown phedex:users ~phedex/.globus/*
- As phedex (on grid node), create your grid proxy:
voms-proxy-init -voms cms -hours 350 -out /localsoft/phedex/current/gridcert/proxy.cert
Be sure to make note of when the proxy will expire and log on to renew it before then. Some sites will not accept proxies older than a week, so if you have many links, you will probably need to renew your proxy every week.
- Now start the services. To be extra safe, each service should be started in a new shell, though in most cases, executing the following in sequence should be OK:
- Start the Dev service instance:
cd /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx
eval `/localsoft/phedex/current/PHEDEX/Utilities/Master -config /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx/Config.Dev environ`
/localsoft/phedex/current/PHEDEX/Utilities/Master -config Config.Dev start
This service can be stopped by changing the command start to stop.
- Start the Debug service instance:
cd /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx
eval `/localsoft/phedex/current/PHEDEX/Utilities/Master -config /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx/Config.Debug environ`
/localsoft/phedex/current/PHEDEX/Utilities/Master -config Config.Debug start
This service can be stopped by changing the command start to stop.
- Start the Prod service instance:
cd /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx
eval `/localsoft/phedex/current/PHEDEX/Utilities/Master -config /localsoft/phedex/current/SITECONF/T3_US_UMD/PhEDEx/Config.Prod environ`
/localsoft/phedex/current/PHEDEX/Utilities/Master -config Config.Prod start
This service can be stopped by changing the command start to stop.
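The three start sequences above differ only in the instance name, so they can be wrapped in a loop. The following is a minimal sketch under that assumption. Each instance runs in its own subshell, which approximates the "new shell" advice because the environment evaluated for one instance does not carry over to the next. The names phedex_all, PHEDEX_ROOT and PHEDEX_SITE are hypothetical and not part of PhEDEx.

```shell
# Sketch: run the Master start (or stop) sequence for Dev, Debug and Prod.
# PHEDEX_ROOT, PHEDEX_SITE and phedex_all are illustrative names only.
PHEDEX_ROOT="${PHEDEX_ROOT:-/localsoft/phedex/current}"
PHEDEX_SITE="${PHEDEX_SITE:-T3_US_UMD}"

phedex_all() {
    local action="${1:-start}"   # start or stop
    local confdir="$PHEDEX_ROOT/SITECONF/$PHEDEX_SITE/PhEDEx"
    local master="$PHEDEX_ROOT/PHEDEX/Utilities/Master"
    local instance
    for instance in Dev Debug Prod; do
        (
            # Subshell: the evaluated environment is discarded afterwards.
            cd "$confdir" || exit 1
            eval "$("$master" -config "$confdir/Config.$instance" environ)"
            "$master" -config "Config.$instance" "$action"
        ) || { echo "Config.$instance: $action failed" >&2; return 1; }
    done
}

# Example (as phedex on the grid node):
#   phedex_all start
#   phedex_all stop
```

The loop stops at the first instance whose command fails, so a problem with Dev is seen before Debug and Prod are touched.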
Clean Logs:
Description | Use logrotate to clean old PhEDEx logs. |
Dependencies | - PhEDEx services started at least once |
Notes | PhEDEx does not clean up its own logs. The first time you start the PhEDEx services, it will create the log files. We use logrotate in cron to clean them monthly, as well as to retain two months of old logs. |
Guides | - Logrotate guide |
On the grid node as user phedex:
- Create the backup directories:
mkdir /localsoft/phedex/current/Dev_T3_US_UMD/logs/old
mkdir /localsoft/phedex/current/Debug_T3_US_UMD/logs/old
mkdir /localsoft/phedex/current/Prod_T3_US_UMD/logs/old
- Create the logrotate steering file, /home/phedex/phedex.logrotate.
- Run logrotate from the command line to check that it works:
/usr/sbin/logrotate -f /home/phedex/phedex.logrotate -s /home/phedex/logrotate.state
- As root (su -), automate by editing /var/spool/cron/phedex and adding the line:
52 01 * * 0 /usr/sbin/logrotate /home/phedex/phedex.logrotate -s /home/phedex/logrotate.state > /dev/null 2>&1
This directs logrotate to run every Sunday at 01:52 as the user phedex.
- Additionally, the Prod download-remove agent doesn't clean up its job logs. As root, edit /var/spool/cron/phedex and add the line:
02 00 * * 0 find /localsoft/phedex/current/Prod_T3_US_UMD/state/download-remove/*log -mtime +7 -type f -exec rm -f {} \; > /dev/null 2>&1
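The contents of our steering file are not reproduced on this page. As an illustration only, the sketch below writes a logrotate file that matches the policy described in the notes above: monthly rotation, two old copies kept, and rotated files moved to the logs/old directories created earlier. It is not our actual file. The helper name write_phedex_logrotate, the logs/* pattern, and the use of copytruncate (on the assumption that running agents keep their log files open) are all assumptions to adapt.

```shell
# Sketch: generate an illustrative logrotate steering file for the three
# PhEDEx instances. This is NOT the site's real file; the glob pattern and
# the copytruncate choice are assumptions.
write_phedex_logrotate() {
    local out="$1"
    local base="${2:-/localsoft/phedex/current}"
    local site="${3:-T3_US_UMD}"
    local instance
    : > "$out"
    for instance in Dev Debug Prod; do
        cat >> "$out" <<EOF
$base/${instance}_${site}/logs/* {
    monthly
    rotate 2
    olddir $base/${instance}_${site}/logs/old
    missingok
    notifempty
    copytruncate
}

EOF
    done
}

# Example (as phedex on the grid node), writing to a scratch file for review:
#   write_phedex_logrotate /tmp/phedex.logrotate.example
```

Review the generated file before using it, then test it with the logrotate -f command shown above.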
Commission links:
Description | Commission links from other sites to enable downloading of official CMS data to your site from the other site. |
Dependencies | - PhEDEx services running |
Notes | To download data using PhEDEx, a site must have a Production link originating from one of the nodes hosting the dataset. To create each link, sites must go through a LoadTest/link commissioning process. |
Guides | - Debugging data transfers |
- The first link you'll want to commission is from the T1_US_FNAL_Buffer. To commission from FNAL, send a request to begin the link commissioning process to hn-cms-ddt-tf@cern.ch.
- For non-FNAL sites, contact the PhEDEx admins for that site as listed in SiteDB (requires Firefox). Ask them if a link is OK and if so, to please create a LoadTest. Create a Savannah ticket requesting that the Debug link be made from the other site to T3_US_UMD. Select the data transfers category, set the severity as 3-Normal, the privacy as public and T3_US_UMD as the site.
- PhEDEx or originating-site admins may create the transfer request for you. If they do, follow the link in the PhEDEx transfer request email sent to you to approve the request. If they do not, create the transfer request yourself:
- Go to the PhEDEx LoadTest injection page and under the link "Show Options," click the "Nodes Shown" tab, then select the source node.
- Find T3_US_UMD in the "Destination node" column and copy the "Injection dataset" name.
- Create a transfer request and copy the dataset name into the "Data Items" box. Select T3_US_UMD as the destination. The DBS is typically LoadTest07, but some sites may create the subscription under LoadTest. You will receive an error if you select the wrong one - simply go back and select the other DBS. Leave the drop down menus as-is (replica, growing, low priority, non-custodial, undefined group). Enter as a comment something to the effect of "Commissioning link from T1_US_FNAL_Buffer to T3_US_UMD," then click the "Submit Request" button.
- As administrator for the site, you should be able to approve the request right away: select the "Approve" radio button and submit the change.
- Files created by load tests should be removed shortly after they are created.
- To use a cron job that will remove LoadTest files at regular intervals, log in to the GN as root (su -), edit /var/spool/cron/root and add the lines:
07 * * * * find /hadoop/store/PhEDEx_LoadTest07 -mmin +180 -type f -exec rm -f {} \;
37 * * * * find /hadoop/store/PhEDEx_LoadTest07 -depth -type d -mmin +180 -exec rmdir --ignore-fail-on-non-empty {} \;
This removes PhEDEx load test files older than three hours at the 7th minute of every hour, and removes emptied directories older than three hours at the 37th minute.
- Or you can configure the Debug agent to delete files immediately after download. To do this, base your PhEDEx configuration on the T3_US_FNALXEN configuration.
- Once load tests have been successful at a rate of >5 MB/sec for one day, the link qualifies as commissioned and PhEDEx admins will create the Production link. If PhEDEx admins don't take note of the successful tests within a week, you can send a reminder to hn-cms-ddt-tf@cern.ch or reply to the Savannah ticket that the link passes commissioning criteria and that you'd like the Prod link to be created.
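To get a feel for what the commissioning threshold means for disk space, the arithmetic can be worked out directly. This is a small sketch using decimal units (1 GB = 1000 MB). The helper name volume_mb is hypothetical, and the three-hour figure comes from the -mmin +180 retention in the cron entries above.

```shell
# Sketch: data volume (in MB) moved at a steady rate over a given time.
# volume_mb is an illustrative helper, not part of PhEDEx.
volume_mb() {
    local rate_mb_s="$1"
    local seconds="$2"
    echo $(( rate_mb_s * seconds ))
}

volume_mb 5 86400   # one day at 5 MB/sec: prints 432000 (about 432 GB)
volume_mb 5 10800   # three hours at 5 MB/sec: prints 54000 (about 54 GB)
```

So a link running at the minimum commissioning rate moves roughly 432 GB per day, and with three-hour cleanup about 54 GB of LoadTest files per link can sit in /hadoop/store at any one time (somewhat more in practice, since the cleanup only runs once an hour).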