User How-To Guide

A number of guides are linked from the Help for Users page. The information below is specific to our cluster. Note: We are re-writing this guide over at this Google Sites page, users looking for more advanced answers can look here: https://sites.google.com/a/physics.umd.edu/umdt3/

Please let the System administrators know if you would like a section added to the user guide or if you find any inaccuracies.

Last edited June 9, 2016

Request an account
Your first connection & password changes
Connect to the cluster
Transfer files to and from the cluster
Run CMSSW
Get Kerberos tickets for FNAL & CERN
Check out code in the git CMSSW repository
Submit jobs to the cluster using Condor
Enable your web directory
Get your grid certificate & proxy, add CAs to your browser
Submit jobs to the grid using CRAB
Bring datasets to the cluster via PhEDEx
Bring files to the cluster via FileMover
Bring files to the cluster via the srm protocol
Run Madgraph on the cluster (including with condor)
Notes on other utilities
Troubleshooting connection

Request an account

Email the System administrators with your request. Accounts are available for members of the High Energy Physics group at UMD: see our policy for jobs.

Before logging in, please read all the instructions.

Your first connection & password changes

Before logging in, please read all the instructions.

The first time you login, please ssh to the cluster (follow these instructions to get PuTTY if you have Windows, Mac users login with ssh -Y).

ssh username@hepcms.umd.edu

When you first log in, you will be prompted to create a new password; please do so. You will be automatically logged off and must log in again with the new password.

In addition to your home area, /home/username, you have space in /data/users/username. At the present time, neither area has user quotas (they will have software quotas later in 2015), so please be courteous to other users of these disks; check with the System administrators and/or Nick Hadley if you're not sure. Please place any large datasets into /data/users/username, with no more than 400GB usage. Neither area is properly backed up (although they both have safeguards), so any critical files should be backed up off-site. We request that you use no more than 30GB in your /home/username area. Significantly larger usage of /home to the point of filling up the directory will break the cluster and will result in the sysadmin deletion of files.

If you need to change your password, use the yppasswd command:

ssh username@hepcms.umd.edu
yppasswd

Connect to the cluster

From a Linux machine (Mac will need -Y option as well):

ssh username@hepcms.umd.edu

From a Windows machine:

PuTTY (Terpware link) provides an ssh client for Windows machines. Configuring PuTTY after installation varies a bit from version to version, but the most important setting is the host name: hepcms.umd.edu. You will also want to turn X11 forwarding on, typically under Connection->SSH->Tunnels, and you may want to set your username, typically under Connection. Your settings can be saved for future sessions. Note that the CMS Workbook SSH & setting up X11 twiki has some information about PuTTY, as well as Secure Shell, a different client (which has SSH and SFTP).

Newer verisons of PuTTY should have Kerberos authentication, which can be used for connecting to FNAL. You'll also need a Windows Kerberos client. Instructions to install and configure it can be found at the FNAL website. If you use this PuTTY client, be sure to turn off Kerberos authentication for your connection to hepcms, typically under Connection->SSH->Auth, listed as GSSAPI auth.

Xming is a light-weight X11 emulator and does support PuTTY. It comes with some versions of PuTTY. It is needed for any software running at hepcms which pops up windows on your machine, such as root. If you plan to run emacs, installing the additional Xming font executable is strongly recommended. If you plan to use Fireworks or other sophisticated graphics software, installing Xming-Mesa is required. Other X-servers are listed in the CMS Workbook SSH & setting up X11 twiki.

If you require a SL5 machine, ssh -X -Y username@hepcms-sl5.umd.edu. Note that this machine may no longer be supported past May 2016, and it may never have grid functionality.

Transfer files to the cluster

These instructions are for transfering small quantities of personal files to or from your working area on the cluster using scp or sftp. If you wish to transfer files from FNAL's or CERN's storage element, please use srmcp or PhEDEx. If you wish to transfer data from your space in /data, please break transfers up into small chunks, not exceeding 100MB in size. Files which are stored in the storage element (/store, /hadoop) can be moved with srm commands below. Data registered in dbs can be moved with PhEDEx or FileMover.

From a Linux machine:

To the cluster:

scp filename.ext username@hepcms.umd.edu:~/filename.ext
OR
scp filename.ext username@hepcms.umd.edu:/data/users/username/filename.ext

From the cluster:

scp username@hepcms.umd.edu:~/filename.ext filename.ext
OR
scp username@hepcms.umd.edu:/data/users/username/filename.ext filename.ext

scp accepts wildcards (*) and -r for recursive functionality.

From a Windows machine:

FileZilla is a useful Windows application for transfering files using various protocols. Once installed, select File->Site Manager. Click on the "New Site" button and enter hepcms.umd.edu as the host. Select SFTP using SSH2 as the Servertype. Select Logontype Normal and enter your user name and password if desired. Click the "Connect" button. Transfer files by dragging them from the left directory to the right or vice-versa.

WinSCP is another convenient file copy utility, version 4.0.7 is definitely compatible with the Kerberos-enabled PuTTY client linked from the FNAL website, though newer releases have also been confirmed to work (4.2.9 confirmed OK).

Run CMSSW

A CMSSW working environment is created and jobs are executed on this cluster as at any other(note that the setup script should automatically load from your .cshrc, contact sysadmins if it fails with the message /sharesoft/cmssw/cmsset_default.csh: No such file or directory.)

setenv SCRAM_ARCH slc6_amd64_gcc491
scramv1 list CMSSW
cmsrel CMSSW_X_Y_Z
cd CMSSW_X_Y_Z/src/subdir
cmsenv
cmsRun yourConfig.py

You can use this example CMSSW config file to test, which has been verified to work in CMSSW_3_9_9 but should work in most modern releases. Further details on CMSSW can be found at the Workbook and in thetutorials.

For latest CMSSW (8_1_X):
setenv SCRAM_ARCH slc6_amd64_gcc530

For other versions (find which CMSSW are available with scramv1 list CMSSW)
setenv SCRAM_ARCH slc6_amd64_gcc530
setenv SCRAM_ARCH slc6_amd64_gcc493
setenv SCRAM_ARCH slc6_amd64_gcc491
setenv SCRAM_ARCH slc6_amd64_gcc490
setenv SCRAM_ARCH slc6_amd64_gcc481
setenv SCRAM_ARCH slc6_amd64_gcc472
setenv SCRAM_ARCH slc6_amd64_gcc462

For the CMSSW listed below here, you need to use the SL5 machine (hepcms-sl5.umd.edu)

For CMSSW_6_1_X and newer:
setenv SCRAM_ARCH slc5_amd64_gcc472

For CMSSW versions 5_1_3 and above, you will need the following environment for SCRAM (Workbook reference), this is the current default:
setenv SCRAM_ARCH slc5_amd64_gcc462

For CMSSW versions 4_1_X and above, you will need a 64-bit environment for SCRAM:
setenv SCRAM_ARCH slc5_amd64_gcc434

To run older versions which need the 32-bit envrionment use:
setenv SCRAM_ARCH slc5_ia32_gcc434

* A note about CMSSW code compiled with make:
Some online software is compiled with make instead of scram b. Because the online software is developed at CERN, they occasionally do not correctly link to the appropriate CMSSW external libraries, relying instead on libraries installed on the local machine. If make fails to link, see which library caused the error and then search for its location in CMSSW using scram:

scram tool info libname

Then add the path of the include and/or lib directories in your makefile using -I and/or -L flags, as appropriate.

Get Kerberos tickets for FNAL & CERN

Two aliased commands have been set up in each new user's .cshrc/.bashrc files, automatically sourced on login. Execute either command below, with the appropriate username for FNAL or for CERN (not necessarily the same as your HEPCMS username).

kinit_fnal username@FNAL.GOV
kinit_cern username@CERN.CH

Check out code in the git CMSSW repository

Migrating from CVS to github for CMSSW FAQ
Migrating from CVS to github for UserCode FAQ
Github tutorial for CMSSW
Commands in CVS and their github equivalents
You can browse the CMSSW github repository here.

Submit jobs to the cluster using Condor

The best way to take advantage of all the cluster CPU is to run jobs using Condor. Long-running interactive jobs that consume memory and CPU on the interactive nodes are highly discouraged and may be killed. Additionally, condor jobs should run for less than 24 hours, especially if users take up all the job slots. Please split your jobs into smaller subjobs in that case. The current configuration of the cluster cannot handle multi-core jobs, instead it is 1 CPU per condor job. Our full policy for jobs is listed at this link.

*** You need this in your .jdl file after the SL6 cluster update:

Requirements = TARGET.FileSystemDomain == "privnet"

*** In addition (after the SL6 cluster update), if you did NOT already explicitly have in your condor execution scripts (.sh), be sure to cd /your/working/directory

Jobs are submitted via condor using the command:

condor_submit jdl_file

Note that you may receive the error:
WARNING: Log file ... is on NFS.
This could cause log file corruption and is _not_ recommended.
You can safely ignore this error.

Details on other condor commands and possible values in the jdl file can be found in this Condor guide.

Here is a very simple example jdl file: condor_sleep.jdl (up to date for SL6). Place this file in one of your directories on the cluster. Edit it for your desired output directory, then submit it to Condor:

condor_submit condor_sleep.jdl

You can watch the status of the jobs:

condor_q

And look at the output files when the jobs complete (all but the *.log files should be empty).

The following three files together are an example of how to submit CMSSW jobs to the cluster with Condor (not yet up to date for SL6):

To run these files, setup a CMSSW release and place condor-simple.py inside the src subdirectory. condor-simple.py has been verified to work in CMSSW_3_9_9, though it should work in most modern releases. To run other software (i.e., Madgraph), you do not need a CMSSW release area. condor-executable.sh and condor-jobs.jdl can be placed together in any directory. Edit condor-jobs.jdl with your desired output directory, a unique prefix name to help you identify the files, and your CMSSW release area. Make condor-executable.sh executable:

chmod +x condor-executable.sh

Then submit the jobs:

condor_submit condor-jobs.jdl

You can monitor the status of your jobs, they should take about 30 minutes to complete once they start running:

condor_q -submitter your_username

You can also watch condor_q:

watch -n 60 "condor_q -submitter your_username"

Once jobs are complete, the *.condor files will indicate if condor itself ran successfully, as well as providing the exit code of the CMSSW job. The *.stdout and *.stderr files will show output from condor-executable.sh and the *.log files will show output from CMSSW itself.

For CMS HEP experimentalists, you can use this file (not up to date SL6): condor-local.jdl

Enable your web directory

First, contact the sysadmins (http://hep-t3.physics.umd.edu/ContactsForUsers.html) to request your account be given access to be able to share the web pages, tell them your username and web directory reason. If you would like to share files via the web at the location http://hepcms-hn.umd.edu/~username:

chmod 711 /home/username
mkdir /home/username/public_html
chmod 755 /home/username/public_html

Copy any files you wish to make available on the web to public_html. Note that public_html is inside your /home directory, so avoid placing many large files here (limited shared space for all users). If a file with name index.html exists, it will be displayed when anyone navigates to the address http://hepcms-hn.umd.edu/~username. Otherwise the directory contents will be shown.

If desired, you can change the permissions of public_html to 711, which will prevent the web site from displaying the contents of your public_html directory when an index file isn't present. Do not change the permissions of public_html to anything other than 711 or 755. Any subdirectory can be given permissions 711 to 'hide' it on directory content lists, though if someone knows the name of a file inside, they can still read it in their browser.

Depending on the settings in your login, new files and files transferred to public_html may not be world readable. To change a file to be world readable use:

chmod 644 /home/username/public_html/index.html

Get your grid certificate & proxy, add CAs to your browser

Get your grid certificate

Follow these instructions to get your personal certificate from CERN and register with the CMS VO. To complete your registration to the CMS VO, you will also need a CERN lxplus account. This process can take up to a week. The same browser on the same machine must be used for all steps connected with getting your grid certificate.

Certificates expire in a year, you will be emailed with instructions on how to get a new one. Follow the instructions below every time you receive a new certificate.

Export your grid certificate from your browser, the interface varies from browser to browser. The exported file will probably have the extension .p12. Copy this file to the cluster following these instructions. Be sure to modify the permissions on the file so it's readable only by you:

chmod 600 YourCert.p12

Now extract your certificate and encrypted private key:

If your first time: mkdir $HOME/.globus
or to replace existing cert with new: rm $HOME/.globus/userkey.pem
openssl pkcs12 -in YourCert.p12 -clcerts -nokeys -out $HOME/.globus/usercert.pem
openssl pkcs12 -in YourCert.p12 -nocerts -out $HOME/.globus/userkey.pem
chmod 400 $HOME/.globus/userkey.pem
chmod 600 $HOME/.globus/usercert.pem
chmod go-rx $HOME/.globus

You may wish to consider deleting YourCert.p12 from /home, either way, guard this file carefully. To run grid jobs using CRAB, you will also need to register your grid identity in SiteDB following the instructions in this Twiki.

Get your proxy:

Please see the CRAB instructions below for a full recommendation on order of command usage for CRAB jobs.

voms-proxy-init -voms cms

Proxies automatically expire in 24 hours, simply issue this command again to renew it. You can get information on your proxy by issuing the command:

voms-proxy-info -all

Add CAs to your browser:

When attempting to visit a certificate-enabled website in your browser, you may be warned that the connection is untrusted. This is because the certificate info being presented to you by the site doesn't come from what's known as a "certificate authority" (CA) which you've told your browser you trust. You can choose to continue (in IE) or add an exception (in Firefox), but you'll have to do so repeatedly. This is also considered poor security practice.

While it's not required, you can add trusted CAs to your browser. Then any site presenting certificate info to you from a CA you trust won't show that annoying "are you sure you wish to continue/add exception" window. To add CAs to your browser, navigate to TACAR and click the "install" link underneath the CAs you want to trust. This is an important security decision, so don't add a CA you haven't heard of. The CAs you will want to add for most sites in CMS are: CERN Intermediate, CERN Root, DOEGrids, and ESnet Root - especially DOEGrids.

Submit jobs to the grid using CRAB

These instructions assume you have already gotten your grid certificate. Additionally, you need to have registered your grid certificate in SiteDB. You should already have a CMSSW config file which you've run successfully using interactive commands (cmsRun). An example CMSSW config file is provided in the example if you prefer. This CRAB tutorial can help you get started if you are unfamiliar with CRAB.

Setup your environment
Edit crab.cfg
Edit crab.cfg for SE output
Create, submit, watch, and retrieve your jobs
Example
Submit a CRAB job to the UMD hepcms cluster

Setup your environment:

You no longer need to get the gLite-UI (user interface) environment - step removed.

Get your proxy:

voms-proxy-init -voms cms

Navigate to the CMSSW_X_Y_Z/src you wish to use. Get the CMSSW environment variables:

cmsenv

Get the CRAB environment:

source /cvmfs/cms.cern.ch/crab/crab.csh

Navigate to the subdirectory which contains your CMSSW config file that you wish to submit, and copy the default crab configuration:

cp $CRABPATH/crab.cfg .

Edit crab.cfg:

Edit at least the following values in crab.cfg:

scheduler = remoteGlidein : The current only supported scheduler (July 2013)
datasetpath: set to a DBS dataset name or leave as none if this is a production job
pset: set to the name of your CMSSW config file that you wish to submit
total_number_of_events: the minimum number of events you wish to run over or produce (CRAB will run over at least this many events, but may run over more), set to -1 if you wish to run over all the events in the DBS dataset
number_of_jobs: the approximate number of jobs to create (a good rule of thumb is that jobs run for ~1 hour), CRAB will attempt to create this many jobs, but may create more or less
output_file: a comma separated list of the names of output files that you wish returned to you; typically root files

Edit crab.cfg for SE output:

If your output files might exceed 50MB for each job, then your must stage your output back to a storage element (SE). You can send it back to the hepcms cluster or you can send it to your user area in another SE (typically FNAL for people affiliated with UMD). Although you will stage all files in the output_file list back to a SE, the log files, *.stderr and *.stdout, will still be retrieved via normal means (the -getoutput command).

To stage your data back to the hepcms SE:

These directions assume that you have already been given user space on the UMD SE. To request SE user space at UMD, email the System administrators with your CERN username, and the output of voms-proxy-info -all. Set the following values in your crab.cfg file:

[USER]
check_user_remote_dir=0
return_data = 0
copy_data = 1
storage_element = T3_US_UMD
user_remote_dir = subdir

Files will show up in /hadoop/store/user/cern-username/subdir. Please note that due to /hadoop duplication, df -h shows twice the amount of storage space available, and we prefer to keep /hadoop at 75% or below in preparation for single node failure.

Note: if you get the error: crab: Problems trying remote dir check: Missing environment, that means that you need the check_user_remote_dir=0 option enabled in your crab.cfg.

To stage your data back to your user area in the FNAL SE:

These directions assume that you have already been given user space on the FNAL SE. To request SE user space at FNAL, email Rob Harris & the FNAL T1 admins. You should also have an account at the FNAL LPC. Note that data which is staged out to FNAL cannot be easily copied to the UMD hepcms cluster, even if you used CRAB DBS registration. A way to do this easily is under development by the CMS computing group.

Set the following values in your crab.cfg file:

return_data = 0
copy_data = 1
storage_element = T1_US_FNAL_Buffer
user_remote_dir = subdir

Files will show up in /pnfs/cms/WAX/11/store/user/cern-username/subdir, though due to the special nature of FNAL's SE, avoid using normal operations like ls on /pnfs directories.

Create, submit, watch, and retrieve your jobs:

crab -create
crab -submit
crab -status
crab -getoutput

You can also monitor the status of your jobs with the CMS dashboard. Select your name from the drop down menu and you will be provided with a great deal of information in the form of tables and plots on the status of your jobs.

Output files will be stored inside the crab_x_y_z/res directory and in the SE, if you specified this option in your crab.cfg.

Example:

These files together can be used to create and submit CRAB production jobs with output that stages back directly (not using a SE):

To run these files, install a CMSSW release and place simple.py and crab.cfg inside any CMSSW_X_Y_Z/src subdirectory. simple.py has been verified to work in CMSSW_3_9_9, but should work in most modern releases of CMSSW. crab.cfg has been verified to work in CRAB_2_7_8_patch1, but it should work in most modern releases of CRAB with minor modifications.

Setup your environment:

cmsrel CMSSW_3_9_9
cd CMSSW_3_9_9/src
# Copy simple.py and crab.cfg here
cmsenv
source /cvmfs/cms.cern.ch/crab/crab.csh

Create the CRAB jobs:

crab -create

Submit them:

crab -submit

Watch for when they complete (this can take anywhere from 15 minutes to several hours):

crab -status

Once at least one job has completed:

crab -getoutput

Output will be inside the crab_x_y_z/res directory and can be viewed with root.

Submit a CRAB job to the UMD hepcms cluster

If you're located remotely, you may want to submit jobs to the hepcms cluster via CRAB rather than Condor. CRAB jobs can be submitted from any computer which has various grid client tools installed, including CRAB. Consult your siteadmin if you do not know the appropriate commands to set up the grid and CRAB environment. You need to set two parameters in your crab.cfg file:

se_white_list = T3_US_UMD
ce_white_list = T3_US_UMD

Sometimes syntax for ce_white_list changes. Other common styles are:

ce_white_list = umd.edu, T3_US_UMD
ce_white_list = umd.edu

If CRAB jobs claim they cannot match, try modifying your white list syntax.

Of course, the hepcms cluster must have the version of CMSSW that you are using installed and must be hosting the data you are attempting to run over (production jobs require no input data). Check here for your dataset name. You may request to have a CMSSW version installed by contacting the System administrators or Nick Hadley and may bring DBS-registered data to the cluster by submitting a PhEDEx request.

Transfer data via PhEDEx

You can view the data currently hosted at our site here.

To place a request to transfer data via PhEDEx, you will need your grid certificate to be in your browser. Follow these instructions to get your certificate if you don't have it already.

Navigate to DBS and set the search parameters for your dataset.
In the list of results, choose the dataset desired.
Check the size of the dataset, both in terms of number of events and GB. You have a few options, depending on the size:
1. Any transfers of a few hundred GB or less will probably be approved, typically all AOD/AODSIM datasets are small enough to get immediate approval.
2. If the dataset is large and you don't need all the events, select the link labeled "Block info" and copy the Block id (or ids) containing the smallest number of events you require.
3. If the dataset is large and you must have all the events, contact the System administrators and/or Nick Hadley to get approval before submitting your PhEDEx request.
Check the sites hosting the dataset (click the "Show" link). We must have a commissioned PhEDEx link from one of the sites listed as hosting the dataset for the transfer to complete successfully. Our currently commissioned links are listed here. If the desired link is not present, send an email to the System administrators with the dataset name and your request to have a PhEDEx link commissioned to one of the hosting sites.
Select the link labeled "PhEDEx" and provide your grid certificate when prompted.
Under Data Items, verify that your desired dataset is listed. If you want to transfer a block, append the dataset name with a # and the block id. Multiple datasets (and multiple blocks) can be separated by a space. For example, to request two blocks:
/tt0j_mT_70-alpgen/CMSSW_1_5_2-CSA07-2231/GEN-SIM-DIGI-RECO#7efe4257-c594-472f-a638-ac9100321b2f /tt0j_mT_70-alpgen/CMSSW_1_5_2-CSA07-2231/GEN-SIM-DIGI-RECO#25831448-37a3-4f05-b470-429fad0d090e
Under Destinations, select T3_US_UMD.
Select these options from the drop down menus:
Transfer Type: Replica (default)
Subscription Type: Growing (default)
Priority: Normal (or leave as Low if you prefer, it doesn't matter much for our site)
Custodial: Non-custodial (default)
Group: undefined (default)
Enter a Comment regarding the reason you need the dataset transferred to the cluster.
Click the "Submit Request" button.
You will receive emails notifying you that you've made the request and whether or not your request was approved.
You can monitor the status of the transfer here, by providing the dataset name. Transfers typically take about a day, but can take several weeks in some special cases.
When you no longer need the dataset, please inform the System administrators and/or Nick Hadley so that the space can be used for other datasets.

You can run over the data hosted at the site with CMSSW interactively, via Condor jobs, or via CRAB jobs.

Interactive and Condor jobs need the PoolSource fileNames to be specified in the CMSSW config file. The PoolSource fileNames can be found in DBS at the dataset's link "LFNs: cff" or "LFNs: py," as you prefer. Unlike other sites, no modifications need to be made to the paths provided by DBS; paths work exactly as written. If you transferred a block (or blocks) in the dataset, rather than the entire dataset, you must get the PoolSource fileNames for just that block. Under the link "Block info," in the table row containing the block id that was transferred, select the link under the "LFNs" column titled "cff."

CRAB jobs need the datasetpath set in the CRAB config file. Simply set datasetpath to the name of the dataset in DBS and CRAB will automatically determine which blocks are hosted at the site and the paths.

Transfer data via FileMover

FileMover is a new utility which provides a user friendly interface to download individual files registered in DBS. It is intended primarily for local tests and examination prior to submitting a CRAB job to run over the entire dataset. It is not intended for downloading an entire dataset; FileMover restricts users to 10 transfer requests in a 24 hour period. FileMover can be accessed here and is viewable in a limited number of browsers, including Firefox. You will need a Hypernews account to log in. 2012: Since DBS has been depreciated for DAS, we have not verified if such a utility exists through DAS yet, this may work with the DAS system.

To download a file for testing:

Navigate to the DBS browser and find your dataset.
Click on the "LFNs: ... plain" link to get a list of the files in the dataset. Copy one of the file names.
Paste the file name into FileMover in the "Request file via LFN" box, then click the Request button.
If there are no problems with the file, the transfer will begin and usually completes in five to ten minutes. Once complete, right click the Download link and select "Copy Link Location."
Log into the cluster (ssh hepcms.umd.edu) and execute:
wget --no-check-certificate "https://cmsweb.cern.ch/filemover/download/..."
where the quotes contain the copied link.
The file can now be examined with Root or run over with CMSSW for local tests. For Root, call cmsenv in the appropriate CMSSW release area to get the appropriate version of root in your path. For CMSSW, specify the file location by prepending its full path in your PoolSource with the keyword "file:", which will direct CMSSW to look at files located in the normal filesystem, rather than the UMD file catalog.

Transfer data via the lcg-cp protocol

Generally, you should use PhEDEx to transfer datasets registered in the global DBS, FileMover to transfer individual files registered in the global DBS, or scp/sftp to transfer individual files from UMD (/data). SRM and lcg-cp work to manage a few files located in our SE (/store, /hadoop), or is a good option if you wish to transfer a few files inside someone's personal storage area on another site's storage element. We use the lcg-cp version of srm commands as they work after the OSG3 upgrade (May 2012). You will need to know the details for the storage element hosting the dataset; examples are provided for FNAL, CERN, and T3_US_UMD. DBS-registered data hosted at our site can be copied using exactly the filename path given in DBS (always starting with /store).

You will need your grid certificate and proxy to execute SRM commands.

General srm-copy syntax
Examples
T3_US_UMD hadoop examples

General srm-copy syntax:

Warning: srm-copy command are not currently working after the OSG3 upgrade (May 2012) and this section will be replaced with equivalent lcg-cp commands once they are tested. A working command example (replace the bold with your file location):

lcg-cp -v --srm-timeout 3600 -b -D srmv2 --sendreceive-timeout 2400 --connect-timeout 300 file:///proc/cpuinfo srm://hepcms-0.umd.edu:8443/srm/v2/server'?'SFN=/hadoop/store/user/belt/testfile_May2012.txt

srm-copy functions as a typical copy command, where you specify the source, then the destination. srm-copy does not accept wildcards or recursive functionality.

If you are located at the UMD cluster (/home, /data) and wish to transfer data from a different site's storage element (SE):

srm-copy "srm://se-where-data-is-located:8443/se-path?SFN=/full-path/filename.ext" file:////full-path/filename.ext"

If you are located at a remote site, you may need to set up your environment to get the srm-copy binary in your PATH (which srm-copy to verify). To get a file hosted at the UMD SE:

srm-copy "srm://hepcms-0.umd.edu:8443/srm/v2/server?SFN=/path/filename.ext" file:////full-path/filename.ext

A few notes:

scp and sftp are much easier ways to get data to and from the cluster in the /data area, as our storage element is a normal disk system and does not require srm-copy commands to retrieve data.
Due to grid authentication, you need to use srm-copy, srm-rm, and lcg-cp commands to manipulate files in your local storage element area (/store, /hadoop).
Avoid using any of these commands (srm-copy, scp, or sftp) for large transfers (10GB+) and contact the System administrators and/or Nick Hadley if you must do so.
Call srm-copy --help to view further options.
Transfers using the srm protocol to or from afs areas will not work. Work in a different area or move the file using conventional commands to a different area first. The UMD cluster does not have any afs mounts.
CERN's Castor does not support srmcp or srm-copy. You must use lcg-cp to get files to and from Castor. lcg-cp syntax for CERN takes the form:
lcg-cp --verbose -b -D srmv2 source destination
where CERN's CMS Castor service runs at:
srm://srm-cms.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/...

Examples:

Examples for CERN & FNAL are provided below. Note that the username at one site may not be the same as at the other site.

Transferring a file from FNAL's dCache user area (/pnfs/cms/WAX/11/store/user/username/file.ext) to your UMD home area (/home/username/file.ext):
voms-proxy-init -voms cms
cd ~
srm-copy "srm://cmssrm.fnal.gov:8443/srm/managerv2?SFN=/11/store/user/username/file.ext" file:///home/username/file.ext

Transferring to the UMD storage element (/hadoop, /store), requires the following format for the file:
"srm://hepcms-0.umd.edu:8443/srm/managerv2?SFN=/hadoop/store/user/username/file.ext"

Transferring a file from one of CERN's Castor user areas (/castor/cern.ch/cms/store/user/username/file.ext) to your current directory:
voms-proxy-init -voms cms
lcg-cp --verbose -b -D srmv2 "srm://srm-cms.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/cms/store/user/username/file.ext" file://`pwd`/file.ext

Transferring a file from one of CERN's Castor user areas (/castor/cern.ch/cms/store/caf/user/username/file.ext) to /tmp, which is garbage collected for week old files and is not network mounted, so only accessible from the node you are working from (/tmp/file.ext) (Warning: do not fill up /tmp as usage on that interactive node will fail):
voms-proxy-init -voms cms
lcg-cp --verbose -b -D srmv2 "srm://srm-cms.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/cms/store/caf/user/username/file.ext" file:///tmp/file.ext

Transferring a file from another of CERN's Castor user areas (/castor/cern.ch/user/u/username/file.ext) to your UMD /data area (/data/users/username/file.ext):
voms-proxy-init -voms cms
lcg-cp --verbose -b -D srmv2 "srm://srm-cms.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/user/u/username/file.ext" file:///data/users/username/file.ext

Transferring a file from another of CERN's Castor user areas (/castor/cern.ch/user/u/username/file.ext) to your UMD /hadoop storage element area (/hadoop/store/user/username/file.ext):
voms-proxy-init -voms cms
lcg-cp --verbose -b -D srmv2 "srm://srm-cms.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/user/u/username/file.ext" "srm://hepcms-0.umd.edu:8443/srm/managerv2?SFN=/hadoop/store/user/username/file.ext"

More information on CERN's Castor area can be found here. If you want to transfer files to Castor, you'll need to execute various commands in this Twiki to make directories and modify permissions.

Transfer file from CERN EOS area to Maryland:
lcg-cp -v --srm-timeout 36000000 -b -D srmv2 --sendreceive-timeout 24000000 --connect-timeout 30000000 srm://srm-eoscms.cern.ch:8443/srm/v2/server'?'SFN=//eos/cms/store/group/phys_heavyions/username/filename.root srm://hepcms-0.umd.edu:8443/srm/v2/server'?'SFN=/hadoop/store/user/username.filename.root

T3_US_UMD hadoop examples:

Examples for moving files within T3_US_UMD storage element (/hadoop, /store). Contact the System administrators if you need additional examples, these commands are just newly tested (Nov. 2011). Generally, you should use PhEDEx to transfer datasets registered in the global DBS and FileMover to transfer individual files registered in the global DBS.

Copying a file from your /data area (/data/users/username/) to your /hadoop storage element (/hadoop/store/user/username/):
voms-proxy-init -voms cms
srm-copy "file:////data/users/username/file.ext" "srm://hepcms-0.umd.edu:8443/srm/v2/server?SFN=/hadoop/store/user/username/file.ext"

Copying a file from one directory in your /hadoop directory (/hadoop/store/user/username/directoryA/) to another of your /hadoop directories (/hadoop/store/user/username/directoryB/):
voms-proxy-init -voms cms
lcg-cp --verbose -b -D srmv2 "srm://hepcms-0.umd.edu:8443/srm/v2/server?SFN=/hadoop/store/user/username/directoryA/file.ext" "srm://hepcms-0.umd.edu:8443/srm/v2/server?SFN=/hadoop/store/user/username/directoryB/file.ext"

Removing a file in your /hadoop storage element (/hadoop/store/user/username/). Warning: this will fail dbs registry if the file is registered in dbs:
voms-proxy-init -voms cms
srm-rm "srm://hepcms-0.umd.edu:8443/srm/v2/server?SFN=/hadoop/store/user/username/file.ext"

Run Madgraph on the cluster (including with condor)

For users wishing to generate events with MadGraph on the Maryland Cluster, please note MadGraph's default is to run in multicore mode. It will run over all available cores, seriously slowing down the interactive nodes, making you unpopular with other users, and forcing the admins to kill your jobs.
This issue can be easily solved by making the proper edits to your event generation's configuration file.

In your run's directory (produced in the MadGraph environment by calling >output ExampleRun ) open Cards/me5_configuration.txt
To Run Interactively:
Uncomment nb_core
And change to: nb_core = 1

You are now limited to run on one core and will not slow down the interactive node. Yet, if you are generating a large number of events > 10,000, there is a better way, have your jobs automatically submit to Condor.

Automatically Submit Jobs to Condor
Uncomment run_mode
run_mode=1
Uncomment cluster_type and cluster_queue
cluster_type = condor
cluster_queue = None

Now when you generate events on the Cluster, they should be automatically sent to Condor. For more on Condor, see above.

Notes on other utilities

Below is an incomplete list of generally useful software installed at the cluster and any details specific to the hepcms cluster. Consult the man pages or online manuals for details on use. Please email the System administrators if you'd like a package installed, found an installed package that may be useful to others, or have encountered odd issues with an installed package.

Editors
- vi
- emacs
LaTeX/.eps/.ps/.pdf viewers & compilers
- latex
- gv - very slow for some users
- ps2pdf (also installed on condor nodes)
- display - part of ImageMagick package, frequently rotates the image and doesn't always pop the GUI to manipulate the image
- dvips - don't use, this tries to print to a default printer (does not exist)
- dvipdf
- kpdf
C/C++ compilers
- If gcc doesn't work, try g++
- Higher versions of gcc are included in CMSSW versions
- You can use the CMSSW scram compiler on straight C/C++ code by putting your file into CMSSW_X_Y_Z/src/subdir1/subdir2/bin and creating the file BuildFile.xml in the same directory. This allows you to use simple C/C++ compiled code to interface with CMSSW libraries and Root. See this Twiki for further details.

Troubleshooting your connection

If the cluster interactive nodes have recently been re-installed (you will have received an email), you may get a message as follows:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: POSSIBLE DNS SPOOFING DETECTED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The RSA host key for hepcms.umd.edu has changed,
and the key for the corresponding IP address 128.8.164.21
is unchanged. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
Offending key for IP in /Users/username/.ssh/known_hosts:229
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
82:b3:db:a7:4d:34:77:5a:50:49:38:56:4e:3b:8e:59.
Please contact your system administrator.
Add correct host key in /Users/username/.ssh/known_hosts to get rid of this message.
Offending RSA key in /Users/username/.ssh/known_hosts:108
RSA host key for hepcms.umd.edu has changed and you have requested strict checking.
Host key verification failed.

When we re-install the system the hepcms RSA key does change. You can either delete your ~/.ssh/known_hosts file or remove all lines with hepcms. In your next login, you will be asked to accept the new key with a "yes".

If you try to connect and get the following it means you have had too many password failures from your IP address and you have been blocked. You need to find your IP address (whatismyip.com), and Contact the sysadmins to remedy the problem.

ssh -Y username@hepcms.umd.edu
ssh_exchange_identification: Connection closed by remote host

Any other problems, see the Users Help page

UMD HEP T3 Computing Cluster

User How-To Guide

Table of Contents

Request an account

Your first connection & password changes

Connect to the cluster

From a Linux machine (Mac will need -Y option as well):

From a Windows machine:

Transfer files to the cluster

From a Linux machine:

From a Windows machine:

Run CMSSW

Get Kerberos tickets for FNAL & CERN

Check out code in the git CMSSW repository

Submit jobs to the cluster using Condor

Enable your web directory

Get your grid certificate & proxy, add CAs to your browser

Get your grid certificate

Get your proxy:

Add CAs to your browser:

Submit jobs to the grid using CRAB

Setup your environment:

Edit crab.cfg:

Edit crab.cfg for SE output:

To stage your data back to the hepcms SE:

To stage your data back to your user area in the FNAL SE:

Create, submit, watch, and retrieve your jobs:

Example:

Submit a CRAB job to the UMD hepcms cluster

Transfer data via PhEDEx

Transfer data via FileMover

Transfer data via the lcg-cp protocol

General srm-copy syntax:

Examples:

T3_US_UMD hadoop examples:

Run Madgraph on the cluster (including with condor)

Notes on other utilities

Troubleshooting your connection