Archived OSG 0.8.0 installation instructions
These instructions assume you have already installed Pacman and subversion and have a personal grid certificate. We install OSG before BeStMan because BeStMan requires Globus, which comes with OSG. After installing BeStMan, we then go back and configure OSG again with the new SE information.
- Request host certificates
- Install & configure OSG once without BeStMan
- Install BeStMan
- Configure OSG again with new SE
- Upgrade and configure RSV monitoring
- Register with the GOC
Request host certificates:
Follow these instructions. Some notes:
- As of this writing, the latest VDT cache is http://vdt.cs.wisc.edu/vdt_1100_cache:PPDG-Cert-Scripts
- We answered no to both certificate update questions (not shown in the Twiki).
- Our full hostname is hepcms-0.umd.edu
- Enter osg as the registration authority
- Enter cms as our virtual organization (VO)
- Be sure to run the second request for the http certificate, providing hepcms-0.umd.edu as the host
- Once you've received your certificates, copy them to the appropriate directories:
cp ~root/hepcms-0cert.pem /etc/grid-security/hostcert.pem
cp ~root/hepcms-0key.pem /etc/grid-security/hostkey.pem
cp ~root/hepcms-0cert.pem /etc/grid-security/containercert.pem
cp ~root/hepcms-0key.pem /etc/grid-security/containerkey.pem
cp ~root/http-hepcms-0cert.pem /etc/grid-security/httpcert.pem
cp ~root/http-hepcms-0key.pem /etc/grid-security/httpkey.pem
chown daemon:daemon /etc/grid-security/containercert.pem
chown daemon:daemon /etc/grid-security/containerkey.pem
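Before copying a cert/key pair into place, it is worth confirming that the two files actually belong together. The sketch below checks that the RSA moduli match; the throwaway self-signed pair under /tmp is purely illustrative (substitute the real hostcert.pem/hostkey.pem in practice):

```shell
# Sketch: confirm a cert/key pair actually match before installing them.
# The files under /tmp and the self-signed pair are for illustration only;
# run the two modulus commands against the real hostcert.pem/hostkey.pem.
cert=/tmp/demo-hostcert.pem
key=/tmp/demo-hostkey.pem
openssl req -x509 -newkey rsa:2048 -nodes -keyout "$key" -out "$cert" \
    -days 1 -subj "/CN=hepcms-0.umd.edu" 2>/dev/null
# An RSA cert and key match iff their moduli are identical.
cert_mod=$(openssl x509 -noout -modulus -in "$cert")
key_mod=$(openssl rsa -noout -modulus -in "$key")
[ "$cert_mod" = "$key_mod" ] && echo "cert and key match"
```

A mismatched pair will cause hard-to-diagnose GSI authentication failures later, so this one-minute check is cheap insurance.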
Install & configure OSG once without BeStMan
The OSG installation and configuration is based on this OSG guide. At the time of writing, 1.0.0 had not yet been released, so these instructions cover 0.8.0. The pacman calls below do not specify a version, so if you are installing now you may need to request 0.8.0 explicitly. OSG is built on top of services provided by VDT, so the VDT documentation may also be helpful. As root on the HN (su -):
- Prepare for OSG install:
cd /share/apps
mkdir osg-0.8.0
ln -s osg-0.8.0 osg
mkdir wnclient
mkdir osg-app
chmod 775 osg-app
mkdir /data/se/osg
chown root:users /data/se/osg
chmod 775 /data/se/osg
export VDTSETUP_CONDOR_LOCATION=/opt/condor
- Install the WN client (note: we are following install option 2):
cd /share/apps/wnclient
pacman -get OSG:wn-client
Answers to questions:
Agree to license? y
Where to store CA files? l (lowercase L, local)
Cron rotation? n
- Install the CE:
- Download the CE software:
cd /share/apps/osg
pacman -get OSG:ce
Answer yall (yes to all) when asked if you want to add packages.
- Execute from the command line, and add to ~root/.bashrc & /etc/skel/.bashrc:
. /share/apps/osg/setup.sh
Add to /etc/skel/.cshrc:
source /share/apps/osg/setup.csh
- Get the OSG-CondorG package:
pacman -get OSG:Globus-Condor-Setup
- Get ManagedFork for Condor:
pacman -get OSG:ManagedFork
- srm is hard-coded in many places (some of which we do not control) to use port 8443. Change the port that CEmon listens on by replacing 8443 in /share/apps/osg/tomcat/v5/conf/server.xml with 7443. We must do this because our HN is both our CE and SE. The line:
<Connector port="8893"
enableLookups="false" redirectPort="8443" debug="0"
protocol="AJP/1.3" />
should become:
<Connector port="8893"
enableLookups="false" redirectPort="7443" debug="0"
protocol="AJP/1.3" />
- Similarly, change the port that Apache listens on by replacing 8080 in /share/apps/osg/apache/conf/httpd.conf with 6060. The line:
Listen 8080
should become:
Listen 6060
And edit the file /share/apps/osg/apache/conf/extra/httpd-ssl.conf to change port 8443 to port 7443. The lines:
Listen 8443
RewriteRule (.*) https://%{SERVER_NAME}:8443$1
should become:
Listen 7443
RewriteRule (.*) https://%{SERVER_NAME}:7443$1
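These repetitive port substitutions can be done with sed instead of hand-editing. The sketch below exercises the pattern on a demo file under /tmp standing in for httpd-ssl.conf; the .bak backup lets you diff before trusting the result:

```shell
# Sketch of the port edit with sed, on a stand-in for httpd-ssl.conf.
# The demo file and its contents are illustrative; run against the real
# config only after checking the diff versus the .bak backup.
conf=/tmp/httpd-ssl-demo.conf
printf 'Listen 8443\nRewriteRule (.*) https://%%{SERVER_NAME}:8443$1\n' > "$conf"
sed -i.bak 's/8443/7443/g' "$conf"
grep 7443 "$conf"
```

Note that a global s/8443/7443/g is only safe in files where every occurrence of 8443 should change, which is the case for httpd-ssl.conf here but not for server.xml (where redirectPort changes and the AJP connector port does not).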
- Create the needed OSG users and the local EDG GridMap file:
- Edit /share/apps/osg/edg/etc/edg-mkgridmap.conf and remove all lines but those for the mis, uscms01, and ops users.
- Create these special users:
useradd -c "Monitoring information service" -n mis -s /bin/true
useradd -c "CMS grid jobs" -n uscms01 -s /bin/true
useradd -c "Monitoring from ops" -n ops -s /bin/true
ssh-agent $SHELL
ssh-add
rocks sync config
rocks sync users
Setting their shell to /bin/true is a security measure, as these accounts should never actually log in.
- Place the grid user map where it will be needed later:
/share/apps/osg/edg/sbin/edg-mkgridmap --output /etc/grid-security/grid-mapfile
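The generated grid-mapfile is a simple text mapping from quoted certificate DNs to local accounts. The sketch below shows how a lookup works; the DNs are made up for the demo:

```shell
# Sketch of how the grid-mapfile is consulted: each line maps a quoted
# certificate DN to a local account. The DNs below are fabricated.
map=/tmp/grid-mapfile-demo
cat > "$map" <<'EOF'
"/DC=org/DC=doegrids/OU=People/CN=Example User 12345" uscms01
"/DC=org/DC=doegrids/OU=Services/CN=ops/probe.example.org" ops
EOF
dn='/DC=org/DC=doegrids/OU=People/CN=Example User 12345'
# grep -F avoids regex surprises from the slashes and dots in the DN
account=$(grep -F "\"$dn\"" "$map" | awk '{print $NF}')
echo "$dn -> $account"
```

This is handy for debugging authorization failures: if a user's DN does not grep out of /etc/grid-security/grid-mapfile, their jobs will be rejected at the gatekeeper.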
- Change default services to run and activate them:
vdt-register-service --name edg-mkgridmap --enable
vdt-register-service --name gratia-condor --disable
vdt-register-service --name syslog-ng --disable
vdt-control --on
(vdt-control --list will state which services want to run, although it does not show which services are currently running)
- Configure OSG:
cd /share/apps/osg/monitoring
./configure-osg.sh
Responses to questions:
- OSG group: OSG
- OSG hostname: hepcms-0.umd.edu
- OSG sitename: umd-cms
- VO sponsors: uscms
- Policy URL: http://hep-t3.physics.umd.edu/policy.html
- Contact name: Marguerite Tonjes
- Contact email: mtonjes@nospam.umd.edu (w/o nospam)
- City: College Park, MD
- Country: USA
- Longitude: -76.92
- Latitude: 38.98
- OSG GRID path: /share/apps/wnclient
- OSG APP path: /share/apps/osg-app
- OSG DATA path: /data/se/osg
- OSG WN_TMP path: /tmp
- OSG SITE_READ path: leave empty (hit return)
- OSG SITE_WRITE path: leave empty (hit return)
- SE available: n (note: we will configure this later, once BeStMan is installed)
- MonALISA: n
- squid: n
- Batch queue OSG_JOB_MANAGER: condor
- Condor directory: /opt/condor
- Condor config: /opt/condor/etc
- ManagedFork: y
- WS-GRAM service: y
- Syslog-NG: n
- CA certificate updater: y
- GLExec: n
- Subclusters: 1
- Name of subcluster: hepcms-0.umd.edu
- Processor vendor: GenuineIntel
- Processor model: accept what OSG picked up already (hit return)
- Clockspeed: accept what OSG picked up already (hit return)
- Physical CPUs in WNs: 8
- Logical CPUs in WNs: 8
- RAM in WNs: 16384
- Inbound connectivity: FALSE (we do have it, but advertising it is a security risk)
- Outbound connectivity: TRUE
- Nodes in subcluster: 8
- SRM through GIP: n (note: we will configure this later, once BeStMan is installed)
- SE with gsiftp: hepcms-0.umd.edu (this is not the same as our BeStMan SE, which is why we configure it now)
- Access path: /data/se/osg (again, this is not a true SE)
- Fix GIP:
mkdir /root/svn
cd /root/svn
svn co svn://t2.unl.edu/brian/gip/branches/original/ gip
cp -r gip/* /share/apps/osg
- Fix WS-GRAM:
chmod 666 /share/apps/osg/globus/globus-fork.log
chmod 666 /share/apps/osg/globus/container-real.log
- Start OSG:
vdt-control --off
vdt-control --on
- Configure the certificates for the WNs:
cd /share/apps/wnclient/globus
unlink TRUSTED_CA
ln -s /share/apps/osg/globus/share/certificates TRUSTED_CA
- The command which fetches the newest certificate info is better run twice a day instead of the default of once a day. Edit /var/spool/cron/root and change the line:
20 1 * * * /share/apps/osg-0.8.0/fetch-crl/share/doc/fetch-crl-2.6.2/fetch-crl.cron
to
20 1,13 * * * /share/apps/osg-0.8.0/fetch-crl/share/doc/fetch-crl-2.6.2/fetch-crl.cron
- Edit the sudoers file:
Copy the text in /share/apps/osg/monitoring/sudo-setup.txt.
Edit the sudo file:
visudo
Paste the text.
Write the file and quit:
:wq!
- Turn on ManagedFork:
/share/apps/osg/vdt/setup/configure_globus_gatekeeper --managed-fork y --server y
- If CMSSW is installed (the instructions below are repeated in the CMSSW installation):
- Inform BDII which versions of CMSSW are installed and that we have the slc4_ia32 environment. Edit /share/apps/osg-app/etc/grid3-locations.txt to include the lines:
VO-cms-slc4_ia32_gcc345
VO-cms-CMSSW_X_Y_Z CMSSW_X_Y_Z /software/cmssw
(modify X_Y_Z and add a new line for each installed CMSSW release)
- Add a link to the installation in the osg-app directory:
cd /share/apps/osg-app
mkdir cmssoft
ln -s /software/cmssw cmssoft/cms
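Since a new grid3-locations.txt line is needed per release, the file can be regenerated from whatever is actually installed. A sketch, using demo directories under /tmp in place of the real /software/cmssw area and /share/apps/osg-app/etc/grid3-locations.txt:

```shell
# Sketch: build the grid3-locations.txt entries from the releases present
# under the CMSSW area. The /tmp paths and the two demo release directories
# are fabricated; the real area is /software/cmssw.
sw=/tmp/cmssw-demo
loc=/tmp/grid3-locations-demo.txt
mkdir -p "$sw/CMSSW_2_0_8" "$sw/CMSSW_2_1_4"
{
    echo "VO-cms-slc4_ia32_gcc345"
    for rel in "$sw"/CMSSW_*; do
        r=$(basename "$rel")
        echo "VO-cms-$r $r /software/cmssw"
    done
} > "$loc"
cat "$loc"
```

Regenerating the file this way keeps the BDII advertisement in sync with the installed releases instead of relying on someone remembering to edit it by hand.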
Install BeStMan
We give 7TB of space on our disk array to the SE: 1.5TB of replica quality (low persistency) and 5.5TB of custodial quality (high persistency). We install BeStMan after the initial OSG install so that globus, which BeStMan needs, is already in place. These instructions are based on the BeStMan admin manual. We choose to install BeStMan on the HN. Newer versions of BeStMan require Java 1.6; we have been using Java 1.5 successfully with BeStMan 2.2.0.11.
Note: this guide uses the BeStMan tarball install; however, pacman can be used instead and is recommended by OSG. The pacman install has a number of advantages, and were we to do the install again, we would use it. Instructions are here. One important caveat: the pacman install sets some default configuration options which we do not want; in particular, we use an existing filesystem and want to specify the paths BeStMan uses. After a pacman install, follow the instructions in the OSG guide to reconfigure using the configure options outlined below (some configure options have changed in the latest BeStMan releases; if one is not recognized, check the BeStMan admin manual).
As root (su -) on the HN:
- Prepare the working environment:
cd /data/se
mkdir log
chown root:users log
chmod 775 log
mkdir replica
chown root:users replica
chmod 775 replica
mkdir replica/uscms01
chown uscms01:users replica/uscms01
mkdir custodial
chown root:users custodial
chmod 775 custodial
mkdir custodial/uscms01
chown uscms01:users custodial/uscms01
- Create a user drop-spot for files brought to the cluster via srm:
mkdir /data/users/srm-drop
chown uscms01:users /data/users/srm-drop
chmod 775 /data/users/srm-drop
- Create a cron job to clean files from srm-drop on a daily basis. Edit /var/spool/cron/root and add the line:
49 02 * * * find /data/users/srm-drop -mtime +7 -type f -exec rm -f {} \;
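The find invocation in this cron entry can be dry-run safely on a throwaway directory first. A sketch (everything under /tmp is illustrative; GNU touch -d is used to age a file artificially):

```shell
# Dry run of the cron job's cleanup command on a throwaway directory.
# /tmp/srm-drop-demo stands in for /data/users/srm-drop.
drop=/tmp/srm-drop-demo
mkdir -p "$drop"
touch "$drop/new-file"
touch -d '10 days ago' "$drop/old-file"
# same expression as the cron entry: files untouched for more than 7 days
find "$drop" -mtime +7 -type f -exec rm -f {} \;
ls "$drop"
```

Only new-file should survive. Testing the expression this way avoids discovering an over-aggressive -mtime predicate after it has already deleted users' files.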
This will remove week-old files from /data/users/srm-drop every day at 2:49am.
- Download & install BeStMan:
cd /share/apps
wget "http://datagrid.lbl.gov/bestman/pkg/bestman-2.2.0.11.tar.gz"
tar -xzvf bestman-2.2.0.11.tar.gz
To get the latest release of BeStMan, simply use bestman-latest.tar.gz.
- Configure BeStMan:
cd bestman/setup
./configure \
--with-replica-storage-path=/data/se/replica \
--with-replica-storage-size=1572864 \
--with-custodial-storage-path=/data/se/custodial \
--with-custodial-storage-size=5767168 \
--with-eventlog-path=/data/se/log \
--with-cachelog-path=/data/se/log \
--with-http-port=7070 \
--with-https-port=8443 \
--with-globus-tcp-port-range=20000,25000 \
--with-globus-tcp-source-range=20000,25000 \
--enable-srmcache-keyword yes \
--with-srm-name=server \
--with-globus-location=/share/apps/osg/globus \
--with-java-home=/share/apps/osg/jdk1.5
Both the enable-srmcache-keyword and with-srm-name options already default to the values given here; we point them out as worth further reading in the BeStMan admin manual.
- Make BeStMan start whenever the HN starts, and start the service now:
ln -s /share/apps/bestman/sbin/SXXbestman /etc/rc3.d/S97bestman
/etc/rc3.d/S97bestman start
The configure script edits the file /share/apps/bestman/conf/bestman.rc, among others. This file can be edited manually without the need to restart the BeStMan service.
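As a sanity check on the two --with-*-storage-size values passed to configure above: they match the advertised 1.5TB and 5.5TB if the unit is MB with 1TB = 1024*1024 MB (our reading, based on the numbers themselves; confirm the unit in the BeStMan admin manual):

```shell
# 1.5 TB and 5.5 TB expressed in MB (1 TB = 1024 * 1024 MB).
# Shell arithmetic is integer-only, so scale by 10 to handle the .5.
replica_mb=$(( 15 * 1024 * 1024 / 10 ))
custodial_mb=$(( 55 * 1024 * 1024 / 10 ))
echo "replica: $replica_mb MB, custodial: $custodial_mb MB"
```

These reproduce the 1572864 and 5767168 used in the configure call, which is reassuring when adapting the numbers to a different array size.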
OSG provides older versions of srm-copy and srmcp which do not work with the latest BeStMan. A newer version of the dCache srm-client must be installed and links made to the newer clients. Finally, srmcp uses a few default values that BeStMan doesn't accept, which need to be configured.
As root (su -) on the HN:
- Download the latest dCache srm-client:
wget -P /home/install/contrib/4.3/x86_64/RPMS "http://www.dcache.org/downloads/1.8.0/dcache-srmclient-1.8.0-15p8.noarch.rpm"
- Install the dCache srm-client on the HN:
rpm -ivh /home/install/contrib/4.3/x86_64/RPMS/dcache-srmclient-1.8.0-15p8.noarch.rpm
- Link the old installation directories to the new:
cd $VDT_LOCATION
mv srm-v1-client srm-v1-client.old
mv srm-v2-client srm-v2-client.old
ln -s /opt/d-cache/srm srm-v1-client
ln -s /share/apps/bestman srm-v2-client
mkdir srm-v1-client/etc
cp srm-v1-client.old/etc/config-2.xml srm-v1-client/etc/.
- Edit the srm configuration file srm-v1-client/etc/config-2.xml and add the following lines:
<!-- dCache srm/managerv1.wsdl, BeStMan srm/v2/server.wsdl -->
<webservice_path> srm/v2/server.wsdl </webservice_path>
<!-- ONLINE|NEARLINE; NEARLINE by default (dCache), BeStMan requires ONLINE -->
<access_latency> ONLINE </access_latency>
This is needed for pulling data from a BeStMan server. You'll also need to edit the <pushmode> tag, set it to true:
<pushmode> true </pushmode>
This is needed for third-party transfers (srmcp srm://... srm://...) between dCache & BeStMan servers, which are used by PhEDEx.
- Place the srm configuration file in the Rocks install directory to be served to the WNs:
cp srm-v1-client/etc/config-2.xml /home/install/contrib/4.3/x86_64/RPMS/dCache-srm-client-config.x-m-l
- Install and configure the dCache srm-client on the WNs:
- Edit /home/install/site-profiles/4.3/nodes/extend-compute.xml and add the line:
<package>dcache-srmclient</package>
As well as adding the following to the <post> section:
mkdir /opt/d-cache/srm/etc
cd /opt/d-cache/srm/etc
wget http://<var name="Kickstart_PublicHostname"/>/install/rocks-dist/lan/x86_64/RedHat/RPMS/dCache-srm-client-config.x-m-l -O config-2.xml
cd -
- Create the new Rocks distribution:
cd /home/install
rocks-dist dist
- Reinstall the WNs
Configure OSG again with new SE
Now that we have BeStMan installed, which manages our SE, we can configure OSG for our SE.
cd /share/apps/osg/monitoring
./configure-osg.sh
Use the same responses to questions, except for questions 18 & 40, where we will now configure the SE. This will generate a new subset of questions:
18. SE available: y
18a. Default SE: hepcms-0.umd.edu
40. Publish SRM information through GIP: Y
40a. OSG sitename of SRM storage element: UMD-CMS-SE
40b. Hostname of SRM SE: hepcms-0.umd.edu
40c. SRM implementation: bestman
40d. Version of bestman: 2.2.0.11
40e. Protocol version for SRM SE access: 2.0.0
40f. Number of gsiftp access points: 1
40g. Access endpoint for gsiftp: gsiftp://ftp_hepcms-0.umd.edu:2811
40h. SRM protocol version: 2 (ignore the hint, which is incorrect)
40i. Full path of root directory for storage: /data/se/osg
40j. Simplified VOs: y
40k. Local directory for all VOs: /data/se/osg
40l. Advertise standalone gridftp: y
Answer 41 and 42 as before.
Stop and start the services:
vdt-control --off
vdt-control --on
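After the restart, a quick reachability check on the SRM endpoint can catch a firewall or service problem before the GIP publishes the SE. A sketch (the hostname and port come from this guide; /dev/tcp is a bash feature, hence the explicit bash -c; "open" vs "closed" obviously depends on where you run it):

```shell
# Sketch: probe whether the BeStMan https/SRM port answers at all.
# This only tests TCP reachability, not SRM itself.
host=hepcms-0.umd.edu
port=8443
if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    status=open
else
    status="closed or unreachable"
fi
echo "srm port $port on $host: $status"
```

A full functional test would use an srm client against srm://hepcms-0.umd.edu:8443, but the port check is a useful first step when transfers fail silently.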
Upgrade and configure RSV monitoring
OSG 0.8.0 comes with RSV V1 (note: this link was out of date at the time of this writing, but provides detailed information that may be helpful regardless). We upgrade RSV to V2 following the instructions in this RSV guide. As root (su -) on the HN:
- Create the rsvuser:
useradd -c "RSV monitoring user" -n rsvuser
passwd rsvuser
ssh-agent $SHELL
ssh-add
rocks sync config
rocks sync users
- Place your personal usercert.pem and userkey.pem files into ~rsvuser/.globus and give rsvuser ownership:
chown rsvuser:users ~rsvuser/.globus/*
- As rsvuser (su - rsvuser), edit ~/.cshrc and add to the end:
# RSV
source $VDT_LOCATION/vdt/etc/condor-devel-env.csh
- Create the proxy as rsvuser:
voms-proxy-init -voms cms -out /home/rsvuser/x509up_rsv -hours 1000
Make note of the expiration date, and be sure to log back in as rsvuser and renew the proxy whenever it is about to expire. Note that 1000 hours may be longer than your certificate's remaining lifetime; reduce the number until you no longer receive an error about the proxy expiring after the lifetime of the certificate.
- Return to the root user and turn the services off while we modify them:
cd $VDT_LOCATION
vdt-control --off
- Follow the instructions in the RSV V1 to V2 upgrade guide to back up the existing RSV directories and get the new files.
- Additionally, if you've downloaded the latest version of the vdt-update-certs package (released Sep. 11, 2008), this 'hacked' V2 RSV upgrade will throw a spurious error. The probe in question can be replaced safely:
cd $VDT_LOCATION/osg-rsv/bin/probes
mv cacert-crl-expiry-probe /tmp/cacert-crl-expiry-probe.old
wget "http://rsv.grid.iu.edu/downloads/pre-release/0.8.0/cacert-crl-expiry-probe"
chmod +x cacert-crl-expiry-probe
- Make the html-consumer scripts, which generate the monitoring web pages, executable:
chmod +x $VDT_LOCATION/osg-rsv/bin/consumers/*
- Configure RSV:
$VDT_LOCATION/osg-rsv/setup/configure_osg_rsv \
--user rsvuser --init --server y \
--ce-probes --ce-uri "hepcms-0.umd.edu" \
--srm-probes --srm-uri "hepcms-0.umd.edu" \
--srm-webservice-path "srm/v2/server" --srm-dir /data/se/osg \
--grid-type "OSG" --gridftp-probes \
--setup-for-apache --gratia --verbose --consumers \
--proxy /home/rsvuser/x509up_rsv
- Start the services:
vdt-control --on
- You can verify that the probes are running by executing:
. $VDT_LOCATION/vdt/etc/condor-devel-env.sh
condor_q
- RSV will create files in /data/se/osg which must be cleaned up regularly by cron. As root (su -) on the HN, edit /var/spool/cron/root and add the line:
06 05 * * 0 find /data/se/osg -mtime +7 -type f -exec rm -f {} \;
This will remove week-old files in /data/se/osg every Sunday at 5:06am. RSV, like other programs, also places files in /tmp; if you haven't done so already, configure cron to garbage-collect /tmp on all of the nodes.
Results of the RSV probes will be visible at http://hepcms-0.umd.edu:6060/rsv in 15-30 mins. Further information can be found in $VDT_LOCATION/osg-rsv/logs/probes.
You can enable new probes by:
cp $VDT_LOCATION/osg-rsv/config/sample_metrics.conf $VDT_LOCATION/osg-rsv/config/hepcms-0.umd.edu_metrics.conf
and editing hepcms-0.umd.edu_metrics.conf to turn the desired probes on and off.
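Enabling a probe is a one-line edit, which can be scripted. In the sketch below the key=on/off format and probe names are fabricated for the demo; consult sample_metrics.conf for the real syntax before touching hepcms-0.umd.edu_metrics.conf:

```shell
# Sketch of toggling one probe on while leaving the rest untouched.
# The file format and probe names here are made up for illustration;
# the real syntax lives in sample_metrics.conf.
mconf=/tmp/metrics-demo.conf
printf 'ping-host@hepcms-0.umd.edu=off\ncacert-expiry@hepcms-0.umd.edu=off\n' > "$mconf"
# flip only the ping-host probe to on
sed -i 's/^ping-host@\(.*\)=off$/ping-host@\1=on/' "$mconf"
cat "$mconf"
```

Anchoring the sed pattern on the probe name is what keeps the edit from accidentally enabling every probe in the file.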
Register with the Grid Operations Center (GOC):
This should be done only once per site (we have already done so).
- Navigate to the OSG Information Management web portal.
- Register as a new user.
- Under the Registrations navigation bar, select Resources->Add New Resource
- Fill in the following values for our CE:
Facility: My Facility Is Not Listed (now that we have registered, we select University of Maryland for any new resources we might add later)
Site: My Site Is Not Listed (again, now that we have registered, we select UMD-CMS)
Resource Name: umd-cms
Resource Services: Compute Element, Bestman-Xrootd Storage Element (note: this text may change soon, select whatever is designated as BeStMan or select SRM V2 if BeStMan is removed)
Fully Qualified Domain Name: hepcms-0.umd.edu
Resource URL: http://hep-t3.physics.umd.edu
OSG Grid: OSG Production Resource
Interoperability: Select WLCG Interoperability BDII (Published to WLCG); do not select WLCG Interoperability Monitoring (SAM)
GOC Logging: Do not select Publish Syslogng
Resource Description: Tier-3 computing center. Priority given to local users, but opportunistic use by the CMS VO is allowed.
- Add the primary and secondary system and security admins.
- You will receive emails with further instructions.
Once VORS registration has completed, monitoring info will be here. Once BDII registration has completed, monitoring info will be here.