Jobsub ID 469101.0@justin-prod-sched01.dune.hep.ac.uk
| Jobsub ID | 469101.0@justin-prod-sched01.dune.hep.ac.uk |
| Workflow Testing | Yes |
| Workflow ID | 1 |
| Stage ID | 1 |
| User name | amcnab@fnal.gov |
| Requested | Processors | 1 |
| Requested | GPU | Yes |
| Requested | RSS bytes | 1073741824 (1024 MiB) |
| Requested | Wall seconds limit | 3600 (1 hour) |
| Submitted time | 2025-12-19 00:11:09 |
| Site | UK_RAL-PPD |
| Entry | CMSHTPC_T2_UK_SGrid_RALPP_hep206_gpu |
| Last heartbeat | 2025-12-19 00:25:06 |
| From worker node | Hostname | hepacc09.pp.rl.ac.uk |
| From worker node | cpuinfo | Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz |
| From worker node | OS release | Scientific Linux release 7.9 (Nitrogen) |
| From worker node | Processors | 1 |
| From worker node | RSS bytes | 1073741824 (1024 MiB) |
| From worker node | Wall seconds limit | 257400 (71 hours) |
| From worker node | GPU | NVIDIA A100-PCIE-40GB 575.57.08 8.0 92.00.25.00.08 40441MiB |
| From worker node | Inner Apptainer? | True |
| Job state | finished |
| Started | 2025-12-19 00:22:10 |
| Input files | |
| Jobscript | Exit code | 0 |
| Jobscript | Real time | 2m (158s) |
| Jobscript | CPU time | 0m (36s = 22%) |
| Jobscript | Max RSS bytes | 66068480 (63 MiB) |
| Outputting started | 2025-12-19 00:24:48 |
| Output files | |
| Finished | 2025-12-19 00:25:06 |
| Saved logs | justin-logs:469101.0-justin-prod-sched01.dune.hep.ac.uk.logs.tgz |
Jobscript log (last 10,000 characters)
um: Expected=5e5806c8 Found=5e5806c8
DEBUG:root:Renaming file davs://webdav.grid.surfsara.nl:2880/pnfs/grid.sara.nl/data/dune/disk/RSE/testpro/78/62/awt-1766103735-aW4BHlfA6Q.rucio.upload to davs://webdav.grid.surfsara.nl:2880/pnfs/grid.sara.nl/data/dune/disk/RSE/testpro/78/62/awt-1766103735-aW4BHlfA6Q
DEBUG:root:gfal.Default: renaming file from davs://webdav.grid.surfsara.nl:2880/pnfs/grid.sara.nl/data/dune/disk/RSE/testpro/78/62/awt-1766103735-aW4BHlfA6Q.rucio.upload to davs://webdav.grid.surfsara.nl:2880/pnfs/grid.sara.nl/data/dune/disk/RSE/testpro/78/62/awt-1766103735-aW4BHlfA6Q
DEBUG:root:gfal.Default: closing protocol connection
DEBUG:root:Upload done.
INFO:root:Successfully uploaded file awt-1766103735-aW4BHlfA6Q
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): dune-rucio.fnal.gov:443
/cvmfs/dune.opensciencegrid.org/products/dune/rucio/v38_1_0/NULL/lib/python3.9/site-packages/urllib3/connectionpool.py:1061: InsecureRequestWarning: Unverified HTTPS request is being made to host 'dune-rucio.fnal.gov'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "POST /traces/ HTTP/1.1" 404 207
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "PUT /replicas HTTP/1.1" 200 0
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "POST /dids/testpro/awt-uploads-202550/dids HTTP/1.1" 201 7
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): dune-rucio.fnal.gov:443
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "POST /replicas/list HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): dune-rucio.fnal.gov:443
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "GET /dids/testpro/awt-uploads-202550/files HTTP/1.1" 200 None
--- Upload try 1/1
--- Rucio upload 1/1 returns 0
--- Replica check try 1/1
--- Dataset awt-uploads-202550 check try 1/1
--- Upload, replicas, and datasets checks passed
'justin-rucio-upload --rse SURFSARA --protocol davs --scope testpro --dataset awt-uploads-202550 awt-1766103735-aW4BHlfA6Q --timeout 1200' returns 0
---------------------------------------------------------------------
UK_RAL-PPD T3_US_NERSC davs root://dtn14.nersc.gov:1094//global/cfs/cdirs/m3249/dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt
'xrdcp --force --nopbar --verbose root://dtn14.nersc.gov:1094//global/cfs/cdirs/m3249/dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt downloaded.txt' returns 0
{
"created_timestamp": null,
"creator": "dunepro",
"fid": "XouBtAcVR92q6B1q",
"metadata": {},
"name": "awt-1766103735-L22BQT0AB1",
"namespace": "testpro",
"retired": false,
"retired_by": null,
"retired_timestamp": null,
"size": 0,
"updated_by": null,
"updated_timestamp": null
}
metacat file declare returns 0
GFAL_CONFIG_DIR: GFAL_PLUGIN_DIR:
justin-rucio-upload attempt 1
DEBUG:root:Num. of files that upload client is processing: 1
DEBUG:dogpile.cache.region:No value present for key: "host_to_choose_choice['https://dune-rucio.fnal.gov']"
DEBUG:dogpile.lock:NeedRegenerationException
DEBUG:dogpile.lock:no value, waiting for create lock
DEBUG:dogpile.lock:value creation lock <dogpile.cache.region.CacheRegion._LockWrapper object at 0x14b2c8f4aa30> acquired
DEBUG:dogpile.cache.region:No value present for key: "host_to_choose_choice['https://dune-rucio.fnal.gov']"
DEBUG:dogpile.lock:Calling creation function for not-yet-present value
DEBUG:dogpile.cache.region:Cache value generated in 0.000 seconds for key(s): "host_to_choose_choice['https://dune-rucio.fnal.gov']"
DEBUG:dogpile.lock:Released creation lock
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): dune-rucio.fnal.gov:443
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "GET /rses/?expression=T3_US_NERSC HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): dune-rucio.fnal.gov:443
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "GET /rses/T3_US_NERSC HTTP/1.1" 200 1240
DEBUG:root:Input validation done.
INFO:root:Preparing upload for file awt-1766103735-L22BQT0AB1
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "GET /rses/T3_US_NERSC/attr/ HTTP/1.1" 200 139
DEBUG:root:wan domain is used for the upload
DEBUG:root:Registering file
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "GET /accounts/dunepro/scopes/ HTTP/1.1" 200 870
DEBUG:root:Trying to create dataset: testpro:awt-uploads-202550
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "POST /dids/testpro/awt-uploads-202550 HTTP/1.1" 409 104
INFO:root:Dataset testpro:awt-uploads-202550 already exists - no rule will be created
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "GET /dids/testpro/awt-1766103735-L22BQT0AB1/meta?plugin=DID_COLUMN HTTP/1.1" 404 129
DEBUG:root:File DID does not exist
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "POST /replicas HTTP/1.1" 201 7
INFO:root:Successfully added replica in Rucio catalogue at T3_US_NERSC
DEBUG:root:gfal.NoRename: connecting to storage
DEBUG:root:Checking if davs://dtn14.nersc.gov:1094/global/cfs/cdirs/m3249/dune/RSE/testpro/2b/72/awt-1766103735-L22BQT0AB1 exists
DEBUG:root:gfal.NoRename: checking if file exists davs://dtn14.nersc.gov:1094/global/cfs/cdirs/m3249/dune/RSE/testpro/2b/72/awt-1766103735-L22BQT0AB1
--- Upload try 1/1
--- Rucio upload 1/1 fails: The requested service is not available at the moment.
Details: An unknown exception occurred.
Details: Result Domain name resolution failed after 1 attempts
--- Exit with 99
'justin-rucio-upload --rse T3_US_NERSC --protocol davs --scope testpro --dataset awt-uploads-202550 awt-1766103735-L22BQT0AB1 --timeout 1200' returns 99
subject : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=justin-jobs-production.dune.hep.ac.uk/CN=2678515116/CN=176610373030
issuer : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=justin-jobs-production.dune.hep.ac.uk/CN=2678515116
identity : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=justin-jobs-production.dune.hep.ac.uk/CN=2678515116
type : RFC compliant proxy
strength : 2048 bits
path : /home/awt-proxy.pem
timeleft : 167:57:22
key usage : Digital Signature, Key Encipherment, Key Agreement
=== VO dune extension information ===
VO : dune
subject : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=justin-jobs-production.dune.hep.ac.uk
issuer : /DC=org/DC=incommon/C=US/ST=Illinois/O=Fermi Research Alliance/CN=voms1.fnal.gov
attribute : /dune/Role=Production/Capability=NULL
attribute : /dune/Role=NULL/Capability=NULL
timeleft : 146:44:13
uri : voms1.fnal.gov:15042
===== Results =====
Download/upload commands:
xrdcp --force --nopbar --verbose $read_pfn downloaded.txt
echo '{"namespace":"testpro","name":"FILENAME","size":0}' >tmp.json
metacat file declare --json -f tmp.json "dune:all"
justin-rucio-upload --rse $rse_name --protocol $write_protocol --scope testpro --dataset awt-uploads-202550 --timeout 1200 FILENAME
Use the wrapper job link on the page for the job on the justIN Dashboard to find the full log file, with errors from these commands
Each line: $JUSTIN_SITE_NAME $rse_name $download_retval $upload_retval $read_pfn $write_protocol
==awt== UK_RAL-PPD DUNE_CA_SFU 0 0 root://lcg-dunese1.sfu.computecanada.ca:1094//dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD DUNE_CERN_EOS 0 0 root://eospublic.cern.ch:1094//eos/experiment/neutplatform/protodune/dune/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD DUNE_ES_PIC 0 0 root://xrootd.pic.es:1094/pnfs/pic.es/data/dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD DUNE_FR_CCIN2P3_DISK 0 0 root://ccxrootdegee.in2p3.fr:1094/pnfs/in2p3.fr/data/dune/disk/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD DUNE_IT_INFN_CNAF 0 0 root://xrootd-archive.cr.cnaf.infn.it:1096//dune/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD DUNE_UK_GLASGOW 0 0 root://cephc02.gla.scotgrid.ac.uk:1094//cephfs/dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD DUNE_UK_LANCASTER_CEPH 0 0 root://xgate.hec.lancs.ac.uk:1094//cephfs/grid/dune/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD DUNE_UK_MANCHESTER_CEPH 0 0 root://meitner.tier2.hep.manchester.ac.uk:1094//cephfs/experiments/dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD DUNE_US_BNL_SDCC 0 0 root://dcdndoor.sdcc.bnl.gov:1094//pnfs/sdcc.bnl.gov/data/dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD DUNE_US_FNAL_DISK_STAGE 0 0 root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/persistent/staging/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD FNAL_DCACHE 0 99 root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro//other/awt-staging/awt-download-2023-03-07-01.txt_1749841165 davs
==awt== UK_RAL-PPD NIKHEF 0 0 root://dune.dcache.nikhef.nl:1094/pnfs/nikhef.nl/data/dune/generic/rucio/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD PRAGUE 0 0 root://se1.farm.particle.cz:1094//dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD QMUL 0 0 root://xrootd1.esc.qmul.ac.uk:1094//dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD RAL-PP 0 0 root://mover.pp.rl.ac.uk:1094/pnfs/pp.rl.ac.uk/data/dune/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD RAL_ECHO 0 0 root://xrootd.echo.stfc.ac.uk:1094/dune:/protodune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD SURFSARA 0 0 root://otter12.grid.surfsara.nl:21094/pnfs/grid.sara.nl/data/dune/disk/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD T3_US_NERSC 0 99 root://dtn14.nersc.gov:1094//global/cfs/cdirs/m3249/dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
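
The `==awt==` result lines above follow the field order given in the "Each line:" note, so they can be summarised mechanically. The following is a minimal sketch in Python, not part of justIN itself: the names `AwtResult`, `parse_awt_lines`, and `failed_rses` are hypothetical, and the sketch assumes the fields are whitespace-separated with no spaces inside the PFN, as in the lines shown here. The sample text is copied from three of the result lines in this log.

```python
from typing import List, NamedTuple


class AwtResult(NamedTuple):
    """One ==awt== result line (hypothetical container, not a justIN type)."""
    site: str
    rse: str
    download_retval: int
    upload_retval: int
    read_pfn: str
    write_protocol: str


def parse_awt_lines(log_text: str) -> List[AwtResult]:
    """Collect every well-formed ==awt== line from a jobscript log."""
    results = []
    for line in log_text.splitlines():
        fields = line.split()
        # Expect the ==awt== marker followed by exactly six fields.
        if len(fields) != 7 or fields[0] != "==awt==":
            continue
        _, site, rse, download, upload, pfn, protocol = fields
        try:
            results.append(
                AwtResult(site, rse, int(download), int(upload), pfn, protocol)
            )
        except ValueError:
            # Skip lines whose return values are not integers.
            continue
    return results


def failed_rses(results: List[AwtResult]) -> List[str]:
    """Names of RSEs where the download or the upload returned non-zero."""
    return [
        r.rse for r in results
        if r.download_retval != 0 or r.upload_retval != 0
    ]


# Three result lines copied verbatim from the log above.
SAMPLE = """\
==awt== UK_RAL-PPD DUNE_CA_SFU 0 0 root://lcg-dunese1.sfu.computecanada.ca:1094//dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== UK_RAL-PPD FNAL_DCACHE 0 99 root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro//other/awt-staging/awt-download-2023-03-07-01.txt_1749841165 davs
==awt== UK_RAL-PPD T3_US_NERSC 0 99 root://dtn14.nersc.gov:1094//global/cfs/cdirs/m3249/dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
"""

if __name__ == "__main__":
    for name in failed_rses(parse_awt_lines(SAMPLE)):
        print(name)
```

Applied to the full set of result lines in this log, the same filter would pick out FNAL_DCACHE and T3_US_NERSC, the two RSEs whose upload returned 99 while all downloads returned 0.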