Jobsub ID 109251.0@justin-prod-sched02.dune.hep.ac.uk
Jobsub ID | 109251.0@justin-prod-sched02.dune.hep.ac.uk |
Workflow Testing | Yes |
Workflow ID | 500 |
Stage ID | 1 |
User name | amcnab@fnal.gov |
HTCondor Group | group_dune |
Requested | Processors | 1 |
Requested | GPU | No |
Requested | RSS bytes | 1073741824 (1024 MiB) |
Requested | Wall seconds limit | 3600 (1 hours) |
Submitted time | 2024-11-23 11:21:58 |
Site | US_UConn-HPC |
Entry | GLUEX_US_UConn-HPC_osgce |
Last heartbeat | 2024-11-23 11:24:10 |
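From worker node | Hostname | cn445 |
From worker node | cpuinfo | AMD EPYC 7452 32-Core Processor |
From worker node | OS release | Scientific Linux release 7.9 (Nitrogen) |
From worker node | Processors | 1 |
From worker node | RSS bytes | 1073741824 (1024 MiB) |
From worker node | Wall seconds limit | 172800 (48 hours) |
From worker node | GPU | |
From worker node | Inner Apptainer? | True |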
Job state | outputting_failed |
Allocator name | justin-allocator-int.dune.hep.ac.uk |
Started | 2024-11-23 11:23:10 |
Input files | |
Jobscript | Exit code | 0 |
Jobscript | Real time | 0m (47s) |
Jobscript | CPU time | 0m (14s = 29%) |
Outputting started | 2024-11-23 11:23:58 |
Output files | |
Finished | 2024-11-23 11:24:10 |
Jobscript log (last 10,000 characters)
ne.opensciencegrid.org/products/dune/rucio/v35_4_0/NULL/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='metacat.fnal.gov', port=9443): Max retries exceeded with url: /dune_meta_prod/app/data/declare_files?dataset=dune:all (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x14b7760cd460>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/cvmfs/dune.opensciencegrid.org/products/dune/metacat/v3_42_2/NULL/bin/metacat", line 8, in <module>
sys.exit(main())
File "/cvmfs/dune.opensciencegrid.org/products/dune/metacat/v3_42_2/NULL/lib/python3.9/site-packages/metacat/ui/metacat_ui.py", line 168, in main
cli.run(sys.argv, argv0="metacat")
File "/cvmfs/dune.opensciencegrid.org/products/dune/metacat/v3_42_2/NULL/lib/python3.9/site-packages/metacat/ui/cli/cli.py", line 216, in run
self._run(command, context, argv, usage_on_error)
File "/cvmfs/dune.opensciencegrid.org/products/dune/metacat/v3_42_2/NULL/lib/python3.9/site-packages/metacat/ui/cli/cli.py", line 211, in _run
return interp._run(pre_command + word, context, rest, usage_on_error = usage_on_error)
File "/cvmfs/dune.opensciencegrid.org/products/dune/metacat/v3_42_2/NULL/lib/python3.9/site-packages/metacat/ui/cli/cli.py", line 211, in _run
return interp._run(pre_command + word, context, rest, usage_on_error = usage_on_error)
File "/cvmfs/dune.opensciencegrid.org/products/dune/metacat/v3_42_2/NULL/lib/python3.9/site-packages/metacat/ui/cli/cli.py", line 112, in _run
return self(command, context, opts, args)
File "/cvmfs/dune.opensciencegrid.org/products/dune/metacat/v3_42_2/NULL/lib/python3.9/site-packages/metacat/ui/metacat_file.py", line 195, in __call__
response = list(client.declare_files(f"{dataset_namespace}:{dataset_name}",
File "/cvmfs/dune.opensciencegrid.org/products/dune/metacat/v3_42_2/NULL/lib/python3.9/site-packages/metacat/webapi/webapi.py", line 819, in declare_files
out = self.post_json(url, lst)
File "/cvmfs/dune.opensciencegrid.org/products/dune/metacat/v3_42_2/NULL/lib/python3.9/site-packages/metacat/webapi/webapi.py", line 215, in post_json
response = self.send_request("post", uri_suffix, data=data, headers=headers, stream=True)
File "/cvmfs/dune.opensciencegrid.org/products/dune/metacat/v3_42_2/NULL/lib/python3.9/site-packages/metacat/webapi/webapi.py", line 154, in send_request
self.LastResponse = response = self.retry_request(method, url, headers=headers, **args)
File "/cvmfs/dune.opensciencegrid.org/products/dune/metacat/v3_42_2/NULL/lib/python3.9/site-packages/metacat/webapi/webapi.py", line 133, in retry_request
response = requests.post(url, timeout=self.Timeout, **args)
File "/cvmfs/dune.opensciencegrid.org/products/dune/rucio/v35_4_0/NULL/lib/python3.9/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
File "/cvmfs/dune.opensciencegrid.org/products/dune/rucio/v35_4_0/NULL/lib/python3.9/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/cvmfs/dune.opensciencegrid.org/products/dune/rucio/v35_4_0/NULL/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/cvmfs/dune.opensciencegrid.org/products/dune/rucio/v35_4_0/NULL/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/cvmfs/dune.opensciencegrid.org/products/dune/rucio/v35_4_0/NULL/lib/python3.9/site-packages/requests/adapters.py", line 700, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='metacat.fnal.gov', port=9443): Max retries exceeded with url: /dune_meta_prod/app/data/declare_files?dataset=dune:all (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x14b7760cd460>: Failed to establish a new connection: [Errno 111] Connection refused'))
metacat file declare returns 1
GFAL_CONFIG_DIR: GFAL_PLUGIN_DIR:
justin-rucio-upload attempt 1
DEBUG:root:Num. of files that upload client is processing: 1
DEBUG:dogpile.cache.region:No value present for key: "host_to_choose_choice['https://dune-rucio.fnal.gov']"
DEBUG:dogpile.lock:NeedRegenerationException
DEBUG:dogpile.lock:no value, waiting for create lock
DEBUG:dogpile.lock:value creation lock <dogpile.cache.region.CacheRegion._LockWrapper object at 0x147ec116f5e0> acquired
DEBUG:dogpile.cache.region:No value present for key: "host_to_choose_choice['https://dune-rucio.fnal.gov']"
DEBUG:dogpile.lock:Calling creation function for not-yet-present value
DEBUG:dogpile.cache.region:Cache value generated in 0.000 seconds for key(s): "host_to_choose_choice['https://dune-rucio.fnal.gov']"
DEBUG:dogpile.lock:Released creation lock
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): dune-rucio.fnal.gov:443
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "GET /rses/?expression=SURFSARA HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): dune-rucio.fnal.gov:443
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "GET /rses/SURFSARA HTTP/1.1" 200 1260
DEBUG:root:Input validation done.
INFO:root:Preparing upload for file awt-1732360994-ofd4ZGBejU
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "GET /rses/SURFSARA/attr/ HTTP/1.1" 200 308
DEBUG:root:wan domain is used for the upload
DEBUG:root:Registering file
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "GET /accounts/dunepro/scopes/ HTTP/1.1" 200 737
DEBUG:root:Trying to create dataset: testpro:awt-uploads-202447
DEBUG:urllib3.connectionpool:https://dune-rucio.fnal.gov:443 "POST /dids/testpro/awt-uploads-202447 HTTP/1.1" 500 321
--- Upload try 1/1
--- Rucio upload 1/1 fails: An unknown exception occurred.
Details: no error information passed (http status code: 500)
--- Exit with 99
'justin-rucio-upload --rse SURFSARA --protocol davs --scope testpro --dataset awt-uploads-202447 awt-1732360994-ofd4ZGBejU --timeout 1200' returns 99
subject : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=justin-jobs-production.dune.hep.ac.uk/CN=1661652726/CN=173236099083
issuer : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=justin-jobs-production.dune.hep.ac.uk/CN=1661652726
identity : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=justin-jobs-production.dune.hep.ac.uk/CN=1661652726
type : RFC compliant proxy
strength : 2048 bits
path : /home/awt-proxy.pem
timeleft : 167:59:13
key usage : Digital Signature, Key Encipherment, Key Agreement
=== VO dune extension information ===
VO : dune
subject : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=justin-jobs-production.dune.hep.ac.uk
issuer : /DC=org/DC=incommon/C=US/ST=Illinois/O=Fermi Research Alliance/CN=voms1.fnal.gov
attribute : /dune/Role=Production/Capability=NULL
attribute : /dune/Role=NULL/Capability=NULL
timeleft : 159:49:06
uri : voms1.fnal.gov:15042
===== Results =====
Download/upload commands:
xrdcp --force --nopbar --verbose $read_pfn downloaded.txt
echo '{"namespace":"testpro","name":"FILENAME","size":0}' >tmp.json
metacat file declare --json -f tmp.json "dune:all"
justin-rucio-upload --rse $rse_name --protocol $write_protocol --scope testpro --dataset awt-uploads-202447 --timeout 1200 FILENAME
Use the wrapper job link on the page for the job on the justIN Dashboard to find the full log file, with errors from these commands
Each line: $JUSTIN_SITE_NAME $rse_name $download_retval $upload_retval $read_pfn $write_protocol
==awt== US_UConn-HPC DUNE_CERN_EOS 0 99 root://eospublic.cern.ch:1094//eos/experiment/neutplatform/protodune/dune/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== US_UConn-HPC DUNE_ES_PIC 0 99 root://xrootd.pic.es:1094/pnfs/pic.es/data/dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== US_UConn-HPC DUNE_FR_CCIN2P3_DISK 0 99 root://ccxrootdegee.in2p3.fr:1094/pnfs/in2p3.fr/data/dune/disk/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== US_UConn-HPC DUNE_UK_GLASGOW 0 99 root://cephc02.gla.scotgrid.ac.uk:1094//cephfs/dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== US_UConn-HPC DUNE_UK_LANCASTER_CEPH 0 99 root://xgate.hec.lancs.ac.uk:1094//cephfs/grid/dune/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== US_UConn-HPC DUNE_UK_MANCHESTER_CEPH 0 99 root://meitner.tier2.hep.manchester.ac.uk:1094//cephfs/experiments/dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== US_UConn-HPC DUNE_US_BNL_SDCC 0 99 root://dcdndoor.sdcc.bnl.gov:1094//pnfs/sdcc.bnl.gov/data/dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== US_UConn-HPC DUNE_US_FNAL_DISK_STAGE 0 99 root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/persistent/staging/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== US_UConn-HPC NIKHEF 0 99 root://dune.dcache.nikhef.nl:1094/pnfs/nikhef.nl/data/dune/generic/rucio/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== US_UConn-HPC PRAGUE 0 99 root://golias100.farm.particle.cz:1094/dpm/farm.particle.cz/home/dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== US_UConn-HPC QMUL 51 99 root://xrootd01.escqmul.ac.uk:1094//dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== US_UConn-HPC RAL-PP 0 99 root://mover.pp.rl.ac.uk:1094/pnfs/pp.rl.ac.uk/data/dune/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== US_UConn-HPC RAL_ECHO 0 99 root://xrootd.echo.stfc.ac.uk:1094/dune:/protodune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== US_UConn-HPC SURFSARA 0 99 root://penguin12.grid.surfsara.nl:21094/pnfs/grid.sara.nl/data/dune/disk/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
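
The result lines above follow the field order given in the "Each line:" description, so they can be summarised mechanically. The following is a minimal sketch, not part of justIN itself: the `AwtResult` type and `parse_awt_lines` function are hypothetical names introduced for illustration, and it assumes the usual convention that a non-zero return value means the command failed. The two sample lines are copied from the results above.

```python
from typing import List, NamedTuple


class AwtResult(NamedTuple):
    """One '==awt==' result line (hypothetical helper type)."""
    site: str
    rse: str
    download_retval: int
    upload_retval: int
    read_pfn: str
    write_protocol: str


def parse_awt_lines(log_text: str) -> List[AwtResult]:
    """Collect every '==awt==' line from a jobscript log."""
    results = []
    for line in log_text.splitlines():
        fields = line.split()
        # A result line is the marker followed by exactly six fields.
        if len(fields) != 7 or fields[0] != "==awt==":
            continue
        _, site, rse, download, upload, pfn, protocol = fields
        results.append(
            AwtResult(site, rse, int(download), int(upload), pfn, protocol)
        )
    return results


# Two lines copied from the results section of this log.
sample = """\
==awt== US_UConn-HPC QMUL 51 99 root://xrootd01.escqmul.ac.uk:1094//dune/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
==awt== US_UConn-HPC SURFSARA 0 99 root://penguin12.grid.surfsara.nl:21094/pnfs/grid.sara.nl/data/dune/disk/RSE/testpro/bb/7f/awt-download-2023-03-07-01.txt davs
"""

results = parse_awt_lines(sample)
# Assumption: a non-zero return value indicates failure.
failed_downloads = [r.rse for r in results if r.download_retval != 0]
failed_uploads = [r.rse for r in results if r.upload_retval != 0]
print("download failures:", failed_downloads)
print("upload failures:", failed_uploads)
```

Run over the full log, this would separate the single download failure (QMUL, return value 51) from the upload failures, which affect every RSE listed (return value 99).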