justin command man page

This man page is distributed along with the justin command itself.
JUSTIN(2024)                              JUSTIN(2024)

NAME
       justin - justIN workflow system utility command

SYNOPSIS
       justin subcommand [options]

DESCRIPTION
       justin is a command-line utility for managing workflows, stages, files,
       and replicas in the justIN workflow system.


GENERAL OPTIONS
       -h, --help
          Show help message and exit


       -v, --verbose
          Turn on verbose logging of the communication with the justIN
          service.


       --url URL
          Use an alternative justIN service, rather than
          https://justin-ui-pro.dune.hep.ac.uk/api/commands .  This option
          is only needed during development and testing, and it may be
          convenient to set this option via $JUSTIN_OPTIONS described in
          Environment below.


SUBCOMMANDS
       version
          Output the version number of the justin command.


       time
          Contact justIN to get the current time as a string. This can be
          used to check that the client is installed correctly and that
          the user is properly registered in justIN.


       whoami
          Displays information about the current user identity and
          session.


       create-workflow [--description DESC] [--mql QUERY|--monte-carlo COUNT]
          [--scope SCOPE] [--refind-end-date YYYYMMDD]
          [--refind-interval-hours HOURS]
          Create a new, empty workflow in the database, optionally with
          the given short, human-readable description and either a MetaCat
          Query Language expression or the count of the number of Monte
          Carlo instances to run.

          --scope SCOPE specifies the Rucio scope used for any output
          files to be registered with Rucio and uploaded to Rucio-managed
          storage. Scopes also determine which HTCondor group wrapper jobs
          are submitted to. If not given, the default scope usertests is
          used.

          The options --refind-interval-hours (default 1) and
          --refind-end-date (default: today in UTC) can be used to cause
          MQL queries to be resubmitted at that interval to add any new
          matching files until the end of the given day.  At least one of
          these options must be given to trigger this behaviour.  To
          ensure that files added close to the end of that day are still
          found, a final finding is done soon after the end time.

          Workflows are created in the state "draft" and the command
          returns the new workflow's ID number.  Once the workflow is in
          the running state, justIN will use the MQL expression to find
          the list of input files from MetaCat.
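
          As a sketch of how the returned ID might be captured in a shell
          script: this assumes the ID is printed on standard output, and
          the function name, description, and dataset name are
          placeholders for illustration only.

```shell
# Create a draft workflow and print its ID.
# Assumption: `justin create-workflow` writes the new workflow ID to stdout.
# $1 = description, $2 = MQL query
create_draft_workflow() {
  justin create-workflow --description "$1" --mql "$2"
}

# Usage (the dataset name is a placeholder, not a real dataset):
#   workflow_id=$(create_draft_workflow "Test run" \
#                 "files from my_scope:my_dataset limit 10")
#   justin show-workflows --workflow-id "$workflow_id"
```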

       show-workflows [--workflow-id ID]
          Show details of all workflows or optionally of a single
          workflow. Each line of the output gives the workflow ID, state,
          creation time, description, and MetaCat query of one workflow.


       submit-workflow --workflow-id ID
          Changes the state of the given workflow from "draft" to
          "submitted". The justIN Finder agent will automatically set the
          workflow to running after any necessary initialisation.


       restart-workflow --workflow-id ID
          Restarts a workflow in the "paused" state, and changes its
          state to "running".


       pause-workflow --workflow-id ID
          Changes the state of the given running workflow to "paused".
          This state temporarily excludes a workflow from the workflow
          allocation process.


       finish-workflow --workflow-id ID
          Changes the state of the given workflow from "draft",
          "submitted", or "running" to "finished". This state excludes a
          workflow from the allocation process.
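
          The state changes described for these four subcommands can be
          summarised as a small lookup. This is only a local illustration
          of the transitions listed on this page and does not contact
          justIN; the function name is hypothetical.

```shell
# Print the state a workflow would move to when the given subcommand is
# applied in the given state, following the transitions described above.
# Returns non-zero if this page does not describe that transition.
# $1 = subcommand, $2 = current state
next_workflow_state() {
  case "$1:$2" in
    submit-workflow:draft)   echo submitted ;;
    pause-workflow:running)  echo paused ;;
    restart-workflow:paused) echo running ;;
    finish-workflow:draft|finish-workflow:submitted|finish-workflow:running)
                             echo finished ;;
    *) return 1 ;;
  esac
}

next_workflow_state submit-workflow draft     # prints submitted
next_workflow_state restart-workflow paused   # prints running
```

          The change from "submitted" to "running" is not in the table
          because it is made by the Finder agent rather than by a
          subcommand.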


       create-stage --workflow-id ID --stage-id ID --jobscript
          FILENAME|--jobscript-git ORG/PATH:TAG [--wall-seconds N]
          [--rss-mib N] [--processors N] [--gpu] [--max-distance DIST]
          [--output-pattern PATTERN[:DESTINATION]]
          [--output-pattern-next-stage PATTERN[:DATASET]] [--output-rse
          NAME] [--output-rse-expression EXPRESSION] [--lifetime-days
          DAYS] [--env NAME=VALUE] [--classad NAME=VALUE]
          Creates a new stage for the given workflow ID with the given
          stage ID. Stages must be numbered consecutively from 1, and each
          workflow must have at least one stage.

          Each stage must have a jobscript shell script associated with
          it, given by the --jobscript or --jobscript-git options.  Either
          the full, local path to the jobscript file is given, or a
          reference to a tag or revision hash in GitHub is given.  A GitHub
          reference takes the form PATH:TAG where TAG is a git tag or SHA1
          revision hash, and PATH is the path to the jobscript file in
          GitHub's URL space, of the form
          ORGANISATION/REPO/DIRECTORIES/.../FILE.jobscript .  In both
          scenarios, a copy of the current text of the jobscript is cached
          in the stage definition and executed on worker nodes to process
          the stage's files.

          If the maximum wallclock time needed is not given by
          --wall-seconds then the default of 80000 seconds is used. The
          value used is available to jobscripts as $JUSTIN_WALL_SECONDS.
          If the maximum amount of resident memory needed is not given by
          --rss-mib then the default of 2000MiB is used. The resident
          memory corresponds to the physical memory managed by HTCondor's
          ResidentSetSize value and is available to jobscripts as
          $JUSTIN_RSS_MIB.  If the script can make use of multiple
          processors then --processors can be used to give the number
          needed, with a default of 1 if not given. The value used is
          available to jobscripts as $JUSTIN_PROCESSORS.  If given then
          --gpu will require that jobs for this stage have access to a
          GPU.

          By default, a script will only be allocated input files which are
          on storages at the same site (distance=0). This can be
          changed by setting --max-distance DIST to allow input files to
          be allocated on storages at greater distances, up to a value of
          100 which represents maximally remote storages.

          If one or more options --output-pattern PATTERN[:DESTINATION] is
          given then the wrapper job will look for files created by the
          script which match the pattern given as PATTERN. The pattern is
          a Bash shell pattern using *, ? and [...] expressions. See the
          bash(1) Pattern Matching section for details.  If given, the
          DESTINATION component has any of the variables $JUSTIN_SCOPE,
          $JUSTIN_WORKFLOW_ID, or $JUSTIN_STAGE_ID replaced. The form
          ${JUSTIN_SCOPE} etc may also be used.  If the given DESTINATION
          starts with https:// then the matching output files will be
          uploaded to WebDAV scratch space, such as dCache at Fermilab.
          The DESTINATION must be the URL of a directory accessible via
          WebDAV, and given with or without a trailing slash. Nested
          subdirectories for workflow ID and stage ID will be added, and
          resulting output files placed there. The user's token from the
          justIN dashboard is used for the upload.  If an https:// URL is
          not given, DESTINATION is used when constructing the output
          dataset names. Datasets have the form DESTINATION-wXsYpZnN
          where X is the workflow ID, Y is the stage, and Z is the output
          pattern ID number, starting from 1. The number N is used to
          create output datasets smaller than a global limit set in the
          justIN configuration.  If DESTINATION is not given then only the
          form wXsYpZnN is used.
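
          The naming rule can be illustrated with a short helper. This is
          a sketch of the format described above, not justIN's own code;
          the function name and the example values, including the value
          of N, are placeholders.

```shell
# Build an output dataset name of the form DESTINATION-wXsYpZnN,
# or wXsYpZnN when DESTINATION is empty.
# $1 = DESTINATION (may be empty), $2 = workflow ID, $3 = stage ID,
# $4 = output pattern ID, $5 = dataset number N
output_dataset_name() {
  suffix="w${2}s${3}p${4}n${5}"
  if [ -n "$1" ]; then
    printf '%s-%s\n' "$1" "$suffix"
  else
    printf '%s\n' "$suffix"
  fi
}

output_dataset_name "my-output" 1234 1 1 1   # prints my-output-w1234s1p1n1
output_dataset_name "" 1234 2 1 1            # prints w1234s2p1n1
```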

          Files for Rucio-managed storage may have a corresponding JSON
          metadata file with the same name but with ".json" appended, that
          will be recorded in the metadata for that file in MetaCat. If
          this is not given, then basic workflow metadata will still be
          recorded. If output files have parent-child relations, the
          parent output pattern must be given before the child so that the
          parents are known to MetaCat before the children declare them to
          be parents.

          Alternatively --output-pattern-next-stage PATTERN[:DESTINATION]
          can be given in which case the output file will be uploaded to
          Rucio-managed storage and will also be registered in the justIN
          Database as an unprocessed input file for the next stage and
          available for allocation to instances of that stage's script.

          --lifetime-days DAYS sets the Rucio rule lifetime when creating
          Rucio datasets for output files.  If any Rucio datasets are used
          for outputs, then this option is required.

          If one or more options --output-rse NAME is given, then the RSE
          used for uploads of output files and log tgz files will be
          chosen from that list of RSEs, with preference given to RSEs
          which are closer in distance. If this option is not used, or
          none of the given RSEs are available, then the default algorithm
          for choosing the closest available RSE is used.

          If --output-rse-expression EXPRESSION is given, then it is used
          when creating rules for Rucio datasets for outputs, but not for
          the per-RSE datasets used to keep a copy of the output file on
          the RSE it is first uploaded to.

          --env NAME=VALUE can be used one or more times to set
          environment variables when the stage's jobscript is executed.

          --classad NAME=VALUE can be used one or more times to add
          ClassAds to the jobs submitted for this stage.
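
          A two-stage workflow might be assembled along the following
          lines. This is a sketch only: the function name, jobscript
          paths, patterns, destination, resource values, and lifetime are
          all placeholders. The patterns are single-quoted so that the
          local shell passes them to justin unexpanded.

```shell
# Add two stages to an existing draft workflow and submit it.
# $1 = workflow ID. Jobscript paths, patterns, and the 7-day lifetime
# are placeholders chosen for illustration.
add_two_stages() {
  # Stage 1: outputs are registered as inputs for stage 2.
  justin create-stage --workflow-id "$1" --stage-id 1 \
    --jobscript /path/to/first.jobscript \
    --output-pattern-next-stage '*_reco.root' \
    --lifetime-days 7 || return 1

  # Stage 2: final outputs, with non-default time and memory limits.
  justin create-stage --workflow-id "$1" --stage-id 2 \
    --jobscript /path/to/second.jobscript \
    --wall-seconds 3600 --rss-mib 4000 \
    --output-pattern '*_ana.root:my-analysis' \
    --lifetime-days 7 || return 1

  justin submit-workflow --workflow-id "$1"
}

# Usage:
#   add_two_stages 1234
```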


       simple-workflow [--description DESC] [--mql QUERY|--monte-carlo COUNT]
          [--scope SCOPE] [--refind-end-date YYYYMMDD]
          [--refind-interval-hours HOURS] --jobscript
          FILENAME|--jobscript-git ORG/PATH:TAG [--wall-seconds N]
          [--rss-mib N] [--processors N] [--gpu] [--max-distance DIST]
          [--output-pattern PATTERN[:DESTINATION]] [--output-rse NAME]
          [--output-rse-expression EXPRESSION] [--lifetime-days DAYS]
          [--env NAME=VALUE] [--classad NAME=VALUE]
          Combines the create-workflow, create-stage and submit-workflow
          subcommands into a single operation, for use with single-stage
          workflows. The options are repeated from the first two
          subcommands and are described in their respective sections
          above.
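
          For instance, a small Monte Carlo test could be submitted in
          one step roughly as follows. The function name, description,
          pattern, and lifetime are placeholders, not recommended values.

```shell
# Submit a single-stage Monte Carlo workflow in one step.
# $1 = number of Monte Carlo instances, $2 = local jobscript path.
# The description, pattern, and lifetime are placeholders.
submit_monte_carlo() {
  justin simple-workflow \
    --description "Monte Carlo test" \
    --monte-carlo "$1" \
    --jobscript "$2" \
    --output-pattern '*.root' \
    --lifetime-days 1
}

# Usage:
#   submit_monte_carlo 10 ./my-mc.jobscript
```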


       show-stages --workflow-id ID [--stage-id ID]
          Shows details of all stages of the given workflow or optionally
          of a single stage of that workflow. Each line of the output
          gives the workflow ID, stage ID, min processors, max
          processors, max wallclock seconds, max RSS bytes, and the max
          distance value.

       show-jobscript --jobscript-git ORG/PATH:TAG
       show-jobscript --workflow-id ID --stage-id ID
          Show the given jobscript, either by GitHub reference or by
          workflow and stage.

       show-stage-outputs --workflow-id ID --stage-id ID
          Shows the datasets to be assigned and the patterns used to find
          output files of the given stage within the given workflow. Each
          line of the response consists of "(next)" or "(  )" depending on
          whether the files are passed to the next stage within the
          workflow, and then the scope, files pattern, and destination.


       fail-files --workflow-id ID [--stage-id ID]
          Set all the files of the given workflow, and optionally stage,
          to the failed state when they are already in the finding,
          unallocated, allocated, or outputting state. Files in the
          processed, failed, or notfound states are unchanged. This allows
          workflows with a handful of pathological files to be terminated,
          as the Finder agent will see all the files are now in terminal
          states and mark the workflow as finished.

       show-files --workflow-id ID [--stage-id ID] [--file-did DID]
       show-files --mql QUERY
          Show files either cached in the justIN Database and filtered by
          workflow ID and optionally by stage ID and/or file DID; or up to
          100 found by a query to MetaCat using the given MQL query.

       show-replicas --workflow-id ID [--stage-id ID] [--file-did DID]
       show-replicas --mql QUERY
          Show replicas either cached in the justIN Database and filtered
          by workflow ID and optionally by stage ID and/or file DID; or up
          to 100 found by a query to MetaCat using the given MQL query and
          looked up using Rucio.

       show-jobs --jobsub-id ID | --workflow-id ID [--stage-id ID] [--state
          STATE]
          Show jobs identified by Jobsub ID or Workflow ID (and optionally
          Stage ID). Job state can also be given to further filter the
          jobs listed. For each job, the Jobsub ID, Workflow ID, Stage ID,
          State, and creation time are shown.

       fetch-logs --jobsub-id ID [--unpack]
          Download and optionally unpack the logs.tgz file for a given
          job. The file is placed in the current directory and if the
          --unpack option is given, it will be unpacked into a directory
          named for the job.  This subcommand uses justIN authentication
          and does not require that you have an X.509 proxy or use the
          Rucio client. However, it is not as efficient as the standalone
          justin-fetch-logs command.
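
          To fetch the logs of several jobs at once, show-jobs and
          fetch-logs could be combined as sketched below. This assumes
          that the Jobsub ID is the first whitespace-separated field on
          each line of show-jobs output, which should be checked against
          real output; the function name is hypothetical.

```shell
# Fetch and unpack the logs of every job of a workflow in a given state.
# $1 = workflow ID, $2 = job state.
# Assumption: the Jobsub ID is the first whitespace-separated field on
# each line printed by `justin show-jobs`.
fetch_workflow_logs() {
  justin show-jobs --workflow-id "$1" --state "$2" |
  while read -r jobsub_id _rest; do
    [ -n "$jobsub_id" ] || continue
    # Redirect stdin so fetch-logs cannot consume the list of jobs.
    justin fetch-logs --jobsub-id "$jobsub_id" --unpack </dev/null
  done
}

# Usage:
#   fetch_workflow_logs 1234 STATE
```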


JOBSCRIPTS
       The user jobscripts supplied when creating a stage are shell scripts
       which the wrapper jobs execute on the worker nodes matched to that
       stage.

       When specifying a jobscript to the justin command, either the full,
       local path to the jobscript file is given, or a reference to a tag or
       revision hash in GitHub is given.  (Other git repository services may be
       added in the future.)

       A GitHub reference takes the form PATH:TAG where TAG is a git tag or
       SHA1 revision hash, and PATH is the path to the jobscript file in
       GitHub's URL space, of the form
       ORGANISATION/REPO/DIRECTORIES/.../FILE.jobscript .  In both scenarios,
       a copy of the current text of the jobscript is cached in the stage
       definition and executed on worker nodes to process the stage's files.

       Jobscripts are run in an empty workspace directory.  Several
       environment variables are made available to the scripts, all prefixed
       with JUSTIN_, including $JUSTIN_WORKFLOW_ID, $JUSTIN_STAGE_ID and
       $JUSTIN_SECRET which allows the jobscript to authenticate to justIN's
       allocator service. $JUSTIN_PATH is used to reference files and scripts
       provided by justIN.

       To get the details of an input file to work on, the command
       $JUSTIN_PATH/justin-get-file is executed by the jobscript.  This
       produces a single line of output with the Rucio DID of the chosen file,
       its PFN on the optimal RSE, and the name of that RSE, all separated by
       spaces. This code fragment shows how the DID, PFN and RSE can be put
       into shell variables:

     did_pfn_rse=$("$JUSTIN_PATH/justin-get-file")
     did=$(echo "$did_pfn_rse" | cut -f1 -d' ')
     pfn=$(echo "$did_pfn_rse" | cut -f2 -d' ')
     rse=$(echo "$did_pfn_rse" | cut -f3 -d' ')

       If no file is available to be processed, then justin-get-file returns a
       non-zero exit code and produces no output to stdout, which should also
       be checked for. justin-get-file logs errors to stderr.
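
       Both checks can be wrapped in a small helper, sketched here under
       the assumption that the reply is exactly one line of three
       space-separated fields as described above. The function name is
       hypothetical.

```shell
# Ask justIN for one input file and split the reply into did, pfn, rse.
# Returns non-zero if the command fails or prints nothing to stdout.
get_next_file() {
  did_pfn_rse=$("$JUSTIN_PATH/justin-get-file") || return 1
  [ -n "$did_pfn_rse" ] || return 1
  read -r did pfn rse <<EOF
$did_pfn_rse
EOF
}

# Usage inside a jobscript:
#   if get_next_file; then
#     echo "Processing $did from $rse via $pfn"
#   else
#     echo "No file allocated" >&2
#   fi
```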

       justin-get-file can be called multiple times to process more than one
       file in the same jobscript. This can be done all at the start or
       repeatedly during the lifetime of the job. justin-get-file is itself a
       simple wrapper around the curl command and it would also be possible to
       access the justIN allocator service's REST API directly from an
       application.

       justin-get-file has a single option which may also be given:
       --seconds-needed NNNN where NNNN is the maximum number of wallclock
       seconds which will be needed by the jobscript to process another file
       and finish. If there is not enough time left based on the
       --wall-seconds option used when defining the stage, then
       justin-get-file will return an empty result and a non-zero exit code,
       just as if no more files were available for processing. This can easily
       be used to create jobscripts which process a series of input files
       without running out of time on the last one.

       Each file returned by justin-get-file is marked as allocated and will
       not be processed by any other jobs. When the jobscript finishes, it
       must leave files with lists of the processed files in its workspace
       directory. These lists are sent to the justIN allocator service by the
       wrapper job, which either marks input files as being successfully
       processed or resets their state to unallocated, ready for matching by
       another job.

       Files can be referred to either by DID or PFN, one per line, in the
       appropriate list file:
     justin-processed-dids.txt
     justin-processed-pfns.txt

       It is not necessary to create list files which would otherwise be
       empty. You can use a mix of DIDs and PFNs, as long as each appears in
       the correct list file. Any files not represented in either file will be
       treated as unprocessed and made available for other jobs to process.
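
       Putting the pieces above together, a jobscript might process files
       in a loop as sketched below. process_one is a hypothetical
       user-supplied function taking a PFN, and the 1800-second estimate
       per file is a placeholder. Files for which process_one fails are
       simply not listed, so they become available to other jobs.

```shell
# Process input files one at a time until justIN has no more to give or
# too little wallclock time remains, recording each success by DID.
# process_one is a hypothetical user function: $1 = PFN to read.
# The 1800-second estimate per file is a placeholder.
process_all_files() {
  while did_pfn_rse=$("$JUSTIN_PATH/justin-get-file" --seconds-needed 1800) &&
        [ -n "$did_pfn_rse" ]; do
    did=${did_pfn_rse%% *}     # first field: the DID
    pfn=${did_pfn_rse#* }      # drop the first field ...
    pfn=${pfn%% *}             # ... and keep the second: the PFN
    if process_one "$pfn"; then
      echo "$did" >> justin-processed-dids.txt
    fi
  done
}
```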

       Output files which are to be uploaded with Rucio by the wrapper job
       must be created in the jobscript's workspace directory and have
       filenames matching the patterns given by --output-pattern or
       --output-pattern-next-stage when the stage was created. The suffix
       .json is appended to find the corresponding metadata files for MetaCat.


WORKFLOW PROCESSING
       Once a workflow enters the running state, it is processed by justIN's
       Finder agent to find its input files. The Finder uses the workflow's
       MQL expression to create a list of input files for the first stage.
       Work is only assigned to jobs when a matching file is found and so
       these lists of files are essential.

       In most cases, the MQL query is a MetaCat Query Language expression,
       which the Finder sends to the MetaCat service to get a list of matching
       file DIDs.  However, if the query is of the form "rucio-dataset
       SCOPE:NAME" then the query is sent directly to Rucio to get the list of
       file DIDs contained in the given Rucio dataset. Finally if the
       --monte-carlo COUNT option is used when creating the workflow, then an
       MQL of the form "monte-carlo COUNT" is stored. This causes the Finder
       itself to create a series of COUNT placeholder files which can be used
       to keep track of Monte Carlo processing without a distinct input file
       for each of the COUNT jobs.  Each of these placeholder files has a DID
       of the form monte-carlo-WORKFLOW_ID-NUMBER where NUMBER is in the range
       1 to COUNT, and WORKFLOW_ID is the assigned workflow ID number.
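
       A jobscript may want the NUMBER part of a placeholder DID, for
       example to derive a distinct seed or output name per job. A
       minimal sketch, assuming the DID ends in -NUMBER as described
       above; the function name is hypothetical.

```shell
# Extract NUMBER from a placeholder DID of the form
# monte-carlo-WORKFLOW_ID-NUMBER by stripping everything up to the last "-".
monte_carlo_number() {
  printf '%s\n' "${1##*-}"
}

monte_carlo_number "monte-carlo-1234-57"   # prints 57
```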


AUTHENTICATION AND AUTHORIZATION
       When first used on a given computer, the justin command contacts the
       central justIN services and obtains a session ID and secret which are
       placed in a temporary file. You will then be invited to visit a web
       page on the justIN dashboard which has instructions on how to authorize
       that session, using CILogon and your identity provider. Once
       authorized, you can use the justin command on that computer for 7 days,
       and then you will be invited to re-authorize it. You can have multiple
       computers at multiple sites authorized at the same time.


ENVIRONMENT
       If set, the value of the environment variable JUSTIN_OPTIONS is
       prepended to the list of options after the justin subcommand.
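
       For example, the --url option could be set once for a whole shell
       session as sketched here. The URL is a placeholder for a
       development instance, not a real endpoint.

```shell
# Point every subsequent justin invocation in this shell at another service.
# The URL is a placeholder for a development instance, not a real endpoint.
export JUSTIN_OPTIONS="--url https://justin-dev.example.org/api/commands"

# `justin time` would now behave like:
#   justin time --url https://justin-dev.example.org/api/commands
```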


FILES
       A session file /var/tmp/justin.session.USERID is created by justin,
       where USERID is the numeric Unix user ID, given by the id -u command.


AUTHOR
       Andrew McNab <Andrew.McNab@cern.ch>


SEE ALSO
       bash(1)

justIN Manual               justin            JUSTIN(2024)