## page was renamed from Computing/LIP_Lisbon_Farm/6_SLURM/6.2_Job_Submission

= Job submission from a non-shared directory =

The LIP home directories are not shared between the submit nodes and the execution nodes. The input files or directories must be transferred to the worker node, and at job completion the output files or directories must be transferred out of the execution node. The job runs in a temporary local directory which is cleaned at job completion. The information about the transfers into and out of the worker nodes is provided with the '''INPUT''' and '''OUTPUT''' directives, as shown below.

== Prepare the submit script ==

For the sake of this example let us suppose we need to execute a ''root'' macro. The simulation input file is '''[[attachment:MyMacro.c]]''' and the macro execution will produce an output file named ''graph_with_law.pdf''. The working directory starts with two files, '''[[attachment:MyMacro.c]]''' and '''[[attachment:MyMacro.sh|MyMacro.sh]]''':

{{{#!highlight bash
[uname@pauli01 wdir]$ ls -l
-rw-r--r-- 1 uname ugroup 1822 Feb 19 12:08 MyMacro.c
-rw-r--r-- 1 uname ugroup  310 Feb 19 12:07 MyMacro.sh

[uname@pauli01 wdir]$ cat MyMacro.sh
#!/bin/bash
# --------------------------------------------------------------------- #
# Select partition, or queue. LIP users should use the "lipq" partition
#SBATCH -p lipq

# Transfer input files to the execution machine
# INPUT = MyMacro.c

# Transfer output files back to the submission machine
# OUTPUT = graph_with_law.pdf
# --------------------------------------------------------------------- #

# Load environment
module load root/6.18.04

# Execute macro
root -b -q MyMacro.c
}}}

== Submit the job ==

The ''SLURM'' submission command is '''sbatch''', check the manual page for details. In this case we would execute:

{{{
[uname@pauli01 wdir]$ sbatch MyMacro.sh
Submitted batch job 350327
}}}

The job was submitted to the cluster with job identifier 350327.

== Check job status ==

There are two commands to check the job status, ''squeue'' and ''scontrol''. In most cases we will use the first; the second is more detailed and is used when something is wrong with the job placement. We will execute ''squeue'' multiple times to follow the job as it moves through the scheduler cycle; when the job disappears from the list, or shows the state ''C'' (completed), it has finished.

{{{
[uname@pauli01 mdir]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            350327      lipq MyMacro.    uname PD       0:00      1 (None)

[uname@pauli01 mdir]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            350327      lipq MyMacro.    uname  R       0:05      1 wn168

[uname@pauli01 mdir]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
}}}

As we can see, the job waited a short time in the pending state, ''PD'', then it executed, state ''R'', on worker node '''wn168''', and finally it finished.

The second command would be run as follows and gives a much more detailed, rather verbose, result:

{{{
[uname@pauli01 mdir]$ scontrol show job 350327
   UserId=uname(5800002) GroupId=ugroup(5800000) MCS_label=N/A
   Priority=4578449 Nice=0 Account=ugroup QOS=normal
   JobState=PENDING Reason=None Dependency=(null)
   ....
}}}

== Cancel job ==

If for any reason you need to cancel the job, use the command ''scancel'':

{{{
[uname@pauli01 mdir]$ scancel 350327
}}}

== Check job output ==

If the standard output file name is not specified then the job standard output will be written to a file with a name of the form:

{{{
slurm-JOB_ID.out
}}}

where ''JOB_ID'' is the job identifier.
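If you prefer a different file name, '''sbatch''' provides the standard ''-o''/''--output'' option. The line below is only a hedged sketch: it assumes this option is honoured on the LIP cluster and that the resulting file is still staged back to the submit node.

{{{#!highlight bash
# Hypothetical example: write the standard output to MyMacro-<job id>.out
# instead of the default slurm-<job id>.out ("%j" expands to the job identifier)
#SBATCH -o MyMacro-%j.out
}}}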
{{{#!wiki warning
'''For the time being the standard error is merged with the standard output; expect out-of-order standard error messages mixed with the standard output.'''
}}}

Check your working directory on the submit node and you will find new files:

{{{
[uname@pauli01 mdir]$ ls -l
-rw-r--r-- 1 uname ugroup 15297 Feb 19 13:34 graph_with_law.pdf
-rw-r--r-- 1 uname ugroup  1822 Feb 19 12:08 MyMacro.c
-rw-r--r-- 1 uname ugroup   484 Feb 19 12:15 MyMacro.sh
-rw-r--r-- 1 uname ugroup  2100 Feb 19 12:15 slurm-350327.out
}}}

One of the files is the standard output file, ''slurm-350327.out'', and the other is the output file requested in the submission script, ''# OUTPUT = graph_with_law.pdf''.

The file ''[[attachment:slurm-350327.out]]'' has three sections. The first is the ''prolog'' section with some generic information about the job, including the job identifier, the partition used, the worker node, the staged-in files, the time the job started, and some other details. The next section is the job output itself. Finally, the last section is the ''epilog'' section with the time of job completion and the staged-out files.

== Job script details ==

The submission script comprises two sections: the directives section, normally at the top of the script, and the working section where the user places the actual work.

=== Directive section ===

''SLURM'' directives start with '#SBATCH' and encode the '''sbatch''' command line options; this is a more convenient way to configure jobs as needed. This is a simple example, check the manual page to get familiar with more complex options. The INPUT and OUTPUT parameters are in reality environment variables that could also be supplied on the command line as

{{{
sbatch --export=INPUT=MyMacro.c,OUTPUT=graph_with_law.pdf
}}}

but encoding them in the submit script is more convenient for daily operation.

{{{#!highlight bash
# --------------------------------------------------------------------- #
# Select partition, or queue. LIP users should use the "lipq" partition
#SBATCH -p lipq

# Transfer input files to the execution machine
# INPUT = MyMacro.c

# Transfer output files back to the submission machine
# OUTPUT = graph_with_law.pdf
# --------------------------------------------------------------------- #
}}}

==== #SBATCH -p lipq ====

This directive selects the running partition, or queue. For LIP users it will always be '''lipq'''; any other partition name will fail.

==== # INPUT = MyMacro.c ====

This selects the input file for the job; on the old ''sge'' cluster we used '''SGEIN''' for the same purpose. Multiple lines for other input files are accepted, and directory names may be used instead of file names. The syntax rules for this parameter are

{{{
# INPUT = sub_name[:exe_name]
}}}

where ''sub_name'' is the name of the input file or directory on the submission node and ''exe_name'' is the file or directory name on the execution node. The name on the execution node is optional; if not provided it is assumed to be equal to ''sub_name''. The ''INPUT'' parameter name has the following aliases: ''IN'', ''SGEIN'' and ''SLURMIN''. Furthermore, it accepts any number of digits following the name, e.g. ''INPUT2'' or ''IN1034''. Some illustrative combinations are shown below.
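For illustration only, the hedged sketch below combines plain, renamed and numbered ''INPUT'' lines; the file and directory names other than MyMacro.c are hypothetical.

{{{#!highlight bash
# Stage in the macro, keeping the same name on the execution node
# INPUT = MyMacro.c

# Stage in a directory, renamed to "data" on the execution node
# INPUT2 = run42/inputs:data

# Numbered alias form; calib.txt keeps its name on the execution node
# IN3 = calib.txt
}}}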
==== # OUTPUT = graph_with_law.pdf ====

This directive defines the output file or directory the user needs to recover; it is equivalent to ''SGEOUT'' on the old ''sge'' cluster. As for the ''INPUT'' parameter, the ''OUTPUT'' parameter may be provided multiple times and also accepts any number of digits following the name. The following aliases are accepted: ''OUT'', ''SGEOUT'' and ''SLURMOUT''. The syntax rules for this parameter are similar to ''INPUT'' but with one small difference in the order of the file or directory names: the name on the execution node comes first.

{{{
# OUTPUT = exe_name[:sub_name]
}}}

||<#ffffcc>'''This parameter is mandatory because the working directory on the execution nodes is cleaned at job completion; if the output files or directories are not retrieved then they will be lost!'''||

=== Working section ===

This is the section where the user inserts the job procedures, in this case loading the ''root'' environment in order to access the application installation and running the macro.

{{{#!highlight bash
# Load environment
module load root/6.18.04

# Execute macro
root -b -q MyMacro.c
}}}
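To close, the sketch below puts the partition directive, the ''OUTPUT'' renaming syntax and the working section together in a single submit script. It is an illustration under the assumptions above; the renamed output file name is hypothetical.

{{{#!highlight bash
#!/bin/bash
# --------------------------------------------------------------------- #
# LIP users submit to the "lipq" partition
#SBATCH -p lipq

# Stage in the macro
# INPUT = MyMacro.c

# Stage out the result, renaming it on the submission node (hypothetical name)
# OUTPUT = graph_with_law.pdf:graph_with_law_run42.pdf
# --------------------------------------------------------------------- #

# Load environment and execute the macro
module load root/6.18.04
root -b -q MyMacro.c
}}}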