QSTAT: show the status of Grid Engine jobs and queues
List user jobs
Running qstat without options lists the current user's jobs, for example:
[diogo@fermi03 ~]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
5032642 1.10000 farm_conex diogo        r     11/13/2017 03:42:23 lipq@wn054.ncg.ingrid.pt           1
5032892 1.10000 farm_conex diogo        r     11/13/2017 04:17:35 lipq@wn073.ncg.ingrid.pt           1
5032893 1.10000 farm_conex diogo        Eqw   11/13/2017 04:17:30                                    1
5032903 1.10000 farm_conex diogo        qw    11/13/2017 03:18:30                                    1
In this example the first two jobs are running on nodes wn054 and wn073, and the last job is waiting for resources to become available. A job in state Eqw (error, queued and waiting) was dispatched but failed because of a cluster error; users should contact the IT team in such cases.
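Jobs that need attention can be picked out by filtering the state column. The following is a minimal sketch, assuming the default qstat layout shown above (two header lines, state in the fifth column); error_jobs is a hypothetical helper name, not part of Grid Engine.

```shell
# Hypothetical helper: print the IDs of jobs whose state contains E (error).
# Reads qstat output on standard input and assumes the default column layout.
error_jobs() {
    # Skip the two header lines, then test the state column.
    awk 'NR > 2 && $5 ~ /E/ { print $1 }'
}

# Usage on the cluster:
#   qstat | error_jobs
```

With the listing above this would print only job 5032893, the ID to report to the IT team.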
List other users' jobs
Use the option -u to query other users' jobs. The option accepts a user name as argument, or a wildcard to query the jobs of all users, for example:
[diogo@fermi03 ~]$ qstat -u fcruz
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
3852655 0.40279 Hexadecame fcruz        r     11/13/2017 04:22:46 lipq@wn103.ncg.ingrid.pt           1
3852656 0.00000 Hexadecame fcruz        hqw   10/03/2017 12:16:42
In this example the state hqw means that the job is on hold. Holding jobs is an advanced way of running them; see the Job Submissions Advanced section.
Use the following command to list the jobs of all users:
[diogo@fermi03 ~]$ qstat -u '*'
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
5025385 0.18066 cream_8234 cmsplt013    r     11/12/2017 05:12:10 cmsgrid_mcore@wn022.ncg.ingrid     8
5025386 0.18066 cream_1267 cmsplt013    r     11/12/2017 05:12:33 cmsgrid_mcore@wn054.ncg.ingrid     8
5025851 0.33147 cream_3727 snop005      r     11/12/2017 05:50:14 gridq@wn017.ncg.ingrid.pt          1
5025852 0.33147 cream_9907 snop005      r     11/12/2017 05:50:16 gridq@wn110.ncg.ingrid.pt          1
... endless list ....
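Because the full listing is very long, it is often easier to summarise it. The sketch below counts jobs per user; it assumes the default column layout (user name in the fourth column), and jobs_per_user is a hypothetical helper name, not a Grid Engine command.

```shell
# Hypothetical helper: count jobs per user, busiest user first.
# Reads `qstat -u '*'` output on standard input.
jobs_per_user() {
    # Only lines starting with a numeric job ID are counted, so the
    # header and separator lines are ignored.
    awk '$1 ~ /^[0-9]+$/ { n[$4]++ } END { for (u in n) print n[u], u }' |
        sort -k1,1nr -k2,2
}

# Usage on the cluster:
#   qstat -u '*' | jobs_per_user
```

Each output line holds a job count followed by a user name.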
Show job details
The option -j <job_id> of the qstat command shows all details of a job. The list is long:
[mpinto@fermi01 ~]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
5048766 0.22507 lipfarm_DD mpinto       r     11/15/2017 15:00:39 solip@wn202.ncg.ingrid.pt          1
[mpinto@fermi01 ~]$ qstat -j 5048766 | less
==============================================================
job_number:                 5048766
exec_file:                  job_scripts/5048766
submission_time:            Wed Nov 15 15:00:39 2017
owner:                      mpinto
uid:                        5040009
group:                      cosmo
gid:                        5040000
sge_o_home:                 /home/cosmo/mpinto
sge_o_log_name:             mpinto
...
Check man qstat for more details on command usage.
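When only one of the many detail fields is needed, it can be extracted instead of paging through the whole list. This is a minimal sketch, assuming the "name: value" layout shown in the example above; job_field is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of one field from `qstat -j` output.
# Reads the output on standard input; $1 is the field name, e.g. owner.
job_field() {
    # Strip the "name:" prefix and the padding after it, print what remains.
    sed -n "s/^$1:[[:space:]]*//p"
}

# Usage on the cluster:
#   qstat -j 5048766 | job_field submission_time
```

Using sed to remove only the leading field name keeps values that contain colons themselves, such as the submission time, intact.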
List all QUEUES
Use the command below to list all available queues on FARM. The queues of interest to LIP users are those ending in lip, plus the lipq queue, and gridc for LIP Coimbra users.
[jpina@fermi03 ~]$ qstat -g c
CLUSTER QUEUE    CQLOAD  USED  RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
atlasgrid          0.70    32    0    281    375      0     62
atlasgrid_mcore    0.77   456    0    416   1016      0    144
calolip            0.00     0    0     24     24      0      0
cmsgrid            0.74     0    0    305    359      0     54
cmsgrid_mcore      0.76   312    0    564   1032      0    172
cmst3grid          -NA-     0    0      0      1      0      1
complip            0.00     0    0     11     11      0      0
cosmolip           0.09     0    0     77     77      0      0
csyslip            0.00    20    0     28     48      0      0
dteamgrid          0.69     0    0    314    373      0     59
dteamgrid_mcore    0.73     0    0    704    832      0    128
fast_medusa        0.00     0    0     44     44      0      0
gridc              0.57     0    0    286    286      0      0
gridq              0.73     1    0    310    362      0     51
gridq_mcore        0.72     0    0    720    848      0    128
hpcgrid            0.63   216    0     48    280      0     16
hpcib              0.42    24    0     60    100      0     16
hpclong            0.62    12    0    102    352      0    240
incd               0.77     0    0    270    318      0     48
lipq               0.66   176    0    102    338      0     60
medusa             0.25    65    0    199    268      0      4
opsgrid            0.70     0    0    325    387      0     62
opsgrid_mcore      0.72     0    0    720    848      0    128
qao_2356           0.74     0    0     16    192      0    176
qao_2356_ib        0.39     0    0     16    112      0     96
qdav               0.00     0    0      8      8      0      0
qix_e5472_nv       0.00     0    0     24     32      0      8
qix_es2680         0.72     0    0     48     48      0      0
solip              0.00     0    0     66     74      0      8
The table columns have the following meaning:
CQLOAD: current load of the queue;
USED: slots currently in use;
RES: slots currently reserved, normally none;
AVAIL: slots currently available (free);
TOTAL: total number of slots configured for the queue;
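aoACDS: slots in queue instances that are in at least one of the states a, o, A, C, D or S (load alarm, orphaned, suspend alarm, calendar suspended, calendar disabled, subordinate suspended), that is, temporarily unavailable;
cdsuE: slots in queue instances that are in at least one of the states c, d, s, u or E (configuration ambiguous, disabled, suspended, unknown, error), that is, unavailable until someone intervenes.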
In the previous example the maximum number of slots configured for queue lipq is 338, the value in the TOTAL column, but 60 of those slots (the cdsuE column) are unavailable due to some problem. The number of slots that can actually be used is therefore 338 - 60 = 278, which matches USED + AVAIL = 176 + 102. Although the 102 slots are tagged as AVAIL they may not be free, because the nodes belonging to queue lipq also serve other queues, such as csyslip, which has 20 slots in use.
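The same subtraction can be done for any queue directly from the command output. The sketch below assumes the column order shown in the table above (TOTAL is the sixth field of a data row and cdsuE the eighth); usable_slots is a hypothetical helper name, not part of Grid Engine.

```shell
# Hypothetical helper: usable slots of one queue, computed as TOTAL - cdsuE.
# Reads `qstat -g c` output on standard input; $1 is the queue name.
usable_slots() {
    # Data rows: $1 queue, $2 CQLOAD, $3 USED, $4 RES, $5 AVAIL,
    #            $6 TOTAL, $7 aoACDS, $8 cdsuE
    awk -v q="$1" '$1 == q { print $6 - $8 }'
}

# Usage on the cluster:
#   qstat -g c | usable_slots lipq
```

For the lipq row of the example this gives 278. As noted in the text, this is an upper bound, since slots counted as AVAIL may be taken by other queues sharing the same nodes.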
List LIP QUEUES
Use the command below to list only the LIP queues on FARM. The command also accepts a list of queue names, e.g. qstat -g c -q lipq gridc cosmolip.
[jpina@fermi03 ~]$ qstat -g c -q '*lip*'
CLUSTER QUEUE    CQLOAD  USED  RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
calolip            0.00     0    0     24     24      0      0
complip            0.00     0    0     11     11      0      0
cosmolip           0.08     0    0     77     77      0      0
csyslip            0.00    20    0     28     48      0      0
lipq               0.68   176    0    102    338      0     60
solip              0.00     0    0     66     74      0      8
QHOST: show the status of Grid Engine hosts, queues and jobs
List all cluster nodes
The qhost command queries the scheduler and provides some useful information about the system. For example, running the command without arguments returns the list of all nodes in the cluster, including some details about the hardware: