Computing/LIP_Lisbon_Farm/5_SGE-deprecated/5.1_Cluster

QSTAT: show the status of Grid Engine jobs and queues

List user jobs

Running qstat without options lists the current user's jobs, for example:

[diogo@fermi03 ~]$ qstat
job-ID  prior   name       user   state submit/start at     queue                    slots ja-task-ID
-----------------------------------------------------------------------------------------------------
5032642 1.10000 farm_conex diogo  r     11/13/2017 03:42:23 lipq@wn054.ncg.ingrid.pt 1
5032892 1.10000 farm_conex diogo  r     11/13/2017 04:17:35 lipq@wn073.ncg.ingrid.pt 1
5032893 1.10000 farm_conex diogo  Eqw   11/13/2017 04:17:30                          1
5032903 1.10000 farm_conex diogo  qw    11/13/2017 03:18:30                          1

In this example the first two jobs are running on nodes wn054 and wn073, and the last job (state qw) is waiting for resources to become available. The state Eqw means that the scheduler tried to run the job but a cluster error occurred; users should contact the IT team in such cases.
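When the job list is long, a per-state summary can be easier to read than the full listing. The following is a minimal sketch, not an official FARM tool: count_job_states is a hypothetical helper name, and it assumes the default qstat layout shown above (two header lines, state in the fifth column, job names without spaces).

```shell
# Hypothetical helper: count jobs per state from qstat output on stdin.
# Assumes the default qstat layout: two header lines, state in column 5.
count_job_states() {
    awk 'NR > 2 && NF >= 5 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | LC_ALL=C sort
}

# Usage on the cluster:
#   qstat | count_job_states
```

Fed with the listing above, this would print one line per state: Eqw 1, qw 1 and r 2.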

List other users jobs

Use the option -u to query other users' jobs. The option accepts a user name as argument, or a wildcard to query the jobs of all users, for example:

[diogo@fermi03 ~]$ qstat -u fcruz
job-ID  prior   name       user   state submit/start at     queue                    slots ja-task-ID 
-----------------------------------------------------------------------------------------------------
3852655 0.40279 Hexadecame fcruz  r     11/13/2017 04:22:46 lipq@wn103.ncg.ingrid.pt 1        
3852656 0.00000 Hexadecame fcruz  hqw   10/03/2017 12:16:42

In this example the state hqw means that the job is on hold. Holding jobs is an advanced way of running jobs; see the Job Submissions Advanced section.

Use the following command to list the jobs of all users:

[diogo@fermi03 ~]$ qstat -u '*'
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
5025385 0.18066 cream_8234 cmsplt013    r     11/12/2017 05:12:10 cmsgrid_mcore@wn022.ncg.ingrid 8        
5025386 0.18066 cream_1267 cmsplt013    r     11/12/2017 05:12:33 cmsgrid_mcore@wn054.ncg.ingrid 8        
5025851 0.33147 cream_3727 snop005      r     11/12/2017 05:50:14 gridq@wn017.ncg.ingrid.pt      1        
5025852 0.33147 cream_9907 snop005      r     11/12/2017 05:50:16 gridq@wn110.ncg.ingrid.pt      1
... endless list ....
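Because the list for all users is long, it can help to aggregate it. The sketch below sums the slots used by running jobs for each user. The function name running_slots_per_user is hypothetical, and the column positions (user in column 4, state in column 5, slots in column 9 for running jobs) are assumed from the listing above.

```shell
# Hypothetical helper: total slots used by running jobs, per user.
# Reads `qstat -u '*'` output on stdin; assumes user = column 4,
# state = column 5 and, for running jobs, slots = column 9.
running_slots_per_user() {
    awk 'NR > 2 && $5 == "r" { slots[$4] += $9 }
         END { for (u in slots) print slots[u], u }' |
        LC_ALL=C sort -k1,1nr -k2,2
}

# Usage on the cluster:
#   qstat -u '*' | running_slots_per_user
```

Pending jobs have an empty queue column, so their slot count sits in a different column; this sketch deliberately ignores them.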

Show job details

The option -j <job_id> of qstat shows all the details of a job. The output is long, so it is convenient to pipe it to a pager such as less:

[mpinto@fermi01 ~]$ qstat
job-ID  prior   name       user   state submit/start at     queue                     slots ja-task-ID 
------------------------------------------------------------------------------------------------------
5048766 0.22507 lipfarm_DD mpinto r     11/15/2017 15:00:39 solip@wn202.ncg.ingrid.pt 1

[mpinto@fermi01 ~]$ qstat -j 5048766 | less
==============================================================
job_number:                 5048766
exec_file:                  job_scripts/5048766
submission_time:            Wed Nov 15 15:00:39 2017
owner:                      mpinto
uid:                        5040009
group:                      cosmo
gid:                        5040000
sge_o_home:                 /home/cosmo/mpinto
sge_o_log_name:             mpinto
...

Check man qstat for more details on command usage.
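Often only one or two fields of the job details are needed. As a minimal sketch, the hypothetical helper job_field below extracts a single value from the "name: value" lines shown above; it assumes that this layout holds for the field of interest.

```shell
# Hypothetical helper: print the value of one field from `qstat -j` output.
# $1 is the field name (e.g. owner); the qstat output is read on stdin.
job_field() {
    awk -v key="$1" -F': *' '
        $1 == key {
            sub(/^[^:]*: */, "")   # drop the "name:" prefix and its padding
            print
            exit
        }'
}

# Usage on the cluster:
#   qstat -j 5048766 | job_field submission_time
```

Only the text up to the first colon is removed, so values that contain colons themselves, such as the submission time, are printed intact.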

List all QUEUES

Use the command below to list all available queues on the FARM. The queues of interest to LIP users are those whose names end in lip, plus the lipq queue, and gridc for LIP Coimbra users.

[jpina@fermi03 ~]$ qstat -g c
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
atlasgrid                         0.70     32      0    281    375      0     62
atlasgrid_mcore                   0.77    456      0    416   1016      0    144
calolip                           0.00      0      0     24     24      0      0
cmsgrid                           0.74      0      0    305    359      0     54
cmsgrid_mcore                     0.76    312      0    564   1032      0    172
cmst3grid                         -NA-      0      0      0      1      0      1
complip                           0.00      0      0     11     11      0      0
cosmolip                          0.09      0      0     77     77      0      0
csyslip                           0.00     20      0     28     48      0      0
dteamgrid                         0.69      0      0    314    373      0     59
dteamgrid_mcore                   0.73      0      0    704    832      0    128
fast_medusa                       0.00      0      0     44     44      0      0
gridc                             0.57      0      0    286    286      0      0
gridq                             0.73      1      0    310    362      0     51
gridq_mcore                       0.72      0      0    720    848      0    128
hpcgrid                           0.63    216      0     48    280      0     16
hpcib                             0.42     24      0     60    100      0     16
hpclong                           0.62     12      0    102    352      0    240
incd                              0.77      0      0    270    318      0     48
lipq                              0.66    176      0    102    338      0     60
medusa                            0.25     65      0    199    268      0      4
opsgrid                           0.70      0      0    325    387      0     62
opsgrid_mcore                     0.72      0      0    720    848      0    128
qao_2356                          0.74      0      0     16    192      0    176
qao_2356_ib                       0.39      0      0     16    112      0     96
qdav                              0.00      0      0      8      8      0      0
qix_e5472_nv                      0.00      0      0     24     32      0      8
qix_es2680                        0.72      0      0     48     48      0      0
solip                             0.00      0      0     66     74      0      8

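The table columns have the following meaning:

CQLOAD: average normalized load of the hosts serving the queue;
USED: slots currently in use;
RES: slots currently reserved, normally none;
AVAIL: slots currently available (free);
TOTAL: total configured slots;
aoACDS: slots in queue instances in at least one of the states a, o, A, C, D or S (alarm, orphaned, or suspended/disabled by calendar or subordination) and in none of the cdsuE states;
cdsuE: slots in queue instances in one of the states c, d, s, u or E (configuration ambiguous, disabled, suspended, unknown or error); see man qstat for the state codes.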

Looking at the previous example, the maximum number of slots configured on queue lipq is 338 (the TOTAL column), but the 60 slots in the cdsuE column must be subtracted because they are unavailable due to some problem. The number of usable slots is therefore 338 - 60 = 278, which matches USED + AVAIL = 176 + 102. Although 102 slots are tagged as AVAIL, they may not actually be free, because the nodes belonging to queue lipq also serve other queues, such as csyslip, which has 20 slots in use.
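The same subtraction can be scripted. The sketch below is only an illustration of the arithmetic described above: usable_slots is a hypothetical helper name, and it assumes the column order of the qstat -g c listing shown earlier (TOTAL in column 6, cdsuE in column 8).

```shell
# Hypothetical helper: usable slots of one queue, computed as TOTAL - cdsuE.
# $1 is the queue name; `qstat -g c` output is read on stdin.
# Assumes TOTAL is column 6 and cdsuE is column 8.
usable_slots() {
    awk -v q="$1" '$1 == q { print $6 - $8 }'
}

# Usage on the cluster:
#   qstat -g c | usable_slots lipq
```

For the lipq line in the example this gives 278. As noted above, this is an upper bound, since the nodes may be busy serving other queues.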

List LIP QUEUES

Use the command below to list only the LIP queues on the FARM. The -q option also accepts a comma-separated list of queue names, e.g. qstat -g c -q lipq,gridc,cosmolip.

[jpina@fermi03 ~]$ qstat -g c -q '*lip*'
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
calolip                           0.00      0      0     24     24      0      0
complip                           0.00      0      0     11     11      0      0
cosmolip                          0.08      0      0     77     77      0      0
csyslip                           0.00     20      0     28     48      0      0
lipq                              0.68    176      0    102    338      0     60
solip                             0.00      0      0     66     74      0      8

QHOST: show the status of Grid Engine hosts, queues and jobs

List all cluster nodes

The qhost command queries the scheduler and provides useful information about the system. For example, running the command without arguments returns the list of all nodes in the cluster, including some details about the hardware:

Query Commands