The NCG cluster is a national infrastructure with heterogeneous hardware, shared between a wide range of internal and external communities. The cluster provides several types of resources, organized into queues by HPC, HTC, GPU, InfiniBand, environment, or group ownership.
In most cases users do not need to request a particular resource or queue; the system automatically chooses the best matching hardware. A user who needs special hardware or a special environment should request resources as shown below. The queues are, in general, not homogeneous with respect to the resources they contain, and user authorization varies from queue to queue and sometimes from host to host within the same queue.
Users should not request a specific queue, because they may not be allowed to use it or the best combination of resources may be on a different queue. In some cases, mixing a request for a queue with a request for resources may lead to a combination that cannot be satisfied, and the job will stay on the waiting list forever, as shown in the example "List host queues using mixed criteria" below.
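The following is a minimal sketch of why a mixed request can be unsatisfiable. The queue names and resource names come from this page, but the host names (wn001 and so on) and their queue and resource assignments are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical data for illustration only: these host names and their
# queue/resource assignments are NOT taken from the real cluster.
QUEUE_HOSTS = {
    "lipq": {"wn001", "wn002"},
    "csyslip": {"wn003"},
}
HOST_RESOURCES = {
    "wn001": {"proc": "intel", "gpu": 0},
    "wn002": {"proc": "amd", "gpu": 0},
    "wn003": {"proc": "intel", "gpu": 1},
}


def matching_hosts(resources, queue=None):
    """Return the hosts that satisfy every requested resource.

    If a queue is given, only hosts belonging to that queue are considered.
    """
    if queue is None:
        candidates = set(HOST_RESOURCES)
    else:
        candidates = QUEUE_HOSTS.get(queue, set())
    return sorted(
        host
        for host in candidates
        if all(HOST_RESOURCES[host].get(k) == v for k, v in resources.items())
    )


if __name__ == "__main__":
    # Resource-only request: the scheduler is free to pick any queue.
    print(matching_hosts({"gpu": 1}))
    # Same request pinned to a queue without GPU hosts: nothing matches,
    # so such a job would wait indefinitely.
    print(matching_hosts({"gpu": 1}, queue="lipq"))
```

In this toy model the resource-only request finds a host, while the same request pinned to a queue finds none, which is the situation described above.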
Queue Configuration
Queues represent an aggregation of hosts; each host contributes a certain number of running jobs, or slots. The queues available for LIP users are organized by group ownership, as listed in the table below; for that reason the resources in a queue may be heterogeneous.
| Queue name | owner | usage |
| calolip | CALO | shared |
| complip | COMPASS | shared |
| cosmolip | Auger | private |
| csyslip | LIP | public |
| lipq | LIP | public |
| solip | LIP | public |
Jobs submitted to the cluster are automatically allocated to the proper host queue, so users do not need to request a specific queue. Some of the group queues are shared with other LIP groups, so users benefit from leaving the queue attribute blank: more nodes will be available to run their jobs.
Each node can run a maximum number of jobs, where one job normally corresponds to one slot. These slots are shared between one or more queues, so the sum of the TOTAL jobs over all queues is in most cases greater than the actual maximum number of jobs allowed.
Suppose a worker node can serve 2 slots and we configure two queues, A and B, each capable of using 2 slots on that node. Any combination of jobs can run on those queues as long as the total number of jobs on the node never exceeds 2:
| A | B | Running | |
| 0 | 0 | 0 | |
| 0 | 1 | 1 | |
| 0 | 2 | 2 | |
| 1 | 0 | 1 | |
| 1 | 1 | 2 | |
| 1 | 2 | 3 | never happens |
| 2 | 0 | 2 | |
| 2 | 1 | 3 | never happens |
| 2 | 2 | 4 | never happens |
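The slot-sharing rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how the scheduler is implemented; the function and constant names are made up for the example.

```python
from itertools import product

NODE_SLOTS = 2                   # slots the worker node can serve in total
QUEUE_SLOTS = {"A": 2, "B": 2}   # slots each queue may use on that node


def allowed_combinations(node_slots, queue_slots):
    """Return every per-queue job count whose sum fits in the node."""
    names = sorted(queue_slots)
    ranges = [range(queue_slots[name] + 1) for name in names]
    return [combo for combo in product(*ranges) if sum(combo) <= node_slots]


if __name__ == "__main__":
    for a, b in allowed_combinations(NODE_SLOTS, QUEUE_SLOTS):
        print(f"A={a} B={b} running={a + b}")
```

Of the nine combinations of 0 to 2 jobs per queue, only the six whose sum is at most 2 are returned; the other three are the "never happens" cases.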
Each queue enforces a limit of 2GB of resident memory and 4GB of virtual memory per job in order to protect other tasks running on the same node.
Complex/Resources Configuration
Complex attributes provide a way of defining cluster resources, which are requested through the option -l <resource> of the submission command qsub, or of the qhost and qselect commands below. Check the manual page, man complex, for more details.
The pertinent complex attributes, per host, defined for LIP users are the following:
| Complex (resource) | type | values | Description |
| slots | integer | integer | number of jobs |
| mem_total | memory | integer | total RAM memory |
| virtual_total | memory | integer | total virtual memory |
| proc | string | intel,amd | CPU family |
| gpu | boolean | 0,1 | GPU present |
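As a hedged illustration, the snippet below assembles a qsub command line that requests resources from the table above using the standard Grid Engine form -l name=value,name=value. The helper names and the script name job.sh are hypothetical; only the resource names (proc, gpu) come from this page.

```python
def resource_request(**resources):
    """Format keyword arguments as a Grid Engine '-l' resource list."""
    parts = []
    for name, value in resources.items():
        if isinstance(value, bool):
            value = int(value)  # boolean complexes are written as 0 or 1
        parts.append(f"{name}={value}")
    return ",".join(parts)


def qsub_command(script, **resources):
    """Build the argument list for submitting `script` with resources."""
    cmd = ["qsub"]
    if resources:
        cmd += ["-l", resource_request(**resources)]
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # job.sh is a hypothetical submission script.
    print(" ".join(qsub_command("job.sh", proc="intel", gpu=True)))
```

The same resource string can be passed with -l to qhost or qselect to see which hosts or queues would match before submitting.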
Parallel Environments
A parallel environment supports the execution of shared-memory or distributed-memory parallel applications. Examples of parallel environments are OpenMP on shared-memory multiprocessor systems and the Message Passing Interface (MPI) on a distributed cluster. Both are available for LIP users, but the MPI environment only works within a single node. Parallel environments make it possible to allocate more than one slot, and the corresponding memory, to a single job. The table below shows the parallel environment presently available for LIP users.
| Parallel Environment Name | Multi Node | Max. Slots |
| mcore | NO | Node dependent |
Users who run parallel applications or have larger memory requirements may request more than one slot. Each slot guarantees 2GB of RAM and 4GB of virtual memory. For example, if a user requests 4 slots, the job will have a RAM limit of (4x2)GB=8GB and a virtual memory limit of (4x4)GB=16GB.
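The memory arithmetic can be written as a small helper. This sketch assumes 2GB of RAM and 4GB of virtual memory per slot, the values implied by the 4-slot example and by the per-job queue limits. The qsub argument list uses the standard Grid Engine option -pe <name> <slots>; the script name job.sh and the function names are hypothetical.

```python
RAM_PER_SLOT_GB = 2      # resident memory per slot (assumed from the text)
VIRTUAL_PER_SLOT_GB = 4  # virtual memory per slot (assumed from the example)


def memory_limits(slots):
    """Return (RAM limit, virtual memory limit) in GB for a slot count."""
    if slots < 1:
        raise ValueError("at least one slot must be requested")
    return slots * RAM_PER_SLOT_GB, slots * VIRTUAL_PER_SLOT_GB


def qsub_parallel_args(slots, script, pe_name="mcore"):
    """Build a qsub argument list requesting `slots` in a parallel environment."""
    return ["qsub", "-pe", pe_name, str(slots), script]


if __name__ == "__main__":
    ram_gb, virtual_gb = memory_limits(4)
    print(f"4 slots: {ram_gb}GB RAM, {virtual_gb}GB virtual memory")
    print(" ".join(qsub_parallel_args(4, "job.sh")))  # job.sh is hypothetical
```

Requesting only as many slots as the memory estimate requires keeps the remaining slots on the node free for other jobs.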
It is advisable to check that there are resources able to serve the requested number of slots; use the qhost and qselect commands below to do so. We also recommend a conservative choice of slot values in order to leave resources for other users.
See the https://wiki-lip.lip.pt/Computing/LIP_Lisboa_Farm/6.2_Job_Submissions_Advanced section to learn how to request a parallel environment.
Operating Systems
The LIP FARM supports the execution of multiple operating systems on the same hardware through container technology. We use an implementation called udocker, which makes it possible to virtualize complex environments in user space.
Presently the cluster supports the following operating systems (distributions):
- CentOS 6.9
- CentOS 7.4
- Ubuntu 16.04
and more could be added as needed.
For each type of operating system there is a dedicated login server; check the Login Servers section for more details. Each login server is configured like the worker nodes running the corresponding operating system, which makes it easier for users to adapt their submission scripts. See the Job Submission Basic section for more information and examples.
There is a local command, qinfo, developed in-house, that lists the login servers by operating system, together with the available container names and their corresponding operating systems; see the "QINFO: print available operating systems" section below.