Partitions and Quality of Service (QOS)#

This page explains the partition and Quality of Service (QOS) systems used on NMTHPC to manage access to computing resources.

Partitions#

Partitions are groups of nodes with similar characteristics. Think of them as different queues for different types of work.

Viewing Available Partitions#

$ sinfo

Example output:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
defq*      up 2-00:00:00     21   idle node[01-14]
cpu.std    up 2-00:00:00     16   alloc node[15-16]
gpu        up 1-00:00:00      2   idle gpu01
cpu.hm     up 2-00:00:00      3   idle himem[01-02]

Key columns:

  • PARTITION: Partition name (* indicates default)

  • AVAIL: Availability status

  • TIMELIMIT: Maximum job runtime

  • NODES: Number of nodes

  • STATE: Node state (idle, allocated, down, etc.)

  • NODELIST: Which nodes are in this partition

Common Partitions#

Note

Use sinfo to see actual up-to-date partitions on NMTHPC.

Standard (defq) Partition#

Purpose: General-purpose CPU computing

Characteristics:

  • Default partition

  • CPU compute nodes

  • Standard memory allocation

  • Time limit: 2 days

When to use:

  • Standard computational jobs

  • MPI parallel applications

  • CPU-intensive workloads

Example job submission:

#SBATCH --partition=defq
#SBATCH --ntasks=16
#SBATCH --time=24:00:00

GPU Partition#

Purpose: GPU computing, AI/ML model training

Characteristics:

  • Nodes with NVIDIA H100 or NVIDIA H200 GPUs

Example job submission:

#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=06:00:00

See Running Jobs on GPU Nodes for detailed guidance.

High Memory Partition#

Purpose: Memory-intensive applications

Characteristics:

  • Nodes with large RAM, for applications that need more memory than standard nodes

Example job submission:

#SBATCH --partition=cpu.hm 
#SBATCH --mem=12G
#SBATCH --time=24:00:00

Specifying Partitions#

In job script:

#SBATCH --partition=gpu

On command line:

$ sbatch --partition=gpu myjob.sh

Interactive job:

$ srun --partition=cpu.hm --pty bash

Omit for default partition:

# Uses default partition if not specified
$ sbatch myjob.sh

Quality of Service (QOS)#

QOS policies control job priority, resource limits, and scheduling behavior.

Viewing QOS Information#

List available QOS:

$ sacctmgr show qos

Your account’s QOS:

$ sacctmgr show user $USER withassoc format=user,account,qos

Available QOS Levels#

Normal QOS#

Characteristics:

  • Default QOS for most users

  • Standard priority

  • Reasonable resource limits

  • Most jobs run under this QOS

Limits (to update once we have the final numbers):

  • Max jobs per user: 100

  • Max cores per user: 128

  • Max GPUs per user: 2

  • Max wall time: 2 days

High Priority QOS#

Characteristics:

  • Higher scheduling priority

  • For time-sensitive work

  • May require special request

When to use:

  • Conference deadlines

  • Time-critical research

  • Approved special projects

Request: Contact HPC support

Long QOS#

Characteristics:

  • Extended time limits

  • Lower priority

  • For jobs that truly need extended runtime

When to use:

  • Simulations requiring > 2 days (up to 7 days currently)

  • Long-running optimizations

Example:

#SBATCH --qos=long
#SBATCH --time=14-00:00:00

h100 QOS#

QOS for GPU nodes (NVIDIA H100)

h100-long QOS#

Some as long, but for GPU nodes.

Testing QOS#

Characteristics:

  • Reserves a single node

  • Short jobs (max 1 hour walltime)

  • Use for time-sensitive code testing (limtied walltime and resources, but higher priority)

Compile QOS#

Characteristics:

  • Short jobs (max 4 hours walltime)

  • Use for demanding compilation jobs / building software

Hmem QOS#

Characteristics:

  • Same as long, for high memory nodes.

Specifying QOS#

In job script:

#SBATCH --qos=normal

On command line:

$ sbatch --qos=long myjob.sh

Resource Limits#

Partition Limits#

Each partition has limits on:

Time limits: Maximum wall time for jobs

$ sinfo -o "%P %.11l"  # Show partition time limits

Node limits: Maximum nodes per job

GPU limits: Maximum GPUs per user or job

QOS Limits#

QOS policies limit:

  • Max jobs per user: How many jobs you can have queued/running

  • Max CPUs per user: Total CPUs across all your jobs

  • Max GPUs per user: Total GPUs across all your jobs

  • Max wall time: Longest allowed job duration

  • Max submit jobs: How many jobs you can submit

Checking Your Limits#

View your current usage:

$ squeue -u $USER

Count your running jobs:

$ squeue -u $USER -t RUNNING | wc -l

Total CPUs in use:

$ squeue -u $USER -t RUNNING -o "%C" | tail -n +2 | awk '{sum+=$1} END {print sum}'

Job Priority#

Job priority determines the order in which pending jobs start when resources become available.

Priority Factors#

Factors affecting priority:

  1. QOS: Higher QOS = higher priority

  2. Fair share: Users with less recent usage get higher priority

  3. Job age: Older pending jobs get priority boost

  4. Job size: Smaller jobs may get priority to fill gaps

  5. Partition: Some partitions have priority policies

Viewing Job Priority#

$ sprio

or for your jobs only:

$ sprio -u $USER

Output columns:

  • JOBID: Job identifier

  • PRIORITY: Overall priority score

  • AGE: Priority from wait time

  • FAIRSHARE: Priority from fair-share algorithm

  • QOS: Priority from QOS

Higher numbers = higher priority

Fair Share#

NMTHPC uses fair-share scheduling to ensure equitable resource distribution.

How Fair Share Works#

  • Tracks your recent resource usage

  • Users with less recent usage get higher priority

  • Encourages balanced resource sharing

  • Usage “decays” over time (typically ~2 weeks)

Checking Fair Share#

$ sshare -u $USER

Key fields:

  • FairShare: Your current fair-share factor (0.0-1.0)

  • Usage: Your recent usage

  • Higher FairShare = higher job priority

Tip

If your jobs are pending and you have high recent usage, your fair-share priority may be low. Other users with less recent usage will get priority. Wait times typically improve as your usage “decays.”

Best Practices#

Choosing the Right Partition#

  1. Match hardware to needs:

    • GPUs needed → GPU partition

    • High memory needed → High memory partition

    • Standard CPU work → Standard partition

  2. Consider time limits:

    • Short jobs → Debug/test partition

    • Standard jobs → Standard partition

    • Very long jobs → Long QOS or special request

  3. Test first:

    • Use debug partition for initial testing

    • Scale up to production partitions

Optimizing Job Priority#

  1. Request only what you need:

    • Don’t request excessive time or resources

    • Smaller resource requests = faster starts

  2. Be strategic with submissions:

    • Submit jobs when you’re ready to use results

    • Don’t queue hundreds of jobs unless necessary

  3. Use appropriate QOS:

    • Normal QOS for routine work

    • Special QOS only when truly needed

Resource Request Strategy#

Warning

Requesting more resources than you need:

  • Wastes cluster resources

  • Reduces your fair-share priority

  • Makes jobs take longer to start

  • Decreases efficiency metrics

Do request:

  • Actual time needed + 20% buffer

  • Memory based on test runs

  • Cores your code can actually use

Don’t request:

  • Maximum time “just in case”

  • All available memory “to be safe”

  • All cores on a node if you’ll use only a few

Troubleshooting#

Job Won’t Start#

Check partition availability:

$ sinfo -p partitionname

Check QOS limits:

$ sacctmgr show qos format=Name,MaxWall,MaxTRES

View pending reason:

$ squeue -u $USER -o "%.18i %.30j %.20R"

Hit Resource Limits#

Common limit messages:

  • QOSMaxCpuPerUserLimit: You’re using max CPUs allowed

  • QOSMaxJobsPerUserLimit: You have max jobs queued

  • QOSMaxGRESPerUser: You’re using max GPUs allowed

Solutions:

  • Wait for running jobs to complete

  • Cancel unnecessary jobs

  • Request different QOS if appropriate

  • Contact HPC support for special needs

Job Priority Too Low#

Check fair-share:

$ sshare -u $USER

Check priority:

$ sprio -u $USER

Improve priority:

  • Wait for usage to decay

  • Request smaller resource allocations

  • Use appropriate QOS

  • Submit fewer concurrent jobs

Getting More Information#

Partition details:

$ scontrol show partition partitionname

QOS details:

$ sacctmgr show qos qosname format=Name,Priority,MaxWall,MaxTRES

Your account details:

$ sacctmgr show user $USER withassoc format=user,account,partition,qos,defaultqos

Questions?#

For questions about partitions, QOS policies, or resource limits, contact hpc@nmthpc.atlassian.net.

For special resource requests or custom QOS, include:

  • Why you need special resources

  • How long you’ll need them

  • Estimated resource requirements

  • Project timeline