Nodes and Filesystems#

This page provides information about the hardware architecture and storage systems available on NMTHPC.

Cluster Overview#

The NMTHPC cluster consists of:

16 CPU compute nodes for general-purpose computing
2 GPU nodes with NVIDIA H100 GPUs for accelerated workloads
Multiple login nodes for user access
Two high-performance filesystems: ZFS 1 for home directories (backed up to ZFS 2) and BeeGFS for scratch data

Node Types#

Login Nodes#

Login nodes are your entry point to the cluster. When you SSH to NMTHPC, you connect to a login node. Your home directory resides on the ZFS 1 filesystem, which is automatically backed up to ZFS 2.

Purpose:

File editing and management
Code compilation
Job submission and monitoring
Small file transfers
Viewing results

Warning

Login nodes are shared resources. Do not run computationally intensive tasks on login nodes. Instead, use SLURM to submit jobs to the compute nodes, or start an interactive job with srun when you need to compile software.
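As a sketch of the interactive route, the helper below requests a small allocation and opens a shell on a compute node. The function name `start_build_session` is hypothetical, the defaults (4 CPUs, 1 hour, 6 GB per CPU) are illustrative assumptions, and no partition is given, so SLURM falls back to the cluster's default; see Running Interactive Jobs for NMTHPC's actual settings.

```shell
# Hypothetical helper: open an interactive shell on a compute node.
# Arguments (both optional): number of CPUs (default 4), hours (default 1).
start_build_session() {
    local cpus="${1:-4}" hours="${2:-1}"
    if ! command -v srun >/dev/null 2>&1; then
        echo "srun not found; run this on an NMTHPC login node." >&2
        return 1
    fi
    # --pty attaches your terminal to the job, giving an interactive shell.
    # 6G per CPU mirrors the standard 6 GB/core figure (an assumption).
    srun --ntasks=1 --cpus-per-task="$cpus" --mem-per-cpu=6G \
         --time="${hours}:00:00" --pty bash -i
}
```

For example, `start_build_session 8 2` would request 8 CPUs for 2 hours. Type `exit` in the interactive shell to end the job and release the allocation.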

Appropriate uses:

Editing scripts with vim or nano
Compiling lightweight code (single processor, a few seconds)
Starting interactive jobs with srun to compile or test codes
Submitting jobs with sbatch or srun
Checking job status with squeue
Light data processing (single processor)

Inappropriate uses:

Running simulations or large-scale analyses
Training machine learning models
Processing large datasets
Compiling computationally intensive software packages
Any task requiring significant CPU or memory

CPU Compute Nodes (Standard and High-Memory)#

CPU compute nodes are designed for general-purpose parallel computing.

Specifications (typical):

Processor: Multi-core CPUs
Cores per node: 256
Memory: 6 GB/core (standard) or 12 GB/core (high-memory)
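Multiplying the per-core memory by the core count gives the approximate total memory of a whole node. This sketch assumes the typical figures listed above (256 cores, 6 or 12 GB per core) apply to every node of that type.

```shell
# Per-node memory implied by the typical CPU-node specifications above.
cores_per_node=256
mem_per_core_standard_gb=6
mem_per_core_highmem_gb=12

node_mem_standard_gb=$(( cores_per_node * mem_per_core_standard_gb ))
node_mem_highmem_gb=$(( cores_per_node * mem_per_core_highmem_gb ))

echo "Standard CPU node:    ${node_mem_standard_gb} GB"
echo "High-memory CPU node: ${node_mem_highmem_gb} GB"
```

This gives 1536 GB for a standard node and 3072 GB for a high-memory node. If `sinfo` or `scontrol show nodes` report different values, trust those.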

Accessing CPU nodes:

See Running Interactive Jobs and Running Batch Jobs for details.
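A minimal batch script for a single-node CPU job might look like the sketch below. The job name, core count, and time limit are hypothetical, `--mem-per-cpu=6G` mirrors the standard 6 GB/core figure, no partition is specified, and `./my_program` is a placeholder for your own executable. The `#SBATCH` lines are read by sbatch but are ordinary comments to bash, so running the file outside SLURM just prints fallback values.

```shell
#!/bin/bash
#SBATCH --job-name=cpu_example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=6G
#SBATCH --time=02:00:00
#SBATCH --output=cpu_example_%j.out

# Illustrative values: 16 of a node's 256 cores at the standard 6 GB/core,
# for up to 2 hours. %j in the output file name expands to the job ID.

# SLURM sets SLURM_CPUS_PER_TASK inside the job; default to 1 elsewhere.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Job ${SLURM_JOB_ID:-none} on $(hostname) using ${OMP_NUM_THREADS} CPU(s)"

# Launch your program here (./my_program is a hypothetical placeholder):
# srun ./my_program
```

Saved as, say, `cpu_example.sh`, the script would be submitted with `sbatch cpu_example.sh` and monitored with `squeue -u $USER`.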

GPU Compute Nodes#

NMTHPC features 2 GPU nodes equipped with NVIDIA H100 GPUs.

GPU Specifications:

GPU Model: NVIDIA H100
Processor: Multi-core CPUs
Cores per node: 128
Memory: 6 GB/core

Requesting GPU resources:

See Running Jobs on GPU Nodes for comprehensive guidance.
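A comparable sketch for a GPU job follows. The `--gres=gpu:1` request is an assumption about how NMTHPC names its GPU resource (some clusters use a typed name such as `gpu:h100:1`), and the other values are illustrative. By the figures above, a GPU node has 128 cores × 6 GB = 768 GB of memory in total.

```shell
#!/bin/bash
#SBATCH --job-name=gpu_example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=6G
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --output=gpu_example_%j.out

# The GRES name "gpu" above is an assumption; check the actual name with
# scontrol show nodes before relying on it.

# SLURM normally restricts the job to its allocated GPUs via
# CUDA_VISIBLE_DEVICES; "none" means no GPU was assigned.
GPU_LIST="${CUDA_VISIBLE_DEVICES:-none}"
echo "Visible GPUs: ${GPU_LIST}"

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
else
    echo "nvidia-smi not found (not running on a GPU node)."
fi
```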

Filesystems#

NMTHPC provides multiple storage systems optimized for different use cases.

Home Directory#

Path: /home/username

Characteristics:

Personal storage space
Backed up (the ZFS 1 filesystem is backed up to ZFS 2)
Limited quota
Accessible from all nodes

Best for:

Source code and scripts
Configuration files
Small datasets
Job submission scripts

Quota: Check your usage with df -h /home/username, where username is your 900 number.

Tip

Keep your home directory organized and clean. Regularly delete unnecessary files to stay within quota limits.

BeeGFS Filesystem#

This is a scratch filesystem: it is not backed up and is periodically wiped. Do not store important data here!

Path: /data/username

Characteristics:

High-performance parallel filesystem
Optimized for large-scale I/O
Larger storage allocation than your home directory
Shared across compute nodes

Best for:

Active project data
Large datasets being actively processed
Simulation input and output files
High I/O workloads
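One way to organize active data is one directory per project under /data. The project name below is hypothetical, and the sketch assumes your /data directory is named after your login name (`$USER`).

```shell
# Hypothetical layout: one scratch directory per project.
# Assumes /data/<username> matches your login name ($USER).
SCRATCH_ROOT="/data/${USER}"
PROJECT="my_project"                      # hypothetical project name
WORKDIR="${SCRATCH_ROOT}/${PROJECT}"

# Create input/output subdirectories; report clearly if /data is unavailable.
if mkdir -p "${WORKDIR}/input" "${WORKDIR}/output" 2>/dev/null; then
    echo "Scratch workspace ready: ${WORKDIR}"
else
    echo "Could not create ${WORKDIR}; is /data mounted on this node?" >&2
fi
```

In a batch script, you could then run the job from this directory with `#SBATCH --chdir=/data/username/my_project` or a `cd` before launching your program.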

Because /data is scratch space for temporary files needed during job execution, it is also:

Not backed up
Periodically cleaned up

It is therefore well suited to:

Temporary data or model output
Intermediate results
Keeping heavy job I/O off the home (ZFS 1) filesystem

Warning

Data in scratch (/data) is periodically deleted (e.g., every 90 days) and is not backed up. Always copy important results to a permanent, backed-up location such as your home directory, or to another machine.
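Two small helpers can make this warning actionable: one copies a results directory off scratch, and one lists files old enough to be at risk. The function names are hypothetical, and the 80-day default assumes purges are based on file modification time with a roughly 90-day interval; confirm the actual policy with HPC support.

```shell
# Copy a results directory from scratch to a backed-up location.
# rsync -a preserves timestamps and permissions, and re-running it
# only copies files that changed.
save_results() {
    local src="$1" dest="$2"
    mkdir -p "$dest" && rsync -a "${src}/" "${dest}/"
}

# List files under DIR not modified in more than DAYS days (default 80).
# Assumption: the purge is based on modification time.
stale_files() {
    local dir="$1" days="${2:-80}"
    find "$dir" -type f -mtime +"$days"
}
```

For example, `stale_files /data/$USER` lists candidates for deletion, and `save_results /data/$USER/my_project/output ~/my_project/results` copies one directory home (both paths hypothetical). To copy to another machine, run rsync directly with a `user@host:path` destination.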

Storage Best Practices#

Choosing the Right Filesystem#

Use Case	Recommended Location
Scripts and code	`/home/` directory (ZFS 1)
Small and medium datasets	`/home/` directory (ZFS 1)
Active large datasets	`/data/` directory (BeeGFS)
Temporary files during jobs	`/data/` directory (BeeGFS)

Managing Disk Quotas#

Check your current usage:

$ df -h /home/username

View disk usage by directory:

$ du -h --max-depth=1 ~/

Find large files:

$ find ~/ -type f -size +1G
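The commands above can be combined into one report that sorts directories by size. `usage_report` is a hypothetical helper name; `sort -h` and the `find -size` units used here are GNU coreutils/findutils features.

```shell
# Summarize disk usage: largest subdirectories first, then large files.
# Arguments (both optional): directory (default $HOME), size threshold
# in find's units, e.g. 1G or 500M (default 1G).
usage_report() {
    local dir="${1:-$HOME}" min_size="${2:-1G}"
    echo "== Largest subdirectories of ${dir} =="
    du -h --max-depth=1 "$dir" 2>/dev/null | sort -rh | head -n 10
    echo "== Files larger than ${min_size} =="
    find "$dir" -type f -size +"$min_size" -exec ls -lh {} + 2>/dev/null
}
```

For example, `usage_report ~ 500M` lists the biggest directories in your home and any files over 500 MiB.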

Data Organization Tips#

Use project directories: Organize data by project or research topic
Clean up regularly: Delete intermediate files and failed job outputs
Compress when possible: Bundle directories with tar and compress them with gzip (or another compression tool)
Archive completed projects: Move finished projects to long-term storage
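Bundling a finished project with tar and compressing it with gzip might look like the sketch below. `archive_project` is a hypothetical helper; it lists the archive afterward as a basic check that the archive is readable before you delete the originals.

```shell
# Create PARENT/NAME.tar.gz from a project directory and print its path.
archive_project() {
    local project_dir="$1" parent name
    parent="$(dirname "$project_dir")"
    name="$(basename "$project_dir")"
    # -C switches to the parent first so the archive stores relative paths
    tar -czf "${parent}/${name}.tar.gz" -C "$parent" "$name"
    # Listing the contents confirms the archive can be read back
    tar -tzf "${parent}/${name}.tar.gz" >/dev/null && echo "${parent}/${name}.tar.gz"
}
```

For example, `archive_project ~/projects/old_study` would create `~/projects/old_study.tar.gz`.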

File Permissions#

Ensure appropriate file permissions:

# Make script executable
$ chmod +x script.sh

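# Make a directory and its contents readable by group
# (capital X adds execute only to directories and already-executable files,
# so group members can enter the directory)
$ chmod -R g+rX directory/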

# Restrict file to owner only
$ chmod 600 sensitive_file
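To confirm the effect of the chmod commands above, GNU stat can print permission bits in both symbolic and octal form. The helper name `show_perms` is hypothetical.

```shell
# Print symbolic permissions (%A), octal permissions (%a), and the name (%n).
show_perms() {
    stat -c '%A %a %n' "$@"
}
```

For example, after `chmod 600 sensitive_file`, `show_perms sensitive_file` should report `-rw------- 600 sensitive_file`.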

Monitoring Resource Usage#

Check Node Information#

See available nodes:

$ sinfo

View node details:

$ scontrol show nodes
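For a compact per-node view of CPUs, memory, and GPUs, sinfo accepts a custom output format. The field list below is one reasonable choice, not an NMTHPC-specific recommendation; note that `%m` reports memory in megabytes.

```shell
# %N node name, %P partition, %c CPUs, %m memory (MB),
# %G generic resources such as GPUs, %T node state
SINFO_FORMAT="%N %P %c %m %G %T"

if command -v sinfo >/dev/null 2>&1; then
    # -N prints one line per node instead of grouping by partition
    sinfo -N -o "$SINFO_FORMAT"
else
    echo "sinfo not found; run this on an NMTHPC login node." >&2
fi
```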

Check Your Job’s Resource Usage#

While your job is running:

$ squeue -u $USER

After the job completes:

$ sacct -j JOBID --format=JobID,JobName,Elapsed,MaxRSS,MaxVMSize,State
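sacct typically reports MaxRSS with a unit suffix (for example `K`). To compare peak memory with the 6 GB/core standard allocation, a small helper can convert the value to GB. The function name is hypothetical; it treats suffixes as 1024-based units, as SLURM generally does, and assumes a bare number is in bytes.

```shell
# Convert a MaxRSS value such as "524288K", "2048M" or "1.5G" to GB.
rss_to_gb() {
    awk -v v="$1" 'BEGIN {
        unit = substr(v, length(v), 1)
        num = substr(v, 1, length(v) - 1)
        # A value with no suffix is assumed to be in bytes
        if (unit ~ /[0-9]/) { num = v; unit = "B" }
        if      (unit == "K") f = 1 / (1024 * 1024)
        else if (unit == "M") f = 1 / 1024
        else if (unit == "G") f = 1
        else if (unit == "T") f = 1024
        else                  f = 1 / (1024 * 1024 * 1024)
        printf "%.2f\n", num * f
    }'
}
```

For example, `rss_to_gb 6291456K` prints `6.00`. Running `sacct -j JOBID -n -P --format=JobID,MaxRSS` prints one line per job step without a header; pass a step's MaxRSS value to the helper.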

See Monitoring Resources for more detailed information.

Hardware Specifications Summary#

For detailed hardware specifications of specific node types and partitions, contact HPC support or see Partitions and QOS.

Note

Hardware configurations may change as the system is upgraded. Always check current specifications with sinfo or contact HPC support for the most up-to-date information.

Questions?#

For questions about node types, storage systems, or hardware specifications, contact HPC support at hpc@nmthpc.atlassian.net.
