Nodes and Filesystems#
This page provides information about the hardware architecture and storage systems available on NMTHPC.
Cluster Overview#
The NMT HPC cluster consists of:
16 CPU compute nodes for general-purpose computing
2 GPU nodes with NVIDIA H100 GPUs for accelerated workloads
Multiple login nodes for user access
Two high-performance filesystems: ZFS 1 (backed up to ZFS 2) and BeeGFS
Node Types#
Login Nodes#
Login nodes are your entry point to the cluster. When you SSH to NMTHPC, you connect to a login node. Your home directory will be in the ZFS 1 filesystem, which is automatically backed up to ZFS 2.
Purpose:
File editing and management
Code compilation
Job submission and monitoring
Small file transfers
Viewing results
Warning
Login nodes are shared resources. Do not run computationally intensive tasks on login nodes. Use SLURM to submit jobs to compute nodes, or start an interactive job with srun when you need to compile software.
Appropriate uses:
Editing scripts with vim or nano
Compiling lightweight codes (single processor, seconds)
Starting interactive jobs with srun to compile or test codes
Submitting jobs with sbatch or srun
Checking job status with squeue
Light data processing (single processor)
Inappropriate uses:
Running simulations or large-scale analyses
Training machine learning models
Processing large datasets
Compiling computationally intensive software packages
Any task requiring significant CPU or memory
CPU Compute Nodes (Standard and High-Memory)#
CPU compute nodes are designed for general-purpose parallel computing.
Specifications (typical):
Processor: Multi-core CPUs
Cores per node: 256
Memory: 6 GB/core (standard) or 12 GB (high-memory)
Accessing CPU nodes:
See Running Interactive Jobs and Running Batch Jobs for details.
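As a rough illustration of how the figures above might translate into a resource request, the sketch below writes a small batch script for a single-node CPU job. The file name cpu_job.sh, the program ./my_program, and the core, memory, and time values are all placeholders rather than site defaults; the 4G per-core request is chosen only to stay under the 6 GB/core figure listed above. A partition or QOS option may also be required on NMTHPC.

```shell
# Write a minimal single-node CPU batch script (all values are illustrative).
cat > cpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=cpu_example   # hypothetical job name
#SBATCH --nodes=1                # stay on one CPU node
#SBATCH --ntasks=32              # 32 tasks, one core each
#SBATCH --mem-per-cpu=4G         # below the documented 6 GB/core
#SBATCH --time=01:00:00          # one-hour wall-clock limit

srun ./my_program                # placeholder executable
EOF

# Total memory requested = tasks x memory per task.
echo "Requested memory: $((32 * 4)) GB"
```

The script would then be submitted with sbatch cpu_job.sh.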
GPU Compute Nodes#
NMTHPC features 2 GPU nodes equipped with NVIDIA H100 GPUs.
GPU Specifications:
GPU Model: NVIDIA H100
Processor: Multi-core CPUs
Cores per node: 128
Memory: 6 GB/core
Requesting GPU resources:
See Running Jobs on GPU Nodes for comprehensive guidance.
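A minimal sketch of a GPU request, assuming the GPUs are exposed to SLURM as a generic resource named gpu (the usual convention, but not confirmed on this page). The file name gpu_job.sh, the job name, and the CPU and time values are placeholders, and a GPU partition option may also be needed.

```shell
# Write a minimal batch script that asks for one GPU (values are illustrative).
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=gpu_example   # hypothetical job name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8        # illustrative CPU count
#SBATCH --gres=gpu:1             # one GPU; assumes the resource is named "gpu"
#SBATCH --time=00:30:00

nvidia-smi                       # report the GPU(s) visible to the job
EOF
```

Running nvidia-smi as the first step is a quick way to confirm that the job actually received a GPU before starting a long computation.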
Filesystems#
NMTHPC provides multiple storage systems optimized for different use cases.
Home Directory#
Path: /home/username
Characteristics:
Personal storage space
Backed up (ZFS 1 filesystem, backed up to ZFS 2)
Limited quota
Accessible from all nodes
Best for:
Source code and scripts
Configuration files
Small datasets
Job submission scripts
Quota: Check your usage with:
df -h /home/username, where username is your 900#.
Tip
Keep your home directory organized and clean. Regularly delete unnecessary files to stay within quota limits.
BeeGFS Filesystem#
This is a scratch filesystem with no backup, and it is periodically wiped. Do not store important data here!
Path: Under /data/username
Characteristics:
High-performance parallel filesystem
Optimized for large-scale I/O
Larger storage allocation
Shared across compute nodes
Best for:
Active project data
Large datasets being actively processed
Simulation input and output files
High I/O workloads
This is a scratch space for temporary files needed during job execution.
Characteristics:
Not backed up
Periodically cleaned up
Best for:
Temporary data or model output
Intermediate results
Reducing I/O to shared filesystems
Warning
Data in the scratch space under /data is periodically deleted (e.g., every 90 days) and is not backed up. Always copy important results to a permanent filesystem or to other machines.
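To see which scratch files might be at risk, something like the helper below can list files that have not been modified for a given number of days. This is only a sketch: the function name stale_files is hypothetical, the 60-day threshold is arbitrary, and it assumes cleanup is based on modification time, which this page does not state.

```shell
# List regular files under a directory not modified in more than N days.
# Usage: stale_files DIR DAYS
stale_files() {
    find "$1" -type f -mtime +"$2" -print
}

# Example (not run here): scratch files older than 60 days.
# stale_files "/data/$USER" 60
```

Files that show up in the list and still matter can then be copied to your home directory or to another machine.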
Storage Best Practices#
Choosing the Right Filesystem#
| Use Case | Recommended Location |
|---|---|
| Scripts and code | /home/username |
| Small and medium datasets | /home/username |
| Active large datasets | /data/username |
| Temporary files during jobs | /data/username |
Managing Disk Quotas#
Check your current usage:
$ df -h /home/username
View disk usage by directory:
$ du -h --max-depth=1 ~/
Find large files:
$ find ~/ -type f -size +1G
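The commands above can be combined into a small helper that ranks the entries of a directory by size. The function name largest_entries is hypothetical; sizes are reported in KiB, and hidden files (names starting with a dot) are skipped because of the * glob.

```shell
# Print the N largest entries directly inside a directory, biggest first.
# Usage: largest_entries DIR [N]   (N defaults to 5; sizes are in KiB)
largest_entries() {
    du -sk -- "$1"/* 2>/dev/null | sort -rn | head -n "${2:-5}"
}

# Example (not run here): the five largest items in your home directory.
# largest_entries "$HOME"
```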
Data Organization Tips#
Use project directories: Organize data by project or research topic
Clean up regularly: Delete intermediate files and failed job outputs
Compress when possible: Use gzip, tar, or other compression tools
Archive completed projects: Move finished projects to long-term storage
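One possible way to compress a finished project is sketched below. The function name archive_project is hypothetical; it creates DIR.tar.gz next to the original directory and checks that the archive can be listed, but it deliberately does not delete the original.

```shell
# Bundle and compress a project directory, then verify the archive.
# Usage: archive_project DIR   (creates DIR.tar.gz next to DIR)
archive_project() {
    dir="${1%/}"                               # strip a trailing slash
    tar -czf "$dir.tar.gz" -C "$(dirname "$dir")" "$(basename "$dir")" &&
        tar -tzf "$dir.tar.gz" > /dev/null     # listing fails if the archive is unreadable
}

# Example (not run here):
# archive_project "$HOME/old_project"
```

Removing the original directory only after the verification step succeeds avoids losing data to a truncated archive.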
File Permissions#
Ensure appropriate file permissions:
# Make script executable
$ chmod +x script.sh
# Make directory readable and traversable by group
$ chmod g+rx directory/
# Restrict file to owner only
$ chmod 600 sensitive_file
Monitoring Resource Usage#
Check Node Information#
See available nodes:
$ sinfo
View node details:
$ scontrol show nodes
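For a more compact, per-node view, sinfo accepts a custom output format. The sketch below wraps one such format in a hypothetical helper, node_summary, that prints each node's name, CPU count, memory in MB, generic resources (such as GPUs), and state. It falls back to a message when the SLURM client tools are not installed.

```shell
# Per-node summary: name, CPU count, memory (MB), generic resources, state.
node_summary() {
    if command -v sinfo > /dev/null 2>&1; then
        sinfo -N -o "%N %c %m %G %T"
    else
        echo "sinfo not found: run this on a cluster login node"
    fi
}

node_summary
```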
Check Your Job’s Resource Usage#
While the job is running:
$ squeue -u $USER
After the job completes:
$ sacct -j JOBID --format=JobID,JobName,Elapsed,MaxRSS,MaxVMSize,State
See Monitoring Resources for more detailed information.
Hardware Specifications Summary#
For detailed hardware specifications of specific node types and partitions, contact HPC support or see Partitions and QOS.
Note
Hardware configurations may change as the system is upgraded. Always check current specifications with sinfo or contact HPC support for the most up-to-date information.
Questions?#
For questions about node types, storage systems, or hardware specifications, contact HPC support at hpc@nmthpc.atlassian.net.