Anaconda#
Anaconda is a Python distribution built around conda, a package and environment manager. It’s the recommended way to manage Python environments on NMTHPC.
Why Use Anaconda?#
Benefits:
Create isolated environments for different projects
Easy installation of packages and dependencies
Manage different Python versions
Avoid conflicts between package requirements
Share reproducible environments with collaborators
Loading Anaconda#
$ module load anaconda3
Verify installation:
$ conda --version
$ which conda
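The verification step can also be wrapped in a small guard for use at the top of scripts. This is a minimal sketch: the function name require_conda and the variable CONDA_READY are hypothetical, and it assumes only that loading the anaconda3 module makes conda reachable in the current shell.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if conda is reachable in this shell.
require_conda() {
    if command -v conda >/dev/null 2>&1; then
        echo "conda found at: $(command -v conda)"
        conda --version
        return 0
    fi
    echo "conda not found; run 'module load anaconda3' first" >&2
    return 1
}

# Record the result without aborting the calling shell.
if require_conda; then
    CONDA_READY=yes
else
    CONDA_READY=no
fi
echo "conda ready: $CONDA_READY"
```

A script could test CONDA_READY (or call require_conda directly) before attempting any conda commands.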
Creating Environments#
Basic Environment#
Create environment with specific Python version:
$ conda create -n myenv python=3.11
Activate the environment:
$ conda activate myenv
Deactivate when done:
$ conda deactivate
Environment with Packages#
Create environment and install packages:
$ conda create -n data_analysis python=3.11 numpy pandas matplotlib scipy
Install additional packages later:
$ conda activate data_analysis
$ conda install scikit-learn seaborn
Managing Packages#
Installing Packages#
From conda:
$ conda install numpy scipy matplotlib
From conda-forge (larger package repository):
$ conda install -c conda-forge package_name
From pip (when package not in conda):
$ pip install package_name
Tip
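Prefer conda install over pip install when possible. Conda's dependency solver does not account for packages installed with pip, so later conda operations can overwrite or break them. If you need pip, install the conda packages first and the pip packages last.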
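The conda-first, pip-as-fallback order described above could be scripted roughly as follows. This is a sketch, not a site-provided tool: the function names pkg_in_conda, choose_installer, and install_pkg are hypothetical, and it assumes conda search exits with a non-zero status when no matching package is found in the configured channels.

```shell
#!/bin/bash
# Hypothetical helper: true if the package is found in the configured conda channels.
pkg_in_conda() {
    conda search "$1" >/dev/null 2>&1
}

# Print which installer to use: "conda" if the package is available there, else "pip".
choose_installer() {
    if pkg_in_conda "$1"; then
        echo "conda"
    else
        echo "pip"
    fi
}

# Install one package into the currently active environment.
install_pkg() {
    case "$(choose_installer "$1")" in
        conda) conda install --yes "$1" ;;
        pip)   python -m pip install "$1" ;;
    esac
}
```

After conda activate myenv, calling install_pkg package_name would pick the installer automatically. Using python -m pip ties pip to the interpreter of the active environment.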
Listing Packages#
Packages in current environment:
$ conda list
Search for available packages:
$ conda search package_name
Updating Packages#
Update specific package:
$ conda update numpy
Update all packages:
$ conda update --all
Removing Packages#
$ conda remove package_name
Managing Environments#
List Environments#
$ conda env list
or
$ conda info --envs
Clone Environment#
Create copy of existing environment:
$ conda create --name newenv --clone oldenv
Remove Environment#
$ conda env remove --name myenv
Environment Files#
Export Environment#
Create reproducible environment file:
$ conda activate myenv
$ conda env export > environment.yml
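environment.yml lists every package in the environment with its exact version and build string. Build strings are platform-specific, so use conda env export --no-builds or conda env export --from-history when the file must work on a different operating system.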
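An exported file also ends with a prefix: line holding the absolute path of the environment on the exporting machine, which is usually not wanted in a shared file. The sketch below shows one way to produce a more portable export. The helper name strip_prefix and the file names environment.full.yml and environment.minimal.yml are hypothetical choices for illustration.

```shell
#!/bin/bash
# Hypothetical helper: drop the machine-specific "prefix:" line from an exported file.
strip_prefix() {
    grep -v '^prefix:' "$1"
}

if command -v conda >/dev/null 2>&1; then
    # Export without build strings, then remove the absolute install path.
    conda env export --no-builds > environment.full.yml
    strip_prefix environment.full.yml > environment.yml
    # Alternative: record only the packages that were explicitly requested.
    conda env export --from-history > environment.minimal.yml
fi
```

The --from-history variant tends to be the most portable because it leaves dependency resolution to the target system, at the cost of less exact reproducibility.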
Create Environment from File#
On another system or for collaborators:
$ conda env create -f environment.yml
Minimal Environment File#
Manually create environment.yml:
name: myproject
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - numpy
  - pandas
  - matplotlib
  - scikit-learn
  - pip
  - pip:
    - some-pip-only-package
Create from file:
$ conda env create -f environment.yml
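Running conda env create a second time fails if the environment already exists. A script that should work in both cases can check first and update instead. The following is a sketch under stated assumptions: the helper env_listed and the variables ENV_NAME and ENV_FILE are hypothetical, and the helper assumes the usual conda env list layout, with the environment name in the first column and comment lines starting with #.

```shell
#!/bin/bash
# Hypothetical helper: read `conda env list` output on stdin and succeed
# if an environment with the given name is listed.
env_listed() {
    awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}

ENV_NAME="myproject"        # should match the name: field in environment.yml
ENV_FILE="environment.yml"

if command -v conda >/dev/null 2>&1 && [ -f "$ENV_FILE" ]; then
    if conda env list | env_listed "$ENV_NAME"; then
        # Environment exists: bring it in line with the file, removing extras.
        conda env update -n "$ENV_NAME" -f "$ENV_FILE" --prune
    else
        conda env create -f "$ENV_FILE"
    fi
fi
```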
Using Conda in Job Scripts#
Interactive Jobs#
$ srun --pty bash
$ module load anaconda3
$ conda activate myenv
$ python my_script.py
Batch Jobs#
#!/bin/bash
#SBATCH --job-name=conda_job
#SBATCH --output=conda_%j.out
#SBATCH --ntasks=1
#SBATCH --mem=16G
#SBATCH --time=04:00:00
# Load anaconda
module load anaconda3
# Activate environment
source activate myenv
# Run Python script
python analysis.py
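Note
In batch scripts, conda activate can fail because the non-interactive shell has not run conda's shell initialization. source activate avoids that requirement, but it is a legacy form that newer conda releases deprecate. If it stops working, initialize conda inside the script and then use conda activate.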
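For cases where source activate is unavailable, the sketch below shows a batch script that initializes conda with its bash shell hook before calling conda activate. It is an illustration, not a tested NMTHPC template: the STATUS variable and the shortened resource requests are assumptions, and the guards around module and conda exist only so the script reports a clear message instead of failing silently.

```shell
#!/bin/bash
#SBATCH --job-name=conda_job
#SBATCH --output=conda_%j.out
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

ENV_NAME="myenv"    # environment name used in the examples above

# Load the site module when the module command exists in this shell.
if command -v module >/dev/null 2>&1; then
    module load anaconda3 || echo "module load anaconda3 failed" >&2
fi

if ! command -v conda >/dev/null 2>&1; then
    echo "conda not found; check that the anaconda3 module loaded" >&2
    STATUS="conda-missing"
else
    # Define the conda shell function for this non-interactive shell,
    # then activate with the current command.
    eval "$(conda shell.bash hook)"
    if conda activate "$ENV_NAME"; then
        STATUS="activated"
        python -c "import sys; print(sys.prefix)"   # path of the active environment
    else
        STATUS="activation-failed"
    fi
fi
echo "status: $STATUS"
```

In a real job you would likely exit with a non-zero code when STATUS is anything other than activated, so that Slurm marks the job as failed.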
Common Environments for HPC#
Data Science Environment#
$ conda create -n datascience python=3.11 \
numpy pandas matplotlib seaborn scikit-learn \
jupyter notebook ipython
Machine Learning Environment#
$ conda create -n ml python=3.11 \
numpy pandas scikit-learn \
pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
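Or for TensorFlow (package names and CUDA toolkit versions vary by channel and release, so confirm availability with conda search before pinning):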
$ conda create -n tensorflow python=3.11 \
numpy pandas matplotlib \
tensorflow-gpu cudatoolkit=12.1
Bioinformatics Environment#
$ conda create -n bio python=3.11 \
biopython pandas numpy matplotlib \
-c bioconda -c conda-forge
Scientific Computing Environment#
$ conda create -n science python=3.11 \
numpy scipy matplotlib \
sympy numba h5py netcdf4
Best Practices#
Environment Location#
By default, conda creates environments in ~/.conda/envs/.
Check environment size:
$ du -sh ~/.conda/envs/*
Clean up cached packages:
$ conda clean --all
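The du command above prints an error when ~/.conda/envs/ does not exist yet and lists environments in no particular order. A slightly more defensive variant is sketched here. The function name env_sizes is hypothetical, and sort -h assumes GNU coreutils, which is typical on Linux clusters.

```shell
#!/bin/bash
# Hypothetical helper: print the size of each environment under a directory,
# largest last. Prints nothing if the directory is missing or empty.
env_sizes() {
    local dir="${1:-$HOME/.conda/envs}"
    [ -d "$dir" ] || return 0
    find "$dir" -mindepth 1 -maxdepth 1 -type d -exec du -sh {} + | sort -h
}

env_sizes    # defaults to ~/.conda/envs
```

Passing a different directory as the first argument reports on environments stored elsewhere.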
Naming Conventions#
Use descriptive names:
project_name: For specific projects
python311: For general Python 3.11 environment
ml_gpu: For machine learning with GPU
analysis_2024: For specific analysis work
Performance Tips#
1. Use mamba for faster package resolution:
$ conda install -c conda-forge mamba
$ mamba install numpy pandas # Much faster than conda
2. Specify channels in environment file to avoid conflicts:
channels:
  - conda-forge
  - defaults
3. Pin versions for reproducibility:
dependencies:
  - python=3.11.5
  - numpy=1.24.3
  - pandas=2.0.2
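Pinned entries like those in tip 3 do not have to be typed by hand. The sketch below derives them from the active environment. The helper name to_pins is hypothetical, and it assumes that conda list --export prints one name=version=build line per package, with comment lines starting with # and pip-installed packages carrying a build string that begins with pypi.

```shell
#!/bin/bash
# Hypothetical helper: convert `conda list --export` lines (name=version=build)
# on stdin into pinned YAML dependency entries, skipping comments and
# packages that were installed with pip.
to_pins() {
    awk -F= '!/^#/ && NF >= 2 && $3 !~ /^pypi/ { printf "  - %s=%s\n", $1, $2 }'
}

if command -v conda >/dev/null 2>&1; then
    echo "dependencies:"
    conda list --export | to_pins
fi
```

The output pins every package in the environment, including dependencies you never asked for, so you may want to trim it to the packages your project imports directly.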
Troubleshooting#
Environment Not Found#
After creating environment:
$ conda env list # Make sure it was created
$ conda activate myenv
If activation fails:
$ source activate myenv # Try source activate
Package Conflicts#
Clear conda cache:
$ conda clean --all
Create fresh environment:
$ conda deactivate
$ conda env remove -n problematic_env
$ conda create -n problematic_env python=3.11
Install packages one at a time to identify conflicts:
$ conda install numpy
$ conda install pandas
# etc.
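The one-at-a-time approach can be automated with a short loop that stops at the first package conda cannot install. This is a sketch: the function names try_install and first_failing_package are hypothetical, and the loop really installs each package into the active environment, so run it only in the fresh environment created above.

```shell
#!/bin/bash
# Attempt to install one package into the active environment.
try_install() {
    conda install --yes "$1" >/dev/null 2>&1
}

# Install packages in order; print the first one that fails and stop.
first_failing_package() {
    local pkg
    for pkg in "$@"; do
        if ! try_install "$pkg"; then
            echo "$pkg"
            return 1
        fi
    done
    return 0
}
```

For example, first_failing_package numpy pandas scikit-learn prints nothing and returns 0 when everything installs, or prints the name of the first package that failed. Re-running conda install for that package without the output redirection shows the solver's explanation.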
Conda is Slow#
Use mamba instead:
$ conda install -c conda-forge mamba
$ mamba install package_name # Much faster
Use micromamba (lightweight alternative):
# Ask HPC support about micromamba availability
Out of Disk Space#
Check environment sizes:
$ du -sh ~/.conda/envs/*
Remove unused environments:
$ conda env remove -n unused_env
Clean package cache:
$ conda clean --all
Contact HPC support if you need more quota.
Example Workflows#
Creating a New Project Environment#
# Load anaconda
$ module load anaconda3
# Create environment
$ conda create -n myproject python=3.11
# Activate environment
$ conda activate myproject
# Install packages
$ conda install numpy pandas matplotlib scikit-learn jupyter
# Export for reproducibility
$ conda env export > environment.yml
# Test it works
$ python -c "import numpy, pandas; print('Success!')"
Using Jupyter with Conda#
# Create environment with Jupyter
$ conda create -n jupyter_env python=3.11 jupyter numpy pandas matplotlib
# Activate and start Jupyter
$ conda activate jupyter_env
$ jupyter notebook --no-browser --ip=0.0.0.0
Machine Learning Workflow#
# Create ML environment
$ conda create -n pytorch_ml python=3.11
$ conda activate pytorch_ml
# Install PyTorch with GPU support
$ conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
# Install additional packages
$ conda install pandas scikit-learn matplotlib seaborn tensorboard
# Verify GPU support
$ python -c "import torch; print(torch.cuda.is_available())"
# Export environment
$ conda env export > ml_environment.yml
Conda Cheat Sheet#
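| Task | Command |
|---|---|
| Load Anaconda | module load anaconda3 |
| Create environment | conda create -n myenv python=3.11 |
| Activate environment | conda activate myenv |
| Deactivate | conda deactivate |
| Install package | conda install package_name |
| List environments | conda env list |
| List packages | conda list |
| Export environment | conda env export > environment.yml |
| Create from file | conda env create -f environment.yml |
| Remove environment | conda env remove --name myenv |
| Clean cache | conda clean --all |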
Additional Resources#
Questions?#
For questions about Anaconda on NMTHPC, contact hpc@nmthpc.atlassian.net.