Anaconda

Anaconda is a Python distribution built around conda, a package and environment management system. It’s the recommended way to manage Python environments on NMTHPC.

Why Use Anaconda?

Benefits:

  • Create isolated environments for different projects

  • Easy installation of packages and dependencies

  • Manage different Python versions

  • Avoid conflicts between package requirements

  • Share reproducible environments with collaborators

Loading Anaconda

$ module load anaconda3

Verify installation:

$ conda --version
$ which conda

Creating Environments

Basic Environment

Create an environment with a specific Python version:

$ conda create -n myenv python=3.11

Activate the environment:

$ conda activate myenv

Deactivate when done:

$ conda deactivate
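
In job scripts it can help to stop early if the wrong environment is active. The function below is a minimal sketch, not an NMTHPC-provided tool, and the name require_env is hypothetical. It relies on the CONDA_DEFAULT_ENV and CONDA_PREFIX variables that conda activate sets. For an environment activated by path, CONDA_DEFAULT_ENV holds the path rather than a name.

```shell
# Sketch (hypothetical helper): fail fast if the expected conda environment
# is not the active one. conda activate sets CONDA_DEFAULT_ENV to the
# environment name and CONDA_PREFIX to its directory.
require_env() {
    local expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "error: expected conda env '$expected', active is '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
    echo "using $expected at ${CONDA_PREFIX:-unknown prefix}"
}

# Usage, e.g. in a job script right after activation:
#   conda activate myenv
#   require_env myenv || exit 1
```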

Environment with Packages

Create an environment and install packages in one step:

$ conda create -n data_analysis python=3.11 numpy pandas matplotlib scipy

Install additional packages later:

$ conda activate data_analysis
$ conda install scikit-learn seaborn

Managing Packages

Installing Packages

From conda:

$ conda install numpy scipy matplotlib

From conda-forge (a community-maintained channel with a larger package collection):

$ conda install -c conda-forge package_name

From pip (when a package is not available through conda):

$ pip install package_name

Tip

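Prefer conda install over pip install when possible. pip does not know which packages conda installed and can overwrite or break them. Install what you can with conda first, and use pip only for packages conda does not provide.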
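
pip installs into whichever Python it belongs to. Before using pip, it is worth confirming that python resolves inside the active environment. Running python -m pip instead of bare pip ties pip to that interpreter. The check below is a sketch. check_env_python is a hypothetical helper, and it assumes the usual Linux layout, where an environment's interpreter is $CONDA_PREFIX/bin/python.

```shell
# Sketch (hypothetical helper): confirm that `python` on PATH is the active
# conda environment's interpreter before installing anything with pip.
check_env_python() {
    local py
    py="$(command -v python)" || { echo "error: no python on PATH" >&2; return 1; }
    if [ -z "${CONDA_PREFIX:-}" ] || [ "$py" != "$CONDA_PREFIX/bin/python" ]; then
        echo "warning: python is $py, not inside ${CONDA_PREFIX:-<no active env>}" >&2
        return 1
    fi
    echo "ok: $py"
}

# Usage:
#   conda activate myenv
#   check_env_python && python -m pip install package_name
```

If pip itself is missing from the environment, install it first with conda install pip. Without it, python -m pip stops with an error instead of silently using a pip from somewhere else.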

Listing Packages

Packages in the current environment:

$ conda list

Search for available packages:

$ conda search package_name

Updating Packages

Update a specific package:

$ conda update numpy

Update all packages:

$ conda update --all

Removing Packages

$ conda remove package_name

Managing Environments

List Environments

$ conda env list

or

$ conda info --envs

Clone Environment

Create a copy of an existing environment:

$ conda create --name newenv --clone oldenv

Remove Environment

$ conda env remove --name myenv

Environment Files

Export Environment

Create a reproducible environment file:

$ conda activate myenv
$ conda env export > environment.yml

environment.yml lists every package in the environment with its exact version and build string, including packages installed with pip.
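
A full export pins build strings that are often specific to one platform, so it may not recreate cleanly on a different operating system. conda env export --from-history records only the packages you explicitly requested, which is usually more portable. The sketch below writes both. export_env_files and the .lock.yml / .history.yml file names are hypothetical conventions, not conda defaults. Packages installed with pip do not appear in the --from-history output, so add them to its pip: section by hand.

```shell
# Sketch (hypothetical helper): write two exports of a named environment.
#   <name>.lock.yml    - full export with exact versions and builds
#   <name>.history.yml - only explicitly requested packages (more portable)
export_env_files() {
    local env="$1"
    conda env export -n "$env" > "${env}.lock.yml" || return 1
    conda env export -n "$env" --from-history > "${env}.history.yml" || return 1
    echo "wrote ${env}.lock.yml and ${env}.history.yml"
}

# Usage:
#   export_env_files myenv
```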

Create Environment from File

On another system or for collaborators:

$ conda env create -f environment.yml

Minimal Environment File

Manually create environment.yml:

name: myproject
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - numpy
  - pandas
  - matplotlib
  - scikit-learn
  - pip
  - pip:
    - some-pip-only-package

Create from file:

$ conda env create -f environment.yml
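
If environment.yml changes later, for example after you add a dependency, you can update the existing environment in place instead of recreating it. conda env update takes the environment name from the file's name: field, and --prune removes packages that are no longer listed. This is a sketch, and sync_env is a hypothetical wrapper. If you rely on pruning, check the result with conda list.

```shell
# Sketch (hypothetical helper): bring an existing environment in line with
# an environment file. --prune also removes packages no longer listed.
sync_env() {
    local file="${1:-environment.yml}"
    if [ ! -f "$file" ]; then
        echo "error: $file not found" >&2
        return 1
    fi
    conda env update -f "$file" --prune
}

# Usage (the environment name comes from the file's name: field):
#   sync_env environment.yml
```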

Using Conda in Job Scripts

Interactive Jobs

$ srun --pty bash
$ module load anaconda3
$ conda activate myenv
$ python my_script.py

Batch Jobs

#!/bin/bash
#SBATCH --job-name=conda_job
#SBATCH --output=conda_%j.out
#SBATCH --ntasks=1
#SBATCH --mem=16G
#SBATCH --time=04:00:00

# Load anaconda
module load anaconda3

# Activate environment
source activate myenv

# Run Python script
python analysis.py

Note

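Batch jobs run in a non-interactive shell that may not have conda's shell integration loaded, and conda activate fails without it. source activate, the older activation command, does not depend on that integration. Running eval "$(conda shell.bash hook)" before conda activate myenv also works.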

Common Environments for HPC

Data Science Environment

$ conda create -n datascience python=3.11 \
    numpy pandas matplotlib seaborn scikit-learn \
    jupyter notebook ipython

Machine Learning Environment

$ conda create -n ml python=3.11 \
    numpy pandas scikit-learn \
    pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

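or for TensorFlow. TensorFlow no longer publishes a separate tensorflow-gpu package, and its installation guide installs GPU support on Linux with pip:

$ conda create -n tensorflow python=3.11 \
    numpy pandas matplotlib pip
$ conda activate tensorflow
$ pip install "tensorflow[and-cuda]"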

Bioinformatics Environment

$ conda create -n bio python=3.11 \
    biopython pandas numpy matplotlib \
    -c bioconda -c conda-forge

Scientific Computing Environment

$ conda create -n science python=3.11 \
    numpy scipy matplotlib \
    sympy numba h5py netcdf4

Best Practices

Environment Location

By default, conda creates environments in ~/.conda/envs/.

Check environment size:

$ du -sh ~/.conda/envs/*

Clean up cached packages:

$ conda clean --all
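
If your home directory quota is small, environments and the package cache can live on a larger filesystem instead. Both options below use standard conda features: conda create --prefix, and the envs_dirs and pkgs_dirs settings stored in ~/.condarc. The paths are placeholders. Which NMTHPC filesystem is appropriate is an assumption to confirm with HPC support. use_project_storage is a hypothetical helper.

```shell
# Option 1: create a single environment at an explicit path (placeholder path).
#   conda create --prefix /path/to/project/envs/myenv python=3.11
#   conda activate /path/to/project/envs/myenv

# Option 2 (hypothetical helper): make a directory the default location for
# new environments and the package cache. conda config --add puts the entry
# first in the list, so it takes priority over ~/.conda.
use_project_storage() {
    local base="$1"
    mkdir -p "$base/envs" "$base/pkgs" || return 1
    conda config --add envs_dirs "$base/envs" || return 1
    conda config --add pkgs_dirs "$base/pkgs" || return 1
}

# Usage (placeholder path):
#   use_project_storage /path/to/project/conda
```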

Naming Conventions

Use descriptive names:

  • project_name: For specific projects

  • python311: For general Python 3.11 environment

  • ml_gpu: For machine learning with GPU

  • analysis_2024: For specific analysis work

Performance Tips

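1. Use mamba for faster package resolution. Conda 23.10 and later already use the libmamba solver by default, so the difference is largest on older versions (check with conda --version):

$ conda install -c conda-forge mamba
$ mamba install numpy pandas  # Often much faster than conda's classic solver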

2. Specify channels in environment file to avoid conflicts:

channels:
  - conda-forge
  - defaults

3. Pin versions for reproducibility:

dependencies:
  - python=3.11.5
  - numpy=1.24.3
  - pandas=2.0.2

Troubleshooting

Environment Not Found

After creating an environment, confirm that it exists before activating it:

$ conda env list  # Make sure it was created
$ conda activate myenv

If activation fails:

$ source activate myenv  # Try source activate
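
Activation usually fails because the shell has the conda executable on its PATH but not conda's shell function, which conda activate needs. In bash, type -t conda tells the two cases apart. The sketch below reports which case applies and suggests a fix. conda_shell_status is a hypothetical name.

```shell
# Sketch (hypothetical helper): report whether `conda activate` can work in
# the current bash shell.
#   function -> conda's shell integration is loaded
#   file     -> only the conda executable is on PATH
conda_shell_status() {
    case "$(type -t conda)" in
        function) echo "ready: conda shell function loaded" ;;
        file)     echo "not initialized: run eval \"\$(conda shell.bash hook)\""; return 1 ;;
        *)        echo "conda not found: run module load anaconda3"; return 1 ;;
    esac
}
```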

Package Conflicts

Clear the conda cache:

$ conda clean --all

Create a fresh environment:

$ conda deactivate
$ conda env remove -n problematic_env
$ conda create -n problematic_env python=3.11

Install packages one at a time to identify conflicts:

$ conda install numpy
$ conda install pandas
# etc.
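
Installing one package at a time changes the environment at every step. A --dry-run install asks the solver the same question without installing anything. The sketch below adds packages to a dry-run request one by one and reports any the solver cannot add alongside those already accepted. find_conflicts is a hypothetical helper. A failure can also mean the package was not found in the configured channels.

```shell
# Sketch (hypothetical helper): find which packages the solver cannot add,
# without changing the environment. Each package is tried together with the
# packages accepted so far.
find_conflicts() {
    local accepted=() pkg
    for pkg in "$@"; do
        if conda install --dry-run --yes "${accepted[@]}" "$pkg" > /dev/null 2>&1; then
            accepted+=("$pkg")
            echo "ok: $pkg"
        else
            echo "cannot add: $pkg"
        fi
    done
}

# Usage:
#   conda activate problematic_env
#   find_conflicts numpy pandas scikit-learn
```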

Conda is Slow

Use mamba instead:

$ conda install -c conda-forge mamba
$ mamba install package_name  # Much faster

Use micromamba, a lightweight standalone alternative. Ask HPC support whether it is available on NMTHPC.

Out of Disk Space

Check environment sizes:

$ du -sh ~/.conda/envs/*

Remove unused environments:

$ conda env remove -n unused_env

Clean package cache:

$ conda clean --all
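
Before deleting anything, you can sort the sizes to see where the space actually goes, and preview the cache cleanup. The sketch assumes the default ~/.conda layout described under Environment Location. conda_disk_report is a hypothetical helper, and sort -h requires GNU coreutils or a compatible sort.

```shell
# Sketch (hypothetical helper): list each environment and the package cache,
# smallest first, so the largest consumers appear at the bottom.
conda_disk_report() {
    local base="${1:-$HOME/.conda}"
    du -sh "$base"/envs/* "$base"/pkgs 2>/dev/null | sort -h
}

# Usage:
#   conda_disk_report
#   conda clean --all --dry-run   # shows what would be removed, deletes nothing
```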

Contact HPC support if you need more quota.

Example Workflows

Creating a New Project Environment

# Load anaconda
$ module load anaconda3

# Create environment
$ conda create -n myproject python=3.11

# Activate environment
$ conda activate myproject

# Install packages
$ conda install numpy pandas matplotlib scikit-learn jupyter

# Export for reproducibility
$ conda env export > environment.yml

# Test it works
$ python -c "import numpy, pandas; print('Success!')"

Using Jupyter with Conda

# Create environment with Jupyter
$ conda create -n jupyter_env python=3.11 jupyter numpy pandas matplotlib

# Activate and start Jupyter
$ conda activate jupyter_env
$ jupyter notebook --no-browser --ip=0.0.0.0
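
With --no-browser, you open the notebook from your own computer through an SSH tunnel to the machine running Jupyter. The sketch below only prints the tunnel command. jupyter_tunnel_cmd is hypothetical, and login.example.edu is a placeholder for the real NMTHPC login host. 8888 is Jupyter's default port. If that port is busy, Jupyter picks another, so use the port shown in the URL it prints.

```shell
# Sketch (hypothetical helper): print the ssh command to run on your own
# computer so that http://localhost:PORT reaches the Jupyter server.
#   node  - host where Jupyter is running (run `hostname` there)
#   login - cluster login host (placeholder value below)
#   port  - Jupyter port, 8888 unless Jupyter reported another
jupyter_tunnel_cmd() {
    local node="$1" login="$2" port="${3:-8888}"
    echo "ssh -N -L ${port}:${node}:${port} ${USER}@${login}"
}

# Usage, on the machine where Jupyter is running:
#   jupyter_tunnel_cmd "$(hostname)" login.example.edu
```

Run the printed command on your own computer and keep it open. Then open the URL Jupyter printed, with localhost in place of the host name.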

Machine Learning Workflow

# Create ML environment
$ conda create -n pytorch_ml python=3.11
$ conda activate pytorch_ml

# Install PyTorch with GPU support
$ conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

# Install additional packages
$ conda install pandas scikit-learn matplotlib seaborn tensorboard

# Verify GPU support
$ python -c "import torch; print(torch.cuda.is_available())"

# Export environment
$ conda env export > ml_environment.yml

Conda Cheat Sheet

Task                  Command
--------------------  ---------------------------------
Load Anaconda         module load anaconda3
Create environment    conda create -n myenv python=3.11
Activate environment  conda activate myenv
Deactivate            conda deactivate
Install package       conda install package
List environments     conda env list
List packages         conda list
Export environment    conda env export > env.yml
Create from file      conda env create -f env.yml
Remove environment    conda env remove -n myenv
Clean cache           conda clean --all

Questions?

For questions about Anaconda on NMTHPC, contact hpc@nmthpc.atlassian.net.