Environment Setup

This guide covers setting up and managing your development environment.

Prerequisites

  • Python 3.12+
  • uv - Python package manager
  • Git
  • Access to a Deriva catalog — use an existing server or run one locally with deriva-docker
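
The local prerequisites above can be checked before running any setup commands. The following is a minimal sketch using only the Python standard library. The helper names python_is_supported and find_missing_tools are hypothetical, not part of uv or DerivaML, and the check does not cover catalog access.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)  # minimum version from the prerequisites list


def python_is_supported(version=None):
    """Return True if the interpreter version meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def find_missing_tools(tools=("uv", "git"), which=shutil.which):
    """Return the names of command-line tools that are not on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    if not python_is_supported():
        print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required")
    for name in find_missing_tools():
        print(f"Missing prerequisite: {name}")
```

The which parameter exists only so the lookup can be replaced in tests; by default it uses shutil.which.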

Installing uv

If you haven't installed uv yet:

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

See the official uv documentation for more options.

Initializing Your Environment

From the repository root:

# Create environment and install dependencies
uv sync

This creates:

  • A .venv/ directory with an isolated Python environment
  • A uv.lock file pinning exact dependency versions

Important: Commit uv.lock to your repository to ensure reproducible environments.
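
To confirm that the sync produced both artifacts, a quick check along these lines can help. This is a sketch only: sync_artifacts is a hypothetical helper, and it assumes the default locations at the repository root.

```python
from pathlib import Path


def sync_artifacts(root="."):
    """Report whether the files created by `uv sync` exist under root."""
    root = Path(root)
    return {
        ".venv": (root / ".venv").is_dir(),
        "uv.lock": (root / "uv.lock").is_file(),
    }


if __name__ == "__main__":
    for name, present in sync_artifacts().items():
        print(f"{name}: {'found' if present else 'missing'}")
```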

Optional Dependency Groups

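Install extra groups on demand. Each uv sync makes the environment match exactly what that command requests, so pass --group more than once to keep several groups installed together: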

# Jupyter notebook support
uv sync --group=jupyter

# PyTorch
uv sync --group=pytorch

# TensorFlow
uv sync --group=tensorflow

# Documentation building
uv sync --group=docs

To always install certain groups, add them to default-groups in pyproject.toml:

[tool.uv]
default-groups = ["dev", "jupyter"]

Notebook Setup

For notebook development:

# Install Jupyter support
uv sync --group=jupyter

# Install nbstripout to auto-strip output cells on commit
uv run nbstripout --install

# Register a Jupyter kernel for this environment
uv run deriva-ml-install-kernel

# Verify available kernels
uv run jupyter kernelspec list
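
The kernel check can also be scripted. The sketch below runs the same listing command with the --json flag and extracts the kernel names. parse_kernelspecs and list_kernels are hypothetical helpers, and the name under which deriva-ml-install-kernel registers the kernel is not assumed here.

```python
import json
import subprocess


def parse_kernelspecs(json_text):
    """Map kernel name -> display name from `jupyter kernelspec list --json` output."""
    specs = json.loads(json_text).get("kernelspecs", {})
    return {
        name: info.get("spec", {}).get("display_name", name)
        for name, info in specs.items()
    }


def list_kernels():
    """Run the listing command through uv and parse its JSON output."""
    result = subprocess.run(
        ["uv", "run", "jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_kernelspecs(result.stdout)


if __name__ == "__main__":
    for name, display in list_kernels().items():
        print(f"{name}: {display}")
```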

Activating the Environment

You can run commands directly with uv run:

uv run python script.py
uv run pytest

Or activate the environment for a shell session:

# Bash/Zsh
source .venv/bin/activate

# Fish
source .venv/bin/activate.fish

# Csh/Tcsh
source .venv/bin/activate.csh

# Windows (PowerShell)
.venv\Scripts\Activate.ps1

When finished, run deactivate to leave the environment.
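
If it is unclear whether a command is running inside the project environment, the interpreter can report this itself. A minimal sketch follows; in_virtual_environment is a hypothetical helper name.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a virtual environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtual_environment())
```

Saved as a script (the file name is up to you) and run with uv run python, it should report the interpreter under .venv/.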

Updating Dependencies

Update a Specific Package

# Update DerivaML to latest version
uv sync --upgrade-package deriva-ml

# Update multiple packages
uv sync --upgrade-package deriva-ml --upgrade-package pandas

Update All Packages

# Regenerate lock file with latest versions
uv lock --upgrade

# Install updated packages
uv sync

Caution: Upgrading PyTorch or TensorFlow may require compatible GPU drivers. Consider pinning these versions in pyproject.toml.

After upgrading, commit your updated uv.lock file.

Authentication

Before accessing catalog data, authenticate with Globus:

uv run deriva-globus-auth-utils login --host <hostname>

This opens a browser for Globus authentication. Credentials are cached locally.

For multiple servers:

uv run deriva-globus-auth-utils login --host <hostname>
uv run deriva-globus-auth-utils login --host <dev-hostname>

GitHub Actions

The template includes GitHub Actions workflows in .github/workflows/:

Workflow          | Trigger           | Purpose
release.yml       | Version tag (v*)  | Creates GitHub releases with auto-generated notes
publish-docs.yml  | Push to main      | Builds and deploys documentation to GitHub Pages

These run automatically; no setup is required.

Troubleshooting

"No credentials found"

Re-authenticate:

uv run deriva-globus-auth-utils login --host <hostname>

"Token expired"

Force re-authentication:

uv run deriva-globus-auth-utils login --host <hostname> --force

Kernel not found in Jupyter

Re-register the kernel:

uv run deriva-ml-install-kernel

Dependency conflicts

Try regenerating the lock file:

rm uv.lock
uv lock
uv sync

Permission denied on .venv

Remove and recreate:

rm -rf .venv
uv sync