Configuration Overview

DerivaML uses hydra-zen for configuration management. Configurations are written in Python rather than YAML, so no YAML files are needed.

Why Hydra-Zen?

  • Python-first: Configurations are Python code with type hints and IDE support
  • Composable: Mix and match configuration groups at runtime
  • Reproducible: Configurations are serialized and tracked with your results
  • Flexible: Override any parameter from the command line

Configuration Architecture

src/configs/
├─ __init__.py          # Loads all config modules
├─ deriva.py            # DerivaML connection settings
├─ datasets.py          # Dataset specifications
├─ assets.py            # Asset RID configurations
├─ workflow.py          # Workflow metadata
├─ base.py              # Base DerivaModelConfig
├─ cifar10_cnn.py       # Model hyperparameters
├─ experiments.py       # Experiment presets
├─ multiruns.py         # Named multirun configurations
└─ my_notebook.py       # Notebook configurations

Key Concepts

Configuration Groups

Configuration groups organize related settings. Each group has multiple named configurations:

Group          Purpose                        Example Configs
deriva_ml      Catalog connection             local, eye_ai, dev
datasets       Input data specification       training, testing, full
assets         Input assets (weights, etc.)   weights_v1, pretrained
model_config   Model hyperparameters          quick, extended, regularized
workflow       Workflow metadata              training_workflow, analysis_workflow
experiment     Preset combinations            run1, baseline, ablation

The builds() Function

The builds() function creates a structured configuration from a function or class:

from hydra_zen import builds
from models.my_model import my_model

# Create a configuration that captures the function signature
MyModelConfig = builds(
    my_model,
    learning_rate=1e-3,
    epochs=10,
    populate_full_signature=True,  # Add my_model's other parameters as config fields
    zen_partial=True,              # Instantiating yields a partial instead of calling my_model
)

The store() Function

The store() function registers configurations with Hydra:

from hydra_zen import store

# Create a store for a specific group
model_store = store(group="model_config")

# Register configurations
model_store(MyModelConfig, name="default")
model_store(MyModelConfig, epochs=50, name="extended")

Defaults and Overrides

Each configuration file uses hydra_defaults to declare which named configuration each group falls back to when no override is given:

hydra_defaults = [
    "_self_",
    {"deriva_ml": "default_deriva"},
    {"datasets": "default_dataset"},
]

Override at runtime:

# Use a different config from a group
uv run deriva-ml-run datasets=testing

# Override a specific field
uv run deriva-ml-run model_config.epochs=100

# Combine multiple overrides
uv run deriva-ml-run datasets=testing model_config=extended
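
The two override forms above behave differently: group=name swaps in a whole named configuration, while group.field=value edits a single field of whichever configuration is selected. The following is a deliberately simplified, standard-library-only model of that distinction. It is not Hydra's implementation, and the GROUPS contents are made up for illustration.

```python
import ast
import copy

# Toy registry: group name -> {config name -> settings}. Values are illustrative only.
GROUPS = {
    "datasets": {"training": {"split": "train"}, "testing": {"split": "test"}},
    "model_config": {"default": {"epochs": 10}, "extended": {"epochs": 50}},
}
DEFAULTS = {"datasets": "training", "model_config": "default"}


def parse_value(text):
    """Read '100' as an int and '1e-3' as a float; keep anything else as a string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def compose(overrides):
    """Apply group selections first, then individual field edits."""
    selected = dict(DEFAULTS)
    field_edits = []
    for item in overrides:
        key, _, value = item.partition("=")
        if key in GROUPS:
            selected[key] = value  # e.g. datasets=testing
        else:
            field_edits.append((key.split("."), parse_value(value)))  # e.g. model_config.epochs=100

    # Copy the selected configs so edits never leak back into the registry.
    config = {group: copy.deepcopy(GROUPS[group][name]) for group, name in selected.items()}
    for path, value in field_edits:
        node = config
        for part in path[:-1]:
            node = node[part]
        node[path[-1]] = value
    return config


print(compose(["datasets=testing", "model_config=extended"]))
print(compose(["model_config=extended", "model_config.epochs=100"]))
```

In this model a field edit is always applied after the group selection, so combining model_config=extended with model_config.epochs=100 starts from the extended preset and then changes only epochs.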

Next Steps