Experiments and Multiruns

Experiments bundle specific model, dataset, asset, and workflow choices into reusable presets. Multiruns sweep parameters across multiple values.

Defining Experiments

Define experiments in configs/experiments.py:

# configs/experiments.py
from hydra_zen import make_config, store
from configs.base import DerivaModelConfig

experiment_store = store(group="experiment", package="_global_")

experiment_store(
    make_config(
        hydra_defaults=[
            "_self_",
            {"override /model_config": "quick"},
            {"override /datasets": "training_data"},
            {"override /assets": "resnet_weights"},
            {"override /workflow": "default_workflow"},
        ],
        description="Quick training with ResNet weights on training data.",
        bases=(DerivaModelConfig,),
    ),
    name="quick_training",
)

Run an experiment:

uv run deriva-ml-run +experiment=quick_training

# Override an experiment parameter
uv run deriva-ml-run +experiment=quick_training model_config.epochs=25

Note the + prefix: it adds the experiment group, which is not part of the base defaults list. The override / prefix in hydra_defaults tells Hydra to replace the existing choice for each group rather than raising a duplicate-entry error.
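The same pattern scales to additional presets. A hedged sketch of a second experiment, long_training (referenced in the sweep examples below); the "full" model_config choice and the other overrides here are illustrative, not taken from the project's actual configs:

```python
# configs/experiments.py (continued)
experiment_store(
    make_config(
        hydra_defaults=[
            "_self_",
            {"override /model_config": "full"},  # illustrative config name
            {"override /datasets": "training_data"},
            {"override /assets": "resnet_weights"},
            {"override /workflow": "default_workflow"},
        ],
        description="Longer training run using the full model config.",
        bases=(DerivaModelConfig,),
    ),
    name="long_training",
)
```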

Multiruns and Sweeps

For parameter sweeps, use Hydra's multirun mode:

# Sweep a parameter
uv run deriva-ml-run --multirun model_config.learning_rate=0.0001,0.001,0.01

# Sweep across experiments
uv run deriva-ml-run --multirun +experiment=quick_training,long_training
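When a single --multirun command sweeps two comma-lists, Hydra's basic sweeper launches one job per combination (the cartesian product). A quick sketch of the job grid such a sweep would produce; batch_size is an illustrative parameter name, not one defined elsewhere on this page:

```python
from itertools import product

learning_rates = [0.0001, 0.001, 0.01]
batch_sizes = [16, 32]

# One job per (learning_rate, batch_size) pair, mirroring:
#   --multirun model_config.learning_rate=0.0001,0.001,0.01 model_config.batch_size=16,32
jobs = [
    {"model_config.learning_rate": lr, "model_config.batch_size": bs}
    for lr, bs in product(learning_rates, batch_sizes)
]
print(len(jobs))  # 3 x 2 = 6 jobs
```

Keep this multiplicative growth in mind when adding sweep dimensions: three parameters with four values each already means 64 executions.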

Named Multirun Configurations

For complex sweeps, define named multirun configurations in configs/multiruns.py:

# configs/multiruns.py
from deriva_ml.execution import multirun_config

multirun_config(
    "lr_sweep",
    overrides=[
        "+experiment=quick_training",
        "model_config.learning_rate=0.0001,0.001,0.01,0.1",
    ],
    description='''## Learning Rate Sweep

    Exploring optimal learning rates for quick training config.

    | Learning Rate | Expected Behavior |
    |--------------|-------------------|
    | 0.0001 | Slow convergence |
    | 0.001 | Standard baseline |
    | 0.01 | Fast, may overshoot |
    | 0.1 | Likely unstable |
    ''',
)

Run a named multirun:

# Use the named multirun config (automatically enables multirun mode)
uv run deriva-ml-run +multirun=lr_sweep

# Override parameters from the multirun config
uv run deriva-ml-run +multirun=lr_sweep model_config.epochs=5

Named multiruns create a parent-child execution structure in the catalog:

  • Parent execution: Contains the markdown description and links to all children
  • Child executions: One per parameter combination, each with full provenance

Use list_parent_executions() and list_nested_executions() to navigate this hierarchy.
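Walking the hierarchy might look like the sketch below. Only the two method names come from this page; the ml client variable, the argument passed to list_nested_executions(), and the shape of the returned records are assumptions:

```python
# Sketch: navigate the parent-child structure created by a named multirun.
# `ml` is assumed to be an initialized DerivaML instance.
for parent in ml.list_parent_executions():
    print(parent)  # parent execution holding the sweep's markdown description
    for child in ml.list_nested_executions(parent):
        print("  ", child)  # one child per parameter combination
```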

Code Provenance

DerivaML records the Git commit hash and source URL for each execution. Always commit your code before running models:

git add -A && git commit -m "Ready for training run"
uv run bump-version patch  # Optional: create a release tag
uv run deriva-ml-run +experiment=quick_training

If the working tree has uncommitted changes, DerivaML issues a warning and the execution record may not have a valid code reference.
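If you want that warning to be a hard stop, a local launch wrapper can check the tree itself. A minimal sketch; aborting on a dirty tree is a local convention here, not built-in DerivaML behavior:

```python
import subprocess

def working_tree_is_clean(porcelain_output: str) -> bool:
    # `git status --porcelain` prints one line per modified or untracked
    # file, and nothing at all when the tree is clean.
    return porcelain_output.strip() == ""

if __name__ == "__main__":
    status = subprocess.run(
        ["git", "status", "--porcelain"], capture_output=True, text=True
    ).stdout
    if not working_tree_is_clean(status):
        raise SystemExit(
            "Uncommitted changes: commit before running so the "
            "execution record points at a valid commit."
        )
```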

See Automatic Source Code Detection for details on how provenance works in scripts, notebooks, and Docker containers.

See Also