Configuration Groups
DerivaML uses five standard configuration groups. Each group needs at least a default entry. This page details each group and how to customize it.
For the underlying configuration classes and advanced patterns, see the Hydra-zen Configuration Overview.
Connection Settings (deriva_ml)
Define how to connect to Deriva catalogs:
# configs/deriva.py
from hydra_zen import builds, store
from deriva_ml import DerivaMLConfig
DerivaMLConf = builds(DerivaMLConfig, populate_full_signature=True)
deriva_store = store(group="deriva_ml")
# Development catalog
deriva_store(
DerivaMLConf(hostname="dev.example.org", catalog_id="1"),
name="default_deriva",
)
# Production catalog
deriva_store(
DerivaMLConf(hostname="prod.example.org", catalog_id="100"),
name="production",
)
See DerivaMLConfig for all parameters.
Datasets (datasets)
Specify which datasets to download for each workflow:
# configs/datasets.py
from hydra_zen import store
from deriva_ml.dataset import DatasetSpecConfig
from deriva_ml.execution import with_description
datasets_store = store(group="datasets")
# Required: default (used when no override is specified)
datasets_store([], name="default_dataset")
# A named dataset collection
datasets_store(
with_description(
[DatasetSpecConfig(rid="1-ABC", version="1.0.0")],
"Training dataset with 1000 labeled images.",
),
name="training_data",
)
See DatasetSpecConfig for options.
Assets (assets)
List input asset RIDs (model weights, configuration files, etc.):
# configs/assets.py
from hydra_zen import store
from deriva_ml.execution import with_description
asset_store = store(group="assets")
# Required: default
asset_store([], name="default_asset")
# Model weights
asset_store(
with_description(
["6-EPNR"],
"ResNet50 pretrained weights from MAE pre-training.",
),
name="resnet_weights",
)
For caching support, use AssetSpecConfig instead of plain RID strings.
See Configuration Descriptions
for details on with_description().
Workflows (workflow)
Define the computational process being tracked:
# configs/workflow.py
from hydra_zen import builds, store
from deriva_ml.execution import Workflow
workflow_store = store(group="workflow")
workflow_store(
builds(Workflow, name="default", workflow_type="Training",
populate_full_signature=True),
name="default_workflow",
)
workflow_store(
builds(Workflow, name="Feature Extraction", workflow_type="Preprocessing",
description="Extract features from raw data",
populate_full_signature=True),
name="feature_extraction",
)
See Workflows for how workflows track source code provenance.
Model Configuration (model_config)
Configure model hyperparameters. This is where zen_partial=True is essential:
# configs/my_model.py
from hydra_zen import builds, store
from models.my_model import train_classifier
model_store = store(group="model_config")
# Base config: partially applied, waits for ml_instance and execution
ModelConfig = builds(
train_classifier,
learning_rate=1e-3,
epochs=10,
batch_size=32,
populate_full_signature=True,
zen_partial=True,
)
model_store(ModelConfig, name="default_model")
model_store(ModelConfig, name="quick", epochs=3, learning_rate=1e-2)
model_store(ModelConfig, name="long_training", epochs=100, learning_rate=1e-4)
See Model Configuration with zen_partial for the full pattern.
See Also
- Hydra-zen Configuration Overview — Configuration class reference
- Experiments and Multiruns — Preset experiment configurations
- Notebook Configuration — Notebook-specific patterns