CLI Reference
DerivaML provides several command-line tools for running ML workflows, managing
versions, and administering catalogs. All commands are installed as console scripts
when you install the deriva-ml package.
When using a project managed with uv, prefix commands with uv run:
uv run deriva-ml-run +experiment=my_experiment
Command Overview
| Command | Description |
|---|---|
| deriva-ml-run | Execute ML models with Hydra configuration and execution tracking |
| deriva-ml-run-notebook | Execute Jupyter notebooks with parameter injection and tracking |
| bump-version | Bump semantic version tags and push to remote |
| deriva-ml-install-kernel | Install a Jupyter kernel for the current virtual environment |
| deriva-ml-split-dataset | Split a dataset into training and testing subsets |
| deriva-ml-create-schema | Create the DerivaML schema in a catalog |
| deriva-ml-check-catalog-schema | Validate a catalog's schema against DerivaML requirements |
| deriva-ml-table-comments-utils | Update table and column comments from documentation files |
| create-demo-catalog | Create a demo catalog with sample data for testing |
Model and Notebook Execution
deriva-ml-run
Execute ML models with Hydra-zen configuration composition and automatic execution tracking in a Deriva catalog.
Synopsis:
deriva-ml-run [--host HOST] [--catalog CATALOG] [--config-dir DIR]
[--config-name NAME] [--info] [--multirun|-m] [OVERRIDES...]
Arguments:
| Argument | Default | Description |
|---|---|---|
| --host HOST | (from config) | Deriva server hostname |
| --catalog CATALOG | (from config) | Catalog ID or identifier |
| --config-dir DIR, -c | src/configs | Path to the configs directory |
| --config-name NAME | deriva_model | Name of the main Hydra-zen config |
| --info | | Display available configuration groups and options |
| --multirun, -m | | Enable Hydra multirun mode for parameter sweeps |
| OVERRIDES | | Hydra-zen configuration overrides (positional) |
Examples:
# Run with default configuration
uv run deriva-ml-run
# Override a config group
uv run deriva-ml-run model_config=my_model datasets=full_training
# Override individual parameters
uv run deriva-ml-run model_config.epochs=50 model_config.learning_rate=0.001
# Use an experiment preset
uv run deriva-ml-run +experiment=cifar10_quick
# Dry run (download inputs, skip catalog writes)
uv run deriva-ml-run dry_run=true
# Show all available configs
uv run deriva-ml-run --info
# Override host and catalog from command line
uv run deriva-ml-run --host prod.example.org --catalog 100
# Multirun with comma-separated values
uv run deriva-ml-run --multirun model_config.learning_rate=0.0001,0.001,0.01
# Named multirun configuration
uv run deriva-ml-run +multirun=lr_sweep
# Named multirun with additional overrides
uv run deriva-ml-run +multirun=lr_sweep model_config.epochs=5
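With --multirun, Hydra's default basic sweeper launches one job per combination of comma-separated values, that is, the Cartesian product of all swept overrides. The sketch below illustrates that expansion in plain Python so you can predict how many runs a sweep will produce. It is not Hydra's parser: quoting, escaping, range(), choice(), and the rest of Hydra's override grammar are ignored, and expand_sweep is a hypothetical name.

```python
from itertools import product
from typing import List


def expand_sweep(overrides: List[str]) -> List[List[str]]:
    """Expand 'key=v1,v2,...' overrides into one override list per job.

    Simplified illustration of a Cartesian-product sweep; Hydra's real
    override grammar (quoting, range(), choice(), ...) is not handled.
    """
    axes = []
    for override in overrides:
        key, _, values = override.partition("=")
        # Each comma-separated value becomes one point on this sweep axis.
        axes.append([f"{key}={value}" for value in values.split(",")])
    # One job per combination of values across all axes.
    return [list(combo) for combo in product(*axes)]


jobs = expand_sweep(["model_config.learning_rate=0.0001,0.001,0.01",
                     "model_config.epochs=5,10"])
for job in jobs:
    print(" ".join(job))
```

Three learning rates times two epoch counts gives six jobs, which is why sweeps over several parameters grow quickly.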
See also: Running Models
deriva-ml-run-notebook
Execute Jupyter notebooks with parameter injection, automatic kernel detection, and execution tracking. The executed notebook and a Markdown conversion are uploaded to the catalog as execution assets.
Synopsis:
deriva-ml-run-notebook NOTEBOOK [--host HOST] [--catalog CATALOG]
[--file FILE] [--parameter KEY VALUE]
[--kernel KERNEL] [--inspect] [--info]
[--log-output] [OVERRIDES...]
Arguments:
| Argument | Default | Description |
|---|---|---|
| NOTEBOOK | (required) | Path to the .ipynb notebook file |
| --host HOST | (from config) | Deriva server hostname |
| --catalog CATALOG | (from config) | Catalog ID or identifier |
| --file FILE, -f | | JSON or YAML file with parameter values |
| --parameter KEY VALUE, -p | | Parameter name and value to inject (repeatable) |
| --kernel KERNEL, -k | (auto-detected) | Jupyter kernel name |
| --inspect | | Display notebook parameters and exit |
| --info | | Display available Hydra configuration groups |
| --log-output | | Stream cell outputs during execution |
| OVERRIDES | | Hydra-zen configuration overrides (positional) |
Environment Variables Set During Execution:
| Variable | Purpose |
|---|---|
| DERIVA_ML_WORKFLOW_URL | Git URL or local path to the notebook source |
| DERIVA_ML_WORKFLOW_CHECKSUM | MD5 checksum of the notebook file |
| DERIVA_ML_NOTEBOOK_PATH | Absolute filesystem path to the notebook |
| DERIVA_ML_SAVE_EXECUTION_RID | Path where the notebook saves execution metadata |
| DERIVA_ML_HYDRA_OVERRIDES | JSON-encoded list of Hydra overrides |
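Inside the notebook, these variables can be read with the standard library. The sketch below collects them and decodes DERIVA_ML_HYDRA_OVERRIDES from JSON, falling back to an empty list when the notebook runs outside deriva-ml-run-notebook (for example, interactively in Jupyter). The helper name read_runner_context is hypothetical; DerivaML may provide its own way to consume these values.

```python
import json
import os


def read_runner_context() -> dict:
    """Collect the values deriva-ml-run-notebook exports (hypothetical helper)."""
    env = os.environ
    return {
        "workflow_url": env.get("DERIVA_ML_WORKFLOW_URL"),
        "workflow_checksum": env.get("DERIVA_ML_WORKFLOW_CHECKSUM"),
        "notebook_path": env.get("DERIVA_ML_NOTEBOOK_PATH"),
        "save_execution_rid": env.get("DERIVA_ML_SAVE_EXECUTION_RID"),
        # Overrides arrive as a JSON-encoded list; treat unset or empty as none.
        "hydra_overrides": json.loads(env.get("DERIVA_ML_HYDRA_OVERRIDES") or "[]"),
    }


context = read_runner_context()
print(context["hydra_overrides"])
```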
Examples:
# Run a notebook with default configuration
uv run deriva-ml-run-notebook notebooks/analyze_results.ipynb
# Override Hydra config groups (positional overrides)
uv run deriva-ml-run-notebook notebooks/analysis.ipynb \
assets=my_assets deriva_ml=production
# Inject parameters into the notebook's parameter cell
uv run deriva-ml-run-notebook notebooks/train.ipynb \
-p learning_rate 0.001 -p epochs 50
# Load parameters from a YAML file
uv run deriva-ml-run-notebook notebooks/train.ipynb --file params.yaml
# Inspect available notebook parameters without running
uv run deriva-ml-run-notebook notebooks/train.ipynb --inspect
# Show available Hydra config groups
uv run deriva-ml-run-notebook notebooks/analysis.ipynb --info
# Stream notebook output to terminal
uv run deriva-ml-run-notebook notebooks/train.ipynb --log-output
# Override host and catalog
uv run deriva-ml-run-notebook notebooks/analysis.ipynb \
--host prod.example.org --catalog 100
See also: Running Models, Notebook Configuration
Development Tools
bump-version
Manage semantic version tags for your project. Creates an initial tag if none exists, or bumps the existing version using bump-my-version.
This tool works with setuptools_scm, which derives the version dynamically from git tags, so there is no hardcoded version string in the source code.
Synopsis:
bump-version [patch|minor|major]
Arguments:
| Argument | Default | Description |
|---|---|---|
| patch\|minor\|major | patch | Which semantic version component to increment |
Environment Variables:
| Variable | Default | Description |
|---|---|---|
| START | 0.1.0 | Initial version if no tag exists |
| PREFIX | v | Tag prefix (e.g., v for tags like v1.2.3) |
How Versioning Works:
- At a tag: the version is clean, e.g., 1.2.3
- After a tag: the version includes the commit distance and commit hash, e.g., 1.2.3.post2+g1234abc
- Dirty working tree: a .dirty suffix is added
Examples:
# Bump patch version (1.2.3 -> 1.2.4)
uv run bump-version
# Bump minor version (1.2.3 -> 1.3.0)
uv run bump-version minor
# Bump major version (1.2.3 -> 2.0.0)
uv run bump-version major
# Check current version
uv run python -m setuptools_scm
Requirements: git, uv, and a bump-my-version configuration in pyproject.toml.
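The tag arithmetic in the examples above can be summarized in a few lines. This sketch only mirrors the documented behavior (use START when no tag exists; otherwise increment one component, reset the lower ones, and prepend PREFIX). It is not bump-version's implementation, which delegates to bump-my-version and git, and next_tag is a hypothetical name.

```python
from typing import Optional


def next_tag(current: Optional[str], part: str = "patch",
             prefix: str = "v", start: str = "0.1.0") -> str:
    """Return the tag a bump would produce (illustration of the documented rules)."""
    if current is None:
        # No existing tag: the initial tag is PREFIX + START.
        return prefix + start
    # Strip the prefix and split into the three semantic version components.
    major, minor, patch = (int(x) for x in current.removeprefix(prefix).split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown version component: {part!r}")
    return f"{prefix}{major}.{minor}.{patch}"


print(next_tag("v1.2.3"), next_tag("v1.2.3", "minor"), next_tag("v1.2.3", "major"))
```

str.removeprefix requires Python 3.9 or later.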
deriva-ml-install-kernel
Install a Jupyter kernel for the current virtual environment. This allows Jupyter notebooks to use the DerivaML environment with all its dependencies.
Synopsis:
deriva-ml-install-kernel [--install-local]
Arguments:
| Argument | Description |
|---|---|
| --install-local | Install kernel to the venv's prefix directory instead of the user's Jupyter data directory |
The kernel name and display name are derived from the virtual environment's
prompt setting in pyvenv.cfg.
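pyvenv.cfg is a plain key = value file at the root of the virtual environment, so the prompt can be read without any Jupyter machinery. The sketch below reads it and builds names of the form shown in the Example Workflow output (kernel my-ml-project, display name Python (my-ml-project)). The exact naming rules deriva-ml-install-kernel applies, such as any sanitizing of characters, are an assumption, and read_prompt and kernel_names are hypothetical helpers.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def read_prompt(venv_dir: Path) -> Optional[str]:
    """Return the 'prompt' value from <venv_dir>/pyvenv.cfg, if present."""
    for line in (venv_dir / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "prompt":
            # Some tools (e.g. CPython's venv) write the prompt quoted.
            return value.strip().strip("'\"")
    return None


def kernel_names(venv_dir: Path) -> Tuple[str, str]:
    """Derive (kernel name, display name) from the venv prompt (assumed rule)."""
    prompt = read_prompt(venv_dir)
    if prompt is None:
        raise ValueError(f"no prompt set in {venv_dir / 'pyvenv.cfg'}")
    return prompt, f"Python ({prompt})"


# Demonstrate on a throwaway directory containing a minimal pyvenv.cfg.
with tempfile.TemporaryDirectory() as tmp:
    venv = Path(tmp)
    (venv / "pyvenv.cfg").write_text("home = /usr/bin\nprompt = my-ml-project\n")
    print(kernel_names(venv))
```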
Example Workflow:
# Create virtual environment with a name
uv venv --prompt my-ml-project
# Activate it
source .venv/bin/activate
# Install the Jupyter kernel
uv run deriva-ml-install-kernel
# Output: Installed Jupyter kernel 'my-ml-project' with display name 'Python (my-ml-project)'
# The kernel now appears in Jupyter's kernel selector
jupyter lab
Kernel location (default): ~/.local/share/jupyter/kernels/ (Linux), ~/Library/Jupyter/kernels/ (macOS),
or %APPDATA%\jupyter\kernels\ (Windows).
Data Operations
deriva-ml-split-dataset
Split a DerivaML dataset into training and testing subsets. Follows scikit-learn conventions for split parameters and supports stratified splitting.
Synopsis:
deriva-ml-split-dataset --hostname HOST --catalog-id ID --dataset-rid RID
[--domain-schema SCHEMA] [--test-size SIZE] [--train-size SIZE] [--seed SEED]
[--stratify-by-column COL] [--element-table TABLE]
[--include-tables TABLES] [--training-types TYPES]
[--testing-types TYPES] [--description DESC]
[--workflow-type TYPE] [--dry-run] [--show-urls]
[--no-shuffle]
Arguments:
| Argument | Default | Description |
|---|---|---|
| --hostname | (required) | Deriva server hostname |
| --catalog-id | (required) | Catalog ID to connect to |
| --dataset-rid | (required) | RID of the source dataset to split |
| --domain-schema | (auto-detected) | Domain schema name |
| --test-size | 0.2 | Test set size as a fraction (0-1) or an absolute count |
| --train-size | (complement of test size) | Train set size as a fraction (0-1) or an absolute count |
| --seed | 42 | Random seed for reproducibility |
| --no-shuffle | | Do not shuffle before splitting |
| --stratify-by-column | | Column name for stratified splitting (requires --include-tables) |
| --element-table | (auto-detected) | Element table to split (e.g., Image) |
| --include-tables | | Comma-separated tables for denormalization (required for stratified splitting) |
| --training-types | Labeled | Comma-separated dataset types for the training set |
| --testing-types | Labeled | Comma-separated dataset types for the testing set |
| --description | | Description for the parent split dataset |
| --workflow-type | Dataset_Split | Workflow type vocabulary term |
| --dry-run | | Print the split plan without modifying the catalog |
| --show-urls | | Show Chaise web interface URLs for created datasets |
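Under the scikit-learn convention, a float in (0, 1) is a fraction of the elements and an integer is an absolute count; when only one size is given, the other is its complement. The sketch below resolves the two options into element counts using scikit-learn's rounding (test count rounded up, train count rounded down) and the documented 0.2 default. Whether deriva-ml-split-dataset rounds exactly this way is an assumption, and resolve_split_sizes is a hypothetical name.

```python
import math
from typing import Optional, Tuple, Union

Size = Union[int, float]


def _count(size: Size, n: int, round_up: bool) -> int:
    """Turn a fractional or absolute size into an element count."""
    if isinstance(size, float):
        if not 0.0 < size < 1.0:
            raise ValueError(f"fractional size must be in (0, 1), got {size}")
        exact = size * n
        return math.ceil(exact) if round_up else math.floor(exact)
    if not 0 <= size <= n:
        raise ValueError(f"absolute size must be between 0 and {n}, got {size}")
    return size


def resolve_split_sizes(n: int, test_size: Optional[Size] = None,
                        train_size: Optional[Size] = None) -> Tuple[int, int]:
    """Return (n_train, n_test) for n elements, following scikit-learn conventions."""
    if test_size is None and train_size is None:
        test_size = 0.2  # documented default for --test-size
    n_test = _count(test_size, n, round_up=True) if test_size is not None else None
    n_train = _count(train_size, n, round_up=False) if train_size is not None else None
    if n_test is None:
        n_test = n - n_train  # complement of the training set
    if n_train is None:
        n_train = n - n_test  # complement of the test set
    if n_train + n_test > n:
        raise ValueError(f"train ({n_train}) + test ({n_test}) exceeds {n} elements")
    return n_train, n_test


print(resolve_split_sizes(500))                                 # random 80/20 split
print(resolve_split_sizes(500, test_size=100, train_size=400))  # fixed-count split
```

Rounding the test count up means small datasets never end up with an empty test set when a nonzero fraction is requested.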
Examples:
# Simple random 80/20 split
uv run deriva-ml-split-dataset --hostname localhost --catalog-id 9 \
--dataset-rid 28D0
# Stratified split by class label
uv run deriva-ml-split-dataset --hostname localhost --catalog-id 9 \
--dataset-rid 28D0 \
--stratify-by-column Image_Classification_Image_Class \
--include-tables Image,Image_Classification
# Fixed-count split
uv run deriva-ml-split-dataset --hostname localhost --catalog-id 9 \
--dataset-rid 28D0 --train-size 400 --test-size 100
# Dry run (show plan without modifying catalog)
uv run deriva-ml-split-dataset --hostname localhost --catalog-id 9 \
--dataset-rid 28D0 --dry-run
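Stratified splitting keeps each class's share roughly the same in both subsets. After denormalization with --include-tables, each element row carries the stratification column; the example column Image_Classification_Image_Class suggests a Table_Column naming pattern, but that is an inference. The sketch below shows the idea on plain dictionaries with per-class proportional allocation. It is a simplification (scikit-learn's StratifiedShuffleSplit distributes remainders differently), and stratified_split and the sample rows are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]


def stratified_split(rows: List[Row], column: str, test_size: float = 0.2,
                     seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Split rows so each value of `column` is represented proportionally."""
    by_class: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_class[row[column]].append(row)

    rng = random.Random(seed)  # seeded for reproducibility, like --seed
    train: List[Row] = []
    test: List[Row] = []
    for label in sorted(by_class):  # fixed class order keeps results deterministic
        members = list(by_class[label])
        rng.shuffle(members)
        # Round up per class so every class lands in the test set; the total
        # can therefore slightly exceed test_size * len(rows).
        n_test = math.ceil(len(members) * test_size)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


rows = [{"RID": f"IMG-{i}", "Image_Classification_Image_Class": label}
        for i, label in enumerate(["cat"] * 5 + ["dog"] * 10)]
train, test = stratified_split(rows, "Image_Classification_Image_Class")
print(len(train), len(test))
```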
create-demo-catalog
Create a demonstration catalog with sample data for testing and development.
Synopsis:
create-demo-catalog --host HOST [--domain-schema SCHEMA]
Arguments:
| Argument | Default | Description |
|---|---|---|
| --host | (required) | Deriva server hostname |
| --domain-schema | demo-schema | Name for the domain schema |
This command is primarily used for development and testing of DerivaML itself.
Catalog Administration
deriva-ml-create-schema
Create the DerivaML schema in a Deriva catalog. This is typically run once when setting up a new catalog for ML workflows.
Synopsis:
deriva-ml-create-schema HOSTNAME PROJECT_NAME [SCHEMA_NAME] --curie_prefix PREFIX
Arguments:
| Argument | Default | Description |
|---|---|---|
| HOSTNAME | (required) | Deriva server hostname |
| PROJECT_NAME | (required) | Project name for the catalog |
| SCHEMA_NAME | deriva-ml | Schema name |
| --curie_prefix | (required) | CURIE prefix for identifiers |
deriva-ml-check-catalog-schema
Validate a catalog's schema against the DerivaML reference schema. Reports any missing tables, columns, or configuration issues.
Synopsis:
deriva-ml-check-catalog-schema --host HOST [--catalog CATALOG] [--dump]
Arguments:
| Argument | Default | Description |
|---|---|---|
| --host | (required) | Deriva server hostname |
| --catalog | 1 | Catalog number |
| --dump | | Dump schema details |
deriva-ml-table-comments-utils
Update table and column comments in a catalog from file-based documentation. This is an administrative utility for maintaining schema documentation.
Synopsis:
deriva-ml-table-comments-utils --host HOST [--catalog CATALOG]
This command uses Deriva's BaseCLI for standard host/catalog arguments.