Notebook Configuration
Notebooks use a slightly different pattern from model functions: instead of
zen_partial model configs, they use notebook_config() and run_notebook().
For the model function pattern, see Configuration Groups — Model Configuration.
Defining a Notebook Configuration
For simple notebooks that only use the standard fields (datasets, assets, etc.):
# configs/my_analysis.py
from deriva_ml.execution import notebook_config
notebook_config(
"my_analysis",
defaults={"assets": "comparison_weights", "datasets": "training_data"},
)
For notebooks with custom parameters:
# configs/my_analysis.py
from dataclasses import dataclass
from deriva_ml.execution import BaseConfig, notebook_config
@dataclass
class MyAnalysisConfig(BaseConfig):
threshold: float = 0.5
num_samples: int = 100
notebook_config(
"my_analysis",
config_class=MyAnalysisConfig,
defaults={"assets": "comparison_weights"},
)
Using run_notebook() in the Notebook
In the first code cell of your notebook:
from deriva_ml.execution import run_notebook
ml, execution, config = run_notebook("my_analysis")
After that, you can use ml, execution, and config throughout the notebook:
# Access resolved config values
print(config.threshold) # 0.5
# Access downloaded assets
for table, paths in execution.asset_paths.items():
for path in paths:
print(f"Asset: {path}")
# At the end of the notebook
execution.upload_execution_outputs()
What run_notebook() Does Internally
That single call handles a multi-step initialization sequence:
-
Loads all config modules from the config package (default:
"configs"). This imports every Python module undersrc/configs/, which registers their hydra-zen configurations with the store as a side effect. Theexperimentsmodule is loaded last since it typically depends on other configs. -
Resolves Hydra configuration via
get_notebook_configuration(). This launches hydra-zen to compose the full config from defaults, config groups, and any overrides (including overrides passed through theDERIVA_ML_HYDRA_OVERRIDESenvironment variable when running from the CLI). The result is a fully instantiated config object with all defaults merged. -
Creates a DerivaML instance connected to the catalog specified by
config.deriva_ml.hostnameandconfig.deriva_ml.catalog_id. If Hydra captured a runtime output directory during config resolution, it is passed along so the Hydra YAML files can be uploaded with the execution later. -
Validates all RIDs referenced in the config (datasets and assets) exist in the catalog. If any RID is missing, a
DerivaMLExceptionis raised with details. Warnings (e.g., datasets missing descriptions) are logged but do not block execution. -
Creates a workflow using
ml.create_workflow(). The workflow name defaults to a title-cased version of the config name (e.g.,"roc_analysis"becomes"Roc Analysis"). The workflow type defaults to"Analysis Notebook". -
Creates an ExecutionConfiguration from the resolved config, bundling the workflow, datasets, assets, and description.
-
Creates and starts an Execution context. This creates the execution record in the catalog and downloads all input datasets and assets to local storage. The downloaded assets are accessible through
execution.asset_paths. -
Returns the tuple
(ml, execution, config).
Notebook Responsibilities
The notebook is responsible for using the returned objects to do its work and for cleaning up when finished:
-
Use the execution context for provenance. Any outputs (plots, CSVs, model weights) should be registered with
execution.asset_file_path()so they are tracked as execution assets. -
Call
execution.upload_execution_outputs()at the end. This uploads all registered output files to the catalog. If you skip this call, your outputs will not be recorded. -
The execution context is not automatically cleaned up. Unlike
run_model()which manages the full lifecycle,run_notebook()leaves the execution open for the notebook to use interactively. The notebook must explicitly upload outputs when done.
Running with deriva-ml-run-notebook
The CLI wraps notebook execution with additional infrastructure that goes beyond
what run_notebook() provides inside the notebook itself:
-
Executes the notebook with papermill. The notebook is run non-interactively as a subprocess, with parameter injection from
-pflags or config files. -
Sets environment variables for workflow provenance. Before execution, the CLI sets
DERIVA_ML_WORKFLOW_URL(the Git source URL for the notebook),DERIVA_ML_WORKFLOW_CHECKSUM(MD5 of the notebook file),DERIVA_ML_NOTEBOOK_PATH(local filesystem path), andDERIVA_ML_SAVE_EXECUTION_RID(path where the notebook should write its execution metadata). These allow the CLI to reconnect to the execution after the notebook finishes. -
Forwards Hydra overrides. Positional arguments after the notebook path (e.g.,
assets=different_weights) are serialized to theDERIVA_ML_HYDRA_OVERRIDESenvironment variable as JSON, whererun_notebook()picks them up during config resolution. -
Handles kernel detection. The CLI searches installed Jupyter kernels for one whose Python executable matches the current virtual environment. You can also specify a kernel explicitly with
--kernel. -
Converts the executed notebook to Markdown. After execution, the CLI converts the
.ipynbto.mdwith images embedded as base64 data URIs and DataFrame tables converted to Markdown format, producing a self-contained archival document. -
Uploads notebook outputs as execution assets. Both the executed
.ipynband the.mdconversion are uploaded to the catalog as execution assets with typenotebook_output. This happens on top of any outputs the notebook itself uploaded viaexecution.upload_execution_outputs().
Basic Usage
# Run with default configuration
uv run deriva-ml-run-notebook notebooks/my_analysis.ipynb
# Override Hydra config groups (positional overrides)
uv run deriva-ml-run-notebook notebooks/my_analysis.ipynb \
assets=different_weights deriva_ml=production
Hydra overrides are passed as positional arguments after the notebook path. These
are forwarded to the notebook via the DERIVA_ML_HYDRA_OVERRIDES environment
variable, where run_notebook() picks them up automatically.
Parameter Injection
You can also inject values directly into a notebook's parameter cell (tagged with
parameters in Jupyter) using papermill:
# Inject individual parameters
uv run deriva-ml-run-notebook notebooks/train.ipynb \
-p learning_rate 0.001 -p epochs 50
# Load parameters from a file
uv run deriva-ml-run-notebook notebooks/train.ipynb --file params.yaml
Inspecting Available Options
# Show notebook parameters (from the papermill parameters cell)
uv run deriva-ml-run-notebook notebooks/train.ipynb --inspect
# Show available Hydra config groups
uv run deriva-ml-run-notebook notebooks/my_analysis.ipynb --info
Kernel Selection
The CLI automatically detects the Jupyter kernel for the current virtual environment. If auto-detection fails or you want to use a different kernel:
uv run deriva-ml-run-notebook notebooks/train.ipynb --kernel my-ml-project
If no kernel is installed for your environment, use
deriva-ml-install-kernel.
Output Handling
After execution, deriva-ml-run-notebook:
- Converts the executed notebook to Markdown (with images embedded as base64 data URIs and DataFrame tables converted to Markdown format)
- Uploads both the
.ipynband.mdfiles to the catalog as execution assets with typenotebook_output - Prints the execution URL for easy access
Streaming Output
To see cell outputs in your terminal as the notebook runs:
uv run deriva-ml-run-notebook notebooks/train.ipynb --log-output
See Also
- Configuration Groups — The five standard config groups
- Hydra-zen Configuration Overview — Configuration class reference
- CLI Reference — Full CLI documentation