Release Notes
Migration notes — corrections (2026-06-03)
Documentation backfill for previously under-documented public-API breakages. The
changes below already shipped in earlier releases but were missing from (or only
partially covered by) these notes. They are recorded here because they break
external callers — notably domain subclasses of DerivaML such as EyeAI — and a
consumer migrating across these versions needs them.
| Change | Shipped in | Migration |
|---|---|---|
| `DatasetBag.denormalize_as_dataframe(...)` renamed to `DatasetBag.get_denormalized_as_dataframe(...)` | ~v1.30.6 (the denormalize sugar-method refactor; old names removed) | Rename the call. Arguments are unchanged: the positional `include_tables` list and the `include_tables=`, `row_per=`, `via=`, `selector=` keywords all carry over. The dict variant is likewise `get_denormalized_as_dict(...)`. There is no deprecation shim; the old name raises `AttributeError`. |
| Public `DerivaML.domain_path` property removed; replaced by the `_domain_path()` method | mixin refactor (path-builder accessors moved to `PathBuilderMixin`) | Change `self.domain_path.<Table>...` to `self._domain_path().<Table>...` (note the parentheses: it is now a method). `_domain_path(schema=None)` returns the path builder for the domain schema (defaulting to `default_schema`); pass a schema name for a non-default one. |
| `check_auth` removed, also from the `DerivaML.__init__` constructor, not only from `DerivaMLConfig` | v1.37.x auth refactor (see the existing 1.37.0 note) | The existing note says to drop `check_auth` from hydra-zen configs / `DerivaMLConfig`. It is also gone from the `DerivaML(...)` constructor signature: subclasses that accepted `check_auth` and forwarded it via `super().__init__(check_auth=...)` now raise `TypeError`. Remove the parameter and the forwarded keyword. |
| `DerivaML.__init__` gained a `mode` parameter (`ConnectionMode`, default `online`) | the mode-branched `__init__` change | Non-breaking addition. `mode=ConnectionMode.online` (the default) preserves prior behavior; `mode=ConnectionMode.offline` (or the string `"online"`/`"offline"`) stages writes locally. Subclasses overriding `__init__` may forward `mode` through to `super().__init__(...)` to expose offline mode. |
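
For code that has to run on both sides of the `denormalize_as_dataframe` rename, one option is a small lookup helper. The sketch below is not part of deriva-ml: `call_denormalize` is a hypothetical helper, the `Image` table name is made up, and `OldBag`/`NewBag` are stand-in classes that only mimic the two method names so the example runs without a catalog.

```python
from typing import Any


def call_denormalize(bag: Any, *args: Any, **kwargs: Any) -> Any:
    """Call the dataframe denormalizer under whichever name `bag` exposes.

    Hypothetical helper (not a deriva-ml API). Prefers the new name and
    falls back to the pre-rename one.
    """
    for name in ("get_denormalized_as_dataframe", "denormalize_as_dataframe"):
        method = getattr(bag, name, None)
        if callable(method):
            return method(*args, **kwargs)
    raise AttributeError("bag exposes neither denormalize method name")


# Stand-in classes, for illustration only; real code would pass a DatasetBag.
class OldBag:
    def denormalize_as_dataframe(self, include_tables, **kwargs):
        return ("old", tuple(include_tables), kwargs)


class NewBag:
    def get_denormalized_as_dataframe(self, include_tables, **kwargs):
        return ("new", tuple(include_tables), kwargs)


# Arguments are passed through unchanged, matching the migration note.
old_result = call_denormalize(OldBag(), ["Image"], row_per="Image")
new_result = call_denormalize(NewBag(), ["Image"], row_per="Image")
```

Once every environment is past the rename, calling `get_denormalized_as_dataframe(...)` directly is simpler than keeping a helper like this around.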
Version 1.39.0
Breaking-API change shipped as a minor bump. The four ways to upload execution outputs (`Execution.upload_execution_outputs`, `Execution.upload_outputs`, `ExecutionSnapshot.upload_outputs`, `DerivaML.upload_pending`) collapse into one per-execution method and one batch method. Callers of the removed methods must migrate at upgrade; there are no deprecation shims. The major-version bump (v2.0.0) is deferred until the unified surface has more bake time. See ADR-0009 for the rationale and the two latent bugs fixed.
Migration table:
| Old | New |
|---|---|
| `exe.upload_execution_outputs(clean_folder=, progress_callback=)` | `exe.commit_output_assets(clean_folder=, progress_callback=)` (returns `UploadReport` now, not `dict`) |
| `exe.upload_outputs(retry_failed=)` | `exe.commit_output_assets()` (`retry_failed` was a no-op; removed) |
| `snap.upload_outputs(ml=, retry_failed=)` | `ml.resume_execution(snap.rid).commit_output_assets()` |
| `ml.upload_pending(execution_rids=, retry_failed=)` | `ml.commit_pending_executions(execution_rids=, clean_folder=False)` |
| `deriva-ml-upload --retry-failed` | (removed; flag was a no-op) |
| `deriva-ml-upload` (default: no folder cleanup) | `deriva-ml-upload --clean` (explicit opt-in to clean working folder) |
Bugs fixed by the unification:
- CLI-uploaded executions now correctly transition to Uploaded status (were stuck Stopped).
- exe.upload_outputs() callers now get asset descriptions written and Upload_Duration recorded (were silently skipped).
Both bugs were present in v1.37.x but only reachable via the legacy methods that v1.39.0 removes.
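
Because there are no deprecation shims, it can help to locate legacy call sites before upgrading. The following is a minimal sketch of such a scan, using only the method names from the migration table; `find_legacy_upload_calls` is a hypothetical helper, not a deriva-ml tool, and the regex only catches plain attribute calls such as `exe.upload_outputs(`.

```python
import re

# Removed method name -> suggested replacement, per the migration table.
REPLACEMENTS = {
    "upload_execution_outputs": "commit_output_assets",
    "upload_outputs": "commit_output_assets (snapshots: ml.resume_execution(snap.rid) first)",
    "upload_pending": "commit_pending_executions",
}

# Matches `.<old_name>(` with optional whitespace before the parenthesis.
_LEGACY_CALL = re.compile(r"\.(" + "|".join(REPLACEMENTS) + r")\s*\(")


def find_legacy_upload_calls(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, old_name, suggestion) for each legacy call found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _LEGACY_CALL.finditer(line):
            old_name = match.group(1)
            hits.append((lineno, old_name, REPLACEMENTS[old_name]))
    return hits


sample = """\
report = exe.upload_execution_outputs(clean_folder=True)
exe.upload_outputs(retry_failed=True)
ml.upload_pending(execution_rids=rids)
exe.commit_output_assets()
"""

hits = find_legacy_upload_calls(sample)
for lineno, old_name, suggestion in hits:
    print(f"line {lineno}: {old_name} -> {suggestion}")
```

A text scan like this cannot tell an `Execution` from an `ExecutionSnapshot`, so each `upload_outputs` hit still needs a manual look to pick the right row of the table.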
Also in this release (post-v1.37.14, folded into v1.39):
- feat(asset,dataset): write-through description setters (#221, closes #70). `Asset.description` and `Dataset.description` now use a `@property`/`@setter` pair that persists assignments to the catalog row, mirroring the symmetric pattern already in place on `Workflow` and `ExecutionRecord`. `update_field_in_catalog` in `execution/_helpers.py` gained an optional `schema_name` parameter so the same helper now serves both ML-schema (Workflow / Execution / Dataset) and domain-schema (Asset) callers.
- fix(execution): Output_File directional tag + Execution_Execution exclusion (#220). `bag_commit._add_asset_rows_to_bag` now auto-adds the `Output_File` directional Asset_Type to every asset uploaded after an execution, which restores the public-API contract that every execution-linked asset carries an Input or Output role. `find_asset_execution_tables` excludes `Execution_Execution` (which is an execution-to-execution association, not an asset). Adds a "How execution-asset roles work" section to `docs/user-guide/executions.md` and a 5-test `test_asset_role_contract.py` regression suite.
- refactor(execution): public-API surface extracted to `asset_upload.py` (#219, audit P1 Ex-god, third sweep). `upload_execution_outputs` and several sibling methods moved out of `execution.py` into the asset-upload module. Brings the `execution.py` running total to ~600 LOC removed across the three Ex-god sweeps.
- docs(audits): 2026-05-22 audit status summary (#218). New `docs/audits/2026-05-22-audit-status.md` running ledger; tracks which P0/P1/P2/P3 items from the 9 audit reports have shipped vs. remain open.
- refactor(execution): bag_commit_upload + update_asset_execution_table extracted (#217, audit P1 Ex-god, second sweep). Second pass on the execution god-class: moves `_bag_commit_upload` and `update_asset_execution_table` out of `Execution` into `asset_upload.py`.
Version 1.37.14
- chore(execution): P1 sweep (part 1), 9 audit findings (#203). Correctness + coverage fixes across `execution/` subsystem; details in `docs/audits/2026-05-22-engineer-audit-execution.md`.
Version 1.37.13
- chore(model): P1 sweep, 10 audit findings (#202). Correctness + cleanup + coverage fixes across `model/` subsystem.
Version 1.37.12
- chore(asset): P1 sweep, 7 audit findings (#201). Correctness + cleanup + coverage fixes across `asset/` subsystem.
Version 1.37.11
- chore(catalog): P1 sweep, 6 audit findings (#200). Docs + correctness + coverage fixes across `catalog/` subsystem.
Version 1.37.10
- chore(dataset): P1 sweep, 2 correctness bugs + 1 cleanup (#199). Fixes a typo in `_version_snapshot_catalog` and rebuilds the `add_dataset_members` cycle check on a real graph walk; deletes two retired `DatasetBag` tombstones.
Version 1.37.9
- chore(schema): P1 sweep, 7 audit findings (#198). Threads the `schema` parameter through `generate_annotation`, sorts iteration, honours `use_hatrac`; backfills docstrings on 3 public functions; adds direct unit tests for `asset_annotation` and `generate_annotation`.
Version 1.37.8
- chore(feature): P1 sweep, 3 audit findings (#197). FK-classification coverage and docstring fixes.
Version 1.37.7
- chore(core): P1 sweep, 15 audit findings (#196). Sweep across `core/` covering exceptions, type hints, dead-code removal, and coverage.
Version 1.37.6
- fix-pack 2: eight documentation P0s (#195). Eight documentation-only P0s from the technical-writer audit — corrected examples, missing args, drifted references.
Version 1.37.5
- Six correctness P0 fixes from the 2026-05-22 pre-release audit (#193):
  1. `Workflow._github_url` now treats any non-empty `git status --porcelain` output as dirty, not just staged-modified files. This fixes silent provenance corruption for repos with unstaged, untracked, deleted, or renamed files.
  2. `FileSpec.create_filespecs` now reports each file's actual length under a directory walk (was reporting the parent directory's stat size).
  3. `DatasetBag.find_features(None)` no longer crashes with `TypeError`; now delegates to the deduped catalog walk.
  4. `FeatureRecord.select_majority_vote(column=None)` auto-detect works on the single-term-feature happy path (was crashing with `TypeError: 'set' object is not subscriptable`).
  5. `AssetSpecConfig` now mirrors `AssetSpec` field-for-field; the hydra-zen surface had been silently missing `asset_role`.
  6. Adds integration coverage for `localize_assets` (350-LoC public function, previously had zero end-to-end tests).
- Adds nine pre-release audit reports under `docs/audits/` documenting the 382 findings across the four lenses (test coverage, code duplication, logic clarity, docstring completeness) and a follow-on technical-writer audit.
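
The dirty-tree rule in the `Workflow._github_url` fix can be illustrated with a short sketch. This is not the deriva-ml implementation, only the general technique the note describes; `is_dirty` and `repo_is_dirty` are hypothetical names, and the sample strings follow git's short porcelain format (two status characters, a space, then the path).

```python
import subprocess


def is_dirty(porcelain_output: str) -> bool:
    """Treat any non-blank `git status --porcelain` output as a dirty tree."""
    return bool(porcelain_output.strip())


def repo_is_dirty(repo_path: str = ".") -> bool:
    """Run `git status --porcelain` in `repo_path` and classify the result."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. outside a repository
    )
    return is_dirty(result.stdout)


# Sample outputs covering the cases the fix mentions.
samples = {
    "staged": "M  src/train.py\n",
    "unstaged": " M src/train.py\n",
    "untracked": "?? notes.txt\n",
    "deleted": " D old_model.py\n",
    "renamed": "R  a.py -> b.py\n",
    "clean": "",
}
dirty_by_case = {name: is_dirty(text) for name, text in samples.items()}
```

A check that only looks for staged modifications would report the unstaged, untracked, deleted, and renamed cases as clean, which is the provenance gap the fix closes.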
Version 1.37.4
- New: `validate_config_file` and `bootstrap_config` config APIs (#192). Reusable validation surface plus a bootstrap helper that resolves the workspace `.config.yaml` if it exists.
Version 1.37.3
- `split_dataset` takes the caller's Execution (breaking, #191). The function no longer opens its own execution internally; callers must wrap the call in their own `ml.create_execution(...)` context and pass `exe` as the third positional argument. The CLI (`deriva-ml-split-dataset`) opens its execution on the caller's behalf and is unaffected. Rationale: deriva-ml never invents a workflow on the caller's behalf; the caller's workflow is the durable provenance record for the splitting decision. See `src/deriva_ml/dataset/split.py` module docstring for the canonical call shape.
- fix(local_db): honest `estimated_row_count` when `row_per` is downstream of anchor (#190).
Version 1.37.2
- fix(local_db): stateless `PagedFetcher`; consolidate denormalize design doc (#189).
Version 1.37.1
- fix(local_db): hydrate `PagedFetcher` dedup state from engine on first access (#188).
- Audit fix-pack: test-suite resync + cross-repo bug-fix workflow docs (#187). Resyncs 19 tests against the current deriva-py pin (drops removed `getCatalogSchema(refresh=True)` calls, replaces removed `BagDataSource(asset_localization=...)` kwarg, deletes stale `test_online_drift_warning`), migrates 3 denormalize tests to the new `row_per=` contract, and captures a new "Cross-repo bug fixes (deriva-py ↔ deriva-ml)" subsection in CLAUDE.md.
Version 1.37.0
- fix(schema): apply `acl_config` AFTER `create_ml_schema` (#186). Production bug: every catalog created by deriva-ml since this ordering took shape had been silently missing `row_owner_guard` on its deriva-ml tables. Effect: non-curator users hit HTTP 403 the first time they PATCHed `Execution_Metadata` (asset uploads do this). Masked because tests run as catalog-owner with full ACLs.
- Typed exceptions for `find_association` failure modes (breaking for `except DerivaMLException:` catchers, #180). New: `DerivaMLAssociationAmbiguous` and `DerivaMLAssociationNotFound` replace bare `DerivaMLException` raises in `Table.find_associations`. Existing `except DerivaMLException:` blocks still work; narrower handlers should switch to the new types.
- fix(execution): warn before `download_asset` overwrites different bytes (#184/#181).
- fix(execution): silence misleading dry-run exit warning (#183/#177).
- fix(execution): move upload staging out of cache root (#182/#178). Staging now lives at `<workspace>/staging/<execution_rid>/`, not in the cache root, which prevents stale staging files from leaking across runs.
- fix(execution): RID-key the per-asset download path (#179).
- fix(denormalize): feature-assoc tables are transparent (#176/#174). Denormalization now treats feature association tables (`Execution_*`) as transparent links rather than terminal sinks. Closes the "feature columns missing from denormalized output" class of bugs.
- fix: `SchemaCache` race when two processes write concurrently (#175/#173).
- fix(execution): `list_execution_children/parents` propagate duration fields (#172).
- fix(feature): `select_by_execution` returns `None` on no match, not raise (#171). Mirrors `select_by_workflow`'s no-match semantics; `feature_values` silently skips target rows whose record group has no match rather than aborting the whole query.
- feat(dataset): adapters always surface element RID in return tuple (#169). Both `as_torch_dataset` and `as_tf_dataset` now yield `(image, target, rid)` rather than `(image, target)`. Lets training loops correlate per-batch predictions back to dataset rows.
- fix(dataset): adapters resolve asset paths against canonical BDBag layout (#167). Plus a new "RID opacity rule" added to CLAUDE.md: a RID's only valid operation is equality comparison; never parse, slice, regex, or `startswith` on it, never compare across catalogs, never sort client-side by RID.
- docs(reference): backfill `schema.md` to match `create_schema.py` (#170). The reference is now described as canonical per `reference/README.md`.
- fix(execution): drop unsupported `upload_timeout`/`upload_chunk_size` from `run_model` (#168).
- fix(execution): `__exit__` writes `Duration`; split into three phase columns (#166). Execution duration is now broken into `Download_Duration`, `Execution_Duration`, `Upload_Duration` so the three phases are independently observable in catalog rows.
- refactor(auth): drop `check_auth` and the `get_authn_session` probe. The `check_auth` keyword no longer exists on `DerivaMLConfig`; remove it from any hydra-zen configs.
- B1 + B2: simplify schema cache; fix `find_features` duplicates. `DerivaML._init_online` now delegates schema-freshness entirely to deriva-py's `ErmrestCatalog` (ETag revalidation + auto-invalidation on schema mutations through the same instance). `find_features()` no longer returns each feature multiple times when its association table has FKs to multiple tables.
- T1: route all schema reads through deriva-py's `getCatalogSchema`. Removes deriva-ml's local schema cache plumbing; the binding-layer cache is now the single source.
- docs(api): document `find_*`/`list_*` method naming convention (#163). The convention is now load-bearing: `find_*` is schema-introspection/discovery with filtering; `list_*` is straightforward enumeration of "what's there" inside a known scope.
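
The adapter change in #169 means any loop that unpacked two values per element now needs to unpack three. The sketch below shows the shape of that change and one use of the extra element. It makes several assumptions: `fake_adapter` is a stand-in generator (real adapters yield framework tensors and may batch), `predict` is a toy classifier, and the RID strings are invented and used only as opaque keys.

```python
from typing import Iterable, Iterator

# One element as described in the note: (image, target, rid).
Sample = tuple[list[float], int, str]


def fake_adapter() -> Iterator[Sample]:
    """Stand-in for a dataset adapter; yields (image, target, rid) triples."""
    yield [0.1, 0.9], 1, "1-ABC"
    yield [0.8, 0.2], 0, "1-ABD"
    yield [0.3, 0.7], 0, "1-ABE"


def predict(image: list[float]) -> int:
    """Toy classifier: returns the index of the larger of two values."""
    return 0 if image[0] >= image[1] else 1


def misclassified_rids(samples: Iterable[Sample]) -> list[str]:
    """Collect the RIDs of rows whose prediction differs from the target."""
    wrong = []
    for image, target, rid in samples:  # three-way unpack, previously two
        if predict(image) != target:
            wrong.append(rid)
    return wrong


bad_rids = misclassified_rids(fake_adapter())
```

The RIDs are only collected and compared for equality here, in line with the RID opacity rule noted in the same release.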
Version 1.36.5
- fix(execution): auto-tag input/output assets with `Input_File`/`Output_File` (#165). Every asset linked to an execution is now tagged with `Asset_Role` automatically based on the direction of the link. This fixes a regression in input-side tagging that had been silently broken since the previous lifecycle refactor.
- fix(schema): remove domain-specific terms from default `Workflow_Type` (E5, #164). The bootstrap vocabulary set no longer seeds opinionated domain-specific terms like `VGG19`, `RETFound`, `Multimodal`, or `Embedding`; those belong to downstream projects, not the framework.
Version 1.36.4
- Docs: Phase 3 audit of `core/` subsystem and follow-on cleanup (#153, #154). 10 cleanup actions across `core/`.
Version 1.36.3
- refactor(schema): Phase 3 audit cleanup (#158). 11 commits / 13 actions; −11.6K LoC. Stripped legacy schema-generation paths now superseded by the bag pipeline. No user-facing API surface change.
Version 1.36.2
- refactor(catalog): Phase 3 audit cleanup (#156). 10 commits / 13 actions plus provenance wiring on the clone path.
Version 1.36.1
- refactor(core): Phase 3 audit cleanup (#154). 10 commits / 10 actions; tightens core module structure ahead of the next round of feature work.
Version 1.36.0
- Bag-pipeline migration complete (#111, #109, #110). The catalog clone path now routes entirely through deriva-py's bag pipeline (`CatalogBagBuilder`, `BagCatalogLoader`); the legacy upload path is retired. Net: −1479 LoC across deriva-ml.
- feat(execution): bag-based `commit_execution` (per-execution path) (#103). Plus URL-dedup bag commit and the deletion of the legacy upload path (#104).
- feat(clone-via-bag): terminal-tables for Execution/Workflow + policy-default merging (#101).
- feat(catalog): nested-Dataset anchor expansion + vocab FULL default + schema-clone fixture (#100).
- fix(dataset): `DatasetMinid` stops constructing `rid@None` when snapshot absent (#106).
- refactor(execution): lean on upstream bag helpers, drop ~330 LoC (#107). Plus #108 Phase-1 cleanup (~1479 LoC removed across dataset-bag cutover and logger consolidation).
- Phase 2 cleanup sprint (Steps 1–10) (#109) and Phase 2 polish (Steps 13–16) (#110): docstrings, naming, ADR-0007 (annotation-builders public API contract), apply-annotation guards.
Version 1.35.0
- feat(bag-migration): migrate deriva-ml onto `deriva.bag` (#96). Foundation for the bag-pipeline arc completed in v1.36.0.
- ADR-0006 + CONTEXT.md additions for bag-oriented data movement (#95).
- refactor(dataset): rewire callers from `CatalogGraph` to `DatasetBagBuilder` (#97).
- test(catalog): live-catalog integration tests for `clone_via_bag` (xfail-marked) (#98).
- fix(local_db): turn off SQLite FK enforcement during `_populate_from_catalog` (#99).
- fix(schema): regenerate `deriva-ml-reference.json` from fresh catalog dump (closes #83, #92).
Version 1.34.0
- Dataset dev versioning (breaking). Datasets now use a two-state versioning model: released versions (citable, snapshot-pinned) and dev versions (mutable, between-release labels of the form `<release>.post1.devN`). Every mutation lands on a dev version; `Dataset.release()` is the only path to a released version. See `docs/adr/0003-dataset-dev-versioning-model.md` and `docs/user-guide/migration.md` for the full migration story.
- `increment_dataset_version` renamed to `release` (breaking). New signature: `Dataset.release(bump, description, execution=None)`. The old method is preserved as a private `_increment_dataset_version` for system-internal use only (e.g., catalog clone reinitialization).
- `add_dataset_members`, `delete_dataset_members`, `add_dataset_type`, `add_dataset_types`, `remove_dataset_type` now land on dev (breaking). Each call advances `.devN` rather than producing a released version. To mint a release after mutations, call `dataset.release(...)`.
- `DatasetVersion` rebased on PEP 440 (breaking for some equality assertions). The wire format for released versions is unchanged (`"0.4.0"`); dev labels use PEP 440 post-release form (`"0.4.0.post1.dev1"`). String equality (`current_version == "1.0.0"`) no longer works; coerce explicitly: `str(current_version) == "1.0.0"`.
- New: `Dataset.mark_dev`, `is_dirty`, `release_diff`, `compare_versions`. Mark drift explicitly, detect whether the catalog has drifted since the last release, and compare any two versions.
- Documentation: new ADRs (0003 dev-versioning model, 0004 PEP 440 vocabulary, 0005 delivery sequence) and a CONTEXT.md vocabulary file.
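
The two label shapes and the string-equality caveat above can be sketched without a catalog. Everything here is illustrative: `classify_label` is a hypothetical helper that assumes three-part numeric releases, and `StandInVersion` is a stand-in for any version object that is not a `str`; neither reflects how `DatasetVersion` is implemented.

```python
import re
from typing import Optional

# Released: "0.4.0". Dev: "<release>.post1.devN", e.g. "0.4.0.post1.dev1".
_RELEASED = re.compile(r"^\d+\.\d+\.\d+$")
_DEV = re.compile(r"^(\d+\.\d+\.\d+)\.post1\.dev(\d+)$")


def classify_label(label: str) -> tuple[str, str, Optional[int]]:
    """Return (state, release, dev_number) for a version label string."""
    if _RELEASED.match(label):
        return ("released", label, None)
    dev = _DEV.match(label)
    if dev:
        return ("dev", dev.group(1), int(dev.group(2)))
    raise ValueError(f"unrecognized version label: {label!r}")


class StandInVersion:
    """Hypothetical non-str version object, for illustration only."""

    def __init__(self, text: str) -> None:
        self._text = text

    def __str__(self) -> str:
        return self._text

    def __eq__(self, other: object) -> bool:
        if isinstance(other, StandInVersion):
            return self._text == other._text
        return NotImplemented  # comparing with a plain str falls through


released = classify_label("0.4.0")
dev = classify_label("0.4.0.post1.dev1")

current_version = StandInVersion("1.0.0")
direct_equal = current_version == "1.0.0"        # object vs. str
coerced_equal = str(current_version) == "1.0.0"  # explicit coercion
```

Test assertions written as `assert ds.current_version == "1.0.0"` are the typical place this shows up; wrapping the left side in `str(...)` is the fix the note recommends.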
Version 1.2.0
- Dataset versioning with semantic versioning. Note that the current dataset version does NOT have the current catalog values, but rather the values at the time the dataset was created. To get the current values you must increment the dataset version number. Please consult online documentation for more information on dataset and versioning.
- Streamlined create_execution. All datasets are now downloaded automatically, and an instance variable holds the resulting databag classes. You no longer need to explicitly create dataset_bdbag.
- Significant performance improvement on cached dataset access and initial download
- Automatic creation of MINID for every dataset download
- Added method to restore an existing execution from local disk.
Version 1.1.4
- Fixed error when creating DatasetBag on Windows platform.
Version 1.1.1
- Removed restriction on nested datasets so that now any level of nesting can be accommodated.
- Fixed bug in nested dataset download.
- Added additional methods to DatasetBag to make it easier to explore datasets.
- Added `datasets` instance variable to Execution object which has Dataset objects for all of the datasets listed in the configuration.
- Added option to DatasetBag init to provide a dataset RID or a path. If the dataset has already been loaded, or the dataset is nested, this will return the associated DatasetBag object.