Exceptions

DerivaML defines custom exceptions to provide clear error messages for common error conditions when working with catalogs, datasets, and executions.

Custom exceptions for the DerivaML package.

This module defines the exception hierarchy for DerivaML. All DerivaML-specific exceptions inherit from DerivaMLException, making it easy to catch all library errors with a single except clause.

Exception Hierarchy

DerivaMLException (base class for all DerivaML errors)
│
├── DerivaMLConfigurationError (configuration and initialization)
│   ├── DerivaMLSchemaError (schema/catalog structure issues)
│   ├── DerivaMLSchemaPinned (refresh_schema() called on a pinned cache)
│   ├── DerivaMLAuthenticationError (authentication failures)
│   ├── DerivaMLOfflineError (online-only operation in offline mode)
│   └── DerivaMLNoExecutionContext (write attempted on read-only handle)
│
├── DerivaMLDataError (data access and validation)
│   ├── DerivaMLNotFoundError (entity not found)
│   │   ├── DerivaMLDatasetNotFound (dataset lookup failures)
│   │   ├── DerivaMLTableNotFound (table lookup failures)
│   │   └── DerivaMLInvalidTerm (vocabulary term not found)
│   ├── DerivaMLTableTypeError (wrong table type)
│   ├── DerivaMLValidationError (data validation failures)
│   │   └── DerivaMLMaterializeLimitExceeded (result set exceeds materialize_limit)
│   ├── DerivaMLCycleError (cycle detected in relationships)
│   └── DerivaMLStateInconsistency (SQLite/catalog state disagreement)
│
├── DerivaMLExecutionError (execution lifecycle)
│   ├── DerivaMLWorkflowError (workflow issues)
│   │   └── DerivaMLDirtyWorkflowError (uncommitted changes)
│   └── DerivaMLUploadError (asset upload failures)
│
├── DerivaMLReadOnlyError (write operation on read-only resource)
│
└── DerivaMLDenormalizeError (denormalization planning errors)
    ├── DerivaMLDenormalizeMultiLeaf
    ├── DerivaMLDenormalizeNoSink
    ├── DerivaMLDenormalizeDownstreamLeaf
    ├── DerivaMLDenormalizeAmbiguousPath
    └── DerivaMLDenormalizeUnrelatedAnchor

Example

>>> from deriva_ml.core.exceptions import (
...     DerivaMLDatasetNotFound,
...     DerivaMLException,
...     DerivaMLNotFoundError,
... )
>>> try:  # doctest: +SKIP
...     dataset = ml.lookup_dataset("invalid_rid")
... except DerivaMLDatasetNotFound as e:
...     print(f"Dataset not found: {e}")
... except DerivaMLNotFoundError as e:
...     print(f"Entity not found: {e}")
... except DerivaMLException as e:
...     print(f"DerivaML error: {e}")
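Because Python runs the first except clause whose class matches, the handlers in the example must go from most to least specific. The sketch below checks this using local stand-in classes that mirror a slice of the documented hierarchy (they are not imported from deriva_ml, so the snippet runs on its own); classify is a hypothetical helper written only for this illustration.

```python
# Local stand-ins mirroring part of the documented hierarchy.
# The real classes live in deriva_ml.core.exceptions.
class DerivaMLException(Exception):
    pass


class DerivaMLDataError(DerivaMLException):
    pass


class DerivaMLNotFoundError(DerivaMLDataError):
    pass


class DerivaMLDatasetNotFound(DerivaMLNotFoundError):
    pass


def classify(exc: Exception) -> str:
    """Raise exc and report which handler caught it (most specific first)."""
    try:
        raise exc
    except DerivaMLDatasetNotFound:
        return "dataset"
    except DerivaMLNotFoundError:
        return "not-found"
    except DerivaMLException:
        return "derivaml"
    # Exceptions outside the hierarchy (e.g. ValueError) propagate unchanged.


print(classify(DerivaMLDatasetNotFound("1-ABC")))  # dataset
print(classify(DerivaMLNotFoundError("1-ABC")))    # not-found
print(classify(DerivaMLDataError("bad format")))   # derivaml
```

Putting `except DerivaMLException` first would make the two narrower handlers unreachable, since every subclass instance would match it.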

DerivaMLAuthenticationError

Bases: DerivaMLConfigurationError

Exception raised for authentication failures.

Raised when authentication with the catalog fails or credentials are invalid.

Example

raise DerivaMLAuthenticationError("Failed to authenticate with catalog") # doctest: +SKIP

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLAuthenticationError(DerivaMLConfigurationError):
    """Exception raised for authentication failures.

    Raised when authentication with the catalog fails or credentials are invalid.

    Example:
        >>> raise DerivaMLAuthenticationError("Failed to authenticate with catalog")  # doctest: +SKIP
    """

    pass

DerivaMLConfigurationError

Bases: DerivaMLException

Exception raised for configuration and initialization errors.

Raised when there are issues with DerivaML configuration, catalog initialization, or schema setup.

Example

raise DerivaMLConfigurationError("Invalid catalog configuration") # doctest: +SKIP

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLConfigurationError(DerivaMLException):
    """Exception raised for configuration and initialization errors.

    Raised when there are issues with DerivaML configuration, catalog
    initialization, or schema setup.

    Example:
        >>> raise DerivaMLConfigurationError("Invalid catalog configuration")  # doctest: +SKIP
    """

    pass

DerivaMLCycleError

Bases: DerivaMLDataError

Exception raised when a cycle is detected in relationships.

Raised when creating dataset hierarchies or other relationships that would result in a circular dependency.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| cycle_nodes | list[str] | List of nodes involved in the cycle. | required |
| msg | str | Additional context. | 'Cycle detected' |

Example

raise DerivaMLCycleError(["Dataset1", "Dataset2", "Dataset1"]) # doctest: +SKIP

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLCycleError(DerivaMLDataError):
    """Exception raised when a cycle is detected in relationships.

    Raised when creating dataset hierarchies or other relationships that
    would result in a circular dependency.

    Args:
        cycle_nodes: List of nodes involved in the cycle.
        msg: Additional context. Defaults to "Cycle detected".

    Example:
        >>> raise DerivaMLCycleError(["Dataset1", "Dataset2", "Dataset1"])  # doctest: +SKIP
    """

    def __init__(self, cycle_nodes: list[str], msg: str = "Cycle detected") -> None:
        super().__init__(f"{msg}: {cycle_nodes}")
        self.cycle_nodes = cycle_nodes
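One way to produce a cycle_nodes list like the one in the example is to search, before nesting a child dataset under a parent, for an existing path from the child back to the parent. The sketch below assumes a plain parent-to-children dict; add_child and find_path are hypothetical helpers, not DerivaML's implementation, and the exception class is a local stand-in with the documented constructor.

```python
from __future__ import annotations


class DerivaMLCycleError(Exception):
    """Stand-in with the same constructor and message as the documented class."""

    def __init__(self, cycle_nodes: list[str], msg: str = "Cycle detected") -> None:
        super().__init__(f"{msg}: {cycle_nodes}")
        self.cycle_nodes = cycle_nodes


def find_path(children: dict[str, list[str]], start: str, goal: str) -> list[str] | None:
    """Depth-first search for a path start -> ... -> goal along child links."""
    stack = [(start, [start])]
    seen: set[str] = set()
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for child in children.get(node, []):
            stack.append((child, path + [child]))
    return None


def add_child(children: dict[str, list[str]], parent: str, child: str) -> None:
    """Nest child under parent, refusing any link that would close a cycle."""
    # If parent is already reachable from child, the new edge parent -> child
    # would complete the loop child -> ... -> parent -> child.
    path = find_path(children, child, parent)
    if path is not None:
        raise DerivaMLCycleError(path + [child])
    children.setdefault(parent, []).append(child)


hierarchy: dict[str, list[str]] = {}
add_child(hierarchy, "Dataset1", "Dataset2")
try:
    add_child(hierarchy, "Dataset2", "Dataset1")
except DerivaMLCycleError as e:
    caught = e.cycle_nodes
    print(e)  # Cycle detected: ['Dataset1', 'Dataset2', 'Dataset1']
```

The rejected link is never stored, so the hierarchy stays acyclic after the error.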

DerivaMLDataError

Bases: DerivaMLException

Exception raised for data access and validation issues.

Base class for errors related to data lookup, validation, and integrity.

Example

raise DerivaMLDataError("Invalid data format") # doctest: +SKIP

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLDataError(DerivaMLException):
    """Exception raised for data access and validation issues.

    Base class for errors related to data lookup, validation, and integrity.

    Example:
        >>> raise DerivaMLDataError("Invalid data format")  # doctest: +SKIP
    """

    pass

DerivaMLDatasetNotFound

Bases: DerivaMLNotFoundError

Exception raised when a dataset cannot be found.

Raised when attempting to look up a dataset that doesn't exist in the catalog or downloaded bag.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dataset_rid | str | The RID of the dataset that was not found. | required |
| msg | str | Additional context. | 'Dataset not found' |

Example

>>> raise DerivaMLDatasetNotFound("1-ABC")  # doctest: +SKIP
DerivaMLDatasetNotFound: Dataset not found: 1-ABC

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLDatasetNotFound(DerivaMLNotFoundError):
    """Exception raised when a dataset cannot be found.

    Raised when attempting to look up a dataset that doesn't exist in the
    catalog or downloaded bag.

    Args:
        dataset_rid: The RID of the dataset that was not found.
        msg: Additional context. Defaults to "Dataset not found".

    Example:
        >>> raise DerivaMLDatasetNotFound("1-ABC")  # doctest: +SKIP
        DerivaMLDatasetNotFound: Dataset not found: 1-ABC
    """

    def __init__(self, dataset_rid: str, msg: str = "Dataset not found") -> None:
        super().__init__(f"{msg}: {dataset_rid}")
        self.dataset_rid = dataset_rid
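Given the constructor above, the message is formatted as msg, a colon, then the RID ("Dataset not found: 1-ABC"). A lookup layer can translate a low-level miss into this exception while keeping the original error chained. In this sketch the exception is a local stand-in with the documented constructor, and DATASETS and lookup_dataset are hypothetical (they are not ml.lookup_dataset).

```python
class DerivaMLDatasetNotFound(Exception):
    """Stand-in reproducing the documented constructor and message format."""

    def __init__(self, dataset_rid: str, msg: str = "Dataset not found") -> None:
        super().__init__(f"{msg}: {dataset_rid}")
        self.dataset_rid = dataset_rid


# Hypothetical in-memory index standing in for a catalog or downloaded bag.
DATASETS = {"1-XYZ": {"Description": "training images"}}


def lookup_dataset(rid: str) -> dict:
    try:
        return DATASETS[rid]
    except KeyError as err:
        # "from err" keeps the KeyError reachable as __cause__ for debugging.
        raise DerivaMLDatasetNotFound(rid) from err


try:
    lookup_dataset("1-ABC")
except DerivaMLDatasetNotFound as e:
    message, rid, cause = str(e), e.dataset_rid, e.__cause__
    print(message)  # Dataset not found: 1-ABC
```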

DerivaMLDenormalizeAmbiguousPath

Bases: DerivaMLDenormalizeError

Multiple FK paths between two requested tables — can't silently choose.

Raised when Rule 6 detects two or more distinct FK paths between row_per and another requested / via table. Silent path selection is rejected by design — the result shape would be materially different depending on which path is chosen, and callers should be explicit. Disambiguate by adding intermediates to include_tables (their columns are included) or to via= (path-only, columns excluded).

Attributes:

| Name | Type | Description |
|---|---|---|
| from_table | str | The row_per table name (the "anchor" of the ambiguity). |
| to_table | str | The requested table with multiple paths. |
| paths | list[list[str]] | List of path descriptions; each is a list of table names from from_table to to_table. |
| suggested_intermediates | list[str] | Tables that appear in at least one path but not in include_tables; any of these could be named in include_tables or via to force a choice. |

Example

>>> try:  # doctest: +SKIP
...     d.as_dataframe(["Image", "Subject"])  # diamond schema
... except DerivaMLDenormalizeAmbiguousPath as e:
...     for p in e.paths:
...         print(" → ".join(p))
...     # Retry routing explicitly through Observation:
...     df = d.as_dataframe(
...         ["Image", "Subject"], via=e.suggested_intermediates[:1]
...     )

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLDenormalizeAmbiguousPath(DerivaMLDenormalizeError):
    """Multiple FK paths between two requested tables — can't silently choose.

    Raised when Rule 6 detects two or more distinct FK paths between
    ``row_per`` and another requested / via table. Silent path selection
    is rejected by design — the result shape would be materially
    different depending on which path is chosen, and callers should be
    explicit. Disambiguate by adding intermediates to ``include_tables``
    (their columns are included) or to ``via=`` (path-only, columns
    excluded).

    Attributes:
        from_table: the ``row_per`` table name (the "anchor" of the
            ambiguity).
        to_table: the requested table with multiple paths.
        paths: list of path descriptions — each is a list of table
            names from ``from_table`` to ``to_table``.
        suggested_intermediates: tables that appear in at least one
            path but not in ``include_tables`` — any of these could be
            named in ``include_tables`` or ``via`` to force a choice.

    Example:
        >>> try:  # doctest: +SKIP
        ...     d.as_dataframe(["Image", "Subject"])  # diamond schema
        ... except DerivaMLDenormalizeAmbiguousPath as e:
        ...     for p in e.paths:
        ...         print(" → ".join(p))
        ...     # Retry routing explicitly through Observation:
        ...     df = d.as_dataframe(
        ...         ["Image", "Subject"], via=e.suggested_intermediates[:1]
        ...     )
    """

    def __init__(
        self,
        from_table: str,
        to_table: str,
        paths: list[list[str]],
        suggested_intermediates: list[str],
    ) -> None:
        self.from_table = from_table
        self.to_table = to_table
        self.paths = [list(p) for p in paths]
        self.suggested_intermediates = list(suggested_intermediates)
        path_strs = ["\n    " + " → ".join(p) for p in paths]
        super().__init__(
            f"Multiple FK paths between {from_table!r} and {to_table!r}:"
            f"{''.join(path_strs)}\n"
            f"Resolve by one of:\n"
            f"  • Add an intermediate to include_tables "
            f"(its columns will be in output): {suggested_intermediates}\n"
            f"  • Add an intermediate to via= (path-only, no columns): "
            f"{suggested_intermediates}\n"
            f"  • Narrow include_tables so only one path is valid."
        )
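To see why silently picking a path would change the result, the sketch below enumerates the simple FK paths between Image and Subject in a hypothetical diamond schema (the Specimen table, and treating every FK as an undirected join edge, are assumptions). It derives candidate intermediates the way the suggested_intermediates attribute is described and shows that naming one of them in via leaves a single path. It illustrates the idea behind Rule 6; it is not the Denormalizer's code.

```python
# Hypothetical diamond schema: each pair is one FK (referencing, referenced).
FKS = [
    ("Image", "Observation"),
    ("Observation", "Subject"),
    ("Image", "Specimen"),
    ("Specimen", "Subject"),
]


def build_adjacency(fks):
    """Treat each FK as an undirected join edge."""
    adj = {}
    for a, b in fks:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def all_simple_paths(adj, start, goal, path=None):
    """Yield every path from start to goal that never revisits a table."""
    path = path or [start]
    if start == goal:
        yield list(path)
        return
    for nxt in sorted(adj.get(start, ())):
        if nxt not in path:
            yield from all_simple_paths(adj, nxt, goal, path + [nxt])


def suggested_intermediates(paths, include_tables):
    """Interior tables of any path that the caller did not already request."""
    found = []
    for p in paths:
        for table in p[1:-1]:
            if table not in include_tables and table not in found:
                found.append(table)
    return found


def paths_through(paths, via):
    """Keep only the paths that pass through every table named in via."""
    return [p for p in paths if all(v in p for v in via)]


paths = list(all_simple_paths(build_adjacency(FKS), "Image", "Subject"))
for p in paths:
    print(" → ".join(p))
print(suggested_intermediates(paths, ["Image", "Subject"]))  # ['Observation', 'Specimen']
print(paths_through(paths, ["Observation"]))  # [['Image', 'Observation', 'Subject']]
```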

DerivaMLDenormalizeDownstreamLeaf

Bases: DerivaMLDenormalizeError

Explicit row_per conflicts with a downstream table in include_tables.

Raised when the user specifies row_per=X but another table in include_tables is downstream of X via FK (would require aggregation).

Attributes:

| Name | Type | Description |
|---|---|---|
| row_per | str | The explicit row_per value. |
| downstream_tables | list[str] | Tables downstream of row_per that can't be hoisted. |

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLDenormalizeDownstreamLeaf(DerivaMLDenormalizeError):
    """Explicit ``row_per`` conflicts with a downstream table in ``include_tables``.

    Raised when the user specifies ``row_per=X`` but another table in
    ``include_tables`` is downstream of X via FK (would require aggregation).

    Attributes:
        row_per: the explicit row_per value.
        downstream_tables: tables downstream of row_per that can't be hoisted.
    """

    def __init__(self, row_per: str, downstream_tables: list[str]) -> None:
        self.row_per = row_per
        self.downstream_tables = list(downstream_tables)
        super().__init__(
            f"Table(s) {downstream_tables} are downstream of row_per={row_per!r}. "
            f"One row per {row_per} would require aggregating multiple rows of "
            f"{downstream_tables} — aggregation is not yet supported. "
            f"Drop row_per to get one row per {downstream_tables}, or remove "
            f"{downstream_tables} from include_tables."
        )
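A sketch of the downstream check, assuming that "downstream of X" means a table whose chain of foreign keys leads to X, so that one X row can match many of its rows. The REFERENCES schema and the reaches and downstream_of helpers are hypothetical and only model the condition described above.

```python
# Hypothetical schema: table -> tables its foreign keys reference.
REFERENCES = {
    "Subject": [],
    "Image": ["Subject"],      # many Images per Subject
    "Diagnosis": ["Image"],    # many Diagnoses per Image
}


def reaches(table, target, refs):
    """True if following FKs outward from table eventually arrives at target."""
    stack, seen = [table], {table}
    while stack:
        current = stack.pop()
        for parent in refs.get(current, []):
            if parent == target:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def downstream_of(row_per, include_tables, refs):
    """Requested tables with many rows per row_per row (would need aggregation)."""
    return [t for t in include_tables if t != row_per and reaches(t, row_per, refs)]


tables = ["Subject", "Image", "Diagnosis"]
print(downstream_of("Subject", tables, REFERENCES))    # ['Image', 'Diagnosis']
print(downstream_of("Diagnosis", tables, REFERENCES))  # []
```

A non-empty result corresponds to the condition under which DerivaMLDenormalizeDownstreamLeaf is raised; choosing the deepest table (here Diagnosis) as row_per avoids it.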

DerivaMLDenormalizeError

Bases: DerivaMLException

Base class for denormalization errors.

All errors raised by deriva_ml.local_db.denormalizer.Denormalizer and related planning functions are instances of this class.

Example

raise DerivaMLDenormalizeError("Planner failed") # doctest: +SKIP

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLDenormalizeError(DerivaMLException):
    """Base class for denormalization errors.

    All errors raised by :class:`~deriva_ml.local_db.denormalizer.Denormalizer`
    and related planning functions are instances of this class.

    Example:
        >>> raise DerivaMLDenormalizeError("Planner failed")  # doctest: +SKIP
    """

DerivaMLDenormalizeMultiLeaf

Bases: DerivaMLDenormalizeError

Multiple candidate tables for row_per — ambiguous leaf.

Raised when Rule 2 auto-inference finds more than one sink in include_tables — i.e., multiple tables tie for "deepest in the FK graph." The user must specify row_per explicitly to resolve.

Attributes:

| Name | Type | Description |
|---|---|---|
| candidates | list[str] | List of table names that all qualify as sinks. |
| include_tables | list[str] | The include_tables argument that triggered the ambiguity, for reference. |

Example

>>> try:  # doctest: +SKIP
...     d.as_dataframe(["Dataset", "Subject"])
... except DerivaMLDenormalizeMultiLeaf as e:
...     print(f"Pick one of {e.candidates} as row_per")
...     # Then retry: d.as_dataframe(..., row_per="Subject")

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLDenormalizeMultiLeaf(DerivaMLDenormalizeError):
    """Multiple candidate tables for ``row_per`` — ambiguous leaf.

    Raised when Rule 2 auto-inference finds more than one sink in
    ``include_tables`` — i.e., multiple tables tie for "deepest in the
    FK graph." The user must specify ``row_per`` explicitly to resolve.

    Attributes:
        candidates: list of table names that all qualify as sinks.
        include_tables: the ``include_tables`` argument that triggered
            the ambiguity, for reference.

    Example:
        >>> try:  # doctest: +SKIP
        ...     d.as_dataframe(["Dataset", "Subject"])
        ... except DerivaMLDenormalizeMultiLeaf as e:
        ...     print(f"Pick one of {e.candidates} as row_per")
        ...     # Then retry: d.as_dataframe(..., row_per="Subject")
    """

    def __init__(self, candidates: list[str], include_tables: list[str]) -> None:
        self.candidates = list(candidates)
        self.include_tables = list(include_tables)
        super().__init__(
            f"Multiple candidates for row_per: {candidates}. "
            f"Specify row_per=... explicitly. "
            f"(include_tables={include_tables})"
        )

DerivaMLDenormalizeNoSink

Bases: DerivaMLDenormalizeError

No sink found in the FK subgraph — cycle detected.

Raised when every table in include_tables has an outbound FK to another table in the set, forming a cycle. Pathological — rare in real schemas.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| msg | str | Descriptive error message. Should identify the tables forming the cycle. | '' |

Example

>>> raise DerivaMLDenormalizeNoSink(  # doctest: +SKIP
...     "Cycle in FK graph between tables A, B, C"
... )

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLDenormalizeNoSink(DerivaMLDenormalizeError):
    """No sink found in the FK subgraph — cycle detected.

    Raised when every table in ``include_tables`` has an outbound FK to
    another table in the set, forming a cycle. Pathological — rare in
    real schemas.

    Args:
        msg: Descriptive error message. Should identify the tables
            forming the cycle.

    Example:
        >>> raise DerivaMLDenormalizeNoSink(  # doctest: +SKIP
        ...     "Cycle in FK graph between tables A, B, C"
        ... )
    """
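The MultiLeaf and NoSink rules both come from the same sink search: exactly one sink gives row_per, several raise MultiLeaf, and none raises NoSink. The sketch below reads "deepest in the FK graph" as "a requested table that no other requested table references"; that orientation is an assumption, as are the REFS schema and the infer_row_per helper. The exception classes are local stand-ins, and their messages are shortened.

```python
class DerivaMLDenormalizeMultiLeaf(Exception):
    """Stand-in keeping the documented attributes."""

    def __init__(self, candidates, include_tables):
        super().__init__(f"Multiple candidates for row_per: {candidates}.")
        self.candidates = list(candidates)
        self.include_tables = list(include_tables)


class DerivaMLDenormalizeNoSink(Exception):
    pass


def infer_row_per(include_tables, refs):
    """Pick the single requested table that no other requested table references."""
    referenced = {
        parent
        for table in include_tables
        for parent in refs.get(table, [])
        if parent in include_tables and parent != table  # ignore self-references
    }
    sinks = [t for t in include_tables if t not in referenced]
    if not sinks:  # every table is referenced by another: there must be a cycle
        raise DerivaMLDenormalizeNoSink(
            f"Cycle in FK graph between tables {', '.join(include_tables)}"
        )
    if len(sinks) > 1:
        raise DerivaMLDenormalizeMultiLeaf(sinks, include_tables)
    return sinks[0]


# Hypothetical schema: table -> tables its foreign keys reference.
REFS = {"Image": ["Subject"], "Subject": [], "Dataset": [], "A": ["B"], "B": ["A"]}

outcomes = {}
for tables in (["Image", "Subject"], ["Dataset", "Subject"], ["A", "B"]):
    try:
        outcomes[tuple(tables)] = infer_row_per(tables, REFS)
    except DerivaMLDenormalizeMultiLeaf as e:
        outcomes[tuple(tables)] = ("multi-leaf", e.candidates)
    except DerivaMLDenormalizeNoSink:
        outcomes[tuple(tables)] = "no-sink"
print(outcomes)
```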

DerivaMLDenormalizeUnrelatedAnchor

Bases: DerivaMLDenormalizeError

Anchor has no FK path to any table in include_tables ∪ via.

Raised when Rule 8 detects anchors whose table has no FK relationship to any requested table — those anchors would contribute nothing to the output, which is almost always a mistake (wrong dataset passed, stale table name, etc.). Pass ignore_unrelated_anchors=True to silently drop them if the heterogeneity is intentional.

Note: this is distinct from Rule 7 case 5 (table has an FK path into include_tables ∪ via but the specific anchor RIDs don't reach row_per). Case 5 anchors are silently dropped regardless of the flag — only case 6 (no path at all) raises this error.

Attributes:

| Name | Type | Description |
|---|---|---|
| unrelated_tables | list[str] | Tables of the unrelated anchors. |
| include_tables | list[str] | The include_tables argument, for reference. |

Example

>>> try:  # doctest: +SKIP
...     d.as_dataframe(["Image", "Subject"])  # dataset has stray types
... except DerivaMLDenormalizeUnrelatedAnchor as e:
...     print(f"Dataset has unrelated members: {e.unrelated_tables}")
...     # Retry, dropping them:
...     df = d.as_dataframe(
...         ["Image", "Subject"], ignore_unrelated_anchors=True
...     )

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLDenormalizeUnrelatedAnchor(DerivaMLDenormalizeError):
    """Anchor has no FK path to any table in ``include_tables ∪ via``.

    Raised when Rule 8 detects anchors whose table has no FK
    relationship to any requested table — those anchors would
    contribute nothing to the output, which is almost always a mistake
    (wrong dataset passed, stale table name, etc.). Pass
    ``ignore_unrelated_anchors=True`` to silently drop them if the
    heterogeneity is intentional.

    Note: this is distinct from Rule 7 case 5 (table has an FK path
    into ``include_tables ∪ via`` but the specific anchor RIDs don't
    reach ``row_per``). Case 5 anchors are silently dropped regardless
    of the flag — only case 6 (no path at all) raises this error.

    Attributes:
        unrelated_tables: tables of the unrelated anchors.
        include_tables: the ``include_tables`` argument for reference.

    Example:
        >>> try:  # doctest: +SKIP
        ...     d.as_dataframe(["Image", "Subject"])  # dataset has stray types
        ... except DerivaMLDenormalizeUnrelatedAnchor as e:
        ...     print(f"Dataset has unrelated members: {e.unrelated_tables}")
        ...     # Retry, dropping them:
        ...     df = d.as_dataframe(
        ...         ["Image", "Subject"], ignore_unrelated_anchors=True
        ...     )
    """

    def __init__(
        self,
        unrelated_tables: list[str],
        include_tables: list[str],
    ) -> None:
        self.unrelated_tables = list(unrelated_tables)
        self.include_tables = list(include_tables)
        super().__init__(
            f"Anchors of table(s) {unrelated_tables} have no FK path to any "
            f"table in include_tables={include_tables}. They would contribute "
            f"nothing to the output.\n"
            f"Options:\n"
            f"  • Remove these anchors from the anchor set.\n"
            f"  • Add {unrelated_tables} (or a linking table) to include_tables.\n"
            f"  • Pass ignore_unrelated_anchors=True to silently drop them."
        )
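A table-level sketch of the Rule 8 check: an anchor table is unrelated when no table in include_tables ∪ via is reachable from it over FK join edges. The ADJ graph, has_path and check_anchor_tables are hypothetical, and the per-RID "case 5" behaviour is deliberately not modeled; the exception is a local stand-in with the documented attributes.

```python
class DerivaMLDenormalizeUnrelatedAnchor(Exception):
    """Stand-in keeping the documented attributes."""

    def __init__(self, unrelated_tables, include_tables):
        super().__init__(f"Anchors of table(s) {unrelated_tables} have no FK path")
        self.unrelated_tables = list(unrelated_tables)
        self.include_tables = list(include_tables)


# Hypothetical undirected join graph; "Protocol" has no FK to anything requested.
ADJ = {
    "Image": {"Subject", "Observation"},
    "Subject": {"Image"},
    "Observation": {"Image"},
    "Protocol": set(),
}


def has_path(adj, start, targets):
    """True if any table in targets is reachable from start over join edges."""
    stack, seen = [start], {start}
    while stack:
        table = stack.pop()
        if table in targets:
            return True
        for nxt in adj.get(table, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def check_anchor_tables(anchor_tables, include_tables, via, adj, ignore_unrelated_anchors=False):
    """Return anchor tables worth keeping; raise on unrelated ones unless ignored."""
    targets = set(include_tables) | set(via)
    unrelated = sorted({t for t in anchor_tables if not has_path(adj, t, targets)})
    if unrelated and not ignore_unrelated_anchors:
        raise DerivaMLDenormalizeUnrelatedAnchor(unrelated, include_tables)
    # Only table-level reachability is checked; the per-RID drop is not modeled.
    return [t for t in anchor_tables if t not in unrelated]


anchors = ["Image", "Subject", "Protocol"]
try:
    check_anchor_tables(anchors, ["Image", "Subject"], [], ADJ)
except DerivaMLDenormalizeUnrelatedAnchor as e:
    flagged = e.unrelated_tables
kept = check_anchor_tables(anchors, ["Image", "Subject"], [], ADJ, ignore_unrelated_anchors=True)
print(flagged, kept)  # ['Protocol'] ['Image', 'Subject']
```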

DerivaMLDirtyWorkflowError

Bases: DerivaMLWorkflowError

Exception raised when workflow code has uncommitted changes.

DerivaML requires code to be committed before execution for provenance tracking. Running with uncommitted changes means the execution record cannot reliably link back to the source code.

Use allow_dirty=True in the API or --allow-dirty on the CLI to override this check when debugging or iterating.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to the file with uncommitted changes. | required |

Example

>>> raise DerivaMLDirtyWorkflowError("src/models/train.py")  # doctest: +SKIP
DerivaMLDirtyWorkflowError: File src/models/train.py has uncommitted changes. ...

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLDirtyWorkflowError(DerivaMLWorkflowError):
    """Exception raised when workflow code has uncommitted changes.

    DerivaML requires code to be committed before execution for provenance
    tracking. Running with uncommitted changes means the execution record
    cannot reliably link back to the source code.

    Use ``allow_dirty=True`` in the API or ``--allow-dirty`` on the CLI
    to override this check when debugging or iterating.

    Args:
        path: Path to the file with uncommitted changes.

    Example:
        >>> raise DerivaMLDirtyWorkflowError("src/models/train.py")  # doctest: +SKIP
        DerivaMLDirtyWorkflowError: File src/models/train.py has uncommitted changes. ...
    """

    def __init__(self, path: str) -> None:
        super().__init__(
            f"File {path} has uncommitted changes. Commit before running, or use --allow-dirty to override."
        )
        self.path = path
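A sketch of a pre-run cleanliness check built on `git status --porcelain`, whose version 1 output gives two status columns, a space, and the path (renames appear as "old -> new"). The check_clean and dirty_paths helpers are hypothetical and may differ from how DerivaML actually detects uncommitted code; the allow_dirty flag only mirrors the documented override. Quoted paths containing special characters are not unescaped here.

```python
import subprocess


class DerivaMLDirtyWorkflowError(Exception):
    """Stand-in with the documented constructor and message."""

    def __init__(self, path: str) -> None:
        super().__init__(
            f"File {path} has uncommitted changes. Commit before running, or use --allow-dirty to override."
        )
        self.path = path


def dirty_paths(porcelain: str) -> list:
    """Extract paths from `git status --porcelain` (v1) output."""
    paths = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        entry = line[3:]  # drop the two status columns and the separating space
        if " -> " in entry:  # renames are reported as "old -> new"
            entry = entry.split(" -> ", 1)[1]
        paths.append(entry)
    return paths


def check_clean(path: str, allow_dirty: bool = False) -> None:
    """Raise if path has uncommitted changes, unless allow_dirty is set."""
    out = subprocess.run(
        ["git", "status", "--porcelain", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    if dirty_paths(out) and not allow_dirty:
        raise DerivaMLDirtyWorkflowError(path)


sample = " M src/models/train.py\n?? notes.txt\nR  old.py -> new.py\n"
print(dirty_paths(sample))  # ['src/models/train.py', 'notes.txt', 'new.py']
```

check_clean must run inside a git working tree; the parsing is kept in a separate function so it can be exercised without one.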

DerivaMLException

Bases: Exception

Base exception class for all DerivaML errors.

This is the root exception for all DerivaML-specific errors. Catching this exception will catch any error raised by the DerivaML library.

Attributes:

| Name | Type | Description |
|---|---|---|
| _msg | str | The error message stored for later access. |

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| msg | str | Descriptive error message. | '' |

Example

>>> raise DerivaMLException("Failed to connect to catalog")  # doctest: +SKIP
DerivaMLException: Failed to connect to catalog

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLException(Exception):
    """Base exception class for all DerivaML errors.

    This is the root exception for all DerivaML-specific errors. Catching this
    exception will catch any error raised by the DerivaML library.

    Attributes:
        _msg: The error message stored for later access.

    Args:
        msg: Descriptive error message. Defaults to empty string.

    Example:
        >>> raise DerivaMLException("Failed to connect to catalog")  # doctest: +SKIP
        DerivaMLException: Failed to connect to catalog
    """

    def __init__(self, msg: str = "") -> None:
        super().__init__(msg)
        self._msg = msg

DerivaMLExecutionError

Bases: DerivaMLException

Exception raised for execution lifecycle issues.

Base class for errors related to workflow execution, asset management, and provenance tracking.

Example

raise DerivaMLExecutionError("Execution failed to initialize") # doctest: +SKIP

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLExecutionError(DerivaMLException):
    """Exception raised for execution lifecycle issues.

    Base class for errors related to workflow execution, asset management,
    and provenance tracking.

    Example:
        >>> raise DerivaMLExecutionError("Execution failed to initialize")  # doctest: +SKIP
    """

    pass

DerivaMLInvalidTerm

Bases: DerivaMLNotFoundError

Exception raised when a vocabulary term is not found or invalid.

Raised when attempting to look up or use a term that doesn't exist in a controlled vocabulary table, or when a term name/synonym cannot be resolved.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| vocabulary | str | Name of the vocabulary table being searched. | required |
| term | str | The term name that was not found. | required |
| msg | str | Additional context about the error. | "Term doesn't exist" |

Example

>>> raise DerivaMLInvalidTerm("Diagnosis", "unknown_condition")  # doctest: +SKIP
DerivaMLInvalidTerm: Invalid term unknown_condition in vocabulary Diagnosis: Term doesn't exist.

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLInvalidTerm(DerivaMLNotFoundError):
    """Exception raised when a vocabulary term is not found or invalid.

    Raised when attempting to look up or use a term that doesn't exist in
    a controlled vocabulary table, or when a term name/synonym cannot be resolved.

    Args:
        vocabulary: Name of the vocabulary table being searched.
        term: The term name that was not found.
        msg: Additional context about the error. Defaults to "Term doesn't exist".

    Example:
        >>> raise DerivaMLInvalidTerm("Diagnosis", "unknown_condition")  # doctest: +SKIP
        DerivaMLInvalidTerm: Invalid term unknown_condition in vocabulary Diagnosis: Term doesn't exist.
    """

    def __init__(self, vocabulary: str, term: str, msg: str = "Term doesn't exist") -> None:
        super().__init__(f"Invalid term {term} in vocabulary {vocabulary}: {msg}.")
        self.vocabulary = vocabulary
        self.term = term
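A sketch of resolving a user-supplied name against a vocabulary's canonical names and synonyms, raising InvalidTerm when neither matches. The DIAGNOSIS contents, the resolve_term helper, and the case-insensitive matching are assumptions for illustration; DerivaML's own lookup rules may differ. The exception is a local stand-in with the documented constructor.

```python
class DerivaMLInvalidTerm(Exception):
    """Stand-in with the documented constructor and message."""

    def __init__(self, vocabulary: str, term: str, msg: str = "Term doesn't exist") -> None:
        super().__init__(f"Invalid term {term} in vocabulary {vocabulary}: {msg}.")
        self.vocabulary = vocabulary
        self.term = term


# Hypothetical vocabulary rows: canonical name -> synonyms.
DIAGNOSIS = {"Glaucoma": ["GL", "glaucomatous"], "Normal": ["healthy"]}


def resolve_term(vocabulary_name, vocabulary, name):
    """Return the canonical term for a name or synonym (case-insensitive)."""
    wanted = name.casefold()
    for canonical, synonyms in vocabulary.items():
        if wanted == canonical.casefold() or wanted in (s.casefold() for s in synonyms):
            return canonical
    raise DerivaMLInvalidTerm(vocabulary_name, name)


print(resolve_term("Diagnosis", DIAGNOSIS, "gl"))       # Glaucoma
print(resolve_term("Diagnosis", DIAGNOSIS, "healthy"))  # Normal
try:
    resolve_term("Diagnosis", DIAGNOSIS, "unknown_condition")
except DerivaMLInvalidTerm as e:
    invalid_message = str(e)
print(invalid_message)
```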

DerivaMLMaterializeLimitExceeded

Bases: DerivaMLValidationError

Raised when a result set exceeds the caller-supplied materialize_limit.

Surfaced by helpers (e.g. feature_values) that materialize the full result set into memory before reduction. Callers can either raise the limit, narrow their query (e.g. add an execution_rids filter), or switch to a streaming consumer.

Attributes:

| Name | Type | Description |
|---|---|---|
| actual_count | int | The actual size of the result set that triggered the limit. |
| limit | int | The materialize_limit the caller passed. |

Example

>>> from deriva_ml.core.exceptions import DerivaMLMaterializeLimitExceeded
>>> exc = DerivaMLMaterializeLimitExceeded(actual_count=1500, limit=1000)
>>> exc.actual_count
1500
>>> "exceeds materialize_limit" in str(exc)
True

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLMaterializeLimitExceeded(DerivaMLValidationError):
    """Raised when a result set exceeds the caller-supplied ``materialize_limit``.

    Surfaced by helpers (e.g. ``feature_values``) that materialize the
    full result set into memory before reduction. Callers can either
    raise the limit, narrow their query (e.g. add an ``execution_rids``
    filter), or switch to a streaming consumer.

    Attributes:
        actual_count: The actual size of the result set that triggered
            the limit.
        limit: The ``materialize_limit`` the caller passed.

    Example:
        >>> from deriva_ml.core.exceptions import DerivaMLMaterializeLimitExceeded
        >>> exc = DerivaMLMaterializeLimitExceeded(actual_count=1500, limit=1000)
        >>> exc.actual_count
        1500
        >>> "exceeds materialize_limit" in str(exc)
        True
    """

    def __init__(self, actual_count: int, limit: int):
        self.actual_count = actual_count
        self.limit = limit
        super().__init__(
            f"feature_values result set ({actual_count} rows) exceeds materialize_limit ({limit}); "
            f"narrow the query (e.g. pass execution_rids=...) or raise the limit."
        )
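A sketch of a limit-aware materializer: it buffers at most limit + 1 rows, and only on overflow counts the remainder (without storing it) so that actual_count is exact. The materialize helper is hypothetical and not the code behind feature_values; the exception is a local stand-in with a shortened message.

```python
from itertools import islice


class DerivaMLMaterializeLimitExceeded(Exception):
    """Stand-in keeping the documented attributes."""

    def __init__(self, actual_count: int, limit: int) -> None:
        self.actual_count = actual_count
        self.limit = limit
        super().__init__(f"result set ({actual_count} rows) exceeds materialize_limit ({limit})")


def materialize(rows, limit):
    """Load rows into a list, refusing result sets larger than limit."""
    it = iter(rows)  # a single iterator, so counting resumes where islice stopped
    buffered = list(islice(it, limit + 1))  # one extra row reveals an overflow
    if len(buffered) > limit:
        # Count the rest without keeping it, so actual_count is exact.
        total = len(buffered) + sum(1 for _ in it)
        raise DerivaMLMaterializeLimitExceeded(total, limit)
    return buffered


print(materialize(range(5), limit=10))  # [0, 1, 2, 3, 4]
try:
    materialize(range(1500), limit=1000)
except DerivaMLMaterializeLimitExceeded as e:
    overflow = (e.actual_count, e.limit)
print(overflow)  # (1500, 1000)
```

A result set exactly equal to the limit is accepted, matching the "exceeds" wording of the exception.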

DerivaMLNoExecutionContext

Bases: DerivaMLConfigurationError

Exception raised when an execution-scoped operation is attempted without an execution context.

Handles returned by ml.table(name) are read-only — useful for schema introspection — but their .insert(...) and asset-file methods raise this exception. Use exe.table(name) to get a handle bound to an execution that permits writes.

Example

Calling a write method on a read-only handle raises this error:

>>> handle = ml.table("Subject")  # doctest: +SKIP
>>> handle.record_class()              # OK  # doctest: +SKIP
>>> handle.insert({"Name": "x"})       # raises  # doctest: +SKIP
Traceback (most recent call last):
    ...
DerivaMLNoExecutionContext: ml.table() handles are read-only; use exe.table() for writes
Source code in src/deriva_ml/core/exceptions.py
class DerivaMLNoExecutionContext(DerivaMLConfigurationError):
    """Exception raised when an execution-scoped operation is attempted without an execution context.

    Handles returned by ``ml.table(name)`` are read-only — useful for schema
    introspection — but their ``.insert(...)`` and asset-file methods raise
    this exception. Use ``exe.table(name)`` to get a handle bound to an
    execution that permits writes.

    Example:
        Calling a write method on a read-only handle raises this error::

            >>> handle = ml.table("Subject")  # doctest: +SKIP
            >>> handle.record_class()              # OK  # doctest: +SKIP
            >>> handle.insert({"Name": "x"})       # raises  # doctest: +SKIP
            Traceback (most recent call last):
                ...
            DerivaMLNoExecutionContext: ml.table() handles are read-only; use exe.table() for writes
    """

    pass
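A sketch of the read-only versus execution-bound handle pattern: the same handle type permits reads either way but refuses writes unless it carries an execution. TableHandle, the "2-EXE" RID, and the "Execution" column it stamps onto rows are all hypothetical; they only mirror the ml.table() / exe.table() split described above.

```python
class DerivaMLNoExecutionContext(Exception):
    pass


class TableHandle:
    """Hypothetical table handle: reads are always allowed, writes need an execution."""

    def __init__(self, name, execution_rid=None):
        self.name = name
        self.execution_rid = execution_rid
        self.rows = []

    def insert(self, record):
        if self.execution_rid is None:
            raise DerivaMLNoExecutionContext(
                f"{self.name}: ml.table() handles are read-only; use exe.table() for writes"
            )
        # Record which execution produced the row (hypothetical "Execution" column).
        self.rows.append({**record, "Execution": self.execution_rid})


read_only = TableHandle("Subject")                     # like ml.table("Subject")
bound = TableHandle("Subject", execution_rid="2-EXE")  # like exe.table("Subject")
bound.insert({"Name": "x"})
try:
    read_only.insert({"Name": "x"})
except DerivaMLNoExecutionContext as e:
    error = str(e)
print(bound.rows)  # [{'Name': 'x', 'Execution': '2-EXE'}]
```

Tying writes to an execution means every inserted row can be traced back to the run that created it.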

DerivaMLNotFoundError

Bases: DerivaMLDataError

Exception raised when an entity cannot be found.

Raised when a lookup operation fails to find the requested entity (dataset, table, term, etc.) in the catalog or bag.

Example

raise DerivaMLNotFoundError("Entity '1-ABC' not found in catalog") # doctest: +SKIP

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLNotFoundError(DerivaMLDataError):
    """Exception raised when an entity cannot be found.

    Raised when a lookup operation fails to find the requested entity
    (dataset, table, term, etc.) in the catalog or bag.

    Example:
        >>> raise DerivaMLNotFoundError("Entity '1-ABC' not found in catalog")  # doctest: +SKIP
    """

    pass

DerivaMLOfflineError

Bases: DerivaMLConfigurationError

Exception raised when an online-only operation is attempted in offline mode.

The DerivaML instance was constructed with mode=ConnectionMode.offline but the caller invoked an operation that requires server contact — most commonly create_execution, which needs a server-assigned Execution RID.

Example

Creating an execution requires online mode because the Execution RID must be assigned by the server:

>>> ml = DerivaML(..., mode=ConnectionMode.offline)  # doctest: +SKIP
>>> ml.create_execution(config)  # doctest: +SKIP
Traceback (most recent call last):
    ...
DerivaMLOfflineError: create_execution requires online mode
Source code in src/deriva_ml/core/exceptions.py
class DerivaMLOfflineError(DerivaMLConfigurationError):
    """Exception raised when an online-only operation is attempted in offline mode.

    The DerivaML instance was constructed with ``mode=ConnectionMode.offline``
    but the caller invoked an operation that requires server contact — most
    commonly ``create_execution``, which needs a server-assigned Execution RID.

    Example:
        Creating an execution requires an online mode because the
        Execution RID must be server-assigned::

            >>> ml = DerivaML(..., mode=ConnectionMode.offline)  # doctest: +SKIP
            >>> ml.create_execution(config)  # doctest: +SKIP
            Traceback (most recent call last):
                ...
            DerivaMLOfflineError: create_execution requires online mode
    """

    pass
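One common way to enforce an online-only rule is a method decorator that checks the instance's mode before running the call. In this sketch, ConnectionMode is a local stand-in (only the offline member is named in the text; online is assumed), and requires_online and Client are hypothetical rather than DerivaML's own mechanism.

```python
import enum
import functools


class ConnectionMode(enum.Enum):
    """Stand-in for DerivaML's ConnectionMode (only the two values used here)."""
    online = "online"
    offline = "offline"


class DerivaMLOfflineError(Exception):
    pass


def requires_online(method):
    """Decorator: refuse to run method when the instance is in offline mode."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if self.mode is ConnectionMode.offline:
            raise DerivaMLOfflineError(f"{method.__name__} requires online mode")
        return method(self, *args, **kwargs)

    return wrapper


class Client:
    """Hypothetical client; only the mode attribute matters for the decorator."""

    def __init__(self, mode):
        self.mode = mode

    @requires_online
    def create_execution(self, config):
        return f"execution for {config}"


print(Client(ConnectionMode.online).create_execution("cfg"))  # execution for cfg
try:
    Client(ConnectionMode.offline).create_execution("cfg")
except DerivaMLOfflineError as e:
    offline_message = str(e)
print(offline_message)  # create_execution requires online mode
```

functools.wraps keeps the wrapped method's name, which is what lets the error message name the refused operation.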

DerivaMLReadOnlyError

Bases: DerivaMLException

Exception raised when attempting write operations on read-only resources.

Raised when attempting to modify data in a downloaded bag or other read-only context where write operations are not supported.

Example

raise DerivaMLReadOnlyError("Cannot create datasets in a downloaded bag") # doctest: +SKIP

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLReadOnlyError(DerivaMLException):
    """Exception raised when attempting write operations on read-only resources.

    Raised when attempting to modify data in a downloaded bag or other
    read-only context where write operations are not supported.

    Example:
        >>> raise DerivaMLReadOnlyError("Cannot create datasets in a downloaded bag")  # doctest: +SKIP
    """

    pass
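One way a read-only context can surface this error is to keep reads working and make every write method fail immediately. The ReadOnlyBagView class and its methods below are hypothetical, not part of the DerivaML API, and the exception class is a stand-in with the documented base.

```python
class DerivaMLException(Exception):
    pass


class DerivaMLReadOnlyError(DerivaMLException):
    pass


class ReadOnlyBagView:
    """Hypothetical read-only view over rows loaded from a downloaded bag."""

    def __init__(self, rows: dict[str, list[dict]]) -> None:
        self._rows = rows

    def list_rows(self, table: str) -> list[dict]:
        # Reads are allowed; return a new list so callers cannot reorder the view.
        return list(self._rows.get(table, []))

    def create_dataset(self, description: str) -> str:
        # Writes are rejected up front instead of failing partway through.
        raise DerivaMLReadOnlyError("Cannot create datasets in a downloaded bag")
```

Since DerivaMLReadOnlyError derives directly from DerivaMLException, a handler for DerivaMLDataError will not catch it; catch it by name or via the base class.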

DerivaMLSchemaError

Bases: DerivaMLConfigurationError

Exception raised for schema or catalog structure issues.

Raised when the catalog schema is invalid, missing required tables, or has structural problems that prevent normal operation.

Example

raise DerivaMLSchemaError("Ambiguous domain schema: ['Schema1', 'Schema2']") # doctest: +SKIP

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLSchemaError(DerivaMLConfigurationError):
    """Exception raised for schema or catalog structure issues.

    Raised when the catalog schema is invalid, missing required tables,
    or has structural problems that prevent normal operation.

    Example:
        >>> raise DerivaMLSchemaError("Ambiguous domain schema: ['Schema1', 'Schema2']")  # doctest: +SKIP
    """

    pass
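The documented message "Ambiguous domain schema: [...]" suggests a check that expects exactly one non-system schema in the catalog. The hypothetical pick_domain_schema helper below sketches such a check; the set of system schema names is an assumption for illustration only.

```python
class DerivaMLException(Exception):
    pass


class DerivaMLConfigurationError(DerivaMLException):
    pass


class DerivaMLSchemaError(DerivaMLConfigurationError):
    pass


# Assumed names of schemas that never hold domain tables (illustrative only).
SYSTEM_SCHEMAS = frozenset({"public", "www", "deriva-ml"})


def pick_domain_schema(schema_names: list[str]) -> str:
    """Hypothetical helper: return the single domain schema or raise."""
    candidates = sorted(n for n in schema_names if n not in SYSTEM_SCHEMAS)
    if not candidates:
        raise DerivaMLSchemaError("No domain schema found")
    if len(candidates) > 1:
        raise DerivaMLSchemaError(f"Ambiguous domain schema: {candidates}")
    return candidates[0]
```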

DerivaMLSchemaPinned

Bases: DerivaMLConfigurationError

Raised when refresh_schema() is called on a pinned cache.

The cache has been explicitly pinned via pin_schema(). To refresh anyway, call unpin_schema() first. Note that force=True does not bypass a pin; it only bypasses the pending-rows guard.

Example

>>> raise DerivaMLSchemaPinned(  # doctest: +SKIP
...     "refresh_schema refused: cache is pinned at snapshot s0"
... )

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLSchemaPinned(DerivaMLConfigurationError):
    """Raised when ``refresh_schema()`` is called on a pinned cache.

    The cache has been explicitly pinned via ``pin_schema()``. Call
    ``unpin_schema()`` first if you really want to refresh. Note:
    ``force=True`` does NOT bypass a pin — it only bypasses the
    pending-rows guard.

    Example:
        >>> raise DerivaMLSchemaPinned(  # doctest: +SKIP
        ...     "refresh_schema refused: cache is pinned at snapshot s0"
        ... )
    """

    pass
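The rule that force=True skips the pending-rows guard but not a pin implies that the pin check runs unconditionally and before the pending-rows check. A minimal sketch of that ordering, using a hypothetical SchemaCache class rather than DerivaML's actual cache; here the caller passes the new snapshot ID in place of a real server fetch.

```python
class DerivaMLException(Exception):
    pass


class DerivaMLConfigurationError(DerivaMLException):
    pass


class DerivaMLSchemaPinned(DerivaMLConfigurationError):
    pass


class DerivaMLSchemaRefreshBlocked(DerivaMLConfigurationError):
    pass


class SchemaCache:
    """Hypothetical schema cache illustrating the documented guard order."""

    def __init__(self, snapshot: str) -> None:
        self.snapshot = snapshot
        self.pinned = False
        self.pending_rows = 0

    def pin_schema(self) -> None:
        self.pinned = True

    def unpin_schema(self) -> None:
        self.pinned = False

    def refresh_schema(self, new_snapshot: str, force: bool = False) -> None:
        # The pin is checked first, and force does not override it.
        if self.pinned:
            raise DerivaMLSchemaPinned(
                f"refresh_schema refused: cache is pinned at snapshot {self.snapshot}"
            )
        # force only bypasses the pending-rows guard.
        if self.pending_rows and not force:
            raise DerivaMLSchemaRefreshBlocked(
                f"refresh_schema requires a drained workspace; {self.pending_rows} pending rows"
            )
        self.snapshot = new_snapshot
```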

DerivaMLSchemaRefreshBlocked

Bases: DerivaMLConfigurationError

Raised when refresh_schema() is called with staged work in the workspace.

The caller should either drain the workspace first (ml.upload_pending()) or call refresh_schema(force=True) to discard local state. Draining is the safer choice: a forced refresh may leave rows whose metadata references columns or types that no longer exist in the new schema, causing catalog inserts to fail on the next upload.

Example

>>> raise DerivaMLSchemaRefreshBlocked(  # doctest: +SKIP
...     "refresh_schema requires a drained workspace; 3 pending rows"
... )

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLSchemaRefreshBlocked(DerivaMLConfigurationError):
    """Raised when ``refresh_schema()`` is called with staged work in the workspace.

    The caller should drain the workspace first (``ml.upload_pending()``)
    or call ``refresh_schema(force=True)`` to discard local state.
    Draining is the safer choice — a forced refresh may leave rows
    whose metadata references columns or types no longer in the new
    schema, causing catalog-insert failures on the next upload.

    Example:
        >>> raise DerivaMLSchemaRefreshBlocked(  # doctest: +SKIP
        ...     "refresh_schema requires a drained workspace; 3 pending rows"
        ... )
    """

    pass
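Following the recommendation to drain rather than force, a caller can catch DerivaMLSchemaRefreshBlocked, upload the staged rows, and retry. In this sketch FakeML is a hypothetical stand-in that exposes only the two method names mentioned above; it is not the real DerivaML class, and its upload_pending simply zeroes a counter.

```python
class DerivaMLException(Exception):
    pass


class DerivaMLConfigurationError(DerivaMLException):
    pass


class DerivaMLSchemaRefreshBlocked(DerivaMLConfigurationError):
    pass


class FakeML:
    """Hypothetical stand-in exposing upload_pending() and refresh_schema()."""

    def __init__(self, pending_rows: int) -> None:
        self.pending_rows = pending_rows
        self.uploaded = 0
        self.refreshes = 0

    def upload_pending(self) -> None:
        # Pretend every staged row reaches the catalog.
        self.uploaded += self.pending_rows
        self.pending_rows = 0

    def refresh_schema(self, force: bool = False) -> None:
        if self.pending_rows and not force:
            raise DerivaMLSchemaRefreshBlocked(
                f"refresh_schema requires a drained workspace; {self.pending_rows} pending rows"
            )
        self.refreshes += 1


def refresh_safely(ml: FakeML) -> None:
    """Prefer draining over force=True, as the docstring recommends."""
    try:
        ml.refresh_schema()
    except DerivaMLSchemaRefreshBlocked:
        ml.upload_pending()  # push staged rows while the old schema still applies
        ml.refresh_schema()  # the workspace is now empty, so this succeeds
```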

DerivaMLStateInconsistency

Bases: DerivaMLDataError

Exception raised when workspace SQLite state and catalog state disagree in an unresolvable way.

The six disagreement cases enumerated in spec §2.2 are handled automatically by the reconciliation logic (see state_machine.reconcile_with_catalog); anything outside those rules surfaces as this exception with enough information for a human to intervene.

Example

Deleting an in-flight execution on the catalog side produces this error:

>>> exe = ml.resume_execution("EXE-A")  # doctest: +SKIP
Traceback (most recent call last):
    ...
DerivaMLStateInconsistency: Execution EXE-A: SQLite status 'running' but catalog returned no Execution row
Source code in src/deriva_ml/core/exceptions.py
class DerivaMLStateInconsistency(DerivaMLDataError):
    """Exception raised when workspace SQLite state and catalog state disagree in an unresolvable way.

    The six disagreement cases enumerated in spec §2.2 are handled automatically
    by the reconciliation logic (see ``state_machine.reconcile_with_catalog``);
    anything outside those rules surfaces as this exception with enough
    information for a human to intervene.

    Example:
        A catalog-side delete of an in-flight execution produces this error::

            >>> exe = ml.resume_execution("EXE-A")  # doctest: +SKIP
            Traceback (most recent call last):
                ...
            DerivaMLStateInconsistency: Execution EXE-A: SQLite status 'running' but catalog returned no Execution row
    """

    pass
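The pattern described above, resolve known disagreements automatically and raise for everything else, can be sketched as follows. The reconcile function and its rules are hypothetical and illustrative only; they are not the six cases from spec §2.2 or the behavior of state_machine.reconcile_with_catalog. The unresolved case reproduces the documented message.

```python
from typing import Optional


class DerivaMLException(Exception):
    pass


class DerivaMLDataError(DerivaMLException):
    pass


class DerivaMLStateInconsistency(DerivaMLDataError):
    pass


def reconcile(rid: str, sqlite_status: str, catalog_status: Optional[str]) -> str:
    """Hypothetical reconciler: resolve known disagreements, raise on the rest."""
    if catalog_status == sqlite_status:
        return sqlite_status
    if catalog_status is not None:
        # Illustrative rule only: prefer the catalog when it has a row.
        return catalog_status
    # The workspace tracks an execution the catalog no longer has:
    # no automatic rule applies, so hand the decision to a human.
    raise DerivaMLStateInconsistency(
        f"Execution {rid}: SQLite status {sqlite_status!r} but catalog returned no Execution row"
    )
```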

DerivaMLTableNotFound

Bases: DerivaMLNotFoundError

Exception raised when a table cannot be found.

Raised when attempting to access a table that doesn't exist in the catalog schema or downloaded bag.

Parameters:

Name        Type  Description                                             Default
table_name  str   The name of the table that was not found.               required
msg         str   Message prefix; the error reads "{msg}: {table_name}".  'Table not found'
Example

>>> raise DerivaMLTableNotFound("MyTable")  # doctest: +SKIP
DerivaMLTableNotFound: Table not found: MyTable

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLTableNotFound(DerivaMLNotFoundError):
    """Exception raised when a table cannot be found.

    Raised when attempting to access a table that doesn't exist in the
    catalog schema or downloaded bag.

    Args:
        table_name: The name of the table that was not found.
        msg: Additional context. Defaults to "Table not found".

    Example:
        >>> raise DerivaMLTableNotFound("MyTable")  # doctest: +SKIP
        DerivaMLTableNotFound: Table not found: MyTable
    """

    def __init__(self, table_name: str, msg: str = "Table not found") -> None:
        super().__init__(f"{msg}: {table_name}")
        self.table_name = table_name
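The constructor formats the message as "{msg}: {table_name}" and also keeps the raw name on the table_name attribute, so handlers do not need to parse the message. The sketch copies that constructor onto stand-in bases that follow the documented hierarchy; the find_table lookup helper is hypothetical.

```python
class DerivaMLException(Exception):
    pass


class DerivaMLDataError(DerivaMLException):
    pass


class DerivaMLNotFoundError(DerivaMLDataError):
    pass


class DerivaMLTableNotFound(DerivaMLNotFoundError):
    # Same constructor as the source listing above.
    def __init__(self, table_name: str, msg: str = "Table not found") -> None:
        super().__init__(f"{msg}: {table_name}")
        self.table_name = table_name


def find_table(tables: dict[str, object], name: str) -> object:
    """Hypothetical lookup over a name -> table mapping."""
    try:
        return tables[name]
    except KeyError:
        # "from None" hides the internal KeyError from the traceback.
        raise DerivaMLTableNotFound(name, msg="Table not found in schema") from None
```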

DerivaMLTableTypeError

Bases: DerivaMLDataError

Exception raised when a RID or table is not of the expected type.

Raised when an operation requires a specific table type (e.g., Dataset, Execution) but receives a RID or table reference of a different type.

Parameters:

Name        Type  Description                                              Default
table_type  str   The expected table type (e.g., "Dataset", "Execution").  required
table       str   The actual table name or RID that was provided.          required
Example

>>> raise DerivaMLTableTypeError("Dataset", "1-ABC123")  # doctest: +SKIP
DerivaMLTableTypeError: Table 1-ABC123 is not of type Dataset.

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLTableTypeError(DerivaMLDataError):
    """Exception raised when a RID or table is not of the expected type.

    Raised when an operation requires a specific table type (e.g., Dataset,
    Execution) but receives a RID or table reference of a different type.

    Args:
        table_type: The expected table type (e.g., "Dataset", "Execution").
        table: The actual table name or RID that was provided.

    Example:
        >>> raise DerivaMLTableTypeError("Dataset", "1-ABC123")  # doctest: +SKIP
        DerivaMLTableTypeError: Table 1-ABC123 is not of type Dataset.
    """

    def __init__(self, table_type: str, table: str) -> None:
        super().__init__(f"Table {table} is not of type {table_type}.")
        self.table_type = table_type
        self.table = table
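Note the argument order: the expected type comes first, then the offending table or RID. The sketch below copies the constructor onto stand-in bases and adds a hypothetical require_table_type check; the rid_tables mapping stands in for a catalog lookup that resolves which table a RID belongs to.

```python
class DerivaMLException(Exception):
    pass


class DerivaMLDataError(DerivaMLException):
    pass


class DerivaMLTableTypeError(DerivaMLDataError):
    # Same constructor as the source listing above.
    def __init__(self, table_type: str, table: str) -> None:
        super().__init__(f"Table {table} is not of type {table_type}.")
        self.table_type = table_type
        self.table = table


def require_table_type(rid: str, rid_tables: dict[str, str], expected: str) -> str:
    """Hypothetical check that a RID belongs to the expected table.

    rid_tables maps RID -> table name. In this simplified sketch an unknown
    RID is also reported as a type mismatch rather than as "not found".
    """
    if rid_tables.get(rid) != expected:
        raise DerivaMLTableTypeError(expected, rid)
    return rid
```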

DerivaMLUploadError

Bases: DerivaMLExecutionError

Exception raised for asset upload failures.

Raised when uploading assets to the catalog fails, including file uploads, metadata insertion, and provenance recording.

Example

raise DerivaMLUploadError("Failed to upload execution assets") # doctest: +SKIP

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLUploadError(DerivaMLExecutionError):
    """Exception raised for asset upload failures.

    Raised when uploading assets to the catalog fails, including file
    uploads, metadata insertion, and provenance recording.

    Example:
        >>> raise DerivaMLUploadError("Failed to upload execution assets")  # doctest: +SKIP
    """

    pass
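When an upload fails because of a lower-level error, chaining with "raise ... from err" keeps the root cause in the traceback while callers still catch a single DerivaML type. In the sketch, upload_assets is hypothetical and "uploading" only means reading each file; the exception classes are stand-ins with the documented bases.

```python
from pathlib import Path


class DerivaMLException(Exception):
    pass


class DerivaMLExecutionError(DerivaMLException):
    pass


class DerivaMLUploadError(DerivaMLExecutionError):
    pass


def upload_assets(paths: list[Path]) -> int:
    """Hypothetical uploader: returns the total number of bytes 'sent'."""
    total = 0
    for path in paths:
        try:
            total += len(path.read_bytes())
        except OSError as err:
            # Chain the original error so the root cause stays visible.
            raise DerivaMLUploadError(f"Failed to upload execution asset {path}") from err
    return total
```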

DerivaMLValidationError

Bases: DerivaMLDataError

Exception raised when data validation fails.

Raised when input data fails validation, such as invalid RID format, mismatched metadata, or constraint violations.

Example

raise DerivaMLValidationError("Invalid RID format: ABC") # doctest: +SKIP

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLValidationError(DerivaMLDataError):
    """Exception raised when data validation fails.

    Raised when input data fails validation, such as invalid RID format,
    mismatched metadata, or constraint violations.

    Example:
        >>> raise DerivaMLValidationError("Invalid RID format: ABC")  # doctest: +SKIP
    """

    pass
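A validator in this style returns its input unchanged on success and raises DerivaMLValidationError with the offending value otherwise. The RID pattern below is a simplified assumption chosen to accept the RIDs used as examples on this page and reject "ABC"; it is not ERMrest's exact RID grammar, and validate_rid is hypothetical.

```python
import re


class DerivaMLException(Exception):
    pass


class DerivaMLDataError(DerivaMLException):
    pass


class DerivaMLValidationError(DerivaMLDataError):
    pass


# Assumed, simplified RID shape: uppercase alphanumeric groups joined by
# hyphens, with at least one hyphen. Not the authoritative RID format.
_RID_PATTERN = re.compile(r"[0-9A-Z]+(?:-[0-9A-Z]+)+")


def validate_rid(rid: str) -> str:
    """Hypothetical validator: return the RID unchanged or raise."""
    if not _RID_PATTERN.fullmatch(rid):
        raise DerivaMLValidationError(f"Invalid RID format: {rid}")
    return rid
```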

DerivaMLWorkflowError

Bases: DerivaMLExecutionError

Exception raised for workflow-related issues.

Raised when there are problems with workflow lookup, creation, or Git integration for workflow tracking.

Example

raise DerivaMLWorkflowError("Not executing in a Git repository") # doctest: +SKIP

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLWorkflowError(DerivaMLExecutionError):
    """Exception raised for workflow-related issues.

    Raised when there are problems with workflow lookup, creation, or
    Git integration for workflow tracking.

    Example:
        >>> raise DerivaMLWorkflowError("Not executing in a Git repository")  # doctest: +SKIP
    """

    pass
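Workflow tracking needs to know which Git repository the code runs from, and the documented message covers the case where there is none. The find_git_root helper below is a hypothetical, simplified check that walks up from a starting directory looking for a .git entry; it ignores GIT_DIR and other ways Git can locate a repository, and is not how DerivaML itself performs the check.

```python
from pathlib import Path


class DerivaMLException(Exception):
    pass


class DerivaMLExecutionError(DerivaMLException):
    pass


class DerivaMLWorkflowError(DerivaMLExecutionError):
    pass


def find_git_root(start: Path) -> Path:
    """Hypothetical check: return the nearest ancestor containing .git."""
    start = start.resolve()
    for directory in (start, *start.parents):
        # .git is a directory in normal clones and a file in worktrees/submodules.
        if (directory / ".git").exists():
            return directory
    raise DerivaMLWorkflowError("Not executing in a Git repository")
```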