Documentation for Lineage models in DerivaML

The lookup_lineage() method on DerivaML returns a tree of provenance information for any artifact RID (Dataset, Asset, Feature value, or Execution). The Pydantic models that shape the response are defined in deriva_ml.execution.lineage.

The user guide covers common patterns, depth control, cycle handling, and the data-flow-vs-orchestration distinction (ADR-0001) in Running an experiment: How to trace an artifact's lineage.

The method itself is documented on the DerivaML class: DerivaML — lookup_lineage.

Pydantic models for the lineage walk returned by lookup_lineage.

Each LineageResult describes the data-flow provenance chain behind a single artifact (Dataset, Asset, Feature value, or Execution). The walk follows producing-execution edges through consumed inputs (datasets and assets) and explicitly does NOT walk Execution_Execution orchestration links; see docs/adr/0001-lineage-walks-data-flow-not-orchestration.md.

The models live in their own module so they can cross a boundary: the deriva-ml-mcp Round 6 follow-up serializes them with .model_dump() from a tool wrapper, and downstream agents (notebook, skill, web app) consume the JSON.
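
As a rough illustration of the consuming side, the sketch below parses a JSON payload shaped like the output of LineageResult.model_dump() and reads a few fields from it. The payload is hand-written for this example (the RIDs, workflow name, and descriptions are made up), and summarize_lineage_payload is a hypothetical helper, not part of deriva-ml.

```python
import json

# Hand-written payload mimicking LineageResult.model_dump(); every value is illustrative.
PAYLOAD = """
{
  "root": {
    "rid": "3-XYZ",
    "type": "Asset",
    "description": null,
    "producing_execution": {
      "rid": "2-ABC",
      "description": "train model",
      "workflow": {"rid": "1-WF1", "name": "Training"},
      "status": "Uploaded"
    }
  },
  "lineage": {
    "execution": {
      "rid": "2-ABC",
      "description": "train model",
      "workflow": {"rid": "1-WF1", "name": "Training"},
      "status": "Uploaded"
    },
    "consumed_datasets": [
      {"rid": "1-DS1", "description": "training set", "version": "0.1.0"}
    ],
    "consumed_assets": [],
    "parents": [],
    "already_shown": false
  },
  "executions_visited": 1,
  "walked_complete": true,
  "cycle_detected": false,
  "depth_capped": false
}
"""


def summarize_lineage_payload(payload: dict) -> str:
    """Return a one-line summary of a serialized lineage result (hypothetical helper)."""
    root = payload["root"]
    node = payload.get("lineage")
    if node is None:
        # lineage is None when the root has no recorded producer.
        return f"{root['type']} {root['rid']}: no recorded producer"
    execution = node["execution"]
    # workflow may be null, and a workflow's name may be null as well.
    workflow = execution.get("workflow") or {}
    name = workflow.get("name") or "unnamed workflow"
    return (
        f"{root['type']} {root['rid']}: produced by {execution['rid']} ({name}), "
        f"{payload['executions_visited']} execution(s) visited"
    )


summary = summarize_lineage_payload(json.loads(PAYLOAD))
print(summary)
```

Because every field is optional-aware in the models (workflow and name can both be None), a consumer working on the JSON form has to tolerate null at each of those positions, which is what the two `or` fallbacks do here.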

Example

Inspect the producer of the immediate node::

>>> result = ml.lookup_lineage("3-XYZ", depth=0)  # doctest: +SKIP
>>> producer = result.lineage.execution  # doctest: +SKIP
>>> print(producer.rid, producer.workflow.name if producer.workflow else None)  # doctest: +SKIP

AssetSummary

Bases: BaseModel

Compact view of a consumed Asset.

Attributes:

Name	Type	Description
`rid`	`RID`	Asset RID.
`filename`	`str | None`	Original filename (may be empty if the asset row has no filename column populated).
`asset_table`	`str`	Name of the asset table the row lives in (e.g. `"Image"`, `"Execution_Asset"`).

Source code in src/deriva_ml/execution/lineage.py

class AssetSummary(BaseModel):
    """Compact view of a consumed Asset.

    Attributes:
        rid: Asset RID.
        filename: Original filename (may be empty if the asset row
            has no filename column populated).
        asset_table: Name of the asset table the row lives in
            (e.g. ``"Image"``, ``"Execution_Asset"``).
    """

    model_config = ConfigDict(extra="forbid")

    rid: RID
    filename: str | None = None
    asset_table: str

DatasetSummary

Bases: BaseModel

Compact view of a consumed Dataset.

Attributes:

Name	Type	Description
`rid`	`RID`	Dataset RID.
`description`	`str | None`	Dataset description (may be None or empty).
`version`	`str | None`	Current version at the time the lineage was walked (e.g. `"0.1.0"`). None if the dataset has no version history yet.

Source code in src/deriva_ml/execution/lineage.py

class DatasetSummary(BaseModel):
    """Compact view of a consumed Dataset.

    Attributes:
        rid: Dataset RID.
        description: Dataset description (may be None or empty).
        version: Current version at the time the lineage was walked
            (e.g. ``"0.1.0"``). None if the dataset has no version
            history yet.
    """

    model_config = ConfigDict(extra="forbid")

    rid: RID
    description: str | None = None
    version: str | None = None

ExecutionSummary

Bases: BaseModel

Compact view of an Execution row.

Surfaces just enough to identify the execution and decide whether to drill in. Use ml.lookup_execution(rid) for the live ExecutionRecord.

Attributes:

Name	Type	Description
`rid`	`RID`	Execution RID.
`description`	`str | None`	Execution description (may be None or empty).
`workflow`	`WorkflowSummary | None`	Compact workflow descriptor (None if the execution has no workflow link).
`status`	`str`	Catalog status string (e.g. `"Uploaded"`).

Source code in src/deriva_ml/execution/lineage.py

class ExecutionSummary(BaseModel):
    """Compact view of an Execution row.

    Surfaces just enough to identify the execution and decide whether
    to drill in. Use ``ml.lookup_execution(rid)`` for the live
    ``ExecutionRecord``.

    Attributes:
        rid: Execution RID.
        description: Execution description (may be None or empty).
        workflow: Compact workflow descriptor (None if the execution
            has no workflow link).
        status: Catalog status string (e.g. ``"Uploaded"``).
    """

    model_config = ConfigDict(extra="forbid")

    rid: RID
    description: str | None = None
    workflow: WorkflowSummary | None = None
    status: str

LineageNode

Bases: BaseModel

One execution node in the lineage tree.

Each node represents an execution that produced something further down the chain. parents holds the next layer up — the producing executions of this execution's consumed inputs.

Attributes:

Name	Type	Description
`execution`	`ExecutionSummary`	Compact execution descriptor for this node.
`consumed_datasets`	`list[DatasetSummary]`	Datasets this execution consumed as input.
`consumed_assets`	`list[AssetSummary]`	Assets this execution consumed as input (asset_role="Input" in the `<AssetTable>_Execution` association).
`parents`	`list['LineageNode']`	Producing executions of the consumed inputs. Deduplicated by execution RID.
`already_shown`	`bool`	True if this execution was already expanded elsewhere in the tree (diamond DAG marker). When True, `parents` is left empty to avoid re-walking; consumers should look up the original node by `execution.rid`.

Source code in src/deriva_ml/execution/lineage.py

class LineageNode(BaseModel):
    """One execution node in the lineage tree.

    Each node represents an execution that produced something further
    down the chain. ``parents`` holds the next layer up — the
    producing executions of this execution's consumed inputs.

    Attributes:
        execution: Compact execution descriptor for this node.
        consumed_datasets: Datasets this execution consumed as input.
        consumed_assets: Assets this execution consumed as input
            (asset_role="Input" in the ``<AssetTable>_Execution``
            association).
        parents: Producing executions of the consumed inputs.
            Deduplicated by execution RID.
        already_shown: True if this execution was already expanded
            elsewhere in the tree (diamond DAG marker). When True,
            ``parents`` is left empty to avoid re-walking; consumers
            should look up the original node by ``execution.rid``.
    """

    model_config = ConfigDict(extra="forbid")

    execution: ExecutionSummary
    consumed_datasets: list[DatasetSummary] = Field(default_factory=list)
    consumed_assets: list[AssetSummary] = Field(default_factory=list)
    parents: list["LineageNode"] = Field(default_factory=list)
    already_shown: bool = False
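
One way to traverse this structure is sketched below, using plain dicts shaped like the output of LineageNode.model_dump(). The helper names (iter_nodes, index_expanded, make_node) are hypothetical, and the diamond-shaped tree is invented for the demo; in particular, which branch carries the already_shown stub depends on the walk order, which is assumed here.

```python
from typing import Iterator


def iter_nodes(node: dict, level: int = 0) -> Iterator[tuple[int, dict]]:
    """Yield (level, node) pairs depth-first, starting at the given node."""
    yield level, node
    for parent in node["parents"]:
        yield from iter_nodes(parent, level + 1)


def index_expanded(root: dict) -> dict[str, dict]:
    """Map execution RID to the node that was actually expanded (not a stub)."""
    return {
        n["execution"]["rid"]: n
        for _, n in iter_nodes(root)
        if not n["already_shown"]
    }


def make_node(rid: str, parents=(), already_shown: bool = False) -> dict:
    """Build a minimal node dict for the demo (illustrative values only)."""
    return {
        "execution": {"rid": rid, "description": None, "workflow": None, "status": "Uploaded"},
        "consumed_datasets": [],
        "consumed_assets": [],
        "parents": list(parents),
        "already_shown": already_shown,
    }


# Diamond: E3 consumed outputs of E1 and E2, and both of those consumed output of E0.
# The second occurrence of E0 is a stub with already_shown=True and no parents.
tree = make_node("E3", [
    make_node("E1", [make_node("E0")]),
    make_node("E2", [make_node("E0", already_shown=True)]),
])

expanded = index_expanded(tree)
lines = []
for level, n in iter_nodes(tree):
    rid = n["execution"]["rid"]
    marker = " (see above)" if n["already_shown"] else ""
    lines.append("  " * level + rid + marker)
print("\n".join(lines))
```

The index built by index_expanded is what lets a consumer resolve a stub back to its original node by execution.rid, as the already_shown description suggests.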

LineageResult

Bases: BaseModel

Result returned by DerivaML.lookup_lineage.

Top-level transparency fields tell the caller whether the walk completed cleanly. walked_complete=False means the walk stopped before reaching the natural root of every branch, either because max_executions was hit or because a depth cap stopped the expansion.

Attributes:

Name	Type	Description
`root`	`RootDescriptor`	Descriptor of the artifact the walk started from.
`lineage`	`LineageNode | None`	Tree of producing executions, rooted at the immediate producer of `root`. None when the root has no recorded producer.
`executions_visited`	`int`	Number of distinct executions the walk expanded. Includes the root execution when present.
`walked_complete`	`bool`	True if the walk ran to the natural root of every branch. False if `max_executions` was hit or a depth cap stopped the expansion.
`cycle_detected`	`bool`	True if a true cycle was detected (the same execution appearing on its own active recursion path). Diamond DAGs (the same execution reached via two independent paths) are NOT cycles; they are handled by the `already_shown` flag on `LineageNode`.
`depth_capped`	`bool`	True if a positive `depth` argument prevented expansion of at least one branch.

Example

Walk lineage of an output asset and pretty-print the chain::

>>> result = ml.lookup_lineage("3JSE")  # doctest: +SKIP
>>> assert result.walked_complete  # doctest: +SKIP
>>> print(f"visited {result.executions_visited} executions")  # doctest: +SKIP

Source code in src/deriva_ml/execution/lineage.py

class LineageResult(BaseModel):
    """Result returned by :meth:`DerivaML.lookup_lineage`.

    Top-level transparency fields tell the caller whether the walk
    completed cleanly. ``walked_complete=False`` means the walk
    stopped before reaching the natural root of every branch, either
    because ``max_executions`` was hit or because a depth cap stopped
    the expansion.

    Attributes:
        root: Descriptor of the artifact the walk started from.
        lineage: Tree of producing executions, rooted at the
            immediate producer of ``root``. None when the root has
            no recorded producer.
        executions_visited: Number of distinct executions the walk
            expanded. Includes the root execution when present.
        walked_complete: True if the walk ran to the natural root of
            every branch. False if ``max_executions`` was hit or a
            depth cap stopped the expansion.
        cycle_detected: True if a true cycle was detected (the same
            execution appearing on its own active recursion path).
            Diamond DAGs (the same execution reached via two
            independent paths) are NOT cycles; they're handled by
            the ``already_shown`` flag on :class:`LineageNode`.
        depth_capped: True if a positive ``depth`` argument
            prevented expansion of at least one branch.

    Example:
        Walk lineage of an output asset and pretty-print the chain::

            >>> result = ml.lookup_lineage("3JSE")  # doctest: +SKIP
            >>> assert result.walked_complete  # doctest: +SKIP
            >>> print(f"visited {result.executions_visited} executions")  # doctest: +SKIP
    """

    model_config = ConfigDict(extra="forbid")

    root: RootDescriptor
    lineage: LineageNode | None = None
    executions_visited: int = 0
    walked_complete: bool = True
    cycle_detected: bool = False
    depth_capped: bool = False
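
The transparency fields can be folded into a short list of caveats before a tree is trusted as complete. The following is a minimal sketch: walk_caveats is a hypothetical helper, and the SimpleNamespace objects stand in for real LineageResult instances so the snippet runs without a catalog. It relies only on the attribute names documented above.

```python
from types import SimpleNamespace


def walk_caveats(result) -> list[str]:
    """List reasons the lineage tree may be partial (hypothetical helper)."""
    caveats = []
    if result.lineage is None:
        caveats.append("root has no recorded producer")
    if not result.walked_complete:
        caveats.append("walk stopped before the natural root of every branch")
    if result.depth_capped:
        caveats.append("depth argument prevented expansion of at least one branch")
    if result.cycle_detected:
        caveats.append("a true cycle was detected")
    return caveats


# Stand-ins for LineageResult; only the attributes read by walk_caveats are set.
clean = SimpleNamespace(
    lineage=object(), walked_complete=True, depth_capped=False, cycle_detected=False
)
capped = SimpleNamespace(
    lineage=object(), walked_complete=False, depth_capped=True, cycle_detected=False
)

print(walk_caveats(clean))
print(walk_caveats(capped))
```

Returning a list rather than raising keeps the decision with the caller: a notebook might simply print the caveats, while an automated check might refuse to proceed when the list is non-empty.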

RootDescriptor

Bases: BaseModel

Describes the artifact the lineage walk started from.

Attributes:

Name	Type	Description
`rid`	`RID`	RID of the root artifact passed to `lookup_lineage`.
`type`	`Literal['Dataset', 'Asset', 'Feature', 'Execution']`	One of `"Dataset"`, `"Asset"`, `"Feature"`, or `"Execution"`. Determines how `producing_execution` was resolved.
`description`	`str | None`	Description of the root, when available (Dataset.description, Asset.description, Execution.description). None for Feature values, which don't have a free-text description column at this layer.
`producing_execution`	`ExecutionSummary | None`	The execution that produced this artifact, or None if the artifact has no recorded producer (manually inserted data, etc.). For an Execution root, this is the execution itself.

Source code in src/deriva_ml/execution/lineage.py

class RootDescriptor(BaseModel):
    """Describes the artifact the lineage walk started from.

    Attributes:
        rid: RID of the root artifact passed to ``lookup_lineage``.
        type: One of ``"Dataset"``, ``"Asset"``, ``"Feature"``, or
            ``"Execution"``. Determines how
            ``producing_execution`` was resolved.
        description: Description of the root, when available
            (Dataset.description, Asset.description,
            Execution.description). None for Feature values, which
            don't have a free-text description column at this layer.
        producing_execution: The execution that produced this
            artifact, or None if the artifact has no recorded
            producer (manually inserted data, etc.). For an
            Execution root, this is the execution itself.
    """

    model_config = ConfigDict(extra="forbid")

    rid: RID
    type: Literal["Dataset", "Asset", "Feature", "Execution"]
    description: str | None = None
    producing_execution: ExecutionSummary | None = None
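
The three cases described for producing_execution (no recorded producer, an Execution root that is its own producer, and an artifact with a distinct producer) can be told apart as in the sketch below. describe_root is a hypothetical helper, and the SimpleNamespace objects are stand-ins for RootDescriptor and ExecutionSummary with made-up RIDs.

```python
from types import SimpleNamespace


def describe_root(root) -> str:
    """Summarize how the root artifact relates to its producer (hypothetical helper)."""
    if root.producing_execution is None:
        # E.g. manually inserted data with no recorded producer.
        return f"{root.type} {root.rid} has no recorded producer"
    if root.type == "Execution":
        # For an Execution root, producing_execution is the execution itself.
        return f"Execution {root.rid} is itself the start of the walk"
    return f"{root.type} {root.rid} was produced by execution {root.producing_execution.rid}"


# Stand-ins for RootDescriptor; RIDs are illustrative.
manual = SimpleNamespace(rid="1-DS1", type="Dataset", producing_execution=None)
asset = SimpleNamespace(
    rid="3-XYZ", type="Asset", producing_execution=SimpleNamespace(rid="2-ABC")
)
execution = SimpleNamespace(
    rid="2-ABC", type="Execution", producing_execution=SimpleNamespace(rid="2-ABC")
)

for root in (manual, asset, execution):
    print(describe_root(root))
```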

WorkflowSummary

Bases: BaseModel

Compact view of a Workflow row.

Only the fields a lineage consumer typically needs at a glance. Drill into the full record with ml.lookup_workflow(rid).

Attributes:

Name	Type	Description
`rid`	`RID`	Workflow RID.
`name`	`str | None`	Human-readable workflow name (None if the row has no name set).

Source code in src/deriva_ml/execution/lineage.py

class WorkflowSummary(BaseModel):
    """Compact view of a Workflow row.

    Only the fields a lineage consumer typically needs at a glance.
    Drill into the full record with ``ml.lookup_workflow(rid)``.

    Attributes:
        rid: Workflow RID.
        name: Human-readable workflow name (None if the row has no
            name set).
    """

    model_config = ConfigDict(extra="forbid")

    rid: RID
    name: str | None = None