Documentation for Lineage models in DerivaML

The lookup_lineage() method on DerivaML returns a tree of provenance information for any artifact RID (Dataset, Asset, Feature value, or Execution). The Pydantic models that shape the response are defined in deriva_ml.execution.lineage.

For the user-guide walkthrough, which covers common patterns, depth control, cycle handling, and the data-flow-vs-orchestration distinction (ADR-0001), see "How to trace an artifact's lineage" in Running an experiment.

The method itself is documented on the DerivaML class: DerivaML — lookup_lineage.

Pydantic models for the lineage walk returned by lookup_lineage.

Each LineageResult describes the data-flow provenance chain behind a single artifact (Dataset, Asset, Feature value, or Execution). The walk follows producing-execution edges through consumed inputs (datasets and assets) and explicitly does NOT walk Execution_Execution orchestration links; see docs/adr/0001-lineage-walks-data-flow-not-orchestration.md.

The models live in their own module so they can cross a boundary: the deriva-ml-mcp Round 6 follow-up serializes them with .model_dump() from a tool wrapper, and downstream agents (notebook, skill, web app) consume the JSON.
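
As a rough illustration of the consuming side, the sketch below parses a JSON payload shaped like the dumped models and summarizes the root. The payload is hand-written from the field names documented on this page; the RIDs, descriptions, and the `describe_root` helper are hypothetical, and it assumes RIDs serialize as plain strings.

```python
import json

# Hypothetical payload, hand-written to mirror the documented field names.
# The RIDs and descriptions are made up for illustration.
producer = {
    "rid": "2-ABC",
    "description": "train model",
    "workflow": {"rid": "1-WF", "name": "training"},
    "status": "Uploaded",
}
payload = json.dumps({
    "root": {
        "rid": "3JSE",
        "type": "Asset",
        "description": None,
        "producing_execution": producer,
    },
    "lineage": {
        "execution": producer,
        "consumed_datasets": [
            {"rid": "1-DS", "description": None, "version": "0.1.0"},
        ],
        "consumed_assets": [],
        "parents": [],
        "already_shown": False,
    },
    "executions_visited": 1,
    "walked_complete": True,
    "cycle_detected": False,
    "depth_capped": False,
})


def describe_root(result: dict) -> str:
    """Summarize the root artifact and its producer (hypothetical helper)."""
    root = result["root"]
    producer = root["producing_execution"]
    if producer is None:
        # Documented case: manually inserted data has no recorded producer.
        return f'{root["type"]} {root["rid"]}: no recorded producer'
    workflow = producer["workflow"]
    name = workflow["name"] if workflow else None
    return f'{root["type"]} {root["rid"]}: produced by {producer["rid"]} ({name})'


result = json.loads(payload)
print(describe_root(result))
```

Working on plain dictionaries keeps a consumer independent of the deriva_ml package, at the cost of losing the validation the Pydantic models provide.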

Example

Inspect the producer of the immediate node::

>>> result = ml.lookup_lineage("3-XYZ", depth=0)  # doctest: +SKIP
>>> if result.lineage is not None:  # None when the root has no recorded producer
...     producer = result.lineage.execution
...     print(producer.rid, producer.workflow.name if producer.workflow else None)

AssetSummary

Bases: BaseModel

Compact view of a consumed Asset.

Attributes:

rid (RID): Asset RID.
filename (str | None): Original filename (may be empty if the asset row has no filename column populated).
asset_table (str): Name of the asset table the row lives in (e.g. "Image", "Execution_Asset").

Source code in src/deriva_ml/execution/lineage.py
class AssetSummary(BaseModel):
    """Compact view of a consumed Asset.

    Attributes:
        rid: Asset RID.
        filename: Original filename (may be empty if the asset row
            has no filename column populated).
        asset_table: Name of the asset table the row lives in
            (e.g. ``"Image"``, ``"Execution_Asset"``).
    """

    model_config = ConfigDict(extra="forbid")

    rid: RID
    filename: str | None = None
    asset_table: str
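
Because filename may be None or empty, a consumer usually needs a fallback label. The sketch below groups the consumed assets of one node by asset_table and falls back to the RID when no filename is set. It works on dictionaries shaped like the dumped models; the `assets_by_table` helper and the sample RIDs and filenames are hypothetical.

```python
from collections import defaultdict


def assets_by_table(node: dict) -> dict[str, list[str]]:
    """Group a node's consumed assets by asset table (hypothetical helper)."""
    grouped = defaultdict(list)
    for asset in node["consumed_assets"]:
        # filename may be None or "", so fall back to the RID as a label.
        label = asset["filename"] or asset["rid"]
        grouped[asset["asset_table"]].append(label)
    return dict(grouped)


# Made-up sample node; only the field this helper reads is included.
node = {
    "consumed_assets": [
        {"rid": "4-A1", "filename": "scan_001.png", "asset_table": "Image"},
        {"rid": "4-A2", "filename": None, "asset_table": "Image"},
        {"rid": "4-A3", "filename": "weights.pt", "asset_table": "Execution_Asset"},
    ]
}
print(assets_by_table(node))
```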

DatasetSummary

Bases: BaseModel

Compact view of a consumed Dataset.

Attributes:

rid (RID): Dataset RID.
description (str | None): Dataset description (may be None or empty).
version (str | None): Current version at the time the lineage was walked (e.g. "0.1.0"). None if the dataset has no version history yet.

Source code in src/deriva_ml/execution/lineage.py
class DatasetSummary(BaseModel):
    """Compact view of a consumed Dataset.

    Attributes:
        rid: Dataset RID.
        description: Dataset description (may be None or empty).
        version: Current version at the time the lineage was walked
            (e.g. ``"0.1.0"``). None if the dataset has no version
            history yet.
    """

    model_config = ConfigDict(extra="forbid")

    rid: RID
    description: str | None = None
    version: str | None = None

ExecutionSummary

Bases: BaseModel

Compact view of an Execution row.

Surfaces just enough to identify the execution and decide whether to drill in. Use ml.lookup_execution(rid) for the live ExecutionRecord.

Attributes:

rid (RID): Execution RID.
description (str | None): Execution description (may be None or empty).
workflow (WorkflowSummary | None): Compact workflow descriptor (None if the execution has no workflow link).
status (str): Catalog status string (e.g. "Uploaded").

Source code in src/deriva_ml/execution/lineage.py
class ExecutionSummary(BaseModel):
    """Compact view of an Execution row.

    Surfaces just enough to identify the execution and decide whether
    to drill in. Use ``ml.lookup_execution(rid)`` for the live
    ``ExecutionRecord``.

    Attributes:
        rid: Execution RID.
        description: Execution description (may be None or empty).
        workflow: Compact workflow descriptor (None if the execution
            has no workflow link).
        status: Catalog status string (e.g. ``"Uploaded"``).
    """

    model_config = ConfigDict(extra="forbid")

    rid: RID
    description: str | None = None
    workflow: WorkflowSummary | None = None
    status: str

LineageNode

Bases: BaseModel

One execution node in the lineage tree.

Each node represents an execution that produced something further down the chain. parents holds the next layer up — the producing executions of this execution's consumed inputs.

Attributes:

execution (ExecutionSummary): Compact execution descriptor for this node.
consumed_datasets (list[DatasetSummary]): Datasets this execution consumed as input.
consumed_assets (list[AssetSummary]): Assets this execution consumed as input (asset_role="Input" in the <AssetTable>_Execution association).
parents (list['LineageNode']): Producing executions of the consumed inputs. Deduplicated by execution RID.
already_shown (bool): True if this execution was already expanded elsewhere in the tree (diamond DAG marker). When True, parents is left empty to avoid re-walking; consumers should look up the original node by execution.rid.

Source code in src/deriva_ml/execution/lineage.py
class LineageNode(BaseModel):
    """One execution node in the lineage tree.

    Each node represents an execution that produced something further
    down the chain. ``parents`` holds the next layer up — the
    producing executions of this execution's consumed inputs.

    Attributes:
        execution: Compact execution descriptor for this node.
        consumed_datasets: Datasets this execution consumed as input.
        consumed_assets: Assets this execution consumed as input
            (asset_role="Input" in the ``<AssetTable>_Execution``
            association).
        parents: Producing executions of the consumed inputs.
            Deduplicated by execution RID.
        already_shown: True if this execution was already expanded
            elsewhere in the tree (diamond DAG marker). When True,
            ``parents`` is left empty to avoid re-walking; consumers
            should look up the original node by ``execution.rid``.
    """

    model_config = ConfigDict(extra="forbid")

    execution: ExecutionSummary
    consumed_datasets: list[DatasetSummary] = Field(default_factory=list)
    consumed_assets: list[AssetSummary] = Field(default_factory=list)
    parents: list["LineageNode"] = Field(default_factory=list)
    already_shown: bool = False

LineageResult

Bases: BaseModel

Result returned by DerivaML.lookup_lineage.

Top-level transparency fields tell the caller whether the walk completed cleanly. walked_complete=False means the walk stopped before reaching the natural root of every branch, either because it hit the max_executions cap or because a depth cap stopped the expansion.

Attributes:

root (RootDescriptor): Descriptor of the artifact the walk started from.
lineage (LineageNode | None): Tree of producing executions, rooted at the immediate producer of root. None when the root has no recorded producer.
executions_visited (int): Number of distinct executions the walk expanded. Includes the root execution when present.
walked_complete (bool): True if the walk ran to the natural root of every branch. False if max_executions was hit or a depth cap stopped the expansion.
cycle_detected (bool): True if a true cycle was detected (the same execution appearing on its own active recursion path). Diamond DAGs (the same execution reached via two independent paths) are NOT cycles; they're handled by the already_shown flag on LineageNode.
depth_capped (bool): True if a positive depth argument prevented expansion of at least one branch.

Example

Walk the lineage of an output asset and report how many executions were visited::

>>> result = ml.lookup_lineage("3JSE")  # doctest: +SKIP
>>> assert result.walked_complete
>>> print(f"visited {result.executions_visited} executions")
Source code in src/deriva_ml/execution/lineage.py
class LineageResult(BaseModel):
    """Result returned by :meth:`DerivaML.lookup_lineage`.

    Top-level transparency fields tell the caller whether the walk
    completed cleanly. ``walked_complete=False`` means the walk
    stopped before reaching the natural root of every branch, either
    because it hit the ``max_executions`` cap or because a depth cap
    stopped the expansion.

    Attributes:
        root: Descriptor of the artifact the walk started from.
        lineage: Tree of producing executions, rooted at the
            immediate producer of ``root``. None when the root has
            no recorded producer.
        executions_visited: Number of distinct executions the walk
            expanded. Includes the root execution when present.
        walked_complete: True if the walk ran to the natural root of
            every branch. False if ``max_executions`` was hit or a
            depth cap stopped the expansion.
        cycle_detected: True if a true cycle was detected (the same
            execution appearing on its own active recursion path).
            Diamond DAGs (the same execution reached via two
            independent paths) are NOT cycles; they're handled by
            the ``already_shown`` flag on :class:`LineageNode`.
        depth_capped: True if a positive ``depth`` argument
            prevented expansion of at least one branch.

    Example:
        Walk the lineage of an output asset and report how many
        executions were visited::

            >>> result = ml.lookup_lineage("3JSE")  # doctest: +SKIP
            >>> assert result.walked_complete
            >>> print(f"visited {result.executions_visited} executions")
    """

    model_config = ConfigDict(extra="forbid")

    root: RootDescriptor
    lineage: LineageNode | None = None
    executions_visited: int = 0
    walked_complete: bool = True
    cycle_detected: bool = False
    depth_capped: bool = False
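
One way a caller might turn the transparency fields into user-facing notes is sketched below. It reads a dictionary shaped like the dumped LineageResult; the `walk_warnings` helper and its wording are hypothetical. Attributing an incomplete walk to max_executions when depth_capped is False is an inference from the field descriptions above, not confirmed behaviour.

```python
def walk_warnings(result: dict) -> list[str]:
    """Translate the transparency flags into messages (hypothetical helper)."""
    warnings = []
    if result["lineage"] is None:
        warnings.append("root has no recorded producer")
    if result["cycle_detected"]:
        warnings.append("cycle detected in the provenance graph")
    if result["depth_capped"]:
        warnings.append("depth cap stopped at least one branch")
    if not result["walked_complete"] and not result["depth_capped"]:
        # Assumption: the only other documented cause is the max_executions cap.
        warnings.append("walk incomplete; max_executions was probably hit")
    return warnings


# Made-up flag combinations; "lineage" only needs to be non-None here.
clean = {"lineage": {}, "walked_complete": True,
         "cycle_detected": False, "depth_capped": False}
capped = {"lineage": {}, "walked_complete": False,
          "cycle_detected": False, "depth_capped": True}

print(walk_warnings(clean))
print(walk_warnings(capped))
```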

RootDescriptor

Bases: BaseModel

Describes the artifact the lineage walk started from.

Attributes:

rid (RID): RID of the root artifact passed to lookup_lineage.
type (Literal['Dataset', 'Asset', 'Feature', 'Execution']): One of "Dataset", "Asset", "Feature", or "Execution". Determines how producing_execution was resolved.
description (str | None): Description of the root, when available (Dataset.description, Asset.description, Execution.description). None for Feature values, which don't have a free-text description column at this layer.
producing_execution (ExecutionSummary | None): The execution that produced this artifact, or None if the artifact has no recorded producer (manually inserted data, etc.). For an Execution root, this is the execution itself.

Source code in src/deriva_ml/execution/lineage.py
class RootDescriptor(BaseModel):
    """Describes the artifact the lineage walk started from.

    Attributes:
        rid: RID of the root artifact passed to ``lookup_lineage``.
        type: One of ``"Dataset"``, ``"Asset"``, ``"Feature"``, or
            ``"Execution"``. Determines how
            ``producing_execution`` was resolved.
        description: Description of the root, when available
            (Dataset.description, Asset.description,
            Execution.description). None for Feature values, which
            don't have a free-text description column at this layer.
        producing_execution: The execution that produced this
            artifact, or None if the artifact has no recorded
            producer (manually inserted data, etc.). For an
            Execution root, this is the execution itself.
    """

    model_config = ConfigDict(extra="forbid")

    rid: RID
    type: Literal["Dataset", "Asset", "Feature", "Execution"]
    description: str | None = None
    producing_execution: ExecutionSummary | None = None

WorkflowSummary

Bases: BaseModel

Compact view of a Workflow row.

Only the fields a lineage consumer typically needs at a glance. Drill into the full record with ml.lookup_workflow(rid).

Attributes:

rid (RID): Workflow RID.
name (str | None): Human-readable workflow name (None if the row has no name set).

Source code in src/deriva_ml/execution/lineage.py
class WorkflowSummary(BaseModel):
    """Compact view of a Workflow row.

    Only the fields a lineage consumer typically needs at a glance.
    Drill into the full record with ``ml.lookup_workflow(rid)``.

    Attributes:
        rid: Workflow RID.
        name: Human-readable workflow name (None if the row has no
            name set).
    """

    model_config = ConfigDict(extra="forbid")

    rid: RID
    name: str | None = None