Documentation for Lineage models in DerivaML
The lookup_lineage() method on DerivaML returns a tree of provenance
information for any artifact RID (Dataset, Asset, Feature value, or
Execution). The Pydantic models that shape the response are defined in
deriva_ml.execution.lineage.
For the user-guide walkthrough, which covers common patterns, depth control, cycle handling, and the data-flow-vs-orchestration distinction (ADR-0001), see "Running an experiment: How to trace an artifact's lineage".
The method itself is documented on the DerivaML class:
DerivaML — lookup_lineage.
Pydantic models for the lineage walk returned by lookup_lineage.
Each `LineageResult` describes the data-flow provenance chain
behind a single artifact (Dataset, Asset, Feature value, or
Execution). The walk follows producing-execution edges through
consumed inputs (datasets and assets) and explicitly does NOT walk
Execution_Execution orchestration links — see
docs/adr/0001-lineage-walks-data-flow-not-orchestration.md.
The models live in their own module so they can cross a boundary:
the deriva-ml-mcp Round 6 follow-up serializes them with
.model_dump() from a tool wrapper, and downstream agents
(notebook, skill, web app) consume the JSON.
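As a rough illustration of the consumer side, the sketch below walks a plain dictionary shaped like the `.model_dump()` output and collects execution RIDs. The payload is hand-written sample data with made-up RIDs, and `collect_execution_rids` is a hypothetical helper, not part of deriva-ml. Only the field names (`lineage`, `execution`, `rid`, `parents`) come from the models documented on this page.

```python
import json

# Hand-written sample payload; the RIDs are invented for illustration.
payload = json.loads("""
{
  "lineage": {
    "execution": {"rid": "EX-2"},
    "parents": [
      {"execution": {"rid": "EX-1"}, "parents": []}
    ]
  }
}
""")


def collect_execution_rids(node):
    """Return execution RIDs in pre-order, starting from ``node``.

    Hypothetical helper: ``node`` is a dict shaped like a dumped LineageNode.
    """
    if node is None:  # the `lineage` field is typed `LineageNode | None`
        return []
    rids = [node["execution"]["rid"]]
    for parent in node.get("parents", []):
        rids.extend(collect_execution_rids(parent))
    return rids


rids = collect_execution_rids(payload["lineage"])
print(rids)
```

Working on the dumped dictionary rather than the model objects keeps a consumer free of any deriva-ml import, which appears to be the point of the module boundary described above.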
Example
Inspect the producer of the immediate node::
>>> result = ml.lookup_lineage("3-XYZ", depth=0) # doctest: +SKIP
>>> producer = result.lineage.execution
>>> print(producer.rid, producer.workflow.name if producer.workflow else None)
AssetSummary
Bases: BaseModel
Compact view of a consumed Asset.
Attributes:
| Name | Type | Description |
|---|---|---|
| `rid` | `RID` | Asset RID. |
| `filename` | `str \| None` | Original filename (may be empty if the asset row has no filename column populated). |
| `asset_table` | `str` | Name of the asset table the row lives in (e.g. …). |
Source code in src/deriva_ml/execution/lineage.py
DatasetSummary
Bases: BaseModel
Compact view of a consumed Dataset.
Attributes:
| Name | Type | Description |
|---|---|---|
| `rid` | `RID` | Dataset RID. |
| `description` | `str \| None` | Dataset description (may be None or empty). |
| `version` | `str \| None` | Current version at the time the lineage was walked (e.g. …). |
Source code in src/deriva_ml/execution/lineage.py
ExecutionSummary
Bases: BaseModel
Compact view of an Execution row.
Surfaces just enough to identify the execution and decide whether
to drill in. Use ml.lookup_execution(rid) for the live
ExecutionRecord.
Attributes:
| Name | Type | Description |
|---|---|---|
| `rid` | `RID` | Execution RID. |
| `description` | `str \| None` | Execution description (may be None or empty). |
| `workflow` | `WorkflowSummary \| None` | Compact workflow descriptor (None if the execution has no workflow link). |
| `status` | `str` | Catalog status string (e.g. …). |
Source code in src/deriva_ml/execution/lineage.py
LineageNode
Bases: BaseModel
One execution node in the lineage tree.
Each node represents an execution that produced something further
down the chain. parents holds the next layer up — the
producing executions of this execution's consumed inputs.
Attributes:
| Name | Type | Description |
|---|---|---|
| `execution` | `ExecutionSummary` | Compact execution descriptor for this node. |
| `consumed_datasets` | `list[DatasetSummary]` | Datasets this execution consumed as input. |
| `consumed_assets` | `list[AssetSummary]` | Assets this execution consumed as input (`asset_role="Input"` in the …). |
| `parents` | `list['LineageNode']` | Producing executions of the consumed inputs. Deduplicated by execution RID. |
| `already_shown` | `bool` | True if this execution was already expanded elsewhere in the tree (diamond DAG marker). When True, … |
Source code in src/deriva_ml/execution/lineage.py
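To make the tree shape concrete, here is a minimal sketch that renders a dumped `LineageNode` dictionary as indented text and flags nodes marked `already_shown`. The `render` helper and the sample RIDs are hypothetical. The sample also assumes that a repeated node carries an empty `parents` list; the source description of that case is cut off, so treat that detail as a guess.

```python
def render(node, indent=0):
    """Return one text line per node, indented by depth in the tree."""
    marker = " (already shown)" if node.get("already_shown") else ""
    lines = ["  " * indent + node["execution"]["rid"] + marker]
    for parent in node.get("parents", []):
        lines.extend(render(parent, indent + 1))
    return lines


# Diamond: EX-3 consumed outputs of EX-2a and EX-2b, which both trace to EX-1.
ex1_first = {"execution": {"rid": "EX-1"}, "parents": [], "already_shown": False}
ex1_again = {"execution": {"rid": "EX-1"}, "parents": [], "already_shown": True}
tree = {
    "execution": {"rid": "EX-3"},
    "already_shown": False,
    "parents": [
        {"execution": {"rid": "EX-2a"}, "already_shown": False, "parents": [ex1_first]},
        {"execution": {"rid": "EX-2b"}, "already_shown": False, "parents": [ex1_again]},
    ],
}

rendered = render(tree)
print("\n".join(rendered))
```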
LineageResult
Bases: BaseModel
Result returned by `DerivaML.lookup_lineage`.
Top-level transparency fields tell the caller whether the walk
completed cleanly. walked_complete=False means the walk hit
one of the defensive caps (max_executions) before reaching
the root.
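A caller might triage these flags before trusting the tree. The sketch below is one possible way to do that on a dumped result dictionary; `walk_warnings` is a hypothetical helper, the sample values are invented, and the messages are illustrative wording rather than anything the library emits.

```python
def walk_warnings(result):
    """Return human-readable notes for flags that signal a partial walk."""
    notes = []
    if not result["walked_complete"]:
        notes.append("walk stopped before reaching the root of every branch")
    if result["cycle_detected"]:
        notes.append("a true cycle was detected")
    if result["depth_capped"]:
        notes.append("depth_capped is set")
    return notes


clean = {"walked_complete": True, "cycle_detected": False, "depth_capped": False}
capped = {"walked_complete": False, "cycle_detected": False, "depth_capped": True}

print(walk_warnings(clean))
print(walk_warnings(capped))
```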
Attributes:
| Name | Type | Description |
|---|---|---|
| `root` | `RootDescriptor` | Descriptor of the artifact the walk started from. |
| `lineage` | `LineageNode \| None` | Tree of producing executions, rooted at the immediate producer of … |
| `executions_visited` | `int` | Number of distinct executions the walk expanded. Includes the root execution when present. |
| `walked_complete` | `bool` | True if the walk ran to the natural root of every branch. False if … |
| `cycle_detected` | `bool` | True if a true cycle was detected (the same execution appearing on its own active recursion path). Diamond DAGs (the same execution reached via two independent paths) are NOT cycles; they're handled by the … |
| `depth_capped` | `bool` | True if a positive … |
Example
Walk lineage of an output asset and pretty-print the chain::
>>> result = ml.lookup_lineage("3JSE") # doctest: +SKIP
>>> assert result.walked_complete
>>> print(f"visited {result.executions_visited} executions")
Source code in src/deriva_ml/execution/lineage.py
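The difference between a true cycle and a diamond DAG can be shown with a toy walker. This is only an illustrative sketch, not the implementation in `lineage.py`: `toy_walk`, the adjacency-dict graph format, and the node names are all assumptions made for the example. A node found on the active recursion path counts as a cycle; a node that was expanded earlier on a different path counts as a diamond.

```python
def toy_walk(graph, start):
    """Walk ``graph`` (node -> list of parent nodes) depth-first from ``start``."""
    expanded = set()   # every node expanded so far, on any path
    active = []        # nodes on the current recursion path
    report = {"cycle_detected": False, "already_shown": []}

    def visit(rid):
        if rid in active:            # on its own recursion path: true cycle
            report["cycle_detected"] = True
            return
        if rid in expanded:          # reached again via another path: diamond
            report["already_shown"].append(rid)
            return
        expanded.add(rid)
        active.append(rid)
        for parent in graph.get(rid, []):
            visit(parent)
        active.pop()

    visit(start)
    report["executions_visited"] = len(expanded)
    return report


diamond = {"C": ["A", "B"], "A": ["R"], "B": ["R"], "R": []}
cycle = {"X": ["Y"], "Y": ["X"]}

diamond_report = toy_walk(diamond, "C")
cycle_report = toy_walk(cycle, "X")
print(diamond_report)
print(cycle_report)
```

The active-path check has to come first, because every node on the active path is also in the expanded set.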
RootDescriptor
Bases: BaseModel
Describes the artifact the lineage walk started from.
Attributes:
| Name | Type | Description |
|---|---|---|
| `rid` | `RID` | RID of the root artifact passed to … |
| `type` | `Literal['Dataset', 'Asset', 'Feature', 'Execution']` | One of … |
| `description` | `str \| None` | Description of the root, when available (Dataset.description, Asset.description, Execution.description). None for Feature values, which don't have a free-text description column at this layer. |
| `producing_execution` | `ExecutionSummary \| None` | The execution that produced this artifact, or None if the artifact has no recorded producer (manually inserted data, etc.). For an Execution root, this is the execution itself. |
Source code in src/deriva_ml/execution/lineage.py
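The three cases for `producing_execution` described above can be handled as in this sketch, which works on a dumped `RootDescriptor` dictionary. `describe_root` is a hypothetical helper and the RIDs are invented sample values.

```python
def describe_root(root):
    """Summarize a dumped RootDescriptor dict in one line."""
    producer = root.get("producing_execution")
    if producer is None:
        # e.g. manually inserted data with no recorded producer
        return f"{root['type']} {root['rid']}: no recorded producer"
    if root["type"] == "Execution":
        # for an Execution root, the producer is the execution itself
        return f"Execution {root['rid']}: root is the execution itself"
    return f"{root['type']} {root['rid']}: produced by {producer['rid']}"


manual = {"rid": "DS-1", "type": "Dataset", "producing_execution": None}
derived = {"rid": "AS-7", "type": "Asset", "producing_execution": {"rid": "EX-2"}}
execution = {"rid": "EX-2", "type": "Execution", "producing_execution": {"rid": "EX-2"}}

summaries = [describe_root(r) for r in (manual, derived, execution)]
print("\n".join(summaries))
```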
WorkflowSummary
Bases: BaseModel
Compact view of a Workflow row.
Only the fields a lineage consumer typically needs at a glance.
Drill into the full record with ml.lookup_workflow(rid).
Attributes:
| Name | Type | Description |
|---|---|---|
| `rid` | `RID` | Workflow RID. |
| `name` | `str \| None` | Human-readable workflow name (None if the row has no name set). |
Source code in src/deriva_ml/execution/lineage.py