DerivaML Class
The DerivaML class provides a range of methods to interact with a Deriva catalog.
These methods assume that the catalog contains both a deriva-ml schema and a domain schema.
Data Catalog: The catalog must include both the domain schema and a standard ML schema for effective data management.

- Domain schema: The domain schema includes the data collected or generated by domain-specific experiments or systems.
- ML schema: Each entity in the ML schema is designed to capture details of the ML development process. It includes the following tables:
- A Dataset represents a data collection, such as an aggregation of records identified for training, validation, and testing purposes.
- A Workflow represents a specific sequence of computational steps or human interactions.
- An Execution is an instance of a workflow that a user instantiates at a specific time.
- An Execution Asset is an output file that results from the execution of a workflow.
- An Execution Metadata is an asset entity for saving metadata files referencing a given execution.
Core module for DerivaML.
This module provides the primary public interface to DerivaML functionality. It exports the main DerivaML class along with configuration, definitions, and exceptions needed for interacting with Deriva-based ML catalogs.
Key exports
- DerivaML: Main class for catalog operations and ML workflow management.
- DerivaMLConfig: Configuration class for DerivaML instances.
- Exceptions: DerivaMLException and specialized exception types.
- Definitions: Type definitions, enums, and constants used throughout the package.
Example
```python
from deriva_ml.core import DerivaML, DerivaMLConfig

ml = DerivaML('deriva.example.org', 'my_catalog')
datasets = ml.find_datasets()
```
BuiltinTypes
module-attribute
BuiltinTypes = BuiltinType
Alias for BuiltinType from deriva.core.typed.
This maintains backwards compatibility with existing DerivaML code that uses the plural form 'BuiltinTypes'. New code should use BuiltinType directly.
ColumnDefinition
module-attribute
ColumnDefinition = ColumnDef
Alias for ColumnDef from deriva.core.typed.
This maintains backwards compatibility with existing DerivaML code. New code should use ColumnDef directly.
TableDefinition
module-attribute
TableDefinition = TableDef
Alias for TableDef from deriva.core.typed.
This maintains backwards compatibility with existing DerivaML code. New code should use TableDef directly.
DerivaML
Bases: PathBuilderMixin, RidResolutionMixin, VocabularyMixin, WorkflowMixin, FeatureMixin, DatasetMixin, AssetMixin, ExecutionMixin, FileMixin, AnnotationMixin, DerivaMLCatalog
Core class for machine learning operations on a Deriva catalog.
This class provides core functionality for managing ML workflows, features, and datasets in a Deriva catalog. It handles data versioning, feature management, vocabulary control, and execution tracking.
Attributes:
| Name | Type | Description |
|---|---|---|
| host_name | str | Hostname of the Deriva server (e.g., 'deriva.example.org'). |
| catalog_id | Union[str, int] | Catalog identifier or name. |
| domain_schema | str | Schema name for domain-specific tables and relationships. |
| model | DerivaModel | ERMRest model for the catalog. |
| working_dir | Path | Directory for storing computation data and results. |
| cache_dir | Path | Directory for caching downloaded datasets. |
| ml_schema | str | Schema name for ML-specific tables (default: 'deriva_ml'). |
| configuration | ExecutionConfiguration | Current execution configuration. |
| project_name | str | Name of the current project. |
| start_time | datetime | Timestamp when this instance was created. |
Example
```python
ml = DerivaML('deriva.example.org', 'my_catalog')
ml.create_feature('my_table', 'new_feature')
ml.add_term('vocabulary_table', 'new_term', description='Description of term')
```
Source code in src/deriva_ml/core/base.py
catalog_provenance
property
```python
catalog_provenance: CatalogProvenance | None
```
Get the provenance information for this catalog.
Returns provenance information if the catalog has it set. This includes information about how the catalog was created (clone, create, schema), who created it, when, and any workflow information.
For cloned catalogs, additional details about the clone operation are
available in the clone_details attribute.
Returns:
| Type | Description |
|---|---|
| CatalogProvenance \| None | CatalogProvenance if available, None otherwise. |
Example
```python
ml = DerivaML('localhost', '45')
prov = ml.catalog_provenance
if prov:
    print(f"Created: {prov.created_at} by {prov.created_by}")
    print(f"Method: {prov.creation_method.value}")
    if prov.is_clone:
        print(f"Cloned from: {prov.clone_details.source_hostname}")
```
mode
property
mode: ConnectionMode
Current connection mode.
Returns:
| Type | Description |
|---|---|
| ConnectionMode | The ConnectionMode this DerivaML instance was constructed with. Drives whether writes go live to the catalog (online) or stage in SQLite for later upload (offline). See spec §2.1. |
Example
```python
>>> ml.mode is ConnectionMode.online
True
```
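The online/offline behavior described above can be sketched with a stand-in enum. This is only an illustration of the two documented modes; the real ConnectionMode class lives in deriva_ml and is not redefined by user code:

```python
from enum import Enum


class ConnectionMode(Enum):
    # Stand-in for deriva_ml's ConnectionMode; only the two
    # documented modes are modeled here.
    online = "online"    # writes go live to the catalog
    offline = "offline"  # writes are staged in SQLite for later upload


def write_path(mode: ConnectionMode) -> str:
    """Describe where writes go for a given connection mode."""
    if mode is ConnectionMode.online:
        return "live catalog"
    return "local SQLite staging"


print(write_path(ConnectionMode.online))   # live catalog
print(write_path(ConnectionMode.offline))  # local SQLite staging
```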
working_data
property
working_data: Path
Return the working data directory path.
.. deprecated::
working_data is deprecated and will be removed in the next
major version. Use working_dir instead.
``working_dir`` is the canonical attribute; it is set during
execution initialization and contains all output assets, metadata,
and intermediate files for the current execution.
Returns:
| Type | Description |
|---|---|
| Path | Path to the working data directory (same as working_dir). |
Raises:
| Type | Description |
|---|---|
| DeprecationWarning | Always emitted at access time. |
Example
```python
exe.working_dir  # use this instead  # doctest: +SKIP
```
workspace
property
workspace: 'Workspace'
Per-catalog Workspace for local caching, denormalization, and asset manifests.
Backed by Workspace under {working_dir}/catalogs/{host}__{cat}/
working.db. Shared across invocations of scripts that use the same
working directory.
Example::

```python
# Cache a full table
df = ml.cache_table("Subject")
# Check what's cached
ml.workspace.list_cached_results()
```
__del__
__del__() -> None
Cleanup method to handle incomplete executions.
Best-effort abort on DerivaML shutdown. The previous implementation
used the legacy Status enum; the new ExecutionStatus lifecycle
separates Stopped/Uploaded/Aborted/Failed. Here we only attempt to
abort if the execution hasn't already reached a terminal state —
InvalidTransitionError from the state machine covers the rest.
Source code in src/deriva_ml/core/base.py
__init__
```python
__init__(
    hostname: str,
    catalog_id: str | int,
    domain_schemas: str | set[str] | None = None,
    default_schema: str | None = None,
    project_name: str | None = None,
    cache_dir: str | Path | None = None,
    working_dir: str | Path | None = None,
    hydra_runtime_output_dir: str | Path | None = None,
    ml_schema: str = ML_SCHEMA,
    logging_level: int = logging.WARNING,
    deriva_logging_level: int = logging.WARNING,
    credential: dict | None = None,
    s3_bucket: str | None = None,
    use_minid: bool | None = None,
    check_auth: bool = True,
    clean_execution_dir: bool = True,
    mode: ConnectionMode | str = ConnectionMode.online,
) -> None
```
Initializes a DerivaML instance.
This method will connect to a catalog and initialize local configuration for the ML execution. This class is intended to be used as a base class on which domain-specific interfaces are built.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hostname | str | Hostname of the Deriva server. | required |
| catalog_id | str \| int | Catalog ID. Either an identifier or a catalog name. | required |
| domain_schemas | str \| set[str] \| None | Optional set of domain schema names. If None, auto-detects all non-system schemas. Use this when working with catalogs that have multiple user-defined schemas. | None |
| default_schema | str \| None | The default schema for table creation operations. If None and there is exactly one domain schema, that schema is used. If there are multiple domain schemas, this must be specified for table creation to work without explicit schema parameters. | None |
| ml_schema | str | Schema name for the ML schema. Used if you have a non-standard configuration of deriva-ml. | ML_SCHEMA |
| project_name | str \| None | Project name. Defaults to the name of default_schema. | None |
| cache_dir | str \| Path \| None | Directory path for caching data downloaded from the Deriva server as bdbag. If not provided, will default to working_dir. | None |
| working_dir | str \| Path \| None | Directory path for storing data used by or generated by any computations. If no value is provided, will default to ${HOME}/deriva_ml. | None |
| s3_bucket | str \| None | S3 bucket URL for dataset bag storage (e.g., 's3://my-bucket'). If provided, enables MINID creation and S3 upload for dataset exports. If None, MINID functionality is disabled regardless of the use_minid setting. | None |
| use_minid | bool \| None | Use the MINID service when downloading dataset bags. Only effective when s3_bucket is configured. If None (default), automatically set to True when s3_bucket is provided, False otherwise. | None |
| check_auth | bool | Check if the user has access to the catalog. | True |
| clean_execution_dir | bool | Whether to automatically clean up execution working directories after successful upload. Defaults to True. Set to False to retain local copies. | True |
| mode | ConnectionMode \| str | Connection mode for this instance. | online |
Source code in src/deriva_ml/core/base.py
add_dataset_element_type
```python
add_dataset_element_type(element: str | Table) -> Table
```
Makes it possible to add objects from the element table to a dataset.
Creates a new association table linking Dataset to the given table, then updates catalog annotations so the new type is included in bag-export specs. If the workspace ORM was already built, it is rebuilt to pick up the new association table — the ORM is eagerly constructed at init time and does not see DDL changes applied after that point.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| element | str \| Table | Name of the table (str) or Table object to register as a valid dataset element type. | required |
Returns:
| Type | Description |
|---|---|
| Table | The Table object that was registered. |
Raises:
| Type | Description |
|---|---|
| DerivaMLException | If … |
| DerivaMLTableTypeError | If the table is a system or ML table and cannot be a dataset element type. |
Example
```python
ml.add_dataset_element_type("Image")  # doctest: +SKIP
```
Source code in src/deriva_ml/core/mixins/dataset.py
add_features
add_features(*args, **kwargs) -> int
Retired — use exe.add_features(records) inside an execution context.
DerivaML.add_features has been removed. Feature writes must go
through the execution context so that provenance is tracked and values
are staged for atomic upload.
Replacement::

```python
with ml.create_execution(config).execute() as exe:
    exe.add_features(records)
```
Raises:
| Type | Description |
|---|---|
| DerivaMLException | Always. Points at the replacement API. |
Source code in src/deriva_ml/core/mixins/feature.py
add_files
```python
add_files(
    files: Iterable[FileSpec],
    execution_rid: RID,
    dataset_types: str | list[str] | None = None,
    description: str = "",
) -> "Dataset"
```
Adds files to the catalog with their metadata.
Registers files in the catalog along with their metadata (MD5, length, URL) and associates them with specified file types. Links files to the specified execution record for provenance tracking.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| files | Iterable[FileSpec] | File specifications containing MD5 checksum, length, and URL. | required |
| execution_rid | RID | Execution RID to associate files with (required for provenance). | required |
| dataset_types | str \| list[str] \| None | One or more dataset type terms from the File_Type vocabulary. | None |
| description | str | Description of the files. | '' |
Returns:
| Name | Type | Description |
|---|---|---|
| Dataset | Dataset | Dataset that represents the newly added files. |
Raises:
| Type | Description |
|---|---|
| DerivaMLException | If file_types are invalid or execution_rid is not an execution record. |
Examples:
Add files via an execution:
```python
>>> with ml.create_execution(config) as exe:  # doctest: +SKIP
...     files = [FileSpec(url="path/to/file.txt", md5="abc123", length=1000)]
...     dataset = exe.add_files(files, dataset_types="text")
```
Source code in src/deriva_ml/core/mixins/file.py
add_term
```python
add_term(
    table: str | Table,
    term_name: str,
    description: str,
    synonyms: list[str] | None = None,
    exists_ok: bool = True,
) -> VocabularyTermHandle
```
Adds a term to a vocabulary table.
Creates a new standardized term with description and optional synonyms in a vocabulary table. Can either create a new term or return an existing one if it already exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table | str \| Table | Vocabulary table to add the term to (name or Table object). | required |
| term_name | str | Primary name of the term (must be unique within the vocabulary). | required |
| description | str | Explanation of the term's meaning and usage. | required |
| synonyms | list[str] \| None | Alternative names for the term. | None |
| exists_ok | bool | If True, return the existing term if found. If False, raise an error. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| VocabularyTermHandle | VocabularyTermHandle | Object representing the created or existing term, with methods to modify it in the catalog. |
Raises:
| Type | Description |
|---|---|
| DerivaMLException | If the term exists and exists_ok=False, or if the table is not a vocabulary table. |
Examples:
Add a new tissue type:
```python
>>> term = ml.add_term(  # doctest: +SKIP
...     table="tissue_types",
...     term_name="epithelial",
...     description="Epithelial tissue type",
...     synonyms=["epithelium"]
... )
>>> # Modify the term
>>> term.description = "Updated description"
>>> term.synonyms = ("epithelium", "epithelial_tissue")
```
Attempt to add an existing term:
```python
>>> term = ml.add_term("tissue_types", "epithelial", "...", exists_ok=True)  # doctest: +SKIP
```
Source code in src/deriva_ml/core/mixins/vocabulary.py
add_visible_column
```python
add_visible_column(
    table: str | Table,
    context: str,
    column: str | list[str] | dict[str, Any],
    position: int | None = None,
) -> list[Any]
```
Add a column to the visible-columns list for a specific context.
Convenience method for adding columns without replacing the entire
visible-columns annotation. Changes are staged until
apply_annotations() is called.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table | str \| Table | Table name (str) or Table object. | required |
| context | str | The context to modify (e.g., compact, detailed). | required |
| column | str \| list[str] \| dict[str, Any] | Column to add: a column name (str, e.g., "Description"), a foreign-key path (list), or a pseudo-column definition (dict). | required |
| position | int \| None | Position to insert at (0-indexed). If None, the column is appended to the end. | None |
Returns:
| Type | Description |
|---|---|
| list[Any] | The updated column list for the context. |
Raises:
| Type | Description |
|---|---|
| DerivaMLTableTypeError | If … |
| DerivaMLException | If … |
Example
```python
ml.add_visible_column("Image", "compact", "Description")  # doctest: +SKIP
ml.add_visible_column("Image", "detailed", ["domain", "Image_Subject_fkey"], 1)  # doctest: +SKIP
ml.apply_annotations()  # doctest: +SKIP
```
Source code in src/deriva_ml/core/mixins/annotation.py
add_visible_foreign_key
```python
add_visible_foreign_key(
    table: str | Table,
    context: str,
    foreign_key: list[str] | dict[str, Any],
    position: int | None = None,
) -> list[Any]
```
Add a foreign key to the visible-foreign-keys list for a specific context.
Convenience method for adding related tables without replacing the
entire visible-foreign-keys annotation. Changes are staged until
apply_annotations() is called.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table | str \| Table | Table name (str) or Table object. | required |
| context | str | The context to modify (e.g., compact, detailed). | required |
| foreign_key | list[str] \| dict[str, Any] | Foreign key to add: an inbound FK reference as a [schema, constraint_name] list, or a pseudo-column definition (dict). | required |
| position | int \| None | Position to insert at (0-indexed). If None, the foreign key is appended to the end. | None |
Returns:
| Type | Description |
|---|---|
| list[Any] | The updated foreign key list for the context. |
Raises:
| Type | Description |
|---|---|
| DerivaMLTableTypeError | If … |
| DerivaMLException | If … |
Example
```python
ml.add_visible_foreign_key("Subject", "detailed", ["domain", "Image_Subject_fkey"])  # doctest: +SKIP
ml.apply_annotations()  # doctest: +SKIP
```
Source code in src/deriva_ml/core/mixins/annotation.py
apply_annotations
apply_annotations() -> None
Apply all staged annotation changes to the catalog.
Pushes any in-memory annotation changes to the live catalog. Must
be called after any sequence of set_* or add_*/remove_*
annotation calls to make changes visible in Chaise.
Raises:
| Type | Description |
|---|---|
| DerivaMLException | If the catalog is read-only or the apply call fails. |
Example
```python
ml.set_display_annotation("Image", {"name": "Scan"})  # doctest: +SKIP
ml.apply_annotations()  # doctest: +SKIP
```
Source code in src/deriva_ml/core/mixins/annotation.py
apply_catalog_annotations
```python
apply_catalog_annotations(
    navbar_brand_text: str = "ML Data Browser",
    head_title: str = "Catalog ML",
) -> None
```
Apply catalog-level annotations including the navigation bar and display settings.
This method configures the Chaise web interface for the catalog. Chaise is Deriva's web-based data browser that provides a user-friendly interface for exploring and managing catalog data. This method sets up annotations that control how Chaise displays and organizes the catalog.
Navigation Bar Structure: The method creates a navigation bar with the following menus:
- User Info: Links to Users, Groups, and RID Lease tables
- Deriva-ML: Core ML tables (Workflow, Execution, Dataset, Dataset_Version, etc.)
- WWW: Web content tables (Page, File)
- {Domain Schema}: All domain-specific tables (excludes vocabularies and associations)
- Vocabulary: All controlled vocabulary tables from both ML and domain schemas
- Assets: All asset tables from both ML and domain schemas
- Features: All feature tables with entries named "TableName:FeatureName"
- Catalog Registry: Link to the ermrest registry
- Documentation: Links to ML notebook instructions and Deriva-ML docs

Display Settings:
- Underscores in table/column names displayed as spaces
- System columns (RID) shown in compact and entry views
- Default table set to Dataset
- Faceting and record deletion enabled
- Export configurations available to all users
Bulk Upload Configuration: Configures upload patterns for asset tables, enabling drag-and-drop file uploads through the Chaise interface.
Call this after creating the domain schema and all tables to initialize the catalog's web interface. The navigation menus are dynamically built based on the current schema structure, automatically organizing tables into appropriate categories.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| navbar_brand_text | str | Text displayed in the navigation bar brand area. | 'ML Data Browser' |
| head_title | str | Title displayed in the browser tab. | 'Catalog ML' |
Example
```python
ml = DerivaML('deriva.example.org', 'my_catalog')
# After creating domain schema and tables...
ml.apply_catalog_annotations()
# Or with custom branding:
ml.apply_catalog_annotations("My Project Browser", "My ML Project")
```
Source code in src/deriva_ml/core/base.py
asset_record_class
```python
asset_record_class(asset_table_name: str) -> type
```
Create a dynamically generated Pydantic model for an asset table's metadata.
The returned class is a subclass of AssetRecord with fields derived from the asset table's metadata columns (non-system, non-standard-asset columns). Fields are typed according to their database column type, and nullable columns are Optional.
Follows the same pattern as Feature.feature_record_class().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| asset_table_name | str | Name of the asset table (e.g., "Image", "Model"). | required |
Returns:
| Type | Description |
|---|---|
| type | An AssetRecord subclass with validated fields matching the table's metadata. |
Example
```python
ImageAsset = ml.asset_record_class("Image")  # doctest: +SKIP
record = ImageAsset(Subject="2-DEF", Acquisition_Date="2026-01-15")  # doctest: +SKIP
path = exe.asset_file_path("Image", "scan.jpg", metadata=record)  # doctest: +SKIP
```
Source code in src/deriva_ml/core/mixins/asset.py
bag_info
bag_info(
dataset: "DatasetSpec",
) -> dict[str, Any]
Get comprehensive info about a dataset bag: size, contents, and cache status.
Combines the size estimate with local cache status. Use this to decide whether to prefetch a bag before running an experiment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | DatasetSpec | Specification of the dataset, including version and optional exclude_tables. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Dict with keys: tables (mapping table name to {row_count, is_asset, asset_bytes}); total_rows; total_asset_bytes; total_asset_size; cache_status (one of "not_cached", "cached_metadata_only", "cached_materialized", "cached_incomplete"); cache_path (local path to the cached bag if cached, else None). |
Source code in src/deriva_ml/core/mixins/dataset.py
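The cache_status values above can drive a simple prefetch decision. A minimal sketch using an illustrative result dict — the keys follow the documented return shape, but the values are made up, not from a real catalog:

```python
# Hypothetical bag_info() result; with a connected DerivaML instance this
# dict would come from ml.bag_info(dataset) instead.
info = {
    "total_rows": 12_000,
    "total_asset_bytes": 3_500_000_000,
    "cache_status": "cached_metadata_only",
    "cache_path": "/tmp/deriva-cache/1-ABC",
}

# Prefetch whenever asset files are not fully materialized locally.
needs_prefetch = info["cache_status"] in (
    "not_cached",
    "cached_metadata_only",
    "cached_incomplete",
)
print(needs_prefetch)  # True for this example
```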
cache_dataset
cache_dataset(
dataset: "DatasetSpec",
materialize: bool = True,
) -> dict[str, Any]
Download a dataset bag into the local cache without creating an execution.
Use this to warm the cache before running experiments. No execution or provenance records are created.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | DatasetSpec | Specification of the dataset, including version and optional exclude_tables. | required |
| materialize | bool | If True (default), download all asset files. If False, download only table metadata. | True |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Dict with bag_info results after caching. |
Source code in src/deriva_ml/core/mixins/dataset.py
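One way to combine bag_info and cache_dataset is to skip asset materialization for very large bags. A sketch under assumed values — the info dict and the 10 GB threshold are illustrative, and the final call requires a connected DerivaML instance:

```python
# Illustrative bag_info()-style result (values made up, not from a catalog).
info = {"total_asset_bytes": 50_000_000_000, "cache_status": "not_cached"}

# Assumed policy: fully materialize only bags under 10 GB of assets.
THRESHOLD_BYTES = 10_000_000_000
materialize = info["total_asset_bytes"] < THRESHOLD_BYTES
print(materialize)  # False: metadata-only caching for this large bag

# With a real connection and a DatasetSpec `spec`, one would then call:
# ml.cache_dataset(spec, materialize=materialize)
```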
cache_table
cache_table(
table_name: str, force: bool = False
) -> "pd.DataFrame"
Fetch a table from the catalog and cache locally as SQLite.
On first call, fetches all rows from the catalog and stores in the
working data cache. Subsequent calls return the cached data without
contacting the catalog. Use force=True to re-fetch.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table_name | str | Name of the table to fetch (e.g., "Subject", "Image"). | required |
| force | bool | If True, re-fetch even if already cached. | False |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | DataFrame with the table contents. |
Example::
subjects = ml.cache_table("Subject")
print(f"{len(subjects)} subjects")
# Second call returns cached data instantly
subjects = ml.cache_table("Subject")
Source code in src/deriva_ml/core/base.py
catalog_snapshot
catalog_snapshot(
version_snapshot: str,
) -> Self
Return a new DerivaML instance connected to a specific catalog snapshot.
Catalog snapshots provide a read-only, point-in-time view of the catalog. The snapshot identifier is typically obtained from a dataset version record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| version_snapshot | str | Snapshot identifier string (e.g., …). | required |
Returns:
| Type | Description |
|---|---|
| Self | A new DerivaML instance connected to the specified catalog snapshot. |
Source code in src/deriva_ml/core/base.py
chaise_url
chaise_url(
table: RID | Table | str,
) -> str
Generates a Chaise web interface URL.
Chaise is Deriva's web interface for data exploration. This method creates a URL that directly links to the specified table or record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table | RID \| Table \| str | Table to generate a URL for (name, Table object, or RID). | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | URL in format: https://{host}/chaise/recordset/#{catalog}/{schema}:{table} |
Raises:
| Type | Description |
|---|---|
| DerivaMLException | If the table or RID cannot be found. |
Examples:
Using a table name:
>>> ml.chaise_url("experiment_table")
'https://deriva.org/chaise/recordset/#1/schema:experiment_table'
Using a RID:
>>> ml.chaise_url("1-abc123")
Source code in src/deriva_ml/core/base.py
cite
cite(
entity: Dict[str, Any] | str,
current: bool = False,
) -> str
Generates citation URL for an entity.
Creates a URL that can be used to reference a specific entity in the catalog. By default, includes the catalog snapshot time to ensure version stability (permanent citation). With current=True, returns a URL to the current state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| entity | Dict[str, Any] \| str | Either a RID string or a dictionary containing entity data with a 'RID' key. | required |
| current | bool | If True, return a URL to the current catalog state (no snapshot). If False (default), return a permanent citation URL with snapshot time. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Citation URL. Format depends on the current flag. |
Raises:
| Type | Description |
|---|---|
| DerivaMLException | If the entity doesn't exist or lacks a RID. |
Examples:
Permanent citation (default):
>>> url = ml.cite("1-abc123")
>>> print(url)
'https://deriva.org/id/1/1-abc123@2024-01-01T12:00:00'
Current catalog URL:
>>> url = ml.cite("1-abc123", current=True)
>>> print(url)
'https://deriva.org/id/1/1-abc123'
Using a dictionary:
>>> url = ml.cite({"RID": "1-abc123"})
Source code in src/deriva_ml/core/base.py
clean_execution_dirs
clean_execution_dirs(
    older_than_days: int | None = None,
    exclude_rids: list[str] | None = None,
) -> dict[str, int]
Clean up execution working directories.
Removes execution output directories from the local working directory. Use this to free up disk space from completed or orphaned executions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| older_than_days | int \| None | If provided, only remove directories older than this many days. If None, removes all execution directories (except excluded). | None |
| exclude_rids | list[str] \| None | List of execution RIDs to preserve (never remove). | None |
Returns:
| Type | Description |
|---|---|
| dict[str, int] | Dict with keys: 'dirs_removed' (number of directories removed); 'bytes_freed' (total bytes freed); 'errors' (number of removal errors). |
Example
>>> ml = DerivaML('deriva.example.org', 'my_catalog')
>>> # Clean all execution dirs older than 30 days
>>> result = ml.clean_execution_dirs(older_than_days=30)
>>> print(f"Freed {result['bytes_freed'] / 1e9:.2f} GB")
>>> # Clean all except specific executions
>>> result = ml.clean_execution_dirs(exclude_rids=['1-ABC', '1-DEF'])
Source code in src/deriva_ml/core/base.py
clear_cache
clear_cache(
older_than_days: int | None = None,
) -> dict[str, int]
Clear the dataset cache directory.
Removes cached dataset bags from the cache directory. Can optionally filter by age to only remove old cache entries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| older_than_days | int \| None | If provided, only remove cache entries older than this many days. If None, removes all cache entries. | None |
Returns:
| Type | Description |
|---|---|
| dict[str, int] | Dict with keys: 'files_removed' (number of files removed); 'dirs_removed' (number of directories removed); 'bytes_freed' (total bytes freed); 'errors' (number of removal errors). |
Example
>>> ml = DerivaML('deriva.example.org', 'my_catalog')
>>> # Clear the entire cache
>>> result = ml.clear_cache()
>>> print(f"Freed {result['bytes_freed'] / 1e6:.1f} MB")
>>> # Clear cache entries older than 7 days
>>> result = ml.clear_cache(older_than_days=7)
Source code in src/deriva_ml/core/base.py
clear_vocabulary_cache
clear_vocabulary_cache(
table: str | Table | None = None,
) -> None
Clear the vocabulary term cache.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table | str \| Table \| None | If provided, only clear the cache for this specific vocabulary table. If None, clear the entire cache. | None |
Source code in src/deriva_ml/core/mixins/vocabulary.py
create_asset
create_asset(
    asset_name: str,
    column_defs: Iterable[ColumnDefinition] | None = None,
    fkey_defs: Iterable[ColumnDefinition] | None = None,
    referenced_tables: Iterable[Table] | None = None,
    comment: str = "",
    schema: str | None = None,
    update_navbar: bool = True,
) -> Table
Create a new asset table in the catalog.
Defines a Chaise-compatible asset table (Filename, URL, Length, MD5, Description, plus system columns) with optional additional metadata columns and foreign-key references. Registers the asset type in the Asset_Type vocabulary and optionally updates the Chaise navigation bar.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| asset_name | str | Name for the new asset table (e.g., "Image"). | required |
| column_defs | Iterable[ColumnDefinition] \| None | Extra metadata columns beyond the standard asset columns. Each is a ColumnDefinition. | None |
| fkey_defs | Iterable[ColumnDefinition] \| None | Foreign-key definitions from the asset table to other tables (e.g., linking images to a subject table). | None |
| referenced_tables | Iterable[Table] \| None | Tables that the new asset table should reference via FKs. Convenience alternative to fkey_defs. | None |
| comment | str | Human-readable description of the asset table, stored as the table comment in the catalog. | '' |
| schema | str \| None | Schema in which to create the table. Defaults to the domain schema. | None |
| update_navbar | bool | If True (default), updates the Chaise navigation bar to include the new asset table. | True |
Returns:
| Type | Description |
|---|---|
| Table | The newly created asset Table object. |
Raises:
| Type | Description |
|---|---|
| DerivaMLException | If a table named asset_name already exists. |
| DerivaMLSchemaError | If the target schema is invalid. |
Example
>>> from deriva.core.typed import Column, builtin_types
>>> ml.create_asset(
...     "ScanImage",
...     comment="MRI scan images",
... )
Source code in src/deriva_ml/core/mixins/asset.py
create_execution
create_execution(
configuration: "ExecutionConfiguration | None" = None,
*,
datasets: "list[DatasetSpec | str] | None" = None,
assets: "list[AssetSpec | str] | None" = None,
workflow: "Workflow | RID | str | None" = None,
description: "str | None" = None,
dry_run: bool = False,
) -> "Execution"
Create an execution environment.
Initializes a local compute environment for executing an ML or analytic routine. Accepts either a pre-built ExecutionConfiguration (the config-object form) or individual keyword arguments that the method assembles into an ExecutionConfiguration (the kwargs form). Mixing the two forms raises a TypeError; pick one.
Creating executions requires online mode because the Execution RID is server-assigned.
Side effects:
- Downloads datasets specified in the configuration to the cache directory. If no version is specified, creates a new minor version for the dataset.
- Downloads any execution assets to the working directory.
- Creates an execution record in the catalog (unless dry_run=True).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| configuration | ExecutionConfiguration \| None | A pre-built ExecutionConfiguration. If this is provided, all of the kwargs below (except dry_run) must be omitted. | None |
| datasets | list[DatasetSpec \| str] \| None | Kwargs form only. List of DatasetSpec objects or dataset identifier strings to download for the execution. | None |
| assets | list[AssetSpec \| str] \| None | Kwargs form only. List of AssetSpec objects or asset RID strings to download for the execution. | None |
| workflow | Workflow \| RID \| str \| None | Kwargs form only. A Workflow object or the RID of an existing workflow record. | None |
| description | str \| None | Kwargs form only. Human-readable description of the execution. | None |
| dry_run | bool | If True, skip creating catalog records and uploading results. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| Execution | Execution | An Execution object for managing the execution lifecycle. |
Raises:
| Type | Description |
|---|---|
| TypeError | If both a configuration object and individual kwargs are provided. |
| DerivaMLOfflineError | If the current connection mode is offline; creating executions requires online mode. |
Example
Config-object form::
>>> config = ExecutionConfiguration( # doctest: +SKIP
... workflow=workflow,
... description="Process samples",
... datasets=[DatasetSpec(rid="4HM", version="1.0.0")],
... )
>>> with ml.create_execution(config) as execution:
... # Run analysis
... pass
>>> execution.upload_execution_outputs()
Kwargs form (equivalent)::
>>> with ml.create_execution( # doctest: +SKIP
... datasets=["4HM@1.0.0"],
... workflow=workflow,
... description="Process samples",
... ) as execution:
... # Run analysis
... pass
Source code in src/deriva_ml/core/mixins/execution.py
create_feature
create_feature(
    target_table: Table | str,
    feature_name: str,
    terms: list[Table | str] | None = None,
    assets: list[Table | str] | None = None,
    metadata: list[ColumnDefinition | Table | Key | str] | None = None,
    optional: list[str] | None = None,
    comment: str = "",
    update_navbar: bool = True,
) -> type[FeatureRecord]
Creates a new feature definition.
A feature represents a measurable property or characteristic that can be associated with records in the target table. Features can include vocabulary terms, asset references, and additional metadata.
Side Effects: This method dynamically creates:
1. A new association table in the domain schema to store feature values
2. A Pydantic model class (subclass of FeatureRecord) for creating validated feature instances
The returned Pydantic model class provides type-safe construction of feature records with automatic validation of values against the feature's definition (vocabulary terms, asset references, etc.). Use this class to create feature instances that can be inserted into the catalog.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target_table | Table \| str | Table to associate the feature with (name or Table object). | required |
| feature_name | str | Unique name for the feature within the target table. | required |
| terms | list[Table \| str] \| None | Optional vocabulary tables/names whose terms can be used as feature values. | None |
| assets | list[Table \| str] \| None | Optional asset tables/names that can be referenced by this feature. | None |
| metadata | list[ColumnDefinition \| Table \| Key \| str] \| None | Optional columns, tables, or keys to include in the feature definition. | None |
| optional | list[str] \| None | Column names that are not required when creating feature instances. | None |
| comment | str | Description of the feature's purpose and usage. | '' |
| update_navbar | bool | If True (default), automatically updates the navigation bar to include the new feature table. Set to False during batch feature creation to avoid redundant updates, then call apply_catalog_annotations() once at the end. | True |
Returns:
| Type | Description |
|---|---|
| type[FeatureRecord] | A dynamically generated Pydantic model class for creating validated feature instances. The class has fields corresponding to the feature's terms, assets, and metadata columns. |
Raises:
| Type | Description |
|---|---|
| DerivaMLException | If the feature definition is invalid or conflicts with existing features. |
Examples:
Create a feature with a confidence score:
>>> DiagnosisFeature = ml.create_feature(
...     target_table="Image",
...     feature_name="Diagnosis",
...     terms=["Diagnosis_Type"],
...     metadata=[ColumnDefinition(name="confidence", type=BuiltinTypes.float4)],
...     comment="Clinical diagnosis label"
... )
>>> # Use the returned class to create validated feature instances
>>> record = DiagnosisFeature(
...     Image="1-ABC",  # Target record RID
...     Diagnosis_Type="Normal",  # Vocabulary term
...     confidence=0.95,
...     Execution="2-XYZ"  # Execution that produced this value
... )
Source code in src/deriva_ml/core/mixins/feature.py
create_table
create_table(
table: TableDefinition,
schema: str | None = None,
update_navbar: bool = True,
) -> Table
Creates a new table in the domain schema.
Creates a table using the provided TableDefinition object, which specifies the table structure including columns, keys, and foreign key relationships. The table is created in the domain schema associated with this DerivaML instance.
Required Classes: Import the following classes from deriva_ml to define tables:
- TableDefinition: Defines the complete table structure
- ColumnDefinition: Defines individual columns with types and constraints
- KeyDefinition: Defines unique key constraints (optional)
- ForeignKeyDefinition: Defines foreign key relationships to other tables (optional)
- BuiltinTypes: Enum of available column data types
Available Column Types (BuiltinTypes enum):
text, int2, int4, int8, float4, float8, boolean,
date, timestamp, timestamptz, json, jsonb, markdown,
ermrest_uri, ermrest_rid, ermrest_rcb, ermrest_rmb,
ermrest_rct, ermrest_rmt
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table | TableDefinition | A TableDefinition object containing the complete specification of the table to create. | required |
| schema | str \| None | Schema in which to create the table. If None, uses the domain schema. | None |
| update_navbar | bool | If True (default), automatically updates the navigation bar to include the new table. Set to False during batch table creation to avoid redundant updates, then call apply_catalog_annotations() once at the end. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| Table | Table | The newly created ERMRest table object. |
Raises:
| Type | Description |
|---|---|
| DerivaMLException | If table creation fails or the definition is invalid. |
Examples:
Simple table with basic columns:
>>> from deriva_ml import TableDefinition, ColumnDefinition, BuiltinTypes
>>>
>>> table_def = TableDefinition(
... name="Experiment",
... column_defs=[
... ColumnDefinition(name="Name", type=BuiltinTypes.text, nullok=False),
... ColumnDefinition(name="Date", type=BuiltinTypes.date),
... ColumnDefinition(name="Description", type=BuiltinTypes.markdown),
... ColumnDefinition(name="Score", type=BuiltinTypes.float4),
... ],
... comment="Records of experimental runs"
... )
>>> experiment_table = ml.create_table(table_def)
Table with foreign key to another table:
>>> from deriva_ml import (
... TableDefinition, ColumnDefinition, ForeignKeyDefinition, BuiltinTypes
... )
>>>
>>> # Create a Sample table that references Subject
>>> sample_def = TableDefinition(
... name="Sample",
... column_defs=[
... ColumnDefinition(name="Name", type=BuiltinTypes.text, nullok=False),
... ColumnDefinition(name="Subject", type=BuiltinTypes.text, nullok=False),
... ColumnDefinition(name="Collection_Date", type=BuiltinTypes.date),
... ],
... fkey_defs=[
... ForeignKeyDefinition(
... colnames=["Subject"],
... pk_sname=ml.default_schema, # Schema of referenced table
... pk_tname="Subject", # Name of referenced table
... pk_colnames=["RID"], # Column(s) in referenced table
... on_delete="CASCADE", # Delete samples when subject deleted
... )
... ],
... comment="Biological samples collected from subjects"
... )
>>> sample_table = ml.create_table(sample_def)
Table with unique key constraint:
>>> from deriva_ml import (
... TableDefinition, ColumnDefinition, KeyDefinition, BuiltinTypes
... )
>>>
>>> protocol_def = TableDefinition(
... name="Protocol",
... column_defs=[
... ColumnDefinition(name="Name", type=BuiltinTypes.text, nullok=False),
... ColumnDefinition(name="Version", type=BuiltinTypes.text, nullok=False),
... ColumnDefinition(name="Description", type=BuiltinTypes.markdown),
... ],
... key_defs=[
... KeyDefinition(
... colnames=["Name", "Version"],
... constraint_names=[["myschema", "Protocol_Name_Version_key"]],
... comment="Each protocol name+version must be unique"
... )
... ],
... comment="Experimental protocols with versioning"
... )
>>> protocol_table = ml.create_table(protocol_def)
Batch creation without navbar updates:
>>> ml.create_table(table1_def, update_navbar=False)
>>> ml.create_table(table2_def, update_navbar=False)
>>> ml.create_table(table3_def, update_navbar=False)
>>> ml.apply_catalog_annotations() # Update navbar once at the end
Source code in src/deriva_ml/core/base.py
create_vocabulary
create_vocabulary(
vocab_name: str,
comment: str = "",
schema: str | None = None,
update_navbar: bool = True,
) -> Table
Creates a controlled vocabulary table.
A controlled vocabulary table maintains a list of standardized terms and their definitions. Each term can have synonyms and descriptions to ensure consistent terminology usage across the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vocab_name | str | Name for the new vocabulary table. Must be a valid SQL identifier. | required |
| comment | str | Description of the vocabulary's purpose and usage. Defaults to an empty string. | '' |
| schema | str \| None | Schema name to create the table in. If None, uses the domain schema. | None |
| update_navbar | bool | If True (default), automatically updates the navigation bar to include the new vocabulary table. Set to False during batch table creation to avoid redundant updates, then call apply_catalog_annotations() once at the end. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| Table | Table | ERMRest table object representing the newly created vocabulary table. |
Raises:
| Type | Description |
|---|---|
| DerivaMLException | If vocab_name is invalid or already exists. |
Examples:
Create a vocabulary for tissue types:
>>> table = ml.create_vocabulary(
... vocab_name="tissue_types",
... comment="Standard tissue classifications",
... schema="bio_schema"
... )
Create multiple vocabularies without updating navbar until the end:
>>> ml.create_vocabulary("Species", update_navbar=False)
>>> ml.create_vocabulary("Tissue_Type", update_navbar=False)
>>> ml.apply_catalog_annotations() # Update navbar once
Source code in src/deriva_ml/core/base.py
create_workflow
create_workflow(
name: str,
workflow_type: str | list[str],
description: str = "",
) -> Workflow
Creates a new workflow definition.
Creates a Workflow object that represents a computational process or analysis pipeline. The workflow type(s) must be terms from the controlled vocabulary. This method is typically used to define new analysis workflows before execution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the workflow. | required |
| workflow_type | str \| list[str] | Type(s) of workflow; each must exist in the workflow_type vocabulary. Can be a single string or a list of strings. | required |
| description | str | Description of what the workflow does. | '' |
Returns:
| Name | Type | Description |
|---|---|---|
| Workflow | Workflow | New workflow object ready for registration. |
Raises:
| Type | Description |
|---|---|
| DerivaMLException | If any workflow_type is not in the vocabulary. |
Examples:
>>> workflow = ml.create_workflow(
... name="RNA Analysis",
... workflow_type="python_notebook",
... description="RNA sequence analysis pipeline"
... )
>>> rid = ml._add_workflow(workflow)
Multiple types::
>>> workflow = ml.create_workflow(
... name="Training Pipeline",
... workflow_type=["Training", "Embedding"],
... description="Combined training and embedding pipeline"
... )
Source code in src/deriva_ml/core/mixins/workflow.py
define_association
define_association(
associates: list,
metadata: list | None = None,
table_name: str | None = None,
comment: str | None = None,
**kwargs,
) -> dict
Build an association table definition with vocab-aware key selection.
Creates a table definition that links two or more tables via an association (many-to-many) table. Non-vocabulary tables automatically use RID as the foreign key target, while vocabulary tables use their Name key.
Use with create_table() to create the association table in the catalog.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| associates | list | Tables to associate. Each item can be: a Table object; a (name, Table) tuple to customize the column name; a (name, nullok, Table) tuple for nullable references; or a Key object for explicit key selection. | required |
| metadata | list \| None | Additional metadata columns or reference targets. | None |
| table_name | str \| None | Name for the association table. Auto-generated if omitted. | None |
| comment | str \| None | Comment for the association table. | None |
| **kwargs | | Additional arguments passed to Table.define_association. | {} |
Returns:
| Type | Description |
|---|---|
| dict | Table definition dict suitable for create_table(). |
Example::
# Associate Image with Subject (many-to-many)
image_table = ml.model.name_to_table("Image")
subject_table = ml.model.name_to_table("Subject")
assoc_def = ml.define_association(
associates=[image_table, subject_table],
comment="Links images to subjects",
)
ml.create_table(assoc_def)
Source code in src/deriva_ml/core/base.py
delete_dataset
delete_dataset(
dataset: "Dataset",
recurse: bool = False,
) -> None
Soft-delete a dataset by marking it as deleted in the catalog.
Sets the Deleted flag on the dataset record. The dataset's data is
preserved but it will no longer appear in normal queries (e.g.,
find_datasets()). The dataset cannot be deleted if it is currently
nested inside a parent dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | The dataset to delete. | required |
| recurse | bool | If True, also soft-delete all nested child datasets. If False (default), only this dataset is marked as deleted. | False |
Raises:
| Type | Description |
|---|---|
| DerivaMLException | If the dataset RID is not a valid dataset, or if the dataset is nested inside a parent dataset. |
Example
>>> ds = ml.lookup_dataset("1-ABC")
>>> ml.delete_dataset(ds, recurse=False)
Source code in src/deriva_ml/core/mixins/dataset.py
delete_feature
delete_feature(
table: Table | str,
feature_name: str,
) -> bool
Removes a feature definition and its data.
Deletes the feature and its implementation table from the catalog. This operation cannot be undone and will remove all feature values associated with this feature.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table | Table \| str | The table containing the feature, either as a name or Table object. | required |
| feature_name | str | Name of the feature to delete. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True if the feature was successfully deleted, False if it didn't exist. |
Raises:
| Type | Description |
|---|---|
| DerivaMLException | If deletion fails due to constraints or permissions. |
Example
>>> success = ml.delete_feature("samples", "obsolete_feature")
>>> print("Deleted" if success else "Not found")
Source code in src/deriva_ml/core/mixins/feature.py
delete_term
delete_term(
table: str | Table, term_name: str
) -> None
Delete a term from a vocabulary table.
Removes a term from the vocabulary. The term must not be in use by any records in the catalog (e.g., no datasets using this dataset type, no assets using this asset type).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| table | str \| Table | Vocabulary table containing the term (name or Table object). | required |
| term_name | str | Primary name of the term to delete. | required |
Raises:
| Type | Description |
|---|---|
| DerivaMLInvalidTerm | If the term doesn't exist in the vocabulary. |
| DerivaMLException | If the term is currently in use by other records. |
Example
>>> ml.delete_term("Dataset_Type", "Obsolete_Type")
Source code in src/deriva_ml/core/mixins/vocabulary.py
diff_schema
diff_schema() -> 'SchemaDiff'
Return the structural diff between the cached and live schemas.
Online mode only. Fetches the live catalog's /schema payload, compares it against the cached copy with deriva_ml.core.schema_diff._compute_diff, and returns the result. The returned SchemaDiff may be empty (no drift); callers should check diff.is_empty() rather than truthiness.
Unlike pin_schema, this method never modifies the cache and never logs a warning; it is a pure inspection operation.
Returns:

| Name | Type | Description |
|---|---|---|
| `SchemaDiff` | `SchemaDiff` | The computed structural diff; may be empty if there is no drift. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLReadOnlyError` | If called in offline mode. |
| `FileNotFoundError` | If the workspace has no cache file. |
Source code in src/deriva_ml/core/base.py
download_dataset_bag
download_dataset_bag(
dataset: DatasetSpec,
) -> "DatasetBag"
Downloads a dataset to the local filesystem.
Downloads a dataset specified by DatasetSpec to the local filesystem. If the catalog has s3_bucket configured and use_minid is enabled, the bag will be uploaded to S3 and registered with the MINID service.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `DatasetSpec` | Specification of the dataset to download, including version and materialization options. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `DatasetBag` | `DatasetBag` | Object containing: `path` (local filesystem path to the downloaded dataset), `rid` (dataset's Resource Identifier), and `minid` (dataset's Minimal Viable Identifier, if MINID is enabled). |

Note

MINID support requires s3_bucket to be configured when creating the DerivaML instance. The catalog's use_minid setting controls whether MINIDs are created.

Examples:

Download with default options::

    >>> spec = DatasetSpec(rid="1-abc123")  # doctest: +SKIP
    >>> bag = ml.download_dataset_bag(dataset=spec)  # doctest: +SKIP
    >>> print(f"Downloaded to {bag.path}")  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/dataset.py
download_dir
download_dir(
cached: bool = False,
) -> Path
Returns the appropriate download directory.
Provides the appropriate directory path for storing downloaded files, either in the cache or working directory.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `cached` | `bool` | If True, returns the cache directory path. If False, returns the working directory path. | `False` |

Returns:

| Name | Type | Description |
|---|---|---|
| `Path` | `Path` | Directory path where downloaded files should be stored. |

Example::

    >>> cache_dir = ml.download_dir(cached=True)
    >>> work_dir = ml.download_dir(cached=False)
Source code in src/deriva_ml/core/base.py
estimate_bag_size
estimate_bag_size(
dataset: "DatasetSpec",
) -> dict[str, Any]
Estimate the size of a dataset bag before downloading.
Generates the same download specification used by download_dataset_bag, then runs COUNT and SUM(Length) queries against the snapshot catalog to preview what a download will contain and how large it will be.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `DatasetSpec` | Specification of the dataset to estimate, including version and optional exclude_tables. | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dict with keys: `tables` (dict mapping table name to `{row_count, is_asset, asset_bytes}`), `total_rows` (total row count across all tables), `total_asset_bytes` (total size of asset files in bytes), and `total_asset_size` (human-readable size string, e.g. "1.2 GB"). |
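Because the return value is plain data, a small helper can turn it into a pre-download prompt. This sketch assumes only the documented keys above; `summarize_bag_estimate` is a hypothetical helper, not part of DerivaML.

```python
from typing import Any


def summarize_bag_estimate(info: dict[str, Any]) -> str:
    """Render the documented return shape of estimate_bag_size() as one line."""
    # is_asset is part of each per-table entry, per the documented shape.
    asset_tables = [name for name, t in info["tables"].items() if t["is_asset"]]
    return (
        f"{info['total_rows']} rows across {len(info['tables'])} tables "
        f"({len(asset_tables)} asset tables), {info['total_asset_size']} of assets"
    )


# With a live instance: print(summarize_bag_estimate(ml.estimate_bag_size(spec)))
```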
Source code in src/deriva_ml/core/mixins/dataset.py
estimate_denormalized_size
estimate_denormalized_size(
include_tables: list[str],
) -> dict[str, Any]
Return schema shape + catalog-wide size estimates for a denormalized table.
This is the catalog-wide analog of
:meth:Dataset.describe_denormalized. It asks "if I were to
denormalize these tables across the entire catalog (not scoped
to any specific dataset), what would the result look like and
how big would it be?" Useful for rough size estimation before
committing to a bag export.
The return shape is aligned with :meth:estimate_bag_size and
is NOT the same as the dataset-scoped 12-key plan dict from
:meth:Dataset.describe_denormalized (spec §5). Do not confuse
the two.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `include_tables` | `list[str]` | List of table names to include in the join. | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dict with keys `tables`, `total_rows`, `total_asset_bytes`, and `total_asset_size`, shaped like the return of `estimate_bag_size`. |

Example::

    info = ml.estimate_denormalized_size(["Image", "Subject"])
    print(f"{info['total_rows']} rows across "
          f"{len(info['tables'])} tables, "
          f"{info['total_asset_size']} of assets")
See Also
Dataset.describe_denormalized: Dataset-scoped planning dict. Denormalizer.describe: Full dataset-scoped plan with ambiguity reporting. estimate_bag_size: Bag-level size estimation.
Source code in src/deriva_ml/core/mixins/dataset.py
feature_record_class
feature_record_class(
table: str | Table,
feature_name: str,
) -> type[FeatureRecord]
Returns a dynamically generated Pydantic model class for creating feature records.
Each feature has a unique set of columns based on its definition (terms, assets, metadata). This method returns a Pydantic class with fields corresponding to those columns, providing:
- Type validation: Values are validated against expected types (str, int, float, Path)
- Required field checking: Non-nullable columns must be provided
- Default values: Feature_Name is pre-filled with the feature's name
Field types in the generated class:
- {TargetTable} (str): Required. RID of the target record (e.g., Image RID)
- Execution (str, optional): RID of the execution for provenance tracking
- Feature_Name (str): Pre-filled with the feature name
- Term columns (str): Accept vocabulary term names
- Asset columns (str | Path): Accept asset RIDs or file paths
- Value columns: Accept values matching the column type (int, float, str)
Use lookup_feature() to inspect the feature's structure and see what columns
are available.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | The table containing the feature, either as name or Table object. | required |
| `feature_name` | `str` | Name of the feature to create a record class for. | required |

Returns:

| Type | Description |
|---|---|
| `type[FeatureRecord]` | A Pydantic model class for creating validated feature records. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If the feature doesn't exist or the table is invalid. |

Example::

    >>> # Get the dynamically generated class
    >>> DiagnosisFeature = ml.feature_record_class("Image", "Diagnosis")  # doctest: +SKIP
    >>> # Create a validated feature record
    >>> record = DiagnosisFeature(
    ...     Image="1-ABC",            # Target record RID
    ...     Diagnosis_Type="Normal",  # Vocabulary term
    ...     confidence=0.95,          # Metadata column
    ...     Execution="2-XYZ",        # Provenance
    ... )
    >>> # Convert to dict for insertion
    >>> record.model_dump()
    {'Image': '1-ABC', 'Diagnosis_Type': 'Normal', 'confidence': 0.95, ...}
Source code in src/deriva_ml/core/mixins/feature.py
feature_values
feature_values(
table: Table | str,
feature_name: str,
selector: Callable[
[list[FeatureRecord]],
FeatureRecord | None,
]
| None = None,
materialize_limit: int
| None = None,
execution_rids: list[str]
| None = None,
) -> Iterable[FeatureRecord]
Yield feature values for a single feature, one record per target RID.
Returns an iterator of typed FeatureRecord instances. Each record is
wide in shape — target RID, all value columns (vocab terms, asset
references, metadata columns), and provenance columns (Execution,
RCT) — exposed as typed attributes.
When a selector is provided, records are grouped by target RID and
the selector collapses each group to a single survivor. Target RIDs
whose group's selector returns None are omitted. When no selector
is provided, every raw record is yielded — multiple records per target
RID are possible.
This method has identical signatures and semantics across DerivaML,
Dataset, and DatasetBag. The bag implementation reads from a
per-feature denormalization cache populated on first access; subsequent
calls are cheap.
All rows for the feature are fetched from the catalog before the first
record is yielded — this method is iterator-shaped for composability,
not for streaming of very large feature tables. When execution_rids
is set, the catalog query is filtered server-side to those execution
RIDs only -- this is the recommended way to keep the materialization
cost bounded for cross-execution comparisons.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `Table \| str` | Target table the feature is defined on (name or Table). | required |
| `feature_name` | `str` | Name of the feature to read. | required |
| `selector` | `Callable[[list[FeatureRecord]], FeatureRecord \| None] \| None` | Optional callable that collapses each target RID's group of records to a single survivor; returning None omits that target RID. | `None` |
| `materialize_limit` | `int \| None` | Optional cap on the number of rows that may be materialized into memory. When the catalog query returns more than this many rows, raises `DerivaMLMaterializeLimitExceeded`. | `None` |
| `execution_rids` | `list[str] \| None` | Optional filter; when set, only feature rows from these execution RIDs are returned (filtered server-side). | `None` |

Returns:

| Type | Description |
|---|---|
| `Iterable[FeatureRecord]` | Iterator of `FeatureRecord` instances: one survivor per target RID after selector reduction, or all raw records if no selector. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableNotFound` | |
| `DerivaMLException` | |
| `DerivaMLMaterializeLimitExceeded` | If the result set exceeds `materialize_limit`. |
Example

Get the newest Glaucoma label per image::

    >>> from deriva_ml.feature import FeatureRecord
    >>> for rec in ml.feature_values(
    ...     "Image", "Glaucoma", selector=FeatureRecord.select_newest,
    ... ):
    ...     print(f"{rec.Image}: {rec.Glaucoma} (by {rec.Execution})")

Filter by a specific workflow — works identically on a downloaded bag::

    >>> workflow = ml.lookup_workflow("Glaucoma_Training_v2")
    >>> sel = FeatureRecord.select_by_workflow(workflow, container=ml)
    >>> labels = [r.Glaucoma for r in ml.feature_values(
    ...     "Image", "Glaucoma", selector=sel,
    ... )]

Convert to a pandas DataFrame when needed::

    >>> import pandas as pd
    >>> df = pd.DataFrame(
    ...     r.model_dump()
    ...     for r in ml.feature_values("Image", "Glaucoma")
    ... )
Source code in src/deriva_ml/core/mixins/feature.py
fetch_table_features
fetch_table_features(*args, **kwargs)
Retired — use feature_values(table, name) or Denormalizer.
DerivaML.fetch_table_features has been removed. To read feature
values for a single feature, use the new feature_values method::
for rec in ml.feature_values("Image", "Quality"):
...
For wide-table denormalization across all features use the
Denormalizer subsystem.
Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | Always. Points at the replacement API. |
Source code in src/deriva_ml/core/mixins/feature.py
find_assets
find_assets(
asset_table: Table
| str
| None = None,
asset_type: str | None = None,
) -> Iterable["Asset"]
Find assets in the catalog.
Returns an iterable of Asset objects matching the specified criteria. If no criteria are specified, returns all assets from all asset tables.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `asset_table` | `Table \| str \| None` | Optional table or table name to search. If None, searches all asset tables. | `None` |
| `asset_type` | `str \| None` | Optional asset type to filter by. Only returns assets with this type. | `None` |

Returns:

| Type | Description |
|---|---|
| `Iterable[Asset]` | Iterable of Asset objects matching the criteria. |

Example::

    >>> # Find all assets in the Model table
    >>> models = list(ml.find_assets(asset_table="Model"))  # doctest: +SKIP
    >>> # Find all assets with type "Training_Data"
    >>> training = list(ml.find_assets(asset_type="Training_Data"))  # doctest: +SKIP
    >>> # Find all assets across all tables
    >>> all_assets = list(ml.find_assets())  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/asset.py
find_datasets
find_datasets(
deleted: bool = False,
sort: SortSpec = None,
) -> Iterable["Dataset"]
List all datasets in the catalog.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `deleted` | `bool` | If True, include datasets that have been marked as deleted. | `False` |
| `sort` | `SortSpec` | Optional sort spec; pass `True` to sort newest-first. | `None` |

Returns:

| Type | Description |
|---|---|
| `Iterable[Dataset]` | Iterable of Dataset objects. |

Example::

    >>> datasets = list(ml.find_datasets())  # doctest: +SKIP
    >>> for ds in datasets:  # doctest: +SKIP
    ...     print(f"{ds.dataset_rid}: {ds.description}")

Newest-first (most common)::

    >>> recent = list(ml.find_datasets(sort=True))  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/dataset.py
find_experiments
find_experiments(
workflow_rid: RID | None = None,
status: ExecutionStatus
| None = None,
) -> Iterable["Experiment"]
List all experiments (executions with Hydra configuration) in the catalog.
Creates Experiment objects for analyzing completed ML model runs. Only returns executions that have Hydra configuration metadata (i.e., a config.yaml file in Execution_Metadata assets).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `workflow_rid` | `RID \| None` | Optional workflow RID to filter by. | `None` |
| `status` | `ExecutionStatus \| None` | Optional status to filter by (e.g., ExecutionStatus.Uploaded). | `None` |

Returns:

| Type | Description |
|---|---|
| `Iterable[Experiment]` | Iterable of Experiment objects for executions with Hydra config. |

Example::

    >>> experiments = list(ml.find_experiments(status=ExecutionStatus.Uploaded))  # doctest: +SKIP
    >>> for exp in experiments:
    ...     print(f"{exp.name}: {exp.config_choices}")
Source code in src/deriva_ml/core/mixins/execution.py
find_features
find_features(
table: str | Table | None = None,
) -> list[Feature]
Find feature definitions in the schema.
Discovers features by inspecting the catalog schema for association tables
that have Feature_Name and Execution columns. Returns Feature objects
describing each feature's structure (target table, term/asset/value columns),
not the feature values themselves.
Use feature_values to retrieve actual feature values.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table \| None` | Optional table to find features for. If None, returns all feature definitions across all tables. | `None` |

Returns:

| Type | Description |
|---|---|
| `list[Feature]` | A list of Feature instances describing the feature definitions. |

Examples:

Find all feature definitions::

    >>> all_features = ml.find_features()  # doctest: +SKIP
    >>> for f in all_features:
    ...     print(f"{f.target_table.name}.{f.feature_name}")

Find features defined on a specific table::

    >>> image_features = ml.find_features("Image")  # doctest: +SKIP
    >>> print([f.feature_name for f in image_features])
Source code in src/deriva_ml/core/mixins/feature.py
find_incomplete_executions
find_incomplete_executions() -> (
list[ExecutionSnapshot]
)
Sugar over :meth:list_executions for everything not terminally done.
Reads from the workspace SQLite registry — no server contact. Returns executions in status in (Created, Running, Stopped, Failed, Pending_Upload) — the set of things a user would want to either resume, retry, or clean up. Excludes Uploaded (terminal success) and Aborted (terminal cleanup).
For live catalog queries returning mutable
:class:~deriva_ml.execution.execution_record.ExecutionRecord
objects, see find_executions(status=...).
Returns:

| Type | Description |
|---|---|
| `list[ExecutionSnapshot]` | List of `ExecutionSnapshot` objects, one per incomplete execution known to the local registry. |

Example::

    >>> for snap in ml.find_incomplete_executions():  # doctest: +SKIP
    ...     print(snap.rid, snap.status, snap.pending_rows)
Source code in src/deriva_ml/core/mixins/execution.py
find_workflows
find_workflows(
sort: SortSpec = None,
) -> list[Workflow]
Find all workflows in the catalog.
Catalog-level operation to find all workflow definitions, including their names, URLs, types, versions, and descriptions. Each returned Workflow is bound to the catalog, allowing its description to be updated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `sort` | `SortSpec` | Optional sort spec; pass `True` to sort newest-first. | `None` |

Returns:

| Type | Description |
|---|---|
| `list[Workflow]` | List of workflow objects, each containing: `name` (workflow name), `url` (source code URL), `workflow_type` (type(s) of workflow), `version` (version identifier), `description` (workflow description), `rid` (resource identifier), and `checksum` (source code checksum). |

Examples:

List all workflows and their descriptions::

    >>> workflows = ml.find_workflows()
    >>> for w in workflows:
    ...     print(f"{w.name} (v{w.version}): {w.description}")
    ...     print(f"  Source: {w.url}")

Update a workflow's description (workflows are catalog-bound)::

    >>> workflows = ml.find_workflows()
    >>> workflows[0].description = "Updated description"

Newest-first (most common)::

    >>> recent = list(ml.find_workflows(sort=True))  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/workflow.py
from_context
classmethod
from_context(
path: Path | str | None = None,
) -> Self
Create a DerivaML instance from a .deriva-context.json file.
Searches for .deriva-context.json starting from path (default: cwd),
walking up parent directories. This enables scripts generated by Claude
to connect to the same catalog without hardcoding connection details.
The context file is written by the MCP server's connect_catalog tool
and contains hostname, catalog_id, and default_schema.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path \| str \| None` | Starting directory to search for the context file. Defaults to the current working directory. | `None` |

Returns:

| Type | Description |
|---|---|
| `Self` | A new DerivaML instance configured from the context file. |

Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If no .deriva-context.json is found. |

Example::

    # In a script generated by Claude:
    from deriva_ml import DerivaML
    ml = DerivaML.from_context()
    subjects = ml.cache_table("Subject")
Source code in src/deriva_ml/core/base.py
gc_executions
gc_executions(
*,
older_than: "timedelta | None" = None,
status: "ExecutionStatus | list[ExecutionStatus] | None" = None,
delete_working_dir: bool = False,
) -> int
Garbage-collect execution registry rows matching the filters.
By default only removes registry state (SQLite rows and their
pending_rows / directory_rules). Pass delete_working_dir=True to
also rm -rf the on-disk execution root under the workspace.
Does NOT touch the catalog. Executions uploaded to the catalog remain there regardless of local gc.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `older_than` | `timedelta \| None` | If set, only gc executions whose last_activity is older than this timedelta. | `None` |
| `status` | `ExecutionStatus \| list[ExecutionStatus] \| None` | Filter by status (single or list); None = any status. Typical: pass ExecutionStatus.Uploaded to clean up after successful uploads. | `None` |
| `delete_working_dir` | `bool` | If True, remove the per-execution working directory from disk. Defaults to False (registry-only). | `False` |

Returns:

| Type | Description |
|---|---|
| `int` | The number of executions removed. |

Example::

    >>> from datetime import timedelta  # doctest: +SKIP
    >>> from deriva_ml.execution.state_store import ExecutionStatus
    >>> n = ml.gc_executions(
    ...     status=ExecutionStatus.Uploaded,
    ...     older_than=timedelta(days=30),
    ...     delete_working_dir=True,
    ... )
    >>> print(f"cleaned {n} old executions")
Source code in src/deriva_ml/core/mixins/execution.py
get_cache_size
get_cache_size() -> dict[
str, int | float
]
Get the current size of the cache directory.
Returns:

| Type | Description |
|---|---|
| `dict[str, int \| float]` | Dict with keys: `total_bytes` (total size in bytes), `total_mb` (total size in megabytes), `file_count` (number of files), and `dir_count` (number of directories). |

Example::

    >>> ml = DerivaML('deriva.example.org', 'my_catalog')
    >>> size = ml.get_cache_size()
    >>> print(f"Cache size: {size['total_mb']:.1f} MB ({size['file_count']} files)")
Source code in src/deriva_ml/core/base.py
get_column_annotations
get_column_annotations(
table: str | Table, column_name: str
) -> dict[str, Any]
Get all Chaise display-related annotations for a column.
Returns display and column-display annotations. Missing annotations
are None.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Table name (str) or `Table` object. | required |
| `column_name` | `str` | Name of the column. | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dict of the column's display-related annotations (display and column-display); missing annotations are None. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableTypeError` | |
| `DerivaMLException` | |

Example::

    >>> anns = ml.get_column_annotations("Image", "Filename")  # doctest: +SKIP
    >>> anns["display"]  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/annotation.py
get_handlebars_template_variables
get_handlebars_template_variables(
table: str | Table,
) -> dict[str, Any]
Get all available template variables for a table.
Returns the columns, foreign keys, and special variables that can be used in Handlebars templates (row_markdown_pattern, markdown_pattern, etc.) for the specified table.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Table name or Table object. | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary with columns, foreign_keys, special_variables, and helper_examples. |

Example::

    >>> vars = ml.get_handlebars_template_variables("Image")  # doctest: +SKIP
    >>> for col in vars["columns"]:  # doctest: +SKIP
    ...     print(f"{col['name']}: {col['template']}")
Source code in src/deriva_ml/core/mixins/annotation.py
get_storage_summary
get_storage_summary() -> dict[str, any]
Get a summary of local storage usage.
Returns:

| Type | Description |
|---|---|
| `dict[str, any]` | Dict with keys: `working_dir` (path to working directory), `cache_dir` (path to cache directory), `cache_size_mb` (cache size in MB), `cache_file_count` (number of files in cache), `execution_dir_count` (number of execution directories), `execution_size_mb` (total size of execution directories in MB), and `total_size_mb` (combined size in MB). |

Example::

    >>> ml = DerivaML('deriva.example.org', 'my_catalog')
    >>> summary = ml.get_storage_summary()
    >>> print(f"Total storage: {summary['total_size_mb']:.1f} MB")
    >>> print(f"  Cache: {summary['cache_size_mb']:.1f} MB")
    >>> print(f"  Executions: {summary['execution_size_mb']:.1f} MB")
Source code in src/deriva_ml/core/base.py
get_table_annotations
get_table_annotations(
table: str | Table,
) -> dict[str, Any]
Get all Chaise display-related annotations for a table.
Returns the current values of display, visible-columns,
visible-foreign-keys, and table-display annotations. Missing
annotations are represented as None in the returned dict.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Table name (str) or `Table` object. | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dict with keys `display`, `visible_columns`, `visible_foreign_keys`, and `table_display` (each dict \| None); missing annotations are represented as None. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableTypeError` | |

Example::

    >>> anns = ml.get_table_annotations("Image")  # doctest: +SKIP
    >>> anns["visible_columns"]  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/annotation.py
get_table_as_dataframe
get_table_as_dataframe(
table: str,
) -> pd.DataFrame
Get table contents as a pandas DataFrame.
Retrieves all contents of a table from the catalog.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str` | Name of the table to retrieve. | required |

Returns:

| Type | Description |
|---|---|
| `DataFrame` | DataFrame containing all table contents. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableNotFound` | If the table does not exist in any schema. |

Example::

    >>> df = ml.get_table_as_dataframe("Subject")  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/path_builder.py
get_table_as_dict
get_table_as_dict(
table: str,
) -> Iterable[dict[str, Any]]
Get table contents as dictionaries.
Retrieves all contents of a table from the catalog.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str` | Name of the table to retrieve. | required |

Returns:

| Type | Description |
|---|---|
| `Iterable[dict[str, Any]]` | Iterable yielding dictionaries for each row. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableNotFound` | If the table does not exist in any schema. |

Example::

    >>> rows = list(ml.get_table_as_dict("Subject"))  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/path_builder.py
instantiate
classmethod
instantiate(
config: DerivaMLConfig,
) -> Self
Create a DerivaML instance from a configuration object.
This method is the preferred way to instantiate DerivaML when using hydra-zen for configuration management. It accepts a DerivaMLConfig (Pydantic model) and unpacks it to create the instance.
This pattern allows hydra-zen's instantiate() to work with DerivaML:
Example with hydra-zen::

    >>> from hydra_zen import builds, instantiate
    >>> from deriva_ml import DerivaML
    >>> from deriva_ml.core.config import DerivaMLConfig
    >>> # Create a structured config using hydra-zen
    >>> DerivaMLConf = builds(DerivaMLConfig, populate_full_signature=True)
    >>> # Configure for your environment
    >>> conf = DerivaMLConf(
    ...     hostname='deriva.example.org',
    ...     catalog_id='42',
    ...     domain_schema='my_domain',
    ... )
    >>> # Instantiate the config to get a DerivaMLConfig object
    >>> config = instantiate(conf)
    >>> # Create the DerivaML instance
    >>> ml = DerivaML.instantiate(config)

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `DerivaMLConfig` | A DerivaMLConfig object containing all configuration parameters. | required |

Returns:

| Type | Description |
|---|---|
| `Self` | A new DerivaML instance configured according to the config object. |
Note
The DerivaMLConfig class integrates with Hydra's configuration system
and registers custom resolvers for computing working directories.
See deriva_ml.core.config for details on configuration options.
Source code in src/deriva_ml/core/base.py
is_snapshot
is_snapshot() -> bool
Check whether this DerivaML instance is connected to a catalog snapshot.
Returns:

| Type | Description |
|---|---|
| `bool` | True if the underlying catalog has a snapshot timestamp, False otherwise. |
Source code in src/deriva_ml/core/base.py
is_strict_preallocated_rid
is_strict_preallocated_rid(
table: str | Table,
) -> bool
Return True if the asset table has the strict-preallocated-RID annotation set.
Checks for the tag:isrd.isi.edu,2026:strict-preallocated-rid
annotation. Returns True iff the annotation is present with
{"strict": true}.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Asset table name or Table object. | required |

Returns:

| Type | Description |
|---|---|
| `bool` | True if strict mode is set on this table, False otherwise. |
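The documented rule (annotation present with `{"strict": true}`) is easy to mirror against a plain annotations dict. This sketch is illustrative only, not the DerivaML implementation; the tag string is the one quoted in the description above.

```python
STRICT_RID_TAG = "tag:isrd.isi.edu,2026:strict-preallocated-rid"


def strict_preallocated(annotations: dict) -> bool:
    # True iff the annotation is present AND carries {"strict": True},
    # matching the semantics documented for is_strict_preallocated_rid().
    value = annotations.get(STRICT_RID_TAG)
    return isinstance(value, dict) and value.get("strict") is True
```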
Source code in src/deriva_ml/core/mixins/annotation.py
list_asset_executions
list_asset_executions(
asset_rid: str,
asset_role: str | None = None,
) -> list["ExecutionRecord"]
List all executions associated with an asset.
Given an asset RID, returns a list of executions that created or used the asset, along with the role (Input/Output) in each execution.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `asset_rid` | `str` | The RID of the asset to look up. | required |
| `asset_role` | `str \| None` | Optional filter for asset role ('Input' or 'Output'). If None, returns all associations. | None |

Returns:

| Type | Description |
|---|---|
| `list[ExecutionRecord]` | List of ExecutionRecord objects for the executions associated with this asset. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If the asset RID is not found or is not an asset. |
Example
Find all executions that created this asset:

>>> executions = ml.list_asset_executions("1-abc123", asset_role="Output")
>>> for exe in executions:
...     print(f"Created by execution {exe.execution_rid}")

Find all executions that used this asset as input:

>>> executions = ml.list_asset_executions("1-abc123", asset_role="Input")
Source code in src/deriva_ml/core/mixins/asset.py
list_asset_tables
list_asset_tables() -> list[Table]
List all asset tables in the catalog.
Returns:

| Type | Description |
|---|---|
| `list[Table]` | List of Table objects that are asset tables. |
Example
>>> for table in ml.list_asset_tables():
...     print(f"Asset table: {table.name}")
Source code in src/deriva_ml/core/mixins/asset.py
list_assets
list_assets(
asset_table: Table | str,
) -> list["Asset"]
Lists contents of an asset table.
Returns a list of Asset objects for the specified asset table. Asset
types are pre-fetched in a single query and joined client-side to
avoid an N+1 round-trip pattern: for an asset table with N rows, the
catalog is hit twice (once for the assets, once for the
Asset_Type association rows) regardless of N.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `asset_table` | `Table \| str` | Table or name of the asset table to list assets for. | required |

Returns:

| Type | Description |
|---|---|
| `list[Asset]` | List of Asset objects for the assets in the table. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If the table is not an asset table or doesn't exist. |
Example
>>> assets = ml.list_assets("Image")
>>> for asset in assets:
...     print(f"{asset.asset_rid}: {asset.filename}")
Source code in src/deriva_ml/core/mixins/asset.py
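The two-query join described above can be sketched client-side. In this illustrative sketch, the record keys `RID`, `Asset`, and `Type` are assumptions for the example, not the exact catalog column names:

```python
from collections import defaultdict

# Sketch of the client-side join that avoids N+1 round-trips: one pass over
# the Asset_Type association rows builds an index, then each asset gets its
# types by an O(1) dict lookup.
def join_asset_types(assets, type_rows):
    types_by_rid = defaultdict(list)
    for row in type_rows:                      # Asset_Type association rows
        types_by_rid[row["Asset"]].append(row["Type"])
    return [dict(a, types=types_by_rid[a["RID"]]) for a in assets]

assets = [{"RID": "1-a"}, {"RID": "1-b"}]
type_rows = [{"Asset": "1-a", "Type": "png"}, {"Asset": "1-a", "Type": "image"}]
joined = join_asset_types(assets, type_rows)
print(joined[0]["types"])  # ['png', 'image']
print(joined[1]["types"])  # []
```

Whatever the row count, the catalog is queried twice, and the join cost stays linear in the number of rows returned.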
list_dataset_element_types
list_dataset_element_types() -> Iterable[Table]
List the table types that can be added as dataset members.
Returns every table that has an association with the Dataset table,
restricted to domain-schema tables and the Dataset table itself.
These are the types accepted by add_dataset_members().
Returns:

| Type | Description |
|---|---|
| `Iterable[Table]` | Iterable of Table objects that can be added as dataset members. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If the catalog schema cannot be read. |
Example
>>> types = ml.list_dataset_element_types()
>>> print([t.name for t in types])
Source code in src/deriva_ml/core/mixins/dataset.py
list_execution_dirs
list_execution_dirs() -> list[dict[str, Any]]
List execution working directories.
Returns information about each execution directory in the working directory, useful for identifying orphaned or incomplete execution outputs.
Returns:

| Type | Description |
|---|---|
| `list[dict[str, Any]]` | List of dicts, each containing: `execution_rid` (the execution RID, which is the directory name), `path` (full path to the directory), `size_bytes` (total size in bytes), `size_mb` (total size in megabytes), `modified` (last modification time, datetime), and `file_count` (number of files). |
Example
>>> ml = DerivaML('deriva.example.org', 'my_catalog')
>>> dirs = ml.list_execution_dirs()
>>> for d in dirs:
...     print(f"{d['execution_rid']}: {d['size_mb']:.1f} MB")
Source code in src/deriva_ml/core/base.py
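The per-directory statistics can be computed locally with pathlib. This is a sketch under the assumption that each execution directory is named after its RID, as the return description states; the helper name is illustrative:

```python
from pathlib import Path
import tempfile

# Sketch of how one execution-directory entry can be assembled; the real
# method returns the same kinds of fields for every directory it finds.
def dir_stats(path: Path) -> dict:
    files = [p for p in path.rglob("*") if p.is_file()]
    size = sum(p.stat().st_size for p in files)
    return {
        "execution_rid": path.name,        # directory name doubles as the RID
        "path": str(path),
        "size_bytes": size,
        "size_mb": size / (1024 * 1024),
        "file_count": len(files),
    }

with tempfile.TemporaryDirectory() as tmp:
    run = Path(tmp) / "1-abc"
    run.mkdir()
    (run / "out.txt").write_text("hello")
    stats = dir_stats(run)
    print(stats["file_count"], stats["size_bytes"])  # 1 5
```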
list_executions
list_executions(
*,
status: "ExecutionStatus | list[ExecutionStatus] | None" = None,
workflow_rid: str | None = None,
mode: "ConnectionMode | None" = None,
since: datetime | None = None,
) -> list[ExecutionSnapshot]
Enumerate locally-known executions from the SQLite registry.
Reads from the workspace SQLite registry; no server contact.
Works in both online and offline mode. Each returned
ExecutionSnapshot is a frozen Pydantic value object captured
at query time; it cannot mutate the catalog. Pending-row counts
are included in the same pass.
For live catalog queries that return mutable `ExecutionRecord` objects bound to the catalog, see find_executions() and lookup_execution().
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `status` | `ExecutionStatus \| list[ExecutionStatus] \| None` | Single ExecutionStatus or list to filter; None = all. | None |
| `workflow_rid` | `str \| None` | Match only executions tagged with this Workflow RID; None = all. | None |
| `mode` | `ConnectionMode \| None` | ConnectionMode the execution was last active under; None = all. | None |
| `since` | `datetime \| None` | Return only executions with last_activity >= this timestamp (timezone-aware). None = no time filter. | None |

Returns:

| Type | Description |
|---|---|
| `list[ExecutionSnapshot]` | List of ExecutionSnapshot objects, one per row in the registry. Empty list if nothing matches. |
Example
>>> from deriva_ml.execution.state_store import ExecutionStatus
>>> failed = ml.list_executions(status=ExecutionStatus.Failed)
>>> for snap in failed:
...     print(snap.rid, snap.error)
Source code in src/deriva_ml/core/mixins/execution.py
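Because the returned snapshots are frozen value objects, the filters above amount to pure predicates over local records. The sketch below illustrates that filtering logic with a stand-in record type; the field names (`rid`, `status`, `last_activity`) mirror the documented parameters but are assumptions about the snapshot's shape:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Illustrative stand-in for ExecutionSnapshot; frozen, like the real object.
@dataclass(frozen=True)
class Snapshot:
    rid: str
    status: str
    last_activity: datetime

def filter_executions(snaps, *, status=None, since=None):
    # status may be a single value or a list; None means "match all".
    wanted = {status} if isinstance(status, str) else (set(status) if status else None)
    return [
        s for s in snaps
        if (wanted is None or s.status in wanted)
        and (since is None or s.last_activity >= since)
    ]

snaps = [
    Snapshot("1-a", "Failed", datetime(2024, 1, 2, tzinfo=timezone.utc)),
    Snapshot("1-b", "Completed", datetime(2023, 12, 1, tzinfo=timezone.utc)),
]
print([s.rid for s in filter_executions(snaps, status="Failed")])  # ['1-a']
```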
list_feature_values
list_feature_values(
*args, **kwargs
) -> Iterable[FeatureRecord]
Retired: renamed to feature_values.
DerivaML.list_feature_values has been removed. Use the new feature_values method instead:

>>> for rec in ml.feature_values("Image", "Quality"):
...     ...

The signature is identical (table, feature_name, optional selector).
Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | Always. Points at the replacement API. |
Source code in src/deriva_ml/core/mixins/feature.py
list_files
list_files(
file_types: list[str] | None = None,
) -> list[dict[str, Any]]
Lists files in the catalog with their metadata.
Returns a list of files with their metadata including URL, MD5 hash, length, description, and associated file types. Files can be optionally filtered by type.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `file_types` | `list[str] \| None` | Filter results to only include these file types. | None |

Returns:

| Type | Description |
|---|---|
| `list[dict[str, Any]]` | List of file records, each containing: RID (resource identifier), URL (file location), MD5 (file hash), Length (file size), Description (file description), and File_Types (list of associated file types). |
Examples:
List all files:

>>> files = ml.list_files()
>>> for f in files:
...     print(f"{f['RID']}: {f['URL']}")

Filter by file type:

>>> image_files = ml.list_files(["image", "png"])
Source code in src/deriva_ml/core/mixins/file.py
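The `file_types` filter behaves as a set-intersection test against each record's File_Types list. A minimal sketch of that matching rule, using the documented record keys (`RID`, `File_Types`) but an illustrative helper name:

```python
# Sketch of the file_types filter semantics: a record matches when it carries
# at least one of the requested types; None means "no filter".
def filter_by_type(files, file_types=None):
    if file_types is None:
        return list(files)
    wanted = set(file_types)
    return [f for f in files if wanted & set(f.get("File_Types", []))]

files = [
    {"RID": "1-a", "File_Types": ["image", "png"]},
    {"RID": "1-b", "File_Types": ["csv"]},
]
print([f["RID"] for f in filter_by_type(files, ["image"])])  # ['1-a']
print(len(filter_by_type(files)))                            # 2
```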
list_vocabulary_terms
list_vocabulary_terms(
table: str | Table,
) -> list[VocabularyTerm]
Lists all terms in a vocabulary table.
Retrieves all terms, their descriptions, and synonyms from a controlled vocabulary table.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Vocabulary table to list terms from (name or Table object). | required |

Returns:

| Type | Description |
|---|---|
| `list[VocabularyTerm]` | List of vocabulary terms with their metadata. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If table doesn't exist or is not a vocabulary table. |
Examples:
>>> terms = ml.list_vocabulary_terms("tissue_types")
>>> for term in terms:
... print(f"{term.name}: {term.description}")
... if term.synonyms:
... print(f" Synonyms: {', '.join(term.synonyms)}")
Source code in src/deriva_ml/core/mixins/vocabulary.py
list_workflow_executions
list_workflow_executions(
workflow: str,
) -> list[str]
Return execution RIDs that ran the given workflow.
The workflow argument resolves in two steps: first as a Workflow
RID, and if that fails, as a Workflow_Type name. The returned list
contains every execution RID for every workflow that matches.
This method is the catalog-backed building block for FeatureRecord.select_by_workflow(workflow, container=ml): it resolves the workflow's execution set once, and the selector closes over the result for cheap per-group membership testing.
Entries are unique by construction (each execution runs one workflow).
Consumers that need O(1) membership testing convert to set at the
call site.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `workflow` | `str` | Workflow RID, or a Workflow_Type name. | required |

Returns:

| Type | Description |
|---|---|
| `list[str]` | List of execution RIDs, in insertion order. May be empty if the workflow exists but has no executions yet. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If the workflow cannot be resolved as a Workflow RID or a Workflow_Type name. |
Example
List all executions of a workflow and count them:
>>> rids = ml.list_workflow_executions("Glaucoma_Training_v2")
>>> print(f"{len(rids)} executions of this workflow")
Use as the catalog-backed resolver for the selector factory:
>>> from deriva_ml.feature import FeatureRecord
>>> sel = FeatureRecord.select_by_workflow(
... "Glaucoma_Training_v2", container=ml,
... )
Source code in src/deriva_ml/core/mixins/feature.py
lookup_asset
lookup_asset(asset_rid: RID) -> 'Asset'
Look up an asset by its RID.
Returns an Asset object for the specified RID. The asset can be from any asset table in the catalog.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `asset_rid` | `RID` | The RID of the asset to look up. | required |

Returns:

| Type | Description |
|---|---|
| `Asset` | Asset object for the specified RID. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If the RID is not found or is not an asset. |
Example
>>> asset = ml.lookup_asset("3JSE")
>>> print(f"File: {asset.filename}, Table: {asset.asset_table}")
Source code in src/deriva_ml/core/mixins/asset.py
lookup_dataset
lookup_dataset(
dataset: RID | DatasetSpec,
deleted: bool = False,
) -> "Dataset"
Look up a dataset by RID or DatasetSpec.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `RID \| DatasetSpec` | Dataset RID or DatasetSpec to look up. | required |
| `deleted` | `bool` | If True, include datasets that have been marked as deleted. | False |

Returns:

| Name | Type | Description |
|---|---|---|
| `Dataset` | `Dataset` | The dataset object for the specified RID. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If the dataset is not found. |
Example
>>> dataset = ml.lookup_dataset("4HM")
>>> print(f"Version: {dataset.current_version}")
Source code in src/deriva_ml/core/mixins/dataset.py
lookup_execution
lookup_execution(
execution_rid: RID,
) -> "ExecutionRecord"
Look up a single execution by RID in the live catalog.
Queries the ERMrest catalog for the Execution row with the given RID and returns an ExecutionRecord: a live, catalog-bound value whose mutable properties (status, description) write through to the catalog on assignment. Online mode only.
For enumerating executions from the local SQLite registry without
touching the catalog, see list_executions(). For catalog-side
filter queries returning live records, see find_executions().
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `execution_rid` | `RID` | Resource Identifier (RID) of the execution. | required |

Returns:

| Type | Description |
|---|---|
| `ExecutionRecord` | A live ExecutionRecord whose setters (status, description) write through to the catalog. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If execution_rid is not valid or doesn't refer to an Execution record. |
Example
>>> record = ml.lookup_execution("1-abc123")
>>> record.status = ExecutionStatus.Uploaded  # writes to catalog
Source code in src/deriva_ml/core/mixins/execution.py
lookup_experiment
lookup_experiment(
execution_rid: RID,
) -> "Experiment"
Look up an experiment by execution RID.
Creates an Experiment object for analyzing completed executions. Provides convenient access to execution metadata, configuration choices, model parameters, inputs, and outputs.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `execution_rid` | `RID` | Resource Identifier (RID) of the execution. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `Experiment` | `Experiment` | An experiment object for the given execution RID. |
Example
>>> exp = ml.lookup_experiment("47BE")
>>> print(exp.name)            # e.g., "cifar10_quick"
>>> print(exp.config_choices)  # Hydra config names used
>>> print(exp.model_config)    # Model hyperparameters
Source code in src/deriva_ml/core/mixins/execution.py
lookup_feature
lookup_feature(
table: str | Table,
feature_name: str,
) -> Feature
Look up a feature definition by table and name.
Returns a Feature object that describes the schema structure of a feature, not the feature values themselves. A Feature is a schema-level descriptor derived by inspecting the catalog's association tables. It tells you:

- What table the feature annotates (`target_table`), e.g. Image.
- Where values are stored (`feature_table`), the association table linking targets to values and executions.
- What kind of values it holds, classified by column role:
  - `term_columns`: columns referencing controlled vocabulary tables (e.g., a `Diagnosis_Type` column pointing to a vocabulary of diagnosis terms).
  - `asset_columns`: columns referencing asset tables (e.g., a `Segmentation_Mask` column).
  - `value_columns`: columns holding direct values like floats, ints, or text (e.g., a `confidence` score).
The Feature object also provides feature_record_class(), which
returns a dynamically generated Pydantic model for constructing
validated feature records to insert into the catalog.
To retrieve actual feature values, use feature_values
instead.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | The table the feature is defined on (name or Table object). | required |
| `feature_name` | `str` | Name of the feature to look up. | required |

Returns:

| Type | Description |
|---|---|
| `Feature` | A Feature schema descriptor. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If the feature doesn't exist on the specified table. |
Example
>>> feature = ml.lookup_feature("Image", "Classification")
>>> print(f"Feature: {feature.feature_name}")
>>> print(f"Stored in: {feature.feature_table.name}")
>>> print(f"Term columns: {[c.name for c in feature.term_columns]}")
>>> print(f"Value columns: {[c.name for c in feature.value_columns]}")
Source code in src/deriva_ml/core/mixins/feature.py
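The three column roles amount to a classification by what each column references. The sketch below illustrates that rule; representing a column as a `(name, referenced_table)` tuple (with `None` for direct values) is an assumption made for the example, not the catalog's actual column model:

```python
# Illustrative sketch of how feature columns split into term, asset, and
# value roles based on the table each column references.
def classify_columns(columns, vocab_tables, asset_tables):
    roles = {"term_columns": [], "asset_columns": [], "value_columns": []}
    for name, ref in columns:
        if ref in vocab_tables:
            roles["term_columns"].append(name)
        elif ref in asset_tables:
            roles["asset_columns"].append(name)
        else:
            roles["value_columns"].append(name)
    return roles

cols = [
    ("Diagnosis_Type", "Diagnosis"),   # references a vocabulary table
    ("Segmentation_Mask", "Mask"),     # references an asset table
    ("confidence", None),              # direct float value
]
print(classify_columns(cols, vocab_tables={"Diagnosis"}, asset_tables={"Mask"}))
```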
lookup_term
lookup_term(
table: str | Table, term_name: str
) -> VocabularyTermHandle
Finds a term in a vocabulary table.
Searches for a term in the specified vocabulary table, matching either the primary name or any of its synonyms. Results are cached for performance: subsequent lookups in the same vocabulary table are served from the cache.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Vocabulary table to search in (name or Table object). | required |
| `term_name` | `str` | Name or synonym of the term to find. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `VocabularyTermHandle` | `VocabularyTermHandle` | The matching vocabulary term, with methods to modify it. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLVocabularyException` | If the table is not a vocabulary table, or the term is not found. |
Examples:
Look up by primary name:

>>> term = ml.lookup_term("tissue_types", "epithelial")
>>> print(term.description)

Look up by synonym:

>>> term = ml.lookup_term("tissue_types", "epithelium")

Modify the term:

>>> term = ml.lookup_term("tissue_types", "epithelial")
>>> term.description = "Updated description"
>>> term.synonyms = ("epithelium", "epithelial_tissue")
Source code in src/deriva_ml/core/mixins/vocabulary.py
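The name-or-synonym matching that the cache makes cheap can be sketched as a one-time index build followed by dict lookups. The record keys (`Name`, `Synonyms`) are illustrative assumptions, not the exact catalog column names:

```python
# Sketch of a per-table term index: every primary name and every synonym
# maps to the same term record, so lookups by either form are O(1).
def build_term_index(terms):
    index = {}
    for t in terms:
        index[t["Name"]] = t
        for syn in t.get("Synonyms") or []:
            index[syn] = t
    return index

terms = [{"Name": "epithelial", "Synonyms": ["epithelium"]}]
index = build_term_index(terms)
print(index["epithelium"]["Name"])                 # epithelial
print(index["epithelial"] is index["epithelium"])  # True: same record
```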
lookup_workflow
lookup_workflow(rid: RID) -> Workflow
Look up a workflow by its Resource Identifier (RID).
Retrieves a workflow from the catalog by its RID and returns a Workflow object bound to the catalog. The returned Workflow can be modified (e.g., updating its description) and changes will be reflected in the catalog.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `rid` | `RID` | Resource Identifier of the workflow to look up. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `Workflow` | `Workflow` | The workflow object bound to this catalog, allowing properties like `description` to be read and updated. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If the RID does not correspond to a workflow in the catalog. |
Examples:
Look up a workflow and read its properties:
>>> workflow = ml.lookup_workflow("2-ABC1")
>>> print(f"Name: {workflow.name}")
>>> print(f"Description: {workflow.description}")
>>> print(f"Type: {workflow.workflow_type}")
Update a workflow's description (persisted to catalog):
>>> workflow = ml.lookup_workflow("2-ABC1")
>>> workflow.description = "Updated analysis pipeline for RNA sequences"
>>> # The change is immediately written to the catalog
Attempting to update on a read-only catalog raises an error:
>>> snapshot = ml.catalog_snapshot("2023-01-15T10:30:00")
>>> workflow = snapshot.lookup_workflow("2-ABC1")
>>> workflow.description = "New description"
DerivaMLException: Cannot update workflow description on a read-only
catalog snapshot. Use a writable catalog connection instead.
Source code in src/deriva_ml/core/mixins/workflow.py
lookup_workflow_by_url
lookup_workflow_by_url(
url_or_checksum: str,
) -> Workflow
Look up a workflow by URL or checksum and return the full Workflow object.
Searches for a workflow in the catalog that matches the given URL or checksum and returns a Workflow object bound to the catalog. This allows you to both identify a workflow by its source code location and modify its properties (e.g., description).
The URL should be a GitHub URL pointing to the specific version of the workflow source code. The format typically includes the commit hash:
https://github.com/org/repo/blob/<commit_hash>/path/to/workflow.py
Alternatively, you can search by the Git object hash (checksum) of the workflow file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `url_or_checksum` | `str` | GitHub URL with commit hash, or Git object hash (checksum) of the workflow file. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `Workflow` | `Workflow` | The workflow object bound to this catalog, allowing properties like `description` to be read and updated. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If no workflow with the given URL or checksum is found in the catalog. |
Examples:
Look up a workflow by its GitHub URL:
>>> url = "https://github.com/org/repo/blob/abc123/analysis.py"
>>> workflow = ml.lookup_workflow_by_url(url)
>>> print(f"Found: {workflow.name}")
>>> print(f"Version: {workflow.version}")
Look up by Git object hash (checksum):
>>> workflow = ml.lookup_workflow_by_url("abc123def456789...")
>>> print(f"Name: {workflow.name}")
>>> print(f"URL: {workflow.url}")
Update the workflow's description after lookup:
>>> workflow = ml.lookup_workflow_by_url(url)
>>> workflow.description = "Updated analysis pipeline"
>>> # The change is persisted to the catalog
Typical GitHub URL formats supported:
# Full blob URL with commit hash
https://github.com/org/repo/blob/abc123def/src/workflow.py
# The URL is matched exactly, so ensure it matches what was
# recorded when the workflow was registered
Source code in src/deriva_ml/core/mixins/workflow.py
pathBuilder
pathBuilder() -> SchemaWrapper
Returns catalog path builder for queries.
The path builder provides a fluent interface for constructing complex queries against the catalog. This is a core component used by many other methods to interact with the catalog.
Returns:

| Type | Description |
|---|---|
| `SchemaWrapper` | A new instance of the catalog path builder. |

Raises:

| Type | Description |
|---|---|
| `Exception` | If the catalog connection is unavailable. |
Example
>>> pb = ml.pathBuilder()
>>> path = pb.schemas['my_schema'].tables['my_table']
>>> results = path.entities().fetch()
Source code in src/deriva_ml/core/mixins/path_builder.py
pending_summary
pending_summary() -> WorkspacePendingSummary
Workspace-wide pending-upload summary.
Queries every known-local execution and returns a WorkspacePendingSummary aggregating per-execution snapshots. Useful for standalone uploader processes that want to know what's pending across runs.
Returns:

| Type | Description |
|---|---|
| `WorkspacePendingSummary` | A WorkspacePendingSummary with one PendingSummary per execution that has at least one registry row. |
Example
>>> print(ml.pending_summary().render())
Source code in src/deriva_ml/core/mixins/execution.py
pin_schema
pin_schema(
reason: str | None = None,
) -> "SchemaDiff | None"
Freeze the local schema cache at its current snapshot.
While pinned, refresh_schema() refuses to update the cache (even with force=True). Call unpin_schema() to clear the pin.

Online mode additionally checks for structural drift: if the live catalog has moved on and its /schema payload differs from the cached one (columns, tables, foreign keys, etc.), a SchemaDiff describing the drift is returned, and a WARNING is logged. The pin is still persisted.

Offline mode always returns None: the cache is pinned, but no live comparison is possible.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `reason` | `str \| None` | Free-text explanation stored alongside the pin. Useful for reporting. | None |

Returns:

| Type | Description |
|---|---|
| `SchemaDiff \| None` | A SchemaDiff if, in online mode, the live catalog's schema differs structurally from the cache; None otherwise (e.g., offline mode, no drift, or the catalog version was bumped without schema change). |

Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If the workspace has no cache yet; run an online refresh first. |
Source code in src/deriva_ml/core/base.py
pin_status
pin_status() -> 'PinStatus'
Return the current pin state of the local schema cache.
Works in any mode.
Returns:

| Type | Description |
|---|---|
| `PinStatus` | A PinStatus describing the current pin state of the local schema cache. |

Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If the workspace has no cache file. |
Source code in src/deriva_ml/core/base.py
refresh_schema
refresh_schema(*, force: bool = False) -> None
Fetch the current catalog schema and overwrite the workspace cache.
Online mode only. Refuses in two cases:

- The cache is pinned (via pin_schema()). Raises DerivaMLSchemaPinned. force=True does NOT bypass a pin; call unpin_schema() first.
- The workspace has pending rows (staged/leasing/leased/uploading/failed). Raises DerivaMLSchemaRefreshBlocked unless force=True is passed; a forced refresh may leave staged rows whose metadata references columns or types no longer in the new schema, causing catalog-insert failures on the next upload.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `force` | `bool` | If True, refresh even when the workspace has pending rows. Does NOT bypass a pin. | False |

Raises:

| Type | Description |
|---|---|
| `DerivaMLReadOnlyError` | If called in offline mode. |
| `DerivaMLSchemaPinned` | If the cache is pinned. |
| `DerivaMLSchemaRefreshBlocked` | If the workspace has pending rows and force is False. |
Source code in src/deriva_ml/core/base.py
remove_visible_column
remove_visible_column(
table: str | Table,
context: str,
column: str | list[str] | int,
) -> list[Any]
Remove a column from the visible-columns list for a specific context.
Convenience method for removing columns without replacing the entire
visible-columns annotation. Changes are staged until
apply_annotations() is called.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Table name (str) or Table object. | required |
| `context` | `str` | The context to modify (e.g., "compact"). | required |
| `column` | `str \| list[str] \| int` | Column to remove: a str column name to find and remove, a list giving a foreign key reference, or an int index into the current list. | required |

Returns:

| Type | Description |
|---|---|
| `list[Any]` | The updated column list for the context. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableTypeError` | If table is not a valid table. |
| `DerivaMLException` | If the annotation or context doesn't exist, or the column is not found. |
Example
>>> ml.remove_visible_column("Image", "compact", "Description")
>>> ml.remove_visible_column("Image", "compact", 0)
>>> ml.apply_annotations()
Source code in src/deriva_ml/core/mixins/annotation.py
remove_visible_foreign_key
remove_visible_foreign_key(
table: str | Table,
context: str,
foreign_key: list[str] | int,
) -> list[Any]
Remove a foreign key from the visible-foreign-keys list for a specific context.
Convenience method for removing related tables without replacing the
entire visible-foreign-keys annotation. Changes are staged until
apply_annotations() is called.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Table name (str) or Table object. | required |
| `context` | `str` | The context to modify (e.g., "detailed"). | required |
| `foreign_key` | `list[str] \| int` | Foreign key to remove: a list giving the FK reference (as in the example below), or an int index into the current list. | required |

Returns:

| Type | Description |
|---|---|
| `list[Any]` | The updated foreign key list for the context. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableTypeError` | If table is not a valid table. |
| `DerivaMLException` | If the annotation or context doesn't exist, or the foreign key is not found. |
Example
>>> ml.remove_visible_foreign_key("Subject", "detailed", ["domain", "Image_Subject_fkey"])
>>> ml.remove_visible_foreign_key("Subject", "detailed", 0)
>>> ml.apply_annotations()
Source code in src/deriva_ml/core/mixins/annotation.py
reorder_visible_columns
reorder_visible_columns(
table: str | Table,
context: str,
new_order: list[int]
| list[
str | list[str] | dict[str, Any]
],
) -> list[Any]
Reorder columns in the visible-columns list for a specific context.
Convenience method for reordering columns without manually
reconstructing the list. Changes are staged until
apply_annotations() is called.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Table name (str) or Table object. | required |
| `context` | `str` | The context to modify (e.g., "compact"). | required |
| `new_order` | `list[int] \| list[str \| list[str] \| dict[str, Any]]` | The new order specification: a list of int giving a permutation of the current indices, or a list of column entries (names, FK references, or pseudo-column dicts) in the desired order. | required |

Returns:

| Type | Description |
|---|---|
| `list[Any]` | The reordered column list. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableTypeError` | If table is not a valid table. |
| `DerivaMLException` | If the annotation or context doesn't exist, or the index list is invalid. |
Example
>>> ml.reorder_visible_columns("Image", "compact", [2, 0, 1, 3, 4])
>>> ml.reorder_visible_columns("Image", "compact", ["Filename", "Subject", "RID"])
>>> ml.apply_annotations()
Source code in src/deriva_ml/core/mixins/annotation.py
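The index form of new_order amounts to applying a permutation to the current spec list. A minimal standalone sketch of that behavior (not the library implementation; `reorder_by_index` is a hypothetical helper):

```python
def reorder_by_index(specs: list, new_order: list[int]) -> list:
    """Return specs permuted by new_order; each current index must appear exactly once."""
    if sorted(new_order) != list(range(len(specs))):
        raise ValueError("new_order must be a permutation of the current indices")
    return [specs[i] for i in new_order]

cols = ["RID", "Filename", "Subject"]
print(reorder_by_index(cols, [2, 0, 1]))  # ['Subject', 'RID', 'Filename']
```

The permutation check mirrors why an invalid index list raises DerivaMLException: a reorder must neither drop nor duplicate entries.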
reorder_visible_foreign_keys
reorder_visible_foreign_keys(
    table: str | Table,
    context: str,
    new_order: list[int] | list[list[str] | dict[str, Any]],
) -> list[Any]

Reorder foreign keys in the visible-foreign-keys list for a specific context.

Convenience method for reordering related tables without manually reconstructing the list. Changes are staged until apply_annotations() is called.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Table name (str) or Table object. | required |
| `context` | `str` | The context to modify (e.g., `"detailed"`). | required |
| `new_order` | `list[int] \| list[list[str] \| dict[str, Any]]` | The new order specification: either a list of int giving indices into the current list in the desired order, or a full replacement list of foreign-key specs. | required |

Returns:

| Type | Description |
|---|---|
| `list[Any]` | The reordered foreign key list. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableTypeError` | If `table` does not identify a valid table. |
| `DerivaMLException` | If the annotation or context doesn't exist, or the index list is invalid. |

Example

    ml.reorder_visible_foreign_keys("Subject", "detailed", [2, 0, 1])  # doctest: +SKIP
    ml.apply_annotations()  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/annotation.py
resolve_rid
resolve_rid(
rid: RID,
) -> ResolveRidResult
Resolves RID to catalog location.
Looks up a RID and returns information about where it exists in the catalog, including schema, table, and column metadata.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `rid` | `RID` | Resource Identifier to resolve. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| ResolveRidResult | `ResolveRidResult` | Named tuple containing: schema (schema name), table (table name), columns (column definitions), and datapath (path builder for accessing the entity). |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If the RID doesn't exist in the catalog. |
Examples:
>>> result = ml.resolve_rid("1-abc123")
>>> print(f"Found in {result.schema}.{result.table}")
>>> data = result.datapath.entities().fetch()
Source code in src/deriva_ml/core/mixins/rid_resolution.py
resolve_rids
resolve_rids(
    rids: set[RID] | list[RID],
    candidate_tables: list[Table] | None = None,
) -> dict[RID, BatchRidResult]
Batch resolve multiple RIDs efficiently.
Resolves multiple RIDs in batched queries, significantly faster than calling resolve_rid() for each RID individually. Instead of N network calls for N RIDs, this makes one query per candidate table.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `rids` | `set[RID] \| list[RID]` | Set or list of RIDs to resolve. | required |
| `candidate_tables` | `list[Table] \| None` | Optional list of Table objects to search in. If not provided, searches all tables in the domain and ML schemas. | None |

Returns:

| Type | Description |
|---|---|
| `dict[RID, BatchRidResult]` | Mapping from each resolved RID to its BatchRidResult containing table information. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If any RID cannot be resolved. |
Example

    results = ml.resolve_rids(["1-ABC", "2-DEF", "3-GHI"])
    for rid, info in results.items():
        print(f"{rid} is in table {info.table_name}")
Source code in src/deriva_ml/core/mixins/rid_resolution.py
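The batching strategy described above (one query per candidate table rather than one per RID) can be sketched in plain Python. Catalog queries are stood in for by set-membership checks, and `batch_resolve` is a hypothetical helper, not the library code:

```python
def batch_resolve(rids, tables):
    """Resolve each RID to its table with one membership test per table.

    `tables` maps table name -> set of RIDs it contains, standing in for
    one catalog query per candidate table (instead of one per RID).
    """
    resolved, pending = {}, set(rids)
    for name, members in tables.items():
        hits = pending & members
        resolved.update(dict.fromkeys(hits, name))
        pending -= hits
        if not pending:  # stop early once everything is resolved
            break
    if pending:
        raise KeyError(f"unresolved RIDs: {sorted(pending)}")
    return resolved

catalog = {"Image": {"1-ABC"}, "Subject": {"2-DEF", "3-GHI"}}
print(batch_resolve(["1-ABC", "2-DEF"], catalog))
```

For N RIDs spread over T tables this performs at most T lookups instead of N, which is the speedup the real method claims.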
resume_execution
resume_execution(
execution_rid: RID,
) -> "Execution"
Re-hydrate an Execution from the workspace SQLite registry.
Works in both online and offline modes. The execution's recorded mode is independent of the current DerivaML instance's mode — a user can create an execution online, run it offline, then upload online, all via the same RID.
Before returning, runs just-in-time state reconciliation (spec §2.2): if online and sync_pending=True, flushes SQLite to the catalog; then checks for catalog/SQLite disagreement and applies the disagreement rules.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `execution_rid` | `RID` | Server-assigned Execution RID returned by a prior create_execution call. | required |

Returns:

| Type | Description |
|---|---|
| `Execution` | An Execution object bound to this DerivaML instance, with lifecycle fields exposed as SQLite read-through properties (see spec §2.3). |

Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | If no matching executions row exists in the workspace registry. |
| `DerivaMLStateInconsistency` | If just-in-time reconciliation surfaces a disagreement outside the six documented cases (see state_machine.reconcile_with_catalog). |
Example

    ml = DerivaML(hostname="example.org", catalog_id="42")  # doctest: +SKIP
    exe = ml.resume_execution("5-ABC")
    exe.status
    exe.upload_outputs()
Source code in src/deriva_ml/core/mixins/execution.py
select_by_workflow
select_by_workflow(
*args, **kwargs
) -> FeatureRecord
Retired — use FeatureRecord.select_by_workflow(workflow, container=...) factory.
DerivaML.select_by_workflow has been removed. The replacement is a classmethod factory that returns a selector callable compatible with the selector parameter of feature_values:

    from deriva_ml.feature import FeatureRecord
    sel = FeatureRecord.select_by_workflow(workflow, container=ml)
    for rec in ml.feature_values("Image", "Quality", selector=sel):
        ...
Raises:

| Type | Description |
|---|---|
| `DerivaMLException` | Always. Points at the replacement API. |
Source code in src/deriva_ml/core/mixins/feature.py
set_column_display
set_column_display(
table: str | Table,
column_name: str,
annotation: dict[str, Any] | None,
) -> str
Set the column-display annotation on a column.
Controls how a column's values are rendered, including custom
formatting and markdown patterns. The annotation dict follows the
Chaise column-display tag specification, keyed by context name
(or "*" for all contexts), e.g.
{"*": {"pre_format": {"format": "%.2f"}}}.
Changes are staged locally until apply_annotations() is called.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Table name (str) or Table object. | required |
| `column_name` | `str` | Name of the column. | required |
| `annotation` | `dict[str, Any] \| None` | The column-display annotation dict. Set to None to remove the annotation. | required |

Returns:

| Type | Description |
|---|---|
| `str` | Column identifier (table and column name). |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableTypeError` | If `table` does not identify a valid table. |

Example

    ml.set_column_display("Measurement", "Value", {  # doctest: +SKIP
        "*": {"pre_format": {"format": "%.2f"}}
    })
    ml.apply_annotations()  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/annotation.py
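The pre_format pattern is a printf-style format string. Python's % operator shows the effect a value renderer would produce (a minimal illustration of the annotation's shape, not Chaise itself):

```python
# Hypothetical column-display annotation using a printf-style pre_format pattern,
# keyed by context name ("*" = all contexts).
annotation = {"*": {"pre_format": {"format": "%.2f"}}}

# The UI applies the pattern to each value; Python's % operator mimics that.
fmt = annotation["*"]["pre_format"]["format"]
print(fmt % 3.14159)  # 3.14
```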
set_display_annotation
set_display_annotation(
table: str | Table,
annotation: dict[str, Any] | None,
column_name: str | None = None,
) -> str
Set the Chaise display annotation on a table or column.
The display annotation controls how the table or column is labeled in
the Chaise web UI. The dict shape follows the Chaise display tag
specification, e.g. {"name": "Human Readable Name"}.
Changes are staged locally until apply_annotations() is called.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Table name (str) or Table object. | required |
| `annotation` | `dict[str, Any] \| None` | Annotation dict, e.g. `{"name": "Human Readable Name"}`. Set to None to remove the annotation. | required |
| `column_name` | `str \| None` | If provided, sets the annotation on that column; otherwise sets it on the table. | None |

Returns:

| Type | Description |
|---|---|
| `str` | Target identifier: the table name when setting on the table, or the column identifier when column_name is given. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableTypeError` | If `table` does not identify a valid table. |

Example

    ml.set_display_annotation("Image", {"name": "Scan Image"})  # doctest: +SKIP
    ml.set_display_annotation("Image", {"name": "File Name"}, column_name="Filename")  # doctest: +SKIP
    ml.apply_annotations()  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/annotation.py
set_strict_preallocated_rid
set_strict_preallocated_rid(
table: str | Table,
strict: bool = True,
) -> str
Mark or unmark an asset table as strict-preallocated-RID.
When strict=True, deriva-py's uploader raises
DerivaUploadCatalogCreateError if an upload's caller-supplied
pre-allocated RID differs from an existing catalog row's RID for
the same MD5+Filename. When False (or the annotation is
absent), the uploader silently adopts the existing row's RID
(legacy behavior preserved for shared artifacts like
Execution_Metadata configs).
Use strict mode for tables whose rows are referenced by FK columns in the same upload batch — any unexpected RID reassignment would corrupt those references.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Asset table name or Table object. | required |
| `strict` | `bool` | If True, apply the strict annotation; if False, remove it, restoring the legacy adopt-existing-RID behavior. | True |

Returns:

| Type | Description |
|---|---|
| `str` | The table's name. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableTypeError` | If `table` does not identify a valid asset table. |

Example

    ml.set_strict_preallocated_rid("ScanResult", strict=True)  # doctest: +SKIP
    ml.apply_annotations()  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/annotation.py
set_table_display
set_table_display(
table: str | Table,
annotation: dict[str, Any] | None,
) -> str
Set the table-display annotation on a table.
Controls table-level display options such as row-naming patterns,
default page size, and sort order. The annotation dict follows the
Chaise table-display tag specification, e.g.
{"row_name": {"row_markdown_pattern": "{{{Name}}}"}}.
Changes are staged locally until apply_annotations() is called.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Table name (str) or Table object. | required |
| `annotation` | `dict[str, Any] \| None` | The table-display annotation dict. Set to None to remove the annotation. | required |

Returns:

| Type | Description |
|---|---|
| `str` | Table name. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableTypeError` | If `table` does not identify a valid table. |

Example

    ml.set_table_display("Subject", {  # doctest: +SKIP
        "row_name": {
            "row_markdown_pattern": "{{{Name}}} ({{{Species}}})"
        }
    })
    ml.apply_annotations()  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/annotation.py
set_visible_columns
set_visible_columns(
table: str | Table,
annotation: dict[str, Any] | None,
) -> str
Set the visible-columns annotation on a table.
Controls which columns appear in different UI contexts and their order.
The annotation is a dict mapping context names (e.g. "compact",
"detailed", "entry") to lists of column specs. Each spec may
be a plain column-name string, a foreign-key reference list
[schema, constraint_name], or a pseudo-column dict per the Chaise
visible-columns specification.
Changes are staged locally until apply_annotations() is called.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Table name (str) or Table object. | required |
| `annotation` | `dict[str, Any] \| None` | The visible-columns annotation dict. Set to None to remove the annotation. | required |

Returns:

| Type | Description |
|---|---|
| `str` | Table name. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableTypeError` | If `table` does not identify a valid table. |

Example

    ml.set_visible_columns("Image", {  # doctest: +SKIP
        "compact": ["RID", "Filename", "Subject"],
        "detailed": ["RID", "Filename", "Subject", "Description"]
    })
    ml.apply_annotations()  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/annotation.py
set_visible_foreign_keys
set_visible_foreign_keys(
table: str | Table,
annotation: dict[str, Any] | None,
) -> str
Set the visible-foreign-keys annotation on a table.
Controls which related tables (via inbound foreign keys) appear in
different UI contexts and their order. The annotation is a dict
mapping context names to lists of FK specs. Each FK spec is a list
[schema, constraint_name] referencing an inbound foreign key, or
a pseudo-column dict per the Chaise visible-foreign-keys specification.
Changes are staged locally until apply_annotations() is called.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table` | `str \| Table` | Table name (str) or Table object. | required |
| `annotation` | `dict[str, Any] \| None` | The visible-foreign-keys annotation dict. Set to None to remove the annotation. | required |

Returns:

| Type | Description |
|---|---|
| `str` | Table name. |

Raises:

| Type | Description |
|---|---|
| `DerivaMLTableTypeError` | If `table` does not identify a valid table. |

Example

    ml.set_visible_foreign_keys("Subject", {  # doctest: +SKIP
        "detailed": [
            ["domain", "Image_Subject_fkey"],
            ["domain", "Diagnosis_Subject_fkey"]
        ]
    })
    ml.apply_annotations()  # doctest: +SKIP
Source code in src/deriva_ml/core/mixins/annotation.py
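A visible-foreign-keys spec is either a [schema, constraint_name] pair or a pseudo-column dict. A small hypothetical validator sketches that shape check (shapes assumed from the description above; `is_valid_fk_spec` is not part of the library):

```python
from typing import Any

def is_valid_fk_spec(spec: Any) -> bool:
    """Check one entry of a visible-foreign-keys context list.

    Assumed shapes: a two-element [schema, constraint_name] list of
    strings, or a pseudo-column dict.
    """
    if isinstance(spec, dict):
        return True
    return (
        isinstance(spec, list)
        and len(spec) == 2
        and all(isinstance(part, str) for part in spec)
    )

print(is_valid_fk_spec(["domain", "Image_Subject_fkey"]))  # True
```

Note that a bare column-name string, valid in visible-columns, is not a valid foreign-key spec here.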
unpin_schema
unpin_schema() -> None
Clear the schema-cache pin. No-op if not pinned.
Works in any mode. After unpinning, refresh_schema() is allowed again (subject to the pending-rows guard).

Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If the workspace has no cache file. |
Source code in src/deriva_ml/core/base.py
upload_pending
upload_pending(
*,
execution_rids: "list[RID] | None" = None,
retry_failed: bool = False,
) -> "UploadReport"
Blocking upload of pending state for selected executions.
Flushes all pending rows (catalog inserts, asset uploads) for the named executions to the live catalog. Blocks until complete. For a non-blocking version that returns a job handle, use _start_upload(). Online mode only.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `execution_rids` | `list[RID] \| None` | List of RIDs, or None to drain every execution that has pending work. | None |
| `retry_failed` | `bool` | Include rows in status='failed'. | False |

Returns:

| Type | Description |
|---|---|
| `UploadReport` | UploadReport with totals, per-table counts, and error lines. |

Example

    report = ml.upload_pending()  # doctest: +SKIP
    print(f"{report.total_uploaded} uploaded, "
          f"{report.total_failed} failed")
Source code in src/deriva_ml/core/mixins/execution.py
validate_schema
validate_schema(
strict: bool = False,
) -> "SchemaValidationReport"
Validate that the catalog's ML schema matches the expected structure.
This method inspects the catalog schema and verifies that it contains all the required tables, columns, vocabulary terms, and relationships that are created by the ML schema initialization routines in create_schema.py.
The validation checks:

- All required ML tables exist (Dataset, Execution, Workflow, etc.)
- All required columns exist with correct types
- All required vocabulary tables exist (Asset_Type, Dataset_Type, etc.)
- All required vocabulary terms are initialized
- All association tables exist for relationships

In strict mode, the validator also reports errors for:

- Extra tables not in the expected schema
- Extra columns not in the expected table definitions
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `strict` | `bool` | If True, extra tables and columns are reported as errors. If False (default), they are reported as informational items. Use strict=True to verify that a clean ML catalog matches exactly; use strict=False to validate a catalog that may have domain extensions. | False |

Returns:

| Type | Description |
|---|---|
| `SchemaValidationReport` | Validation results. Key attributes: is_valid (True if no errors were found), errors, warnings, info, to_text() (human-readable report), and to_dict() (JSON-serializable dictionary). |
Example

    ml = DerivaML('localhost', 'my_catalog')
    report = ml.validate_schema(strict=False)
    if report.is_valid:
        print("Schema is valid!")
    else:
        print(report.to_text())

    # Strict validation for a fresh ML catalog
    report = ml.validate_schema(strict=True)
    print(f"Found {len(report.errors)} errors, {len(report.warnings)} warnings")

    # Get report as dictionary for JSON/logging
    import json
    print(json.dumps(report.to_dict(), indent=2))
Note
This method validates the ML schema (typically 'deriva-ml'), not the domain schema. Domain-specific tables and columns are not checked unless they are part of the ML schema itself.
See Also
- deriva_ml.schema.validation.SchemaValidationReport
- deriva_ml.schema.validation.validate_ml_schema
Source code in src/deriva_ml/core/base.py
DerivaMLConfig
Bases: BaseModel
Configuration model for DerivaML instances.
This Pydantic model defines all configurable parameters for a DerivaML instance. It can be used directly or via Hydra configuration files.
Attributes:

| Name | Type | Description |
|---|---|---|
| `hostname` | `str` | Hostname of the Deriva server (e.g., 'deriva.example.org'). |
| `catalog_id` | `str \| int` | Catalog identifier, either numeric ID or catalog name. |
| `domain_schemas` | `str \| set[str] \| None` | Optional set of domain schema names. If None, auto-detects all non-system schemas. Use this when working with catalogs that have multiple user-defined schemas. |
| `default_schema` | `str \| None` | The default schema for table creation operations. If None and there is exactly one domain schema, that schema is used. If there are multiple domain schemas, this must be specified for table creation to work without explicit schema parameters. |
| `project_name` | `str \| None` | Project name for organizing outputs. Defaults to default_schema. |
| `cache_dir` | `str \| Path \| None` | Directory for caching downloaded datasets. Defaults to working_dir/cache. |
| `working_dir` | `str \| Path \| None` | Base directory for computation data. Defaults to ~/deriva-ml. |
| `hydra_runtime_output_dir` | `str \| Path \| None` | Hydra's runtime output directory (set automatically). |
| `ml_schema` | `str` | Schema name for ML tables. Defaults to 'deriva-ml'. |
| `logging_level` | `Any` | Logging level for DerivaML. Defaults to WARNING. |
| `deriva_logging_level` | `Any` | Logging level for Deriva libraries. Defaults to WARNING. |
| `credential` | `Any` | Authentication credentials. If None, retrieved automatically. |
| `s3_bucket` | `str \| None` | S3 bucket URL for dataset bag storage (e.g., 's3://my-bucket'). If provided, enables MINID creation and S3 upload for dataset exports. If None, MINID functionality is disabled regardless of the use_minid setting. |
| `use_minid` | `bool \| None` | Whether to use the MINID service for dataset bags. Only effective when s3_bucket is configured. Defaults to True when s3_bucket is set, False otherwise. |
| `check_auth` | `bool` | Whether to verify authentication on connection. Defaults to True. |
| `clean_execution_dir` | `bool` | Whether to automatically clean execution working directories after successful upload. Defaults to True. Set to False to retain local copies of execution outputs for debugging or manual inspection. |
| `mode` | `ConnectionMode \| str` | Connection mode. |

Example

    config = DerivaMLConfig(
        hostname='deriva.example.org',
        catalog_id=1,
        default_schema='my_domain',
        logging_level=logging.INFO
    )
Source code in src/deriva_ml/core/config.py
compute_workdir
staticmethod
compute_workdir(
working_dir: str | Path | None,
catalog_id: str | int | None = None,
hostname: str | None = None,
) -> Path
Compute the effective working directory path.
Creates a standardized working directory path. If a base directory is provided, appends the current username to prevent conflicts between users. If no directory is provided, uses ~/.deriva-ml. The hostname and catalog_id are appended to separate data from different servers and catalogs.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `working_dir` | `str \| Path \| None` | Base working directory path, or None for the default. | required |
| `catalog_id` | `str \| int \| None` | Catalog identifier to include in the path. If None, no catalog subdirectory is created. | None |
| `hostname` | `str \| None` | Server hostname to include in the path. If None, no hostname subdirectory is created. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| Path | `Path` | Absolute path to the working directory. |

Example

    >>> DerivaMLConfig.compute_workdir('/shared/data', '52', 'ml.example.org')
    PosixPath('/shared/data/username/deriva-ml/ml.example.org/52')
    >>> DerivaMLConfig.compute_workdir(None, 1, 'localhost')
    PosixPath('/home/username/.deriva-ml/localhost/1')
Source code in src/deriva_ml/core/config.py
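The path composition described above can be re-derived with pathlib. This is a hypothetical sketch of the documented layout, not the actual implementation (`sketch_workdir` and its explicit `username` parameter are illustrative; the real method discovers the current user itself):

```python
from pathlib import Path

def sketch_workdir(base, username, hostname=None, catalog_id=None):
    """Hypothetical re-derivation of the documented path layout."""
    if base:
        root = Path(base) / username / "deriva-ml"  # per-user subtree under the base
    else:
        root = Path.home() / ".deriva-ml"           # default location
    for part in (hostname, catalog_id):             # separate servers and catalogs
        if part is not None:
            root = root / str(part)
    return root

print(sketch_workdir("/shared/data", "alice", "ml.example.org", "52"))
```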
init_working_dir
init_working_dir() -> DerivaMLConfig
Initialize working directory and resolve use_minid after model validation.
Sets up the working directory path, computing a default if not specified. Also captures Hydra's runtime output directory for logging and outputs.
Resolves the use_minid flag based on s3_bucket configuration:

- If use_minid is explicitly set, that value is used (but it only takes effect when s3_bucket is set).
- If use_minid is None (auto), it is set to True if s3_bucket is configured, False otherwise.

This validator runs after all field validation and ensures the working directory is available for Hydra configuration resolution.

Returns:

| Name | Type | Description |
|---|---|---|
| Self | `DerivaMLConfig` | The configuration instance with initialized paths. |
Source code in src/deriva_ml/core/config.py
DerivaMLException
Bases: Exception
Base exception class for all DerivaML errors.
This is the root exception for all DerivaML-specific errors. Catching this exception will catch any error raised by the DerivaML library.
Attributes:

| Name | Description |
|---|---|
| `_msg` | The error message stored for later access. |

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `msg` | `str` | Descriptive error message. Defaults to empty string. | `''` |

Example

    >>> raise DerivaMLException("Failed to connect to catalog")  # doctest: +SKIP
    DerivaMLException: Failed to connect to catalog
Source code in src/deriva_ml/core/exceptions.py
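Because every DerivaML error derives from DerivaMLException, a single except clause covers them all. A minimal sketch of the hierarchy (the subclass shown is illustrative of the specialized types, not a copy of the library source):

```python
class DerivaMLException(Exception):
    """Sketch of the documented base class: message kept in _msg."""

    def __init__(self, msg: str = ""):
        super().__init__(msg)
        self._msg = msg  # stored for later access, as documented

class DerivaMLNotFoundError(DerivaMLException):
    """Illustrative subclass standing in for the specialized exceptions."""

try:
    raise DerivaMLNotFoundError("no such RID")
except DerivaMLException as e:  # the base class catches any DerivaML error
    caught = e._msg

print(caught)  # no such RID
```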
DerivaMLInvalidTerm
Bases: DerivaMLNotFoundError
Exception raised when a vocabulary term is not found or invalid.
Raised when attempting to look up or use a term that doesn't exist in a controlled vocabulary table, or when a term name/synonym cannot be resolved.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `vocabulary` | `str` | Name of the vocabulary table being searched. | required |
| `term` | `str` | The term name that was not found. | required |
| `msg` | `str` | Additional context about the error. | `"Term doesn't exist"` |

Example

    >>> raise DerivaMLInvalidTerm("Diagnosis", "unknown_condition")  # doctest: +SKIP
    DerivaMLInvalidTerm: Invalid term unknown_condition in vocabulary Diagnosis: Term doesn't exist.
Source code in src/deriva_ml/core/exceptions.py
DerivaMLTableTypeError
Bases: DerivaMLDataError
Exception raised when a RID or table is not of the expected type.
Raised when an operation requires a specific table type (e.g., Dataset, Execution) but receives a RID or table reference of a different type.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `table_type` | `str` | The expected table type (e.g., "Dataset", "Execution"). | required |
| `table` | `str` | The actual table name or RID that was provided. | required |

Example

    >>> raise DerivaMLTableTypeError("Dataset", "1-ABC123")  # doctest: +SKIP
    DerivaMLTableTypeError: Table 1-ABC123 is not of type Dataset.
Source code in src/deriva_ml/core/exceptions.py
ExecAssetType
Bases: StrEnum
Execution asset type identifiers.
Defines the types of assets that can be produced or consumed during an execution. These types are used to categorize files associated with workflow runs.
Attributes:

| Name | Type | Description |
|---|---|---|
| `input_file` | `str` | Input file consumed by the execution. |
| `output_file` | `str` | Output file produced by the execution. |
| `notebook_output` | `str` | Jupyter notebook output from the execution. |
| `model_file` | `str` | Machine learning model file (e.g., .pkl, .h5, .pt). |
Source code in src/deriva_ml/core/enums.py
ExecMetadataType
Bases: StrEnum
Execution metadata type identifiers.
Defines the types of metadata that can be associated with an execution.
Attributes:

| Name | Type | Description |
|---|---|---|
| `execution_config` | `str` | General execution configuration data. |
| `runtime_env` | `str` | Runtime environment information. |
| `hydra_config` | `str` | Hydra YAML configuration files (config.yaml, overrides.yaml). |
| `deriva_config` | `str` | DerivaML execution configuration (configuration.json). |
| `metrics_file` | `str` | Training-metric log file (typically JSONL, one record per evaluation point: per epoch, per eval step, etc.), written during execution. |
Source code in src/deriva_ml/core/enums.py
130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 | |
FileSpec
Bases: BaseModel
Specification for a file to be added to the Deriva catalog.
Represents file metadata required for creating entries in the File table. Handles URL normalization, ensuring local file paths are converted to tag URIs that uniquely identify the file's origin.
Attributes:

| Name | Type | Description |
|---|---|---|
| `url` | `str` | File location as URL or local path. Local paths are converted to tag URIs. |
| `md5` | `str` | MD5 checksum for integrity verification. |
| `length` | `int` | File size in bytes. |
| `description` | `str \| None` | Optional description of the file's contents or purpose. |
| `file_types` | `list[str] \| None` | List of file type classifications from the Asset_Type vocabulary. |

Note

The 'File' type is automatically added to file_types if not present when using create_filespecs().

Example

    spec = FileSpec(
        url="/data/results.csv",
        md5="d41d8cd98f00b204e9800998ecf8427e",
        length=1024,
        description="Analysis results",
        file_types=["CSV", "Data"]
    )
Source code in src/deriva_ml/core/filespec.py
create_filespecs
classmethod
create_filespecs(
path: Path | str,
description: str,
file_types: list[str]
| Callable[[Path], list[str]]
| None = None,
) -> Generator[FileSpec, None, None]
Generate FileSpec objects for a file or directory.
Creates FileSpec objects with computed MD5 checksums for each file found. For directories, recursively processes all files. The 'File' type is automatically prepended to file_types if not already present.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path \| str` | Path to a file or directory. If a directory, all files are processed recursively. | required |
| `description` | `str` | Description to apply to all generated FileSpecs. | required |
| `file_types` | `list[str] \| Callable[[Path], list[str]] \| None` | Either a static list of file types, or a callable that takes a Path and returns a list of types for that specific file. Allows dynamic type assignment based on file extension, content, etc. | None |

Yields:

| Name | Type | Description |
|---|---|---|
| FileSpec | `FileSpec` | A specification for each file with computed checksums and metadata. |

Example

Static file types:

    >>> specs = FileSpec.create_filespecs("/data/images", "Images", ["Image"])  # doctest: +SKIP

Dynamic file types based on extension:

    >>> def get_types(path):
    ...     ext = path.suffix.lower()
    ...     return {".png": ["PNG", "Image"], ".jpg": ["JPEG", "Image"]}.get(ext, [])
    >>> specs = FileSpec.create_filespecs("/data", "Mixed files", get_types)  # doctest: +SKIP
Source code in src/deriva_ml/core/filespec.py
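The per-file checksum work that create_filespecs performs can be sketched with hashlib; `file_stats` is a hypothetical helper, not the library code:

```python
import hashlib
import tempfile
from pathlib import Path

def file_stats(path: Path) -> tuple[str, int]:
    """Return (MD5 hex digest, size in bytes), reading the file in chunks."""
    h, size = hashlib.md5(), 0
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
            size += len(chunk)
    return h.hexdigest(), size

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "empty.bin"
    p.write_bytes(b"")
    md5, length = file_stats(p)

# An empty file yields the well-known MD5 of zero bytes.
print(md5, length)  # d41d8cd98f00b204e9800998ecf8427e 0
```

Chunked reading keeps memory flat for large assets, which matters when specs are generated over whole directories.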
read_filespec
staticmethod
read_filespec(
path: Path | str,
) -> Generator[FileSpec, None, None]
Read FileSpec objects from a JSON Lines file.
Parses a JSONL file where each line is a JSON object representing a FileSpec. Empty lines are skipped. This is useful for batch processing pre-computed file specifications.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str | Path to the .jsonl file containing FileSpec data. | required |
Yields:

| Name | Type | Description |
|---|---|---|
| FileSpec | FileSpec | Parsed FileSpec object for each valid line. |
Example
>>> for spec in FileSpec.read_filespec("files.jsonl"):  # doctest: +SKIP
...     print(f"{spec.url}: {spec.md5}")
Source code in src/deriva_ml/core/filespec.py
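The JSON Lines parsing described above, one object per line with blank lines skipped, amounts to the following sketch. The reader and record fields here are illustrative, not the real FileSpec model:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

# Sketch of the described JSONL parsing: one JSON object per non-empty line.
def read_jsonl(path):
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:                      # skip blank lines
                yield json.loads(line)

with TemporaryDirectory() as d:
    jl = Path(d) / "files.jsonl"
    # Two records separated by a blank line, which must be ignored.
    jl.write_text('{"url": "tag://a", "md5": "abc"}\n\n{"url": "tag://b", "md5": "def"}\n')
    records = list(read_jsonl(jl))
```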
validate_file_url
classmethod
validate_file_url(url: str) -> str
Examine the provided URL. If it's a local path, convert it into a tag URL.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| url | str | The URL to validate and potentially convert. | required |

Returns:

| Type | Description |
|---|---|
| str | The validated/converted URL. |

Raises:

| Type | Description |
|---|---|
| ValidationError | If the URL is not a file URL. |
Source code in src/deriva_ml/core/filespec.py
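The local-path-to-tag-URL conversion can be illustrated as follows. This is only a sketch of the control flow: the placeholder tag format below is made up, and the real classmethod is a Pydantic validator raising ValidationError rather than ValueError:

```python
from urllib.parse import urlparse

# Illustrative logic only: the real classmethod converts local paths into
# Deriva tag URLs; the tag format below is a made-up placeholder.
def validate_file_url(url: str) -> str:
    scheme = urlparse(url).scheme
    if scheme in ("", "file"):               # a bare path or file:// URL
        path = url.removeprefix("file://")
        return f"tag://example,2024:{path}"  # placeholder tag URL
    if scheme != "tag":
        raise ValueError(f"Not a file URL: {url}")
    return url

converted = validate_file_url("/data/images/photo.png")
passthrough = validate_file_url("tag://example,2024:/data/x")
```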
FileUploadState
Bases: BaseModel
Tracks the state and result of a file upload operation.
Attributes:

| Name | Type | Description |
|---|---|---|
| state | UploadState | Current state of the upload (success, failed, etc.). |
| status | str | Detailed status message. |
| result | Any | Upload result data, if any. |
Source code in src/deriva_ml/core/ermrest.py
LoggerMixin
Mixin class that provides a _logger attribute.
Classes that inherit from this mixin get a _logger property that returns a child logger under the deriva_ml namespace, named after the class.
Example
>>> class MyProcessor(LoggerMixin):
...     def process(self):
...         self._logger.info("Processing started")  # logs to 'deriva_ml.MyProcessor'
Source code in src/deriva_ml/core/logging_config.py
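The mixin pattern described above is small enough to sketch in full. This is a minimal reimplementation of the idea, assuming the documented behavior (a child logger named `deriva_ml.<ClassName>`), not the actual source:

```python
import logging

# Minimal sketch of the described mixin: a property returning a child logger
# under the "deriva_ml" namespace, named after the concrete class.
class LoggerMixin:
    @property
    def _logger(self) -> logging.Logger:
        return logging.getLogger(f"deriva_ml.{type(self).__name__}")

class MyProcessor(LoggerMixin):
    def process(self) -> str:
        self._logger.info("Processing started")
        return self._logger.name

name = MyProcessor().process()
```

Because the logger is looked up by name, all instances of a class share one logger, and its level is controlled through the parent `deriva_ml` logger.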
MLAsset
Bases: StrEnum
Asset type identifiers.
Defines the types of assets that can be associated with executions.
Attributes:

| Name | Type | Description |
|---|---|---|
| execution_metadata | str | Metadata about an execution. |
| execution_asset | str | Asset produced by an execution. |
Source code in src/deriva_ml/core/enums.py
MLVocab
Bases: StrEnum
Controlled vocabulary table identifiers.
Defines the names of controlled vocabulary tables used in DerivaML. These tables store standardized terms with descriptions and synonyms for consistent data classification across the catalog.
Attributes:

| Name | Type | Description |
|---|---|---|
| dataset_type | str | Dataset classification vocabulary (e.g., "Training", "Test"). |
| workflow_type | str | Workflow classification vocabulary (e.g., "Python", "Notebook"). |
| asset_type | str | Asset/file type classification vocabulary (e.g., "Image", "CSV"). |
| asset_role | str | Asset role vocabulary for execution relationships (e.g., "Input", "Output"). |
| execution_status | str | Execution status vocabulary for execution lifecycle states. |
| feature_name | str | Feature name vocabulary for ML feature definitions. |
Source code in src/deriva_ml/core/enums.py
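Because MLVocab is a StrEnum, its members compare equal to plain strings and can be passed wherever a vocabulary table name is expected. A sketch of that behavior using the equivalent `str` + `Enum` mixin (the member values below are illustrative, not the real schema names):

```python
from enum import Enum

# Sketch of a StrEnum-style vocabulary enum. Member values here are
# illustrative placeholders, not the actual deriva-ml table names.
class MLVocab(str, Enum):
    dataset_type = "Dataset_Type"
    workflow_type = "Workflow_Type"
    asset_type = "Asset_Type"

# The str mixin means members compare equal to plain strings.
table_name = MLVocab.dataset_type.value
```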
UploadState
Bases: Enum
File upload operation states.
Represents the various states a file upload operation can be in, from initiation to completion.
Attributes:

| Name | Type | Description |
|---|---|---|
| success | int | Upload completed successfully. |
| failed | int | Upload failed. |
| pending | int | Upload is queued. |
| running | int | Upload is in progress. |
| paused | int | Upload is temporarily paused. |
| aborted | int | Upload was aborted. |
| cancelled | int | Upload was cancelled. |
| timeout | int | Upload timed out. |
Source code in src/deriva_ml/core/enums.py
configure_logging
configure_logging(
level: int = logging.WARNING,
deriva_level: int | None = None,
format_string: str = DEFAULT_FORMAT,
handler: Handler | None = None,
) -> logging.Logger
Configure logging for DerivaML and related libraries.
This function sets up logging levels for DerivaML, related libraries (deriva-py, bdbag, bagit), and Hydra loggers. It is designed to:
- Configure only specific logger namespaces, not the root logger
- Respect Hydra's logging configuration when running under Hydra
- Allow deriva-py libraries to have a separate logging level
The logging level hierarchy:

- deriva_ml logger: uses `level`
- Hydra loggers: follow `level` (the deriva_ml level)
- Deriva/bdbag/bagit loggers: use `deriva_level` (defaults to `level`)

When running under Hydra:

- Only sets log levels on specific loggers
- Does NOT add handlers (Hydra has already configured them)
- Does NOT call basicConfig()

When running standalone (no Hydra):

- Sets log levels on specific loggers
- Adds a StreamHandler to the deriva_ml logger if none exists
- Still does NOT touch the root logger or call basicConfig()
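The standalone path above can be sketched with the standard `logging` module. This is a simplified illustration of the namespace-scoped approach (set levels on specific loggers, attach a handler only to `deriva_ml`, never touch root), not the actual `configure_logging` source:

```python
import logging

# Minimal sketch of the standalone path: level per namespace, one handler
# on the deriva_ml logger, and no basicConfig() or root-logger changes.
def setup_logging(level=logging.WARNING, deriva_level=None):
    deriva_level = level if deriva_level is None else deriva_level
    ml_logger = logging.getLogger("deriva_ml")
    ml_logger.setLevel(level)
    for name in ("deriva", "bdbag", "bagit"):
        logging.getLogger(name).setLevel(deriva_level)
    if not ml_logger.handlers:               # add a handler only if none exists
        ml_logger.addHandler(logging.StreamHandler())
    return ml_logger

log = setup_logging(level=logging.INFO, deriva_level=logging.ERROR)
```

Scoping configuration to named loggers is what lets DerivaML coexist with Hydra: Hydra configures the root logger via dictConfig, and this code never touches it.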
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| level | int | Log level for deriva_ml and Hydra loggers. Defaults to WARNING. | WARNING |
| deriva_level | int \| None | Log level for deriva-py libraries (deriva, bagit, bdbag). If None, uses the same value as level. | None |
| format_string | str | Format string for log messages (used only when adding handlers outside a Hydra context). | DEFAULT_FORMAT |
| handler | Handler \| None | Optional handler to add to the deriva_ml logger. If None and not running under Hydra, uses a StreamHandler with format_string. | None |
Returns:

| Type | Description |
|---|---|
| Logger | The configured deriva_ml logger. |
Example
>>> import logging
>>> # Same level for everything
>>> configure_logging(level=logging.DEBUG)
>>> # Verbose DerivaML, quieter deriva-py libraries
>>> configure_logging(
...     level=logging.INFO,
...     deriva_level=logging.WARNING,
... )
Source code in src/deriva_ml/core/logging_config.py
get_logger
get_logger(
name: str | None = None,
) -> logging.Logger
Get a DerivaML logger.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str \| None | Optional sub-logger name. If provided, returns a child logger under the deriva_ml namespace (e.g., 'deriva_ml.dataset'). If None, returns the main deriva_ml logger. | None |
Returns:

| Type | Description |
|---|---|
| Logger | The configured logger instance. |
Example
>>> logger = get_logger()  # Main deriva_ml logger
>>> dataset_logger = get_logger("dataset")  # deriva_ml.dataset
Source code in src/deriva_ml/core/logging_config.py
is_hydra_initialized
is_hydra_initialized() -> bool
Check if running within an initialized Hydra context.
This is used to determine whether Hydra is managing logging configuration. When Hydra is initialized, we avoid adding handlers or calling basicConfig since Hydra has already configured logging via dictConfig.
Returns:

| Type | Description |
|---|---|
| bool | True if Hydra's GlobalHydra is initialized, False otherwise. |
Example
>>> if is_hydra_initialized():
...     # Hydra is managing logging
...     pass
Source code in src/deriva_ml/core/logging_config.py
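A plausible shape for this check, based on Hydra's public `GlobalHydra` singleton, is sketched below. The import guard is an assumption added so the sketch degrades to False when hydra-core is not installed:

```python
# Sketch of the described check. GlobalHydra is Hydra's singleton holding
# the current initialization state; the ImportError guard is an assumption.
def hydra_is_initialized() -> bool:
    try:
        from hydra.core.global_hydra import GlobalHydra
    except ImportError:
        return False  # no hydra-core installed, so Hydra cannot be managing logging
    return GlobalHydra.instance().is_initialized()

managed_by_hydra = hydra_is_initialized()
```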